It’s clear that AI is entering every aspect of gaming. So where is the technology when it comes to voice-overs? Does it work? And is it worth using? We looked at the various tools out there to find the best ways to use them and to see whether the quality is up to scratch.
AI voice-overs won’t replace voice actors
First of all, we don’t believe the quality of AI voice-acting is anywhere close to that of a real actor. Even AI tools focused on the gaming sector hover in that uncanny valley, where the voice just sounds robotic and stilted. Not awful. But it just doesn’t have that same cadence a real person would give.
True, the technology will advance. But we don’t see it replacing the need for a real person. For one, you need to model the AI after someone. But even then, an AI can’t decide when to pause to show emotion or emphasise a point. It doesn’t have that same awareness of the context of the situation.
[Image: a YouTube comment on Sonantic’s video about AI voices.]
As models improve, it’ll get better. But it’ll never be perfect, and it’ll take a lot of work from the developer to make it believable. A well-written scene, filled with character development and poignant moments, will always need a real actor to do it justice.
The results are rather stilted
We experimented with a few different voice-over AI tools to see how well they performed, such as ReadSpeaker, PlayHT, Resemble AI, Lovo.AI, and Replica Studios.
Even just listening to the highlight reels on their websites, the examples sound robotic and somewhat lifeless. They might be passable for minor moments or tutorial text, but they’re certainly not good enough for emotional scenes or believable characters.
There are more specialised tools, like Replica Studios, which allow you to change the emotion behind the text and adjust the settings. But even these fall flat when the text gets longer or more nuanced. Small snippets of text, like one-liners, tutorial hints or narration, can be okay. But some words seem to completely mystify the computer and it can’t make the whole paragraph… flow.
So if the quality isn’t up to scratch, what’s the point of using it?
AI can speed up prototypes
There aren’t many studios using AI for voice-over work. At least, not work that’s out in the wild. It seems that most are using it to help speed up their development process, rather than using it for their final release.
Obsidian uses AI to check that the story flows properly and that the characters behave believably. And, as games become more and more customisable, it’s impractical to record the final lines until the very end of development. AI can improve the quality of prototype and testing builds.
This seems to be a trend with most studios.
“We use Replica’s software to test scripts, dialogue, and gameplay sequences before engaging human voice actors to record the final lines,” said Chris O’Neill, the senior audio designer at PlaySide Studios.
Likewise, Ninja Theory said on X (Twitter):
“We use this AI only to help us understand things like timing and placement in early phases of development. We then collaborate with real actors whose performances are at the heart of bringing our stories to life.”
This seems like a good way to think about AI in general. Use it as a placeholder or way to brief your creative team. It can help your director communicate what they want and speed the process along.
AI allows for ‘generated’ content
There are already hundreds of thousands of lines of dialogue in modern games. Bethesda’s Starfield has around 250,000 lines. Baldur’s Gate 3, even during early access, had well over 45,000 lines – and that was just the first act. Red Dead Redemption 2 reportedly had over 500,000 across 1,000 voice actors.
Games are just getting bigger and bigger. AI probably won’t replace the need for human actors for the bulk of that dialogue. But it can help tidy up the quality after it’s been recorded.
With so many lines of dialogue, it’s not always practical to record it all at once. Baldur’s Gate 3 has great writing and quality actors. But sometimes it’s clear the lines were recorded at different times. Using AI to just tidy it up and make it consistent could really help.
But that’s just the written dialogue. The intentional dialogue. What players want is interactivity – to be able to talk to characters and have unique responses.
The next step is inevitably more “generated” or “dynamic” dialogue. Dialogue that’s powered by AI language models to respond to the player in real-time.
Replica Studios is already working on this, with their Smart NPCs plugin for Unreal Engine. And it’s pretty impressive.
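To make that concrete, here’s a rough sketch of how such a dynamic-dialogue loop might hang together. This is a minimal Python illustration, not Replica’s actual plugin API – the generate_reply and synthesize functions are hypothetical stand-ins for a language-model call and a voice-synthesis call, and the NPC fields are our own invention.

```python
# A minimal sketch of a "dynamic NPC" dialogue loop. Both back-end
# functions below are hypothetical stand-ins, not any vendor's real API.

from dataclasses import dataclass, field

@dataclass
class NPC:
    name: str
    persona: str      # writer-authored backstory the model draws from
    voice_id: str     # identifier for the actor's licensed voice model
    history: list = field(default_factory=list)

def generate_reply(npc: NPC, player_line: str) -> str:
    """Stand-in for a language-model call.

    In a real game this would send npc.persona and npc.history as
    context and return the model's generated line of dialogue.
    """
    npc.history.append(("player", player_line))
    reply = f"Hmm, '{player_line}'? Strange times in these parts, traveller."
    npc.history.append((npc.name, reply))
    return reply

def synthesize(voice_id: str, text: str) -> bytes:
    """Stand-in for an AI voice call using the actor's licensed voice."""
    print(f"[TTS:{voice_id}] {text}")
    return b""  # real audio bytes would stream to the game's audio engine

# The core loop: no pre-recorded clip exists, so every reply must be
# generated and voiced on the fly.
guard = NPC(name="Gate Guard",
            persona="A weary guard who distrusts outsiders.",
            voice_id="actor-voice-licensed-001")

reply = generate_reply(guard, "Have you seen anything unusual tonight?")
audio = synthesize(guard.voice_id, reply)
```

The key point the sketch makes: because the reply is generated per player, per moment, the voice line can’t exist in a recording booth ahead of time – which is exactly why this use case needs a synthesised voice.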
AI will soon respond to players – and it can’t all be acted
The idea is simple. Imagine you could walk around a world and talk to any NPC and they’d respond like a real human being. It seems fantastical, but it’s within reach. We wouldn’t be surprised if we see a game with AI NPCs in the next couple of years.
Replica Studios did a demo with Matrix Awakens using their Smart NPCs. Their official demo is a bit lacklustre, so here’s a better example from YouTuber TmarTn2 trying it out.
As you can see, it’s pretty impressive. But janky. The novelty of saying anything to an NPC would likely wear thin after a little while and the responses aren’t world-shattering. Mix in a real writer, coming up with scenarios and stories that the NPCs could draw from – and we’re sure it’ll be mind-blowing.
The problem is that it’s all unique content. It needs an AI voice actor to speak the lines, because the dialogue doesn’t exist until the player prompts it – there’s literally nothing to record in advance.
We predict that studios will need to license an actor’s voice to allow for this dynamic content. Pay the actor normally for the ‘real’ dialogue and then an extra fee to model their voice for the generated content.
Sure, the generated content will never be as good as the parts the voice actor actually performed. But, you know what? That’s fine. As a player, I’m willing to accept a bit of janky dialogue as an extra. I suspend my disbelief. It feels like the old days when the graphics weren’t particularly good. After a while, your mind fills in the blanks.
AI could help accessibility
Text-heavy games are always a problem for those who can’t read them. Whether the player is completely blind or just struggles to see the tiny font – having a computer read out the text can be incredibly helpful.
Developers could use AI as a tool for accessibility. For example, you could have it narrate actions for blind people like “Frank enters the room.” Or just have it read out the in-game text and menus.
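As a rough illustration, here’s a minimal Python sketch of that idea using the open-source pyttsx3 text-to-speech library. The narrate helper and the example strings are assumptions for illustration – a real game would hook this into its own UI and event systems.

```python
# A minimal sketch of an accessibility narrator, using pyttsx3
# (an offline text-to-speech wrapper). The hook points shown are
# hypothetical examples, not any engine's real API.

import pyttsx3

engine = pyttsx3.init()
engine.setProperty("rate", 170)  # words per minute; tune for readability

def narrate(text: str) -> None:
    """Read a line of in-game text aloud, blocking until finished."""
    engine.say(text)
    engine.runAndWait()

# Hypothetical hooks a developer might wire up:
narrate("Frank enters the room.")              # scene description for blind players
narrate("Menu: New Game, Load Game, Options")  # reading out UI text
```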
This is particularly useful for ports of old games. A game like Final Fantasy VII had no voiced dialogue at all – everything was text. Imagine if Square Enix, when porting it to PC, could have just slapped on an AI tool to read out all that text. It’d open the game up to so many more players.
It’s possible to embrace AI and be ethical
If a developer wants to use only AI for their voice acting, it’s not really viable right now. Even in the future, it’s going to take a lot of effort to reach the quality you’d expect from an actor. There’s still a price to pay – time. For the most part, we imagine developers will need a mix of AI and real people.
But how do we balance the two? Society, in general, has a lot to learn about how to work with AI. Regulations need to be set. Standards need to be made. Questions need answering.
With the right licences for voice actors, which pay them fairly for their talent, we can see a bright future for gaming. AI has the potential to become the private Game Master, helping run unique games for every individual player. Even if the voices do all sound the same.
But, then again, isn’t that every Game Master?
If you’d like to stay in the loop about the latest news from the gaming industry, make sure you subscribe to our newsletter.