A new "empathic voice interface" launched today by Hume AI, a New York-based startup, makes it possible to add a range of emotionally expressive voices, plus an emotionally attuned ear, to large language models from Anthropic, Google, Meta, Mistral, and OpenAI, portending an era when AI helpers may more routinely get all gushy on us.
"We specialize in building empathic personalities that speak in ways people would speak, rather than stereotypes of AI assistants," says Hume AI cofounder Alan Cowen, a psychologist who has coauthored a number of research papers on AI and emotion, and who previously worked on emotional technologies at Google and Facebook.
WIRED tested Hume's latest voice technology, called EVI 2, and found its output to be similar to that developed by OpenAI for ChatGPT. (When OpenAI gave ChatGPT a flirtatious voice in May, company CEO Sam Altman touted the interface as feeling "like AI from the movies." Later, a real movie star, Scarlett Johansson, claimed OpenAI had ripped off her voice.)
Like ChatGPT, Hume is far more emotionally expressive than most conventional voice interfaces. If you tell it that your pet has died, for example, it will adopt a suitably somber and sympathetic tone. (Also, as with ChatGPT, you can interrupt Hume mid-flow, and it will pause and adapt with a new response.)
OpenAI has not said how much its voice interface tries to measure the emotions of users, but Hume's is expressly designed to do that. During interactions, Hume's developer interface will show values indicating a measure of things like "determination," "anxiety," and "happiness" in the users' voice. If you talk to Hume with a sad tone it will also pick up on that, something that ChatGPT does not seem to do.
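To make the idea concrete, here is a minimal sketch of how an application might rank the emotion scores that an interface like Hume's surfaces. The JSON payload shape, field names, and the `top_emotions` helper are assumptions for illustration, not Hume's documented API:

```python
import json

def top_emotions(payload: str, n: int = 3) -> list[tuple[str, float]]:
    """Return the n highest-scoring emotions from a JSON map of
    emotion-name -> score, sorted from strongest to weakest.
    (Hypothetical payload shape; not Hume's actual API.)"""
    scores = json.loads(payload)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:n]

# Example payload with scores like those shown in the developer interface.
sample = json.dumps({
    "determination": 0.12,
    "anxiety": 0.58,
    "happiness": 0.07,
    "sadness": 0.31,
})
print(top_emotions(sample, 2))  # [('anxiety', 0.58), ('sadness', 0.31)]
```

A real client would receive such scores over a streaming connection during the conversation; the ranking step above would then drive how the agent modulates its reply.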
Hume also makes it easy to deploy a voice with specific emotions by adding a prompt in its UI. Here it is when I asked it to be "sexy and flirtatious":

And when told to be "sad and morose":

And here's the particularly nasty message when asked to be "angry and rude":
The technology did not always seem as polished and smooth as OpenAI's, and it occasionally behaved in odd ways. For example, at one point the voice suddenly sped up and spewed gibberish. But if the voice can be refined and made more reliable, it has the potential to help make humanlike voice interfaces more common and varied.
The idea of recognizing, measuring, and simulating human emotion in technological systems goes back decades and is studied in a field known as "affective computing," a term introduced by Rosalind Picard, a professor at the MIT Media Lab, in the 1990s.
Albert Salah, a professor at Utrecht University in the Netherlands who studies affective computing, is impressed with Hume AI's technology and recently demonstrated it to his students. "What EVI seems to be doing is assigning emotional valence and arousal values [to the user], and then modulating the speech of the agent accordingly," he says. "It is a very interesting twist on LLMs."