August 26, 2019 — Blog Post
Designing for voice: The science behind empathy
A conversation with Freddie Feldman
This is part 2 of our interview with Freddie Feldman, Voice Design Director at Clinical Effectiveness, Wolters Kluwer Health.
You write primarily for the “voice” of Emmi, which is performed by a voice actor named Deb.
Yes, we script what Deb says. But we also have to anticipate what the patient may say back — all the infinite variety of responses that people can give in real life.
Those are called grammars. It’s like a dictionary of possible responses. We may ask a question like, “Were you able to pick up your meds? Were you able to fill your prescription?” That’s a “yes/no” question.
We didn’t say, “Were you able to pick up your prescription? Yes or no?” That’s a little wordy and a little awkward to prompt somebody in that way. So it sounds a bit open-ended, but it’s not. It’s directed.
The typical person, when they’re asked, “Were you able to fill your prescription?” they understand that as a “yes/no” question. But they may not give a “yes/no” answer. So, we have a grammar that we set up for that question and we say, the patient can say “yes,” “no,” or “I don’t know.”
But underneath each bucket, we fill it with words that a patient could say. So, “yep,” “yeah.” We don’t handle “uh-huh,” because that’s not an actual word.
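The bucket idea described above can be sketched as a simple lookup table. This is an illustrative Python sketch, not Emmi’s actual implementation; the synonym lists are examples of the kinds of words a grammar might contain.

```python
# Hypothetical grammar: each bucket is filled with literal words
# a patient might say, mapped to one of three canonical answers.
GRAMMAR = {
    "yes": {"yes", "yep", "yeah", "of course", "sure"},
    "no": {"no", "nope", "not yet"},
    "i_dont_know": {"i don't know", "not sure", "maybe"},
}

def classify(utterance):
    """Map a recognized utterance to a bucket, or None if out of grammar."""
    text = utterance.strip().lower()
    for bucket, words in GRAMMAR.items():
        if text in words:
            return bucket
    return None  # out of grammar: this is what triggers an error prompt
```

Because the match is literal, anything outside the buckets, no matter how reasonable, falls through to `None`; that is the limitation Feldman describes below.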
And then there are times when we have something that’s not even close to “yes” or “no.”
That’s a good one. But that’s still an affirmative, right? If you had to put “of course” into one of three piles, “yes,” “no,” or “I don’t know,” it would always go under “yes.”
We have to come up with certain additional grammars for questions that are contextual. We ask patients the question, “I can transfer you to somebody to set up your appointment. Would you like me to do that?” “Do it,” is what the patient says, or “Send me there,” or all kinds of weird things like that. Because sometimes they just like to get cute, and they’re just playing with it.
If they don’t say something that’s in one of these grammars, we give them an error prompt: “I’m sorry, I didn’t understand you. You can say yes or press one, or say no or press two.” And then we guide them back into those buckets, or into the third one, for “I don’t know.”
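The re-prompt flow above can be sketched as a small retry loop. This is a hypothetical illustration, assuming a fixed retry count and touch-tone fallbacks; the `classify` function stands in for whatever grammar matcher is in use, and `print` stands in for playing the recorded prompt.

```python
# Hypothetical re-prompt flow: after an out-of-grammar answer,
# play an error prompt that also maps buckets to touch-tone keys.
MAX_RETRIES = 2
ERROR_PROMPT = ("I'm sorry, I didn't understand you. "
                "Say yes or press 1, say no or press 2, "
                "or press 3 for I don't know.")
DTMF_BUCKETS = {"1": "yes", "2": "no", "3": "i_dont_know"}

def collect_answer(get_input, classify):
    """Ask until the answer falls into a bucket or retries run out."""
    for attempt in range(MAX_RETRIES + 1):
        raw = get_input()
        bucket = DTMF_BUCKETS.get(raw) or classify(raw)
        if bucket:
            return bucket
        print(ERROR_PROMPT)  # stands in for playing the audio prompt
    return None  # give up: hand off to a human or end the call
```

The design choice here mirrors the interview: the system never tries to interpret an unexpected answer, it simply narrows the caller’s options until one of the buckets matches.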
But periodically we would find that, for certain populations, we were getting lots of those error prompts, where we weren’t understanding them. We would listen to recordings of the calls, and we learned that in the South we have a large number of patients who say, “yes, ma’am.”
In the South, “yes ma’am” and “yes sir” are standard polite responses.
So we added “yes ma’am.” That doesn’t mean we support intent parsing, the natural language processing or machine learning you’d find in an Amazon Echo or Google Home. Our programming isn’t able to anticipate what people are trying to get at. It can only know, literally, what they say.
By limiting the grammars somewhat, we get a higher confidence level from the speech recognition engine on what the patient said. We aren’t able to do things like, “Can I help you with–? Is there anything else I can answer for you?” We can’t have open-ended questions like that.
But some of the questions ask the patient for a data point, don’t they?
Right. “I need your weight, glucose number, or a date.” Date collection is another big one. “Have you had a mammogram in the last year?” “Yes.” “When was that? What month? Give me the month and year.”
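Month-and-year collection like the mammogram example can also be handled with a constrained grammar. This is a hypothetical sketch, assuming the recognizer returns a spoken month name followed by a four-digit year; it is not Emmi’s actual parser.

```python
# Hypothetical month-and-year grammar: a closed list of month names
# mapped to numbers, plus a four-digit year within a plausible range.
MONTHS = {m.lower(): i for i, m in enumerate(
    ["January", "February", "March", "April", "May", "June",
     "July", "August", "September", "October", "November", "December"], 1)}

def parse_month_year(utterance):
    """Return (month, year) from e.g. 'March 2019', or None if out of grammar."""
    parts = utterance.strip().lower().split()
    if len(parts) == 2 and parts[0] in MONTHS and parts[1].isdigit():
        year = int(parts[1])
        if 1900 <= year <= 2100:
            return MONTHS[parts[0]], year
    return None  # out of grammar: fall back to the error prompt
```

As with the yes/no grammar, anything outside the expected shape falls through rather than being guessed at, which keeps the recognizer’s confidence high at the cost of flexibility.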
When I give my presentations, I play the exact same text side-by-side: Deb versus text-to-speech (TTS). For example, the text might be about how important it is to get a mammogram. And I ask, “Which one of these voices is going to convince your mother or your wife or your sister that she needs to get a mammogram?”
Listen to the human voice of Deb, the Emmi voiceover artist:
Listen to the computer-generated artificial voice:
Which would you rather listen to for any length of time?