Technology and language are strange and occasionally wonderful bedfellows. The same field that gave us 802.11b to describe a common household wireless standard is also capable of whimsical and clever trademarks. (Quick: when I say “blackberry,” do you envision a smartphone or an old-fashioned fruit?)
One of the best flights of fancy that has come from the wireless revolution is the Cloud. Loosely speaking, the cloud is the Internet—all of those computers out there that connect us in the World Wide Web. But cloud computing also refers to applications and sometimes data that reside “out there” rather than on your own computer. It’s rather soothing to think about all of those bits of code bouncing around the stratosphere on a cumulus mattress rather than residing in earthbound bunkers of supercomputers.
I was charmed, therefore, when reading up on Yap, Inc., to learn about the Speech Cloud. Yap provides software-processed (rather than human-processed) speech recognition services, largely via partners such as Microsoft, Sprint, and other phone carriers. Voice mails, conference calls, and other bits of dictation are transported to Yap’s Speech Cloud, rendered into text by software, and returned to the customer’s computer or device.
Transcribing Conversations Accurately
According to the company’s web site, “Yap pioneered the world’s first high accuracy, automated speech recognition platform for ‘long duration’ dialogues. Long duration dialogues are conversations and audio content ranging from 10 seconds to several hours.”
As a writer, I’m involved in a lot of “long duration dialogues,” i.e. interviews. And as someone who came of age right after shorthand became a dead language, I have long struggled with the technology of reporting. Recorders are great, but recordings have to be transcribed. Traditional speech recognition software like Dragon NaturallySpeaking is said to be able to handle one person’s voice after training, but it is not meant for transcribing multiple voices. Computer tablets with handwriting recognition are not currently ready for prime time.
So it’s a tablet (of paper) and pen for me, with plenty of ad hoc abbreviations that I have to quickly translate for myself after each interview.
Understandably, then, I was more than a little intrigued with the possibility that Yap CEO Igor Jablakov would be able to present me with a written transcript of our recent phone conversation. But that was not to be. It’s not that the software is incapable of handling interviews, he assured me, but the demand isn’t currently great enough for Yap or its partners to develop interview transcription as a direct-to-consumer product.
Jablakov explained that Yap works largely through partners—like the aforementioned Microsoft and Sprint, and through companies like RingCentral and Vocalocity that provide assorted telephony services to businesses. Those partners can offer their enterprise customers call mining, for example, in which Yap software produces transcripts of customer service calls that can be examined for market research purposes.
Yap touts the accuracy of these transcripts compared with those of human transcriptionists, as well as the added value of privacy protection when transcriptions are created by software rather than people.
Founded in 2006, Yap, which was recently named one of North Carolina’s 25 Companies to Watch, is staffed by a team that previously worked on Dragon NaturallySpeaking and ViaVoice, IBM’s entry into consumer speech recognition software. Jablakov, who headed a portion of IBM’s speech group in Research Triangle Park, got the startup bug after being a mentor in IBM’s Extreme Blue Program, an internship program in which students are involved in technology and software development.
“The company was founded on a love of cutting away annoyances in our lives,” Jablakov says, and those include sending text messages, receiving long voice mails, and having to check multiple accounts for different types of messages (voice, text, and email).
This March, Yap achieved national recognition when Microsoft announced a partnership under which Yap provides code for Microsoft’s Talk-to-Text mobile application. Users can speak their text messages and emails, saving time and keystrokes.
The Microsoft announcement “caught a lot of people’s attention,” says Jablakov. “It said we have arrived, and our capabilities are seen as world class.”
This year also saw Yap’s foray into the iPhone applications world in its first direct-to-consumer offering. The free Yap app allows you to send incoming voice mails to the Speech Cloud and then receive them as text messages. If you’re the kind of person who receives dozens of voice mails a day, this could be a real timesaver.
For Yap, offering a free app allows the company to be privy to what end users want; through customer reviews and comments in the App Store and on Facebook and Twitter, the company can learn what features to tweak, whether in its own application or in the code that’s shipped through partners.
Back to the Speech Cloud. How it works is something of a trade secret, Jablakov told me, but he did indicate that the multitudes of voices that accumulate there all become part of a “stone soup” that helps contribute to the software’s accuracy. If you’ve ever used Dragon or ViaVoice, you’ll remember having to “train” the software to understand your unique voice by recording selected text and assiduously correcting its mistakes.
Yap, on the other hand, is “intentionally meant to be speaker-independent,” says Jablakov. You might get voice mails from the same person every day, but also from random people who will never call you again. But you need a high degree of accuracy from all of those calls, he says.
So is it accurate? I ran a very unscientific test with a Science in the Triangle colleague who has an iPhone. My first short message, sent from a cellphone with less-than-stellar clarity, was as follows: “Hi Tessa, it’s Lisa. Um, I just wanted to let you know that I’ll be in Durham on November 12 through 18 and I’d like to try to set up a meeting with you and Chris. I’ll send you an email to confirm this as well. Thanks. Bye bye.” The screenshot, at right, shows the transcription Tessa received. You’ll see that it bobbled her name but impressively got the rest of the message right and deleted my “um.” Not bad, especially because I have a problem with the letter “S.” Because of that, I subsequently left a more detailed message that I intentionally larded with “S”s. Let’s just say that my contributions to the Speech Cloud were reminiscent of my occasionally frustrating efforts to train the dragon.
My inconclusive experiment shouldn’t take away from the impressive challenge of creating accuracy from a virtual tower of Babel. And I’m still holding out for the possibility that one day Yap software will be able to transcribe reporters’ interviews.