© 2024 Connecticut Public


Voice Recognition Software Finally Beats Humans At Typing, Study Finds

Computers have already beaten us at chess, Jeopardy and Go, the ancient board game from Asia. And now, in the raging war with machines, human beings have lost yet another battle — over typing.

Turns out voice recognition software has improved to the point where it is significantly faster and more accurate at producing text on a mobile device than we are at typing on its keyboard. That's according to a new study by Stanford University, the University of Washington and Baidu, the Chinese Internet giant. The study ran tests in English and Mandarin Chinese.

Baidu chief scientist Andrew Ng says this should not feel like defeat. "Humanity was never designed to communicate by using our fingers to poke at a tiny little keyboard on a mobile phone. Speech has always been a much more natural way for humans to communicate with each other," he says.

Researchers set up a competition, pitting a Baidu program called Deep Speech 2 against 32 humans, ages 19 to 32. The humans took turns saying and then typing short phrases into an iPhone — like "buckle up for safety" and "wear a crown with many jewels" and "this person is a disaster." They found the voice recognition software was three times faster.

Stanford computer scientist James Landay did not expect that. "The surprise for me was that it was that much better: three times faster! You would think everyone would be flocking to use it if they knew how much better it actually was."

Voice recognition still gets a bad rap. That could be because of how people use it. Apple's Siri, the beloved and befuddled personal assistant, has a hard time answering basic questions.

The Stanford University-University of Washington-Baidu team didn't test query skills. They zoomed in on voice recognition software's ability to type the spoken words. In English, they found the software's error rate was 20.4 percent lower than that of humans typing on a keyboard, and in Mandarin Chinese, it was 63.4 percent lower.

Landay hopes these findings encourage people to revisit the idea of talking to their phone.

"People probably play with Siri and find, oh, it didn't give them the right answer. So they don't think to use speech as a way to do their text messaging or their email or whatnot," he says. "Using speech for those things is now working really well."

Back in the 1990s, researchers found voice recognition tools were far less accurate than keyboard typing. Slang and ambient noise in a room tripped up the software.

In the last few years, that's changed for a few reasons: Just as smartphone cameras with more megapixels can see us better, the built-in microphones can hear us better. Supercomputers are churning through data more effectively in a process called "deep learning."

And there's more training data to vacuum in and learn from. For example, Ng says, Baidu has five years' worth of audio — unique recordings of people speaking that can play nonstop from now until 2021.

Last year, 65 percent of smartphone owners in the U.S. used voice assistants, according to the 2016 Internet Trends Report, a popular annual overview by tech investor Mary Meeker.

Many tech companies are betting that now is the inflection point and are hiring experts in the field of "natural language processing." Google and Amazon are inviting developers to work on voice-driven products.

It's easy to see how talking to your device would be far better than typing, say, when you're driving.

Baidu's Ng imagines another scenario. He does not have children yet. But, he says, he looks forward to the day when his future grandchild comes home and asks, "Is it really true that when you were young, if you came home and you said something to your microwave oven — did it really just sit there and ignore you? That's just so rude of the microwave."

His co-author Landay reins him back and notes there are many moments — in a meeting, in bed with your partner sleeping — when typing still makes more sense than talking to one's devices.

Copyright 2021 NPR. To see more, visit https://www.npr.org.

Aarti Shahani is a correspondent for NPR. Based in Silicon Valley, she covers the biggest companies on earth. She is also an author. Her first book, Here We Are: American Dreams, American Nightmares (out Oct. 1, 2019), is about the extreme ups and downs her family encountered as immigrants in the U.S. Before journalism, Shahani was a community organizer in her native New York City, helping prisoners and families facing deportation. Even if it looks like she keeps changing careers, she's always doing the same thing: telling stories that matter.
