Getting voice interfaces for hardware right

In the world of hardware, it feels like everyone's talking about voice lately. It's the new hotness. I'm shooting the breeze about it on Twitter...

If you hang out with Avidan Ross from Root Ventures, he'll wax passionate about how Alexa (the "name" of the Amazon Echo) has changed the way he and his wife interact with technology in their home. And this is just the tip of the iceberg. Amazon has released new add-ons to extend the voice interface around your home, and startup Ivee is working on the same thing. It's a thing. Voice is hear (heh), the perfect way to interact with your smart devices.

But I haven't been totally convinced.

I'll admit, the Echo is neat. We have one at the office at Mindtribe. But for us, it's a toy. The voice interface is a bit laggy, the keywords somewhat unintuitive ("Alexa. ALEXA! Wikipedia donuts."), and the audio not quite room-filling. Somebody got it to play fart noises.

The Amazon Echo

On my Android phone, there's Google Now voice commands. Where to begin there? I use it to set my alarm clock at night ("Ok Google... ok---OK GOOGLE. WAKE ME UP IN 7 HOURS"). But it's basically useless when I need it most - usually when I don't have my hands free. I've tried to use it to change my music while biking in SF or driving around in the ol' Honda Civic. Both situations would be humorous failures - if they weren't quite so dangerous. In both cases, I have to get at my phone (either from my pocket or my car phone holder), get it unlocked, get it on the right screen (the home page), and then scream into my phone. Maybe it understands me, maybe it doesn't. In the interest of the safety of myself and others, I usually just abstain.

So I haven't been huge on the whole voice thing... until now. An unlikely hero has shown me the way: Apple's new Apple TV.

I get back from the gym. It's late, and I'm beat. I flop down on the couch and turn on the TV. I click a button on the Apple TV remote, and the interface pops right up. Whatever my roommates were watching last comes up, probably John Oliver off HBO Go. I tap the Siri button on the remote and say "play the next episode of Narcos on Netflix." I've barely shut my mouth before I'm immersed in the world of '80s cocaine-fueled violence and intrigue.

This thing just works. It's so easy. Unlike when I was using my PlayStation 3 as a media server, I don't flip through endless menus. I don't even have to pull out my phone, unlock it, and fiddle with a bunch of apps like when I'm using my Chromecast. I just flop down, utter a request, and I'm enjoying my content. It's like I've got a magical TV butler. The whole thing is oddly convenient.

The new Apple TV

Now, I'm no Apple fanboy. In real life and on this very blog I have ridiculed the original Apple TV for having a mind-numbingly poor interface. But they've really turned things around with this one.

This got me thinking - why am I so enamored with the Apple TV's voice interface, when I've never been a huge fan in other settings? Here's what I can figure out thus far:

It's accessible.

A touch of a button, an utterance, and I'm off to the races. That's hugely different from the several steps I have to take to access Google Now on my phone. In some ways I even prefer it to the always-on interface of the Echo. Not having to yell a keyword is nice.

It's natural.

I've yet to have the Apple TV misunderstand what I'm asking it to do. Many variations of how I request my content get interpreted just fine. Granted, the Apple TV has a very small set of possibilities to parse: I'm asking it to play content. That's not nearly as big a problem as the one facing general-purpose Siri or the whole-home assistant Echo. But this is a good thing. Constrain the use cases and solve the problem well; that's good strategy.

It's fast.

The Apple TV's voice recognition is seriously faster than Google's or Amazon's. I suspect Apple is running at least part of the recognition locally on the device itself, rather than sending audio clips to the cloud and waiting for commands to come back. Whatever the reason, it feels natural and responsive in a way that even a few seconds of lag totally ruins.

It's private.

The voice interface makes a lot of sense in the comfort of your home. Less so while biking through the streets of SF. I think we'll have to nail some form of subvocal recognition before voice interfaces become truly ubiquitous.

As with any tool, there's a time and a place for voice. But now I understand that, with the right setting and execution, voice is a very compelling interface option for smart hardware. I'm looking forward to seeing more good implementations of voice in the future.