Audio Tech? Voice Tech? What Gives?
If you’re just getting used to the concept of Voice Tech, it’s understandable that hearing “Audio Tech” might confuse you. For decades, technology only worked when you either typed a keyboard command or clicked a mouse. Suddenly having two new ways of interacting with technology, both arriving within the span of a couple of years, is enough to make your head spin. Never mind that the two terms are close enough in everyday use that there’s no easy logical line between them.
Voice and Audio, though, are indeed two separate kinds of technology, and both can play a role in async communication. To make matters more complicated, they often overlap. In this post, I’m digging into the subtle differences between Voice Tech and Audio Tech, and then talking a bit more about platforms that are technically both.
A question of intended use
If you control or interact with technology using aural or oral skills (both pronounced like “oral”), the easiest way to tell whether it’s Voice Tech or Audio Tech is to look at the primary way you’re meant to interact with it.
Voice: controlling technology by speaking
Voice Tech is the mouse of the 21st century. Instead of moving a cursor on a screen to click the thing you want to open, you say a command out loud to get the result you want. Perhaps the two most famous examples of Voice Tech are Amazon Alexa and Google Home. Instead of opening a search engine or clicking on Spotify, you can tell these devices to play music or answer a question.
Amazon Alexa and Google Home arguably popularized Voice Tech and kicked off the “voice revolution” we’re currently in. But I’d be remiss if I didn’t point out that the technology behind both products draws on screen readers and voice-based adaptive technologies built for people who are blind or have low vision. As people with visual disabilities know all too well, a mouse- or keyboard-based command system relies heavily on sight: if you can’t see the screen, you won’t necessarily know where to click. Adaptive technologies like these helped bring the screen to life through voice.
The new generation of Voice Tech, though, isn’t built to augment a vision-based system. It’s built to be controlled by voice first (and in many cases, by voice only). Adaptive technologies took vision-based controls and adapted them for voice, but devices like Alexa or Google Home don’t need that adaptation. They are designed around voice from the start, and that’s what makes them Voice Tech.
Audio: engaging with technology by listening to it
Audio Tech is a far cry from how we usually think of audio (namely, speakers). It refers specifically to products and platforms that you engage with primarily by listening. Two popular examples are Apple AirPods and the new audio-first social network Clubhouse, both of which center on listening as the main method of engagement.
The other side of Audio Tech is that, in many cases, you also have to speak to engage. Clubhouse is a good example: the network is built on listening to other people talk, which means someone has to speak. Still, the experience is designed around the listener, and speaking is simply a way to engage for the listener’s benefit. With AirPods, you can speak if you use them for calls, but their primary use is as wireless headphones, which makes them Audio Tech.
A Venn diagram of innovation
Many platforms blur the lines between Audio Tech and Voice Tech, and Yac is one of them. We call ourselves primarily Voice Tech because our main features, recording and sending voice notes, rely on speaking as the main engagement method. When you send a voice note, however, the recipient is having an Audio Tech experience, because their primary engagement method is listening.
Depending on how finicky you want to get, you could argue that most platforms in the Voice Tech and Audio Tech space are technically both, particularly human-to-human platforms, where one person speaks and another listens. For human-to-machine platforms, however, the distinction is critical. The future of Voice and Audio may lie less in new social networks and more in commanding apps to do work for you. Combined with the rise of no-code, we may very well see a future where building technology comes down to explaining what you want and letting the computer do the rest.