Voice is my OS

This article is an edited excerpt from the book Voice Technology in Healthcare, published February 21, 2020. In this excerpt, I discuss why Voice is my OS.

I can recall coming home from school in the mid-1980s and being greeted by my excited parents. “We bought this new machine,” they said. “It’s the latest technology and everyone is talking about it!” I was extremely intrigued and immediately ran into our home office.

There sitting on the desk was a square, monochrome screen perched on top of a rectangular module, with a coiled cord attached to a keyboard. It was our very first personal computer.

I can remember sitting down at the desk, inserting a 5¼-inch floppy disk, turning on the computer, and waiting with great anticipation for MS-DOS (Microsoft Disk Operating System) to load. I was eventually greeted with a text prompt and I started typing some primitive commands on the keyboard. I was fascinated by the technology and I spent many hours typing commands and learning the skills to navigate the operating system.

Approximately 10 years later, Microsoft introduced its graphical operating system, Microsoft Windows. Windows, combined with the use of a mouse to control the operating system, changed the way we interacted with computers. We now had a graphical user interface and we could use the mouse to click, drag, and drop things across a screen. You may remember that one of the earliest programs to take advantage of this input method was a primitive “paint” application. I can recall opening the program and using the mouse to draw a variety of simple illustrations. It was incredible.

In the late 2000s, there was another technological breakthrough that revolutionized the way we interact with computers. In an iconic presentation, the late Steve Jobs, then CEO of Apple, stood up and introduced the first iPhone to the world. This device put an incredibly powerful computer in the palm of our hands.

This magical device was a huge leap in technology.

We could now control our own personal computer using just our fingers to simply pinch, tap, zoom, and swipe – all on a piece of glass.

We have witnessed the evolution of some incredible technology over the years, but nothing like what we are about to see, or more specifically, hear. What has been the underlying theme amongst the devices mentioned so far? In the case of the personal computer with MS-DOS, we used a keyboard as an input device to communicate with the computer. In the second example with MS-Windows, we used a mouse to control the cursor on a screen. In the third iPhone example, we used our fingers to interact with a touch-sensitive piece of glass. In all of these scenarios, a specific hardware device was required in order for us to interface with technology (i.e. keyboard, mouse, and touchscreen). Furthermore, in all three cases, it was necessary for us to learn or be taught how to use the device. In other words, we had to adapt the way that we communicated with computers in order for the technology to understand us.

Now, for the first time in our history, and due to advances in computing power, artificial intelligence, and natural language processing and understanding, technology has caught up to our most natural and instinctual form of communication. We no longer have to adapt to the technology; the technology is adapting to us – to our voices. We can now simply speak to computers, and computers can understand and respond intelligently to what we are saying. This is a fundamental paradigm shift.

We no longer are adapting to computers; computers are adapting to us.

The “voice first” era is upon us. This term, coined by Brian Roemmele, refers to the fact that we as human beings “are voice first before we are born.” We will speak to computers first, before typing or texting, for a variety of reasons that will be explored below. Voice-first and voice-enabled technologies are leading the way as the next frontier in human-computer interfaces. Our world is very quickly embracing the voice interface and our interactions are rapidly evolving to where each of us will use our voice first to talk to a computer. Voice is not only the future human-computer interface; its reach extends much further than that.

In my opinion, voice is becoming the next operating system, the “vOS”.

Why is Voice the Next Operating System?

Voice technology is rooted in the ability to understand and convey messages using verbal (as opposed to nonverbal, written, or visual) communication. This is a fundamental difference in the evolution of technology that makes voice so compelling as the next operating system.

There are five main reasons why voice will be the next operating system, and I have developed the following framework around the acronym V-O-I-C-E:

V – Versatile

O – Omnipresent

I – Innate

C – Contextual

E – Efficient

1. Voice is Versatile

Voice is the most versatile form of communication that we have at our disposal. With voice one can multitask, as a person’s attention can be focused elsewhere and yet that person can still be speaking or be aware of what is being said. In contrast, with written communication (e.g. handwriting, typing, or texting) the process of physically recording words requires the use of multiple senses (i.e. sight and touch) and increased cognitive load. One must think about the words to say and then record those words using some type of device and the sense of touch. The intended recipient must then read (i.e. visually observe) the words in order to be aware of the message being conveyed.

Even interpreting body language (i.e. nonverbal communication) and sign language – while extremely effective for communicating thoughts and feelings – requires the receiver of the message to look at the person conveying those messages. By definition this requires the person to use sight to receive the message.

In the case of voice, however, the receiver of the message does not have to be looking at the transmitter. The only requirement is that the person be within earshot of the voice and possess an intact sense of hearing. Because of this, the focus of the person receiving the message can be on anything they want. One can truly multitask while speaking or listening to someone’s voice.

In fact, we can do just about anything while we are speaking and listening.

We can hold verbal conversations with each other while doing a variety of activities, such as driving, cooking, or exercising. It is very difficult, or even dangerous, to attempt some of these activities while typing, clicking, tapping, or texting (i.e. written communication). In fact, there are laws that prohibit us from communicating with these written methods while driving; conversely, voice interactions are not only safe and efficient, but are also being encouraged by car manufacturers, which are incorporating voice assistants into their vehicles.

2. Voice is Omnipresent

Voice is always on and envelops us with the sound of the spoken word. Unlike any other type of communication that requires some type of action to initiate a dialogue (e.g. typing, texting, etc.), voice can be summoned with just a thought and the spoken word. This eliminates one layer of complexity when sharing thoughts and ideas. Consider even the mobile phone: you are required to lift your phone, make visual contact with it, and then do some type of maneuver to initiate the dialogue, whether that is tapping or swiping (unless you have summoned your virtual assistant by voice, in which case you are using verbal language). In the case of the written word, one clearly has to grasp a pen to begin capturing one’s thoughts. In the case of typing, one has to open an app or program to begin constructing the message.

In the case of voice, however, you simply speak or listen. Assuming that a microphone is somewhere in the vicinity, no other action is required to start speaking one’s ideas.

Voice simply “works”.

Similarly, your ears are always “on” and any sound transmitted from a speaker can be heard. This type of communication is the most seamless and frictionless type of communication available to us at the present time. The sound of voice surrounds us. It is always available and ready to be used.

3. Voice is Innate

Voice is the most natural way that we know how to communicate. Recall the example that began this chapter. When babies are born the first thing they do is use their voices – they cry. Furthermore, even before they are born they learn their mother’s voice. While there are certainly skills to learn when it comes to typing, texting, or even clicking, the ability to speak develops naturally (with some exceptions for speech challenges). Speaking to a computer requires little if any training at all.

If one can talk, one can communicate with a voice-first device.

We are very social members of the animal kingdom and computers are becoming participants in our social verbal interactions.

4. Voice is Contextual

While voice is innate, it is also extremely expressive. Whether you are sharing your most fond memories or describing your greatest nightmares, the emotion can be heard in your voice. The variability in a person’s voice is immense.

From whispering to shouting, talking to singing, each utterance out of a person’s mouth carries so much more information than just the words that are spoken.

Furthermore, we can naturally adjust the volume, pitch, frequency and other variables to the context and message we are conveying without consciously processing this information.

Not only does the voice reflect the context of the situation, voice also reflects the physical health of a person. When one becomes sick with laryngitis, one’s voice changes – a change that is evident to those that hear the voice, not by sight but by listening with the ears. In fact, research has shown that vocal biomarkers, the characteristics of our voices beyond the words that are spoken, are a sign of our emotional states and biological processes. We are entering an era where voice is likely to become a key vital sign for healthcare providers – a vital sign that is non-invasive and can be interpreted in real time from a distance.

5. Voice is Efficient

Voice saves us time, and time is one of the most valuable commodities that we each possess. In our busy lives, anything that saves us time is critical. Consider the fact that the average typing speed is approximately 40 words per minute.

Compare that to the average speaking speed of approximately 150 words per minute, and you have a mode of communication in voice that is three to four times faster than typing.
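The arithmetic behind that comparison can be sketched in a few lines of Python. This is only an illustration: the two rates are the approximate averages quoted above, and the 300-word message length is a hypothetical figure chosen to make the time difference concrete.

```python
# Rough comparison of speaking vs. typing, using the approximate averages quoted above.
TYPING_WPM = 40     # approximate average typing speed (words per minute)
SPEAKING_WPM = 150  # approximate average speaking speed (words per minute)


def minutes_to_convey(word_count: int, words_per_minute: float) -> float:
    """Return the minutes needed to produce word_count words at the given rate."""
    return word_count / words_per_minute


# How many times faster speaking is than typing at these rates.
speedup = SPEAKING_WPM / TYPING_WPM

# Hypothetical message length, chosen only for illustration.
message_words = 300
typing_minutes = minutes_to_convey(message_words, TYPING_WPM)
speaking_minutes = minutes_to_convey(message_words, SPEAKING_WPM)

print(f"Speaking is about {speedup:.2f}x faster than typing")
print(f"{message_words} words: {typing_minutes:.1f} min typed, "
      f"{speaking_minutes:.1f} min spoken")
```

At these rates the ratio works out to 3.75, which is where the "three to four times faster" figure comes from. Individual typing and speaking speeds vary widely, so the numbers should be read as a ballpark only.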

Even as we grow and develop more advanced communication skills including reading, writing, typing, and texting among others, we continue to choose to use our voices as the most efficient way to communicate with each other. If given the opportunity to communicate via the method of your choice with someone standing next to you, what would you choose? Would you speak to the person or send an email or text? Why?

Get ready for vOS

What makes voice the new operating system (vOS) is the fact that specific voice dialogs, applications, and use cases will be built on top of the basic voice functionality, similar to how mobile apps were built on top of the mobile operating systems (e.g. iOS and Android). For iOS, the interface is the mobile phone. For voice, the interface is the smart speaker/microphone (or any other hardware device that you can speak to) and the operating system is Amazon Alexa, Google Assistant, Apple Siri, Samsung Bixby, or any of the other voice assistants that are being developed. As we continue to witness the merging of computers and our innate ability to speak, the future of computing is getting louder right before our ears. Don’t be afraid to speak up!
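For readers who think in code, the layering described above, with voice applications sitting on top of an assistant platform, might be sketched roughly as follows. Everything here is hypothetical: the registry, the decorator, the intent names, and the keyword matcher are invented for illustration and do not correspond to the SDK of Alexa, Google Assistant, Siri, Bixby, or any other real assistant.

```python
from typing import Callable, Dict

# Hypothetical registry mapping intent names to handler functions.
# Real assistant platforms define their own SDKs; none of these names come from them.
HANDLERS: Dict[str, Callable[[str], str]] = {}


def voice_app(intent: str):
    """Register a function as the handler ("app") for a named intent."""
    def register(func: Callable[[str], str]) -> Callable[[str], str]:
        HANDLERS[intent] = func
        return func
    return register


@voice_app("medication_reminder")
def medication_reminder(utterance: str) -> str:
    return "Okay, I will remind you to take your medication."


@voice_app("book_appointment")
def book_appointment(utterance: str) -> str:
    return "Sure, let's find a time for your appointment."


def detect_intent(utterance: str) -> str:
    """Toy keyword matcher standing in for the platform's language understanding."""
    text = utterance.lower()
    if "remind" in text:
        return "medication_reminder"
    if "appointment" in text:
        return "book_appointment"
    return "unknown"


def handle(utterance: str) -> str:
    """Route a transcribed utterance to the registered app, as a platform layer would."""
    handler = HANDLERS.get(detect_intent(utterance))
    if handler is None:
        return "Sorry, I don't know how to help with that yet."
    return handler(utterance)


print(handle("Remind me to take my pills at 8"))
```

The point of the sketch is the separation of concerns: the platform layer listens and works out what was asked, while each voice application only supplies the response for its own use case, much as a mobile app relies on the phone's operating system for touch input.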

For more information on the book or the authors of Voice Technology in Healthcare, see the link here.