The ways we interact with technology are continually evolving. We all remember how typing DOS commands on a keyboard gave way to the WYSIWYG simplicity of mouse-navigated Windows, and today, there’s a growing use of touch screens. The next big evolutionary step in user interfaces – and it’s big – includes voice commands, facial recognition technologies, and artificial intelligence (AI).
AI-enabled machines will use these interfaces to anticipate, predict, and execute a multitude of tasks, speeding up processes and actually minimizing the time users spend on the interface itself.
While this points to a very promising future, recently the brakes have been applied to many AI-based projects. How come? Because the collected data is no longer necessarily clean, accurate, or reliable.
It was accumulated in a pre-COVID-19 world, and was based on assumptions drawn from a pre-pandemic marketplace.
So, like an architect discovering that all the measurements on their project’s blueprint are incorrect, a number of AI initiatives are going back to the drawing board.
Let’s take a closer look at the challenge.
The goal is to make accessing information and services easier for everyone.
To this end, face recognition technology has grown exponentially, now being widely deployed for airport check-ins, as a security feature for unlocking our phones and tablets, and for granting access to restricted areas.
Voice-enabled experiences are also becoming more common. We’re seeing voice-activated smart kiosks in our fast food restaurants, for example, where you order your fries using only your voice. There, it’s voice-enabled chatbots, not workers busy fulfilling orders, that now offer customer support and all those upsells to supersize.
These are all great ways to access information. But just as we’ve begun to assimilate them into our daily lives, it turns out these technologies may need to change dramatically, because they were developed and trained for a pre-pandemic world.
Voice technologies were developed under the assumption that the customer would provide reasonably clear enunciation.
AI models that interpret the vocal data weren’t trained to handle commands muffled by a face mask, because they primarily work by comparing received sounds against speech corpora whose transcriptions are tied to clear voice samples.
This means that in a pandemic world, a successful voice-based customer experience just got a lot harder to deliver.
Similarly, because a face mask covers most of a person’s visage, computer vision models are now receiving information only from the upper half of the customer’s face, a data scenario they weren’t expected to have to handle.
In fact, a study by the US National Institute of Standards and Technology (NIST) has found that facial recognition algorithms developed before the emergence of the COVID-19 pandemic have “great difficulty” in accurately identifying people.
The NIST study reveals: “Even the best of the 89 commercial facial recognition algorithms tested had error rates between 5% and 50% in matching digitally applied face masks with photos of the same person without a mask.”
As a result, the customer is left with an unpleasant user experience that requires them to revert to “manual” interfaces, significantly hindering the identification process.
AI models use data to train, make assumptions, and then provide a response to the user. This data constitutes the dataset: the entire batch of data against which the current operation is compared.
Up until recently, AI models had been trained with data that belonged to a non-pandemic world, where faces were fully visible and vocalizations weren’t obstructed by masks.
The COVID-19 pandemic caught our AI platforms off guard, and AI will need time to adapt to the new environment. For voice experiences and face recognition to stay relevant, datasets need to adjust to today’s reality.
A quick hack to mitigate problematic words in a voice-powered application is to use the data collected by the application itself to identify the words that get incorrectly transcribed, and then to let the application make assumptions that correct the transcription in order to deliver the intended meaning to the user.
For example, a voice-powered application in a fast food environment transcribing “May I get some orange shoes?” should take into account that what the user very likely meant is “orange juice” and either repair the model’s error at the application level or ask the end user for confirmation.
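As a rough illustration of that application-level repair, here is a minimal Python sketch. Everything in it is an assumption made for the example: the menu list, the table of known mis-transcriptions (imagined as mined from the application’s own logs), the `repair_item` function name, and the 0.6 similarity cutoff. It also assumes the item phrase (“orange shoes”) has already been extracted from the full sentence by an earlier step. It uses fuzzy string matching from Python’s standard `difflib` module, which is one simple option among many and not necessarily what a production system would use.

```python
from difflib import get_close_matches

# Hypothetical kiosk vocabulary (illustrative only).
MENU_ITEMS = ["orange juice", "apple juice", "french fries",
              "cheeseburger", "milkshake"]

# Hypothetical table of mis-transcriptions already spotted in the
# application's own logs, mapped to what customers actually meant.
KNOWN_FIXES = {"orange shoes": "orange juice"}


def repair_item(heard, cutoff=0.6):
    """Map a transcribed item phrase to a menu item.

    Returns (item, needs_confirmation). item is None when nothing
    on the menu is similar enough to the transcription.
    """
    phrase = heard.lower().strip()

    # 1. The transcription already matches the menu: nothing to repair.
    if phrase in MENU_ITEMS:
        return phrase, False

    # 2. A known mis-transcription: repair it silently.
    if phrase in KNOWN_FIXES:
        return KNOWN_FIXES[phrase], False

    # 3. Otherwise pick the closest menu item, but ask the user to confirm.
    matches = get_close_matches(phrase, MENU_ITEMS, n=1, cutoff=cutoff)
    if matches:
        return matches[0], True

    # 4. No plausible match: the application should ask the user to repeat.
    return None, True


if __name__ == "__main__":
    for heard in ["orange shoes", "french flies", "xyzzy"]:
        print(heard, "->", repair_item(heard))
```

The design choice worth noting is the second return value: a correction that has been seen before is applied directly, while a fuzzy guess is only offered back to the user for confirmation, which mirrors the two options described above.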
Ultimately, developers will need to re-engineer the application to enlarge the dataset and to collect voice samples that actually mimic real-life scenarios, which at this point will need to include muffled speech in a wide variety of environments.
Right now, certain workarounds are being adopted to avoid relying solely on face recognition – for example, Apple iPhones now disable the Face ID option when a face mask is detected.
“If the [facial recognition] companies aren’t looking at this, aren’t taking it seriously, I don’t foresee them being around much longer,” said Shaun Moore, CEO of Trueface, which creates facial recognition technology that’s used by the U.S. Air Force.
Results are already showing. Computer vision technology is now used to recognize people wearing masks in public places or before they enter a store, which shows that the technology can be put to use for our own safety as well.
In order to overcome the challenge set by the pandemic, data scientists are collecting and analyzing new and relevant data to successfully adapt their models to properly serve their end customers.
While in the past the collection of muffled-speech voice data was limited to rare and specific cases, it is now becoming a priority. The same is true for face recognition datasets, which are expanding to include images of people wearing face masks, so that models work mainly with the area around the eyes.
It will take time, but companies are moving faster to adapt to this new reality. As the amount of data collected grows, AI models will become smarter, have less difficulty serving end customers, and make technology easily accessible again.
Sergio Bruccoleri is Lead Technology Architect at Pactera EDGE.
Reblogged from www.clickz.com