Amazon has thousands of employees around the world who listen to voice recordings captured in the homes of Amazon Echo owners, reports Bloomberg.
Voice recordings are captured when the Alexa wake word is spoken, and a subset of those recordings is then listened to, transcribed, annotated, and fed back into the software as part of Amazon's effort to help Alexa better respond to voice commands. Amazon runs Alexa improvement facilities in locations ranging from Boston to Costa Rica, India, and Romania.
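What Bloomberg describes amounts to a standard human-in-the-loop training pipeline: sample a small fraction of recordings, have reviewers correct the machine transcripts, and feed the corrected pairs back to the speech models. A rough sketch of that flow (purely illustrative; none of these names or numbers come from Amazon):

    import random
    from dataclasses import dataclass

    @dataclass
    class Utterance:
        audio: bytes          # clip captured after the wake word fires
        auto_transcript: str  # what the speech recognizer heard

    def sample_for_review(utterances, rate=0.001):
        """Select a small fraction of recordings for human annotation."""
        return [u for u in utterances if random.random() < rate]

    def annotate(utterance, human_transcript):
        """A reviewer corrects the machine transcript; the corrected pair
        becomes a new training example for the recognition models."""
        return {"audio": utterance.audio,
                "machine_transcript": utterance.auto_transcript,
                "human_transcript": human_transcript}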
Seven people familiar with Amazon's review process spoke to Bloomberg and revealed insider details about the program that may concern Echo users.
While much of the work is described as "mundane," employees have sometimes come across more private recordings, such as a woman singing off key in the shower or a child screaming for help. Amazon employees have internal chat rooms where they share files when they need help parsing a word or, more troubling, when an "amusing recording" is found.
Two workers told Bloomberg that they've heard recordings that are upsetting or potentially criminal. From the Bloomberg report:
Sometimes they hear recordings they find upsetting, or possibly criminal. Two of the workers said they picked up what they believe was a sexual assault. When something like that happens, they may share the experience in the internal chat room as a way of relieving stress. Amazon says it has procedures in place for workers to follow when they hear something distressing, but two Romania-based employees said that, after requesting guidance for such cases, they were told it wasn't Amazon's job to interfere.
Alexa users have the option to disable the use of their voice recordings for improving the service, but many may not know the option exists. Amazon also does not make it clear that actual people are listening to the recordings.
According to Bloomberg, recordings sent to the employees who work on Alexa don't include a user's full name or address, but each recording is associated with an account number, the user's first name, and the device's serial number.
In a statement to Bloomberg, Amazon said that an "extremely small" number of Alexa voice recordings are annotated and that there are measures in place to protect user identity.
We take the security and privacy of our customers' personal information seriously. We only annotate an extremely small sample of Alexa voice recordings in order [to] improve the customer experience. For example, this information helps us train our speech recognition and natural language understanding systems, so Alexa can better understand your requests, and ensure the service works well for everyone.
We have strict technical and operational safeguards, and have a zero tolerance policy for the abuse of our system. Employees do not have direct access to information that can identify the person or account as part of this workflow. All information is treated with high confidentiality and we use multi-factor authentication to restrict access, service encryption and audits of our control environment to protect it.
It is standard practice to use some recordings for product improvement. Apple has employees who listen to Siri queries to make sure the interpretation of a request lines up with what the person said. Those recordings, however, are stripped of identifiable information and stored for six months under a random identifier.
Google, too, has employees who can access audio snippets from Google Assistant for the purpose of improving the product, but like Apple, Google removes personally identifiable information and also distorts the audio.
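The de-identification approach described above boils down to keying stored snippets by a random identifier with a fixed retention window, rather than by anything tied to an account. A minimal sketch of that idea (hypothetical; not Apple's or Google's actual code):

    import uuid
    from datetime import datetime, timedelta

    RETENTION = timedelta(days=183)  # roughly six months

    def store_snippet(audio: bytes) -> dict:
        """Store audio under a random identifier with an expiry date,
        keeping no link back to a user account."""
        return {
            "id": uuid.uuid4().hex,  # random identifier, no account linkage
            "audio": audio,
            "expires": datetime.utcnow() + RETENTION,
        }

    def purge_expired(snippets):
        """Drop snippets that are past the retention window."""
        now = datetime.utcnow()
        return [s for s in snippets if s["expires"] > now]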
Amazon does not appear to be removing all personally identifiable information, and while the Echo is meant to collect audio only when a wake word is spoken, the employees who spoke to Bloomberg said they often hear audio files that appear to have been recorded without any wake word at all.
Alexa users concerned about the data Amazon collects and uses should review the Alexa app's privacy settings and disable the option that lets Amazon use their Echo recordings to improve the service. Additional details on how Amazon uses the voice recordings it collects can be found in the original Bloomberg article.
Update: Amazon has provided the following statement to MacRumors as clarification: "By default, Echo devices are designed to detect only your chosen wake word (Alexa, Amazon, Computer or Echo). The device detects the wake word by identifying acoustic patterns that match the wake word. No audio is stored or sent to the cloud unless the device detects the wake word (or Alexa is activated by pressing a button)."
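The gating Amazon describes here is the usual on-device pattern: audio lives only in a short local buffer, and nothing is stored or uploaded until a local detector matches the wake word's acoustic pattern. A minimal sketch, with the detector as a hypothetical placeholder rather than Amazon's implementation:

    from collections import deque

    BUFFER_FRAMES = 50  # short rolling buffer held only in device memory

    def wake_word_detected(frame) -> bool:
        """Placeholder: real devices run a small on-device acoustic model here."""
        return False

    def send_to_cloud(audio):
        """Placeholder for the upload that begins a cloud voice request."""
        pass

    def run(microphone_frames):
        buffer = deque(maxlen=BUFFER_FRAMES)
        for frame in microphone_frames:
            buffer.append(frame)             # audio is kept only locally...
            if wake_word_detected(frame):    # ...until the wake word matches,
                send_to_cloud(list(buffer))  # then audio streams to the cloud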