In a new entry in its Machine Learning Journal, Apple has detailed how Siri on the HomePod is designed to work in challenging usage scenarios: during loud music playback, when the user is far away from the HomePod, or when other sound sources are active in the room, such as a TV or household appliances.
An overview of the task:
The typical audio environment for HomePod has many challenges — echo, reverberation, and noise. Unlike Siri on iPhone, which operates close to the user’s mouth, Siri on HomePod must work well in a far-field setting. Users want to invoke Siri from many locations, like the couch or the kitchen, without regard to where HomePod sits. A complete online system, which addresses all of the environmental issues that HomePod can experience, requires a tight integration of various multichannel signal processing technologies.
To accomplish this, Apple says its audio software engineering and Siri speech teams developed a multichannel signal processing system for the HomePod that uses machine learning algorithms to remove echo and background noise and to separate simultaneous sound sources to eliminate interfering speech.
Apple says the system uses the HomePod's six microphones and is powered continuously by its Apple A8 chip, including when the HomePod is run in its lowest power state to save energy. The multichannel filtering constantly adapts to changing noise conditions and moving talkers, according to the journal entry.
Apple goes on to provide a very technical overview of how the HomePod mitigates echo, reverberation, and noise, which we've put into layman's terms:
- Echo Cancellation: Since the speakers are close to the microphones on the HomePod, music playback can be significantly louder than a user's "Hey Siri" voice command at the microphone positions, especially when the user is far away from the HomePod. To combat the resulting echo, Siri on HomePod implements a multichannel echo cancellation algorithm.
- Reverberation Removal: As a user moves farther away from the HomePod, multiple reflections from the room create reverberation tails that decrease the quality and intelligibility of the "Hey Siri" voice command. To combat this, Siri on the HomePod continuously monitors the room characteristics and removes the late reverberation while preserving the direct and early reflection components in the microphone signals.
- Noise Reduction: Far-field speech is typically contaminated by noise from home appliances, HVAC systems, outdoor sounds entering through windows, and so forth. To combat this, the HomePod uses state-of-the-art speech enhancement methods that create a fixed filter for every utterance.
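To make the echo cancellation idea more concrete, here is a minimal single-channel sketch in Python. It uses a textbook normalized least-mean-squares (NLMS) adaptive filter, which is a stand-in chosen for illustration and not Apple's multichannel algorithm. The function names, the four-tap echo path, and the random "music" signal are all assumptions made for the example.

```python
import random


def nlms_echo_cancel(playback, mic, taps=8, mu=0.5, eps=1e-8):
    """Remove an adaptively estimated echo of `playback` from `mic`.

    Textbook NLMS, single channel. Illustrative only, not Apple's method.
    """
    weights = [0.0] * taps   # running estimate of the speaker-to-mic path
    history = [0.0] * taps   # most recent playback samples, newest first
    residual = []
    for x, d in zip(playback, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - echo_estimate                      # mic signal with estimated echo removed
        power = sum(h * h for h in history) + eps  # normalizes the step size
        step = mu * e / power
        weights = [w + step * h for w, h in zip(weights, history)]
        residual.append(e)
    return residual, weights


def simulate_room(n=4000, seed=0):
    """Build a toy playback signal and the echo it leaves at the microphone."""
    rng = random.Random(seed)
    playback = [rng.uniform(-1.0, 1.0) for _ in range(n)]  # stand-in for music
    echo_path = [0.6, 0.3, -0.2, 0.1]                      # hypothetical room response
    mic = []
    for i in range(n):
        echo = sum(
            echo_path[k] * playback[i - k]
            for k in range(len(echo_path))
            if i - k >= 0
        )
        mic.append(echo)  # no voice or noise added, to keep the demo simple
    return playback, mic, echo_path


if __name__ == "__main__":
    playback, mic, echo_path = simulate_room()
    residual, weights = nlms_echo_cancel(playback, mic)
    before = sum(v * v for v in mic[-500:])
    after = sum(v * v for v in residual[-500:])
    print("echo energy before:", before)
    print("echo energy after: ", after)
    print("learned path:", [round(w, 3) for w in weights[:4]])
```

Because the filter knows what the speaker is playing, it can learn how that signal arrives at the microphone and subtract it, leaving whatever else was picked up. A real far-field system also has to cope with a user speaking over the music, several microphones, and a changing room, none of which this sketch attempts.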
Apple says it tested the HomePod's multichannel signal processing system in several acoustic conditions, including music and podcast playback at different levels, continuous background noise such as conversation and rain, and noises from household appliances such as a vacuum cleaner, hairdryer, and microwave.
During its testing, Apple varied the locations of the HomePod and its test subjects to cover different use cases. For example, in living room or kitchen environments, the HomePod was placed against the wall and in the middle of the room.
Apple's article concludes with a summary of Siri performance metrics on the HomePod, with graphs showing that Apple's multichannel signal processing system led to improved accuracy and fewer errors. Those interested in learning more can read the full entry on Apple's Machine Learning Journal.