Back in July, Apple introduced the "Apple Machine Learning Journal," a blog detailing Apple's work on machine learning, AI, and related topics. The blog is written entirely by Apple's engineers and gives them a way to share their progress and interact with other researchers and engineers.
Apple today published three new articles to the Machine Learning Journal, covering topics that are based on papers Apple will share this week at Interspeech 2017 in Stockholm, Sweden.
The first article may be the most interesting to casual readers, as it explores the deep learning technology behind the Siri voice improvements introduced in iOS 11. The other two cover how spoken dates, times, and other numbers are converted into properly formatted text on screen, and the work that goes into bringing Siri to additional languages.
Links to all three articles are below:
- Deep Learning for Siri's Voice: On-device Deep Mixture Density Networks for Hybrid Unit Selection Synthesis
- Inverse Text Normalization as a Labeling Problem
- Improving Neural Network Acoustic Models by Cross-bandwidth and Cross-lingual Initialization
Apple is notoriously secretive and has kept its research under wraps for many years, but over the past few months the company has become more open about sharing some of its machine learning advancements. The blog, along with published research papers, allows Apple engineers to participate in the wider AI community and may help the company retain employees who do not want to keep their work a secret.
Top Rated Comments
Too long of a title?
Bixby works offline! /s
You can publish that on your S-Journal of S-Cience with your S-Pen!
Firstly, you have to work out what the person has said. There are many different accents to take into account, as well as a huge number of local dialects, and both affect how words are pronounced and how sentences flow.
Once you know what the person has said, you then have to match it to an intent. Again, there are countless ways a person might phrase the same request, and you can't assume the speaker is a native speaker of the language, so things will sometimes be said in 'weird' ways.
Assuming a 'neutral' accent, completely accurate grammar, and an exact phrase tied to a specific function, it's feasible to carry all of that out entirely on the device. Right now, though, that on-device functionality is limited to detecting 'Hey Siri', with the complex processing offloaded to much more powerful servers.
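For readers curious what the "match it to an intent" step the commenter describes might look like, here is a minimal, purely illustrative sketch in Swift. The `Intent` enum, the phrase lists, and the `matchIntent` function are all invented for this example and are not Siri's actual implementation, which relies on trained models rather than hard-coded keywords.

```swift
import Foundation

// Toy illustration of the "match the utterance to an intent" step described
// above. The intent names and phrase lists are hypothetical; a real assistant
// uses trained acoustic and language models, not hard-coded keywords.
enum Intent {
    case setAlarm
    case weather
    case unknown
}

// Many different phrasings map onto the same intent.
let intentPhrases: [Intent: [String]] = [
    .setAlarm: ["wake me up", "set an alarm", "alarm for"],
    .weather: ["weather", "going to rain", "how hot is it"]
]

func matchIntent(_ utterance: String) -> Intent {
    let text = utterance.lowercased()
    for (intent, phrases) in intentPhrases {
        if phrases.contains(where: { text.contains($0) }) {
            return intent
        }
    }
    return .unknown
}

// Different wordings of the same request resolve to a single intent.
print(matchIntent("Wake me up at seven tomorrow"))      // setAlarm
print(matchIntent("Could you set an alarm for 7 AM?"))  // setAlarm
print(matchIntent("Is it going to rain this evening?")) // weather
print(matchIntent("Tell me a joke"))                    // unknown
```

Even this trivial matcher hints at the commenter's point: the phrase lists balloon quickly once accents, dialects, and non-native phrasings enter the picture, which is part of why the heavy lifting happens on Apple's servers rather than on the device.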