Researchers at Google DeepMind and the University of Oxford have created lip-reading software that could have far-reaching effects in mobile and accessibility technology.
Giving everyone the ability to comprehend one another might soon be much easier.
A team from the University of Oxford's Department of Computer Science has developed new lip-reading software, LipNet, which they claim is the most accurate of its kind to date by a wide margin.
The development of the software, which was supported in part by Alphabet's DeepMind AI program, has been detailed in a paper which reports LipNet has bested the existing top marks in lipreading tech accuracy by 13.8 percent. The previous best software and its 79.6 percent mark was already light-years ahead of the efforts of human lip-readers, who averaged 52.3 percent accuracy with the same test.
Counterintuitively, the breakthrough is in part thanks to a less granular approach to the task. Rather than analyzing a speaker's individual words, as every previous system had done, the Oxford team trained their model on larger constructions at the sentence level.
According to the paper, 'All existing [lip-reading approaches] perform only word classification, not sentence-level sequence prediction.... To the best of our knowledge, LipNet is the first lip-reading model to operate at sentence-level.'
In other words, the software became more effective as it moved closer to the way the human brain processes this type of visual data. Given video of a speaker, instead of homing in on each word as a distinct entity, its deep-learning predictive capabilities place each word within a larger context for greater understanding.
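The LipNet paper pairs this sentence-level approach with connectionist temporal classification (CTC), which lets the network emit a character for every video frame and then collapse those frame-level outputs into a sentence. As a rough illustration of that final collapsing step (not the authors' implementation), greedy CTC decoding merges repeated predictions and drops "blank" frames:

```python
def ctc_greedy_decode(frame_labels, blank="-"):
    """Collapse repeated frame-level predictions, then drop blanks.

    This is the greedy form of CTC decoding: consecutive duplicate
    labels are merged, and the special blank symbol is removed.
    """
    decoded = []
    prev = None
    for label in frame_labels:
        if label != prev and label != blank:
            decoded.append(label)
        prev = label
    return "".join(decoded)


# Ten video frames' worth of per-frame character predictions
# collapse into a single word:
print(ctc_greedy_decode(list("pp-llaacce")))  # → place
```

Because the alignment between frames and characters is learned rather than hand-specified, the model never needs word boundaries marked in the training data, which is part of what makes sentence-level prediction tractable.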
A member of the team, Oxford professor and Google DeepMind scientist Nando de Freitas, took to social media to give the general public more context than they might find in the cut-and-dried jargon of the paper.
First, he clarified that the software has not yet been put to task beyond the baseline test and needs further development:
Thanks to CIFAR's support for ambitious research. Note this is still restricted to a simple dataset, but a significant improvement. https://t.co/lHJFfpqyBa
— Nando de Freitas (@NandoDF) November 8, 2016
More hopefully, he hinted at the great potential LipNet has for practical use:
We're excited to use this research to build better human-computer interfaces and hearing aids. https://t.co/lHJFfpqyBa
— Nando de Freitas (@NandoDF) November 8, 2016
Most importantly, this heightened level of accuracy opens up new possibilities. For those who depend on sign language and, to a lesser degree, lip-reading, communication can be extremely challenging.
There are also clear benefits for people in general: Reading lips could potentially become something anyone with a smartphone could do, and voice command systems may become even more accurate with the application of software like LipNet.
Automated Lip Reading (ALR) is a software technology developed by speech recognition expert Frank Hubner. A video image of a person talking can be analysed by the software. The shapes made by the lips can be examined and then turned into sounds. The sounds are compared to a dictionary to create matches to the words being spoken.
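The pipeline described above — lip shapes analyzed into sounds, then compared against a dictionary — can be sketched roughly as follows. The viseme codes and lexicon here are hypothetical, invented purely for illustration; a real system would score many candidates probabilistically, since several different sounds produce identical lip shapes:

```python
# Hypothetical viseme (visual phoneme) codes and a toy lexicon.
# On the lips, "b", "m", and "p" are indistinguishable, so one
# viseme sequence can map to several candidate words.
VISEME_LEXICON = {
    ("B", "AH", "T"): ["bat", "mat", "pat"],
    ("F", "AY", "V"): ["five"],
}


def match_words(viseme_seq, lexicon=VISEME_LEXICON):
    """Return candidate words whose lip shapes match the observed sequence."""
    return lexicon.get(tuple(viseme_seq), [])


print(match_words(["B", "AH", "T"]))  # → ['bat', 'mat', 'pat']
```

The ambiguity shown here — one lip-shape sequence matching several words — is exactly why human lip-readers average only around 50 percent accuracy and why context-aware models like LipNet outperform word-by-word matching.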
The technology was used successfully to analyse silent home movie footage of Adolf Hitler taken by Eva Braun at their Bavarian retreat Berghof.
The video, with the recovered dialogue, was included in a documentary titled 'Hitler's Private World' (Revealed Studios, 2006).
Source: New Technology catches Hitler off guard