Microsoft recently claimed that its speech recognition is as good as a human transcriber. The company announced that its speech recognition system has achieved a word error rate of 5.1%, the lowest reported to date. This is equivalent to the accuracy of professional human transcribers.

In 2016, Microsoft's Speech and Dialog Research Group claimed that 5.9% was the average human error rate, but other researchers suggested that a figure of 5.1% was closer to the mark. The new system matches this more conservative human benchmark.
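
Both figures are word error rates (WER): the number of substituted, deleted, and inserted words in a system's transcript, divided by the number of words in the reference transcript. As a simplified illustration, and not Microsoft's actual scoring pipeline, the following Python sketch computes WER with a word-level edit distance. The function name `word_error_rate` and the sample sentences are hypothetical, and real benchmark scoring also normalizes the text (punctuation, filler words, contractions) before comparing.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference length.

    Simplified sketch: lowercases and splits on whitespace only; real
    scoring tools apply much more text normalization first.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # dist[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or exact match)
            )
    return dist[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    reference = "i think we should talk about the weather today"
    hypothesis = "i think we should walk about weather today"
    print(f"WER = {word_error_rate(reference, hypothesis):.1%}")  # prints: WER = 22.2%
```

Here one substitution ("talk" heard as "walk") and one deletion ("the") against a nine-word reference give 2/9, about 22.2%. By the same measure, a 5.1% error rate corresponds to roughly five word errors per 100 reference words.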

Microsoft’s Speech Recognition Is as Good as a Human

The company had aimed to bring its error rate down to the level achieved by teams of human transcribers, and it has now done so. Speech recognition is considered a basic building block of strong artificial intelligence. Xuedong Huang, Microsoft's chief speech scientist, said that it is essential to go well beyond simple transcription and to understand what is actually being said.

For about 20 years, the speech research community has used a collection of recorded telephone conversations known as Switchboard to benchmark the accuracy of speech recognition systems. The task, which can be performed by either humans or machines, is to transcribe conversations between strangers on a range of topics.

The team reduced the system's error rate by 12% compared with last year's result by making a series of improvements to its neural-network-based acoustic and language models. The vocabulary of the language model was increased to 165,000 words. The researchers also introduced dialog session-based long short-term memory (LSTM) language modeling: instead of treating each utterance in isolation, the language model uses the entire conversation so far as history when predicting the words that come next.

The team believes that much more remains to be done in speech recognition, because the latest result does not address harder problems such as recognizing speech in noisy environments. Systems must also be taught to understand the meaning of the words being spoken. The Microsoft team says it is eager for the next breakthrough.
