
Computational Linguistic Consulting

Inverse Text Normalization for Automatic Speech Recognition

 

The Challenge

Our client, Speechmatics, the world’s leading speech-to-text API scaleup, was searching for a partner to improve the consistency and readability of numerals in transcription for multiple languages.

Speechmatics reached out to DataForce for help ensuring that transcription output was written consistently in each language. Drawing on its linguistic expertise and development team, DataForce drastically reduced the time-to-market for delivering 15 languages. The challenge was accounting for language-specific conventions, since every language has its own guidelines for writing numbers, money amounts, dates, and so on. This is where skilled computational linguists were required.

“When Speechmatics first contacted me, it was clear that it wasn’t a standard project! It was a steep learning curve for all involved. I had to make sure we clearly defined the scope of the project for the computational linguists and client team members, providing Speechmatics with a result that would enhance the ASR output.”  
– Dorota Iskra, Senior Director of AI, DataForce

The Solution

A set of rules was created to recognize relevant patterns in text and convert them to a consistent written form. The rules differed depending on the domain; the crucial part was capturing various exceptions. Being a UK-based company, Speechmatics developed the English rules and test cases itself. DataForce provided the solution for an initial set of 12 languages and later for another three.

Working closely with Speechmatics, we sourced top-notch computational linguists who helped the team put together Inverse Text Normalization (ITN) rules for each language and implement them within the Pynini framework. The linguists also defined positive and negative test cases for the rules. The biggest challenge was handling the many language-specific exceptions that did not follow the patterns captured by the rules.
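To make the idea of positive and negative test cases concrete, here is a minimal sketch. It is not the project's code: the percent rule, the function names, and the sample sentences are all hypothetical, and a plain regular expression from Python's standard library stands in for a Pynini grammar. Positive cases are inputs the rule should rewrite; negative cases look similar but must pass through unchanged.

```python
import re

# Hypothetical rule (an assumption for illustration, not a project rule):
# rewrite "<digits> percent" as "<digits>%". The trailing word boundary
# keeps "percentage" and similar words untouched.
PERCENT_RULE = re.compile(r"\b(\d+(?:\.\d+)?) percent\b")


def apply_percent_rule(text: str) -> str:
    """Apply the hypothetical percent rule to a piece of transcript text."""
    return PERCENT_RULE.sub(r"\1%", text)


# Positive cases: (input, expected output). The rule should fire.
POSITIVE_CASES = [
    ("rates rose 5 percent", "rates rose 5%"),
    ("a 2.5 percent increase", "a 2.5% increase"),
]

# Negative cases: the rule must leave these exactly as they are.
NEGATIVE_CASES = [
    "a large percentage of revenue",
    "up by 10 percentage points",
    "what percent is that",
]


def run_cases() -> list:
    """Return the inputs whose output differs from what the case expects."""
    failures = []
    for source, expected in POSITIVE_CASES:
        if apply_percent_rule(source) != expected:
            failures.append(source)
    for source in NEGATIVE_CASES:
        if apply_percent_rule(source) != source:
            failures.append(source)
    return failures


if __name__ == "__main__":
    print(run_cases())  # an empty list means every case passed
```

Negative cases matter as much as positive ones here: a rule that is too eager will damage text that only resembles the target pattern.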

Collaborating with Speechmatics, we tested and modified these modules until all tests passed. We primarily used test sets from a number-dense financial domain to verify the quality of the work. As a result, the ASR output was formatted the same way as text would be in books or subtitles.

For example, a money amount that previously appeared as “20000 dollars” now appears as “$20,000,” which is much easier to read. This improvement helped speed up Speechmatics’ customer transcription workflows and resulted in better human-readable captions.
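The money example can be sketched in a few lines. This is an illustration only, assuming US English conventions (dollar sign first, comma as thousands separator); the pattern and the function name `format_dollars` are hypothetical, and a standard-library regular expression is used here in place of the Pynini rules built in the project. Other languages would need their own symbol placement and separators.

```python
import re

# Hypothetical pattern: a run of digits followed by the word "dollars".
MONEY_PATTERN = re.compile(r"\b(\d+) dollars\b")


def format_dollars(text: str) -> str:
    """Rewrite '<digits> dollars' as '$<digits with thousands separators>'."""

    def rewrite(match: re.Match) -> str:
        amount = int(match.group(1))
        # The ',' format option inserts a comma between groups of three digits.
        return f"${amount:,}"

    return MONEY_PATTERN.sub(rewrite, text)


if __name__ == "__main__":
    print(format_dollars("20000 dollars"))  # $20,000
```

Text that does not match the pattern, such as “dollars and cents”, is returned unchanged.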

Both teams met weekly to ensure close collaboration, quick resolution of issues, and clarification of any exceptions encountered as the work progressed.

“Working with DataForce gave us a scalable way to onboard experts in linguistics in a very tight timeframe, and they delivered excellent results across multiple languages. They worked closely with our engineering teams to deliver code and created test data to validate the final result. The team was responsive to any issues we had along the way and clearly communicated the progress. This project simply would not have been possible without their hard work and determination. The final improvements in our transcription have been greatly appreciated by our customers, speeding up editing workflows and improving readability, especially for long numbers and money.”  
– Stuart Wood, Product Manager, Speechmatics

Voice Audio

DataForce has a global community of over 1,000,000 members and linguistic experts in over 250 languages. DataForce is its own platform but can also use client or third-party tools. This way, your data is always under control.

Request a consultation.