Data Collection

Building a Multilingual Speech Corpus

DataForce supports a global audio hardware leader with high-quality data for fine-tuning its ASR engine.

The Challenge

Automatic speech recognition (ASR) systems convert spoken user commands into text, which is then processed by natural language processing systems. An effective ASR implementation has to account for sound and voice variations across genders, age groups, accents, and dialects, as well as the background noise of the environment where the system will be used. In this case, the client needed to collect training and test data from multiple demographic groups in English, Hindi, German, French, and Italian.

The Solution

DataForce collected voice data and background noise across several scenarios using our proprietary mobile app, DataForce Contribute. The app ensured that the audio files met all technical requirements, such as signal-to-noise ratio and sampling rate. Once all voice commands and ambient noise had been collected in parking, driving, and windows open/closed conditions, convolving the sound waves helped create data sets that simulated a real environment. With DataForce’s solution, the client developed and tested an efficient ASR engine capable of understanding voice commands in several languages across different scenarios.
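As a rough illustration of how clean voice commands and ambient noise can be combined into simulated recordings, here is a minimal Python sketch using NumPy. It is not DataForce's actual pipeline: the function names, the optional impulse response, and the additive mixing at a target signal-to-noise ratio are all assumptions made for this example. It also assumes both signals are mono and already share the same sampling rate.

```python
import numpy as np


def power(signal):
    """Mean power of a signal (mean of the squared samples)."""
    samples = np.asarray(signal, dtype=np.float64)
    return float(np.mean(samples ** 2))


def fit_length(noise, n_samples):
    """Loop or trim the noise so it has exactly n_samples samples."""
    repeats = int(np.ceil(n_samples / len(noise)))
    return np.tile(noise, repeats)[:n_samples]


def simulate_environment(speech, noise, snr_db, impulse_response=None):
    """Hypothetical helper: mix speech with noise at a target SNR in dB.

    If an impulse response is given (an assumption, for example one
    measured inside a car cabin), the speech is convolved with it first.
    """
    speech = np.asarray(speech, dtype=np.float64)
    noise = fit_length(np.asarray(noise, dtype=np.float64), len(speech))

    if impulse_response is not None:
        # Convolve, then trim back to the original speech length.
        speech = np.convolve(speech, impulse_response)[: len(speech)]

    noise_power = power(noise)
    if noise_power == 0.0:
        raise ValueError("noise is silent; cannot scale it to a target SNR")

    # Choose gain so that power(speech) / power(gain * noise) == 10^(snr_db / 10).
    gain = np.sqrt(power(speech) / (noise_power * 10 ** (snr_db / 10)))
    return speech + gain * noise


# Usage with synthetic stand-ins for real recordings (16 kHz assumed).
rate = 16_000
t = np.arange(rate) / rate
voice_command = 0.5 * np.sin(2 * np.pi * 220 * t)               # 1 s tone
cabin_noise = np.random.default_rng(0).normal(size=rate // 2)   # 0.5 s noise
noisy_command = simulate_environment(voice_command, cabin_noise, snr_db=10)
```

Repeating the call with different noise recordings and SNR values gives one noisy variant of each command per condition, so a single set of clean commands can cover several scenarios.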

DataForce has a global community of over 1,000,000 members and linguistic experts in over 250 languages. DataForce has its own platform but can also work with client or third-party tools. This way, your data is always under control.
