Content Moderation, Data Collection, Video Annotation
Amplifying Safety with Responsible AI in Gaming: Audio Collection & Annotation for Toxic Speech Detection
The Challenge
Our client, an international technology company, was looking for a partner to assist with a large conversational speech data collection project in the gaming domain. Toxic speech is a growing concern as hate speech, online harassment, and other verbal attacks are on the rise, especially in forums that children use. The goal was to collect a minimum of 100 hours of highly toxic speech from 100 avid gamers.
The Solution
Our proposed solution was to run a moderated collection with more than 100 participants playing a variety of highly competitive games in both multiplayer and one-on-one scenarios. We recorded groups of two to six participants playing highly interactive games that tend to have higher rates of toxic speech, such as competitive shooter and fighting games. We sourced avid gamers of various ages, genders, education levels, and geographic locations to ensure our data was thorough and diverse.
Working closely with the participants, we collected 100 hours of toxic speech in these live games. Our client was extremely pleased with the quality and authenticity of the data. DataForce then completed the training dataset for the client’s model by annotating 70 hours of speech according to the client’s guidelines.
This case study demonstrates the importance of accurate data collection for speech detection technologies. With the rise of online hate speech, it is imperative to have a diverse dataset to identify the nuances of harmful language to improve the detection of toxic speech in various contexts.
At DataForce, we pride ourselves on the ability to approach challenging projects with innovative solutions. Our success in this collection and annotation project illustrates our expertise in sourcing highly specialized participants, skillfully collecting nuanced data, and providing high-quality annotations. DataForce is proud to be a global partner in collaborations such as this, and we appreciate the opportunity to work on impactful, responsible AI solutions.
DataForce has a global community of over 1,000,000 members and linguistic experts in over 250 languages. DataForce has its own platform but can also use client or third-party tools, so your data is always under control.