Text Annotation · Content Moderation · Data Collection · Language Engineering
Empowering Safer AI with IBM
“The DataForce team collaborated with the IBM team to fine-tune their annotation techniques as model development progressed. This constant process of refining through feedback was critical in selecting the training dataset in the most effective and efficient way.”
- Prasanna Sattigeri, Principal Research Scientist & Manager, IBM
The Challenge
IBM set out to build Granite Guardian 3.0 models that detect risks in the prompts and responses of large language models (LLMs). The goal was to obtain high-quality, diverse annotations of harmful and safe prompts so the model could learn to block harmful requests outright while classifying them along dimensions such as harmfulness, social bias, and ethical concerns. Identifying harmful content is inherently complex, however, particularly in nuanced or subtle cases. IBM therefore required a meticulously curated dataset of safe and unsafe prompts to train its LLM effectively and ensure high accuracy in detecting inappropriate inputs.
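To make the gating idea concrete, here is a minimal sketch of how a guard model sits in front of an LLM: each incoming prompt is screened and either blocked with one or more risk labels or passed through. The function names, category list, and threshold below are illustrative assumptions, not IBM's actual Granite Guardian API.

```python
# Hypothetical guard-model gate. `classify` stands in for the guard
# model and returns a mapping of risk category -> score in [0, 1].
RISK_CATEGORIES = ["harm", "social_bias", "jailbreaking", "violence", "profanity"]

def screen_prompt(prompt: str, classify) -> dict:
    """Screen a prompt and decide whether to pass it to the LLM."""
    scores = classify(prompt)
    # Flag every category whose score crosses an (assumed) 0.5 threshold.
    flagged = [c for c in RISK_CATEGORIES if scores.get(c, 0.0) >= 0.5]
    return {"allowed": not flagged, "flagged": flagged, "scores": scores}

# Toy stand-in classifier, for illustration only.
def keyword_classifier(prompt: str) -> dict:
    lowered = prompt.lower()
    return {
        "violence": 1.0 if "attack" in lowered else 0.0,
        "profanity": 0.0,
    }

result = screen_prompt("How do I attack a web server?", keyword_classifier)
# result["allowed"] is False; result["flagged"] == ["violence"]
```

In a real deployment the stand-in classifier would be replaced by the trained guard model, but the gating logic around it stays this simple.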
The Solution
DataForce partnered with IBM to deliver a comprehensive and diverse dataset tailored to the needs of Granite Guardian 3.0. This effort involved:
- Content Generation: DataForce generated a wide range of complicated scenarios from scratch across different risk dimensions. This strategy ensured the dataset provided IBM with edge cases to test the limits of the model’s detection capabilities.
- Diverse Annotation Process: Leveraging a global team of annotators with socio-cultural diversity, DataForce classified prompts into multiple relevant categories of safe or unsafe scenarios (e.g., “Jailbreaking,” “Violence,” “Profanity”), ensuring the dataset captured varied perspectives.
- Iterative QA and Testing: Using a phased approach, DataForce and IBM collaborated to test dataset groups, analyze results, and refine data for quality assurance. This process identified the most impactful data for training Granite Guardian 3.0.
- Custom Support: DataForce worked with IBM to refine annotation protocols, ensuring alignment with IBM’s guidelines and goals. This iterative feedback loop was critical in determining which risk dimensions needed the most attention after each model iteration.
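The annotation step above relies on a socio-culturally diverse pool of annotators labeling the same prompt. A common way to turn those individual judgments into one training label is a majority vote with an agreement score; the record format and tie-breaking rule here are assumptions for illustration, not DataForce's actual pipeline.

```python
from collections import Counter

def aggregate_labels(annotations: list[str]) -> tuple[str, float]:
    """Majority-vote one prompt's labels; return (label, agreement).

    `annotations` holds one category label per annotator, e.g.
    "Jailbreaking", "Violence", "Profanity", or "Safe". Ties break
    arbitrarily by Counter ordering in this simplified sketch.
    """
    counts = Counter(annotations)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(annotations)

labels = ["Jailbreaking", "Jailbreaking", "Safe"]
label, agreement = aggregate_labels(labels)
# label == "Jailbreaking"; agreement == 2/3
```

Low-agreement prompts are exactly the nuanced edge cases the iterative QA phase would route back for adjudication.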
Results
The collaboration significantly enhanced Granite Guardian 3.0’s ability to detect harmful prompts, including subtle and nuanced cases. The model also surpassed standard benchmarks for hallucination, harmful content, and social bias, setting a new standard for AI safety. It now identifies unsafe content with improved precision, reducing the risk of responding to inappropriate inputs and ensuring safer interactions. IBM recognized DataForce’s adaptability and expertise in delivering tailored datasets for complex AI challenges.
“Working with IBM on this groundbreaking project was a pleasure. Their focus on creating enterprise-grade, ethical AI models showcases their leadership in the field, and we are honored to have played a role in this achievement.”
- Kris Perez, Director, AI, DataForce
DataForce has a global community of over 1,000,000 members and linguistic experts covering more than 250 languages. DataForce runs on its own platform but can also work within client or third-party tools, keeping your data under your control at all times.