By Kamal Das
In recent years, artificial intelligence (AI) and generative AI (GenAI) technologies such as ChatGPT have advanced remarkably, transforming various industries and becoming an integral part of everyday life. However, there is growing concern about the biases embedded within these technologies.
The Problem of Bias in GenAI
AI and GenAI models often mirror the data they are trained on, which can sideline the unique needs and perspectives of minority groups. Studies published by reputed institutions such as Oxford University Press have shown that GenAI models, including various versions of ChatGPT, reflect cultural values prevalent in English-speaking and Protestant European countries. Another study, published in Nature, examined AI-generated content from models like ChatGPT and LLaMA and found substantial gender and racial biases. At the International Conference on Machine Learning, researchers noted that large-parameter models such as GenAI models reduce accuracy on minority samples. As a result, GenAI models risk perpetuating societal inequalities and marginalising non-dominant and minority views.
This bias weighs heavily on diverse countries like India, where a multitude of languages, cultures, and socioeconomic backgrounds coexist. India is home to over 1.4 billion people and a rich tapestry of over 400 languages. According to the 2011 census, 60 Indian languages are spoken by more than a million people each. Although Indian languages represent approximately 17% of the world’s languages, Indian-language content constitutes less than 0.1% of all online resources. This digital marginalisation highlights a pressing issue: the underrepresentation of India’s linguistic and cultural diversity online.
India’s Role in Pioneering Unbiased AI
India is uniquely positioned to take the lead in developing unbiased AI systems that truly reflect its diversity, which global GenAI models largely ignore. Initiatives like MeitY’s Bhashini and IIT Madras’ AI4Bharat are pivotal in this movement. These initiatives focus on collecting data in regional languages, ensuring that diverse voices are represented in AI training datasets.
Giving regional languages priority is about accessibility as much as representation. When these languages are integrated into AI systems, communities are empowered because they can interact with technology in their own languages instead of only European ones. This strategy helps keep technology accessible to all.
A Roadmap to Unbiased AI
India can adopt several strategies to harness the full potential of inclusive and unbiased AI and GenAI:
- Data Diversification: Expanding datasets to include various regional languages and dialects can help create more robust and fair AI systems. It is crucial to support efforts to collect further language data.
- Collaborative Partnerships: Encouraging collaborations between tech companies, academia, and local communities can facilitate a more inclusive approach to AI development. Diverse stakeholders will ensure that varied perspectives are included in the design of AI technologies.
- Policy Advocacy and Public Awareness: Government policies should promote inclusive and unbiased AI. It is also important to educate the public about the importance of unbiased AI and the role of language in shaping AI outcomes.
Encoding India’s Diversity into AI Will Help the Global South
India’s focus should address the unique challenges faced across the Global South, fostering inclusivity and accessibility. India’s multitude of languages has parallels in the more than 3,000 indigenous languages across Africa and the more than 450 in South America, and GenAI must serve such diverse populations effectively. India’s emphasis on multilingual AI capabilities is a step towards bridging this gap.
Furthermore, the Global South faces high illiteracy rates, a challenge that India aims to overcome through a “voice-first” approach. In addition, India is leveraging technologies that do not rely solely on smartphones but can also function on feature phones, widening accessibility. The voice-first and feature-phone-compatible focus ensures these technologies can engage users with minimal digital literacy, promoting broader societal inclusion.
Finally, India’s tradition of frugal innovation, coupled with its commitment to Digital Public Goods infrastructure, aims to deliver low-cost, scalable GenAI solutions. This ensures that GenAI technologies remain affordable, benefitting citizens across all socio-economic levels.
Conclusion
As AI and GenAI continue to shape the future, addressing inherent biases becomes essential. India has a unique opportunity to lead the way in creating an AI landscape that values diversity and inclusion. By focusing on regional languages and engaging marginalized communities, India can set a global standard for unbiased AI, fostering a more equitable digital world.
Source: India AI