Grassroots Revolution: The Push to Diversify AI Voices

The explosion of voice AI technology has brought us powerful new tools like Apple’s Siri and OpenAI’s ChatGPT-powered assistants. Yet the voices that guide us through this brave new world often feel monotonously familiar: overwhelmingly English, predominantly American, and a poor reflection of the rich diversity of languages and dialects across the globe.

But an ambitious grassroots movement, Mozilla’s Common Voice project, is working to change that. Through a volunteer-led effort, this initiative aims to collect voice data in a range of languages, accents, and tones, making future AI voice systems more inclusive and representative.

The Challenge of AI’s Linguistic Bias

The problem lies in the data. AI systems are trained on large datasets, but most of this information comes from English-language sources reflecting Anglo-American culture. This focus risks perpetuating linguistic colonialism, marginalising other languages, and eroding cultural diversity.

EM Lewis-Jong, a director for Common Voice, warns of the dangers: “Rather than creating truly multilingual models, we risk forcing everyone to operate in dominant languages like English or French.”

Mozilla’s Common Voice project offers an alternative. Since 2017, it has gathered over 31,000 hours of voice data across 180 languages, including underrepresented ones like Marathi, Circassian, and Zaza. The project’s open-source ethos ensures the data can be freely accessed and used by developers worldwide, providing a transparent foundation for more inclusive AI systems.

From Local Voices to Global Impact

The backbone of Common Voice is its growing army of volunteers—now more than 900,000 strong. These contributors record phrases, validate data, and even raise awareness within their communities. Among them is Bülent Özden, a Turkish researcher who has dedicated months to collecting and refining data for Turkish and minority languages in Turkey.

“For me, it’s about preserving cultures,” says Özden. “Low-resource languages are at risk of disappearing, and this project helps ensure they remain part of our digital future.”

Yet the road to inclusivity is uneven. While English boasts over 3,500 hours of recorded data, languages such as Korean and Punjabi have only a fraction of that. The disparity arises because data collection is community-driven and relies on grassroots efforts in regions where tech resources may be scarce.

Building AI for Real People

For developers like Karolina Sjöberg, founder of Mabel AI, the value of Common Voice lies in its diversity. Her company used the dataset to develop translation tools for Ukrainian refugees in Sweden, tailoring AI to support people in vulnerable circumstances.

“Most voice data comes from people reading books or scripts, which doesn’t reflect how people actually speak,” Sjöberg explains. “Common Voice allows us to build tools that sound natural, even in moments of distress.”

Still, challenges persist. Many recordings in Common Voice skew toward younger male voices, leaving gaps in representation for women and older speakers. Sjöberg’s team has begun collecting its own data from these underrepresented groups to improve accuracy and relevance.

Beyond Voices: A Fight for Fair Data

The open-access nature of Common Voice has raised concerns about “data extractivism,” where large tech companies profit from community-generated data without giving back. Mozilla is actively exploring ways to ensure fairness, including new licensing models that could limit commercial use or require contributions to community projects.

“We’re piloting ways to make open data more equitable,” explains Lewis-Jong. “It’s a learning process, but the goal is to create an ‘open source 2.0’ that works for everyone.”

Voices of Identity

For many, Common Voice is about more than technology: it is about identity, culture, and preserving heritage. “Languages carry idioms and cultural nuances that can’t be translated,” says Lewis-Jong. By diversifying voice AI, the project aims to keep these unique elements from being lost in the digital age.

As I verified Finnish voice samples on the platform, I was struck by the sense of connection. These voices, echoing through my room, were united by a shared purpose: making AI less generic and more human.

Donating my own voice felt like a small yet significant contribution to this mission. Sitting in front of my computer, I repeated the phrases presented to me and pressed Record. One day, I hope, my voice will help shape an AI that is more inclusive, more representative, and perhaps a little more like me.

Source: https://www.technologyreview.com/2024/11/15/1106935/how-this-grassroots-effort-could-make-ai-voices-more-diverse/