The Foundation of Inclusive AI: Understanding the Significance of Inclusive Data Sets
In grounding a discussion on the intersection of AI inclusion and innovation at Microsoft’s Ignite conference, VP and Distinguished Scientist of Microsoft Research Ashley Llorens recalled Microsoft’s mission: “to empower everyone on the planet.” In championing such a far-reaching goal for AI models, Microsoft considers the importance of using diverse, inclusive data sets that are representative of the global population, including “people in different societal contexts, people just trying to pursue different aims, people that speak different languages, [and] people that have different abilities.” To reach all people, Llorens contends alongside Chief Diversity Officer Lindsay-Rae McIntyre, we must first represent them. What does that mean when developing inclusive AI?
Language Inclusion as a Priority: Considering Low-Resource Languages in AI Development
At a Microsoft laboratory in India, researchers have been evaluating the competencies of AI foundation models on 80 different languages as part of a larger effort to both understand our vast linguistic diversity and represent these different languages within AI models. Their work both reflects the existing multilingual capacity of models like GPT-4 and indicates the work that lies ahead in including hundreds, if not thousands, of languages spoken throughout the world.
This group has focused its work on closing the gap between models’ reasoning ability in English and in other languages, particularly low-resource languages. Low-resource languages are those that currently lack significant representation and digitized artifacts for ingestion by AI models, and thus Microsoft researchers have approached communities that communicate using these languages for help developing more robust data sets.
Llorens offers a glimpse into the complexity of such efforts with the example of ASL, or American Sign Language. ASL is a low-resource language that presents a multimodal challenge – “there’s a computer vision component to it, [and] a language modeling component” – one that necessitates the development of new data sets to improve AI model performance. Interestingly, McIntyre adds that while ASL is often thought of as an extension of English, it “is a language unto itself, in that it also morphs and absorbs cultural context, community context. We think about accents in English, and ASL also has that reflected in the way that folks sign.” Such inherent linguistic complexity informs the larger mission of training AI models on a myriad of diverse languages – how can we develop inclusive AI models to both represent the broader community and respect the nuance and individuality of so many different languages?
Real-World Impact: Case Study of Inclusive Data Sets in AI Applications
Expounding upon the importance of addressing all areas of low-resource data in empowering everyone on the planet, Llorens shared an example of what developing and incorporating low-resource data could mean for underrepresented communities. At a talk in Colorado, a student from a tribal community told Llorens about using ChatGPT to develop a grazing plan for cattle. The student consulted ChatGPT for both federal and tribal regulations – the rules and policies for grazing cattle and adhering to all land requirements – and was able to align the plan with federal law; however, he realized that ChatGPT lacked knowledge of local tribal policy and regulations. Including data on this tribe’s traditions and requirements would make the AI model more useful locally and would help close the low-resource data disparity.
Challenges and Opportunities: Navigating the Path to Data Inclusion in AI
Addressing the challenge and opportunity of furthering inclusive AI, McIntyre recalls the influence of psychologist Carol Dweck’s “growth mindset” on Microsoft’s cultural transformation: “With growth mindset, there came this understanding that you were not going to know everything all the time. That you were going to have to perpetually stay curious and invest in your knowledge. That you were going to have to be a lifelong learner around the workplace and around technology.” Microsoft continues its work in building and sharing more inclusive AI for the betterment of all people and society, and it tackles this challenge in an open and flexible way. By continuously refining data sets, addressing language challenges, and engaging with external communities, Microsoft aims to build AI technologies that are not just cutting-edge but also considerate of the diverse needs of individuals and communities worldwide. As much as the Microsoft team prides itself on being an early leader in this diversity mission, it also remains eager to learn the lessons that smaller companies have to offer. This commitment offers two promises: all businesses can continue to benefit from, and support, Microsoft’s inclusion growth mindset, and individuals can benefit from a technology that continuously improves and develops with all people and communities in mind. Curious how harnessing the power of AI can uplift and empower everyone at your organization? Reach out to the Cloudforce team to find out more.