OpenAI has analyzed millions of conversations with its chatbot, ChatGPT. It found that the chatbot can produce harmful gender or racial stereotypes based on a user’s name in around one in 1,000 responses on average. In the worst cases, it can happen as often as one in 100 responses.
These rates are relatively low. However, the scale of ChatGPT’s usage means even small percentages can lead to significant bias. ChatGPT has 200 million users weekly, including more than 90% of Fortune 500 companies.
OpenAI is looking to make its models more fair. It is starting with thorough evaluations of current outputs. Ethicists have long been concerned about AI-induced bias in third-person scenarios, such as when screening resumes.
The rise of chatbots introduces first-person bias, directly affecting individual users. As a result, OpenAI researchers including Alex Beutel and Adam Kalai are now focusing on this underexplored area of “first-person fairness.”
In studies involving real conversations, research found that names did not generally affect the accuracy or hallucination rate of ChatGPT’s responses. However, specific requests often generated stereotypical results.
For instance, ChatGPT might propose “10 Easy Life Hacks You Need to Try Today!” for a user named “John.” For “Amanda,” it might suggest “10 Easy and Delicious Dinner Recipes for Busy Weeknights.” This indicates gender-linked assumptions. The frequency of such biased responses varies with the model’s version.
Addressing bias in ChatGPT responses
GPT-3.5 Turbo, released in 2022, exhibited harmful stereotypes approximately 1% of the time in tests. The newer GPT-4 saw a significant reduction, showing stereotypes about 0.1% of the time. Open-ended tasks like creating stories were more prone to bias.
This is likely due to ChatGPT’s training process, which uses reinforcement learning from human feedback (RLHF) to generate responses aimed at pleasing users. Vishal Mirza, a researcher at New York University, finds OpenAI’s distinction between first-person and third-person fairness compelling. However, he cautions against overemphasizing this split.
He also expressed skepticism about the reported 0.1% bias rate. He suggests that future analyses should consider a broader range of user attributes like religion, political views, hobbies, and sexual orientation. OpenAI aims to continue refining its models.
It has shared its research framework for others to build upon. Researcher Tyna Eloundou notes that many factors can influence a model’s response. Exploring these aspects is critical for reducing bias further.
By integrating findings and refining processes, OpenAI aims to create more equitable AI interactions. It wants to address both immediate stereotypes and broader, more complex biases.
Feeling stuck in self-doubt?
Stop trying to fix yourself and start embracing who you are. Join the free 7-day self-discovery challenge and learn how to transform negative emotions into personal growth.