Meta has created an AI chatbot that appears to interpret the social media giant’s content moderation policies better than the company itself does.
In September 2023, Meta announced a set of new AI tools for its social media platforms, including a chatbot called Meta AI. The chatbot, which Meta described as “an advanced conversational assistant for WhatsApp, Messenger, and Instagram,” is built on the company’s proprietary large language model and pulls up-to-date information from the Bing search engine.
Meta reportedly “spent 6,000 hours” bombarding Meta AI with queries to find potential “problematic use cases” for the tool, catching as many issues as possible before launch and avoiding a PR disaster. The company also trained its language models at scale on its community standards to help them determine which content violates those standards.
Media Matters has reported for years on problematic content across Meta’s platforms, particularly Instagram’s failure to remove hate speech, conspiracy theories, and other content that appears to violate its content moderation policies. So we decided to ask Meta AI why such content persists.
For example, when we asked the chatbot about accounts spreading anti-Black racism that Instagram has declined to ban, Meta AI responded that the accounts appear to violate the platform’s community guidelines, specifically identifying them as promoting “hate speech” and “white supremacist ideology.”
The chat tool also offered suggestions for improving Instagram’s content moderation, along with a list of reasons these practices may not have been implemented yet. In one example, Meta AI suggested that its creator may not be enforcing its own moderation policies because the company “prioritizes other features and monetization over moderation.”