Will the Catholic Church ever elect a white pope? Until recently, this was an easy question to answer: successive popes have all been white. Then Google came along and confused us all.
The tech giant last week unveiled a new version of its Gemini artificial intelligence (AI) model that generated images on request, but users found they could not get the chatbot to create a white or European pope.
Vikings, medieval knights, the US Founding Fathers, and even soldiers of the Nazi Wehrmacht were depicted as African or Asian. A fictitious, woke version of history was being conjured up before our eyes.
Although Google has suspended the image-generation feature, Gemini continues to offer curious opinions. It told one user that there was “no concrete evidence” to support the theory that coronavirus leaked from a laboratory. The Chinese Communist Party will be pleased to hear that, according to Gemini, the Wuhan Institute of Virology’s safety protocols “appear to be adequate”.
The episode points to a serious problem with today’s AI. What we noticed this time was how clumsy the intervention was. But the real problem lies behind the shopfronts of Gemini and ChatGPT, deep within their information supply chains.
AI models need high-quality training data as the raw material for their magic. Instead, today’s web is awash with low-quality data.
Google can’t complain because no one has done more to shape the internet as it is today.
Early internet regulation in the 1990s encouraged a hands-off approach by operators, granting them immunity from liability so long as they acted as neutral intermediaries. But if they intervened, for example by curating music playlists or removing inaccurate or harmful material, they risked losing that safe harbour.
As Google grew, it became a powerful and active participant in how those rules played out.
Instead of fostering a supply chain of safe and legitimate facts, the company embraced the quasi-communist rallying cry that information “wants to be free”. Google could, for example, have licensed high-quality proprietary databases and other information sources such as Britannica, recovering the cost from users or their internet service providers.
Or it could have cultivated its own information sources, paying organisations such as Wikipedia to edit and improve their output.
Instead, it promoted sites peddling drugs: Google agreed a $500m (£394m) settlement with the US Department of Justice in 2011 after a sting uncovered evidence that it had profited from sites illegally importing controlled substances.
Ask the music industry what it is like to pit a legitimate supply chain against a pirated one.
Google has also actively shaped its supply chain through search engine optimisation (SEO): its guidelines can make or break a site’s standing in Google’s search rankings. They are supposed to keep quality high, but in practice they have become a game to be played.
A recent year-long academic study by researchers at three German universities found that the quality of product reviews has plummeted as a result.
The team investigated the complaint that a torrent of low-quality content, especially in product searches, keeps burying useful information in search results, and concluded that SEO was largely to blame.
A heartbreaking analysis by HouseFresh, an independent product review site, explains why. The site is dedicated to air purifier reviews and spends hundreds of hours testing each device.
But after Google changed its SEO rules, it found itself drowned out by big media brands, some now owned by private equity firms, publishing “reviews” that are not reviews at all, merely SEO-compliant filler designed to attract affiliate marketing revenue.
These filler reviews end up recommending products that perform poorly, including untested products from defunct companies. Bad drives out good.
Such spam and fraudulent sites are now being fed back into AI models as training data. Chatbots need high-quality information, and it turns out you cannot get it for free.
So perhaps Google could use AI itself, rather than the real web, to generate nice, clean synthetic training data? Alas, that is even worse.
Train an AI on its own output and the model degrades, a failure researchers have dubbed “model collapse”. Studies show that such models converge on homogeneous results, erasing variation and diversity.
In one experiment, researchers gave a model a varied set of faces of all ages and races, then retrained it repeatedly on purely synthetic output. By the third cycle, almost all faces that were non-white, under 25 or over 40 had vanished. By the fifth, it was producing a small number of nearly identical faces.
In the researchers’ description, the synthetic data modes drift and then collapse around a single high-quality image before merging together.
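To see the mechanism in miniature, here is a toy sketch in Python (my own illustration, not the researchers’ code): a one-dimensional Gaussian stands in for the generative model, and each generation is refitted to a finite sample drawn from the previous one. The distribution’s spread, a crude proxy for the diversity of the generated faces, withers away.

```python
# Toy illustration of "model collapse": refit a simple model, generation
# after generation, to samples drawn from its own previous output.
# A 1-D Gaussian stands in for the generative model; its standard
# deviation stands in for the diversity of the generated images.
import numpy as np

rng = np.random.default_rng(42)

N = 10                # samples per generation; kept small so the collapse shows quickly
mean, std = 0.0, 1.0  # generation 0: the "real", diverse data distribution

for generation in range(1, 101):
    samples = rng.normal(mean, std, N)          # generate synthetic data from the current model
    mean, std = samples.mean(), samples.std()   # refit the model to its own output
    if generation % 20 == 0:
        print(f"generation {generation:3d}: std = {std:.4f}")
```

Because each generation sees only a finite sample, estimation error compounds: the fitted spread performs a downward-drifting random walk towards zero. Larger samples slow the decay but do not stop it, which is the same logic, scaled up, that homogenises the faces.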
Others have reproduced the phenomenon, and it is not unique to images. I have written here recently about how AI is making the world less interesting.
Google failed to heed a simple business adage: when striking a deal with a supplier, it is wise to leave something on the table, especially if you value what they produce and want to do business with them again.
Giving suppliers some margin allows them to invest and, with luck, thrive. But Google has rarely treated its information supply chain with such care or thought. It never regarded the web as a supplier at all, more as territory to be strip-mined.
Now Google’s AI needs high-quality data, and it can no longer scrape it for free. To misquote Oscar Wilde, one would need a heart of stone not to laugh at the irony.