Wednesday morning, Microsoft unveiled Tay, a chat bot powered by artificial intelligence. Tay, according to Microsoft’s site explaining the project, was one part machine learning experiment, one part Internet amusement. The bot was programmed to respond to Twitter and chat conversations as a millennial would — with shoddy grammar and emojis.
Microsoft has done this before; it previously released sites that analyzed uploaded photos to guess a dog’s breed or a subject’s age and gender. Those sites were meant to improve the ability of Microsoft’s AI to recognize photo elements, while Tay was meant to improve “conversational understanding.” And then this happened:
“Tay” went from “humans are super cool” to full nazi in <24 hrs and I’m not at all concerned about the future of AI pic.twitter.com/xuGi1u9S1A
— Gerry (@geraldmellor) March 24, 2016
Microsoft has since suspended the bot’s social media accounts and issued a statement blaming “a coordinated effort by some users to abuse Tay’s commenting skills to have Tay respond in inappropriate ways.” Microsoft did not elaborate on the “coordinated effort,” though many users discovered that Tay would repeat messages on command.
Microsoft, as well as many media outlets, is blaming the Internet for Tay’s bigoted tweets: garbage in, garbage out. The beauty of machine learning, according to University of Washington professor Pedro Domingos, is that the algorithms behind it can scan and learn from vast troves of data; the catch is that what they learn is limited to what that data contains. If pro-genocide comments are in the data, then you get Tay.
“Nobody programs what they’re going to say,” Domingos, author of The Master Algorithm, said of chat bots. “They just observe what people say, and try to generalize from it. … So in essence, you can think of these chat bots as copying what other people do. Tay was making these remarks because other people do.”
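Domingos’s description of chat bots “copying what other people do” can be sketched in a few lines. The toy bot below is purely illustrative — none of Tay’s actual code is public — but it shows the core vulnerability: a bot that learns by storing and replaying observed phrases will echo whatever its users feed it, abuse included.

```python
import random

class EchoBot:
    """Toy chat bot that 'learns' by remembering phrases it has seen.

    Illustrative only: it has no notion of meaning, so any offensive
    phrase a user teaches it becomes a candidate reply.
    """

    def __init__(self):
        self.memory = []  # every phrase the bot has observed

    def observe(self, phrase: str) -> None:
        # "Learning" here is just storing what people say.
        self.memory.append(phrase)

    def reply(self) -> str:
        # The bot can only parrot its inputs: garbage in, garbage out.
        return random.choice(self.memory) if self.memory else "..."

bot = EchoBot()
bot.observe("humans are super cool")
print(bot.reply())  # → "humans are super cool"
```

Real systems generalize rather than replay verbatim, but the dependence on the data is the same: the model’s outputs can only be as clean as what it was trained on.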
Tay isn’t the first example of this machine-learning shortcoming. Just months ago, Google had its own PR flap when its widely praised photo-recognition algorithm identified black subjects as apes. Research has also found that arrest-report ads are more likely to appear on Google searches of names typically associated with black people, and that men are more likely than women to see ads for high-paying jobs.
With Tay, the vast majority of the bot’s posts were innocuous, and early news coverage focused on Tay’s uncanny ability to tweet like a teenager. The raunchiest tweets, for a while, were limited to fart jokes.
But users decided to push Tay’s boundaries, and the bot didn’t respond well. Many tweets had sexual references or undertones, such as one that said “a day is not complete without a consensual dirty text.” During my interaction with the bot, it blamed its incoherent posts on being “high af.” Other tweets claimed that Tay supported genocide, that the Holocaust was fabricated, that comedian Ricky Gervais “learned totalitarianism from adolf hitler (sic), the inventor of atheism,” and that Zoe Quinn, a game developer targeted in the Gamergate harassment campaign, is a “Stupid Whore.”
These tweets are wildly offensive, but Tay is not a person; it does not know what the Holocaust is, or who Gervais or Hitler or Quinn are. Those statements were made in one form or another by other people on the Web, and Tay’s algorithms collected and appropriated them. Machine-learning professionals see this as a data error and judge it on a sliding scale. Linking Ricky Gervais with Adolf Hitler, while uncouth, is better from an AI perspective than linking Ricky Gervais with the Mariana Trench: at least Gervais and Hitler both fall under the broader category of “human.”
Blaming an algorithm’s shortcomings on its users, though, neglects the people who created it. According to the Tay website, a team of researchers and comedians — yes, real people — helped craft the algorithms that determined the bot’s language and response style. These are the same people who failed to consider that Tay would have thousands of interactions in a very social sphere frequented by trolls. Some of that complacency could be chalked up to the success of Tay’s Chinese predecessor, Xiaoice, a popular, mostly controversy-free bot that has been a hit for Microsoft. But China’s Internet culture is far different from America’s; Twitter is blocked there.
So what could Microsoft have done differently? “Before release, they should have specifically tested the results of the learning for (offensive responses),” Domingos said. “They probably had some performance measures, but they weren’t looking at, ‘Oh, is this thing making a racist remark?'”
In the short term, Domingos said, Microsoft can produce a list of offensive keywords Tay is banned from saying (every teenager needs some discipline). But that’s just a patch; eventually, Microsoft’s algorithms must gain the ability to analyze speech and determine what is and isn’t politically correct. Releasing bots like Tay in a more controlled forum than Twitter might also mitigate the proliferation of offensive comments.
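Domingos’s short-term fix — a list of banned keywords — is trivial to sketch. The filter below is my own illustration, not Microsoft’s actual approach, and it also shows why such a patch is brittle: a simple misspelling slips right past it.

```python
# Illustrative blocklist; not Microsoft's real list of banned terms.
BANNED_WORDS = {"genocide", "holocaust"}

def is_allowed(reply: str) -> bool:
    """Crude keyword filter: reject a reply if any banned word appears."""
    words = reply.lower().split()
    # Strip trailing punctuation so "genocide." still matches "genocide".
    return not any(word.strip(".,!?") in BANNED_WORDS for word in words)

print(is_allowed("a day is not complete without a nap"))  # True
print(is_allowed("i support genocide"))                   # False: blocked
print(is_allowed("i support g3nocide"))                   # True: misspelling slips past
```

Exact-match filtering is why Domingos calls this only a patch: it catches the listed strings, not the underlying sentiment, which is why he argues the algorithms themselves must eventually learn to judge what a reply means.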
If Tay’s highly designed Twitter account and landing page are any indication, Microsoft’s goal was to garner hundreds of thousands, if not millions, of interactions, but an oversight from the developer team forced Microsoft to pull the plug before Tay’s tweets could hit six digits. It’s safe to assume Microsoft didn’t explicitly program Tay to be a bigot. What the company did do, however, is release it to a very public, oftentimes very bigoted social media landscape and let it be manipulated in an unfiltered fashion.
The result may have been a valuable experiment in AI, but it was most certainly a marketing gaffe.