The brain that wrote your favorite novel consumed Dickens and Austen, Pynchon and Didion. The brain that wrote this article devoured Bradbury and Orwell, Ishiguro and Octavia Butler. But the “brain” that powers that chatbot you played around with over the weekend ingested 170,000 books, all so it can spit out language that sounds smart, colorful, or helpful—even if it’s really not.
Copyright law is well equipped to differentiate between slightly derivative human ingenuity and reductive copycatting. But language-guzzling artificial intelligence models, which need to “train” on existing works, present a bigger challenge. A.I. companies are currently racking up lawsuits accusing them of training their powerful models on copyrighted materials. In July, a group of writers including comedian Sarah Silverman and novelist Michael Chabon filed suits against OpenAI and Meta, alleging that the companies improperly trained their models on the authors’ books.
OpenAI, Meta, Microsoft, and Google are all facing legal complaints—a barrage of them, in fact. So are A.I. upstarts like Midjourney and Stability AI, which make popular A.I. image generators.
The stakes could not be higher for companies betting on A.I. as a transformative, and lucrative, technology. Legal experts told me that copyright challenges pose a near-existential threat to existing A.I. models if the way they’re being trained isn’t aboveboard. The models won’t work unless they can ingest mountains of data, which until now they’ve largely done without paying for it, and those mountains might be owned by someone else.
If these high-profile matters ever reach a courtroom, it’ll be years from now. But one smaller squabble is already headed to trial, and may portend whether authors like Silverman have a legitimate claim of wrongdoing, or if some of the largest technology companies in the world—and the content vacuums adding billions of dollars in value to their market capitalizations—are going to get away with it.
In 2020 the media company Thomson Reuters sued a little-known firm called Ross Intelligence. Better known for its namesake news wire service, Thomson Reuters is a juggernaut in the legal world because it also owns Westlaw, a ubiquitous legal research service.
In its complaint, Reuters alleged that Ross tried to license Westlaw’s legal summaries—called headnotes—to train an A.I.–powered legal search engine. When Reuters refused to do that, Ross contracted with a third-party firm to scrape them off Westlaw. Reuters argues that Ross’ product was simply “headnotes with question marks at the end,” as the judge recently summarized.
If a work is copyrightable, as the jury will decide in this case, the next step is determining whether the use of that material is protected by fair use—a legal doctrine aimed at promoting creative expression. A trial is tentatively scheduled for May 2024.
One big problem for Ross Intelligence is that it was apparently using Thomson Reuters’ content with the hopes of competing directly with the company. Ross shut down operations in 2021, after it was sued. “If your work is directly competing in the same market with the work that you’re using, that’s a huge factor weighing against fair use,” said Bob Brauneis, an intellectual property law professor at George Washington University.
That element of direct competition was decisive in the most recent Supreme Court battle over fair use, in which the court ruled that Andy Warhol had infringed on photographer Lynn Goldsmith’s copyright when he made silk-screen versions of her photograph of Prince. The reason: His foundation was directly competing against the original by licensing it to magazines. “Thomson Reuters is going to press very hard on that, and that’s what all of the plaintiffs in the generative A.I. cases are going to press hard on,” Brauneis said.
Fair use protects only works that are “transformative,” meaning, at the most basic level, that they are significantly different from the original work. That could also spell trouble for Ross.
“If all they were doing was feeding a large language model because they wanted it to learn English … then the argument is stronger in favor of the use being transformative,” said Kristelia García, an intellectual property law professor at Georgetown University. “Whereas what Ross arguably is doing here is not transformative but actually a competing product, which is exactly what happened in Warhol.”
García said that the question of whether feeding copyrighted material to an A.I. model to train it counts as fair use is “very unclear” but that the facts in this case don’t give her much confidence Ross will succeed.
There is a disaster scenario for OpenAI and other companies funneling billions into A.I. models: If a court found a company liable for copyright infringement, the ruling could completely halt the development of the offending model.
“The worst outcome would be that you lose and that the relief is you have to destroy your model and start all over again,” Brauneis said. “The way these models are generated, there’s no way that, say, [OpenAI’s] GPT-4, there’s no way you can go back and filter out the plaintiff’s content from the model that you generated. I think every computer scientist agrees that is not currently possible with the way these models are being built. Then you have to destroy that and start all over again with content that you’ve licensed.”
García said that any finding of mass copyright infringement would be “crippling” for A.I. businesses. Not only could they be forced to shut down—or halt the development of these models—but the damages could be “incredibly astronomical.” Settlement, therefore, is likely because A.I. companies might not want to risk the matters going to trial and coming to an adverse conclusion.
Perhaps sensing that Pandora’s box had been opened, Microsoft President Brad Smith vowed to customers recently that the $2.3 trillion company would assume legal responsibility if their use of its A.I. tools drew any copyright challenges.
“We believe the world needs A.I. to advance the spread of knowledge and help solve major societal challenges,” Smith wrote in a blog post. “Yet it is critical for authors to retain control of their rights under copyright law and earn a healthy return on their creations.” Smith assured customers that the company is carefully training its model not to spit out anything that could infringe on copyright law.
Microsoft may just prefer to enter the fray on behalf of its clients rather than watch from the sidelines when the stakes of a bad court ruling are so high.
Brauneis recently struck up a conversation about the stakes of this debate. “Somebody at a conference said to me, ‘That’s the trillion-dollar question.’ I don’t think they were exaggerating—it’s at least a multibillion-dollar question.”
is a partnership of New America, and Arizona State University that examines emerging technologies, public policy, and society.