Sarah Silverman has officially filed class-action complaints against both ChatGPT developer OpenAI and LLaMA developer Meta for allegedly training their artificial intelligence products on copyrighted works without permission.
The 52-year-old comedian (who penned a memoir entitled The Bedwetter in 2010), along with Sandman Slim author Richard Kadrey and Ararat writer Christopher Golden, recently submitted the actions to a California federal court. Alleging both direct and vicarious copyright infringement as well as violations of California competition law, the substantially similar complaints center on the companies’ alleged use of a multitude of books to “train” their AI models.
On the OpenAI side, the plaintiffs, according to the text, “did not consent to the use of their copyrighted books as training material for ChatGPT” – which nevertheless allegedly made use of the works.
Besides detailing OpenAI’s public disclosures about the book collections used to train the extremely popular chatbot (massive datasets called “Books1” and “Books2”), the suit states that the decidedly controversial company “has never revealed what books are part of the Books1 and Books2 datasets.”
However, “notorious ‘shadow library’ websites” have, per the legal document, “long been of interest to the AI-training community,” with the plaintiffs further maintaining (based upon conclusions drawn from the size of Books1 and Books2 and the small number of digital databases that contain a comparable number of books) that “the OpenAI Books2 dataset includes books copied from these” libraries.
Bearing in mind this alleged unauthorized use of protected writing, the filing parties (whose complaint includes as an exhibit ChatGPT’s summaries of the allegedly infringed works) maintain that OpenAI’s language models “are themselves infringing derivative works” because they “cannot function without the expressive information extracted” from the books and other sources.
Consequently, Silverman, Kadrey, and Golden are pushing for damages, “restitution of profits,” and more. Also included in the firmly worded action is the claim that the defendant company removed the plaintiffs’ copyright management information (copyright notice, title, etc.) in violation of the Digital Millennium Copyright Act.
“OpenAI knew or had reasonable grounds to know that this removal of CMI would facilitate copyright infringement by concealing the fact that every output from the OpenAI Language Models is an infringing derivative work,” the suit reads, “synthesized entirely from expressive information found in the training data.”
Shifting to the complaint concerning LLaMA, which Meta debuted in February, the legal text claims the Facebook and Instagram parent publicly disclosed that “85 gigabytes of the training data” at hand had come from “Books.”
Expanding upon basic details provided by Meta, the suit then ties this massive collection of writing back to the previously noted “flagrantly illegal” shadow libraries, which are said to include the plaintiffs’ books identified at the outset.
“Meta knew or had reasonable grounds to know that this removal of CMI would facilitate copyright infringement by concealing the fact that every output from the LLaMA language models is an infringing derivative work,” the action against Meta echoes, “synthesized entirely from expressive information found in the training data.”
Needless to say, it’ll be worth keeping an eye out for similar suits in the music space moving forward – particularly given the ongoing explosion in artificial intelligence music generators, tracks, and platforms. At present, OpenAI is also grappling with regulatory scrutiny in the European Union, and Meta last month rolled out an AI music generator of its own.