OpenAI is facing another copyright infringement lawsuit, filed this time by authors including George R.R. Martin and Michael Connelly, for allegedly training ChatGPT on protected text without authorization.
Seventeen authors, as well as The Authors Guild (which owns the rights to mystery author Mignon Eberhart’s works), recently submitted the firmly worded complaint, naming as defendants OpenAI proper and a number of distinct entities operating under similar names. These defendants, the filing parties maintain, are “a tangled thicket of interlocking entities that generally keep from the public what the precise relationships among them are.”
Like the ongoing OpenAI litigation spearheaded by Sarah Silverman, the newer action explains in detail that the defendants have decided not to “disclose or publicize with specificity what datasets” they’ve used to train ChatGPT.
However, both suits explore the potential contents of the “Books2” collection that OpenAI has acknowledged drawing from (with a focus on the possible inclusion of “notorious repositories” of pirated works) and emphasize that the company has opted against shedding light on any of the material behind GPT-4 in particular.
But per the plaintiffs, training the comparatively powerful GPT-4, which debuted in March, required a more comprehensive library of books, allegedly including many copyright-protected works.
“There is no other way OpenAI could have obtained the volume of books required to ‘train’ a powerful LLM [large language model] like GPT-4,” the plaintiffs claim, pointing to the AI platform’s potential use of “very large sources of pirated ebooks.”
Also in support of their position, the litigating authors point to OpenAI’s alleged acknowledgement of its use of protected text to train ChatGPT and the chatbot’s in-depth novel summaries (which purportedly encompass “details not available in reviews and other publicly available material”). Driving home the point, the complaint even features ChatGPT’s comments on the matter.
“‘It is possible that some of the books used to train me were under copyright,’” ChatGPT is said to have responded to a related question in January of 2023. “‘However, my training data was sourced from various publicly available sources on the internet, and it is likely that some of the books included in my training dataset were not authorized to be used.’”
From there, the straightforward suit describes at length AI’s devastating impact on the writing community (including resulting financial hardship and professional obstacles) while underscoring protected works’ significant role within OpenAI’s well-known platform.
“In short, the success and profitability of OpenAI are predicated on mass copyright infringement without a word of permission from or a nickel of compensation to copyright owners,” the action spells out, indicating also that OpenAI has made the filing authors “unwilling accomplices in their own replacement.”
Regarding the complaint’s precise scope, the plaintiffs are looking to secure class certification – thereby opening up the action to the many other stateside authors who have copyrighted (or obtained the rights to) a work of fiction, sold north of 5,000 copies, and seen the writing used in ChatGPT’s training process.
And for OpenAI’s alleged direct, vicarious, and contributory copyright infringement, the suit is seeking sizable damages (including a portion of the profits deriving from the alleged IP theft) and an injunction preventing the use of protected works to train “large language models without express authorization.”
Late last month, OpenAI moved to dismiss the majority of Sarah Silverman’s aforementioned claims, which allege not only copyright infringement but violations of the Digital Millennium Copyright Act and California law as well.