Judge orders OpenAI to hand over 20 million ChatGPT logs in copyright dispute

A US magistrate judge has rejected OpenAI’s attempts to avoid turning over vast quantities of ChatGPT conversation logs in a lawsuit brought by several major news publishers, as reported by MediaPost. The decision compels OpenAI to produce 20 million de-identified logs, with the court finding them relevant to claims that ChatGPT may reproduce copyrighted news content.

Key points from MediaPost’s coverage

  • Judge Ona T. Wang denied OpenAI’s motion for reconsideration, stating the company had not presented “any facts or law that the Court did not consider and that would compel a different conclusion.”

  • Wang found that the publishers had not conceded that “at least 99.99% of the conversation logs are irrelevant,” rejecting OpenAI’s contention to the contrary.

  • Wang ruled that providing 20 million de-identified logs is “proportional to the needs of the case,” noting that tens of billions of logs exist in total.

  • The logs are deemed relevant both to claims that ChatGPT outputs may contain copyrighted reproductions and to OpenAI’s affirmative defences.

  • Frank Pine, executive editor of MediaNews Group and Tribune Publishing, welcomed the ruling, saying: “OpenAI’s leadership was hallucinating when they thought they could get away with withholding evidence about how their business model relies on stealing from hardworking journalists.”

Analysis

This ruling marks a significant moment in the long-running legal clash between AI developers and news publishers over copyright, transparency and training data practices. The scale of the required disclosure is unusually large for a discovery order, and the court’s insistence that the logs are both relevant and proportionate underscores the seriousness with which it views the publishers’ allegations.

From an industry perspective, the decision strengthens publishers’ efforts to probe how generative AI systems might replicate or rely on proprietary journalism. It also raises wider questions about data governance within large AI companies, particularly how they document, retain and audit system outputs. The requirement that the logs be de-identified suggests the court is trying to balance privacy with accountability, but it also signals that AI firms will not easily avoid scrutiny on operational grounds alone.

Looking ahead, this order could set a precedent encouraging other plaintiffs to seek similarly expansive discovery when questioning AI-related data practices. If the logs reveal substantial reproductions of copyrighted material, it may bolster arguments for licensing mandates or compensation frameworks. Conversely, if the outputs show minimal overlap or mostly user-driven interactions, OpenAI may be able to strengthen its defence that generative AI models do not systematically reproduce protected works. Either scenario could influence forthcoming negotiations between publishers and AI platforms, shaping how rights, data access and commercial value are balanced in the next phase of AI-media relations.

