Study Suggests OpenAI’s Models May Have 'Memorized' Copyrighted Content

A recent study indicates that OpenAI's AI models, including GPT-4, may have been trained on copyrighted material without permission. Researchers from the University of Washington, the University of Copenhagen, and Stanford report signs that the models memorized certain phrases and snippets from fiction and news articles. The finding bears on ongoing lawsuits in which plaintiffs argue that OpenAI's use of their content is not covered by a fair use defense. The study proposes a method for identifying memorized content and emphasizes the need for data transparency in AI training practices. OpenAI continues to advocate for looser copyright restrictions.