OpenAI Models May Have 'Memorized' Copyrighted Content, New Study Suggests

A recent study suggests that OpenAI's models, including GPT-4, may have memorized copyrighted material during training. Co-authored by researchers from the University of Washington, the University of Copenhagen, and Stanford, the study proposes a method for identifying 'memorized' training data. The findings indicate that the models showed signs of retaining portions of popular fiction and New York Times articles. The results could lend support to ongoing lawsuits brought against OpenAI by authors and other rights-holders. The researchers emphasize the need for greater transparency about the data used to train AI models.