Following the public release of DeepSeek’s R1 model, AI giant OpenAI has accused the Chinese AI firm of using OpenAI’s models to train R1, which DeepSeek claims was built for only £4.5 million ($5.6 million). In response, many users and industry insiders have criticised OpenAI for ‘double standards’, given its own contentious history of using copyrighted content without explicit permission to train its AI models.
What OpenAI Is Accusing DeepSeek Of
According to a report by Bloomberg, OpenAI and Microsoft—one of its backers—are investigating whether a group connected to a Chinese startup improperly utilised data produced by OpenAI’s technology to develop its AI model.
‘We know that groups in the PRC are actively working to use methods, including what’s known as distillation, to replicate advanced US AI models. We are aware of and reviewing indications that DeepSeek may have inappropriately distilled our models, and will share information as we know more,’ an OpenAI spokesperson said.
They added, ‘We take aggressive, proactive countermeasures to protect our technology and will continue working closely with the US government to protect the most capable models being built here.’
In a separate statement to the Financial Times, OpenAI also said it had found evidence linking DeepSeek to the practice of distillation—a technique commonly employed in AI development that involves extracting knowledge from more advanced models to train smaller ones.
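For readers unfamiliar with the term, distillation can be illustrated with a toy numerical sketch. The example below is a minimal, hypothetical illustration in NumPy and is not based on any code from OpenAI or DeepSeek: a small ‘student’ model is trained to match the temperature-softened output distribution of a larger ‘teacher’ model by minimising the KL divergence between the two, which is the core idea behind the technique OpenAI describes.

```python
import numpy as np

def softmax(z, T=1.0):
    """Softmax with temperature T; higher T yields softer distributions."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)  # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Mean KL divergence from student to teacher soft targets."""
    p = softmax(teacher_logits, T)  # teacher's softened predictions
    q = softmax(student_logits, T)
    return float(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12))) / len(p))

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 8))         # toy input features
W_teacher = rng.normal(size=(8, 3))   # stand-in for a large, capable model
teacher_logits = X @ W_teacher        # in practice: queried model outputs

# Train a small linear student on the teacher's soft targets
# via gradient descent on the KL loss.
W_student = np.zeros((8, 3))
for _ in range(500):
    q = softmax(X @ W_student, T=2.0)
    p = softmax(teacher_logits, T=2.0)
    grad = X.T @ (q - p) / len(X)     # gradient of the KL objective
    W_student -= 0.5 * grad

loss = distillation_loss(X @ W_student, teacher_logits)
```

In a real distillation pipeline the ‘teacher logits’ would come from querying the larger model at scale, which is why providers can sometimes detect the practice from unusual API usage patterns.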
Why Is OpenAI Accused of Double Standards?
Following OpenAI’s statement against DeepSeek, many online users—particularly those familiar with the AI industry—have described the firm’s actions as ‘hilarious’. This sentiment arises from OpenAI’s own controversial reputation for training its models on content obtained without securing proper permission from the original creators.
Notably, ChatGPT—OpenAI’s most popular AI service—relies in part on data obtained through licensing agreements with publishers such as the Associated Press, Condé Nast, and News Corp. However, an earlier report from The Information indicates that these publishers receive only around £802,720 ($1 million) per year for the content ChatGPT uses.
As a result, several publishers have taken OpenAI to court. Earlier this year, The New York Times filed a lawsuit accusing the firm of copyright infringement, stating that it never granted ChatGPT the right to use its material. Meanwhile, additional complaints from multiple Indian publishers and Canadian organisations have also been made public, alleging copyright infringement by OpenAI.
Experts often argue that OpenAI has so far ‘got away’ with numerous legal complaints because the legal framework surrounding copyright infringement by AI models remains unsettled. They therefore urge policymakers and government bodies to introduce regulations addressing both copyright infringement and ‘AI distillation’.
‘As the law on the intersection of AI and different bodies of IP is far from clear, at least in the context of copyright law, a practice is now developing where owners of datasets of copyrighted works license the works to companies that are developing generative AI systems, thus avoiding legal risk,’ said Dotan Oliar, a professor at the University of Virginia School of Law.
Rising Concerns over Copyright Infringement by AI Models
The rapid advancement of artificial intelligence has sparked heightened apprehension regarding the use of copyrighted content without permission, particularly among creators in writing, visual arts, and music. AI models frequently require large volumes of data for training, often sourced from existing creative works—sometimes without obtaining consent from the original creators.
A prominent example involves artists Sarah Andersen, Kelly McKernan, and Karla Ortiz, who launched a class-action lawsuit against Stability AI, the developer of the AI image generator Stable Diffusion. The artists contend that their copyrighted works were used without their permission to train the model, resulting in AI outputs mimicking their unique artistic styles and potentially infringing their intellectual property rights.
In another instance, over 10,500 artists spanning multiple creative fields signed an open letter condemning the unlicensed use of their works to train AI models. The letter underscores the serious threat such practices pose to creators’ livelihoods and calls for ethical frameworks to govern AI usage in creative industries.
As AI technology continues to evolve, policymakers, developers, and the creative community are under growing pressure to craft clear guidelines that strike a balance between innovation and the protection of intellectual property rights. These measures include formulating ethical standards for AI training, ensuring greater transparency in data sourcing, and giving creators the tools and legal recourse to safeguard their work against unauthorised usage.