Chinese AI Developer DeepSeek Accused of Using Google Gemini Data

Chinese company DeepSeek has come under suspicion for illegally using Google Gemini data to train its artificial intelligence model R1. Researchers have discovered suspicious similarities in vocabulary and reasoning logic between the models.

G. Ostrov

June 5, 2025

The artificial intelligence industry faces another scandal over the suspected illegal use of competitors' data. At its center is Chinese company DeepSeek, which may have used Google Gemini data to train its own AI models.

Updated Model and Suspicions

In May 2025, DeepSeek presented an updated version of its artificial intelligence model R1, which demonstrated impressive results in mathematical calculations and programming tasks. However, the company did not disclose the data sources used for model training, raising suspicions within the expert community.

The first serious accusations came from Sam Paech, a Melbourne-based developer who specializes in evaluating the emotional intelligence of AI systems. In a post on the social network X, he published what he says is evidence that the DeepSeek R1-0528 model was trained on Google Gemini outputs.

Evidence of Similarity

The analysis revealed striking similarities in vocabulary and phrasing between the DeepSeek model and Google Gemini 2.5 Pro. In addition, another researcher, the pseudonymous creator of SpeechMap, a project for evaluating free speech in AI systems, found that the DeepSeek model's "thinking processes", the intermediate outputs it generates while working toward an answer, resemble Gemini traces.
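To give a rough sense of what a vocabulary comparison between model outputs can look like, here is a minimal Python sketch that scores two texts by the cosine similarity of their word counts. It is an illustration only, not the method the researchers used, and the sample sentences are invented for the example rather than taken from any real model.

```python
import math
import re
from collections import Counter


def word_counts(text: str) -> Counter:
    """Lowercase the text and count how often each word occurs."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two word-count vectors (0.0 to 1.0)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text shares no vocabulary with anything
    return dot / (norm_a * norm_b)


# Hypothetical sample outputs, invented for illustration only.
model_a = "The tapestry of ideas weaves a delicate, shimmering narrative."
model_b = "A shimmering tapestry of ideas weaves the narrative together."
model_c = "Sales rose four percent in the third quarter."

sim_ab = cosine_similarity(word_counts(model_a), word_counts(model_b))
sim_ac = cosine_similarity(word_counts(model_a), word_counts(model_c))
print(f"A vs B: {sim_ab:.2f}")
print(f"A vs C: {sim_ac:.2f}")
```

A high score on its own proves little: real analyses compare large samples of output and look for distinctive word choices, and shared training data from the open internet can also push scores up.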

These findings point to the possible use of knowledge distillation, a method of training an AI model on the outputs of a more powerful existing model, which may violate licensing agreements and terms of use.
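For readers unfamiliar with distillation, the sketch below shows the textbook soft-label form of the idea in Python: a "student" model is penalized according to how far its output probabilities are from a "teacher" model's. The numbers and function names are hypothetical, and nothing in the reporting describes DeepSeek's actual training pipeline. A developer with only API access would typically see generated text rather than raw scores, and would instead fine-tune on that text directly.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert raw scores to probabilities; a higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the teacher's softened distribution to the student's."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical scores over three candidate next tokens.
teacher = [4.0, 1.0, 0.5]
close_student = [3.8, 1.1, 0.4]  # nearly agrees with the teacher
far_student = [0.5, 4.0, 1.0]    # prefers a different token

loss_close = distillation_loss(teacher, close_student)
loss_far = distillation_loss(teacher, far_student)
print(f"close student loss: {loss_close:.4f}")
print(f"far student loss:   {loss_far:.4f}")
```

Training drives this loss down, so the student gradually picks up the teacher's preferences, which is one reason a distilled model can end up echoing its teacher's word choices.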

History of Violations

This is not the first time DeepSeek has been accused of improperly using competitors' data. In December 2024, developers noticed that the DeepSeek V3 model often identified itself as ChatGPT, suggesting that it may have been trained on OpenAI chat logs.

Earlier in 2025, OpenAI told the Financial Times that it had found evidence of DeepSeek using distillation on its models. According to Bloomberg, Microsoft, which works closely with OpenAI, detected large amounts of data being exfiltrated through OpenAI developer accounts at the end of 2024, presumably in connection with DeepSeek's activities.

Legal and Ethical Aspects

Although distillation is a common practice in the AI industry, OpenAI's terms of service prohibit using its models' outputs to create competing products. Other major companies impose similar restrictions.

The situation is complicated by the fact that many models may misidentify themselves and use similar phrases because of "contamination" of the open internet, which serves as the primary data source for AI training. Mass-produced AI content and bot activity on social networks make it much harder to filter training data.

Expert Opinions

Experts, including Nathan Lambert of the research institute AI2, consider it quite plausible that DeepSeek trained on Gemini data. Lambert suggested that using the Gemini API could have been a more efficient route for DeepSeek than developing its own technology from scratch.

Industry Countermeasures

In response to the growing problem of unauthorized distillation, tech giants are strengthening security measures. In April 2025, OpenAI introduced mandatory identity verification for access to some advanced models, with China excluded from the list of supported countries.

Google has also taken action, beginning to "summarize" the traces generated by models available through its AI Studio platform, which makes it significantly harder to train competing models on Gemini data. Anthropic announced in May that it would implement similar protective measures.

This situation highlights the growing tension in the AI field between innovation and intellectual property protection, as well as the need for clear legal regulation in the rapidly developing sector.
