Did China’s DeepSeek improperly obtain data?

*Distillation is a process by which an actor asks a deep neural network, such as ChatGPT, tons and tons of questions in order to collect its answers. It sucks the knowledge out of the “teacher” model (the one being asked the questions) and into the “student” model. It’s a large-scale knowledge transfer.

It isn’t illegal, as far as US law goes, but it does violate OpenAI’s terms of service, which were last updated on December 11, 2024. They state that those who use OpenAI services or products may not “automatically or programmatically extract data or output” or “use output to develop models that compete with OpenAI.” (from Vice)
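The following deliberately toy Python sketch makes the teacher/student idea in the note above concrete. It illustrates the general principle under stated assumptions and does not describe how OpenAI or DeepSeek actually work. The "teacher" is a hypothetical hidden rule (y = 3x + 1) standing in for a large model. The "student" is a two-number model fit by ordinary least squares, standing in for training a smaller neural network. The names teacher, collect_answers and train_student are invented for this example. The student never looks inside the teacher. It learns only from the answers to many questions.

```python
# Toy illustration of black-box distillation (hypothetical example):
# the "student" never sees the teacher's internals, only its answers.

def teacher(x: float) -> float:
    """Hypothetical stand-in for a large model: a hidden rule the student must learn."""
    return 3.0 * x + 1.0


def collect_answers(questions):
    """Ask the teacher every question and record (question, answer) pairs."""
    return [(q, teacher(q)) for q in questions]


def train_student(pairs):
    """Fit a small student model y = a*x + b to the teacher's answers
    using closed-form ordinary least squares (a stand-in for real training)."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var = sum((x - mean_x) ** 2 for x, _ in pairs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b


questions = [i / 10 for i in range(1000)]  # "tons and tons of questions"
pairs = collect_answers(questions)          # the teacher's answers
a, b = train_student(pairs)                 # knowledge moves into the student
print(f"student learned: y = {a:.2f}*x + {b:.2f}")
```

Running it prints `student learned: y = 3.00*x + 1.00`, showing that the student recovers the teacher's behavior purely from its outputs. Distilling a real chatbot works with text answers and far larger student models. The underlying principle, learning only from what the teacher says, is the same.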

(by Julia Shapero, The Hill) – OpenAI (creator of ChatGPT) is examining whether Chinese artificial intelligence (AI) startup DeepSeek improperly obtained data from its models to build a popular new AI assistant, a spokesperson confirmed to The Hill.

The ChatGPT maker said it is “reviewing indications that DeepSeek may have inappropriately distilled” its models. *Distillation is a technique used to transfer the knowledge of a large model to a smaller model.

“We know that groups in the [People’s Republic of China] are actively working to use methods, including what’s known as distillation, to try to replicate advanced U.S. AI models,” the spokesperson said in a statement.

“We take aggressive, proactive countermeasures to protect our technology and will continue working closely with the U.S. government to protect the most capable models being built here,” they added.

Distillation does not expose a model’s inner workings and can be used by developers to improve their applications, the spokesperson noted. However, OpenAI’s terms of service bar users from using the data obtained through distillation to build competing AI products.

DeepSeek sent shock waves through the American AI industry with the release of its R1 open-source reasoning model last week.

The Chinese startup claims its model performs on par with OpenAI’s latest model and cost just $5.6 million to train with a couple thousand reduced-capacity chips. DeepSeek now sits atop Apple’s App Store after overtaking OpenAI’s ChatGPT. [DeepSeek says it only needed 2,000 specialized chips from Nvidia to train its V3. This is in comparison to a reported 16,000 or more required to train leading models, according to The New York Times. (Newsweek, Jan. 29)]

Microsoft, a close partner of OpenAI, is reportedly also investigating the issue after its researchers noticed individuals potentially linked to DeepSeek extracting large amounts of data from the AI firm’s application programming interface last fall, according to Bloomberg.

White House AI and crypto czar David Sacks claimed Tuesday that there is “substantial evidence” that DeepSeek used distillation to pull information from OpenAI’s models.

“I don’t think OpenAI is very happy about this,” he told Fox News. “I think one of the things you’re going to see over the next few months is our leading AI companies taking steps to try and prevent distillation.”

Commerce secretary nominee Howard Lutnick also accused DeepSeek of ripping off U.S. tech firms and violating U.S. export bans on chips to build its model.

“We need to drive our innovation and we need to stop helping them. You know, open platforms — Meta’s open platform let DeepSeek rely on it. Nvidia’s chips, which they bought tons of, and they found their ways around, drive their DeepSeek model. It’s got to end,” Lutnick told the Senate Commerce Committee during his Wednesday confirmation hearing.

Published at thehill.com on Jan. 29, 2025. Reprinted here for educational purposes only. May not be reproduced on other websites without permission.

Questions

1. What has OpenAI accused Chinese AI startup DeepSeek of doing?

2. What is distillation?

3. What does an OpenAI spokesperson accuse the PRC (People’s Republic of China) of doing?

4. What does OpenAI prohibit users from doing in its ‘terms of service’?

5. What did Microsoft researchers discover about individuals potentially linked to DeepSeek?

6. What does White House AI and crypto czar David Sacks accuse DeepSeek of doing?

Background

On Jan. 29 Newsweek reported:

DeepSeek says it only needed 2,000 specialized chips from Nvidia to train its V3. This is in comparison to a reported 16,000 or more required to train leading models, according to The New York Times.

Elon Musk has questioned these claims in response to posts about DeepSeek AI on X.

In one of the posts, Musk responded to a clip of Scale AI CEO Alexandr Wang, who said that DeepSeek has around 50,000 NVIDIA H100s that they cannot talk about due to U.S. export controls.

Musk wrote in response: “Obviously.”

Musk also replied to a post from Salesforce CEO Marc Benioff, who had written: “Deepseek is now #1 on the AppStore, surpassing ChatGPT—no NVIDIA supercomputers or $100M needed. The real treasure of AI isn’t the UI or the model—they’ve become commodities. The true value lies in data and metadata, the oxygen fueling AI’s potential. The future’s fortune? It’s in our data. Deepgold.”

To this, Musk responded, “Lmao no.”

Ironically, OpenAI itself is facing lawsuits for allegedly using copyrighted material without permission.

The New York Times and several publishers have sued OpenAI, accusing it of training ChatGPT on their work without proper authorization.

In May of last year, OpenAI and News Corp struck a multi-year deal granting OpenAI access to current and archived content from News Corp’s major publications, including The Post, the Wall Street Journal, The Times of London and The Australian.

OpenAI has established similar content partnerships with other publishers, including Condé Nast, Le Monde and Prisa Media. (from a Jan. 29 NY Post article)

Elon Musk posts on X:

“OpenAI was funded as an open source, nonprofit, but has become a closed source, profit-maximizer.” (Jan. 6, 2025)

“[Sam] Altman literally testified to Congress that he wouldn’t get OpenAI compensation and now he wants $10 billion! What a liar.” (Jan. 22, 2025)

Elon Musk and Sam Altman have a feud over the direction of OpenAI, the artificial intelligence (AI) company they co-founded. Musk is critical of Altman’s leadership and OpenAI’s recent decisions, which he believes have violated the company’s founding principles.

Musk and Altman had different ideas about how to govern OpenAI, a non-profit organization. Musk left OpenAI and has since sued the company over its decision to become for-profit.

Musk and Altman are both billionaires at the forefront of AI technology, and their feud is about who should lead OpenAI.

Musk and Altman once shared a vision for an open-source, non-profit AI organization, but Musk now believes that OpenAI is prioritizing profits over humanity. (from a Google search on Musk/Altman. Results note: “Generative AI is experimental.”)

Resources

Watch a January 28, 2025 news report from Indiana’s WTHR 13News: