January 27, 2025
3 min read
Why DeepSeek’s AI Model Became the Top-Rated App in the U.S.
The Chinese start-up surprised the technology and financial markets with a cheaper, less compute-hungry AI assistant.

DeepSeek’s artificial-intelligence assistant made waves on Monday, becoming the top-rated app in Apple’s App Store and sending tech stocks tumbling. What is all the fuss about?
DeepSeek, a Chinese start-up, has stunned the tech industry with a new model that rivals the capabilities of OpenAI’s most recent one, built with far less investment and using reduced-capability chips. The U.S. bans the export of cutting-edge computer chips to China and restricts sales of chipmaking equipment. DeepSeek, based in Hangzhou in eastern China, reportedly stockpiled high-performance Nvidia A100 chips before the ban, so its engineers could use those to develop the model. But the key breakthrough is that the start-up says it trained its new model, DeepSeek-R1, on far less powerful Nvidia H800 chips.
“I’ve seen the success of big tech companies working in AI measured not by what the technology actually was but by how much money they raised,” says Ashlesha Nesarikar, CEO of the AI company Plano Intelligence, Inc.
According to VentureBeat, DeepSeek-R1 matched the score of OpenAI’s o1 model on common AI benchmarks for math and coding. U.S. companies do not disclose the cost of training their large language models (LLMs), the systems that power popular chatbots such as ChatGPT, but OpenAI CEO Sam Altman said in 2023 that training GPT-4 cost more than $100 million. DeepSeek-R1 is free for users to download, while the comparable version of ChatGPT costs $200 a month.
Nesarikar says DeepSeek’s reported $6-million figure does not necessarily reflect the cost of building an LLM from scratch; it may represent only the cost of fine-tuning this latest version. Still, she says, the model’s improved energy efficiency would make AI accessible to more people in more industries. The efficiency gains could also be good news for AI’s environmental impact, because the computational cost of generating new text with an LLM is estimated at four to five times higher than that of a typical search-engine query.
Because of its lower computational demands, the cost of running DeepSeek-R1 is reportedly one-tenth that of comparable competitors, says Hanchang Cao, an incoming assistant professor of information systems and operations management at Emory University. “That difference really matters a lot for academic researchers and start-ups,” Cao says.
DeepSeek achieved its efficiency in several ways, says Anil Ananthaswamy, author of the book Why Machines Learn: The Elegant Math Behind Modern AI. The model has some 670 billion parameters, the variables it learns during training, making it the largest open-source large language model to date. But it uses an architecture called “mixture of experts,” in which only a relevant fraction of those parameters (tens of billions rather than hundreds of billions) is activated for any given query. That cuts down on computing costs. The DeepSeek LLM also uses a method called multi-head latent attention to boost efficiency, and instead of predicting an answer word by word, it generates multiple words at once.
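The routing idea behind a mixture-of-experts layer can be illustrated with a toy sketch. This is not DeepSeek’s actual code: the expert count, dimensions, and random weights here are illustrative, and a real model uses learned neural networks for both the router and the experts. The point is only that every expert’s parameters exist, but just the top-scoring few run per token:

```python
import math
import random

random.seed(0)

NUM_EXPERTS = 8   # a full-scale model has many more experts
TOP_K = 2         # experts actually activated per token
DIM = 4           # toy hidden dimension

# Each "expert" is stood in for by a random linear map (a real expert
# is a feed-forward neural network with its own trained weights).
experts = [[[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(DIM)]
           for _ in range(NUM_EXPERTS)]
router = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]

def matvec(matrix, vec):
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

def softmax(xs):
    mx = max(xs)
    exps = [math.exp(x - mx) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_layer(token):
    # The router scores every expert, but only the TOP_K highest-scoring
    # experts are run; the rest of the parameters stay idle for this token.
    scores = softmax(matvec(router, token))
    top = sorted(range(NUM_EXPERTS), key=lambda i: scores[i], reverse=True)[:TOP_K]
    out = [0.0] * DIM
    for i in top:
        expert_out = matvec(experts[i], token)
        out = [o + scores[i] * e for o, e in zip(out, expert_out)]
    return out, top

output, active = moe_layer([0.5, -0.2, 0.1, 0.9])
print(f"activated {len(active)} of {NUM_EXPERTS} experts")
```

Because only 2 of the 8 experts run for each token here, roughly three-quarters of the layer’s parameters are untouched per query, which is the source of the compute savings the article describes, scaled down enormously.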
The model also differs from others, such as o1, in how it uses reinforcement learning during training. While many LLMs rely on an external “critic” model that runs alongside them, correcting errors and nudging the LLM toward verified answers, DeepSeek-R1 uses a set of rules internal to the model to teach it which of the possible answers it generates is best. “DeepSeek has streamlined that process,” Ananthaswamy says.
Another important aspect of DeepSeek-R1 is that the company has made the code behind the product open source, Ananthaswamy says. (The training data remain proprietary.) This means the company’s claims can be checked. If the model is as computationally efficient as DeepSeek claims, he says, it will probably open up new avenues for researchers who use AI in their work to do so faster and more cheaply. It will also enable more research into the inner workings of LLMs themselves.
“One of the big things has been this divide that opened up between academia and industry, because academia has been unable to work with these really large models or do research in any meaningful way,” Ananthaswamy says. “But with this, you have the code, so it’s now in the academic world.”