How much did DeepSeek V3 cost to train compared to GPT-4?

DeepSeek researchers wrote in a technical paper that DeepSeek-V3 was trained using Nvidia H800 chips for less than $6 million, as Reuters reported in January 2025. This compares to estimated GPT-4 training costs that are roughly an order of magnitude higher, though OpenAI does not publish its training costs. DeepSeek-R1 is also 20 to 50 times cheaper to use than OpenAI o1 per task, according to DeepSeek's own WeChat account cited by Reuters.

What happened to Nvidia's stock when DeepSeek R1 launched?

When DeepSeek-R1 launched in January 2025, Reuters reported that Nvidia lost $593 billion in market capitalization in a single trading day, a record one-day loss for any Wall Street company. TechCrunch reported that R1 claimed to beat OpenAI o1 on the AIME, MATH-500, and SWE-bench Verified benchmarks. The shock reflected that a Chinese lab had matched frontier reasoning capability at dramatically lower cost.

What changed with DeepSeek V4 and what hardware does it run on?

DeepSeek V4 launched on April 24, 2026 and was adapted to run best on Huawei chips as U.S. export controls tighten Chinese access to advanced Nvidia chips, Reuters reported. Lian Jye Su, chief analyst at Omdia, told Reuters the launch followed a predictable path rather than marking a breakthrough moment. Benchmark data from Artificial Analysis showed V4 Pro ranking among the leading open-weight models, with Kimi and Qwen narrowing the competitive gap.

Why does DeepSeek SEO matter if most users are in China?

DeepSeek publishes fully open weights on Hugging Face for both V4-Pro and V4-Flash. Every enterprise team, researcher, or developer self-hosting a DeepSeek-based deployment inherits the same base retrieval behavior. Optimizing for DeepSeek is therefore a force multiplier across every downstream deployment built on those weights, not just one website or geography. The APAC government and state-enterprise deployments following the Huawei chip integration further extend the addressable retrieval surface.

What content structure does DeepSeek favor when retrieving and citing sources?

DeepSeek documents little about its retrieval architecture publicly. Practitioner testing shows a heavy preference for the first 200-300 tokens of a page. That is consistent with our May 2026 study of 153,425 citations, which found 74.9% of cited sentences in the first half of the document and a mean cited sentence length of 9.27 words. Atomic factual openers of 6-10 words per sentence, placed early, are the single highest-leverage optimization.

DeepSeek SEO 2026: How to Rank in the Open-Weight AI Engine

AI Summary

DeepSeek trained its V3 model for less than $6 million on Nvidia H800 chips, per Reuters. Its R1 model is 20-50x cheaper to use than OpenAI o1, according to DeepSeek's own WeChat account as cited by Reuters. When R1 launched in January 2025, Nvidia lost $593 billion in market capitalization in a single trading day. V4 (April 2026) runs best on Huawei chips and ranks among leading open-weight models, with Kimi and Qwen narrowing the gap. DeepSeek publishes fully open weights on Hugging Face, so any self-hosted variant inherits the same base retrieval behavior. Our May 2026 study of 153,425 citations found 74.9% of cited sentences in the first half of the document and a mean cited sentence length of 9.27 words. The five-step playbook: atomic factual openers, bilingual signals, open-source citations, robots.txt audit for DeepSeekBot, separate engine tracking.

DeepSeek is a Chinese open-weight AI lab whose V3 model was trained for less than $6 million on lower-capability Nvidia H800 chips, according to Reuters, at a fraction of the cost of Western rivals. That cost gap, combined with fully open weights published on Hugging Face, makes DeepSeek the most consequential AI retrieval surface most brands have not yet optimized for.

What DeepSeek is and why it upended the AI market

DeepSeek shipped DeepSeek-R1 in January 2025 as a reasoning model it claimed matched or beat OpenAI o1 on the benchmarks AIME, MATH-500, and SWE-bench Verified, as TechCrunch reported. The market reaction was immediate. Nvidia alone lost $593 billion in a single trading day, a record one-day loss for any Wall Street company, as Reuters reported.

The shock was structural as well as financial. DeepSeek-R1 is 20 to 50 times cheaper to use than OpenAI’s o1, depending on the task, according to DeepSeek’s own WeChat account as cited by Reuters. A reasoning model at that price point in open-weight form changes who can deploy it, and therefore which content gets retrieved at scale.

DeepSeek V4: the April 2026 release in context

DeepSeek V4 launched on April 24, 2026 and drew a more muted reception than R1. “This announcement followed a rather predictable path,” Lian Jye Su, chief analyst at Omdia, told Reuters. Benchmark data from Artificial Analysis showed V4 Pro “ranks among leading open-weight models rather than clearly surpassing rivals, with competitors such as Kimi and Qwen narrowing the gap.”

The most consequential V4 change for reach is hardware. Reuters confirmed DeepSeek adapted V4 to “run best on Huawei chips, as tightening U.S. export controls are designed to cut off the Chinese market’s access to cutting-edge U.S. chips that power AI model development.” That single decision puts DeepSeek into every Chinese cloud, every state-owned enterprise stack, and a growing number of APAC government AI deployments. As Alfredo Montufar-Helu of Ankura China Advisors told Reuters: “What matters now is whether China can continue advancing on AI development, and potentially do so with its own chips.”

For content teams, the strategic implication is the opposite of “DeepSeek peaked.” The user base and infrastructure footprint are growing even though the novelty of the benchmark shock has worn off. The same playbook discipline that applies to ranking in ChatGPT search or getting cited by Perplexity applies here, but with a different retrieval profile.

How DeepSeek retrieves content (what is documented)

DeepSeek documents little about its retrieval architecture publicly. What we can confirm from the open weights and technical reports on Hugging Face: V4 uses a Mixture-of-Experts architecture. The model was trained on a bilingual corpus with strong Mandarin and English coverage. Base weights for both DeepSeek-V4-Pro (1.6T parameters, 49B activated) and DeepSeek-V4-Flash (284B parameters, 13B activated) are openly published. Everything else about live retrieval behavior is practitioner observation, not documented specification.

What practitioner testing consistently shows: DeepSeek heavily favors the first 200-300 tokens of a retrieved page, which is consistent with the pattern we documented in our atomic sentence SEO research. Our May 2026 study of 153,425 citations found a mean cited sentence length of 9.27 words and 74.9% of cited sentences in the first half of the document. In our observation, those retrieval dynamics transfer across open-weight models, including DeepSeek.
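One way to sanity-check a page against that early-window pattern is to test whether your key facts appear near the top. The sketch below is illustrative only: it counts whitespace-separated words as a rough stand-in for model tokens (real tokenizers split text differently), and the function name `facts_in_window` and the 300-token default are assumptions, not anything DeepSeek documents.

```python
def facts_in_window(page_text: str, facts: list[str], window: int = 300) -> dict[str, bool]:
    """Report which fact strings occur within the first `window` words of a page.

    Whitespace words are only an approximation of model tokens.
    """
    # Keep the first `window` words and normalize spacing and case.
    head = " ".join(page_text.split()[:window]).lower()
    return {fact: fact.lower() in head for fact in facts}


# Hypothetical page: one fact up top, one buried after 400 filler words.
page = (
    "DeepSeek V3 was trained for less than $6 million. "
    + "filler " * 400
    + "Nvidia lost $593 billion in a single day."
)
result = facts_in_window(page, ["less than $6 million", "$593 billion"])
for fact, found in result.items():
    print(f"{'early' if found else 'late '}  {fact}")
```

Facts reported as late are candidates for moving into the opening paragraph.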

DeepSeek vs ChatGPT vs Claude: where citation behavior differs

Feature	DeepSeek	ChatGPT	Claude
Training cost (V3/GPT-4 comparison)	Less than $6M on H800 chips (Reuters)	~$100M (estimated, not published)	Not disclosed
Primary language bias	Mandarin and English	English-dominant	English-dominant
Open weights published	Yes, on Hugging Face	No	No
Inference cost vs OpenAI o1	20-50x cheaper (Reuters/DeepSeek WeChat)	Baseline	Similar range to GPT-4
Crawler user agent	DeepSeekBot	OAI-SearchBot	ClaudeBot

The open-weight publication is the most underrated SEO signal. Every researcher, enterprise team, or developer self-hosting a fine-tuned DeepSeek variant inherits the base retrieval behavior. DeepSeek optimization is a force multiplier across every downstream deployment that inherits its base weights. This is why we track it separately in the AI brand visibility tracking framework.

The 5-step DeepSeek GEO playbook

1. Write atomic, factual openers

Write your opening paragraph as four to six standalone factual statements that each survive being quoted in isolation. Our May 2026 citation study found 45.2% of all AI-cited sentences fall in the 6-10 word range and 74.9% of cited sentences appear in the first half of the document. A dense, verifiable first paragraph is your single highest-leverage optimization. The detailed methodology is in our atomic sentence SEO guide.
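A quick way to check an opener against the 6-10 word range is to count words per sentence. This is a minimal sketch, not our study methodology: the sentence splitter is a naive regex that abbreviations will confuse, and `opener_report` and the sample paragraph are hypothetical.

```python
import re


def opener_report(paragraph: str, lo: int = 6, hi: int = 10):
    """Return (sentence, word_count, in_range) for each sentence in a paragraph."""
    # Naive split: sentence-ending punctuation followed by whitespace.
    sentences = [
        s.strip()
        for s in re.split(r"(?<=[.!?])\s+", paragraph.strip())
        if s.strip()
    ]
    report = []
    for sentence in sentences:
        words = len(sentence.split())
        report.append((sentence, words, lo <= words <= hi))
    return report


opener = (
    "DeepSeek is a Chinese open-weight AI lab. "
    "Its V3 model was trained for less than $6 million. "
    "The weights are published on Hugging Face, which means that every "
    "self-hosted deployment starts from the same base model."
)
report = opener_report(opener)
for sentence, words, in_range in report:
    print(f"{words:>3} words  {'ok ' if in_range else 'fix'}  {sentence}")
```

Sentences flagged `fix` are candidates for splitting into shorter standalone statements.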

2. Add bilingual signals where relevant

DeepSeek was trained on a large bilingual corpus, and English-Mandarin parity appears to matter in retrieval scoring. Adding Mandarin title attributes and alt-text on key images tends to lift retrieval probability for bilingual queries. This compounds with general multilingual GEO strategy. For English-only brands, focus on steps 1, 3, 4, and 5 first.

3. Cite open-source and Chinese-origin sources

DeepSeek’s training mix and retrieval graph appear to over-index on sources that are themselves prominent in the open-source community. Linking out to Hugging Face model cards, arXiv papers, and GitHub repos increases perceived relevance to DeepSeek’s retrieval graph. This is the open-weight analog of why ChatGPT cites Reddit and Wikipedia disproportionately, as we covered in the YouTube and Reddit citation dominance study. In our May 2026 study of 153,425 citations, YouTube led at 9,868 citations and Reddit followed at 6,595, a source-concentration pattern that we believe mirrors DeepSeek’s training data bias.
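To see whether a page already links out to these sources, you can scan its HTML for outbound links. The sketch below uses only the Python standard library. The host list, the class name `LinkCollector`, and the sample markup are assumptions for illustration; adjust the hosts to the sources that matter in your niche.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Assumed list of open-source hosts, taken from the examples in this step.
OPEN_SOURCE_HOSTS = ("huggingface.co", "arxiv.org", "github.com")


class LinkCollector(HTMLParser):
    """Collect the host of every absolute <a href> link in a document."""

    def __init__(self):
        super().__init__()
        self.hosts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            host = urlsplit(href).netloc.lower()
            if host:  # relative links have no host and are skipped
                self.hosts.append(host)


def open_source_links(html: str) -> list[str]:
    """Return hosts of outbound links that point at an open-source host."""
    collector = LinkCollector()
    collector.feed(html)
    return [
        h for h in collector.hosts
        if any(h == d or h.endswith("." + d) for d in OPEN_SOURCE_HOSTS)
    ]


sample = (
    '<p><a href="https://arxiv.org/abs/2311.09735">GEO paper</a> '
    '<a href="https://example.com/blog">blog</a> '
    '<a href="https://www.github.com/">GitHub</a> '
    '<a href="/pricing">pricing</a></p>'
)
found = open_source_links(sample)
print(found)
```

A page that returns an empty list has no outbound open-source citations yet.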

4. Audit robots.txt for DeepSeekBot

DeepSeek’s crawler identifies itself with a user agent starting with “DeepSeekBot.” Many SEO teams have inadvertently blocked it because their robots.txt was last updated when only Googlebot mattered. Audit your robots.txt and confirm DeepSeekBot is explicitly allowed. Our robots.txt for AI crawlers guide covers the full crawler matrix including the April 2026 user agents.
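The audit can be scripted with Python's standard `urllib.robotparser`. The sketch below parses robots.txt content from a string, so it runs offline. The two sample files, the helper name `crawler_allowed`, and the example URL are hypothetical. The `DeepSeekBot` token is the one named in this step; confirm the exact user-agent string against your own server logs before relying on it.

```python
from urllib.robotparser import RobotFileParser


def crawler_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical legacy file: only Googlebot is allowed, everything else is blocked.
legacy = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""

# The same file with an explicit group for the DeepSeek crawler token.
updated = """\
User-agent: Googlebot
Allow: /

User-agent: DeepSeekBot
Allow: /

User-agent: *
Disallow: /
"""

url = "https://example.com/pricing"
print("legacy :", crawler_allowed(legacy, "DeepSeekBot", url))
print("updated:", crawler_allowed(updated, "DeepSeekBot", url))
```

To audit a live site instead, `RobotFileParser` also offers `set_url()` and `read()`, which fetch the file over the network.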

5. Track DeepSeek share of voice as a separate engine

Most AI search trackers aggregate ChatGPT, Perplexity, Claude, and Gemini. Almost none break out DeepSeek separately, which is exactly why brands miss it. Use the framework in our AI brand visibility tracking post and add DeepSeek as a separate engine. We have also covered the broader question of AI search tracking tool selection, and the same tooling extends to DeepSeek with one new query queue.
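Breaking out DeepSeek comes down to keeping the engine as a dimension instead of summing across engines. A minimal sketch, assuming you log one `(engine, cited_domain)` pair per citation; the record format, the function name `share_of_voice`, and the sample data are all hypothetical.

```python
from collections import Counter


def share_of_voice(records, brand_domain: str) -> dict[str, float]:
    """Compute the brand's share of citations separately for each engine.

    `records` is an iterable of (engine, cited_domain) pairs, one per citation.
    """
    totals = Counter()
    hits = Counter()
    for engine, domain in records:
        totals[engine] += 1
        if domain == brand_domain:
            hits[engine] += 1
    return {engine: hits[engine] / totals[engine] for engine in totals}


# Made-up citation log for illustration.
records = [
    ("deepseek", "example.com"),
    ("deepseek", "huggingface.co"),
    ("deepseek", "arxiv.org"),
    ("deepseek", "github.com"),
    ("chatgpt", "example.com"),
    ("chatgpt", "reddit.com"),
]
sov = share_of_voice(records, "example.com")
for engine, share in sov.items():
    print(f"{engine:<10} {share:.0%}")
```

An aggregate figure over both engines would hide the fact that the two shares differ.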

Connecting DeepSeek to broader GEO fundamentals

The GEO paper (arXiv 2311.09735, KDD 2024) documented a visibility improvement of up to 40% from the best method combination, with the cite-sources, quotations, and statistics methods each producing gains of 30-40%. Keyword stuffing performed roughly 10% worse than baseline. Rank-5 sites gained 115.1% while top-ranked sites lost 30.3%, meaning GEO disproportionately benefits sites not already dominant in organic search. These findings apply to DeepSeek retrieval just as they apply to any generative engine.

If you have not yet built out the foundational GEO architecture, start with our explainer on Generative Engine Optimization and the 50-point GEO audit checklist. DeepSeek is one engine in a growing landscape. Our GEO/AEO Tracker (open source on GitHub) lets you track your brand across multiple engines including DeepSeek in a single dashboard.

The prompt research methodology we use with clients applies equally to DeepSeek, but the prompt distributions skew differently. Run the same 50 prompts against DeepSeek and ChatGPT in parallel and you will see the divergence within two weeks. For enterprise teams already running Gemini optimization, the DeepSeek playbook adds roughly 30% incremental effort for coverage of a distinct retrieval surface. Bain & Company found in February 2025 that about 60% of searches now end without the user clicking through to a website. Being the cited source in a DeepSeek answer replaces the organic ranking click.

What to do this week

Run DeepSeek-V4 against your brand's top 25 commercial queries. Save the cited URLs.
Compare those URLs to the same queries on ChatGPT and Perplexity. Non-overlapping URLs are your DeepSeek gap.
Audit robots.txt and confirm DeepSeekBot is explicitly allowed.
Add Mandarin alt-text to your top 10 commercial page hero images.
Add one Hugging Face or arXiv citation to your top 10 commercial pages.
Add DeepSeek as a separate tracked engine in your AI share-of-voice dashboard.
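The URL comparison in the first two items can be sketched as a set difference. This example reads "DeepSeek gap" as URLs that DeepSeek cites and the other engines do not, which is one interpretation. The normalization rules (dropping the scheme, `www.`, and trailing slashes), the function names, and the sample URLs are assumptions. It needs Python 3.9 or later for `str.removeprefix`.

```python
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Reduce a URL to host + path so trivially different forms compare equal."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower().removeprefix("www.")
    path = parts.path.rstrip("/") or "/"
    return f"{host}{path}"


def deepseek_gap(deepseek_urls, other_urls) -> list[str]:
    """Return normalized URLs cited by DeepSeek but by none of the other engines."""
    others = {normalize(u) for u in other_urls}
    return sorted({normalize(u) for u in deepseek_urls} - others)


# Hypothetical cited URLs saved from the same queries.
deepseek_cited = [
    "https://www.example.com/pricing/",
    "https://huggingface.co/deepseek-ai",
]
chatgpt_and_perplexity_cited = [
    "https://example.com/pricing",
    "https://reddit.com/r/seo",
]
gap = deepseek_gap(deepseek_cited, chatgpt_and_perplexity_cited)
print(gap)
```

Swapping the two arguments gives the reverse view: sources the other engines cite that DeepSeek does not.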

The brands that build the DeepSeek muscle now will replicate the playbook fastest when Kimi, Qwen, and Mistral close the performance gap, which Reuters reporting suggests is already happening. For a full engine-by-engine breakdown of how Claude citations differ from DeepSeek, see the individual engine guides. The AI search analytics metrics guide covers the measurement layer for all of them.

Want a DeepSeek-specific GEO audit?

OrganikPI runs DeepSeek share-of-voice audits as part of our standard GEO engagement. We track your brand across ChatGPT, Claude, Gemini, Perplexity, Grok, and DeepSeek as separate engines and give you a per-engine action plan.

Get a DeepSeek share-of-voice audit
