DeepSeek v4

hodgehog11 · 2026-04-24T07:11:05 1777014665

There are quite a few comments here about benchmark and coding performance. I would like to offer some opinions regarding its capacity for mathematics problems in an active research setting.

I have a collection of novel probability and statistics problems at the masters and PhD level with varying degrees of feasibility. My test suite involves running these problems through first (often with about 2-6 papers for context) and then requesting a rigorous proof as followup. Since the problems are pretty tough, there is no quantitative measure of performance here, I'm just judging based on how useful the output is toward outlining a solution that would hopefully become publishable.

Just prior to this model, Gemini led the pack, with GPT-5 as a close second. No other model came anywhere near these two (no, not even Claude). Gemini would sometimes have incredible insight for some of the harder problems (insightful guesses on relevant procedures are often most useful in research), but both of them tend to struggle with outlining a concrete proof in a single followup prompt. This DeepSeek V4 Pro with max thinking does remarkably well here. I'm not seeing the same level of insights in the first response as Gemini (closer to GPT-5), but it often gets much better in the followup, and the proofs can be _very_ impressive; nearly complete in several cases.

Given that both Gemini and DeepSeek also seem to lead on token performance, I'm guessing that might play a role in their capacity for these types of problems. It's probably more a matter of just how far they can get in a sensible computational budget.

Despite what the benchmarks seem to show, this feels like a huge step up for open-weight models. Bravo to the DeepSeek team!

segmondy · 2026-04-24T13:07:12 1777036032

They have had the best math models for about a year most folks just didn't know about it. You can't find inference on APIs, but I run these at home, this is also the advantage of open models.

https://huggingface.co/deepseek-ai/DeepSeek-Math-V2 https://huggingface.co/deepseek-ai/DeepSeek-Prover-V2-671B

simonjgreen · 2026-04-25T11:09:43 1777115383

You are of course specifically referring to the math optimised models, not the chat ones folks would generally encounter. Not that I’m trying to contradict you, your point is super valid and I agree with you! But I’m supplementing to help anyone following along who may make choices.

This is when it happened for anyone interested: https://binaryverseai.com/deepseek-math-v2-benchmarks-review...

jug · 2026-04-25T15:11:35 1777129895

Shouldn't one use e.g a Wolfram Alpha MCP endpoint for math in AI? From what I've seen on even premium non-quantized models, I would never ever trust the innate ability of a LLM to calculate.

lowbloodsugar · 2026-04-24T15:45:46 1777045546

You run a 671B model at home?

segmondy · 2026-04-24T18:15:30 1777054530

Yes, and plenty of others do too. Quantizied. Join us at r/localllama

My largest models

   318G    /llmzoo/models/Qwen3.5-397B
   377G    DeepSeekv3.2-nolight
   380G    /llmzoo/models/DeepSeek-V3.2-UD
   400G    /llmzoo/models/Qwen3.5-397B-Q8
   443G    DeepSeek-Math-v2
   443G    DeepSeek-V3-0324-Q5
   522G    /llmzoo/models/GLM5.1
   545G    /llmzoo/models/kimi2.6
   546G    /llmzoo/models/KimiK2.5

danilocesar · 2026-04-24T22:55:39 1777071339

Is your house's heating system based on H100s?

Liftyee · 2026-04-24T19:09:35 1777057775

What hardware do you use?

MezzoDelCammin · 2026-04-25T10:24:43 1777112683

I think the answer to this is:"yes"

CoolThings · 2026-04-25T11:04:55 1777115095

a Beowulf cluster of 256 x Raspberry Pi 3.

tclancy · 2026-04-25T03:00:03 1777086003

All of it.

chid · 2026-04-25T13:08:40 1777122520

even quantised, those are HUGE

tclancy · 2026-04-24T18:11:48 1777054308

It's a big house.

UncleOxidant · 2026-04-25T02:43:47 1777085027

Maybe if there was a 1-bit quant.

barbacoa · 2026-04-25T19:19:57 1777144797

Apple briefly was selling Mac studio with 512 GB of unified ram, meaning all that was available as vram.

verdverm · 2026-04-24T14:32:08 1777041128

Vertex AI has had deep seek available via API for a while

segmondy · 2026-04-24T18:16:39 1777054599

I'm talking about their specialized math models, not the general model.

PhilippGille · 2026-04-24T14:26:17 1777040777

When you say "Gemini", which exact model do you mean? You know there are several and they vary a lot in how capable they are? Pro 3.1 Preview, 2.5 Pro (their latest non-preview pro model), Flash 3 Preview, ...

Same with GPT-5: Latest 5.5, prior 5.4, or actually the original 5 (.0)?

You can't talk about model performance without specifying the exact model.

hodgehog11 · 2026-04-24T15:27:38 1777044458

My apologies, I thought it would be implicit that I am using the top-tier model of the time given the challenge of the tasks. GPT-5.5 was too new in this top comment (although I did test it a bit in a comment below), so I was using GPT-5.4. Gemini is Pro 3.1 Preview.

WarmWash · 2026-04-24T14:39:23 1777041563

High bet on 3.1 pro. I use it a lot for math and classic engineering, it's very strong.

ozgune · 2026-04-24T09:39:23 1777023563

I reviewed how DeepSeek V4-Pro, Kimi 2.6, Opus 4.6, and Opus 4.7 across the same AI benchmarks. All results are for Max editions, except for Kimi.

Summary: Opus 4.6 forms the baseline all three are trying to beat. DeepSeek V4-Pro roughly matches it across the board, Kimi K2.6 edges it on agentic/coding benchmarks, and Opus 4.7 surpasses it on nearly everything except web search.

DeepSeek V4-Pro Max shines in competitive coding benchmarks. However, it trails both Opus models on software engineering. Kimi K2.6 is remarkably competitive as an open-weight model. Its main weakness is in pure reasoning (GPQA, HMMT) where it trails Opus.

Speculation: The DeepSeek team wanted to come out with a model that surpassed proprietary ones. However, OpenAI dropped 5.4 and 5.5 and Anthropic released Opus 4.6 and 4.7. So they chose to just release V4 and iterate on it.

Basis for speculation? (i) The original reported timeline for the model was February. (ii) Their Hugging Face model card starts with "We present a preview version of DeepSeek-V4 series". (iii) V4 isn't multimodal yet (unlike the others) and their technical report states "We are also working on incorporating multimodal capabilities to our models."

solenoid0937 · 2026-04-24T15:05:11 1777043111

I feel like people suck at promoting Opus. Baseline, it's pretty on par with GPT 5.5.

But if you prompt it well - give it the reasoning behind why you're asking it to do something - it pulls far ahead.

hodgehog11 · 2026-04-24T15:25:32 1777044332

That's fine for procedural tasks, and I understand its value there. But these particular tasks I'm referring to occur on the front lines of research. You can't expect the prompts to be incredibly detailed, since those details are the whole challenge of the problem. I think there is value in having models that are capable of making really good preliminary insights to help guide the research.

cultofmetatron · 2026-04-25T10:21:46 1777112506

I really wanted to get excited about opus but in my own real world usage, I wasn't getting much out of it before hitting my limits. meanwhile i can abuse codex on 5.5 for hours getting a whole lot of work done. Plus, open code and PI are much more fun and interesting harnesses to work from than claude code imho.

I will however say that claude work and design are really great up until i blow its limit.

arcanemachiner · 2026-04-25T10:11:39 1777111899

Would love to know how GLM 5.1 stacks up in this ranking. Seems like it's on par with Kimi K2.6.

bbertelsen · 2026-04-24T18:08:22 1777054102

I'd be interested to know when that Opus 4.6 baseline is from given their recent recognition of performance issues. Do you have a paper posted on this review?

ozgune · 2026-04-25T06:44:26 1777099466

Ack. I took the benchmark results that AI Labs themselves published for their models. So the Opus 4.6 baseline would be from the time that Anthropic released the model.

lifty · 2026-04-24T07:30:15 1777015815

Wondering how gpt 5.5 is doing in your test. Happy to hear that DeepSeek has good performance in your test, because my experience seems to correlate with yours, for the coding problems I am working on. Claude doesn't seem to be so good if you stray away from writing http handlers (the modern web app stack in its various incarnations).

hodgehog11 · 2026-04-24T08:46:54 1777020414

Very cool to hear there is agreement with (probably quite challenging?) coding problems as well.

Just ran a couple of them through GPT 5.5, but this is a single attempt, so take any of this with a grain of salt. I'm on the Plus tier with memory off so each chat should have no memory of any other attempt (same goes for other models too).

It seems to be getting more of the impressive insights that Gemini got and doing so much faster, but I'm having a really hard time getting it to spit out a proper lengthy proof in a single prompt, as it loves its "summaries". For the random matrix theory problems, it also doesn't seem to adhere to the notation used in the documents I give it, which is a bit weird. My general impression at the moment is that it is probably on par with Gemini for the important stuff, and both are a bit better than DeepSeek.

I can't stress how much better these three models are than everything else though (at least in my type of math problems). Claude can't get anything nontrivial on any of the problems within ten (!!) minutes of thinking, so I have to shut it off before I run into usage limits. I have colleagues who love using Claude for tiny lemmas and things, so your mileage may vary, but it seems pretty bad at the hard stuff. Kimi and GLM are so vague as to be useless.

lifty · 2026-04-24T09:29:48 1777022988

My work is on a p2p database with quite weird constraints and complex and emergent interactions between peers. So it's more a system design problem than coding. Chatgpt 5.x has been helping me close the loop slowly while opus did help me initially a lot but later was missing many of the important details, leading to going in circles to some degree. Still remains to be seen if this whole endeavour will be successful with the current class of models.

wohoef · 2026-04-25T11:17:54 1777115874

Do you an idea of how well these models perform on set theory problems or more niche fields in mathematics? So the model would have to both understand a paper that’s not in its training data, and use this to write proofs.

hodgehog11 · 2026-04-25T13:32:02 1777123922

This is all fairly niche stuff I'm trying it on (well, the first three problems anyway), so yes, it needs me to give it several papers that are not in its training data and use them to write proofs. I would expect my experiences to transfer to set theory problems as well.

giwook · 2026-04-25T00:53:12 1777078392

Doesn't the Plus tier not have access to their best (Pro) model?

alansaber · 2026-04-24T12:35:31 1777034131

Very interesting. I wonder how much of this is due to the context length. I am unclear on the implementation strategy, you ran this problem as a 1-shot using chat mode, or using each on an agent harness?

segmondy · 2026-04-24T13:08:31 1777036111

Has nothing to do with context length, they have experience training math models, they have a model that would take gold in IMO and a lean prover. Both have been out for almost a year.

dataviz1000 · 2026-04-25T11:59:44 1777118384

> there is no quantitative measure of performance here

Have them do multiplication or other complicated arithmetic. You say that isn't difficult. Then why do they burn 200k tokens in 20 minutes without converging? I did a deep exploration to help myself understand here [0].

[0] https://adamsohn.com/reliably-incorrect/

bnm04 · 2026-04-24T14:10:16 1777039816

Have you also tried the Pro versions of ChatGPT and Gemini (Deep Think)?

hodgehog11 · 2026-04-24T15:23:02 1777044182

Yes to both, I'm paying for them and use the top-tier thinking models.

nibbleyou · 2026-04-24T07:12:21 1777014741

Curious to know what kind of problems you are talking about here

hodgehog11 · 2026-04-24T07:20:39 1777015239

I don't want to give away too much due to anonymity reasons, but the problems are generally in the following areas (in order from hardest to easiest):

- One problem on using quantum mechanics and C*-algebra techniques for non-Markovian stochastic processes. The interchange between the physics and probability languages often trips the models up, so pretty much everything tends to fail here.

- Three problems in random matrix theory and free probability; these require strong combinatorial skills and a good understanding of novel definitions, requiring multiple papers for context.

- One problem in saddle-point approximation; I've just recently put together a manuscript for this one with a masters student, so it isn't trivial either, but does not require as much insight.

- One problem pertaining to bounds on integral probability metrics for time-series modelling.

MinimalAction · 2026-04-24T13:33:39 1777037619

Regarding the first problem: are you looking at NCP maps for non-Markovian processes given you mention C*-algebra? Or is it more of a continuous weak monitoring of a stochastic system that results in dynamics with memory effects?

I'd be very curious to know how any LLMs fare. I completely understand if you don't want to continue the discussion because of anonymity reasons.

hodgehog11 · 2026-04-24T15:46:27 1777045587

More of the latter. It's a pet project of mine, and all of the LLMs tend to utterly fail at getting anywhere with it, at least in chats. In an agentic setup, it can chip away at some aspects, but it needs serious guidance on relevant language, notation, and concepts. To me, it demonstrates that the LLMs are not particularly good at crossing literatures, but then again, humans rarely seem to be good at that either...

pm2r · 2026-04-24T07:33:53 1777016033

It would be wonderful to have a deeper insight, but I understand that you can disclose your identity (I understand that you work in applied research field, right ? )

hodgehog11 · 2026-04-24T08:54:46 1777020886

Yes, I do mostly applied work, but I come from a background in pure probability so I sometimes dabble in the fundamental stuff when the mood strikes.

Happy to try to answer more specific questions if anyone has any, but yes, these are among my active research projects so there's only so much I can say.

pm2r · 2026-04-24T14:53:38 1777042418

Thanks a lot for your kind but detailed answer. I’m no more in the research field but you gave me good ideas to work on

fuddle · 2026-04-24T19:08:13 1777057693

Any plans to publish the benchmark results?

hodgehog11 · 2026-04-25T08:18:16 1777105096

I have plans to publish the problems, not any plans to publish how well the LLMs perform on them. The standard for publishing benchmarks is very high, and I'm really just posting vibes here. Still, I hope my experiences are useful to some people, as others experiences have been useful to me.

throwa356262 · 2026-04-24T06:17:50 1777011470

Seriously, why can't huge companies like OpenAI and Google produce documentation that is half this good??

https://api-docs.deepseek.com/guides/thinking_mode

No BS, just a concise description of exactly what I need to write my own agent.

u_sama · 2026-04-24T07:45:17 1777016717

I am very partial to Mistral's API docs https://docs.mistral.ai/api

eshack94 · 2026-04-24T18:29:28 1777055368

Agreed, they also have great documentation. There's something to be said for documentation that is so concise, well laid out, and immediately actionable for those looking to get started quickly.

madduci · 2026-04-25T12:36:35 1777120595

For me, DeepSeek has been the best so far, in terms of coding skills, performance and documentation all together. Too bad this is flagged as 'concerning' when it comes to privacy, while on the other hand Gemini, ChatGPT and Claude are way beyond that, especially their mobile apps requiring a lot of permissions.

lykr0n · 2026-04-24T06:35:19 1777012519

It's because they're optimizing for a different problem.

Western Models are optimizing to be used as an interchangeable product. Chinese models are being optimizing to be built upon.

Barbing · 2026-04-24T07:41:36 1777016496

>Western Models are optimizing to be used as an interchangeable product.

But so much investment in their platforms, not just their APIs?

raincole · 2026-04-24T06:38:28 1777012708

> Western Models are optimizing to be used as an interchangeable product

Why? It sounds like the stupidest idea ever. Interchangeability = no lock-in = no moot.

setr · 2026-04-24T08:05:22 1777017922

First you clone the API of the winner, because you want to siphon users from its install-base and offer de-risked switch over cost.

Now that you’re winning, others start cloning your API to siphon your users.

Now that you’re losing, you start cloning the current winner, who is probably a clone of your clone.

Highly competitive markets tend to normalize, because lock-in is a cost you can’t charge and remain competitive. The customer holds power here, not the supplier.

Thats also why everyone is trying to build into the less competitive spaces, where they could potentially moat. Tooling, certs, specialized training data, etc

hunter67 · 2026-04-24T07:34:42 1777016082

Our (western) economic model forces competing individual companies to be profitable quickly. China can ignore DeepSeek losing money, because they know developing DeepSeek will help China. Not every institution needs to be profitable.

naveen99 · 2026-04-24T10:57:14 1777028234

You mean like intel, tesla, spacex, openai ?

deaux · 2026-04-24T12:01:44 1777032104

Ah yes, the Western economic model forcing individual American companies like Amazon , Youtube and Uber to become profitable after.. checks notes _14 years_ for Uber, 9 years for Amazon, many years for Youtube.

FuckButtons · 2026-04-24T07:41:03 1777016463

yes, they want to win the same way they won more or less every other economic competition in the last 30 years, scale out, drop prices and asphyxiate the competition.

simonjgreen · 2026-04-24T07:15:29 1777014929

Yeah, it’s an interesting one. I think inertia and expectations at this point? I don’t think the big labs anticipated how low the model switching costs would be and how quickly their leads would be eroded (by each other and the upstarts)

They are developing their moats with the platform tooling around it right now though. Look at Anthropic with Routines and OpenAI with Agents. Drop that capability in to a business with loose controls and suddenly you have a very sticky product with high switching costs. Meanwhile if you stick with purely the ‘chat’ use cases, even Cowork and scheduled tasks, you maintain portability.

tick_tock_tick · 2026-04-24T07:20:02 1777015202

They are all racing to AGI. They aren't designing them to be interchangeable they just happen to be.

rglullis · 2026-04-24T07:29:10 1777015750

No, they are not. If they were "racing to AGI" they would be working together. OpenAI would still be focused on being a non-profit. Anthropic wouldn't be blocking distillation on their models.

koe123 · 2026-04-24T07:40:04 1777016404

If by AGI you mean IPO, sure. I genuinely don't believe Dario nor Sam should be trusted at this point. Elon levels of overpromising and underdelivering.

djmips · 2026-04-24T08:40:48 1777020048

If by AGI you mean IPO - I automatically read that in Fireship's voice. XD

peepee1982 · 2026-04-24T06:50:10 1777013410

If you want other people to know whether you're being genuine or sarcastic, you'll have to put a bit more effort into your comments. Your comment just adds noise.

kennyloginz · 2026-04-24T07:12:01 1777014721

What da?

vitorgrs · 2026-04-24T06:35:06 1777012506

Meanwhile, they don't actually say which model you are running on Deepseek Chat website.

alansaber · 2026-04-24T12:37:10 1777034230

Because they produce revenue from products which abstract this away

Alifatisk · 2026-04-24T06:24:09 1777011849

You might enjoy Z.ais api docs aswell

kubb · 2026-04-24T07:06:38 1777014398

Western orgs have been captured by Silicon Valley style patrimonialism, and aren’t based on merit anymore.

kccqzy · 2026-04-24T13:37:20 1777037840

I spent only two minutes reading their documentation and it’s clear no one did any proofreading and it’s full of mistakes made by non-native speakers.

Example: the second sentence on the first page says “softwares” but “software” is a mass noun that cannot be pluralized.

Example: the third page about tokens has some zipped code to “calculate the token usage for your intput/output” and obviously “intput” should be “input” but misspelled.

As a company that produces LLMs, they could have even used their own LLM to edit their documentation to fix grammar issues, and yet they did not.

Maybe I’m just extra sensitive to grammar and spelling issues but this kind of lack of attention to detail is a huge subconscious turnoff. I had to fight my urge to close the tab.

Maxatar · 2026-04-24T17:18:08 1777051088

Yeah I think those details are the least of most peoples concerns. I can't vouch one way or another for DeepSeek's documentation but for me what matters most when reading documentation is being able to get the information I want efficiently, not whether someone spelled "software" as "softwares", which is a very common spelling in Asia as an FYI.

I read OpenAI or Anthropic's documentation nowadays and it's just so full of useless junk and self-congratulation that makes it a miserable experience to go through. It's a real shame because OpenAI used to write stellar documentation and publish really lucid papers just few years ago.

aprdm · 2026-04-24T16:59:59 1777049999

No one cares about this kind of stuff. 99% of the devs are not English native speakers, what do you expect ? It works and we all can understand it

kccqzy · 2026-04-24T19:17:12 1777058232

I try hard not to care but subconsciously spelling errors and grammar issues scream low-quality work to me. It’s the kind of mistake that’s the easiest to correct, and they didn’t bother.

u_fucking_dork · 2026-04-24T22:44:42 1777070682

Missing comma in your first sentence was such an egregious grammar error that I was unable to finish reading the rest.

kccqzy · 2026-04-24T23:06:33 1777071993

The phrase “missing comma” is missing an article. You need “a” or “the” before that. As a result when reading your comment, I subconsciously think of it as low quality.

But it’s okay. HN comments aren’t supposed to be high quality anyways. I know mine aren’t. But the official product documentation ought to be.

komali2 · 2026-04-25T02:33:51 1777084431

Why ought it be?

Between you, me, and the Deepseek team, so far as I'm aware, only one entity has caused the Western frontier model companies to panic by delivering an open model that competes far more cheaply, to the point where people are running versions of it at home.

So they spelled software wrong. So what? Outside of this being the mental equivalent of a too-scratchy-sweater for the kinds of people sensitive to that sort of thing, I don't see why it matters.

Those of us that have spent a lot of time programming with non native English speakers (the majority of software engineers on earth) have learned long ago that English ability has no correlation with engineering ability.

diydsp · 2026-04-25T12:00:09 1777118409

It may be a sign deepseek isn't "only for" Americans. Billions of non-native speakers communicate in "flawed" versions of English. Similar for other languages. Circling back to polish instructions for the picky among the Americans... hmm

If it tickles anyone's subconscious feelings, it would be their internal guiding myth of exceptionalism. With their recent forays into authoritarianism, it's becoming ever harder to paper over the reality.

aprdm · 2026-04-24T19:24:22 1777058662

That seems like a you problem

amluto · 2026-04-24T13:53:09 1777038789

The tool calling Python example would have benefitted from actually parsing the tool call. As is, it explains almost nothing.

dackdel · 2026-04-25T05:14:34 1777094074

i dont think deepseek will ever recover from this. huge loss for them. they will stop the pursuit of agi cause of one hn user and a comma.

squirrellous · 2026-04-25T06:29:11 1777098551

This tells me a real developer wrote the docs, instead of someone with good English writing skills but is less technical.

> they could have even used their own LLM to edit their documentation to fix grammar issues

In my experience companies who do this rarely stop at using LLMs to fix grammar issues. It becomes full on LLM speak quite fast, especially if there isn’t a native English speaker in the room who can discern what’s good and bad writing.

replwoacause · 2026-04-25T03:05:30 1777086330

pedantry

slopinthebag · 2026-04-24T19:11:36 1777057896

i prefer it cuz it indicates they didnt use an LLM to write their documentations and that its human generated

jen20 · 2026-04-24T15:25:35 1777044335

> Example: the second sentence on the first page says “softwares” but “software” is a mass noun that cannot be pluralized.

I constantly see and hear this mistake from actual humans too.

It's fairly ironic that your own comment contains run-on sentences, speculative claims and phrasing peculiarities like "could have even" instead of "could even have". Perhaps you are less sensitive to this than you think!

angry_octet · 2026-04-24T15:37:40 1777045060

There is a difference between conversational speech and formal speech like documentation. It isn't rational to criticise use of the first when such speech is complaining about errors in the latter.

It's strange that you criticise "could have even" when it is a phrasing clearly being used for emphasis. "Could even have" makes no clearer sense in context.

No irony detected.

ChrisClark · 2026-04-24T17:31:31 1777051891

Nobody cares, we're talking about quality documentation here, not a couple spelling mistakes

orbital-decay · 2026-04-24T06:35:18 1777012518

>we implement end-to-end, bitwise batch-invariant, and deterministic kernels with minimal performance overhead

Pretty cool, I think they're the first to guarantee determinism with the fixed seed or at the temperature 0. Google came close but never guaranteed it AFAIK. DeepSeek show their roots - it may not strictly be a SotA model, but there's a ton of low-level optimizations nobody else pays attention to.

whatreason · 2026-04-25T01:13:24 1777079604

There have been others for sure, but I'm not sure who was first https://vllm-website-pdzeaspbm-inferact-inc.vercel.app/blog/...

oofbey · 2026-04-25T15:23:27 1777130607

Nobody does it because it’s expensive. If you remove the requirement for perfect reproducibility you open the door to lots of optimizations. Most people prefer faster cheaper results over perfect reproducibility. When the model is intrinsically statistical the value of perfect reproducibility is … limited.

orbital-decay · 2026-04-25T17:07:51 1777136871

Yeah, of course. Making it cheap/compatible with heavy batching is exactly what they did, that's what I mean. ("with minimal performance overhead")

chenzhekl · 2026-04-24T08:12:36 1777018356

It's interesting that they mentioned in the release notes:

"Limited by the capacity of high-end computational resources, the current throughput of the Pro model remains constrained. We expect its pricing to decrease significantly once the Ascend 950 has been deployed into production."

https://api-docs.deepseek.com/zh-cn/news/news260424#api-%E8%...

XCSme · 2026-04-24T11:49:27 1777031367

Yup, I tried to benchmark it, but harder questions time out or get rate-limited...

nsoonhui · 2026-04-24T09:59:39 1777024779

Sorry, but exactly where in the article that you linked contains the mention of " Ascend 950"?

chenzhekl · 2026-04-24T10:03:43 1777025023

it's in the footnote text of the first figure of the section the link points to, where "昇腾950" means "Ascend 950"

nsoonhui · 2026-04-24T10:31:39 1777026699

OK, strange that it doesn't appear on my version of the webpage

https://api-docs.deepseek.com/zh-cn/news/news260424#api-%E8%...

This is the first figure of the section that the above links point to (https://api-docs.deepseek.com/zh-cn/img/v4-spec.png).

And I can read Chinese.

chenzhekl · 2026-04-24T11:05:03 1777028703

https://api-docs.deepseek.com/zh-cn/img/v4-price.png

gertlabs · 2026-04-24T16:17:46 1777047466

Objective, detailed benchmark results at https://gertlabs.com

Early takeaways: from this release, DeepSeek V4 Flash is the model to pay attention to here. It's cheap, effective, and REALLY fast.

The Pro model is slow, not much better in coding reasoning so far when it works, and honestly too unreliable and rate limited to be of much use, currently. Hopefully that improves as new providers host the model. Flash is working fine, and is currently performing competitively with recent releases, but only on agentic workflows. Check back in 24 hours for full combined scoring with tool use and long context for both models.

Many of the frontier Chinese AI labs have released near-frontier models that are just a little bit behind Opus 4.6 in terms of speed, tool use ability, or long context handling. Open weights are winning the AI race, led by China. Crazy couple weeks of releases.

Mimo V2.5 Pro by Xiaomi (not open weights) is actually the best performer of the latest string of Chinese releases in our combined, comprehensive benchmarks, despite getting less attention. Kimi K2.6 is the most interesting open weights release, still. DeepSeek is not the leader in the space anymore.

An interesting pattern with the latest string of Chinese releases is the much better agentic boost (models are not as smart out of the box, but their ability to iterate in a loop with tools makes up most of the difference). Deepseek V4 Flash exemplifying this -- not a smart model on the first try, but it makes up for it over the course of a session.

Squarex · 2026-04-24T18:11:24 1777054284

I would say all benchmarks are inherently subjective. How is yours better? It seems to produce a little bit strange results. Opus 4.6 being worse than 4.5 for example. Or chinese models being rated too high. Kimi, Deepseek or GLM are all great in open source world, but I don't believe they are ahead of SOTA models from Anthropic, OpenAI or Google.

gertlabs · 2026-04-24T18:36:35 1777055795

No, some benchmarks are definitely objective, but most can be easily gamed. For example, most of the benchmarks on the model cards: they have measurable answers that don't rely on a human judge (a human made the question, but the answers are measuring some uncontroversial knowledge or capability). But because there is a single, correct answer, and those answer leak (or are randomly discovered and optimized for in training), they lose value over time, and regardless, they have a ceiling on the intelligence they can measure.

Others are purely subjective, like LMArena, which really only measures the personality and style preferences of the masses at this point, because frontier LLM technical answers are too hard for the average person to judge.

Then there are some interesting one-off benchmarks, but they lack enough rigor, breadth, and samples to draw larger conclusions from.

So we designed our benchmark with 3 goals: objective measurements (individual submissions not dependent on a human or LLM judge), no known correct answer (so simulations can scale to much higher levels of intelligence), and enough variety over important aspects of intelligence. We do this by running multiple models in cooperative/competitive environments with very complex action spaces and objective scoring, where model performance is relative and affected by the actions of other participants.

And yeah, there are some interesting results when you have a more objective benchmark. It should raise eyebrows when every single sub-release of every company's model is better across the board than its predecessor -- that isn't reality.

Squarex · 2026-04-24T18:58:18 1777057098

The word "objective" just seems too authoritative to me.

segmondy · 2026-04-24T18:31:24 1777055484

you are arguing with your belief instead of an objective truth. benchmark is more objective, if you don't agree with it, come up with a better one. but what you believe doesn't matter.

Squarex · 2026-04-24T18:55:39 1777056939

It was not a confrontational take. But all benchmarks are designed by humans, we are not that great at measuring intelligence. So it is somewhat subjective. I was just arguing with the word "objective". Not with the results per se.

swiftcoder · 2026-04-25T07:03:10 1777100590

If the benchmark has a correct answer, the benchmark itself is an objective measure (but of what?). The "of what" may well be subjective

tw1984 · 2026-04-25T11:27:27 1777116447

I agree that benchmarks are inherently subjective.

but the fact that you cite your brief as your main argument is funny - you don't even have any inherently subjective numbers to justify what you believe, you only have "I don't believe".

Squarex · 2026-04-25T18:59:48 1777143588

Sure, I have mixed up two things together. I don't think this benchmark is bad, I just did not like it is presented as the ultimate objective truth. The other thing I have mentioned is that it delivers different results from other benchmarks, so the "believe" stems from other benchmarks.

dandaka · 2026-04-24T17:35:08 1777052108

Interesting that you rate Claude Opus 4.6 lower than 4.5 and 4.7, while community consensus puts it on top.

nostrebored · 2026-04-25T05:00:34 1777093234

I think most hardcore people I know are still sticking with 4.5 for coding workflows

kamranjon · 2026-04-24T17:32:40 1777051960

I'm particularly interested in it being REALLY fast - do you have any rough tok/s numbers for the flash model? I'm excited for unsloth to drop some quants that I can try and run locally, but really curious how it's been performing speed wise. In general I actually over-index on speed over intelligence. I'd rather a model make mistakes quickly and correct in a follow-up than take forever to get a slightly better initial result.

gertlabs · 2026-04-24T17:39:29 1777052369

Take a look at the Time column in https://gertlabs.com/?mode=oneshot_coding -- this is the total time to complete a solution for a reasonably complex problem end-to-end (you would have to divide by avg submission size to estimate tok/s). It's fast in the sense that most of the smart, recent Chinese releases are quite slow, especially the DeepSeek Pro variant. Opus 4.7 is also quite fast.

If pure speed is most important for your use case, GPT-5.3 Chat is the fastest model we've tested and it's still reasonably smart. Not meant for agentic tool usage / long context, though.

So it might be more useful for business applications or non-engineering usage where you don't need exceptional intelligence, but it's useful to get fast, cheap responses.

Lord_Zero · 2026-04-24T16:45:49 1777049149

Why no mention of GPT-5.5?

gertlabs · 2026-04-24T16:51:52 1777049512

Waiting on public API release. Once it drops, results will be up within 24 hours.

gertlabs · 2026-04-25T03:43:13 1777088593

Results are up. GPT 5.5 is a beast.

wahnfrieden · 2026-04-25T04:47:35 1777092455

Have you considered running models like GPT 5.5 inside their agent harness (Codex)?

gertlabs · 2026-04-25T07:03:34 1777100614

I see the value in that, but there are a few reasons that isn't on the immediate roadmap -- mainly, it shifts focus from measuring the model to measuring the harness. The agentic benchmark section you see on the site is comparable to how an agent would perform using an open harness like Pi. But latest tool-using models are pretty well adapted to any harness, so I think that's less of a factor in overall model performance.

wahnfrieden · 2026-04-25T07:59:46 1777103986

Just fresh on my mind after reading this from Codex team member re: performance difference between Pi and Codex app server usage: https://x.com/pashmerepat/status/2046865863979172039

ZeroGravitas · 2026-04-25T10:36:55 1777113415

Well that couldn't be vaguer if he tried. Basically saying, our stuff is better, no reasons given.

wahnfrieden · 2026-04-25T19:40:01 1777146001

Yeah that's why I'm advocating for measuring it in this thread. Some of these models are trained specifically for their official harnesses

revolvingthrow · 2026-04-24T05:42:28 1777009348

> pricing "Pro" $3.48 / 1M output tokens vs $4.40

I’d like somebody to explain to me how the endless comments of "bleeding edge labs are subsidizing the inference at an insane rate" make sense in light of a humongous model like v4 pro being $4 per 1M. I’d bet even the subscriptions are profitable, much less the API prices.

edit: $1.74/M input $3.48/M output on OpenRouter

schneehertz · 2026-04-24T06:06:20 1777010780

This price is high even because of the current shortage of inference cards available to DeepSeek; they claimed in their press release that once the Ascend 950 computing cards are launched in the second half of the year, the price of the Pro version will drop significantly

Bombthecat · 2026-04-24T07:02:31 1777014151

In six month deepseek won't be sota anymore und usage will be wayyyy down.

randomgermanguy · 2026-04-24T09:47:10 1777024030

Only comparing on SOTA scores (ignoring price etc.) is like choosing your daily-driver by looking at who makes the fastest sports-car...

LinXitoW · 2026-04-24T10:34:24 1777026864

The constant improvements of SOTA are the main thing keeping the investment machine running. We can't really remove training costs from inference costs, because a bunch of the funding and loans for the inference hardware only exists because the promises the continuous training (tries to) provides.

dnnddidiej · 2026-04-24T09:57:17 1777024637

Not really. SOTA vs non SOTA is "can I get my coding work actually done today" vs. "this can do customer support chat"

It is like car vs. kick scooter.

regularfry · 2026-04-24T11:02:11 1777028531

It really isn't. We get coding work actually done today on Opus 4.5. That's not SOTA any more, and anything proximate to that level, even quite loosely, is genuinely useful.

dnnddidiej · 2026-04-24T11:07:31 1777028851

OK we are in Opus 4.5 is not SOTA. Right by that definition .... yes you are right.

randomgermanguy · 2026-04-24T11:47:54 1777031274

I mean its almost halve a year, i think that counts ?

dnnddidiej · 2026-04-24T23:14:18 1777072458

Time wise you are correct.

randomgermanguy · 2026-04-24T11:54:20 1777031660

> "can I get my coding work actually done today" vs. "this can do customer support chat"

I think you need to define "can get coding work done" for this to make sense. Ive been using GPT-3 back-then for basic scripts, does that count ? Or only Claude-Code ?

I also think this is a false dichotomy, if you look at the Project Vend project or Vending-Bench, customer support etc. is at no means trivial. (Old but great story https://www.businessinsider.com/car-dealership-chevrolet-cha...)

UlisesAC4 · 2026-04-24T17:32:42 1777051962

This, I have been doing my side hustle code with open code an 3.2 reasoner and it is way better than what I have at day job with copilot and whatever models are there.

wahnfrieden · 2026-04-25T04:50:43 1777092643

Copilot is a bad harness that perverts the productivity of models like GPT 5.5.

dnnddidiej · 2026-04-24T23:15:08 1777072508

Tell me more please!

2ndorderthought · 2026-04-24T10:59:46 1777028386

A huge proportion of those scores are gamed anyways. Use whatever works for you at the price and availability you can afford

Palmik · 2026-04-24T09:58:26 1777024706

Or there will be DSv4.1/2/3 ;)

randomgermanguy · 2026-04-24T11:57:07 1777031827

Definitely something in this realm, they call the models "preview" at a bunch of different points in the paper.

What im really hoping is for a double-punch like with V3 -> R1

Barbing · 2026-04-24T07:43:03 1777016583

Well, if they distilled once…

menzoic · 2026-04-24T07:04:06 1777014246

API prices may be profitable. Subscriptions may still be subsidized for power users. Free tiers almost certainly are. And frontier labs may be subsidizing overall business growth, training, product features, and peak capacity, even if a normal metered API call is profitable on marginal inference.

dannyw · 2026-04-24T08:04:14 1777017854

Research and training costs have to be amortized from somewhere; and labs are always training. I'm definitely keen for the financials when the two files for IPO though, it would be interesting to see; although I'm sure it won't be broken down much.

m00x · 2026-04-24T06:08:06 1777010886

They are profitable to opex costs, but not capex costs with the current depreciation schedules, though those are now edging higher than expected.

nl · 2026-04-24T08:25:42 1777019142

Amazingly, the current depreciation overestimates the retained value of GPUs.

In 2023, the depreciation schedule for H100s was 2 years, but they are still oversubscribed and generating signficant income.

Coreweve has upped their depreciation for GPUs to 6 years(!) now, which seems more realistic.

https://www.silicondata.com/blog/h100-rental-price-over-time

amunozo · 2026-04-24T06:48:10 1777013290

I was thinking the same. How can it be than other providers can offer third-party open source models with roughly the similar quality like this, Kimi K2.6 or GLM 5.1 for 10 times less the price? How can it be that GPT 5.5 is suddenly twice the price as GPT 5.4 while being faster? I don't believe that it's a bigger, more expensive model to run, it's just they're starting to raise up the prices because they can and their product is good (which is honest as long as they're transparent with it). Honestly the movement about subscription costing the company 20 times more than we're paying is just a PR movement to justify the price hike.

peepee1982 · 2026-04-24T07:00:41 1777014041

I'm pretty sure OpenAI and Anthropic are overpricing their token billed API usage mainly as an incentive to commit to get their subscriptions instead.

simonjgreen · 2026-04-24T07:18:05 1777015085

Anthropic recently dropped all inclusive use from new enterprise subscriptions, your seat sub gets you a seat with no usage. All usage is then charged at API rates. It’s like a worst of both worlds!

peepee1982 · 2026-04-24T07:32:21 1777015941

What's the point then? Special conditions for data retention/non-training policies?

simonjgreen · 2026-04-24T07:34:13 1777016053

SSO Tax is a large part of it, controls around plug-in marketplace, enforcement of config, observeability of spend. But it’s all pretty weak really for $20 a month.

And Microsoft are going the same route to moving Copilot Cowork over to a utilisation based billing model which is very unusual for their per seat products (I’m actually not sure I can ever remember that happening).

weird-eye-issue · 2026-04-24T07:09:59 1777014599

The target audience for the APIs is third party apps which are not compatible with the subscriptions.

peepee1982 · 2026-04-24T07:32:34 1777015954

True. I missed that.

adam_patarino · 2026-04-24T11:46:59 1777031219

Prices are not just hard cost of inference. Training costs are not equal. Chinese labs have cheaper access to large data centers. I also suspect they operate far more efficiently than orgs like openAI.

mirzap · 2026-04-24T06:04:52 1777010692

My thoughts exactly. I also believe that subscription services are profitable, and the talk about subsidies is just a way to extract higher profit margins from the API prices businesses pay.

Bombthecat · 2026-04-24T07:04:24 1777014264

Google stated a while back, that with tpus they are able to sell at cost / with profit.

Aka: everyone who uses Nvidia isn't selling at cost, because Nvidia is so expensive.

LinXitoW · 2026-04-24T10:37:50 1777027070

They got loans to buy inference hardware on the promise of potential AGI, or at least something approaching ASI, all leading to stupid amounts of profit for those investors.

We therefore cannot just look at inference costs directly, training is part of the pitch. Without the promises of continuous improvement and chasing the elusive AGI, money for investments for inference evaporates.

WarmWash · 2026-04-24T14:46:28 1777041988

Because you are comparing China to the US.

In China you need to appease state goals. In the US you need to appease investor goals.

China will keep funding them regardless of their income, because the goal is (ostensibly) a state AGI/ASI. In the US, the goal is an ROI which may or may not come with AGI/ASI.

They are different economies with different goals. We can look at past Chinese national projects and see that they are fine with burning $50 to get [social goal] that's worth $5.

ting0 · 2026-04-24T21:01:41 1777064501

This is nonsense. The real reason is because the US companies are scamming the public, as per usual.

vitorgrs · 2026-04-24T06:41:02 1777012862

And they actually say the prices will be "significantly" lower in second semester when Huawei 650 chips comes in.

raincole · 2026-04-24T06:19:59 1777011599

Insert always has been meme.

But seriously, it just stems from the fact some people want AI to go away. If you set your conclusion first, you can very easily derive any premise. AI must go away -> AI must be a bad business -> AI must be losing money.

louiereederson · 2026-04-24T11:53:19 1777031599

It is possible to question the sustainability of the AI buildout and not have a dogmatic position on AI development.

There are still major unanswered questions here. For instance, all of the incremental data capacity build out is going to businesses that have totally unknown LT unit economics and that today are burning obscene amounts of cash.

evilos · 2026-04-24T20:39:18 1777063158

The people who doubted the sustainability of dot com era bubbles were correct even though the tech was actually transformational. Personally I expect roughly the same outcome.

zarzavat · 2026-04-24T06:24:50 1777011890

Before the AI bubble that will burst any time now, there was the AI winter that would magically arrive before the models got good enough to rival humans.

jimmydoe · 2026-04-24T06:43:57 1777013037

They’ve also announced Pro price will further drop 2H26 once they have more HUAWEI chips.

masafej536 · 2026-04-24T06:06:35 1777010795

Point taken but there isnt any western providers there yet. Power is cheaper in china.

3uler · 2026-04-24T06:17:51 1777011471

These models are open and there are tons of western providers offering it at comparable rates.

NitpickLawyer · 2026-04-24T06:12:39 1777011159

As this is a new arch with tons of optimisations, it'll take some time for inference engines to support it properly, and we'll see more 3rd party providers offer it. Once that settles we'll have a median price for an optimised 1.6T model, and can "guesstimate" from there what the big labs can reasonably serve for the same price. But yeah, it's been said for a while that big labs are ok on API costs. The only unknown is if subscriptions were profitable or not. They've all been reducing the limits lately it seems.

ithkuil · 2026-04-24T08:10:22 1777018222

Is there evidence that frontier models at anthropic, openai or google or whatnot are not using comparable optimizations to draw down their coats and that their markup is just higher because they can?

persedes · 2026-04-24T14:21:38 1777040498

not soooo much though. It's heavily subsidized for residential consumption, but industrial power rates are almost comparable to the US (depends on the state you go to etc).

ting0 · 2026-04-24T21:00:54 1777064454

They don't make sense, they're a lie that these AI companies keep spamming using bots so that useful idiots perpetuate it, so that they can keep draining us of money. Straight out of the Anthropic handbook. They've always been cheap to run. I wouldn't be surprised if Anthropic is running for <$1 for 1M/tok.

dminik · 2026-04-24T06:41:26 1777012886

I mean, not one "bleeding edge" lab has stated they are profitable. They don't publish financials aside from revenue. And in Anthropic's case, they fuck with pricing every week. Clearly something is wrong here.

npn · 2026-04-24T10:10:47 1777025447

you know, if you don't have to pay insane salary for your top engineers, and don't have to pay billions for internet shills to control the narrative, then all of the labs will be insane profitable.

crazylogger · 2026-04-24T07:29:34 1777015774

I haven't seen anyone claiming that API prices are subsidized.

At some point (from the very beginning till ~2025Q4) Claude Code's usage limit was so generous that you can get roughly $10~20 (API-price-equivalent) worth of usage out of a $20/mo Pro plan each day (2 * 5h window) - and for good reason, because LLM agentic coding is extremely token-heavy, people simply wouldn't return to Claude Code for the second time if provided usage wasn't generous or every prompt costs you $1. And then Codex started trying to poach Claude Code users by offering even greater limits and constantly resetting everyone's limit in recent months. The API price would have to be 30x operating cost to make this not a subsidy. That would be an extraordinary claim.

nl · 2026-04-24T08:30:46 1777019446

The claim that APIs are subsidized is very common.

eg:

Token prices are significantly subsidized and anyone that does any serious work with AI can tell you this.

https://news.ycombinator.com/item?id=47684887

(the claims don't make any sense, but they are widely held)

vessenes · 2026-04-24T09:38:46 1777023526

I’ll note that it’s common and dangerous, in that there’s a generation of engineers who are at risk of leading each-other astray as to the economics and therefore probability distribution of outcomes for some firms that will massively impact their careers.

I think I understand the major reasons for this meme, but I find it really worrying; there were lots of incorrect ‘it’s a bubble’ conversations here in 2012-2015, but I don’t think they had the pervasive nature and “obvious” conclusion that a whole generation of engineering talent should just, you know, leave.

Meanwhile I am hearing rational economic modeling from the companies selling inference; Jensen, (a polished promoter, I grant you) says it really well — token value is increasing radically, in that new models -> better quality, and therefore revenues and utilization are increasing, and therefore contrary to the popular financial and techbro modeling of 2023, things like A100s still cost quite a lot whether hourly or to purchase. (!) Basically the economic value is so strong that it has actually radically extended the life of hardware.

I just hate to imagine like half of the world’s (or US’s) engineering talent quitting, spending ten years afraid, or wrongly convinced of some ‘inevitable’ market outcome. Feels like it will be bad for people’s personal lives, and bad for progress simultaneously.

mike_hearn · 2026-04-24T16:05:23 1777046723

People shouldn't be quitting the industry, agreed. There's plenty of work to do even with AI assistance.

But how is that a counterpoint to tokens being subsidized? They obviously are subsidized, this just isn't arguable at all. The claims in the linked post make perfect sense. If they weren't subsidized the investors in AI labs would all be minting money instead of burning it.

It doesn't matter if token value is increasing. What matters is how fast it increases relative to the price increases, the repayments on the debt loads and other things we can't really know here on this forum.

Every attempt I've seen to argue this fact away is merely playing with numbers e.g. excluding every cost except inf hardware+energy, even though labs are always training and have large costs outside of compute. This might or might not be a good way to predict the future of these orgs, but it doesn't help anyone argue inference is profitable today (because inference is literally the only thing OpenAI/Anthropic sell and they lose money).

The whole computing industry is in a super weird place right now that feels temporary, like Wile E. Coyote spinning his legs suspended in mid air. Until the economics of the AI industry stop being driven by FOMO and weird, hard to interpret quasi-religious or geopolitical motivations, it's impossible to make accurate predictions about what the impact on software jobs will be. Historically a tech like this would have started at super-high prices and the token cost would have gradually fallen over a period of decades, giving people plenty of time to adapt. Look at the cost of flying, desktop computers, mobile phones, etc. AI is attempting to short circuit that normal technological path and pack decades into years by convincing capital holders that they have no choice but to "invest" because it'll be a winner-takes-all repeat of web search and social media. Yet it's not shaping up that way.

nl · 2026-04-25T00:18:05 1777076285

> But how is that a counterpoint to tokens being subsidized? They obviously are subsidized, this just isn't arguable at all.

Why would Microsoft subsidize Anthropic's models when they serve the Claude model on Azure? They charge the same price as Anthropic. They aren't an investor in Anthropic.

There are numerous independent model serving companies that are clearly profitable serving non-Frontier models (Kimi K2.5 etc). It's easy to work out the raw costs of B200 GPUs, and then see what you need to charge for an API and see they make money.

The frontier labs charge a lot more than these companies.

The frontier labs have said they are profitable on inference.

Most people believe that training (and maybe subscriptions for some users) is where they lose money. Why do you think otherwise?

mike_hearn · 2026-04-25T13:44:06 1777124646

Who says it's MS subsidizing those prices and not Anthropic themselves? Just because someone rehosts a model doesn't imply they get to set whatever price levels they want.

I don't think otherwise, I just think it's meaningless to differentiate between training and inference. What the frontier labs sell is inference. They can't just exclude costs required to engage in that business unless they plan a pivot to just serving Chinese models in a commodified market.

Yes, tokens for random no-name firms serving Kimi K2 probably do make money, although even there it's unclear because so many datacenters and GPU purchases have been made on credit etc. And if we assume that's sustainable forever then you can assume training/staffing costs should be subsidized to zero and say sure, token serving is profitable in that situation. But we were discussing the top labs.

dannyw · 2026-04-24T08:10:11 1777018211

Yeah, subscriptions used to be extraordinarily generous. I miss those days, but the reinvigoration of open weight models is super exciting.

I'm still playing with the new Qwen3.6 35B and impressed, now DeepSeek v4 drops; with both base and instruction-tuned weights? There goes my weekend :P

Flavius · 2026-04-24T09:23:11 1777022591

It's because investors in OpenAI/Anthropic want to get their money back in 10 months, not in 10 years.

casey2 · 2026-04-24T08:33:50 1777019630

It's the decades of performance doesn't matter SV/web culture. I'd be surprised if over 1% of OpenAI/Anthropic staff know how any non-toy computer system works.

sekai · 2026-04-24T06:31:34 1777012294

> I’d like somebody to explain to me how the endless comments of "bleeding edge labs are subsidizing the inference at an insane rate" make sense in light of a humongous model like v4 pro being $4 per 1M. I’d bet even the subscriptions are profitable, much less the API prices.

One answer - Chinese Communist Party. They are being subsidized by the state.

lbreakjai · 2026-04-24T08:43:16 1777020196

When China does it it's communism. When companies in the west get massive tax cuts, rebates, incentives and subsidies, that's just supporting the captains of industry.

jari_mustonen · 2026-04-24T06:34:26 1777012466

Open Source as it gets in this space, top notch developer documentation, and prices insanely low, while delivering frontier model capabilities. So basically, this is from hackers to hackers. Loving it!

Also, note that there's zero CUDA dependency. It runs entirely on Huawei chips. In other words, Chinese ecosystem has delivered a complete AI stack. Like it or not, that's a big news. But what's there not to like when monopolies break down?

nabakin · 2026-04-24T16:55:26 1777049726

> Also, note that there's zero CUDA dependency. It runs entirely on Huawei chips.

That is a huge claim to make with no evidence.

I researched what you said, and I have found no statement to that effect in their paper[0], on huggingface[1], twitter[2], WeChat[3], or in their news release[4].

They only mention as a footnote in only the Chinese version of their news release that they plan to reduce inference costs with the Ascend 950 supernode when it releases[5]. The only mention of Huawei in their paper is that they validated a technique to lower interconnect bandwidth on Ascend NPUs and Nvidia GPUs[6].

[0] https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main...

[1] https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro

[2] https://xcancel.com/deepseek_ai/status/2047516922263285776

[3] https://mp.weixin.qq.com/s/8bxXqS2R8Fx5-1TLDBiEDg

[4] https://api-docs.deepseek.com/news/news260424

[5] https://api-docs.deepseek.com/zh-cn/img/v4-price.png

[6] Page 16

glenstein · 2026-04-24T17:22:00 1777051320

Comments like this are why I go to the comments! I never would have thought to check.

And while I'm here I want to note that I feel there's a big misunderstanding of what is and isn't demonstrated by DeepSeek. So far as I can tell the major (and important!) innovation is reproducing near-frontier level capabilities at a fraction of the cost, but it may be the case that iterating forward at the frontier is the costly thing and is a cost borne by Western companies and that nuance seems to get lost with DeepSeek. Which is not to say that as a matter of principle that non Western companies aren't sometimes capable of jumping into the lead (Kimi has been super impressive) but if GPT/Claude/etc "only" lead at the frontier with more expensive models, that's still a moat.

kybernetikos · 2026-04-24T22:12:26 1777068746

If you can get something almost as capable for a fiftieth of the price, in most cases you'll do that. You might still send a few tokens to the more expensive option for the exceptional, difficult cases, but that's maybe 10% of the tokens at most. I don't see how it'll be possible to keep spending what anthropic, openai, google etc are spending if they're only going to see the trickiest 10% of tokens.

MiiMe19 · 2026-04-25T02:30:44 1777084244

Missed the point award

kybernetikos · 2026-04-25T10:32:29 1777113149

Maybe I need to spell out the step that connects them - how will those companies afford to keep "iterating forward at the frontier" when they probably have a huge crash in their income coming from competition with good enough, but 1/50th the price cheaper and open models.

Iterating forward at the frontier doesn't seem like a sustainable approach if everyone else can catch up with you in 6 months.

Scipio_Afri · 2026-04-24T20:08:16 1777061296

Thank you for this due diligence, I was just reading through the technical report and couldn’t find any references to the software stack or hardware mentioning Huawei either and came back here wondering about this comment that I had read earlier.

jari_mustonen · 2026-04-24T18:22:56 1777054976

Here's a note about running entirely on Huawei chips:

https://finance.yahoo.com/sectors/technology/articles/deepse...

tadfisher · 2026-04-24T19:04:45 1777057485

> DeepSeek indicated that current service capacity for the V4 Pro series is constrained by a computing crunch, though pricing could fall after new clusters powered by Huawei's Ascend 950 chips come online in the second half of the year.

Only mention of Huawei in that article (as of now).

selectodude · 2026-04-24T19:05:14 1777057514

Did you read any part of the link you posted? Huawei is mentioned once and not in the context of the model being trained or currently running on Huawei chips.

vedaba · 2026-04-24T19:27:10 1777058830

Dammit, you found my technique of “citing” sources for papers in high school...

selectodude · 2026-04-24T20:01:00 1777060860

At least when I pulled random citations off Wikipedia I could reasonably trust whoever put it there figured it was tangentially related to what was being cited. I’m not sure I could get away with putting a literal press release that I didn’t read anywhere.

Big L for media literacy there.

chvid · 2026-04-24T20:11:34 1777061494

Not long ago the story was this:

DeepSeek’s next AI model delayed by attempt to use Chinese chips

https://www.ft.com/content/eb984646-6320-4bfe-a78d-a1da2274b...

czk · 2026-04-24T19:52:48 1777060368

They mention it uses MXFP4 quant which is a blackwell capability but it looks like this is also supported by ascend 950 series according to marketing material

kappi · 2026-04-24T17:40:35 1777052435

DeepSeek is planning to use Huawei extensively for inference

“Due to constraints in high-end compute capacity, the current service capacity for Pro is very limited. After the 950 supernodes are launched at scale in the second half of this year, the price of Pro is expected to be reduced significantly.”

https://x.com/jukan05/status/2047516566149816627

nabakin · 2026-04-24T18:01:29 1777053689

Yes, that's the footnote from citation [5].

nsoonhui · 2026-04-24T23:43:52 1777074232

I said the same thing as you and I got summarily downvoted (https://news.ycombinator.com/item?id=47888227).

That HN is quick to upvote an unsubstantiated comment ( the grandparent one, because it aligns with the anti US bias? ) and downvote fact finding one doesn't bode too well for the community as a whole. I have seen enough how polticial ideology colors everything in my home country( Malaysia), and the decline of the country is palpable, and I don't expect to find such a thing here. We are supposed to be impassioned and rational, right ?

Render to Jesus what's due to him, ditto for Caeser.

nabakin · 2026-04-25T00:38:58 1777077538

Probably because you said you used DeepSeek. People don't want to see AI in the comments and don't trust AI responses.

dzonga · 2026-04-24T09:52:59 1777024379

Jensen Huang said this in his recent interview - that China has the best/most engineers, it has the chip making ability, it's a good thing they wanna build on a Nvidia stack - but if you push them they will build on an all Chinese stack - but the interviewer was being a numb head who kept parroting the propaganda of Western tech supremacy

zdragnar · 2026-04-24T17:01:26 1777050086

They would have moved to their own stack regardless. They've got the people and resources for it, and they've witnessed the fallout of globalization and experienced dependency on semi-hostile political powers enough to know that it's the smart move.

It's also more or less the same move that they've been using pretty much since the WTO entry: take on foreign manufacturing, copy the products, sell knockoffs as their own, build new products on top of the that knowledge.

arcticfox · 2026-04-24T12:52:34 1777035154

Referring to the Dwarkesh interview clearly.

Jensen came across as incredibly defensive and intentionally close-minded, shows that even billionaires suffer from "a man can't understand something if his paycheck depends on him not understanding it."

Your assertion is silly: did Tesla selling electric cars into China stop them from delivering their own industry? They were going to develop their domestic industry regardless.

We simply don't know the counterfactual, if they had unlimited access to Nvidia chips, how far ahead would their models be?

awongh · 2026-04-24T13:10:48 1777036248

I thought Jensen’s comparison to Huawei’s cell phone hardware infra (towers and networking) to be an interesting comparison- that shutting them out of a market was one of the causes of their current position in the market. It made them more dominant in the end.