WithinReason · 2026-06-14T09:16:57 1781428617

Which tools? Even file reads and writes?

bob1029 · 2026-06-14T09:26:54 1781429214

Especially these things.

The only tools permissible to root in my scheme are call() and return().
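A minimal sketch of that restriction, assuming a simple per-agent allowlist design. Every name here (`Agent`, `use`, the `worker` agent, the in-memory `FILES` store) is hypothetical. In a real system `call()` would hand a whole task to a sub-agent rather than forward one tool request.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Agent:
    """Hypothetical agent that may only use the tools registered for it."""
    name: str
    tools: dict[str, Callable[..., str]] = field(default_factory=dict)

    def use(self, tool: str, *args: str) -> str:
        # Any tool not registered for this agent is rejected outright.
        if tool not in self.tools:
            raise PermissionError(f"{self.name} may not use {tool!r}")
        return self.tools[tool](*args)


# Hypothetical worker that is allowed to touch files (stubbed with a dict).
FILES = {"notes.txt": "hello"}
worker = Agent("worker", {"read_file": lambda path: FILES[path]})

# The root only gets call() and return(): it delegates, it never acts directly.
root = Agent("root", {
    "call": lambda tool, *args: worker.use(tool, *args),  # forward to the worker
    "return": lambda result: result,                      # hand a result back up
})

text = root.use("call", "read_file", "notes.txt")  # delegated file read
print(root.use("return", text))                    # prints "hello"
```

Calling `root.use("read_file", "notes.txt")` directly raises `PermissionError`, which is the point: file I/O can only happen one level down.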

WithinReason · 2026-06-14T09:42:49 1781430169

Is it in pi.dev? Don't thinking tokens still take up context?

WithinReason · 2026-06-13T08:06:09 1781337969

Folding@home reached 2.43 exaflops by April 12, 2020, which would make it the largest supercomputer on the planet.

sho · 2026-06-13T09:25:15 1781342715

It's down 99% from that peak, but let's compare to it anyway.

It's pretty useless to compare raw FLOPS, but as a hand-waving guesstimate, F@H is currently doing about 25 petaflops in a mix of FP16 and FP32. AI usually trains at FP8, but to keep things fair: the H100 is quoted at 60 FP64 teraflops per unit, so given Colossus 1's count of roughly 200k GPUs, that's 12 FP64 exaflops.

So F@H at its peak did 2.43 exaflops at FP16/FP32, while Colossus 1 does about 12 exaflops at FP64. These numbers are very hand-wavy, but I think the point is made.
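The back-of-envelope arithmetic can be checked in a few lines. All inputs are the approximate figures quoted in this comment; nothing here is a measured benchmark.

```python
# Approximate figures quoted in the comment (not measured values).
FAH_PEAK_FLOPS = 2.43e18   # Folding@home peak, April 2020 (FP16/FP32 mix)
FAH_NOW_FLOPS = 25e15      # Folding@home today, ~25 petaflops
H100_FP64_FLOPS = 60e12    # quoted FP64 throughput per H100
COLOSSUS_GPUS = 200_000    # GPU count used for Colossus 1

colossus_flops = H100_FP64_FLOPS * COLOSSUS_GPUS

print(f"Colossus 1: {colossus_flops / 1e18:.1f} EF")                     # 12.0 EF
print(f"Drop since F@H peak: {1 - FAH_NOW_FLOPS / FAH_PEAK_FLOPS:.1%}")  # 99.0%
print(f"Colossus vs F@H peak: {colossus_flops / FAH_PEAK_FLOPS:.1f}x")   # 4.9x
print(f"Colossus vs F@H now: {colossus_flops / FAH_NOW_FLOPS:.0f}x")     # 480x
```

So even comparing FP64 against F@H's lower-precision peak, Colossus 1 comes out roughly 5x ahead, and about 480x ahead of F@H today.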

By the way, I'm not trying to crap on F@H - I think it's an outstanding project and I've run it in the past. But a volunteer group simply cannot compete with well-funded, concentrated effort like what's going into AI.

WithinReason · 2026-06-13T07:58:48 1781337528

The efficiency difference between training on GPUs and TPUs is at most 2x. You can get very efficient with tensor cores, converging on TPU efficiency. In the end math is math: you can't make a multiplication more efficient than it already is on a GPU.

schobi · 2026-06-13T08:37:33 1781339853

I guess this was more related to syncing GPUs.

If you were to take 500 computers with older GTX 1080 GPUs, you might have enough compute and RAM to match an H200 GPU for training such a model. Maybe take 10,000.

But if those machines are spread over 10,000 homes, each on a residential internet connection, training a large model will not get anywhere.

You go from "data in the same HBM memory chip" at 4.8 TB/s, or "data in an adjacent GPU" over NVLink at 1.2 TB/s, down to a 25 Mbit/s upload speed. Accessing the next piece of data is going to be about a million times slower. At the same time, you will generate a thousand times more heat, for a million times longer.
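The "million times slower" figure follows from the quoted bandwidths once the units are aligned: HBM and NVLink are quoted in bytes per second, the home uplink in bits per second. This sketch compares bandwidth only and ignores latency, which would make the residential case even worse.

```python
# Bandwidth figures from the comment; note bytes (TB/s) vs bits (Mbit/s).
HBM_BYTES_PER_S = 4.8e12        # H200 HBM, 4.8 TB/s
NVLINK_BYTES_PER_S = 1.2e12     # GPU-to-GPU over NVLink, as quoted
HOME_UPLOAD_BITS_PER_S = 25e6   # residential upload, 25 Mbit/s

home_upload_bytes_per_s = HOME_UPLOAD_BITS_PER_S / 8  # bits -> bytes

hbm_ratio = HBM_BYTES_PER_S / home_upload_bytes_per_s
nvlink_ratio = NVLINK_BYTES_PER_S / home_upload_bytes_per_s

print(f"HBM vs home upload:    {hbm_ratio:,.0f}x")     # 1,536,000x
print(f"NVLink vs home upload: {nvlink_ratio:,.0f}x")  # 384,000x
```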

incrudible · 2026-06-13T09:21:32 1781342492

You need to train independently and merge rarely. The problem is the merge step: weights are too entangled, so you are not going to get an improvement commensurate with the effort. Otherwise, everyone would do it. It is an open research problem.
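"Train independently, merge rarely" in its simplest form is local SGD (the idea behind FedAvg): each worker takes many local steps, then parameters are averaged. A minimal sketch, assuming a toy 1-D linear model and synthetic noiseless data split across four hypothetical workers:

```python
import random


def local_sgd(shards, rounds=20, local_steps=50, lr=0.05):
    """Local SGD sketch: each worker trains alone, then weights are averaged.

    Toy model y = w*x + b, squared-error loss; the merge is plain averaging.
    """
    w, b = 0.0, 0.0                         # shared starting point
    for _ in range(rounds):
        local_models = []
        for shard in shards:                # each worker trains independently
            lw, lb = w, b
            for _ in range(local_steps):
                x, y = random.choice(shard)
                err = lw * x + lb - y
                lw -= lr * err * x          # gradient of 0.5*err**2 w.r.t. w
                lb -= lr * err              # gradient of 0.5*err**2 w.r.t. b
            local_models.append((lw, lb))
        # The rare merge step: average the workers' parameters.
        w = sum(m[0] for m in local_models) / len(local_models)
        b = sum(m[1] for m in local_models) / len(local_models)
    return w, b


random.seed(0)
# Synthetic data from y = 3x + 1, split across 4 hypothetical workers.
data = [(x, 3 * x + 1) for x in (random.uniform(-1, 1) for _ in range(400))]
shards = [data[i::4] for i in range(4)]
w, b = local_sgd(shards)
print(w, b)  # should land very close to 3.0 and 1.0
```

Averaging works here only because this toy model is convex with a single optimum that every worker heads toward. With large neural networks, independently trained weights are not interchangeable in this way, which is exactly the merge problem described above.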

filup · 2026-06-13T10:33:03 1781346783

That sounds like the way. Everyone trains on their own small problem down to maximally compressed weights, and then merges.

zozbot234 · 2026-06-13T08:35:21 1781339721

The power-constrained part of compute is data movement, not the elementary arithmetic per se. Anyway, it's very possible to tweak the underlying design to increase throughput a lot for any given power budget at the cost of high latency. This seems especially useful for training workloads where we don't really care about latency as much.

GeoAtreides · 2026-06-13T12:54:53 1781355293

Math is math, but sadly math is neither physics nor engineering.

pvirgiliu · 2026-06-13T15:48:03 1781365683

Math has physics.

WithinReason · 2026-06-13T07:56:08 1781337368

The gradient information can be compressed 10,000x with the right tricks; I think it is achievable. Nous Research claims they have already done it:

https://github.com/NousResearch/DisTrO

There are other, earlier gradient compression papers reporting large compression rates.
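To make "compressed 10,000x" concrete, here is a sketch of one classic trick from that literature: top-k sparsification with error feedback. This is a swapped-in illustrative technique, not DisTrO's method, and the class and function names are hypothetical. Only the k largest-magnitude entries are sent each step; whatever is dropped is carried over and added to the next gradient.

```python
import heapq
import random


def topk_compress(grad, k):
    """Keep only the k largest-magnitude entries, as (index, value) pairs."""
    idx = heapq.nlargest(k, range(len(grad)), key=lambda i: abs(grad[i]))
    return [(i, grad[i]) for i in idx]


def decompress(pairs, n):
    """Rebuild a dense vector of length n from (index, value) pairs."""
    out = [0.0] * n
    for i, v in pairs:
        out[i] = v
    return out


class ErrorFeedbackCompressor:
    """Top-k sparsification with error feedback: dropped mass is carried over."""

    def __init__(self, n, ratio):
        self.k = max(1, n // ratio)
        self.residual = [0.0] * n

    def step(self, grad):
        # Add back what earlier rounds failed to send.
        corrected = [g + r for g, r in zip(grad, self.residual)]
        pairs = topk_compress(corrected, self.k)
        sent = decompress(pairs, len(grad))
        # Remember what was dropped this round for the next one.
        self.residual = [c - s for c, s in zip(corrected, sent)]
        return pairs


random.seed(0)
n = 100_000
comp = ErrorFeedbackCompressor(n, ratio=10_000)  # keep 10 of 100,000 values
grad = [random.gauss(0, 1) for _ in range(n)]
pairs = comp.step(grad)
print(len(pairs))  # 10
```

The ratio here counts values only; each kept value also needs its index, so the real byte savings are smaller than the nominal 10,000x.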

WithinReason · 2026-06-12T10:22:44 1781259764

This likely says something about the harness Fable was trained in. It knows how to do this because it has done this millions of times during reinforcement learning.

WithinReason · 2026-06-10T06:38:21 1781073501

https://www.forbes.com/sites/anishasircar/2026/04/17/ai-solv...

WithinReason · 2026-06-09T19:51:38 1781034698

It's a meme, and HN loves upvoting memes. Just like Reddit!

WithinReason · 2026-06-09T16:45:30 1781023530

The clone is you, though, assuming it's a perfect copy.

WithinReason · 2026-06-09T14:23:13 1781014993

There is a similar analysis from the Netherlands.

WithinReason · 2026-06-05T07:17:37 1780643857

Not everyone can afford to spend millions to publish a paper.

spindump8930 · 2026-06-05T17:40:01 1780681201

That's why you do several small- and medium-scale tests, fit a curve, and ideally show that the trend persists across several scales, rather than relying on a single large or medium run. See the other comments downthread for example sizes.
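A sketch of the "fit a curve" step, assuming the simplest scaling-law form, a pure power law L = a * N^(-b), fit by least squares in log-log space. The losses below are synthetic, generated from an assumed law with a = 10 and b = 0.1; real runs are noisy and often need an extra irreducible-loss term.

```python
import math


def fit_power_law(sizes, losses):
    """Least-squares fit of log(L) = log(a) - b*log(N); returns (a, b)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(loss) for loss in losses]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    a = math.exp(my - slope * mx)
    return a, -slope


# Synthetic small and medium "runs" from an assumed law L = 10 * N**-0.1.
sizes = [1e6, 3e6, 1e7, 3e7, 1e8]
losses = [10 * n ** -0.1 for n in sizes]

a, b = fit_power_law(sizes, losses)
print(f"a={a:.2f}, b={b:.3f}")                                # a=10.00, b=0.100
print(f"extrapolated loss at N=1e10: {a * 1e10 ** -b:.3f}")   # 1.000
```

Checking that the fitted exponent stays stable as larger runs are added is what "the trend persists at several scales" amounts to in practice.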