petters · 2026-06-04T14:23:29 1780583009

> If we give an LLM a prompt that reads “The following is a conversation between Julius Caesar and Genghis Khan,” it will generate a coherent dialogue between the two historical figures. But no matter how detailed the responses are, no matter how vividly they recount their respective historical accomplishments, we would never conclude that the LLM has conjured up digital re-creations of Julius Caesar and Genghis Khan, nor would we suggest that the historical figures are conscious

They might be, in principle. It could be that the best way to generate a plausible dialogue is to bring up re-creations of the characters and have them act it out. LLMs have definitely been shown to have world models in some cases, which helps with generating text.

petters · 2026-05-30T19:21:34 1780168894

It also means you'll have to onboard a lot of new employees all the time. That sounds exhausting.

petters · 2026-05-16T07:29:40 1778916580

Wikipedia is amazing. The Swedish articles are even longer.

Symbiote · 2026-05-16T10:01:11 1778925671

I agree — but I was expecting actual runes to be shown, not just a transcription into Latin characters.

Like on https://en.wikipedia.org/wiki/Jelling_stones (ᚼᛅᚱᛅᛚᛏᚱ᛬ᚴᚢᚾᚢᚴᛦ᛬ᛒᛅᚦ᛬ᚴᛅᚢᚱᚢᛅ etc).

petters · 2026-05-03T07:44:32 1777794272

Many dismiss Dawkins here but Ilya Sutskever wrote in 2022: “it may be that today's large neural networks are slightly conscious.”

3748499449 · 2026-05-03T08:09:47 1777795787

IS quite literally gets paid to think that

petters · 2026-05-03T20:18:15 1777839495

Karpathy replied to IS with “agree” at the time

Towaway69 · 2026-05-03T13:40:26 1777815626

Well then it must be so. Btw what exactly is “consciousness”? Oh, we don’t really know that either.

So two concepts we don’t fully understand (AI and consciousness) seem to be uniting into something we definitely won’t understand. Which doesn’t matter, since humankind is busy doom scrolling, talking about what color Trump’s fart was last night, and invading each other’s countries.

/s

petters · 2026-05-03T07:41:01 1777794061

We have a very good idea of all the math behind chemistry. But the equations are very difficult to solve.

ekianjo · 2026-05-03T08:29:33 1777796973

We are not talking about the same thing. Not all chemical reactions are predictable like math is. Organic chemistry is full of lucky findings. Just look at how catalysts are discovered.

petters · 2026-04-22T17:24:08 1776878648

He is much better at building hardware than he is at writing software.

KeplerBoy · 2026-04-22T17:59:34 1776880774

He seems pretty damn good at both.

petters · 2026-04-19T21:01:20 1776632480

That's a good idea and it exists: https://www.johndcook.com/blog/2026/04/18/qlora/

It seems quite wasteful to have two zeros when you only have 4 bits in total.

saulpw · 2026-04-19T22:08:20 1776636500

OTOH, it seems quite plausible that the most important numbers to represent are:

   +0
   -0
   +1
   -1
   +inf
   -inf

parsimo2010 · 2026-04-19T23:23:51 1776641031

In standard FP32, the infs are represented as a sign bit, all exponent bits=1, and all mantissa bits=0. The NaNs are represented as a sign bit, all exponent bits=1, and the mantissa is non-zero. If you used that interpretation with FP4, you'd get the table below, which restricts the representable range to +/- 3, and it feels less useful to me. If you're using FP4 you probably are space optimized and don't want to waste a quarter of your possible combinations on things that aren't actually numbers, and you'd likely focus your efforts on writing code that didn't need to represent inf and NaN.

  Bits s exp m  Value
  -------------------
  0000 0  00 0     +0
  0001 0  00 1   +0.5
  0010 0  01 0     +1
  0011 0  01 1   +1.5
  0100 0  10 0     +2
  0101 0  10 1     +3
  0110 0  11 0   +inf
  0111 0  11 1    NaN
  1000 1  00 0     -0
  1001 1  00 1   -0.5
  1010 1  01 0     -1
  1011 1  01 1   -1.5
  1100 1  10 0     -2
  1101 1  10 1     -3
  1110 1  11 0   -inf
  1111 1  11 1    NaN
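
For anyone who wants to check the table, here is a minimal Python sketch of a decoder for that layout. It assumes 1 sign bit, 2 exponent bits with a bias of 1, and 1 mantissa bit, which is the reading that reproduces the values above. The name `decode_fp4` is made up for this illustration and is not any real library's API.

```python
import math

def decode_fp4(bits: int) -> float:
    """Decode 4 bits laid out as sign | 2 exponent bits | 1 mantissa bit (assumed bias 1)."""
    sign = -1.0 if (bits >> 3) & 1 else 1.0
    exp = (bits >> 1) & 0b11
    man = bits & 1
    if exp == 0b11:
        # All exponent bits set: inf if the mantissa is zero, NaN otherwise.
        return math.nan if man else sign * math.inf
    if exp == 0:
        # Subnormal: no implicit leading 1, so the value is 0 or 0.5.
        return sign * (man * 0.5)
    # Normal: implicit leading 1, scaled by 2^(exp - bias).
    return sign * (1 + man * 0.5) * 2.0 ** (exp - 1)

for b in range(16):
    print(f"{b:04b} {decode_fp4(b)}")
```

Under this interpretation 4 of the 16 bit patterns decode to inf or NaN, which is the "quarter of your possible combinations" mentioned above.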

saulpw · 2026-04-21T03:14:57 1776741297

I can see the most important values being:

   ± 0 (infinitesimal)
   ± 10^-2n
   ± 10^-n
   ± 1 (unity)
   ± 10^n
   ± 10^2n
   ± infinity

For fp4, this leaves 2 values. Maybe one of them should be NaN. What should the other one be?

Dwedit · 2026-04-19T23:00:02 1776639602

Why waste a slot on -0?

adampunk · 2026-04-20T13:00:58 1776690058

You need it if you want the idea of total ordering over the extended Reals. There's +/- infinity--an affine closure, not projective (point at infinity)--so to make that math work you need to give 0 a sign.

saulpw · 2026-04-19T23:49:53 1776642593

Because it means "infinitesimal negative" which is distinct from "infinitesimal positive".

Dylan16807 · 2026-04-20T02:08:07 1776650887

That sounds pretty niche. What's a use case where you have less than 8 bits and that distinction is more important than having an extra finite value? I don't think AI is one.

jlokier · 2026-04-20T03:55:38 1776657338

For neural net gradient descent, automatic differentiation, etc., the widely used ReLU function has information-carrying derivatives at +0 and -0 if those are infinitesimals.

Dylan16807 · 2026-04-20T06:19:59 1776665999

Barely any information. After surviving ReLU that signed zero is probably getting added to another value and then oops the information is gone. It sounds a lot worse than properly spaced values.
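
The "added to another value" point is easy to see with ordinary IEEE 754 doubles. This is a small Python sketch, not FP4; `sign_of` is just a helper name chosen for the illustration.

```python
import math

def sign_of(x: float) -> float:
    """Return +1.0 or -1.0 according to the sign bit, which also works for zeros."""
    return math.copysign(1.0, x)

neg_zero = -0.0

# The sign bit is there, even though -0.0 compares equal to 0.0.
assert sign_of(neg_zero) == -1.0
assert neg_zero == 0.0

# Adding a nonzero value gives the same result whichever zero you started from.
assert neg_zero + 0.25 == 0.0 + 0.25

# Adding +0.0 under the default rounding mode yields +0.0, so the sign is gone.
assert sign_of(neg_zero + 0.0) == 1.0
```

The sign only survives a sum when every term is a negative zero, which is the narrow case being argued about here.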

saulpw · 2026-04-20T06:33:26 1776666806

sign = most important bit of information

Dylan16807 · 2026-04-20T10:32:05 1776681125

If you were looking at the entire number line, sign would roughly be the most important part.

But you still have all the other numbers carrying sign info. This is only the sign of denormals and that's way less valuable. Outside of particular equations it ends up added to something else and disappearing entirely. It would be way better to cut it and have either half the smallest existing positive value or double the largest existing value as a replacement. Or many other options.

petters · 2026-04-18T07:16:23 1776496583

You could add a feature where it will compute the global optimum of any function of a small number of variables. Branch and bound with interval arithmetic works well for a small number of variables.

Disjoint unions of intervals seems like a nice thing to have
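
A rough sketch of what I mean by branch and bound with interval arithmetic, in Python. Everything here (`Interval`, `sq`, `minimize`, the test function `f`) is made up for illustration. It also skips outward rounding, so the bounds are not rigorous the way a real interval library's would be.

```python
import heapq
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def __add__(self, other):
        o = _iv(other)
        return Interval(self.lo + o.lo, self.hi + o.hi)
    __radd__ = __add__

    def __sub__(self, other):
        o = _iv(other)
        return Interval(self.lo - o.hi, self.hi - o.lo)

    def __mul__(self, other):
        o = _iv(other)
        p = (self.lo * o.lo, self.lo * o.hi, self.hi * o.lo, self.hi * o.hi)
        return Interval(min(p), max(p))
    __rmul__ = __mul__

    def sq(self):
        # Tighter than self * self because it knows both factors are the same x.
        lo, hi = sorted((abs(self.lo), abs(self.hi)))
        return Interval(0.0 if self.lo <= 0 <= self.hi else lo * lo, hi * hi)

def _iv(x):
    return x if isinstance(x, Interval) else Interval(x, x)

def _midpoint(box):
    return [Interval(m, m) for m in ((i.lo + i.hi) / 2 for i in box)]

def minimize(f, box, tol=1e-6, max_iter=100_000):
    """Return (lower, upper) bounds on the global minimum of f over box."""
    best_upper = f(*_midpoint(box)).hi
    heap = [(f(*box).lo, 0, box)]      # best-first: smallest lower bound on top
    counter = 1                        # tie-breaker so boxes are never compared
    for _ in range(max_iter):
        if not heap:
            break
        lower, _, b = heapq.heappop(heap)
        if best_upper - lower <= tol:
            return lower, best_upper
        # Branch: bisect the widest side of the box.
        k = max(range(len(b)), key=lambda i: b[i].hi - b[i].lo)
        m = (b[k].lo + b[k].hi) / 2
        for part in (Interval(b[k].lo, m), Interval(m, b[k].hi)):
            child = b[:k] + [part] + b[k + 1:]
            best_upper = min(best_upper, f(*_midpoint(child)).hi)
            child_lower = f(*child).lo
            # Bound: drop boxes that cannot contain anything better.
            if child_lower <= best_upper:
                heapq.heappush(heap, (child_lower, counter, child))
                counter += 1
    return (heap[0][0] if heap else best_upper), best_upper

# Two global minima (x = -1 and x = +1, y = 0.5), both with value 0.
def f(x, y):
    return (x.sq() - 1).sq() + (y - 0.5).sq()

lower, upper = minimize(f, [Interval(-2, 2), Interval(-2, 2)])
print(lower, upper)
```

The work grows quickly with the number of variables because every split doubles the number of boxes along one axis, which is why this is practical mainly for a small number of variables.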

petters · 2026-04-15T05:07:44 1776229664

Yes, that blog post could have been much shorter….

petters · 2026-04-11T20:00:43 1775937643

They have found a large number in OpenSSL