>Eventually, someone will figure out how to get hundreds of LLMs supervising teams of millions of LLMs to do some really wild stuff that is currently completely impossible.
This is an intuitive direction. In fact, it’s so intuitive that it’s a little bit odd that nobody seems to have made proper progress with LLM swarm computation.
This sounds like that old economics joke that says it's impossible to find $20 on the ground, because if it had been there, someone would have already picked it up.
Context window is a limitation, but have we actually hit the ceiling on scaling it? For GPT-style models, standard attention needs O(N^2) VRAM in the context length N, but that is ultimately an "I need more hardware" problem; as I understand it, the reason they don't go higher is economic viability, not that it couldn't be done in principle. And there are many interesting hardware developments in the pipeline, now that engineers know exactly what kind of compute they can narrowly optimize for.
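To put rough numbers on the O(N^2) point, here is a back-of-the-envelope sketch. It assumes naive attention that materializes the full N x N score matrix for every head; the head count (32) and 2-byte values are made-up illustrative defaults, not the figures of any particular model.

```python
def attention_scores_bytes(context_len, num_heads=32, bytes_per_value=2):
    """Bytes for one layer's N x N attention score matrices (naive attention).

    num_heads and bytes_per_value are illustrative assumptions.
    """
    return num_heads * context_len * context_len * bytes_per_value


if __name__ == "__main__":
    for n in (2048, 8192, 32768):
        gib = attention_scores_bytes(n) / 2**30
        print(f"N={n:>6}: {gib:.2f} GiB per layer")
```

Doubling the context length quadruples this term, which is why it reads as a cost problem rather than a hard wall.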
So, perhaps, there aren't swarms yet just because there are easier ways to scale for now?
Large parts of your brain are fairly general, but in particular places there are more specialized areas. Looking at it, you would most likely consider it all the same brain, but from a systems-thinking view, each specialized area is a small separate brain with a slightly different task from the rest of the brain.
If 80% of the processors in a cluster are running a 'general LLM' and 20% are running a 'math LLM', are they the same cluster? Could you host the cluster in a different data center? What if you want to try out different math LLM modules with the general intelligence?
I think I would consider them split when the different modules are interchangeable, so that there is a de facto interface.
In the case of the brain, while certain functional regions are highly specialized, I would not consider them "a small separate brain". Functional regions are not sub-organs.
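The "interchangeable modules behind a de facto interface" criterion can be sketched in code. Everything here is hypothetical: `MathModule`, `SumModule`, `RefusingModule`, `GeneralModel`, and the `math:` prefix routing are invented for illustration and stand in for real models.

```python
from typing import Protocol


class MathModule(Protocol):
    """The de facto interface: anything with this method can be plugged in."""

    def solve(self, expression: str) -> str: ...


class SumModule:
    """Hypothetical math module: adds integers separated by '+'."""

    def solve(self, expression: str) -> str:
        return str(sum(int(term) for term in expression.split("+")))


class RefusingModule:
    """A second hypothetical module with the same interface."""

    def solve(self, expression: str) -> str:
        return "I don't know"


class GeneralModel:
    """Stand-in for the 'general LLM'; hands math prompts to the module."""

    def __init__(self, math_module: MathModule) -> None:
        self.math_module = math_module

    def answer(self, prompt: str) -> str:
        if prompt.startswith("math:"):
            return self.math_module.solve(prompt[len("math:"):])
        return f"(general answer to: {prompt})"


if __name__ == "__main__":
    for module in (SumModule(), RefusingModule()):
        print(GeneralModel(module).answer("math: 2 + 3"))
```

The general model never needs to know which math module it was given, and by the criterion above that is the point at which the two count as split.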