At the end of the day, the processor can only do Turing operations: assigning values to variables (registers, memory locations, storage), loops, bitwise operations, and conditionals. Whether the source code is Python, Java, or Lisp, it ultimately has to compile or interpret down to machine code. Likewise, whether the running software is a word processor, DOOM, or an LLM, it will be executed by the processor using those few kinds of operation. Lots of other fancy hardware and software may accelerate things, but ultimately those ops are the running code. The rest is many wonderful conveniences and abstractions.
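To make the point about primitive operations concrete, here is a minimal sketch in Python (chosen only for readability). It builds addition and multiplication out of nothing but assignment, loops, conditionals, and bitwise operations. The function names `add` and `multiply` are my own illustration, and the sketch assumes non-negative integers; the function definitions themselves are just a convenience for organizing the example.

```python
def add(a: int, b: int) -> int:
    """Add two non-negative integers using only AND, XOR, shift, and a loop."""
    while b != 0:          # loop + conditional test
        carry = a & b      # bits where both inputs are 1 produce a carry
        a = a ^ b          # sum of the bits, ignoring carries
        b = carry << 1     # carries move one position to the left
    return a


def multiply(a: int, b: int) -> int:
    """Shift-and-add multiplication of two non-negative integers."""
    result = 0
    while b != 0:
        if b & 1:          # lowest bit of b set: include the current a
            result = add(result, a)
        a = a << 1         # double a
        b = b >> 1         # halve b, dropping the bit just handled
    return result


print(add(19, 23))      # 42
print(multiply(6, 7))   # 42
```

Everything higher level, from a word processor to a matrix multiply inside an LLM, is layers of this kind of thing, just with far better hardware support.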
Then you're assuming an efficiency that is analogous to how Moore's law made it efficient for chips. Same difference. The problem is that AI scaling in the longest term is a completely unknown problem.
Training improvements and Moore's Law are "analogous" but not "same difference." They are far from the same thing, governed by completely different factors, and one can happen and has been happening independently from the other.
Well, I never said nor meant that; rather, my third (3) sentence should've hinted that I already believe what you are saying in your second sentence (2). Whereas my second (2) sentence was handwaving at the notion that if the parent commenter's remark (about improvement trends) were to be assumed, then the rational argument must be subject to the same standards, ergo same difference (in argument standards). (Also, I use a phone; please excuse any confusion due to not spelling out my online opinions in full.)
To clarify another way, the parent commenter and obviously many, many lay people seem to think ALL sorts of technology improve eventually, and are always very assured of that. That's a common mistaken premise or axiom used in their arguments. (Arguably Moore's law (up until now) has been a factor in confounding this observation, because so much other tech has historically benefited from it directly or indirectly.)
Sorry, but a plain reading of your comment does not imply at all that you agree with me, rather the opposite. I'm not basing my opinion on any mistaken axiom of inevitable technology improvement, of course. I'm projecting obvious trends of the past few years which are overwhelmingly likely to continue in the medium term.
"Same difference" could only mean that you believe my argument should fail in the same way as an argument based on Moore's law. If that's not what you meant then you should have used different words. If that is what you meant, with the justification that "AI scaling in the longest term is a completely unknown problem", I disagree with that too.
In the "longest term" the ultimate scaling of AI doesn't matter for the original question of whether "AGI will most likely only be for the rich". Nobody looks at the TOP500 list today and says "computing is only for the rich". This is because we have an abundance of iPhones and gaming PCs in the consumer market, providing practically any application of computing that a consumer could want at very attainable prices. Similarly, practically any application of AGI will be accessible to consumers at attainable prices. Continued AI scaling after a certain point will be relevant mostly to industry (whose products will still be priced attainably, analogously to the way weather forecasts produced on TOP500 supercomputers are readily accessible to the public today).
Well, it's half-and-half. Why did Apple struggle for so long with Siri and its pre-LLM era technology, during the time of AlphaGo and so forth, and then after Covid why didn't Apple pivot to something like their own version of Gemini?
But there are lots of differing possible reasons for this, and I think it is premature to conclude with any one in particular.
Disagree. Someone like the other commenter, who points out that LLMs don't even understand the domain concepts correctly, and someone who uses them anyway for corporate proprietary results have very different standards for what is acceptable. If you wrangle an LLM with harnesses and clever prompts you could get some amazing results, but that has more to do with trial and error and creativity, not some kind of fundamental skill of using LLMs.
It definitely understands the concepts well enough if you give it the right context. I'm not the only one saying this either. Like I said, it's a skill issue.
That's the Clever Hans argument, and the fact that you confidently use this unfalsifiable tactic ("Give it just the right context and it understands stuff!! It works!!" (Well, until the next iteration and then the next, until the system paints itself into a corner)) tells me you are engaging in broscience / pseudoscience. Like I say, anti-scientific attitudes like yours are part of the problem, fanning the hype. It's bad faith to attribute people's criticisms of LLMs to some kind of lack of skill. People on here, many of whom are actual scientists and professional programmers, are very intelligent and highly trained. If they wanted to play around with LLMs, they are very likely capable of getting impressive one-time results, but proper, sustained use in a non-"vibe-coding" manner, with guarantees for validity, consistency, replicability, extensibility, and so forth, is a completely open problem. Therefore it is out of proportion to reduce that to human skill. It's analogous to framing a bad design pattern as user error: disingenuous and in bad faith. Ironically, with an intellectual standard like that, it becomes easy to become overconfident about LLMs.
That's amazing, as someone who struggles to find something useful to do with LLMs. How long does this take, several minutes or more? Do you need a paid version of Claude Code for this?
It sat there for about half an hour working out the problem, step by step, before asking me for the preferred solution. At one point, it was trying to decompile the .APK, so I interrupted it and reminded it that Kodi was open source - it was welcome to clone from GitHub.
The only other feedback I gave it mid-process was wrong (I said that the crash probably wasn't caused by cache trimming; it ran some additional tests to confirm that its hunch about cache trimming was right).
This was with the paid version of Claude Code (I don't think they offer a free version at all; that's a Codex thing). The $20 version is as smart as the $200 one, but once you work out it can do stuff like this you'll quickly burn the $20 token limit. :)
The other thing that helps is a CLAUDE.md file - authored of course by Claude itself.
Mine's here: https://github.com/EspoTek/.claude/blob/master/CLAUDE.md
A lot of it is probably domain-specific for the stuff I do, but the "Working with unfamiliar data or systems" section is bloody gold! Stopped the bullshit completely!
Not the person you were asking, but IMHO it all reduces to computational complexity. E.g. biological evolution provided the computational efficiencies that ultimately produced conscious minds and beings, whereas it is not obvious what scale of silicon, power or energy, and input data is sufficient for that to happen artificially. So my view is that it is possible in principle, merely unknown in practice. Also, my view is that denying this amounts to violating the Church-Turing thesis of computational equivalence ("human brains are not magic, super-Turing, etc."), and I think a lot of the talking past one another in these public disagreements comes down to one side not having absorbed enough modern CS theory fundamentals to be persuaded of these couple of premises.
That's my take on it too, roughly. I think if we get to trillion-parameter models and they don't exhibit what we'd call AGI, however you define it, then the current transformer-based systems never will.
But calling them "unconscious" is a pretty high bar. Mice are conscious. The house sparrow pecking in my yard right now is conscious.