I did an experiment today where I had a new Claude agent review the work of a previous Claude agent (both Opus 4.6) on a large refactor of a 16k LOC project. I had it address every issue it found, then I cleared context and repeated. Rinse and repeat. It took 4 iterations before it devolved into nitpicking. The fact that each agent found new, legitimate problems that the last one had missed was concerning to me. Why can’t it find all of them at once?
An LLM is more like a wiggly search engine. You give it a (wiggly) query and a (wiggly) corpus, and it returns a (wiggly) output.
If you are looking for a wiggly sort of thing, 'MAKE Y WITH NO BUGS' or 'THE BUGS IN Y', it can be kinda useful. But thinking of it as a person because it vaguely communicates like a person will get you into trouble, because it's not one.
You can try to paper over it with some agent harness or whatever, but you are really making a slightly more complex wiggly query that handles some of the deficiency space of the more basic wiggly query: "MAKE Y WITH NO ISSUES -> FIND ISSUES -> FIX ISSUE Z IN Y -> ...".
OK well what is an issue? _You_ are a person (presumably) and can judge whether something is a bug or a nitpick or _something you care about_ or not. Ultimately, this is the grounding that the LLM lacks and you do not. You have an idea about what you care about. What you care about has to be part of the wiggly query, or the wiggly search engine will not return the wiggly output you are looking for.
You cannot phrase a wiggly query referencing unavailable information (well, you can, but it's pointless). The following query is not possible to phrase in a way an LLM can satisfy (and this is the exact answer to your question):
- "Make what I want."
What you want is too complicated, and too hard, and too unknown. Getting what you are looking for reduces to: query for an approximation of what I want, repeating until I decide it no longer surfaces what I want. This depends on an accurate conception of what you want, so only you can do it.
If you remove yourself from the critical path, the output will not be what you want. Expressing what you want precisely enough to ground a wiggly search would just be something like code, which obviates the need for wiggly searching in the first place.
Just build Mac apps then. Claude Code can whip up real native apps just fine, without any Glaze dependencies. I’ve built 4 Mac and iOS apps in the last 6 months for my own use. I even have my own HN app for iOS and Mac.
Even if you don't like Electron, I was able to get Claude to build Electrobun and Tauri apps as well. I don't understand what benefit Glaze will bring, outside of more lock-in.
LLMs absolutely understand and write good Elixir. I've done complex OTP and distributed work in tandem with Sonnet/Opus and they understand it well and happily keep up. All the Elixir constructs distinct from ruby are well applied: pipes, multiple function clauses, pattern matching, etc.
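For anyone who hasn't seen them, here's a tiny sketch of the constructs I mean (the `Describe` module and data are made up for illustration):

```elixir
defmodule Describe do
  # Multiple function clauses: the runtime picks the clause whose
  # pattern matches the argument's shape.
  def describe([]), do: "empty list"
  def describe([only]), do: "one element: #{inspect(only)}"

  def describe(list) when is_list(list) do
    # Pipes thread a value through each transformation in order.
    list
    |> Enum.map(&abs/1)
    |> Enum.sum()
    |> then(&"sum of magnitudes: #{&1}")
  end
end

Describe.describe([])        # => "empty list"
Describe.describe([-3, 4])   # => "sum of magnitudes: 7"
```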
I can say that anecdotally, CC/Codex are significantly more accurate and faster working with our 250K lines of Elixir than our 25K lines of JS (though not TypeScript).
I suspect this is partly due to the quality of documentation for Elixir, Erlang, and the BEAM. The OTP documentation has been around for a long time and is excellently written. Erlang/Elixir doc gen outputs function signatures and arity, and both Elixir and Erlang handle concepts like function overloading in very explicit, well-defined ways.
* Largely stable and unchanged language throughout its whole existence
* Authorship is largely senior engineers, so the code you train on is high quality
* Relatively low number of abstractions in comparison to other languages, meaning there are fewer ways to do one thing
* Functional programming style pushes down hidden state, which lowers the complexity of understanding how a slice of a system works, and the likelihood that you introduce a bug (see the sketch below)
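A minimal sketch of that last point (the `Cart` module is hypothetical): state is an ordinary value passed in and returned, so nothing mutates behind your back.

```elixir
defmodule Cart do
  def new, do: %{items: [], total: 0}

  # Each function takes the current state and returns a new one,
  # so the caller sees every state transition explicitly.
  def add(cart, item, price) do
    %{cart | items: [item | cart.items], total: cart.total + price}
  end
end

cart = Cart.new() |> Cart.add("book", 12) |> Cart.add("pen", 3)
cart.total  # => 15; the intermediate carts are untouched values
```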
I have a `codex-review` skill with a shell script that uses the Codex CLI with a prompt. It tells Claude to use Codex as a review partner and to push back if it disagrees. They will sometimes go through 3 or 4 back-and-forth iterations before they find consensus. It's not perfect, but it does help, because Claude will point out the things Codex found and give it credit.
I've been using a development server for about 9 years, and the best thing I ever did was move, for a time, to a machine with a low-power Xeon D. It made development painful enough that I quickly fixed the performance issues I had been able to overlook on more powerful hardware. I recommend it, even just as an exercise.
For similar reasons, in the Google office I worked in, you had the option to connect to an intentionally crappy wifi network that simulated a 2G connection.
> I've been coding for decades already, but if I need to put something together in an unfamiliar language? I can just ask AI about any stupid noob mistake I make.
So you aren’t still learning foundational concepts or how to think about problems; you are using it as a translation tool. Very different, in my opinion.
It's not, though. Processes can be supervised, and crashes can just lead to "restart with good state" behavior. It's not that you don't try to handle any errors at all; it's that you can be confident anything you missed won't bring the system down.
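A minimal OTP sketch of what that looks like (module names hypothetical): if the worker crashes, the supervisor restarts it with known-good initial state, and the rest of the system keeps running.

```elixir
defmodule MyApp.Counter do
  use GenServer

  def start_link(_opts), do: GenServer.start_link(__MODULE__, 0, name: __MODULE__)

  @impl true
  def init(initial), do: {:ok, initial}

  @impl true
  def handle_call(:increment, _from, n), do: {:reply, n + 1, n + 1}
  # Any unhandled message crashes this process -- and only this process.
end

defmodule MyApp.Supervisor do
  use Supervisor

  def start_link(opts), do: Supervisor.start_link(__MODULE__, opts, name: __MODULE__)

  @impl true
  def init(_opts) do
    # :one_for_one restarts just the crashed child, back at its init/1 state.
    Supervisor.init([MyApp.Counter], strategy: :one_for_one)
  end
end
```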
And Elixir is strongly typed by most definitions. Perhaps you mean static?
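For example, Elixir refuses to coerce mismatched types and raises at runtime instead, which is the usual "strongly typed" criterion even though the checks happen dynamically:

```elixir
1 + "2"
# => ** (ArithmeticError) bad argument in arithmetic expression
```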
You can be more confident. But remember that time an Ericsson switch crashed upon handling a message that it sends to adjacent switches every time it restarts? Each restart re-sent the message to its neighbors, so the failure cascaded and crashed the whole network, and you could still do that in Erlang.
LiveView uploads are baked in, previews and all. Everything else you list is included in the Flop library, if you want something off the shelf. In Rails you are still including Kaminari or whatever other gems for all of this too, so this is really no different.
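For reference, a minimal sketch of the built-in upload API (the `:avatar` upload and module name are hypothetical):

```elixir
defmodule MyAppWeb.ProfileLive do
  use Phoenix.LiveView

  @impl true
  def mount(_params, _session, socket) do
    # allow_upload/3 wires up the upload; the template can then render
    # client-side previews with live_img_preview.
    {:ok, allow_upload(socket, :avatar, accept: ~w(.jpg .jpeg .png), max_entries: 1)}
  end
end
```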
That's so disappointing to hear. I have an intern who hadn't touched Elixir 4 weeks ago who is already making meaningful PRs. She's done the PragProg courses and leans a bit on Copilot/Claude, but she's proving how quickly one can get up to speed on the language and contribute. To hear that a major company couldn't bring resources up to speed, to me, shows a failure of the organization, not the language or ecosystem.