It depends on how you review. In an orchestrated per-task review workflow with clearly defined acceptance criteria and implementation requirements, using anything other than Sonnet (handed those criteria and requirements) hasn’t really led to much improvement, but it drives up usage and takes longer. I even tried Haiku, but, yeah, Haiku is just not viable for review, even tightly scoped, lol.
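For a sense of what “handed those criteria and requirements” means in practice, each per-task review call is roughly this shape. A minimal sketch with the Anthropic Python SDK; the model ID, prompt wording, and helper function are illustrative, not my actual pipeline:

    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    def review_task(diff: str, criteria: list[str], requirements: str) -> str:
        """Review one task's diff against its own acceptance criteria."""
        prompt = (
            "Review this diff strictly against the acceptance criteria and "
            "implementation requirements below. Flag only violations.\n\n"
            "Acceptance criteria:\n"
            + "\n".join(f"- {c}" for c in criteria)
            + "\n\nImplementation requirements:\n" + requirements
            + "\n\nDiff:\n" + diff
        )
        response = client.messages.create(
            model="claude-sonnet-4-5",  # illustrative model ID
            max_tokens=2000,
            messages=[{"role": "user", "content": prompt}],
        )
        return response.content[0].text

The point is that the scope, criteria, and requirements are all pinned down per task before the model ever sees the diff.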
Siccing Sonnet on a codebase or PR without guidance does indeed lead to worse results than using Opus, though.
That makes sense: if your scope is tight enough, good enough is good enough. I’ve got the expected specifications and code style guides, including some aerospace engineering ones, but in complex systems I still run into difficult-to-suss-out corner cases where the code works but the system breaks, usually due to unresolved conflicts in operational requirements.
> I do see big problems around motivation of the next generation of engineers to keep looking under the hood if avoiding it is becoming so easy, but you should, individually, arguably feel more enabled to do so than ever.
This is what gets me every single time. I genuinely don’t think this is a hard realization to come to, and yet the vast majority of arguments from both sides of the aisle, proponents and antis alike, assume that EITHER you do everything yourself, OR you have the AI do everything for you. If you use AI, you’re DOOMED to never think critically about anything anyone tells you ever again. If you don’t, you’re an idiot, because everyone else is using it, and skills and experience no longer matter now that everyone can do everything.
And this is on HN, too: supposedly a site where experienced engineers, developers, and builders converge, the exact kind of demographic you’d expect to understand such a thing as nuance. And yet, yours is one of very few comments that do. There’s someone RIGHT HERE, a few comments down, saying, verbatim, “it’s a solution engine not a curiosity engine. Getting effortless answers at every turn is the opposite of curiosity.” That treats curiosity as the end rather than the means, as if I stop being a curious person once I find an answer to a question I’ve been asking myself, or as if curiosity were some sort of “temporary status effect” that an answer/solution “consumes.”
And it seems to be worse than just “no one’s thought it through properly.” I’ve literally had someone show a fundamental inability to understand the concept. I spent a non-trivial amount of effort writing out three comments with several paragraphs about how knowing your knowns and unknowns, and the fact that you have unknown unknowns, is the most important thing in any project, not just when it comes to AI. That these tools aren’t just doers, but also searchers. That they’re pretty much the best rubber ducky that’s ever been created, and that a rubber ducky is, I’d argue, exactly what you should be using them as in any context where you’re not having them automate trivial, testable work. The guy refused to read any of it and, after three walls of text, kept claiming I’m “advocating for the LLM to guide me.” There is some sort of deeply instinctive, intrinsically defensive reflex that a lot of people seem to immediately collapse into when the topic comes up, and it seems to seriously impair their ability to acknowledge nuance or concede a single fraction of an inch. It’s baffling.
They also sometimes flag stuff in their reasoning and then think themselves out of mentioning it in the response, when it would actually have been a very welcome flag.
This can result in some funny interactions. I don't know if Claude will say anything, but I've had some models act "surprised" when I commented on something in their thinking, or even deny saying anything about it until I insisted that I can see their reasoning output.
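For reference, the reasoning is exposed as its own content block in the API, so you can compare it against the final reply directly. A sketch with the Anthropic Python SDK and extended thinking enabled; the model ID, token budget, and prompt are illustrative:

    import anthropic

    client = anthropic.Anthropic()

    response = client.messages.create(
        model="claude-sonnet-4-5",  # illustrative model ID
        max_tokens=16000,
        thinking={"type": "enabled", "budget_tokens": 8000},
        messages=[{"role": "user",
                   "content": "Review this function for edge cases: ..."}],
    )

    # Reasoning arrives as separate "thinking" blocks alongside the "text"
    # reply, so you can diff what the model considered against what it said.
    for block in response.content:
        if block.type == "thinking":
            print("[reasoning]", block.thinking)
        elif block.type == "text":
            print("[response]", block.text)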
AI-assisted, I can see. I believe it doesn’t have to be that way, though. If you use AI as a grounding tool - essentially something that can take your stream of consciousness and parse it into a series of concrete and pointed search terms to do real-time research with, instead of falling back on what’s in the weights - then it’s honestly hard to think of a technology in the history of the species with more potential to be useful: it gives you much more direct access to both your unknown unknowns and your unknown knowns.
That is, of course, provided that you pay attention and that it actually does the research. In their current state, LLMs are practically useless for this purpose for the vast majority of users, as no one knows how they work, what to watch out for, what the failure modes look like, or how to tell nonsense from facts when both are presented with an equal amount of conviction. That’s not a user problem, it’s an education problem.
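To make the “grounding tool” idea concrete: braindump in, pointed queries out, real search as the next step. A sketch; the prompt wording is mine, and the search backend is a placeholder you’d wire up yourself:

    import anthropic

    client = anthropic.Anthropic()

    def extract_queries(braindump: str) -> list[str]:
        """Turn a stream-of-consciousness note into pointed search queries."""
        response = client.messages.create(
            model="claude-sonnet-4-5",  # illustrative model ID
            max_tokens=500,
            messages=[{
                "role": "user",
                "content": "Extract 3-5 concrete, searchable questions from "
                           "this note. One per line, no numbering, no "
                           "answers:\n\n" + braindump,
            }],
        )
        return [q.strip()
                for q in response.content[0].text.splitlines() if q.strip()]

    # Hand each query to an actual search backend instead of trusting the
    # weights; the backend itself is out of scope here.
    for query in extract_queries(open("notes.txt").read()):
        print(query)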
> Jai Das, president of investment firm Sapphire Ventures (who has no stake in either company), told the FT he saw OpenAI as “the Netscape of AI,” a reference to the once-dominant browser that was overtaken by Microsoft and eventually absorbed by AOL.
One can only hope and pray, I’d say. May they be absorbed by a company with just as much staying power as AOL.
I’m more of a prosumer than a professional, but when I look for sounds, I look for individual ones, never for packs. What I’d appreciate more than anything else is the choice of either buying individual sounds for less money or loading up on a sub or credits when I have more of a bulk need.
Basically, look at FL Cloud and do exactly what they’re doing, haha. Image-Line is the prime example of a company worth trusting, and they get to reap the rewards of that trust as a result.
The point of an encyclopedia is that you can visit a very specific page under a very specific name and receive information that you know has been vetted and properly researched. You get precisely none of that with an LLM, which makes it seem like fundamentally the wrong tool to even consider for something like this.