Personally, I prefer the US approach. At least then everyone knows what they're dealing with and they can openly react instead of being forced into a fake dance between what is said and what is REALLY said.
The covert stuff gives some degree of plausible deniability and it causes a good amount of the population to be complacent and ignore reality. I don't see how this can be considered good for anyone but the people creating propaganda.
It's not like people have an unlimited number of places to work, even if they have Meta on their resume. Many of my colleagues (myself included) had struggled in the job market before landing at Meta. If the choice is to work for Meta or suffer more tumult in the hiring market, it's easy to understand why many might decide to take the offer even with the moral implications. I used to bring up politics in the office with coworkers, and many people are simply unaware of the consequences of the company's products. There are a few different categories that these people fall into, but the main ones I saw in the office:
1) Chinese H1B holders who are happy to be working in the US at all, and generally apolitical (or view anything as better than the status quo of where they come from)
2) Just normal people who are interested in their own lives and have never been trained to think about the world in a big picture way (some overlap between 1&2 exists of course)
It's very western of us to always be tracking the consequentiality of our actions even when we're just the cog in a wheel at BigCo. I think that it's the right thing to do, but this sort of reasoning is largely absent in eastern cultures, or even for some in the west, even among those who are well educated. It's kind of hard to blame individuals when they either are rightfully consumed by worrying about their own welfare or are for whatever reason not as seminally hyperaware or woke as we can be in the west. Growing up I liked imposing my political philosophies onto everyone; maturity is understanding that even objectively righteous values are only useful for the right types of minds.
On the contrary, once someone has truly been made aware of the ramifications of their actions, it's more difficult for me to extend my sympathy to them. I consider Mark and Priscilla to be fully implicated based on their exposure to the harm that they're actively, willingly, knowingly causing. Other employees may never get that memo, though; people obviously avoid political talk in the workplace.
What Meta does (and here I want to be clear that you can replace Meta with Apple, Microsoft, Google, Palantir...) is eventually public knowledge, profusely discussed even on HN. This means a substantial number of people have been aware, for decades.
And even if "just quit" is not an option - why not push for policy to regulate these corps? Why is it that after all this time, these same corps now also own at least 1 branch of the US government?
And when the EU/Australia/China.. tries to regulate or punish those corps, suddenly everyone comes out on HN to explain protectionism, overreach, some -ism, and "actually we need to give them the benefit of the doubt" etc... why not support that momentum?
> And when the EU/Australia/China.. tries to regulate or punish those corps, suddenly everyone comes out on HN to explain protectionism, overreach, some -ism, and "actually we need to give them the benefit of the doubt" etc... why not support that momentum?
I really, really want to believe it's bot warfare. But there is this running theme of HN posters who think because something is _legal_, or because you can point at it historically and go "acktually it's always been like this", it's therefore _moral_ and we should not ever push back on the excesses of these awful fucking companies.
> And even if "just quit" is not an option - why not push for policy to regulate these corps? Why is it that after all this time, these same corps now also own at least 1 branch of the US government?
Because money is the current representation and approximation of power. It used to be "the yams," but now it's money.
You remind me of my former, younger self and I applaud the appeal you are making to our better selves. All I'm stating is simply that many people don't care, or can't be made to care. But further, there is a pontificating nature about the way you reason about these workers. In the case of my colleagues at Meta, many feel that they are so fortunate to be able to work in the US at all. Even if they did care, it would be rational for them to continue working there against their moral qualms anyway, because no one would choose to go back to their home country and do the same work for a paltry fraction of the pay.
Not speaking philosophically. I'm just talking about my experience on the ground working with Chinese colleagues (as a fellow Chinese person). Some of them are interested in global affairs, certainly, but I find it to be more common among people raised in the west.
> It's kind of hard to blame individuals when they either are rightfully consumed by worrying about their own welfare or are for whatever reason not as seminally hyperaware or woke as we can be in the west.
If you care that your employer is being unethical (such as storing your keystrokes), that's being hyperaware, woke?
I know the definition of woke can stretch like taffy, but it now seems dislodged from its origins concerning race and gender and is now just a vague disparagement of any speaking up to injustice.
Was quite tired when I wrote this; just want to be on the record saying that I don't necessarily think that people in the east haphazardly just do whatever they're told. There's more nuance to it than that, but I just observe generally that in the east there isn't a culture of political motivation or organizing, or democracy at all. So it's not at all surprising when people don't assign any political meaning to their work, even in the cases where one so overtly exists.
What is it about it that makes the story less interesting to you? It's the same story, down to the same delicate details. When AI-slop stops being, well, slop, and just is everything that humans do, but much better, and much more efficient—will we have the same repulsion to it that many of us do now?
I find it interesting to ponder. We look at the luddite movement as futile and somewhat fatalistic in a way. I feel like the current attitude towards AI generated art will suffer the same fate—but I'm really not quite sure.
What is your understanding of the luddite movement? I ask because I don't believe many are aware that luddites were not anti-technology. It was a labor movement which was targeted at exploitation by factory owners. Their issue was with factories forcing the use of machines to produce inferior products so owners could use cheaper, low skill labor.
I'd have been ok if things fell more in their direction... I'm not saying "clear win", but a middle ground that had the machines do the things they're best at while letting humans do the quality work.
> but a middle ground that had the machines do the things they're best at while letting humans do the quality work.
By arguing for letting humans work, particularly quality work, you're not especially finding a middle ground, more adopting the 1811 position of the OG Luddites who were opposed to being put out of work.
Everybody wants to lose their jobs. Almost by definition your job is something you do not because you want to, but because you need to earn a living. Even if your job coincides with your hobby, you would prefer not to have your economic welfare tied to it in a way that drives how you engage with it.
We are on the verge of making this possible, if a bunch of myopic morons -- people who have never been right about a single long-term trend in history -- can be convinced not to screw it up.
You're using a very loose definition of "losing your job".
Not everybody agrees with your definition of what a job means (some people are very passionate about their jobs; not me but I understand their point of view), and regardless, "losing your job" is a thing that is forced upon you and is a source of distress for most people, not something people "want". Many people throughout history, after losing their jobs, never recover (either psychologically, or in terms of the economy not giving them a place to recover).
To be clear, I don't subscribe to the following view at all, but a lot of people derive their self-worth from their occupation. Don't you remember, a few years back, an infamous comment made by someone on HN stating that "if you're fired from your job, you've failed as a person"? It was thankfully downvoted to hell, but it goes to show you your perception of jobs and job loss is not at all widely shared.
Even if nobody wanted to live without a job, until we reach some sort of post-scarcity utopia, the current AI trend is a threat.
> Don't you remember, a few years back, an infamous comment made by someone on HN stating that "if you're fired from your job, you've failed as a person"? It was thankfully downvoted to hell, but it goes to show you your perception of jobs and job loss is not at all widely shared.
So, how about responding to a point I made in this thread, today, instead of a post made by "someone on HN a few years back?"
That post seems to have gotten your goat, and I can understand that, but I did not say (and would not have said) anything like it... and I don't, in fact, remember it.
> Even if nobody wanted to live without a job, until we reach some sort of post-scarcity utopia, the current AI trend is a threat.
We can't reach post-scarcity without AI. If we could have, we would have. It's technology -- and only technology -- that is even giving us the luxury to think and talk about post-scarcity.
> So, how about responding to a point I made in this thread, today, instead of a post made by "someone on HN a few years back?"
It was only a counterexample to illustrate my point. I did address your point in general, that your assertion that "everybody wants to lose their jobs" is both tone-deaf and false.
> We can't reach post-scarcity without AI
Maybe. But more importantly, it doesn't explain away people's justified fears.
Stories are particularly troubling because we have the concept of "suspending disbelief" and readers tend to take a leap of faith with longwinded narratives because we assume the author is going somewhere with the story and has written purposefully.
When AI can write convincingly enough, it is basically a honeypot for human readers. It looks well-written enough. The concept is interesting and we think it is going somewhere. The point is that AI cannot write anything good by itself, because writing is a form of communication. AI can't communicate, only generate output based on a prompt. At best, it produces an exploded version of a prompt, which is the only seed of interest that carries the whole thing.
Somebody had that nugget of an idea which is relevant for today's readers. They told the AI to write it up, with some tone or setting details, then probably edited it a bunch. If we enjoy any part of it, we are enjoying the bits of humanity peeking through the process, not the default text the AI wrote.
Right, but in the present case we have exactly what you're describing—a story, almost fully written by AI but with some human cherry-picking in the mix. And readers are finding it a phenomenal story and then wanting to vomit retrospectively in learning about the authorship. It just seems patently obvious to me that this is not where the sentiment is going to stay—it will hit the margin, like the people who decide to not own a cell phone, or those who would rather listen to analog audio; there will be a market for it but it will exist at the margin. Eventually, especially for young people, more and more of what they consume will be AI generated and they won't care because it's indistinguishable from human work.
Or, I digress, it will be distinguishable from human work but because it's so much better than anything that a human could have ever created. These AI tools that we have now are as dumb as they will ever be. If we ever reach AGI or superintelligence or whatever—or even if not, even if these tools just advance for 10 more years on their current trajectory—it's easy for me to imagine some scenario where the machines can generate something so perfect to your liking that you just prefer it to anything a human ever would have created, storytelling and all.
You can take the general case where AI can just generate a better movie than a team of humans ever could plausibly generate. After all, AI doesn't have any of the physical constraints of a movie studio: the budget, the logistics of traveling from location to location, the catering, the fact that the crew has to sleep, has to coordinate schedules, all that. AI, with some human involvement or not, could just keep iterating on some script on a laptop overnight until it's created an optimized version which is more satisfying to humans than any human-made movie ever created. Or in a narrow case it could create the perfect movie for you, given what it knows about you and your interests. All human movies would look inferior.
For my kids, who I'm sure are going to grow up in a world where this type of art is embedded everywhere—and where the human version is almost certainly going to be worse—I don't think the desperate cries to see the last scrap of human ingenuity will mean anything. All of these people throwing rocks at Waymos and others boycotting companies for generating ads rather than shooting one with a video studio; it's so obviously helpless, desperate and obviously futile in the face of what's coming.
I mourn the future that seems plausible here but I also welcome it as inevitable. The technology is coming, and people are going to have to adapt one way or another.
You're talking about content. Only content can be "perfect" as you say.
When I'm listening to music, looking at art, seeing a play or a short film I want to feel connection to the humans behind it. AI is by definition missing that connection. That's what makes me retrospectively vomit at AI writings like these. That connection requires that the humans behind it are imperfect, the solo can have one or two sloppy notes, but at least it's genuine interaction. We have seen this same yearning for connection with all the "Don't use LLM to comment, use your true style of writing with its flaws" rules.
I'm 100% certain mainstream studios will be producing "perfect" content with AIs just like current mainstream pop stars have 10 ghost writers working on each song to create "perfect" songs. The good stuff will exist in the fringes as always and I'm ok with that as I've already been for years.
And the future may not be as settled as you think it is. Leaders try to sell you their vision of the future by saying it is settled and that things are certain, but that is because they want you to believe that, because if you and the masses believe so, it's more certain for the future to settle the way the leaders want. But you can also actively refuse that future and find a different future that's worth believing in yourself.
The riff comes first, the people come second. One of the nice things about punk and metal is how anti-celebrity both genres are in a fundamental way. In histories of the genres, you will usually find that such and such band made such and such invention that led to certain new structures being accessible. Of course the social background of the scenes where it emerged is important too, but the history is traced first in terms of the riff. Put another way, books glazing a particular rockstar's life history are rare, even though there are some "superstars" in metal and punk. The culture is very "only analog is real, digital's fake shit" but idk, in some other ways they seem much closer to having not much difficulty accepting a valid musical work regardless of origin.
I don't quite understand what you're getting at with this comment. In metal and punk it's pretty much a cornerstone of the genre to be authentic, and in metal to value human skills (all the solo parts, fast playing). I've played and listened to punk and metal my whole life, but will also enjoy early Lady Gaga, Eminem, Kendrick etc. celebrities because I recognize their authenticity and skills. Sabrina Carpenter and Drake go over my head because of blatant ghost writing and even though they have good tunes, I vomit retrospectively.
So what is AI bringing to the fans of these genres that the fans might value? Because it's not authenticity nor is it skills. What is the point you're trying to make?
I am saying that on the surface it might seem they should be the staunchest opponents, and as I said the culture is "only cassette tape is real otherwise fuck off and die", but simultaneously it's also one of the least image/player focused genres in some ways: what is being played is of much higher priority than who in specific is playing it.
Hmm I can think of various examples where the guitarist was changed and people dismiss the new guitarist. Take a look at Megadeth for example - every new solo guitarist gets compared to Marty Friedman even though he hasn't been in the band for 26 years. So a lot of it is player focused.
But your point also stands here, every new guitarist must play the solos as close to the original ones as possible, otherwise it's not the same experience. So on the music level "what" is of much higher priority still. But I wouldn't say it is as black and white as you make it out to be.
Some of course have a very unique style that seems very hard to replicate. Personally I haven't yet found a single band that manages to faithfully execute classic era Slayer. But there are countless bands today who make very good execution of Norwegian black metal and Swedish death metal.
Edit: And a lot of modern black metal for example doesn't even bother with stating who they are. Member lists are pseudonymous or anonymous. I think this "anti god" culture makes metal different from other genres in some ways.
Ok I'm not as up to date with modern black metal, that pseudonymity seems cool.
There's also the up-and-coming math rock band Angine de Poitrine, who are also anonymous: https://www.youtube.com/watch?v=0Ssi-9wS1so . In these cases you can argue that the person doesn't matter but in my opinion it still does. There's a person inside that costume, who has made the decision to be anonymous as part of the whole experience. That's part of their expression.
Of course, then there are bands like Ghost who have mainstreamed this too - the players wearing the costumes are usually just contract musicians and don't have anything to do with Tobias or the music other than playing for money. Good for them but f that, you are just a robot at that point.
There's anonymity/pseudonymity where we have an entity that does not do any performances and releases cassettes with members acknowledged as "M., K. and J." or even nothing, and there is "anonymity/pseudonymity" where a band tries to use that as its own image (eg Kanonenfieber). Obviously I meant more like the former, which is legitimately a music-first, person-irrelevant presentation, but modern black metal is a wide spectrum; it has some of the most image conscious crap out there too, and if anything I think it's probably the most superficial and image focused of the main metal genres. It's just that anonymity hasn't historically been part of death metal culture that much, but I feel its actual presentation is quite workmanlike in many ways.
That's all speculation, and it may prove to be true.
But:
> readers are finding it a phenomenal story
is not true across the board.
I thought to myself, explicitly, and fairly early "This is a fun and thoughtful idea, but the writing is kinda crap" before I realized (maybe a third of the way through) "ah, right, this is genAI. That tracks."
Despite my deep-seated hatred of LLMs, I chose to finish the piece and see if I was being unfair to the actual work ("the output", in the soulless descriptor used by programmers who've never once written a real story or crafted a song).
As a longtime avid reader of fiction, lit nerd, and semi-pro musician, I understand writing and artistry better than the average HN poster, and couldn't help but see the flaws in this.
People who don't have deep knowledge of literature don't catch the tells or flaws as well, but are still understandably angry when they find out they burned their time reading clanker output, and are understandably depressed that they were suckered into it because they haven't spent a lifetime developing a deep understanding of the discipline.
It's possible that genAI approaches will surpass humans in every field we invented.
So far, though, in every field I understand deeply, I see the uncanny mediocrity of the average in every LLM output I have subjected myself to.
You can get some good guesses from the comment itself.
> I assumed the writer was a journalist or author with a non-technical background trying to explore a more "utopian" vision of where trends could go.
If you assume you're reading something from a person with intention and a perspective, who you could connect with or influence in some way, then that affects the experience of reading. It's not just the words on the page.
This reminds me of having the reverse experience with the 2017 New Yorker viral "Cat Person" story [0] which a (usually trustworthy) friend forwarded and enthusiastically told me to read: waste of time shaggy-dog story, intentional engagement-trolling aimed at the intersection of the hot-button topics of its target readership *. But why are we culturally expected to allow more slack to a human author, even a meretricious one? Both are comparably bad. The LLM-authored one needs a disclaimer at the top to set its readers' expectations right, then readers can make an informed choice.
(* "Cat Person" honestly felt like the literary equivalent of Rickrolling; I would have stopped reading it after the first page if not for my friend's glowing endorsement.)
It had a very similar quality to the AI'd article from this thread. A sort of attempt at Being Literary but never really ever getting to the point of saying anything. It has the same feeling of wallowing, of over indulging in its shtick.
Yes, this is a thing. Bad writing with an interesting idea underneath it all is still interesting if it comes from a human because we have the expectation that the human will improve in how they share their ideas in the future. In other words, we see potential.
But LLMs don't have potential. You can make an LLM write a thousand articles in the next hour and it will not get one iota better at writing because of it. A person would massively improve merely from the act of writing a dozen, but 100x that effort and the LLM is no better off than when it started.
Despite every model release every 6 months being hailed as a "game changer", we can see from the fact that LLMs are just as empty and dumb as they were when GPT-2 was new half a decade ago that there really is no long term potential here. Despite more and more power, larger and hotter and more expensive data centers, it's an asymptotic return where we've already broken over the diminishing returns point.
And you know, I wouldn't care all that much--hell, might even be enthusiastically involved--if folks could just be honest with themselves that this turd sandwich of a product is not going to bring about AGI.
You cannot even get angry or upset if you disagree with anything in the story, maybe the author’s despicable worldview permeating through the characters... because there's no author’s worldview, because there's no author. It's a window into nothing, except perhaps the myriad of stories in the model's training set.
I want to at least have the option of getting upset at the author.
i don't find the luddite comparison accurate. they were against looms and anti-ai people or ai skeptical people are against the wholesale strip mining of intellectual property as it exists... both public domain and non-public domain. it's used to enrich the capital class at the expense of the workers. sure it's similar but it certainly didn't have the copyright and wholesale theft of all of the human ideas behind it. it just feels quite different.
People had a revulsion to eating refrigerated foods. The developed world got over it. We're comfortably on the path to becoming Eloi who will trust everything the magic box does for us.
> When AI-slop stops being, well, slop, and just is everything that humans do, but much better, and much more efficient—will we have the same repulsion to it that many of us do now?
For me, the answer to this riddle is very easy: I want to engage with other human minds. A robot (or AI) doesn't have a human mind, so I'm not interested in its "artistic" output.
It was never about how good it was. Of course AI slop adds insult to injury by also being bad. Currently. But it'll get better. My position was never that AI art (shorts, pictures, music, text) is to be frowned upon because it's bad. I don't like it because it's not the expression of a human mind.
It's a bit like how an AI boy/girlfriend is not the real deal, no matter how realistic -- and I'm sure they'll get uncannily realistic in the future. They aren't the real deal because there's no real human behind the facade of companionship.
As a couple sibling comments said, I took it for an insight into the way an optimistic writer might see AI software development becoming a new form of "end-user programming" or "citizen developer" tooling. I'm personally too deep in the weeds to ever see it becoming empowering in that way (if nothing else, this will be an incredibly centralizing technology and whoever wins the "arms race" [assuming we're not in a bubble destined to pop soon] will absolutely have the possible Toms and Megans of such a future by the short hairs). But I love end-user programming, or whatever we're calling it now! (I was partial to "shadow IT" - made it sound really cool.) So I enjoyed the idea that somebody saw AI as a "bicycle for the mind" in that sense, even if I feared they'd end up disappointed.
But there was nobody there, and I'm only disappointed in myself for not noticing.
AIUI, all such results are because the FDA has given up since Aduhelm and said "well, if it clears amyloid, that's as good as slowing Alzheimer's, right?" despite the actual results on Alzheimer's progression being largely negative.
For what it's worth, early statins were originally cleared based only on the evidence that they lower cholesterol without longer term studies showing a reduction in mortality. Of course there is now plenty of evidence showing statins improve overall endpoints.
Similarly, there were other drugs that lowered cholesterol that didn’t show a significant reduction in coronary events. As we later learned, it’s not nearly as simple as “cholesterol bad.”
~~yes by 4 months. If I had AD i wouldn't bother with those treatments.~~ Sorry, I missed the context. You are right: the fact that they slow AD by 4 months is proof that amyloid plaques are part of the pathogenesis.
At least in the case of "europe" it could refer to the EU (which obviously is not correct because it doesn't encompass all of europe). But when they are talking about "Asia", what governing body would they even be referring to? It's obviously nonsensical.
> in the case of "europe" it could refer to the EU (which obviously is not correct because it doesn't encompass all of europe)
Not just that. If we get really pedantic, the EU is not only in Europe but includes territories in Africa (parts of Spain) and Asia (the entirety of Cyprus). And that's not even getting into the intercontinental shenanigans of France!
Having the productivity "drop through the floor" is a bit hyperbolic, no? Humans are still reviewing the PRs before code merge at least at my company (for the most part, for now).
I don't know that it's likely, but it's certainly a plausible outcome. If tooling keeps getting built for this and the financial music stops, it's going to take a while for everybody to get back up to speed.
Remember this famously happened before, in the 1970s
There's an actual working product now, albeit one which is currently loss leading. In the software world at least there is definitely enough value for it to be used even if it's just a better search engine. I'm not sure why it would disappear if the financial music stops as opposed to being commoditised.
Because there are cheaper ways to get an equally good search engine? But yes I imagine some amount of inference will continue even in an AI Winter 3.0 scenario.
They are killing off their last and best generation of men, so yes the economy will suffer. I'm not questioning that part -- it's the repeated "Russia will collapse any minute" propaganda, going on for 12+ years, that is very easy to see through.
Yeah seriously. Don't people understand the fact that society is not good at mopping up messes like this—there has been a K shaped economy for several decades now and most Americans have something like $400 in their bank accounts. The bottom had already fallen out for them, and help still hasn't arrived. I think it's more likely that what really happens is that white collar workers, especially the ones on the margin, join this pool—and there is a lot of suffering for a long time.
Personally, rather than devolving into nihilism, I'd rather try to hedge against suffering that fate. Now is the time to invest and save money. (or yesterday)
If white collar workers as a whole suffer severe economic setback over a short term timespan, your savings and investments won’t help you.
Unless you’re investing in guns, ammo, food, and a bunker. We’re talking worse unemployment than depression era Germany. And structurally more significant unemployment because the people losing their jobs were formerly very high earners.
That’s the cataclysmic outcome, though. Although I deem that certainly possible and would put a double-digit percentage probability on it, another very likely outcome is a very severe recession, or a recession where a lot of, but not all, white collar work is wiped out. Maybe there’s a significant restructuring in the economy. In a scenario like that, which also seems to be in the realm of possibility, I think having resources still matters. Speech to text, sorry for the poor grammar.
It’s definitely possible that there’s an impact that is bad but not cataclysmic. I figure in that case though my regular savings is enough to switch to something else. I could retire now if I was willing to move somewhere cheap and live on $60k a year. There’s a lot of things that could cause that level of recession though without the need for AI.
I do also think the mid level bad outcome isn’t super likely because if AI is good enough to replace a lot of white collar jobs, I think it could replace almost all of them.
> Our latest frontier models have shown particular strengths in their ability to do long-running tasks, working autonomously for hours, days or weeks without intervention.
I have yet to see this (produce anything actually useful).
I've been finding that the Opus 4.5/4.6 and GPT-5.2/5.3 models really have represented a step-change in how good they are at running long tasks.
I can one-shot prompt all sorts of useful coding challenges now that previously I would have expected to need multiple follow-ups to fix mistakes the agents made.
No, not for days - but it churned away on that one for about ten minutes.
I don't think I've got any examples of multi-hour or multi-day sessions that ran completely uninterrupted - this one back in December took 4.5 hours but I had to prompt it to keep going a few times along the way: https://simonwillison.net/2025/Dec/15/porting-justhtml/
Maybe so, but I did once spend 12 hours straight debugging an Emscripten C++ compiler bug! (After spending the first day of the jam setting up Emscripten, and the second day getting Raylib to compile in it. Had like an hour left to make the actual game, hahah.)
I am a bit thick with such things, but just wanted to provide the context that Emscripten can be a fickle beast :)
I sure am glad I can now deploy Infinite Mechanized Autistic Persistence to such soul-crushing tasks, and go make a sandwich or something.
(The bug turned out to be that if I included a boolean in a class member, the whole game crashed, but only the Emscripten version. Sad. Ended up switching back to JS, which you basically need anyway for most serious web game dev.)
How do you deal with the cost associated with a long-running Opus session? I asked it to validate some JSON configs against the spec yesterday and it burned $10 worth of tokens for what would have been a 1 millisecond linter task.
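For comparison, a deterministic check of that kind can be sketched in a few lines of standard-library Python. The spec format here (required key mapped to expected type) and the names `SPEC` and `validate_config` are assumptions made up for illustration, not the actual spec or tooling from the comment above.

```python
import json

# Hypothetical spec: required key -> expected Python type (an assumption, not a real schema).
SPEC = {"name": str, "port": int, "debug": bool}

def validate_config(text, spec=SPEC):
    """Return a list of human-readable problems; an empty list means the config passes."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value must be an object"]
    problems = []
    for key, expected in spec.items():
        if key not in data:
            problems.append(f"missing key: {key}")
        # bool is a subclass of int in Python, so reject True/False where an int is expected.
        elif not isinstance(data[key], expected) or (
            expected is int and isinstance(data[key], bool)
        ):
            problems.append(f"wrong type for {key}: expected {expected.__name__}")
    return problems
```

Something like this only covers flat key/type checks; a real spec would probably want a proper schema validator, but the point stands that the check itself needs no tokens.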
If you look through the commit logs on simonw/research and simonw/tools on GitHub most commits should either list the prompt, link to a PR with the prompt or link to a session transcript.
I routinely leave codex running for a few hours overnight to debug stuff
If you have a deterministic unit test that can reproduce the bug through your app front door, but you have no idea how the bug is actually happening, having a coding agent just grind through the slog of sticking debug prints everywhere, testing hypotheses, etc — it's an ideal usecase
I have a hard time understanding how that would work — for me, I typically interface with coding agents through cursor. The flow is like this: ask it something -> it works for a min or two -> I have to verify and fix by asking it again; etc. until we're at a happy place with the code. How do you get it to stop from going down a bad path and never pulling itself out of it?
The important role for me, as a SWE, in the process, is verify that the code does what we actually want it to do. If you remove yourself from the process by letting it run on its own overnight, how does it know it's doing what you actually want it to do?
Or is it more like with your usecase—you can say "here's a failing test—do whatever you can to fix it and don't stop until you do". I could see that limited case working.
For some reason setting up agents in a loop with a solid prompt and new context each iteration seems to result in higher quality work for larger or more difficult tasks than the chat interface. It's like the agent doesn't have to spend half its time trying to guess what you want
It's constantly restarting itself, looking at the current state of things, re-reading the request and what it did and failed at in the past (at a higher level), and trying again and again.
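A minimal sketch of that kind of loop, assuming the agent is wrapped in a callable and that "done" can be checked deterministically. `run_agent`, `check_done` and `tests_pass` are hypothetical names, and the pytest command is an assumption about the project, not something from this thread.

```python
import subprocess
import sys

def run_fresh_context_loop(run_agent, check_done, task, max_iterations=20):
    """Restart the agent with a fresh context each iteration until check_done() passes.

    run_agent(prompt) -> str : hypothetical callable that runs one agent session
                               and returns a short summary of what it tried.
    check_done() -> bool     : deterministic acceptance check (e.g. the test suite).
    Returns (sessions_used, notes), with sessions_used set to None on failure.
    """
    notes = []  # high-level record of past attempts, the only state carried over
    for iteration in range(1, max_iterations + 1):
        if check_done():
            return iteration - 1, notes
        # Rebuild the prompt from scratch: original request plus notes on prior attempts.
        prompt = task + "\n\nPrevious attempts:\n" + "\n".join(notes or ["(none)"])
        notes.append(f"attempt {iteration}: {run_agent(prompt)}")
    return (max_iterations if check_done() else None), notes

def tests_pass(command=(sys.executable, "-m", "pytest", "-q")):
    """Example check_done: True when the (assumed) test command exits with status 0."""
    return subprocess.run(command, capture_output=True).returncode == 0
```

The only thing that survives between iterations is the short `notes` list, which is what keeps each session from inheriting a polluted context.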
I don't even necessarily ask it to fix the bug — just identify the bug
Like if I've made a change that is causing some unit test to fail, it can just run off and figure out where I made an off-by-one error or whatever in my change.
I've heard this said a lot but never had this problem. Claude has been decent at debugging tests since 4.0 in my experience (and much better since 4.5)
it's more like "this function is crashing with an inconsistent file format error. can you figure out how a file with the wrong format got this far into the pipeline?". in cases like that the fix is usually pretty easy once you have the one code path out of several thousands nailed down.
Or, they have freed up time for more useful endeavours, time that may otherwise have been spent on drudgery.
I don't discount the value of blood, sweat and tears spent on debugging those hard issues, and the lessons learned from doing so, but there is a certain point where it's OK to take a pass and just let the robots figure it out.
It's easy to say that these increasingly popular tools are only able to produce useless junk. You haven't tried, or you haven't "closed the loop" so that the agent can evaluate its own progress toward acceptance criteria, or you are following the feeds of other users who are incompetent with them.
I'm definitely bullish on LLMs for coding. It sounds to me as though getting it to run on its own for hours and produce something usable requires more careful thought and setup than just throwing a prompt at it and wishing for the best, but I haven't seen many examples in the wild yet
Strategy -> [ Plan -> [Execute -> FastVerify -> SlowVerify] -> Benchmark -> Learn lessons] -> back to strategy for next big step.
Claude teams and a Ralph Wiggum loop can do it - or really any reasonable agent. But usually it all falls apart on either brittle Verify or Benchmark steps. What is important is to record positive lessons in a store that survives git resets, machine blowups, etc. Any telegram bot channel will do :)
The entire setup is usually a pain to set up - docker for verification, docker for benchmark, etc. You need the ability to run the thing quickly, the ability for the loop itself to add things, and the ability to do this in worktrees simultaneously for faster exploration - and God help you if you need hardware to do this - for example, such a loop is used to tune and custom-fuse CUDA kernels - which means a model evaluator, a big box, etc.
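One pass of the inner part of that pipeline could be sketched roughly like this. All five callables are hypothetical stand-ins for agent calls or scripts, and the JSON-lines file is a swapped-in assumption for the lessons store (the comment above suggests a telegram channel); the only requirement is that it lives outside the repo being reset.

```python
import json
from pathlib import Path

def run_step(plan, execute, fast_verify, slow_verify, benchmark,
             lessons_path, strategy, max_attempts=5):
    """One pass of Plan -> [Execute -> FastVerify -> SlowVerify] -> Benchmark -> Learn.

    Lessons are appended to a JSON-lines file kept outside the working tree,
    so they survive git resets of the repository being modified.
    Returns (candidate, score), or (None, None) if no attempt verified.
    """
    lessons_file = Path(lessons_path)
    lessons = []
    if lessons_file.exists():
        lessons = [json.loads(line)
                   for line in lessons_file.read_text().splitlines() if line.strip()]
    current_plan = plan(strategy, lessons)  # planning sees everything learned so far
    for attempt in range(1, max_attempts + 1):
        candidate = execute(current_plan, attempt)
        # Cheap check first; only pay for the slow check when the fast one passes.
        if fast_verify(candidate) and slow_verify(candidate):
            score = benchmark(candidate)
            lesson = {"plan": current_plan, "attempt": attempt, "score": score}
            with lessons_file.open("a") as fh:
                fh.write(json.dumps(lesson) + "\n")
            return candidate, score
    return None, None
```

The outer Strategy loop would call `run_step` repeatedly and pick the next big step from the accumulated lessons; that part is left out here because it depends entirely on the project.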
I am currently porting pyte to Go through a similar approach (feeding the LLM with a core SPEC and two VT100/VT220 test suites). It's chugging along quite nicely.
Anthropic is actually sort of concerned with not burning through cash and charging people a reasonable price. Open AI doesn’t care. I can use Codex CLI all day and not approach any quotas with just my $20 a month ChatGPT subscription.
I treat coding agents like junior developers and never take my hand off the wheel except for boilerplate refactoring.
The other day I got Codex to one-shot an upgrade to Vite 8 at my day job (a real website with revenue). It worked on this for over 3 hours without intervention (I went to sleep). This is now in production.
(but honestly for a lot of websites and web apps you really can just send it, the stakes are very low for a lot of what most people do, if they're honest with themselves)
I find this absolutely wild. From my experience Codex code quality is still not as good as a human's, so letting Codex do something and not verifying / cleaning up behind it will most likely result in lower code quality and possibly subtle bugs.
For upgrading frameworks and such there are usually not that many architectural decisions to be made, where you care about how exactly something is implemented. Here the OP could probably verify the build works, with all the expected artifacts quite easily.
Agreed. Optimistically let it resolve merge conflicts in an old complex branch. Looked fine at first but was utter slop upon further review. Duplication, wildly unnecessary complexity and all.