I wonder, based on your experience, how hard it would be to improve your system to have an AI agent review the software and suggest tickets?
Like, can an AI agent use a browser, attempt to use the software, find bugs and create a ticket? Can an AI agent use a browser, try to use the software and suggest new features?
Personally, my theory is that solving the messiness will require new frameworks, and even new languages, designed to catch AI mistakes in large code bases. For example, AIs used to sometimes hallucinate methods that don't exist. But in a language with a strong type system, a static type checker can catch that mistake and give the AI automated feedback to fix it without a human in the loop.
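The idea of automated feedback on hallucinated methods can be sketched even without a full type checker. Here's a minimal, hypothetical example (the `PaymentClient` API and the checker are both made up for illustration): parse the AI-generated code and flag any call to a method that doesn't actually exist on the target class, which is roughly what mypy or a compiler would report.

```python
import ast
import inspect

class PaymentClient:
    """Hypothetical API surface the AI is supposed to call."""
    def charge(self, amount: int) -> None: ...
    def refund(self, amount: int) -> None: ...

def check_generated_code(source: str, api: type) -> list[str]:
    """Flag calls to methods that don't exist on `api` -- the kind of
    hallucination a static type checker catches automatically."""
    known = {name for name, _ in inspect.getmembers(api, callable)}
    errors = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and isinstance(node.func.value, ast.Name)
                and node.func.value.id == "client"  # assume the variable is named `client`
                and node.func.attr not in known):
            errors.append(f"{api.__name__} has no method '{node.func.attr}'")
    return errors

generated = "client.charge(100)\nclient.charge_with_retry(100)\n"
print(check_generated_code(generated, PaymentClient))
# flags charge_with_retry; that message can be fed straight back to the AI
```

The point is that the error message is machine-readable, so it can be piped back into the agent's context for a retry with no human involved.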
As far as humans in the loop go, the only human we ultimately cannot get rid of is the user. But I think that with a combination of user feedback forms and automated metrics, we can give AI a lot of feedback about how good the software is just from users using it.
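One way to picture combining those signals is to blend explicit form ratings with an implicit usage metric into a single score an AI loop could optimize against. This is a toy sketch; the inputs, the 1-5 rating scale, and the 50/50 weighting are all arbitrary assumptions.

```python
from statistics import mean

# Hypothetical signals: explicit feedback-form ratings plus an automated
# metric (here, whether the user completed their task) from real usage.
ratings = [5, 4, 2, 5, 3]                # 1-5 stars from feedback forms
completions = [True, True, False, True]  # did the user finish their task?

def software_quality_signal(ratings: list[int], completions: list[bool]) -> float:
    """Blend explicit and implicit feedback into one score
    (the equal weights are an arbitrary choice)."""
    rating_score = mean(ratings) / 5          # normalize to 0..1
    completion_rate = sum(completions) / len(completions)
    return round(0.5 * rating_score + 0.5 * completion_rate, 3)

print(software_quality_signal(ratings, completions))  # 0.755
```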
Yes, they can, and they do a reasonably good job of it. Hand them Playwright or similar and point them at it. The caveat is that they're often "lazy", and it takes some practice to coax them into being thorough. (Hot tip: have one write a list of things to probe and test, then tell it to use sub-agents to address each item; otherwise they tend to decide very quickly that it's too tedious and start taking shortcuts.)
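The checklist-plus-sub-agents pattern can be sketched abstractly. In this hypothetical example the probe list stands in for what an orchestrating agent would write, and `run_sub_agent` stands in for a sub-agent actually driving a browser (e.g. via Playwright); the point is the fan-out, which keeps any single agent from skipping items.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical probe list the orchestrating agent writes up front,
# before any browsing happens.
PROBES = [
    "submit the signup form with an empty email",
    "open the settings page while logged out",
    "paste a 10,000-character string into the search box",
]

def run_sub_agent(probe: str) -> dict:
    """Stand-in for a sub-agent driving a real browser.
    Returns a structured, ticket-like result for its one probe."""
    return {"probe": probe, "status": "needs-review"}

def orchestrate(probes: list[str]) -> list[dict]:
    # Fan each checklist item out to its own worker, so every probe
    # gets addressed even if an individual agent would have cut corners.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(run_sub_agent, probes))

tickets = orchestrate(PROBES)
print(len(tickets))  # one result per probe
```

Each sub-agent gets exactly one narrow task, which is what makes it hard for the overall run to quietly declare the job "done" after two clicks.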