I wonder, based on your experience, how hard it would be to improve your system to have an AI agent review the software and suggest tickets?
Like, can an AI agent use a browser, attempt to use the software, find bugs and create a ticket? Can an AI agent use a browser, try to use the software and suggest new features?
Personally, my theory is that solving the messiness will require new frameworks, and even new languages, designed to catch AI mistakes in large code bases. For example, AIs used to sometimes hallucinate methods that don't exist. But in a language with a strong type system, a static type checker can catch that mistake and give the AI automated feedback to fix it without a human in the loop.
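The idea of automated feedback on hallucinated methods can be sketched even without a full type checker. Here's a minimal, hypothetical example (the `PaymentClient` API and the checker are both made up for illustration): parse the AI-generated code and flag any call to a method that doesn't actually exist on the target class, which is roughly what mypy or a compiler would report.

```python
import ast
import inspect

class PaymentClient:
    """Hypothetical API surface the AI is supposed to call."""
    def charge(self, amount: int) -> None: ...
    def refund(self, amount: int) -> None: ...

def check_generated_code(source: str, api: type) -> list[str]:
    """Flag calls to methods that don't exist on `api` -- the kind of
    hallucination a static type checker catches automatically."""
    known = {name for name, _ in inspect.getmembers(api, callable)}
    errors = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and isinstance(node.func.value, ast.Name)
                and node.func.value.id == "client"  # assume the variable is named `client`
                and node.func.attr not in known):
            errors.append(f"{api.__name__} has no method '{node.func.attr}'")
    return errors

generated = "client.charge(100)\nclient.charge_with_retry(100)\n"
print(check_generated_code(generated, PaymentClient))
# flags charge_with_retry; that message can be fed straight back to the AI
```

The point is that the error message is machine-readable, so it can be piped back into the agent's context for a retry with no human involved.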
As far as humans in the loop go, the only human we ultimately cannot get rid of is the user. But I think that with a combination of user feedback forms and automated metrics, we can give AI a lot of feedback about how good the software is just from users using it.
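One way to picture combining those signals is to blend explicit form ratings with an implicit usage metric into a single score an AI loop could optimize against. This is a toy sketch; the inputs, the 1-5 rating scale, and the 50/50 weighting are all arbitrary assumptions.

```python
from statistics import mean

# Hypothetical signals: explicit feedback-form ratings plus an automated
# metric (here, whether the user completed their task) from real usage.
ratings = [5, 4, 2, 5, 3]                # 1-5 stars from feedback forms
completions = [True, True, False, True]  # did the user finish their task?

def software_quality_signal(ratings: list[int], completions: list[bool]) -> float:
    """Blend explicit and implicit feedback into one score
    (the equal weights are an arbitrary choice)."""
    rating_score = mean(ratings) / 5          # normalize to 0..1
    completion_rate = sum(completions) / len(completions)
    return round(0.5 * rating_score + 0.5 * completion_rate, 3)

print(software_quality_signal(ratings, completions))  # 0.755
```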
Yes, they can, and they do a reasonably good job of it. Hand them Playwright or similar and point them at it. The caveat is that they're often "lazy", and it takes some practice to coax them into being thorough. (Hot tip: have one write a list of things to probe and test, then tell it to use sub-agents to address each item; otherwise they tend to decide very quickly that it's too tedious and start taking shortcuts.)
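The checklist-plus-sub-agents pattern can be sketched abstractly. In this hypothetical example the probe list stands in for what an orchestrating agent would write, and `run_sub_agent` stands in for a sub-agent actually driving a browser (e.g. via Playwright); the point is the fan-out, which keeps any single agent from skipping items.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical probe list the orchestrating agent writes up front,
# before any browsing happens.
PROBES = [
    "submit the signup form with an empty email",
    "open the settings page while logged out",
    "paste a 10,000-character string into the search box",
]

def run_sub_agent(probe: str) -> dict:
    """Stand-in for a sub-agent driving a real browser.
    Returns a structured, ticket-like result for its one probe."""
    return {"probe": probe, "status": "needs-review"}

def orchestrate(probes: list[str]) -> list[dict]:
    # Fan each checklist item out to its own worker, so every probe
    # gets addressed even if an individual agent would have cut corners.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(run_sub_agent, probes))

tickets = orchestrate(PROBES)
print(len(tickets))  # one result per probe
```

Each sub-agent gets exactly one narrow task, which is what makes it hard for the overall run to quietly declare the job "done" after two clicks.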