
> You need to set up your fine-tuning and prompts and then test well for consistent results.

Tell that to Google...

Seriously, it is well established that these systems hallucinate. Claiming otherwise means you're pushing something that just isn't true.

They can be right, yes. But when they are wrong they can be catastrophically wrong. You could be wasting time looking into the wrong problem with something like this.



If you're curious what the state of the art in multi-agent is looking like, I really recommend https://thinkwee.top/multiagent_ebook/


This looks great! Unfortunately it doesn't work well on Firefox, but I take that as being Mozilla's fault nowadays.


The most common hallucinations I've seen are phantom GitHub repos and issues, and this usually appears when I ask for a source.


It's definitely an overblown problem. In practice it's not a big issue.


Yeah.... no it's really not overblown.

It is a serious problem when these tools are being pushed as trustworthy when they are anything but.

On an almost daily basis I deal with some sort of hallucination, whether in code or in a summary, and we see it constantly on social media when people try to use Google's AI summary as a source of truth.

Let's not lie about what these models can do to push an agenda. They are very powerful, but they make mistakes, quite often, and there is zero question about that.

The problem isn't just that they hallucinate, the problem is that we have comments like yours trying to downplay it. Then we have people for whom it is right just enough times that they start trusting it without double checking.

That is the problem: it is right enough times that you just start accepting the answers. That leads to things like scripts that grab data and put it into a database without checking. That's fine if it is not business-critical data, but it's not really fine when we are talking about health care data or, say, police records like a recent post was talking about.

If you are going to use it for your silly little project, or you're going to bring down your own company's infrastructure, go for it. But let's not pretend the problem doesn't exist and shove this technology into far more sensitive areas.


Yeah.... no it's overblown.

I think you're exaggerating. You're imagining the worst, but your argument basically boils down to not trusting that people can handle it, and to calling me a liar. Good one.


> Tell that to Google...

Yeah, because Google's LLMs have a completely open question/answer space.

For a Kubernetes AI, for example, you can nowadays just feed in the whole Kubernetes docs plus a few reference Helm charts, tell it to stick close to the material, and you'll have next to no hallucinations. The same goes for simple data extraction tasks, where in the past you couldn't use LLMs because they would hallucinate data into the output that wasn't in the input (e.g. completely mangling an ID); nowadays that's essentially a non-issue.

As soon as you restrict the space in which the LLM acts, you have plenty of options to tune it so that hallucinations are no longer a major issue.
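
To make the "stick close to the material" idea concrete, here's a minimal sketch of what grounding a prompt in supplied documentation can look like. It's not any vendor's actual API: the doc chunks, the toy keyword retriever, and the call_llm reference are all placeholders for whatever retrieval step and model client you actually use.

    # Minimal sketch: restrict the model's answer space by stuffing retrieved
    # documentation into the prompt and instructing it to refuse when the
    # excerpts don't cover the question.

    DOC_CHUNKS = [
        "A Deployment provides declarative updates for Pods and ReplicaSets.",
        "A Service exposes an application running on a set of Pods.",
        # ... in practice: the full Kubernetes docs plus a few reference Helm charts
    ]

    SYSTEM_PROMPT = (
        "You are a Kubernetes assistant. Answer ONLY from the documentation "
        "excerpts provided. If they do not contain the answer, say you don't know."
    )

    def retrieve_chunks(question: str, k: int = 3) -> list[str]:
        """Toy keyword-overlap retrieval; a real setup would use embeddings."""
        words = set(question.lower().split())
        scored = sorted(DOC_CHUNKS, key=lambda c: -len(words & set(c.lower().split())))
        return scored[:k]

    def build_prompt(question: str) -> str:
        """Assemble the grounded user prompt from the retrieved excerpts."""
        context = "\n\n---\n\n".join(retrieve_chunks(question))
        return f"Documentation excerpts:\n{context}\n\nQuestion: {question}"

    # call_llm(SYSTEM_PROMPT, build_prompt("How do I expose a set of Pods?"))
    # where call_llm is whichever model client you actually use.

The point is just that the refusal instruction plus the narrow context is what shrinks the answer space; the retrieval mechanism itself can be as simple or as fancy as you like.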



