> It depends on your definition of smart. I think that holding a degree != smart.
You wrote:
> I've seen people with masters fail at spotting hallucinations in elementary level word problems.
I wanted to express that having a master's in some (even complicated) subject does not make you a master at [pun intended] spotting hallucinations. To give evidence for this statement, I gave a different, more down-to-earth example of a similar situation.
Q: A farmer has 72 chickens. He sells 15 chickens at the market and buys 8 new chicks. Later that week, a fox sneaks into the coop and eats 6 chickens. How many chickens could the farmer sell at the market tomorrow?
AI Answer: The farmer started with 72 chickens. After selling 15, he had 57 chickens left. Then he bought 8 new chicks, bringing the total to 65. Finally, the fox ate 6 chickens, so we subtract 6 from 65. This gives us 59 chickens. Therefore, the farmer now has 59 chickens that he could sell at the market tomorrow.
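For what it's worth, the AI's arithmetic itself checks out; the whole dispute below is about whether the 8 chicks count as sellable chickens tomorrow. A quick sketch of the two readings (variable names are mine, not from the problem):

```python
# Reproduce the AI's arithmetic from the word problem.
start = 72
after_sale = start - 15           # 57 chickens after the market sale
after_purchase = after_sale + 8   # 65, if the new chicks count as chickens
after_fox = after_purchase - 6    # 59 after the fox's visit

print(after_fox)  # 59 -- the AI's answer

# The "gotcha" reading excludes the 8 chicks from tomorrow's sale,
# on the theory that chicks can't mature into chickens within a week:
sellable_excluding_chicks = after_fox - 8
print(sellable_excluding_chicks)  # 51
```

Both numbers are internally consistent; which one is "correct" depends entirely on the interpretation argued about in this thread.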
--
You'd expect someone who can read/understand proofs to be able to spot the flaw in the logic: it takes longer than one week for chicks to turn into chickens.
> You'd expect someone who can read/understand proofs to be able to spot the flaw in the logic: it takes longer than one week for chicks to turn into chickens.
Rather, I'd assume that someone who is capable of spotting the flaw in the logic has a decent knowledge of the English language (in this case, the difference in meaning between "chick" and "chicken").
Many people who are good mathematicians (i.e. capable of "reading/understanding proofs", as you put it) are not native English speakers, nor do they have a great L2 level of English.
> But I was told that humans have this thing called "general intelligence", which means they should be capable of doing both math and English!
You confuse "intelligence" with "knowledge". To keep to your example: there exist quite a lot of highly intelligent people on earth who don't or barely know English.
Some native English speakers might still question that statement subconsciously, so let me make it clearer for them: there are many highly intelligent people in the world who don't speak the Rarámuri language.
As a layman, I have no clue at what point a chick turns into a chicken. I also think this isn't even answerable, because "new chick" doesn't really imply "newborn" but only means "new to the farmer", so the chicks could be at an age where they would be chickens a week later, no?
I still call my 12 year old cat a "kitty". If someone marked my answer as incorrect because "chicks aren't chickens yet" I would think they're wasting their time with riddles instead of actual intelligence testing. Besides, if the chicks were sellable to the farmer, why the hell wouldn't the farmer be able to sell them?
The OP there also has a pretty bad riddle (due to a grammatical error that completely changes the meaning and makes the intended solution nonsensical, and a solution that many people wouldn’t even have heard of).
Exactly! I read that riddle and thought "a couple islands over the international date line" solely because of the last line, but still had no idea what those islands thousands of miles away from me were named. Might as well make the riddle about who their little brother is, and make the answer "Fairway Rock", if niche knowledge is your goal. Which, completely to GPT-o1's credit, it did solve in a single prompt when I asked!
> Besides, if the chicks were sellable to the farmer, why the hell wouldn't the farmer be able to sell them?
I think maybe the original poster is making some sort of additional assumption that the farmer must be selling chickens as meat at the market and a chick wouldn't be sold for that purpose until it's a mature chicken?
(Of course, depending on how you interpret the question, a chick is a chicken (the species), and there's nothing inherently preventing reselling the chicks, so I don't really understand why OP thinks the AI answer is clearly, objectively wrong. It seems more like a matter of interpretation.)
After posting I realized that the farmer bought some chicks so it could be interpreted that way. I should have modified it to say that 6 chickens hatched.
Anyways this thread is a perfect example of the chaotic datasets that are being used to train FMs. These arguments of whether it’s reasonable to assume a chick could mature into a chicken within a week are happening everyday and have been taking place for years. Safe to say a billion dollars has been spent on datasets to train FMs where everybody has a different interpretation and the datasets are not aligned.
When an educated person misses this question, it's not because the temporal logic is out of their reach. It's because they scanned the problem and answered quickly. They're pattern matching to a type of problem that wouldn't include the "tomorrow/next week" trick, and then giving the correct answer to that.
Imo it's evidence that humans make assumptions and aren't always thorough more than evidence of smart people being unable to perform elementary logic.
The humans were prompted to read the AI responses very carefully because their hallucinations are very good at convincing you with words. It takes a certain skillset to question every word that comes out of a language model because most people will go “hmm yeah that logic seems right”. So hiring “smart” people is insufficient, you need very paranoid people who question every assumption.
Implying there isn’t a market for chickens that are chicks? Clearly there is. The question literally states that the farmer bought chicks, so logically they could go back on the market. They don’t need to be older.
It did a better job in explaining that there is ambiguity in the question, but still went ahead with an arbitrary assumption in order to answer it. I think it is fair to say it is right, but so was the other attempt. Each interpretation is quite valid.
"Most right" would have been to ask questions about what is being asked instead of trying to answer an incomplete question. But rarely is the human even willing to do that as it is bizarrely seen as a show of weakness or something. An LLM is only as good as its training data, unfortunately.
I agree both got it right, in the sense that it wouldn't be a stupid thing for a human to do. If there's a follow-up from someone, I'm sure the more basic llm would have been able to adjust.
Regardless, I think it's a good showing that models are increasingly able to solve these "gotcha" questions, even though I think it's not hugely useful. Partly because I think it's a poor complaint and an easy shutdown.
Or the other one, a flock of 8 birds are sitting on a fence. The farmer shoots 1. How many are left? 8-1 is 7, but the answer is zero, because the gun shot scared the rest of them off. Fwiw, ChatGPT says zero.
At some point, we decided that compilers were good enough at converting code into assembly to just use them. Even if an absolute master could write better assembly than the compiler, we moved over to using compilers because of the advantages they offered.
The question is: for what? For the level of interaction that many day-to-day tasks require, ChatGPT meets that. When you're going grocery shopping, how often do you get stopped at the door by a security guard who won't let you pass unless you answer their riddle? The A in AI stands for artificial, so it's going to look different than human intelligence, but we're at a point where I can throw some words at the computer and it will generate an essay for me relevant to the words I threw at it. It may not get every little detail right, but I'm amazed by that, because I've had meaningful interactions with humans via text who wouldn't have caught the chicks-versus-chickens gotcha.
Is ChatGPT an all-knowing and infallible oracle? Clearly not. But holding it to a higher standard than we hold other humans to is an unfair test of its abilities.
If you'd said "hens" you'd have a stronger point, but then you'd need to be talking about chicks and hens (and they could still cross whatever adulthood threshold you like within the week, as you didn't specify how young they are - "new" could just mean new to the farmer).