
So if I say "I want to measure your capability as a mechanic" but then also say "to ensure an accurate score, you're forbidden to use any tools," how are you, the human mechanic, planning to diagnose and fix the engine problem without wrenches, jack stands, and the like? It makes no sense.

That said, their harness isn't generic. It includes a ridiculously detailed prompt for how to play this specific game. Forbidding tool use is arbitrary and, above all, pointless hoop-jumping, but that doesn't make the linked "achievement" any less fraudulent.



It is more like restricting the mechanic to using only commercially available tools and not allowing them to create CUSTOM tools.


No, that would be analogous to disallowing customized harnesses, i.e. tooling specially crafted by someone else for the specific task at hand. Insisting that an LLM solve something without the ability to make use of any external tooling whatsoever is almost perfectly analogous to insisting that a human mechanic work on a car with nothing but their bare hands.

The wrench is to the mechanic as the stock python repl is to the LLM.


They want the LLM that does ARC-AGI-3 to be the same LLM that everyone uses.


Rephrase that in terms of the human mechanic and hopefully you can see the error of that reasoning. LLMs that perform tasks (as opposed to merely holding conversations) use tools just like we do. That's literally how we design them to operate.

In fact the LLMs that everyone uses today typically have access to specialized task specific tooling. Obviously specialized tools aren't appropriate for a test that measures the ability to generalize but generic tools are par for the course. Writing a bot to play a game for you would certainly serve to demonstrate an understanding of the task.


I'm pretty sure the LLM can use tools while doing ARC-AGI-3, but they have to be the same tools available all the time, not an incredibly elaborate custom harness.


To quote someone else from upthread: tool use requires a harness. Without one, an LLM as commonly understood is a bare model that receives inputs and directly produces outputs, the same as talking to an unaided person.


Then the LLM has to write the harness.


I'd like to suggest that, before expressing disagreement, you reread the comment you're replying to and make sure your understanding is correct.

Quoting this for the second time now: tool use requires a harness.

Without a harness, the LLM has no ability to interact with the world. It has no agency. It's just spitting out text (or whatever else) into the void. There are no programming tools, no filesystem, no shell, nothing.
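To make the point concrete, here's a minimal sketch of what a harness actually is: a loop, written outside the model, that executes the model's tool requests and feeds results back in. All names here (`model` returning a dict, the message shapes) are hypothetical and illustrative, not any real vendor API.

```python
import subprocess

def run_shell(cmd):
    """One 'tool' the harness exposes to the model."""
    return subprocess.run(cmd, shell=True, capture_output=True, text=True).stdout

TOOLS = {"shell": run_shell}

def harness(model, task):
    messages = [{"role": "user", "content": task}]
    while True:
        reply = model(messages)  # the bare model: text/dict in, text/dict out
        if reply.get("tool") is None:
            # No tool requested: this plain output is all a bare model can do.
            return reply["content"]
        # The harness, not the model, actually runs the tool and
        # appends the result as a new input for the next turn.
        result = TOOLS[reply["tool"]](reply["args"])
        messages.append({"role": "tool", "content": result})
```

The loop is the agency: remove it and the model's "run this command" output is just text going into the void.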


And by the rules of ARC-AGI-3, the LLM will have to write any harness it needs. I'm not sure what we are even arguing about at this point.



