Hacker News | akavel's comments

Also, in the meantime, there's https://SWE-rebench.com as a nice riff on SWE-bench, as far as I understand.

There's a really nice, very low-power, 84x48 B&W LCD screen still widely available for electronics use, a clone of a Nokia 5110 screen - see e.g.:

- https://github.com/akavel/clawtype#clawtype

- mandatory "Bad Apple" vid (not mine): https://youtu.be/v6HidvezKBI

(for the "splash screen" linked above I used font u8g2_font_3x5im_te: https://docs.rs/u8g2-fonts/latest/u8g2_fonts/fonts/struct.u8... and a multilingual u8g2_font_tiny5_t_all: https://docs.rs/u8g2-fonts/latest/u8g2_fonts/fonts/struct.u8...)



Well, maybe the flamingo is a really good unicyclist...

https://youtu.be/Rrpgd5oIKwI


r/LocalLlama is now doing a horse in a racing car:

https://redd.it/1slz38i


AFAIU, their claim is that Mythos is in reality used in a framework that builds such contextual hints, and that their (Aisle's) own framework does the same:

"(...) a well-designed scaffold naturally produces this kind of scoped context through its targeting and iterative prompting stages, which is exactly what both AISLE's and Anthropic's systems do."


All evidence points to LLMs not being sufficient for the tasks everyone wants them to do: the harnesses and agentic capabilities that shove them through JSON-shaped holes are utterly necessary, along with all the security, and there's no great singularity happening here.

The current tech is a sigmoid, and even using the abilities of the AI, novel improvements don't appear to be happening at any exponential pace.


> The current tech is a sigmoid

What makes you say that? I'm only asking because the data I've seen looks pretty cleanly exponential still, e.g. https://metr.org.


Lol, young padawan, check out those weird old programs that were called "VisiCalc" and "Lotus 1-2-3".

https://en.wikipedia.org/wiki/VisiCalc

https://en.wikipedia.org/wiki/Lotus_1-2-3


Which were from before GUIs of any complexity were possible. There was no alternative at the time.

Relatedly, see the insane success of and excitement around the early GUI-based operating systems.


In the classic FLOSS tradition, it would be cool if you might still consider publishing such a "not-ready" repository: some people may (or may not!) still be interested, and also (sorry!) there's the bus factor... But, also in the classic FLOSS tradition, it's 100% your decision and you have the full right to do it any way you like!


I'm trying to disable "thinking", but it doesn't seem to work (in llama.cpp). The usual `--reasoning-budget 0` doesn't seem to change it, nor `--chat-template-kwargs '{"enable_thinking":false}'` (both with `--jinja`). Am I missing something?

EDIT: Ok, looks like there's yet another new flag for that in llama.cpp, and this one seems to work in this case: `--reasoning off`.
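The same toggle can also be attempted per request instead of on the command line. A minimal sketch (assuming llama-server's OpenAI-compatible /v1/chat/completions endpoint, started with `--jinja`, and assuming it forwards a `chat_template_kwargs` field from the request body to the chat template, which I haven't verified across llama.cpp versions):

```python
import json

# Hypothetical request body for llama-server's OpenAI-compatible
# /v1/chat/completions endpoint. "chat_template_kwargs" is assumed here
# to be passed through to the Jinja chat template, mirroring the
# --chat-template-kwargs CLI flag mentioned above.
payload = {
    "model": "local",  # placeholder; llama-server serves whatever model it loaded
    "messages": [
        {"role": "user", "content": "Write a minimal flake.nix skeleton."}
    ],
    "chat_template_kwargs": {"enable_thinking": False},
}

body = json.dumps(payload)
print(body)
```

You'd then POST `body` to e.g. http://localhost:8080/v1/chat/completions with Content-Type: application/json (via curl or any HTTP client); whether the template actually honors `enable_thinking` depends on the model's chat template.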

FWIW, I'm doing some initial tries of unsloth/gemma-4-26B-A4B-it-GGUF:UD-Q4_K_XL, and for writing some Nix I'm VERY impressed: it seems significantly better than qwen3.5-35b-a3b for me so far. Example command line on a MacBook Air M4 with 32 GB RAM:

  llama-cli -hf unsloth/gemma-4-26B-A4B-it-GGUF:UD-Q4_K_XL --temp 1.0 --top-p 0.95 --top-k 64 -fa on --no-mmproj --reasoning-budget 0 -c 32768 --jinja --reasoning off
(at release b8638, compiled with Nix)


Oh very cool! Will check the `--reasoning off` flag as well!

Yep the models are really good!


See also: "I'm Not a Robot" (2025 Academy Award Winner) https://youtu.be/4VrLQXR7mKU

