AFAIU, their claim is that Mythos is actually used inside a framework that builds such contextual hints, and that their (AISLE's) own framework does the same:
"(...) a well-designed scaffold naturally produces this kind of scoped context through its targeting and iterative prompting stages, which is exactly what both AISLE's and Anthropic's systems do."
All evidence points to LLMs not being sufficient for the tasks everyone wants them to do: the harnesses and agentic capabilities that shove them through JSON-shaped holes (as in the sketch below) are utterly necessary, as is all the security tooling, and there's no great singularity happening here.
The current tech is on a sigmoid, and even with the AI's own abilities in the loop, novelty and improvements don't appear to be happening at any exponential pace.
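To make the "JSON-shaped holes" point concrete, here's a rough sketch of what such a harness boils down to - a minimal Python example with the model call stubbed out; all names and the fake `complete` function are invented for illustration, not any particular framework's API:

```python
import json

SCHEMA_HINT = (
    'Respond ONLY with a JSON object like {"finding": str, "severity": "low"|"high"}.'
)

def complete(prompt: str) -> str:
    # Stand-in for a real LLM call; a real harness would hit an API here.
    return '{"finding": "buffer overflow in parse()", "severity": "high"}'

def ask_structured(question: str, retries: int = 3) -> dict:
    """Shove the model's free-form output through a JSON-shaped hole,
    re-prompting until it fits or we give up."""
    prompt = f"{question}\n{SCHEMA_HINT}"
    for _ in range(retries):
        raw = complete(prompt)
        try:
            obj = json.loads(raw)
        except json.JSONDecodeError:
            prompt = f"{question}\n{SCHEMA_HINT}\nYour last reply was not valid JSON."
            continue
        # Validate the shape, not just the syntax.
        if isinstance(obj.get("finding"), str) and obj.get("severity") in ("low", "high"):
            return obj
        prompt = f"{question}\n{SCHEMA_HINT}\nYour last reply had the wrong fields."
    raise ValueError("model never produced schema-conforming output")

print(ask_structured("Audit this function for memory bugs."))
```

The retry loop, the validation and the re-prompting do a lot of the heavy lifting that then gets attributed to the model itself.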
In the classic FLOSS tradition, it would be cool if you'd still consider publishing such a "not-ready" repository - some people may (or may not!) still be interested, and also (sorry!) there's the bus factor... But on the other hand, in the classic FLOSS tradition, it's also 100% your decision, and you have the full right to do it any way you like!
I'm trying to disable "thinking", but it doesn't seem to work (in llama.cpp). The usual `--reasoning-budget 0` doesn't seem to change it, nor `--chat-template-kwargs '{"enable_thinking":false}'` (both with `--jinja`). Am I missing something?
EDIT: Ok, looks like there's yet another new flag for that in llama.cpp, and this one seems to work in this case: `--reasoning off`.
FWIW, I'm doing some initial tries of unsloth/gemma-4-26B-A4B-it-GGUF:UD-Q4_K_XL, and for writing some Nix, I'm VERY impressed - it seems significantly better than qwen3.5-35b-a3b for me so far. Example command line on a MacBook Air M4 with 32 GB RAM:
`llama-cli -hf unsloth/gemma-4-26B-A4B-it-GGUF:UD-Q4_K_XL --temp 1.0 --top-p 0.95 --top-k 64 -fa on --no-mmproj --reasoning-budget 0 -c 32768 --jinja --reasoning off`