I don't know what you're talking about; I replaced an older GPT-4o with a fine-tuned Qwen. There is a huge amount of "AI" work that can be done with those models, or partly by those models. A huge number of people would not notice the difference, and if you prepare the context correctly, an even bigger slice would not notice.
If it helps, I mean it in a really literal sense. qwen3.6 27b is currently $3.20 per million tokens on OpenRouter, which is way overpriced. As good as the 27b is, Kimi K2.5 is $3.00 and it's just in another league in terms of capability. There's no reason to spend money on the 27b at that price.
And even Alibaba's own qwen3.6-plus is $1.95, so it's easy to conclude that neither Alibaba nor anyone else is really interested in hosting that model.
And don't get me wrong, I fully agree with you, qwen3.6 27b is an amazing model. I run it on my own hardware and I'm constantly surprised by what it can zero-shot.
Genuinely curious, what are you "fine tuning" these smaller models to do reliably? I hear this talked about a lot, but very few people actually cough up examples, and I'd love to hear of one.
It depends. One is a super small model fine-tuned to do function calling, instead of sending the request to a big model and waiting: you ask for revenue in the last month, the small LLM emits a function call -> show results. Some bigger ones handle analysis, summarization, classification.
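A minimal sketch of that routing, assuming the small model sits behind vLLM's OpenAI-compatible server; the endpoint, the model name, and the get_revenue tool are all hypothetical placeholders, not anything from the original comment:

```python
# Sketch: route a natural-language question to a small fine-tuned model
# that emits a function call, then execute that call locally.
# Assumptions (all hypothetical): a vLLM OpenAI-compatible server on
# localhost:8000, a fine-tune named "my-qwen-2b-fncall", and get_revenue()
# standing in for your real data backend.
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

tools = [{
    "type": "function",
    "function": {
        "name": "get_revenue",
        "description": "Return total revenue for a given period.",
        "parameters": {
            "type": "object",
            "properties": {
                "period": {"type": "string", "enum": ["last_month", "last_quarter"]},
            },
            "required": ["period"],
        },
    },
}]

def get_revenue(period: str) -> float:
    return 42_000.0  # placeholder: query your real data store here

resp = client.chat.completions.create(
    model="my-qwen-2b-fncall",
    messages=[{"role": "user", "content": "What was our revenue last month?"}],
    tools=tools,
)

# The small model's whole job is to produce this structured call quickly.
call = resp.choices[0].message.tool_calls[0]
if call.function.name == "get_revenue":
    args = json.loads(call.function.arguments)
    print(get_revenue(**args))
```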
What is great with the smaller ones (I'm looking at 2B, 4B) is that you can get huge throughput with just vLLM and a couple of consumer GPUs.
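For illustration, batched offline inference with vLLM looks roughly like this; the model name is illustrative, and tensor_parallel_size=2 assumes two GPUs:

```python
# Sketch: batched offline inference with vLLM across two consumer GPUs.
# The model name is illustrative; substitute your own small fine-tune.
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-3B-Instruct", tensor_parallel_size=2)
params = SamplingParams(temperature=0.0, max_tokens=128)

prompts = [f"Classify the sentiment of review #{i}: ..." for i in range(10_000)]

# vLLM batches and schedules all of these internally; this is where the
# throughput win over one-request-at-a-time serving comes from.
outputs = llm.generate(prompts, params)
for out in outputs[:3]:
    print(out.outputs[0].text)
```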
What I usually do is basically distillation of a big model onto a smaller one.
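In its simplest form that is sequence-level distillation: sample the teacher, then run ordinary SFT on the student over those outputs. A sketch of the data-generation half, with the teacher name, prompts, and output path as placeholders:

```python
# Sketch: the data-generation half of sequence-level distillation.
# Sample completions from the big "teacher" model and save prompt/response
# pairs as JSONL; the small "student" is then fine-tuned on that file with
# an ordinary SFT stack. Teacher name and task prompts are placeholders.
import json
from openai import OpenAI

client = OpenAI()  # teacher behind any OpenAI-compatible endpoint

prompts = ["Summarize: ...", "Classify: ..."]  # your real task inputs

with open("distill_sft.jsonl", "w") as f:
    for p in prompts:
        resp = client.chat.completions.create(
            model="big-teacher-model",  # placeholder
            messages=[{"role": "user", "content": p}],
        )
        pair = {"messages": [
            {"role": "user", "content": p},
            {"role": "assistant", "content": resp.choices[0].message.content},
        ]}
        f.write(json.dumps(pair) + "\n")
```

The student then trains on distill_sft.jsonl with whatever SFT tooling you already use (HF TRL's SFTTrainer, for example).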
Nice, I'll run it later against qwen3.6 27b. Speed was one of the reasons I was running Qwen and not Gemma, and the difference was big. There is some magic that happens when you get more than 100 tps.
I was part of the beta; it's a properly good model. In some sense I forgot that I'm not on Opus or GPT. Opus is still better. GPT is the one struggling for me: it has a niche in backend work, but you can get the same from Opus with skills, and it's lacking in almost all other areas.
I have GLM and Kimi. Kimi was better in most cases and my replacement for Claude when I run out of tokens. Now I'm finding myself using GLM more than Kimi. It's funny that GLM vs Kimi is like Codex vs Claude: GLM and Codex are better for backend, and Kimi and Claude more for frontend.
Since Kimi did a huge amount of Claude distillation, that pattern seems to be somewhat grounded in the data.
Yeah, it seems they did not align it too much, at least for now. Yesterday it helped me bypass the bot detection on a local marketplace that I wanted to scrape some listings from for my personal alerting system. All the others failed, but GLM 5.1 found a set of parameters and tweaks to make my browser-in-a-container not be detected.
I always jump on the Chinese models when I'm trying to do something that the US ones chastise me for; they're a little more liberal, especially around copyright.
When it works and it's not slow, it can impress. Yesterday it solved something that Kimi K2.5 could not, and Kimi was the best open-source model for me. But it's still slow sometimes. I have Z.ai and Kimi subscriptions for when I run out of tokens on Claude (Max) and Codex (Plus).
I have a feeling it's nearing Opus 4.5 level, if they could fix it going crazy after ~100k tokens.
Why don't you start a new session or use the /compact command when context gets to 100k tokens?
From my testing it was OK up to 145k tokens, which was the largest context I had before switching to a new session. I think Z.ai officially said it should be good up to 200k tokens.
Using it in Open Code, the context gets compacted automatically when it grows too large.