By continuously testing competitors and local LLMs? The reason for rising prices is that they (Anthropic) probably realized that they have reached a ceiling of what LLMs are capable of, and while it's a lot, it is still not a big moat and it's definitely not intelligence.
> Anything but the simplest tooling is not transferable between model generations, let alone completely different families.
It is transferable. Yes, you will run into issues if you take prompts and workflows tuned for one model and send them to another unchanged, but most of the time fixing that is just a matter of tinkering with some prompt templates.
People port solutions between models all the time. It takes some work, but the amount of work is tractable.
Plus: this is absolutely the kind of task a coding agent can accelerate.
The biggest risk is if your solution is at the frontier of capability and a competing model (even another frontier model) just can't do it. But for a lot of use cases, that isn't the case. And even if it is the case today, there are decent odds that in a few more months it won't be.
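Concretely, the "tinkering" usually looks something like this rough sketch: the task stays model-agnostic and only the per-family wording changes (all names here are made up):

```python
# Illustrative sketch: keep the task model-agnostic and isolate model-specific
# wording in per-family templates. All names here are made up.

PROMPT_TEMPLATES = {
    "claude": "You are a careful assistant.\n\n<task>\n{task}\n</task>\n\nAnswer with JSON only.",
    "gpt": "You are a careful assistant.\n\nTask:\n{task}\n\nRespond with JSON only, no prose.",
    "local": "### Instruction:\n{task}\n\n### Response (JSON only):",
}

def build_prompt(model_family: str, task: str) -> str:
    """Return the model-specific prompt for a model-agnostic task description."""
    template = PROMPT_TEMPLATES.get(model_family, PROMPT_TEMPLATES["gpt"])
    return template.format(task=task)

# Porting to a new family is then mostly adding or tuning one entry:
print(build_prompt("claude", "Classify the sentiment of: 'great battery, awful screen'"))
```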
Yep. My approach has been: if I can't reliably get something to 90+% with a flash / nano / haiku class model, then it's not viable for any accuracy-critical work. (I don't know of, or have the luck of having, any other kind of work.) Starting out with pro / opus for production classification work has always been a trap.
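Rough sketch of that gate, using the OpenAI Python SDK just as an example; the tiny labeled set, the model name, and the 90% bar are all illustrative:

```python
# Sketch of a "cheap model first" viability gate. The model name, the tiny
# labeled set, and the 90% bar are illustrative, not from any real project.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

LABELED = [
    ("I want a refund for this broken charger", "complaint"),
    ("What are your opening hours?", "question"),
    ("Thanks, the issue is resolved now", "resolution"),
]

def classify(text: str, model: str = "gpt-4o-mini") -> str:
    resp = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "Reply with exactly one label: complaint, question, or resolution."},
            {"role": "user", "content": text},
        ],
    )
    return resp.choices[0].message.content.strip().lower()

correct = sum(classify(text) == label for text, label in LABELED)
accuracy = correct / len(LABELED)
print(f"accuracy: {accuracy:.0%}")
if accuracy < 0.9:
    print("Below the bar on a small model -> not viable for accuracy-critical work.")
```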
Ha. Sounds a lot like the lone 10x engineer vs. the predictably mediocre team held up by a scaffolding of processes. Aim high and hit or miss, or grind it out predictably and continuously. Same with humans; it depends on the loss you can afford.
If you're talking about APIs and SDKs, whether direct API calls or driving tools like Claude Code or Codex with a human out of the loop, I think it's actually fairly straightforward to switch between the various tools.
If you're talking about output quality, then yeah, that's not as easy. But for product outputs (building a customer service agent or something like that), a well-designed eval harness plus testing and iteration can get you some degree of convergence between models of similar generations. Coding is similar (iterate, measure), but harder to eval.
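For the API/SDK side, a minimal sketch of what "straightforward to switch" can look like: one thin wrapper so the rest of the code doesn't care which provider is behind it (model names and the routing rule are just placeholders):

```python
# Sketch of a thin provider-agnostic completion call, so switching backends is a
# config change rather than a rewrite. Model names and routing are placeholders.
from openai import OpenAI
import anthropic

_openai = OpenAI()                  # expects OPENAI_API_KEY
_anthropic = anthropic.Anthropic()  # expects ANTHROPIC_API_KEY

def complete(provider: str, model: str, system: str, user: str) -> str:
    """Same task, different backend: only this function knows which SDK is in use."""
    if provider == "openai":
        resp = _openai.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": system},
                {"role": "user", "content": user},
            ],
        )
        return resp.choices[0].message.content
    if provider == "anthropic":
        resp = _anthropic.messages.create(
            model=model,
            max_tokens=512,
            system=system,
            messages=[{"role": "user", "content": user}],
        )
        return resp.content[0].text
    raise ValueError(f"unknown provider: {provider}")

# Swapping models is then a config change, not a code rewrite:
print(complete("openai", "gpt-4o-mini", "Be terse.", "Summarize: the build failed on step 3."))
```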
For most tasks, at some future date, isn't there going to be some ambient baseline of capabilities you can get per $/tok, starting at ~0 for OSS models, such that eventually all tooling gets trivially transferable?
It's not that hard to make it generic. It does take a little work, but really it boils down to figuring out how to make things work with the "dumbest" model in your set.
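A sketch of that "dumbest model in your set" approach: walk the candidates cheapest-first and keep the first one that clears the bar. The run_eval() helper and model names here are hypothetical:

```python
# Hypothetical helper: walk candidate models cheapest-first and keep the first
# one that clears the accuracy bar. run_eval() and the model names are made up.
from typing import Callable

MODELS_CHEAPEST_FIRST = ["small-local-model", "haiku-class", "sonnet-class", "opus-class"]

def pick_model(run_eval: Callable[[str], float], threshold: float = 0.9) -> str:
    for model in MODELS_CHEAPEST_FIRST:
        score = run_eval(model)  # e.g. accuracy on a held-out labeled set
        print(f"{model}: {score:.0%}")
        if score >= threshold:
            return model         # cheapest model that is good enough
    raise RuntimeError("No model in the set clears the bar; rethink the task or the prompts.")

if __name__ == "__main__":
    fake_scores = {"small-local-model": 0.72, "haiku-class": 0.93}
    print(pick_model(lambda m: fake_scores.get(m, 0.95)))
```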
Note that it is very likely this market can't sustain this level of competition for long. We are all still chasing the carrot of AGI, while hardware costs skyrocket.