Hacker News
cbg0
17 days ago
on: Measuring Claude 4.7's tokenizer costs
That performance monitor is super easy to game if you cache responses to all the SWE bench questions.
solenoid0937
17 days ago
You dramatically overestimate how much time engineers at hypergrowth startups have on their hands.
dns_snek
16 days ago
There's a direct business incentive to game/cheat benchmarks, it wouldn't even be difficult to do, and besides, they have workforce-replacing AI to do it for them.
cbg0
17 days ago
Caching some data is time-consuming? They can just ask Claude to do it.