ericd 43 days ago | on: $500 GPU outperforms Claude Sonnet on coding bench...
Also, LLM servers get much more efficient at request queue depths above 1: tokens per second per GPU is massively higher with 100 concurrent requests than with 1 on, e.g., vLLM.
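To put rough numbers on that, here is a toy back-of-envelope model, not how vLLM actually schedules anything. It assumes each decode step reads the model weights from GPU memory once and shares that read across every request in the batch, so a step costs whichever is larger: the weight read or the batch's compute. All hardware figures (14 GB of fp16 weights, 1 TB/s bandwidth, 100 TFLOP/s) and the function names are hypothetical, and KV-cache traffic is ignored.

```python
def decode_step_seconds(batch_size, weight_bytes, mem_bandwidth,
                        flops_per_token, peak_flops):
    """Toy estimate of one decode step that emits one token per request."""
    # Weights are streamed from memory once per step, regardless of batch size.
    memory_time = weight_bytes / mem_bandwidth
    # Compute grows linearly with the number of requests in the batch.
    compute_time = batch_size * flops_per_token / peak_flops
    # The step takes as long as the slower of the two.
    return max(memory_time, compute_time)


def tokens_per_second(batch_size, **hw):
    """Aggregate tokens per second across the whole batch."""
    return batch_size / decode_step_seconds(batch_size, **hw)


# Hypothetical hardware and model: 7B parameters in fp16 (14 GB),
# about 2 FLOPs per parameter per token, 1 TB/s memory, 100 TFLOP/s compute.
HW = dict(
    weight_bytes=14e9,
    mem_bandwidth=1e12,
    flops_per_token=14e9,
    peak_flops=100e12,
)

if __name__ == "__main__":
    for batch in (1, 8, 100):
        print(f"batch={batch:>3}  ~{tokens_per_second(batch, **HW):,.0f} tok/s")
```

In this model a single request leaves the GPU waiting on memory almost the whole time, so extra concurrent requests are close to free until compute becomes the bottleneck. Real servers fall short of this because of KV-cache reads and attention cost, but the direction matches the point above.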