
Also: attention sinks (although implemented as extra trained logits used in attention softmax rather than attending to e.g. a prepended special token).
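
To make the distinction concrete, here is a rough sketch of the "extra logit in the softmax" variant, assuming a single attention head and one scalar sink logit. The names `softmax_with_sink` and `sink_logit` are made up for illustration, and in a real model the sink value would be a trained parameter (likely one per head) rather than a constant. The sink takes part in the normalization but has no value vector, so its share of the probability mass is simply dropped.

```python
import numpy as np

def softmax_with_sink(scores, sink_logit):
    """Softmax over attention scores plus one extra sink logit.

    scores: (num_queries, num_keys) attention logits for one head.
    sink_logit: scalar; hypothetical stand-in for a trained parameter.
    Returns weights over the real keys only; each row sums to less than 1.
    """
    n_q = scores.shape[0]
    # Append the sink as an extra column of logits, one per query row.
    sink_col = np.full((n_q, 1), sink_logit, dtype=scores.dtype)
    logits = np.concatenate([scores, sink_col], axis=-1)
    # Standard numerically stable softmax over keys + sink.
    logits = logits - logits.max(axis=-1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Drop the sink column: it absorbs mass but contributes no value vector.
    return weights[:, :-1]

scores = np.array([[1.0, 2.0, 3.0],
                   [0.0, 0.0, 0.0]])
probs = softmax_with_sink(scores, sink_logit=0.0)
```

With a prepended special token, by contrast, the sink would be an ordinary key/value position in the sequence; here it exists only inside the softmax denominator, so a query with nothing useful to attend to can put most of its mass on the sink instead of on real tokens.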

