Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

This is impressive. The next step is to see how well it generalizes outside of such tests.

"The Fellowship of the Royal College of Radiologists (FRCR) 2B Rapids exam is considered one of the leading and toughest certifications for radiologists. Only 40-59% of human radiologists pass on their first attempt. Radiologists who re-attempt the exam within a year of passing score an average of 50.88 out of 60 (84.8%).

Harrison.rad.1 scored 51.4 out of 60 (85.67%). Other competing models, including OpenAI’s GPT-4o, Microsoft’s LLaVA-Med, Anthropic’s Claude 3.5 Sonnet and Google’s Gemini 1.5 Pro, mostly scored below 30*, which is statistically no better than random guessing."



Impressive, but was it trained on questions from the exam? Were any of those other models?


harrison.rad.1 was not trained on any of the exam questions. It can't be guaranteed however that other models were not trained on them though.




Consider applying for YC's Summer 2026 batch! Applications are open till May 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: