jacob019 10 months ago \| on: ReasoningGym: Reasoning Environments for RL with V...

Agreed, 2.5 Flash too. I analyze a large JSON document of metrics for pricing decisions, typically around 200k, occasionally up to 1M. Gemini 2.5 significantly outperforms for my task. It isn't 100%, but role playing gets close. I suppose that's a form of inference-time compute.