A smaller binary is not necessarily a faster one: specialization can bloat code size, but a version specialized for a given subtype can be much faster. It's a tradeoff.
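A minimal C++ sketch of that tradeoff (the names Shape, Circle, and total_area_* are hypothetical, purely for illustration): the virtual version keeps one copy of the loop but pays an indirect call per element, while the template version is duplicated per concrete type but can be inlined and vectorized.

```cpp
#include <vector>

// Generic path: one copy of the loop in the binary, but every
// element costs a virtual dispatch and a pointer chase.
struct Shape {
    virtual ~Shape() = default;
    virtual double area() const = 0;
};

double total_area_generic(const std::vector<Shape*>& shapes) {
    double sum = 0.0;
    for (const Shape* s : shapes) sum += s->area();  // indirect call per element
    return sum;
}

// Specialized path: one instantiation per concrete type. Each copy
// grows the binary, but the call is direct, inlinable, and the loop
// can often be vectorized.
struct Circle {
    double r;
    double area() const { return 3.141592653589793 * r * r; }
};

template <typename T>
double total_area_specialized(const std::vector<T>& shapes) {
    double sum = 0.0;
    for (const T& s : shapes) sum += s.area();  // direct, inlinable call
    return sum;
}
```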
We can hardly be expected to watch a 46-minute video on "A Practical Guide to Applying Data-Oriented Design" to find our answer. It seems unpromising on the surface.
This seems fairly significant to me, and it makes me wonder how it's possible given all the effort that has gone into optimizing C++.