Yes, but that's an NVIDIA project, so would be expected to be hand optimized, sa...

Yes, but that's an NVIDIA project, so would be expected to be hand optimized, same as their cuDNN kernels.

I'm more curious about what types of model people in research or industry are developing, where NVIDIA support such as this is not enough, and they are developing their own PTX kernels.