I'm not familiar with the kernel RNG specifically, but I think the hash function is only used with small inputs and outputs, and it's the stream cipher (ChaCha) that has the more performance-sensitive job of generating lots of output bytes. So the performance differences between BLAKE2s and BLAKE3 probably aren't very important here, and the fact that a BLAKE2s implementation is already in the kernel makes this an easy change.
That makes sense, thanks. I wonder if it might make sense to use a reduced-round ChaCha for generation. Aumasson calls for 8 rounds; I think Rust’s CSPRNG uses 12.
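
To make the round-count question concrete, here is a minimal sketch of a ChaCha block function with the number of rounds as a parameter. It assumes the RFC 8439 state layout (32-bit counter, 96-bit nonce); the names `chacha_block`, `quarter_round`, and `rotl32` are mine for illustration and this is not the kernel's code. It's only meant to show that ChaCha8, ChaCha12, and ChaCha20 differ in nothing but the loop count.

```python
import struct

MASK32 = 0xFFFFFFFF


def rotl32(x, n):
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) & MASK32) | (x >> (32 - n))


def quarter_round(s, a, b, c, d):
    """Apply the ChaCha quarter round to state words a, b, c, d in place."""
    s[a] = (s[a] + s[b]) & MASK32
    s[d] = rotl32(s[d] ^ s[a], 16)
    s[c] = (s[c] + s[d]) & MASK32
    s[b] = rotl32(s[b] ^ s[c], 12)
    s[a] = (s[a] + s[b]) & MASK32
    s[d] = rotl32(s[d] ^ s[a], 8)
    s[c] = (s[c] + s[d]) & MASK32
    s[b] = rotl32(s[b] ^ s[c], 7)


def chacha_block(key, counter, nonce, rounds=20):
    """Return one 64-byte keystream block (illustrative sketch, not constant-time).

    Assumes the RFC 8439 layout: 32-byte key, 32-bit counter, 12-byte nonce.
    `rounds` must be even: 8, 12, and 20 are the commonly discussed variants.
    """
    if len(key) != 32 or len(nonce) != 12:
        raise ValueError("expected a 32-byte key and a 12-byte nonce")
    if rounds <= 0 or rounds % 2:
        raise ValueError("rounds must be a positive even number")

    init = (
        list(struct.unpack("<4I", b"expand 32-byte k"))
        + list(struct.unpack("<8I", key))
        + [counter & MASK32]
        + list(struct.unpack("<3I", nonce))
    )
    s = list(init)

    # Each iteration is one "double round": four column rounds, then four diagonal rounds.
    for _ in range(rounds // 2):
        quarter_round(s, 0, 4, 8, 12)
        quarter_round(s, 1, 5, 9, 13)
        quarter_round(s, 2, 6, 10, 14)
        quarter_round(s, 3, 7, 11, 15)
        quarter_round(s, 0, 5, 10, 15)
        quarter_round(s, 1, 6, 11, 12)
        quarter_round(s, 2, 7, 8, 13)
        quarter_round(s, 3, 4, 9, 14)

    # Feed-forward: add the initial state so the block function is not invertible.
    return struct.pack("<16I", *[(x + y) & MASK32 for x, y in zip(s, init)])
```

Calling `chacha_block(key, 0, nonce, rounds=8)` versus `rounds=20` does 4 versus 10 double rounds per 64-byte block, so the per-block work scales roughly linearly with the round count. Whether that saving matters for the kernel RNG is a separate question that this sketch doesn't answer.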