The example you show has integer division and multiplication.
Neither SSE nor AVX has an integer division instruction, so neither GCC nor Clang will emit SIMD code for the division.
For the multiplication, you need to bump GCC up to `-O3` to get SIMD instructions, whereas Clang vectorizes it even at `-O2`. Both emit a `vpmullq` instruction (an AVX-512DQ instruction, so this assumes a target that supports it).
At a general level, I agree with you: when writing SIMD code you need to keep an eye on your benchmark results and/or the compiler-generated assembly. You're typically doing optimization work when you reach for SIMD, so keeping a close eye on performance is a good idea anyway.
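If you want to check this yourself, here is a minimal sketch of the two cases. The original example isn't reproduced here, so `mul_i64` and `div_i64` are hypothetical stand-ins, and the 64-bit element type is an assumption based on the `vpmullq` mentioned above.

```c
#include <stddef.h>
#include <stdint.h>

/* Element-wise 64-bit multiply. `restrict` tells the compiler the arrays
   do not overlap, which makes auto-vectorization easier. */
void mul_i64(int64_t *restrict out, const int64_t *restrict a,
             const int64_t *restrict b, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = a[i] * b[i];
}

/* Element-wise 64-bit divide. There is no packed integer divide in
   SSE/AVX, so expect a scalar idiv loop here regardless of flags.
   The caller must ensure b[i] != 0 and avoid INT64_MIN / -1. */
void div_i64(int64_t *restrict out, const int64_t *restrict a,
             const int64_t *restrict b, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = a[i] / b[i];
}
```

Compiling with something like `gcc -O3 -march=skylake-avx512 -S` (the `-march` value is just one target that includes AVX-512DQ, picked for illustration) and then searching the assembly for `vpmullq` and `idiv` should show the difference between the two loops. Pasting the same code into Compiler Explorer makes it easy to compare GCC and Clang at `-O2` and `-O3` side by side.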