Most of the SSE3, SSSE3, SSE4.1, and SSE4.2 instructions (there is no SSE5 in any released processor) are not particularly amenable to automatic vectorization, being mostly horizontal vector operations or oddball instructions that are pretty task-specific (hi, PCMPESTRI). You might see them come up in SLP vectorization, but in my last experience with it, LLVM's SLP vectorizer did a poor job of taking advantage of these kinds of instructions anyway.
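To make "horizontal" concrete, here is a rough scalar model in C of what a horizontal add like SSSE3's PHADDD computes, next to an ordinary vertical add. The function names are made up for illustration, and this is plain C rather than intrinsics, so it only shows the data flow.

```c
#include <stdint.h>

/* Vertical add (the PADDD pattern): lane i of the result depends only
   on lane i of each input. This is the shape a loop vectorizer looks for. */
void vertical_add(const uint32_t a[4], const uint32_t b[4], uint32_t out[4]) {
    for (int i = 0; i < 4; i++)
        out[i] = a[i] + b[i];
}

/* Horizontal add (scalar model of PHADDD): adjacent lanes *within* each
   input are summed, so the result mixes lanes of the same vector.
   Unsigned arithmetic is used so that overflow wraps, as the instruction does. */
void horizontal_add(const uint32_t a[4], const uint32_t b[4], uint32_t out[4]) {
    out[0] = a[0] + a[1];
    out[1] = a[2] + a[3];
    out[2] = b[0] + b[1];
    out[3] = b[2] + b[3];
}
```

A typical scalar loop maps naturally onto the first shape. The second only shows up when the compiler recognizes a specific pairing of neighbouring elements, which is why it tends to be left to SLP-style passes or hand-written intrinsics.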
For hot kernels (say, memcpy), it is definitely the case that many projects ship several variants of each routine and pick, at run time, the one best suited to the CPU they are actually running on. See https://sourceware.org/git/?p=glibc.git;a=tree;f=sysdeps/x86... for the different variants of common functions in glibc.
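A minimal sketch of that kind of dispatch in C, assuming GCC or Clang on x86 (for `__builtin_cpu_supports`). The names `copy_generic`, `copy_avx2`, `select_copy`, and `my_copy` are hypothetical, and both "variants" just forward to `memcpy` as placeholders for what would be hand-tuned code in a real library. glibc itself does the selection through its IFUNC mechanism at dynamic-link time rather than with a function-pointer check like this one.

```c
#include <stddef.h>
#include <string.h>

typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Placeholder variants: a real project would have a separately tuned
   implementation behind each of these. */
static void *copy_generic(void *dst, const void *src, size_t n) {
    return memcpy(dst, src, n);
}

static void *copy_avx2(void *dst, const void *src, size_t n) {
    return memcpy(dst, src, n);
}

/* Pick a variant based on what the running CPU reports.
   On non-x86 targets or other compilers, fall back to the generic one. */
static copy_fn select_copy(void) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return copy_avx2;
#endif
    return copy_generic;
}

/* Public entry point: resolves the variant on first use and caches it.
   Not thread-safe as written; a real library would resolve this once
   at load time. */
void *my_copy(void *dst, const void *src, size_t n) {
    static copy_fn impl;
    if (!impl)
        impl = select_copy();
    return impl(dst, src, n);
}
```

Callers just use `my_copy(dst, src, n)` and never see which variant ran, which is the point: the feature check happens once, not per call.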