For the self-attention mechanism, memory bandwidth requirements scale ~quadratically with the sequence length.
For the self-attention mechanism, memory bandwidth requirements scale ~quadratically with the sequence length.