It speeds up the experimental phase considerably: in effect, it caches previous knowledge so you don't need to repeat experiments for every possible configuration.
Isn't it amazing that the same model (the transformer) is now SOTA in both language and proteins? The real story here seems to be the benefits the transformer could bring to many different fields, not just NLP.