I think what’s really depressing here is just how effective scaling seems to be. It means that any company that’s not willing to pour hundreds of millions of dollars into its AI program isn’t serious at all, and would probably be better off hiring engineers to figure out how to integrate GPTX into its systems than trying to roll its own. I really think we’re going to see a massive collapse of AI/data science jobs once it becomes clear that no in-house model is ever going to beat the zero-shot performance of these mega models.
My understanding is that transformers are now favored over RNNs because they parallelize better.
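To make the parallelization point concrete, here is a toy sketch in plain Python. It is not how any real model is implemented: the scalar weights `w_h` and `w_x`, the function names, and the use of the raw inputs as queries, keys and values are all assumptions for illustration. The RNN loop cannot compute step t until step t-1 is done, while each attention output depends only on the inputs, so the positions can be computed in any order (and hence, in principle, all at once during training).

```python
import math

def rnn_states(xs, w_h=0.5, w_x=1.0):
    """Toy scalar RNN: h_t = tanh(w_h * h_{t-1} + w_x * x_t)."""
    h = 0.0
    states = []
    for x in xs:  # inherently sequential: step t needs h from step t-1
        h = math.tanh(w_h * h + w_x * x)
        states.append(h)
    return states

def attention_at(xs, i):
    """Toy causal self-attention output at position i.

    Assumption: queries, keys and values are just the scalar inputs.
    """
    scores = [xs[i] * xs[j] for j in range(i + 1)]
    m = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return sum(w * xs[j] for j, w in enumerate(weights)) / total

def attention_all(xs, order=None):
    """Compute every position independently, in whatever order is given."""
    order = range(len(xs)) if order is None else order
    out = [None] * len(xs)
    for i in order:  # no iteration depends on another iteration's result
        out[i] = attention_at(xs, i)
    return out

xs = [0.1, -0.4, 0.9, 0.3]
forward = attention_all(xs)
backward = attention_all(xs, order=reversed(range(len(xs))))
assert forward == backward  # order does not matter for attention
```

The same trick does not work for `rnn_states`: there is no way to get the last hidden state without first computing all the earlier ones.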
It's hard to imagine, but I wonder if there are some non-parallelizable machine learning algorithms that might outperform these massive models. It seems improbable, but it's a small hope I've had. The greatest intellects we're aware of (ourselves) do not scale very well, and maybe the same will ultimately apply to AI?
I remember seeing some theoretical analysis that compared the computational power of transformers, LSTMs and RNNs, and I think it concluded that RNNs are theoretically better (they can learn more complex functions). Can't find it now.