I'm still confused by the proliferation of bf16. It certainly doesn't hurt compared to fp16, but in my testing, even on A100 GPUs that are optimized for it, both training speed and inference quality are the same between bf16 and fp16.
Sometimes during training, networks that would converge in fp32 will explode to Infs or NaNs in fp16 because of its limited range. bf16, generally speaking, fixes that.
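A quick way to see that range difference (PyTorch sketch, nothing model-specific): fp16 tops out around 65504, while bf16 keeps fp32's 8-bit exponent.

    import torch

    x = torch.tensor(70000.0)    # fine in fp32
    print(x.to(torch.float16))   # inf (past fp16's ~65504 max)
    print(x.to(torch.bfloat16))  # finite, just coarsely rounded
    print(torch.finfo(torch.float16).max)   # 65504.0
    print(torch.finfo(torch.bfloat16).max)  # ~3.39e38, same ballpark as fp32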
It's also true that fp16 is often manageable with enough batch/layer norm and gradient clipping.
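For what it's worth, those band-aids look roughly like this (toy sketch with a made-up model, not anyone's actual training code):

    import torch
    import torch.nn as nn

    # Toy model with a LayerNorm to keep activations in a sane range.
    model = nn.Sequential(nn.Linear(16, 16), nn.LayerNorm(16), nn.ReLU(), nn.Linear(16, 1))
    loss = model(torch.randn(8, 16)).pow(2).mean()
    loss.backward()
    # Clip the gradient norm before the optimizer step so one bad batch can't blow up fp16.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)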
Yeah, I spent a few months comparing the two, and empirically I had a lot more issues with normalized entropy (exploding, not converging, converging more slowly) with fp16 than with bf16.
The transfer pipeline I wrote for fp32->fp16 also took a lot more work than the one for fp32->bf16.
My understanding is that for certain types of networks BF16 will train better than FP16: the extended range of BF16 gives additional protection against exploding gradients and loss values, at the cost of some precision.
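The precision side of that trade-off is easy to demo (quick sketch): fp16 keeps 10 mantissa bits, bf16 only 7, so small relative differences just disappear in bf16.

    import torch

    x = torch.tensor(1.001)
    print(x.to(torch.float16).item())   # ~1.0010 (10 mantissa bits)
    print(x.to(torch.bfloat16).item())  # 1.0 (7 mantissa bits; the step near 1 is ~0.0078)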
bf16 is generally easier to train neural networks in than fp16, since there's no need for loss scaling. And most model training and inference performs the same with fp32 as with bf16.
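Concretely, "no need for scaling" means you can drop the GradScaler dance. A rough PyTorch sketch (the model, data, and hyperparameters are toy placeholders, and it assumes a CUDA device):

    import torch
    import torch.nn as nn

    model = nn.Linear(16, 1).cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
    criterion = nn.MSELoss()
    x = torch.randn(32, 16, device="cuda")
    y = torch.randn(32, 1, device="cuda")

    # fp16 mixed precision: small gradients can underflow fp16, so the usual
    # recipe scales the loss up before backward and unscales before stepping.
    scaler = torch.cuda.amp.GradScaler()
    optimizer.zero_grad()
    with torch.autocast("cuda", dtype=torch.float16):
        loss = criterion(model(x), y)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()

    # bf16 mixed precision: same exponent range as fp32, so no scaler is needed.
    optimizer.zero_grad()
    with torch.autocast("cuda", dtype=torch.bfloat16):
        loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()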
Despite the other answers, I will tell you the grim truth: Your mileage might vary.
It's an empirical question and depends on the nature of your problem and data. You should try all three (fp32, fp16, and bf16) as part of your model selection / hyperparameter tuning.
For example, in audio generative models (where the typical output is 16-bit), I've sometimes found that fp16 and bf16 just don't produce output as good as fp32 weights do.
(Not an ML guy.) bf16 and fp16 should be comparable if the weights are of the same magnitude, but what happens in a network where the weights are poorly regularized?
Someone commented below that with enough batchnorm/layernorm/etc. and/or gradient clipping you can manage it, but BF16 just makes life easier if you can live without some precision.