
Wow, I've been out of the GPU game for a year or two now, and it's clear how much the market has shifted (or how NVIDIA wants it to move). Back in the day we kept asking NVIDIA and AMD for half-precision support; it looks like not only have they done that, but there's 8-bit integer support too! I was about to say "well, who the heck would be able to use 8-bit integers for much?" when I saw in TFA: "offering an 8-bit vector dot product with 32-bit accumulate."
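Concretely, the "8-bit dot product with 32-bit accumulate" semantics look roughly like this numpy sketch (the real instruction operates on packed 4-element int8 vectors in hardware; this is just the arithmetic):

    import numpy as np

    a = np.array([100, -128,   64, 27], dtype=np.int8)
    b = np.array([ 90,  127, -100,  3], dtype=np.int8)

    # Individual products like -128 * 127 = -16256 are far outside the
    # int8 range [-128, 127], so the hardware widens and accumulates in
    # int32. Emulated here by casting before the multiply-add:
    acc = np.dot(a.astype(np.int32), b.astype(np.int32))
    print(acc)  # 9000 - 16256 - 6400 + 81 = -13575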

In case it's not clear from the title (which used to read "...47 INT TOPS"), that's 47 [8-bit integer] tera-operations-per-second. Anandtech says it will "... offer a major boost in inferencing performance, the kind of performance boost in a single generation that we rarely see in the first place, and likely won’t see again." No kidding!



I've read a lot about fp16 being good enough for training, but what people don't mention is that just swapping fp32 for fp16 will make things fail, because your deep learning framework doesn't implement everything you use for fp16; and once you fix that, training will probably diverge, because standard practices weren't developed with such a limited numeric range in mind.

Which isn't to say that the Deep Learning stacks won't get there eventually, but at the moment it's not as easy as flipping a switch.
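To make the range problem concrete, here's a tiny numpy illustration (not tied to any framework): fp16 tops out around 65504 and drops small values to zero, which is exactly where a naive fp32-to-fp16 swap tends to blow up:

    import numpy as np

    x = np.float16(300.0)
    print(x * x)   # inf -- 90000 overflows fp16's max of ~65504,
                   # so e.g. a squared-error loss can explode

    g = np.float16(1e-4)
    print(g * g)   # 0.0 -- 1e-8 underflows fp16's ~6e-8 subnormal
                   # floor, so small gradient terms silently vanish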


For a 4x boost, I imagine TensorFlow and Torch will get support shortly after the GPUs start shipping in real quantities.


It's a 2x boost for training since you can't use int8 for training and need fp16.

INT8 is a 4x for inference, but most people aren't using GPUs for inference atm.
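For anyone wondering how int8 is usable for inference at all: the usual trick (not specific to this card) is to quantize trained fp32 weights with a per-tensor scale factor, do the dot products in int8 with int32 accumulation, and rescale at the end. A rough numpy sketch, with an illustrative quantize() helper:

    import numpy as np

    def quantize(w):
        """Symmetric per-tensor quantization of fp32 values to int8."""
        scale = np.abs(w).max() / 127.0
        q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
        return q, scale

    w = np.random.randn(4, 8).astype(np.float32)   # trained weights
    x = np.random.randn(8).astype(np.float32)      # input activations

    qw, sw = quantize(w)
    qx, sx = quantize(x)

    # int8 multiplies with int32 accumulation (what the dot-product
    # instruction gives you), then one fp rescale at the end:
    y_int32 = qw.astype(np.int32) @ qx.astype(np.int32)
    y = y_int32 * (sw * sx)
    print(np.abs(y - w @ x).max())  # small quantization error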



Fair enough; when I said "most people" I meant companies that are not AmaGooFaceSoft and their international equivalents. Though I can't quite tell whether you're doing GPU batch predictions and storing them, or doing them in real time with Spark Streaming.

Unrelated question though: any chance you will do blog post/paper about how DSSTNE does automatic model parallelism and gets good sparse performance compared to cuSparse/etc?


Or as Urs Hölzle would say: "advancing Moore's Law by 7 years(tm)..." Badum ba bum bum...

And given that Frank Seide et al. demonstrated 1-bit SGD in 2014 (https://www.microsoft.com/en-us/research/publication/1-bit-s...), the race to the bottom is just beginning...
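The part that makes 1-bit gradients work in that paper is error feedback: the quantization residual is carried over and folded into the next step's gradient, so the error doesn't accumulate. A minimal numpy sketch of the idea, simplified to a single tensor (the paper applies it to the gradient exchange in data-parallel training):

    import numpy as np

    def one_bit_quantize(grad, residual):
        """Quantize a gradient to 1 bit per value, with error feedback."""
        g = grad + residual                   # fold in last step's error
        scale = np.abs(g).mean()              # one fp scalar per tensor
        q = np.where(g >= 0, scale, -scale)   # 1 bit: sign * scale
        residual = g - q                      # remember what was dropped
        return q, residual

    residual = np.zeros(5)
    for step in range(3):
        grad = np.random.randn(5)
        q, residual = one_bit_quantize(grad, residual)
        # ...apply q as the (compressed) gradient update...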



