Yet another parallel framework. Without a third-party ecosystem of APIs for matrix math, any framework is doomed to just add noise, not value. Sure, there's some benefit in getting a marginal speedup on some algorithms, but for real speedup you need to know the parallel architecture of the processor (GPU, CPU, or APU), which means a learning curve. The GPGPU industry has been trying for a long time to abstract away the fine details and offer a plug-and-play, easy-to-learn framework, but then we suffer performance losses, and it really doesn't make sense to invest in GPUs for the kind of performance gains you get with these high-level APIs.
Read the release: this is a collaboration with Google and Mozilla. But you are right, one of the main reasons CUDA is so popular is cuBLAS. And it's a pipe dream to think you could program a GPU without being aware of its communication and memory-transfer behavior.
Won't we have HSA in the future? HSA is supposed to provide unified coherent memory access to both CPU and GPU. Do you think HSA is a pipe dream? If so why?
OK, it's not that HSA isn't useful, it's that coordination between the CPU and GPU is still stupidly hard and carries a lot of CPU-side overhead, making it impractical for small workloads. The problem is that a large number of small workloads still can't be done efficiently on a GPU. And I'm seriously doubtful about the limits of "coherent memory access" -- unless the GPU can snoop the CPU's cache (or the GPU and CPU share an L1 cache -- eeek), you will still need cache flushes and fences. Let's hope "HSA" has a lot lower overhead than current CPU/GPU combos from AMD/Intel.
It's not a parallel framework or a GPU feature. It's single-instruction, multiple-data (SIMD), which is used to speed up single-threaded execution on a CPU when working with lists of numbers.
He found himself writing the NEON code entirely by hand in assembly, because the vector intrinsics didn't even expose the CPU features he wanted to use. And that's in C, where the vector intrinsics are already CPU-specific.
Having access to SIMD is definitely better than not having it, but it really should be paired with well-optimized implementations of things like BLAS and FFT libraries.