> Intel® APX doubles the number of general-purpose registers (GPRs) from 16 to 3...

_chris_ · on July 24, 2023

And adds non-destructive instructions.

> "In addition, legacy integer instructions now can also use EVEX to encode a dedicated destination register operand – turning them into three-operand instructions and reducing the need for extra register move instructions."
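To make the register-move point concrete, here is a toy sketch of lowering `dst = src1 + src2` when both sources have to stay live. The function names and the emitted mnemonics are made up for illustration; this is not real assembler syntax or how any actual compiler does it.

```python
def lower_two_operand(dst, src1, src2):
    """Destructive form: the first operand is both a source and the destination."""
    if dst == src1:
        return [f"add {dst}, {src2}"]
    if dst == src2:
        # Addition commutes, so src1 can be added into dst directly.
        return [f"add {dst}, {src1}"]
    # dst is a separate register: copy src1 first so it is not overwritten.
    return [f"mov {dst}, {src1}", f"add {dst}, {src2}"]


def lower_three_operand(dst, src1, src2):
    """Non-destructive form: a dedicated destination, sources left untouched."""
    return [f"add {dst}, {src1}, {src2}"]


if __name__ == "__main__":
    print(lower_two_operand("rcx", "rax", "rbx"))
    print(lower_three_operand("rcx", "rax", "rbx"))
```

In this toy model the destructive form needs a `mov` plus an `add` whenever the destination differs from both sources, while the three-operand form needs a single instruction.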

Overall, Intel claims APX-compiled code has 10% fewer instructions, 10% fewer loads and more than 20% fewer stores.

Also adding pop2/push2 instructions for moving state faster.

And adding more powerful conditional instructions (loads/stores/compares) and flag-suppression.

FullyFunctional · on July 24, 2023

Oh, I missed your comment and posted essentially the same thing. These are all interesting changes, predication certainly, but the thing that actually got me the most excited was the press release comment about:

"The processor tracks these new instructions internally and fast-forwards register data between matching PUSH2 and POP2 instructions without going through memory."

I wonder if this implies that pushes don't have to commit to memory if they are popped soon enough? It has always bothered me that we have these huge physical register files but force all the spills and restores to go through memory because of silly anachronistic processor semantics. With more flexible PUSH/POP semantics we could essentially get register windows for free.

chc4 · on July 25, 2023

Intel x86 stack engines have done 0-cycle store/load forwarding for years now.

FullyFunctional · on July 25, 2023

? That was not my question nor my point. They hit memory and, as I have since learned, they still do after this. The wording in the press release was ambiguous. In other words, the news here is just being able to push/pop two in one µop.

jeffbee · on July 24, 2023

10% fewer instructions, but the average instruction is longer, so code density is the same, they claim. This still leaves their ISA with the worst code density of any non-obsolete ISA.
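A quick back-of-the-envelope check of that trade-off, taking the 10% figure at face value and assuming total code size is exactly unchanged (the function name is made up for illustration):

```python
def avg_length_growth(instr_ratio, size_ratio=1.0):
    """Factor by which the average instruction length changes.

    Code size = instruction count * average length, so the average
    length scales by size_ratio / instr_ratio.
    """
    return size_ratio / instr_ratio


if __name__ == "__main__":
    # 10% fewer instructions (ratio 0.9) at the same total code size.
    print(f"{avg_length_growth(0.9):.3f}")  # prints 1.111
```

So equal density with 10% fewer instructions implies the average instruction is roughly 11% longer.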

Findecanor · on July 30, 2023

I've only read one study [1] and its follow-up [2] with code-density benchmarks, but according to them (one source), x86-64 is actually one of the denser contemporary ISAs ... provided that the compiler/programmer is smart enough to adapt to the ISA's quirks.

1. <https://www.researchgate.net/publication/224114307_Code_dens...>

2. <https://web.eece.maine.edu/~vweaver/papers/iccd09/ll_documen...>