I've been super curious to see what was at stake here! This sounds better than I'd dared to hope for.
I kind of thought this was just gonna be some kind of deferred texture loading thing, help with streaming assets.
If it actually allows inter-warp sequencing, it sounds like it might solve the chief complaint supreme GUI master Raph Levien recently had in "I want a good parallel computer", which is that even though we can dynamically add shaders & construct a dynamic work graph (largely thanks to VK_AMDX_shader_enqueue?), there isn't any sequencing/fencing/barriering between the sections. https://raphlinus.github.io/gpu/2025/03/21/good-parallel-com... https://news.ycombinator.com/item?id=43440174
Edit: having read the article more fully, I'm not sure this is about waves depending on each other. Maybe more about them trying to access memory. Apologies. Hopefully someday!
TeraScale from 2008 was very different. Ignore it.
GCN is mostly the same as RDNA, and GCN is practically identical to CDNA. So you can go back to older guides, as far back as GCN1 (early 2010s era). The only fundamental difference is that RDNA natively runs 32-wide wavefronts (wave32) while GCN/CDNA runs 64-wide wavefronts (wave64).
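To make the width difference concrete, here's a rough sketch of how a workgroup splits into wavefronts at each width. The function names are mine, and this only does the arithmetic; it says nothing about how the hardware actually schedules anything.

```python
def wavefronts_per_workgroup(workgroup_size: int, wave_size: int) -> int:
    """Number of wavefronts needed to cover a workgroup (ceiling division)."""
    return -(-workgroup_size // wave_size)


def idle_lanes(workgroup_size: int, wave_size: int) -> int:
    """Lanes left unused in the last, partially filled wavefront."""
    return wavefronts_per_workgroup(workgroup_size, wave_size) * wave_size - workgroup_size


# Hypothetical workgroup sizes, compared at wave32 and wave64.
for size in (64, 96, 256):
    for wave in (32, 64):
        print(size, wave, wavefronts_per_workgroup(size, wave), idle_lanes(size, wave))
```

So a workgroup size that isn't a multiple of 64 can leave lanes idle at wave64 that would be filled at wave32, which is one thing to keep in mind when reading older GCN-era guides.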
--------
NVIDIA has an intermediate assembly language called PTX. NVIDIA's true assembly language (SASS) is undocumented (not secret, just not intended for general-purpose coding). Search for NVIDIA's PTX manual and you'll see
...
Slightly tangential, but AMD is also working on amdgcnspirv (i.e. AMD-flavored SPIR-V) that'll hopefully result in a user experience similar to PTX [1].
Kind of. NIR is more oriented towards lowering and optimizing code for driver backends, as far as I know. SPIR-V is targeted towards the other end of the spectrum.
Not applicable to GPUs, but since I ran into it recently, it's interesting to see how io_uring handles sequenced submissions. Here's the Lord of the io_uring write-up: https://unixism.net/loti/tutorial/link_liburing.html#link-li...
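For anyone who hasn't seen linked submissions, here's a toy Python model of the ordering rule as I understand it: linked operations run one after another, and once one fails, the rest of the chain is reported as cancelled instead of being run. This is not the liburing API; `run_linked_chain` and `step` are made-up names for illustration only.

```python
import errno


def run_linked_chain(ops):
    """Toy model of a linked chain: run ops in order; after the first
    failure (negative result), mark every later op as cancelled."""
    results = []
    failed = False
    for op in ops:
        if failed:
            # Later links never run; they complete with -ECANCELED.
            results.append(-errno.ECANCELED)
            continue
        res = op()
        results.append(res)
        if res < 0:
            failed = True
    return results


log = []


def step(name, result):
    """Build a fake operation that records that it ran and returns `result`."""
    def op():
        log.append(name)
        return result
    return op


# Hypothetical chain: the write fails, so the fsync is never attempted.
chain = [step("read", 4096), step("write", -errno.EIO), step("fsync", 0)]
results = run_linked_chain(chain)
print(results, log)
```

That "later work waits on earlier work, and is cancelled if it fails" dependency is the kind of sequencing the GPU work-graph discussion is missing.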