RDNA 4's “Out-of-Order” Memory Accesses (chipsandcheese.com)
165 points by ingve 40 days ago | 11 comments



I've been super curious to see what was at stake here! This sounds better than I'd dared to hope for.

I kind of thought this was just gonna be some kind of deferred texture loading thing, help with streaming assets.

If it actually allows inter-warp sequencing, it sounds like it might solve the chief complaint supreme GUI master Raph Levien recently raised in "I want a good parallel computer", which is that even though we can dynamically add shaders & construct a dynamic workgraph (largely thanks to VK_AMDX_shader_enqueue?), there isn't any sequencing/fencing/barrier-ing between the sections. https://raphlinus.github.io/gpu/2025/03/21/good-parallel-com... https://news.ycombinator.com/item?id=43440174

Not applicable to GPUs, but since I ran into it recently, it's interesting to see how io_uring handles sequenced submissions. Here's Lord of io_uring's write-up, https://unixism.net/loti/tutorial/link_liburing.html#link-li...

Edit: having read the article more fully, I'm not sure this is about waves depending on each other. Maybe more about them trying to access memory. Apologies. Hopefully someday!


At first glance at the title, I thought it was going to be about some twist on DNA 3' and DNA 5' reading frames.

https://en.wikipedia.org/wiki/Reading_frame


What’s interesting about that to you?


Presumably this didn't matter hugely because the memory access patterns for each wave are going to be extremely similar anyway?

Ah yeah he says that at the end. Doesn't really matter for rasterisation but might make more of a difference for ray tracing.
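To make that intuition concrete, here's a rough toy model, not a description of the actual RDNA 4 hardware. It assumes request i is issued at cycle i and that in-order return means a result can't come back before the one issued ahead of it. The function names and the latency numbers are made up for illustration.

```python
def completion_times(latencies, in_order):
    """Cycle at which each request's data becomes usable.

    Toy assumptions (not real hardware behavior): request i is issued at
    cycle i and takes latencies[i] cycles. With in_order=True a result
    cannot be returned before the result of the request issued before it.
    """
    done = []
    prev_done = 0
    for issue_cycle, latency in enumerate(latencies):
        ready = issue_cycle + latency
        # In-order return: a fast request waits behind a slow earlier one.
        finished = max(ready, prev_done) if in_order else ready
        done.append(finished)
        prev_done = finished
    return done


def total_stall(latencies):
    """Extra cycles spent waiting purely because of in-order return."""
    in_order = completion_times(latencies, in_order=True)
    out_of_order = completion_times(latencies, in_order=False)
    return sum(a - b for a, b in zip(in_order, out_of_order))


if __name__ == "__main__":
    similar = [100, 100, 100, 100]    # every wave sees about the same latency
    divergent = [500, 100, 100, 100]  # first wave misses cache, the rest hit
    print("similar latencies, extra stall:", total_stall(similar))
    print("divergent latencies, extra stall:", total_stall(divergent))
```

With similar latencies the ordering rule costs nothing in this model; with one slow request at the front, every later request inherits its latency. That matches the "doesn't matter much for rasterisation, might matter for ray tracing" reading, under these assumptions only.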


Does AMD have its own flavor of GPU assembly, and what is it called?


Yes, and it's slightly different per architecture. The differences are mostly new instructions (like the one discussed in this article).

Just search for RDNA4 ISA and you'll find it: https://www.amd.com/content/dam/amd/en/documents/radeon-tech...

Terrascale from 2008 was very different. Ignore it.

GCN is mostly the same as RDNA and GCN is practically identical to CDNA. So you can go back to older guides as far back as GCN1 (like early 2010s era). The only fundamental difference is RDNA is SIMD32 while GCN/CDNA is SIMD64

--------

NVidia has an intermediate assembly language called PTX. NVidia's true assembly language is undocumented (not secret, just not intended for general-purpose coding). Search for NVidia's PTX manual and you'll see ...


Slightly tangential, but AMD is also working on amdgcnspirv (i.e. AMD-flavored SPIR-V) that'll hopefully result in a user experience similar to PTX [1].

[1]: https://github.com/ROCm/ROCm/issues/3985#issuecomment-254616...


Mesa uses NIR as intermediate representation for its drivers. Is that comparable?


Kind of. NIR is more oriented towards lowering and optimizing code for driver backends, as far as I know. SPIR-V is targeted towards the other end of the spectrum.


From my limited understanding of SPIR-V (since AMDGCNSPIRV is in essence SPIR-V), I would say yes.


Interesting, thanks!

Looking forward to the ACO compiler using the new features of RDNA4 to improve ray tracing performance in RADV.



