Agree with everything you're saying here, but to be fair I think the analogy with Progressive JPEG doesn't sit quite right with your concept. What you're describing sounds more like "semantic-aware streaming": it's as if a Progressive JPEG were semantically aware of its blob and loaded any objects that are in focus first before going after data for things that are out of focus.
I think that's a very contemporary problem and worth pursuing, but I somehow don't see that happening in real time (with the priority being to reduce latency) without the necessary metadata.
It’s not an exact analogy, but streaming outside-in (with gradually more and more concrete visual loading states) rather than top-down feels similar to a progressive image to me.
It's data (JPEG/JSON) vs. software (HTML/CSS/JS)... You can choose to look at HTML/CSS/JS as just some chunks of data, or you can look at it as a serialized program that wants to be executed with optimal performance. Your blog post makes it seem like your focus is on the latter (and it's just quite typical for React applications to fetch their content dynamically via JSON), and that's where your analogy to the progressive mode of JPEGs falls a bit flat, and "streaming outside-in" doesn't seem like all you want.
Progressively loaded JPEGs just apply some type of "selective refinement" to chunks of data, and for Progressive selective refinement to work it's necessary to "specify the location and size of the region of one or more components prior to the scan"[0][1]. If you don't know what size to allocate, then it's quite difficult(?) to optimize the execution. This doesn't seem like the kind of discussion you'd like to have.
Performance-aware web developers work with semantic awareness of their content in order to tweak their site's loading time. YouTube might prefer videos (or ads) to be loaded before any comments, news sites might prioritize text over any other media, and a good dashboard might prioritize data visualizations over the header and sidebar, etc.
The position of the nodes in any structured tree tells you very little about the preferred loading priority, wouldn't you agree?
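To make that concrete, here's a rough sketch of the dashboard case. Everything in it is hypothetical (the `UiNode` shape, the `priority` field, both function names); it's only meant to show that tree position and preferred loading order can disagree unless the developer supplies the priority as explicit metadata.

```typescript
// Hypothetical node shape: `priority` is explicit metadata supplied by the
// developer, not something derived from the tree (lower number loads first).
type UiNode = { id: string; priority: number; children?: UiNode[] };

// Depth-first document order: all you can infer from tree position alone.
function treeOrder(node: UiNode): string[] {
  return [node.id, ...(node.children ?? []).flatMap((c) => treeOrder(c))];
}

// Loading order once explicit priorities are available.
function priorityOrder(root: UiNode): string[] {
  const flat: UiNode[] = [];
  const walk = (n: UiNode): void => {
    flat.push(n);
    (n.children ?? []).forEach(walk);
  };
  walk(root);
  return flat.sort((a, b) => a.priority - b.priority).map((n) => n.id);
}

// A made-up dashboard: the chart sits deepest in the tree but matters most
// after the page shell and its container.
const dashboard: UiNode = {
  id: "page",
  priority: 0,
  children: [
    { id: "header", priority: 3 },
    { id: "sidebar", priority: 4 },
    { id: "main", priority: 1, children: [{ id: "chart", priority: 2 }] },
  ],
};

console.log(treeOrder(dashboard)); // page, header, sidebar, main, chart
console.log(priorityOrder(dashboard)); // page, main, chart, header, sidebar
```

Note that even in the priority-sorted order the page and the chart's container still come before the chart itself, since the chart needs somewhere to render.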
I used this analogy more from the user's perspective (as a user, a gradually sharpening image feels similar to a website with glimmers gradually getting replaced by the revealed content). I don't actually know how JPEG is served under the hood (and the spec is too dense for me), so maybe if you explain the point in a bit more detail I'll be able to follow. I do believe you that the analogy doesn't go all the way.
RSC streams outside-in because that's the general shape of the UI — yes, you might want to prioritize the video, but you have to display the shell around that video first. So "outside-in" is just that common sense — the shell goes first. Other than that, the server will prioritize whatever's ready to be written to the stream — if we're not blocked on IO, we're writing.
The client does some selective prioritization on its own as it receives stuff (e.g. as it loads JS, it will prioritize hydrating the part of the page that you're trying to interact with).
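As a toy illustration of the server side of this (shell first, then whatever is ready), here's a minimal sketch. It is not React's actual implementation or wire format; `Chunk`, `streamOutsideIn`, `delay`, and `collectOrder` are made-up names, and the timers stand in for IO.

```typescript
// Hypothetical chunk shape; not the real RSC wire format.
type Chunk = { id: string; html: string };

// Yields the shell immediately, then each pending slot in the order it
// resolves, regardless of where it sits in the tree.
async function* streamOutsideIn(
  shell: string,
  slots: Record<string, Promise<string>>,
): AsyncGenerator<Chunk> {
  yield { id: "shell", html: shell };

  // Tag each promise with its slot id so we know which one settled.
  const pending = new Map<string, Promise<Chunk>>();
  for (const id of Object.keys(slots)) {
    pending.set(id, slots[id].then((html) => ({ id, html })));
  }

  while (pending.size > 0) {
    const next = await Promise.race(Array.from(pending.values()));
    pending.delete(next.id);
    yield next;
  }
}

// Simulated IO: resolves with `value` after `ms` milliseconds.
const delay = (ms: number, value: string): Promise<string> =>
  new Promise<string>((resolve) => setTimeout(() => resolve(value), ms));

// Collects the ids in the order they were written to the "stream".
async function collectOrder(): Promise<string[]> {
  const order: string[] = [];
  const stream = streamOutsideIn("<main></main>", {
    comments: delay(30, "<ul></ul>"),
    video: delay(10, "<video></video>"),
  });
  for await (const chunk of stream) {
    order.push(chunk.id);
  }
  return order;
}

collectOrder().then((order) => console.log(order.join(" -> ")));
```

Here the video happens to arrive before the comments only because its simulated IO finishes sooner; nothing in the sketch ranks one slot above another.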