That’s the philosophy. Use the less constrained (but still somewhat constrained and borrow checked) unsafe to wrap/build the low level stuff, and expose a safe public API. That way you limit the exposure of human errors in unsafe code to a few key parts that can be well understood and tested.
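For anyone who hasn't seen the pattern in practice, here's a rough toy sketch of what I mean (names made up, not from any real crate): all the unsafe is confined to one small, auditable type, and everything callers touch is safe.

    use std::alloc::{alloc_zeroed, dealloc, Layout};

    /// Toy owned byte buffer built directly on the raw allocator.
    /// All the `unsafe` lives in this one small type; callers only see safe methods.
    pub struct RawBuf {
        ptr: *mut u8,
        layout: Layout,
    }

    impl RawBuf {
        /// Safe constructor: validates the size before touching the allocator.
        pub fn zeroed(len: usize) -> Option<RawBuf> {
            let layout = Layout::array::<u8>(len).ok()?;
            if layout.size() == 0 {
                return None; // zero-sized allocations aren't allowed by the raw allocator
            }
            // SAFETY: layout has a non-zero size, checked above.
            let ptr = unsafe { alloc_zeroed(layout) };
            if ptr.is_null() {
                return None; // allocation failure surfaces as None, not UB
            }
            Some(RawBuf { ptr, layout })
        }

        /// Safe view of the contents.
        pub fn as_slice(&self) -> &[u8] {
            // SAFETY: ptr is valid for layout.size() bytes and was zero-initialized.
            unsafe { std::slice::from_raw_parts(self.ptr, self.layout.size()) }
        }
    }

    impl Drop for RawBuf {
        fn drop(&mut self) {
            // SAFETY: ptr was allocated with exactly this layout.
            unsafe { dealloc(self.ptr, self.layout) }
        }
    }

    fn main() {
        let buf = RawBuf::zeroed(16).expect("allocation failed");
        assert_eq!(buf.as_slice().len(), 16);
        assert!(buf.as_slice().iter().all(|&b| b == 0));
    }

The point is that if something goes wrong with the pointer arithmetic or the layout, there are only a few SAFETY comments to audit, not the whole codebase.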
Stripping can save a huge amount of binary size; there's a lot of formatting code added for println! and family, stack-trace printing, etc. However, you lose those niceties if you strip at that level.
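For context, the Cargo-level knobs look roughly like this (from memory, so double-check the names); going further and rebuilding std on nightly with something like panic_immediate_abort is where the panic formatting and backtrace printing actually disappear:

    # Cargo.toml -- typical size-focused release profile
    [profile.release]
    strip = true          # drop symbols and debug info from the binary
    panic = "abort"       # no unwinding machinery; smaller, but panics just abort
    lto = true            # whole-program optimization, usually shrinks code
    codegen-units = 1
    opt-level = "z"       # optimize for size rather than speed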
Why would we change the way we respect those? None of those nations engage in the same anticompetitive practices that the Chinese do with respect to patents and IP.
AI is going to be so ubiquitous that something principled and open is going to supersede CUDA at some point, the way HTML5 superseded Flash. CUDA isn't like an x86 vs ARM situation where they can use hardware dominance for decades; it's a higher-level language, and being compatible with a wide range of systems benefits NVIDIA and their competitors. They're riding out their relative superiority for now, but we're going to see a standards and interoperability correction sometime soon, imo. NVIDIA will drive it, and it will gain them a few more years of dominance, but afaik nothing in their hardware IP means CUDA compatibility has to cost performance or efficiency. They're also going to want to compete in the Chinese market, so being flexible about interoperability with their systems gains them a bit of market access that might otherwise be lost.
There's a ton of pressure on the market to decouple nvidia's proprietary software from literally everything important to AI, and they will either gracefully transition and control it, or it will reach a breaking point and someone else will do it for (and to) them. I'm sure they've got finance nerds and quants informing and minmaxing their strategy, so they probably know to the quarter when they'll pivot and launch their FOSS, industry leading standards narrative (or whatever the strategy is.)
> but we're going to see a standards and interoperability correction sometime soon, imo.
I thought this too, in 2015. OpenCL looked really promising, but Apple bailed and neither AMD nor Intel had the funding to keep up with Nvidia's research. It sorta floundered, even though Nvidia GPUs smugly ran OpenCL code with benchmark-leading performance.
Nvidia won the datacenter because of hardware. You could release a perfect CUDA-to-Vulkan translator tomorrow, and they still wouldn't be dethroned until better hardware replaced theirs. Intel is swirling the drain, Qualcomm is hedging their bets on mobile, AMD is (still) too underfunded - Apple is the only company with the design chops and TSMC inroads to be a serious threat, and they can't release a datacenter product to save their life. It's understandable why people think Nvidia is a monopoly: Team Green is pulling a full-on "Luigi wins by doing nothing" in 2025: https://knowyourmeme.com/memes/luigi-wins-by-doing-absolutel...
The market has almost no pressure to decouple from Nvidia - nobody else has mature solutions. It requires a preestablished player to make a similarly risky play, which might rule out everyone who's sitting at the table.
Uh, Flash died because Apple refused to support it on mobile Safari. Perhaps Flash would have died anyway, but that is the proximate cause. And Apple's competitors were falling over themselves to market Flash support as a competitive advantage vs. iPhone.
To rephrase the OP's point: transformers et al. are worth trillions. All the other CUDA uses are worth tens or hundreds of billions. They've totally got that locked up, but researchers are a smaller market than video games.
Ahh, composable-kernel. The worst offender on the list of software that has produced unrecoverable OOMs on my Gentoo system (it's actually Clang that OOMs while compiling CK; it uses upwards of 2.5 GB per thread).
I was recently reviewing a CK package for Debian. My test build crashed due to OOM using -j32 on a 64GB workstation, so I tried with -j1 to be safe. That completed successfully after 190 hours!
I think I may need to reduce the number of architectures it's built for to successfully compile it on the official Debian buildd infrastructure, but my (unverified) understanding is that most of its reverse dependencies only need the header-only parts of the library anyway.
I'm told they're working on improving the build times via a few different methods.
Same: -j32 with 64GB on a 3950X. I use zram at 50% of RAM, but it's still not enough most of the time, so I had to make a config called less-threads that only uses 24 jobs, with zram enabled.
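For anyone else on Gentoo hitting this, the per-package env trick looks roughly like this (atom and paths from memory, adjust to taste):

    # /etc/portage/env/less-threads
    MAKEOPTS="-j24"

    # /etc/portage/package.env
    sci-libs/composable-kernel less-threads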
I also use OOMD, but I have to work on separating my systemd units better; OOMD has killed my greetd session before, and with that my entire tree of userland processes :D
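What I have in mind for the unit separation is roughly this: run the heavy build in its own transient scope so oomd has a small, separate unit to kill instead of the whole session (property names and values from memory, just an example):

    # heavy build in its own transient scope, with its own memory limits
    systemd-run --user --scope \
        -p MemoryHigh=48G \
        -p ManagedOOMMemoryPressure=kill \
        make -j24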