
In a way this is what notebooks do for Python and other languages. They mix documentation and code such that you can run the code and inspect the output. See for example the PyTorch tutorials.

Yes, notebooks are a restrictive type of literate programming: interactive and browser-bound.

TeX was "proven" as a text/typography tool by the fact that its source code, written in WEB (interleaving Pascal and TeX (this is meta (metacircular))), allows you both to "render" the program as a typeset work explaining how TeX is made, and to run the program as a means of creating typeset work.

I'm lacking the words for a better explanation of how I feel about the distinction, but in a sense I would say that notebooks are literate scripts, while TeX is a literate program? (The difference is aesthetic.)


There is Org Babel in Emacs that can be an alternative to jupyter notebooks for literate programming (research/devopsy tasks). It is more powerful in some aspects and weaker in others.
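For anyone who hasn't seen it, here is a minimal sketch of what an Org Babel block looks like (this assumes python support has been enabled via org-babel-do-load-languages):

  #+begin_src python :results output
  # evaluated in place with C-c C-c; the output is inserted below the block
  print(sum(range(10)))
  #+end_src

The same Org file can mix prose, tables and blocks in several languages, and results can flow between blocks via #+name references.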

Or all the Unsloth notebooks.

Blameless post mortems should be similar to air accident investigations. I.e. don't blame the people involved (unless they are acting maliciously), but identify and fix the issues to ensure this particular incident is unlikely to recur.

The intent of the postmortems is to learn what the issues are and prevent or mitigate similar issues happening in the future. If you don't make changes as a result of a postmortem then there's no point in conducting them.


>don't blame the people involved (unless they are acting maliciously)

Or negligently.


That still shouldn't be part of a post mortem; that's more of a performance review item.

They should be performantly removed.

The aviation industry regularly requires certifications, check rides, and re-qualifications when humans mess up. I have never seen anything like that in tech.

Sometimes the solution is to not let certain people do certain things which are risky.


Agree 100%. However, using your example, there is no regulatory agency that investigates the issue and demands changes to avoid related future problems. Should the industry move in this direction?

However, one of the things you see (if you read enough of them) in accident investigation reports for regulated industries is a recurring pattern:

1. Accident happens.

2. Investigators conclude the accident would not have happened if people did X. They recommend the regulator require that people do X, citing previous such recommendations on each iteration.

3. The regulator declines this recommendation, arguing it's too expensive to do X, or that people already do X, or even (hilariously) both.

4. Go to 1.

Too often, what happens is that eventually:

5. An Extremely Famous Accident happens, e.g. killing beloved celebrity Space Cowboy.

6. Investigators conclude the accident would not have happened if people did X, and remind the regulator that they have previously recommended requiring X.

7. The press finally reads dozens of previous reports, and so the news story says: Regulator killed Space Cowboy!

8. The regulator decides that actually they always meant to require X after all.


As bad as (3) sounds, I'll strongman the argument: it's important to keep the economic cost of any regulation in mind.*

On the one hand, you'd like to prevent the thing the regulation is seeking to prevent.

On the other hand, you'd have costs for the regulation to be implemented (one-time and/or ongoing).

"Is the good worth the costs?" is a question worth asking every time. (Not least because sometimes it lets you downscope/target regulations to get better good ROI)

*Yes, the easy pessimistic take is 'industry fights all regulation on cost grounds', but the fact that the argument is abused doesn't mean it doesn't have some underlying merit


I think conventionally the verb is "to steelman" with the intended contrast being to a strawman, an intentionally weak argument by analogy to how straw isn't strong but steel is. I understood what you meant by "strongman" but I think that "steelman" is better here.

There is indeed a good reason regulators aren't just obliged to institute all recommendations - that would be a lot of new rules. The only accident report I remember reading with zero recommendations was an MAIB (Marine Accident Investigation Branch) report which concluded that a crew member of a fishing boat had died at sea after their vessel capsized because both they and the skipper (who survived) were on heroin. The rationale for not recommending anything was that heroin is already illegal, operating a fishing boat while on heroin is already illegal, and it's also obviously a bad idea, so there's nothing to recommend. "Don't do that".

Cost is rarely very persuasive to me, because it's very difficult to correctly estimate what it will actually cost to change something once you've decided it's required - based on the current reality where it is not. Mass production and clever cost reductions resulting from the normal commercial pressures tend to drive down costs when we require something but not before (and often not after we cease to require it either).

It's also difficult to anticipate all the benefits of a good change without trying it. Lobbyists against a regulation will often try hard not to imagine benefits; after all, they're fighting not to be regulated. But once it's in action, it may be obvious to everyone that it was just a better idea, and absurd that it wasn't always the case.

Remember when you were allowed to smoke cigarettes on aeroplanes? That seems crazy, but at the time it was normal and I'm sure carriers insisted that not being allowed to do this would cost them money - and perhaps for a short while it did.


> it's very difficult to correctly estimate what it will actually cost to change something once you've decided it's required - based on the current reality where it is not. Mass production and clever cost reductions resulting from the normal commercial pressures tend to drive down costs

Difficult, but not impossible.

What is calculable and does NOT scale down is the cost of compliance documentation and processes. Changing from 1 form of documentation to 4 forms of documentation has a measurable cost that will be imposed forever.

> It's also difficult to anticipate all benefits from a good change without trying it.

That's not a great argument, because it can be counterbalanced by the equally true opposite: it's difficult to anticipate all downsides to a change without trying it.

> Remember when you were allowed to smoke cigarettes on aeroplanes?

Remember when you could walk up to a gate 5 minutes before a flight, buy a ticket, and fly?

The current TSA security theater has had some benefits, but it's also made using airports far worse as a traveler.


I mean, I'm pretty sure there was a long period where you could walk up 5 minutes before, and fly on a plane where you're not allowed to smoke. It's completely unrelated.

The TSA makes no sense as a safety intervention; it's theatre. It's supposed to look like we're trying hard to solve the problem, not be an actual attempt to solve it. And if there was an accident investigation for 9/11, I can't think why: that's not an accident.

As to your specific claim about compliance overhead, in many cases we don't even know whether we'd actually increase paperwork. Rationalization driven by new regulation can actually reduce it instead.

For a non-regulatory (at least in the sense that there are no government regulators involved) example, consider Let's Encrypt's ACME, which was discussed here recently. ACME complies with the "Ten Blessed Methods". But prior to Let's Encrypt the most common processes weren't stricter or more robust; they were much worse and much more labour intensive. Some of them were prohibited more or less immediately when the "Ten Blessed Methods" were required because they were just obviously unacceptable.

The Proof of Control records from ACME are much better than what had been the usual practice prior, yet Let's Encrypt is $0 at point of use. Even if we count the actual cost (borne by donations rather than subscribers), it's much cheaper than the prior commercial operators had been, for much more value delivered.


Smoking and TSA are unrelated.

You provided an example of where arguing against regulation was ill-conceived in hindsight. I offered an obvious example of the opposite (everyone against plane hijacking -> regulation -> air travel is made worse for everyone without much improvement for the primary issue).

> Rationalization driven by new regulation can actually reduce [paperwork] instead.

Ha! Anything is possible, I suppose.

I'd point out that the TBM were not ratified by committee (much less a government) and were rammed through by unilateral Mozilla fiat.


If someone publishes a novel when they are twenty and dies when they are 90, the novel won't be in the public domain for 140 years: the remaining 70 years of the author's life plus 70 after their death. That's ridiculous.

It allows you the freedom to publish works in those worlds, reference characters, etc. See for example the horror game Alice: Madness Returns, based on the Alice in Wonderland series.

Lord of the Rings (1954-1955) has only recently entered the public domain for life+50 countries due to JRR Tolkien dying in 1973, despite the work being over 70 years old. It won't enter the public domain in life+70 countries until 2044.

Only recently have works written in the early-to-mid 1900s been entering the public domain. This limits the available works to around the First World War era. For example:

- HG Wells (Died 1946, Life+70 in 2017), works like War of the Worlds and The Time Machine.

- LM Montgomery (Died 1942, Life+70 in 2013), works like Anne of Green Gables -- in the US, where publication + 95 years is in effect for works of that era, her later works (after ~1925) are not yet in the public domain.

With comic IPs, most are not yet in the public domain:

- Superman (1938, P+95 in 2034), and that will only cover that incarnation of the character.

- Batman (1939, P+95 in 2035), and that will only cover that incarnation of the character.

So the current copyright terms are very limiting for IPs that are nearly a century old.
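To make the arithmetic explicit, here's a toy calculation (real terms have country-specific edge cases like wartime extensions and renewal rules, so treat this as a sketch):

  def public_domain_year(death_year=None, pub_year=None,
                         life_term=70, pub_term=95):
      # life + N: expires N years after the author's death, entering
      # the public domain on the following January 1st
      if death_year is not None:
          return death_year + life_term + 1
      # US-style publication + 95 for older works
      return pub_year + pub_term + 1

  print(public_domain_year(death_year=1973))  # LotR, life+70: 2044
  print(public_domain_year(pub_year=1938))    # Superman, US: 2034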


For just the model itself: 4 bytes x params at F32, 2 x params at F16/BF16, or 1 x params at F8 (e.g. 685GB at F8 for a 685B parameter model). It will be smaller for quantizations, but I'm not sure how to estimate those precisely.
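As a rough sketch of that arithmetic (the dtype sizes are exact, but the quantization figures are my own approximations; real files also add metadata and some non-quantized tensors):

  BYTES_PER_PARAM = {
      "f32": 4.0, "f16": 2.0, "bf16": 2.0, "f8": 1.0,
      "q8_0": 1.06,   # approximation: ~8.5 bits/weight
      "q4_k_m": 0.60, # approximation: ~4.8 bits/weight
  }

  def weight_gb(params_billions: float, dtype: str) -> float:
      # billions of params x bytes/param gives ~GB directly
      return params_billions * BYTES_PER_PARAM[dtype]

  print(weight_gb(685, "f8"))     # ~685 GB, matching the example above
  print(weight_gb(685, "q4_k_m")) # ~411 GB as a quantized ballpark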

For a Mixture of Experts (MoE) model you only need to have the memory size of a given expert. There will be some swapping out as it figures out which expert to use, or to change experts, but once that expert is loaded it won't be swapping memory to perform the calculations.

You'll also need space for the context window; I'm not sure how to calculate that either.
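For what it's worth, the back-of-the-envelope formula I've seen for the KV cache is 2 (keys and values) x layers x kv_heads x head_dim x context x bytes per element; this assumes a standard transformer with no cache quantization, and the model dimensions below are made up for illustration:

  def kv_cache_gb(layers, kv_heads, head_dim, context, bytes_per=2):
      # 2x for keys and values, stored per layer for every token in context
      return 2 * layers * kv_heads * head_dim * context * bytes_per / 1e9

  # hypothetical 32-layer GQA model, 8 KV heads of dim 128, 32k context
  print(kv_cache_gb(32, 8, 128, 32768))  # ~4.3 GB at F16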


I think your understanding of MoE is wrong. Depending on the settings, each token can actually be routed to multiple experts (top-k routing; the related "expert choice" architecture has experts select tokens instead). This makes it easier to parallelize inference (each expert on a different device, for example), but it's not simply keeping one expert in memory.
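To illustrate, here's a toy sketch of standard top-k token-choice routing (the names and shapes are invented for the example):

  import numpy as np

  def route(token, router_w, experts, k=2):
      logits = router_w @ token              # one score per expert
      top = np.argsort(logits)[-k:]          # pick the k best experts
      w = np.exp(logits[top]); w /= w.sum()  # softmax over the chosen k
      # the token's output is a weighted mix of the selected experts
      return sum(wi * experts[i](token) for wi, i in zip(w, top))

  rng = np.random.default_rng(0)
  d, n_experts = 4, 8
  experts = [
      (lambda m: (lambda x: m @ x))(rng.standard_normal((d, d)))
      for _ in range(n_experts)
  ]
  print(route(rng.standard_normal(d), rng.standard_normal((n_experts, d)), experts))

Because the chosen experts differ per token, all experts generally need to be resident somewhere, even though only k of them do work for any given token.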

I think your idea of MoE is incorrect. Despite the name, they're not "expert" at anything in particular; the experts used change more or less on each token, so swapping them into VRAM as needed is not viable -- they just get executed on the CPU instead (as llama.cpp does).

A common pattern is to offload (most of) the expert layers to the CPU. This combination is still quite fast even with slow system RAM, though obviously inferior to loading everything into VRAM.


Aside: this guy regularly posts on the Discord server for an open-source post-training framework I maintain, demanding repayment for bugs in nightly builds and generally abusing the maintainers.

I assume you offered him the choice to buy a support contract or get banned. Otherwise why is he still allowed to do that?

It could be that they weren't able to produce stable video -- i.e. video with a consistent look across frames. Video is more complex than images because of this. If their architecture couldn't handle that properly then no amount of training would fix it.

If they found that their architecture worked better on static images then it is better to pivot to that than to waste the effort. Especially if you have a trained model that is good at producing static images but bad at generating video.


Reading the post, the architectural change is combining a vision model (Mistral 3 in the flux.2 case) with a rectified flow transformer.

I wonder if this architectural change makes it easier to use other vision models such as the ones in Llama 3 and 4, or possibly a future Llama 5.


Veritasium did a video on this [1], using a surface of oil in a petri dish to replicate the effect.

[1] https://www.youtube.com/watch?v=WIyTZDHuarQ

