More

celrod · on Aug 4, 2024

Signed integer overflow being undefined has these two consequences for me: 1. It makes my code slightly faster. 2. It makes my code slightly smaller. 3. It makes my code easier to check for correctness, and thus makes it easier to write correct code.

Win, win, win.

Signed integer overflow would be a bug in my code.

As I do not write my own implementations to correctly handle the case of signed integer overflow, the code I am writing will behave in nonsensical ways in the presence of signed integer overflow, regardless of whether or not it is defined. Unless I'm debugging my code or running CI, in which case ubsan is enabled, and the signed overflow instantly traps to point to the problem.

Switching to UB-on-overflow in one of my Julia packages (via `llvmcall`) removed like 5% of branches. I do not want those branches to come back, and I definitely don't want code duplication where I have two copies of that code, one with and one without. The binary code bloat of that package is excessive enough as is.

account42 · on Aug 5, 2024

Agreed. If anything, I'd like to have an unsigned type with undefined overflow so that I can get these benefits while also guaranteeing that the numbers are never negative where that doesn't make any sense.

atiedebee · on Aug 6, 2024

That's what zig did, and they solved the overflow problem by having seperate operators for addition and subtraction that guarantee that the number saturates/wraps on overflow.

celrod · on Aug 3, 2024

I don't think you'd even necessarily need to ignore. Roll it out in phases. You aren't going to have to deliver the final finished solution all at once.

Some elements are inevitably going to end up being de-prioritized, and pushed further into the future. Features that do end up having a lot of demand could remain a priority.

I don't think this is even a case of "ask for forgiveness, not permission" (assuming you do intend to actually work on w/e particular demands if they end up actually continuing to demand it), but a natural product of triage.

CRConrad · on Aug 17, 2024

> Some elements are inevitably going to end up being de-prioritized, and pushed further into the future. Features that do end up having a lot of demand could remain a priority.

Why, that sounds positively... Agile. In the genuine original sense.

celrod · on July 21, 2024

C++20 added `[[no_unique_address]]`, which lets a `std::is_empty` field alias another field, so long as there is only 1 field of that `is_empty` type. https://godbolt.org/z/soczz4c76 That is, example 0 shows 8 bytes, for an `int` plus an empty field. Example 1 shows two empty fields with the `int`, but only 4 bytes thanks to `[[no_unique_address]]`. Example 2 unfortunately is back up to 8 bytes because we have two empty fields of the same type...

`[[no_unique_address]]` is far from perfect, and inherited the same limitations that inheriting from an empty base class had (which was the trick you had to use prior to C++20). The "no more than 1 of the same type" limitation actually forced me to keep using CRTP instead of making use of "deducing this" after adopting c++23: a `static_assert` on object size failed, because an object grew larger once an inherited instance, plus an instance inherited by a field, no longer had different template types.

So, I agree that it is annoying and seems totally unnecessary, and has wasted my time; a heavy cost for a "feature" (empty objects having addresses) I have never wanted. But, I still make a lot of use of empty objects in C++ without increasing the size of any of my non-empty objects.

C++20 concepts are nice for writing generic code, but (from what I have seen, not experienced) Rust traits look nice, too.

tialaramex · on July 21, 2024

It's probably mean for me to say "empty type" to C++ people because of course just as std::move doesn't move likewise std::is_empty doesn't detect empty types. It can't because C++ doesn't have any.

You may need to sit down. An empty type has no values. Not one value, like the unit type which C++ makes a poor job of as you explain, but no values. None at all.

Because it has no values we will never be called upon to store one, we can't call functions which take one as a parameter, operations whose result is an empty type must diverge (ie control flow escapes, we never get to use the value because there isn't one). Code paths which are predicated on the value of an empty type are dead and can be pruned. And so on.

Rust uses this all over the place. C++ can't express it.

plasticeagle · on July 21, 2024

Help me out here.

What is this empty type for? Could you provide an old man with a nice concrete example of this in action? I've used empty types in C++ to mark the end of recursive templates - which I used implement typelists before variadic templates were available.

But then you mention being unable to call functions which take an empty type as a parameter. At which point I cease to understand the purpose.

tialaramex · on July 21, 2024

I don't know that I'll be able to convince you but I'll give a couple of examples.

What is the type of the expression "return x" ? Rust says that's ! pronounced Never, an empty type. This expression never had a value, control flow diverges.

So this means we can just use simple type arithmetic to decide that a branch which returns contributed nothing to the type of the expression - it has no possible value. This wasn't a special case, it's just type arithmetic.

Ok, lets introduce another. Rust has a suite of conversion traits. From, Into, TryFrom and TryInto. They're chained, so if I implement From<Goose> for Doodad, everybody gets the three other implied conversions. But the Try conversions are potentially fallible, hence the word Try. So they have an error type. Generic Code handling the Error type of potentially failing conversion will thus be written, even if in some cases the conversion undertaken chained back to my From<Goose> code. But wait, that conversation can't fail! Sure enough the chained TryFrom and TryInto produced will have the error type Infallible, which is an Empty Type.

So the compiler can trim all the error handling code, it depends upon this value which we know can't exist, therefore it never executes.

plasticeagle · on July 23, 2024

Got it.

Which of course is equivalent to the statement "I have begun the process of understanding, but do not yet know what I do not know". My old High School teacher used to complain that I claimed understanding long before I actually reached it.

Anyway, thank you, and that seems a clever concept. I can't help but think that it's solving a problem that the language itself created - though that it doubtless an artifact of my as-yet limited understanding.

So "From" has to return something that might be an error, in some way. Just so that the Try... variants can be generated. And generic callers have to write something to handle that error - though presumably concrete callers do not because of the empty type.

tialaramex · on July 25, 2024

> So "From" has to return something that might be an error, in some way. Just so that the Try... variants can be generated

Not quite. From can't fail, but TryFrom for example could fail.

Lets try a couple very concrete examples, From<u16> for i32 exists. Turning any 16-bit unsigned integer into a 32-bit signed integer works easily. As a result of the "chain" I mentioned, Rust will also accept TryInto<i32> for u16. This also can't fail - and it's going to run the identical code, but TryInto has an associated Error type, this must be filled out, it's filled out as Infallible. The compiler can see that Infallible is empty, therefore where somebody wrote error handling code for their TryInto<i32> if the actual type was u16 that Error type will be Infallible, therefore the code using it is dead.

Now, compare converting signed 16-bit integers to unsigned. This can clearly fail, -10 is a perfectly good signed 16-bit integer, but it's out of range for unsigned. So From<i16> for u16 does not exist. But TryInto<u16> for i16 does exist - but this type that really does have an error type, this conversion can and does fail with a "TryFromIntError" type apparently, which I expect has some diagnostics inside it.

celrod · on July 22, 2024

Thanks for the clarification.

plorkyeran · on July 21, 2024

void is an empty type in C++. It's less useful than it could be, but it does exist.

tialaramex · on July 22, 2024

void isn't a type. If you try to use it as a type you'll be told "incomplete type".

People who want void to be a type in C++ (proponents of "regular void") mostly want it to be a unit type. If they're really ambitious they want it to have zero size. Generally a few committee meetings will knock that out of them.

celrod · on July 3, 2024

Multiple accumulators increases accuracy. See pairwise summation, for example.

SIMD sums are going to typically be much more accurate than a naive sum.

Chabsff · on July 3, 2024

Not necessarily either. It's not particularly hard to create a vector where in-order addition is the most accurate way to sum its terms. All you need is a sequence where the next term is close to the sum of all prior ones.

There just isn't a one-size-fit-all solution to be had here.

kardos · on July 3, 2024

> There just isn't a one-size-fit-all solution to be had here.

But there is: https://news.ycombinator.com/item?id=40867842

bee_rider · on July 3, 2024

That’s a one size fits some solution. Not more than 2x as slower than native floats is pretty slow for cases where you don’t need the extra precision.

If might be the case that it is the best solution if you do need the extra precision. But that’s just one case, not all.

celrod · on June 27, 2024

C++23 added `allocate_at_least`: https://en.cppreference.com/w/cpp/memory/allocator_traits/al...

I'm not sure if any standard libraries have an implementation that takes advantage of the "at least" yet.

celrod · on June 23, 2024

Yes! It annoys me when a scene with characters shouting is much louder than a scene where characters are talking with hushed voices, as an example.

We know a shout was louder at the source, but the decibel level at our ears is proportional to 1/distance squared, meaning hushed voices aren't necessarily any quieter "in real life". I'd prefer suspenseful and dramatic scenes both play at similar, comfortable levels. I don't want to have to adjust the volume up and down so I can understand one scene and then not have it be disturbingly loud in the next. In practice, I just use subtitles to circumvent the "difficult to understand" problem.

aeadio · on June 23, 2024

What you're describing is called dynamic range compression, and mpv can be configured to do this in multiple ways.

Back in the physical media days, it was pretty common for DVD/Bluray players to include this feature. Unfortunately it's not something that streaming app developers thought twice about. Your TV or streaming box also may or may not have the feature.

technofiend · on June 23, 2024

It's a problem for me between being half deaf anyway and Google not adjusting source streams to any sort of minimum decibel level. So listening to the local Pacifica station with no adjustment is about half the loudness of the local NPR station. Pleas to correct the audio level have fallen (heh) on deaf ears, so my donations and my listening has moved elsewhere. Sorry, local radio. I'd support you if you'd support me.

3abiton · on June 25, 2024

Doesn't that depends in the sound system setup ie hardware? I tried vehemently to fix it once with JamesDSP with no avail. My understanding then was the need for hardware support like Dolby or something. It was always an issue with music and dialog.

yonatan8070 · on June 23, 2024

Unfortunatly cinemas either don't use MPV or don't use this feature, so while dialog is audible, gunfire is often way too loud for me

iforgotpassword · on June 23, 2024

Huh, I actually want this when I'm at the cinema, for immersion. Just not when I'm watching a movie at home at 11pm.

gertlex · on June 23, 2024

It's interesting that even podcasts, which you'd think would be more focused on audio considerations, don't do this, either. I'm mostly thinking of Dan Carlin's podcasts, where sometimes he goes a weee bit quiet. (And Shadows of Utopia, which I have to jack up the volume for the entirety of)

(insert caveats about: artistic effect; I have zero audio engineering experience; podcasters definitely do some things in this vein e.g. https://www.reddit.com/r/podcasting/comments/10oqvpm/podcast...)

exe34 · on June 23, 2024

I'm watching a tv show, and while I'm sure a crying baby is appealing to a parent to the point they'll run at it, I'm now sufficiently annoyed that I'm just skipping past the crying baby storyline.

cozzyd · on June 23, 2024

Part of this is psychoacoustics... Our hearing sensitivity is peaked near a crying baby frequency regardless of dBa.

jokethrowaway · on June 23, 2024

Poor audio systems are also to blame for this

You want a 3.1 system (at least) and increase the center channel

aidenn0 · on June 23, 2024

I have a 3.1 system and it's still terrible on many movies made after 2010ish.

P.S.

Even properly mastered content will not work well in a noisy environment (e.g. a car or if you have a dozen fans going because it's summer). Here's my ffmpeg filter for such situations (including a stereo downmix, for those without a center channel):

   aformat=channel_layouts=stereo, compand=0 0:1 1:-90/-900 -70/-70 -30/-9 0/-3:6:0:0:0

Dalewyn · on June 23, 2024

No: The problem is directors who love mumbling, audio people who have no idea how to record and balance audio properly, and media players that can't or aren't configured to properly downmix audio tracks as appropriate.

It is absolutely possible and honestly standard fare to have properly balanced audio playing on the $1 tin cans found in TVs and laptops.

tiberious726 · on June 24, 2024

No, you'd be surprised at all the random crap they put on the center channel nowadays. You just need an equalizer and use it to boost the frequencies of human speech. A lot of AVRs call this "dialogue enchancer", but you can do it easily in software

celrod · on June 15, 2024

This memory is now the least recently used in the L1 cache, despite being freed by the allocator, meaning it probably isn't being used again.

If it was freed after already being removed from the L1 cache, then you also need to evict other L1 cache contents and wait for it to be read into L1 so you can write to it.

128 cycles is a generous estimate, and ignores the costs to the rest of the program.

astrange · on June 15, 2024

You can use non-temporal writes to avoid this, and some CPUs have an instruction that zeroes a cache line. It's not expensive to do this.

celrod · on June 15, 2024

Nontemporal writes are substantially slower, e.g. with avx512 you can do 1 64 byte nontemporal write every 5 or so clock cycles. That puts you at >= 640 cycles for 8 KiB. https://uops.info/html-instr/VMOVNTPS_M512_ZMM.html

astrange · on June 16, 2024

Well, the point of a non-temporal write kind of is that you don't care how fast it is. (Since if it was being read again anytime soon, you'd want it in the cache.)

But yes, it can be an over-optimization.

10000truths · on June 15, 2024

The worker is already reading/writing to the buffer memory to service each incoming HTTP request, whether the memory is zeroed or not. The side effects on the CPU cache are insubstantial.

celrod · on June 4, 2024

Different vector widths for different cores isn't currently feasible, even with SVE. So all cores would need to support 1024-bit SIMD.

I think it's reasonable for the non-SIMD focused cores to do so via splitting into multiple micro-ops or double/quadruple/whatever pumping.

I do think that would be an interesting design to experiment with.

paulmd · on June 5, 2024

I actually think the CPU and GPU meeting at the idea of SIMT would be very apropos. AVX-512/AVX10 has mask registers which work just like CUDA lanes in the sense of allowing lockstep iteration while masking off lanes where it “doesn’t happen” to preserve the illusion of thread individuality. With a mask register, an AVX lane is now a CUDA thread.

Obviously there are compromises in terms of bandwidth but it’s also a lot easier to mix into a broader program if you don’t have to send data across the bus, which also gives it other potential use-cases.

But, if you take the CUDA lane idea one step further and add Independent Thread Scheduling, you can also generalize the idea of these lanes having their own “independent” instruction pointer and flow, which means you’re free to reorder and speculate across the whole 1024b window, independently of your warp/execution width.

The optimization problem you solve is now to move all instruction pointers until they hit a threadfence, with the optimized/lowest-total-cost execution. And technically you may not know where that fence is specifically going to be! Things like self-modifying code etc are another headache not allowed gpgpu too - there certainly will be some idioms that don’t translate well, but I think that stuff is at least thankfully rare in AVX code.

celrod · on June 3, 2024

I read that comment as "the wider, the sweeter" (which I agree with), but that we're now (as you say) at the end of the road, and thus the sweetest point.

But an increase in cacheline size would be nice if it can get us larger vectors, or otherwise significantly improve memory bandwidth.

celrod · on May 31, 2024

Our software ecosystem doesn't work well with an army of ants. I think we'd need a paradigm shift to get there.

Also, FWIW, Xeon Phi hit 244 threads in 2012 and 256 threads in 2016, although it used 4 threads/core.

Pet_Ant · on May 31, 2024

The Phi is was many-core and not multi-core. It's a much harder paradigm to wrap your mind around. I do recommend learning about it for the new perspective. Try TIS-1000 by Zachtronics.