Every time I dip my toes in multithreading in C++, I just get into a world of ra...

rramadass · 2024-12-13T17:06:52 1734109612

That is just not true and you are being unnecessarily hyperbolic. Even when i was learning/doing concurrency/multi-threading (using pthreads) long ago, i never got "into a world of random crashes and segfaults". It was of course challenging but not too difficult. You structure your application following standard usage patterns given in a good book (eg. Butenhof's) and then plug in your app logic in the thread routines. With some experience things get clearer over time and you begin to have an intuitive feel for structuring multi-threaded code. The key is to stay at a high enough level of abstraction appropriate for your usecase (eg. mutexes/semaphores/condition variables) before diving into compiler and hardware level intrinsics/atomics/etc.

A good book to study and get a handle on all aspects of Concurrent Programming is Foundations of Multithreaded, Parallel, and Distributed Programming by Gregory Andrews - https://www2.cs.arizona.edu/~greg/mpdbook/

nocman · 2024-12-13T17:27:24 1734110844

In fairness the person you were responding to was referring to their own personal experience. They certainly are not the first person to conclude that doing non-trivial concurrent programming is too difficult for them. I agree that it is achievable with an appropriate level of care and experience, but I know there are many very smart people that conclude that multithreaded programming in C++ is too difficult for their taste.

Even Rich Hickey, when discussing concurrency in Java/C#/C++ said "I am tired of trying to get right, because it is simply far too difficult."

> In particular, talk about shared state, how we do it today, how you do it in C#, Java, or C++, what happens when we get into a multi-threaded context, and specifically what are some of the current solutions for that in those spaces. Locking, in particular. That is something I have done a lot of, over a long period of time, and I am tired of trying to get right, because it is simply far too difficult.

( https://github.com/matthiasn/talk-transcripts/blob/master/Hi... )

rramadass · 2024-12-14T05:33:36 1734154416

The first point to understand is that a knowledge of Concurrent Programming (in all its guises) is mandatory for all programmers today.

The second point to note is that when people like Rich Hickey or John Ousterhout talk about multi-threaded programming being "hard" they are talking about a level of depth far beyond what a "normal" application programmer will encounter in his/her entire career. These guys span everything from Apps/OS/Compilers/Language/Hardware and hence by necessity know the full gamut of complexity involved in concurrency. Trying to understand concurrency across all the above abstraction layers is very difficult and that is what they mean when they say it is "hard".

But for most application programmers the above is simply not relevant and they can comfortably stay at higher-level abstractions given by their language/library and ignore lower-level details unless and until forced by other needs like performance etc. One can do a lot with this knowledge alone and indeed that is what most of us do.

So instead of making wild statements like "random crashes and segfaults" and "too hard to program" learn to use heuristics/commonsense to simplify the code structure eg. a) copy code patterns given by reputed authors so one does not make unnecessary errors b) Keep the number of locks to a minimum by using a few "global locks" rather than a lot of "granular locks" c) Learn to use Thread Local Storage d) Acquire locks/resources in the same order to avoid deadlocks etc etc.

neonsunset · 2024-12-13T17:30:16 1734111016

> Java/C#/C++

These three have different memory models, concurrency primitives and adopted practices...it's odd that they are lobbed together.

nocman · 2024-12-13T17:33:48 1734111228

It's not odd at all. The underlying problems to be solved are identical.

nocman · 2024-12-13T17:35:27 1734111327

Also, Rich has many years of experience using all three of those languages.

neonsunset · 2024-12-13T17:45:19 1734111919

The two out of three have evolved in this area since.

pionaryon · 2024-12-13T12:35:34 1734093334

You can get pretty far with keeping sharing to an absolute minimum and when you do need to share data, slap a lock free ringbuffer between them to communicate. Pretty simple to get right.

adrian_b · 2024-12-13T16:57:35 1734109055

You are right, but what you call "lock free" is not the same as many other things that are called "lock free", even if indeed a ringbuffer needs no locks, so this may be confusing for newbies.

I strongly dislike the term "lock free", which is really just a marketing term invented by people trying to promote the idea that some algorithms are better than those "lock-based", when in fact those "lock-free" algorithms were only choosing a different trade-off in performance, which can be better or worse, depending on the application.

Even worse is that after the term "lock free" has become fashionable, it has also been applied to unrelated algorithms, so now it has become ambiguous, so you cannot know for sure what is meant by it, unless more details are provided.

When accessing shared data structures, the accesses are most frequently done in one of three ways.

The first is to use mutual exclusion, when the shared data structure is accessed within critical sections and only one thread can execute the critical section at a given time. This method is usually called as lock-based access.

The second is to use optimistic access, when the shared data structure is accessed concurrently by many threads, but they are able to detect interference from the other concurrent accesses and they retry their accesses in such cases. This is what is most frequently referred as "lock free" access. Compared to mutual exclusion, this access method may be faster in the best cases, but it is much slower in the worst cases, so whether this is a good choice depends on the application.

The third method happens when it is possible to partition the shared resource between the threads that access it concurrently, so their concurrent accesses can proceed without fear of interference. This partitioning is usually possible for arrays and for buffers a.k.a. FIFO memories a.k.a. message queues (including one-to-one, many-to-one, one-to-many and many-to-many message queues).

So your "lock free ringbuffer" refers to the third method from above, which is very different from the "lock free" algorithms of the second kind from above.

Whenever concurrent access to partitioned shared resources is possible, it is much better than accesses with mutual exclusion or optimistic accesses, which require either waiting or retrying, both of which are wasting CPU time.

Therefore using correctly-implemented message queues or other kinds of shared buffers is usually the best method to achieve high levels of concurrency, in comparison with other kinds of shared data structures, because it avoids the bottlenecks caused by mutual exclusion or optimistic accesses.

gpderetta · 2024-12-13T17:27:26 1734110846

FWIW, lock-free is an academic term and it is not really about performance.

adrian_b · 2024-12-13T19:59:25 1734119965

The fact that it has been coined by some academics in their research papers is not in contradiction with the fact that it has been chosen exactly like any marketing term, to imply that something is better than it really is.

The alternative term "optimistic access" describes much better the essence of those algorithms, while "lock free" attempts to hide their nature and to make them look like something that is guaranteed to be better (so receiving money for researching them is justified), because locks are supposed to be bad.

"Lock free" and "wait free" have been buzzwords that have provided subjects for a huge number of research papers in the academia, most of which have been useless in practice, because the described algorithms have been frequently worse than the lock-based algorithms that they were supposed to replace.

tialaramex · 2024-12-14T09:27:52 1734168472

I don't agree with your characterization of these algorithms as "worse".

They have a desirable property. If you needed a wait-free algorithm, and this is a wait-free algorithm it's not "worse" for you than an existing algorithm that isn't wait-free, regardless of whether it's slower, or more memory intensive or whatever. You needed wait-free and this is wait free.

Why is wait-free desirable? Well, unlike a lock-free algorithm, the wait-free algorithm makes progress in defined time for everybody and it might be that it's actually much worse if anybody is stalled than for the averages to be bad for example.

If you mean "fast on average" say that. If you mean (as often C++ programmer do) "Validates my feeling of self-worth" then say that. I don't know whether anybody wants to pay you more money to validate your self-worth, but at least you're being honest about your priorities.

revskill · 2024-12-13T18:15:36 1734113736

Thanks.

anal_reactor · 2024-12-13T13:04:04 1734095044

Step 1: use concurrency on a very high level. For example, write an app, and then run 4 instances of it, each working on a quarter of your data.

Step 2: when you absolutely need concurrency within one app, try using some library that already solved the issue. For example, there are lots of databases with pretty strong concurrency contracts while still being efficient.

Step 3: if you absolutely need custom solution, use a library/language that provides reasonable tools out of the box and keep your logic to minimum.

Following the first two steps will solve 95% of your concurrency issues, if you include step 3 it goes to 99%.

tobyhinloopen · 2024-12-18T10:37:26 1734518246

Thanks anal reactor!

bitbasher · 2024-12-13T16:02:12 1734105732

If C++ makes it possible to blow your leg off, multithreading with C++ makes it possible to blow yourself up, along with half your neighborhood and some homes halfway around the country, unexplicitly.

commandersaki · 2024-12-13T17:40:47 1734111647

Really? I don't share either sentiment. I found multithreading in C++ to be pretty mundane, and in fact a lot easier than it probably once was with lambdas. You definitely need to be on top of the multithreading primitives such as atomics, mutex, condition variables, shared_ptr, etc. But otherwise it's pretty straightforward.

crabbone · 2024-12-13T20:12:14 1734120734

One of the problems I'm usually facing with this is the need to call library functions from C++ code. Especially TLS-related stuff. It doesn't matter what primitives C++ has to offer, as long as you are using library code things will break.

mattgreenrocks · 2024-12-13T16:51:51 1734108711

The standard library concurrency primitives are still too low-level for a lot of general purpose concurrency needs. IMO, the minimum level of abstraction that you should start with for many apps is a thread-safe queue that dispatches messages with immutable data to one or more worker threads (e.g. actor-like model). You can go lower-level, but it needs to be deliberately chosen.

Jtsummers · 2024-12-13T17:17:24 1734110244

People keep reinventing those and thread pools over and over in C++. I've been researching one of our older systems (slated for decommissioning in 2021, as is typical that did not happen). In trying to understand it I have found many areas of concern around how they deal with concurrency, in particular they created their own queue and thread pool. Based on past experience, there's 50/50 chance for each that they were created correctly (with proper concurrency controls), and less than that that the tasks submitted to the thread pool themselves make use of proper concurrency controls rather than assuming that they can read/write whatever they want as if the system were single-threaded.

saagarjha · 2024-12-13T08:36:39 1734078999

Concurrency is hard, unfortunately. C++ just makes it even harder than that.

sumtechguy · 2024-12-13T13:19:03 1734095943

We had multi CPU stuff in the 90s. C/C++ was dead set on ignoring it for a long time. Every OS had its own way of handling it and all with subtle weird bits that did not act like other OS's. You could not even trust the stdlib or crt to get it right with their globals that could change underneath you.

So it was left to the developer. It is much better now but for so long the problem was ignored and now we have decades of 'not sure if I can touch that code'. Also by default C/C++ are fairly open about sharing memory. So it is very easy create race conditions on memory. It would be nice if the base language had a concept of 'locked to a thread' or 'i want to share this with other threads' then the compiler can flag where I have wandered into the weeds for a class so we could catch the race conditions at compile time, at least some sort of warning.

Sharing semantics were awful for a long time. stdlib has done some very good things to help clean that up but it is still very easy to share between threads and cause yourself a headache.

tialaramex · 2024-12-14T20:53:10 1734209590

C++ itself doesn't have the owning mutex, but there is one in Boost for example.

The problem with an owning mutex in such a language is that you can (on purpose or by mistake) keep accessing the thing it owned after you've released the protecting mutex. Rust's Mutex<T> owns the T but it has the behaviour you want where if you tried to keep the access to T but give back the mutex that doesn't compile. "I tried to unlock the mutex but I still need it, d'oh".

And the same problem applies broadly, you should not share write access to things without an ordering guarantee, but it's hard to ensure that guarantee is followed in C++.

sumtechguy · 2024-12-16T13:32:03 1734355923

Exactly. This stuff has been known about for a long time. It was just kind of ignored and you kinda hoped your library might have something to deal with it (boost, win32, pthread, etc). Then each one acted differently on different platforms or with each other. Some of the std lib is starting to have things we need. But now I have to deal with things in the crt and stdlib that actively break multi threading. Mutexes, semaphores, flags, pulsing, etc is not exactly new patterns. Real mess and you have to understand it too deeply for it to be meaningful to more people. It is why things like promise/async/await are very popular with javascript and their libraries. As it looks like multithreaded programming with a decently clear interface as to what it is going on.

menaerus · 2024-12-13T09:37:52 1734082672

CPU design makes it inherently hard. C or C++ is just a thin layer above it making no tradeoffs. If you can live with the tradeoff then Rust land or VM-language land is more appropriate.

tialaramex · 2024-12-14T20:47:37 1734209257

What is it that you imagine C++ is doing that Rust has traded off?

menaerus · 2024-12-15T12:16:02 1734264962

In theory none but in practice the codebase ends up littered with hidden shared state mostly disguised through one or another shared pointer implementation. And this happens because that's what the Rust compiler is pessimistically enforcing upon you by default.

For heavy workloads, this approach doesn't scale particularly very well.

tialaramex · 2024-12-15T14:55:52 1734274552

It sounds to me as though what you're saying is that when you write Rust programs which don't scale very well they don't scale very well, whereas when you write C++ programs you don't do that, I suggest learning not to do it in Rust either.

menaerus · 2024-12-15T16:59:52 1734281992

Easier to be said than done since that's one of the core tradeoffs of the Rust language. Language is forcing these semantics upon you and while it is possible to get around it with unsafe blocks it turns out to be much more difficult in the practice. So, by default Rust code will almost certainly in majority of cases going to be designed around shared ownership.

tialaramex · 2024-12-15T20:35:01 1734294901

If you actually have shared ownership but in C++ you're getting away with pretending you don't, chances are that'll bite you really hard. Maybe it's already biting and you didn't notice, so Rust actually do you a massive favour.

If there is no shared ownership then inventing it so as to make your Rust slower is just a problem with you, not with Rust.

menaerus · 2024-12-16T09:20:26 1734340826

No. For 98% of the multi-core sensitive code I don't have nor I need shared ownership. While C++ doesn't force you into such semantics but provides you with the ability to do so, the Rust semantics and compiler pesimisstically do. I am going to stop here since I'm repeating myself and you're blatantly going over my points.

jazzypants · 2024-12-13T09:24:32 1734081872

I hear you have to be pretty tall to be able to do it right. [1]

[1] https://bholley.net/blog/2015/must-be-this-tall-to-write-mul...