Libraries can start processes too (mattmower.com)
70 points by sandbags on Aug 2, 2021 | 30 comments



This is a huge part of the power of Elixir/Erlang. Libraries can be written to use the same powers of isolation, resiliency, and concurrency that application code has, at no extra cost to the consumer and with total transparency.

You quickly realize that every function you call could be working asynchronously underneath, and it’s not scary because the whole system is designed to function like this.


I really wish that in Elixir we drew a distinction between dependencies that do and don't spawn their own supervision trees. It would be nice if this were reflected in, say, hex.pm, possibly using different tags for them.


Agreed. I also wish fewer libraries started their own supervision tree, and instead gave you a child spec to drop into your supervision tree. There are definitely use cases where shipping a library as an application makes sense, but oftentimes that sort of design causes problems for me, because it means not being able to start multiple copies of the dependency with different configurations.

I think Phoenix PubSub is a perfect example of how libraries should be structured, in that you just need to drop the module + options into your supervision tree, and you have the freedom of starting multiple independent copies of the tree, in different contexts, and with their own configurations: https://hexdocs.pm/phoenix_pubsub/Phoenix.PubSub.html#module...
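For illustration, this is roughly what that drop-in pattern looks like in practice (the module names like `MyApp.PubSub` are placeholders, and the options shown are just examples):

```elixir
# In your application's supervision tree: each PubSub instance is just
# another child, so you can run several with independent names/options.
children = [
  {Phoenix.PubSub, name: MyApp.PubSub},
  {Phoenix.PubSub, name: MyApp.InternalPubSub, adapter: Phoenix.PubSub.PG2}
]

Supervisor.start_link(children, strategy: :one_for_one)
```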


Alternately, the dependency can start its own supervision tree, with any global processes/tables hanging off it from the beginning, and then export a Mod:start_link/1 function for clients to call, which will 1. start a child tree owned/managed by the dep's supervision tree, but then 2. link that child subtree's root into the caller as well.

Such deps are integrated by adding a stub GenServer that calls Mod:start_link/1 in its init/1 callback, and then adding a child spec for that stub GenServer in your client app's supervision hierarchy.

The ssh daemon module in the stdlib works this way. Most connection-pooler abstractions (e.g. pg2, gproc) do as well.
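A minimal Elixir sketch of that integration stub, assuming the dependency exposes a `Dep.start_link/1` that builds and links the subtree (`Dep` and its options are hypothetical):

```elixir
defmodule MyApp.DepStub do
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  @impl true
  def init(opts) do
    # Hypothetical Dep.start_link/1: the dependency starts a subtree under
    # its own supervision tree and links its root to this stub process,
    # so a crash on either side is visible to the other.
    {:ok, subtree} = Dep.start_link(opts)
    {:ok, %{subtree: subtree}}
  end
end
```

The stub then gets an ordinary child spec, e.g. `{MyApp.DepStub, some_opts}`, in the client app's supervision hierarchy.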


Yes! This is a great approach, and I'd be happy to see more examples like this in the wild. It's similar to how Phoenix PubSub works, with the PubSub application starting a pg scope as part of its supervision tree, which client PubSub servers can join if configured to use the pg adapter.

I was a little bit flippant in my initial comment, but my main criticism was of libraries that don't support any sort of hooks like this into their supervision strategy, and instead rely entirely on a global and static supervision tree, usually configured using app config.


I'm 50-50 on that one (I used to agree with you more, but have since walked it back a bit). This may be an overly nitpicky detail, but you sort of want your own sup tree to not necessarily have a differently scoped "microservice" tied to it, both in terms of failure domains and just for plain visual organization in your observer/livedashboard. For the 90% use case (e.g. HTTP process pools) an independent sup tree is correct, but to your points,

1. It would be nice to have a choice. The library writer should think about their users and choose which case is more correct as the default, then make it opt-out, with the "other case" easy (let's say 2-3 LOC) to implement and spelled out explicitly in the readme/docs landing page.

2. PubSub indeed made (IMO) the correct choice when it migrated from running its own sup tree to living in the app's sup tree.

Thank you for listening to my TED talk.


This is pretty much exactly how I feel and I appreciate that Ranch gives you this option.


You do sometimes have to be careful about how you handle configuration when embedding multiple copies of other supervision trees, though: https://ninenines.eu/docs/en/ranch/2.0/guide/embedded/


IIUC you're suggesting that what I am depending upon here is convenient but problematic?

My understanding is not yet sophisticated enough to follow your point about "not being able to start multiple copies of the dependency with different configurations".

Do you have any explanatory examples that could help me (and presumably others like me)? Thanks. m@t


Problematic is probably too strong of a term, and I think I'd use the word inflexible instead.

I want to be clear though: my issue isn't with applications -- the functionality you're talking about is powerful and useful -- it's purely with the tendency to start a static and global supervision tree as part of a dependency. See some of the other comments in this thread for some neat examples of how applications like ssh and pg2 handle supervision.

When libraries are written like this, they usually start everything up automatically, and pull from their application environment in order to configure everything. This means that this configuration is global and shared amongst all consumers of the library.

Imagine an HTTP client, for example, that provides a config key for setting the default timeout. This key would be shared among all callers, and so if multiple libraries depended on this client, their configurations would override each other.

Fortunately, Elixir now recommends against libraries setting app config, so this problem is partially mitigated, but it's still a concern within your app: if I'm calling two different services, I want to use different timeouts for each, based on their SLA, so having a global timeout isn't helpful.

Instead, in this situation, I'd prefer something like what Finch provides, where I'm able to start different HTTP pools within my supervision tree, for different use-cases, and each can be configured independently: https://github.com/keathley/finch#usage
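As a rough sketch of that Finch-style approach (the pool names, sizes, and timeouts here are made up), each pool lives in the caller's tree and gets its own options:

```elixir
# Two independently configured HTTP pools in my own supervision tree,
# instead of one global pool configured through app config.
children = [
  {Finch, name: MyApp.FastService, pools: %{default: [size: 10]}},
  {Finch, name: MyApp.SlowService, pools: %{default: [size: 2]}}
]

Supervisor.start_link(children, strategy: :one_for_one)

# Each caller then picks the pool (and timeout) appropriate to that service.
Finch.build(:get, "https://fast.example.com/health")
|> Finch.request(MyApp.FastService, receive_timeout: 1_000)
```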

Another approach would be to do something like what ssh does, and have the Finch application start a pool supervisor automatically, but then provide functions for creating new pools against that supervisor, and linking or monitoring them from the caller.

There are a few other techniques you can use too, with different tradeoffs and benefits, like Ecto's approach of requiring that you define your own repo and add that to your tree. Chris Keathley describes some of those ideas here: https://keathley.io/blog/reusable-libraries.html
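For comparison, Ecto's variant of the same idea: the library gives you a `use` macro, and you define and supervise the repo yourself (module and app names are placeholders):

```elixir
defmodule MyApp.Repo do
  use Ecto.Repo, otp_app: :my_app, adapter: Ecto.Adapters.Postgres
end

# ...and in your application callback, the repo is just another child
# of your own supervision tree:
children = [
  MyApp.Repo
]

Supervisor.start_link(children, strategy: :one_for_one)
```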

Global trees like this are also harder to test, especially if they rely on hardcoded unique names, and usually restrict you to synchronous tests, since you can't duplicate the tree for every test and run them independently of each other.

Again though, I want to stress that running processes in the library's application is not my problem: it's just not having any control over when or how those processes are started.

I'm just responding on my phone, and I need to run for a few hours, but feel free to ask for more info or reach out. I'm always happy to talk about this stuff! I enjoyed your article, and I apologize if my initial comment came across as an attack on your core points.


No indeed, I did not perceive it as an attack, but rather as hinting at concerns that I was not aware of, and I'm grateful for your comment and the links (and thank you for your compliment).

Reading what you've written I wonder if this is about configuration rather than the nature of a library starting a process per se.

In my case there is no configuration; the agent state is a pure counter, so I think firing it off is harmless, as other users of the library would just bump the counter value. Your point about testing is a subtle one; I'm not 100% sure I have the right mental picture yet (something I struggle with most of the time anyway).

What I think you are getting at is that a library starting a process that does have configuration around how it works should be less automatic, giving the user a chance to make choices about how it works.

Do I have that right?


Yes, that's mostly it!

A lot of what I'm talking about has to do with configuration, but reuse is another big element. Your example has no configuration, and so is good in that regard; however, it is not reusable, in the sense that it's only possible for a single counter to exist.

I realize this is a contrived example, because you were trying to keep things simple, but if I needed two distinct atomic counters in my app, then I wouldn't be able to use Ergo, as it's currently implemented, because the application only starts a single counter, and doesn't provide any capabilities for starting additional counters.

You could change Ergo to get around this, possibly by instead running a dynamic supervisor that can start named counters under it, using something like `Ergo.create_counter/1`, but this would only address this specific use case.

To go back to my last comment: if you instead exposed, for example, a `__using__` macro that modules could use to define new counters, then callers would be able to integrate as many counters as they needed into their supervision tree, wherever and however they required.
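Purely as a sketch of that `__using__` idea (nothing here is Ergo's actual API), each consumer module could become its own Agent-backed counter:

```elixir
defmodule Ergo.Counter do
  # Hypothetical sketch, not Ergo's real API: every module that `use`s this
  # becomes an independently supervisable, named counter process.
  defmacro __using__(_opts) do
    quote do
      use Agent

      def start_link(_opts \\ []) do
        Agent.start_link(fn -> 0 end, name: __MODULE__)
      end

      def next do
        Agent.get_and_update(__MODULE__, fn n -> {n, n + 1} end)
      end
    end
  end
end

# A caller can then define as many counters as it needs and drop each one
# into its own supervision tree:
defmodule MyApp.RequestCounter do
  use Ergo.Counter
end
```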

This ties back to the testing point too: if the process is a singleton, managed by the application, then you can only run one test against that process at a time in order to isolate the state for those tests, and you need to ensure you properly clean up that state between tests. If instead the library allows you to start the processes yourself, then each test can use `start_supervised!` to start its own isolated copy of the process, which will be linked to the test's process and automatically cleaned up once the test finishes.
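For instance, here's what that test isolation looks like with a plain Agent-backed counter (a generic ExUnit sketch, not specific to Ergo):

```elixir
defmodule CounterTest do
  use ExUnit.Case, async: true

  test "each test gets its own isolated counter" do
    # start_supervised!/1 links the Agent to this test and tears it down
    # automatically when the test exits, so tests can run concurrently.
    counter = start_supervised!({Agent, fn -> 0 end})

    assert Agent.get_and_update(counter, fn n -> {n, n + 1} end) == 0
    assert Agent.get(counter, & &1) == 1
  end
end
```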


If a library is not written like that, it's poorly designed. It could provide one globally started version as a convenience, but not being able to start it multiple times would be a big no.


This has historically been fairly common among a lot of the early Elixir libraries, and I'd imagine that's a byproduct of many of the early adopters coming from the Ruby ecosystem, and not having prior experience with the patterns used in Erlang. I think some of the early confusion surrounding how application config should be used also led to some misguided decisions early on.

Fortunately it's something that I've seen improve over time, but it's a pain-point I've run into with a lot of dependencies, so I try to call it out when I see it.


It's not obvious from use? It's been a long time since I was in this world (well, Erlang), but it was application:ensure_started(App) that let me know "this dependency has its own supervision tree".


HTTPoison, for example, starts its own supervision tree to manage client process pools (it's not obvious that an HTTP client should do that, and there are definitely use cases where it shouldn't), and there is no indication either in mix.exs or in `MyApplication.Application.start/2` that HTTPoison needs the tree. For most deployments, it's probably not a big deal. Some process will try to read HTTP content, fail if HTTPoison isn't quite ready, and be restarted by its supervisor.

However, if you try to use it early in compilation or tests, say in your test_helper.exs file, then speaking from experience you could wind up with a very difficult-to-understand race condition where the HTTPoison process tree hasn't fully booted while you're trying to fetch something off the internet, and you don't have the same level of supervision protection -- if test_helper fails, the whole test suite gives up and doesn't restart, for obvious reasons.

For the HTTP case, thankfully, the Elixir ecosystem is getting Mint as a base HTTP library, which doesn't require a process tree out of the gate, and several interesting explicit process-pool libraries (Finch, Mojito) that derive from Mint and are tuned for their own use cases.


I think the manifest file already contains this information, so it's a matter of just surfacing this. At least for dependencies built using mix.


In earlier versions of Elixir where you had to explicitly add the libraries as additional processes to start, I remember being very confused why things like e.g. an HTTP client needed its own process. To be honest I still felt a little uneasy about it every time. Thinking of those processes as just “the library needs some internal state” makes so much more sense!


Problem is, it's not entirely clear that an HTTP client library needs its own state. (And, for example, Mint does not.)


Persistent connections? Cookies? Response cache? Conditional requests?


Depends on how low-level you want the library to be - for example, cURL doesn't have state.


That is incorrect. Libcurl has a connection cache for persistent connections enabled by default.

It also supports automatically recording and resending cookies (including optionally storing those cookies to disk for cross-session usage), though that is not enabled by default.

In 2020, the CLI cURL utility also added support for conditional requests, although that is not native to the underlying libcurl.


Many, many libraries do this! It's a key feature.

(to clarify, these are light-weight Elixir processes, not OS processes)


I think Erlang and Elixir are incredible, and this particular feature of libraries starting BEAM processes seems cool in the abstract. However, for this PARTICULAR task, Erlang/Elixir might be the worst possible language you could choose:

> The Elixir approach to shared mutable state is wrapping it in a process. In this case, I needed a counter and the easiest way to implement it is to use an Agent which is a kind of process designed to handle simple state. In this case, the get_and_update function allows me to return the counter and increment it as an atomic operation.

This is literally just an atomic counter. It’s a single CPU instruction that is guaranteed to be safe. I don’t care how lightweight a BEAM process is, it’s not faster than updating an atomic counter. Doing it this way is also absurdly more complicated than using, say, a std::atomic in C++ (or the equivalent type in other languages).
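For concreteness, here is a rough sketch of the comparison being made: the Agent-based counter roughly as the article describes it, next to a plain atomic counter on the BEAM (Erlang's :atomics module, OTP 21.2+); the variable names are placeholders:

```elixir
# Agent-based counter, roughly as the article describes it: the state
# lives in a separate process and is read-and-incremented via a message.
{:ok, pid} = Agent.start_link(fn -> 0 end)
Agent.get_and_update(pid, fn n -> {n, n + 1} end)

# A raw atomic counter, no process involved.
ref = :atomics.new(1, signed: false)
:atomics.add_get(ref, 1, 1)
```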

Again, I think Elixir is cool, but if you want to show off how cool it is, maybe don’t use an example that is incredibly much slower and more complicated than it should be. It’s not a great look for Elixir.


Processes on the BEAM may not be on the same machine or even in the same datacenter. You can't assume that everyone has access to the same physical memory.


Of course you're right, but I think anyone who is capable of getting their head around Elixir is capable of understanding that this is a trivial example meant to showcase a powerful language feature. What you choose to use it for is up to you, not the article's author.


I see that point, but it was pretty jarring reading this Elixir article as a curious outsider that they had to make these kinds of very advanced contortions to do something so simple in such a non-performant way. It’s something that is only impressive if you were already totally sold in on the BEAM way of doing things, but looks insane to any C/C++/Rust/Java/whatever developer.


Processes are a core fundamental concept in Elixir - I wouldn't call them advanced. But they certainly have been shoved to the background for a lot of developers due to things like the Phoenix framework where you might not have to make your own.


You can just as easily swap out the “increment a counter” with “remove a random letter from a string” or “recalculate the hash” or anything else if it helps.


If I understand you correctly your comment boils down to whether or not you agree with the approach that Elixir and Erlang take to mutability.

In other languages (even in Clojure, where I have a lot more experience) there are mutable global variables that you can use for these purposes. Unless I missed them, in Elixir/Erlang there are not.

Elixir/Erlang's approach to mutable state is the process. It doesn't matter what your specific application of shared mutable state happens to be.

We could discuss how any specific use case is more or less performant, but I would conjecture that someone who cares significantly more about low-level performance than about other things (e.g. systemic robustness) probably doesn't start with a BEAM language.



