This is exciting. About 5 years ago I tried to set up a homemade Sonos clone using two Raspberry Pis to stream synchronised audio across my network. I did get it working, but it was a huge hassle finding the right combination of PulseAudio configuration flags, and I had to set up a dedicated wireless network to handle the bandwidth, because it fell over if I used compression. I figured the best approach would be to write a PulseAudio module in C, but I never had the time or skill to do it.
I've built something similar at home with three RPis and Snapcast [0]. It has an integration with Librespot [1] that shows it as a Spotify destination. It works really well!
So far I have mostly tested Roc on several 2.4 GHz Wi-Fi networks. You can usually expect 100-500 ms in this case (depending on the network). See "Typical configuration" in this article [1].
Most likely you will be able to achieve lower latencies on a better channel, but I haven't done any serious testing at lower latencies yet. This is on my to-do list.
You can choose the target latency. Presumably, the larger that value, the less effect dropped packets and network jitter have on the quality of the output.
You can also configure the FEC block size (it should be smaller than the target latency), the length of network packets, the length of internal audio frames in the pipeline, and the resampler window. And also the I/O latency, e.g. the PulseAudio buffer size. So basically you can configure all (or almost all) parameters that can affect the resulting latency.
I'll document these parameters and their configuration a bit later. (Currently you can find all of them in the man pages and in the API, but there is no overview page that explains how exactly they affect the total latency.)
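In the meantime, here is a small self-contained C sketch of how these knobs relate. It is not the Roc API; the struct fields and the rough budget formula are just illustrative stand-ins for the parameters listed above (target latency, FEC block size, packet and frame lengths, I/O latency) and for the constraint that the FEC block has to fit inside the target latency.

    #include <stdio.h>

    /* Illustrative stand-in for the tunable knobs described above.
     * These are NOT the real Roc API names. */
    typedef struct {
        unsigned target_latency_ms; /* session latency on the receiver */
        unsigned fec_block_ms;      /* FEC block; must fit inside target latency */
        unsigned packet_ms;         /* length of one network packet */
        unsigned frame_ms;          /* internal pipeline frame */
        unsigned io_latency_ms;     /* sound-card / PulseAudio buffer */
    } stream_params;

    /* Very rough (assumed) end-to-end budget: the target latency dominates,
     * one pipeline frame and the I/O buffers on both ends add on top. */
    static unsigned latency_budget_ms(const stream_params *p)
    {
        return p->target_latency_ms + p->frame_ms + 2 * p->io_latency_ms;
    }

    int main(void)
    {
        stream_params p = {
            .target_latency_ms = 200, /* plausible value for 2.4 GHz Wi-Fi */
            .fec_block_ms      = 100,
            .packet_ms         = 5,
            .frame_ms          = 10,
            .io_latency_ms     = 20,
        };

        if (p.fec_block_ms >= p.target_latency_ms) {
            fprintf(stderr, "FEC block must be smaller than the target latency\n");
            return 1;
        }

        printf("rough end-to-end latency: ~%u ms\n", latency_budget_ms(&p));
        return 0;
    }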
I once wanted to do something similar, so I wrote my own realtime audio streaming tool [0]; the UX is more akin to a foot-gun than to the polished (and much more feature-complete) Roc, but it does have excellent latency properties, since it runs at the absolute ragged edge of what it can.
You could connect multiple clients to stream music to a single server (or multiple servers if you wanted to); the server kept a list of pending audio buffers for each client, mixing them all together into a ring buffer that got spat out to PortAudio. If a client underran, it simply missed that ring buffer rotation, and it "fell behind" by one buffer length (we request the minimum latency from PortAudio, so this is usually measured in single-digit milliseconds). That would cause a few crackles and pops in the first second or two as the natural jitter of the network caused the client to underrun a few times, but then it would stabilize. (This didn't bother me much, as I was usually playing silence when I first connected it anyway.)
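A minimal sketch of that mixing idea, with made-up names and without the actual network or PortAudio callback plumbing: each client owns one pending frame, and anything that didn't arrive in time is simply left out of the current rotation.

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #define FRAME_SAMPLES 256  /* one ring-buffer rotation, ~5 ms at 48 kHz */

    /* Per-client state: one pending frame plus a flag saying whether
     * the client delivered audio in time for the current rotation. */
    typedef struct {
        int16_t pending[FRAME_SAMPLES];
        int     ready; /* 0 = underrun: skip this rotation, client falls behind */
    } client_t;

    /* Mix whatever arrived in time into the buffer handed to the audio
     * backend (in the real tool, the PortAudio callback buffer). Clients
     * that missed the deadline are simply left out of this rotation. */
    static void mix_rotation(client_t *clients, int n_clients, int32_t *out)
    {
        memset(out, 0, FRAME_SAMPLES * sizeof(*out));
        for (int c = 0; c < n_clients; c++) {
            if (!clients[c].ready)
                continue;                  /* fell behind by one buffer length */
            for (int i = 0; i < FRAME_SAMPLES; i++)
                out[i] += clients[c].pending[i];
            clients[c].ready = 0;          /* consumed; wait for the next packet */
        }
    }

    int main(void)
    {
        client_t clients[2] = {0};
        int32_t  out[FRAME_SAMPLES];

        /* Pretend client 0 delivered a frame and client 1 underran. */
        for (int i = 0; i < FRAME_SAMPLES; i++)
            clients[0].pending[i] = (int16_t)(i % 128);
        clients[0].ready = 1;

        mix_rotation(clients, 2, out);
        printf("first mixed samples: %d %d %d\n",
               (int)out[0], (int)out[1], (int)out[2]);
        return 0;
    }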
In my experiments, the overall system latency was pretty close to the perceptual limit (I would estimate around 10 ms), streaming over Wi-Fi from my laptop to a Raspberry Pi.
> Research. Learn to measure the full network latency, test Roc on different network types and conditions, determine the minimum possible latency that we can handle on different channels.
It sounds like latency is unknown, but my guess is that it's comparable to RTP, plus a few extra milliseconds for error correction and some constant buffering latency.
I'm including Dante/Ravenna/Livewire in that question.
My guess is that this is more of an application toolkit for home/consumer applications than for professional use, but a lot of the problems they have listed on their roadmap are already solved by a family of open* standards. The name escapes me, but there's an industry initiative to provide FOSS APIs on top of web tech to build these kinds of applications. An engineering manager from Fox/Disney gave a talk about it last month (maybe it was March?) at the LA SMPTE meeting. Lots of money is going into this domain.
* with an AES membership, but you should have that for this kind of project anyway
AES67 is just an interoperability standard; it's more of a device-level protocol than anything. You buy AES67-compatible gear for your application, then use the vendors' tools like Dante Virtual Soundcard (so you can essentially treat the networked audio system as a normal soundcard on your machine through CoreAudio/WASAPI/JACK/etc.).
It's actually pretty great; most of the time there's no need for a separate API just to handle streaming. It "just works."
Kinda? You need hardware at some point. AES67 was all about creating an open protocol for connecting different proprietary stuff, and frankly there's only a handful of places where I've seen open hardware worth its salt in audio. If you need high capacity, low latency audio over networked machines, you're going to need proprietary hardware/software in the chain somewhere.
I'll definitely be giving this a go!