Java Virtual Threads Preview (java.net)
266 points by dzonga on Nov 16, 2021 | 264 comments



There is a spectrum of solutions for dealing with slow I/O in programming languages (well, all I/O is best assumed to be slow I/O). At the two ends of the spectrum are:

- continuation-passing style (CPS), hand-coded or compiler-generated

- preemptive threading / processes

In between lie various solutions, like async/await (closer to CPS), and green threads (closer to preemptive threading).

The key difference between the two ends of this spectrum is memory footprint. With CPS you manually compress state into tuned structures. With threads you store state all over the stack in very inefficient ways because all those stack frames take extra space.

Any solution towards the thread side of the spectrum will yield significantly larger memory footprints than solutions towards the CPS side of the spectrum.

On the other hand, solutions towards the CPS side of the spectrum can require writing application-specific schedulers if fairness issues arise.

On the whole, solutions on the CPS side of the spectrum are better, IMO.

Java is kinda stuck with threads, so green threads make some sense. Of course, you can get pathological issues in M:N threading, so be careful about that.
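To make the two ends of the spectrum concrete, here's a minimal Java sketch (names hypothetical, not from the article) contrasting the two styles: a blocking read, where the in-flight state lives in stack frames, versus a callback-style read via NIO's CompletionHandler, where the state lives in a heap-allocated closure:

    import java.io.InputStream;
    import java.net.Socket;
    import java.nio.ByteBuffer;
    import java.nio.channels.AsynchronousSocketChannel;
    import java.nio.channels.CompletionHandler;

    class SpectrumDemo {
        // Thread end of the spectrum: in-flight state (buf, the return value)
        // lives in stack frames, and the thread is parked while the OS reads.
        static int blockingRead(Socket socket) throws Exception {
            byte[] buf = new byte[4096];
            InputStream in = socket.getInputStream();
            return in.read(buf); // whole stack retained while we wait
        }

        // CPS end of the spectrum: state is compressed into the buffer plus a
        // heap-allocated closure (the handler); no thread waits anywhere.
        static void cpsRead(AsynchronousSocketChannel ch) {
            ByteBuffer buf = ByteBuffer.allocate(4096);
            ch.read(buf, null, new CompletionHandler<Integer, Void>() {
                @Override public void completed(Integer n, Void att) {
                    // continue here; the "stack" is just this object
                }
                @Override public void failed(Throwable exc, Void att) {
                    // error continuation
                }
            });
        }
    }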


This is partially right. But the Java implementation under the hood is still a CPS transformation, so it only allocates the memory that is required for the "virtual stack". Compared to that, Go - which has similar semantics - allocates a more classical stack for each goroutine, which can, however, grow over time.

The CPS approach is certainly more space efficient, but I'm not sure how much of a difference it really makes in the end. Go seems to be doing well too, after some attempts with linked stack segments and then moving to copying stacks while growing them.

What is interesting is that some of the CPS-like implementations have other performance drawbacks. E.g. since it makes the virtual stack more distributed over memory, the cache efficiency of such an approach might be lower. In Rust one limitation of the CPS approach is that the coroutine state is first stack-allocated before being moved onto the heap, and this operation has shown itself to be costly for some applications. So right now I'm not sure if there is any implementation which is superior in all possible benchmarks. But the Java one definitely seems to make a lot of sense for what they want to offer!


Normally a POSIX thread gets a very large stack (1MB is typically the default), and obviously these "virtual stacks" will only be as large as they grow (splayed on the heap?). But the memory for those POSIX thread stacks isn't necessarily allocated up front! The OS grows the stack when the thread traps trying to access the guard page, so it's not really a 1MB stack.

Now, if you need to serve 1e6 clients with threads, and those are 1MB stack threads, then you'll be using 1TB of your VM space, which... is almost certainly going to have some performance issues (MMU table size issues at the very least). If you splay your stacks on the heap as linked lists of stack chunks then you might get away with having a very large (and fragmented) heap with large page table entries, which might be a win.

I think approaches on the CPS side of the spectrum will be generally better than this. No, I don't study this and I don't have numbers. Yes, CPS in general means allocating closures on the heap, so some state does live splayed all over rather than compressed, but it doesn't have to be that way: often you'll have only a handful of such closures, and the language could understand that they are one-time-use closures (hello Rust) so that no GC is needed.

I've written a small (proprietary) HTTP server that is hand-coded CPS -- specifically it supports hanging-GETs with Range: bytes=0- of regular files as a form of tail -f over HTTP, which is great for log files. That implementation has a single object per GET that has all the state needed, and the only other place state lives is in epoll event registrations (which essentially are the closures, and they are very small, and only one per-connection). Granted, this is a very simple application, and it would be a lot more complicated if, for example, it had to do async I/O directly on a block device to implement a filesystem in the same process -- that would require more care to keep the state compressed.
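To illustrate the shape of that "one object per GET" design, here's a rough Java sketch (hypothetical names, not the actual proprietary code, and written in Java just for consistency with this thread):

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.SelectionKey;
    import java.nio.channels.SocketChannel;

    // All per-request state lives in this one object; the event loop calls
    // step() on readiness, so the object is effectively the continuation.
    final class TailGetState {
        final SocketChannel client;
        final ByteBuffer chunk = ByteBuffer.allocate(8192);
        long fileOffset;      // how far into the log file we've streamed
        boolean headersSent;

        TailGetState(SocketChannel client) { this.client = client; }

        // Called on writability; returns the interest ops to re-register,
        // i.e. "what to wake me up for next".
        int step(SelectionKey key) throws IOException {
            if (!headersSent) {
                client.write(ByteBuffer.wrap(
                    "HTTP/1.1 200 OK\r\n\r\n".getBytes()));
                headersSent = true;
            }
            // ... read from the file at fileOffset into chunk, write it to
            // client, advance fileOffset; keep OP_WRITE armed until caught
            // up, then keep the connection open and poll the file (tail -f)
            return SelectionKey.OP_WRITE;
        }
    }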

So in general I'm for CPS. But it's generally true that CPS solutions cost more dev time, and that can be prohibitive. The memory footprint cost difference will be a linear factor, which does not trivially justify the additional dev cost. Then again, if you'll be running lots and lots of instances with lots and lots of clients, the run-time savings can then easily be gargantuan compared to the dev costs -- but no one measures this, and by the time you wish you'd used CPS it will be too late and reimplementation costs prohibitive. Then again, async/await might fit the bill well enough most of the time.


Could you explain the difference between CPS and using async/await? I have a C# background but always assumed they were the same thing!


CPS == callback hell

async/await == the syntax and compiler help you manage the callback hell


so different syntax but they compile down to the same thing?


Yes, roughly. That, or compile to co-routines (which are essentially green threads).


> Any solution towards the thread side of the spectrum will yield significantly larger memory footprints than solutions towards the CPS side of the spectrum.

The community at large decided that hand-tuning memory management was too finicky and not worth it, and embraced garbage collection even though it obviously 'costs' memory.

I'm frankly at a loss as to why so, so, so many blogposts and tech experts are all-in on the CPS-side of this argument; it seems quite obvious to me that in the vast majority of cases, the considerably simpler* model of (green) threads means you're making the exact same trade-off: Simpler to write and debug code at the cost of needing more memory when running the app you write.

*) For sequential/imperative-style languages, that is. If you're writing in a language that is definitely and clearly intended to be written in a functional style, I can see how the gap between CPS-style and threading-style is far narrower. However, Java, Python, JavaScript - these are languages where the significant majority of lines of code written are sequential and imperative in nature.

Also note that in e.g. Java you can actually configure stack sizes as you make threads. Thus, your choice of the words "significantly larger memory footprint" is debatable.


Very thought provoking. A few of my thoughts:

A GC does not obviously cost memory. It might, or it might not. Both a GC and a traditional memory allocator have hidden costs. A GC with movable objects can sometimes do better, because it can manage fragmentation.

I prefer a message-passing style; on which side of the spectrum would this fall?

My experience with green threads is libraries that make I/O operations look like a regular function call. This is similar to RPC, where a remote call and a local one can look the same, even though the remote call is much slower. This can result in surprising performance characteristics. Even worse, a remote call can time out or take indefinitely long to complete; the same is not true of local calls. Message passing is more onerous but makes surprises more obvious.

I've often fancied writing for an architecture where main memory is treated as fast remote storage, accessible with message passing. I know such architectures exist but I've never had the opportunity to write for one. I wonder if the change in style would have a positive or negative effect on performance.


In CPS the continuations (closures) have to be allocated on the heap, but generally they are one-time use only, which means no GC is needed for them. Hello Rust.


> it seems quite obvious to me that in the vast majority of cases, the considerably simpler* model of (green) threads means you're making the exact same trade-off: Simpler to write and debug code at the cost of needing more memory when running the app you write.

I don't find it's quite that simple.

My experience is that the complexity of CPS tends to scale linearly with use, whereas threads scale exponentially. For small uses threads are easier, but CPS quickly catches up.

CPS forces you to actually declare a dependency tree for your data. Things depend on other things, and that exists in your code. It's very easy for threads to end up a mess, where it's not clear how data is passing through the code, which causes bugs like deadlocks and race conditions.

It's deceptively easy to write code where thread A tries to lock mutexes X and Y, and thread B tries to lock mutexes Y and X, and it deadlocks because neither thread can get both locks.
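Here's a minimal Java reproduction of that deadlock (a toy demo; run it and it simply hangs):

    import java.util.concurrent.locks.ReentrantLock;

    // Classic lock-ordering deadlock: each thread holds one lock and waits
    // forever for the other.
    class DeadlockDemo {
        static final ReentrantLock x = new ReentrantLock();
        static final ReentrantLock y = new ReentrantLock();

        public static void main(String[] args) {
            new Thread(() -> { x.lock(); pause(); y.lock(); }).start(); // A: X then Y
            new Thread(() -> { y.lock(); pause(); x.lock(); }).start(); // B: Y then X
        }

        static void pause() { // widen the race window so the hang is reliable
            try { Thread.sleep(100); } catch (InterruptedException e) { }
        }
    }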

It would be much harder and more arcane to do that in Javascript or in Python's async. I'm not saying it's impossible, but I don't think I've ever accidentally created a race condition or deadlock in their CPS engines.

TL;DR if your functions are only marked async so you can await something, threading probably is simpler. If you're actually passing promises around, things become much more favorable to CPS.


Interesting take!


> Java is kinda stuck with threads, so green threads make some sense.

This isn't really what's happening here. Firstly, you can implement CPS on the JVM no problem. Kotlin Coroutines do exactly that. Loom's design is a very, very explicit design choice. Ron Pressler - the lead and designer of Loom - has talked about this extensively in many videos. He has argued persuasively that the way Loom works is not only a good way but the best possible way, one that is not forced by Java's previous design choices but rather is only actually possible because of them.

A recent talk on this topic is here:

https://www.youtube.com/watch?v=KmMU5Y_r0Uk

It's highly recommended. I'll try to summarize the basic argument.

The ideal, from a developer's perspective, is to have the programming model of threads with the efficiency of hand-coded CPS or state machines. Why: because threads naturally provide useful debugging and profiling information via their stacks, they provide backpressure, because there are tons of libraries that work with them and already use them, and most critically because it avoids the "colored function" problem which splits your ecosystem.

Why do most languages not provide that ideal? Mostly for implementation reasons. It's not due to theoretical disagreements or anything. Providing what Loom does is very difficult and is possible largely only because so much of Java and the Java ecosystem is written in Java itself. One reason native threads are relatively heavy is because the kernel can't assume anything about how the code in a process was compiled or what it will do. The JVM on the other hand is compiling code on the fly, and knows much more about the stack. In particular it knows about the (absence of) interior pointers, it knows it has a garbage collector, it controls the synchronization and mutex mechanisms, it controls debugging and profiling engines.

This allows it to very efficiently move data back and forth between a native thread stack and compressed encodings on a GCd heap. It's also why Loom has some weaknesses around calling into native code. Once you're outside the JVM-controlled world it can no longer make these assumptions and must revert to a much more conservative approach (this is "pinning" the "carrier thread"). Note, though, that this situation is not worse than async/await colored functions, which have exactly the same issue.
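For flavor, here's roughly what the programming model looks like in the preview builds (a sketch; the API names were still settling at the time and may change before this ships):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    class VirtualThreadDemo {
        public static void main(String[] args) {
            // One virtual thread per task; plain blocking code inside a task
            // unmounts from its carrier instead of tying up an OS thread.
            try (ExecutorService exec = Executors.newVirtualThreadPerTaskExecutor()) {
                for (int i = 0; i < 1_000_000; i++) {
                    exec.submit(() -> {
                        Thread.sleep(1_000); // parks the virtual thread only
                        return null;
                    });
                }
            } // close() waits for all submitted tasks to finish
        }
    }

A million sleeping tasks like this would be unthinkable with 1:1 OS threads, but is exactly the workload virtual threads are built for.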


Maybe? It depends how you use them... any asynchronous operation will require memory, and that memory has to go somewhere. In CPS it goes on the heap... in the threaded style you have a stack you can put it on. This probably will waste some memory, though not as much as you might think. On the other hand, you can also reclaim the stack memory as soon as the operation finishes. The CPS technique creates garbage that accumulates until you do a GC cycle, but the whole principle of fast concurrent GC is to let garbage accumulate! I'm really not sure which will end up consuming more memory, particularly if you design your threads to primarily use stack allocation, which is admittedly hard in a language like Java.


Green threads by definition cannot use the OS stack and must allocate their stack memory on the heap. Although this memory can be reused, Go's experience shows that to avoid performance bottlenecks, at least for Go code, it is better to allocate the stack as a single contiguous block and copy it to a bigger block when the thread's stack outgrows the current one. But then the whole stack space is pinned to the thread and cannot be reused.

For Java it may still be possible not to allocate the whole stack as a single chunk and instead have smaller chunks, like one per few frames. But I really doubt that it can reduce memory pressure compared with CPS in real applications, especially given how good GC has become in Java.


So in Java we know a few things about the stack that are not true for other languages. We know nothing on the stack is a pointer into a Java stack frame, and nothing on the heap points into a Java stack frame. These facts allow us to mount virtual threads onto carrier threads by copying portions of the stack to and from the heap. This is normally less memory than you’d expect because although you might have a pretty deep stack most of the work will happen in just a few stack frames, so the rest can be left on the heap until the stack unwinds to that point.

The big advantage of this over CPS is that you can take existing blocking code and run it on a virtual thread and get all the advantages; there is no function colouring limiting what you can call (give or take a couple of restrictions related to calling native code).


I like CPS precisely because it requires you to color-annotate the code, so it is known what can and what cannot do IO! Surely it decreases flexibility, but it makes reasoning about and maintaining the code easier.


Thread stacks are not OS-level objects; at least on Linux you just malloc or anon-mmap some memory and pass that to clone() or your own green-thread implementation.


Threads use stacks. Being 1:1 to OS threads or M:N doesn't change that.


The question is whether the unused portion of the stack can be used for anything else. With native threads the answer is no, and so it is with Go green threads. Time will tell if Java can pull off the trick of sharing unused stack space, but I am sceptical.


With POSIX threads the stack size defaults to something like 1MB or 2MB depending on the platform, but it's not allocated up front -- the stack grows as needed up to that maximum.

The main difference, then, between allocating stack chunks on the heap as needed and stacks grown by the virtual memory subsystem comes down to virtual memory management. If you can use huge pages for your heap, then allocating stack chunks on the heap will be cheaper than traditional stacks.


In CPS the state of the program is captured in the continuation, which is a closure, which is allocated on the heap, and in any ancillary data structures pointed to by it.

However, there's generally only a very small number of such closures -- typically only one -- and they are generally one-time use only. That means they can be freed as soon as they tail-call out. Hello Rust.


And there are also languages without GC where the CPS technique can lead to immediate reclaiming of the state on drop.


I've written hand-coded CPS in C, but I take your point. The issue is that while every function exits via tail-calls, so the stack footprint is small, every continuation is a closure that has to be allocated somewhere, and that somewhere is generally going to have to be the heap.


My biggest gripe with continuations, futures and callbacks is the propensity for stalls; if the programmer makes a mistake it's quite likely that the program will just stall, with very little insight available as to why that happened.

With threads, green or not, you have a much clearer failure model, and it is easier to debug.


But doesn't that depend entirely on the synchronization mechanisms and how safe they are? For example, contrary to what you suggest, I'd say that Go's stateless green threads are hard to debug (though there are good tools) and easily lead to deadlocks. That's because the available synchronization mechanisms (finitely buffered channels, non-re-entrant mutexes, and a few atomic operations) are hard to use correctly. Higher-level constructs like actors or transactional memory are in my opinion less error prone.

The point is that e.g. futures are just some threads with a global synchronization mechanism for obtaining the result. Whatever makes the future stall will also make a low-level thread + your own synchronization stall. Or do you mean some more advanced failure-tolerant threading like in Erlang as compared to less advanced threading primitives like futures?


I definitely prefer CPS, especially in functional languages (where the noise I complain about below often fades away entirely).

On the other hand, CPS usually is a bit noisier from the developer's perspective; either your continuations are callbacks (whence callback hell) or your continuations are, as you say, manually compressed, tuned structures, which requires a fair amount of manual labor.

I believe Rust uses a CPS transform (well, more of a continuation-returning style, no?), but it automatically generates the tuned structures ("futures"). The cognitive overhead isn't all gone, but it definitely helps.


What about async/await? You can basically write linear, imperative code, and don't have to deal with continuations or these "tuned structures" manually.

For me it is the cleanest style of writing concurrent code. And more and more I find I can also replace state machines with it, which makes sense because the compiler usually generates state machines under the hood.

You know, the kind of code where you have to communicate with some outside device and it is easy to do blockingly but devolves to state machine madness if you need to do other things concurrently. For example it would be really nice if I could use async/await in C on a microcontroller to read from a serial port...


The problems with async/await are:

1. The "colored function" problem: http://journal.stuffwithstuff.com/2015/02/01/what-color-is-y...

2. Poor interaction with debuggers, profilers, and other tools that expect to be working with normal stacks.

Loom solves this because it lets you work with normal threads, but suddenly you can have millions of them in a process without blowing out your memory or other related problems.


> You know, the kind of code where you have to communicate with some outside device and it is easy to do blockingly but devolves to state machine madness if you need to do other things concurrently.

Speaking personally, I've found Lua's coroutines to have the nicest experience for modeling flows like that. The big issue with async/await is the function color problem [0] -- writing async functions is perfectly fine, but mixing them with non-async functions can be extremely frustrating. Especially if you're doing anything with higher-order functions.

[0] https://journal.stuffwithstuff.com/2015/02/01/what-color-is-...


I used to struggle with "function color" until I realized that the functions just have a different type. Async functions return a future `Task<Thing>`, while normal functions return a plain `Thing`. Of course they are incompatible.

A different way of looking at it is that in async functions you should only do things that have negligible runtime (compared to the response time of your GUI or network service). If your task needs more time, you mark the call site and the called function "async" and the task will suspend somewhere "down in the call stack". (Without looking into it too much, I think something similar actually happens with these virtual threads. They modified IO functions to do cooperative multitasking under the hood?)

As to async functions being contagious, I found it helps to split "imperative" procedures and "pure" functions; the async color mostly applies to the former.
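To make the type-based view concrete in Java terms (a hedged sketch: C#'s Task<T> roughly corresponds to CompletableFuture<T> here, and the function names are made up):

    import java.util.concurrent.CompletableFuture;

    class Colors {
        // "Sync color": returns a plain value; the caller waits for it.
        static String fetchSync() { return "thing"; }

        // "Async color": returns a future immediately; the caller must
        // compose with thenApply/thenCompose (or block with join,
        // defeating the point).
        static CompletableFuture<String> fetchAsync() {
            return CompletableFuture.supplyAsync(() -> "thing");
        }

        public static void main(String[] args) {
            System.out.println(fetchSync().length()); // a plain Thing
            fetchAsync()                              // a Task<Thing> pipeline
                .thenApply(String::length)
                .thenAccept(System.out::println)
                .join();
        }
    }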


The problem with function coloring is that it divides the language for no good reason. Should there really be two names for the same sleep function, just because one is blocking and the other is not? As for a return type, that’s just a leaky abstraction imo (especially for voids, like is a blocking call returning nothing different than an async call returning a Future void?)

As for Loom: because it all runs in a runtime, a blocking ‘read’ call for example is not actually a blocking system call (everything uses non-blocking APIs at that level), so the runtime is free to suspend execution at such a blocking site and continue useful work elsewhere until it finishes. So for some “async” functionality you can just fire up a new virtual task with easy-to-understand blocking calls and that’s it: it will do the right thing automagically, it will throw exceptions where they actually make sense, you will be able to debug it line by line, no callback hell, etc.

Loom will also provide something called structured concurrency, where you can fire up semantically related threads and easily wait for them all to finish in one place.
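The structured concurrency API was still just a draft JEP at this point, so here's only the idea, sketched with existing primitives (hypothetical names; a virtual-thread-per-task executor stands in for whatever scope construct eventually ships):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    class StructuredSketch {
        record Order(String user, String cart) {}

        static Order loadOrder() throws Exception {
            // Related subtasks start and finish inside one lexical scope;
            // leaving the try block guarantees both are done.
            try (ExecutorService scope = Executors.newVirtualThreadPerTaskExecutor()) {
                Future<String> user = scope.submit(() -> "alice");   // fetch user
                Future<String> cart = scope.submit(() -> "3 items"); // fetch cart
                return new Order(user.get(), cart.get());            // wait in one place
            }
        }

        public static void main(String[] args) throws Exception {
            System.out.println(loadOrder());
        }
    }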

As for pureness, I don’t think it maps that cleanly to async/blocking. What about applying the same function to each pixel of a picture in memory, where you subdivide it into n smaller chunks and run them in parallel?


Personally, in JavaScript I like that you can mix and match imperative and asynchronous code using Promise instances. It lets you handle asynchronous control flow in a purely synchronous function.

However in other languages, having functions be of a different 'color' is far more painful. In Python for example, a synchronous function has to set up an event loop manually before it can run an asynchronous function. The call works, but nothing is 'running' without the event loop. Additionally, the asynchronous function may have been written to work with a particular event loop (e.g. trio vs curio), and thus you have to use that type.

If non-blocking code has a standardized control state like Javascript, I think it's better to be explicit about async vs sync.


I also think of "color" fundamentally as different types; it's just painful to have two kinds of functions that you can't combine. I, personally, really feel that async/await is just adding a second kind of continuation (promises/futures) to a language that already has a perfectly good one (call stacks), and the language ergonomics suffers for it.

The reason I say functional languages don't get bit by this as bad is because functional languages rely far less on the specific notion of a call stack, and it's usually much easier to work with continuations (either via primitives like shift/reset or via syntax like do-notation).


I struggle with this idea of "separate colors"... I see Promise-returning functions as a superset of immediately returning functions... that is, any function that can return immediately could also return as a Promise (which resolves immediately), so really, the immediately returning function is just an optimization to apply when it's helpful. I'm curious why a language suffers from this explicit separation of code which "returns immediately" versus code which "returns eventually"?


This is why I refer to solutions on that side of the spectrum. CPS is not the only one. There's also async/await, futures, etc.


> With CPS you manually compress state into tuned structures. With threads you store state all over the stack in very inefficient ways because all those stack frames take extra space.

On the other hand, these stack frames can be thought of as a large arena allocator for what would otherwise be lots of smaller objects allocated on the heap.


>On the whole, solutions on the CPS side of the spectrum are better, IMO.

Just because you think they can be more efficient?


Not only that, but also because they make the programmer think about state representation.


Really excited to see this! Seeing some of the early comments here I think folks may not realize how awesome this would be in the server space.

After all, a big reason that NodeJS won a lot of popularity on the server is that, for many types of common webserver workloads (i.e. lots of IO, relatively minor CPU usage), NodeJS can actually scale much better than Java with its thread-per-request model.

With these virtual threads, though, you could get the best of all possible worlds - a webserver that scales like NodeJS, but without some of the "CPU starvation" issues you can hit in Node if one executing request doesn't yield, and also without having to worry about "function coloring" like you do in Node with async vs. non-async functions.

Really, really fantastic development, have been waiting to see when this would come out.


> For many types of common webserver workloads (i.e. lots of IO, relatively minor CPU usage), NodeJS can actually scale much better than Java with its thread-per-request model

Linux can handle a ginormous number of threads quite well; it would be interesting to see a deeper investigation of this theory.


The problem with doing it all native is that stack sizes are quite variable, especially in managed languages where modularity and code reuse works better, so it's common to have tons of libraries in a single project. The kernel won't object to lots of threads, but once those threads have been running for a while a lot of stack space will be paged in and used.

Loom solves this by moving stacks to and from the heap, where there's a compacting concurrent GC to clean up the unused space.


Java reserves some memory for each thread's stack, about 1MB per thread by default.


This is configurable, and 1MB is very generous. I think the JVM automatically grows the stack size as needed nowadays and starts low.


This doesn't in practice limit scaling though as it's linear and small in absolute terms vs what you can put in a server.


Am I understanding correctly that this is basically goroutines for Java?


Yes. I am not as familiar with the underlying implementation of goroutines, but this description in the linked JEP sounds exactly how I understand goroutines to work:

> The JDK implements virtual threads by storing their state, including the stack, on the Java heap. Virtual threads are scheduled by a scheduler in the Java class libraries, whose worker threads mount virtual threads on their backs when the virtual threads are executing, thus becoming their carriers. When a virtual thread parks -- say, when it blocks on some I/O operation or a java.util.concurrent synchronization construct -- it suspends, and the virtual thread's carrier is free to run any other task. When a virtual thread is unparked -- say, by an I/O operation completing -- it is submitted to the scheduler, which, when available, will mount and resume the virtual thread on some carrier thread, not necessarily the same one it ran on previously. In this way, when a virtual thread performs a blocking operation, instead of parking an OS thread, it is suspended by the JVM and another one scheduled in its place, all without blocking any OS threads (see the Limitations section).


Yes. But, since there are _two_ kinds of threads in Java (OS and virtual), you still have to be very careful never to block a virtual thread. In Go/JavaScript/BEAM, it doesn't matter because you literally can't block a thread (while idle). This is the kind of thing that's not terribly useful until nearly every library you interact with is using it as well.

Also, there's no new syntax, so you're stuck with all the same thread pool concurrency we've been using for decades.

EDIT: It looks like I'm wrong about this:

> My understanding is that you won't have to worry about blocking a virtual thread, because all IO APIs are being modified to park when executed in the context of a virtual thread.


My understanding is that you won't have to worry about blocking a virtual thread, because all IO APIs are being modified to park when executed in the context of a virtual thread.

That said, you'd still need to worry about unsafe code, like JNA/JNI or other such things that could still block. And I'm not sure there will be a way to prevent long-running CPU tasks from clogging up the virtual thread executor threads.


> My understanding is that you won't have to worry about blocking a virtual thread, because all IO APIs are being modified to park when executed in the context of a virtual thread.

And, from what I read in the original JEP, the underlying system thread pool (which all virtual threads float between as needed) will be expanded when a virtual thread gets pinned, so you don't have to worry about exhausting your pool. (If you pin too many threads, obviously you'll be consuming more OS resources than you may have expected, but that's a different problem.)


What do you mean by pin here? Do you mean that a blocking IO call will block the thread, but will also add one more thread to the virtual thread executor pool? So blocking won't starve your virtual threads?


That's right. From the linked JEP, under "Scheduler":

> Some blocking APIs temporarily pin the carrier thread, e.g. most file I/O operations. The implementations of these APIs will compensate for the pinning by temporarily expanding parallelism by means of the ForkJoinPool "managed blocker" mechanism. Consequentially, the number of carrier threads may temporarily exceed the number of available processors.


Just to clarify, though, most currently blocking IO operations will not pin the carrier thread, because most IO operations you make from a webserver are network calls (e.g. to another API or the database), and those network APIs have been modified to not pin. From just a bit further up in the JEP:

> The implementation of the networking APIs defined in the java.net and java.nio.channels API packages have been updated to work with virtual threads. An operation that blocks, e.g. establishing a network connection or reading from a socket, will release the underlying carrier thread to do other work.


And you are incorrect on the other point as well: https://openjdk.java.net/jeps/8277129

:D


In Go one can block a native thread by using APIs that make blocking OS calls, like Linux file IO. In this case the Go runtime allocates more native threads to run other goroutines.


That's the case for virtual threads also. It uses ForkJoinPool.ManagedBlocker to add additional threads.

"File I/O is problematic. Internally, the JDK uses buffered I/O for files, which always reports available bytes even when a read will block. On Linux, we plan to use io_uring for asynchronous file I/O, and in the meantime we’re using the ForkJoinPool.ManagedBlocker mechanism to smooth over blocking file I/O operations by adding more OS threads to the worker pool when a worker is blocked."


If it's added to the VM then won't that allow languages like Kotlin to add support?


Kotlin coroutines could take advantage of virtual threads, but they will still have the syntactic problem of colored functions.


Kotlin can call regular Java APIs, though. Doesn't have to take the coroutine route.


Yes, but then your code isn't idiomatic/multi-platform/whatever. It's a trade-off (and one where I would always choose Java).


I learned a hard lesson in the Borland ecosystem.

Always go with the platform's languages, and the IDEs from the platform owners, even if others are more shiny.

Long term it always pays off to be the turtle, as the platforms move in directions not foreseen by the shiny objects, and 3rd-party IDEs keep playing catch-up with SDK features.


What if the company that makes Kotlin is the one that makes the Java IDE?


They make one Java IDE, zero contributions to the JVM, and are all cozy with "screw you Java devs" Google godfather.

IBM does Java and the IDE (Eclipse).

Red-Hat and Microsoft do Java and the IDE (VSCode).


They contribute to the JDK, mostly via the Swing project. For instance they're a major contributor to Project Lanai.


I missed that. Most likely because they are a long way from rebooting IntelliJ on Compose for Desktop.


Eclipse and NetBeans do exist, and... ehhhhh. I used NetBeans for a long time; couldn't stand Eclipse; and these days I only use IntelliJ. But the others absolutely exist, and it'd be hard to say that Apache and the Eclipse Foundation aren't deeply embedded in the Java ecosystem.


Eclipse is fine. Especially from VsCode where it uses the Eclipse language server. It boots fast, and when you run it with a modern JVM and GC the memory usage is leagues lower than IntelliJ.


> Especially from VsCode where it uses the Eclipse language server.

Sure, but my particular complaint isn't with the functionality; it's with the UI. Yes, VS Code absolutely improves the experience.


Eclipse tried their own language: https://www.eclipse.org/xtend/


If you need multi-platform then coroutines are still your best bet. But many people don't use Kotlin in a multi-platform way, and lightweight threads will be an easier migration path (and more compatible with Java libraries if you can't avoid one) compared to coroutines.


In abstract, yes.

In the real Kotlin world of taking a random Kotlin library and calling it from Java, most likely "it depends".


I didn't mention calling Java from Kotlin.

Kotlin can call a Java API to spawn a lightweight thread. There's no reason to use coroutines when you can do that.


Only if the Kotlin code is to be tied to the JVM; if you want that Kotlin library to be usable on Android, that isn't an option.


Yes, which is perhaps the best part about virtual threads. Java, Kotlin, Scala, Clojure, Gradle... everyone benefits.


Guest languages always have to deal with taking decisions that don't go along with the platform's, regardless of how much they boast about being "better".


Yeah, they seem very similar on the surface level.

Though Loom doesn't have support for preempting green threads that are blocking the scheduler, like Go does, I think.


> NodeJS [...] with its thread-per-request model

Node.js doesn't create a thread per request; it's single-threaded with evented I/O. You can use node-cluster to start more than a single thread to saturate multi-core CPUs and load-balance HTTP requests across these, but that doesn't make it thread-per-request.


I think my high school English teacher would agree with you that the sentence is written awkwardly (I can see the 'awk' note, in red, on my paper right now :) ). Here's how I parsed it:

> a big reason that NodeJS won a lot of popularity on the server is that, for many types of common webserver workloads [...], NodeJS can actually scale much better than Java with ~~it's~~ [Java's] thread-per-request model.


> ~~it's~~

Why are you calling that out? The original "its" was correct without the apostrophe.


Twisol was right - I was trying to imply strikethrough using the Markdown syntax in an attempt to depict the idea of replacing "its" with "Java's". It didn't work as well as I hoped. In my mind I can see more 'Awk' scribbles on my post, and looking at it I agree :)

Adding in the 's is 100% my mistake. I've been guilty of using "it's" as the possessive form for most of my life, but that changes today! :)


> but that changes today! :)

Exciting! :)


Tildes are used for strikethrough in some markup dialects (including markdown), so I think they meant to depict replacing "its" with "Java's".

No clue on the apostrophe.


(Forgive my grammar nazism.) The possessive form of "it" is "its": "The dog wagged its tail". But for basically everything other than pronouns and plurals, the possessive form involves adding "apostrophe s". In recent years, many people have tried to apply this rule to "it". But the problem is that "it's" is understood to be a contraction of "it is" or "it has"; furthermore, "its" already exists as the standard possessive form.

One thing I say to people using "it's" is that by analogy, you also need to say: "He got he's skills. She missed she's ride. They have they's meeting."


Thank you a ton for posting this! I've been doing this for most/all of my life and it didn't really make sense till now. I've had people explain it before but it didn't really make sense. Here's what I got from what you wrote (please correct me if this is wrong / kinda off in some way)

For most words, the possessive form is "<word>'s"

For pronouns (including it) there are different rules. He becomes his, she goes to hers, it goes to its.

Also, words that already end in s don't get the " 's " treatment.

(Question - for words that end in "s", we put the apostrophe after the existing, ending 's', yes?)

Thanks again for posting this - viewing the possessive form of it as (yet another English language) exception to the normal rule of " 's " is really helpful.


> One thing I say to people using "it's" is that by analogy, you also need to say: "He got he's skills. She missed she's ride. They have they's meeting."

This is a great distillation of the intuition I've always had, but never quite verbalized.


Ah that makes sense; didn't get this reading at all!


Sorry, yes, my sentence was poorly written with the ambiguous antecedent. Most Java webservers use a thread-per-request model, which is why Node can usually scale to more concurrent requests.


Suspect they're talking about Java - a lot of frameworks do exactly create a thread per request.


I can't think of any framework that still does one thread per request. Normally there is a queue of incoming requests and they then get dispatched on a thread pool as threads return to the pool.

The challenge is normally that if any of the threads in the pool, as part of processing a request, needs to itself make an IO call, it will block. Ideally you'd want to park the request processing, return the thread to the pool, and pick up the next request until the IO is done, at which point, on the next thread available from the pool, you'd resume that request instead of picking another one. This is what the virtual threads will make really easy, I think.


>you could get the best of all possible worlds

Maybe not _all_ possible worlds. You still have original Threads for things that need an actual OS thread. It's not a solution for UI threading.

There will be code that needs a native thread or non-preemptive threading and shouldn't be run on a virtual thread. In that sense there is method coloring, but it's yet to be seen how common a problem that will be.

Library writers and frameworks will need to sort out patterns for how to call Runnables in a safe way.

Still, it's a nice tool to have.


Yes, but you always need original/kernel threads, regardless of what approach to async you take. The concept of a thread and a stack is hard-wired into the CPU.

W.R.T. code that needs a native thread: at the moment there's only two types of such code. One is code that uses Java's synchronized statement. That's supposedly just a, ehm, small matter of programming to fix. The other is calling into non-JVM controlled code. That's fundamental and no approach to scalable concurrency can fix it, not CPS/async/await or anything else because it's a foreign compiler.

But fortunately the JVM has some really interesting tricks up its sleeve there. For instance you can compile your native code using LLVM and then execute the bitcode on the JVM. Well, OK, currently GraalVM doesn't support Loom, but hopefully Graal will be upgraded to do so as Loom gets integrated into HotSpot. And when it does, you will be able to call into code written in C/C++/Objective-C/Rust, as long as that code can be recompiled with your own toolchain and as long as you can tolerate it being JIT-compiled, whilst also benefiting from Loom's scalability.


Why do you say CPS can't fix it? C# works around this by having a synchronization context and ways to bounce around contexts. In this way C# async/await is able to ensure code is run on a specific native thread. Is that not a fix?


If the native code blocks, the C# coroutine won't reschedule.


The idea is you need to understand your workload and run tasks on schedulers meant for that workload. You'd make sure to move that work to a context for long running tasks.

Not unlike how you might use a non-virtual thread pool in Java.... but it seems wrong to imply that you no longer need to think about this stuff.


That’s not function coloring, it is up to the caller whether to start it in a virt thread or a real one. Function coloring is having two methods do the same thing differing only in name and signature (eg. there is a blocking sleep and a non-blocking one).


>it is up to the caller whether to start it in a virt thread or a real one.

Sorta kinda, but not when you're working in a framework that will call your code, or working in some library where the abstracted code is non-obvious or not easy to configure.

Maybe it's not function coloring, although I wouldn't know what else to call it and I think it's quite similar. What would you call the problem?


Java still eats like 5x ram compared to node, Java tools are slow, Java frameworks are gargantuan, even those claiming "lean".

Java is fine if you don't care about RAM and start time, though.


https://quarkus.io/guides/building-native-image

You should try Quarkus. It is a production framework built by Red Hat. It uses Java-GraalVM under the covers to compile your entire webapp to an executable (like golang does).

It's just as fast.

Java is the highest-performance and most tuned VM there is. I think you're really thinking of Java from a long time ago if you're thinking this.


> Java is the highest performance and most tuned VM there is.

Not defending the opposite argument, but V8 is also pretty impressive. It's rooted in work done for Smalltalk long before JavaScript was a thing.


> It's rooted in work done for Smalltalk long before JavaScript was a thing.

The same is true of HotSpot. https://en.wikipedia.org/wiki/HotSpot_(virtual_machine)#Hist...


Ah, that Supersonic-Subatomic-Java. Whatever Red Hat lacks on the quality side in Java frameworks, they more than compensate for with corny marketing taglines.


If you use gargantuan Java frameworks, you'll use a lot of RAM. Just don't do that. With Spring Boot and similar frameworks, the RAM usage is really just very modest. I'll give you startup times, since I am not a believer in Quarkus and Graal. And I wouldn't use Java for a serverless function that needs to spin up and respond quickly. But for a typical (blue/green-deployed) application in my world, startup time is still only a few seconds, which is fine for many applications. And I am not settling for "fine", just saying that the startup time isn't a big consideration, against a lot of things the Java (or Spring, in my case) ecosystem offers me.


I had a Quarkus server app start up in 0.1s the other week. And BTW Spring Boot does native compilation now too [1]

Your "belief" is putting you at risk of ignoring a wide range of Java use cases unnecessarily.

[1] https://docs.spring.io/spring-native/docs/current/reference/...


You are probably right. It's one of those things where I've not seen a need to jump aboard. I'm still fearful of reflection breaking on me. Probably an irrational fear, but fed by me not understanding how it wouldn't break. Which I should study up on. Which I don't, since I don't have the need. And here I am ... vicious circle.


Startup time is fine... just don't use Spring, where it has to read every class at runtime to determine what to inject.


Well, that's Spring's great feature: to convert as many compile-time errors into runtime errors as possible.


To be fair there is the new Kotlin based wiring API which avoids that with the caveat you need to instantiate everything manually etc. Which is probably a decent tradeoff for some folks.


There are other alternatives, like Avaje Inject and Quarkus, which use the same annotations but do the injection generation at compile time.


Care to share more thoughts on Quarkus? I'm evaluating it for an upcoming app. From my limited reading it can be used with and without Graal.


My favourite bit of Quarkus is the same as my favourite bit of Micronaut - DI is compile time.


See my sibling comment: my fears are irrational.


Javalin is _very_ lightweight, and starts up fast. Use the framework that best suits your requirements.

Also, Java is working on reducing ram usage: https://openjdk.java.net/projects/lilliput/


There's Helidon as well, from Oracle: https://helidon.io/. Though at the moment, I'm using Javalin.


I thought so too. And wrote a simple web service using Helidon SE. It eats 300+ MB of RAM. I spent some time trying to optimize GC and all that stuff. A similar Node service would eat 30 MB of RAM.

Maybe Graal will save us all. Until then Java is beyond salvation.


Other than a very, very niche use case, I really don't see how eating 300 MB of RAM is so problematic when we quite literally have servers with terabytes of RAM. Yeah, Java can be configured to run GC all the time and target <100 MB of RAM, but it rather runs the GC only seldom (the JVM is actually one of the most energy-efficient managed runtimes out there!) and trades memory usage for throughput.


Because in the cloud you're paying a hefty price for every MB of RAM. For example, with Jelastic you have 128 MB per cloudlet, and there's a 2x price difference between 120 MB and 130 MB. And with dedicated servers I don't have terabytes of RAM; I have two servers with 24 GB each.

And no, you can't configure Java to target <100 MB of RAM. I configured it with -Xmx64m and it still eats around 300 MB. Java is just fat and you can't do anything about it at this time.


Or if you care about the actual performance of the system?


YT talk from Ron Pressler: "Why user-mode threads are (often) the right answer" [1] and slides [2].

1: https://www.youtube.com/watch?v=KmMU5Y_r0Uk

2: https://assets.ctfassets.net/oxjq45e8ilak/5QM86VAnN9XJ9HUIs2...


Someone would disagree (though in the context of C++):

http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2018/p136...

Perhaps it's easier to address the problems in a managed environment and I really do hope they pull it off. Also it's unclear whether virtual threads will support async file I/O out of the box, or ever. (C# does have Async methods on files.)


C++ still needs to get the executors mess sorted out.

Java supports executors since Java 5, and async IO exists since ages with NIO.

Nowadays C++ gets so lost discussing language minutiae that it isn't as much fun as it used to be.


> Nowadays C++ gets so lost discussing language minutiae that it isn't as much fun as it used to be.

Yeah, I agree. C++ is no longer fun at all.


More information:

https://en.wikipedia.org/wiki/Green_threads

Which makes me wonder how this is new:

> In Java 1.1, green threads were the only threading model used by the Java virtual machine (JVM),[8] at least on Solaris. As green threads have some limitations compared to native threads, subsequent Java versions dropped them in favor of native threads.[9][10]

So is the "new" part that green threads are coming back to Java?


The "some limitations" there was mostly that original green threads implementation in Java lived on a single OS thread, so you could only use one core. Presumably that's changing this time around?


Yes, by my reading of the linked JEP, these virtual threads are executed by a pool of honest-to-goodness OS threads (so a virtual thread might pause on one thread and get resumed on another).


Ok, now that sounds good.


And they also magically become non-blocking.


> at least on Solaris

On Windows NT as well. At my first job, around 2000, I did a little bit of Java programming. As far as I recall, the JVM scheduled all its threads on top of a single OS thread.


Windows NT natively supported threads when Java was released, and Java on Windows NT (and Windows 95) in 1996 used OS threads. There were some problems when running on machines with more than one CPU, but I never figured out if this was the fault of the OS or the JVM.


A key point that I was particularly happy about is that ThreadLocals work well with virtual threads. My personal killer application: MDC for logging. I have just tried to build proper logging into a JavaScript/TypeScript application, and none of the logging frameworks support MDC (or they need cumbersome workarounds) because the async-isms in the underlying language prevent it from working as intended.
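For anyone unfamiliar, a small sketch of why this works (assuming SLF4J's MDC and the preview virtual-thread API; the handler and key names are made up): because the whole request runs on its own thread, the ThreadLocal-backed MDC needs no async plumbing.

    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;
    import org.slf4j.MDC;

    class MdcDemo {
        static final Logger log = LoggerFactory.getLogger(MdcDemo.class);

        // Every log line emitted while handling the request carries its id,
        // blocking calls included, because the MDC stays with this thread.
        static void handleRequest(String requestId) {
            MDC.put("requestId", requestId);
            try {
                log.info("handling request"); // pattern %X{requestId} shows the id
            } finally {
                MDC.remove("requestId");
            }
        }

        public static void main(String[] args) throws InterruptedException {
            Thread.startVirtualThread(() -> handleRequest("req-42")).join();
        }
    }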


Thread locals work, but we think that in scope locals we have a better, more robust mechanism, more suited to the way thread locals are used in things like logging libraries.


Anyone know if this preview JEP will make it into Java 18?

We've been hearing about OpenJDK's Project Loom for a while, but we haven't gotten to try this out in the Java mainline. I am guessing this will take at least two previews before an initial release. And given the speed the ecosystem moves at, we may not see this reaching widespread use for quite a while.


It's very close but I wouldn't expect it in 18, much more likely to land in 19 as preview. I have been using Loom builds for a small app for a while and haven't encountered anything strange (though last time I checked virtual threads confused the crap out of my IDE debugger) so I expect that there should be very few bugs to fix and thus should have an easy path to LTS in Java 21.


There are two more interesting features that work with virtual threads:

Structured Concurrency - https://openjdk.java.net/jeps/8277129

Scope Locals - https://openjdk.java.net/jeps/8263012


Why not just use a name like 'Joroutine'?


They're not coroutines though. This is a little semantic, but a coroutine normally uses cooperative multitasking exclusively.

Something like:

    coroutine foo
      while queue not full
        put something in queue
      when full
        yield bar

    coroutine bar
      while queue not empty
        take from queue
        do something with what was taken
      when empty
        yield foo
Each time the coroutine yields, it saves its state and execution resumes in another coroutine, and when execution is yielded back it too resumes from its yield point.

As I understand, in Java, they are not adding coroutines, but something that is a virtual thread, which is more like a green or lightweight thread. It means that it can be pre-emptively paused and resumed, it doesn't have to voluntarily yield. There is some scheduler that could decide when to execute which virtual thread and so on.


Right; Java's virtual threads are (*adjusts glasses*) preemptively-scheduled stackful continuations, meaning (1) that they execute in parallel and in cooperation with the OS' scheduler (*) (which automatically timeslices the CPU amongst threads), and (2) that each continuation is the full call stack at its yield site.

The alternative for point 1 is cooperative scheduling (announcing explicitly when they yield), as you've described.

The alternative for point 2 is "stackless" continuations, where the task yields by returning a callback describing the next step of the task -- or, equivalently, returning some data describing the state of the task, which the task's primary entry point can use to decide where to continue from. (For instance, imagine a function with a big `switch` statement that, when invoked, decides which case to jump into based on its argument, which the caller got from the return value of the last time it got invoked.) Either way, every step of the task constructs and then returns out of its call stack, which can be much more memory efficient, but is also more painful to model tasks in without help from the compiler/language.
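To make the stackless flavor concrete, here's a tiny hand-rolled Java version of that "big switch" function (entirely hypothetical, just for illustration):

    // The entire continuation is the int returned from step(): the caller
    // feeds it back in to resume, and no call stack survives between steps.
    class StacklessTask {
        int count; // persistent "locals" become fields

        int step(int state) { // returns the next state, or -1 when finished
            switch (state) {
                case 0:
                    System.out.println("starting");
                    return 1; // "yield": unwind, remembering where we were
                case 1:
                    count++;
                    return count < 3 ? 1 : 2;
                case 2:
                    System.out.println("done after " + count + " steps");
                    return -1;
                default:
                    throw new IllegalStateException("state " + state);
            }
        }

        public static void main(String[] args) {
            StacklessTask t = new StacklessTask();
            int state = 0;
            while (state != -1) state = t.step(state); // the driver/scheduler
        }
    }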

(*) Technically, the JVM could take on the role of scheduler as well; it could count the number of bytecode instructions executed, say, and pre-empt a task after some number. This is what Lua supports for some of its sandboxing capabilities. But I think that would be counter to Java's use of an OS thread pool to execute these virtual threads; its own scheduler would be interleaving somewhat unpredictably with the OS'. You'd want to do that if you have long-running jobs that do little I/O (so they hog an OS thread)... but then you'd probably rather put those jobs on actual background worker threads.


You (or a language) don’t have to yield-to another coroutine. You may resume and yield-from as in:

  coroutine producer
    forever
      while no full packet
        resume recv into buffer
        on eof return null
      yield (extract packet)

  coroutine consumer
    while packet = (resume producer)
      if packet is null break
      process packet
> which is more like a green or lightweight thread. It means that it can be pre-emptively paused and resumed, it doesn't have to voluntarily yield

I’ve never heard of this meaning of the coro/green/light distinction. Can you please point to some literature?


Wikipedia: https://en.m.wikipedia.org/wiki/Coroutine

In your example, it seems to still be cooperative; you simply yield to the scheduler, which is itself a coroutine and will then decide what other coroutine to yield back to. Here's a naive coroutine scheduler:

    ArrayList coroutines;

    coroutine scheduler
      for i = 0;; i = (i + 1) % coroutines.size()
        yield coroutines[i]
It's still voluntary yielding though; preemptive would mean that the scheduler can interrupt the task at any time, but here it can't: it will still only be possible to schedule another task once a yield point voluntarily yields back to the scheduler.

Actually, your example is simpler than that: (resume producer) is the same as: yield producer. And the yield with a return value is the same as: yield consumer. For the latter, the language probably allows yielding to the previous coroutine under the hood, or like I said, maybe it yields to a scheduler.

I was also showing that you can even do something like yield to a scheduler which will then pick the next coroutine to resume, which makes it even more "thread like", but still cooperative.

The coroutine's cooperative nature has an advantage: it naturally models coordination. With a preemptive scheme like Java's virtual threads, you will still have to protect shared data and have ways to coordinate and synchronize, like mutexes, locks and all that.


> With a preemptive scheme like Java virtual thread, you will still have to protect shared data and have ways to coordinate and synchronize like mutex, locks and all that.

As far as I know there is nothing preventing race conditions and dead/live locks in the case of coroutines either, is there? Like of course if you have 1 thread these issues won't come up, but with true parallelism, this model in itself doesn't protect anything.


It doesn't prevent it, but it can help with synchronization.

If you have two coroutines writing to the same variable, but they yield to each other, you know they won't ever both run at the same time.

You also know if you spawn multiple coroutines that they won't yield except where they call yield, so everything before and after the yield you know will be atomic.


> If you have two coroutines writing to the same variable, but they yield to each other, you know they won't ever both run at the same time.

But that won't be parallel, just concurrent, and in the case of cooperative "threads", you could have probably written it in a more readable single-threaded way, as that's pretty much just calling two functions back and forth.

Your second point also only works when you have a single thread of execution, otherwise concurrency will entail parallelism and all the usual problems will become apparent.


Threads still have the issue of synchronization and atomicity even when only concurrent and not parallel.

That is, assuming you had a single-core CPU, with threads you'd still need to synchronize things when implementing concurrency. Coroutines have more explicit synchronization from their natural ping/pong as you yield, which tends to be safer in the average case.

I think you're maybe conflating something. If two things write to the same global variable for example, that can never be parallel, but it can be concurrent. With threads, the writes to the variables need to be guarded with some synchronization mechanisms, if you forget you'll have bugs.

With coroutines, they will be naturally synchronized by the yield points.

> you could probably have written it in a more readable single-threaded way, as that's pretty much just calling two functions back and forth

It's not just calling two functions back and forth, the coroutines retain state and continue where they yielded. Each time they yield they do not consume additional stack frames.


Maybe because it sounds terrible.


Because this is about bringing green threads back.


It is not currently targeted to any JDK version. I think the earliest it could be is Java 19, due out in September 2022.


That would be my guess as well, and I would (casually, sitting in my armchair) expect two previews for such a significant change to the JVM. I'm not expecting it to land fully until Java 20 at least -- but we should absolutely give the previews a try as they come out.


...and then many of us may not put it in production until the next LTS anyway...

(...aside from those that are perpetually stuck on Java 8 anyway.)


I'd love to know who, of those pinned to an LTS release, has actually made use of a support contract with a company providing contracted support for an LTS release, whether it's Oracle or another company. I don't doubt they exist, but I have no idea what that support even looks like.

My team has been happily tracking the twice-yearly JDK bumps. We started development three years ago against Java 8 and made a series of jumps (9, 11, and then 14 onward) and never really had an issue.

I'm not sure I can live without `var`, `record`, and pattern-matching `instanceof` anymore. (With `sealed` interfaces and records, the visitor pattern is long gone... I can only wait with bated breath for exhaustive pattern-matching `switch` expressions.)


I am not on a support contract (rather the opposite: small team). But I am being careful not to get caught having to spend time upgrading code that I would otherwise not have touched, just because my non-LTS JVM runs out of support. Support is not just a support contract, but also security patches being released; in my understanding, for non-LTS versions those cease quickly once the next version is out. Non-LTS versions in particular may carry experimental features that are not going to be compatible with subsequent versions, increasing the risk that a migration to the next version occasionally isn't quick.


Thanks for responding! For what it's worth, Ron Pressler (lead guy on the virtual threads work in the original article) has opinions on LTS:

https://old.reddit.com/r/programming/comments/lsuojl/jdk_16_...

pron> Assuming you've already made the last ever major upgrade past 8 (which was a relatively tough one), the reason people pay for LTS isn't because upgrades are overall cheaper -- they're costlier, actually -- but because they're willing to pay to not get new features. We've designed the LTS model mostly for legacy applications that don't see much maintenance, and want their dependencies, the JDK included, to change as little as possible.

https://www.reddit.com/r/java/comments/o0m6g8/the_state_of_p...

pron> People who want a new feature to land in LTS still misunderstand what LTS is. People who upgrade from LTS to LTS every three years also misunderstand LTS, and probably get the worst of both worlds.

Personally, I only found Java 9 to be anything like a stumbling block, and that's solely because the module system (Jigsaw) threw all the tooling for a loop. You can easily avoid Jigsaw and never worry about it.

The Java folks try really hard not to break backwards compatibility in general, and modules (+ JDK internals encapsulation) are the only major bugbears to worry about. If you can upgrade, I've found it extremely worthwhile.

> Non-LTS versions in particular may carry experimental features that are not going to be compatible with subsequent versions, increasing the risk that a migration to the next version occasionally isn't quick.

As for this, the experimental features may as well not exist if you don't enable them. You absolutely should kick the tires with them if you can, but their presence is feature-flagged off by default. I'm on a small team myself, and it's been painless for us ever since jumping to 11.


With the new Oracle licensing you can use an LTS until the next LTS comes out, plus one additional year on top, for free. That one year should be more than enough for testing, shouldn't it? Especially since, thanks to strong encapsulation, Java updates are even more of a breeze.


I never found use for the visitor pattern in Java anyway. Isn't that defeated by instanceof?


Traditional wisdom says that you shouldn't use `instanceof`, because downcasting is bad and you should work with an interface uniformly implemented by any particular subclass. The Visitor pattern accomplishes this by giving the shared supertype a `match` method accepting, effectively, a bunch of callbacks, and each subclass just chooses which callback to invoke.

In algebraic type notation, this pattern replaces a function returning a sum type, `X -> A + B + C`, with a function accepting a callback that accepts a sum type, `X -> (A + B + C -> Y) -> Y`. But a function accepting a sum type is the same as a product of functions, so you have `X -> (A -> Y, B -> Y, C -> Y) -> Y`. The product of functions is the visitor, and `X` is the thing you're visiting.

Traditional wisdom is correct when you have an open family of subclasses (i.e. you don't know, and shouldn't know, precisely how many subclasses there are). But for a closed family, it's just unnecessary; you're blinding yourself from information you already possessed.
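To make that concrete, here's a minimal Java 17 sketch (a hypothetical Shape family, just for illustration) contrasting the visitor's product-of-callbacks with pattern-matching `instanceof` over a sealed family:

    import java.util.function.Function;

    class Shapes {
        sealed interface Shape permits Circle, Square {
            // Visitor style: one callback per subclass; each subclass picks its own.
            <Y> Y match(Function<Circle, Y> ifCircle, Function<Square, Y> ifSquare);
        }
        record Circle(double radius) implements Shape {
            public <Y> Y match(Function<Circle, Y> ifCircle, Function<Square, Y> ifSquare) {
                return ifCircle.apply(this);
            }
        }
        record Square(double side) implements Shape {
            public <Y> Y match(Function<Circle, Y> ifCircle, Function<Square, Y> ifSquare) {
                return ifSquare.apply(this);
            }
        }

        // For a closed family, pattern-matching instanceof says the same thing
        // with none of the ceremony:
        static double area(Shape s) {
            if (s instanceof Circle c) return Math.PI * c.radius() * c.radius();
            if (s instanceof Square sq) return sq.side() * sq.side();
            throw new IllegalStateException("unreachable: the family is sealed");
        }
    }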


Indeed. Since Java is moving to a 2-year LTS cycle, I guess the overall plan is to have it as a standard feature in the JDK 21 LTS. Also, with primitive objects and much-enhanced pattern matching, I feel JDK 21 is going to suck a lot of oxygen from other JVM languages.


Yes, Java is finally becoming a language I don't cry myself to sleep over. I've still got other languages I (vastly) prefer, but I don't feel like I'm fighting the system nearly as much anymore. (Though, tail-call optimization would mean I don't have to obfuscate so many algorithms that are naturally recursive... it's part of Loom's charter, so maybe after virtual threads.)


I prefer Kotlin for a number of reasons but still use Java heavily as it's still a better choice in many places. I think this will be of massive benefit to both languages. Kotlin coroutines will probably be mostly relegated to multiplatform and JS backends but that is fine, server side Kotlin will take full advantage of virtual threads. :D


Where is Java a better choice than Kotlin?


So the main thing is libraries. If you write your library in Kotlin then it depends on the Kotlin runtime JARs, which seriously bloats its dependencies. Not a problem where the library is only applicable in Kotlin-land, but if it's generally useful from Java/Clojure/Scala then it's better to write it in Java.

The other case is general OSS software; Java reaches a wider audience in my field (distributed databases, streaming data, etc). Java is pretty much considered the lingua franca of Big Data, with a very small Scala footprint and much less fluency in Kotlin.

I generally write all my own stuff on top of these in Kotlin but drop down into Java where I need to be able to share things.


I actually think it will provide a lot of oxygen to other languages.

Virtual threads, project Lilliput and Valhalla are likely to be a great benefit for Clojure, which has great thread primitives and also spawns a _lot_ of objects that (mostly) don't care about identity.


I'm curious about

> There are situations when the VM cannot suspend a virtual thread, in which case it is said to be pinned. Currently, there are two:

> When a native method is currently executing in the virtual thread (even if it is calling back into Java)

Does that mean any kind of native code is currently paying some extra cost due to the possibility of being blocking? What if I e.g. want to call a library that is known to be non-blocking, or make a non-blocking syscall that is not pre-wrapped by the Java standard library? E.g. a library that offers interaction with a BPF map comes to mind. Is there maybe an escape hatch for virtual-thread-aware Java libraries, where they can tell the runtime that they want to call native code without extra guardrails and overhead?


I suspect that this is not about the possibility of blocking, but rather that native code needs a native stack (not on the Java heap) and execution state that cannot be suspended like Java code can.

If you have "short" native methods, like in a typical async I/O library, this is not a problem. They cannot be suspended while in there, but they repeatedly go back to Java code where they can be suspended.

So your scenario is only really a concern with a long-running but non-blocking native method, say, a physics library that does lots of computations. The answer is most likely: Don't run that in a virtual thread.


I/O from JNI is very rare in the JVM world. I think the most common cases are going to be really CPU-intensive libs like compression and encryption implementations. But yeah, just run these on a native thread pool (which you should probably do anyway).
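A rough sketch of that pattern (nativeCrunch() stands in for a hypothetical JNI-heavy call):

    import java.util.concurrent.*;

    class NativeOffload {
        static final ExecutorService NATIVE_POOL = Executors.newFixedThreadPool(4);

        // Called from a virtual thread: hand the JNI-heavy work to a small pool
        // of platform threads; Future.get() parks the virtual thread cheaply,
        // so no carrier thread stays pinned under a native frame.
        static byte[] crunch(byte[] input) throws Exception {
            Future<byte[]> result = NATIVE_POOL.submit(() -> nativeCrunch(input));
            return result.get();
        }

        private static native byte[] nativeCrunch(byte[] input); // hypothetical
    }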


Maybe, maybe not. If you want to take advantage of things like io_uring you're going to be doing that with a JNI lib. Such as this Netty incubator support for it https://github.com/netty/netty-incubator-transport-io_uring

Also, all of the existing JVM IO is done with JNI. How do you think java.io is implemented itself? Of course Loom can change those implementations, but strictly speaking JNI is currently extremely common for Java IO. How big a task supporting virtual threads across the Java libraries will be remains to be seen, especially for the unofficial extensions like the sun.nio.* package.


Very likely such a thing will be pinned to a few OS threads and then interact with a virtual thread pool to farm out work.

In general most Java I/O is native because it's "sufficiently fast" for such things. Netty is the exception rather than the rule in this regard.


The only thing I can think of where it may come up is interacting with OpenGL? But that will also be mapped to a single thread.


> What if I e.g. want to call a library that is known to be non-blocking, or make a non-blocking syscall that is not pre-wrapped by the Java standard library?

My understanding is that pinning only enters the equation if you need to yield while a native frame is on the stack. If that call is non-blocking, then by definition you'll call into it and return without needing to yield the current task.

A non-blocking call should give you some way to tell when the job you've requested has completed, of course, and then you need to either poll for it or arrange to be told when it's done. You don't want to spinlock in a virtual thread (you're just hogging an OS thread continuously, which is exactly what pinning is), so either way, you'll end up blocking -- but as long as you're blocking after returning from the native call, you should be fine.
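In sketch form, with nativeSubmit/nativePoll standing in for hypothetical wrappers over a non-blocking syscall:

    // Issue the non-blocking call; the native frame returns immediately,
    // so nothing gets pinned.
    long token = nativeSubmit(request);

    // Block only in pure Java code, after the native frame is off the stack:
    // sleep() parks the virtual thread and frees the carrier.
    while (!nativePoll(token)) {
        Thread.sleep(1);
    }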

> Does that mean any kind of native code is currently paying some extra cost due to the possibility of being blocking?

I have no special insight, but I imagine any costs are only incurred if a yield actually occurs with native code on the stack. Only then would the yield logic pin the current task to the current thread.


You aren’t incurring any overhead because that native code isn’t going to try and yield execution. You would only incur the overhead if the code called back to Java, and then that Java code performed some blocking operation.


I really hope they change this `Thread.ofVirtual()` and `Thread.ofPlatform()` language; it sounds clunky and doesn't read like any Java that I've seen before.


`Optional.of`, `List.of`, `Set.of`, `Map.of` (and `Map.ofEntries`), ...; this language is actually pretty settled into the JDK.
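And the thread builders read the same way (method names per the preview API, so they may still shift):

    Runnable task = () -> System.out.println("hello");  // stand-in task

    Thread v = Thread.ofVirtual().name("virtual-1").start(task);
    Thread p = Thread.ofPlatform().daemon(true).start(task);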


https://blog.joda.org/2011/08/common-java-method-names.html?... has some background into the logic of static factory naming, covering from() and of().


Indeed. “of” infected Java with the awful naming conventions imposed by java.time (and to a lesser degree java.nio) because “new” is a keyword and doing “newThing” instead was not in vogue. Now it smacks of the worst kind of hipster disease and is going to enter java.lang for no good reason at all. What’s wrong with “newPlatformThread” or something similar? This shouldn’t have survived the initial sniff test.


Awesome, now that we've just spent a year rewriting our backend with callback hell to make it async, we can spend the next year rolling it all back.


If it works, don't fix it.

Besides, this hasn't been targeted to a release yet, so it might not come before Java 19, which is a year from now. Even then, it will still be a preview, which likely means another year before it's a stable feature.


Git rollback..


Does anyone know if delimited continuations are on the roadmap with this?

Potentially, virtual threads enable them (assuming a serializable environment); coroutines are also enough for this, the serialization capability is what I find interesting.

This would enable a complete freeze of an execution to be stored and even sent over the network to be completed somewhere else.


The idea of serializing suspended computation across a network seems extremely attractive to a lot of people but I've never understood why it's so appealing versus the more typical approach of sending whole binaries and defining messaging protocols. Could you possibly elaborate on why this capability is exciting?


For me: code expressiveness and simplicity.

If you can just send the computation around, you basically don't need the entire message/protocol boilerplate in your execution code.

For example, think of a game engine that lets you write code with loops and function calls that may wait for an event or for a rule to become true. Imagine that your game can be paused and stored in case of a lost connection, synced between different servers, or kept both at a server and at a client for fast response. Without this language feature, what happens to your basic game logic code? You want to just have a wait statement inside a loop, but how will you return to the same place on resume? You need more code storing the entire execution state: on every condition or loop you either need to store something or write code that looks like a state machine. With this feature you don't need a special design pattern; you just write it, and the mess lives at the language level.
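As a sketch (every type and helper below is hypothetical), the direct-style version is just:

    interface Event {}
    record Stage(Event trigger) {}
    record Quest(java.util.List<Stage> stages) {}
    record Player(String name) {}

    class GameLogic {
        // With serializable continuations, the runtime could freeze this loop
        // mid-wait (its state lives on the virtual stack) and resume it on
        // another machine; without them, you hand-write the state machine
        // this loop implies.
        void questLoop(Player player, Quest quest) {
            for (Stage stage : quest.stages()) {
                waitForEvent(stage.trigger()); // suspends here, possibly for days
                grantReward(player, stage);
            }
        }

        void waitForEvent(Event e) { /* hypothetical: park until the event fires */ }
        void grantReward(Player p, Stage s) { /* hypothetical */ }
    }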


Is this like .NET tasks? If so, what’s the async story here? Does it involve function colouring like in .NET?


The doco explains it quite well, so I won't repeat it here.

It's solving the same problem that async does, but it does it with virtual threads instead. The idea is that functions aren't coloured, and that normal threaded code will "just work".

I see some benefits of this approach, but I feel that what all of the solutions (Java, C#, Rust, etc...) are missing is structured concurrency[1], without which madness and eldritch horrors of late-night concurrent code debugging are guaranteed.

[1]: https://vorpus.org/blog/notes-on-structured-concurrency-or-g...


Project Loom will include structured concurrency. But most of that will come out in releases after virtual threads.

If you read the JEP, though, you'll see that Executors are auto-closeable now, which means you can use try-with-resources to wait for all spawned threads to stop before continuing execution.
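A minimal sketch of that idiom (factory name per the preview API; it may still be renamed):

    // close() at the end of the try block waits for the submitted tasks,
    // so no spawned thread outlives this scope.
    try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
        executor.submit(() -> fetchUser());    // hypothetical task
        executor.submit(() -> fetchOrders());  // hypothetical task
    }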


In practice, I suspect structured concurrency will frequently require using the escape hatch for scenarios where a long-lived, background-like task really is the right fit.

But the escape hatch works by.... coloring functions! Specifically, if a function needs to spawn a longer-lived background task, it needs to take a nursery parameter.

If a function wants to call a function that might spawn a long-lived task, it needs to either own the lifetime of said nursery or, more commonly, accept a nursery as a parameter and pass in the supplied one.
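A sketch of what that coloring looks like, with an entirely hypothetical Nursery type:

    // Hypothetical nursery: spawned tasks must finish before its scope exits.
    interface Nursery {
        void spawn(Runnable task);
    }

    class Handlers {
        // "Colored": the signature advertises that this may spawn background work.
        void refreshCaches(Nursery nursery) {
            nursery.spawn(() -> pollUpstream()); // hypothetical long-lived task
        }

        // Callers either own a nursery's lifetime or pass their own along.
        void handleRequest(Nursery nursery) {
            refreshCaches(nursery);
        }

        void pollUpstream() { /* hypothetical */ }
    }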

In practice with Java, the nursery concept (StructuredExecutor) will only be used in those cases where it is actually helpful (i.e. where you really want the function call to not return until all concurrent tasks are finished), and everywhere else, like background tasks, existing primitives will be used.

And all nurseries/StructuredExecutor buy you is the ability to structurally enforce joining of the relevant virtual threads. It lets you avoid some of the common mistakes in structuring such code, but I'm not convinced that is where the eldritch horrors of concurrent-code debugging live.

I think the real eldritch horrors come from buggy attempts to implement low-lock code, failing to realize that locks or other synchronization is needed when accessing a certain variable, etc. Basically race condition type situations.

I personally almost never have had substantial concurrency issues related to failing to join my concurrent tasks.



In the "Alternatives" section of TFA there are some interesting comments regarding the async/await approach (which was avoided by the Virtual Threads proposal):

"Provide syntactic stackless coroutines (async/await) in the Java language. These are easier to implement than user-mode threads and would provide a unifying construct representing the context of a sequence of operations, but that construct would be new, separate from threads while being similar to them in many respect yet different in some nuanced ways, would still split the world of APIs between those designed for threads and those designed for async/await, and would require the new thread-like construct to be introduced into all aspects of the platform and its tooling, resulting in something that would take longer for the ecosystem to adopt while not being as elegant and harmonious with the platform as user-mode threads. Most languages that have chosen to adopt async/await have done so due to an inability to implement user-mode threads (Koltin), legacy semantic guarantees (JavaScript), or language-specific technical constraints (C++). These do not apply to Java."

I wonder if Rust should also be included in the final sentences.


It does no function coloring. Just block :). Virtual threads are preemptive, not cooperative.


It doesn't block, that's the whole point. The runtime knows whether a given method is blocking or not, and under the hood even blocking IO calls are implemented with non-blocking IO. So those can be preempted.


Yes I know. “Just block” refers to just waiting on virtual threads/futures via get or join. No fancy observables or callbacks etc.


I've watched some of Mark Rendle's talks (like this one https://youtu.be/2-mFWi5oLkM) and while watching it I realized how much I dislike when languages start to absorb ideas that are either alien to the language, or offer multiple paradigms for solving the same problems.

I can understand Java programmer want the goodies offered in languages that are more geared towards concurrency. But multiparadigmatic languages really suck because they are no longer a single language. You get islands of different practices and a partitioned set of practitioners. And they can't always use each other's code (C++ being the most extreme and painful example).

This makes me kind of glad I switched to Go 5-6 years ago. And it makes me wonder when the (good) intentions of the Go designers to not absorb every idea that comes along will go out the window and Go will start to grow knobbly bits all over.


I think you can say that for any language other than Java. It is a deliberately slow-moving language, with very few keywords and concepts going for it. It was always meant to be a simple language on top of a very high-end runtime, and indeed they managed to implement Loom with no language change. Also, concurrency not being inherent to Java? It was the first major language with good support for multithreading, with its synchronized keyword.


Multithreading isn't synonymous with concurrency.

Java was one step along the way, but let's say it had an adequate representation of the heavy-handed tools we already had in C/C++ that made some forms of concurrency somewhat easier. But it was still some ways from promoting concurrency, in that threads were pretty costly and you still depended on locking to move state between threads. And it isn't like CSP hadn't been thought of.

After about 20 years of programming Java and 5-6 years programming Go I wouldn't really list concurrency as a main feature of Java. Because you kind of go at it the way you go at it in C/C++. I think someone who has programmed (for instance) Erlang would feel much the same way.


Java exposes the low-level details of parallelism, but it also allows for high-level abstractions on top. Thanks to that there are indeed libraries like Akka for those who prefer the actor model, but also Clojure with its concurrency built on immutable data structures, and anything in between, like reactive libs. Of course concurrency has never been easy in practice, so the low-level details are hard to get right, but they are there in a sane and easy-to-use way; if not the main point of the language, I would absolutely mention it as one of its most important plus features.


Does Go have closures? Yes it does. Therefore you can start writing everything in an async style, manually passing closures around. Then build an executor interface. Voilà. Here's an utterly alien Go codebase that doesn't use green threads.

It has nothing to do with language. It has everything to do with how other libraries (especially standard libraries) structure their code.


This is literally avoiding "bifurcating the language" and avoiding the function coloring problem; it's the whole point! Otherwise they could have just copied Kotlin's coroutines. Everything works as before using the familiar Thread API, just with different underlying machinery.


your comment surprises me, because this kind of development is explicitly trying to avoid bifurcating the language. i would argue that the async io/futures libraries that exist in java are the bifurcation because programming with them is very jarring compared to threaded java.


His comment betrays his clear ignorance of both the Java development process and Loom in particular. Such comments aren't unusual where Java is concerned; it's the one language where many are willfully ignorant (well, PHP is also in contention).

Pay them no mind though. Java is already the premier server side language for serious work and this is just another tool in the toolbox, hopefully I will see less RxJava in my future. :)


Yes, and I am hoping that Go won't go through the same. But if you think about how different languages have evolved they do, over time, tend to end up in places where it gets harder and harder to avoid.


I wonder how this is going to compare with Kotlin Coroutines[0]

0: https://kotlinlang.org/docs/coroutines-overview.html


It's like an implicit, magic version of them. If the implementation is perfect then it will work great and make Kotlin coroutines obsolete. If the implementation isn't perfect, everything will work great until it hits whatever corner case and blocks everything (just like when you block on a coroutine without properly shifting to a blocking executor, only less visible and harder to diagnose).


Arguably more visible, because you have native JFR support for noticing such an event. Let's remember that Kotlin is a guest language and that the host platform's implementations get host platform integrations natively.


Not only that: Kotlin has decided to marry Android as its platform language, so any JVM feature not adopted by Google into ART means Kotlin has to decide which master to obey, and with time Kotlin's only path is to become its own platform.

Hence why JetBrains is so eager to create duplicates of every Java library in Kotlin.


JB is investing heavily into native, JS and wasm compiler backends, so there's a counterpoint. Another is that Kotlin is steadily adding support for features in newer Java runtimes, such as records. Another is that the Kotlin/JVM backend can target up to Java 17 bytecode, which is obviously not supported by Android.

This reads heavily of FUD.


JB is trying to become Borland, with Kotlin as their Delphi, while moving away from the JVM and turning Kotlin into a platform. Let's see how high they fly, especially if Google's wind stops blowing.


That just causes it to become even more fragmented. Like, is there a solution that will map well to all three/four platforms? Highly doubt.


In my experience plain old userspace code is always the easiest thing to debug, so a guest language (where a large chunk of the runtime itself will be, by definition, just plain old code) often ends up easier to understand.


In general, I would agree with you- because Kotlin or Java that corresponds to your own code should be easier to find and understand than hotspot’s C++ source code or the jit output.

Last I checked, though, Kotlin was threading coroutine suspend/resume points into methods as part of bytecode generation (it's been a while, please do let me know if I'm wrong on this), which is not something most engineers are ready to read in the simple case, much less when trying to interpret the compiler's name-mangling scheme.

In either case, the implementation that ships with the JVM will be more capable because of its privileged position of integrating with the runtime, so good news! Eventually Kotlin might be able to use the native facility.


Jetbrains already announced that they will be making use of this in their coroutines implementation on the JVM (they have native and JS implementations as well). So, in practical terms, very little will change for Kotlin users, except they gain a few additional ways to configure a coroutine scope with a virtual thread pool instead of a real one when using the latest JDK that has this. I imagine they might make virtual threads the default for the so-called Main CoroutineScope on the JVM. Right now that would be a single native thread. So, every time you call launch or async, it might be backed by a virtual thread.

Virtual Threads are a somewhat lower level primitive than what co-routines provide. It will allow other frameworks on the JVM to integrate and benefit from it in a similar way. E.g. Vert.x, RX Java, Spring's Flux, etc. Probably using this directly is not a great idea as there are so many nice frameworks to choose from already that will protect you from doing silly things. But it is nice to have a good implementation of this built into the JVM.

Kotlin's co-routines are somewhat unique in how they can work with and seamlessly integrate code written for other concurrency and asynchronous frameworks. So technically, this is just yet another thing that they can work with. If it has something that resembles a callback, a promise, a future, etc., you simply wrap it with suspendCoroutine and the result is a nice suspending, Kotlin-coroutine-friendly function. Kotlin's co-routine library ships with extension functions for Spring Flux, RX Java, and a few more things, and it is easy to write your own. It's a great way of taking away the pain of using those frameworks directly. In the browser, you have similar extension functions on JavaScript promises, and so on.

I've been doing a lot of Browser UI programming using Kotlin JS lately. Co-routines are very nice for that. Works very similar to how I use them with Spring Boot to implement non blocking APIs. We actually share a lot of kotlin code between client and server.


Green threads are back!


Green threads only used a single OS thread, so no parallelism. This is much better.


So, M:N threads?


Yes. The original green thread implementation was M:1.


More like green threads + nio but yeah.


This can't come soon enough; virtual threads are definitely simpler to write and read than manually breaking your logic into Futures and Promises.


I was just thinking I’d love this element of Erlang/BEAM with the syntax of Clojure (lisp). This makes that sound more possible.


Have you looked into Lisp-Flavored Erlang? https://lfe.io/


There is an actual Clojerl project (Clojure on BEAM) which is probably not relevant in this case.


(I meant to write *more relevant)


Yeah, I wonder whether this would potentially simplify the implementation of something like Akka?


Far fewer applications need this than many are led to believe. That said, this is always a welcome addition, although it really doesn't add more than syntax over the stuff we've had in java.nio for a couple of decades.


You can always write a callback oriented program, but this is much nicer!


Yeah, agreed. It is definitely very welcome syntax sugar and I'll take it over JS callback hell or async/await stuff any day. I just wanted to point out that non-blocking I/O has been available in Java for a long time, so high-I/O-throughput applications have been possible to build in Java, even though it required more mental overhead.


More mental overhead and a tough ecosystem: you end up in the coloured-functions problem, where you chose to write non-blocking code so now all your dependencies/libraries must too. Node.js worked because it was like this from day 1, but Java wasn't.

This change would make the base primitives non-blocking (really cheap blocking, which is better) by default, so now you can use (almost) any library and it should just work. Waaay better.


> Nodejs worked because it was like this from day 1 but java isn’t.

And even there "worked" is a very liberal definition. Callbacks, promises and whatever else you can find will imho never be as intuitive as threads.


yeah it worked, as in the ecosystem was consistent with a single non-blocking pattern..

and then almost immediately went in 5 different directions around how to program around it.. callback hell, async/await, generators, promises etc..


It will still suck compared to NIO & DirectBuffers - and if you really need NIO, this won't be a replacement.

Using non-blocking read/write pretty quickly expands to writing your own scheduler with all the needed quirks/boosts/etc.


Is this project loom finally delivering?


Is Virtual Threads part of Project Loom? i.e We expect Project Loom to land a little bit after this or are they completely separate issues and Project Loom is still far from landing?


JEPs are the mechanism by which projects deliver changes. This, and two other JEPs form the bulk of the work done as part of Loom, along with a number of other connected JEPs mentioned in this one.


As part of project Loom, these green threads were called Fibers, not VirtualThreads. Did they change the name?


Yes. As ideas were prototyped it became clear that Fibers needed to be Threads or there would be too many rough edges regarding compatibility and concepts that developers would need to keep in their heads.


Thanks for the explanation. A little hard for those who do not follow it closely.


I was going to say that with modern PCs this is kinda redundant now, but it's still a language of small devices too, where it'll be welcome.


This isn't really targeted at small devices; it's very, very applicable to large computers in datacenters (I'd argue those machines will benefit even more from this). Any time you wait on I/O you potentially park an entire thread when you could be multiplexing your threads. This is a net win for almost all users of threads (except perhaps 100% CPU-intensive tasks).


In my experience most Java services that care have already long since switched to non-blocking IO.


You can do that now already, though; the main expense is 1 MB of memory for the stack.


Up to 1 MB of stack memory. It starts lower, only consuming what it needs. It doesn't resize down again, though, which virtual threads do.

The other cost is context switching, which is much cheaper with virtual threads.
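The difference shows up in sketches like this (factory name per the preview API):

    // Ten thousand concurrent sleepers: trivial with a virtual thread per task,
    // where a ~1 MB platform stack each would reserve ~10 GB of address space.
    try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
        for (int i = 0; i < 10_000; i++) {
            executor.submit(() -> {
                Thread.sleep(1_000); // parks the virtual thread, freeing the carrier
                return null;
            });
        }
    }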


I'm salivating over the context switching wins, myself. I've got a simulation system that uses threads to pause and resume modeled tasks at the appropriate times relative to the simulation clock, and we're just passing a token back and forth with each step of a task. These threads aren't used for parallelism; they're just used to capture the task's continuation in a convenient way, allowing us to use a direct-style approach to modeling tasks.

The context switching sucks so much that an alternate approach, swapping threads for throwing an exception on yield and re-running the method and ignoring side-effects until we get back to the resumption point, saves us a significant amount of time.


huh, they invented a new virtual vocabulary here and beat around the bush to avoid saying coroutines.

Sarcasm aside, it's probably to steer away from Kotlin and make it easier for users when seeking help/docs online.


Coroutines are usually cooperatively scheduled (that's what the `co-` is for); these are pre-emptively scheduled, first by assignment to a pool of OS threads, and secondly by the OS' own thread scheduler. "Thread" is common nomenclature for pre-emptively scheduled tasks.

"Virtual" is a little less standard, but it draws on existing patterns: they've virtualized threads in the same way that the OS virtualizes the CPU (timeslicing) and virtualizes memory (literally, virtual memory).


They are not fully preemptively scheduled though right? If I have a virtual thread that enters an infinite loop it will not be possible to reclaim its host OS thread no?


That's a good question. It's not very easy to do that with regular Java threads either, though [0], and I think everybody considers those "fully preemptively scheduled".

Preemptive scheduling is more about whether the scheduler (in this case, the OS scheduling the carrier threads) can pause a thread no matter where it is in its processing, and since virtual threads are executed by OS threads (the virtual part is just the JVM tracking the context and where to resume at), they're preemptive.

[0] https://stackoverflow.com/questions/671049/how-do-you-kill-a...


Hmm, not sure I agree. In the case of OS threads, the kernel will preempt a misbehaving thread and give the CPU to another thread according to priority. In the case where a virtual thread enters an infinite loop however there is no way for the virtual thread scheduler to do the same, unless kernel interruptions of the stuck host OS thread somehow return control to the virtual thread scheduler instead of whatever code was executing when the host thread was preempted. It's not clear to me this is really feasible though.


Yes, it's definitely possible that a spinning virtual thread will effectively pin the current OS thread. The Loom folks could keep, say, a count of how many instructions the virtual thread has executed, and pre-empt it itself if it needs to. This could fight a little with the OS scheduler, so I'm not sure what the tradeoffs are, but if you absolutely cannot allow a spinning virtual thread to effectively pin an OS thread, you'd want to do this.

If you're going to do a bunch of computation without blocking, I'm not sure virtual threads are the tool you want to apply in the first place. They're meant to provide cheap blocking and cheap context switching. If you've got a job that's more CPU-bound than I/O bound, it's probably worth using OS threads instead.


This is primarily a change at the virtual machine level that will also benefit kotlin and other JVM based languages.


Threads as an API have proven themselves over and over to be unfit for human usage (regardless of whether they're M:N or 1:1), and higher-level async libraries have already had M:N scheduling for years. This helps absolutely nobody.

Why does Oracle keep throwing money down this pit?


I am lost here. The non-goals mentioned here are actually worth taking as goals to be fulfilled, so that every other language and library ecosystem that runs on the JVM would benefit.

Non-Goals:

- It is not a goal to change the existing implementation of platform threads, that represent Operating System (OS) threads.

- It is not a goal to automatically convert existing thread construction to virtual threads.

- It is not a goal to change the Java Memory Model.

- It is not a goal to add new inter-thread communication mechanisms.

- It is not a goal to offer a new data-parallelism construct in addition to parallel streams.


Those all make sense:

- Native threads are great. They have a lot of uses; why touch them?

- Automatic conversion will break a lot of stuff and eliminate some of the benefits of native threads. Being able to use an API/implementation that is *almost* the same is a huge benefit

- The memory model is a completely separate thing that Java worked on for decades. You don't want to touch that and you don't need to

- Inter-thread communication is a separate thing. There's no reason to go after that. Same is true for the data parallelism stuff.

The focus should be on hitting the 98% of what matters and getting it out without breaking everything we already have.


While automatic conversion might be off the table, any library that expects (or allows) the caller to supply the ThreadFactory used to create threads should make switching over pretty easy. Just construct the factory with `Thread.ofVirtual().factory()` and provide it to the library. And if it is code you own spawning threads, then switching over should be a pretty mechanical process.
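For example (newThreadPerTaskExecutor is per the preview API; the receiving library is whatever you already use that accepts a ThreadFactory):

    // One factory swap, and everything built on it runs on virtual threads.
    ThreadFactory factory = Thread.ofVirtual().name("worker-", 0).factory();
    ExecutorService pool  = Executors.newThreadPerTaskExecutor(factory);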

On the memory model: I'd argue that technically the memory model would be implicitly updated to treat these virtual threads the same as classic threads. Which is a very minor update. Beyond that, in order to maintain that memory model, the implementation will need to ensure proper barriers are used on pausing and resuming virtual threads, so that a virtual thread resumed on a different physical core is guaranteed to see any of its previous writes (as would be expected within a "single thread").


> It is not a goal to change the existing implementation of platform threads, that represent Operating System (OS) threads.

> It is not a goal to automatically convert existing thread construction to virtual threads.

Key word: automatically. If you want an OS thread, you should be able to get one. But if you simply spawn a task into a virtual thread, that task should behave basically the same as in a platform thread, but with less context switching overhead (and less strict claiming of OS resources).

The locus of choice is at whichever part of the code constructs the threads to begin with; if you change it to use virtual threads, you shouldn't have to change anything else.

> It is not a goal to change the Java Memory Model.

Was there something you wanted changed? I've heard that OCaml's memory model is pretty stellar, but I don't know much about it myself, and I think a memory model is a sufficiently fundamental thing that changing it might cause backcompat issues -- but it depends on the change you'd like to see.

> It is not a goal to add new inter-thread communication mechanisms.

Is there something new you'd like to see? Supposedly, the existing inter-thread communication mechanisms will work just as well for virtual threads.
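e.g. a plain BlockingQueue between two virtual threads, unchanged (a small sketch; put/take park the virtual thread, not its carrier):

    BlockingQueue<String> queue = new ArrayBlockingQueue<>(16);

    Thread producer = Thread.startVirtualThread(() -> {
        try { queue.put("hello"); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    });
    Thread consumer = Thread.startVirtualThread(() -> {
        try { System.out.println(queue.take()); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    });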

> It is not a goal to offer a new data-parallelism construct in addition to parallel streams.

Virtual threads are explicitly task-parallelism constructs, so this makes sense. Maybe you would build data-parallel constructs on top of them, but it's not something the JVM would be directly responsible for here.


Non-goals does not mean that they are bad goals, only that this particular project will not try to reach them. It’s a good way to manage expectations and scope creep.


backwards compatibility?


[flagged]


I did not downvote, but I guess you got those mostly because it was more of a statement on your part instead of a genuine question.


Interestingly, Kotlin Coroutines have been available and in production for a LONG time now.

https://github.com/Kotlin/kotlinx.coroutines

In fact, Kotlin Coroutines are brilliant on the Android platform. We are talking severely memory- and CPU-constrained architectures here.

That said, Kotlin Coroutines are popularly used in production on server side - https://vertx.io/docs/vertx-lang-kotlin-coroutines/kotlin/

I doubt anyone would switch to Java Virtual Threads anytime soon, unless via Kotlin.

Maybe Kotlin will leverage this in its underlying infrastructure.


> I doubt anyone would switch to Java Virtual Threads anytime soon, unless via Kotlin.

Switching to virtual threads will, at least in the microservices I work with, involve changing a few lines (Executors.newCachedThreadPool() -> Executors.newVirtualThreadPerTaskExecutor(), modulo the final preview names).

Switching to Kotlin coroutines will involve re-writing a large part of our codebase, which is why it hasn't been done.

It would surprise me if people didn't switch to Java Virtual Threads by the thousands when it comes out.
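i.e. roughly this, modulo the final preview names:

    // Before: a pool of platform threads.
    ExecutorService pool = Executors.newCachedThreadPool();

    // After: one cheap virtual thread per task, same ExecutorService interface.
    ExecutorService perTask = Executors.newVirtualThreadPerTaskExecutor();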


No, you can use a few lines of Kotlin coroutines in an otherwise Java codebase.


The Java code being called, or calling, the Kotlin coroutine code also needs to be async in order to reap all the benefits. A Java codebase doing a lot of blocking calls can't just call some random Kotlin coroutine code and expect benefits, it has to be async all the way.


>I doubt anyone would switch to Java Virtual Threads anytime soon, unless via Kotlin.

I think you're mistaken. The Java community is just so tremendously much larger than the Kotlin one that this will have more users within months. I really liked what Kotlin was doing, but since Java got lambdas, they have closed the biggest gaps that drove migration.


Lambdas seem like one of the least interesting features of Kotlin. There's so much more going on in the Kotlin world than lambdas.

it seems to me that kotlin is leading the way and java is following, but the delta between them is quite large and possibly growing.


The delta between Java and Kotlin is definitely decreasing.

What Kotliners used to say about Java back then:

- no data classes. Now there's records in Java. Check.

- no lightweight threads. Boom, project Loom came in.

- no support for FP. Now there are lambdas and :: operator in Java. Also, Stream API.

- no type inference. Then came 'var'.

And some minor things that are in Java now as well: pattern matching; sealed classes as a way to get ADTs in the foreseeable future; kind of immutable collections; etc.

Yeah, you can argue that these features are not-so-native and painful to use in Java compared to other languages. And, as one who programs in Scala, I 100% agree with you here. But you can't deny that people used to switch to Kotlin and Scala without hesitation because, summing up all the switching pros and cons, it was worth it. Nowadays, this is not the case anymore. You don't have to take the risk of adopting a new language and technology stack.

Delta is objectively decreasing. For sure.


Java's philosophy is always to follow. It lets other languages experiment and then it implements those features after they've been proven. It has actual backwards compatibility guarantees and it has to support whatever feature it implements for eternity.


Aside from nullable types, I don't see any other feature that Java could borrow from Kotlin today to improve. Java looks up to Scala and Clojure to get ideas; there are some strong functional programming ecosystems that were bred by these languages, like Typelevel, ZIO, actor systems like Akka, Datalog queries, contract-based systems like spec2, Stateflow testing, matcher-combinators: generally things that Clojure does differently from the evangelical notion that type systems are the be-all and end-all. The rest of the improvements are on the JVM, like primitive objects (value types) and generic specialization.


I'm not disagreeing with what you're saying here — at all. But what I am claiming is that when it comes to interest for and adoption of Kotlin within the JVM community that pull is smaller and shrinking. And that lambdas in Java are a big part of the reason for that.


Kotlin never had any real pull on the behemoth that is Java. It’s an insignificant language (outside android, where java is not the real deal to begin with). If anything, JS and C# and other major high level languages could hurt java, but fortunately it has been developing in a fast pace, implementing whatever is worthy and was experimented by other languages.


And even on Android, many open-source projects decide to use Java so they can draw on a wider pool of programmers (Signal, etc). C#'s usage is only decreasing; the runtime is just not there. C# has always been known in the enterprise as the language that skipped leg day. It's impressive how many sensible features they've added in a short amount of time (though some will say, even a few of the Microsoft devs, that async/await was kind of rushed), but the runtime has only received real attention recently, lagging behind Java by approximately 10 years of research. JS is just JS; hate it or love it, it is here to stay, no point in arguing about this. Put some makeup on it to make things bearable with TypeScript and that's it.


When is Kotlin creating their own VM instead of following Java?


There's no point in creating a new VM, given that you can target almost everything by supporting the JVM, CLR, BEAM, V8 (JS transpilation) or LLVM/WASM for C-like languages. All you need is an optimized, general-purpose IR that can spit out bytecode for each of those, plus JS in the case of V8.

To be fair, I don't think that a language like Kotlin will be able to accomplish this; it is just too complex. I believe it takes languages with small cores, like Clojure with its monumental extension power through macros, or Eff or Koka, which "let you define advanced control abstractions, like exceptions, async/await, iterators, parsers, ambient state, or probabilistic programs, as a user library in a typed and composable way", to fit such varied runtimes. Like, provide the IR and the building blocks, and let ecosystems develop.

Some will argue that you are then moving the meaning of polyglot from the programming-language level to the library level, which is true, as seen in the Scala community: the divide between better-Java (Play), a different kind of OOP (Akka), a Haskell-like ecosystem (Typelevel) and an idiomatic Haskell-like ecosystem (ZIO). So you get Java, Erlang and different flavors of Haskell in one language, which is Scala :). Many say that Scala is big and messy, but in fact the ecosystems spawned by the core of the language made it like that. Scala's spec is smaller than Java's. I would argue that Scala is less complex than Kotlin, but this is highly opinionated.



