Am I reading this correctly? Looking at the benchmark code, why not compare against the default Tornado setup, which is to fork one process per core? As it stands, STM Tornado is allowed to use multiple cores in this benchmark, but vanilla Tornado is not:
from tornado.httpserver import HTTPServer
from tornado.ioloop import IOLoop
from tornado.web import Application

port = 8888  # assumed here for completeness
http_server = HTTPServer(Application(), xheaders=True)
http_server.bind(port)
http_server.start(0)  # forks one sub-process per core
IOLoop.instance().start()
The problem with multiple processes is the "share nothing" model. It works for some problems, but it blatantly fails for a whole variety of other problems. STM tries to address those problems where "share nothing" does not work, e.g. because there is interesting data to be shared (albeit with few conflicts) or the memory overhead of N processes is just too much.
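To make that concrete, here is a minimal sketch (my illustration, not from the post) of the "interesting shared data" case: several ordinary threads updating one shared dict. Under CPython the GIL serializes them; pypy-stm aims to run them in parallel and resolve the rare conflicting writes as transactions, which N forked processes cannot give you without explicit IPC.

import threading

shared = {}  # the shared data that a "share nothing" model can't offer

def worker(offset):
    # disjoint keys per thread, so transactional conflicts are rare
    for i in range(offset, 400000, 4):
        shared[i] = i * i

threads = [threading.Thread(target=worker, args=(n,)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(shared))  # 400000 entries, potentially built on 4 cores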
I recently read this paper about Julia (http://arxiv.org/abs/1411.1607). Now when I hear about PyPy I can't help but think about this quote from that paper:
New users also want a quick explanation as to why Julia is fast, and whether somehow the same “magic dust” could also be sprinkled on their traditional scientific computing language. [...] Julia is fast because we, the designers, developed it that way for us, the users. Performance is fragile, like accuracy, one arithmetic error can ruin an entire otherwise correct computation. We do not believe that a language can be designed for the human, and retrofitted for the computer. Rather a language must be designed from the start for the human and the computer.
To me, Python and Ruby are both perfect examples of languages that were designed for the human, and ever since have seen extensive effort to retrofit them for fast execution by computers.
I respect the work of the PyPy team, particularly given the raving reviews I've seen lately of how RPython is a boon to language designers who can use it to prototype their languages and get a decently-performing VM in not very much time: http://tratt.net/laurie/blog/entries/fast_enough_vms_in_fast...
But I can't help but think that languages like Python and Ruby will start to fall to languages like Swift, Julia, Go, etc. that were designed with performance in mind. I'm not saying this will happen soon, but these languages are showing that you can have your cake and eat it too.
I'm not sure how JavaScript and Lua fit into this analysis. They weren't specifically designed for performance, but have been very successfully optimized. Lua is a very simple language and Mike Pall is a genius, so LuaJIT has been very successful at speeding up Lua. JavaScript is a little more complicated, but an immense amount of resources has been poured into optimizing it, and those efforts have also been quite successful at making it fast.
I will want to read that paper in its entirety later, but I think they are wrong. Many languages that were designed primarily, or even exclusively, for humans are now fast (see Lisp for the latter).
Python is probably the language specification most impaired on multi-core performance, because its threading semantics were specified so strongly. PyPy is now making a compelling argument that even that is not an insurmountable barrier.
If by "retrofitted" they allow backwards-compatible changes to the languages in question (e.g. optional type annotations for dynamic languages) then I would estimate that a language not designed at all with performance in mind would suffer less than a 2x penalty given enough engineering effort.
I don't think that's quite true. A lot of languages are a mix of designed-for-humans and designed-for-machine. They do aim to be higher-level than machine code, but it's quite common to have design decisions at least partly driven by considerations from the compiler side as well. Not necessarily only designing for efficient execution (though that is one); other design-for-machine considerations can include ease of parsing and ease of compiler implementation.
Python was designed to be a glue language. It does a terrific job at connecting together different pieces of C code. The more I use it, the more I realize that this is Python's calling. I envision a future where I code numerical stuff in Julia, and use Python to connect it to my main application, my GUI, web frontends and file juggling code.
Personally, I have found A* particularly difficult to scale across cores because it's never a shared-nothing problem, for two reasons. One, each core has to know about the other cores' search space and avoid it (to avoid duplicating effort), so you will contend on some sort of 'visited node' cache. Two, the graph you're building must itself be shared, obvs.
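To make the contention point concrete, here is a minimal sketch (my own illustration; a parallel breadth-first search standing in for A*, with a toy graph): every worker must check and update the shared visited set, so every node expansion goes through one lock.

import threading
from collections import deque

visited = set()                  # the shared 'visited node' cache
visited_lock = threading.Lock()

def search_from(graph, start):
    frontier = deque([start])
    while frontier:
        node = frontier.popleft()
        with visited_lock:            # every expansion contends here
            if node in visited:
                continue              # another core already did this node
            visited.add(node)
        frontier.extend(graph[node])  # the graph itself is shared too

graph = {0: [1, 2], 1: [3], 2: [3], 3: []}
threads = [threading.Thread(target=search_from, args=(graph, s))
           for s in (0, 1)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(visited))  # [0, 1, 2, 3]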
Are most of the interesting problems to solve always limited by Amdahl's Law? Will we never see the gains of single-core speed we saw in the last century again?
> One, each core has to know about the other cores' search space and avoid it (to avoid duplicating effort), so you will contend on some sort of 'visited node' cache.
Just make the visited node cache public and immutable.
> Two, the graph you're building must itself be shared, obvs.
If the graph is immutable then there's zero problem with it being shared.
You can't make the cache immutable, because then it will be empty at the start and stay that way ;)
The cache has to mutate and be shared, as that's the work-completed list. As each thread completes a bit of work (visits a node), it needs to communicate that to the other threads.
:+= returns a copy of this sequence with an element appended.
Meaning the cache is no longer shared, as all threads end up with a thread-local cache. You can't update shared, changing state and have immutability at the same time.
Immutability is great when one can have it, but sometimes it's not possible. Shared changing state is something to be avoided as much as possible, but sometimes we need it.
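A tiny illustration of the preceding point (mine, not from the thread): "updating" an immutable structure only produces a copy bound to a local name, so other threads never see the addition.

cache = frozenset()  # shared, immutable 'visited node' cache

def visit(local_cache, node):
    return local_cache | {node}  # returns a copy; 'cache' is unchanged

c1 = visit(cache, "a")  # thread 1's view
c2 = visit(cache, "b")  # thread 2's view
print(cache, c1, c2)    # frozenset() frozenset({'a'}) frozenset({'b'})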
If it's public and immutable, each core will see only its own cache, which would be pointless. I like your thinking, though; perhaps there can be a way to always pass the latest 'visited node' cache around in a timely way.
Note: if anyone was wondering what STM is, it stands for Software Transactional Memory. The Read the Docs page gives a good overview[1] of what pypy-stm is.
Better that it be the team account, to avoid having donors try to decide "who contributes how much"; let the team decide. Also, please offer it as an alternative to PayPal on the project website.
We can't really be creating accounts everywhere for minor donations. I support your sentiment, but the official PyPy bookkeeping has to be done in a proper way via the Software Freedom Conservancy, so going through all those services for a few $$$ is simply not worth it.
Why doubt the amount that can be received from Gratipay? What would you need to see to consider using it for PyPy funding? Would a promise of, say, $500/week be enough to make it worth the bother?
But I should say that there is still too much overhead in using STM; you will still be able to very easily (and by a large margin) outperform STM-4 by running 4 instances of Tornado with HAProxy or some other lightweight router on top. A comparison graph for this should have been the benchmark.
I like that they instrumented the STM code enough that you can debug slowdowns when writing your code. Never underestimate the power of well-instrumented languages.
Python's feature set is basically what's easy to do in a naive interpreter. Everything is a dictionary. Anything can be changed from anywhere. With "setattr", one thread can patch objects out from under another. It's elegant, and very difficult to speed up. Google tried, with van Rossum on board. Their "Unladen Swallow" JIT compiler project crashed and burned.
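A minimal illustration of that dynamism (my own, not from the comment): methods are just dictionary entries, and any code, on any thread, can rebind them at runtime.

class Worker:
    def step(self):
        return "original"

w = Worker()
print(w.step())                  # original

# any code, running on any thread, can patch the class at runtime:
Worker.step = lambda self: "patched"
print(w.step())                  # patched
print(Worker.__dict__["step"])   # the method is just a dict entry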
The PyPy group has made a fast Python compiler/interpreter/JIT system. It's really hard. The initial funding from the European Union got them started, but wasn't enough. They really try to handle all the hard cases. This requires two interpreters and a JIT. They have to handle "oh no, someone patched object A from thread R, invalidating code that's running in thread S". There's a "backup interpreter" that kicks in for hard cases, and once control is out of the area in trouble, the JIT can recompile it. (This is an old and oversimplified description.)
This transactional memory thing is very clever. It has to separate things at run time that probably should have been separated at compile time, of course. It's impressive that they can get it to work. It's a lot like how a superscalar CPU works, including transaction commit and backup at the retirement unit.
Python gets into this mess because, like C and C++, the language doesn't really know about concurrency. (Threads came late to UNIX, and C predates threads. So C has an excuse for backing into concurrency.) Python has the C model of concurrency; at the user level, it's treated as a library issue. Internally, though, it needs a lot of locking, because there's so much mutable state in the interpreter.
It would be a lot easier if the language were restricted a little. But then It Wouldn't Be Python(tm). The price of this is huge complexity layered on a simple model, and probably years of obscure bugs in PyPy.
> Python has the C model of concurrency; at the user level, it's treated as a library issue.
Other than Erlang, which modern language doesn't?
Java has e.g. the "synchronized" keyword, but I regard it as merely syntactic sugar over a standard library implementing threads; semantically, it still basically has the C model.
Go has channels and stuff, but it's still library-level (in fact, it's basically syntactic sugar over the Plan9/Inferno/Alef/DontRememberName standard library primitives).
There were other OSes with threads and co-routines being explored as designs while UNIX was being developed at AT&T.
As for the rest I agree with you.
Personally I don't have any use for Python besides the occasional shell script, but as a user of applications written in Python, I would like them to run fast.
PyPy does not really try to do that. CPython is great for a lot of things: it's small, easy to install on a myriad of platforms, and is, after all, Python. PyPy instead tries to push what's possible to do with Python as a language and with dynamic language interpreters. You can do real-time image processing in pure Python with PyPy, now STM, fast numerics in the future, etc. The goal is to expand the Python ecosystem, not to "replace" CPython.
I don't think this is applicable to Python. Only so many developers can work on CPython without stepping on toes. This led to the decision to treat CPython as the implementation standard rather than an experimental language, while PyPy led the effort to expand with a different team of developers. This is a good decision, because there are still more than enough developers to maintain a stable, full-featured CPython that PyPy can conform to (to varying degrees). There aren't any competing standard libraries, for instance, and it's made very clear which libraries work on which implementations.
Maybe if there were competing implementations of Ruby, it would still be popular outside the Rails community. But it seems as if the vast majority of Ruby core developers work on the mainline implementation, or forks thereof.
Generally speaking, for a language like Python, having only one implementation is considered a bad sign. Anything serious has multiple implementations.
CPython is hurting the ecosystem by being slow. I'll let you in on a little secret. People used to using C, C++, D, even Java and C#, laugh at the slowness of languages like Python and Ruby. Blah, blah, blah productivity gains. We laugh. We think, "I could write this in Java and go to production with three servers or I could write it in Python and use eleven."
People don't like to say it to your face, but Python is often derided and considered to be pretty silly. Oh, and its syntax is like Lego. Neat at first until you want something like a multiline lambda.
In most cases I would rather pay for 7 more servers than for the 10,000 extra lines of code your C++ or Java implementation is going to cost me. Hardware is cheap. Engineers are expensive.
C, C++, D, Java and C# are hurting the ecosystem by being slow. I'll let you in on a little secret. People used to coding assembly laugh at the slowness of languages like C and C++. Blah, blah, blah productivity gains. We laugh. We think, "I could write this in assembly and go to production on an Arduino or I could write it in C and have to use expensive servers."
People don't like to say it to your face, but C is often derided and considered to be pretty silly. Oh, and its syntax is like Lego. Neat at first until you want to understand what something is a pointer to.
The difference is that you can achieve similar performance to assembly by writing in those other languages, while you cannot achieve anything close to similar performance with Python.
It used to be the assumption that there was a direct tradeoff between performance and convenience/productivity of a programming language. I think newer languages are showing that this doesn't have to be true, at least not to nearly the same degree.
It was warranted. CPython needs to be derided and PyPy needs to keep up the good work until Python isn't unreasonably slow. Python should be humiliated into being fast or just go away.
PyPy is a project to bring some optimization to Python. Basically make Python run faster. It is also effectively another implementation of Python. CPython is the default one (the one you download at python.org). There is also Jython (running Python on the JVM), and PyPy, and a few others.
NodeJS is the marriage of the V8 JavaScript interpreter and JIT with an asynchronous IO library (libuv) + a large ecosystem of modules.
You might want to compare nodejs with PyPy+Tornado or with PyPy+eventlet. Read about STM and the idea behind it. STM lets you take advantage of multiple cores; nodejs is single-threaded. In practice nodejs might currently be faster just because V8 is very good and, depending on the workload, if most of the stuff it does is just proxying data from one stream to another, it might do pretty well. But if you start doing a large number of concurrent requests where each request has to do some logic, PyPy might come out on top.
In general, anywhere with complicated business logic or a large number of steps needed in the backend to handle requests, I wouldn't use nodejs. I never liked the callback/errback paradigm for large concurrent applications. It works for demos and short web tutorials; in practice, I don't like how it looks. I like green threads and lightweight processes better.
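For contrast, a sketch of the green-thread style in question (assuming eventlet is installed; the handler and its sleep are stand-ins for real request logic): each handler reads as straight-line code instead of nested callbacks.

import eventlet

def handle(request_id):
    eventlet.sleep(0.01)  # stands in for a blocking I/O call
    return "handled request %d" % request_id

pool = eventlet.GreenPool(size=100)  # up to 100 concurrent green threads
for result in pool.imap(handle, range(5)):
    print(result)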
"In practice nodejs might be faster currently just because V8 is very good" and I presume you claim PyPy is not so good? Well if so, I would really like to say [citation needed], especially for workloads and not say computer language shootout, which has little to do with performance of doing any actual workload.
Of course not, Maciej ;-) PyPy is awesome; thank you and the whole team, you have been doing a great job so far. Just looking at the performance graphs and the speedups gained over the years, it looks very impressive.
Yeah, I don't have a citation, but from experience I have noticed that where there are only a few steps in each request's processing (think proxies), solutions based on event loops (epoll, kqueue and friends) can outperform those that spawn a thread/process/context. For example, haproxy is certainly a very well done, fast proxy; it is single-threaded, and that seems to work for it.
Again sorry for misunderstanding, I was just speculating without any benchmarks or even particular applications in mind.
Cheeky comment aside, PyPy is an alternative interpreter for Python and, more broadly, a meta-framework that makes writing a tracing JIT for the language of your choice easy.
It's much faster than regular Python. The only downside (which, admittedly, is very, very big) is that the PyPy developers released their own FFI library for interfacing with C code (which works brilliantly), but using ctypes and the CPython C API is very kludgy. This means that a lot of very important Python libraries which are basically thin wrappers around C/Fortran code (SciPy, NumPy, etc.) are not really working.
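For reference, a minimal sketch of that FFI library, cffi (the library name below assumes glibc Linux; it varies by platform):

from cffi import FFI

ffi = FFI()
ffi.cdef("double sqrt(double x);")  # plain C declaration, parsed at runtime
libm = ffi.dlopen("libm.so.6")      # the C math library on glibc Linux
print(libm.sqrt(2.0))               # 1.4142135623730951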
There is some effort around reimplementing NumPy in PyPy, but it's going very slowly.
Edit:
PyPy is also a heroic effort by a small team of developers who receive some donations. JS is probably the language that has received the most money and attention toward making it run fast, from Google/Mozilla, etc.; in terms of results/money, PyPy is incredible.
To be a bit more specific, PyPy can be much faster for certain use cases. For others it may not be, or it may even be significantly slower; writing a JIT for Python is not an easy task.
Sigh, I'll give you the only honest answer it seems you will get here. Node.js runs JavaScript faster than anything runs Python, PyPy included. If you want speed and you like JavaScript just as much as Python, then use Node.js. When I say "faster" I mean you will use fewer servers/cores for your application with Node.js than you will with CPython or PyPy. V8 is just a better and faster VM for its target language.
"Node.js runs JavaScript faster than anything runs Python, PyPy included" that's a [citation needed] right here. Do you have some facts that I don't happen to have? Please share. That said, I don't care about a recursive fibonacci or computer language shootout problems.
Benchmarks are all you need to know unless a language runtime has IO problems. Fewer instructions and less memory mean fewer servers. Benchmarks are really good at predicting general performance. Languages with slow benchmarks need more servers to run your app. Languages with fast benchmarks need far fewer servers to run your app. This fact is so obvious and testable that I don't know why slow-language people bother throwing up the argument that benchmarks aren't everything.
Repeat after me: languages don't have speed.
Maybe to someone who is just now getting into the industry, but it wasn't long ago JS had no speed.
Python also varies in speed over time and implementation. Implementations have speed. There are reasons that people use "slow languages". It's not just developer happiness. You really do have to look at performance holistically.
If you want to speak to microbenchmarks, I can. For example, CPython's standard library JSON processing is done in C. It's very fast. As of Go 1.2, PyPy blew the doors off Go in my tests processing 10,000 JSON records. CPython 2 and 3 beat Go in my tests as well. What does this mean? Does it mean Go is a slow language? Well, it's one microbenchmark vs. another. It really doesn't mean much to your application.
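For the curious, a minimal sketch of that kind of microbenchmark (the 10,000-record count is from the comment; the record shape and timing harness are my own assumptions):

import json
import timeit

# 10,000 small records, serialized once, then decoded repeatedly; on
# CPython the json hot path is C, which is why it is hard to beat.
records = [{"id": i, "name": "user%d" % i, "score": i * 0.5}
           for i in range(10000)]
payload = json.dumps(records)

print(timeit.timeit(lambda: json.loads(payload), number=100))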
"I don't know why slow-language people bother throwing up the argument that benchmarks aren't everything."
I'd go even further than 'aren't everything'. Benchmarks that aren't your application do not mean anything.
Many are ignorant of how CPython is even built and are shocked when I show them how fast many standard library modules are. It's just not so simple. Don't believe the hype.
What benchmarks? Benchmarks are specific to the application area. N-body simulations are not good benchmarks for a concurrent application that serves web requests, connects to databases, and processes credit card data.
But that wouldn't be a true statement. Google's engineers didn't laugh at Node.js for using V8 due to arrogance. V8 is limited to a single CPU thread per instance. It's great for I/O and a typical webpage, but a CPU task stops everything. PyPy with STM does not have this limitation. Once this project is out of beta, V8 wouldn't compare to PyPy/STM. A better comparison would be the JVM or CLR.
I think your comment is moderately off-topic. This is really not about Tornado at all; rather, it's that you can take a medium-sized program and get STM in PyPy not to crash.