If you like this remember to donate some money to the project on http://pypy.org/ [I am not in any way affiliated with PyPy, only supporting the project].
Whilst I applaud the efforts of PyPy, I've always been disappointed with the performance benefit it's given me with real world code.
Specifically:
* PyPy dict performance is very slow (slower than CPython's). A lot of the CPU-intensive Python code I need to write processes data (from things like logs) into data structures built on dicts. PyPy is rarely more than 10% faster and is sometimes slower.
* The memory management/GC can be bad. I've seen the same code that runs fine on CPython end up using excessive amounts of memory (causing out-of-memory issues, etc.) under PyPy. Again, this normally involves complicated data structures.
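For context, the dict-heavy log-processing pattern described above looks something like this (a hypothetical sketch, not the poster's actual workload; the function and field names are made up):

```python
from collections import defaultdict

def count_status_by_host(lines):
    """Aggregate log lines of the form 'host status bytes' into nested dicts."""
    stats = defaultdict(lambda: defaultdict(int))
    for line in lines:
        host, status, _ = line.split()
        stats[host][status] += 1
    return stats

log = ["web1 200 512", "web1 404 128", "web2 200 2048"]
counts = count_status_by_host(log)
```

Loops like this spend almost all their time in dict lookups and string hashing, which is exactly where the complaint says PyPy's advantage over CPython shrinks.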
On about 10 occasions now I've had CPU-bound Python tasks, tried them with PyPy, and never seen more than a 20% performance improvement. That's a big contrast with the benchmarks.
Is it just me? Or have other people had similar experiences?
The benchmarks are a little misleading for some types of code. They don't measure single-run code well; in those cases PyPy doesn't do as well, because the JIT never warms up.
The PyPy interpreter itself is slower than CPython's, so code that doesn't JIT well runs slower under PyPy. The Firefox JS engine now has a fast interpreter, a quick baseline JIT, and a more optimizing one. For now CPython has the better baseline performance; if PyPy manages to improve its interpreter, its baseline performance will improve too.
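The warmup effect is easy to see by timing the first few calls of a hot function against later ones; a rough sketch (actual numbers vary wildly by machine, and under CPython the timings are roughly flat since there is no JIT to warm up):

```python
import time

def work():
    # A small, JIT-friendly hot loop.
    return sum(i * i for i in range(100_000))

timings = []
for _ in range(5):
    start = time.perf_counter()
    work()
    timings.append(time.perf_counter() - start)

# Under a tracing JIT like PyPy's, the first iteration is typically the
# slowest (interpretation + trace compilation); later ones run the
# compiled trace. A single-run benchmark only ever sees the slow case.
```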
The memory management in CPython is much more predictable and deterministic, since it uses reference counting rather than a tracing GC. PyPy's GC may be faster in many circumstances, however. IO in PyPy can be slower if the GC is unhappy with the way you're doing it, especially if you're leaving files open or reading in different-sized chunks of data.
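One practical consequence: code that leans on CPython's refcounting to close files promptly can hold descriptors open much longer under PyPy's GC. A context manager makes the cleanup explicit and deterministic on every implementation (minimal illustration, using a scratch temp file):

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)

# Relying on refcounting: CPython closes the file the moment the last
# reference disappears; a tracing GC may keep it open until collection.
f = open(path, "w")
f.write("data")
del f  # closed immediately on CPython, not necessarily on PyPy

# Deterministic on every implementation:
with open(path, "w") as g:
    g.write("data")

os.remove(path)
```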
The new PyPy release does speed up a few types of real-world code, however. XML processing, DB processing, and stackless-style async code (eventlet and greenlet) are three areas that have improved with this release. NumPy-based code is another area where PyPy has gotten better.
In short, PyPy still has many areas where it could improve... but with each release it gets better at more types of real code :)
For run-once code, or latency- or memory-sensitive code, the benchmarks may be a bit misleading.
Have you used PyPy recently? I've found that memory usage in particular is much better as of around 1.9, compared to previous releases. Still worse than CPython, for sure, but some of my code runs around 10x faster under PyPy (it all depends on what I'm doing, though; this stuff is numerically heavy).
I've had exactly the same problem. I have tree-manipulation AI problems which run for > 30 minutes (which seems like an ideal case for PyPy, before I rewrite them in C++). PyPy is almost always slower, and never more than about 15% faster, whereas a simple line-by-line C++ rewrite can be 20x faster.
Python is a vast language. It's very hard to know upfront what sort of patterns people use - if you don't talk to us, don't post stuff on the bug tracker, don't do anything, it's your own fault. PyPy is known to speed up real-world code to various degrees - sometimes 10x, sometimes not at all - it all really depends. We would be happy to help you with your problem, but if the only thing you do is complain on Hacker News, well, too bad, we can't help you.
I am sympathetic because PyPy is very good and is improving fast. but...
That doesn't change how the PyPy project tends to represent itself, which almost always comes across as something like "6x speedup for everything (excluding JIT warmup)". If you want everyone to adopt PyPy instead of CPython then it is part of your job to find the cases where PyPy is not actually faster rather than saying it is the user's fault. And it is not your job to select only benchmarks which tell the story you want.
If the difference between interpreters is nuanced then that nuance should be expressed so that people can make intelligent decisions rather than dismissing one or another interpreter as "slow".
FWIW, it's trivial to get benchmarks into their comparisons, provided they aren't microbenchmarks, at least in my experience. Saying they pick favourable benchmarks is untrue: the majority were added because PyPy performed badly on them, and they've improved as a result of being included, thus making them mostly benchmarks PyPy does well at.
Out of curiosity, was your Python code relying on dicts or using classes? It looks like PyPy is better at optimizing classes than dictionaries.
Which might explain to some degree why people aren't seeing the perf improvement they expect. A lot of existing Python code relies on dict manipulation, which gives decent perf on CPython, whereas classes would play nicer with PyPy.
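To make that concrete, this is the kind of rewrite being suggested (names are hypothetical): replacing throwaway record dicts with a small class. Under PyPy's JIT, instances with fixed attributes tend to compile down to something closer to struct access, and `__slots__` also trims per-instance memory on CPython:

```python
# Dict-based record, common in existing Python code:
def make_record_dict(host, status):
    return {"host": host, "status": status, "count": 0}

# Class-based equivalent with a fixed attribute layout:
class Record:
    __slots__ = ("host", "status", "count")

    def __init__(self, host, status):
        self.host = host
        self.status = status
        self.count = 0

d = make_record_dict("web1", 200)
r = Record("web1", 200)
r.count += 1
```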
Does anyone have any performance comparisons to Perl? I've used Perl for over a decade. Just started using Python on a project. I've got a lot to learn but Python makes you feel like you've got it down pretty quickly.
One of the things that kept me on Perl was its great performance.
Interesting datum you used. That chart shows that, on those compilers, on those programs, with those test data, Perl and CPython are VERY similar in speed. If you compare your link (Perl / CPython) with the inverse (CPython / Perl), you almost can't tell the graphs apart.
Perl is famous for its regex performance. For many years people have compared Python, Ruby, and Perl performance, and usually Perl has been the fastest, followed by Python, then Ruby. I've seen many posts like this:
I think regex performance of Perl is nothing extraordinary. What it is famous for is pushing the scope of matching beyond regular languages.
Of all the common scripting languages, I think Tcl uses a different regex-matching algorithm and is considerably faster than Perl, especially on longer strings. Let me find some corroborating docs.
Exactly. Perl regexes have _poor_ performance in terms of running time, but good usability, including by squeezing non-regular features into their allegedly-regular expressions, and also things like numerous and flexible character classes, consistent behaviour for escapes, etc.
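Backreferences are the classic example of such a non-regular feature: a true regular expression (one matchable by a finite automaton) can't require a later part of the string to equal an earlier capture, which is why engines that support them fall back to backtracking. Python's `re` module has them too:

```python
import re

# \1 must match exactly what group 1 matched - not expressible
# as a genuine regular language.
doubled = re.compile(r"^(\w+) \1$")

assert doubled.match("hello hello") is not None
assert doubled.match("hello world") is None
```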
I love vim, but jesus I can never remember which vim regex metacharacters need escaping to get their meta-meaning, and which need escaping to get their non-meta meaning.
I dunno, I haven't used it in the past, because...
1) "It is recommended to always keep the 'magic' option at the default setting, which is 'magic'. This avoids portability problems."
2) Nor do I want to type two extra characters in every damn regex!
So it only occurs to me right now that the right thing for me to do is this: the first instant I'm confused about whether \( means grouping or a literal paren, I should immediately start my pattern with \v or \V. No portability problems for plugins, no marginal cost on regexes that don't care, and a marginal benefit when it does matter.
I'm not aware of a direct comparison of Perl and PyPy. However, for a paper I co-authored, we put together an experiment with various languages and VMs, using the language shootout plus a few other benchmarks:
Cross-language benchmarks are inevitably synthetic, and synthetic benchmarks can only tell you so much. It's important not to over-interpret them. But they can give you a rough idea of what's going on, at least in some circumstances. If someone wanted to add Perl to the experiment set, I'd gladly accept a patch to the benchmarking suite.
The gevent/eventlet part of this has me pretty excited. We needed this in order to do some experimenting with PyPy without investing a good deal of time on a test conversion. I'm also interested to see if cffi is as good as I've heard it is (relative to ctypes).
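For reference, the ctypes side of that comparison looks like this (loading libm to call C's sqrt; library-name resolution is platform-dependent, so this is a POSIX-flavoured sketch with a glibc fallback). cffi's selling point is that you declare the signature as actual C source in a cdef instead of assembling it from ctypes type objects:

```python
import ctypes
import ctypes.util

# Locate and load the C math library (platform-dependent lookup).
libm = ctypes.CDLL(ctypes.util.find_library("m") or "libm.so.6")

# ctypes needs the signature spelled out with its own type objects;
# with cffi you would instead write: ffi.cdef("double sqrt(double);")
libm.sqrt.argtypes = [ctypes.c_double]
libm.sqrt.restype = ctypes.c_double

value = libm.sqrt(2.0)
```

Part of why cffi is interesting for PyPy is that calls declared this way can be seen through by the JIT, whereas ctypes call overhead has historically been high there.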
Can someone knowledgeable compare PyPy, Numba and Cython? I mostly use Cython. I tried Numba too; it has a very nice workflow when autojit works (when it doesn't, the error messages are pretty cryptic). With PyPy I don't understand why, for example, all that type information obtained by the JIT couldn't be used to produce something like Numba's specialized functions or Cython modules (noob question, but please answer).
For example, does PyPy always have to warm up, or can it store a "warmed up" version of a function (where the types of objects have been inferred)? I know this is not in keeping with the highly dynamic nature of Python, but not all functions are highly dynamic.
My understanding (from a previous HN discussion about javascript) is that since the output from a "just-in-time compiler" is actually machine code generated on the fly, it includes direct references to memory locations that are only valid for the lifetime of the process. So the output of a JIT is simply not in a format that can be saved and reloaded later.
I can't speak for Python, but I do know that the impact of translating code for Racket's JIT (provided by GNU lightning, not LLVM) is negligible (i.e. it takes less time than the time you gain by JITting).
Cython is not really a compiler; it's essentially a more convenient way of writing C/C++ extension modules.
It produces C/C++ glue code which is then compiled by an actual compiler like gcc. By tapping into CPython's existing C API it is simpler than PyPy and Numba, which work at a lower level, but on the other hand Cython does little optimization for you.
That's not quite right. Stock NumPy doesn't install, and nobody's working on that as far as I know, but they've reimplemented part of the NumPy API (at least the Python API, not the C one), and the subset they've implemented works fine. I have an application in production that uses it.
Quora did at some point, but it seems they replaced it (or some usage of it) with Scala. I can't find a pointer at the moment, but Google could help you.
In many cases you want to be careful about using JITted platforms. Many people forget that the memory requirements skyrocket. When you want to use a language on many different platforms, including embedded ones, you start to see how having a JIT interpreter as your reference platform can be disadvantageous. CPython isn't exactly slow either, especially when you consider that many "intensive" modules are written directly in C.
PyPy should stay as it is: an experiment for people who require more performance for certain workloads. Of course, having parts of your language written in C for CPython can hurt sometimes, when you can't easily use that functionality on other interpreters.
Almost agree, except for the "experiment" part. I don't want it to be an experiment; I want it to be a production-ready interpreter that I can use instead of CPython with minimal effort when I know I can trade memory for speed.
The people who haven't upgraded to Python 3 in their projects are probably mostly people with dependencies on py2.7 libraries.
Happened to me in my last project.
I started using PyPy until a bug in PyPy prevented me from using an important package.
Switched to Python 3 until some other dependency wasn't available.
So it's a pity, but I was more or less forced to use py2.7 (short of putting major effort into 3rd-party libraries).
I hope that that never happens. CPython is still pretty readable, and reference implementations should be simple and easy to learn from.
Actually, I'd like it if somebody hacked up a very very simple and naive Python interpreter and nominated that for the reference implementation instead.
No, it isn't. PyPy's code is very poorly commented and documented, and there's a lot going on inside the same codebase. So far, it feels like the only people who hack on it are the ones who wrote it. This needs to be sorted out before we see mass adoption.
Readability is probably not the right term; I think he means easy to understand. It doesn't matter how elegant PyPy's code is, it'll surely be more complex than a straightforward interpreter.
I'm saying that it's big. Specialization requires special cases, which require more lines of code. One should be able to reason about the entirety of a Python implementation, and that's very difficult to do for PyPy.