PyPy 2.0 Released (morepypy.blogspot.com)
280 points by craigkerstiens on May 9, 2013 | hide | past | favorite | 76 comments


If you like this, remember to donate some money to the project at http://pypy.org/ [I am not in any way affiliated with PyPy, only supporting the project].


Whilst I applaud the efforts of PyPy, I've always been disappointed by the performance benefit it gives me on real-world code.

Specifically:

* PyPy dict performance is very slow (slower than CPython). A lot of the CPU-intensive Python code I need to write processes data (from sources like logs) into data structures built on dicts. PyPy is rarely more than 10% faster and is sometimes slower.

* The memory management/GC can be bad. I've seen the same code that runs fine on CPython end up using excessive amounts of memory (causing out-of-memory issues, etc.) under PyPy. Again, this normally involves complicated data structures.
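To make the first point concrete, here's a made-up micro-workload of the kind I mean (hypothetical, not my actual code): log-like lines aggregated into nested dicts. Timing it under both `python` and `pypy` is an easy way to reproduce the comparison.

```python
import time
from collections import defaultdict

def aggregate(lines):
    # Per-host, per-status counters: the nested-dict shape that
    # log processing tends to build up.
    counts = defaultdict(lambda: defaultdict(int))
    for line in lines:
        host, status = line.split()
        counts[host][status] += 1
    return counts

lines = ["host%d %d" % (i % 100, 200 + (i % 3)) for i in range(100000)]
start = time.time()
result = aggregate(lines)
elapsed = time.time() - start
```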

On about ten occasions now I've had CPU-bound Python tasks that I've tried to run with PyPy, and I've never seen more than a 20% performance improvement, which is a big contrast with the benchmarks.

Is it just me? Or have other people had similar experiences?

[edit: typos]


The benchmarks are a little bit misleading for some types of code. They do not measure run-once code well; in those cases PyPy doesn't do as well, because the JIT hasn't warmed up.

The PyPy interpreter itself is slower than the CPython one, so code which doesn't JIT well will go slower with PyPy. The firefox JS engine now has a fast interpreter, a quick JIT, and a more optimizing one. This means the baseline performance is better in CPython, so if PyPy manages to make its interpreter better, baseline performance will improve.

The memory management in CPython is much more predictable and deterministic, since it uses reference counting rather than a tracing GC. The PyPy GC may be faster in many circumstances, however. IO in PyPy can sometimes be slower if the GC is unhappy with the way you're doing it, especially if you're leaving files open or reading in differently sized chunks of data.
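For example, the file-handling point: under a non-refcounting GC you can't rely on a file being closed promptly when it goes out of scope, so close it deterministically and read in a consistent chunk size. A minimal sketch (the sizes are arbitrary):

```python
import os
import tempfile

# Create a throwaway file to read back.
path = tempfile.mkstemp()[1]
with open(path, "wb") as f:
    f.write(b"x" * 1000000)

# The with-statement releases the descriptor immediately, without
# waiting for the GC; fixed-size chunks keep allocation regular.
total = 0
with open(path, "rb") as f:
    while True:
        chunk = f.read(65536)
        if not chunk:
            break
        total += len(chunk)
os.remove(path)
```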

The new PyPy release does speed up several types of real-world code, however. XML processing, DB access, and stackless async code (eventlet and greenlet) are three areas where PyPy has improved with this release, and numpy-based code is another.

In short, PyPy still has many areas where it could improve... but with each release it is getting better at more types of real code :)

For run-once code, and for latency- or memory-sensitive code, the benchmarks may be a bit misleading.


"The firefox JS engine now has a fast interpreter, a quick JIT, and a more optimizing one. This means the baseline performance is better in CPython"

Huh?


From what I understand, there are three stages:

- interpreted

- If run 'enough' times, the code is JIT compiled (fast compilation)

- If run 'enough' times again, it is JIT compiled with the most optimizing compiler (slow compilation)
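A toy sketch of that tiering idea (the thresholds are invented for illustration; real engines use profiling counters like this, but the details differ):

```python
# Hypothetical tier thresholds, not taken from any real engine.
BASELINE_THRESHOLD = 10
OPTIMIZED_THRESHOLD = 1000

class Function:
    """Tracks how often it's called and which tier it would run in."""
    def __init__(self):
        self.calls = 0
        self.tier = "interpreted"

    def invoke(self):
        self.calls += 1
        if self.calls >= OPTIMIZED_THRESHOLD:
            self.tier = "optimized"      # slow, heavily optimizing compile
        elif self.calls >= BASELINE_THRESHOLD:
            self.tier = "baseline-jit"   # quick, lightly optimizing compile

f = Function()
for _ in range(50):   # hot enough for the baseline JIT, not the full one
    f.invoke()
```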


The question is what the firefox JS interpreter has to do with baseline performance in CPython.


I've used PyPy in production with success. The task involved implementing a worker which needed to process Wikipedia data.

http://rz.scale-it.pl/2013/02/18/wikipedia_processing._PyPy_...

Also I'm using it with my web applications.


Have you used pypy recently? I've found that memory usage in particular is much better as of around 1.9, compared to previous releases. Still worse than CPython, for sure, but some of my code is around 10x faster under pypy (it all depends on what I'm doing, though; this stuff is numerically heavy).


I've had exactly the same problem. I have tree-manipulation AI problems which run for > 30 minutes (seemingly an ideal thing for pypy, before I rewrite them in C++). pypy is almost always slower, and never more than about 15% faster, whereas a simple line-by-line C++ rewrite can be 20x faster.


Python is a vast language. It's very hard to know up front what sort of patterns people use. If you don't talk to us, don't post anything on the bug tracker, don't do anything, it's your own fault. PyPy is known to speed up real-world code to various degrees, sometimes 10x, sometimes not at all; it really depends. We would be happy to help you with your problem, but if the only thing you do is complain on Hacker News, well, too bad, we can't help you.


I am sympathetic because PyPy is very good and is improving fast. but...

That doesn't change how the PyPy project tends to represent itself, which almost always comes across as something like "6x speedup for everything (excluding JIT warmup)". If you want everyone to adopt PyPy instead of CPython, then it is part of your job to find the cases where PyPy is not actually faster, rather than saying it is the user's fault. And it is not your job to select only benchmarks which tell the story you want.

If the difference between interpreters is nuanced then that nuance should be expressed so that people can make intelligent decisions rather than dismissing one or another interpreter as "slow".


FWIW, in my experience it's trivial to get benchmarks into their comparisons, provided they aren't microbenchmarks. Saying they pick favourable benchmarks is untrue: the majority were added because PyPy performed badly on them, and PyPy improved as a result of their being included, which is what makes them mostly benchmarks PyPy now does well on.


Please help PyPy by providing an example benchmark. The project is test- and data-driven, and the more data it has, the better it can get.


Out of curiosity, was your Python code relying on dicts or on structured classes? It looks like PyPy is better at optimizing classes than dictionaries, which might explain to some degree why people aren't seeing the perf improvement they expect. A lot of existing Python code relies on dict manipulation, which gives decent perf on CPython, where classes would play nicer with PyPy.
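To illustrate the difference (a hypothetical record type, made up for this comment): PyPy's JIT specializes instances with a fixed set of attributes, so the class version of the same data tends to JIT better than the dict version, while on CPython the gap is smaller.

```python
# The same record, two ways.
class Event(object):
    def __init__(self, host, status, bytes_sent):
        self.host = host
        self.status = status
        self.bytes_sent = bytes_sent

def total_dicts(events):
    # dict lookups: hash-based, harder for the JIT to specialize
    return sum(e["bytes_sent"] for e in events)

def total_objs(events):
    # attribute lookups: fixed layout, which PyPy can compile to
    # direct field reads
    return sum(e.bytes_sent for e in events)

dict_events = [{"host": "h", "status": 200, "bytes_sent": i} for i in range(1000)]
obj_events = [Event("h", 200, i) for i in range(1000)]
```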


I see at least a 10-fold speedup. In my project I have many lists of integer tuples, and I need to do a lot of comparison/arithmetic on them.


Does anyone have any performance comparisons to Perl? I've used Perl for over a decade. Just started using Python on a project. I've got a lot to learn but Python makes you feel like you've got it down pretty quickly.

One of the things that kept me on Perl was its great performance.

http://benchmarksgame.alioth.debian.org/u64q/perl.php

I'd like to go for the win, win. Is that 'FTWW?'


Interesting datum you used. That chart shows that, on those compilers, on those programs, with those test data, Perl and CPython are VERY similar in speed. If you compare your link (Perl / CPython) with the inverse (CPython / Perl), you almost can't tell the graphs apart.

http://benchmarksgame.alioth.debian.org/u64q/benchmark.php?t...

So CPython is about the same speed as Perl. PyPy's speed comparison suggests that PyPy is 2-20x faster than CPython. What else do you want to know?

Perl, btw, is not noted for its "great performance" in any context I've ever heard of.


Perl is famous for its regex performance. For many years people have compared Python, Ruby, and Perl performance, and it has usually been Perl as the fastest, followed by Python, then Ruby. I've seen many posts like this:

http://stackoverflow.com/questions/12793562/text-processing-...

These days, however, it appears that Python has really come into its own. I started using Sublime Text so that got me into trying out Python.


I think regex performance of Perl is nothing extraordinary. What it is famous for is pushing the scope of matching beyond regular languages.

Of all the common scripting languages, I think Tcl uses a different algorithm for regex matching and is considerably faster than Perl, especially on longer strings. Let me find some corroborating docs.

Ok this has some info http://swtch.com/~rsc/regexp/regexp1.html
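A quick way to see the backtracking problem that article describes (Python's re is a backtracking engine too, so it shows the same worst case; an automaton-based engine answers this in linear time):

```python
import re

# (a+)+$ can split a run of a's into exponentially many groupings,
# and a backtracking engine tries them all before reporting the
# non-match. Even 18 characters is noticeably slow; longer inputs
# blow up exponentially.
m = re.match(r"(a+)+$", "a" * 18 + "b")
```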


Exactly. Perl regexes have _poor_ performance in terms of running time, but good usability, including by squeezing non-regular features into their allegedly-regular expressions, and also things like numerous and flexible character classes, consistent behaviour for escapes, etc.

I love vim, but jesus I can never remember which vim regex metacharacters need escaping to get their meta-meaning, and which need escaping to get their non-meta meaning.



I sort of prefer \V, but same difference

I dunno, I haven't used it in the past, because...

1) "It is recommended to always keep the 'magic' option at the default setting, which is 'magic'. This avoids portability problems."

2) Nor do I want to type two extra characters in every damn regex!

So, it only occurs to me right now that the right thing for me to do is this: the first instant I'm confused about whether \( means grouping or literal paren, I should immediately start my pattern with \v or \V. No unportability of plugins, no marginal cost on regexes that don't care, marginal benefit when it does matter.


Yeah, that's basically what I do too. No \v on trivial regexes, \v on ones where there's more than one or two groups.


I haven't seen Perl vs. PyPy stats, but here are some CPython vs. PyPy benchmarks:

http://speed.pypy.org/


Here are posts that compare CPython 2.7, 3.3, and PyPy 1.9:

1. Web framework performance: http://mindref.blogspot.com/2012/09/python-fastest-web-frame...

2. Template engine performance: http://mindref.blogspot.com/2012/07/python-fastest-template....

It would be interesting to see how PyPy 1.9 compares to 2.0 in those benchmarks.


I'm not aware of a direct comparison of Perl and PyPy. However, for a paper I co-authored, we put together an experiment with various languages and VMs, using the language shootout plus a few other benchmarks:

  http://tratt.net/laurie/research/pubs/files/metatracing_vms/
Cross-language benchmarks are inevitably synthetic, and synthetic benchmarks can only tell you so much. It's important not to over-interpret them. But they can give you a rough idea of what's going on, at least in some circumstances. If someone wanted to add Perl to the experiment set, I'd gladly accept a patch to the benchmarking suite.


The gevent/eventlet part of this has me pretty excited. We needed this in order to do some experimenting with PyPy without investing a good deal of time on a test conversion. I'm also interested to see if cffi is as good as I've heard it is (relative to ctypes).

Excellent work, fijal and team!


This is exciting for eventlet. Coupled with the recent resurgence of work on it, it might bring it back into the spotlight.


Can someone knowledgeable compare PyPy, Numba, and Cython? I mostly use Cython. I tried Numba too; it has a very nice workflow when autojit works (when it doesn't, the error messages are pretty cryptic). With PyPy, I don't understand why, for example, all that type information obtained from the JIT couldn't be used to generate something like Numba's specialized functions or Cython modules (noob question, but please answer).


Cython is a compiler not an interpreter. Numba requires explicit hinting and can only optimize some undefined (?) subset of Python.

PyPy on the other hand is simply a fast Python interpreter. Not sure what you mean regarding the type information gathered by the JIT.


For example, does PyPy always have to warm up, or can it store a "warmed up" version of some function (where the types of objects have been inferred)? I know this isn't in accordance with the highly dynamic nature of Python, but not all functions are highly dynamic.


My understanding (from a previous HN discussion about javascript) is that since the output from a "just-in-time compiler" is actually machine code generated on the fly, it includes direct references to memory locations that are only valid for the lifetime of the process. So the output of a JIT is simply not in a format that can be saved and reloaded later.


Actually, it's possible to build a JIT like that, but it is uncommon.


Do you mean can you save the state of the JIT for future use?


Yeah. Can you? Considering CPython already generates .pyc and checks timestamps, keeping some kind of JIT cache around wouldn't be a stretch.
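For comparison, CPython's existing bytecode cache is easy to poke at (a small sketch; py_compile writes the same .pyc that import would otherwise create and reuse):

```python
import os
import py_compile
import tempfile

# Write a throwaway module to compile.
src = os.path.join(tempfile.mkdtemp(), "mod.py")
with open(src, "w") as f:
    f.write("ANSWER = 42\n")

# py_compile.compile returns the path of the .pyc it wrote (under
# __pycache__ on Python 3), keyed by the source's timestamp, so the
# parse/compile step is skipped on the next import.
cached = py_compile.compile(src)
```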


No, the state of the JIT cannot be saved between executions.


I can't speak for Python, but I do know that the impact of translating code for Racket's JIT (provided by GNU lightning, not LLVM) is negligible (i.e. it takes less time than the time you gain by JITting).


Cython is not actually a compiler; it is essentially a more convenient way of writing C/C++ extension modules. It produces C/C++ glue code which is then compiled by an actual compiler like gcc. By tapping into the existing C API of CPython it is simpler than PyPy & Numba, which work on a lower level, but on the other hand Cython does little optimization for you.


Have you tried cffi? I used to use Cython, but since I found cffi, I haven't looked back.
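The basic pattern is pleasantly small (a minimal sketch, assuming the cffi package is installed; strlen here is just a stand-in for whatever library function you actually want to bind):

```python
from cffi import FFI

ffi = FFI()
# Declare the C signature you want, then load the library and call it
# like a normal Python function.
ffi.cdef("size_t strlen(const char *s);")
C = ffi.dlopen(None)  # None loads the C standard library

n = C.strlen(b"hello world")
```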


There are a lot of cffi C bindings. I used these with success:

* https://github.com/amauryfa/lxml/tree/lxml-cffi

* https://github.com/chtd/psycopg2cffi


No, I got scared away by the "Simple example" and "Real example" sections in the docs :) Do you maybe have some hints/links for cffi?


PyPy still doesn't work with Numpy, though they're working on it.


That's not quite right. Stock numpy doesn't install, and nobody's working on that as far as I know, but they've reimplemented part of the numpy API (at least the Python API, not the C one), and the subset they've implemented works fine. I have an application in production that uses it.


PyPy has its own implementation of NumPy (called numpypy). It's not 100% complete but quite functional.

http://morepypy.blogspot.com/search?q=numpypy


Numba is not Numpy.


What are the largest sites using PyPy in production?


Quora did at some point, but it seems they replaced it (or some usage of it) with Scala. I can't find a pointer at the moment, but Google could help you.


Congratulations!

Looking forward to the day PyPy becomes the reference implementation.


In many cases, you want to be careful about using JITted platforms. Many people forget that the memory requirements skyrocket. When you want to use a language on many different platforms, including embedded ones, you start to see how having a JIT interpreter as your reference platform can be disadvantageous. CPython isn't exactly slow either, especially when you consider that many "intensive" modules are written directly in C.

Pypy should stay as it is, an experiment that can be used for people who require more performance for certain workloads. Of course, having part of your language written in C for CPython can hurt sometimes, when you can't easily use the functionality on other interpreters.


almost agree, except for the 'experiment' part. i don't want it to be an experiment, i want it to be a production-ready interpreter that i can use instead of cpython with minimal effort when i know i can trade memory for speed.


You are right, but having a JIT able to generate fast code means you can avoid writing C code at all; just look how fast Julia already is.


I'm with you on that one. But bear in mind it doesn't support Python 3 yet, so it might still be some way away.


I wonder how long? I searched for a roadmap, but found none.



God yes. I wish the CPython people would just capitulate and throw themselves behind it.


Because who cares about existing, running, legacy code, right?


Do you mean the people who haven't upgraded from Python 2.7 to Python 3?


The people who haven't upgraded to Python 3 in their projects are probably mostly people with dependencies on py2.7 libraries. That happened to me in my last project: I started using pypy until a bug in pypy prevented me from using an important package, then switched to Python 3 until some other dependency wasn't available. So it's a pity, but I was more or less forced to use py2.7 (without putting major effort into 3rd-party libraries).


And you did put at least as much effort into reporting the bug to the pypy devs and/or library devs as you did into writing this comment, right?


How often do people do that for CPython?


Less often, because CPython's characteristics are well known and there is no baseline to compare it to.


Why does this have to be so political? Why does it matter to you whether CPython continues to exist? Just use PyPy if you want to.


The Python Software Foundation should really give some money/developers to these guys.



I hope that that never happens. CPython is still pretty readable, and reference implementations should be simple and easy to learn from.

Actually, I'd like it if somebody hacked up a very very simple and naive Python interpreter and nominated that for the reference implementation instead.


Are you saying PyPy is unreadable?

I have not read large amounts of it, but the interpreter is written in RPython. Surely it's easier to read than the CPython source?


No, it isn't. PyPy code is very poorly commented and documented, and there's a lot going on inside the same codebase. So far it feels like the only people who hack on it are the ones who wrote it. This needs to be sorted out before we see mass adoption.


Readability is probably not the right term, but I think he means easy to understand. It doesn't matter how elegant PyPy's code is; it'll surely be more complex than a straightforward interpreter.


Actually if you read the PyPy interpreter's source, all you'll see is a straightforward interpreter.


PyPy is a straightforward interpreter. It's just written in a language with a good JIT compiler (RPython).


Small correction: a language for which good JIT compilers can be generated from interpreters written in it.
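Roughly: you write a plain interpreter loop like the toy one below, and the RPython toolchain derives a tracing JIT from it (the jit_merge_point hint is only sketched as a comment here; this is an illustrative stack machine, not PyPy's actual code):

```python
def interpret(program):
    # program: a list of ("PUSH", n) / ("ADD",) ops for a tiny stack machine
    stack = []
    pc = 0
    while pc < len(program):
        # in RPython one would place jitdriver.jit_merge_point(...) here,
        # which is what tells the generated JIT where the loop header is
        op = program[pc]
        if op[0] == "PUSH":
            stack.append(op[1])
        elif op[0] == "ADD":
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)
        pc += 1
    return stack[-1]

result = interpret([("PUSH", 2), ("PUSH", 3), ("ADD",)])
```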


I'm saying that it's big. Specialization requires special cases, which require more lines of code. One should be able to reason about the entirety of a Python implementation, and that's very difficult to do for PyPy.


I get about 2x speedup on my CSV reader & processing app, which is nice.

When I tried the beta, it was crashy, but 2.0 seems pretty good so far.


Is it still slower than LuaJIT?


pypy is cool, but why no love for python 3?




