Next year's computers same speed but more cores/cpus: developer challenge (tbray.org)
37 points by wglb on Oct 1, 2009 | hide | past | favorite | 27 comments


As painful as it is, I found dealing with an interrupt-rich environment much easier in assembler than in other languages. The ability to use coroutines significantly reduced our effort in building interrupt-based programs, and the ability to segregate tasks by interrupt level helped a bit as well.

Besides the yield statement in Python, very few other higher-level languages seem to have anything resembling a coroutine. Bliss/36 had it, but one could argue about how high-level it is.


As painful as it is, I found dealing with an interrupt-rich environment much easier in assembler than in other languages. The ability to use coroutines significantly reduced our effort in building interrupt-based programs, and the ability to segregate tasks by interrupt level helped a bit as well.

I'd like to hear more. What were you trying to do? Why did it need coroutines/interrupts? How did you manage the much-lower-levelness of the language? Also, how is it that coroutines are easier in assembler in the first place? Is it because you can save the state of an execution path and jump to it later?

I've become interested lately in the idea of building a limited language (i.e. one specific to the problem of a single application), Lisp style, that expands directly to assembly language, as opposed to the normal way of doing it, which is to expand to a high-level language that is then handed off to a general-purpose compiler. The idea is that because the language wouldn't be general-purpose, you could exercise much more control over the assembly language that is generated. What do you think of this?


This was a real-time data-acquisition process for a medical application that took ECG data transmitted over the phone line and returned an automatically-generated English-language diagnosis of that ECG to the hospital within ten minutes.

The incoming A/D data from three channels per phone line, for ten or twenty phone lines, came in every 2 milliseconds. One level of interrupt was assigned to that device, and a task was attached to that interrupt. Its job was to store each sample, properly multiplexed, in the appropriate buffer. When the buffer was full, this task would trigger a software interrupt for the routine writing the buffers to disk, which was its own task as well.
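
That demultiplex-and-hand-off structure is easy to sketch in a modern language. Here's a toy Python version (the channel count, buffer size, and names are illustrative; the real system was assembler):

```python
# Toy sketch of the sample-storage task: demultiplex incoming samples
# into per-channel buffers, and when a buffer fills, "trigger the
# software interrupt" for the disk-writer task (here, a callback).
CHANNELS, BUFFER_SIZE = 3, 2       # illustrative sizes, not the originals

def make_acquisition(on_full):
    buffers = [[] for _ in range(CHANNELS)]
    def on_sample(channel, sample):        # called on each 2 ms interrupt
        buf = buffers[channel]
        buf.append(sample)
        if len(buf) == BUFFER_SIZE:        # buffer full: hand it off
            on_full(channel, buf[:])
            buf.clear()
    return on_sample

flushed = []                               # stands in for the disk writer
on_sample = make_acquisition(lambda ch, buf: flushed.append((ch, buf)))
for tick in range(8):                      # simulate 8 interrupt ticks
    on_sample(tick % CHANNELS, tick)
print(flushed)                             # [(0, [0, 3]), (1, [1, 4]), (2, [2, 5])]
```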

In addition to the analog data coming in, there were also touch-tone-style tones that a) gave patient identification numbers, b) signalled the end of groups of data (data was transmitted in four groups of three leads), and c) signalled the end of the phone call. This had its own interrupt task as well.

Similarly, when data was written to disk, it was simultaneously written to tape; that had its own task as well. When the phone call was complete, it was put in a queue and fed to the analysis system. Reading the data and properly assembling it was another interrupt-driven task. Then there was output: at first punching paper tape, and later sending data out over phone lines, and that had its own set of tasks as well.

Interrupts were used because that is pretty much how you did things in that environment. There was a lot of stuff going on all at the same time, and nobody had better be busy-waiting on anything. It all needed to be overlapped.

We eventually figured out that these tasks were often pairwise-synchronous tasks that used interrupts to "hand over" control to their cooperating partners, so the idea of a coroutine was implemented to make the code clearer. You could think of each of these tasks as a loop. When it came to the point of needing input, we developed a convention that essentially translated to "get me an interrupt": the current IP and registers would be saved in the task's context area, and the task suspended. Then, when it woke up, you would be at the next instruction following the "get me an interrupt" line. With this technique, the system that made the phone calls and sent the messages (tasks for dialing, sending characters, reading from disk, managing tape logging) was developed with only one single-thread error and no multi-thread errors. I helped with the design of that part but not the coding.
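
That "get me an interrupt" convention maps almost directly onto generators in a modern language: yield suspends the task, saving its position and locals, much as the assembler version saved the IP and registers. A rough Python sketch of the hand-off (all names are mine, not from the original system):

```python
# A cooperating task written as a generator: each bare "yield" is the
# "get me an interrupt" point where the task suspends until its partner
# resumes it with a buffer of data.
def buffer_writer_task():
    written = []
    while True:
        buf = yield                # "get me an interrupt": suspend here
        if buf is None:            # sentinel: no more buffers
            break
        written.append(buf)        # stand-in for writing the buffer to disk
    return written

# The dispatcher side resumes the task each time a buffer fills.
task = buffer_writer_task()
next(task)                         # prime the coroutine to its first yield
for buffer in (["s1", "s2"], ["s3", "s4"]):
    task.send(buffer)              # hand control over to the partner task
try:
    task.send(None)
except StopIteration as done:
    print(done.value)              # [['s1', 's2'], ['s3', 's4']]
```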

So for hard real time, you kind of need to deal with the interrupts.

Not sure what you mean by "How did you manage the much-lower-levelness of the language." At that time the alternative was Fortran, so we were before C, Bliss, and the like. There just wasn't an alternative.

Yes, the reason that coroutines work in assembler (see Knuth's description) is that you can save the state of the execution path and start right back up where you left off.

Well, I think your proposal has merit, but the control you mention is also available in languages like Python, C#, and Ruby with the 'yield' statement.

Lisp to bare metal is good, but stuff today is so much faster that it changes the equation entirely. We were doing this on a machine with 3.5 microsecond instruction times (at best) and the whole thing resided in a machine with 36k of 32-bit words. So small was important too, although that seemed big at the time.

So the question is: how much control do you really need with machines today that will give you a metric truckload of instructions in a microsecond, and where you have to pay more money to get memory smaller than a gig? Heck, I even get microseconds with SBCL.

A smaller language is good, but the selection of features is key, because you can give yourself a combinatorial rash if you aren't careful. Not sure what language I would choose these days--likely start with something you could express the problem in nicely, and see if the timing works out. I learned a rule then: "first make it right, then make it fast."


So the question is how much control do you really need with machines today

Today the limiting factor is the OS, not the hardware. If your deadlines are long enough, it's surprising how well NT/XP works for low-level tasks. The problem we've had is that when we need deadlines below, say, 50 ms, even a multi-GHz Windows box can't guarantee that (XP can't guarantee anything!), so we need a mechanism to detect the missed deadline and continue without a hard failure.
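
One way to sketch that detect-and-continue mechanism (a toy Python version; the 50 ms figure and all names are illustrative, not from the poster's system):

```python
# Soft-deadline loop on a non-realtime OS: timestamp each cycle, and if
# the deadline was missed, count it and resynchronize rather than fail
# hard. Counting the misses is what lets the system degrade gracefully.
import time

DEADLINE = 0.050                   # illustrative 50 ms soft deadline

def run_cycles(work, n_cycles):
    missed = 0
    next_due = time.monotonic() + DEADLINE
    for _ in range(n_cycles):
        work()
        now = time.monotonic()
        if now > next_due:         # deadline missed: note it, resync
            missed += 1
            next_due = now + DEADLINE
        else:                      # made it: sleep out the slack
            time.sleep(next_due - now)
            next_due += DEADLINE
    return missed

print(run_cycles(lambda: None, 3))  # 0 on an unloaded machine
```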

Programming on a consumer-OS like Windows has so many benefits that we prefer to use that as the main platform and offload the sub-millisecond latency hard realtime stuff to smaller processors with code running on bare metal.

Sometimes it's hard coming to grips with the fact that a 4 MHz PIC is outperforming a 2+ GHz Windows box, but silicon is cheap!


Quite correct.

In the scenario I described above, with each new release of the operating system (RT/M for the Sigma 5) I had to go through and change any code that masked out the timer interrupts, and verify that it did not cause harm in any part of the OS. And this was designed to be a real-time OS, which it otherwise was.

The delay in the 2ms interrupt caused by masking out the timer interrupt was estimated to be several times the quantization noise, an unacceptable result.

I don't know what the numbers are for Linux, but I would suspect there is a release that provides something workable. I would imagine that there are real-time OSes out there now.

Yes, but keep in mind that a 4 MHz PIC is faster than the hardware we were working on then.

And yes, silicon is cheap. The multiplexing A/D converters we were using cost $40,000 each. Today you get the same thing for maybe a dollar, retail quantity one.

And XP is so bad that it can't key Morse code reliably through the printer port, for just the reasons you mention.


That sounds like a pretty cool system; thanks for sharing!


Yer welcome.

Done in 1969.


What is special about Python's yield statement compared to the ones in Ruby, C#, and other languages?


More significantly, continuations are a strict superset of coroutines, so standard Python is actually less powerful in this regard than languages with full continuation support, which if memory serves me includes Ruby, Smalltalk, and Scheme among others.


Python has coroutines as of 2.5: http://www.python.org/dev/peps/pep-0342/
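
Concretely, PEP 342 turned generators into coroutines by making yield an expression, so a generator can receive values through send(). A small example:

```python
# PEP 342 coroutine: "value = yield average" suspends the generator and
# hands "average" back to the caller; send() resumes it, and the sent
# argument becomes the value of the yield expression.
def running_average():
    total, count = 0.0, 0
    average = None
    while True:
        value = yield average      # suspend; resume with caller's value
        total += value
        count += 1
        average = total / count

avg = running_average()
next(avg)                          # advance to the first yield
print(avg.send(10))                # 10.0
print(avg.send(20))                # 15.0
```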


Unfortunately, continuations are not supported in Ruby as of 1.9. They might come back in 2.0 though.


Er, limitations of my knowledge, as I have done more in python than the other two. My bad.


Also Lua


I'm not sure why you were downvoted. An excellent overview of coroutines in general, and specifically as they are used in Lua, is http://www.inf.puc-rio.br/~roberto/docs/MCC15-04.pdf


Perl and C have coroutines. Haskell and Scheme have continuations.

(Contingent upon library use, of course.)


It's been said before: old apps on old hardware are a similar speed to today's apps on today's hardware. Often, they're snappier.

Faster hardware makes software slower. Partly it's because they're doing more; partly it's due to inefficiency: "work expands to fill the available time" (Parkinson's Law).

1. Apps are already fast enough.

2. If we need extra speed to do extra things, we can draw on the reservoir of inefficiencies. Examples: Flash's ActionScript became many times faster in version 3 (a gap on the order of JavaScript vs. Java). Google improved plain JavaScript performance by up to 10x in its Chrome browser. Apple's iPhone is very snappy.

3. We seem to have overshot the need for performance (making it ripe for disruption), and people value things other than performance, like mobility, size, weight, and battery life - as in netbooks and smartphones.


old apps on old hardware are a similar speed to today's apps on today's hardware.

Maybe, but very often they don't do the same thing, which makes the comparison useless.

I don't remember being able to search and rank billions of documents in sub-second time back in the 90s, or mine terabytes of data to find cures for cancer. We now do predictive analytics where we did quarterly reporting in the 80s and 90s.

Let's not view computing primarily as word processing with 500MB of bloat added in each new version :-)


Not sure why this comment is getting downvoted.

A few years ago, we used to manage just fine with 128 MB of RAM and 500 MHz processors. Now, programmer productivity has suddenly taken precedence over user experience.

As far as desktop apps are concerned, I would prefer a C++ app with a couple of features I need over an extensible, feature-rich app written in Python.

WebKit and Gecko have shown us that it is possible to make dynamic languages go faster. Yet, most people start yelling 'premature optimization LOL!!!1!' if you dare to tell them their app is slow.


Another thing that (IMO) is a main cause of the slowdown is the sheer scope of possibilities available on general low-cost hardware these days.

In earlier times you had to work hard to make your code do all the work in a given space. This meant thinking long and hard about which features to implement, let alone how to implement them efficiently (time + space) - a practice which has disappeared somewhat, mainly because it is relatively cheap and easy to just implement it, then optimise if necessary.

Premature optimisation while coding can be a big problem, but this need not mean that optimisation should be left until the end; in fact, it is a good idea to optimise the design prior to implementation (that is, at the end of the design process).

This means implementation can be done relatively quickly as the key features are already decided.


I'm surprised I haven't seen anyone mention Clojure yet. It was designed specifically to deal with multiple cores in an elegant fashion through atoms, agents, and software transactional memory. It might just be the answer to the challenge. Check it out, if you haven't already.


I have heard very good things about it, including from Dan Weinreb, a long-time Lisp guy, who is very excited about it.

I would be inclined to work with Clojure if I had any Java projects or seriously threaded projects.

My multi-core stuff these days is done the easy way--separate long-running processes, loosely coupled.
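
That loosely-coupled approach is roughly what Python's multiprocessing module packages up: independent OS processes coupled only by queues. A minimal sketch (the squaring job is a stand-in for real long-running work):

```python
# Loosely coupled multi-core processing: each worker is a separate OS
# process with no shared state; queues are the only coupling, and a
# None sentinel per worker signals shutdown.
from multiprocessing import Process, Queue

def worker(inbox, outbox):
    while True:
        item = inbox.get()
        if item is None:           # sentinel: shut down cleanly
            break
        outbox.put(item * item)    # stand-in for the real job

if __name__ == "__main__":
    inbox, outbox = Queue(), Queue()
    procs = [Process(target=worker, args=(inbox, outbox)) for _ in range(2)]
    for p in procs:
        p.start()
    for n in range(4):
        inbox.put(n)
    for _ in procs:
        inbox.put(None)            # one sentinel per worker
    results = sorted(outbox.get() for _ in range(4))
    for p in procs:
        p.join()
    print(results)                 # [0, 1, 4, 9]
```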


A mitigating factor is that not much software is CPU-bound: many desktop apps spend most of their time idly waiting for input, and most web apps have already been dealing with concurrency by handling multiple users (and scaling out).


The reason people have such trouble scaling web apps is that they think the above is true.


Could you elaborate? Are you saying that apps are CPU bound or that web apps have not been dealing with concurrency even before multi-core? Perhaps you mean that web app authors care about CPU usage too little, which may be true, but I still don't think that multi-core marks a huge change for them.


The thing to remember is that there are two sides to a web app: the client and the server. The web app I'm currently working on uses very little CPU client-side, but on the server side the number of simultaneous users I can handle scales more or less linearly with the number of cores the server has (assuming enough memory). So personally, multiple cores (and using them correctly) make a huge difference to me and the web app I'm working on.


I apologise for not being clear: certainly multi-core makes a difference to the performance of web apps. My point is that most web apps don't need a drastic change in the way they are written to take advantage of this: they are already multithreaded or multiprocess to handle concurrent users. Contrast this with single-user desktop apps, which have simpler threading models and are less likely to be "naturally" parallelised to take advantage of multi-core.


This is one of the reasons Microsoft came out with F#. Now if only OCaml could get its act together and give me true multi-core support. Ah well. We all have our pet languages, don't we?



