Pipelining falls short of SPDY in several respects. The biggest problem is that it suffers from head of line blocking. One slow request or response prevents others from making progress.



I trust in theory this is true, but I've never personally observed this in practice.

I guess SPDY fans' marketing of this "feature" would be more convincing if I could see a demonstration.

I just don't see any noticeable delays when using pipelining.

What strikes me as peculiar about the interest in SPDY is that I never saw any interest in pipelining before SPDY. And I really doubt it was because of potential head of line blocking or lack of header compression. I think users just were not clued in about pipelining.

The speed-up between not using pipelining and using it is, IME, enormous: one connection for 100 files versus 100 connections for 100 files. It is a huge efficiency gain.

Yet most users have never even heard of HTTP pipelining, let alone tried it. If they really wanted such a big speed-up, why wouldn't they use pipelining, or at least try it? Why wouldn't they demand that browsers implement it and turn it on by default?

Users are being encouraged to jump straight to SPDY, a very recent and relatively untested project from a single company (see, e.g., the CRIME incident), when most of them, if not all, have never even experimented with basic pipelining, which has been part of HTTP/1.1 since the 1999 spec and is supported via keep-alives in almost all web servers.

Noticeable speed gains would come from web pages not being so burdened with links to resources on external hosts. That's what really slows things down: browsers make dozens of connections just to load a single page with little content. The gains from cutting out all that third-party host cruft would make any gains from avoiding theoretical head-of-line blocking during pipelining look minuscule and hardly worth the effort.

If you want to see how much pipelining speeds up getting many files from the same host, you do not need SPDY to do that. Web servers already have the support you need to do HTTP/1.1 pipelining. (Though on rare occasions site admins have keep-alives disabled, HN being one example. In effect these admins are saying, "Sorry, no pipelining for you.")
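
If you want to see what pipelining looks like on the wire without involving a browser at all, here is a minimal sketch using a raw socket, in the same spirit as the netcat approach; example.com and /about are just placeholders, and servers that dislike pipelining may serve the requests one at a time or close the connection early:

    # Two GETs written back-to-back on one TCP connection; the responses
    # come back in order on that same connection.
    import socket

    requests = (
        b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"
        b"GET /about HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n"
    )

    with socket.create_connection(("example.com", 80)) as sock:
        sock.sendall(requests)       # both requests leave in one burst
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:             # server closes after the second response
                break
            chunks.append(data)

    print(b"".join(chunks).count(b"HTTP/1.1 "))   # expect two status lines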


HTTP pipelining is turned off by default in most browsers due to concerns with buggy proxies and servers (see https://bugzilla.mozilla.org/show_bug.cgi?id=264354 ). It may work for you and the particular set of servers you visit, but I suspect browser developers would rather have a browser that by default works with the widest possible range of configurations.
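
(For anyone who wants to experiment anyway, Firefox currently exposes about:config prefs for it; the names below are from memory, so double-check them against your version:)

    network.http.pipelining                true
    network.http.pipelining.ssl            true
    network.http.pipelining.maxrequests    8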

Unfortunately, it being turned off by default in most browsers means that most people won't see the benefits from it. Hopefully, the upcoming HTTP/2 standard will fare better (latest draft: https://tools.ietf.org/html/draft-ietf-httpbis-http2-01 ).

Note that HTTP/2 will be based on SPDY (in particular, SPDY/4 with the new header compressor). Hopefully, when the standard is finalized and we have multiple strong implementations, that will allay the concerns you seem to have with SPDY today.

(Disclaimer: I work on SPDY / HTTP/2 for Chromium.)


Yes, I understand there are buggy servers and proxies... and I use a browser that has settings to accommodate them. However... I do not know of HTTP bugs that affect *pipelining* specifically. And, in addition, for pipelining I do not use a browser to do the initial retrieval. I use something like netcat to fetch and then I view the results with a browser.

Can you give me a list of buggy servers where my HTTP/1.1 pipelining will not work as desired? I've been doing pipelining for 10 years (that's quite a few servers I've tried) with no problems.

The arguments made by SPDY fans (e.g. Google employees) all seem plausible. But I wonder why they are never supported by evidence. IOW, please show me; don't just tell me. SPDY seems to solve "problems" I'm not having. Where can I see these HTTP/1.1 pipelining problems (not just problems with browsers like Firefox or Chrome) in action? I'd love to try some of the buggy servers you allude to and see if they slow down pipelining with netcat.


I didn't have to look hard to find bug reports for pipelining. An example is https://bugs.launchpad.net/ubuntu/+source/apt/+bug/948461 for Amazon's S3. I'd be interested if the problem is still reproducible now. Also, one of the comments mentions Squid 2.0.2 as being buggy.

Also, see https://insouciant.org/tech/status-of-http-pipelining-in-chr... for a link to Firefox's blacklist of buggy servers (and a good discussion of pipelining in Chromium).

Most of the improvements in SPDY are latency improvements, so if you're downloading sites with netcat and then viewing them in a browser, I'm pretty sure the overhead of that would dwarf anything SPDY would save. That having been said, there's ample evidence of SPDY improving things. From http://bitsup.blogspot.com/2012/11/a-brief-note-on-pipelines... :

"Also see telemetry for TRANSACTION_WAIT_TIME_HTTP and TRANSACTON_WAIT_TIME_HTTP_PIPELINES - you'll see that pipelines do marginally reduce queuing time, but not by a heck of a lot in practice. (~65% of transactions are sent within 50ms using straight HTTP, ~75% with pipelining enabled).... Check out TRANSACTON_WAIT_TIME_SPDY and you'll see that 93% of all transactions wait less than 1ms in the queue!"


Thanks for the reading material.

You omitted the sentence before your excerpt where Mr. McManus suggests we move to a multiplexed pipelined protocol for HTTP.

I'll go further. I say we need a lower-level, large-framed, multiplexed protocol, carried over UDP, that can accommodate HTTP, SMTP, etc. Why restrict multiplexing to HTTP and "web browsers"? Why are we funnelling everything through a web browser ("HTTP is the new waist") and looking to the web browser as the key to all evolution?

It seems obvious to me that what we all want is end-to-end, peer-to-peer connectivity. Although users cannot articulate that, it's clear they expect to have "stable connections". This end-to-end connectivity was the original state of the internet, before "firewalls". Client-server is only so useful. It seems to me we want a "local" copy of the data sources we need to access. We want data to be "synced" across locations. A poor substitute for such "local copies" has been moving data to network facilities located at the edge, shortening the distance to the user.
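
To make that concrete, here is a rough sketch, entirely my own and not any real protocol, of the kind of frame such a transport might carry in each UDP datagram, so that independent streams (HTTP, SMTP, whatever) never block one another:

    # Hypothetical frame layout for a generic multiplexed, UDP-carried
    # transport: (connection id, stream id, byte offset, payload).
    import struct

    HEADER = struct.Struct("!QIQH")   # conn id, stream id, offset, payload length

    def pack_frame(conn_id, stream_id, offset, payload):
        return HEADER.pack(conn_id, stream_id, offset, len(payload)) + payload

    def unpack_frame(datagram):
        conn_id, stream_id, offset, length = HEADER.unpack_from(datagram)
        payload = datagram[HEADER.size:HEADER.size + length]
        return conn_id, stream_id, offset, payload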

But, back to reality: in the case of HTTP servers, common sense tells me that opening myriad connections to (often busy) web servers to retrieve myriad resources is more prone to delays or other problems (and such delays could be due to any number of reasons) than opening a single connection to retrieve those same resources. Moreover, are his observations in the context of only one browser?

I guess when you work on a browser development team, you might get a sort of tunnel vision, where the browser becomes the center of the universe.

If you dream of multiplexing over stable connections, then you should dream bigger than the web browser. IMO.

I'm aware of a bug in some PHP databases with keep-alive after POST. I mainly use pipelining for document retrieval (versus document submission), so I am not a good judge of this. What I'm curious about is where keep-alives after POST would be desirable. You alluded to that usage scenario (a series of GETs after a large POST).


Re. Patrick's sentence, you're right, but as I mentioned above, SPDY/4 will become HTTP/2 (we're working through the standardization process). So I think most of the major players are on board with "fixing" HTTP pipelining by using SPDY-style multiplexing.

Re. thinking bigger, you might want to read up on QUIC, which was announced recently: http://en.wikipedia.org/wiki/QUIC . Based on that, I would contend that at least we on the Chromium team don't have tunnel vision. :)

Re. your question, Patrick's data is from Firefox only, I believe. You're right that it's not surprising his stats show SPDY helping over HTTP without pipelining. But the more interesting thing is that HTTP with pipelining still doesn't help much over HTTP without pipelining (on average), and SPDY still beats it by orders of magnitude. I'd have to dig, but I'm pretty sure there are similar stats on the Chromium side.


Yes, a major appeal of pipelining to me is efficiency with respect to open connections. It's easier to monitor the progress of one connection sending multiple HTTP verbs than multiple connections each sending one verb.

Whether multiple verbs over one connection are processed by the given httpd more efficiently than single verbs over single connections is another issue. IME (a purely client-side perspective), pipelining does speed things up. But then I'm not using Firefox to do the pipelining.

I'm sure the team responsible for Googlebot would have some insight on this question. (And I wonder how much SPDY makes the bot's job easier?)

In any event, multiplexing would appear to solve the open-connections issue. And I don't doubt it will consistently beat HTTP/1.1 pipelining alone. I'm a big fan of multiplexing (for peer-to-peer "connections"), but I am perplexed as to why it's being applied at the high level of HTTP (and hence restricted to TCP, with all of its own inefficiencies and limitations).

I'm curious about something you said earlier. You said something about the "overhead" of using netcat. It's a relatively small, simple program with modest resource requirements. What did you mean by overhead?


Re. multiplexing at the HTTP layer: that's because an HTTP replacement has to be deployable and testable. However, now that the ideas in SPDY have been proven and are on their way to being standardized, you can look at QUIC to see what can be done when not limited to TCP and HTTP.

By overhead I mean latency overhead -- running a program to download a site to a local file and then displaying it in a browser will almost certainly have a higher time to start render. Not to mention you're hitting everything cold (i.e., not using the browser's cache).


I don't measure latency as including rendering time. Maybe I'm not "rendering" anything except pure html.

I measure HTTP latency as the time it takes to retrieve the resources.

Whatever happens after that is up to the user. Maybe she wants to just read plain text (think text-only Google cache). Maybe she wants to view images. Maybe she wants to view video. Maybe she only wants resources from one host. Maybe she does not want resources from ad servers. We just do not know. Today's webpages are so often collections of resources from a variety of hosts. We can't presume that the user will be interested in each and every resource.

Of course those doing web development like to make lots of presumptions about how users will view a webpage. Still, these developers must tolerate that users' connection speeds vary, the computers they use vary, the browsers they use vary, and that some users routinely violate "standards". Heck, some users might even clear their browser cache now and again.

But HTTP is not web development. It's just a way to request and submit resources. Nothing more, and nothing less.


Osth, you appear to have been hellbanned; you should ask for this to be fixed. I see nothing meriting it in your comments.


This isn't a problem when the primary request is dynamic and served from one server/domain/connection and the remaining requests are for static assets stored on and served from another server/domain/connection.


Consider if the client makes a large POST followed by a few GETs. If the client has little upload bandwidth, the GETs will be delayed until the POST completes. With SPDY, they can all proceed concurrently. Similarly, if the client makes 5 GET requests, with the first being a heavy/slow/expensive resource for the server, the cheap resources can't be delivered until the slow resource is finally computed and returned.
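
As a back-of-the-envelope illustration (my own toy model, with made-up numbers): assume the server starts work on every request immediately, but a pipelined connection must deliver responses in request order, while multiplexed streams deliver each response as soon as it is ready:

    # Toy head-of-line-blocking model; the compute times are invented.
    compute = {"/slow-report": 3.0, "/logo.png": 0.1, "/style.css": 0.1}
    order = ["/slow-report", "/logo.png", "/style.css"]

    ready = 0.0
    for path in order:                      # pipelined: in-order delivery
        ready = max(ready, compute[path])   # stuck behind earlier responses
        print(f"pipelined   {path}: {ready:.1f}s")

    for path in order:                      # multiplexed: independent streams
        print(f"multiplexed {path}: {compute[path]:.1f}s")

The cheap resources show up in about a tenth of a second instead of waiting the full three seconds behind the slow one.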


Yes, that's the reason SPDY exists, but my point is that that's actually a rare setup in the real world. As was said elsewhere in this thread, and as I was saying, the most likely setup is that one request goes to, say, www.example.com, which serves a single HTML file, and the remaining requests, for the resources that HTML references, go to outsourced-cdn.example.com. So it's actually more important to have pipelining and SPDY support on outsourced-cdn.example.com than on www.example.com. That is, chances are you don't need to worry about it; that's what you pay outsourced-cdn for. There is also less need for multiple simultaneous requests when you have good client-side caching of resources. The usefulness of multiplexed requests is negated if you only serve one request.

The above has been the exact case at a number of companies I've worked at.

For sites and companies like Google or Facebook, which serve all their own traffic, it becomes more important.



