Friday, June 29th, 2007

A report on Push versus Pull

Category: Comet

Engin Bozdag, Ali Mesbah, and Arie van Deursen of the Delft University of Technology have compiled a technical report comparing push and pull techniques for building Ajax applications, based on an example application that they built.

They concluded that:

In this paper we have compared pull and push solutions for achieving web-based real time event notification. The contributions of this paper include the experimental design, a reusable implementation of a sample application in push and pull style as well as a measurement framework, and the experimental results.

Our experiment shows that if we want high data coherence and high network performance, we should choose the push approach. However, push brings some scalability issues; the server application CPU usage is 7 times higher than in pull. According to our results, the server starts to saturate at 350-500 users. For larger numbers of users, load balancing and server clustering techniques are unavoidable.

Push versus Pull report (PDF): http://swerl.tudelft.nl/twiki/pub/Main/TechnicalReports/TUD-SERG-2007-016.pdf
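For readers who want to see the difference concretely before diving into the PDF, here is a rough client-side sketch of the two styles (TypeScript; the endpoint names, payload shape, and intervals are invented for illustration and are not taken from the paper): pull polls the server on a timer, while push (Comet-style long polling) keeps a request open until the server has something to deliver.

```typescript
// Rough illustration of pull vs. push from the browser's point of view.
// Endpoints and payloads are hypothetical, not from the Delft study.

// PULL: poll the server on a fixed interval (the paper used 15 seconds).
function startPulling(intervalMs: number): void {
  setInterval(async () => {
    const res = await fetch("https://www.example.com/updates");
    if (res.ok) handleUpdates(await res.json());
  }, intervalMs);
}

// PUSH (Comet-style long poll): each request stays open until an event occurs,
// so the server decides when data flows and delivery latency stays low.
async function startLongPolling(): Promise<void> {
  while (true) {
    try {
      const res = await fetch("https://www.example.com/events", { cache: "no-store" });
      if (res.ok) handleUpdates(await res.json());
    } catch {
      await new Promise((resolve) => setTimeout(resolve, 5000)); // back off and retry
    }
  }
}

function handleUpdates(updates: unknown[]): void {
  console.log("received", updates.length, "update(s)");
}

startPulling(15_000); // or: startLongPolling();
```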


Posted by Dion Almaer at 12:08 am

10 Comments »

Did I not read it right, or are they using a normal webserver for the long connections? It certainly won’t scale that way. You will need your own socket server to handle some HTTP traffic separate from your web pages. I really doubt meebo or anyone would do things their way.

And while we are at it, I’ll coin a phrase right now: COMET-PULL. That is, using a long running “COMET” connection to receive *events*, and based on those events, then pulling the data from the webserver (using Keep-Alive on that connection). And yes, they should be on separate subdomains to avoid browser connection limits. That way your socket server does not need to handle anything related to how to generate content (leave that to the webserver) — it just sends events.
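Roughly, the browser side of that idea could look like this (a minimal sketch in TypeScript; the subdomains, endpoints, and payload shape are made up for illustration):

```typescript
// COMET-PULL sketch: a long-lived connection on a separate subdomain delivers
// *event notifications* only; the actual content is then pulled from the
// ordinary webserver over its keep-alive connection. All names are illustrative.

interface EventNotice {
  resource: string; // what changed (e.g. "/inbox"), not the data itself
}

async function listenForEvents(): Promise<void> {
  while (true) {
    // Long-poll the event/socket server; the request hangs until an event arrives.
    const res = await fetch("https://events.example.com/wait", { cache: "no-store" });
    if (!res.ok) {
      await new Promise((resolve) => setTimeout(resolve, 5000)); // back off on errors
      continue;
    }
    const notices: EventNotice[] = await res.json();
    for (const notice of notices) {
      // Pull the content from the normal webserver, which already knows how to
      // generate it; the event server never builds content at all.
      const data = await fetch(`https://www.example.com${notice.resource}`);
      renderResource(notice.resource, await data.text());
    }
  }
}

function renderResource(resource: string, html: string): void {
  const el = document.getElementById("content");
  if (el) el.innerHTML = html; // simplistic: swap the pulled content into the page
}

listenForEvents();
```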

Comment by Steven Roussey — June 29, 2007

In Oracle BAM we use one persistent connection on which we multiplex all of the events to all of the windows that have active components (patent pending), so there is no need for separate subdomains or anything like that. Using push to send the event notifications and then pulling the data with another request does not make sense to me, since you can push the event with the data at almost the same cost, and you save the latency and server load associated with the pull.

Comment by Tal Broda — June 29, 2007

I send (push) all events for a user via a single connection at presence.domain.com and pull from the domain of the request (likely http://www.domain.com). With a push/pull setup you want two different domains, since the browser will only allow two connections per domain, and both the push and pull connections stay open. On the main side (the pull), the data/html/whatever comes back, and that might trigger something like an image download, which takes up the second connection for that main domain (www).

The reason to use two connections is that they scale quite differently, so splitting them can be incredibly cheap. :) Not all of us are bathed in cash, so we find ways around it. ;)

See, the main connection is the original connection, the one that requested the page, which was kept around via keep-alive. That server is already set up to be queried for big things like page views, etc. On the webserver side I'm using Apache, where each connection gets handled by a thread, and there isn't really a way to do COMET efficiently that way.

The second connection, however, does not go through a webserver, but rather a custom single-threaded socket server dealing with some simple HTTP and simple messaging. This server doing the push actually scales far better than the web and application servers (since it doesn't need much code).

So push becomes very cheap, and the load is actually lower than using timed pulls.
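The shape of that second server is very simple; something along these lines (a minimal single-process sketch in TypeScript on Node, not my actual implementation; the port and message format are invented):

```typescript
// Minimal push/event server sketch: a single process parks each client's HTTP
// response until an event is published, then writes the event and lets the
// client reconnect. It never generates content, so it stays tiny and cheap.

import * as http from "http";

const waitingClients: http.ServerResponse[] = [];

const server = http.createServer((req, res) => {
  if (req.url === "/wait") {
    // Long poll: hold the response open until we have something to send.
    waitingClients.push(res);
    req.on("close", () => {
      const i = waitingClients.indexOf(res);
      if (i !== -1) waitingClients.splice(i, 1); // drop clients that went away
    });
  } else {
    res.statusCode = 404;
    res.end();
  }
});

// Called whenever the application has an event for the waiting browsers.
export function publish(event: object): void {
  const body = JSON.stringify([event]);
  while (waitingClients.length > 0) {
    const res = waitingClients.pop()!;
    res.writeHead(200, { "Content-Type": "application/json" });
    res.end(body); // the client gets the event and immediately re-polls /wait
  }
}

server.listen(8080);
```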

Comment by Steven — June 29, 2007

Tal, good luck on getting that patent to stand. Comet style push has been used in many forms for many years, and your patent sounds like an obvious application rather than a new invention. You may get the patent, but good luck enforcing it.

Comment by Brad Neuberg — June 29, 2007

Having given the paper a look, I'd suspect that the CPU utilization they're seeing here is related to how Jetty handles the concurrency internally. It would be good to see Greg Wilkins comment here, and it'd be neat if the Delft researchers shared their test harness code. It'd certainly help improve the performance of all Bayeux implementations quickly.

Tal: we’ve discussed your (potentially?) litigious and indefensible assertions in the past. They deserve no further mention.

Comment by Alex Russell — June 30, 2007

Firstly, this report is a good contribution and I hope we see many more such studies into push vs pull. However, I think the study is a little flawed, as the 15 s pull interval used is heavily biased against push technologies.

If your application can tolerate a 15 s latency in event delivery, then perhaps traditional polling is sufficient. But if your application needs a lower latency (say 5 seconds for an auction bidding site, or 1.5 seconds for chat or other user interaction), then polling frequencies for pull applications will be significantly higher. If you need to make 10 times more requests with pull, then push can use 7 times more CPU and still be more efficient. (Note: I believe the 7-times CPU figure is probably due to the early implementation used, and I'd like to see the results from the 6.1.4 release of Jetty.)

So I would love to see this report reproduced with results produced along three axes: period between events (0.1, 1, 5, 10, 25, 50 seconds); minimal acceptable latency (0.5, 1, 2, 5, 10, 25, 50 seconds); and number of clients (1, 10, 100, 1000, 5000).

I would expect that within that space there will be load profiles best suited for polling and others best suited for long polling. Note that Bayeux does not need to be long polling and can handle polling or a combination of the two.

There will also be differences if events need to be delivered to all clients or just a subset of clients.

So, good start! But the results are only applicable to a limited range of possible Ajax applications.

Comment by Greg Wilkins — July 1, 2007

Another thing I don’t understand about the paper is how a pull application with a poll time of 15 s can ever have a mean message trip time of less than 7.5 seconds. The report has a minimum mean time of 2.5 s, which is just impossible unless the server-side events always happen to coincide with the poll from the client.
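(The arithmetic, for reference: if server-side events occur uniformly at random relative to the client's polling schedule, the expected wait until the next poll picks an event up is half the poll interval, so with a 15 s poll period:)

$$\mathbb{E}[\text{delay}] = \frac{T_{\text{poll}}}{2} = \frac{15\ \text{s}}{2} = 7.5\ \text{s}$$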

Comment by Greg Wilkins — July 1, 2007

I’ve blogged a more complete response at http://blogs.webtide.com/gregw/2007/07/01/1183286820000.html

Comment by Greg Wilkins — July 1, 2007

Thank you, Alex and Greg, for your feedback; it is very useful for the extension of our paper.

We would like to respond to some of your comments:

*”The report also could be read as implying that the bayeux protocol is only long polling. While long polling is the default, it can also support polling or streaming”*

We only considered what was actually implemented in Jetty, not what was in the draft. Maybe we should have made this more clear.

*”The pull implementation they test has a 15 second period, which means that events will have an average 7.5 second latency for a perfect implementation. While there are many, many applications that can live with such latency (or longer), they are not the target applications for Ajax Comet techniques. A 15 second latency is simply too much for chat, for collaborative editing …”*

This is true; we only considered one interval for pull, merely to constrain the number of test variations. Using smaller pull intervals will surely have an effect on CPU usage, and this is something that we will certainly consider in the extension of our paper. However, our goal in this version was to see how pull compares with push when the publish interval is higher or lower than the pull interval, and a single pull interval was enough to see that.
Theoretically, we expect a 15 s pull interval with a 15 s publish interval to be comparable to the push version with a 15 s publish interval.

*”Moreover, I think the 7 times figure is at least partially due to an early implementation of Bayeux and a buggy release of Jetty”*

In our paper we tested Jetty 6.1.2 RC5. We will try 6.1.4 in our next tests.

Finally, we are planning to make the test source code public so that others can conduct similar experiments on Ajax/Comet applications.

Comment by Engin Bozdag — July 2, 2007

Totally agree with the last comment by Steven Roussey — it’s as if they’ve done an academic study about the feasibility of crossing the Atlantic and concluded that it’s problematic because their pedalo sank. If you want to do efficient server push on the Web, you need an efficient push server. For example, the one that we sell (www.caplin.com) comfortably supports over 10,000 concurrent users on an average midrange server, and can send out over 4,000,000 messages per second. And that’s BEFORE clustering. You just have to design it right.

Comment by Paul Caplin — September 4, 2007
