Monday, May 18th, 2009

Crap I missed it, doesn’t miss your file upload!

Category: Showcase, UI

The Crap I missed it! crew took on the task of importing your iTunes XML file, and wanted to give you responsive feedback on the items as they come in. The usual tactic would be to suck in the entire file, and then process it.

Michael Baldwin did more, and here he tells us more:

We wanted to allow users to upload their iTunes library files, so that we could extract artist names (to let users sign up for new album and concert notifications). The problem was that these .xml library files can easily run up to 20MB in size.

Which means 1) long, boring uploads without feedback that it’s really working, 2) huge space requirements on our servers to support lots of concurrent uploads, and 3) big memory requirements to process the XML files.

What we did instead was to write a bare-bones custom web server just for this task in PHP (yes, PHP) which analyzes the file as it streams in (storing nothing on disk, and using negligible memory), gradually puts artist statistics into shared memory, and then responds to new AJAX requests every ten seconds which retrieve and remove the artist statistics from shared memory.

The end result for users: as users upload their library file over the course of several minutes, they get to watch their web page fill up with their list of artists at the same time, sorted and even animated. If the connection breaks, they can even choose to continue with just the artists that made it.

The end result for us: we can deal with gigabytes of uploads while using trivial computing resources — just a couple KB of incoming buffer space and a couple KB of outgoing buffer space.
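
To make the streaming idea concrete, here is a minimal sketch. It is not the crew's code: it is written in Python rather than PHP, and it assumes the usual iTunes library layout in which each track holds a `<key>Artist</key>` followed by a `<string>` with the name. The class `ArtistStream` and its `feed` and `drain` methods are names invented for this illustration; `drain` stands in for the "retrieve and remove" step that the periodic AJAX requests perform.

```python
import xml.parsers.expat
from collections import Counter


class ArtistStream:
    """Counts artists in an iTunes-style plist as chunks arrive (illustrative only)."""

    def __init__(self):
        self.counts = Counter()
        self._parser = xml.parsers.expat.ParserCreate()
        self._parser.StartElementHandler = self._start
        self._parser.EndElementHandler = self._end
        self._parser.CharacterDataHandler = self._text
        self._tag = None            # element currently being read
        self._buf = []              # text pieces of the current element
        self._want_artist = False   # True right after <key>Artist</key>

    def _start(self, name, attrs):
        self._tag = name
        self._buf = []

    def _text(self, data):
        # Text may arrive in several pieces, so collect it until the end tag.
        if self._tag in ("key", "string"):
            self._buf.append(data)

    def _end(self, name):
        text = "".join(self._buf)
        if name == "key":
            self._want_artist = (text == "Artist")
        elif name == "string" and self._want_artist:
            self.counts[text] += 1
            self._want_artist = False
        self._tag = None
        self._buf = []

    def feed(self, chunk):
        """Parse one chunk of the upload; nothing is written to disk."""
        self._parser.Parse(chunk, False)

    def drain(self):
        """Return the artists seen since the last call and forget them."""
        seen, self.counts = self.counts, Counter()
        return dict(seen)
```

A server loop would call `feed` with each block read from the socket, while the polling requests call `drain`. Only the partial element text and the not-yet-collected counts stay in memory, which is roughly why the approach needs so little buffer space.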

You can check it out yourself, where it will look a bit like this (but constantly updating!)

16 Comments »

Awesome work :-)

But…
Wouldn’t it be an idea to (g) zip the xml file with flash or so before transferring?

Comment by SchizoDuckie — May 18, 2009

Common sense != news.

Anyway, back to work on things no one will ever notice (http://imgkk.com/demo/) because Ajaxian prioritises stuff like this.

Comment by Darkimmortal — May 18, 2009

why is scripting in PHP so surprising?

if you wrote it in CSS, that would be worthy of a (yes, CSS).

anyways, this is pretty cool. i wonder how many people feel comfortable uploading their iTunes prefs over to a third party.

Comment by driverdave — May 18, 2009

@driverdave (then turned into a rant):

People do this a lot on the web. Facebook and tons of other sites have always supported that “give us your email and password and we will find your other friends on this network” mentality. As long as they provide a benefit for the user anyone would do it.

These days it’s much safer though, with sites providing a real means for third party auth.

Comment by thenightwassaved — May 18, 2009

I’m going to break from the pack and say “WOW”.

For me it’s this part that rocks:

analyzes the file as it streams in (storing nothing on disk, and using negligible memory)
(snip)
responds to new AJAX requests every ten seconds which retrieve and remove the artist statistics from shared memory

That is a very cool feature, genius in fact. If “everyone” has been doing this then I must be way behind the times.

PS: darkimmortal, that demo rocks, my jaw dropped.

Comment by user24 — May 18, 2009

oh and PS: it’s not the scripting in PHP that deserves the “(yes, PHP!)”, it’s this part: “a bare-bones custom web server”

I’m guessing they’re using PHP’s socket function to create a listening server on an unusual port, then submitting the form in an iframe (or via ajax) to that port number. That’s totally sweet.

Comment by user24 — May 18, 2009

@SchizoDuckie: no it wouldn’t make sense to zip the file – the whole point is that they can analyse the data as it uploads. Using zip they’d have to wait for the whole file to come in, which would defeat the purpose.

Sorry for flooding the comments, I just can’t get over how amazingly cool this is. Truly innovative.

Comment by user24 — May 18, 2009

@Darkimmortal: I checked out your link and, for the record, Safari 4 is ‘better’ than Firefox 3.54. If your project has such specific requirements, don’t be surprised that nobody notices it.

Comment by okonomiyaki3000 — May 18, 2009

So, low-level sockets API plus the convenience of PHP is understandable. Then shared memory for IPC, then that other process does the database inserts. Clever, but, why not go for a truly multithreaded solution, like a java or python app? (both practically compilable to binary executable code, no need for even a bytecode cache)

Comment by PAStheLoD — May 19, 2009

@Darkimmortal

Wow…!!!! :D

Comment by ThomasHansen — May 19, 2009

@user24: I think it’s possible to gzip/gunzip a stream, so gzipping would *not* defeat the purpose. But I don’t have enough Flash experience to say whether client-side gzipping is possible with Flash.

Comment by nbr — May 19, 2009

@nbr: yes it is: http://probertson.com/projects/gzipencoder/

Comment by Joeri — May 19, 2009

@user24

i mean, i get it, it’s unconventional to write a web server with PHP. my guess is the author is very comfortable with PHP. when PHP is your best hammer, everything looks like a nail.

i’m not a hater, PHP is my best hammer as well. but i’d guess that PHP was chosen because of the author’s level of familiarity with PHP, not because PHP beat out other languages.

Comment by driverdave — May 19, 2009

@nbr: I didn’t know it was possible to ungzip partially received gzipped data. If true, that’s a very cool idea.

@driverdave: Yes I see what you’re saying – good point. Perhaps it’s to do with sharing the memory with other PHP code, or being able to access the same codebase as the rest of the site is written in?

Comment by user24 — May 19, 2009
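
On the gzip question in this thread: a gzip stream is a DEFLATE stream with a small header and trailer, and it can be decoded incrementally as bytes arrive. Here is a minimal sketch in Python (not the language of the server discussed here) that decompresses a gzip upload chunk by chunk. The function name `gunzip_chunks` is made up for this example, and whether the Flash side can produce such a stream is a separate question that this does not address.

```python
import gzip
import zlib


def gunzip_chunks(chunks):
    """Yield decompressed bytes as each compressed chunk arrives."""
    # Adding 16 to the window bits tells zlib to expect gzip framing.
    inflater = zlib.decompressobj(zlib.MAX_WBITS | 16)
    for chunk in chunks:
        out = inflater.decompress(chunk)
        if out:
            yield out
    tail = inflater.flush()
    if tail:
        yield tail


if __name__ == "__main__":
    payload = b"<key>Artist</key><string>Radiohead</string>" * 200
    packed = gzip.compress(payload)
    # Simulate an upload arriving ten bytes at a time.
    pieces = (packed[i:i + 10] for i in range(0, len(packed), 10))
    restored = b"".join(gunzip_chunks(pieces))
    print(restored == payload)
```

Each piece yielded by `gunzip_chunks` could be handed straight to a streaming XML parser, so compression and as-you-upload analysis are not mutually exclusive.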

Thanks for all the comments. Zipping with Flash would be a very cool idea, but it’s an additional level of complexity we’re not quite ready to add yet.

And as to why we chose to write the web server with PHP’s socket functions, it was mainly to be able to include the existing codebase and all its functions with a single line, and for the business reason of not adding a second language to our codebase.

Plus, there’s no practical downside we can see — a single CLI daemon process runs in the background all the time, forking off child processes, so there’s no need to re-”compile” the PHP code on new connections, and all the memory used by the PHP interpreter is shared.

Although we have to admit it did seem like a bizarre idea to write the web server itself in PHP when we first thought of it… Now if we can just get it to interpret PHP files as well, the circle will be complete. :)

Comment by MichaelBaldwin — May 19, 2009

Nanoweb is a web server written entirely in PHP, with support for virtual hosts, fastcgi, db auth, etc. It’s something much more advanced and less “to the point” than your solution.

Plus, there’s no practical downside we can see — a single CLI daemon process runs in the background all the time, forking off child processes, so there’s no need to re-”compile” the PHP code on new connections, and all the memory used by the PHP interpreter is shared.

By the way, have you tested how many concurrent uploads your server can handle? There’s a reason why no mainstream server software uses a process-per-client model.

Comment by kilburn — May 20, 2009
