Wednesday, March 4th, 2009
Map Reduce in the browser
Ilya Grigorik of Igvita has proposed and built a collaborative Map Reduce system in JavaScript that lets browsers join in and donate CPU time to a distributed job.

On the JavaScript side you can do something like:
function map() {
  /* count the number of words in the body of the document */
  var words = document.body.innerHTML.split(/\n|\s/).length;
  emit('reduce', {'count': words});
}

function reduce() {
  /* sum up all the word counts */
  var sum = 0;
  var docs = document.body.innerHTML.split(/\n/);
  /* plain for loop: 'for each' is a Mozilla-only extension */
  for (var i = 0; i < docs.length; i++) {
    var num = parseInt(docs[i], 10);
    sum += num > 0 ? num : 0;
  }
  emit('finalize', {'sum': sum});
}

function emit(phase, data) { ... }
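The body of emit is elided above. As a rough sketch only, assuming the job server accepts form-encoded POSTs at /emit/<phase> (which is what the post "/emit/:phase" route in the Ruby example suggests), something along these lines could fill that role. The names buildEmitRequest, emitSketch, and xhrSend are hypothetical and do not come from the original project, which may well submit a form instead so that the browser itself follows the server's redirect to the next job.

```javascript
// Hypothetical helper: turn a phase name and a flat data object into the
// URL and form-encoded body of a POST to /emit/<phase>.
function buildEmitRequest(phase, data) {
  var pairs = [];
  for (var key in data) {
    if (Object.prototype.hasOwnProperty.call(data, key)) {
      pairs.push(encodeURIComponent(key) + '=' + encodeURIComponent(data[key]));
    }
  }
  return {
    url: '/emit/' + encodeURIComponent(phase),
    body: pairs.join('&')
  };
}

// Hypothetical stand-in for emit(): `send` is an injected transport, so the
// sketch does not assume any particular browser API.
function emitSketch(phase, data, send) {
  var request = buildEmitRequest(phase, data);
  return send(request.url, request.body);
}

// One possible browser transport (an assumption), using XMLHttpRequest.
function xhrSend(url, body) {
  var xhr = new XMLHttpRequest();
  xhr.open('POST', url, true);
  xhr.setRequestHeader('Content-Type', 'application/x-www-form-urlencoded');
  xhr.send(body);
}
```

In a page, emitSketch('reduce', {'count': 42}, xhrSend) would POST the body count=42 to /emit/reduce.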
And you can have a job server on the other end (here is an example using Ruby):
require "rubygems"
require "sinatra"

configure do
  set :map_jobs, Dir.glob("data/*.txt")
  set :reduce_jobs, []
  set :result, nil
end

get "/" do
  redirect "/map/#{options.map_jobs.pop}" unless options.map_jobs.empty?
  redirect "/reduce" unless options.reduce_jobs.empty?
  redirect "/done"
end

get "/map/*" do erb :map, :file => params[:splat].first; end
get "/reduce" do erb :reduce, :data => options.reduce_jobs; end
get "/done" do erb :done, :answer => options.result; end

post "/emit/:phase" do
  case params[:phase]
  when "reduce" then
    options.reduce_jobs.push params['count']
    redirect "/"
  when "finalize" then
    options.result = params['sum']
    redirect "/done"
  end
end

# To run the job server:
#> ruby job-server.rb -p 80
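To make the flow of the job server easier to follow, here is a small in-memory model of its routing, written in JavaScript. It is a sketch rather than a port: createJobState, nextRoute, and handleEmit are hypothetical names, HTTP and templates are left out, and the only thing it tries to mirror is the order in which the Ruby routes hand out work.

```javascript
// Hypothetical in-memory model of the job server's state.
function createJobState(files) {
  return { mapJobs: files.slice(), reduceJobs: [], result: null };
}

// Mirrors get "/": hand out a map job if any remain, otherwise send the
// client to the reduce step if counts have come in, otherwise to /done.
function nextRoute(state) {
  if (state.mapJobs.length > 0) return '/map/' + state.mapJobs.pop();
  if (state.reduceJobs.length > 0) return '/reduce';
  return '/done';
}

// Mirrors post "/emit/:phase": store the emitted value and return the
// redirect target.
function handleEmit(state, phase, params) {
  if (phase === 'reduce') {
    state.reduceJobs.push(params.count);
    return '/';
  }
  if (phase === 'finalize') {
    state.result = params.sum;
    return '/done';
  }
  return null; // unknown phase: no redirect in this model
}
```

Starting from createJobState(['data/a.txt', 'data/b.txt']), the first two calls to nextRoute hand out '/map/data/b.txt' and then '/map/data/a.txt', because pop takes jobs from the end of the list. Once a count has been emitted, nextRoute returns '/reduce'.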
And with Web Workers you can have the work churn away in the background :)
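One wrinkle is that a Web Worker has no access to document, so the page has to pass the text in as a message. The following is a minimal sketch under that assumption: countWords repeats the split used by map() on a plain string, and countWordsInWorker is a hypothetical helper that builds an inline worker from a Blob URL. Neither name comes from the original project.

```javascript
// Same word count as map(), but on a plain string: workers cannot touch the DOM.
function countWords(text) {
  return text.split(/\n|\s/).length;
}

// Hypothetical helper: run countWords in a Web Worker built from a Blob URL
// and pass the count to `done`. Only meaningful in a browser that has Worker.
function countWordsInWorker(text, done) {
  var source = countWords.toString() +
    '\nonmessage = function (e) { postMessage(countWords(e.data)); };';
  var blob = new Blob([source], { type: 'application/javascript' });
  var worker = new Worker(URL.createObjectURL(blob));
  worker.onmessage = function (e) {
    done(e.data);
    worker.terminate();
  };
  worker.postMessage(text);
}
```

In the page, this could be wired up as countWordsInWorker(document.body.innerHTML, function (n) { emit('reduce', {'count': n}); }), so the counting happens off the main thread and only the emit runs in the page.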

This would be perfect for something like SETI@Home, IMHO.
You guys must be hard up for stories… The guy might as well have written a Hello World application.
The main problem with any serious distributed computing task in JavaScript is that the tasks would have to execute unnoticeably (which Web Workers and Gears may help with), and also within a short time span of about 30 seconds. SETI@Home and Folding@Home work with relatively large datasets and relatively large operations, to the point where the cost of networking all the users may be larger than the cost of computing it in a cloud.
A year ago, I did something similar for the purpose of breaking some MD5 hashes (no real possibility of this helping humanity; hash cracking is mostly malicious) (GAE server, http://jsdc.appspot.com/). I also made something that calculates pi (appjet server, http://distpi.appjet.net/).
Hmmm. Probably not a good idea to have visitors’ browsers doing computing work for you. Somehow I think people who do things like that eventually get into trouble…