Thursday, March 4, 2010

i am shocked...

The main task in this paper (MapReduce Online) involved tokenizing and sorting a 5.5 GB file. Hadoop had to use 60 (seriously, 60!) nodes to do this in 900 seconds. The techniques introduced in the paper allowed Hadoop to do this in 600 seconds. But does anyone else think they can write a C program that runs on a single node that runs in half the time on 1/60th of the machines?

No comments:

Post a Comment