Accenture is a Global Solution Provider for AWS. As part of their plan to help their clients extend their IT provisioning capabilities into the cloud, they offer a complete Cloud Computing Suite including the Accenture Cloud Computing Accelerator, the Cloud Computing Assessment Tool, the Cloud Computing Data Processing Solution, and the Accenture Web Scaler.
Huan Liu and Dan Orban of Accenture Technology Labs sent me some information about one of their projects, Cloud MapReduce. Cloud MapReduce implements Google's MapReduce programming model using Amazon EC2, S3, SQS, and SimpleDB as a cloud operating system.
According to the research report on Cloud MapReduce, the resulting system runs at up to 60 times the speed of Hadoop (this depends on the application and the data, of course). There's no master node, so there's no single point of failure and no processing bottleneck. Because it takes advantage of high-level constructs in the cloud for data (S3) and state (SimpleDB) storage, along with EC2 for processing and SQS for message queuing, the implementation is two orders of magnitude simpler than Hadoop. The research report includes details on the use of each service; they've also published some good info about the code architecture.
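To make the masterless, queue-driven idea a bit more concrete, here is a minimal local sketch of a word count in that style. It is not the Cloud MapReduce code: Python's in-memory `queue.Queue` stands in for SQS, a plain dict stands in for output storage, and every name (`map_worker`, `reduce_worker`, `partition`, `word_count`) is made up for illustration. It also ignores things a real queue service would force you to handle, such as duplicate message delivery and worker failure.

```python
import queue
import zlib
from collections import defaultdict

NUM_REDUCE_QUEUES = 2  # assumption: one queue per reduce partition


def partition(key, n):
    """Pick a reduce queue for a key using a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % n


def map_worker(input_q, reduce_qs):
    """Pull input chunks until the queue is empty; no master assigns work."""
    while True:
        try:
            chunk = input_q.get_nowait()
        except queue.Empty:
            return
        for word in chunk.split():
            # Emit (word, 1) to the reduce queue that owns this key.
            reduce_qs[partition(word, len(reduce_qs))].put((word, 1))


def reduce_worker(reduce_q, output):
    """Drain one reduce queue and sum the values for each key."""
    counts = defaultdict(int)
    while True:
        try:
            key, value = reduce_q.get_nowait()
        except queue.Empty:
            break
        counts[key] += value
    output.update(counts)


def word_count(chunks):
    input_q = queue.Queue()
    for chunk in chunks:
        input_q.put(chunk)
    reduce_qs = [queue.Queue() for _ in range(NUM_REDUCE_QUEUES)]

    # Two identical map workers run one after the other here; any number
    # could pull from the same input queue without coordination.
    map_worker(input_q, reduce_qs)
    map_worker(input_q, reduce_qs)

    output = {}
    for q in reduce_qs:
        reduce_worker(q, output)
    return output


if __name__ == "__main__":
    print(word_count(["the cloud is the computer", "the queue is the master"]))
```

The point of the sketch is that workers are interchangeable: each one simply polls a shared queue for work, so adding or losing a worker does not require a coordinator to reassign tasks.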
Download the code, read the tutorial, and give it a shot!
--Jeff


If true, this sounds amazing. I had a quick look at their wiki and skimmed the PDF. It looks like it benefits from assuming access to AWS services. Is that where the simplicity and performance benefits come from?
My concerns about adopting this would be:
* no updates since Dec 2, 2009
* small team - 2 people, small/no community
* no ecosystem (Hadoop has a pile of subprojects)
Has the Cloud MapReduce team considered finding a way to get Cloud MapReduce under the Hadoop umbrella, where a possibly happy marriage could be arranged to benefit everyone?
Posted by: Otis Gospodnetic | January 21, 2010 at 08:25 AM