

Amazon EC2 For Scientific Processing

Mike Cariaso was kind enough to set up the Meetup in Bethesda for my upcoming trip to Washington, DC. Mike has done some pretty cool work with Amazon EC2, setting up the mpiBLAST tool to run on EC2.

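MPI, short for Message Passing Interface, is a standard that lets the processes of a parallel program, spread across the nodes of a supercomputer or compute grid, coordinate their work by exchanging messages. MPICH2 is a popular implementation of MPI.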
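To make "message passing" concrete: in an MPI program each process (a "rank") works on its own share of the data and then sends its partial result to a coordinating rank, which combines them. The sketch below imitates that pattern in plain Python, using threads as stand-in ranks and a `queue.Queue` as rank 0's mailbox. It is not MPI, and `worker` and `parallel_sum` are hypothetical names chosen for illustration; a real MPICH2 program would exchange the same messages across machines with calls such as `MPI_Send`/`MPI_Recv` or a single `MPI_Reduce`.

```python
import queue
import threading


def worker(rank: int, size: int, data: list[int], mailbox: queue.Queue) -> None:
    """Sum this rank's strided slice of the data and 'send' it to rank 0."""
    # Each rank takes every size-th element, starting at its own rank id.
    partial = sum(data[rank::size])
    mailbox.put((rank, partial))


def parallel_sum(data: list[int], size: int = 4) -> int:
    """Imitate an MPI-style reduce: split the work, then gather partial sums."""
    mailbox: queue.Queue = queue.Queue()  # stand-in for rank 0's incoming messages
    threads = [
        threading.Thread(target=worker, args=(rank, size, data, mailbox))
        for rank in range(size)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Rank 0 "receives" one message per rank and combines them.
    total = 0
    for _ in range(size):
        _sender, partial = mailbox.get()
        total += partial
    return total


print(parallel_sum(list(range(1, 101))))  # 5050
```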

BLAST is the primary bioinformatics tool for comparing a nucleotide or protein sequence against an established database of sequences, or for matching one sequence against another. The primary BLAST tool is run as an online service by the National Institutes of Health.

Running BLAST over MPI lets BLAST run on a processing grid; this variant is called mpiBLAST.
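Broadly, mpiBLAST gets its speedup by splitting the sequence database into fragments, searching each fragment on a different node, and merging the hits. The sketch below shows only that split/search/merge shape, under heavy simplifying assumptions: threads stand in for cluster nodes, and the scoring is a toy count of shared length-k "words" (k-mers), not BLAST's actual seed-and-extend alignment with statistical scoring. The function names and sample sequences are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def kmers(seq: str, k: int) -> set[str]:
    """All length-k substrings ("words") of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def search_fragment(query: str, fragment: dict[str, str], k: int) -> list[tuple[int, str]]:
    """Toy scoring: count k-mers shared by the query and each database entry.
    Real BLAST extends word hits into scored alignments; this does not."""
    q = kmers(query, k)
    hits = []
    for name, seq in fragment.items():
        shared = len(q & kmers(seq, k))
        if shared:
            hits.append((shared, name))
    return hits


def parallel_search(query: str, database: dict[str, str],
                    workers: int = 3, k: int = 3) -> list[tuple[int, str]]:
    """Split the database into fragments, search each concurrently, merge hits."""
    names = sorted(database)
    # Round-robin segmentation: fragment i gets every workers-th entry.
    fragments = [{n: database[n] for n in names[i::workers]} for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda frag: search_fragment(query, frag, k), fragments)
        merged = [hit for fragment_hits in results for hit in fragment_hits]
    # Best hits first; ties broken by name for a stable order.
    return sorted(merged, key=lambda hit: (-hit[0], hit[1]))


db = {"seqA": "ACGTACGTGA", "seqB": "TTTTTTTT", "seqC": "ACGTTT"}
print(parallel_search("ACGTAC", db))  # [(4, 'seqA'), (2, 'seqC')]
```

Because each fragment is searched independently, adding nodes mostly means cutting the database into more pieces; only the final merge has to see every hit.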

Mike's work builds on that of Peter Skomoroch, who did the work needed to get MPICH2 running on Amazon EC2. Peter documented his work in a very informative set of blog posts:

That last post doesn't actually reference EC2, but it is entertaining nonetheless. Part 2 ends with a parallel fractal calculation running on 5 EC2 instances!
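The post doesn't say which fractal or code Peter used, so the sketch below assumes a Mandelbrot set to show why fractals parallelize so well: every pixel can be computed independently, so rows can simply be dealt out to workers and the results reassembled. Five thread-pool workers stand in for the five EC2 instances; the region of the complex plane, the iteration limit, and the function names are all assumptions made for illustration (width and height must each be at least 2).

```python
from concurrent.futures import ThreadPoolExecutor

MAX_ITER = 50  # assumed iteration limit


def escape_count(c: complex, max_iter: int = MAX_ITER) -> int:
    """Iterations of z -> z*z + c before |z| exceeds 2 (max_iter if it never does)."""
    z = 0j
    for n in range(max_iter):
        if abs(z) > 2.0:
            return n
        z = z * z + c
    return max_iter


def render_rows(rows: list[int], width: int, height: int) -> dict[int, list[int]]:
    """Escape counts for the given rows of a width x height grid
    covering [-2, 1] x [-1.5, 1.5] in the complex plane."""
    out = {}
    for y in rows:
        im = -1.5 + 3.0 * y / (height - 1)
        out[y] = [escape_count(complex(-2.0 + 3.0 * x / (width - 1), im))
                  for x in range(width)]
    return out


def mandelbrot(width: int, height: int, workers: int = 5) -> list[list[int]]:
    """Deal rows out round-robin to `workers` tasks, then reassemble the image."""
    chunks = [list(range(i, height, workers)) for i in range(workers)]
    image: dict[int, list[int]] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for part in pool.map(render_rows, chunks, [width] * workers, [height] * workers):
            image.update(part)
    return [image[y] for y in range(height)]


for row in mandelbrot(60, 24):
    print("".join("#" if n == MAX_ITER else "." for n in row))
```

Threads keep the example self-contained, but CPython's global interpreter lock means they won't actually speed up this CPU-bound loop; real gains come from spreading the same row split across separate processes or machines, which is where MPI on a set of EC2 instances comes in.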

By the way, I'm very interested in hearing about more academic and scientific uses of EC2. Please feel free to post a comment.

-- Jeff;



Comments

I'm anxious to try some MPI-EC2-FUN3D-powered Computational Fluid Dynamics simulations, but unfortunately I'm still on the waiting list for EC2...

There are obvious advantages to using EC2-like services for scientific and academic research, but I think funding-related barriers will stand in the way of complete adoption. Many researchers have the money but cannot rent services or equipment like EC2 because of restrictions on grant money from the various granting agencies (NIH, NSF, etc.). Researchers want or need to get into HPC, but universities often have too much bureaucracy, lack the space or resources, or simply cannot afford to make it happen on campus. There will always be a need for small-scale HPC on research campuses, but once code is refined and scales well, it makes sense to rent time on something like EC2 (or one of the national labs like SDSC). NSF and others are obviously pushing for grids and centralized supercomputing centers for cost reasons, but if you (Amazon) can compete with them on cost, you might actually win.

Hi Jeff,

My name is Mike and I am the founder of MindValley, an Internet startup company. We would like to use Amazon EC2 for some hard-core research by developing a superior algorithm that helps us determine which stories coming out of the blogosphere are the hottest right now.

So, while we are a for-profit company, I was hoping to get in touch with someone at Amazon to better explain what we are trying to do and how both Amazon and MindValley can benefit. We plan to launch hundreds of new niche social media sites powered by our new algorithm, and once they are all running on Amazon EC2 we can promote EC2 on each one of them.

What you are doing is really fantastic and incredibly visionary. Helping small startups scale like this has never been possible before, and I sure hope that we can be one of the first to take full advantage of your service.

I look forward to hearing from you soon. Have a great weekend.

Mike

Hi Jeff, Matt here. One thing I don't get about using EC2 for scientific computing is that, as I understand it, I'm only able to rent a (virtual) slice of a machine. Other customers could be sharing the same physical machine, right? So if I'm paying for, say, ten machines, I don't know if I'm really going to get ten CPUs worth of computing, or five, or two.

Am I wrong here? If I'm right, can we get an option (at a higher price perhaps) to guarantee a given amount of compute power?

Matt Jensen
Seattle

I've blogged here [http://ano.malo.us/archives/bayesian-networks-in-the-cloud] about using EC2 to run some Bayesian network learning tasks. While this was just a small test run with one instance, I hope to run much larger jobs on EC2 soon (if I can get approval for the $$$).
