

On Condor and Grids

There is lots of buzz about Hadoop and Amazon EC2, and there should be, given great projects such as the one at the New York Times, which converted its old articles into PDF files in short order and at a very reasonable cost.

There’s a second environment you should know about, although the buzz level is a bit lower (that might change). Condor is a scheduling application commonly used for HPC and grid workloads. It can also be used to manage Hadoop grids, and it manages “jobs” much as a mainframe does: you submit a job to Condor along with metadata describing the job’s characteristics, and Condor then finds suitable resources to allocate to it. Note that Condor and Hadoop were developed independently to solve different problems, so they overlap in some areas and do unrelated things in others.
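To make the submit-and-match model concrete, here is a deliberately simplified Python sketch. Real Condor describes jobs and machines with ClassAds and matches them using Requirements and Rank expressions; the classes, attribute names, and greedy first-come matching below are hypothetical stand-ins for illustration, not Condor's API or its actual matchmaking algorithm.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Machine:
    # Hypothetical stand-in for a machine ad: attributes a resource advertises.
    name: str
    memory_mb: int
    cpus: int
    arch: str
    claimed: bool = False


@dataclass
class Job:
    # Hypothetical stand-in for a job ad: metadata describing what the job needs.
    job_id: int
    requirements: Callable[[Machine], bool]
    rank: Callable[[Machine], float] = lambda m: 0.0  # higher is better


def match(job: Job, machines: List[Machine]) -> Optional[Machine]:
    """Return the best-ranked unclaimed machine satisfying the job, and claim it."""
    candidates = [m for m in machines if not m.claimed and job.requirements(m)]
    if not candidates:
        return None  # no suitable resource; the job would stay idle in the queue
    best = max(candidates, key=job.rank)  # the first machine wins ties
    best.claimed = True
    return best


def schedule(jobs: List[Job], machines: List[Machine]) -> Dict[int, Optional[str]]:
    """Match jobs in submission order, mapping each job id to a machine name or None."""
    placement: Dict[int, Optional[str]] = {}
    for job in jobs:
        machine = match(job, machines)
        placement[job.job_id] = machine.name if machine else None
    return placement


if __name__ == "__main__":
    machines = [
        Machine("node1", memory_mb=2048, cpus=2, arch="X86_64"),
        Machine("node2", memory_mb=8192, cpus=8, arch="X86_64"),
        Machine("node3", memory_mb=4096, cpus=4, arch="INTEL"),
    ]
    jobs = [
        Job(1, requirements=lambda m: m.arch == "X86_64" and m.memory_mb >= 4096),
        Job(2, requirements=lambda m: m.memory_mb >= 1024, rank=lambda m: m.cpus),
        Job(3, requirements=lambda m: m.memory_mb >= 16384),
    ]
    print(schedule(jobs, machines))  # {1: 'node2', 2: 'node3', 3: None}
```

Job 3 finds no machine with enough memory and stays unplaced, which mirrors a job waiting idle in the queue until a suitable resource appears.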

This week I attended Condor Week at the University of Wisconsin in Madison. Condor Week is an annual event that gives Condor collaborators and users the chance to exchange ideas and experiences, learn about the latest research, see live demos, and influence the project’s short- and long-term research and development directions.

If you are interested in large-scale grid computing, Condor is worth a serious look. There are two active projects that run Condor on Amazon EC2, which is, of course, why I’m writing this post.

Cycle Computing offers Amazon EC2 plus Condor as an integrated platform, in addition to supporting other underlying computing resources. Their software automates Condor grid management, including monitoring, configuration, version control, usage tracking, and more. At the conference, Jason Stowe of Cycle Computing made a very strong case for using Amazon EC2 instead of a traditional grid environment. Jason’s presentation is available for download at http://www.cs.wisc.edu/condor/CondorWeek2008/condor_presentations/stowe_cycle.pdf.

Red Hat’s approach integrates EC2 directly into the Condor code base. The result is that an Amazon EC2 instance is itself the “Condor job,” which lets Condor manage the entire life cycle of an EC2 instance. In some cases the entire Condor pool runs on EC2; in others, EC2 augments an existing pool. This work was a collaboration between the University of Wisconsin (Jaeyoung Yoon, Fang Cao, and Jaime Frey) and Matt Farrellee of Red Hat. They plan to integrate Amazon S3 as a storage medium in the near future.
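The idea of treating an instance as a job can be modeled in a few lines. The sketch below is hypothetical: `InstanceJob`, its states, and the `launch`/`terminate` callbacks are illustrative assumptions rather than Condor's or Amazon's interfaces, and the `FakeCloud` stand-in makes no real AWS calls. What it shows is the key property of the design: starting the job launches an instance, and completing or removing the job terminates it, so the instance's lifetime follows the job's.

```python
from enum import Enum
from typing import Callable, Optional, Set


class JobState(Enum):
    # Loosely modeled on Condor job states; the EC2 mapping is illustrative only.
    IDLE = "idle"            # submitted; no instance requested yet
    RUNNING = "running"      # instance launched and doing work
    COMPLETED = "completed"  # work finished; instance terminated
    REMOVED = "removed"      # job removed by the user; instance terminated


class InstanceJob:
    """Hypothetical job whose payload is a virtual machine instance."""

    def __init__(self, image_id: str,
                 launch: Callable[[str], str],
                 terminate: Callable[[str], None]) -> None:
        self.image_id = image_id
        self._launch = launch          # assumed callback: image_id -> instance_id
        self._terminate = terminate    # assumed callback: instance_id -> None
        self.instance_id: Optional[str] = None
        self.state = JobState.IDLE

    def start(self) -> None:
        """Run the job: launching the instance is what 'running' means here."""
        if self.state is not JobState.IDLE:
            raise RuntimeError(f"cannot start a job in state {self.state.name}")
        self.instance_id = self._launch(self.image_id)
        self.state = JobState.RUNNING

    def complete(self) -> None:
        """Finish the job normally and shut the instance down."""
        if self.state is not JobState.RUNNING:
            raise RuntimeError(f"cannot complete a job in state {self.state.name}")
        self._terminate(self.instance_id)
        self.state = JobState.COMPLETED

    def remove(self) -> None:
        """Remove the job; a running instance is terminated so nothing is left behind."""
        if self.state in (JobState.COMPLETED, JobState.REMOVED):
            return  # already finished; nothing to clean up
        if self.state is JobState.RUNNING:
            self._terminate(self.instance_id)
        self.state = JobState.REMOVED


class FakeCloud:
    """Hypothetical in-memory stand-in for a cloud provider (no network calls)."""

    def __init__(self) -> None:
        self.running: Set[str] = set()
        self._counter = 0

    def launch(self, image_id: str) -> str:
        self._counter += 1
        instance_id = f"i-{self._counter:04d}"
        self.running.add(instance_id)
        return instance_id

    def terminate(self, instance_id: str) -> None:
        self.running.discard(instance_id)


if __name__ == "__main__":
    cloud = FakeCloud()
    job = InstanceJob("ami-example", cloud.launch, cloud.terminate)  # placeholder image id
    job.start()
    print(job.state.name, cloud.running)   # RUNNING {'i-0001'}
    job.remove()
    print(job.state.name, cloud.running)   # REMOVED set()
```

Tying termination to job removal is what makes this model attractive for pools that grow onto EC2: the scheduler that already tracks jobs also guarantees that no instance outlives the work it was started for.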

One thing seems certain: on-demand virtualization brightens the lights in Grid Computing City, because organizations that could not afford a grid suddenly find themselves with both affordable infrastructure and powerful tools to manage it.

-- Mike
