Recently I blogged about The Server Labs, a consultancy that
specializes in high-performance computing – including on Amazon Web Services.
Here’s another story that I found fascinating: nominally it
is about how The Server Labs uses Amazon Web Services as a scale-out solution
that also implements Oracle databases; however, it’s really about space
exploration (or should I say “nebula computing”). It began with an email asking
whether there would be a problem running up to 1,000 Amazon EC2 High-CPU Extra-Large
instances.
The Server Labs is a software development/consulting group
based in Spain and the UK that works closely with the European Space Agency,
and they needed to prove the scalability of an application that they helped
build for ESA's Gaia project. In addition to the instances, they also requested
2 large and 3 X-Large instances to host Oracle databases that coordinate the
work being performed by the high-CPU instances.
Gaia’s goal is to make the largest, most precise
three-dimensional map of our Galaxy by surveying an unprecedented number of
stars - more than one billion. This, by the way, is less than 1% of all stars!
The plan is to launch a mission in 2011, collect data until 2017, and then
publish a completed catalog no later than 2019.
I had the opportunity to see a PowerPoint deck created and
presented by The Server Labs’ founder, Paul Parsons, and their software
architect, Alfonso Olias, who is currently assigned to this project.
The deck explained that the expected number of samples in
Gaia is 1 billion stars x 80 observations x 10 readouts, which is approximately
equal to 1 x 10^12 samples, or as much as 42 GB per day transferred back to
Earth. There’s a slide in the deck that says “Put another way, if it took 1
millisecond to process one image, the processing time for just one pass through
the data (on a single processor) would take 30 years.”
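The slide's arithmetic can be checked with a short sketch. The constant names are my own, and the 1 millisecond per sample is the slide's hypothetical, not a measured figure. The exact product is 8 x 10^11 samples, which works out to about 25 years on a single processor; the rounded 10^12 figure gives about 32 years, so the slide's "30 years" is the right order of magnitude either way.

```python
# Back-of-the-envelope check of the sample count and single-processor time.
# All inputs come from the deck; the names are illustrative.
STARS = 1_000_000_000
OBSERVATIONS_PER_STAR = 80
READOUTS_PER_OBSERVATION = 10
MS_PER_SAMPLE = 1  # the slide's hypothetical processing time

SECONDS_PER_YEAR = 365.25 * 24 * 3600

samples = STARS * OBSERVATIONS_PER_STAR * READOUTS_PER_OBSERVATION


def single_pass_years(sample_count, ms_per_sample=MS_PER_SAMPLE):
    """Years needed for one pass over the data on a single processor."""
    return sample_count * ms_per_sample / 1000 / SECONDS_PER_YEAR


print(f"{samples:.1e} samples")
print(f"{single_pass_years(samples):.1f} years at the exact count")
print(f"{single_pass_years(10**12):.1f} years at the rounded 10^12 count")
```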
As the spacecraft travels, it will continuously scan the sky
in 0.7-degree arcs, sending the data back to Earth. Some involved algorithms
will come into play to process the data, and the result is a fairly
complex computing architecture that is linked to an Oracle database. Scheduling
the cluster of computational servers is not quite so complicated, and is based
on a scheduler that is focused on keeping each machine as busy as possible.
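The deck does not describe the scheduler's algorithm, so the following is only a generic sketch of the "keep every machine busy" idea, not The Server Labs' implementation: each task goes to whichever worker frees up first. The function name `schedule` and the task durations are hypothetical.

```python
import heapq


def schedule(task_minutes, worker_count):
    """Greedy sketch: hand each task to the worker that becomes free earliest.

    Returns (makespan, assignments), where assignments maps a worker index
    to the list of task indices it ran.
    """
    # Min-heap of (time the worker becomes free, worker index).
    free_at = [(0.0, worker) for worker in range(worker_count)]
    heapq.heapify(free_at)
    assignments = {worker: [] for worker in range(worker_count)}

    for task_id, minutes in enumerate(task_minutes):
        available, worker = heapq.heappop(free_at)
        assignments[worker].append(task_id)
        heapq.heappush(free_at, (available + minutes, worker))

    makespan = max(available for available, _ in free_at)
    return makespan, assignments


# Four hypothetical tasks spread over two workers.
makespan, assignments = schedule([5, 3, 2, 4], worker_count=2)
print(makespan, assignments)
```

No worker sits idle while tasks remain in the queue, which is the property the paragraph above describes.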
However, the amount of data to process is not steady; it will
increase over time, which means that infrastructure needs will also vary over
time. And of course idle computing capacity is deadly to a budget.
Large computational problems like this one usually call for
grid computing, and this time is no different, except that,
as mentioned above, the required size of the grid is not constant. Because
Amazon Web Services is on-demand, it’s possible to apply just enough
computational resources to the problem at any given time.
In their test, The Server Labs set up an Oracle database
using an AWS Large Instance running a pre-defined public AMI. Then they created
5 EBS volumes of 100 GB each and attached them to the instance.
Then they created Amazon Machine Images (AMIs) to run the
actual analysis software. These images were based on large instances and
included Java, Tomcat, the AGIS software and an rc.local script to
self-configure an instance when it’s launched.
The requirements break down as follows:
To process 5 years of data for 2 million stars, they will
need to run 24 iterations of 100 minutes each, which works out to 40 hours
running a grid of 20 Amazon EC2 instances. A secondary update has to be run
once and requires 30 minutes per run, or 5 hours running a grid of 20 EC2
instances.
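The figures for the 2 million star test can be reproduced with a few lines of arithmetic. This is a sketch using only the numbers quoted above; the variable names are mine. Note that the stated 5-hour total for the secondary update corresponds to ten 30-minute runs, so the sketch derives the run count from the two quoted figures rather than assuming it.

```python
# Grid time for the 2 million star test, using the figures quoted above.
NODES = 20

PRIMARY_ITERATIONS = 24
PRIMARY_MINUTES_PER_ITERATION = 100
primary_hours = PRIMARY_ITERATIONS * PRIMARY_MINUTES_PER_ITERATION / 60

SECONDARY_MINUTES_PER_RUN = 30
SECONDARY_TOTAL_HOURS = 5
# Run count implied by the two quoted secondary-update figures.
implied_secondary_runs = SECONDARY_TOTAL_HOURS * 60 / SECONDARY_MINUTES_PER_RUN

# Total instance-hours across the 20-node grid for both phases.
total_instance_hours = (primary_hours + SECONDARY_TOTAL_HOURS) * NODES

print(f"primary: {primary_hours:.0f} grid hours")
print(f"secondary: {implied_secondary_runs:.0f} runs implied")
print(f"total: {total_instance_hours:.0f} instance-hours")
```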
For the full 1-billion-star project, the numbers extrapolate out
more or less as follows: They calculated that they will analyze 100 million
primary stars, plus 6 years of data, which will require a total of 16,200 hours
of a 20-node EC2 cluster. That’s an estimated total computing cost of 344,000
EUR. By comparison, an in-house solution would cost roughly 720,000 EUR (at
today’s prices) – which doesn’t include electricity or storage or sys-admin
costs. (Storage alone would be an additional 100,000 EUR.)
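To put those totals in per-hour terms, here is a rough sketch based only on the figures quoted above. The implied rate per instance-hour is my own derivation; the deck does not say which charges (database instances, storage, data transfer) are folded into the EC2 estimate, so treat it as an average rather than a price.

```python
# Cost comparison for the full project, using the quoted figures (EUR).
CLUSTER_HOURS = 16_200
NODES = 20
EC2_COST_EUR = 344_000
IN_HOUSE_COST_EUR = 720_000      # excludes electricity, storage, sys-admin
IN_HOUSE_STORAGE_EUR = 100_000   # quoted separately

instance_hours = CLUSTER_HOURS * NODES
# Average implied by the totals; not an actual AWS price.
eur_per_instance_hour = EC2_COST_EUR / instance_hours

ec2_share_of_in_house = EC2_COST_EUR / IN_HOUSE_COST_EUR
in_house_with_storage = IN_HOUSE_COST_EUR + IN_HOUSE_STORAGE_EUR

print(f"{instance_hours:,} instance-hours")
print(f"{eur_per_instance_hour:.2f} EUR per instance-hour (implied average)")
print(f"EC2 estimate is {ec2_share_of_in_house:.0%} of the in-house hardware cost")
print(f"in-house plus storage: {in_house_with_storage:,} EUR")
```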
It’s really exciting to see the Cloud used in this manner,
especially when you realize that an entire set of solutions was
beyond economic possibility before the Cloud became a reality.
Mike