If you have a mid-range or high-end video card in your desktop PC, it probably contains a specialized processor called a GPU or Graphics Processing Unit. The instruction set and memory architecture of a GPU are designed to handle the types of operations needed to display complex graphics at high speed. The instruction sets typically include instructions for manipulating points in 2D or 3D space and for performing advanced types of calculations. The architecture of a GPU is also designed to handle long streams (usually known as vectors) of points with great efficiency. This takes the form of a deep pipeline and wide, high-bandwidth access to memory.
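To make "manipulating points in 2D or 3D space" and "long streams of points" concrete, here is a rough illustration in plain Python. It runs on the CPU, not a GPU. It applies one 4x4 homogeneous transform to every point in a list. The row-major matrix layout and the function names are illustrative assumptions, not any GPU's actual instruction set. The point is that each point is transformed independently, which is exactly the kind of work a deep, wide GPU pipeline is built to stream through.

```python
# CPU-only sketch: apply one 4x4 homogeneous transform to a stream of 3D points.
# Each point is handled independently of the others, which is what makes this
# kind of work a good fit for wide, deeply pipelined GPU hardware.
from typing import List, Tuple

Point = Tuple[float, float, float]
Matrix4 = List[List[float]]  # row-major 4x4 matrix (illustrative layout)


def translation(dx: float, dy: float, dz: float) -> Matrix4:
    """Build a 4x4 homogeneous translation matrix."""
    return [
        [1.0, 0.0, 0.0, dx],
        [0.0, 1.0, 0.0, dy],
        [0.0, 0.0, 1.0, dz],
        [0.0, 0.0, 0.0, 1.0],
    ]


def transform_point(m: Matrix4, p: Point) -> Point:
    """Multiply m by the homogeneous point (x, y, z, 1), then divide by w."""
    v = (p[0], p[1], p[2], 1.0)
    x, y, z, w = (sum(m[r][c] * v[c] for c in range(4)) for r in range(4))
    return (x / w, y / w, z / w)


def transform_stream(m: Matrix4, points: List[Point]) -> List[Point]:
    """Apply the same transform to every point; no point depends on another."""
    return [transform_point(m, p) for p in points]
```

For example, transform_stream(translation(1.0, 2.0, 3.0), [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]) returns [(1.0, 2.0, 3.0), (2.0, 3.0, 4.0)].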
A few years ago, advanced developers of numerical and scientific applications started to use GPUs for general-purpose calculations, a practice termed GPGPU (General-Purpose computing on Graphics Processing Units). Adoption grew as advances in GPU technology, including high-performance double-precision floating point and ECC memory, met the demands of many additional applications. However, access to such high-end technology, particularly on HPC cluster infrastructure for tightly coupled applications, has been elusive for many developers. Today we are introducing our latest EC2 instance type (this makes eleven, if you are counting at home), the Cluster GPU Instance. Now any AWS user can develop and run GPGPU applications on a cost-effective, pay-as-you-go basis.
Similar to the Cluster Compute Instance type that we introduced earlier this year, the Cluster GPU Instance (cg1.4xlarge if you are using the EC2 APIs) has the following specs:
- A pair of NVIDIA Tesla M2050 "Fermi" GPUs.
- A pair of quad-core Intel "Nehalem" X5570 processors offering 33.5 ECUs (EC2 Compute Units).
- 22 GB of RAM.
- 1690 GB of local instance storage.
- 10 Gbps Ethernet, with the ability to create low latency, full bisection bandwidth HPC clusters.
Each of the Tesla M2050s contains 448 cores and 3 GB of ECC RAM and is designed to deliver up to 515 gigaflops of double-precision performance when pushed to the limit. Since each instance contains a pair of these processors, you can get slightly more than a trillion double-precision FLOPS (2 × 515 gigaflops) per Cluster GPU instance at peak. With the ability to cluster these instances over 10 Gbps Ethernet, the compute power delivered for highly data-parallel HPC, rendering, and media processing applications is staggering. I like to think of it as a nuclear-powered bulldozer, about 1000 feet wide, that you can use for just $2.10 per hour!
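The "slightly more than a trillion FLOPS" figure is simple multiplication. This small Python sketch redoes it for a single instance and for a cluster at the default limit of 8 instances per account, using only the numbers quoted in this post. These are theoretical peaks and list prices, not benchmark results. The constant names are hypothetical.

```python
# Back-of-the-envelope peak numbers from the figures quoted in this post.
# Theoretical double-precision peaks only; real applications will see less.
GFLOPS_DP_PER_M2050 = 515       # peak double-precision GFLOPS per Tesla M2050
GPUS_PER_INSTANCE = 2           # a cg1.4xlarge has a pair of M2050s
PRICE_PER_INSTANCE_HOUR = 2.10  # USD per instance-hour
DEFAULT_INSTANCE_LIMIT = 8      # default per-account Cluster GPU limit


def peak_gflops(instances: int) -> int:
    """Aggregate theoretical peak for a cluster of `instances` nodes."""
    return instances * GPUS_PER_INSTANCE * GFLOPS_DP_PER_M2050


def hourly_cost(instances: int) -> float:
    """On-demand cost per hour, rounded to cents."""
    return round(instances * PRICE_PER_INSTANCE_HOUR, 2)


if __name__ == "__main__":
    for n in (1, DEFAULT_INSTANCE_LIMIT):
        print(f"{n} instance(s): {peak_gflops(n)} GFLOPS peak, ${hourly_cost(n):.2f}/hour")
```

One instance works out to 1030 GFLOPS at $2.10/hour; eight instances give 8240 GFLOPS at $16.80/hour.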
By default, each AWS account can use up to 8 Cluster GPU instances; you can get more by contacting us. As with Cluster Compute instances, this default limit exists to help us understand your needs for the technology early on and is not a technology limitation. For example, we have now removed this default limit on Cluster Compute instances, and users have long been running clusters of 128 nodes and more, as well as running multiple clusters of varied scale at once.
You'll need to develop or leverage some specialized code in order to achieve optimal GPU performance, of course. The Tesla GPUs implement the CUDA architecture. After installing the latest NVIDIA driver on your instance, you can make use of the Tesla GPUs in a number of different ways:
- You can write directly to the low-level CUDA Driver API.
- You can use higher-level functions in the C Runtime for CUDA.
- You can use existing higher-level languages such as FORTRAN, Python, C, C++, Java, or Ruby.
- You can use CUDA versions of well-established packages such as CUBLAS (BLAS), CUFFT (FFT), and LAPACK.
- You can build new applications in OpenCL (Open Computing Language), a new cross-vendor standard for heterogeneous computing.
- You can run existing applications that have been adapted to make use of CUDA.
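At the lowest levels (the Driver API and the C Runtime), CUDA code follows a data-parallel model. You write a kernel that handles one element and launch it over a grid of thread blocks. Each thread computes its global index from blockIdx, blockDim, and threadIdx. The Python sketch below emulates that launch model sequentially on the CPU for DAXPY (y = a*x + y), purely to show the indexing and the bounds check. It is not GPU code: on a Tesla the threads would run in parallel. The names launch, daxpy_kernel, and daxpy are hypothetical, and 256 threads per block is an arbitrary assumption.

```python
# CPU-only emulation of the CUDA launch model, for illustration only.
# On a GPU every (block, thread) pair runs in parallel; here they run in loops.
# In CUDA C the index line would read: int i = blockIdx.x * blockDim.x + threadIdx.x;
from typing import Callable, List


def daxpy_kernel(block_idx: int, block_dim: int, thread_idx: int,
                 n: int, a: float, x: List[float], y: List[float]) -> None:
    """One 'thread' of y = a*x + y: handles exactly one element."""
    i = block_idx * block_dim + thread_idx  # global thread index
    if i < n:  # the last block may have more threads than remaining elements
        y[i] = a * x[i] + y[i]


def launch(kernel: Callable[..., None], grid_dim: int, block_dim: int, *args) -> None:
    """Run `kernel` once for every (block, thread) pair in a 1-D grid."""
    for b in range(grid_dim):
        for t in range(block_dim):
            kernel(b, block_dim, t, *args)


def daxpy(a: float, x: List[float], y: List[float],
          threads_per_block: int = 256) -> List[float]:
    """Size the grid with ceiling division, then launch the kernel."""
    n = len(x)
    blocks = (n + threads_per_block - 1) // threads_per_block
    out = list(y)  # leave the caller's list untouched
    launch(daxpy_kernel, blocks, threads_per_block, n, a, x, out)
    return out
```

For example, daxpy(2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0]) returns [12.0, 24.0, 36.0]. The grid is sized by rounding up, so some threads in the last block fall past the end of the data. That is why the kernel needs the i < n check.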
Elastic MapReduce can now take advantage of the Cluster Compute and Cluster GPU instances, giving you the ability to combine Hadoop's massively parallel processing architecture with high performance computing. You can focus on your application and Elastic MapReduce will handle workload parallelization, node configuration, scaling, and cluster management.
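On the Hadoop side, the division of labor looks roughly like this. You supply the map and reduce logic. The framework splits the input, sorts and groups the intermediate key/value pairs, and spreads the work across nodes. The sketch below is a minimal word count in the Hadoop Streaming style, which uses tab-separated key/value lines. run_locally is a hypothetical single-machine stand-in for the distributed sort and shuffle that Elastic MapReduce would handle for you. A real Streaming mapper or reducer would read sys.stdin and print each output line. Nothing here uses the GPUs.

```python
# Hadoop Streaming-style word count, plus a local stand-in for the shuffle.
# mapper/reducer mirror what a Streaming script does on stdin/stdout;
# run_locally is a hypothetical helper that fakes the distributed sort.
from itertools import groupby
from typing import Iterable, Iterator, List


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit 'word<TAB>1' for every word in the input."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Sum counts per word; input must be sorted by key, as Hadoop guarantees."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_locally(lines: Iterable[str]) -> List[str]:
    """Emulate map -> sort/shuffle -> reduce on one machine."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    for out in run_locally(["the quick brown fox", "The lazy dog"]):
        print(out)
```

The sample run prints one line per distinct word, with "the" counted twice.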
Here are some resources to help you to learn more about GPUs and GPU programming:
- NVIDIA GPU Computing Developer Home Page.
- CUDA Toolkit Download.
- CUDA by Example, published earlier this year.
- Programming Massively Parallel Processors, also published this year.
- The gpgpu.org site has a lot of interesting articles.
So, what do you think? Can you make use of this "bulldozer" in your application? What can you build with this much on-demand computing power at your fingertips? Leave a comment, let me know!
--Jeff;




Wow, shiny new feature.
Posted by: Lix | November 15, 2010 at 12:42 AM
Incredible development; just a short while ago I was thinking about the very limited availability of GPU cloud computing.
I'm really excited and interested in hearing about the kind of applications people will make with this.
Posted by: Wladimir | November 15, 2010 at 02:59 AM
Very cool! There are several folks at MIT and Harvard who are very excited about this. Looks like I have some more work to do in StarCluster (http://web.mit.edu/starcluster) to support this new instance type; it should mostly just be AMI work, though (installing the NVIDIA driver).
Posted by: Justin Riley | November 15, 2010 at 07:04 AM
Cool! Reminds me of math co-processors from the old 8086 plus. Is 22 GB the maximum RAM possible on the CGI instance? If not, why did you choose to lower it?
Posted by: P. Clarke Thomas | November 15, 2010 at 07:27 AM
I would recommend using PyCUDA or PyOpenCL instead of PyStream (outdated) to program GPUs on EC2!
Posted by: Nicolas Pinto | November 15, 2010 at 07:29 AM
We are already heavily invested in GPUs; their GFLOPS per watt is much better than that of CPUs for compute-bound problems. We are running small MPI clusters with lots of them and are looking into private GPU clouds. I'd like to learn more about EC2.
Posted by: Sven | November 15, 2010 at 07:49 AM
This is a giant leap forward for GPU computing!
We at TidePowerd will get cracking to support these new instances and (hopefully) add a little .NET flair to this groundbreaking service.
Way to go Amazon!
Posted by: Nick Beecroft | November 15, 2010 at 10:36 AM
Suppose I would like to deploy standard Windows-based 3D OpenGL apps on a GPU cluster. Would this give me a working solution using Citrix XenDesktop, for example? Will Windows 7 clients, running on Windows Server 2008 R2, running on XenServer, running on this GPU cluster, recognize a valid OpenGL graphics driver?
If this works, I think there are quite interesting business cases on the horizon. Could someone please comment on this?
Posted by: Manfred van der Voort | November 15, 2010 at 11:25 AM
I have a question for Amazon. When will you launch a Windows GPU AMI?
Posted by: Hareemca | November 15, 2010 at 10:57 PM
It would be fantastic if ATI GPUs could be offered as well. Gary Frost, a developer at AMD, is working on an API called Aparapi that converts Java bytecode directly to OpenCL, so you don't have to use or learn the OpenCL / CUDA Java APIs. More details can be found at http://developer.amd.com/zones/java/Pages/default.aspx towards the bottom, and the alpha download is at the top of the page.
Cheers,
Jim
Posted by: Jim Bethancourt | November 16, 2010 at 07:12 AM
You guys simply ROCK. When I read this I sent an email to everyone in the company saying... early Christmas gift from Amazon. Anyway, you guys are so far ahead... no one can really touch you. Oh, by the way, THANK YOU THANK YOU THANK YOU for making those GPU instances affordable and not gouging customers like the other GPU clouds.
Posted by: tim | November 16, 2010 at 11:01 AM
You can also use SGC-Ruby-CUDA (http://github.com/xman/sgc-ruby-cuda) if you are using Ruby. Because the CentOS AMI for Cluster GPU is very primitive, a community AMI (013944832161/SGCRubyCUDA.1, located in the US East Virginia zone) has been created for a quick trial.
Posted by: xman | November 22, 2010 at 12:54 AM
FYI, we just created a new GPU/Cluster Compute AMI for StarCluster that contains CUDA/PyCuda/PyOpenCL/etc if anyone's interested. See http://mailman.mit.edu/pipermail/starcluster/2010-December/000572.html for details.
Posted by: Justin Riley | December 20, 2010 at 06:45 PM