Developers who have found our cloud computing model attractive have been asking us to be a little bit more open about what we are planning to do in the future. To date we've simply announced new additions to the Amazon Web Services lineup, with immediate beta availability at the time of announcement.
Earlier this year we started to post specifications for new features along with requests for feedback. We did this for the Amazon S3 Copy feature and for Amazon S3 Post Support. We received a lot of helpful feedback in both cases.
Now it is time for the next step...
I am excited to be able to tell you about an entire new feature, a feature so new that it doesn't even have a proper name, and that you can't use just yet. But you can read about it and you can start thinking about the best way to incorporate it into your system architecture.
If you have taken a close look at Amazon EC2, you know that the instances are ephemeral. The instances have anywhere from 160 GB to 1.7 TB of attached storage. The storage is there as long as the instance is running, but of course it disappears as soon as the instance is shut down. Applications with a need for persistent storage could store data in Amazon S3 or in Amazon SimpleDB, but they couldn't readily access either one as if it were an actual file system.
As you can read in our forum post, we've been working on addressing this.
Your running EC2 instances, your Elastic IP addresses, your S3 buckets and your SQS queues can all be thought of as items contained within the scope of your AWS account. Our forthcoming persistent storage feature will add a new kind of item to that list: reliable, persistent storage volumes for use with EC2. Once created, these volumes will be part of your account and will have a lifetime independent of any particular EC2 instance.
These volumes can be thought of as raw, unformatted disk drives which can be formatted and then used as desired (or even used as raw storage if you'd like). Volumes can range in size from 1 GB on up to 1 TB; you can create and attach several of them to each EC2 instance. They are designed for low latency, high throughput access from Amazon EC2. Needless to say, you can use these volumes to host a relational database.
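The transcript below passes the volume size to the command-line tools in bytes (549755813888 for a "512 GB" volume), which suggests that "GB" here means 2^30 bytes. Here is a minimal sketch of a hypothetical helper, not part of the EC2 tools, that checks a requested size against the 1 GB to 1 TB range and converts it to bytes under that binary-units assumption:

```python
# Hypothetical helper, not part of the EC2 tools. Assumes 1 GB = 2**30 bytes,
# which matches 512 GB = 549755813888 bytes in the transcript, and assumes
# 1 TB = 1024 GB.
GIB = 2**30
MIN_GB = 1       # smallest size mentioned in the post
MAX_GB = 1024    # 1 TB under the assumption above


def volume_size_bytes(size_gb: int) -> int:
    """Return the size in bytes for a volume of size_gb, or raise ValueError."""
    if not MIN_GB <= size_gb <= MAX_GB:
        raise ValueError(f"volume size must be {MIN_GB}-{MAX_GB} GB, got {size_gb}")
    return size_gb * GIB


print(volume_size_bytes(512))  # 549755813888
```

Doing the conversion in one place avoids typing twelve-digit byte counts by hand.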
You will also be able to perform "snapshot" backups of your volumes to Amazon S3. You can use these snapshots to create new volumes or to roll back your stored data to an earlier point in time.
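To make the two uses of a snapshot concrete (creating a new volume from it, or rolling an existing volume back to it), here is a toy in-memory model. It is purely illustrative: the class names, the "snap-" ids, and the block-map representation are assumptions, and nothing here reflects how the real service stores snapshots in Amazon S3.

```python
from copy import deepcopy


class ToyVolume:
    """Hypothetical stand-in for a volume: a map of block number -> data."""

    def __init__(self, blocks=None):
        self.blocks = dict(blocks or {})

    def write(self, block: int, data: bytes) -> None:
        self.blocks[block] = data


class ToySnapshotStore:
    """Hypothetical stand-in for the snapshot copies kept in S3."""

    def __init__(self):
        self._snapshots = {}
        self._next_id = 0

    def create_snapshot(self, volume: ToyVolume) -> str:
        snap_id = f"snap-{self._next_id}"
        self._next_id += 1
        # Point-in-time copy: later writes to the volume do not affect it.
        self._snapshots[snap_id] = deepcopy(volume.blocks)
        return snap_id

    def create_volume_from(self, snap_id: str) -> ToyVolume:
        # A brand-new volume initialized from the snapshot's contents.
        return ToyVolume(self._snapshots[snap_id])

    def roll_back(self, volume: ToyVolume, snap_id: str) -> None:
        # Replace the volume's contents with the snapshot's contents.
        volume.blocks = deepcopy(self._snapshots[snap_id])
```

For example, write version 1 of a block, snapshot it, overwrite it with version 2, and both a clone created from the snapshot and the rolled-back original will hold version 1 again.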
The volumes are accessible via a new set of APIs, with functions like CreateVolume, DeleteVolume, AttachVolume, and CreateSnapshot. The same functionality is also available via the EC2 Command-Line tools.
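The rules implied by these calls (a volume belongs to the account rather than to an instance, it must be available before it can be attached, and it has to be detached before it is deleted) can be sketched as a small local simulation. This is not the AWS API: the method names, the "in-use" state name, and the DetachVolume-style detach step are assumptions chosen to mirror the operations mentioned in the post, and the one-instance-per-volume rule follows the comments further down rather than anything stated here.

```python
class VolumeError(Exception):
    pass


class ToyAccount:
    """Hypothetical local model of volumes owned by an AWS account."""

    def __init__(self):
        self.volumes = {}      # volume id -> state
        self.attachments = {}  # volume id -> (instance id, device)
        self._counter = 0

    def create_volume(self) -> str:
        self._counter += 1
        vol_id = f"vol-{self._counter:08x}"
        # The transcript shows "creating" before "available"; this sketch
        # skips that asynchronous step and marks the volume available at once.
        self.volumes[vol_id] = "available"
        return vol_id

    def attach_volume(self, vol_id: str, instance_id: str, device: str) -> None:
        if self.volumes.get(vol_id) != "available":
            raise VolumeError(f"{vol_id} is not available")
        for inst, dev in self.attachments.values():
            if inst == instance_id and dev == device:
                raise VolumeError(f"{device} already in use on {instance_id}")
        self.attachments[vol_id] = (instance_id, device)
        self.volumes[vol_id] = "in-use"

    def detach_volume(self, vol_id: str) -> None:
        if vol_id not in self.attachments:
            raise VolumeError(f"{vol_id} is not attached")
        del self.attachments[vol_id]
        self.volumes[vol_id] = "available"

    def delete_volume(self, vol_id: str) -> None:
        if self.volumes.get(vol_id) != "available":
            raise VolumeError(f"{vol_id} must be detached before deletion")
        del self.volumes[vol_id]
```

Because the volume state lives in the account object rather than in any instance, detaching a volume leaves its entry intact until delete_volume is called explicitly.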
I spent some time experimenting with this new feature on Saturday. In a matter of minutes I was able to create a pair of 512 GB volumes, attach them to an EC2 instance, create file systems on them with mkfs, and then mount them. When I was done I simply unmounted, detached, and then finally deleted them.
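The on-instance part of that round trip (format, create mount points, mount, and later unmount) follows a fixed pattern. The hypothetical helper below only builds the command strings from the transcript for any list of attached devices; it does not run them. The /spaceN mount points follow the naming used in the post. The transcript pipes yes into mkfs, presumably to answer its confirmation prompt about formatting a whole device rather than a partition.

```python
def setup_commands(devices):
    """Build format-and-mount commands for the given device paths."""
    if not devices:
        return []
    mounts = [f"/space{i}" for i in range(1, len(devices) + 1)]
    cmds = [f"yes | mkfs -t ext3 {dev}" for dev in devices]
    cmds.append("mkdir " + " ".join(mounts))
    cmds += [f"mount {dev} {mnt}" for dev, mnt in zip(devices, mounts)]
    return cmds


def teardown_commands(devices):
    """Unmount each volume; detaching and deleting happen via the EC2 tools."""
    return [f"umount /space{i}" for i in range(1, len(devices) + 1)]


for cmd in setup_commands(["/dev/sdb", "/dev/sdc"]):
    print("#", cmd)
```

Generating the commands from the device list keeps the device-to-mount-point pairing consistent between setup and teardown.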
First I created the volumes from the command line of my Windows desktop:
U:\USER\Jeff\Amazon> ec2-create-volume -s 549755813888
VOLUME vol-4695702f 549755813888 creating 2008-04-13T22:17:35+0000
U:\USER\Jeff\Amazon> ec2-create-volume -s 549755813888
VOLUME vol-59957030 549755813888 creating 2008-04-13T22:17:49+0000
U:\USER\Jeff\Amazon> ec2-describe-volumes
VOLUME vol-4695702f 549755813888 available 2008-04-13T22:17:35+0000
VOLUME vol-59957030 549755813888 available 2008-04-13T22:17:49+0000
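Since a new volume starts out as "creating" and only later becomes "available", a script that drives these tools would need to read this output. Here is a minimal parser sketch, assuming the whitespace-separated field order seen above (VOLUME, id, size in bytes, status, timestamp) and, as before, 1 GB = 2^30 bytes. Both function names are hypothetical.

```python
from datetime import datetime


def parse_volume_line(line: str) -> dict:
    """Parse one VOLUME line as printed by the command-line tools above."""
    kind, vol_id, size, status, created = line.split()
    if kind != "VOLUME":
        raise ValueError(f"not a VOLUME line: {line!r}")
    return {
        "id": vol_id,
        "size_gb": int(size) / 2**30,  # assumes 1 GB = 2**30 bytes
        "status": status,
        "created": datetime.strptime(created, "%Y-%m-%dT%H:%M:%S%z"),
    }


def all_available(lines) -> bool:
    """True once every listed volume reports the "available" status."""
    return all(parse_volume_line(l)["status"] == "available" for l in lines)
```

A script could keep re-running the describe command until all_available returns True before moving on to the attach step.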
Then I attached them to my EC2 instance:
U:\USER\Jeff\Amazon> ec2-attach-volume vol-4695702f -i i-6b3bfd02 -d /dev/sdb
ATTACHMENT vol-4695702f i-6b3bfd02 /dev/sdb attaching 2008-04-13T22:36:32+0000
U:\USER\Jeff\Amazon> ec2-attach-volume vol-59957030 -i i-6b3bfd02 -d /dev/sdc
ATTACHMENT vol-59957030 i-6b3bfd02 /dev/sdc attaching 2008-04-13T22:36:55+0000
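When several volumes go onto one instance, each needs its own device name (/dev/sdb, /dev/sdc, and so on). This minimal sketch reads ATTACHMENT lines like the ones above, assuming the field order ATTACHMENT, volume id, instance id, device, status, timestamp, and builds a map from (instance, device) to volume while rejecting a device that shows up twice. The function name is hypothetical.

```python
def device_map(lines):
    """Map (instance id, device) -> volume id, rejecting duplicate devices."""
    mapping = {}
    for line in lines:
        kind, vol_id, instance_id, device, _status, _ts = line.split()
        if kind != "ATTACHMENT":
            raise ValueError(f"not an ATTACHMENT line: {line!r}")
        key = (instance_id, device)
        if key in mapping:
            raise ValueError(f"{device} on {instance_id} attached twice")
        mapping[key] = vol_id
    return mapping
```

The resulting map tells you which device to format and mount for each volume on the instance.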
Then I switched over to my instance, formatted and mounted them, and I was all set:
# yes | mkfs -t ext3 /dev/sdb
# yes | mkfs -t ext3 /dev/sdc
# mkdir /space1 /space2
# mount /dev/sdb /space1
# mount /dev/sdc /space2
# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 9.9G 765M 8.6G 8% /
none 851M 0 851M 0% /dev/shm
/dev/sda2 147G 188M 140G 1% /mnt
/dev/sdb 504G 201M 479G 1% /space1
/dev/sdc 504G 201M 479G 1% /space2
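Two numbers in this df output are worth a quick check. A 512 GB volume shows up as 504G, presumably because some space goes to ext3 metadata such as inode tables. And Avail (479G) is well below Size minus Used. That second gap is consistent with mkfs's default of reserving 5% of the blocks for root. A rough check under that 5% assumption, using the figures df reports for /dev/sdb:

```python
# Rough check of the df figures for /dev/sdb. Assumes mkfs's default of
# reserving 5% of the file system's blocks for root; sizes are in GiB as
# printed by df -h.
RESERVED_FRACTION = 0.05


def expected_avail(size_g: float, used_g: float) -> float:
    """Space left for non-root users under the 5% reservation assumption."""
    return size_g - used_g - RESERVED_FRACTION * size_g


avail = expected_avail(504, 201 / 1024)  # 201M used, converted to GiB
print(f"{avail:.1f}G")  # 478.6G; df reports 479G
```

Because df -h prints rounded values, agreement to within a gigabyte is about as close as this check can get.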
Perhaps I am biased, but the ability to requisition this much storage on an as-needed basis seems pretty cool.
A few EC2 customers are already using these new volumes, and we will be opening them up to a wider audience later this year. You should sign up now if you are interested in gaining access to this cool new feature. If you don't already have an Amazon Web Services account, get one today before you sign up for the waiting list.
We'll be releasing more information as soon as possible and I'll do my best to cover it here when we do.
Updated: Here is some additional coverage:
- Amazon's Werner Vogels talks about this feature in Persistent Storage for Amazon EC2.
- RightScale's Thorsten von Eicken does too, in Amazon takes EC2 to the next level with persistent storage volumes.
--- Jeff;
"Needless to say, you can use these volumes to host a relational database."
... and that's the line we've been waiting for. w00t!
Though I have to ask: how reliable are these volumes? Do they get the same redundancy/replication as normal S3 data?
Posted by: Yoz | April 13, 2008 at 09:15 PM
Having tested the new storage volumes I can say only one thing: you'll love them! They really raise the EC2 offering to the next level. It will surpass non-cloud computing not only in scale and price but also in features. Yay!
More thoughts on how the storage volumes will change the game in my blog post at http://blog.rightscale.com/2008/04/13/amazon-takes-ec2-to-the-next-level-with-persistent-storage-volumes/
Posted by: Thorsten - CTO RightScale | April 13, 2008 at 10:44 PM
On behalf of everyone who is using EC2, or who has used EC2 in the past but written it off because of its various data persistence limitations, I would like to say Thank You. Google AppEngine.. what's that?
Posted by: Paul Stamatiou | April 13, 2008 at 10:54 PM
Is it also possible to attach the same storage to more than one EC2 instance?
Posted by: Mirko Sciachero | April 13, 2008 at 11:31 PM
Great feature. That boosts the usability of the EC2 service since it is now much easier to use for already built applications and tools.
1. Are these volumes going to be resizable? For example, can you start off with a 100 GB volume, then later resize it to 200 GB?
2. I wonder if the cost is going to be based on read/writes vs. just the size of the volume.
3. What is the cost of a snapshot? Is it based on volume size? size of data on volume? or incremental based on changes from previous snapshot?
Posted by: Victor Boctor | April 14, 2008 at 12:26 AM
Simply beautiful. I just hope that pricing will be set in an "affordable" range ;-).
/p
Posted by: Przemyslaw Rudzki | April 14, 2008 at 02:13 AM
Yep, this has made me ecstatically happy!
Posted by: James Hill | April 14, 2008 at 03:28 AM
This is amazing. I'm curious, is there any plan to allow a single volume to be mounted read-only across several EC2 instances?
Posted by: felix | April 14, 2008 at 03:31 AM
Is there any timeline on future availability of this service? I am launching a site within days, and the final frontier for me was installing JungleDisk to have an S3 filesystem.
Obviously, this news describes the most preferred solution. I would hold off if there was a chance this functionality would become available in the next few weeks.
Posted by: Ryan | April 14, 2008 at 06:02 AM
What about building an EC2 instance cluster with a shared file system using Red Hat Global File System?
http://www.centos.org/docs/5/html/5.1/Global_File_System/
Would this be possible?
Posted by: Nicolas Lehuen | April 14, 2008 at 08:18 AM
For read-only access you may be able to mount the volume over SSHFS, which is secure.
Securing NFS in this environment may be more difficult than it's worth for read-only access.
Posted by: Amit Sudharshan | April 14, 2008 at 08:21 AM
This is excellent. I'd like to add my voice to the chorus of calls (also on the forum post) asking for individual mounted volumes to be read/writeable by multiple instances. I think this really is a crucial feature. All sorts of back end processing of data, files, DB contents etc. becomes much, much more feasible, easy and reliable if the volume it's mounted on is accessible from any of multiple EC2 instances that exist to do the processing. Please make this possible at release, Amazon!
Thanks,
Alex
Posted by: Alex Kerr | April 14, 2008 at 08:35 AM
Paul, as stated on Werner Vogels' blog you can attach a volume only to one instance. I guess this is a feature we'll have to hope for in V2.
Victor, unlike all other AWS services which are priced very aggressively it looks like the storage volumes will be horribly expensive (JOKE). Seriously, before introducing elastic IPs AWS had been discussing a number of pricing options and I believe everyone was surprised by the pricing model they chose: free while you use it and pay while you don't. If you think about it, it makes a lot of sense for the EIPs. So I expect to be surprised by the storage volume pricing structure and to find it to make sense.
Felix: I wouldn't hold off if I were you. If Jungle Disk works for you the price is right and it'll get you off the ground. You can then move over to the storage volumes once they become available and all the tools are there.
Posted by: Thorsten - CTO RightScale | April 14, 2008 at 09:19 AM
Compliment: This is truly a great frontier - thanks for listening to the consumers! - listening is your competitive advantage in the future of Cloud Computing!
Request for clarification: The S3 environment has data redundancy, with data stored across multiple machines. My perception is that this new service has its data stored on a single physical disk. Is this correct?
Request for feature: The geographic zone feature for EC2 is an important part of redundancy. It would be nice for developers to be able to build a system of these new disks across multiple pieces of hardware (different physical drives) and geographic zones. There should also be a clarification (service level agreement) that a disk is not on the same physical drive as another raw disk one is using.
Coining a name: How about "point storage", "raw drive", "plain disk", "true disk", or "mount point".
Sincerely,
Ian
Posted by: Ian | April 14, 2008 at 09:20 AM
"I'd like to add my voice to the chorus of calls (also on the forum post) asking for individual mounted volumes to be read/writeable by multiple instances. I think this really is a crucial feature."
I agree that this would be useful, but is it really that crucial?
Think of how difficult it is to have multiple servers have read/write access to the same raw disk today. I'm not even sure what hardware could be used, apart from FireWire. Can standard Fibre Channel hardware do this?
And it would probably also be a pain to set up. Oracle clustering uses shared disks, but it's just not that common, especially not in open source software. I wouldn't put this at the top of the todo list.
Posted by: Guan Yang | April 14, 2008 at 10:18 AM
If shared storage is crucial really depends on your app, for some it is.
FireWire isn't the common way to do it. Usually it's Fibre Channel, or Ethernet (iSCSI). It's not that hard to set up. RHEL has it built in (DLM, GFS, etc.), and v5 supports a 128-node quorum.
If there's not a (really) quick way to attach your storage to another node, it almost becomes essential, or you'll be dead in the water while you're waiting for your locked-up node to release storage so you can attach it to another node.
Posted by: Daniel | April 14, 2008 at 12:47 PM
Guan, yes, I would say the ability for multiple EC2 instances to access the same storage volume is indeed "crucial". That's one reason why so many people are asking for it (across the blogs and forums where I've seen this posted already). So far there have been various third-party attempts at creating a single "disk" across multiple instances (e.g. s3dfs and various others). I don't care how Amazon actually implements it, just as long as it's possible. If this doesn't happen, we are almost still at square one: data (files, DBs, etc.) being processed by a farm of instances has to be continually replicated across all of their storage volumes. This is a major, major pain. Multiple-instance access to a volume solves this instantly. The fact that these third-party solutions were much asked for, and are much used, shows how needed this feature is.
You seem to be thinking of the difficulties associated with physical network hardware; I don't think these need apply in the AWS environment.
Posted by: Alex Kerr | April 14, 2008 at 02:54 PM
It would be great if we could attach the same EC2 storage to more than one EC2 instance (ideally for both reading and writing, but if that is not possible, then at least for reading).
And while I am posting here, when do you expect European EC2 data centers to be supported (for lower latency)?
Posted by: Morten | April 14, 2008 at 11:56 PM
You guys rock. Watching stuff like this develop makes me think I'm watching something very big happen. Keep it up.
Posted by: Markus | April 15, 2008 at 06:19 AM
I also hope this will be affordable...I had some ideas for solutions that I posted in the AWS developer forums, and I had hoped they would reach someone at Amazon. Anyway, to be short, I really hope there's some sort of "included" persistent storage with EC2...Having to pay extra for persistent storage, backup (S3), AND the EC2 server is just out of hand and out of line. Unless, of course, you're tying up a ton of space...I think allowing people to simply create partitions of any size is over the top...but I think you should at least get some amount of included persistent storage with your EC2 instance (based on the size of the instance) just to cover things...On the other hand, I think it's abusive to store media that will see heavy downloads there...that's best suited for S3 (or another paid solution), but normal site content, sure, yes, it should be protected. I really guess there's no way to police that other than setting a limit on the "included" space.
My other ideas included keeping EC2 data for a set amount of time...giving the server admin enough time to bring another instance to make sure nothing is lost. So setting some sort of expire after X amount of days or what not. OR charge for storage the moment the instance is down...(better yet) This is probably the BEST solution as it doesn't tie up disk space on Amazon's cloud and it doesn't screw over the user. It's fair.
Anyway, hope Amazon continues to be smart here and this all works out.
Posted by: Tom | April 16, 2008 at 07:19 AM
Well, all I can say is I can't wait to find out more and get my hands dirty. It's one of the missing links to make everything come together, and I can see it being very effective...
Posted by: Matthew Lanham | May 27, 2008 at 01:01 PM
I am absolutely salivating for this; just cannot wait to get my greedy little hands on EC2 volumes! I signed up moments after receiving the email a few weeks ago, but nothing yet. :(
Pick me, pick me!
Stu
Posted by: Stu Thompson | May 28, 2008 at 11:25 AM
Great move by Amazon and I second the request for sharing a drive across instances.
Might I suggest a name: Elastic Density Drive (ED2)
Laith.
Posted by: Laith Zraikat | August 04, 2008 at 11:10 AM