AWS Identity and Access Management is very powerful and very flexible. My colleague Elliot Yamaguchi has written a blog post that shows you how to use IAM to create a policy which implements folder-level permissions within an Amazon S3 bucket. By using this policy, you can allow hundreds of users to safely share a single bucket, restricting each one to a particular folder within the bucket.
The post contains a complete explanation of the policy. You can use it as-is or you can customize it as needed.
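To make the pattern concrete, here is a minimal sketch of the idea in Python with boto3. The bucket name, user name, and policy name are placeholders, and the exact statements in Elliot's post may differ; the key ingredients are the ${aws:username} policy variable and the s3:prefix condition.

```python
import json
import boto3

iam = boto3.client("iam")

# Hypothetical bucket; each user gets a "folder" keyed to their IAM user name.
BUCKET = "my-company-bucket"

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # Let the user list the bucket's top level and their own folder.
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": [f"arn:aws:s3:::{BUCKET}"],
            "Condition": {"StringLike": {"s3:prefix": ["", "${aws:username}/*"]}},
        },
        {
            # Object access, but only under the user's own folder.
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
            "Resource": [f"arn:aws:s3:::{BUCKET}/${{aws:username}}/*"],
        },
    ],
}

iam.put_user_policy(
    UserName="alice",  # placeholder user
    PolicyName="folder-level-s3-access",
    PolicyDocument=json.dumps(policy),
)
```

Because the ${aws:username} variable resolves at request time, one policy like this can serve every user; attaching it to an IAM group scales better than writing a separate policy per user.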
I saw an interesting quote this past weekend from inventor and entrepreneur Dean Kamen. In response to a claim of instant success for one of his products, Dean responded that it wasn't in fact instant, but was actually the result of between 15 and 20 years of research and development.
Amazon S3 hasn't been around for nearly that long, but the service is growing very rapidly. This kind of rapid growth comes from a wide variety of customers with an equally wide variety of use cases. Our customers tell us that they use S3 in heavily regulated industries, for any application that needs access to data at Internet scale, and for solutions that address today's Big Data challenges. Let's look at how S3 is being put to use by our customer base in these ways.
S3 in Mature and Regulated Industries
Customers in heavily regulated industries such as finance, health care, and government are using S3 today. For example:
NASDAQ OMX is the largest exchange company in the world, and uses S3 for their FinQloud and Market Replay offerings. FinQloud provides NASDAQ OMX’s clients with efficient storage and management of financial data to help address regulations such as the U.S. Securities and Exchange Commission (SEC) Rule 17a-4 (Books and Records), which requires storage of certain regulated financial data for specific periods of time. Among other use cases, FinQloud utilizes S3 for data storage to help broker-dealers meet record archival and retrieval requirements. The Market Replay offering helps customers quickly access historical stock price information. As noted in the case study, NASDAQ OMX “saw that Amazon S3 would enable them to deliver hundreds of thousands of small files per day to AWS, and then back to the customer - in seconds - an ideal solution at a low cost.”
Toshiba Medical Systems Corporation is a leading Japanese manufacturer of diagnostic imaging systems. They run a health care cloud service called “Healthcare@Cloud” which utilizes Amazon S3 for X-ray, CT, and MRI image data recorded by health care institutions. S3 allows them to meet guidelines on safety management for medical information systems, as well as guidelines on employees who are entrusted with medical data (more information can be found in the case study).
The National Renewable Energy Laboratory’s Open Energy Information Initiative is an open source knowledge sharing platform created to facilitate access to data and tools that accelerate the transition to clean energy systems. OpenEI, which follows guidelines set by the White House’s Open Government Initiative, utilizes S3 for storage of datasets that users can upload and share. There are several hundred datasets today, including global energy and mining data from the World Bank as well as air emissions data from the EPA.
S3 at Internet Scale
Many of the largest Internet companies rely on S3 to store vast amounts of data. Here are a few examples:
Netflix, a leading online subscription service for watching movies and TV programs, runs their streaming video business on AWS. Netflix uses S3 to store petabytes of video content, which they distribute to their customers’ devices via a CDN. When Netflix needs to create a video format for a new device, they stream their S3 video content to thousands of EC2 instances for transcoding.
Instagram enables its users to quickly and easily share photos with their friends and family from their mobile devices. S3 provides the storage backend behind Instagram’s offering, which has now grown to over 100 million users per month.
Spotify is an online music service offering instant access to over 16 million licensed songs. As noted in the case study, using S3 gives them confidence in their “ability to expand storage quickly while also providing high data durability”, allowing them to add over 20,000 tracks a day to their catalog.
S3 for Big Data
Many of our customers store their application and web server logs in S3 for later analysis. These files can occupy a lot of space, but S3 handles them with ease. Here are a few examples:
Yelp is best known for sharing in-depth reviews and insights on all types of local businesses. Yelp uses S3 to store daily logs and photos, and Amazon Elastic MapReduce to process these logs to power features like “People Who Viewed this Also Viewed” and “Review highlights”.
Pinterest is an online pinboard that lets their customers share things they love with their friends. Fortune Magazine recently reported that they’re one of the fastest growing social networks of all time. They use S3 for file and log storage, and process these logs on Elastic MapReduce to draw key insights into their business.
Etsy provides a website for individuals to sell handmade and vintage items, as well as craft supplies. They have over 25 million members and 18 million items listed today. They store their HTTP server logs in S3, and use Elastic MapReduce for web log analysis and recommendation algorithms.
Amazon Coins are a new virtual currency that will be made available to Kindle Fire users this coming May. They can be used to pay for apps and for most in-app purchases.
If your app runs on the Kindle Fire, it is eligible for Amazon Coins with no further work on your part. If it runs on another Android device and is already in the Amazon Appstore for Android, you'll need to review the Kindle Fire Best Practices and then re-submit your app with the appropriate Kindle Fire devices checked in the Device Support section of the submission form.
The AWS Storage Gateway can now be run in the Microsoft Hyper-V virtualization environment. You can use the Storage Gateway to marry your existing on-premises storage systems with the AWS cloud for backup, departmental file share storage, or disaster recovery.
With today's launch of support for Hyper-V, you can now use the Storage Gateway on-premises in two of the most popular virtualization environments: Microsoft Hyper-V and VMware ESXi. You can also run the Storage Gateway on Amazon EC2. This allows you to mirror your on-premises environment in the AWS cloud for on-demand computing and disaster recovery (DR).
About the Storage Gateway
The AWS Storage Gateway combines a software appliance (a virtual machine image that installs in your on-premises IT environment) and Amazon S3 storage. You can use the Storage Gateway to support several different file sharing, backup, and disaster recovery use cases. For example, you can use the Storage Gateway to host your company's home directory files in Amazon S3 while keeping copies of recently accessed files on-premises for fast access. This minimizes the need to scale your local storage infrastructure.
As part of the installation process for the Storage Gateway, you will create one or more storage volumes. The AWS Storage Gateway gives you two options:
Gateway-Cached Volumes store your primary data in S3 and retain frequently accessed data locally. Volumes can be up to 32 TB in size, but you need just a fraction of that amount of local storage. This gives you the ability to trade off overall storage performance and cost, fine-tuning the balance as needed to best serve your application and your users. For example, in a remote office scenario, as your storage footprint increases, you can increase utilization of your Gateway-Cached volume in Amazon S3 without having to physically allocate additional on-premises storage in the remote office.
Gateway-Stored Volumes store all of your data locally with an asynchronous backup to S3 at the time and frequency of your choice for durable, off-site backups. These volumes can be up to 1 TB in size, and you'll need that amount of local storage.
You can create multiple volumes on each of your Storage Gateways, in your choice of sizes. Each volume appears as an iSCSI target and can be attached and used just like a local storage volume would be.
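If you prefer to script volume creation rather than use the console, here is a minimal sketch using boto3's Storage Gateway client. The gateway ARN, target name, IP address, and volume size are all placeholder values for an already-activated gateway.

```python
import uuid
import boto3

sg = boto3.client("storagegateway")

# Placeholder ARN for an already-activated gateway.
GATEWAY_ARN = "arn:aws:storagegateway:us-east-1:123456789012:gateway/sgw-EXAMPLE"

# Create a Gateway-Cached volume: primary data lives in S3,
# frequently accessed data is cached on local disks.
cached = sg.create_cached_iscsi_volume(
    GatewayARN=GATEWAY_ARN,
    VolumeSizeInBytes=2 * 1024**4,      # a 2 TB virtual volume
    TargetName="dept-share",            # becomes part of the iSCSI target name
    NetworkInterfaceId="10.0.0.5",      # the gateway VM's local IP address
    ClientToken=str(uuid.uuid4()),      # idempotency token for safe retries
)

# Attach this iSCSI target from your servers like any local disk.
print(cached["TargetARN"])
```

A Gateway-Stored volume is created the same way with create_stored_iscsi_volume, which additionally names the local disk that holds the full copy of the data.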
Storage Gateway in Action
Jollibee Foods Corporation (JFC) is using the AWS Storage Gateway to back up and mirror their Oracle databases from their on-premises data center to AWS. JFC is the largest fast food chain in the Philippines, with revenues well over 2 billion USD and a presence in more than a dozen countries worldwide. They like the operational simplicity the Storage Gateway enables, making backup of their multiple TB-sized database snapshots to AWS easy and efficient. The Storage Gateway also provides them access to the same database snapshots for use in Amazon EC2, providing a cost-effective in-cloud DR solution.
Getting Started
If you have never used the Storage Gateway before, you can sign up for a 60-day free trial. If you are eligible for the AWS Free Usage Tier, you will receive 1 GB of snapshot storage and 15 GB of data transfer out (aggregated across all AWS services).
Route 53's new DNS Failover feature gives you the power to monitor your website and automatically route your visitors to a backup site if the primary site goes down.
In today's guest post, Product Manager Sean Meckley shows you how to use this powerful new feature on a fictitious website.
DNS Failover pairs up well with Amazon S3’s website hosting feature to create a simple, low-cost, and reliable way to deploy a backup website. Of course no one wants their site to go down, but things happen, whether due to deploying bad code, network outages, or other issues, and it’s helpful to have a backup which gives your customers a good experience in the event that your primary website does go down.
Let’s say you’re running a website on an Amazon EC2 instance—for example a company website with some e-commerce functionality, or a blog, or a photo sharing site. For our example, we’ll use internetkitties.com, a fictional website where visitors can log in and share their favorite cat photos.
If you’re using Route 53 today, here’s what your Route 53 hosted zone might look like. It’s pretty simple, with just three DNS records: two default records that come with your hosted zone, plus an A record for internetkitties.com pointing to the Elastic IP address of your EC2 instance.
Let’s configure DNS Failover so that visitors to internetkitties.com will land on a friendly backup site in the event that the main internetkitties.com website experiences an outage.
From the Route 53 console, click Health Checks in the left navigation bar, then click the Create Health Check button. This takes you to a page where you’ll enter the information that specifies what web page Route 53 should use as the target of its health check. Enter the IP address of your EC2 instance, along with the port (in most cases this will be port 80, the standard port for web pages served over HTTP), your site’s domain name, and the specific web page that you want Route 53 to request (in this case, we’re entering just a forward slash, which means Route 53 will use your site’s index page as the target of the health check). Click Create Health Check to continue.
Now the console shows the health check that we’ve just created.
Click Hosted Zones in the left navigation bar to go back to our hosted zone, and click on the A record for internetkitties.com.
Now, in the Edit Record Set panel on the right side of the page, do the following:
Set the TTL to 60 seconds. This limits the amount of time this DNS record will be cached within the Internet’s DNS system, which means that there will be a shorter delay between the time failover occurs and the time that end users begin to be routed to your backup site.
Set the Routing Policy to “Failover”.
Select “Primary” as the Failover Record Type.
Select “Yes” for Associate Record Set with Health Check.
Select the health check to associate with this record. In the drop-down that appears, you should see the health check we just created. Select this health check.
Click Save Record Set.
Route 53 will now check the health of your site by periodically requesting your homepage and verifying that it returns a successful response (to be more specific, it’s checking independently from multiple locations around the world, with each location requesting the page every 30 seconds).
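If you would rather script this setup than click through the console, here is a minimal sketch using boto3. The hosted zone ID and IP address are placeholders; the health check and the primary failover record mirror the console steps above.

```python
import uuid
import boto3

route53 = boto3.client("route53")

# Placeholder values for the example site.
HOSTED_ZONE_ID = "Z1EXAMPLE"       # hosted zone for internetkitties.com
INSTANCE_IP = "203.0.113.10"       # Elastic IP of the EC2 instance

# Health check: request "/" over HTTP on port 80.
health = route53.create_health_check(
    CallerReference=str(uuid.uuid4()),
    HealthCheckConfig={
        "IPAddress": INSTANCE_IP,
        "Port": 80,
        "Type": "HTTP",
        "ResourcePath": "/",
        "FullyQualifiedDomainName": "internetkitties.com",
    },
)

# Primary failover record with a low TTL so failover propagates quickly.
route53.change_resource_record_sets(
    HostedZoneId=HOSTED_ZONE_ID,
    ChangeBatch={
        "Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": "internetkitties.com.",
                "Type": "A",
                "SetIdentifier": "primary",
                "Failover": "PRIMARY",
                "TTL": 60,
                "ResourceRecords": [{"Value": INSTANCE_IP}],
                "HealthCheckId": health["HealthCheck"]["Id"],
            },
        }]
    },
)
```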
Now, configure your backup site on Amazon S3. For a full walk-through, check out this blog post on how to set up a static website on Amazon S3. You can decide what content to put on your static backup website. For example, you could create a nice “fail whale” page with a friendly message to your customers, and perhaps a phone number or email address so that your customers can reach you even though your website is down.
Back in the Route 53 console, go to your hosted zone and click Create Record Set. Enter the same DNS name as your primary website (in this case, we’re using the root domain “internetkitties.com”, which is the same as the name of our hosted zone, so the Route 53 console suggests this for you). For the alias radio button, click “Yes”. Then, select your S3 website endpoint as the alias target.
Now, set the Routing Policy to “Failover”, and select “Secondary” as the failover record type. Leave the remaining checkboxes (evaluate target health and associate record set with health check) at their default settings of “No”, and click Create Record Set.
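Continuing the earlier sketch, the secondary record can be created the same way. The S3 website endpoint and its alias hosted zone ID shown here assume a bucket in the US Standard (us-east-1) Region; both values vary by region.

```python
# Secondary failover record: an alias to the S3 website endpoint.
# Z3AQBSTGFYJSTF is the alias hosted zone ID for us-east-1 S3 website
# endpoints; the backup bucket is assumed to be named internetkitties.com.
route53.change_resource_record_sets(
    HostedZoneId=HOSTED_ZONE_ID,
    ChangeBatch={
        "Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": "internetkitties.com.",
                "Type": "A",
                "SetIdentifier": "secondary",
                "Failover": "SECONDARY",
                "AliasTarget": {
                    "HostedZoneId": "Z3AQBSTGFYJSTF",
                    "DNSName": "s3-website-us-east-1.amazonaws.com.",
                    "EvaluateTargetHealth": False,
                },
            },
        }]
    },
)
```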
Here’s what your Route 53 hosted zone looks like after making these changes.
That’s it—now your primary site is being health checked by Route 53, and Route 53 will automatically start sending traffic to your new backup site on S3 if your primary site goes down for any reason.
Route 53 health checks support HTTP and TCP level checks, and may also be used in combination with Latency Based Routing or Weighted Round Robin records to route around instance, availability zone, or even region level problems. You can read more about Route 53 health checks in the Route 53 Developer Guide.
Last summer, I introduced cost allocation reports and tagging, a new system that allows you to organize and track your AWS resources and their associated cost. I also mentioned that we’ll incrementally make this feature bigger and better over time. Today, I’m glad to announce that AWS CloudFormation has added support for tagging Amazon S3 buckets and Amazon RDS DB Instances.
AWS CloudFormation makes it easy for you to provision and configure a set of related AWS resources. Using tags, you can now track the cost and usage of the following resources inside your CloudFormation stacks:
S3 buckets – new
RDS DB Instances – new
Auto Scaling groups
Tags can represent your business dimensions, such as a specific application or service that you’re running on AWS or a cost center within your company. The cost allocation reports allow you to take these tags and track usage and cost associated with them.
To learn more about tagging specific resources, visit the Resource Types Reference in the AWS CloudFormation User Guide.
There are three ways to tag your stacks and their associated resources:
1. Automatic Tags
As a convenience, CloudFormation automatically adds preset tags to help you organize all resources within your stack. These preset tags (prefixed by aws:cloudformation) provide the logical name of a resource, its parent stack ID, and its parent stack name.
2. Tag Your Stack
You can also add your custom tags to a CloudFormation stack. You can optionally have these tags propagated to all resources within your stack.
3. Tag a Specific Resource
You can also tag specific resources inside your stack.
Here’s a template snippet that shows you how to tag an RDS DB Instance. This tag could help you track your cost across all databases that your company might be using.
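As an illustrative sketch of what such a snippet might look like, here is a minimal template expressed as a Python dictionary and launched with boto3; it applies a resource-level tag to an RDS DB Instance and a stack-level tag at stack creation time. All names and values are placeholders.

```python
import json
import boto3

cfn = boto3.client("cloudformation")

# Template fragment with a resource-level tag on an RDS DB Instance (method 3).
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Resources": {
        "MyDB": {
            "Type": "AWS::RDS::DBInstance",
            "Properties": {
                "AllocatedStorage": "5",
                "DBInstanceClass": "db.m1.small",
                "Engine": "MySQL",
                "MasterUsername": "admin",
                "MasterUserPassword": "CHANGE-ME",  # placeholder credential
                "Tags": [{"Key": "CostCenter", "Value": "Marketing"}],
            },
        }
    },
}

# Stack-level tags (method 2) are supplied at creation time and propagated
# to the taggable resources in the stack.
cfn.create_stack(
    StackName="tagged-db-stack",
    TemplateBody=json.dumps(template),
    Tags=[{"Key": "Application", "Value": "CRM"}],
)
```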
As you may already know, you can host your static website on Amazon S3, giving you the ability to sustain any conceivable level of traffic, at a very modest cost, without the need to set up, monitor, scale, or manage any web servers. With static hosting, you pay only for the storage and bandwidth that you actually consume.
S3's website hosting feature has proven to be very popular with our customers. Today we are adding two new options to give you even more control over the user experience:
You can now host your website at the root of your domain (e.g. http://mysite.com).
You can now use redirection rules to redirect website traffic to another domain.
Root Domain Hosting
Your website can now be accessed without specifying the “www” in the web address. Previously, you needed to use a proxy server to redirect requests for your root domain to your Amazon S3 hosted website. This introduced additional costs, extra work, and another potential point of failure. Now, you can take advantage of S3’s high availability and scalability for both “www” and root domain addresses. In order to do this, you must use Amazon Route 53 to host the DNS data for your domain.
Follow along as I set this up using the AWS Management Console:
In the Amazon S3 Management Console, create an S3 bucket with the same name as your www subdomain, e.g. www.mysite.com. Go to the tab labeled Static Website Hosting and choose the option labeled Enable website hosting. Specify an index document (I use index.html) and upload all of your website content to this bucket.
Create another S3 bucket with the name of the root domain, e.g. mysite.com. Go to the tab labeled Static Website Hosting, choose the option labeled Redirect all requests to another host name, and enter the bucket name from step 1.
In the Amazon Route 53 Management Console, create two alias records for your domain. First, create an A (Alias) record for the root domain (mysite.com) in the domain's DNS hosted zone, mark it as an Alias, and set the value to the S3 website endpoint of the bucket from step 2. Then create a second A (Alias) record for the www subdomain and set the value to the S3 website endpoint of the first bucket (the one starting with www).
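The same two-bucket setup can be scripted. Here is a minimal sketch using boto3; mysite.com is a placeholder domain.

```python
import boto3

s3 = boto3.client("s3")

# Bucket 1: www.mysite.com holds the content and serves the website.
s3.put_bucket_website(
    Bucket="www.mysite.com",
    WebsiteConfiguration={"IndexDocument": {"Suffix": "index.html"}},
)

# Bucket 2: mysite.com simply redirects every request to the www bucket.
s3.put_bucket_website(
    Bucket="mysite.com",
    WebsiteConfiguration={"RedirectAllRequestsTo": {"HostName": "www.mysite.com"}},
)
```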
Redirection Rules
We're also enhancing our website redirection functionality. You can now associate a set of redirection rules with your bucket to automatically redirect requests. The rules can be used to smooth things over when you make changes to the logical structure of your site. You can also use them to switch a page or a related group of pages from static to dynamic hosting (on EC2 or elsewhere) as your site evolves and your needs change.
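As a sketch of what a redirection rule looks like, here is the www bucket from the previous example reconfigured with a hypothetical rule that sends requests under an old prefix to a dynamic host.

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical rule: requests under "images/" are redirected to a dynamic
# host as that part of the site moves off static hosting.
s3.put_bucket_website(
    Bucket="www.mysite.com",
    WebsiteConfiguration={
        "IndexDocument": {"Suffix": "index.html"},
        "RoutingRules": [{
            "Condition": {"KeyPrefixEquals": "images/"},
            "Redirect": {
                "HostName": "app.mysite.com",
                "ReplaceKeyPrefixWith": "img/",
            },
        }],
    },
)
```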
I'm writing to you from the floor of AWS re:Invent, where a capacity crowd is learning all about the latest and greatest AWS developments. As part of the welcoming keynote, AWS Senior VP Andy Jassy announced that we’re reducing prices again. This is our 24th price reduction - we continue to innovate on our customers’ behalf, and we’re delighted to pass savings on to you.
We’re reducing the price of Amazon S3 storage by 24-28% in the US Standard Region, making commensurate price reductions in all nine of our Regions worldwide, and also reducing the price of Reduced Redundancy Storage (RRS). Here are the new prices for Standard Storage in the US Standard Region:
Andy also announced that Amazon S3 now stores 1.3 trillion objects and regularly peaks at over 800,000 requests per second. We’ve often talked about the benefits of AWS’s scale. This massive scale is enabling us to make these Amazon S3 price reductions across all nine of our Regions worldwide.
We are also reducing the per-gigabyte storage cost for EBS snapshots, again worldwide. Here are the new prices:
AWS provides you with a number of data storage options. Today I would like to focus on Amazon S3 and Amazon Glacier and a new and powerful way for you to use both of them together.
Both services offer dependable and highly durable storage for the Internet. Amazon S3 was designed for rapid retrieval. Glacier, in contrast, trades off retrieval time for cost, providing storage for as little as $0.01 per gigabyte per month while retrieving data within three to five hours.
How would you like to have the best of both worlds? How about rapid retrieval of fresh data stored in S3, with automatic, policy-driven archiving to lower cost Glacier storage as your data ages, along with easy, API-driven or console-powered retrieval?
Sound good? Awesome, because that's what we have! You can now use Amazon Glacier as a storage option for Amazon S3.
There are four aspects to this feature -- storage, archiving, listing, and retrieval. Let's look at each one in turn.
Storage
First, you need to tell S3 which objects are to be archived to the new Glacier storage option, and under what conditions. You do this by setting up a lifecycle rule using the following elements:
A prefix to specify which objects in the bucket are subject to the policy.
A relative or absolute time specifier and a time period for transitioning objects to Glacier. The time periods are interpreted with respect to the object's creation date. They can be relative (migrate items that are older than a certain number of days) or absolute (migrate items on a specific date).
An object age at which the object will be deleted from S3. This is measured from the original PUT of the object into the service, and the clock is not reset by a transition to Glacier.
You can create a lifecycle rule in the AWS Management Console:
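Here is a minimal sketch of an equivalent rule created through the API with boto3, rather than the console. The bucket name, prefix, and time periods are placeholders.

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical rule: archive objects under "logs/" to Glacier 30 days after
# creation, and delete them from S3 a year after the original PUT.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-log-bucket",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "archive-then-expire-logs",
            "Filter": {"Prefix": "logs/"},
            "Status": "Enabled",
            "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
            "Expiration": {"Days": 365},
        }]
    },
)
```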
Archiving
Every day, S3 will evaluate the lifecycle policies for each of your buckets and will archive objects in Glacier as appropriate. After the object has been successfully archived using the Glacier storage option, the object's data will be removed from S3 but its index entry will remain as-is. The S3 storage class of an object that has been archived in Glacier will be set to GLACIER.
Listing
As with Amazon S3's other storage options, all S3 objects that are stored using the Glacier option have an associated user-defined name. You can get a real-time list of all of your S3 object names, including those stored using the Glacier option, by using S3's LIST API. If you list a bucket that contains objects that have been archived in Glacier, what will you see?
As I mentioned above, each S3 object has an associated storage class. There are three possible values:
STANDARD - 99.999999999% durability. S3's default storage option.
REDUCED_REDUNDANCY - 99.99% durability. S3's Reduced Redundancy Storage (RRS) option.
GLACIER - 99.999999999% durability. The object has been archived in the Glacier option.
If you archive objects using the Glacier storage option, you must inspect the storage class of an object before you attempt to retrieve it. The customary GET request will work as expected if the object is stored in S3 Standard or Reduced Redundancy (RRS) storage. It will fail (with a 403 error) if the object is archived in Glacier. In this case, you must use the RESTORE operation (described below) to make your data available in S3.
Retrieval
You use S3's new RESTORE operation to access an object archived in Glacier. As part of the request, you need to specify a retention period in days. Restoring an object will generally take 3 to 5 hours. Your restored object will remain in both Glacier and S3's Reduced Redundancy Storage (RRS) for the duration of the retention period. At the end of the retention period the object's data will be removed from S3; the object will remain in Glacier.
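Putting the listing and retrieval pieces together, here is a minimal sketch with boto3 that inspects each object's storage class and initiates a restore only for archived objects. The bucket name and retention period are placeholders.

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "my-log-bucket"

# List the bucket; archived objects keep their user-defined names but
# report the GLACIER storage class.
for obj in s3.list_objects_v2(Bucket=BUCKET).get("Contents", []):
    if obj.get("StorageClass") == "GLACIER":
        # Kick off a restore with a 7-day retention period; a plain GET
        # would fail with a 403 until the restore completes.
        s3.restore_object(
            Bucket=BUCKET,
            Key=obj["Key"],
            RestoreRequest={"Days": 7},
        )
```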
Although the objects are archived in Glacier, you can't get to them via the Glacier APIs. Objects stored directly in Amazon Glacier using the Amazon Glacier API cannot be listed in real-time, and have a system-generated identifier rather than a user-defined name. Because Amazon S3 maintains the mapping between your user-defined object name and the Amazon Glacier system-defined identifier, Amazon S3 objects that are stored using the Amazon Glacier option are only accessible through the Amazon S3 API or the Amazon S3 Management Console.
Archiving in Action
We expect to see Amazon Glacier storage put to use in a variety of different ways. Toshiba's Cloud & Solutions Division will be using it to store medical imaging. Tetsuro Muranaga, Chief Technology Executive of the division, is very excited about it. Here's what he told us:
We currently provide a service enabling medical institutions to securely store patients’ medical images in Japan. We are excited about using Amazon Glacier through Amazon S3 to affordably and cost-effectively archive these images in large volumes for each of our customers. We will combine Toshiba’s cloud computing technology with Amazon Glacier’s low costs and Amazon S3’s lifecycle policies to provide a unique offering tailored to the needs of medical institutions. In addition, we expect we can build similarly tailored integrated solutions for our wide range of customers so that they can archive massive amounts of data in various business areas.
Pricing
You will pay standard Glacier pricing for data stored using S3's new Glacier storage option.
This event introduces how gaming companies can use the AWS Cloud to stay flexible, agile, and ahead of the curve. You will hear from our customers, Creative Assembly, Bejig, and Games Analytics, on how they have successfully designed and implemented cloud computing projects.
Bay Area CloudSearch Meetup
Wednesday, June 19, 2013
Search & Database - what to use when? How can you use CloudSearch with a database? Mobile search. Amazon Redshift.