Source: http://aws.typepad.com/aws/2012/03/amazon-s3-performance-tips-tricks-seattle-hiring-event.html

Comments on "Amazon S3 Performance Tips & Tricks + Seattle S3 Hiring Event"

Brycekahle

Does the partitioning include the bucket name? Say we have bucket names of the form a-xxxx, b-xxxx, c-xxxx, etc. Would those buckets then be partitioned by the same logic described above?

Mark Rose

How soon into the name is the hash required? One well-known S3 customer starts all of its object names with the same string, xxxxxx_l, followed by random characters. Obviously it's working for them... but what is the magic number of characters you can have before the random part?

Jeff Barr

Mark,

Since this is a single prefix inside the bucket, it still works fine. A small number of prefixes (single digits) would also work fine, assuming they don’t all introduce their workloads to S3 on the same day (which gives S3 time to segment the first set of nodes for prefix_a before the workload for prefix_b shows up). Once larger numbers of prefixes appear, this kind of defeats the purpose of the tricks in the first place.
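A minimal sketch of the layout under discussion: one shared prefix (Mark's `xxxxxx_l`) followed immediately by a few hash characters, then the original name. The four-character MD5 prefix and the `make_key` helper are hypothetical illustrations, not something S3 requires:

```python
import hashlib

# Hypothetical shared prefix, taken from Mark's example.
SHARED_PREFIX = "xxxxxx_l"


def make_key(name: str, prefix: str = SHARED_PREFIX, hash_chars: int = 4) -> str:
    """Build a key of the form <shared prefix><short hash>/<name>.

    The hash characters come right after the single shared prefix, so the
    characters following that prefix vary from key to key. MD5 is used here
    only to spread keys out, not for security.
    """
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return f"{prefix}{digest[:hash_chars]}/{name}"


if __name__ == "__main__":
    print(make_key("photos/2012/03/img001.jpg"))
```

Because the hash is derived from the name, the same object always maps to the same key, so a reader can recompute it without a lookup table.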

James ORourke

Say we were to add two alphanumeric characters at the front of our key (e.g. 03/, a9/, ...). How do we then perform, say, a range query where we want something like 00/Path - ZZ/Path? Is that even possible? It seems that in this scenario there is no simple way apart from generating hundreds, potentially thousands, of requests to S3.

Mark Mitchenall

Really useful article, but I wonder whether this wouldn't all be easier if S3 allowed developers to specify some kind of regex (or similar) to help S3 identify the part of the object name that would make a reasonable partitioning prefix for a particular bucket. Or perhaps just an optional regex that identifies an incrementing key within the object names in a bucket, which S3 could base62-encode and reverse, taking the first two or three characters as a sensible partition prefix. That way, James ORourke's issue would be resolved from an API point of view, and developers could still influence the partitioning without having to modify every object name in their buckets.
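Mark's suggested derivation can be sketched as follows. This assumes a base62 alphabet ordered 0-9, A-Z, a-z and a three-character prefix; both are illustrative choices, and `base62` / `partition_prefix` are hypothetical helpers, not an S3 feature. Reversing the encoding puts the least significant digit first, so consecutive IDs get different leading characters:

```python
import string

# Assumed base62 alphabet: 0-9, then A-Z, then a-z (62 symbols).
BASE62 = string.digits + string.ascii_uppercase + string.ascii_lowercase


def base62(n: int) -> str:
    """Encode a non-negative integer in base62, most significant digit first."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return BASE62[0]
    digits = []
    while n:
        n, r = divmod(n, 62)
        digits.append(BASE62[r])  # collected least significant first
    return "".join(reversed(digits))


def partition_prefix(n: int, width: int = 3) -> str:
    """Hypothetical helper: base62-encode an ID, reverse it, keep `width` chars.

    The encoding is left-padded with '0' to at least `width` characters so
    that small IDs still yield a full-width prefix.
    """
    reversed_digits = base62(n).rjust(width, BASE62[0])[::-1]
    return reversed_digits[:width]


if __name__ == "__main__":
    # Consecutive IDs land on different leading characters.
    print([partition_prefix(n) for n in range(1000, 1004)])  # ['8G0', '9G0', 'AG0', 'BG0']
```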

David

This is a great article; it applies to many other data-driven technologies. Glad I found your blog; I can't wait to check out some of the other articles.

The comments to this entry are closed.

Brought to you by Jeff Barr (@jeffbarr) and Jinesh Varia (@jinman).