Please Vote: AWS and AWS-Powered Applications Are Crunchies Finalists

AWS is a finalist in two categories of the Crunchies awards, Best Enterprise and Best Overall.

In addition to that, AWS-powered applications from Animoto and SlideRocket are candidates for Best Design; eBuddy and Fotonauts are up for Best International; DropBox for Best New Startup of 2008; and Twitter (competing with AWS) for Best Overall. GoodGuide is a finalist in the Most Likely To Make The World A Better Place category, as is Akoha. Meebo is a candidate for Best Application.

If you could take the time to vote that would be great! Here are some direct links to make it really easy:

Best Enterprise - AWS

Best Overall - AWS

Best Overall - Twitter

Best Design - Animoto

Best Design - SlideRocket

Best International - eBuddy

Best International - Fotonauts

Best New Startup of 2008 - DropBox

Most Likely To Make The World A Better Place - GoodGuide

Most Likely To Make The World A Better Place - Akoha

Best Application - Meebo

-- Jeff;

PS - If I missed any AWS-powered applications, drop me a note and I'll update this post!

Solve for Efficiency With Amazon Mechanical Turk

This is a quick post about another great use of Amazon Mechanical Turk. Over on FriendFeed, Jean-Claude Bradley shared a blog post called “Mechanical Turk Does Solubility on Google Spreadsheet”, which describes using Mechanical Turk to process solubility data for the Open Notebook Science Challenge.

What I believe is revolutionary here is that rather than rely on government funding to help process results, scientists are able to use Crowdsourcing to process results in real time—and at a very fine unit of granularity. When you combine those characteristics with the Open Notebook Science Challenge, the result is unprecedented transparency.

The blog post referenced above inspired another, titled “Generalizability coefficient for Mechanical Turk annotations”, that describes using Amazon Mechanical Turk and Crowdsourcing to generate annotations of a thesis. Wow: that eliminates (or more likely reduces) one of the biggest pains associated with a thesis. Doctoral candidates everywhere should take note.

It’s about more than Grad Student pain relief though: by Crowdsourcing mechanical tasks such as annotation, students are free to focus on data collection and analysis. That makes the entire process more efficient.

These innovative uses of Amazon Mechanical Turk, combined with the new Public Data Sets, make this an exciting time in both Cloud Computing and science.

Mike

100% on Amazon Web Services: Soocial.com

I'm Simone Brunozzi, technology evangelist for Amazon Web Services in Europe.

At this time of year I decided to dedicate some time to better understanding how our customers use AWS, so I spent some time online with Stefan Fountain and the nice guys at Soocial.com, a "one address book solution to contact management". I would like to share with you some details of their IT infrastructure, which now runs 100% on Amazon Web Services!

In the last few months, they've been working hard to cope with tens of thousands of users and to get ready to easily scale to millions. To make this possible, they decided to move ALL their architecture to Amazon Web Services. Despite the fact that they were quite happy with their previous hosting provider, Amazon proved to be the way to go.

Overview of the new Architecture
This is how their new architecture looks:

[Diagram: Soocial architecture]

1. Soocial Web application: Nginx running behind an Elastic IP, proxying requests to an HAProxy on every "webapp" instance, each of which has its own memcache server (they don't need to explicitly expire the cache). They use Nginx only to supply SSL support; if HAProxy did SSL, they probably wouldn't be using Nginx.

2. Sync server. This sync server communicates with phones, OSX and Outlook using the SyncML sync protocol. Similar setup as the web application.

3. Sync bots. The sync bots are jobs that run periodically to get and push changes to and from the online apps that they sync with. Currently these are GMail and Highrise. These jobs happen asynchronously. Previously they were using a database as a queue for asynchronous processing, but there were several issues:
- it wasn't flexible enough,
- it locked the database unnecessarily,
- the bots were constantly polling the database - not exactly the kind of thing an RDBMS is built for.

The obvious (and simplest) solution was to use a message queue, so they switched to AMQP, namely RabbitMQ. Not only did this allow them to feed the bots, it also turned out to be good for logging and other purposes. The bots themselves were already running on EC2 before they moved the whole infrastructure over, so the only thing that changed there was that they no longer needed to use a tunnel to connect to the database.

4. Database: After evaluating all the different database replication solutions for PostgreSQL, they decided to use pgpool-II. This is PostgreSQL middleware that allows you to do all sorts of things, including multi-master synchronous replication, sharding, master-slave replication (although in this case it won't actually do the replication for you, so you need to use something like Slony for that), online recovery, connection pooling, caching, etc.
The nice thing is that it didn't require any change in their application layer. It behaves like a real PostgreSQL server, so they can keep using exactly the same ActiveRecord configuration as before. Both the webapp and the sync server use ActiveRecord and the same models as their data abstraction layer.
They use it for different purposes:
- Data shards: they configure rules for where the data goes and comes from, and pgpool-II takes care of it.
- Replication: the shards aren't bare PostgreSQL servers; they in fact have an extra layer of pgpool-II instances that do synchronous multi-master replication. This way, if one of their servers goes down, they're not affected. All they have to do is switch the Elastic IP for the given shard, which allows hot failover because all the DB connections automatically switch to the new server (not exactly heartbeat, but good enough).
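
Returning to the sync bots in item 3, the difference between polling a database table and consuming from a queue can be sketched in a few lines. This is only an illustration: it uses Python's standard-library `queue` module as a stand-in for RabbitMQ, it does not speak AMQP, and the function name and job strings are hypothetical.

```python
import queue
import threading


def run_sync_bots(jobs, worker_count=2):
    """Feed jobs to worker threads through a queue instead of polling a table."""
    work = queue.Queue()
    results = []
    lock = threading.Lock()

    def bot():
        while True:
            job = work.get()      # blocks until a job arrives; no polling loop
            if job is None:       # sentinel value: shut this worker down
                work.task_done()
                return
            with lock:
                results.append(f"synced {job}")
            work.task_done()

    workers = [threading.Thread(target=bot) for _ in range(worker_count)]
    for worker in workers:
        worker.start()
    for job in jobs:
        work.put(job)
    for _ in workers:
        work.put(None)            # one sentinel per worker
    work.join()
    for worker in workers:
        worker.join()
    return sorted(results)
```

The point of the sketch is the blocking `get()`: idle workers wait for work to be pushed to them rather than repeatedly querying (and locking) the database.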

Scaling
As you can see in the diagram above, to scale the webapp and sync server they simply add EC2 instances, change the config file and that's it. When it comes to DB nodes, they need to do an online recovery. When the user base grows they can add shards to the DB, and have different sets of users run on different shards. This allows them to move heavy users to a more powerful instance, perhaps even optimize the DB for their specific use.
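
The idea of running different sets of users on different shards, and moving heavy users to a more powerful instance, might look roughly like the sketch below. It is not how pgpool-II expresses its rules (those live in pgpool-II's own configuration); the class, the shard names, and the hash-based default placement are all assumptions made for illustration.

```python
import zlib


class ShardRouter:
    """Map user ids to shards, with explicit overrides for heavy users."""

    def __init__(self, shards):
        if not shards:
            raise ValueError("at least one shard is required")
        self.shards = list(shards)
        self.overrides = {}       # user id -> shard name (hypothetical pinning)

    def pin(self, user_id, shard):
        """Route one user to a specific (for example, more powerful) shard."""
        self.overrides[user_id] = shard

    def shard_for(self, user_id):
        if user_id in self.overrides:
            return self.overrides[user_id]
        # A stable hash keeps a given user on the same shard between calls.
        index = zlib.crc32(user_id.encode("utf-8")) % len(self.shards)
        return self.shards[index]
```

One caveat with this simple modulo scheme: adding a shard changes the mapping for many existing users, so a real deployment would need a data migration step or a lookup table instead.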

Performance
EBS performs really well and the fact that they can attach more than one EBS volume to a single instance is useful because they use different volumes for data and write-ahead logging (which brings up another interesting performance boost):

"The snapshot capability has proven to be quite handy as we're able to use that for backups instead of doing a SQL dumping directly from the server. Now, whenever we want to do a SQL backup, we start a new EC2 instance, create a volume from a snapshot, start a PostgreSQL instance from that volume, dump from that server and then upload to S3. This way we can do a full backup without trashing a production database server."

Did you find this interesting? Do you have similar stories you want to share with our blog readers? Then send me a message on twitter: @simon, or email me: simoneb @ amazon d0t lu

Cheers,

- Simone

AWS Links - Wednesday, December 23, 2008

Lots of people responded to the link post that I put together last week. In fact, between the "what about me" emails and the responses to a Tweet that I made earlier today, I now have a plethora of good material. So, here we go again!

 

Information Week has named Amazon CTO Werner Vogels as their Chief of the Year. In a very detailed article they cover the history, current state, and overall philosophy of AWS. There's also a separate interview with Werner.

The article even talks about our customer base, noting that "AWS is a popular platform among startups, Web companies, and software-as-a-service companies. Increasingly, Amazon's customers are household names: Nasdaq, The New York Times, Philips, SanDisk. Eli Lilly is using EC2 to deploy SQL Server/Windows Server instances as needed for research data. The Indianapolis Motor Speedway uses AWS for Web site mirroring, video streaming, and digital image archiving."

 

Werner recently wrote an article about Eventual Consistency for the ACM Queue Magazine. Understanding this important concept (which has its roots in the CAP theorem) is essential to building a world-scale distributed system which is also highly reliable.

Werner sums up the theorem as follows:

A system that is not tolerant to network partitions can achieve data consistency and availability, and often does so by using transaction protocols. To make this work, client and storage systems must be part of the same environment; they fail as a whole under certain scenarios, and as such, clients cannot observe partitions. An important observation is that in larger distributed-scale systems, network partitions are a given; therefore, consistency and availability cannot be achieved at the same time. This means that there are two choices on what to drop: relaxing consistency will allow the system to remain highly available under the partitionable conditions, whereas making consistency a priority means that under certain conditions the system will not be available.

 
Stax makes it easy for Java developers to build, manage, and scale applications on Amazon EC2. On the coding side, there's a complete MVC framework, RIA tools, and complete support for scripting. Development and testing are simplified by a local toolkit and an Ant plugin for easy builds. At deployment time, Stax uses EC2 to provide access to unlimited server resources. It is currently in beta and you can sign up here.
 
John M. Willis wrote to tell me about his Cloud Cafe podcast. He's already recorded 28 episodes, including conversations with many of the major players in the cloud computing space, such as GigaSpaces, RightScale, and Elastra, so you'd best start listening now before you fall even further behind!
 
The very same John M. Willis asked me to mention that Cloud Camp Atlanta will be taking place on Tuesday, January 20. Cloud Camps are a wonderful way to meet other people who are involved in and interested in cloud computing, and can also give you a good sense of why people are so excited about it.
 
Adam Kalsey has put together a how-to video for their new Drupal AMI. In the 10 minute video, Adam shows how to use ElasticFox to launch their public Drupal AMI, connect it to an EBS (Elastic Block Store) volume, stop the instance, and then reconnect the storage to another EC2 instance. Run the video full-screen in order to see all of the details. Adam's how-to is also a really nice introduction to ElasticFox and to EBS. Although the video shows how to use fdisk to create a partition table on the EBS volume, I've never bothered with that step. Instead, I simply run mkfs on the entire device (which would be /dev/sdj in the video).
 

Bob and Tony from HelpStream wrote me and asked "How can we get on this list?" That's easy - all you need to do is ask, and I'll do my best! They recently moved the complete running HelpStream ("The World's First Truly Social CRM system"), all 140 clients and 90,000 users, over to Amazon EC2.

ZDNet recently chronicled this migration in a comprehensive and worthwhile story titled Migrating to Amazon Web Services: The Blueprint. In this story you can read about how HelpStream's infrastructure was previously running at just 10% of capacity and how our prices were a selling point for them. The article covers phase 1 (using Amazon S3 for backups, and EC2 for test servers), phase 2 (moving about 85% of their storage over to S3), and phase 3 (moving the production system to EC2). The final step took them 5 hours over a weekend. This was time well spent, since they estimate that the move will save them 21% on bandwidth and cage space, 59% on monitoring, server administration and in-cage work, and 100% on servers, switches, VPNs and other infrastructure which was obviated by the move to the cloud.

 

So you now understand Software-As-A-Service, Infrastructure-As-A-Service, and the various other "-As-A-Service" models that have been becoming increasingly popular of late. Great, because I've got another one for you. William from RetailZip wrote me early this month to tell me that their new product is a Format-As-A-Service!

RetailZip files are small, encrypted container files that provide gatewayed access to high-value online content. Once created and posted online, the content represented by a RetailZip file can be purchased online in any of 18 currencies. The content is stored on S3 and is decrypted and downloaded via EC2. Personal ($1.99/month) and business ($9.99/month) licenses are available.

 

Kingsley from OpenLink Software wrote to tell me that they have released the Cloud Computing Edition of their Virtuoso Universal Server. As the press release notes, "The new product release leverages the solution packaging and deployment prowess of the Amazon EC2 cloud-computing platform by delivering a pre-configured and tightly tuned edition of Virtuoso on a Fedora Linux-based Amazon Machine Image (AMI), ready for immediate use (post initialization)."

Their cloud offerings include the Bio2Rdf bioinformatics database packaged up and available in AMI form, with full directions for instantiation and use here, the Neurocommons database for biological research (full directions here), and the DBPedia ontology of Wikipedia knowledge, with full directions here.

Kingsley is intrigued by the ease with which researchers can now instantiate truly massive databases in the cloud and told me that "In all cases, analysts, researches, and knowledge/information workers in general now have the ability to instantiate knowledgebases in the EC2 Cloud. We are talking 1.5 hrs compared to error prone 16 - 22 hrs knowledgebase commissioning marathons that are inherently error prone."

 

The brand-new Amazon Payments blog looks like it is going to be interesting. They say that "A number of authors will contribute on their respective areas of interest and expertise such as ecommerce, online shopping, merchant technologies, developers integrating payments into their projects, mobile payments and more."

They are also on Twitter.

 

Billy Marshall from rPath wrote to tell me about their new video, Cloud Computing in Plain English. In just 5 minutes, this entertaining video will educate you about SaaS, Cloud Computing, Virtualization, and the relationship between them.

 

Cale Bruckner from Email Center Pro wrote to tell me that they've been using AWS to help companies of all sizes to do a better job of processing customer email. Their application allows companies to centralize emails, assign them to people for followup, and to respond with greater efficiency using templates, internal notes, and other facilities (hmmmm, maybe I need this). Pricing starts at $19.00 per month after a free trial.

Earlier this year they wrote a long story about their architecture. They started out by storing messages in Amazon S3. This worked well, allowing them to keep their customers' data safe and secure. They then moved the application itself over to EC2, building a set of AMIs which allow them to launch additional instances as needed to deal with peak traffic and high volumes of email. They conclude with the statement that "Email Center Pro is an example that you really can build an application entirely on the Amazon Web Services platform with great results."

 

Ok, that should do it. Happy holidays everyone!

-- Jeff;

Amazon SimpleDB - Now With Select

There's now a new and somewhat easier way to write SimpleDB queries.

In addition to SimpleDB's existing query language, you can now use select statements which look very similar to standard SQL (Structured Query Language). We made some small changes and additions to the language in order to accommodate SimpleDB's unique multi-valued attribute model.

Here are some valid select statements:

select * from mydomain where city = 'Seattle'
select * from mydomain where city = 'Seattle' or city = 'Portland'
select * from mydomain where author not like 'Henry%'

Things get even more interesting once multi-valued attributes are used. This query returns the items where the only attribute value for keyword is 'Book':

select * from mydomain where every(keyword) = 'Book'

The following query returns items where the only value for keyword is 'Book' or 'Paperback':

select * from mydomain where every(keyword) in ('Book', 'Paperback')

And the following query returns all the items which have the values 'Book' and 'Hardcover' in keyword:

select * from mydomain where keyword = 'Book' intersection keyword = 'Hardcover'
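
To make the multi-valued semantics of the three queries above concrete, here is a small in-memory model in Python. It does not call SimpleDB; the sample items, the helper names, and the choice that `every()` only matches items having at least one value for the attribute are assumptions for illustration.

```python
# Hypothetical sample domain: each attribute holds a set of values.
ITEMS = {
    "item1": {"keyword": {"Book"}},
    "item2": {"keyword": {"Book", "Hardcover"}},
    "item3": {"keyword": {"Book", "Paperback"}},
    "item4": {"keyword": {"CD"}},
}


def any_value(items, attr, predicate):
    """Plain comparison: match if at least one value satisfies the predicate."""
    return {name for name, attrs in items.items()
            if any(predicate(v) for v in attrs.get(attr, ()))}


def every_value(items, attr, predicate):
    """every(attr): match only if all values satisfy the predicate.

    Assumes an item must have at least one value for attr to match.
    """
    return {name for name, attrs in items.items()
            if attrs.get(attr) and all(predicate(v) for v in attrs[attr])}


# where every(keyword) = 'Book'
only_book = every_value(ITEMS, "keyword", lambda v: v == "Book")

# where every(keyword) in ('Book', 'Paperback')
book_or_paperback = every_value(ITEMS, "keyword",
                                lambda v: v in ("Book", "Paperback"))

# where keyword = 'Book' intersection keyword = 'Hardcover'
book_and_hardcover = (any_value(ITEMS, "keyword", lambda v: v == "Book")
                      & any_value(ITEMS, "keyword", lambda v: v == "Hardcover"))
```

In this model `intersection` is simply a set intersection of two independent result sets, which is why it can find an item whose keyword attribute carries both values.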

You can also sort the results on any of the attributes that were used in the expression:

select * from mydomain where Year = '2007' intersection Author is not null order by Author desc

The new SimpleDB Select function accepts queries in this new syntax. The existing Query and QueryWithAttributes functions are still usable, of course. There's full information in the new version of the Developer Guide.

--Jeff;

AWS Links - Wednesday, December 17, 2008

There are plenty of really good links in my inbox. Here are some of the best:

  • The Oracle TechBlog has the slides from the recent "Oracle in the Cloud" webinar. I know that lots of you have been waiting for these!
  • On-Demand Enterprise asks Has Cloud Computing Found its 'Killer App'? and reviews AWS customer SOASTA.
  • Old friend Adam Kalsey emailed me to tell me that his company has released CDN2, a Drupal-native video platform built on top of EC2, S3, and SQS. His blog post states that "CDN2 is a combination of a Drupal module and a hosted service that's designed to help you manage video with ease. The module allows you to transcode video into many different formats, from iPhone video to Flash video for the web, to high definition flash."
  • Videos of the AWS Start-Up Challenge Finalists are online.
  • In a perfect example of why cloud computing makes perfect sense for holiday scaling, Talk2Santa is hosted on Amazon EC2!
  • If you just can't get enough cloud computing news, check out the cloud-computing topic on Alltop.
  • RightScale will be running the "Best Practices in the Cloud: Managing the Deployment Life Cycle" webinar on Thursday, December 18th. Pre-register now!
  • n Software recently released a new version of their Amazon Integrator components. The product provides easy-to-use components for accessing Amazon Web Services including S3, SQS, SimpleDB, EC2, and the Amazon Associates Web Service.
  • FlixWagon runs on EC2 and S3. Using the site you can broadcast live video from your mobile phone.
  • Delve Networks used EC2 and S3 to create an advanced video publishing platform. S3 allows them to host videos without worrying about replication and backup while EC2 gives them the ability to scale on demand. Their client base includes NFL franchises such as the Kansas City Chiefs.
  • Not exactly new, but still cool, Ten Thousand Cents used the Amazon Mechanical Turk to get 10,000 people in 51 countries to collectively draw a picture of a $100 bill.
  • The Overcast podcast features conversations on cloud computing. Well worth a listen.
  • The S3-powered Dropbox application is an easy way to store, sync, and share files and folders online. You can get to your files from any of your Windows, Mac, or Linux computers and also from the Dropbox web interface. You can also access deleted files and snapshots of older versions.
  • Tarsnap is an online snapshotted backup service for BSD, Linux, and OS X. Colin Percival, author of tarsnap, has written about the tarsnap public beta and about his use of EC2 and S3.
  • WaveMaker is an open source IDE designed specifically for the cloud. You can start using the Cloud Edition for free and you can learn more about the product in their press release. WaveMaker uses RightScale for scaling and load balancing, and Elastra for scalable database connectivity.
  • The folks at the IT Management Podcast were asking for some examples of how corporations are putting AWS to use. Turns out that I had neglected to blog about some of our case studies. We've got great case studies from the Indy 500, Washington Post, Harvard Medical School, Autodesk, AF83 (they streamed a Madonna concert), and Morph Labs. Companies like DiskAgent, TC3Health, and MedCommons are building HIPAA-compliant applications in the cloud and we've got their stories too.

-- Jeff;

Twilio - Telephony in the Cloud

Twilio founder Jeff Lawson stopped by Amazon headquarters yesterday for a show and tell session. Twilio provides a simple yet powerful way to build highly scalable telephony applications. Of course, Twilio itself runs on Amazon EC2 and stores data in Amazon S3.

A Twilio application is simply a phone-activated web application. When the application's phone is called, Twilio answers and activates the application. The application then returns an XML document containing TwiML (Twilio Markup) commands. Jeff showed us how Twilio's 5 commands (<Play>, <Gather>, <Record>, <Say>, and <Dial>) can be combined to create applications in minutes. Here's what they do:

<Play> is used to play an audio file for the caller. Twilio will transcode the file in real-time, turning high-quality audio into the required 8 bit 11 kHz format.

<Gather> accepts one or more digits from the caller's keypad and passes them to a specified URL using POST or GET.

<Record> captures the caller's voice and returns a URL which points to the recorded audio. Recording can be terminated using a specified keypad key or after a specified quiet period.

<Say> invokes a text to speech engine with male and female voices in 4 languages.

<Dial> is used to connect the caller to another phone number.
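
As a rough sketch of what a phone-activated web application might return, the Python below builds a small TwiML document with the standard library. The `<Response>` root element and the `action`, `method`, and `numDigits` attributes are not mentioned in this post; they reflect my understanding of Twilio's markup and should be checked against Twilio's documentation. The function name and prompt text are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_menu_response(action_url):
    """Return a TwiML string that prompts for one digit and posts it to action_url."""
    response = ET.Element("Response")          # assumed root element
    gather = ET.SubElement(response, "Gather", {
        "action": action_url,                  # where the digits are sent
        "method": "POST",
        "numDigits": "1",
    })
    prompt = ET.SubElement(gather, "Say")
    prompt.text = "Press 1 for EC2, 2 for S3, or 3 for SQS."
    # Spoken only if the caller never presses a key.
    fallback = ET.SubElement(response, "Say")
    fallback.text = "No input received. Goodbye."
    return ET.tostring(response, encoding="unicode")
```

A web handler would return this string as the body of its HTTP response when Twilio requests the application's URL.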


Pricing is friendly for developers! Developer accounts are free and include 1000 minutes of calls. Full accounts cost $5 per phone number (local or toll free), then 3 (local) or 5 (toll free) cents per minute for incoming calls and 3 cents per minute for outgoing calls.
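
The full-account pricing above is easy to turn into a quick estimate. This sketch works in whole cents to avoid floating-point rounding. It treats the $5 as a flat per-number fee (the post does not state a billing period) and ignores the free developer minutes; the function and constant names are hypothetical.

```python
NUMBER_FEE_CENTS = 500                              # $5 per phone number
INCOMING_CENTS_PER_MIN = {"local": 3, "toll_free": 5}
OUTGOING_CENTS_PER_MIN = 3


def estimate_cost_cents(number_type, incoming_minutes, outgoing_minutes, numbers=1):
    """Estimate the cost of a full account in cents, using the rates quoted above."""
    if number_type not in INCOMING_CENTS_PER_MIN:
        raise ValueError(f"unknown number type: {number_type!r}")
    return (numbers * NUMBER_FEE_CENTS
            + incoming_minutes * INCOMING_CENTS_PER_MIN[number_type]
            + outgoing_minutes * OUTGOING_CENTS_PER_MIN)
```

For example, one local number with 200 incoming and 100 outgoing minutes comes to 500 + 600 + 300 = 1400 cents, or $14.00; the same usage on a toll free number comes to $18.00.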

Jeff showed us an application that he'd built the day before. The application allows the caller to request the status of EC2, S3, or SQS. The application then parses the AWS status dashboard's HTML and echoes the status of the requested service. You can read all about the application or you can try it out by calling 206-866-5918.

You can get started here (you'll need to ask for an invite code there first).

Update: Jeff just emailed me a link to a Slideshare presentation with even more info about Twilio. The presentation includes some really interesting information about how they use EC2, S3, and SQS to build Twilio, and how they build and customize their EC2 instances. He also let me know that they have plenty of invite codes available for readers of this blog.

-- Jeff;

JumpBox - Ready To Use Applications For EC2

I spoke with the good folks at JumpBox earlier this week. They told me that they are now supporting Amazon EC2 with a lineup of 12 public AMIs (Amazon Machine Images) containing pre-built and pre-configured open source applications. You can launch blogging tools, CRM tools, development tools, and lots more.

You can follow the directions in the tutorial to get started. I was able to start up a Twiki site in less than 5 minutes. Each JumpBox includes a configuration page which is accessible via HTTPS on port 3000. Using the page I set up my computer name, entered my email address for event notifications, set my time zone, entered my password, and agreed to the license agreement. After a 10 second wait for configuration, my Twiki was up and running!

26 additional packages are available to JumpBox subscribers. It is important to note that these are all single-instance applications that are great for workgroups and web sites with modest amounts of traffic. They are perfect for trying out new applications and for getting off the ground in a big hurry.

All in all, this is pretty powerful stuff. If you are putting a web startup together you can have your blog, bug tracker, project manager, wiki, and content management system up and running in the first hour of business.

-- Jeff;

Bizo SDB Tool

Bizo is a B2B advertising network running entirely on top of AWS. They needed a GUI built on top of SimpleDB and decided to extend the Firefox plugin suite with the addition of SDB Tool.

The new Bizo SDB Tool provides a nice visual interface to Amazon SimpleDB in the form of a Firefox plugin. After installing the tool and entering your AWS account credentials, you can fetch the list of your SimpleDB domains, create and delete domains, and run queries. Query results are shown in outline form, and you can also add new items.

Source code for SDB Tool is kept on GitHub and can be found here. Read more on the Bizo blog.

-- Jeff;

Amazon SQS Resources

I've got some new resources to help you make better use of the Amazon Simple Queue Service, or SQS.

First, our new Stock Quote Example shows how SQS can be used to build a scalable and reliable stock quote system. The user specifies a list of stock symbols, and the application retrieves quotes for these symbols from a financial web service. The sample illustrates how SQS adds reliability and scalability with minimum effort. It also shows how SQS can be incorporated into a Visual Studio component for easy development and reuse. The component fires a ResponseEvent when a new quote is available, enabling the creation of a clean, event-driven user interface. The code includes an adaptive polling mechanism which minimizes SQS calls (and costs) when there's no work to do, while maximizing throughput when there is work to be done. The application includes a benchmarking feature to illustrate the efficiency improvements made possible by adaptive polling. Finally, the article shows how the sample can be run on Amazon EC2 and shows that network latency is greatly reduced because EC2 and SQS are both running inside of the Amazon network. You can get the code here.
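
Adaptive polling can be sketched in a few lines. The version below is not the algorithm from the Stock Quote Example (which is a Visual Studio component); it is one common approach, exponential backoff with a reset, written in Python. The `receive`, `handle`, and `sleep` callables and the delay limits are hypothetical parameters.

```python
def next_delay(current, got_messages, min_delay=1.0, max_delay=60.0, factor=2.0):
    """Back off while the queue is empty; return to the minimum once work appears."""
    if got_messages:
        return min_delay
    return min(current * factor, max_delay)


def poll(receive, handle, rounds, sleep, min_delay=1.0, max_delay=60.0):
    """Poll `rounds` times and return the delay chosen after each poll."""
    delay = min_delay
    delays = []
    for _ in range(rounds):
        messages = receive()          # hypothetical: returns a list of messages
        for message in messages:
            handle(message)
        delay = next_delay(delay, bool(messages), min_delay, max_delay)
        delays.append(delay)
        sleep(delay)                  # pass time.sleep in a real loop
    return delays
```

Because each empty poll doubles the wait (up to `max_delay`), an idle consumer makes far fewer requests, while a single non-empty poll brings it straight back to the fastest rate.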

 

Second, Prabhakar Chaganti of Ylastic added another article to his AWS series. In part 4 he talks about Reliable Messaging with SQS. In the article he covers the basic attributes of SQS (reliability, simplicity, security, scalability, and low cost). He discusses SQS messages and the all-important visibility timeout. From there he proceeds to talk about SQS design considerations, pricing, and how to get started. He then shows how to install and configure the Boto library for Python, and how to exercise the various features of SQS: creating queues, listing queues, deleting them, and sending and retrieving messages.
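
The visibility timeout is easier to reason about with a toy model. The class below is an in-memory sketch in Python, not SQS and not Boto: a received message is hidden for a while, and it reappears unless the consumer deletes it first. The class name, the integer message ids, and the injectable clock are assumptions made so the behavior can be shown deterministically.

```python
import itertools
import time


class InMemoryQueue:
    """Toy queue illustrating a visibility timeout (not a real SQS client)."""

    def __init__(self, visibility_timeout=30.0, clock=None):
        self.visibility_timeout = visibility_timeout
        self.clock = clock or time.monotonic
        self._messages = {}               # message id -> [body, visible_at]
        self._ids = itertools.count(1)

    def send(self, body):
        message_id = next(self._ids)
        self._messages[message_id] = [body, self.clock()]
        return message_id

    def receive(self):
        """Return (id, body) for one visible message and hide it, or None."""
        now = self.clock()
        for message_id, entry in self._messages.items():
            if entry[1] <= now:
                entry[1] = now + self.visibility_timeout
                return message_id, entry[0]
        return None

    def delete(self, message_id):
        """Acknowledge a message so that it is never delivered again."""
        self._messages.pop(message_id, None)
```

If a consumer crashes after `receive()` but before `delete()`, the message becomes visible again once the timeout passes, so another consumer can pick it up. That is the reliability property the timeout exists to provide.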

 

I hope that you enjoy these articles on SQS. If you build something interesting as a result of having read them, leave a note in the comments so that we know.

-- Jeff;
