Hello again, readers! This is Simone Brunozzi, technology evangelist for AWS in Europe.
As you know, my job involves a lot of travel. It's a great chance to meet AWS enthusiasts and customers, and along the way I've come across some great success stories.
Industria, Iceland
Industria adopted Amazon Web Services for its Zignal Cloud service, as well as for the Zignal digital entertainment delivery platform. Zignal Cloud lowers the total cost of ownership for service providers, makes costs predictable, reduces technology risk, and shortens time to market.
In their blog, they state:
"An intended consequence of this approach is that we can do it all with no upfront cost for our customers, because we're effectively using a true cost-sharing model that offers us almost a 100% economy of scale."
Of course, when you use Amazon Web Services, you're charged only for what you use, with no upfront investment. You can read more about AWS's offerings on our product page.
If you're interested in ZignalCloud, you can contact Industria in Iceland, Ireland, Bulgaria, UK, Sweden or China.
Antonio Agudo, COO of CloudAngels.eu, sent us an email describing a nice success story involving one of their customers, imageloop.com, a service that lets you create slideshows and manage pictures and widgets.
When they started moving imageloop.com to Amazon Web Services, they needed to convert all of its existing pictures, generating new thumbnails and output formats.
Normally that would have taken months, but because EC2 gave them virtually unlimited CPU power, they simply launched sixty c1.xlarge instances that pulled conversion jobs from Amazon SQS, and finished in a day and a half.
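The pattern behind this migration, many identical workers pulling jobs from one shared queue, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not CloudAngels' actual code: an in-process `queue.Queue` stands in for Amazon SQS, threads stand in for the EC2 instances, and `convert_image` is a hypothetical placeholder for the real thumbnail and format conversion.

```python
import queue
import threading

def convert_image(image_id: str) -> str:
    # Hypothetical placeholder for the real thumbnail/format conversion.
    return f"{image_id}-thumb.jpg"

def worker(jobs: "queue.Queue[str]", results: list, lock: threading.Lock) -> None:
    # Each worker keeps pulling jobs until the queue is empty,
    # much as each EC2 instance polled SQS for its next conversion job.
    while True:
        try:
            image_id = jobs.get_nowait()
        except queue.Empty:
            return
        output = convert_image(image_id)
        with lock:
            results.append(output)
        jobs.task_done()  # analogous to deleting the SQS message after success

def run_conversion(image_ids: list, num_workers: int) -> list:
    jobs: "queue.Queue[str]" = queue.Queue()
    for image_id in image_ids:
        jobs.put(image_id)
    results: list = []
    lock = threading.Lock()
    threads = [threading.Thread(target=worker, args=(jobs, results, lock))
               for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Scaling out is just a matter of starting more workers; the queue does the load distribution. With real SQS, a worker deletes each message only after processing succeeds: a message that is received but not deleted becomes visible again after the visibility timeout, so a job held by a crashed worker is retried.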
Then, about a week later, when they went live, they scheduled one night of maintenance downtime and converted the roughly 110,000 pictures that had accumulated during that week, using ten EC2 instances for two hours.
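A quick back-of-the-envelope calculation puts these figures in perspective. The sketch only restates numbers from the story; the per-instance rate it derives is an average, not a measured benchmark, and since the post does not give the number of pictures in the initial migration, only that phase's compute budget can be derived.

```python
# Figures taken from the imageloop.com story above.
initial_instances, initial_hours = 60, 36   # sixty instances, a day and a half
golive_pictures = 110_000
golive_instances, golive_hours = 10, 2

initial_instance_hours = initial_instances * initial_hours  # 2,160 instance-hours
golive_instance_hours = golive_instances * golive_hours     # 20 instance-hours

# Average go-live throughput per instance.
per_instance_hour = golive_pictures / golive_instance_hours
per_instance_second = per_instance_hour / 3600

print(f"Initial migration: {initial_instance_hours} instance-hours")
print(f"Go-live batch: {per_instance_hour:.0f} pictures per instance-hour "
      f"(~{per_instance_second:.1f} per second)")
```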
is very pleased with the level of flexibility that Amazon provides.
In Antonio's words: "the delivery speed of the Slideshows is way better than before and we liked the flexibility and ease with which we were able to build up the platform. Congratulations to a great product!"
And here is Stefan Riehl, imageloop.com's CEO: "When we began evaluating alternatives to traditional hosting vendors, it became apparent that AWS's offering is the most mature in the market."
SnappyFingers, Bangalore, India
SnappyFingers is a question-and-answer search engine. It crawls and indexes Frequently Asked Questions on the Internet and presents search results in an easy-to-read Question/Answer format.
Chirayu Patel was kind enough to share with us some details on how they use Amazon Web Services (AWS) along with some rationale behind their choices.
The three main motivations behind their choices are (in their own words):
- We are extremely reluctant to learn or do anything outside of the SnappyFingers domain. We would rather outsource.
- We are very cost-conscious.
- We do write buggy code, but we do not want our systems to die because of it.
During the design of SnappyFingers they considered multiple options, but in the end they picked Amazon Web Services.
A preliminary cost analysis showed that the basic cost of the alternatives to AWS would be lower in the long run, and those alternatives had the added advantage of not tying SnappyFingers to a single vendor. However, once they added the cost of managing the systems, the financial advantage of using AWS became evident.
This, coupled with the fact that they didn't want to be distracted with operational burdens unrelated to their core business, meant that AWS became the obvious choice for scaling CPU/storage resources.
SnappyFingers consists of two systems: the Website and the Information Retrieval System (IRS). The Website serves user requests, while the IRS does all the behind-the-scenes work of gathering Q&A.
SnappyFingers is written mostly in Python and Java, and uses several third-party packages, notably the Django framework, Python's multiprocessing package, and Apache Lucene, a high-performance, full-featured text search engine library written entirely in Java.
The Website runs on at least three EC2 nodes and uses the following components:
1. nginx - An extremely fast web server, used to serve static/cached content. It is also used to reverse proxy traffic to multiple Apache servers.
2. Apache server with mod_python to execute the Python code along with the Django framework.
3. Searchers to perform the actual searches on the Q&A index.
4. Spell checkers.
5. PostgreSQL, for system management: recording bugs, registering new services, and so on.
Caching is built into the system using a combination of memcached and file system caching. Static content is served using Amazon CloudFront. Amazon Mechanical Turk is used to test the relevance of search results.
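A read-through lookup across the two cache layers might look like the following. This is a hypothetical sketch, not SnappyFingers' implementation: a plain dict stands in for memcached, files in a directory stand in for the file-system cache, and the `compute` callback is an invented placeholder for the expensive work (for example, running a search and rendering the results).

```python
import hashlib
import os
import tempfile

class TwoTierCache:
    """Check memory first (memcached stand-in), then disk, then recompute."""

    def __init__(self, cache_dir: str):
        self.memory: dict = {}  # stand-in for memcached
        self.cache_dir = cache_dir

    def _path(self, key: str) -> str:
        # Hash the key so arbitrary query strings map to safe file names.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return os.path.join(self.cache_dir, digest)

    def get(self, key: str, compute) -> str:
        if key in self.memory:
            return self.memory[key]
        path = self._path(key)
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                value = f.read()
        else:
            value = compute(key)
            with open(path, "w", encoding="utf-8") as f:
                f.write(value)
        self.memory[key] = value  # promote to the faster tier
        return value

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as cache_dir:
        cache = TwoTierCache(cache_dir)
        print(cache.get("what is nginx", lambda q: f"<results for {q}>"))
```

The memory tier is lost on restart, while the disk tier survives it, so a fresh process still avoids recomputing results that were cached earlier.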
The Information Retrieval System (IRS) is responsible for creating the Q&A indexes that are eventually used by the searchers. It uses multiple services to do the job:
1. Crawlers to crawl the internet.
2. Parsers to extract Questions and Answers from each page, detect spam, and eliminate duplicate content.
3. Scorers to score the Q&A pairs based on a number of factors. The scoring algorithms are the most dynamic pieces of code and are under continuous evolution.
4. Indexers to index the Q&A.
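The parsers' duplicate-elimination step can be illustrated with content hashing. This is one common technique, assumed here for illustration; the post does not say how SnappyFingers detects duplicates, and `normalize`, `fingerprint`, and `deduplicate` are hypothetical names.

```python
import hashlib
import re

def normalize(text: str) -> str:
    # Lowercase and keep only word characters, so trivial differences in
    # punctuation or whitespace don't hide a duplicate.
    return " ".join(re.findall(r"\w+", text.lower()))

def fingerprint(question: str, answer: str) -> str:
    key = normalize(question) + "\n" + normalize(answer)
    return hashlib.sha1(key.encode("utf-8")).hexdigest()

def deduplicate(pairs):
    """Keep the first occurrence of each (question, answer) pair."""
    seen = set()
    unique = []
    for question, answer in pairs:
        fp = fingerprint(question, answer)
        if fp not in seen:
            seen.add(fp)
            unique.append((question, answer))
    return unique
```

Exact hashing only catches copies that are identical after normalization; catching reworded near-duplicates would need a similarity technique such as shingling with MinHash.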
These services interact with multiple storage systems: Amazon S3, Amazon SimpleDB, and PostgreSQL. Not all data is stored in every location; depending on data size and retrieval requirements, different kinds of data are stored in different places. All data access goes through a custom Python-based ORM (Object-Relational Mapping) to simplify programming.
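The routing decision inside such an ORM could be reduced to a small policy function. The rules below are assumptions for illustration, not SnappyFingers' actual logic; `Record`, `Backend`, and `choose_backend` are hypothetical names, and the 1,024-byte cut-off is chosen because SimpleDB limits each attribute value to 1 KB.

```python
from dataclasses import dataclass
from enum import Enum

SIMPLEDB_MAX_VALUE_BYTES = 1024  # SimpleDB's per-attribute value size limit

class Backend(Enum):
    S3 = "s3"
    SIMPLEDB = "simpledb"
    POSTGRESQL = "postgresql"

@dataclass
class Record:
    payload: bytes
    needs_relational_queries: bool = False  # hypothetical flag

def choose_backend(record: Record) -> Backend:
    # Data that needs joins or transactions goes to PostgreSQL.
    if record.needs_relational_queries:
        return Backend.POSTGRESQL
    # Small key/value items fit in SimpleDB attributes.
    if len(record.payload) <= SIMPLEDB_MAX_VALUE_BYTES:
        return Backend.SIMPLEDB
    # Anything larger, such as raw crawled pages, goes to S3.
    return Backend.S3
```

Keeping this decision behind the ORM means the crawlers, parsers, scorers, and indexers never need to know where a given piece of data actually lives.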
Another aspect of these services is that they can run on any node. At times SnappyFingers has used more EC2 servers, and at other times it has reduced its infrastructure, depending on the load and its monthly AWS budget.
At present, the IRS has consumed roughly 500 GB for a data set of 11 million Q&A pairs.
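Dividing the two figures gives a rough sense of the average footprint per Q&A pair. This assumes decimal gigabytes and that the 500 GB is spread evenly across the data set, neither of which the post states, so treat it as an order-of-magnitude estimate only.

```python
total_bytes = 500 * 10**9   # 500 GB, decimal units assumed
qa_pairs = 11 * 10**6       # 11 million Q&A pairs

bytes_per_pair = total_bytes / qa_pairs
print(f"~{bytes_per_pair / 1000:.0f} KB per Q&A pair")
```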
Inter-service communication uses the concept of pipelines, each with its own set of pipes. Each pipe (an Amazon SQS queue) is owned by a service, which is responsible for processing the messages in it. Once processing is complete, messages are sent to the next pipe in the pipeline.
This architecture has allowed SnappyFingers not only to keep the system modular, but also to develop and deploy services in isolation from the rest of the system.
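The pipeline idea can be sketched with in-process queues standing in for SQS. In this hypothetical example each `Stage` owns one pipe, processes the messages in it, and forwards each result to the next stage's pipe; `parse`, `score`, and `index` are trivial placeholders, not the real parsers, scorers, and indexers.

```python
import queue
from typing import Callable, List, Optional

class Stage:
    """A service that owns one pipe (an SQS queue in the real system)."""

    def __init__(self, name: str, handler: Callable[[dict], dict]):
        self.name = name
        self.handler = handler
        self.pipe: "queue.Queue[dict]" = queue.Queue()
        self.downstream: Optional["queue.Queue[dict]"] = None

    def drain(self, sink: list) -> None:
        # Process every message currently in this stage's pipe and pass
        # each result on; the last stage writes to the sink instead.
        while True:
            try:
                message = self.pipe.get_nowait()
            except queue.Empty:
                return
            result = self.handler(message)
            if self.downstream is not None:
                self.downstream.put(result)
            else:
                sink.append(result)

def run_pipeline(stages: List[Stage], messages: List[dict]) -> list:
    # Wire each stage's output to the next stage's pipe.
    for current, nxt in zip(stages, stages[1:]):
        current.downstream = nxt.pipe
    for message in messages:
        stages[0].pipe.put(message)
    sink: list = []
    # In production each stage runs independently on its own nodes;
    # here they are simply drained in order.
    for stage in stages:
        stage.drain(sink)
    return sink

# Hypothetical placeholder handlers.
def parse(msg: dict) -> dict:
    return {**msg, "question": msg["html"].strip()}

def score(msg: dict) -> dict:
    return {**msg, "score": len(msg["question"])}

def index(msg: dict) -> dict:
    return {**msg, "indexed": True}
```

Because each stage only knows its own pipe and the next one, a stage can be redeployed or rewritten without touching the others, which is the isolation the paragraph above describes.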
The error-handling strategy is simple: on an error, a service logs the error, stores the corresponding message in Amazon SimpleDB, and continues with the next message. The service stops only when the error rate exceeds configured thresholds.
Once the errors have been corrected, the corresponding messages are pushed back to Amazon SQS for completion of processing.
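A minimal sketch of this strategy follows. It assumes messages are dicts with an `"id"` key, a single error-rate threshold, and a minimum number of messages before the rate is checked (all hypothetical choices); a plain dict stands in for the SimpleDB domain that holds failed messages, and `requeue` stands in for sending a message back to SQS.

```python
import logging
from typing import Callable, Dict, List

log = logging.getLogger("irs")

class ErrorRateExceeded(RuntimeError):
    pass

def process_messages(messages: List[dict],
                     handler: Callable[[dict], None],
                     failed_store: Dict[str, dict],
                     max_error_rate: float = 0.1,
                     min_samples: int = 20) -> int:
    """Process messages, parking failures; stop if errors are too frequent.
    Returns the number of successfully processed messages."""
    processed = errors = 0
    for message in messages:
        try:
            handler(message)
            processed += 1
        except Exception:
            errors += 1
            log.exception("failed to process message %s", message["id"])
            failed_store[message["id"]] = message  # SimpleDB stand-in
        total = processed + errors
        # Only judge the error rate once enough messages have been seen.
        if total >= min_samples and errors / total > max_error_rate:
            raise ErrorRateExceeded(f"{errors}/{total} messages failed")
    return processed

def replay_failed(failed_store: Dict[str, dict],
                  requeue: Callable[[dict], None]) -> int:
    """After the bug is fixed, push parked messages back for processing."""
    count = 0
    for message_id in list(failed_store):
        requeue(failed_store.pop(message_id))
        count += 1
    return count
```

The minimum sample size keeps a single early failure from stopping a service that has only seen a handful of messages.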
CPU utilization and scaling
All the IRS services are designed to keep CPU occupancy at 100% (or at a configured value), using Python's multiprocessing package to spawn and kill processes as needed to maintain that occupancy.
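The spawn/kill loop described above might be sketched as follows. It is a hypothetical illustration, not SnappyFingers' code: `desired_workers`, `busy_worker`, and `Supervisor` are invented names, and the sketch assumes each CPU-bound worker process keeps exactly one core busy.

```python
import multiprocessing as mp
import os
from typing import List, Tuple

def desired_workers(target_percent: int, cpus: int) -> int:
    """Worker processes needed to keep `target_percent` of the CPUs busy,
    assuming each CPU-bound worker saturates exactly one core."""
    return max(1, round(cpus * target_percent / 100))

def busy_worker(stop) -> None:
    # Hypothetical service loop: the real one would pull a message from
    # its SQS pipe and process it; here we just burn CPU until told to stop.
    while not stop.is_set():
        sum(i * i for i in range(10_000))

class Supervisor:
    """Starts or stops worker processes to match a target count."""

    def __init__(self) -> None:
        self.workers: List[Tuple[mp.Process, object]] = []

    def scale_to(self, n: int) -> None:
        # Spawn new processes while we are below the target...
        while len(self.workers) < n:
            stop = mp.Event()
            proc = mp.Process(target=busy_worker, args=(stop,), daemon=True)
            proc.start()
            self.workers.append((proc, stop))
        # ...and stop surplus ones (gracefully, via their Event) when above it.
        while len(self.workers) > n:
            proc, stop = self.workers.pop()
            stop.set()
            proc.join()

    def adjust(self, target_percent: int) -> None:
        cpus = os.cpu_count() or 1
        self.scale_to(desired_workers(target_percent, cpus))
```

From a script's `if __name__ == "__main__":` block, `Supervisor().adjust(100)` would start one busy process per core, and a later `scale_to(0)` would stop them all. The main-module guard matters because some platforms start child processes by re-importing the main module.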
The services are independent of the node on which they are running, and if there is a huge backlog of messages in Amazon SQS, more EC2 nodes can be spawned to handle the extra load.
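Deciding how many nodes to run can be reduced to a simple calculation. This sketch assumes a known per-node throughput, a target time to drain the backlog, and a node cap derived from the monthly budget, all hypothetical parameters; in practice the backlog figure could come from the queue's `ApproximateNumberOfMessages` attribute.

```python
import math

def nodes_needed(backlog: int,
                 msgs_per_node_per_hour: float,
                 drain_hours: float,
                 min_nodes: int = 1,
                 max_nodes: int = 20) -> int:
    """Nodes required to clear `backlog` messages within `drain_hours`,
    clamped between a floor and a budget-driven ceiling."""
    if msgs_per_node_per_hour <= 0 or drain_hours <= 0:
        raise ValueError("throughput and drain time must be positive")
    required = math.ceil(backlog / (msgs_per_node_per_hour * drain_hours))
    return max(min_nodes, min(max_nodes, required))
```

A scheduler could compare the result with the number of nodes currently running and start or stop instances accordingly; the cap is what keeps a sudden backlog from blowing through the monthly budget.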
If you have interesting AWS success stories, feel free to email me with some details: you could be featured on this official blog as well :)
[ Simone Brunozzi, AWS Technology Evangelist in Europe, simoneb at amazon dot com]