As you probably know, Amazon CloudWatch provides monitoring services for your cloud resources and your applications. You can track cloud, system, and application metrics, see them visually, and arrange to be notified (via a CloudWatch alarm) if they go beyond a value that you specify. For example, you can track the CPU load of your EC2 instances and receive a notification (via email and/or Amazon SNS) if it exceeds 90% for a period of 5 minutes.
Today we are giving you the ability to stop or terminate your EC2 instances when a CloudWatch alarm is triggered. You can use this as a failsafe (detect an abnormal condition and then act) or as part of your application's processing logic (await an expected condition and then act).
Before we dig in, I should remind you of one thing. If you are using EBS-backed EC2 instances, you can stop them at any point and restart them later, and they will retain the same instance ID and root volume (this is, of course, distinct from terminating an instance, which cannot be undone).
Failsafe Ideas
If you (or your developers) are forgetful, you can detect unused EC2 instances and shut them down. You could do this by detecting a very low load average for an extended period of time. This type of failsafe could be used to reduce your AWS bill by making sure that you are not paying for resources you're not actually using.
You could also implement a failsafe that would detect runaway instances (for example, CPU pegged at 100% for an extended period of time). Perhaps your application gets stuck in a loop from time to time (only when you are not looking, of course). You could also use our CloudWatch monitoring scripts to detect and act on other situations, such as excessive memory utilization.
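As a rough sketch of the two failsafes above, the following Python builds the parameters for a pair of CloudWatch alarms that stop an instance. It assumes the boto3 SDK's `put_metric_alarm` call as the way to create them, uses the standard `CPUUtilization` metric as a stand-in for load average (which EC2 does not report on its own), and borrows the stop-action ARN format quoted in the comments on this post. The thresholds, durations, alarm names, and instance ID are hypothetical placeholders.

```python
def stop_alarm(instance_id, region, label, comparison, threshold, periods):
    """Build put_metric_alarm parameters for a CPU alarm that stops one instance."""
    return {
        "AlarmName": f"{label}-{instance_id}",   # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,                      # five-minute samples
        "EvaluationPeriods": periods,       # consecutive samples that must breach
        "Threshold": threshold,             # percent CPU
        "ComparisonOperator": comparison,
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:stop"],
    }


def idle_alarm(instance_id, region):
    # Assumed definition of "unused": average CPU below 5% for one hour.
    return stop_alarm(instance_id, region, "idle", "LessThanThreshold", 5.0, 12)


def runaway_alarm(instance_id, region):
    # Assumed definition of "runaway": average CPU at or above 99% for 30 minutes.
    return stop_alarm(
        instance_id, region, "runaway", "GreaterThanOrEqualToThreshold", 99.0, 6
    )
```

With boto3 installed and credentials configured, each dictionary could be passed along as `boto3.client("cloudwatch", region_name="us-east-1").put_metric_alarm(**idle_alarm("i-0123456789abcdef0", "us-east-1"))`.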
Processing Logic
Many AWS applications will pull work from an Amazon SQS queue, do the work, and then pass the work along to the next stage of a processing pipeline. You can detect and terminate worker instances that have been idle for a certain period of time.
You can use a similar strategy to get rid of instances that are tasked with handling compute-intensive batch processes. Once the CPU goes idle and the work is done, terminate the instance and save some money!
Application Integration
You can also create CloudWatch alarms based on Custom Metrics that you observe on an instance-by-instance basis. You could, for example, measure calls to your own web service APIs, page requests, or message postings per minute, and respond as desired.
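To make the custom-metric idea a little more concrete, here is a minimal sketch that counts the page requests an instance served in the last minute and shapes the result for boto3's `put_metric_data` call. The `MyApp` namespace, the `PageRequests` metric name, and the in-memory list of request timestamps are all assumptions made for illustration; your application would supply its own.

```python
from datetime import datetime, timedelta, timezone


def page_requests_datum(instance_id, request_times, now):
    """Build put_metric_data parameters for requests seen in the last minute."""
    window_start = now - timedelta(minutes=1)
    # Count requests in the half-open window (window_start, now].
    count = sum(1 for t in request_times if window_start < t <= now)
    return {
        "Namespace": "MyApp",                      # hypothetical namespace
        "MetricData": [
            {
                "MetricName": "PageRequests",      # hypothetical metric name
                "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
                "Timestamp": now,
                "Value": float(count),
                "Unit": "Count",
            }
        ],
    }


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    recent = [now - timedelta(seconds=s) for s in (5, 20, 45, 90)]
    params = page_requests_datum("i-0123456789abcdef0", recent, now)
    print(params["MetricData"][0]["Value"])
```

Publishing once a minute with `boto3.client("cloudwatch").put_metric_data(**params)` would give an alarm a per-instance series to watch, since the `InstanceId` dimension keeps each instance's data separate.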
Setting Up Alarm Actions
You can set up alarm actions from the EC2 or CloudWatch tabs of the AWS Management Console. Let's say you want to start from the EC2 tab. Right-click on the instance of interest and choose Add/Edit Alarms:
Choose your metrics, set up your notification (SNS topic and optional email), check Take the action, and then choose either Stop or Terminate this instance:
The console will confirm the creation of the alarm, and you're all set (if you asked for an email notification, you need to confirm the subscription within three days):
Your Turn
I can speak for the entire CloudWatch team when I say that we are interested in hearing more about how you will put this feature to use. Feel free to leave a comment and I'll pass it along to them ASAP.
-- Jeff;


Awesome work, I'd love plain old 'restart' as an action too.
Posted by: Thefalken | January 09, 2013 at 01:17 AM
In recent weeks some of our EC2 instances have been hit with underlying hardware issues and they restart automatically. Is it planned that this feature could be extended to not only stop but start an instance and re-assign the static EIP if, for example, a system or instance check fails and is caught by a CloudWatch alarm?
Posted by: Steven | January 09, 2013 at 01:28 AM
The AWS CloudWatch CLI (http://aws.amazon.com/developertools/2534) has not been updated since September. When can we expect it to support this new, and very useful, functionality?
Thanks!
Posted by: Yaron | January 09, 2013 at 02:17 AM
Cool, sounds like it's getting closer and closer to some of the things I've asked for in the AWS forums long ago.
It would be nice to also be able to use the *current time* as a threshold or criterion for an action, such as stopping an unutilized server during the night and starting it up again in the morning (this would also require reassigning the EIP to the instance).
Or during a certain *window of time* and low utilization, I would like to have it take a snapshot or create an AMI of the instance for me.
For now, I've had to build a service which runs on another server to coordinate all of this, but I'm sure it's not as reliable as it would be if this type of service was built into CloudWatch.
Posted by: JB | January 09, 2013 at 09:02 AM
How can this be applied to all instances instead of only a specific one?
Posted by: Bjorn | January 09, 2013 at 01:26 PM
-- @Yaron,
The existing Amazon CloudWatch Command-Line Tools (CLI) do support these new actions. In the "--alarm-actions" parameter, specify an Amazon Resource Name that corresponds to the action you want to use (remember to include the region where your instance runs). For example, to stop an instance in us-east-1, use "arn:aws:automate:us-east-1:ec2:stop" or to terminate that instance, use "arn:aws:automate:us-east-1:ec2:terminate". You can find more information in the Amazon CloudWatch developer guide at: http://docs.amazonwebservices.com/AmazonCloudWatch/latest/DeveloperGuide/UsingAlarmActions.html
-- @Steven and @Thefalken,
Thanks for the feedback about your desire to see reboot and stop/start actions. Currently you can use this feature to stop or terminate an instance that fails a status check, since Amazon CloudWatch monitors status check results. However, we hear you loud and clear that restarting the instance (and reassigning the Elastic IP address) would be even more useful.
-- @JB,
Thanks for the suggestions for time-based stopping and actions such as snapshot or AMI creation. You may want to consider Auto Scaling, which has a scheduled scaling option (using the PutScheduledUpdateGroupAction API); using that, you can terminate an instance each night and re-launch it each morning. http://docs.aws.amazon.com/AutoScaling/latest/DeveloperGuide/Welcome.html
-- @Bjorn,
We currently do not have simple methods to apply this to multiple instances at once, but you can use tools such as the command-line interface for this. Or, you may be able to use the EC2 console; it remembers your prior alarm's settings so you can go through your list of instances and create identical alarms for multiple instances in a few seconds each.
Posted by: Derek | January 09, 2013 at 02:32 PM
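One way to act on the suggestions in the reply above is to generate identical alarm definitions for a list of instances in a script. The sketch below builds the action ARN in the format quoted in that reply and stamps out one status-check alarm per instance; it assumes boto3's `put_metric_alarm` as the eventual call and the `StatusCheckFailed` metric in the `AWS/EC2` namespace. The alarm names, period, evaluation count, and instance IDs are hypothetical.

```python
VALID_ACTIONS = ("stop", "terminate")


def action_arn(region, action):
    """Return the alarm-action ARN for stopping or terminating an instance."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    return f"arn:aws:automate:{region}:ec2:{action}"


def status_check_alarms(instance_ids, region, action):
    """Build one put_metric_alarm parameter set per instance."""
    arn = action_arn(region, action)
    alarms = []
    for instance_id in instance_ids:
        alarms.append(
            {
                "AlarmName": f"status-check-{action}-{instance_id}",  # hypothetical
                "Namespace": "AWS/EC2",
                "MetricName": "StatusCheckFailed",
                "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
                "Statistic": "Maximum",
                "Period": 300,               # placeholder: five-minute samples
                "EvaluationPeriods": 2,      # placeholder: two failures in a row
                "Threshold": 1.0,
                "ComparisonOperator": "GreaterThanOrEqualToThreshold",
                "AlarmActions": [arn],
            }
        )
    return alarms
```

Each element could then be sent with `cloudwatch.put_metric_alarm(**alarm)` in a loop, where `cloudwatch` is a boto3 CloudWatch client for the same region as the instances.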
+1 to a reboot action. I can't even think of a case where we would just shut it down...
Posted by: Nick | January 09, 2013 at 03:46 PM
An out-of-band monitoring option would be nice as well. I currently use CloudWatch and external monitoring to ensure I have out-of-band monitoring. I need to ensure I measure user experience and not just resource monitoring. CloudWatch is susceptible to AWS resource outages as well.
Posted by: DR Shaw | January 10, 2013 at 05:37 AM
This looks awesome! From the forums I found this: "Or, if you are a corporate IT administrator, you can create a group of alarms that first sends an email notification to developers whose instances have been underutilized for a day, then stops an instance if utilization doesn't improve after three days, and terminates the instance after a week of no activity."
However, when trying to set up an alarm for stopping an instance after three days of under-utilization, I'm getting the following error: "The number of consecutive periods cannot span more than a day."
Any ideas? These seem contradictory... is monitoring for longer than a day not yet supported? Thanks!
-Dan
Posted by: Dan Mackin | January 10, 2013 at 05:12 PM
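The arithmetic behind that error can be sketched as follows, assuming the message means that an alarm's period multiplied by its number of consecutive evaluation periods may not exceed 86,400 seconds. That reading is an inference from the error text, not a documented rule quoted here, and the function names are made up for illustration.

```python
SECONDS_PER_DAY = 86_400  # assumed limit, inferred from the error message


def alarm_span_seconds(period_seconds, evaluation_periods):
    """Total time an alarm must look back over before it can fire."""
    return period_seconds * evaluation_periods


def within_one_day(period_seconds, evaluation_periods):
    """True if the alarm's span fits inside the assumed one-day limit."""
    return alarm_span_seconds(period_seconds, evaluation_periods) <= SECONDS_PER_DAY


def max_evaluation_periods(period_seconds):
    """Largest number of consecutive periods that still fits in a day."""
    return SECONDS_PER_DAY // period_seconds


if __name__ == "__main__":
    # Three days of hourly samples: 3600 * 72 = 259200 seconds, over the limit.
    print(within_one_day(3600, 72))
    # One day of hourly samples: 3600 * 24 = 86400 seconds, exactly at the limit.
    print(within_one_day(3600, 24))
```

Under that assumption, a three-day condition cannot be expressed as a single alarm, whatever period is chosen.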