Instance Status Checks
You may remember that we recently introduced EC2 Instance Status Monitoring features to give you better visibility into the status of your AWS resources. We began by providing you with information about operational activities that have been scheduled for your EC2 instances. Since then, we’ve added more functionality.
You can now view status checks to help identify problems that may impair an instance’s ability to run your applications. These status checks are the results of automated tests that EC2 performs on every running instance to detect hardware and software issues. Whether you are running applications on AWS or elsewhere, diagnosing problems quickly and accurately can be difficult. For example, to determine that a faulty boot sequence has crashed before it initialized an instance’s networking stack, or that an instance has failed to renew its DHCP lease, it helps to confirm first that the instance is powered on and that all networking equipment is performing as expected.
You have told us that you want to know when problems such as these may affect your instances and that you want to be able to distinguish software problems from issues with the underlying infrastructure. To this end, we are introducing two types of status checks for each of your instances: System status checks and Instance status checks. These checks verify that the instance and the operating system are reachable from our monitoring system.
System status checks detect problems with the underlying EC2 systems that are used by each individual instance. The first System status check we are introducing is a reachability check.
- The System Reachability check confirms that we are able to get network packets to your instance.
System status problems require AWS involvement to repair. We work hard to fix every one as soon as it arises, and we are continually driving down their occurrence. However, we also want you to have enough visibility to decide whether you want to wait for our systems to fix the issue or resolve it yourself (by restarting or replacing an instance).
Instance status checks detect problems within your instance. Typically, these are problems that you as a customer can fix, for example by rebooting the instance or making changes in your operating system. There is currently one Instance status check.
- The Instance Reachability check confirms that we are able to deliver network packets to the operating system hosted on your instance.
Over time, we will add to these checks as we continue to improve our detection methods.
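The split between the two kinds of checks suggests a simple triage rule, sketched here in Python. This is only an illustration: the dictionary layout is assumed to match one entry of a DescribeInstanceStatus response as exposed by the Python SDK (a `SystemStatus` and an `InstanceStatus`, each with a `Status` string), and the function name `triage` and the labels it returns are hypothetical.

```python
def triage(status):
    """Suggest who is positioned to act on one instance's status entry.

    `status` is assumed to look like one element of the `InstanceStatuses`
    list in a DescribeInstanceStatus response. The returned labels are
    made up for this sketch.
    """
    system = status.get("SystemStatus", {}).get("Status", "insufficient-data")
    instance = status.get("InstanceStatus", {}).get("Status", "insufficient-data")

    if system == "impaired":
        # Problem with the underlying EC2 systems: wait for AWS to repair it,
        # or restart or replace the instance yourself.
        return "system-problem"
    if instance == "impaired":
        # Problem inside the instance: a reboot or an OS change may fix it.
        return "instance-problem"
    if system == "ok" and instance == "ok":
        return "healthy"
    # Anything else (for example, checks that are still initializing).
    return "unknown"
```

The system check is tested first because a system-level problem can also make the instance check fail, and in that case rebooting the operating system is unlikely to help.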
We are also introducing a reporting system to allow you to provide us with additional information on the status of your EC2 instances.
You can access this functionality from the new DescribeInstanceStatus and ReportInstanceStatus APIs, the AWS Management Console, and the command-line tools.
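As a rough illustration of the API route, the sketch below reads both status values for a set of instances. It assumes a client object that offers `describe_instance_status` the way the boto3 EC2 client does. boto3 is not mentioned in this post, so treat it as an assumption; the helper name `fetch_instance_statuses` and the instance ID in the usage comment are hypothetical.

```python
def fetch_instance_statuses(ec2_client, instance_ids):
    """Return {instance_id: (system_status, instance_status)}.

    `ec2_client` is assumed to behave like a boto3 EC2 client.
    """
    response = ec2_client.describe_instance_status(
        InstanceIds=list(instance_ids),
        IncludeAllInstances=True,  # also list instances that are not running
    )
    statuses = {}
    for entry in response.get("InstanceStatuses", []):
        statuses[entry["InstanceId"]] = (
            entry["SystemStatus"]["Status"],
            entry["InstanceStatus"]["Status"],
        )
    return statuses


# Usage, assuming boto3 is installed and credentials are configured:
#   import boto3
#   print(fetch_instance_statuses(boto3.client("ec2"), ["i-0123456789abcdef0"]))
```

Passing the client in as a parameter keeps the helper easy to exercise with a stub object, without making a network call.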
Console Support
The status of each of your instances is displayed in the instance list:

The console displays detailed information about the status checks when an instance is selected:

You can use the Submit Feedback button to report discrepancies between the reported status and your own observations or to provide more detail about issues you encounter:

We will use the feedback entered in this form to identify issues that might be affecting multiple AWS customers and improve our detection systems accordingly.
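Similar feedback can presumably be sent programmatically through the ReportInstanceStatus API. The sketch below only assembles the request parameters and does not call AWS. The parameter names follow the boto3 `report_instance_status` call as far as I know, the helper name `build_status_report` is hypothetical, and whether the console form and the API feed the same system is an assumption.

```python
def build_status_report(instance_ids, reason_codes, description=""):
    """Build keyword arguments for a ReportInstanceStatus call.

    Reason codes are passed through unchanged; "unresponsive" and "other"
    are two values the API is understood to accept.
    """
    if not instance_ids:
        raise ValueError("at least one instance ID is required")
    if not reason_codes:
        raise ValueError("at least one reason code is required")

    params = {
        "Instances": list(instance_ids),
        "Status": "impaired",  # report that the instance looks unhealthy to you
        "ReasonCodes": list(reason_codes),
    }
    if description:
        params["Description"] = description
    return params


# Usage, assuming a boto3 EC2 client named ec2_client:
#   ec2_client.report_instance_status(**build_status_report(
#       ["i-0123456789abcdef0"], ["unresponsive"], "No SSH since the last reboot"))
```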
Update: A few people have emailed me to ask about the new Status Checks column in the Console's instance list. If you don't see it, click on the Show/Hide button and make sure that the Status Checks column is checked:

-- Jeff;


It would be cool if we'd get a simple API and a PHP SDK to get instance status and also historic data. This way we could build simple status boards for our own customers.
Posted by: Joshua | January 07, 2012 at 03:20 AM
Nice feature, it would be great if there was an option to trigger an alert when one of these checks fails.
Posted by: Jon Ward | January 11, 2012 at 07:42 AM
Yet the most interesting for us would be to have a notification of any kind when an EC2 instance goes up or down. Or did we miss something?
Posted by: Michael Tabolsky | January 13, 2012 at 03:19 AM
How can I access the new data from DescribeInstanceStatus?
It looks like 'system-status' and 'instance-status' are not available yet.
(http://docs.amazonwebservices.com/AWSEC2/latest/APIReference/ApiReference-query-DescribeInstanceStatus.html)
I'm using Java SDK 1.2.15
Posted by: Reynald | January 13, 2012 at 06:40 AM
I also vote for triggering an alert. Even better would be the option to trigger a customizable action like an automatic reboot if there have been two failed instance status checks in a row.
Posted by: Chad Martin | January 16, 2012 at 05:38 PM
Hi
I don't see a "Status Checks" column in my console's instance list like in the first screenshot in this blog post. Has it been removed?
I can see the "Status Checks" tab as illustrated though.
Posted by: CJK | January 17, 2012 at 06:55 AM
Oops
As Jeff kindly pointed out via email, I missed the Show/Hide button.
Posted by: CJK | January 18, 2012 at 03:53 AM
As far as I can tell the data is still not available using DescribeInstanceStatus with latest (1.5.2) PHP SDK. Any idea when this data will be exposed to the API?
Posted by: Eyal Teutsch | February 09, 2012 at 01:23 AM
@eyal - The PHP SDK contains the describe_instance_status function (http://docs.amazonwebservices.com/AWSSDKforPHP/latest/#m=AmazonEC2/describe_instance_status).
Posted by: Jeff Barr | February 09, 2012 at 05:45 AM
I like the instance status check. One additional feature would be: for auto-scaled instances, if one of these status checks fails, have an option to go ahead and terminate the instance immediately and fire up a new one (or, reboot it, but that'd be an autoscaler feature not available currently), rather than having to wait for the elb http health checks to fail (which are much slower than the new status checks).
We experience random instance failure for reasons we have not yet fully uncovered. We use copperegg's revealcloud to monitor our systems, and we get an alert when the app server dies. But it takes the elb health check 4 minutes to detect the failure and handle it appropriately. Almost every time, I can look at the aws console, and see the "Instance Status Checks" failed (but not System); then wait 3 minutes before a replacement is brought up. Until we figure out the core problem (nginx? rails? mongos? who knows..) it would be nice to at least get a new app server up and rolling again as quickly as possible.
Posted by: Ross | February 09, 2012 at 08:23 AM
Thx Jeff - I'm indeed familiar with the describe_instance_status() method, but it only returns whether the instance is running or not, with no info regarding the status checks. Hence a running machine that failed a status check will not be reported as faulty.
Posted by: Eyal Teutsch | February 11, 2012 at 11:39 PM
Any idea when the describe_instance_status() method will return status checks? It's been almost 2 months and the current PHP API 1.5.3 still does not give this info.
Without this info the describe_instance_status() method is not of much use.
Posted by: Pascal Hughes | March 31, 2012 at 04:02 PM