People more persistent and clever than I have managed to find the Werner Vogels interview in the current issue of ACM Queue. Werner is Amazon's CTO and the author of the All Things Distributed blog. In the interview he goes into considerable detail on the inner workings of the Amazon site, scalability issues, our internal services architecture, and much more.
Werner also talks about the fact that Amazon developers must operate the services that they build:
"Giving developers operational responsibilities has greatly enhanced the quality of the services, both from a customer and a technology point of view. The traditional model is that you take your software to the wall that separates development and operations, and throw it over and then forget about it. Not at Amazon. You build it, you run it."
This often comes as a big surprise to new hires. Here's what happens:
You arrive on a Monday morning and spend about three hours in orientation. At noon you are done, and you get to meet your team and your manager for lunch. If you are a developer, your manager may very well hand you a pager that same day, along with a link to the on-call schedule for the group. You look at the schedule and realize that you will be "at bat" before long. Of course, you will have your team for backup. You then waste no time in learning how your team's systems work: what the servers are named, what can go wrong, and how to fix it when it does.
Before long you fix something, and you realize that a code change or two will prevent that particular problem from ever happening again. Then you realize the beauty of this model: you can, effectively, code yourself into a good night's sleep. Being responsible for the care and feeding of your own systems gives you the ultimate incentive to make them robust, self-healing, and (when all else fails) self-diagnosing. You quickly learn to write descriptive messages into the application log, and you learn how to dig as deeply as possible into a problem in order to solve it.