I received an email from an EC2 user asking me about search tools. This user runs a high-traffic site on an array of EC2 instances and needs a search solution. He knew that he could buy a search appliance, but that didn't fit his company's model. As he told me:
"we don't want to do anything that involves us owning and operating a server...since we're big believers in web services."
After thinking about this for a while, I believe that one really cool solution would be a search engine packaged as an EC2 AMI (Amazon Machine Image), perhaps offered on a pay-by-the-hour basis. This hypothetical AMI would include all of the usual components: a crawler, data storage, and a query page for access to the search engine itself. Bonus points, of course, for APIs to insert and retrieve data.
Perhaps the crawler runs once every 24 hours and generates index data structures that it stores in S3, where the search engine picks them up and loads them into the instance's RAM for fast lookups. Once again, I'll offer bonus points if adding more crawler instances makes the whole crawling and indexing process run faster.
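To make the crawl, index, and serve flow a bit more concrete, here is a minimal Python sketch that assumes the "index data structures" take the form of a simple inverted index (a map from each word to the pages that contain it). It also assumes that each crawler instance handles its own hypothetical shard of the site, that the partial indexes are merged into one, and that a JSON round trip stands in for the actual S3 upload and download, which are not shown. The page data and every function name here are made up for illustration; this is not a description of any existing product.

```python
import json
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase the text and split it on anything that isn't a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_partial_index(pages: dict[str, str]) -> dict[str, list[str]]:
    # Hypothetical per-crawler step: map each term to the sorted URLs containing it.
    index = defaultdict(set)
    for url, text in pages.items():
        for term in tokenize(text):
            index[term].add(url)
    return {term: sorted(urls) for term, urls in index.items()}


def merge_indexes(partials: list[dict[str, list[str]]]) -> dict[str, list[str]]:
    # Combine the partial indexes produced by separate crawler instances.
    merged = defaultdict(set)
    for partial in partials:
        for term, urls in partial.items():
            merged[term].update(urls)
    return {term: sorted(urls) for term, urls in merged.items()}


def search(index: dict[str, list[str]], query: str) -> list[str]:
    # AND query: return the URLs that contain every term in the query.
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


# Two hypothetical crawler instances, each responsible for one shard of the site.
shard_a = {"/about": "About our EC2 hosted site", "/blog": "Search tools on EC2"}
shard_b = {"/faq": "How search works on the site"}

partials = [build_partial_index(shard) for shard in (shard_a, shard_b)]
index = merge_indexes(partials)

# Assumption: in the scenario above this JSON blob would be written to S3 and
# later loaded into RAM by the query-serving instance; here we just round-trip it.
blob = json.dumps(index)
in_memory = json.loads(blob)

print(search(in_memory, "search EC2"))  # prints ['/blog']
```

Because each shard is indexed independently and the merge is a simple union per term, adding crawler instances splits the indexing work; only the final merge touches every partial index.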
To top it all off, the query page would be customizable and skinnable, so that it could be plugged seamlessly into an existing site.
If you are doing something like this or have even thought about doing something similar, I'd like to hear from you. If you would pay to use it, same deal. Post some comments and let's see what happens.