Search Engine Packed as an AMI?
It never hurts to try to wish a product into existence...
I received an email from an EC2 user asking me about search tools. This user runs a high-traffic site on an array of EC2 instances and needs a search solution. He knew that he could buy a search appliance, but this didn't fit with his company's model. As he told me:
"we don't want to do anything that involves us owning and operating a server...since we're big believers in web services."
After thinking about this for a while, I believe that one really cool solution would involve a search engine installed into an EC2 AMI (Amazon Machine Image), perhaps made available for use on a by-the-hour basis. This hypothetical AMI would incorporate all of the usual components: a crawler, data storage, and a query page for access to the actual search engine. There are bonus points for APIs for inserting and retrieving data, of course.
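As a rough illustration of the "APIs for inserting and retrieving data" idea, here is a minimal in-memory inverted index in Python. The class and method names (`SearchIndex`, `add_document`, `search`) are hypothetical and not part of any existing product. Tokenization is a naive lowercase word split, and queries use AND semantics.

```python
import re
from collections import defaultdict


class SearchIndex:
    """Hypothetical in-memory inverted index: term -> set of document IDs."""

    def __init__(self) -> None:
        self.postings: dict[str, set[str]] = defaultdict(set)
        self.documents: dict[str, str] = {}

    @staticmethod
    def tokenize(text: str) -> list[str]:
        # Naive tokenizer: lowercase runs of letters and digits only.
        return re.findall(r"[a-z0-9]+", text.lower())

    def remove_document(self, doc_id: str) -> None:
        """Drop a document and its postings (used when a page is re-crawled)."""
        old_text = self.documents.pop(doc_id, None)
        if old_text is None:
            return
        for term in set(self.tokenize(old_text)):
            self.postings[term].discard(doc_id)

    def add_document(self, doc_id: str, text: str) -> None:
        """Insert API: store the raw text and index each of its terms."""
        if doc_id in self.documents:
            self.remove_document(doc_id)  # avoid stale postings on re-insert
        self.documents[doc_id] = text
        for term in self.tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query: str) -> list[str]:
        """Retrieval API: sorted IDs of documents containing every query term."""
        terms = self.tokenize(query)
        if not terms:
            return []
        matches = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(matches)


index = SearchIndex()
index.add_document("page1", "Amazon EC2 search engine")
index.add_document("page2", "Search appliance hardware")
print(index.search("search engine"))  # ['page1']
```

A real engine would add ranking and persistence; this only shows the shape of an insert/retrieve API.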
Perhaps the crawler runs once every 24 hours and then generates index data structures, which it stores in S3, where they are picked up by the engine and loaded into the instance's RAM for fast processing. Once again, I'll offer bonus points if spinning up multiple crawler instances makes the entire crawling and indexing process run faster.
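One way the "more crawler instances, faster crawl" idea could work is to give each instance a disjoint slice of the URL space and merge the partial indexes afterwards. The sketch below assumes URLs are assigned by a stable hash of the host name, so each site is crawled by exactly one instance. The function names are hypothetical, and real HTTP fetching is stubbed out with an in-memory `pages` dict.

```python
import hashlib
import re
from collections import defaultdict
from urllib.parse import urlparse


def assign_instance(url: str, num_instances: int) -> int:
    """Map a URL to a crawler instance using a stable hash of its host name.

    hashlib is used instead of hash(), whose string output varies between
    Python processes and so cannot be agreed on across instances.
    """
    host = urlparse(url).netloc.lower()
    digest = hashlib.sha1(host.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_instances


def build_partial_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """What one crawler instance would produce for its own slice of pages."""
    postings: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            postings[term].add(url)
    return postings


def merge_indexes(partials: list[dict[str, set[str]]]) -> dict[str, set[str]]:
    """Combine partial indexes by taking the union of URLs for each term."""
    merged: dict[str, set[str]] = defaultdict(set)
    for partial in partials:
        for term, urls in partial.items():
            merged[term] |= urls
    return dict(merged)


# Stubbed crawl results standing in for real fetches.
pages = {
    "http://example.com/a": "cloud search",
    "http://example.com/b": "search engine",
    "http://example.org/": "cloud computing",
}
num_instances = 3
slices: list[dict[str, str]] = [{} for _ in range(num_instances)]
for url, text in pages.items():
    slices[assign_instance(url, num_instances)][url] = text

merged = merge_indexes([build_partial_index(s) for s in slices])
print(sorted(merged["search"]))  # ['http://example.com/a', 'http://example.com/b']
```

Partitioning by host rather than by full URL is a design choice: it keeps per-site politeness limits local to one instance, at the cost of uneven load when a few sites dominate.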
To top it all off, the query page would be customizable and skinnable, so that this could be plugged into an existing site in a seamless fashion.
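A minimal way to make the results page "skinnable" is to let the host site supply its own template and have the engine fill in only the placeholders. This sketch uses Python's standard `string.Template`. The placeholder names (`$query`, `$results`, `$url`, `$title`) and the function `render_results` are assumptions for illustration. Values are HTML-escaped before substitution so that user queries cannot inject markup into the page.

```python
from html import escape
from string import Template

# A site-supplied "skin": the engine only knows the placeholder names.
PAGE_SKIN = Template(
    '<div class="my-site-search"><h2>Results for "$query"</h2><ul>$results</ul></div>'
)
RESULT_SKIN = Template('<li><a href="$url">$title</a></li>')


def render_results(query: str, hits: list[tuple[str, str]],
                   page_skin: Template = PAGE_SKIN,
                   result_skin: Template = RESULT_SKIN) -> str:
    """Render (url, title) hits into the host site's own markup."""
    items = "".join(
        result_skin.substitute(url=escape(url, quote=True), title=escape(title))
        for url, title in hits
    )
    return page_skin.substitute(query=escape(query), results=items)


html = render_results("ec2 <search>", [("http://example.com/a", "Cloud search")])
print(html)
```

Swapping in a different pair of templates changes the look without touching the engine.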
If you are doing something like this or have even thought about doing something similar, I'd like to hear from you. If you would pay to use it, same deal. Post some comments and let's see what happens.
-- Jeff;
It could easily be done with OmniFind, I guess.
http://omnifind.ibm.yahoo.net/
Posted by: aws-enthusiast | May 04, 2007 at 12:54 PM
Search Engine Packed as an AMI?
Already done. Posted: Apr 2, 2007 8:27 PM PDT
http://developer.amazonwebservices.com/connect/thread.jspa?threadID=14622
You also have
http://omnifind.ibm.yahoo.net/productinfo.php
As well as
http://wiki.apache.org/lucene-hadoop/AmazonEC2
Reuven Cohen
Enomaly Inc
http://www.enomalism.com
Posted by: Reuven Cohen | May 04, 2007 at 04:01 PM
Unrelated in a way, but since I'm not a developer or programmer, I wanted to post this as a wish for one more type of search engine (or to find out about it if it already exists).
Let's say I go to a new city and get lost somewhere. It would be cool if I could take a picture of a neighbouring building with my mobile, SMS it to some service, and then receive an SMS back telling me my location.
Posted by: Juhi | May 05, 2007 at 04:29 AM
Two general-purpose search solutions with slightly different target spaces are Nutch (http://lucene.apache.org/nutch/), mainly for web-based documents, and Solr (http://lucene.apache.org/solr/), which is much more generic. Both are associated with the Lucene project (http://lucene.apache.org/).
Building either into an AMI shouldn't be too hard.
Posted by: greg | May 07, 2007 at 12:34 AM
SearchBlox has a Linux server build that can be run on an existing AMI. It has a great AJAX-based interface, and the search engine is controlled entirely through a web-based interface. It also has an API for inserting data. You can customize the search results or just use the XML output from the search engine. www.searchblox.com
Posted by: tss | November 18, 2007 at 01:40 PM
But I think you'd need to rewrite the index persistence in Lucene/OmniFind/SearchBlox to support Amazon S3. That's non-trivial for the open-source implementations of search services, and near impossible for anything else.
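The point about index persistence can be illustrated by hiding storage behind a small interface, so that an S3-backed implementation could be swapped in without touching the indexing code (Lucene has an analogous abstraction in its `Directory` class). In this sketch the `IndexStore` protocol, `LocalDirectoryStore`, `save_index`, and `load_index` are hypothetical names. An S3 version would implement the same `put`/`get` methods with object PUT and GET requests, which are not shown here.

```python
import json
import tempfile
from pathlib import Path
from typing import Protocol


class IndexStore(Protocol):
    """Hypothetical storage interface for serialized index segments."""

    def put(self, name: str, data: bytes) -> None: ...
    def get(self, name: str) -> bytes: ...


class LocalDirectoryStore:
    """Stores each segment as a file; an S3-backed store would map names to object keys."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, name: str, data: bytes) -> None:
        (self.root / name).write_bytes(data)

    def get(self, name: str) -> bytes:
        return (self.root / name).read_bytes()


def save_index(store: IndexStore, name: str, postings: dict[str, set[str]]) -> None:
    """Serialize term -> doc IDs as JSON (sets become sorted lists)."""
    payload = {term: sorted(ids) for term, ids in postings.items()}
    store.put(name, json.dumps(payload).encode("utf-8"))


def load_index(store: IndexStore, name: str) -> dict[str, set[str]]:
    """Read a segment back into RAM for fast querying."""
    payload = json.loads(store.get(name).decode("utf-8"))
    return {term: set(ids) for term, ids in payload.items()}


with tempfile.TemporaryDirectory() as tmp:
    store = LocalDirectoryStore(Path(tmp))
    save_index(store, "segment-0.json", {"search": {"page2", "page1"}})
    restored = load_index(store, "segment-0.json")
    print(sorted(restored["search"]))  # ['page1', 'page2']
```

Writing whole immutable segments, rather than updating files in place, fits an object store like S3 better, since objects are replaced as a unit rather than modified partially.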
Posted by: tinnitus | December 30, 2007 at 07:16 PM