Search Engine Packed as an AMI?

It never hurts to try to wish a product into existence...

I received an email from an EC2 user asking me about search tools. This user runs a high-traffic site on an array of EC2 instances and needs a search solution. He knew that he could buy a search appliance, but this didn't fit with his company's model. As he told me:

"we don't want to do anything that involves us owning and operating a server...since we're big believers in web services."

After thinking about this for a while, I believe that one really cool solution would involve a search engine installed into an EC2 AMI (Amazon Machine Image), perhaps made available for use on a by-the-hour basis. This hypothetical AMI would incorporate all of the usual components: a crawler, data storage, and a query page for access to the actual search engine. There are bonus points for APIs for inserting and retrieving data, of course.
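To make the "APIs for inserting and retrieving data" idea a little more concrete, here is a minimal Python sketch of what such an interface might look like. It is a toy in-memory inverted index (each term maps to the set of documents containing it), and the class and method names (`TinySearchIndex`, `insert`, `delete`, `search`, `get`) are hypothetical, not the API of any existing product.

```python
import re
from collections import defaultdict


class TinySearchIndex:
    """Hypothetical insert/retrieve API for an imagined search AMI.

    A toy in-memory inverted index: term -> set of document IDs.
    """

    def __init__(self):
        self.docs = {}                    # doc_id -> original text
        self.postings = defaultdict(set)  # term -> {doc_id, ...}

    @staticmethod
    def tokenize(text):
        # Lowercase and split on anything that is not a letter or digit.
        return re.findall(r"[a-z0-9]+", text.lower())

    def insert(self, doc_id, text):
        # Drop any earlier version of the document before re-indexing it.
        if doc_id in self.docs:
            self.delete(doc_id)
        self.docs[doc_id] = text
        for term in set(self.tokenize(text)):
            self.postings[term].add(doc_id)

    def delete(self, doc_id):
        text = self.docs.pop(doc_id, None)
        if text is None:
            return
        for term in set(self.tokenize(text)):
            self.postings[term].discard(doc_id)
            if not self.postings[term]:
                del self.postings[term]  # keep the index free of empty lists

    def search(self, query):
        # AND semantics: return IDs of documents containing every query term.
        terms = self.tokenize(query)
        if not terms:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(hits)

    def get(self, doc_id):
        return self.docs.get(doc_id)
```

A real engine would add ranking, phrase queries, and persistence; this only shows the shape of an insert/search/retrieve contract that a hosted instance could expose over HTTP.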

Perhaps the crawler runs once every 24 hours and then generates some indexed data structures which it stores in S3, where they are picked up by the engine and loaded into the instance's RAM for fast processing. Once again, I'll offer bonus points if spinning up multiple instances of the crawler makes the entire crawling and indexing process run faster.
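Here is a rough Python sketch of that crawl, index, publish, and load cycle, under several stated assumptions: `FakeObjectStore` is an in-memory stand-in for S3 (a real version would call the S3 API instead), the key layout `index/<run_id>/shard-<n>.json` is made up for illustration, and page fetching is left out, so each crawler instance is handed its already-fetched pages. URLs are split across crawler instances by hashing the host name, which is one way that adding instances could divide the crawling and indexing work.

```python
import json
import re
import zlib
from collections import defaultdict
from urllib.parse import urlparse


class FakeObjectStore:
    """In-memory stand-in for S3 (hypothetical): keys map to bytes."""

    def __init__(self):
        self.objects = {}

    def put(self, key, data):
        self.objects[key] = data

    def get(self, key):
        return self.objects[key]

    def list_keys(self, prefix):
        return sorted(k for k in self.objects if k.startswith(prefix))


def shard_for(url, num_shards):
    # Stable hash of the host name, so every crawler instance computes the
    # same assignment (Python's built-in hash() of str varies per process).
    host = urlparse(url).netloc.lower()
    return zlib.crc32(host.encode("utf-8")) % num_shards


def assign(urls, num_shards):
    # Split the URL frontier into one work list per crawler instance.
    shards = defaultdict(list)
    for url in urls:
        shards[shard_for(url, num_shards)].append(url)
    return dict(shards)


def publish_shard(store, run_id, shard_id, pages):
    # pages: {url: text} fetched by one crawler instance.
    postings = defaultdict(set)
    for url, text in pages.items():
        for term in set(re.findall(r"[a-z0-9]+", text.lower())):
            postings[term].add(url)
    # JSON has no set type, so each posting list is stored as a sorted list.
    body = {term: sorted(urls) for term, urls in postings.items()}
    key = f"index/{run_id}/shard-{shard_id}.json"
    store.put(key, json.dumps(body).encode("utf-8"))
    return key


def load_index(store, run_id):
    # Query instance: merge every shard from one run into a single in-RAM dict.
    merged = defaultdict(set)
    for key in store.list_keys(f"index/{run_id}/"):
        for term, urls in json.loads(store.get(key)).items():
            merged[term].update(urls)
    return merged
```

In use, each crawler instance would take its list from `assign(urls, n)`, fetch those pages, and call `publish_shard`; once the daily run finishes, the engine instance calls `load_index` with the same run ID. Hashing by host keeps each site on a single crawler, which also makes per-site politeness limits easier to enforce.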

To top it all off, the query page would be customizable and skinnable, so that this could be plugged into an existing site in a seamless fashion.
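One simple way to make a results page skinnable is to let the site owner supply the HTML templates. The sketch below uses Python's standard `string.Template`; the default templates, the placeholder names (`$query`, `$items`, `$url`, `$title`), and the `render_results` function are illustrative assumptions, not part of any existing product.

```python
from html import escape
from string import Template

# Default "skin"; a site owner could pass their own templates instead.
DEFAULT_PAGE = Template(
    '<div class="search-results">\n'
    "  <h2>Results for &quot;$query&quot;</h2>\n"
    "  <ul>\n$items  </ul>\n"
    "</div>\n"
)
DEFAULT_ITEM = Template('    <li><a href="$url">$title</a></li>\n')


def render_results(query, results, page=DEFAULT_PAGE, item=DEFAULT_ITEM):
    # results: list of (url, title) pairs returned by the search engine.
    # Everything is HTML-escaped because the query and titles come from outside.
    items = "".join(
        item.substitute(url=escape(url, quote=True), title=escape(title))
        for url, title in results
    )
    # Extra keyword values are ignored if a custom page template omits $query.
    return page.substitute(query=escape(query), items=items)
```

For example, `render_results("ec2", hits, page=Template('<ol class="my-site">$items</ol>'))` would drop the results into a site's own markup while keeping the default item template.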

If you are doing something like this or have even thought about doing something similar, I'd like to hear from you. If you would pay to use it, same deal. Post some comments and let's see what happens.

-- Jeff;

Comments

It could easily be done with OmniFind, I guess.

http://omnifind.ibm.yahoo.net/

Unrelated in a way, but since I'm not a developer/programmer, I wanted to post this as a wish for one more type of search engine (or to learn about it if it already exists).

Let's say I go to a new city and get lost somewhere. It would be cool if I could take a picture of a neighbouring building with my mobile, SMS it to some service, and then receive an SMS back telling me my location.

Two general-purpose search solutions, with slightly different target spaces, are Nutch (http://lucene.apache.org/nutch/), mainly for web-based documents, and Solr (http://lucene.apache.org/solr/), which is much more generic. Both are associated with the Lucene project (http://lucene.apache.org/).

Building either into an AMI shouldn't be too hard.

SearchBlox has a Linux server build that can be run on an existing AMI. It has a great Ajax-based interface, and the search engine is completely controlled through a web-based interface. It also has an API for inserting data. You can customize the search results or just use the XML output from the search engine. www.searchblox.com


, but I think you'd need to rewrite the index persistence in Lucene/OmniFind/SearchBlox to support Amazon S3. That would be non-trivial for the open-source search implementations and near impossible for anything else.

