Finding anything: Real-time search with IndexTank Tim Spence April 19, 2011
About the Presenter: Tim Spence● Senior Infrastructure Engineer at MedHelp ( http://www.medhelp.org/ )● Former .NET developer● Recently converted to Ruby● In love with Open Source Software● More at http://whyhello.im/tim
Agenda● State of search today● Quick survey: how much time/effort did YOU spend implementing search on your webapp?● Examples of services that need improved search● IndexTank to the rescue● Case study: reddit.com
Agenda, continued● How I found out about IndexTank● Two apps I built with IndexTank● Live Demo
The State of Search Today● Not well implemented at all – Search works, but... – Barely● How many pages of results do you typically browse through before finding what you were looking for?● Or do you give up and head for Google site search instead?
Survey Time!● How much time/effort did YOU spend implementing search on your webapp?● How many times have you iterated on your search feature?● When was the last time someone thanked you for building a powerful, reliable search feature for your webapp?
My Opinion● Search as an in-app feature is an afterthought● Minimal implementation is the norm● If it weren't for MySQL/MS-SQL full-text indexing, most apps probably wouldn't even have a search feature● Most good web apps don't make it easy for users to find specific content outside of predetermined navigation
Let's pick on some apps!● These are companies with great products, but their search comes up short● Don't worry, they can take it!
App #1: GitHub● Why these results aren't so hot – Can't search by most recently maintained – Can't search by most popular (most watched) – Are you ready to browse 1,297 results?● Advanced search capabilities exist, but the interface is not the best – Recency/popularity sorting is implemented, but requires specific arguments
App #2: Amazon Web Services● "Hey, I bet I can find an AMI from the community for the exact EC2 setup I need"● Fact: probably not
App #2: Amazon Web Services● Notice something missing? – No search – Only sort by date, title● Ready to browse 934 results? – I'd rather build my own AMI● Incredible missed opportunity – OS search – Stack search – etc...
Fact: GitHub & Amazon aren't the only ones● Lots of good web services● Massive quantities of quality content● Unfortunately not discoverable in meaningful ways
Interlude: Sites with great search● Foodspotting – Proximity – Recency – Rating● MedHelp – Content category – Promoted content● Other sites I overlooked? Whose search do you like?
What was the point of that last slide?● Search can be useful if it is valued as a feature● Any company willing to invest in the resources can build and host a high quality search engine● However, must you roll your own?
Enter Search as a Service● No need for you to invest in additional infrastructure● No need to reinvent the wheel – Search is a solved problem – Let the experts refine it
IndexTank to the rescue!● Hosted: no load on your infrastructure● Powerful – We'll get into the details next● Always improving – Search IS their product● Freemium● Easy to implement
Everyone loves autocomplete● Saves users time● Potentially avoids spelling errors – Not for hunters/peckers● Adds a degree of intelligence to the search process
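To make the autocomplete idea concrete, here is a minimal in-memory sketch of prefix completion. It is not IndexTank's API or implementation; the helper names `build_index` and `autocomplete` and the sample terms are assumptions made up for illustration. It only shows the basic technique: keep terms sorted, binary-search to the prefix, and read off the matches.

```python
from bisect import bisect_left


def build_index(terms):
    """Return the terms lowercased, de-duplicated, and sorted for prefix lookup."""
    return sorted({term.lower() for term in terms})


def autocomplete(index, prefix, limit=5):
    """Return up to `limit` indexed terms that start with `prefix`."""
    prefix = prefix.lower()
    if not prefix:
        return []
    # Binary search jumps to the first term that is >= the prefix.
    start = bisect_left(index, prefix)
    matches = []
    for term in index[start:]:
        if not term.startswith(prefix):
            break  # sorted order: no later term can match either
        matches.append(term)
        if len(matches) == limit:
            break
    return matches


# Hypothetical sample terms, not data from any real index.
index = build_index(["Ruby", "Rails", "reddit", "Redis", "search", "Sinatra"])
print(autocomplete(index, "re"))  # ['reddit', 'redis']
```

A hosted service would typically rank suggestions by popularity as well; this sketch returns them in alphabetical order only.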
Faceting● Does it make sense for you to categorize documents in your index? – In all cases, YES● Consider your advanced users and the narrow results they seek – Don't make anyone sift through irrelevant results
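As a rough illustration of what faceting means, the sketch below counts how many documents fall into each category and then narrows a result set by the selected categories. It is plain Python over a list of dicts, not IndexTank's API; the function names and the sample records are hypothetical.

```python
from collections import Counter

# Hypothetical documents; each category field acts as a facet.
DOCS = [
    {"name": "Ann", "company": "Acme", "city": "Austin"},
    {"name": "Bob", "company": "Acme", "city": "Boston"},
    {"name": "Cat", "company": "Initech", "city": "Austin"},
    {"name": "Dan", "company": "Initech", "city": "Austin"},
]


def facet_counts(docs, field):
    """Count how many documents carry each value of `field`."""
    return Counter(doc[field] for doc in docs if field in doc)


def filter_by_facets(docs, **selected):
    """Keep only documents whose fields equal every selected facet value."""
    return [
        doc for doc in docs
        if all(doc.get(field) == value for field, value in selected.items())
    ]


print(facet_counts(DOCS, "city"))  # Counter({'Austin': 3, 'Boston': 1})
narrowed = filter_by_facets(DOCS, company="Initech", city="Austin")
print([doc["name"] for doc in narrowed])  # ['Cat', 'Dan']
```

The counts are what a search UI shows next to each category link; the filter is what runs when a user clicks one.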
Advanced Text Search (Beta)● Fuzzy search (Did you mean...?)● Stemming – Alternate word forms (tense, possession, etc...)● Alternate spellings – Misspellings
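The two ideas on this slide can be sketched with the Python standard library. This is not how IndexTank implements them: `did_you_mean` uses `difflib.get_close_matches` as a stand-in for fuzzy matching, and `crude_stem` is a deliberately naive suffix stripper rather than a real stemming algorithm. Both function names and the vocabulary are assumptions for illustration.

```python
from difflib import get_close_matches

# Hypothetical vocabulary of terms already present in an index.
VOCAB = ["reddit", "redis", "rails", "search"]


def did_you_mean(word, vocab, cutoff=0.6):
    """Suggest the closest known term, or None if the word is known or nothing is close."""
    word = word.lower()
    if word in vocab:
        return None
    matches = get_close_matches(word, vocab, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def crude_stem(word):
    """Strip a few common English suffixes so word forms share one key (very naive)."""
    word = word.lower()
    for suffix in ("ing", "es", "ed", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


print(did_you_mean("redit", VOCAB))  # reddit
print({crude_stem(w) for w in ["search", "searches", "searching", "searched"]})  # {'search'}
```

Indexing and querying on the stemmed form is what lets a query for "searching" match a document that only says "searched".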
Other Benefits● Zero maintenance● Scalability included for free● Easy implementation – Clients available in many languages – Excellent documentation (let's check it out)● Excellent support – Humans or bots? You decide● Dog food: their site search is done well
Case Study: reddit.com● High-traffic news aggregator (> 1 billion page views/month) with tons of content● Who remembers how bad reddit's search was? – When it even worked● Can't blame them for trying – Many attempts, but none worked● IndexTank excelled in all areas● Let's check it out now
My experience with IndexTank● Discovered through Heroku/IndexTank contest● Built my first IRL Rails app in an afternoon/evening with fellow hacker Chris Saylor (@cwsaylor)● Didn't win the contest, but learned how easy it is to quickly create highly targeted search
App #1: Toxosis● Searchable database of toxic release data supplied by U.S. E.P.A.● Hosted at http://toxosis.heroku.com/● Search enabled on many fields including city/state/zip, toxin● Additional fields can be added to index – When I have time, of course...
More personal backstory● Still in the business of reinventing myself as a Rails developer● How to get a Rails gig? Develop multiple Rails apps and show them off● Opportunities are everywhere: contests, hackathons, and weekend hacks for the developer community
App #2: SXSWdex● Searchable database of 2011 SXSW attendees● Hosted at http://sxswdex.heroku.com/● Design goal: do a better job than SXSW official site● Search within bio, company, location, name● Facets: company, city/state
The moment we've all been waiting for● Let's build an app!
Questions?● Q&A time with an IndexTank engineer