Finding anything: Real-time
  search with IndexTank



         Tim Spence
        April 19, 2011
About the Presenter
Tim Spence
●   Senior Infrastructure Engineer at MedHelp
    ( http://www.medhelp.org/ )
●   Former .NET developer
●   Recently converted to Ruby
●   In love with Open Source Software
●   More at http://whyhello.im/tim
Agenda
●   State of search today
●   Quick survey: how much time/effort did
    YOU spend implementing search on your
    webapp?
●   Examples of services that need improved
    search
●   IndexTank to the rescue
●   Case study: reddit.com
Agenda, continued
●   How I found out about IndexTank
●   Two apps I built with IndexTank
●   Live Demo
The State of Search Today
●   Not well implemented at all
        –   Search works, but...
        –   Barely
●   How many pages of results do you typically
    browse through before finding what you
    were looking for?
●   Or do you give up and head for Google site
    search instead?
Survey Time!
●   How much time/effort did YOU spend
    implementing search on your webapp?
●   How many times have you iterated on your
    search feature?
●   When was the last time someone thanked
    you for building a powerful, reliable search
    feature for your webapp?
My Opinion
●   Search as an in-app feature is an
    afterthought
●   Minimal implementation is the norm
●   If it weren't for MySQL/MS-SQL full text
    indexing, most apps probably wouldn't
    even have a search feature
●   Most good web apps don't make it easy for
    users to find specific content outside of
    predetermined navigation
Let's pick on some apps!
●   These are companies with great products,
    but their search comes up short
●   Don't worry–they can take it!
App #1: GitHub
●   Interface is decent
        –   Search repos, code, users, or everything
        –   Search by language
●   However...
        –   Can't do much with results but browse
        –   Check out this example
App #1: GitHub
●   Why these results aren't so hot
        –   Can't search by most recently maintained
        –   Can't search by most popular (most
             watched)
        –   Are you ready to browse 1,297 results?
●   Advanced search capabilities exist, but the
    interface is not the best
        –   Sorting by recency/popularity is implemented,
              but requires specific arguments
App #2: Amazon Web Services
●   "Hey, I bet I can find an AMI from the
    community for the exact EC2 setup I need"
●   Fact: probably not
App #2: Amazon Web Services
●   Notice something missing?
       –   No search
       –   Only sort by date, title
●   Ready to browse 934 results?
       –   I'd rather build my own AMI
●   Incredible missed opportunity
       –   OS search
       –   Stack search
       –   etc...
Fact: GitHub & Amazon aren't the
            only ones
●   Lots of good web services
●   Massive quantities of quality content
●   Unfortunately not discoverable in
    meaningful ways
Interlude: Sites with great search
●   Foodspotting
       –   Proximity
       –   Recency
       –   Rating
●   MedHelp
       –   Content category
       –   Promoted content
●   Other sites I overlooked? Whose search
    do you like?
What was the point of that last
               slide?
●   Search can be useful if it is valued as a
    feature
●   Any company willing to invest in the
    resources can build and host a high quality
    search engine
●   However, must you roll your own?
Enter Search as a Service
●   No need for you to invest in additional
    infrastructure
●   No need to reinvent the wheel
        –   Search is a solved problem
        –   Let the experts refine it
IndexTank to the rescue!
●   Hosted–no load on your infrastructure
●   Powerful
       –   We'll get into the details next
●   Always Improving
       –   Search IS their product
●   Freemium
●   Easy to implement
Let's talk features
●   Real-time search
       –   Real-time indexing–results immediately
            available
●   Custom scoring
●   Autocomplete
●   Faceting
●   Geo search
●   Advanced text search
Real-time search
●   Real-time indexing
       –   results immediately available
●   Index multiple docs/sec
●   Overwrite existing docs as you wish
       –   Changes also immediately available
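
To make "immediately available" concrete, here is a minimal sketch of a toy in-memory index in Ruby. It is not IndexTank's client or implementation; the class name TinyIndex and its methods are made up for illustration. It only shows the behavior described above: a document is searchable as soon as it is added, and re-adding the same id overwrites the old version.

```ruby
# Toy in-memory index (hypothetical, not the IndexTank API).
class TinyIndex
  def initialize
    @docs = {}                                 # doc id => text
    @postings = Hash.new { |h, k| h[k] = [] }  # term => list of doc ids
  end

  # Add a document, or overwrite it if the id already exists.
  def add(id, text)
    remove(id) if @docs.key?(id)
    @docs[id] = text
    tokenize(text).uniq.each { |term| @postings[term] << id }
  end

  # Return the ids of documents that contain every term in the query.
  def search(query)
    terms = tokenize(query)
    return [] if terms.empty?
    terms.map { |t| @postings.fetch(t, []) }.reduce(:&)
  end

  private

  # Drop the old version of a document from the postings lists.
  def remove(id)
    tokenize(@docs.delete(id)).uniq.each { |term| @postings[term].delete(id) }
  end

  def tokenize(text)
    text.downcase.scan(/[a-z0-9]+/)
  end
end

index = TinyIndex.new
index.add("post-1", "Real-time search with IndexTank")
p index.search("search")  # => ["post-1"], visible right after add
index.add("post-1", "Hosted indexing for Rails apps")
p index.search("search")  # => [], the overwritten text no longer matches
p index.search("rails")   # => ["post-1"]
```

A hosted service does the same bookkeeping on its side; the point is that no batch reindex step sits between a write and a search.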
Custom Scoring
●   Implementer has full control over how
    results are returned
●   Choose which fields are searched
●   Use pre-written scoring functions
●   Or write your own
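
As a rough illustration of what a scoring function does, the Ruby sketch below ranks the same documents two ways. Everything in it is an assumption for the example: the Doc fields, the formulas, and the scorer names are invented and are not IndexTank's scoring syntax.

```ruby
# Hypothetical documents: relevance stands in for a text-match score,
# votes and age_days are made-up per-document variables.
Doc = Struct.new(:id, :relevance, :votes, :age_days)

# Two illustrative scoring functions; a higher score ranks first.
SCORERS = {
  newest:  ->(d) { -d.age_days },
  popular: ->(d) { d.relevance * Math.log(d.votes + 1) }
}

# Order document ids by the chosen scoring function, best first.
def rank(docs, scorer)
  score = SCORERS.fetch(scorer)
  docs.sort_by { |d| -score.call(d) }.map(&:id)
end

docs = [
  Doc.new("a", 1.0, 10, 30),
  Doc.new("b", 0.5, 500, 2),
  Doc.new("c", 0.9, 0, 1)
]
p rank(docs, :newest)   # => ["c", "b", "a"]
p rank(docs, :popular)  # => ["b", "a", "c"]
```

Swapping the scorer changes the result order without touching the indexed documents, which is the control the slide describes.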
Everyone loves autocomplete
●   Saves users time
●   Potentially avoids spelling errors
        –   Less helpful for hunt-and-peck typists
●   Adds a degree of intelligence to the search
    process
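
One simple way to picture autocomplete is prefix matching over the indexed terms. The sketch below is a self-contained Ruby illustration under that assumption; the Autocomplete class is hypothetical and says nothing about how IndexTank builds its suggestions.

```ruby
# Suggest indexed terms that start with what the user has typed so far.
class Autocomplete
  def initialize(terms)
    @terms = terms.map(&:downcase).uniq.sort
  end

  def suggest(prefix, limit = 5)
    prefix = prefix.downcase
    return [] if prefix.empty?
    # Binary search for the first term >= prefix, then read forward
    # while terms still share the prefix.
    start = @terms.bsearch_index { |t| t >= prefix } || @terms.size
    @terms.drop(start).take_while { |t| t.start_with?(prefix) }.first(limit)
  end
end

ac = Autocomplete.new(%w[Ruby Rails Rack reddit Redis search])
p ac.suggest("re")  # => ["reddit", "redis"]
p ac.suggest("ra")  # => ["rack", "rails"]
```

Because suggestions come only from terms that exist in the index, picking one cannot produce an empty result page.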
Faceting
●   Does it make sense for you to categorize
    documents in your index?
       –   In all cases, YES
●   Consider your advanced users and the
    narrow results they seek
       –   Don't make anyone sift through irrelevant
            results
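
Faceting boils down to two operations: counting documents per category value, and narrowing the result set when a user picks a value. The Ruby sketch below shows both on made-up attendee records; the data, field names, and helper names are assumptions for illustration only.

```ruby
# Hypothetical attendee records with two category fields.
ATTENDEES = [
  { name: "Ann", company: "Acme",    city: "Austin" },
  { name: "Bob", company: "Acme",    city: "Oakland" },
  { name: "Cy",  company: "Initech", city: "Austin" },
  { name: "Di",  company: "Acme",    city: "Austin" }
]

# Count how many documents fall under each value of each facet field.
def facet_counts(docs, fields)
  fields.each_with_object({}) do |field, counts|
    counts[field] = docs.each_with_object(Hash.new(0)) { |d, c| c[d[field]] += 1 }
  end
end

# Keep only the documents that match every selected facet value.
def narrow(docs, selected)
  docs.select { |d| selected.all? { |field, value| d[field] == value } }
end

p facet_counts(ATTENDEES, [:company, :city])
# => {:company=>{"Acme"=>3, "Initech"=>1}, :city=>{"Austin"=>3, "Oakland"=>1}}
#    (Ruby 3.4+ prints the same hash as {company: {...}, city: {...}})
p narrow(ATTENDEES, { company: "Acme", city: "Austin" }).map { |d| d[:name] }
# => ["Ann", "Di"]
```

The counts are what a UI shows next to each filter link; the narrowing step is what runs when a link is clicked.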
Geo
●   It's 2011
        –   Location is more relevant than ever before
        –   Mobile is skyrocketing–every client has a
             GPS
●   IndexTank has built-in geo proximity
    search capability
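
Proximity search rests on a distance calculation between two coordinates. The Ruby sketch below uses the haversine formula and a radius filter; it is a standalone illustration, the place list and helper names are invented, and it does not claim to be how IndexTank computes distance.

```ruby
EARTH_RADIUS_KM = 6371.0

# Great-circle distance between two lat/lng points, in kilometers.
def haversine_km(lat1, lng1, lat2, lng2)
  to_rad = ->(deg) { deg * Math::PI / 180 }
  dlat = to_rad.(lat2 - lat1)
  dlng = to_rad.(lng2 - lng1)
  a = Math.sin(dlat / 2)**2 +
      Math.cos(to_rad.(lat1)) * Math.cos(to_rad.(lat2)) * Math.sin(dlng / 2)**2
  2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a))
end

# Names of places within radius_km of the origin, nearest first.
def nearby(places, lat, lng, radius_km)
  places
    .map { |place| [place, haversine_km(lat, lng, place[:lat], place[:lng])] }
    .select { |_, km| km <= radius_km }
    .sort_by { |_, km| km }
    .map { |place, _| place[:name] }
end

# Approximate city coordinates, used only as sample data.
PLACES = [
  { name: "San Francisco", lat: 37.7749, lng: -122.4194 },
  { name: "Austin",        lat: 30.2672, lng: -97.7431 },
  { name: "Oakland",       lat: 37.8044, lng: -122.2712 }
]

# Search within 25 km of Berkeley.
p nearby(PLACES, 37.8715, -122.2730, 25)  # => ["Oakland", "San Francisco"]
```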
Advanced Text Search (Beta)
●   Fuzzy search (Did you mean...?)
●   Stemming
        –   Alternate word forms (tense, possession,
              etc...)
●   Alternate spellings
        –   Misspellings
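
A common way to build a "Did you mean...?" prompt is edit distance: suggest the known term that needs the fewest single-character changes to match the query. The Ruby sketch below shows that technique with Levenshtein distance. It is an assumption about one possible approach, not a description of IndexTank's beta feature, and the helper names are hypothetical.

```ruby
# Levenshtein edit distance, computed with a rolling row of the DP table.
def edit_distance(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Suggest the closest known term when the query is not an exact match.
# Returns nil when the query is already known or nothing is close enough.
def did_you_mean(query, vocabulary, max_distance = 2)
  return nil if vocabulary.include?(query)
  best = vocabulary.min_by { |term| edit_distance(query, term) }
  best if best && edit_distance(query, best) <= max_distance
end

vocabulary = %w[search index facet]
p did_you_mean("serch", vocabulary)   # => "search"
p did_you_mean("search", vocabulary)  # => nil, exact match needs no hint
```

Stemming is a separate step (reducing "searching" and "searched" to a common root before indexing) and is not covered by this sketch.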
Other Benefits
●   Zero maintenance
●   Scalability included for free
●   Easy implementation
        –   Clients available in many languages
        –   Excellent documentation–Let's check it out
●   Excellent support
        –   Humans or bots? You decide
●   Dog food: their site search is done well
Case Study: reddit.com
●   High traffic news aggregator (over 1 billion
    page views/month) with tons of content
●   Who remembers how bad reddit's search
    was?
        –   When it even worked
●   Can't blame them for trying
        –   Many attempts, but none worked
●   IndexTank excelled in all areas
●   Let's check it out now
My experience with IndexTank
●   Discovered through Heroku/IndexTank
    contest
●   Built my first irl Rails app in an
    afternoon/evening w/ fellow hacker Chris
    Saylor (@cwsaylor)
●   Didn't win the contest but learned how
    easy it is to quickly create highly targeted
    search
App #1: Toxosis
●   Searchable database of toxic release data
    supplied by U.S. E.P.A.
●   Hosted at http://toxosis.heroku.com/
●   Search enabled on many fields including
    city/state/zip, toxin
●   Additional fields can be added to index
        –   When I have time, of course...
More personal backstory
●   Still in the business of reinventing myself
    as a Rails developer
●   How to get a Rails gig? Develop multiple
    Rails apps and show them off
●   Opportunities are everywhere–contests,
    hackathons, and weekend hacks for the
    developer community
App #2: SXSWdex
●   Searchable database of 2011 SXSW
    attendees
●   Hosted at http://sxswdex.heroku.com/
●   Design goal: do a better job than SXSW
    official site
●   Search within bio, company, location,
    name
●   Facets: company, city/state
The moment we've all been
            waiting for
●   Let's build an app!
Questions?
●   Q&A time with an IndexTank engineer

Indextank east bay ruby meetup slides
