Finding Anything: Real-time Search with IndexTank


Presentation given at East Bay Ruby meetup group on 4/19/2011 @ U.C. Berkeley


  1. 1. Finding anything: Real-time search with IndexTank <ul>Tim Spence April 19, 2011 </ul>
  2. 2. About the Presenter Tim Spence <ul><li>Senior Infrastructure Engineer at MedHelp ( )
  3. 3. Former .NET developer
  4. 4. Recently converted to Ruby
  5. 5. In love with Open Source Software
  6. 6. More at </li></ul>
  7. 7. Agenda <ul><li>State of search today
  8. 8. Quick survey: how much time/effort did YOU spend implementing search on your webapp?
  9. 9. Examples of services that need improved search
  10. 10. IndexTank to the rescue
  11. 11. Case study: </li></ul>
  12. 12. Agenda, continued <ul><li>How I found out about IndexTank
  13. 13. Two apps I built with IndexTank
  14. 14. Live Demo </li></ul>
  15. 16. The State of Search Today <ul><li>Not well implemented at all </li><ul><li>Search works, but...
  16. 17. Barely </li></ul><li>How many pages of results do you typically browse through before finding what you were looking for?
  17. 18. Or do you give up and head for Google site search instead? </li></ul>
  18. 19. Survey Time! <ul><li>How much time/effort did YOU spend implementing search on your webapp?
  19. 20. How many times have you iterated on your search feature?
  20. 21. When was the last time someone thanked you for building a powerful, reliable search feature for your webapp? </li></ul>
  21. 22. My Opinion <ul><li>Search as an in-app feature is an afterthought
  22. 23. Minimal implementation is the norm
  23. 24. If it weren't for MySQL/MS-SQL full-text indexing, most apps probably wouldn't even have a search feature
  24. 25. Most good web apps don't make it easy for users to find specific content outside of predetermined navigation </li></ul>
  25. 26. Let's pick on some apps! <ul><li>These are companies with great products, but their search comes up short
  26. 27. Don't worry–they can take it! </li></ul>
  27. 28. App #1: Github
  28. 29. App #1: Github
  29. 30. App #1: Github <ul><li>Interface is decent </li><ul><li>Search repos, code, users, or everything
  30. 31. Search by language </li></ul><li>However... </li><ul><li>Can't do much with results but browse
  31. 32. Check out this example </li></ul></ul>
  32. 33. App #1: Github
  33. 34. App #1: Github <ul><li>Why these results aren't so hot </li><ul><li>Can't search by most recently maintained
  34. 35. Can't search by most popular (most watched)
  35. 36. Are you ready to browse 1,297 results? </li></ul><li>Advanced search capabilities exist, but not the best interface </li><ul><li>recency/popularity implemented, but require specific arguments </li></ul></ul>
  36. 37. App #2: Amazon Web Services <ul><li>“Hey, I bet I can find an AMI from the community for the exact EC2 setup I need”
  37. 38. Fact: probably not </li></ul>
  38. 39. App #2: Amazon Web Services
  39. 40. App #2: Amazon Web Services <ul><li>Notice something missing? </li><ul><li>No search
  40. 41. Only sort by date, title </li></ul><li>Ready to browse 934 results? </li><ul><li>I'd rather build my own AMI </li></ul><li>Incredible missed opportunity </li><ul><li>OS search
  41. 42. Stack search
  42. 43. etc... </li></ul></ul>
  43. 44. Fact: Github & Amazon aren't the only ones <ul><li>Lots of good web services
  44. 45. Massive quantities of quality content
  45. 46. Unfortunately not discoverable in meaningful ways </li></ul>
  46. 47. Interlude: Sites with great search <ul><li>Foodspotting </li><ul><li>Proximity
  47. 48. Recency
  48. 49. Rating </li></ul><li>MedHelp </li><ul><li>Content category
  49. 50. Promoted content </li></ul><li>Other sites I overlooked? Whose search do you like? </li></ul>
  50. 51. What was the point of that last slide? <ul><li>Search can be useful if it is valued as a feature
  51. 52. Any company willing to invest in the resources can build and host a high quality search engine
  52. 53. However, must you roll your own? </li></ul>
  53. 54. Enter Search as a Service <ul><li>No need for you to invest in additional infrastructure
  54. 55. No need to reinvent the wheel </li><ul><li>Search is a solved problem
  55. 56. Let the experts refine it </li></ul></ul>
  56. 57. IndexTank to the rescue! <ul><li>Hosted–no load on your infrastructure
  57. 58. Powerful </li><ul><li>We'll get into the details next </li></ul><li>Always Improving </li><ul><li>Search IS their product </li></ul><li>Freemium
  58. 59. Easy to implement </li></ul>
  59. 60. Let's talk features <ul><li>Real-time search </li><ul><li>Real-time indexing–results immediately available </li></ul><li>Custom scoring
  60. 61. Autocomplete
  61. 62. Faceting
  62. 63. Geo search
  63. 64. Advanced text search </li></ul>
  64. 65. <ul><li>Real-time search </li></ul><ul><li>Real-time indexing </li><ul><li>results immediately available </li></ul><li>Index multiple docs/sec
  65. 66. Overwrite existing docs as you wish </li><ul><li>Changes also immediately available </li></ul></ul>
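To make the "immediately available" idea concrete, here is a minimal sketch of a toy in-memory inverted index in Ruby. It is not the IndexTank API or its implementation; the `TinyIndex` class and its method names are hypothetical, chosen only to show that an added or overwritten document can be searchable as soon as the write returns.

```ruby
require 'set'

# Toy in-memory inverted index for illustration only; NOT the IndexTank API.
class TinyIndex
  def initialize
    @docs = {}                                       # doc_id => text
    @postings = Hash.new { |h, k| h[k] = Set.new }   # term => set of doc_ids
  end

  # Add a document, or overwrite it if the id already exists.
  def add(doc_id, text)
    remove(doc_id) if @docs.key?(doc_id)
    @docs[doc_id] = text
    tokenize(text).each { |term| @postings[term] << doc_id }
  end

  # Return the ids of documents that contain every term in the query.
  def search(query)
    terms = tokenize(query)
    return [] if terms.empty?
    terms.map { |t| @postings.fetch(t, Set.new) }.reduce(:&).to_a.sort
  end

  private

  # Drop a document and all of its postings.
  def remove(doc_id)
    tokenize(@docs.delete(doc_id)).each { |term| @postings[term].delete(doc_id) }
  end

  # Lowercase the text and split it into unique alphanumeric terms.
  def tokenize(text)
    text.downcase.scan(/[a-z0-9]+/).uniq
  end
end

index = TinyIndex.new
index.add('post1', 'Real-time search with Ruby')
first  = index.search('ruby search')   # searchable right after the add
index.add('post1', 'Now about Rails')  # overwrite the same document id
second = index.search('ruby')          # the old terms no longer match
third  = index.search('rails')         # the new terms match immediately
```

A hosted service has to do the same thing across a network and at scale, which is exactly the part you would rather not build yourself.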
  66. 67. Custom Scoring <ul><li>Implementer has full control over how results are returned
  67. 68. Choose which fields are searched
  68. 69. Use pre-written scoring functions
  69. 70. Or write your own </li></ul>
  70. 71. Custom Scoring
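The idea behind custom scoring can be sketched in plain Ruby: a scoring function is just a formula over per-document values, and swapping the formula changes the order of the results. The record fields (`relevance`, `watchers`, `age_days`) and both formulas below are assumptions made up for this illustration; they are not IndexTank's scoring syntax.

```ruby
# Hypothetical result records; the field names are invented for illustration.
Result = Struct.new(:name, :relevance, :watchers, :age_days)

# Two example scoring formulas (higher score ranks first).
BY_RECENCY    = ->(r) { -r.age_days }
BY_POPULARITY = ->(r) { r.relevance * Math.log(1 + r.watchers) }

# Order results by descending score and return their names.
def rank(results, scorer)
  results.sort_by { |r| -scorer.call(r) }.map(&:name)
end

results = [
  Result.new('old-but-popular', 1.0, 5000, 900),
  Result.new('fresh-fork',      1.0,   10,   2),
  Result.new('weak-match',      0.2,  800,  30)
]

by_recency    = rank(results, BY_RECENCY)
by_popularity = rank(results, BY_POPULARITY)
```

The same three matches come back in a different order depending on which formula is applied, which is the control the GitHub example earlier was missing.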
  71. 72. Everyone loves autocomplete <ul><li>Saves users time
  72. 73. Potentially avoids spelling errors </li><ul><li>Not for hunters/peckers </li></ul><li>Adds a degree of intelligence to the search process </li></ul>
  73. 74. Faceting <ul><li>Does it make sense for you to categorize documents in your index? </li><ul><li>In all cases, YES </li></ul><li>Consider your advanced users and the narrow results they seek </li><ul><li>Don't make anyone sift through irrelevant results </li></ul></ul>
  74. 75. Faceting
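As a rough sketch of what faceting does, the Ruby below counts how many documents fall under each value of a facet and then narrows the set to a chosen value. The attendee records and the `facet_counts` and `narrow` helpers are hypothetical; a hosted search engine would compute these counts server-side over the matching documents.

```ruby
# Hypothetical attendee records; the data is invented for illustration.
attendees = [
  { name: 'Ann',  company: 'Acme',    city: 'Austin' },
  { name: 'Bob',  company: 'Acme',    city: 'Berkeley' },
  { name: 'Cara', company: 'Initech', city: 'Austin' }
]

# For each facet, count how many documents carry each value.
def facet_counts(docs, facets)
  facets.each_with_object({}) do |facet, counts|
    counts[facet] = docs.each_with_object(Hash.new(0)) { |d, c| c[d[facet]] += 1 }
  end
end

# Keep only the documents that match every selected facet value.
def narrow(docs, selected)
  docs.select { |d| selected.all? { |facet, value| d[facet] == value } }
end

counts   = facet_counts(attendees, [:company, :city])
filtered = narrow(attendees, company: 'Acme', city: 'Austin')
```

The counts are what a UI would show next to each filter link, and `narrow` is what happens when an advanced user clicks one.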
  75. 76. Geo <ul><li>It's 2011 </li><ul><li>Location is more relevant than ever before
  76. 77. Mobile is skyrocketing–every client has a GPS </li></ul><li>IndexTank has built-in geo proximity search capability </li></ul>
  77. 78. Geo
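Proximity search boils down to ranking documents by distance from the user. A minimal sketch, assuming each document carries a latitude and longitude, uses the haversine great-circle formula; the helper names and the sample coordinates are assumptions for illustration, not IndexTank's implementation.

```ruby
EARTH_RADIUS_KM = 6371.0

# Great-circle distance between two points given in degrees (haversine formula).
def haversine_km(lat1, lon1, lat2, lon2)
  to_rad = Math::PI / 180
  dlat = (lat2 - lat1) * to_rad
  dlon = (lon2 - lon1) * to_rad
  a = Math.sin(dlat / 2)**2 +
      Math.cos(lat1 * to_rad) * Math.cos(lat2 * to_rad) * Math.sin(dlon / 2)**2
  2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a))
end

# Sort documents from nearest to farthest relative to the given point.
def nearest(docs, lat, lon)
  docs.sort_by { |d| haversine_km(lat, lon, d[:lat], d[:lon]) }
end

# Hypothetical documents with approximate city coordinates.
places = [
  { name: 'San Jose',      lat: 37.3382, lon: -121.8863 },
  { name: 'San Francisco', lat: 37.7749, lon: -122.4194 },
  { name: 'Oakland',       lat: 37.8044, lon: -122.2712 }
]

# Rank the places by distance from Berkeley.
berkeley = { lat: 37.8716, lon: -122.2727 }
ordered  = nearest(places, berkeley[:lat], berkeley[:lon]).map { |d| d[:name] }
```

In practice distance is usually one term inside a scoring formula, combined with text relevance, rather than the only sort key.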
  78. 79. Advanced Text Search (Beta) <ul><li>Fuzzy search (Did you mean...?)
  79. 80. Stemming </li><ul><li>Alternate word forms (tense, possession, etc...) </li></ul><li>Alternate spellings </li><ul><li>Misspellings </li></ul></ul>
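One common way to build a "Did you mean...?" suggestion is to pick the known term with the smallest edit distance to what the user typed. The sketch below uses plain Levenshtein distance; it is an assumption about how such a feature could work, not a description of IndexTank's beta implementation, and the `did_you_mean` helper and its threshold are hypothetical.

```ruby
# Levenshtein distance: minimum number of single-character insertions,
# deletions, or substitutions needed to turn a into b.
def edit_distance(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Suggest the closest vocabulary term, or nil if nothing is close enough.
def did_you_mean(word, vocabulary, max_distance = 2)
  best = vocabulary.min_by { |term| edit_distance(word, term) }
  best if best && edit_distance(word, best) <= max_distance
end

vocabulary = %w[search ruby rails]
suggestion = did_you_mean('serach', vocabulary)
no_match   = did_you_mean('zzzzzz', vocabulary)
```

Scanning the whole vocabulary like this is fine for a demo; a real engine would need an index structure to keep the lookup fast.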
  80. 81. Other Benefits <ul><li>Zero maintenance
  81. 82. Scalability included for free
  82. 83. Easy implementation </li><ul><li>Clients available in many languages
  83. 84. Excellent documentation–Let's check it out </li></ul><li>Excellent support </li><ul><li>Humans or bots? You decide </li></ul><li>Dog food: their site search is done well </li></ul>
  84. 86. Case Study: <ul><li>High traffic news aggregator (over 1 billion page views per month) with tons of content
  85. 87. Who remembers how bad reddit's search was? </li><ul><li>When it even worked </li></ul><li>Can't blame them for trying </li><ul><li>Many attempts, but none worked </li></ul><li>IndexTank excelled in all areas
  86. 88. Let's check it out now </li></ul>
  87. 89. My experience with IndexTank <ul><li>Discovered through Heroku/IndexTank contest
  88. 90. Built my first real-world Rails app in an afternoon/evening with fellow hacker Chris Saylor (@cwsaylor)
  89. 91. Didn't win the contest but learned how easy it is to quickly create highly targeted search </li></ul>
  90. 92. App #1: Toxosis <ul><li>Searchable database of toxic release data supplied by U.S. E.P.A.
  91. 93. Hosted at
  92. 94. Search enabled on many fields including city/state/zip, toxin
  93. 95. Additional fields can be added to index </li><ul><li>When I have time, of course... </li></ul></ul>
  94. 96. More personal backstory <ul><li>Still in the business of reinventing myself as a Rails developer
  95. 97. How to get a Rails gig? Develop multiple Rails apps and show them off
  96. 98. Opportunities are everywhere–contests, hackathons, and weekend hacks for developer community </li></ul>
  97. 99. App #2: SXSWdex <ul><li>Searchable database of 2011 SXSW attendees
  98. 100. Hosted at
  99. 101. Design goal: do a better job than SXSW official site
  100. 102. Search within bio, company, location, name
  101. 103. Facets: company, city/state </li></ul>
  102. 104. The moment we've all been waiting for <ul><li>Let's build an app! </li></ul>
  103. 105. Questions? <ul><li>Q&A time with an IndexTank engineer </li></ul>