Open Source Search Evolution

From Gopher, WAIS, and Harvest to Lucene, Solr, SolrCloud, and Elasticsearch.

Published in: Technology


  1. 1. [Open Source] Search Evolution Otis Gospodnetić @otisg
  2. 2. Today
  3. 3. The Early Days
  4. 4. Even Earlier Days
  5. 5. Foci [timeline: 1974, 1995, now()] SEARCH
  6. 6. Otis Who? SEARCH
  7. 7. Then & Now. 1990s: WebGlimpse, Swish, Harvest, Ht://Dig, freeWAIS. 2014: elasticsearch.
  8. 8. Still New? elasticsearch. [timeline: 2000, 2004, 2010]
  9. 9. Dominance [Open Source] Search Evolution
  10. 10. Big Cake: Big Data; Beyond Text; Memory Footprint; Distributed Model; Language Support; Indexing Speed, NRT; Relevance Algorithms
  11. 11. Language Support: Stemming
  12. 12. Language Support: Lemmatization
  13. 13. Language Support: Morphology
  14. 14. Language Support. Lucene 2004: ~20 languages. Lucene 2014: ~40 languages (most are stemmers).
  15. 15. Relevance Models: VSM, TF-IDF. For term i in document j: $w_{i,j} = tf_{i,j} \times \log(N / df_i)$, where $tf_{i,j}$ = number of occurrences of i in j, $df_i$ = number of documents containing i, and $N$ = total number of documents.
  16. 16. Relevance Models: Pluggable. Lucene until 2011: 1 relevance model. Lucene 2014: 6 relevance models. Got more?
  17. 17. Distributed Architecture. 1 Master - N Slaves: good for scaling queries, not good for scaling data. Sharded index with replication: good for scaling queries, good for scaling data.
  18. 18. Indexing Speed & NRT Search
  19. 19. Memory Footprint
  20. 20. Beyond Text: Geospatial Search, Classifier, Recommendation Engine, Key Value Store, NoSQL DB, Analytical DB
  21. 21. Geospatial Search
  22. 22. Classifier
  23. 23. Recommender: Content Similarity, Collaborative Filtering
  24. 24. Key Value Store: id123 ⇒ manu:Apple desc:foo bar price:$111; id234 ⇒ manu:Sony desc:baz bam price:$222
  25. 25. NoSQL DB: Distributed, Replicated, Horizontally Scalable, Fast Retrieval, Searchable?
  26. 26. Slicing & Dicing
  27. 27. Analytical Queries
  28. 28. Gobble Gobble If software is eating the world, then [open source] search is gobbling it. And has been for years.
  29. 29. FIN. Questions otis@sematext.com
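
The VSM TF-IDF weighting on slide 15 can be sketched in a few lines of Python. This is only an illustration of the formula w = tf x log(N/df) as written on the slide, not Lucene's actual scoring code; the function name `tf_idf`, the whitespace tokenizer, and the sample documents are assumptions made for the example.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document, using w = tf * log(N / df).

    Hypothetical helper for illustration; tokenization is a plain
    lowercase whitespace split, with no stemming or stop words.
    """
    n = len(docs)  # N: total number of documents
    tokenized = [doc.lower().split() for doc in docs]

    # df: number of documents containing each term (count each term once per doc)
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)  # tf: occurrences of each term in this document
        weights.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return weights


# Made-up sample documents
docs = ["open source search", "search engine", "open data"]
weights = tf_idf(docs)
```

A term that occurs in every document gets weight 0 under this formula, since log(N/N) = 0, while rarer terms are weighted higher.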
