Solr Powr — Enterprise-grade search for your app

An overview of Apache Solr and a brief introduction to the Sunspot client.

Published in: Technology

    1. Solr Powr. Enterprise-grade search for your app. Nick Zadrozny
    2. Hi, my name is Nick. I’m a webdev, full-time with Rails since 2005. Generalist background. Perspective of a relative Solr noob.
    3. Websolr: Brought my generalist perspective to Websolr about six months ago. We do hosted search. I enjoy doing things the Right Way.
    4. What is Solr? How can we make the most of it?
    5. Indexing: Take some text. Make a list of the words and where they show up. Of course, being geeks, we throw a lot of features into that.
    6. Apache Lucene: Java search library that does indexing. You give it some words, it builds those indexes. Most of what we will talk about is actually Lucene.
    7. What is Solr? Web application interface for Lucene. Essentially RESTful: POST in data, GET with queries. Various administrative features. Various web scaling features.
    8. Just so you know, I’m going to be blurring Solr and Lucene from here on out. Still with me?
    9. Schema: Do smarter things with a little bit of structure.
    10. Field types: binary, boolean, byte, date, double, external file, float, geohash, int, integer, long, short, string, text, trie.
    11. Text: Most of the interesting stuff happens here.
    12. Unique key: Not required, but handy for adding and updating records, doing statistics, correlating with your SQL database, etc.
    13. Text: Tokenize words, strip HTML, normalize case, normalize accented characters, pattern replacement, stop words, language stemming, phonetic stemming, synonyms, word shingles. Notes: Tokenize on whitespace or non-letter chars; the standard tokenizer is sort of “type aware” and understands acronyms, URLs, and words with apostrophes. So-called stop words are dropped, since we’re not doing actual semantic language search. Shingles are consecutive n-sized word groups: “the quick”, “quick brown”, “brown fox”, “fox jumped”.
    14. Index rich content: HTML, PDF, Word, etc.
    15. Add and Update: Serialize your documents to XML, JSON and a handful of others. HTTP POST to your Solr URL. Solr hands your data to Lucene for processing. Updates are incremental.
    16. Querying
    17. Powerful query syntax. Boolean logic is just the start.
    18. Numeric operations: min, max, average, stddev.
    19. Date ranges, date math: Do stuff relative to “now”.
    20. Distance: Yeah, one killer feature here is that Solr supports spatial search. Give it a lat/lon.
    21. Faceting: Present the available values so your users can filter by them. Great for building out rich taxonomies. Example: facet books by language, author, genre.
    22. “Did you mean…?”: Spelling suggestions for user queries. Query auto-suggest from popular queries.
    23. More Like This: Generate a list of similar documents. Consider blog posts.
    24. Probably more.
    25. Solr in Production
    26. This is why we run Solr. It’s really, really fast. When properly configured.
    27. Average max response time is 75ms. Even the 95th percentile is way below that.
    28. Commits: Updates are incremental to keep things running fast. For performance reasons, they don’t show up in search results until you issue a commit. Commits are sorta heavy: 200ms to 2 sec.
    29. Lock the writer. Flush updates to disk. Start a new reader. Warm up the reader’s cache. Register the reader with Solr. Tear down the old reader. Unlock the writer. Most of the time you don’t have to worry about this, but it’s easy to screw this up if you flood the system with updates and commits.
    30. Optimize: As you’re committing changes, you’re usually creating new files in “segments”. Optimize takes your index and rewrites it into a smaller, more compact set of files. Good to do this periodically to use less memory and avoid running out of open files.
    31. Replication. Stupidly easy. Actual replication is pull-based from the slave and really fast. Like, don’t worry. Best way to deal with high IO. Reads go to read cores, writes go to write cores. Scale read resources separately. Make sure writes don’t interrupt reads.
    32. Caching: All I’ll say is that it’s really powerful and gives you a lot of rope. I’ve seen cache warmups take down Tomcat, in particular on a very large index with spatial search.
    33. In Conclusion: I’m a Rails generalist. I like to do things the right way. Solr is fast, fully-featured, and can be scaled separately from the rest of your app. It takes the load off your database and app servers, and does a better job. In some cases, it offers features that just aren’t otherwise even possible.
    34. Questions?
    35. Thanks!
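
The indexing idea from slides 5 and 13 (take some text, list the words and where they show up, then layer on analysis steps such as case normalization, stop words, and shingles) can be sketched in a few lines of Python. This is only a toy illustration and not how Lucene is implemented; the function names, the stop-word list, and the sample documents are all made up for the example.

```python
import re
from collections import defaultdict

# Illustrative stop-word list (an assumption); real analyzers ship longer ones.
STOP_WORDS = {"the", "a", "an", "of", "and", "at"}


def tokenize(text):
    """Normalize case, then split on runs of non-letter characters."""
    return [t for t in re.split(r"[^a-z]+", text.lower()) if t]


def shingles(tokens, size=2):
    """Return consecutive word groups of the given size."""
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


def build_index(docs):
    """Map each non-stop word to {doc_id: [positions where it shows up]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, token in enumerate(tokenize(text)):
            if token in STOP_WORDS:
                continue  # stop words are skipped, but positions keep counting
            index[token].setdefault(doc_id, []).append(pos)
    return index


# Hypothetical sample documents.
docs = {
    1: "The quick brown fox jumped",
    2: "A quick look at the brown bear",
}
index = build_index(docs)
print(index["brown"])                 # which documents contain "brown", and where
print(shingles(tokenize(docs[1])))    # two-word shingles for document 1
```

Looking up a word is then a dictionary access, which is the basic reason an index answers queries faster than scanning every document.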

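
To make "POST in data, GET with queries" (slides 7, 15, 19, 21 and 28) a little more concrete, here is a rough sketch that only builds the requests and does not send them. The base URL, the `books` core, and the field names are assumptions for illustration. `q`, `fq`, `wt`, `facet`, `facet.field`, and `commit` are standard Solr request parameters, and `NOW-1YEAR` is Solr date math. Posting JSON to `/update` assumes a reasonably recent Solr; older releases used a separate JSON update path.

```python
import json
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical Solr location and core name; adjust for your own install.
SOLR_URL = "http://localhost:8983/solr/books"


def update_request(docs, commit=False):
    """Return (url, body) for an HTTP POST that adds or updates documents."""
    url = SOLR_URL + "/update"
    if commit:
        # Commits are relatively heavy, so they are opt-in here.
        url += "?" + urlencode({"commit": "true"})
    return url, json.dumps(docs)


def select_url(query, facet_fields=(), filters=()):
    """Return a GET URL for a query with optional filters and facets."""
    params = [("q", query), ("wt", "json")]
    params += [("fq", f) for f in filters]
    if facet_fields:
        params.append(("facet", "true"))
        params += [("facet.field", f) for f in facet_fields]
    return SOLR_URL + "/select?" + urlencode(params)


# Hypothetical document and field names.
url, body = update_request(
    [{"id": "1", "title": "Solr Powr", "language": "en"}], commit=True
)
query = select_url(
    "title:solr",
    facet_fields=["language", "author"],
    filters=["published_at:[NOW-1YEAR TO NOW]"],  # date math relative to "now"
)
params = parse_qs(urlsplit(query).query)  # decode the URL back into parameters
print(url)
print(params["facet.field"])
```

Keeping `commit` off by default reflects the advice on the commit slides: send updates in batches and commit occasionally, instead of committing after every document.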