Solr Powr — Enterprise-grade search for your app
 


An overview of Apache Solr and a brief introduction to the Sunspot client.

Presentation Transcript

  • Solr Powr: Enterprise-grade search for your app. Nick Zadrozny
  • Hi, my name is Nick. I’m a webdev — full-time w/ Rails since 2005. Generalist background. Perspective of a relative Solr noob.
  • Websolr: Brought my generalist perspective to Websolr about six months ago. We do hosted search. I enjoy doing things the Right Way.
  • What is Solr? How can we make the most of it?
  • Indexing: Take some text. Make a list of the words and where they show up. Of course, being geeks, we throw a lot of features into that.
  • Apache Lucene: Java search library that does indexing. You give it some words, it builds those indexes. Most of what we will talk about is actually Lucene.
  • What is Solr? Web application interface for Lucene. Essentially RESTful: POST in data, GET with queries. Various administrative features. Various web scaling features.
  • Just so you know, I’m going to be blurring Solr and Lucene from here on out. Still with me?
  • Schema: Do smarter things with a little bit of structure.
  • Field types: binary, boolean, byte, date, double, external file, float, geohash, int, integer, long, short, string, text, trie.
  • Text: Most of the interesting stuff happens here.
  • Unique key: Not required, but handy for adding and updating records, doing statistics, correlating with your SQL database, etc.
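To make the indexing idea on this slide concrete, here is a minimal sketch of an inverted index in Python. It is only an illustration of "a list of the words and where they show up", not how Lucene stores its index; the `build_index` function and the sample documents are made up for this example.

```python
import re
from collections import defaultdict

def build_index(docs):
    """Map each word to the documents and word positions where it appears."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        # Lowercase, then split into runs of letters and digits.
        words = re.findall(r"[a-z0-9]+", text.lower())
        for position, word in enumerate(words):
            index[word][doc_id].append(position)
    return index

# Hypothetical sample documents, keyed by document id.
docs = {
    1: "The quick brown fox",
    2: "The lazy dog and the quick cat",
}
index = build_index(docs)

# Which documents contain "quick", and at which word positions?
print(dict(index["quick"]))
```

Looking up a word is then a dictionary access rather than a scan over every document, which is the basic reason an index makes search fast.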
  • Text: Tokenize words; strip HTML; normalize case; normalize accented characters; pattern replacement; stop words; language stemming; phonetic stemming; synonyms; word shingles. Notes: Tokenize on whitespace or non-letter chars. The standard tokenizer is sort of “type aware” and understands acronyms, URLs, words with apostrophes. So-called stop words are filtered out, since we’re not doing actual semantic language search. Shingles: consecutive n-sized word groups, e.g. “the quick”, “quick brown”, “brown fox”, “fox jumped”.
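A few of the text-analysis steps on this slide can be sketched in plain Python. This is a toy illustration, not Solr's analyzer chain: the function names and the tiny stop-word list are assumptions made for the example.

```python
import re

# Assumed, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"the", "a", "an", "and", "of"}

def tokenize(text):
    """Normalize case, then split on whitespace and non-letter characters."""
    return re.findall(r"[a-z]+", text.lower())

def remove_stop_words(tokens):
    """Drop very common words that carry little meaning for matching."""
    return [token for token in tokens if token not in STOP_WORDS]

def shingles(tokens, n=2):
    """Return consecutive n-sized word groups."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = tokenize("The quick brown fox jumped")
print(remove_stop_words(tokens))
print(shingles(tokens))
```

With `n=2` the shingles for this sentence are the same word pairs quoted in the notes above.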
  • Index rich content HTML, PDF, Word, etc.
  • Add and Update: Serialize your documents to XML, JSON and a handful of others. HTTP POST to your Solr URL. Solr hands your data to Lucene for processing. Updates are incremental.
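As a rough sketch of the "serialize, then HTTP POST" flow, the Python below builds (but does not send) a JSON update request using only the standard library. The Solr URL, the core name `books`, and the document fields are hypothetical; the exact update path can differ between Solr versions, so treat `/update` as an assumption to check against your installation.

```python
import json
import urllib.request

def build_update_request(solr_url, docs):
    """Serialize documents to JSON and wrap them in an HTTP POST request."""
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url=solr_url.rstrip("/") + "/update",  # assumed update path
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

request = build_update_request(
    "http://localhost:8983/solr/books",  # hypothetical Solr core URL
    [{"id": "1", "title": "A sample book", "language": "English"}],
)
print(request.get_method(), request.full_url)

# To actually send it to a running Solr:
# urllib.request.urlopen(request)
```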
  • Querying
  • Powerful query syntax. Boolean logic is just the start.
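For a feel of what a boolean query looks like on the wire, this sketch builds a GET URL for Solr's select handler. The core URL and the field names (`title`, `genre`, `status`) are invented for illustration; only the general `q` / `rows` / `wt` parameter shape is assumed from Solr.

```python
from urllib.parse import urlencode

def build_select_url(solr_url, query, rows=10):
    """Build a GET URL that runs a Lucene-syntax query against Solr."""
    params = {"q": query, "rows": rows, "wt": "json"}
    return solr_url.rstrip("/") + "/select?" + urlencode(params)

# Boolean operators and grouping in the query string (hypothetical fields).
query = 'title:solr AND (genre:search OR genre:"information retrieval") AND NOT status:draft'
url = build_select_url("http://localhost:8983/solr/books", query)
print(url)
```

`urlencode` takes care of escaping the colons, quotes and spaces, so the query can be written in its readable form.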
  • Numeric operations: min, max, average, stddev.
  • Date ranges, date math: Do stuff relative to “now”.
  • Distance: Yeah, one killer feature here is that Solr supports spatial search. Give it a lat/lon.
  • Faceting: Present the available values so your users can filter by them. Great for building out rich taxonomies. Example: facet books by language, author, genre.
  • “Did you mean…?”: Spelling suggestion for user queries. Query auto-suggest from popular queries.
  • More Like This: Generate a list of similar documents. Consider blog posts.
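"Relative to now" can be expressed with Solr's date math inside a range query. The helper below is a small sketch that only builds the query string; the function name and the `published_at` field are hypothetical.

```python
def recent_filter(field, days):
    """Build a range query matching dates from `days` days ago until now.

    NOW/DAY rounds the current time down to the start of the day before
    the offset is subtracted.
    """
    return f"{field}:[NOW/DAY-{days}DAYS TO NOW]"

# Hypothetical date field; the result could be passed as a filter query.
print(recent_filter("published_at", 7))
```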
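Conceptually, a facet is a count of documents per distinct field value. The sketch below computes that in plain Python over a made-up list of books; Solr does this inside the index, so this is only a model of the result, not of its implementation.

```python
from collections import Counter

def facet_counts(docs, field):
    """Count how many documents have each value of the given field."""
    return Counter(doc[field] for doc in docs if field in doc)

# Hypothetical search results.
books = [
    {"title": "Book A", "language": "English", "genre": "Fiction"},
    {"title": "Book B", "language": "English", "genre": "History"},
    {"title": "Book C", "language": "French", "genre": "Fiction"},
]

language_facet = facet_counts(books, "language")
genre_facet = facet_counts(books, "genre")
print(language_facet.most_common())
print(genre_facet.most_common())
```

Each value and its count is what you would render as a clickable filter next to the search results.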
  • Probably more.
  • Solr in Production
  • This is why we run Solr. It’s really, really fast. When properly configured.
  • Average max response time is 75ms. Even the 95th percentile is way below that.
  • Commits: Updates are incremental to keep things running fast. For performance reasons, they don’t show up in search results until you issue a commit. Commits are sorta heavy: 200ms to 2 sec.
  • Most of the time you don’t have to worry about this, but it’s easy to screw this up if you flood the system with updates and commits. Commit steps: lock the writer; flush updates to disk; start a new reader; warm up the reader’s cache; register the reader with Solr; tear down the old reader; unlock the writer.
  • Optimize: As you’re committing changes, you’re usually creating new files in “segments”. Optimize takes your index and rewrites it into a more compact number of files. Good to do this periodically to use less memory and avoid running out of open files.
  • Replication. Stupidly easy. Actual replication is a pull by the slave from the master and really fast. Like, don’t worry. Best way to deal with high IO. Reads go to read cores, writes go to write cores. Scale read resources separately. Make sure writes don’t interrupt reads.
  • Caching: All I’ll say is that it’s really powerful and gives you a lot of rope. I’ve seen cache warmups take down Tomcat, in particular on a very large index with spatial search.
  • In Conclusion: I’m a Rails generalist. I like to do things the right way. Solr is fast, fully-featured, and can be scaled separately from the rest of your app. It takes the load off your database and app servers, and does a better job. In some cases, it offers features that just aren’t otherwise even possible.
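One way to avoid flooding Solr with commits is to send updates in batches and commit once at the end. The sketch below only plans the requests as `(path, body)` pairs and sends nothing; the function name, the batch size, and the `/update` path are assumptions for illustration.

```python
import json

def plan_update_requests(docs, batch_size=100):
    """Return (path, body) pairs: batched adds followed by a single commit."""
    requests = []
    for start in range(0, len(docs), batch_size):
        batch = docs[start:start + batch_size]
        requests.append(("/update", json.dumps(batch)))
    if docs:
        # One commit for the whole run instead of one per document.
        requests.append(("/update", json.dumps({"commit": {}})))
    return requests

# Hypothetical documents to index.
docs = [{"id": str(i)} for i in range(250)]
planned = plan_update_requests(docs)
print(len(planned), "requests, last body:", planned[-1][1])
```

Since each commit opens and warms a new reader, paying that cost once per batch run is usually much cheaper than paying it per document.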
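On the application side, "reads go to read cores, writes go to write cores" can be as simple as a small router. This is a hypothetical sketch with invented host names; Solr's own replication (the slave pulling from the master) is configured separately and is not shown here.

```python
from itertools import cycle

class SolrRouter:
    """Send writes to one core and spread reads across read-only cores."""

    WRITE_OPERATIONS = ("update", "commit", "optimize")

    def __init__(self, write_url, read_urls):
        self.write_url = write_url
        self._read_urls = cycle(read_urls)  # simple round-robin

    def url_for(self, operation):
        if operation in self.WRITE_OPERATIONS:
            return self.write_url
        return next(self._read_urls)

# Hypothetical hosts: one write core and two read cores.
router = SolrRouter(
    "http://solr-write:8983/solr/books",
    ["http://solr-read-1:8983/solr/books", "http://solr-read-2:8983/solr/books"],
)
targets = [router.url_for(op) for op in ("select", "update", "select", "select")]
print(targets)
```

Adding read capacity then means adding another URL to the read list, without touching the write path.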
  • Questions?
  • Thanks!