Text
Most of the interesting stuff happens here.
Unique key
Not required, but handy for adding and updating records, doing statistics, correlating with your SQL database, etc.
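Why the unique key is handy: in Solr, adding a document whose unique key already exists replaces the old document, so an update is just another add. The key can also map a search hit back to its SQL row. The sketch below is a minimal in-memory stand-in for a core. It assumes a hypothetical `<Model>-<primary key>` id scheme, and `ToyIndex`, `solr_id`, and `pk_for` are illustrative names, not Solr's API or any Rails plugin's API.

```python
from typing import Any, Dict

def solr_id(model: str, pk: int) -> str:
    """Hypothetical unique-key scheme: model name plus SQL primary key."""
    return f"{model}-{pk}"

class ToyIndex:
    """In-memory stand-in for a core whose unique key field is 'id'."""

    def __init__(self) -> None:
        self.docs: Dict[str, Dict[str, Any]] = {}

    def add(self, doc: Dict[str, Any]) -> None:
        # Adding a doc with an existing unique key replaces the old one,
        # so "update" is just "add again".
        self.docs[doc["id"]] = doc

    def pk_for(self, doc_id: str) -> int:
        # Recover the SQL primary key so the full row can be loaded from the database.
        return int(doc_id.rsplit("-", 1)[1])

index = ToyIndex()
index.add({"id": solr_id("Post", 42), "title": "Hello"})
index.add({"id": solr_id("Post", 42), "title": "Hello, again"})
print(len(index.docs), index.docs["Post-42"]["title"])  # 1 Hello, again
```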
Text
Analysis options: tokenize words, strip HTML, normalize case, normalize accented characters, pattern replacement, stop words, language stemming, phonetic stemming, synonyms, and word shingles.
Tokenize on whitespace or non-letter characters. The standard tokenizer is sort of “type aware” and understands acronyms, URLs, and words with apostrophes.
Strip so-called stop words, since we’re not doing actual semantic language search.
Shingles: consecutive n-sized word groups, e.g. “the quick”, “quick brown”, “brown fox”, “fox jumped”.
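A few of these steps chained together, as a rough sketch of what a text field's analyzer does: strip HTML, tokenize, normalize case, drop stop words, and build shingles. This is plain Python, not Solr's analyzer classes. The stop-word list is a tiny made-up sample, and the regex tokenizer is far cruder than the standard tokenizer (it does not understand acronyms, URLs, or apostrophes).

```python
import re
from typing import List

# Tiny illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"a", "an", "and", "the", "of", "to"}

def strip_html(text: str) -> str:
    # Crude tag removal; fine for a sketch, not for arbitrary HTML.
    return re.sub(r"<[^>]+>", " ", text)

def tokenize(text: str) -> List[str]:
    # Split on anything that is not a letter or digit, dropping empty pieces.
    return [t for t in re.split(r"[^A-Za-z0-9]+", text) if t]

def analyze(text: str, drop_stop_words: bool = True) -> List[str]:
    tokens = [t.lower() for t in tokenize(strip_html(text))]  # normalize case
    if drop_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return tokens

def shingles(tokens: List[str], n: int = 2) -> List[str]:
    # Consecutive n-sized word groups.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

print(analyze("<p>The Quick brown fox jumped</p>"))
# ['quick', 'brown', 'fox', 'jumped']
print(shingles(analyze("The quick brown fox jumped", drop_stop_words=False)))
# ['the quick', 'quick brown', 'brown fox', 'fox jumped']
```

Shingles are built here with stop words kept, so the output matches the slide's example.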
This is why we run Solr. It’s really, really fast. When properly configured.
Average max response time is 75 ms. Even the 95th percentile is way below that.
Commits
Updates are incremental, to keep things running fast. For performance reasons, they don’t show up in search results until you issue a commit. Commits are sort of heavy: 200 ms to 2 s.
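A toy model of commit visibility. It makes no claims about Solr internals, and `ToyCore`, `add`, `commit`, and `search` are hypothetical names. Adds are cheap and buffered. Only a commit makes them searchable.

```python
from typing import Any, Dict, List

Doc = Dict[str, Any]

class ToyCore:
    """Toy model of commit visibility; not how Solr is implemented."""

    def __init__(self) -> None:
        self.pending: List[Doc] = []  # added, not yet visible
        self.visible: List[Doc] = []  # what searches can see
        self.commit_count = 0

    def add(self, doc: Doc) -> None:
        # Cheap: just buffer the update.
        self.pending.append(doc)

    def commit(self) -> None:
        # In real Solr this is the heavy step (flush, new reader, cache warmup).
        self.visible.extend(self.pending)
        self.pending.clear()
        self.commit_count += 1

    def search(self, term: str) -> List[Doc]:
        return [d for d in self.visible if term in d.get("text", "")]

core = ToyCore()
core.add({"id": "1", "text": "solr is fast"})
print(len(core.search("solr")))  # 0: not committed yet
core.commit()
print(len(core.search("solr")))  # 1
```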
Most of the time you don’t have to worry about this, but it’s easy to screw it up if you flood the system with updates and commits. Each commit has to lock the writer, flush updates to disk, tear down the old reader, start a new reader, warm up the reader’s cache, unlock the writer, and register the reader with Solr.
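One way to avoid flooding the system with commits is to buffer documents and commit once per batch instead of once per document. This is a minimal sketch, assuming hypothetical `send_docs` and `send_commit` hooks. In a real app they would POST to Solr's update handler, and the batch size of 100 is arbitrary.

```python
from typing import Any, Callable, Dict, List

Doc = Dict[str, Any]

class BatchingIndexer:
    """Buffers documents and commits once per batch, not once per document.

    send_docs and send_commit are hypothetical hooks standing in for
    requests to Solr's update handler.
    """

    def __init__(self, send_docs: Callable[[List[Doc]], None],
                 send_commit: Callable[[], None], batch_size: int = 500) -> None:
        self.send_docs = send_docs
        self.send_commit = send_commit
        self.batch_size = batch_size
        self.buffer: List[Doc] = []

    def add(self, doc: Doc) -> None:
        self.buffer.append(doc)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        if not self.buffer:
            return
        self.send_docs(self.buffer)
        self.buffer = []
        self.send_commit()  # one commit per batch

# Fake hooks that just record what would have been sent.
batches: List[List[Doc]] = []
commits: List[bool] = []
indexer = BatchingIndexer(batches.append, lambda: commits.append(True), batch_size=100)
for i in range(250):
    indexer.add({"id": str(i)})
indexer.flush()  # push the final partial batch
print(len(batches), len(commits))  # 3 3
```

250 documents cost 3 commits here instead of 250. Solr's `commitWithin` update parameter is another way to avoid a commit per request, because it lets the server schedule the commit.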
Optimize
As you commit changes, you’re usually creating new files in “segments”. Optimize takes your index and rewrites it into a smaller number of files. It’s good to do this periodically, to use less memory and avoid running out of open files.
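A toy model of what an optimize does. Here each segment is only a sorted list of document ids, and deleted documents are a separate set. Optimizing merges the segments into one and drops the deleted documents along the way. Real Lucene segments hold far more than ids, so treat this as an illustration of the merge, not of the file format.

```python
import heapq
from typing import List, Set

def optimize(segments: List[List[int]], deleted: Set[int]) -> List[List[int]]:
    """Merge many sorted segments into a single one, dropping deleted doc ids."""
    merged = [doc for doc in heapq.merge(*segments) if doc not in deleted]
    return [merged]

segments = [[1, 4, 7], [2, 5], [3, 6, 8]]  # three small segments from three commits
print(optimize(segments, deleted={5, 7}))  # [[1, 2, 3, 4, 6, 8]]
```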
Replication. Stupidly easy.
Replication itself is a pull from the slave, and it’s really fast. Like, don’t worry.
It’s the best way to deal with high IO: reads go to read cores, writes go to write cores. You can scale read resources separately and make sure writes don’t interrupt reads.
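On the application side, the read/write split comes down to routing: updates go to the write core, and queries are spread across read cores. This is a minimal sketch using round-robin for reads. The URLs and the `CoreRouter` class are hypothetical placeholders, and the slaves still pull index changes from the write core on their own.

```python
import itertools
from typing import List

class CoreRouter:
    """Send writes to the write core and spread reads across read cores."""

    def __init__(self, write_url: str, read_urls: List[str]) -> None:
        self.write_url = write_url
        self._reads = itertools.cycle(read_urls)  # simple round-robin

    def url_for(self, is_write: bool) -> str:
        return self.write_url if is_write else next(self._reads)

router = CoreRouter("http://write.example.com:8983/solr",
                    ["http://read1.example.com:8983/solr",
                     "http://read2.example.com:8983/solr"])
print(router.url_for(is_write=True))   # http://write.example.com:8983/solr
print(router.url_for(is_write=False))  # http://read1.example.com:8983/solr
print(router.url_for(is_write=False))  # http://read2.example.com:8983/solr
```

Adding read capacity means adding another URL to the read list. The write path does not change.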
Caching
All I’ll say is that it’s really powerful and gives you a lot of rope. I’ve seen cache warmups take down Tomcat, in particular on a very large index with spatial search.
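Why warmups can get expensive: Solr's caches have an `autowarmCount` setting, and warming a new reader's cache means re-running recently used queries against it. The sketch below is a rough illustration of that idea with a plain LRU cache. `LRUCache`, `autowarm`, and `run_query` are hypothetical names, not Solr's classes. Every warmed entry costs one real query execution, so a large warm count on a large index means a lot of work on every commit.

```python
from collections import OrderedDict
from typing import Any, Callable, Hashable, List

class LRUCache:
    """Minimal LRU cache keyed by query; a stand-in for a searcher cache."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.data: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get(self, key: Hashable, compute: Callable[[Hashable], Any]) -> Any:
        if key in self.data:
            self.data.move_to_end(key)  # mark as most recently used
            return self.data[key]
        value = compute(key)
        self.data[key] = value
        if len(self.data) > self.size:
            self.data.popitem(last=False)  # evict least recently used
        return value

def autowarm(old: LRUCache, new: LRUCache, count: int,
             compute: Callable[[Hashable], Any]) -> int:
    """Re-run the `count` most recently used keys of the old cache against the
    new one. Returns how many queries were executed."""
    recent = list(old.data.keys())[-count:] if count > 0 else []
    for key in recent:
        new.get(key, compute)
    return len(recent)

executed: List[Hashable] = []

def run_query(q: Hashable) -> str:
    executed.append(q)  # each call stands for a real search
    return f"results for {q}"

old = LRUCache(size=3)
for q in ["a", "b", "c", "a"]:
    old.get(q, run_query)
new = LRUCache(size=3)
warmed = autowarm(old, new, count=2, compute=run_query)
print(warmed, list(new.data))  # 2 ['c', 'a']
```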
In Conclusion
I’m a Rails generalist, and I like to do things the right way.
Solr is fast, fully featured, and can be scaled separately from the rest of your app.
It takes the load off your database and app servers, and does a better job.
In some cases, it offers features that just aren’t otherwise possible.