Understanding and visualizing solr explain information - Rafal Kuc
 

See conference video - http://www.lucidimagination.com/devzone/events/conferences/ApacheLuceneEurocon2011

A talk about how to use, understand, and visualize Solr 'explain' information, the output that lets you tune and debug your search application. In the talk, I'll show free software, currently in development, that visualizes Solr 'explain' information: how a document's score was calculated, which components it came from, which tokens mattered the most, and so on.

Usage Rights

© All Rights Reserved

    Presentation Transcript

    • Understanding and visualising Solr explain information Rafał Kuć, Marek Rogoziński, Solr.pl r.kuc@solr.pl, m.rogozinski@solr.pl, 18.10.2011
    • My Background
      Rafał Kuć
      • Working with Lucene since 2002
      • Working with Solr since 2007
      Solr.pl
      • Co-founder (with Marek Rogoziński)
      Area of expertise
      • Lucene and Solr consultant and architect in many major e-commerce sites in Poland
      • Author of "Solr 3.1 Cookbook" by Packt Publishing
      • Father, husband, Starcraft II player and a gardener after hours ☺
    • What I Will Cover
      Understanding and visualising Solr explain information
      How to make the information given by Apache Solr explain easily readable by a Solr user (not necessarily a very technical one)
      Context
      • Complicated explain made simple
      • Explain other made even simpler
      What’s next to come
    • A typical use case
    • The Challenge
      Common questions like:
      • Why was this document found?
      • Why wasn’t this document found?
      • Why is this document ranked higher than the other one?
      • Why does the results list look like this?
      Considerations
      • Do we always have to answer those questions?
      So how do we let users get the answers they want?
      • That’s how http://explain.solr.pl was born
    • Let’s look at a typical example
      You run a query
      • q=ddr&defType=dismax&qf=name^1000+description^100&bf=pow(price,1.5)&debugQuery=true&indent=true
      And you see the explain information
        1.6771803 = (MATCH) sum of:
          0.64883727 = (MATCH) max of:
            0.64883727 = (MATCH) weight(name:ddr^1000.0 in 6), product of:
              0.99999994 = queryWeight(name:ddr^1000.0), product of:
                1000.0 = boost
                2.446919 = idf(docFreq=3, maxDocs=17)
                4.0867718E-4 = queryNorm
              0.6488373 = (MATCH) fieldWeight(name:ddr in 6), product of:
                1.4142135 = tf(termFreq(name:ddr)=2)
                2.446919 = idf(docFreq=3, maxDocs=17)
                0.1875 = fieldNorm(field=name, doc=6)
          1.028343 = (MATCH) FunctionQuery(pow(float(price),const(1.5))), product of:
            2516.272 = pow(float(price)=185.0,const(1.5))
            1.0 = boost
            4.0867718E-4 = queryNorm
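The request on this slide can be assembled programmatically. The following is a minimal sketch, not part of the talk: the host, port and `/solr/select` path are assumptions about a local default install, and `build_debug_query` is a hypothetical helper name. Only the query parameters come from the slide (the `+` in `qf` is a URL-encoded space).

```python
from urllib.parse import urlencode

# Assumed endpoint of a local Solr instance; adjust to your own setup.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_debug_query(user_query: str) -> str:
    """Return a select URL that asks Solr to include explain information."""
    params = {
        "q": user_query,
        "defType": "dismax",
        # Two query fields with boosts, separated by a space.
        "qf": "name^1000 description^100",
        # Boost function taken from the slide.
        "bf": "pow(price,1.5)",
        # debugQuery=true adds the debug section that contains explain.
        "debugQuery": "true",
        "indent": "true",
    }
    return SOLR_SELECT_URL + "?" + urlencode(params)


url = build_debug_query("ddr")
print(url)
```

`urlencode` takes care of escaping `^`, the space and the parentheses, so the parameters can be written in their readable form.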
    • Some theory
      tf – term frequency
      df – document frequency
      idf – inverse document frequency
      norm – normalization factor
      • queryNorm – query normalization factor
      • fieldNorm – field normalization factor
      coord – coordination factor (rewards documents that match more of the query terms)
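The tf and idf values that appear in the explain output can be reproduced by hand. The sketch below assumes the formulas of Lucene's DefaultSimilarity (square root of the term frequency, and 1 + ln(maxDocs / (docFreq + 1))); the slides do not name the similarity in use, so treat that as an assumption. The function names are illustrative only.

```python
import math


def tf(term_freq: float) -> float:
    """Term frequency factor, assuming DefaultSimilarity: sqrt(freq)."""
    return math.sqrt(term_freq)


def idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency, assuming DefaultSimilarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


# Inputs taken from the explain output: termFreq=2, docFreq=3, maxDocs=17.
print(round(tf(2), 4))       # 1.4142
print(round(idf(3, 17), 4))  # 2.4469
```

Under that assumption the results agree with the `tf(termFreq(name:ddr)=2)` and `idf(docFreq=3, maxDocs=17)` lines of the typical example.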
    • Let’s take a look at it again
        1.6771803 = (MATCH) sum of:
          0.64883727 = (MATCH) max of:
            0.64883727 = (MATCH) weight(name:ddr^1000.0 in 6), product of:
              0.99999994 = queryWeight(name:ddr^1000.0), product of:
                1000.0 = boost
                2.446919 = idf(docFreq=3, maxDocs=17)
                4.0867718E-4 = queryNorm
              0.6488373 = (MATCH) fieldWeight(name:ddr in 6), product of:
                1.4142135 = tf(termFreq(name:ddr)=2)
                2.446919 = idf(docFreq=3, maxDocs=17)
                0.1875 = fieldNorm(field=name, doc=6)
          1.028343 = (MATCH) FunctionQuery(pow(float(price),const(1.5))), product of:
            2516.272 = pow(float(price)=185.0,const(1.5))
            1.0 = boost
            4.0867718E-4 = queryNorm
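Each "product of" and "sum of" line in the output is plain arithmetic over the lines nested under it. As a worked check, the sketch below recomputes the final score from the leaf values printed in the explain; the variable names are illustrative, and small differences in the last digits come from Solr using 32-bit floats.

```python
import math

# Leaf values copied from the explain output for name:ddr in document 6.
boost = 1000.0
idf = 2.446919
query_norm = 4.0867718e-4
tf = 1.4142135
field_norm = 0.1875

# queryWeight = boost * idf * queryNorm
query_weight = boost * idf * query_norm
# fieldWeight = tf * idf * fieldNorm
field_weight = tf * idf * field_norm
# weight(name:ddr) = queryWeight * fieldWeight
term_score = query_weight * field_weight

# FunctionQuery = pow(price, 1.5) * boost * queryNorm, with price = 185.0
function_score = math.pow(185.0, 1.5) * 1.0 * query_norm

# The top-level "sum of" adds the term match and the boost function.
total = term_score + function_score
print(f"{term_score:.4f} + {function_score:.4f} = {total:.4f}")
```

Here the price-based boost function contributes more to the score than the text match itself, which is exactly the kind of thing that is hard to spot in the raw output.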
    • A little more complicated example
        36.50278 = (MATCH) sum of:
          1.54896 = (MATCH) sum of:
            0.46676102 = (MATCH) max of:
              0.46676102 = (MATCH) weight(name:hard^20.0 in 2), product of:
                0.5461986 = queryWeight(name:hard^20.0), product of:
                  20.0 = boost
                  2.734601 = idf(docFreq=2, maxDocs=17)
                  0.009986806 = queryNorm
                0.8545628 = (MATCH) fieldWeight(name:hard in 2), product of:
                  1.0 = tf(termFreq(name:hard)=1)
                  2.734601 = idf(docFreq=2, maxDocs=17)
                  0.3125 = fieldNorm(field=name, doc=2)
            0.46676102 = (MATCH) max of:
              0.46676102 = (MATCH) weight(name:drive^20.0 in 2), product of:
                0.5461986 = queryWeight(name:drive^20.0), product of:
                  20.0 = boost
                  2.734601 = idf(docFreq=2, maxDocs=17)
                  0.009986806 = queryNorm
                0.8545628 = (MATCH) fieldWeight(name:drive in 2), product of:
                  1.0 = tf(termFreq(name:drive)=1)
                  2.734601 = idf(docFreq=2, maxDocs=17)
                  0.3125 = fieldNorm(field=name, doc=2)
            0.61543787 = (MATCH) max of:
              0.098470055 = (MATCH) weight(manu:maxtor in 2), product of:
                0.03135923 = queryWeight(manu:maxtor), product of:
                  3.1400661 = idf(docFreq=1, maxDocs=17)
                  0.009986806 = queryNorm
                3.1400661 = (MATCH) fieldWeight(manu:maxtor in 2), product of:
                  1.0 = tf(termFreq(manu:maxtor)=1)
                  3.1400661 = idf(docFreq=1, maxDocs=17)
                  1.0 = fieldNorm(field=manu, doc=2)
              0.61543787 = (MATCH) weight(name:maxtor^20.0 in 2), product of:
                0.6271846 = queryWeight(name:maxtor^20.0), product of:
                  20.0 = boost
                  3.1400661 = idf(docFreq=1, maxDocs=17)
                  0.009986806 = queryNorm
                0.9812707 = (MATCH) fieldWeight(name:maxtor in 2), product of:
                  1.0 = tf(termFreq(name:maxtor)=1)
                  3.1400661 = idf(docFreq=1, maxDocs=17)
                  0.3125 = fieldNorm(field=name, doc=2)
          34.95382 = (MATCH) FunctionQuery(float(price)), product of:
            350.0 = float(price)=350.0
            10.0 = boost
            0.009986806 = queryNorm
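Once the output grows like this, reading it as a tree helps more than reading it line by line. The following is a minimal sketch of turning indented explain text into nested nodes. It is not the code behind explain.solr.pl; `Node` and `parse_explain` are hypothetical names, the two-spaces-per-level indentation is an assumption about how the text was printed, and the sample is an abridged copy of the example above.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    value: float
    description: str
    children: List["Node"] = field(default_factory=list)


# Leading spaces, a numeric value, " = ", then the description.
LINE = re.compile(r"^(\s*)(\S+) = (.*)$")


def parse_explain(text: str, indent: int = 2) -> Node:
    """Parse indented explain text into a tree (assumes `indent` spaces per level)."""
    root: Optional[Node] = None
    stack: List[Node] = []  # stack[d] is the latest node seen at depth d
    for raw in text.splitlines():
        if not raw.strip():
            continue
        match = LINE.match(raw)
        if match is None:
            raise ValueError(f"unrecognised explain line: {raw!r}")
        depth = len(match.group(1)) // indent
        node = Node(float(match.group(2)), match.group(3))
        if depth == 0:
            root = node
        else:
            stack[depth - 1].children.append(node)
        del stack[depth:]  # drop nodes that are deeper than the current one
        stack.append(node)
    if root is None:
        raise ValueError("empty explain text")
    return root


# Abridged sample: the three "max of" branches are shown without their children.
SAMPLE = """\
36.50278 = (MATCH) sum of:
  1.54896 = (MATCH) sum of:
    0.46676102 = (MATCH) max of:
    0.46676102 = (MATCH) max of:
    0.61543787 = (MATCH) max of:
  34.95382 = (MATCH) FunctionQuery(float(price)), product of:
    350.0 = float(price)=350.0
    10.0 = boost
    0.009986806 = queryNorm
"""

tree = parse_explain(SAMPLE)
for child in tree.children:
    print(f"{child.value / tree.value:6.1%}  {child.description}")
```

With the tree in hand, the share of each top-level branch is a single division; in this example the price function accounts for roughly 96% of the score, and the three text matches together for the rest.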
    • And now, a real-life example
        1.6287426 = (MATCH) sum of:
          0.8143703 = (MATCH) sum of:
            0.40718514 = (MATCH) max plus 0.01 times others of:
              4.154771E-7 = (MATCH) weight(description_nostemm:harry^10.0 in 36647), product of:
                4.4066886E-7 = queryWeight(description_nostemm:harry^10.0), product of:
                  10.0 = boost
                  7.5426636 = idf(docFreq=796, maxDocs=553224)
                  5.8423506E-9 = queryNorm
                0.94283295 = (MATCH) fieldWeight(description_nostemm:harry in 36647), product of:
                  1.0 = tf(termFreq(description_nostemm:harry)=1)
                  7.5426636 = idf(docFreq=796, maxDocs=553224)
                  0.125 = fieldNorm(field=description_nostemm, doc=36647)
              0.40718514 = (MATCH) weight(category_search:harri^2000000.0 in 36647), product of:
                0.123389944 = queryWeight(category_search:harri^2000000.0), product of:
                  2000000.0 = boost
                  10.559957 = idf(docFreq=38, maxDocs=553224)
                  5.8423506E-9 = queryNorm
                3.2999864 = (MATCH) fieldWeight(category_search:harri in 36647), product of:
                  1.0 = tf(termFreq(category_search:harri)=1)
                  10.559957 = idf(docFreq=38, maxDocs=553224)
                  0.3125 = fieldNorm(field=category_search, doc=36647)
              5.976383E-8 = (MATCH) weight(description:harri in 36647), product of:
                4.2931266E-8 = queryWeight(description:harri), product of:
                  7.348286 = idf(docFreq=967, maxDocs=553224)
                  5.8423506E-9 = queryNorm
                1.3920817 = (MATCH) fieldWeight(description:harri in 36647), product of:
                  1.7320508 = tf(termFreq(description:harri)=3)
                  7.348286 = idf(docFreq=967, maxDocs=553224)
                  0.109375 = fieldNorm(field=description, doc=36647)
            0.40718514 = (MATCH) max plus 0.01 times others of:
              5.0300997E-7 = (MATCH) weight(description_nostemm:potter^10.0 in 36647), product of:
                4.84872E-7 = queryWeight(description_nostemm:potter^10.0), product of:
                  10.0 = boost
                  8.299262 = idf(docFreq=373, maxDocs=553224)
                  5.8423506E-9 = queryNorm
                1.0374078 = (MATCH) fieldWeight(description_nostemm:potter in 36647), product of:
                  1.0 = tf(termFreq(description_nostemm:potter)=1)
                  8.299262 = idf(docFreq=373, maxDocs=553224)
                  0.125 = fieldNorm(field=description_nostemm, doc=36647)
              0.40718514 = (MATCH) weight(category_search:Potter^2000000.0 in 36647), product of:
                0.123389944 = queryWeight(category_search:Potter^2000000.0), product of:
                  2000000.0 = boost
                  10.559957 = idf(docFreq=38, maxDocs=553224)
                  5.8423506E-9 = queryNorm
                3.2999864 = (MATCH) fieldWeight(category_search:Potter in 36647), product of:
                  1.0 = tf(termFreq(category_search:Potter)=1)
                  10.559957 = idf(docFreq=38, maxDocs=553224)
                  0.3125 = fieldNorm(field=category_search, doc=36647)
              5.7398886E-8 = (MATCH) weight(description:Potter in 36647), product of:
                4.656172E-8 = queryWeight(description:Potter), product of:
                  7.9696894 = idf(docFreq=519, maxDocs=553224)
                  5.8423506E-9 = queryNorm
                1.2327484 = (MATCH) fieldWeight(description:Potter in 36647), product of:
                  1.4142135 = tf(termFreq(description:Potter)=2)
                  7.9696894 = idf(docFreq=519, maxDocs=553224)
                  0.109375 = fieldNorm(field=description, doc=36647)
          1.8327936E-6 = (MATCH) max plus 0.01 times others of:
            1.8327936E-6 = (MATCH) weight(description_nostemm:"harry potter"~100^10.0 in 36647), product of:
              9.255408E-7 = queryWeight(description_nostemm:"harry potter"~100^10.0), product of:
                10.0 = boost
                15.841926 = idf(description_nostemm: harry=796 potter=373)
                5.8423506E-9 = queryNorm
              1.9802407 = fieldWeight(description_nostemm:"harry potter" in 36647), product of:
                1.0 = tf(phraseFreq=1.0)
                15.841926 = idf(description_nostemm: harry=796 potter=373)
                0.125 = fieldNorm(field=description_nostemm, doc=36647)
          0.81437016 = (MATCH) sum of:
            0.40718508 = (MATCH) weight(category_the:harri in 36647), product of:
              0.12338993 = queryWeight(category_the:harri), product of:
                10.559957 = idf(docFreq=38, maxDocs=553224)
                0.011684701 = queryNorm
              3.2999864 = (MATCH) fieldWeight(category_the:harri in 36647), product of:
                1.0 = tf(termFreq(category_the:harri)=1)
                10.559957 = idf(docFreq=38, maxDocs=553224)
                0.3125 = fieldNorm(field=category_the, doc=36647)
            0.40718508 = (MATCH) weight(category_the:Potter in 36647), product of:
              0.12338993 = queryWeight(category_the:Potter), product of:
                10.559957 = idf(docFreq=38, maxDocs=553224)
                0.011684701 = queryNorm
              3.2999864 = (MATCH) fieldWeight(category_the:Potter in 36647), product of:
                1.0 = tf(termFreq(category_the:Potter)=1)
                10.559957 = idf(docFreq=38, maxDocs=553224)
                0.3125 = fieldNorm(field=category_the, doc=36647)
          3.394099E-7 = (MATCH) FunctionQuery(pow(int(sold),const(1.5))), product of:
            58.09475 = pow(int(sold)=15,const(1.5))
            1.0 = boost
            5.8423506E-9 = queryNorm
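The "max plus 0.01 times others of" lines come from the dismax tie-breaker: the best-scoring field wins, and the other matching fields add only a small fraction of their scores. With a tie of 0 this reduces to the plain "max of" seen in the earlier examples. A minimal sketch of that combination, with `dismax_score` as an illustrative name and the three per-field scores for the first query term copied from the output above:

```python
from typing import Sequence


def dismax_score(clause_scores: Sequence[float], tie: float = 0.0) -> float:
    """Best clause score plus `tie` times the sum of all the other clauses."""
    best = max(clause_scores)
    return best + tie * (sum(clause_scores) - best)


# Scores of the first query term in three fields of document 36647.
harri_scores = [
    4.154771e-7,   # description_nostemm:harry^10.0
    0.40718514,    # category_search:harri^2000000.0
    5.976383e-8,   # description:harri
]

print(dismax_score(harri_scores, tie=0.01))
```

The two description fields are so small next to the heavily boosted category_search match that they change the result only beyond the eighth decimal place, which is why the explain prints the same 0.40718514 for the group and for its best clause.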
    • Let’s visualize now
    • History view
    • Basic information
    • The real thing
    • Even more ☺
    • What if we can’t match ?
    • And the no-matched explain
    • What you gain from explain.solr.pl
      View Solr explain information in a human-readable form
      Easily recognize the most influential elements of the scoring process
      Answer the questions faster
      More things to come in the future
    • Plans for the future
      Support for more formats of Apache Solr explain (right now, only Solr 3.x is supported)
      Visualisation of additional data
      More functionality, such as:
      • query problems analysis
      • query syntax analysis and explanation
      • query time analysis and visualization
      • result comparison between cores or instances
      Very distant future: an additional web application deployed alongside Solr to enable real-time analysis of boost influence
    • Wrap Up
      http://explain.solr.pl should be available very soon (probably end of October or mid-November)
      The code of explain.solr.pl will be available on GitHub soon after the initial release
      There will be a Java version of http://explain.solr.pl, which will cover much more information
    • Sources
      Links
      • http://www.solr.pl
      • http://explain.solr.pl
      • http://lucene.apache.org ☺
      We would like to thank:
      • Łukasz Lewandowski (http://llewandowski.pl/) for his work on the GUI
      • Hubert ‘depesz’ Lubaczewski (http://depesz.com) for the idea ☺
    • Contact
      Rafał Kuć
      • r.kuc@solr.pl
      • http://solr.pl
      Marek Rogoziński
      • m.rogozinski@solr.pl
      • http://solr.pl
    • Thank you