Solr’s Search Relevancy
Understand Solr’s query debug
By Suparit Krityakien
Co-founder and Software Architect at Wongnai Media
Barcamp Bangkhen 2015
Agenda
● Introduction to Apache Lucene and Apache Solr
● Relevancy Problem
● Scoring Model
● Solutions
Apache Lucene
● Full-text Search
● Inverted Index
● Fast!!!!!
● Pure Java library
● Apache License, Version 2.0
Inverted Index
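The diagram for this slide did not survive, so here is a minimal sketch of the idea: an inverted index maps each term to the set of documents that contain it, so a lookup never has to scan the documents themselves. The class name TinyInvertedIndex and the sample restaurant names are illustrative assumptions. Lucene's real index also stores term frequencies, positions and norms, which this sketch leaves out.

```java
import java.util.Collections;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical toy index: term -> sorted set of document ids (a "postings list").
final class TinyInvertedIndex {
    private final Map<String, SortedSet<Integer>> postings = new TreeMap<>();

    // Lowercase the text, split on anything that is not a-z or 0-9, and record the doc id per token.
    void add(int docId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Returns the ids of all documents containing the term (empty set if none).
    SortedSet<Integer> lookup(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySortedSet());
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.add(0, "Think Cafe Siam Center");
        index.add(1, "Think Tank");
        index.add(2, "After You Cafe");
        System.out.println("think -> " + index.lookup("think"));
        System.out.println("cafe  -> " + index.lookup("cafe"));
    }
}
```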
Basic API Usage
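import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

// Index one document into an in-memory directory.
Directory directory = new RAMDirectory();
Analyzer analyzer = new StandardAnalyzer();
IndexWriterConfig config = new IndexWriterConfig(analyzer);
IndexWriter iwriter = new IndexWriter(directory, config);
Document doc = new Document();
doc.add(new Field("title", "title", TextField.TYPE_STORED));
doc.add(new Field("content", "content", TextField.TYPE_STORED));
iwriter.addDocument(doc);
iwriter.close();
// Search the "content" field and print the stored title of each hit.
DirectoryReader ireader = DirectoryReader.open(directory);
IndexSearcher isearcher = new IndexSearcher(ireader);
QueryParser parser = new QueryParser("content", analyzer);
Query query = parser.parse("text");
ScoreDoc[] hits = isearcher.search(query, 1000).scoreDocs;
for (int i = 0; i < hits.length; i++) {
    Document hitDoc = isearcher.doc(hits[i].doc);
    System.out.println(hitDoc.get("title"));
}
ireader.close();
directory.close();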
Query Syntax (Introduction)
● Grammar
○ Query ::= ( Clause )*
Clause ::= ["+", "-"] [<FIELD> ":"] ( <TERM> | "(" Query ")" )
● Example - name:kasetsart event:barcamp
● Wildcard - m?n or sofite*
● Fuzzy - chicalecious~2
● Proximity - "after you cafe"~2
● Boosting - wongnai^10
Apache Solr
● Index Server
● Built on top of Lucene
● REST API / XML / Binary
● Configurable without programming
○ Schema, Type
○ Analyzer
○ Query, etc.
● Still Java
○ Deployed to servlet container
○ or Standalone (jetty)
Components
http://butchiso.com/assets/posts/tim-hieu-ve-apache-solr/solr-achitecture.png
Example of configuration files
<?xml version="1.0" encoding="UTF-8" ?>
<!-- solrconfig.xml -->
<config>
  <luceneMatchVersion>LUCENE_42</luceneMatchVersion>
  <dataDir>${solr.data.dir:}</dataDir>
  <directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}" />
  <abortOnConfigurationError>${solr.abortOnConfigurationError:true}</abortOnConfigurationError>
  <lib dir="../../../contrib/extraction/lib" regex=".*\.jar" />
  <lib dir="../../../dist/" regex="solr-cell-\d.*\.jar" />
  <indexConfig>...</indexConfig>
  <updateHandler class="solr.DirectUpdateHandler2">...</updateHandler>
  <query>...</query>
  <requestDispatcher handleSelect="false">...</requestDispatcher>
  <requestHandler name="/select" class="solr.SearchHandler">...</requestHandler>
  <requestHandler name="/update" class="solr.UpdateRequestHandler" />
  <requestHandler name="/admin/" class="solr.admin.AdminHandlers" />
  <requestHandler name="/replication" class="solr.ReplicationHandler" />
  <admin>
    <defaultQuery>*:*</defaultQuery>
  </admin>
</config>
<?xml version="1.0" ?>
<!-- schema.xml -->
<schema name="example core zero" version="1.1">
  <types>
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" />
    <fieldType name="string_ci" class="solr.TextField" sortMissingLast="true" omitNorms="true">
      <analyzer>
        <tokenizer class="solr.KeywordTokenizerFactory" />
        <filter class="solr.LowerCaseFilterFactory" />
      </analyzer>
    </fieldType>
    ...
  </types>
  <fields>
    <field name="id" type="string" indexed="true" stored="true" multiValued="false" required="true" />
    <field name="_version_" type="long" indexed="true" stored="true" />
    <field name="name" type="text_general" indexed="true" stored="false" multiValued="false" required="true" />
    <field name="coordinate" type="location" indexed="true" stored="false" multiValued="false" required="false" />
    <dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="false" />
  </fields>
  <uniqueKey>id</uniqueKey>
  <defaultSearchField>name</defaultSearchField>
  <solrQueryParser defaultOperator="OR" />
</schema>
Web UI
Relevancy Problem
Why does the result look like this?
● Why was document id:xxx found?
● Why was document id:yyy not found (in the top 5)?
● Why doesn't document id:xxx appear before id:yyy?
TF-IDF -- the default scoring model
● tf (term frequency)
● idf (inverse document frequency)
● coord (coordination factor)
● norm (normalization)
○ fieldNorm - index-time
○ queryNorm - does not affect ranking!
● query-time boosting
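For reference, these factors combine roughly as in the TFIDFSimilarity documentation listed on the Reference slide. The notation below is a sketch: boost(t) stands for the query-time boost of term t, norm(t,d) covers the index-time fieldNorm, and the tf and idf definitions are those of DefaultSimilarity.

```latex
\mathrm{score}(q,d) = \mathrm{coord}(q,d) \cdot \mathrm{queryNorm}(q) \cdot
  \sum_{t \in q} \Big( \mathrm{tf}(t \in d) \cdot \mathrm{idf}(t)^2 \cdot \mathrm{boost}(t) \cdot \mathrm{norm}(t,d) \Big)

\mathrm{tf}(t \in d) = \sqrt{\mathrm{freq}}, \qquad
\mathrm{idf}(t) = 1 + \ln\frac{\mathrm{maxDocs}}{\mathrm{docFreq} + 1}
```

queryNorm is the same for every document matched by a given query, which is why it changes the absolute scores but not the order of the results.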
Search with debug query
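The screenshot for this slide is missing, so the following is a small sketch of how such a request could be assembled. Adding debugQuery=true to a /select request asks Solr to return a score explanation per document. The host, the core name "restaurants" and the class name DebugQueryUrl are assumptions for illustration only.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that builds a Solr /select URL with score explanations enabled.
final class DebugQueryUrl {
    static String build(String baseUrl, String query) {
        return baseUrl + "/select"
                + "?q=" + URLEncoder.encode(query, StandardCharsets.UTF_8)
                // Return only the document id and its score.
                + "&fl=" + URLEncoder.encode("id,score", StandardCharsets.UTF_8)
                // Ask Solr to add the debug section (parsed query and explain) to the response.
                + "&debugQuery=true";
    }

    public static void main(String[] args) {
        // Assumed local Solr instance and core name.
        System.out.println(build("http://localhost:8983/solr/restaurants", "name:think cafe"));
    }
}
```

The explanation for each hit then appears in the debug section of the response, keyed by the document's unique key.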
Search with debug query (cont.)
<str name="123686">
0.20652547 = (MATCH) product of:
  0.4646823 = (MATCH) sum of:
    0.20399374 = (MATCH) weight(name:think in 169148) [DefaultSimilarity], result of:
      0.20399374 = score(doc=169148,freq=2.0 = termFreq=2.0
      ), product of:
        0.041768804 = queryWeight, product of:
          11.050954 = idf(docFreq=7, maxDocs=185423)
          0.003779656 = queryNorm
        4.8838778 = fieldWeight in 169148, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          11.050954 = idf(docFreq=7, maxDocs=185423)
          0.3125 = fieldNorm(doc=169148)
    0.03779656 = (MATCH) ConstantScore(name_notok:think cafe name_notok:think cafe @ the bloc name_notok:think cafe siam center name_notok:
    think cafe siamcenter name_notok:think cafe สยามเซ็นเตอร name_notok:think tank name_notok:think tank (third place) name_notok:thinkcafe
    name_notok:thinkcafe siam center name_notok:thinkcafe siamcenter name_notok:thinkcafe สยามเซ็นเตอร name_notok:thinkcafe@thebloc name_notok:
    thinktank name_notok:thinktank(thirdplace))^10.0, product of:
      10.0 = boost
      0.003779656 = queryNorm
    0.01889828 = (MATCH) ConstantScore(name_notok:[ka-fe-vi-no] think park name_notok:[ka-fe-vi-no] thinkpark name_notok:[kafevino] think park
    name_notok:[kafevino] thinkpark name_notok:dogaholic cafe think park name_notok:dogaholic cafe thinkpark name_notok:dogaholiccafe think park
    name_notok:dogaholiccafe thinkpark name_notok:hamburg think park name_notok:hamburg thinkpark name_notok:satophlatthinkaraoke name_notok:
    surathinkaimaranoodle(kaitun) name_notok:think cafe name_notok:think cafe @ the bloc name_notok:think cafe siam center name_notok:think cafe
    siamcenter name_notok:think cafe สยามเซ็นเตอร name_notok:think tank name_notok:think tank (third place) name_notok:thinkcafe name_notok:
    thinkcafe siam center name_notok:thinkcafe siamcenter name_notok:thinkcafe สยามเซ็นเตอร name_notok:thinkcafe@thebloc name_notok:thinktank
    name_notok:thinktank(thirdplace) name_notok:tom n toms coffee think park name_notok:tom n toms coffee thinkpark name_notok:tomntomscoffee
    think park name_notok:tomntomscoffee thinkpark name_notok:คาเฟวิโน think park name_notok:คาเฟวิโน thinkpark name_notok:ทัม แอนด ทัมส think park
    name_notok:ทัม แอนด ทัมส thinkpark name_notok:ทัมแอนดทัมส think park name_notok:ทัมแอนดทัมส thinkpark)^5.0, product of:
      5.0 = boost
      0.003779656 = queryNorm
    0.20399374 = (MATCH) sum of:
      0.20399374 = (MATCH) weight(name:think in 169148) [DefaultSimilarity], result of:
        0.20399374 = score(doc=169148,freq=2.0 = termFreq=2.0
        ), product of:
          0.041768804 = queryWeight, product of:
            11.050954 = idf(docFreq=7, maxDocs=185423)
            0.003779656 = queryNorm
          4.8838778 = fieldWeight in 169148, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            11.050954 = idf(docFreq=7, maxDocs=185423)
            0.3125 = fieldNorm(doc=169148)
  0.44444445 = coord(4/9)
</str>
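The numbers in this explanation can be re-derived by hand. The sketch below recomputes them with the DefaultSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), using only the inputs printed above (freq, docFreq, maxDocs, queryNorm, fieldNorm, boosts and coord). The class and method names are made up for this example. Lucene works in single-precision floats, so the last digits may differ slightly.

```java
// Hypothetical re-computation of the explain output above, in plain Java.
final class ExplainCheck {
    static double tf(double freq) {
        return Math.sqrt(freq);
    }

    static double idf(long docFreq, long maxDocs) {
        return 1.0 + Math.log((double) maxDocs / (docFreq + 1));
    }

    // weight(name:think) = queryWeight * fieldWeight
    static double termScore(double freq, long docFreq, long maxDocs, double queryNorm, double fieldNorm) {
        double idf = idf(docFreq, maxDocs);
        double queryWeight = idf * queryNorm;             // about 0.04177
        double fieldWeight = tf(freq) * idf * fieldNorm;  // about 4.8839
        return queryWeight * fieldWeight;
    }

    // Final score = coord * (sum of the four matching clauses).
    static double finalScore() {
        double queryNorm = 0.003779656;
        double term = termScore(2.0, 7, 185423, queryNorm, 0.3125);
        double constantBoost10 = 10.0 * queryNorm; // ConstantScore(...)^10.0
        double constantBoost5 = 5.0 * queryNorm;   // ConstantScore(...)^5.0
        double sum = term + constantBoost10 + constantBoost5 + term;
        return sum * (4.0 / 9.0);                  // coord(4/9)
    }

    public static void main(String[] args) {
        System.out.printf("idf        = %.6f%n", idf(7, 185423));
        System.out.printf("term score = %.6f%n", termScore(2.0, 7, 185423, 0.003779656, 0.3125));
        System.out.printf("final      = %.6f%n", finalScore());
    }
}
```

Reading the tree this way shows where the score comes from: the rare term "think" (docFreq=7) contributes most of it through idf, while the two boosted ConstantScore clauses add only boost × queryNorm each.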
Tuning
● Add a field with a different type (analyzer), e.g. one without a tokenizer
● omitNorms
● Document & field boosting
● More query terms + term boosting
● Use a boost function
○ {!boost b=numberOfReviews}
● Use a filter instead of a normal query
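As a rough illustration of the last two bullets, the sketch below puts the scored part of a request in q, wrapped in the {!boost b=...} local-params syntax from the slide, and moves a restriction into fq so that it narrows the result set without contributing to the score. The filter field "category", the sample values and the class name TunedQueryParams are assumptions. debugQuery=true is kept so the effect can be checked in the explain output.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper that separates scored (q) from non-scored (fq) parts of a Solr request.
final class TunedQueryParams {
    static Map<String, String> params(String userQuery, String boostField, String filter) {
        Map<String, String> p = new LinkedHashMap<>();
        // Multiply the relevancy score by the value of a numeric field.
        p.put("q", "{!boost b=" + boostField + "}" + userQuery);
        // Filter queries restrict matches but do not influence the score.
        p.put("fq", filter);
        p.put("debugQuery", "true");
        return p;
    }

    // URL-encodes the values and joins the parameters in insertion order.
    static String toQueryString(Map<String, String> params) {
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joiner.add(e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        System.out.println(toQueryString(params("name:think", "numberOfReviews", "category:cafe")));
    }
}
```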
Reference
● https://lucene.apache.org/core/5_0_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html
● http://butchiso.com/assets/posts/tim-hieu-ve-apache-solr/solr-achitecture.png

Solr's Search Relevancy (Understand Solr's query debug)

  • 1. Solr’s Search Relevancy Understand Solr’s query debug By Suparit Krityakien Co-founder and Software Architect at Wongnai Media Barcamp Bangkhen 2015
  • 2. Agenda ● Introduction to Apache Lucene and Apache Solr ● Relevancy Problem ● Scoring Model ● Solutions
  • 3. Apache Lucene ● Full-text Search ● Inverted Index ● Fast!!!!! ● Java Library (pure) ● Apache License, Version 2.0. Privacy Policy
  • 5. Basic API Usage Directory directory = new RAMDirectory(); Analyzer analyzer = new StandardAnalyzer(); IndexWriterConfig config = new IndexWriterConfig(analyzer); IndexWriter iwriter = new IndexWriter(directory, config); Document doc = new Document(); doc.add(new Field("title", “title”, TextField.TYPE_STORED)); doc.add(new Field("content", "content", TextField.TYPE_STORED)); iwriter.addDocument(doc); iwriter.close(); DirectoryReader ireader = DirectoryReader.open(directory); IndexSearcher isearcher = new IndexSearcher(ireader); QueryParser parser = new QueryParser("content", analyzer); Query query = parser.parse("text"); ScoreDoc[] hits = isearcher.search(query, null, 1000).scoreDocs; for (int i = 0; i < hits.length; i++) { Document hitDoc = isearcher.doc(hits[i].doc); hitDoc.get("title")); } ireader.close(); directory.close();
  • 6. Query Syntax (Introduction) ● Grammar ○ Query ::= ( Clause )* ○ Clause ::= ["+", "-"] [<FIELD> ":"] ( <TERM> | "(" Query ")" ) ● Example - name:kasetsart event:barcamp ● Wildcard - m?n or sofite* ● Fuzzy - chicalecious~2 ● Proximity - "after you cafe"~2 ● Boosting - wongnai^10
  • 7. Apache Solr ● Index Server ● Built on top of Lucene ● REST API / XML / Binary ● Configurable without programming ○ Schema, Type ○ Analyzer ○ Query, etc. ● Still Java ○ Deployed to a servlet container ○ or standalone (Jetty)
  • 9. Example of configuration files
      <?xml version="1.0" encoding="UTF-8" ?>
      <!-- solrconfig.xml -->
      <config>
        <luceneMatchVersion>LUCENE_42</luceneMatchVersion>
        <dataDir>${solr.data.dir:}</dataDir>
        <directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}" />
        <abortOnConfigurationError>${solr.abortOnConfigurationError:true}</abortOnConfigurationError>
        <lib dir="../../../contrib/extraction/lib" regex=".*\.jar" />
        <lib dir="../../../dist/" regex="solr-cell-\d.*\.jar" />
        <indexConfig>...</indexConfig>
        <updateHandler class="solr.DirectUpdateHandler2">...</updateHandler>
        <query>...</query>
        <requestDispatcher handleSelect="false">...</requestDispatcher>
        <requestHandler name="/select" class="solr.SearchHandler">...</requestHandler>
        <requestHandler name="/update" class="solr.UpdateRequestHandler" />
        <requestHandler name="/admin/" class="solr.admin.AdminHandlers" />
        <requestHandler name="/replication" class="solr.ReplicationHandler" />
        <admin>
          <defaultQuery>*:*</defaultQuery>
        </admin>
      </config>

      <?xml version="1.0" ?>
      <!-- schema.xml -->
      <schema name="example core zero" version="1.1">
        <types>
          <fieldType name="string" class="solr.StrField" sortMissingLast="true" />
          <fieldType name="string_ci" class="solr.TextField" sortMissingLast="true" omitNorms="true">
            <analyzer>
              <tokenizer class="solr.KeywordTokenizerFactory" />
              <filter class="solr.LowerCaseFilterFactory" />
            </analyzer>
          </fieldType>
          . . .
        </types>
        <fields>
          <field name="id" type="string" indexed="true" stored="true" multiValued="false" required="true" />
          <field name="_version_" type="long" indexed="true" stored="true" />
          <field name="name" type="text_general" indexed="true" stored="false" multiValued="false" required="true" />
          <field name="coordinate" type="location" indexed="true" stored="false" multiValued="false" required="false" />
          <dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="false" />
        </fields>
        <uniqueKey>id</uniqueKey>
        <defaultSearchField>name</defaultSearchField>
        <solrQueryParser defaultOperator="OR" />
      </schema>
  • 11. Relevancy Problem Why does the result look like this? ● Why was document id:xxx found? ● Why was document id:yyy not found (in the top 5)? ● Why doesn't document id:xxx appear before id:yyy?
  • 12. TF-IDF -- the default scoring model ● tf (term frequency) ● idf (inverse document frequency) ● coord (coordination factor) ● norm (normalization) ○ fieldNorm - computed at index time ○ queryNorm - does not affect ranking! ● query-time boosting
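    As a rough guide to how these factors combine, the formula below follows the practical scoring function documented for Lucene's classic TF-IDF similarity (the `DefaultSimilarity` named in the debug output). The symbol names are chosen here for illustration; the exact encoding of `norm` (a lossy single-byte value) is left out.

    ```latex
    \mathrm{score}(q,d) = \mathrm{coord}(q,d)\cdot \mathrm{queryNorm}(q)\cdot
      \sum_{t \in q} \Big( \mathrm{tf}(t,d)\cdot \mathrm{idf}(t)^{2}\cdot \mathrm{boost}(t)\cdot \mathrm{norm}(t,d) \Big)

    \mathrm{tf}(t,d) = \sqrt{\mathrm{freq}(t,d)}, \qquad
    \mathrm{idf}(t) = 1 + \ln\frac{\mathrm{maxDocs}}{\mathrm{docFreq}(t) + 1}, \qquad
    \mathrm{coord}(q,d) = \frac{\mathrm{overlap}}{\mathrm{maxOverlap}}
    ```

    Because `queryNorm(q)` depends only on the query, it multiplies every document's score by the same constant, which is why it does not change the ranking within one result list.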
  • 14. Search with debug query (cont.)
      <str name="123686">
      0.20652547 = (MATCH) product of:
        0.4646823 = (MATCH) sum of:
          0.20399374 = (MATCH) weight(name:think in 169148) [DefaultSimilarity], result of:
            0.20399374 = score(doc=169148,freq=2.0 = termFreq=2.0 ), product of:
              0.041768804 = queryWeight, product of:
                11.050954 = idf(docFreq=7, maxDocs=185423)
                0.003779656 = queryNorm
              4.8838778 = fieldWeight in 169148, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                11.050954 = idf(docFreq=7, maxDocs=185423)
                0.3125 = fieldNorm(doc=169148)
          0.03779656 = (MATCH) ConstantScore(name_notok:think cafe name_notok:think cafe @ the bloc name_notok:think cafe siam center name_notok:think cafe siamcenter name_notok:think cafe สยามเซ็นเตอร name_notok:think tank name_notok:think tank (third place) name_notok:thinkcafe name_notok:thinkcafe siam center name_notok:thinkcafe siamcenter name_notok:thinkcafe สยามเซ็นเตอร name_notok:thinkcafe@thebloc name_notok:thinktank name_notok:thinktank(thirdplace))^10.0, product of:
            10.0 = boost
            0.003779656 = queryNorm
  • 15. Search with debug query (cont.)
          0.01889828 = (MATCH) ConstantScore(name_notok:[ka-fe-vi-no] think park name_notok:[ka-fe-vi-no] thinkpark name_notok:[kafevino] think park name_notok:[kafevino] thinkpark name_notok:dogaholic cafe think park name_notok:dogaholic cafe thinkpark name_notok:dogaholiccafe think park name_notok:dogaholiccafe thinkpark name_notok:hamburg think park name_notok:hamburg thinkpark name_notok:satophlatthinkaraoke name_notok:surathinkaimaranoodle(kaitun) name_notok:think cafe name_notok:think cafe @ the bloc name_notok:think cafe siam center name_notok:think cafe siamcenter name_notok:think cafe สยามเซ็นเตอร name_notok:think tank name_notok:think tank (third place) name_notok:thinkcafe name_notok:thinkcafe siam center name_notok:thinkcafe siamcenter name_notok:thinkcafe สยามเซ็นเตอร name_notok:thinkcafe@thebloc name_notok:thinktank name_notok:thinktank(thirdplace) name_notok:tom n toms coffee think park name_notok:tom n toms coffee thinkpark name_notok:tomntomscoffee think park name_notok:tomntomscoffee thinkpark name_notok:คาเฟวิโน think park name_notok:คาเฟวิโน thinkpark name_notok:ทัม แอนด ทัมส think park name_notok:ทัม แอนด ทัมส thinkpark name_notok:ทัมแอนดทัมส think park name_notok:ทัมแอนดทัมส thinkpark)^5.0, product of:
            5.0 = boost
            0.003779656 = queryNorm
          0.20399374 = (MATCH) sum of:
            0.20399374 = (MATCH) weight(name:think in 169148) [DefaultSimilarity], result of:
              0.20399374 = score(doc=169148,freq=2.0 = termFreq=2.0 ), product of:
                0.041768804 = queryWeight, product of:
                  11.050954 = idf(docFreq=7, maxDocs=185423)
                  0.003779656 = queryNorm
                4.8838778 = fieldWeight in 169148, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  11.050954 = idf(docFreq=7, maxDocs=185423)
                  0.3125 = fieldNorm(doc=169148)
        0.44444445 = coord(4/9)
      </str>
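    One way to read the debug output is to recompute it by hand. The sketch below rebuilds the final score from the leaf values shown on the slides, assuming the classic DefaultSimilarity formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). The class and method names are made up for illustration, and queryNorm and fieldNorm are simply copied from the output rather than derived.

    ```java
    public class ExplainCheck {
        // Values copied verbatim from the debug output above.
        static final double QUERY_NORM = 0.003779656;
        static final double FIELD_NORM = 0.3125;

        // Classic TF-IDF term frequency factor: square root of the raw frequency.
        static double tf(double freq) {
            return Math.sqrt(freq);
        }

        // Classic TF-IDF inverse document frequency.
        static double idf(long docFreq, long maxDocs) {
            return 1.0 + Math.log((double) maxDocs / (docFreq + 1));
        }

        // Score of the name:think clause = queryWeight * fieldWeight.
        static double termScore() {
            double idf = idf(7, 185423);
            double queryWeight = idf * QUERY_NORM;
            double fieldWeight = tf(2.0) * idf * FIELD_NORM;
            return queryWeight * fieldWeight;
        }

        // A ConstantScore clause contributes boost * queryNorm.
        static double constantScore(double boost) {
            return boost * QUERY_NORM;
        }

        // Sum of the four matching clauses, scaled by coord(4/9).
        static double totalScore() {
            double sum = termScore() + constantScore(10.0) + constantScore(5.0) + termScore();
            double coord = 4.0 / 9.0;
            return sum * coord;
        }

        public static void main(String[] args) {
            System.out.printf("idf        = %.6f%n", idf(7, 185423));
            System.out.printf("term score = %.8f%n", termScore());
            System.out.printf("total      = %.8f%n", totalScore());
        }
    }
    ```

    The recomputed values should agree with the debug output up to rounding: the term clause appears twice in the sum, the two ConstantScore clauses differ only by their boosts (10 and 5), and coord(4/9) scales the whole sum because only 4 of the 9 query clauses matched.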
  • 16. Tuning ● Add a field with a different type (analyzer), e.g. no tokenizer ● omitNorms ● Document & field boosting ● More query terms + term boosting ● Use a boost function ○ {!boost b=numberOfReviews} ● Use a filter query instead of a normal query
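    To make the last two tuning points concrete, here is a minimal sketch that assembles request parameters for the /select handler. It wraps the user query in the {!boost b=numberOfReviews} function from the slide and moves a restriction into fq, which filters results without contributing to the score. The class name, the helper methods, and the `category:cafe` filter are hypothetical; `URLEncoder.encode(String, Charset)` needs Java 10 or later.

    ```java
    import java.net.URLEncoder;
    import java.nio.charset.StandardCharsets;

    public class BoostQueryExample {
        // URL-encode one name=value pair (form encoding: space becomes '+').
        static String param(String name, String value) {
            return name + "=" + URLEncoder.encode(value, StandardCharsets.UTF_8);
        }

        // Build the query string: boosted main query, a non-scoring filter, and debug output.
        static String buildQueryString(String userQuery, String boostFunction, String filter) {
            String q = "{!boost b=" + boostFunction + "}" + userQuery;
            return String.join("&",
                    param("q", q),
                    param("fq", filter),          // filter query: restricts results, no score impact
                    param("debugQuery", "true")); // ask Solr to return the score explanation
        }

        public static void main(String[] args) {
            // "category:cafe" is an assumed field/value used only for illustration.
            System.out.println("/select?" + buildQueryString("name:think", "numberOfReviews", "category:cafe"));
        }
    }
    ```

    Keeping debugQuery=true on while tuning makes it possible to compare the explanation before and after each change.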