SlideShare a Scribd company logo
1 of 59
Download to read offline
Beyond TF-IDF
Stephen Murtagh
etsy.com
20,000,000 items
1,000,000 sellers
15,000,000 daily
searches
80,000,000 daily calls to Solr
Etsy Engineering
• Code as Craft - our engineering blog
• http://codeascraft.etsy.com/
• Continuous Deployment
• https://github.com/etsy/deployinator
• Experiment-driven culture
• Hybrid engineering roles
• Dev-Ops
• Data-Driven Products
Etsy Search
• 2 search clusters: Flip and Flop
• Master -> 20 slaves
• Only one cluster takes traffic
• Thrift (no HTTP endpoint)
• BitTorrent for index replication
• Solr 4.1
• Incremental index every 12 minutes
Beyond TF-IDF
•Why?
•What?
•How?
Luggage tags
“unique bag”
q = unique+bag
q = unique+bag
>
Scoring in Lucene
Scoring in Lucene
Fixed for any given query
constant
Scoring in Lucene
f(term, document)
f(term)
Scoring in Lucene
User content
Only measure rarity
IDF(“unique”)
4.429547
IDF(“bag”)
4.32836>
q = unique+bag
“unique unique bag” “unique bag bag”
>
“unique” tells us
nothing...
Stop words
• Add “unique” to stop word list?
• What about “handmade” or “blue”?
• Low-information words can still be useful
for matching
• ... but harmful for ranking
Why not replace IDF?
Beyond TF-IDF
•Why?
• IDF ignores term “usefulness”
•What?
•How?
Beyond TF-IDF
•Why?
• IDF ignores term “usefulness”
•What?
•How?
What do we replace it
with?
Benefits of IDF
I1 =





doc1 doc2 doc3 . . . docn
art 2 0 1 . . . 1
jewelry 1 3 0 . . . 0
...
...
termm 1 0 1 . . . 0





Benefits of IDF
I1 =





doc1 doc2 doc3 . . . docn
art 2 0 1 . . . 1
jewelry 1 3 0 . . . 0
...
...
termm 1 0 1 . . . 0





IDF(jewelry) = 1 + log(
n

d id,jewelry
)
Sharding
I1 =





doc1 doc2 doc3 . . . dock
art 2 0 1 . . . 1
jewelry 1 3 0 . . . 0
...
...
termm 1 0 1 . . . 0





I2 =





dock+1 dock+2 dock+3 . . . docn
art 6 1 0 . . . 1
jewelry 0 1 3 . . . 0
...
...
termm 0 1 1 . . . 0





Sharding
I1 =





doc1 doc2 doc3 . . . dock
art 2 0 1 . . . 1
jewelry 1 3 0 . . . 0
...
...
termm 1 0 1 . . . 0





I2 =





dock+1 dock+2 dock+3 . . . docn
art 6 1 0 . . . 1
jewelry 0 1 3 . . . 0
...
...
termm 0 1 1 . . . 0





IDF(jewelry) = 1 + log(
n

d id,jewelry
)
Sharding
I1 =





doc1 doc2 doc3 . . . dock
art 2 0 1 . . . 1
jewelry 1 3 0 . . . 0
...
...
termm 1 0 1 . . . 0





I2 =





dock+1 dock+2 dock+3 . . . docn
art 6 1 0 . . . 1
jewelry 0 1 3 . . . 0
...
...
termm 0 1 1 . . . 0





IDF1(jewelry) = IDF2(jewelry) = IDF(jewelry)
Sharded IDF options
• Ignore it - Shards score differently
• Shards exchange stats - Messy
• Central source distributes IDF to shards
Information Gain
• P(x) - Probability of x appearing in a listing
• P(x|y) - Probability of x appearing given y appears
info(y) = D(P(X|y)||P(X))
info(y) = Σx∈X log(
P(x|y)
P(x)
) ∗ P(x|y)
Term Info(x) IDF
unique 0.26 4.43
bag 1.24 4.33
pattern 1.20 4.38
original 0.85 4.38
dress 1.31 4.42
man 0.64 4.41
photo 0.74 4.37
stone 0.92 4.35
Similar IDF
Term Info(x) IDF
unique 0.26 4.39
black 0.22 3.32
red 0.22 3.52
handmade 0.20 3.26
two 0.32 5.64
white 0.19 3.32
three 0.37 6.19
for 0.21 3.59
Similar Info Gain
q = unique+bag
Using IDF
score(“unique unique bag”)

score(“unique bag bag”)
Using information gain
score(“unique unique bag”)

score(“unique bag bag”)
Beyond TF-IDF
•Why?
• IDF ignores term “usefulness”
•What?
•How?
Beyond TF-IDF
•Why?
• IDF ignores term “usefulness”
•What?
• Information gain accounts for term quality
•How?
Beyond TF-IDF
•Why?
• IDF ignores term “usefulness”
•What?
• Information gain accounts for term quality
•How?
Listing Quality
• Performance relative
to rank
• Hadoop: logs - hdfs
• cron: hdfs - master
• bash: master - slave
• Loaded as external file
field
Computing info gain
I1 =





doc1 doc2 doc3 . . . docn
art 2 0 1 . . . 1
jewelry 1 3 0 . . . 0
...
...
termm 1 0 1 . . . 0





info(y) = D(P(X|y)||P(X))
info(y) = Σx∈X log(
P(x|y)
P(x)
) ∗ P(x|y)
Hadoop
• Brute-force
• Count all terms
• Count all co-occuring terms
• Construct distributions
• Compute info gain for all terms
File Distribution
• cron copies score file to master
• master replicates file to slaves
infogain=`find /search/data/ -maxdepth 1 -type f -
name info_gain.* -print | sort | tail -n 1`
scp $infogain user@$slave:$infogain
File Distribution
schema.xml
Beyond TF-IDF
•Why?
• IDF ignores term “usefulness”
•What?
• Information gain accounts for term quality
•How?
• Hadoop + similarity factory = win
Fast Deploys,
Careful Testing
• Idea
• Proof of Concept
• Side-By-Side
• A/B test
• 100% Live
Side-by-Side
Relevant != High quality
A/B Test
• Users are randomly assigned to A or B
• A sees IDF-based results
• B sees info gain-based results
A/B Test
• Users are randomly assigned to A or B
• A sees IDF-based results
• B sees info gain-based results
• Small but significant decrease in clicks,
page views, etc.
More homogeneous results
Lower average quality score
Next Steps
Parameter Tweaking...
Rebalance relevancy and quality signals in score
The Future
Latent Semantic
Indexing in Solr/Lucene
Latent Semantic
Indexing
• In TF-IDF, documents are sparse vectors in
term space
• LSI re-maps these to dense vectors in
“concept” space
• Construct transformation matrix:
• Load file at index and query time
• Re-map query and documents
Rm
+
Rr
Tr×m
CONTACT
Stephen Murtagh
smurtagh@etsy.com

More Related Content

What's hot

EuroPython 2016 - Do I Need To Switch To Golang
EuroPython 2016 - Do I Need To Switch To GolangEuroPython 2016 - Do I Need To Switch To Golang
EuroPython 2016 - Do I Need To Switch To GolangMax Tepkeev
 
"PyTorch Deep Learning Framework: Status and Directions," a Presentation from...
"PyTorch Deep Learning Framework: Status and Directions," a Presentation from..."PyTorch Deep Learning Framework: Status and Directions," a Presentation from...
"PyTorch Deep Learning Framework: Status and Directions," a Presentation from...Edge AI and Vision Alliance
 
Python于Web 2.0网站的应用 - QCon Beijing 2010
Python于Web 2.0网站的应用 - QCon Beijing 2010Python于Web 2.0网站的应用 - QCon Beijing 2010
Python于Web 2.0网站的应用 - QCon Beijing 2010Qiangning Hong
 
Introduction to PyTorch
Introduction to PyTorchIntroduction to PyTorch
Introduction to PyTorchJun Young Park
 
Rainer Grimm, “Functional Programming in C++11”
Rainer Grimm, “Functional Programming in C++11”Rainer Grimm, “Functional Programming in C++11”
Rainer Grimm, “Functional Programming in C++11”Platonov Sergey
 
Kotlin Slides from Devoxx 2011
Kotlin Slides from Devoxx 2011Kotlin Slides from Devoxx 2011
Kotlin Slides from Devoxx 2011Andrey Breslav
 
Introduction to functional programming using Ocaml
Introduction to functional programming using OcamlIntroduction to functional programming using Ocaml
Introduction to functional programming using Ocamlpramode_ce
 
The Next Great Functional Programming Language
The Next Great Functional Programming LanguageThe Next Great Functional Programming Language
The Next Great Functional Programming LanguageJohn De Goes
 
Introduction to NumPy for Machine Learning Programmers
Introduction to NumPy for Machine Learning ProgrammersIntroduction to NumPy for Machine Learning Programmers
Introduction to NumPy for Machine Learning ProgrammersKimikazu Kato
 
Swift for tensorflow
Swift for tensorflowSwift for tensorflow
Swift for tensorflow규영 허
 
Machine learning with py torch
Machine learning with py torchMachine learning with py torch
Machine learning with py torchRiza Fahmi
 
The best language in the world
The best language in the worldThe best language in the world
The best language in the worldDavid Muñoz Díaz
 
Humble introduction to category theory in haskell
Humble introduction to category theory in haskellHumble introduction to category theory in haskell
Humble introduction to category theory in haskellJongsoo Lee
 
Python fundamentals - basic | WeiYuan
Python fundamentals - basic | WeiYuanPython fundamentals - basic | WeiYuan
Python fundamentals - basic | WeiYuanWei-Yuan Chang
 

What's hot (20)

EuroPython 2016 - Do I Need To Switch To Golang
EuroPython 2016 - Do I Need To Switch To GolangEuroPython 2016 - Do I Need To Switch To Golang
EuroPython 2016 - Do I Need To Switch To Golang
 
"PyTorch Deep Learning Framework: Status and Directions," a Presentation from...
"PyTorch Deep Learning Framework: Status and Directions," a Presentation from..."PyTorch Deep Learning Framework: Status and Directions," a Presentation from...
"PyTorch Deep Learning Framework: Status and Directions," a Presentation from...
 
Python于Web 2.0网站的应用 - QCon Beijing 2010
Python于Web 2.0网站的应用 - QCon Beijing 2010Python于Web 2.0网站的应用 - QCon Beijing 2010
Python于Web 2.0网站的应用 - QCon Beijing 2010
 
Introduction to PyTorch
Introduction to PyTorchIntroduction to PyTorch
Introduction to PyTorch
 
Dive Into PyTorch
Dive Into PyTorchDive Into PyTorch
Dive Into PyTorch
 
MTL Versus Free
MTL Versus FreeMTL Versus Free
MTL Versus Free
 
Scala @ TomTom
Scala @ TomTomScala @ TomTom
Scala @ TomTom
 
Hammurabi
HammurabiHammurabi
Hammurabi
 
Rainer Grimm, “Functional Programming in C++11”
Rainer Grimm, “Functional Programming in C++11”Rainer Grimm, “Functional Programming in C++11”
Rainer Grimm, “Functional Programming in C++11”
 
Kotlin Slides from Devoxx 2011
Kotlin Slides from Devoxx 2011Kotlin Slides from Devoxx 2011
Kotlin Slides from Devoxx 2011
 
Introduction to functional programming using Ocaml
Introduction to functional programming using OcamlIntroduction to functional programming using Ocaml
Introduction to functional programming using Ocaml
 
The Next Great Functional Programming Language
The Next Great Functional Programming LanguageThe Next Great Functional Programming Language
The Next Great Functional Programming Language
 
Introduction to NumPy for Machine Learning Programmers
Introduction to NumPy for Machine Learning ProgrammersIntroduction to NumPy for Machine Learning Programmers
Introduction to NumPy for Machine Learning Programmers
 
Swift for tensorflow
Swift for tensorflowSwift for tensorflow
Swift for tensorflow
 
Machine learning with py torch
Machine learning with py torchMachine learning with py torch
Machine learning with py torch
 
The best language in the world
The best language in the worldThe best language in the world
The best language in the world
 
Python tour
Python tourPython tour
Python tour
 
Humble introduction to category theory in haskell
Humble introduction to category theory in haskellHumble introduction to category theory in haskell
Humble introduction to category theory in haskell
 
Python fundamentals - basic | WeiYuan
Python fundamentals - basic | WeiYuanPython fundamentals - basic | WeiYuan
Python fundamentals - basic | WeiYuan
 
Java Class Design
Java Class DesignJava Class Design
Java Class Design
 

Similar to Beyond tf idf why, what & how

More Data, More Problems: Evolving big data machine learning pipelines with S...
More Data, More Problems: Evolving big data machine learning pipelines with S...More Data, More Problems: Evolving big data machine learning pipelines with S...
More Data, More Problems: Evolving big data machine learning pipelines with S...Alex Sadovsky
 
Gpu programming with java
Gpu programming with javaGpu programming with java
Gpu programming with javaGary Sieling
 
Concepts of Functional Programming for Java Brains (2010)
Concepts of Functional Programming for Java Brains (2010)Concepts of Functional Programming for Java Brains (2010)
Concepts of Functional Programming for Java Brains (2010)Peter Kofler
 
Yin Yangs of Software Development
Yin Yangs of Software DevelopmentYin Yangs of Software Development
Yin Yangs of Software DevelopmentNaveenkumar Muguda
 
Multidimensional Indexing
Multidimensional IndexingMultidimensional Indexing
Multidimensional IndexingDigvijay Singh
 
Apriori algorithm
Apriori algorithm Apriori algorithm
Apriori algorithm DHIVYADEVAKI
 
Hash - A probabilistic approach for big data
Hash - A probabilistic approach for big dataHash - A probabilistic approach for big data
Hash - A probabilistic approach for big dataLuca Mastrostefano
 
Java/Scala Lab: Анатолий Кметюк - Scala SubScript: Алгебра для реактивного пр...
Java/Scala Lab: Анатолий Кметюк - Scala SubScript: Алгебра для реактивного пр...Java/Scala Lab: Анатолий Кметюк - Scala SubScript: Алгебра для реактивного пр...
Java/Scala Lab: Анатолий Кметюк - Scala SubScript: Алгебра для реактивного пр...GeeksLab Odessa
 
A Development of Log-based Game AI using Deep Learning
A Development of Log-based Game AI using Deep LearningA Development of Log-based Game AI using Deep Learning
A Development of Log-based Game AI using Deep LearningSuntae Kim
 
Introduction to elasticsearch
Introduction to elasticsearchIntroduction to elasticsearch
Introduction to elasticsearchhypto
 
RedisConf17 - Searching Billions of Documents with Redis
RedisConf17 - Searching Billions of Documents with RedisRedisConf17 - Searching Billions of Documents with Redis
RedisConf17 - Searching Billions of Documents with RedisRedis Labs
 
Python高级编程(二)
Python高级编程(二)Python高级编程(二)
Python高级编程(二)Qiangning Hong
 
Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeWim Godden
 
Scalding big ADta
Scalding big ADtaScalding big ADta
Scalding big ADtab0ris_1
 
Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeWim Godden
 
Ponies and Unicorns With Scala
Ponies and Unicorns With ScalaPonies and Unicorns With Scala
Ponies and Unicorns With ScalaTomer Gabel
 
JDD 2016 - Tomasz Borek - DB for next project? Why, Postgres, of course
JDD 2016 - Tomasz Borek - DB for next project? Why, Postgres, of course JDD 2016 - Tomasz Borek - DB for next project? Why, Postgres, of course
JDD 2016 - Tomasz Borek - DB for next project? Why, Postgres, of course PROIDEA
 
NOSQL101, Or: How I Learned To Stop Worrying And Love The Mongo!
NOSQL101, Or: How I Learned To Stop Worrying And Love The Mongo!NOSQL101, Or: How I Learned To Stop Worrying And Love The Mongo!
NOSQL101, Or: How I Learned To Stop Worrying And Love The Mongo!Daniel Cousineau
 
Python Performance: Single-threaded, multi-threaded, and Gevent
Python Performance: Single-threaded, multi-threaded, and GeventPython Performance: Single-threaded, multi-threaded, and Gevent
Python Performance: Single-threaded, multi-threaded, and Geventemptysquare
 

Similar to Beyond tf idf why, what & how (20)

More Data, More Problems: Evolving big data machine learning pipelines with S...
More Data, More Problems: Evolving big data machine learning pipelines with S...More Data, More Problems: Evolving big data machine learning pipelines with S...
More Data, More Problems: Evolving big data machine learning pipelines with S...
 
Gpu programming with java
Gpu programming with javaGpu programming with java
Gpu programming with java
 
Concepts of Functional Programming for Java Brains (2010)
Concepts of Functional Programming for Java Brains (2010)Concepts of Functional Programming for Java Brains (2010)
Concepts of Functional Programming for Java Brains (2010)
 
Yin Yangs of Software Development
Yin Yangs of Software DevelopmentYin Yangs of Software Development
Yin Yangs of Software Development
 
Multidimensional Indexing
Multidimensional IndexingMultidimensional Indexing
Multidimensional Indexing
 
Apriori algorithm
Apriori algorithm Apriori algorithm
Apriori algorithm
 
Hash - A probabilistic approach for big data
Hash - A probabilistic approach for big dataHash - A probabilistic approach for big data
Hash - A probabilistic approach for big data
 
Java/Scala Lab: Анатолий Кметюк - Scala SubScript: Алгебра для реактивного пр...
Java/Scala Lab: Анатолий Кметюк - Scala SubScript: Алгебра для реактивного пр...Java/Scala Lab: Анатолий Кметюк - Scala SubScript: Алгебра для реактивного пр...
Java/Scala Lab: Анатолий Кметюк - Scala SubScript: Алгебра для реактивного пр...
 
Taint analysis
Taint analysisTaint analysis
Taint analysis
 
A Development of Log-based Game AI using Deep Learning
A Development of Log-based Game AI using Deep LearningA Development of Log-based Game AI using Deep Learning
A Development of Log-based Game AI using Deep Learning
 
Introduction to elasticsearch
Introduction to elasticsearchIntroduction to elasticsearch
Introduction to elasticsearch
 
RedisConf17 - Searching Billions of Documents with Redis
RedisConf17 - Searching Billions of Documents with RedisRedisConf17 - Searching Billions of Documents with Redis
RedisConf17 - Searching Billions of Documents with Redis
 
Python高级编程(二)
Python高级编程(二)Python高级编程(二)
Python高级编程(二)
 
Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the code
 
Scalding big ADta
Scalding big ADtaScalding big ADta
Scalding big ADta
 
Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the code
 
Ponies and Unicorns With Scala
Ponies and Unicorns With ScalaPonies and Unicorns With Scala
Ponies and Unicorns With Scala
 
JDD 2016 - Tomasz Borek - DB for next project? Why, Postgres, of course
JDD 2016 - Tomasz Borek - DB for next project? Why, Postgres, of course JDD 2016 - Tomasz Borek - DB for next project? Why, Postgres, of course
JDD 2016 - Tomasz Borek - DB for next project? Why, Postgres, of course
 
NOSQL101, Or: How I Learned To Stop Worrying And Love The Mongo!
NOSQL101, Or: How I Learned To Stop Worrying And Love The Mongo!NOSQL101, Or: How I Learned To Stop Worrying And Love The Mongo!
NOSQL101, Or: How I Learned To Stop Worrying And Love The Mongo!
 
Python Performance: Single-threaded, multi-threaded, and Gevent
Python Performance: Single-threaded, multi-threaded, and GeventPython Performance: Single-threaded, multi-threaded, and Gevent
Python Performance: Single-threaded, multi-threaded, and Gevent
 

More from lucenerevolution

Text Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and LuceneText Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and Lucenelucenerevolution
 
State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! lucenerevolution
 
Building Client-side Search Applications with Solr
Building Client-side Search Applications with SolrBuilding Client-side Search Applications with Solr
Building Client-side Search Applications with Solrlucenerevolution
 
Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationslucenerevolution
 
Scaling Solr with SolrCloud
Scaling Solr with SolrCloudScaling Solr with SolrCloud
Scaling Solr with SolrCloudlucenerevolution
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusterslucenerevolution
 
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and ParboiledImplementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiledlucenerevolution
 
Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs lucenerevolution
 
Enhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchEnhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchlucenerevolution
 
Real-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and StormReal-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and Stormlucenerevolution
 
Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?lucenerevolution
 
Schemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APISchemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APIlucenerevolution
 
High Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with LuceneHigh Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with Lucenelucenerevolution
 
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMText Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMlucenerevolution
 
Faceted Search with Lucene
Faceted Search with LuceneFaceted Search with Lucene
Faceted Search with Lucenelucenerevolution
 
Recent Additions to Lucene Arsenal
Recent Additions to Lucene ArsenalRecent Additions to Lucene Arsenal
Recent Additions to Lucene Arsenallucenerevolution
 
Turning search upside down
Turning search upside downTurning search upside down
Turning search upside downlucenerevolution
 
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...lucenerevolution
 
Shrinking the haystack wes caldwell - final
Shrinking the haystack   wes caldwell - finalShrinking the haystack   wes caldwell - final
Shrinking the haystack wes caldwell - finallucenerevolution
 

More from lucenerevolution (20)

Text Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and LuceneText Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and Lucene
 
State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here!
 
Search at Twitter
Search at TwitterSearch at Twitter
Search at Twitter
 
Building Client-side Search Applications with Solr
Building Client-side Search Applications with SolrBuilding Client-side Search Applications with Solr
Building Client-side Search Applications with Solr
 
Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applications
 
Scaling Solr with SolrCloud
Scaling Solr with SolrCloudScaling Solr with SolrCloud
Scaling Solr with SolrCloud
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusters
 
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and ParboiledImplementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
 
Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs
 
Enhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchEnhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic search
 
Real-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and StormReal-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and Storm
 
Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?
 
Schemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APISchemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST API
 
High Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with LuceneHigh Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with Lucene
 
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMText Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
 
Faceted Search with Lucene
Faceted Search with LuceneFaceted Search with Lucene
Faceted Search with Lucene
 
Recent Additions to Lucene Arsenal
Recent Additions to Lucene ArsenalRecent Additions to Lucene Arsenal
Recent Additions to Lucene Arsenal
 
Turning search upside down
Turning search upside downTurning search upside down
Turning search upside down
 
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
 
Shrinking the haystack wes caldwell - final
Shrinking the haystack   wes caldwell - finalShrinking the haystack   wes caldwell - final
Shrinking the haystack wes caldwell - final
 

Recently uploaded

AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptxAUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptxiammrhaywood
 
Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)cama23
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management SystemChristalin Nelson
 
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfVirtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfErwinPantujan2
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONHumphrey A Beña
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Mark Reed
 
ENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomnelietumpap1
 
Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Seán Kennedy
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptxSherlyMaeNeri
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfMr Bounab Samir
 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptxmary850239
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatYousafMalik24
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...Nguyen Thanh Tu Collection
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSJoshuaGantuangco2
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptxmary850239
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxHumphrey A Beña
 
FILIPINO PSYCHology sikolohiyang pilipino
FILIPINO PSYCHology sikolohiyang pilipinoFILIPINO PSYCHology sikolohiyang pilipino
FILIPINO PSYCHology sikolohiyang pilipinojohnmickonozaleda
 

Recently uploaded (20)

Raw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptxRaw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptx
 
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptxAUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
 
Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management System
 
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfVirtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)
 
ENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choom
 
Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptx
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
 
FILIPINO PSYCHology sikolohiyang pilipino
FILIPINO PSYCHology sikolohiyang pilipinoFILIPINO PSYCHology sikolohiyang pilipino
FILIPINO PSYCHology sikolohiyang pilipino
 
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptxYOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
 

Beyond tf idf why, what & how