Solr: Search at the Speed of Light
Erik Hatcher
Erik Hatcher's JavaZone '09 slides for "Solr: Search at the Speed of Light"
1.
Solr: Search at the Speed of Light
JavaZone 2009, September 10, Oslo
Erik Hatcher, Lucid Imagination
erik.hatcher@lucidimagination.com
2.
Solr History
• Created by Yonik Seeley for CNET
• Contributed to Apache in January 2006
• December 2006: Version 1.1 released
• June 2007: Version 1.2 released
• September 2008: Version 1.3 released
• ~September 2009: Version 1.4
http://lucene.apache.org/solr
© 2008-2009 Lucid Imagination, Inc.
3.
Solr: Big Picture
[Diagram: Data (DB, Documents) → Solr → Search Results]
4.
Features
• Lucene power exposed over HTTP
• Scalability: caching, replication, distributed search
• Faceting
• And more: spell checking, highlighting, clustering, rich document and DB indexing, "more like this"
5.
Lucene
• Fast, scalable search library
• Lucene index structure
  • Index contains documents
  • Documents have fields
  • Indexed fields have terms
6.
Inverted Index
• Commonly used search engine data structure
• Efficient lookup of terms across a large number of documents
• Usually stores positional information to enable phrase/proximity queries
[Figure from "Taming Text" by Grant Ingersoll and Tom Morton]
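To make the inverted index bullets concrete, here is a minimal Python sketch of a positional inverted index and a phrase lookup over it. It is an illustration only, not Lucene's actual index format; the names `build_index` and `phrase_match` and the sample documents are made up for this example, and tokenization is plain lowercase whitespace splitting.

```python
from collections import defaultdict

def build_index(docs):
    """Map each term to {doc_id: [positions]} (a positional inverted index)."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(position)
    return index

def phrase_match(index, phrase):
    """Return sorted ids of documents that contain the phrase terms consecutively."""
    terms = phrase.lower().split()
    if not terms or any(t not in index for t in terms):
        return []
    # Only documents containing every term can match the phrase.
    candidates = set(index[terms[0]])
    for t in terms[1:]:
        candidates &= set(index[t])
    hits = []
    for doc_id in sorted(candidates):
        # Stored positions let us check that the terms are adjacent.
        for start in index[terms[0]][doc_id]:
            if all(start + i in index[t][doc_id] for i, t in enumerate(terms)):
                hits.append(doc_id)
                break
    return hits

# Hypothetical sample documents
docs = {
    1: "the quick brown fox",
    2: "the brown quick fox",
    3: "a quick brown dog",
}
index = build_index(docs)
print(sorted(index["quick"]))              # ids of documents containing "quick"
print(phrase_match(index, "quick brown"))  # ids of documents containing the phrase
```

The term lookup is a single dictionary access regardless of how many documents exist, which is the point of the "efficient lookup" bullet; the positions are only consulted for phrase queries.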
7.
Analysis Process
8.
Analyzing the analyzer
Example phrase:
The quick brown fox jumps over the lazy dog.
9.
WhitespaceAnalyzer
Simplest built-in analyzer
The quick brown fox jumps over the lazy dog.
[The] [quick] [brown] [fox] [jumps] [over] [the] [lazy] [dog.]
10.
SimpleAnalyzer
Lowercases, splits at non-letter boundaries
the quick brown fox jumps over the lazy dog.
[the] [quick] [brown] [fox] [jumps] [over] [the] [lazy] [dog]
11.
StopAnalyzer
Lowercases and removes stop words
The quick brown fox jumps over the lazy dog.
[quick] [brown] [fox] [jumps] [over] [lazy] [dog]
12.
SnowballAnalyzer
Stemming algorithm
The quick brown fox jumps over the lazi dog.
[the] [quick] [brown] [fox] [jump] [over] [the] [lazi] [dog]
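The analyzer slides can be approximated in a few lines of Python. This is a rough sketch of the behaviour shown for WhitespaceAnalyzer, SimpleAnalyzer and StopAnalyzer, not a port of the Lucene classes; the function names and the `STOP_WORDS` set are assumptions made for illustration (the set is a small subset, not Lucene's stop list), and stemming is left out because a faithful Snowball stemmer does not fit in a short example.

```python
import re

# Illustrative subset only; not Lucene's actual stop word list.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in"}

def whitespace_tokens(text):
    """Split on whitespace only; case and punctuation are kept."""
    return text.split()

def simple_tokens(text):
    """Lowercase, then keep maximal runs of letters (split at non-letters)."""
    return re.findall(r"[^\W\d_]+", text.lower())

def stop_tokens(text):
    """Like simple_tokens, but with stop words removed."""
    return [t for t in simple_tokens(text) if t not in STOP_WORDS]

phrase = "The quick brown fox jumps over the lazy dog."
for analyze in (whitespace_tokens, simple_tokens, stop_tokens):
    print(analyze.__name__, analyze(phrase))
```

Whatever analysis is applied at index time generally has to be applied to queries as well, otherwise a query for "Dog" would not find the indexed term "dog".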
13.
What's in a token?
14.
Relevance
• Term frequency (TF): number of times a term appears in a document
• Inverse document frequency (IDF): one over the number of documents in the index that contain the term (1/df)
• Field length normalization: controls the effect that field length, in number of terms, has on the score
• Boost factors: terms, fields, or documents
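As a rough illustration of how these factors combine, the sketch below multiplies them together for a single term. It uses the slide's simplified definitions (raw term count, 1/df with df taken as the number of documents containing the term) plus an assumed length normalization of 1/sqrt(number of terms); Lucene's real similarity uses dampened variants of these factors, so treat the function `score` and the sample data as hypothetical.

```python
import math

def score(term, doc_tokens, all_docs, boost=1.0):
    """Toy score: tf * idf * length_norm * boost, using simplified definitions."""
    tf = doc_tokens.count(term)                               # term frequency
    df = sum(1 for tokens in all_docs if term in tokens)      # document frequency
    if tf == 0 or df == 0:
        return 0.0
    idf = 1.0 / df                                            # slide's 1/df
    length_norm = 1.0 / math.sqrt(len(doc_tokens))            # assumed normalization
    return tf * idf * length_norm * boost

# Hypothetical, already-analyzed documents
docs = [
    ["solr", "search", "server"],
    ["lucene", "search", "library", "search"],
    ["apache", "solr"],
]
for i, tokens in enumerate(docs):
    print(i, score("solr", tokens, docs))
```

With this data the two-term document outscores the three-term one for "solr", which shows the length normalization at work: the same single match counts for more in a shorter field.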
15.
Lucene Scoring
[Figure: document vector d1 and query vector q1 separated by angle Θ]
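The d1, q1 and Θ labels suggest the vector space view of scoring, where a document ranks higher the smaller the angle between its term vector and the query's. A minimal sketch of that idea, assuming raw term counts as vector weights (Lucene's practical scoring formula adds the weighting and boost factors on top), with `cosine` being a name chosen for this example:

```python
import math
from collections import Counter

def cosine(query_tokens, doc_tokens):
    """Cosine of the angle between two term-count vectors (1.0 = same direction)."""
    q, d = Counter(query_tokens), Counter(doc_tokens)
    dot = sum(q[t] * d[t] for t in q)  # Counter returns 0 for absent terms
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(sum(v * v for v in d.values()))
    return dot / norm if norm else 0.0

q1 = ["quick", "fox"]
d1 = ["the", "quick", "brown", "fox"]
d2 = ["the", "lazy", "dog"]
print(cosine(q1, d1), cosine(q1, d2))
```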
16.
Solr APIs
• HTTP GET/POST (curl or any other HTTP client)
• JSON
• SolrJ (embedded or HTTP)
• solr-ruby
• python, PHP, solrsharp, XSLT
17.
Solr in Production
[Diagram: incoming search requests → load balancer → Solr servers → shard requests → load balancers → shards 1..n, each with a master and replicants]
18.
Getting Started: It's This Easy
1. Start Solr
   java -jar start.jar
2. Index your data
   java -jar post.jar *.xml
3. Search
   http://localhost:8983/solr
19.
Configuration
• schema.xml
  • field types and fields
• solrconfig.xml
  • request handler mappings
  • cache settings: filter, query, document
  • warming listeners
  • HTTP cache settings
  • Lucene index parameters
  • plugins: spell checking, highlighting
20.
Solr add/update XML
<add><doc>
  <field name="id">MA147LL/A</field>
  <field name="name">Apple 60 GB iPod with Video Playback Black</field>
  <field name="manu">Apple Computer Inc.</field>
  <field name="cat">electronics</field>
  <field name="cat">music</field>
  <field name="features">iTunes, Podcasts, Audiobooks</field>
  <field name="features">Stores up to 15,000 songs, 25,000 photos, or 150 hours of video</field>
  <field name="features">2.5-inch, 320x240 color TFT LCD display with LED backlight</field>
  <field name="features">Up to 20 hours of battery life</field>
  <field name="features">Plays AAC, MP3, WAV, AIFF, Audible, Apple Lossless, H.264 video</field>
  <field name="features">Notes, Calendar, Phone book, Hold button, Date display, Photo wallet, Built-in games, JPEG photo playback, Upgradeable firmware, USB 2.0 compatibility, Playback speed control, Rechargeable capability, Battery level indication</field>
  <field name="includes">earbud headphones, USB cable</field>
  <field name="weight">5.5</field>
  <field name="price">399.00</field>
  <field name="popularity">10</field>
  <field name="inStock">true</field>
</doc></add>
21.
Indexing Solr XML
• Via curl:
  curl 'http://localhost:8983/solr/update?commit=true' --data-binary @ipod_video.xml -H 'Content-type:text/xml; charset=utf-8'
• Via Solr's Java-based post tool:
  java -jar post.jar ipod_video.xml
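The same update can be issued from any HTTP client. The sketch below builds an `<add>` message and the matching POST request with Python's standard library, mirroring the curl call (same URL, body and Content-type header). The helper names `build_add_xml` and `build_update_request` are made up for this example, and the request is only constructed, not sent, since sending needs a running Solr at the assumed address.

```python
import urllib.request
from xml.etree import ElementTree as ET

def build_add_xml(fields):
    """fields: list of (name, value) pairs; repeated names give multi-valued fields."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in fields:
        field = ET.SubElement(doc, "field", name=name)
        field.text = str(value)  # ElementTree escapes &, < and > on output
    return ET.tostring(add, encoding="utf-8")

def build_update_request(xml_bytes, base_url="http://localhost:8983/solr"):
    """Build (but do not send) the POST that the curl example performs."""
    return urllib.request.Request(
        base_url + "/update?commit=true",
        data=xml_bytes,
        headers={"Content-type": "text/xml; charset=utf-8"},
        method="POST",
    )

payload = build_add_xml([("id", "MA147LL/A"), ("cat", "electronics"), ("cat", "music")])
request = build_update_request(payload)
print(request.get_method(), request.full_url)
# urllib.request.urlopen(request) would send it to a running Solr instance.
```

As with the SolrJ advice elsewhere in the deck, committing once per batch rather than on every request is usually preferable; `commit=true` is kept here only to match the curl example.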
22.
Indexing CSV
curl 'http://localhost:8983/solr/update/csv?commit=true' --data-binary @books.csv -H 'Content-type:text/plain; charset=utf-8'
23.
Content Streams
• Allows the Solr server to fetch local or remote data itself. Remote streaming must be enabled in solrconfig.xml
• http://localhost:8983/solr/update?stream.file=<local Solr path to exampledocs>/ipod_video.xml
• &stream.url=<url to content>
• Security warning: allows Solr to fetch arbitrary server-side file or network URL content
24.
Indexing Rich Documents
curl 'http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true&extractOnly=true&wt=ruby&indent=on' -F "myfile=@tutorial.html"
25.
Indexing with SolrJ
import java.net.URL;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

SolrServer solr = new CommonsHttpSolrServer(new URL("http://localhost:8983/solr"));
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "JAVAZONE_09");
doc.addField("title", "JavaZone 2009 SolrJ Example");
solr.add(doc);
solr.commit(); // after a batch, not per document
solr.optimize(); // periodically, when needed
26.
Indexing with Ruby
solr = Connection.new('http://localhost:8983/solr', :autocommit => :on)
solr.add(:id => 123, :title => 'Solr in Action')
solr.optimize # periodically, as needed
27.
Data Import Handler
• Indexes relational databases, XML data sources, e-mail, and more
• Supports full and incremental/delta indexing
• Extensible with custom data sources, transformers, etc.
• http://wiki.apache.org/solr/DataImportHandler
28.
DB Indexing
http://localhost:8983/solr/db/dataimport?command=full-import
29.
Example Search Request
• http://localhost:8983/solr/select?q=query
• &start=50
• &rows=25
• &fq=filter+query
• &facet=on&facet.field=category
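The request above is an ordinary URL with query parameters, so it can be assembled with any URL library. A small sketch using Python's `urllib.parse`; the helper name `build_select_url` and its defaults are assumptions for this example, while the parameter names (`q`, `start`, `rows`, `fq`, `facet`, `facet.field`) are the ones shown on the slide.

```python
from urllib.parse import urlencode

def build_select_url(query, start=0, rows=10, filters=(), facet_fields=(),
                     base_url="http://localhost:8983/solr"):
    """Assemble a /select URL; fq and facet.field may be repeated."""
    params = [("q", query), ("start", start), ("rows", rows)]
    params += [("fq", fq) for fq in filters]
    if facet_fields:
        params.append(("facet", "on"))
        params += [("facet.field", field) for field in facet_fields]
    # urlencode percent-encodes values and turns spaces into "+"
    return base_url + "/select?" + urlencode(params)

url = build_select_url("query", start=50, rows=25,
                       filters=["filter query"], facet_fields=["category"])
print(url)
```

Here `start` and `rows` page through the result list, while each `fq` narrows the result set without affecting relevance scores.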
30.
Debug Query
• &debugQuery=true is your friend
• Includes parsed query, explanations, and search component timings in response
31.
Query Parser
• Controlled by defType parameter
• &defType=lucene (actually a Solr extension of Lucene's QueryParser)
• &defType=dismax
• Local {!..} override syntax
32.
Solr Query Parser
• http://lucene.apache.org/java/2_4_0/queryparsersyntax.html + Solr extensions
• Kitchen sink parser, includes advanced user-unfriendly syntax
• Syntax errors throw parse exceptions back to client
• Example: title:ipod* AND price:[0 TO 100]
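Because syntax errors are thrown back to the client, raw user input is commonly escaped before it is embedded in a query for this parser. The sketch below backslash-escapes the characters that the Lucene query syntax treats as operators. The function name `escape_term` is hypothetical and the character set is an assumption based on the syntax page linked above; client libraries may ship their own escaping helper, which should be preferred where available.

```python
import re

# Characters with special meaning in the Lucene query syntax (assumed set).
_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\&|])')

def escape_term(text):
    """Prefix every special character with a backslash."""
    return _SPECIAL.sub(r"\\\1", text)

user_input = "ipod (video)"
query = "name:" + escape_term(user_input)
print(query)
```

Escaping is only appropriate for text that should be matched literally; a query such as title:ipod* relies on the unescaped wildcard and field separator.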
33.
Dismax Query Parser
• Simplified syntax: loose text “quote phrases” -prohibited +required
• Spreads query terms across query fields (qf) with dynamic boosting per field, implicit phrase construction (pf), boosting function (bf), boosting query (bq), and minimum match (mm)
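"Dismax" is short for disjunction max: for each query term, the score is taken from the best-matching field rather than summed over all fields. The sketch below illustrates that combination rule under stated assumptions: per-field scores are given as plain numbers, `boosts` stands in for the per-field boosts from qf, and `tie` is a tie-breaker weight for the non-winning fields, a detail that is not mentioned on the slide. The function name `dismax_score` is made up for this example.

```python
def dismax_score(field_scores, boosts, tie=0.0):
    """Combine one term's per-field scores: best field wins, others add tie * score."""
    weighted = [score * boosts.get(field, 1.0) for field, score in field_scores.items()]
    if not weighted:
        return 0.0
    best = max(weighted)
    return best + tie * (sum(weighted) - best)

# Hypothetical scores of the term "ipod" in two fields, with a boost on the title field
scores = {"title": 0.5, "body": 0.25}
print(dismax_score(scores, {"title": 2.0}))            # best field only
print(dismax_score(scores, {"title": 2.0}, tie=0.5))   # other fields contribute a little
```

Taking the maximum keeps a document that matches one term in many fields from outranking a document that matches several different terms.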
34.
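Searching with SolrJ
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
SolrQuery params = new SolrQuery("author:John");
params.setFields("*,score");
params.setRows(3);
QueryResponse response = server.query(params);
for (SolrDocument document : response.getResults()) {
  System.out.println("Doc: " + document);
}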
35.
Searching with Ruby
conn = Connection.new('http://localhost:8983/solr')
conn.query('my query') do |hit|
  puts hit.inspect
end
36.
delete, update, etc
• Delete:
  • <delete><id>05991</id></delete>
  • <delete><query>category:Unused</query></delete>
  • java -Ddata=args -jar post.jar "<delete><query>*:*</query></delete>"
• Update: simply <add> doc with same unique key
• Commit: <commit/>
• Optimize: <optimize/>
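When delete messages are built from variable data, generating them with an XML library avoids malformed documents if an id or query contains characters such as & or <. A minimal sketch with Python's standard library; the helper names `delete_by_id` and `delete_by_query` are made up for this example, and the resulting strings would be POSTed to the update handler like any other update message.

```python
from xml.etree import ElementTree as ET

def delete_by_id(doc_id):
    """Build a <delete><id>…</id></delete> message."""
    root = ET.Element("delete")
    ET.SubElement(root, "id").text = str(doc_id)
    return ET.tostring(root, encoding="unicode")

def delete_by_query(query):
    """Build a <delete><query>…</query></delete> message; XML escaping is automatic."""
    root = ET.Element("delete")
    ET.SubElement(root, "query").text = query
    return ET.tostring(root, encoding="unicode")

print(delete_by_id("05991"))
print(delete_by_query("category:Unused"))
```

Deletes, like adds, only become visible to searches after a commit.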
37.
Faceting
• Counts per subset within results
• Facet on: field terms, queries, date ranges
• &facet=on&facet.field=cat&facet.query=price:[0 TO 100]
• http://wiki.apache.org/solr/SimpleFacetParameters
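On the client side, facet counts come back as part of the search response. The sketch below reads field facet counts from a JSON response, assuming the request used wt=json and that field facets arrive as a flat list alternating value and count; the sample response text is hand-written for illustration, not captured from a real server, and `facet_counts` is a name chosen for this example.

```python
import json

def facet_counts(response_text, field):
    """Return [(value, count), ...] for one facet field of a JSON response."""
    data = json.loads(response_text)
    flat = data["facet_counts"]["facet_fields"][field]
    # Pair up the alternating value/count entries.
    return list(zip(flat[0::2], flat[1::2]))

# Hand-written sample in the assumed response shape
sample = """
{"facet_counts": {
   "facet_queries": {"price:[0 TO 100]": 3},
   "facet_fields": {"cat": ["electronics", 14, "music", 1]}}}
"""
for value, count in facet_counts(sample, "cat"):
    print(value, count)
```

A typical UI renders each value with its count as a link that adds a matching fq filter to the next request.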
38.
Spell checking
• Not enabled by default, see example config to wire it in
• http://localhost:8983/solr/spell?q=epod&spellcheck=on&spellcheck.build=true
• File or index-based dictionaries
• Supports pluggable distance algorithms: Levenshtein and JaroWinkler
• http://wiki.apache.org/solr/SpellCheckComponent
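To show what a distance algorithm contributes, here is a textbook Levenshtein edit distance and a naive suggestion function that picks the closest dictionary word. This is an illustration of the idea only: Solr's spell checker works differently internally and does not scan the whole dictionary like this, and the names `levenshtein` and `suggest` and the sample dictionary are made up for the example.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

def suggest(word, dictionary):
    """Closest dictionary word by edit distance (ties broken alphabetically)."""
    return min(dictionary, key=lambda candidate: (levenshtein(word, candidate), candidate))

print(suggest("epod", ["ipod", "video", "apple"]))
```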
39.
Highlighting
• http://localhost:8983/solr/select?q=ipod&hl=on&hl.fl=manu,name
• http://wiki.apache.org/solr/HighlightingParameters
40.
More Like This
• http://localhost:8983/solr/select?q=ipod&mlt=true&mlt.fl=manu,cat&mlt.mindf=1&mlt.mintf=1&fl=id,score,name
• http://wiki.apache.org/solr/MoreLikeThis
41.
Scaling: Query Throughput
• Replication
  • slaves poll master for index updates
  • transfers index files from master to slave
  • configuration files can also be transferred
  • entirely Java/HTTP-based in Solr 1.4 (prior versions used rsync)
42.
Scaling: Collection Size
• Distribution
  • Index documents across shards
  • query single server with shards parameter
  • sends requests to each shard
  • aggregates result to a single response
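The flow described above can be sketched in two parts: the client adds a `shards` parameter listing the shard addresses, and the receiving server merges the per-shard results into one ranked list. The sketch below shows both under simplifying assumptions: the host names are placeholders, each shard result is a list of (score, id) pairs already sorted by descending score, and the real merge inside Solr involves more than this. The function names are made up for the example.

```python
import heapq
from itertools import islice
from urllib.parse import urlencode

def build_distributed_url(query, shards, base_url="http://localhost:8983/solr"):
    """Query one server and list every shard in the shards parameter."""
    return base_url + "/select?" + urlencode({"q": query, "shards": ",".join(shards)})

def merge_shard_results(shard_results, rows):
    """Merge per-shard hit lists (sorted by descending score) and keep the top rows."""
    merged = heapq.merge(*shard_results, key=lambda hit: hit[0], reverse=True)
    return list(islice(merged, rows))

# Placeholder shard addresses and hand-made hit lists
url = build_distributed_url("ipod", ["localhost:8983/solr", "localhost:7574/solr"])
shard_a = [(0.9, "a1"), (0.4, "a2")]
shard_b = [(0.7, "b1"), (0.5, "b2")]
print(url)
print(merge_shard_results([shard_a, shard_b], rows=3))
```

Since every shard only returns its own top hits, the merge touches at most rows × number-of-shards entries, independent of the total collection size.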
43.
Solr-powered UI
• Solritas (from "celeritas"): VelocityResponseWriter
  • easily templated output
• SolrJS: jQuery-based widgets
  • see http://solrjs.solrstuff.org/
• Blacklight and Flare: RoR plugins
44.
Lucene in Action, 2nd Edition
http://www.manning.com/lucene
45.
Search at Lucid
http://search.lucidimagination.com/?q=javazone
46.
Who We Are
• Lucid Imagination is the commercial entity of Apache Lucene/Solr ecosystem
• Lucid team includes the largest collection of Lucene/Solr committers, contributors and influencers
• Our mission is to serve as the Center of Excellence for Lucene/Solr-based search solutions
• Help our customers get the most out of Lucene/Solr, the world's most widely used open source search software
Lucene, Solr and their logos are trademarks of the Apache Software Foundation
47.
Lucid Imagination Technical Team
Unique Combination of Enterprise Search and Lucene Expertise
48.
Lucid Imagination Business Model
49.
Thank you
http://www.lucidimagination.com