Make Your Data 
Searchable 
With Solr in 25 Minutes 
Kai Chan 
BruinTech Tech-a-Thon, November 19, 2013
The Goal 
(figure: a body of data with the label "find this")
The Goal 
• objectives 
o find something in the (text) data 
o get the results fast 
o get the most relevant results first 
o avoid getting the not-so-relevant results first 
• (one) solution: Solr
What Solr is 
• used by high-profile websites like Twitter 
… and interesting projects like NewsScape 
• open-source, full-text search platform 
• uses Lucene for indexing and searching 
• standalone process/program (typically) 
• REST-like API over HTTP 
• different output formats (XML, JSON, CSV)
How to Talk To Solr 
• have front-end/browser make HTTP requests 
• language-specific clients 
o .Net 
o Java 
o PHP 
o Python 
o Ruby 
• integration with other applications 
o Moodle 
o Drupal 
o Plone
How Solr works 
• a query (i.e. search criteria) goes to Solr; Solr returns a result (i.e. things being looked for) 
• Solr answers the query from an index 
• Solr builds the index from the data to be searched 
• additions, updates, and deletions turn the index into index’; the same query can then return a different result’
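The flow on these slides can be illustrated with a toy inverted index. This is only a sketch of the general idea (data goes in, an index is built, queries are answered from the index, and changes to the index change the results); the class `ToyIndex` and its methods are hypothetical names, and nothing here reflects how Solr or Lucene is actually implemented.

```python
from collections import defaultdict


class ToyIndex:
    """Toy inverted index: maps each word to the ids of documents containing it."""

    def __init__(self):
        self.postings = defaultdict(set)  # word -> set of document ids
        self.docs = {}                    # document id -> original text

    def add(self, doc_id, text):
        """Add a document, or update it if the id already exists."""
        self.delete(doc_id)
        self.docs[doc_id] = text
        for word in text.lower().split():
            self.postings[word].add(doc_id)

    def delete(self, doc_id):
        """Remove a document and its postings; unknown ids are ignored."""
        text = self.docs.pop(doc_id, None)
        if text is None:
            return
        for word in text.lower().split():
            self.postings[word].discard(doc_id)

    def search(self, word):
        """Return the sorted ids of documents containing the word."""
        return sorted(self.postings.get(word.lower(), set()))


index = ToyIndex()
index.add(1, "debt ceiling negotiation")
index.add(2, "budget negotiation continues")
result = index.search("negotiation")        # query answered from the index

index.delete(1)                             # a deletion and an addition:
index.add(3, "negotiation ends")            # the index becomes index'
result_prime = index.search("negotiation")  # same query, different result'
```

Here `result` holds documents 1 and 2, while `result_prime` holds documents 2 and 3, because the index changed between the two identical queries.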
How Data Are Organized 
• a collection contains documents 
• a document contains fields
How Data Are Organized 
• example: documents with the fields subject, date, from, reply-to, text 
• a document may lack some fields (in the figure, subject and reply-to each appear in only two of the three documents)
How Data Are Organized 
• documents in the same collection can have entirely different fields 
o subject, date, from, text 
o title, SKU, price, description 
o first name, last name, phone, address
Solr Field Definition 
• field 
o name 
o type 
o options 
• field type 
o text: "string", "text_general" 
o numeric: "int", "long", "float", "double" 
• options 
o indexed: content can be searched 
o stored: content can be returned at search-time 
o multivalued: multiple values per field & document
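As a rough illustration of the name/type/options triple above, the sketch below renders field definitions as `<field .../>` elements of the kind found in a Solr `schema.xml`. The helper `field_element` and the three example fields are hypothetical, and the field types are assumed to be defined elsewhere in the schema.

```python
from xml.sax.saxutils import quoteattr


def field_element(name, field_type, indexed=True, stored=True, multi_valued=False):
    """Render one field definition (name, type, options) as a <field/> element."""
    attrs = {
        "name": name,
        "type": field_type,
        "indexed": str(indexed).lower(),         # content can be searched
        "stored": str(stored).lower(),           # content can be returned
        "multiValued": str(multi_valued).lower(),  # several values per document
    }
    rendered = " ".join(f"{key}={quoteattr(value)}" for key, value in attrs.items())
    return f"<field {rendered}/>"


# Hypothetical fields for a collection of messages.
subject_field = field_element("subject", "text_general")
price_field = field_element("price", "float", stored=False)
tags_field = field_element("tags", "string", multi_valued=True)
```

With these assumptions, `price_field` describes a field that can be searched but whose content is not returned at search time.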
Solr Dynamic Field 
• define field by naming convention 
• "amount_i": int, indexed, stored 
• "tag_ss": string, indexed, stored, multivalued 
name    type          indexed  stored  multiValued 
*_i     int           true     true    false 
*_l     long          true     true    false 
*_f     float         true     true    false 
*_d     double        true     true    false 
*_s     string        true     true    false 
*_ss    string        true     true    true 
*_t     text_general  true     true    false 
*_txt   text_general  true     true    true
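To make the naming convention concrete, the sketch below looks up a field name against the suffixes in the table. It only mimics the convention; the function `resolve_dynamic_field` is a hypothetical helper, not Solr's own matching code.

```python
# (suffix, type, multiValued); every pattern in the table is indexed and stored.
DYNAMIC_FIELDS = [
    ("_i", "int", False),
    ("_l", "long", False),
    ("_f", "float", False),
    ("_d", "double", False),
    ("_s", "string", False),
    ("_ss", "string", True),
    ("_t", "text_general", False),
    ("_txt", "text_general", True),
]


def resolve_dynamic_field(field_name):
    """Return the definition implied by the field name's suffix, or None."""
    for suffix, field_type, multi_valued in DYNAMIC_FIELDS:
        # Require at least one character before the suffix, as "*" suggests.
        if field_name.endswith(suffix) and len(field_name) > len(suffix):
            return {
                "type": field_type,
                "indexed": True,
                "stored": True,
                "multiValued": multi_valued,
            }
    return None


amount = resolve_dynamic_field("amount_i")
tag = resolve_dynamic_field("tag_ss")
unmatched = resolve_dynamic_field("title")
```

Under this sketch, `amount_i` resolves to a single-valued int, `tag_ss` to a multivalued string, and `title` matches no pattern.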
Getting Data into Solr 
• submit (post) files to Solr 
o XML 
o JSON 
o CSV 
• have Solr pull data from database or file 
o RDBMS 
o XML data locally (file) or remotely (HTTP) 
o extract data (XPath) 
o manipulate data (regex replace, strip HTML tags)
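Submitting JSON might look like the sketch below, which builds (but does not send) an HTTP POST carrying a JSON array of documents. The host, port, collection name `collection1`, the `id` field, and the helper `build_update_request` are all assumptions for illustration; check the update handler path and unique key of your own Solr setup.

```python
import json
import urllib.request


def build_update_request(base_url, documents):
    """Build a POST request that submits documents as a JSON array."""
    body = json.dumps(documents).encode("utf-8")
    return urllib.request.Request(
        url=base_url + "/update?commit=true",  # assumed update handler path
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical document using dynamic-field suffixes (_t, _i, _ss).
documents = [
    {
        "id": "1",
        "subject_t": "debt ceiling",
        "amount_i": 3,
        "tag_ss": ["news", "politics"],
    },
]

request = build_update_request("http://localhost:8983/solr/collection1", documents)
# urllib.request.urlopen(request) would send it to a running Solr server.
```

Building the request separately from sending it keeps the example runnable without a server.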
Searching Data in Solr 
• send request to http://host:port/solr/search 
• parameters 
o q - main query 
o fl - fields to return 
o sort - sort criteria 
o wt - response writer (e.g. xml, json) 
o indent - set to true for pretty-printing
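The parameters above can be combined into a request URL as in the following sketch. The base URL follows the pattern on the slide with `localhost:8983` filled in as an assumed host and port; `build_search_url` and the field names are hypothetical.

```python
from urllib.parse import urlencode


def build_search_url(base_url, q, fl=None, sort=None, wt="json", indent=True):
    """Build a search URL from the q, fl, sort, wt, and indent parameters."""
    params = {"q": q, "wt": wt}
    if fl:
        params["fl"] = ",".join(fl)   # fields to return
    if sort:
        params["sort"] = sort         # e.g. "date desc"
    if indent:
        params["indent"] = "true"     # pretty-print the response
    return base_url + "?" + urlencode(params)


url = build_search_url(
    "http://localhost:8983/solr/search",
    q='text:"debt ceiling"',
    fl=["subject", "date"],
    sort="date desc",
)
```

`urlencode` takes care of escaping the colon, quotes, and spaces in the query, which is easy to get wrong when assembling the URL by hand.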
Query Syntax 
• basic format: field name “:” word/phrase 
o text:negotiation 
o text:"debt ceiling"
Query Syntax 
• several clauses: separated by space 
o text:negotiation subject:debt 
• make the word/phrase required: “+” prefix 
o +text:negotiation +subject:debt 
• make the word/phrase prohibited: “-” prefix 
o text:negotiation -subject:debt
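The syntax rules on these two slides can be captured in a small helper, sketched below. The functions `clause` and `query` are hypothetical, and the sketch deliberately does not escape special characters inside the word or phrase.

```python
def clause(field, text, required=False, prohibited=False):
    """Build one clause: optional +/- prefix, field name, ":", word or phrase."""
    if required and prohibited:
        raise ValueError("a clause cannot be both required and prohibited")
    if " " in text:
        text = '"' + text + '"'   # phrases go in double quotes
    prefix = "+" if required else "-" if prohibited else ""
    return f"{prefix}{field}:{text}"


def query(*clauses):
    """Join several clauses with spaces."""
    return " ".join(clauses)


phrase = clause("text", "debt ceiling")
both_required = query(
    clause("text", "negotiation", required=True),
    clause("subject", "debt", required=True),
)
one_prohibited = query(
    clause("text", "negotiation"),
    clause("subject", "debt", prohibited=True),
)
```

The three values reproduce the slide examples: a quoted phrase, two required clauses, and one optional clause with one prohibited clause.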
Additional Things Solr Can Do 
• other types of queries 
o range 
o fuzzy 
o wildcard 
o regex 
o proximity 
o spatial 
o join 
• sorting 
• faceted search 
• … and more
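A few of the query types listed above are sketched below as query strings in the standard Lucene/Solr query syntax, followed by request parameters for sorting and faceting. The field names (`price`, `text`, `from`, `date`) are assumptions, and spatial and join queries are left out because they depend on schema and version details not covered in these slides.

```python
from urllib.parse import urlencode

# Example query strings for the q parameter; field names are hypothetical.
query_examples = {
    "range": "price:[10 TO 100]",           # values from 10 to 100, inclusive
    "fuzzy": "text:negotiation~",           # words spelled similarly
    "wildcard": "text:negot*",              # words starting with "negot"
    "regex": "text:/negoti.*/",             # words matching a regular expression
    "proximity": 'text:"debt ceiling"~5',   # both words within 5 positions
}

# Match all documents, sort by date, and count documents per "from" value.
facet_params = urlencode(
    {
        "q": "*:*",
        "sort": "date desc",
        "facet": "true",
        "facet.field": "from",
    }
)
```

Each string in `query_examples` would be passed as the `q` parameter, and `facet_params` would be appended to the search URL after a `?`.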
Conclusion 
• more about Solr: http://lucene.apache.org/solr/ 
• Solr reference guide: http://www.apache.org/dyn/closer.cgi/lucene/solr/ref-guide/ 
• my e-mail: kai@ssc.ucla.edu 
• questions?
