ADVANCED SEARCH 
WITH 
SOLR + DJANGO-HAYSTACK 
MARCEL CHASTAIN 
LA DJANGO – 2014-09-30
WHAT WE’LL COVER 
1. THE PITCH: 
The Problem With Search 
The Solution(s) 
Overall Architecture of a Django/Solr/Haystack System
2. THE GOOD STUFF: 
Indexing Data for Search 
Querying the Search Index 
Advanced Search Methods 
Resources
THE PITCH 
OR, “WHY ANY OF THIS MATTERS”
THE PROBLEM 
1. Sites with stored information are ONLY as useful as they are at retrieving and displaying that information
THE PROBLEM 
2. Users have high expectations of search (thanks, Google)
• Spelling Suggestions
• Hit Highlighting
• “Related Searches” 
• Distance/GeoSpatial Search
• Faceting
THE PROBLEM 
3. Good search involves lots of challenges
• Stemming:
User searches for → Word “stem”
“argue”, “argues”, “argued” → “argu”
“argument”, “arguments” → “argument”
And more! 
• Synonyms 
• Acronyms 
• Non-ASCII characters 
• Stop words (“and”, “to”, “a”) 
• Calculating relevance 
• Performance with millions/billions(!) of documents
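To make the stemming table concrete, here is a deliberately tiny suffix-stripping sketch. It is a toy with three hard-coded rules (drop a trailing "s", then "ed" or a trailing "e"), not the Porter stemmer or whatever analyzer your Solr schema actually configures; it exists only to reproduce the rows above.

```python
def toy_stem(word: str) -> str:
    """Strip a few English suffixes. Illustrative only, not a real stemmer."""
    w = word.lower()
    # Rule 1: drop a plural/verb "s" (but leave "ss" endings alone).
    if w.endswith("s") and not w.endswith("ss"):
        w = w[:-1]
    # Rule 2: drop a past-tense "ed"; otherwise Rule 3: drop a trailing "e".
    if w.endswith("ed"):
        w = w[:-2]
    elif w.endswith("e"):
        w = w[:-1]
    return w


def group_by_stem(words):
    """Map each stem to the surface forms that reduce to it."""
    groups = {}
    for word in words:
        groups.setdefault(toy_stem(word), []).append(word)
    return groups


if __name__ == "__main__":
    forms = ["argue", "argues", "argued", "argument", "arguments"]
    for stem, variants in group_by_stem(forms).items():
        print(stem, "<-", variants)
```

Because the same analysis runs at index time and at query time, a search for “argued” can match a document containing “argues”: both reduce to the token “argu”.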
THE SOLUTION 
“Information Retrieval Systems” 
a.k.a. Search Engines
SOLR 
THE BACKEND
WHAT IS SOLR? 
Open-source enterprise search 
Java-based 
Created in 2004 
Built on Apache Lucene 
Most popular enterprise search engine 
Apache 2.0 License 
Built for millions or billions of documents
WHAT DOES IT DO? 
• Full-text search 
• Hit highlighting 
• Faceted search 
• Clustering/replication/sharding 
• Database integration 
• Rich document (Word, PDF, etc.) handling 
• Geospatial search 
• Spelling corrections/suggestions 
• … loads and loads more
WHO USES SOLR?
HOW CAN WE USE IT 
WITH DJANGO? 
Haystack 
From the homepage: 
(http://haystacksearch.org/)
LOOK FAMILIAR? 
Query style 
Declarative search index definitions
THE GOOD 
STUFF 
INSTALLING, CONFIGURING & USING 
SOLR/HAYSTACK
WHO DOES WHAT 
Solr: 
• Provides API for submitting to & querying from index 
• Stores actual index data 
• Manages fields/data types in XML config (‘schema.xml’) 
Haystack: 
• Manages connection(s) to Solr 
• Provides familiar API for querying 
• Uses templates and declarative search index definitions 
• Helps generate Solr XML config 
• Management commands to index content 
• Generic views/forms for common search use-cases 
• Hooks into signals to keep data up-to-date
PART 1: 
LET’S MAKE AN INDEX
0. GITHUB REPO 
git clone https://github.com/marcelchastain/haystackdemo
1. SETUP SOLR 
(from github repo root) 
./solr_download.sh 
(or, manually) 
wget http://apache.mirrors.pair.com/lucene/solr/4.10.1/solr-4.10.1.tgz 
tar -xzvf solr-4.10.1.tgz 
ln -s ./solr-4.10.1 ./solr 
The one file to care about: 
• solr/example/solr/collection1/conf/schema.xml 
Stores field definitions and data types. Frequently updated during 
development
2. RUN SOLR 
(from github repo root) 
./solr_start.sh 
(or, manually) 
cd solr/example && java -jar start.jar 
Requires Java 1.7+. To install on Debian/Ubuntu: 
sudo apt-get install openjdk-7-jre-headless
3. INSTALL HAYSTACK 
(CWD haystackdemo/) 
sudo apt-get install python-pip python-virtualenv 
virtualenv env && source env/bin/activate 
(from github repo root) 
pip install -r requirements.txt 
(or, manually) 
pip install Django==1.6.7 django-haystack pysolr
4. HAYSTACK SETTINGS 
INSTALLED_APPS = [
    # 'django.contrib.admin', etc.
    'haystack',
    # then your usual apps
    'myapp',
]
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.solr_backend.SolrEngine',
        'URL': 'http://127.0.0.1:8983/solr',
    },
}
HAYSTACK_SIGNAL_PROCESSOR = 'haystack.signals.RealtimeSignalProcessor'
5. THE MODEL(S)
6. SYNCDB & INITIAL DATA 
(CWD haystackdemo/demo/) 
./manage.py syncdb 
./manage.py loaddata restaurants
7. DEFINE SEARCH INDEX 
myapp/search_indexes.py
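In Haystack, this file declares a class such as `class NoteIndex(indexes.SearchIndex, indexes.Indexable)` with a `text = indexes.CharField(document=True, use_template=True)` field and a `get_model()` method returning the model. Conceptually, the index flattens each model instance into one document: a single searchable `text` blob plus separate fields for filtering, sorting, or faceting. The plain-Python sketch below mimics only that flattening, without Django; the `Note` fields (`title`, `author`, `body`, `pub_date`) are hypothetical, since the demo's model is not shown here.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Note:
    # Hypothetical model fields; the demo repo's real Note model may differ.
    title: str
    author: str
    body: str
    pub_date: datetime


def prepare_document(note: Note) -> dict:
    """Flatten a Note into the kind of document a search index stores.

    'text' plays the role of the document=True field: one blob of searchable
    text, similar to what a note_text.txt template might render. The other
    keys are separate fields that can be filtered, sorted, or faceted on.
    """
    text = "\n".join([note.title, note.author, note.body])
    return {
        "text": text,
        "author": note.author,
        "pub_date": note.pub_date.isoformat(),
    }


if __name__ == "__main__":
    note = Note("Best tacos", "marcel", "Try the al pastor.", datetime(2014, 9, 30))
    print(prepare_document(note))
```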
7.5 BOOSTING FIELD 
RELEVANCE 
Some fields are simply more relevant! 
(Note: changes to field boosts require a reindex)
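Haystack lets you put a `boost` on an individual field, e.g. `title = indexes.CharField(model_attr='title', boost=1.125)`, and the boost is applied when documents are indexed, which is why changing it needs a reindex. The toy scorer below only illustrates the intuition that a match in a boosted field counts for more; it is not Lucene's scoring formula, and the field names and boost values are hypothetical.

```python
def toy_score(query: str, doc: dict, boosts: dict) -> float:
    """Sum, over query terms, the boost of every field containing the term.

    A crude stand-in for relevance scoring, not Lucene's formula.
    Fields missing from `boosts` get a neutral weight of 1.0.
    """
    terms = query.lower().split()
    score = 0.0
    for field, value in doc.items():
        tokens = value.lower().split()
        weight = boosts.get(field, 1.0)
        score += weight * sum(1 for t in terms if t in tokens)
    return score


if __name__ == "__main__":
    boosts = {"title": 2.0}  # hypothetical: title matches count double
    docs = [
        {"title": "pizza night", "body": "we ate tacos"},
        {"title": "taco tuesday", "body": "pizza was also served"},
    ]
    ranked = sorted(docs, key=lambda d: toy_score("pizza", d, boosts), reverse=True)
    print([d["title"] for d in ranked])
```

Both documents mention “pizza” once, but the one with the match in its title ranks first.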
8. CREATE A TEMPLATE 
FOR INDEXED TEXT 
templates/search/indexes/myapp/note_text.txt
9. UPDATE SOLR SCHEMA 
(CWD: haystackdemo/demo/) 
./manage.py build_solr_schema > 
../solr/example/solr/collection1/conf/schema.xml 
Which adds: 
*Restart Solr for the changes to take effect
10. REBUILD INDEX 
(CWD haystackdemo/demo/) 
$ ./manage.py update_index 
Indexing 6 notes
PART 2: 
LET’S GET TO QUERYIN’
SIMPLE 
SEARCHQUERYSETS
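Haystack's `SearchQuerySet` deliberately mirrors Django's QuerySet: calls such as `SearchQuerySet().filter(content='pizza').order_by('-pub_date')` chain and return new querysets, where `content` targets the main document field. The in-memory class below is a hypothetical stand-in that imitates only that chaining style over plain dicts, so the shape of the API is visible without a running Solr. It supports just exact-match `filter()`, a whole-word `content` match, and single-field `order_by()`.

```python
class ToySearchQuerySet:
    """Hypothetical, in-memory imitation of SearchQuerySet-style chaining."""

    def __init__(self, docs, filters=None, ordering=None):
        self._docs = docs
        self._filters = filters or {}
        self._ordering = ordering

    def filter(self, **kwargs):
        # Return a new object and leave this one untouched, like a QuerySet.
        merged = dict(self._filters, **kwargs)
        return ToySearchQuerySet(self._docs, merged, self._ordering)

    def order_by(self, field):
        return ToySearchQuerySet(self._docs, self._filters, field)

    def _matches(self, doc):
        for key, wanted in self._filters.items():
            if key == "content":
                # 'content' searches the main text field for a whole word.
                if wanted.lower() not in doc["text"].lower().split():
                    return False
            elif doc.get(key) != wanted:
                return False
        return True

    def __iter__(self):
        results = [d for d in self._docs if self._matches(d)]
        if self._ordering:
            reverse = self._ordering.startswith("-")
            key = self._ordering.lstrip("-")
            results.sort(key=lambda d: d[key], reverse=reverse)
        return iter(results)


if __name__ == "__main__":
    docs = [
        {"text": "great pizza downtown", "author": "ann", "pub_date": "2014-09-01"},
        {"text": "pizza and beer", "author": "bob", "pub_date": "2014-09-20"},
        {"text": "quiet coffee shop", "author": "ann", "pub_date": "2014-09-25"},
    ]
    sqs = ToySearchQuerySet(docs).filter(content="pizza").order_by("-pub_date")
    print([d["author"] for d in sqs])  # newest pizza note first
```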
GREAT, WHAT ABOUT 
FROM A BROWSER?
EASY MODE 
Full-document search 
urls.py 
templates/search/search.html
HAYSTACK COMPONENTS TO 
EXTEND 
• haystack.forms.SearchForm 
Django form with an extendable .search() method. Define additional fields on the form, then incorporate them into the .search() method’s logic 
• haystack.views.SearchView 
Class-based view made to be flexible for common search cases
PART 3: FEATURES
HIT HIGHLIGHTING 
Instead of referring to a context variable directly, use the {% highlight %} tag (after {% load highlight %})
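In a template this reads `{% highlight result.text with query %}`. Haystack's default highlighter wraps matches in a `<span class="highlighted">` tag and also trims the text to a window around the matches. The function below is a simplified sketch of only the wrapping step, assuming case-insensitive whole-word matching; it does not reproduce Haystack's windowing or its exact output.

```python
import html
import re


def highlight(text: str, query: str, css_class: str = "highlighted") -> str:
    """Wrap whole-word, case-insensitive matches of query terms in <span> tags.

    The input is HTML-escaped first so user content cannot inject markup.
    """
    escaped = html.escape(text)
    terms = [re.escape(html.escape(t)) for t in query.split() if t]
    if not terms:
        return escaped
    pattern = re.compile(r"\b(" + "|".join(terms) + r")\b", re.IGNORECASE)
    return pattern.sub(
        lambda m: '<span class="%s">%s</span>' % (css_class, m.group(1)), escaped
    )


if __name__ == "__main__":
    print(highlight("Great Pizza in LA", "pizza"))
```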
SPELLING 
SUGGESTIONS 
Set 'INCLUDE_SPELLING': True in the connection’s settings dictionary + reindex 
Use the SearchQuerySet’s spelling_suggestion() method
AUTOCOMPLETE 
Create another search index field using EdgeNgramField + reindex 
Use the .autocomplete() method on a SearchQuerySet
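An `EdgeNgramField` indexes the leading fragments (“edge n-grams”) of each word, so a partially typed word can match as an exact token. In Haystack the field looks like `content_auto = indexes.EdgeNgramField(model_attr='title')` and is queried with `SearchQuerySet().autocomplete(content_auto='piz')`. The sketch shows the tokenization idea in plain Python; the gram sizes (2 to 15) and the sample titles are assumptions for illustration, since the real settings come from the generated `schema.xml`.

```python
def edge_ngrams(text: str, min_gram: int = 2, max_gram: int = 15):
    """Emit leading substrings of each lowercased word, min_gram..max_gram long."""
    grams = []
    for word in text.lower().split():
        for size in range(min_gram, min(len(word), max_gram) + 1):
            grams.append(word[:size])
    return grams


def build_autocomplete_index(titles):
    """Map every edge n-gram to the set of titles that produced it."""
    index = {}
    for title in titles:
        for gram in edge_ngrams(title):
            index.setdefault(gram, set()).add(title)
    return index


def autocomplete(index, typed: str):
    """Look up the user's partial input as an exact token, sorted for display."""
    return sorted(index.get(typed.lower().strip(), set()))


if __name__ == "__main__":
    idx = build_autocomplete_index(["Pizza Hut", "Pizzeria Mozza", "Hurry Curry"])
    print(autocomplete(idx, "piz"))
```

The trade-off is index size: every word is stored several times, in exchange for cheap exact-token lookups at query time.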
FACETING 
Add faceted=True to the relevant fields in the search index definition 
Regenerate schema.xml and reindex content 
./manage.py build_solr_schema > 
../solr/example/solr/collection1/conf/schema.xml 
./manage.py update_index
FACETING 
From a shell:
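In Haystack, a field declared with `faceted=True` can be faceted via `SearchQuerySet().facet('author')`, and `.facet_counts()` returns a dict with `'fields'`, `'dates'` and `'queries'` keys, where each faceted field maps to a list of `(value, count)` pairs. The sketch below computes that same shape over in-memory documents just to show what the numbers mean; in practice Solr does the counting. The `author` and `cuisine` fields and the sample data are hypothetical, and the tie-break order is this sketch's choice, not necessarily Solr's.

```python
from collections import Counter


def facet_counts(docs, *fields):
    """Count distinct values per field, shaped like Haystack's facet_counts().

    Values are ordered by descending count, ties broken alphabetically.
    """
    result = {"fields": {}, "dates": {}, "queries": {}}
    for field in fields:
        counts = Counter(doc[field] for doc in docs if field in doc)
        result["fields"][field] = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return result


if __name__ == "__main__":
    docs = [
        {"author": "ann", "cuisine": "pizza"},
        {"author": "bob", "cuisine": "tacos"},
        {"author": "ann", "cuisine": "tacos"},
    ]
    print(facet_counts(docs, "author", "cuisine"))
```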
RESOURCES 
LET’S SAVE YOU A GOOGLE TRIP
RESOURCES 
Solr in Action ($45) 
Apr 2014 
Haystack Documentation 
http://django-haystack.readthedocs.org/ 
IRC (freenode): 
#django 
#haystack 
#solr

UiPath Test Automation using UiPath Test Suite series, part 3
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 

Advanced Search with Solr & django-haystack

  • 1. ADVANCED SEARCH WITH SOLR + DJANGO-HAYSTACK MARCEL CHASTAIN LA DJANGO – 2014-09-30
  • 2. WHAT WE’LL COVER 1. THE PITCH: The Problem With Search The Solution(s) Overall Architecture of System with Django/Solr/Haystack 2. THE GOOD STUFF: Indexing Data for Search Querying the Search Index Advanced Search Methods Resources
  • 3. THE PITCH OR, “WHY ANY OF THIS MATTERS”
  • 4. THE PROBLEM 1. Sites with stored information are ONLY as useful as they are at retrieving and displaying that information
  • 5. THE PROBLEM 2. Users have high expectations of search (thanks, Google)
  • 6. THE PROBLEM 2. Users have high expectations of search • Spelling Suggestions:
  • 7. THE PROBLEM 2. Users have high expectations of search • Hit Highlighting:
  • 8. THE PROBLEM 2. Users have high expectations of search • “Related Searches” • Distance/GeoSpatial Search
  • 9. THE PROBLEM 2. Users have high expectations of search • Faceting:
  • 10. THE PROBLEM 3. Good search involves lots of challenges
  • 11. THE PROBLEM 3. Good search involves lots of challenges • Stemming: User searches for “argue”, “argues” or “argued” → word stem “argu”; “argument” or “arguments” → word stem “argument”
  • 12. THE PROBLEM 3. Good search involves lots of challenges And more..! • Synonyms • Acronyms • Non-ASCII characters • Stop words (“and”, “to”, “a”) • Calculating relevance • Performance with millions/billions(!) of documents
  • 13. THE SOLUTION “Information Retrieval Systems” a.k.a Search Engines
  • 16. WHAT IS SOLR? Open-source enterprise search Java-based Created in 2004 Built on Apache Lucene Most popular enterprise search engine Apache 2.0 License Built for millions or billions of documents
  • 17. WHAT DOES IT DO? • Full-text search • Hit highlighting • Faceted search • Clustering/replication/sharding • Database integration • Rich document (Word, PDF, etc.) handling • Geospatial search • Spelling corrections/suggestions • … loads and loads more
  • 19. HOW CAN WE USE IT WITH DJANGO? Haystack From the homepage: (http://haystacksearch.org/)
  • 20. LOOK FAMILIAR? Query style Declarative search index definitions
  • 21. THE GOOD STUFF INSTALLING, CONFIGURING & USING SOLR/HAYSTACK
  • 22. WHO DOES WHAT Solr: • Provides an API for submitting documents to the index and querying it • Stores the actual index data • Manages fields/data types in an XML config (‘schema.xml’) Haystack: • Manages connection(s) to Solr • Provides a familiar API for querying • Uses templates and declarative search index definitions • Helps generate Solr’s XML config • Management commands to index content • Generic views/forms for common search use-cases • Hooks into signals to keep data up-to-date
  • 23. PART 1: LET’S MAKE AN INDEX
  • 24. 0. GITHUB REPO git clone https://github.com/marcelchastain/haystackdemo
  • 25. 1. SET UP SOLR (from GitHub repo root) ./solr_download.sh (or, manually) wget http://apache.mirrors.pair.com/lucene/solr/4.10.1/solr-4.10.1.tgz tar -xzvf solr-4.10.1.tgz ln -s ./solr-4.10.1 ./solr The one file to care about: • solr/example/solr/collection1/conf/schema.xml Stores field definitions and data types. Frequently updated during development
  • 26. 2. RUN SOLR (from GitHub repo root) ./solr_start.sh (or, manually) cd solr/example && java -jar start.jar Requires Java 1.7+. To install on Debian/Ubuntu: sudo apt-get install openjdk-7-jre-headless
  • 27. 3. INSTALL HAYSTACK (CWD haystackdemo/) sudo apt-get install python-pip python-virtualenv virtualenv env && source env/bin/activate (from GitHub repo root) pip install -r requirements.txt (or, manually) pip install Django==1.6.7 django-haystack pysolr
  • 28. 4. HAYSTACK SETTINGS INSTALLED_APPS = [ # 'django.contrib.admin', etc 'haystack', # then your usual apps 'myapp', ] HAYSTACK_CONNECTIONS = { 'default': { 'ENGINE': 'haystack.backends.solr_backend.SolrEngine', 'URL': 'http://127.0.0.1:8983/solr' }, } HAYSTACK_SIGNAL_PROCESSOR = 'haystack.signals.RealtimeSignalProcessor'
  • 30. 6. SYNCDB & INITIAL DATA (CWD haystackdemo/demo/) ./manage.py syncdb ./manage.py loaddata restaurants
  • 31. 7. DEFINE SEARCH INDEX myapp/search_indexes.py
  • 32. 7.5 BOOSTING FIELD RELEVANCE Some fields are simply more relevant! (Note: changes to field boosts require a reindex)
  • 33. 8. CREATE A TEMPLATE FOR INDEXED TEXT templates/search/indexes/myapp/note_text.txt
  • 34. 9. UPDATE SOLR SCHEMA (CWD: haystackdemo/demo/) ./manage.py build_solr_schema > ../solr/example/solr/collection1/conf/schema.xml Which adds: *Restart Solr for the changes to take effect
  • 35. 10. REBUILD INDEX (CWD haystackdemo/demo/) $ ./manage.py update_index Indexing 6 notes
  • 37. PART 2: LET’S GET TO QUERYIN’
  • 39. GREAT, WHAT ABOUT FROM A BROWSER?
  • 40. EASY MODE Full-document search urls.py templates/search/search.html
  • 41. HAYSTACK COMPONENTS TO EXTEND • haystack.forms.SearchForm: a Django form with an extendable .search() method. Define additional fields on the form, then incorporate them in the .search() method’s logic • haystack.views.SearchView: a class-based view made to be flexible for common search cases
  • 43. HIT HIGHLIGHTING Instead of referring to a context variable directly, use the {% highlight %} tag
  • 44. SPELLING SUGGESTIONS Update the connection’s settings dictionary + reindex Use the spelling_suggestion() method
  • 45. AUTOCOMPLETE Create another search index field using EdgeNgramField + reindex Use the .autocomplete() method on a SearchQuerySet
  • 46. FACETING Add faceting to search index definition Regenerate schema.xml and reindex content ./manage.py build_solr_schema > ../solr/example/solr/collection1/conf/schema.xml ./manage.py update_index
  • 47. FACETING From a shell:
  • 48. RESOURCES LET’S SAVE YOU A GOOGLE TRIP
  • 49. RESOURCES Solr in Action ($45) Apr 2014 Haystack Documentation http://django-haystack.readthedocs.org/ IRC (freenode): #django #haystack #solr
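The stemming example on slide 11 can be approximated in a few lines of Python. This is a minimal sketch, assuming a hand-picked list of English suffixes: it is a naive suffix stripper rather than the Porter-style stemmers Solr actually ships, but it reproduces the two stems from the slide.

```python
# Deliberately naive suffix stripper: NOT the Porter/Snowball stemmers that
# Solr's analysis filters use. It only illustrates reducing word variants
# to a shared stem at both index time and query time.
SUFFIXES = ("es", "ed", "s", "e")  # tried in this order (illustrative choice)
MIN_STEM = 3                       # never cut a word shorter than this


def naive_stem(word):
    """Lowercase the word and strip the first matching suffix, if any."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[: -len(suffix)]
    return word


def group_by_stem(words):
    """Map each stem to the words that reduce to it."""
    groups = {}
    for word in words:
        groups.setdefault(naive_stem(word), []).append(word)
    return groups


print(group_by_stem(["argue", "argues", "argued", "argument", "arguments"]))
# {'argu': ['argue', 'argues', 'argued'], 'argument': ['argument', 'arguments']}
```

Because the same reduction runs on the query, a search for “argued” matches a document containing “argues”: both become “argu”.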
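Several of the other challenges on slide 12 (non-ASCII characters, stop words, synonyms and acronyms) are handled in Solr by a field type’s analyzer chain, configured in schema.xml. A rough, hypothetical Python equivalent, with a made-up stop-word list and synonym map:

```python
import unicodedata

# Hypothetical stop-word list and synonym/acronym map; Solr typically loads
# the real ones from text files in its conf directory.
STOP_WORDS = {"a", "an", "and", "in", "of", "the", "to"}
SYNONYMS = {"tv": ["tv", "television"], "bbq": ["bbq", "barbecue"]}


def ascii_fold(token):
    """Strip accents by decomposing characters and dropping combining marks."""
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(text):
    """Whitespace-tokenize, lowercase, ASCII-fold, drop stop words, expand synonyms."""
    tokens = []
    for raw in text.split():
        token = ascii_fold(raw.strip(".,!?;:\"'()").lower())
        if not token or token in STOP_WORDS:
            continue
        tokens.extend(SYNONYMS.get(token, [token]))
    return tokens


print(analyze("A Café and a TV"))  # ['cafe', 'tv', 'television']
```

Running the same chain over queries is what lets “cafe” match “Café”.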
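Slide 22 notes that Solr itself exposes an HTTP API for querying the index, which Haystack calls on your behalf. The hypothetical helper below only builds the kind of /select URL involved; the base URL assumes the Solr 4.x example server and its default collection1 core, while q, wt, hl, hl.fl, facet and facet.field are standard Solr request parameters.

```python
from urllib.parse import urlencode

# Assumed base URL: Solr 4.x example server with its default "collection1" core.
SOLR_SELECT = "http://127.0.0.1:8983/solr/collection1/select"


def build_select_url(query, highlight_field=None, facet_fields=()):
    """Hypothetical helper: build a Solr /select URL for a query."""
    params = [("q", query), ("wt", "json")]  # wt=json requests a JSON response
    if highlight_field:
        params += [("hl", "true"), ("hl.fl", highlight_field)]
    if facet_fields:
        params.append(("facet", "true"))
        # Solr expects one facet.field parameter per faceted field.
        params += [("facet.field", field) for field in facet_fields]
    return SOLR_SELECT + "?" + urlencode(params)


print(build_select_url("fish tacos", highlight_field="text", facet_fields=["cuisine"]))
```

Haystack’s value is that you rarely write these parameters by hand; the SearchQuerySet API generates them.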
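Field boosting (slide 32) means a match in one field counts for more than a match in another; in Haystack this is set with the boost argument on a search index field. The sketch below is a deliberately simplified model, assuming hypothetical name/description fields and plain term counts. Lucene’s real relevance scoring is considerably more involved.

```python
# Simplified model of field boosting: a query-term match in a boosted field
# adds more to a document's score. Field names and boost values are
# hypothetical; Lucene's actual scoring (term statistics, length norms, etc.)
# is far more involved.
BOOSTS = {"name": 2.0, "description": 1.0}


def score(doc, query):
    """Sum boost * (occurrences of each query term) over the boosted fields."""
    terms = query.lower().split()
    total = 0.0
    for field, boost in BOOSTS.items():
        words = doc.get(field, "").lower().split()
        total += boost * sum(words.count(term) for term in terms)
    return total


def rank(docs, query):
    """Return matching docs, highest boosted score first."""
    scored = [(score(doc, query), doc) for doc in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for s, doc in scored if s > 0]
```

With these weights, a restaurant named “Tacos El Gordo” outranks one that only mentions tacos in its description, and documents with no match are dropped.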
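Slides 31 and 33 define the search index and the template for its document=True text field. As a plain-Python illustration of what that combination produces per object (not Haystack’s API), the sketch below renders a text field with string.Template in place of a Django template and copies a couple of attributes into their own fields. The note fields (title, author, body, pub_date) are assumptions.

```python
from string import Template

# Stand-in for a note_text.txt template: one attribute per line. Haystack would
# instead render a Django template with the model instance available as `object`.
NOTE_TEXT = Template("$title\n$author\n$body")


def prepare_document(note):
    """Build the flat field dict sent to the search engine for one note."""
    return {
        "text": NOTE_TEXT.substitute(note),  # catch-all full-text field
        "author": note["author"],            # separate field, usable for filtering
        "pub_date": note["pub_date"],
    }


print(prepare_document({"title": "Best tacos", "author": "marcel",
                        "body": "Try the al pastor.", "pub_date": "2014-09-30"}))
```

Keeping the full text in one templated field while exposing a few attributes separately is what lets a single query search everything yet still filter or facet on specific fields.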
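The {% highlight %} tag on slide 43 wraps query terms found in a result’s text. A rough stand-in, assuming output of a span with the class “highlighted” (Haystack’s actual highlighter also trims long text to a window around the matches, which this sketch does not do):

```python
import html
import re


def highlight(text, query, html_tag="span", css_class="highlighted"):
    """HTML-escape `text`, then wrap whole-word, case-insensitive matches of
    each query term in <html_tag class="css_class">...</html_tag>."""
    escaped = html.escape(text)
    terms = [re.escape(html.escape(term)) for term in query.split()]
    if not terms:
        return escaped
    pattern = re.compile(r"\b(" + "|".join(terms) + r")\b", re.IGNORECASE)
    open_tag = f'<{html_tag} class="{css_class}">'
    return pattern.sub(lambda m: open_tag + m.group(1) + f"</{html_tag}>", escaped)


print(highlight("Best fish tacos in LA", "tacos"))
# Best fish <span class="highlighted">tacos</span> in LA
```

Escaping the text before inserting tags keeps user content from injecting HTML into the results page.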
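Spelling suggestions (slide 44) are switched on per connection in Haystack with 'INCLUDE_SPELLING': True in the HAYSTACK_CONNECTIONS entry, after which SearchQuerySet().spelling_suggestion() returns Solr’s suggestion. To show the idea without a running server, this sketch swaps in Python’s difflib similarity matching against a hypothetical indexed vocabulary; Solr’s spellcheck component uses its own dictionary and scoring.

```python
import difflib

# Hypothetical vocabulary of indexed terms.
VOCABULARY = ["cheese", "pizza", "pasta", "tacos", "burrito"]


def suggest(query, vocabulary=VOCABULARY, cutoff=0.8):
    """Replace each query word with its closest vocabulary term, if one is
    similar enough (difflib ratio >= cutoff); otherwise keep the word."""
    corrected = []
    for word in query.lower().split():
        matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
        corrected.append(matches[0] if matches else word)
    return " ".join(corrected)


print(suggest("chese pizzza"))  # cheese pizza
```

Words with no close match are left untouched, so an unknown but correctly spelled term is not “corrected” into something else.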
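Slide 45’s EdgeNgramField works by indexing every word under each of its leading prefixes, so a partial query becomes an exact lookup; SearchQuerySet().autocomplete(...) then queries that field. A minimal in-memory sketch, with illustrative minimum and maximum gram sizes and hypothetical restaurant names:

```python
from collections import defaultdict

MIN_GRAM, MAX_GRAM = 2, 15  # illustrative gram sizes (assumption)


def edge_ngrams(word, min_gram=MIN_GRAM, max_gram=MAX_GRAM):
    """Leading prefixes of `word`, from min_gram up to max_gram characters."""
    word = word.lower()
    return [word[:n] for n in range(min_gram, min(len(word), max_gram) + 1)]


def build_autocomplete_index(names):
    """Map every edge n-gram of every word to the names containing it."""
    index = defaultdict(set)
    for name in names:
        for word in name.split():
            for gram in edge_ngrams(word):
                index[gram].add(name)
    return index


def autocomplete(index, partial):
    """Names having a word that starts with `partial` (sorted for stable output)."""
    return sorted(index.get(partial.lower(), set()))


idx = build_autocomplete_index(["Pizza Palace", "Pink Taco", "Sushi Bar"])
print(autocomplete(idx, "pi"))  # ['Pink Taco', 'Pizza Palace']
```

The trade-off is index size: each word is stored many times, which is why the field is usually limited to short values such as names or titles.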
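A facet (slides 46 and 47) is a count of matching documents per distinct value of a field; in Haystack the field is marked faceted=True and counts come from SearchQuerySet().facet('field').facet_counts(). This sketch, assuming a hypothetical cuisine field, computes the same kind of counts in memory and also converts Solr’s default JSON facet format, a flat [value, count, value, count, ...] list, into pairs.

```python
from collections import Counter


def facet_counts(docs, field):
    """(value, count) pairs for `field` across the matching docs, most common first."""
    return Counter(doc[field] for doc in docs if field in doc).most_common()


def pairs_from_solr_flat_list(flat):
    """Turn Solr's flat [value, count, value, count, ...] list into pairs."""
    return list(zip(flat[0::2], flat[1::2]))


results = [{"cuisine": "mexican"}, {"cuisine": "thai"}, {"cuisine": "mexican"}]
print(facet_counts(results, "cuisine"))                   # [('mexican', 2), ('thai', 1)]
print(pairs_from_solr_flat_list(["mexican", 3, "thai", 2]))  # [('mexican', 3), ('thai', 2)]
```

Because counts are computed over the current result set, they change as the user narrows the search, which is what makes facets useful as drill-down filters.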