ADVANCED SEARCH 
WITH 
SOLR + DJANGO-HAYSTACK 
MARCEL CHASTAIN 
LA DJANGO – 2014-09-30
WHAT WE’LL COVER 
1. THE PITCH: 
The Problem With Search 
The Solution(s) 
Overall Architecture of a Django/Solr/Haystack System
2. THE GOOD STUFF: 
Indexing Data for Search 
Querying the Search Index 
Advanced Search Methods 
Resources
THE PITCH 
OR, “WHY ANY OF THIS MATTERS”
THE PROBLEM 
1. Sites with stored information are 
ONLY as useful as they are at 
retrieving and displaying that 
information
THE PROBLEM 
2. Users have high expectations of 
search (thanks, Google) 
• Spelling Suggestions 
• Hit Highlighting 
• “Related Searches” 
• Distance/GeoSpatial Search 
• Faceting
THE PROBLEM 
3. Good search involves lots of 
challenges 
• Stemming: 
  User Searches For             Word “Stem” 
  “argue”, “argues”, “argued”   “argu” 
  “argument”, “arguments”       “argument” 
And more! 
• Synonyms 
• Acronyms 
• Non-ASCII characters 
• Stop words (“and”, “to”, “a”) 
• Calculating relevance 
• Performance with millions/billions(!) of documents
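To make the stemming and stop-word challenges concrete, here is a toy analyzer and inverted index in plain Python. It is a sketch under stated assumptions: the stop-word list and suffix rules are made up for illustration and are far cruder than the Porter/Snowball-style stemmers and stop filters Solr configures in schema.xml. The point it shows is that documents and queries go through the same analysis, so "argue" and "argued" meet at the same indexed term.

```python
import re

# Hypothetical, tiny stop-word list; real search engines ship much larger ones.
STOP_WORDS = {"a", "and", "the", "to", "of", "in"}

# Crude suffix rules for illustration only; this is NOT the Porter algorithm.
SUFFIXES = ("es", "ed", "e", "s")


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least 3 characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def analyze(text: str) -> list[str]:
    """Lowercase, tokenize, drop stop words, then stem each token."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [toy_stem(t) for t in tokens if t not in STOP_WORDS]


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Inverted index: stemmed term -> ids of the documents containing it."""
    index: dict[str, set[int]] = {}
    for doc_id, text in docs.items():
        for term in analyze(text):
            index.setdefault(term, set()).add(doc_id)
    return index


def search(index: dict[str, set[int]], query: str) -> set[int]:
    """Return ids of documents matching every analyzed query term (AND)."""
    terms = analyze(query)
    if not terms:
        return set()
    return set.intersection(*(index.get(t, set()) for t in terms))
```

For example, with docs = {1: "He argues the point", 2: "An argument about pizza", 3: "They argued and left"}, search(build_index(docs), "argue") returns {1, 3}, while a query made only of stop words ("the and") matches nothing.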
THE SOLUTION 
“Information Retrieval Systems” 
a.k.a. Search Engines
SOLR 
THE BACKEND
WHAT IS SOLR? 
Open-source enterprise search 
Java-based 
Created in 2004 
Built on Apache Lucene 
Most popular enterprise search engine 
Apache 2.0 License 
Built for millions or billions of documents
WHAT DOES IT DO? 
• Full-text search 
• Hit highlighting 
• Faceted search 
• Clustering/replication/sharding 
• Database integration 
• Rich document (Word, PDF, etc.) handling 
• Geospatial search 
• Spelling corrections/suggestions 
• … loads and loads more
WHO USES SOLR?
HOW CAN WE USE IT 
WITH DJANGO? 
Haystack 
From the homepage: 
(http://haystacksearch.org/)
LOOK FAMILIAR? 
Query style 
Declarative search index definitions
THE GOOD 
STUFF 
INSTALLING, CONFIGURING & USING 
SOLR/HAYSTACK
WHO DOES WHAT 
Solr: 
• Provides an API for submitting documents to, and querying, the index 
• Stores the actual index data 
• Manages fields/data types in an XML config (‘schema.xml’) 
Haystack: 
• Manages connection(s) to Solr 
• Provides a familiar API for querying 
• Uses templates and declarative search index definitions 
• Helps generate Solr’s XML config 
• Provides management commands to index content 
• Provides generic views/forms for common search use-cases 
• Hooks into signals to keep data up-to-date
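Haystack’s Solr backend reaches Solr over plain HTTP (through the pysolr library), which is what makes this split of responsibilities work. As a rough sketch only, the hypothetical helper below builds the kind of GET URL a simple query against Solr’s standard /select handler uses; the exact parameters Haystack and pysolr actually send are not reproduced here. Depending on the Solr version and core setup, the core name (e.g. collection1) may need to be part of the base URL.

```python
from urllib.parse import urlencode


def build_select_url(solr_url: str, query: str, rows: int = 10) -> str:
    """Build a GET URL for Solr's /select request handler (illustrative).

    q    -- the query string, in Solr's query syntax
    wt   -- response writer; 'json' asks Solr for a JSON response
    rows -- maximum number of documents to return
    """
    params = {"q": query, "wt": "json", "rows": rows}
    # urlencode keeps the dict order and encodes spaces as '+'.
    return f"{solr_url.rstrip('/')}/select?{urlencode(params)}"
```

For example, build_select_url("http://127.0.0.1:8983/solr", "cheap pizza", rows=5) returns http://127.0.0.1:8983/solr/select?q=cheap+pizza&wt=json&rows=5.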
PART 1: 
LET’S MAKE AN INDEX
0. GITHUB REPO 
git clone https://github.com/marcelchastain/haystackdemo
1. SET UP SOLR 
(from github repo root) 
./solr_download.sh 
(or, manually) 
wget http://apache.mirrors.pair.com/lucene/solr/4.10.1/solr-4.10.1.tgz 
tar -xzvf solr-4.10.1.tgz 
ln -s ./solr-4.10.1 ./solr 
The one file to care about: 
• solr/example/solr/collection1/conf/schema.xml 
Stores field definitions and data types. Frequently updated during 
development
2. RUN SOLR 
(from github repo root) 
./solr_start.sh 
(or, manually) 
cd solr/example && java -jar start.jar 
Requires Java 1.7+. To install on Debian/Ubuntu: 
sudo apt-get install openjdk-7-jre-headless
3. INSTALL HAYSTACK 
(CWD haystackdemo/) 
sudo apt-get install python-pip python-virtualenv 
virtualenv env && source env/bin/activate 
(from github repo root) 
pip install -r requirements.txt 
(or, manually) 
pip install Django==1.6.7 django-haystack pysolr
4. HAYSTACK SETTINGS 
INSTALLED_APPS = [
    # 'django.contrib.admin', etc.
    'haystack',
    # then your usual apps
    'myapp',
]
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.solr_backend.SolrEngine',
        'URL': 'http://127.0.0.1:8983/solr',
    },
}
# Update the index automatically whenever a model instance is saved or deleted
HAYSTACK_SIGNAL_PROCESSOR = 'haystack.signals.RealtimeSignalProcessor'
5. THE MODEL(S)
6. SYNCDB & INITIAL DATA 
(CWD haystackdemo/demo/) 
./manage.py syncdb 
./manage.py loaddata restaurants
7. DEFINE SEARCH INDEX 
myapp/search_indexes.py
7.5 BOOSTING FIELD 
RELEVANCE 
Some fields are simply more relevant! 
(Note: changes to field boosts require a reindex)
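A boost changes how much a match in a given field contributes to a document’s relevance score. In Haystack a field-level boost is declared on the index field itself (the field’s boost= argument); the plain-Python toy below only illustrates the effect. The weights are hypothetical, and a simple weighted term count stands in for Lucene’s real TF-IDF-based scoring.

```python
# Hypothetical per-field weights: a title match counts twice as much as a body match.
FIELD_BOOSTS = {"title": 2.0, "body": 1.0}


def score(doc: dict[str, str], term: str) -> float:
    """Weighted term frequency: sum over fields of boost x occurrences of term."""
    term = term.lower()
    return sum(
        boost * doc.get(field, "").lower().split().count(term)
        for field, boost in FIELD_BOOSTS.items()
    )


def rank(docs: list[dict[str, str]], term: str) -> list[dict[str, str]]:
    """Return only matching docs, highest score first (ties keep input order)."""
    scored = [(score(d, term), d) for d in docs]
    ordered = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [d for s, d in ordered if s > 0]
```

With one restaurant titled "Pizza Palace" and another titled "Taco Hut" whose body merely mentions pizza, rank(docs, "pizza") puts "Pizza Palace" first because its match sits in the boosted field.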
8. CREATE A TEMPLATE 
FOR INDEXED TEXT 
templates/search/indexes/myapp/note_text.txt
9. UPDATE SOLR SCHEMA 
(CWD: haystackdemo/demo/) 
./manage.py build_solr_schema > 
../solr/example/solr/collection1/conf/schema.xml 
This writes field definitions for the search index into schema.xml. 
* Restart Solr for the changes to take effect
10. REBUILD INDEX 
(CWD haystackdemo/demo/) 
$ ./manage.py update_index 
Indexing 6 notes
PART 2: 
LET’S GET TO QUERYIN’
SIMPLE 
SEARCHQUERYSETS
GREAT, WHAT ABOUT 
FROM A BROWSER?
EASY MODE 
Full-document search 
urls.py 
templates/search/search.html
HAYSTACK COMPONENTS TO 
EXTEND 
• haystack.forms.SearchForm 
A Django form with an extendable .search() method. Define additional 
fields on the form, then incorporate them into the .search() 
method’s logic 
• haystack.views.SearchView 
A class-based view built to be flexible for common search cases
PART 3: FEATURES
HIT HIGHLIGHTING 
Instead of referring to a context variable directly, use the {% highlight %} tag (after {% load highlight %})
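Conceptually, highlighting wraps each occurrence of the query terms in markup. Below is a minimal plain-Python sketch, assuming whole-word, case-insensitive matching; the function name and signature are hypothetical. It emits span tags with the class "highlighted" to resemble the markup Haystack’s default highlighter produces, but unlike Haystack’s highlighter it does not trim the text to a window around the first match.

```python
import html
import re


def highlight(text: str, query: str, css_class: str = "highlighted") -> str:
    """Wrap whole-word, case-insensitive query-term matches in <span> tags.

    The text is HTML-escaped first so the result is safe to render.
    """
    escaped = html.escape(text)
    # Escape each term the same way as the text, then for use inside a regex.
    terms = [re.escape(html.escape(t)) for t in query.split()]
    if not terms:
        return escaped
    pattern = re.compile(r"\b(" + "|".join(terms) + r")\b", re.IGNORECASE)
    return pattern.sub(
        lambda m: f'<span class="{css_class}">{m.group(0)}</span>', escaped
    )
```

Escaping before matching keeps user-supplied text from injecting markup, which matters because the highlighted output is inserted into the page as HTML.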
SPELLING 
SUGGESTIONS 
Add 'INCLUDE_SPELLING': True to the connection’s settings dictionary + reindex 
Use the spelling_suggestion() method
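A "did you mean?" suggestion boils down to replacing unrecognized query terms with close matches from the indexed vocabulary. Solr does this with its spellcheck component; the sketch below swaps in the similarity ratio from Python’s standard-library difflib instead, so it illustrates the idea rather than Solr’s algorithm. The function, the vocabulary, and the 0.75 cutoff are assumptions for the example.

```python
import difflib
from typing import Optional


def spelling_suggestion(
    query: str, vocabulary: set[str], cutoff: float = 0.75
) -> Optional[str]:
    """Suggest a corrected query, or None if every term is already known.

    Each term missing from the vocabulary is replaced by its closest
    vocabulary word (difflib similarity ratio), if one scores >= cutoff.
    """
    candidates = sorted(vocabulary)  # deterministic order for tie-breaking
    corrected = []
    for term in query.lower().split():
        if term in vocabulary:
            corrected.append(term)
            continue
        matches = difflib.get_close_matches(term, candidates, n=1, cutoff=cutoff)
        corrected.append(matches[0] if matches else term)
    suggestion = " ".join(corrected)
    return suggestion if suggestion != " ".join(query.lower().split()) else None
```

With a vocabulary of {"cheap", "pizza", "pasta", "tacos"}, the query "cheap piza" yields "cheap pizza", while a correctly spelled query yields None so no suggestion is shown.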
AUTOCOMPLETE 
Create another search index field using EdgeNgramField + reindex 
Use the .autocomplete() method on a SearchQuerySet
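EdgeNgramField works by indexing the leading prefixes (edge n-grams) of each word, so autocompletion becomes an exact lookup on a prefix rather than a slow wildcard scan. The plain-Python toy below shows that idea; the minimum and maximum gram sizes (2 and 15) and the helper names are assumptions for illustration, not values read from Haystack’s generated schema.

```python
def edge_ngrams(word: str, min_gram: int = 2, max_gram: int = 15) -> list[str]:
    """Front-anchored prefixes of word, from min_gram to max_gram characters."""
    word = word.lower()
    return [word[:n] for n in range(min_gram, min(len(word), max_gram) + 1)]


def build_autocomplete_index(titles: dict[int, str]) -> dict[str, set[int]]:
    """Map every edge n-gram of every word to the ids whose title contains it."""
    index: dict[str, set[int]] = {}
    for doc_id, title in titles.items():
        for word in title.split():
            for gram in edge_ngrams(word):
                index.setdefault(gram, set()).add(doc_id)
    return index


def autocomplete(index: dict[str, set[int]], prefix: str) -> set[int]:
    """A prefix lookup is now just an exact-match lookup on the n-gram index."""
    return set(index.get(prefix.lower(), ()))
```

Note that a one-character prefix such as "p" finds nothing here, because grams shorter than min_gram are never indexed; indexing many prefixes per word is also why n-gram fields make the index larger.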
FACETING 
Add faceting to search index definition 
Regenerate schema.xml and reindex content 
./manage.py build_solr_schema > 
../solr/example/solr/collection1/conf/schema.xml 
./manage.py update_index
FACETING 
From a shell:
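A facet is simply a count of matching documents per distinct value of a field, which the UI then offers as drill-down links. In Haystack the counts come from SearchQuerySet().facet(field_name).facet_counts() once the field is declared with faceted=True; the plain-Python toy below only illustrates what those numbers mean. Its return value loosely mimics the 'fields' portion of facet_counts() output, and the restaurant records in the example are made up.

```python
from collections import Counter


def facet_counts(docs: list[dict[str, str]], fields: list[str]) -> dict:
    """Count docs per value of each faceted field, most common value first.

    Shape (loosely modeled on Haystack): {'fields': {field: [(value, count), ...]}}
    """
    return {
        "fields": {
            field: Counter(d[field] for d in docs if field in d).most_common()
            for field in fields
        }
    }


def narrow(docs: list[dict[str, str]], field: str, value: str) -> list[dict[str, str]]:
    """Drill down: keep only docs whose field equals the chosen facet value."""
    return [d for d in docs if d.get(field) == value]
```

For three restaurants (two pizza places, one Mexican), facet_counts(docs, ["cuisine"]) gives {'fields': {'cuisine': [('pizza', 2), ('mexican', 1)]}}, and clicking "pizza" corresponds to narrow(docs, "cuisine", "pizza").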
RESOURCES 
LET’S SAVE YOU A GOOGLE TRIP
RESOURCES 
Solr in Action ($45) 
Apr 2014 
Haystack Documentation 
http://django-haystack.readthedocs.org/ 
IRC (freenode): 
#django 
#haystack 
#solr
