Kyiv.py #16
Andrii Soldatenko
24 October 2015
@a_soldatenko
ElasticSearch in Python world.
About me:
• Software Engineer in Test at
• Speaker at PyCon Russia 2015
• Speaker at PyCon Ukraine 2014
• Speaker at PyCon Belarus 2015
• in past:
Preface
Information Explosion
Text Search
grep --ignore-case --recursive foo books/
grep --ignore-case --recursive --file=words.txt books/
Entry.objects.get(headline__icontains='foo')
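from functools import reduce
from operator import or_
from django.db.models import Q

with open('words.txt') as f:
    words = [line.strip() for line in f if line.strip()]
# Django has no `icontains_in` lookup, so OR one icontains filter per word.
Entry.objects.filter(reduce(or_, (Q(headline__icontains=w) for w in words)))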
Full text search
Search index
Simple sentences
1. The quick brown fox jumped over the lazy dog
2. Quick brown foxes leap over lazy dogs in summer
Inverted index
Term Doc_1 Doc_2
-------------------------
Quick | | X
The | X |
brown | X | X
dog | X |
dogs | | X
fox | X |
foxes | | X
in | | X
jumped | X |
lazy | X | X
leap | | X
over | X | X
quick | X |
summer | | X
the | X |
------------------------
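The table above can be reproduced with a few lines of Python. This is a minimal sketch, assuming a naive whitespace tokenizer that keeps case as-is; the names `DOCS` and `build_inverted_index` are made up for illustration.

```python
from collections import defaultdict

# The two example sentences, keyed by document id.
DOCS = {
    1: "The quick brown fox jumped over the lazy dog",
    2: "Quick brown foxes leap over lazy dogs in summer",
}


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.split():  # naive whitespace tokenizer, case preserved
            index[term].add(doc_id)
    return dict(index)


index = build_inverted_index(DOCS)
for term in sorted(index):
    print(f"{term:<8} {sorted(index[term])}")
```

Because case is preserved, "Quick" and "quick" end up as two separate terms, exactly as in the table.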
Inverted index
Term Doc_1 Doc_2
-------------------------
brown | X | X
quick | X |
------------------------
Total | 2 | 1
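The "Total" row is simply the number of query terms each document contains. A minimal sketch of that lookup for the query "quick brown", assuming the postings from the table above; `INDEX` and `search` are hypothetical names, and real engines use far more elaborate relevance scoring.

```python
from collections import Counter

# Postings copied from the table above (only the two terms being searched).
INDEX = {
    "brown": {1, 2},
    "quick": {1},
}


def search(index, query):
    """Count, per document, how many of the query terms it contains."""
    totals = Counter()
    for term in query.split():
        for doc_id in index.get(term, ()):
            totals[doc_id] += 1
    return totals.most_common()  # best match first


results = search(INDEX, "quick brown")
print(results)  # [(1, 2), (2, 1)]
```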
Inverted index:
normalization
After normalization:
Term Doc_1 Doc_2
-------------------------
brown | X | X
dog | X | X
fox | X | X
in | | X
jump | X | X
lazy | X | X
over | X | X
quick | X | X
summer | | X
the | X |
------------------------
Before normalization:
Term Doc_1 Doc_2
-------------------------
Quick | | X
The | X |
brown | X | X
dog | X |
dogs | | X
fox | X |
foxes | | X
in | | X
jumped | X |
lazy | X | X
leap | | X
over | X | X
quick | X |
summer | | X
the | X |
------------------------
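Normalization can be sketched as a small function applied to every token before it goes into the index. The rules below (lowercasing, a hand-written singular map and a synonym map) are assumptions that are just enough for these two sentences; a real engine uses proper stemmers and synonym filters instead.

```python
from collections import defaultdict

DOCS = {
    1: "The quick brown fox jumped over the lazy dog",
    2: "Quick brown foxes leap over lazy dogs in summer",
}

# Hand-written rules, for illustration only.
SINGULAR = {"foxes": "fox", "dogs": "dog"}
SYNONYMS = {"jumped": "jump", "leap": "jump"}


def normalize(token):
    """Lowercase, reduce plurals to singular, then map synonyms to one term."""
    token = token.lower()
    token = SINGULAR.get(token, token)
    return SYNONYMS.get(token, token)


def build_index(docs):
    """Build an inverted index over normalized tokens."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[normalize(token)].add(doc_id)
    return dict(index)


index = build_index(DOCS)
for term in sorted(index):
    print(f"{term:<8} {sorted(index[term])}")
```

The same `normalize` function has to be applied to the query string as well, otherwise a search for "Quick" would not find the indexed term "quick".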
Search Engines
ElasticSearch
Who uses ElasticSearch?
ElasticSearch:
Quick Intro
Relational DB => Databases => Tables => Rows      => Columns
ElasticSearch => Indices   => Types  => Documents => Fields
ElasticSearch:
Quick Intro
PUT /haystack/user/1
{
  "first_name" : "Andrii",
  "last_name" : "Soldatenko",
  "age" : 30,
  "about" : "I love to go rock climbing",
  "interests": [ "sports", "music" ],
  "likes": [ "python", "django" ]
}
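The same request can be prepared from Python with only the standard library. This is a sketch under assumptions: `ES_URL` points at a local node on the default port 9200, and `build_index_request` is a hypothetical helper. The request is only built here, not sent, so the snippet runs without a server.

```python
import json
from urllib.request import Request

ES_URL = "http://localhost:9200"  # assumption: local node on the default port


def build_index_request(index, doc_type, doc_id, document):
    """Build (but do not send) a PUT /<index>/<type>/<id> request."""
    body = json.dumps(document).encode("utf-8")
    return Request(
        url=f"{ES_URL}/{index}/{doc_type}/{doc_id}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


request = build_index_request("haystack", "user", 1, {
    "first_name": "Andrii",
    "last_name": "Soldatenko",
    "age": 30,
    "about": "I love to go rock climbing",
    "interests": ["sports", "music"],
    "likes": ["python", "django"],
})
print(request.get_method(), request.full_url)
# urllib.request.urlopen(request) would send it to a running node.
```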
ElasticSearch:
Locks
•Pessimistic concurrency control
•Optimistic concurrency control
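Elasticsearch takes the optimistic approach: each document carries a version number, and a write that names a stale version is rejected instead of blocking other writers. The idea can be sketched with a toy in-memory store; `VersionedStore` and `VersionConflict` are hypothetical names and this is not an Elasticsearch client.

```python
class VersionConflict(Exception):
    """Raised when the caller's expected version is stale."""


class VersionedStore:
    """Toy in-memory document store illustrating optimistic concurrency."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, body)

    def get(self, doc_id):
        return self._docs[doc_id]

    def put(self, doc_id, body, expected_version=None):
        current_version, _ = self._docs.get(doc_id, (0, None))
        # Reject the write if someone else changed the document in the meantime.
        if expected_version is not None and expected_version != current_version:
            raise VersionConflict(
                f"expected v{expected_version}, found v{current_version}")
        self._docs[doc_id] = (current_version + 1, body)
        return current_version + 1


store = VersionedStore()
store.put(1, {"age": 30})                             # creates version 1
version, doc = store.get(1)
store.put(1, {"age": 31}, expected_version=version)   # succeeds, now version 2
try:
    store.put(1, {"age": 32}, expected_version=version)  # stale version 1
except VersionConflict as exc:
    conflict = str(exc)
    print("conflict:", conflict)
```

On a conflict the caller re-reads the document, reapplies its change and retries; no lock is ever held between the read and the write.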
ElasticSearch:
Setup
#!/bin/bash
VERSION=1.7.1
curl -L -O https://download.elastic.co/elasticsearch/elasticsearch/elasticsearch-$VERSION.zip
unzip elasticsearch-$VERSION.zip
cd elasticsearch-$VERSION
# Download plugin marvel
./bin/plugin -i elasticsearch/marvel/latest
echo 'marvel.agent.enabled: false' >> ./config/elasticsearch.yml
# run elastic
./bin/elasticsearch -d
ElasticSearch:
Setup
$ curl 'http://localhost:9200/?pretty'
{
  "status" : 200,
  "name" : "Dredmund Druid",
  "cluster_name" : "elasticsearch",
  "version" : {
    "number" : "1.7.1",
    "build_hash" : "b88f43fc40b0bcd7f173a1f9ee2e97816de80b19",
    "build_timestamp" : "2015-07-29T09:54:16Z",
    "build_snapshot" : false,
    "lucene_version" : "4.10.4"
  },
  "tagline" : "You Know, for Search"
}
ElasticSearch:
Settings
curl -X POST 'http://localhost:9200/<index_name>/_close'
curl -XPUT "http://localhost:9200/<index_name>/_settings" -d'
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "standard",
          "stopwords": [ "and", "the" ]
        }
      }
    }
  }
}'
curl -X POST 'http://localhost:9200/<index_name>/_open'
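What such an analyzer does to a piece of text can be approximated in plain Python. This is a rough sketch only: the real standard analyzer tokenizes according to Unicode text segmentation rules, while the hypothetical `analyze` function below just splits on non-word characters.

```python
import re

STOPWORDS = {"and", "the"}  # same list as in the settings above


def analyze(text, stopwords=STOPWORDS):
    """Lowercase the text, split it into word tokens and drop stop words."""
    tokens = re.findall(r"\w+", text.lower())
    return [token for token in tokens if token not in stopwords]


tokens = analyze("The quick brown fox and the lazy dog")
print(tokens)  # ['quick', 'brown', 'fox', 'lazy', 'dog']
```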
Haystack
Adding search functionality
to Simple Model
$ cat myapp/models.py
from django.db import models
from django.contrib.auth.models import User


class Page(models.Model):
    # on_delete is mandatory in current Django versions.
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    name = models.CharField(max_length=200)
    description = models.TextField()

    def __unicode__(self):
        return self.name
Haystack: Installation
$ pip install django-haystack
$ cat settings.py
INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.sites',
    # Added.
    'haystack',
    # Then your usual apps...
    'blog',
]
Haystack: Settings
$ pip install elasticsearch
$ cat settings.py
...
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine',
        'URL': 'http://127.0.0.1:9200/',
        'INDEX_NAME': 'haystack',
    },
}
...
Haystack:
Creating SearchIndexes
$ cat myapp/search_indexes.py
import datetime

from haystack import indexes

from myapp.models import Page


class PageIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True)
    author = indexes.CharField(model_attr='user')
    # Assumes Page also defines a pub_date field (not shown in models.py).
    pub_date = indexes.DateTimeField(model_attr='pub_date')

    def get_model(self):
        return Page

    def index_queryset(self, using=None):
        """Used when the entire index for model is updated."""
        return self.get_model().objects.filter(
            pub_date__lte=datetime.datetime.now())
Haystack:
SearchQuerySet API
from haystack.query import SearchQuerySet
from haystack.inputs import Raw

all_results = SearchQuerySet().all()
hello_results = SearchQuerySet().filter(content='hello')
unfriendly_results = (SearchQuerySet()
                      .exclude(content='hello')
                      .filter(content='world'))
# To send unescaped data:
sqs = SearchQuerySet().filter(title=Raw(trusted_query))
How to configure ElasticSearch?
https://github.com/django-haystack/django-haystack/blob/9d92d4da0a1ec75978fc3949375dda9a1707469f/haystack/backends/elasticsearch_backend.py#L41
ElasticSearch settings
ElasticStack backend
https://github.com/bennylope/elasticstack
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'elasticstack.backends.ConfigurableElasticSearchEngine',
        'URL': 'http://127.0.0.1:9200/',
        'INDEX_NAME': 'haystack',
    },
}
ELASTICSEARCH_INDEX_SETTINGS = {}
ELASTICSEARCH_DEFAULT_ANALYZER = 'synonym_analyzer'
Keeping data in sync
# Update everything.
./manage.py update_index --settings=settings.prod
# Update everything with lots of information about what's going on.
./manage.py update_index --settings=settings.prod --verbosity=2
# Update everything, cleaning up after deleted models.
./manage.py update_index --remove --settings=settings.prod
# Update everything changed in the last 2 hours.
./manage.py update_index --age=2 --settings=settings.prod
# Update everything between Dec. 1, 2011 & Dec 31, 2011
./manage.py update_index --start='2011-12-01T00:00:00' --end='2011-12-31T23:59:59' --settings=settings.prod
Signals
# From haystack/signals.py (BaseSignalProcessor is defined in the same module).
from django.db import models


class RealtimeSignalProcessor(BaseSignalProcessor):
    """
    Allows for observing when saves/deletes fire & automatically updates the
    search engine appropriately.
    """
    def setup(self):
        # Naive (listen to all model saves).
        models.signals.post_save.connect(self.handle_save)
        models.signals.post_delete.connect(self.handle_delete)
        # Efficient would be going through all backends & collecting all models
        # being used, then hooking up signals only for those.

    def teardown(self):
        # Naive (listen to all model saves).
        models.signals.post_save.disconnect(self.handle_save)
        models.signals.post_delete.disconnect(self.handle_delete)
        # Efficient would be going through all backends & collecting all models
        # being used, then disconnecting signals only for those.
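To switch a project from periodic `update_index` runs to this real-time processor, Haystack reads the `HAYSTACK_SIGNAL_PROCESSOR` setting. A minimal settings.py fragment, assuming Haystack 2.x:

```python
# settings.py (fragment)
# Update the search index on every model save/delete instead of relying
# solely on periodic `manage.py update_index` runs.
HAYSTACK_SIGNAL_PROCESSOR = 'haystack.signals.RealtimeSignalProcessor'
```

The trade-off is that every save and delete now includes a round trip to the search engine, so write-heavy sites may prefer to stay with scheduled `update_index` runs.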
Haystack:
Pros and Cons
Pros:
• easy to set up
• looks like the Django ORM, but for searches
• search engine independent
• supports 4 engines (ElasticSearch, Solr, Xapian, Whoosh)
Cons:
• poor SearchQuerySet API
• difficult to manage stop words
• loses performance because of the extra layer
• model-based
Final Thoughts
https://www.elastic.co/guide/en/elasticsearch/guide/master/index.html
Thank You
@a_soldatenko
https://asoldatenko.com
Questions?
