Dive into
full text search
with Python
Andrii Soldatenko
18-19 September 2015
@a_soldatenko
About me:
• Lead QA Automation Engineer at
• Backend Python Developer at
• Speaker at PyCon Ukraine 2014
• Speaker at PyCon Belarus 2015
• @a_soldatenko
Preface
Information Explosion
Text Search
grep --ignore-case --recursive foo books/

grep --ignore-case --recursive --file=words.txt books/
Entry.objects.get(headline__icontains='foo')

words = []
with open('words.txt', 'r') as f:
    words = f.readlines()
# Note: icontains_in is not a built-in Django lookup.
Entry.objects.get(headline__icontains_in=words)
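A minimal pure-Python sketch of what the two snippets above are reaching for: a case-insensitive substring check of every headline against every word from words.txt. The `headlines` list and the `matches_any` helper are hypothetical stand-ins for the `Entry` table; in Django itself a similar query is usually built by OR-ing `Q(headline__icontains=word)` objects. Either way, every lookup scans every row.

```python
def matches_any(text, words):
    """Return True if any non-empty word occurs in text, ignoring case."""
    lowered = text.lower()
    return any(w.strip().lower() in lowered for w in words if w.strip())


# Hypothetical data standing in for Entry.headline values.
headlines = ["Foo fighters on tour", "Nothing to see here", "A BAR opens downtown"]
# f.readlines() keeps the trailing newline on every word, hence the strip() above.
words = ["foo\n", "bar\n"]

hits = [h for h in headlines if matches_any(h, words)]
print(hits)
```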
Full text search
Search index
Simple sentences
1. The quick brown fox jumped over the lazy dog
2. Quick brown foxes leap over lazy dogs in summer
Inverted index
Term      Doc_1   Doc_2
-------------------------
Quick   |       |  X
The     |   X   |
brown   |   X   |  X
dog     |   X   |
dogs    |       |  X
fox     |   X   |
foxes   |       |  X
in      |       |  X
jumped  |   X   |
lazy    |   X   |  X
leap    |       |  X
over    |   X   |  X
quick   |   X   |
summer  |       |  X
the     |   X   |
-------------------------
Inverted index
Term      Doc_1   Doc_2
-------------------------
brown   |   X   |  X
quick   |   X   |
-------------------------
Total   |   2   |  1
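The two tables above can be sketched in a few lines of Python: build the inverted index from the two sample sentences, then count how many query terms each document contains. This is a minimal sketch with hypothetical helper names (`build_inverted_index`, `score`), splitting on whitespace and keeping case as is, which is why `Quick` and `quick` end up as separate terms.

```python
from collections import defaultdict

docs = {
    "Doc_1": "The quick brown fox jumped over the lazy dog",
    "Doc_2": "Quick brown foxes leap over lazy dogs in summer",
}


def build_inverted_index(documents):
    """Map each term (split on whitespace, case preserved) to the ids of the docs containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.split():
            index[term].add(doc_id)
    return index


def score(index, query_terms):
    """Count, per document, how many of the query terms it contains."""
    totals = defaultdict(int)
    for term in query_terms:
        for doc_id in index.get(term, ()):
            totals[doc_id] += 1
    return dict(totals)


index = build_inverted_index(docs)
totals = score(index, ["quick", "brown"])
print(totals)
```

Doc_1 contains both query terms and Doc_2 only one, matching the totals in the table.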
Inverted index:
normalization
Term      Doc_1   Doc_2
-------------------------
brown   |   X   |  X
dog     |   X   |  X
fox     |   X   |  X
in      |       |  X
jump    |   X   |  X
lazy    |   X   |  X
over    |   X   |  X
quick   |   X   |  X
summer  |       |  X
the     |   X   |  X
-------------------------
Term      Doc_1   Doc_2
-------------------------
Quick   |       |  X
The     |   X   |
brown   |   X   |  X
dog     |   X   |
dogs    |       |  X
fox     |   X   |
foxes   |       |  X
in      |       |  X
jumped  |   X   |
lazy    |   X   |  X
leap    |       |  X
over    |   X   |  X
quick   |   X   |
summer  |       |  X
the     |   X   |
-------------------------
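Normalization can be sketched as a function applied to every token before it goes into the index, and to every query term before lookup. The sketch below assumes tiny hand-written `STEMS` and `SYNONYMS` tables (hypothetical, just enough for the two sample sentences); a real analyzer would use a proper stemmer and synonym dictionary.

```python
from collections import defaultdict

docs = {
    "Doc_1": "The quick brown fox jumped over the lazy dog",
    "Doc_2": "Quick brown foxes leap over lazy dogs in summer",
}

# Hypothetical hand-written rules, only covering the sample sentences.
STEMS = {"foxes": "fox", "dogs": "dog", "jumped": "jump"}
SYNONYMS = {"leap": "jump"}


def normalize(token):
    """Lowercase, then reduce to a root form, then map synonyms to one term."""
    token = token.lower()
    token = STEMS.get(token, token)
    return SYNONYMS.get(token, token)


def build_index(documents):
    """Inverted index over normalized terms."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in text.split():
            index[normalize(token)].add(doc_id)
    return index


index = build_index(docs)
# The query must go through the same normalization as the documents.
query = [normalize(t) for t in "Quick brown".split()]
matches = set.intersection(*(index.get(t, set()) for t in query))
print(sorted(index), sorted(matches))
```

With normalization in place both documents match the query, and the index shrinks to the ten terms of the normalized table.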
Search Engines
PostgreSQL
PostgreSQL:

operators for textual data types
--- PostgreSQL has operators for textual data types:
--- LIKE - match case-sensitive
--- ILIKE - match case-insensitive
--- ~ - Matches POSIX regular expression, case-sensitive
--- ~* - Matches POSIX regular expression, case-insensitive
select 'foo' LIKE 'foo';              -- true
select 'bar' ILIKE 'BAR';             -- true
select 'abc' LIKE '%b%';              -- true (LIKE must match the whole string, so wildcards are needed)
select 'abc' LIKE 'c';                -- false
select 'abc' ~ 'abc';                 -- true
select 'abc' ~ '^a';                  -- true
select 'abc' ~ '(b|d)';               -- true
select 'abc' ~ '^(b|c)';              -- false
select 'andrii' ~* '.*Andrii.*';      -- true
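For readers more at home in Python, the behaviour of these operators can be approximated with the `re` module. This is only a sketch: `like` and `regex_match` are hypothetical helpers, LIKE escape characters are not handled, and Python's regex dialect is not identical to PostgreSQL's POSIX one, although the two agree on the simple patterns used here. The point to notice is that LIKE has to match the whole string unless the pattern carries wildcards, while `~` matches anywhere.

```python
import re


def like(value, pattern, ignore_case=False):
    """Emulate SQL LIKE / ILIKE: % matches any run of characters, _ matches
    exactly one, and the pattern must cover the whole value."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    flags = re.DOTALL | (re.IGNORECASE if ignore_case else 0)
    return re.fullmatch(regex, value, flags) is not None


def regex_match(value, pattern, ignore_case=False):
    """Emulate ~ and ~*: true if the pattern matches anywhere in the value."""
    return re.search(pattern, value, re.IGNORECASE if ignore_case else 0) is not None


print(like("abc", "b"))                                   # whole-string match fails
print(like("abc", "%b%"))                                 # wildcards make it a substring match
print(like("bar", "BAR", ignore_case=True))               # ILIKE
print(regex_match("abc", "^(b|c)"))                       # anchored at the start
print(regex_match("andrii", ".*Andrii.*", ignore_case=True))
```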
PostgreSQL:

accuracy issue
select 'prone' like '%one%';   -- true
select 'money' like '%one%';   -- true
select 'lonely' like '%one%';  -- true
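The same accuracy problem shows up in any substring search. A small sketch, using a made-up sentence, contrasts substring matching with matching on whole tokens:

```python
import re

# Hypothetical sample text.
text = "A lonely man, prone to losing money, had one coin left"
tokens = re.findall(r"\w+", text.lower())

# Substring search, as with LIKE '%one%': unrelated words match too.
substring_hits = [t for t in tokens if "one" in t]

# Token search: compare whole words only.
token_hits = [t for t in tokens if t == "one"]

print(substring_hits)
print(token_hits)
```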
Full text search in
PostgreSQL
1. Creating tokens
2. Converting tokens into Lexemes
3. Storing preprocessed documents
Full text search in
PostgreSQL
27 built-in configurations for 10 languages
Support of user-defined FTS configurations
Pluggable dictionaries, parsers
Inverted indexes
functions to convert
normal text to tsvector
explain SELECT 'a fat cat sat on a mat and ate a fat rat'::tsvector @@
        'cat & rat'::tsquery;
                QUERY PLAN
------------------------------------------
 Result  (cost=0.00..0.01 rows=1 width=0)
(1 row)

explain SELECT 'fat & cow'::tsquery @@
        'a fat cat sat on a mat and ate a fat rat'::tsvector; -- false
                QUERY PLAN
------------------------------------------
 Result  (cost=0.00..0.01 rows=1 width=0)
(1 row)
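What the `@@` match does for an AND query can be sketched with plain Python sets. This is an illustration only: `to_vector` and `matches_all` are hypothetical helpers, and the sketch treats the vector as a bare set of words, which is roughly what a direct cast to `tsvector` (without `to_tsvector`) gives you.

```python
def to_vector(text):
    """A bare-bones stand-in for a tsvector: the set of distinct words."""
    return set(text.split())


def matches_all(vector, terms):
    """Emulate vector @@ 'a & b': every term must be present."""
    return all(term in vector for term in terms)


vector = to_vector("a fat cat sat on a mat and ate a fat rat")
cat_and_rat = matches_all(vector, ["cat", "rat"])
fat_and_cow = matches_all(vector, ["fat", "cow"])
print(cat_and_rat, fat_and_cow)  # True False
```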
PostgreSQL:

index management
CREATE FUNCTION notes_vector_update() RETURNS TRIGGER AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        NEW.search_index = to_tsvector('pg_catalog.english', COALESCE(NEW.name, ''));
    END IF;
    IF TG_OP = 'UPDATE' THEN
        -- IS DISTINCT FROM also catches changes to or from NULL, which <> does not
        IF NEW.name IS DISTINCT FROM OLD.name THEN
            NEW.search_index = to_tsvector('pg_catalog.english', COALESCE(NEW.name, ''));
        END IF;
    END IF;
    RETURN NEW;
END
$$ LANGUAGE 'plpgsql';
PostgreSQL:

stopwords
SELECT to_tsvector('english','in the list of stop words');
        to_tsvector
----------------------------
 'list':3 'stop':5 'word':6
/usr/pgsql-9.3/share/tsearch_data/english.stop
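The three preprocessing steps (tokens, lexemes, stored positions) and the effect of stop words can be imitated in a short Python sketch. Everything here is an assumption made for illustration: the stop list is a tiny hypothetical subset of english.stop, and `stem` is a naive plural stripper, whereas PostgreSQL uses a Snowball stemmer.

```python
import re

# A tiny hypothetical stop list; the real english.stop file is much longer.
STOP_WORDS = {"in", "the", "of", "a", "and", "on"}


def stem(token):
    """Very naive stemmer: strip a trailing plural 's' from longer words."""
    return token[:-1] if token.endswith("s") and len(token) > 3 else token


def to_tsvector(text):
    """Return {lexeme: [positions]}, counting positions from 1."""
    vector = {}
    for position, token in enumerate(re.findall(r"\w+", text.lower()), start=1):
        if token in STOP_WORDS:
            continue  # stop words are dropped but still consume a position
        vector.setdefault(stem(token), []).append(position)
    return vector


vector = to_tsvector("in the list of stop words")
print(vector)
```

The positions 3, 5 and 6 line up with the PostgreSQL output above: the stop words are gone, but the gaps they leave are preserved.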
Django:
Malcolm Tredinnick's Advice
on Writing SQL in Django:
“If you need to write advanced SQL you should write it.
I would balance that by cautioning against
overuse of the raw() and extra() methods.”
PostgreSQL full-text search
integration with Django ORM
https://github.com/linuxlewis/djorm-ext-pgfulltext
from djorm_pgfulltext.models import SearchManager
from djorm_pgfulltext.fields import VectorField
from django.db import models

class Page(models.Model):
    name = models.CharField(max_length=200)
    description = models.TextField()
    search_index = VectorField()

    objects = SearchManager(
        fields=('name', 'description'),
        config='pg_catalog.english',  # this is default
        search_field='search_index',  # this is default
        auto_update_search_field=True
    )
To search, just use the search
method of the manager
https://github.com/linuxlewis/djorm-ext-pgfulltext
>>> Page.objects.search("documentation & about")
[<Page: Page: Home page>]
>>> Page.objects.search("about | documentation | django | home", raw=True)
[<Page: Page: Home page>, <Page: Page: About>, <Page: Page: Navigation>]
Second way
class Page(models.Model):
    name = models.CharField(max_length=200)
    description = models.TextField()

    objects = SearchManager(fields=None, search_field=None)

>>> Page.objects.search("documentation & about", fields=('name', 'description'))
[<Page: Page: Home page>]
>>> Page.objects.search("about | documentation | django | home", raw=True, fields=('name', 'description'))
[<Page: Page: Home page>, <Page: Page: About>, <Page: Page: Navigation>]
Pros and Cons
Pros:
• Quick implementation
• No extra dependencies
Cons:
• Need to manage indexes manually
• Not as flexible as pure search engines
• Tied to PostgreSQL
• No analytics data
• No DSL, only `&` and `|` queries
• Difficult to manage stop words
ElasticSearch
Who uses ElasticSearch?
ElasticSearch:
Quick Intro
Relational DB:  Databases -> Tables -> Rows -> Columns
ElasticSearch:  Indices -> Types -> Documents -> Fields
ElasticSearch:
Quick Intro
PUT /haystack/user/1
{
    "first_name" : "Andrii",
    "last_name" :  "Soldatenko",
    "age" :        30,
    "about" :      "I love to go rock climbing",
    "interests": [ "sports", "music" ],
    "likes": [ "python", "django" ]
}
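From Python, the same request can be put together with nothing but the standard library. This is a sketch under assumptions: `BASE_URL` assumes a node listening on localhost:9200, `build_index_request` is a hypothetical helper, and the request is only built, not sent, so the snippet runs without a cluster.

```python
import json
from urllib.request import Request

# Assumed local node address; adjust to your own cluster.
BASE_URL = "http://localhost:9200"

doc = {
    "first_name": "Andrii",
    "last_name": "Soldatenko",
    "age": 30,
    "about": "I love to go rock climbing",
    "interests": ["sports", "music"],
    "likes": ["python", "django"],
}


def build_index_request(index, doc_type, doc_id, body):
    """Build (but do not send) the PUT /index/type/id request for a document."""
    url = "{}/{}/{}/{}".format(BASE_URL, index, doc_type, doc_id)
    data = json.dumps(body).encode("utf-8")
    return Request(url, data=data, method="PUT",
                   headers={"Content-Type": "application/json"})


request = build_index_request("haystack", "user", 1, doc)
print(request.get_method(), request.full_url)
# Sending it would be urllib.request.urlopen(request), which needs a running node.
```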
ElasticSearch:
Locks
• Pessimistic concurrency control
• Optimistic concurrency control
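Optimistic concurrency control takes no locks: each document carries a version number, and a write is rejected if the version it was based on is no longer current. Elasticsearch follows this approach with a `_version` on each document. The sketch below is a plain in-memory illustration of the idea, not Elasticsearch's API; `VersionedStore` and `VersionConflict` are hypothetical names.

```python
class VersionConflict(Exception):
    """Raised when a write is based on a stale version of a document."""


class VersionedStore:
    """In-memory store that bumps a version number on every successful write."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, body)

    def get(self, doc_id):
        return self._docs[doc_id]

    def put(self, doc_id, body, expected_version=None):
        current_version = self._docs.get(doc_id, (0, None))[0]
        if expected_version is not None and expected_version != current_version:
            raise VersionConflict(
                "expected version {}, found {}".format(expected_version, current_version))
        self._docs[doc_id] = (current_version + 1, body)
        return current_version + 1


store = VersionedStore()
store.put("1", {"likes": ["python"]})  # creates version 1
version, body = store.get("1")

# Two writers both read version 1; the first one to write wins.
store.put("1", {"likes": ["python", "django"]}, expected_version=version)
try:
    store.put("1", {"likes": ["python", "flask"]}, expected_version=version)
    conflict = False
except VersionConflict:
    conflict = True  # the second writer has to re-read and retry

print(conflict, store.get("1"))
```

A pessimistic scheme would instead block the second writer until the first had finished.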
ElasticSearch:
Setup
#!/bin/bash
VERSION=1.7.1
curl -L -O https://download.elastic.co/elasticsearch/elasticsearch/elasticsearch-$VERSION.zip
unzip elasticsearch-$VERSION.zip
cd elasticsearch-$VERSION
# Download plugin marvel
./bin/plugin -i elasticsearch/marvel/latest
echo 'marvel.agent.enabled: false' >> ./config/elasticsearch.yml
# run elastic
./bin/elasticsearch -d
ElasticSearch:
Setup
$ curl 'http://localhost:9200/?pretty'
{
  "status" : 200,
  "name" : "Dredmund Druid",
  "cluster_name" : "elasticsearch",
  "version" : {
    "number" : "1.7.1",
    "build_hash" : "b88f43fc40b0bcd7f173a1f9ee2e97816de80b19",
    "build_timestamp" : "2015-07-29T09:54:16Z",
    "build_snapshot" : false,
    "lucene_version" : "4.10.4"
  },
  "tagline" : "You Know, for Search"
}
Haystack
Adding search functionality
to Simple Model
$ cat myapp/models.py
from django.db import models
from django.contrib.auth.models import User

class Page(models.Model):
    user = models.ForeignKey(User)
    name = models.CharField(max_length=200)
    description = models.TextField()

    def __unicode__(self):
        return self.name
Haystack: Installation
$ pip install django-haystack
$ cat settings.py
INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.sites',
    # Added.
    'haystack',
    # Then your usual apps...
    'blog',
]
Haystack: Installation
$ pip install elasticsearch
$ cat settings.py
...
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine',
        'URL': 'http://127.0.0.1:9200/',
        'INDEX_NAME': 'haystack',
    },
}
...
Haystack:
Creating SearchIndexes
$ cat myapp/search_indexes.py
import datetime
from haystack import indexes
from myapp.models import Page

class PageIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True)
    author = indexes.CharField(model_attr='user')
    # Assumes Page also has a pub_date field (not shown in the model above).
    pub_date = indexes.DateTimeField(model_attr='pub_date')

    def get_model(self):
        return Page

    def index_queryset(self, using=None):
        """Used when the entire index for model is updated."""
        return self.get_model().objects.filter(pub_date__lte=datetime.datetime.now())
Haystack:
SearchQuerySet API
from haystack.query import SearchQuerySet
from haystack.inputs import Raw

all_results = SearchQuerySet().all()
hello_results = SearchQuerySet().filter(content='hello')
unfriendly_results = (SearchQuerySet()
                      .exclude(content='hello')
                      .filter(content='world'))
# To send unescaped data:
sqs = SearchQuerySet().filter(title=Raw(trusted_query))
Keeping data in sync
# Update everything.
./manage.py update_index --settings=settings.prod
# Update everything with lots of information about what's going on.
./manage.py update_index --settings=settings.prod --verbosity=2
# Update everything, cleaning up after deleted models.
./manage.py update_index --remove --settings=settings.prod
# Update everything changed in the last 2 hours.
./manage.py update_index --age=2 --settings=settings.prod
# Update everything between Dec. 1, 2011 & Dec 31, 2011
./manage.py update_index --start='2011-12-01T00:00:00' --end='2011-12-31T23:59:59' --settings=settings.prod
Signals
class RealtimeSignalProcessor(BaseSignalProcessor):
    """
    Allows for observing when saves/deletes fire & automatically updates the
    search engine appropriately.
    """
    def setup(self):
        # Naive (listen to all model saves).
        models.signals.post_save.connect(self.handle_save)
        models.signals.post_delete.connect(self.handle_delete)
        # Efficient would be going through all backends & collecting all models
        # being used, then hooking up signals only for those.

    def teardown(self):
        # Naive (listen to all model saves).
        models.signals.post_save.disconnect(self.handle_save)
        models.signals.post_delete.disconnect(self.handle_delete)
        # Efficient would be going through all backends & collecting all models
        # being used, then disconnecting signals only for those.
Haystack:
Pros and Cons
Pros:
• easy to set up
• looks like Django ORM but for searches
• search engine independent
• supports 4 engines (Elasticsearch, Solr, Xapian, Whoosh)
Cons:
• poor SearchQuerySet API
• difficult to manage stop words
• loses performance because of the extra layer
• model-based
Future FTS and
Roadmap Django 1.9
• PostgreSQL Full Text Search (Marc Tamlyn)
https://github.com/django/django/pull/4726
• Custom indexes (Marc Tamlyn)
• etc.
Final Thoughts
https://www.elastic.co/guide/en/elasticsearch/guide/master/index.html
Thank You
a_soldatenko@wargaming.net
@a_soldatenko
https://asoldatenko.com
We are hiring
a_soldatenko@wargaming.net
Questions?

More Related Content

What's hot

2016 bioinformatics i_io_wim_vancriekinge
2016 bioinformatics i_io_wim_vancriekinge2016 bioinformatics i_io_wim_vancriekinge
2016 bioinformatics i_io_wim_vancriekinge
Prof. Wim Van Criekinge
 
Understanding Graph Databases with Neo4j and Cypher
Understanding Graph Databases with Neo4j and CypherUnderstanding Graph Databases with Neo4j and Cypher
Understanding Graph Databases with Neo4j and Cypher
Ruhaim Izmeth
 
Compact ordered dict__k_lab_meeting_
Compact ordered dict__k_lab_meeting_Compact ordered dict__k_lab_meeting_
Compact ordered dict__k_lab_meeting_
miki koganei
 
Sasi, cassandra on the full text search ride At Voxxed Day Belgrade 2016
Sasi, cassandra on the full text search ride At  Voxxed Day Belgrade 2016Sasi, cassandra on the full text search ride At  Voxxed Day Belgrade 2016
Sasi, cassandra on the full text search ride At Voxxed Day Belgrade 2016
Duyhai Doan
 
Node collaboration - Exported Resources and PuppetDB
Node collaboration - Exported Resources and PuppetDBNode collaboration - Exported Resources and PuppetDB
Node collaboration - Exported Resources and PuppetDB
m_richardson
 
The Ring programming language version 1.5.1 book - Part 38 of 180
The Ring programming language version 1.5.1 book - Part 38 of 180The Ring programming language version 1.5.1 book - Part 38 of 180
The Ring programming language version 1.5.1 book - Part 38 of 180
Mahmoud Samir Fayed
 
Webscale PostgreSQL - JSONB and Horizontal Scaling Strategies
Webscale PostgreSQL - JSONB and Horizontal Scaling StrategiesWebscale PostgreSQL - JSONB and Horizontal Scaling Strategies
Webscale PostgreSQL - JSONB and Horizontal Scaling Strategies
Jonathan Katz
 
Alta vista indexing and search engine
Alta vista  indexing and search engineAlta vista  indexing and search engine
Alta vista indexing and search engine
daomucun
 
PuppetDB: A Single Source for Storing Your Puppet Data - PUG NY
PuppetDB: A Single Source for Storing Your Puppet Data - PUG NYPuppetDB: A Single Source for Storing Your Puppet Data - PUG NY
PuppetDB: A Single Source for Storing Your Puppet Data - PUG NY
Puppet
 
dns.workshop.hsgr
dns.workshop.hsgrdns.workshop.hsgr
dns.workshop.hsgr
ebalaskas
 
2015 bioinformatics databases_wim_vancriekinge
2015 bioinformatics databases_wim_vancriekinge2015 bioinformatics databases_wim_vancriekinge
2015 bioinformatics databases_wim_vancriekinge
Prof. Wim Van Criekinge
 
Parse, scale to millions
Parse, scale to millionsParse, scale to millions
Parse, scale to millions
Florent Vilmart
 
PostgreSQL FTS Solutions FOSDEM 2013 - PGDAY
PostgreSQL FTS Solutions FOSDEM 2013 - PGDAYPostgreSQL FTS Solutions FOSDEM 2013 - PGDAY
PostgreSQL FTS Solutions FOSDEM 2013 - PGDAYEmanuel Calvo
 
Doing Horrible Things with DNS - Web Directions South
Doing Horrible Things with DNS - Web Directions SouthDoing Horrible Things with DNS - Web Directions South
Doing Horrible Things with DNS - Web Directions South
Tom Croucher
 
RediSearch Mumbai Meetup 2020
RediSearch Mumbai Meetup 2020RediSearch Mumbai Meetup 2020
RediSearch Mumbai Meetup 2020
⚡️ Vikram Sahu
 
Tuning Solr & Pipeline for Logs
Tuning Solr & Pipeline for LogsTuning Solr & Pipeline for Logs
Tuning Solr & Pipeline for Logs
Sematext Group, Inc.
 
InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...
InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...
InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...
InfluxData
 
Accelerating Local Search with PostgreSQL (KNN-Search)
Accelerating Local Search with PostgreSQL (KNN-Search)Accelerating Local Search with PostgreSQL (KNN-Search)
Accelerating Local Search with PostgreSQL (KNN-Search)
Jonathan Katz
 
아파트 정보를 이용한 ELK stack 활용 - 오근문
아파트 정보를 이용한 ELK stack 활용 - 오근문아파트 정보를 이용한 ELK stack 활용 - 오근문
아파트 정보를 이용한 ELK stack 활용 - 오근문
NAVER D2
 
Value protocols and codables
Value protocols and codablesValue protocols and codables
Value protocols and codables
Florent Vilmart
 

What's hot (20)

2016 bioinformatics i_io_wim_vancriekinge
2016 bioinformatics i_io_wim_vancriekinge2016 bioinformatics i_io_wim_vancriekinge
2016 bioinformatics i_io_wim_vancriekinge
 
Understanding Graph Databases with Neo4j and Cypher
Understanding Graph Databases with Neo4j and CypherUnderstanding Graph Databases with Neo4j and Cypher
Understanding Graph Databases with Neo4j and Cypher
 
Compact ordered dict__k_lab_meeting_
Compact ordered dict__k_lab_meeting_Compact ordered dict__k_lab_meeting_
Compact ordered dict__k_lab_meeting_
 
Sasi, cassandra on the full text search ride At Voxxed Day Belgrade 2016
Sasi, cassandra on the full text search ride At  Voxxed Day Belgrade 2016Sasi, cassandra on the full text search ride At  Voxxed Day Belgrade 2016
Sasi, cassandra on the full text search ride At Voxxed Day Belgrade 2016
 
Node collaboration - Exported Resources and PuppetDB
Node collaboration - Exported Resources and PuppetDBNode collaboration - Exported Resources and PuppetDB
Node collaboration - Exported Resources and PuppetDB
 
The Ring programming language version 1.5.1 book - Part 38 of 180
The Ring programming language version 1.5.1 book - Part 38 of 180The Ring programming language version 1.5.1 book - Part 38 of 180
The Ring programming language version 1.5.1 book - Part 38 of 180
 
Webscale PostgreSQL - JSONB and Horizontal Scaling Strategies
Webscale PostgreSQL - JSONB and Horizontal Scaling StrategiesWebscale PostgreSQL - JSONB and Horizontal Scaling Strategies
Webscale PostgreSQL - JSONB and Horizontal Scaling Strategies
 
Alta vista indexing and search engine
Alta vista  indexing and search engineAlta vista  indexing and search engine
Alta vista indexing and search engine
 
PuppetDB: A Single Source for Storing Your Puppet Data - PUG NY
PuppetDB: A Single Source for Storing Your Puppet Data - PUG NYPuppetDB: A Single Source for Storing Your Puppet Data - PUG NY
PuppetDB: A Single Source for Storing Your Puppet Data - PUG NY
 
dns.workshop.hsgr
dns.workshop.hsgrdns.workshop.hsgr
dns.workshop.hsgr
 
2015 bioinformatics databases_wim_vancriekinge
2015 bioinformatics databases_wim_vancriekinge2015 bioinformatics databases_wim_vancriekinge
2015 bioinformatics databases_wim_vancriekinge
 
Parse, scale to millions
Parse, scale to millionsParse, scale to millions
Parse, scale to millions
 
PostgreSQL FTS Solutions FOSDEM 2013 - PGDAY
PostgreSQL FTS Solutions FOSDEM 2013 - PGDAYPostgreSQL FTS Solutions FOSDEM 2013 - PGDAY
PostgreSQL FTS Solutions FOSDEM 2013 - PGDAY
 
Doing Horrible Things with DNS - Web Directions South
Doing Horrible Things with DNS - Web Directions SouthDoing Horrible Things with DNS - Web Directions South
Doing Horrible Things with DNS - Web Directions South
 
RediSearch Mumbai Meetup 2020
RediSearch Mumbai Meetup 2020RediSearch Mumbai Meetup 2020
RediSearch Mumbai Meetup 2020
 
Tuning Solr & Pipeline for Logs
Tuning Solr & Pipeline for LogsTuning Solr & Pipeline for Logs
Tuning Solr & Pipeline for Logs
 
InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...
InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...
InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...
 
Accelerating Local Search with PostgreSQL (KNN-Search)
Accelerating Local Search with PostgreSQL (KNN-Search)Accelerating Local Search with PostgreSQL (KNN-Search)
Accelerating Local Search with PostgreSQL (KNN-Search)
 
아파트 정보를 이용한 ELK stack 활용 - 오근문
아파트 정보를 이용한 ELK stack 활용 - 오근문아파트 정보를 이용한 ELK stack 활용 - 오근문
아파트 정보를 이용한 ELK stack 활용 - 오근문
 
Value protocols and codables
Value protocols and codablesValue protocols and codables
Value protocols and codables
 

Viewers also liked

Practical continuous quality gates for development process
Practical continuous quality gates for development processPractical continuous quality gates for development process
Practical continuous quality gates for development process
Andrii Soldatenko
 
PyCon 2015 Belarus Andrii Soldatenko
PyCon 2015 Belarus Andrii SoldatenkoPyCon 2015 Belarus Andrii Soldatenko
PyCon 2015 Belarus Andrii Soldatenko
Andrii Soldatenko
 
SeleniumCamp 2015 Andrii Soldatenko
SeleniumCamp 2015 Andrii SoldatenkoSeleniumCamp 2015 Andrii Soldatenko
SeleniumCamp 2015 Andrii Soldatenko
Andrii Soldatenko
 
PyCon Ukraine 2014
PyCon Ukraine 2014PyCon Ukraine 2014
PyCon Ukraine 2014
Andrii Soldatenko
 
Full text search | Speech by Matteo Durighetto | PGDay.IT 2013
Full text search | Speech by Matteo Durighetto | PGDay.IT 2013 Full text search | Speech by Matteo Durighetto | PGDay.IT 2013
Full text search | Speech by Matteo Durighetto | PGDay.IT 2013
Miriade Spa
 
Scaling search to a million pages with Solr, Python, and Django
Scaling search to a million pages with Solr, Python, and DjangoScaling search to a million pages with Solr, Python, and Django
Scaling search to a million pages with Solr, Python, and Django
tow21
 
Tricuris.trichiura
Tricuris.trichiuraTricuris.trichiura
Tricuris.trichiura
Joel Rojas
 
Better Full Text Search in PostgreSQL
Better Full Text Search in PostgreSQLBetter Full Text Search in PostgreSQL
Better Full Text Search in PostgreSQL
Artur Zakirov
 
Full Text Search In PostgreSQL
Full Text Search In PostgreSQLFull Text Search In PostgreSQL
Full Text Search In PostgreSQL
Karwin Software Solutions LLC
 

Viewers also liked (10)

Practical continuous quality gates for development process
Practical continuous quality gates for development processPractical continuous quality gates for development process
Practical continuous quality gates for development process
 
PyCon 2015 Belarus Andrii Soldatenko
PyCon 2015 Belarus Andrii SoldatenkoPyCon 2015 Belarus Andrii Soldatenko
PyCon 2015 Belarus Andrii Soldatenko
 
SeleniumCamp 2015 Andrii Soldatenko
SeleniumCamp 2015 Andrii SoldatenkoSeleniumCamp 2015 Andrii Soldatenko
SeleniumCamp 2015 Andrii Soldatenko
 
PyCon Ukraine 2014
PyCon Ukraine 2014PyCon Ukraine 2014
PyCon Ukraine 2014
 
Full text search | Speech by Matteo Durighetto | PGDay.IT 2013
Full text search | Speech by Matteo Durighetto | PGDay.IT 2013 Full text search | Speech by Matteo Durighetto | PGDay.IT 2013
Full text search | Speech by Matteo Durighetto | PGDay.IT 2013
 
Scaling search to a million pages with Solr, Python, and Django
Scaling search to a million pages with Solr, Python, and DjangoScaling search to a million pages with Solr, Python, and Django
Scaling search to a million pages with Solr, Python, and Django
 
Plasmodium
PlasmodiumPlasmodium
Plasmodium
 
Tricuris.trichiura
Tricuris.trichiuraTricuris.trichiura
Tricuris.trichiura
 
Better Full Text Search in PostgreSQL
Better Full Text Search in PostgreSQLBetter Full Text Search in PostgreSQL
Better Full Text Search in PostgreSQL
 
Full Text Search In PostgreSQL
Full Text Search In PostgreSQLFull Text Search In PostgreSQL
Full Text Search In PostgreSQL
 

Similar to PyCon Russian 2015 - Dive into full text search with python.

Погружение в полнотекстовый поиск, используя Python - Андрей Солдатенко, Warg...
Погружение в полнотекстовый поиск, используя Python - Андрей Солдатенко, Warg...Погружение в полнотекстовый поиск, используя Python - Андрей Солдатенко, Warg...
Погружение в полнотекстовый поиск, используя Python - Андрей Солдатенко, Warg...
it-people
 
Kyiv.py #16 october 2015
Kyiv.py #16 october 2015Kyiv.py #16 october 2015
Kyiv.py #16 october 2015
Andrii Soldatenko
 
PyCon Russian 2015 - Dive into full text search with python.

  • 1. Dive into full text search with Python Andrii Soldatenko 18-19 September 2015 @a_soldatenko
  • 2. About me: • Lead QA Automation Engineer at • Backend Python Developer at • Speaker at PyCon Ukraine 2014 • Speaker at PyCon Belarus 2015 • @a_soldatenko
  • 5. Text Search
      grep --ignore-case --recursive foo books/
      grep --ignore-case --recursive --file=words.txt books/

      Entry.objects.get(headline__icontains='foo')

      words = []
      with open('words.txt', 'r') as f:
          words = f.readlines()
      # Illustrative only: Django has no built-in icontains_in lookup.
      Entry.objects.get(headline__icontains_in=words)
  • 8. Simple sentences
      1. The quick brown fox jumped over the lazy dog
      2. Quick brown foxes leap over lazy dogs in summer
  • 9. Inverted index
      Term      Doc_1   Doc_2
      -------------------------
      Quick   |       |  X
      The     |   X   |
      brown   |   X   |  X
      dog     |   X   |
      dogs    |       |  X
      fox     |   X   |
      foxes   |       |  X
      in      |       |  X
      jumped  |   X   |
      lazy    |   X   |  X
      leap    |       |  X
      over    |   X   |  X
      quick   |   X   |
      summer  |       |  X
      the     |   X   |
      -------------------------
  • 10. Inverted index
      Term      Doc_1   Doc_2
      -------------------------
      brown   |   X   |  X
      quick   |   X   |
      -------------------------
      Total   |   2   |  1
  • 11. Inverted index: normalization
      Before normalization:
      Term      Doc_1   Doc_2
      -------------------------
      Quick   |       |  X
      The     |   X   |
      brown   |   X   |  X
      dog     |   X   |
      dogs    |       |  X
      fox     |   X   |
      foxes   |       |  X
      in      |       |  X
      jumped  |   X   |
      lazy    |   X   |  X
      leap    |       |  X
      over    |   X   |  X
      quick   |   X   |
      summer  |       |  X
      the     |   X   |
      -------------------------
      After normalization (lowercasing, stemming, synonyms):
      Term      Doc_1   Doc_2
      -------------------------
      brown   |   X   |  X
      dog     |   X   |  X
      fox     |   X   |  X
      in      |       |  X
      jump    |   X   |  X
      lazy    |   X   |  X
      over    |   X   |  X
      quick   |   X   |  X
      summer  |       |  X
      the     |   X   |
      -------------------------
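The inverted index and its normalization step can be sketched in a few lines of Python. This is only an illustration for the two sample sentences: the `NORMALIZE` table is a hand-written assumption standing in for a real stemmer and synonym dictionary, and `build_index` and `search` are hypothetical helper names, not part of any library.

```python
from collections import defaultdict

DOCS = {
    1: "The quick brown fox jumped over the lazy dog",
    2: "Quick brown foxes leap over lazy dogs in summer",
}

# Hand-written normalization table (assumption): just enough stemming and
# synonym handling for the two sample sentences.
NORMALIZE = {"foxes": "fox", "dogs": "dog", "jumped": "jump", "leap": "jump"}


def normalize(token):
    """Lowercase a token and map it to its normalized form."""
    token = token.lower()
    return NORMALIZE.get(token, token)


def build_index(docs):
    """Map every normalized term to the set of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[normalize(token)].add(doc_id)
    return index


def search(index, query):
    """Count how many query terms each document contains."""
    scores = defaultdict(int)
    for token in query.split():
        for doc_id in index.get(normalize(token), ()):
            scores[doc_id] += 1
    return dict(scores)


index = build_index(DOCS)
print(search(index, "quick brown"))
```

With normalization applied to both the documents and the query, "quick brown" matches both documents equally, whereas the raw index on slide 10 gives Doc_1 two hits and Doc_2 only one.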
  • 14. PostgreSQL: operators for textual data types
      -- PostgreSQL has operators for textual data types:
      -- LIKE  - match, case-sensitive
      -- ILIKE - match, case-insensitive
      -- ~     - matches POSIX regular expression, case-sensitive
      -- ~*    - matches POSIX regular expression, case-insensitive
      select 'foo' LIKE 'foo';           -- true
      select 'bar' ILIKE 'BAR';          -- true
      select 'abc' LIKE 'b';             -- false (LIKE must match the whole string)
      select 'abc' LIKE 'c';             -- false
      select 'abc' ~ 'abc';              -- true
      select 'abc' ~ '^a';               -- true
      select 'abc' ~ '(b|d)';            -- true
      select 'abc' ~ '^(b|c)';           -- false
      select 'andrii' ~* '.*Andrii.*';   -- true
  • 15. PostgreSQL: accuracy issue
      select 'prone' like '%one%';   -- true
      select 'money' like '%one%';   -- true
      select 'lonely' like '%one%';  -- true
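The accuracy issue is not specific to SQL: any substring test behaves the same way. The following Python sketch contrasts substring matching with whole-token matching. The sample texts and the helper names `substring_match` and `token_match` are made up for illustration.

```python
import re

# Made-up sample texts; only the last one contains the word "one".
TEXTS = ["accident prone", "money talks", "a lonely road", "only one left"]


def substring_match(text, term):
    """Rough equivalent of: text LIKE '%term%'."""
    return term in text


def token_match(text, term):
    """Match only when the term is a whole word in the text."""
    return term in re.findall(r"\w+", text.lower())


print([t for t in TEXTS if substring_match(t, "one")])
print([t for t in TEXTS if token_match(t, "one")])
```

The substring test returns all four texts, while the token test returns only the one that actually contains the word. Splitting text into tokens is the first step of the full text search pipeline.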
  • 16. Full text search in PostgreSQL
      1. Creating tokens
      2. Converting tokens into lexemes
      3. Storing preprocessed documents
  • 17. Full text search in PostgreSQL
      • 27 built-in configurations for 10 languages
      • Support of user-defined FTS configurations
      • Pluggable dictionaries, parsers
      • Inverted indexes
  • 18. functions to convert normal text to tsvector
      explain SELECT 'a fat cat sat on a mat and ate a fat rat'::tsvector @@
              'cat & rat'::tsquery;
                      QUERY PLAN
      ------------------------------------------
       Result  (cost=0.00..0.01 rows=1 width=0)
      (1 row)

      explain SELECT 'fat & cow'::tsquery @@
              'a fat cat sat on a mat and ate a fat rat'::tsvector;  -- false
                      QUERY PLAN
      ------------------------------------------
       Result  (cost=0.00..0.01 rows=1 width=0)
      (1 row)
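What the `@@` match operator checks can be approximated in Python. This is a toy sketch, not PostgreSQL's implementation: the hypothetical `matches` function treats the document as a set of whitespace-separated terms and supports only `&` and `|` (with `&` binding tighter), without parentheses, negation or stemming.

```python
def matches(document, query):
    """Return True if the document satisfies a query built from & and |.

    The query is read as OR-groups separated by '|', where each group is
    an AND of terms separated by '&'.
    """
    terms = set(document.split())
    return any(
        all(term.strip() in terms for term in group.split("&"))
        for group in query.split("|")
    )


text = "a fat cat sat on a mat and ate a fat rat"
print(matches(text, "cat & rat"))
print(matches(text, "fat & cow"))
```

The first query matches because both terms occur in the text. The second does not, because "cow" is missing, which mirrors the `-- false` comment on the slide.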
  • 19. PostgreSQL: index management
      CREATE FUNCTION notes_vector_update() RETURNS TRIGGER AS $$
      BEGIN
          IF TG_OP = 'INSERT' THEN
              new.search_index = to_tsvector('pg_catalog.english', COALESCE(NEW.name, ''));
          END IF;
          IF TG_OP = 'UPDATE' THEN
              IF NEW.name <> OLD.name THEN
                  new.search_index = to_tsvector('pg_catalog.english', COALESCE(NEW.name, ''));
              END IF;
          END IF;
          RETURN NEW;
      END
      $$ LANGUAGE 'plpgsql';
  • 20. PostgreSQL: stopwords
      SELECT to_tsvector('english','in the list of stop words');
              to_tsvector
      ----------------------------
       'list':3 'stop':5 'word':6

      /usr/pgsql-9.3/share/tsearch_data/english.stop
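The `to_tsvector` output above can be mimicked with a toy Python function to show where the positions come from. This sketch assumes a three-word stop list and a naive plural-stripping `stem` helper. PostgreSQL itself uses the `english.stop` file and a proper stemming dictionary, so this is not its algorithm.

```python
# Tiny illustrative stop list (assumption); the real list lives in english.stop.
STOP_WORDS = {"in", "the", "of"}


def stem(token):
    """Naive stemming: strip a trailing plural 's' from longer words."""
    if len(token) > 3 and token.endswith("s"):
        return token[:-1]
    return token


def to_vector(text):
    """Map each lexeme to its 1-based positions, skipping stop words."""
    vector = {}
    for position, token in enumerate(text.lower().split(), start=1):
        if token in STOP_WORDS:
            continue
        vector.setdefault(stem(token), []).append(position)
    return vector


print(to_vector("in the list of stop words"))
```

Stop words are dropped from the result but still consume a position, which is why "list" is at position 3 rather than 1.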
  • 22. Malcolm Tredinnick's Advice on Writing SQL in Django: "If you need to write advanced SQL you should write it. I would balance that by cautioning against overuse of the raw() and extra() methods."
  • 23. PostgreSQL full-text search integration with Django ORM
      https://github.com/linuxlewis/djorm-ext-pgfulltext

      from djorm_pgfulltext.models import SearchManager
      from djorm_pgfulltext.fields import VectorField
      from django.db import models

      class Page(models.Model):
          name = models.CharField(max_length=200)
          description = models.TextField()
          search_index = VectorField()

          objects = SearchManager(
              fields=('name', 'description'),
              config='pg_catalog.english',  # this is default
              search_field='search_index',  # this is default
              auto_update_search_field=True
          )
  • 24. For search, just use the search method of the manager
      https://github.com/linuxlewis/djorm-ext-pgfulltext

      >>> Page.objects.search("documentation & about")
      [<Page: Page: Home page>]
      >>> Page.objects.search("about | documentation | django | home", raw=True)
      [<Page: Page: Home page>, <Page: Page: About>, <Page: Page: Navigation>]
  • 25. Second way
      class Page(models.Model):
          name = models.CharField(max_length=200)
          description = models.TextField()

          objects = SearchManager(fields=None, search_field=None)

      >>> Page.objects.search("documentation & about",
      ...                     fields=('name', 'description'))
      [<Page: Page: Home page>]
      >>> Page.objects.search("about | documentation | django | home",
      ...                     raw=True, fields=('name', 'description'))
      [<Page: Page: Home page>, <Page: Page: About>, <Page: Page: Navigation>]
  • 26. Pros and Cons
      Pros:
      • Quick implementation
      • No extra dependencies
      Cons:
      • Indexes must be managed manually
      • Not as flexible as dedicated search engines
      • Tied to PostgreSQL
      • No analytics data
      • No query DSL, only `&` and `|` queries
      • Difficult to manage stop words
  • 29. ElasticSearch: Quick Intro
      Relational DB | Databases | Tables | Rows      | Columns
      ElasticSearch | Indices   | Types  | Documents | Fields
  • 30. ElasticSearch: Quick Intro
      PUT /haystack/user/1
      {
          "first_name": "Andrii",
          "last_name": "Soldatenko",
          "age": 30,
          "about": "I love to go rock climbing",
          "interests": ["sports", "music"],
          "likes": ["python", "django"]
      }
  • 32. ElasticSearch: Setup
      #!/bin/bash
      VERSION=1.7.1
      curl -L -O https://download.elastic.co/elasticsearch/elasticsearch/elasticsearch-$VERSION.zip
      unzip elasticsearch-$VERSION.zip
      cd elasticsearch-$VERSION
      # Download plugin marvel
      ./bin/plugin -i elasticsearch/marvel/latest
      echo 'marvel.agent.enabled: false' >> ./config/elasticsearch.yml
      # run elastic
      ./bin/elasticsearch -d
  • 33. ElasticSearch: Setup
      $ curl 'http://localhost:9200/?pretty'
      {
        "status" : 200,
        "name" : "Dredmund Druid",
        "cluster_name" : "elasticsearch",
        "version" : {
          "number" : "1.7.1",
          "build_hash" : "b88f43fc40b0bcd7f173a1f9ee2e97816de80b19",
          "build_timestamp" : "2015-07-29T09:54:16Z",
          "build_snapshot" : false,
          "lucene_version" : "4.10.4"
        },
        "tagline" : "You Know, for Search"
      }
  • 35. Adding search functionality to Simple Model
      $ cat myapp/models.py
      from django.db import models
      from django.contrib.auth.models import User

      class Page(models.Model):
          user = models.ForeignKey(User)
          name = models.CharField(max_length=200)
          description = models.TextField()

          def __unicode__(self):
              return self.name
  • 36. Haystack: Installation
      $ pip install django-haystack
      $ cat settings.py
      INSTALLED_APPS = [
          'django.contrib.admin',
          'django.contrib.auth',
          'django.contrib.contenttypes',
          'django.contrib.sessions',
          'django.contrib.sites',
          # Added.
          'haystack',
          # Then your usual apps...
          'blog',
      ]
  • 37. Haystack: Installation
      $ pip install elasticsearch
      $ cat settings.py
      ...
      HAYSTACK_CONNECTIONS = {
          'default': {
              'ENGINE': 'haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine',
              'URL': 'http://127.0.0.1:9200/',
              'INDEX_NAME': 'haystack',
          },
      }
      ...
  • 38. Haystack: Creating SearchIndexes
      $ cat myapp/search_indexes.py
      import datetime
      from haystack import indexes
      from myapp.models import Note

      # Assumes a Note model with user and pub_date fields.
      class PageIndex(indexes.SearchIndex, indexes.Indexable):
          text = indexes.CharField(document=True, use_template=True)
          author = indexes.CharField(model_attr='user')
          pub_date = indexes.DateTimeField(model_attr='pub_date')

          def get_model(self):
              return Note

          def index_queryset(self, using=None):
              """Used when the entire index for model is updated."""
              return self.get_model().objects.filter(
                  pub_date__lte=datetime.datetime.now())
  • 39. Haystack: SearchQuerySet API
      from haystack.query import SearchQuerySet
      from haystack.inputs import Raw

      all_results = SearchQuerySet().all()
      hello_results = SearchQuerySet().filter(content='hello')
      unfriendly_results = (SearchQuerySet()
                            .exclude(content='hello')
                            .filter(content='world'))
      # To send unescaped data:
      sqs = SearchQuerySet().filter(title=Raw(trusted_query))
  • 40. Keeping data in sync
      # Update everything.
      ./manage.py update_index --settings=settings.prod
      # Update everything with lots of information about what's going on.
      ./manage.py update_index --settings=settings.prod --verbosity=2
      # Update everything, cleaning up after deleted models.
      ./manage.py update_index --remove --settings=settings.prod
      # Update everything changed in the last 2 hours.
      ./manage.py update_index --age=2 --settings=settings.prod
      # Update everything between Dec. 1, 2011 & Dec 31, 2011
      ./manage.py update_index --start='2011-12-01T00:00:00' \
          --end='2011-12-31T23:59:59' --settings=settings.prod
  • 41. Signals
      from django.db import models
      from haystack.signals import BaseSignalProcessor

      class RealtimeSignalProcessor(BaseSignalProcessor):
          """
          Allows for observing when saves/deletes fire & automatically updates the
          search engine appropriately.
          """
          def setup(self):
              # Naive (listen to all model saves).
              models.signals.post_save.connect(self.handle_save)
              models.signals.post_delete.connect(self.handle_delete)
              # Efficient would be going through all backends & collecting all models
              # being used, then hooking up signals only for those.

          def teardown(self):
              # Naive (listen to all model saves).
              models.signals.post_save.disconnect(self.handle_save)
              models.signals.post_delete.disconnect(self.handle_delete)
              # Efficient would be going through all backends & collecting all models
              # being used, then disconnecting signals only for those.
  • 42. Haystack: Pros and Cons
      Pros:
      • Easy to set up
      • Looks like the Django ORM, but for searches
      • Search engine independent
      • Supports 4 engines (Elastic, Solr, Xapian, Whoosh)
      Cons:
      • Poor SearchQuerySet API
      • Difficult to manage stop words
      • Loses performance because of the extra layer
      • Model-based
  • 43. Future FTS and Roadmap
      Django 1.9
      • PostgreSQL Full Text Search (Marc Tamlyn) https://github.com/django/django/pull/4726
      • Custom indexes (Marc Tamlyn)
      • etc.