SlideShare a Scribd company logo
Full Text Search
Django + Postgres
Search is everywhere
Search expectations
● FAST
● Full Text search
● Linguistic support (“craziness | crazy”)
● Ranking
● Fuzzy Searching
● More like this
Django
● SLOW
● `icontains` is dumbed down version of
search
● Searching across tables is pain
● No relevancy, ranking or similar words
unless done manually
● No easy way for fuzzy searching
Other Alternatives
● Solr
● ElasticSearch
● AWS CloudSearch
● Sphinx
● etc*
If you’re using any of the above, use Haystack
Postgres Search
● FAST
● Simple to implement
● Supports Search features like Full Text,
Ranking, Boosting, Fuzzy etc..
Django
Live Example
● Search Students by name or by course
● Use South migration to create tsvector
column
● Store title in Search table
● Update Search table via Celery on Save of
Student data
https://github.com/Syerram/postgres_search
GIN, GIST
● GIST is Hash based, GIN is B-trees
● GINs = GISTs * 3 , s = Speed
● GINu = GISTu * 3 , u = update time
● GINkb = GISTkb * 3, kb = size
A gin index
CREATE INDEX student_index ON students USING gin(to_tsvector('english'
name));
Source http://www.postgresql.org/docs/9.2/static/textsearch-indexes.html
Full Text Search
● All text should be preprocessed using
tsvector and queried using tsquery
● Both reduce the text to lexemes
SELECT to_tsvector('How much wood would a woodchuck chuck If a woodchuck could
chuck wood?')
"'chuck':7,12 'could':11 'much':2 'wood':3,13 'woodchuck':6,10 'would':4"
● Both are required for searching to work on
normal text
SELECT to_tsvector('How much wood would a woodchucks chucks If a woodchucks could
chucks woods?') @@ 'chucks' -- False
SELECT to_tsvector('How much wood would a woodchucks chucks If a woodchucks could
chucks woods?') @@ to_tsquery('chucks') -- True
Full Text Search (Contd.)
● Technically you don’t need index, but for
large tables it will be slow
SELECT * FROM students where to_tsvector('english', name) @@ to_tsquery('english',
'Kirk')
● GIN or GIST Index
CREATE INDEX <index_name> ON <table_name> USING gin(<col_name>);
● Expression Based
CREATE INDEX <index_name> ON <table_name> USING gin(to_tsvector(COALESCE(col_name,'')
|| COALESCE(col_name,'')));
Boosting
● Boost certain results over others
● Still matching
● Use ts_rank to boost results
e.g.
…ORDER BY ts_rank(document,
to_tsquery('python')) DESC
Ranking
● Importance of search term within document
e.g.
Search term found in title > description > tag
● Use setweight to assign importance to each field
when preparing Document
e.g.
setweight(to_tsvector(‘english’, post.title), 'A') ||
setweight(to_tsvector(‘english’, post.description), 'B') ||
setweight(to_tsvector('english', post.tags), 'C'))
...
--In search query use ‘ts_rank’ to order by ranking
Trigram
● Group of 3 consecutive chars from String
● Similarity between strings is matched by # of
trigrams they share
e.g. "hello": "h", "he", "hel", "ell", "llo", "lo", and "o”
"hallo": "h", "ha", "hal", "all", "llo", "lo", and "o”
Number of matches: 4
● Use similarity to find related terms. Returns value
between 0 to 1 where 0 no match and 1 is exact match
Soundex/Metaphone
● Oldest and only good for English names
● Converts to a String of Length 4.
e.g. “Anthony == Anthoney” => “A535 ==
A535”
● Create index itself with Soundex or
Metaphone
e.g. CREATE INDEX idx_name ON tb_name USING
GIN(soundex(col_name));
SELECT ... FROM tb_name WHERE soundex(col_name) = soundex(‘...’)
Pro & Con
Pros
● Quick implementation
● Lot easier to change document format and call refresh index
● Speed comparable to other search engines
● Cost effective
Cons
● Not as flexible as pure search engines, like Solr
● Not as fast as Solr though pretty fast for humans
● Tied to Postgres
● Indexes can get pretty large, but so can search engine indexes
Django ORM
● Implements Full text Search
class StudentCourse(models.Model):
...
search_index = VectorField()
objects = SearchManager(
fields = ('student__user__name', 'course__name'),
config = 'pg_catalog.english', # this is default
search_field = 'search_index', # this is default
auto_update_search_field = True
)
● StudentCourse.objects.search("David")
https://github.com/djangonauts/djorm-ext-pgfulltext
Next Steps
● Add Ranking, Boosting, Fuzzy Search to
djorm pgfulltext
e.g. StudentCourse.objects.search("David & Python").rank("Python")
StudentCourse.objects.fuzzy_search("Jython").rank("Python")
StudentCourse.objects.soundex("Davad").rank("Java") & More
● Continue to add examples to
postgres_search
Tips
● Use separate DB if necessary or use
Materialized Views
● Don’t index everything. Limit your
searchable data
● Analyze using `Explain` and ts_stat
● Create indexes on fly using concurrently
● Don’t pull Foreign Key objects in search
Code
• https://github.com/Syerram/pos
tgres_search
• Stack
• AngularJS, Django, Celery, Postgres
• Feel free to Fork, Pull Request
@agileseeker, github/syerram,
syerram.silvrback.com/
Sai

More Related Content

What's hot

Rank Your Results with PostgreSQL Full Text Search (from PGConf2015)
Rank Your Results with PostgreSQL Full Text Search (from PGConf2015)Rank Your Results with PostgreSQL Full Text Search (from PGConf2015)
Rank Your Results with PostgreSQL Full Text Search (from PGConf2015)
Jamey Hanson
 
Better Full Text Search in PostgreSQL
Better Full Text Search in PostgreSQLBetter Full Text Search in PostgreSQL
Better Full Text Search in PostgreSQL
Artur Zakirov
 
On Beyond (PostgreSQL) Data Types
On Beyond (PostgreSQL) Data TypesOn Beyond (PostgreSQL) Data Types
On Beyond (PostgreSQL) Data Types
Jonathan Katz
 
Teaching PostgreSQL to new people
Teaching PostgreSQL to new peopleTeaching PostgreSQL to new people
Teaching PostgreSQL to new people
Tomek Borek
 
PostgreSQL FTS Solutions FOSDEM 2013 - PGDAY
PostgreSQL FTS Solutions FOSDEM 2013 - PGDAYPostgreSQL FTS Solutions FOSDEM 2013 - PGDAY
PostgreSQL FTS Solutions FOSDEM 2013 - PGDAYEmanuel Calvo
 
Developing and Deploying Apps with the Postgres FDW
Developing and Deploying Apps with the Postgres FDWDeveloping and Deploying Apps with the Postgres FDW
Developing and Deploying Apps with the Postgres FDW
Jonathan Katz
 
JDD 2016 - Tomasz Borek - DB for next project? Why, Postgres, of course
JDD 2016 - Tomasz Borek - DB for next project? Why, Postgres, of course JDD 2016 - Tomasz Borek - DB for next project? Why, Postgres, of course
JDD 2016 - Tomasz Borek - DB for next project? Why, Postgres, of course
PROIDEA
 
ElasticSearch AJUG 2013
ElasticSearch AJUG 2013ElasticSearch AJUG 2013
ElasticSearch AJUG 2013
Roy Russo
 
Elasticsearch presentation 1
Elasticsearch presentation 1Elasticsearch presentation 1
Elasticsearch presentation 1
Maruf Hassan
 
Elasticsearch 101 - Cluster setup and tuning
Elasticsearch 101 - Cluster setup and tuningElasticsearch 101 - Cluster setup and tuning
Elasticsearch 101 - Cluster setup and tuning
Petar Djekic
 
Elasticsearch 설치 및 기본 활용
Elasticsearch 설치 및 기본 활용Elasticsearch 설치 및 기본 활용
Elasticsearch 설치 및 기본 활용
종민 김
 
Azure search
Azure searchAzure search
Azure search
Alexej Sommer
 
Pg 95 new capabilities
Pg 95 new capabilitiesPg 95 new capabilities
Pg 95 new capabilities
Jamey Hanson
 
Spark with Elasticsearch
Spark with ElasticsearchSpark with Elasticsearch
Spark with Elasticsearch
Holden Karau
 
Elasticsearch speed is key
Elasticsearch speed is keyElasticsearch speed is key
Elasticsearch speed is key
Enterprise Search Warsaw Meetup
 
[2D1]Elasticsearch 성능 최적화
[2D1]Elasticsearch 성능 최적화[2D1]Elasticsearch 성능 최적화
[2D1]Elasticsearch 성능 최적화
NAVER D2
 
PostgreSQL
PostgreSQLPostgreSQL
PostgreSQL
Reuven Lerner
 
Dapper performance
Dapper performanceDapper performance
Dapper performance
Suresh Loganatha
 
Building a CRM on top of ElasticSearch
Building a CRM on top of ElasticSearchBuilding a CRM on top of ElasticSearch
Building a CRM on top of ElasticSearch
Mark Greene
 
Alta vista indexing and search engine
Alta vista  indexing and search engineAlta vista  indexing and search engine
Alta vista indexing and search engine
daomucun
 

What's hot (20)

Rank Your Results with PostgreSQL Full Text Search (from PGConf2015)
Rank Your Results with PostgreSQL Full Text Search (from PGConf2015)Rank Your Results with PostgreSQL Full Text Search (from PGConf2015)
Rank Your Results with PostgreSQL Full Text Search (from PGConf2015)
 
Better Full Text Search in PostgreSQL
Better Full Text Search in PostgreSQLBetter Full Text Search in PostgreSQL
Better Full Text Search in PostgreSQL
 
On Beyond (PostgreSQL) Data Types
On Beyond (PostgreSQL) Data TypesOn Beyond (PostgreSQL) Data Types
On Beyond (PostgreSQL) Data Types
 
Teaching PostgreSQL to new people
Teaching PostgreSQL to new peopleTeaching PostgreSQL to new people
Teaching PostgreSQL to new people
 
PostgreSQL FTS Solutions FOSDEM 2013 - PGDAY
PostgreSQL FTS Solutions FOSDEM 2013 - PGDAYPostgreSQL FTS Solutions FOSDEM 2013 - PGDAY
PostgreSQL FTS Solutions FOSDEM 2013 - PGDAY
 
Developing and Deploying Apps with the Postgres FDW
Developing and Deploying Apps with the Postgres FDWDeveloping and Deploying Apps with the Postgres FDW
Developing and Deploying Apps with the Postgres FDW
 
JDD 2016 - Tomasz Borek - DB for next project? Why, Postgres, of course
JDD 2016 - Tomasz Borek - DB for next project? Why, Postgres, of course JDD 2016 - Tomasz Borek - DB for next project? Why, Postgres, of course
JDD 2016 - Tomasz Borek - DB for next project? Why, Postgres, of course
 
ElasticSearch AJUG 2013
ElasticSearch AJUG 2013ElasticSearch AJUG 2013
ElasticSearch AJUG 2013
 
Elasticsearch presentation 1
Elasticsearch presentation 1Elasticsearch presentation 1
Elasticsearch presentation 1
 
Elasticsearch 101 - Cluster setup and tuning
Elasticsearch 101 - Cluster setup and tuningElasticsearch 101 - Cluster setup and tuning
Elasticsearch 101 - Cluster setup and tuning
 
Elasticsearch 설치 및 기본 활용
Elasticsearch 설치 및 기본 활용Elasticsearch 설치 및 기본 활용
Elasticsearch 설치 및 기본 활용
 
Azure search
Azure searchAzure search
Azure search
 
Pg 95 new capabilities
Pg 95 new capabilitiesPg 95 new capabilities
Pg 95 new capabilities
 
Spark with Elasticsearch
Spark with ElasticsearchSpark with Elasticsearch
Spark with Elasticsearch
 
Elasticsearch speed is key
Elasticsearch speed is keyElasticsearch speed is key
Elasticsearch speed is key
 
[2D1]Elasticsearch 성능 최적화
[2D1]Elasticsearch 성능 최적화[2D1]Elasticsearch 성능 최적화
[2D1]Elasticsearch 성능 최적화
 
PostgreSQL
PostgreSQLPostgreSQL
PostgreSQL
 
Dapper performance
Dapper performanceDapper performance
Dapper performance
 
Building a CRM on top of ElasticSearch
Building a CRM on top of ElasticSearchBuilding a CRM on top of ElasticSearch
Building a CRM on top of ElasticSearch
 
Alta vista indexing and search engine
Alta vista  indexing and search engineAlta vista  indexing and search engine
Alta vista indexing and search engine
 

Viewers also liked

Full text search | Speech by Matteo Durighetto | PGDay.IT 2013
Full text search | Speech by Matteo Durighetto | PGDay.IT 2013 Full text search | Speech by Matteo Durighetto | PGDay.IT 2013
Full text search | Speech by Matteo Durighetto | PGDay.IT 2013
Miriade Spa
 
Advanced Search with Solr & django-haystack
Advanced Search with Solr & django-haystackAdvanced Search with Solr & django-haystack
Advanced Search with Solr & django-haystack
Marcel Chastain
 
Scaling search to a million pages with Solr, Python, and Django
Scaling search to a million pages with Solr, Python, and DjangoScaling search to a million pages with Solr, Python, and Django
Scaling search to a million pages with Solr, Python, and Django
tow21
 
Как устроен поиск / Андрей Аксенов (Sphinx)
Как устроен поиск / Андрей Аксенов (Sphinx)Как устроен поиск / Андрей Аксенов (Sphinx)
Как устроен поиск / Андрей Аксенов (Sphinx)
Ontico
 
Practical continuous quality gates for development process
Practical continuous quality gates for development processPractical continuous quality gates for development process
Practical continuous quality gates for development process
Andrii Soldatenko
 
Бинарные (файловые) хранилища: страшная сказка с мрачным концом / Даниил Подо...
Бинарные (файловые) хранилища: страшная сказка с мрачным концом / Даниил Подо...Бинарные (файловые) хранилища: страшная сказка с мрачным концом / Даниил Подо...
Бинарные (файловые) хранилища: страшная сказка с мрачным концом / Даниил Подо...
Ontico
 
Annabel Lee
Annabel LeeAnnabel Lee
Annabel Lee
bmtravis
 
Secret History of Silicon Valley - Master Slide Deck
Secret History of Silicon Valley - Master Slide DeckSecret History of Silicon Valley - Master Slide Deck
Secret History of Silicon Valley - Master Slide Deck
Stanford University
 
A Guide to SlideShare Analytics - Excerpts from Hubspot's Step by Step Guide ...
A Guide to SlideShare Analytics - Excerpts from Hubspot's Step by Step Guide ...A Guide to SlideShare Analytics - Excerpts from Hubspot's Step by Step Guide ...
A Guide to SlideShare Analytics - Excerpts from Hubspot's Step by Step Guide ...SlideShare
 

Viewers also liked (9)

Full text search | Speech by Matteo Durighetto | PGDay.IT 2013
Full text search | Speech by Matteo Durighetto | PGDay.IT 2013 Full text search | Speech by Matteo Durighetto | PGDay.IT 2013
Full text search | Speech by Matteo Durighetto | PGDay.IT 2013
 
Advanced Search with Solr & django-haystack
Advanced Search with Solr & django-haystackAdvanced Search with Solr & django-haystack
Advanced Search with Solr & django-haystack
 
Scaling search to a million pages with Solr, Python, and Django
Scaling search to a million pages with Solr, Python, and DjangoScaling search to a million pages with Solr, Python, and Django
Scaling search to a million pages with Solr, Python, and Django
 
Как устроен поиск / Андрей Аксенов (Sphinx)
Как устроен поиск / Андрей Аксенов (Sphinx)Как устроен поиск / Андрей Аксенов (Sphinx)
Как устроен поиск / Андрей Аксенов (Sphinx)
 
Practical continuous quality gates for development process
Practical continuous quality gates for development processPractical continuous quality gates for development process
Practical continuous quality gates for development process
 
Бинарные (файловые) хранилища: страшная сказка с мрачным концом / Даниил Подо...
Бинарные (файловые) хранилища: страшная сказка с мрачным концом / Даниил Подо...Бинарные (файловые) хранилища: страшная сказка с мрачным концом / Даниил Подо...
Бинарные (файловые) хранилища: страшная сказка с мрачным концом / Даниил Подо...
 
Annabel Lee
Annabel LeeAnnabel Lee
Annabel Lee
 
Secret History of Silicon Valley - Master Slide Deck
Secret History of Silicon Valley - Master Slide DeckSecret History of Silicon Valley - Master Slide Deck
Secret History of Silicon Valley - Master Slide Deck
 
A Guide to SlideShare Analytics - Excerpts from Hubspot's Step by Step Guide ...
A Guide to SlideShare Analytics - Excerpts from Hubspot's Step by Step Guide ...A Guide to SlideShare Analytics - Excerpts from Hubspot's Step by Step Guide ...
A Guide to SlideShare Analytics - Excerpts from Hubspot's Step by Step Guide ...
 

Similar to Full Text search in Django with Postgres

Search Engine-Building with Lucene and Solr, Part 1 (SoCal Code Camp LA 2013)
Search Engine-Building with Lucene and Solr, Part 1 (SoCal Code Camp LA 2013)Search Engine-Building with Lucene and Solr, Part 1 (SoCal Code Camp LA 2013)
Search Engine-Building with Lucene and Solr, Part 1 (SoCal Code Camp LA 2013)
Kai Chan
 
Introduction to database
Introduction to databaseIntroduction to database
Introduction to database
Pongsakorn U-chupala
 
Search Engine-Building with Lucene and Solr
Search Engine-Building with Lucene and SolrSearch Engine-Building with Lucene and Solr
Search Engine-Building with Lucene and Solr
Kai Chan
 
Beyond Wordcount with spark datasets (and scalaing) - Nide PDX Jan 2018
Beyond Wordcount  with spark datasets (and scalaing) - Nide PDX Jan 2018Beyond Wordcount  with spark datasets (and scalaing) - Nide PDX Jan 2018
Beyond Wordcount with spark datasets (and scalaing) - Nide PDX Jan 2018
Holden Karau
 
Big Data Grows Up - A (re)introduction to Cassandra
Big Data Grows Up - A (re)introduction to CassandraBig Data Grows Up - A (re)introduction to Cassandra
Big Data Grows Up - A (re)introduction to Cassandra
Robbie Strickland
 
Database 101
Database 101Database 101
Database 101
thehoagie
 
Journey through high performance django application
Journey through high performance django applicationJourney through high performance django application
Journey through high performance django application
bangaloredjangousergroup
 
Postgresql search demystified
Postgresql search demystifiedPostgresql search demystified
Postgresql search demystified
javier ramirez
 
Querydsl fin jug - june 2012
Querydsl   fin jug - june 2012Querydsl   fin jug - june 2012
Querydsl fin jug - june 2012
Timo Westkämper
 
The art of readable code (ch1~ch4)
The art of readable code (ch1~ch4)The art of readable code (ch1~ch4)
The art of readable code (ch1~ch4)
Ki Sung Bae
 
The art of readable code (ch1~ch4)
The art of readable code (ch1~ch4)The art of readable code (ch1~ch4)
The art of readable code (ch1~ch4)
Ki Sung Bae
 
Elasticsearch for Data Engineers
Elasticsearch for Data EngineersElasticsearch for Data Engineers
Elasticsearch for Data Engineers
Duy Do
 
PostgreSQL - It's kind've a nifty database
PostgreSQL - It's kind've a nifty databasePostgreSQL - It's kind've a nifty database
PostgreSQL - It's kind've a nifty database
Barry Jones
 
How to use the new Domino Query Language
How to use the new Domino Query LanguageHow to use the new Domino Query Language
How to use the new Domino Query Language
Tim Davis
 
Ml pipelines with Apache spark and Apache beam - Ottawa Reactive meetup Augus...
Ml pipelines with Apache spark and Apache beam - Ottawa Reactive meetup Augus...Ml pipelines with Apache spark and Apache beam - Ottawa Reactive meetup Augus...
Ml pipelines with Apache spark and Apache beam - Ottawa Reactive meetup Augus...
Holden Karau
 
Introducing Datawave
Introducing DatawaveIntroducing Datawave
Introducing Datawave
Accumulo Summit
 
Data Exploration with Apache Drill: Day 1
Data Exploration with Apache Drill:  Day 1Data Exploration with Apache Drill:  Day 1
Data Exploration with Apache Drill: Day 1
Charles Givre
 
HelsinkiJS - Clojurescript for Javascript Developers
HelsinkiJS - Clojurescript for Javascript DevelopersHelsinkiJS - Clojurescript for Javascript Developers
HelsinkiJS - Clojurescript for Javascript Developers
Juho Teperi
 
MYSQL Query Anti-Patterns That Can Be Moved to Sphinx
MYSQL Query Anti-Patterns That Can Be Moved to SphinxMYSQL Query Anti-Patterns That Can Be Moved to Sphinx
MYSQL Query Anti-Patterns That Can Be Moved to Sphinx
Pythian
 

Similar to Full Text search in Django with Postgres (20)

Search Engine-Building with Lucene and Solr, Part 1 (SoCal Code Camp LA 2013)
Search Engine-Building with Lucene and Solr, Part 1 (SoCal Code Camp LA 2013)Search Engine-Building with Lucene and Solr, Part 1 (SoCal Code Camp LA 2013)
Search Engine-Building with Lucene and Solr, Part 1 (SoCal Code Camp LA 2013)
 
Introduction to database
Introduction to databaseIntroduction to database
Introduction to database
 
Search Engine-Building with Lucene and Solr
Search Engine-Building with Lucene and SolrSearch Engine-Building with Lucene and Solr
Search Engine-Building with Lucene and Solr
 
Beyond Wordcount with spark datasets (and scalaing) - Nide PDX Jan 2018
Beyond Wordcount  with spark datasets (and scalaing) - Nide PDX Jan 2018Beyond Wordcount  with spark datasets (and scalaing) - Nide PDX Jan 2018
Beyond Wordcount with spark datasets (and scalaing) - Nide PDX Jan 2018
 
Big Data Grows Up - A (re)introduction to Cassandra
Big Data Grows Up - A (re)introduction to CassandraBig Data Grows Up - A (re)introduction to Cassandra
Big Data Grows Up - A (re)introduction to Cassandra
 
Database 101
Database 101Database 101
Database 101
 
Journey through high performance django application
Journey through high performance django applicationJourney through high performance django application
Journey through high performance django application
 
Postgresql search demystified
Postgresql search demystifiedPostgresql search demystified
Postgresql search demystified
 
Querydsl fin jug - june 2012
Querydsl   fin jug - june 2012Querydsl   fin jug - june 2012
Querydsl fin jug - june 2012
 
The art of readable code (ch1~ch4)
The art of readable code (ch1~ch4)The art of readable code (ch1~ch4)
The art of readable code (ch1~ch4)
 
The art of readable code (ch1~ch4)
The art of readable code (ch1~ch4)The art of readable code (ch1~ch4)
The art of readable code (ch1~ch4)
 
Elasticsearch for Data Engineers
Elasticsearch for Data EngineersElasticsearch for Data Engineers
Elasticsearch for Data Engineers
 
9.4json
9.4json9.4json
9.4json
 
PostgreSQL - It's kind've a nifty database
PostgreSQL - It's kind've a nifty databasePostgreSQL - It's kind've a nifty database
PostgreSQL - It's kind've a nifty database
 
How to use the new Domino Query Language
How to use the new Domino Query LanguageHow to use the new Domino Query Language
How to use the new Domino Query Language
 
Ml pipelines with Apache spark and Apache beam - Ottawa Reactive meetup Augus...
Ml pipelines with Apache spark and Apache beam - Ottawa Reactive meetup Augus...Ml pipelines with Apache spark and Apache beam - Ottawa Reactive meetup Augus...
Ml pipelines with Apache spark and Apache beam - Ottawa Reactive meetup Augus...
 
Introducing Datawave
Introducing DatawaveIntroducing Datawave
Introducing Datawave
 
Data Exploration with Apache Drill: Day 1
Data Exploration with Apache Drill:  Day 1Data Exploration with Apache Drill:  Day 1
Data Exploration with Apache Drill: Day 1
 
HelsinkiJS - Clojurescript for Javascript Developers
HelsinkiJS - Clojurescript for Javascript DevelopersHelsinkiJS - Clojurescript for Javascript Developers
HelsinkiJS - Clojurescript for Javascript Developers
 
MYSQL Query Anti-Patterns That Can Be Moved to Sphinx
MYSQL Query Anti-Patterns That Can Be Moved to SphinxMYSQL Query Anti-Patterns That Can Be Moved to Sphinx
MYSQL Query Anti-Patterns That Can Be Moved to Sphinx
 

Recently uploaded

FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
CatarinaPereira64715
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
Fwdays
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Product School
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Product School
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 

Recently uploaded (20)

FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 

Full Text search in Django with Postgres

  • 2. Search is everywhere Search expectations ● FAST ● Full Text search ● Linguistic support (“craziness | crazy”) ● Ranking ● Fuzzy Searching ● More like this
  • 3. Django ● SLOW ● `icontains` is dumbed down version of search ● Searching across tables is pain ● No relevancy, ranking or similar words unless done manually ● No easy way for fuzzy searching
  • 4. Other Alternatives ● Solr ● ElasticSearch ● AWS CloudSearch ● Sphinx ● etc* If you’re using any of the above, use Haystack
  • 5. Postgres Search ● FAST ● Simple to implement ● Supports Search features like Full Text, Ranking, Boosting, Fuzzy etc..
  • 6. Django Live Example ● Search Students by name or by course ● Use South migration to create tsvector column ● Store title in Search table ● Update Search table via Celery on Save of Student data https://github.com/Syerram/postgres_search
  • 7. GIN, GIST ● GIST is Hash based, GIN is B-trees ● GINs = GISTs * 3 , s = Speed ● GINu = GISTu * 3 , u = update time ● GINkb = GISTkb * 3, kb = size A gin index CREATE INDEX student_index ON students USING gin(to_tsvector('english' name)); Source http://www.postgresql.org/docs/9.2/static/textsearch-indexes.html
  • 8. Full Text Search ● All text should be preprocessed using tsvector and queried using tsquery ● Both reduce the text to lexemes SELECT to_tsvector('How much wood would a woodchuck chuck If a woodchuck could chuck wood?') "'chuck':7,12 'could':11 'much':2 'wood':3,13 'woodchuck':6,10 'would':4" ● Both are required for searching to work on normal text SELECT to_tsvector('How much wood would a woodchucks chucks If a woodchucks could chucks woods?') @@ 'chucks' -- False SELECT to_tsvector('How much wood would a woodchucks chucks If a woodchucks could chucks woods?') @@ to_tsquery('chucks') -- True
  • 9. Full Text Search (Contd.) ● Technically you don’t need index, but for large tables it will be slow SELECT * FROM students where to_tsvector('english', name) @@ to_tsquery('english', 'Kirk') ● GIN or GIST Index CREATE INDEX <index_name> ON <table_name> USING gin(<col_name>); ● Expression Based CREATE INDEX <index_name> ON <table_name> USING gin(to_tsvector(COALESCE(col_name,'') || COALESCE(col_name,'')));
  • 10. Boosting ● Boost certain results over others ● Still matching ● Use ts_rank to boost results e.g. …ORDER BY ts_rank(document, to_tsquery('python')) DESC
  • 11. Ranking ● Importance of search term within document e.g. Search term found in title > description > tag ● Use setweight to assign importance to each field when preparing Document e.g. setweight(to_tsvector(‘english’, post.title), 'A') || setweight(to_tsvector(‘english’, post.description), 'B') || setweight(to_tsvector('english', post.tags), 'C')) ... --In search query use ‘ts_rank’ to order by ranking
  • 12. Trigram ● Group of 3 consecutive chars from String ● Similarity between strings is matched by # of trigrams they share e.g. "hello": "h", "he", "hel", "ell", "llo", "lo", and "o” "hallo": "h", "ha", "hal", "all", "llo", "lo", and "o” Number of matches: 4 ● Use similarity to find related terms. Returns value between 0 to 1 where 0 no match and 1 is exact match
  • 13. Soundex/Metaphone ● Oldest and only good for English names ● Converts to a String of Length 4. e.g. “Anthony == Anthoney” => “A535 == A535” ● Create index itself with Soundex or Metaphone e.g. CREATE INDEX idx_name ON tb_name USING GIN(soundex(col_name)); SELECT ... FROM tb_name WHERE soundex(col_name) = soundex(‘...’)
  • 14. Pro & Con Pros ● Quick implementation ● Lot easier to change document format and call refresh index ● Speed comparable to other search engines ● Cost effective Cons ● Not as flexible as pure search engines, like Solr ● Not as fast as Solr though pretty fast for humans ● Tied to Postgres ● Indexes can get pretty large, but so can search engine indexes
  • 15. Django ORM ● Implements Full text Search class StudentCourse(models.Model): ... search_index = VectorField() objects = SearchManager( fields = ('student__user__name', 'course__name'), config = 'pg_catalog.english', # this is default search_field = 'search_index', # this is default auto_update_search_field = True ) ● StudentCourse.objects.search("David") https://github.com/djangonauts/djorm-ext-pgfulltext
  • 16. Next Steps ● Add Ranking, Boosting, Fuzzy Search to djorm pgfulltext e.g. StudentCourse.objects.search("David & Python").rank("Python") StudentCourse.objects.fuzzy_search("Jython").rank("Python") StudentCourse.objects.soundex("Davad").rank("Java") & More ● Continue to add examples to postgres_search
  • 17. Tips ● Use separate DB if necessary or use Materialized Views ● Don’t index everything. Limit your searchable data ● Analyze using `Explain` and ts_stat ● Create indexes on fly using concurrently ● Don’t pull Foreign Key objects in search
  • 18. Code • https://github.com/Syerram/pos tgres_search • Stack • AngularJS, Django, Celery, Postgres • Feel free to Fork, Pull Request