SlideShare a Scribd company logo
1 of 55
Download to read offline
How elephants survive in
big data environments?
Andrii Soldatenko
28 October 2017
@a_soldatenko
Andrii
Soldatenko
• Senior Python/Go Developer at Toptal
• Speaker and open source contributor
• blogger at https://asoldatenko.com
@a_soldatenko
cat /etc/passwd
Idea
Too many information
Information Explosion
Text Search
$ grep --recursive foo books/
https://www.gnu.org/software/grep/manual/grep.html
email lists
github notifications
jira notifications
Text Search
benchmarks
http://blog.burntsushi.net/ripgrep/
PostgreSQL
Michael Stonebraker
http://www.redbook.io/
Александр Коротков, Федор Сигаев и Олег Бартунов
(слева направо)
Full text search
Search index
iTunes Database
select count(*) from application;
count
---------
2,627,949
(1 row)
@a_soldatenko
PostgreSQL
Full Text Search
Demo
@a_soldatenko
Simple Full text search
SELECT to_tsvector('awesome
facebook app')
@@ to_tsquery('facebook');
?column?
----------
t
(1 row)
@a_soldatenko
More real example
Full text search
select title from application where search_vector @@
to_tsquery('Facebook') LIMIT 5;
title
--------------------------------------------------
… for Facebook,WhatsApp,Twitter
… Hair Sticker Editor for Facebook Anime Edition
… Telegram, Kik, GroupMe, Viber, Snapchat, Facebook Me
… T-Shirt for Facebook - …
… Facebook - Check if someone … on Facebook
(5 rows)
@a_soldatenko
Origins of
phrase search
text a <-> text b
@a_soldatenko
Phrase search
Teodor Sigaev and Oleg Bartunov
New FOLLOWED BY
operator
@a_soldatenko
select phraseto_tsquery(
'PostgreSQL allows searching for waxed
ducks.’);
phraseto_tsquery
--------------------------------------
'postgresql' <-> 'allow' <-> 'search'
<2> 'wax' <-> 'duck'
(1 row)
near phrase search :)
select seller_name from application where
search_vector @@ plainto_tsquery('Facebook Inc')
LIMIT 5;
seller_name
-----------------
FunPokes, Inc.
DATT JAPAN INC.
Hoot Live, Inc
Loytr Inc
Facebook, Inc.
(5 rows)
unordered lexeme set
since PostgreSQL 9.6
we have ability to
search for an exact
phrase
@a_soldatenko
Phrase search
select seller_name from application
where search_vector @@
phraseto_tsquery('Facebook Inc')
LIMIT 5;
seller_name
----------------
Facebook, Inc.
Facebook, Inc.
Facebook, Inc.
Facebook, Inc.
Facebook, Inc. @a_soldatenko
As an user I want to see
more relevant results
select title from application where search_vector @@
to_tsquery('Facebook') LIMIT 5;
title
--------------------------------------------------
… for Facebook,WhatsApp,Twitter
… Hair Sticker Editor for Facebook Anime Edition
… Telegram, Kik, GroupMe, Viber, Snapchat, Facebook Me
… T-Shirt for Facebook - …
… Facebook - Check if someone … on Facebook
(5 rows)
@a_soldatenko
Ranking Search Results
# Ranks vectors based on the frequency
of their matching lexemes.
ts_rank(textsearch, query)
# This function computes the cover
density ranking for the given document
vector and query
ts_rank_cd(textsearch, query, 0 /* 1 2
4 8 16 32*/)
@a_soldatenko
Ranking Search Results
select left(title, 40),
ts_rank_cd(search_vector_title,
phraseto_tsquery('instagram')) as rank
from application
where search_vector_title @@
phraseto_tsquery('instagram')
order by rank
LIMIT 5;
left | rank
------------------------------------------+------
Love Frames : Share your valentine photo | 0.1
FullSized Insta - Square Ready Fotos for | 0.1
InstaClean for Instagram -Cleaner Mass D | 0.1
Stickers Free for WhatsApp, Telegram, Ki | 0.1
Pip Camera - Photo Collage Maker For Ins | 0.1
divides the rank by the
document length
select left(title, 40),
ts_rank_cd(search_vector_title,
phraseto_tsquery('facebook'), 2) as rank from
application where search_vector_title @@
phraseto_tsquery('facebook') order by rank desc
LIMIT 5;
left | rank
------------------------------+------
Facebook | 0.1
Who Deleted Me? for Facebook | 0.05
Location for Facebook | 0.05
MessengerApp for Facebook | 0.05
Picturito for Facebook | 0.05
(5 rows)
@a_soldatenko
Performance
@a_soldatenko
GIN
@a_soldatenko
Generalized
Inverted index
@a_soldatenko
GIN
CREATE INDEX name ON table USING GIN
(column);
@a_soldatenko
GIN
@a_soldatenko
GIN
- Slow ranking.
- Slow phrase search
- Slow ordering by timestamp
@a_soldatenko
RUM
@a_soldatenko
RUM index
https://github.com/postgrespro/rum
Limit Value
Maximum Database Size Unlimited
Maximum Table Size 32 TB
Maximum Row Size 1.6 TB
Maximum Field Size 1 GB
Maximum Rows per Table Unlimited
Maximum Columns per Table 250 - 1600
Maximum Indexes per Table Unlimited
Andrew Dunstan
committed patch:
http://git.postgresql.org/pg/commitdiff/
e306df7f9cd6b4433273e006df11bdc966b7079e
PostgreSQL 10
Full text search
support for json and
jsonb
Release date: August 10th, 2017
$ select id, jsonb_pretty(payload) from test;
id | jsonb_pretty
----+-------------------------------------------------------------------------------------------------------------
1 | { +
| "glossary": { +
| "title": "example glossary", +
| "GlossDiv": { +
| "title": "S", +
| "GlossList": { +
| "GlossEntry": { +
| "ID": "SGML", +
| "Abbrev": "ISO 8879:1986", +
| "SortAs": "SGML", +
| "Acronym": "SGML", +
| "GlossDef": { +
| "para": "A meta-markup language, used to create markup languages such as DocBook.",+
| "GlossSeeAlso": [ +
| "GML", +
| "XML" +
| ] +
| }, +
| "GlossSee": "markup", +
| "GlossTerm": "Standard Generalized Markup Language" +
| } +
| } +
| } +
| } +
| }
(1 row)
select to_tsvector('english', payload) from test;
to_tsvector
-------------------------------------------------
'1986':8 '8879':7 'creat':21 'docbook':26
'exampl':1 'general':35 'glossari':2.
. 'gml':28 'iso':6 'languag':18,23,37 'markup':
17,22,32,36 'meta':16 'meta-mark.
.up':15 'sgml':4,10,12 'standard':34 'use':19
'xml':30
(1 row)
Full text search on json
Django 2.0
Django hardcoded ts
functions
class SearchQuery(SearchQueryCombinable, Value):
def as_sql(self, compiler, connection):
...
template =
'plainto_tsquery({}::regconfig,
%s)'.format(config_sql)
Django ORM cd_ranks
class SearchRankCD(SearchRank):
function = 'ts_rank_cd'
def __init__(self, vector, query, normalization=0, **extra):
super(SearchRank, self).__init__(
vector, query, normalization, **extra)
query = SearchQuery('messenger')
Application.objects.annotate(
rank=SearchRankCD(
F('search_vector_title'), query,
normalization=2) # 2 divides the rank by the document length
).filter(search_vector_title=query).order_by('-rank')
Conclusions
- PostgreSQL FTS combine with relation
queries.
- phrase search works fast (ms)
- don’t forget to contribute if you fix
something
Thank You
https://asoldatenko.com

@a_soldatenko
Questions
? @a_soldatenko

More Related Content

Viewers also liked

Практическое использование «больших данных» в бизнесе
Практическое использование «больших данных» в бизнесеПрактическое использование «больших данных» в бизнесе
Практическое использование «больших данных» в бизнесеAnton Vokrug
 
Se2017 управління-та-мотивація-творчих-колективів
Se2017 управління-та-мотивація-творчих-колективівSe2017 управління-та-мотивація-творчих-колективів
Se2017 управління-та-мотивація-творчих-колективівMary Prokhorova
 
как обуздать маркетинг с помощью Big data
как обуздать маркетинг с помощью Big dataкак обуздать маркетинг с помощью Big data
как обуздать маркетинг с помощью Big dataMary Prokhorova
 
маркетинг в 2018
маркетинг в 2018маркетинг в 2018
маркетинг в 2018Mary Prokhorova
 
SE2016 Fundraising Roman Kravchenko "Investment in Ukrainian IoT-Startups"
SE2016 Fundraising Roman Kravchenko "Investment in Ukrainian IoT-Startups"SE2016 Fundraising Roman Kravchenko "Investment in Ukrainian IoT-Startups"
SE2016 Fundraising Roman Kravchenko "Investment in Ukrainian IoT-Startups"Inhacking
 
Tech stars -global-tech-events-networking-parties.compressed
Tech stars -global-tech-events-networking-parties.compressedTech stars -global-tech-events-networking-parties.compressed
Tech stars -global-tech-events-networking-parties.compressedMary Prokhorova
 
Global b2 b-sales-for-startups-from-ukraine
Global b2 b-sales-for-startups-from-ukraineGlobal b2 b-sales-for-startups-from-ukraine
Global b2 b-sales-for-startups-from-ukraineMary Prokhorova
 
практическое использование «больших данных» в бизнесе
практическое использование «больших данных» в бизнесепрактическое использование «больших данных» в бизнесе
практическое использование «больших данных» в бизнесеMary Prokhorova
 
Se2017 businessdev lavrova
Se2017 businessdev lavrovaSe2017 businessdev lavrova
Se2017 businessdev lavrovaMary Prokhorova
 
технический левел-дизайн-29.10.2017
технический левел-дизайн-29.10.2017технический левел-дизайн-29.10.2017
технический левел-дизайн-29.10.2017Mary Prokhorova
 
пошук та-валідація-бізнес-моделі
пошук та-валідація-бізнес-моделіпошук та-валідація-бізнес-моделі
пошук та-валідація-бізнес-моделіMary Prokhorova
 
способы идентефикации, методы взлома и защиты
способы идентефикации, методы взлома и защитыспособы идентефикации, методы взлома и защиты
способы идентефикации, методы взлома и защитыMary Prokhorova
 
Presentation angel-inv for-se
Presentation angel-inv for-sePresentation angel-inv for-se
Presentation angel-inv for-seMary Prokhorova
 

Viewers also liked (16)

Практическое использование «больших данных» в бизнесе
Практическое использование «больших данных» в бизнесеПрактическое использование «больших данных» в бизнесе
Практическое использование «больших данных» в бизнесе
 
Se2017 query-optimizer
Se2017 query-optimizerSe2017 query-optimizer
Se2017 query-optimizer
 
Se2017 управління-та-мотивація-творчих-колективів
Se2017 управління-та-мотивація-творчих-колективівSe2017 управління-та-мотивація-творчих-колективів
Se2017 управління-та-мотивація-творчих-колективів
 
как обуздать маркетинг с помощью Big data
как обуздать маркетинг с помощью Big dataкак обуздать маркетинг с помощью Big data
как обуздать маркетинг с помощью Big data
 
маркетинг в 2018
маркетинг в 2018маркетинг в 2018
маркетинг в 2018
 
SE2016 Fundraising Roman Kravchenko "Investment in Ukrainian IoT-Startups"
SE2016 Fundraising Roman Kravchenko "Investment in Ukrainian IoT-Startups"SE2016 Fundraising Roman Kravchenko "Investment in Ukrainian IoT-Startups"
SE2016 Fundraising Roman Kravchenko "Investment in Ukrainian IoT-Startups"
 
Tech stars -global-tech-events-networking-parties.compressed
Tech stars -global-tech-events-networking-parties.compressedTech stars -global-tech-events-networking-parties.compressed
Tech stars -global-tech-events-networking-parties.compressed
 
Global b2 b-sales-for-startups-from-ukraine
Global b2 b-sales-for-startups-from-ukraineGlobal b2 b-sales-for-startups-from-ukraine
Global b2 b-sales-for-startups-from-ukraine
 
практическое использование «больших данных» в бизнесе
практическое использование «больших данных» в бизнесепрактическое использование «больших данных» в бизнесе
практическое использование «больших данных» в бизнесе
 
Se2017 businessdev lavrova
Se2017 businessdev lavrovaSe2017 businessdev lavrova
Se2017 businessdev lavrova
 
технический левел-дизайн-29.10.2017
технический левел-дизайн-29.10.2017технический левел-дизайн-29.10.2017
технический левел-дизайн-29.10.2017
 
Se nina levchuk
Se nina levchukSe nina levchuk
Se nina levchuk
 
пошук та-валідація-бізнес-моделі
пошук та-валідація-бізнес-моделіпошук та-валідація-бізнес-моделі
пошук та-валідація-бізнес-моделі
 
способы идентефикации, методы взлома и защиты
способы идентефикации, методы взлома и защитыспособы идентефикации, методы взлома и защиты
способы идентефикации, методы взлома и защиты
 
Presentation angel-inv for-se
Presentation angel-inv for-sePresentation angel-inv for-se
Presentation angel-inv for-se
 
L18 accelerators1
L18 accelerators1L18 accelerators1
L18 accelerators1
 

Similar to How elephants survive in big data environments

String Comparison Surprises: Did Postgres lose my data?
String Comparison Surprises: Did Postgres lose my data?String Comparison Surprises: Did Postgres lose my data?
String Comparison Surprises: Did Postgres lose my data?Jeremy Schneider
 
Introduction to Elasticsearch
Introduction to ElasticsearchIntroduction to Elasticsearch
Introduction to ElasticsearchSperasoft
 
Using Thinking Sphinx with rails
Using Thinking Sphinx with railsUsing Thinking Sphinx with rails
Using Thinking Sphinx with railsRishav Dixit
 
Webscale PostgreSQL - JSONB and Horizontal Scaling Strategies
Webscale PostgreSQL - JSONB and Horizontal Scaling StrategiesWebscale PostgreSQL - JSONB and Horizontal Scaling Strategies
Webscale PostgreSQL - JSONB and Horizontal Scaling StrategiesJonathan Katz
 
useR! 2012 Talk
useR! 2012 TalkuseR! 2012 Talk
useR! 2012 Talkrtelmore
 
GeoMesa on Apache Spark SQL with Anthony Fox
GeoMesa on Apache Spark SQL with Anthony FoxGeoMesa on Apache Spark SQL with Anthony Fox
GeoMesa on Apache Spark SQL with Anthony FoxDatabricks
 
Data Modeling, Normalization, and Denormalisation | PostgreSQL Conference Eur...
Data Modeling, Normalization, and Denormalisation | PostgreSQL Conference Eur...Data Modeling, Normalization, and Denormalisation | PostgreSQL Conference Eur...
Data Modeling, Normalization, and Denormalisation | PostgreSQL Conference Eur...Citus Data
 
Cool bonsai cool - an introduction to ElasticSearch
Cool bonsai cool - an introduction to ElasticSearchCool bonsai cool - an introduction to ElasticSearch
Cool bonsai cool - an introduction to ElasticSearchclintongormley
 
New Features in Apache Pinot
New Features in Apache PinotNew Features in Apache Pinot
New Features in Apache PinotSiddharth Teotia
 
Postgre sql custom datatype overloading operator and casting
Postgre sql custom datatype overloading operator and castingPostgre sql custom datatype overloading operator and casting
Postgre sql custom datatype overloading operator and castingAndrea Adami
 
Analyze one year of radio station songs aired with Spark SQL, Spotify, and Da...
Analyze one year of radio station songs aired with Spark SQL, Spotify, and Da...Analyze one year of radio station songs aired with Spark SQL, Spotify, and Da...
Analyze one year of radio station songs aired with Spark SQL, Spotify, and Da...Paul Leclercq
 
Scalable data structures for data science
Scalable data structures for data scienceScalable data structures for data science
Scalable data structures for data scienceTuri, Inc.
 
Postgresql Database Administration Basic - Day2
Postgresql  Database Administration Basic  - Day2Postgresql  Database Administration Basic  - Day2
Postgresql Database Administration Basic - Day2PoguttuezhiniVP
 
MySQL Integral DB Design #JAB14
MySQL Integral DB Design #JAB14MySQL Integral DB Design #JAB14
MySQL Integral DB Design #JAB14Eli Aschkenasy
 
Your Timestamps Deserve Better than a Generic Database
Your Timestamps Deserve Better than a Generic DatabaseYour Timestamps Deserve Better than a Generic Database
Your Timestamps Deserve Better than a Generic Databasejavier ramirez
 
2015-12-05 Александр Коротков, Иван Панченко - Слабо-структурированные данные...
2015-12-05 Александр Коротков, Иван Панченко - Слабо-структурированные данные...2015-12-05 Александр Коротков, Иван Панченко - Слабо-структурированные данные...
2015-12-05 Александр Коротков, Иван Панченко - Слабо-структурированные данные...HappyDev
 
Data Modeling, Normalization, and De-Normalization | PostgresOpen 2019 | Dimi...
Data Modeling, Normalization, and De-Normalization | PostgresOpen 2019 | Dimi...Data Modeling, Normalization, and De-Normalization | PostgresOpen 2019 | Dimi...
Data Modeling, Normalization, and De-Normalization | PostgresOpen 2019 | Dimi...Citus Data
 
Managing your Black Friday Logs NDC Oslo
Managing your  Black Friday Logs NDC OsloManaging your  Black Friday Logs NDC Oslo
Managing your Black Friday Logs NDC OsloDavid Pilato
 

Similar to How elephants survive in big data environments (20)

Pgbr 2013 fts
Pgbr 2013 ftsPgbr 2013 fts
Pgbr 2013 fts
 
String Comparison Surprises: Did Postgres lose my data?
String Comparison Surprises: Did Postgres lose my data?String Comparison Surprises: Did Postgres lose my data?
String Comparison Surprises: Did Postgres lose my data?
 
Introduction to Elasticsearch
Introduction to ElasticsearchIntroduction to Elasticsearch
Introduction to Elasticsearch
 
Kyiv.py #16 october 2015
Kyiv.py #16 october 2015Kyiv.py #16 october 2015
Kyiv.py #16 october 2015
 
Using Thinking Sphinx with rails
Using Thinking Sphinx with railsUsing Thinking Sphinx with rails
Using Thinking Sphinx with rails
 
Webscale PostgreSQL - JSONB and Horizontal Scaling Strategies
Webscale PostgreSQL - JSONB and Horizontal Scaling StrategiesWebscale PostgreSQL - JSONB and Horizontal Scaling Strategies
Webscale PostgreSQL - JSONB and Horizontal Scaling Strategies
 
useR! 2012 Talk
useR! 2012 TalkuseR! 2012 Talk
useR! 2012 Talk
 
GeoMesa on Apache Spark SQL with Anthony Fox
GeoMesa on Apache Spark SQL with Anthony FoxGeoMesa on Apache Spark SQL with Anthony Fox
GeoMesa on Apache Spark SQL with Anthony Fox
 
Data Modeling, Normalization, and Denormalisation | PostgreSQL Conference Eur...
Data Modeling, Normalization, and Denormalisation | PostgreSQL Conference Eur...Data Modeling, Normalization, and Denormalisation | PostgreSQL Conference Eur...
Data Modeling, Normalization, and Denormalisation | PostgreSQL Conference Eur...
 
Cool bonsai cool - an introduction to ElasticSearch
Cool bonsai cool - an introduction to ElasticSearchCool bonsai cool - an introduction to ElasticSearch
Cool bonsai cool - an introduction to ElasticSearch
 
New Features in Apache Pinot
New Features in Apache PinotNew Features in Apache Pinot
New Features in Apache Pinot
 
Postgre sql custom datatype overloading operator and casting
Postgre sql custom datatype overloading operator and castingPostgre sql custom datatype overloading operator and casting
Postgre sql custom datatype overloading operator and casting
 
Analyze one year of radio station songs aired with Spark SQL, Spotify, and Da...
Analyze one year of radio station songs aired with Spark SQL, Spotify, and Da...Analyze one year of radio station songs aired with Spark SQL, Spotify, and Da...
Analyze one year of radio station songs aired with Spark SQL, Spotify, and Da...
 
Scalable data structures for data science
Scalable data structures for data scienceScalable data structures for data science
Scalable data structures for data science
 
Postgresql Database Administration Basic - Day2
Postgresql  Database Administration Basic  - Day2Postgresql  Database Administration Basic  - Day2
Postgresql Database Administration Basic - Day2
 
MySQL Integral DB Design #JAB14
MySQL Integral DB Design #JAB14MySQL Integral DB Design #JAB14
MySQL Integral DB Design #JAB14
 
Your Timestamps Deserve Better than a Generic Database
Your Timestamps Deserve Better than a Generic DatabaseYour Timestamps Deserve Better than a Generic Database
Your Timestamps Deserve Better than a Generic Database
 
2015-12-05 Александр Коротков, Иван Панченко - Слабо-структурированные данные...
2015-12-05 Александр Коротков, Иван Панченко - Слабо-структурированные данные...2015-12-05 Александр Коротков, Иван Панченко - Слабо-структурированные данные...
2015-12-05 Александр Коротков, Иван Панченко - Слабо-структурированные данные...
 
Data Modeling, Normalization, and De-Normalization | PostgresOpen 2019 | Dimi...
Data Modeling, Normalization, and De-Normalization | PostgresOpen 2019 | Dimi...Data Modeling, Normalization, and De-Normalization | PostgresOpen 2019 | Dimi...
Data Modeling, Normalization, and De-Normalization | PostgresOpen 2019 | Dimi...
 
Managing your Black Friday Logs NDC Oslo
Managing your  Black Friday Logs NDC OsloManaging your  Black Friday Logs NDC Oslo
Managing your Black Friday Logs NDC Oslo
 

Recently uploaded

Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 

Recently uploaded (20)

Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 

How elephants survive in big data environments

  • 1. How elephants survive in big data environments? Andrii Soldatenko 28 October 2017 @a_soldatenko
  • 2. Andrii Soldatenko • Senior Python/Go Developer at Toptal • Speaker and open source contributor • blogger at https://asoldatenko.com @a_soldatenko cat /etc/passwd
  • 6. Text Search $ grep --recursive foo books/ https://www.gnu.org/software/grep/manual/grep.html
  • 7.
  • 9.
  • 10.
  • 11.
  • 13.
  • 16.
  • 17.
  • 18. Александр Коротков, Федор Сигаев и Олег Бартунов (слева направо)
  • 21. iTunes Database select count(*) from application; count --------- 2,627,949 (1 row) @a_soldatenko
  • 23. Simple Full text search SELECT to_tsvector('awesome facebook app') @@ to_tsquery('facebook'); ?column? ---------- t (1 row) @a_soldatenko
  • 24. More real example Full text search select title from application where search_vector @@ to_tsquery('Facebook') LIMIT 5; title -------------------------------------------------- … for Facebook,WhatsApp,Twitter … Hair Sticker Editor for Facebook Anime Edition … Telegram, Kik, GroupMe, Viber, Snapchat, Facebook Me … T-Shirt for Facebook - … … Facebook - Check if someone … on Facebook (5 rows) @a_soldatenko
  • 25. Origins of phrase search text a <-> text b @a_soldatenko
  • 26. Phrase search Teodor Sigaev and Oleg Bartunov
  • 27. New FOLLOWED BY operator @a_soldatenko select phraseto_tsquery( 'PostgreSQL allows searching for waxed ducks.’); phraseto_tsquery -------------------------------------- 'postgresql' <-> 'allow' <-> 'search' <2> 'wax' <-> 'duck' (1 row)
  • 28. near phrase search :) select seller_name from application where search_vector @@ plainto_tsquery('Facebook Inc') LIMIT 5; seller_name ----------------- FunPokes, Inc. DATT JAPAN INC. Hoot Live, Inc Loytr Inc Facebook, Inc. (5 rows) unordered lexeme set
  • 29. since PostgreSQL 9.6 we have ability to search for an exact phrase @a_soldatenko
  • 30. Phrase search select seller_name from application where search_vector @@ phraseto_tsquery('Facebook Inc') LIMIT 5; seller_name ---------------- Facebook, Inc. Facebook, Inc. Facebook, Inc. Facebook, Inc. Facebook, Inc. @a_soldatenko
  • 31. As an user I want to see more relevant results select title from application where search_vector @@ to_tsquery('Facebook') LIMIT 5; title -------------------------------------------------- … for Facebook,WhatsApp,Twitter … Hair Sticker Editor for Facebook Anime Edition … Telegram, Kik, GroupMe, Viber, Snapchat, Facebook Me … T-Shirt for Facebook - … … Facebook - Check if someone … on Facebook (5 rows) @a_soldatenko
  • 32. Ranking Search Results # Ranks vectors based on the frequency of their matching lexemes. ts_rank(textsearch, query) # This function computes the cover density ranking for the given document vector and query ts_rank_cd(textsearch, query, 0 /* 1 2 4 8 16 32*/) @a_soldatenko
  • 33. Ranking Search Results select left(title, 40), ts_rank_cd(search_vector_title, phraseto_tsquery('instagram')) as rank from application where search_vector_title @@ phraseto_tsquery('instagram') order by rank LIMIT 5; left | rank ------------------------------------------+------ Love Frames : Share your valentine photo | 0.1 FullSized Insta - Square Ready Fotos for | 0.1 InstaClean for Instagram -Cleaner Mass D | 0.1 Stickers Free for WhatsApp, Telegram, Ki | 0.1 Pip Camera - Photo Collage Maker For Ins | 0.1
  • 34. divides the rank by the document length select left(title, 40), ts_rank_cd(search_vector_title, phraseto_tsquery('facebook'), 2) as rank from application where search_vector_title @@ phraseto_tsquery('facebook') order by rank desc LIMIT 5; left | rank ------------------------------+------ Facebook | 0.1 Who Deleted Me? for Facebook | 0.05 Location for Facebook | 0.05 MessengerApp for Facebook | 0.05 Picturito for Facebook | 0.05 (5 rows) @a_soldatenko
  • 38. GIN CREATE INDEX name ON table USING GIN (column); @a_soldatenko
  • 40. GIN - Slow ranking. - Slow phrase search - Slow ordering by timestamp @a_soldatenko
  • 43. Limit Value Maximum Database Size Unlimited Maximum Table Size 32 TB Maximum Row Size 1.6 TB Maximum Field Size 1 GB Maximum Rows per Table Unlimited Maximum Columns per Table 250 - 1600 Maximum Indexes per Table Unlimited
  • 44.
  • 46. PostgreSQL 10 Full text search support for json and jsonb Release date: August 10th, 2017
  • 47. $ select id, jsonb_pretty(payload) from test; id | jsonb_pretty ----+------------------------------------------------------------------------------------------------------------- 1 | { + | "glossary": { + | "title": "example glossary", + | "GlossDiv": { + | "title": "S", + | "GlossList": { + | "GlossEntry": { + | "ID": "SGML", + | "Abbrev": "ISO 8879:1986", + | "SortAs": "SGML", + | "Acronym": "SGML", + | "GlossDef": { + | "para": "A meta-markup language, used to create markup languages such as DocBook.",+ | "GlossSeeAlso": [ + | "GML", + | "XML" + | ] + | }, + | "GlossSee": "markup", + | "GlossTerm": "Standard Generalized Markup Language" + | } + | } + | } + | } + | } (1 row)
  • 48. select to_tsvector('english', payload) from test; to_tsvector ------------------------------------------------- '1986':8 '8879':7 'creat':21 'docbook':26 'exampl':1 'general':35 'glossari':2. . 'gml':28 'iso':6 'languag':18,23,37 'markup': 17,22,32,36 'meta':16 'meta-mark. .up':15 'sgml':4,10,12 'standard':34 'use':19 'xml':30 (1 row) Full text search on json
  • 49.
  • 51. Django hardcoded ts functions class SearchQuery(SearchQueryCombinable, Value): def as_sql(self, compiler, connection): ... template = 'plainto_tsquery({}::regconfig, %s)'.format(config_sql)
  • 52. Django ORM cd_ranks class SearchRankCD(SearchRank): function = 'ts_rank_cd' def __init__(self, vector, query, normalization=0, **extra): super(SearchRank, self).__init__( vector, query, normalization, **extra) query = SearchQuery('messenger') Application.objects.annotate( rank=SearchRankCD( F('search_vector_title'), query, normalization=2) # 2 divides the rank by the document length ).filter(search_vector_title=query).order_by('-rank')
  • 53. Conclusions - PostgreSQL FTS combine with relation queries. - phrase search works fast (ms) - don’t forget to contribute if you fix something