SlideShare a Scribd company logo
Back to the future :
SQL 92 for Elasticsearch ?
@LucianPrecup
@nosqlmatters #nosql14
2014-09-04
whoami
• CTO of Adelean (http://adelean.com/, http://www.elasticsearch.com/about/partners/)
• Integrate search, nosql and big data
technologies to support ETL, BI, data mining,
data processing and data visualization use
cases.
2014-09-04 2@LucianPrecup @nosqlmatters #nosql14
Poll - How many of you …
• Know SQL ?
• Are familiar with the NoSQL theory ?
• Are familiar with Elasticsearch ?
• Lucene ? Solr ?
• Used a NoSQL database or product ?
• Are remembering SQL 92 ?
2014-04-30 @LucianPrecup @nosqlmatters #nosql14 3
SQL 92 ? NoSQL ?
SQL ? SQL 92 ? RDBMS ?
• SQL
– Structured Query Language
– Based on relational algebra
• Designed for RDMBSes
– Relational Database
Management Systems
• SQL 92
– 700 pages of specification
– Standardization
– No vendor lock in ?
NoSQL ? Elasticsearch ?
• NoSQL
– At first : the name of an event
– Distributed databases
– Horizontal scaling
• Standardization ?
• Polyglot persistence
• The language
– Low level : speak the “raw
data ” language
• Elasticsearch Query DSL
2014-09-04 @LucianPrecup @nosqlmatters #nosql14 4
Why this presentation ?
• The title is voluntarily provocative
– Back in ‘92, the dream (or nightmare) of any
database vendor was to be SQL 92 compliant
• Good occasion to do a comparison
– And who knows : the history might repeat :-)
• Elasticsearch users often ask questions about
how to express a SQL query with Elasticsearch
– However this will not going to be exhaustive about
the subject
2014-09-04 @LucianPrecup @nosqlmatters #nosql14 5
The "Query Optimizer"
2014-09-04 @LucianPrecup @nosqlmatters #nosql14 6
SELECT DISTINCT offer_status FROM offer;
SELECT offer_status FROM offer GROUP by offer_status;
≡
SELECT O.id, O.label
FROM offer O
WHERE O.offer_status IN (
SELECT S.id FROM offer_status S)
SELECT O.id, O.label
FROM offer O, offer_status S
WHERE O.offer_status = S.id
≡
The "Query Optimizer"
2014-09-04 @LucianPrecup @nosqlmatters #nosql14 7
SQL/RDBMS Power to the DBA
The "Query Optimizer"
NoSQL
2014-09-04 @LucianPrecup @nosqlmatters #nosql14 8
SQL/RDBMS Power to the DBA
The "Query Optimizer"
NoSQL
2014-09-04 @LucianPrecup @nosqlmatters #nosql14 9
SQL/RDBMS Power to the DBA
The "Query Optimizer"
NoSQL
2014-09-04 @LucianPrecup @nosqlmatters #nosql14 10
SQL/RDBMS Power to the DBA
The "Query Optimizer"
SQL/RDBMS Power to the DBA
2014-09-04 @LucianPrecup @nosqlmatters #nosql14 11
NoSQL
The "Query Optimizer"
SQL/RDBMS Power to the DBA NoSQL Power to the developer
2014-09-04 @LucianPrecup @nosqlmatters #nosql14 12
“With great power comes great responsibility”
• The developer has to :
– Deal with query optimization
– Deal with data storage
– Take care about data consistency
– …
• But the developer can do better than the
query optimizer  adjusting (the data) to the
(very) specific needs
2014-09-04 @LucianPrecup @nosqlmatters #nosql14 13
Great responsibility … with Elasticsearch
2014-09-04 @LucianPrecup @nosqlmatters #nosql14 14
"fields": ["@timestamp"],
"from": 0, "size": 1,
"sort": [{ "@timestamp": { "order": "desc" }}],
"query": { "match_all": {} },
"filter": { "and": [
{"term": {"account": "you@me.org"}},
{"term": {"protocol": "http"}}
]
}
"from": 0, "size": 0,
"query": { "filtered": {"query": {"match_all": {}},
"filter": { "bool": { "must": [
{"term": {"account": "you@me.org"}},
{"term": {"protocol": "http"}}
]}}}
},
"aggs": {"LastTimestamp": {"max": {"field": "@timestamp"}}}
≡
What SQL 92 for Elasticsearch would imply ?
• Syntax  not important
• Focus on functionality
• Take advantage of the fact that the database is no
longer the center of the information system. The
service layer is.
2014-09-04 @LucianPrecup @nosqlmatters #nosql14 15
Side by side - pagination
• Statement.execute()
• do while ResultSet.next()
– ResultSet.get()
• Otherwise: no standard
for pagination in SQL 92
• Pagination is at the core
of search engines
• Top n results are returned
fast and use cases usually
stop to that
2014-09-04 @LucianPrecup @nosqlmatters #nosql14 16
As we will use this
difference in some
choices
Side by side - decimals
CREATE TABLE test_decimal(
salary_dec DECIMAL(5,2),
salary_double DOUBLE);
INSERT INTO test_decimal(
salary_dec, salary_double)
values (0.1, 0.1); X 10
SELECT SUM(salary_dec)
FROM test_decimal;
1.00
SELECT SUM(salary_double)
FROM test_decimal;
0.9999999999999999
PUT test_index/test_decimal/_mapping
"test_decimal" : {
"salary_float" : {"type" : "float" },
"salary_double" : {"type" : "double" },
"salary_string" : {"type" : "string", "index": "not_analyzed"
}
POST test_index/test_decimal
{"salary_float" : 0.1,"salary_double" : 0.1,"salary_string" :
"0.1"} X 10
POST test_index/test_decimal/_search
"size": 0, "aggs": {
"FloatTotal": {"sum": { "field" : "salary_float" }},
"DoubleTotal": {"sum": { "field" : "salary_double" }}
}
 "FloatTotal": {"value": 1.0000000149011612},
"DoubleTotal": {"value": 1}
2014-09-04 @LucianPrecup @nosqlmatters #nosql14 17
As SQL 92 introduced
some new types
This fits
But 0.00001 X 10 does not
 0.00010000000000000002
Decimals for Elasticsearch – the solution
2014-09-04 @LucianPrecup @nosqlmatters #nosql14 18
Multiply salary_dec by 100
Then use integers
Divide salary_dec by 100 !
Side by side – order by
• SELECT * FROM offer
ORDER BY price;
• SELECT (price_ex +
price_vat) AS price FROM
offer ORDER BY price;
• SELECT substring(concat(
value1, value2)) AS code
FROM table ORDER BY code
• "query": {"match_all": {}},
"sort": [{"price": {"order": "asc"}}]
• "function_score": {"boost_mode": "replace",
"script_score": {"script":
"doc['price_ex'].value + doc['price_vat'].value"}}
• Let’s do the computations at index time !
• Watch out for order by + pagination + distributed
2014-09-04 @LucianPrecup @nosqlmatters #nosql14 19
Order by - computations at index time
2014-09-04 @LucianPrecup @nosqlmatters #nosql14 20
Index substring(concat(
value1, value2)) as code
"sort": [{"code": {"order": "asc"}}]
Side by side - count
• SELECT COUNT(*)
FROM offer;
• SELECT COUNT(*)
FROM offer WHERE
price > 10;
• POST index/_count
{"query" : {"match_all": {}}}
• POST index/_count
"query": {"filtered": {
"filter": {"range": {"price": {"from": 10}}}}}
• POST index/_search
"size": 0,
"aggs": {"Total": {"value_count": { "field" :
"price" }}}
2014-09-04 @LucianPrecup @nosqlmatters #nosql14 21
The simplest
aggregation
Side by side - other aggregations
• SELECT SUM(price)
FROM offer;
• SELECT AVG(price)
FROM offer;
• SELECT MAX(price)
FROM offer;
• POST index/_search
"size": 0,
"aggs": {"Total": {"sum": { "field" :
"price" }}}
• POST index/_search
"size": 0,
"aggs": {"Average": {"avg": { "field"
: "price" }}}
• POST index/_search
"size": 0,
"aggs": {"Maximum": {"max": {
"field" : "price" }}}
2014-09-04 @LucianPrecup @nosqlmatters #nosql14 22
Side by side – distinct and group by
• SELECT DISTINCT
offer_status FROM
offer;
• SELECT * FROM offer
GROUP BY offer_status;
• "size": 0,
"aggs": {"Statuses": {"terms": {
"field" : "offer_status.raw" }}}
2014-09-04 @LucianPrecup @nosqlmatters #nosql14 23
Side by side – distinct and group by
• SELECT * FROM offer
GROUP BY offer_status;
• "size": 0,
"aggs": {"Statuses": {"terms": { "field" :
"offer_status.raw" }}}
• "query": {"filtered": {
"filter": {"term": {"offer_status.raw": "on_line"}}}
"query": {"filtered": {
"filter": {"term": {"offer_status.raw": "off_line"}}}
• "size": 0,
"aggs": {"Statuses": {"terms": { "field" :
"offer_status.raw" },
"aggs": {"Top hits": {"top_hits": {"size": 10}}}}}
2014-09-04 @LucianPrecup @nosqlmatters #nosql14 24
Implementing GROUP BY
2014-09-04 @LucianPrecup @nosqlmatters #nosql14 25
Query 1: A terms aggregation
Query 2..N: Several terms queries
(grouped with the multi-search api)
With Elasticsearch 1.3.2 :
A terms aggregation
A top_hits sub aggregation
Side by side – joins
Normalized database Elasticsearch document
{"film" : {
"id" : "183070",
"title" : "The Artist",
"published" : "2011-10-12",
"genre" : ["Romance", "Drama", "Comedy"],
"language" : ["English", "French"],
"persons" : [
{"person" : { "id" : "5079", "name" : "Michel
Hazanavicius", "role" : "director" }},
{"person" : { "id" : "84145", "name" : "Jean
Dujardin", "role" : "actor" }},
{"person" : { "id" : "24485", "name" : "Bérénice
Bejo", "role" : "actor" }},
{"person" : { "id" : "4204", "name" : "John
Goodman", "role" : "actor" }}
]
}}
2014-04-30 @LucianPrecup @nosqlmatters #nosql14 26
The issue with joins :-)
• Let’s say you have two relational entities: Persons
and Contracts
– A Person has zero, one or more Contracts
– A Contract is attached to one or more Persons (eg. the
Subscriber, the Grantee, …)
• Need a search services :
– S1: getPersonsDetailsByContractProperties
– S2: getContractsDetailsByPersonProperties
• Simple solution with SQL:
SELECT P.* FROM P, C WHERE P.id = C.pid AND C.a = 'A‘
SELECT C.* FROM P, C WHERE P.id = C.pid AND P.a = 'A'
2014-04-30 @LucianPrecup @nosqlmatters #nosql14 27
The issue with joins - solutions
• Solution 1
– Index Persons with Contracts together for S1
{"person" : { "details" : …, … , "contracts" : ["contract" :{"id" : 1, …}, …] }}
– Index Contracts with Persons together for S2
{"contract" : { "details" : …, …, "persons" : ["person" :{"id" : 1, "role" : "S", …}, …]}}
• Issues with solution 1:
– A lot of data duplication
– Have to get Contracts when indexing Persons and vice-versa
• Solution 2
– Elasticsearch’s Parent/Child
• Issues with solution 2:
– Works in one way but not the other (only one parent for n children, a 1 to n relationship)
• Solution 3
– Index Persons and Contracts separately
– Launch two Elasticsearch queries to get the response
– For S1 : First get all Contract ids by Contract properties, then get Persons by Contract ids (terms
query or mget)
– For S2 : First get all Persons ids by Person properties, then get Contracts by Person ids (terms
query or mget)
– The response to the second query can be returned “as is” to the client (pagination, etc.)
2014-04-30 @LucianPrecup @nosqlmatters #nosql14 28
Side by side - having
• SELECT *, SUM(price)
FROM offer
GROUP BY offer_status
HAVING AVG(price) > 10;
• "size": 0,
"aggs": {
"Status": {"terms": {"field": "offer_status"},
"aggs": {
"Average": {"avg": {"field": "price_ht"}}}}
}
• "query": {
"filtered": {"filter": {
"terms": {"offer_status": ["on_line"]}}}},
"aggs": {
"Status": {"terms": {"field": "offer_status"},
"aggs": {
"Total": {"sum": {"field": "price_ht"}}}}}
2014-09-04 @LucianPrecup @nosqlmatters #nosql14 29
Also specified
by SQL 92
Implementing HAVING
2014-09-04 @LucianPrecup @nosqlmatters #nosql14 30
1/ Query 1: A terms aggregation and an avg sub-aggregation
2/ Pick terms that match the HAVING clause
3/ Query 2: A filtered query on previous terms + terms
aggregation + sum sub-aggregation
4/ Construct the result from hits + lookup in the
corresponding aggregation
Conclusion
• The service layer is the center of the system
• The developer has the power :-)
2014-09-04 @LucianPrecup @nosqlmatters #nosql14 31
Thank you
Q & A

More Related Content

Viewers also liked

COMPUTADORA
COMPUTADORACOMPUTADORA
COMPUTADORA
YherZhon GL
 
Portraits
PortraitsPortraits
Portraits
Jana Yar
 
Process of food video production
Process of food video productionProcess of food video production
Process of food video production
Macie Tan
 
Labio Leporino
Labio LeporinoLabio Leporino
Labio Leporino
DianaFQ
 
Eye candyto foodphoto101
Eye candyto foodphoto101Eye candyto foodphoto101
Eye candyto foodphoto101
Ann Gagno
 
Process of food styling & food photography
Process of food styling & food photographyProcess of food styling & food photography
Process of food styling & food photography
Macie Tan
 
Expo 2-anatomia-dentaria-aplicada
Expo 2-anatomia-dentaria-aplicadaExpo 2-anatomia-dentaria-aplicada
Expo 2-anatomia-dentaria-aplicada
Roberth Rodriguez
 
Caries dental
Caries dentalCaries dental
Caries dental
Roberth Rodriguez
 
Portfolio illustration a3
Portfolio illustration a3Portfolio illustration a3
Portfolio illustration a3
Stephen White
 
Graphic Design & Illustration
Graphic Design & IllustrationGraphic Design & Illustration
Graphic Design & Illustration
Stephen White
 
Exercices corrigés de la comptabilité des sociétés la constitution des sa
Exercices corrigés de la comptabilité des sociétés la constitution des saExercices corrigés de la comptabilité des sociétés la constitution des sa
Exercices corrigés de la comptabilité des sociétés la constitution des sa
Jamal Yasser
 

Viewers also liked (12)

COMPUTADORA
COMPUTADORACOMPUTADORA
COMPUTADORA
 
Portraits
PortraitsPortraits
Portraits
 
Picasso
PicassoPicasso
Picasso
 
Process of food video production
Process of food video productionProcess of food video production
Process of food video production
 
Labio Leporino
Labio LeporinoLabio Leporino
Labio Leporino
 
Eye candyto foodphoto101
Eye candyto foodphoto101Eye candyto foodphoto101
Eye candyto foodphoto101
 
Process of food styling & food photography
Process of food styling & food photographyProcess of food styling & food photography
Process of food styling & food photography
 
Expo 2-anatomia-dentaria-aplicada
Expo 2-anatomia-dentaria-aplicadaExpo 2-anatomia-dentaria-aplicada
Expo 2-anatomia-dentaria-aplicada
 
Caries dental
Caries dentalCaries dental
Caries dental
 
Portfolio illustration a3
Portfolio illustration a3Portfolio illustration a3
Portfolio illustration a3
 
Graphic Design & Illustration
Graphic Design & IllustrationGraphic Design & Illustration
Graphic Design & Illustration
 
Exercices corrigés de la comptabilité des sociétés la constitution des sa
Exercices corrigés de la comptabilité des sociétés la constitution des saExercices corrigés de la comptabilité des sociétés la constitution des sa
Exercices corrigés de la comptabilité des sociétés la constitution des sa
 

Similar to Back to the future : SQL 92 for Elasticsearch ? @nosqlmatters Dublin 2014

Back to the future : SQL 92 for Elasticsearch @nosqlmatters Paris
Back to the future : SQL 92 for Elasticsearch @nosqlmatters ParisBack to the future : SQL 92 for Elasticsearch @nosqlmatters Paris
Back to the future : SQL 92 for Elasticsearch @nosqlmatters Paris
Lucian Precup
 
Lucian Precup - Back to the Future: SQL 92 for Elasticsearch? - NoSQL matters...
Lucian Precup - Back to the Future: SQL 92 for Elasticsearch? - NoSQL matters...Lucian Precup - Back to the Future: SQL 92 for Elasticsearch? - NoSQL matters...
Lucian Precup - Back to the Future: SQL 92 for Elasticsearch? - NoSQL matters...
NoSQLmatters
 
Couchbase N1QL: Language & Architecture Overview.
Couchbase N1QL: Language & Architecture Overview.Couchbase N1QL: Language & Architecture Overview.
Couchbase N1QL: Language & Architecture Overview.
Keshav Murthy
 
conf2015_TLaGatta_CHarris_Splunk_BusinessAnalytics_DeliveringHighLevelAnalytics
conf2015_TLaGatta_CHarris_Splunk_BusinessAnalytics_DeliveringHighLevelAnalyticsconf2015_TLaGatta_CHarris_Splunk_BusinessAnalytics_DeliveringHighLevelAnalytics
conf2015_TLaGatta_CHarris_Splunk_BusinessAnalytics_DeliveringHighLevelAnalytics
Tom LaGatta
 
Eventually Elasticsearch: Eventual Consistency in the Real World
Eventually Elasticsearch: Eventual Consistency in the Real WorldEventually Elasticsearch: Eventual Consistency in the Real World
Eventually Elasticsearch: Eventual Consistency in the Real World
BeyondTrees
 
Search and nosql for information management @nosqlmatters Cologne
Search and nosql for information management @nosqlmatters CologneSearch and nosql for information management @nosqlmatters Cologne
Search and nosql for information management @nosqlmatters Cologne
Lucian Precup
 
Elasticsearch for Data Engineers
Elasticsearch for Data EngineersElasticsearch for Data Engineers
Elasticsearch for Data Engineers
Duy Do
 
Webinar: What's New in Solr 6
Webinar: What's New in Solr 6Webinar: What's New in Solr 6
Webinar: What's New in Solr 6
Lucidworks
 
SQL Tutorial for Marketers
SQL Tutorial for MarketersSQL Tutorial for Marketers
SQL Tutorial for Marketers
Justin Mares
 
Singpore Oracle Sessions III - What is truly useful in Oracle Database 12c fo...
Singpore Oracle Sessions III - What is truly useful in Oracle Database 12c fo...Singpore Oracle Sessions III - What is truly useful in Oracle Database 12c fo...
Singpore Oracle Sessions III - What is truly useful in Oracle Database 12c fo...
Lucas Jellema
 
Introducing N1QL: New SQL Based Query Language for JSON
Introducing N1QL: New SQL Based Query Language for JSONIntroducing N1QL: New SQL Based Query Language for JSON
Introducing N1QL: New SQL Based Query Language for JSON
Keshav Murthy
 
Manage online profiles with oracle no sql database tht10972 - v1.1
Manage online profiles with oracle no sql database   tht10972 - v1.1Manage online profiles with oracle no sql database   tht10972 - v1.1
Manage online profiles with oracle no sql database tht10972 - v1.1
Robert Greene
 
When to NoSQL and when to know SQL
When to NoSQL and when to know SQLWhen to NoSQL and when to know SQL
When to NoSQL and when to know SQL
Simon Elliston Ball
 
Building Ranking Infrastructure: Data-Driven, Lean, Flexible - Sergii Khomenk...
Building Ranking Infrastructure: Data-Driven, Lean, Flexible - Sergii Khomenk...Building Ranking Infrastructure: Data-Driven, Lean, Flexible - Sergii Khomenk...
Building Ranking Infrastructure: Data-Driven, Lean, Flexible - Sergii Khomenk...
Sergii Khomenko
 
Application development with Oracle NoSQL Database 3.0
Application development with Oracle NoSQL Database 3.0Application development with Oracle NoSQL Database 3.0
Application development with Oracle NoSQL Database 3.0
Anuj Sahni
 
Everything That Is Really Useful in Oracle Database 12c for Application Devel...
Everything That Is Really Useful in Oracle Database 12c for Application Devel...Everything That Is Really Useful in Oracle Database 12c for Application Devel...
Everything That Is Really Useful in Oracle Database 12c for Application Devel...
Lucas Jellema
 
Tugdual Grall - From SQL to NoSQL in less than 40 min - NoSQL matters Paris 2015
Tugdual Grall - From SQL to NoSQL in less than 40 min - NoSQL matters Paris 2015Tugdual Grall - From SQL to NoSQL in less than 40 min - NoSQL matters Paris 2015
Tugdual Grall - From SQL to NoSQL in less than 40 min - NoSQL matters Paris 2015
NoSQLmatters
 
Introduction to SQL Server Graph DB
Introduction to SQL Server Graph DBIntroduction to SQL Server Graph DB
Introduction to SQL Server Graph DB
Greg McMurray
 
Do More with Postgres- NoSQL Applications for the Enterprise
Do More with Postgres- NoSQL Applications for the EnterpriseDo More with Postgres- NoSQL Applications for the Enterprise
Do More with Postgres- NoSQL Applications for the Enterprise
EDB
 
Docker Summit MongoDB - Data Democratization
Docker Summit MongoDB - Data Democratization Docker Summit MongoDB - Data Democratization
Docker Summit MongoDB - Data Democratization
Chris Grabosky
 

Similar to Back to the future : SQL 92 for Elasticsearch ? @nosqlmatters Dublin 2014 (20)

Back to the future : SQL 92 for Elasticsearch @nosqlmatters Paris
Back to the future : SQL 92 for Elasticsearch @nosqlmatters ParisBack to the future : SQL 92 for Elasticsearch @nosqlmatters Paris
Back to the future : SQL 92 for Elasticsearch @nosqlmatters Paris
 
Lucian Precup - Back to the Future: SQL 92 for Elasticsearch? - NoSQL matters...
Lucian Precup - Back to the Future: SQL 92 for Elasticsearch? - NoSQL matters...Lucian Precup - Back to the Future: SQL 92 for Elasticsearch? - NoSQL matters...
Lucian Precup - Back to the Future: SQL 92 for Elasticsearch? - NoSQL matters...
 
Couchbase N1QL: Language & Architecture Overview.
Couchbase N1QL: Language & Architecture Overview.Couchbase N1QL: Language & Architecture Overview.
Couchbase N1QL: Language & Architecture Overview.
 
conf2015_TLaGatta_CHarris_Splunk_BusinessAnalytics_DeliveringHighLevelAnalytics
conf2015_TLaGatta_CHarris_Splunk_BusinessAnalytics_DeliveringHighLevelAnalyticsconf2015_TLaGatta_CHarris_Splunk_BusinessAnalytics_DeliveringHighLevelAnalytics
conf2015_TLaGatta_CHarris_Splunk_BusinessAnalytics_DeliveringHighLevelAnalytics
 
Eventually Elasticsearch: Eventual Consistency in the Real World
Eventually Elasticsearch: Eventual Consistency in the Real WorldEventually Elasticsearch: Eventual Consistency in the Real World
Eventually Elasticsearch: Eventual Consistency in the Real World
 
Search and nosql for information management @nosqlmatters Cologne
Search and nosql for information management @nosqlmatters CologneSearch and nosql for information management @nosqlmatters Cologne
Search and nosql for information management @nosqlmatters Cologne
 
Elasticsearch for Data Engineers
Elasticsearch for Data EngineersElasticsearch for Data Engineers
Elasticsearch for Data Engineers
 
Webinar: What's New in Solr 6
Webinar: What's New in Solr 6Webinar: What's New in Solr 6
Webinar: What's New in Solr 6
 
SQL Tutorial for Marketers
SQL Tutorial for MarketersSQL Tutorial for Marketers
SQL Tutorial for Marketers
 
Singpore Oracle Sessions III - What is truly useful in Oracle Database 12c fo...
Singpore Oracle Sessions III - What is truly useful in Oracle Database 12c fo...Singpore Oracle Sessions III - What is truly useful in Oracle Database 12c fo...
Singpore Oracle Sessions III - What is truly useful in Oracle Database 12c fo...
 
Introducing N1QL: New SQL Based Query Language for JSON
Introducing N1QL: New SQL Based Query Language for JSONIntroducing N1QL: New SQL Based Query Language for JSON
Introducing N1QL: New SQL Based Query Language for JSON
 
Manage online profiles with oracle no sql database tht10972 - v1.1
Manage online profiles with oracle no sql database   tht10972 - v1.1Manage online profiles with oracle no sql database   tht10972 - v1.1
Manage online profiles with oracle no sql database tht10972 - v1.1
 
When to NoSQL and when to know SQL
When to NoSQL and when to know SQLWhen to NoSQL and when to know SQL
When to NoSQL and when to know SQL
 
Building Ranking Infrastructure: Data-Driven, Lean, Flexible - Sergii Khomenk...
Building Ranking Infrastructure: Data-Driven, Lean, Flexible - Sergii Khomenk...Building Ranking Infrastructure: Data-Driven, Lean, Flexible - Sergii Khomenk...
Building Ranking Infrastructure: Data-Driven, Lean, Flexible - Sergii Khomenk...
 
Application development with Oracle NoSQL Database 3.0
Application development with Oracle NoSQL Database 3.0Application development with Oracle NoSQL Database 3.0
Application development with Oracle NoSQL Database 3.0
 
Everything That Is Really Useful in Oracle Database 12c for Application Devel...
Everything That Is Really Useful in Oracle Database 12c for Application Devel...Everything That Is Really Useful in Oracle Database 12c for Application Devel...
Everything That Is Really Useful in Oracle Database 12c for Application Devel...
 
Tugdual Grall - From SQL to NoSQL in less than 40 min - NoSQL matters Paris 2015
Tugdual Grall - From SQL to NoSQL in less than 40 min - NoSQL matters Paris 2015Tugdual Grall - From SQL to NoSQL in less than 40 min - NoSQL matters Paris 2015
Tugdual Grall - From SQL to NoSQL in less than 40 min - NoSQL matters Paris 2015
 
Introduction to SQL Server Graph DB
Introduction to SQL Server Graph DBIntroduction to SQL Server Graph DB
Introduction to SQL Server Graph DB
 
Do More with Postgres- NoSQL Applications for the Enterprise
Do More with Postgres- NoSQL Applications for the EnterpriseDo More with Postgres- NoSQL Applications for the Enterprise
Do More with Postgres- NoSQL Applications for the Enterprise
 
Docker Summit MongoDB - Data Democratization
Docker Summit MongoDB - Data Democratization Docker Summit MongoDB - Data Democratization
Docker Summit MongoDB - Data Democratization
 

More from Lucian Precup

Enrich data and rewrite queries with the Elasticsearch percolator
Enrich data and rewrite queries with the Elasticsearch percolatorEnrich data and rewrite queries with the Elasticsearch percolator
Enrich data and rewrite queries with the Elasticsearch percolator
Lucian Precup
 
Joins in a distributed world Distributed Matters Barcelona 2015
Joins in a distributed world Distributed Matters Barcelona 2015Joins in a distributed world Distributed Matters Barcelona 2015
Joins in a distributed world Distributed Matters Barcelona 2015
Lucian Precup
 
Search, nosql et bigdata avec les moteurs de recherche
Search, nosql et bigdata avec les moteurs de rechercheSearch, nosql et bigdata avec les moteurs de recherche
Search, nosql et bigdata avec les moteurs de recherche
Lucian Precup
 
ALM et Agilite : la convergence
ALM et Agilite : la convergenceALM et Agilite : la convergence
ALM et Agilite : la convergence
Lucian Precup
 
La revue de code : facile !
La revue de code : facile !La revue de code : facile !
La revue de code : facile !
Lucian Precup
 
La revue de code : agile, lean, indispensable !
La revue de code : agile, lean, indispensable !La revue de code : agile, lean, indispensable !
La revue de code : agile, lean, indispensable !
Lucian Precup
 
Moteurs de recherche et Lucene at LorraineJUG
Moteurs de recherche et Lucene at LorraineJUGMoteurs de recherche et Lucene at LorraineJUG
Moteurs de recherche et Lucene at LorraineJUG
Lucian Precup
 
Solr and Elasticsearch in Action (at Breizhcamp)
Solr and Elasticsearch in Action (at Breizhcamp)Solr and Elasticsearch in Action (at Breizhcamp)
Solr and Elasticsearch in Action (at Breizhcamp)
Lucian Precup
 

More from Lucian Precup (8)

Enrich data and rewrite queries with the Elasticsearch percolator
Enrich data and rewrite queries with the Elasticsearch percolatorEnrich data and rewrite queries with the Elasticsearch percolator
Enrich data and rewrite queries with the Elasticsearch percolator
 
Joins in a distributed world Distributed Matters Barcelona 2015
Joins in a distributed world Distributed Matters Barcelona 2015Joins in a distributed world Distributed Matters Barcelona 2015
Joins in a distributed world Distributed Matters Barcelona 2015
 
Search, nosql et bigdata avec les moteurs de recherche
Search, nosql et bigdata avec les moteurs de rechercheSearch, nosql et bigdata avec les moteurs de recherche
Search, nosql et bigdata avec les moteurs de recherche
 
ALM et Agilite : la convergence
ALM et Agilite : la convergenceALM et Agilite : la convergence
ALM et Agilite : la convergence
 
La revue de code : facile !
La revue de code : facile !La revue de code : facile !
La revue de code : facile !
 
La revue de code : agile, lean, indispensable !
La revue de code : agile, lean, indispensable !La revue de code : agile, lean, indispensable !
La revue de code : agile, lean, indispensable !
 
Moteurs de recherche et Lucene at LorraineJUG
Moteurs de recherche et Lucene at LorraineJUGMoteurs de recherche et Lucene at LorraineJUG
Moteurs de recherche et Lucene at LorraineJUG
 
Solr and Elasticsearch in Action (at Breizhcamp)
Solr and Elasticsearch in Action (at Breizhcamp)Solr and Elasticsearch in Action (at Breizhcamp)
Solr and Elasticsearch in Action (at Breizhcamp)
 

Recently uploaded

Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
akankshawande
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
Zilliz
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Speck&Tech
 
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Alpen-Adria-Universität
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
Matthew Sinclair
 
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceAI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
IndexBug
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
MichaelKnudsen27
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Safe Software
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
DianaGray10
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
tolgahangng
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
Jakub Marek
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
DanBrown980551
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
Chart Kalyan
 
Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
Mariano Tinti
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
Ivanti
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc
 
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
saastr
 
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying AheadDigital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Wask
 

Recently uploaded (20)

Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
 
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceAI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
 
Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
 
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
 
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying AheadDigital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying Ahead
 

Back to the future : SQL 92 for Elasticsearch ? @nosqlmatters Dublin 2014

  • 1. Back to the future : SQL 92 for Elasticsearch ? @LucianPrecup @nosqlmatters #nosql14 2014-09-04
  • 2. whoami • CTO of Adelean (http://adelean.com/, http://www.elasticsearch.com/about/partners/) • Integrate search, nosql and big data technologies to support ETL, BI, data mining, data processing and data visualization use cases. 2014-09-04 2@LucianPrecup @nosqlmatters #nosql14
  • 3. Poll - How many of you … • Know SQL ? • Are familiar with the NoSQL theory ? • Are familiar with Elasticsearch ? • Lucene ? Solr ? • Used a NoSQL database or product ? • Are remembering SQL 92 ? 2014-04-30 @LucianPrecup @nosqlmatters #nosql14 3
  • 4. SQL 92 ? NoSQL ? SQL ? SQL 92 ? RDBMS ? • SQL – Structured Query Language – Based on relational algebra • Designed for RDMBSes – Relational Database Management Systems • SQL 92 – 700 pages of specification – Standardization – No vendor lock in ? NoSQL ? Elasticsearch ? • NoSQL – At first : the name of an event – Distributed databases – Horizontal scaling • Standardization ? • Polyglot persistence • The language – Low level : speak the “raw data ” language • Elasticsearch Query DSL 2014-09-04 @LucianPrecup @nosqlmatters #nosql14 4
  • 5. Why this presentation ? • The title is voluntarily provocative – Back in ‘92, the dream (or nightmare) of any database vendor was to be SQL 92 compliant • Good occasion to do a comparison – And who knows : the history might repeat :-) • Elasticsearch users often ask questions about how to express a SQL query with Elasticsearch – However this will not going to be exhaustive about the subject 2014-09-04 @LucianPrecup @nosqlmatters #nosql14 5
  • 6. The "Query Optimizer" 2014-09-04 @LucianPrecup @nosqlmatters #nosql14 6 SELECT DISTINCT offer_status FROM offer; SELECT offer_status FROM offer GROUP by offer_status; ≡ SELECT O.id, O.label FROM offer O WHERE O.offer_status IN ( SELECT S.id FROM offer_status S) SELECT O.id, O.label FROM offer O, offer_status S WHERE O.offer_status = S.id ≡
  • 7. The "Query Optimizer" 2014-09-04 @LucianPrecup @nosqlmatters #nosql14 7 SQL/RDBMS Power to the DBA
  • 8. The "Query Optimizer" NoSQL 2014-09-04 @LucianPrecup @nosqlmatters #nosql14 8 SQL/RDBMS Power to the DBA
  • 9. The "Query Optimizer" NoSQL 2014-09-04 @LucianPrecup @nosqlmatters #nosql14 9 SQL/RDBMS Power to the DBA
  • 10. The "Query Optimizer" NoSQL 2014-09-04 @LucianPrecup @nosqlmatters #nosql14 10 SQL/RDBMS Power to the DBA
  • 11. The "Query Optimizer" SQL/RDBMS Power to the DBA 2014-09-04 @LucianPrecup @nosqlmatters #nosql14 11 NoSQL
  • 12. The "Query Optimizer" SQL/RDBMS Power to the DBA NoSQL Power to the developer 2014-09-04 @LucianPrecup @nosqlmatters #nosql14 12
  • 13. “With great power comes great responsibility” • The developer has to : – Deal with query optimization – Deal with data storage – Take care about data consistency – … • But the developer can do better than the query optimizer  adjusting (the data) to the (very) specific needs 2014-09-04 @LucianPrecup @nosqlmatters #nosql14 13
  • 14. Great responsibility … with Elasticsearch 2014-09-04 @LucianPrecup @nosqlmatters #nosql14 14 "fields": ["@timestamp"], "from": 0, "size": 1, "sort": [{ "@timestamp": { "order": "desc" }}], "query": { "match_all": {} }, "filter": { "and": [ {"term": {"account": "you@me.org"}}, {"term": {"protocol": "http"}} ] } "from": 0, "size": 0, "query": { "filtered": {"query": {"match_all": {}}, "filter": { "bool": { "must": [ {"term": {"account": "you@me.org"}}, {"term": {"protocol": "http"}} ]}}} }, "aggs": {"LastTimestamp": {"max": {"field": "@timestamp"}}} ≡
  • 15. What SQL 92 for Elasticsearch would imply ? • Syntax  not important • Focus on functionality • Take advantage of the fact that the database is no longer the center of the information system. The service layer is. 2014-09-04 @LucianPrecup @nosqlmatters #nosql14 15
  • 16. Side by side - pagination • Statement.execute() • do while ResultSet.next() – ResultSet.get() • Otherwise: no standard for pagination in SQL 92 • Pagination is at the core of search engines • Top n results are returned fast and use cases usually stop to that 2014-09-04 @LucianPrecup @nosqlmatters #nosql14 16 As we will use this difference in some choices
  • 17. Side by side - decimals CREATE TABLE test_decimal( salary_dec DECIMAL(5,2), salary_double DOUBLE); INSERT INTO test_decimal( salary_dec, salary_double) values (0.1, 0.1); X 10 SELECT SUM(salary_dec) FROM test_decimal; 1.00 SELECT SUM(salary_double) FROM test_decimal; 0.9999999999999999 PUT test_index/test_decimal/_mapping "test_decimal" : { "salary_float" : {"type" : "float" }, "salary_double" : {"type" : "double" }, "salary_string" : {"type" : "string", "index": "not_analyzed" } POST test_index/test_decimal {"salary_float" : 0.1,"salary_double" : 0.1,"salary_string" : "0.1"} X 10 POST test_index/test_decimal/_search "size": 0, "aggs": { "FloatTotal": {"sum": { "field" : "salary_float" }}, "DoubleTotal": {"sum": { "field" : "salary_double" }} }  "FloatTotal": {"value": 1.0000000149011612}, "DoubleTotal": {"value": 1} 2014-09-04 @LucianPrecup @nosqlmatters #nosql14 17 As SQL 92 introduced some new types This fits But 0.00001 X 10 does not  0.00010000000000000002
  • 18. Decimals for Elasticsearch – the solution 2014-09-04 @LucianPrecup @nosqlmatters #nosql14 18 Multiply salary_dec by 100 Then use integers Divide salary_dec by 100 !
  • 19. Side by side – order by • SELECT * FROM offer ORDER BY price; • SELECT (price_ex + price_vat) AS price FROM offer ORDER BY price; • SELECT substring(concat( value1, value2)) AS code FROM table ORDER BY code • "query": {"match_all": {}}, "sort": [{"price": {"order": "asc"}}] • "function_score": {"boost_mode": "replace", "script_score": {"script": "doc['price_ex'].value + doc['price_vat'].value"}} • Let’s do the computations at index time ! • Watch out for order by + pagination + distributed 2014-09-04 @LucianPrecup @nosqlmatters #nosql14 19
  • 20. Order by - computations at index time 2014-09-04 @LucianPrecup @nosqlmatters #nosql14 20 Index substring(concat( value1, value2)) as code "sort": [{"code": {"order": "asc"}}]
  • 21. Side by side - count • SELECT COUNT(*) FROM offer; • SELECT COUNT(*) FROM offer WHERE price > 10; • POST index/_count {"query" : {"match_all": {}}} • POST index/_count "query": {"filtered": { "filter": {"range": {"price": {"from": 10}}}}} • POST index/_search "size": 0, "aggs": {"Total": {"value_count": { "field" : "price" }}} 2014-09-04 @LucianPrecup @nosqlmatters #nosql14 21 The simplest aggregation
  • 22. Side by side - other aggregations • SELECT SUM(price) FROM offer; • SELECT AVG(price) FROM offer; • SELECT MAX(price) FROM offer; • POST index/_search "size": 0, "aggs": {"Total": {"sum": { "field" : "price" }}} • POST index/_search "size": 0, "aggs": {"Average": {"avg": { "field" : "price" }}} • POST index/_search "size": 0, "aggs": {"Maximum": {"max": { "field" : "price" }}} 2014-09-04 @LucianPrecup @nosqlmatters #nosql14 22
  • 23. Side by side – distinct and group by • SELECT DISTINCT offer_status FROM offer; • SELECT * FROM offer GROUP BY offer_status; • "size": 0, "aggs": {"Statuses": {"terms": { "field" : "offer_status.raw" }}} 2014-09-04 @LucianPrecup @nosqlmatters #nosql14 23
  • 24. Side by side – distinct and group by • SELECT * FROM offer GROUP BY offer_status; • "size": 0, "aggs": {"Statuses": {"terms": { "field" : "offer_status.raw" }}} • "query": {"filtered": { "filter": {"term": {"offer_status.raw": "on_line"}}} "query": {"filtered": { "filter": {"term": {"offer_status.raw": "off_line"}}} • "size": 0, "aggs": {"Statuses": {"terms": { "field" : "offer_status.raw" }, "aggs": {"Top hits": {"top_hits": {"size": 10}}}}} 2014-09-04 @LucianPrecup @nosqlmatters #nosql14 24
  • 25. Implementing GROUP BY 2014-09-04 @LucianPrecup @nosqlmatters #nosql14 25 Query 1: A terms aggregation Query 2..N: Several terms queries (grouped with the multi-search api) With Elasticsearch 1.3.2 : A terms aggregation A top_hits sub aggregation
  • 26. Side by side – joins Normalized database Elasticsearch document {"film" : { "id" : "183070", "title" : "The Artist", "published" : "2011-10-12", "genre" : ["Romance", "Drama", "Comedy"], "language" : ["English", "French"], "persons" : [ {"person" : { "id" : "5079", "name" : "Michel Hazanavicius", "role" : "director" }}, {"person" : { "id" : "84145", "name" : "Jean Dujardin", "role" : "actor" }}, {"person" : { "id" : "24485", "name" : "Bérénice Bejo", "role" : "actor" }}, {"person" : { "id" : "4204", "name" : "John Goodman", "role" : "actor" }} ] }} 2014-04-30 @LucianPrecup @nosqlmatters #nosql14 26
  • 27. The issue with joins :-) • Let’s say you have two relational entities: Persons and Contracts – A Person has zero, one or more Contracts – A Contract is attached to one or more Persons (eg. the Subscriber, the Grantee, …) • Need a search services : – S1: getPersonsDetailsByContractProperties – S2: getContractsDetailsByPersonProperties • Simple solution with SQL: SELECT P.* FROM P, C WHERE P.id = C.pid AND C.a = 'A‘ SELECT C.* FROM P, C WHERE P.id = C.pid AND P.a = 'A' 2014-04-30 @LucianPrecup @nosqlmatters #nosql14 27
  • 28. The issue with joins - solutions • Solution 1 – Index Persons with Contracts together for S1 {"person" : { "details" : …, … , "contracts" : ["contract" :{"id" : 1, …}, …] }} – Index Contracts with Persons together for S2 {"contract" : { "details" : …, …, "persons" : ["person" :{"id" : 1, "role" : "S", …}, …]}} • Issues with solution 1: – A lot of data duplication – Have to get Contracts when indexing Persons and vice-versa • Solution 2 – Elasticsearch’s Parent/Child • Issues with solution 2: – Works in one way but not the other (only one parent for n children, a 1 to n relationship) • Solution 3 – Index Persons and Contracts separately – Launch two Elasticsearch queries to get the response – For S1 : First get all Contract ids by Contract properties, then get Persons by Contract ids (terms query or mget) – For S2 : First get all Persons ids by Person properties, then get Contracts by Person ids (terms query or mget) – The response to the second query can be returned “as is” to the client (pagination, etc.) 2014-04-30 @LucianPrecup @nosqlmatters #nosql14 28
  • 29. Side by side - having • SELECT *, SUM(price) FROM offer GROUP BY offer_status HAVING AVG(price) > 10; • "size": 0, "aggs": { "Status": {"terms": {"field": "offer_status"}, "aggs": { "Average": {"avg": {"field": "price_ht"}}}} } • "query": { "filtered": {"filter": { "terms": {"offer_status": ["on_line"]}}}}, "aggs": { "Status": {"terms": {"field": "offer_status"}, "aggs": { "Total": {"sum": {"field": "price_ht"}}}}} 2014-09-04 @LucianPrecup @nosqlmatters #nosql14 29 Also specified by SQL 92
  • 30. Implementing HAVING 2014-09-04 @LucianPrecup @nosqlmatters #nosql14 30 1/ Query 1: A terms aggregation and an avg sub-aggregation 2/ Pick terms that match the HAVING clause 3/ Query 2: A filtered query on previous terms + terms aggregation + sum sub-aggregation 4/ Construct the result from hits + lookup in the corresponding aggregation
  • 31. Conclusion • The service layer is the center of the system • The developer has the power :-) 2014-09-04 @LucianPrecup @nosqlmatters #nosql14 31

Editor's Notes

  1. TODO review this
  2. The *famous* Query Optimizer versus *the* developer
  3. The *famous* Query Optimizer versus *the* developer
  4. The *famous* Query Optimizer versus *the* developer
  5. The *famous* Query Optimizer versus *the* developer
  6. The *famous* Query Optimizer versus *the* developer
  7. The *famous* Query Optimizer versus *the* developer
  8. The *famous* Query Optimizer versus *the* developer