Turning a search engine
into a relational database
Matthias Wahl, Developer
Outline
• Introduction
• Relations on Lucene
• The How and Why
• Crate in further detail
• Query engine
• [Demo]
Crate.io
The company
• Founded in 2013 in Dornbirn, Austria
• Offices in Dornbirn, Berlin, San Francisco
• Team of 14 people (with and without strong Austrian dialect)
• Won the TechCrunch Disrupt Startup Battlefield
SQL Database
TABLES
• Table == Tuple Store
• Primary-Key -> Tuple
• Index == B-Tree
• allowing for equality and range queries in O(log(N))
• sorting
• Query Planner + Engine for LOCAL query execution
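A minimal sketch of the two structures named above, assuming a plain dict as the tuple store and a sorted list searched with bisect as a stand-in for a B-tree. The class and variable names are hypothetical. Lookups find their starting position in O(log N); the list insert itself is linear, which a real B-tree avoids.

```python
import bisect


class SortedIndex:
    """Stand-in for a B-tree: a sorted list of (key, primary_key) pairs."""

    def __init__(self):
        self._entries = []  # kept sorted by (key, primary_key)

    def add(self, key, primary_key):
        bisect.insort(self._entries, (key, primary_key))

    def between(self, low, high):
        """Primary keys whose indexed value lies in [low, high], in key order."""
        # (low,) sorts before every (low, pk), so this finds the first key >= low
        start = bisect.bisect_left(self._entries, (low,))
        found = []
        for key, pk in self._entries[start:]:
            if key > high:
                break
            found.append(pk)
        return found

    def equals(self, key):
        return self.between(key, key)


# Tuple store: primary key -> tuple (illustrative data)
table = {1: ("Arthur", 42), 2: ("Ford", 200), 3: ("Zaphod", 150)}
age_index = SortedIndex()
for pk, (_, age) in table.items():
    age_index.add(age, pk)

# Range query answered via the index, rows come back sorted by the indexed value
in_range = [table[pk] for pk in age_index.between(100, 200)]
```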
LUCENE INDEX
• Inverted Index
• equality queries
• range queries
• fulltext search with analyzed queries
• NO Sorting
• Stored Fields
• docValues / FieldCache
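To make the inverted index idea concrete, here is a toy version: a map from term to the set of documents containing it, plus stored fields. This is only an illustration of the concept, not Lucene's data layout; the analyzer (lowercase, split on non-word characters) and all names are assumptions.

```python
import re
from collections import defaultdict


def analyze(text):
    """Very rough analyzer: lowercase, then split into word tokens."""
    return re.findall(r"\w+", text.lower())


class InvertedIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of docs containing it
        self.stored = {}                  # doc id -> original field value

    def add(self, doc_id, text):
        self.stored[doc_id] = text
        for term in analyze(text):
            self.postings[term].add(doc_id)

    def match(self, query):
        """Ids of docs that contain every analyzed query term."""
        terms = analyze(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


index = InvertedIndex()
index.add(1, "this is a quite full text!")
index.add(2, "An empty glass")
index.add(3, "A FULL glass")
hits = index.match("full")
```

Note that the postings map answers "which documents contain this term?" but says nothing about document order, which is why sorting needs a different structure.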
SQL TABLE
LUCENE INDEX
CREATE TABLE t (
id int primary key,
name string,
marks array(float),
text string index using fulltext
) clustered into 5 shards
with (number_of_replicas=1)
• 1 TABLE
• S shards, each with R replicas
• metadata in cluster state
• 1 SHARD
• 1 Lucene Index (inverted index + stored documents/fields)
• field mappings
• field caches
SQL TABLE
LUCENE INDEX
Components
• Inverted Index
• Translog (WAL)
• “Tuple Store” - Stored Fields
• Lucene Field Data
• DocValues (on disk)
SQL TABLE
LUCENE INDICES
• Differences to Relational Databases
• DISTRIBUTED
• 2 different indices needed for all operations
• inverted index not suited for all kinds of queries
• persistence is expensive
• limited schema altering
• no pull based database cursor (yet)
Crate
Features
• Distributed SQL Database written in Java (7)
• accessible via HTTP & TCP (for Java clients only)
• Graphical Admin Interface
• Blob Storage
• Plugin Infrastructure
• Clients available in Java, Python, Ruby, PHP, Scala, Node.js, Erlang
• Runs on Docker, AWS, GCE, Mesos, …
CRATE
SQL
Crate
SQL
• subset of ANSI SQL with extensions
• arrays and nested objects
• different types
• Information Schema
• Cluster and Node State exposed via tables
• Partitioned Tables
• speaking JDBC, ODBC, SQLAlchemy, ActiveRecord, PHP-PDO/DBAL
Crate
SQL
• Common relational Operators:
• Projection
• Grouping (incl. HAVING)
• Aggregations
• Sorting
• Limit/Offset
• WHERE-clause
• Import/Export
Crate
SQL … NOT
• JOINS underway
• no subselects, foreign key constraints yet
• no sessions, no client cursor
• no transactions
CRATE -
A RELATIONAL DATABASE
• Relational Algebra
• SQL statement
• Tree of Relational Operators
• Mostly Tables == Leaves
• ES
• Single table operations only
• No simple SQL wrapper around ES Query DSL
SELECT
id,
substr(name, 4),
id % 2 as "EVEN",
text,
marks
FROM t
WHERE
name IS NOT NULL
AND
match(text, 'full')
ORDER BY id DESC
LIMIT 10;
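The "tree of relational operators" for a statement like the SELECT above could be pictured as nested iterators, with the table as the leaf. This is a sketch under stated assumptions, not Crate's planner: the operator functions are hypothetical, the sample rows are invented, and a plain word test stands in for the fulltext match.

```python
rows = [
    {"id": 1, "name": "Arthur Dent", "text": "a full glass", "marks": [1.0]},
    {"id": 2, "name": None, "text": "full of nothing", "marks": []},
    {"id": 3, "name": "Zaphod Beeblebrox", "text": "empty", "marks": [2.5]},
    {"id": 4, "name": "Ford Prefect", "text": "full text", "marks": [4.6]},
]


def table(source):  # leaf of the tree
    yield from source


def where(child, predicate):
    return (row for row in child if predicate(row))


def order_by(child, key, descending=False):
    return iter(sorted(child, key=key, reverse=descending))


def limit(child, n):
    for i, row in enumerate(child):
        if i >= n:
            break
        yield row


def project(child, outputs):
    for row in child:
        yield [fn(row) for fn in outputs]


# Built inside out: Table -> Filter -> Sort -> Limit -> Projection
plan = project(
    limit(
        order_by(
            where(
                table(rows),
                lambda r: r["name"] is not None and "full" in r["text"].split(),
            ),
            key=lambda r: r["id"],
            descending=True,
        ),
        10,
    ),
    [
        lambda r: r["id"],
        lambda r: r["name"][3:],  # substring starting at the 4th character
        lambda r: r["id"] % 2,
        lambda r: r["text"],
        lambda r: r["marks"],
    ],
)
result = list(plan)
```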
Querying crate
• Query Engine
• node based query execution
• directly to Lucene indices
• circumventing ES query execution
SQL TABLE
LUCENE INDEX
INSERT INTO t (id, name, marks, text)
VALUES (
42,
format('%d - %s', 42, 'Zaphod'),
[1.5, 4.6],
'this is a quite full text!')
ON DUPLICATE KEY UPDATE
name='DUPLICATE';
• INSERT INTO
• insert values are validated by their configured types
• types are guessed for new columns
• primary key and routing values extracted
• JSON _source is created:
• {"id": 42, "name": "42 - Zaphod", "marks": [1.5, 4.6], "text": "this is a quite full text!"}
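The insert steps listed above can be sketched roughly as follows. The schema dict, the type-guessing rule (simply the Python type of the value) and the choice of the primary key as routing value are assumptions made for illustration; only the order of the steps comes from the slide.

```python
import json

SCHEMA = {"id": int, "name": str, "marks": list, "text": str}
PRIMARY_KEY = "id"


def guess_type(value):
    """Guess a type for a column not yet in the schema (assumed rule)."""
    return type(value)


def prepare_insert(row, schema):
    for column, value in row.items():
        expected = schema.get(column)
        if expected is None:
            schema[column] = guess_type(value)  # new column: guess its type
        elif not isinstance(value, expected):
            raise TypeError(f"column {column!r} expects {expected.__name__}")
    pk = row[PRIMARY_KEY]       # extract primary key
    routing = pk                # assumed: routing value is the primary key
    source = json.dumps(row)    # JSON _source
    return pk, routing, source


schema = dict(SCHEMA)
pk, routing, source = prepare_insert(
    {
        "id": 42,
        "name": "%d - %s" % (42, "Zaphod"),
        "marks": [1.5, 4.6],
        "text": "this is a quite full text!",
        "planet": "Betelgeuse V",  # column not in the schema yet
    },
    schema,
)
```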
SQL TABLE
LUCENE INDEX
{
"id": 42,
"name": "42 - Zaphod",
"marks": [1.5, 4.6],
"text": "this is a quite full text!"
}
• INSERT INTO
• request is routed by “id” column to the node containing the shard
• row stored on shard
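Routing by the "id" column can be pictured as hashing the routing value and taking it modulo the number of shards. The sketch below uses CRC32 purely as an assumption; the slides do not say which hash function is used, and the helper names are hypothetical.

```python
import zlib

NUM_SHARDS = 5  # as in "clustered into 5 shards"


def shard_for(routing_value, num_shards=NUM_SHARDS):
    """Map a routing value to a shard number with a stable hash (CRC32 is an assumption)."""
    digest = zlib.crc32(str(routing_value).encode("utf-8"))
    return digest % num_shards


shards = {n: {} for n in range(NUM_SHARDS)}


def insert(row):
    shard = shard_for(row["id"])
    shards[shard][row["id"]] = row  # row stored on exactly one shard
    return shard


def get(pk):
    # A primary key lookup only has to ask the shard the key routes to
    return shards[shard_for(pk)].get(pk)


placed = insert({"id": 42, "name": "42 - Zaphod"})
```

Because the same hash is used for reads and writes, a lookup by primary key touches one shard instead of all of them.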
SQL TABLE
LUCENE INDEX
_uid: "default#42"
_routing: 42
_source: {…}
_type: "default"
_timestamp: 1435096992201
_version: 0
id: 42 (docvalues)
name: "42 - Zaphod" (sorted set)
marks: 1.5, 4.6
ft: "this is a quite full text!" (indexed)
_size: 82
_field_names: _uid, _routing, _source, _type, _timestamp, _version, id, name, marks, ft, _size
Querying crate
• Sorting and Grouping
• inverted index not enough
• per document values (DocValues)
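A rough picture of why per-document values help: an inverted index maps a value to documents, while sorting needs the opposite direction, from a document to its value. The column layout below (one list per field, position = document number) is a simplification for illustration, not the on-disk DocValues format.

```python
# One "column" per field; the list position is the document number
doc_values = {"id": [42, 7, 19, 3]}

# Document numbers that survived the WHERE clause (illustrative)
matching_docs = [0, 2, 3]

# ORDER BY id DESC LIMIT 2: look up each matching doc's value directly
top = sorted(matching_docs, key=lambda doc: doc_values["id"][doc], reverse=True)[:2]
top_ids = [doc_values["id"][doc] for doc in top]
```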
Querying crate
• “Simple” SELECT - QTF (Query Then Fetch)
• Extract Fields to SELECT
• Route to shards / Lucene Indices
• Open and keep Lucene Reader in query context
• Only collect Doc/Row identifier (and all necessary fields for sorting)
• merge separate results on handler
• apply limit/offset
• fetch all fields
• evaluate expressions
• return Object[][]
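The steps above can be sketched as two phases: a query phase that collects only sort keys and document identifiers per shard, and a fetch phase that loads stored fields for the rows that survive the merge. The shard data, function names and the upper-casing expression are invented for illustration.

```python
import heapq

# Each shard: doc id -> stored row (illustrative data)
SHARDS = [
    {0: {"id": 1, "name": "Arthur"}, 1: {"id": 5, "name": "Trillian"}},
    {0: {"id": 9, "name": "Zaphod"}, 1: {"id": 2, "name": "Ford"}},
    {0: {"id": 7, "name": "Marvin"}},
]


def query_phase(shard_no, shard, limit, offset):
    """Collect only (sort key, shard, doc id); at most limit + offset per shard."""
    hits = [(row["id"], shard_no, doc_id) for doc_id, row in shard.items()]
    return heapq.nlargest(limit + offset, hits)  # sorted for ORDER BY id DESC


def fetch_phase(hits):
    """Fetch stored fields only for the rows that made it into the page."""
    return [SHARDS[shard_no][doc_id] for _, shard_no, doc_id in hits]


def select(limit, offset=0):
    per_shard = [query_phase(n, s, limit, offset) for n, s in enumerate(SHARDS)]
    merged = list(heapq.merge(*per_shard, reverse=True))   # merge on handler
    page = merged[offset:offset + limit]                    # apply limit/offset
    rows = fetch_phase(page)                                # fetch all fields
    return [[r["id"], r["name"].upper()] for r in rows]     # evaluate expressions


result = select(limit=3)
```

The point of the split is that wide rows are only read for the final page, not for every candidate on every shard.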
Querying crate
• INTERNAL PAGING
• Problems with big result sets / high offsets
• Need to fetch LIMIT + OFFSET from every shard
• Execution starts at TOP Relation
• trickles down to tables (Lucene Indices)
• Hybrid of push and pull based data flow
SELECT
id,
substr(name, 4),
id % 2 as "EVEN",
text,
marks
FROM t
WHERE
name IS NOT NULL
AND
match(text, 'full')
ORDER BY id DESC
LIMIT 1
OFFSET 10000000;
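To see why high offsets hurt: if every shard has to send LIMIT + OFFSET rows, a table with 5 shards and the query above makes the handler consider up to 5 × (1 + 10,000,000) rows. The sketch below shows that arithmetic and one way to picture pull-based paging, where the handler's merge requests further pages from a shard only when it runs out of rows from it. The cursor function and page size are hypothetical and not Crate's implementation.

```python
import heapq
from itertools import islice


def rows_considered(num_shards, limit, offset):
    """Upper bound on rows merged when every shard sends limit + offset rows."""
    return num_shards * (limit + offset)


def shard_cursor(sorted_ids, page_size, fetch_log):
    """Yield a shard's ids (already in descending order), pulling one page at a time."""
    for start in range(0, len(sorted_ids), page_size):
        fetch_log.append(start)  # record that a page was requested from the shard
        yield from sorted_ids[start:start + page_size]


def paged_select(shards, limit, offset, page_size=2):
    fetch_log = []
    cursors = [shard_cursor(ids, page_size, fetch_log) for ids in shards]
    merged = heapq.merge(*cursors, reverse=True)  # lazy: pulls rows only as needed
    return list(islice(merged, offset, offset + limit)), fetch_log


shards = [[90, 70, 50, 30, 10], [80, 60, 40, 20], [85, 5]]
page, fetch_log = paged_select(shards, limit=2, offset=1)
worst_case = rows_considered(num_shards=5, limit=1, offset=10_000_000)
```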
Querying crate
• GROUP BY - AGGREGATIONS
• Aggregation Framework developed parallel to Elasticsearch aggregations
• ES - 2 phase aggregations (HyperLogLog, Moving Averages, Percentiles …)
• online algorithms on partial data (mergeable) necessary
• https://github.com/elastic/elasticsearch/issues/4915
SELECT
avg(temp) as avg,
stddev(temp) as stddev,
max(temp) as max,
min(temp) as min,
count(distinct temp),
date_trunc('year', date) as year
FROM t
WHERE temp IS NOT NULL
GROUP BY 6
ORDER BY avg DESC
LIMIT 10;
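"Mergeable" here means that each shard can aggregate its own rows into a small state, and states can later be combined without revisiting the rows. A sketch for mean and standard deviation, using Welford's online update and the pairwise combination by Chan et al.; these are well-known algorithms chosen for illustration, and the slides do not state which ones Crate uses. Population standard deviation is assumed.

```python
import math


class StdDevState:
    """Mergeable running state for count, mean and population standard deviation."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the mean

    def add(self, x):
        # Welford's online update for a single value
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def merge(self, other):
        # Pairwise combination of two partial states (Chan et al.)
        if other.n == 0:
            return self
        if self.n == 0:
            self.n, self.mean, self.m2 = other.n, other.mean, other.m2
            return self
        n = self.n + other.n
        delta = other.mean - self.mean
        self.m2 += other.m2 + delta * delta * self.n * other.n / n
        self.mean += delta * other.n / n
        self.n = n
        return self

    def stddev(self):
        return math.sqrt(self.m2 / self.n) if self.n else None


def partial(values):
    state = StdDevState()
    for value in values:
        state.add(value)
    return state


# Two shards aggregate independently, then their states are merged
shard_a, shard_b = [2.0, 4.0, 4.0, 4.0], [5.0, 5.0, 7.0, 9.0]
merged = partial(shard_a).merge(partial(shard_b))
```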
Querying crate
• GROUP BY - AGGREGATIONS
• split in 3 phases
• partial aggregation executed on each shard in parallel
• partial result distributed to “Reduce” nodes by hashing the group keys
• final aggregation on handler/reducer
• merge on handler
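The phases above can be sketched for a single aggregate, avg(temp) grouped by year. The partial state is a (count, sum) pair; Python's built-in hash stands in for whatever hash function distributes group keys to reducers. All names and the sample rows are assumptions.

```python
from collections import defaultdict


def partial_aggregate(rows):
    """Phase 1, per shard: group key -> (count, sum) partial state."""
    states = defaultdict(lambda: (0, 0.0))
    for year, temp in rows:
        count, total = states[year]
        states[year] = (count + 1, total + temp)
    return dict(states)


def distribute(partials, num_reducers):
    """Phase 2: send each group's partial state to the reducer that owns hash(key)."""
    inboxes = [defaultdict(list) for _ in range(num_reducers)]
    for shard_states in partials:
        for key, state in shard_states.items():
            inboxes[hash(key) % num_reducers][key].append(state)
    return inboxes


def final_aggregate(inbox):
    """Phase 3, per reducer: merge partial states and compute avg per group."""
    out = []
    for key, states in inbox.items():
        count = sum(c for c, _ in states)
        total = sum(t for _, t in states)
        out.append((key, total / count))
    return out


def handler_merge(reduced, limit):
    rows = [row for part in reduced for row in part]
    # ORDER BY avg DESC LIMIT n on the handler
    return sorted(rows, key=lambda r: r[1], reverse=True)[:limit]


shards = [
    [(2013, 10.0), (2014, 20.0)],
    [(2013, 14.0), (2015, 5.0)],
    [(2014, 22.0), (2015, 7.0)],
]
partials = [partial_aggregate(rows) for rows in shards]
reduced = [final_aggregate(inbox) for inbox in distribute(partials, num_reducers=2)]
result = handler_merge(reduced, limit=10)
```

Because every state for a given key lands on the same reducer, each reducer can finish its groups without talking to the others.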
Querying crate
[Diagram: partial results flow from the Shards to the REDUCER nodes and on to the HANDLER]
Querying crate
• GROUP BY - AGGREGATIONS
• Row Authority by hashing
• split huge datasets
• expensive intermediate aggregation states possible (COUNT DISTINCT)
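COUNT DISTINCT is a case where the intermediate state is expensive: an exact partial state has to carry every distinct value seen so far, because per-shard counts cannot simply be added. A small sketch with illustrative values, assuming an exact set-based state rather than an approximate one:

```python
def partial_distinct(values):
    """Per-shard state: the set of all distinct values seen on that shard."""
    return set(values)


def merge_distinct(states):
    merged = set()
    for state in states:
        merged |= state  # union, so values shared between shards count once
    return merged


shard_values = [[1, 2, 2], [2, 3, 7], [4, 9, 42]]
states = [partial_distinct(values) for values in shard_values]

summed_counts = sum(len(state) for state in states)  # double-counts the shared 2
distinct_count = len(merge_distinct(states))
```

The state grows with the number of distinct values, which is why splitting groups across reducers by hashing helps with huge datasets.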
FINALLY
• GETTING RELATIONAL…
• still in transition
• more relational operators to come
• JOINs are underway
• CROSS JOINS already “work”
Thank you
matthias@crate.io
jobs@crate.io

Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduits
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performance
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 

Turning a Search Engine into a Relational Database

  • 1. Turning a search engine into a relational database Matthias Wahl, Developer
  • 2. Outline • Introduction • Relations on Lucene • The How and Why • Crate in further detail • Query engine • [Demo]
  • 3. Crate.io: The company • Founded in 2013 in Dornbirn, Austria • Offices in Dornbirn, Berlin, San Francisco • Team of 14 people (with and without strong Austrian dialect) • Won the TechCrunch Disrupt Startup Battlefield
  • 4. SQL Database TABLES • Table == Tuple Store • Primary-Key -> Tuple • Index == B-Tree • allowing for equality and range queries in O(log(N)) • sorting • Query Planner + Engine for LOCAL query execution
  • 5. LUCENE INDEX • Inverted Index • equality queries • range queries • fulltext search with analyzed queries • NO Sorting • Stored Fields • docValues / FieldCache
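The inverted-index bullets on this slide can be illustrated with a toy index. This is a minimal sketch, not Lucene's actual data structure: `ToyInvertedIndex` and the whitespace `analyze` function are hypothetical names introduced only for illustration.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


class ToyInvertedIndex:
    """Toy term -> doc-id postings map (illustrative, not Lucene's format)."""

    def __init__(self):
        self.postings = defaultdict(list)  # term -> doc ids in insertion order

    def add(self, doc_id, terms):
        for term in set(terms):
            self.postings[term].append(doc_id)

    def equals(self, term):
        """Equality query: a single dictionary lookup."""
        return sorted(self.postings.get(term, []))

    def range(self, low, high):
        """Range query: scan the sorted term dictionary between low and high."""
        terms = sorted(self.postings)
        lo, hi = bisect_left(terms, low), bisect_right(terms, high)
        hits = set()
        for term in terms[lo:hi]:
            hits.update(self.postings[term])
        return sorted(hits)


def analyze(text):
    """Hypothetical analyzer: lowercase and split on whitespace."""
    return text.lower().split()


index = ToyInvertedIndex()
index.add(1, analyze("this is a quite full text"))
index.add(2, analyze("Another text"))
index.add(3, analyze("full moon"))
```

The structure maps terms to documents, not documents to values, which is one way to read the "NO Sorting" bullet: ordering rows by a column needs per-document values that this index does not provide directly.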
  • 6. SQL TABLE LUCENE INDEX • 1 TABLE • S shards, each with R replicas • metadata in cluster state • 1 SHARD • 1 Lucene Index (inverted index + stored documents/fields) • field mappings • field caches
    CREATE TABLE t (id int primary key, name string, marks array(float), text string index using fulltext) clustered into 5 shards with (number_of_replicas=1)
  • 7. SQL TABLE LUCENE INDEX Components • Inverted Index • Translog (WAL) • “Tuple Store” - Stored Fields • Lucene Field Data • DocValues (on disk)
  • 8. SQL TABLE LUCENE INDICES • Differences to Relational Databases • DISTRIBUTED • 2 different indices needed for all operations • inverted index not suited for all kinds of queries • persistence is expensive • limited schema altering
  • 9. SQL TABLE LUCENE INDICES • Differences to Relational Databases • DISTRIBUTED • 2 different indices needed for all operations • inverted index not suited for all kinds of queries • persistence is expensive • limited schema altering • no pull based database cursor (yet)
  • 10. Crate Features • Distributed SQL Database written in Java (7) • accessible via HTTP & TCP (for Java clients only) • Graphical Admin Interface • Blob Storage • Plugin Infrastructure • Clients available in Java, Python, Ruby, PHP, Scala, Node.js, Erlang • Runs on Docker, AWS, GCE, Mesos, …
  • 12. Crate SQL • subset of ANSI SQL with extensions • arrays and nested objects • different types • Information Schema • Cluster and Node State exposed via tables • Partitioned Tables • speaking JDBC, ODBC, SQLAlchemy, ActiveRecord, PHP-PDO/DBAL
  • 13. Crate SQL • Common relational Operators: • Projection • Grouping (incl. HAVING) • Aggregations • Sorting • Limit/Offset • WHERE-clause • Import/Export
  • 14. Crate SQL … NOT • JOINS underway • no subselects, foreign key constraints yet • no sessions, no client cursor • no transactions
  • 15. CRATE - A RELATIONAL DATABASE • Relational Algebra • SQL statement • Tree of Relational Operators • Mostly Tables == Leaves • ES • Single table operations only • No simple SQL wrapper around ES Query DSL
    SELECT id, substr(name, 4), id % 2 as "EVEN", text, marks FROM t WHERE name IS NOT NULL AND match(text, 'full') ORDER BY id DESC LIMIT 10;
  • 16. Querying Crate • Query Engine • node based query execution • directly to Lucene indices • circumventing ES query execution
    SELECT id, substr(name, 4), id % 2 as "EVEN", text, marks FROM t WHERE name IS NOT NULL AND match(text, 'full') ORDER BY id DESC LIMIT 10;
  • 17. SQL TABLE LUCENE INDEX • INSERT INTO • insert values are validated by their configured types • types are guessed for new columns • primary key and routing values extracted • JSON _source is created: {"id": 42, "name": "42 - Zaphod", "marks": [1.5, 4.6], "text": "this is a quite full text!"}
    INSERT INTO t (id, name, marks, text) VALUES (42, format('%d - %s', 42, 'Zaphod'), [1.5, 4.6], 'this is a quite full text!') ON DUPLICATE KEY UPDATE name = 'DUPLICATE';
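The INSERT steps on this slide (validate, extract primary key and routing value, build the JSON `_source`) can be sketched as follows. This is a minimal illustration under stated assumptions: `SCHEMA`, `PRIMARY_KEY` and `prepare_insert` are hypothetical names, the validators are simplified, and routing by primary key is assumed because the example table declares no separate routing column.

```python
import json

# Hypothetical schema for table t from the slides: column -> validator.
SCHEMA = {
    "id": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "name": lambda v: isinstance(v, str),
    "marks": lambda v: isinstance(v, list) and all(isinstance(x, float) for x in v),
    "text": lambda v: isinstance(v, str),
}
PRIMARY_KEY = "id"


def prepare_insert(row):
    """Validate a row, extract primary key and routing value, build _source."""
    for column, value in row.items():
        check = SCHEMA.get(column)
        if check is None:
            continue  # a real system would guess a type for the new column here
        if not check(value):
            raise TypeError(f"invalid value for column {column!r}: {value!r}")
    pk = row[PRIMARY_KEY]
    routing = pk  # assumption: rows are routed by their primary key
    source = json.dumps(row)
    return pk, routing, source


# Python's % formatting stands in for the SQL format() call on the slide.
row = {"id": 42, "name": "%d - %s" % (42, "Zaphod"), "marks": [1.5, 4.6],
       "text": "this is a quite full text!"}
pk, routing, source = prepare_insert(row)
```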
  • 18. SQL TABLE LUCENE INDEX • INSERT INTO • request is routed by “id” column to the node containing the shard • row stored on shard
    {"id": 42, "name": "42 - Zaphod", "marks": [1.5, 4.6], "text": "this is a quite full text!"}
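Routing a row to a shard by its “id” value can be sketched as a hash modulo the shard count. The slides do not name the hash function, so CRC32 below is purely a stand-in chosen for determinism, and `shard_for` is a hypothetical helper.

```python
import zlib

NUM_SHARDS = 5  # from "clustered into 5 shards" in the CREATE TABLE example


def shard_for(routing_value, num_shards=NUM_SHARDS):
    """Map a routing value to a shard number.

    CRC32 is an assumption made for this sketch, not the hash
    function Crate or Elasticsearch actually uses.
    """
    digest = zlib.crc32(str(routing_value).encode("utf-8"))
    return digest % num_shards


shard = shard_for(42)
```

Because the mapping depends only on the routing value and the shard count, a later lookup by primary key can be sent to the one shard that holds the row.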
  • 19. SQL TABLE LUCENE INDEX • _uid: "default#42" • _routing: 42 • _source: {…} • _type: "default" • _timestamp: 1435096992201 • _version: 0 • id: 42 (docvalues) • name: "42 - Zaphod" (sorted set) • marks: 1.5, 4.6 • ft: "this is a quite full text!" (indexed) • _size: 82 • _field_names: _uid, _routing, _source, _type, _timestamp, _version, id, name, marks, ft, _size
  • 20. Querying Crate • Sorting and Grouping • inverted index not enough • per document values (DocValues)
    SELECT id, substr(name, 4), id % 2 as "EVEN", text, marks FROM t WHERE name IS NOT NULL AND match(text, 'full') ORDER BY id DESC LIMIT 10;
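Why per-document values help with sorting can be shown with a toy column layout: one array per field, indexed by document id. This is a sketch of the idea only. The `doc_values` dictionary and its contents are hypothetical and do not reflect Lucene's on-disk DocValues format.

```python
# Toy column store: one array per field, indexed by doc id (hypothetical layout).
doc_values = {
    "id": [42, 7, 99, 13],
    "name": ["42 - Zaphod", None, "Ford", "Arthur"],
}


def top_n_by_id_desc(matching_docs, n):
    """Order matching doc ids by their 'id' column value, descending."""
    ids = doc_values["id"]
    return sorted(matching_docs, key=lambda doc: ids[doc], reverse=True)[:n]


# Docs passing "name IS NOT NULL", resolved via the same per-document columns.
matching = [doc for doc, name in enumerate(doc_values["name"]) if name is not None]
result = top_n_by_id_desc(matching, 2)
```

The sort key for each matching document is a direct array lookup, which a term-to-documents index cannot offer without inverting itself first.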
  • 21. Querying Crate • “Simple” SELECT - QTF • Extract Fields to SELECT • Route to shards / Lucene Indices • Open and keep Lucene Reader in query context • Only collect Doc/Row identifier (and all necessary fields for sorting) • merge separate results on handler • apply limit/offset • fetch all fields • evaluate expressions • return Object[][]
    SELECT id, substr(name, 4), id % 2 as "EVEN", text, marks FROM t WHERE name IS NOT NULL AND match(text, 'full') ORDER BY id DESC LIMIT 10;
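The QTF steps listed on this slide (collect identifiers and sort keys per shard, merge on the handler, apply limit/offset, then fetch fields and evaluate expressions) can be sketched in a few functions. This is a simplified illustration, assuming in-memory dictionaries in place of Lucene indices; `SHARDS`, `query_phase`, `fetch_phase` and `select` are hypothetical names.

```python
import heapq

# Hypothetical shards: each maps a local doc id to a stored row.
SHARDS = [
    {0: {"id": 1, "name": "Arthur"}, 1: {"id": 8, "name": "Ford"}},
    {0: {"id": 5, "name": None}, 1: {"id": 9, "name": "Zaphod"}},
    {0: {"id": 3, "name": "Trillian"}},
]


def query_phase(shard_no, limit):
    """Collect only (sort key, shard, doc id) for matching rows, by id DESC."""
    hits = [(row["id"], shard_no, doc)
            for doc, row in SHARDS[shard_no].items()
            if row["name"] is not None]
    return sorted(hits, reverse=True)[:limit]


def fetch_phase(hits):
    """Fetch the full rows for the surviving hits and evaluate expressions."""
    rows = []
    for _, shard_no, doc in hits:
        row = SHARDS[shard_no][doc]
        # substr(name, 4) is 1-based in SQL, hence the slice from index 3.
        rows.append([row["id"], row["name"][3:], row["id"] % 2])
    return rows


def select(limit, offset=0):
    per_shard = [query_phase(s, limit + offset) for s in range(len(SHARDS))]
    # Merge the pre-sorted shard results on the handler, then apply limit/offset.
    merged = list(heapq.merge(*per_shard, reverse=True))
    return fetch_phase(merged[offset:offset + limit])


rows = select(limit=2)
```

Only the rows that survive the merge are fetched in full, so the wide columns of discarded rows never leave their shard.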
  • 22. Querying Crate • INTERNAL PAGING • Problems with big result sets / high offsets • Need to fetch LIMIT + OFFSET from every shard • Execution starts at TOP Relation • trickles down to tables (Lucene Indices) • Hybrid of push and pull based data flow
    SELECT id, substr(name, 4), id % 2 as "EVEN", text, marks FROM t WHERE name IS NOT NULL AND match(text, 'full') ORDER BY id DESC LIMIT 1 OFFSET 10000000;
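The cost behind "LIMIT + OFFSET from every shard" is simple arithmetic, and paging is one way to avoid paying it up front. The sketch below shows both. It is an assumption-laden illustration of pull-style paging with lazy generators, not Crate's actual hybrid mechanism; `rows_requested`, `shard_pages` and `paged_select` are hypothetical names.

```python
import heapq
from itertools import islice


def rows_requested(limit, offset, num_shards):
    """Upper bound on rows the handler requests when no paging is used."""
    return (limit + offset) * num_shards


def shard_pages(sorted_ids, page_size, log):
    """Yield a shard's pre-sorted ids page by page, recording each page request."""
    for start in range(0, len(sorted_ids), page_size):
        log.append(start)
        yield from sorted_ids[start:start + page_size]


def paged_select(shards, limit, offset, page_size, log):
    """Merge shard streams lazily; further pages are pulled only when needed."""
    streams = [shard_pages(ids, page_size, log) for ids in shards]
    merged = heapq.merge(*streams, reverse=True)
    return list(islice(merged, offset, offset + limit))


# The query on the slide, assuming the 5 shards of the example table.
worst_case = rows_requested(limit=1, offset=10_000_000, num_shards=5)

page_log = []
shards = [[9, 7, 5, 3, 1], [8, 6, 4, 2, 0]]  # each shard sorted by id DESC
page = paged_select(shards, limit=1, offset=2, page_size=2, log=page_log)
```

With the slide's query and five shards, the unpaged bound is 50,000,005 rows for a single result row, which is why fetching in pages matters for high offsets.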
  • 23. Querying Crate • GROUP BY - AGGREGATIONS • Aggregation Framework developed parallel to Elasticsearch aggregations • ES - 2 phase aggregations (HyperLogLog, Moving Averages, Percentiles …) • online algorithms on partial data (mergeable) necessary • https://github.com/elastic/elasticsearch/issues/4915
    SELECT avg(temp) as avg, stddev(temp) as stddev, max(temp) as max, min(temp) as min, count(distinct temp), date_trunc('year', date) as year FROM t WHERE temp IS NOT NULL GROUP BY date_trunc('year', date) ORDER BY avg DESC LIMIT 10;
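What "online algorithms on partial data (mergeable)" means can be shown for `stddev`: each shard keeps a small running state, and states from different shards combine without revisiting rows. The sketch uses Welford's update and the pairwise merge formula by Chan et al. This is a well-known technique chosen for illustration, and the slides do not say which algorithm Crate uses. `VarianceState` is a hypothetical name, and population standard deviation is assumed.

```python
import math
from dataclasses import dataclass


@dataclass
class VarianceState:
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the mean

    def add(self, x):
        """Welford's online update for one value."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def merge(self, other):
        """Combine two partial states (pairwise formula by Chan et al.)."""
        if other.count == 0:
            return self
        if self.count == 0:
            return other
        total = self.count + other.count
        delta = other.mean - self.mean
        mean = self.mean + delta * other.count / total
        m2 = self.m2 + other.m2 + delta * delta * self.count * other.count / total
        return VarianceState(total, mean, m2)

    def stddev(self):
        """Population standard deviation (assumed variant)."""
        return math.sqrt(self.m2 / self.count) if self.count else float("nan")


# One partial state per shard, merged afterwards (sample data is made up).
shard_values = [[1, 2, 2], [2, 3, 7], [4, 9, 42]]
merged = VarianceState()
for values in shard_values:
    partial = VarianceState()
    for v in values:
        partial.add(v)
    merged = merged.merge(partial)
```

The state is three numbers regardless of how many rows it summarizes, which is what makes shipping it between nodes cheap. An exact `count(distinct ...)` has no such fixed-size state.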
  • 24. Querying Crate • GROUP BY - AGGREGATIONS • split in 3 phases • partial aggregation executed on each shard in parallel • partial result distributed to “Reduce” nodes by hashing the group keys • final aggregation on handler/reducer • merge on handler
    SELECT avg(temp) as avg, stddev(temp) as stddev, max(temp) as max, min(temp) as min, count(distinct temp), date_trunc('year', date) as year FROM t WHERE temp IS NOT NULL GROUP BY date_trunc('year', date) ORDER BY avg DESC LIMIT 10;
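The three phases on this slide can be sketched for a single aggregate, `avg(temp)` grouped by year. This is a minimal single-process illustration, assuming integer group keys and a plain modulo in place of a real hash; the function names and the sample data are hypothetical.

```python
from collections import defaultdict


def partial_aggregate(rows):
    """Phase 1, per shard: group key -> [sum, count] partial state for avg()."""
    state = defaultdict(lambda: [0.0, 0])
    for key, temp in rows:
        state[key][0] += temp
        state[key][1] += 1
    return state


def distribute(partials, num_reducers):
    """Phase 2: send each group's partial state to the reducer owning its key."""
    buckets = [defaultdict(lambda: [0.0, 0]) for _ in range(num_reducers)]
    for partial in partials:
        for key, (total, count) in partial.items():
            owner = buckets[key % num_reducers]  # assumption: modulo as the hash
            owner[key][0] += total
            owner[key][1] += count
    return buckets


def final_merge(buckets, limit):
    """Phase 3: finish avg() per group, then merge, sort and limit on the handler."""
    rows = [(key, total / count)
            for bucket in buckets
            for key, (total, count) in bucket.items()]
    return sorted(rows, key=lambda r: r[1], reverse=True)[:limit]


# Rows are (year, temp) pairs spread over three shards.
shards = [
    [(2013, 10.0), (2014, 20.0)],
    [(2013, 14.0), (2015, 5.0)],
    [(2014, 22.0), (2015, 7.0)],
]
buckets = distribute([partial_aggregate(s) for s in shards], num_reducers=2)
result = final_merge(buckets, limit=10)
```

Hashing the group key gives every group exactly one owning reducer, so the final value of a group is computed in one place and the handler only has to concatenate, sort and limit.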
  • 25. Querying Crate • [Diagram: partial results flow from 3 Shards to REDUCER nodes and on to the HANDLER]
    SELECT avg(temp) as avg, stddev(temp) as stddev, max(temp) as max, min(temp) as min, count(distinct temp), date_trunc('year', date) as year FROM t WHERE temp IS NOT NULL GROUP BY date_trunc('year', date) ORDER BY avg DESC LIMIT 10;
  • 26. Querying Crate • GROUP BY - AGGREGATIONS • Row Authority by hashing • split huge datasets • expensive intermediate aggregation states possible (COUNT DISTINCT)
    SELECT avg(temp) as avg, stddev(temp) as stddev, max(temp) as max, min(temp) as min, count(distinct temp), date_trunc('year', date) as year FROM t WHERE temp IS NOT NULL GROUP BY date_trunc('year', date) ORDER BY avg DESC LIMIT 10;
  • 27. FINALLY • GETTING RELATIONAL… • still in transition • more relational operators to come • JOINs are underway • CROSS JOINS already “work”