SQL in the Hybrid World

Tanel Poder
Tanel PoderA long-time computer performance geek, also entrepreneur. at PoderC LLC
gluent.com 1
SQL	
  in	
  the	
  Hybrid	
  World
Tanel	
  Poder
a	
  long	
  time	
  computer	
  performance	
  geek
gluent.com 2
Intro:	
  About	
  me
• Tanel	
  Põder
• Oracle	
  Database	
  Performance	
  geek	
  (18+	
  years)
• Exadata	
  Performance	
  geek
• Linux	
  Performance	
  geek
• Hadoop	
  Performance	
  geek
• CEO	
  &	
  co-­‐founder:
Expert	
  Oracle	
  Exadata	
  
book
(2nd edition	
  is	
  out	
  now!)
Instant	
  
promotion
gluent.com 3
The	
  Hybrid	
  World
GluentHadoop
Gluent
MSSQL
Postgres
NoSQL	
  
DBs
Big	
  Data	
  
Sources
Oracle
App	
  
X
App	
  
Y
App	
  
Z
gluent.com 4
Why	
  this	
  talk?
• Various	
  NoSQL	
  databases	
  and	
  Hadoop	
  projects	
  are	
  very	
  hot!
• Is	
  this	
  stuff	
  for	
  real?
• I’m	
  mainly	
  talking	
  about	
  traditional	
  enterprises
• Not	
  small	
  startups
• Not	
  web/mobile	
  giants
• This	
  is	
  not	
  an	
  academic	
  analysis	
  of	
  data	
  consistency	
  theorems
• A	
  basic	
  reasoning	
  from	
  a	
  database	
  professional’s	
  point	
  of	
  view…
But	
  hey,	
  it’s	
  2016!
gluent.com 5
My	
  data	
  engineering	
  career
Software	
  
Developer
Production	
  
DBA
Software	
  
Architect
Development	
  
DBA Troubleshooter,	
  
Problem	
  solver
My	
  problem-­‐
solving	
  career	
  
history	
  has	
  given	
  
great	
  insight	
  into	
  
how	
  things	
  actually	
  
work	
  (and	
  fail)
(And	
  why	
  it	
  is	
  relevant	
  to	
  this	
  talk)
Where	
  rubber	
  
meets	
  the	
  road
gluent.com 6
NoSQL
gluent.com 7
Typical properties	
  associated	
  with	
  NoSQL
1. Schemaless	
  /	
  dynamic	
  schema
• Dump	
  any	
  fields	
  into	
  (and	
  under)	
  any	
  records
2. Doesn’t	
  do	
  joins
• Everything	
  related	
  to	
  an	
  item	
  should	
  physically	
  be	
  stored	
  with	
  the	
  item
3. Horizontally	
  scalable	
  distributed	
  systems
• Data	
  in	
  a	
  “document”	
  doesn’t	
  span	
  cluster	
  nodes
• So,	
  nodes	
  don’t	
  coordinate	
  with	
  each	
  other	
  much
4. Not	
  ACID	
  compliant
• Consistent	
  or	
  “eventually	
  consistent”
• No	
  multi-­‐row	
  /	
  multi-­‐document	
  transactions
Not	
  all	
  (NoSQL)	
  
databases	
  are	
  the	
  
same!	
  No	
  single	
  
definition	
  of	
  NoSQL!
As	
  otherwise	
  the	
  
cluster	
  nodes	
  would	
  
have	
  to	
  coordinate	
  
with	
  each	
  other	
  (and	
  
two-­‐phase	
  commit)
gluent.com 8
Real	
  Life:	
  Early	
  “NoSQL”	
  drivers	
  (2004)
1. Adding	
  a	
  column	
  in	
  a	
  development	
  RDBMS takes	
  2	
  days
• Developer	
  has	
  to	
  file	
  a	
  form	
  or	
  two	
  for	
  Dev	
  DBA
2. Getting	
  this	
  column	
  into	
  production	
  RDBMS	
  takes	
  2	
  weeks
• Structural	
  table	
  changes	
  may	
  break	
  things,	
  can’t	
  release	
  often
3. Not	
  a	
  technical	
  issue,	
  but	
  a	
  process	
  limitation
• App	
  Dev	
  not	
  moving	
  fast	
  enough	
  for	
  business
gluent.com 9
Real	
  Life:	
  Early	
  “NoSQL”	
  drivers
4. Developers	
  didn’t	
  like	
  the	
  organizational	
  slowness,	
  so:
• Reaction:	
  Using	
  catch-­‐all	
  XML	
  columns	
  and	
  generic	
  entity-­‐attribute-­‐
value	
  designs	
  for	
  “dynamic”	
  data	
  models
• Result:	
  RDBMS	
  Performance	
  problems	
  with	
  real	
  datasets
• Conclusion:	
  “RDBMS	
  are	
  too	
  slow	
  for	
  modern	
  applications”
• Actual	
  reality:	
  You	
  should	
  use	
  relational	
  databases	
  in	
  a	
  relational	
  way
gluent.com 10
“Joins	
  are	
  slow!”
Compared	
  to	
  what?
gluent.com 11
Real	
  Life:	
  Joins	
  are	
  slow!	
  (MySQL	
  edition)
• Circa	
  year	
  2008
• Claim:	
  All	
  joins	
  are	
  slow!
• Look	
  deeper:	
  Joining	
  two	
  small	
  tables	
  in	
  MySQL	
  
is	
  slow
• Look	
  deeper:	
  MySQL	
  didn’t	
  have	
  hash	
  joins	
  back	
  then
• Look	
  deeper:	
  Joining	
  unindexed	
  tables	
  with	
  nested	
  loops
• Actual	
  Reality:	
  Optimize	
  physical	
  design,	
  use	
  a	
  more	
  powerful	
  RDBMS
By	
  now,	
  MySQL	
  has	
  
buffered	
  (block)	
  
nested	
  loop	
  joins	
  
and	
  MariaDB
buffered	
  hash	
  joins
gluent.com 12
Joins	
  are	
  slow!	
  (modern	
  scale-­‐out	
  edition)
• Store	
  all	
  related	
  records	
  of	
  a	
  document	
  in	
  a	
  single	
  server	
  node	
  
in	
  a	
  nested	
  structure
• Orders under	
  a	
  Customer document
• Order_items stored	
  under	
  an	
  Order document
• Data	
  “pre-­‐joined”	
  on	
  write:	
  De-­‐normalization	
  (embedding)
• Physical	
  co-­‐location	
  to	
  avoid	
  extra	
  IO	
  +	
  talking	
  to	
  other	
  cluster	
  nodes
• Document	
  Schema	
  design	
  important	
  for	
  performance!
Customer (Name: Tanel, Address: Somestreet 12-34, SomeCity, ZZ)
Order #123 (Order Date: 2015-12-01)
Order Item #1 (Quantity: 1)
Product (Name: Vacuum Cleaner)
Order Item #2 (Quantity: 3)
Product (Name: Air Filter)
All	
  the	
  ”nested”	
  data	
  
related	
  to	
  a	
  
customer	
  contained	
  
in	
  a	
  single	
  physical	
  
location	
  &	
  server
Plus	
  mirroring	
  to	
  
other	
  server(s)	
  of	
  
course
gluent.com 13
Document	
  data	
  retrieval	
  (simplified)
Customer
“Tanel”
Order	
  #123
Item	
  #2
Product	
  Y
Customer
“Ted”
Customer
“Joe”
Item	
  #1
Product	
  Y
Data	
  “pre-­‐joined”	
  in	
  
a	
  nested	
  structure
gluent.com 14
Relational	
  data	
  retrieval	
  (simplified)
Customers Orders
Order	
  
Items
Products
Retrieval	
  must	
  visit	
  
multiple	
  separate	
  
(indexes)	
  and	
  tables
gluent.com 15
Joins	
  in	
  MongoDB	
  3.2
• Source:	
  https://www.mongodb.com/blog/post/joins-­‐and-­‐other-­‐aggregation-­‐
enhancements-­‐coming-­‐in-­‐mongodb-­‐3-­‐2-­‐part-­‐1-­‐of-­‐3-­‐introduction
gluent.com 16
Queries	
  in	
  MongoDB	
  3.2
• Source:	
  https://www.mongodb.com/blog/post/joins-­‐and-­‐other-­‐aggregation-­‐
enhancements-­‐coming-­‐in-­‐mongodb-­‐3-­‐2-­‐part-­‐1-­‐of-­‐3-­‐introduction
gluent.com 17
Filtering	
  in	
  MongoDB	
  3.2
• Query	
  &	
  filter	
  an	
  array	
  of	
  elements:
• MongoDB	
  Documentation:	
  
https://docs.mongodb.org/v2.6/reference/method/db.collection.find/#examples
SELECT
FROM	
  bios
WHERE	
  
award	
  =	
  ‘Turing	
  Award’	
  
AND	
  year	
  >	
  1980	
  ?
gluent.com 18
Joins	
  in	
  MongoDB	
  3.2
• Source:	
  https://www.mongodb.com/blog/post/joins-­‐and-­‐other-­‐aggregation-­‐
enhancements-­‐coming-­‐in-­‐mongodb-­‐3-­‐2-­‐part-­‐1-­‐of-­‐3-­‐introduction
gluent.com 19
Joins	
  in	
  Cassandra
• Source:	
  http://www.datastax.com/2015/03/how-­‐to-­‐do-­‐joins-­‐in-­‐apache-­‐cassandra-­‐
and-­‐datastax-­‐enterprise
gluent.com 20
Joins	
  in	
  Cassandra
Spark	
  SQL,	
  
Tableau,	
  etc
The	
  join	
  is	
  performed	
  
in	
  application	
  layer	
  
(Spark,	
  Tableau),	
  not	
  
pushed	
  into	
  DB	
  layer
User
gluent.com 21
Hadoop!
SQL	
  on
gluent.com 22
Why	
  Hadoop?
• Scalability	
  in	
  Software!
• Open	
  Data	
  Formats
• Future-­‐proof!
• One	
  Data,	
  Many	
  Engines!
Hive Impala Spark Presto Giraph
gluent.com 23
Scalability	
  vs.	
  Features	
  (Hive	
  example)
$ hive (version 1.2.1 HDP 2.3.1)
hive> SELECT SUM(duration)
> FROM call_detail_records
> WHERE
> type = 'INTERNET‘
> OR phone_number IN ( SELECT phone_number
> FROM customer_details
> WHERE region = 'R04' );
FAILED: SemanticException [Error 10249]: Line 5:17
Unsupported SubQuery Expression 'phone_number':
Only SubQuery expressions that are top level
conjuncts are allowed
We	
  Oracle	
  users	
  
have	
  been	
  spoiled	
  
with	
  very	
  
sophisticated	
  SQL	
  
engine	
  for	
  years	
  :)
gluent.com 24
Scalability	
  vs.	
  Features	
  (Impala	
  example)
$ impala-shell
SELECT SUM(order_total)
FROM orders
WHERE order_mode='online'
OR customer_id IN (SELECT customer_id FROM customers
WHERE customer_class = 'Prime');
Query: select SUM(order_total) FROM orders WHERE
order_mode='online' OR customer_id IN (SELECT
customer_id FROM customers WHERE customer_class =
'Prime')
ERROR: AnalysisException: Subqueries in OR predicates are
not supported: order_mode = 'online' OR customer_id IN
(SELECT customer_id FROM soe.customers WHERE
customer_class = 'Prime')
Cloudera:	
  CDH5.5	
  
Impala	
  2.3.0
gluent.com 25
Scalability	
  vs.	
  Features	
  (Hive	
  example)
$ hive (version 1.2.1 HDP 2.3.1)
hive> SELECT COUNT(*) FROM sales10m
> WHERE
> cust_id IN (SELECT cust_id FROM c)
> AND prod_id IN (SELECT prod_id FROM p);
FAILED: SemanticException [Error 10249]: Line 1:85
Unsupported SubQuery Expression 'prod_id': Only 1
SubQuery expression is supported.
These	
  are	
  deliberately	
  very	
  
simple	
  examples.	
  Not	
  even	
  
talking	
  about	
  complex	
  
window	
  functions	
   etc
gluent.com 26
Engineering	
  Tradeoffs
• Speed	
  of	
  Application	
  Delivery
• Application	
  Speed	
  &	
  Efficiency
• Workload	
  Scalability
• Application	
  Reliability
• Cost!
gluent.com 27
Hadoop,	
  NoSQL,	
  RDBMS	
  One	
  way	
  to	
  look	
  at	
  it
Hadoop
Data	
  Lake
RDBMS
Complex	
  
Transactions
Analytics
All	
  Data!	
  
Scalable,
Flexible
Sophisticated,
Out-­‐of-­‐the-­‐box,
Mature
“NoSQL”	
  
Transaction	
  
Ingest
Scalable	
  
Simple
Transactions
Unified	
  SQL	
  access
gluent.com 28
Thanks!
http://blog.tanelpoder.com
tanel@tanelpoder.com
@tanelpoder
We	
  are	
  hiring	
  developers	
  &	
  data	
  engineers!!!
http://gluent.com
1 of 28

Recommended

Avoid boring work_v2 by
Avoid boring work_v2Avoid boring work_v2
Avoid boring work_v2Marcin Przepiórowski
11.1K views54 slides
SQL Monitoring in Oracle Database 12c by
SQL Monitoring in Oracle Database 12cSQL Monitoring in Oracle Database 12c
SQL Monitoring in Oracle Database 12cTanel Poder
84.7K views34 slides
In Memory Database In Action by Tanel Poder and Kerry Osborne by
In Memory Database In Action by Tanel Poder and Kerry OsborneIn Memory Database In Action by Tanel Poder and Kerry Osborne
In Memory Database In Action by Tanel Poder and Kerry OsborneEnkitec
1.5K views40 slides
Oracle 12c Parallel Execution New Features by
Oracle 12c Parallel Execution New FeaturesOracle 12c Parallel Execution New Features
Oracle 12c Parallel Execution New FeaturesRandolf Geist
1.4K views38 slides
Oracle X$TRACE, Exotic Wait Event Types and Background Process Communication by
Oracle X$TRACE, Exotic Wait Event Types and Background Process CommunicationOracle X$TRACE, Exotic Wait Event Types and Background Process Communication
Oracle X$TRACE, Exotic Wait Event Types and Background Process CommunicationTanel Poder
2.6K views20 slides
Tanel Poder - Performance stories from Exadata Migrations by
Tanel Poder - Performance stories from Exadata MigrationsTanel Poder - Performance stories from Exadata Migrations
Tanel Poder - Performance stories from Exadata MigrationsTanel Poder
6.3K views28 slides

More Related Content

What's hot

Microsoft SQL Server Query Tuning by
Microsoft SQL Server Query TuningMicrosoft SQL Server Query Tuning
Microsoft SQL Server Query TuningMark Ginnebaugh
2.4K views44 slides
Drupal commerce performance profiling and tunning using loadstorm experiments... by
Drupal commerce performance profiling and tunning using loadstorm experiments...Drupal commerce performance profiling and tunning using loadstorm experiments...
Drupal commerce performance profiling and tunning using loadstorm experiments...Andy Kucharski
4K views49 slides
Oracle Exadata Performance: Latest Improvements and Less Known Features by
Oracle Exadata Performance: Latest Improvements and Less Known FeaturesOracle Exadata Performance: Latest Improvements and Less Known Features
Oracle Exadata Performance: Latest Improvements and Less Known FeaturesTanel Poder
31.7K views26 slides
Oracle Database 12c - Features for Big Data by
Oracle Database 12c - Features for Big DataOracle Database 12c - Features for Big Data
Oracle Database 12c - Features for Big DataAbishek V S
1.7K views42 slides
Connecting Hadoop and Oracle by
Connecting Hadoop and OracleConnecting Hadoop and Oracle
Connecting Hadoop and OracleTanel Poder
38.6K views47 slides
Extreme Availability using Oracle 12c Features: Your very last system shutdown? by
Extreme Availability using Oracle 12c Features: Your very last system shutdown?Extreme Availability using Oracle 12c Features: Your very last system shutdown?
Extreme Availability using Oracle 12c Features: Your very last system shutdown?Toronto-Oracle-Users-Group
5.7K views79 slides

What's hot(20)

Microsoft SQL Server Query Tuning by Mark Ginnebaugh
Microsoft SQL Server Query TuningMicrosoft SQL Server Query Tuning
Microsoft SQL Server Query Tuning
Mark Ginnebaugh2.4K views
Drupal commerce performance profiling and tunning using loadstorm experiments... by Andy Kucharski
Drupal commerce performance profiling and tunning using loadstorm experiments...Drupal commerce performance profiling and tunning using loadstorm experiments...
Drupal commerce performance profiling and tunning using loadstorm experiments...
Andy Kucharski4K views
Oracle Exadata Performance: Latest Improvements and Less Known Features by Tanel Poder
Oracle Exadata Performance: Latest Improvements and Less Known FeaturesOracle Exadata Performance: Latest Improvements and Less Known Features
Oracle Exadata Performance: Latest Improvements and Less Known Features
Tanel Poder31.7K views
Oracle Database 12c - Features for Big Data by Abishek V S
Oracle Database 12c - Features for Big DataOracle Database 12c - Features for Big Data
Oracle Database 12c - Features for Big Data
Abishek V S1.7K views
Connecting Hadoop and Oracle by Tanel Poder
Connecting Hadoop and OracleConnecting Hadoop and Oracle
Connecting Hadoop and Oracle
Tanel Poder38.6K views
Extreme Availability using Oracle 12c Features: Your very last system shutdown? by Toronto-Oracle-Users-Group
Extreme Availability using Oracle 12c Features: Your very last system shutdown?Extreme Availability using Oracle 12c Features: Your very last system shutdown?
Extreme Availability using Oracle 12c Features: Your very last system shutdown?
Making MySQL highly available using Oracle Grid Infrastructure by Ilmar Kerm
Making MySQL highly available using Oracle Grid InfrastructureMaking MySQL highly available using Oracle Grid Infrastructure
Making MySQL highly available using Oracle Grid Infrastructure
Ilmar Kerm5.1K views
PDB Provisioning with Oracle Multitenant Self Service Application by Leighton Nelson
PDB Provisioning with Oracle Multitenant Self Service ApplicationPDB Provisioning with Oracle Multitenant Self Service Application
PDB Provisioning with Oracle Multitenant Self Service Application
Leighton Nelson7.3K views
Oracle Database 12c Release 2 - New Features On Oracle Database Exadata Expr... by Alex Zaballa
Oracle Database 12c Release 2 - New Features On Oracle Database Exadata  Expr...Oracle Database 12c Release 2 - New Features On Oracle Database Exadata  Expr...
Oracle Database 12c Release 2 - New Features On Oracle Database Exadata Expr...
Alex Zaballa3.5K views
PLSSUG - Troubleshoot SQL Server performance problems like a Microsoft Engineer by Marek Maśko
PLSSUG - Troubleshoot SQL Server performance problems like a Microsoft EngineerPLSSUG - Troubleshoot SQL Server performance problems like a Microsoft Engineer
PLSSUG - Troubleshoot SQL Server performance problems like a Microsoft Engineer
Marek Maśko545 views
Agile Oracle to PostgreSQL migrations (PGConf.EU 2013) by Gabriele Bartolini
Agile Oracle to PostgreSQL migrations (PGConf.EU 2013)Agile Oracle to PostgreSQL migrations (PGConf.EU 2013)
Agile Oracle to PostgreSQL migrations (PGConf.EU 2013)
Gabriele Bartolini5.1K views
Collaborate 2019 - How to Understand an AWR Report by Alfredo Krieg
Collaborate 2019 - How to Understand an AWR ReportCollaborate 2019 - How to Understand an AWR Report
Collaborate 2019 - How to Understand an AWR Report
Alfredo Krieg587 views
Performance Management in Oracle 12c by Alfredo Krieg
Performance Management in Oracle 12cPerformance Management in Oracle 12c
Performance Management in Oracle 12c
Alfredo Krieg921 views
Simplifying EBS 12.2 ADOP - Collaborate 2019 by Alfredo Krieg
Simplifying EBS 12.2 ADOP - Collaborate 2019   Simplifying EBS 12.2 ADOP - Collaborate 2019
Simplifying EBS 12.2 ADOP - Collaborate 2019
Alfredo Krieg1.4K views
OOUG - Oracle Performance Tuning with AAS by Kyle Hailey
OOUG - Oracle Performance Tuning with AASOOUG - Oracle Performance Tuning with AAS
OOUG - Oracle Performance Tuning with AAS
Kyle Hailey12.3K views
Oracle Database Performance Tuning: The Not SQL Option by Guatemala User Group
Oracle Database Performance Tuning: The Not SQL OptionOracle Database Performance Tuning: The Not SQL Option
Oracle Database Performance Tuning: The Not SQL Option

Similar to SQL in the Hybrid World

Building data pipelines at Shopee with DEC by
Building data pipelines at Shopee with DECBuilding data pipelines at Shopee with DEC
Building data pipelines at Shopee with DECRim Zaidullin
534 views49 slides
Building FoundationDB by
Building FoundationDBBuilding FoundationDB
Building FoundationDBFoundationDB
3.4K views38 slides
Whats new in Oracle Database 12c release 12.1.0.2 by
Whats new in Oracle Database 12c release 12.1.0.2Whats new in Oracle Database 12c release 12.1.0.2
Whats new in Oracle Database 12c release 12.1.0.2Connor McDonald
86 views42 slides
Framing the Argument: How to Scale Faster with NoSQL by
Framing the Argument: How to Scale Faster with NoSQLFraming the Argument: How to Scale Faster with NoSQL
Framing the Argument: How to Scale Faster with NoSQLInside Analysis
643 views56 slides
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks by
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at DatabricksLessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at DatabricksDatabricks
892 views61 slides
My C.V by
My C.VMy C.V
My C.VPrabhakar Karanam
111 views8 slides

Similar to SQL in the Hybrid World(20)

Building data pipelines at Shopee with DEC by Rim Zaidullin
Building data pipelines at Shopee with DECBuilding data pipelines at Shopee with DEC
Building data pipelines at Shopee with DEC
Rim Zaidullin534 views
Building FoundationDB by FoundationDB
Building FoundationDBBuilding FoundationDB
Building FoundationDB
FoundationDB3.4K views
Whats new in Oracle Database 12c release 12.1.0.2 by Connor McDonald
Whats new in Oracle Database 12c release 12.1.0.2Whats new in Oracle Database 12c release 12.1.0.2
Whats new in Oracle Database 12c release 12.1.0.2
Connor McDonald86 views
Framing the Argument: How to Scale Faster with NoSQL by Inside Analysis
Framing the Argument: How to Scale Faster with NoSQLFraming the Argument: How to Scale Faster with NoSQL
Framing the Argument: How to Scale Faster with NoSQL
Inside Analysis643 views
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks by Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at DatabricksLessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Databricks892 views
Kent-Graziano-Intro-to-Datavault_short.pdf by abhaybansal43
Kent-Graziano-Intro-to-Datavault_short.pdfKent-Graziano-Intro-to-Datavault_short.pdf
Kent-Graziano-Intro-to-Datavault_short.pdf
abhaybansal434 views
Software Architecture and Architectors: useless VS valuable by Comsysto Reply GmbH
Software Architecture and Architectors: useless VS valuableSoftware Architecture and Architectors: useless VS valuable
Software Architecture and Architectors: useless VS valuable
AOUG_11Nov2016_Challenges_with_EBS12_2 by Sean Braymen
AOUG_11Nov2016_Challenges_with_EBS12_2AOUG_11Nov2016_Challenges_with_EBS12_2
AOUG_11Nov2016_Challenges_with_EBS12_2
Sean Braymen1.2K views
Dynamic DDL: Adding structure to streaming IoT data on the fly by DataWorks Summit
Dynamic DDL: Adding structure to streaming IoT data on the flyDynamic DDL: Adding structure to streaming IoT data on the fly
Dynamic DDL: Adding structure to streaming IoT data on the fly
DataWorks Summit1.4K views
From ddd to DDD : My journey from data-driven development to Domain-Driven De... by Thibaud Desodt
From ddd to DDD : My journey from data-driven development to Domain-Driven De...From ddd to DDD : My journey from data-driven development to Domain-Driven De...
From ddd to DDD : My journey from data-driven development to Domain-Driven De...
Thibaud Desodt519 views
What's New in Apache Hive 3.0 - Tokyo by DataWorks Summit
What's New in Apache Hive 3.0 - TokyoWhat's New in Apache Hive 3.0 - Tokyo
What's New in Apache Hive 3.0 - Tokyo
DataWorks Summit945 views
Become a Performance Diagnostics Hero by TechWell
Become a Performance Diagnostics HeroBecome a Performance Diagnostics Hero
Become a Performance Diagnostics Hero
TechWell146 views

More from Tanel Poder

Troubleshooting Complex Oracle Performance Problems with Tanel Poder by
Troubleshooting Complex Oracle Performance Problems with Tanel PoderTroubleshooting Complex Oracle Performance Problems with Tanel Poder
Troubleshooting Complex Oracle Performance Problems with Tanel PoderTanel Poder
959 views41 slides
Low Level CPU Performance Profiling Examples by
Low Level CPU Performance Profiling ExamplesLow Level CPU Performance Profiling Examples
Low Level CPU Performance Profiling ExamplesTanel Poder
5.3K views42 slides
Modern Linux Performance Tools for Application Troubleshooting by
Modern Linux Performance Tools for Application TroubleshootingModern Linux Performance Tools for Application Troubleshooting
Modern Linux Performance Tools for Application TroubleshootingTanel Poder
1.3K views11 slides
GNW01: In-Memory Processing for Databases by
GNW01: In-Memory Processing for DatabasesGNW01: In-Memory Processing for Databases
GNW01: In-Memory Processing for DatabasesTanel Poder
4.5K views34 slides
Tanel Poder - Scripts and Tools short by
Tanel Poder - Scripts and Tools shortTanel Poder - Scripts and Tools short
Tanel Poder - Scripts and Tools shortTanel Poder
4.7K views34 slides
Troubleshooting Complex Performance issues - Oracle SEG$ contention by
Troubleshooting Complex Performance issues - Oracle SEG$ contentionTroubleshooting Complex Performance issues - Oracle SEG$ contention
Troubleshooting Complex Performance issues - Oracle SEG$ contentionTanel Poder
44.1K views43 slides

More from Tanel Poder(12)

Troubleshooting Complex Oracle Performance Problems with Tanel Poder by Tanel Poder
Troubleshooting Complex Oracle Performance Problems with Tanel PoderTroubleshooting Complex Oracle Performance Problems with Tanel Poder
Troubleshooting Complex Oracle Performance Problems with Tanel Poder
Tanel Poder959 views
Low Level CPU Performance Profiling Examples by Tanel Poder
Low Level CPU Performance Profiling ExamplesLow Level CPU Performance Profiling Examples
Low Level CPU Performance Profiling Examples
Tanel Poder5.3K views
Modern Linux Performance Tools for Application Troubleshooting by Tanel Poder
Modern Linux Performance Tools for Application TroubleshootingModern Linux Performance Tools for Application Troubleshooting
Modern Linux Performance Tools for Application Troubleshooting
Tanel Poder1.3K views
GNW01: In-Memory Processing for Databases by Tanel Poder
GNW01: In-Memory Processing for DatabasesGNW01: In-Memory Processing for Databases
GNW01: In-Memory Processing for Databases
Tanel Poder4.5K views
Tanel Poder - Scripts and Tools short by Tanel Poder
Tanel Poder - Scripts and Tools shortTanel Poder - Scripts and Tools short
Tanel Poder - Scripts and Tools short
Tanel Poder4.7K views
Troubleshooting Complex Performance issues - Oracle SEG$ contention by Tanel Poder
Troubleshooting Complex Performance issues - Oracle SEG$ contentionTroubleshooting Complex Performance issues - Oracle SEG$ contention
Troubleshooting Complex Performance issues - Oracle SEG$ contention
Tanel Poder44.1K views
Oracle Database In-Memory Option in Action by Tanel Poder
Oracle Database In-Memory Option in ActionOracle Database In-Memory Option in Action
Oracle Database In-Memory Option in Action
Tanel Poder7.1K views
Tanel Poder - Troubleshooting Complex Oracle Performance Issues - Part 1 by Tanel Poder
Tanel Poder - Troubleshooting Complex Oracle Performance Issues - Part 1Tanel Poder - Troubleshooting Complex Oracle Performance Issues - Part 1
Tanel Poder - Troubleshooting Complex Oracle Performance Issues - Part 1
Tanel Poder14K views
Tanel Poder - Troubleshooting Complex Oracle Performance Issues - Part 2 by Tanel Poder
Tanel Poder - Troubleshooting Complex Oracle Performance Issues - Part 2Tanel Poder - Troubleshooting Complex Oracle Performance Issues - Part 2
Tanel Poder - Troubleshooting Complex Oracle Performance Issues - Part 2
Tanel Poder12.6K views
Tanel Poder Oracle Scripts and Tools (2010) by Tanel Poder
Tanel Poder Oracle Scripts and Tools (2010)Tanel Poder Oracle Scripts and Tools (2010)
Tanel Poder Oracle Scripts and Tools (2010)
Tanel Poder46.2K views
Oracle Latch and Mutex Contention Troubleshooting by Tanel Poder
Oracle Latch and Mutex Contention TroubleshootingOracle Latch and Mutex Contention Troubleshooting
Oracle Latch and Mutex Contention Troubleshooting
Tanel Poder7.2K views
Oracle LOB Internals and Performance Tuning by Tanel Poder
Oracle LOB Internals and Performance TuningOracle LOB Internals and Performance Tuning
Oracle LOB Internals and Performance Tuning
Tanel Poder22.3K views

Recently uploaded

Gen Apps on Google Cloud PaLM2 and Codey APIs in Action by
Gen Apps on Google Cloud PaLM2 and Codey APIs in ActionGen Apps on Google Cloud PaLM2 and Codey APIs in Action
Gen Apps on Google Cloud PaLM2 and Codey APIs in ActionMárton Kodok
5 views55 slides
Sprint 226 by
Sprint 226Sprint 226
Sprint 226ManageIQ
5 views18 slides
FIMA 2023 Neo4j & FS - Entity Resolution.pptx by
FIMA 2023 Neo4j & FS - Entity Resolution.pptxFIMA 2023 Neo4j & FS - Entity Resolution.pptx
FIMA 2023 Neo4j & FS - Entity Resolution.pptxNeo4j
7 views26 slides
Software evolution understanding: Automatic extraction of software identifier... by
Software evolution understanding: Automatic extraction of software identifier...Software evolution understanding: Automatic extraction of software identifier...
Software evolution understanding: Automatic extraction of software identifier...Ra'Fat Al-Msie'deen
9 views33 slides
20231129 - Platform @ localhost 2023 - Application-driven infrastructure with... by
20231129 - Platform @ localhost 2023 - Application-driven infrastructure with...20231129 - Platform @ localhost 2023 - Application-driven infrastructure with...
20231129 - Platform @ localhost 2023 - Application-driven infrastructure with...sparkfabrik
5 views46 slides
EV Charging App Case by
EV Charging App Case EV Charging App Case
EV Charging App Case iCoderz Solutions
5 views1 slide

Recently uploaded(20)

Gen Apps on Google Cloud PaLM2 and Codey APIs in Action by Márton Kodok
Gen Apps on Google Cloud PaLM2 and Codey APIs in ActionGen Apps on Google Cloud PaLM2 and Codey APIs in Action
Gen Apps on Google Cloud PaLM2 and Codey APIs in Action
Márton Kodok5 views
Sprint 226 by ManageIQ
Sprint 226Sprint 226
Sprint 226
ManageIQ5 views
FIMA 2023 Neo4j & FS - Entity Resolution.pptx by Neo4j
FIMA 2023 Neo4j & FS - Entity Resolution.pptxFIMA 2023 Neo4j & FS - Entity Resolution.pptx
FIMA 2023 Neo4j & FS - Entity Resolution.pptx
Neo4j7 views
Software evolution understanding: Automatic extraction of software identifier... by Ra'Fat Al-Msie'deen
Software evolution understanding: Automatic extraction of software identifier...Software evolution understanding: Automatic extraction of software identifier...
Software evolution understanding: Automatic extraction of software identifier...
20231129 - Platform @ localhost 2023 - Application-driven infrastructure with... by sparkfabrik
20231129 - Platform @ localhost 2023 - Application-driven infrastructure with...20231129 - Platform @ localhost 2023 - Application-driven infrastructure with...
20231129 - Platform @ localhost 2023 - Application-driven infrastructure with...
sparkfabrik5 views
.NET Developer Conference 2023 - .NET Microservices mit Dapr – zu viel Abstra... by Marc Müller
.NET Developer Conference 2023 - .NET Microservices mit Dapr – zu viel Abstra....NET Developer Conference 2023 - .NET Microservices mit Dapr – zu viel Abstra...
.NET Developer Conference 2023 - .NET Microservices mit Dapr – zu viel Abstra...
Marc Müller38 views
DSD-INT 2023 3D hydrodynamic modelling of microplastic transport in lakes - J... by Deltares
DSD-INT 2023 3D hydrodynamic modelling of microplastic transport in lakes - J...DSD-INT 2023 3D hydrodynamic modelling of microplastic transport in lakes - J...
DSD-INT 2023 3D hydrodynamic modelling of microplastic transport in lakes - J...
Deltares9 views
Myths and Facts About Hospice Care: Busting Common Misconceptions by Care Coordinations
Myths and Facts About Hospice Care: Busting Common MisconceptionsMyths and Facts About Hospice Care: Busting Common Misconceptions
Myths and Facts About Hospice Care: Busting Common Misconceptions
Headless JS UG Presentation.pptx by Jack Spektor
Headless JS UG Presentation.pptxHeadless JS UG Presentation.pptx
Headless JS UG Presentation.pptx
Jack Spektor7 views
SUGCON ANZ Presentation V2.1 Final.pptx by Jack Spektor
SUGCON ANZ Presentation V2.1 Final.pptxSUGCON ANZ Presentation V2.1 Final.pptx
SUGCON ANZ Presentation V2.1 Final.pptx
Jack Spektor22 views
DSD-INT 2023 Leveraging the results of a 3D hydrodynamic model to improve the... by Deltares
DSD-INT 2023 Leveraging the results of a 3D hydrodynamic model to improve the...DSD-INT 2023 Leveraging the results of a 3D hydrodynamic model to improve the...
DSD-INT 2023 Leveraging the results of a 3D hydrodynamic model to improve the...
Deltares6 views
DSD-INT 2023 Thermobaricity in 3D DCSM-FM - taking pressure into account in t... by Deltares
DSD-INT 2023 Thermobaricity in 3D DCSM-FM - taking pressure into account in t...DSD-INT 2023 Thermobaricity in 3D DCSM-FM - taking pressure into account in t...
DSD-INT 2023 Thermobaricity in 3D DCSM-FM - taking pressure into account in t...
Deltares9 views
Airline Booking Software by SharmiMehta
Airline Booking SoftwareAirline Booking Software
Airline Booking Software
SharmiMehta6 views
DSD-INT 2023 Simulation of Coastal Hydrodynamics and Water Quality in Hong Ko... by Deltares
DSD-INT 2023 Simulation of Coastal Hydrodynamics and Water Quality in Hong Ko...DSD-INT 2023 Simulation of Coastal Hydrodynamics and Water Quality in Hong Ko...
DSD-INT 2023 Simulation of Coastal Hydrodynamics and Water Quality in Hong Ko...
Deltares14 views
DSD-INT 2023 Wave-Current Interaction at Montrose Tidal Inlet System and Its ... by Deltares
DSD-INT 2023 Wave-Current Interaction at Montrose Tidal Inlet System and Its ...DSD-INT 2023 Wave-Current Interaction at Montrose Tidal Inlet System and Its ...
DSD-INT 2023 Wave-Current Interaction at Montrose Tidal Inlet System and Its ...
Deltares11 views
DSD-INT 2023 Salt intrusion Modelling of the Lauwersmeer, towards a measureme... by Deltares
DSD-INT 2023 Salt intrusion Modelling of the Lauwersmeer, towards a measureme...DSD-INT 2023 Salt intrusion Modelling of the Lauwersmeer, towards a measureme...
DSD-INT 2023 Salt intrusion Modelling of the Lauwersmeer, towards a measureme...
Deltares5 views

SQL in the Hybrid World

  • 1. gluent.com 1 SQL  in  the  Hybrid  World Tanel  Poder a  long  time  computer  performance  geek
  • 2. gluent.com 2 Intro:  About  me • Tanel  Põder • Oracle  Database  Performance  geek  (18+  years) • Exadata  Performance  geek • Linux  Performance  geek • Hadoop  Performance  geek • CEO  &  co-­‐founder: Expert  Oracle  Exadata   book (2nd edition  is  out  now!) Instant   promotion
  • 3. gluent.com 3 The  Hybrid  World GluentHadoop Gluent MSSQL Postgres NoSQL   DBs Big  Data   Sources Oracle App   X App   Y App   Z
  • 4. gluent.com 4 Why  this  talk? • Various  NoSQL  databases  and  Hadoop  projects  are  very  hot! • Is  this  stuff  for  real? • I’m  mainly  talking  about  traditional  enterprises • Not  small  startups • Not  web/mobile  giants • This  is  not  an  academic  analysis  of  data  consistency  theorems • A  basic  reasoning  from  a  database  professional’s  point  of  view… But  hey,  it’s  2016!
  • 5. gluent.com 5 My  data  engineering  career Software   Developer Production   DBA Software   Architect Development   DBA Troubleshooter,   Problem  solver My  problem-­‐ solving  career   history  has  given   great  insight  into   how  things  actually   work  (and  fail) (And  why  it  is  relevant  to  this  talk) Where  rubber   meets  the  road
  • 7. gluent.com 7 Typical properties  associated  with  NoSQL 1. Schemaless  /  dynamic  schema • Dump  any  fields  into  (and  under)  any  records 2. Doesn’t  do  joins • Everything  related  to  an  item  should  physically  be  stored  with  the  item 3. Horizontally  scalable  distributed  systems • Data  in  a  “document”  doesn’t  span  cluster  nodes • So,  nodes  don’t  coordinate  with  each  other  much 4. Not  ACID  compliant • Consistent  or  “eventually  consistent” • No  multi-­‐row  /  multi-­‐document  transactions Not  all  (NoSQL)   databases  are  the   same!  No  single   definition  of  NoSQL! As  otherwise  the   cluster  nodes  would   have  to  coordinate   with  each  other  (and   two-­‐phase  commit)
  • 8. gluent.com 8 Real  Life:  Early  “NoSQL”  drivers  (2004) 1. Adding  a  column  in  a  development  RDBMS takes  2  days • Developer  has  to  file  a  form  or  two  for  Dev  DBA 2. Getting  this  column  into  production  RDBMS  takes  2  weeks • Structural  table  changes  may  break  things,  can’t  release  often 3. Not  a  technical  issue,  but  a  process  limitation • App  Dev  not  moving  fast  enough  for  business
  • 9. gluent.com 9 Real  Life:  Early  “NoSQL”  drivers 4. Developers  didn’t  like  the  organizational  slowness,  so: • Reaction:  Using  catch-­‐all  XML  columns  and  generic  entity-­‐attribute-­‐ value  designs  for  “dynamic”  data  models • Result:  RDBMS  Performance  problems  with  real  datasets • Conclusion:  “RDBMS  are  too  slow  for  modern  applications” • Actual  reality:  You  should  use  relational  databases  in  a  relational  way
  • 10. gluent.com 10 “Joins  are  slow!” Compared  to  what?
  • 11. gluent.com 11 Real  Life:  Joins  are  slow!  (MySQL  edition) • Circa  year  2008 • Claim:  All  joins  are  slow! • Look  deeper:  Joining  two  small  tables  in  MySQL   is  slow • Look  deeper:  MySQL  didn’t  have  hash  joins  back  then • Look  deeper:  Joining  unindexed  tables  with  nested  loops • Actual  Reality:  Optimize  physical  design,  use  a  more  powerful  RDBMS By  now,  MySQL  has   buffered  (block)   nested  loop  joins   and  MariaDB buffered  hash  joins
  • 12. gluent.com 12 Joins  are  slow!  (modern  scale-­‐out  edition) • Store  all  related  records  of  a  document  in  a  single  server  node   in  a  nested  structure • Orders under  a  Customer document • Order_items stored  under  an  Order document • Data  “pre-­‐joined”  on  write:  De-­‐normalization  (embedding) • Physical  co-­‐location  to  avoid  extra  IO  +  talking  to  other  cluster  nodes • Document  Schema  design  important  for  performance! Customer (Name: Tanel, Address: Somestreet 12-34, SomeCity, ZZ) Order #123 (Order Date: 2015-12-01) Order Item #1 (Quantity: 1) Product (Name: Vacuum Cleaner) Order Item #2 (Quantity: 3) Product (Name: Air Filter) All  the  ”nested”  data   related  to  a   customer  contained   in  a  single  physical   location  &  server Plus  mirroring  to   other  server(s)  of   course
  • 13. gluent.com 13 Document  data  retrieval  (simplified) Customer “Tanel” Order  #123 Item  #2 Product  Y Customer “Ted” Customer “Joe” Item  #1 Product  Y Data  “pre-­‐joined”  in   a  nested  structure
  • 14. gluent.com 14 Relational  data  retrieval  (simplified) Customers Orders Order   Items Products Retrieval  must  visit   multiple  separate   (indexes)  and  tables
  • 15. gluent.com 15 Joins  in  MongoDB  3.2 • Source:  https://www.mongodb.com/blog/post/joins-­‐and-­‐other-­‐aggregation-­‐ enhancements-­‐coming-­‐in-­‐mongodb-­‐3-­‐2-­‐part-­‐1-­‐of-­‐3-­‐introduction
  • 16. gluent.com 16 Queries  in  MongoDB  3.2 • Source:  https://www.mongodb.com/blog/post/joins-­‐and-­‐other-­‐aggregation-­‐ enhancements-­‐coming-­‐in-­‐mongodb-­‐3-­‐2-­‐part-­‐1-­‐of-­‐3-­‐introduction
  • 17. gluent.com 17 Filtering  in  MongoDB  3.2 • Query  &  filter  an  array  of  elements: • MongoDB  Documentation:   https://docs.mongodb.org/v2.6/reference/method/db.collection.find/#examples SELECT FROM  bios WHERE   award  =  ‘Turing  Award’   AND  year  >  1980  ?
  • 18. gluent.com 18 Joins  in  MongoDB  3.2 • Source:  https://www.mongodb.com/blog/post/joins-­‐and-­‐other-­‐aggregation-­‐ enhancements-­‐coming-­‐in-­‐mongodb-­‐3-­‐2-­‐part-­‐1-­‐of-­‐3-­‐introduction
  • 19. gluent.com 19 Joins  in  Cassandra • Source:  http://www.datastax.com/2015/03/how-­‐to-­‐do-­‐joins-­‐in-­‐apache-­‐cassandra-­‐ and-­‐datastax-­‐enterprise
  • 20. gluent.com 20 Joins  in  Cassandra Spark  SQL,   Tableau,  etc The  join  is  performed   in  application  layer   (Spark,  Tableau),  not   pushed  into  DB  layer User
  • 22. gluent.com 22 Why  Hadoop? • Scalability  in  Software! • Open  Data  Formats • Future-­‐proof! • One  Data,  Many  Engines! Hive Impala Spark Presto Giraph
  • 23. gluent.com 23 Scalability  vs.  Features  (Hive  example) $ hive (version 1.2.1 HDP 2.3.1) hive> SELECT SUM(duration) > FROM call_detail_records > WHERE > type = 'INTERNET‘ > OR phone_number IN ( SELECT phone_number > FROM customer_details > WHERE region = 'R04' ); FAILED: SemanticException [Error 10249]: Line 5:17 Unsupported SubQuery Expression 'phone_number': Only SubQuery expressions that are top level conjuncts are allowed We  Oracle  users   have  been  spoiled   with  very   sophisticated  SQL   engine  for  years  :)
  • 24. gluent.com 24 Scalability  vs.  Features  (Impala  example) $ impala-shell SELECT SUM(order_total) FROM orders WHERE order_mode='online' OR customer_id IN (SELECT customer_id FROM customers WHERE customer_class = 'Prime'); Query: select SUM(order_total) FROM orders WHERE order_mode='online' OR customer_id IN (SELECT customer_id FROM customers WHERE customer_class = 'Prime') ERROR: AnalysisException: Subqueries in OR predicates are not supported: order_mode = 'online' OR customer_id IN (SELECT customer_id FROM soe.customers WHERE customer_class = 'Prime') Cloudera:  CDH5.5   Impala  2.3.0
  • 25. gluent.com 25 Scalability  vs.  Features  (Hive  example) $ hive (version 1.2.1 HDP 2.3.1) hive> SELECT COUNT(*) FROM sales10m > WHERE > cust_id IN (SELECT cust_id FROM c) > AND prod_id IN (SELECT prod_id FROM p); FAILED: SemanticException [Error 10249]: Line 1:85 Unsupported SubQuery Expression 'prod_id': Only 1 SubQuery expression is supported. These  are  deliberately  very   simple  examples.  Not  even   talking  about  complex   window  functions   etc
  • 26. gluent.com 26 Engineering  Tradeoffs • Speed  of  Application  Delivery • Application  Speed  &  Efficiency • Workload  Scalability • Application  Reliability • Cost!
  • 27. gluent.com 27 Hadoop,  NoSQL,  RDBMS  One  way  to  look  at  it Hadoop Data  Lake RDBMS Complex   Transactions Analytics All  Data!   Scalable, Flexible Sophisticated, Out-­‐of-­‐the-­‐box, Mature “NoSQL”   Transaction   Ingest Scalable   Simple Transactions Unified  SQL  access
  • 28. gluent.com 28 Thanks! http://blog.tanelpoder.com tanel@tanelpoder.com @tanelpoder We  are  hiring  developers  &  data  engineers!!! http://gluent.com