Migrating from SQL to NoSQL (MongoDB)
I wish I had a time machine and could use
CrateDB
@mtrivizas @crateio
Marios (me)
Academic background in Databases & Distributed Systems
~15 years of experience as a backend developer (C / Java)
Focused on Data (Databases/Datastores & caching)
Core Developer, 1 year at Crate.io
Agenda
● Migrating from RDBMS to NoSQL (MongoDB)
● Why CrateDB made me wish I had a time machine
(This is not a MongoDB vs CrateDB presentation)
Migrating from RDBMS to NoSQL (MongoDB)
● Why
● Developers’ pain
● QA/DevOps/BI pain
● SysAdmin’s pain
Migrating to MongoDB - Why - Previous Setup
● Several 2-node clusters of PostgreSQL (each cluster
servicing one project/customer)
● 1 Large Oracle data warehouse
● Performing analytics on Production DBs (schemas
contained tables with data only used for analytics)
● Some data was transformed and loaded into the DWH for further
analysis
Migrating to MongoDB - Why - Problem
● Unsatisfactory throughput/latency for user requests
● Querying 2 different systems (production db and DWH) for
analytics
● Querying production db for BI purposes impacts performance
dramatically (both BI queries & user requests)
● Need for the ability to perform text search
Migrating to MongoDB - Why - Solution
● Move non-transactional BI data to a NoSQL data store
● Use one (or a few) large NoSQL clusters to store
all BI-related data / Remove the DWH
● Why MongoDB
○ Document data store
○ Performance
○ In-house knowledge in other teams at the company
Migrating to MongoDB - Developer’s Pain
● Schema - Docs
○ Referenced docs
■ Multiple queries to retrieve info
■ application processing (no join support)
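Without join support, the referenced-document pattern tends to look roughly like the sketch below: one query per collection, then a join done in application code. The collections here are plain Python lists, and `find`, `orders`, and `customers` are hypothetical stand-ins, not MongoDB driver calls.

```python
# Toy in-memory "collections"; names and fields are illustrative assumptions.
customers = [
    {"_id": 1, "name": "Alice"},
    {"_id": 2, "name": "Bob"},
]
orders = [
    {"_id": 10, "customer_id": 1, "total": 25.0},
    {"_id": 11, "customer_id": 2, "total": 40.0},
    {"_id": 12, "customer_id": 1, "total": 15.0},
]


def find(collection, predicate):
    """Stand-in for one round trip to the data store."""
    return [doc for doc in collection if predicate(doc)]


def orders_with_customer_names(min_total):
    # Query 1: fetch the matching orders.
    matching = find(orders, lambda o: o["total"] >= min_total)
    # Query 2: fetch only the customers those orders reference.
    wanted_ids = {o["customer_id"] for o in matching}
    by_id = {c["_id"]: c for c in find(customers, lambda c: c["_id"] in wanted_ids)}
    # The "join" itself happens in application code.
    return [
        {
            "order_id": o["_id"],
            "customer": by_id[o["customer_id"]]["name"],
            "total": o["total"],
        }
        for o in matching
    ]


result = orders_with_customer_names(20.0)
```

In SQL the same question would be a single statement with a join; here every new combination of collections needs its own hand-written lookup code.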
Migrating to MongoDB - Developer’s Pain
● Schema - Docs
○ Embedded
■ Max doc size 2 MB (now 16 MB)
■ Data duplication
■ Data inconsistencies (updated in one embedded doc but not in
the other(s))
○ Frequent updates or growing documents lead to fragmentation
■ Running compact is really slow and the DB is almost unusable while it runs
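The duplication and inconsistency problem with embedded documents can be illustrated with a small sketch. The `author` and `posts` structures are made up for illustration; the point is only that each embedded copy is independent once written.

```python
import copy

# Hypothetical author document that gets embedded in several posts.
author = {"id": 7, "name": "M. Smith", "email": "old@example.com"}

# Each post carries its own copy of the author (data duplication).
posts = [
    {"title": "First post", "author": copy.deepcopy(author)},
    {"title": "Second post", "author": copy.deepcopy(author)},
]

# An update that touches only one embedded copy...
posts[0]["author"]["email"] = "new@example.com"

# ...leaves the copies disagreeing about the same author.
emails = {p["author"]["email"] for p in posts}
inconsistent = len(emails) > 1
```

Keeping the copies in sync means finding and rewriting every document that embeds the changed data, which is exactly the bookkeeping a normalized schema avoids.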
Migrating to MongoDB - Developer’s Pain
● Schema - Indexes
○ If all fields indexed
■ Insert/Update performance drops a lot
■ Indexes must fit in memory
○ Need to carefully choose which fields to index and which
scheme to use (partial, sparse)
○ Keyword search very good / full-text quite slow
Migrating to MongoDB - Developer’s Pain
● Schema - Aggregation
○ Some aggregations were really slow -> had to store
several counters in extra collections to solve the
problem
● Write lock per collection
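The counter workaround for slow aggregations can be sketched as follows. This is a toy in-memory model, assuming a hypothetical `events` collection and a second `counts_by_type` collection that is updated on every write; it is not the original application's code.

```python
from collections import Counter

events = []                  # the main "collection"
counts_by_type = Counter()   # the extra "collection" of precomputed counters


def insert_event(event):
    """Write the event and bump its counter in the same code path."""
    events.append(event)
    counts_by_type[event["type"]] += 1


def count_by_scan(event_type):
    # What the slow aggregation does: look at every document.
    return sum(1 for e in events if e["type"] == event_type)


def count_from_counter(event_type):
    # What the workaround does: read one precomputed value.
    return counts_by_type[event_type]


for t in ["click", "view", "click", "purchase", "click"]:
    insert_event({"type": t})
```

The read becomes cheap, but every writer now has to remember to update the counter, and the two writes are not atomic unless the data store makes them so.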
Migrating to MongoDB - Developer’s Pain
Learn a new DML & DQL language
Migrating to MongoDB - QA/DevOps/BI Pain
● Problems
○ New query language
○ Write application-side code
● “Solution”: Transform & inject data into an RDBMS data warehouse
Migrating to MongoDB - DevOps/SysAdmin Pain
Production setup needs:
● DB nodes
● Config servers (at least 3 even if
you only have 2 DB nodes)
● mongos instances
Need for 3 different configurations ->
more “pain” for vm/container images
CrateDB to the rescue!
CrateDB
A Distributed, Persistent, Realtime SQL Database
Crate.io
● Founded in 2013
● ~25 people and growing
● Offices in Dornbirn (AT), Berlin (DE), and San Francisco (US)
● Open source: https://github.com/crate/crate
CrateDB - core features
● Simply scalable using shared-nothing architecture
● Uses SQL for DDL/DML/DQL
● Supports dynamic schemas
● Supports full-text, geospatial & time series search
● Eventually consistent
CrateDB - Sample users
● Skyhigh Networks - cloud access security broker for F500
● Alpla (Coke bottles!) - real-time manufacturing optimization
● Clickdrive.io - Real-time vehicle fleet tracking; 1500 readings/car/second
● Gantner Instruments - Industrial (nuclear) sensors; 10 kHz readings
● Roomonitor - AC & noise level sensors & control for AirBnB/public housing
● NBC (Golf Channel) - Internet of Golfers app (geo-positioning)
● ...
CrateDB - Numbers from users
● Clusters of 100+ nodes
● 3.2bn inserts/updates per day
● 1m+ inserts per second with 14 nodes
● 4000 reads per second
CrateDB - for All
● SQL (+ some extensions, e.g. sharding) for schema creation
● SQL for data modification
● Power of SQL for data query (Joins, subselects, filtering over
aggregations, etc)
● Atomic row updates - no locking of whole document collections
● Blazing fast and accurate (no approximations) aggregations, up to 29x
faster than PostgreSQL
https://crate.io/a/benchmarking-complex-query-performance-cratedb-postgresql/
● Bulk inserts
CrateDB - for Devs
● Every column is indexed by default
● Variety of data types including arrays and objects
● Variety of functions (scalar, conditional, aggregations, casts)
● Blob storage
● Transparent partitioning
● Customizable full text search support (tokenizers, analyzers, filters)
● Dynamic schema
● Generated columns
● Explain execution plan
CrateDB - for Devs/DevOps
● HTTP (REST) & Binary protocol (Postgres wire protocol), JDBC
supported!
● Other clients:
○ Python
○ PHP PDO and DBAL
○ Erlang
○ ODBC
CrateDB - for DevOps/SysAdmins
● Easy cluster setup: All nodes are equal
● Runs on premises and in the cloud (AWS, Azure, GCE)
● Easily containerized - Docker, Kubernetes
● Auto-sharding & replication, partitioning
● Scale horizontally? Just add nodes
● Built-in cluster management
● Comes with a web UI
● CLI: Crash
CrateDB - for DevOps/SysAdmins
● Import/Export (file system, S3) and also Insert by query!
● Backup/Restore mechanism via snapshots (HDFS also supported as
storage)
● Importing from / exporting to an RDBMS is easy
● Monitor/Manage via SQL
○ Information schema tables
○ sys.cluster, sys.nodes, sys.shards
○ sys.jobs, sys.jobs_log (kill jobs supported)
○ Change runtime settings
CrateDB - Enterprise edition
● JMX Monitoring plugin (integrate with popular monitoring
platforms)
● Host based authentication (HTTP/PG)
● User-defined functions (JavaScript)
CrateDB - Coming soon…
● Enterprise edition
○ Encryption for clients and node-node communication (SSL)
○ User authentication, schema permissions
● Community edition
○ More SQL features (full support for subselects, unions, etc)
CrateDB - Considerations
● No Join variations (only nested loop currently)
● No auto-Increment functionality
● No transactions (only eventually consistent - optimistic lock)
● No stored procedures
● No views
● No full support for subselects (but soon…)
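The "optimistic lock" mentioned above can be pictured with a generic sketch. It assumes each row carries a version number that a writer must present when updating; the `Row` class, `update_if_version`, and `VersionConflict` are hypothetical names for illustration and do not show CrateDB's actual mechanism or syntax.

```python
class VersionConflict(Exception):
    """Raised when a writer's view of the row is stale."""


class Row:
    def __init__(self, value):
        self.value = value
        self.version = 1


def update_if_version(row, expected_version, new_value):
    # Apply the write only if nobody else changed the row in the meantime.
    if row.version != expected_version:
        raise VersionConflict(
            f"expected version {expected_version}, found {row.version}"
        )
    row.value = new_value
    row.version += 1


row = Row("initial")

# Two clients read the same version of the row.
seen_by_a = row.version
seen_by_b = row.version

# Client A writes first and succeeds.
update_if_version(row, seen_by_a, "from A")

# Client B's write is rejected because its version is now stale.
try:
    update_if_version(row, seen_by_b, "from B")
    conflict = False
except VersionConflict:
    conflict = True
```

No lock is held between read and write; the losing client has to re-read the row and retry, which is the trade-off compared with full transactions.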
CrateDB - Internals - Overview
[Diagram: a Client connecting to three CrateDB nodes]
Layers shown: Postgres Wire Protocol/HTTP, ANTLR4 Parser, Query Analyzer/Planner, Distributed Execution Engine, Lucene, ...
CrateDB - Internals - Sharding
CREATE TABLE my_table (
col1 string,
col2 integer,
…
) CLUSTERED BY (col2) INTO 10 SHARDS
WITH (number_of_replicas = 2)
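How `CLUSTERED BY (col2) INTO 10 SHARDS` spreads rows can be sketched as hashing the routing column and taking the result modulo the shard count. This is a simplified model: `shard_for` is a hypothetical helper and CRC32 is only a stand-in hash, not the function CrateDB uses.

```python
import zlib

NUM_SHARDS = 10  # matches INTO 10 SHARDS in the statement above


def shard_for(routing_value, num_shards=NUM_SHARDS):
    # CRC32 is a deterministic stand-in hash (an assumption for illustration).
    digest = zlib.crc32(str(routing_value).encode("utf-8"))
    return digest % num_shards


rows = [
    {"col1": "a", "col2": 1},
    {"col1": "b", "col2": 2},
    {"col1": "c", "col2": 1},
]

# Group rows by the shard their routing column (col2) maps to.
placement = {}
for row in rows:
    placement.setdefault(shard_for(row["col2"]), []).append(row["col1"])
```

Rows with the same `col2` value always land on the same shard, so a query that filters on the routing column only needs to visit one shard.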
CrateDB - Internals - Partitioning
CREATE TABLE parted_table (
id long,
title string,
content string,
day timestamp
)
CLUSTERED BY (title) INTO 4 SHARDS
PARTITIONED BY (day)
WITH (number_of_replicas = 4)
Total shards = number of partitions × 4 (number of shards) × 5 (primary + 4 replicas)
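The shard arithmetic is worth running for a concrete case, since a daily partition key makes the total grow every day. A minimal sketch, with `total_shards` as a hypothetical helper and 30 partitions chosen purely as an example:

```python
def total_shards(partitions, shards_per_partition, replicas):
    """Each partition has its own shards, and each shard has 1 primary + N replicas."""
    return partitions * shards_per_partition * (1 + replicas)


# Example: 30 daily partitions of parted_table (4 shards, 4 replicas).
total_for_30_days = total_shards(partitions=30, shards_per_partition=4, replicas=4)
# 30 * 4 * 5 = 600
```

A coarser partition key (for example month instead of day) reduces the partition count and therefore the total number of shards the cluster has to manage.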
CrateDB - Internals - Partitioning
CREATE TABLE computed_parted_table (
id long,
data double,
created_at timestamp,
month timestamp GENERATED ALWAYS AS date_trunc('month', created_at)
) PARTITIONED BY (month);
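The effect of partitioning by a generated `month` column can be sketched in plain Python: truncate each `created_at` to the first instant of its month and group rows by that value. `trunc_month` is a hypothetical helper that mimics what `date_trunc('month', ...)` computes; the sample rows are made up.

```python
from collections import defaultdict
from datetime import datetime


def trunc_month(ts):
    """Drop everything below the month, like date_trunc('month', ts)."""
    return ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)


rows = [
    {"id": 1, "created_at": datetime(2017, 3, 14, 9, 30)},
    {"id": 2, "created_at": datetime(2017, 3, 31, 23, 59)},
    {"id": 3, "created_at": datetime(2017, 4, 1, 0, 0)},
]

# Rows with the same generated value end up in the same partition.
partitions = defaultdict(list)
for row in rows:
    partitions[trunc_month(row["created_at"])].append(row["id"])
```

The application only ever writes `created_at`; the partition value is derived from it, so rows cannot end up in the wrong partition because of a client-side mistake.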
CrateDB - Internals - Refresh
CREATE TABLE my_table (
id long primary key,
data double,
...
) ... WITH (refresh_interval = 10000)
REFRESH TABLE my_table;
SELECT * FROM my_table WHERE id = 10
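The refresh behaviour can be pictured with a toy model. It assumes, as in Lucene-based stores generally, that newly written rows become visible to searches only after a refresh, while a lookup by primary key reads the row directly; `ToyTable` and its methods are hypothetical and greatly simplified, not CrateDB internals.

```python
class ToyTable:
    def __init__(self):
        self._rows = {}            # every written row, keyed by primary key
        self._searchable = set()   # primary keys visible to non-PK queries

    def insert(self, row):
        self._rows[row["id"]] = row

    def refresh(self):
        # Models REFRESH TABLE (or the periodic refresh_interval tick).
        self._searchable = set(self._rows)

    def get_by_id(self, row_id):
        # Models SELECT ... WHERE id = ?: a direct lookup, no refresh needed.
        return self._rows.get(row_id)

    def search(self, predicate):
        # Models any other query: sees only refreshed rows.
        return [
            self._rows[i] for i in sorted(self._searchable) if predicate(self._rows[i])
        ]


table = ToyTable()
table.insert({"id": 10, "data": 1.5})

before = table.search(lambda r: r["data"] > 1.0)   # not yet refreshed
by_pk = table.get_by_id(10)                         # visible immediately
table.refresh()
after = table.search(lambda r: r["data"] > 1.0)    # visible after refresh
```

A longer `refresh_interval` trades freshness of search results for less refresh work during heavy writes.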
Express your data question in SQL and let the magic happen
[Diagram: Complicated SQL → CrateDB → Awesome Analytics]
CrateDB - Internals - Query execution path
● Parse the query
● Analyze/Plan the query
● Find route to shards
● Use Lucene Reader to get IDs (possibly applying ORDER BY)
● Return results & merge (in a map-reduce fashion)
● Apply limit/offset
● Fetch the rest of the fields
● Evaluate remaining expressions
● Return results
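The collect, merge, limit, then fetch steps above can be sketched for a query of the form ORDER BY score DESC LIMIT n OFFSET m. This is a toy single-process model: the `shards` data, `collect`, and `query` are made up for illustration and do not reflect CrateDB's actual execution engine.

```python
import heapq

# Three toy shards, each mapping row id -> row (sample data is an assumption).
shards = [
    {1: {"score": 9, "name": "a"}, 4: {"score": 3, "name": "d"}},
    {2: {"score": 7, "name": "b"}, 5: {"score": 8, "name": "e"}},
    {3: {"score": 5, "name": "c"}},
]


def collect(shard, limit_plus_offset):
    """Per shard: return only (sort key, id) pairs, already ordered descending."""
    keys = sorted(
        ((doc["score"], doc_id) for doc_id, doc in shard.items()), reverse=True
    )
    return keys[:limit_plus_offset]


def query(limit, offset):
    # Collect phase: each shard contributes at most limit + offset candidates.
    per_shard = [collect(s, limit + offset) for s in shards]
    # Merge phase: combine the pre-sorted shard results into one ordering.
    merged = heapq.merge(*per_shard, reverse=True)
    # Apply limit/offset on the merged ids.
    page = list(merged)[offset:offset + limit]
    # Fetch phase: look up the remaining fields only for rows on the page.
    # (A flat lookup table keeps the toy short; a real system asks each shard.)
    by_id = {doc_id: doc for s in shards for doc_id, doc in s.items()}
    return [{"id": doc_id, **by_id[doc_id]} for _, doc_id in page]


top = query(limit=2, offset=1)
```

Moving only sort keys and ids through the merge keeps the intermediate data small; full rows are read only for the handful of results that are actually returned.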
CrateDB - Play with it
https://play.crate.io (read only)
Questions?
Let’s store large amounts of data
AND
have realtime query capabilities!

Remedying the Challenges of Migrating Oracle/Postgres/SQL to MongoDB/NoSQL
