February, 2018
Andy Ellicott, Crate.io
SQL for Machine Data?
Logistics…
• Submit questions at any time via the questions panel

• Slides & recording will be shared via email after the event
Agenda
–
• Machine data - the next big wave?

• Machine data use cases

• Machine data management options - Splunk, ELK, time series

• Reinventing SQL for machine data

• SQL examples

• Questions & answers
I like databases
25 years in DBMS & software development companies

IMHO…the coolest ways software is changing what’s
possible in life and business…is usually due to some
database changing what’s possible with software.
The next wave of big data
will come from machines
“Things Data”
The next wave…“Things Data”
–
By 2020, 50% of new software systems will be IoT-related
Putting Machine Data to Work
—
• Definitive record of all activity and behavior

- What happened, when, where, by whom

• Tells us how to optimize: 

- Customer experience

- Safety

- Production

- Profitability

• Where things are going right vs. wrong

• Fingerprints of fraud
Customer: 

–
“CrateDB’s real-time SQL
performance, simple scaling, and
high availability make it a key
element of our stack”
Sekhar Sarukkai

Co-founder
Use case: Cyber Security - Campbell, CA
• Leading Cloud Access Security Broker (CASB) 

• SaaS system monitors internet traffic for security risks

- 700 customers, 40% of F500

Data Challenges

• Original MySQL-ElasticSearch platform grew too costly to run &
too hard to maintain

- Duplicate data storage, DB syncing code

CrateDB Results
• Replaced MySQL/ElasticSearch with CrateDB in 2015

• ~100TB data, billions of network messages per day

• Real-time queries for 1000s of concurrent users

• 20x faster, 75% lower AWS costs
Customer:

–
Use case: Industrial IoT - Atlanta, GA
• $4B producer of bottles for Coca-Cola, P&G, Unilever

• 2016 initiative: Use real-time IoT data to optimize overall equipment
effectiveness across 170 factories

Data Challenges
• Diversity - 900 different sensor types per production line

• MS SQL Server too slow and inflexible

- 900 tables (1 per sensor type)

- 3-5 minute query response times

CrateDB Results
• Easier development - 1 table vs. 900 in SQL Server

• Faster dashboards - 20ms vs. 4,000ms

• Central cloud + edge deployment = insight on factory floor and in
central “Mission Control”

• Lower labor costs and greater overall equipment effectiveness (OEE)
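
As a rough illustration of the "1 table vs. 900" point above, the sketch below keeps readings from different sensor types in one table, with the sensor-specific fields stored as a JSON payload. It uses Python's built-in sqlite3 purely as a stand-in database; the table name, column names, and sample values are assumptions, and the slides do not show the actual schema or CrateDB DDL.

```python
import json
import sqlite3

# In-memory stand-in database (NOT CrateDB); all names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings ("
    "line_id TEXT, sensor_type TEXT, ts INTEGER, payload TEXT)"
)


def store(line_id, sensor_type, ts, fields):
    """Store one reading; sensor-specific fields go into a JSON payload."""
    conn.execute(
        "INSERT INTO readings VALUES (?, ?, ?, ?)",
        (line_id, sensor_type, ts, json.dumps(fields)),
    )


def latest(sensor_type):
    """Return the most recent payload for a sensor type, or None."""
    row = conn.execute(
        "SELECT payload FROM readings WHERE sensor_type = ? "
        "ORDER BY ts DESC LIMIT 1",
        (sensor_type,),
    ).fetchone()
    return json.loads(row[0]) if row else None


# Different sensor types carry different fields but share one table.
store("line-1", "temperature", 1, {"celsius": 181.5})
store("line-1", "pressure", 2, {"bar": 39.2, "valve": "A"})
store("line-2", "temperature", 3, {"celsius": 179.0})

# One query covers every sensor type; no per-type table is needed.
counts = dict(
    conn.execute(
        "SELECT sensor_type, COUNT(*) FROM readings GROUP BY sensor_type"
    ).fetchall()
)
print(counts)
print(latest("temperature"))
```

Adding a new sensor type in this layout means inserting rows with a new sensor_type value rather than creating another table.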
“Thousands of sensors generate data
along our production lines, and CrateDB
allows us to analyze that firehose of data
24 hours a day to make real-time
improvements to factory efficiency.”
Philipp Lehner, CEO Alpla, USA
Customer: 

–
Use case: Smart Lighting - Los Angeles, CA
• $2B global leader in IoT-enabled industrial lighting 

• Lighting Burj Khalifa, OfficeMax & Sainsbury’s chains

• Software to control & monitor complex network of lighting, plus
presence, energy, & WiFi sensors

Data Challenges
• MySQL could not scale to support new initiatives:
- Shift to SaaS - central cloud portal

- Real-time reporting

- Time series analysis of operational metrics

CrateDB Results
• Easy migration from MySQL, in weeks

• Simple scaling with CrateDB on Docker

• Real-time data - concurrent SaaS users and API for application
partners

• 40x better DBMS price-performance vs. MySQL
Customer:
–
“We need to process massive amounts of
data our customers’ vehicles generate, in
real time. CrateDB offered the best
performance, scalability, and ease-of-use of
any SQL or NoSQL DBMS we tried.”
Mark Sutheran, 

Founder, Clickdrive
Use case: Vehicle Fleet Management - Singapore
• Internet-enabled vehicle fleet monitoring system

• Used by Singapore taxis, insurance vehicle fleets

• Real-time monitoring of vehicle location & health improves fleet utilization, safety, driver behavior, and profitability

Data Challenges
• Real-time vehicle status & location, while ingesting 1,500 data points per
second per car, 24x7

• Data science - query 10s of terabytes of vehicle system data to develop
predictive maintenance algorithms

• MySQL can’t scale, Cassandra required too much tuning
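
To put the ingest figure above in perspective, here is a small back-of-the-envelope calculation. Only the 1,500 data points per second per car comes from the slide; the fleet sizes in the loop are hypothetical.

```python
POINTS_PER_SECOND_PER_CAR = 1_500  # figure quoted on the slide
SECONDS_PER_DAY = 24 * 60 * 60     # 24x7 ingestion


def daily_points(cars: int) -> int:
    """Data points ingested per day for a fleet of the given size."""
    return POINTS_PER_SECOND_PER_CAR * SECONDS_PER_DAY * cars


print(daily_points(1))  # 129600000 points per car per day

# Hypothetical fleet sizes, for illustration only.
for fleet in (10, 100, 1_000):
    print(fleet, "cars:", daily_points(fleet), "points/day")
```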

CrateDB Results
• Revealed hidden maintenance issues with 50% of vehicles

• Reduced repair costs 20% by predicting problems earlier

• Data processing speed enables 3D accident re-creation within minutes
The Next Wave of Big Data
–
“IoT is creating unparalleled information
management and analytics challenges.”
- Jim Hare, Gartner
Every step. Every lightbulb. Every message. Every bottle.
• Firehose of data: Millions of data points per second
• Complex data: Joins, time series, geospatial, JSON, text search, AI, blobs
• Real-time: Instantly actionable - current & large historic data sets
• Edge + Cloud: Run anywhere. Cloud. On-premises. Containers. Small footprint or large clusters with 100+ nodes.
Your Machine Data Management Options
-
But More Likely …
-
• First…: Log search, analytics. Full stack - forwarders, indexers, search heads, visualization
• Then…: Open source. Log search, analytics. Full stack - Elasticsearch, Logstash, Kibana
• Lately…: Time series, IT metrics
Why Not SQL?
–
                                 Traditional SQL   Splunk, et al
Firehose of data                 ❌                ✅
Complex queries & dynamic data   ❌                ✅
Fast (real-time) queries         ❌                ✴
SQL Mainstream Must be Enabled to Achieve IoT Growth

–
45:1 ratio of SQL to NoSQL developers (Source: LinkedIn)
By 2020, 50% of new systems will be IoT-related
Reinventing SQL
for Machine Data
The Newest Generation of SQL
–
SQL NOSQL
Crate components (Crate, Elasticsearch, other open source)
The CrateDB Open Source Stack
–
1 file to download & install

Benefits of NoSQL with
SQL ease of use
CrateDB - the key inventions

–
• Distributed SQL with search, time series, geospatial, aggregations
• Cloud-native architecture: easy scaling via containers
• NoSQL storage & clustering for horizontal scaling & dynamic schema
• Columnar caches for real-time, in-memory SQL query performance
• Shared-nothing architecture
If you know SQL, you know CrateDB
–
• Simple install: Zero-configuration, auto-join
• Compatible: ANSI SQL via Postgres wire protocol, JDBC, REST
• Real-time performance: Distributed SQL query engine
• Dynamic schema: All data (structured + JSON), time series, geospatial
• Distributed SQL query versatility: Aggregations, time series, search, geospatial…
• Simpler scalability: Shared nothing, horizontal scale-out
• Always on: High availability, replication, self-healing
• Flexible: No lock-in, runs on any cloud and on-premises
New DBMS Required for “Things Data” Era?
–
                        CrateDB   Traditional SQL   NoSQL
Firehose of data        ✅        ✴                 ✅
Complex, dynamic data   ✅        ❌                ✅
Real-time queries       ✅        ❌                ✴
SQL                     ✅        ✅                ❌
Performance?
–
• CrateDB linear scalability

- Performance rises linearly with cluster
size

• CrateDB vs. PostgreSQL

- Complex queries run 29x faster in
CrateDB on 30% lower hardware cost

• CrateDB vs. InfluxDB (time series)

- 7x more query throughput under concurrent user load; better for multi-user time series apps (SaaS)
CrateDB Open Machine Data Stack - build your own with SQL
–
[Diagram: stack layers Input, DB, Apps (custom SQL apps)]
‣ Integrates easily
‣ Low learning curve
‣ Greatest flexibility
‣ No lock-in
Built for the Open Machine Data Stack
—
A database rarely exists independently. Instead, it is usually part of an ecosystem of tools and
other products, with each covering a different need in a data pipeline.
1. Trackers 2. Collectors 3. Enrich 4. Storage 5. Data Modeling 6. Analytics
If You’re Doing Distributed…
–
• Devices: Servers, sensors, actuators, machines, wearables, cars, etc.
• Edge: Gateway & DB
• Public/hybrid/private cloud: Applications & platforms
• Shared-nothing architecture
CrateDB enables use-cases at the “edge” and in the cloud, with SQL, horizontal scaling, high availability, and multi-model data
structures. With CrateDB, customers can extract value from realtime data, enabling applications & services not possible before.
MQTT Broker & Ingestion Framework
–
• Message queues were invented to compensate for
DBMS weaknesses

- Downtime

- Slow ingestion

• New databases like CrateDB don’t have those
pitfalls

• Embedding MQTT broker in CrateDB 

- Define “Ingestion rules” in CrateDB

• MQTT topic —> Target table for storage

- Stores messages in tables

- Eliminates the need for extra middleware

• Lowers hosting costs, complexity, development time
[Diagram: Message queue approach: Devices -> MQTT messages -> MQTT broker -> MQTT consumer/writer -> DBMS (slow ingest & DB downtime)
versus
Embedded MQTT broker: Devices -> MQTT messages -> embedded MQTT broker (fast ingestion, always-on architecture)]
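
The "MQTT topic to target table" idea above can be sketched in a few lines. This is only a conceptual model: the rule format, table names, and the in-memory `tables` store are assumptions, and the slides do not show CrateDB's actual ingestion-rule syntax. The matcher follows the standard MQTT wildcard semantics (`+` for one topic level, `#` for all remaining levels).

```python
from collections import defaultdict


def topic_matches(pattern: str, topic: str) -> bool:
    """Match an MQTT topic against a filter with + and # wildcards."""
    p_parts = pattern.split("/")
    t_parts = topic.split("/")
    for i, part in enumerate(p_parts):
        if part == "#":
            return True  # multi-level wildcard: matches the rest
        if i >= len(t_parts):
            return False  # filter is longer than the topic
        if part != "+" and part != t_parts[i]:
            return False  # literal level does not match
    return len(p_parts) == len(t_parts)


# Hypothetical ingestion rules: first matching rule wins.
RULES = [
    ("factory/+/temperature", "temperature_readings"),
    ("factory/#", "raw_messages"),
]

# Stand-in for database tables.
tables = defaultdict(list)


def target_table(topic: str):
    """Return the table for a topic, or None if no rule matches."""
    for pattern, table in RULES:
        if topic_matches(pattern, topic):
            return table
    return None


def ingest(topic: str, payload: str):
    """Store a message in its target table; drop it if unrouted."""
    table = target_table(topic)
    if table is not None:
        tables[table].append({"topic": topic, "payload": payload})
    return table


print(ingest("factory/atlanta/temperature", "21.5"))
print(ingest("factory/atlanta/line1/pressure", "39.2"))
print(ingest("office/lights", "on"))
```

In this model the broker and the writer are the same process, which is the middleware saving the slide describes.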
CrateDB Output Plugin for Telegraf
–
• Telegraf is a plugin-driven server for
collecting metrics, usually connecting
to InfluxDB

• New Telegraf plug-in writes to
CrateDB via the PostgreSQL protocol

• More turnkey integration with popular
time series data sources

• Makes it easy to migrate existing time
series data workloads to CrateDB

- For more complex data & queries

- SQL access

- Larger data / time windows

- More concurrent users
[Diagram: system stats, DBs, networks, message queues, and apps -> Telegraf -> CrateDB (shared-nothing architecture) -> SQL -> applications & platforms]
Connect CrateDB to dozens of data sources
Prometheus Integration
–
• Prometheus is a standard time series store
for monitoring IT infrastructure

- Simple, standard systems monitoring data
endpoint e.g. Docker

• Prometheus Remote Adapter for CrateDB
- Developed by RobustPerception.io

- Standard way for Prometheus to pass read/write requests to other back-end databases

• Docker & other IT software can use CrateDB
for larger, more complex time series analysis
[Diagram: IT software -> systems monitoring event data -> Prometheus (local storage) -> remote read/write protocol -> CrateDB adapter -> CrateDB (unlimited storage, unlimited data & query complexity)]
SQL for Machine Data at ALPLA
Customer - ALPLA

–
•172 factories in 45 countries

•18,000 employees

•Global manufacturer

- Innovation leader

- Cost leader

•Plastic packaging products

- Bottles, caps, …

• e.g., every Coca-Cola bottle in the USA
Use Case
–
•Through real-time monitoring:

- Increase overall equipment effectiveness (OEE)

- Decrease resource utilization

- Simplify labor management

•Complexity:

- 1500 production lines

- 900 different sensor types

- 160M bottles/day to be measured
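
For readers unfamiliar with OEE, the commonly used definition is availability x performance x quality. The sketch below applies that textbook formula to made-up shift numbers; the slides do not say how ALPLA computes OEE, so the function signature and all input values are assumptions.

```python
def oee(planned_time_s, run_time_s, ideal_cycle_time_s, total_count, good_count):
    """Overall equipment effectiveness using the standard three-factor formula."""
    if planned_time_s <= 0 or run_time_s <= 0 or total_count <= 0:
        raise ValueError("times and counts must be positive")
    availability = run_time_s / planned_time_s                   # uptime share
    performance = ideal_cycle_time_s * total_count / run_time_s  # speed vs. ideal
    quality = good_count / total_count                           # good-part share
    return availability * performance * quality


# Hypothetical 8-hour shift: 7 hours running, 1 s ideal cycle per bottle,
# 22,680 bottles produced, 22,000 of them good.
shift_oee = oee(
    planned_time_s=28_800,
    run_time_s=25_200,
    ideal_cycle_time_s=1.0,
    total_count=22_680,
    good_count=22_000,
)
print(f"OEE: {shift_oee:.1%}")
```

Computing this per production line from raw sensor counts is the kind of aggregation the real-time dashboards would need to run continuously.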
Data collection
–
• Production machine data is collected at the edge (Docker, CrateDB)
• JSON messages are sent over the internet to the cloud
• Central data storage for real-time dashboards, monitoring, alerting, prediction, machine learning
Solution
–
24x7 central Mission Control for all factories
• Scale to all production lines, connect all feeds, collect all raw data

• Aggregate, monitor, predict things from huge data volumes

• Take action from data immediately through tablets, Hololens, etc.
Docker in the cloud
–
• RabbitMQ receiving data

• CrateDB as storage for raw data

• Enrichment of data

• CrateDB as storage for enriched
data

• API

• Realtime management system

• Dashboards

• API for Hololens

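
The receive, store raw, enrich, store enriched flow listed above can be modeled roughly as follows. Plain Python lists stand in for RabbitMQ and the two CrateDB stores, and the message fields and line metadata are invented for illustration; none of these names come from the slides.

```python
import json

# Stand-ins for the raw and enriched stores (hypothetical).
raw_store = []
enriched_store = []

# Hypothetical reference data used by the enrichment step.
LINE_METADATA = {
    "line-7": {"factory": "example-plant", "product": "bottle"},
}


def receive(message_json: str) -> dict:
    """Parse an incoming JSON message and keep the raw record."""
    record = json.loads(message_json)
    raw_store.append(record)
    return record


def enrich(record: dict) -> dict:
    """Attach line metadata (if known) and keep the enriched record."""
    meta = LINE_METADATA.get(record["line_id"], {})
    enriched = {**record, **meta}
    enriched_store.append(enriched)
    return enriched


raw = receive('{"line_id": "line-7", "sensor": "temperature", "value": 181.5}')
print(enrich(raw))
```

Keeping the raw and enriched records in separate stores means the enrichment logic can change later and be re-run against the untouched raw data.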
In Summary…
-
• New machine data requirements

- Firehose

- Complex

- Real time

• SQL coming [back] to the rescue

- New DBMS architecture

- Same scale, performance, dynamic data as NoSQL

- Easier learning curve & integration (more choices)

- Better economics

• Splunk & ELK stack a good choice when

- You need turnkey Security Analytics / SIEM
Thank You!
-
• CrateDB

- https://crate.io

• Slides & recording of this will be sent to you shortly, via email

• Ping me any time

- Andy Ellicott

- andy@crate.io
