Apache IoTDB: a Time Series
Database for Industrial IoT
Xiangdong Huang (1) and Julian Feinauer (2) (on behalf of the IoTDB community)
(1) Tsinghua University, Beijing, China
(2) Pragmatic Minds, Stuttgart, Germany
Berlin, Germany, 2019.10.23
Outline
• Who We Are
• Why IoTDB Was Born
• Overview of Apache IoTDB (incubating): Main Features
• Working with Current Ecosystems
• Performance Evaluation
• Use Cases
• Future Works
IoTDB
• IoTDB = IoT + DB, a database for managing (Industrial) IoT data
• IoTDB is an IoT DB (search Google for the keyword "IoTDB", not "IoT DB")
• "You can find a lot about IoTDB in Germany":
• IIoT: turbines, excavators, trucks, modern cars
• DB: Deutsche Bahn (the real meaning)
Who We Are (The community)
• We come from the Apache IoTDB (incubating) Community
• A young community: entered the Apache Incubator on 2018-11-18
• Mentors: Christofer Dutz, Justin McLean, Kevin A. McGrail (Champion), Willem Jiang
• Devoted to building the best time series database (in the IoT area) in the world
Who We Are (Individual)
• Xiangdong Huang (sainthxd@gmail.com)
• PhD, postdoc and (now) assistant researcher
at Tsinghua University, Beijing, China
• Has used Apache Cassandra (for managing time series data) since 2012
• Has developed IoTDB since 2017
• One of the initial committers of Apache IoTDB (incubating)
Who We Are (Individual)
• Julian Feinauer (j.feinauer@pragmaticminds.de)
• Founder of the startup pragmatic minds in Germany
• The first committer who was not an initial committer
• Release manager of the first IoTDB release
• Committer of Apache PLC4X, Edgent, etc.
Outline
• Who We Are
• Why IoTDB Was Born
• Overview of Apache IoTDB (incubating): Main Features
• Working with Current Ecosystems
• Performance Evaluation
• Use Cases
• Future Works
The 4th Industrial Revolution
[Figure: Industry 4.0 (Germany) and the Industrial Internet (China, USA), with the captions "Data analytics and utilization are the key", "Advanced data analytics" and "Data + Model"]
Data is becoming the most important asset of this era
Machine Data (Time Series Data): the Largest Volume of Industrial Data
[Figure: categories of industrial big data: machine data, manufacturing and enterprise data (video, models, documents, drawings), and other domain data (environment, meteorology, geography)]
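How to Manage Time Series Data
[Figure: devices save data locally and send it over the network through a message queue (MQ) into a PI System (PI Server) or an RDBMS, which handles insertions and queries]
How to Manage Time Series Data
[Figure: devices save data locally and send it over the network through an MQ into a database; insertion, query and analysis are all served over the network]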
The Problems (insertion)
[Figure: data flows over the network through an MQ into a database]
• Data is produced 24/7 with high frequency and large volume: millions, tens of millions, or even billions of data points per second?
• Big data example: 50 Hz, 500 points/machine, 20K wind turbines, up to 500 million points/sec in total
• More features:
  • Data sometimes arrives out of order
  • Sparse tables (different machines have different sensors)
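The 500 million points/sec figure is simply the product of the three numbers above:

```latex
50\,\mathrm{Hz} \times 500\,\tfrac{\text{points}}{\text{machine}} \times 20\,000\ \text{machines} = 5 \times 10^{8}\ \tfrac{\text{points}}{\text{s}}
```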
The Problems (query and analysis)
[Figure: data flows over the network through an MQ into a database, which must also serve queries and analysis]
• Features of data queries
  • The time dimension is always accessed
  • Aggregation is a first-class citizen
    ■ Sometimes we do not need the raw data; knowing the count/min/max/avg value is enough.
    ■ For visualization, screen resolution is limited (e.g., 1024×768), so there is no point in fetching more than 1024 points; aggregation is used for downsampling.
  • Time-series-specific queries and analysis
    ● Get a mass of data QUICKLY (ETL)
    ● then convert it into an analysis-friendly file format
    ● which is time-consuming
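The downsampling argument can be made concrete. The Java sketch below is illustrative only (the class `Downsample` and method `minMaxPerBucket` are hypothetical, not IoTDB code): it splits a time range into one bucket per pixel column and keeps each bucket's minimum and maximum, so a 1024-pixel-wide chart needs at most 2048 values no matter how many raw points fall in the range.

```java
import java.util.Arrays;

/** Hypothetical helper: min/max downsampling of a time series to a fixed number of pixel columns. */
final class Downsample {

    /**
     * Splits [start, end) into `pixels` equal time buckets and returns {min, max} per bucket.
     * Empty buckets are reported as {NaN, NaN}; points outside [start, end) are ignored.
     */
    static double[][] minMaxPerBucket(long[] times, double[] values, long start, long end, int pixels) {
        if (times.length != values.length || pixels <= 0 || end <= start) {
            throw new IllegalArgumentException("bad arguments");
        }
        // Bucket width rounded up, so that `pixels` buckets cover the whole range.
        long width = (end - start + pixels - 1) / pixels;
        double[][] result = new double[pixels][2];
        for (double[] bucket : result) {
            Arrays.fill(bucket, Double.NaN);
        }
        for (int i = 0; i < times.length; i++) {
            if (times[i] < start || times[i] >= end) {
                continue;
            }
            double[] bucket = result[(int) ((times[i] - start) / width)];
            // NaN marks an empty bucket: the first point initializes both min and max.
            bucket[0] = Double.isNaN(bucket[0]) ? values[i] : Math.min(bucket[0], values[i]);
            bucket[1] = Double.isNaN(bucket[1]) ? values[i] : Math.max(bucket[1], values[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        long[] t = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
        double[] v = {5, 3, 8, 1, 4, 9, 2, 7, 6, 0};
        // Two "pixels": t in [0, 5) and t in [5, 10).
        System.out.println(Arrays.deepToString(minMaxPerBucket(t, v, 0, 10, 2))); // [[1.0, 8.0], [0.0, 9.0]]
    }
}
```

Keeping min and max rather than only the average preserves spikes that would otherwise be smoothed away; count or avg, as mentioned above, would be computed per bucket in the same way.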
What We Want
• Challenges
  • Large volume
  • High throughput
  • Low cost (for historical data)
  • Low latency for queries
  • Fast aggregation
  • Query-analysis hybrid workloads
Different Solutions for Managing Time Series
• RDB
  • Limited number of columns (e.g., 1600 columns in a table)
  • Limited number of rows (≤10M rows per table is better)
  • Manual sharding
• KVDB
  • Supports big data
  • Limited queries: lacks time filtering, value filtering, and alignment of multiple time series
• TSDB
  • Based on PG: auto sharding and query optimization, but performance degrades sharply after writing data for a long time
  • HBase/Cassandra based: partitioned by TS-UID and time range, but storage is inefficient and queries are limited
  • LSM based: efficient file structure and more query functions, but not optimized for some application scenarios
Outline
• Who We Are
• Why IoTDB Was Born
• Overview of Apache IoTDB (incubating): Main Features
• Working with Current Ecosystems
• Performance Evaluation
• Use Cases
• Future Works
Time Series DB for the Industrial Internet, now called Apache IoTDB (incubating)
Each node can manage:
★ Tens of millions of time series
★ Trillions of data points
★ Tens of TB of data
Supports Hadoop, Spark, Matlab, Grafana, etc.
"清华数为" (Tsinghua Shuwei) Industrial Internet time series database
Apache IoTDB Features
• Persist data efficiently
  • Ingestion of millions of points per second per node
  • Tens of millions of time series
• Query data with low latency
  • Efficient filtering: millions of points per second
  • Aggregation: tens of ms latency on billions of points
• Dedicated time series operations
  • Segmentation, representation, subsequence matching, time-frequency transform, visualization
• Integration with the existing ecosystem
  • Kafka, MatLab, Spark, MapReduce, Grafana
• Connecting the edge to the cloud
• Powerful query engine
• User-friendly analytics
Covers the life cycle of data: collection → storage → processing/learning → application
Architecture
[Figure: Apache IoTDB architecture. Components: TsFile (time-series-optimized file format); TsFile-CLI and IoTDB-CLI (interactive client command lines); IoTDB-JDBC; Grafana-Adaptor (web dashboard to visualize time series data); I/E Tool (batch load and export of data); IoTDBSync; IoTDB cluster; integration with Hadoop/Spark big data frameworks; UDF, outlier detection and machine learning. Surrounding systems: devices, message queues, other databases, applications and DevOps.]
Concepts in IoTDB (The Schema)
Device (i.e., Data source)
• A machine instance
Measurement (e.g., sensor)
• A device can have many measurements
Time Series
• Device + Measurement
• is represented as a path that begins with root, like
“root.Cadillac_XT5.USA.CA.7BTC409.fuelRemain”
Storage Group (SG)
• A storage group can have many devices
• Storage groups have independent resources
(threads and files) to increase parallelism and
reduce lock contention.
The schema mapping (example: Cadillac XT5)
root.Cadillac_XT5.USA.CA.7BTC409.fuelRemain
root.Cadillac_XT5.USA.CA.7BTC409.speed
root.Cadillac_XT5.USA.NV.6BAC321.speed
Table Name: Cadillac_XT5 (called a "measurement" in InfluxDB)
country  state  device name  timestamp  fuelRemain  speed
USA      CA     7BTC409      t1         5.0         120
USA      CA     7BTC409      t2         4.9         109
USA      NV     6BAC321      t1         NULL        50
USA      NV     6BAC321      t3         NULL        65
(country, state and device name correspond to tags, and fuelRemain and speed to fields, in InfluxDB, KairosDB, OpenTSDB, …)
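The mapping above can be sketched in a few lines of Java. This is a hypothetical illustration, not IoTDB's schema code: it assumes the last path node is the measurement and everything before it is the device (as in the paths above), and pivots (path, time, value) points into one row per device and timestamp. A measurement that a device did not report at that time simply has no entry, which is where the NULL cells in the table come from.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: maps series paths to a device-by-time table with missing cells. */
final class SchemaMapping {

    /** Device part of a series path: everything before the last '.'. */
    static String device(String seriesPath) {
        return seriesPath.substring(0, seriesPath.lastIndexOf('.'));
    }

    /** Measurement part of a series path: the last node. */
    static String measurement(String seriesPath) {
        return seriesPath.substring(seriesPath.lastIndexOf('.') + 1);
    }

    /**
     * Pivots points (path, time, value) into device -> time -> (measurement -> value).
     * A measurement not reported by a device at some time is absent (NULL in the table).
     */
    static Map<String, Map<Long, Map<String, Double>>> pivot(String[] paths, long[] times, double[] values) {
        Map<String, Map<Long, Map<String, Double>>> table = new TreeMap<>();
        for (int i = 0; i < paths.length; i++) {
            table.computeIfAbsent(device(paths[i]), d -> new TreeMap<>())
                 .computeIfAbsent(times[i], t -> new TreeMap<>())
                 .put(measurement(paths[i]), values[i]);
        }
        return table;
    }

    public static void main(String[] args) {
        String car1 = "root.Cadillac_XT5.USA.CA.7BTC409";
        String car2 = "root.Cadillac_XT5.USA.NV.6BAC321";
        String[] paths = {
            car1 + ".fuelRemain", car1 + ".speed", car1 + ".fuelRemain", car1 + ".speed",
            car2 + ".speed", car2 + ".speed",
        };
        long[] times = {1, 1, 2, 2, 1, 3}; // t1, t1, t2, t2, t1, t3
        double[] values = {5.0, 120, 4.9, 109, 50, 65};
        Map<String, Map<Long, Map<String, Double>>> table = pivot(paths, times, values);
        // 6BAC321 never reported fuelRemain, so this cell is null (NULL in the table).
        System.out.println(table.get(car2).get(1L).get("fuelRemain"));
    }
}
```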
SQL in IoTDB
Set Storage Group
SET STORAGE GROUP TO root.laptop;
Create Timeseries
CREATE TIMESERIES root.laptop.d1.s1 WITH DATATYPE=INT32, ENCODING=RLE;
Insert Data
INSERT INTO root.laptop.d1(timestamp,s1,s2) VALUES (14735235234,1000,2000);
Delete Data
DELETE FROM root.laptop.d1.s1 WHERE time < 1000;
Update Data
UPDATE d1.s1 SET VALUE = 2000 WHERE time < 2000 and time > 1000;
Query Data (Filter, Aggregation, Group by time interval)
SELECT d1.s1,d2.* FROM BJ.WF1 WHERE d1.s1 < 2000 and d2.s2 > 1000 and freq(d2.s3) > 0.5;
SELECT count(status), max_value(temperature) from root.ln.wf01.wt01;
SELECT count(status) from root.ln.wf01.wt01 group by(1h, [2017-11-03T00:00:00, 2017-11-03T23:00:00]);
Supported data types
• Boolean
• Int
• Long
• Float
• Double
• String
• GPS (TODO) -> for trajectory data management
• Array (TODO) -> for unstructured data management
TsFile: Zip File Born for Time Series Data
• Columnar store
- Reduces disk I/O
- Improves compression
• Compression & encoding
- Improves compression greatly
- 15% better than InfluxDB in real applications
• Time-domain statistics info natively
- Supports fast queries in
- the time domain
- the value domain
- the frequency domain (TODO)
Detailed specification:
http://iotdb.apache.org/#/Documents/0.8.0/chap7/sec3
https://cwiki.apache.org/confluence/display/IOTDB/TsFile+Format
TsFile: comparison with Parquet
You say, "tomato"...
Parquet        TsFile        Target in TsFile
Row Group      Chunk Group   The data that belongs to a device instance
Column Chunk   Chunk         The data that belongs to one of the device's measurements
Page           Page          A part of the data that belongs to a Chunk
The differences
❏ Each Page actually has two columns
❏ the time column and the value column
❏ No repetition/definition level fields needed
❏ More summary info for a Page/Chunk
❏ min/max timestamp
❏ min/max value
❏ count
❏ FileMetadata
[Figure: differences in TsFile. Each page has a page header holding statistics and page data holding timestamps and values; FileMetadata holds two levels of device info.]
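TsFile: comparison with Parquet
[Figure: Apache Parquet, a general file format, next to TsFile, designed for time series data. TsFile is organized into Chunk Groups containing Chunks, followed by File Metadata; a time series is stored as (Time1, Value1), (Time2, Value2), …]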
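The per-page and per-chunk statistics listed above (min/max timestamp, min/max value, count) are what make fast aggregation possible. A minimal Java sketch of the idea, using hypothetical `Page`, `CountMax` and `PageAggregation` classes rather than TsFile's real reader: pages entirely inside the queried time range are answered from their statistics alone, pages outside it are skipped, and only pages straddling a boundary are scanned point by point (the points are kept in arrays here to stand in for decoding).

```java
/** Hypothetical page: header statistics plus its points. Assumes sorted, non-empty times. */
final class Page {
    final long[] times;
    final double[] values;
    final long minTime;
    final long maxTime;
    final long count;
    final double maxValue;

    Page(long[] times, double[] values) {
        this.times = times;
        this.values = values;
        this.minTime = times[0];
        this.maxTime = times[times.length - 1];
        this.count = times.length;
        double m = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            m = Math.max(m, v);
        }
        this.maxValue = m;
    }
}

/** count and max over a time range, plus how many pages had to be scanned point by point. */
final class CountMax {
    long count;
    double max = Double.NEGATIVE_INFINITY;
    int pagesScanned;
}

final class PageAggregation {

    /** count and max_value over points with start <= time < end. */
    static CountMax countAndMax(Page[] pages, long start, long end) {
        CountMax r = new CountMax();
        for (Page p : pages) {
            if (p.maxTime < start || p.minTime >= end) {
                continue;                            // disjoint from the range: skip the page
            }
            if (p.minTime >= start && p.maxTime < end) {
                r.count += p.count;                  // fully covered: use statistics only
                r.max = Math.max(r.max, p.maxValue);
                continue;
            }
            r.pagesScanned++;                        // straddles a boundary: scan its points
            for (int i = 0; i < p.times.length; i++) {
                if (p.times[i] >= start && p.times[i] < end) {
                    r.count++;
                    r.max = Math.max(r.max, p.values[i]);
                }
            }
        }
        return r;
    }

    public static void main(String[] args) {
        Page[] pages = {
            new Page(new long[] {0, 1, 2}, new double[] {1, 5, 2}),
            new Page(new long[] {3, 4, 5}, new double[] {7, 3, 4}),
            new Page(new long[] {6, 7, 8}, new double[] {9, 0, 1}),
        };
        CountMax r = countAndMax(pages, 0, 7);
        System.out.println(r.count + " " + r.max + " " + r.pagesScanned); // 7 9.0 1
    }
}
```

Only the last page needed scanning in this example; the more pages a query range covers completely, the less data has to be decoded.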
TsFile: Encoding and Compression
TS2Diff encoding – Int or Long (timestamps)
• 128, 136, 144, 152, 160, …
  1st differences: 8, 8, 8, 8 → constant
  2nd differences: 0, 0, 0 → at most 1 bit of storage per value is needed!
• 128, 135, 143, 154, 163, …
  1st differences: 7, 8, 11, 9 → not constant
  2nd differences: 1, 3, -2 → still only a few bits per value (3 bits after subtracting the minimum)
• Unified support for fixed-frequency and irregular-frequency time series
Adaptive Delta encoding – Int or Long (TODO)
• An adaptive enhancement of TS2Diff (see next page)
RLE encoding – repeated Int or Long
• For repeated sequences: store a value and its count
Bit-Packing encoding – Int or Long
• Store data in compact form
• Squeeze out wasteful bits
Gorilla encoding – Float or Double
• XOR consecutive data points
• Store the result with a variable-length encoding scheme
Compression algorithms
• Snappy, Gzip (TODO), LZO (TODO)
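The worked TS2Diff example and the RLE and Gorilla ideas above can be checked with a few lines of code. This Java sketch is illustrative only, not TsFile's encoders, and its names are hypothetical. It computes repeated differences, the bit width needed after subtracting the minimum (one common way to bit-pack deltas; with this scheme a constant sequence needs 0 bits per value plus the stored minimum), a plain run-length encoding into (value, count) pairs, and the number of "meaningful" bits left after XOR-ing two consecutive doubles.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative sketches of TS2Diff-style deltas, RLE and Gorilla-style XOR (not TsFile code). */
final class EncodingSketch {

    /** Applies `order` rounds of consecutive differencing (1 = first differences, 2 = second). */
    static long[] differences(long[] values, int order) {
        long[] cur = values.clone();
        for (int k = 0; k < order; k++) {
            long[] next = new long[Math.max(cur.length - 1, 0)];
            for (int i = 1; i < cur.length; i++) {
                next[i - 1] = cur[i] - cur[i - 1];
            }
            cur = next;
        }
        return cur;
    }

    /** Bits per value to bit-pack `values` after subtracting their minimum (assumes no overflow). */
    static int bitWidthAfterMinOffset(long[] values) {
        if (values.length == 0) {
            return 0;
        }
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (long v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        // Every offset v - min lies in [0, max - min].
        return 64 - Long.numberOfLeadingZeros(max - min);
    }

    /** Run-length encoding: consecutive equal values become one {value, count} pair. */
    static List<long[]> runLengthEncode(long[] values) {
        List<long[]> runs = new ArrayList<>();
        for (long v : values) {
            long[] last = runs.isEmpty() ? null : runs.get(runs.size() - 1);
            if (last != null && last[0] == v) {
                last[1]++;
            } else {
                runs.add(new long[] {v, 1});
            }
        }
        return runs;
    }

    /** Gorilla idea: bits between the leading and trailing zeros of the XOR (0 if the value repeats). */
    static int xorMeaningfulBits(double previous, double current) {
        long x = Double.doubleToLongBits(previous) ^ Double.doubleToLongBits(current);
        if (x == 0) {
            return 0;
        }
        return 64 - Long.numberOfLeadingZeros(x) - Long.numberOfTrailingZeros(x);
    }

    public static void main(String[] args) {
        long[] d2 = differences(new long[] {128, 135, 143, 154, 163}, 2);
        System.out.println(Arrays.toString(d2));         // [1, 3, -2]
        System.out.println(bitWidthAfterMinOffset(d2));  // 3
        for (long[] run : runLengthEncode(new long[] {7, 7, 7, 2, 2})) {
            System.out.println(run[0] + " x " + run[1]); // 7 x 3, then 2 x 2
        }
        System.out.println(xorMeaningfulBits(1.0, 2.0)); // 11
    }
}
```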
TsFile: Encoding and Compression
Adaptive TS2Diff encoding – Int or Long (TODO)
• For time series with outliers or missing points
• Stores second-order delta values and a boolean flag array
Data Query
Only the root nodes are kept in memory and virtual trees are built,
reducing memory cost and disk I/O
Fast Aggregation Method for Time Series
IoTDB-SQL: Concise TS Operations Language
[Figure: the structure of IoTDB-SQL. DDL, plus DML covering C, R, U and D. Reads (select) return raw or aggregated data, filtered by device (single or across devices), by metric (single or across metrics) and by time (a certain range), with sub-clauses such as group by time interval, series, order by (ASC/DESC), fill (interpolation, latest), limit, slimit and index.]
✔ 8 types of sub-clause
✔ ≥1052 kinds of query
JDBC: reduces the cost of learning
Interfaces: JDBC, TsFile API, CLI, etc.
Time Series Specific Operations (TODO)
Pattern Matching for Streaming Time Series Data
✔ Split the pattern and data stream into
equal length fragments
✔ Extract features to reduce the dimension
✔ Accelerate the search by using features
✔ Scenario: fault alarms in real time
SELECT wind_3s FROM china.farm1.tb2
WHERE time > t1 AND time < t2
AND wind_3s LIKE PATTERN(7.2,..,20.3,..,6.0)
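The slide does not say which features are extracted. As one well-known possibility (an assumption, not necessarily what IoTDB will implement), the Java sketch below uses piecewise aggregate approximation (PAA): it splits a window into equal-length fragments and keeps each fragment's mean as its feature. Because sqrt(fragment length) times the Euclidean distance between PAA vectors never exceeds the true Euclidean distance, windows whose bound already exceeds the threshold are pruned without a full comparison. `PatternMatch` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: subsequence matching with PAA features and a lower-bound filter. */
final class PatternMatch {

    /** PAA feature: the mean of each of `segments` equal-length fragments of x[from, from + length). */
    static double[] paa(double[] x, int from, int length, int segments) {
        int w = length / segments;
        double[] means = new double[segments];
        for (int s = 0; s < segments; s++) {
            double sum = 0;
            for (int i = 0; i < w; i++) {
                sum += x[from + s * w + i];
            }
            means[s] = sum / w;
        }
        return means;
    }

    /** Euclidean distance between a[aFrom, aFrom + length) and b[bFrom, bFrom + length). */
    static double distance(double[] a, int aFrom, double[] b, int bFrom, int length) {
        double sum = 0;
        for (int i = 0; i < length; i++) {
            double d = a[aFrom + i] - b[bFrom + i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Start offsets of every window of `stream` within `threshold` (Euclidean) of `pattern`. */
    static List<Integer> findMatches(double[] stream, double[] pattern, int segments, double threshold) {
        int n = pattern.length;
        if (segments <= 0 || n % segments != 0) {
            throw new IllegalArgumentException("pattern length must be a multiple of segments");
        }
        int w = n / segments;
        double[] target = paa(pattern, 0, n, segments);
        List<Integer> matches = new ArrayList<>();
        for (int start = 0; start + n <= stream.length; start++) {
            double[] features = paa(stream, start, n, segments);
            // Lower bound on the true distance: if it is already too large, skip the full comparison.
            double lowerBound = Math.sqrt(w) * distance(features, 0, target, 0, segments);
            if (lowerBound > threshold) {
                continue;
            }
            if (distance(stream, start, pattern, 0, n) <= threshold) {
                matches.add(start);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        double[] stream = {0, 0, 1, 2, 3, 4, 0, 0};
        double[] pattern = {1, 2, 3, 4};
        System.out.println(findMatches(stream, pattern, 2, 0.01)); // [2]
    }
}
```

This brute-force loop only shows the pruning; a real system would index the features so that most windows are never visited at all.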
Similarity Search of Sub-series
✔ Indexing data using Key-Value form
✔ Scenarios:
✔ Outlier detection
✔ Historical data analysis
✔ …
From Edge to Cloud: Run IoTDB Everywhere
TsFile (a component of IoTDB): a zip file of time series
• Time series data files with high-speed writes, a high compression ratio, and support for simple queries. Simply put, TsFile is a zip file for time series data.
• Suitable for embedded devices, general servers, data centers, etc.
IoTDB: a database of time series
• Freely operates on time series stored in multiple TsFiles, including CRUD and advanced queries such as max, min, avg and temporal alignment.
• Scenarios: embedded equipment, on-site industrial computers, general servers, etc.
3rd-party systems: a data warehouse of time series
• Easy to use and integrate for complex analysis (data fusion, collaborative recommendation, machine learning)
• Scenario: cloud data centers
Outline
• Who We Are
• Why IoTDB Was Born
• Overview of Apache IoTDB (incubating): Main Features
• Working with Current Ecosystems
• Performance Evaluation
• Use Cases
• Future Works
A Process to Manage Time Series Data
[Figure: data sources write into IoTDB via JDBC or the Session API. Data is then read for analysis with a big data framework (big data sets) via the Spark-TsFile-Adaptor, for analysis with Matlab (small data sets) via JDBC, and for visualization (manual data exploration) via the Grafana-Adaptor.]
Using JDBC to write data
set storage group
create timeseries
insert data
https://iotdb.apache.org/#/Documents/0.8.0/chap6/sec1
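A hedged Java sketch of the three steps above. The driver class `org.apache.iotdb.jdbc.IoTDBDriver`, the URL `jdbc:iotdb://127.0.0.1:6667/` and the `root`/`root` credentials are assumptions based on our reading of the IoTDB 0.8 JDBC documentation linked above; check them against your version. The statements follow the IoTDB SQL shown earlier, and `root.laptop`, `d1` and `s1` are illustrative names. The SQL is built by a pure helper so it can be inspected without a server; `write` needs the IoTDB JDBC jar on the classpath and a running server.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

final class JdbcWriteExample {

    /** Builds: set storage group, create one INT32/RLE time series, insert (time, value) points. */
    static List<String> statements(String storageGroup, String device, String measurement,
                                   long[] times, int[] values) {
        List<String> sql = new ArrayList<>();
        sql.add("SET STORAGE GROUP TO " + storageGroup);
        sql.add("CREATE TIMESERIES " + device + "." + measurement + " WITH DATATYPE=INT32, ENCODING=RLE");
        for (int i = 0; i < times.length; i++) {
            sql.add("INSERT INTO " + device + "(timestamp," + measurement + ") VALUES("
                    + times[i] + "," + values[i] + ")");
        }
        return sql;
    }

    /** Executes the statements over JDBC (driver class, URL and credentials are assumptions). */
    static void write(List<String> sql) throws ClassNotFoundException, SQLException {
        Class.forName("org.apache.iotdb.jdbc.IoTDBDriver");
        try (Connection connection = DriverManager.getConnection("jdbc:iotdb://127.0.0.1:6667/", "root", "root");
             Statement statement = connection.createStatement()) {
            for (String s : sql) {
                statement.execute(s);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> sql = statements("root.laptop", "root.laptop.d1", "s1", new long[] {1, 2}, new int[] {100, 200});
        sql.forEach(System.out::println);
        // write(sql); // uncomment once an IoTDB server is running
    }
}
```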
Using Session API to write Data
(more efficient)
set storage group
create timeseries
insert data
Using JDBC to Query Data
raw data query
aggregation query
downsampling query
print result
https://iotdb.apache.org/#/Documents/0.8.0/chap6/sec1
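A hedged Java sketch of the query steps above: a raw data query, an aggregation query and a downsampling (GROUP BY time interval) query, with results printed generically through the standard `ResultSetMetaData`. The driver class `org.apache.iotdb.jdbc.IoTDBDriver`, the URL `jdbc:iotdb://127.0.0.1:6667/` and the `root`/`root` credentials are assumptions based on our reading of the IoTDB 0.8 documentation; the series names and time range are illustrative and follow the SQL examples earlier in this deck. Only the query text is exercised without a server.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;

final class JdbcQueryExample {

    static String rawQuery(String device, String measurement, long beforeTime) {
        return "SELECT " + measurement + " FROM " + device + " WHERE time < " + beforeTime;
    }

    static String aggregationQuery(String device, String measurement) {
        return "SELECT count(" + measurement + "), max_value(" + measurement + ") FROM " + device;
    }

    /** Downsampling: one aggregated row per `interval` (e.g. "1h") within [from, to]. */
    static String downsamplingQuery(String device, String measurement, String interval, String from, String to) {
        return "SELECT count(" + measurement + ") FROM " + device
                + " GROUP BY(" + interval + ", [" + from + ", " + to + "])";
    }

    /** Prints every row of a result set, whatever its columns are. */
    static void print(ResultSet rs) throws SQLException {
        ResultSetMetaData meta = rs.getMetaData();
        int columns = meta.getColumnCount();
        while (rs.next()) {
            StringBuilder row = new StringBuilder();
            for (int c = 1; c <= columns; c++) {
                if (c > 1) {
                    row.append('\t');
                }
                row.append(meta.getColumnName(c)).append('=').append(rs.getString(c));
            }
            System.out.println(row);
        }
    }

    /** Runs the queries over JDBC (driver class, URL and credentials are assumptions). */
    static void run(String... queries) throws ClassNotFoundException, SQLException {
        Class.forName("org.apache.iotdb.jdbc.IoTDBDriver");
        try (Connection connection = DriverManager.getConnection("jdbc:iotdb://127.0.0.1:6667/", "root", "root");
             Statement statement = connection.createStatement()) {
            for (String query : queries) {
                if (statement.execute(query)) {
                    try (ResultSet rs = statement.getResultSet()) {
                        print(rs);
                    }
                }
            }
        }
    }

    public static void main(String[] args) {
        String device = "root.ln.wf01.wt01";
        System.out.println(rawQuery(device, "temperature", 1000));
        System.out.println(aggregationQuery(device, "temperature"));
        System.out.println(downsamplingQuery(device, "temperature", "1h", "2017-11-03T00:00:00", "2017-11-03T23:00:00"));
        // run(...) with these strings once an IoTDB server is running
    }
}
```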
Using Grafana to Visualize Data
https://iotdb.apache.org/#/Tools/Grafana
• Install the simple-json-datasource plugin
• Configure iotdb-grafana-connector
• application.properties
• Start iotdb-grafana-connector
• java -jar iotdb-grafana-0.8.0.war
• Add the IoTDB data source (SimpleJson)
• choose the connector IP
• Configure the dashboard and enjoy!
Using Matlab to Analyze Data
read IoTDB by JDBC
fast Fourier
transform
plot
Using Spark to Analyze Data
create table
sql query
read TsFile
write to TsFile
https://iotdb.apache.org/#/Tools/Spark
Demo
• Writing Data Locally
• Show data with Grafana
• Analyze data using SparkSQL
• https://github.com/jixuan1989/iotdb-tutorial
Demo Video
• Writing Data on HDFS directly
• using Hive to analyze it
• Video
Language
• Written in Java
• RPC is implemented with Apache Thrift
• So it is easy to get APIs for other languages
Say Hi to the Apache Ecosystem
IoTDB repository:
RocketMQ: https://github.com/apache/incubator-iotdb/tree/master/example/rocketmq
Kafka: https://github.com/apache/incubator-iotdb/tree/master/example/kafka
Third party:
EMQx (MQTT server):
https://github.com/jixuan1989/iotdb-tutorial
Spark: https://github.com/jixuan1989/iotdb-tutorial
Calcite: https://github.com/EJTTianYu/iotdb-calcite
PLC4X:
Mapreduce:
Outline
• Who We Are
• Why IoTDB Was Born
• Overview of Apache IoTDB (incubating): Main Features
• Working with Current Ecosystems
• Performance Evaluation
• Use Cases
• Future Works
Application 1: The Next Generation of Big Data Platform for Meteorology
• 1073 kinds of meteorological data
• ~150K stations collect more than 100 metrics every 5 minutes
• The platform is deployed across China
• After the upgrade, performance improved by two orders of magnitude
Application 2:
Data Management for Equipment Monitoring
• The data records the operational status of the equipment, e.g., a vehicle's speed, fuel consumption and malfunctions.
[Figure: a collect → transfer → decision → execute loop on a Komatsu excavator]
TIANYUAN (with Komatsu): #devices (excavators, etc.), #metrics, collection times per minute
• Before: sharding every day; only 3 months of data stored; more than 10 minutes for some queries
• With IoTDB: the whole data set stored; several seconds for complex queries
Application 3:
Shanghai METRO Monitoring
• Before: 144 trains, 3200 points/500 ms/train, stored in 9 KairosDB + Cassandra
• After the upgrade: 300 trains, 3200 points/200 ms/train, 414 billion data points per day, just using ONE IoTDB instance
• 14 KairosDB-compatible (KDB-compatible) RESTful services, just to avoid modifying the current programs
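The daily volume follows from the per-train rates; a 500 ms sampling period means 2 samples per second and a 200 ms period means 5:

```latex
\begin{aligned}
\text{before: } 144 \times 3200 \times 2 &= 921\,600\ \text{points/s}\\
\text{after: } 300 \times 3200 \times 5 &= 4\,800\,000\ \text{points/s}\\
4\,800\,000\ \text{points/s} \times 86\,400\ \text{s/day} &= 414\,720\,000\,000 \approx 414\ \text{billion points/day}
\end{aligned}
```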
Application 4:
Outline
• Who We Are
• Why IoTDB Was Born
• Overview of Apache IoTDB (incubating): Main Features
• Working with Current Ecosystems
• Performance Evaluation
• Use Cases
• Future Works
Future Works
• Make it easy to use!
• Relational Model: Integration with Calcite
• step 1: supports relational SQL
• step 2: standard JDBC
• Big Data!
• better integration with Hive, etc.
• Cluster!
• now supports writing data on HDFS, but a shared-nothing architecture is wanted.
• Advanced functions!
• integration with data streaming engines, etc.
Join Us
• Mailing list:
• subscribe: dev-subscribe@iotdb.incubator.apache.org
• discussion: dev@iotdb.apache.org
• bug report:
https://issues.apache.org/jira/projects/IOTDB/issues/IOTDB
• Website: https://iotdb.apache.org
• Ecosystem target:
IoTDB v0.8.0 is released! (the first Apache release version)

Practice of building apache sharding sphere iincubator communityPractice of building apache sharding sphere iincubator community
Practice of building apache sharding sphere iincubator community
 
Willem Ning Jiang: Getting Started: How to join an Open Source project Apache...
Willem Ning Jiang: Getting Started: How to join an Open Source project Apache...Willem Ning Jiang: Getting Started: How to join an Open Source project Apache...
Willem Ning Jiang: Getting Started: How to join an Open Source project Apache...
 
Craig The apache Way
Craig The apache Way Craig The apache Way
Craig The apache Way
 

Recently uploaded

Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 
XpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software SolutionsXpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software SolutionsMehedi Hasan Shohan
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWave PLM
 
Engage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyEngage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyFrank van der Linden
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityNeo4j
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxTier1 app
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfjoe51371421
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - InfographicHr365.us smith
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
cybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningcybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningVitsRangannavar
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfkalichargn70th171
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...soniya singh
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝soniya singh
 

Recently uploaded (20)

Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 
XpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software SolutionsXpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software Solutions
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 
Engage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyEngage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The Ugly
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdf
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - Infographic
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
cybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningcybersecurity notes for mca students for learning
cybersecurity notes for mca students for learning
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
 

Apache IoTDB: a Time Series Database for Industrial IoT

  • 1. Apache IoTDB: a Time Series Database for Industrial IoT
    Xiangdong Huang¹ and Julian Feinauer² (on behalf of the IoTDB community)
    ¹ Tsinghua University, Beijing, China
    ² Pragmatic Minds, Stuttgart, Germany
    Berlin, Germany, 2019.10.23
  • 2. Outline
    • Who We Are
    • Why IoTDB Was Born
    • Overview of Apache IoTDB (incubating): Main Features
    • Working with Current Ecosystems
    • Performance Evaluation
    • Use Cases
    • Future Works
  • 3-6. IoTDB
    • IoTDB = IoT + DB, the database for managing (Industrial) IoT data
    • IoTDB is an IoT DB (search Google for “IoTDB” as one word, not “IoT DB”)
    • “You can find many things about IoTDB in Germany”
    • IIoT: turbines, excavators, trucks, modern cars
    • DB: Deutsche Bahn (the real meaning)
  • 7. Who We Are (The community)
    • We come from the Apache IoTDB (incubating) community
    • A young community: it entered the Apache Incubator on 2018-11-18
    • Mentors: Christofer Dutz, Justin Mclean, Kevin A. McGrail (Champion), Willem Jiang
    • Devoted to building the best time series database (in the IoT area) in the world
  • 8. Who We Are (Individual)
    • Xiangdong Huang (sainthxd@gmail.com)
    • PhD, postdoc, and (now) Assistant Researcher at Tsinghua University, Beijing, China
    • Has used Apache Cassandra to manage time series data since 2012
    • Has developed IoTDB since 2017
    • One of the initial committers of Apache IoTDB (incubating)
  • 9. Who We Are (Individual)
    • Julian Feinauer (j.feinauer@pragmaticminds.de)
    • Founder of the startup pragmatic minds in Germany
    • The first committer who was not an initial committer
    • Release Manager of the first release of IoTDB
    • Committer of Apache PLC4X, Edgent, etc.
  • 11. The 4th Industrial Revolution
    • Initiatives in Germany, China, and the USA: Industry 4.0, Industrial Internet
    • Data analytics and utilization are the key: advanced data analytics, Data + Model
    • Data is becoming the most important aspect of this era
  • 12. Machine Data (Time Series Data): the Largest Volume in Industrial Data
    • Industrial big data = machine data + manufacturing enterprise data + other domain data
    • Manufacturing enterprise data: video, models, documents, drawings
    • Other domain data: environment, meteorology, geography
  • 13. How to Manage Time Series Data
    [Diagram: data travels over the network into an MQ, then is inserted into and queried from a PI System (PI Server) or an RDBMS; data is also saved locally]
  • 14. How to Manage Time Series Data
    [Diagram: network → MQ → database (insertion and query), with data also saved locally; query results go over the network for analysis]
  • 15. The Problems (insertion: network → MQ → database)
    • Millions of data points per second? Tens of millions? Billions?
    • Big data: 50 Hz, 500 points/machine, 20K wind turbines, in total up to 500 million points/sec
    • Data is produced 24/7 with high frequency and large volume
    • More features:
      • Sometimes out of order
      • Sparse tables (different machines have different sensors)
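The 500 million points/sec figure follows directly from the slide's numbers, assuming each of the 500 points per machine is sampled at 50 Hz:

```latex
\[
50\ \text{Hz} \times 500\ \frac{\text{points}}{\text{machine}} \times 20\,000\ \text{machines}
= 2.5\times10^{4}\ \frac{\text{points/s}}{\text{machine}} \times 2\times10^{4}\ \text{machines}
= 5\times10^{8}\ \text{points/s}
\]
```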
  • 17. The Problems (query and analysis: database → network → analysis)
    • Features of data queries:
      • The time dimension is always accessed
      • Aggregation is a first-class citizen
        ■ Sometimes raw data is not needed; knowing the count/min/max/avg is enough
        ■ For visualization, screen resolution is limited (e.g., 1024×768), so fetching more than 1024 points is pointless; use aggregation to downsample
      • Time-series-specific queries and analysis
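The downsampling idea can be sketched in Java (the language IoTDB is written in). This is a minimal illustration, not IoTDB's query engine: `Downsampler` and `downsample` are hypothetical names, and the sketch assumes one bucket per screen pixel column, so a 1024-pixel-wide chart needs at most 1024 buckets however many raw points exist.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class Downsampler {

    /** Aggregates of one time bucket; an empty bucket has count == 0 and NaN statistics. */
    static final class Bucket {
        final long start;
        final long count;
        final double min;
        final double max;
        final double avg;

        Bucket(long start, long count, double min, double max, double avg) {
            this.start = start;
            this.count = count;
            this.min = min;
            this.max = max;
            this.avg = avg;
        }
    }

    /**
     * Groups points with start <= t < end into `buckets` equal-width buckets and returns
     * count/min/max/avg per bucket. Assumes (end - start) * buckets fits in a long.
     */
    static List<Bucket> downsample(long[] times, double[] values, long start, long end, int buckets) {
        if (times.length != values.length || buckets <= 0 || end <= start) {
            throw new IllegalArgumentException("invalid input");
        }
        long span = end - start;
        long[] count = new long[buckets];
        double[] sum = new double[buckets];
        double[] min = new double[buckets];
        double[] max = new double[buckets];
        Arrays.fill(min, Double.POSITIVE_INFINITY);
        Arrays.fill(max, Double.NEGATIVE_INFINITY);

        for (int i = 0; i < times.length; i++) {
            long t = times[i];
            if (t < start || t >= end) continue;            // outside the queried range
            int b = (int) ((t - start) * buckets / span);   // bucket index in [0, buckets)
            count[b]++;
            sum[b] += values[i];
            min[b] = Math.min(min[b], values[i]);
            max[b] = Math.max(max[b], values[i]);
        }

        List<Bucket> result = new ArrayList<>(buckets);
        for (int b = 0; b < buckets; b++) {
            long bucketStart = start + span * b / buckets;
            if (count[b] == 0) {
                result.add(new Bucket(bucketStart, 0, Double.NaN, Double.NaN, Double.NaN));
            } else {
                result.add(new Bucket(bucketStart, count[b], min[b], max[b], sum[b] / count[b]));
            }
        }
        return result;
    }
}
```

Keeping both min and max per bucket preserves spikes that a plain average would hide, which matters when the result is drawn as a line chart.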
  • 18. The Problems (query and analysis: database → network → analysis)
    ● First, extract a mass of data QUICKLY (ETL)
    ● Then convert it into an analysis-friendly file format
    ● This is time-consuming
  • 19. What We Want
    • Challenges:
      • Large volume
      • High throughput
      • Low cost (for historical data)
      • Low query latency
      • Fast aggregation
      • Hybrid query/analysis workloads
  • 20. Different Solutions for Managing Time Series
    • RDB
      • Limited number of columns (1600 columns in a table)
      • Limited number of rows (≤ 10M rows is better)
      • Manual sharding
    • KVDB
      • Supports big data
      • Limited queries: lacks time filtering, value filtering, and alignment of multiple time series
    • TSDB
      • LSM-based: efficient file structure and more query functions, but not optimized for some application scenarios
      • PostgreSQL-based: auto sharding and query optimization, but performance degrades sharply after writing data for a long time
      • HBase/Cassandra-based: partitioned by TS-UID and time range, but storage is inefficient and queries are limited
  • 22. A Time Series DB for the Industrial Internet, now called Apache IoTDB (incubating)
    • Each node can manage:
      ★ Tens of millions of time series
      ★ Trillions of data points
      ★ Tens of TB of data
    • Supports Hadoop, Spark, Matlab, Grafana, etc.
    • “清华数为” (Tsinghua Shuwei) Industrial Internet time series database
  • 23. Apache IoTDB Features (covering the life cycle of data: collection, storage, processing, learning, application)
    • Persist data efficiently
      • Ingestion of millions of points per second per node
      • Tens of millions of time series
    • Query data with low latency
      • Efficient data filtering: millions of points per second
      • Aggregation: tens of ms latency on billions of points
    • Time-series-specific operations
      • Segmentation, representation, subsequence matching, time-frequency transform, visualization
    • Integration with the existing ecosystem
      • Kafka, MatLab, Spark, MapReduce, Grafana
      • Connecting the edge to the cloud
      • Powerful query engine
      • User-friendly analytics
  • 24. Architecture
    [Diagram: the IoTDB server with IoTDBSync and its tools. TsFile: time-series-optimized file format; TsFile-CLI and IoTDB-CLI: interactive command-line clients; IoTDB-JDBC; Grafana-Adaptor: web dashboard to visualize time series data; I/E Tool: batch load and export of data; a Hadoop/Spark big data framework cluster for outlier detection, machine learning, and UDFs; surrounding systems: devices, message queues, applications, DevOps, other databases]
  • 25. Concepts in IoTDB (The Schema)
    • Device (i.e., data source): a machine instance, e.g., a Cadillac XT5
    • Measurement (e.g., a sensor): a device can have many measurements
    • Time series = device + measurement, represented as a path that begins with root, e.g., “root.Cadillac_XT5.USA.CA.7BTC409.fuelRemain”
    • Storage group (SG)
      • A storage group can have many devices
      • Storage groups have independent resources (threads and files) to increase parallelism and reduce lock contention
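A minimal sketch of how a series path splits into device and measurement, assuming (as in the slide's example) that the last node is the measurement, that node names contain no dots, and that a storage group is a path prefix of its series. `SeriesPath` and `storageGroupOf` are hypothetical helpers, not IoTDB classes, and `root.Cadillac_XT5` as a storage group is an assumption for illustration.

```java
import java.util.List;

final class SeriesPath {
    final String device;      // every node except the last, e.g. root.Cadillac_XT5.USA.CA.7BTC409
    final String measurement; // the last node, e.g. fuelRemain

    private SeriesPath(String device, String measurement) {
        this.device = device;
        this.measurement = measurement;
    }

    /** Splits a full series path at its last dot; assumes node names contain no dots. */
    static SeriesPath parse(String path) {
        if (!path.startsWith("root.")) {
            throw new IllegalArgumentException("a series path must begin with root: " + path);
        }
        int lastDot = path.lastIndexOf('.');
        String device = path.substring(0, lastDot);
        String measurement = path.substring(lastDot + 1);
        if (device.equals("root") || measurement.isEmpty()) {
            throw new IllegalArgumentException("expected root.<device>.<measurement>: " + path);
        }
        return new SeriesPath(device, measurement);
    }

    /** Returns the storage group that is a path prefix of the series, or null if none matches. */
    static String storageGroupOf(String path, List<String> storageGroups) {
        for (String sg : storageGroups) {
            if (path.startsWith(sg + ".")) {
                return sg;
            }
        }
        return null;
    }
}
```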
  • 26. The Schema Mapping
    • Time series (paths):
      root.Cadillac_XT5.USA.CA.7BTC409.fuelRemain
      root.Cadillac_XT5.USA.CA.7BTC409.speed
      root.Cadillac_XT5.USA.NV.6BAC321.speed
    • The same data as a table (table name: Cadillac_XT5, called a “measurement” in InfluxDB; the columns correspond to tags and fields in InfluxDB, KairosDB, OpenTSDB, …):
      country | state | device name | timestamp | fuelRemain | speed
      USA     | CA    | 7BTC409     | t1        | 5.0        | 120
      USA     | CA    | 7BTC409     | t2        | 4.9        | 109
      USA     | NV    | 6BAC321     | t1        | NULL       | 50
      USA     | NV    | 6BAC321     | t3        | NULL       | 65
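The NULLs in the table arise when several measurements of one device are aligned on their timestamps. A minimal sketch of that alignment, with in-memory maps standing in for stored series; the `Aligner` class is hypothetical and is not IoTDB's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class Aligner {
    /**
     * Aligns the series of one device on timestamp. Input: measurement name -> (timestamp -> value).
     * Output: timestamp -> one Double per measurement (in the input map's iteration order),
     * with null where that measurement has no point at that timestamp.
     */
    static TreeMap<Long, List<Double>> align(Map<String, ? extends Map<Long, Double>> series) {
        TreeMap<Long, List<Double>> rows = new TreeMap<>();   // sorted by timestamp
        List<String> names = new ArrayList<>(series.keySet());
        for (int col = 0; col < names.size(); col++) {
            for (Map.Entry<Long, Double> point : series.get(names.get(col)).entrySet()) {
                List<Double> row = rows.computeIfAbsent(point.getKey(), t -> nullRow(names.size()));
                row.set(col, point.getValue());
            }
        }
        return rows;
    }

    private static List<Double> nullRow(int width) {
        List<Double> row = new ArrayList<>(width);
        for (int i = 0; i < width; i++) {
            row.add(null);   // sparse by default: a cell stays null until a point fills it
        }
        return row;
    }
}
```

Passing a `LinkedHashMap` keeps the column order stable, and `TreeMap` keeps the rows sorted by time, which is the shape a client sees for a sparse device.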
  • 27. SQL in IoTDB
    • Set storage group
      SET STORAGE GROUP TO root.laptop;
    • Create timeseries
      CREATE TIMESERIES root.laptop.d1.s1 WITH DATATYPE=INT32, ENCODING=RLE;
    • Insert data
      INSERT INTO (d1.s1,d1.s2,time) VALUES (1000,2000,14735235234);
    • Delete data
      DELETE FROM d1.s1 WHERE time < 1000;
    • Update data
      UPDATE d1.s1 SET VALUE = 2000 WHERE time < 2000 and time > 1000;
    • Query data (filter, aggregation, group by time interval)
      SELECT d1.s1,d2.* FROM BJ.WF1 WHERE d1.s1 < 2000 and d2.s2 > 1000 and freq(d2.s3) > 0.5;
      SELECT count(status), max_value(temperature) FROM root.ln.wf01.wt01;
      SELECT count(status) FROM root.ln.wf01.wt01 GROUP BY (1h, [2017-11-03T00:00:00, 2017-11-03T23:00:00]);
  • 28. Supported Data Types
    • Boolean, Int, Long, Float, Double, String
    • GPS (TODO), for trajectory data management
    • Array (TODO), for unstructured data management
  • 30. TsFile: a Zip File Born for Time Series Data
    • Columnar store: reduces disk I/O and improves compression
    • Compression and encoding: greatly improve compression; 15% better than InfluxDB in real applications
    • Time-domain statistics stored natively: support fast queries in the time domain, the value domain, and the frequency domain (TODO)
    • Detailed specification:
      http://iotdb.apache.org/#/Documents/0.8.0/chap7/sec3
      https://cwiki.apache.org/confluence/display/IOTDB/TsFile+Format
  • 31. TsFile: Comparison with Parquet (“You say ‘tomato’…”)
    • Terminology, Parquet → TsFile (meaning in TsFile):
      • Row Group → Chunk Group (the data that belongs to a device instance)
      • Column Chunk → Chunk (the data that belongs to one of the device's measurements)
      • Page → Page (a part of the data that belongs to a Chunk)
    • The differences:
      ❏ Each Page actually has two columns, the time column and the value column (Page Header; Page Data = timestamps + values)
      ❏ No repeat or duplication fields are needed
      ❏ More summary info (statistics) per Page/Chunk: min/max timestamp, min/max value, count
      ❏ FileMetadata stores device info in two levels (Level 1, Level 2)
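The per-Page statistics let some aggregations skip decoding. The sketch below assumes a simplified page made of plain arrays plus its summary (min/max time, min/max value, count); `PageStatsSketch` is hypothetical and does not reflect TsFile's actual byte layout. Pages entirely inside the time range answer COUNT and MAX_VALUE from their statistics alone, and only partially overlapping pages are read.

```java
import java.util.List;

final class PageStatsSketch {

    /** A simplified page: plain arrays plus the summary a TsFile-style page header carries. */
    static final class Page {
        final long[] times;     // stands in for the encoded time column (assumed sorted, non-empty)
        final double[] values;  // stands in for the encoded value column
        final long minTime;
        final long maxTime;
        final double minValue;  // would serve MIN_VALUE symmetrically; unused below
        final double maxValue;
        final int count;

        Page(long[] times, double[] values) {
            this.times = times.clone();
            this.values = values.clone();
            this.count = times.length;
            this.minTime = times[0];
            this.maxTime = times[times.length - 1];
            double lo = Double.POSITIVE_INFINITY;
            double hi = Double.NEGATIVE_INFINITY;
            for (double v : values) {
                lo = Math.min(lo, v);
                hi = Math.max(hi, v);
            }
            this.minValue = lo;
            this.maxValue = hi;
        }
    }

    static final class Result {
        final long count;
        final double max;
        final int pagesDecoded;  // how many page bodies had to be read

        Result(long count, double max, int pagesDecoded) {
            this.count = count;
            this.max = max;
            this.pagesDecoded = pagesDecoded;
        }
    }

    /** COUNT and MAX_VALUE over lo <= t <= hi, reading page bodies only when unavoidable. */
    static Result aggregate(List<Page> pages, long lo, long hi) {
        long count = 0;
        double max = Double.NEGATIVE_INFINITY;
        int decoded = 0;
        for (Page p : pages) {
            if (p.maxTime < lo || p.minTime > hi) {
                continue;                                  // disjoint: skip the page entirely
            }
            if (p.minTime >= lo && p.maxTime <= hi) {      // fully covered: statistics suffice
                count += p.count;
                max = Math.max(max, p.maxValue);
                continue;
            }
            decoded++;                                     // partial overlap: scan the body
            for (int i = 0; i < p.times.length; i++) {
                if (p.times[i] >= lo && p.times[i] <= hi) {
                    count++;
                    max = Math.max(max, p.values[i]);
                }
            }
        }
        return new Result(count, max, decoded);
    }
}
```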
  • 32. TsFile: Comparison with Parquet
    [Diagram: Apache Parquet is a general file format; TsFile is specialized for time series data, organized as File Metadata, Chunk Groups, and Chunks storing (Time1, Value1), (Time2, Value2), … pairs]
  • 33. TsFile: Encoding and Compression
    • TS2Diff encoding (Int or Long, e.g., timestamps)
      • Unified support for fixed-frequency and irregular-frequency time series
      • 128, 136, 144, 152, 160, … → 1st differences 8, 8, 8, 8 are constant → 2nd differences 0, 0, 0 need just 1 bit of storage each
      • 128, 135, 143, 154, 163, … → 1st differences 7, 8, 11, 9 are not constant → 2nd differences 1, 3, -2 are small and need only a few bits each
    • Adaptive Delta encoding (Int or Long, TODO)
      • An adaptive enhancement of TS2Diff (see next slide)
    • RLE encoding (repeated Int or Long)
      • For repeated sequences: store a value and its count
    • Bit-Packing encoding (Int or Long)
      • Store data in compact form, squeezing out wasted bits
    • Gorilla encoding (Float or Double)
      • XOR consecutive data points
      • Store the result with a variable-length encoding scheme
    • Compression algorithms: Snappy, Gzip (TODO), LZO (TODO)
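The core transforms behind TS2Diff, RLE, and Gorilla can be sketched as below. These are simplified illustrations, not TsFile's encoders: `EncodingSketches` and its methods are hypothetical, and the bit-level packing and variable-length coding that follow each transform are omitted.

```java
import java.util.ArrayList;
import java.util.List;

final class EncodingSketches {

    /** TS2Diff idea: second-order differences (a sequence of length n yields n - 2 values). */
    static long[] secondOrderDeltas(long[] xs) {
        if (xs.length < 3) {
            return new long[0];
        }
        long[] out = new long[xs.length - 2];
        for (int i = 2; i < xs.length; i++) {
            long firstDelta = xs[i - 1] - xs[i - 2];
            long secondDelta = xs[i] - xs[i - 1];
            out[i - 2] = secondDelta - firstDelta;  // 0 when the sampling step is constant
        }
        return out;
    }

    /** RLE idea: collapse each run of equal values into a {value, runLength} pair. */
    static List<long[]> runLengthEncode(long[] xs) {
        List<long[]> runs = new ArrayList<>();
        int i = 0;
        while (i < xs.length) {
            int j = i;
            while (j < xs.length && xs[j] == xs[i]) {
                j++;
            }
            runs.add(new long[] {xs[i], j - i});
            i = j;
        }
        return runs;
    }

    /**
     * Gorilla idea (first step only): XOR each double's bit pattern with the previous one.
     * Similar neighbours give XORs with many leading/trailing zero bits, which the full
     * scheme then stores with a variable-length code (omitted here).
     */
    static long[] xorWithPrevious(double[] xs) {
        long[] out = new long[xs.length];
        long prev = 0;
        for (int i = 0; i < xs.length; i++) {
            long bits = Double.doubleToLongBits(xs[i]);
            out[i] = bits ^ prev;  // the first value is kept as-is (XOR with 0)
            prev = bits;
        }
        return out;
    }
}
```

On the slide's two examples, `secondOrderDeltas` returns 0, 0, 0 and 1, 3, -2 respectively.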
  • 34. TsFile: Encoding and Compression
    • Adaptive TS2Diff encoding (Int or Long, TODO)
      • For time series with outliers or missing points
      • Stores second-order delta values plus a boolean flag array
  • 35. Data Query
    • Only root nodes are recorded in memory and virtual trees are built, reducing memory cost and disk I/O
    • Fast aggregation method for time series
    • IoTDB-SQL, a concise time series operation language:
      • DDL and DML (C, R, U, D)
      • Read (R) options: select (raw, aggregate); filter; device (single, across); metric (single, across); time (certain range, group by time interval); series order by (ASC, DESC); fill (interpolation, latest); limit; slimit; index
      ✔ 8 types of sub-clause
      ✔ ≥ 1052 kinds of queries
    • JDBC reduces the cost of learning
    • Interfaces: JDBC, TsFile API, CLI, etc.
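The two fill modes listed (interpolation and latest value) can be sketched for a single sorted series as follows. `FillSketch` is hypothetical, assumes strictly increasing timestamps, and returns NaN where no value can be filled; IoTDB's own fill semantics (for example, any look-back window) may differ.

```java
import java.util.Arrays;

final class FillSketch {

    /** Value at t taken from the latest point at or before t; NaN if there is none. */
    static double previousFill(long[] times, double[] values, long t) {
        int idx = Arrays.binarySearch(times, t);
        if (idx >= 0) {
            return values[idx];                    // exact hit, nothing to fill
        }
        int insertion = -idx - 1;                  // first index with times[i] > t
        return insertion == 0 ? Double.NaN : values[insertion - 1];
    }

    /** Linear interpolation between the neighbours of t; NaN outside the data's time range. */
    static double linearFill(long[] times, double[] values, long t) {
        int idx = Arrays.binarySearch(times, t);
        if (idx >= 0) {
            return values[idx];
        }
        int right = -idx - 1;
        if (right == 0 || right == times.length) {
            return Double.NaN;                     // no neighbour on one side
        }
        int left = right - 1;
        double fraction = (double) (t - times[left]) / (times[right] - times[left]);
        return values[left] + fraction * (values[right] - values[left]);
    }
}
```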
  • 36. Time-Series-Specific Operations (TODO)
    • Pattern matching for streaming time series data
      ✔ Split the pattern and the data stream into equal-length fragments
      ✔ Extract features to reduce the dimensionality
      ✔ Accelerate the search using the features
      ✔ Scenario: real-time fault alarms
      SELECT wind_3s FROM china.farm1.tb2 WHERE time > t1 AND time < t2 AND wind_3s LIKE PATTERN(7.2,..,20.3,..,6.0)
    • Similarity search of subsequences
      ✔ Index data in key-value form
      ✔ Scenarios: outlier detection, historical data analysis, …
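The fragment-and-feature steps can be illustrated with Piecewise Aggregate Approximation (PAA), a standard technique swapped in here as an assumption because the slide does not name its features. Each fragment is reduced to its mean; the feature distance, scaled by the fragment length, is a lower bound of the true Euclidean distance, so windows whose bound exceeds the threshold can be skipped safely. `PaaSearch` is hypothetical, and a real system would index the features instead of recomputing them per window.

```java
import java.util.ArrayList;
import java.util.List;

final class PaaSearch {

    /** PAA features: mean of each of `segments` equal-length fragments of xs[from, from + length). */
    static double[] paa(double[] xs, int from, int length, int segments) {
        int segLength = length / segments;         // caller ensures length % segments == 0
        double[] features = new double[segments];
        for (int s = 0; s < segments; s++) {
            double sum = 0;
            for (int k = 0; k < segLength; k++) {
                sum += xs[from + s * segLength + k];
            }
            features[s] = sum / segLength;
        }
        return features;
    }

    /** Lower bound of the Euclidean distance, computed from PAA features alone. */
    static double lowerBound(double[] fa, double[] fb, int segLength) {
        double acc = 0;
        for (int i = 0; i < fa.length; i++) {
            double d = fa[i] - fb[i];
            acc += segLength * d * d;
        }
        return Math.sqrt(acc);
    }

    /** Exact Euclidean distance between xs[from, from + q.length) and q. */
    static double euclidean(double[] xs, int from, double[] q) {
        double acc = 0;
        for (int i = 0; i < q.length; i++) {
            double d = xs[from + i] - q[i];
            acc += d * d;
        }
        return Math.sqrt(acc);
    }

    /** Start offsets of all windows within distance eps of q; the lower bound prunes most windows. */
    static List<Integer> search(double[] stream, double[] q, int segments, double eps) {
        if (q.length % segments != 0) {
            throw new IllegalArgumentException("query length must be a multiple of segments");
        }
        int segLength = q.length / segments;
        double[] fq = paa(q, 0, q.length, segments);
        List<Integer> hits = new ArrayList<>();
        for (int start = 0; start + q.length <= stream.length; start++) {
            double[] fw = paa(stream, start, q.length, segments);
            if (lowerBound(fw, fq, segLength) > eps) {
                continue;                          // safe: the true distance is at least the bound
            }
            if (euclidean(stream, start, q) <= eps) {
                hits.add(start);
            }
        }
        return hits;
    }
}
```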
  • 37. From Edge to Cloud: Run IoTDB Everywhere
    • TsFile (a component of IoTDB): a zip file for time series
      • Time series data files with high-speed writes and a high compression ratio that support simple queries
      • Suitable for embedded devices, general servers, data centers, etc.
    • IoTDB: a database for time series
      • Freely operate on the time series in multiple TsFiles, including CRUD and advanced queries such as max, min, avg, and temporal alignment
      • Scenarios: embedded equipment, on-site industrial computers, general servers, etc.
    • 3rd-party systems: a data warehouse for time series
      • Easy to use and integrate for complex analysis (data fusion, collaborative recommendation, machine learning)
      • Scenario: cloud data center
  • 39. A Process to Manage Time Series Data
    [Diagram: data sources write via JDBC / Session API; data is read back via JDBC / Session API for analysis with Matlab (small data sets), via the Grafana-Adaptor for visualization (manual data exploration), and via the Spark-TsFile-Adaptor for analysis with big data frameworks (big data sets)]
  • 40. Using JDBC to Write Data
    • Steps: set storage group → create timeseries → insert data
    • https://iotdb.apache.org/#/Documents/0.8.0/chap6/sec1
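A sketch of the three write steps over JDBC. The driver class `org.apache.iotdb.jdbc.IoTDBDriver`, the URL `jdbc:iotdb://127.0.0.1:6667/`, and the `root`/`root` credentials are assumptions based on the 0.8.x defaults; check the linked documentation for your version. `insertSql` is a hypothetical helper that performs no escaping.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

final class JdbcWriteSketch {

    /** Builds an IoTDB-style INSERT for one device at one timestamp (no escaping; sketch only). */
    static String insertSql(String device, long timestamp, String[] measurements, String[] values) {
        if (measurements.length != values.length) {
            throw new IllegalArgumentException("measurements and values differ in length");
        }
        return "INSERT INTO " + device + "(timestamp," + String.join(",", measurements)
                + ") VALUES (" + timestamp + "," + String.join(",", values) + ")";
    }

    public static void main(String[] args) throws ClassNotFoundException, SQLException {
        // Assumed 0.8.x driver class, URL, and default credentials; adjust to your deployment.
        Class.forName("org.apache.iotdb.jdbc.IoTDBDriver");
        try (Connection conn = DriverManager.getConnection("jdbc:iotdb://127.0.0.1:6667/", "root", "root");
             Statement stmt = conn.createStatement()) {
            stmt.execute("SET STORAGE GROUP TO root.laptop");                                 // 1. storage group
            stmt.execute("CREATE TIMESERIES root.laptop.d1.s1 WITH DATATYPE=INT32, ENCODING=RLE"); // 2. schema
            stmt.execute(insertSql("root.laptop.d1", 1000L,                                   // 3. data
                    new String[] {"s1"}, new String[] {"42"}));
        }
    }
}
```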
  • 41. Using the Session API to Write Data (more efficient)
    • Steps: set storage group → create timeseries → insert data
  • 42. Using JDBC to Query Data
    • Raw data query, aggregation query, downsampling query, print the result
    • https://iotdb.apache.org/#/Documents/0.8.0/chap6/sec1
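The three query kinds can be sent over the same connection. The same assumptions as for writing apply (driver class, URL, credentials); `groupByQuery` and `printResult` are hypothetical helpers, and the series names are illustrative. Only standard `java.sql` calls are used to read the results.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;

final class JdbcQuerySketch {

    /** Builds a GROUP BY time-interval downsampling query in the form used in the SQL examples. */
    static String groupByQuery(String aggregate, String device, String measurement,
                               String interval, String from, String to) {
        return "SELECT " + aggregate + "(" + measurement + ") FROM " + device
                + " GROUP BY (" + interval + ", [" + from + ", " + to + "])";
    }

    /** Prints every row of a result set, whatever its columns are. */
    static void printResult(ResultSet rs) throws SQLException {
        ResultSetMetaData md = rs.getMetaData();
        int columns = md.getColumnCount();
        while (rs.next()) {
            StringBuilder line = new StringBuilder();
            for (int i = 1; i <= columns; i++) {          // JDBC columns are 1-based
                if (i > 1) {
                    line.append('\t');
                }
                line.append(md.getColumnName(i)).append('=').append(rs.getString(i));
            }
            System.out.println(line);
        }
    }

    public static void main(String[] args) throws ClassNotFoundException, SQLException {
        Class.forName("org.apache.iotdb.jdbc.IoTDBDriver");  // assumed 0.8.x driver class
        try (Connection conn = DriverManager.getConnection("jdbc:iotdb://127.0.0.1:6667/", "root", "root");
             Statement stmt = conn.createStatement()) {
            String[] queries = {
                "SELECT s1 FROM root.laptop.d1 WHERE time < 2000",          // raw data
                "SELECT count(s1), max_value(s1) FROM root.laptop.d1",     // aggregation
                groupByQuery("count", "root.laptop.d1", "s1", "1h",        // downsampling
                        "2017-11-03T00:00:00", "2017-11-03T23:00:00")
            };
            for (String sql : queries) {
                if (stmt.execute(sql)) {                   // true when a ResultSet is available
                    try (ResultSet rs = stmt.getResultSet()) {
                        printResult(rs);
                    }
                }
            }
        }
    }
}
```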
  • 43. Using Grafana to Visualize Data (https://iotdb.apache.org/#/Tools/Grafana)
    • Install the simple-json-datasource plugin
    • Configure iotdb-grafana-connector in application.properties
    • Start iotdb-grafana-connector: java -jar iotdb-grafana-0.8.0.war
    • Add an IoTDB data source (SimpleJson) and choose the connector IP
    • Configure a dashboard and enjoy!
  • 44. Using Matlab to Analyze Data
    • Read from IoTDB via JDBC → fast Fourier transform → plot
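The slide performs the FFT in Matlab. As an illustration of the same step, the sketch below computes the magnitude spectrum with a direct O(n²) DFT in Java (not an FFT, so only suitable for small windows) and picks the dominant frequency bin. `DftSketch` is hypothetical and assumes at least two samples.

```java
final class DftSketch {

    /** Magnitude spectrum |X_k| for k = 0..n/2 via a direct DFT. */
    static double[] magnitudes(double[] x) {
        int n = x.length;
        double[] mag = new double[n / 2 + 1];
        for (int k = 0; k <= n / 2; k++) {
            double re = 0;
            double im = 0;
            for (int t = 0; t < n; t++) {
                double angle = 2 * Math.PI * k * t / n;
                re += x[t] * Math.cos(angle);
                im -= x[t] * Math.sin(angle);
            }
            mag[k] = Math.hypot(re, im);
        }
        return mag;
    }

    /** Index of the strongest non-DC bin (bin 0, the mean, is ignored). */
    static int dominantBin(double[] x) {
        double[] mag = magnitudes(x);
        int best = 1;
        for (int k = 2; k < mag.length; k++) {
            if (mag[k] > mag[best]) {
                best = k;
            }
        }
        return best;
    }
}
```

Bin k corresponds to k × sampleRate / n Hz, so a 50 Hz sensor read in windows of 64 samples resolves frequencies in steps of about 0.78 Hz.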
  • 45. Using Spark to Analyze Data
    • Read a TsFile, create a table, run an SQL query, write to a TsFile
    • https://iotdb.apache.org/#/Tools/Spark
  • 46. Demo
    • Writing data locally
    • Showing data with Grafana
    • Analyzing data using SparkSQL
    • https://github.com/jixuan1989/iotdb-tutorial
  • 47. Demo Video
    • Writing data directly on HDFS
    • Using Hive to analyze it
  • 48. Language
    • Written in Java
    • The RPC layer is implemented with Thrift, so APIs for other languages are easy to generate
  • 49. Say Hi to the Apache Ecosystem
    • In the IoTDB repository:
      • RocketMQ: https://github.com/apache/incubator-iotdb/tree/master/example/rocketmq
      • Kafka: https://github.com/apache/incubator-iotdb/tree/master/example/kafka
    • Third-party:
      • EMQx (MQTT server): https://github.com/jixuan1989/iotdb-tutorial
      • Spark: https://github.com/jixuan1989/iotdb-tutorial
      • Calcite: https://github.com/EJTTianYu/iotdb-calcite
      • PLC4X:
      • MapReduce:
  • 51. Application 1: The Next Generation of Big Data Platform for Meteorology
    • 1073 kinds of meteorological data
    • ~150K stations collect more than 100 metrics every 5 minutes
    • The platform is deployed across China
    • After the upgrade, performance improved by two orders of magnitude
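From the slide's figures, taking exactly 150,000 stations and 100 metrics (the slide says ~150K and more than 100), the platform ingests at least:

```latex
\[
\frac{150\,000\ \text{stations} \times 100\ \text{metrics}}{300\ \text{s}} = 5\times10^{4}\ \text{points/s},
\qquad
5\times10^{4}\ \text{points/s} \times 86\,400\ \text{s/day} \approx 4.3\times10^{9}\ \text{points/day}
\]
```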
  • 52. Application 2: Data Management for Equipment Monitoring
    • The data records the operational status of the equipment, e.g., a vehicle's speed, fuel consumption, and malfunctions
    • Cycle: collect → transfer → decision → execute
    • Komatsu excavators, TIANYUAN (with Komatsu): #devices (excavators etc.), #metrics, collection times per minute
    • Before: sharding every day; only 3 months of data stored; more than 10 minutes for some queries
    • After: the whole data set stored; several seconds for complex queries
  • 53. Application 3: Shanghai Metro Monitoring
    • Before: 144 trains, 3200 points/500 ms/train, 9 KairosDB + Cassandra
    • After the upgrade: ONE IoTDB instance behind 14 KairosDB-compatible RESTful services (used only to avoid modifying current programs)
      • 300 trains, 3200 points/200 ms/train
      • 414 billion data points per day using just ONE IoTDB instance
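The 414 billion figure follows from the per-train rate; the previous deployment's load can be derived the same way for comparison:

```latex
\[
\text{After: } \frac{300 \times 3200\ \text{points}}{0.2\ \text{s}} = 4.8\times10^{6}\ \text{points/s},
\qquad 4.8\times10^{6} \times 86\,400 \approx 4.15\times10^{11}\ \text{points/day}
\]
\[
\text{Before: } \frac{144 \times 3200\ \text{points}}{0.5\ \text{s}} = 921\,600\ \text{points/s},
\qquad 921\,600 \times 86\,400 \approx 7.96\times10^{10}\ \text{points/day}
\]
```

That is roughly a 5.2× higher ingestion load, handled by a single IoTDB instance.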
  • 57. Future Works
    • Make it easy to use!
      • Relational model: integration with Calcite
        • Step 1: support relational SQL
        • Step 2: standard JDBC
    • Big data!
      • Better integration with Hive, etc.
    • Cluster!
      • Writing data on HDFS is already supported, but a shared-nothing architecture is wanted
    • Advanced functions!
      • Integration with data streaming engines, etc.
  • 58. Join Us
    • Mailing list:
      • Subscribe: dev-subscribe@iotdb.incubator.apache.org
      • Discussion: dev@iotdb.apache.org
    • Bug reports: https://issues.apache.org/jira/projects/IOTDB/issues/IOTDB
    • Website: https://iotdb.apache.org
    • Ecosystem target: IoTDB v0.8.0 is released! (the first Apache release)