University Students' Apache Open Source Practice,
Seen Through Apache IoTDB
Developing Apache IoTDB:
Practice Experience from Young Students
Xiangdong Huang
Tsinghua University, Beijing, China
2019.11.09
Outline
• Who am I
• The Start
• Dream Disillusion
• A New Hope
Who am I
• Xiangdong Huang (sainthxd@gmail.com)
• Was a PhD student and postdoc at Tsinghua University
• One of the initial committers of Apache IoTDB (incubating)
The Start
• I was a PhD student and postdoc at Tsinghua University
• The story began when I knocked on the door of my supervisor's office in 2011…
My supervisor (Jianmin Wang): "Xiangdong, why do you want to be a PhD student at the School of Software?"
Me: "I want to develop something that is used by millions of people!"
"Come on!"
Do some cool software that can be used by many, many people.
Dream Disillusion
As an Individual Developer
• Wrote a lot of small "tools"
• But never maintained them
• Just for fun/self-use
Developer as a Student
• Many courses
• No need to write much code (in some homework)..
• Good for improving skills, but hard to get full marks (because some assignments are really hard!).
Data Mining: 100 lines?
Modern Database: innovation
Developer as a Student
(The figure is from the Internet and is unrelated to the text.)
Homework magic weapons:
- Bootstrap
- Django
- MySQL
A beautiful web DEMO is done
Student: "To use the demo, we can: Step 1, click.. Step 2, click.. …"
Reviewer: "What if I click here first?"
Student: "STOP! YOU CANNOT!"
We keep writing demo after demo after demo…
• Complex project management?
• Makefile? POM? Gradle?
• Agile? Scrum? Sprint?
• CI? CD?
A POM file example (from Apache PLC4X)
At the same time, Big Data + Apache ..
• Hadoop
• HBase
• Cassandra
• ~200k lines of code
• 2.2.0, 2.2.1, … 2.2.5
• Patch
"Please implement some functions."
"Ah, Hadoop + Hive can do that! Let me download it."
"Oops, an exception!"
"Why can Cassandra update so frequently?"
"Wow, someone shared a patch file to fix a bug!"
Yes, you are growing! You now know JIRA, etc..
• When can I stop writing demos and build some nice software like Apache Cassandra, Hadoop, etc.?
A New Hope
• Be active in an existing open source community
• Hadoop, Cassandra, Spark etc..
• Be active in a new open source community
• IoTDB etc..
Time series data is everywhere
Wearable devices, autonomous driving
A good DB can improve the whole process
(Figure labels: save data locally, network, MQ, database (insertion, query), network, analysis)
And there is no good software
RDB
• Limited number of columns (1600 columns in a table)
• Limited number of rows (<=10M rows is better)
• Manual sharding
KVDB
• Supports big data
• Limited queries
  • Lacks time filtering
  • Lacks value filtering
  • Lacks multiple time series alignment
TSDB
• Based on PG
  • Auto sharding
  • Query optimization
  • Performance degrades sharply after writing data for a long time
• HBase/Cassandra based
  • Partition by TS-UID and time range
  • Storage inefficiency
  • Limited queries
• LSM based
  • Efficient file structure
  • More query functions
  • Not optimized for some application scenarios
Do it ourselves
(Dialogue between the supervisor and the students)
"Let's develop a time series DB!"
"Can we?"
"You can! And you can do it in an open source way."
And then learn a lot…
1. Teamwork
• Git with a team of 10+ people
• Commit log
• Conflict, merge, squash…
• Branches… (dev, release, stable…)
Let your software grow to >= 100K lines.
2. Learn skills
• Project structure
Make your software powerful.
3. Stability/Agile
• CI/CD
• Jenkins, Travis CI
Make your software truly usable.
4. Open your mind
• Issue -> PR -> Release
Open your mind.
Improve your communication skills.
5. Research and Project
• User requirements -> Implementation -> IoTDB -> User
• Idea -> Implementation -> IoTDB -> Evaluation -> Paper -> User
• Paper -> Implementation -> IoTDB -> Evaluation -> User
OK….
• Past
• I can write a demo
• I like to write something
• I like to write things for my own use
• Now
• I/We know how to write complex software
• I/We know how to write software used by people
Do it ourselves
• We now know a lot about how Apache projects are developed!
• How is the website of an Apache project built?
• Who can be a committer of an Apache project?
• How to release projects?
• Who decides the new features of an Apache project?
• Etc..
Time Series DB for Industrial Internet
"清华数为" (Tsinghua Shuwei) time series database --> Apache IoTDB (incubating)
• Apache IoTDB (incubating) is a
highly efficient database for
managing time series data,
especially in Industrial Internet
applications.
• A young community. Donated by
Tsinghua University; entered the
Apache Incubator on 2018-11-18.
• Devoted to building the best time
series database (in IoT area) in the
world.
• Apache IoTDB v0.8.1 is released!
v0.9.0 is coming!
Developers and Users
Concepts in IoTDB (The Schema)
Device (i.e., Data source)
• A machine instance
Measurement (e.g., sensor)
• A device can have many measurements
Time Series
• Device + Measurement
• is represented as a path that begins with root, like
“root.Cadillac_XT5.USA.CA.7BTC409.fuelRemain”
Storage Group (SG)
• A storage group can have many devices
• Storage groups have independent resources
(threads and files) to increase parallelism and
reduce lock contention.
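The path rule above (time series = device + measurement) can be sketched in a few lines of Python. This is only an illustration: `split_series_path` is a hypothetical helper, not an IoTDB API, and it assumes that the last path segment is the measurement and everything before it is the device.

```python
def split_series_path(path: str) -> tuple[str, str]:
    """Split a time series path into (device, measurement).

    Assumption for this sketch: the last dot-separated segment is the
    measurement and the remaining prefix is the device.
    """
    if not path.startswith("root."):
        raise ValueError("a time series path must begin with 'root'")
    device, _, measurement = path.rpartition(".")
    return device, measurement


device, measurement = split_series_path(
    "root.Cadillac_XT5.USA.CA.7BTC409.fuelRemain"
)
print(device)       # root.Cadillac_XT5.USA.CA.7BTC409
print(measurement)  # fuelRemain
```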
Cadillac XT5
The schema mapping
root.Cadillac_XT5.USA.CA.7BTC409.fuelRemain
root.Cadillac_XT5.USA.CA.7BTC409.speed
root.Cadillac_XT5.USA.NV.6BAC321.speed
Table Name: Cadillac_XT5 (the table name is called a Measurement in InfluxDB)
country  state  device name  timestamp  fuelRemain  speed
USA      CA     7BTC409      t1         5.0         120
USA      CA     7BTC409      t2         4.9         109
USA      NV     6BAC321      t1         NULL        50
USA      NV     6BAC321      t3         NULL        65
The columns correspond to Tags and Fields in InfluxDB, KairosDB, OpenTSDB…
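A minimal Python sketch of the mapping from paths to table rows, under stated assumptions: the layer names (table, country, state, device name) are taken from the table header on this slide, `to_rows` is a hypothetical helper rather than anything IoTDB provides, and the sample points reuse values from the slide with the state read from the path.

```python
from collections import defaultdict

# Assumed meaning of the path layers between the table name and the measurement.
LAYERS = ("country", "state", "device name")


def to_rows(points):
    """Pivot (path, timestamp, value) points into rows grouped by table name."""
    tables = defaultdict(dict)
    for path, ts, value in points:
        # root . table . country . state . device . measurement
        _, table, *tags, measurement = path.split(".")
        key = (*tags, ts)
        row = tables[table].setdefault(key, dict(zip(LAYERS, tags), timestamp=ts))
        row[measurement] = value
    return {name: list(rows.values()) for name, rows in tables.items()}


points = [
    ("root.Cadillac_XT5.USA.CA.7BTC409.fuelRemain", "t1", 5.0),
    ("root.Cadillac_XT5.USA.CA.7BTC409.speed", "t1", 120),
    ("root.Cadillac_XT5.USA.NV.6BAC321.speed", "t1", 50),
]
rows = to_rows(points)["Cadillac_XT5"]
for row in rows:
    # A measurement that was never written shows up as None (NULL in the table).
    print(row["device name"], row["timestamp"], row.get("fuelRemain"), row.get("speed"))
```

Points that share the same device and timestamp land in one row, which is how the two `7BTC409` series at `t1` become a single line of the table.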
SQL in IoTDB
Set Storage Group
SET STORAGE GROUP TO root.laptop.d1.s1;
Create Timeseries
CREATE TIMESERIES root.laptop.d1.s1 WITH DATATYPE=INT32, ENCODING=RLE;
Insert Data
INSERT INTO (d1.s1,d1.s2,time) VALUES (1000,2000,14735235234);
Delete Data
DELETE FROM d1.s1 WHERE time < 1000;
Update Data
UPDATE d1.s1 SET VALUE = 2000 WHERE time < 2000 and time > 1000;
Query Data (Filter, Aggregation, Group by time interval)
SELECT d1.s1,d2.* FROM BJ.WF1 WHERE d1.s1 < 2000 and d2.s2 > 1000 and freq(d2.s3) > 0.5;
SELECT count(status), max_value(temperature) from root.ln.wf01.wt01;
SELECT count(status) from root.ln.wf01.wt01 group by(1h, [2017-11-03T00:00:00, 2017-11-03T23:00:00]);
Supported data type
• Boolean
• Int
• Long
• Float
• Double
• String
• GPS (TODO) -> for trajectory data management
• Array (TODO) -> for unstructured data management
TsFile: Zip File Born for Time Series Data
Columnar Store
- Reduce disk I/O
- Improve compression
Compression & Encoding
- Improve compression greatly
- 15% better than InfluxDB in real applications
Time-domain Statistics Info Natively
- Support fast query in
  - Time domain
  - Value domain
  - Freq domain (TODO)
Detailed specification:
http://iotdb.apache.org/#/Documents/0.8.0/chap7/sec3
https://cwiki.apache.org/confluence/display/IOTDB/TsFile+Format
TsFile: Encoding and Compression
TS2Diff encoding: Int or Long (timestamps)
• 128, 136, 144, 152, 160, …
  • 1st difference: 8, 8, 8, 8 (constant)
  • 2nd difference: 0, 0, 0 (1-bit storage needed!)
• 128, 135, 143, 154, 163, …
  • 1st difference: 7, 8, 11, 9 (not constant though)
  • 2nd difference: 1, 3, -2 (2-bit storage needed!)
• Unified support of fixed-frequency time series or irregular-frequency time series
Adaptive Delta encoding: Int or Long (TODO)
• An adaptive enhancement of TS2Diff.
• See next page.
RLE encoding: repeated Int or Long
• For repeated sequences: store a value and its count
Bit-Packing encoding: Int or Long
• Store data in compact form
• Squeeze out wasteful bits
Gorilla encoding: Float or Double
• XOR consecutive data points
• Store with a variable-length encoding scheme
Compression Algorithm
• Snappy
• Gzip (TODO)
• LZO (TODO)
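To make the ideas behind TS2Diff, RLE and Gorilla encoding concrete, here is a small Python sketch of the first step of each one. It is an illustration under assumptions, not TsFile's actual on-disk format: the function names are hypothetical, and the bit-level packing that follows each step is left out.

```python
import struct


def second_order_deltas(values):
    """Return (first differences, second differences) of an integer sequence."""
    first = [b - a for a, b in zip(values, values[1:])]
    second = [b - a for a, b in zip(first, first[1:])]
    return first, second


def rle_encode(values):
    """Run-length encode a sequence as [value, count] pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs


def xor_consecutive(values):
    """XOR the IEEE 754 bit patterns of consecutive doubles (Gorilla's first step)."""
    bits = [struct.unpack(">Q", struct.pack(">d", v))[0] for v in values]
    return [a ^ b for a, b in zip(bits, bits[1:])]


print(second_order_deltas([128, 136, 144, 152, 160]))  # ([8, 8, 8, 8], [0, 0, 0])
print(second_order_deltas([128, 135, 143, 154, 163]))  # ([7, 8, 11, 9], [1, 3, -2])
print(rle_encode([5, 5, 5, 7, 7]))                     # [[5, 3], [7, 2]]
print(xor_consecutive([1.5, 1.5, 1.5]))                # [0, 0]
```

In all three cases the transformed sequence contains small or repeated numbers, which is what lets a later bit-packing or variable-length step store it in fewer bits.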
Adaptive TS2Diff encoding: Int or Long (TODO)
• For time series with outliers or missing points
• Stores second-order delta values and a boolean flag array.
Time Series Specific Operations (TODO)
Pattern Matching for Streaming Time Series Data
• Split the pattern and data stream into equal-length fragments
• Extract features to reduce the dimension
• Accelerate the search by using features
• Scenario: fault alarm in real time
SELECT wind_3s FROM china.farm1.tb2
WHERE time > t1 AND time < t2
AND wind_3s LIKE PATTERN(7.2,..,20.3,..,6.0)
Similarity Search of Sub-series
• Indexing data using Key-Value form
• Scenarios:
  • Outlier detection
  • Historical data analysis
  • …
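The pattern-matching steps listed on this slide (split into equal-length fragments, extract features, use the features to speed up the search) can be sketched as follows. The slide does not say which features are used, so this sketch assumes the mean of each fragment; the function names and the `tolerance` parameter are hypothetical as well.

```python
def fragment_means(series, fragment_len):
    """Split a series into equal-length fragments and return each fragment's mean."""
    if len(series) % fragment_len != 0:
        raise ValueError("series length must be a multiple of fragment_len")
    return [
        sum(series[i:i + fragment_len]) / fragment_len
        for i in range(0, len(series), fragment_len)
    ]


def find_matches(stream, pattern, fragment_len, tolerance):
    """Return start offsets where a stream window matches the pattern point by point."""
    pattern_features = fragment_means(pattern, fragment_len)
    matches = []
    for start in range(len(stream) - len(pattern) + 1):
        window = stream[start:start + len(pattern)]
        features = fragment_means(window, fragment_len)
        # Cheap check on the low-dimensional features first. If every point is
        # within `tolerance`, every fragment mean is too, so no true match is lost.
        if any(abs(f - p) > tolerance for f, p in zip(features, pattern_features)):
            continue
        # Full point-wise comparison only for the surviving candidates.
        if all(abs(w - p) <= tolerance for w, p in zip(window, pattern)):
            matches.append(start)
    return matches


pattern = [7.0, 7.0, 20.0, 20.0]
stream = [1.0, 1.0, 7.1, 6.9, 20.2, 19.9, 6.0, 6.0]
print(find_matches(stream, pattern, fragment_len=2, tolerance=0.5))  # [2]
```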
From Edge to Cloud: Run IoTDB Everywhere
TsFile (a component of IoTDB): a zip file of time series
• Time series data files: high-throughput writes, high compression ratio, support for simple queries. Simply put, TsFile is a zip file for time series data.
• Suitable for embedded devices, general servers, data centers, etc.
IoTDB: a database of time series
• Freely operate on time series across multiple TsFiles, including CRUD and advanced queries such as max, min, avg and temporal alignment.
• Scenario: embedded equipment, on-site industrial computers, general servers, etc.
3rd Systems: a data warehouse of time series
• Easy to use and integrate for complex analysis (data fusion, collaborative recommendation, machine learning)
• Scenario: cloud data center
A Process to Manage Time Series Data
• Data source: writes through JDBC / Session API
• Visualization (manual data exploration): Grafana-Adaptor
• Analysis with big data frameworks (big data set): Spark-TsFile-Adaptor
• Analysis with Matlab (small data set): JDBC
https://github.com/jixuan1989/iotdb-tutorial
Latest version v0.8 (0.9.0-snapshot)
Apache IoTDB-incubating v0.9.0-SNAPSHOT
Hardware: Xeon E5v4, 256G Mem, HDD Disk
Setting                   Insertion   Query
#Client                   10          50
#Storage Group            50          1
#Device                   1000        1
#Measurement per Device   100         10
DataType                  Float       Float
Encoding                  RLE         RLE
Compression               Snappy      Snappy
BatchSize                 100         100
#Point per Time Series    100000      100000000
Compression
Raw data:
- 12 bytes per point
- 112 GB in total
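The raw data size can be cross-checked against the insertion settings listed earlier. This is a back-of-the-envelope sketch under two assumptions that the slides do not state: that #Device is the total number of devices, and that "GB" is counted in units of 2^30 bytes.

```python
devices = 1000                   # #Device in the insertion setting
measurements_per_device = 100    # #Measurement per Device
points_per_series = 100_000      # #Point per Time Series
bytes_per_point = 12             # from the slide

total_points = devices * measurements_per_device * points_per_series
raw_bytes = total_points * bytes_per_point
raw_gib = raw_bytes / 2**30

print(total_points)    # 10000000000
print(round(raw_gib))  # 112
```

Under these assumptions the 10 billion points come to about 112 GB of raw data, which agrees with the figure on the slide.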
Write Performance: points/s (single node)
* In this experiment, we do not use IoTDB's JDBC API or SQL interface.
Instead, we use a raw API like Cassandra's raw Thrift API.
Query Performance: aggregation count()
InfluxDB failed to return any answers in the 100,000,000 setting.
Shanghai METRO Monitoring
Before the upgrade:
• 144 trains
• 3200 points/500 ms/train
• 9 KairosDB + Cassandra
After the upgrade:
• 300 trains
• 3200 points/200 ms/train
• ONE IoTDB instance
• 14 KDB-compatible RESTful services, just to avoid modifying current programs
• 414 billion data points per day using just ONE IoTDB instance
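The daily volume follows directly from the numbers on this slide. The short Python check below only restates that arithmetic; the helper name is hypothetical.

```python
def points_per_second(trains, points_per_batch, interval_ms):
    """Points written per second when every train sends one batch per interval."""
    return trains * points_per_batch * 1000 // interval_ms


SECONDS_PER_DAY = 24 * 60 * 60

old_rate = points_per_second(trains=144, points_per_batch=3200, interval_ms=500)
new_rate = points_per_second(trains=300, points_per_batch=3200, interval_ms=200)

print(old_rate)                    # 921600
print(new_rate)                    # 4800000
print(new_rate * SECONDS_PER_DAY)  # 414720000000
```

So 300 trains sending 3200 points every 200 ms produce 4.8 million points per second, or about 414 billion points per day.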
Join Us
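• Mailing list:
• subscribe: dev-subscribe@iotdb.incubator.apache.org
• discussion: dev@iotdb.apache.org
• Both Chinese and English are welcome! (English recommended)
• bug report: https://s.apache.org/iotdb-issues
• Both Chinese and English are welcome! (English recommended)
• Website: https://iotdb.apache.org
DingTalk user group
Official website
IoTDB community building:
• We invite more developers, users and students to join the community and grow together
• One of the best choices for undergraduate graduation projects and graduate internships!
• Students and developers from outside Beijing are welcome (invited to attend >=1 Beijing meetup)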

More Related Content

What's hot

Apache Spark Introduction
Apache Spark IntroductionApache Spark Introduction
Apache Spark Introduction
bigdata trunk
 
Building a lightweight discovery interface for Chinese patents
Building a lightweight discovery interface for Chinese patentsBuilding a lightweight discovery interface for Chinese patents
Building a lightweight discovery interface for Chinese patents
OpenSource Connections
 
Searching Chinese Patents Presentation at Enterprise Data World
Searching Chinese Patents Presentation at Enterprise Data WorldSearching Chinese Patents Presentation at Enterprise Data World
Searching Chinese Patents Presentation at Enterprise Data World
OpenSource Connections
 
Building a Lightweight Discovery Interface for China's Patents@NYC Solr/Lucen...
Building a Lightweight Discovery Interface for China's Patents@NYC Solr/Lucen...Building a Lightweight Discovery Interface for China's Patents@NYC Solr/Lucen...
Building a Lightweight Discovery Interface for China's Patents@NYC Solr/Lucen...
OpenSource Connections
 
MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...
MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...
MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...
DataWorks Summit
 
Apache Arrow and Python: The latest
Apache Arrow and Python: The latestApache Arrow and Python: The latest
Apache Arrow and Python: The latest
Wes McKinney
 
PyData Texas 2015 Keynote
PyData Texas 2015 KeynotePyData Texas 2015 Keynote
PyData Texas 2015 Keynote
Peter Wang
 
AWS re:Invent 2016: Open-Source Resources (DCS201)
AWS re:Invent 2016: Open-Source Resources (DCS201)AWS re:Invent 2016: Open-Source Resources (DCS201)
AWS re:Invent 2016: Open-Source Resources (DCS201)
Amazon Web Services
 
Spotify in the Cloud - An evolution of data infrastructure - Strata NYC
Spotify in the Cloud - An evolution of data infrastructure - Strata NYCSpotify in the Cloud - An evolution of data infrastructure - Strata NYC
Spotify in the Cloud - An evolution of data infrastructure - Strata NYC
Josh Baer
 
Use cases for cassandra in federal and state government
Use cases for cassandra in federal and state governmentUse cases for cassandra in federal and state government
Use cases for cassandra in federal and state government
OpenSource Connections
 
Spectator to Participant. Contributing to Cassandra (Patrick McFadin, DataSta...
Spectator to Participant. Contributing to Cassandra (Patrick McFadin, DataSta...Spectator to Participant. Contributing to Cassandra (Patrick McFadin, DataSta...
Spectator to Participant. Contributing to Cassandra (Patrick McFadin, DataSta...
DataStax
 
Ubiquitous Solr - A Database's Not-So-Evil Twin: Presented by Ayon Sinha, Wal...
Ubiquitous Solr - A Database's Not-So-Evil Twin: Presented by Ayon Sinha, Wal...Ubiquitous Solr - A Database's Not-So-Evil Twin: Presented by Ayon Sinha, Wal...
Ubiquitous Solr - A Database's Not-So-Evil Twin: Presented by Ayon Sinha, Wal...
Lucidworks
 
Semantic Search: Fast Results from Large, Non-Native Language Corpora with Ro...
Semantic Search: Fast Results from Large, Non-Native Language Corpora with Ro...Semantic Search: Fast Results from Large, Non-Native Language Corpora with Ro...
Semantic Search: Fast Results from Large, Non-Native Language Corpora with Ro...
Databricks
 
H2O World - Clustering & Feature Extraction on Text - Seth Redmore
H2O World - Clustering & Feature Extraction on Text - Seth RedmoreH2O World - Clustering & Feature Extraction on Text - Seth Redmore
H2O World - Clustering & Feature Extraction on Text - Seth Redmore
Sri Ambati
 
Azure HDInsight
Azure HDInsightAzure HDInsight
Azure HDInsight
Ashish Thapliyal
 
Splunk Spark Integration
Splunk Spark IntegrationSplunk Spark Integration
Splunk Spark Integration
Gang Tao
 
Hadoop or Spark: is it an either-or proposition? By Slim Baltagi
Hadoop or Spark: is it an either-or proposition? By Slim BaltagiHadoop or Spark: is it an either-or proposition? By Slim Baltagi
Hadoop or Spark: is it an either-or proposition? By Slim Baltagi
Slim Baltagi
 
Hdfs 2016-hadoop-summit-dublin-v1
Hdfs 2016-hadoop-summit-dublin-v1Hdfs 2016-hadoop-summit-dublin-v1
Hdfs 2016-hadoop-summit-dublin-v1
Chris Nauroth
 
Spark with Azure HDInsight - Tampa Bay Data Science - Adnan Masood, PhD
Spark with Azure HDInsight  - Tampa Bay Data Science - Adnan Masood, PhDSpark with Azure HDInsight  - Tampa Bay Data Science - Adnan Masood, PhD
Spark with Azure HDInsight - Tampa Bay Data Science - Adnan Masood, PhD
Adnan Masood
 
Intro to Search
Intro to SearchIntro to Search
Intro to Search
Grant Ingersoll
 

What's hot (20)

Apache Spark Introduction
Apache Spark IntroductionApache Spark Introduction
Apache Spark Introduction
 
Building a lightweight discovery interface for Chinese patents
Building a lightweight discovery interface for Chinese patentsBuilding a lightweight discovery interface for Chinese patents
Building a lightweight discovery interface for Chinese patents
 
Searching Chinese Patents Presentation at Enterprise Data World
Searching Chinese Patents Presentation at Enterprise Data WorldSearching Chinese Patents Presentation at Enterprise Data World
Searching Chinese Patents Presentation at Enterprise Data World
 
Building a Lightweight Discovery Interface for China's Patents@NYC Solr/Lucen...
Building a Lightweight Discovery Interface for China's Patents@NYC Solr/Lucen...Building a Lightweight Discovery Interface for China's Patents@NYC Solr/Lucen...
Building a Lightweight Discovery Interface for China's Patents@NYC Solr/Lucen...
 
MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...
MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...
MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...
 
Apache Arrow and Python: The latest
Apache Arrow and Python: The latestApache Arrow and Python: The latest
Apache Arrow and Python: The latest
 
PyData Texas 2015 Keynote
PyData Texas 2015 KeynotePyData Texas 2015 Keynote
PyData Texas 2015 Keynote
 
AWS re:Invent 2016: Open-Source Resources (DCS201)
AWS re:Invent 2016: Open-Source Resources (DCS201)AWS re:Invent 2016: Open-Source Resources (DCS201)
AWS re:Invent 2016: Open-Source Resources (DCS201)
 
Spotify in the Cloud - An evolution of data infrastructure - Strata NYC
Spotify in the Cloud - An evolution of data infrastructure - Strata NYCSpotify in the Cloud - An evolution of data infrastructure - Strata NYC
Spotify in the Cloud - An evolution of data infrastructure - Strata NYC
 
Use cases for cassandra in federal and state government
Use cases for cassandra in federal and state governmentUse cases for cassandra in federal and state government
Use cases for cassandra in federal and state government
 
Spectator to Participant. Contributing to Cassandra (Patrick McFadin, DataSta...
Spectator to Participant. Contributing to Cassandra (Patrick McFadin, DataSta...Spectator to Participant. Contributing to Cassandra (Patrick McFadin, DataSta...
Spectator to Participant. Contributing to Cassandra (Patrick McFadin, DataSta...
 
Ubiquitous Solr - A Database's Not-So-Evil Twin: Presented by Ayon Sinha, Wal...
Ubiquitous Solr - A Database's Not-So-Evil Twin: Presented by Ayon Sinha, Wal...Ubiquitous Solr - A Database's Not-So-Evil Twin: Presented by Ayon Sinha, Wal...
Ubiquitous Solr - A Database's Not-So-Evil Twin: Presented by Ayon Sinha, Wal...
 
Semantic Search: Fast Results from Large, Non-Native Language Corpora with Ro...
Semantic Search: Fast Results from Large, Non-Native Language Corpora with Ro...Semantic Search: Fast Results from Large, Non-Native Language Corpora with Ro...
Semantic Search: Fast Results from Large, Non-Native Language Corpora with Ro...
 
H2O World - Clustering & Feature Extraction on Text - Seth Redmore
H2O World - Clustering & Feature Extraction on Text - Seth RedmoreH2O World - Clustering & Feature Extraction on Text - Seth Redmore
H2O World - Clustering & Feature Extraction on Text - Seth Redmore
 
Azure HDInsight
Azure HDInsightAzure HDInsight
Azure HDInsight
 
Splunk Spark Integration
Splunk Spark IntegrationSplunk Spark Integration
Splunk Spark Integration
 
Hadoop or Spark: is it an either-or proposition? By Slim Baltagi
Hadoop or Spark: is it an either-or proposition? By Slim BaltagiHadoop or Spark: is it an either-or proposition? By Slim Baltagi
Hadoop or Spark: is it an either-or proposition? By Slim Baltagi
 
Hdfs 2016-hadoop-summit-dublin-v1
Hdfs 2016-hadoop-summit-dublin-v1Hdfs 2016-hadoop-summit-dublin-v1
Hdfs 2016-hadoop-summit-dublin-v1
 
Spark with Azure HDInsight - Tampa Bay Data Science - Adnan Masood, PhD
Spark with Azure HDInsight  - Tampa Bay Data Science - Adnan Masood, PhDSpark with Azure HDInsight  - Tampa Bay Data Science - Adnan Masood, PhD
Spark with Azure HDInsight - Tampa Bay Data Science - Adnan Masood, PhD
 
Intro to Search
Intro to SearchIntro to Search
Intro to Search
 

Similar to From a student to an apache committer practice of apache io tdb

Stackato v5
Stackato v5Stackato v5
Stackato v5
Jonas Brømsø
 
Basic Application Performance Optimization Techniques (Backend)
Basic Application Performance Optimization Techniques (Backend)Basic Application Performance Optimization Techniques (Backend)
Basic Application Performance Optimization Techniques (Backend)
Klas Berlič Fras
 
Know thy cost (or where performance problems lurk)
Know thy cost (or where performance problems lurk)Know thy cost (or where performance problems lurk)
Know thy cost (or where performance problems lurk)
Oren Eini
 
Lessons learned from building Demand Side Platform
Lessons learned from building Demand Side PlatformLessons learned from building Demand Side Platform
Lessons learned from building Demand Side Platformbbogacki
 
Data Science at Scale: Using Apache Spark for Data Science at Bitly
Data Science at Scale: Using Apache Spark for Data Science at BitlyData Science at Scale: Using Apache Spark for Data Science at Bitly
Data Science at Scale: Using Apache Spark for Data Science at Bitly
Sarah Guido
 
Intro to Machine Learning with H2O and AWS
Intro to Machine Learning with H2O and AWSIntro to Machine Learning with H2O and AWS
Intro to Machine Learning with H2O and AWS
Sri Ambati
 
Stackato v3
Stackato v3Stackato v3
Stackato v3
Jonas Brømsø
 
Big Data Introduction - Solix empower
Big Data Introduction - Solix empowerBig Data Introduction - Solix empower
Big Data Introduction - Solix empower
Durga Gadiraju
 
Elastic Data Analytics Platform @Datadog
Elastic Data Analytics Platform @DatadogElastic Data Analytics Platform @Datadog
Elastic Data Analytics Platform @Datadog
C4Media
 
Ncku csie talk about Spark
Ncku csie talk about SparkNcku csie talk about Spark
Ncku csie talk about Spark
Giivee The
 
First Hive Meetup London 2012-07-10 - Tomas Cervenka - VisualDNA
First Hive Meetup London 2012-07-10 - Tomas Cervenka - VisualDNAFirst Hive Meetup London 2012-07-10 - Tomas Cervenka - VisualDNA
First Hive Meetup London 2012-07-10 - Tomas Cervenka - VisualDNATomas Cervenka
 
Stackato v6
Stackato v6Stackato v6
Stackato v6
Jonas Brømsø
 
Stackato v2
Stackato v2Stackato v2
Stackato v2
Jonas Brømsø
 
Stackato v4
Stackato v4Stackato v4
Stackato v4
Jonas Brømsø
 
PyData Boston 2013
PyData Boston 2013PyData Boston 2013
PyData Boston 2013
Travis Oliphant
 
Data Science
Data ScienceData Science
Data Science
Ahmet Bulut
 
Hadoop at Yahoo! -- University Talks
Hadoop at Yahoo! -- University TalksHadoop at Yahoo! -- University Talks
Hadoop at Yahoo! -- University Talksyhadoop
 
Introduction to Azure DocumentDB
Introduction to Azure DocumentDBIntroduction to Azure DocumentDB
Introduction to Azure DocumentDB
Denny Lee
 
Cloud computing and Hadoop introduction
Cloud computing and Hadoop introductionCloud computing and Hadoop introduction
Cloud computing and Hadoop introduction
christian.perez
 

Similar to From a student to an apache committer practice of apache io tdb (20)

Stackato v5
Stackato v5Stackato v5
Stackato v5
 
Basic Application Performance Optimization Techniques (Backend)
Basic Application Performance Optimization Techniques (Backend)Basic Application Performance Optimization Techniques (Backend)
Basic Application Performance Optimization Techniques (Backend)
 
Know thy cost (or where performance problems lurk)
Know thy cost (or where performance problems lurk)Know thy cost (or where performance problems lurk)
Know thy cost (or where performance problems lurk)
 
Lessons learned from building Demand Side Platform
Lessons learned from building Demand Side PlatformLessons learned from building Demand Side Platform
Lessons learned from building Demand Side Platform
 
Data Science at Scale: Using Apache Spark for Data Science at Bitly
Data Science at Scale: Using Apache Spark for Data Science at BitlyData Science at Scale: Using Apache Spark for Data Science at Bitly
Data Science at Scale: Using Apache Spark for Data Science at Bitly
 
Intro to Machine Learning with H2O and AWS
Intro to Machine Learning with H2O and AWSIntro to Machine Learning with H2O and AWS
Intro to Machine Learning with H2O and AWS
 
Stackato v3
Stackato v3Stackato v3
Stackato v3
 
Big Data Introduction - Solix empower
Big Data Introduction - Solix empowerBig Data Introduction - Solix empower
Big Data Introduction - Solix empower
 
Elastic Data Analytics Platform @Datadog
Elastic Data Analytics Platform @DatadogElastic Data Analytics Platform @Datadog
Elastic Data Analytics Platform @Datadog
 
Ncku csie talk about Spark
Ncku csie talk about SparkNcku csie talk about Spark
Ncku csie talk about Spark
 
First Hive Meetup London 2012-07-10 - Tomas Cervenka - VisualDNA
First Hive Meetup London 2012-07-10 - Tomas Cervenka - VisualDNAFirst Hive Meetup London 2012-07-10 - Tomas Cervenka - VisualDNA
First Hive Meetup London 2012-07-10 - Tomas Cervenka - VisualDNA
 
Stackato v6
Stackato v6Stackato v6
Stackato v6
 
Stackato v2
Stackato v2Stackato v2
Stackato v2
 
Stackato v4
Stackato v4Stackato v4
Stackato v4
 
PyData Boston 2013
PyData Boston 2013PyData Boston 2013
PyData Boston 2013
 
Data Science
Data ScienceData Science
Data Science
 
Hadoop at Yahoo! -- University Talks
Hadoop at Yahoo! -- University TalksHadoop at Yahoo! -- University Talks
Hadoop at Yahoo! -- University Talks
 
Introduction to Azure DocumentDB
Introduction to Azure DocumentDBIntroduction to Azure DocumentDB
Introduction to Azure DocumentDB
 
Cloud computing and Hadoop introduction
Cloud computing and Hadoop introductionCloud computing and Hadoop introduction
Cloud computing and Hadoop introduction
 
Be faster then rabbits
Be faster then rabbitsBe faster then rabbits
Be faster then rabbits
 

More from jixuan1989

Apache IoTDB 的前世今生与部分技术细节 2020-01
Apache IoTDB 的前世今生与部分技术细节 2020-01Apache IoTDB 的前世今生与部分技术细节 2020-01
Apache IoTDB 的前世今生与部分技术细节 2020-01
jixuan1989
 
基于Apache IoTDB的时序数据开源解决方案2020-1-4
基于Apache IoTDB的时序数据开源解决方案2020-1-4基于Apache IoTDB的时序数据开源解决方案2020-1-4
基于Apache IoTDB的时序数据开源解决方案2020-1-4
jixuan1989
 
Apache IoTDB 工业互联网时序数据库 meetup-2019.12
Apache IoTDB 工业互联网时序数据库 meetup-2019.12Apache IoTDB 工业互联网时序数据库 meetup-2019.12
Apache IoTDB 工业互联网时序数据库 meetup-2019.12
jixuan1989
 
The practice of enjoying apache
The practice of enjoying apacheThe practice of enjoying apache
The practice of enjoying apache
jixuan1989
 
Willem Ning Jiang: Getting Started: How to join an Open Source project Apache...
Willem Ning Jiang: Getting Started: How to join an Open Source project Apache...Willem Ning Jiang: Getting Started: How to join an Open Source project Apache...
Willem Ning Jiang: Getting Started: How to join an Open Source project Apache...
jixuan1989
 
Craig The apache Way
Craig The apache Way Craig The apache Way
Craig The apache Way
jixuan1989
 
Apache IOTDB: a Time Series Database for Industrial IoT
Apache IOTDB: a Time Series Database for Industrial IoTApache IOTDB: a Time Series Database for Industrial IoT
Apache IOTDB: a Time Series Database for Industrial IoT
jixuan1989
 

More from jixuan1989 (7)

Apache IoTDB 的前世今生与部分技术细节 2020-01
Apache IoTDB 的前世今生与部分技术细节 2020-01Apache IoTDB 的前世今生与部分技术细节 2020-01
Apache IoTDB 的前世今生与部分技术细节 2020-01
 
基于Apache IoTDB的时序数据开源解决方案2020-1-4
基于Apache IoTDB的时序数据开源解决方案2020-1-4基于Apache IoTDB的时序数据开源解决方案2020-1-4
基于Apache IoTDB的时序数据开源解决方案2020-1-4
 
Apache IoTDB 工业互联网时序数据库 meetup-2019.12
Apache IoTDB 工业互联网时序数据库 meetup-2019.12Apache IoTDB 工业互联网时序数据库 meetup-2019.12
Apache IoTDB 工业互联网时序数据库 meetup-2019.12
 
The practice of enjoying apache
The practice of enjoying apacheThe practice of enjoying apache
The practice of enjoying apache
 
Willem Ning Jiang: Getting Started: How to join an Open Source project Apache...
Willem Ning Jiang: Getting Started: How to join an Open Source project Apache...Willem Ning Jiang: Getting Started: How to join an Open Source project Apache...
Willem Ning Jiang: Getting Started: How to join an Open Source project Apache...
 
Craig The apache Way
Craig The apache Way Craig The apache Way
Craig The apache Way
 
Apache IOTDB: a Time Series Database for Industrial IoT
Apache IOTDB: a Time Series Database for Industrial IoTApache IOTDB: a Time Series Database for Industrial IoT
Apache IOTDB: a Time Series Database for Industrial IoT
 



From a Student to an Apache Committer: Practice of Apache IoTDB

  • 1. Developing Apache IoTDB: Practice Experience from Young Students (Apache open source practice of university students, as seen through Apache IoTDB) • Xiangdong Huang • Tsinghua University, Beijing, China • 2019.11.09
  • 2. Outline • Who am I • The Start • Dream Disillusion • A New Hope
  • 4. Who am I • Xiangdong Huang (sainthxd@gmail.com) • Was a PhD student and PostDoc at Tsinghua University • One of the initial committers of Apache IoTDB (incubating)
  • 6. The Start • Was a PhD student and PostDoc at Tsinghua University • The story began when I knocked on the door of my supervisor’s office in 2011… [Figure: my supervisor (Jianmin Wang) and me]
  • 7. The Start • Supervisor (Jianmin Wang): “Xiangdong, why do you want to do a PhD at the School of Software?” • Me: “I want to develop something that is used by millions of people!” • Supervisor: “Come on! Build some cool software that can be used by many, many people.”
  • 8. Outline • Who am I • The Start • Dream Disillusion • A New Hope
  • 9. As an Individual Developer • Wrote a lot of small “tools” • But no maintenance • Just for fun/self-use
  • 10. Developer as a Student • Many courses • No need to write much code (in some homework).. • Good for improving skills, and hard to get the full score (because some assignments are really hard!) • Data Mining: 100 lines? • Modern Database: innovation
  • 12. Developer as a Student (the figure is from the Internet and unrelated to the text) • Homework magic weapons: Bootstrap, Django, MySQL • A beautiful web DEMO is done • “To use the demo, we can: Step 1, click.. Step 2, click.. …” • student reviews
  • 14. Developer as a Student (the figure is from the Internet and unrelated to the text) • Homework magic weapons: Bootstrap, Django, MySQL • A beautiful web DEMO is done • “To use the demo, we can: Step 1, click.. Step 2, click.. …” • “What if I click here first?” • “STOP! YOU CANNOT!”
  • 15. We keep writing demo after demo after demo… • Complex project management? • Makefile? POM? Gradle? • Agile? Scrum? Sprint? • CI? CD? • [Figure: a POM file example from Apache PLC4X]
  • 19. At the same time, Big Data + Apache.. • Hadoop • HBase • Cassandra • ~200k lines of code • 2.2.0, 2.2.1, … 2.2.5 • Patch • “Please implement some functions.” • “Ah, Hadoop + Hive can do that! Let me download it.” • “Oops, an exception!” • “Why can Cassandra update so frequently?” • “Wow, someone shared a patch file to fix a bug!” • “Yes, you are growing! You now know JIRA, etc.”
  • 20. When can I stop writing demos and build some nice software like Apache Cassandra, Hadoop, etc.?
  • 21. Outline • Who am I • The Start • Dream Disillusion • A New Hope
  • 22. A New Hope • Be active in an existing open source community • Hadoop, Cassandra, Spark etc.. • Be active in a new open source community • IoTDB etc..
  • 23. Time series data is everywhere: wearable devices, autonomous driving
  • 24. A good DB can improve the whole process [Diagram labels: network, MQ, database (insertion, query), save data locally, network, analysis]
  • 25. And no good software
      RDB: limited number of columns (1600 columns in a table) • limited number of rows (<=10M rows is better) • manual sharding
      KVDB: supports big data • limited queries • lacks time filtering • lacks value filtering • lacks multiple time series alignment
      TSDB, LSM based: efficient file structure • more query functions • not optimized for some application scenarios
      TSDB, based on PG: auto sharding • query optimization • performance degrades sharply after writing data for a long time
      TSDB, HBase/Cassandra based: partition by TS-UID and time range • storage inefficiency • limited queries
  • 26. Do it ourselves • Supervisor: “Let’s develop a time series DB!” • Students: “Can we?” • Supervisor: “You can! And you can do it in an open source way.” • And then learn a lot…
  • 27. 1. Teamwork • Git with a team of 10+ people • Commit log • Conflict, merge, squash… • Branches… (dev, release, stable…) • Let your software grow to >= 100K lines.
  • 28. 2. Learn skills • Git with a team of 10+ people • Conflict, merge, squash… • Branches… (dev, release, stable…) • Project structure • Make your software powerful.
  • 29. 3. Stability/Agile • Git with a team of 10+ people • Conflict, merge, squash… • Branches… (dev, release, stable…) • Project structure • CI/CD • Jenkins, Travis CI • Make your software something that can really be used.
  • 30. 4. Open your mind • Git with a team of 10+ people • Conflict, merge, squash… • Branches… (dev, release, stable…) • Project structure • CI/CD • Jenkins, Travis CI • Issue -> PR -> Release • Open your mind. Improve your communication skills.
  • 31. 5. Research and Project • User requirements -> Implementation -> IoTDB -> User • Idea -> Implementation -> IoTDB -> Evaluation -> Paper -> User • Paper -> Implementation -> IoTDB -> Evaluation -> User
  • 32. OK…. • Past: I can write a demo; I like to write something; I like to write something used by myself • Now: I/We know how to write complex software; I/We know how to write software used by people
  • 33. Do it ourselves • Learned a lot about how Apache projects are developed! • How is the website of an Apache project built? • Who can be a committer of an Apache project? • How are projects released? • Who decides the new features of an Apache project? • Etc.
  • 34. Time Series DB for Industrial Internet • “清华数为” time series database --> Apache IoTDB (incubating) • Apache IoTDB (incubating) is a highly efficient database for managing time series data, especially in Industrial Internet applications. • A young community. Donated by Tsinghua University; entered the incubator on 2018-11-18. • Devoted to building the best time series database (in the IoT area) in the world. • Apache IoTDB v0.8.1 is released! v0.9.0 is coming!
  • 36. Concepts in IoTDB (The Schema) • Device (i.e., data source): a machine instance, e.g. a Cadillac XT5 • Measurement (e.g., sensor): a device can have many measurements • Time Series: Device + Measurement, represented as a path that begins with root, like “root.Cadillac_XT5.USA.CA.7BTC409.fuelRemain” • Storage Group (SG): a storage group can have many devices; storage groups have independent resources (threads and files) to increase parallelism and reduce lock contention.
  • 37. The schema mapping • Time series paths: root.Cadillac_XT5.USA.CA.7BTC409.fuelRemain, root.Cadillac_XT5.USA.CA.7BTC409.speed, root.Cadillac_XT5.USA.NV.6BAC321.speed • Equivalent table (table name: Cadillac_XT5, called a Measurement in InfluxDB):
      country | state | device name | timestamp | fuelRemain | speed
      USA | CA | 7BTC409 | t1 | 5.0 | 120
      USA | CA | 7BTC409 | t2 | 4.9 | 109
      USA | NV | 6BAC321 | t1 | NULL | 50
      USA | NV | 6BAC321 | t3 | NULL | 65
      The country, state and device name columns correspond to Tags, and fuelRemain and speed to Fields, in InfluxDB, KairosDB, OpenTSDB…
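
The path-to-table mapping on this slide can be sketched in a few lines of Python. This is only an illustration of the idea, not IoTDB code: the helper names `split_path` and `group_fields` are hypothetical, and the fixed layout `root.<table>.<country>.<state>.<device>.<measurement>` is an assumption taken from the three example paths.

```python
from collections import defaultdict

# Paths copied from the slide. The tag names mirror the table header.
PATHS = [
    "root.Cadillac_XT5.USA.CA.7BTC409.fuelRemain",
    "root.Cadillac_XT5.USA.CA.7BTC409.speed",
    "root.Cadillac_XT5.USA.NV.6BAC321.speed",
]
TAG_NAMES = ("country", "state", "device_name")


def split_path(path):
    """Split a path into (table name, tag dict, field name).

    Assumes the layout root.<table>.<country>.<state>.<device>.<measurement>.
    """
    parts = path.split(".")
    if parts[0] != "root" or len(parts) != 3 + len(TAG_NAMES):
        raise ValueError(f"unexpected path layout: {path}")
    table = parts[1]
    tags = dict(zip(TAG_NAMES, parts[2:-1]))
    field = parts[-1]
    return table, tags, field


def group_fields(paths):
    """Group field (measurement) names by table and tag values."""
    grouped = defaultdict(list)
    for path in paths:
        table, tags, field = split_path(path)
        grouped[(table, tuple(tags.values()))].append(field)
    return dict(grouped)


if __name__ == "__main__":
    for key, fields in group_fields(PATHS).items():
        print(key, fields)
```

Each distinct (table, tags) key corresponds to one device, and its field list gives the value columns that device fills in the wide table.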
  • 38. SQL in IoTDB
      Set storage group: SET STORAGE GROUP TO root.laptop.d1.s1;
      Create time series: CREATE TIMESERIES root.laptop.d1.s1 WITH DATATYPE=INT32, ENCODING=RLE
      Insert data: INSERT INTO (d1.s1,d1.s2,time) VALUES (1000,2000,14735235234);
      Delete data: DELETE FROM d1.s1 WHERE time < 1000;
      Update data: UPDATE d1.s1 SET VALUE = 2000 WHERE time < 2000 and time > 1000;
      Query data (filter, aggregation, group by time interval):
      SELECT d1.s1,d2.* FROM BJ.WF1 WHERE d1.s1 < 2000 and d2.s2 > 1000 and freq(d2.s3) > 0.5;
      SELECT count(status), max_value(temperature) from root.ln.wf01.wt01;
      SELECT count(status) from root.ln.wf01.wt01 group by(1h, [2017-11-03T00:00:00, 2017-11-03T23:00:00]);
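
To make the last query concrete, here is a rough Python sketch of what a "group by time interval" count computes: the time range is cut into fixed 1-hour windows and the points in each window are counted. The function name `count_by_interval` and the sample timestamps are hypothetical, and treating windows as half-open `[start, start + step)` is an assumption; IoTDB's exact boundary handling may differ.

```python
from datetime import datetime, timedelta


def count_by_interval(timestamps, start, end, step):
    """Count timestamps per window [start + i*step, start + (i+1)*step) within [start, end)."""
    # Ceiling of (end - start) / step, using floor division on a negative timedelta.
    n_buckets = -((start - end) // step)
    counts = [0] * n_buckets
    for ts in timestamps:
        if start <= ts < end:
            counts[(ts - start) // step] += 1
    return [(start + i * step, count) for i, count in enumerate(counts)]


# Range taken from the query on the slide; the sample points are made up.
start = datetime(2017, 11, 3, 0, 0, 0)
end = datetime(2017, 11, 3, 23, 0, 0)
samples = [
    datetime(2017, 11, 3, 0, 10),
    datetime(2017, 11, 3, 0, 50),
    datetime(2017, 11, 3, 2, 0),
    datetime(2017, 11, 3, 23, 30),  # outside the queried range, ignored
]
result = count_by_interval(samples, start, end, timedelta(hours=1))

if __name__ == "__main__":
    for window_start, count in result:
        if count:
            print(window_start.isoformat(), count)
```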
  • 39. Supported data types • Boolean • Int • Long • Float • Double • String • GPS (TODO) -> for trajectory data management • Array (TODO) -> for unstructured data management
  • 41. TsFile: Zip File Born for Time Series Data • Columnar store: reduces disk I/O, improves compression • Compression & encoding: improves compression greatly, 15% better than InfluxDB in real applications • Native time-domain statistics info: supports fast queries in the time domain, the value domain and the frequency domain (TODO) • Detailed specification: http://iotdb.apache.org/#/Documents/0.8.0/chap7/sec3 and https://cwiki.apache.org/confluence/display/IOTDB/TsFile+Format
  • 42. TsFile: Encoding and Compression
      TS2Diff encoding (Int or Long, timestamps): unified support of fixed-frequency and irregular-frequency time series.
        128, 136, 144, 152, 160, … -> 1st difference 8, 8, 8, 8 is constant -> 2nd difference 0, 0, 0 -> 1-bit storage needed!
        128, 135, 143, 154, 163, … -> 1st difference 7, 8, 11, 9 is not constant, though -> 2nd difference 1, 3, -2 -> 2-bit storage needed!
      Adaptive Delta encoding (Int or Long, TODO): an adaptive enhancement of TS2Diff. See next page.
      RLE encoding (repeated Int or Long): for a repeated sequence, store a value and its count.
      Bit-Packing encoding (Int or Long): store data in compact form; squeeze out wasteful bits.
      Gorilla encoding (Float or Double): XOR consecutive data points; store with a variable-length encoding scheme.
      Compression algorithms: Snappy, Gzip (TODO), LZO (TODO)
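
The core ideas behind three of these encodings fit in a short Python sketch. It shows only the transformations (second-order differences, run-length pairs, XOR of consecutive float bit patterns), not TsFile's actual on-disk bit layout; all function names are hypothetical.

```python
import struct


def deltas(values):
    """First-order differences between consecutive values."""
    return [b - a for a, b in zip(values, values[1:])]


def second_order_deltas(values):
    """Differences of differences: the quantity a TS2Diff-style encoder stores."""
    return deltas(deltas(values))


def rle_encode(values):
    """Run-length encoding: collapse repeats into (value, count) pairs."""
    runs = []
    for value in values:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return [tuple(run) for run in runs]


def rle_decode(runs):
    """Inverse of rle_encode."""
    return [value for value, count in runs for _ in range(count)]


def xor_consecutive(floats):
    """XOR the 64-bit patterns of neighbouring doubles (the Gorilla idea).

    Similar values give results with many zero bits, which a
    variable-length scheme can then store compactly.
    """
    bits = [struct.unpack(">Q", struct.pack(">d", x))[0] for x in floats]
    return [a ^ b for a, b in zip(bits, bits[1:])]


if __name__ == "__main__":
    print(second_order_deltas([128, 136, 144, 152, 160]))
    print(second_order_deltas([128, 135, 143, 154, 163]))
    print(rle_encode([5, 5, 5, 7, 7]))
    print([hex(x) for x in xor_consecutive([1.5, 1.5, 2.0])])
```

Regular timestamps turn into runs of zeros after the second difference, which is why they need so few bits per point.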
  • 43. TsFile: Encoding and Compression • Adaptive TS2Diff encoding (Int or Long, TODO) • For time series with outliers or missing points • Stores second-order delta values and a boolean flag array.
  • 44. Time Series Specific Operations (TODO)
      Pattern Matching for Streaming Time Series Data: split the pattern and the data stream into equal-length fragments • extract features to reduce the dimension • accelerate the search by using features • Scenario: real-time fault alarm
      SELECT wind_3s FROM china.farm1.tb2 WHERE time > t1 AND time < t2 AND wind_3s LIKE PATTERN(7.2,..,20.3,..,6.0)
      Similarity Search of Sub-series: index data in key-value form • Scenarios: outlier detection, historical data analysis, …
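
The three pattern-matching steps can be illustrated with a small Python sketch. The slide does not say which features IoTDB plans to use, so this assumes the simplest choice, the mean of each fragment, together with a Euclidean distance threshold; the function names are hypothetical. With fragment means, `sqrt(frag_len)` times the feature distance never exceeds the full distance, so windows whose feature distance is already too large can be skipped.

```python
import math


def fragment_means(series, frag_len):
    """Feature extraction: the mean of each equal-length fragment."""
    if len(series) % frag_len != 0:
        raise ValueError("series length must be a multiple of frag_len")
    return [
        sum(series[i:i + frag_len]) / frag_len
        for i in range(0, len(series), frag_len)
    ]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def find_matches(stream, pattern, frag_len, threshold):
    """Return start offsets where the stream is within `threshold` of the pattern."""
    pattern_features = fragment_means(pattern, frag_len)
    scale = math.sqrt(frag_len)
    matches = []
    for start in range(len(stream) - len(pattern) + 1):
        window = stream[start:start + len(pattern)]
        # Cheap test on the reduced features: a lower bound of the real distance.
        if scale * euclidean(fragment_means(window, frag_len), pattern_features) > threshold:
            continue
        # Exact check only for the windows that survive the filter.
        if euclidean(window, pattern) <= threshold:
            matches.append(start)
    return matches


if __name__ == "__main__":
    stream = [0, 0, 7, 8, 20, 21, 6, 5, 0, 0]
    pattern = [7, 8, 20, 21, 6, 5]
    print(find_matches(stream, pattern, frag_len=2, threshold=0.5))
```

A real streaming implementation would update the fragment features incrementally instead of recomputing them for every window; the sketch keeps the recomputation for clarity.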
  • 45. From Edge to Cloud: Run IoTDB Everywhere • TsFile (a component of IoTDB), a zip file of time series: time series data files with high-performance writes, a high compression ratio and support for simple queries. Simply put, TsFile is a zip file for time series data. Suitable for embedded devices, general servers, data centers, etc. • IoTDB, a database of time series: freely operate on the time series of multiple TsFiles, including CRUD and advanced queries like max, min, avg and temporal alignment. Scenarios: embedded equipment, on-site industrial computers, general servers, etc. • Third-party systems, a data warehouse of time series: easy to use and integrate for complex analysis (data fusion, collaborative recommendation, machine learning). Scenario: cloud data center.
  • 46. A Process to Manage Time Series Data [Diagram labels: data source or JDBC / Session API; Grafana-Adaptor: visualization (manual data exploration); Spark-TsFile-Adaptor: analysis with a big data framework (big data set); JDBC: analysis with Matlab (small data set)] https://github.com/jixuan1989/iotdb-tutorial
  • 47. Latest version v0.8 (0.9.0-snapshot) • Setup: Apache IoTDB-incubating v0.9.0-SNAPSHOT, Xeon E5v4, 256G Mem, HDD Disk
      Workload | #Client | #Storage Group | #Device | #Measurement per Device | DataType | Encoding | Compression | BatchSize | #Point per Time Series
      Insertion | 10 | 50 | 1000 | 100 | Float | RLE | Snappy | 100 | 100000
      Query | 50 | 1 | 1 | 10 | Float | RLE | Snappy | 100 | 100000000
  • 48. Compression • Setup: Apache IoTDB-incubating v0.9.0-SNAPSHOT, Xeon E5v4, 256G Mem, HDD Disk • Raw data: 12 bytes per point, 112 GB in total
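
The raw-data figure can be cross-checked with simple arithmetic. The sketch below assumes, without confirmation from the slides, that the compression test uses the insertion workload of the benchmark-setup slide (1000 devices, 100 measurements per device, 100000 points per series) and that "GB" means 2^30 bytes. The 12 bytes per point would fit an 8-byte timestamp plus a 4-byte float, which is also an assumption.

```python
# Workload figures taken from the insertion row of the benchmark setup.
devices = 1000
measurements_per_device = 100
points_per_series = 100_000
bytes_per_point = 12  # assumed: 8-byte timestamp + 4-byte float value

total_points = devices * measurements_per_device * points_per_series
raw_bytes = total_points * bytes_per_point
raw_gib = raw_bytes / 2**30

if __name__ == "__main__":
    print(f"{total_points:,} points, {raw_gib:.1f} GiB of raw data")
```

Under these assumptions the result rounds to the 112 GB quoted on the slide.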
  • 49. Write Performance: points/s (single node) • Setup: Apache IoTDB-incubating v0.9.0-SNAPSHOT, Xeon E5v4, 256G Mem, HDD Disk • * In this experiment, we do not use IoTDB’s JDBC API and SQL interface. Instead, we use a raw API like Cassandra’s raw Thrift API.
  • 50. Query Performance: aggregation count() • Setup: Apache IoTDB-incubating v0.9.0-SNAPSHOT, Xeon E5v4, 256G Mem, HDD Disk • InfluxDB failed to return any answers in the 100,000,000 setting.
  • 51. Shanghai METRO Monitoring • Before: 144 trains, 3200 points/500 ms/train, stored in KairosDB + Cassandra (diagram labels: 9 and 14) • After the upgrade: 300 trains, 3200 points/200 ms/train, 414 billion data points per day using just ONE IoTDB instance • KDB-compatible RESTful services sit in front of IoTDB just to avoid modifying current programs
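
The 414 billion figure follows directly from the numbers on the slide, as this short Python check shows. The function name is hypothetical, and the calculation assumes every train reports continuously for the full 24 hours.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000


def points_per_day(trains, points_per_report, interval_ms):
    """Daily data points if every train reports once per interval, all day."""
    reports_per_day = MS_PER_DAY // interval_ms
    return trains * points_per_report * reports_per_day


before = points_per_day(trains=144, points_per_report=3200, interval_ms=500)
after = points_per_day(trains=300, points_per_report=3200, interval_ms=200)

if __name__ == "__main__":
    print(f"before: {before / 1e9:.1f} billion points/day")
    print(f"after:  {after / 1e9:.1f} billion points/day")
```

The upgraded load works out to 4.8 million points per second, about 5.2 times the earlier daily volume.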
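  • 52. Join Us • Mailing list: subscribe: dev-subscribe@iotdb.incubator.apache.org; discussion: dev@iotdb.apache.org (Chinese and English are both welcome; English is recommended) • Bug report: https://s.apache.org/iotdb-issues (Chinese and English are both welcome; English is recommended) • Website: https://iotdb.apache.org • [QR codes: DingTalk user group, official website] • IoTDB community building: inviting more developers/users/students to join the community and grow together • One of the best choices for undergraduate graduation projects and graduate internships! • Students/developers from outside Beijing are welcome (invited to attend >= 1 Beijing meetup)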