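Big Data Taiwan 2014 Track1-3: Big Data, Big Challenge: Splunk Helps You Solve Big Data... (Etu Solution)
Speaker: 陶靖霖, Product Manager, Data Value-Added Application Development Department, SYSTEX
Abstract: Face reality! Big Data is a hot buzzword and a hot topic, but the core of the problem still revolves around the process, architecture, and technology of data processing. What challenges will users encounter when stepping into the Big Data field? Splunk has been called "the world's best Big Data company". What unique technical advantages does it have in the data processing pipeline that help users overcome these challenges? And which application cases have successfully helped users extract value from their data? Come and get to know Splunk and Big Data success stories from around the world.
Build 1 trillion warehouse based on CarbonData (boxu42)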
Apache CarbonData & Spark Meetup
Build 1 trillion warehouse based on CarbonData
Huawei
Apache Spark™ is a unified analytics engine for large-scale data processing.
CarbonData is a high-performance data solution that supports various data analytic scenarios, including BI analysis, ad-hoc SQL query, fast filter lookup on detail records, streaming analytics, and so on. CarbonData has been deployed in many enterprise production environments. In one of the largest deployments, it supports queries on a single table with 3 PB of data (more than 5 trillion records) with response times under 3 seconds.
This talk was presented by Junping Du, an Apache member and Hadoop PMC member, at the Apache Event at Tsinghua University in China.
Junping Du comes from Tencent and is the chairman of TOSA.
About the Event:
The open source ecosystem plays an increasingly important role in the world. Open source software is widely used in operating systems, cloud computing, big data, artificial intelligence, and the industrial Internet. Many companies have gradually increased their participation in the open source community. Developers with open source experience are increasingly valued and favored by large enterprises. The Apache Software Foundation is one of the most important open source communities, contributing a large number of valuable open source software projects and communities to the world.
The invited guests of this lecture are all from the ASF community, including the chairman of the Apache Software Foundation, three Apache members, Top 5 Apache code committers (according to the Apache annual report), the first Committer in the Hadoop project in China, several Apache project mentors or VPs, and many Apache Committers. They will tell you what the open source culture is, how to join the Apache open source community, and what the Apache Way is.
From a Student to an Apache Committer: Practice of Apache IoTDB (jixuan1989)
This talk was presented by Xiangdong Huang, a PPMC member of the Apache IoTDB (incubating) project, at the Apache Event at Tsinghua University in China.
Practice of Building the Apache ShardingSphere Incubator Community (jixuan1989)
This talk was presented by Liang Zhang, a PPMC member of the Apache ShardingSphere (incubating) project, at the Apache Event at Tsinghua University in China.
Liang Zhang comes from JD.com.
Willem Ning Jiang: Getting Started: How to join an Open Source project Apache... (jixuan1989)
This talk was presented by Willem Ning Jiang, an Apache member and ServiceComb PMC member, at the Apache Event at Tsinghua University in China.
Willem Ning Jiang comes from Huawei.
This talk was presented by Craig L Russell, the Apache Software Foundation Chairman, at the Apache Event at Tsinghua University in China.
22. 2. Implementing special queries with index structures
Fast Aggregation Method for Time Series
CIKM 2016: PISA: An Index for Aggregating Big Time Series Data
Records only root nodes in memory and builds virtual trees, reducing memory cost and disk I/O
26. Concepts in IoTDB (The Schema)
Device (i.e., Data source)
• A machine instance
Measurement (e.g., sensor)
• A device can have many measurements
Time Series
• Device + Measurement
• is represented as a path that begins with root, like
“root.Cadillac_XT5.USA.CA.7BTC409.fuelRemain”
Storage Group (SG)
• A storage group can have many devices
• Storage groups have independent resources (threads and files) to increase parallelism and reduce lock contention.
Cadillac XT5
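The naming rule above (time series = device + measurement, written as a path starting with root) can be sketched in a few lines of Java. This is only an illustration: SeriesPath is a hypothetical helper, not an IoTDB class, and it assumes that the measurement is the last dot-separated node and that the caller already knows the storage group prefix.

```java
/** Hypothetical helper (not part of IoTDB): splits a time series path into its parts. */
final class SeriesPath {
    final String storageGroup;
    final String device;
    final String measurement;

    private SeriesPath(String storageGroup, String device, String measurement) {
        this.storageGroup = storageGroup;
        this.device = device;
        this.measurement = measurement;
    }

    /**
     * Assumption: the measurement is the last node and the device is everything
     * before it. The storage group cannot be derived from the path alone, so the
     * caller passes it in.
     */
    static SeriesPath parse(String fullPath, String storageGroup) {
        if (!fullPath.startsWith("root.")) {
            throw new IllegalArgumentException("path must begin with root: " + fullPath);
        }
        if (!fullPath.startsWith(storageGroup + ".")) {
            throw new IllegalArgumentException("path is not inside " + storageGroup);
        }
        int lastDot = fullPath.lastIndexOf('.');
        String device = fullPath.substring(0, lastDot);       // device path
        String measurement = fullPath.substring(lastDot + 1); // sensor name
        return new SeriesPath(storageGroup, device, measurement);
    }
}
```

For the path on the slide, with root.Cadillac_XT5 as the storage group, the device is root.Cadillac_XT5.USA.CA.7BTC409 and the measurement is fuelRemain.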
27. The schema mapping 1
IoTDB time series:
root.Cadillac_XT5.USA.CA.7BTC409.fuelRemain
root.Cadillac_XT5.USA.CA.7BTC409.speed
root.Cadillac_XT5.USA.NV.6BAC321.speed
Table Name: Cadillac_XT5 (RDB schema or NoSQL like Cassandra)
country  state  device name  timestamp  fuelRemain  speed
USA      CA     7BTC409      t1         5.0         120
USA      CA     7BTC409      t2         4.9         109
USA      NV     6BAC321      t1         NULL        50
USA      NV     6BAC321      t3         NULL        65
Tags and Fields in InfluxDB, KairosDB, OpenTSDB…
OLTP Schema        IoTDB Schema
Table              Storage group
Dimension Column   Device, timestamp
Metric Column      Measurement
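As a rough illustration of this mapping, the sketch below turns one relational row into an IoTDB INSERT statement: the table name and dimension columns become the device path, the metric columns become measurements, and NULL cells are simply skipped. RowToInsert and its parameter names are hypothetical, and the statement shape follows the INSERT syntax shown elsewhere in these slides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not part of IoTDB): maps one relational row to an INSERT statement. */
final class RowToInsert {
    static String toInsert(String table, String country, String state, String device,
                           long timestamp, Map<String, Object> metrics) {
        List<String> names = new ArrayList<>();
        List<String> values = new ArrayList<>();
        for (Map.Entry<String, Object> e : metrics.entrySet()) {
            if (e.getValue() == null) {
                continue; // a NULL cell produces no data point for that measurement
            }
            names.add(e.getKey());
            values.add(String.valueOf(e.getValue()));
        }
        if (names.isEmpty()) {
            throw new IllegalArgumentException("row has no non-NULL metric");
        }
        // Table name and dimension columns form the device path under root.
        String devicePath = String.join(".", "root", table, country, state, device);
        return "INSERT INTO " + devicePath
                + "(timestamp, " + String.join(", ", names) + ")"
                + " VALUES (" + timestamp + ", " + String.join(", ", values) + ");";
    }
}
```

Pass the metrics in an insertion-ordered map (for example a LinkedHashMap) so the measurement names and values stay aligned in a predictable order.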
28. The schema mapping 2
IoTDB time series:
root.Cadillac_XT5.USA.CA.7BTC409.fuelRemain
root.Cadillac_XT5.USA.CA.7BTC409.speed
root.Cadillac_XT5.USA.NV.6BAC321.speed
Database: root.Cadillac_XT5
Table Name: USA.CA.7BTC409
timestamp  fuelRemain  speed
t1         5.0         120
t2         4.9         109
Table Name: USA.NV.6BAC321
timestamp  speed
t1         50
t3         65
Tags and Fields in InfluxDB, KairosDB, OpenTSDB…
OLTP Schema   IoTDB Schema
Database      Storage group
Table         Device
Column        Measurement / Sensor
30. SQL in IoTDB
Set Storage Group
SET STORAGE GROUP TO root.ln;
Create Timeseries
CREATE TIMESERIES root.ln.wf01.wt01.temperature WITH DATATYPE=FLOAT, ENCODING=RLE
Insert Data
INSERT INTO root.ln.wf02.wt02(timestamp, status) VALUES (1, true);
Delete Data
DELETE FROM root.ln.wf02.wt02.status WHERE time < 1000;
Query Data (Filter, Aggregation, Group by time interval)
SELECT count(status), max_value(temperature) FROM root.ln.wf01.wt01 GROUP BY (1h, [2017-11-03T00:00:00, 2017-11-03T23:00:00]);
31. Storage Group Statement
Set Storage Group
SET STORAGE GROUP TO root.ln;
SET STORAGE GROUP TO root.sgcc;
Show Storage Group
SHOW STORAGE GROUP;
Delete Storage Group
DELETE STORAGE GROUP root.ln;
32. Timeseries Statement
Create Timeseries
CREATE TIMESERIES root.ln.wf01.wt01.status WITH DATATYPE=BOOLEAN, ENCODING=PLAIN
CREATE TIMESERIES root.ln.wf01.wt01.temperature WITH DATATYPE=FLOAT, ENCODING=RLE
CREATE TIMESERIES root.ln.wf02.wt02.hardware WITH DATATYPE=TEXT, ENCODING=PLAIN
CREATE TIMESERIES root.ln.wf02.wt02.status WITH DATATYPE=BOOLEAN, ENCODING=PLAIN
CREATE TIMESERIES root.sgcc.wf03.wt01.status WITH DATATYPE=BOOLEAN, ENCODING=PLAIN
CREATE TIMESERIES root.sgcc.wf03.wt01.temperature WITH DATATYPE=FLOAT, ENCODING=RLE
Show Timeseries
SHOW TIMESERIES root
SHOW TIMESERIES root.ln
Other Timeseries Operations
COUNT TIMESERIES root
DELETE TIMESERIES root.ln.wf01.wt01
33. Insert and Delete Data
Insert Data
INSERT INTO root.ln.wf01.wt01(timestamp, status) VALUES (1, true);
INSERT INTO root.ln.wf02.wt02(timestamp, status, hardware, temperature, software, type) VALUES (1, false, "v1", 12.0, "v2", 3);
Delete Data
DELETE FROM root.ln.wf02.wt02.status WHERE time < 1000;
DELETE FROM root.ln.wf02.wt02.* WHERE time < 1000;
34. Query Data
Select a Column of Data Based on a Time Interval
SELECT temperature FROM root.ln.wf01.wt01 WHERE time < 2017-11-01T00:08:00.000;
Choose Multiple Columns of Data for Different Devices According to Multiple Time Intervals
SELECT wf01.wt01.status, wf02.wt02.hardware FROM root.ln WHERE (time > 2017-11-01T00:05:00.000 and time < 2017-11-01T00:12:00.000) or (time >= 2017-11-01T16:35:00.000 and time <= 2017-11-01T16:37:00.000);
35. Query Data
Down-Frequency Aggregate Query
SELECT count(status), max_value(temperature) FROM root.ln.wf01.wt01 WHERE time > 2017-11-03T06:00:00 and temperature > 20 GROUP BY (1h, [2017-11-03T00:00:00, 2017-11-03T23:00:00]);
Automated Fill
SELECT temperature FROM root.ln.wf03.wt01 WHERE time = 2017-11-01T16:37:50.000 FILL(float[previous, 1m])
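To make the down-frequency (GROUP BY time interval) query more concrete, here is a minimal in-memory sketch of what such an aggregation computes: the display range is cut into fixed-width buckets, and count and max are taken per bucket. GroupByTime is a hypothetical class, not IoTDB code. It assumes half-open buckets aligned to the start of the range, so the exact boundary handling inside IoTDB may differ.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of GROUP BY (interval, [start, end]) with count and max. */
final class GroupByTime {
    /** One output row: bucket start, number of points, maximum value (null if the bucket is empty). */
    static final class Bucket {
        final long start;
        int count;
        Double max;

        Bucket(long start) {
            this.start = start;
        }
    }

    /** Assumption: buckets are [start + k*interval, start + (k+1)*interval), clipped at rangeEnd. */
    static List<Bucket> aggregate(long[] times, double[] values,
                                  long rangeStart, long rangeEnd, long interval) {
        if (interval <= 0 || times.length != values.length) {
            throw new IllegalArgumentException("bad interval or mismatched arrays");
        }
        List<Bucket> buckets = new ArrayList<>();
        for (long s = rangeStart; s < rangeEnd; s += interval) {
            buckets.add(new Bucket(s));
        }
        for (int i = 0; i < times.length; i++) {
            long t = times[i];
            if (t < rangeStart || t >= rangeEnd) {
                continue; // outside the display range
            }
            Bucket b = buckets.get((int) ((t - rangeStart) / interval));
            b.count++;
            b.max = (b.max == null) ? values[i] : Math.max(b.max, values[i]);
        }
        return buckets;
    }
}
```

A value filter such as temperature > 20 would be applied to the points before they are handed to aggregate.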
36. Supported data types
• Boolean
• Int
• Long
• Float
• Double
• String
• GPS (TODO) -> for trajectory data management
• Array (TODO) -> for unstructured data management
37. Metadata Related Statements
Count Nodes Statement
COUNT NODES root LEVEL=2
COUNT NODES root.ln.wf01 LEVEL=3
Show Devices Statement
SHOW DEVICES
Show Child Paths of Root Statement
SHOW CHILD PATHS
Show Child Paths Statement
SHOW CHILD PATHS root
SHOW CHILD PATHS root.ln.wf01
38. TTL Statement
Set TTL
SET TTL to root.ln 3600000
Unset TTL
UNSET TTL to root.ln
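The effect of a TTL can be pictured with a tiny check. This is a sketch under two assumptions that the slide does not state: the TTL value is in milliseconds (so 3600000 would be one hour), and a point counts as expired once its age is strictly greater than the TTL. TtlSketch is a hypothetical name, not an IoTDB class.

```java
/** Hypothetical illustration of a time-to-live check; not IoTDB code. */
final class TtlSketch {
    /** Assumption: all arguments are in milliseconds; expired means age > ttl. */
    static boolean isExpired(long pointTime, long now, long ttlMillis) {
        return now - pointTime > ttlMillis;
    }
}
```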
39. Using JDBC to write data
set storage group
create timeseries
insert data
https://iotdb.apache.org/#/Documents/progress/chap4/sec2
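The three steps on this slide could look roughly like the sketch below, which uses only the standard java.sql API and statements taken from the SQL slides. The driver class name org.apache.iotdb.jdbc.IoTDBDriver and the root/root credentials are assumptions, as is the class name JdbcWriteSketch. The URL is the one used in the Spark example in this deck. Check the linked documentation for the version you run.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of writing to IoTDB through JDBC. */
final class JdbcWriteSketch {
    /** The slide's three steps: set storage group, create timeseries, insert data. */
    static List<String> writeSteps() {
        return Arrays.asList(
            "SET STORAGE GROUP TO root.ln",
            "CREATE TIMESERIES root.ln.wf01.wt01.status WITH DATATYPE=BOOLEAN, ENCODING=PLAIN",
            "INSERT INTO root.ln.wf01.wt01(timestamp, status) VALUES (1, true)");
    }

    /** Assumed driver class, URL, and credentials; adjust to your deployment. */
    static Connection open() throws SQLException, ClassNotFoundException {
        Class.forName("org.apache.iotdb.jdbc.IoTDBDriver");
        return DriverManager.getConnection("jdbc:iotdb://127.0.0.1:6667/", "root", "root");
    }

    /** Executes each statement in order on an already opened connection. */
    static void run(Connection connection, List<String> statements) throws SQLException {
        try (Statement statement = connection.createStatement()) {
            for (String sql : statements) {
                statement.execute(sql);
            }
        }
    }
}
```

With the IoTDB JDBC jar on the classpath and a server running, the usage would be to open a connection in a try-with-resources block and call run(connection, writeSteps()).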
40. Using Session API to write Data
(more efficient)
set storage group
create timeseries
insert data
https://iotdb.apache.org/#/Documents/progress/chap4/sec3
41. Using Session API to write Data in Batch
(more efficient)
https://iotdb.apache.org/#/Documents/progress/chap4/sec3
Set Measurement
Build batch
Insert
42. Using JDBC to Query Data
raw data query
aggregation query
down sampling query
print result
https://iotdb.apache.org/#/Documents/progress/chap4/sec2
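A hedged sketch of the query side follows, again with plain java.sql calls. The raw and down-sampling queries are adapted from the SQL slides. The plain aggregation query is derived from them by dropping the GROUP BY clause and is an assumption. JdbcQuerySketch is a hypothetical class, and it expects a Connection that was opened elsewhere with the IoTDB JDBC driver.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of querying IoTDB through JDBC and printing the result. */
final class JdbcQuerySketch {
    /** The three query kinds named on the slide, keyed by a short label. */
    static Map<String, String> queries() {
        Map<String, String> q = new LinkedHashMap<>();
        q.put("raw data",
            "SELECT temperature FROM root.ln.wf01.wt01 WHERE time < 2017-11-01T00:08:00.000");
        q.put("aggregation",
            "SELECT count(status), max_value(temperature) FROM root.ln.wf01.wt01");
        q.put("down sampling",
            "SELECT count(status), max_value(temperature) FROM root.ln.wf01.wt01 "
                + "GROUP BY (1h, [2017-11-03T00:00:00, 2017-11-03T23:00:00])");
        return q;
    }

    /** Runs one query and prints every row as "column=value" pairs. */
    static void printResult(Connection connection, String sql) throws SQLException {
        try (Statement statement = connection.createStatement();
             ResultSet rs = statement.executeQuery(sql)) {
            ResultSetMetaData meta = rs.getMetaData();
            int columns = meta.getColumnCount();
            while (rs.next()) {
                StringBuilder line = new StringBuilder();
                for (int i = 1; i <= columns; i++) { // JDBC columns are 1-based
                    if (i > 1) {
                        line.append(", ");
                    }
                    line.append(meta.getColumnLabel(i)).append('=').append(rs.getString(i));
                }
                System.out.println(line);
            }
        }
    }
}
```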
43. Using Spark to Analyze Data in TsFile
create table
sql query
read TsFile
write to TsFile
https://iotdb.apache.org/#/Documents/progress/chap7/sec3
44. Using Spark to Analyze Data in IoTDB
import org.apache.iotdb.spark.db._
// Query IoTDB
val df = spark.read.format("org.apache.iotdb.spark.db")
  .option("url", "jdbc:iotdb://127.0.0.1:6667/")
  .option("sql", "select * from root")
  .load
// create table
df.createOrReplaceTempView("iotdb_table")
// sql query
val newDf = spark.sql("select * from iotdb_table")
https://iotdb.apache.org/#/Documents/progress/chap7/sec4
45. Using Hive to Analyze Data in TsFile
Create table
hive> CREATE EXTERNAL TABLE IF NOT EXISTS only_sensor_1(
        time_stamp TIMESTAMP,
        sensor_1 BIGINT)
      ROW FORMAT SERDE 'org.apache.iotdb.hive.TsFileSerDe'
      STORED AS
        INPUTFORMAT 'org.apache.iotdb.hive.TSFHiveInputFormat'
        OUTPUTFORMAT 'org.apache.iotdb.hive.TSFHiveOutputFormat'
      LOCATION '/data/data/sequence/root.baic2.WWS.leftfrontdoor/'
      TBLPROPERTIES ('device_id'='root.baic2.WWS.leftfrontdoor.plc1');
Set format
hive> set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
Query
hive> select * from only_sensor_1 limit 10;
https://iotdb.apache.org/#/Documents/progress/chap7/sec5
46. Store Data in HDFS
https://iotdb.apache.org/#/Documents/progress/chap8/sec3
• build server and Hadoop module
• copy the target jar of Hadoop module into server target lib folder
• Edit user config in iotdb-engine.properties
47. Using Grafana to Visualize Data
https://iotdb.apache.org/#/Documents/progress/chap7/sec1
• Install simple-json-datasource plugin
• Config iotdb-grafana-connector
  • application.properties
• Start iotdb-grafana-connector
  • java -jar iotdb-grafana-0.8.0.war
• Add IoTDB data source (SimpleJson)
  • choose connector IP
• Config dashboard and Enjoy!
49. A Process to Manage Time Series Data
[Figure: pipeline diagram. Labels: data source; JDBC / Session API; Grafana-Adaptor; Spark-TsFile-Adaptor; JDBC; Visualization (manual data exploration); Analysis with Big Data Framework (big data set); Analysis with Matlab (small data set)]
50. Example
RocketMQ Consumer
data source
RocketMQ Producer
Via JDBC / Session API
Create storage group
Create time series
Insert data into IoTDB
Fetch raw data
Pack data into a Message
Code example: https://github.com/apache/incubator-iotdb/pull/676