22. 2. Index structures for special queries
CIKM 2016: PISA: An Index for Aggregating Big Time Series Data
Records only root nodes in memory and builds virtual trees,
reducing memory cost and disk I/O
Fast Aggregation Method for Time Series
26. Concepts in IoTDB (The Schema)
Device (i.e., Data source)
• A machine instance
Measurement (e.g., sensor)
• A device can have many measurements
Time Series
• Device + Measurement
• is represented as a path that begins with root, like
“root.Cadillac_XT5.USA.CA.7BTC409.fuelRemain”
Storage Group (SG)
• A storage group can have many devices
• Storage groups have independent resources
(threads and files) to increase parallelism and
reduce lock contention.
Cadillac XT5
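The naming scheme on this slide can be sketched in a few lines of Python. This is only an illustration, not IoTDB code: `split_path` and its arguments are hypothetical names, and the sketch assumes that the last node of a path is the measurement and that the storage group is one of a known list of prefixes.

```python
def split_path(path, storage_groups):
    """Split a time series path into (storage_group, device, measurement)."""
    nodes = path.split(".")
    if nodes[0] != "root" or len(nodes) < 3:
        raise ValueError(f"not a valid time series path: {path}")
    # Assumption: the last node is the measurement, the rest is the device.
    device = ".".join(nodes[:-1])
    measurement = nodes[-1]
    # The storage group is the registered prefix that the device falls under.
    for sg in storage_groups:
        if device == sg or device.startswith(sg + "."):
            return sg, device, measurement
    raise ValueError(f"no storage group set for {path}")


sg, device, measurement = split_path(
    "root.Cadillac_XT5.USA.CA.7BTC409.fuelRemain", ["root.Cadillac_XT5"]
)
print(sg, device, measurement)
```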
27. The schema mapping 1
root.Cadillac_XT5.USA.CA.7BTC409.fuelRemain
root.Cadillac_XT5.USA.CA.7BTC409.speed
root.Cadillac_XT5.USA.NV.6BAC321.speed
country  state  device name  timestamp  fuelRemain  speed
USA      CA     7BTC409      t1         5.0         120
USA      CA     7BTC409      t2         4.9         109
USA      NV     6BAC321      t1         NULL        50
USA      NV     6BAC321      t3         NULL        65
Table Name: Cadillac_XT5 (RDB schema or NoSQL like Cassandra)
Tags and Fields in InfluxDB, KairosDB, OpenTSDB…
OLTP Schema        IoTDB Schema
Table              Storage group
Dimension column   Device, timestamp
Metric column      Measurement
28. The schema mapping 2
root.Cadillac_XT5.USA.CA.7BTC409.fuelRemain
root.Cadillac_XT5.USA.CA.7BTC409.speed
root.Cadillac_XT5.USA.NV.6BAC321.speed
timestamp fuelRemain speed
t1 5.0 120
t2 4.9 109
Table Name: USA.CA.7BTC409
Tags and Fields in InfluxDB, KairosDB, OpenTSDB…
timestamp speed
t1 50
t3 65
Table Name: USA.NV.6BAC321
Database: root.Cadillac_XT5
Database   Storage group
Table      Device
Column     Measurement / Sensor
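The second mapping (one table per device, one column per measurement) can be illustrated with a small Python sketch. The function name `to_device_tables` and the nested-dict layout are assumptions made for this example only. The sample points are the ones shown on the slide.

```python
from collections import defaultdict


def to_device_tables(points):
    """Group (path, timestamp, value) points into one table per device.

    Returns {device: {timestamp: {measurement: value}}}.
    """
    tables = defaultdict(lambda: defaultdict(dict))
    for path, timestamp, value in points:
        # Everything before the last dot is the device (the "table"),
        # the last node is the measurement (the "column").
        device, measurement = path.rsplit(".", 1)
        tables[device][timestamp][measurement] = value
    return {device: dict(rows) for device, rows in tables.items()}


points = [
    ("root.Cadillac_XT5.USA.CA.7BTC409.fuelRemain", "t1", 5.0),
    ("root.Cadillac_XT5.USA.CA.7BTC409.speed", "t1", 120),
    ("root.Cadillac_XT5.USA.CA.7BTC409.fuelRemain", "t2", 4.9),
    ("root.Cadillac_XT5.USA.CA.7BTC409.speed", "t2", 109),
    ("root.Cadillac_XT5.USA.NV.6BAC321.speed", "t1", 50),
    ("root.Cadillac_XT5.USA.NV.6BAC321.speed", "t3", 65),
]
tables = to_device_tables(points)
for device, rows in tables.items():
    print(device)
    for timestamp, row in rows.items():
        print("  ", timestamp, row)
```

Unlike the first mapping, a device without a given measurement simply has no such column, so no NULL cells are needed.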
30. Set Storage Group
SET STORAGE GROUP TO root.ln;
Create Timeseries
CREATE TIMESERIES root.ln.wf01.wt01.temperature WITH DATATYPE=FLOAT, ENCODING=RLE
Insert Data
INSERT INTO root.ln.wf02.wt02(timestamp, status) VALUES (1, true);
Delete Data
DELETE FROM root.ln.wf02.wt02.status WHERE time < 1000;
Query Data (Filter, Aggregation, Group by time interval)
SELECT count(status), max_value(temperature) FROM root.ln.wf01.wt01
GROUP BY (1h, [2017-11-03T00:00:00, 2017-11-03T23:00:00]);
SQL in IoTDB
31. Set Storage Group
SET STORAGE GROUP TO root.ln;
SET STORAGE GROUP TO root.sgcc;
Show Storage Group
SHOW STORAGE GROUP;
Delete Storage Group
DELETE STORAGE GROUP root.ln;
Storage Group Statement
32. Create Timeseries
CREATE TIMESERIES root.ln.wf01.wt01.status WITH DATATYPE=BOOLEAN, ENCODING=PLAIN
CREATE TIMESERIES root.ln.wf01.wt01.temperature WITH DATATYPE=FLOAT, ENCODING=RLE
CREATE TIMESERIES root.ln.wf02.wt02.hardware WITH DATATYPE=TEXT, ENCODING=PLAIN
CREATE TIMESERIES root.ln.wf02.wt02.status WITH DATATYPE=BOOLEAN, ENCODING=PLAIN
CREATE TIMESERIES root.sgcc.wf03.wt01.status WITH DATATYPE=BOOLEAN, ENCODING=PLAIN
CREATE TIMESERIES root.sgcc.wf03.wt01.temperature WITH DATATYPE=FLOAT, ENCODING=RLE
Show Timeseries
SHOW TIMESERIES root
SHOW TIMESERIES root.ln
Other Timeseries Operation
COUNT TIMESERIES root
DELETE TIMESERIES root.ln.wf01.wt01
Timeseries Statement
33. Insert Data
INSERT INTO root.ln.wf01.wt01(timestamp, status) VALUES (1, true);
INSERT INTO root.ln.wf02.wt02(timestamp, status, hardware, temperature, software, type)
VALUES (1, false, "v1", 12.0, "v2", 3);
Delete Data
DELETE FROM root.ln.wf02.wt02.status WHERE time < 1000;
DELETE FROM root.ln.wf02.wt02.* WHERE time < 1000;
Insert and Delete Data
34. Select a Column of Data Based on a Time Interval
SELECT temperature FROM root.ln.wf01.wt01 WHERE time < 2017-11-01T00:08:00.000;
Choose Multiple Columns of Data for Different Devices According to Multiple Time Intervals
SELECT wf01.wt01.status, wf02.wt02.hardware FROM root.ln
WHERE (time > 2017-11-01T00:05:00.000 and time < 2017-11-01T00:12:00.000)
or (time >= 2017-11-01T16:35:00.000 and time <= 2017-11-01T16:37:00.000);
Query Data
35. Down-Frequency Aggregate Query
SELECT count(status), max_value(temperature) FROM root.ln.wf01.wt01
WHERE time > 2017-11-03T06:00:00 and temperature > 20
GROUP BY (1h, [2017-11-03T00:00:00, 2017-11-03T23:00:00]);
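What a down-frequency (GROUP BY time interval) query computes can be sketched in plain Python. This is not IoTDB's implementation: `group_by_time` is a hypothetical helper, it aggregates a single series rather than two, the sample points are made up, and it assumes half-open buckets `[start, start + interval)`, which may differ from how IoTDB treats the interval boundaries.

```python
from datetime import datetime, timedelta


def group_by_time(points, start, end, interval, value_filter=None):
    """Down-sample (timestamp, value) points into fixed-width buckets.

    Returns a list of (bucket_start, count, max_value); an empty bucket
    reports count 0 and max_value None.
    """
    buckets = []
    bucket_start = start
    while bucket_start < end:
        bucket_end = min(bucket_start + interval, end)
        values = [
            v for t, v in points
            if bucket_start <= t < bucket_end
            and (value_filter is None or value_filter(v))
        ]
        buckets.append((bucket_start, len(values), max(values) if values else None))
        bucket_start = bucket_end
    return buckets


# Hypothetical temperature readings.
points = [
    (datetime(2017, 11, 3, 6, 10), 21.5),
    (datetime(2017, 11, 3, 6, 40), 19.0),
    (datetime(2017, 11, 3, 7, 5), 25.2),
]
result = group_by_time(
    points,
    start=datetime(2017, 11, 3, 6),
    end=datetime(2017, 11, 3, 8),
    interval=timedelta(hours=1),
    value_filter=lambda v: v > 20,  # mirrors "temperature > 20"
)
for bucket_start, count, max_value in result:
    print(bucket_start.isoformat(), count, max_value)
```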
Automated Fill
SELECT temperature FROM root.ln.wf03.wt01 WHERE time = 2017-11-01T16:37:50.000
FILL(float[previous, 1m])
Query Data
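The idea behind `FILL(float[previous, 1m])` can be illustrated as follows: when no point exists at the queried time, take the most recent earlier point, provided it is no older than the given window. The sketch below is an assumption-laden illustration, not IoTDB code: `fill_previous` is a hypothetical name, timestamps are plain millisecond integers, and a point exactly at the window boundary is treated as still valid.

```python
import bisect


def fill_previous(series, query_time, max_gap):
    """Return the value at query_time, or the nearest earlier value.

    series is a list of (timestamp, value) sorted by timestamp.
    Returns None when no point lies within max_gap before query_time.
    """
    timestamps = [t for t, _ in series]
    # Index of the last point whose timestamp is <= query_time.
    i = bisect.bisect_right(timestamps, query_time) - 1
    if i < 0:
        return None
    t, value = series[i]
    return value if query_time - t <= max_gap else None


# Hypothetical readings, timestamps in milliseconds.
series = [(0, 20.0), (30_000, 21.0)]
ONE_MINUTE = 60_000
print(fill_previous(series, 30_000, ONE_MINUTE))   # exact hit
print(fill_previous(series, 50_000, ONE_MINUTE))   # filled from 30_000
print(fill_previous(series, 200_000, ONE_MINUTE))  # too old, nothing to fill
```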
36. Supported data types
• Boolean
• Int
• Long
• Float
• Double
• String
• GPS (TODO) -> for trajectory data management
• Array (TODO) -> for unstructured data management
37. Count Nodes Statement
COUNT NODES root LEVEL=2
COUNT NODES root.ln.wf01 LEVEL=3
Show Devices Statement
SHOW DEVICES
Show Child Paths of Root Statement
SHOW CHILD PATHS
Show Child Paths Statement
SHOW CHILD PATHS root
SHOW CHILD PATHS root.ln.wf01
Metadata Related Statement
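As a rough illustration of what `COUNT NODES ... LEVEL=n` reports, the following Python sketch counts distinct path prefixes of a given depth. It assumes that `root` is level 0, which should be checked against the IoTDB documentation. `count_nodes` is a hypothetical helper, and the paths are the six time series created on the earlier Timeseries Statement slide.

```python
def count_nodes(series_paths, prefix, level):
    """Count distinct nodes at the given level (root = 0) under prefix."""
    prefix_nodes = prefix.split(".")
    found = set()
    for path in series_paths:
        nodes = path.split(".")
        under_prefix = nodes[:len(prefix_nodes)] == prefix_nodes
        if under_prefix and len(nodes) > level:
            # A node is identified by its full path from root.
            found.add(".".join(nodes[:level + 1]))
    return len(found)


series_paths = [
    "root.ln.wf01.wt01.status",
    "root.ln.wf01.wt01.temperature",
    "root.ln.wf02.wt02.hardware",
    "root.ln.wf02.wt02.status",
    "root.sgcc.wf03.wt01.status",
    "root.sgcc.wf03.wt01.temperature",
]
print(count_nodes(series_paths, "root", 2))
print(count_nodes(series_paths, "root.ln.wf01", 3))
```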
38. Set TTL
SET TTL to root.ln 3600000
Unset TTL
UNSET TTL to root.ln
TTL Statement
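The effect of a TTL can be sketched as a simple cutoff on timestamps. The example assumes the TTL value is in milliseconds (so 3600000 is one hour) and that a point exactly at the cutoff is kept. `apply_ttl` and the sample data are hypothetical and only illustrate the idea.

```python
def apply_ttl(points, ttl_ms, now_ms):
    """Keep only the points that are not older than ttl_ms at time now_ms."""
    cutoff = now_ms - ttl_ms
    return [(t, v) for t, v in points if t >= cutoff]


# Hypothetical points, timestamps in milliseconds.
points = [(6_000_000, 1.0), (7_000_000, 2.0), (9_999_999, 3.0)]
alive = apply_ttl(points, ttl_ms=3_600_000, now_ms=10_000_000)
print(alive)
```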
39. Using JDBC to write data
set storage group
create timeseries
insert data
https://iotdb.apache.org/#/Documents/progress/chap4/sec2
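The three steps of this slide (set storage group, create timeseries, insert data) amount to sending three kinds of statements in order. The sketch below only builds the statement strings, using the syntax shown on the earlier SQL slides. `build_write_statements` and `format_value` are hypothetical helpers, no escaping is done, and in a real client each string would be executed through a JDBC `Statement`.

```python
def format_value(value):
    """Render a Python value as a literal for an INSERT statement."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, str):
        return '"' + value + '"'
    return str(value)


def build_write_statements(storage_group, series, rows):
    """Return the statements for the three write steps, in order.

    series: {full_path: (datatype, encoding)}
    rows:   list of (device, timestamp, {measurement: value})
    """
    statements = [f"SET STORAGE GROUP TO {storage_group}"]
    for path, (datatype, encoding) in series.items():
        statements.append(
            f"CREATE TIMESERIES {path} WITH DATATYPE={datatype}, ENCODING={encoding}"
        )
    for device, timestamp, values in rows:
        columns = ", ".join(["timestamp", *values])
        literals = ", ".join([str(timestamp), *map(format_value, values.values())])
        statements.append(f"INSERT INTO {device}({columns}) VALUES ({literals})")
    return statements


statements = build_write_statements(
    "root.ln",
    {"root.ln.wf01.wt01.status": ("BOOLEAN", "PLAIN")},
    [("root.ln.wf01.wt01", 1, {"status": True})],
)
for statement in statements:
    print(statement)
```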
40. Using Session API to write Data
(more efficient)
set storage group
create timeseries
insert data
https://iotdb.apache.org/#/Documents/progress/chap4/sec3
41. Using Session API to write Data in Batch
(more efficient)
https://iotdb.apache.org/#/Documents/progress/chap4/sec3
Set Measurement
Build batch
Insert
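The batch pattern on this slide (set the measurements, build a batch, insert it) can be sketched without any IoTDB dependency. The `Batch` class and `write_in_batches` below are hypothetical and are not the Session API. The `insert` callback stands in for whatever call actually sends the batch to the server.

```python
class Batch:
    """Hypothetical column-oriented buffer for one device."""

    def __init__(self, device, measurements, max_rows):
        self.device = device
        self.measurements = list(measurements)  # step 1: set measurements
        self.max_rows = max_rows
        self.timestamps = []
        self.columns = {m: [] for m in self.measurements}

    def add_row(self, timestamp, values):
        # step 2: build the batch row by row
        self.timestamps.append(timestamp)
        for measurement, value in zip(self.measurements, values):
            self.columns[measurement].append(value)

    def is_full(self):
        return len(self.timestamps) >= self.max_rows

    def reset(self):
        self.timestamps.clear()
        for column in self.columns.values():
            column.clear()


def write_in_batches(rows, batch, insert):
    """Call insert(batch) each time the batch fills up, then once for the rest."""
    flushes = 0
    for timestamp, values in rows:
        batch.add_row(timestamp, values)
        if batch.is_full():
            insert(batch)  # step 3: insert
            batch.reset()
            flushes += 1
    if batch.timestamps:
        insert(batch)
        batch.reset()
        flushes += 1
    return flushes


sent = []  # records how many rows each simulated insert carried
rows = [(t, (t * 1.5, t % 2 == 0)) for t in range(5)]
batch = Batch("root.ln.wf01.wt01", ["temperature", "status"], max_rows=2)
flushes = write_in_batches(rows, batch, lambda b: sent.append(len(b.timestamps)))
print(flushes, sent)
```

Sending many rows per request is what makes the batch interface cheaper than one INSERT per row: the per-request overhead is paid once per batch.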
42. Using JDBC to Query Data
raw data query
aggregation query
down sampling query
print result
https://iotdb.apache.org/#/Documents/progress/chap4/sec2
43. Using Spark to Analyze Data in TsFile
create table
sql query
read TsFile
write to TsFile
https://iotdb.apache.org/#/Documents/progress/chap7/sec3
44. Using Spark to Analyze Data in IoTDB
create table
sql query
Query IoTDB
import org.apache.iotdb.spark.db._
val df = spark.read.format("org.apache.iotdb.spark.db")
.option("url","jdbc:iotdb://127.0.0.1:6667/")
.option("sql","select * from root").load
df.createOrReplaceTempView("iotdb_table")
val newDf = spark.sql("select * from iotdb_table")
https://iotdb.apache.org/#/Documents/progress/chap7/sec4
45. Using Hive to Analyze Data in TsFile
• hive> CREATE EXTERNAL TABLE IF NOT EXISTS only_sensor_1(time_stamp TIMESTAMP, sensor_1 BIGINT)
ROW FORMAT SERDE 'org.apache.iotdb.hive.TsFileSerDe'
STORED AS
INPUTFORMAT 'org.apache.iotdb.hive.TSFHiveInputFormat'
OUTPUTFORMAT 'org.apache.iotdb.hive.TSFHiveOutputFormat'
LOCATION '/data/data/sequence/root.baic2.WWS.leftfrontdoor/'
TBLPROPERTIES ('device_id'='root.baic2.WWS.leftfrontdoor.plc1');
• hive> set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
• hive> select * from only_sensor_1 limit 10;
Create table
Set format
query
https://iotdb.apache.org/#/Documents/progress/chap7/sec5
46. Store Data in HDFS
https://iotdb.apache.org/#/Documents/progress/chap8/sec3
• build server and Hadoop module
• copy the target jar of Hadoop module into server target lib folder
• Edit user config in iotdb-engine.properties
47. Using Grafana to Visualize Data
https://iotdb.apache.org/#/Documents/progress/chap7/sec1
• Install the simple-json-datasource plugin
• Configure iotdb-grafana-connector
• application.properties
• Start iotdb-grafana-connector
• java -jar iotdb-grafana-0.8.0.war
• Add an IoTDB data source (SimpleJson)
• choose connector IP
• Configure the dashboard and enjoy!
49. A Process to Manage Time Series Data
[Figure: data flow. A data source writes through JDBC / Session API; the downstream consumers are:]
• Grafana-Adaptor: Visualization (manual data exploration)
• Spark-TsFile-Adaptor: Analysis with Big Data Framework (big data set)
• JDBC: Analysis with Matlab (small data set)
50. Example
RocketMQ Consumer
data source
RocketMQ Producer
Via JDBC / Session API
Create storage group
Create time series
Insert data into IoTDB
Get raw data
Pack the data into a Message
Code example: https://github.com/apache/incubator-iotdb/pull/676
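The shape of this pipeline can be sketched in Python without RocketMQ or IoTDB. Everything here is a stand-in: a `queue.Queue` plays the role of the message broker, a dict plays the role of IoTDB, JSON is an arbitrary choice of message format, and `produce` and `consume` are hypothetical names. The actual example is the code behind the link on the slide.

```python
import json
import queue


def produce(raw_points, broker):
    """Producer side: get raw data and pack each point into a message."""
    for device, timestamp, measurement, value in raw_points:
        message = json.dumps(
            {"device": device, "time": timestamp,
             "measurement": measurement, "value": value}
        )
        broker.put(message)


def consume(broker, store):
    """Consumer side: unpack each message and write it into the store."""
    while not broker.empty():
        message = json.loads(broker.get())
        path = message["device"] + "." + message["measurement"]
        # setdefault stands in for "create time series if it does not exist".
        series = store.setdefault(path, {})
        series[message["time"]] = message["value"]  # "insert data"


broker = queue.Queue()  # stand-in for RocketMQ
store = {}              # stand-in for IoTDB
produce(
    [("root.ln.wf01.wt01", 1, "status", True),
     ("root.ln.wf01.wt01", 2, "status", False)],
    broker,
)
consume(broker, store)
print(store)
```

Putting a queue between the data source and the database decouples the two sides, so the consumer can write at its own pace while the producer keeps collecting.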