Big Machine Data - Two Exemplary Applications in China
Jianmin Wang
Tsinghua University
Beijing, China
Agenda
• Background
• Two Exemplary Applications in China
Who are we?
• Institute for Data Science,
Tsinghua University
• Founded in April 2014
• Missions & Status Quo
– Recruiting world-class researchers and engineers from industry and academia
– Long-term dedication to system research and industry practice
– Leading China’s big data strategy, especially for industrial big data
BIG data
Big Data Landscape (figure): people-generated, computer-generated, and machine-generated data
Machine Generated Data
• Found broadly across
– Industrial business
– Agriculture
– Utility
– Military
– Smarter City
– Logistics
– Smart devices
– Science research
Data Rate
24×7, up to a million data points/s, from millions of devices
Data Type
Mostly time series, temporal sequences, spatial-temporal data, and array data
Data Usage
Real-time processing, from monitoring to content-, shape-, and signal-based query and analysis
• Industrial businesses have entered the era of “big data”, but the real challenge is how to extract value from the data.
• Machine-generated data is the core of industrial big data.
Big machine data goes beyond the 3Vs
Our research spans the big data lifecycle: 1 Storage, 2 Preprocessing, 3 Access & Exploration, 4 Modeling & Analytics
1 Industrial Sensor Data Management:
Cassandra at China Sany Group
2 Climate Data Management:
Cassandra at China Meteorological Administration
© 2015. All Rights Reserved.
China Sany Group
More than 200K active engineering machines in more than 150 countries
SANY Group is a global company in the construction machinery industry. In 2011, SANY became the only Chinese construction machinery company listed among the world’s top 500 companies.
Pipeline of Industrial Sensor Data Processing
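(Figure: machines carry a SANY motion controller (SYMC), a SANY industrial display (SYLD), a SANY mobile terminal (SYMT), and the product’s main control cabinet. Vehicle operating-condition data, packed with the SCP protocol, passes through wireless base stations and a wireless-to-wired link over the Internet to a designated IP and port, and reaches user computers, service staff (rapid-response engineers, documentation engineers, ...), and owners. Stages: collect, transfer, decision, execute.)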
The data records the operational statuses of the machines:
• 5000 kinds of sensors
• 50 billion records per year
Timeline:
• 2008: started managing sensor data
• 2010: 60K machines
• 2012: 80K machines; only 6 months of data could be kept online
• 2014: >100K machines; all data online
• 2020 (target): >500K machines, >10K users
Technology Roadmap in Sany Group
SQL Server ➡ Oracle ➡ Cassandra
Why Cassandra?
•Cost performance
•Scalability
•P2P Architecture
Operation | Worst case | Average case
Write | 30% | 2x
Query | 22.6% | 10x
Software Stack of Sensor Data Management
Collect: Storm · Store: Cassandra · Analyze: Map/Reduce
(Figure: Cassandra storage layout. Row key: device (device1, device2, …); one column family per sensor (sensor1 (cf1), sensor2 (cf2), …); column name: received time (received time1, received time2, …); column value: monitored value. The structured storage view is organized by machine, gather time, and sensors.)
Schema Design – Row and Column
• Use each sensor as a Column Family (CF)
• In each Column Family (CF)
– Use the machine as the row key
– Use the gather time as the column name
– Use the sensor value as the column value
– Columns in each row are kept sorted (by gather time)
– New columns can be added to a row at any time
(Figure: column families sensor1, sensor2, …, sensorN, each holding rows keyed by machine with gather-time columns; ~5000 sensors give 5000+ column families.)
Cassandra v1.2
CQL2 (not CQL3)
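To make the layout concrete, here is a minimal in-memory Python sketch that assumes nothing about Cassandra's real storage engine: each sensor gets its own column family, the machine is the row key, and columns are (gather time, value) pairs kept sorted, so a time-range query for one sensor touches only that sensor's data. The class `SensorStore`, its methods, and the sensor names are hypothetical illustrations, not the Sany code.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


class SensorStore:
    """Hypothetical in-memory model of the one-CF-per-sensor layout.

    sensor (column family) -> machine (row key) -> columns sorted by
    gather time, mirroring how Cassandra keeps a row's columns sorted.
    """

    def __init__(self):
        # Each row is a pair of parallel lists: sorted gather times, values.
        self._cfs = defaultdict(lambda: defaultdict(lambda: ([], [])))

    def write(self, sensor, machine, gather_time, value):
        times, values = self._cfs[sensor][machine]
        i = bisect_left(times, gather_time)
        if i < len(times) and times[i] == gather_time:
            values[i] = value  # same column name: the newer value wins
        else:
            times.insert(i, gather_time)
            values.insert(i, value)

    def slice(self, sensor, machine, start, end):
        """Columns with start <= gather_time <= end; touches only one CF."""
        if sensor not in self._cfs or machine not in self._cfs[sensor]:
            return []
        times, values = self._cfs[sensor][machine]
        lo, hi = bisect_left(times, start), bisect_right(times, end)
        return list(zip(times[lo:hi], values[lo:hi]))

    def column_families(self):
        return sorted(self._cfs)


store = SensorStore()
store.write("oil_temp", "machine-1", 30, 55.2)
store.write("oil_temp", "machine-1", 10, 54.0)
store.write("oil_temp", "machine-1", 20, 54.7)
store.write("pressure", "machine-1", 10, 18.0)
print(store.slice("oil_temp", "machine-1", 10, 20))  # [(10, 54.0), (20, 54.7)]
```

Keeping columns sorted at write time is what makes the range read a pair of binary searches rather than a scan.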
Why 5000+ Column Families?
• Cassandra v1.x with CQL2 does not support compound primary keys (partition key plus clustering key)
• This makes programming more complex: the row key or column name must be split manually
• All the data in one SSTable belongs to a single CF
• So when querying a specific sensor, no unrelated data needs to be scanned
Alternative layouts without per-sensor CFs:
Row Key | Column Name
machine_id | sensor_id : gather_time
sensor_id : machine_id | gather_time
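The manual splitting that the alternative layouts require can be sketched as follows. This is a hypothetical illustration, assuming `:` as the separator and zero-padded epoch milliseconds for the gather time; zero padding keeps string order equal to time order, which matters if the column comparator sorts names as strings. None of these helper names come from the talk.

```python
SEP = ":"        # separator, as in "sensor_id : gather_time" on the slide
TIME_WIDTH = 13  # hypothetical: zero-padded epoch milliseconds


def encode_time(t_ms):
    # Zero padding makes string order equal numeric time order.
    return f"{t_ms:0{TIME_WIDTH}d}"


def _check(part):
    if SEP in part:
        raise ValueError(f"{part!r} must not contain {SEP!r}")
    return part


def layout_1(machine_id, sensor_id, t_ms):
    """Row key = machine_id, column name = 'sensor_id:gather_time'."""
    return _check(machine_id), f"{_check(sensor_id)}{SEP}{encode_time(t_ms)}"


def layout_2(machine_id, sensor_id, t_ms):
    """Row key = 'sensor_id:machine_id', column name = gather_time."""
    return f"{_check(sensor_id)}{SEP}{_check(machine_id)}", encode_time(t_ms)


def split(composite):
    """The manual split the application must perform on every read."""
    left, right = composite.split(SEP, 1)
    return left, right


print(layout_1("m1", "s7", 1420070400000))  # ('m1', 's7:1420070400000')
print(layout_2("m1", "s7", 1420070400000))  # ('s7:m1', '1420070400000')
```

With layout 1, all sensors of a machine share one row (and one SSTable set), so reading a single sensor still means slicing past the other sensors' columns; the per-sensor CF design avoids that.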
Challenge 1 – Creating Schema Hang
• Problem
– Create 5000+ CFs in batch
– Creation cost increases dramatically
(Figure: time cost (ms) of creating each CF vs. CF serial number, for about 540 CFs; creating one CF takes about 0.1 s at first but about 10 s later.)
• Root Cause
– Protocol conflict
• Between the Gossip protocol and the request propagation mechanism
– Message overhead
• Gossip may transfer the whole schema instead of only the changed part
Memory used by Gossip
Node | Receive-schema message memory cost | Send-schema message memory cost | Total
N1 | 4.465G | 4.236G | 8.70G
N2 | 4.308G | 4.907G | 9.21G
N3 | 4.236G | 4.024G | 8.26G
N4 | 4.808G | 4.387G | 9.19G
N5 | 6.111G | 6.373G | 12.48G
Challenge 1 – Creating Schema Hang
• Solution
– Gossip takes effect only when:
• propagation messages are lost or time out
• nodes recover from a failure
– Creation time cost stays constant
(Figure: Adaptive Lazy Gossip. Node metadata (LOAD, STATUS, SCHEMA, VERSION, …) is exchanged by gossip, while schema changes are propagated directly; under the delay-T strategy, the SCHEMA state is gossiped only after a delay of t seconds.)
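The lazy-gossip rule can be sketched as a few small decision functions. This is a minimal illustration under stated assumptions, not Cassandra's `Gossiper` or the authors' patch: schema changes are pushed directly to peers, and schema gossip is used only for peers whose message was lost or timed out, or that just recovered. The delay-T check reflects our reading of the figure's "delay: t seconds". `Peer`, `deliver`, and the other names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    name: str
    schema_version: int = 0
    recovered: bool = False  # True right after the peer returns from a failure


def propagate_schema(version, peers, deliver):
    """Push a schema change straight to every peer (the fast path).

    deliver(peer, version) is a hypothetical transport callback returning
    True if the peer acknowledged in time, False if lost or timed out.
    Returns the names of peers that must fall back to gossip.
    """
    pending = set()
    for peer in peers:
        if deliver(peer, version):
            peer.schema_version = version
        else:
            pending.add(peer.name)
    return pending


def should_gossip_schema(peer, pending):
    """Lazy rule: gossip schema state only for lost/timed-out messages
    or for nodes that have just recovered from a failure."""
    return peer.name in pending or peer.recovered


def gossip_due(now, last_schema_change, delay_t):
    """Assumed delay-T rule: hold schema gossip until delay_t seconds have
    passed since the last change, so a burst of CF creations is not
    re-gossiped after every single change."""
    return now - last_schema_change >= delay_t
```

In this model the expensive full-schema exchange becomes a repair path for the rare failures, which is why creation cost no longer grows with the number of CFs.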
Challenge 2 – Balancing Consistency & Throughput
• Production environment
– Sany production: 5-node cluster, each node with 2×4 cores and 64 GB memory
• Problem
– Throughput = 200K data points/sec
– 75% of the data is written successfully to only one replica, while the other replicas are stale (inconsistent)
• Replicas are far from consistent under this load
• A big obstacle for queries
– Repair is required, but it is very slow
Experiment on Amazon EC2: 5 nodes, each with 2 cores and 8 GB memory
rywc: read-your-writes consistency
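The rywc measurement relates to the standard replica-overlap condition: with N replicas, a write acknowledged by W of them and a read contacting R of them are guaranteed to overlap only if W + R > N (in Cassandra terms, for example QUORUM writes plus QUORUM reads). The slide does not state the replication factor or consistency levels used, so the brute-force check below is a generic sketch of that rule, not a model of the Sany cluster.

```python
from itertools import combinations


def stale_read_possible(n, w, r):
    """Brute force: can a read that contacts r of n replicas miss every
    one of the w replicas that acknowledged the latest write?"""
    replicas = range(n)
    return any(
        not set(written) & set(read)
        for written in combinations(replicas, w)
        for read in combinations(replicas, r)
    )


# The brute force agrees with the usual rule: overlap is guaranteed iff w + r > n.
for n in range(1, 6):
    for w in range(1, n + 1):
        for r in range(1, n + 1):
            assert stale_read_possible(n, w, r) == (w + r <= n)

print(stale_read_possible(3, 1, 1))  # True: ONE/ONE on 3 replicas can read stale data
print(stale_read_possible(3, 2, 2))  # False: QUORUM/QUORUM always overlaps
```

When most writes land on only one replica, reads at a low consistency level hit stale replicas often, which is why repair becomes essential.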
Challenge 2 – Balancing Consistency & Throughput
• Root Cause of slow Repair
– Too many column families (5000+)
– Too many ranges in the consistent
hashing ring
• 256 virtual nodes (VN) per physical
node
• Too many Merkle trees (ranges × CFs)
• Experience and Suggestions
– Repair CFs and ranges one by one
• Do not repair the whole keyspace (all
CFs) at once
– Repair the important CFs first
– Perform repair during periods of light workload
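The repair advice amounts to a job scheduler that walks column families in priority order and token ranges one at a time, deferring work while the cluster is busy. A minimal sketch, assuming hypothetical hooks `current_load` and `repair_one` (in practice each job would map to one small, targeted repair call); the priority map and load threshold are illustrative.

```python
def repair_plan(column_families, token_ranges, priority):
    """Yield (cf, token_range) repair jobs, one CF and one range at a time.

    priority: hypothetical dict cf -> rank (lower = more important);
    CFs without a rank go last, and ties are broken by name.
    """
    ordered = sorted(column_families,
                     key=lambda cf: (priority.get(cf, float("inf")), cf))
    for cf in ordered:
        for token_range in token_ranges:
            yield cf, token_range


def run_when_light(jobs, current_load, max_load, repair_one):
    """Run each job only while load is light; return the deferred jobs.

    current_load() and repair_one(cf, token_range) are hypothetical hooks,
    e.g. a metrics probe and one small repair call per job.
    """
    deferred = []
    for cf, token_range in jobs:
        if current_load() <= max_load:
            repair_one(cf, token_range)
        else:
            deferred.append((cf, token_range))
    return deferred


jobs = list(repair_plan(["b", "a", "c"], [(0, 10), (10, 20)], {"c": 0, "a": 1}))
print(jobs[:2])  # [('c', (0, 10)), ('c', (10, 20))]
```

Small jobs keep each Merkle-tree build bounded, and deferred jobs can simply be retried in the next quiet window.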
Example:
- 5 physical nodes
- each has 2 VNs
- 10 ranges in total
For each range and each CF, a Merkle tree is created and compared between two nodes.
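To see why repair cost grows with ranges × CFs, the sketch below counts (range, CF) pairs and shows the per-pair comparison: each replica builds a hash tree over its data and the roots are compared, so matching roots mean nothing needs to be streamed. It is a simplified binary SHA-256 tree for illustration, not Cassandra's `MerkleTree` implementation, and the row data is hypothetical.

```python
import hashlib


def merkle_root(rows):
    """Root of a simple binary hash tree over (key, value) byte pairs.

    Rows are sorted first, so two replicas with the same data produce the
    same root regardless of arrival order.
    """
    level = [hashlib.sha256(k + b"\x00" + v).digest() for k, v in sorted(rows)]
    if not level:
        return hashlib.sha256(b"").digest()
    while len(level) > 1:
        if len(level) % 2:  # odd count: pair the last hash with itself
            level.append(level[-1])
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]


def range_cf_pairs(physical_nodes, vnodes_per_node, column_families):
    """Number of (token range, CF) pairs in the ring; each pair needs a
    tree on every replica taking part in the comparison."""
    return physical_nodes * vnodes_per_node * column_families


replica_a = [(b"m1:t1", b"55.2"), (b"m1:t2", b"54.0")]
replica_b = [(b"m1:t2", b"54.0"), (b"m1:t1", b"55.2")]
replica_c = [(b"m1:t1", b"55.2")]  # this replica missed a write
print(merkle_root(replica_a) == merkle_root(replica_b))  # True
print(merkle_root(replica_a) == merkle_root(replica_c))  # False: data must be streamed
print(range_cf_pairs(5, 2, 1))       # 10, the slide's toy example
print(range_cf_pairs(5, 256, 5000))  # 6400000 with 256 VNs and 5000 CFs
```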
Challenge 3 – Heterogeneous Nodes
• Problem
– How should data partitions be assigned in a heterogeneous cluster?
• Experiment Study
– Deploy a heterogeneous cluster
• 2 powerful servers and 8 PCs
– Throughput performance
• The heterogeneous cluster achieved lower throughput than a cluster with only the 2 powerful servers
Assign the positions of the nodes (i.e., tokens) in the ring according to their computing capacities
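A first approximation of capacity-based placement is to size each node's primary token range in proportion to its capacity. The sketch below does this for the Murmur3 token space ([-2^63, 2^63 - 1]); it ignores replication and virtual nodes, and the capacities are hypothetical. Tokens computed this way could be set through `initial_token` in `cassandra.yaml`, but this is not the QP-based method the authors describe.

```python
MIN_TOKEN = -(2**63)   # Murmur3Partitioner token range is [-2**63, 2**63 - 1]
MAX_TOKEN = 2**63 - 1
RING_SIZE = 2**64


def capacity_tokens(capacities):
    """One token per node so that each node's primary range is proportional
    to its capacity. A node owns (previous token, its token]; the first node
    also owns the wrap-around, so the last token is MAX_TOKEN."""
    total = sum(capacities)
    tokens, cumulative = [], 0
    for c in capacities:
        cumulative += c
        tokens.append(MIN_TOKEN + RING_SIZE * cumulative // total - 1)
    return tokens


def ownership(tokens):
    """Fraction of the ring owned by each node (replication ignored)."""
    shares, prev = [], MIN_TOKEN - 1
    for t in tokens:
        shares.append((t - prev) / RING_SIZE)
        prev = t
    return shares


# Hypothetical cluster: 2 powerful servers (capacity 4) and 2 PCs (capacity 1).
tokens = capacity_tokens([4, 4, 1, 1])
print(ownership(tokens))  # ≈ [0.4, 0.4, 0.1, 0.1]
```

Integer arithmetic is used for the tokens because 64-bit token values exceed what a float can represent exactly.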
Challenge 3 – Heterogeneous Nodes
• Root Cause
– The replica mechanism complicates the imbalance problem
• Each node’s configuration may affect other nodes’ performance
– The Virtual Node (VN) mechanism cannot fit all scenarios
• Too many VNs make the lookup table too big and slow down repair
• The maximum number of VNs per physical node is 1536 (restricted by the Cassandra source code)
N1 has the lowest capacity (and E is short), yet N1 is responsible for many of the cluster’s data records:
• N5 finishes its operations quickly
• but N5 has to wait for N1, which is slow
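The replica effect can be shown with a little arithmetic. Assuming SimpleStrategy-style placement (each range is stored on its owner and the next rf - 1 nodes clockwise), capacity-proportional primary ranges still leave some weak nodes storing far more than their share; because a cluster-wide operation waits for its slowest node, the whole cluster slows down. The capacities below are hypothetical.

```python
def replica_load(shares, rf):
    """Fraction of the ring stored on each node when a range is replicated
    to its owner plus the next rf - 1 nodes clockwise (SimpleStrategy-style):
    node j stores the primary ranges of nodes j, j-1, ..., j-rf+1."""
    n = len(shares)
    return [sum(shares[(j - k) % n] for k in range(rf)) for j in range(n)]


def cluster_finish_time(loads, capacities):
    """If every node must finish its share, the slowest node decides."""
    return max(load / cap for load, cap in zip(loads, capacities))


capacities = [4, 1, 1, 4, 4]                          # hypothetical nodes
shares = [c / sum(capacities) for c in capacities]    # capacity-proportional ranges
loads = replica_load(shares, rf=3)
ideal = sum(loads) / sum(capacities)                  # perfectly balanced finish time
print(cluster_finish_time(loads, capacities) / ideal)  # about 3: the weak node at index 1 stores 9/14 of the ring
```

This is why node positions have to be optimized jointly with the replication scheme rather than set from capacities alone.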
Challenge 3 – Heterogeneous Nodes
• Solution
– Initialize the cluster properly
• Use Quadratic Programming (QP) to find the best positions of the (virtual) servers
• Successfully deployed at China Sany Group
– Scale out the cluster
• Use a dynamic algorithm to find the best positions for newly added servers
Scaling out: Xiangdong Huang, Jianmin Wang et al. Optimizing Data Partition for Scaling out NoSQL Cluster. Concurrency and Computation: Practice and Experience (Early View)
Scaling out: finding the best position
Optimize:
1. the order of the nodes in the ring
2. the range lengths in the ring
Datasets & Results in China Sany Group
• 5000+ column families for sensor data
• 100K+ engineering machines
• Historical data loaded from April 2012 to the present
• Data size
– Tens of billions of operational-status records
– Several billion GPS records
• Write throughput
– 5 nodes (2×4-core CPUs, 64 GB memory, 9 TB disk)
– 20K TPS under regular workload, 200K at peak
Industrial Big Data Platform: More Requirements
——Beyond Sany Applications
• Even higher throughput: high-frequency sensors, high volumes of sensors, 10+ M data points/second
• Native time-series query: time- and value-based queries, a richer set of analytical queries, <1 second response
• Synchronization: edge synchronization with compression, out-of-order data, and retransmission
• Adaptive deep compression: different algorithms for different data, transparent to queries, deep compression of historical data
• Moving object support: spatial-temporal index, trajectory-based queries
Industrial Data Analysis Pipeline
(Figure: specified operational-status data, with Boolean, status, and analogue values (1.046 billion), feeds 8030 basic indicators; a baseline and a variance are computed for each, yielding common features, specific features, and outliers.)
Indicators, each with a baseline and a variance:
• General: count, frequency, …
• Analogue: average, variance, extremum, …
• Boolean: times, duration, …
• State changes: times, duration, …
Applications: driver profile; hydraulic oil temperature analysis; temporal parameter analysis for vehicle start; parameter correlation; spatial analysis for failure; key components anomaly detection (for Service, Quality Control, and R&D)
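The baseline-and-variance idea behind this pipeline can be sketched for a single indicator: compute a fleet-wide baseline and spread, then flag machines that deviate strongly. This minimal sketch assumes the mean as baseline, the population standard deviation as spread, and a z-score threshold of 3; these choices and the sample data are hypothetical, not taken from the talk.

```python
from statistics import mean, pstdev


def fleet_outliers(indicator_by_machine, z_threshold=3.0):
    """Compare each machine's indicator with the fleet baseline.

    Baseline = fleet mean, spread = population standard deviation
    (a hypothetical choice); machines more than z_threshold spreads away
    from the baseline are returned as outliers.
    """
    values = list(indicator_by_machine.values())
    baseline, spread = mean(values), pstdev(values)
    if spread == 0:
        return baseline, spread, []
    outliers = sorted(m for m, v in indicator_by_machine.items()
                      if abs(v - baseline) / spread > z_threshold)
    return baseline, spread, outliers


# Hypothetical indicator: daily count of hydraulic-oil over-temperature alarms.
counts = {f"pump-{i:02d}": 10.0 for i in range(20)}
counts["pump-99"] = 100.0
baseline, spread, outliers = fleet_outliers(counts)
print(outliers)  # ['pump-99']
```

With one extreme value among n machines, this z-score can never exceed sqrt(n - 1) (about 4.47 for 21 machines), so small fleets need a lower threshold or a robust baseline such as the median.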
Industry Practice – Value-Added Analytics
(Figure: horizontal inclination angle)
Tip-over of concrete pump trucks is mainly caused by insufficient support from the leg (outrigger) cylinders, and it is a major production-safety issue.
Big Data Application 1
—Concrete Pump Truck Tip-over Detection
Quickly spot and prevent dangerous operations through group behavior analysis of concrete pump trucks
(Figure: the overall distribution of the horizontal (X-axis) and vertical (Y-axis) inclination angles of concrete pumps, and the inclination angle distribution of an individual concrete pump; an inclination-angle vibration-level filter separates three kinds of instances.)
• Idle instances: unplugging operations lead to malfunctioning
• Unstable instances: early degradation pattern of the cylinder
• Typical instances: stable oscillation
Data-driven detection of anomalies and potential accidents
Big Data Application 2
—Fault Diagnosis
Investigation proved that the salt-spray environment and the water quality along the seaside caused corrosion of the cylinder’s potted component.
Time-series pattern analysis and spatial correlation showed that the master cylinder leakage problem was highly correlated with a high-speed rail construction project.
(Figure labels: Hangzhou-Shenzhen high-speed rail; salt-spray corrosion environment)
Big Data Application 3
- Spare Components Demand Forecasting
• The traditional approach is based on marketers’ experience
• New approach
– Combines real-time data from machines, sales history, vehicle holdings, environment, GDP, etc.
• Result
– Inactive spare-part inventory reduced by half
(Figure: spare-part demand (units) per ten-day period (early, mid, and late month) from October 2012 to June 2013; series: actual spare-part demand, results of multi-region collaborative spare components prediction based on matrix factorization, and the company’s actual prepared stock.)
The predicted result fits the actual demand better.
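The forecasting slide names matrix factorization without giving the model. As a generic illustration only (not the authors' multi-region collaborative model, which also draws on sales history, holdings, environment, and GDP), the sketch below factorizes a regions × periods demand matrix with stochastic gradient descent on the observed cells and reads the forecast off the missing cell. The data, rank, and learning parameters are hypothetical.

```python
import math
import random


def factorize(matrix, rank=2, epochs=2000, lr=0.02, reg=0.001, seed=0):
    """Fit matrix ~= U V^T on observed cells only (None = unknown)."""
    rng = random.Random(seed)
    observed = [(i, j, v) for i, row in enumerate(matrix)
                for j, v in enumerate(row) if v is not None]
    scale = max(v for _, _, v in observed) or 1.0  # fit on values in [0, 1]
    U = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in matrix]
    V = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in matrix[0]]
    for _ in range(epochs):
        for i, j, v in observed:
            err = v / scale - sum(a * b for a, b in zip(U[i], V[j]))
            for k in range(rank):
                u, w = U[i][k], V[j][k]
                U[i][k] += lr * (err * w - reg * u)
                V[j][k] += lr * (err * u - reg * w)
    return U, V, scale


def predict(U, V, scale, i, j):
    return scale * sum(a * b for a, b in zip(U[i], V[j]))


def rmse(matrix, U, V, scale):
    """Root-mean-square error on the observed cells, in original units."""
    errs = [(v - predict(U, V, scale, i, j)) ** 2
            for i, row in enumerate(matrix) for j, v in enumerate(row)
            if v is not None]
    return math.sqrt(sum(errs) / len(errs))


demand = [                        # rows: regions, columns: periods
    [120.0, 80.0, 150.0, None],   # region A: last period to forecast
    [60.0, 40.0, 75.0, 50.0],     # region B
    [240.0, 160.0, 300.0, 200.0], # region C
]
U, V, scale = factorize(demand, rank=1)
print(round(predict(U, V, scale, 0, 3)))  # forecast for region A, last period
```

Because the toy regions are exact multiples of each other, a rank-1 fit should put the forecast close to 100; real demand data would need rank and regularization chosen by validation.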
1 Industrial Data Management:
Cassandra at China Sany Group
2 Climate Data Management:
Cassandra at China Meteorological Administration
Pipeline of Climate Data Processing
(Figure: 1 Collection of ground, aerological, satellite, radar, lightning, and typhoon observations; 2 Transmission over the Internet to the data center; 3 Access; 4 Browsing. Model products such as T639 cover the wind field, temperature field, humidity, rainfall, snowfall, …, at pressure levels such as 800, 850, and 900 hPa, issued at 8 AM and 8 PM with forecast ageing of 3 h, 6 h, ….)
Characteristics of Climate Data
Challenge in Meteorological Application
—Pattern Data
• Hierarchical pattern data, plus other flat data
• A highly efficient data delivery system for end users
– Support millions of small files
– Access data fast
– Scan data in various orders
• Performance requirement
– Get ~1 MB of data in 50 ms
– 600 concurrent clients
(Figure: the pattern-data hierarchy with depths d1 to d5: / → model (T639, …) → variable (wind, temperature, …) → level (800, 850, 900) → issue time (2014.2.18.08, 2014.2.18.20, 2014.2.19.08, …) → ageing (3, 6, 9, …).)
Why Cassandra?
• Scalability
• Fast data reads and writes
• Columns within a row are kept sorted
– Easy to scan data sequentially
• Time-based compaction (>= Cassandra v2.0) for time series
Example row (columns sorted by ageing):
key | 3h | 6h | 9h | …
T639/temperature/800Pa | file | file | file | …
1. Get the data where key = ‘T639…/800Pa’
2. Retrieve the data before 6h, or retrieve the data after 6h
A longer row:
key | 3h | 6h | 9h | 12h | … | 3h | 6h | 9h
T639/temperature/800Pa | file | file | file | file | … | file | file | file
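Because a row's columns are sorted by ageing, "before 6h" and "after 6h" are each a single contiguous slice found by binary search. The sketch below models one row as a sorted list of (ageing, file) pairs; the row contents and function names are hypothetical, and in CQL3 terms the same access would be a range restriction on a clustering column (for example `ageing < 6`).

```python
from bisect import bisect_left, bisect_right

# Hypothetical stand-in for the slide's example row: columns sorted by
# ageing (hours), each column value naming a stored file.
row = {
    "key": "T639/temperature/800Pa",
    "columns": [(3, "file@3h"), (6, "file@6h"), (9, "file@9h"), (12, "file@12h")],
}


def before(columns, ageing, inclusive=False):
    """Columns with ageing < limit (or <= when inclusive): one prefix slice."""
    ages = [a for a, _ in columns]
    end = bisect_right(ages, ageing) if inclusive else bisect_left(ages, ageing)
    return columns[:end]


def after(columns, ageing, inclusive=False):
    """Columns with ageing > limit (or >= when inclusive): one suffix slice."""
    ages = [a for a, _ in columns]
    start = bisect_left(ages, ageing) if inclusive else bisect_right(ages, ageing)
    return columns[start:]


print(before(row["columns"], 6))  # [(3, 'file@3h')]
print(after(row["columns"], 6))   # [(9, 'file@9h'), (12, 'file@12h')]
```

A real store seeks within the sorted SSTable instead of rebuilding the `ages` list, but the access pattern is the same: one seek, then a sequential read.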
Solution – Schema Design for Pattern Data
• Data items
– Each item is a 5-tuple: (pattern, variable, level, time, ageing)
– Pattern and variable are unordered
– Level, time, and ageing are ordered
(Figure: the data space spanned by (pattern, variable), level, time, and ageing, mapped onto Cassandra’s column family, row key, and column.)
Performance Results
• 10 servers, each with 2×4 CPU cores, 64 GB memory, and a 9 TB SAS disk
• Stores 7 kinds of model data
– 16 TB per day
• Gets data quickly
– 100 times faster than the previous system
Thank you

DataStax Academy
 
Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2
DataStax Academy
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First Cluster
DataStax Academy
 
Real Time Analytics with Dse
Real Time Analytics with DseReal Time Analytics with Dse
Real Time Analytics with Dse
DataStax Academy
 
Introduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraIntroduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache Cassandra
DataStax Academy
 
Cassandra Core Concepts
Cassandra Core ConceptsCassandra Core Concepts
Cassandra Core Concepts
DataStax Academy
 
Enabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseEnabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax Enterprise
DataStax Academy
 
Bad Habits Die Hard
Bad Habits Die Hard Bad Habits Die Hard
Bad Habits Die Hard
DataStax Academy
 
Advanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraAdvanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache Cassandra
DataStax Academy
 

More from DataStax Academy (20)

Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftForrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
 
Introduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseIntroduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph Database
 
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraIntroduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
 
Cassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsCassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart Labs
 
Cassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingCassandra 3.0 Data Modeling
Cassandra 3.0 Data Modeling
 
Cassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackCassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stack
 
Data Modeling for Apache Cassandra
Data Modeling for Apache CassandraData Modeling for Apache Cassandra
Data Modeling for Apache Cassandra
 
Coursera Cassandra Driver
Coursera Cassandra DriverCoursera Cassandra Driver
Coursera Cassandra Driver
 
Production Ready Cassandra
Production Ready CassandraProduction Ready Cassandra
Production Ready Cassandra
 
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
 
Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1
 
Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First Cluster
 
Real Time Analytics with Dse
Real Time Analytics with DseReal Time Analytics with Dse
Real Time Analytics with Dse
 
Introduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraIntroduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache Cassandra
 
Cassandra Core Concepts
Cassandra Core ConceptsCassandra Core Concepts
Cassandra Core Concepts
 
Enabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseEnabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax Enterprise
 
Bad Habits Die Hard
Bad Habits Die Hard Bad Habits Die Hard
Bad Habits Die Hard
 
Advanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraAdvanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache Cassandra
 
Advanced Cassandra
Advanced CassandraAdvanced Cassandra
Advanced Cassandra
 

Recently uploaded

When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
nkrafacyberclub
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
Pierluigi Pugliese
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Nexer Digital
 
Welocme to ViralQR, your best QR code generator.
Welocme to ViralQR, your best QR code generator.Welocme to ViralQR, your best QR code generator.
Welocme to ViralQR, your best QR code generator.
ViralQR
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
RinaMondal9
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
 

Recently uploaded (20)

When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
 
Welocme to ViralQR, your best QR code generator.
Welocme to ViralQR, your best QR code generator.Welocme to ViralQR, your best QR code generator.
Welocme to ViralQR, your best QR code generator.
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 

Tsinghua University: Two Exemplary Applications in China

  • 1. Big Machine Data - Two Exemplary Applications in China Jianmin Wang Tsinghua University Beijing, China
  • 2. Agenda • Background • Two Exemplary Applications in China
  • 3. Who are we? • Institute for Data Science, Tsinghua University • Founded in April 2014 • Missions & Status Quo – Recruiting world-class researchers and engineers from industry and academia – Long-term dedication to systems research and industry practice – Leading China’s big data strategy, especially for industrial big data
  • 4. BIG data: Big Data Landscape • People-generated • Computer-generated • Machine-generated
  • 5. Machine-Generated Data • Exists broadly – Industrial business – Agriculture – Utilities – Military – Smart cities – Logistics – Smart devices – Scientific research • Data rate: 24/7, up to a million data points per second, from millions of devices • Data type: mostly time series, temporal sequences, and spatio-temporal and array data • Data usage: real-time processing, from monitoring to content-, shape-, and signal-based query and analysis
  • 6. • Industrial businesses have entered the era of “big data”, but the real challenge is how to extract value from the data. • Machine-generated data is the core of industrial big data. • Big machine data goes beyond the 3Vs.
  • 7. Our research spans the big data lifecycle: (1) Storage, (2) Preprocessing, (3) Access & Exploration, (4) Modeling & Analytics
  • 8. Agenda • Background • Two Exemplary Applications in China
  • 9. (1) Industrial Sensor Data Management: Cassandra at China Sany Group (2) Climate Data Management: Cassandra at China Meteorological Administration © 2015. All Rights Reserved.
  • 10. China Sany Group • More than 200K active construction machines in more than 150 countries • SANY Group is a global company in the construction machinery industry. In 2011, SANY became the only Chinese construction machinery company listed among the world’s top 500 companies.
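  • 11. Pipeline of Industrial Sensor Data Processing • On-machine devices: SANY Motion Controller (SYMC), SANY Industrial Display (SYLD), SANY Mobile Terminal (SYMT), and the product's main control cabinet • Vehicle operating-condition data is packaged using the SCP protocol, sent through wireless base stations, bridged from wireless to wired networks, and delivered over the Internet to a designated IP and port • Users: rapid-response engineers, documentation engineers, service staff, and owners, working on their own computers • Four stages: collect, transfer, decision, execute • The data records the operational statuses of the machines: 5000 kinds of sensors, 50 billion records per year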
  • 12. Technology Roadmap in Sany Group • 2008: started managing sensor data • 2010: 60K machines • 2012: 80K machines; only 6 months of data could be kept online • 2014: >100K machines; all data online • 2020 (projected): >500K machines, >10K users • Database migrations: SQL Server ➡ Oracle, then Oracle ➡ Cassandra • Why Cassandra? Cost performance, scalability, P2P architecture • Performance by operation: Write 30% (worst case), 2x (average case); Query 22.6% (worst case), 10x (average case)
  • 13. Software Stack of Sensor Data Management • Layers: Collect, Store, Analyze (components shown include Storm and Map/Reduce) • Storage layout: the device is the row key; each sensor (operating condition) is a column family; the received time is the column name; the monitored value is the column value • Example: device1 → sensor1 (cf1): received time1 = value, received time2 = value, …; sensor2 (cf2): received time1 = value, received time2 = value, …; device2 → …
  • 14. Structured Storage vs. Cassandra Storage • Schema Design: Row and Column – Use each sensor as a Column Family (CF) – In each CF: the machine is the row key, the gather time is the column name, and the sensor reading is the column value – Columns of each row are kept sorted by column name – The number of columns can be increased easily • ~5000 sensors → 5000+ column families • Cassandra v1.2, CQL2 (not CQL3)
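The per-sensor layout above can be pictured with a small in-memory model. This is a minimal sketch, not Cassandra code. `WideRowStore`, the sensor name `oil_temp`, and the machine ID `machine-42` are hypothetical, and gather times are assumed to be integer timestamps. The sketch shows why sorted columns turn a time-range read on one machine's row into a cheap contiguous slice.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


class WideRowStore:
    """Toy model (hypothetical class) of the per-sensor column-family layout:
    column family (sensor) -> row key (machine) -> columns sorted by gather time."""

    def __init__(self):
        self._cfs = defaultdict(dict)  # sensor -> {machine: [(gather_time, value), ...]}

    def insert(self, sensor, machine, gather_time, value):
        row = self._cfs[sensor].setdefault(machine, [])
        times = [t for t, _ in row]
        i = bisect_left(times, gather_time)
        if i < len(row) and row[i][0] == gather_time:
            # Same column name: the newer value overwrites the old one.
            row[i] = (gather_time, value)
        else:
            # Insert at the sorted position so columns stay ordered by time.
            row.insert(i, (gather_time, value))

    def time_slice(self, sensor, machine, start, end):
        """Readings with start <= gather_time <= end, in time order."""
        row = self._cfs.get(sensor, {}).get(machine, [])
        times = [t for t, _ in row]
        return row[bisect_left(times, start):bisect_right(times, end)]


store = WideRowStore()
store.insert("oil_temp", "machine-42", 1000, 71.5)
store.insert("oil_temp", "machine-42", 900, 70.9)
store.insert("oil_temp", "machine-42", 1100, 72.3)
print(store.time_slice("oil_temp", "machine-42", 950, 1100))
# prints [(1000, 71.5), (1100, 72.3)]
```

A read for one sensor touches only that sensor's column family. That is the property the next slide uses to justify 5000+ CFs.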
  • 15. Why 5000+ Column Families? • Cassandra v1.x, used here through CQL2, does not support compound primary keys with clustering columns • With a single CF, programming becomes more complex because the row key or column name must be split manually: either row key machine_id with column name sensor_id:gather_time, or row key sensor_id:machine_id with column name gather_time • All the data in one SSTable belongs to a single CF, so a query on one sensor does not need to scan unnecessary data • Cassandra v1.2, CQL2 (not CQL3)
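The two single-CF alternatives force the client to build and parse composite names by hand. Below is a minimal sketch of that bookkeeping. The ':' separator and the 13-digit zero-padded integer timestamps are hypothetical choices, not taken from the slides, and so are the function names.

```python
def make_row_key(sensor_id: str, machine_id: str) -> str:
    # Layout 2: row key "sensor_id:machine_id", column name = gather_time.
    return f"{sensor_id}:{machine_id}"


def make_column_name(sensor_id: str, gather_time: int) -> str:
    # Layout 1: row key = machine_id, column name "sensor_id:gather_time".
    # Zero-padding keeps lexical (byte-wise) order equal to numeric time order.
    return f"{sensor_id}:{gather_time:013d}"


def split_column_name(name: str) -> tuple[str, int]:
    # Undo make_column_name by splitting on the last ':'.
    sensor_id, _, ts = name.rpartition(":")
    return sensor_id, int(ts)


def readings_for_sensor(row: dict[str, float], sensor_id: str) -> list[tuple[int, float]]:
    # In layout 1 every sensor's columns share the machine's row, so the client
    # filters on the hand-built prefix and parses each column name itself.
    prefix = sensor_id + ":"
    return [
        (split_column_name(name)[1], value)
        for name, value in sorted(row.items())
        if name.startswith(prefix)
    ]


row = {
    make_column_name("s2", 5): 1.0,
    make_column_name("s1", 10): 2.0,
    make_column_name("s1", 9): 3.0,
}
print(readings_for_sensor(row, "s1"))  # [(9, 3.0), (10, 2.0)]
```

Without the padding, "s1:9" would sort after "s1:10". This is one example of the extra care the manual encoding demands.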
  • 16. Challenge 1: Creating Schema Hangs • Problem – Creating 5000+ CFs in a batch – Creation cost increases dramatically: creating one CF goes from about 0.1 s to about 10 s as the batch proceeds [Figure: time cost (ms) per CF creation vs. CF serial number] • Root Cause – Protocol conflict between the gossip protocol and the request-propagation mechanism – Message overhead: the whole schema may be transferred instead of only the changed part
    Memory used by gossip:
    Node | Receive-schema message memory | Send-schema message memory | Total
    N1   | 4.465 GB | 4.236 GB | 8.70 GB
    N2   | 4.308 GB | 4.907 GB | 9.21 GB
    N3   | 4.236 GB | 4.024 GB | 8.26 GB
    N4   | 4.808 GB | 4.387 GB | 9.19 GB
    N5   | 6.111 GB | 6.373 GB | 12.48 GB
  • 17. Challenge 1: Creating Schema Hangs • Solution: Adaptive Lazy Gossip – Schema changes are propagated directly; gossip takes effect only when: • propagation messages are lost or time out • nodes recover from a failure – The creation time cost stays constant [Figure: Adaptive Lazy Gossip; node metadata (LOAD, STATUS, SCHEMA, VERSION, …) is gossiped after a delay of T seconds]
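A toy cost model helps explain why shipping the whole schema hurts and what the lazy rule changes. It makes a hypothetical assumption: every CREATE is sent to each of the other nodes, and the model counts the CF definitions transmitted. The function names are illustrative. This is not Cassandra's gossip implementation.

```python
def full_schema_cost(num_cfs: int, num_nodes: int) -> int:
    """CF definitions transmitted if each CREATE ships the whole schema to every peer."""
    peers = num_nodes - 1
    # After the k-th CREATE the schema holds k CF definitions.
    return sum(k * peers for k in range(1, num_cfs + 1))


def delta_cost(num_cfs: int, num_nodes: int) -> int:
    """CF definitions transmitted if each CREATE ships only the new CF."""
    return num_cfs * (num_nodes - 1)


def lazy_gossip_targets(peers, acked, recovered):
    """Peers that still need the schema via gossip after the delay T: those whose
    direct propagation was lost or timed out, plus nodes that just recovered."""
    return sorted(p for p in peers if p not in acked or p in recovered)


print(full_schema_cost(5000, 5))  # 50010000
print(delta_cost(5000, 5))        # 20000
print(lazy_gossip_targets(["n2", "n3", "n4", "n5"], acked={"n2", "n3", "n4"}, recovered={"n2"}))
# ['n2', 'n5']
```

Under this model, whole-schema shipping grows quadratically with the number of CFs, while change-only shipping grows linearly. That is consistent with the per-CF creation cost rising during the batch and staying constant after the fix.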
  • 18. Challenge 2: Balancing Consistency & Throughput • Production environment – Sany production: 5-node cluster, each node with 2×4 cores and 64 GB • Problem – Throughput = 200K data points/s – 75% of the data is written successfully to only one replica, while the other replicas are stale (inconsistent) • Cassandra is NOT very consistent under this load • This is a big obstacle for queries – Repair is required, but is very slow [Figure: experiment on Amazon EC2, 5 nodes with 2 cores and 8 GB each; rywc = read-your-writes consistency]
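The consistency side of this trade-off can be stated with the usual quorum-overlap rule. A read is guaranteed to see the latest acknowledged write when the number of replicas read (R) plus the number written (W) exceeds the replication factor (RF). The sketch below assumes RF = 3, since the slides do not give Sany's replication factor, and it ignores failures and hinted handoff.

```python
def quorum(rf: int) -> int:
    return rf // 2 + 1


# Number of replicas that must respond at each consistency level.
REPLICAS_REQUIRED = {
    "ONE": lambda rf: 1,
    "QUORUM": quorum,
    "ALL": lambda rf: rf,
}


def read_your_writes(rf: int, write_level: str, read_level: str) -> bool:
    """True if every read replica set must overlap every write replica set,
    i.e. R + W > RF, so a read always reaches at least one up-to-date replica."""
    w = REPLICAS_REQUIRED[write_level](rf)
    r = REPLICAS_REQUIRED[read_level](rf)
    return r + w > rf


for w, r in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
    print(w, r, read_your_writes(3, w, r))
# ONE ONE False
# QUORUM QUORUM True
# ONE ALL True
```

Writing at ONE maximizes throughput but can leave the other replicas stale until repair, which is the situation the slide describes. Raising W or R restores read-your-writes at some cost in throughput.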
  • 19. Challenge 2: Balancing Consistency & Throughput • Root cause of slow repair – Too many column families (5000+) – Too many ranges in the consistent-hashing ring: 256 virtual nodes (VNs) per physical node – Too many Merkle trees (ranges × CFs) • Experience and suggestions – Repair CFs and ranges one by one; do not repair the whole keyspace (all CFs) at once – Repair the important CFs first – Perform repair under light workload [Figure: 5 physical nodes, each with 2 VNs, giving 10 ranges in total; for each range and each CF, a Merkle tree is built and compared between two nodes]
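The Merkle-tree arithmetic and the one-by-one repair order can be sketched as follows. The counts use the slide's figures: 5 physical nodes, 256 VNs each, and 5000+ CFs rounded to 5000. `repair_plan` is a hypothetical helper that only orders the work. It does not invoke Cassandra, and scheduling each unit into a light-workload window is left out.

```python
from itertools import product


def merkle_tree_count(physical_nodes: int, vnodes_per_node: int, num_cfs: int) -> int:
    # One Merkle tree per (token range, column family) pair: ranges x CFs.
    ranges = physical_nodes * vnodes_per_node
    return ranges * num_cfs


def repair_plan(cfs, ranges, important):
    """Small repair units, one (CF, range) pair at a time, important CFs first."""
    first = [cf for cf in important if cf in cfs]
    rest = sorted(cf for cf in cfs if cf not in important)
    return list(product(first + rest, ranges))


print(merkle_tree_count(5, 2, 1))       # 10 (the diagram's 10 ranges)
print(merkle_tree_count(5, 256, 5000))  # 6400000
print(repair_plan(["b", "a", "c"], [0, 1], important=["c"]))
# [('c', 0), ('c', 1), ('a', 0), ('a', 1), ('b', 0), ('b', 1)]
```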
  • 20. Challenge 3: Heterogeneous Nodes • Problem – How should data partitions be assigned in a heterogeneous cluster? • Experimental study – Deploy a heterogeneous cluster of 2 powerful servers and 8 PCs – Compare the throughput of the heterogeneous cluster with that of a cluster with only the 2 powerful servers • Assign the positions of the nodes (i.e., tokens) in the ring according to their computing capacities
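One simple way to place tokens by capacity is to give each node a share of the ring proportional to its capacity. This is a minimal sketch under stated assumptions:
- It uses the Murmur3Partitioner token space.
- Each node gets one token.
- Replication is ignored.

It is therefore a simpler stand-in for the authors' QP-based placement, not an implementation of it.

```python
# Murmur3Partitioner token space: -2**63 .. 2**63 - 1.
MIN_TOKEN = -(2**63)
RING_SIZE = 2**64


def proportional_tokens(capacities):
    """One token per node; node i owns (token[i-1], token[i]], and node 0 owns the
    wrap-around range, so each node's share is proportional to its capacity."""
    total = sum(capacities)
    tokens, cumulative = [], 0
    for cap in capacities:
        cumulative += cap
        tokens.append(MIN_TOKEN + (RING_SIZE * cumulative) // total - 1)
    return tokens


def ownership(tokens):
    """Fraction of the ring owned by each node, for sorted tokens whose last
    entry is the maximum token (as produced by proportional_tokens)."""
    sizes = [tokens[0] - MIN_TOKEN + 1]
    sizes += [b - a for a, b in zip(tokens, tokens[1:])]
    return [s / RING_SIZE for s in sizes]


tokens = proportional_tokens([2, 1, 1])  # one server with twice the capacity of each PC
print(ownership(tokens))  # [0.5, 0.25, 0.25]
```

Such tokens could be supplied through each node's `initial_token` in cassandra.yaml with `num_tokens: 1`. Because replicas are also placed on neighboring nodes, the actual load can still differ from these shares.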
  • 21. Challenge 3: Heterogeneous Nodes • Root Cause – The replica mechanism makes the imbalance problem complicated: each node’s configuration may affect other nodes’ performance – The virtual node (VN) mechanism cannot fit all scenarios: too many VNs make the lookup table too big and slow down repair; the maximum number of VNs per physical node is 1536 (restricted by the Cassandra source code) • Example: N1 has the lowest capacity but is responsible for many of the cluster's data records, so N5 finishes its operations quickly but then has to wait for the slow N1
  • 22. Challenge 3: Heterogeneous Nodes • Solution – Initialize the cluster properly: use quadratic programming (QP) to find the best positions of the (virtual) servers; this has been deployed successfully at China Sany Group – Scaling out the cluster: use a dynamic algorithm to find the best positions for newly added servers, optimizing (1) the order of the nodes in the ring and (2) the range lengths in the ring • Reference (scaling out): Xiangdong Huang, Jianmin Wang et al. Optimizing Data Partition for Scaling out NoSQL Cluster. Concurrency and Computation: Practice and Experience (Early View).
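The slides do not spell out the scale-out algorithm, so the following is only a hypothetical greedy heuristic in the same spirit. It places the new node in the middle of the range of whichever node carries the most data per unit of capacity. It assumes the Murmur3Partitioner token space and one token per node. It is not the algorithm of the cited paper.

```python
MIN_TOKEN = -(2**63)
RING_SIZE = 2**64


def range_sizes(tokens):
    """Size of each node's range (prev, token]; tokens must be sorted."""
    sizes = []
    for i, t in enumerate(tokens):
        prev = tokens[i - 1] if i > 0 else tokens[-1] - RING_SIZE  # wrap-around
        sizes.append(t - prev)
    return sizes


def greedy_new_token(tokens, capacities):
    """Hypothetical heuristic: split the range of the node with the highest
    load (range size) per unit of capacity; the new node takes the lower half."""
    sizes = range_sizes(tokens)
    i = max(range(len(tokens)), key=lambda k: sizes[k] / capacities[k])
    start = tokens[i] - sizes[i]  # exclusive start of node i's range
    token = start + sizes[i] // 2
    if token < MIN_TOKEN:  # map back into the token space
        token += RING_SIZE
    return token


tokens = [-1, 2**63 - 1]  # two nodes, each owning half of the ring
print(greedy_new_token(tokens, capacities=[1, 2]))  # -4611686018427387905
```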
  • 23. Datasets & Results in China Sany Group • 5000+ column families for sensor data • 100K+ construction machines • Historical data loaded from April 2012 to the present • Data size – Tens of billions of operational-status records – Several billion GPS records • Write throughput – 5 nodes (2×4-core CPUs, 64 GB memory, 9 TB disk each) – 20K TPS under regular workload, 200K at peak
  • 24. Industrial Big Data Platform: More Requirements Beyond Sany Applications • Even higher throughput: high-frequency sensors, high-volume sensors, 10M+ data points/second • Native time-series query: time- and value-based queries, a richer set of analytical queries, <1 second response • Synchronization: edge synchronization with compression, out-of-order data, and retransmission • Adaptive deep compression: different data, different algorithms; transparent to queries; deep compression of historical data • Moving object support: spatio-temporal index, trajectory-based queries
  • 25. Industrial Data Analysis Pipeline • Input: specified operational-status data (Boolean values, status values, analogue values) • Stages: basic indicators (1.046 billion) → baselines (8,030) → variances (1.046 billion) → specific features, common features, outliers • Indicators, each with its own baseline and variance: – General: count, frequency – Analogue: average, variance, extremum – Boolean: times, duration – State changes: times, duration
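The indicator → baseline → variance → outlier chain can be sketched for a single analogue indicator. This rests on three assumptions:
- The baseline is the mean of an indicator across machines.
- The slide's "variance" is read as each machine's deviation from that baseline.
- Outliers are flagged by a hypothetical z-score threshold of 2.

The function names and machine IDs are illustrative.

```python
from statistics import mean, pstdev


def basic_indicators(readings):
    """A few of the basic indicators for one machine's analogue signal."""
    return {"count": len(readings), "average": mean(readings), "extremum": max(readings)}


def change_times(states):
    """Number of state changes in a Boolean/status series."""
    return sum(1 for a, b in zip(states, states[1:]) if a != b)


def deviations_and_outliers(per_machine, indicator="average", z=2.0):
    """Baseline = mean of the indicator across machines; deviation = distance from
    the baseline; outliers = machines more than z standard deviations away."""
    values = {m: basic_indicators(r)[indicator] for m, r in per_machine.items()}
    vals = list(values.values())
    baseline = mean(vals)
    spread = pstdev(vals)
    deviation = {m: v - baseline for m, v in values.items()}
    outliers = sorted(m for m, d in deviation.items() if spread > 0 and abs(d) > z * spread)
    return baseline, deviation, outliers


per_machine = {f"m{i}": [69.0, 71.0] for i in range(9)}  # made-up readings
per_machine["m9"] = [94.0, 96.0]
baseline, deviation, outliers = deviations_and_outliers(per_machine)
print(baseline, outliers)  # 72.5 ['m9']
```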
  • 26. Industry Practice: Value-Added Analytics (service, quality control, R&D) • Driver profiles • Hydraulic oil temperature analysis • Temporal parameter analysis for vehicle start • Parameter correlation • Spatial analysis of failures • Anomaly detection for key components
  • 27. Big Data Application 1: Concrete Pump Truck Tip-over Detection • A concrete pump truck tipping over is mainly caused by insufficient support from the leg cylinders, a major production-safety issue [Figure: horizontal inclination angle]
  • 28. Big Data Application 1: Concrete Pump Truck Tip-over Detection • Quickly spot and prevent dangerous operations through group-behavior analysis of concrete pump trucks [Figures: overall distribution of horizontal (X-axis) and vertical (Y-axis) inclination angles of concrete pumps, with unstable and idle instances marked; inclination-angle vibration-level filter; inclination-angle distribution of an individual concrete pump]
  • 29. Big Data Application 1: Concrete Pump Truck Tip-over Detection • Data-driven detection of anomalies and potential accidents – Idle instances: an unplugging operation leads to malfunction – Unstable instances: early degradation pattern of the cylinder – Typical instances: stable oscillation
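A vibration-level filter that separates idle, unstable, and typical instances could be as simple as thresholding the spread of each pump's inclination angle. This sketch assumes the vibration level is the population standard deviation of the angle series and uses made-up thresholds. It is not the method used at Sany.

```python
from statistics import pstdev


def classify_instance(angles, idle_max=0.05, unstable_min=1.0):
    """Label one pump's inclination-angle series by its vibration level.

    Assumptions: vibration level = population standard deviation of the angle;
    idle_max and unstable_min are hypothetical thresholds in degrees.
    """
    level = pstdev(angles)
    if level <= idle_max:
        return "idle"      # (almost) no signal, e.g. after an unplugging operation
    if level >= unstable_min:
        return "unstable"  # large swings, possibly early cylinder degradation
    return "typical"       # stable oscillation


print(classify_instance([0.0] * 10))                   # idle
print(classify_instance([0.3, -0.3] * 5))              # typical
print(classify_instance([0.0, 2.0, -2.0, 2.0, -2.0]))  # unstable
```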
  • 30. Big Data Application 2: Fault Diagnosis • Time-series pattern analysis and spatial correlation showed that the leakage problem of the master cylinder is highly correlated with a high-speed rail construction project, the Hangzhou-Shenzhen high-speed railway • Investigation proved that the salt-spray environment and the water quality along the seaside caused corrosion of the cylinder’s potted component
  • 31. Big Data Application 3: Spare Components Demand Forecasting • The traditional approach is based on marketers’ experience • New approach – Combine real-time data from machines, sales history, vehicle holdings, environment, GDP, etc. • Result – Inactive spare-part inventory reduced by half – The predicted result fits the actual demand better [Figure: results of multi-region collaborative spare-component prediction based on matrix factorization; spare-part demand (units) per ten-day period (early, mid, and late month), October 2012 to June 2013, comparing actual demand, the prediction, and the quantity actually stocked]
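Matrix-factorization forecasting treats demand as a region × period matrix and predicts the missing cells from a low-rank fit. The sketch below is a deliberately small stand-in:
- It fits a rank-1 model by alternating least squares.
- The numbers are made up.
- It uses none of the machine, sales, environment, or GDP inputs described on the slide.

```python
def rank1_als(matrix, iterations=200, reg=1e-9):
    """Fit matrix[i][j] ~= u[i] * v[j] on observed (non-None) cells by alternating
    least squares, and return the completed matrix of predictions."""
    rows, cols = len(matrix), len(matrix[0])
    u = [1.0] * rows
    v = [1.0] * cols
    for _ in range(iterations):
        # Fix v and solve each row factor u[i] in closed form.
        for i in range(rows):
            obs = [j for j in range(cols) if matrix[i][j] is not None]
            num = sum(matrix[i][j] * v[j] for j in obs)
            den = sum(v[j] ** 2 for j in obs) + reg
            u[i] = num / den
        # Fix u and solve each column factor v[j] in closed form.
        for j in range(cols):
            obs = [i for i in range(rows) if matrix[i][j] is not None]
            num = sum(matrix[i][j] * u[i] for i in obs)
            den = sum(u[i] ** 2 for i in obs) + reg
            v[j] = num / den
    return [[u[i] * v[j] for j in range(cols)] for i in range(rows)]


demand = [
    [10, 20, 30, 40],    # region A, four ten-day periods (made-up numbers)
    [20, 40, 60, 80],    # region B
    [30, 60, 90, None],  # region C, last period unknown
]
filled = rank1_als(demand)
print(round(filled[2][3]))  # 120
```

The "collaborative" aspect is that region C's missing value is inferred from the patterns shared with regions A and B.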
  • 32. (1) Industrial Data Management: Cassandra at China Sany Group (2) Climate Data Management: Cassandra at China Meteorological Administration
  • 33. Pipeline of Climate Data Processing • Stages: (1) Collection, (2) Transmission, (3) Access, (4) Browsing, linking the data center and end users over the Internet
  • 35. Challenge in Meteorological Application: Pattern Data • Hierarchical pattern data, plus other flat data • A highly efficient data-delivery system for end users – Support millions of small files – Access data fast – Scan data in various orders • Performance requirements – Get ~1 MB of data in 50 ms – 600 concurrent clients [Figure: hierarchy of pattern data, e.g. model T639 → variable (wind, temperature, …) → level (800, 850, 900) → time (2014.2.18.08, 2014.2.18.20, 2014.2.19.08) → ageing (3, 6, 9)]
  • 36. Why Cassandra? • Scalability • Fast data reads and writes • Columns within a row are kept sorted, so data is easy to scan sequentially • Time-based compaction (Cassandra ≥ v2.0) for time series • Example: row key T639/temperature/800Pa with columns 3h, 6h, 9h, 12h, …, each holding a file – 1. Get the data where key = ‘T639…/800Pa’ – 2. Retrieve the data before 6h, or the data after 6h
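The example row maps naturally onto a sorted column list per row key. Below is a minimal in-memory sketch. It assumes the row key is pattern/variable/level as in the example, that ageing is an integer number of hours, and that "before" and "after" exclude 6h itself. `PatternRow`, `row_key`, and the file payloads are hypothetical.

```python
from bisect import bisect_left, bisect_right


def row_key(pattern, variable, level):
    return f"{pattern}/{variable}/{level}"


class PatternRow:
    """Columns of one row, kept sorted by forecast ageing (hours)."""

    def __init__(self):
        self.ageings = []  # sorted ageing values
        self.files = []    # file payload for each ageing, same order

    def put(self, ageing_h, payload):
        i = bisect_left(self.ageings, ageing_h)
        if i < len(self.ageings) and self.ageings[i] == ageing_h:
            self.files[i] = payload  # overwrite an existing column
        else:
            self.ageings.insert(i, ageing_h)
            self.files.insert(i, payload)

    def before(self, ageing_h):
        """Columns with ageing strictly before ageing_h (a contiguous slice)."""
        i = bisect_left(self.ageings, ageing_h)
        return list(zip(self.ageings[:i], self.files[:i]))

    def after(self, ageing_h):
        """Columns with ageing strictly after ageing_h (a contiguous slice)."""
        i = bisect_right(self.ageings, ageing_h)
        return list(zip(self.ageings[i:], self.files[i:]))


table = {}
key = row_key("T639", "temperature", "800Pa")
row = table.setdefault(key, PatternRow())
for hours in (9, 3, 12, 6):
    row.put(hours, f"file-{hours}h".encode())
print(row.before(6))  # [(3, b'file-3h')]
print(row.after(6))   # [(9, b'file-9h'), (12, b'file-12h')]
```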
  • 37. Solution: Schema Design for Pattern Data • Each data item is identified by a 5-tuple (pattern, variable, level, time, ageing) – Pattern and variable are unordered – Level, time, and ageing are ordered [Figure: mapping of the data space (pattern, variable) and the ordered dimensions onto column families, row keys, and columns, using the hierarchy T639 → variable → level → time → ageing]
  • 38. Performance Results • 10 servers, each with 2×4-core CPUs, 64 GB memory, and a 9 TB SAS disk • Stores 7 kinds of model data, 16 TB per day • Retrieves data 100 times faster than the older system
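To give a sense of scale, 16 TB per day on the 10-server cluster corresponds to the following average write rates. This is back-of-the-envelope arithmetic. It assumes decimal terabytes, a load spread evenly over the day and across servers, and no replication overhead.

```python
TB = 10**12  # decimal terabyte (assumption)
SECONDS_PER_DAY = 86_400


def average_rate_mb_s(bytes_per_day: float, servers: int = 1) -> float:
    """Average ingest rate in MB/s (10**6 bytes), assuming an even load over
    24 hours and across servers, and ignoring replication."""
    return bytes_per_day / SECONDS_PER_DAY / servers / 10**6


print(round(average_rate_mb_s(16 * TB), 1))              # 185.2 (whole cluster)
print(round(average_rate_mb_s(16 * TB, servers=10), 1))  # 18.5 (per server)
```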