© Hitachi, Ltd. 2019. All rights reserved.
Lessons Learned from Leveraging Real-Time
Power Consumption Data with Apache Kudu
ApacheCon North America 2019
Masahiro Ito
OSS Solution Center
Hitachi, Ltd. September 11, 2019
Who am I?
• Masahiro Ito
➢ Software Engineer at Hitachi, Ltd.
• Developing big data and AI solutions
– E-mail: masahiro.ito.ph@hitachi.com
➢ Web article writer (in Japanese)
• https://thinkit.co.jp/author/10002
Outline
1. Introduction
2. Apache Kudu Overview
3. Performance Evaluations
I. Bulk Data Loading Performance
II. Near Real-time Processing Performance
4. Summary
1. Introduction
Hitachi Corporate Profile
Revenues: 9,480.6 billion yen (FY2018 Consolidated)
Operating Income: 754.9 billion yen (FY2018 Consolidated)
Number of Employees: 295,941 (as of end of Mar. 2019)
Established: February 1, 1920
Capital: 458.7 billion yen (as of end of Mar. 2019)
Hitachi, Ltd.
President & CEO
Toshiaki Higashihara
Share of Revenues (FY2018*)
[Pie chart: revenues of 9,480.6 billion yen split across ten segments: IT, Energy, Industry, Mobility, Smart Life, Hitachi High-Technologies, Hitachi Construction Machinery, Hitachi Metals, Hitachi Chemical, and Others. Segment shares range from 4% to 20%.]
* The figures are based on the new segment classifications effective from FY2019
Motivation of Real-time IoT Data Analysis
• Utilization of IoT and AI in various industries
➢ Various IoT devices generate large amounts of data in real time
➢ Sensor data is leveraged for monitoring, BI, and machine learning
• Real-time IoT data analysis
➢ Requires high performance for both streaming and analytic workloads
2. Apache Kudu Overview
Apache Kudu Overview
• Apache Kudu is a storage engine for Apache Hadoop
➢ A top-level project in the Apache Software Foundation
• Apache Hadoop ecosystem integration
➢ Reduces query latency for Apache Impala and Apache Spark
➢ Enables transparent joins between Kudu tables and data in HDFS or HBase
• Kudu enables real-time analytics on rapidly changing data
➢ Provides both fast inserts/updates and efficient scans
Performance Comparison for Kudu/HBase/HDFS
[Chart: Kudu, HBase, and HDFS compared on four axes: high-throughput read, real-time read, high-throughput write, real-time write. Annotations: "Suitable for data analysis" and "Suitable for streaming data store".]
Kudu covers different workloads by itself
➢ Enables real-time analytics on rapidly changing data
Traditional Hadoop and Kudu: Analytics on rapidly changing data
[Diagram, Traditional Hadoop: streaming data is written to HBase (inserts/updates), batch-copied to HDFS, and then scanned by the analysis system (dashboard, BI, machine learning).]
[Diagram, Kudu: streaming data is written to Kudu (inserts/updates) and scanned directly by the analysis system (dashboard, BI, machine learning).]
Data Model: Table
• Strongly-typed columns
• Primary Key consists of one or more columns
• Operations: Insert / Update / Delete / Upsert / Scan
date id usage cost complete
2018-01-01 01 20.86 22,360 True
2018-01-01 02 124.23 182,345 True
2018-01-02 01 22.53 736 False
2018-01-02 02 30.01 5,842 True
Primary key: date, id (rows are sorted by the primary key columns)
Data Management: Table and Tablet
• A table is partitioned into tablets that are distributed across tablet servers
➢ Partitioning strategy: Range partitioning, Hash partitioning
➢ All rows within a tablet are sorted by primary key
[Diagram: a table with columns date, id, … holds rows for 2018-01-01 and 2018-01-02 (ids 01 to 04). Range partitioning by date and hash partitioning by id split it into Tablets 1 to 4; for example, one tablet holds ids 01 and 03 for 2018-01-01 and another holds ids 02 and 04 for the same date. Each tablet is replicated across Kudu TServers.]
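The mapping from a row to a tablet can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `tablet_for`, `RANGE_START`, and `NUM_HASH_BUCKETS` are hypothetical names, one range partition per day and two hash buckets are chosen to mirror the slide, and `zlib.crc32` is a stand-in hash, not the hash function Kudu uses internally.

```python
import zlib
from datetime import date

# Hypothetical layout mirroring the slide: one range partition per day,
# two hash buckets on id.
RANGE_START = date(2018, 1, 1)
NUM_HASH_BUCKETS = 2


def tablet_for(row_date: date, row_id: str) -> tuple[int, int]:
    """Return (range_index, hash_bucket) for a row.

    crc32 is a stand-in hash chosen for this sketch; it is not Kudu's
    internal hash function.
    """
    range_index = (row_date - RANGE_START).days
    hash_bucket = zlib.crc32(row_id.encode()) % NUM_HASH_BUCKETS
    return range_index, hash_bucket


rows = [(date(2018, 1, d), f"{i:02d}") for d in (1, 2) for i in range(1, 5)]
tablets: dict[tuple[int, int], list] = {}
for row in rows:
    tablets.setdefault(tablet_for(*row), []).append(row)

# Rows inside each tablet are kept sorted by primary key (date, id).
for tablet_rows in tablets.values():
    tablet_rows.sort()
```

The range component depends only on the date and the hash component only on the id, so a given id always lands in the same hash bucket on every day.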
Insert Operation Flow in each TServer
[Diagram: a Kudu client sends records to Tablet 1 in the Kudu TServer on a worker node; the TServer keeps a write-ahead log and stores DiskRowSets on the data disk.]
1. Insert records: the Kudu client inserts records (e.g. keys 05, 01, 04)
2. Sort in the in-memory buffer: the MemRowSet sorts records by primary key (01, 04, 05)
3. Flush: the MemRowSet is flushed to disk as a DiskRowSet (01, 04, 05)
4. Compaction: DiskRowSets (e.g. 01, 04, 05 and 02, 03, 06) are merged and sorted by primary key (01 to 06)
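The four steps of the insert flow can be mimicked with a toy model. The sketch below is not Kudu's implementation: the `Tablet` class and `flush_threshold` are hypothetical, rowsets are plain Python lists, and the write-ahead log is left out.

```python
import heapq


class Tablet:
    """Toy model of the insert flow on the slide; not Kudu's implementation."""

    def __init__(self, flush_threshold: int = 3):
        self.flush_threshold = flush_threshold  # hypothetical, rows per flush
        self.mem_rowset: dict[str, str] = {}    # in-memory buffer
        self.disk_rowsets: list[list[tuple[str, str]]] = []

    def insert(self, key: str, value: str) -> None:
        # 1. Insert records into the in-memory buffer.
        self.mem_rowset[key] = value
        if len(self.mem_rowset) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        # 2. Sort by primary key, 3. flush as a new immutable rowset.
        if self.mem_rowset:
            self.disk_rowsets.append(sorted(self.mem_rowset.items()))
            self.mem_rowset = {}

    def compact(self) -> None:
        # 4. Merge the sorted rowsets into one rowset sorted by primary key.
        merged = list(heapq.merge(*self.disk_rowsets))
        self.disk_rowsets = [merged] if merged else []


tablet = Tablet()
for key in ["05", "01", "04", "06", "02", "03"]:
    tablet.insert(key, "...")
rowsets_before = [[k for k, _ in rs] for rs in tablet.disk_rowsets]
tablet.compact()
keys_after = [k for k, _ in tablet.disk_rowsets[0]]
```

With the keys from the slide, two rowsets (01, 04, 05 and 02, 03, 06) exist before compaction and a single rowset with keys 01 to 06 exists afterwards.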
3. Performance Evaluations
Evaluation Scenario: Real-time Power Consumption Data Analysis
• What is power disaggregation?
➢ Estimates the power consumption of individual appliances from a single meter only
• Appliances: TV, air conditioner, refrigerator, microwave, etc.
➢ Enables energy monitoring of individual appliances
• For energy efficiency improvement, user behavior analysis, etc.
[Diagram: for appliance load monitoring, disaggregation splits the total electrical signal (measured with a single meter) into the electrical signals of each appliance.]
Evaluation Outline
i. Bulk Data Loading Performance
➢ Migrate existing data to the new system with Kudu
ii. Near Real-time Processing Performance
➢ Simultaneous data insertion and scanning
• Insert power consumption data every second
• Scan inserted data every minute for aggregation
• Scan aggregated data every 5 seconds for interactive data analysis
[Diagram: meters feed the electric power disaggregation system, which inserts into Kudu every second; data in Kudu is aggregated every minute, and an analyst runs analytic queries through the analysis system.]
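As a rough illustration of the minutely aggregation step, the Python sketch below rolls per-second records up to one row per minute and device. The record layout, the device names, and the choice of a sum as the aggregate are assumptions made for this example; the slides do not state which aggregate function was used.

```python
from collections import defaultdict

# Hypothetical per-second records: (epoch_second, device_id, load).
per_second = [
    (0, "dev-1", 10), (1, "dev-1", 12), (59, "dev-1", 8),
    (60, "dev-1", 5), (0, "dev-2", 1), (61, "dev-2", 4),
]


def aggregate_per_minute(records):
    """Sum load per (minute_start, device_id)."""
    totals = defaultdict(int)
    for epoch_second, device_id, load in records:
        minute_start = epoch_second - epoch_second % 60
        totals[(minute_start, device_id)] += load
    return dict(totals)


minutely = aggregate_per_minute(per_second)
```

Each key of `minutely` corresponds to one row of the aggregated table that the interactive queries would read.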
Evaluation Environment: 6 Physical machines and 10 Gbps network
Physical machine Spec
- CPU: 20 cores (40 threads)
- Memory: 384 GB
- Disk: SAS HDD 1,200 GB * 10 disks
1 master node
- Impala Catalog Server
- Impala StateStore
- HDFS NameNode
- Kudu Master
- Hive Metastore Server
1 client node
- Kudu Java client
4 worker nodes
- Impala Daemon
- HDFS DataNode
- Kudu TServer
10 Gbps switch / 10 Gbps LAN
Software version
- OS: CentOS 7.6
- CDH 6.2, Kudu 1.9.0
Software Configurations
- TServer memory: 32GB
- Impala memory: 256GB
I. Bulk Data Loading Performance
Evaluation Overview
• Load CSV files in HDFS into a Kudu table using Impala
• Compared two optimizer hints in Impala
1. +SHUFFLE,CLUSTERED (default):
• SHUFFLE: Exchanges data between nodes to partition it before insert
• CLUSTERED: Sorts data by the partition columns before insert
2. +NOSHUFFLE,NOCLUSTERED
• Does not partition or sort data before insert
Table schema
#  Column       Primary key  Type
1  time_stamp   ✔            unixtime_micros
2  building_id  ✔            int32
3  floor_id     ✔            int32
4  device_id    ✔            int32
5  device_load               int64
6  device_type               int32
6 device_type int32
Table design:
- Record size: 32 bytes
- Range partition: 24 hours (time_stamp)
- Hash partitions: 16 (building_id, floor_id)
- Replication factor: 3
Data size:
- 1,440 million records, 43 GB
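The record size and data size follow from the schema, which a short calculation can confirm. The sketch assumes fixed-width encodings (8 bytes for unixtime_micros and int64, 4 bytes for int32) and reads "GB" as 2^30 bytes; both are assumptions, and the result is the raw row volume before replication, encoding, or compression.

```python
# Column widths in bytes, taken from the types in the table schema.
column_bytes = {
    "time_stamp": 8,   # unixtime_micros
    "building_id": 4,  # int32
    "floor_id": 4,     # int32
    "device_id": 4,    # int32
    "device_load": 8,  # int64
    "device_type": 4,  # int32
}
record_bytes = sum(column_bytes.values())
num_records = 1_440_000_000
raw_bytes = num_records * record_bytes
raw_gib = raw_bytes / 2**30
```

This gives 32 bytes per record and roughly 42.9 GiB for 1,440 million records, which is consistent with the 43 GB quoted on the slide.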
Bulk Data Loading Performance: Throughput and Compaction load
[Charts: insertion throughput and compaction duration over time for each optimizer hint in Impala, with the point where insertion finishes marked.]
+SHUFFLE,CLUSTERED (default): avg. 0.57M records/sec; compaction takes almost no time
+NOSHUFFLE,NOCLUSTERED: avg. 1.57M records/sec; compaction continues after insertion finishes
Evaluation Summary
[Bar chart: Impala bulk insert throughput (records/sec) by Impala query hints: CLUSTERED,SHUFFLE (default) 0.57 M; NOCLUSTERED,NOSHUFFLE 1.57 M]
+NOSHUFFLE,NOCLUSTERED hints:
• Impala memory is used only for data insertion
• Impala completes data loading quickly
• Kudu continues heavy compaction in the background
+SHUFFLE,CLUSTERED hints (default):
• Impala memory is also used for partitioning and sorting
• Impala takes more time to complete data loading
• Kudu has less compaction load
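To put the two throughput figures in perspective, the calculation below estimates how long loading the 1,440 million records would take at each average rate. It assumes the averages hold for the whole load and ignores the background compaction that continues after insertion with +NOSHUFFLE,NOCLUSTERED.

```python
NUM_RECORDS = 1_440_000_000
throughput = {  # records/sec, averages reported in the evaluation
    "+SHUFFLE,CLUSTERED": 0.57e6,
    "+NOSHUFFLE,NOCLUSTERED": 1.57e6,
}
load_minutes = {
    hint: NUM_RECORDS / rate / 60 for hint, rate in throughput.items()
}
speedup = throughput["+NOSHUFFLE,NOCLUSTERED"] / throughput["+SHUFFLE,CLUSTERED"]
```

Under these assumptions the default hints take about 42 minutes and the NOSHUFFLE,NOCLUSTERED hints about 15 minutes, a ratio of roughly 2.75.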
II. Near Real-time Processing Performance
Evaluation Overview
• Concurrent data insertion and scanning for 4 hours
➢ Insert every second with Kudu Java clients
• Number of records (appliances) inserted per second: 100,000 and up
• A run fails if insertion time continues to exceed 1 second
➢ Scan by two types of queries with Impala
[Diagram: Kudu holds load_per_sec_table and minutely_load_table; 20 Kudu Java clients insert records and Impala runs the queries.]
Insert records (every second): Kudu Java client × 20 inserts into load_per_sec_table
A) Minutely aggregation query (every minute): aggregates from seconds to minutes for all appliances
B) Per appliance aggregation query (every 5 seconds): gets the daily total load of 1 appliance
Records for 1 day (1,440 minutes) are pre-stored to save measurement time
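The pass/fail rule for a run can be expressed as a small check over the measured insertion times. In the sketch below, `run_failed` and `max_consecutive` are hypothetical: the slides only say a run fails when insertion time "continues to exceed" 1 second, so the number of consecutive slow inserts that counts as failure is an assumption.

```python
def run_failed(insert_times_sec, limit_sec=1.0, max_consecutive=3):
    """Return True if insertion time stays above the limit.

    max_consecutive is a hypothetical threshold; the slides only say the
    run fails when insertion time "continues to exceed" 1 second.
    """
    streak = 0
    for elapsed in insert_times_sec:
        streak = streak + 1 if elapsed > limit_sec else 0
        if streak >= max_consecutive:
            return True
    return False


recovered = run_failed([0.4, 1.2, 0.5, 1.1, 0.6])
overloaded = run_failed([0.4, 1.2, 1.3, 1.5, 0.6])
```

Isolated spikes that recover quickly leave `recovered` as `False`, while three slow inserts in a row make `overloaded` `True`.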
Table Designs and Scan Workloads
• Evaluated two table designs with different primary key orders
➢ The key order affects scan performance
1) First primary key column = time_stamp
Efficient access to the loads in a time range
➢ e.g. A) Minutely aggregation query
time_stamp building floor device watt type
2019-09-01 00:00 00001 01 01 209 3
2019-09-01 00:00 00001 01 02 102 5
2019-09-01 00:00 00001 01 03 42 11
2019-09-01 00:00 00001 02 01 462 4
2019-09-01 00:00 00001 02 02 3 22
2019-09-01 00:00 00001 03 01 0 4
2) First primary key columns = building, floor, device IDs
Efficient access to the load of a specific appliance
➢ e.g. B) Per appliance query
building floor device time_stamp watt type
00001 01 01 2019-09-01 00:00 209 3
00001 01 01 2019-09-01 00:01 102 5
00001 01 01 2019-09-01 00:02 42 11
00001 01 02 2019-09-01 00:00 462 4
00001 01 02 2019-09-01 00:01 3 22
00001 01 02 2019-09-01 00:02 0 4
Insertion Performance: First primary key column = time_stamp
Insert 1.9 M records/sec (Succeeded)
• Avg. insertion time: 480 msec
• Insertion time sometimes exceeded 1 second, but recovered quickly
Insert 2.0 M records/sec (Failed)
• Insertion time exceeded 1 second continuously
• “Memory pressure rejection” errors occurred
- Soft memory limit exceeded (at 93.59% of capacity).
[Chart: average insertion time vs. number of records inserted per second (0.1 M to 2.0 M); 480 msec at 1.9 M and 1,031 msec at 2.0 M]
Insertion Performance: First primary key columns = IDs
Insert 0.5 M records/sec (Succeeded)
• Avg. insertion time: 185 msec
Insert 0.6 M records/sec (Failed)
• Insertion time exceeded 1 second continuously
- From 01:20
• “The service queue is full (50 items)” errors occurred
[Chart: average insertion time vs. number of records inserted per second (0.1 M to 0.6 M); 185 msec at 0.5 M and 440 msec at 0.6 M]
Why does insertion performance differ with primary key order?
• The RowSet compaction load changes according to the primary key order
➢ Because records are inserted in timestamp order
First primary key column = time_stamp (inserted 1.9M records/sec): merge without sorting
[Diagram: the existing RowSet holds rows for 00:00 and 00:01; the new RowSet holds only rows for 00:02, so the merged RowSet is the existing rows followed by the new rows.]
First primary key columns = IDs (inserted 0.5M records/sec): merge with sorting
[Diagram: the existing RowSet holds rows for buildings 00001 and 00002 at 00:00 and 00:01; the new RowSet holds the 00:02 rows for both buildings, which must be interleaved with the existing rows of each building.]
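The difference between the two merges comes down to whether the key range of the new RowSet overlaps the key range of the existing one. The Python sketch below checks this for a tiny data set; the helper names `key_range` and `overlaps` and the two-building data are made up for illustration, and real RowSets hold far more rows.

```python
def key_range(rows, key):
    """Smallest and largest primary key in a rowset under a key order."""
    keys = [key(r) for r in rows]
    return min(keys), max(keys)


def overlaps(existing, new, key):
    """True if the key ranges of two rowsets intersect."""
    lo_e, hi_e = key_range(existing, key)
    lo_n, hi_n = key_range(new, key)
    return lo_n <= hi_e and lo_e <= hi_n


# Rows are (minute, building); new rows always carry the latest minute.
existing = [(m, b) for m in (0, 1) for b in ("00001", "00002")]
new = [(2, b) for b in ("00001", "00002")]

by_time = lambda r: (r[0], r[1])  # time_stamp first
by_id = lambda r: (r[1], r[0])    # IDs first

time_first_overlaps = overlaps(existing, new, by_time)
id_first_overlaps = overlaps(existing, new, by_id)
```

With time_stamp first the new rows sort entirely after the existing ones, so the ranges do not overlap; with IDs first the new rows fall inside the existing range and the merge has to interleave them.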
Can we reduce the compaction load in another way?
[Diagram: the same RowSet merge comparison as on the previous slide. First primary key column = time_stamp (inserted 1.9M records/sec): merge without sorting. First primary key columns = IDs (inserted 0.5M records/sec): merge with sorting.]
Can we reduce the compaction load by reducing the maximum size of each tablet?
Insertion Performance: First primary key column = IDs, Partition range = 24h->1h
Reduced the maximum size of each tablet by changing the range partition from 24h to 1h.
Insert 0.8 M records/sec (Succeeded)
• Avg. insertion time: 231 msec
• The increase in insertion time was reset by the hourly tablet change
Insert 0.9 M records/sec (Failed)
• Insertion time exceeded 1 second continuously
- Around the end of every hour
• “Memory pressure rejection” and “The service queue is full (50 items)” errors occurred
[Chart: average insertion time vs. number of records inserted per second (0.1 M to 0.9 M); 231 msec at 0.8 M and 358 msec at 0.9 M]
Insertion Performance Summary
[Bar chart: insertion throughput by first primary key columns (partition range): Timestamp (24h) 1.9 M records/sec; IDs (24h) 0.5 M records/sec; IDs (1h) 0.8 M records/sec]
Tuning point: Reduce the compaction load
• Use Timestamp for the first primary key column
• If you want IDs as the first key, reduce the maximum size of each tablet
➢ Increase the number of partitions
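A back-of-the-envelope calculation shows how much a shorter range partition shrinks each tablet. The sketch assumes 16 hash partitions, as in the bulk-loading table design (the slides do not state the count for this test), and an even spread of rows over them; `rows_per_tablet` is a hypothetical helper.

```python
SECONDS_PER_HOUR = 3600
HASH_PARTITIONS = 16  # assumed, taken from the bulk-loading table design


def rows_per_tablet(records_per_sec: float, range_hours: int) -> float:
    """Rows that land in one tablet before the range partition rolls over,
    assuming rows spread evenly over the hash partitions."""
    return records_per_sec * SECONDS_PER_HOUR * range_hours / HASH_PARTITIONS


daily = rows_per_tablet(0.5e6, 24)
hourly = rows_per_tablet(0.5e6, 1)
```

At 0.5 M records/sec this gives about 2.7 billion rows per tablet with a 24h range and about 112.5 million with a 1h range, so each compaction works on a tablet 24 times smaller.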
Scan Performance Summary
B) Per appliance aggregation query time (95th percentile): first primary key columns = IDs gave the lowest latency
[Chart: query time vs. number of records inserted per second (0.1 M to 2.0 M) for three designs: first key Timestamp; first key IDs with partition range 1h; first key IDs with partition range 24h. Data labels: 1.2 sec, 0.2 sec, 0.2 sec]
A) Minutely aggregation query time (95th percentile): first primary key column = timestamp gave the lowest latency
[Chart: query time vs. number of records inserted per second (0.1 M to 2.0 M) for the same three designs. Data labels: 8.0 sec, 7.4 sec, 10.6 sec]
• Define the primary key order according to the data scan patterns
➢ Scan request latencies differed by a factor of 3 to 6
• There is a trade-off with insertion performance
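Why key order matters for scans can be illustrated by counting the rows inside the contiguous key range that covers a query. The sketch below is a simplified model: it uses 100 made-up appliances and one day of minutely rows, treats a scan as reading everything between the first and last matching key, and ignores partition pruning and any other optimization a real engine applies.

```python
from bisect import bisect_left, bisect_right

DEVICES = [f"{d:03d}" for d in range(1, 101)]  # 100 hypothetical appliances
MINUTES = range(1440)                          # one day of minutely rows

time_first = sorted((m, d) for m in MINUTES for d in DEVICES)
id_first = sorted((d, m) for d in DEVICES for m in MINUTES)


def rows_touched(sorted_keys, low, high):
    """Rows in the contiguous key range [low, high] of a sorted table."""
    return bisect_right(sorted_keys, high) - bisect_left(sorted_keys, low)


# B) daily total for one appliance
b_id_first = rows_touched(id_first, ("042", 0), ("042", 1439))
b_time_first = rows_touched(time_first, (0, "042"), (1439, "042"))

# A) all appliances for one minute
a_time_first = rows_touched(time_first, (10, "001"), (10, "100"))
a_id_first = rows_touched(id_first, ("001", 10), ("100", 10))
```

In this model the per-appliance query covers 1,440 rows when IDs come first but nearly the whole table when time_stamp comes first, and the per-minute query shows the opposite pattern (100 rows with time_stamp first).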
4. Summary
Summary
• A 4-TServer Kudu cluster enabled real-time analysis of
1-second power consumption data for 1.9 million appliances
➢ Inserts every second, aggregates every minute, aggregates for any appliance
• Lessons from the performance evaluation:
➢ Insertion performance tuning:
• Reduce the compaction load by
✓ Using the timestamp as the first primary key column
to reduce the cost of sorting during the merge
✓ Reducing the tablet size to reduce the number of records to compact
➢ Scan performance tuning:
• Define the primary key order according to the data scan patterns
Trademarks
• Apache Kudu, Apache Impala, Apache Spark, Apache HBase and Apache Hadoop are either
registered trademarks or trademarks of Apache Software Foundation in the United States and/or
other countries.
• Other company and product names mentioned in this document may be the trademarks of their
respective owners.

 

Lessons Learned from Leveraging Real-Time Power Consumption Data with Apache Kudu

  • 1. © Hitachi, Ltd. 2019. All rights reserved. Lessons Learned from Leveraging Real-Time Power Consumption Data with Apache Kudu ApacheCon North America 2019 Masahiro Ito OSS Solution Center Hitachi, Ltd. September 11, 2019
  • 2. Who am I?
      • Masahiro Ito
        ➢ Software Engineer at Hitachi, Ltd.
          • Developing big data and AI solutions
          – E-mail: masahiro.ito.ph@hitachi.com
        ➢ Web article writer (in Japanese)
          • https://thinkit.co.jp/author/10002
  • 3. Outline
      1. Introduction
      2. Apache Kudu Overview
      3. Performance Evaluations
         I. Bulk Data Loading Performance
         II. Near Real-time Processing Performance
      4. Summary
  • 4. 1. Introduction
  • 5. Hitachi Corporate Profile
      Hitachi, Ltd. (President & CEO: Toshiaki Higashihara)
        Revenues             9,480.6 billion yen  (FY2018 Consolidated)
        Operating Income     754.9 billion yen    (FY2018 Consolidated)
        Number of Employees  295,941              (as of end of Mar. 2019)
        Established          February 1, 1920
        Capital              458.7 billion yen    (as of end of Mar. 2019)
  • 6. Share of Revenues (FY2018*)
      [Pie chart: share of revenues (total 9,480.6 billion yen) by segment. Segments: IT, Energy, Industry, Mobility, Smart Life, Hitachi High-Technologies, Hitachi Construction Machinery, Hitachi Metals, Hitachi Chemical, Others. Percentage labels on the chart: 16%, 20%, 10%, 7%, 9%, 7%, 10%, 5%, 4%, 12%.]
      * The figures are based on the new segment classifications effective from FY2019
  • 7. Motivation of Real-time IoT Data Analysis
      • Utilization of IoT and AI in various industries
        ➢ Various IoT devices generate large amounts of data in real time
        ➢ Sensor data is leveraged for monitoring, BI, and machine learning
      • Real-time IoT data analysis
        ➢ Requires strong performance for both streaming and analytic workloads
      [Diagram label: Kudu]
  • 8. 2. Apache Kudu Overview
  • 9. Apache Kudu Overview
      • Apache Kudu is a storage engine for Apache Hadoop
        ➢ A top-level project in the Apache Software Foundation
      • Apache Hadoop ecosystem integration
        ➢ Reduces query latency for Apache Impala and Apache Spark
        ➢ Enables transparent joins of Kudu tables with data in HDFS or HBase
      • Kudu enables real-time analytics on rapidly changing data
        ➢ Offers both fast inserts/updates and efficient scans
  • 10. Performance Comparison for Kudu/HBase/HDFS
      [Figure: Kudu, HBase, and HDFS compared on four axes: high-throughput read, real-time read, high-throughput write, real-time write. Annotations: "Suitable for data analysis" and "Suitable for streaming data store".]
      Kudu covers different workloads by itself
        ➢ Enables real-time analytics on rapidly changing data
  • 11. Traditional Hadoop and Kudu: Analytics on rapidly changing data
      Traditional Hadoop:
        Streaming data → (Inserts/Updates) → HBase → (Batch copy) → HDFS → (Scans) → Analysis system (Dashboard, BI, Machine Learning)
      Kudu:
        Streaming data → (Inserts/Updates) → Kudu → (Scans) → Analysis system (Dashboard, BI, Machine Learning)
  • 12. Data Model: Table
      • Strongly-typed columns
      • Primary key consists of one or more columns
      • Operations: Insert / Update / Delete / Upsert / Scan
      Example table (primary key: date, id; rows are sorted by the primary key columns)
        date        id  usage   cost     complete
        2018-01-01  01  20.86   22,360   True
        2018-01-01  02  124.23  182,345  True
        2018-01-02  01  22.53   736      False
        2018-01-02  02  30.01   5,842    True
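The data model on this slide can be illustrated with a toy in-memory table. This is a minimal sketch, not the Kudu client API: the class name `TinyTable` and its methods are hypothetical, and the only behavior it tries to mirror is that rows are identified by a primary key, that insert, update, upsert, and delete act on that key, and that a scan returns rows in primary key order.

```python
class TinyTable:
    """Toy table keyed by primary key columns (hypothetical, not Kudu's API)."""

    def __init__(self, key_columns):
        self.key_columns = key_columns
        self.rows = {}  # primary key tuple -> row dict

    def _key(self, row):
        return tuple(row[c] for c in self.key_columns)

    def insert(self, row):
        key = self._key(row)
        if key in self.rows:
            raise KeyError(f"primary key already present: {key}")
        self.rows[key] = dict(row)

    def update(self, row):
        key = self._key(row)
        if key not in self.rows:
            raise KeyError(f"primary key not found: {key}")
        self.rows[key].update(row)

    def upsert(self, row):
        # Insert the row if the key is new, otherwise overwrite the given columns.
        self.rows.setdefault(self._key(row), {}).update(row)

    def delete(self, key):
        del self.rows[key]

    def scan(self):
        # Rows come back sorted by primary key, as on the slide.
        return [self.rows[k] for k in sorted(self.rows)]


table = TinyTable(["date", "id"])
table.insert({"date": "2018-01-02", "id": "01", "usage": 22.53, "complete": False})
table.insert({"date": "2018-01-01", "id": "02", "usage": 124.23, "complete": True})
table.upsert({"date": "2018-01-01", "id": "01", "usage": 20.86, "complete": True})
table.update({"date": "2018-01-02", "id": "01", "complete": True})
scanned_keys = [(r["date"], r["id"]) for r in table.scan()]
print(scanned_keys)
```

Rows were added out of order, yet the scan returns them ordered by (date, id).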
  • 13. Data Management: Table and Tablet
      • A table is partitioned into tablets that are distributed across tablet servers (Kudu TServers)
        ➢ Partitioning strategy: Range partitioning, Hash partitioning
        ➢ All rows within a tablet are sorted by primary key
      Example: a table with rows (date, id) for dates 2018-01-01 and 2018-01-02 and ids 01 to 04, range partitioned by date and hash partitioned by id, is split into four tablets (Tablet 1 to Tablet 4):
        date        ids in tablet
        2018-01-01  01, 03
        2018-01-01  02, 04
        2018-01-02  01, 03
        2018-01-02  02, 04
      Tablets are replicated across Kudu TServers
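The combination of range and hash partitioning can be sketched as a function that maps each row to a tablet. This is only an illustration under stated assumptions: `tablet_for` is a hypothetical helper, and `int(id) % buckets` stands in for Kudu's real hash function, chosen here because it reproduces the grouping shown on the slide (ids 01 and 03 together, ids 02 and 04 together).

```python
from collections import defaultdict


def tablet_for(row, range_bounds, num_hash_buckets):
    """Return (range partition index, hash bucket) for a (date, id) row."""
    date, device_id = row
    # Range partitioning: find the [lower, upper) date interval containing the row.
    range_index = next(
        i for i, (lower, upper) in enumerate(range_bounds) if lower <= date < upper
    )
    # Hash partitioning: modulo is a stand-in for Kudu's actual hash (assumption).
    hash_bucket = int(device_id) % num_hash_buckets
    return (range_index, hash_bucket)


range_bounds = [("2018-01-01", "2018-01-02"), ("2018-01-02", "2018-01-03")]
rows = [(d, i) for d in ("2018-01-01", "2018-01-02") for i in ("01", "02", "03", "04")]

tablets = defaultdict(list)
for row in rows:
    tablets[tablet_for(row, range_bounds, num_hash_buckets=2)].append(row)

for tablet_id in sorted(tablets):
    print(tablet_id, sorted(tablets[tablet_id]))
```

Two range partitions times two hash buckets yield four tablets, each holding one date and half of the ids.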
  • 14. Insert Operation Flow in each TServer
      Components: Kudu Client; Worker Node running a Kudu TServer with Tablet 1 (MemRowSet in memory, Write Ahead Log, DiskRowSets on the data disk)
      1. Insert records: the Kudu client sends records (e.g. keys 05, 01, 04)
      2. Sort in the in-memory buffer: the MemRowSet keeps the records sorted by primary key (01, 04, 05)
      3. Flush: the MemRowSet is written to the data disk as a DiskRowSet (01, 04, 05)
      4. Compaction: DiskRowSets are merged and sorted by primary key (e.g. 01, 04, 05 and 02, 03, 06 become 01, 02, 03, 04, 05, 06)
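The flow on this slide (buffer in memory, flush to a sorted run, merge runs during compaction) can be sketched in a few lines. This is a simplified illustration, not Kudu's implementation: `TinyTablet` and `flush_threshold` are hypothetical names, the write-ahead log is omitted, and compaction here merges every DiskRowSet at once, whereas Kudu selects RowSets according to its own policy.

```python
import heapq


class TinyTablet:
    """Toy model of MemRowSet flush and DiskRowSet compaction (hypothetical)."""

    def __init__(self, flush_threshold):
        self.flush_threshold = flush_threshold
        self.mem_rowset = {}    # in-memory buffer: key -> value
        self.disk_rowsets = []  # each entry is a list of (key, value) sorted by key

    def insert(self, key, value):
        self.mem_rowset[key] = value
        if len(self.mem_rowset) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the buffer out as one run sorted by primary key.
        if self.mem_rowset:
            self.disk_rowsets.append(sorted(self.mem_rowset.items()))
            self.mem_rowset = {}

    def compact(self):
        # Merge the sorted runs into a single run sorted by primary key.
        merged = list(heapq.merge(*self.disk_rowsets))
        self.disk_rowsets = [merged] if merged else []


tablet = TinyTablet(flush_threshold=3)
for key in ("02", "03", "06", "05", "01", "04"):
    tablet.insert(key, f"row-{key}")

runs_before = [[k for k, _ in run] for run in tablet.disk_rowsets]
tablet.compact()
runs_after = [[k for k, _ in run] for run in tablet.disk_rowsets]
print(runs_before)
print(runs_after)
```

The example uses the keys from the slide: two flushed runs (02, 03, 06 and 01, 04, 05) are compacted into one run holding 01 through 06.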
  • 15. 3. Performance Evaluations
  • 16. Evaluation Scenario: Real-time Power Consumption Data Analysis
      • What is power disaggregation?
        ➢ Estimates the power consumption of individual appliances from a single meter only
          • Appliances: TV, air conditioner, refrigerator, microwave, etc.
        ➢ Enables energy monitoring of individual appliances
          • For energy efficiency improvement, user behavior analysis, etc.
      [Figure: Appliance load monitoring. The total electrical signal (from a single meter) is disaggregated into the electrical signals of each appliance.]
  • 17. Evaluation Outline
      i. Bulk Data Loading Performance
        ➢ Migrate existing data to the new system with Kudu
      ii. Near Real-time Processing Performance
        ➢ Simultaneous data insertion and scanning
          • Insert power consumption data every second
          • Scan inserted data every minute for aggregation
          • Scan aggregated data every 5 seconds for interactive data analysis
      [Diagram labels: Meters, Electric Power Disaggregation System, Kudu, Analysis system, Analyst; flows: Insert every second, Minutely aggregation, Analytic query]
  • 18. Evaluation Environment: 6 Physical machines and 10 Gbps network
      Physical machine spec
        - CPU: 20 cores (40 threads)
        - Memory: 384 GB
        - Disk: SAS HDD 1,200 GB × 10 disks
      1 master node
        - Impala Catalog Server
        - Impala StateStore
        - HDFS NameNode
        - Kudu Master
        - Hive Metastore Server
      1 client node
        - Kudu Java client
      4 worker nodes
        - Impala Daemon
        - HDFS DataNode
        - Kudu TServer
      Network: 10 Gbps switch / 10 Gbps LAN
      Software version
        - OS: CentOS 7.6
        - CDH 6.2, Kudu 1.9.0
      Software configurations
        - TServer memory: 32 GB
        - Impala memory: 256 GB
  • 19. I. Bulk Data Loading Performance
  • 20. Evaluation Overview
      • Load CSV files in HDFS into a Kudu table using Impala
      • Compared two sets of optimizer hints in Impala
        1. +SHUFFLE,CLUSTERED (default):
          • SHUFFLE: Exchanges data between nodes to partition it before insert
          • CLUSTERED: Sorts data by the partition columns before insert
        2. +NOSHUFFLE,NOCLUSTERED:
          • Does not partition or sort data before insert
      Table schema
        #  Column       Primary key  Type
        1  time_stamp   ✔            unixtime_micros
        2  building_id  ✔            int32
        3  floor_id     ✔            int32
        4  device_id    ✔            int32
        5  device_load               int64
        6  device_type               int32
      Table design
        - Record size: 32 bytes
        - Range partition: 24 hours (time_stamp)
        - Hash partitions: 16 (building_id, floor_id)
        - Replication factor: 3
      Data size
        - 1,440 million records, 43 GB
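The record size and data size on this slide can be cross-checked with a little arithmetic. The sketch below assumes a raw fixed-width layout (64-bit unixtime_micros and int64, 32-bit int32, no padding, encoding, or compression) and assumes the slide's "GB" is a binary gigabyte (2^30 bytes); neither assumption is stated in the slides.

```python
import struct

# Fixed-width layout of the six columns, without padding ("<" = packed):
# time_stamp (8 bytes), building_id, floor_id, device_id (4 bytes each),
# device_load (8 bytes), device_type (4 bytes).
RECORD_FORMAT = "<qiiiqi"
TOTAL_RECORDS = 1_440_000_000

record_size = struct.calcsize(RECORD_FORMAT)
total_bytes = TOTAL_RECORDS * record_size

print(f"record size: {record_size} bytes")
print(f"data size: {total_bytes / 10**9:.2f} GB (decimal), {total_bytes / 2**30:.2f} GiB")
```

Under these assumptions the record size comes to 32 bytes and the total to roughly 43 binary gigabytes, in line with the figures on the slide.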
  • 21. Bulk Data Loading Performance: Throughput and Compaction load
      [Charts: insertion throughput and compaction duration over time for each set of hints, with the point where insertion finishes marked.]
        Optimizer hints in Impala      Insertion throughput    Compaction
        +SHUFFLE,CLUSTERED (default)   Avg. 0.57M records/sec  Almost no time
        +NOSHUFFLE,NOCLUSTERED         Avg. 1.57M records/sec  Continues after insertion finishes
  • 22. Evaluation Summary
      Impala bulk insert throughput (records/sec) by Impala query hints
        CLUSTERED, SHUFFLE (default)  0.57 M
        NOCLUSTERED, NOSHUFFLE        1.57 M
      +NOSHUFFLE,NOCLUSTERED hints:
        • Uses Impala memory only for data insertion
        • Impala completes data loading quickly
        • Kudu continues heavy compaction in the background
      +SHUFFLE,CLUSTERED hints (default):
        • Leverages Impala memory for partitioning and sorting
        • Impala takes more time to complete data loading
        • Kudu has less compaction load
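To put the two throughput figures in perspective, the insertion time for the whole data set can be estimated. This is a back-of-the-envelope sketch that assumes the average throughput holds for the entire load of 1,440 million records (the data size given in the evaluation overview); it covers only the Impala insert phase, not the background compaction that continues afterwards with the NOSHUFFLE,NOCLUSTERED hints.

```python
TOTAL_RECORDS = 1_440_000_000

# Average insertion throughput in records per second, as reported on the slide.
throughput = {
    "+SHUFFLE,CLUSTERED (default)": 0.57e6,
    "+NOSHUFFLE,NOCLUSTERED": 1.57e6,
}

load_seconds = {hint: TOTAL_RECORDS / rate for hint, rate in throughput.items()}

for hint, seconds in load_seconds.items():
    print(f"{hint}: about {seconds:.0f} s ({seconds / 60:.0f} min)")
```

The estimate works out to roughly 42 minutes with the default hints and roughly 15 minutes without shuffling and sorting.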
  • 23. II. Near Real-time Processing Performance
  • 24. Evaluation Overview
      • Concurrent data insertion and scanning for 4 hours
        ➢ Insert every second with Kudu Java clients (× 20)
          • Number of inserted records (appliances): 100,000 and up
          • The run fails if insertion time continues to exceed 1 second
        ➢ Scan with two types of queries through Impala
          A) Minutely aggregation query: aggregates from seconds to minutes for all appliances (every minute)
          B) Per-appliance aggregation query: gets the daily total load of 1 appliance (every 5 seconds)
      • Kudu tables in the diagram: load_per_sec_table, minutely_load_table
      • Records for 1 day (1,440 minutes) are pre-stored to save measurement time
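Query A rolls per-second records up to one record per appliance per minute. The slides do not show the query itself, so the following is only a sketch of that kind of aggregation in plain Python: the function name `minutely_load` is hypothetical, the row layout follows the table schema from the bulk loading evaluation, and summing the load is an assumption (the slides do not say whether the aggregate is a sum or an average).

```python
from collections import defaultdict
from datetime import datetime


def minutely_load(per_second_rows):
    """Sum device_load per (minute, building, floor, device)."""
    totals = defaultdict(int)
    for time_stamp, building_id, floor_id, device_id, device_load in per_second_rows:
        minute = time_stamp.replace(second=0, microsecond=0)  # truncate to the minute
        totals[(minute, building_id, floor_id, device_id)] += device_load
    return dict(totals)


rows = [
    (datetime(2019, 9, 1, 0, 0, 0), 1, 1, 1, 209),
    (datetime(2019, 9, 1, 0, 0, 1), 1, 1, 1, 211),
    (datetime(2019, 9, 1, 0, 1, 0), 1, 1, 1, 102),
    (datetime(2019, 9, 1, 0, 0, 0), 1, 1, 2, 42),
]

result = minutely_load(rows)
for key in sorted(result):
    print(key, result[key])
```

In the evaluated system this step runs once a minute over the rows inserted during the previous minute and writes its output to the minutely table.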
  • 25. Table Designs and Scan Workloads
      • Evaluates two types of tables with different primary key order
        ➢ Affects scan performance
      1) First primary key column = time_stamp
        Efficient access to the loads in a range of time
        ➢ e.g. A) Minutely aggregation query
        time_stamp        building  floor  device  watt  type
        2019-09-01 00:00  00001     01     01      209   3
        2019-09-01 00:00  00001     01     02      102   5
        2019-09-01 00:00  00001     01     03      42    11
        2019-09-01 00:00  00001     02     01      462   4
        2019-09-01 00:00  00001     02     02      3     22
        2019-09-01 00:00  00001     03     01      0     4
      2) First primary key columns = building, floor, device IDs
        Efficient access to the load of a specific appliance
        ➢ e.g. B) Per appliance query
        building  floor  device  time_stamp        watt  type
        00001     01     01      2019-09-01 00:00  209   3
        00001     01     01      2019-09-01 00:01  102   5
        00001     01     01      2019-09-01 00:02  42    11
        00001     01     02      2019-09-01 00:00  462   4
        00001     01     02      2019-09-01 00:01  3     22
        00001     01     02      2019-09-01 00:02  0     4
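The effect of primary key order on scans can be illustrated by sorting the same rows both ways and measuring how far apart the matching rows end up. This is a toy model under stated assumptions: `scan_span` is a hypothetical helper that counts the consecutive sorted rows a scan would have to cover to reach every matching row, and it ignores how Kudu actually uses its primary key index and partition pruning.

```python
from itertools import product

timestamps = ["2019-09-01 00:00", "2019-09-01 00:01", "2019-09-01 00:02"]
appliances = [("00001", "01", "01"), ("00001", "01", "02"), ("00001", "02", "01")]

# Each row is (time_stamp, building, floor, device).
rows = [(ts, *appliance) for ts, appliance in product(timestamps, appliances)]

time_first = sorted(rows)                                         # key: time_stamp, IDs
ids_first = sorted(rows, key=lambda r: (r[1], r[2], r[3], r[0]))  # key: IDs, time_stamp


def scan_span(table, predicate):
    """Consecutive rows between the first and last match, inclusive."""
    hits = [i for i, row in enumerate(table) if predicate(row)]
    return hits[-1] - hits[0] + 1


def one_appliance(row):
    return row[1:] == ("00001", "01", "02")


def one_minute(row):
    return row[0] == "2019-09-01 00:01"


spans = {
    ("time_stamp first", "per appliance"): scan_span(time_first, one_appliance),
    ("IDs first", "per appliance"): scan_span(ids_first, one_appliance),
    ("time_stamp first", "one minute"): scan_span(time_first, one_minute),
    ("IDs first", "one minute"): scan_span(ids_first, one_minute),
}
for key, span in spans.items():
    print(key, span)
```

Each query matches three rows. They are contiguous (span 3) when the leading key column matches the query's filter and scattered (span 7) otherwise, which is the trade-off the two table designs embody.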
  • 26. Insertion Performance: First primary key column = time_stamp
      Insert 1.9 M records/sec (Succeeded)
        • Avg. insertion time: 480 msec
        • Sometimes insertion time exceeded 1 second, but recovered quickly
      Insert 2.0 M records/sec (Failed)
        • Insertion time exceeded 1 second continuously
        • "Memory pressure rejection" occurred
          - Soft memory limit exceeded (at 93.59% of capacity).
      [Chart: Insertion time (Avg.), 0 to 2,000 msec, versus number of insertion records per second, 0.1 M to 2.0 M. Data labels: 480 msec, 1,031 msec.]
  • 27. Insertion Performance: First primary key columns = IDs
      Insert 0.5 M records/sec (Succeeded)
        • Avg. insertion time: 185 msec
      Insert 0.6 M records/sec (Failed)
        • Insertion time exceeded 1 second continuously
          - From 01:20
        • "The service queue is full (50 items)" occurred
      [Chart: Insertion time (Avg.), 0 to 500 msec, versus number of insertion records per second, 0.1 M to 0.6 M. Data labels: 185 msec, 440 msec.]
  • 28. Why does insertion performance differ with primary key order?
    • The RowSet compaction load changes according to the primary key order
      ➢ Because the records are inserted in timestamp order
    First primary key column = time_stamp (inserted 1.9 M records/sec): Merge without sorting
      ➢ The existing RowSet holds 00:00 and 00:01 and the new RowSet holds only 00:02, so every new key sorts after the existing keys
    First primary key columns = IDs (inserted 0.5 M records/sec): Merge with sorting
      ➢ The existing RowSet holds buildings 00001 and 00002 at 00:00 and 00:01 and the new RowSet holds both buildings at 00:02, so the new keys interleave with the existing keys
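    The difference between the two merges can be sketched in Python. This is a deliberately simplified model, not Kudu's actual compaction algorithm: it counts how many rows of the existing RowSet sort after the smallest key of the new RowSet, that is, how many existing rows the new rows would have to be interleaved with. The helper names `key` and `rows_to_interleave` are hypothetical.

```python
# Simplified model of the diagram: rows keyed by building ID and timestamp.
existing_rowset = [
    {"building": "00001", "time_stamp": "2019-09-01 00:00"},
    {"building": "00001", "time_stamp": "2019-09-01 00:01"},
    {"building": "00002", "time_stamp": "2019-09-01 00:00"},
    {"building": "00002", "time_stamp": "2019-09-01 00:01"},
]
new_rowset = [
    {"building": "00001", "time_stamp": "2019-09-01 00:02"},
    {"building": "00002", "time_stamp": "2019-09-01 00:02"},
]

TS_FIRST = ("time_stamp", "building")
IDS_FIRST = ("building", "time_stamp")


def key(row, order):
    """Build the composite primary key of a row for the given column order."""
    return tuple(row[c] for c in order)


def rows_to_interleave(existing, new, order):
    """Count existing rows whose key sorts after the smallest new key.
    Zero means the new rows can simply be appended after the existing ones."""
    smallest_new = min(key(r, order) for r in new)
    return sum(1 for r in existing if key(r, order) > smallest_new)


print("time_stamp first:", rows_to_interleave(existing_rowset, new_rowset, TS_FIRST))
print("IDs first:", rows_to_interleave(existing_rowset, new_rowset, IDS_FIRST))
```

    With time_stamp first, no existing row sorts after the new data, so the merge amounts to an append. With IDs first, the rows of building 00002 sort after the first new key, so the merge has to re-sort.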
  • 29. Can we reduce the compaction load in another way?
    • Can we reduce the compaction load by reducing the maximum size of each tablet?
    [Diagram repeated from the previous slide: First primary key column = time_stamp (inserted 1.9 M records/sec), merge without sorting; First primary key columns = IDs (inserted 0.5 M records/sec), merge with sorting]
  • 30. Insertion Performance: First primary key columns = IDs, Partition range = 24h -> 1h
    Reduced the maximum size of each tablet by changing the range partition from 24h to 1h.
    Insert 0.8 M records/sec (Succeeded)
    • Avg. insertion time: 231 msec
    • The increase in insertion time was reset by the hourly tablet change
    Insert 0.9 M records/sec (Failed)
    • Insertion time exceeded 1 second continuously
      - Around the end of every hour
    • “Memory pressure rejection” and “The service queue is full (50 items)” occurred
    [Chart: average insertion time vs. number of inserted records per second, 0.1 M to 0.9 M; labeled values: 231 msec and 358 msec]
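    As a rough illustration of why a narrower range partition bounds tablet size, the Python sketch below generates range bounds for one day and estimates how many rows fall into a single range at a steady insert rate. It ignores any hash partitioning, which this part of the deck does not detail, and the rate of 500,000 records/sec is only an example value. The function names `range_bounds` and `max_rows_per_range` are hypothetical.

```python
from datetime import datetime, timedelta


def range_bounds(start, end, width):
    """Return [lower, upper) time bounds that cover start..end in steps of width."""
    bounds = []
    lower = start
    while lower < end:
        upper = min(lower + width, end)
        bounds.append((lower, upper))
        lower = upper
    return bounds


def max_rows_per_range(rate_per_sec, width):
    """Rows accumulated in one range at a steady insert rate
    (assumption: no hash partitioning is taken into account)."""
    return int(rate_per_sec * width.total_seconds())


day_start = datetime(2019, 9, 1)
day_end = day_start + timedelta(days=1)

for label, width in (("24h", timedelta(hours=24)), ("1h", timedelta(hours=1))):
    n_ranges = len(range_bounds(day_start, day_end, width))
    rows = max_rows_per_range(500_000, width)
    print(f"{label}: {n_ranges} range(s) per day, up to {rows:,} rows per range")
```

    Under these assumptions an hourly range receives 1/24 of the rows of a daily range before inserts move on to the next one, which is consistent with the insertion time resetting at each hourly tablet change.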
  • 31. Insertion Performance Summary
    Insertion throughput by first primary key columns (partition range):
       First primary key columns (Partition range)  Insertion throughput
       Timestamp (24h)                              1.9 M records/sec
       IDs (24h)                                    0.5 M records/sec
       IDs (1h)                                     0.8 M records/sec
    Tuning point: Reduce the compaction load
    • Use Timestamp for the first primary key column
    • If you want IDs as the first key, reduce the maximum size of each tablet
      ➢ Increase the number of partitions
  • 32. Scan Performance Summary
    A) Minutely aggregation query time (95th percentile)
    • First primary key column = timestamp had the lowest latency
    [Chart: query time vs. number of inserted records per second, 0.1 M to 2.0 M; series: First key: Timestamp / First key: IDs, Partition range: 1h / First key: IDs, Partition range: 24h; labeled values: 8.0 sec, 7.4 sec, 10.6 sec]
    B) Per appliance aggregation query time (95th percentile)
    • First primary key columns = IDs had the lowest latency
    [Chart: query time vs. number of inserted records per second, 0.1 M to 2.0 M; series: First key: Timestamp / First key: IDs, Partition range: 1h / First key: IDs, Partition range: 24h; labeled values: 1.2 sec, 0.2 sec, 0.2 sec]
    • Primary key order should be defined according to the patterns of data scan
      ➢ Scan request latencies differed by a factor of 3-6
    • Trade-off with insertion performance
  • 33. 4. Summary
  • 34. Summary
    • A 4-TServer Kudu cluster enables real-time analysis of 1-second power consumption data for 1.9 million appliances
      ➢ Inserts every second, aggregates every minute, aggregates by any appliance
    • Lessons from performance evaluation:
      ➢ Insertion performance tuning:
        • Reduce the compaction load by
          ✓ Using timestamp for the first primary key column to reduce the cost of sorting during the merge
          ✓ Reducing the tablet size to reduce the number of records per compaction
      ➢ Scan performance tuning:
        • Define primary key order according to the patterns of data scan
  • 35. Trademarks
    • Apache Kudu, Apache Impala, Apache Spark, Apache HBase and Apache Hadoop are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.
    • Other company and product names mentioned in this document may be the trademarks of their respective owners.