Architecture of the Distributed DB Apache Kudu
HybridTime: the Mechanism That Reconciles DB Performance and Consistency
2 © Cloudera, Inc. All rights reserved.
• ( ) / takahiko at cloudera.com
•
• Cloudera
•
• Internet & Network
• RDBMS 1
• NoSQL 2
• Hadoop 3 ←Now!
3 © Cloudera, Inc. All rights reserved.
• Apache Kudu
• Kudu OLTP OLAP HTAP
DB #dbts2017 Kudu
• BI/DWH DB Kudu
Google Spanner
https://www.slideshare.net/Cloudera_jp/apache-kududb-dbts2017
• HybridTime
DB HybridTime
Kudu
© Cloudera, Inc. All rights reserved.
Apache Kudu
5 © Cloudera, Inc. All rights reserved.
• 275 3PB
• 1000 PB
• /
• 1 GB/
• DB
• BLOB
•
• 1000
Kudu
1
...
6 © Cloudera, Inc. All rights reserved.
Kudu
[Figure: platform stack. Storage: Kudu, HDFS, S3, ADLS. Engines: Impala (SQL), Spark, Hive, MapReduce. Metadata: HMS.]
7 © Cloudera, Inc. All rights reserved.
SQL Kudu
Impala + Kudu
[Figure: platform stack (repeated). Storage: Kudu, HDFS, S3, ADLS. Engines: Impala, Spark, Hive, MapReduce. Metadata: HMS.]
• Kudu SQL
• Impala SQL
8 © Cloudera, Inc. All rights reserved.
• Impala parses and plans the SQL, then reads from Kudu
• Impala pushes predicates down to Kudu (predicate push down)
• Kudu performs the SCAN; Impala performs the aggregation
SQL
SQL Impala
3
90
9 © Cloudera, Inc. All rights reserved.
Spark Kudu
Spark + Kudu
[Figure: platform stack (repeated). Storage: Kudu, HDFS, S3, ADLS. Engines: Impala, Spark, Hive, MapReduce. Metadata: HMS.]
• Spark SQL
Kudu API
• SparkSQL
10 © Cloudera, Inc. All rights reserved.
• Kudu 1
Kudu
Tablet
Kudu
TabletServer
11 © Cloudera, Inc. All rights reserved.
• 1 3
• 3
• Raft
•
•
Tablet
TabletServer
12 © Cloudera, Inc. All rights reserved.
•
• INSERT/UPDATE/UPSERT/DELETE
• DECIMAL
•
•
• Kerberos
•
•
•
•
Kudu
© Cloudera, Inc. All rights reserved.
HTAP: OLTP OLAP Kudu
14 © Cloudera, Inc. All rights reserved.
• OLTP 1TB RAM
•
OLTP OLAP
• 2
[Figure: an OLTP DB receives insert/update/delete; ETL copies the data into a separate OLAP DB (DWH), which BI tools query with select.]
15 © Cloudera, Inc. All rights reserved.
Hadoop
• (PB)
HDFS OLAP
• Impala/Hive SQL
• HDFS OLTP HBase
[Figure: on Hadoop, HBase serves OLTP (put / delete) and Impala on HDFS serves OLAP (BI select); data ingestion and ETL move the data into HDFS.]
16 © Cloudera, Inc. All rights reserved.
• OLTP? OLAP?
•
•
• OLTP OLAP
• HTAP (Hybrid Transactional/Analytical Processing)
•
• ...
•
• Kudu HTAP
OLTP OLAP
HTAP
HTAP
17 © Cloudera, Inc. All rights reserved.
• OLTP OLAP 1 DB
•
HTAP DB
[Figure: Kudu as a single HTAP DB. Applications insert/update/delete and DWH/BI tools select against the same Kudu tables.]
Kudu
18 © Cloudera, Inc. All rights reserved.
[Figure: example data pipeline. Components shown: IoT devices, MQTT broker, Kafka, Flume, Spark Streaming, Kudu, HDFS, ETL with Hive/Spark, SQL with Impala, and BI tools.]
© Cloudera, Inc. All rights reserved.
DB
20 © Cloudera, Inc. All rights reserved.
• DB
•
• ! f
•
• 12:30:00 < 12:30:03
• 2 < 3
• Log Sequence Number LSN
• LSN
• DB
• DB LSN
• DB
DB
12:30:00 12:30:03
2 3
! f
21 © Cloudera, Inc. All rights reserved.
• Physical Clock
•
•
•
•
• Logical Clock
•
• ...
•
•
•
DB
B
A
A B
22 © Cloudera, Inc. All rights reserved.
•
•
•
• +1
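Lamport Clock
[Figure: Lamport clock example with three processes exchanging messages. A receiver whose counter is behind the timestamp of an incoming message jumps ahead of it.]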
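The Lamport clock rule on this slide (add 1 on each event, and move past the sender's timestamp on receive) can be sketched in a few lines of Python. This is a minimal illustration, not code from Kudu; the class and method names are made up for the example.

```python
class LamportClock:
    """Logical clock: a counter that only moves forward (illustrative names)."""

    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        """Local event or message send: advance the counter by 1."""
        self.time += 1
        return self.time

    def receive(self, sender_time: int) -> int:
        """Message receive: take the larger of both counters, then add 1."""
        self.time = max(self.time, sender_time) + 1
        return self.time


a, b = LamportClock(), LamportClock()
a.tick()                     # local event on A
sent = a.tick()              # A sends a message carrying its counter
received = b.receive(sent)   # B is behind, so it jumps past the message
print(sent, received)
```

The receive timestamp is always larger than the send timestamp, which is all a Lamport clock guarantees: it orders causally related events but says nothing about wall-clock time.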
23 © Cloudera, Inc. All rights reserved.
•
• +1
•
•
•
Vector Clock
[Figure: vector clock example with three processes. Timestamps shown: {1,0,0}, {1,1,0}, {1,2,0}, {1,2,1}, {1,2,2}, {1,3,0}, {2,3,0}, {3,3,0}, {3,3,3}, {1,4,2}, {1,5,2}, {4,5,2}, {5,5,2}, {5,5,4}.]
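A vector clock keeps one counter per process, which makes it possible to tell causally ordered events from concurrent ones. The sketch below is a minimal illustration with hypothetical names; its first few timestamps follow the {1,0,0}, {1,1,0}, {1,2,0}, {1,2,1} sequence visible in the figure.

```python
from typing import List


class VectorClock:
    """One counter per process; `node` is this process's own slot."""

    def __init__(self, node: int, size: int) -> None:
        self.node = node
        self.clock: List[int] = [0] * size

    def tick(self) -> List[int]:
        """Local event or send: increment only our own slot."""
        self.clock[self.node] += 1
        return list(self.clock)

    def receive(self, other: List[int]) -> List[int]:
        """Receive: element-wise max with the message, then increment our slot."""
        self.clock = [max(x, y) for x, y in zip(self.clock, other)]
        self.clock[self.node] += 1
        return list(self.clock)


def happened_before(u: List[int], v: List[int]) -> bool:
    """u -> v iff every slot of u is <= v and the vectors differ."""
    return u != v and all(x <= y for x, y in zip(u, v))


def concurrent(u: List[int], v: List[int]) -> bool:
    """Neither event causally precedes the other."""
    return u != v and not happened_before(u, v) and not happened_before(v, u)


p0, p1, p2 = (VectorClock(i, 3) for i in range(3))
sent = p0.tick()        # [1, 0, 0]
r1 = p1.receive(sent)   # [1, 1, 0]
s2 = p1.tick()          # [1, 2, 0]
r2 = p2.receive(s2)     # [1, 2, 1]
other = p0.tick()       # [2, 0, 0], unrelated to r2
print(happened_before(sent, r2), concurrent(other, r2))
```

The cost is that every timestamp grows with the number of processes, which is one reason a fixed-size hybrid timestamp is attractive for a database.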
24 © Cloudera, Inc. All rights reserved.
•
•
25 © Cloudera, Inc. All rights reserved.
•
•
•
12:30:00
12:29:59
A B
B
!
f
26 © Cloudera, Inc. All rights reserved.
• Spanner: Google’s Globally Distributed Database
• DB
ACID
• GPS
• TrueTime API
error bound
Google Spanner
27 © Cloudera, Inc. All rights reserved.
• API
• GPS
•
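• The TrueTime API's TT.now() returns a TTinterval
• TT.now()
• In Google's DCs the error bound ranges from 1 to 7 ms (about 4 ms on average)
Google Spanner TrueTime API
[Figure: TT.now() returns TTinterval: [earliest, latest]; the true time lies somewhere between earliest and latest.]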
28 © Cloudera, Inc. All rights reserved.
• commit wait
• TrueTime API
• e f
2"
• External Consistency
• f e T $ < T &
Google Spanner commit-wait
[Figure: commit-wait timeline. The writer waits out the uncertainty window (about 2ε) before the commit becomes visible.]
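Commit-wait can be sketched with a stand-in for TrueTime. The code below is an illustration under stated assumptions: `FakeTrueTime` is a hypothetical class that wraps the local monotonic clock with a fixed error bound ε, and `commit_with_wait` follows the rule described in the Spanner paper (pick the commit timestamp at `TT.now().latest`, then wait until `TT.now().earliest` has passed it). It is not Spanner's or Kudu's code.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class TTInterval:
    earliest: float
    latest: float


class FakeTrueTime:
    """Hypothetical stand-in: local clock plus/minus a fixed error bound epsilon."""

    def __init__(self, epsilon: float) -> None:
        self.epsilon = epsilon

    def now(self) -> TTInterval:
        t = time.monotonic()
        return TTInterval(t - self.epsilon, t + self.epsilon)


def commit_with_wait(tt: FakeTrueTime) -> float:
    """Choose a commit timestamp, then block until it is surely in the past."""
    commit_ts = tt.now().latest
    while tt.now().earliest <= commit_ts:
        time.sleep(tt.epsilon / 10)
    return commit_ts


tt = FakeTrueTime(epsilon=0.004)   # 4 ms, the average quoted for Google's DCs
start = time.monotonic()
ts = commit_with_wait(tt)
waited = time.monotonic() - start
print(f"waited {waited * 1000:.1f} ms for epsilon = {tt.epsilon * 1000:.0f} ms")
```

With a fixed ε the wait comes out at roughly 2ε, which is why a small error bound matters so much for write latency.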
© Cloudera, Inc. All rights reserved.
Technical Report: HybridTime - Accessible Global Consistency with High Clock Uncertainty
30 © Cloudera, Inc. All rights reserved.
• Technical Report: HybridTime - Accessible Global Consistency with High Clock Uncertainty
•
• Google DC
• HybridTime NTP DB
• Kudu Kudu
HybridTime
•
• 2014 (
)
Kudu
31 © Cloudera, Inc. All rights reserved.
• Google Spanner DC
• Amazon Dynamo Cassandra DB
Eventual Consistency
•
•
•
[ ] DC DB
32 © Cloudera, Inc. All rights reserved.
• Consistency
•
•
• CAP Consistency ACID Consistency/Isolation
• Consistency
• (Anomaly)
• Lost Update, Dirty Read, Non-Repeatable Read, Phantom Read, Read Skew, Write Skew, etc.
•
• Lost Update SELECT FOR UPDATE
Consistency
33 © Cloudera, Inc. All rights reserved.
• Lamport Clocks Vector Clocks
•
•
•
• RDB Point-in-Time
• Vector Clocks
[ ]
34 © Cloudera, Inc. All rights reserved.
• Spinnaker Paxos
•
•
• Spanner commit-wait
•
• GPS
•
[ ]
35 © Cloudera, Inc. All rights reserved.
• HybridTime
•
•
• Point-in-time
• Lamport Clock
• HybridTime
• Vector Clocks Lamport Clocks 2
( commit-wait )
[ ] HybridTime
HTC: {physical, logical}
36 © Cloudera, Inc. All rights reserved.
•
• NTP
• NTP
• commit-wait
•
• NTP
• commit-wait
•
•
• Kudu DB
HybridTime
37 © Cloudera, Inc. All rights reserved.
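• $PC_i(e)$: physical clock reading at node i for event e
• $PC_{ref}(e)$: reference (true) time of event e
• $E_i(e)$: maximum clock error at node i for event e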
• 1:
• 2:
[ ] HybridTime
38 © Cloudera, Inc. All rights reserved.
• HybridTime HTC
• (error)
• Spanner TrueTime API
• HybridTime
• NTP
HybridTime
39 © Cloudera, Inc. All rights reserved.
• ntp_adjtime
• timex
• maxerror
HybridTime
40 © Cloudera, Inc. All rights reserved.
• Kudu macOS
• macOS OS
macOS
41 © Cloudera, Inc. All rights reserved.
Kudu
42 © Cloudera, Inc. All rights reserved.
• 1 2
• 2 1
• !" − !$ ...
•
• %$
• & = !" − !$ − %$
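NTP
[Figure: server 1 sends its time to server 2. The clocks read 100 ms and 160 ms, but the naive difference 160 − 100 = 60 ms is wrong (NG) because it still contains the network delay.]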
43 © Cloudera, Inc. All rights reserved.
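• $\delta_1$
• RTT: $\delta$, with one-way delay $\frac{\delta}{2}$
• $\delta = \delta_1 + \delta_2 = (T_4 - T_1) - (T_3 - T_2)$, so $\frac{\delta}{2} = \delta_1 = \frac{(T_4 - T_1) - (T_3 - T_2)}{2}$
• $\theta$
• $\theta = T_2 - T_1 - \frac{(T_4 - T_1) - (T_3 - T_2)}{2}$
• $\theta = \frac{2(T_2 - T_1)}{2} - \frac{(T_4 - T_1) - (T_3 - T_2)}{2}$
• $\theta = \frac{2T_2 - 2T_1 - T_4 + T_1 + T_3 - T_2}{2}$
• $\theta = \frac{T_2 - T_1 - T_4 + T_3}{2}$
• $\theta = \frac{(T_2 - T_1) + (T_3 - T_4)}{2}$
NTP
[Figure: NTP exchange. The client records T1 (send) and T4 (receive); the NTP server records T2 (receive) and T3 (send). Client clock: 100 ms, 110 ms, 115 ms, 125 ms, with +10 ms per direction ($\delta_1$, $\delta_2$) and +5 ms of server processing, so RTT = 20 ms. Case 1) server clock is 50 ms ahead: 150 ms, 160 ms, 165 ms, 175 ms. Case 2) server clock is −30 ms behind: 70 ms, 80 ms, 85 ms, 95 ms.]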
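The offset and delay formulas on this slide are easy to check numerically. The sketch below plugs in the timestamps as read from the figure (that reading of the garbled figure is an assumption: T1 = 100, T4 = 125 on the client; T2, T3 = 160, 165 for case 1 and 80, 85 for case 2). The function name is illustrative.

```python
def ntp_offset_and_delay(t1: float, t2: float, t3: float, t4: float):
    """Return (theta, delta) from the four NTP timestamps.

    t1: client send, t2: server receive, t3: server send, t4: client receive.
    """
    delay = (t4 - t1) - (t3 - t2)            # round trip minus server processing
    offset = ((t2 - t1) + (t3 - t4)) / 2     # assumes both directions take delay / 2
    return offset, delay


case1 = ntp_offset_and_delay(100, 160, 165, 125)   # server 50 ms ahead
case2 = ntp_offset_and_delay(100, 80, 85, 125)     # server 30 ms behind
print(case1, case2)
```

Both cases recover the true offset exactly because the outbound and return delays are equal (10 ms each).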
44 © Cloudera, Inc. All rights reserved.
•
•
•
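• $\theta = \frac{(151 - 100) - (125 - 156)}{2} = 41\,\mathrm{ms}$
• The true offset is 50 ms, so the estimate is off by −9 ms
• The one-way delay is assumed to be half the RTT: $\frac{\delta}{2}$
• The error is therefore bounded by $\pm\frac{\delta}{2}$
NTP
[Figure: NTP exchange with asymmetric delays, case 1) server clock 50 ms ahead. Client clock: 100 ms, 101 ms, 106 ms, 125 ms; server clock: 150 ms, 151 ms, 156 ms, 175 ms. Outbound delay $\delta_1$ = +1 ms, server processing +5 ms, return delay $\delta_2$ = +19 ms, so RTT = 20 ms.]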
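The ±δ/2 bound can be checked by simulating the exchange with different splits of the same 20 ms round trip. The helper names below are hypothetical, and the numbers (true offset 50 ms, 5 ms server processing, T1 = 100 ms) are taken from the figure as read from the slide.

```python
def simulate_exchange(true_offset, out_delay, back_delay, processing=5, t1=100):
    """Produce (t1, t2, t3, t4) for a server whose clock is `true_offset` ahead."""
    t2 = t1 + out_delay + true_offset      # server clock at request arrival
    t3 = t2 + processing                   # server clock at reply departure
    t4 = t3 - true_offset + back_delay     # client clock at reply arrival
    return t1, t2, t3, t4


def ntp_offset_and_delay(t1, t2, t3, t4):
    delay = (t4 - t1) - (t3 - t2)
    offset = ((t2 - t1) + (t3 - t4)) / 2
    return offset, delay


# The slide's case: 1 ms out, 19 ms back.
stamps = simulate_exchange(true_offset=50, out_delay=1, back_delay=19)
offset, delay = ntp_offset_and_delay(*stamps)
error = offset - 50

# Every split of a 20 ms round trip stays within +/- delay / 2.
errors = []
for out in range(0, 21):
    est, d = ntp_offset_and_delay(*simulate_exchange(50, out, 20 - out))
    errors.append(est - 50)
print(stamps, offset, error, min(errors), max(errors))
```

The estimate is off by exactly half the difference between the two one-way delays, so the worst case is reached when one direction takes the whole round trip.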
45 © Cloudera, Inc. All rights reserved.
•
• 1
•
•
•
• NTP !
•
• ! +
#
$
NTP
46 © Cloudera, Inc. All rights reserved.
• NTP
• Own DC: in-DC NTP server or Google Public NTP
• AWS: Amazon Time Sync Service
• Azure: Hyper-V time synchronization, Google Public NTP
• GCE: Google Public NTP
•
NTP
47 © Cloudera, Inc. All rights reserved.
• UPDATE
• 2
• HybridTime
+1
[ ] HybridTime
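A hybrid timestamp pairs a physical component with a logical counter that is bumped by +1 whenever the physical clock has not advanced. The sketch below is a simplified illustration of that idea, not Kudu's implementation: the class name, the tuple layout, and the injected `physical_now` function are all assumptions made for the example, and clock error handling is left out.

```python
from typing import Callable, Tuple

Timestamp = Tuple[int, int]  # (physical time, logical counter), compared in that order


class HybridClock:
    """Simplified hybrid clock; `physical_now` is injected so the demo is deterministic."""

    def __init__(self, physical_now: Callable[[], int]) -> None:
        self._physical_now = physical_now
        self._last: Timestamp = (0, 0)

    def now(self) -> Timestamp:
        """Return a timestamp strictly greater than any returned before."""
        physical = self._physical_now()
        last_physical, last_logical = self._last
        if physical > last_physical:
            self._last = (physical, 0)                        # clock advanced: reset logical
        else:
            self._last = (last_physical, last_logical + 1)    # same or older reading: +1
        return self._last

    def update(self, incoming: Timestamp) -> Timestamp:
        """Merge a timestamp received from another node, then stamp the receive event."""
        if incoming > self._last:
            self._last = incoming
        return self.now()


a = HybridClock(lambda: 100)   # node A's physical clock reads 100
b = HybridClock(lambda: 90)    # node B's physical clock is behind
t1 = a.now()
t2 = a.now()
t3 = b.update(t2)              # B receives A's timestamp
t4 = b.now()
print(t1, t2, t3, t4)
```

Even though B's physical clock lags A's, every event B stamps after receiving A's message sorts after it, and the timestamp stays a fixed size, unlike a vector clock.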
48 © Cloudera, Inc. All rights reserved.
• !" #, % !" HTC #, %
[ ] HybridTime
49 © Cloudera, Inc. All rights reserved.
• HTC
• HTC
• ) ! → #
• −%& ! < !(()( # < %* #
[ ] Kudu HybridTime
j
i
#
−%& !
%* #−%& !
!
%& !
50 © Cloudera, Inc. All rights reserved.
• KUDU-146 Deal with leap seconds
• leap second
• Stratum 0 NTP Leap Indicator OS
• 23:59:59 -> 23:59:60 -> 00:00:00
• 23:59:59 -> 23:59:59-> 00:00:00
• 2 1
• HybridTime propagate
• Kudu commit-wait
• wait ms
1
• NTP TIME_INS/TIME_OOP max error
• Leap Smearing
https://issues.apache.org/jira/browse/KUDU-430
51 © Cloudera, Inc. All rights reserved.
• HybridTime
•
• Kudu RDB
• ACID
• Commit-wait
• NTP
• CLIENT_PROPAGATED
• HybridTime
• HybridTime propagate
Kudu
52 © Cloudera, Inc. All rights reserved.
• Kudu uses MVCC (Multi-Version Concurrency Control)
•
• WAL REDO UNDO
•
• READ_LATEST
•
• READ_AT_SNAPSHOT
• MVCC
•
Repeatable Read
Kudu
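The difference between reading the latest version and reading at a snapshot timestamp can be illustrated with a toy multi-version cell. This is a conceptual sketch only: `MvccCell` and its methods are hypothetical names, and it says nothing about how Kudu actually stores REDO/UNDO records.

```python
from bisect import bisect_right
from typing import Any, List, Optional


class MvccCell:
    """Toy multi-version value: every write is kept with its timestamp."""

    def __init__(self) -> None:
        self._timestamps: List[int] = []
        self._values: List[Any] = []

    def write(self, ts: int, value: Any) -> None:
        """Store a new version, keeping versions sorted by timestamp."""
        idx = bisect_right(self._timestamps, ts)
        self._timestamps.insert(idx, ts)
        self._values.insert(idx, value)

    def read_latest(self) -> Optional[Any]:
        """Return the newest version, whatever its timestamp."""
        return self._values[-1] if self._values else None

    def read_at_snapshot(self, ts: int) -> Optional[Any]:
        """Return the newest version written at or before `ts`."""
        idx = bisect_right(self._timestamps, ts)
        return self._values[idx - 1] if idx else None


cell = MvccCell()
cell.write(10, "a")
cell.write(20, "b")
first = cell.read_at_snapshot(15)
cell.write(30, "c")                  # a later write arrives
second = cell.read_at_snapshot(15)   # the snapshot read is repeatable
print(first, second, cell.read_latest())
```

Reading twice at the same snapshot timestamp returns the same value no matter what is written afterwards, which is the repeatable-read behavior the slide associates with snapshot scans.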
53 © Cloudera, Inc. All rights reserved.
Kudu
https://blog.cloudera.co.jp/11c3a749a81b
54 © Cloudera, Inc. All rights reserved.
• YCSB
•
• 3 8
• insert 60%, update 20%, single-row read 20%
•
• GCE: n1-standard-8 x10
• RAM 30GB
• Disk 350GB
• NTP
• GCE
[ ]
NTP
55 © Cloudera, Inc. All rights reserved.
[ ]
HybridTime Commit Wait Commit Wait
Clock Error
© Cloudera, Inc. All rights reserved.
57 © Cloudera, Inc. All rights reserved.
• OLTP OLAP 1 DB Kudu
• HybridTime
DB
• HybridTime → P.36
• Kudu
HybridTime Serializable
• OLAP Kudu #dbts2017
• 11 6 Cloudera World Tokyo 2018
Kudu
Kudu
58 © Cloudera, Inc. All rights reserved.
Cloudera World Tokyo 2018
http://www.clouderaworldtokyo.com/
THANK YOU

Architecture of the Distributed DB Apache Kudu: "HybridTime", the Mechanism That Reconciles DB Performance and Consistency
