Zhang, Zhuo, et al. "Fuxi: a fault-tolerant resource management and job scheduling system at internet scale." Proceedings of the VLDB Endowment 7.13 (2014): 1393-1404.
This was presented at a seminar reading session for a class. It is outside my area of expertise, so there may be mistakes, but I hope it is useful to someone.
Hadoop MapReduce Performance Study on ARM Cluster - airbots
This presentation presents a performance study of Hadoop MapReduce on an ARM cluster, comparing MapReduce application performance and energy consumption between the ARM cluster and a conventional x86_64 cluster.
The document discusses using Hadoop for scientific workloads and summarizes early results from benchmarking Hadoop. It explores using Hadoop and MapReduce for data-intensive scientific applications like BLAST sequence analysis. Performance results show that Hadoop can provide comparable performance to existing parallel file systems. Challenges include lack of turn-key solutions, managing data formats, and performance tuning. The research aims to understand the unique needs of science clouds and how to effectively support data-intensive scientific applications on cloud platforms.
Overview of myHadoop 0.30, a framework for deploying Hadoop on existing high-performance computing infrastructure. Discussion of how to install it, spin up a Hadoop cluster, and use the new features.
myHadoop 0.30's project page is now on GitHub (https://github.com/glennklockwood/myhadoop) and the latest release tarball can be downloaded from my website (glennklockwood.com/files/myhadoop-0.30.tar.gz)
[212] Big Models without Big Data: Using Domain-Specific Deep Networks in Data-... - NAVER D2
The document discusses techniques for using deep learning with limited data. It presents methods for data synthesis, domain adaptation, and data cleaning. For data synthesis, it describes using a game engine to procedurally generate synthetic videos with automatic annotations for action recognition training. For domain adaptation, it applies a model trained on mouse tracking saliency data to eye tracking data. For data cleaning, it introduces a technique to prune noisy images from a landmark dataset to obtain reliable training annotations. The techniques aim to leverage limited data to train deep networks for tasks like saliency mapping, image retrieval, and action recognition.
There have been plenty of “explaining EXPLAIN” type talks over the years, which provide a great introduction to it. They often also cover how to identify a few of the more common issues through it. EXPLAIN is a deep topic though, and to do a good introduction talk, you have to skip over a lot of the tricky bits. As such, this talk will not be a good introduction to EXPLAIN, but instead a deeper dive into some of the things most don’t cover. The idea is to start with some of the more complex and unintuitive calculations needed to work out the relationships between operations, rows, threads, loops, timings, buffers, CTEs and subplans. Most popular tools handle at least several of these well, but there are cases where they don’t that are worth being conscious of and alert to. For example, we’ll have a look at whether certain numbers are averaged per-loop or per-thread, or both. We’ll also cover a resulting rounding issue or two to be on the lookout for. Finally, some per-operation timing quirks are worth looking out for where CTEs and subqueries are concerned, for example CTEs that are referenced more than once. As time allows, we can also look at a few rarer issues that can be spotted via EXPLAIN, as well as a few more gotchas that we’ve picked up along the way. This includes things like spotting when the query is JIT, planning, or trigger time dominated, spotting the signs of table and index bloat, issues like lossy bitmap scans or index-only scans fetching from the heap, as well as some things to be aware of when using auto_explain.
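As a taste of the per-loop arithmetic the talk digs into, here is a hedged sketch: EXPLAIN ANALYZE reports "rows" and "actual time" as per-loop averages, so totals must be multiplied back out, and the per-loop rounding can hide small row counts entirely. The numbers are illustrative, not taken from a real plan.

```python
# EXPLAIN ANALYZE reports "rows" as a per-loop average, so recovering the
# total rows a node produced means multiplying the reported value by loops.

def total_rows(rows_per_loop, loops):
    # An index scan under a nested loop reporting rows=1 over loops=1000
    # actually produced about 1000 rows in total.
    return rows_per_loop * loops

assert total_rows(1, 1000) == 1000

# The rounding gotcha: reported rows are rounded *after* the per-loop
# division, so small per-loop counts can vanish from the output entirely.
actual_total = 400
loops = 1000
reported_rows = round(actual_total / loops)   # shown as rows=0 in the plan
```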
This document describes Onyx, a new flexible and extensible data processing system. Onyx aims to address limitations in existing frameworks when dealing with new resource environments like disaggregated computing and transient resources. The Onyx architecture includes a compiler that transforms dataflow programs into optimized execution plans using various passes. The runtime then executes the plans across cluster resources. Onyx allows dynamic optimization by collecting metrics during execution and generating new plans. It can harness transient resources by placing tasks strategically.
With a long history of open innovation with Hadoop, Yahoo continues to invest in and expand the platform capabilities by pushing the boundaries of what the platform can accomplish for the entire organization. In this talk, Sumeet Singh will present some of the recent innovations, open source contributions, and where things are headed when it comes to Hadoop at Yahoo.
Characterizing a High Throughput Computing Workload: The Compact Muon Solenoi... - Rafael Ferreira da Silva
Presentation held at ICCS 2015 Conference - Reykjavik, Iceland
High throughput computing (HTC) has aided the scientific community in the analysis of vast amounts of data and computational jobs in distributed environments. To manage these large workloads, several systems have been developed to efficiently allocate and provide access to distributed resources. Many of these systems rely on estimates of job characteristics (e.g., job runtime) to characterize the workload behavior, which in practice are hard to obtain. In this work, we perform an exploratory analysis of the CMS experiment workload using the statistical recursive partitioning method and conditional inference trees to identify patterns that characterize particular behaviors of the workload. We then propose an estimation process to predict job characteristics based on the collected data. Experimental results show that our process estimates job runtime with 75% accuracy on average, and produces nearly optimal predictions for disk and memory consumption.
More information: www.rafaelsilva.com
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni... - MLconf
Fast, Cheap and Deep – Scaling Machine Learning: Distributed high throughput machine learning is both a challenge and a key enabling technology. Using a Parameter Server template we are able to distribute algorithms efficiently over multiple GPUs and in the cloud. This allows us to design very fast recommender systems, factorization machines, classifiers, and deep networks. This degree of scalability allows us to tackle computationally expensive problems efficiently, yielding excellent results e.g. in visual question answering.
This document describes HFSP, a fair scheduling protocol for Hadoop that aims to improve performance for interactive jobs. It does so by estimating job sizes and simulating a processor-sharing model to determine job completion order. Key aspects include initial job size estimation that is refined over time, treating map and reduce phases separately, and using OS signals to suspend and resume reduce tasks for preemption instead of waiting or killing tasks. Experiments on Facebook workload traces showed HFSP significantly reduced average job completion times compared to Hadoop's default scheduler, especially for smaller clusters.
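The suspend/resume preemption the summary mentions can be illustrated with standard OS signals. This is a hedged, Linux-only sketch (it reads `/proc`), not HFSP's actual code: a stand-in "reduce task" is paused with SIGSTOP, so it keeps its state without consuming CPU, and resumed with SIGCONT.

```python
import signal
import subprocess
import sys
import time
from pathlib import Path

def state(pid):
    # Third field of /proc/<pid>/stat is the process state: T = stopped.
    return Path(f"/proc/{pid}/stat").read_text().split()[2]

# Stand-in for a running reduce task.
task = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
time.sleep(0.3)                      # let the task start

task.send_signal(signal.SIGSTOP)     # preempt: state is kept, no CPU used
time.sleep(0.2)
suspended = state(task.pid) == "T"

task.send_signal(signal.SIGCONT)     # resume exactly where it left off
time.sleep(0.2)
resumed = state(task.pid) in ("S", "R")

task.terminate()
task.wait()
```

Compared with killing and rescheduling, this keeps the task's partial work, which is why the paper preferred it over Hadoop's wait-or-kill options.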
The document describes benchmark results achieved by using NVMe SSDs and GPU acceleration to push PostgreSQL performance beyond its typical limitations. A benchmark of 13 queries on a 1055GB dataset with PostgreSQL v11beta3 + PG-Strom v2.1 achieved a maximum query execution throughput of 13.5GB/s. PG-Strom is an extension module that uses thousands of GPU cores and wide-band GPU memory to accelerate SQL workloads: it generates GPU code from SQL and loads table data directly from NVMe SSDs into the GPU via peer-to-peer DMA, bypassing the CPU and host memory to improve performance.
Enterprise Scale Topological Data Analysis Using Spark - Alpine Data
This document discusses scaling topological data analysis (TDA) using the Mapper algorithm to analyze large datasets. It describes how the authors built the first open-source scalable implementation of Mapper called Betti Mapper using Spark. Betti Mapper uses locality-sensitive hashing to bin data points and compute topological summaries on prototype points to achieve an 8-11x performance improvement over a naive Spark implementation. The key aspects of Betti Mapper that enable scaling to enterprise datasets are locality-sensitive hashing for sampling and using prototype points to reduce the distance matrix computation.
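The locality-sensitive hashing step described above can be sketched with random-hyperplane LSH. This is a hedged illustration of the general technique, not Betti Mapper's actual implementation: each hyperplane contributes one sign bit, nearby points tend to share the resulting key, and each key identifies one bin whose prototype can stand in for its members.

```python
import random

random.seed(0)

def make_hyperplanes(dim, n_planes):
    # Random Gaussian hyperplanes through the origin.
    return [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_planes)]

def lsh_key(point, planes):
    # One sign bit per hyperplane; nearby points tend to share the same key,
    # so each key identifies a "bin" of mutually close points.
    return tuple(int(sum(p * w for p, w in zip(point, h)) >= 0) for h in planes)

planes = make_hyperplanes(dim=3, n_planes=4)

# Bin a few points; a per-bin prototype would then stand in for all
# members when computing the distance matrix, as the summary describes.
bins = {}
for pt in [[1.0, 0.9, 1.1], [1.01, 0.91, 1.09], [-5.0, 4.0, 0.2]]:
    bins.setdefault(lsh_key(pt, planes), []).append(pt)
```

Reducing the distance matrix to prototype points is where the reported 8-11x speedup over the naive implementation would come from: the pairwise-distance cost drops from all points to bins only.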
An Energy-Efficient Demand-Response Model for High Performance Computing Systems - Jason Liu
This document presents a demand-response model that allows high performance computing systems to participate in demand-response programs. The model uses a job scheduler that prioritizes performance during normal operation but constrains power consumption and minimizes energy use during demand-response events by adjusting CPU frequencies via DVFS. Evaluated in a simulator, the model reduced energy use and turnaround time compared to performance-only scheduling while helping stabilize power systems during demand-response periods.
This document discusses integrating XGBoost machine learning with Spark and DataFrames. It provides examples of using XGBoost in Spark to train models on distributed data and make predictions on streaming data in parallel. It also discusses future work, such as using Rabit for parallel learning, adding support for more platforms like Windows, and integrating with Spark ML pipelines.
1. The document discusses multi-resource packing of tasks with dependencies to improve cluster scheduler performance. It describes problems with current schedulers related to resource fragmentation and over-allocation.
2. A packing heuristic is proposed that assigns tasks to machines based on an alignment score to reduce fragmentation and spread load. A job completion time heuristic is also described.
3. The paper presents results showing improvements in makespan and job completion times from approaches that consider dependent tasks and multiple resource demands compared to current schedulers. It also discusses achieving trade-offs between performance and fairness.
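The alignment-score heuristic from point 2 can be sketched as a dot product between a task's demand vector and a machine's free-resource vector. This is a hedged illustration of the idea, with illustrative names and numbers, not the paper's code: tasks that don't fit are excluded, and among machines that fit, the highest alignment wins, which both reduces fragmentation and spreads load.

```python
def alignment(demands, free):
    # Higher score = the task's demands line up with what the machine has spare.
    if any(d > f for d, f in zip(demands, free)):
        return -1  # task does not fit on this machine at all
    return sum(d * f for d, f in zip(demands, free))

def best_machine(demands, machines):
    scores = {m: alignment(demands, free) for m, free in machines.items()}
    return max(scores, key=scores.get)

machines = {
    "m1": [8, 2],   # lots of spare CPU, little spare memory
    "m2": [4, 6],   # balanced spare capacity
}
task = [2, 4]        # CPU-light, memory-heavy task
# m1 cannot fit the memory demand; m2 fits and scores 2*4 + 4*6 = 32.
assert best_machine(task, machines) == "m2"
```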
This document discusses Kubeflow operators and how they enable Kubeflow to support multiple machine learning frameworks like TensorFlow, PyTorch, MXNet, and Chainer. It explains that operators and custom resource definitions (CRDs) allow ML jobs to be defined and managed for different frameworks. It provides examples of how jobs are defined for TensorFlow using TFJobs and for Chainer using ChainerJobs. It also summarizes how operators work by expanding the custom resources into Kubernetes objects like pods, services, and statefulsets.
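A TFJob custom resource of the kind described above looks roughly like the following. Field names follow the public `kubeflow.org/v1` TFJob API; the job name, image, and command are placeholders.

```yaml
# Hedged sketch of a TFJob custom resource; the operator expands this into
# one pod and one headless service per replica.
apiVersion: kubeflow.org/v1
kind: TFJob
metadata:
  name: mnist-train          # placeholder name
spec:
  tfReplicaSpecs:
    Worker:
      replicas: 2
      template:
        spec:
          containers:
            - name: tensorflow
              image: example/mnist:latest       # placeholder image
              command: ["python", "train.py"]   # placeholder entrypoint
```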
PostgreSQL uses MVCC which creates multiple versions of rows during updates and deletes. This leads to bloat and fragmentation over time as unused row versions accumulate. The VACUUM command performs garbage collection to recover space from dead rows. HOT updates and pruning help reduce bloat by avoiding index bloat during certain updates. Future improvements include parallel and eager vacuuming as well as pluggable storage engines like zheap to further reduce bloat.
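The version-accumulation behavior described above can be shown with a toy model. This is a hedged sketch of the MVCC idea, not PostgreSQL internals: an UPDATE writes a new row version and marks the old one dead via `xmax`, and VACUUM later reclaims versions no transaction can still see.

```python
class Table:
    def __init__(self):
        self.versions = []          # each: dict(xmin, xmax, data)
        self.next_xid = 1           # toy transaction-id counter

    def insert(self, data):
        self.versions.append({"xmin": self.next_xid, "xmax": None, "data": data})
        self.next_xid += 1

    def update(self, old, new):
        xid = self.next_xid
        for v in self.versions:
            if v["data"] == old and v["xmax"] is None:
                v["xmax"] = xid                      # old version is now dead
        self.versions.append({"xmin": xid, "xmax": None, "data": new})
        self.next_xid += 1

    def vacuum(self, oldest_active_xid):
        # Reclaim versions deleted before the oldest still-running transaction.
        before = len(self.versions)
        self.versions = [v for v in self.versions
                         if v["xmax"] is None or v["xmax"] >= oldest_active_xid]
        return before - len(self.versions)

t = Table()
t.insert("v1")
t.update("v1", "v2")           # table now holds a dead "v1" and a live "v2"
assert len(t.versions) == 2    # bloat: two versions for one logical row
assert t.vacuum(oldest_active_xid=10) == 1
```

HOT updates avoid the index side of this cost by keeping the new version on the same heap page when no indexed column changed, so indexes need not be touched.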
OpenStack is a cloud computing platform that provides Infrastructure as a Service (IaaS), comprising compute, storage, and network resources. Resource allocation in a cloud environment deals with assigning the available resources in a cost-effective manner. In OpenStack, resource allocation is carried out by nova-scheduler, which logically allocates compute, network, and storage resources to the instance requests made by OpenStack users. For efficient and cost-effective use of the scheduler, a resource pool containing the best available hosts can be created and maintained. This paper presents a time-saving way to allocate resources, in the form of virtual machines, for large cloud deployments.
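The filter-and-weigh pattern that nova-scheduler applies per request can be sketched as follows. The data structures and weigher are illustrative, not the real nova code; maintaining the paper's "resource pool" amounts to keeping the filtered, sorted candidate list warm instead of recomputing it for every request.

```python
def filter_hosts(hosts, vcpus, ram_mb):
    # Keep only hosts that can satisfy the instance's resource request.
    return [h for h in hosts
            if h["free_vcpus"] >= vcpus and h["free_ram_mb"] >= ram_mb]

def weigh(host):
    # Prefer hosts with the most free RAM (nova's default RAM weigher
    # behaves similarly).
    return host["free_ram_mb"]

hosts = [
    {"name": "node1", "free_vcpus": 2, "free_ram_mb": 2048},
    {"name": "node2", "free_vcpus": 8, "free_ram_mb": 16384},
    {"name": "node3", "free_vcpus": 4, "free_ram_mb": 8192},
]

# Resource pool: best candidates first, ready to serve VM requests.
pool = sorted(filter_hosts(hosts, vcpus=4, ram_mb=4096), key=weigh, reverse=True)
```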
This document discusses using Jupyter Notebook for machine learning projects with Spark. It describes running Python, Spark, and pandas code in Jupyter notebooks to work with data from various sources and build machine learning models. Key points include using notebooks for an ML pipeline, running Spark jobs, visualizing data, and building word embedding models with Spark. The document emphasizes how Jupyter notebooks allow integrating various tools for an ML workflow.
Resource Aware Scheduling for Hadoop [Final Presentation] - Lu Wei
The document describes a resource-aware scheduler for Hadoop that aims to improve task scheduling by considering both job resource demands and node resource availability. It captures job and node profiles, estimates task execution times, and applies scheduling policies like shortest job first. Evaluation on word count and Pi estimation workloads showed the estimated task times closely matched the actual times, demonstrating the accuracy of the scheduler's resource modeling and estimations.
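The shortest-job-first policy mentioned above can be sketched in a few lines. This is a hedged illustration over made-up runtime estimates, not the scheduler's code; the second half shows why SJF is attractive: it minimizes average waiting time relative to FIFO.

```python
def sjf_order(jobs):
    # jobs: {job_id: estimated_seconds}; run the shortest estimate first.
    return sorted(jobs, key=jobs.get)

estimates = {"wordcount": 120.0, "pi": 30.0, "sort": 300.0}
assert sjf_order(estimates) == ["pi", "wordcount", "sort"]

def avg_wait(order, est):
    # Average time each job spends queued before it starts.
    waits, clock = [], 0.0
    for j in order:
        waits.append(clock)
        clock += est[j]
    return sum(waits) / len(waits)

# SJF beats submission (FIFO) order on average wait: 60s vs 90s here.
assert avg_wait(sjf_order(estimates), estimates) < avg_wait(list(estimates), estimates)
```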
Adventures in Observability: How in-house ClickHouse deployment enabled Inst... - Altinity Ltd
This document discusses Instana's use of ClickHouse as a data store for application monitoring data. It provides an overview of Instana's APM capabilities, how it collects and analyzes data using ClickHouse, and lessons learned from operating ClickHouse clusters at scale. Key points include that Instana uses ClickHouse to store trace data and generate dashboards, ClickHouse helps process millions of records quickly, and Instana monitors ClickHouse using its own APM platform to gain end-to-end visibility of performance.
Slides for In-Datacenter Performance Analysis of a Tensor Processing Unit - Carlo C. del Mundo
The document discusses the motivation for developing the Tensor Processing Unit (TPU), which was that DNN-based workloads were consuming a large and growing portion of datacenter compute resources. It describes how the TPU was developed by Norman Jouppi and others at Google to be much more efficient than CPUs and GPUs for DNN workloads, with up to 80x higher performance per watt. It provides details on the TPU architecture and experimental results showing it significantly outperformed GPUs on latency for DNN inference tasks.
Treasure Data on The YARN - Hadoop Conference Japan 2014 - Ryu Kobayashi
Ryu Kobayashi from Treasure Data gave a presentation on using YARN (Yet Another Resource Negotiator) with Hadoop. Some key points:
- YARN was introduced to improve Hadoop resource management by separating processing from scheduling.
- Configuration changes are required when moving from MRv1 to YARN, including properties for memory allocation and scheduler configuration.
- Container execution, directories, and other components were adapted in the transition from JobTracker to the ResourceManager and NodeManager architecture in YARN.
- Proper configuration of YARN is important to avoid bugs, and tools from distributions can help with configuration.
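The memory and scheduler properties the slides refer to live in `yarn-site.xml`. The property names below are real YARN configuration keys; the values are illustrative only.

```xml
<!-- yarn-site.xml: representative MRv1-to-YARN migration settings;
     values are illustrative, tune them to the node's actual hardware. -->
<configuration>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>8192</value>  <!-- RAM the NodeManager may hand out to containers -->
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>1024</value>  <!-- smallest container the scheduler will grant -->
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
  </property>
</configuration>
```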
This document discusses the development of Apache Pig on Tez, an execution engine for Pig jobs. Pig on Tez allows Pig workflows to be executed as directed acyclic graphs (DAGs) using Tez, improving performance over the default MapReduce execution. Key benefits of Tez include eliminating intermediate data writes, reducing job launch overhead, and allowing more flexible data flows. However, challenges remain around automatically determining optimal parallelism and integrating Tez with user interface and monitoring tools. Future work is needed to address these issues.
This document discusses running Spark on the cloud, including the advantages, challenges, and how Qubole addresses them. Some key advantages include using S3 for storage which allows independent scaling of storage and compute, ability to create ephemeral clusters on demand, and autoscaling capabilities. Challenges involve cluster lifecycle management, different interfaces needed, Spark autoscaling, debuggability across clusters, and handling spot instances. Qubole provides tools that automate cluster management, enable autoscaling of Spark, and make experiences seamless across clusters and interfaces.
Flink Forward SF 2017: Feng Wang & Zhijiang Wang - Runtime Improvements in Bl... - Flink Forward
In 2016, we introduced Alibaba's compute engine Blink, which was based on our private branch of Flink. It enabled many large-scale applications in Alibaba's core business, such as search, recommendation, and ads. Through deep and close collaboration with the Flink community, we are finally close to contributing our improvements back to the community. In this talk, we will present our recent key contributions to the Flink runtime, such as the new YARN cluster mode for FLIP-6, fine-grained failover for FLIP-1, async I/O for FLIP-12, and incremental checkpointing, along with Alibaba's further improvement plans for the near future. Moreover, we will show some production use cases to illustrate how Flink works in Alibaba's large-scale online applications, including real-time ETL as well as online machine learning. This talk is presented by Alibaba.
Automating materials science workflows with pymatgen, FireWorks, and atomate (Anubhav Jain)
FireWorks is a workflow management system that allows researchers to define and execute complex computational materials science workflows on local or remote computing resources in an automated manner. It provides features such as error detection and recovery, job scheduling, provenance tracking, and remote file access. The atomate library builds on FireWorks to provide a high-level interface for common materials simulation procedures like structure optimization, band structure calculation, and property prediction using popular codes like VASP. Together, these tools aim to make high-throughput computational materials discovery and design more accessible to researchers.
Clustering can provide high availability and scalability. Shared nothing architectures are best for achieving both high availability and scalability together. Oracle Real Application Cluster (RAC) offers advantages over alternative Oracle clustering configurations, but its scalability is limited. The cost-effectiveness of using RAC in a redundant array of inexpensive servers configuration is small due to its limited scalability. Alternatives may be more suitable depending on specific needs and requirements.
The document discusses Oracle database performance tuning. It covers identifying and resolving performance issues through tools like AWR and ASH reports. Common causes of performance problems include wait events, old statistics, incorrect execution plans, and I/O issues. The document recommends collecting specific data when analyzing problems and provides references and scripts for further tuning tasks.
Reproducible Computational Pipelines with Docker and Nextflow (inside-BigData.com)
This document summarizes a presentation about using Docker and Nextflow to create reproducible computational pipelines. It discusses two major challenges in computational biology being reproducibility and complexity. Containers like Docker help address these challenges by creating portable and standardized environments. Nextflow is introduced as a workflow framework that allows pipelines to run across platforms and isolates dependencies using containers, enabling fast prototyping. Examples are given of using Nextflow with Docker to run pipelines on different systems like HPC clusters in a scalable and reproducible way.
Streaming in Practice - Putting Apache Kafka in Production (confluent)
This presentation focuses on how to integrate all these components into an enterprise environment and what things you need to consider as you move into production.
We will touch on the following topics:
- Patterns for integrating with existing data systems and applications
- Metadata management at enterprise scale
- Tradeoffs in performance, cost, availability and fault tolerance
- Choosing which cross-datacenter replication patterns fit with your application
- Considerations for operating Kafka-based data pipelines in production
Speeding Up Spark Performance using Alluxio at China Unicom (Alluxio, Inc.)
Data Orchestration Summit 2020 organized by Alluxio
https://www.alluxio.io/data-orchestration-summit-2020/
Speeding Up Spark Performance using Alluxio at China Unicom
Ce Zhang, Big Data Engineer (China Unicom)
About Alluxio: alluxio.io
Engage with the open source community on slack: alluxio.io/slack
SOME WORKLOAD SCHEDULING ALTERNATIVES 11.07.2013 (James McGalliard)
This document discusses various workload scheduling alternatives for high performance computing environments. It begins by describing typical HPC workloads and challenges in scheduling large parallel jobs. It then covers scheduling techniques like backfill and frameworks like MapReduce and Hadoop. Alternative prioritization methods are proposed, like prioritizing based on estimated run time, wait time, or number of processors requested. The document concludes by showing results comparing different dynamic prioritization approaches.
2014 04-17 Applied SCAP, Red Hat Summit 2014 (Shawn Wells)
The document outlines a 45 minute presentation with 3 goals: 1) detail security automation technology and initiatives including OpenSCAP, configuration compliance using SCAP Security Guides, and evolving remediation capabilities; 2) provide a live demo of configuration compliance scanning, patch and vulnerability scanning, and certification/accreditation paperwork generation; 3) discuss the roadmap for government plans, packaging, and future profiles. It then provides an overview of SCAP, the SCAP Security Guide project and contributors, and remediation capabilities including both bash and puppet approaches.
Geospatial Synergy: Amplifying Efficiency with FME & Esri ft. Peak Guest Spea... (Safe Software)
Dive deep into the world of geospatial data management and transformation in our upcoming webinar focusing on the powerful integration of FME and Esri technologies. This insightful session comprises two compelling segments aimed at enhancing your geospatial workflows, while minimizing operational hurdles.
In the first segment, guest speaker Jan Roggisch from Locus unveils how Auckland Council triumphed over the challenges of handling large, frequent data updates on ArcGIS Online using FME. Discover the journey from manual data handling to an automated, streamlined process that reduced server downtime from minutes to seconds: setting a new standard for local government organizations.
The second segment, led by James Botterill from 1Spatial, unveils the magic of incorporating ArcPy into your FME workflows. Delve into real-world scenarios where ArcGIS geoprocessing is harmoniously orchestrated within FME using the PythonCaller. Gain insights into raster-vector data conversion, spatial analysis, and a host of practical tips and tricks that empower you to leverage the combined capabilities of FME and Esri for efficient data manipulation and conversion.
Join us to explore the remarkable possibilities that open up when FME and Esri technologies converge – enhancing your ability to manage and transform geospatial data with unprecedented efficiency.
Yahoo migrated most of its Pig workload from MapReduce to Tez to achieve significant performance improvements and resource utilization gains. Some key challenges in the migration included addressing misconfigurations, bad programming practices, and behavioral changes between the frameworks. Yahoo was able to run very large and complex Pig on Tez jobs involving hundreds of vertices and terabytes of data smoothly at scale. Further optimizations are still needed around speculative execution and container reuse to improve utilization even more. The migration to Tez resulted in up to 30% reduction in runtime, memory, and CPU usage for Yahoo's Pig workload.
Adaptive Query Execution: Speeding Up Spark SQL at Runtime (Databricks)
Over the years, there has been extensive and continuous effort on improving Spark SQL’s query optimizer and planner, in order to generate high quality query execution plans. One of the biggest improvements is the cost-based optimization framework that collects and leverages a variety of data statistics (e.g., row count, number of distinct values, NULL values, max/min values, etc.) to help Spark make better decisions in picking the most optimal query plan.
Building an intelligent big data application on top of xPatterns using tools that leverage Spark, Shark, Mesos, Tachyon, and Cassandra; Jaws, our open-sourced Spark SQL RESTful service; our contributions to the Spark and Mesos projects; and lessons learned.
The document provides an overview of the Hadoop platform at Yahoo over the past year. It discusses the evolution of the platform infrastructure and metrics including growth in storage from 12PB to 65PB and compute capacity from 23TB to 240TB. It highlights new technologies added to the platform like CaffeOnSpark for distributed deep learning, Apache Storm for streaming analytics, and data sketches algorithms. It also discusses enhancements to existing technologies like HBase for transactions with Omid and improvements to Oozie for data pipelines. The document aims to provide insights on how the Hadoop platform at Yahoo has scaled to support growing analytics needs through consolidation, new services, and ease of use features.
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers (akankshawande)
Simplify your search for a reliable Python development partner! This list presents the top 10 trusted US providers offering comprehensive Python development services, ensuring your project's success from conception to completion.
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels (Northern Engraving)
What began over 115 years ago as a supplier of precision gauges to the automotive industry has evolved into being an industry leader in the manufacture of product branding, automotive cockpit trim and decorative appliance trim. Value-added services include in-house Design, Engineering, Program Management, Test Lab and Tool Shops.
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors (DianaGray10)
Join us to learn how UiPath Apps can directly and easily interact with prebuilt connectors via Integration Service--including Salesforce, ServiceNow, Open GenAI, and more.
The best part is you can achieve this without building a custom workflow! Say goodbye to the hassle of using separate automations to call APIs. By seamlessly integrating within App Studio, you can now easily streamline your workflow, while gaining direct access to our Connector Catalog of popular applications.
We’ll discuss and demo the benefits of UiPath Apps and connectors including:
- Creating a compelling user experience for any software, without the limitations of APIs
- Accelerating the app creation process, saving time and effort
- Enjoying high-performance CRUD (create, read, update, delete) operations for seamless data management
Speakers:
Russell Alfeche, Technology Leader, RPA at qBotic and UiPath MVP
Charlie Greenberg, host
What is an RPA CoE? Session 1 – CoE Vision (DianaGray10)
In the first session, we will review the organization's vision and how this has an impact on the COE Structure.
Topics covered:
• The role of a steering committee
• How do the organization’s priorities determine CoE Structure?
Speaker:
Chris Bolin, Senior Intelligent Automation Architect Anika Systems
Session 1 - Intro to Robotic Process Automation.pdf (UiPathCommunity)
👉 Check out our full 'Africa Series - Automation Student Developers (EN)' page to register for the full program:
https://bit.ly/Automation_Student_Kickstart
In this session, we shall introduce you to the world of automation, the UiPath Platform, and guide you on how to install and setup UiPath Studio on your Windows PC.
📕 Detailed agenda:
What is RPA? Benefits of RPA?
RPA Applications
The UiPath End-to-End Automation Platform
UiPath Studio CE Installation and Setup
💻 Extra training through UiPath Academy:
Introduction to Automation
UiPath Business Automation Platform
Explore automation development with UiPath Studio
👉 Register here for our upcoming Session 2 on June 20: Introduction to UiPath Studio Fundamentals: https://community.uipath.com/events/details/uipath-lagos-presents-session-2-introduction-to-uipath-studio-fundamentals/
In the realm of cybersecurity, offensive security practices act as a critical shield. By simulating real-world attacks in a controlled environment, these techniques expose vulnerabilities before malicious actors can exploit them. This proactive approach allows manufacturers to identify and fix weaknesses, significantly enhancing system security.
This presentation delves into the development of a system designed to mimic Galileo's Open Service signal using software-defined radio (SDR) technology. We'll begin with a foundational overview of both Global Navigation Satellite Systems (GNSS) and the intricacies of digital signal processing.
The presentation culminates in a live demonstration. We'll showcase the manipulation of Galileo's Open Service pilot signal, simulating an attack on various software and hardware systems. This practical demonstration serves to highlight the potential consequences of unaddressed vulnerabilities, emphasizing the importance of offensive security practices in safeguarding critical infrastructure.
5th LF Energy Power Grid Model Meet-up Slides (DanBrown980551)
5th Power Grid Model Meet-up
It is with great pleasure that we extend to you an invitation to the 5th Power Grid Model Meet-up, scheduled for 6th June 2024. This event will adopt a hybrid format, allowing participants to join us either through an online Microsoft Teams session or in person at TU/e, located at Den Dolech 2, Eindhoven, Netherlands. The meet-up will be hosted by Eindhoven University of Technology (TU/e), a research university specializing in engineering science & technology.
Power Grid Model
The global energy transition is placing new and unprecedented demands on Distribution System Operators (DSOs). Alongside upgrades to grid capacity, processes such as digitization, capacity optimization, and congestion management are becoming vital for delivering reliable services.
Power Grid Model is an open source project from Linux Foundation Energy and provides a calculation engine that is increasingly essential for DSOs. It offers a standards-based foundation enabling real-time power systems analysis, simulations of electrical power grids, and sophisticated what-if analysis. In addition, it enables in-depth studies and analysis of the electrical power grid’s behavior and performance. This comprehensive model incorporates essential factors such as power generation capacity, electrical losses, voltage levels, power flows, and system stability.
Power Grid Model is currently being applied in a wide variety of use cases, including grid planning, expansion, reliability, and congestion studies. It can also help in analyzing the impact of renewable energy integration, assessing the effects of disturbances or faults, and developing strategies for grid control and optimization.
What to expect
For the upcoming meetup we are organizing, we have an exciting lineup of activities planned:
-Insightful presentations covering two practical applications of the Power Grid Model.
-An update on the latest advancements in Power Grid Model technology during the first and second quarters of 2024.
-An interactive brainstorming session to discuss and propose new feature requests.
-An opportunity to connect with fellow Power Grid Model enthusiasts and users.
Fueling AI with Great Data with Airbyte Webinar (Zilliz)
This talk will focus on how to collect data from a variety of sources, leveraging this data for RAG and other GenAI use cases, and finally charting your course to productionalization.
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an... (Jason Yip)
The typical problem in product engineering is not bad strategy, so much as “no strategy”. This leads to confusion, lack of motivation, and incoherent action. The next time you look for a strategy and find an empty space, instead of waiting for it to be filled, I will show you how to fill it in yourself. If you’re wrong, it forces a correction. If you’re right, it helps create focus. I’ll share how I’ve approached this in the past, both what works and lessons for what didn’t work so well.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/temporal-event-neural-networks-a-more-efficient-alternative-to-the-transformer-a-presentation-from-brainchip/
Chris Jones, Director of Product Management at BrainChip , presents the “Temporal Event Neural Networks: A More Efficient Alternative to the Transformer” tutorial at the May 2024 Embedded Vision Summit.
The expansion of AI services necessitates enhanced computational capabilities on edge devices. Temporal Event Neural Networks (TENNs), developed by BrainChip, represent a novel and highly efficient state-space network. TENNs demonstrate exceptional proficiency in handling multi-dimensional streaming data, facilitating advancements in object detection, action recognition, speech enhancement and language model/sequence generation. Through the utilization of polynomial-based continuous convolutions, TENNs streamline models, expedite training processes and significantly diminish memory requirements, achieving notable reductions of up to 50x in parameters and 5,000x in energy consumption compared to prevailing methodologies like transformers.
Integration with BrainChip’s Akida neuromorphic hardware IP further enhances TENNs’ capabilities, enabling the realization of highly capable, portable and passively cooled edge devices. This presentation delves into the technical innovations underlying TENNs, presents real-world benchmarks, and elucidates how this cutting-edge approach is positioned to revolutionize edge AI across diverse applications.
inQuba Webinar Mastering Customer Journey Management with Dr Graham Hill (LizaNolte)
HERE IS YOUR WEBINAR CONTENT! 'Mastering Customer Journey Management with Dr. Graham Hill'. We hope you find the webinar recording both insightful and enjoyable.
In this webinar, we explored essential aspects of Customer Journey Management and personalization. Here’s a summary of the key insights and topics discussed:
Key Takeaways:
Understanding the Customer Journey: Dr. Hill emphasized the importance of mapping and understanding the complete customer journey to identify touchpoints and opportunities for improvement.
Personalization Strategies: We discussed how to leverage data and insights to create personalized experiences that resonate with customers.
Technology Integration: Insights were shared on how inQuba’s advanced technology can streamline customer interactions and drive operational efficiency.
The Microsoft 365 Migration Tutorial For Beginner.pptx (operationspcvita)
This presentation will help you understand the power of Microsoft 365. It covers every productivity app included in Office 365, outlines common migration scenarios, and explains how we can help.
You can also read: https://www.systoolsgroup.com/updates/office-365-tenant-to-tenant-migration-step-by-step-complete-guide/
Taking AI to the Next Level in Manufacturing.pdf (ssuserfac0301)
Read Taking AI to the Next Level in Manufacturing to gain insights on AI adoption in the manufacturing industry, such as:
1. How quickly AI is being implemented in manufacturing.
2. Which barriers stand in the way of AI adoption.
3. How data quality and governance form the backbone of AI.
4. Organizational processes and structures that may inhibit effective AI adoption.
5. Ideas and approaches to help build your organization's AI strategy.
"Scaling RAG Applications to serve millions of users", Kevin Goedecke (Fwdays)
How we managed to grow and scale a RAG application from zero to thousands of users in 7 months, with lessons from the technical challenges of managing high load for LLMs, RAG pipelines, and vector databases.
Discover top-tier mobile app development services, offering innovative solutions for iOS and Android. Enhance your business with custom, user-friendly mobile applications.
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency (ScyllaDB)
Freshworks creates AI-boosted business software that helps employees work more efficiently and effectively. Managing data across multiple RDBMS and NoSQL databases was already a challenge at their current scale. To prepare for 10X growth, they knew it was time to rethink their database strategy. Learn how they architected a solution that would simplify scaling while keeping costs under control.
47. What is a DAG? (4. Fault Tolerant Job Scheduling)
A DAG (directed acyclic graph) describes the dependencies between tasks:
• A, B: independent tasks that can run in parallel
• C: depends on both A and B, so C can start only after A and B have finished
From: https://www.quora.com/What-are-the-advantages-of-DAG-directed-acyclic-graph-execution-of-big-data-algorithms-over-MapReduce-I-know-that-Apache-Spark-Storm-and-Tez-use-the-DAG-execution-model-over-MapReduce-Why-Are-there-any-disadvantages
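The slide's example DAG (A and B independent, C depending on both) can be sketched in Python; the task bodies here are hypothetical stand-ins, but the futures express the same dependency edges:

```python
import concurrent.futures

# Hypothetical task bodies; the names A, B, C follow the slide's example DAG.
def task_a():
    return "A done"

def task_b():
    return "B done"

def task_c(a_result, b_result):
    # C consumes the outputs of A and B, so it cannot start earlier.
    return f"C after ({a_result}, {b_result})"

with concurrent.futures.ThreadPoolExecutor() as pool:
    # A and B have no dependencies, so they are submitted concurrently.
    fut_a = pool.submit(task_a)
    fut_b = pool.submit(task_b)
    # Blocking on both futures enforces the C -> {A, B} edges of the DAG.
    result = task_c(fut_a.result(), fut_b.result())

print(result)  # C after (A done, B done)
```

A scheduler such as Tez generalizes this pattern: each vertex runs as soon as all of its parent vertices have produced their outputs.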
53. FuxiMaster failover (1) (4. Fault Tolerant Job Scheduling)
• Hard state: the job descriptions and the machine blacklist, which FuxiMaster persists to reliable storage
• Soft state: runtime information held by the FuxiAgents and AppMasters, which is not persisted
• On failover, the new FuxiMaster reloads the hard state and rebuilds the soft state by collecting it from the FuxiAgents and AppMasters
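The hard-state/soft-state split described on this slide can be sketched as follows. This is a minimal illustration of the idea only; the class and method names are invented for the sketch and are not the real Fuxi API:

```python
import json
import os
import tempfile

class Master:
    """Toy master: persists hard state, rebuilds soft state on recovery."""

    def __init__(self, store_path):
        self.store_path = store_path
        self.jobs = {}          # hard state: job descriptions
        self.blacklist = set()  # hard state: blacklisted machines
        self.assignments = {}   # soft state: rebuilt after failover, never persisted

    def submit_job(self, job_id, description):
        self.jobs[job_id] = description
        self._checkpoint()  # hard state is written out on every change

    def _checkpoint(self):
        with open(self.store_path, "w") as f:
            json.dump({"jobs": self.jobs, "blacklist": sorted(self.blacklist)}, f)

    def recover(self, agents):
        # 1) Reload hard state from reliable storage.
        with open(self.store_path) as f:
            saved = json.load(f)
        self.jobs = saved["jobs"]
        self.blacklist = set(saved["blacklist"])
        # 2) Rebuild soft state by asking each agent what it is running.
        self.assignments = {}
        for agent in agents:
            self.assignments.update(agent.report_running_tasks())

class Agent:
    """Toy agent that reports its locally running tasks on request."""

    def __init__(self, running):
        self._running = running

    def report_running_tasks(self):
        return dict(self._running)

path = os.path.join(tempfile.mkdtemp(), "hard_state.json")
m1 = Master(path)
m1.submit_job("job-1", {"cmd": "wordcount"})

m2 = Master(path)                       # new master started after a crash
m2.recover([Agent({"task-1": "job-1"})])
print(m2.jobs["job-1"]["cmd"])          # wordcount
print(m2.assignments["task-1"])         # job-1
```

The design point is that only the small, slowly changing hard state needs durable writes on the critical path; the bulky, fast-changing soft state is cheap to reconstruct from the agents, which keeps normal operation fast while still allowing the master to fail over.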