11. Data flow graph
Map stage: start → Load → Select ("fields": ["_Url", "_Res"]) → Select ("call": "get_site") → Filter ("expr": ["_Site", "match", "/news.[^.]+.cn/"]) → Group ("fields": ["_Site"]) → end
Reduce stage: start → count, sum → merge → Store → end
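A minimal Python sketch of what the Map-stage nodes compute, run on in-memory records. Assumptions: `get_site` is taken to return the URL's host name, "match" is treated as a regex search on `_Site` (with the surrounding slashes dropped), and a per-site count stands in for the aggregation nodes. The Reduce-stage sum, merge and Store are not modeled. `run_pipeline` and the sample URLs are hypothetical.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Regex from the Filter node, with the surrounding slashes removed.
SITE_PATTERN = re.compile(r"news.[^.]+.cn")


def get_site(url):
    """Hypothetical stand-in for the get_site call: the URL's host name."""
    return urlsplit(url).hostname or ""


def run_pipeline(records):
    # Select: keep only the _Url and _Res fields.
    selected = ({"_Url": r["_Url"], "_Res": r["_Res"]} for r in records)
    # Select with "call": "get_site": derive _Site from _Url.
    with_site = ({**r, "_Site": get_site(r["_Url"])} for r in selected)
    # Filter: keep rows whose _Site matches (assumed: regex search).
    kept = (r for r in with_site if SITE_PATTERN.search(r["_Site"]))
    # Group by _Site, then count rows per group.
    return Counter(r["_Site"] for r in kept)


logs = [
    {"_Url": "http://news.foo.cn/a.html", "_Res": 200},
    {"_Url": "http://news.foo.cn/b.html", "_Res": 404},
    {"_Url": "http://news.bar.cn/", "_Res": 200},
    {"_Url": "http://blog.foo.cn/", "_Res": 200},
]
print(run_pipeline(logs))  # Counter({'news.foo.cn': 2, 'news.bar.cn': 1})
```

Note that `[^.]+` cannot span a dot, so a host such as `news.sina.com.cn` would not match this pattern.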
12. Intermediate code
The data flow graph is compiled into a JSON list of commands:
[
  {
    "cmd": "load",
    "path": null,
    "using": "SchemaReader",
    "from": 0,
    "options": {"max_item_in_mem": 100000},
    "include": [25]
  },
  {"cmd": "filter", …},
  {"cmd": "group", …},
  ……
]
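One way to read this intermediate code is as commands executed in order, each dispatched on its "cmd" key. The sketch below assumes exactly that. The filter and group bodies are hypothetical fill-ins for the elided parts, reusing the "expr" and "fields" attributes from the data flow graph. The load handler just passes in-memory rows through instead of reading with SchemaReader, and group is assumed to count rows per key. `run` and `HANDLERS` are illustrative names.

```python
import json
import re
from collections import Counter

# The load command is copied from the slide; the filter and group bodies are
# hypothetical fill-ins for the elided "…" parts.
PROGRAM = json.loads("""
[
  {"cmd": "load", "path": null, "using": "SchemaReader", "from": 0,
   "options": {"max_item_in_mem": 100000}, "include": [25]},
  {"cmd": "filter", "expr": ["_Site", "match", "news.[^.]+.cn"]},
  {"cmd": "group", "fields": ["_Site"]}
]
""")


def do_load(cmd, rows):
    # Assumption: rows are already in memory; SchemaReader, "from" and
    # "include" are not modeled here.
    return list(rows)


def do_filter(cmd, rows):
    field, op, pattern = cmd["expr"]
    if op != "match":
        raise ValueError(f"unsupported operator: {op}")
    regex = re.compile(pattern)
    return [r for r in rows if regex.search(r[field])]


def do_group(cmd, rows):
    # Assumption: group counts rows per tuple of grouping-field values.
    return Counter(tuple(r[f] for f in cmd["fields"]) for r in rows)


HANDLERS = {"load": do_load, "filter": do_filter, "group": do_group}


def run(program, rows):
    data = rows
    for cmd in program:
        # Dispatch on the "cmd" key, feeding each command's output forward.
        data = HANDLERS[cmd["cmd"]](cmd, data)
    return data


rows = [{"_Site": "news.foo.cn"}, {"_Site": "www.foo.com"}, {"_Site": "news.foo.cn"}]
print(run(PROGRAM, rows))  # Counter({('news.foo.cn',): 2})
```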
20. Combiner
Without a combiner: Group (map phase) → Shuffle → Reduce: Count (reduce phase)
With a combiner: Group → Combine: Group, Count (map phase) → Shuffle → Reduce: Sum (reduce phase)
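Both pipelines give the same per-key totals, because a count can be split into partial counts that are summed later. The Python sketch below compares them on two mappers' worth of sample keys. With the combiner, only one (key, partial count) record per distinct key per mapper crosses the shuffle. The function names are illustrative.

```python
from collections import Counter


def reduce_without_combiner(map_outputs):
    # Every input key is shuffled as its own record; the reducer counts them.
    shuffled = [key for keys in map_outputs for key in keys]
    return Counter(shuffled), len(shuffled)


def reduce_with_combiner(map_outputs):
    # Each mapper groups and counts locally, so only one
    # (key, partial count) record per distinct key per mapper is shuffled.
    shuffled = [pair for keys in map_outputs for pair in Counter(keys).items()]
    totals = Counter()
    for key, partial in shuffled:
        totals[key] += partial  # the reducer sums the partial counts
    return totals, len(shuffled)


maps = [["A", "B", "C", "B"], ["A", "C", "D", "A"]]
counts, n_plain = reduce_without_combiner(maps)
combined, n_combined = reduce_with_combiner(maps)
print(counts == combined, n_plain, n_combined)  # True 8 6
```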
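21. Combiner implementation
- Do not use the framework's built-in Combiner: it costs extra disk I/O.
- Instead, compile the combining step into the Mapper, using an in-memory hash dictionary.

Example (each mapper's input keys → its in-memory dictionary):
Map 1: A, B, C, B → A 1, B 2, C 1
Map 2: A, C, D, A → A 2, C 1, D 1
Reduce 1 then receives only O(number of mappers) records per key:
A 12
A 56
A …
A 92
B …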
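A minimal sketch of in-mapper combining with an in-memory hash dictionary, replayed on the Map 1/Map 2 example above. `InMapperCombiner`, its `max_items` bound and the flush-when-full policy are assumptions for illustration. The slides do not say how the dictionary's memory is bounded.

```python
from collections import defaultdict


class InMapperCombiner:
    """Map-side combining with an in-memory hash dictionary (sketch).

    max_items is a hypothetical memory bound: once the dictionary holds that
    many distinct keys, its partial counts are emitted and it is cleared.
    """

    def __init__(self, emit, max_items=100000):
        self.emit = emit
        self.max_items = max_items
        self.counts = defaultdict(int)

    def map(self, key):
        self.counts[key] += 1
        if len(self.counts) >= self.max_items:
            self.flush()

    def flush(self):
        # Emit one (key, partial count) pair per distinct key, then reset.
        for key, n in self.counts.items():
            self.emit(key, n)
        self.counts.clear()


def reduce_partials(pairs):
    # The reducer sums the partial counts it receives for each key.
    totals = defaultdict(int)
    for key, n in pairs:
        totals[key] += n
    return dict(totals)


shuffled = []
for records in (["A", "B", "C", "B"], ["A", "C", "D", "A"]):  # Map 1, Map 2
    mapper = InMapperCombiner(lambda k, n: shuffled.append((k, n)))
    for key in records:
        mapper.map(key)
    mapper.flush()  # emit whatever is left when the map task ends

print(shuffled)  # [('A', 1), ('B', 2), ('C', 1), ('A', 2), ('C', 1), ('D', 1)]
print(reduce_partials(shuffled))  # {'A': 3, 'B': 2, 'C': 2, 'D': 1}
```

Flushing early when the dictionary fills keeps memory bounded. The cost is that the same key may be emitted more than once by one mapper, which the reducer's sum absorbs.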
24. Schema inference and index inference
Left input:
……
field   ID      name    age
type    uint64  string  int32
index   2       5       9

Right input:
……
field   ID      score
type    uint64  double
index   0       1

After join:
field   ID      name    age     score
type    uint64  string  int32   double
index   2       5       9       10
……
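A sketch of the join's schema inference, assuming the rule this example suggests. Left-side fields keep their indexes and the join key is kept once. Right-only fields get new indexes after the largest left index (score → 10). `Field`, `join_schema` and the error on name clashes are hypothetical; the slide does not show how clashing names are handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    name: str
    type: str
    index: int


def join_schema(left, right, key):
    """Infer the output schema of joining `left` and `right` on `key` (sketch).

    Assumed rule: left fields keep their indexes, the join key is kept once,
    and right-only fields get fresh indexes after the largest left index.
    """
    result = list(left)
    left_names = {f.name for f in left}
    next_index = max(f.index for f in left) + 1
    for f in right:
        if f.name == key:
            continue  # already present on the left side
        if f.name in left_names:
            raise ValueError(f"field name clash: {f.name}")
        result.append(Field(f.name, f.type, next_index))
        next_index += 1
    return result


left = [Field("ID", "uint64", 2), Field("name", "string", 5), Field("age", "int32", 9)]
right = [Field("ID", "uint64", 0), Field("score", "double", 1)]
for f in join_schema(left, right, "ID"):
    print(f.name, f.type, f.index)
# ID uint64 2
# name string 5
# age int32 9
# score double 10
```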