13. How to develop a web application that scales:
Google’s solution
• Storage: Google File System
• Database: BigTable
• Data Processing: MapReduce
• Serving: Google AppEngine
14. Google File System:
• Files broken into chunks (typically 64 MB)
• Chunks replicated across three machines for safety (tunable)
• Master manages metadata
• Data transfers happen directly between clients and chunkservers
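• EX: a file is split into 10 chunks of 64 MB each; every chunk is copied three times, and the resulting 30 chunk replicas are placed on different servers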
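The chunking and replication arithmetic can be sketched in a few lines of Python. This is only an illustration: the function name `place_chunks`, the server names, and the round-robin placement are assumptions made for the example, not how the GFS master actually chooses chunkservers.

```python
import math

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the typical chunk size from the slide
REPLICAS = 3                   # default replication factor (tunable)


def place_chunks(file_size, servers, chunk_size=CHUNK_SIZE, replicas=REPLICAS):
    """Return {chunk_index: [server, ...]} using simple round-robin placement.

    Round-robin is an assumption for illustration only.
    """
    if replicas > len(servers):
        raise ValueError("need at least as many servers as replicas")
    num_chunks = math.ceil(file_size / chunk_size)
    placement = {}
    for i in range(num_chunks):
        # Consecutive servers, so replicas of one chunk never share a machine.
        placement[i] = [
            servers[(i * replicas + r) % len(servers)] for r in range(replicas)
        ]
    return placement


# Hypothetical cluster of 30 chunkservers and a file of ten 64 MB chunks.
servers = [f"chunkserver{n}" for n in range(30)]
layout = place_chunks(10 * CHUNK_SIZE, servers)
print(len(layout))                           # 10 chunks
print(sum(len(v) for v in layout.values()))  # 30 chunk replicas
```

With 30 servers and 30 replicas, each replica lands on its own machine, which matches the example on the slide.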
16. Big Table:
Data Model
- (row, column, timestamp) → cell contents
[Figure: a table indexed by row keys, columns, and timestamps. The row “www.cnn.com”, column “Content:”, holds three versions of the cell contents (“<html>...”) at timestamps t3, t5, and t6.]
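A toy in-memory version of the (row, column, timestamp) → cell contents mapping may make the model more concrete. The class `TinyTable` and its `put`/`get` methods are hypothetical names for this sketch and are not the Bigtable API; the integer timestamps 3, 5, 6 stand in for t3, t5, t6, and the cell values are placeholders.

```python
from collections import defaultdict


class TinyTable:
    """Toy (row, column, timestamp) -> value map; not the Bigtable API."""

    def __init__(self):
        # row -> column -> {timestamp: value}
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, timestamp, value):
        self._rows[row][column][timestamp] = value

    def get(self, row, column, timestamp=None):
        """Return the newest value at or before `timestamp`.

        With timestamp=None, return the newest version overall.
        Returns None when no matching version exists.
        """
        versions = self._rows.get(row, {}).get(column, {})
        candidates = [t for t in versions if timestamp is None or t <= timestamp]
        if not candidates:
            return None
        return versions[max(candidates)]


table = TinyTable()
table.put("www.cnn.com", "Content:", 3, "<html>v3")  # placeholder contents
table.put("www.cnn.com", "Content:", 5, "<html>v5")
table.put("www.cnn.com", "Content:", 6, "<html>v6")

print(table.get("www.cnn.com", "Content:"))     # <html>v6
print(table.get("www.cnn.com", "Content:", 5))  # <html>v5
```

Keeping every version under its timestamp is what lets a reader ask for the cell as of an earlier time instead of only the latest value.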
17. System Structure:
BigTable Cell
• Bigtable client (via the Bigtable client library): Open(), metadata operations, read/write against the tablet servers
• Bigtable master
• Bigtable tablet servers (…)
• Underlying services
– Cluster scheduling system: handles failover, monitoring
– GFS: holds tablet data, logs
– Lock service: holds metadata, handles master election
18. Big Table Summary:
• Data model applicable to broad range of clients
– Actively deployed in many of Google’s services
• System provides high-performance storage on a large scale
– Self-managing
– Thousands of servers
– Millions of ops/second
– Multiple GB/s reading/writing
• Currently ~500 BigTable cells
• Largest BigTable cell manages ~3 PB of data spread over several thousand machines (larger cells planned)
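19. MapReduce - A Model and System
A software framework for processing large volumes of data on large clusters.
• EX: counting election ballots
– Central Election Commission [master]: assigns the ballot-counting tasks
– Taipei Sanchong Polling Station No. 1 [server1], Taichung Dajia Polling Station No. 3 [server255], Kaohsiung Sanduo Polling Station No. 4 … [serverX]: each station counts its own ballots [Map]
– The station counts are combined into the overall result [Reduce]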
20. MapReduce - A Model and System
Two phases of data processing
• Map: (in_key, in_value) → {(key_j, value_j) | j = 1, …, K}
Map receives input data, performs some preprocessing, and then pairs a key with each record, producing list(key’, value’).
• Reduce: (key, [value_1, …, value_L]) → (key, f_value)
Reduce is an aggregation function that combines many value’ entries into a single value”. The key’ may be needed for pattern matching or to carry additional information.
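The two phases can be sketched as a single-process Python program, reusing the ballot-counting scenario from slide 19. The names `map_reduce`, `count_ballots`, and `tally`, along with the station data, are made up for this illustration; a real MapReduce system distributes the map and reduce calls across many machines, which this sketch does not attempt.

```python
from collections import defaultdict


def map_reduce(inputs, map_fn, reduce_fn):
    """Apply map_fn to each (in_key, in_value), group by key, reduce each group."""
    groups = defaultdict(list)
    for in_key, in_value in inputs:
        for key, value in map_fn(in_key, in_value):
            groups[key].append(value)
    # The result dict stores key -> f_value, i.e. the (key, f_value) pairs.
    return {key: reduce_fn(key, values) for key, values in groups.items()}


def count_ballots(station, ballots):
    # Map: emit one (candidate, 1) pair per ballot at this station.
    for candidate in ballots:
        yield candidate, 1


def tally(candidate, counts):
    # Reduce: combine all counts for one candidate into a single total.
    return sum(counts)


# Hypothetical input: (station name, list of ballots cast there).
stations = [
    ("station-1", ["A", "B", "A"]),
    ("station-3", ["B", "B"]),
    ("station-4", ["A", "C"]),
]

result = map_reduce(stations, count_ballots, tally)
print(result)  # {'A': 3, 'B': 3, 'C': 1}
```

The grouping step between the two phases is what turns the scattered (key, value) pairs from Map into the (key, [value_1, …, value_L]) input that Reduce expects.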