Big Data has influenced data center architecture in ways unimagined before. This presentation explores Fabric Compute and Storage architectures that enable extreme scale-out, low-power, high-density Big Data deployments.
AMD Fabric Server (Satheesh Nanniyur), Oct 2012
1. [Title slide]
2. Fabric Architecture
A Big Idea for the Big Data infrastructure
Satheesh Nanniyur
Senior Product Line Manager
AMD Data Center Server Solutions (formerly SeaMicro)
3. Agenda
• Defining Big Data from an Infrastructure perspective
• Fabric Architecture for Big Data
• An overview of the Fabric Server and Fabric Storage
• Illustrating Fabric Architecture Benefits for Hadoop
• Conclusion
4. Have you come across Big Data?
• Apple’s virtual smartphone assistant, Siri, uses complex machine learning techniques
• Target’s “pregnancy prediction score” (NY Times, “How Companies Learn Your Secrets”, Feb 2012)
5. So, what really is Big Data?
Business
• “Key basis of competition and growth…”
Observational
• “Too big, moves too fast, or doesn’t fit the structures of your database”
Mathematical
• “Every day, we create 2.5 quintillion (million trillion) bytes of data”
Systems
• “Exceeds the processing capacity of conventional database”
6. The Infrastructural definition of Big Data
Massive Storage
• Store “all” data without knowing its use in advance
Massive Compute
• Ask a query, and when you do, get the answer fast
7. Big Data infrastructure is not business as usual
Massive Storage
• Petabyte-scale, high-density storage
• Flexible storage-to-compute ratio to meet evolving business needs
Massive Compute
• High-density scale-out compute
• Power- and space-efficient infrastructure
“The IT architectural approach used in clustered environments such as a large Hadoop grid is radically different from the converged and virtualized IT environments.”
— IDC White Paper, “Big Data: What It Is and Why You Should Care”
8. Fabric Architecture for Big Data
The holy grail of Big Data Infrastructure
Imagine a world where you could simply stack up servers, with each server:
• A fraction of a rack unit
• Sharing over 5 PB of storage
• Flexible provisioning of storage
• 10GE network with no cabling
9. A deeper look at the traditional rack-mount architecture
[Diagram: Aggregation switch → ToR switches → cabling and management → nodes]
• Compromise between compute and storage density
• Rigid compute-to-storage ratio
• Oversubscribed network suited to north-south traffic, not the heavy east-west traffic Big Data requires
• Too many adapters (NIC, storage controller) and cables that can fail
10. Fabric with 3-D Torus for Big Data Infrastructure
Big Data is a big shift from north-south traffic to east-west
• High-speed, low-latency interconnection
• Switchless linear scalability that avoids bottlenecks
• Highly available network minimizing node loss and data reconstruction
• High-density scale-out architecture with low power and space
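The scalability claim rests on the torus topology: every node links to six neighbors (X±, Y±, Z±), and each axis wraps around, so worst-case path length grows with the cube root of the node count instead of requiring external switch tiers. A minimal sketch of that hop-count math; the fabric's real dimensions aren't stated on this slide, so the 8x8x8 grid and the `torus_hops` helper are illustrative assumptions:

```python
# Hop count between two nodes in an n x n x n 3-D torus.
# Each axis wraps around, so the per-axis distance is the shorter
# of the direct path and the wrap-around path.
def torus_hops(a, b, n):
    return sum(min(abs(x - y), n - abs(x - y)) for x, y in zip(a, b))

# Opposite corners of an 8x8x8 torus are only 4 hops apart per axis:
print(torus_hops((0, 0, 0), (4, 4, 4), 8))  # worst case: 12 hops
print(torus_hops((0, 0, 0), (7, 0, 0), 8))  # wrap-around neighbor: 1 hop
```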
11. An overview of the Fabric Server
[Diagram: x86 server attached over PCIe to a SeaMicro Fabric Node with IOVT, linked to its neighbors in the X+, X-, Y+, Y-, Z+, and Z- directions]
• 512 x86 cores with 4TB DRAM in 10RU
• Up to 5 petabytes of storage
• Flexible storage-to-compute ratio
• 10GE network per server; 160GE of uplink bandwidth
12. Fabric Storage ... for Big Data?
Isn’t Big Data always deployed with DAS?
“...the rate of change was killing us, where the data volumes were practically doubling every month. Trying to keep up with that growth was an extreme challenge to say the least...”
— Customer quote from IDC white paper, “Big Data: What It Is and Why You Should Care”
Rigid storage-to-compute ratio (traditional rackmount): underutilized compute and network
Flexible Fabric storage-to-compute ratio:
• Add storage capacity independent of compute to increase cluster efficiency
• Flexibly provision storage capacity to meet evolving customer needs
13. Massive capacity scale-out Fabric Storage
• Massive scale-out capacity with commodity drives
• Decoupled from compute and network to grow storage independently
[Diagram: traditional rackmount with captive DAS and a rigid storage-to-compute ratio vs. Intel/AMD x86 Freedom Fabric servers with flexible scale-out Fabric Storage up to 5PB]
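The decoupling argument comes down to simple arithmetic: with captive DAS the TB-per-core ratio is fixed at purchase time, while fabric storage lets capacity grow toward the 5PB ceiling without adding servers. A back-of-the-envelope sketch using the core and capacity figures quoted elsewhere in this deck (the `tb_per_core` helper is illustrative):

```python
# Storage-to-compute ratio: captive DAS fixes it at purchase time;
# fabric storage lets capacity grow independently of the server count.
def tb_per_core(total_tb, cores):
    return total_tb / cores

# Traditional rackmount: adding capacity means adding whole servers.
das_ratio = tb_per_core(720, 320)      # 2.25 TB/core, fixed
# Fabric storage: scale capacity alone, toward the 5 PB ceiling.
base_ratio = tb_per_core(1136, 512)    # ~2.2 TB/core at the base config
grown_ratio = tb_per_core(5000, 512)   # ~9.8 TB/core after storage-only growth
print(das_ratio, base_ratio, grown_ratio)
```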
14. Hadoop and the SMAQ stack
Built to scale linearly with massive scale-out storage (HDFS) and compute (MapReduce)
• Query: Pig, Hive
• Data Processing Framework: MapReduce
• Data Storage: HDFS
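To make the MapReduce layer of the stack concrete, here is a minimal in-process word-count sketch of its three phases: map (emit key-value pairs), shuffle (group by key), reduce (aggregate per key). Hadoop distributes each phase across the cluster; the function names here are illustrative, not Hadoop APIs:

```python
from collections import defaultdict

def map_phase(lines):
    # Map: turn each input record into (key, value) pairs.
    for line in lines:
        for word in line.split():
            yield word, 1

def shuffle_phase(pairs):
    # Shuffle: group all values by key (the network-heavy step).
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate each key's values into a final result.
    return {key: sum(values) for key, values in groups.items()}

counts = reduce_phase(shuffle_phase(map_phase(["big data", "big fabric"])))
print(counts)  # {'big': 2, 'data': 1, 'fabric': 1}
```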
15. Hadoop data processing phases
Fabric Architecture cost-efficiently meets the Hadoop infrastructure needs
[Pipeline: HDFS input (storage-intensive) → Map (compute-intensive) → Shuffle and intermediate data write (network-intensive) → Reduce (compute-intensive) → HDFS output (storage-intensive)]
• 512 x86 cores with 4TB DRAM per Fabric Server in 10RU
• 10 Gbps inter-node bandwidth per server
• 160 Gbps shared uplink for inter-rack traffic
• 5 petabytes of storage capacity with independent scale-out
16. Hadoop resource usage pattern
Based on a Terasort run on the SeaMicro SM15000
[Chart: compute, storage, and network utilization across the Map, Shuffle, and Reduce phases; network usage concentrates in the Shuffle phase]
17. Deployment Challenges of Hadoop
• Plan for peak utilization
– Hadoop infrastructure utilization is bursty
• Compute, storage, and network mix depends on the application workload
– Flexible ratios optimize deployment
• Power and space efficiency are key to large-scale deployment
• Administrative cost can increase as rapidly as your data
– Simplified deployment and reduced hardware components decrease TCO
18. Fabric Server for Hadoop Deployment
Fabric Server offers 60% more compute and storage in the same power and space envelope
Traditional Rackmount vs. SeaMicro Fabric Server:
• Intel Xeon cores: 320 vs. 512
• AMD Opteron cores*: 320 vs. 1024
• Storage: 720 TB vs. 1136 TB
• Storage scalability: none vs. up to 4PB
• Network B/W per server: up to 2Gbps vs. up to 8Gbps
• Network downlinks: 40 vs. 0
• ToR switches: 2 vs. 0 (built-in)
• Aggregation (end-of-row) switch/router: 1 vs. 1
Based on the SeaMicro SM15000 and HP DL380 Gen8 2U dual-socket octal-core servers in a 42U rack
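The "60% more" headline can be sanity-checked against the table's own numbers (variable names are illustrative):

```python
# Relative gains of the SeaMicro Fabric Server over the traditional
# rackmount column, using the figures from the comparison above.
xeon_gain = (512 - 320) / 320        # 0.60 -> 60% more Xeon cores
storage_gain = (1136 - 720) / 720    # ~0.58 -> ~58% more storage
print(f"{xeon_gain:.0%} more cores, {storage_gain:.0%} more storage")
```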
19. Summary
• Traditional architectures cannot scale to meet the needs of Big Data
• Efficient Big Data deployments need a flexible storage-to-compute ratio
• Conventional wisdom of reduced hardware components still holds
• Fabric Servers provide unprecedented density, bandwidth, and scalability for Big Data deployments