1. Gordon: Using Flash Memory to
Build Fast, Power-efficient
Clusters for Data-intensive
Applications
Presenter: He Wang
Department of Electrical and Computer
Engineering
University of Florida
2. Outline
• Motivation and Background
• Introduction to Gordon’s system
architecture
• Gordon’s storage system
• Configuring Gordon
3. Wiki
• Gordon
o A flash-based system architecture for massively parallel,
data-centric computing
• Features
o Power efficiency
o Performance advantage
o Aimed at data-centric applications
4. Motivation and Background
• Challenges with large-scale data processing
o Slowdown in uni-processor performance
o Latency and BW bottleneck of HDD
o Power constraints
• Improve performance and power efficiency
• Progress
o Programming models that parallelize data-processing programs
o Increased BW and reduced latency with SSD
o Recent power efficient processors
5. Motivation and Background (cont.)
• Gordon
o Programming system that parallelizes data-processing programs (e.g.,
MapReduce)
• Abstractions for specifying data-parallel computation
• Automating the parallelism
o SSD
• Improved flash translation layer (FTL)
o Power efficient processors
• 100s or 1000s
• simple interconnect
6. Gordon system architecture
• Gordon nodes
o 256 GB flash memory, flash storage controller, 2 GB SDRAM,
1.9 GHz Intel Atom processor
o Connected through a 1 Gb Ethernet-style network
o A standard rack holds 16 enclosures for 256 nodes with 64 TB
storage and 230 GB/s I/O BW
o Independent computer
• OS
• Network interfaces
7. Gordon system architecture
• Gordon nodes features
o Power efficient
• 19W to 81W
o High BW
• 900 MB/s
Figure 1. Gordon system architecture
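The rack-level figures follow from the per-node figures by simple multiplication. A minimal sketch of that arithmetic, assuming binary units for capacity (1 TB = 1024 GB) and decimal units for bandwidth (1 GB/s = 1000 MB/s); the variable names are illustrative only:

```python
# Per-node and per-rack numbers taken from the slides.
enclosures_per_rack = 16
nodes_per_rack = 256
flash_per_node_gb = 256
bandwidth_per_node_mb_s = 900

# Derived quantities.
nodes_per_enclosure = nodes_per_rack // enclosures_per_rack
rack_flash_tb = nodes_per_rack * flash_per_node_gb / 1024          # assumes 1 TB = 1024 GB
rack_bandwidth_gb_s = nodes_per_rack * bandwidth_per_node_mb_s / 1000  # assumes 1 GB/s = 1000 MB/s

print(nodes_per_enclosure, rack_flash_tb, rack_bandwidth_gb_s)
```

This gives 16 nodes per enclosure, 64 TB of flash, and about 230 GB/s of aggregate I/O bandwidth per rack, matching the slide.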
8. Storage system
• Key to power efficiency and performance
• Support Erase, Program, Read operations
• Reliability issue
o Wear out, needs wear-leveling
• Flash translation layer (FTL)
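The three operations and the wear-out issue can be illustrated with a toy model of a single flash block. This is a sketch of generic NAND behavior, not Gordon's controller: the class name, the block size of 64 pages, and the `least_worn` policy are assumptions made for illustration.

```python
class FlashBlock:
    """Toy model of one NAND flash block (sizes are illustrative assumptions)."""

    def __init__(self, pages_per_block=64):
        self.pages = [None] * pages_per_block  # None means "erased"
        self.erase_count = 0                   # rough indicator of wear

    def read(self, page):
        return self.pages[page]

    def program(self, page, data):
        # A page can be programmed only once after an erase.
        if self.pages[page] is not None:
            raise ValueError("page already programmed; erase the block first")
        self.pages[page] = data

    def erase(self):
        # Erase applies to the whole block and adds to its wear.
        self.pages = [None] * len(self.pages)
        self.erase_count += 1


def least_worn(blocks):
    """Very simple wear-leveling policy: pick the block erased the fewest times."""
    return min(blocks, key=lambda b: b.erase_count)
```

Because a page cannot be overwritten in place, updates must go to a fresh page elsewhere, which is why a translation layer is needed.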
9. Storage system
• Flash controller
o Implements FTL
o Link between CPU and flash array
• Shared buses, up to 4 packages
10. Storage system
• Gordon FTL
o Operate a write point
• Pointer to a page of flash memory
o Maintain a summary page in each block
• Logical block address(LBA)-to-physical mapping
• Benefit of this indirection
• Address organization
• Wear-leveling
• Working flow
o Receive write command
o Locate data by write point
o Update LBA table
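The write flow above can be sketched as a small log-structured FTL with a single write point. This is a simplified illustration under stated assumptions: the class name `SimpleFTL`, the array dimensions, and the absence of garbage collection and summary pages are all choices made for the sketch, not details from the slides.

```python
class SimpleFTL:
    """Single-write-point FTL sketch: every write goes to the next free page."""

    def __init__(self, num_blocks=4, pages_per_block=4):
        self.pages_per_block = pages_per_block
        self.flash = [[None] * pages_per_block for _ in range(num_blocks)]
        self.lba_table = {}        # LBA -> (block, page)
        self.write_point = (0, 0)  # next erased page to program

    def write(self, lba, data):
        block, page = self.write_point
        if block >= len(self.flash):
            raise RuntimeError("no erased pages left (no garbage collection in this sketch)")
        self.flash[block][page] = data       # place data at the write point
        self.lba_table[lba] = (block, page)  # any older copy is now stale
        # Advance the write point to the next page, moving to the next block when full.
        page += 1
        if page == self.pages_per_block:
            block, page = block + 1, 0
        self.write_point = (block, page)

    def read(self, lba):
        block, page = self.lba_table[lba]
        return self.flash[block][page]
```

Writing the same LBA twice leaves two physical copies; only the mapping table decides which one is current, which is the indirection the slide refers to.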
11. Storage system
• Gordon FTL advantage: write points
o The original FTL has only one write point, so no parallelism
o Multiple write points with spread access
o Sequence number
• Avoid conflict with occupied write point
• Assign the write point with smallest available
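One way to picture multiple write points is sketched here. Each write point owns its own region of flash, writes are spread across the write points, and every write is stamped with a global sequence number so the newest copy of an LBA can be identified later. The round-robin selection policy, the region layout, and all names are assumptions for illustration; the slides do not specify them.

```python
import itertools


class MultiWritePointFTL:
    """Sketch of an FTL with several write points and a global sequence number."""

    def __init__(self, num_write_points=4, pages_per_region=8):
        self.regions = [[] for _ in range(num_write_points)]  # one log per write point
        self.pages_per_region = pages_per_region
        self.lba_table = {}            # LBA -> (write point, page)
        self.seq = itertools.count()   # global sequence number
        self._next = 0                 # round-robin cursor (assumed policy)

    def write(self, lba, data):
        n = len(self.regions)
        # Pick the next write point that still has room.
        for i in range(n):
            wp = (self._next + i) % n
            if len(self.regions[wp]) < self.pages_per_region:
                break
        else:
            raise RuntimeError("all regions are full")
        self._next = (wp + 1) % n
        seq = next(self.seq)
        self.regions[wp].append((seq, lba, data))  # metadata stored with the page
        self.lba_table[lba] = (wp, len(self.regions[wp]) - 1)
        return wp, seq

    def read(self, lba):
        wp, page = self.lba_table[lba]
        return self.regions[wp][page][2]

    def rebuild_table(self):
        """Recover the LBA map from page metadata; the highest sequence number wins."""
        newest = {}
        for wp, region in enumerate(self.regions):
            for page, (seq, lba, _data) in enumerate(region):
                if lba not in newest or seq > newest[lba][0]:
                    newest[lba] = (seq, wp, page)
        return {lba: (wp, page) for lba, (_seq, wp, page) in newest.items()}
```

Without the sequence number, two copies of the same LBA sitting under different write points could not be ordered.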
12. Storage system
• Gordon FTL advantage: super-pages
o Manage flash array with larger granularity with one write
point for each
o Horizontal striping
o Vertical striping
o 2D striping
13. Storage system
• Super-page striping approaches
Figure 2. Three approaches to striping data across flash arrays
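The three striping schemes can be pictured as different ways of grouping (bus, package) slots into super-pages. The sketch below is one plausible reading, not a description of the figure: it assumes horizontal striping spans all buses at one package position, vertical striping spans all packages on one bus, and 2D striping uses a rectangular tile. The function name, its parameters, and the 4 x 4 array are hypothetical.

```python
def super_pages(num_buses, pkgs_per_bus, buses_wide, pkgs_deep):
    """Group (bus, package) slots into super-pages of buses_wide x pkgs_deep slots."""
    if num_buses % buses_wide or pkgs_per_bus % pkgs_deep:
        raise ValueError("tile must divide the array evenly")
    groups = []
    for b0 in range(0, num_buses, buses_wide):
        for p0 in range(0, pkgs_per_bus, pkgs_deep):
            groups.append([(b, p)
                           for b in range(b0, b0 + buses_wide)
                           for p in range(p0, p0 + pkgs_deep)])
    return groups


# Assumed 4-bus array with 4 packages per bus.
horizontal = super_pages(4, 4, buses_wide=4, pkgs_deep=1)  # one slot on every bus
vertical = super_pages(4, 4, buses_wide=1, pkgs_deep=4)    # every slot on one bus
two_d = super_pages(4, 4, buses_wide=2, pkgs_deep=2)       # rectangular tile
```

Each group would get its own write point, so the tile shape determines both the super-page size and how many writes can proceed independently.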
14. Storage system
• Super-page
o Pros
• Reduced overhead
o Cons
• Latency for sub-page access
• Wear-out affects a larger portion of the array
16. Configuring Gordon
• Workloads
o Benchmarks that use MapReduce
• Power model
o Direct measurement of a running system
o Datasheet
P = IdlePower * (1-ActivityFactor) + ActivePower * ActivityFactor
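The power model is a linear interpolation between idle and active power. A minimal sketch, with a hypothetical function name and made-up example values (2 W idle, 10 W active, 25% activity) that are not taken from the slides:

```python
def component_power(idle_power, active_power, activity_factor):
    """P = IdlePower * (1 - ActivityFactor) + ActivePower * ActivityFactor."""
    if not 0.0 <= activity_factor <= 1.0:
        raise ValueError("activity factor must be between 0 and 1")
    return idle_power * (1 - activity_factor) + active_power * activity_factor


# Hypothetical component: 2 W idle, 10 W active, busy 25% of the time.
example_power = component_power(2.0, 10.0, 0.25)
print(example_power)  # 4.0
```

An activity factor of 0 returns the idle power and a factor of 1 returns the active power, so the model never leaves the range between the two.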
17. Configuring Gordon
• Measuring cluster performance
o High-level simulator to measure overall performance
• Model 32 nodes by running 4 VMware virtual machines on each of 8 servers
o Sync mode: provides an upper bound on execution time
o Nosync mode: provides a lower bound on execution time
o Storage simulator
19. Configuring Gordon
• Optimal Gordon configurations
Outperforms disk-based clusters by 1.5x and delivers 2.5x more performance per watt
Figure 5. Optimal Gordon configuration
20. Configuring Gordon
• Gordon power consumption
o MaxE-flash consumes 40% of the energy of the disk-based configuration
o A factor of two increase in performance
Figure 6. Relative energy consumption
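Energy, power, and runtime are linked by energy = average power x runtime, so the two figures above imply a relative average power. The sketch below assumes both figures describe the same workload and the same pair of configurations, and that a factor-of-two performance increase means the runtime is halved; the function name is hypothetical.

```python
def relative_power(relative_energy, speedup):
    """Relative average power implied by relative energy and speedup."""
    relative_runtime = 1.0 / speedup           # 2x faster means half the runtime
    return relative_energy / relative_runtime  # energy = power * time


# 40% of the energy at twice the performance.
implied_power = relative_power(0.40, 2.0)
print(implied_power)
```

Under these assumptions the flash-based configuration draws roughly 80% of the average power of the disk-based one, so most of the energy saving comes from finishing sooner.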