Presentation gordon

  1. Gordon: Using Flash Memory to Build Fast, Power-efficient Clusters for Data-intensive Applications
     Presenter: He Wang, Department of Electrical and Computer Engineering, University of Florida
  2. Outline
     • Motivation and Background
     • Introduction to Gordon's system architecture
     • Gordon's storage system
     • Configuring Gordon
  3. Wiki
     • Gordon
       o A flash-based system architecture for massively parallel, data-centric computing
     • Features
       o Power efficiency
       o Performance advantage
       o Aimed at data-centric applications
  4. Motivation and Background
     • Challenges with large-scale data processing
       o Slowdown in uniprocessor performance
       o Latency and bandwidth bottleneck of HDDs
       o Power constraints
     • Goal: improve performance and power efficiency
     • Recent progress
       o Programming models that parallelize data-processing programs
       o Increased bandwidth and reduced latency with SSDs
       o Recent power-efficient processors
  5. Motivation and Background (cont.)
     • Gordon
       o Programming system that parallelizes data-processing programs (e.g., MapReduce)
         • Abstractions for specifying data-parallel computation
         • Automated parallelism
       o SSD
         • Improved flash translation layer (FTL)
       o Power-efficient processors
         • 100s or 1000s of them
         • Simple interconnect
  6. Gordon system architecture
     • Gordon nodes
       o 256 GB of flash memory, a flash storage controller, 2 GB of SDRAM, and a 1.9 GHz Intel Atom processor
       o Connected through a 1 Gb Ethernet-style network
       o A standard rack holds 16 enclosures, for 256 nodes with 64 TB of storage and 230 GB/s of I/O bandwidth
       o Each node is an independent computer
         • OS
         • Network interfaces
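The rack-level figures on this slide can be cross-checked with a few lines of arithmetic. This is only a sanity check on the quoted numbers; the choice of binary units for storage (1 TB = 1024 GB) and decimal units for bandwidth (1 GB/s = 1000 MB/s) is an assumption, not something the slide states.

```python
# Figures quoted on the slide.
NODES_PER_RACK = 256
ENCLOSURES_PER_RACK = 16
FLASH_PER_NODE_GB = 256
RACK_IO_BW_GB_S = 230

# Assumption: storage uses binary units, bandwidth uses decimal units.
nodes_per_enclosure = NODES_PER_RACK // ENCLOSURES_PER_RACK
rack_storage_tb = NODES_PER_RACK * FLASH_PER_NODE_GB / 1024
per_node_bw_mb_s = RACK_IO_BW_GB_S * 1000 / NODES_PER_RACK

print(nodes_per_enclosure)       # 16
print(rack_storage_tb)           # 64.0
print(round(per_node_bw_mb_s))   # 898
```

The per-node result of roughly 900 MB/s is in line with the per-node bandwidth listed on the next slide.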
  7. Gordon system architecture
     • Gordon node features
       o Power efficient
         • 19 W to 81 W
       o High bandwidth
         • 900 MB/s
     Figure 1. Gordon system architecture
  8. Storage system
     • Key to power efficiency and performance
     • Supports erase, program, and read operations
     • Reliability issue
       o Flash wears out and needs wear-leveling
     • Flash translation layer (FTL)
  9. Storage system
     • Flash controller
       o Implements the FTL
       o Link between the CPU and the flash array
         • Shared buses, up to 4 packages each
  10. Storage system
     • Gordon FTL
       o Operates a write point
         • Pointer to a page of flash memory
       o Maintains a summary page in each block
         • Logical block address (LBA)-to-physical mapping
         • Benefits of this indirection
           - Address organization
           - Wear-leveling
     • Workflow
       o Receive a write command
       o Write the data at the write point
       o Update the LBA table
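The three-step write workflow above can be sketched as a toy page-mapped FTL. This is a minimal illustration under stated assumptions, not Gordon's actual implementation: the class name `SimpleFTL`, its fields, and the idea of storing the LBA next to each page's data (a stand-in for the per-block summary page) are all hypothetical, and erase, garbage collection, and wear-leveling are left out.

```python
class SimpleFTL:
    """Toy page-mapped FTL with a single write point (illustrative only)."""

    def __init__(self, num_blocks, pages_per_block):
        self.num_pages = num_blocks * pages_per_block
        self.flash = [None] * self.num_pages  # simulated physical pages
        self.lba_table = {}                   # LBA -> physical page number
        self.write_point = 0                  # next free physical page

    def write(self, lba, data):
        # Step 1: a write command arrives for a logical block address.
        if self.write_point >= self.num_pages:
            raise RuntimeError("no free pages (this sketch has no garbage collection)")
        # Step 2: program the data at the write point. Keeping the LBA with
        # the data is an assumed stand-in for the summary page.
        ppn = self.write_point
        self.flash[ppn] = (lba, data)
        self.write_point += 1
        # Step 3: update the LBA table so later reads find the new copy.
        self.lba_table[lba] = ppn
        return ppn

    def read(self, lba):
        return self.flash[self.lba_table[lba]][1]


ftl = SimpleFTL(num_blocks=2, pages_per_block=4)
first = ftl.write(7, "old")    # lands on physical page 0
second = ftl.write(7, "new")   # rewrite goes to page 1, not in place
print(first, second, ftl.read(7))  # 0 1 new
```

Rewriting the same LBA lands on a fresh physical page, which is the indirection that makes wear-leveling possible.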
  11. Storage system
     • Gordon FTL advantage: write points
       o The original FTL has only one write point, so there is no parallelism
       o Multiple write points spread accesses across the array
       o Sequence number
         • Avoids conflicts with an occupied write point
         • Assigns the available write point with the smallest sequence number
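One way to read the sequence-number bullets is as a selection rule: skip write points that are occupied and take the idle one with the smallest sequence number. The sketch below follows that reading only; the `WritePoint` fields and the `pick_write_point` helper are hypothetical names, and the slide does not confirm that the real FTL selects write points this way.

```python
from dataclasses import dataclass


@dataclass
class WritePoint:
    seq: int            # sequence number (assumed field)
    next_page: int      # next free physical page for this write point
    busy: bool = False  # True while a program operation is in flight


def pick_write_point(write_points):
    """Return the idle write point with the smallest sequence number, or None."""
    idle = [wp for wp in write_points if not wp.busy]
    if not idle:
        return None
    return min(idle, key=lambda wp: wp.seq)


wps = [
    WritePoint(seq=3, next_page=0),
    WritePoint(seq=1, next_page=64, busy=True),  # occupied, so skipped
    WritePoint(seq=2, next_page=128),
]
chosen = pick_write_point(wps)
print(chosen.seq, chosen.next_page)  # 2 128
```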
  12. Storage system
     • Gordon FTL advantage: super-pages
       o Manage the flash array at a larger granularity, with one write point for each super-page
       o Horizontal striping
       o Vertical striping
       o 2D striping
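To make the granularity trade-off concrete, the sketch below computes a super-page size for each striping scheme. It assumes horizontal striping spans the buses, vertical striping spans the packages on one bus, and 2D striping spans both; that reading of the three names, the 2 KB page size, and the 4-bus geometry are all assumptions for illustration (the slides only state up to 4 packages per shared bus).

```python
def super_page_bytes(scheme, page_bytes, buses, packages_per_bus):
    """Super-page size under a simplified model of each striping scheme."""
    if scheme == "horizontal":
        # Assumed: one physical page from each bus.
        return page_bytes * buses
    if scheme == "vertical":
        # Assumed: one physical page from each package on a single bus.
        return page_bytes * packages_per_bus
    if scheme == "2d":
        # Assumed: both dimensions at once; here it spans the whole array.
        return page_bytes * buses * packages_per_bus
    raise ValueError(f"unknown scheme: {scheme}")


# Hypothetical geometry: 2 KB pages, 4 buses, 4 packages per bus.
sizes = {
    scheme: super_page_bytes(scheme, 2048, buses=4, packages_per_bus=4)
    for scheme in ("horizontal", "vertical", "2d")
}
print(sizes)  # {'horizontal': 8192, 'vertical': 8192, '2d': 32768}
```

Larger super-pages mean fewer units to track, at the cost of moving more data for every small access.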
  13. Storage system
     • Super-page striping approaches
     Figure 2. Three approaches to striping data across flash arrays
  14. Storage system
     • Super-page
       o Pros
         • Reduced overhead
       o Cons
         • Added latency for sub-page accesses
         • Wear-out affects a larger portion of the array
  15. Storage system
     • Super-page evaluation
     Figure 3. Flash storage array performance
  16. Configuring Gordon
     • Workloads
       o Benchmarks that use MapReduce
     • Power model
       o Direct measurement of a running system
       o Datasheets
       o P = IdlePower * (1 - ActivityFactor) + ActivePower * ActivityFactor
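The power formula on this slide is a weighted average of idle and active power. A small sketch follows; the function name `component_power` and the example wattages are hypothetical and are not measurements from the Gordon study.

```python
def component_power(idle_w, active_w, activity_factor):
    """P = IdlePower * (1 - ActivityFactor) + ActivePower * ActivityFactor."""
    if not 0.0 <= activity_factor <= 1.0:
        raise ValueError("activity_factor must be between 0 and 1")
    return idle_w * (1.0 - activity_factor) + active_w * activity_factor


# Hypothetical component: 2 W idle, 10 W active, busy 25% of the time.
p = component_power(idle_w=2.0, active_w=10.0, activity_factor=0.25)
print(p)  # 4.0
```

At an activity factor of 0 the model returns the idle power, and at 1 it returns the active power.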
  17. Configuring Gordon
     • Measuring cluster performance
       o High-level simulator to measure overall performance
         • Models 32 nodes by running 4 VMware virtual machines on each of 8 servers
       o Sync mode provides an upper bound on execution time
       o Nosync mode provides a lower bound
       o Storage simulator
  18. Configuring Gordon
     • Pareto-optimal Gordon system design
     Figure 6. Pareto-optimal Gordon system designs
  19. Configuring Gordon
     • Optimal Gordon configurations outperform disk-based systems by 1.5X and deliver 2.5X more performance per watt
     Figure 5. Optimal Gordon configuration
  20. Configuring Gordon
     • Gordon power consumption
       o MaxE-flash consumes 40% of the energy of the disk-based configuration
       o A factor of two increase in performance
     Figure 6. Relative energy consumption
  21. Discussion
     • Exploit disks for cheap redundancy
     • Virtualizing Gordon
  22. Thanks
