BWC Supercomputing 2008 Presentation


Published on

  • Be the first to comment

  • Be the first to like this

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide

BWC Supercomputing 2008 Presentation

  1. 1. Towards Global Scale Cloud Computing: Using Sector and Sphere on the Open Cloud Testbed NCDM SC08 Bandwidth Challenge Robert Grossman 1,2 , Yunhong Gu 1 , Michal Sabala 1 , David Hanley 1 , Shirley Connelly 1 , David Turkington 1 , and Jonathan Seidman 2 1 National Center for Data Mining, Univ. of Illinois at Chicago 2 Open Data Group
  2. 2. Overview <ul><li>Cloud computing software Sector/Sphere </li></ul><ul><ul><li>Open source, developed by NCDM </li></ul></ul><ul><li>Open Cloud Testbed </li></ul><ul><ul><li>Cloud computing testbed supported by the Open Cloud Consortium </li></ul></ul><ul><li>BWC Applications and Performance Evaluation </li></ul><ul><ul><li>TeraSort: sorting 1TB distributed dataset </li></ul></ul><ul><ul><li>CreditStone: computing fraudulent rate from 3TB credit card transaction records </li></ul></ul>
  3. 3. Cloud Computing & Sector/Sphere <ul><li>Hardware </li></ul><ul><ul><li>Inexpensive computer clusters (may across multiple data centers) </li></ul></ul><ul><ul><li>High speed wide area networks </li></ul></ul><ul><li>Software </li></ul><ul><ul><li>Storage cloud: Distributed, big data set </li></ul></ul><ul><ul><li>Compute cloud: simplified distributed data processing </li></ul></ul>High Speed Transport Protocol UDT Routing: C/S or P2P Storage Cloud: Sector Compute Cloud: Sphere
  4. 4. Sector: Distributed Storage System Security Server Master slaves slaves SSL SSL Client User account Data protection System Security Storage System Mgmt. Processing Scheduling Service provider System access tools App. Programming Interfaces Storage and Processing Data UDT Encryption optional
  5. 5. Sphere: Simplified Data Processing <ul><li>Data is processed at where it resides (locality) </li></ul><ul><li>Same user defined functions (UDF) can be applied on all elements (records, blocks, or files) </li></ul><ul><li>Processing output can be written to Sector files, on the same node or other nodes </li></ul><ul><li>Generalize Map/Reduce </li></ul>
  6. 6. Sphere: Simplified Data Processing for each file F in (SDSS datasets) for each image I in F findBrownDwarf(I, …); SphereStream sdss; sdss.init(&quot;sdss files&quot;); SphereProcess myproc; myproc->run(sdss,&quot; findBrownDwarf &quot;); myproc->read(result); findBrownDwarf(char* image, int isize, char* result, int rsize);
  7. 7. Sphere: Data Movement <ul><li>Slave -> Slave Local </li></ul><ul><li>Slave -> Slaves (Shuffle) </li></ul><ul><li>Slave -> Client </li></ul>
  8. 8. Load Balance & Fault Tolerance <ul><li>The number of data segments is much more than the number of SPEs. When an SPE completes a data segment, a new segment will be assigned to the SPE. </li></ul><ul><li>If one SPE fails, the data segment assigned to it will be re-assigned to another SPE and be processed again. </li></ul><ul><li>Load balance and fault tolerance are transparent to users! </li></ul>
  9. 9. Implementation <ul><li>Sector/Sphere is open source software developed by NCDM </li></ul><ul><li>Written in C++ </li></ul><ul><li>Client tools (file access, system management, etc) </li></ul><ul><li>Sector/Sphere API library </li></ul><ul><li> </li></ul><ul><li>Current release version 1.15 </li></ul>
  10. 10. Open Cloud Testbed <ul><li>4 Racks </li></ul><ul><ul><li>Baltimore (JHU): 32 nodes, 1 AMD Dual Core CPU, 4GB RAM, 1TB single disk </li></ul></ul><ul><ul><li>Chicago (StarLight): 32 nodes, 2 AMD Dual Core CPUs, 12GB RAM, 1TB single disk </li></ul></ul><ul><ul><li>Chicago (UIC): 32 nodes, 2 AMD Dual Core CPUs, 8GB RAM, 1TB single disk </li></ul></ul><ul><ul><li>San Diego (Calit2): 32 nodes, 2 AMD Dual Core CPUs, 12GB RAM, 1TB single disk </li></ul></ul><ul><li>10Gb/s inter-rack connection on CiscoWave </li></ul><ul><li>1Gb/s intra-rack connection </li></ul><ul><li>SC08 </li></ul><ul><ul><li>9 nodes, 2 AMD Quad core CPUs, 16GB RAM, 6TB 8-disk RAID, 10Gb/s intra-rack connection </li></ul></ul>
  11. 11. SC08 & Open Cloud Testbed
  12. 12. BWC App #1: Sorting a TeraByte <ul><li>Data is split into small files, scattered on all slaves </li></ul><ul><li>Stage 1: On each slave, an SPE scans local files, sends each record to a bucket file on a remote node according to the key, so that all buckets are sorted. </li></ul><ul><li>Stage 2: On each destination node, an SPE sort all data inside each bucket. </li></ul>
  13. 13. TeraSort 10-byte 90-byte Key Value 10-bit Bucket-0 Bucket-1 Bucket-1023 0-1023 Stage 1 : Hash based on the first 10 bits Bucket-0 Bucket-1 Bucket-1023 Stage 2 : Sort each bucket on local node Binary Record 100 bytes
  14. 14. Performance Results: TeraSort [Previous run] Run time: seconds 1.2TB 900GB 600GB 300GB Data Size 3702 6675 1526 UIC + StarLight + Calit2 + JHU 3069 4341 1430 UIC + StarLight + Calit2 2617 2896 1361 UIC + StarLight 2252 2889 1265 UIC Hadoop (1 replica) Hadoop (3 replicas) Sphere
  15. 15. Performance Results: TeraSort SC08 Tue Night <ul><li>Sorting 1TB on 101 nodes </li></ul><ul><ul><li>UIC: 24 + SL: 10 + JHU: 29 + Calit2: 29 + SC08: 9 </li></ul></ul><ul><li>Hash vs. Local Sort: 1660sec + 604sec </li></ul><ul><li>Hash </li></ul><ul><ul><li>Per rack (in/out): UIC 201/193GB; SL 96/95GB; JHU 212/217GB; Calit2 214/218GB; SC08 87/89GB </li></ul></ul><ul><ul><li>Per node (in/out): 10GB AVG </li></ul></ul><ul><ul><li>System total AVG 9.6Gb/s (Peak 20Gb/s) </li></ul></ul><ul><li>Local Sort </li></ul><ul><ul><li>No network IO </li></ul></ul>
  16. 16. BWC App# 2: CreditStone Merchant ID Time Key Value 3-byte merch-000X merch-001X merch-999X 000-999 Stage 1 : Process each record and hash into buckets according to merchant ID merch-000X merch-001X merch-999x Stage 2 : Compute fraudulent rate for each merchant Trans ID|Time|Merchant ID|Fraud|Amount 01491200300|2007-09-27|2451330|0|66.49 Text Record Transform Text Record Fraud
  17. 17. Performance Results: CreditStone [Previous run] 55 53 37 36 Sphere w/o Index (min) 71 64 47 46 Sphere with Index (min) 189 191 180 179 Hadoop (min) 58.5B 44.5B 29.5B 15B Size of Dataset (rows) 3276 2492 1652 840 Size of Dataset (GB) 117 89 59 30 Number of Nodes JHU, SL, Calit2, UIC JHU, SL, Calit2 JHU, SL JHU Racks
  18. 18. Performance Results: CreditStone SC08 Tue Night <ul><li>Process 3TB (53.5 Billion records) on 101 nodes </li></ul><ul><ul><li>UIC: 24 + SL: 10 + JHU: 29 + Calit2: 29 + SC08: 9 </li></ul></ul><ul><li>Stage 1 vs. Stage 2: 3022sec : 471sec </li></ul><ul><li>Scan and hash each record </li></ul><ul><ul><li>Per rack (in/out): UIC 157/248GB; SL 84/122GB; JHU 229/135GB; Calit2 222/154GB; SC08 77/111GB </li></ul></ul><ul><ul><li>Per node (in/out): 9.7GB AVG </li></ul></ul><ul><ul><li>System total AVG 5.5Gb/s (Peak 7.2Gb/s) </li></ul></ul><ul><li>Compute fraudulent rate per merchant </li></ul><ul><ul><li>No network IO </li></ul></ul>
  19. 19. BWC App #3: Cistrack SC08 Tue Night <ul><li>10Gb/s from SC08 & JGN2 (Japan) – OCT connection via CiscoWave & JGN2, respectively. </li></ul><ul><li>Upload Cistrack data from SC08 & JGN2 (Japan) to OCT </li></ul><ul><ul><li>Approx 8Gb/s </li></ul></ul><ul><li>Download Cistrack data from OCT to SC08 & JGN2 (Japan) </li></ul><ul><ul><li>Approx 8Gb/s </li></ul></ul>
  20. 20. System Monitoring (Testbed)
  21. 21. System Monitoring (Sector/Sphere)
  22. 22. Summary <ul><li>Aggregated bandwidth </li></ul><ul><ul><li>Upload/Download SC08 <=> OCT: 8Gb/s each direction </li></ul></ul><ul><ul><li>TeraSort on OCT: 9.6Gb/s AVG, 20Gb/s Peak </li></ul></ul><ul><ul><li>CreditStone on OCT: 5.5Gb/s AVG, 7.2Gb/s Peak </li></ul></ul><ul><li>Use of distributed clients </li></ul><ul><ul><li>101 nodes in 4 locations (Chicago, Baltimore, San Diego, Austin) </li></ul></ul><ul><li>Bandwidth monitoring </li></ul><ul><ul><li>Sector/Sphere application monitoring </li></ul></ul><ul><ul><li>Testbed system monitoring </li></ul></ul><ul><ul><li>Both monitor other resources (CPU, mem, etc) too </li></ul></ul>
  23. 23. Summary (cont.) <ul><li>Usability </li></ul><ul><ul><li>Client tools and C++ library (can call functions/applications written in other languages) </li></ul></ul><ul><ul><li>Sector is just like a distributed file system </li></ul></ul><ul><ul><li>Terasort: 60 lines of Sphere code, plus application specific code </li></ul></ul><ul><ul><li>CreditStone: 170 lines of Sphere code, plus application specific code </li></ul></ul><ul><li>Real world Applicability </li></ul><ul><ul><li>Software: Open Source Sector/Sphere </li></ul></ul><ul><ul><li>System: Open Cloud Testbed </li></ul></ul><ul><ul><ul><li>The software can be installed on any heterogeneous clusters over single or multiple data centers </li></ul></ul></ul><ul><ul><li>Real applications: bioinformatics, sorting and credit card transaction processing (and many other applications!) </li></ul></ul>
  24. 24. Summary (cont.) <ul><li>Use of distributed HPC resources </li></ul><ul><ul><li>Open cloud testbed (4 distributed racks + SC08 booth) </li></ul></ul><ul><ul><li>10Gb/s CiscoWave </li></ul></ul><ul><li>Conference floor resources </li></ul><ul><ul><li>One 10Gb/s SCinet connection to CiscoWave </li></ul></ul><ul><ul><li>9 servers + 1 10GE switch </li></ul></ul><ul><ul><li>Participating as slave (storage + computing) nodes for Sphere applications </li></ul></ul>
  25. 25. For More Information <ul><li>Sector/Sphere code & docs: </li></ul><ul><li>Open Cloud Consortium: </li></ul><ul><li>NCDM: </li></ul><ul><li>@SC08: Booth 321 </li></ul>