Update on the Spider II File System

In this presentation from the DDN User Meeting at SC13, Sarp Oral provides an update on the Spider II file system at Oak Ridge National Laboratory.

Watch the video presentation: http://insidehpc.com/2013/11/13/ddn-user-meeting-coming-sc13-nov-18/

Published in: Technology

  1. Spider II Specs
     Scalable Storage System
       • 36 SFA12K-40 IB FDR
       • 10 60-disk trays/couplet
       • 560 2 TB NL-SAS drives/couplet
       • 20,160 drives
       • 40 PB capacity (raw)
       • > 1 TB/s performance
     Test and Development System
       • 1 SFA12K-40 IB FDR
       • 10 60-disk enclosures
       • 560 2 TB NL-SAS drives
     Facts
       • 32 PB capacity (after RAID)
       • > 1 TB/s performance
       • 288 Lustre OSS total
       • 8 OSS per couplet
       • 4 MDS and 2 MGS
       • Configured in 4 rows
       • 2x 108-port FDR IB switches
       • 36x 36-port FDR IB switches
       • 440 Lustre Titan LNET routers (432 for OSS, 8 for MDS)
     Sarp Oral | SC’13, DDN User Meeting
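
The totals on the specs slide can be cross-checked from the per-couplet figures. The following is a minimal sketch, assuming decimal units (1 PB = 1000 TB); the variable names are illustrative and do not come from the presentation.

```python
# Figures quoted on the specs slide.
COUPLETS = 36              # SFA12K-40 couplets in the production system
DRIVES_PER_COUPLET = 560   # 2 TB NL-SAS drives per couplet
DRIVE_TB = 2
OSS_PER_COUPLET = 8
ROUTERS_FOR_OSS = 432
ROUTERS_FOR_MDS = 8
USABLE_PB = 32             # capacity after RAID, as stated on the slide

# Derived totals (assumption: decimal units, 1 PB = 1000 TB).
total_drives = COUPLETS * DRIVES_PER_COUPLET        # 20,160, matching the slide
raw_pb = total_drives * DRIVE_TB / 1000             # about 40 PB raw
total_oss = COUPLETS * OSS_PER_COUPLET              # 288, matching the slide
total_routers = ROUTERS_FOR_OSS + ROUTERS_FOR_MDS   # 440, matching the slide
usable_fraction = USABLE_PB / raw_pb                # share of raw capacity left after RAID

print(f"drives: {total_drives}, raw capacity: {raw_pb:.2f} PB")
print(f"OSS: {total_oss}, LNET routers: {total_routers}")
print(f"usable fraction after RAID: {usable_fraction:.1%}")
```

The usable fraction is only derived from the two capacity figures; the slide does not state the RAID geometry.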
  2. Spider II - Architecture
     Interconnects shown in the diagram:
       • Serial ATA: 6 Gbit/sec
       • InfiniBand: 56 Gbit/sec
       • XK7 Gemini 3D Torus: 9.6 Gbytes/sec per direction
     Components:
       • Titan XK7 and other OLCF resources
       • Enterprise Storage: controllers and large racks of disks are connected via InfiniBand.
         – 36 DataDirect SFA12K-40 controller pairs with 2 Tbyte NL-SAS drives and 8 InfiniBand FDR connections per pair
       • Storage Nodes run parallel file system software and manage incoming FS traffic.
         – 288 Dell servers with 64 GB of RAM each
       • SION II Network provides connectivity between OLCF resources and primarily carries storage traffic.
         – 1600 ports, 56 Gbit/sec InfiniBand switch complex
       • Lustre Router Nodes run parallel file system client software and forward I/O operations from HPC clients.
         – 432 XK7 XIO nodes configured as Lustre routers on Titan
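
As a rough sanity check on the architecture figures, the nominal InfiniBand bandwidth on the storage side can be estimated from the controller-pair and link counts. This is a back-of-the-envelope sketch, not a figure from the presentation: it uses the 56 Gbit/sec nominal rate quoted on the slide, and real payload throughput is lower once encoding and protocol overhead are counted.

```python
# Figures quoted on the architecture slide.
CONTROLLER_PAIRS = 36
FDR_LINKS_PER_PAIR = 8
FDR_GBIT_PER_SEC = 56      # nominal rate from the slide, not payload throughput

# Derived values (assumption: every link runs at the nominal rate simultaneously).
total_links = CONTROLLER_PAIRS * FDR_LINKS_PER_PAIR   # 288 links
aggregate_gbit = total_links * FDR_GBIT_PER_SEC       # total Gbit/sec
aggregate_tbyte = aggregate_gbit / 8 / 1000           # Gbit/sec -> TB/sec (decimal)

print(f"{total_links} FDR links, about {aggregate_tbyte:.2f} TB/s nominal")
```

Under these assumptions the nominal link capacity is about twice the ">1 TB/s" performance target, and the link count equals the number of storage nodes listed on the slide.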
  3. Spider II - Facilities
       • Sits on a 36" raised floor and is forced-air cooled
       • 4 identical rows in hot-aisle/cold-aisle configuration
         – 9 racks for DDN SFA12K-40 equipment
         – 1 infrastructure rack
         – Cold aisle is fully contained with overhead panels and sliding doors at each end of the rows
           • Prevents hot-air/cold-air mixing and increases cooling efficiency
       • 25% perforated tiles used to provide cold air to cold aisles
       • Fully compliant with the requisite National Fire Protection Association (NFPA) codes
       • Total space required is 672 square feet
  4. Spider II - Facilities
       • Ran a series of tests on a DDN SFA12K-40 testbed unit under various I/O mode and load scenarios
         – 9 kW per DDN rack nominal load
       • Total file system load including infrastructure racks is 400 kW and total cooling load is 114 tons
       • Each rack is fed with a pair of 208VAC 3-phase electrical feeds, protected by a 50A 10%-rated breaker
         – Fed from two different transformer sources
         – DDN SFA12K power distribution system is both load balanced and supports fail-over, so OLCF can conduct both scheduled and unscheduled maintenance on one transformer without disrupting file system operation
         – Neither electrical connection is protected by UPS
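
The power and cooling figures above can be related to each other with simple arithmetic. This is a minimal sketch, assuming that all electrical load ends up as heat to be removed and using the standard conversion of 1 ton of refrigeration = 12,000 BTU/h (about 3.517 kW); the rack counts come from the preceding facilities slide, and the variable names are illustrative.

```python
KW_PER_TON = 3.517         # 1 ton of refrigeration = 12,000 BTU/h, about 3.517 kW

# Figures quoted on the two facilities slides.
ROWS = 4
DDN_RACKS_PER_ROW = 9
KW_PER_DDN_RACK = 9        # nominal load measured on the testbed unit
TOTAL_LOAD_KW = 400        # total file system load, including infrastructure racks

# Derived values (assumption: electrical load equals heat load).
ddn_racks = ROWS * DDN_RACKS_PER_ROW            # 36 DDN racks
ddn_load_kw = ddn_racks * KW_PER_DDN_RACK       # nominal load of the DDN racks alone
remaining_kw = TOTAL_LOAD_KW - ddn_load_kw      # infrastructure racks and other load
cooling_tons = TOTAL_LOAD_KW / KW_PER_TON       # cooling needed to remove 400 kW

print(f"DDN racks: {ddn_racks}, nominal DDN load: {ddn_load_kw} kW")
print(f"remaining load in the 400 kW figure: {remaining_kw} kW")
print(f"cooling load: {cooling_tons:.1f} tons")
```

Under these assumptions the 400 kW total converts to roughly 114 tons, which agrees with the cooling load stated on the slide.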
  5. Integration efforts
       • Lustre 2.4 testing
         – Small-scale
           • Round-the-clock testing for stability, regression, and performance on a single-cabinet Cray XK7 (Arthur)
           • Home-built Cray Lustre 2.4 client as well as servers
           • Early detection and correction of problems and bugs
         – Large-scale
           • Weekly testing on Titan
           • Identified some number of problems at scale
       • IB FDR testing on Cray
         – Cray and Mellanox
  6. Schedule
       • System infrastructure delivery
         – Completed
       • Block storage delivery
         – Completed
       • Block acceptance
         – Completed
         – Achieved 1.3 TB/s for reads and 1.2 TB/s for writes at the block level
         – Need to revisit a few items in Q1’14
       • Lustre support with Intel
         – Completed. Level 1, 2, and 3 support with Intel
       • File system integration
         – Completed
       • Rolling into production
         – Completed
       • Performance tuning
         – Ongoing. To be completed by Q1’14.
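
To put the block-level acceptance numbers in per-couplet terms, the aggregate rates can be divided by the 36 couplets listed on the specs slide. This is an illustrative sketch only: it assumes the load was spread evenly across couplets and uses decimal units, neither of which is stated in the presentation.

```python
# Figures quoted in the deck.
COUPLETS = 36              # from the specs slide
READ_TB_PER_SEC = 1.3      # block-level read rate from the schedule slide
WRITE_TB_PER_SEC = 1.2     # block-level write rate from the schedule slide

# Assumption: even distribution across couplets, 1 TB = 1000 GB.
read_gb_per_couplet = READ_TB_PER_SEC * 1000 / COUPLETS
write_gb_per_couplet = WRITE_TB_PER_SEC * 1000 / COUPLETS

print(f"read:  about {read_gb_per_couplet:.1f} GB/s per couplet")
print(f"write: about {write_gb_per_couplet:.1f} GB/s per couplet")
```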
  7. [Slide with no transcribed text]
  8. Questions?
     oralhs@ornl.gov
     The research and activities described in this presentation were performed using the resources of the National Center for Computational Sciences at Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.
