The Monash Campus Grid Programme
Enhancing Research with High-Performance/High-Throughput Computing


What is HPC?
 - High-performance computing is about leveraging the best and most
   cost-effective technologies, from processors, memory chips, disks and
   networks, to provide aggregated computational capabilities beyond what
   is typically available to the end user
   - high-performance -- running a single program as quickly as possible
   - high-throughput -- running as many programs as possible within a
     unit of time (a minimal sketch of this pattern follows below)
 - HPC/HTC are enabling technologies for larger experiments, more complex
   data analyses, and higher accuracy in computational models
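The high-throughput pattern can be sketched in a few lines of Python. This is my illustration, not from the deck: model() and its parameter grid are hypothetical stand-ins for a real research code.

    # Minimal HTC sketch: many independent runs of the same program,
    # packed into a unit of time. model() is a hypothetical stand-in.
    from concurrent.futures import ProcessPoolExecutor

    def model(pressure: float) -> float:
        return pressure ** 0.5  # placeholder for an expensive simulation

    if __name__ == "__main__":
        params = [5000 + 250 * i for i in range(5)]   # one job per value
        with ProcessPoolExecutor() as pool:           # throughput: jobs per unit time
            for p, r in zip(params, pool.map(model, params)):
                print(f"pressure={p} -> {r:.2f}")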


The Monash Campus Grid
 Architecture layers (top to bottom):
 - WWW
 - Nimrod grid-enabled middleware
 - Access services: Secure Shell, GT2 / GT4, Secure Copy, GridFTP
 - Compute: Monash Sun Grid HPC Cluster and Monash SPONGE Condor Pool
 - LaRDS (peta-scale storage)
 - Monash Gigabit Network
 https://confluence-vre.its.monash.edu.au/display/mcgwiki/Monash+MCG
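In practice the access layer above is plain ssh/scp plus the cluster's batch scheduler. A hedged sketch, assuming a Sun Grid Engine-style qsub on the head node; the hostname and file names are hypothetical:

    # Hypothetical access sketch: stage a job script to the cluster head
    # node over scp, then submit it with qsub via ssh. Host name is made up.
    import subprocess

    HEADNODE = "msg.example.monash.edu.au"  # hypothetical login node

    def submit(job_script: str) -> None:
        subprocess.run(["scp", job_script, f"{HEADNODE}:~/"], check=True)
        subprocess.run(["ssh", HEADNODE, "qsub", job_script], check=True)

    if __name__ == "__main__":
        submit("sweep_job.sh")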


Monash Sun Grid
 - Central high-performance compute cluster (HPC + HTC capable)
 - Monash eResearch Centre and Information Technology Services Division
 - Key features:
   - dedicated Linux cluster with ~205 computers providing ~1,650 CPU cores
   - processor configurations from 2 up to 48 CPU cores per computer
   - primary memory configurations from 4 GB up to 1,024 GB per machine
   - broad range of applications and development environments
   - flexibility in addressing customer requirements
 https://confluence-vre.its.monash.edu.au/display/mcgwiki/Monash+Sun+Grid+Overvie




Monash Sun Grid node types
 [Figure: photos of the node generations, labelled by vintage:
 2005, 2006, 2008-10, 2009, 2010]



Monash Sun Grid: 2010
 Very-Large RAM nodes (MSG-vlm)
 - 2010
 - Dell R910 - four eight-core Intel Xeon (Nehalem) CPUs per node
 - two nodes [ 64 cores total ]
 - 1,024 GB RAM / node
 - 16 x 600 GB 10k RPM SAS disk drives
 - over 640 Gflop/s
 - redundant 1.1 kW PSU on each
 - ~300 Mflop/W
 http://www.dell.com/us/business/p/poweredge-r910/pd



Monash Sun Grid: 2011
 Partnership Nodes with Engineering (MSG-pn)
 - 2010-11
 - Dell R815 - four 12-core AMD Opteron CPUs per node
 - five nodes [ 240 cores ]
 - 128 GB RAM / node
 - 10G Ethernet
 - > 2,400 Gflop/s
 - redundant 1.1 kW PSUs
 - ~400 Mflop/W
 http://www.dell.com/us/business/p/poweredge-r815/pd
Monash Sun Grid: Summary

 Name      Vintage  Node       Cores  Gflop/s         Power Req't  Mflop/W
 MSG-I     2005     V20z       70     336             ~17 kW       20
 MSG-II    2006     X2100      64     332             ~11 kW       42
 MSG-IIIe  2007     X6220      120    624             ~7.2 kW      65
 MSG-IV    2008     X4600      96     885             ~3.6 kW      250
 MSG-III   2009     X6250      720    7,200           ~23 kW       330
 MSG-III   2010     X6250      240    2,400           ~7 kW        330
 MSG-gpu   2010     Dell       80     > 800 + 18,660  ??           ??
 MSG-vlm   2010     Dell R910  64     > 640           ~2.2 kW      290
 MSG-pn    2011     Dell R815  240    > 2,400         ~5.5 kW      436

 The Monash Sun Grid HPC Cluster has 1,694 cores and delivers over
 12.5 (plus 18.6 GPU) Tflop/s with > 5.7 TB of RAM.



Software Stack
 - S/W development, environments, libraries: gcc, Intel C/Fortran,
   Intel MKL, IMSL numerical library, Ox, python, openmpi, mpich2,
   NetCDF, java, PETSc, FFTW, BLAS, LAPACK, gsl, mathematica, octave,
   matlab (limited)
 - Statistics, econometrics: R, Gauss, Stata
 - Computational chemistry: Gaussian 09, GaussView 5, Molden, GAMESS,
   Siesta, Materials Studio (Discovery), AMBER 10
 - Molecular dynamics: NAMD, LAMMPS, Gromacs
 - Underworld
 - CFD codes: OpenFOAM, ANSYS Fluent, CFX, viper (user-installed)
 - CUDA toolkit, Qt, VirtualGL, itksnap, drishti, Paraview
 - CrystalSpace, FSL, Meep, CircuitScape, Structure and Beast, XMDS,
   ENViMET (via wine), ENVI/IDL
 - etc. - a growing list!


Specialist Support and Advice
 Engagement lifecycle:
 - Initial engagement / account creation
 - General advice & startup tutorial
 - Requirements analysis
 - Customised solutions
 - Follow-up maintenance


Specialist Support and Advice
 - Cluster queue configuration and management
 - Compute job preparation
 - Custom scripting
 - Software installation and tuning
 - Job performance and/or error diagnosis
 - etc. (a job-script sketch follows below)
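To make "compute job preparation" concrete, here is a hedged sketch of the kind of batch script a user might submit to a Sun Grid Engine cluster such as the MSG. The job name, resource request and workload are all hypothetical; the "#$" directive lines are standard SGE syntax read by qsub.

    #!/usr/bin/env python
    # Hypothetical SGE job script: qsub reads the "#$" directive lines
    # below (interpreter, job name, memory request, output files).
    # All values are made up.
    #$ -S /usr/bin/python
    #$ -N demo_job
    #$ -cwd
    #$ -l h_vmem=4G
    #$ -o demo_job.out
    #$ -e demo_job.err

    import platform

    # stand-in for real work; record the execution host in the job log
    print("running on", platform.node())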


Growth of CPU Usage (CPU hours per year)
 2008:   859K
 2009: 3,300K
 2010: 6,863K
 2010 usage alone is roughly 783 CPU years
 (6,863,000 CPU hours / ~8,766 hours per year).




Active Users
 2008: 71
 2009: 145
 2010: 169 as of 24-Aug (full-year figure projected higher)



What to expect in the future?
 - Continued refresh of hardware and software, decommissioning older machines
 - More grid nodes (CPU cores) to meet growing demand
 - Scalable and high-performance storage architecture without sacrificing
   data availability
 - Custom grid nodes and/or subclusters with special configurations to meet
   user requirements
 - Better integration with Grid tools and middleware

Green IT Strategy
Monash Sun Grid: Beginnings
 MSG-I
 - 2005
 - Sun V20z, AMD Opteron (dual core)
 - initially 32 nodes = 64 cores, with 3 new nodes added in 2007 making
   a total of 70 cores
 - 4 GB RAM / node
 - 336 Gflop/s
 - ~17 kW
 - 20 Mflop/W
 http://www.sun.com/servers/entry/v20z/index.js
Monash Sun Grid
 MSG-II
 - 2006
 - Sun X2100, AMD Opteron (dual core)
 - initially 24 nodes = 48 cores, with 8 nodes added in 2007 making
   64 cores at present
 - 4 GB RAM / node
 - 332 Gflop/s
 - ~11 kW
 - 42 Mflop/W
 http://www.sun.com/servers/entry/x2100/
 Picture on the right found via Google on Jason Callaway's Flickr page:
 http://www.flickr.com/photos/29925031@N07/
Monash Sun Grid: Big Mem Boxes
 MSG-III (now named MSG-IIIe)
 - 2008
 - Sun X6220 blades - two dual-core AMD Opterons per node
 - currently 20 nodes = 80 cores, with 10 nodes to be added in 2010
   making 120 cores
 - 40 GB RAM / node
 - 624 Gflop/s
 - ~7.2 kW
 - 330 Mflop/W
 http://www.sun.com/servers/blades/x6220/
 http://www.sun.com/servers/blades/x6220/datasheet.pdf
Monash Sun Grid: 2010
 MSG-III expansion and GPU nodes
 - 2010
 - Sun X6250 - two quad-core Intel Xeon CPUs per node
 - 240 cores
 - 24 GB RAM / node
 - Dell nodes each connected to two Tesla C1060 GPU cards
 - ten nodes = 20 GPU cards
 - 48 GB and 96 GB RAM configs
 http://www.sun.com/servers/blades/x62520/
 http://www.nvidia.com/object/product_tesla_c1060_us.html
Monash Sun Grid: 2009
 MSG-III
 - 2009
 - Sun X6250 - two quad-core Intel Xeon CPUs per node
 - as of 2009: 720 cores
 - 16 GB RAM / node
 - > 7 Tflop/s
 - ~23 kW
 - ~330 Mflop/W
 http://www.sun.com/servers/blades/x62520/
Monash Sun Grid: Big SMP Boxes
 MSG-IV
 - 2009
 - Sun X4600 - eight quad-core AMD Opteron CPUs per node
 - currently three nodes = 96 cores
 - 96 GB RAM / node
 - 885 Gflop/s
 - ~3.6 kW
 - 250 Mflop/W
 http://www.sun.com/servers/blades/x4600/




Benefits of using a cluster
 Choose an approach by job characteristic (a distributed-memory sketch
 follows below):
 - parallel, shared memory -> use 2, 4, 8 or 32 cores on a single node
 - parallel, distributed memory -> use multiple nodes
 - sequential, with multiple scenarios or cases -> use multiple cores;
   use tools like Nimrod
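The distributed-memory branch means one process per core, possibly spread across nodes, communicating by message passing. A minimal sketch using mpi4py (my choice for illustration; the slides list openmpi/mpich2 but not mpi4py itself):

    # Hedged distributed-memory sketch: each MPI rank sums its own slice
    # of a range, then the partial sums are combined on rank 0.
    # Launch with e.g.: mpirun -n 8 python sum.py
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()   # this process's id
    size = comm.Get_size()   # total number of processes

    local = sum(range(rank, 1000, size))            # this rank's slice
    total = comm.reduce(local, op=MPI.SUM, root=0)  # combine on rank 0

    if rank == 0:
        print("total =", total)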
SPONGE
Introduction
 Serendipitous Processing on Operating Nodes in Grid Environment (SPONGE)
 - Core idea and motivation
   - Resource harnessing
   - Accessibility and utilization
 - How SPONGE achieves this
 - What SPONGE can do at the moment
 - What SPONGE cannot do at the moment
 - Infrastructure and usage statistics (pretty pictures)
 - Acknowledgements
Core Idea and Motivation
 - The core idea is to harness the tremendous amount of un/under-utilized
   computational power to perform high-throughput computing.
 - Motivation: large (giga-, tera-, peta-?) scale computational problems
   that need:
   - High throughput, generally embarrassingly parallel applications,
     e.g. PSAs (parameter sweep applications):
     - Latin Squares (Mathematics) - Dr. Ian Wanless and Judith Egan,
       Department of Mathematics
     - Molecular Replacement (Biology, Chemistry) - Jason Schmidberger and
       Dr. Ashley Buckle, Department of Biochemistry and Molecular Biology
     - Bayesian Estimation of Bandwidth in Multivariate Kernel Regression
       with an Unknown Error Density (Business, Economics) - Han Shang,
       Dr. Xibin Zhang and Dr. Maxwell King, Department of Business and
       Economics
     - HPC Solution for Optimization of Transit Priority in Transportation
       Networks - Dr. Mahmoud Mesbah, Department of Civil Engineering
   - Short-running applications that do not require specialized
     software/hardware and can be easily parallelized
   - A single point of submission, monitoring and control
Core Idea and Motivation (contd.)
 Key focus areas:
 - Resource harnessing - tapping "existing" (no new hardware) infrastructure
   to contribute to solving the computational problem:
   - Student labs in different faculties, ITS, EWS, etc.
   - Staff computers - personal contributions included
 - Accessibility
   - How to access these facilities -> middleware
   - When to access these facilities -> access and usage policies
 - Utilization - how to properly utilize these facilities
   - Implementation abstraction; single system image
   - Job submission, monitoring and control
How are we achieving this…
 Using Condor. The goal of the Condor Project is to develop, implement,
 deploy and evaluate mechanisms and policies that support High Throughput
 Computing on large collections of distributively owned computing resources.

 [Diagram: users submit jobs to a Condor submission node directly, or via
 Nimrod/Globus; submission and execution nodes across the Caulfield,
 Clayton and Peninsula campuses constantly update the Condor head node
 (central manager).]
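A hedged sketch of what "submitting jobs directly to Condor" looks like in practice: a small Python helper writes a vanilla-universe submit description and hands it to condor_submit. The executable, file names and case ids are hypothetical; the submit keywords are standard Condor syntax.

    # Hypothetical helper: write a minimal Condor submit description per
    # case and submit it. run_case.sh and the file names are made up.
    import subprocess
    import textwrap
    from pathlib import Path

    def submit_case(case_id: int) -> None:
        description = textwrap.dedent(f"""\
            universe   = vanilla
            executable = run_case.sh
            arguments  = {case_id}
            output     = case_{case_id}.out
            error      = case_{case_id}.err
            log        = sweep.log
            queue
            """)
        sub = Path(f"case_{case_id}.sub")
        sub.write_text(description)
        subprocess.run(["condor_submit", str(sub)], check=True)

    if __name__ == "__main__":
        for case in range(4):  # four independent, embarrassingly parallel cases
            submit_case(case)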
How are we achieving this… (contd.)
 [Diagram: the SPONGE Works configuration layer sits between the submission
 path (users submitting directly to a Condor submission node, or via Globus)
 and the execute nodes on the Caulfield, Clayton and Peninsula campuses.]
 - The Condor head node's default configuration can be modified centrally,
   down to the level of individual nodes
 - Queue management
 - Resource reservation
What SPONGE can do…
 Execute large numbers of short-running, embarrassingly parallel jobs by
 leveraging un/under-utilized existing computational resources.
 Sounds simple 
 Advantages:
 - Leverages idle CPU time that would otherwise go unused
 - Single point of job submission, monitoring, control and collation
   of results
 - Remote job submission using Nimrod/G, Globus
What SPONGE cannot do at the moment
 The SPONGE pool consists of:
 - Mostly non-dedicated computers
 - Distributed ownerships
 - Limited availability

 This restricts execution of jobs that:
 - Require specialized software/hardware:
   - High memory
   - Large storage space
   - Additional software
 - Take a long time to execute (several days or weeks)
 - Perform inter-process communication
Some Statistics
 User Name   CPU Hrs Used      User Name          CPU Hrs Used
 shikha        2012437.67      jirving                13258.09
 jegan         1534528.43      nice-user.pcha13       13205.38
 kylee         1166358.76      wojtek                  7095.26
 pxuser         414972.76      nice-user.wojtek        6890.78
 iwanless       371833.24      mmesbah                 5562.53
 zatsepin       257631.86      transport               5379
 hanshang        77930.72      philipc                 3733.35
 llopes          66747.09      shahaan                 3251.94
 iwanless        30930.82      zatsepin                3069.35
 jvivian         29611.87      kylee                   2988.84
                               jegan                   1937.55
                               transport               1308.44

 Total: 688+ CPU years to date…
Statistics contd…
Acknowledgements
 - Wojtek Goscinski
 - Philip Chan
 - Jefferson Tan




Nimrod Tools for e-Research
 Monash e-Science & Grid Engineering Laboratory
 Faculty of Information Technology




Overview
 - Supporting a Software Lifecycle
 - Software Lifecycle Tools




Nimrod Plan File
 [Diagram: the plan file feeds the Nimrod Portal; Nimrod/O, Nimrod/E and
 Nimrod/G drive actuators, which sit on top of the grid middleware.]

 parameter pressure float range from 5000 to 6000 points 4
 parameter concent float range from 0.002 to 0.005 points 2
 parameter material text select anyof "Fe" "Al"

 task main
   copy compModel node:compModel
   copy inputFile.skel node:inputFile.skel
   node:substitute inputFile.skel inputFile
   node:execute ./compModel < inputFile > results
   copy node:results results.$jobname
 endtask
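For intuition (my illustration, not from the deck): the plan defines a 4 x 2 x 2 sweep, so Nimrod generates 16 jobs, one per parameter combination, each running "task main". A hypothetical Python sketch of that enumeration, assuming "points n" means n evenly spaced values including the endpoints:

    # Hypothetical enumeration of the sweep implied by the plan file:
    # 4 pressure points x 2 concentration points x 2 materials = 16 jobs.
    from itertools import product

    def points(lo: float, hi: float, n: int) -> list:
        # n evenly spaced values from lo to hi (assumed semantics)
        step = (hi - lo) / (n - 1)
        return [lo + i * step for i in range(n)]

    pressures = points(5000.0, 6000.0, 4)
    concents = points(0.002, 0.005, 2)
    materials = ["Fe", "Al"]

    for jobname, (p, c, m) in enumerate(product(pressures, concents, materials)):
        # each combination becomes one independent job (cf. "task main")
        print(f"job {jobname}: pressure={p} concent={c} material={m}")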




From one workstation ..




.. Scaled Up




    Why is this challenging?




Develop, Deploy, Test…




     Why is this challenging?




Build, Schedule & Execute virtual application




Approaches to Grid programming
 - General-purpose workflows
   - Generic solution
   - Workflow editor
   - Scheduler
 - Special-purpose workflows
   - Solve one class of problem
   - Specification language
   - Scheduler




Nimrod Development Cycle
 1. Prepare jobs using the Portal
 2. Jobs scheduled and executed dynamically
 3. Sent to available machines
 4. Results displayed & interpreted




Acknowledgements
 Message Lab:
 - Colin Enticott
 - Slavisa Garic
 - Blair Bethwaite
 - Tom Peachy
 - Jeff Tan
 MeRC:
 - Shahaan Ayyub
 - Philip Chan
 Funding & Support:
 - CRC for Enterprise Distributed Systems (DSTC)
 - Australian Research Council
 - GrangeNet (DCITA)
 - Australian Research Collaboration Service (ARCS)
 - Microsoft
 - Sun Microsystems
 - IBM
 - Hewlett Packard
 - Axceleon
 Message Lab Wiki: https://messagelab.monash.edu.au/nimrod
