This document provides an introduction to high-performance computing (HPC), including definitions, applications, hardware, and software. It defines HPC as the use of parallel processing on computer clusters and supercomputers to solve complex modeling problems. The document then describes typical HPC cluster hardware such as compute nodes, a head node, switches, storage, and a KVM switch. It also outlines cluster management software, job scheduling, and parallel programming tools such as MPI that allow programs to run simultaneously on multiple processors. An example HPC cluster at SIU called Maxwell is presented with its technical specifications, along with a tutorial on logging in and running simple MPI programs on the system.
CFD acceleration with FPGA (byteLAKE's presentation from PPAM 2019) - byteLAKE
byteLAKE's presentation from the PPAM 2019 conference.
Abstract:
The goal of this work is to adapt 4 CFD kernels to the Xilinx ALVEO U250 FPGA, including the first-order step of the non-linear iterative upwind advection MPDATA scheme (non-oscillatory forward in time), the divergence part of the matrix-free linear operator formulation in the iterative Krylov scheme, the tridiagonal Thomas algorithm for vertical matrix inversion inside the preconditioner for the iterative solver, and computation of the pseudovelocity for the second pass of the upwind algorithm in MPDATA. All the kernels use a 3-dimensional compute domain consisting of 7 to 11 arrays. Since all kernels belong to the group of memory-bound algorithms, our main challenge is to provide the highest utilization of global memory bandwidth. Our adaptation allows us to reduce the execution time by up to 4x.
Find out more at: www.byteLAKE.com/en/CFD
Foot note:
This is the presentation about the non-AI version of byteLAKE's CFD kernels, highly optimized for Alveo FPGA. Based on this research project and many others in the CFD space, we decided to shift the course of the CFD Suite product development and leverage AI to accelerate computations and enable new possibilities. Instead of adapting CFD solvers to accelerators, we use AI and work on a cross-platform solution. More on the latest: www.byteLAKE.com/en/CFDSuite.
-
Update for 2020: byteLAKE is currently developing CFD Suite, a collection of AI (Artificial Intelligence) models to accelerate and enable new features for CFD simulations. It is a cross-platform solution (not only for FPGAs). More: www.byteLAKE.com/en/CFDSuite.
An introduction to the Design of Warehouse-Scale Computers - Alessio Villardita
A brief overview of the main factors involved in the design of Warehouse-Scale Computers (WSC), from the hardware to the cooling system to the overall plant energy efficiency, always keeping in mind the costs of such a large architecture.
Co-Author: Pietro Piscione (https://www.linkedin.com/pub/pietro-piscione/84/b37/926)
A work based on:
"The Datacenter as a Computer, An Introduction to the Design of Warehouse-Scale Machines, Second Edition"
by
Luiz André Barroso
Jimmy Clidaras
Urs Hölzle
A Dataflow Processing Chip for Training Deep Neural Networks - inside-BigData.com
In this deck from the Hot Chips conference, Chris Nicol from Wave Computing presents: A Dataflow Processing Chip for Training Deep Neural Networks.
Watch the video: https://wp.me/p3RLHQ-k6W
Learn more: https://wavecomp.ai/
and
http://www.hotchips.org/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
- What we mean by EAS core and how it's distinct from the other components - also why it's so difficult to get it merged. (This is driven by key partner concerns).
- An update on misc work that's underway to resolve the upstreaming.
- Misc load balance pathway enhancements
- Wakeup pathway mods (cleanups, basic big.LITTLE capacity awareness etc)
- Periodic load balancer mods.
- Energy model expression (why this is important, partner perspectives/experience and bottlenecks)
- Proposals to get an expression into the mainline
- Optional boot-time auto-detection of capacity, overridable by sysfs
- Leveraging the merged power coefficient bindings
- Leveraging the OPP bindings
.. to effectively get to EAS' struct sched_group_energy.
- How we are structuring things to ease upstream acceptance. What's helping, what's not, where partners can help.
Revisiting CephFS MDS and mClock QoS Scheduler - Yongseok Oh
This presentation covers CephFS performance scalability and evaluation results. Specifically, it addresses technical issues such as multi-core scalability, cache size, static pinning, recovery, and QoS.
This presentation by Stanislav Donets (Lead Software Engineer, Consultant, GlobalLogic, Kharkiv) was delivered at GlobalLogic Kharkiv C++ Workshop #1 on September 14, 2019.
This talk covered:
- Graphics Processing Units: Architecture and Programming (theory).
- Scratch Example: Barnes Hut n-Body Algorithm (practice).
Conference materials: https://www.globallogic.com/ua/events/kharkiv-cpp-workshop/
Lightweight DNN Processor Design (based on NVDLA) - Shien-Chun Luo
https://sites.google.com/view/itri-icl-dla/
(Public Information Share) This is our lightweight DNN inference processor presentation, including a system solution (from Caffe prototxt to HW control files), hardware features, and an example of object detection (Tiny YOLO) RTL simulation results. We modified the open-source NVDLA (small configuration) and developed a RISC-V MCU for this accelerator system.
MemGuard: Memory Bandwidth Reservation System for Efficient Performance Isola... - Heechul Yun
Memory bandwidth in modern multi-core platforms is highly variable for many reasons, and this is a big challenge in designing real-time systems as applications become increasingly memory intensive. In this work, we proposed, designed, and implemented an efficient memory bandwidth reservation system that we call MemGuard. MemGuard divides memory bandwidth into two parts: guaranteed and best effort. It provides bandwidth reservation for the guaranteed part for temporal isolation, with efficient reclaiming to maximally utilize the reserved bandwidth. It further improves performance by exploiting the best-effort bandwidth after satisfying each core's reserved bandwidth. MemGuard is evaluated with SPEC2006 benchmarks on a real hardware platform, and the results demonstrate that it is able to provide memory performance isolation with minimal impact on overall throughput.
Learning from ZFS to Scale Storage on and under Containers - inside-BigData.com
Evan Powell presented this deck at the MSST 2017 Mass Storage Conference.
"What is so new about the container environment that a new class of storage software is emerging to address these use cases? And can container orchestration systems themselves be part of the solution? As is often the case in storage, metadata matters here. We are implementing in the open source OpenEBS.io some approaches that are in some regards inspired by ZFS to enable much more efficient scale out block storage for containers that itself is containerized. The goal is to enable storage to be treated in many regards as just another application while, of course, also providing storage services to stateful applications in the environment."
Watch the video: http://wp.me/p3RLHQ-gPs
Learn more: blog.openebs.io
and
http://storageconference.us
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
BKK16-317 How to generate power models for EAS and IPA - Linaro
Generating a platform-specific power model is a prerequisite for deploying EAS and IPA. This makes understanding power models, and how to generate parameters for them, a useful skill. In this session we demonstrate how to use workload automation to gather power data from a board. We then describe how to derive rough values for the EAS and IPA power models using nothing but this easily observable data, without relying on any information provided by the OEM or SoC vendor.
OpenPOWER Acceleration of HPCC Systems - HPCC Systems
JT Kellington (IBM) and Allan Cantle (Nallatech) present at the 2015 HPCC Systems Engineering Summit Community Day on porting HPCC Systems to the POWER8-based ppc64el architecture.
Design Considerations, Installation, and Commissioning of the RedRaider Cluster at the Texas Tech University High Performance Computing Center
Outline of this talk:
• HPCC staff and students
• Previous clusters: history, performance, usage patterns, and experience
• Motivation for upgrades: compute capacity goals and related considerations
• Installation and benchmarks
• Conclusions and Q&A
Fast switching of threads between cores - Advanced Operating Systems - Ruhaim Izmeth
"Fast switching of threads between cores" is a published research paper on operating systems. This is our attempt to decode the research and present it to the class.
Assisting User's Transition to Titan's Accelerated Architecture - inside-BigData.com
Oak Ridge National Lab is home to Titan, the largest GPU-accelerated supercomputer in the world. This alone can be intimidating for users new to leadership computing facilities. Our facility has collected over four years of experience helping users port applications to Titan. This talk explains common paths and tools for successfully porting applications and exposes common difficulties experienced by new users. Lastly, learn how our free and open training program can assist your organization in this transition.
This is a presentation on cluster computing that includes information from other sources as well as my own research and editing. I hope it will help everyone who needs to learn about this topic.
3. Introduction
• High-speed computing; originally pertaining only to supercomputers for scientific research
• Tools and systems used to implement and create high-performance computing systems
• Used for scientific research or computational science
• The main area of the discipline is developing parallel processing algorithms and software, so that programs can be divided into small parts and executed simultaneously by separate processors
• HPC systems have shifted from supercomputing to computing clusters
4. What is a Cluster?
• A cluster is a group of machines interconnected so that they work together as a single system
• Used for better speed and capacity
• Types of cluster:
o High-availability (HA) clusters
o Load-balancing clusters
o Grid computing
• Terminology:
o Node – an individual machine in a cluster
o Head node – connected to both the private network of the cluster and a public network; used to access a given cluster. Responsible for providing the user an environment to work in and for distributing tasks among the other nodes
o Compute nodes – connected only to the private network of the cluster; generally used for running jobs assigned to them by the head node(s)
5. Benefits of a Cluster
• Reduced cost
o The price of off-the-shelf consumer desktops has plummeted in recent years, and this drop in price has corresponded with a vast increase in their processing power and performance. The average desktop PC today is many times more powerful than the first mainframe computers.
• Processing power
o The parallel processing power of a high-performance cluster can, in many cases, prove more cost effective than a mainframe with similar power. This reduced price-per-unit of power enables enterprises to get a greater ROI (Return On Investment) from their IT budget.
• Scalability
o Perhaps the greatest advantage of computer clusters is the scalability they offer. While mainframe computers have a fixed processing capacity, computer clusters can be easily expanded as requirements change by adding additional nodes to the network.
6. Benefits of a Cluster
• Improved network technology
o Driving the development of computer clusters has been a vast improvement in the technology related to networking, along with a reduction in the price of such technology.
o In clusters, computers are typically connected via a single virtual local area network (VLAN), and the network treats each computer as a separate node. Information can be passed throughout these networks with very little lag, ensuring that data doesn't bottleneck between nodes.
• Availability
o When a mainframe computer fails, the entire system fails. However, if a node in a computer cluster fails, its operations can be simply transferred to another node within the cluster, ensuring that there is no interruption in service.
7. Applications of HPC
• Used to solve complex modeling problems in a spectrum of disciplines
• Topics include: artificial intelligence, climate modeling, cryptographic analysis, geophysics, molecular biology, molecular dynamics, nuclear physics, physical oceanography, plasma physics, quantum physics, quantum chemistry, solid state physics, and structural dynamics
• HPC is currently applied to business uses as well:
o Data warehouses
o Line-of-business (LOB) applications
o Transaction processing
8. Top 10 Supercomputers for HPC (June 2011)
Source: TOP500.org
14. The Cluster: Maxwell
Tech specs:
• Number of nodes: 106
• Each node: two Intel quad-core 2.3 GHz CPUs
• Total number of cores: 848
• RAM per node: 8 GB
• Storage: 90 TB
15. Hardware: Master/Head Node
• The head node is responsible for providing the user an environment to work in and for distributing tasks among the other nodes
• Minimum specification:
o CPU of i586 or above
o A network interface card that supports a TCP/IP stack
o At least 4GB total free space – 2GB under / and 2GB under /var
o A floppy drive
o A CD-ROM drive
16. Hardware: Master/Head Node
• Maxwell specification:
o Server format: Rack
o CPU family: Intel Xeon
o CPU nominal frequency: 2.26GHz
o Processor model: Xeon E5520
o Processors supplied: 2 quad-core
o Memory RAM capacity: 24GB (6x4GB)
o Memory type: DDR3
o Memory frequency: 1066MHz quad-ranked RDIMMs
o Storage HDD: 146GB 15K RPM Serial-Attach SCSI
o RAID module: PERC 6/i SAS RAID controller, 2x4 connectors
o Gigabit LAN ports: 2
o Power supply rating: 480W
o Idle power consumption: 150W
o Peak power consumption: 270W
o OS: Red Hat Enterprise Linux 5.3 AP, x32 and x64
17. Hardware: Computing Node (Client)
• Dedicated to computation
• Minimum specification:
o CPU of i586 or above
o A disk on each client node, at least 2GB in size
o A network interface card that supports a TCP/IP stack
o All clients must have the same architecture (e.g., ia32 vs. ia64)
o Monitors and keyboards may be helpful, but are not required
o Floppy drive or PXE-enabled BIOS
o A CD-ROM drive
18. Hardware: Computing Node (Client)
• Maxwell specification:
o CPU family: Intel Xeon
o CPU nominal frequency: 2.13GHz
o Processors supplied: 2 quad-core
o Memory RAM capacity: 8GB (4x2GB)
o Memory type: DDR3
o Memory frequency: 1333MHz dual-ranked UDIMMs
o Storage HDD: 160GB 7.2K RPM SATA
o Gigabit LAN ports: 2
o Power supply rating: 480W
o Idle power consumption: 115W
o Peak power consumption: 188W
o OS: Red Hat Linux 5 HPC
19. Hardware: Switch
• Minimum specification:
– The switch is necessary for communication between the nodes
– Each node (including the head node) should have its own port on the switch. In other words, if there are one head node and 8 client nodes, you need at minimum a 9-port switch
20. Hardware: Switch
• Maxwell specification:
o Model: PowerConnect 6248
o Ports: 48 10/100/1000BASE-T auto-sensing Gigabit Ethernet switching ports
o 48-port GbE (Gigabit Ethernet) managed switch, two 10GbE ports, stacking capable
(Pictured: PowerConnect 6248 switch stack)
21. Hardware: Power Distribution Unit
• APC Switched Rack Power Distribution Units (PDUs) place rack equipment power control in the hands of the IT manager. Remote outlet-level controls allow power on/off functionality for power recycling to reboot locked-up equipment and to avoid unauthorized use of individual outlets. Power sequencing delays allow users to define the order in which to power up or down attached equipment. Avoid circuit overload during power recovery and extend uptime of critical equipment by prioritizing the load shedding.
o PDU plug type: L6-30P
o PDU model: APC AP7541
o PDU max amperage load: 30
22. Hardware: External Storage Array
• Minimum specification:
o Model: PowerVault MD1000 hard drive array
o Max supported capacity: 1.1 TB
o Host channels: 2
o Data transfer rate: 300 MB/s
o Supported devices: hard drive, disk array (RAID)
o Spindle speed: 15000 RPM
• Maxwell specification:
o Total storage arrays: 6
o HDDs in each storage array: 15
o Each HDD: 1 TB
o Total storage capacity: 6 x 15 x 1.1 TB
23. Hardware: KVM Switch
• A KVM (Keyboard, Video, Mouse) switch is a device used to connect a keyboard, mouse, and monitor to two or more computers. KVM switches save money, time, space, equipment, and power. These switches are also widely deployed to control pools of servers in data centers. Some KVM switches support user terminals at both ends that allow local and remote access to all the computers or servers.
24. Hardware: Networking
• Clusters are interconnected with both GigE (Dell PowerConnect 6248 48-port GbE managed switch; 2x Dell PowerConnect 3424 24-port FE with 2 GbE copper ports and 2 GbE fiber SFP ports) and InfiniBand (Dell 24-port internally managed 9024 DDR InfiniBand edge switch) switches and cards
28. Software for HPC
• For effective use of a cluster for HPC, the following tools are at our disposal:
– Remote hardware management: remote power on/off, monitoring CPUs (for temperature etc.)
– Cluster management: monitoring programs, system administration etc.
– Job scheduling
– Libraries/languages for parallel programming: Message Passing Interface (MPI)
29. Cluster Management
• Cluster management software offers:
– An easy-to-use interface for managing clusters
– Automated queuing of jobs
– Matching the requirements of a job to the resources available to the cluster
– Migrating jobs across the cluster
• Maxwell uses Red Hat Enterprise Linux
30. Cluster Management
• Red Hat Enterprise Linux
– Designed for scientific computing, to deploy clusters of systems that work together
– Excellent hardware detection and monitoring capabilities
– Centralized authentication and logging services
– Fast IO (Input/Output)
31. Parallel Computing
• A form of computation in which many calculations are carried out simultaneously, operating on the principle that large problems can often be divided into smaller ones, which are then solved concurrently, i.e., "in parallel"
• Different forms of parallel computing:
– Bit-level parallelism
– Instruction-level parallelism
– Data parallelism
– Task parallelism
• Parallel computer classification:
– Multiple processing elements (multi-core and multi-processor) within a single machine
– Using multiple computers to work on the same task – clusters, MPPs (Massively Parallel Processing), and grids
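The data-parallel idea above (splitting one large problem into chunks that are solved concurrently, then combined) can be sketched in Python. This is a generic illustration, not code from the Maxwell tutorial; the function names and chunking scheme are invented for this example:

```python
from concurrent.futures import ProcessPoolExecutor

def partial_sum(bounds):
    # Each worker applies the same operation to its own chunk of the range.
    lo, hi = bounds
    return sum(i * i for i in range(lo, hi))

def parallel_sum_squares(n, workers=4):
    # Data parallelism: split [0, n) into chunks, solve each concurrently
    # in a separate process, then combine the partial results.
    step = max(1, n // workers)
    chunks = [(lo, min(lo + step, n)) for lo in range(0, n, step)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum, chunks))

if __name__ == "__main__":
    # The parallel result matches the sequential answer exactly.
    print(parallel_sum_squares(1000) == sum(i * i for i in range(1000)))
```

The same decomposition pattern underlies cluster-scale computing; only the transport changes (processes on one machine here, message passing between nodes on a cluster).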
32. Parallel Programming
• Parallel computer programs are more difficult to write than sequential programs
• Potential problems:
– Race conditions (output depending on the sequence or timing of other events)
– Communication and synchronization between the different subtasks
• HPC parallel programming models associated with different computing technology:
– Single Instruction Multiple Data (SIMD) on single processors
– Multi-process and multi-threading on SMP (symmetric multiprocessing) computers
– Message Passing Interface (MPI) on clusters
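The race condition mentioned above can be made concrete with a small threaded sketch (generic Python, not part of the tutorial): an unsynchronized read-modify-write can lose updates, while a lock serializes it so the result is deterministic.

```python
import threading

counter = {"value": 0}
lock = threading.Lock()

def unsafe_add(n):
    # Read-modify-write with no synchronization: two threads can read the
    # same value and both store value + 1, losing an update (a race).
    for _ in range(n):
        counter["value"] += 1

def safe_add(n):
    # The lock makes each read-modify-write atomic, so no updates are lost.
    for _ in range(n):
        with lock:
            counter["value"] += 1

def run(worker, n, threads=4):
    ts = [threading.Thread(target=worker, args=(n,)) for _ in range(threads)]
    for t in ts:
        t.start()
    for t in ts:
        t.join()

counter["value"] = 0
run(safe_add, 10_000)
print(counter["value"])  # always 40000 with the lock
```

Running `unsafe_add` instead may or may not lose updates on a given run, which is exactly what makes races hard to debug: the output depends on thread timing.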
33. Parallel Programming
• Message Passing Interface (MPI)
– An application programming interface (API) specification that allows processes to communicate with one another by sending and receiving messages
– Now a de facto standard for parallel programs running on distributed-memory systems in computer clusters and supercomputers
– A message-passing API with language-independent protocol and semantic specifications
– Supports both point-to-point and collective communication
– Goals are high performance, scalability, and portability
– Consists of a specific set of routines (i.e., APIs) directly callable from C, C++, Fortran, and any language able to interface with such libraries, including C#, Java, or Python
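Real MPI needs an MPI installation and launcher, but the point-to-point send/receive pattern described above can be sketched with Python's multiprocessing module: separate processes with separate address spaces exchanging messages over pipes. This is an analogy to MPI's model, not MPI itself; the rank numbering and the MPI calls named in comments are the assumed parallels.

```python
from multiprocessing import Pipe, Process

def worker(rank, size, conn):
    # Each "rank" computes a partial result on its slice of the problem,
    # then sends it to rank 0 (analogous to MPI_Send).
    partial = sum(range(rank, 400, size))
    conn.send((rank, partial))
    conn.close()

def gather(size=4):
    # Rank-0 coordinator: spawn the workers, then receive each partial
    # result (analogous to MPI_Recv in a loop, or a simple MPI_Reduce).
    links, procs = [], []
    for rank in range(size):
        parent, child = Pipe()
        p = Process(target=worker, args=(rank, size, child))
        p.start()
        links.append(parent)
        procs.append(p)
    total = sum(conn.recv()[1] for conn in links)
    for p in procs:
        p.join()
    return total

if __name__ == "__main__":
    print(gather(4))  # sum(range(400)) = 79800, regardless of process count
```

In actual MPI the same program would run as N copies launched by `mpirun`, with each copy learning its rank from the runtime instead of being passed one.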
35. Maxwell: A Brief Introduction
Tech specs:
• Number of nodes: 106
• Each node: two Intel quad-core 2.3 GHz CPUs
• Total number of cores: 848
• RAM per node: 8 GB
• Storage: 90 TB
36. How to create an account?
• Send an email to:
– Nancy Beasley nancyj0@siu.edu or
– Dr. Shaikh Ahmed ahmed@siu.edu
• Provide the following information:
– Name
– Affiliation
– IP address of the computer(s) on the SIUC network from which you would access Maxwell
• You will receive an email with login information
37-41. How to create an account?
• IP address look-up (Windows): open the Start menu, type 'cmd' to open a command prompt, then run 'ipconfig' to display the IP address
42. Login Procedure
• Download 'PuTTY'
– Web addresses:
• http://www.putty.org/
• http://download.cnet.com/PuTTY/3000-7240_4-10808581.html
• Run 'PuTTY'
– Use the host name or IP address of Maxwell
• Host Name: maxwell.ecehpc.siuc.edu
– Enable X11
43. Login Procedure
• Run 'PuTTY'
– Start a 'Session' and type in the host name: maxwell.ecehpc.siuc.edu
53. Run MPI
• File: 'bsub.sh'
• The script sets job_name, output_file, run_limit, memory_limit, processor_use, and run_time to run a parallel program with MPI
• Change the login ID in the script (nanoSIUC) to your own login ID
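As a rough illustration, a bsub.sh of this shape typically sets those fields through #BSUB directives for an LSF-style scheduler. This is a hedged sketch, not the actual Maxwell script: the queue defaults, limits, and the program path are invented here, and only the nanoSIUC login ID is taken from the slide; check the real file shipped with the tutorial.

```shell
#!/bin/sh
#BSUB -J cpi_test            # job_name
#BSUB -o cpi_test.%J.out     # output_file
#BSUB -W 00:30               # run_limit (wall-clock, hh:mm)
#BSUB -n 8                   # processor_use (number of MPI ranks)
# memory_limit and run_time would be set with similar directives (e.g. -M).

# Launch the parallel program under MPI; replace nanoSIUC with your login ID.
# The program path below is hypothetical.
mpirun -np 8 /home/nanoSIUC/cpi
```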
54. Run MPI
• File: 'submit_script.sh'
• Creates a new directory for the generated output
• Directory name: cpi_test_<number of nodes used>
55. Run MPI
• Script to run MPI:
– ./submit_script.sh bsub.sh <# of Processors> <input file>
– <# of Processors> is an integer
– <input file> is optional; if it is in a different directory, use the path of the input file as well
56. Run MPI
• Script to run MPI (screenshot): shows the running MPI program, the <# of processors> argument, and the output directory