This document discusses current trends in high performance computing (HPC). It begins with an introduction to HPC and its applications in science, engineering, business analysis, and more. It then explains why HPC is needed, citing changes in scientific discovery, the need to solve ever larger problems, and modern business needs. The document also discusses the Top 500 list of the world's supercomputers and gives examples of some of the most powerful systems, then covers performance development trends and the challenges of increasing processor speeds. The rest of the document discusses parallel computing approaches using multi-core and many-core architectures, as well as cluster, grid, and cloud computing models for high performance.
Current Trends in HPC
1. Current Trends in High Performance Computing
Dr. Putchong Uthayopas
Department Head, Department of Computer Engineering, Faculty of Engineering, Kasetsart University, Bangkok, Thailand.
pu@ku.ac.th
3. Introduction
• High Performance Computing
– An area of computing that involves the hardware and software that help solve large and complex problems fast
• Many applications
– Science and engineering research
• CFD, genomics, automobile design, drug discovery
– High performance business analysis
• Knowledge discovery
• Risk analysis
• Stock portfolio management
– Business is moving more to the analysis of data from data warehouses
4. Why we need HPC?
• Change in scientific discovery
– From experimental to simulation and visualization
• Critical need to solve ever larger problems
– Global climate modeling
– Life science
– Global warming
• Modern business needs
– Design of more complex machinery
– More complex electronics design
– Complex and large-scale financial system analysis
– More complex data analysis
5. Top 500: Fastest Computers on Our Planet
• List of the 500 most powerful supercomputers, generated twice a year (June and November)
• Latest was announced in June 2012
12. Processors are just not running faster
• Processor speed kept increasing for the last 20 years
• Common techniques
– Smaller process technology
– Increased clock speed
– Improved microarchitecture
• Pentium, Pentium II, Pentium III, Pentium 4, Centrino, Core
13. Pitfall
• Smaller process technology leads to denser transistors, but…
– Heat dissipation
– Noise, requiring reduced voltage
• Increasing clock speed
– Uses more power, since CMOS consumes power mainly when switching
• Improving microarchitecture
– Small improvement for a lot more complex design
• The only solution left is to use concurrency: doing many things at the same time
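A one-line justification for the clock-speed wall (the standard CMOS dynamic-power model; my addition, not from the slides): switching power grows linearly with clock frequency f and quadratically with supply voltage V,

P_dynamic ≈ a · C · V^2 · f

where a is the activity factor and C the switched capacitance. Since raising f generally also requires raising V, power climbs much faster than performance, while many slower, lower-voltage cores can deliver the same throughput within a smaller power budget.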
14. Parallel Computing
• Speeding up execution by splitting a task into many independent subtasks and running them on multiple processors or cores (a minimal sketch follows)
– Break the large task into many small subtasks
– Execute these subtasks on multiple cores or processors
– Collect the results together
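As an illustration of the split-compute-collect pattern (my sketch, not from the deck), a minimal OpenMP program in C; compile with a flag such as -fopenmp:

#include <stdio.h>
#include <omp.h>

int main(void) {
    const int n = 1000000;
    double sum = 0.0;
    /* The loop iterations are split among the available cores;
       each thread accumulates a private partial sum, and the
       reduction clause collects the partial sums at the end. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += 1.0 / (i + 1.0);
    printf("harmonic(%d) = %f\n", n, sum);
    return 0;
}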
15. How to achieve concurrency
• Adding more concurrency into hardware
– Processor
– I/O
– Memory
• Adding more concurrency into software
– How to express parallelism better in software
• Adding more concurrency into algorithms
– How to do many things at the same time
– How to make people think in parallel
18. Rationale for Hybrid Architecture
• Most scientific applications have fine-grained parallelism inside
– CFD, financial computation, image processing
• Energy efficiency
– Employing a large number of slower processors in parallel can help lower power consumption and heat
19. Two main approaches
• Using multithreaded, scaled-down processor cores that are compatible with conventional processors
– Intel MIC
• Using a very large number of small processor cores in a SIMD model, evolving from graphics technology
– NVIDIA GPU
– AMD Fusion
20. Many Integrated Core Architecture
• Effort by Intel to add a large number of cores into a computing system
24. Challenges
• A large number of cores will have to divide memory among them
– Much smaller memory per core
– Demands high memory bandwidth
• Still need an effective fine-grained parallel programming model
• No free lunch: programmers have to do some work
26. What is GPU Computing?
• Computing with CPU + GPU: heterogeneous computing
[Figure: a multicore CPU (4 cores) paired with a many-core GPU]
27. Not 2x or 3x: speedups are 20x to 150x
• 146X: Medical Imaging (U of Utah)
• 36X: Molecular Dynamics (U of Illinois, Urbana)
• 18X: Video Transcoding (Elemental Tech)
• 50X: Matlab Computing (AccelerEyes)
• 100X: Astrophysics (RIKEN)
• 149X: Financial Simulation (Oxford)
• 47X: Linear Algebra (Universidad Jaime)
• 20X: 3D Ultrasound (Techniscan)
• 130X: Quantum Chemistry (U of Illinois, Urbana)
• 30X: Gene Sequencing (U of Maryland)
28. CUDA Parallel Computing Architecture
• Parallel computing architecture and programming model
• Includes a C compiler plus support for OpenCL and DX11 Compute
• Architected to natively support all computational interfaces (standard languages and APIs)
[Figure: the CUDA architecture stack, contrasted with ATI's compute “solution”]
29. Compiling C for CUDA Applications
[Figure: build flow. A C-for-CUDA source is split into the CUDA key kernels and the rest of the C application; NVCC compiles the kernels into CUDA object files, the host compiler produces CPU object files, and the linker combines both into a single CPU-GPU executable.]
30. Simple “C” Description For Parallelism

Standard C code:

void saxpy_serial(int n, float a, float *x, float *y)
{
    for (int i = 0; i < n; ++i)
        y[i] = a*x[i] + y[i];
}

// Invoke serial SAXPY kernel
saxpy_serial(n, 2.0, x, y);

Parallel C code:

__global__ void saxpy_parallel(int n, float a, float *x, float *y)
{
    int i = blockIdx.x*blockDim.x + threadIdx.x;
    if (i < n) y[i] = a*x[i] + y[i];
}

// Invoke parallel SAXPY kernel with 256 threads/block
int nblocks = (n + 255) / 256;
saxpy_parallel<<<nblocks, 256>>>(n, 2.0, x, y);
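A note on the launch arithmetic (my gloss, not on the slide): nblocks = (n + 255) / 256 is integer division that rounds up, so the grid always contains at least n threads; the guard if (i < n) then masks off the surplus threads in the last block.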
31. Computational Finance
Financial computing software vendors:
• SciComp: derivatives pricing modeling
• Hanweck: options pricing & risk analysis
• Aqumin: 3D visualization of market data
• Exegy: high-volume tickers & risk analysis
• QuantCatalyst: pricing & hedging engine
• Oneye: algorithmic trading
• Arbitragis Trading: trinomial options pricing
Ongoing work:
• LIBOR Monte Carlo market model
• Callable swaps and continuous-time finance
Sources: SciComp; CUDA SDK
32. Weather, Atmospheric, & Ocean Modeling
• CUDA-accelerated WRF available
• Other kernels in WRF being ported
• Ongoing work
– Tsunami modeling
– Ocean modeling
– Several CFD codes
Sources: Michalakes, Vachharajani; Matsuoka, Akiyama, et al.
33. New emerging standards
• OpenCL
– Supported by many vendors, including Apple
– Targets both GPU-based SIMD and multithreading
– More complex to program than CUDA
• OpenACC
– A programming standard for parallel computing developed by Cray, CAPS, Nvidia and PGI
– Simplifies parallel programming of heterogeneous CPU/GPU systems
– Directives based (see the sketch below)
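For illustration (my sketch, not from the deck): the SAXPY loop from slide 30 written in directive style, where an OpenACC pragma asks the compiler to generate the GPU offload code:

void saxpy_acc(int n, float a, const float *restrict x, float *restrict y)
{
    /* Offload the loop to the accelerator: copy x in,
       copy y in and back out, and parallelize the iterations. */
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] = a*x[i] + y[i];
}

Without an OpenACC compiler the pragma is simply ignored and the loop runs serially on the CPU, which is what makes the directive approach attractive for incremental porting.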
34. Cluster computing
• The use of a large number of servers, linked by a high-speed local network, as one single large supercomputer
• Popular way of building supercomputers
• Software
– Cluster-aware OS
• Windows Compute Cluster Server 2008
• NPACI Rocks Linux
• Programming systems such as MPI (a minimal example follows)
• Used mostly in computer-aided design, engineering, and scientific research
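A minimal MPI program in C (my example, not the deck's), showing the message-passing style used on clusters: each process sums a disjoint slice of the work and rank 0 collects the total. Build and launch with the usual mpicc and mpirun tools:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each process sums every size-th integer in 1..1000000. */
    long n = 1000000, local = 0, total = 0;
    for (long i = rank + 1; i <= n; i += size)
        local += i;

    /* Combine the partial sums onto rank 0. */
    MPI_Reduce(&local, &total, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("sum = %ld\n", total);

    MPI_Finalize();
    return 0;
}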
35. Comment
• Cluster computing is a very mature discipline
• We know how to build a sizable cluster very well
– Hardware integration
– Storage integration: Lustre, GPFS
– Schedulers: PBS, Torque, SGE, LSF
– Programming: MPI
– Distribution: ROCKS
• The cluster is the foundation fabric for grid and cloud
36. TERA Cluster
• 1 frontend (HP ProLiant DL360 G5 server) and 192 compute nodes
– Intel Xeon 3.2 GHz (dual-core, dual-processor)
– Memory: 4 GB (8 GB for frontend & InfiniBand nodes)
– 70x4 GB SCSI HDD (RAID1)
• 4 storage servers (48 TB total)
– Lustre file system for the TERA cluster's storage nodes
– Attached with Smart Array P400i controllers for 5 TB of space
• Networking: 200-port Gigabit Ethernet switch; 1 Gbps Ethernet/Fiber edge links to the KU fiber backbone; 2.5 Gbps to Uninet
[Figure: cluster topology with frontends (Sunyata, Araya, WinHPC, TERA, Anatta, and two spares), node groups of 4, 96, and 64 + 15 plus 16 spare nodes, and a 5 TB Lustre storage tier served by file servers FS1-FS4]
37. Grid Computing Technology
• Grid computing enables the virtualization of distributed computing and data resources, such as processing, network bandwidth and storage capacity, to create a single system image, granting users and applications seamless access to vast IT capabilities.
• Just as an Internet user views a unified instance of content via the Web, a grid user essentially sees a single, large virtual computer.
38. Grid Architecture
[Figure: layer stack, bottom to top: Fabric, Connectivity, Resources, Collective, Application]
• Fabric layer
– Protocols and interfaces that provide access to computing resources such as CPU and storage
• Connectivity layer
– Protocols for grid-specific network transactions, such as security (GSI)
• Resources layer
– Protocols to access a single resource from an application
• GRAM (Grid Resource Allocation Management)
• GridFTP (data access)
• Grid Resource Information Service
• Collective layer
– Protocols that manage and access groups of resources
40. Globus as Service-Oriented Infrastructure
[Figure: user applications and tools reach specialized resources (computers, storage, databases) through uniform interfaces, security mechanisms, Web service transport, and monitoring; hosted services include Reliable File Transfer, MDS-Index, MyProxy, DAIS, GRAM, and GridFTP]
41. Introduction to ThaiGrid
• A national project under the Software Industry Promotion Agency (Public Organization), Ministry of Information and Communication Technology
• Started in 2005 with 14 member organizations
• Expanded to 22 organizations in 2008
42. Thai Grid Infrastructure
• 19 sites
• About 1,000 CPU cores
[Figure: national network map; inter-site links range from 155 Mbps and 310 Mbps up to 1 Gbps and 2.5 Gbps]
43. ThaiGrid Usage
• ThaiGrid provides about 290 years of computing time for members
– 9 years on the grid
– 280 years on TERA
• 41 projects from 8 areas are being supported on the teraflop machine
• More small projects on individual machines
44. Medicinal Herb Research
• Partner
– Cheminformatics Center, Kasetsart University (Chak Sangma and team)
• Objective
– Using a 3D molecular database and virtual screening to verify traditional medicinal herbs
• Benefit
– Scientific proof of the ancient traditional drugs
– Benefits poor people who still rely on drugs from medicinal herbs
– Potential benefit for the local pharmaceutical industry
[Figure: workflow from virtual screening on the grid infrastructure to lab testing]
45. NanoGrid
• Objective
– A platform that supports computational nanoscience research
• Technology used
– Accelrys Materials Studio
– Cluster schedulers: Sun Grid Engine and Torque
[Figure: Materials Studio gateways (MS-Gateway) submitting jobs to ThaiGrid computing resources]
46. Challenges
• Size and scale
• Manageability
– Deployment
– Configuration
– Operation
• Software and hardware compatibility
47. Grid System Architecture
• Clusters
– Satellite sets
• 16 clusters delivered by ThaiGrid for initial members
• Composed of 5 nodes of IBM eServer xSeries 336
– Intel Xeon 2.8 GHz (dual processor)
– x86_64 architecture
– Memory: 4 GB (DDR2 SDRAM)
– Other sets
• Various types of servers and numbers of nodes
• Provided by member institutes of ThaiGrid
48. Grid as a Super Cluster
[Figure: a grid scheduler dispatching work over the research and education network (REN) to the head nodes (H) and compute nodes (C) of the member clusters]
49. Is grid still alive?
• Yes, grid is a useful technology for certain tasks
– BitTorrent as a massive file-exchange infrastructure
– The European Grid is using it to share LHC data
• Pitfalls of the grid
– The network is still not reliable and fast enough for long-term operation
– The multi-site, multi-authority concept makes it very complex for
• System management
• Security
• Users to really use the system
• The recent trend is to move to centralized clouds
50. What is Cloud Computing?
[Figure: the cloud, with providers such as Google, Salesforce, Amazon, Microsoft and Yahoo. Source: Wikipedia (cloud computing)]
51. Why Cloud Computing?
• The illusion of infinite computing resources available on demand, thereby eliminating the need for cloud computing users to plan far ahead for provisioning.
• The elimination of an up-front commitment by cloud users, thereby allowing companies to start small and increase hardware resources only when there is an increase in their needs.
• The ability to pay for use of computing resources on a short-term basis as needed (e.g., processors by the hour and storage by the day) and release them as needed, thereby rewarding conservation by letting machines and storage go when they are no longer useful.
Source: “Above the Clouds: A Berkeley View of Cloud Computing”, RAD Lab, UC Berkeley
52. [Figure. Source: “Above the Clouds: A Berkeley View of Cloud Computing”, RAD Lab, UC Berkeley]
53. Cloud Computing Explained
• SaaS (Software as a Service): applications delivered over the Internet as services (e.g., Gmail)
• The cloud is the massive server and network infrastructure that serves SaaS to large numbers of users
• The service being sold is called utility computing
Source: “Above the Clouds: A Berkeley View of Cloud Computing”, RAD Lab, UC Berkeley
54. Enabling Technologies for Cloud Computing
• Cluster and grid technology
– The ability to build highly scalable computing systems of 100,000-1,000,000 nodes
• Service-oriented architecture
– Everything is a service
– Easy to build, distribute, and integrate into large-scale applications
• Web 2.0
– Powerful and flexible user interfaces for the Internet-enabled world
57. Architecture of Service-Oriented Cloud Computing Systems (SOCCS)
• An SOCCS can be constructed by combining CCR/DSS components to form a scalable service for a client application.
• The Cloud Service Management (CSM) acts as a resource management system that keeps track of the availability of services on the cloud.
[Figure: a user interface and cloud application on top of DSS/CCR services and the CSM, running over operating systems and node hardware joined by an interconnection network]
58. Cloud System Configuration
[Figure: a cloud user interface (Excel) talks to the cloud application and the Cloud Service Management (CSM); services run on OS/hardware nodes joined by an interconnection network]
59. A Proof-of-Concept Application
• The Pickup and Delivery Problem with Time Windows (PDPTW) is the problem of serving a number of transportation requests with a limited number of vehicles.
• The objective is to minimize the sum of the distances traveled by the vehicles and the sum of the time spent by each vehicle.
60. PDPTW on the cloud using SOCCS
• The master/worker model is adopted as the framework for service interaction.
• The algorithm is partitioned using a domain decomposition approach.
• The cloud application controls the decomposition of the problem, sending each subproblem to a worker service and collecting the results back into the best answer. (An illustrative sketch of the pattern follows.)
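The deck's prototype is built from CCR/DSS services; purely to illustrate the master/worker decomposition it describes, here is a C/MPI sketch with a hypothetical solve_subproblem stand-in, in which every worker handles a share of the subproblems and the master keeps the best answer:

#include <stdio.h>
#include <mpi.h>

/* Hypothetical stand-in for solving one PDPTW subproblem:
   returns the cost of the best route found for that subdomain. */
static double solve_subproblem(int id) { return 1000.0 / (id + 1); }

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int nsub = 64;      /* number of subproblems after decomposition */
    double local_best = 1e30;

    /* Static split of the master/worker pattern: each rank takes
       every size-th subproblem and remembers its lowest cost. */
    for (int id = rank; id < nsub; id += size) {
        double cost = solve_subproblem(id);
        if (cost < local_best) local_best = cost;
    }

    /* The master (rank 0) collects the global best answer. */
    double best;
    MPI_Reduce(&local_best, &best, 1, MPI_DOUBLE, MPI_MIN, 0, MPI_COMM_WORLD);
    if (rank == 0) printf("best cost = %f\n", best);

    MPI_Finalize();
    return 0;
}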
63. We are living in the world of Data
[Figure: data sources including video surveillance, social media, mobile sensors, gene sequencing, smart grids, geophysical exploration, and medical imaging]
64. Big Data
“Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn’t fit the strictures of your database architectures. To gain value from this data, you must choose an alternative way to process it.”
Reference: “What is big data? An introduction to the big data landscape.”, Edd Dumbill, http://radar.oreilly.com/2012/01/what-is-big-data.html
65. The Value of Big Data
• Analytical use
– Big data analytics can reveal insights previously hidden by data too costly to process.
• e.g., peer influence among customers, revealed by analyzing shoppers’ transactions and social and geographical data.
– Being able to process every item of data in reasonable time removes the troublesome need for sampling and promotes an investigative approach to data.
• Enabling new products
– Facebook has been able to craft a highly personalized user experience and create a new kind of advertising business.
67. Big Data Challenges
• Volume
– How to process data so big that it cannot be moved or stored.
• Velocity
– A lot of data arriving so fast that it cannot be stored, such as web usage logs and Internet and mobile messages. Stream processing is needed to filter unused data or extract knowledge in real time.
• Variety
– So many types of unstructured data formats make conventional databases useless.
68. How to deal with big data
• Integration of
– Storage
– Processing
– Analysis algorithms
– Visualization
[Figure: a massive data stream passes through stream processing into storage, processing and analysis, and finally visualization]
69. A New Approach For Distributed Big Data
From storage islands (L.A., Boston, London as separate systems) to a single storage pool spanning all locations:
• Disparate systems → a single system across locations
• Manual administration → automated policies
• One tenant, many systems → many tenants, one system
• IT-provisioned storage → self-service access
70. Hadoop
• Hadoop is a platform for distributing computing problems across a number of servers. First developed and released as open source by Yahoo.
– Implements the MapReduce approach pioneered by Google in compiling its search indexes.
– A dataset is distributed among multiple servers and operated on in place (the “map” stage); the partial results are then recombined (the “reduce” stage).
• Hadoop utilizes its own distributed filesystem, HDFS, which makes data available to multiple computing nodes.
• The Hadoop usage pattern involves three stages:
– loading data into HDFS,
– MapReduce operations, and
– retrieving results from HDFS. (A minimal map-stage sketch follows.)
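To make the map stage concrete (my sketch, not from the deck): Hadoop Streaming runs any executable that reads records on stdin and emits key<TAB>value pairs on stdout, so even a small C program can serve as a word-count mapper; a matching reducer would sum the counts per word:

#include <stdio.h>
#include <ctype.h>

/* Word-count mapper for Hadoop Streaming: reads text from stdin
   and emits one "word<TAB>1" line per word on stdout. */
int main(void) {
    int c, in_word = 0;
    while ((c = getchar()) != EOF) {
        if (isalnum(c)) {
            putchar(tolower(c));   /* normalize case */
            in_word = 1;
        } else if (in_word) {
            printf("\t1\n");       /* end of a word: emit the pair */
            in_word = 0;
        }
    }
    if (in_word) printf("\t1\n");
    return 0;
}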
71. What Facebook Knows
• Cameron Marlow calls himself Facebook's "in-house sociologist." He and his team can analyze essentially all the information the site gathers.
http://www.facebook.com/data
72. The Links of Love
• Often young women specify that they are “in a relationship” with their “best friend forever”.
– Roughly 20% of all relationships for the 15-and-under crowd are between girls.
– This number dips to 15% for 18-year-olds and is just 7% for 25-year-olds.
• Among anonymous US users who were over 18 at the start of the relationship:
– The average of the shortest number of steps to get from any one U.S. user to any other individual is 16.7.
– This is much higher than the 4.74 steps you’d need to go from any Facebook user to another through friendship, as opposed to romantic, ties.
[Graph showing the relationships of anonymous US users who were over 18 at the start of the relationship.]
http://www.facebook.com/notes/facebook-data-team/the-links-of-love/10150572088343859
73. Why?
• Facebook can improve the user experience
– Make useful predictions about users’ behavior
– Make better guesses about which ads you might be more or less open to at any given time
• Right before Valentine’s Day this year, a blog post from the Data Science Team listed the songs most popular with people who had recently signaled on Facebook that they had entered or left a relationship.
74. Data Tsunami
• The data flood is coming; nowhere to run now!
– Data is being generated anytime, anywhere, by anyone
– Data is moving in fast
– Data is too big to move, too big to store
• Better be prepared
– Use this to enhance your business and offer better services to customers
75. The Opportunities and Challenges of Exascale Computing
• A summary of findings from many workshops in the US
• Lists the issues that need to be overcome
• We will present only some of the challenges
77. Power Challenge
• Power consumption of the computers is the largest hardware research challenge.
• Today, power costs for the largest petaflop systems are in the range of $5-10M annually.
• For an exascale system built with current technology:
– The annual power cost to operate the system would be above $2.5B per year.
– The power load would be over a gigawatt.
• The target of 20 megawatts, identified in the DOE Technology Roadmap, is primarily based on keeping the operational cost of the system in some kind of feasible range.
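To put the gigawatt figure in perspective (my arithmetic, not the deck's): a sustained 1 GW load for a year is 1 GW × 8,760 h = 8,760 GWh, so even at roughly $0.10 per kWh the electricity alone comes to about $0.9B; the deck's higher $2.5B estimate presumably also folds in cooling, facility and infrastructure overheads.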
80. System Resiliency Challenge
• For exascale systems, the number of system components will be increasing faster than component reliability, with mean time between failures projected in the minutes or seconds.
• Exascale systems will experience various kinds of faults many times per day.
– Systems running 100 million cores will continually see core failures, and the tools for dealing with them will have to be rethought.
82. The Computer Science Challenges
• A programming model effort is a critical component
– Clock speeds will be flat or even dropping to save energy, so all performance improvements within a chip will come from increased parallelism, while the amount of memory per arithmetic unit shrinks.
– There is a need for fine-grained parallelism and a programming model other than message passing or coarse-grained threads.
83. Under the Radar
• Mobile processors running supercomputers
• The hybrid war: GPU vs. MIC
• I/O goes solid state
• The programming standards war
– CUDA / OpenCL / OpenMP / OpenACC
84. Summary
• We are living in a challenging world
• Demand for HPC systems and applications will increase
– Software tools, technology and hardware are changing to catch up
• The greatest challenge is how to quickly develop software for the next generation of computing systems
Speaker notes:
• CUDA is an architecture with a number of entry points. Today, developers program in C for CUDA using NVIDIA compilers; programming-language support for Fortran and other languages is coming soon. CUDA also supports emerging API programming standards such as OpenCL. Because the OpenCL and CUDA constructs for parallelism are so similar, applications written in C can easily be ported to OpenCL if desired. OpenCL applications sit on top of the CUDA architecture.
• Not just WSDLs on things, but common abstractions that apply across many resources and services. (A work in progress.)
• The sources of information are expanding, and many new sources are machine generated. It is also big files (seismic scans can be 5 TB per file) and massive numbers of small files (email, social media). Leading companies have for decades sought to leverage new sources of data, and the insights that can be gleaned from those sources, as new sources of competitive advantage: more detailed structured data, new unstructured data, device-generated data. But big data isn't only about data; a comprehensive big data strategy also needs to consider the role and prominence of new enabling technologies such as scale-out storage, MPP database architectures, Hadoop and the Hadoop ecosystem, in-database analytics, in-memory computing, data virtualization, and data visualization.
• Content and service providers, as well as global organizations that need to distribute large content files, are challenged with managing and ensuring the performance of these distributed systems. Thus a new approach, using a single storage pool in the cloud that provides policies for content placement, multi-tenancy and self service, can be beneficial to their business.