Telco Cloud and Network (NFV and SDN)
2
Global Mobile Traffic (Compound Annual Growth Rate)
3
Driving Forces
4
5
5G Use Cases
• Massive Machine-Type Communication (mMTC)
• Mission-Critical Machine-Type Communication (MC-MTC)
Diverse Requirements of IoT
6
One Network – Multiple Industries
7
8
Telco Cloud Networks
• SDN/NFV enable programmability, and the cloud enables virtualization of network resources
• High level of flexibility and programmability in individual domains (mobile core, radio access network, and transport network)
• Cross-domain programmability and orchestration
9
Telco Network
10
• Change the business model of Telecom
• Biggest technological revolution
• Innovate more quickly
• Change the way networks are deployed
• Change the way consumers enable services on the fly
Why are Cloud Networks required in Telco?
11
Cloud Network?
12
What is a Telco Cloud?
• A Telco Grade (aka Carrier Grade) Cloud is a cloud that can support telco-grade applications
• Telco Grade requirements:
  • High availability
  • High performance (large number of transactions, scalability)
  • Serviceability
  • Long lifetime
  • Security
  • Real-time behavior
  • Standards-compliant HW
13
Why Telco Clouds?
14
Problem of Telco Operators
Get Right Capacity at Right Time
15
Get Right Capacity at Right Time
16
Get Right Capacity at Right Time
17
Why Telco Cloud?
18
Problem of Telco Operators
How Telco Cloud Helps?
19
Solution!
20
Migration (Traditional to Cloud)
21
NFV & SDN
22
NFV & SDN
23
NFV Initiative
24
Introduction to NFV
25
Introduction to NFV
26
Introduction to NFV
27
Traditional Network Functions (NFs)
• Proprietary devices/boxes for different NFs
• Network services rely on different types of appliances
• Introducing new services into today's networks is becoming increasingly difficult due to:
  • Proprietary nature of appliances
  • Diverse and purpose-built hardware
  • Cost (increased CapEx & OpEx)
  • Short life cycle of appliances
  • Lack of space
  • Energy for middle-boxes
  • Lack of skilled professionals to integrate services
• Recently, NFV has been proposed to alleviate these problems
Introduction – Network Function Virtualization
• NFV was proposed by an ETSI Industry Specification Group (ISG)
• Allows NFs (Network Functions) to be implemented in software (a minimal sketch follows below)
• Virtualizes NFs previously carried by proprietary HW
• Decouples NFs from the underlying appliances
• NFs can run on commodity hardware (i.e., servers, storage, switches)
• Accelerates deployment of new services and NFs
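To make "NFs in software" concrete, here is a purely illustrative sketch of a packet-filtering function written in plain Python; the packet fields and the blocked-port policy are invented for the example and stand in for what a real VNF would receive from a virtual switch.

```python
# Toy "virtual network function": a stateless firewall implemented entirely in
# software, as a stand-in for the fixed-function appliance it would replace.
from typing import NamedTuple

class Packet(NamedTuple):
    src_ip: str
    dst_ip: str
    dst_port: int

# Hypothetical policy: drop traffic to these ports, forward everything else.
BLOCKED_PORTS = {23, 445}

def firewall_nf(pkt: Packet) -> bool:
    """Return True if the packet should be forwarded, False if dropped."""
    return pkt.dst_port not in BLOCKED_PORTS

if __name__ == "__main__":
    traffic = [Packet("10.0.0.1", "10.0.0.2", 80),
               Packet("10.0.0.1", "10.0.0.2", 23)]
    for pkt in traffic:
        print(pkt, "FORWARD" if firewall_nf(pkt) else "DROP")
```

The point is only that the function's behavior lives in software and can be redeployed on any commodity server, rather than being baked into a proprietary box.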
Network Function Virtualization
30
Why We Need NFV?
• Virtualization: Use network resources without worrying about where they are physically located, how much there is, how they are organized, etc.
• Orchestration: Manage thousands of devices
• Programmability: Be able to change behavior on the fly
• Dynamic Scaling: Be able to change size and quantity (a scaling sketch follows below)
• Visibility: Monitor resources and connectivity
• Performance: Optimize network device utilization
• Multi-tenancy
• Service Integration
• Openness: Full choice of modular plug-ins
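As a hedged illustration of the "Dynamic Scaling" point above, the sketch below shows a toy scale-out/scale-in decision based on per-instance load. The thresholds, limits, and function name are assumptions; a real NFV orchestrator would apply such a policy through its VNF manager rather than a bare function.

```python
# Toy autoscaling policy for a VNF: decide how many instances to run from the
# observed per-instance load. All thresholds here are illustrative only.
def desired_instances(current: int, load_per_instance: float,
                      scale_out_at: float = 0.8, scale_in_at: float = 0.3,
                      min_inst: int = 1, max_inst: int = 10) -> int:
    if load_per_instance > scale_out_at:
        return min(current + 1, max_inst)      # scale out by one instance
    if load_per_instance < scale_in_at and current > min_inst:
        return current - 1                     # scale in by one instance
    return current                             # load is in the comfort zone

# Example: 3 instances, each ~90% utilized -> policy suggests 4 instances.
print(desired_instances(current=3, load_per_instance=0.9))
```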
NFV Architecture
32
NFV Architecture
33
NFV Architecture
34
NFV Architecture
35
NFV Architecture
36
Why SDN?
37
NFV vs SDN
38
What is SDN?
39
Traditional Network Devices
• Typical networking software:
  • Management Plane
  • Control Plane – the brain/decision maker
  • Data Plane – the packet forwarder
Traditional Network Devices
41
What is SDN?
What is SDN?
43
Basic concepts of SDN
• Separate Control plane and Data plane entities
• Network intelligence and state are logically centralized
• The underlying network infrastructure is abstracted from the applications
• Execute or run Control plane software on general-purpose hardware
• Decouple from specific networking hardware
• Use commodity servers and switches
• Have programmable data planes
• Maintain, control, and program data plane state from a central entity
• An architecture to control not just a networking device but an entire network
SDN Framework
45
SDN Architecture
• Application Layer:
  • Focuses on network services
  • SW apps communicating with the control layer
• Control-Plane Layer:
  • The core of SDN
  • Consists of a centralized controller
  • Logically maintains a global and dynamic network view
  • Takes requests from the application layer
  • Manages network devices via standard protocols
• Data-Plane Layer:
  • Programmable devices
  • Support standard interfaces
Data Plane – Forwarding Devices
Southbound Interface – OpenFlow
• Forwarding elements are controlled by an open interface
• OpenFlow has strong support from industry, research, and academia
• OpenFlow standardizes the information exchange between the two planes
• Provides controller–switch interactions
• OpenFlow 1.3.0 provides secure communication using TLS
• Switches communicate with the controller via the OpenFlow protocol (a minimal controller sketch follows below)
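As a concrete (but framework-specific) example of controller–switch interaction over OpenFlow, the sketch below uses the Ryu controller framework, which is not mentioned on the slide and is only one possible choice. It installs a table-miss flow entry on each OpenFlow 1.3 switch that connects, so unmatched packets are sent to the controller.

```python
# Minimal Ryu app: on switch connect, install a table-miss rule that sends
# unmatched packets to the controller over the OpenFlow channel.
from ryu.base import app_manager
from ryu.controller import ofp_event
from ryu.controller.handler import CONFIG_DISPATCHER, set_ev_cls
from ryu.ofproto import ofproto_v1_3

class TableMissApp(app_manager.RyuApp):
    OFP_VERSIONS = [ofproto_v1_3.OFP_VERSION]

    @set_ev_cls(ofp_event.EventOFPSwitchFeatures, CONFIG_DISPATCHER)
    def switch_features_handler(self, ev):
        dp = ev.msg.datapath
        ofp = dp.ofproto
        parser = dp.ofproto_parser
        match = parser.OFPMatch()  # match everything
        actions = [parser.OFPActionOutput(ofp.OFPP_CONTROLLER,
                                          ofp.OFPCML_NO_BUFFER)]
        inst = [parser.OFPInstructionActions(ofp.OFPIT_APPLY_ACTIONS, actions)]
        # Priority 0: this rule only fires when no other flow entry matches.
        dp.send_msg(parser.OFPFlowMod(datapath=dp, priority=0,
                                      match=match, instructions=inst))
```

Assuming Ryu is installed, this would be started with `ryu-manager table_miss.py` and the switches pointed at the controller's address.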
Control Plane – SDN Controller
• The controller provides a programmatic interface to the network
• Gives a logically centralized global view of the network
• Simplifies policy enforcement and management
• Some concerns:
  • Control scalability
  • Centralized vs. Distributed
  • Reactive vs. Proactive policies
Control Plane – SDN Controller (Cont.)
Northbound Interfaces & SDN Applications
• Northbound Interface:
  • A communication interface between the Control plane and Applications
  • There is currently no accepted standard for northbound interfaces
  • Implemented on an ad hoc basis for particular applications
• SDN Applications:
  • Traffic Engineering
  • Security
  • QoS
  • Routing
  • Switching
  • Virtualization
  • Monitoring
  • Load Balancing
  • New innovations???
NFV vs SDN
52
SDN & NFV Relationship
• The concept of NFV originated from SDN
• NFV and SDN are complementary
• One does not depend upon the other
• You can do SDN only, NFV only, or SDN and NFV together
• Both have similar goals, but the approaches are very different
• SDN needs new interfaces, control modules, and applications
• NFV requires moving network applications from dedicated hardware to virtual containers on commercial-off-the-shelf (COTS) hardware
NFV vs. SDN
• NFV can serve SDN by:
  • Virtualizing the SDN controller
  • Implementing NFs in software
  • Reducing CapEx, OpEx, space, and power consumption
  • Decoupling network functions from proprietary hardware to achieve agile provisioning and deployment
• SDN serves NFV by:
  • Providing programmable connectivity between VNFs to optimize traffic engineering & steering
  • Providing central control and a programmable architecture for better connectivity
  • Providing network abstractions to enable flexible network control, configuration, and innovation
  • Decoupling the control plane from data-plane forwarding to provide a centralized controller
Software-Defined NFV Architecture
Software-Defined NFV – Service Chaining
SDN, Cloud and NFV
• NFV – Virtualize
• Cloud – Scale
• SDN – Control
• Management & Orchestration – cross-domain control, orchestration & management
Introduction to ONOS
• ONOS (Open Network Operating System) is an open source SDN OS
• Developed in concert with leading SPs, vendors, R&E network operators, and collaborators
• Specifically targeted at service providers and mission-critical networks
• ONOS main goals:
  • Liberate network application developers from knowing the details of proprietary hardware
  • Free them from the operational complexities of proprietary interfaces and protocols
  • Re-enable innovation for both network hardware and software
Why ONOS??
• Several open source controllers already exist (NOX, Beacon, SNAC, POX, etc.)
• ONOS will:
  • Bring carrier-grade qualities (scale, availability, and performance) to the SDN control plane
  • Enable web-style agility
  • Help SPs migrate their existing networks
  • Lower SP CapEx & OpEx
ONOS Vision for SP Networks
• Enabling SP SDN adoption for carrier-grade service and network innovation
Key Elements of ONOS
• Modular, Scalable, Resilient, with Abstractions
ONOS Architecture
Defining Features of ONOS
• Distributed Core:
  • Provides scalability, high availability, and performance
  • Brings carrier-grade features
  • Running as a cluster is one way ONOS brings web-style agility
• Northbound abstraction/APIs:
  • Include the network graph and application intents to ease development of control, management, and configuration services (a REST-based sketch follows below)
• Southbound abstraction/APIs:
  • Enable pluggable southbound protocols for controlling both OpenFlow and legacy devices
  • A key enabler for migration from legacy devices to OpenFlow-based white boxes
• Software Modularity:
  • Easy to develop, debug, maintain, and upgrade ONOS as a software system
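A minimal sketch of talking to the ONOS northbound from an application via its REST API, using the `requests` library. The controller address, port 8181, and the default onos/rocks credentials are deployment-specific assumptions; intent-based programming would use further endpoints under the same `/onos/v1` root.

```python
# Query the controller's global view of data-plane devices over the
# northbound REST API (assumes a reachable ONOS instance).
import requests

ONOS_URL = "http://127.0.0.1:8181/onos/v1"   # assumed local ONOS instance
AUTH = ("onos", "rocks")                     # assumed default credentials

def list_devices():
    """Return the devices ONOS currently knows about."""
    resp = requests.get(f"{ONOS_URL}/devices", auth=AUTH, timeout=5)
    resp.raise_for_status()
    return resp.json().get("devices", [])

if __name__ == "__main__":
    for dev in list_devices():
        print(dev.get("id"), "available:", dev.get("available"))
```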
Distributed ONOS Architecture
Software Modularity
• ONOS software is easy to enhance, change, and maintain
• Great care has gone into modularity to make it easy for developers
• At the macro level, the Northbound and Southbound APIs provide an initial basis for insulating Applications, Core, and Adapters from each other
• New applications or new protocol adapters can be added as needed without each needing to know about the other
• Relies heavily on interfaces to serve as contracts for interactions between different parts of the core
ONOS Initial SPs Use Cases
SONA (Simplified Overlay Network Architecture)
• A set of ONOS applications
• Provides an OpenStack Neutron ML2 mechanism driver and L3 service plugin
• Optimized tenant network virtualization service
• Provisions isolated virtual tenant networks using VXLAN-based L2 tunneling or GRE/GENEVE-based L3 tunneling with Open vSwitch (OVS)
• Horizontal scalability of the gateway node
67
SONA (Simplified Overlay Network Architecture)
68
Example of SONA
69
Introduction to High Performance Computing (HPC) & High Throughput Computing (HTC)
71
Anatomy of a Computer
72
Multicore
• Cores share a path to memory
• SIMD instructions + multicore make this an increasing bottleneck!
Performance
• The performance (time to solution) on a single computer can depend on:
  • Clock speed – how fast the processor is
  • Floating point unit – how many operands can be operated on, and what operations can be performed?
  • Memory latency – what is the delay in accessing data?
  • Memory bandwidth – how fast can we stream data from memory?
  • I/O to storage – how quickly can we access files?
• For parallel computing you can also be limited by the performance of the interconnect
73
Performance (Cont.)
• Application performance is often described as:
  • Compute bound
  • Memory bound
  • I/O bound
  • Communication bound
• For computational science:
  • Most calculations are limited by memory bandwidth (see the sketch below)
  • Processors are faster than memory access
74
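A rough, machine-dependent sketch of the "memory bound vs. compute bound" distinction using NumPy (an assumed dependency): an elementwise add touches three arrays per flop and is limited by memory bandwidth, while a matrix multiply reuses the same data many times and is limited by the floating point units.

```python
import time
import numpy as np

n = 2000
a = np.random.rand(n, n)
b = np.random.rand(n, n)

t0 = time.perf_counter()
c = a + b                 # ~n^2 flops, ~3*n^2 memory accesses: memory bound
t1 = time.perf_counter()
d = a @ b                 # ~2*n^3 flops on the same data: compute bound
t2 = time.perf_counter()

# Achieved flop rates; exact numbers depend entirely on the machine.
print(f"elementwise add: {n * n / (t1 - t0) / 1e9:6.2f} GFLOP/s")
print(f"matrix multiply: {2 * n**3 / (t2 - t1) / 1e9:6.2f} GFLOP/s")
```

On most machines the add achieves only a small fraction of peak while the multiply gets much closer, which is the sense in which processors are faster than memory access.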
75
HPC Architectures
• Symmetric Multi-Processing: all cores have the same access to memory, e.g. a multicore laptop
76
HPC Architectures (Cont.)
Distributed Memory Architecture
77
HPC Architectures (Cont.)
• In a real system:
  • Each node will be a shared-memory system
    • E.g. a multicore processor
  • The network will have some specific topology
    • E.g. a regular grid
Distributed/Shared Memory Hybrids (a distributed-memory programming sketch follows below)
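A minimal distributed-memory sketch using MPI via the mpi4py bindings (an assumption; the slides do not prescribe a programming model). Each process owns its own memory, and data moves only through explicit messages such as this reduction.

```python
# Each MPI process holds its own data; a reduction gathers a combined result.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()      # this process's id
size = comm.Get_size()      # total number of processes

local = rank + 1                                  # each rank's own piece of data
total = comm.reduce(local, op=MPI.SUM, root=0)    # combine across processes

if rank == 0:
    print(f"{size} ranks, sum of (rank+1) = {total}")
```

Assuming an MPI installation, this would run as e.g. `mpirun -n 4 python reduce_demo.py`, with one process per core or node.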
The Flood of Data
78
Why HPC?
Why HPC? (Cont.)
79
Why HPC? (Cont.)
• Big data and scientific simulations need greater computer power
• Single-core processors cannot be made with enough resources for the simulations needed:
  • Making processors with faster clock speeds is difficult due to cost and power constraints
  • It is expensive to put huge memory on a single processor
• Solution: parallel computing – divide the work among numerous linked systems
80
Generic Parallel Machines
81
• A good conceptual model is a collection of multicore laptops connected together by a network
  • Each laptop is called a compute node
  • Each has its own OS and network connection
• Suppose each node is quad-core
  • With five such nodes, the total system has 20 processor-cores
Parallel Computing?
• Parallel computing and HPC are intimately related
  • Higher performance requires more processor-cores
• Understanding the different parallel programming models allows you to use HPC resources efficiently
• It also allows you to better understand and critique work that uses HPC in your research area
82
What is HPC?
• Leveraging distributed compute resources to solve complex problems with large datasets
  • Terabytes to petabytes to zettabytes of data
  • Results in minutes to hours instead of days or weeks
83
Differences from Desktop Computing
• Do not log on to compute nodes directly
  • Submit jobs via batch scheduling systems
• Not a GUI-based environment
• Share the system with many users
• Resources are more tightly monitored and controlled:
  • Disk quotas
  • CPU usage
84
Typical HPC System Layout
85
86
Typical Software Usage Flow
HPC Applications?
87
High Throughput Computing (with HTCondor)
88
Serial Computing
• What many programs look like:
  • Serial execution, running on one processor (CPU core) at a time
  • Overall compute time grows significantly as individual tasks get more complicated (longer) or as the number of tasks increases
• How can you speed things up?
89
High Throughput Computing (HTC)?
• Parallelize!
• Independent tasks run on different cores (a minimal sketch follows below)
90
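A minimal single-machine sketch of the HTC pattern using Python's standard concurrent.futures module: the same independent task is applied to many inputs with no communication between tasks. The task body is a placeholder; on a real HTC system each input would become a separate batch job instead of a local process.

```python
# Run many independent tasks in parallel, one worker process per core.
from concurrent.futures import ProcessPoolExecutor
import math

def analyze(image_id: int) -> float:
    """Stand-in for one independent task (e.g., processing one image)."""
    return sum(math.sqrt(i) for i in range(200_000 + image_id))

if __name__ == "__main__":
    inputs = range(48)                       # 48 independent tasks
    with ProcessPoolExecutor() as pool:      # defaults to one worker per core
        results = list(pool.map(analyze, inputs))
    print(f"finished {len(results)} tasks")
```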
High Performance Computing (HPC)?
91
High Performance Computing (HPC)?
• Benefits greatly from:
  • CPU speed + homogeneity
  • Shared filesystems
  • Fast, expensive networking (e.g. InfiniBand) and co-located servers
• Scheduling: must wait until all processors are available, at the same time and for the full duration
• Requires special programming (MP/MPI)
• What happens if one core or server fails or runs slower than the others?
92
High Throughput Computing (HTC)?
• Scheduling: only need 1 CPU core for each task (shorter wait)
• Easier recovery from failure
• No special programming required
• The number of concurrently running jobs is more important
• CPU speed and homogeneity are less important
93
High Throughput vs High Performance
• HTC
  • Focus: large workflows of numerous, relatively small, independent compute tasks
  • More important: maximizing the number of running tasks
  • Less important: CPU speed, homogeneity
• HPC
  • Focus: large workflows of highly interdependent sub-tasks
  • More important: persistent access to the fastest cores, CPU homogeneity, special coding, shared filesystems, fast networks
94
Example..
• You need to process 48 brain images for each of 168 patients. Each image takes ~1 hour of compute time.
• 168 patients × 48 images = ~8,000 tasks = ~8,000 hrs
95
Distributed Computing
• Use many computers, each running one instance of our program (an HTCondor-style submission sketch follows below)
• Example:
  • 1 laptop (1 core) => 4,000 hours = ~½ year
  • 1 server (~20 cores) => 500 hours = ~3 weeks
  • 1 large job (400 cores) => 20 hours = ~1 day
  • A whole cluster (8,000 cores) => ~8 hours
96
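A hedged sketch of queuing the brain-image workload through the HTCondor Python bindings. The executable name, resource requests, and directory layout are assumptions, and the exact submit API varies between binding versions; the idea is simply one independent job per (patient, image) pair.

```python
# Submit one HTCondor job per (patient, image) pair (assumes a reachable schedd
# and a user-provided process_image.sh; both are hypothetical here).
import htcondor

sub = htcondor.Submit({
    "executable": "process_image.sh",           # hypothetical per-image script
    "arguments": "$(patient) $(image)",
    "output": "logs/$(patient)_$(image).out",
    "error":  "logs/$(patient)_$(image).err",
    "log":    "logs/images.log",
    "request_cpus": "1",
    "request_memory": "2GB",
})

# One job per (patient, image) pair: 168 x 48 = 8064 independent jobs.
itemdata = [{"patient": str(p), "image": str(i)}
            for p in range(168) for i in range(48)]

schedd = htcondor.Schedd()
result = schedd.submit(sub, itemdata=iter(itemdata))
print("submitted cluster", result.cluster())
```

Because each job needs only one core and no communication, the scheduler can start them as cores free up anywhere in the pool, which is exactly the HTC trade-off described above.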
Break Up to Scale Up
• Computing tasks that are easy to break up are easy to scale up
• To truly grow your computing capabilities, you also need a system appropriate for your computing task!
97
What computing resources are available?
• A single computer?
• A local cluster?
  • Consider: what kind of cluster is it? Clusters tuned for HPC (large MPI) jobs may not be best for HTC workflows!
• Do you need even more than that?
  • Open Science Grid (OSG)
  • Others:
    • European Grid Infrastructure
    • Other national and regional grids
    • Commercial cloud systems (e.g. HTCondor on Amazon)
98
Example Local Cluster
• UW-Madison's Center for High Throughput Computing (CHTC)
• Recent CPU hours:
  • ~130 million hrs/year (~15k cores)
  • ~10,000 hrs per user, per day (~400 cores in use)
99
Open Science Grid (OSG)
• HTC for everyone
• ~100 contributors
• Past year:
  • >420 million jobs
  • >1.5 billion CPU hours
  • >200 petabytes transferred
• Jobs can be submitted locally and backfill across the country; they may be interrupted at any time (but not too frequently)
• http://www.opensciencegrid.org/
100
