08/25/2009 CS4961
CS4961 Parallel Programming
Lecture 1: Introduction
Mary Hall
August 25, 2009
Course Details
• Time and Location: TuTh, 9:10-10:30 AM, WEB L112
• Course Website
- http://www.eng.utah.edu/~cs4961/
• Instructor: Mary Hall, mhall@cs.utah.edu,
http://www.cs.utah.edu/~mhall/
- Office Hours: Tu 10:45-11:15 AM; Wed 11:00-11:30 AM
• TA: Sriram Aananthakrishnan, sriram@cs.utah.edu
- Office Hours: TBD
• Textbook
- “Principles of Parallel Programming,”
Calvin Lin and Lawrence Snyder.
- Also, readings and notes provided for
MPI, CUDA, Locality and Parallel Algs.
Today’s Lecture
• Overview of course (done)
• Important problems require powerful
computers …
- … and powerful computers must be parallel.
- Increasing importance of educating parallel
programmers (you!)
• What sorts of architectures in this class
- Multimedia extensions, multi-cores, GPUs,
networked clusters
• Developing high-performance parallel
applications
- An optimization perspective
Outline
• Logistics
• Introduction
• Technology Drivers for Multi-Core Paradigm Shift
• Origins of Parallel Programming: Large-scale
scientific simulations
• The fastest computer in the world today
• Why writing fast parallel programs is hard
Some material for this lecture drawn from:
Kathy Yelick and Jim Demmel, UC Berkeley
Quentin Stout, University of Michigan,
(see http://www.eecs.umich.edu/~qstout/parallel.html)
Top 500 list (http://www.top500.org)
Course Objectives
• Learn how to program parallel processors and
systems
- Learn how to think in parallel and write correct
parallel programs
- Achieve performance and scalability through
understanding of architecture and software mapping
• Significant hands-on programming experience
- Develop real applications on real hardware
• Discuss the current parallel computing context
- What are the drivers that make this course timely
- Contemporary programming models and
architectures, and where is the field going
Parallel and Distributed Computing
• Parallel computing (processing):
- the use of two or more processors (computers), usually
within a single system, working simultaneously to solve a
single problem.
• Distributed computing (processing):
- any computing that involves multiple computers remote
from each other that each have a role in a computation
problem or information processing.
• Parallel programming:
- the human process of developing programs that express
what computations should be executed in parallel.
Why is Parallel Programming Important Now?
• All computers are now parallel computers
(embedded, commodity, supercomputer)
- On-chip architectures look like parallel computers
- Languages, software development and compilation strategies
originally developed for high end (supercomputers) are now
becoming important for many other domains
• Why?
- Technology trends
• Looking to the future
- Parallel computing for the masses demands better parallel
programming paradigms
- And more people who are trained in writing parallel
programs (possibly you!)
- How to put all these vast machine resources to the best use!
Detour: Technology as Driver for
“Multi-Core” Paradigm Shift
• Do you know why most computers sold today are
parallel computers?
• Let’s talk about the technology trends
Technology Trends: Microprocessor Capacity
[Figure: microprocessor capacity trends; clock speed flattening sharply, transistor count still rising. Slide source: Maurice Herlihy]
Moore’s Law:
Gordon Moore (co-founder of Intel) observed in 1965 that the number of transistors
on a chip was doubling roughly every year; in 1975 he revised the rate to every two
years. The often-quoted figure of 18 months is a later variant.
Technology Trends: Power Density Limits
Serial Performance
What to do with all these transistors?
The Multi-Core Paradigm Shift
• Key ideas:
- Movement away from increasingly complex processor design
and faster clocks
- Replicated functionality (i.e., parallel) is simpler to design
- Resources more efficiently utilized
- Huge power management advantages
All Computers are Parallel Computers.
Proof of Significance: Popular Press
• This week’s issue of
Newsweek!
• Article on 25 things “smart
people” should know
• See
http://www.newsweek.com/id/212142
Scientific Simulation:
The Third Pillar of Science
• Traditional scientific and engineering paradigm:
1) Do theory or paper design.
2) Perform experiments or build system.
• Limitations:
- Too difficult -- build large wind tunnels.
- Too expensive -- build a throw-away passenger jet.
- Too slow -- wait for climate or galactic evolution.
- Too dangerous -- weapons, drug design, climate
experimentation.
• Computational science paradigm:
3) Use high-performance computer systems to simulate
the phenomenon
- Based on known physical laws and efficient numerical
methods.
The quest for increasingly more powerful machines
• Scientific simulation will continue to push on system
requirements:
- To increase the precision of the result
- To get to an answer sooner (e.g., climate modeling, disaster
modeling)
• The U.S. will continue to acquire systems of
increasing scale
- For the above reasons
- And to maintain competitiveness
A Similar Phenomenon in Commodity Systems
• More capabilities in software
• Integration across software
• Faster response
• More realistic graphics
• …
The fastest computer in the world today
• What is its name?
- Roadrunner
• Where is it located?
- Los Alamos National Laboratory
• How many processors does it have?
- ~19,000 processor chips (~129,600 “processors”)
• What kind of processors?
- AMD Opterons and IBM Cell/BE (the Cell is also used in PlayStations)
• How fast is it?
- 1.105 Petaflop/second
- About one quadrillion (1 x 10^15) operations/s
See http://www.top500.org
Example: Global Climate Modeling Problem
• Problem is to compute:
f(latitude, longitude, elevation, time) →
temperature, pressure, humidity, wind velocity
• Approach:
- Discretize the domain, e.g., a measurement point every 10 km
- Devise an algorithm to predict weather at time t+dt given t
• Uses:
- Predict major events,
e.g., El Nino
- Use in setting air
emissions standards
Source: http://www.epm.ornl.gov/chammp/chammp.html
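To get a feel for the problem size implied by "a measurement point every 10 km", the short sketch below counts grid points. It is only a back-of-the-envelope estimate: the Earth's surface area is taken as roughly 5.1 x 10^8 km^2, and the number of vertical levels (`n_levels`) is a hypothetical parameter that does not come from the lecture.

```python
EARTH_SURFACE_KM2 = 5.1e8  # approximate surface area of the Earth in km^2


def grid_points(spacing_km: float, n_levels: int) -> int:
    """Estimate the number of grid points for a given horizontal spacing.

    n_levels is a hypothetical count of elevation levels, used only for illustration.
    """
    cells_per_level = EARTH_SURFACE_KM2 / spacing_km ** 2
    return round(cells_per_level * n_levels)


if __name__ == "__main__":
    for spacing in (100, 10, 5):
        points = grid_points(spacing, n_levels=20)  # 20 levels is an assumed value
        print(f"spacing {spacing:>3} km: about {points:,} grid points")
```

Halving the spacing quadruples the number of points per level, which is one reason higher-resolution models push toward larger machines.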
[Figure: High Resolution Climate Modeling on NERSC-3 – P. Duffy, et al., LLNL]
Some Characteristics of Scientific Simulation
• Discretize physical or conceptual space into a grid
- Simpler if regular, may be more representative if adaptive
• Perform local computations on grid
- Given yesterday’s temperature and weather pattern, what is
today’s expected temperature?
• Communicate partial results between grids
- Contribute local weather result to understand global
weather pattern.
• Repeat for a set of time steps
• Possibly perform other calculations with results
- Given weather model, what area should evacuate for a
hurricane?
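The pattern above (discretize, compute locally, repeat over time steps) can be sketched with a one-dimensional grid. This is a minimal serial illustration, not a weather model: the update rule is a simple diffusion stencil chosen for the example, and the names `step`, `simulate`, and `alpha` are assumptions introduced here.

```python
def step(grid, alpha=0.25):
    """Advance one time step using only each point's immediate neighbors."""
    new = grid[:]  # the two boundary values are held fixed
    for i in range(1, len(grid) - 1):
        new[i] = grid[i] + alpha * (grid[i - 1] - 2.0 * grid[i] + grid[i + 1])
    return new


def simulate(grid, n_steps, alpha=0.25):
    """Repeat the local computation for a set of time steps."""
    for _ in range(n_steps):
        grid = step(grid, alpha)
    return grid


if __name__ == "__main__":
    temps = [0.0] * 4 + [100.0] + [0.0] * 4  # hypothetical initial values
    final = simulate(temps, n_steps=50)
    print([round(t, 2) for t in final])
```

Because each point depends only on its neighbors from the previous step, all interior points in one step can be updated independently, which is what makes this kind of computation a natural fit for parallel execution.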
Example of Discretizing a Domain
[Figure: a grid divided into blocks; one processor computes one block while another processor computes a neighboring block in parallel]
Processors in adjacent blocks in the grid communicate their results.
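A small sketch of this idea, under stated assumptions: a one-dimensional grid is split into two blocks, each owned by a simulated "processor", and the two exchange a single boundary value (a ghost cell) before every step. Both blocks are processed in one Python process here, so the code shows only the data movement, not real parallel execution. The stencil and the function names are assumptions made for the example.

```python
def serial_step(grid, alpha=0.25):
    """Reference update on the whole grid; the two end values are held fixed."""
    new = grid[:]
    for i in range(1, len(grid) - 1):
        new[i] = grid[i] + alpha * (grid[i - 1] - 2.0 * grid[i] + grid[i + 1])
    return new


def block_step(left, right, alpha=0.25):
    """One step on a grid split into two blocks owned by two simulated processors."""
    # Communication: each block receives one boundary value from its neighbor.
    ghost_for_left = right[0]
    ghost_for_right = left[-1]
    # Local computation: each block updates its own points using the ghost cell.
    new_left = serial_step(left + [ghost_for_left], alpha)[:-1]
    new_right = serial_step([ghost_for_right] + right, alpha)[1:]
    return new_left, new_right


if __name__ == "__main__":
    grid = [0.0, 0.0, 0.0, 100.0, 0.0, 0.0, 0.0, 0.0]  # hypothetical data
    left, right = grid[:4], grid[4:]
    for _ in range(10):
        grid = serial_step(grid)
        left, right = block_step(left, right)
    print("blocks match serial result:", left + right == grid)
```

Only one value per block crosses the boundary in each step, while every point in a block is updated; that ratio of communication to computation is why block decompositions tend to scale well.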
Parallel Programming Complexity
An Analogy to Preparing Thanksgiving Dinner
• Enough parallelism? (Amdahl’s Law)
- Suppose you want to just serve turkey
• Granularity
- How frequently must each assistant report to the chef
- After each stroke of a knife? Each step of a recipe? Each dish
completed?
• Locality
- Grab the spices one at a time? Or collect ones that are needed
prior to starting a dish?
• Load balance
- Each assistant gets a dish? Preparing stuffing vs. cooking green
beans?
• Coordination and Synchronization
- Person chopping onions for stuffing can also supply green beans
- Start pie after turkey is out of the oven
All of these things make parallel
programming even harder than sequential
programming.
Finding Enough Parallelism
• Suppose only part of an application seems parallel
• Amdahl’s law
- let s be the fraction of work done sequentially, so
(1-s) is fraction parallelizable
- P = number of processors
Speedup(P) = Time(1)/Time(P)
<= 1/(s + (1-s)/P)
<= 1/s
• Even if the parallel part speeds up perfectly
performance is limited by the sequential part
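The bound above is easy to evaluate numerically. The sketch below computes 1/(s + (1-s)/P) for a few processor counts; the function name `amdahl_speedup` and the sample value s = 0.1 are assumptions chosen for illustration.

```python
def amdahl_speedup(s: float, p: int) -> float:
    """Upper bound on speedup with sequential fraction s on p processors."""
    if not 0.0 <= s <= 1.0:
        raise ValueError("s must be in [0, 1]")
    if p < 1:
        raise ValueError("p must be >= 1")
    return 1.0 / (s + (1.0 - s) / p)


if __name__ == "__main__":
    s = 0.1  # hypothetical: 10% of the work is sequential
    for p in (1, 2, 8, 64, 1024):
        print(f"P={p:5d}  speedup <= {amdahl_speedup(s, p):6.2f}")
    print(f"limit as P grows: 1/s = {1.0 / s:.1f}")
```

With 10% sequential work, the bound stays below 10 no matter how many processors are added, which matches the 1/s limit.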
Overhead of Parallelism
• Given enough parallel work, this is the biggest barrier
to getting desired speedup
• Parallelism overheads include:
- cost of starting a thread or process
- cost of communicating shared data
- cost of synchronizing
- extra (redundant) computation
• Each of these can be in the range of milliseconds
(=millions of flops) on some systems
• Tradeoff: Algorithm needs sufficiently large units of
work to run fast in parallel (i.e., large granularity),
but not so large that there is not enough parallel
work
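The tradeoff can be illustrated with a toy cost model. It assumes that the work is cut into equal tasks, that each task pays a fixed overhead, and that tasks run in rounds of P at a time. The model, the function `parallel_time`, and all the numbers are assumptions for illustration, not measurements.

```python
import math


def parallel_time(total_work, task_size, overhead, n_procs):
    """Toy model: tasks run in rounds of n_procs; each task costs task_size + overhead."""
    n_tasks = math.ceil(total_work / task_size)
    rounds = math.ceil(n_tasks / n_procs)
    return rounds * (task_size + overhead)


if __name__ == "__main__":
    work, overhead, procs = 1_000_000, 1_000, 8  # hypothetical units
    for task_size in (100, 10_000, 125_000, 1_000_000):
        t = parallel_time(work, task_size, overhead, procs)
        print(f"task size {task_size:>9,}: time {t:>9,}  speedup {work / t:5.2f}")
```

In this model, very small tasks are dominated by overhead and run slower than the sequential program, while a single huge task leaves seven of the eight processors idle; the best speedup falls in between.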
Locality and Parallelism
• Large memories are slow, fast memories are small
• Program should do most work on local data
[Figure: conventional storage hierarchy (Proc, Cache, L2 Cache, L3 Cache, Memory), shown for one processor and replicated for several processors joined by potential interconnects]
Load Imbalance
• Load imbalance is the time that some processors in
the system are idle due to
- insufficient parallelism (during that phase)
- unequal size tasks
• Examples of the latter
- adapting to “interesting parts of a domain”
- tree-structured computations
- fundamentally unstructured problems
• Algorithm needs to balance load
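To make the effect of unequal task sizes concrete, the sketch below compares two ways of assigning tasks to processors: a fixed round-robin assignment, and a greedy rule that hands the longest remaining task to the least-loaded processor. Both are simple heuristics picked for the example; the task times and function names are hypothetical.

```python
import heapq


def makespan_round_robin(task_times, n_procs):
    """Assign task i to processor i mod n_procs; return the finish time of the slowest."""
    loads = [0.0] * n_procs
    for i, t in enumerate(task_times):
        loads[i % n_procs] += t
    return max(loads)


def makespan_greedy(task_times, n_procs):
    """Longest task first, each given to the currently least-loaded processor."""
    loads = [0.0] * n_procs
    heapq.heapify(loads)
    for t in sorted(task_times, reverse=True):
        lightest = heapq.heappop(loads)
        heapq.heappush(loads, lightest + t)
    return max(loads)


if __name__ == "__main__":
    tasks = [8, 1, 1, 1, 8, 1, 1, 1]  # hypothetical task times
    procs = 4
    print("ideal (perfect balance):", sum(tasks) / procs)
    print("round-robin finish time:", makespan_round_robin(tasks, procs))
    print("greedy finish time:     ", makespan_greedy(tasks, procs))
```

With round-robin, both long tasks land on the same processor and the other three sit idle for most of the run; the greedy rule spreads the long tasks out and finishes in about half the time.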
Some Popular Parallel Programming Models
• Pthreads (POSIX threads)
- Low level expression of threads, which are independent
computations that can execute in parallel
• MPI (Message Passing Interface)
- Most widely used on very high-end machines
- A library interface for common sequential languages
(e.g., C and Fortran) that expresses communication
between different processes along with parallelism
• Map-Reduce (popularized by Google)
- Map: apply the same computation to lots of different data
(usually in distributed files) and produce local results
- Reduce: compute global result from set of local results
• CUDA (Compute Unified Device Architecture)
- Proprietary programming model (extensions to C/C++) for
NVIDIA graphics processors
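As one illustration of these models, the Map-Reduce pattern can be sketched in a few lines. This is a sequential sketch of the pattern only, not Google's implementation or any particular framework: the word-count task, the in-memory "chunks", and the function names are assumptions made for the example.

```python
from collections import Counter
from functools import reduce


def map_phase(chunk: str) -> Counter:
    """Map: apply the same computation (count words) to one piece of the data."""
    return Counter(chunk.lower().split())


def reduce_phase(a: Counter, b: Counter) -> Counter:
    """Reduce: combine two local results into one."""
    return a + b


def word_count(chunks):
    # Each map call is independent of the others, so a real system could run them in parallel.
    local_results = map(map_phase, chunks)
    return reduce(reduce_phase, local_results, Counter())


if __name__ == "__main__":
    chunks = ["the quick brown fox", "the lazy dog", "The end"]  # hypothetical data
    print(word_count(chunks).most_common(3))
```

The map step needs no communication at all, and the reduce step needs only the small local results, which is why the pattern suits data spread across many machines.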
Summary of Lecture
• Solving the “Parallel Programming Problem”
- Key technical challenge facing today’s computing industry,
government agencies and scientists
• Scientific simulation discretizes some space into a grid
- Perform local computations on grid
- Communicate partial results between grids
- Repeat for a set of time steps
- Possibly perform other calculations with results
• Commodity parallel programming can draw from this history
and move forward in a new direction
• Writing fast parallel programs is difficult
- Amdahl’s Law Must parallelize most of computation
- Data Locality
- Communication and Synchronization
- Load Imbalance
Next Time
• An exploration of parallel algorithms and their
features
• First written homework assignment

CS4961-L1.ppt

  • 1. 08/25/2009 CS4961 CS4961 Parallel Programming Lecture 1: Introduction Mary Hall August 25, 2009 1
  • 2. Course Details • Time and Location: TuTh, 9:10-10:30 AM, WEB L112 • Course Website - http://www.eng.utah.edu/~cs4961/ • Instructor: Mary Hall, mhall@cs.utah.edu, http://www.cs.utah.edu/~mhall/ - Office Hours: Tu 10:45-11:15 AM; Wed 11:00-11:30 AM • TA: Sriram Aananthakrishnan, sriram@cs.utah.edu - Office Hours: TBD • Textbook - “Principles of Parallel Programming,” Calvin Lin and Lawrence Snyder. - Also, readings and notes provided for MPI, CUDA, Locality and Parallel Algs. 08/25/2009 CS4961 2
  • 3. Today’s Lecture • Overview of course (done) • Important problems require powerful computers … - … and powerful computers must be parallel. - Increasing importance of educating parallel programmers (you!) • What sorts of architectures in this class - Multimedia extensions, multi-cores, GPUs, networked clusters • Developing high-performance parallel applications - An optimization perspective 08/25/2009 3 CS4961
  • 4. 08/25/2009 CS4961 Outline • Logistics • Introduction • Technology Drivers for Multi-Core Paradigm Shift • Origins of Parallel Programming: Large-scale scientific simulations • The fastest computer in the world today • Why writing fast parallel programs is hard Some material for this lecture drawn from: Kathy Yelick and Jim Demmel, UC Berkeley Quentin Stout, University of Michigan, (see http://www.eecs.umich.edu/~qstout/parallel.html) Top 500 list (http://www.top500.org) 4
  • 5. Course Objectives • Learn how to program parallel processors and systems - Learn how to think in parallel and write correct parallel programs - Achieve performance and scalability through understanding of architecture and software mapping • Significant hands-on programming experience - Develop real applications on real hardware • Discuss the current parallel computing context - What are the drivers that make this course timely - Contemporary programming models and architectures, and where is the field going 08/25/2009 CS4961 5
  • 6. Parallel and Distributed Computing • Parallel computing (processing): - the use of two or more processors (computers), usually within a single system, working simultaneously to solve a single problem. • Distributed computing (processing): - any computing that involves multiple computers remote from each other that each have a role in a computation problem or information processing. • Parallel programming: - the human process of developing programs that express what computations should be executed in parallel. 08/25/2009 CS4961 6
  • 7. 08/25/2009 CS4961 Why is Parallel Programming Important Now? • All computers are now parallel computers (embedded, commodity, supercomputer) - On-chip architectures look like parallel computers - Languages, software development and compilation strategies originally developed for high end (supercomputers) are now becoming important for many other domains • Why? - Technology trends • Looking to the future - Parallel computing for the masses demands better parallel programming paradigms - And more people who are trained in writing parallel programs (possibly you!) - How to put all these vast machine resources to the best use! 7
  • 8. Detour: Technology as Driver for “Multi-Core” Paradigm Shift • Do you know why most computers sold today are parallel computers? • Let’s talk about the technology trends 08/25/2009 CS4961 8
  • 9. Technology Trends: Microprocessor Capacity • [Figure: transistor count still rising, clock speed flattening sharply. Slide source: Maurice Herlihy] • Moore’s Law: Gordon Moore (co-founder of Intel) predicted in 1965 that the transistor density of semiconductor chips would double roughly every year; in 1975 he revised the rate to every two years. The often-quoted 18-month figure is a later popularization.
  • 10. Technology Trends: Power Density Limits Serial Performance 08/25/2009 CS4961 10
  • 11. The Multi-Core Paradigm Shift • What to do with all these transistors? • Key ideas: - Movement away from increasingly complex processor design and faster clocks - Replicated functionality (i.e., parallel) is simpler to design - Resources more efficiently utilized - Huge power management advantages • All Computers are Parallel Computers.
  • 12. Proof of Significance: Popular Press 08/25/2009 CS4961 • This week’s issue of Newsweek! • Article on 25 things “smart people” should know • See http://www.newsweek.com/id/212142 12
  • 13. Scientific Simulation: The Third Pillar of Science • Traditional scientific and engineering paradigm: 1) Do theory or paper design. 2) Perform experiments or build system. • Limitations: - Too difficult -- build large wind tunnels. - Too expensive -- build a throw-away passenger jet. - Too slow -- wait for climate or galactic evolution. - Too dangerous -- weapons, drug design, climate experimentation. • Computational science paradigm: 3) Use high performance computer systems to simulate the phenomenon. - Based on known physical laws and efficient numerical methods.
  • 14. The quest for increasingly more powerful machines • Scientific simulation will continue to push on system requirements: - To increase the precision of the result - To get to an answer sooner (e.g., climate modeling, disaster modeling) • The U.S. will continue to acquire systems of increasing scale - For the above reasons - And to maintain competitiveness 08/25/2009 CS4961 14
  • 15. A Similar Phenomenon in Commodity Systems • More capabilities in software • Integration across software • Faster response • More realistic graphics • … CS4961 08/25/2009 15
  • 16. The fastest computer in the world today • What is its name? RoadRunner • Where is it located? Los Alamos National Laboratory • How many processors does it have? ~19,000 processor chips (~129,600 “processors”) • What kind of processors? AMD Opterons and IBM Cell/BE (in Playstations) • How fast is it? 1.105 Petaflop/second, i.e., about one quadrillion (1 x 10^15) operations/s • See http://www.top500.org
  • 17. Example: Global Climate Modeling Problem • Problem is to compute: f(latitude, longitude, elevation, time) → temperature, pressure, humidity, wind velocity • Approach: - Discretize the domain, e.g., a measurement point every 10 km - Devise an algorithm to predict weather at time t+dt given t • Uses: - Predict major events, e.g., El Niño - Use in setting air emissions standards • Source: http://www.epm.ornl.gov/chammp/chammp.html
  • 18. High Resolution Climate Modeling on NERSC-3 – P. Duffy, et al., LLNL 08/25/2009 CS4961 18
  • 19. 08/25/2009 CS4961 Some Characteristics of Scientific Simulation • Discretize physical or conceptual space into a grid - Simpler if regular, may be more representative if adaptive • Perform local computations on grid - Given yesterday’s temperature and weather pattern, what is today’s expected temperature? • Communicate partial results between grids - Contribute local weather result to understand global weather pattern. • Repeat for a set of time steps • Possibly perform other calculations with results - Given weather model, what area should evacuate for a hurricane? 19
  • 20. Example of Discretizing a Domain • [Figure: a grid divided into blocks] - One processor computes this part - Another processor computes this part in parallel - Processors in adjacent blocks in the grid communicate their results.
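
The block decomposition on this slide can be sketched in a few lines. This is a minimal illustration, not code from the lecture: it assumes a 1-D grid, a simple explicit diffusion update, and hypothetical names (`step_global`, `step_blocks`, `alpha`). Each loop iteration in `step_blocks` stands in for one processor, which works on its own block plus one "halo" cell copied from each neighbouring block (the communication step).

```python
def step_global(u, alpha=0.25):
    """One explicit diffusion step on the whole 1-D grid (end points held fixed)."""
    new = list(u)
    for i in range(1, len(u) - 1):
        new[i] = u[i] + alpha * (u[i - 1] - 2 * u[i] + u[i + 1])
    return new


def step_blocks(u, num_blocks, alpha=0.25):
    """Same step, computed block by block.

    Each block reads only its own cells plus one halo cell from each
    neighbouring block, which models the communication between
    processors that own adjacent blocks.
    """
    n = len(u)
    new = list(u)
    for b in range(num_blocks):  # each iteration stands in for one processor
        lo, hi = b * n // num_blocks, (b + 1) * n // num_blocks
        start, end = max(lo - 1, 0), min(hi + 1, n)
        local = u[start:end]  # own cells plus halo cells
        for i in range(lo, hi):
            if i == 0 or i == n - 1:
                continue  # global boundary cells stay fixed
            j = i - start
            new[i] = local[j] + alpha * (local[j - 1] - 2 * local[j] + local[j + 1])
    return new


if __name__ == "__main__":
    grid = [0.0, 0.0, 100.0, 0.0, 0.0, 0.0, 50.0, 0.0]
    print(step_blocks(grid, 2) == step_global(grid))
```

Because every block sees exactly the neighbour values the global update would use, the blocked result matches the global one; on a real machine the halo copy would be a message or a shared-memory read.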
  • 21. Parallel Programming Complexity: An Analogy to Preparing Thanksgiving Dinner • Enough parallelism? (Amdahl’s Law) - Suppose you want to just serve turkey • Granularity - How frequently must each assistant report to the chef? - After each stroke of a knife? Each step of a recipe? Each dish completed? • Locality - Grab the spices one at a time? Or collect the ones that are needed prior to starting a dish? • Load balance - Each assistant gets a dish? Preparing stuffing vs. cooking green beans? • Coordination and Synchronization - Person chopping onions for stuffing can also supply green beans - Start pie after turkey is out of the oven • All of these things make parallel programming even harder than sequential programming.
  • 22. Finding Enough Parallelism • Suppose only part of an application seems parallel • Amdahl’s law - let s be the fraction of work done sequentially, so (1-s) is the fraction parallelizable - P = number of processors - Speedup(P) = Time(1)/Time(P) <= 1/(s + (1-s)/P) <= 1/s • Even if the parallel part speeds up perfectly, performance is limited by the sequential part
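
To make the bound concrete, here is a small sketch that evaluates the Amdahl limit 1/(s + (1-s)/P) for a few processor counts. The function name `amdahl_speedup` and the choice s = 0.1 are illustrative assumptions, not part of the lecture.

```python
def amdahl_speedup(s, p):
    """Upper bound on speedup with sequential fraction s on p processors."""
    if not 0.0 <= s <= 1.0:
        raise ValueError("s must be between 0 and 1")
    if p < 1:
        raise ValueError("p must be at least 1")
    return 1.0 / (s + (1.0 - s) / p)


if __name__ == "__main__":
    s = 0.1  # assumed: 10% of the work is sequential
    for p in (1, 10, 100, 1000):
        print(p, round(amdahl_speedup(s, p), 3))
```

With s = 0.1 the bound never exceeds 1/s = 10, no matter how many processors are added, which is the point of the last bullet on the slide.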
  • 23. Overhead of Parallelism • Given enough parallel work, this is the biggest barrier to getting desired speedup • Parallelism overheads include: - cost of starting a thread or process - cost of communicating shared data - cost of synchronizing - extra (redundant) computation • Each of these can be in the range of milliseconds (= millions of flops) on some systems • Tradeoff: Algorithm needs sufficiently large units of work to run fast in parallel (i.e., large granularity), but not so large that there is not enough parallel work
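
The granularity tradeoff can be illustrated with a toy cost model. The model below is an assumption made for illustration only: the work is split into equal tasks, each task pays a fixed overhead, and tasks run in rounds of at most `num_procs` at a time. The function name and all numbers are hypothetical.

```python
import math


def parallel_time(total_work, num_tasks, num_procs, overhead_per_task):
    """Estimated run time under a toy model: equal tasks, fixed per-task overhead."""
    rounds = math.ceil(num_tasks / num_procs)  # waves of tasks across the processors
    return rounds * (total_work / num_tasks + overhead_per_task)


if __name__ == "__main__":
    # Assumed: 1000 units of work, 4 processors, 1 unit of overhead per task.
    for n in (1, 4, 1000, 100000):
        print(n, parallel_time(1000, n, 4, 1))
```

Under these assumptions one huge task leaves three processors idle, while very many tiny tasks spend most of the time on overhead; the best run time sits in between.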
  • 24. Locality and Parallelism • Large memories are slow, fast memories are small • Program should do most work on local data • [Figure: conventional storage hierarchy (Proc, Cache, L2 Cache, L3 Cache, Memory), replicated per processor, with potential interconnects between the hierarchies]
  • 25. 08/25/2009 CS4961 Load Imbalance • Load imbalance is the time that some processors in the system are idle due to - insufficient parallelism (during that phase) - unequal size tasks • Examples of the latter - adapting to “interesting parts of a domain” - tree-structured computations - fundamentally unstructured problems • Algorithm needs to balance load 25
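
A small sketch of how unequal task sizes cause idle time. It compares a static split (each processor gets a contiguous, equal-count block of tasks) with a dynamic one (each idle processor takes the next task from a shared queue). The function names and the task list are hypothetical; dynamic assignment is shown as one possible balancing strategy, not as the method the lecture prescribes.

```python
import heapq


def makespan_static(task_times, num_procs):
    """Finish time when tasks are split into contiguous equal-count blocks."""
    n = len(task_times)
    loads = [
        sum(task_times[p * n // num_procs:(p + 1) * n // num_procs])
        for p in range(num_procs)
    ]
    return max(loads)


def makespan_dynamic(task_times, num_procs):
    """Finish time when the earliest-idle processor always takes the next task."""
    finish = [0.0] * num_procs  # min-heap of per-processor finish times
    heapq.heapify(finish)
    for t in task_times:
        earliest = heapq.heappop(finish)
        heapq.heappush(finish, earliest + t)
    return max(finish)


if __name__ == "__main__":
    tasks = [8, 1, 1, 1, 1, 1, 1, 2]  # one task is much larger than the rest
    print(makespan_static(tasks, 2), makespan_dynamic(tasks, 2))
```

For this task list the static split finishes at time 11 (one processor idles for 6 units), while the dynamic assignment finishes at time 8, which equals the ideal of total work divided by two processors.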
  • 26. Some Popular Parallel Programming Models • Pthreads (POSIX threads) - Low level expression of threads, which are independent computations that can execute in parallel • MPI (Message Passing Interface) - Most widely used on the very high-end machines - Extension to common sequential languages, expresses communication between different processes along with parallelism • Map-Reduce (popularized by Google) - Map: apply the same computation to lots of different data (usually in distributed files) and produce local results - Reduce: compute global result from set of local results • CUDA (Compute Unified Device Architecture) - Proprietary programming language for NVIDIA graphics processors
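
The Map and Reduce steps can be sketched with Python's standard library. This is only an illustration of the pattern on in-memory strings, not Google's MapReduce system; the names `map_count` and `word_count` are hypothetical, and threads are used purely to show the structure (in standard CPython they will not speed up CPU-bound work).

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_count(chunk):
    """Map step: count the words in one chunk of text (a local result)."""
    return Counter(chunk.split())


def word_count(chunks, workers=4):
    """Apply map_count to every chunk, then reduce the local counts to one total."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_count, chunks))
    # Reduce step: combine the set of local results into the global result.
    return reduce(lambda a, b: a + b, partials, Counter())


if __name__ == "__main__":
    print(word_count(["a b a", "b c", "a"]))
```

Each chunk is processed independently in the map step, so the chunks could live on different machines; only the small per-chunk counts need to be communicated for the reduce step.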
  • 27. Summary of Lecture • Solving the “Parallel Programming Problem” - Key technical challenge facing today’s computing industry, government agencies and scientists • Scientific simulation discretizes some space into a grid - Perform local computations on grid - Communicate partial results between grids - Repeat for a set of time steps - Possibly perform other calculations with results • Commodity parallel programming can draw from this history and move forward in a new direction • Writing fast parallel programs is difficult - Amdahl’s Law: must parallelize most of computation - Data Locality - Communication and Synchronization - Load Imbalance
  • 28. Next Time • An exploration of parallel algorithms and their features • First written homework assignment 08/25/2009 CS4961 28