Scalable Computing Software Lab, Illinois Institute of Technology
Computing:
From Compute-Centric to Data-Centric
Xian-He Sun
Illinois Institute of Technology
sun@iit.edu
CS546 Lecture 8 Page 2
The Design of Parallel Algorithms
Unbalanced Workload
Communication Delay
Overhead Increases with the Ensemble Size
10/3/2017
Evolution of Computing: The scalability
IBM BG/P packaging hierarchy (Source: ANL ALCF)
• Chip: 4 cores, 850 MHz, 8 MB EDRAM
• Compute Card: 1 chip, 20 DRAMs; 13.6 GF/s, 2.0 GB DDR; supports 4-way SMP
• Node Board: 32 chips (4x4x2); 32 compute, 0-2 IO cards; 435 GF/s, 64 GB
• Rack: 32 Node Cards; 1024 chips, 4096 procs; 14 TF/s, 2 TB
• Petaflops System: 72 Racks, cabled 8x8x16; 1 PF/s, 144 TB
• Maximum System: 256 racks; 3.5 PF/s, 512 TB
• Front End Node / Service Node: System p Servers, Linux SLES10
• HPC SW: Compilers, GPFS, ESSL, Loadleveler
[Figure: Performance (log scale, 1 to 100,000) versus Year (1980 to 2010) for Memory, Uni-processor, and Multi-core/many-core processor]
The Memory-wall Problem
• Processor performance increases rapidly
o Uni-processor: ~52% per year until 2004
o Aggregate multi-core/many-core processor performance even higher since 2004
• Memory: ~9% per year
• Storage: ~6% per year
• Processor-memory speed gap keeps increasing
Growth-rate labels in the figure: 25%, 52%, 20%, 9%, 60%, 9% (Source: Intel; Source: OCZ)
Memory-bounded speedup (1990), Memory wall problem (1994)
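The widening gap can be made concrete by compounding the two annual rates quoted above. The sketch below is only an illustration: it assumes the ~52% and ~9% figures hold constant every year, and the function name `relative_gap` is made up for this example.

```python
def relative_gap(years, cpu_rate=0.52, mem_rate=0.09):
    """Factor by which processor speed outgrows memory speed after `years`.

    Assumes both rates compound annually and stay constant, which is a
    simplification of the trend lines on the slide.
    """
    return ((1 + cpu_rate) / (1 + mem_rate)) ** years


for years in (1, 5, 10):
    print(f"after {years:2d} years the processor-memory gap grew {relative_gap(years):.1f}x")
```

Under these assumptions the gap multiplies by roughly 1.4 each year, which is why even a modest per-year difference turns into a wall within a decade.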
Current Solution: Memory Hierarchy
Level: capacity; access time, bandwidth (upper levels are faster, lower levels are larger)
• CPU Registers: <8 KB; <0.2~0.5 ns, 500~800 GB/s/core
• Cache: <50 MB; 1-10 ns, 50~150 GB/s/core
• Main Memory: gigabytes; 50-100 ns, 5~10 GB/s/channel
• Disk: terabytes; 5 ms, 100~300 MB/s
• Tape: petabytes or infinite; sec-min
Staging transfer unit between adjacent levels (managed by; size)
• Registers and Cache: instr. operands (prog./compiler; 1-8 bytes)
• Cache and Memory: blocks (cache controller; 32-128 bytes)
• Memory and Disk: pages (OS; 4K-4M bytes)
• Disk and Tape: files (user/operator; Mbytes)
Fig 1. Cost variation pattern
I/O becomes the Bottleneck of HPC
Solution: Parallel I/O Systems
Architecture of a typical parallel I/O system
[Figure: architecture diagram with five file servers]
Problem of PFS in Big Data Applications
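[Figure: blocks 0 1 2 3 4 5 6 7 8 9 ... of a file striped across four file servers, shown for an Ideal Situation and an Actual Situation]
• Ideal Situation: the four servers hold blocks (0, 4, 8), (1, 5, 9), (2, 6), and (3, 7), and requests are spread evenly
• Actual Situation: the same striping, but requests pile up on individual servers
o If everyone requests data: contention
o Concurrency is more than increased bandwidth
Need to measure the contribution of concurrency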
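The striping in the figure can be sketched in a few lines. This assumes plain round-robin placement (block number modulo the number of servers), which matches the block-to-server layout shown; the helper names and the queue-depth measure are hypothetical and serve only to illustrate contention.

```python
from collections import Counter


def server_for_block(block, num_servers):
    """Round-robin striping: block b lives on server b mod num_servers."""
    return block % num_servers


def layout(num_blocks, num_servers):
    """Return {server: [blocks]} for a file of num_blocks blocks."""
    servers = {s: [] for s in range(num_servers)}
    for block in range(num_blocks):
        servers[server_for_block(block, num_servers)].append(block)
    return servers


def max_queue_depth(requested_blocks, num_servers):
    """Largest number of simultaneous requests landing on one server."""
    load = Counter(server_for_block(b, num_servers) for b in requested_blocks)
    return max(load.values(), default=0)


print(layout(10, 4))
print("evenly spread requests:", max_queue_depth([0, 1, 2, 3], 4))
print("requests on one server:", max_queue_depth([0, 4, 8], 4))
```

The same number of servers gives very different service depending on which blocks are requested together, which is the reason the slide asks for a way to measure what concurrency actually contributes.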
Memory Concurrency
• Layers: CPU, Cache, Memory
• Parallel structures: Multi-core, Multi-threading, Multi-issue, Multi-banked Cache, Multi-level Cache, Multi-channel, Multi-rank, Multi-bank
• Also (mostly hiding): Out-of-order Execution, Speculative Execution, Runahead Execution, Pipelined Cache, Non-blocking Cache, Data Prefetching, Write buffer
• Input-Output (I/O): Parallel File System, Disks; Pipeline, Non-blocking, Prefetching, Write buffer
Model of Data Access (Memory Hierarchy)
• The conventional AMAT (Average Memory Access Time) model (H: hit cycles, MR: miss rate, AMP: average miss penalty)
$\mathrm{AMAT} = H_1 + MR_1 \times AMP_1$, where $AMP_1 = H_2 + MR_2 \times AMP_2$
$\mathrm{AMAT} = H_1 + MR_1 \times (H_2 + MR_2 \times (H_3 + MR_3 \times AMP_3))$
• The new C-AMAT (Concurrent-AMAT) model
$\text{C-AMAT}_1 = \frac{H_1}{C_{H_1}} + MR_1 \times \kappa_1 \times \text{C-AMAT}_2$
where
$\text{C-AMAT}_2 = \frac{H_2}{C_{H_2}} + pMR_2 \times \frac{pAMP_2}{C_{M_2}}$
$\kappa_1 = \frac{pMR_1}{MR_1} \times \frac{pAMP_1}{AMP_1} \times \frac{C_{m_1}}{C_{M_1}}$
Here $\kappa_1$ is the overlapping contribution.
X.-H. Sun and D. Wang, "Concurrent Average Memory Access Time," IEEE Computer, vol. 47, no. 5, pp. 74-80, May 2014.
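As a consistency check between the two models, the short derivation below shows C-AMAT collapsing to AMAT when there is no concurrency. It is a sketch under stated assumptions: every concurrency term equals 1 and every miss is a pure miss, so the pure-miss quantities equal their conventional counterparts.

```latex
\begin{align*}
C_{H_i} = C_{M_i} = C_{m_i} = 1,\quad pMR_i = MR_i,\quad pAMP_i = AMP_i
  &\;\Rightarrow\; \kappa_1 = \frac{pMR_1}{MR_1}\cdot\frac{pAMP_1}{AMP_1}\cdot\frac{C_{m_1}}{C_{M_1}} = 1 \\
\text{C-AMAT}_2 &= \frac{H_2}{1} + MR_2 \cdot \frac{AMP_2}{1} = H_2 + MR_2 \cdot AMP_2 = AMP_1 \\
\text{C-AMAT}_1 &= \frac{H_1}{1} + MR_1 \cdot 1 \cdot \text{C-AMAT}_2 = H_1 + MR_1 \cdot AMP_1 = \mathrm{AMAT}
\end{align*}
```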
Data Access Time: AMAT
• Average Memory Access Time (AMAT)
= Thit(L1) + Miss%(L1) × (Thit(L2) + Miss%(L2) × (Thit(L3) + Miss%(L3) × T(memory)))
• Hit time: L1 cache 1 clk, L2 cache 10 clks, L3 cache 20 clks (caches on-die), main memory (DRAM) 300 clks
• Example (latencies as listed above)
o Miss rate: L1=10%, L2=5%, L3=1% (be careful with the miss-rate definition: the nested formula takes the local miss rate of each level)
o AMAT = 1 + 0.10 × (10 + 0.05 × (20 + 0.01 × 300)) = 2.115 clks
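The AMAT example can be reproduced with a small helper that evaluates the nested formula from the innermost level outward. This is a minimal sketch using the hit times and miss rates from the slide; the function name `amat` and its argument layout are choices made for this illustration.

```python
def amat(hit_times, miss_rates, memory_time):
    """Average memory access time for a cache hierarchy.

    hit_times and miss_rates are ordered from L1 outward; miss rates are
    local (per-level) rates. memory_time is the main-memory access time.
    """
    penalty = memory_time
    # Fold the hierarchy from the last cache level back to L1.
    for hit, miss_rate in zip(reversed(hit_times), reversed(miss_rates)):
        penalty = hit + miss_rate * penalty
    return penalty


value = amat(hit_times=[1, 10, 20], miss_rates=[0.10, 0.05, 0.01], memory_time=300)
print(f"AMAT = {value:.3f} clks")
```

Writing the model as a fold makes it easy to add or drop a cache level without rewriting the formula.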
Data Access Time: C-AMAT
• Hit time: L1 cache 1 clk, L2 cache 10 clks, L3 cache 20 clks, main memory (DRAM) 300 clks
• Hit concurrency: L1 cache 2, L2 cache 3, L3 cache 4, main memory 6
• Concurrent Average Memory Access Time (C-AMAT)
$= \frac{H_1}{C_{H_1}} + MR_1 \times \kappa_1 \times \left( \frac{H_2}{C_{H_2}} + MR_2 \times \kappa_2 \times \left( \frac{H_3}{C_{H_3}} + MR_3 \times \kappa_3 \times \frac{H_{Mem}}{C_{H_{Mem}}} \right) \right)$
• Example
o Miss rate: L1=10%, L2=5%, L3=1%
o pMR, pAMP, AMP, CM, Cm: L1 = 7%, 10, 10, 5, 4; L2 = 3%, 60, 40, 9, 6; L3 = 0.8%, 400, 300, 16, 12
o κ: L1=0.56, L2=0.6, L3=0.8
o C-AMAT ≈ 0.696
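The C-AMAT example can be checked the same way. The sketch below recomputes each κ from the pMR, pAMP, AMP, CM, and Cm values on the slide and then evaluates the nested C-AMAT expression; the `Level` class and the function name `c_amat` are hypothetical names introduced for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Level:
    hit_time: float               # H
    hit_concurrency: float        # C_H
    miss_rate: float              # MR
    pure_miss_rate: float         # pMR
    amp: float                    # AMP
    pure_amp: float               # pAMP
    miss_concurrency: float       # C_M
    pure_miss_concurrency: float  # C_m

    @property
    def kappa(self):
        """Overlapping factor: (pMR/MR) * (pAMP/AMP) * (C_m/C_M)."""
        return (
            (self.pure_miss_rate / self.miss_rate)
            * (self.pure_amp / self.amp)
            * (self.pure_miss_concurrency / self.miss_concurrency)
        )


def c_amat(levels, mem_hit_time, mem_hit_concurrency):
    """Evaluate the nested C-AMAT formula from main memory back to L1."""
    value = mem_hit_time / mem_hit_concurrency
    for lv in reversed(levels):
        value = lv.hit_time / lv.hit_concurrency + lv.miss_rate * lv.kappa * value
    return value


levels = [
    Level(1, 2, 0.10, 0.07, 10, 10, 5, 4),       # L1
    Level(10, 3, 0.05, 0.03, 40, 60, 9, 6),      # L2
    Level(20, 4, 0.01, 0.008, 300, 400, 16, 12), # L3
]
print([round(lv.kappa, 2) for lv in levels])
print(f"C-AMAT = {c_amat(levels, mem_hit_time=300, mem_hit_concurrency=6):.3f} clks")
```

With the same hit times and miss rates as the AMAT example, accounting for hit concurrency and overlapped misses brings the average access time down from 2.115 clks to about 0.696 clks.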
What Does C-AMAT Say?
• C-AMAT is an extension of AMAT to consider concurrency
• C-AMAT can be measured at each layer with APC (Access Per Cycle)
• C-AMAT is data-centric thinking
• Data access is as important as computing
• High locality may hurt performance
• The Pure Miss concept
• Balance locality, concurrency, overlapping with C-AMAT
• C-AMAT uniquely integrates the joint impact of locality, concurrency, and overlapping for optimization (analysis and measurement)
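To illustrate how C-AMAT could be measured with APC, the sketch below assumes the relation C-AMAT = 1/APC, with APC counted as accesses per memory-active cycle (a cycle in which at least one access is outstanding), as in the cited C-AMAT paper. The access trace is invented for the example, and the function names are hypothetical.

```python
def memory_active_cycles(accesses):
    """Count cycles in which at least one access is outstanding.

    accesses is a list of (start_cycle, end_cycle) pairs, end exclusive.
    Overlapping accesses share cycles, so each cycle is counted once.
    """
    active = set()
    for start, end in accesses:
        active.update(range(start, end))
    return len(active)


def apc(accesses):
    """Accesses per memory-active cycle."""
    return len(accesses) / memory_active_cycles(accesses)


def c_amat_from_trace(accesses):
    """C-AMAT taken as the reciprocal of APC (assumption stated above)."""
    return memory_active_cycles(accesses) / len(accesses)


def mean_latency(accesses):
    """Plain per-access latency, which ignores overlap between accesses."""
    return sum(end - start for start, end in accesses) / len(accesses)


# Hypothetical trace: three overlapping accesses, then one isolated access.
trace = [(0, 3), (1, 4), (2, 5), (10, 13)]
print("mean latency:", mean_latency(trace))
print("APC:", apc(trace))
print("C-AMAT:", c_amat_from_trace(trace))
```

Every access in the trace takes 3 cycles, yet the overlapped ones cost less in total memory-active time, so the concurrency-aware figure comes out lower than the plain mean latency.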
Pace Matching Data Access
[Figure: Layer 1, Layer 2, Layer 3, and Layer 4 between the processor side and the off-chip side, shown for Case #1, Case #2, Case #3, and Case #4]
No-delay data access (using C-AMAT as the gate to guide and LPM as the global controller for in-situ optimization)
IO Optimization: from Prefetching to Merging
• Application-aware optimization
• Methods (LPM) and software
• Model (C-AMAT) measurement
Topic (venue): paper
• Prefetching (SC’08): Parallel I/O Prefetching Using MPI File Caching and I/O Signatures
• Data Layout (HPDC11): A Cost-intelligent Application-specific Data layout Scheme for Parallel File Systems
• Data Coordination (SC11): Server-Side I/O Coordination for Parallel File Systems
• Data Organization (PDSW11): Pattern-aware File Reorganization in MPI-IO
• Data Replication (IPDPS13): Layout-Aware Replication Scheme for Parallel I/O Systems
• Cache (ICDCS14): S4D-Cache: Smart Selective SSD Cache for Parallel I/O Systems
• Heterogeneity (IPDPS15): HAS: Heterogeneity-Aware Selective Data Layout Scheme for Parallel File Systems on Hybrid Servers
• Merging (BigData15): PortHadoop: Support Direct HPC Data Processing in Hadoop
• Data Compression
• Website: www.cs.iit.edu/~scs/iosig/
Merging with Industrial Solution:
Distributed I/O Systems
Architecture of a typical HDFS system
Models show for many applications the best optimization is to use HDFS
Integrated data access system (IDAS)
Two camps
o MPI for HPC
o MapReduce for big data
Native data access
o PFS (POSIX I/O and MPI-IO)
o Big Data (HDFS API)
Limited Support for Data Access
o Use explicit serial copy
o Redundant data store in HDFS
• Merging is more than access (maintain the merits)
Yang, N. Liu, B. Feng, X.-H. Sun and S. Zhou, "PortHadoop: Support Direct HPC Data Processing in Hadoop," in Proc. of IEEE International Conference on Big Data (IEEE BigData 2015), Santa Clara, CA, USA, Oct. 2015.
Application -- NASA Earth Science Data Analysis
(Super-Cloud Project)
Simulation is done in an MPI environment, and data analysis is done in a Hadoop environment
[Figure: Elapsed Time (Seconds), 0 to 6000, versus Number of Files (48, 96, 192, 384); series: Serial Vis, Serial Copy, Parallel Vis, Parallel Copy, PortHadoop-R Vis, Serial Animation]
PortHadoop-R vs. Conventional Methods: Performance (Vis-only)
• Serial Copy: one copy command issued on one Hadoop node
• Parallel Copy: 4 copy commands issued on each Hadoop node (32 in total)
• Serial Vis: read all data into one fat node, visualize
• Parallel Vis: visualize all files on all Hadoop nodes
• PortHadoop-R Vis: visualize all files into images
• Serial Animation: merge all images into one animation
$\text{Speedup}_{\text{PortHadoop}} = 1.47$
$\text{Speedup}_{\text{total}} = 15$
Good speedup: total time is reduced by up to 15x and 1.47x compared with the serial and parallel solutions, respectively.
What Are the Challenges in Computing?
• When communication is the issue
o Domain decomposition, pipelining
• When locality (memory hierarchy) is the concern
o Block-based computing
• When memory bound is the constraint
o Memory bound analysis
• When data access concurrency is a determining factor
o What is the new frame of algorithm design?
How We Can Help You
• If you have big data applications
• If you have data-intensive scientific applications
• If you need both computing and data power
• If you are interested in data-centric algorithm design
How You Can Help Us
• Application examples
• Methods
o Probability sampling, etc.
An Unbalanced System
• Source: Bob Colwell keynote, ISCA’29, 2002, http://systems.cs.colorado.edu/ISCA2002/Colwell-ISCA-KEYNOTE-2002-final.ppt

More Related Content

What's hot

Colfax-Winograd-Summary _final (1)
Colfax-Winograd-Summary _final (1)Colfax-Winograd-Summary _final (1)
Colfax-Winograd-Summary _final (1)Sangamesh Ragate
 
Architecture and Performance of Runtime Environments for Data Intensive Scala...
Architecture and Performance of Runtime Environments for Data Intensive Scala...Architecture and Performance of Runtime Environments for Data Intensive Scala...
Architecture and Performance of Runtime Environments for Data Intensive Scala...
jaliyae
 
Assisting User’s Transition to Titan’s Accelerated Architecture
Assisting User’s Transition to Titan’s Accelerated ArchitectureAssisting User’s Transition to Titan’s Accelerated Architecture
Assisting User’s Transition to Titan’s Accelerated Architecture
inside-BigData.com
 
Bryan Thompson, Chief Scientist and Founder at SYSTAP, LLC at MLconf NYC
Bryan Thompson, Chief Scientist and Founder at SYSTAP, LLC at MLconf NYCBryan Thompson, Chief Scientist and Founder at SYSTAP, LLC at MLconf NYC
Bryan Thompson, Chief Scientist and Founder at SYSTAP, LLC at MLconf NYC
MLconf
 
TensorFlow for HPC?
TensorFlow for HPC?TensorFlow for HPC?
TensorFlow for HPC?
inside-BigData.com
 
Open power ddl and lms
Open power ddl and lmsOpen power ddl and lms
Open power ddl and lms
Ganesan Narayanasamy
 
pMatlab on BlueGene
pMatlab on BlueGenepMatlab on BlueGene
pMatlab on BlueGenevsachde
 
Big Graph Analytics Systems (Sigmod16 Tutorial)
Big Graph Analytics Systems (Sigmod16 Tutorial)Big Graph Analytics Systems (Sigmod16 Tutorial)
Big Graph Analytics Systems (Sigmod16 Tutorial)
Yuanyuan Tian
 
Manycores for the Masses
Manycores for the MassesManycores for the Masses
Manycores for the Masses
Intel® Software
 
Large data with Scikit-learn - Boston Data Mining Meetup - Alex Perrier
Large data with Scikit-learn - Boston Data Mining Meetup  - Alex PerrierLarge data with Scikit-learn - Boston Data Mining Meetup  - Alex Perrier
Large data with Scikit-learn - Boston Data Mining Meetup - Alex Perrier
Alexis Perrier
 
Ad04606184188
Ad04606184188Ad04606184188
Ad04606184188
IJERA Editor
 
Time space trade off
Time space trade offTime space trade off
Time space trade off
anisha talwar
 
OpenCL caffe IWOCL 2016 presentation final
OpenCL caffe IWOCL 2016 presentation finalOpenCL caffe IWOCL 2016 presentation final
OpenCL caffe IWOCL 2016 presentation finalJunli Gu
 
IEEE CloudCom 2014参加報告
IEEE CloudCom 2014参加報告IEEE CloudCom 2014参加報告
IEEE CloudCom 2014参加報告
Ryousei Takano
 
Data-Centric Parallel Programming
Data-Centric Parallel ProgrammingData-Centric Parallel Programming
Data-Centric Parallel Programming
inside-BigData.com
 
ExtraV - Boosting Graph Processing Near Storage with a Coherent Accelerator
ExtraV - Boosting Graph Processing Near Storage with a Coherent AcceleratorExtraV - Boosting Graph Processing Near Storage with a Coherent Accelerator
ExtraV - Boosting Graph Processing Near Storage with a Coherent Accelerator
Jinho Lee
 
AVOIDING DUPLICATED COMPUTATION TO IMPROVE THE PERFORMANCE OF PFSP ON CUDA GPUS
AVOIDING DUPLICATED COMPUTATION TO IMPROVE THE PERFORMANCE OF PFSP ON CUDA GPUSAVOIDING DUPLICATED COMPUTATION TO IMPROVE THE PERFORMANCE OF PFSP ON CUDA GPUS
AVOIDING DUPLICATED COMPUTATION TO IMPROVE THE PERFORMANCE OF PFSP ON CUDA GPUS
csandit
 
APSys Presentation Final copy2
APSys Presentation Final copy2APSys Presentation Final copy2
APSys Presentation Final copy2Junli Gu
 
Big_Data_Heterogeneous_Programming IEEE_Big_Data 2015
Big_Data_Heterogeneous_Programming IEEE_Big_Data 2015Big_Data_Heterogeneous_Programming IEEE_Big_Data 2015
Big_Data_Heterogeneous_Programming IEEE_Big_Data 2015Junli Gu
 

What's hot (20)

Colfax-Winograd-Summary _final (1)
Colfax-Winograd-Summary _final (1)Colfax-Winograd-Summary _final (1)
Colfax-Winograd-Summary _final (1)
 
Architecture and Performance of Runtime Environments for Data Intensive Scala...
Architecture and Performance of Runtime Environments for Data Intensive Scala...Architecture and Performance of Runtime Environments for Data Intensive Scala...
Architecture and Performance of Runtime Environments for Data Intensive Scala...
 
Assisting User’s Transition to Titan’s Accelerated Architecture
Assisting User’s Transition to Titan’s Accelerated ArchitectureAssisting User’s Transition to Titan’s Accelerated Architecture
Assisting User’s Transition to Titan’s Accelerated Architecture
 
Bryan Thompson, Chief Scientist and Founder at SYSTAP, LLC at MLconf NYC
Bryan Thompson, Chief Scientist and Founder at SYSTAP, LLC at MLconf NYCBryan Thompson, Chief Scientist and Founder at SYSTAP, LLC at MLconf NYC
Bryan Thompson, Chief Scientist and Founder at SYSTAP, LLC at MLconf NYC
 
TensorFlow for HPC?
TensorFlow for HPC?TensorFlow for HPC?
TensorFlow for HPC?
 
Open power ddl and lms
Open power ddl and lmsOpen power ddl and lms
Open power ddl and lms
 
pMatlab on BlueGene
pMatlab on BlueGenepMatlab on BlueGene
pMatlab on BlueGene
 
Big Graph Analytics Systems (Sigmod16 Tutorial)
Big Graph Analytics Systems (Sigmod16 Tutorial)Big Graph Analytics Systems (Sigmod16 Tutorial)
Big Graph Analytics Systems (Sigmod16 Tutorial)
 
Manycores for the Masses
Manycores for the MassesManycores for the Masses
Manycores for the Masses
 
Large data with Scikit-learn - Boston Data Mining Meetup - Alex Perrier
Large data with Scikit-learn - Boston Data Mining Meetup  - Alex PerrierLarge data with Scikit-learn - Boston Data Mining Meetup  - Alex Perrier
Large data with Scikit-learn - Boston Data Mining Meetup - Alex Perrier
 
Ad04606184188
Ad04606184188Ad04606184188
Ad04606184188
 
Time space trade off
Time space trade offTime space trade off
Time space trade off
 
OpenCL caffe IWOCL 2016 presentation final
OpenCL caffe IWOCL 2016 presentation finalOpenCL caffe IWOCL 2016 presentation final
OpenCL caffe IWOCL 2016 presentation final
 
IEEE CloudCom 2014参加報告
IEEE CloudCom 2014参加報告IEEE CloudCom 2014参加報告
IEEE CloudCom 2014参加報告
 
Data-Centric Parallel Programming
Data-Centric Parallel ProgrammingData-Centric Parallel Programming
Data-Centric Parallel Programming
 
ExtraV - Boosting Graph Processing Near Storage with a Coherent Accelerator
ExtraV - Boosting Graph Processing Near Storage with a Coherent AcceleratorExtraV - Boosting Graph Processing Near Storage with a Coherent Accelerator
ExtraV - Boosting Graph Processing Near Storage with a Coherent Accelerator
 
AVOIDING DUPLICATED COMPUTATION TO IMPROVE THE PERFORMANCE OF PFSP ON CUDA GPUS
AVOIDING DUPLICATED COMPUTATION TO IMPROVE THE PERFORMANCE OF PFSP ON CUDA GPUSAVOIDING DUPLICATED COMPUTATION TO IMPROVE THE PERFORMANCE OF PFSP ON CUDA GPUS
AVOIDING DUPLICATED COMPUTATION TO IMPROVE THE PERFORMANCE OF PFSP ON CUDA GPUS
 
APSys Presentation Final copy2
APSys Presentation Final copy2APSys Presentation Final copy2
APSys Presentation Final copy2
 
Dl2 computing gpu
Dl2 computing gpuDl2 computing gpu
Dl2 computing gpu
 
Big_Data_Heterogeneous_Programming IEEE_Big_Data 2015
Big_Data_Heterogeneous_Programming IEEE_Big_Data 2015Big_Data_Heterogeneous_Programming IEEE_Big_Data 2015
Big_Data_Heterogeneous_Programming IEEE_Big_Data 2015
 

Similar to Xian He Sun Data-Centric Into

L05 parallel
L05 parallelL05 parallel
Distributed Computing
Distributed ComputingDistributed Computing
Distributed Computing
Sudarsun Santhiappan
 
Tridiagonal solver in gpu
Tridiagonal solver in gpuTridiagonal solver in gpu
Tridiagonal solver in gpu
Hunan University
 
parallel-computation.pdf
parallel-computation.pdfparallel-computation.pdf
parallel-computation.pdf
Jayanti Prasad Ph.D.
 
676.v3
676.v3676.v3
676.v3
Rajesh M
 
Achieving Portability and Efficiency in a HPC Code Using Standard Message-pas...
Achieving Portability and Efficiency in a HPC Code Using Standard Message-pas...Achieving Portability and Efficiency in a HPC Code Using Standard Message-pas...
Achieving Portability and Efficiency in a HPC Code Using Standard Message-pas...Derryck Lamptey, MPhil, CISSP
 
Hadoop Network Performance profile
Hadoop Network Performance profileHadoop Network Performance profile
Hadoop Network Performance profilepramodbiligiri
 
SO-Memoria.pdf
SO-Memoria.pdfSO-Memoria.pdf
SO-Memoria.pdf
Kadu37
 
SO-Memoria.pdf
SO-Memoria.pdfSO-Memoria.pdf
SO-Memoria.pdf
ssuser143a20
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...ijceronline
 
NWU and HPC
NWU and HPCNWU and HPC
NWU and HPC
Wilhelm van Belkum
 
Intro (Distributed computing)
Intro (Distributed computing)Intro (Distributed computing)
Intro (Distributed computing)Sri Prasanna
 
PowerAI Deep dive
PowerAI Deep divePowerAI Deep dive
PowerAI Deep dive
Ganesan Narayanasamy
 
A Tale of Data Pattern Discovery in Parallel
A Tale of Data Pattern Discovery in ParallelA Tale of Data Pattern Discovery in Parallel
A Tale of Data Pattern Discovery in Parallel
Jenny Liu
 
Mirage: ML kernels in the cloud (ML Workshop 2010)
Mirage: ML kernels in the cloud (ML Workshop 2010)Mirage: ML kernels in the cloud (ML Workshop 2010)
Mirage: ML kernels in the cloud (ML Workshop 2010)Anil Madhavapeddy
 
Accelerate Reed-Solomon coding for Fault-Tolerance in RAID-like system
Accelerate Reed-Solomon coding for Fault-Tolerance in RAID-like systemAccelerate Reed-Solomon coding for Fault-Tolerance in RAID-like system
Accelerate Reed-Solomon coding for Fault-Tolerance in RAID-like system
Shuai Yuan
 
memorytechnologyandoptimization-140416131506-phpapp02.pptx
memorytechnologyandoptimization-140416131506-phpapp02.pptxmemorytechnologyandoptimization-140416131506-phpapp02.pptx
memorytechnologyandoptimization-140416131506-phpapp02.pptx
shahdivyanshu1002
 
Programmable Exascale Supercomputer
Programmable Exascale SupercomputerProgrammable Exascale Supercomputer
Programmable Exascale Supercomputer
Sagar Dolas
 

Similar to Xian He Sun Data-Centric Into (20)

L05 parallel
L05 parallelL05 parallel
L05 parallel
 
Distributed Computing
Distributed ComputingDistributed Computing
Distributed Computing
 
Tridiagonal solver in gpu
Tridiagonal solver in gpuTridiagonal solver in gpu
Tridiagonal solver in gpu
 
Par com
Par comPar com
Par com
 
parallel-computation.pdf
parallel-computation.pdfparallel-computation.pdf
parallel-computation.pdf
 
676.v3
676.v3676.v3
676.v3
 
Achieving Portability and Efficiency in a HPC Code Using Standard Message-pas...
Achieving Portability and Efficiency in a HPC Code Using Standard Message-pas...Achieving Portability and Efficiency in a HPC Code Using Standard Message-pas...
Achieving Portability and Efficiency in a HPC Code Using Standard Message-pas...
 
Hadoop Network Performance profile
Hadoop Network Performance profileHadoop Network Performance profile
Hadoop Network Performance profile
 
SO-Memoria.pdf
SO-Memoria.pdfSO-Memoria.pdf
SO-Memoria.pdf
 
SO-Memoria.pdf
SO-Memoria.pdfSO-Memoria.pdf
SO-Memoria.pdf
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
 
NWU and HPC
NWU and HPCNWU and HPC
NWU and HPC
 
Intro (Distributed computing)
Intro (Distributed computing)Intro (Distributed computing)
Intro (Distributed computing)
 
PowerAI Deep dive
PowerAI Deep divePowerAI Deep dive
PowerAI Deep dive
 
A Tale of Data Pattern Discovery in Parallel
A Tale of Data Pattern Discovery in ParallelA Tale of Data Pattern Discovery in Parallel
A Tale of Data Pattern Discovery in Parallel
 
Mirage: ML kernels in the cloud (ML Workshop 2010)
Mirage: ML kernels in the cloud (ML Workshop 2010)Mirage: ML kernels in the cloud (ML Workshop 2010)
Mirage: ML kernels in the cloud (ML Workshop 2010)
 
Accelerate Reed-Solomon coding for Fault-Tolerance in RAID-like system
Accelerate Reed-Solomon coding for Fault-Tolerance in RAID-like systemAccelerate Reed-Solomon coding for Fault-Tolerance in RAID-like system
Accelerate Reed-Solomon coding for Fault-Tolerance in RAID-like system
 
Accelerix ISSCC 1998 Paper
Accelerix ISSCC 1998 PaperAccelerix ISSCC 1998 Paper
Accelerix ISSCC 1998 Paper
 
memorytechnologyandoptimization-140416131506-phpapp02.pptx
memorytechnologyandoptimization-140416131506-phpapp02.pptxmemorytechnologyandoptimization-140416131506-phpapp02.pptx
memorytechnologyandoptimization-140416131506-phpapp02.pptx
 
Programmable Exascale Supercomputer
Programmable Exascale SupercomputerProgrammable Exascale Supercomputer
Programmable Exascale Supercomputer
 

More from SciCompIIT

Lois Curfman McInnes Exascale CISC Lecture Jan 2018
Lois Curfman McInnes Exascale CISC Lecture Jan 2018Lois Curfman McInnes Exascale CISC Lecture Jan 2018
Lois Curfman McInnes Exascale CISC Lecture Jan 2018
SciCompIIT
 
Shuwang Li Moving Interface Modeling and Computation
Shuwang Li Moving Interface Modeling and ComputationShuwang Li Moving Interface Modeling and Computation
Shuwang Li Moving Interface Modeling and Computation
SciCompIIT
 
Wereszczynski Molecular Dynamics
Wereszczynski Molecular DynamicsWereszczynski Molecular Dynamics
Wereszczynski Molecular Dynamics
SciCompIIT
 
Dixon Deep Learning
Dixon Deep LearningDixon Deep Learning
Dixon Deep Learning
SciCompIIT
 
Chun Liu Energetic Variational Intro
Chun Liu Energetic Variational IntroChun Liu Energetic Variational Intro
Chun Liu Energetic Variational Intro
SciCompIIT
 
David Minh Brief Stories 2017 Sept
David Minh Brief Stories 2017 SeptDavid Minh Brief Stories 2017 Sept
David Minh Brief Stories 2017 Sept
SciCompIIT
 
XSEDE April 2017
XSEDE April 2017XSEDE April 2017
XSEDE April 2017
SciCompIIT
 
GridIIT Open Science Grid
GridIIT Open Science GridGridIIT Open Science Grid
GridIIT Open Science Grid
SciCompIIT
 
CISC Introduction
CISC IntroductionCISC Introduction
CISC Introduction
SciCompIIT
 

More from SciCompIIT (9)

Lois Curfman McInnes Exascale CISC Lecture Jan 2018
Lois Curfman McInnes Exascale CISC Lecture Jan 2018Lois Curfman McInnes Exascale CISC Lecture Jan 2018
Lois Curfman McInnes Exascale CISC Lecture Jan 2018
 
Shuwang Li Moving Interface Modeling and Computation
Shuwang Li Moving Interface Modeling and ComputationShuwang Li Moving Interface Modeling and Computation
Shuwang Li Moving Interface Modeling and Computation
 
Wereszczynski Molecular Dynamics
Wereszczynski Molecular DynamicsWereszczynski Molecular Dynamics
Wereszczynski Molecular Dynamics
 
Dixon Deep Learning
Dixon Deep LearningDixon Deep Learning
Dixon Deep Learning
 
Chun Liu Energetic Variational Intro
Chun Liu Energetic Variational IntroChun Liu Energetic Variational Intro
Chun Liu Energetic Variational Intro
 
David Minh Brief Stories 2017 Sept
David Minh Brief Stories 2017 SeptDavid Minh Brief Stories 2017 Sept
David Minh Brief Stories 2017 Sept
 
XSEDE April 2017
XSEDE April 2017XSEDE April 2017
XSEDE April 2017
 
GridIIT Open Science Grid
GridIIT Open Science GridGridIIT Open Science Grid
GridIIT Open Science Grid
 
CISC Introduction
CISC IntroductionCISC Introduction
CISC Introduction
 

Recently uploaded

20240520 Planning a Circuit Simulator in JavaScript.pptx
20240520 Planning a Circuit Simulator in JavaScript.pptx20240520 Planning a Circuit Simulator in JavaScript.pptx
20240520 Planning a Circuit Simulator in JavaScript.pptx
Sharon Liu
 
ESR spectroscopy in liquid food and beverages.pptx
ESR spectroscopy in liquid food and beverages.pptxESR spectroscopy in liquid food and beverages.pptx
ESR spectroscopy in liquid food and beverages.pptx
PRIYANKA PATEL
 
BREEDING METHODS FOR DISEASE RESISTANCE.pptx
BREEDING METHODS FOR DISEASE RESISTANCE.pptxBREEDING METHODS FOR DISEASE RESISTANCE.pptx
BREEDING METHODS FOR DISEASE RESISTANCE.pptx
RASHMI M G
 
Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.
Nistarini College, Purulia (W.B) India
 
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốtmô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
HongcNguyn6
 
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdfUnveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Erdal Coalmaker
 
Lateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensiveLateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensive
silvermistyshot
 
ANAMOLOUS SECONDARY GROWTH IN DICOT ROOTS.pptx
ANAMOLOUS SECONDARY GROWTH IN DICOT ROOTS.pptxANAMOLOUS SECONDARY GROWTH IN DICOT ROOTS.pptx
ANAMOLOUS SECONDARY GROWTH IN DICOT ROOTS.pptx
RASHMI M G
 
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Sérgio Sacani
 
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
David Osipyan
 
Red blood cells- genesis-maturation.pptx
Red blood cells- genesis-maturation.pptxRed blood cells- genesis-maturation.pptx
Red blood cells- genesis-maturation.pptx
muralinath2
 
Seminar of U.V. Spectroscopy by SAMIR PANDA
 Seminar of U.V. Spectroscopy by SAMIR PANDA Seminar of U.V. Spectroscopy by SAMIR PANDA
Seminar of U.V. Spectroscopy by SAMIR PANDA
SAMIR PANDA
 
Toxic effects of heavy metals : Lead and Arsenic
Toxic effects of heavy metals : Lead and ArsenicToxic effects of heavy metals : Lead and Arsenic
Toxic effects of heavy metals : Lead and Arsenic
sanjana502982
 
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
Travis Hills MN
 
Orion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWSOrion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWS
Columbia Weather Systems
 
Phenomics assisted breeding in crop improvement
Phenomics assisted breeding in crop improvementPhenomics assisted breeding in crop improvement
Phenomics assisted breeding in crop improvement
IshaGoswami9
 
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
yqqaatn0
 
Deep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless ReproducibilityDeep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless Reproducibility
University of Rennes, INSA Rennes, Inria/IRISA, CNRS
 
Anemia_ types_clinical significance.pptx
Anemia_ types_clinical significance.pptxAnemia_ types_clinical significance.pptx
Anemia_ types_clinical significance.pptx
muralinath2
 
Mudde & Rovira Kaltwasser. - Populism - a very short introduction [2017].pdf
Mudde & Rovira Kaltwasser. - Populism - a very short introduction [2017].pdfMudde & Rovira Kaltwasser. - Populism - a very short introduction [2017].pdf
Mudde & Rovira Kaltwasser. - Populism - a very short introduction [2017].pdf
frank0071
 

Recently uploaded (20)

20240520 Planning a Circuit Simulator in JavaScript.pptx
20240520 Planning a Circuit Simulator in JavaScript.pptx20240520 Planning a Circuit Simulator in JavaScript.pptx
20240520 Planning a Circuit Simulator in JavaScript.pptx
 
ESR spectroscopy in liquid food and beverages.pptx
ESR spectroscopy in liquid food and beverages.pptxESR spectroscopy in liquid food and beverages.pptx
ESR spectroscopy in liquid food and beverages.pptx
 
BREEDING METHODS FOR DISEASE RESISTANCE.pptx
BREEDING METHODS FOR DISEASE RESISTANCE.pptxBREEDING METHODS FOR DISEASE RESISTANCE.pptx
BREEDING METHODS FOR DISEASE RESISTANCE.pptx
 
Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.
 
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốtmô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
 
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdfUnveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdf
 
Lateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensiveLateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensive
 
ANAMOLOUS SECONDARY GROWTH IN DICOT ROOTS.pptx
ANAMOLOUS SECONDARY GROWTH IN DICOT ROOTS.pptxANAMOLOUS SECONDARY GROWTH IN DICOT ROOTS.pptx
ANAMOLOUS SECONDARY GROWTH IN DICOT ROOTS.pptx
 

Xian He Sun Data-Centric Into

  • 1. Computing: From Compute-Centric to Data-Centric. Xian-He Sun, Scalable Computing Software Lab, Illinois Institute of Technology, sun@iit.edu
  • 2. CS546 Lecture 8. The Design of Parallel Algorithms: unbalanced workload; communication delay; overhead increases with the ensemble size.
  • 3. Evolution of Computing: The scalability (IBM BG/P; source: ANL ALCF; slide dated 10/3/2017). Chip: 4 cores, 850 MHz, 8 MB EDRAM. Compute Card: 1 chip, 20 DRAMs, 13.6 GF/s, 2.0 GB DDR, supports 4-way SMP. Node Board: 32 chips (4x4x2), 32 compute and 0-2 I/O cards, 435 GF/s, 64 GB. Rack: 32 node cards, 1024 chips, 4096 procs, 14 TF/s, 2 TB. Petaflops System: 72 racks, cabled 8x8x16, 1 PF/s, 144 TB. Maximum System: 256 racks, 3.5 PF/s, 512 TB. Front End Node / Service Node: System p servers, Linux SLES10. HPC SW: compilers, GPFS, ESSL, LoadLeveler.
  • 4. The Memory-wall Problem. Processor performance increases rapidly: uni-processor ~52% per year until 2004; aggregate multi-core/many-core processor performance even higher since 2004. Memory: ~9% per year. Storage: ~6% per year. The processor-memory speed gap keeps increasing. Memory-bounded speedup (1990), memory wall problem (1994). [Figure: performance (log scale, 1 to 100,000) versus year (1980 to 2010) for memory, uni-processor, and multi-core/many-core processors, annotated with growth rates 25%, 52%, 20%, 9%, 60%, 9%. Sources: Intel, OCZ.]
  • 5. Current Solution: Memory Hierarchy. Upper levels are faster; lower levels are larger. Each level is listed as capacity; access time, bandwidth; and the staging transfer unit to the next lower level (managed by, size). CPU registers: <8 KB; <0.2~0.5 ns, 500~800 GB/s/core; instruction operands (program/compiler, 1-8 bytes). Cache: <50 MB; 1-10 ns, 50~150 GB/s/core; blocks (cache controller, 32-128 bytes). Main memory: gigabytes; 50-100 ns, 5~10 GB/s/channel; pages (OS, 4K-4M bytes). Disk: terabytes; 5 ms, 100~300 MB/s; files (user/operator, Mbytes). Tape: petabytes or infinite; sec-min.
  • 6. Fig 1. Cost variation pattern
  • 7. I/O becomes the Bottleneck of HPC
  • 8. Solution: Parallel I/O Systems. Architecture of a typical parallel I/O system. [Figure: multiple file servers.]
  • 9. Problem of PFS in Big Data Applications. [Figure: data blocks 0-9 striped across file servers, ideal situation versus actual situation.] If everyone requests data: contention. Concurrency is more than increased bandwidth; we need to measure the contribution of concurrency.
  • 10. Memory Concurrency. CPU: multi-core, multi-threading, multi-issue. Cache: multi-banked cache, multi-level cache. Memory: multi-channel, multi-rank, multi-bank. Also (mostly hiding): out-of-order execution, speculative execution, runahead execution (CPU); pipelined cache, non-blocking cache, data prefetching, write buffer (cache). Input-Output (I/O): parallel file system, disks; pipeline, non-blocking, prefetching, write buffer.
  • 11. Model of Data Access (Memory Hierarchy). The conventional AMAT (Average Memory Access Time) model: $\mathrm{AMAT} = H_1 + MR_1 \times AMP_1$, where $AMP_1 = H_2 + MR_2 \times AMP_2$ and $H_i$ is the hit time (HitCycle) of level $i$; expanded, $\mathrm{AMAT} = H_1 + MR_1 \times (H_2 + MR_2 \times (H_3 + MR_3 \times AMP_3))$. The new C-AMAT (Concurrent-AMAT) model: $\text{C-AMAT}_1 = \frac{H_1}{C_{H_1}} + MR_1 \times \kappa_1 \times \text{C-AMAT}_2$, where $\text{C-AMAT}_2 = \frac{H_2}{C_{H_2}} + pMR_2 \times \frac{pAMP_2}{C_{M_2}}$ and $\kappa_1 = \frac{pMR_1}{MR_1} \times \frac{pAMP_1}{AMP_1} \times \frac{C_{m_1}}{C_{M_1}}$. Here $\kappa_1$ is the overlapping contribution. Reference: X.-H. Sun and D. Wang, "Concurrent Average Memory Access Time," IEEE Computer, vol. 47, no. 5, pp. 74-80, May 2014.
  • 12. Data Access Time: AMAT. Average Memory Access Time (AMAT) = Thit(L1) + Miss%(L1) × (Thit(L2) + Miss%(L2) × (Thit(L3) + Miss%(L3) × T(memory))). Example hit times: L1 cache = 1 clk, L2 cache = 10 clks, L3 cache = 20 clks (caches on-die), main memory (DRAM) = 300 clks. Miss rates: L1 = 10%, L2 = 5%, L3 = 1% (be careful with the miss-rate definition). AMAT = 1 + 0.10 × (10 + 0.05 × (20 + 0.01 × 300)) = 2.115 clks.
  • 13. Data Access Time: C-AMAT. Concurrent Average Memory Access Time: $\text{C-AMAT} = \frac{H_1}{C_{H_1}} + MR_1 \times \kappa_1 \times \left( \frac{H_2}{C_{H_2}} + MR_2 \times \kappa_2 \times \left( \frac{H_3}{C_{H_3}} + MR_3 \times \kappa_3 \times \frac{H_{Mem}}{C_{H_{Mem}}} \right) \right)$. Example hit times: L1 cache = 1 clk, L2 cache = 10 clks, L3 cache = 20 clks, main memory (DRAM) = 300 clks. Hit concurrency: L1 = 2, L2 = 3, L3 = 4, memory = 6. Miss rates: L1 = 10%, L2 = 5%, L3 = 1%. Values of pMR, pAMP, AMP, CM, Cm: L1 = 7%, 10, 10, 5, 4; L2 = 3%, 60, 40, 9, 6; L3 = 0.8%, 400, 300, 16, 12. κ: L1 = 0.56, L2 = 0.6, L3 = 0.8. C-AMAT ≈ 0.696.
  • 14. What Does C-AMAT Say? C-AMAT is an extension of AMAT to consider concurrency. C-AMAT can be measured at each layer with APC. C-AMAT is data-centric thinking: data access is as important as computing; high locality may hurt performance; the Pure Miss concept; balance locality, concurrency, and overlapping with C-AMAT. C-AMAT uniquely integrates the joint impact of locality, concurrency, and overlapping for optimization (analysis and measurement).
  • 15. Pace Matching Data Access (搏动数据获取, "pulsating data access"). No-delay data access, using C-AMAT as the gate to guide and LPM as the global controller for in-situ optimization. [Figure: Layers 1 to 4 between the processor side and the off-chip side; Cases #1 to #4.]
  • 16. IO Optimization: from Prefetching to Merging. Application-aware optimization; methods (LPM) and software; model (C-AMAT) measurement. Prefetching: SC'08, "Parallel I/O Prefetching Using MPI File Caching and I/O Signatures". Data layout: HPDC11, "A Cost-intelligent Application-specific Data layout Scheme for Parallel File Systems". Data coordination: SC11, "Server-Side I/O Coordination for Parallel File Systems". Data organization: PDSW11, "Pattern-aware File Reorganization in MPI-IO". Data replication: IPDPS13, "Layout-Aware Replication Scheme for Parallel I/O Systems". Cache: ICDCS14, "S4D-Cache: Smart Selective SSD Cache for Parallel I/O Systems". Heterogeneity: IPDPS15, "HAS: Heterogeneity-Aware Selective Data Layout Scheme for Parallel File Systems on Hybrid Servers". Merging: BigData15, "PortHadoop: Support Direct HPC Data Processing in Hadoop". Data compression. Website: www.cs.iit.edu/~scs/iosig/
  • 17. Merging with Industrial Solution: Distributed I/O Systems. Architecture of a typical HDFS system. Models show that, for many applications, the best optimization is to use HDFS.
  • 18. Integrated data access system (IDAS). Two camps: MPI for HPC; MapReduce for big data. Native data access: PFS (POSIX I/O and MPI-IO); Big Data (HDFS API). Limited support for data access: use explicit serial copy; redundant data store in HDFS. Merging is more than access (maintain the merits). Reference: Yang, N. Liu, B. Feng, X.-H. Sun and S. Zhou, "PortHadoop: Support Direct HPC Data Processing in Hadoop," in Proc. of IEEE International Conference on Big Data (IEEE BigData 2015), Santa Clara, CA, USA, Oct. 2015.
  • 19. Application: NASA Earth Science Data Analysis (Super-Cloud Project). Simulation is done in an MPI environment and data analysis is done in a Hadoop environment.
  • 20. PortHadoop-R vs. Conventional Methods: Performance (Vis-only). [Figure: elapsed time (seconds, 0 to 6000) versus number of files (48, 96, 192, 384) for Serial Vis, Serial Copy, Parallel Vis, Parallel Copy, PortHadoop-R Vis, and Serial Animation.] Serial Copy: one copy command issued on one Hadoop node. Parallel Copy: 4 copy commands issued on each Hadoop node (32 in total). Serial Vis: read all data into one fat node, then visualize. Parallel Vis: visualize all files on all Hadoop nodes. PortHadoop-R Vis: visualize all files into images. Serial Animation: merge all images into one animation. Speedup_PortHadoop = 1.47; Speedup_total = 15. Good speedup: total time is reduced by up to 15x compared to the serial solution and 1.47x compared to the parallel solution.
  • 21. What Are the Challenges in Computing? When communication is the issue: domain decomposition, pipelining. When locality (memory hierarchy) is the concern: block-based computing. When memory bound is the constraint: memory bound analysis. When data access concurrency is a determining factor: what is the new frame of algorithm design?
  • 22. How We Can Help You: if you have big data applications; if you have data-intensive scientific applications; if you need both computing and data power; if you are interested in data-centric algorithm design. How You Can Help Us: application examples; methods; probability sampling, etc.
  • 23. CS570: Advanced Computer Architecture. An Unbalanced System. Source: Bob Colwell keynote, ISCA'29, 2002, http://systems.cs.colorado.edu/ISCA2002/Colwell-ISCA-KEYNOTE-2002-final.ppt
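
The worked examples on slides 12 and 13 can be checked numerically. The sketch below is not from the deck: the function names (`amat`, `kappa`, `c_amat`) and the list-based layout are assumptions chosen for illustration, and the parameter names follow the slide symbols (pMR, pAMP, AMP, CM, Cm). It evaluates the nested AMAT and C-AMAT expressions from the innermost term outward, and it reproduces the slide values AMAT = 2.115, κ = 0.56, 0.6, 0.8, and C-AMAT ≈ 0.696.

```python
"""Numerical check of the AMAT and C-AMAT examples (slides 12 and 13)."""


def amat(hit_times, miss_rates, memory_time):
    """Conventional AMAT: H1 + MR1 * (H2 + MR2 * (H3 + MR3 * T_mem))."""
    value = memory_time
    # Start at the last cache level and fold toward L1.
    for hit, miss in zip(reversed(hit_times), reversed(miss_rates)):
        value = hit + miss * value
    return value


def kappa(p_mr, mr, p_amp, amp, c_M, c_m):
    """Overlapping factor: (pMR / MR) * (pAMP / AMP) * (Cm / CM)."""
    return (p_mr / mr) * (p_amp / amp) * (c_m / c_M)


def c_amat(hit_times, hit_conc, miss_rates, kappas, memory_time, memory_conc):
    """C-AMAT: H1/CH1 + MR1 * k1 * (H2/CH2 + MR2 * k2 * (... HMem/CHMem))."""
    value = memory_time / memory_conc
    levels = zip(reversed(hit_times), reversed(hit_conc),
                 reversed(miss_rates), reversed(kappas))
    for hit, conc, miss, k in levels:
        value = hit / conc + miss * k * value
    return value


# Slide values, ordered L1, L2, L3 (times in clock cycles).
hit_times = [1, 10, 20]
miss_rates = [0.10, 0.05, 0.01]
memory_time = 300
hit_conc = [2, 3, 4]
memory_conc = 6

# Per-level (pMR, pAMP, AMP, CM, Cm), in the order given on slide 13.
pure_miss_data = [
    (0.07, 10, 10, 5, 4),
    (0.03, 60, 40, 9, 6),
    (0.008, 400, 300, 16, 12),
]

kappas = [
    kappa(p_mr, mr, p_amp, amp, c_M, c_m)
    for (p_mr, p_amp, amp, c_M, c_m), mr in zip(pure_miss_data, miss_rates)
]

amat_value = amat(hit_times, miss_rates, memory_time)
c_amat_value = c_amat(hit_times, hit_conc, miss_rates, kappas,
                      memory_time, memory_conc)

print(f"AMAT   = {amat_value:.3f} clks")
print("kappa  =", [round(k, 3) for k in kappas])
print(f"C-AMAT = {c_amat_value:.3f} clks")
```

With the same miss rates and hit times, the only differences between the two results are the hit concurrency and the κ factors, which is how C-AMAT ends up at roughly a third of AMAT in this example.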