Cache Coherence for GPU Architectures

Inderpreet Singh, Arrvindh Shriraman, Wilson W. L. Fung, Mike O'Connor, Tor M. Aamodt, Cache Coherence for GPU Architectures, in Proceedings of the 19th IEEE International Symposium on High-Performance Computer Architecture (HPCA-19)

Agenda
• Challenges with CPU coherence on GPUs.
• Temporal Coherence: Rethinking coherence for GPUs
• What is the cost of providing coherence?

Why provide coherence?
1. Inter-workgroup communication
2. Atomic operations
   (Characterizing and Evaluating a Key-value Store Application on Heterogeneous CPU-GPU Systems, ISPASS 2012)
3. Task queues

Cache Coherence
Programmer's view: processors (P P P P) on top of one shared memory.
Appearance: One global copy of every location

Cache Coherence
[Diagram: Multicores and GPUs. In each, private L1 caches sit above an L2 and memory.]

Cache Coherence
[Diagram: Heterogeneous Systems. The multicore and GPU cache hierarchies (L1s above L2s) share one memory.]
How to provide coherence?

Challenges

Challenges with coherence
[Diagram: two L1 caches above a shared L2, with protocol messages numbered 1, 2, 3.]

Challenge 1: Traffic
[Diagram: L1 caches above a shared L2.]
30% more traffic than current GPUs

Challenge 2: Buffer Overhead
[Diagram: L1 caches above a shared L2 that holds a protocol buffer.]
Coherence protocol buffers require 28% of total L2

Challenge 3: Complexity
[Diagram: two L1 caches above a shared L2, with protocol messages numbered 1, 2, 3.]
Incoherent protocol: 4 states
Coherent protocol: 16 states

Coherence Overhead
Coherence messages cause:
1. Traffic overhead
2. Area overhead
3. Protocol complexity

How to achieve coherence without messages?

TEMPORAL COHERENCE

Temporal Coherence
Time-based Approach
- trigger protocol events on timer alerts
[Diagram: two L1 caches above a shared L2.]

Temporal Coherence
[Diagram: a clock (TIME), two L1 caches, and a shared L2.]
- Load: the L1 copy carries a local timestamp (LT). Valid while TIME has not passed LT.
- The shared L2 keeps a global timestamp (GT) for the line. Shared while TIME has not passed GT.

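The two timestamp checks can be written down as a minimal Python sketch. The class and function names are hypothetical, and the strict `<` comparison is an assumption, since the operator on the slide is not legible.

```python
from dataclasses import dataclass


@dataclass
class L1Line:
    data: int
    local_ts: int  # LT: time until which this private copy may be used


@dataclass
class L2Line:
    data: int
    global_ts: int = 0  # GT: latest expiry handed out to any L1 copy


def l1_valid(line: L1Line, now: int) -> bool:
    # Assumption: strict "<"; the slide's comparison operator is not legible.
    return now < line.local_ts


def l2_shared(line: L2Line, now: int) -> bool:
    # While this holds, some L1 may still hold an unexpired copy.
    return now < line.global_ts
```

Under these assumptions an L1 never needs an invalidation message: its copy simply stops being valid once the clock reaches LT.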
Temporal Coherence (example)
TIME 0: One L1 issues a load. Its copy is valid till 20, and the L2 marks the line shared till 20.
TIME 5: Another L1 loads the same line. Its copy is valid till 25, and the L2 now marks the line shared till 25.
TIME 15: A third L1 issues a write. The line is still shared till 25, so the write waits.
TIME 20: The copy with lifetime 20 expires. The write still waits.
TIME 25: The copy with lifetime 25 expires. The line is no longer shared and the write completes.

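The timeline above can be replayed with a small sketch. It assumes a fixed lifetime of 20 per load (which matches the 20 and 25 on the slides) and uses hypothetical names. It models only the L2-side bookkeeping, not the interconnect.

```python
class SharedL2Line:
    """One L2 line under the strong (stall-on-write) variant."""

    def __init__(self, data=0):
        self.data = data
        self.global_ts = 0  # latest expiry granted to any L1

    def load(self, now, lifetime):
        """Serve an L1 miss: return (data, expiry) and extend the global timestamp."""
        expiry = now + lifetime
        self.global_ts = max(self.global_ts, expiry)
        return self.data, expiry

    def store(self, now, value):
        """Apply a write once every granted lifetime has run out.

        Returns the time at which the write takes effect.
        """
        done = max(now, self.global_ts)
        self.data = value
        return done


line = SharedL2Line(data=1)
_, lt_a = line.load(now=0, lifetime=20)  # first reader, copy valid till 20
_, lt_b = line.load(now=5, lifetime=20)  # second reader, copy valid till 25
done = line.store(now=15, value=2)       # writer arrives while the line is shared
stall = done - 15                        # cycles the write spent waiting
```

The write is the only operation that waits, and it waits on the clock rather than on acknowledgements from the L1s.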
Temporal Coherence
No coherence messages
All transactions are 2-hop
Protocol complexity minimal
Supports strong and weak memory models
Enables optimized communication (ask me later...)

How to set the block lifetime?
• Longer = writes may stall
• Shorter = may not exploit temporal locality
• Lifetime predictor at L2, updated on these events:
-Load to expired block (for temporal locality)
-Store to unexpired block (reduce write stalls)
-Eviction of unexpired block (reduce L2 eviction stalls)

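One way such a predictor could react to the three events is sketched below. The direction of each adjustment follows the goals in parentheses above. The step size, the bounds, the initial value, and all names are assumptions for illustration, not values taken from the talk.

```python
class LifetimePredictor:
    """Hypothetical single-value lifetime predictor kept at the L2."""

    def __init__(self, initial=20, step=4, minimum=0, maximum=1024):
        self.lifetime = initial
        self.step = step
        self.minimum = minimum
        self.maximum = maximum

    def _adjust(self, delta):
        # Clamp so the prediction stays inside [minimum, maximum].
        self.lifetime = max(self.minimum, min(self.maximum, self.lifetime + delta))

    def on_load_to_expired_block(self):
        # The copy expired before it was reused: predict longer lifetimes.
        self._adjust(+self.step)

    def on_store_to_unexpired_block(self):
        # A write had to wait on a live lifetime: predict shorter lifetimes.
        self._adjust(-self.step)

    def on_eviction_of_unexpired_block(self):
        # An L2 eviction had to wait on a live lifetime: predict shorter lifetimes.
        self._adjust(-self.step)


p = LifetimePredictor()
p.on_load_to_expired_block()
p.on_store_to_unexpired_block()
p.on_eviction_of_unexpired_block()
```

With the default values in this sketch the three calls move the prediction from 20 to 24, back to 20, and then to 16.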
Temporal Coherence (Weak)
[Diagram: a write arrives at the shared L2 while L1 copies with lifetimes 20 and 25 are still unexpired.]
- Sensitive to misprediction
- Resource stalls
- Hurts GPU applications

Goal: eliminate write stalls!

Temporal Coherence (Weak) (example)
TIME 15: A write is issued while two L1s hold OLD copies with lifetimes 20 and 25. The write does not wait; the writer is told 25, the time at which the line stops being shared. A fence issued after the write waits.
TIME 20: The OLD copy with lifetime 20 expires. The fence still waits.
TIME 25: The OLD copy with lifetime 25 expires. No OLD copy can be read any more, and the fence completes.

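The weak variant can be sketched in the same style. Here the write is applied immediately and only the fence waits. The names, including `visible_by`, are hypothetical, and the sketch tracks a single writer core.

```python
class WeakL2Line:
    """One L2 line under the weak variant: writes never wait at the L2."""

    def __init__(self, data, global_ts):
        self.data = data
        self.global_ts = global_ts  # time at which the last L1 copy expires

    def store(self, now, value):
        """Apply the write now; return the time by which stale copies are gone."""
        self.data = value
        return max(now, self.global_ts)


class Core:
    """Tracks when all of this core's earlier writes become globally visible."""

    def __init__(self):
        self.visible_by = 0

    def write(self, line, now, value):
        self.visible_by = max(self.visible_by, line.store(now, value))
        return now  # the core continues without stalling

    def fence(self, now):
        # The fence is the only point that waits on outstanding lifetimes.
        return max(now, self.visible_by)


line = WeakL2Line(data="OLD", global_ts=25)
core = Core()
after_write = core.write(line, now=15, value="NEW")
after_fence = core.fence(now=after_write)
```

Moving the wait from the write to the fence is what lets a mispredicted (too long) lifetime cost nothing for code that never fences.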
Temporal Coherence (Weak)
No Access Stalls
Efficient GPU applications
Aggressive lifetime predictors
Supports weak memory models
Coherence Applications
• Lock-based programs
-Barnes Hut
-Cloth Physics
-Place-and-Route
• Stencil
-Max-Flow Min-cut
-3D equation solver
• Load balancing
-Octree Partitioning

Interconnect Traffic
GPU Applications (do not need coherence)
[Bar chart: interconnect traffic for NO.CC, MESI, GPU-VI, and TC. Annotations on the slide: "2.3", "Wr-Through", ".8x", ".3x", "No msgs".]

Speedup
Coherence Applications
[Bar chart: speedup for NO L1, MESI, GPU-VI, and TC. Annotation on the slide: "Need a 32KB directory".]

Protocol Complexity (number of states)

                NonCoherent   GPU-VI   Temporal Coherence
L1 Stable            2           2             2
L1 Transient         2           1             1
L2 Stable            2           5             5
L2 Transient         2          10             3

What did we learn
• Throughput and heterogeneous architectures require a more streamlined caching framework.
• Single-chip integration enables mechanisms that we can exploit to simplify communication protocols.
• Efficient coherence protocols enable programmers to deploy accelerators for wider purposes.

Contact:
ashriram@cs.sfu.ca
or
aamodt@ece.ubc.ca
• Obtain GPGPU-Sim with coherence support
http://www.ece.ubc.ca/~isingh/gpgpusim-ruby.tar.gz

Interconnect Energy
[Bar charts: normalized interconnect energy and power for inter-workgroup and intra-workgroup applications, broken down into Router (Static), Router (Dynamic), Link (Static), and Link (Dynamic), for NO-COH / NO-L1, MESI, GPU-VI, GPU-VIni, and TCW.]

Figure 8. Breakdown of interconnect traffic for coherent and non-coherent GPU memory systems. (a) Inter-workgroup communication (b) Intra-workgroup communication
[Stacked bar charts: traffic components RCL, ST, LD, REQ, INV, ATO for NO-COH / NO-L1, MESI, GPU-VI, GPU-VIni, and TCW; benchmarks BH, CC, CL, DLB, STN, VPR, HSP, KMN, LPS, NDL, RG, SR, and AVG.]

TC-Strong vs TC-Weak
[Bar charts: speedup over all applications for TCSUO, TCSOO, TCS, TCW, and TCW w/ predictor. One panel uses a fixed lifetime for all applications, the other the best lifetime for each application.]

PL-4049, Cache Coherence for GPU Architectures, by Arvindh Shriraman and Tor Aamodt