Should I care about CPU cache?
Kamil Witecki
Disclaimers
Key notes
Relative latency, organization, profits?
What is average DRAM latency?
[Figure: DRAM latency (ns, 0 to 25) versus transfer speed (MT/s, 0 to 2500)]
And end to end?
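[Diagram: DRAM array (row 0 ... row N) with data, address, and control lines. Switching a row from active to inactive adds latency; switching another row from inactive to active adds more. Requests travel from the CPU core through the DRAM controller to the DRAM.]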
end to end latency: 50-100ns
Is 50ns a lot?
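What                      Duration
Reference                 50 ns
1 clock cycle @ 20 MHz    50 ns
1 clock cycle @ 1 GHz     1 ns
1 clock cycle @ 5 GHz     0.2 ns
Fun fact: the Intel 8080 ran at 2 MHz, so one of its clock cycles took 500 ns.
Fun fact: 0.2 ns vs 50 ns is a factor of 250, roughly 1 heartbeat vs boiling 1 liter of water.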
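To make the comparison concrete, the following is a minimal sketch that converts a memory latency into the number of core clock cycles that pass while waiting. The function name `cycles_during` and its parameters are illustrative and do not come from the slides.

```cpp
#include <cassert>
#include <cstdint>

// Number of whole core clock cycles that elapse during one memory access.
// latency_ns: access latency in nanoseconds; clock_mhz: core clock in MHz.
// cycles = latency [s] * frequency [Hz] = latency_ns * clock_mhz / 1000
constexpr std::int64_t cycles_during(std::int64_t latency_ns,
                                     std::int64_t clock_mhz) {
    return latency_ns * clock_mhz / 1000;
}

// A 50 ns access costs 50 cycles at 1 GHz and 250 cycles at 5 GHz.
static_assert(cycles_during(50, 1000) == 50, "1 GHz");
static_assert(cycles_during(50, 5000) == 250, "5 GHz");
```

At the upper end of the 50-100ns range, `cycles_during(100, 5000)` gives 500 cycles for a single access.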
And how to counter that?
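[Diagram: the same DRAM array (row 0 ... row N; data, address, and control lines; rows switching between active and inactive), DRAM controller, and CPU core, now with a cache added between the CPU core and the DRAM controller]
end to end latency: 50-100ns
DRAM <-> Cache latency: 50-100ns
Cache latency: 1 clock cycle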
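One common way to estimate what a cache buys is the average memory access time: hit time plus miss rate times miss penalty. This is a textbook formula, not something stated on the slides. The sketch below assumes a 1 GHz core, so that a 1-cycle cache hit costs 1 ns, and a 50 ns penalty for going to DRAM; the function name is illustrative.

```cpp
#include <cassert>

// Average access time in ns = hit time + miss rate * miss penalty.
// miss_rate is a fraction in [0, 1].
constexpr double average_access_ns(double hit_ns, double miss_rate,
                                   double miss_penalty_ns) {
    return hit_ns + miss_rate * miss_penalty_ns;
}

// Every access hits: only the cache latency is paid.
static_assert(average_access_ns(1.0, 0.0, 50.0) == 1.0, "all hits");
// Every access misses: cache lookup plus the full trip to DRAM.
static_assert(average_access_ns(1.0, 1.0, 50.0) == 51.0, "all misses");
```

Under these assumptions even a 50% miss rate yields 26 ns on average, so the benefit depends heavily on how often accesses actually hit.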
Memory hierarchy
L1 64KiB
L2 512KiB
L3 8MiB
RAM 64MiB
Associativity or how to map memory
Must be fast, and efficient in die size and power.
[Diagram: lines 0-7 of level Ln+1 mapped onto lines 0-3 of level Ln]
Example: direct mapping
Selection: address mod 4
Consequences:
- simple: fast, and efficient in die size and power
- good best case: sequential traversal is optimal
- worst case: jumping every nth line, so that every access maps to the same slot
Example: 2-way set associative
Selection: address mod 2 (picks set 0 or set 1)
Then: Least Recently Used (picks the line to replace within the set)
Consequences:
- complexity grows with the number of ways
- 15% fewer cache misses
- avoids N-parallel stalls
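The two mapping schemes can be compared with a toy model. The sketch below is an illustration only, not a description of real hardware: `ToyCache` and `count_misses` are hypothetical names, addresses are whole line numbers, the set is chosen as address mod number-of-sets, and the least recently used line in a set is evicted. With one way per set it behaves as a direct-mapped cache.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy cache: `sets` sets holding up to `ways` line numbers each.
class ToyCache {
public:
    ToyCache(std::size_t sets, std::size_t ways) : ways_(ways), sets_(sets) {
        assert(sets > 0 && ways > 0);
    }

    // Returns true on a hit; on a miss the line is loaded and false returned.
    bool access(std::size_t line) {
        auto& set = sets_[line % sets_.size()];
        for (std::size_t i = 0; i < set.size(); ++i) {
            if (set[i] == line) {
                // Hit: move the line to the back (most recently used).
                set.erase(set.begin() + static_cast<std::ptrdiff_t>(i));
                set.push_back(line);
                return true;
            }
        }
        // Miss: if the set is full, evict the front (least recently used).
        if (set.size() == ways_) set.erase(set.begin());
        set.push_back(line);
        return false;
    }

private:
    std::size_t ways_;
    std::vector<std::vector<std::size_t>> sets_;
};

// Counts the misses produced by a sequence of line accesses.
std::size_t count_misses(ToyCache cache, const std::vector<std::size_t>& lines) {
    std::size_t misses = 0;
    for (std::size_t line : lines) {
        if (!cache.access(line)) ++misses;
    }
    return misses;
}
```

Both `ToyCache(4, 1)` (direct mapping, address mod 4) and `ToyCache(2, 2)` (2-way, address mod 2) hold four lines. For the access pattern 0, 4, 0, 4, 0, 4 the direct-mapped model misses all six times, because lines 0 and 4 compete for the same slot, while the 2-way model misses only on the first two accesses.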
Cache line
L1
Cache line: 32B
Cache line - alignment consequences
Cache line: 32B
Object size: 4B
Alignment: 2B -> alignas(2) std::byte x[4];
Alignment: 4B -> int32_t;
Alignment: 32B -> alignas(32) int32_t x[4];
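The consequence of alignment can be checked with a little arithmetic: an object straddles two cache lines when its first and last byte fall into different lines. The sketch below uses hypothetical helper names and counts, for one 32B line, how many of the start offsets permitted by a given alignment make a 4B object cross into the next line.

```cpp
#include <cassert>
#include <cstddef>

// True if an object of `size` bytes starting at byte `offset`
// has its first and last byte in different cache lines.
constexpr bool straddles_line(std::size_t offset, std::size_t size,
                              std::size_t line_size) {
    return offset / line_size != (offset + size - 1) / line_size;
}

// Among the start offsets within one line that `alignment` permits,
// counts those where the object crosses a line boundary.
constexpr std::size_t straddling_positions(std::size_t size,
                                           std::size_t alignment,
                                           std::size_t line_size) {
    std::size_t count = 0;
    for (std::size_t offset = 0; offset < line_size; offset += alignment) {
        if (straddles_line(offset, size, line_size)) ++count;
    }
    return count;
}

// 4B object, 2B alignment: offset 30 crosses into the next line.
static_assert(straddling_positions(4, 2, 32) == 1, "2B alignment");
// 4B object, 4B alignment: never crosses a line boundary.
static_assert(straddling_positions(4, 4, 32) == 0, "4B alignment");
```

An access to a straddling object touches two lines instead of one, which is why natural alignment (4B for a 4B object) is the safer default.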
Cache line - write consequences
Cache line: 32B
Object size: 4B
Alignment: 4B
Cache line: 32B (invalidated)
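Assuming the invalidation refers to copies of the line held by other cores, a write to one 4B object also invalidates the seven neighbours sharing its 32B line. A common countermeasure, sketched here with hypothetical type names and an assumed 32B line size, is to align frequently written objects so that each gets a line of its own.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLineSize = 32;  // assumed line size, as on the slide

// Eight of these share one line: a write to one invalidates the others.
struct PackedCounter {
    std::int32_t value;
};

// One per line: a write never invalidates another PaddedCounter.
struct alignas(kLineSize) PaddedCounter {
    std::int32_t value;
};

// Index of the cache line that element `index` of an array of T falls
// into, assuming the array starts on a line boundary.
template <typename T>
constexpr std::size_t line_of(std::size_t index) {
    return index * sizeof(T) / kLineSize;
}

static_assert(sizeof(PackedCounter) == 4, "packed counters are 4B");
static_assert(sizeof(PaddedCounter) == kLineSize, "padded to a full line");
static_assert(line_of<PackedCounter>(0) == line_of<PackedCounter>(7),
              "eight packed counters share line 0");
static_assert(line_of<PaddedCounter>(0) != line_of<PaddedCounter>(1),
              "padded counters never share a line");
```

The trade-off is memory: the padded layout uses eight times the space, so it only pays off for data written concurrently from several cores.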
Keeping caches hot
// data: vector of objects composed of:
// int32_t type , string name , data* parent ,
// map<string , string > params
// task: find by type
Naive C-style!
#include <algorithm>
#include <cstdint>
#include <experimental/memory>
#include <map>
#include <string>
#include <utility>
#include <vector>
using namespace std;
using experimental::observer_ptr;

struct data {
  int32_t type;
  string name;
  observer_ptr<data> parent;
  map<string, string> params;
};
// lets equal_range compare a data object against a bare type value
struct by_type {
  bool operator()(data const& d, int32_t t) const { return d.type < t; }
  bool operator()(int32_t t, data const& d) const { return t < d.type; }
};
// task: find by type (vector is sorted by type)
pair<size_t, size_t>
find_by_type(vector<data> const& x, int32_t type) {
  auto r = equal_range(begin(x), end(x), type, by_type{});
  return {r.first - begin(x), r.second - begin(x)};
}
Layout
sizeof(data);     // : 64B
offsetof(type);   // : 0B
offsetof(name);   // : 8B
offsetof(parent); // : 40B
offsetof(params); // : 48B
alignof(data);    // : 64B
// Cache line: 64B
Line 1: Type #1 | padding | Name #1 | Parent #1 | Params #1
Line 2: Type #2 | padding | Name #2 | Parent #2 | Params #2
C++ style!
struct data {
  string name;
  observer_ptr<data> parent;
  map<string, string> params;
};
// the type becomes the container key; lookup uses the container's own equal_range
boost::container::flat_map<int32_t, data>::equal_range;
std::map<int32_t, data>::equal_range;
std::unordered_map<int32_t, data>::equal_range;
[Figure: lookup time (ns, 0 to 10000) versus number of objects (1e+04 to 1e+08) for flat-map, map, naive, and unordered-map. Machine: AMD Athlon II X3 440, 12GiB RAM DDR3-1333, clang 8.0.1-3]
[Figure: the same measurement on AMD Ryzen 1600X, 16GiB RAM DDR4-3200, clang 8.0.1-3]
Will separate array do better?
struct arr {
  vector<int32_t> types;
  vector<entry> entries;
};
pair<size_t, size_t>
find_by_type(arr const& d, int32_t type) {
  auto b = begin(d.types);
  auto r = equal_range(b, end(d.types), type);
  return {r.first - b, r.second - b};
}
Layout
sizeof(type);  // : 4B
alignof(type); // : 4B
// Cache line: 64B
Type #1 Type #2 Type #3 Type #4 Type #5 Type #6 Type #7 Type #8 Type #9 Type #10 Type #11 Type #12 Type #13 Type #14 Type #15 Type #16
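The layout above can be tried out with a self-contained sketch. `Entry` is a hypothetical stand-in for the non-key fields, since the original `entry` type is not shown, and `keys_per_line` is an illustrative helper that assumes a 64B cache line.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the fields that are not searched.
struct Entry {
    std::string name;
};

struct Arr {
    std::vector<std::int32_t> types;  // sorted keys, densely packed
    std::vector<Entry> entries;       // entries[i] belongs to types[i]
};

// Returns the index range [first, last) of elements whose key equals `type`.
std::pair<std::size_t, std::size_t> find_by_type(const Arr& d,
                                                 std::int32_t type) {
    auto b = d.types.begin();
    auto r = std::equal_range(b, d.types.end(), type);
    return {static_cast<std::size_t>(r.first - b),
            static_cast<std::size_t>(r.second - b)};
}

// How many elements of a given size fit into one cache line.
constexpr std::size_t keys_per_line(std::size_t element_size,
                                    std::size_t line_size = 64) {
    return line_size / element_size;
}

// Packed 4B keys: 16 per line. A 64B object per key: 1 per line.
static_assert(keys_per_line(sizeof(std::int32_t)) == 16, "packed keys");
static_assert(keys_per_line(64) == 1, "one 64B object per line");
```

With keys 1, 2, 2, 2, 5 a call to `find_by_type(d, 2)` returns the range 1 to 4. The binary search only ever touches the `types` vector, so each fetched line carries 16 candidate keys rather than one.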
[Figure: lookup time (ns, 0 to 10000) versus number of objects (1e+04 to 1e+08) for flat-map, map, naive, optimized, and unordered-map. Machine: AMD Athlon II X3 440, 12GiB RAM DDR3-1333, clang 8.0.1-3]
[Figure: the same measurement on AMD Ryzen 1600X, 16GiB RAM DDR4-3200, clang 8.0.1-3]
NOT BAD
Instructions (1.0e+07, 2.5e+07, 5.0e+07)
[Figure: bar chart of instruction counts (axis 0 to 1500) per type: flat-map, map, naive, optimized, unordered-map]
L1 uses & misses
[Figure: bar chart of L1 misses and L1 uses (axis 0 to 300) per type: flat-map, map, naive, optimized, unordered-map]
LL miss rate
[Figure: bar chart of L3 miss rate (axis 0 to 15) per type: flat-map, map, naive, optimized, unordered-map; bar labels in extraction order: 7.7, 8.6, 9.2, 15.3, 9.8]
Questions and Answers
Kamil.Witecki@nokia.com

Code Dive 2019 - Kamil Witecki - Should I care about CPU cache?

  • 1. Should I care about CPU cache? Kamil Witecki
  • 5. Key notes: relative latency, organization, profits?
  • 6. What is average DRAM latency? [Plot: latency (ns, 0 to 25) against speed (MT/s, 0 to 2500)]
  • 7-15. And end to end? [Diagram built up over nine slides: DRAM rows 0 to N with data, address and control lines; row state labels active/inactive marked "latency", then inactive/active marked "more latency"; DRAM controller; CPU core.] End-to-end latency: 50-100 ns
  • 16-21. Is 50ns a lot? What / Duration: Reference, 50 ns; 1 clock cycle @ 2 MHz, 500 ns (fun fact: clock rate of the 8080 CPU); 1 clock cycle @ 1 GHz, 1 ns; 1 clock cycle @ 5 GHz, 0.2 ns. Fun fact: 1 heartbeat vs boiling 1 liter of water
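
The durations in this table follow from cycle time = 1 / clock rate. A minimal sketch of that arithmetic, assuming a fixed clock and ignoring everything else a core does while waiting; the helper names `cycle_ns` and `cycles_during` are made up for illustration and do not come from the talk:

```cpp
#include <cassert>
#include <cmath>

// Length of one clock cycle in nanoseconds for a clock rate given in MHz.
double cycle_ns(double clock_mhz) {
    assert(clock_mhz > 0.0);
    return 1000.0 / clock_mhz;
}

// Number of clock cycles that elapse during latency_ns,
// rounded to the nearest whole cycle.
long cycles_during(double latency_ns, double clock_mhz) {
    return std::lround(latency_ns / cycle_ns(clock_mhz));
}
```

For example, `cycles_during(50.0, 5000.0)` evaluates to 250: a core clocked at 5 GHz lets roughly 250 cycles pass during a single 50 ns memory access.
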
  • 22-25. And how to counter that? [Same diagram with a cache added.] End-to-end latency: 50-100 ns. DRAM <-> cache latency: 50-100 ns. Cache latency: 1 clock cycle
  • 26-27. Memory hierarchy: L1 64 KiB, L2 512 KiB, L3 8 MiB, RAM 64 MiB
  • 28-40. Associativity, or how to map memory. The mapping must be fast and die-size and power efficient. [Diagram: level Ln+1 with lines 0 to 7 mapped onto level Ln with lines 0 to 3.] Example: direct mapping. Selection: address mod 4. Consequences: simple; fast, die-size and power efficient; good best case (optimal for sequential traversal); bad worst case (jumping every nth line, so that every access lands in the same slot). Example: 2-way set associative. Selection: address mod 2 (sets 0 and 1), then Least Recently Used within the set. Consequences: complexity grows with the number of ways; 15% fewer cache misses; avoids N-parallel stalls
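
The two mapping schemes can be compared with a toy model. The sketch below is only an illustration under simplifying assumptions (addresses are whole line numbers, the cache holds four lines in total, replacement is strict LRU); `ToyCache` and `count_misses` are hypothetical names, not something from the talk or from real hardware:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy cache: `sets` sets of `ways` lines each, LRU replacement inside a set.
// ways == 1 models a direct-mapped cache. Addresses are line numbers, not bytes.
class ToyCache {
public:
    ToyCache(std::size_t sets, std::size_t ways) : ways_(ways), sets_(sets) {
        assert(sets > 0 && ways > 0);
    }

    // Returns true on a hit; on a miss the line is loaded and false is returned.
    bool access(std::size_t line) {
        auto& set = sets_[line % sets_.size()];  // selection: address mod number of sets
        for (std::size_t i = 0; i < set.size(); ++i) {
            if (set[i] == line) {
                // Hit: move the line to the most-recently-used position (the back).
                set.erase(set.begin() + static_cast<std::ptrdiff_t>(i));
                set.push_back(line);
                return true;
            }
        }
        if (set.size() == ways_) set.erase(set.begin());  // evict the least recently used line
        set.push_back(line);
        return false;
    }

private:
    std::size_t ways_;
    std::vector<std::vector<std::size_t>> sets_;
};

// Counts the misses produced by replaying a trace of line numbers.
std::size_t count_misses(ToyCache cache, const std::vector<std::size_t>& trace) {
    std::size_t misses = 0;
    for (std::size_t line : trace) misses += cache.access(line) ? 0 : 1;
    return misses;
}
```

Replaying the trace 0, 4, 0, 4, 0, 4 through `ToyCache(4, 1)` (direct-mapped) misses on every access, because lines 0 and 4 compete for the same slot, while `ToyCache(2, 2)` (2-way) misses only on the first two accesses. A sequential trace such as 0, 1, 2, 3, 0, 1, 2, 3 costs four misses in both configurations.
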
  • 47-52. Cache line - alignment consequences. Cache line: 32B. Object size: 4B. Alignment: 2B -> alignas(2) std::byte x[4]; Alignment: 4B -> int32_t; Alignment: 32B -> alignas(32) int32_t x[4];
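
Why alignment matters here can be checked with a little arithmetic: a 4B object that is only 2B-aligned can start at offset 30 of a 32B line and spill into the next one. A rough sketch of that check follows; the function names are illustrative assumptions, and the 32B line size is the one used on the slide rather than a property of any particular CPU:

```cpp
#include <cassert>
#include <cstddef>

// True if an object of `size` bytes placed at byte `offset` crosses a
// cache-line boundary, i.e. touches two lines instead of one.
bool straddles(std::size_t offset, std::size_t size, std::size_t line = 32) {
    assert(line > 0);
    return size > 0 && offset / line != (offset + size - 1) / line;
}

// Counts how many start offsets inside one line, among those permitted by
// `alignment`, make an object of `size` bytes straddle two lines.
std::size_t straddling_placements(std::size_t size, std::size_t alignment,
                                  std::size_t line = 32) {
    assert(alignment > 0);
    std::size_t n = 0;
    for (std::size_t off = 0; off < line; off += alignment) {
        if (straddles(off, size, line)) ++n;
    }
    return n;
}
```

With a 4B object, `straddling_placements(4, 2)` is 1 (the placement at offset 30), whereas `straddling_placements(4, 4)` and `straddling_placements(4, 32)` are both 0: natural alignment already keeps the object inside a single line.
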
  • 53-55. Cache line - write consequences. Cache line: 32B. Object size: 4B. Alignment: 4B. Result: cache line: 32B (invalidated)
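
The slide shows that touching one 4B object invalidates the whole 32B line it lives in. One common reading of this, which is an assumption here since the transcript does not spell it out, is that neighbouring objects in the same line pay for the write too. The sketch below contrasts a packed layout with a padded one; `kLine`, `PackedCounter`, `PaddedCounter`, `line_of` and `neighbours_per_line` are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLine = 32;  // line size taken from the slide, not from a real CPU

// Packed: eight 4-byte counters share one 32-byte line.
struct PackedCounter { std::int32_t value; };

// Padded: alignas rounds the size up, so each counter occupies a full line.
struct alignas(kLine) PaddedCounter { std::int32_t value; };

static_assert(sizeof(PackedCounter) == 4, "one int32_t, no padding");
static_assert(sizeof(PaddedCounter) == kLine, "padded up to one cache line");

// Index of the cache line holding element i of an array of T,
// assuming the array starts on a line boundary.
template <class T>
std::size_t line_of(std::size_t i) { return i * sizeof(T) / kLine; }

// Number of other array elements that share a line with any given element.
template <class T>
constexpr std::size_t neighbours_per_line() { return kLine / sizeof(T) - 1; }
```

In the packed layout a write to element 0 invalidates the line that also holds elements 1 to 7; in the padded layout no element has a neighbour, at the price of eight times the memory.
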
  • 56. Keeping caches hot
      // data: vector of objects composed of:
      // int32_t type, string name, data* parent,
      // map<string, string> params
      // task: find by type
  • 57. Naive C-style!
      // data: vector of objects composed of:
      // int32_t type, string name, data* parent,
      // map<string, string> params
      struct data {
        int32_t type;
        string name;
        observer_ptr<data> parent;
        map<string, string> params;
      };
      // task: find by type (vector is sorted by type)
      pair<size_t, size_t> find_by_type(vector<data> const& x, key type) {
        auto r = equal_range(begin(x), end(x), type);
        return { r.first - begin(x), r.second - begin(x) };
      }
  • 58. Layout
      sizeof(data);      // 64B
      offsetof(type);    // 0B
      offsetof(name);    // 8B
      offsetof(parent);  // 40B
      offsetof(params);  // 48B
      alignof(data);     // 64B
      // Cache line: 64B
      [Diagram: one object per cache line. Line 1: Type #1, (padding), Name #1, Parent #1, Params #1. Line 2: Type #2, (padding), Name #2, Parent #2, Params #2]
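
The slide code is shorthand; as written, `equal_range` has no way to compare a `data` object with a bare key. Below is one possible compilable version, offered as a sketch rather than the speaker's actual benchmark code. It assumes a raw `data*` in place of `observer_ptr<data>` and introduces a hypothetical comparator `by_type`:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct data {
    std::int32_t type;
    std::string name;
    data* parent;  // stands in for observer_ptr<data> (assumption)
    std::map<std::string, std::string> params;
};

// Heterogeneous comparator: lets equal_range compare an element with a bare key.
struct by_type {
    bool operator()(data const& d, std::int32_t key) const { return d.type < key; }
    bool operator()(std::int32_t key, data const& d) const { return key < d.type; }
};

// Returns the index range [first, second) of the elements whose type equals `type`.
// Precondition: x is sorted by type.
std::pair<std::size_t, std::size_t>
find_by_type(std::vector<data> const& x, std::int32_t type) {
    auto r = std::equal_range(x.begin(), x.end(), type, by_type{});
    return { static_cast<std::size_t>(r.first - x.begin()),
             static_cast<std::size_t>(r.second - x.begin()) };
}
```

Every probe of the binary search reads a single `type` field but pulls in the cache line of a whole object, which is the inefficiency the layout slide points at.
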
  • 59. C++ style!
      // data: vector of objects composed of:
      // int32_t type, string name, data* parent,
      // map<string, string> params
      struct data {
        string name;
        observer_ptr<data> parent;
        map<string, string> params;
      };
      boost::flat_map<int32_t, data>::equal_range;
      std::map<int32_t, data>::equal_range;
      std::unordered_map<int32_t, data>::equal_range;
  • 60. [Benchmark plot] AMD Athlon™ II X3 440, 12 GiB RAM DDR3-1333, clang 8.0.1-3. Time (ns, 0 to 10000) against number of objects (1e+04 to 1e+08); series: flat-map-clang, map-clang, naive-clang, unordered-map-clang
  • 61. [Benchmark plot] AMD Ryzen™ 1600X, 16 GiB RAM DDR4-3200, clang 8.0.1-3. Time (ns, 0 to 10000) against number of objects (1e+04 to 1e+08); series: flat-map-clang, map-clang, naive-clang, unordered-map-clang
  • 62. Will separate array do better?
      struct arr {
        vector<int32_t> types;
        vector<entry> entries;
      };
      pair<size_t, size_t> find_by_type(arr const& d, int32_t type) {
        auto b = begin(d.types);
        auto r = equal_range(b, end(d.types), type);
        return { r.first - b, r.second - b };
      }
  • 63. Layout
      sizeof(type);   // 4B
      alignof(data);  // 4B
      // Cache line: 64B
      [Diagram: one cache line holds sixteen keys, Type #1 to Type #16]
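
A compilable sketch of the separate-array idea is given below. The `entry` struct is not defined in the transcript, so its fields here are an assumption (the old `data` minus the key), as is the constant `types_per_line`:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical payload type: everything from the naive struct except the key.
struct entry {
    std::string name;
    entry* parent;
    std::map<std::string, std::string> params;
};

// Keys and payloads live in parallel arrays: entries[i] belongs to types[i].
struct arr {
    std::vector<std::int32_t> types;  // kept sorted
    std::vector<entry> entries;
};

// Number of keys that fit into one 64-byte cache line.
constexpr std::size_t types_per_line = 64 / sizeof(std::int32_t);
static_assert(types_per_line == 16, "sixteen 4-byte keys per 64-byte line");

// Returns the index range [first, second) of the keys equal to `type`.
// The search only ever touches the densely packed key array.
std::pair<std::size_t, std::size_t>
find_by_type(arr const& d, std::int32_t type) {
    auto b = d.types.begin();
    auto r = std::equal_range(b, d.types.end(), type);
    return { static_cast<std::size_t>(r.first - b),
             static_cast<std::size_t>(r.second - b) };
}
```

The returned indices are then used to reach `d.entries`, so the large payload objects are only loaded for the matches and not for every probe of the binary search.
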
  • 64. [Benchmark plot] AMD Athlon™ II X3 440, 12 GiB RAM DDR3-1333, clang 8.0.1-3. Time (ns, 0 to 10000) against number of objects (1e+04 to 1e+08); series: flat-map-clang, map-clang, naive-clang, optimized-clang, unordered-map-clang
  • 65. [Benchmark plot] AMD Ryzen™ 1600X, 16 GiB RAM DDR4-3200, clang 8.0.1-3. Time (ns, 0 to 10000) against number of objects (1e+04 to 1e+08); series: flat-map-clang, map-clang, naive-clang, optimized-clang, unordered-map-clang
  • 67. Instructions (1.0e+07, 2.5e+07, 5.0e+07). [Bar chart: Instr (0 to 1500) by type: flat-map, map, naive, optimized, unordered-map; bar labels as extracted: 95614, 1116, 1346, 869, 1241]
  • 68. L1 uses & misses. [Bar chart: values (0 to 300) by type: flat-map, map, naive, optimized, unordered-map; L1 misses labels as extracted: 8705, 88, 115, 156, 115; L1 uses labels as extracted: 17395, 122, 130, 227, 118]
  • 69. LL miss rate. [Bar chart: L3 miss rate (0 to 15) by type: flat-map, map, naive, optimized, unordered-map; bar labels as extracted: 7.7, 8.6, 9.2, 15.3, 9.8]
  • 71. Bibliography I
      Micron Technology, Inc. Speed vs. latency. White paper, Micron Technology, Inc.
      David A. Patterson and John L. Hennessy. Computer Organization and Design, Fifth Edition: The Hardware/Software Interface. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 5th edition, 2013.