SlideShare a Scribd company logo
1 of 32
Concurrency, Performance,
Parallelism
Aleksandr Tavgen
Concurrency and Parallelism
Quite often mixed together
But common problems are actual for both concurrency and paralellism
Process vs
Thread
Processes has separate
address space
Threads can share process
resources
For the kernel thread is
lightweight process which
shared under thread_group
Shared resources make pain
Shared
Mutable
State
Process space
Race condition
When two or more agents try to
modify one shared state in
concurrent way
Locks and
synchronisation
There are bunch of methods for
synchronising state
Locks/barriers/semaphores
Lock free algorithms (Compare
and Set)
Lock free means performance
implications
Deadlock
Let’s go deeper
Multicore
processor
Reordering of operations
Virtual registers
Operating through Load/Store Queue
Cache is a
source of truth
Registers
Memory Ordering Buffers
L1, L2 caches core local
L3 cache shared
Cache coherence protocol
needed
Cache coherence
Memory Fence Barriers
L3 cache should be synchronized
Cache coherence protocol is executed
Memory fence instruction is issued
All reordering in core pipelines should be finished
Pipelines flush
Memory Model
Mapping Virtual addresses to
Physical requires computation and/or
Page Directories access
There is another cache for that
Translation Lookaside Buffer
Some attacks use
TLBleed
If address was accesses by some
thread then it is cached in TLB
Another thread can measure
indirect access times
Context switch
Storing/Restoring context information
Could flush core pipelines
Quite expensive
Process context switch also invalidates
TLB buffers
Thread context switches are less
expensive
Performing the Process Switch
Here, we are only concerned with how the kernel performs a process switch.
Essentially, every process switch consists of two steps:
Switching the Page Global Directory to install a new address space.
Switching the Kernel Mode stack and the hardware context, which provides all the information
needed by the kernel to execute the new process, including the CPU registers.
Description of logic and steps in <<Understanding Linux Kernel>> take 4-5 pages.
Performance impact
Invalidates caches and TLB
Caches should be refiled
Context switch cost has a longer
effect
Tracing Info
Average processing time 40-60ms
In case of increased processes
switch it increased up to 500-
1000ms
DB query takes 40-50 ms
DB query time is quite constant
Processing time in normal case (CPU/Memory access intensive) 1-3 ms
After a context switch more than 40ms
Tracing on kernel
level
PythonVM with Thread execution
A lot of mutex operations (GIL effect)
A lot of gettimeoftheday() calls
I/O operations optimised - mmap
Summary
Synchronisation cost is core
pipelines flushing
Thread structures are memory
expensive
Overhead is increasing in a non
linear fashion
10 000 connections problem
Why do we need so many threads?
A lot of operations include remote
calls (DB, other services)
Synchronous calls block thread
execution
Classical Web Servers open new
thread for every incoming connection
10 000 connection problem
Code is more waiting than working
Usually DB can handle more load
than applications
Common first steps to scaling is to
increase app instances
For a lot of operations with blocking
drivers and calls this is true
Pain – pain – pain
Threads creating is expensive
Threads operating is expensive
Threads which is blocked by
synchronous call can be rescheduled
Context switches more and more
Functional programming
Avoiding
Shared
Mutable State
Lazy execution
Functional
composition
Pattern
Matching
Avoiding Mutable
State
Object encapsulates state
Methods can change internal state
Object invariant can be broken in
case of concurrent access
Semantics oriented on nouns
me.buy(store.open(basket.add(milk)))
Non-Blocking processing

More Related Content

What's hot

LOAD BALANCING ALGORITHMS
LOAD BALANCING ALGORITHMSLOAD BALANCING ALGORITHMS
LOAD BALANCING ALGORITHMStanmayshah95
 
Database replication
Database replicationDatabase replication
Database replicationArslan111
 
Nondeterminism is unavoidable, but data races are pure evil
Nondeterminism is unavoidable, but data races are pure evilNondeterminism is unavoidable, but data races are pure evil
Nondeterminism is unavoidable, but data races are pure evilracesworkshop
 
Erlang vs Ruby SOA Scheduling
Erlang vs Ruby SOA SchedulingErlang vs Ruby SOA Scheduling
Erlang vs Ruby SOA Schedulingmarfeyh
 
Deep dive into Apache Kafka consumption
Deep dive into Apache Kafka consumptionDeep dive into Apache Kafka consumption
Deep dive into Apache Kafka consumptionAlexandre Tamborrino
 
Comparing high availability solutions with percona xtradb cluster and percona...
Comparing high availability solutions with percona xtradb cluster and percona...Comparing high availability solutions with percona xtradb cluster and percona...
Comparing high availability solutions with percona xtradb cluster and percona...Marco Tusa
 
Distributed transactions
Distributed transactionsDistributed transactions
Distributed transactionsAritra Das
 
Training Slides: Basics 107: Simple Tungsten Replicator Installation to Extra...
Training Slides: Basics 107: Simple Tungsten Replicator Installation to Extra...Training Slides: Basics 107: Simple Tungsten Replicator Installation to Extra...
Training Slides: Basics 107: Simple Tungsten Replicator Installation to Extra...Continuent
 
Tempesta FW - Framework и Firewall для WAF и DDoS mitigation, Александр Крижа...
Tempesta FW - Framework и Firewall для WAF и DDoS mitigation, Александр Крижа...Tempesta FW - Framework и Firewall для WAF и DDoS mitigation, Александр Крижа...
Tempesta FW - Framework и Firewall для WAF и DDoS mitigation, Александр Крижа...Ontico
 
Advanced off heap ipc
Advanced off heap ipcAdvanced off heap ipc
Advanced off heap ipcPeter Lawrey
 
High Frequency Trading and NoSQL database
High Frequency Trading and NoSQL databaseHigh Frequency Trading and NoSQL database
High Frequency Trading and NoSQL databasePeter Lawrey
 
Deterministic behaviour and performance in trading systems
Deterministic behaviour and performance in trading systemsDeterministic behaviour and performance in trading systems
Deterministic behaviour and performance in trading systemsPeter Lawrey
 
Id0115
Id0115Id0115
Id0115FNian
 
Open HFT libraries in @Java
Open HFT libraries in @JavaOpen HFT libraries in @Java
Open HFT libraries in @JavaPeter Lawrey
 
HBaseCon2017 Removable singularity: a story of HBase upgrade in Pinterest
HBaseCon2017 Removable singularity: a story of HBase upgrade in PinterestHBaseCon2017 Removable singularity: a story of HBase upgrade in Pinterest
HBaseCon2017 Removable singularity: a story of HBase upgrade in PinterestHBaseCon
 
Low latency for high throughput
Low latency for high throughputLow latency for high throughput
Low latency for high throughputPeter Lawrey
 
Fast SOA with Apache Synapse
Fast SOA with Apache SynapseFast SOA with Apache Synapse
Fast SOA with Apache SynapsePaul Fremantle
 

What's hot (20)

LOAD BALANCING ALGORITHMS
LOAD BALANCING ALGORITHMSLOAD BALANCING ALGORITHMS
LOAD BALANCING ALGORITHMS
 
Database replication
Database replicationDatabase replication
Database replication
 
Nondeterminism is unavoidable, but data races are pure evil
Nondeterminism is unavoidable, but data races are pure evilNondeterminism is unavoidable, but data races are pure evil
Nondeterminism is unavoidable, but data races are pure evil
 
Erlang vs Ruby SOA Scheduling
Erlang vs Ruby SOA SchedulingErlang vs Ruby SOA Scheduling
Erlang vs Ruby SOA Scheduling
 
Deep dive into Apache Kafka consumption
Deep dive into Apache Kafka consumptionDeep dive into Apache Kafka consumption
Deep dive into Apache Kafka consumption
 
Comparing high availability solutions with percona xtradb cluster and percona...
Comparing high availability solutions with percona xtradb cluster and percona...Comparing high availability solutions with percona xtradb cluster and percona...
Comparing high availability solutions with percona xtradb cluster and percona...
 
data replication
data replicationdata replication
data replication
 
Web Server Load Balancing
Web Server Load BalancingWeb Server Load Balancing
Web Server Load Balancing
 
Distributed transactions
Distributed transactionsDistributed transactions
Distributed transactions
 
Training Slides: Basics 107: Simple Tungsten Replicator Installation to Extra...
Training Slides: Basics 107: Simple Tungsten Replicator Installation to Extra...Training Slides: Basics 107: Simple Tungsten Replicator Installation to Extra...
Training Slides: Basics 107: Simple Tungsten Replicator Installation to Extra...
 
Tempesta FW - Framework и Firewall для WAF и DDoS mitigation, Александр Крижа...
Tempesta FW - Framework и Firewall для WAF и DDoS mitigation, Александр Крижа...Tempesta FW - Framework и Firewall для WAF и DDoS mitigation, Александр Крижа...
Tempesta FW - Framework и Firewall для WAF и DDoS mitigation, Александр Крижа...
 
Load balancing
Load balancingLoad balancing
Load balancing
 
Advanced off heap ipc
Advanced off heap ipcAdvanced off heap ipc
Advanced off heap ipc
 
High Frequency Trading and NoSQL database
High Frequency Trading and NoSQL databaseHigh Frequency Trading and NoSQL database
High Frequency Trading and NoSQL database
 
Deterministic behaviour and performance in trading systems
Deterministic behaviour and performance in trading systemsDeterministic behaviour and performance in trading systems
Deterministic behaviour and performance in trading systems
 
Id0115
Id0115Id0115
Id0115
 
Open HFT libraries in @Java
Open HFT libraries in @JavaOpen HFT libraries in @Java
Open HFT libraries in @Java
 
HBaseCon2017 Removable singularity: a story of HBase upgrade in Pinterest
HBaseCon2017 Removable singularity: a story of HBase upgrade in PinterestHBaseCon2017 Removable singularity: a story of HBase upgrade in Pinterest
HBaseCon2017 Removable singularity: a story of HBase upgrade in Pinterest
 
Low latency for high throughput
Low latency for high throughputLow latency for high throughput
Low latency for high throughput
 
Fast SOA with Apache Synapse
Fast SOA with Apache SynapseFast SOA with Apache Synapse
Fast SOA with Apache Synapse
 

Similar to Concurrency, Performance, Parallelism

Functional?
 Reactive? 
Why?
Functional?
 Reactive? 
Why?Functional?
 Reactive? 
Why?
Functional?
 Reactive? 
Why?Timetrix
 
(1) Briefly describe what overhead is associated with managing and s.pdf
(1) Briefly describe what overhead is associated with managing and s.pdf(1) Briefly describe what overhead is associated with managing and s.pdf
(1) Briefly describe what overhead is associated with managing and s.pdfindiaartz
 
Parallel Processing (Part 2)
Parallel Processing (Part 2)Parallel Processing (Part 2)
Parallel Processing (Part 2)Ajeng Savitri
 
Mapping Data Flows Perf Tuning April 2021
Mapping Data Flows Perf Tuning April 2021Mapping Data Flows Perf Tuning April 2021
Mapping Data Flows Perf Tuning April 2021Mark Kromer
 
Fabric Data Factory Pipeline Copy Perf Tips.pptx
Fabric Data Factory Pipeline Copy Perf Tips.pptxFabric Data Factory Pipeline Copy Perf Tips.pptx
Fabric Data Factory Pipeline Copy Perf Tips.pptxMark Kromer
 
SQL Server It Just Runs Faster
SQL Server It Just Runs FasterSQL Server It Just Runs Faster
SQL Server It Just Runs FasterBob Ward
 
Scalable Apache for Beginners
Scalable Apache for BeginnersScalable Apache for Beginners
Scalable Apache for Beginnerswebhostingguy
 
Efficient Shared Data in Perl
Efficient Shared Data in PerlEfficient Shared Data in Perl
Efficient Shared Data in PerlPerrin Harkins
 
Stephan Ewen - Experiences running Flink at Very Large Scale
Stephan Ewen -  Experiences running Flink at Very Large ScaleStephan Ewen -  Experiences running Flink at Very Large Scale
Stephan Ewen - Experiences running Flink at Very Large ScaleVerverica
 
ScalabilityAvailability
ScalabilityAvailabilityScalabilityAvailability
ScalabilityAvailabilitywebuploader
 
Showdown: IBM DB2 versus Oracle Database for OLTP
Showdown: IBM DB2 versus Oracle Database for OLTPShowdown: IBM DB2 versus Oracle Database for OLTP
Showdown: IBM DB2 versus Oracle Database for OLTPcomahony
 
State transfer With Galera
State transfer With GaleraState transfer With Galera
State transfer With GaleraMydbops
 
Developing high-performance network servers in Lisp
Developing high-performance network servers in LispDeveloping high-performance network servers in Lisp
Developing high-performance network servers in LispVladimir Sedach
 
Sql server 2016 it just runs faster sql bits 2017 edition
Sql server 2016 it just runs faster   sql bits 2017 editionSql server 2016 it just runs faster   sql bits 2017 edition
Sql server 2016 it just runs faster sql bits 2017 editionBob Ward
 
Multithreading 101
Multithreading 101Multithreading 101
Multithreading 101Tim Penhey
 
CS 542 Parallel DBs, NoSQL, MapReduce
CS 542 Parallel DBs, NoSQL, MapReduceCS 542 Parallel DBs, NoSQL, MapReduce
CS 542 Parallel DBs, NoSQL, MapReduceJ Singh
 

Similar to Concurrency, Performance, Parallelism (20)

Functional?
 Reactive? 
Why?
Functional?
 Reactive? 
Why?Functional?
 Reactive? 
Why?
Functional?
 Reactive? 
Why?
 
(1) Briefly describe what overhead is associated with managing and s.pdf
(1) Briefly describe what overhead is associated with managing and s.pdf(1) Briefly describe what overhead is associated with managing and s.pdf
(1) Briefly describe what overhead is associated with managing and s.pdf
 
Parallel Processing (Part 2)
Parallel Processing (Part 2)Parallel Processing (Part 2)
Parallel Processing (Part 2)
 
Mapping Data Flows Perf Tuning April 2021
Mapping Data Flows Perf Tuning April 2021Mapping Data Flows Perf Tuning April 2021
Mapping Data Flows Perf Tuning April 2021
 
Fabric Data Factory Pipeline Copy Perf Tips.pptx
Fabric Data Factory Pipeline Copy Perf Tips.pptxFabric Data Factory Pipeline Copy Perf Tips.pptx
Fabric Data Factory Pipeline Copy Perf Tips.pptx
 
Chap2 slides
Chap2 slidesChap2 slides
Chap2 slides
 
SQL Server It Just Runs Faster
SQL Server It Just Runs FasterSQL Server It Just Runs Faster
SQL Server It Just Runs Faster
 
Scalable Apache for Beginners
Scalable Apache for BeginnersScalable Apache for Beginners
Scalable Apache for Beginners
 
Efficient Shared Data in Perl
Efficient Shared Data in PerlEfficient Shared Data in Perl
Efficient Shared Data in Perl
 
Stephan Ewen - Experiences running Flink at Very Large Scale
Stephan Ewen -  Experiences running Flink at Very Large ScaleStephan Ewen -  Experiences running Flink at Very Large Scale
Stephan Ewen - Experiences running Flink at Very Large Scale
 
Database System Architectures
Database System ArchitecturesDatabase System Architectures
Database System Architectures
 
Chapter04 new
Chapter04 newChapter04 new
Chapter04 new
 
ScalabilityAvailability
ScalabilityAvailabilityScalabilityAvailability
ScalabilityAvailability
 
Showdown: IBM DB2 versus Oracle Database for OLTP
Showdown: IBM DB2 versus Oracle Database for OLTPShowdown: IBM DB2 versus Oracle Database for OLTP
Showdown: IBM DB2 versus Oracle Database for OLTP
 
Lecture1
Lecture1Lecture1
Lecture1
 
State transfer With Galera
State transfer With GaleraState transfer With Galera
State transfer With Galera
 
Developing high-performance network servers in Lisp
Developing high-performance network servers in LispDeveloping high-performance network servers in Lisp
Developing high-performance network servers in Lisp
 
Sql server 2016 it just runs faster sql bits 2017 edition
Sql server 2016 it just runs faster   sql bits 2017 editionSql server 2016 it just runs faster   sql bits 2017 edition
Sql server 2016 it just runs faster sql bits 2017 edition
 
Multithreading 101
Multithreading 101Multithreading 101
Multithreading 101
 
CS 542 Parallel DBs, NoSQL, MapReduce
CS 542 Parallel DBs, NoSQL, MapReduceCS 542 Parallel DBs, NoSQL, MapReduce
CS 542 Parallel DBs, NoSQL, MapReduce
 

Recently uploaded

BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEOrtus Solutions, Corp
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesPhilip Schwarz
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWave PLM
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfAlina Yurenko
 
software engineering Chapter 5 System modeling.pptx
software engineering Chapter 5 System modeling.pptxsoftware engineering Chapter 5 System modeling.pptx
software engineering Chapter 5 System modeling.pptxnada99848
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Andreas Granig
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureDinusha Kumarasiri
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxTier1 app
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...Christina Lin
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEEVICTOR MAESTRE RAMIREZ
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaHanief Utama
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataBradBedford3
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...gurkirankumar98700
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...OnePlan Solutions
 
The Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfThe Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfPower Karaoke
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmSujith Sukumaran
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptkotipi9215
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - InfographicHr365.us smith
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 

Recently uploaded (20)

BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a series
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
 
software engineering Chapter 5 System modeling.pptx
software engineering Chapter 5 System modeling.pptxsoftware engineering Chapter 5 System modeling.pptx
software engineering Chapter 5 System modeling.pptx
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with Azure
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEE
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief Utama
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...
 
The Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfThe Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdf
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalm
 
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort ServiceHot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.ppt
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - Infographic
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 

Concurrency, Performance, Parallelism

Editor's Notes

  1. Registers: Within each core are separate register files containing 160 entries for integers and 144 floating point numbers.  These registers are accessible within a single cycle and constitute the fastest memory available to our execution cores.  Memory Ordering Buffers (MOB): The MOB is comprised of a 64-entry load and 36-entry store buffer.  These buffers are used to track in-flight operations while waiting on the cache sub-system as instructions get executed out-of-order.  The store buffer is a fully associative queue that can be searched for existing store operations, which have been queued when waiting on the L1 cache.  These buffers enable our fast processors to run without blocking while data is transferred to and from the cache sub-system.  When the processor issues reads and writes they can can come back out-of-order.  The MOB is used to disambiguate the load and store ordering for compliance to the published memory model.  Level 1 Cache: The L1 is a core-local cache split into separate 32K data and 32K instruction caches.  Access time is 3 cycles and can be hidden as instructions are pipelined by the core for data already in the L1 cache. Level 2 Cache: The L2 cache is a core-local cache designed to buffer access between the L1 and the shared L3 cache.   Level 3 Cache: The L3 cache is shared across all cores within a socket. Main Memory: DRAM channels are connected to each socket with an average latency of ~65ns for socket local access on a full cache-miss.  This is however extremely variable, being much less for subsequent accesses to columns in the same row buffer, NUMA: In a multi-socket server we have non-uniform memory access.  It is non-uniform because the required memory maybe on a remote socket having an additional 40ns hop across the QPI bus. Associativity Levels Caches are effectively hardware based hash tables.  The hash function is usually a simple masking of some low-order bits for cache indexing.  Hash tables need some means to handle a collision for the same slot.  The L3 cache is inclusive in that any cache-line held in the L1 or L2 caches is also held in the L3.  This provides for rapid identification of the core containing a modified line when snooping for changes.  The cache controller for the L3 segment keeps track of which core could have a modified version of a cache-line it owns.
  2. Cache Coherence With some caches being local to cores, we need a means of keeping them coherent so all cores can have a consistent view of memory.  The cache sub-system is considered the "source of truth" for mainstream systems.  If memory is fetched from the cache it is never stale; the cache is the master copy when data exists in both the cache and main-memory.  To keep the caches coherent the cache controller tracks the state of each cache-line as being in one of a finite number of states.  The protocol Intel employs for this is MESIF, AMD employs a variant know as MOESI.  Under the MESIF protocol each cache-line can be in 1 of the 5 following states: Modified: Indicates the cache-line is dirty and must be written back to memory at a later stage.  When written back to main-memory the state transitions to Exclusive. Exclusive: Indicates the cache-line is held exclusively and that it matches main-memory.  When written to, the state then transitions to Modified.  To achieve this state a Read-For-Ownership (RFO) message is sent which involves a read plus an invalidate broadcast to all other copies. Shared: Indicates a clean copy of a cache-line that matches main-memory. Invalid: Indicates an unused cache-line. Forward: Indicates a specialised version of the shared state i.e. this is the designated cache which should respond to other caches in a NUMA system.
  3. To keep the caches coherent the cache controller tracks the state of each cache-line as being in one of a finite number of states When a cache hit occurs, the cache controller behaves differently, depending on the access type. For a read operation, the controller selects the data from the cache line and transfers it into a CPU register; the RAM is not accessed and the CPU saves time, which is why the cache system was invented. For a write operation, the controller may implement one of two basic strategies called write-through and write-back. In a write-through, the controller always writes into both RAM and the cache line, effec- tively switching off the cache for write operations. In a write-back, which offers more immediate efficiency, only the cache line is updated and the contents of the RAM are left unchanged. After a write-back, of course, the RAM must eventually be updated. The cache controller writes the cache line back into RAM only when the CPU exe- cutes an instruction requiring a flush of cache entries or when a FLUSH hardware signal occurs (usually after a cache miss). When a cache miss occurs, the cache line is written to memory, if necessary, and the correct line is fetched from RAM into the cache entry.
  4. Translation Lookaside Buffers (TLB) Besides general-purpose hardware caches, 80 × 86 processors include another cache called Translation Lookaside Buffers (TLB) to speed up linear address translation. When a linear address is used for the first time, the corresponding physical address is computed through slow accesses to the Page Tables in RAM. The physical address is then stored in a TLB entry so that further references to the same linear address can be quickly translated. In a multiprocessor system, each CPU has its own TLB, called the local TLB of the CPU. Contrary to the hardware cache, the corresponding entries of the TLB need not be synchronized, because processes running on the existing CPUs may associate the same linear address with different physical ones. When the cr3 control register of a CPU is modified, the hardware automatically invalidates all entries of the local TLB, because a new set of page tables is in use and the TLBs are pointing to old data.
  5. TLBleed shows that, by monitoring hyper-thread activity through the TLB instead of caches, even with full cache isolation or protection policies in effect, information can still leak between processes
  6. A context switch is the process by which the OS scheduler removes a currently running thread or task and replaces it with one that is waiting. There are several different types of context switch, but broadly speaking, they all involve swapping the executing instructions and the stack state of the thread. A context switch can be a costly operation, whether between user threads or from user mode into kernel mode (sometimes called a mode switch). The latter case is particularly important, because a user thread may need to swap into kernel mode in order to perform some function partway through its time slice. However, this switch will force instruction and other caches to be emptied, as the memory areas accessed by the user space code will not normally have anything in common with the kernel. For each process, Linux packs two different data structures in a single per-process memory area: a small data structure linked to the process descriptor, namely the thread_info structure, and the Kernel Mode pro- cess stack.
  7. A context switch into kernel mode will invalidate the TLBs and potentially other caches. When the call returns, these caches will have to be refilled, and so the effect of a kernel mode switch persists even after control has returned to user space. This causes the true cost of a system call to be masked, as can be seen
  8. In non-blocking or asynchronous request processing, no thread is in waiting state. There is generally only one request thread receiving the request. All incoming requests come with a event handler and call back information. Request thread delegates the incoming requests to a thread pool (generally small number of threads) which delegate the request to it’s handler function and immediately start processing other incoming requests from request thread. When the handler function is complete, one of thread from pool collect the response and pass it to the call back function.