FUNCTIONAL?
REACTIVE?
WHY?
Aleksandr Tavgen
Concurrency and Parallelism
■ Often conflated with each other
■ But the common problems apply to both concurrency and parallelism
Process vs Thread
■ Processes have separate address spaces
■ Threads can share process resources
■ For the kernel, a thread is a lightweight process
■ Shared resources make the pain… real
Race condition
■ Modifying a single shared state concurrently
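Here is a minimal Python sketch of this race, assuming CPython threads. The unsynchronised version breaks the increment into an explicit read, modify and write. Two threads can therefore read the same value, and one of the updates is lost. Whether that happens on a particular run depends on the interpreter and the scheduler. The lock-protected version always gives the same result. `run_counter` is a hypothetical helper.

```python
import threading

def run_counter(use_lock: bool, n_threads: int = 4, n_iter: int = 10_000) -> int:
    """Increment one shared counter from several threads and return the final value."""
    state = {"count": 0}          # the single shared mutable state
    lock = threading.Lock()

    def worker() -> None:
        for _ in range(n_iter):
            if use_lock:
                with lock:        # read-modify-write cannot interleave with other workers
                    state["count"] += 1
            else:
                current = state["count"]   # read
                current += 1               # modify
                state["count"] = current   # write: may overwrite another thread's update

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["count"]

print("without lock:", run_counter(use_lock=False))  # may be less than 40000
print("with lock:   ", run_counter(use_lock=True))   # always 40000
```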
Locks and synchronization
■ There are a number of methods for synchronising state
■ Locks/barriers/semaphores
■ Lock-free algorithms (compare-and-set)
■ Lock-free often still has performance implications
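The compare-and-set retry loop can be sketched in Python. CPython exposes no user-level hardware CAS, so in the hypothetical `AtomicInt` class below an internal lock stands in for the atomic instruction. The part that carries over to real lock-free code is the loop in `increment`: read, compute, try to publish, and retry if another thread changed the value first. Under contention, those retries are where the performance cost appears.

```python
import threading

class AtomicInt:
    """Emulated atomic integer. Hardware CAS is a single instruction; here an
    internal lock stands in for that guarantee (assumption for illustration)."""

    def __init__(self, value: int = 0) -> None:
        self._value = value
        self._guard = threading.Lock()

    def get(self) -> int:
        return self._value

    def compare_and_set(self, expected: int, new: int) -> bool:
        """Store `new` only if the current value is still `expected`."""
        with self._guard:
            if self._value != expected:
                return False
            self._value = new
            return True

def increment(counter: AtomicInt) -> int:
    """CAS retry loop: read, compute, try to publish, retry if another thread won."""
    while True:
        current = counter.get()
        if counter.compare_and_set(current, current + 1):
            return current + 1
        # Conflict: the value changed under us. This retry is the hidden cost.

def worker(counter: AtomicInt, n: int) -> None:
    for _ in range(n):
        increment(counter)

counter = AtomicInt()
threads = [threading.Thread(target=worker, args=(counter, 10_000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.get())  # 40000
```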
DEADLOCK
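The classic deadlock happens when two threads take the same two locks in opposite orders. Each thread then holds one lock and waits forever for the other. A common remedy is to acquire locks in a single global order. The Python sketch below shows this with a hypothetical `Account` class and orders locks by account id. Both directions of transfer run concurrently and still finish.

```python
import threading

class Account:
    """Hypothetical account whose balance is guarded by its own lock."""

    def __init__(self, account_id: int, balance: int) -> None:
        self.id = account_id
        self.balance = balance
        self.lock = threading.Lock()

def transfer(src: Account, dst: Account, amount: int) -> None:
    # Naively locking `src` then `dst` can deadlock: a transfer x->y and a
    # transfer y->x may each hold one lock and wait forever for the other.
    # Acquiring locks in one global order (by account id) rules that cycle out.
    first, second = (src, dst) if src.id < dst.id else (dst, src)
    with first.lock, second.lock:
        src.balance -= amount
        dst.balance += amount

def repeat_transfers(src: Account, dst: Account, times: int) -> None:
    for _ in range(times):
        transfer(src, dst, 1)

a, b = Account(1, 1_000), Account(2, 1_000)
workers = [
    threading.Thread(target=repeat_transfers, args=(a, b, 5_000)),
    threading.Thread(target=repeat_transfers, args=(b, a, 5_000)),
]
for w in workers:
    w.start()
for w in workers:
    w.join(timeout=10)   # with ordered locking this returns long before the timeout
print(a.balance, b.balance)  # 1000 1000
```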
LET’S GO DEEPER
Look mom it is a F…ING Shared State
Cache is a source of truth
■ Registers
■ Memory Ordering Buffers
■ L1 and L2 caches are core-local
■ L3 cache is shared
■ A cache coherence protocol is needed
REORDERING OF OPERATIONS
VIRTUAL REGISTERS
OPERATING THROUGH LOAD/STORE QUEUE
On many Intel processors, the L3 cache is inclusive of all data in the L1 and L2 of each core on the same socket.
Shared Mutable Cache
Cache coherence protocol
Memory Barriers
L3 cache must be synchronised
Cache coherence protocol is executed
Memory fence instruction is issued
Out-of-order (OOO) execution must finish
Pipelines are flushed
Memory Model
■ Mapping virtual addresses to physical ones requires computation and/or page directory accesses
■ Even when that data is in L1, it costs about 16 cycles
■ There is another cache for that: the Translation Lookaside Buffer (TLB)
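One plausible way to reach the 16-cycle figure assumes x86-64 four-level paging, with one page-table entry read per level, and an L1 hit latency of about 4 cycles. Both numbers are assumptions and do not appear on the slide:

```latex
t_{\text{walk}} \approx L \cdot t_{\text{L1}} = 4 \times 4\ \text{cycles} = 16\ \text{cycles}
```

Here L is the number of page-table levels and t_L1 is the L1 hit latency. A TLB hit skips the walk entirely.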
Some attacks use the TLB: TLBleed
■ If an address was accessed by some thread, its translation is cached in the TLB
■ Another thread can measure access times indirectly
Context switch
■ Storing/restoring context information
■ Can flush core pipelines
■ Quite expensive
■ A process context switch also invalidates TLB entries
■ Thread context switches are less expensive
Performing the Process Switch
■ Here, we are only concerned with how the kernel performs a process switch.
■ Essentially, every process switch consists of two steps:
  ■ Switching the Page Global Directory to install a new address space.
  ■ Switching the Kernel Mode stack and the hardware context, which provides all the information needed by the kernel to execute the new process, including the CPU registers.
■ The description of the logic and steps in "Understanding the Linux Kernel" takes 4-5 pages.
INVALIDATES CACHES AND TLB
CACHES MUST BE REFILLED
CONTEXT SWITCH COST HAS A LONG-LASTING EFFECT
True cost and long-lasting effect
Tracing Info
■ Average processing time: 40-60 ms
■ With more process switching, it increased up to 500-1000 ms
■ The DB query takes 40-50 ms
DB QUERY TIME IS FAIRLY CONSTANT
PROCESSING TIME IN THE NORMAL CASE: 1-3 MS
AFTER A CONTEXT SWITCH: MORE THAN 40 MS
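The tracing in the talk was done at the kernel level. A much lighter first check is to ask the kernel for the process's own context-switch counters. On Unix, Python's `resource.getrusage` reports voluntary switches (`ru_nvcsw`, for example when blocking on I/O) and involuntary ones (`ru_nivcsw`, when the process is preempted). The sketch below is not the author's setup. `count_context_switches` is a hypothetical helper, and the workload of briefly blocking threads is made up.

```python
import resource   # Unix only
import threading
import time
from typing import Callable, Tuple

def count_context_switches(workload: Callable[[], None]) -> Tuple[int, int]:
    """Run `workload()` and return (voluntary, involuntary) context switches
    of this process (all threads) during it, as reported by getrusage."""
    before = resource.getrusage(resource.RUSAGE_SELF)
    workload()
    after = resource.getrusage(resource.RUSAGE_SELF)
    return (after.ru_nvcsw - before.ru_nvcsw,
            after.ru_nivcsw - before.ru_nivcsw)

def many_blocking_threads(n: int = 200) -> None:
    """Hypothetical workload: many threads that each block briefly, like a DB call."""
    threads = [threading.Thread(target=time.sleep, args=(0.01,)) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

voluntary, involuntary = count_context_switches(many_blocking_threads)
print(f"voluntary: {voluntary}, involuntary: {involuntary}")
```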
Tracing at the kernel level
■ Python VM with threaded execution
■ A lot of mutex operations (GIL effect)
■ A lot of gettimeofday() calls
■ I/O operations optimised (mmap)
■ The synchronisation cost is core pipeline flushing
■ Thread structures are memory-expensive
■ Overhead grows non-linearly
Why do we need so many threads?
A lot of operations include remote calls (DB, other services)
Synchronous calls block thread execution
Classical web servers open a new thread for every incoming connection
The C10K (10,000 connections) problem
Code spends more time waiting than working
Usually the DB can handle more load than the applications
A common first step in scaling is to add app instances
This is true for many operations with blocking drivers and calls
Pain – pain – pain
Creating and operating threads is expensive
A thread blocked by a synchronous call gets switched out and later rescheduled
More and more context switches
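The two classical options can be compared in a Python sketch where `time.sleep` stands in for a synchronous DB or service call. `BLOCKING_CALL_S` and the helper names are hypothetical. With a fixed pool, requests queue behind blocked threads. With 8 threads and 32 requests, the run cannot finish in less than 4 × 0.05 s. Thread-per-request removes that queueing but pays for one OS thread per request.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

BLOCKING_CALL_S = 0.05   # hypothetical latency of a synchronous DB / service call

def handle_request(request_id: int) -> int:
    time.sleep(BLOCKING_CALL_S)   # the thread is parked here, doing no useful work
    return request_id * 2          # the actual "work" is trivial by comparison

def serve_with_pool(n_requests: int, n_threads: int) -> float:
    """Blocking handlers on a fixed pool: at most n_threads requests wait at once."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        list(pool.map(handle_request, range(n_requests)))
    return time.perf_counter() - start

def serve_thread_per_request(n_requests: int) -> float:
    """Classical model: one OS thread per request, all blocked concurrently."""
    start = time.perf_counter()
    threads = [threading.Thread(target=handle_request, args=(i,)) for i in range(n_requests)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

print(f"8-thread pool, 32 requests:      {serve_with_pool(32, 8):.2f} s")
print(f"thread per request, 32 requests: {serve_thread_per_request(32):.2f} s")
```

Neither option removes the waiting. It only moves the waiting either into a queue or into more threads, and more threads mean more context switches.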
Functional approach
■ Avoiding shared mutable state
■ Lazy execution
■ Functional composition
■ Pattern matching
Object Oriented Programming
■ Object encapsulates state
■ Methods can change internal state
■ An object's invariants can be broken under concurrent access
■ Semantics oriented on nouns
me.walkTo(store.open(basket.add(milk)))
Functional Programming
■ Oriented on functional composition
■ Functions have no side effects
■ Have some performance implications
■ Semantics oriented on verbs
walk(open(store, add(basket, milk)))
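Below is a Python sketch of the verb-oriented style, assuming immutable values built with frozen dataclasses. `open` is renamed to `open_store` so it does not shadow Python's built-in `open`. All names are illustrative. No function modifies its input, so the same `Basket` or `Store` value could be shared between threads without locks.

```python
from dataclasses import dataclass, replace
from typing import Tuple

@dataclass(frozen=True)
class Basket:
    items: Tuple[str, ...] = ()

@dataclass(frozen=True)
class Store:
    name: str
    is_open: bool = False

@dataclass(frozen=True)
class Trip:
    store: Store
    basket: Basket

def add(basket: Basket, item: str) -> Basket:
    """Pure: builds a new basket; the argument is never modified."""
    return replace(basket, items=basket.items + (item,))

def open_store(store: Store, basket: Basket) -> Trip:
    """Pure: returns an opened copy of the store paired with the basket."""
    return Trip(store=replace(store, is_open=True), basket=basket)

def walk(trip: Trip) -> str:
    return f"walking to {trip.store.name} with {list(trip.basket.items)}"

empty = Basket()
shop = Store("corner shop")
print(walk(open_store(shop, add(empty, "milk"))))  # walking to corner shop with ['milk']
print(empty.items, shop.is_open)                    # () False: inputs are unchanged
```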
Lazy execution
Functional composition means creating pipelines
The pipeline is defined before the actual computation
Declarative
More freedom for runtime optimizations
Common idioms
Map, Filter, Reduce
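Python's built-in `map` and `filter` return lazy iterators, so they show the idea compactly. Defining the pipeline does no work, and consuming it computes only as much as the consumer asks for. The `calls` list is instrumentation added for this sketch; a pure pipeline would not record anything.

```python
from functools import reduce
from itertools import islice

calls = []  # instrumentation only: records which inputs were actually squared

def square(x: int) -> int:
    calls.append(x)
    return x * x

# Defining the pipeline computes nothing: map() and filter() return lazy iterators.
pipeline = filter(lambda sq: sq % 2 == 0, map(square, range(1, 1_000_000)))
print(len(calls))          # 0

# Consuming it drives the computation, and only as far as needed.
first_three = list(islice(pipeline, 3))
touched = len(calls)
print(first_three)         # [4, 16, 36]
print(touched)             # 6: only 1..6 were squared, not a million values

# reduce folds a (lazy) sequence into a single value.
total = reduce(lambda acc, x: acc + x, map(square, range(1, 5)), 0)
print(total)               # 30
```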
Reactive Manifesto
■ Responsive
■ Message Driven
■ Resilient
■ Elastic
Reactive: sides of a coin
■ Actors model
■ Communicating Sequential Processes
■ Kotlin coroutines
■ Java Reactive Interfaces
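The actor model replaces shared mutable state with message passing. Python has no built-in actor library, so the following is a hand-rolled sketch that uses a thread and a `queue.Queue` as the mailbox. `CounterActor` and its message names are hypothetical. Real actor runtimes multiplex many actors over a few threads. This sketch spends one whole thread per actor for simplicity.

```python
import queue
import threading
from typing import Any, Optional

class CounterActor:
    """Minimal hand-rolled actor: private state, a mailbox, and a single thread
    that handles one message at a time, so the state is never shared."""

    def __init__(self) -> None:
        self._count = 0                          # touched only by the actor's thread
        self._mailbox: queue.Queue = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self) -> None:
        while True:
            message, reply_to = self._mailbox.get()
            if message == "increment":
                self._count += 1
            elif message == "get" and reply_to is not None:
                reply_to.put(self._count)
            elif message == "stop":
                return

    def send(self, message: str, reply_to: Optional[queue.Queue] = None) -> None:
        """Fire-and-forget: enqueue a message, never touch the state directly."""
        self._mailbox.put((message, reply_to))

    def ask(self, message: str) -> Any:
        """Send a message and block until the actor replies."""
        reply: queue.Queue = queue.Queue(maxsize=1)
        self.send(message, reply)
        return reply.get()

def client(actor: CounterActor, n: int) -> None:
    for _ in range(n):
        actor.send("increment")   # no locks in client code, only messages

actor = CounterActor()
clients = [threading.Thread(target=client, args=(actor, 1_000)) for _ in range(4)]
for c in clients:
    c.start()
for c in clients:
    c.join()
result = actor.ask("get")  # FIFO mailbox: every increment is handled before "get"
print(result)              # 4000
actor.send("stop")
```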
Two sides of a coin
Functional: building software by composing pure functions, avoiding shared state, mutable data, and side effects.
Reactive: an asynchronous programming paradigm concerned with data streams and the propagation of change.
So what is the main goal?
To maximise the utilisation of modern multicore CPUs and, more precisely, of the threads competing for them.
FUNCTIONAL COMPOSITION IN A HASKELL WAY
BREADTH FIRST RECURSIVE SEARCH
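This transcript does not include the code shown on the slide. As a stand-in, here is a Python sketch of the same idea: breadth-first search written as recursion over whole levels. Each call handles one frontier and recurses on the next, and it builds new frozensets and lists instead of mutating them. The graph is made up for illustration. Recursion depth equals the number of levels, so very deep graphs would hit Python's recursion limit.

```python
from typing import Dict, FrozenSet, List

Graph = Dict[str, List[str]]

def bfs(graph: Graph, frontier: List[str], visited: FrozenSet[str] = frozenset()) -> List[str]:
    """Breadth-first traversal as recursion over levels, without mutation."""
    if not frontier:
        return []
    seen = visited | frozenset(frontier)
    # Next level: unseen neighbours of the current frontier, in order, deduplicated.
    next_level = list(dict.fromkeys(
        n for node in frontier for n in graph.get(node, []) if n not in seen
    ))
    return frontier + bfs(graph, next_level, seen)

graph = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d", "e"],
    "d": ["f"],
    "e": [],
    "f": [],
}
print(bfs(graph, ["a"]))   # ['a', 'b', 'c', 'd', 'e', 'f']
```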
DB WORKS IN A STREAMING MODE
Pulling vs. Pushing Data
Java streams – pull model
Reactive – push model
Reacting to propagated changes instead of iterating
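The following Python sketch contrasts the two models. `PushStream` is a hypothetical minimal class and does not follow the API of RxJava, Reactor or RxPY. With pull, the consumer's loop decides when the next value is produced. With push, the producer calls each subscriber whenever a value appears.

```python
from typing import Callable, Iterator, List

# Pull: the consumer drives and asks the source for the next value when ready.
def pull_source() -> Iterator[int]:
    for value in range(5):
        yield value

pulled = [v * 10 for v in pull_source()]   # iteration pulls values one by one

# Push: the producer drives; consumers register callbacks and react to each value.
class PushStream:
    """Hypothetical minimal push-based stream."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[int], None]] = []

    def subscribe(self, on_next: Callable[[int], None]) -> None:
        self._subscribers.append(on_next)

    def emit(self, value: int) -> None:
        for on_next in self._subscribers:
            on_next(value)            # the change propagates to every subscriber

pushed: List[int] = []
stream = PushStream()
stream.subscribe(lambda v: pushed.append(v * 10))
for value in range(5):               # the producer decides when data arrives
    stream.emit(value)

print(pulled)   # [0, 10, 20, 30, 40]
print(pushed)   # [0, 10, 20, 30, 40]
```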
Blocking Processing
■ Mapping each execution path onto its own thread is inefficient
■ Threads are blocked waiting for I/O operations to complete
■ The way out is to share threads (relatively expensive and scarce resources) among lighter constructs
■ For example, functional composition of execution paths
Non-Blocking Processing
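Below is a minimal sketch of one thread serving many execution paths. It uses Python's `asyncio` as a stand-in for a reactive runtime. `asyncio.sleep` plays the role of a non-blocking DB call; this is an assumption, since a real service would need an async driver. All 1,000 simulated requests wait concurrently on a single thread. This works because each `await` hands the thread back to the event loop instead of parking it.

```python
import asyncio
import threading
import time
from typing import List, Set, Tuple

async def handle(request_id: int, thread_ids: Set[int]) -> int:
    thread_ids.add(threading.get_ident())
    await asyncio.sleep(0.05)   # non-blocking wait standing in for async DB I/O;
                                # the thread runs other coroutines meanwhile
    return request_id * 2

async def main(n_requests: int) -> Tuple[List[int], Set[int], float]:
    thread_ids: Set[int] = set()
    start = time.perf_counter()
    results = await asyncio.gather(*(handle(i, thread_ids) for i in range(n_requests)))
    return list(results), thread_ids, time.perf_counter() - start

results, thread_ids, elapsed = asyncio.run(main(1_000))
print(len(results), "requests on", len(thread_ids), "thread(s) in", f"{elapsed:.2f} s")
```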
Let’s test
■ 2514 units of work (DB request and computation)
■ Scheduled once a minute
■ Execution on an 8-thread pool with a blocking driver takes 30-35 s on a dedicated server
■ Let’s map it onto threads and onto a reactive pool
More than 2553 threads at start
CPU and memory disturbances
Many 5 s timeouts on the driver side
8 threads with non-blocking I/O
Less CPU and memory usage
Uniform scheduled execution
One thread – one execution path
One thread – many execution paths
DB LOAD: ONE REACTIVE INSTANCE CAN LOAD UP THE DB
Reactive is more than just async and non-blocking execution
■ Advanced time scheduling
■ Flexible scheduling
■ Backpressure control
■ Resilience to errors
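Reactive Streams implementations signal backpressure by having the subscriber request a bounded number of items with `Subscription.request(n)`. The Python sketch below reaches the same end with a simpler mechanism, a bounded `asyncio.Queue`. When the slow consumer falls behind, `await queue.put(...)` suspends the producer instead of letting the buffer grow without limit. This is not the Reactive Streams protocol, and the buffer size and delays are arbitrary.

```python
import asyncio
from typing import List, Optional, Tuple

async def producer(queue: asyncio.Queue, n_items: int, peak: List[int]) -> None:
    for item in range(n_items):
        await queue.put(item)            # suspends while the queue is full: backpressure
        peak[0] = max(peak[0], queue.qsize())
    await queue.put(None)                # end-of-stream marker

async def consumer(queue: asyncio.Queue, consumed: List[int]) -> None:
    while True:
        item: Optional[int] = await queue.get()
        if item is None:
            return
        await asyncio.sleep(0.001)       # slow consumer, e.g. a DB write
        consumed.append(item)

async def main() -> Tuple[List[int], int]:
    queue: asyncio.Queue = asyncio.Queue(maxsize=10)   # the bound limits in-flight items
    consumed: List[int] = []
    peak = [0]
    await asyncio.gather(producer(queue, 100, peak), consumer(queue, consumed))
    return consumed, peak[0]

consumed, peak = asyncio.run(main())
print(len(consumed), "items consumed; buffer peak:", peak)   # peak never exceeds 10
```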
Java Reactive Revolution
■ RxJava 2 / Reactor / Vert.x
■ Spring 5 with WebFlux
■ Many reactive drivers
■ Active community
Recommendations
• Understanding the Linux Kernel (O'Reilly Media)
• Optimizing Java (O'Reilly Media)
• Seven Concurrency Models in Seven Weeks (The Pragmatic Bookshelf)
• Learning Haskell

More Related Content

What's hot

How to Meet Your P99 Goal While Overcommitting Another Workload
How to Meet Your P99 Goal While Overcommitting Another WorkloadHow to Meet Your P99 Goal While Overcommitting Another Workload
How to Meet Your P99 Goal While Overcommitting Another WorkloadScyllaDB
 
HBaseCon2017 gohbase: Pure Go HBase Client
HBaseCon2017 gohbase: Pure Go HBase ClientHBaseCon2017 gohbase: Pure Go HBase Client
HBaseCon2017 gohbase: Pure Go HBase ClientHBaseCon
 
Seastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for CephSeastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for CephScyllaDB
 
hbaseconasia2017: hbase-2.0.0
hbaseconasia2017: hbase-2.0.0hbaseconasia2017: hbase-2.0.0
hbaseconasia2017: hbase-2.0.0HBaseCon
 
Sharding: Past, Present and Future with Krutika Dhananjay
Sharding: Past, Present and Future with Krutika DhananjaySharding: Past, Present and Future with Krutika Dhananjay
Sharding: Past, Present and Future with Krutika DhananjayGluster.org
 
Flink Forward Berlin 2017: Pramod Bhatotia, Do Le Quoc - StreamApprox: Approx...
Flink Forward Berlin 2017: Pramod Bhatotia, Do Le Quoc - StreamApprox: Approx...Flink Forward Berlin 2017: Pramod Bhatotia, Do Le Quoc - StreamApprox: Approx...
Flink Forward Berlin 2017: Pramod Bhatotia, Do Le Quoc - StreamApprox: Approx...Flink Forward
 
Comparative Evaluation of Spark and Flink Stream Processing
Comparative Evaluation of Spark and Flink Stream Processing Comparative Evaluation of Spark and Flink Stream Processing
Comparative Evaluation of Spark and Flink Stream Processing Ehab Qadah
 
Postgres-XC: Symmetric PostgreSQL Cluster
Postgres-XC: Symmetric PostgreSQL ClusterPostgres-XC: Symmetric PostgreSQL Cluster
Postgres-XC: Symmetric PostgreSQL ClusterPavan Deolasee
 
Scylla Summit 2022: How to Migrate a Counter Table for 68 Billion Records
Scylla Summit 2022: How to Migrate a Counter Table for 68 Billion RecordsScylla Summit 2022: How to Migrate a Counter Table for 68 Billion Records
Scylla Summit 2022: How to Migrate a Counter Table for 68 Billion RecordsScyllaDB
 
How to leave the ORM at home and write SQL
How to leave the ORM at home and write SQLHow to leave the ORM at home and write SQL
How to leave the ORM at home and write SQLMariaDB plc
 
Java one2015 - Work With Hundreds of Hot Terabytes in JVMs
Java one2015 - Work With Hundreds of Hot Terabytes in JVMsJava one2015 - Work With Hundreds of Hot Terabytes in JVMs
Java one2015 - Work With Hundreds of Hot Terabytes in JVMsSpeedment, Inc.
 
Training Slides: 253 - Filter like a Pro
Training Slides: 253 - Filter like a ProTraining Slides: 253 - Filter like a Pro
Training Slides: 253 - Filter like a ProContinuent
 
HBaseConAsia2018 Track1-2: WALLess HBase with persistent memory devices
HBaseConAsia2018 Track1-2: WALLess HBase with persistent memory devicesHBaseConAsia2018 Track1-2: WALLess HBase with persistent memory devices
HBaseConAsia2018 Track1-2: WALLess HBase with persistent memory devicesMichael Stack
 
hbaseconasia2017: HBase Practice At XiaoMi
hbaseconasia2017: HBase Practice At XiaoMihbaseconasia2017: HBase Practice At XiaoMi
hbaseconasia2017: HBase Practice At XiaoMiHBaseCon
 
3 Reasons to Love React
3 Reasons to Love React3 Reasons to Love React
3 Reasons to Love ReactVictor Leung
 
Virtual Memory Management
Virtual Memory ManagementVirtual Memory Management
Virtual Memory ManagementRahul Jamwal
 

What's hot (20)

How to Meet Your P99 Goal While Overcommitting Another Workload
How to Meet Your P99 Goal While Overcommitting Another WorkloadHow to Meet Your P99 Goal While Overcommitting Another Workload
How to Meet Your P99 Goal While Overcommitting Another Workload
 
HBaseCon2017 gohbase: Pure Go HBase Client
HBaseCon2017 gohbase: Pure Go HBase ClientHBaseCon2017 gohbase: Pure Go HBase Client
HBaseCon2017 gohbase: Pure Go HBase Client
 
Apache flink
Apache flinkApache flink
Apache flink
 
Seastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for CephSeastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for Ceph
 
hbaseconasia2017: hbase-2.0.0
hbaseconasia2017: hbase-2.0.0hbaseconasia2017: hbase-2.0.0
hbaseconasia2017: hbase-2.0.0
 
Sharding: Past, Present and Future with Krutika Dhananjay
Sharding: Past, Present and Future with Krutika DhananjaySharding: Past, Present and Future with Krutika Dhananjay
Sharding: Past, Present and Future with Krutika Dhananjay
 
Flink Forward Berlin 2017: Pramod Bhatotia, Do Le Quoc - StreamApprox: Approx...
Flink Forward Berlin 2017: Pramod Bhatotia, Do Le Quoc - StreamApprox: Approx...Flink Forward Berlin 2017: Pramod Bhatotia, Do Le Quoc - StreamApprox: Approx...
Flink Forward Berlin 2017: Pramod Bhatotia, Do Le Quoc - StreamApprox: Approx...
 
Comparative Evaluation of Spark and Flink Stream Processing
Comparative Evaluation of Spark and Flink Stream Processing Comparative Evaluation of Spark and Flink Stream Processing
Comparative Evaluation of Spark and Flink Stream Processing
 
Postgres-XC: Symmetric PostgreSQL Cluster
Postgres-XC: Symmetric PostgreSQL ClusterPostgres-XC: Symmetric PostgreSQL Cluster
Postgres-XC: Symmetric PostgreSQL Cluster
 
Mule batch
Mule batchMule batch
Mule batch
 
Scylla Summit 2022: How to Migrate a Counter Table for 68 Billion Records
Scylla Summit 2022: How to Migrate a Counter Table for 68 Billion RecordsScylla Summit 2022: How to Migrate a Counter Table for 68 Billion Records
Scylla Summit 2022: How to Migrate a Counter Table for 68 Billion Records
 
Hadoop job chaining
Hadoop job chainingHadoop job chaining
Hadoop job chaining
 
How to leave the ORM at home and write SQL
How to leave the ORM at home and write SQLHow to leave the ORM at home and write SQL
How to leave the ORM at home and write SQL
 
Accordion HBaseCon 2017
Accordion HBaseCon 2017Accordion HBaseCon 2017
Accordion HBaseCon 2017
 
Java one2015 - Work With Hundreds of Hot Terabytes in JVMs
Java one2015 - Work With Hundreds of Hot Terabytes in JVMsJava one2015 - Work With Hundreds of Hot Terabytes in JVMs
Java one2015 - Work With Hundreds of Hot Terabytes in JVMs
 
Training Slides: 253 - Filter like a Pro
Training Slides: 253 - Filter like a ProTraining Slides: 253 - Filter like a Pro
Training Slides: 253 - Filter like a Pro
 
HBaseConAsia2018 Track1-2: WALLess HBase with persistent memory devices
HBaseConAsia2018 Track1-2: WALLess HBase with persistent memory devicesHBaseConAsia2018 Track1-2: WALLess HBase with persistent memory devices
HBaseConAsia2018 Track1-2: WALLess HBase with persistent memory devices
 
hbaseconasia2017: HBase Practice At XiaoMi
hbaseconasia2017: HBase Practice At XiaoMihbaseconasia2017: HBase Practice At XiaoMi
hbaseconasia2017: HBase Practice At XiaoMi
 
3 Reasons to Love React
3 Reasons to Love React3 Reasons to Love React
3 Reasons to Love React
 
Virtual Memory Management
Virtual Memory ManagementVirtual Memory Management
Virtual Memory Management
 

Similar to Functional?
 Reactive? 
Why?

Concurrency, Performance, Parallelism
Concurrency, Performance, ParallelismConcurrency, Performance, Parallelism
Concurrency, Performance, ParallelismTimetrix
 
(1) Briefly describe what overhead is associated with managing and s.pdf
(1) Briefly describe what overhead is associated with managing and s.pdf(1) Briefly describe what overhead is associated with managing and s.pdf
(1) Briefly describe what overhead is associated with managing and s.pdfindiaartz
 
Mapping Data Flows Perf Tuning April 2021
Mapping Data Flows Perf Tuning April 2021Mapping Data Flows Perf Tuning April 2021
Mapping Data Flows Perf Tuning April 2021Mark Kromer
 
Network-aware Data Management for High Throughput Flows Akamai, Cambridge, ...
Network-aware Data Management for High Throughput Flows   Akamai, Cambridge, ...Network-aware Data Management for High Throughput Flows   Akamai, Cambridge, ...
Network-aware Data Management for High Throughput Flows Akamai, Cambridge, ...balmanme
 
Coding For Cores - C# Way
Coding For Cores - C# WayCoding For Cores - C# Way
Coding For Cores - C# WayBishnu Rawal
 
Parallel Computing in .NET
Parallel Computing in .NETParallel Computing in .NET
Parallel Computing in .NETmeghantaylor
 
Fast switching of threads between cores - Advanced Operating Systems
Fast switching of threads between cores - Advanced Operating SystemsFast switching of threads between cores - Advanced Operating Systems
Fast switching of threads between cores - Advanced Operating SystemsRuhaim Izmeth
 
Fabric Data Factory Pipeline Copy Perf Tips.pptx
Fabric Data Factory Pipeline Copy Perf Tips.pptxFabric Data Factory Pipeline Copy Perf Tips.pptx
Fabric Data Factory Pipeline Copy Perf Tips.pptxMark Kromer
 
Hadoop World 2011: Hadoop Network and Compute Architecture Considerations - J...
Hadoop World 2011: Hadoop Network and Compute Architecture Considerations - J...Hadoop World 2011: Hadoop Network and Compute Architecture Considerations - J...
Hadoop World 2011: Hadoop Network and Compute Architecture Considerations - J...Cloudera, Inc.
 
Amazon Aurora TechConnect
Amazon Aurora TechConnect Amazon Aurora TechConnect
Amazon Aurora TechConnect LavanyaMurthy9
 
Making_Good_Enough...Better-Addressing_the_Multiple_Objectives_of_High-Perfor...
Making_Good_Enough...Better-Addressing_the_Multiple_Objectives_of_High-Perfor...Making_Good_Enough...Better-Addressing_the_Multiple_Objectives_of_High-Perfor...
Making_Good_Enough...Better-Addressing_the_Multiple_Objectives_of_High-Perfor...John Gunnels
 
Architecting a 35 PB distributed parallel file system for science
Architecting a 35 PB distributed parallel file system for scienceArchitecting a 35 PB distributed parallel file system for science
Architecting a 35 PB distributed parallel file system for scienceSpeck&Tech
 
Deep Dive on the Amazon Aurora MySQL-compatible Edition - DAT301 - re:Invent ...
Deep Dive on the Amazon Aurora MySQL-compatible Edition - DAT301 - re:Invent ...Deep Dive on the Amazon Aurora MySQL-compatible Edition - DAT301 - re:Invent ...
Deep Dive on the Amazon Aurora MySQL-compatible Edition - DAT301 - re:Invent ...Amazon Web Services
 
SPL_ALL_EN.pptx
SPL_ALL_EN.pptxSPL_ALL_EN.pptx
SPL_ALL_EN.pptx政宏 张
 
SQL Server It Just Runs Faster
SQL Server It Just Runs FasterSQL Server It Just Runs Faster
SQL Server It Just Runs FasterBob Ward
 
Multiscale Dataflow Computing: Competitive Advantage at the Exascale Frontier
Multiscale Dataflow Computing: Competitive Advantage at the Exascale FrontierMultiscale Dataflow Computing: Competitive Advantage at the Exascale Frontier
Multiscale Dataflow Computing: Competitive Advantage at the Exascale Frontierinside-BigData.com
 
Optimizing your java applications for multi core hardware
Optimizing your java applications for multi core hardwareOptimizing your java applications for multi core hardware
Optimizing your java applications for multi core hardwareIndicThreads
 

Similar to Functional?
 Reactive? 
Why? (20)

Concurrency, Performance, Parallelism
Concurrency, Performance, ParallelismConcurrency, Performance, Parallelism
Concurrency, Performance, Parallelism
 
(1) Briefly describe what overhead is associated with managing and s.pdf
(1) Briefly describe what overhead is associated with managing and s.pdf(1) Briefly describe what overhead is associated with managing and s.pdf
(1) Briefly describe what overhead is associated with managing and s.pdf
 
Mapping Data Flows Perf Tuning April 2021
Mapping Data Flows Perf Tuning April 2021Mapping Data Flows Perf Tuning April 2021
Mapping Data Flows Perf Tuning April 2021
 
Network-aware Data Management for High Throughput Flows Akamai, Cambridge, ...
Network-aware Data Management for High Throughput Flows   Akamai, Cambridge, ...Network-aware Data Management for High Throughput Flows   Akamai, Cambridge, ...
Network-aware Data Management for High Throughput Flows Akamai, Cambridge, ...
 
Coding For Cores - C# Way
Coding For Cores - C# WayCoding For Cores - C# Way
Coding For Cores - C# Way
 
Parallel Computing in .NET
Parallel Computing in .NETParallel Computing in .NET
Parallel Computing in .NET
 
Fast switching of threads between cores - Advanced Operating Systems
Fast switching of threads between cores - Advanced Operating SystemsFast switching of threads between cores - Advanced Operating Systems
Fast switching of threads between cores - Advanced Operating Systems
 
Fabric Data Factory Pipeline Copy Perf Tips.pptx
Fabric Data Factory Pipeline Copy Perf Tips.pptxFabric Data Factory Pipeline Copy Perf Tips.pptx
Fabric Data Factory Pipeline Copy Perf Tips.pptx
 
Introducing Amazon Aurora
Introducing Amazon AuroraIntroducing Amazon Aurora
Introducing Amazon Aurora
 
Hadoop World 2011: Hadoop Network and Compute Architecture Considerations - J...
Hadoop World 2011: Hadoop Network and Compute Architecture Considerations - J...Hadoop World 2011: Hadoop Network and Compute Architecture Considerations - J...
Hadoop World 2011: Hadoop Network and Compute Architecture Considerations - J...
 
Amazon Aurora TechConnect
Amazon Aurora TechConnect Amazon Aurora TechConnect
Amazon Aurora TechConnect
 
Making_Good_Enough...Better-Addressing_the_Multiple_Objectives_of_High-Perfor...
Making_Good_Enough...Better-Addressing_the_Multiple_Objectives_of_High-Perfor...Making_Good_Enough...Better-Addressing_the_Multiple_Objectives_of_High-Perfor...
Making_Good_Enough...Better-Addressing_the_Multiple_Objectives_of_High-Perfor...
 
Architecting a 35 PB distributed parallel file system for science
Architecting a 35 PB distributed parallel file system for scienceArchitecting a 35 PB distributed parallel file system for science
Architecting a 35 PB distributed parallel file system for science
 
Chap2 slides
Chap2 slidesChap2 slides
Chap2 slides
 
Deep Dive on the Amazon Aurora MySQL-compatible Edition - DAT301 - re:Invent ...
Deep Dive on the Amazon Aurora MySQL-compatible Edition - DAT301 - re:Invent ...Deep Dive on the Amazon Aurora MySQL-compatible Edition - DAT301 - re:Invent ...
Deep Dive on the Amazon Aurora MySQL-compatible Edition - DAT301 - re:Invent ...
 
SPL_ALL_EN.pptx
SPL_ALL_EN.pptxSPL_ALL_EN.pptx
SPL_ALL_EN.pptx
 
SQL Server It Just Runs Faster
SQL Server It Just Runs FasterSQL Server It Just Runs Faster
SQL Server It Just Runs Faster
 
Deep Dive on Amazon Aurora
Deep Dive on Amazon AuroraDeep Dive on Amazon Aurora
Deep Dive on Amazon Aurora
 
Multiscale Dataflow Computing: Competitive Advantage at the Exascale Frontier
Multiscale Dataflow Computing: Competitive Advantage at the Exascale FrontierMultiscale Dataflow Computing: Competitive Advantage at the Exascale Frontier
Multiscale Dataflow Computing: Competitive Advantage at the Exascale Frontier
 
Optimizing your java applications for multi core hardware
Optimizing your java applications for multi core hardwareOptimizing your java applications for multi core hardware
Optimizing your java applications for multi core hardware
 

Recently uploaded

Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - InfographicHr365.us smith
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptkotipi9215
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
Engage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyEngage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyFrank van der Linden
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...Christina Lin
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about usDynamic Netsoft
 
Introduction to Decentralized Applications (dApps)
Introduction to Decentralized Applications (dApps)Introduction to Decentralized Applications (dApps)
Introduction to Decentralized Applications (dApps)Intelisync
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number SystemsJheuzeDellosa
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxComplianceQuest1
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackVICTOR MAESTRE RAMIREZ
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...OnePlan Solutions
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfkalichargn70th171
 

Recently uploaded (20)

Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - Infographic
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.ppt
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
Engage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyEngage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The Ugly
 
Exploring iOS App Development: Simplifying the Process
Exploring iOS App Development: Simplifying the ProcessExploring iOS App Development: Simplifying the Process
Exploring iOS App Development: Simplifying the Process
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about us
 
Introduction to Decentralized Applications (dApps)
Introduction to Decentralized Applications (dApps)Introduction to Decentralized Applications (dApps)
Introduction to Decentralized Applications (dApps)
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number Systems
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStack
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
 

Functional?
 Reactive? 
Why?

  • 2. Concurrency and Parallelism ■ Quite often mixed together ■ But common problems are actual for both concurrency and parallelism
  • 3.
  • 4. Process vs Thread š Processes has separate address space š Threads can share process resources š For the kernel thread is a lightweight process š Shared resources make pain… real
  • 5.
  • 6. Race condition ■ Modifying an one shared state in a concurrent way
  • 7.
  • 8. Locks and synchronization ■ There is a bunch of methods for synchronising state ■ Locks/barriers/semaphores ■ Lock free algorithms (Compare and Set) ■ Lock free means often performance implications
  • 10.
  • 12.
  • 13. Look mom it is a F…ING Shared State
  • 14. The cache is the source of truth ■ Registers ■ Memory Ordering Buffers ■ L1 and L2 caches: core-local ■ L3 cache: shared across cores ■ A cache coherence protocol is needed
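To make the MESIF states from the Editor's Notes concrete, here is a heavily simplified Python model of one cache line shared by several cores. It is an illustrative assumption, not Intel's implementation: it ignores write-back timing, evictions and the real bus messages, and keeps only the transitions described in the notes (a write issues a Read-For-Ownership that invalidates all other copies; one shared copy is designated as the Forwarder, here the most recent reader).

```python
from enum import Enum

class State(Enum):
    MODIFIED = "M"   # dirty, the only valid copy
    EXCLUSIVE = "E"  # clean, the only copy
    SHARED = "S"     # clean, other copies may exist
    INVALID = "I"    # not usable
    FORWARD = "F"    # shared copy designated to answer other caches

class CacheLine:
    """Simplified MESIF view of one cache line across several cores."""
    def __init__(self, n_cores):
        self.states = [State.INVALID] * n_cores

    def read(self, core):
        if self.states[core] is not State.INVALID:
            return                                   # cache hit: no transition
        holders = [c for c, s in enumerate(self.states) if s is not State.INVALID]
        if not holders:
            self.states[core] = State.EXCLUSIVE      # first reader holds a clean copy
            return
        for c in holders:
            # a MODIFIED copy would be written back here (simplified away);
            # every existing copy becomes an ordinary SHARED copy
            self.states[c] = State.SHARED
        self.states[core] = State.FORWARD            # newest reader forwards from now on

    def write(self, core):
        # Read-For-Ownership: invalidate every other copy, then modify locally
        for c in range(len(self.states)):
            if c != core:
                self.states[c] = State.INVALID
        self.states[core] = State.MODIFIED

    def snapshot(self):
        return "".join(s.value for s in self.states)

line = CacheLine(3)
line.read(0)
print(line.snapshot())   # EII
line.read(1)
print(line.snapshot())   # SFI
line.write(2)
print(line.snapshot())   # IIM
line.read(0)
print(line.snapshot())   # FIS
```

Every write by one core invalidates the line in all other cores, which is why writes to shared mutable state are expensive even without explicit locks.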
  • 15. REORDERING OF OPERATIONS ■ VIRTUAL REGISTERS ■ OPERATING THROUGH THE LOAD/STORE QUEUE
  • 16. The L3 cache is inclusive of all data in the L1 and L2 for each core on the same socket. ■ Shared Mutable Cache
  • 18. Memory Barriers ■ The L3 cache must be synchronised ■ The cache coherence protocol is executed ■ A memory fence instruction is issued ■ Out-of-order (OOO) execution must finish ■ Pipelines are flushed
  • 19. Memory Model ■ Mapping virtual addresses to physical ones requires computation and/or page directory accesses ■ Even with the page tables held in L1, each translation would cost 16 cycles ■ There is another cache for that ■ The Translation Lookaside Buffer (TLB)
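A toy Python model of why the TLB matters, under loudly stated assumptions: 4 KiB pages, a single-level page table held in a dict, and a tiny LRU-managed TLB. The names `page_table`, `TLB_SIZE` and `translate` are hypothetical; real x86 hardware walks several levels of page directories, and the "walk" counter simply marks where that slow path would happen.

```python
from collections import OrderedDict

PAGE_SHIFT = 12                      # 4 KiB pages (assumption)
PAGE_MASK = (1 << PAGE_SHIFT) - 1
TLB_SIZE = 4                         # deliberately tiny TLB (hypothetical)

page_table = {vpn: vpn + 100 for vpn in range(16)}   # virtual page -> physical frame
tlb = OrderedDict()                                  # cached translations, LRU order
stats = {"hits": 0, "walks": 0}

def translate(vaddr):
    vpn, offset = vaddr >> PAGE_SHIFT, vaddr & PAGE_MASK
    if vpn in tlb:
        stats["hits"] += 1
        tlb.move_to_end(vpn)                # mark as most recently used
        frame = tlb[vpn]
    else:
        stats["walks"] += 1                 # slow path: consult the page table
        frame = page_table[vpn]
        tlb[vpn] = frame
        if len(tlb) > TLB_SIZE:
            tlb.popitem(last=False)         # evict the least recently used entry
    return (frame << PAGE_SHIFT) | offset

# 1000 accesses confined to two pages: only the first touch of each page walks
for i in range(1000):
    translate((i % 2) * 4096 + i)
print(stats)   # {'hits': 998, 'walks': 2}

tlb.clear()    # models the TLB flush the slides attribute to a process switch
translate(0)
print(stats)   # {'hits': 998, 'walks': 3}
```

After the flush, even a page that was just in use needs a full walk again, which is the "long-lasting effect" of context switches discussed in the following slides.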
  • 20. Some attacks use the TLB: TLBleed ■ If an address was accessed by some thread, its translation is cached in the TLB ■ Another thread can measure indirect access times
  • 21. Context switch ■ Storing/restoring context information ■ Can flush core pipelines ■ Quite expensive ■ A process context switch also invalidates the TLB ■ Thread context switches are less expensive
  • 22. Performing the Process Switch ■ Here, we are only concerned with how the kernel performs a process switch. ■ Essentially, every process switch consists of two steps: ■ Switching the Page Global Directory to install a new address space. ■ Switching the Kernel Mode stack and the hardware context, which provides all the information needed by the kernel to execute the new process, including the CPU registers. ■ The description of the logic and steps in Understanding the Linux Kernel takes 4-5 pages.
  • 23. INVALIDATES CACHES AND TLB ■ CACHES SHOULD BE REFILLED ■ CONTEXT SWITCH COST HAS A LONG-LASTING EFFECT ■ True cost and long-lasting effect
  • 24. Tracing Info ■ Average processing time: 40-60 ms ■ With increased process switching it rose to 500-1000 ms ■ A DB query takes 40-50 ms
  • 25. DB QUERY TIME IS QUITE CONSTANT ■ PROCESSING TIME IN THE NORMAL CASE: 1-3 MS ■ AFTER A CONTEXT SWITCH: MORE THAN 40 MS
  • 26. Tracing at the kernel level ■ Python VM with threaded execution ■ Many mutex operations (GIL effect) ■ Many gettimeofday() calls ■ I/O operations optimised with mmap
  • 27. ■ Synchronisation costs core pipeline flushes ■ Thread structures are memory-expensive ■ Overhead grows non-linearly
  • 28. Why do we need so many threads? ■ Many operations include remote calls (DB, other services) ■ Synchronous calls block thread execution ■ Classical web servers open a new thread for every incoming connection ■ The 10,000 connections problem (C10K)
  • 29. Code spends more time waiting than working ■ Usually the DB can handle more load than the application ■ A common first step in scaling is to add app instances ■ For many operations with blocking drivers and calls this works
  • 30. Pain – pain – pain ■ Creating and operating threads is expensive ■ Threads blocked by a synchronous call can be rescheduled ■ More and more context switches
  • 32. Object-Oriented Programming ■ Objects encapsulate state ■ Methods can change internal state ■ Object invariants can be broken under concurrent access ■ Semantics oriented on nouns: me.walkTo(store.open(basket.add(milk)))
  • 33. Functional Programming ■ Oriented on functional composition ■ Functions have no side effects ■ Has some performance implications ■ Semantics oriented on verbs: walk(open(store, add(basket, milk)))
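A small Python sketch of the verb-oriented style, assuming hypothetical pure functions `add`, `open_store` and `walk` (`open` is renamed to avoid shadowing Python's built-in) and a hypothetical `compose` helper. Each function returns a new immutable value instead of mutating an object, so the same steps can be nested directly or assembled into a reusable pipeline.

```python
from dataclasses import dataclass, replace
from functools import reduce

@dataclass(frozen=True)
class Basket:
    items: tuple = ()

@dataclass(frozen=True)
class Trip:
    store: str
    basket: Basket
    arrived: bool = False

def add(basket, item):
    return replace(basket, items=basket.items + (item,))   # new Basket, old one untouched

def open_store(store, basket):
    return Trip(store=store, basket=basket)

def walk(trip):
    return replace(trip, arrived=True)

def compose(*fns):
    """compose(f, g, h)(x) == f(g(h(x)))"""
    return lambda x: reduce(lambda acc, f: f(acc), reversed(fns), x)

empty = Basket()
# direct nesting, mirroring walk(open(store, add(basket, milk)))
trip = walk(open_store("store", add(empty, "milk")))

# the same steps as a composed function of the item
shop_for = compose(walk, lambda b: open_store("store", b), lambda item: add(empty, item))
print(shop_for("milk") == trip)   # True
print(empty.items)                # (): the original basket was never mutated
```

Because nothing is mutated, two threads can run `shop_for` on the same `empty` basket without any lock; the price is allocating a new object at every step, which is the performance implication the slide mentions.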
  • 34. Lazy execution ■ Functional composition means creating pipelines ■ The pipeline is defined before the actual computation ■ Declarative ■ More freedom for runtime optimizations
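Python generators give a minimal sketch of lazy pipelines; they are a stand-in here, since the slides target Java streams and reactive types. Building the pipeline only wires the stages together; work starts when a consumer pulls, and each element flows through all stages before the next one is produced. The stage names `numbers`, `square` and `keep_even` are hypothetical.

```python
log = []   # records when each stage actually runs

def numbers(n):
    for i in range(n):
        log.append(f"produce {i}")
        yield i

def square(xs):
    for x in xs:
        log.append(f"square {x}")
        yield x * x

def keep_even(xs):
    return (x for x in xs if x % 2 == 0)

pipeline = keep_even(square(numbers(4)))   # defined, but nothing has run yet
print(log)              # []
print(list(pipeline))   # [0, 4]
print(log[:2])          # ['produce 0', 'square 0']
```

The interleaved log shows the runtime freedom the slide refers to: the stages are fused element by element, so no intermediate list is ever built.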
  • 37. Reactive sides of a coin ■ Actor model ■ Communicating Sequential Processes ■ Kotlin coroutines ■ Java Reactive interfaces
  • 38. Two sides of a coin ■ Functional: building software by composing pure functions, avoiding shared state, mutable data, and side effects ■ Reactive: an asynchronous programming paradigm concerned with data streams and the propagation of change
  • 39. So what is the main goal? To maximise the use rate of modern multicore CPUs and, more precisely, of the threads competing for their use.
  • 40. FUNCTIONAL COMPOSITION THE HASKELL WAY ■ BREADTH-FIRST RECURSIVE SEARCH ■ THE DB WORKS IN STREAMING MODE
  • 41. Pulling vs. Pushing Data ■ Java streams: pull model ■ Reactive: push model ■ Reacting to propagated changes instead of iterating
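The difference between the two models fits in a few lines of Python. The `Subject` class below is hypothetical, a bare-bones observer rather than the RxJava or Reactor API; it omits the completion, error and backpressure signals that Reactive Streams define.

```python
class Subject:
    """Minimal push-based stream: the producer calls its subscribers."""
    def __init__(self):
        self._subscribers = []

    def subscribe(self, on_next):
        self._subscribers.append(on_next)

    def emit(self, value):
        for on_next in self._subscribers:
            on_next(value)            # push: the consumer reacts, it does not ask

# Pull model: the consumer drives iteration and asks for each element.
source = iter([1, 2, 3])
pulled = [next(source) * 10 for _ in range(3)]

# Push model: the consumer registers a reaction; the producer decides when data flows.
pushed = []
prices = Subject()
prices.subscribe(lambda v: pushed.append(v * 10))
for v in [1, 2, 3]:
    prices.emit(v)

print(pulled, pushed)  # [10, 20, 30] [10, 20, 30]
```

The results are identical; what changes is who holds control. In the push model no consumer thread sits waiting on `next()`, which is what lets a reactive runtime multiplex many streams over few threads.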
  • 42. Blocking Processing ■ Mapping one execution path onto one thread is inefficient ■ Threads are blocked waiting for I/O operations to complete ■ The way out is to share threads (relatively expensive and scarce resources) among lighter constructs ■ Such as a functional composition of execution paths
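A minimal Python sketch of sharing one thread among many execution paths, using the standard `asyncio` event loop as a stand-in for a reactive scheduler. `fake_db_query` is hypothetical and `asyncio.sleep` simulates a non-blocking driver call; the point is that 1000 concurrent "queries" run on a single OS thread and finish in roughly the time of one.

```python
import asyncio
import threading
import time

async def fake_db_query(i, seen_threads):
    seen_threads.add(threading.get_ident())   # record which OS thread runs us
    await asyncio.sleep(0.05)                 # non-blocking wait standing in for DB I/O
    return i * 2

async def main(n):
    seen_threads = set()
    results = await asyncio.gather(*(fake_db_query(i, seen_threads) for i in range(n)))
    return results, seen_threads

start = time.perf_counter()
results, seen_threads = asyncio.run(main(1000))
elapsed = time.perf_counter() - start

print(len(results), len(seen_threads))   # 1000 1
print(f"{elapsed:.2f}s")                 # close to 0.05 s plus overhead, not 1000 * 0.05 = 50 s
```

With a thread-per-request design the same workload would need up to 1000 threads, each mostly blocked, and every one of them a candidate for the context switches described earlier.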
  • 44. Let’s test ■ 2514 units of work (DB request and computation) ■ Scheduled once a minute ■ Execution on an 8-thread pool with a blocking driver takes 30-35 sec on a dedicated server ■ Let’s map it onto threads and onto a reactive pool
  • 45. More than 2553 threads at start ■ CPU and memory disturbances ■ Many 5 sec timeouts on the driver side
  • 46. 8 threads with non-blocking I/O ■ Lower CPU and memory usage ■ Uniformly scheduled execution
  • 47. One thread – one execution path ■ One thread – many execution paths
  • 48. DB LOAD – ONE REACTIVE INSTANCE CAN LOAD UP THE DB
  • 49. Reactive is more than just async and non-blocking execution ■ Advanced time scheduling ■ Flexible scheduling ■ Backpressure control ■ Resilience to errors
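Reactive Streams implement backpressure by letting the subscriber signal demand (`Subscription.request(n)`). The Python sketch below swaps in a simpler mechanism, a bounded `asyncio.Queue`: a fast producer is suspended whenever a slow consumer falls behind, so the buffer never grows past its limit. The buffer size, item count and timings are arbitrary assumptions.

```python
import asyncio

async def producer(queue, n, sizes):
    for i in range(n):
        await queue.put(i)            # suspends while the buffer is full: backpressure
        sizes.append(queue.qsize())   # observed buffer occupancy

async def consumer(queue, n, out):
    for _ in range(n):
        item = await queue.get()
        await asyncio.sleep(0.001)    # simulate a slow consumer
        out.append(item)

async def main(n=50, buffer=5):
    queue = asyncio.Queue(maxsize=buffer)   # bounded buffer
    sizes, out = [], []
    await asyncio.gather(producer(queue, n, sizes), consumer(queue, n, out))
    return sizes, out

sizes, out = asyncio.run(main())
print(max(sizes) <= 5, out == list(range(50)))   # True True
```

An unbounded queue would let the producer race ahead and accumulate memory, which is the failure mode backpressure control exists to prevent.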
  • 50. Java Reactive Revolution ■ RxJava2/Reactor/Vert.x ■ Spring 5 with WebFlux ■ Many reactive drivers ■ Active community
  • 51. Recommendations ■ Understanding the Linux Kernel (O'Reilly Media) ■ Optimizing Java (O'Reilly Media) ■ Seven Concurrency Models in Seven Weeks (The Pragmatic Bookshelf) ■ Learning Haskell

Editor's Notes

  1. Registers: Within each core are separate register files containing 160 entries for integers and 144 for floating-point numbers. These registers are accessible within a single cycle and constitute the fastest memory available to our execution cores. Memory Ordering Buffers (MOB): The MOB consists of a 64-entry load buffer and a 36-entry store buffer. These buffers are used to track in-flight operations while waiting on the cache sub-system as instructions are executed out of order. The store buffer is a fully associative queue that can be searched for existing store operations which have been queued while waiting on the L1 cache. These buffers enable our fast processors to run without blocking while data is transferred to and from the cache sub-system. When the processor issues reads and writes, they can come back out of order. The MOB is used to disambiguate the load and store ordering for compliance with the published memory model. Level 1 Cache: The L1 is a core-local cache split into separate 32K data and 32K instruction caches. Access time is 3 cycles and can be hidden as instructions are pipelined by the core for data already in the L1 cache. Level 2 Cache: The L2 cache is a core-local cache designed to buffer access between the L1 and the shared L3 cache. Level 3 Cache: The L3 cache is shared across all cores within a socket. Main Memory: DRAM channels are connected to each socket with an average latency of ~65 ns for socket-local access on a full cache miss. This is, however, extremely variable, being much less for subsequent accesses to columns in the same row buffer. NUMA: In a multi-socket server we have non-uniform memory access. It is non-uniform because the required memory may be on a remote socket, adding a 40 ns hop across the QPI bus. Associativity Levels: Caches are effectively hardware-based hash tables. The hash function is usually a simple masking of some low-order bits for cache indexing.
Hash tables need some means of handling collisions for the same slot. The L3 cache is inclusive in that any cache line held in the L1 or L2 caches is also held in the L3. This provides for rapid identification of the core containing a modified line when snooping for changes. The cache controller for the L3 segment keeps track of which core could have a modified version of a cache line it owns.
  2. Cache Coherence: With some caches being local to cores, we need a means of keeping them coherent so that all cores have a consistent view of memory. The cache sub-system is considered the "source of truth" for mainstream systems. If memory is fetched from the cache it is never stale; the cache is the master copy when data exists in both the cache and main memory. To keep the caches coherent, the cache controller tracks the state of each cache line as being in one of a finite number of states. The protocol Intel employs for this is MESIF; AMD employs a variant known as MOESI. Under the MESIF protocol each cache line can be in one of the following 5 states: Modified: Indicates the cache line is dirty and must be written back to memory at a later stage. When written back to main memory the state transitions to Exclusive. Exclusive: Indicates the cache line is held exclusively and that it matches main memory. When written to, the state then transitions to Modified. To achieve this state a Read-For-Ownership (RFO) message is sent, which involves a read plus an invalidate broadcast to all other copies. Shared: Indicates a clean copy of a cache line that matches main memory. Invalid: Indicates an unused cache line. Forward: Indicates a specialised version of the Shared state, i.e. this is the designated cache which should respond to other caches in a NUMA system.
  3. Translation Lookaside Buffers (TLB): Besides general-purpose hardware caches, 80x86 processors include another cache called Translation Lookaside Buffers (TLB) to speed up linear address translation. When a linear address is used for the first time, the corresponding physical address is computed through slow accesses to the Page Tables in RAM. The physical address is then stored in a TLB entry so that further references to the same linear address can be quickly translated. Without the TLB, all virtual address lookups would take 16 cycles, even if the page table was held in the L1 cache. Performance would be unacceptable, so the TLB is basically essential for all modern chips.
  4. TLBleed shows that, by monitoring hyper-thread activity through the TLB instead of caches, information can still leak between processes even with full cache isolation or protection policies in effect.
  5. A context switch is the process by which the OS scheduler removes a currently running thread or task and replaces it with one that is waiting. There are several different types of context switch, but broadly speaking, they all involve swapping the executing instructions and the stack state of the thread. A context switch can be a costly operation, whether between user threads or from user mode into kernel mode (sometimes called a mode switch). The latter case is particularly important, because a user thread may need to swap into kernel mode in order to perform some function partway through its time slice. However, this switch will force instruction and other caches to be emptied, as the memory areas accessed by the user space code will not normally have anything in common with the kernel. For each process, Linux packs two different data structures in a single per-process memory area: a small data structure linked to the process descriptor, namely the thread_info structure, and the Kernel Mode process stack.
  6. A context switch into kernel mode will invalidate the TLBs and potentially other caches. When the call returns, these caches will have to be refilled, and so the effect of a kernel mode switch persists even after control has returned to user space. This causes the true cost of a system call to be masked, as can be seen
  7. Runtime optimisations threading
  8. The main feature of reactive programming for application-level components is that it allows tasks to be executed asynchronously. Processing streams of events in an asynchronous and non-blocking way is essential for maximizing the use rate of modern multicore CPUs and, more precisely, of the threads competing for their use.
  9. In non-blocking or asynchronous request processing, no thread is in a waiting state. There is generally only one request thread receiving requests. Every incoming request comes with an event handler and callback information. The request thread delegates incoming requests to a thread pool (generally a small number of threads), which passes each request to its handler function, while the request thread immediately starts processing other incoming requests. When the handler function completes, a thread from the pool collects the response and passes it to the callback function.
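The delegate-and-callback flow from note 9 can be sketched with Python's `concurrent.futures`, assuming a hypothetical `handler` and `on_complete` callback. A real non-blocking server would receive requests through an event loop; here the main thread plays the request thread, so the sketch only illustrates the hand-off pattern, not a complete server.

```python
from concurrent.futures import ThreadPoolExecutor
import threading

responses = []
responses_lock = threading.Lock()

def handler(request):
    """Hypothetical request handler, executed on a pool thread."""
    return f"response to {request}"

def on_complete(future):
    """Callback invoked with the finished future, normally on the pool thread."""
    with responses_lock:
        responses.append(future.result())

pool = ThreadPoolExecutor(max_workers=4)   # small pool, as described in the note

# The "request thread" (here: the main thread) never waits on a handler;
# it submits each request with its callback and moves on immediately.
for request_id in range(20):
    future = pool.submit(handler, f"request {request_id}")
    future.add_done_callback(on_complete)

pool.shutdown(wait=True)   # demo only: wait so the collected responses can be checked
print(len(responses))      # 20
```

`add_done_callback` runs the callback in the thread that completes the future, or immediately in the caller if the future has already finished, so the shared `responses` list is guarded by a lock.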