The document discusses techniques for improving performance in Scala applications by reducing object allocation and improving data locality. It describes how excessive object instantiation can hurt performance by increasing garbage collection work and introducing non-determinism. Extractor objects are presented as a tool for pattern matching that can improve brevity and expressiveness. Name-based extractors introduced in Scala 2.11 avoid object allocation. The talk also covers how caching hierarchies work to reduce memory access latency and the importance of data access patterns for effective cache utilization. Cache-oblivious algorithms are designed to optimize memory hierarchy usage without knowing cache details. Synchronization is noted to have performance costs as well in an example event log implementation.
3. THINGS TO KEEP IN MIND
▸ Actual object instantiation costs time
▸ Creates a lot more work for the Garbage Collector
▸ May introduce systemic pauses in your application
▸ Introduces non-determinism
4. THINGS TO KEEP IN MIND
▸ A fair number of systems implement GC-free data
structures living entirely off-heap
▸ Some commercial solutions provide pause-free
GC implementations (Azul Zing)
▸ There are even proposals to introduce a No-Op
GC in Java
6. EXTRACTORS IN A NUTSHELL
▸ An object with an unapply method
▸ Takes an object and tries to give back its
components
▸ Useful in pattern matching, partial functions, etc
▸ A great tool for achieving brevity and
expressiveness
7. EXTRACTING BLACK ROOKS
case class Rook(x: Int,
y: Int,
isBlack: Boolean)
▸ We simply need to go through a board of rooks
▸ Match on all rooks that are black
▸ Extract the x and y coordinates
8. EXTRACTING BLACK ROOKS
object BlackRook {
def unapply(rook: Rook): Option[(Int, Int)] =
if (rook.isBlack) Some((rook.x, rook.y)) else None
}
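A minimal usage sketch of the extractor above, showing it inside a pattern match. The `BlackRookDemo` wrapper, the `board` value and the `blackRookCoordinates` helper are illustrative names, not part of the talk's code.

```scala
object BlackRookDemo {
  case class Rook(x: Int, y: Int, isBlack: Boolean)

  object BlackRook {
    // Succeeds only for black rooks, yielding their coordinates.
    def unapply(rook: Rook): Option[(Int, Int)] =
      if (rook.isBlack) Some((rook.x, rook.y)) else None
  }

  // Hypothetical helper: keep the coordinates of every black rook on the board.
  def blackRookCoordinates(board: Seq[Rook]): Seq[(Int, Int)] =
    board.collect { case BlackRook(x, y) => (x, y) }

  // Hypothetical sample board.
  val board: Seq[Rook] = Seq(
    Rook(0, 0, isBlack = true),
    Rook(3, 4, isBlack = false),
    Rook(7, 7, isBlack = true)
  )
}
```

Every successful match goes through `unapply`, so each black rook costs one `Some` and one tuple allocation.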
9. EXTRACTING BLACK ROOKS
public unapply(Lextractors/Rook;)Lscala/Option;
L0
LINENUMBER 5 L0
ALOAD 1
INVOKEVIRTUAL extractors/Rook.isBlack ()Z
IFEQ L1
NEW scala/Some
DUP
NEW scala/Tuple2$mcII$sp
DUP
. . .
object BlackRook {
def unapply(rook: Rook): Option[(Int, Int)] =
if (rook.isBlack) Some((rook.x, rook.y)) else None
}
11. ALLOCATION FREE EXTRACTORS
▸ Name-based extractors introduced in 2.11
▸ Returning Option is no longer needed
▸ We need to return an object defining two methods
def isEmpty: Boolean = . . .
def get: T = . . .
12. ALLOCATION-FREE EXAMPLE
object BlackRookNameBased {
class Extractor[T <: AnyRef](val extraction: T) extends AnyVal {
def isEmpty: Boolean = extraction eq null
def get: T = extraction
}
def unapply(rook: Rook): Extractor[(Int, Int)] =
if (rook.isBlack)
new Extractor((rook.x, rook.y))
else
new Extractor(null)
}
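A sketch of how the name-based extractor can be used in a match, assuming Scala 2.11 or later. The `NameBasedDemo` wrapper and the `describe` helper are illustrative names. Note that the coordinate tuple is still allocated; only the `Some` wrapper is avoided.

```scala
object NameBasedDemo {
  case class Rook(x: Int, y: Int, isBlack: Boolean)

  object BlackRookNameBased {
    // Value class: the wrapper is normally erased at runtime, so no Some is allocated.
    class Extractor[T <: AnyRef](val extraction: T) extends AnyVal {
      def isEmpty: Boolean = extraction eq null
      def get: T = extraction
    }

    def unapply(rook: Rook): Extractor[(Int, Int)] =
      if (rook.isBlack) new Extractor((rook.x, rook.y))
      else new Extractor[(Int, Int)](null) // null marks "no match"
  }

  // Hypothetical helper: the match syntax is the same as for Option-based extractors.
  def describe(rook: Rook): String = rook match {
    case BlackRookNameBased(x, y) => s"black rook at ($x, $y)"
    case _                        => "not a black rook"
  }
}
```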
16. GC STATS
▸ -XX:+PrintGCApplicationStoppedTime
Enables printing of how long each pause (for
example, a GC pause) lasted.
▸ -XX:+PrintGCApplicationConcurrentTime
Enables printing of the time elapsed since the last pause
(for example, a GC pause).
▸ -XX:+PrintGCTimeStamps
Enables printing of a time stamp at every GC.
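One possible way to pass the flags listed above from an sbt build, assuming sbt 1.x and a JDK 8-style HotSpot JVM (newer JDKs moved GC logging to `-Xlog`). This is a configuration sketch, not the talk's actual build.

```scala
// build.sbt (sketch): javaOptions only take effect when the JVM is forked.
run / fork := true
run / javaOptions ++= Seq(
  "-XX:+PrintGCApplicationStoppedTime",
  "-XX:+PrintGCApplicationConcurrentTime",
  "-XX:+PrintGCTimeStamps"
)
```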
17. HEAP STATS (DEFAULT)
 num   #instances    #bytes  class name
----------------------------------------------
   1:      262144    376704  scala.Tuple2$mcII$sp
   2:      262144   6291456  extractors.Rook
   3:      197796   3164736  java.lang.Integer
   4:       11931    286344  java.lang.String
   5:       17592    281472  scala.Some
HEAP STATS (NAME BASED)
 num   #instances    #bytes  class name
----------------------------------------------
   1:      262144    376704  scala.Tuple2$mcII$sp
   2:      262144   6291456  extractors.Rook
   3:      197796   3164736  java.lang.Integer
   4:       11949    286776  java.lang.String
   5:        3108    174048  jdk.internal.org.objectweb.asm.Item
19. HOW LATENCY IS MASKED
▸ Modern CPUs have a multitude of caches
▸ These caches vary in size and latency
▸ Main purpose is to mask latency and ensure our
CPU’s progress is not severely hindered by main
memory latency
▸ A cache miss is one of the most prominent
performance killers
20. HIERARCHY OF CACHES
▸ L1 - core-local cache split into separate 32 KB
data and 32 KB instruction caches. (1.5 ns)
▸ L2 - core-local cache of 256 KB. Contains both
data and instructions. (5 ns)
▸ L3 - typically 6 MB. Shared between cores. (16 - 25 ns)
▸ RAM - large in size. (60 ns)
24. MATRIX TRANSPOSITION
def transpose: Matrix = {
  val newMatrix = Matrix.empty(rows)
  for {i <- 0 until rows}
    for {j <- 0 until cols}
      newMatrix.data.update(j + i * rows, this.data(i + j * cols))
  newMatrix
}
25. MATRIX TRANSPOSITION
▸ We have cache and memory
▸ Latency to memory is significantly higher
▸ We load data in cache lines of size 32 bytes (4 longs)
▸ Cache size is one line
def transpose: Matrix = {
  val newMatrix = Matrix.empty(rows)
  for {i <- 0 until rows}
    for {j <- 0 until cols}
      newMatrix.data.update(j + i * rows, this.data(i + j * cols))
  newMatrix
}
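The slide's simplified model (4 longs per cache line, a cache that holds a single line) can be turned into a small counting sketch. It models reads only and ignores the writes to the new matrix, which is a simplification of mine; all names are illustrative.

```scala
object TransposeCacheModel {
  val LongsPerLine = 4 // 32-byte cache line / 8-byte long, as on the slide

  // Counts how many cache lines are loaded for a sequence of element indices
  // when the cache can hold exactly one line.
  def countLineLoads(indices: Iterator[Int]): Int = {
    var currentLine = -1
    var loads = 0
    indices.foreach { idx =>
      val line = idx / LongsPerLine
      if (line != currentLine) { currentLine = line; loads += 1 }
    }
    loads
  }

  // Read order of the transpose loop: data(i + j * cols), with j in the inner loop.
  def stridedReads(rows: Int, cols: Int): Iterator[Int] =
    for { i <- Iterator.range(0, rows); j <- Iterator.range(0, cols) } yield i + j * cols

  // Read order that walks the backing array front to back.
  def sequentialReads(rows: Int, cols: Int): Iterator[Int] =
    Iterator.range(0, rows * cols)
}
```

For an 8 x 8 matrix this model gives 64 line loads for the strided order against 16 for the sequential one: with a stride of 8 elements, every read lands on a different line than the previous read.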
39. CACHE OBLIVIOUS ALGORITHMS
▸ A bit of a misleading name…
▸ An algorithm designed to take advantage of the
underlying memory hierarchy
▸ No need to know details of the cache (size, length
of the cache lines, etc.)
▸ Reduces the problem, so that it eventually fits in
cache
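A sketch of the divide-and-conquer idea applied to transposition, in the spirit of the recursive approach described above. It assumes a row-major `Array[Long]`; the object name and the `BaseCase` cutoff are my own choices. The cutoff only limits recursion overhead, it is not tuned to any cache size.

```scala
object CacheObliviousTranspose {
  private val BaseCase = 8 // assumption: small blocks are transposed directly

  // src is a rows x cols row-major matrix; the result is cols x rows, row-major.
  def transpose(src: Array[Long], rows: Int, cols: Int): Array[Long] = {
    val dst = new Array[Long](rows * cols)

    // Transposes the sub-matrix [r0, r1) x [c0, c1).
    def go(r0: Int, r1: Int, c0: Int, c1: Int): Unit = {
      val dr = r1 - r0
      val dc = c1 - c0
      if (dr <= BaseCase && dc <= BaseCase) {
        var i = r0
        while (i < r1) {
          var j = c0
          while (j < c1) {
            dst(j * rows + i) = src(i * cols + j)
            j += 1
          }
          i += 1
        }
      } else if (dr >= dc) {
        // Split the longer dimension in half so blocks shrink until they fit in cache.
        val mid = r0 + dr / 2
        go(r0, mid, c0, c1)
        go(mid, r1, c0, c1)
      } else {
        val mid = c0 + dc / 2
        go(r0, r1, c0, mid)
        go(r0, r1, mid, c1)
      }
    }

    go(0, rows, 0, cols)
    dst
  }
}
```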
40. SOME KNOWN APPROACHES
▸ Matrix multiplication - Strassen algorithm
▸ Matrix transposition - Frigo’s transpose
▸ Tree traversal - van Emde Boas layout
▸ Hashing - blocked probing
42. EXAMPLE FROM THE DAY IN THE LIFE
▸ An event log
▸ Writer - writes events to the log in linear fashion
▸ Transformer - concurrently tails the log and
transforms the events given some predefined
function
▸ Transformer is never ahead of the writer
43. A SIMPLE EVENT LOG
trait EventLog[T] {
def writeNext(ev: T): Boolean
def transformNext(f: T => T): Boolean
}
44. A SIMPLE EVENT LOG
var writerPos = 0L
var transfPos = 0L
val log = new Array[Int](logSize)
45. A SIMPLE EVENT LOG
def writeNext(ev: Int): Boolean = synchronized {
if (writerPos < transfPos) {
false
} else {
log(writerPos.toInt) = ev
writerPos += 1
true
}
}
46. A SIMPLE EVENT LOG
def transformNext(f: Int => Int): Boolean = synchronized {
if (transfPos >= writerPos) {
false
} else {
val currentEvent = log(transfPos.toInt)
log(transfPos.toInt) = f(currentEvent)
transfPos += 1
true
}
}
51. ENSURING MEMORY VISIBILITY
▸ Introduces a happens-before relationship
▸ All changes prior to that have happened and are
visible to other threads
▸ Does not mean that values are read from main
memory
▸ Writes are applied to the L1 cache and flow
through the cache subsystem
52. LOCK-FREE STATS
Log Type       L2 Miss (M)   L3 Miss (M)   IPC    Ops/s (M)
-----------------------------------------------------------
Synchronized   1084          137           0.29   13.2
Lock-free      357           156           0.42   17.2
Lazy set       304           73            0.8    42.6
Padded         211           50            1.4    76.5
(M = million)
53. GOING ATOMIC
val writerPos = AtomicLong(0L)
val transfPos = AtomicLong(0L)
val log = new Array[Int](logSize)
def writeNext(ev: Int): Boolean = {
val currentWriterPos = writerPos.get
if (currentWriterPos < transfPos.get) {
false
} else {
log(currentWriterPos.toInt) = ev
writerPos.lazySet(currentWriterPos + 1)
true
}
}
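A self-contained sketch of the same idea using `java.util.concurrent.atomic.AtomicLong`, which is my substitution for the `AtomicLong` shown on the slide. It assumes exactly one writer thread and one transformer thread. The capacity check, the `transformNext` counterpart and the `snapshot` helper are assumptions of mine, not the talk's code.

```scala
import java.util.concurrent.atomic.AtomicLong

// Assumes a single writer thread and a single transformer thread.
final class LazySetEventLog(logSize: Int) {
  private val writerPos = new AtomicLong(0L)
  private val transfPos = new AtomicLong(0L)
  private val log = new Array[Int](logSize)

  def writeNext(ev: Int): Boolean = {
    val pos = writerPos.get
    if (pos >= logSize) false // assumption: reject writes once the log is full
    else {
      log(pos.toInt) = ev
      // Ordered store: the slot write above cannot be reordered after this publish.
      writerPos.lazySet(pos + 1)
      true
    }
  }

  def transformNext(f: Int => Int): Boolean = {
    val pos = transfPos.get
    if (pos >= writerPos.get) false // the transformer never runs ahead of the writer
    else {
      log(pos.toInt) = f(log(pos.toInt))
      transfPos.lazySet(pos + 1)
      true
    }
  }

  // Hypothetical helper for inspection: the events written so far.
  def snapshot: Vector[Int] = log.take(writerPos.get.toInt).toVector
}
```

Each position counter has a single writing thread, which is why a plain ordered store is enough here and no compare-and-set loop is needed.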
54. LAZY SET STATS
Log Type       L2 Miss (M)   L3 Miss (M)   IPC    Ops/s (M)
-----------------------------------------------------------
Synchronized   1084          137           0.29   13.2
Lock-free      357           156           0.42   17.2
Lazy set       304           73            0.8    42.6
Padded         211           50            1.4    76.5
(M = million)
55. FALSE SHARING
▸ Two threads modifying independent variables
sharing the same cache line
▸ Often depends on the layout of your objects
▸ Causes invalidation of cache lines and increased
coherency protocol traffic
▸ The cache line ping-pongs through L3, which has
significant latency implications
▸ Can be even worse if these threads are on
different sockets (crossing interconnects)
57. FALSE SHARING
@volatile var writerPos = 0L
@volatile var transfPos = 0L
[Diagram: memory laid out as cache lines 1 … N of 64 bytes each]
58. FALSE SHARING
▸ Inspecting Java Object Layout
https://github.com/ktoso/sbt-jol
59. PADDING TO AVOID FALSE SHARING
val writerPos = AtomicLong.withPadding(0, LeftRight128)
val transfPos = AtomicLong.withPadding(0, LeftRight128)
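For readers without a padding-aware atomic at hand, a hand-rolled sketch of the same idea using inheritance-based padding. All class names are hypothetical. The actual field layout is decided by the JVM, so a tool such as JOL should be used to confirm it.

```scala
// Seven longs (56 bytes) on each side of the hot field, so that two instances'
// `value` fields should not end up on the same 64-byte cache line.
class LeftPad { protected var l1, l2, l3, l4, l5, l6, l7 = 0L }

class HotValue extends LeftPad { @volatile var value = 0L }

final class PaddedLong extends HotValue { protected var r1, r2, r3, r4, r5, r6, r7 = 0L }

// Hypothetical holder mirroring the two positions of the event log.
final class Positions {
  val writerPos = new PaddedLong
  val transfPos = new PaddedLong
}
```

The padding is split across superclass and subclass because HotSpot lays out superclass fields before subclass fields, which keeps the hot field between the two pads.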
60. PADDED STATS
Log Type       L2 Miss (M)   L3 Miss (M)   IPC    Ops/s (M)
-----------------------------------------------------------
Synchronized   1084          137           0.29   13.2
Lock-free      357           156           0.42   17.2
Lazy set       304           73            0.8    42.6
Padded         211           50            1.4    76.5
(M = million)
62. AKKA MESSAGE LIFECYCLE
[Diagram: Sender sends a message through ActorRef → Dispatcher enqueues the message, then schedules and runs the mailbox → Executor Service threads (T1, T2, … Tn) → Actor]
63. TYPES OF AKKA DISPATCHERS
▸ Default Dispatcher - Used if no other specified
▸ Pinned Dispatcher - one thread per actor
▸ Calling Thread Dispatcher - for tests only
64. TYPES OF EXECUTORS
▸ ForkJoinPool - Relies on lock free work stealing
queues
▸ ThreadPoolExecutor - Uses Linked blocking queue
to distribute tasks
66. LIMITATION: NO ACTOR-TO-THREAD AFFINITY
▸ Potentially causing CPU cache invalidation
▸ Lack of parameters allowing you to achieve fine
grained control
68. FAIR DISTRIBUTION QUEUE SELECTOR
▸ Adaptive work assignment strategy
▸ Few actors - explicit mapping (fairer)
▸ More actors - consistent hashing (cheaper)
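The adaptive strategy above can be sketched roughly as follows. This is an illustration only, not Akka's implementation: the class, its parameters and the integer actor ids are hypothetical, and plain modulo hashing stands in for the consistent hashing named on the slide.

```scala
import scala.collection.mutable

// Illustrative only: maps an actor id to one of numQueues work queues.
final class AdaptiveQueueSelector(numQueues: Int, explicitLimit: Int) {
  // Explicit actor id -> queue index table, used while there are few actors.
  private val explicit = mutable.Map.empty[Int, Int]

  def queueFor(actorId: Int): Int = synchronized {
    explicit.get(actorId) match {
      case Some(queue) => queue
      case None if explicit.size < explicitLimit =>
        // Few actors: assign queues round-robin for an even spread.
        val queue = explicit.size % numQueues
        explicit.update(actorId, queue)
        queue
      case None =>
        // Many actors: hash the id instead, which needs no per-actor state.
        Math.floorMod(actorId.hashCode, numQueues)
    }
  }
}
```

Either way, the same actor always maps to the same queue, which is what gives it a stable thread and a warm cache.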
69. ADVANTAGES
▸ Fewer cache misses due to temporal locality
▸ Decreases contention
▸ Customisable queue selection
70. CLIENT ACTOR
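import java.util.concurrent.CountDownLatch
import scala.collection.mutable
import scala.util.Random
import akka.actor.Actor

class UserQueryActor(latch: CountDownLatch,
                     numQueries: Int,
                     numUsersInDB: Int) extends Actor {
  private var left = numQueries // queries still to be issued
  private val receivedUsers: mutable.Map[Int, User] = mutable.Map()
  private val randGenerator = new Random()
  override def receive: Receive = {
    case u: User => {
      receivedUsers.put(u.userId, u)
      if (left == 0) {
        // All queries answered: signal completion and stop.
        latch.countDown()
        context stop self
      } else {
        // Ask the service for another random user.
        sender() ! Request(randGenerator.nextInt(numUsersInDB))
      }
      left -= 1
    }
  }
}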
71. SERVICE ACTOR
import java.util.concurrent.CountDownLatch
import akka.actor.Actor

class UserServiceActor(userDb: Map[Int, User],
                       latch: CountDownLatch,
                       numQueries: Int) extends Actor {
  private var left = numQueries // requests still expected
  def receive = {
    case Request(id) =>
      userDb.get(id) match {
        case Some(u) => sender() ! u
        case None => () // unknown id: no reply
      }
      if (left == 0) {
        latch.countDown()
        context stop self
      }
      left -= 1
  }
}
73. SO… IF YOU ARE ON A HOT CODEPATH
▸ Make sure you really are
▸ Measure everything (sbt-jmh, sbt-jol, perfstat …)
▸ Watch out for language features that can
introduce unintended allocations (e.g. pattern
matching)
▸ Use algorithms and data structures that are cache
friendly
▸ Use efficient concurrency tools but try to not roll
your own: JCTools, Akka, Vert.x …
74. RESOURCES
▸ Name based extractors
https://hseeberger.wordpress.com/2013/10/04/name-based-extractors-in-scala-2-11/
▸ Cache oblivious algorithms (MIT OCW)
https://www.youtube.com/watch?v=CSqbjfCCLrU
▸ Lazy Set in detail
http://psy-lob-saw.blogspot.bg/2012/12/atomiclazyset-is-performance-win-for.html
▸ Processor Counter Monitor
https://github.com/opcm/pcm
▸ Memory Access Patterns
https://mechanical-sympathy.blogspot.bg/2012/08/memory-access-patterns-are-important.html
▸ False Sharing
https://mechanical-sympathy.blogspot.bg/2011/07/false-sharing.html
https://mechanical-sympathy.blogspot.bg/2013/02/cpu-cache-flushing-fallacy.html
▸ Slides and code
https://github.com/zaharidichev/scala-days-2018-berlin