Scala Left Fold Parallelisation - Three Approaches
1. Scala Left Fold Parallelisation
Three Approaches: Standard Library; Parallel Collections Library; Cats + Cats Effect
foldLeft, fold
Aleksandar Prokopec (@alexprokopec)
Adam Rosien (@arosien)
slides by @philip_schwarz
http://fpilluminated.com/

2. Let’s begin by looking at a contrived example of a left fold over a relatively large collection.
It is an adaptation of an example from the following book by Aleksandar Prokopec: Learning Concurrent Programming in Scala.
The original example downloaded a text file containing the whole HTML specification, searched its lines for the keyword ‘TEXTAREA’, and then printed the lines containing the keyword.
We are going to search for a word supplied by the user, and the text that we are going to search is going to be that of a relatively large book
downloaded from https://gutenberg.org/.
Initially I picked War and Peace, which is 66,036 lines long, but for reasons that will become clear later, I then decided to look for a book of about
100,000 lines, and the closest that I could find was The King James Version of the Bible, which is only 25 lines short of the desired number.
import java.net.URL
case class Book(name: String, numberOfLines: Int, numberOfBytes: Int, url: URL)
val theBible = Book(
  name = "The King James Version of the Bible",
  numberOfLines = 99_975,
  numberOfBytes = 4_456_041,
  url = URL("https://gutenberg.org/cache/epub/10/pg10.txt")
)
3. Here is a method that tries to get hold of the lines of text of a book…
import scala.io.Source
import scala.util.{Try, Using}
def getText(book: Book): Try[Vector[String]] =
  Using(Source.fromURL(book.url)): source =>
    source.getLines().toVector
…and here is the first part of a program which, given a search word, uses the above method to find occurrences of the word in the lines of our chosen book.
@main def run(word: String): Unit =
  getText(book = theBible)
    .fold(
      error => handleErrorGettingText(error),
      lines =>
        announceSuccessGettingText(lines)
        val matches = find(word, lines)
        announceMatchingLines(matches))
If getting the text lines fails then we handle that; otherwise we announce that getting the text was successful, invoke a function to find occurrences of the search word, and then announce the results.
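To try the Using-based reading step without downloading anything, here is a minimal sketch of a variant of getText that takes the Source itself (by name) instead of a Book. The name getTextFrom is hypothetical, not part of the deck. Using closes the source once the lines have been read, and any exception, including one thrown while creating the source, ends up as a Failure.

```scala
import scala.io.Source
import scala.util.{Failure, Success, Try, Using}

// Hypothetical variant of getText: the Source is passed in by name, so that
// the method can be exercised with an in-memory Source.fromString.
def getTextFrom(source: => Source): Try[Vector[String]] =
  Using(source) { src =>
    // Materialise the lines before Using closes the source.
    src.getLines().toVector
  }
```

For example, getTextFrom(Source.fromString("first line\nsecond line")) yields Success(Vector("first line", "second line")).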
As you can see below, the way that we search the book’s text lines for the search word is by doing a left fold of function accumulateLinesContaining over the lines, so that the fold returns a single string with all the lines containing the search word, each preceded by a newline.
def find(word: String, lines: Vector[String]): String =
  lines.foldLeft("")(accumulateLinesContaining(word))
def accumulateLinesContaining(word: String): (String, String) => String =
  (acc, line) => if line.matches(s".*$word.*") then s"$acc\n'$line'" else acc
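Here is a small, self-contained check of the fold on three made-up lines (the sample lines are invented for illustration). The accumulator starts as "" and each matching line is appended after a newline, so the result contains exactly one newline per match, which is what announceMatchingLines relies on when it counts matches. Because the word is spliced into a regular expression, a search word containing regex metacharacters would need escaping, for instance with java.util.regex.Pattern.quote.

```scala
// The deck's fold, reproduced so that this block compiles on its own.
def find(word: String, lines: Vector[String]): String =
  lines.foldLeft("")(accumulateLinesContaining(word))

def accumulateLinesContaining(word: String): (String, String) => String =
  // Keep acc unchanged for non-matching lines; otherwise append the quoted line.
  (acc, line) => if line.matches(s".*$word.*") then s"$acc\n'$line'" else acc

// Made-up sample lines, for illustration only.
val sampleLines = Vector(
  "a joyous city",
  "nothing to see here",
  "joyous, but"
)

val sampleResult = find("joyous", sampleLines)
// sampleResult == "\n'a joyous city'\n'joyous, but'"
val matchCount = sampleResult.count(_ == '\n') // 2
```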
By the way, to simplify exposition, the error handling that you see in the run method is the only error handling in the whole slide deck. This is obviously not production-grade code!
4. Before we run the program, here are its remaining methods:
def handleErrorGettingText[A](error: Throwable): A =
  throw IllegalStateException("Failed to obtain the text lines to be searched.", error)
def announceSuccessGettingText(lines: Vector[String]): Unit =
  println(f"Successfully obtained ${lines.length}%,d lines of text to search.")
def announceMatchingLines(lines: String): Unit =
  println(f"Found the word in the following ${lines.count(_ == '\n')}%,d lines of text: $lines")
5. Let’s search for the word ‘joyous’:
$ sbt "run joyous"
…
[info] running run joyous
Successfully obtained 99,975 lines of text to search.
Found the word in the following 4 lines of text:
'stirs, a tumultuous city, joyous city: thy slain men are not slain'
'23:7 Is this your joyous city, whose antiquity is of ancient days? her'
'upon all the houses of joy in the joyous city: 32:14 Because the'
'12:11 Now no chastening for the present seemeth to be joyous, but'
[success] Total time: 4 s, completed 5 Nov 2023, 07:54:12
That took four seconds, with much of the time taken up by downloading the book (between 1.5s and 2.5s).
6. Let’s see how long the program takes to execute if we increase the number of lines to be searched.
We change the getText function so that, once it has downloaded the book and obtained its lines of text, it makes as many copies of the lines as required, and we ask for a thousand copies.
Since the book is about 100,000 lines long, we’ll now be searching about 1,000 × 100,000 lines, i.e. about 100 million lines.
While it makes little sense to search multiple copies of the book, we are doing this purely to set the scene for the subject of this slide deck.
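The copying step can be sketched with Vector.fill followed by flatten, which is what the deck’s getText does once it has the lines. Here it is as a hypothetical helper, copiesOf (not a name from the deck), applied to a tiny vector: the line count is simply multiplied by the number of copies, just as 1,000 copies of 99,975 lines gives 99,975,000 lines.

```scala
// Hypothetical helper: concatenate `copies` copies of `lines`, in order.
def copiesOf(lines: Vector[String], copies: Int): Vector[String] =
  // Vector.fill builds a Vector holding `copies` references to the same Vector;
  // flatten concatenates them. The String objects are shared, not duplicated.
  Vector.fill(copies)(lines).flatten

val tiny = Vector("line one", "line two", "line three")
val tripled = copiesOf(tiny, 3) // 9 lines
```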
7. $ sbt "run joyous"
…
[info] running run joyous
Successfully obtained 99,975,000 lines of text to search.
Found the word in the following 4,000 lines of text:
'stirs, a tumultuous city, joyous city: thy slain men are not slain'
'23:7 Is this your joyous city, whose antiquity is of ancient days? her'
'upon all the houses of joy in the joyous city: 32:14 Because the'
'12:11 Now no chastening for the present seemeth to be joyous, but'
'stirs, a tumultuous city, joyous city: thy slain men are not slain'
'23:7 Is this your joyous city, whose antiquity is of ancient days? her'
'upon all the houses of joy in the joyous city: 32:14 Because the'
'12:11 Now no chastening for the present seemeth to be joyous, but'
'stirs, a tumultuous city, joyous city: thy slain men are not slain'
'23:7 Is this your joyous city, whose antiquity is of ancient days? her'
'upon all the houses of joy in the joyous city: 32:14 Because the'
'12:11 Now no chastening for the present seemeth to be joyous, but'
…
'stirs, a tumultuous city, joyous city: thy slain men are not slain'
'23:7 Is this your joyous city, whose antiquity is of ancient days? her'
'upon all the houses of joy in the joyous city: 32:14 Because the'
'12:11 Now no chastening for the present seemeth to be joyous, but'
'stirs, a tumultuous city, joyous city: thy slain men are not slain'
'23:7 Is this your joyous city, whose antiquity is of ancient days? her'
'upon all the houses of joy in the joyous city: 32:14 Because the'
'12:11 Now no chastening for the present seemeth to be joyous, but'
[success] Total time: 66 s (01:06), completed 5 Nov 2023, 11:11:14
Searching through a thousand copies of the book takes just over one minute.
When we searched one copy of the book, we found four matching lines, so it makes sense that now that we are searching a thousand copies, we are finding 4,000 matching lines.
I ran the program four times, and its execution times were 66s, 65s, 65s and 66s.
By the way, when I first tried to run the program, I got some warnings suggesting that I increase the heap space, so I added the following to the .sbtopts file: -J-Xmx5G
8. In this deck we are going to look at three ways of parallelising the program’s search for matching lines, which is carried out by the following left fold:
def find(word: String, lines: Vector[String]): String =
  lines.foldLeft("")(accumulateLinesContaining(word))
def accumulateLinesContaining(word: String): (String, String) => String =
  (acc, line) => if line.matches(s".*$word.*") then s"$acc\n'$line'" else acc
The fold works its way sequentially through 100 million lines of text.
Instead of processing all of the lines sequentially, can we get the program to partition the lines into a number of batches, search the batches in parallel, and then combine the results of all the searches?
Trick question: will the foldLeft function automatically do that for us, behind the scenes, if instead of invoking the function on a sequential collection, we first convert it to a parallel collection?
I ask because there is a Scala parallel collections library that can be used to convert a sequential collection to a parallel one.
10. Let’s add the Scala parallel collections library to the build…
libraryDependencies += "org.scala-lang.modules" %% "scala-parallel-collections" % "1.0.4"
…and get the find function to convert the sequential collection of text lines to a parallel one…
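The change to find can be sketched as follows, assuming Scala 3 and the scala-parallel-collections dependency shown above (the `//> using` line is a Scala CLI directive declaring that same dependency). Calling .par on a Vector requires the CollectionConverters import. As the following slides explain, foldLeft on the resulting parallel collection still visits the elements one by one, so the result is the same as before.

```scala
//> using dep "org.scala-lang.modules::scala-parallel-collections:1.0.4"
import scala.collection.parallel.CollectionConverters.*

def accumulateLinesContaining(word: String): (String, String) => String =
  (acc, line) => if line.matches(s".*$word.*") then s"$acc\n'$line'" else acc

// Convert the sequential Vector to a parallel collection (a ParVector) before folding.
// foldLeft takes no function for combining partial results, so it cannot split
// the work: it still processes the lines sequentially, from left to right.
def find(word: String, lines: Vector[String]): String =
  lines.par.foldLeft("")(accumulateLinesContaining(word))
```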
11. $ sbt "run joyous"
…
[info] running run joyous
Successfully obtained 99,975,000 lines of text to search.
Found the word in the following 4,000 lines of text:
'stirs, a tumultuous city, joyous city: thy slain men are not slain'
'23:7 Is this your joyous city, whose antiquity is of ancient days? her'
'upon all the houses of joy in the joyous city: 32:14 Because the'
'12:11 Now no chastening for the present seemeth to be joyous, but'
'stirs, a tumultuous city, joyous city: thy slain men are not slain'
'23:7 Is this your joyous city, whose antiquity is of ancient days? her'
'upon all the houses of joy in the joyous city: 32:14 Because the'
'12:11 Now no chastening for the present seemeth to be joyous, but'
'stirs, a tumultuous city, joyous city: thy slain men are not slain'
'23:7 Is this your joyous city, whose antiquity is of ancient days? her'
'upon all the houses of joy in the joyous city: 32:14 Because the'
'12:11 Now no chastening for the present seemeth to be joyous, but'
…
'stirs, a tumultuous city, joyous city: thy slain men are not slain'
'23:7 Is this your joyous city, whose antiquity is of ancient days? her'
'upon all the houses of joy in the joyous city: 32:14 Because the'
'12:11 Now no chastening for the present seemeth to be joyous, but'
'stirs, a tumultuous city, joyous city: thy slain men are not slain'
'23:7 Is this your joyous city, whose antiquity is of ancient days? her'
'upon all the houses of joy in the joyous city: 32:14 Because the'
'12:11 Now no chastening for the present seemeth to be joyous, but'
[success] Total time: 68 s (01:08), completed 5 Nov 2023, 16:44:21
Now let’s run the program again and see if converting the sequential collection of
lines into a parallel collection has any effect on the program’s execution time.
No difference: the execution time
is pretty much the same as before.
12. Earlier, when I asked the following question, I did mention that it was a trick question:
will the foldLeft function automatically do that for us, behind the scenes, if instead of invoking the function on a sequential collection, we first convert it to a parallel collection?
To see why it is a trick question, consider the signature of the foldLeft function:
def foldLeft[B](z: B)(op: (B, A) => B): B
The foldLeft function cannot avoid processing a collection’s elements sequentially: even if it did break the collection of As down into multiple smaller collections of As, and then folded each of those smaller collections sequentially (using the op function), all at the same time, in parallel, it would not know what to do with the resulting Bs. It doesn’t have a function that it can use to combine two B results, so it is unable to combine all the B results into a single overall B result.
As we can see on the next slide, in EPFL’s Scala Parallel Programming course, Aleksandar Prokopec uses some really effective Lego diagrams to help visualise the situation.
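To make the missing ingredient concrete, here is a minimal, purely sequential sketch of what a fold would need in order to be split up (chunkedFold is a hypothetical name, and nothing here runs in parallel). Splitting the As into chunks and folding each chunk with op is always possible; only an extra combine function lets us turn the per-chunk Bs into one B. With string concatenation, z = "" and combine = concatenation, the result is the same as foldLeft’s; in general this relies on z being an identity for combine and on combine being compatible with op.

```scala
// Hypothetical, sequential sketch: fold each chunk with `op`, then merge the
// per-chunk results with `combine`. A parallel version could fold the chunks
// concurrently; `combine` is what makes merging their results possible.
def chunkedFold[A, B](as: Vector[A], chunkSize: Int)(z: B)(
    op: (B, A) => B,
    combine: (B, B) => B
): B =
  require(chunkSize > 0, "chunkSize must be positive")
  as.grouped(chunkSize)
    .map(chunk => chunk.foldLeft(z)(op)) // one B per chunk
    .foldLeft(z)(combine)                // merge the Bs into a single B

val words = Vector("a", "b", "c", "d", "e")
val viaFoldLeft = words.foldLeft("")(_ + _)                        // "abcde"
val viaChunks = chunkedFold(words, chunkSize = 2)("")(_ + _, _ + _) // "abcde"
```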
13. Even if foldLeft could break a collection of As down into
multiple smaller collections, and fold each of those
collections into a B, in parallel, it doesn’t have a function
for combining the resulting Bs into a single overall B.
Aleksandar Prokopec
@alexprokopec
The Scala parallel collections library
does have a solution for this problem
though, and we’ll come back to it later.
14. Since converting the sequential vector of lines to a parallel collection doesn’t have any
effect, let’s revert our last change, and rename the main method to runWithoutParallelism.
15. As a recap, before moving on, the next slide
shows the whole code for the current program.
16.
import java.net.URL
import scala.io.Source
import scala.util.{Try, Using}
@main def runWithoutParallelism(word: String): Unit =
  getText(book = theBible, copies = 1_000)
    .fold(
      error => handleErrorGettingText(error),
      lines =>
        announceSuccessGettingText(lines)
        val matches = find(word, lines)
        announceMatchingLines(matches))
def getText(book: Book, copies: Int = 1): Try[Vector[String]] =
  Using(Source.fromURL(book.url)): source =>
    val lines = source.getLines().toVector
    Vector.fill(copies)(lines).flatten
def find(word: String, lines: Vector[String]): String =
  lines.foldLeft("")(accumulateLinesContaining(word))
def accumulateLinesContaining(word: String): (String, String) => String =
  (acc, line) => if line.matches(s".*$word.*") then s"$acc\n'$line'" else acc
def handleErrorGettingText[A](error: Throwable): A =
  throw IllegalStateException("Failed to obtain the text lines to be searched.", error)
def announceSuccessGettingText(lines: Vector[String]): Unit =
  println(f"Successfully obtained ${lines.length}%,d lines of text to search.")
def announceMatchingLines(lines: String): Unit =
  println(f"Found the word in the following ${lines.count(_ == '\n')}%,d lines of text: $lines")
val theBible = Book(
  name = "The King James Version of the Bible",
  numberOfLines = 99_975,
  numberOfBytes = 4_456_041,
  url = URL("https://gutenberg.org/cache/epub/10/pg10.txt")
)
case class Book(
  name: String,
  numberOfLines: Int,
  numberOfBytes: Int,
  url: URL
)
17. If we want to parallelise the left fold, but all we can use is Scala’s standard library, how can we do it?
One way is to use the Future monad and the traverse function of its companion object, Future.traverse.
Let’s write a new main method called runUsingFutureTraverse. While its body is identical to that of runWithoutParallelism…
…the find function that it invokes cannot be the one invoked by runWithoutParallelism…
def find(word: String, lines: Vector[String]): String =
  lines.foldLeft("")(accumulateLinesContaining(word))
…it needs to be rewritten, which we do on the next slide.
Here are some imports that we are going to need:
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration
import scala.concurrent.{Await, Future}
19. Compare the new find function with the one which we have been using up to now:
def find(word: String, lines: Vector[String]): String =
  lines.foldLeft("")(accumulateLinesContaining(word))
Here is the new find function, together with the searchFor function that it uses:
val numberOfCores = Runtime.getRuntime().availableProcessors()
def find(word: String, lines: Vector[String]): String =
  val batchSize = lines.size / (numberOfCores / 2)
  val groupsOfLines = lines.grouped(batchSize).toVector
  Await
    .result(
      Future.traverse(groupsOfLines)(searchFor(word))
        .map(_.foldLeft("")(_ ++ _)),
      Duration.Inf
    )
def searchFor(word: String)(lines: Vector[String]): Future[String] =
  Future(lines.foldLeft("")(accumulateLinesContaining(word)))
def accumulateLinesContaining(word: String): (String, String) => String =
  (acc, line) => if line.matches(s".*$word.*") then s"$acc\n'$line'" else acc
We use the grouped function to break the collection of text lines into multiple smaller collections, one for each of the CPU cores that we want to use for parallelising the search. We have decided to use half of the available cores for this purpose.
We then traverse the smaller collections of lines with a searchFor function that is used to fold each collection, and which is essentially the old find function, except that it does the folding in a Future.
traverse first creates a collection of futures, each of which does a left fold of the As in a smaller collection, and then turns that collection inside out, i.e. it turns the collection of future Bs into a future collection of Bs.
The futures execute in parallel, each on its own thread of the global execution context (so, ideally, on its own core), and when they complete, each of them yields the result of the left fold of a smaller collection.
When the future collection returned by traverse completes, the find function has a collection of Bs, which it then folds into a single overall B.
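Here is a small, self-contained sketch of the same traverse-then-combine shape, on a tiny input so that its result can be checked against the sequential foldLeft. Two assumptions differ from the deck’s code and are labelled in the comments: the number of batches is a hypothetical parallelism parameter rather than half the cores, and the batch size is rounded up and kept at least 1. That matters because lines.size / (numberOfCores / 2) divides by zero on a single-core machine and, when the division is not exact, grouped can produce more batches than intended. Since Future.traverse preserves the order of its input, concatenating the per-batch results keeps the matching lines in their original order.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

def accumulateLinesContaining(word: String): (String, String) => String =
  (acc, line) => if line.matches(s".*$word.*") then s"$acc\n'$line'" else acc

// Each batch is folded sequentially inside its own Future.
def searchFor(word: String)(lines: Vector[String]): Future[String] =
  Future(lines.foldLeft("")(accumulateLinesContaining(word)))

// Assumption: `parallelism` (the desired number of batches) is passed in,
// rather than derived from the number of cores as in the deck.
def findInParallel(word: String, lines: Vector[String], parallelism: Int): String =
  require(parallelism > 0, "parallelism must be positive")
  // Round up and never go below 1, so grouped never receives 0.
  val batchSize = math.max(1, math.ceil(lines.size.toDouble / parallelism).toInt)
  val groupsOfLines = lines.grouped(batchSize).toVector
  // traverse turns Vector[Future[String]] inside out: Future[Vector[String]], order preserved.
  val futureResults: Future[Vector[String]] =
    Future.traverse(groupsOfLines)(searchFor(word))
  // Combine the per-batch results into a single String.
  Await.result(futureResults.map(_.foldLeft("")(_ ++ _)), Duration.Inf)
```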
20. Now that the program partitions the collection of text lines into multiple smaller collections, and folds each of those smaller collections on a separate core, let’s get the program to print on the console the name of the thread that does the folding on each such core.
To do that, let’s first extend Future with the following method, which turns a Future into one that, as the last step of its execution, prints the name of the thread on which it is running:
extension [A](fa: Future[A])
  def printThreadName(): Future[A] =
    for
      a <- fa
      _ = println(s"[${Thread.currentThread().getName}]")
    yield a
Now all we have to do is invoke the new method.
21. That worked: the collection of lines was split into six smaller collections which got folded in parallel, each in a separate thread, with the names of the threads visible in the console output.
When the whole collection was processed sequentially, the processing took a bit over one minute, but now that different parts of the collection are being processed in parallel, the processing took 25 seconds, a little over a third of the time.
$ sbt "run joyous"
…
Multiple main classes detected. Select one to run:
[1] runUsingFutureTraverse
[2] runWithoutParallelism
Enter number: 1
[info] running runUsingFutureTraverse joyous
Successfully obtained 99,975,000 lines of text to search.
[scala-execution-context-global-167]
[scala-execution-context-global-169]
[scala-execution-context-global-170]
[scala-execution-context-global-165]
[scala-execution-context-global-168]
[scala-execution-context-global-166]
Found the word in the following 4,000 lines of text:
'stirs, a tumultuous city, joyous city: thy slain men are not slain'
'23:7 Is this your joyous city, whose antiquity is of ancient days? her'
'upon all the houses of joy in the joyous city: 32:14 Because the'
'12:11 Now no chastening for the present seemeth to be joyous, but'
'stirs, a tumultuous city, joyous city: thy slain men are not slain'
'23:7 Is this your joyous city, whose antiquity is of ancient days? her'
'upon all the houses of joy in the joyous city: 32:14 Because the'
'12:11 Now no chastening for the present seemeth to be joyous, but'
'stirs, a tumultuous city, joyous city: thy slain men are not slain'
'23:7 Is this your joyous city, whose antiquity is of ancient days? her'
'upon all the houses of joy in the joyous city: 32:14 Because the'
'12:11 Now no chastening for the present seemeth to be joyous, but'
…
'stirs, a tumultuous city, joyous city: thy slain men are not slain'
'23:7 Is this your joyous city, whose antiquity is of ancient days? her'
'upon all the houses of joy in the joyous city: 32:14 Because the'
'12:11 Now no chastening for the present seemeth to be joyous, but'
'stirs, a tumultuous city, joyous city: thy slain men are not slain'
'23:7 Is this your joyous city, whose antiquity is of ancient days? her'
'upon all the houses of joy in the joyous city: 32:14 Because the'
'12:11 Now no chastening for the present seemeth to be joyous, but'
[success] Total time: 25 s, completed 12 Nov 2023, 11:56:55
Let’s run the new program and search for the word ‘joyous’ again.
I ran the program another three times, and
its execution times were 28s, 28s and 26s.
22. That was the first of the three approaches that we
are going to explore for parallelising our left fold.
As a recap, before moving on, the next slide shows the
whole code for the new program.
23.
import java.net.URL
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration
import scala.concurrent.{Await, Future}
import scala.io.Source
import scala.util.{Try, Using}
@main def runUsingFutureTraverse(word: String): Unit =
  getText(book = theBible, copies = 1_000)
    .fold(
      error => handleErrorGettingText(error),
      lines =>
        announceSuccessGettingText(lines)
        val matches = find(word, lines)
        announceMatchingLines(matches))
val numberOfCores = Runtime.getRuntime().availableProcessors()
def find(word: String, lines: Vector[String]): String =
  val batchSize = lines.size / (numberOfCores / 2)
  val groupsOfLines = lines.grouped(batchSize).toVector
  Await.result(
    Future.traverse(groupsOfLines)(searchFor(word))
      .map(_.foldLeft("")(_ ++ _)),
    Duration.Inf)
def searchFor(word: String)(lines: Vector[String]): Future[String] =
  Future(lines.foldLeft("")(accumulateLinesContaining(word))).printThreadName()
extension [A](fa: Future[A])
  def printThreadName(): Future[A] =
    for
      a <- fa
      _ = println(s"[${Thread.currentThread().getName}]")
    yield a
def getText(book: Book, copies: Int = 1): Try[Vector[String]] =
  Using(Source.fromURL(book.url)): source =>
    val lines = source.getLines().toVector
    Vector.fill(copies)(lines).flatten
def accumulateLinesContaining(word: String): (String, String) => String =
  (acc, line) => if line.matches(s".*$word.*") then s"$acc\n'$line'" else acc
def handleErrorGettingText[A](error: Throwable): A =
  throw IllegalStateException("Failed to obtain the text lines to be searched.", error)
def announceSuccessGettingText(lines: Vector[String]): Unit =
  println(f"Successfully obtained ${lines.length}%,d lines of text to search.")
def announceMatchingLines(lines: String): Unit =
  println(f"Found the word in the following ${lines.count(_ == '\n')}%,d lines of text: $lines")
val theBible = Book(
  name = "The King James Version of the Bible",
  numberOfLines = 99_975,
  numberOfBytes = 4_456_041,
  url = URL("https://gutenberg.org/cache/epub/10/pg10.txt")
)
case class Book(
  name: String,
  numberOfLines: Int,
  numberOfBytes: Int,
  url: URL
)
24. If we want to parallelise the left fold, and we are allowed to use external libraries, how can we do it?
One way is to do something similar to what we did using the Future monad and its traverse function, but using the Cats Effect IO
monad and Cats Core’s parTraverse function.
Because this approach is very similar to the previous one, I am going to explain it by revisiting the explanation of the latter, so if
you get a sense of déjà vu, that’s intentional, because I reckon it will make things easier to digest.
Let’s write a new method called runUsingCatsParTraverse. While its body is very similar to that of runWithoutParallelism…
…the find function that it invokes cannot be the one invoked by runWithoutParallelism …
def find(word: String, lines: Vector[String]): String =
  lines.foldLeft("")(accumulateLinesContaining(word))
…it needs to be rewritten, which we’ll be doing next.
25. If you are familiar with Cats’ parTraverse then please skip the next two slides.
If you are not, there is an amazingly helpful, clear, detailed, and hands-on explanation of parMapN and
parTraverse (and much more) in Adam Rosien’s great book: Essential Effects.
While there is no substitute for reading Chapter 3, Parallel execution, the following two slides are my humble
attempt to capture some of the information imparted by that chapter, by cherry picking some of its sentences,
passages, and diagrams, and stitching them together in an order of my own devising, in the hope that it serves as
a very brief, high-level introduction to concepts that are fully explained in the book.
Adam Rosien
@arosien
26. IO does not support parallel operations itself, because it is a Monad.
The Parallel typeclass specifies the translation between a pair of effect types: one that is a Monad and the other that is “only” an Applicative.
The Parallel typeclass encodes transformations between a sequential type S and a parallel type P.
Parallel[IO] connects the IO effect to its parallel counterpart, IO.Par.
parMapN is the parallel version of the applicative mapN method. It lets us combine multiple effects into one, in parallel, by specifying how to combine the outputs of the effects.
The parMapN extension method is implemented as (1) translating the sequential effect types into parallel representations, (2) performing the
alternative mapN, and (3) translating the parallel representation back to the sequential form.
Adam Rosien
@arosien
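As a minimal illustration of parMapN, here is a sketch assuming Scala 3 and Cats Effect 3.5.2 (the `//> using` line is a Scala CLI directive declaring that dependency). Two independent, made-up IO effects are combined into one, and the function passed to parMapN says how to combine their outputs; unsafeRunSync is used only to run the IO in this standalone example.

```scala
//> using dep "org.typelevel::cats-effect:3.5.2"
import cats.effect.IO
import cats.effect.unsafe.implicits.global // runtime used by unsafeRunSync
import cats.syntax.parallel.*              // provides parMapN on tuples of effects

// Two independent (made-up) effects.
val countLines: IO[Int]   = IO(Vector("a", "b", "c").length)
val loadTitle: IO[String] = IO("The King James Version of the Bible")

// parMapN runs both effects in parallel (via Parallel[IO], i.e. IO.Par)
// and then combines their outputs with the given function.
val summary: IO[String] =
  (countLines, loadTitle).parMapN((n, title) => s"$title: $n lines")
```

Running summary.unsafeRunSync() yields "The King James Version of the Bible: 3 lines".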
27. Adam Rosien
@arosien
3.5. parTraverse
parTraverse is the parallel version of traverse; both have the type signature:
F[A] => (A => G[B]) => G[F[B]]
For example, if F is List and G is IO, then (par)traverse would be a function from a List[A] to an IO[List[B]] when given a function A ⇒ IO[B].
List[A] => (A => IO[B]) => IO[List[B]]
The most common use case of (par)traverse is when you have a collection of work to be done, and a function which handles one unit of work.
Then you get a collection of results combined into one effect:
val work: List[WorkUnit] = ???
def doWork(workUnit: WorkUnit): IO[Result] = ??? ①
val results: IO[List[Result]] = work.parTraverse(doWork)
① Note that processing one unit of work is an effect, in this case, IO.
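The work/doWork snippet above can be made concrete as follows, assuming Scala 3 and Cats Effect 3.5.2 (declared with a Scala CLI `//> using` directive). WorkUnit and Result are hypothetical stand-ins (here simply Int and String), and doWork is a made-up unit of work. Each unit of work is an IO, and parTraverse gathers all the results into a single IO[List[Result]], in the same order as the input list.

```scala
//> using dep "org.typelevel::cats-effect:3.5.2"
import cats.effect.IO
import cats.effect.unsafe.implicits.global // runtime used by unsafeRunSync
import cats.syntax.parallel.*              // provides parTraverse

// Hypothetical stand-ins for the book's WorkUnit and Result types.
type WorkUnit = Int
type Result = String

// Processing one unit of work is an effect (an IO).
def doWork(workUnit: WorkUnit): IO[Result] =
  IO(s"unit $workUnit -> ${workUnit * workUnit}")

val work: List[WorkUnit] = List(1, 2, 3)

// One IO per unit of work, run in parallel, gathered into one IO of a List.
val results: IO[List[Result]] = work.parTraverse(doWork)
```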
28. Let’s add Cats Core and Cats Effect to the build…
libraryDependencies += "org.typelevel" %% "cats-core" % "2.9.0"
libraryDependencies += "org.typelevel" %% "cats-effect" % "3.5.2"
…and let’s import the following:
import cats.effect.{ExitCode, IO, IOApp}
import cats.syntax.foldable.*
import cats.syntax.parallel.*
29. Compare the new find function with the one used by runWithoutParallelism:
def find(word: String, lines: Vector[String]): String =
  lines.foldLeft("")(accumulateLinesContaining(word))
Here is the new find function, together with the searchFor function that it uses:
def find(word: String, lines: Vector[String]): IO[String] =
  val batchSize = lines.size / (numberOfCores / 2)
  val groupsOfLines = lines.grouped(batchSize).toVector
  groupsOfLines
    .parTraverse(searchFor(word))
    .map(_.combineAll)
def searchFor(word: String)(lines: Vector[String]): IO[String] =
  IO(lines.foldLeft("")(accumulateLinesContaining(word)))
def accumulateLinesContaining(word: String): (String, String) => String =
  (acc, line) => if line.matches(s".*$word.*") then s"$acc\n'$line'" else acc
We use the grouped function to break the collection of text lines into multiple smaller collections, one for each of the CPU cores that we want to use for parallelising the search. We have decided to use half of the available cores for this purpose.
We then parTraverse the smaller collections of lines with a searchFor function that is used to fold each collection, and which is essentially the old find function, except that it does the folding in an IO.
parTraverse first creates a collection of IOs, each of which does a left fold of the As in a smaller collection, and then turns that collection inside out, i.e. it turns the collection of IOs of B into an IO of a collection of Bs.
The IOs execute in parallel, each on its own thread of the Cats Effect compute pool (so, ideally, on its own core), and when they complete, each of them yields the result of the left fold of a smaller collection.
When the IO returned by parTraverse completes, it yields a collection of Bs, which find then combines into a single overall B using combineAll (for Strings, this concatenates them, starting from the empty string).
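For a quick check of the IO-based find on a small input, here is a sketch assuming Scala 3 and Cats Effect 3.5.2, which brings in Cats Core transitively (declared with a Scala CLI `//> using` directive). The number of batches is a hypothetical batches parameter instead of being derived from the core count, and unsafeRunSync is used only to run the IO in this standalone example. combineAll folds the per-batch Strings with the Monoid for String, i.e. concatenation starting from "", so it gives the same result as foldLeft("")(_ ++ _).

```scala
//> using dep "org.typelevel::cats-effect:3.5.2"
import cats.effect.IO
import cats.effect.unsafe.implicits.global // runtime used by unsafeRunSync
import cats.syntax.foldable.*              // provides combineAll
import cats.syntax.parallel.*              // provides parTraverse

def accumulateLinesContaining(word: String): (String, String) => String =
  (acc, line) => if line.matches(s".*$word.*") then s"$acc\n'$line'" else acc

def searchFor(word: String)(lines: Vector[String]): IO[String] =
  IO(lines.foldLeft("")(accumulateLinesContaining(word)))

// Hypothetical `batches` parameter replaces numberOfCores / 2.
def find(word: String, lines: Vector[String], batches: Int): IO[String] =
  require(batches > 0, "batches must be positive")
  val batchSize = math.max(1, math.ceil(lines.size.toDouble / batches).toInt)
  lines.grouped(batchSize).toVector
    .parTraverse(searchFor(word)) // each batch becomes an IO[String]; result is IO[Vector[String]]
    .map(_.combineAll)            // concatenate the per-batch results
```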
30. Now that the program partitions the collection of text lines into multiple smaller collections, and folds each of
those smaller collections on a separate core, let’s get the program to print on the console the name of the thread
that does the folding on each such core.
To do that, let’s adapt the printThreadName extension method that we wrote earlier, so that it also works for IO
Now all we have to do is invoke the new method.
The next slide shows the whole code for the new program.
31.
import cats.Functor
import cats.effect.{ExitCode, IO, IOApp}
import cats.syntax.foldable.*
import cats.syntax.functor.*
import cats.syntax.parallel.*
import java.net.URL
import scala.io.Source
import scala.util.{Try, Using}
object CatsParTraverse extends IOApp:
  override def run(args: List[String]): IO[ExitCode] =
    val word = args.headOption.getOrElse("joyous")
    runUsingCatsParTraverse(word).as(ExitCode.Success)
// Adapted so that it type-checks: find now returns an IO, so this method returns IO[Unit]
def runUsingCatsParTraverse(word: String): IO[Unit] =
  getText(book = theBible, copies = 1_000)
    .fold(
      error => IO(handleErrorGettingText(error)),
      lines =>
        announceSuccessGettingText(lines)
        find(word, lines).map(announceMatchingLines))
val numberOfCores = Runtime.getRuntime().availableProcessors()
def find(word: String, lines: Vector[String]): IO[String] =
  val batchSize = lines.size / (numberOfCores / 2)
  val groupsOfLines = lines.grouped(batchSize).toVector
  groupsOfLines
    .parTraverse(searchFor(word))
    .map(_.combineAll)
def searchFor(word: String)(lines: Vector[String]): IO[String] =
  IO(lines.foldLeft("")(accumulateLinesContaining(word))).printThreadName()
extension [A, F[_]: Functor](fa: F[A])
  def printThreadName(): F[A] =
    for
      a <- fa
      _ = println(s"[${Thread.currentThread().getName}]")
    yield a
def getText(book: Book, copies: Int = 1): Try[Vector[String]] =
  Using(Source.fromURL(book.url)): source =>
    val lines = source.getLines().toVector
    Vector.fill(copies)(lines).flatten
def accumulateLinesContaining(word: String): (String, String) => String =
  (acc, line) => if line.matches(s".*$word.*") then s"$acc\n'$line'" else acc
def handleErrorGettingText[A](error: Throwable): A =
  throw IllegalStateException("Failed to obtain the text lines to be searched.", error)
def announceSuccessGettingText(lines: Vector[String]): Unit =
  println(f"Successfully obtained ${lines.length}%,d lines of text to search.")
def announceMatchingLines(lines: String): Unit =
  println(f"Found the word in the following ${lines.count(_ == '\n')}%,d lines of text: $lines")
val theBible = Book(
  name = "The King James Version of the Bible",
  numberOfLines = 99_975,
  numberOfBytes = 4_456_041,
  url = URL("https://gutenberg.org/cache/epub/10/pg10.txt")
)
case class Book(
  name: String,
  numberOfLines: Int,
  numberOfBytes: Int,
  url: URL
)
32. Note the following similarities and differences between the code for the parallelisation approach using Future + traverse and that for the approach using IO + parTraverse.
Future + traverse:
def find(word: String, lines: Vector[String]): String =
  val batchSize = lines.size / (numberOfCores / 2)
  val groupsOfLines = lines.grouped(batchSize).toVector
  Await.result(
    Future.traverse(groupsOfLines)(searchFor(word))
      .map(_.foldLeft("")(_ ++ _)),
    Duration.Inf)
def searchFor(word: String)(lines: Vector[String]): Future[String] =
  Future(lines.foldLeft("")(accumulateLinesContaining(word))).printThreadName()
@main def runUsingFutureTraverse(word: String): Unit =
  getText(book = theBible, copies = 1_000)
    .fold(
      error => handleErrorGettingText(error),
      lines =>
        announceSuccessGettingText(lines)
        val matches = find(word, lines)
        announceMatchingLines(matches))
IO + parTraverse:
def find(word: String, lines: Vector[String]): IO[String] =
  val batchSize = lines.size / (numberOfCores / 2)
  val groupsOfLines = lines.grouped(batchSize).toVector
  groupsOfLines
    .parTraverse(searchFor(word))
    .map(_.combineAll)
def searchFor(word: String)(lines: Vector[String]): IO[String] =
  IO(lines.foldLeft("")(accumulateLinesContaining(word))).printThreadName()
// Adapted so that it type-checks: find now returns an IO, so this method returns IO[Unit]
def runUsingCatsParTraverse(word: String): IO[Unit] =
  getText(book = theBible, copies = 1_000)
    .fold(
      error => IO(handleErrorGettingText(error)),
      lines =>
        announceSuccessGettingText(lines)
        find(word, lines).map(announceMatchingLines))
33. That worked: the collection of lines was split into six smaller collections which got folded in parallel, each in a separate thread, with the names of the threads visible in the console output.
When the whole collection was processed sequentially, the processing took a bit over one minute, but now that different parts of the collection are being processed in parallel, the processing took 28 seconds, well under half the time.
$ sbt "run joyous"
…
Multiple main classes detected. Select one to run:
[1] CatsParTraverse
[2] runUsingFutureTraverse
[3] runWithoutParallelism
Enter number: 1
[info] running CatsParTraverse joyous
Successfully obtained 99,975,000 lines of text to search.
[io-compute-3]
[io-compute-5]
[io-compute-11]
[io-compute-6]
[io-compute-10]
[io-compute-8]
Found the word in the following 4,000 lines of text:
'stirs, a tumultuous city, joyous city: thy slain men are not slain'
'23:7 Is this your joyous city, whose antiquity is of ancient days? her'
'upon all the houses of joy in the joyous city: 32:14 Because the'
'12:11 Now no chastening for the present seemeth to be joyous, but'
'stirs, a tumultuous city, joyous city: thy slain men are not slain'
'23:7 Is this your joyous city, whose antiquity is of ancient days? her'
'upon all the houses of joy in the joyous city: 32:14 Because the'
'12:11 Now no chastening for the present seemeth to be joyous, but'
'stirs, a tumultuous city, joyous city: thy slain men are not slain'
'23:7 Is this your joyous city, whose antiquity is of ancient days? her'
'upon all the houses of joy in the joyous city: 32:14 Because the'
'12:11 Now no chastening for the present seemeth to be joyous, but'
…
'stirs, a tumultuous city, joyous city: thy slain men are not slain'
'23:7 Is this your joyous city, whose antiquity is of ancient days? her'
'upon all the houses of joy in the joyous city: 32:14 Because the'
'12:11 Now no chastening for the present seemeth to be joyous, but'
'stirs, a tumultuous city, joyous city: thy slain men are not slain'
'23:7 Is this your joyous city, whose antiquity is of ancient days? her'
'upon all the houses of joy in the joyous city: 32:14 Because the'
'12:11 Now no chastening for the present seemeth to be joyous, but'
[success] Total time: 28 s, completed 12 Nov 2023, 11:56:55
Let’s run the new program and search for the word ‘joyous’ again.
I ran the program another three times, and
its execution times were 30s, 37s and 28s.
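The idea behind this run (split the lines into a handful of chunks, fold each chunk on its own thread, then combine the partial results in order) can be sketched with nothing but the standard library. This is a hypothetical analogue, not the CatsParTraverse code from the slides: it uses `scala.concurrent.Future` and `Future.traverse` instead of Cats Effect, the object name `ChunkedParallelFold` and the `parts` parameter are invented for illustration, and the default of 6 chunks merely mirrors the six threads seen in the output above. Thread names will come from Scala's global execution context rather than `io-compute-N`.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

// Hypothetical stdlib-only analogue of the CatsParTraverse approach.
// It uses scala.concurrent.Future, not Cats Effect, so thread names differ.
object ChunkedParallelFold:

  // Same shape as the seqop in the slides: keep acc, or append the matching line.
  def accumulateLinesContaining(word: String): (String, String) => String =
    (acc, line) => if line.matches(s".*$word.*") then s"$acc\n'$line'" else acc

  def find(word: String, lines: Vector[String], parts: Int = 6): String =
    // Contiguous chunks of roughly equal size, so the original line order is preserved.
    val chunkSize = math.max(1, math.ceil(lines.size.toDouble / parts).toInt)
    val chunks = lines.grouped(chunkSize).toVector
    // One Future per chunk; each prints the name of the thread that folds it.
    val partialResults: Future[Vector[String]] =
      Future.traverse(chunks) { chunk =>
        Future {
          println(s"[${Thread.currentThread().getName}]")
          chunk.foldLeft("")(accumulateLinesContaining(word))
        }
      }
    // Future.traverse keeps the chunk order and string concatenation is associative,
    // so concatenating the partial results matches a sequential foldLeft.
    Await.result(partialResults, Duration.Inf).foldLeft("")(_ ++ _)
```

Because the chunks are contiguous and their results are concatenated left to right, the answer is the same as folding the whole vector sequentially; only the work per chunk runs concurrently.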
34. That was the second of the three approaches that
we are going to explore for parallelising our left fold.
35. For our third and final approach to parallelising the left fold, let’s go back to using the Scala parallel collections library.
What we are going to do is use the library’s aggregate function.
Let’s write a new method called runUsingParallelAggregation, whose body is identical to that of runWithoutParallelism.
As for the find function that it invokes, here is how it needs to change:
def find(word: String, lines: Vector[String]): String =
  lines.par.aggregate("")(seqop = accumulateLinesContaining(word), combop = _ ++ _)
The find function first converts the sequential vector of lines to a parallel collection, and then invokes the aggregate function on the latter.
For an explanation of the aggregate function, on the next two slides we are going to turn to Aleksandar Prokopec’s book: Learning Concurrent Programming in Scala.
On the first slide, as a recap, is his explanation of why foldLeft cannot be parallelised, and on the second slide, his explanation of how the
aggregate function allows a left fold to be parallelised.
We are also going to throw in his diagrams from EPFL’s Scala Parallel Programming course.
36. Non-parallelizable operations
While most parallel collection operations achieve superior performance by executing on several processors, some operations are inherently
sequential, and their semantics do not allow them to execute in parallel. Consider the foldLeft method from the Scala collections API:
def foldLeft[S](z: S)(f: (S, T) => S): S
This method visits elements of the collection going from left to right …
The crucial property of the foldLeft operation is that it traverses the elements of the list by going from left to right. This is reflected in the type
of the function f; it accepts an accumulator of type S and a list value of type T. The function f cannot take two values of the accumulator
type S and merge them into a new accumulator of type S. As a consequence, computing the accumulator cannot be implemented in parallel;
the foldLeft method cannot merge two accumulators from two different processors.
…
Aleksandar Prokopec
@alexprokopec
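To make the point about the type of f concrete, here is a hypothetical sketch (the object and method names are invented for illustration) that folds two halves of the lines separately and then tries to merge the two accumulators using the slides' seqop itself, as if the right-hand accumulator were just another line.

```scala
// Hypothetical illustration: a seqop of type (S, T) => S cannot double as a merge function.
object SeqopIsNotAMerge:

  // seqop: (accumulator, line) => accumulator, as in accumulateLinesContaining.
  def accumulateLinesContaining(word: String): (String, String) => String =
    (acc, line) => if line.matches(s".*$word.*") then s"$acc\n'$line'" else acc

  // The correct, sequential left fold.
  def sequential(word: String, lines: Vector[String]): String =
    lines.foldLeft("")(accumulateLinesContaining(word))

  // Fold two halves independently, then (wrongly) merge the two accumulators with
  // seqop, treating the right-hand accumulator as if it were a single line.
  def mergeWithSeqop(word: String, lines: Vector[String]): String =
    val f = accumulateLinesContaining(word)
    val (left, right) = lines.splitAt(lines.size / 2)
    f(left.foldLeft("")(f), right.foldLeft("")(f))
```

A non-empty right-hand accumulator starts with a newline, which `.` in the regex does not match, so `matches` returns false and every match found in the right half is silently dropped. A dedicated merge function of type (S, S) => S, such as string concatenation, is what is missing.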
37. To specify how the accumulators produced by different processors should be merged together, we need to use the aggregate method.
The aggregate method is similar to the foldLeft operation, but it does not specify that the elements are traversed from left to right. Instead, it
only specifies that subsets of elements are visited going from left to right; each of these subsets can produce a separate accumulator.
The aggregate method takes an additional function of type (S, S) => S, which is used to merge multiple accumulators.
d.aggregate("")(
  (acc, line) => if (line.matches(".*TEXTAREA.*")) s"$acc\n$line" else acc,
  (acc1, acc2) => acc1 + acc2
)
…
When doing these kinds of reduction operation in parallel, we can alternatively use the reduce or fold methods, which do not guarantee going
from left to right. The aggregate method is more expressive, as it allows the accumulator type to be different from the type of the elements in
the collection.
…
def aggregate[S](z: => S)(seqop: (S, T) => S, combop: (S, S) => S): S
Aleksandar Prokopec
@alexprokopec
[Diagram: aggregate, with seqop folding each subset of elements and combop merging the resulting accumulators]
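The contract described above can be modelled sequentially: fold each subset left to right with seqop starting from z, then merge the per-subset accumulators with combop. The sketch below is a hypothetical model for illustration only (aggregateModel and the other names are invented, and it is not how the parallel collections library implements aggregate). It also contrasts aggregate with fold, whose signature fold[A1 >: A](z: A1)(op: (A1, A1) => A1) forces the accumulator to be a supertype of the element type.

```scala
// Hypothetical model of aggregate's contract, NOT the parallel collections implementation.
object AggregateModel:

  // Each chunk is folded left to right with seqop from a fresh z; the per-chunk
  // accumulators are then merged with combop. An empty input yields z.
  def aggregateModel[T, S](chunks: Seq[Seq[T]])(z: => S)(seqop: (S, T) => S, combop: (S, S) => S): S =
    chunks.map(chunk => chunk.foldLeft(z)(seqop)).reduceOption(combop).getOrElse(z)

  // The accumulator type S = Int differs from the element type T = String.
  def countLinesContaining(word: String, chunks: Seq[Seq[String]]): Int =
    aggregateModel(chunks)(0)(
      seqop = (n, line) => if line.contains(word) then n + 1 else n,
      combop = _ + _
    )

  // With fold the accumulator must be a supertype of String, so the lines
  // are first mapped to Ints and only then folded.
  def countLinesContainingWithFold(word: String, lines: Seq[String]): Int =
    lines.map(line => if line.contains(word) then 1 else 0).fold(0)(_ + _)
```

Whichever way the lines are split into chunks, the count is the same, because combop (addition) agrees with what seqop does within a chunk.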
38. $ sbt "run joyous"
…
Multiple main classes detected. Select one to run:
[1] CatsParTraverse
[2] runUsingFutureTraverse
[3] runUsingParallelAggregation
[4] runWithoutParallelism
Enter number: 3
[info] running runUsingParallelAggregation joyous
Successfully obtained 99,975,000 lines of text to search.
Found the word in the following 4,000 lines of text:
'stirs, a tumultuous city, joyous city: thy slain men are not slain'
'23:7 Is this your joyous city, whose antiquity is of ancient days? her'
'upon all the houses of joy in the joyous city: 32:14 Because the'
'12:11 Now no chastening for the present seemeth to be joyous, but'
'stirs, a tumultuous city, joyous city: thy slain men are not slain'
'23:7 Is this your joyous city, whose antiquity is of ancient days? her'
'upon all the houses of joy in the joyous city: 32:14 Because the'
'12:11 Now no chastening for the present seemeth to be joyous, but'
'stirs, a tumultuous city, joyous city: thy slain men are not slain'
'23:7 Is this your joyous city, whose antiquity is of ancient days? her'
'upon all the houses of joy in the joyous city: 32:14 Because the'
'12:11 Now no chastening for the present seemeth to be joyous, but'
…
'stirs, a tumultuous city, joyous city: thy slain men are not slain'
'23:7 Is this your joyous city, whose antiquity is of ancient days? her'
'upon all the houses of joy in the joyous city: 32:14 Because the'
'12:11 Now no chastening for the present seemeth to be joyous, but'
'stirs, a tumultuous city, joyous city: thy slain men are not slain'
'23:7 Is this your joyous city, whose antiquity is of ancient days? her'
'upon all the houses of joy in the joyous city: 32:14 Because the'
'12:11 Now no chastening for the present seemeth to be joyous, but'
[success] Total time: 24 s, completed 12 Nov 2023, 11:56:55
That worked: the parallel collection was split into smaller
collections, each of which was folded in parallel using seqop,
and the resulting accumulators were merged using combop.
When the whole collection was processed sequentially, the
processing took a bit over one minute, but now that different
parts of the collection are being processed in parallel, the
processing took 24 seconds, roughly a third of the time.
Let’s run the new program and search for the word ‘joyous’ again.
I ran the program another three times, and
its execution times were 28s, 25s and 26s.
@philip_schwarz
39. import java.net.URL
import scala.io.Source
import scala.util.{Try, Using}
import scala.collection.parallel.CollectionConverters.*

@main def runUsingParallelAggregation(word: String): Unit =
  getText(book = theBible, copies = 1_000)
    .fold(
      error => handleErrorGettingText(error),
      lines =>
        announceSuccessGettingText(lines)
        val matches = find(word, lines)
        announceMatchingLines(matches))

def getText(book: Book, copies: Int = 1): Try[Vector[String]] =
  Using(Source.fromURL(book.url)): source =>
    val lines = source.getLines().toVector
    Vector.fill(copies)(lines).flatten

def find(word: String, lines: Vector[String]): String =
  lines.par.aggregate("")(seqop = accumulateLinesContaining(word), combop = _ ++ _)

def accumulateLinesContaining(word: String): (String, String) => String =
  (acc, line) => if line.matches(s".*$word.*") then s"$acc\n'$line'" else acc

def handleErrorGettingText[A](error: Throwable): A =
  throw IllegalStateException("Failed to obtain the text lines to be searched.", error)

def announceSuccessGettingText(lines: Vector[String]): Unit =
  println(f"Successfully obtained ${lines.length}%,d lines of text to search.")

def announceMatchingLines(lines: String): Unit =
  println(f"Found the word in the following ${lines.count(_ == '\n')}%,d lines of text: $lines")

val theBible = Book(
  name = "The King James Version of the Bible",
  numberOfLines = 99_975,
  numberOfBytes = 4_456_041,
  url = URL("https://gutenberg.org/cache/epub/10/pg10.txt")
)

case class Book(
  name: String,
  numberOfLines: Int,
  numberOfBytes: Int,
  url: URL
)
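Since Scala 2.13, parallel collections (the `.par` conversion and `scala.collection.parallel.CollectionConverters`) are no longer part of the standard library; they ship as the separate `scala-parallel-collections` module. A build.sbt line along these lines should make the code above compile. The version number shown is an assumption, so check for the current release.

```scala
// build.sbt (sketch): provides .par and CollectionConverters for Scala 2.13 and Scala 3.
// The version is an example; use the latest published release.
libraryDependencies += "org.scala-lang.modules" %% "scala-parallel-collections" % "1.0.4"
```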
40. That was the third and final of the three approaches
that we explored for parallelising our left fold.
In conclusion, the next slide compares and contrasts four versions of the
find function: the one in the original sequential code, and the ones in the
three different approaches to parallelisation that we have explored.
The slide after that is the same, but without any highlighting.