SlideShare a Scribd company logo
Big Data Processing in
Pharo
Matteo Marra

ESUG - August 2019 - Köln
Matteo Marra - Big Data Processing in Pharo
Big Data Frameworks
2
Programming Model
To express large data processing jobs in terms of simple
functions (e.g., Map/Reduce)
Parallel Execution Model
To execute in parallel computation on large amount of data
Matteo Marra - Big Data Processing in Pharo
Why Big Data?

Google MapReduce
3
Index the WWW
More than 20 billion web pages (> 400 TB)
Do it fast
With 1000 machines: < 3 hours, with one machine, 4 months.
Keep it cheap
Use a lot of recycled hardware, but with robust error recovery
MapReduce: Simplified Data
Processing on Large Clusters
Dean J. and Ghemawat S. OSDI ‘04
Matteo Marra - Big Data Processing in Pharo
Rossi Abruzzo 1522706400
Verdi Toscana 1527976800
Bianchi Toscana 1525298400
Verdi Lombardia 1527976800
Gialli Toscana 1525298400
Rossi Abruzzo 1525298400
Gialli Veneto 1527976800
Bianchi Sardegna 1522706400
Gialli Toscana 1527976800
Bianchi Lombardia 1527976800
Bianchi Toscana 1522706400
Gialli Sicilia 1522706400
Gialli Sicilia 1527976800
Bianchi Sardegna 1527976800
Dataset KeyBy
(Rossi, Abruzzo …)
(Verdi, Toscana …)
(Bianchi, Toscana …)
(Verdi, Lombardia …)
(Gialli, Toscana …)
(Rossi, Abruzzo …)
(Gialli, Veneto …)
(Bianchi, Sardegna …)
(Gialli, Toscana …)
(Bianchi, Lombardia
…)
(Bianchi, Toscana …)
(Gialli, Sicilia …)
(Gialli, Sicilia …)
(Bianchi, Sardegna …)
Map
NIL
(Verdi, Toscana …)
(Bianchi, Toscana …)
NIL
(Gialli, Toscana …)
NIL
NIL
NIL
(Gialli, Toscana …)
NIL
(Bianchi, Toscana …)
NIL
NIL
NIL
GroupBy
(Verdi, Toscana)
(Bianchi, Toscana …)
(Bianchi, Toscana …)
(Gialli, Toscana …)
(Gialli, Toscana …)
Reduce
(Verdi, 1)
(Bianchi, 2)
(Gialli, 2)
Map/Reduce by Example
4
Matteo Marra - Big Data Processing in Pharo
A Map/Reduce application in
Pharo
5
VoteCountingApp>>map: aLine
| splitted|
(line includesSubstring: ‘Toscana')
ifTrue: [ splitted := line substrings: ' '.
(DateAndTime fromUnixTime: splitted at: 3)
> DateAndTime yesterday
ifTrue: [ ^ (splitted at: 2) -> 1 ] ].
^ nil -> nil
VoteCountingApp>>reduceByKey: values
^ values inject: 0 into: [:sum :current | sum + current]
1
2
3
4
5
6
7
8
9
10
11
Matteo Marra - Big Data Processing in Pharo
Port: a Map/Reduce Framework
for Pharo
6
Apply Map and ReduceCoordinate
Matteo Marra - Big Data Processing in Pharo
Deploying Port
7
Local
Deploy on one machine
using multiple cores.
Standalone
Deploy on multiple machines
in the same network.
Yarn Mode
Deploy on a cluster using
Hadoop YARN.
Matteo Marra - Big Data Processing in Pharo
Using Port
8
app := VoteCountingApp new.
port startMapReduceApplication: app on: data.
Matteo Marra - Big Data Processing in Pharo
Handling Results
9
VoteCountingApp>>handleResult: resultPair
self storeToHDFS: resultPair
Asynchronous Execution
The application is executed
asynchronously.
Asynchronous Result Handling
The result is handled asynchronously by the
Map/Reduce Master
Matteo Marra - Big Data Processing in Pharo
From Map/Reduce to Spark
10
Distributed Collection
Instead of executing Map/Reduce on a Dataset, you load
the dataset in memory
Broader API
Not only execute map or reduce, but a broad set of
methods applicable to the distributed collection
In-Memory operations
Avoid heavy storing of intermediate results by explicitly
keep them in memory
map:
reduce:
filter:
aggregate:
groupBy:
map: reduce:
Matteo Marra - Big Data Processing in Pharo
The Spark-like API in Port
11
port startMapReduceApplication: app
on: dataSet .
data := port distribute: dataSet.
VoteCountingApp>>map: aLine
…
result := data map: […].
result storeToHDFSPath: ‘…’ .
VoteCountingApp>>handleResult: res
self storeToHDFS: res
VoteCountingApp>>run
Matteo Marra - Big Data Processing in Pharo
The Polls Analyser in Port
(V2)
12
port := Port withServerURL: ‘http://myServerURL’.
data := port distribute: dataSet.
data := data filter: [:line | line includesSubstring: ‘Toscana’].
data map: [:line | splitted := line substrings: ' ' .
(DateAndTime fromUnixTime: splitted at: 3)
> DateAndTime yesterday
ifTrue: [ (splitted at: 2) -> 1 ]]
result := data reduceByKey: [:value :sum| sum + value]
1
2
3
4
5
6
7
8
result getCollection.
result storeToHDFSPath: ‘…’
Matteo Marra - Big Data Processing in Pharo
DEMO
13
Matteo Marra - Big Data Processing in Pharo
Uses of Port
14
Blockchain Analysis
In collaboration with Santiago at RMoD Lille.
Full blockchain in 6 hours vs 2 days!
Genetic algorithms
In collaboration with Alexandre Bergel at Universidad de Chile
Matteo Marra - Big Data Processing in Pharo
Towards Live Big Data
Development Tools
15
Online Debugging
Debug the system as the bug is happening
Global View
Centralised debugging of the distributed system
Update of the Running System
Deploy code-fixes without restarting the whole system
Isolation
Debug the system without interfering with its execution
Conclusion
Matteo Marra - Vrije Universiteit Brussel
mmarra@vub.be gitlab.soft.vub.ac.be/Marra github.com/Marmat21
Want to try it?
CONTACT ME!
mmarra@vub.be

More Related Content

What's hot

Dr Richard Fry - Using R as a GIS
Dr Richard Fry - Using R as a GISDr Richard Fry - Using R as a GIS
Dr Richard Fry - Using R as a GIS
Shaun Lewis
 
Reduce Side Joins
Reduce Side Joins Reduce Side Joins
Reduce Side Joins Edureka!
 
JSSPP 2010
JSSPP 2010JSSPP 2010
JSSPP 2010
Joachim Lepping
 
Maps with leafletR
Maps with leafletRMaps with leafletR
Maps with leafletR
Michele Tobias
 
2016 bigdata - projects list
2016   bigdata - projects list2016   bigdata - projects list
2016 bigdata - projects list
MSR PROJECTS
 
OpenCube Workshop at eGov2015 & ePart2015 dual conference
OpenCube Workshop at eGov2015 & ePart2015 dual conferenceOpenCube Workshop at eGov2015 & ePart2015 dual conference
OpenCube Workshop at eGov2015 & ePart2015 dual conference
Evangelos Kalampokis
 
Creating and Utilizing Linked Open Statistical Data for the Development of Ad...
Creating and Utilizing Linked Open Statistical Data for the Development of Ad...Creating and Utilizing Linked Open Statistical Data for the Development of Ad...
Creating and Utilizing Linked Open Statistical Data for the Development of Ad...
Evangelos Kalampokis
 
OpenNebulaConf 2016 - MICHAL - flexible infrastructure accounting framework b...
OpenNebulaConf 2016 - MICHAL - flexible infrastructure accounting framework b...OpenNebulaConf 2016 - MICHAL - flexible infrastructure accounting framework b...
OpenNebulaConf 2016 - MICHAL - flexible infrastructure accounting framework b...
OpenNebula Project
 
Police2_6_poster_GregAmos
Police2_6_poster_GregAmosPolice2_6_poster_GregAmos
Police2_6_poster_GregAmosGregory Amos
 
Examples of My Recent Work
Examples of My Recent WorkExamples of My Recent Work
Examples of My Recent Workguestf2807f
 
A Study on New York City Taxi Rides
A Study on New York City Taxi RidesA Study on New York City Taxi Rides
A Study on New York City Taxi Rides
Caglar Subasi
 
Big data in GIS Environment
Big data in GIS Environment Big data in GIS Environment
Big data in GIS Environment
Shivaprakash Yaragal
 
Using R to Visualize Spatial Data: R as GIS - Guy Lansley
Using R to Visualize Spatial Data: R as GIS - Guy LansleyUsing R to Visualize Spatial Data: R as GIS - Guy Lansley
Using R to Visualize Spatial Data: R as GIS - Guy Lansley
Guy Lansley
 
On-the-fly Integration of Static and Dynamic Linked Data
On-the-fly Integration of Static and Dynamic Linked DataOn-the-fly Integration of Static and Dynamic Linked Data
On-the-fly Integration of Static and Dynamic Linked Data
aharth
 
Ross McDonald - PgRouting in QGIS
Ross McDonald - PgRouting in QGISRoss McDonald - PgRouting in QGIS
Ross McDonald - PgRouting in QGIS
Ross McDonald
 
Importing Data From Other Statistical Packages
Importing Data From Other Statistical PackagesImporting Data From Other Statistical Packages
Importing Data From Other Statistical Packages
Penn State University
 
R programming language in spatial analysis
R programming language in spatial analysisR programming language in spatial analysis
R programming language in spatial analysisAbhiram Kanigolla
 
Guug11 mashing up-google_apps
Guug11 mashing up-google_appsGuug11 mashing up-google_apps
Guug11 mashing up-google_appsTony Hirst
 
"Dude, where's my graph?" RDF Data Cubes for Clinical Trials Data
"Dude, where's my graph?" RDF Data Cubes for Clinical Trials Data"Dude, where's my graph?" RDF Data Cubes for Clinical Trials Data
"Dude, where's my graph?" RDF Data Cubes for Clinical Trials Data
Marc Andersen
 

What's hot (19)

Dr Richard Fry - Using R as a GIS
Dr Richard Fry - Using R as a GISDr Richard Fry - Using R as a GIS
Dr Richard Fry - Using R as a GIS
 
Reduce Side Joins
Reduce Side Joins Reduce Side Joins
Reduce Side Joins
 
JSSPP 2010
JSSPP 2010JSSPP 2010
JSSPP 2010
 
Maps with leafletR
Maps with leafletRMaps with leafletR
Maps with leafletR
 
2016 bigdata - projects list
2016   bigdata - projects list2016   bigdata - projects list
2016 bigdata - projects list
 
OpenCube Workshop at eGov2015 & ePart2015 dual conference
OpenCube Workshop at eGov2015 & ePart2015 dual conferenceOpenCube Workshop at eGov2015 & ePart2015 dual conference
OpenCube Workshop at eGov2015 & ePart2015 dual conference
 
Creating and Utilizing Linked Open Statistical Data for the Development of Ad...
Creating and Utilizing Linked Open Statistical Data for the Development of Ad...Creating and Utilizing Linked Open Statistical Data for the Development of Ad...
Creating and Utilizing Linked Open Statistical Data for the Development of Ad...
 
OpenNebulaConf 2016 - MICHAL - flexible infrastructure accounting framework b...
OpenNebulaConf 2016 - MICHAL - flexible infrastructure accounting framework b...OpenNebulaConf 2016 - MICHAL - flexible infrastructure accounting framework b...
OpenNebulaConf 2016 - MICHAL - flexible infrastructure accounting framework b...
 
Police2_6_poster_GregAmos
Police2_6_poster_GregAmosPolice2_6_poster_GregAmos
Police2_6_poster_GregAmos
 
Examples of My Recent Work
Examples of My Recent WorkExamples of My Recent Work
Examples of My Recent Work
 
A Study on New York City Taxi Rides
A Study on New York City Taxi RidesA Study on New York City Taxi Rides
A Study on New York City Taxi Rides
 
Big data in GIS Environment
Big data in GIS Environment Big data in GIS Environment
Big data in GIS Environment
 
Using R to Visualize Spatial Data: R as GIS - Guy Lansley
Using R to Visualize Spatial Data: R as GIS - Guy LansleyUsing R to Visualize Spatial Data: R as GIS - Guy Lansley
Using R to Visualize Spatial Data: R as GIS - Guy Lansley
 
On-the-fly Integration of Static and Dynamic Linked Data
On-the-fly Integration of Static and Dynamic Linked DataOn-the-fly Integration of Static and Dynamic Linked Data
On-the-fly Integration of Static and Dynamic Linked Data
 
Ross McDonald - PgRouting in QGIS
Ross McDonald - PgRouting in QGISRoss McDonald - PgRouting in QGIS
Ross McDonald - PgRouting in QGIS
 
Importing Data From Other Statistical Packages
Importing Data From Other Statistical PackagesImporting Data From Other Statistical Packages
Importing Data From Other Statistical Packages
 
R programming language in spatial analysis
R programming language in spatial analysisR programming language in spatial analysis
R programming language in spatial analysis
 
Guug11 mashing up-google_apps
Guug11 mashing up-google_appsGuug11 mashing up-google_apps
Guug11 mashing up-google_apps
 
"Dude, where's my graph?" RDF Data Cubes for Clinical Trials Data
"Dude, where's my graph?" RDF Data Cubes for Clinical Trials Data"Dude, where's my graph?" RDF Data Cubes for Clinical Trials Data
"Dude, where's my graph?" RDF Data Cubes for Clinical Trials Data
 

Similar to Big Data Processing in Pharo

BdT 3.1-3.6
BdT 3.1-3.6BdT 3.1-3.6
Distributed Cache With MapReduce
Distributed Cache With MapReduceDistributed Cache With MapReduce
Distributed Cache With MapReduce
Edureka!
 
E031201032036
E031201032036E031201032036
E031201032036
ijceronline
 
IRJET- Hadoop based Frequent Closed Item-Sets for Association Rules form ...
IRJET-  	  Hadoop based Frequent Closed Item-Sets for Association Rules form ...IRJET-  	  Hadoop based Frequent Closed Item-Sets for Association Rules form ...
IRJET- Hadoop based Frequent Closed Item-Sets for Association Rules form ...
IRJET Journal
 
Flexible Event Tracking (Paul Gebheim)
Flexible Event Tracking (Paul Gebheim)Flexible Event Tracking (Paul Gebheim)
Flexible Event Tracking (Paul Gebheim)MongoSF
 
XML Parsing with Map Reduce
XML Parsing with Map ReduceXML Parsing with Map Reduce
XML Parsing with Map Reduce
Edureka!
 
Bulk Loading Into HBase With MapReduce
Bulk Loading Into HBase With MapReduceBulk Loading Into HBase With MapReduce
Bulk Loading Into HBase With MapReduce
Edureka!
 
T180304125129
T180304125129T180304125129
T180304125129
IOSR Journals
 
CDMA1X Pilot Panorama introduction
CDMA1X Pilot Panorama introductionCDMA1X Pilot Panorama introduction
CDMA1X Pilot Panorama introduction
Tempus Telcosys
 
Presented by Ahmed Abdulhakim Al-Absi - Scaling map reduce applications acro...
Presented by Ahmed Abdulhakim Al-Absi -  Scaling map reduce applications acro...Presented by Ahmed Abdulhakim Al-Absi -  Scaling map reduce applications acro...
Presented by Ahmed Abdulhakim Al-Absi - Scaling map reduce applications acro...
Absi Ahmed
 
Analysis of parking citations mapreduce techniques
Analysis of parking citations   mapreduce techniquesAnalysis of parking citations   mapreduce techniques
Analysis of parking citations mapreduce techniques
SindhujanDhayalan
 
FME Around the World
FME Around the WorldFME Around the World
FME Around the World
Safe Software
 
Fast Range Aggregate Queries for Big Data Analysis
Fast Range Aggregate Queries for Big Data AnalysisFast Range Aggregate Queries for Big Data Analysis
Fast Range Aggregate Queries for Big Data Analysis
IRJET Journal
 
Performance evaluation of Map-reduce jar pig hive and spark with machine lear...
Performance evaluation of Map-reduce jar pig hive and spark with machine lear...Performance evaluation of Map-reduce jar pig hive and spark with machine lear...
Performance evaluation of Map-reduce jar pig hive and spark with machine lear...
IJECEIAES
 
Kudu as Storage Layer to Digitize Credit Processes
Kudu as Storage Layer to Digitize Credit ProcessesKudu as Storage Layer to Digitize Credit Processes
Kudu as Storage Layer to Digitize Credit Processes
Olaf Hein
 
Hadoop Summit Amsterdam 2014: Capacity Planning In Multi-tenant Hadoop Deploy...
Hadoop Summit Amsterdam 2014: Capacity Planning In Multi-tenant Hadoop Deploy...Hadoop Summit Amsterdam 2014: Capacity Planning In Multi-tenant Hadoop Deploy...
Hadoop Summit Amsterdam 2014: Capacity Planning In Multi-tenant Hadoop Deploy...
Sumeet Singh
 
A Survey on Data Mapping Strategy for data stored in the storage cloud 111
A Survey on Data Mapping Strategy for data stored in the storage cloud  111A Survey on Data Mapping Strategy for data stored in the storage cloud  111
A Survey on Data Mapping Strategy for data stored in the storage cloud 111NavNeet KuMar
 
FME Around the World (FME Trek Part 1): Ken Bragg - Safe Software FME World T...
FME Around the World (FME Trek Part 1): Ken Bragg - Safe Software FME World T...FME Around the World (FME Trek Part 1): Ken Bragg - Safe Software FME World T...
FME Around the World (FME Trek Part 1): Ken Bragg - Safe Software FME World T...
IMGS
 
Bring Cartography to the Cloud
Bring Cartography to the CloudBring Cartography to the Cloud
Bring Cartography to the Cloud
Nick Dimiduk
 
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENT
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENTLARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENT
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENT
ijwscjournal
 

Similar to Big Data Processing in Pharo (20)

BdT 3.1-3.6
BdT 3.1-3.6BdT 3.1-3.6
BdT 3.1-3.6
 
Distributed Cache With MapReduce
Distributed Cache With MapReduceDistributed Cache With MapReduce
Distributed Cache With MapReduce
 
E031201032036
E031201032036E031201032036
E031201032036
 
IRJET- Hadoop based Frequent Closed Item-Sets for Association Rules form ...
IRJET-  	  Hadoop based Frequent Closed Item-Sets for Association Rules form ...IRJET-  	  Hadoop based Frequent Closed Item-Sets for Association Rules form ...
IRJET- Hadoop based Frequent Closed Item-Sets for Association Rules form ...
 
Flexible Event Tracking (Paul Gebheim)
Flexible Event Tracking (Paul Gebheim)Flexible Event Tracking (Paul Gebheim)
Flexible Event Tracking (Paul Gebheim)
 
XML Parsing with Map Reduce
XML Parsing with Map ReduceXML Parsing with Map Reduce
XML Parsing with Map Reduce
 
Bulk Loading Into HBase With MapReduce
Bulk Loading Into HBase With MapReduceBulk Loading Into HBase With MapReduce
Bulk Loading Into HBase With MapReduce
 
T180304125129
T180304125129T180304125129
T180304125129
 
CDMA1X Pilot Panorama introduction
CDMA1X Pilot Panorama introductionCDMA1X Pilot Panorama introduction
CDMA1X Pilot Panorama introduction
 
Presented by Ahmed Abdulhakim Al-Absi - Scaling map reduce applications acro...
Presented by Ahmed Abdulhakim Al-Absi -  Scaling map reduce applications acro...Presented by Ahmed Abdulhakim Al-Absi -  Scaling map reduce applications acro...
Presented by Ahmed Abdulhakim Al-Absi - Scaling map reduce applications acro...
 
Analysis of parking citations mapreduce techniques
Analysis of parking citations   mapreduce techniquesAnalysis of parking citations   mapreduce techniques
Analysis of parking citations mapreduce techniques
 
FME Around the World
FME Around the WorldFME Around the World
FME Around the World
 
Fast Range Aggregate Queries for Big Data Analysis
Fast Range Aggregate Queries for Big Data AnalysisFast Range Aggregate Queries for Big Data Analysis
Fast Range Aggregate Queries for Big Data Analysis
 
Performance evaluation of Map-reduce jar pig hive and spark with machine lear...
Performance evaluation of Map-reduce jar pig hive and spark with machine lear...Performance evaluation of Map-reduce jar pig hive and spark with machine lear...
Performance evaluation of Map-reduce jar pig hive and spark with machine lear...
 
Kudu as Storage Layer to Digitize Credit Processes
Kudu as Storage Layer to Digitize Credit ProcessesKudu as Storage Layer to Digitize Credit Processes
Kudu as Storage Layer to Digitize Credit Processes
 
Hadoop Summit Amsterdam 2014: Capacity Planning In Multi-tenant Hadoop Deploy...
Hadoop Summit Amsterdam 2014: Capacity Planning In Multi-tenant Hadoop Deploy...Hadoop Summit Amsterdam 2014: Capacity Planning In Multi-tenant Hadoop Deploy...
Hadoop Summit Amsterdam 2014: Capacity Planning In Multi-tenant Hadoop Deploy...
 
A Survey on Data Mapping Strategy for data stored in the storage cloud 111
A Survey on Data Mapping Strategy for data stored in the storage cloud  111A Survey on Data Mapping Strategy for data stored in the storage cloud  111
A Survey on Data Mapping Strategy for data stored in the storage cloud 111
 
FME Around the World (FME Trek Part 1): Ken Bragg - Safe Software FME World T...
FME Around the World (FME Trek Part 1): Ken Bragg - Safe Software FME World T...FME Around the World (FME Trek Part 1): Ken Bragg - Safe Software FME World T...
FME Around the World (FME Trek Part 1): Ken Bragg - Safe Software FME World T...
 
Bring Cartography to the Cloud
Bring Cartography to the CloudBring Cartography to the Cloud
Bring Cartography to the Cloud
 
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENT
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENTLARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENT
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENT
 

More from ESUG

Workshop: Identifying concept inventories in agile programming
Workshop: Identifying concept inventories in agile programmingWorkshop: Identifying concept inventories in agile programming
Workshop: Identifying concept inventories in agile programming
ESUG
 
Technical documentation support in Pharo
Technical documentation support in PharoTechnical documentation support in Pharo
Technical documentation support in Pharo
ESUG
 
The Pharo Debugger and Debugging tools: Advances and Roadmap
The Pharo Debugger and Debugging tools: Advances and RoadmapThe Pharo Debugger and Debugging tools: Advances and Roadmap
The Pharo Debugger and Debugging tools: Advances and Roadmap
ESUG
 
Sequence: Pipeline modelling in Pharo
Sequence: Pipeline modelling in PharoSequence: Pipeline modelling in Pharo
Sequence: Pipeline modelling in Pharo
ESUG
 
Migration process from monolithic to micro frontend architecture in mobile ap...
Migration process from monolithic to micro frontend architecture in mobile ap...Migration process from monolithic to micro frontend architecture in mobile ap...
Migration process from monolithic to micro frontend architecture in mobile ap...
ESUG
 
Analyzing Dart Language with Pharo: Report and early results
Analyzing Dart Language with Pharo: Report and early resultsAnalyzing Dart Language with Pharo: Report and early results
Analyzing Dart Language with Pharo: Report and early results
ESUG
 
Transpiling Pharo Classes to JS ECMAScript 5 versus ECMAScript 6
Transpiling Pharo Classes to JS ECMAScript 5 versus ECMAScript 6Transpiling Pharo Classes to JS ECMAScript 5 versus ECMAScript 6
Transpiling Pharo Classes to JS ECMAScript 5 versus ECMAScript 6
ESUG
 
A Unit Test Metamodel for Test Generation
A Unit Test Metamodel for Test GenerationA Unit Test Metamodel for Test Generation
A Unit Test Metamodel for Test Generation
ESUG
 
Creating Unit Tests Using Genetic Programming
Creating Unit Tests Using Genetic ProgrammingCreating Unit Tests Using Genetic Programming
Creating Unit Tests Using Genetic Programming
ESUG
 
Threaded-Execution and CPS Provide Smooth Switching Between Execution Modes
Threaded-Execution and CPS Provide Smooth Switching Between Execution ModesThreaded-Execution and CPS Provide Smooth Switching Between Execution Modes
Threaded-Execution and CPS Provide Smooth Switching Between Execution Modes
ESUG
 
Exploring GitHub Actions through EGAD: An Experience Report
Exploring GitHub Actions through EGAD: An Experience ReportExploring GitHub Actions through EGAD: An Experience Report
Exploring GitHub Actions through EGAD: An Experience Report
ESUG
 
Pharo: a reflective language A first systematic analysis of reflective APIs
Pharo: a reflective language A first systematic analysis of reflective APIsPharo: a reflective language A first systematic analysis of reflective APIs
Pharo: a reflective language A first systematic analysis of reflective APIs
ESUG
 
Garbage Collector Tuning
Garbage Collector TuningGarbage Collector Tuning
Garbage Collector Tuning
ESUG
 
Improving Performance Through Object Lifetime Profiling: the DataFrame Case
Improving Performance Through Object Lifetime Profiling: the DataFrame CaseImproving Performance Through Object Lifetime Profiling: the DataFrame Case
Improving Performance Through Object Lifetime Profiling: the DataFrame Case
ESUG
 
Pharo DataFrame: Past, Present, and Future
Pharo DataFrame: Past, Present, and FuturePharo DataFrame: Past, Present, and Future
Pharo DataFrame: Past, Present, and Future
ESUG
 
thisContext in the Debugger
thisContext in the DebuggerthisContext in the Debugger
thisContext in the Debugger
ESUG
 
Websockets for Fencing Score
Websockets for Fencing ScoreWebsockets for Fencing Score
Websockets for Fencing Score
ESUG
 
ShowUs: PharoJS.org Develop in Pharo, Run on JavaScript
ShowUs: PharoJS.org Develop in Pharo, Run on JavaScriptShowUs: PharoJS.org Develop in Pharo, Run on JavaScript
ShowUs: PharoJS.org Develop in Pharo, Run on JavaScript
ESUG
 
Advanced Object- Oriented Design Mooc
Advanced Object- Oriented Design MoocAdvanced Object- Oriented Design Mooc
Advanced Object- Oriented Design Mooc
ESUG
 
A New Architecture Reconciling Refactorings and Transformations
A New Architecture Reconciling Refactorings and TransformationsA New Architecture Reconciling Refactorings and Transformations
A New Architecture Reconciling Refactorings and Transformations
ESUG
 

More from ESUG (20)

Workshop: Identifying concept inventories in agile programming
Workshop: Identifying concept inventories in agile programmingWorkshop: Identifying concept inventories in agile programming
Workshop: Identifying concept inventories in agile programming
 
Technical documentation support in Pharo
Technical documentation support in PharoTechnical documentation support in Pharo
Technical documentation support in Pharo
 
The Pharo Debugger and Debugging tools: Advances and Roadmap
The Pharo Debugger and Debugging tools: Advances and RoadmapThe Pharo Debugger and Debugging tools: Advances and Roadmap
The Pharo Debugger and Debugging tools: Advances and Roadmap
 
Sequence: Pipeline modelling in Pharo
Sequence: Pipeline modelling in PharoSequence: Pipeline modelling in Pharo
Sequence: Pipeline modelling in Pharo
 
Migration process from monolithic to micro frontend architecture in mobile ap...
Migration process from monolithic to micro frontend architecture in mobile ap...Migration process from monolithic to micro frontend architecture in mobile ap...
Migration process from monolithic to micro frontend architecture in mobile ap...
 
Analyzing Dart Language with Pharo: Report and early results
Analyzing Dart Language with Pharo: Report and early resultsAnalyzing Dart Language with Pharo: Report and early results
Analyzing Dart Language with Pharo: Report and early results
 
Transpiling Pharo Classes to JS ECMAScript 5 versus ECMAScript 6
Transpiling Pharo Classes to JS ECMAScript 5 versus ECMAScript 6Transpiling Pharo Classes to JS ECMAScript 5 versus ECMAScript 6
Transpiling Pharo Classes to JS ECMAScript 5 versus ECMAScript 6
 
A Unit Test Metamodel for Test Generation
A Unit Test Metamodel for Test GenerationA Unit Test Metamodel for Test Generation
A Unit Test Metamodel for Test Generation
 
Creating Unit Tests Using Genetic Programming
Creating Unit Tests Using Genetic ProgrammingCreating Unit Tests Using Genetic Programming
Creating Unit Tests Using Genetic Programming
 
Threaded-Execution and CPS Provide Smooth Switching Between Execution Modes
Threaded-Execution and CPS Provide Smooth Switching Between Execution ModesThreaded-Execution and CPS Provide Smooth Switching Between Execution Modes
Threaded-Execution and CPS Provide Smooth Switching Between Execution Modes
 
Exploring GitHub Actions through EGAD: An Experience Report
Exploring GitHub Actions through EGAD: An Experience ReportExploring GitHub Actions through EGAD: An Experience Report
Exploring GitHub Actions through EGAD: An Experience Report
 
Pharo: a reflective language A first systematic analysis of reflective APIs
Pharo: a reflective language A first systematic analysis of reflective APIsPharo: a reflective language A first systematic analysis of reflective APIs
Pharo: a reflective language A first systematic analysis of reflective APIs
 
Garbage Collector Tuning
Garbage Collector TuningGarbage Collector Tuning
Garbage Collector Tuning
 
Improving Performance Through Object Lifetime Profiling: the DataFrame Case
Improving Performance Through Object Lifetime Profiling: the DataFrame CaseImproving Performance Through Object Lifetime Profiling: the DataFrame Case
Improving Performance Through Object Lifetime Profiling: the DataFrame Case
 
Pharo DataFrame: Past, Present, and Future
Pharo DataFrame: Past, Present, and FuturePharo DataFrame: Past, Present, and Future
Pharo DataFrame: Past, Present, and Future
 
thisContext in the Debugger
thisContext in the DebuggerthisContext in the Debugger
thisContext in the Debugger
 
Websockets for Fencing Score
Websockets for Fencing ScoreWebsockets for Fencing Score
Websockets for Fencing Score
 
ShowUs: PharoJS.org Develop in Pharo, Run on JavaScript
ShowUs: PharoJS.org Develop in Pharo, Run on JavaScriptShowUs: PharoJS.org Develop in Pharo, Run on JavaScript
ShowUs: PharoJS.org Develop in Pharo, Run on JavaScript
 
Advanced Object- Oriented Design Mooc
Advanced Object- Oriented Design MoocAdvanced Object- Oriented Design Mooc
Advanced Object- Oriented Design Mooc
 
A New Architecture Reconciling Refactorings and Transformations
A New Architecture Reconciling Refactorings and TransformationsA New Architecture Reconciling Refactorings and Transformations
A New Architecture Reconciling Refactorings and Transformations
 

Recently uploaded

top nidhi software solution freedownload
top nidhi software solution freedownloadtop nidhi software solution freedownload
top nidhi software solution freedownload
vrstrong314
 
Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
Paco van Beckhoven
 
Large Language Models and the End of Programming
Large Language Models and the End of ProgrammingLarge Language Models and the End of Programming
Large Language Models and the End of Programming
Matt Welsh
 
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume MontevideoVitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke
 
Multiple Your Crypto Portfolio with the Innovative Features of Advanced Crypt...
Multiple Your Crypto Portfolio with the Innovative Features of Advanced Crypt...Multiple Your Crypto Portfolio with the Innovative Features of Advanced Crypt...
Multiple Your Crypto Portfolio with the Innovative Features of Advanced Crypt...
Hivelance Technology
 
First Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User EndpointsFirst Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User Endpoints
Globus
 
How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?
How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?
How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?
XfilesPro
 
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Globus
 
Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
Globus
 
Explore Modern SharePoint Templates for 2024
Explore Modern SharePoint Templates for 2024Explore Modern SharePoint Templates for 2024
Explore Modern SharePoint Templates for 2024
Sharepoint Designs
 
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamOpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
takuyayamamoto1800
 
Strategies for Successful Data Migration Tools.pptx
Strategies for Successful Data Migration Tools.pptxStrategies for Successful Data Migration Tools.pptx
Strategies for Successful Data Migration Tools.pptx
varshanayak241
 
BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024
Ortus Solutions, Corp
 
How to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good PracticesHow to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good Practices
Globus
 
Visitor Management System in India- Vizman.app
Visitor Management System in India- Vizman.appVisitor Management System in India- Vizman.app
Visitor Management System in India- Vizman.app
NaapbooksPrivateLimi
 
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Globus
 
Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
Max Andersen
 
Lecture 1 Introduction to games development
Lecture 1 Introduction to games developmentLecture 1 Introduction to games development
Lecture 1 Introduction to games development
abdulrafaychaudhry
 
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Anthony Dahanne
 
2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx
Georgi Kodinov
 

Recently uploaded (20)

top nidhi software solution freedownload
top nidhi software solution freedownloadtop nidhi software solution freedownload
top nidhi software solution freedownload
 
Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
 
Large Language Models and the End of Programming
Large Language Models and the End of ProgrammingLarge Language Models and the End of Programming
Large Language Models and the End of Programming
 
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume MontevideoVitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume Montevideo
 
Multiple Your Crypto Portfolio with the Innovative Features of Advanced Crypt...
Multiple Your Crypto Portfolio with the Innovative Features of Advanced Crypt...Multiple Your Crypto Portfolio with the Innovative Features of Advanced Crypt...
Multiple Your Crypto Portfolio with the Innovative Features of Advanced Crypt...
 
First Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User EndpointsFirst Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User Endpoints
 
How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?
How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?
How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?
 
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
 
Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
 
Explore Modern SharePoint Templates for 2024
Explore Modern SharePoint Templates for 2024Explore Modern SharePoint Templates for 2024
Explore Modern SharePoint Templates for 2024
 
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamOpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
 
Strategies for Successful Data Migration Tools.pptx
Strategies for Successful Data Migration Tools.pptxStrategies for Successful Data Migration Tools.pptx
Strategies for Successful Data Migration Tools.pptx
 
BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024
 
How to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good PracticesHow to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good Practices
 
Visitor Management System in India- Vizman.app
Visitor Management System in India- Vizman.appVisitor Management System in India- Vizman.app
Visitor Management System in India- Vizman.app
 
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
 
Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
 
Lecture 1 Introduction to games development
Lecture 1 Introduction to games developmentLecture 1 Introduction to games development
Lecture 1 Introduction to games development
 
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
 
2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx
 

Big Data Processing in Pharo

  • 1. Big Data Processing in Pharo Matteo Marra
 ESUG - August 2019 - Köln
  • 2. Matteo Marra - Big Data Processing in Pharo Big Data Frameworks 2 Programming Model To express large data processing jobs in terms of simple functions (e.g., Map/Reduce) Parallel Execution Model To execute in parallel computation on large amount of data
  • 3. Matteo Marra - Big Data Processing in Pharo Why Big Data?
 Google MapReduce 3 Index the WWW More than 20 billion web pages (> 400 TB) Do it fast With 1000 machines: < 3 hours, with one machine, 4 months. Keep it cheap Use a lot of recycled hardware, but with robust error recovery MapReduce: Simplified Data Processing on Large Clusters Dean J. and Ghemawat S. OSDI ‘04
  • 4. Matteo Marra - Big Data Processing in Pharo Rossi Abruzzo 1522706400 Verdi Toscana 1527976800 Bianchi Toscana 1525298400 Verdi Lombardia 1527976800 Gialli Toscana 1525298400 Rossi Abruzzo 1525298400 Gialli Veneto 1527976800 Bianchi Sardegna 1522706400 Gialli Toscana 1527976800 Bianchi Lombardia 1527976800 Bianchi Toscana 1522706400 Gialli Sicilia 1522706400 Gialli Sicilia 1527976800 Bianchi Sardegna 1527976800 Dataset KeyBy (Rossi, Abruzzo …) (Verdi, Toscana …) (Bianchi, Toscana …) (Verdi, Lombardia …) (Gialli, Toscana …) (Rossi, Abruzzo …) (Gialli, Veneto …) (Bianchi, Sardegna …) (Gialli, Toscana …) (Bianchi, Lombardia …) (Bianchi, Toscana …) (Gialli, Sicilia …) (Gialli, Sicilia …) (Bianchi, Sardegna …) Map NIL (Verdi, Toscana …) (Bianchi, Toscana …) NIL (Gialli, Toscana …) NIL NIL NIL (Gialli, Toscana …) NIL (Bianchi, Toscana …) NIL NIL NIL GroupBy (Verdi, Toscana) (Bianchi, Toscana …) (Bianchi, Toscana …) (Gialli, Toscana …) (Gialli, Toscana …) Reduce (Verdi, 1) (Bianchi, 2) (Gialli, 2) Map/Reduce by Example 4
  • 5. Matteo Marra - Big Data Processing in Pharo A Map/Reduce application in Pharo 5 VoteCountingApp>>map: aLine | splitted| (line includesSubstring: ‘Toscana') ifTrue: [ splitted := line substrings: ' '. (DateAndTime fromUnixTime: splitted at: 3) > DateAndTime yesterday ifTrue: [ ^ (splitted at: 2) -> 1 ] ]. ^ nil -> nil VoteCountingApp>>reduceByKey: values ^ values inject: 0 into: [:sum :current | sum + current] 1 2 3 4 5 6 7 8 9 10 11
  • 6. Matteo Marra - Big Data Processing in Pharo Port: a Map/Reduce Framework for Pharo 6 Apply Map and ReduceCoordinate
  • 7. Matteo Marra - Big Data Processing in Pharo Deploying Port 7 Local Deploy on one machine using multiple cores. Standalone Deploy on multiple machines in the same network. Yarn Mode Deploy on a cluster using Hadoop YARN.
  • 8. Matteo Marra - Big Data Processing in Pharo Using Port 8 app := VoteCountingApp new. port startMapReduceApplication: app on: data.
  • 9. Matteo Marra - Big Data Processing in Pharo Handling Results 9 VoteCountingApp>>handleResult: resultPair self storeToHDFS: resultPair Asynchronous Execution The application is executed asynchronously. Asynchronous Result Handling The result is handled asynchronously by the Map/Reduce Master
  • 10. Matteo Marra - Big Data Processing in Pharo From Map/Reduce to Spark 10 Distributed Collection Instead of executing Map/Reduce on a Dataset, you load the dataset in memory Broader API Not only execute map or reduce, but a broad set of methods applicable to the distributed collection In-Memory operations Avoid heavy storing of intermediate results by explicitly keep them in memory map: reduce: filter: aggregate: groupBy: map: reduce:
  • 11. Matteo Marra - Big Data Processing in Pharo The Spark-like API in Port 11 port startMapReduceApplication: app on: dataSet . data := port distribute: dataSet. VoteCountingApp>>map: aLine … result := data map: […]. result storeToHDFSPath: ‘…’ . VoteCountingApp>>handleResult: res self storeToHDFS: res VoteCountingApp>>run
  • 12. Matteo Marra - Big Data Processing in Pharo The Polls Analyser in Port (V2) 12 port := Port withServerURL: ‘http://myServerURL’. data := port distribute: dataSet. data := data filter: [:line | line includesSubstring: ‘Toscana’]. data map: [:line | splitted := line substrings: ' ' . (DateAndTime fromUnixTime: splitted at: 3) > DateAndTime yesterday ifTrue: [ (splitted at: 2) -> 1 ]] result := data reduceByKey: [:value :sum| sum + value] 1 2 3 4 5 6 7 8 result getCollection. result storeToHDFSPath: ‘…’
  • 13. Matteo Marra - Big Data Processing in Pharo DEMO 13
  • 14. Matteo Marra - Big Data Processing in Pharo Uses of Port 14 Blockchain Analysis In collaboration with Santiago at RMoD Lille. Full blockchain in 6 hours vs 2 days! Genetic algorithms In collaboration with Alexandre Bergel at Universidad de Chile
  • 15. Matteo Marra - Big Data Processing in Pharo Towards Live Big Data Development Tools 15 Online Debugging Debug the system as the bug is happening Global View Centralised debugging of the distributed system Update of the Running System Deploy code-fixes without restarting the whole system Isolation Debug the system without interfering with its execution
  • 16. Conclusion Matteo Marra - Vrije Universiteit Brussel mmarra@vub.be gitlab.soft.vub.ac.be/Marra github.com/Marmat21 Want to try it? CONTACT ME! mmarra@vub.be