SlideShare a Scribd company logo
1 of 16
Download to read offline
Big Data Processing in
Pharo
Matteo Marra

ESUG - August 2019 - Köln
Matteo Marra - Big Data Processing in Pharo
Big Data Frameworks
2
Programming Model
To express large data processing jobs in terms of simple
functions (e.g., Map/Reduce)
Parallel Execution Model
To execute in parallel computation on large amount of data
Matteo Marra - Big Data Processing in Pharo
Why Big Data?

Google MapReduce
3
Index the WWW
More than 20 billion web pages (> 400 TB)
Do it fast
With 1000 machines: < 3 hours, with one machine, 4 months.
Keep it cheap
Use a lot of recycled hardware, but with robust error recovery
MapReduce: Simplified Data
Processing on Large Clusters
Dean J. and Ghemawat S. OSDI ‘04
Matteo Marra - Big Data Processing in Pharo
Rossi Abruzzo 1522706400
Verdi Toscana 1527976800
Bianchi Toscana 1525298400
Verdi Lombardia 1527976800
Gialli Toscana 1525298400
Rossi Abruzzo 1525298400
Gialli Veneto 1527976800
Bianchi Sardegna 1522706400
Gialli Toscana 1527976800
Bianchi Lombardia 1527976800
Bianchi Toscana 1522706400
Gialli Sicilia 1522706400
Gialli Sicilia 1527976800
Bianchi Sardegna 1527976800
Dataset KeyBy
(Rossi, Abruzzo …)
(Verdi, Toscana …)
(Bianchi, Toscana …)
(Verdi, Lombardia …)
(Gialli, Toscana …)
(Rossi, Abruzzo …)
(Gialli, Veneto …)
(Bianchi, Sardegna …)
(Gialli, Toscana …)
(Bianchi, Lombardia
…)
(Bianchi, Toscana …)
(Gialli, Sicilia …)
(Gialli, Sicilia …)
(Bianchi, Sardegna …)
Map
NIL
(Verdi, Toscana …)
(Bianchi, Toscana …)
NIL
(Gialli, Toscana …)
NIL
NIL
NIL
(Gialli, Toscana …)
NIL
(Bianchi, Toscana …)
NIL
NIL
NIL
GroupBy
(Verdi, Toscana)
(Bianchi, Toscana …)
(Bianchi, Toscana …)
(Gialli, Toscana …)
(Gialli, Toscana …)
Reduce
(Verdi, 1)
(Bianchi, 2)
(Gialli, 2)
Map/Reduce by Example
4
Matteo Marra - Big Data Processing in Pharo
A Map/Reduce application in
Pharo
5
VoteCountingApp>>map: aLine
| splitted|
(line includesSubstring: ‘Toscana')
ifTrue: [ splitted := line substrings: ' '.
(DateAndTime fromUnixTime: splitted at: 3)
> DateAndTime yesterday
ifTrue: [ ^ (splitted at: 2) -> 1 ] ].
^ nil -> nil
VoteCountingApp>>reduceByKey: values
^ values inject: 0 into: [:sum :current | sum + current]
1
2
3
4
5
6
7
8
9
10
11
Matteo Marra - Big Data Processing in Pharo
Port: a Map/Reduce Framework
for Pharo
6
Apply Map and ReduceCoordinate
Matteo Marra - Big Data Processing in Pharo
Deploying Port
7
Local
Deploy on one machine
using multiple cores.
Standalone
Deploy on multiple machines
in the same network.
Yarn Mode
Deploy on a cluster using
Hadoop YARN.
Matteo Marra - Big Data Processing in Pharo
Using Port
8
app := VoteCountingApp new.
port startMapReduceApplication: app on: data.
Matteo Marra - Big Data Processing in Pharo
Handling Results
9
VoteCountingApp>>handleResult: resultPair
self storeToHDFS: resultPair
Asynchronous Execution
The application is executed
asynchronously.
Asynchronous Result Handling
The result is handled asynchronously by the
Map/Reduce Master
Matteo Marra - Big Data Processing in Pharo
From Map/Reduce to Spark
10
Distributed Collection
Instead of executing Map/Reduce on a Dataset, you load
the dataset in memory
Broader API
Not only execute map or reduce, but a broad set of
methods applicable to the distributed collection
In-Memory operations
Avoid heavy storing of intermediate results by explicitly
keep them in memory
map:
reduce:
filter:
aggregate:
groupBy:
map: reduce:
Matteo Marra - Big Data Processing in Pharo
The Spark-like API in Port
11
port startMapReduceApplication: app
on: dataSet .
data := port distribute: dataSet.
VoteCountingApp>>map: aLine
…
result := data map: […].
result storeToHDFSPath: ‘…’ .
VoteCountingApp>>handleResult: res
self storeToHDFS: res
VoteCountingApp>>run
Matteo Marra - Big Data Processing in Pharo
The Polls Analyser in Port
(V2)
12
port := Port withServerURL: ‘http://myServerURL’.
data := port distribute: dataSet.
data := data filter: [:line | line includesSubstring: ‘Toscana’].
data map: [:line | splitted := line substrings: ' ' .
(DateAndTime fromUnixTime: splitted at: 3)
> DateAndTime yesterday
ifTrue: [ (splitted at: 2) -> 1 ]]
result := data reduceByKey: [:value :sum| sum + value]
1
2
3
4
5
6
7
8
result getCollection.
result storeToHDFSPath: ‘…’
Matteo Marra - Big Data Processing in Pharo
DEMO
13
Matteo Marra - Big Data Processing in Pharo
Uses of Port
14
Blockchain Analysis
In collaboration with Santiago at RMoD Lille.
Full blockchain in 6 hours vs 2 days!
Genetic algorithms
In collaboration with Alexandre Bergel at Universidad de Chile
Matteo Marra - Big Data Processing in Pharo
Towards Live Big Data
Development Tools
15
Online Debugging
Debug the system as the bug is happening
Global View
Centralised debugging of the distributed system
Update of the Running System
Deploy code-fixes without restarting the whole system
Isolation
Debug the system without interfering with its execution
Conclusion
Matteo Marra - Vrije Universiteit Brussel
mmarra@vub.be gitlab.soft.vub.ac.be/Marra github.com/Marmat21
Want to try it?
CONTACT ME!
mmarra@vub.be

More Related Content

What's hot

Dr Richard Fry - Using R as a GIS
Dr Richard Fry - Using R as a GISDr Richard Fry - Using R as a GIS
Dr Richard Fry - Using R as a GISShaun Lewis
 
Reduce Side Joins
Reduce Side Joins Reduce Side Joins
Reduce Side Joins Edureka!
 
2016 bigdata - projects list
2016   bigdata - projects list2016   bigdata - projects list
2016 bigdata - projects listMSR PROJECTS
 
OpenCube Workshop at eGov2015 & ePart2015 dual conference
OpenCube Workshop at eGov2015 & ePart2015 dual conferenceOpenCube Workshop at eGov2015 & ePart2015 dual conference
OpenCube Workshop at eGov2015 & ePart2015 dual conferenceEvangelos Kalampokis
 
Creating and Utilizing Linked Open Statistical Data for the Development of Ad...
Creating and Utilizing Linked Open Statistical Data for the Development of Ad...Creating and Utilizing Linked Open Statistical Data for the Development of Ad...
Creating and Utilizing Linked Open Statistical Data for the Development of Ad...Evangelos Kalampokis
 
OpenNebulaConf 2016 - MICHAL - flexible infrastructure accounting framework b...
OpenNebulaConf 2016 - MICHAL - flexible infrastructure accounting framework b...OpenNebulaConf 2016 - MICHAL - flexible infrastructure accounting framework b...
OpenNebulaConf 2016 - MICHAL - flexible infrastructure accounting framework b...OpenNebula Project
 
Police2_6_poster_GregAmos
Police2_6_poster_GregAmosPolice2_6_poster_GregAmos
Police2_6_poster_GregAmosGregory Amos
 
Examples of My Recent Work
Examples of My Recent WorkExamples of My Recent Work
Examples of My Recent Workguestf2807f
 
A Study on New York City Taxi Rides
A Study on New York City Taxi RidesA Study on New York City Taxi Rides
A Study on New York City Taxi RidesCaglar Subasi
 
Using R to Visualize Spatial Data: R as GIS - Guy Lansley
Using R to Visualize Spatial Data: R as GIS - Guy LansleyUsing R to Visualize Spatial Data: R as GIS - Guy Lansley
Using R to Visualize Spatial Data: R as GIS - Guy LansleyGuy Lansley
 
On-the-fly Integration of Static and Dynamic Linked Data
On-the-fly Integration of Static and Dynamic Linked DataOn-the-fly Integration of Static and Dynamic Linked Data
On-the-fly Integration of Static and Dynamic Linked Dataaharth
 
Ross McDonald - PgRouting in QGIS
Ross McDonald - PgRouting in QGISRoss McDonald - PgRouting in QGIS
Ross McDonald - PgRouting in QGISRoss McDonald
 
Importing Data From Other Statistical Packages
Importing Data From Other Statistical PackagesImporting Data From Other Statistical Packages
Importing Data From Other Statistical PackagesPenn State University
 
R programming language in spatial analysis
R programming language in spatial analysisR programming language in spatial analysis
R programming language in spatial analysisAbhiram Kanigolla
 
Guug11 mashing up-google_apps
Guug11 mashing up-google_appsGuug11 mashing up-google_apps
Guug11 mashing up-google_appsTony Hirst
 
"Dude, where's my graph?" RDF Data Cubes for Clinical Trials Data
"Dude, where's my graph?" RDF Data Cubes for Clinical Trials Data"Dude, where's my graph?" RDF Data Cubes for Clinical Trials Data
"Dude, where's my graph?" RDF Data Cubes for Clinical Trials DataMarc Andersen
 

What's hot (19)

Dr Richard Fry - Using R as a GIS
Dr Richard Fry - Using R as a GISDr Richard Fry - Using R as a GIS
Dr Richard Fry - Using R as a GIS
 
Reduce Side Joins
Reduce Side Joins Reduce Side Joins
Reduce Side Joins
 
JSSPP 2010
JSSPP 2010JSSPP 2010
JSSPP 2010
 
Maps with leafletR
Maps with leafletRMaps with leafletR
Maps with leafletR
 
2016 bigdata - projects list
2016   bigdata - projects list2016   bigdata - projects list
2016 bigdata - projects list
 
OpenCube Workshop at eGov2015 & ePart2015 dual conference
OpenCube Workshop at eGov2015 & ePart2015 dual conferenceOpenCube Workshop at eGov2015 & ePart2015 dual conference
OpenCube Workshop at eGov2015 & ePart2015 dual conference
 
Creating and Utilizing Linked Open Statistical Data for the Development of Ad...
Creating and Utilizing Linked Open Statistical Data for the Development of Ad...Creating and Utilizing Linked Open Statistical Data for the Development of Ad...
Creating and Utilizing Linked Open Statistical Data for the Development of Ad...
 
OpenNebulaConf 2016 - MICHAL - flexible infrastructure accounting framework b...
OpenNebulaConf 2016 - MICHAL - flexible infrastructure accounting framework b...OpenNebulaConf 2016 - MICHAL - flexible infrastructure accounting framework b...
OpenNebulaConf 2016 - MICHAL - flexible infrastructure accounting framework b...
 
Police2_6_poster_GregAmos
Police2_6_poster_GregAmosPolice2_6_poster_GregAmos
Police2_6_poster_GregAmos
 
Examples of My Recent Work
Examples of My Recent WorkExamples of My Recent Work
Examples of My Recent Work
 
A Study on New York City Taxi Rides
A Study on New York City Taxi RidesA Study on New York City Taxi Rides
A Study on New York City Taxi Rides
 
Big data in GIS Environment
Big data in GIS Environment Big data in GIS Environment
Big data in GIS Environment
 
Using R to Visualize Spatial Data: R as GIS - Guy Lansley
Using R to Visualize Spatial Data: R as GIS - Guy LansleyUsing R to Visualize Spatial Data: R as GIS - Guy Lansley
Using R to Visualize Spatial Data: R as GIS - Guy Lansley
 
On-the-fly Integration of Static and Dynamic Linked Data
On-the-fly Integration of Static and Dynamic Linked DataOn-the-fly Integration of Static and Dynamic Linked Data
On-the-fly Integration of Static and Dynamic Linked Data
 
Ross McDonald - PgRouting in QGIS
Ross McDonald - PgRouting in QGISRoss McDonald - PgRouting in QGIS
Ross McDonald - PgRouting in QGIS
 
Importing Data From Other Statistical Packages
Importing Data From Other Statistical PackagesImporting Data From Other Statistical Packages
Importing Data From Other Statistical Packages
 
R programming language in spatial analysis
R programming language in spatial analysisR programming language in spatial analysis
R programming language in spatial analysis
 
Guug11 mashing up-google_apps
Guug11 mashing up-google_appsGuug11 mashing up-google_apps
Guug11 mashing up-google_apps
 
"Dude, where's my graph?" RDF Data Cubes for Clinical Trials Data
"Dude, where's my graph?" RDF Data Cubes for Clinical Trials Data"Dude, where's my graph?" RDF Data Cubes for Clinical Trials Data
"Dude, where's my graph?" RDF Data Cubes for Clinical Trials Data
 

Similar to Big Data Processing in Pharo

Distributed Cache With MapReduce
Distributed Cache With MapReduceDistributed Cache With MapReduce
Distributed Cache With MapReduceEdureka!
 
IRJET- Hadoop based Frequent Closed Item-Sets for Association Rules form ...
IRJET-  	  Hadoop based Frequent Closed Item-Sets for Association Rules form ...IRJET-  	  Hadoop based Frequent Closed Item-Sets for Association Rules form ...
IRJET- Hadoop based Frequent Closed Item-Sets for Association Rules form ...IRJET Journal
 
Flexible Event Tracking (Paul Gebheim)
Flexible Event Tracking (Paul Gebheim)Flexible Event Tracking (Paul Gebheim)
Flexible Event Tracking (Paul Gebheim)MongoSF
 
XML Parsing with Map Reduce
XML Parsing with Map ReduceXML Parsing with Map Reduce
XML Parsing with Map ReduceEdureka!
 
Bulk Loading Into HBase With MapReduce
Bulk Loading Into HBase With MapReduceBulk Loading Into HBase With MapReduce
Bulk Loading Into HBase With MapReduceEdureka!
 
CDMA1X Pilot Panorama introduction
CDMA1X Pilot Panorama introductionCDMA1X Pilot Panorama introduction
CDMA1X Pilot Panorama introductionTempus Telcosys
 
Presented by Ahmed Abdulhakim Al-Absi - Scaling map reduce applications acro...
Presented by Ahmed Abdulhakim Al-Absi -  Scaling map reduce applications acro...Presented by Ahmed Abdulhakim Al-Absi -  Scaling map reduce applications acro...
Presented by Ahmed Abdulhakim Al-Absi - Scaling map reduce applications acro...Absi Ahmed
 
Analysis of parking citations mapreduce techniques
Analysis of parking citations   mapreduce techniquesAnalysis of parking citations   mapreduce techniques
Analysis of parking citations mapreduce techniquesSindhujanDhayalan
 
FME Around the World
FME Around the WorldFME Around the World
FME Around the WorldSafe Software
 
Fast Range Aggregate Queries for Big Data Analysis
Fast Range Aggregate Queries for Big Data AnalysisFast Range Aggregate Queries for Big Data Analysis
Fast Range Aggregate Queries for Big Data AnalysisIRJET Journal
 
Performance evaluation of Map-reduce jar pig hive and spark with machine lear...
Performance evaluation of Map-reduce jar pig hive and spark with machine lear...Performance evaluation of Map-reduce jar pig hive and spark with machine lear...
Performance evaluation of Map-reduce jar pig hive and spark with machine lear...IJECEIAES
 
Kudu as Storage Layer to Digitize Credit Processes
Kudu as Storage Layer to Digitize Credit ProcessesKudu as Storage Layer to Digitize Credit Processes
Kudu as Storage Layer to Digitize Credit ProcessesOlaf Hein
 
Hadoop Summit Amsterdam 2014: Capacity Planning In Multi-tenant Hadoop Deploy...
Hadoop Summit Amsterdam 2014: Capacity Planning In Multi-tenant Hadoop Deploy...Hadoop Summit Amsterdam 2014: Capacity Planning In Multi-tenant Hadoop Deploy...
Hadoop Summit Amsterdam 2014: Capacity Planning In Multi-tenant Hadoop Deploy...Sumeet Singh
 
A Survey on Data Mapping Strategy for data stored in the storage cloud 111
A Survey on Data Mapping Strategy for data stored in the storage cloud  111A Survey on Data Mapping Strategy for data stored in the storage cloud  111
A Survey on Data Mapping Strategy for data stored in the storage cloud 111NavNeet KuMar
 
FME Around the World (FME Trek Part 1): Ken Bragg - Safe Software FME World T...
FME Around the World (FME Trek Part 1): Ken Bragg - Safe Software FME World T...FME Around the World (FME Trek Part 1): Ken Bragg - Safe Software FME World T...
FME Around the World (FME Trek Part 1): Ken Bragg - Safe Software FME World T...IMGS
 
Bring Cartography to the Cloud
Bring Cartography to the CloudBring Cartography to the Cloud
Bring Cartography to the CloudNick Dimiduk
 
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENT
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENTLARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENT
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENTijwscjournal
 

Similar to Big Data Processing in Pharo (20)

BdT 3.1-3.6
BdT 3.1-3.6BdT 3.1-3.6
BdT 3.1-3.6
 
Distributed Cache With MapReduce
Distributed Cache With MapReduceDistributed Cache With MapReduce
Distributed Cache With MapReduce
 
E031201032036
E031201032036E031201032036
E031201032036
 
IRJET- Hadoop based Frequent Closed Item-Sets for Association Rules form ...
IRJET-  	  Hadoop based Frequent Closed Item-Sets for Association Rules form ...IRJET-  	  Hadoop based Frequent Closed Item-Sets for Association Rules form ...
IRJET- Hadoop based Frequent Closed Item-Sets for Association Rules form ...
 
Flexible Event Tracking (Paul Gebheim)
Flexible Event Tracking (Paul Gebheim)Flexible Event Tracking (Paul Gebheim)
Flexible Event Tracking (Paul Gebheim)
 
XML Parsing with Map Reduce
XML Parsing with Map ReduceXML Parsing with Map Reduce
XML Parsing with Map Reduce
 
Bulk Loading Into HBase With MapReduce
Bulk Loading Into HBase With MapReduceBulk Loading Into HBase With MapReduce
Bulk Loading Into HBase With MapReduce
 
T180304125129
T180304125129T180304125129
T180304125129
 
CDMA1X Pilot Panorama introduction
CDMA1X Pilot Panorama introductionCDMA1X Pilot Panorama introduction
CDMA1X Pilot Panorama introduction
 
Presented by Ahmed Abdulhakim Al-Absi - Scaling map reduce applications acro...
Presented by Ahmed Abdulhakim Al-Absi -  Scaling map reduce applications acro...Presented by Ahmed Abdulhakim Al-Absi -  Scaling map reduce applications acro...
Presented by Ahmed Abdulhakim Al-Absi - Scaling map reduce applications acro...
 
Analysis of parking citations mapreduce techniques
Analysis of parking citations   mapreduce techniquesAnalysis of parking citations   mapreduce techniques
Analysis of parking citations mapreduce techniques
 
FME Around the World
FME Around the WorldFME Around the World
FME Around the World
 
Fast Range Aggregate Queries for Big Data Analysis
Fast Range Aggregate Queries for Big Data AnalysisFast Range Aggregate Queries for Big Data Analysis
Fast Range Aggregate Queries for Big Data Analysis
 
Performance evaluation of Map-reduce jar pig hive and spark with machine lear...
Performance evaluation of Map-reduce jar pig hive and spark with machine lear...Performance evaluation of Map-reduce jar pig hive and spark with machine lear...
Performance evaluation of Map-reduce jar pig hive and spark with machine lear...
 
Kudu as Storage Layer to Digitize Credit Processes
Kudu as Storage Layer to Digitize Credit ProcessesKudu as Storage Layer to Digitize Credit Processes
Kudu as Storage Layer to Digitize Credit Processes
 
Hadoop Summit Amsterdam 2014: Capacity Planning In Multi-tenant Hadoop Deploy...
Hadoop Summit Amsterdam 2014: Capacity Planning In Multi-tenant Hadoop Deploy...Hadoop Summit Amsterdam 2014: Capacity Planning In Multi-tenant Hadoop Deploy...
Hadoop Summit Amsterdam 2014: Capacity Planning In Multi-tenant Hadoop Deploy...
 
A Survey on Data Mapping Strategy for data stored in the storage cloud 111
A Survey on Data Mapping Strategy for data stored in the storage cloud  111A Survey on Data Mapping Strategy for data stored in the storage cloud  111
A Survey on Data Mapping Strategy for data stored in the storage cloud 111
 
FME Around the World (FME Trek Part 1): Ken Bragg - Safe Software FME World T...
FME Around the World (FME Trek Part 1): Ken Bragg - Safe Software FME World T...FME Around the World (FME Trek Part 1): Ken Bragg - Safe Software FME World T...
FME Around the World (FME Trek Part 1): Ken Bragg - Safe Software FME World T...
 
Bring Cartography to the Cloud
Bring Cartography to the CloudBring Cartography to the Cloud
Bring Cartography to the Cloud
 
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENT
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENTLARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENT
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENT
 

More from ESUG

Workshop: Identifying concept inventories in agile programming
Workshop: Identifying concept inventories in agile programmingWorkshop: Identifying concept inventories in agile programming
Workshop: Identifying concept inventories in agile programmingESUG
 
Technical documentation support in Pharo
Technical documentation support in PharoTechnical documentation support in Pharo
Technical documentation support in PharoESUG
 
The Pharo Debugger and Debugging tools: Advances and Roadmap
The Pharo Debugger and Debugging tools: Advances and RoadmapThe Pharo Debugger and Debugging tools: Advances and Roadmap
The Pharo Debugger and Debugging tools: Advances and RoadmapESUG
 
Sequence: Pipeline modelling in Pharo
Sequence: Pipeline modelling in PharoSequence: Pipeline modelling in Pharo
Sequence: Pipeline modelling in PharoESUG
 
Migration process from monolithic to micro frontend architecture in mobile ap...
Migration process from monolithic to micro frontend architecture in mobile ap...Migration process from monolithic to micro frontend architecture in mobile ap...
Migration process from monolithic to micro frontend architecture in mobile ap...ESUG
 
Analyzing Dart Language with Pharo: Report and early results
Analyzing Dart Language with Pharo: Report and early resultsAnalyzing Dart Language with Pharo: Report and early results
Analyzing Dart Language with Pharo: Report and early resultsESUG
 
Transpiling Pharo Classes to JS ECMAScript 5 versus ECMAScript 6
Transpiling Pharo Classes to JS ECMAScript 5 versus ECMAScript 6Transpiling Pharo Classes to JS ECMAScript 5 versus ECMAScript 6
Transpiling Pharo Classes to JS ECMAScript 5 versus ECMAScript 6ESUG
 
A Unit Test Metamodel for Test Generation
A Unit Test Metamodel for Test GenerationA Unit Test Metamodel for Test Generation
A Unit Test Metamodel for Test GenerationESUG
 
Creating Unit Tests Using Genetic Programming
Creating Unit Tests Using Genetic ProgrammingCreating Unit Tests Using Genetic Programming
Creating Unit Tests Using Genetic ProgrammingESUG
 
Threaded-Execution and CPS Provide Smooth Switching Between Execution Modes
Threaded-Execution and CPS Provide Smooth Switching Between Execution ModesThreaded-Execution and CPS Provide Smooth Switching Between Execution Modes
Threaded-Execution and CPS Provide Smooth Switching Between Execution ModesESUG
 
Exploring GitHub Actions through EGAD: An Experience Report
Exploring GitHub Actions through EGAD: An Experience ReportExploring GitHub Actions through EGAD: An Experience Report
Exploring GitHub Actions through EGAD: An Experience ReportESUG
 
Pharo: a reflective language A first systematic analysis of reflective APIs
Pharo: a reflective language A first systematic analysis of reflective APIsPharo: a reflective language A first systematic analysis of reflective APIs
Pharo: a reflective language A first systematic analysis of reflective APIsESUG
 
Garbage Collector Tuning
Garbage Collector TuningGarbage Collector Tuning
Garbage Collector TuningESUG
 
Improving Performance Through Object Lifetime Profiling: the DataFrame Case
Improving Performance Through Object Lifetime Profiling: the DataFrame CaseImproving Performance Through Object Lifetime Profiling: the DataFrame Case
Improving Performance Through Object Lifetime Profiling: the DataFrame CaseESUG
 
Pharo DataFrame: Past, Present, and Future
Pharo DataFrame: Past, Present, and FuturePharo DataFrame: Past, Present, and Future
Pharo DataFrame: Past, Present, and FutureESUG
 
thisContext in the Debugger
thisContext in the DebuggerthisContext in the Debugger
thisContext in the DebuggerESUG
 
Websockets for Fencing Score
Websockets for Fencing ScoreWebsockets for Fencing Score
Websockets for Fencing ScoreESUG
 
ShowUs: PharoJS.org Develop in Pharo, Run on JavaScript
ShowUs: PharoJS.org Develop in Pharo, Run on JavaScriptShowUs: PharoJS.org Develop in Pharo, Run on JavaScript
ShowUs: PharoJS.org Develop in Pharo, Run on JavaScriptESUG
 
Advanced Object- Oriented Design Mooc
Advanced Object- Oriented Design MoocAdvanced Object- Oriented Design Mooc
Advanced Object- Oriented Design MoocESUG
 
A New Architecture Reconciling Refactorings and Transformations
A New Architecture Reconciling Refactorings and TransformationsA New Architecture Reconciling Refactorings and Transformations
A New Architecture Reconciling Refactorings and TransformationsESUG
 

More from ESUG (20)

Workshop: Identifying concept inventories in agile programming
Workshop: Identifying concept inventories in agile programmingWorkshop: Identifying concept inventories in agile programming
Workshop: Identifying concept inventories in agile programming
 
Technical documentation support in Pharo
Technical documentation support in PharoTechnical documentation support in Pharo
Technical documentation support in Pharo
 
The Pharo Debugger and Debugging tools: Advances and Roadmap
The Pharo Debugger and Debugging tools: Advances and RoadmapThe Pharo Debugger and Debugging tools: Advances and Roadmap
The Pharo Debugger and Debugging tools: Advances and Roadmap
 
Sequence: Pipeline modelling in Pharo
Sequence: Pipeline modelling in PharoSequence: Pipeline modelling in Pharo
Sequence: Pipeline modelling in Pharo
 
Migration process from monolithic to micro frontend architecture in mobile ap...
Migration process from monolithic to micro frontend architecture in mobile ap...Migration process from monolithic to micro frontend architecture in mobile ap...
Migration process from monolithic to micro frontend architecture in mobile ap...
 
Analyzing Dart Language with Pharo: Report and early results
Analyzing Dart Language with Pharo: Report and early resultsAnalyzing Dart Language with Pharo: Report and early results
Analyzing Dart Language with Pharo: Report and early results
 
Transpiling Pharo Classes to JS ECMAScript 5 versus ECMAScript 6
Transpiling Pharo Classes to JS ECMAScript 5 versus ECMAScript 6Transpiling Pharo Classes to JS ECMAScript 5 versus ECMAScript 6
Transpiling Pharo Classes to JS ECMAScript 5 versus ECMAScript 6
 
A Unit Test Metamodel for Test Generation
A Unit Test Metamodel for Test GenerationA Unit Test Metamodel for Test Generation
A Unit Test Metamodel for Test Generation
 
Creating Unit Tests Using Genetic Programming
Creating Unit Tests Using Genetic ProgrammingCreating Unit Tests Using Genetic Programming
Creating Unit Tests Using Genetic Programming
 
Threaded-Execution and CPS Provide Smooth Switching Between Execution Modes
Threaded-Execution and CPS Provide Smooth Switching Between Execution ModesThreaded-Execution and CPS Provide Smooth Switching Between Execution Modes
Threaded-Execution and CPS Provide Smooth Switching Between Execution Modes
 
Exploring GitHub Actions through EGAD: An Experience Report
Exploring GitHub Actions through EGAD: An Experience ReportExploring GitHub Actions through EGAD: An Experience Report
Exploring GitHub Actions through EGAD: An Experience Report
 
Pharo: a reflective language A first systematic analysis of reflective APIs
Pharo: a reflective language A first systematic analysis of reflective APIsPharo: a reflective language A first systematic analysis of reflective APIs
Pharo: a reflective language A first systematic analysis of reflective APIs
 
Garbage Collector Tuning
Garbage Collector TuningGarbage Collector Tuning
Garbage Collector Tuning
 
Improving Performance Through Object Lifetime Profiling: the DataFrame Case
Improving Performance Through Object Lifetime Profiling: the DataFrame CaseImproving Performance Through Object Lifetime Profiling: the DataFrame Case
Improving Performance Through Object Lifetime Profiling: the DataFrame Case
 
Pharo DataFrame: Past, Present, and Future
Pharo DataFrame: Past, Present, and FuturePharo DataFrame: Past, Present, and Future
Pharo DataFrame: Past, Present, and Future
 
thisContext in the Debugger
thisContext in the DebuggerthisContext in the Debugger
thisContext in the Debugger
 
Websockets for Fencing Score
Websockets for Fencing ScoreWebsockets for Fencing Score
Websockets for Fencing Score
 
ShowUs: PharoJS.org Develop in Pharo, Run on JavaScript
ShowUs: PharoJS.org Develop in Pharo, Run on JavaScriptShowUs: PharoJS.org Develop in Pharo, Run on JavaScript
ShowUs: PharoJS.org Develop in Pharo, Run on JavaScript
 
Advanced Object- Oriented Design Mooc
Advanced Object- Oriented Design MoocAdvanced Object- Oriented Design Mooc
Advanced Object- Oriented Design Mooc
 
A New Architecture Reconciling Refactorings and Transformations
A New Architecture Reconciling Refactorings and TransformationsA New Architecture Reconciling Refactorings and Transformations
A New Architecture Reconciling Refactorings and Transformations
 

Recently uploaded

Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/MLBig Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/MLAlluxio, Inc.
 
Cybersecurity Challenges with Generative AI - for Good and Bad
Cybersecurity Challenges with Generative AI - for Good and BadCybersecurity Challenges with Generative AI - for Good and Bad
Cybersecurity Challenges with Generative AI - for Good and BadIvo Andreev
 
ERP For Electrical and Electronics manufecturing.pptx
ERP For Electrical and Electronics manufecturing.pptxERP For Electrical and Electronics manufecturing.pptx
ERP For Electrical and Electronics manufecturing.pptxAutus Cyber Tech
 
Streamlining Your Application Builds with Cloud Native Buildpacks
Streamlining Your Application Builds  with Cloud Native BuildpacksStreamlining Your Application Builds  with Cloud Native Buildpacks
Streamlining Your Application Builds with Cloud Native BuildpacksVish Abrams
 
Deep Learning for Images with PyTorch - Datacamp
Deep Learning for Images with PyTorch - DatacampDeep Learning for Images with PyTorch - Datacamp
Deep Learning for Images with PyTorch - DatacampVICTOR MAESTRE RAMIREZ
 
eAuditor Audits & Inspections - conduct field inspections
eAuditor Audits & Inspections - conduct field inspectionseAuditor Audits & Inspections - conduct field inspections
eAuditor Audits & Inspections - conduct field inspectionsNirav Modi
 
Your Vision, Our Expertise: TECUNIQUE's Tailored Software Teams
Your Vision, Our Expertise: TECUNIQUE's Tailored Software TeamsYour Vision, Our Expertise: TECUNIQUE's Tailored Software Teams
Your Vision, Our Expertise: TECUNIQUE's Tailored Software TeamsJaydeep Chhasatia
 
Leveraging DxSherpa's Generative AI Services to Unlock Human-Machine Harmony
Leveraging DxSherpa's Generative AI Services to Unlock Human-Machine HarmonyLeveraging DxSherpa's Generative AI Services to Unlock Human-Machine Harmony
Leveraging DxSherpa's Generative AI Services to Unlock Human-Machine Harmonyelliciumsolutionspun
 
OpenChain Webinar: Universal CVSS Calculator
OpenChain Webinar: Universal CVSS CalculatorOpenChain Webinar: Universal CVSS Calculator
OpenChain Webinar: Universal CVSS CalculatorShane Coughlan
 
Fields in Java and Kotlin and what to expect.pptx
Fields in Java and Kotlin and what to expect.pptxFields in Java and Kotlin and what to expect.pptx
Fields in Java and Kotlin and what to expect.pptxJoão Esperancinha
 
Kawika Technologies pvt ltd Software Development Company in Trivandrum
Kawika Technologies pvt ltd Software Development Company in TrivandrumKawika Technologies pvt ltd Software Development Company in Trivandrum
Kawika Technologies pvt ltd Software Development Company in TrivandrumKawika Technologies
 
Generative AI for Cybersecurity - EC-Council
Generative AI for Cybersecurity - EC-CouncilGenerative AI for Cybersecurity - EC-Council
Generative AI for Cybersecurity - EC-CouncilVICTOR MAESTRE RAMIREZ
 
Watermarking in Source Code: Applications and Security Challenges
Watermarking in Source Code: Applications and Security ChallengesWatermarking in Source Code: Applications and Security Challenges
Watermarking in Source Code: Applications and Security ChallengesShyamsundar Das
 
online pdf editor software solutions.pdf
online pdf editor software solutions.pdfonline pdf editor software solutions.pdf
online pdf editor software solutions.pdfMeon Technology
 
How Does the Epitome of Spyware Differ from Other Malicious Software?
How Does the Epitome of Spyware Differ from Other Malicious Software?How Does the Epitome of Spyware Differ from Other Malicious Software?
How Does the Epitome of Spyware Differ from Other Malicious Software?AmeliaSmith90
 
ARM Talk @ Rejekts - Will ARM be the new Mainstream in our Data Centers_.pdf
ARM Talk @ Rejekts - Will ARM be the new Mainstream in our Data Centers_.pdfARM Talk @ Rejekts - Will ARM be the new Mainstream in our Data Centers_.pdf
ARM Talk @ Rejekts - Will ARM be the new Mainstream in our Data Centers_.pdfTobias Schneck
 
20240319 Car Simulator Plan.pptx . Plan for a JavaScript Car Driving Simulator.
20240319 Car Simulator Plan.pptx . Plan for a JavaScript Car Driving Simulator.20240319 Car Simulator Plan.pptx . Plan for a JavaScript Car Driving Simulator.
20240319 Car Simulator Plan.pptx . Plan for a JavaScript Car Driving Simulator.Sharon Liu
 
Transforming PMO Success with AI - Discover OnePlan Strategic Portfolio Work ...
Transforming PMO Success with AI - Discover OnePlan Strategic Portfolio Work ...Transforming PMO Success with AI - Discover OnePlan Strategic Portfolio Work ...
Transforming PMO Success with AI - Discover OnePlan Strategic Portfolio Work ...OnePlan Solutions
 
IA Generativa y Grafos de Neo4j: RAG time
IA Generativa y Grafos de Neo4j: RAG timeIA Generativa y Grafos de Neo4j: RAG time
IA Generativa y Grafos de Neo4j: RAG timeNeo4j
 

Recently uploaded (20)

Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/MLBig Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
 
Cybersecurity Challenges with Generative AI - for Good and Bad
Cybersecurity Challenges with Generative AI - for Good and BadCybersecurity Challenges with Generative AI - for Good and Bad
Cybersecurity Challenges with Generative AI - for Good and Bad
 
ERP For Electrical and Electronics manufecturing.pptx
ERP For Electrical and Electronics manufecturing.pptxERP For Electrical and Electronics manufecturing.pptx
ERP For Electrical and Electronics manufecturing.pptx
 
Streamlining Your Application Builds with Cloud Native Buildpacks
Streamlining Your Application Builds  with Cloud Native BuildpacksStreamlining Your Application Builds  with Cloud Native Buildpacks
Streamlining Your Application Builds with Cloud Native Buildpacks
 
Deep Learning for Images with PyTorch - Datacamp
Deep Learning for Images with PyTorch - DatacampDeep Learning for Images with PyTorch - Datacamp
Deep Learning for Images with PyTorch - Datacamp
 
eAuditor Audits & Inspections - conduct field inspections
eAuditor Audits & Inspections - conduct field inspectionseAuditor Audits & Inspections - conduct field inspections
eAuditor Audits & Inspections - conduct field inspections
 
Your Vision, Our Expertise: TECUNIQUE's Tailored Software Teams
Your Vision, Our Expertise: TECUNIQUE's Tailored Software TeamsYour Vision, Our Expertise: TECUNIQUE's Tailored Software Teams
Your Vision, Our Expertise: TECUNIQUE's Tailored Software Teams
 
Leveraging DxSherpa's Generative AI Services to Unlock Human-Machine Harmony
Leveraging DxSherpa's Generative AI Services to Unlock Human-Machine HarmonyLeveraging DxSherpa's Generative AI Services to Unlock Human-Machine Harmony
Leveraging DxSherpa's Generative AI Services to Unlock Human-Machine Harmony
 
OpenChain Webinar: Universal CVSS Calculator
OpenChain Webinar: Universal CVSS CalculatorOpenChain Webinar: Universal CVSS Calculator
OpenChain Webinar: Universal CVSS Calculator
 
Fields in Java and Kotlin and what to expect.pptx
Fields in Java and Kotlin and what to expect.pptxFields in Java and Kotlin and what to expect.pptx
Fields in Java and Kotlin and what to expect.pptx
 
Salesforce AI Associate Certification.pptx
Salesforce AI Associate Certification.pptxSalesforce AI Associate Certification.pptx
Salesforce AI Associate Certification.pptx
 
Kawika Technologies pvt ltd Software Development Company in Trivandrum
Kawika Technologies pvt ltd Software Development Company in TrivandrumKawika Technologies pvt ltd Software Development Company in Trivandrum
Kawika Technologies pvt ltd Software Development Company in Trivandrum
 
Generative AI for Cybersecurity - EC-Council
Generative AI for Cybersecurity - EC-CouncilGenerative AI for Cybersecurity - EC-Council
Generative AI for Cybersecurity - EC-Council
 
Watermarking in Source Code: Applications and Security Challenges
Watermarking in Source Code: Applications and Security ChallengesWatermarking in Source Code: Applications and Security Challenges
Watermarking in Source Code: Applications and Security Challenges
 
online pdf editor software solutions.pdf
online pdf editor software solutions.pdfonline pdf editor software solutions.pdf
online pdf editor software solutions.pdf
 
How Does the Epitome of Spyware Differ from Other Malicious Software?
How Does the Epitome of Spyware Differ from Other Malicious Software?How Does the Epitome of Spyware Differ from Other Malicious Software?
How Does the Epitome of Spyware Differ from Other Malicious Software?
 
ARM Talk @ Rejekts - Will ARM be the new Mainstream in our Data Centers_.pdf
ARM Talk @ Rejekts - Will ARM be the new Mainstream in our Data Centers_.pdfARM Talk @ Rejekts - Will ARM be the new Mainstream in our Data Centers_.pdf
ARM Talk @ Rejekts - Will ARM be the new Mainstream in our Data Centers_.pdf
 
20240319 Car Simulator Plan.pptx . Plan for a JavaScript Car Driving Simulator.
20240319 Car Simulator Plan.pptx . Plan for a JavaScript Car Driving Simulator.20240319 Car Simulator Plan.pptx . Plan for a JavaScript Car Driving Simulator.
20240319 Car Simulator Plan.pptx . Plan for a JavaScript Car Driving Simulator.
 
Transforming PMO Success with AI - Discover OnePlan Strategic Portfolio Work ...
Transforming PMO Success with AI - Discover OnePlan Strategic Portfolio Work ...Transforming PMO Success with AI - Discover OnePlan Strategic Portfolio Work ...
Transforming PMO Success with AI - Discover OnePlan Strategic Portfolio Work ...
 
IA Generativa y Grafos de Neo4j: RAG time
IA Generativa y Grafos de Neo4j: RAG timeIA Generativa y Grafos de Neo4j: RAG time
IA Generativa y Grafos de Neo4j: RAG time
 

Big Data Processing in Pharo

  • 1. Big Data Processing in Pharo Matteo Marra
 ESUG - August 2019 - Köln
  • 2. Matteo Marra - Big Data Processing in Pharo Big Data Frameworks 2 Programming Model To express large data processing jobs in terms of simple functions (e.g., Map/Reduce) Parallel Execution Model To execute in parallel computation on large amount of data
  • 3. Matteo Marra - Big Data Processing in Pharo Why Big Data?
 Google MapReduce 3 Index the WWW More than 20 billion web pages (> 400 TB) Do it fast With 1000 machines: < 3 hours, with one machine, 4 months. Keep it cheap Use a lot of recycled hardware, but with robust error recovery MapReduce: Simplified Data Processing on Large Clusters Dean J. and Ghemawat S. OSDI ‘04
  • 4. Matteo Marra - Big Data Processing in Pharo Rossi Abruzzo 1522706400 Verdi Toscana 1527976800 Bianchi Toscana 1525298400 Verdi Lombardia 1527976800 Gialli Toscana 1525298400 Rossi Abruzzo 1525298400 Gialli Veneto 1527976800 Bianchi Sardegna 1522706400 Gialli Toscana 1527976800 Bianchi Lombardia 1527976800 Bianchi Toscana 1522706400 Gialli Sicilia 1522706400 Gialli Sicilia 1527976800 Bianchi Sardegna 1527976800 Dataset KeyBy (Rossi, Abruzzo …) (Verdi, Toscana …) (Bianchi, Toscana …) (Verdi, Lombardia …) (Gialli, Toscana …) (Rossi, Abruzzo …) (Gialli, Veneto …) (Bianchi, Sardegna …) (Gialli, Toscana …) (Bianchi, Lombardia …) (Bianchi, Toscana …) (Gialli, Sicilia …) (Gialli, Sicilia …) (Bianchi, Sardegna …) Map NIL (Verdi, Toscana …) (Bianchi, Toscana …) NIL (Gialli, Toscana …) NIL NIL NIL (Gialli, Toscana …) NIL (Bianchi, Toscana …) NIL NIL NIL GroupBy (Verdi, Toscana) (Bianchi, Toscana …) (Bianchi, Toscana …) (Gialli, Toscana …) (Gialli, Toscana …) Reduce (Verdi, 1) (Bianchi, 2) (Gialli, 2) Map/Reduce by Example 4
  • 5. Matteo Marra - Big Data Processing in Pharo A Map/Reduce application in Pharo 5 VoteCountingApp>>map: aLine | splitted| (line includesSubstring: ‘Toscana') ifTrue: [ splitted := line substrings: ' '. (DateAndTime fromUnixTime: splitted at: 3) > DateAndTime yesterday ifTrue: [ ^ (splitted at: 2) -> 1 ] ]. ^ nil -> nil VoteCountingApp>>reduceByKey: values ^ values inject: 0 into: [:sum :current | sum + current] 1 2 3 4 5 6 7 8 9 10 11
  • 6. Matteo Marra - Big Data Processing in Pharo Port: a Map/Reduce Framework for Pharo 6 Apply Map and ReduceCoordinate
  • 7. Matteo Marra - Big Data Processing in Pharo Deploying Port 7 Local Deploy on one machine using multiple cores. Standalone Deploy on multiple machines in the same network. Yarn Mode Deploy on a cluster using Hadoop YARN.
  • 8. Matteo Marra - Big Data Processing in Pharo Using Port 8 app := VoteCountingApp new. port startMapReduceApplication: app on: data.
  • 9. Matteo Marra - Big Data Processing in Pharo Handling Results 9 VoteCountingApp>>handleResult: resultPair self storeToHDFS: resultPair Asynchronous Execution The application is executed asynchronously. Asynchronous Result Handling The result is handled asynchronously by the Map/Reduce Master
  • 10. Matteo Marra - Big Data Processing in Pharo From Map/Reduce to Spark 10 Distributed Collection Instead of executing Map/Reduce on a Dataset, you load the dataset in memory Broader API Not only execute map or reduce, but a broad set of methods applicable to the distributed collection In-Memory operations Avoid heavy storing of intermediate results by explicitly keep them in memory map: reduce: filter: aggregate: groupBy: map: reduce:
  • 11. Matteo Marra - Big Data Processing in Pharo The Spark-like API in Port 11 port startMapReduceApplication: app on: dataSet . data := port distribute: dataSet. VoteCountingApp>>map: aLine … result := data map: […]. result storeToHDFSPath: ‘…’ . VoteCountingApp>>handleResult: res self storeToHDFS: res VoteCountingApp>>run
  • 12. Matteo Marra - Big Data Processing in Pharo The Polls Analyser in Port (V2) 12 port := Port withServerURL: ‘http://myServerURL’. data := port distribute: dataSet. data := data filter: [:line | line includesSubstring: ‘Toscana’]. data map: [:line | splitted := line substrings: ' ' . (DateAndTime fromUnixTime: splitted at: 3) > DateAndTime yesterday ifTrue: [ (splitted at: 2) -> 1 ]] result := data reduceByKey: [:value :sum| sum + value] 1 2 3 4 5 6 7 8 result getCollection. result storeToHDFSPath: ‘…’
  • 13. Matteo Marra - Big Data Processing in Pharo DEMO 13
  • 14. Matteo Marra - Big Data Processing in Pharo Uses of Port 14 Blockchain Analysis In collaboration with Santiago at RMoD Lille. Full blockchain in 6 hours vs 2 days! Genetic algorithms In collaboration with Alexandre Bergel at Universidad de Chile
  • 15. Matteo Marra - Big Data Processing in Pharo Towards Live Big Data Development Tools 15 Online Debugging Debug the system as the bug is happening Global View Centralised debugging of the distributed system Update of the Running System Deploy code-fixes without restarting the whole system Isolation Debug the system without interfering with its execution
  • 16. Conclusion Matteo Marra - Vrije Universiteit Brussel mmarra@vub.be gitlab.soft.vub.ac.be/Marra github.com/Marmat21 Want to try it? CONTACT ME! mmarra@vub.be