Science and data manipulation in Pharo

Ferlicot-Delbecque Cyril | ESUG 2023
cyril@ferlicot.fr
Polymath, Pharo-AI and DataFrame
Summary
History
What is new?
Toward a new stage
History
Polymath
DataFrame
Pharo-AI
PolyMath
● Computation library for Pharo
● Similar to NumPy and SciPy in Python or SciRuby in Ruby
● Originally SciSmalltalk in Squeak
● Available in Pharo for a long time
DataFrame
[Diagram: a table with its columns, rows, and cells labeled]
DataFrame
● Table data structure
● Similar to DataFrame in Pandas, Julia, …
● Heavily used in data science
● Created in 2017 during GSoC
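
To make the "table data structure" idea concrete, here is a minimal sketch of a table addressed by columns, rows, and cells. It is written in Python because the slides show no code, and it is not the Pharo DataFrame API: the class `TinyDataFrame`, its method names, and the toy values are all hypothetical.

```python
class TinyDataFrame:
    """Hypothetical minimal table: named columns, ordered rows, addressable cells."""

    def __init__(self, column_names, rows):
        self.column_names = list(column_names)
        self.rows = [list(row) for row in rows]
        for row in self.rows:
            if len(row) != len(self.column_names):
                raise ValueError("every row needs exactly one cell per column")

    def column(self, name):
        """Return all cells of the named column, top to bottom."""
        index = self.column_names.index(name)
        return [row[index] for row in self.rows]

    def row(self, position):
        """Return the row at a 0-based position as a column-name -> cell mapping."""
        return dict(zip(self.column_names, self.rows[position]))

    def cell(self, position, name):
        """Return the single cell at the given row position and column name."""
        return self.rows[position][self.column_names.index(name)]


# Toy values, for illustration only.
table = TinyDataFrame(["name", "score"], [["ada", 3], ["bob", 5]])
scores = table.column("score")
first_row = table.row(0)
one_cell = table.cell(1, "name")
```

A real DataFrame adds typed columns, row names, and bulk operations on top of this; the sketch only shows the three ways of addressing the data.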
Pharo-AI
● Created in 2020
● Implements classical machine learning algorithms (not deep learning)
○ K-Means, Linear Regression, N-Gram Model, …
Harmony of communities
Polymath, DataFrame, Pharo-AI
What is new?
Polymath: Modularization
● Rearchitecture
○ Extraction of data structures and random generators
○ Extraction of distributions in progress
● Cleaning of internal dependencies
Polymath
● Improved CI robustness
● Aligned some conventions with Pharo-AI
● Various cleanups and bug fixes
● Pharo 11 compatibility
Pharo-AI: Data manipulation
● Data partitioners: create test sets
● Imputers: fill missing values
● Encoders: standardize your data
● Normalizers: use common scales in your project
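
The four preprocessing steps listed above can be sketched in a few lines each. This is a plain Python illustration of the general techniques, not the Pharo-AI API: the function names, the seeded shuffle, the mean-fill strategy, the ordinal encoding, and the min-max scaling are all assumptions chosen for the example.

```python
import random


def partition(rows, test_ratio, seed=42):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    test_size = round(len(shuffled) * test_ratio)
    return shuffled[test_size:], shuffled[:test_size]


def impute_mean(values):
    """Replace None with the mean of the known values."""
    known = [v for v in values if v is not None]
    if not known:
        raise ValueError("cannot impute a column with no known values")
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]


def encode_ordinal(labels):
    """Map each distinct label to an integer, in order of first appearance."""
    codes = {}
    for label in labels:
        codes.setdefault(label, len(codes))
    return [codes[label] for label in labels]


def normalize_min_max(values):
    """Rescale values linearly into the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


# Toy inputs, for illustration only.
train, test = partition(range(10), test_ratio=0.2)
filled = impute_mean([1.0, None, 3.0])
encoded = encode_ordinal(["red", "green", "red"])
scaled = normalize_min_max([10, 20, 30])
```

Each step has alternatives (median imputation, one-hot encoding, z-score scaling); the sketch picks the simplest variant of each.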
Pharo-AI
● Harmonization across projects
● Documentation
● Graph algorithm updates
● Various speed-ups
● Cleanups and bug fixes in the algorithms
DataFrame
● Speed-ups
● Better integration with other collections
● New visualizations based on DataFrames
● Integration with Pharo-AI data preprocessing
DataFrame: GSoC 2023
● GSoC project by Joshua Jose Dias Barreto
● Implementation of missing features
○ Better sorting
○ Data manipulation
○ Missing values management
○ …
DataFrame: GSoC 2023
Further improvements of the DataFrame inspector
Toward a new stage
A push from the students
● DataFrame was started as a GSoC
● The first AI algorithms were student projects
=> Data science interests more and more people
We are answering this call
● Engineers are now pushing those projects
● Projects are maintained
● Speed is becoming acceptable
Are you using scientific computing or data science?
Is the speed enough for you?
Are you encountering any problems?
Are you missing features for your projects?
Let us know ;)