- 1. pharo-ai. Sebastian JORDAN MONTANO (1), sebastian.jordan@inria.fr; Oleksandr ZAITSEV (2,1), oleksandr.zaitsev@arolla.fr. (1) Inria, Univ. Lille, CNRS, Centrale Lille, UMR 9189 - CRIStAL; (2) Arolla, Paris. August 2022.
- 2. We introduce pharo-ai v0.8, a modular library for shallow machine learning in Pharo: github.com/pharo-ai
- 3. Artificial Intelligence, Machine Learning, Deep Learning
  - Artificial Intelligence: techniques which enable machines to mimic human behaviour (A*, knowledge based, …)
  - Machine Learning: subset of AI techniques which use statistical methods to enable machines to improve with experience (linear regression, k-means, naive bayes, …)
  - Deep Learning: subset of ML which uses deep neural networks (convolutional networks, GAN, Seq2seq, …)
- 4. Machine Learning: Supervised and Unsupervised
  - Supervised: labeled data; first learn from examples, then apply to new data (classification, regression, …)
  - Unsupervised: no labeled data; extract patterns from the data (clustering, anomaly detection, …)
- 5. Why do we need an ML library in Pharo?
  - We would like to contribute to the work that is currently being developed by different groups (Univ. Chile, Object Profile-Chile, PolyMathOrg, Semantics-Bolivia, CIRAD-France).
  - We want to provide tools for people in the Pharo community who are interested in doing ML and AI.
- 6. How do we position ourselves (Python / R / Pharo)
  - Data Analysis & Manipulation: pandas / data.frame, dplyr / DataFrame
  - Algebra & Statistics: numpy, scipy / MASS, SparseM / PolyMath
  - Shallow Learning: scikit-learn / caret, ml3 / pharo-ai
  - Deep Learning: TensorFlow, Keras / TensorFlow, Keras / TensorFlow, Keras
  - Visualisation: matplotlib / ggplot / Roassal
- 7. Algorithms and libraries available in the pharo-ai ecosystem
- 8. Legend: ready to be used / work in progress. Items: DataFrame, Datasets, Normalisation, Random Data Partitioner, Data Generator, Data manipulation, Gaussian Mixture Models, K-Nearest Neighbours, Hierarchical Clustering, K-Means Clustering, Linear Regression, Logistic Regression, Support Vector Machines, Regression, Principal Component Analysis, Metrics, PolyMath, Fast Linear Algebra, Scientific Computation, Edit Distances
- 9. Legend: ready to be used / work in progress. Items: Random Forest, Decision Trees, Naive Bayes Classifier, Classification, OpenAI, OpenAI Binding, Graph Algorithms, A-Priori, Data Mining, Natural Language Processing, Stop Words, N-Gram Language Model, Spelling Correction, TF-IDF
- 10. Legend: ready to be used / work in progress. Overview of the whole ecosystem, combining the items of slides 8 and 9.
- 11. Inspector Extensions for DataFrame
- 12. Inspector Extensions for DataFrame
- 13. Other Machine Learning Projects
- 14. pharo-ai wiki
- 15. Linear Regression benchmarks
- 16. Dataset size
  - Small: 200,000 rows, 20 columns, 82 MB
  - Medium: 1,000,000 rows, 20 columns, 411 MB
  - Large: 5,000,000 rows, 20 columns, 2.06 GB
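As a rough sanity check on the dataset sizes above, the following sketch computes the number of values in each dataset and the average storage per value. It assumes that "Mb" and "Gb" on the slide mean decimal megabytes and gigabytes, and that "2,06" uses a decimal comma; the slides do not state the file format, and the names `DATASETS` and `bytes_per_value` are made up for this illustration.

```python
# Dataset figures taken from the slide: (rows, columns, size in bytes).
# Assumption: Mb/Gb are decimal megabytes/gigabytes (10**6 and 10**9 bytes).
DATASETS = {
    "Small": (200_000, 20, 82 * 10**6),
    "Medium": (1_000_000, 20, 411 * 10**6),
    "Large": (5_000_000, 20, 2_060 * 10**6),
}


def bytes_per_value(rows, columns, size_bytes):
    """Average number of bytes used to store one cell of the dataset."""
    return size_bytes / (rows * columns)


for name, (rows, columns, size_bytes) in DATASETS.items():
    average = bytes_per_value(rows, columns, size_bytes)
    print(f"{name}: {rows * columns:,} values, {average:.1f} bytes per value")
```

Under these assumptions the three datasets all come out at roughly 20 to 21 bytes per value, so the reported sizes scale linearly with the number of rows.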
- 17. Pharo & LAPACK vs scikit-learn. Bar chart of time in seconds (less is better), axis from 0 to 16, for the Small, Medium, and Large datasets. Annotations: 8x faster, 5x faster, 8x faster.
- 18. Pure Pharo vs pure Python. Bar chart of time in seconds (less is better), axis from 0 to 40,000, for the Small and Medium datasets. Annotations: 5x faster, 15x faster.
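To give an idea of what a "pure" implementation (no native linear algebra library) of linear regression can look like, here is a minimal sketch in plain Python using batch gradient descent. It is not the code used in the benchmarks: the slides do not show either implementation, and the function names, learning rate, number of epochs, and synthetic data are all assumptions chosen for illustration.

```python
import random


def fit_linear_regression(X, y, learning_rate=0.3, epochs=3000):
    """Fit y ~ X.w + b by batch gradient descent on the mean squared error.

    The constant factor 2 of the MSE gradient is absorbed in the learning rate.
    """
    n, d = len(X), len(X[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, target in zip(X, y):
            # Prediction error for one example.
            error = sum(w * x for w, x in zip(weights, row)) + bias - target
            for j in range(d):
                grad_w[j] += error * row[j]
            grad_b += error
        # Step against the averaged gradient.
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


def predict(X, weights, bias):
    """Apply the fitted linear model to each row of X."""
    return [sum(w * x for w, x in zip(weights, row)) + bias for row in X]


# Synthetic, noise-free data generated from y = 3*x1 - 2*x2 + 1 (hypothetical example).
rng = random.Random(42)
X_train = [[rng.random(), rng.random()] for _ in range(50)]
y_train = [3 * x1 - 2 * x2 + 1 for x1, x2 in X_train]

weights, bias = fit_linear_regression(X_train, y_train)
print([round(w, 3) for w in weights], round(bias, 3))
```

Every arithmetic operation here runs in the interpreter, one example and one feature at a time, which is the kind of workload where delegating to an optimised library such as LAPACK pays off.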
- 19. Pure Pharo vs Pharo & LAPACK. Time in seconds (less is better).
  - Small: Pharo 19 min 40 s, Pharo & LAPACK 0.6 s (1820 times faster)
  - Medium: Pharo 1 h 44 min, Pharo & LAPACK 2.9 s (2103 times faster)
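The speedup factors on this slide can be cross-checked against the reported timings. The sketch below converts the pure Pharo durations to seconds and derives the Pharo & LAPACK time implied by each factor. It assumes the figures pair up in the order they appear (Small: 19min40sec, 0.6sec, 1820 times; Medium: 1h44min, 2.9sec, 2103 times); the helper names are made up for this illustration.

```python
def to_seconds(hours=0, minutes=0, seconds=0.0):
    """Convert a duration given as hours, minutes, seconds into seconds."""
    return hours * 3600 + minutes * 60 + seconds


# Figures from the slide, paired by dataset (the pairing is an assumption).
RUNS = {
    "Small": {
        "pure_pharo": to_seconds(minutes=19, seconds=40),
        "lapack_shown": 0.6,
        "speedup": 1820,
    },
    "Medium": {
        "pure_pharo": to_seconds(hours=1, minutes=44),
        "lapack_shown": 2.9,
        "speedup": 2103,
    },
}


def implied_lapack_time(pure_seconds, speedup):
    """Time the faster run must have taken for the reported speedup to hold."""
    return pure_seconds / speedup


for name, run in RUNS.items():
    implied = implied_lapack_time(run["pure_pharo"], run["speedup"])
    print(
        f"{name}: pure Pharo {run['pure_pharo']} s, "
        f"implied LAPACK time {implied:.2f} s (slide shows {run['lapack_shown']} s)"
    )
```

The implied times land close to the displayed 0.6 s and 2.9 s, which suggests the factors were computed from unrounded measurements rather than from the rounded values printed on the slide.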
- 20. Visit us! Play, use, and contribute. Start here with the pharo-ai wiki: https://github.com/pharo-ai/wiki