The year 2018 marked the consolidation of Julia (https://julialang.org/), a programming language designed for scientific computing, with the release of its first stable version (1.0) in August 2018. Among its main features, expressiveness and high execution speed are the most prominent: the performance of Julia code is comparable to that of
statically compiled languages, yet Julia provides a pleasant interactive shell and fully supports Jupyter; moreover, it can transparently call external code written in C, Fortran, and even Python and R, without the need for wrappers. The usage of Julia in the astronomical community is growing, and a GitHub organization named JuliaAstro coordinates the development of astronomy packages. In this ADASS talk we present the features and shortcomings of this language, and discuss its application in astronomy and astrophysics.
Tools for artificial intelligence: EXP3, Zermelo algorithm, Alpha-Beta, and s... - Olivier Teytaud
Here are a few suggestions on how to improve the Zermelo algorithm when it is too slow:
1. Add a depth limit. Stop recursion when a maximum search depth is reached. Return a heuristic evaluation instead of continuing search.
2. Use alpha-beta pruning. Track the best value found (alpha) and prune branches that cannot improve on it.
3. Iterative deepening. Run successive searches with increasing depth limits to get progressively better approximations.
4. Move ordering. Evaluate better moves earlier in the search tree. This prunes bad moves earlier.
5. Transposition tables. Store previously computed move evaluations to avoid re-expanding the same position.
6. Parallelize the search. Split the exploration of independent subtrees across multiple workers and merge their results.
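Suggestions 1 and 2 can be sketched together in a few lines. Below is a minimal depth-limited alpha-beta search over a toy game tree encoded as nested lists (integers are leaf evaluations); the `heuristic` function is a hypothetical stand-in for a real position evaluator:

```python
def alphabeta(node, depth, alpha, beta, maximizing):
    """Depth-limited minimax with alpha-beta pruning.

    Internal nodes are lists of children; leaves are integer evaluations.
    """
    if not isinstance(node, list):
        return node                       # exact leaf value
    if depth == 0:
        return heuristic(node)            # depth limit hit: heuristic fallback
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, depth - 1, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:             # beta cut-off: opponent avoids this branch
                break
        return value
    else:
        value = float("inf")
        for child in node:
            value = min(value, alphabeta(child, depth - 1, alpha, beta, True))
            beta = min(beta, value)
            if alpha >= beta:             # alpha cut-off
                break
        return value

def heuristic(node):
    """Hypothetical stand-in evaluation: average of the leaves below a node."""
    leaves, stack = [], [node]
    while stack:
        n = stack.pop()
        if isinstance(n, list):
            stack.extend(n)
        else:
            leaves.append(n)
    return sum(leaves) / len(leaves)

tree = [[3, 5], [2, [9, 1]]]
best = alphabeta(tree, depth=4, alpha=float("-inf"), beta=float("inf"), maximizing=True)
```

With a large enough depth the search is exact (here `best` is 3); shrinking `depth` trades accuracy for speed by substituting the heuristic at the frontier.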
The document discusses tools for artificial intelligence and their applications. It describes Olivier Teytaud's work at Tao, a research group in Paris focused on reservoir computing, optimal decision making under uncertainty, optimization, and machine learning. It then provides examples of applications for these tools in electricity generation, Urban Rivals, Pokémon, Minesweeper, and solving unsolved situations in the game of Go. Olivier suggests that breakthroughs in games can help open doors to applying these algorithms to more important real-world problems by building trust in the approaches.
Building a cutting-edge data processing environment on a budget - Gael Varoquaux
As a penniless academic I wanted to do "big data" for science. Open source, Python, and simple patterns were the way forward. Staying on top of today's growing datasets is an arms race. Data analytics machinery (clusters, NoSQL, visualization, Hadoop, machine learning, ...) can spread a team's resources thin. Focusing on simple patterns, lightweight technologies, and a good understanding of the applications gets us most of the way for a fraction of the cost.
I will present a personal perspective on ten years of scientific data processing with Python. What are the emerging patterns in data processing? How can modern data-mining ideas be used without a big engineering team? What constraints and design trade-offs govern software projects like scikit-learn, Mayavi, or joblib? How can we make the most out of distributed hardware with simple framework-less code?
This document summarizes a presentation on data science and big data for actuaries given by Arthur Charpentier. It discusses the history of data collection and analysis. It provides an overview of big data, including definitions of volume, variety and velocity. It also covers topics like unsupervised learning techniques including principal component analysis and cluster analysis. Computational issues for large-scale data analysis using techniques like parallelization are also summarized.
Dictionary Learning for Massive Matrix Factorization - Arthur Mensch
This document proposes a method for scaling up dictionary learning for massive matrix factorization. It presents an online algorithm that can handle large datasets in both dimensions (many samples and many features) by introducing subsampling. The key steps are:
1) Computing codes on random subsets of samples instead of full samples to reduce complexity from O(p) to O(s) where s is the subsample size.
2) Partially updating the surrogate functions used for dictionary updates instead of full updates to also achieve O(s) complexity.
3) Performing cautious dictionary updates, leaving values unchanged for unseen features, so that the minimization step also runs in O(s) time.
Validation on fMRI and collaborative filtering datasets shows the method reaches the accuracy of existing solvers at a fraction of the computational cost.
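As a rough illustration of the core cost argument (work scales with the observed entries, not with the full matrix), here is a toy SGD matrix factorization that touches only known ratings. This is a simplified sketch, not the paper's algorithm, and all names and parameters are hypothetical:

```python
import random

def factorize(ratings, n_users, n_items, k=2, lr=0.05, reg=0.02, epochs=500, seed=0):
    """SGD matrix factorization over observed entries only.

    Per-epoch cost scales with the number of known ratings,
    not with n_users * n_items.
    """
    rng = random.Random(seed)
    U = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    V = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        rng.shuffle(ratings)
        for u, i, r in ratings:
            pred = sum(U[u][f] * V[i][f] for f in range(k))
            err = r - pred
            for f in range(k):
                uf, vf = U[u][f], V[i][f]
                U[u][f] += lr * (err * vf - reg * uf)   # gradient step, user factors
                V[i][f] += lr * (err * uf - reg * vf)   # gradient step, item factors
    return U, V

# 3 users x 3 items with only 6 of the 9 ratings observed.
obs = [(0, 0, 5.0), (0, 1, 3.0), (0, 2, 4.0), (1, 0, 4.0), (1, 1, 2.0), (2, 0, 1.0)]
U, V = factorize(list(obs), n_users=3, n_items=3)
```

The subsampling idea in the paper goes further, drawing random subsets of features as well so that each update costs O(s) rather than O(p).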
Arno Candel: H2O, A Platform for Big Math (Hadoop Summit, June 2016) - Sri Ambati
H2O: A Platform for Big Math
From just your laptop to hundreds of nodes, H2O gives you a Single System Image: easy aggregation of all the memory and all the cores, and a simple coding style that scales wide at in-memory speeds. H2O is easily 1000x faster than disk-based clustering solutions, and often 10x faster than best-of-breed alternative in-memory solutions, and will work directly on your existing Hadoop cluster. H2O ingests a wide variety of formats, parallel and distributed across the cluster, stores the data highly compressed, and then lets you do scale-out math at memory-bandwidth speeds (on compressed data!), making terabyte-scale munging an interactive experience. This is a technical talk on the insides of H2O, specifically focusing on the Single-System-Image aspect: how we write single-threaded code and have H2O auto-parallelize and auto-scale-out to hundreds of nodes and thousands of cores.
Arno is the Chief Architect of H2O, a distributed and scalable open-source machine learning platform. He is also the main author of H2O’s Deep Learning. Before joining H2O.ai, Arno was a founding Senior MTS at Skytree where he designed and implemented high-performance machine learning algorithms. He has over a decade of experience in HPC with C++/MPI and had access to the world’s largest supercomputers as a Staff Scientist at SLAC National Accelerator Laboratory where he participated in US DOE scientific computing initiatives and collaborated with CERN on next-generation particle accelerators. Arno holds a PhD and Masters summa cum laude in Physics from ETH Zurich, Switzerland. He has authored dozens of scientific papers and is a sought-after conference speaker. Arno was named "2014 Big Data All-Star" by Fortune Magazine. Follow him on Twitter: @ArnoCandel.
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
Dictionary Learning for Massive Matrix Factorization - recsysfr
The document presents a new algorithm called Subsampled Online Dictionary Learning (SODL) for solving very large matrix factorization problems with missing values efficiently. SODL adapts an existing online dictionary learning algorithm to handle missing values by only using the known ratings for each user, allowing it to process large datasets with billions of ratings in linear time with respect to the number of known ratings. Experiments on movie rating datasets show that SODL achieves similar prediction accuracy as the fastest existing solver but with a speed up of up to 6.8 times on the largest Netflix dataset tested.
The document discusses recommender systems and sequential recommendation problems. It covers several key points:
1) Matrix factorization and collaborative filtering techniques are commonly used to build recommender systems, but have limitations like cold start problems and how to incorporate additional constraints.
2) Sequential recommendation problems can be framed as multi-armed bandit problems, where past recommendations influence future recommendations.
3) Various bandit algorithms like UCB, Thompson sampling, and LinUCB can be applied, but extending guarantees to models like matrix factorization is challenging. Offline evaluation on real-world datasets is important.
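As a concrete example of the bandit side, here is a minimal sketch of UCB1, one of the algorithms named above; the function names and the Bernoulli reward model are assumptions for illustration:

```python
import math
import random

def ucb1(pull, n_arms, horizon):
    """UCB1: play each arm once, then repeatedly pick the arm maximizing
    empirical mean + sqrt(2 ln t / n_pulls). `pull(arm)` returns a reward."""
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for arm in range(n_arms):                 # initialization: one pull per arm
        sums[arm] += pull(arm)
        counts[arm] = 1
    for t in range(n_arms, horizon):
        scores = [sums[a] / counts[a] + math.sqrt(2 * math.log(t + 1) / counts[a])
                  for a in range(n_arms)]     # optimism bonus shrinks with pulls
        arm = scores.index(max(scores))
        sums[arm] += pull(arm)
        counts[arm] += 1
    return counts

# Bernoulli arms with success probabilities 0.2, 0.5, 0.8 (arm 2 is best).
rng = random.Random(0)
means = [0.2, 0.5, 0.8]
counts = ucb1(lambda a: 1.0 if rng.random() < means[a] else 0.0, n_arms=3, horizon=2000)
```

After 2000 rounds the best arm accumulates the large majority of pulls, which is the "exploration vs. exploitation" trade-off the recommendation setting inherits.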
The document discusses machine learning techniques for clustering and segmentation. It introduces Dirichlet process mixtures and the Chinese restaurant process as nonparametric Bayesian models that allow for an infinite number of clusters. It describes how these models can be used for problems like image segmentation, object recognition, population clustering from genetic data, and evolutionary document clustering over time. Approximate inference methods like Markov chain Monte Carlo sampling are used to analyze these models.
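The Chinese restaurant process mentioned above is easy to simulate: customer t joins an occupied table k with probability proportional to its occupancy, and opens a new table with probability proportional to the concentration parameter alpha, so the number of clusters grows with the data rather than being fixed in advance. A minimal sketch (hypothetical function name):

```python
import random

def crp_partition(n_customers, alpha, seed=0):
    """Sample a random partition from the Chinese restaurant process.

    Customer t joins existing table k with probability n_k / (t + alpha)
    and opens a new table with probability alpha / (t + alpha).
    """
    rng = random.Random(seed)
    tables = []                          # tables[k] = customers seated at table k
    assignment = []                      # table index chosen by each customer
    for t in range(n_customers):
        weights = tables + [alpha]       # existing tables, then a new one
        r = rng.random() * (t + alpha)   # occupancies sum to t, so total mass is t + alpha
        acc = 0.0
        for k, w in enumerate(weights):
            acc += w
            if r < acc:
                break
        if k == len(tables):             # customer opened a new table
            tables.append(0)
        tables[k] += 1
        assignment.append(k)
    return tables, assignment

tables, assignment = crp_partition(100, alpha=1.0)
```

With alpha = 1 the expected number of tables after n customers grows only logarithmically in n, which is why these models yield a small, data-driven number of clusters.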
EuroSciPy 2019 - GANs: Theory and Applications - Emanuele Ghelfi
EuroSciPy 2019: https://pretalx.com/euroscipy-2019/talk/Q79NND/
GANs are the hottest new topic in the ML arena; however, they present a challenge for researchers and engineers alike. Their design and, most importantly, their code implementation have been causing headaches for ML practitioners, especially when moving to production.
The workshop aims at providing a complete understanding of both the theory and the practical know-how to code and deploy this family of models in production. By the end of it, the attendees should be able to apply the concepts learned to other models without any issues.
We will showcase the shiny new APIs introduced by TensorFlow 2.0 by building a GAN from scratch and "productionizing" it with the AshPy Python package, which makes it easy to design, prototype, train, and export machine learning models defined in TensorFlow 2.0.
The workshop is composed of:
- Theoretical introduction
- GANs from Scratch in TensorFlow 2.0
- High-performance input data pipeline with TensorFlow Datasets
- Introduction to the AshPy API
- Implementing, training, and visualizing DCGAN using AshPy
- Serving TF2 Models with Google Cloud Functions
The materials of the workshop will be openly provided via GitHub (https://github.com/zurutech/gans-from-theory-to-production).
An overview of TensorFlow, followed by a walkthrough of how to use this library within the H2O platform. TensorFlow is an open-source deep learning framework used by Google and DeepMind. #h2ony
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
Learning to discover monte carlo algorithm on spin ice manifoldKai-Wen Zhao
A global-update Monte Carlo sampler can be discovered naturally by a trained machine using the policy gradient method on a topologically constrained environment.
Data-driven hypothesis generation using deep neural netsBalázs Kégl
1. Machine learning in science
- induction, inference, simulation, generation
2. Stretching the scientific method
- the p-value controversy and the problem of automated hypothesis generation
3. Generative models and novelty generation
IMAGE GENERATION WITH GANS-BASED TECHNIQUES: A SURVEY - ijcsit
In recent years, frameworks that employ Generative Adversarial Networks (GANs) have achieved impressive results for various applications in many fields, especially those related to image generation, both due to their ability to create highly realistic, sharp images and to train on huge datasets. However, successfully training GANs is a notoriously difficult task when high-resolution images are required. In this article, we discuss five applicable and fascinating areas of image synthesis based on state-of-the-art GAN techniques: text-to-image synthesis, image-to-image translation, face manipulation, 3D image synthesis, and DeepMasterPrints. We provide a detailed review of current GAN-based image generation models with their advantages and disadvantages. The results of the publications in each section show that GAN-based algorithms are evolving fast, and their constant improvement, whether in the same field or in others, will solve complicated image generation tasks in the future.
Talk at ISIM 2017 in Durham, UK on applying database techniques to querying model results in the geosciences, with a broader position about the interaction between data science and simulation as modes of scientific inquiry.
Deep Water - Bringing TensorFlow, Caffe, MXNet to H2O - Sri Ambati
Arno Candel introduces Deep Water, which brings TensorFlow, Caffe, and MXNet to H2O. It also brings support for GPUs, image classification, NLP, and much more to H2O.
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
The document summarizes a presentation on applying GANs in medical imaging. It discusses several papers on this topic:
1. A paper that used GANs to reduce noise in low-dose CT scans by training on paired routine-dose and low-dose CT images. This approach generated reconstructed low-dose CT images with improved quality.
2. A paper that used GANs for cross-modality synthesis, specifically generating skin lesion images from other modalities.
3. Additional papers discussed other medical imaging applications of GANs such as vessel-fundus image synthesis and organ segmentation.
Modern machine learning methods that could be useful for particle physics.
Personal summary of the "Connecting the dots 2015" conference at Berkeley lab and ideas for what particle physics could try.
DL-FOIL is an algorithm for concept learning that produces concept descriptions in description logic. It was modified from a previous DL-FOIL algorithm to improve the specialization procedure and heuristic. Preliminary experiments show it achieves good match rates compared to another concept learning method on several ontology datasets. Ongoing work includes additional evaluations, improving specialization procedures and heuristics, and addressing scalability through parallel and distributed computation.
SAGE is a free open-source mathematical software system that uses Python as its primary programming language. It aims to promote open-source software in mathematics. SAGE includes interfaces to many existing mathematical software packages and provides its own functionality for areas like algebra, calculus, and linear algebra. Using SAGE allows mathematical researchers to access powerful tools for computation while having the flexibility of an open-source system programmed in Python. However, SAGE currently has limited funding, which restricts its development and ability to match capabilities of closed-source alternatives.
This document provides an overview of a data science training module that introduces students to key concepts like big data, data science components, types of data scientists, and use cases. The module covers topics such as big data scenarios, challenges, Hadoop, R programming, and machine learning. Students will learn how to analyze real-world datasets and implement machine learning algorithms in both Hadoop and R. The goal is for students to understand how to use tools and techniques to extract insights from large datasets.
Invent Episode 3: Tech Talk on Parallel Future - M. Atif Qureshi
This document discusses the shift towards parallel computing due to physical limitations in processor speed improvements. It introduces MapReduce as a programming model for easily writing parallel programs to process large datasets across many computers. MapReduce works by splitting data, processing it in parallel via mapping functions, then collecting the results via reducing functions. Examples show how it can be used to count word frequencies or crawl the web in parallel.
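The word-frequency example can be sketched with plain functions standing in for the framework's map, shuffle, and reduce phases (names are illustrative; a real MapReduce runtime distributes these steps across machines):

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document chunk."""
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    """Group intermediate pairs by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: sum all the counts emitted for one word."""
    return key, sum(values)

docs = ["the quick brown fox", "the lazy dog", "the fox"]
mapped = chain.from_iterable(map_phase(d) for d in docs)   # maps run in parallel on a cluster
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
```

Because each map call sees only its own chunk and each reduce call sees only one key's values, both phases parallelize trivially across workers, which is the whole point of the model.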
Recommendation Systems in Banking and Financial Services - Andrea Gigli
The document discusses recommendation systems in banking and financial services. It describes different types of recommendation systems including content-based filtering, collaborative filtering, and hybrid filtering. It then discusses how recommendation systems could be useful in banking by using a bipartite graph and word embedding approaches to represent customer and asset data and identify relationships between them. Code examples are provided for implementing some of these recommendation system techniques.
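As a toy stand-in for the bipartite-graph idea, the sketch below scores unheld assets by how often they co-occur with a customer's current holdings in other customers' portfolios. It is far simpler than the graph and word-embedding approaches in the talk, and all names are hypothetical:

```python
from collections import Counter

def recommend(holdings, customer, top_n=2):
    """Item co-occurrence over the customer-asset bipartite graph.

    Each asset the customer does not hold is scored by the holdings
    overlap with every other customer who holds it.
    """
    owned = holdings[customer]
    scores = Counter()
    for other, assets in holdings.items():
        if other == customer:
            continue
        overlap = len(owned & assets)          # shared assets = edge weight
        if overlap:
            for asset in assets - owned:       # only recommend unheld assets
                scores[asset] += overlap
    return [asset for asset, _ in scores.most_common(top_n)]

holdings = {
    "alice": {"bond_a", "fund_x"},
    "bob":   {"bond_a", "fund_x", "etf_y"},
    "carol": {"fund_x", "etf_y", "bond_b"},
}
recs = recommend(holdings, "alice")
```

Here "alice" is recommended "etf_y" first, since it co-occurs with both of her assets in "bob"'s portfolio and with one in "carol"'s.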
Big Data & Machine Learning - TDC2013 São Paulo - 12/0713 - Mathieu DESPRIEE
Machine learning and big data technologies enable new types of data analysis. Hadoop is an open-source framework that allows distributed storage and processing of large datasets across clusters of computers. It includes tools for working with structured and unstructured data to power applications in areas like recommendations, customer churn prediction, and more.
Big Data & Machine Learning - TDC2013 São Paulo - OCTO Technology
BigData and Machine Learning: Usage and Opportunities for your IT department
Talk presented at The Developer Conference in São Paulo - 12/0713
Mathieu DESPRIEE
From Lab to Factory: Or how to turn data into value - Peadar Coyle
We've all heard of 'big data' and data science, but how do we convert these trends into actual business value? I share case studies and technology tips, and talk about the challenges of the data science process. This is all based on two years of in-the-field research on deploying models and going from prototypes to production.
These are slides from my talk at PyCon Ireland 2015
"An Evaluation of Models for Runtime Approximation in Link Discovery" as presented in the IEEE/WIC/ACM WI, August 25th, 2017, held in Leipzig, Germany.
This work was supported by grants from the EU H2020 Framework Programme provided for the project HOBBIT (GA no. 688227).
- Ilab METIS is a collaboration between Inria-Tao, a research team focused on optimization and machine learning problems, and Artelys, an SME focused on power systems modeling.
- They develop black-box planning tools for power systems that aim to minimize model error by using direct policy search techniques on high-fidelity simulations.
- These tools are applied to problems like optimizing investments in new power plants, transmission lines, and other infrastructure for power grids under uncertainty.
This document discusses machine learning concepts including algorithms, data inputs/outputs, runtimes, and trends in academia vs industry. It notes that while academia focuses on algorithm complexity, industry prioritizes data-driven approaches using large datasets. Ensemble methods combining many simple models generally perform better than single complex models. Specific ML techniques discussed include word segmentation using n-gram probabilities, perceptrons for classification, SVD for recommendations and clustering, and crowdsourcing ensembles. The key lessons are that simple models with large data outperform complex models with less data, and that embracing many small independent models through ensembles is effective.
Alberto Danese discusses his experience using Kaggle, a platform for machine learning competitions using real-world data problems. He provides an overview of how Kaggle works, noting that companies post labeled datasets and problems that data scientists compete on to build predictive models. Their submissions are scored on public and private leaderboards, with cash prizes awarded to top models. Danese highlights examples of competitions in areas like fraud detection, cancer detection from medical images, and toxic comment classification. He outlines his approach of finding a motivating competition, understanding the problem, building baseline models, and avoiding overfitting the public leaderboard.
Congress on Evolutionary Computation (CEC 2016) - Plenary TalkGraham Kendall
Title: Is Evolutionary Computation Evolving Fast Enough?
Abstract: Evolutionary Computation (EC) has been part of the research agenda for at least 60 years but if you ask the average EC researcher to name three examples of EC being used in real world applications they might struggle. Other technologies have seen much wider adoption. 3D printing is changing the way that manufacturing is done, moving some of that functionality into the home. Immersive reality is on the verge of changing society, in ways that are not totally clear yet. Ubiquitous computing is becoming more prevalent, enabling users to access computing resources in ways that were unimaginable even just a few years ago. It might be argued that EC has not had the same penetration as other technologies. In this talk, we will look back at what EC promised, see if it has delivered on that promise and compare its progress with other technologies. Finally, we will suggest some challenges that might further advance EC and enable its wider adoption.
The document discusses modeling systems at the end of Dennard scaling and approaches to modeling in a post-Dennard era. It covers the end of consistent CPU performance improvements, the rise of specialized computing like GPUs and deep learning drivers. It also discusses using fewer bits in calculations, exploring uncertainties, and generating low-dimensional representations from complex models to help address the challenges of increased computing needs. Learning algorithms may help build emulators and surrogates of Earth system models to enable fitting-purpose simulations.
This document summarizes challenges in assembling large DNA sequence data sets and strategies to address them.
1. The cost to generate DNA sequence data is decreasing rapidly, creating data sets too large for most computers to assemble. Hundreds to thousands of such data sets are generated each year.
2. Techniques like streaming compression and low-memory probabilistic data structures allow assembly memory usage to scale linearly with the sample size rather than the total data, enabling assembly of larger datasets.
3. Benchmarking different computational platforms revealed that while some platforms have faster processors, the ability to store large amounts of data locally is also important for assembly tasks. Scaling algorithms, rather than just optimizing code, is key to addressing
Continual Learning is one of the most promising research areas to shift machine learning from solving a single task to something more similar to general intelligence.
Machine learning (and especially deep neural networks research) has shown outstanding results in the past 10 years, bringing us to the deep learning era, where learning models are everywhere and they interact with many aspect of our life.
However, machine learning have an enormous issue, which completely diversity it from biological learning: machine cannot learn continuously.
This is the so called catastrophic forgetting problem, and continual learning is trying to address it, making artificial intelligence able to continually learn for the entire duration of its "life".
A non-technical overview of Large Language Models, exploring their potential, limitations, and customization for specific challenges. While this deck is tailored for an audience from the financial industry in mind, its content remains broadly applicable.
(This updated version builds on our previous deck: slideshare.net/LoicMerckel/intro-to-llms.)
Similar to Towards new solutions for scientific computing: the case of Julia (20)
This presentation by Professor Giuseppe Colangelo, Jean Monnet Professor of European Innovation Policy, was made during the discussion “The Intersection between Competition and Data Privacy” held at the 143rd meeting of the OECD Competition Committee on 13 June 2024. More papers and presentations on the topic can be found at oe.cd/ibcdp.
This presentation was uploaded with the author’s consent.
This presentation by Tim Capel, Director of the UK Information Commissioner’s Office Legal Service, was made during the discussion “The Intersection between Competition and Data Privacy” held at the 143rd meeting of the OECD Competition Committee on 13 June 2024. More papers and presentations on the topic can be found at oe.cd/ibcdp.
This presentation was uploaded with the author’s consent.
1.) Introduction
Our Movement is not new; it is the same as it was for Freedom, Justice, and Equality since we were labeled as slaves. However, this movement at its core must entail economics.
2.) Historical Context
This is the same movement because none of the previous movements, such as boycotts, were ever completed. For some, maybe, but for the most part, it’s just a place to keep your stable until you’re ready to assimilate them into your system. The rest of the crabs are left in the world’s worst parts, begging for scraps.
3.) Economic Empowerment
Our Movement aims to show that it is indeed possible for the less fortunate to establish their economic system. Everyone else – Caucasian, Asian, Mexican, Israeli, Jews, etc. – has their systems, and they all set up and usurp money from the less fortunate. So, the less fortunate buy from every one of them, yet none of them buy from the less fortunate. Moreover, the less fortunate really don’t have anything to sell.
4.) Collaboration with Organizations
Our Movement will demonstrate how organizations such as the National Association for the Advancement of Colored People, National Urban League, Black Lives Matter, and others can assist in creating a much more indestructible Black Wall Street.
5.) Vision for the Future
Our Movement will not settle for less than those who came before us and stopped before the rights were equal. The economy, jobs, healthcare, education, housing, incarceration – everything is unfair, and what isn’t is rigged for the less fortunate to fail, as evidenced in society.
6.) Call to Action
Our movement has started and implemented everything needed for the advancement of the economic system. There are positions for only those who understand the importance of this movement, as failure to address it will continue the degradation of the people deemed less fortunate.
No, this isn’t Noah’s Ark, nor am I a Prophet. I’m just a man who wrote a couple of books, created a magnificent website: http://www.thearkproject.llc, and who truly hopes to try and initiate a truly sustainable economic system for deprived people. We may not all have the same beliefs, but if our methods are tried, tested, and proven, we can come together and help others. My website: http://www.thearkproject.llc is very informative and considerably controversial. Please check it out, and if you are afraid, leave immediately; it’s no place for cowards. The last Prophet said: “Whoever among you sees an evil action, then let him change it with his hand [by taking action]; if he cannot, then with his tongue [by speaking out]; and if he cannot, then, with his heart – and that is the weakest of faith.” [Sahih Muslim] If we all, or even some of us, did this, there would be significant change. We are able to witness it on small and grand scales, for example, from climate control to business partnerships. I encourage, invite, and challenge you all to support me by visiting my website.
This presentation by Katharine Kemp, Associate Professor at the Faculty of Law & Justice at UNSW Sydney, was made during the discussion “The Intersection between Competition and Data Privacy” held at the 143rd meeting of the OECD Competition Committee on 13 June 2024. More papers and presentations on the topic can be found at oe.cd/ibcdp.
This presentation was uploaded with the author’s consent.
This presentation by OECD, OECD Secretariat, was made during the discussion “The Intersection between Competition and Data Privacy” held at the 143rd meeting of the OECD Competition Committee on 13 June 2024. More papers and presentations on the topic can be found at oe.cd/ibcdp.
This presentation was uploaded with the author’s consent.
• For a full set of 530+ questions. Go to
https://skillcertpro.com/product/servicenow-cis-itsm-exam-questions/
• SkillCertPro offers detailed explanations to each question which helps to understand the concepts better.
• It is recommended to score above 85% in SkillCertPro exams before attempting a real exam.
• SkillCertPro updates exam questions every 2 weeks.
• You will get life time access and life time free updates
• SkillCertPro assures 100% pass guarantee in first attempt.
Why Psychological Safety Matters for Software Teams - ACE 2024 - Ben Linders.pdfBen Linders
Psychological safety in teams is important; team members must feel safe and able to communicate and collaborate effectively to deliver value. It’s also necessary to build long-lasting teams since things will happen and relationships will be strained.
But, how safe is a team? How can we determine if there are any factors that make the team unsafe or have an impact on the team’s culture?
In this mini-workshop, we’ll play games for psychological safety and team culture utilizing a deck of coaching cards, The Psychological Safety Cards. We will learn how to use gamification to gain a better understanding of what’s going on in teams. Individuals share what they have learned from working in teams, what has impacted the team’s safety and culture, and what has led to positive change.
Different game formats will be played in groups in parallel. Examples are an ice-breaker to get people talking about psychological safety, a constellation where people take positions about aspects of psychological safety in their team or organization, and collaborative card games where people work together to create an environment that fosters psychological safety.
Gamify it until you make it Improving Agile Development and Operations with ...Ben Linders
So many challenges, so little time. While we’re busy developing software and keeping it operational, we also need to sharpen the saw, but how? Gamification can be a way to look at how you’re doing and find out where to improve. It’s a great way to have everyone involved and get the best out of people.
In this presentation, Ben Linders will show how playing games with the DevOps coaching cards can help to explore your current development and deployment (DevOps) practices and decide as a team what to improve or experiment with.
The games that we play are based on an engagement model. Instead of imposing change, the games enable people to pull in ideas for change and apply those in a way that best suits their collective needs.
By playing games, you can learn from each other. Teams can use games, exercises, and coaching cards to discuss values, principles, and practices, and share their experiences and learnings.
Different game formats can be used to share experiences on DevOps principles and practices and explore how they can be applied effectively. This presentation provides an overview of playing formats and will inspire you to come up with your own formats.
The importance of sustainable and efficient computational practices in artificial intelligence (AI) and deep learning has become increasingly critical. This webinar focuses on the intersection of sustainability and AI, highlighting the significance of energy-efficient deep learning, innovative randomization techniques in neural networks, the potential of reservoir computing, and the cutting-edge realm of neuromorphic computing. This webinar aims to connect theoretical knowledge with practical applications and provide insights into how these innovative approaches can lead to more robust, efficient, and environmentally conscious AI systems.
Webinar Speaker: Prof. Claudio Gallicchio, Assistant Professor, University of Pisa
Claudio Gallicchio is an Assistant Professor at the Department of Computer Science of the University of Pisa, Italy. His research involves merging concepts from Deep Learning, Dynamical Systems, and Randomized Neural Systems, and he has co-authored over 100 scientific publications on the subject. He is the founder of the IEEE CIS Task Force on Reservoir Computing, and the co-founder and chair of the IEEE Task Force on Randomization-based Neural Networks and Learning Systems. He is an associate editor of IEEE Transactions on Neural Networks and Learning Systems (TNNLS).
Disaster Management project for holidays homework and other uses
Towards new solutions for scientific computing: the case of Julia
1. Towards new solutions for scientific computing: the case of Julia
Maurizio Tomasi Mosè Giordano
Nov 15 2018
2. Who are we?
Maurizio Tomasi (myself) Mosè Giordano
▶ Worked on the Planck mission (calibration,
simulations, data analysis…)
▶ Currently involved in other CMB experiments
▶ Worked on gravitational microlensing
▶ Author of several Julia packages
(github.com/giordano)
Towards new solutions for scientific computing: the case of Julia (Maurizio Tomasi, Mosè Giordano) ADASS 2018 (Nov 15 2018)
3. Making Python more performant
Python is a fantastic language: easy to use and with a very rich library. (And AstroPy is awesome!)
However, its execution speed is not impressive at all!
In [1]: %time x = [i*i for i in range(100_000_000)]
CPU times: user 5.27 s, sys: 860 ms, total: 6.13 s
Wall time: 6.18 s
Several solutions have been developed: NumPy, PyPy, Numba, Cython… Each can be extremely
performant in its own domain, but picking the right one requires careful consideration.
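As a rough illustration of the gap these tools try to close, here is a sketch of the slide's benchmark in both pure Python and NumPy (the array size is reduced from the slide's 100 million to keep the runtime modest; exact timings vary by machine):

```python
import time
import numpy as np

# Squaring a range of integers: pure-Python list comprehension
# vs. a vectorized NumPy expression.
n = 1_000_000

t0 = time.perf_counter()
squares_py = [i * i for i in range(n)]
t_py = time.perf_counter() - t0

t0 = time.perf_counter()
a = np.arange(n, dtype=np.int64)
squares_np = a * a
t_np = time.perf_counter() - t0

# NumPy pushes the loop into compiled code, so it is typically
# one to two orders of magnitude faster here.
print(f"list comprehension: {t_py:.4f} s, NumPy: {t_np:.4f} s")
```

This works well when the computation maps onto whole-array operations; the backup slides show where this vectorized style starts to cost extra allocations.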
4. The two-language problem
To make Python code more performant, it is common to link it to C/C++/Fortran using
tools like f2py, SWIG, Cython, and so on:
Main program
1st module
2nd module
3rd module
Such hybrid codebases are complex to implement and deploy:
▶ Need to master many languages
▶ Try to write a portable setup.py for projects using f2py!
5. Meet Julia
▶ Relatively new language (first official release was 0.2, in Nov 2013)
▶ Julia 1.0 released on August, 9th 2018
▶ Released under the MIT license
▶ julialang.org
7. A taste of Julia
I’m going to play a screencast now. You can watch it again at the following URL:
asciinema.org/a/211906.
The following slides are left as a reference if you don’t have an internet connection. They are not the
same as the screencast, as they show some more features.
8. A taste of Julia (1/5)
# One-liner definition of a function
f(x) = 3x + 1

# The first call for each argument type triggers JIT compilation of a
# specialized method; later calls with the same type reuse it.
# Floating-point
@time f(0.1) # 0.005716 seconds (15.63 k allocations: 872.499 KiB)
@time f(0.3) # 0.000002 seconds (5 allocations: 176 bytes)
# Integer
@time f(2) # 0.002888 seconds (2.00 k allocations: 117.656 KiB)
@time f(10) # 0.000001 seconds (4 allocations: 160 bytes)
# Rational
@time f(3//2) # 0.070596 seconds (209.83 k allocations: 10.785 MiB)
@time f(4//9) # 0.000005 seconds (6 allocations: 224 bytes)
9. A taste of Julia (2/5)
# Load default packages
using Printf
using Pkg
# Install a few new packages from the Internet
for name in ["Cosmology", "Measurements", "PyCall"]
    Pkg.add(name)
end
# The `name#branch` shorthand works only in the Pkg REPL, so request
# Zygote's master branch explicitly:
Pkg.add(PackageSpec(name="Zygote", rev="master"))
using Cosmology
c = cosmology(h=0.69, Neff=3.04, OmegaM=0.29, Tcmb=2.7255)
z = 0.1
@printf("Universe age at z=%.1f: %.1f Gyr\n", z, age_gyr(c, z))
# Prints "Universe age at z=0.1: 12.5 Gyr"
10. A taste of Julia (3/5)
using Measurements # Define the ± binary operator
z = 0.1 ± 0.01
println(z)
# Prints "0.1 ± 0.01"
age = age_gyr(c, z)
println(age)
# Prints "12.465336269441773 ± 0.12305608850870296"
@printf("%.2f ± %.2f Gyr\n", age.val, age.err)
# Prints "12.47 ± 0.12 Gyr"
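The idea behind Measurements.jl can be sketched in a few lines of Python. This is a hypothetical, single-variable illustration only (the `Measurement` class and `propagate` helper below are made up for this sketch; the real package propagates uncertainties through arbitrary expressions and tracks correlations): for y = f(x), linear error propagation gives σ_y = |f′(x)|·σ_x.

```python
import math
from dataclasses import dataclass

@dataclass
class Measurement:
    val: float  # central value
    err: float  # 1-sigma uncertainty

def propagate(f, m, h=1e-6):
    """Propagate m.err through f using a numerical central derivative."""
    deriv = (f(m.val + h) - f(m.val - h)) / (2 * h)
    return Measurement(f(m.val), abs(deriv) * m.err)

z = Measurement(0.1, 0.01)
res = propagate(lambda x: 2 * x + 1, z)
print(f"{res.val:.2f} ± {res.err:.2f}")  # 1.20 ± 0.02
```

In Julia, operator overloading plus multiple dispatch lets the package do this transparently for any function, as the `age_gyr` example above shows.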
Towards new solutions for scientific computing: the case of Julia (Maurizio Tomasi, Mosè Giordano) ADASS 2018 (Nov 15 2018)
11. A taste of Julia (4/5)
# See https://arxiv.org/abs/1810.07951
using Zygote # Long time to compile…
g(x) = 2x + 1
println(g(1)) # Print 3
println(g'(1)) # Print 2 (derivative of g at x=1)
@code_llvm g'(1) # Surprise! "ret i64 2"
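Zygote performs source-to-source reverse-mode differentiation, which is what lets the derivative compile down to a constant. A conceptually simpler flavor of automatic differentiation, forward mode with dual numbers (closer to what ForwardDiff.jl does than to Zygote), can be sketched in a few lines; the `Dual` class below is a toy written for this illustration:

```python
class Dual:
    """Carry a value together with its derivative (a 'dual number')."""
    def __init__(self, val, eps=0.0):
        self.val, self.eps = val, eps
    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.eps + o.eps)
    __radd__ = __add__
    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * o.val, self.val * o.eps + self.eps * o.val)
    __rmul__ = __mul__

def g(x):
    return 2 * x + 1

x = Dual(1.0, 1.0)  # seed the derivative with 1
print(g(x).val, g(x).eps)  # 3.0 2.0
```

The same `g` works on plain numbers and on duals; Julia's multiple dispatch makes this overloading pattern pervasive and fast.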
12. A taste of Julia (5/5)
using PyCall
@pyimport numpy.random as nr
x = nr.randn(5)
13. JuliaAstro on GitHub
Package: Description
AstroImages.jl: Visualization of astronomical images (by MG)
AstroLib.jl: Astronomical and astrophysical routines (by MG)
AstroTime.jl: Astronomical time keeping
Cosmology.jl: Library of cosmological functions
DustExtinction.jl: Models for the interstellar extinction due to dust
ERFA.jl: Wrapper to liberfa
EarthOrientation.jl: Earth orientation parameters from IERS tables
FITSIO.jl: Flexible Image Transport System (FITS) file support
LombScargle.jl: Compute the Lomb-Scargle periodogram (by MG)
SPICE.jl: Julia wrapper for NASA NAIF's SPICE toolkit
SkyCoords.jl: Support for astronomical coordinate systems
UnitfulAstro.jl: An extension of Unitful.jl for astronomers
WCS.jl: Astronomical World Coordinate Systems library
14. Simulating cosmological experiments with Julia
15. Data science with Julia
The good:
▶ Very fast execution for code with lots of calculations
▶ Powerful features (many numeric types, metaprogramming, missing values…)
▶ Ability to call C, Fortran, Python, and R
▶ Rock-solid package management (reproducible builds, like Rust's cargo)
▶ Native support for parallel computing (no GIL here!)
▶ Profiling tools immediately available (e.g., --track-allocation)
The bad:
▶ Slow execution if every function is called just once
▶ Not as many packages as other languages (Python, C++, …)
▶ Plotting is promising (PyPlot.jl, Plots.jl, UnicodePlots.jl, Makie.jl, …) but still lacking
▶ Avoid global variables like the plague: they make the compiler highly inefficient
16-18. When to use Julia
Julia is interesting if:
▶ You are going to write, from scratch, code that performs lots of calculations and will spend much time doing so.
▶ You have an existing large, monolithic code and want to turn it into something usable interactively, without sacrificing speed.
▶ You plan to use Julia's homoiconicity to do something really innovative, like Zygote.jl!
19. More information
▶ Julia compiler: julialang.org
▶ Julia user’s manual: docs.julialang.org/en/v1
▶ Package list available at juliaobserver.com
▶ User’s and developers’ forums: discourse.julialang.org
▶ Very good blogpost about Numba, Cython, and Julia:
www.stochasticlifestyle.com/why-numba-and-cython-are-not-substitutes-for-julia
▶ JuliaAstro: github.com/JuliaAstro
▶ These slides and additional material: bitbucket.org/Maurizio_Tomasi/adass2018-julia
▶ For questions, feel free to ask me or write me an email: maurizio.tomasi@unimi.it
20. Backup slides
21. Calculations with NumPy arrays
Consider this code, where all the parameters for f are NumPy arrays:
def f(r, x1, x2, x3, x4):
    r[:] = x1 - x2 + x3 - x4  # `r[:]` writes into r; plain `r = …` would only rebind the local name
This code is executed by NumPy as if it were
tmp = x1 - x2
tmp += x3
r = tmp - x4
thus three separate loops over the arrays are run.
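The hidden temporaries can be avoided from Python as well, by chaining NumPy ufuncs through a preallocated buffer with the `out=` parameter. A minimal sketch:

```python
import numpy as np

# Each binary operation in r = x1 - x2 + x3 - x4 allocates an intermediate
# array. Chaining ufuncs through a preallocated buffer with `out=` performs
# the same computation with a single result buffer and no temporaries.
n = 1_000_000
x1, x2, x3, x4 = (np.random.rand(n) for _ in range(4))

r_naive = x1 - x2 + x3 - x4   # allocates two temporaries plus the result

r = np.empty(n)
np.subtract(x1, x2, out=r)
np.add(r, x3, out=r)
np.subtract(r, x4, out=r)

assert np.allclose(r, r_naive)
```

This recovers the single-buffer behavior, but at the cost of writing one call per operation; Julia's `@.` macro (shown below in the backup slides) fuses the whole expression automatically.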
22. Performance of NumPy codes
(Plot: best time over 100 runs [ms] vs. number of terms in the expression, from 2 to 8, for Python 3.6.6 + NumPy 1.15.1.)
Source codes available at github.com/ziotom78/python-julia-c-.
23. Calculations with C++ vectors
In C++, the code needs to be written in this way:
for(size_t i = 0; i < r.size(); ++i) {
r[i] = x1[i] - x2[i] + x3[i] - x4[i];
}
We need to write the for loop explicitly, but there is only one of them.
24. Performance of NumPy/C++ codes
(Plot: best time over 100 runs [ms] vs. number of terms in the expression, from 2 to 8, comparing Python 3.6.6 + NumPy 1.15.1 against G++ 7.3.0 with flags -O3 -march -msse3.)
Source codes available at github.com/ziotom78/python-julia-c-.
25. Performance of Julia programs
# 4 terms
f(r, x1, x2, x3, x4) = @. r = x1 - x2 + x3 - x4
g(r, x1, x2, x3, x4) = @. r = x1 - x2 + x3
h(r, x1, x2, x3, x4) = @. r = x1 - x2
# Etc.
The @. macro turns every operation in the expression into a broadcast and fuses them into a single loop. Thus, f above is equivalent to
function f(r, x1, x2, x3, x4)
for i in eachindex(r)
r[i] = x1[i] - x2[i] + x3[i] - x4[i]
end
end
26. Performance of NumPy, C++, and Julia codes
(Plot: best time over 100 runs [ms] vs. number of terms in the expression, from 2 to 8, comparing Python 3.6.6 + NumPy 1.15.1, G++ 7.3.0 with flags -O3 -march -msse3, and Julia 1.0.2.)
Source codes available at github.com/ziotom78/python-julia-c-.
27. Using SIMD instructions in Julia
C++ was given an unfair advantage: it was allowed to use SIMD instructions (-msse3), and it
did not check array bounds, which Julia does automatically.
In Julia, we can use the @inbounds and @simd macros to make the Julia code equivalent to the C++:
function f(r, x1, x2, x3, x4)
@inbounds @simd for i in eachindex(r)
r[i] = x1[i] - x2[i] + x3[i] - x4[i]
end
end
28. Performance of NumPy, C++, and Julia codes
(Plot: best time over 100 runs [ms] vs. number of terms in the expression, from 2 to 8, comparing Python 3.6.6 + NumPy 1.15.1, G++ 7.3.0 with flags -O3 -march -msse3, and Julia 1.0.2.)
Source codes available at github.com/ziotom78/python-julia-c-.
29. Performance of NumPy, C++, and Julia codes
(Plot: best time over 100 runs [ms] vs. number of terms in the expression, from 2 to 8, comparing Python 3.6.6 + NumPy 1.15.1, G++ 7.3.0 with flags -O3 -march -msse3, Julia 1.0.2, and Julia 1.0.2 with @inbounds @simd.)
Source codes available at github.com/ziotom78/python-julia-c-.