A tutorial on using machine learning for functional connectomes, for instance on resting-state fMRI. This is typically useful for population imaging: comparing traits or conditions across subjects.
Towards psychoinformatics with machine learning and brain imaging (Gael Varoquaux)
Informatics in the psychological sciences brings fascinating challenges, as mental processes and pathologies have fuzzy definitions and are hard to quantify. Brain imaging brings rich data on the neural substrate of these concepts, yet the link is non-trivial.
The goal of this presentation is to put forward basic ideas of "psychoinformatics": using advanced processing of brain images to better quantify the elements of psychology.
It discusses how machine learning can bridge brain images to behavior: to better describe the mental processes involved in brain activity, or to extract biomarkers of pathologies, individual traits, or cognition.
Estimating Functional Connectomes: Sparsity’s Strength and Limitations (Gael Varoquaux)
Talk given at the OHBM 2017 education course.
I present the challenges of and techniques for estimating meaningful brain functional connectomes from fMRI: why sparsity in the inverse covariance leads to models that can be interpreted as interactions between regions.
Then I discuss the limitations of sparse estimators and introduce shrinkage as an alternative. Finally, I discuss how to compare multiple functional connectomes.
Dirty data science: machine learning on non-curated data (Gael Varoquaux)
These slides are a one-hour course on machine learning with non-curated data.
According to industry surveys, the number one hassle of data scientists is cleaning the data in order to analyze it. Here, I survey what kinds of "dirtiness" force time-consuming cleaning. We then cover two specific aspects of dirty data: non-normalized entries and missing values. I show how, for these two problems, machine-learning practice can be adapted to work directly on a data table without curation. The normalization problem can be tackled by adapting methods from natural language processing. The missing-values problem will lead us to revisit classic statistical results in the setting of supervised learning.
Similarity encoding for learning on dirty categorical variables (Gael Varoquaux)
For statistical learning, categorical variables in a table are usually considered as discrete entities and encoded separately into feature vectors, e.g., with one-hot encoding. "Dirty" non-curated data gives rise to categorical variables with a very high cardinality but redundancy: several categories reflect the same entity. In databases, this issue is typically solved with a deduplication step. We show that a simple approach that exposes the redundancy to the learning algorithm brings significant gains. We study a generalization of one-hot encoding, similarity encoding, that builds feature vectors from similarities across categories. We perform a thorough empirical validation on non-curated tables, a problem seldom studied in machine learning. Results on seven real-world datasets show that similarity encoding brings significant gains in prediction in comparison with known encoding methods for categories or strings, notably one-hot encoding and bag of character n-grams. We draw practical recommendations for encoding dirty categories: 3-gram similarity appears to be a good choice to capture morphological resemblance. For very high cardinality, dimensionality reduction significantly reduces the computational cost with little loss in performance: random projections or choosing a subset of prototype categories still outperforms classic encoding approaches.
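To make the encoding concrete, here is a toy Python sketch of 3-gram similarity encoding (my own illustration with a hand-rolled Jaccard similarity over character 3-grams; the paper's exact string similarity and implementation may differ):

```python
import numpy as np

def ngrams(s, n=3):
    # Set of character n-grams of a string (whole string if shorter than n)
    return {s[i:i + n] for i in range(max(len(s) - n + 1, 1))}

def ngram_similarity(a, b, n=3):
    # Jaccard similarity between the two sets of character n-grams
    ga, gb = ngrams(a, n), ngrams(b, n)
    return len(ga & gb) / len(ga | gb)

def similarity_encode(values, prototypes, n=3):
    # One row per value, one column per prototype category:
    # a continuous relaxation of one-hot encoding
    return np.array([[ngram_similarity(v, p, n) for p in prototypes]
                     for v in values])

dirty = ["police officer", "police oficer", "fire fighter"]
protos = ["police officer", "fire fighter"]
print(similarity_encode(dirty, protos))  # typos get high, not zero, similarity
```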
Inter-site autism biomarkers from resting-state fMRI (Gael Varoquaux)
We present an automated pipeline to learn predictive biomarkers from resting-state fMRI. We apply it to classifying autism on unseen sites, demonstrating the feasibility of biomarkers on weakly standardized functional imaging data.
We study which steps of the pipeline are important for prediction and show that 1) the choice of atlas is the most important choice; ideally, the atlas should be made of functional regions learned from the data; and 2) the "tangent space" parametrization of connectivity is the best performer.
We conclude with general recommendations for predictive biomarkers from resting-state fMRI.
Brain maps from machine learning? Spatial regularizations (Gael Varoquaux)
Pattern Recognition for NeuroImaging (PR4NI)
We will show empirically how the commonly used pattern-recognition techniques, such as SVMs, provide low-quality brain maps even though they give very good prediction accuracy. We will give an overview of recently developed techniques to impose priors on patterns that are particularly well suited to neuroimaging: selecting a small number of spatially-structured predictive brain regions. These tools reconcile machine learning with brain mapping by giving maps that are more useful for drawing neuroscientific conclusions. In addition, they are more robust to cross-individual spatial variability and thus generalize well across subjects.
Connectomics: Parcellations and Network Analysis Methods (Gael Varoquaux)
A simple tutorial on methods for functional-connectome analysis: learning regions, extracting functional signals, inferring the network structure, and comparing it across subjects.
An introduction to the theory behind Bayesian deep learning, a topic of growing recent interest, and to its recent applications. I briefly explain the theory of Bayesian inference and then present the theory and applications of Yarin Gal's Monte Carlo dropout.
In this presentation, we provide a quick intro to Bayesian inference and Gaussian processes, and then relate them to the latest state-of-the-art research on Bayesian deep learning, in order to include uncertainty in deep neural net predictions.
1. Y. Gal, Uncertainty in Deep Learning, 2016
2. P. McClure and N. Kriegeskorte, Representing Inferential Uncertainty in Deep Neural Networks Through Sampling, 2017
3. G. Kahn et al., Uncertainty-Aware Reinforcement Learning for Collision Avoidance, 2016
4. B. Lakshminarayanan et al., Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles, 2017
5. A. Kendall and Y. Gal, What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?, 2017
6. S. Choi et al., Uncertainty-Aware Learning from Demonstration Using Mixture Density Networks with Sampling-Free Variance Modeling, 2017
7. Anonymous, Bayesian Uncertainty Estimation for Batch Normalized Deep Networks, 2018
Performance evaluation of GANs in a semisupervised OCR use case (Florian Wilhelm)
Even in the age of big data, labeled data is a scarce resource in many machine learning use cases. Florian Wilhelm evaluates generative adversarial networks (GANs) when used to extract information from vehicle registrations under a varying amount of labeled data, compares the performance with supervised learning techniques, and demonstrates a significant improvement when using unlabeled data.
Fast relaxation methods for the matrix exponential (David Gleich)
The matrix exponential is a matrix-computation primitive used in link prediction and community detection. We describe a fast method to compute it using relaxation on a large linear system of equations. This enables us to compute a column of the matrix exponential in sublinear time, or under a second on a standard desktop computer.
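For a sense of the primitive itself, here is a baseline sketch that computes one column of a matrix exponential with off-the-shelf tools (scipy's Krylov-style expm_multiply, not the sublinear relaxation method described in the talk):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import expm_multiply

# Toy undirected graph: a random sparse adjacency matrix
n = 100
A = sp.random(n, n, density=0.05, random_state=0)
A = ((A + A.T) > 0).astype(float).tocsr()  # symmetrize and binarize

# One column of the matrix exponential, exp(A) @ e_i:
# a diffusion-based score of how close each node is to node i
e_i = np.zeros(n)
e_i[0] = 1.0
col = expm_multiply(A, e_i)
print(col[:5])
```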
Anti-differentiating Approximation Algorithms: PageRank and MinCut (David Gleich)
We study how Google's PageRank method relates to mincut and a particular type of electrical flow in a network. We also explain the details of how the "push method" for computing PageRank helps to accelerate it. This has implications for semi-supervised learning and machine learning, as well as social network analysis.
Pattern-based classification of demographic sequences (Dmitrii Ignatov)
We have proposed prefix-based gapless sequential patterns for the classification of demographic sequences. In comparison to black-box machine learning techniques, this approach provides interpretable patterns suitable for treatment by professional demographers. As for the language, we have used Pattern Structures, an extension of Formal Concept Analysis to complex data such as sequences, graphs, and intervals.
Localized methods for diffusions in large graphs (David Gleich)
I describe a few ongoing research projects on diffusions in large graphs, and how we can design matrix computations to evaluate these diffusions efficiently.
This Naive Bayes Tutorial from Edureka will help you understand all the concepts of the Naive Bayes classifier, its use cases, and how it can be used in industry. This tutorial is ideal both for beginners and for professionals who want to learn or brush up their concepts in data science and machine learning through Naive Bayes. Below are the topics covered in this tutorial, with a short code sketch after the list:
1. What is Machine Learning?
2. Introduction to Classification
3. Classification Algorithms
4. What is Naive Bayes?
5. Use Cases of Naive Bayes
6. Demo – Employee Salary Prediction in R
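As the promised sketch, a minimal scikit-learn analogue of the classification demo (a Python stand-in for the R demo; the data and the "high salary" framing here are made up):

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split

# Hypothetical features: e.g. years of experience and a performance score
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # e.g. "high salary" or not

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = GaussianNB().fit(X_train, y_train)  # one Gaussian per class and feature
print("test accuracy:", clf.score(X_test, y_test))
```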
Scikit-learn and nilearn: Democratisation of machine learning for brain imaging (Gael Varoquaux)
This talk describes our efforts to bring easily usable machine learning to brain mapping. It covers both the questions that machine learning can answer and two software packages developed to facilitate machine learning and its application to neuroimaging.
Scikit-learn for easy machine learning: the vision, the tool, and the project (Gael Varoquaux)
Scikit-learn is a popular machine learning tool. What can it do for you? Why would you want to use it? What can you do with it? Where is it going? In this talk, I will discuss why and how scikit-learn became popular. I will argue that it is successful because of its vision: it fills an important slot in the rich ecosystem of data science. I will demonstrate how scikit-learn makes predictive analysis easy and yet versatile. I will shed some light on our development process: how do we, as a community, ensure the quality and the growth of scikit-learn?
Better neuroimaging data processing: driven by evidence, open communities, an... (Gael Varoquaux)
My current thoughts about methods validity and design in brain imaging.
Data processing is a significant part of a neuroimaging study. The choice of corresponding methods and tools is crucial. I will give an opinionated view on a path to building better data processing for neuroimaging. I will take examples from endeavors that I contributed to: defining standards for functional-connectivity analysis, the nilearn neuroimaging tool, and the scikit-learn machine-learning toolbox, an industry standard with a million regular users. I will cover not only the technical process (statistics, signal processing, software engineering) but also the epistemology of methods development. Methods govern our results; they are more than a technical detail.
Recent years have seen the emergence of several static analysis techniques for reasoning about programs. This talk presents several major classes of techniques and tools that implement these techniques. Part of the presentation will be a demonstration of the tools.
Dr. Subash Shankar is an Associate Professor in the Computer Science department at Hunter College, CUNY. Prior to joining CUNY, he received a PhD from the University of Minnesota and was a postdoctoral fellow in the model checking group at Carnegie Mellon University. Dr. Shankar also has over 10 years of industrial experience, mostly in the areas of formal methods and tools for analyzing hardware and software systems.
My presentation at University of Nottingham "Fast low-rank methods for solvin... (Alexander Litvinenko)
An overview of my (and co-authors') low-rank tensor methods for solving PDEs with uncertain coefficients, their connection with the Bayesian update, and solving a coupled system: the stochastic forward and stochastic inverse problems.
Device Free Indoor Localization in the 28 GHz band based on machine lea... (Victor Asanza)
By exploiting the received power change in a communication link produced by the presence of a human body in an otherwise empty room, this work evaluates device-free indoor localization methods in the 28 GHz band using machine learning techniques. For this objective, a database is built using results from ray tracing simulations of a system comprised of 4 receivers and up to 2 transmitters, while a person is standing within the room. Transmitters are equipped with uniform linear arrays that switch their main beams sequentially at 21 angles, whereas the receivers operate with omnidirectional antennas. A statistical localization error reduction of at least 16% over a global-based classification technique can be obtained through the combination of two independent classifiers using one transmitter, and a reduction of at least 19% for 2 transmitters. An additional improvement is achieved by combining each independent classifier with a regression algorithm. Results also suggest that the number of examples per class and the size of the blocks (strips) in which the study area is partitioned play a role in the localization error.
DARMDN: Deep autoregressive mixture density nets for dynamical system mode... (Balázs Kégl)
Unlike computers, physical engineering systems (such as data center cooling or wireless network control) do not get faster with time. This is arguably one of the main reasons why recent beautiful advances in deep reinforcement learning (RL) stay mostly in the realm of simulated worlds and do not immediately translate to practical success in the real world. In order to make the best use of the small data sets these systems generate, we develop data-driven neural simulators to model the system and apply model-based control to optimize them. In this talk I will present the first step of this research agenda, a new versatile system modelling tool called deep autoregressive mixture density net (DARMDN – pronounced darm-dee-en). We argue that the performance of model-based reinforcement learning is partly limited by the approximation capacity of the currently used conditional density models and show how DARMDN alleviates these limitations. The model, combined with a random shooting controller, establishes a new state of the art on the popular Acrobot benchmark. Our most interesting and counter-intuitive finding is that the “sincos” Acrobot system which requires no multimodal posterior predictives, can be solved with a deterministic model, but only if it is trained as a probabilistic model. A deterministic model that is trained to minimize MSE leads to prediction error accumulation.
Building a cutting-edge data processing environment on a budget (Gael Varoquaux)
As a penniless academic I wanted to do "big data" for science. Open source, Python, and simple patterns were the way forward. Staying on top of today's growing datasets is an arms race. Data analytics machinery —clusters, NOSQL, visualization, Hadoop, machine learning, ...— can spread a team's resources thin. Focusing on simple patterns, lightweight technologies, and a good understanding of the applications gets us most of the way for a fraction of the cost.
I will present a personal perspective on ten years of scientific data processing with Python. What are the emerging patterns in data processing? How can modern data-mining ideas be used without a big engineering team? What constraints and design trade-offs govern software projects like scikit-learn, Mayavi, or joblib? How can we make the most out of distributed hardware with simple framework-less code?
Le Song, Assistant Professor, College of Computing, Georgia Institute of Tech... (MLconf)
Understanding Deep Learning for Big Data: The complexity and scale of big data impose tremendous challenges for their analysis. Yet, big data also offer us great opportunities. Some nonlinear phenomena, features, or relations, which are not clear or cannot be inferred reliably from small and medium data, now become clear and can be learned robustly from big data. Typically, the form of the nonlinearity is unknown to us, and needs to be learned from data as well. Being able to harness the nonlinear structures from big data could allow us to tackle problems which were impossible before, or obtain results which are far better than the previous state of the art.
Nowadays, deep neural networks are the methods of choice when it comes to large-scale nonlinear learning problems. What makes deep neural networks work? Is there any general principle for tackling high-dimensional nonlinear problems which we can learn from deep neural networks? Can we design competitive or better alternatives based on such knowledge? To make progress on these questions, my machine learning group performed both theoretical and experimental analysis on existing and new deep learning architectures, investigating three crucial aspects: the usefulness of the fully connected layers, the advantage of the feature-learning process, and the importance of the compositional structures. Our results point to some promising directions for future research, and provide guidelines for building new deep learning models.
Talk given at PRNI 2016 for the paper https://arxiv.org/pdf/1606.06439v1.pdf
Abstract — Spatially-sparse predictors are good models for brain decoding: they give accurate predictions and their weight maps are interpretable as they focus on a small number of regions. However, the state of the art, based on total variation or graph-net, is computationally costly. Here we introduce sparsity in the local neighborhood of each voxel with social sparsity, a structured shrinkage operator. We find that, on brain-imaging classification problems, social sparsity performs almost as well as total-variation models and better than graph-net, for a fraction of the computational cost. It also very clearly outlines predictive regions. We give details of the model and the algorithm.
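To make the shrinkage concrete, here is a toy numpy sketch of a social-sparsity-style operator (a simplified reading with a 1-D chain neighborhood, not the paper's exact operator):

```python
import numpy as np

def social_shrinkage(w, lam, radius=1):
    # Toy social-sparsity-style shrinkage on a 1-D coefficient vector:
    # each coefficient is kept or killed based on the energy of its
    # local neighborhood, then rescaled (group soft-threshold style).
    out = np.zeros_like(w)
    for i in range(len(w)):
        lo, hi = max(0, i - radius), min(len(w), i + radius + 1)
        norm = np.linalg.norm(w[lo:hi])  # neighborhood energy
        if norm > 0:
            out[i] = w[i] * max(0.0, 1.0 - lam / norm)
    return out

w = np.array([0.1, 0.2, 2.0, 1.8, 0.05, 0.0, 0.3])
print(social_shrinkage(w, lam=0.5))  # isolated small weights vanish
```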
Co-Learning: Consensus-based Learning for Multi-Agent Systems (Miguel Rebollo)
Distributed federated learning using consensus with intelligent agents over a network. Work presented at the 20th International Conference on Practical Applications of Agents and Multi-Agent Systems, July 2022, L'Aquila (Italy).
A Cooperative Coevolutionary Approach to Maximise Surveillance Coverage of UA... (Daniel H. Stolfi)
This paper presents the parameterisation and optimisation of the CACOC (Chaotic Ant Colony Optimisation for Coverage) mobility model used by an Unmanned Aerial Vehicle (UAV) swarm to perform surveillance tasks. CACOC uses chaotic solutions of a dynamical system and pheromones to optimise area coverage. Consequently, several parameters of CACOC are to be optimised with the aim of improving its coverage performance. We propose a Genetic Algorithm (GA) and two Cooperative Coevolutionary Genetic Algorithms (CCGAs) to tackle this problem. After testing our proposals on four case studies, we performed a comparative analysis and conclude that the cooperative approaches allow a better exploration of the search space by optimising each UAV's parameters independently.
https://doi.org/10.1109/CCNC46108.2020.9045643
A tutorial on Machine Learning, with illustrations for MR imaging (Gael Varoquaux)
Machine learning builds predictive models from data. It is massively used on medical images these days, for a variety of applications ranging from segmentation to diagnosis.
This is an introductory tutorial to machine learning, giving intuitions from the statistical point of view. It introduces the methodology, the concepts behind the central models, the validation framework, and some caveats to watch for.
It also discusses some applications to drawing conclusions from brain imaging, and uses these applications to highlight various technical aspects of running machine learning models on high-dimensional data such as medical images.
Similar to Machine learning for functional connectomes (20)
Evaluating machine learning models and their diagnostic value (Gael Varoquaux)
Model evaluation is, in my opinion, the most overlooked step of the machine-learning pipeline. Reliably estimating a model's performance for a given purpose is crucial and difficult. In this talk, I first discuss choosing a metric informative for the application, stressing the importance of class prevalence in classification settings. I then discuss procedures to estimate generalization performance, drawing a distinction between evaluating a learning procedure and a prediction rule, and discussing how to give confidence intervals on the performance estimates.
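A small illustration of why class prevalence matters when choosing a metric (a toy example, not from the talk):

```python
import numpy as np
from sklearn.metrics import accuracy_score, balanced_accuracy_score

# Toy imbalanced problem: about 95% negatives
rng = np.random.default_rng(0)
y_true = (rng.random(1000) < 0.05).astype(int)
y_pred = np.zeros_like(y_true)  # a "classifier" that always says negative

# High accuracy, yet the classifier never finds a positive case
print("accuracy:         ", accuracy_score(y_true, y_pred))           # ~0.95
print("balanced accuracy:", balanced_accuracy_score(y_true, y_pred))  # 0.5
```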
Measuring mental health with machine learning and brain imaging (Gael Varoquaux)
The study of mental health relies vastly on behavioral testing and questionnaires. I discuss how machine learning on large brain-imaging cohorts can open new avenues for markers of mental health. My claims are that the challenge lies in the amount of diagnosed conditions rather than in the heterogeneity of the conditions, and that we should turn to proxy labels. I also discuss another fundamental challenge to this agenda: the external and construct validity of brain-imaging-based markers.
A tutorial on machine learning to build prediction models with missing values.
The slides cover both theoretical results (statistical learning) and practical advice, with a focus on implementation in Python with scikit-learn
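A minimal scikit-learn sketch in that spirit (a toy example, not the tutorial's own code; the imputer lives inside the pipeline so that cross-validation stays honest):

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Toy regression data with entries missing completely at random
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=300)
X[rng.random(X.shape) < 0.2] = np.nan  # 20% missing entries

# Mean imputation plus a missingness indicator, then a linear model
model = make_pipeline(SimpleImputer(strategy="mean", add_indicator=True),
                      Ridge())
print(cross_val_score(model, X, y).mean())
```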
Representation learning in limited-data settings (Gael Varoquaux)
A 4-hour long didactic course on simple notions of representations and how to use them in limited-data settings:
- A supervised learning point of view, giving intuitions and math on what representations are and why they matter
- Building simple unsupervised learning models to extract representation: from matrix decomposition for signals to embeddings of entities
- Evaluating models in limited-data settings, often a bottleneck
This slide-deck was given as a course at the 2021 DeepLearn summer school.
Functional-connectome biomarkers to meet clinical needs? (Gael Varoquaux)
Extracting functional-connectome biomarkers with machine learning: a talk in the symposium on how current predictive connectivity models meet clinicians' needs.
This talk is a bit provocative: it first sets out a vision, before bringing a few technical suggestions.
Atlases of cognition with large-scale human brain mapping (Gael Varoquaux)
Cognitive neuroscience uses neuroimaging to identify brain systems engaged in specific cognitive tasks. However, unequivocally linking brain systems with cognitive functions is difficult: each task probes only a small number of facets of cognition, while brain systems are often engaged in many tasks. We develop a new approach to generate a functional atlas of cognition, demonstrating brain systems selectively associated with specific cognitive functions. This approach relies upon an ontology that defines specific cognitive functions and the relations between them, along with an analysis scheme tailored to this ontology. Using a database of thirty neuroimaging studies, we show that this approach provides a highly specific atlas of mental functions, and that it can decode the mental processes engaged in new tasks.
Simple representations for learning: factorizations and similarities (Gael Varoquaux)
Real-life data seldom comes in the ideal form for statistical learning. This talk focuses on high-dimensional problems for signals and discrete entities: when dealing with many, correlated, signals or entities, it is useful to extract representations that capture these correlations.
Matrix factorization models provide simple but powerful representations. They are used for recommender systems across discrete entities such as users and products, or to learn good dictionaries to represent images. However, they entail large computing costs on very high-dimensional data: databases with many products or high-resolution images. I will present an algorithm to factorize huge matrices based on stochastic subsampling that gives up to 10-fold speed-ups [1].
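In a similar spirit, though not the subsampling algorithm of [1], scikit-learn's mini-batch dictionary learning scales factorization by processing small batches of samples:

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

# Toy data: 1000 samples that are sparse combinations of 15 atoms
rng = np.random.default_rng(0)
atoms = rng.normal(size=(15, 64))
codes = rng.normal(size=(1000, 15)) * (rng.random((1000, 15)) < 0.1)
X = codes @ atoms + 0.01 * rng.normal(size=(1000, 64))

# Mini-batch (streaming) factorization: X ≈ code @ dictionary
dico = MiniBatchDictionaryLearning(n_components=15, batch_size=32,
                                   random_state=0)
code = dico.fit_transform(X)
print(code.shape, dico.components_.shape)  # (1000, 15) (15, 64)
```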
With discrete entities, the explosion of dimensionality may be due to variations in how a smaller number of categories are represented. Such a problem of "dirty categories" is typical of uncurated data sources. I will discuss how encoding this data based on similarities recovers a useful category structure with no preprocessing. I will show how it interpolates between one-hot encoding and techniques used in character-level natural language processing.
[1] Stochastic subsampling for factorizing huge matrices. A. Mensch, J. Mairal, B. Thirion, G. Varoquaux. IEEE Transactions on Signal Processing 66 (1), 113-128, 2018.
[2] Similarity encoding for learning with dirty categorical variables. P. Cerda, G. Varoquaux, B. Kégl. Machine Learning (2018): 1-18.
Computational practices for reproducible science (Gael Varoquaux)
Reconciling bleeding-edge scientific results and reproducible research may seem a conundrum in our fast-paced high-pressure academic world. I discuss the practices that I found useful in computational work. At a high level, it is important to navigate the space between rapid experimentation and industrial-grade software development. I advocate adopting more and more software-engineering best practices as a project matures. I will also discuss how to turn the computational work into libraries, and to ensure the quality of the resulting libraries. And I conclude on how those libraries need to fit in the larger picture of the exercise of research to give better science.
Slides for my keynote at Scipy 2017
https://youtu.be/eVDDL6tgsv8
Computing has been driving forward a revolution in how science and technology can solve new problems. Python has grown to be a central player in this game, from computational physics to data science. I would like to explore some lessons learned doing science with Python, as well as building Python libraries for science. What are the ingredients that scientists need? What technical and project-management choices drove the success of projects I've been involved with? How do these demands and offers shape our ecosystem?
In this talk, I'd like to share a few thoughts on how we code for science and innovation, with the modest goal of changing the world.
Data science calls for rapid experimentation and building intuitions from the data. Yet, data science also underpins crucial decisions and operational logic. Writing production-ready and robust statistical analysis without cognitive overhead may seem a conundrum. I will explore simple, and less simple, practices for fast turn around and consolidation of data-science code. I will discuss how these considerations led to the design of scikit-learn, that enables easy machine learning yet is used in production. Finally, I will mention some scikit-learn gems, new or forgotten.
Scientist meets web dev: how Python became the language of data (Gael Varoquaux)
Python started as a scripting language, but now it is the new trend everywhere and in particular for data science, the latest rage of computing. It didn’t get there by chance: tools and concepts built by nerdy scientists and geek sysadmins provide foundations for what is said to be the sexiest job: data scientist.
In this talk I give a personal perspective on the progress of the scientific Python ecosystem, from numerical physics to data mining. What made Python suitable for science; Why the cultural gap between scientific Python and the broader Python community turned out to be a gold mine; And where this richness might lead us.
The talk will discuss low-level and high-level technical aspects, such as how the Python world makes it easy to move large chunks of numbers across code. It will touch upon current technical details that make scikit-learn and joblib stand out.
Machine learning and cognitive neuroimaging: new tools can answer new questions (Gael Varoquaux)
Machine learning is geared towards prediction. However, aside from diagnosis or prognosis in the clinic, cognitive neuroimaging strives to uncover insights from the data, rather than to minimize prediction error. I review various inferences on brain function that have been drawn using pattern-recognition techniques, focusing on decoding. In particular, I discuss using generalization as a test for information, multivariate analysis to interpret overlapping activation patterns, and decoding for principled reverse inference. I give each time a statistical view and a cognitive-imaging view.
A personal point of view on scikit-learn: past, present, and future.
This talk gives a bit of history, mentions exciting developments, and offers a personal vision of the future.
Succeeding in academia despite doing good software (Gael Varoquaux)
Hacking academia for fun and profit
Thoughts on succeeding in academia despite doing good software
Keynote I gave at the Scipyconf Argentina 2014 conference
The advancement of science is a noble cause, and academia a fierce battlefield for tenure. Software is seen as a mere technicality, not worth a line on an academic CV. I claim that, on the contrary, software is the new medium of the scientific method. I claim that succeeding in academia can be achieved not despite writing good software but via such an accomplishment. The key is to choose the right battles and to win them.
What is the emerging role of software in the scientific workflow? Which are the software challenges that can have impact? How to balance software quality assurance and the quick turn-around random-walk of research? What does "good design" mean for research software? What Python patterns can boost productivity and reuse in exploratory scientific computing?
I will try to answer these questions, based on my personal experience of growing up to become an academic Pythonista.
Scikit-learn: apprentissage statistique en Python. Créer des machines intelli... (Gael Varoquaux)
A high-level talk about machine learning: the statistical and computational challenges, as well as how they can be answered by the scikit-learn Python toolkit. In French.
Scikit learn: apprentissage statistique en Python (Gael Varoquaux)
A presentation (in French) on scikit-learn, a statistical-learning (machine learning) toolkit in Python.
Philosophy and strategy of the project, as well as the API and very brief code examples.
GridMate - End to end testing is a critical piece to ensure quality and avoid... (ThomasParaiso2)
End to end testing is a critical piece to ensure quality and avoid regressions. In this session, we share our journey building an E2E testing pipeline for GridMate components (LWC and Aura) using Cypress, JSForce, FakerJS…
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
The Art of the Pitch: WordPress Relationships and Sales (Laura Byrne)
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
SAP Sapphire 2024 - ASUG301 Building better apps with SAP Fiori (Peter Spielvogel)
Building better applications for business users with SAP Fiori.
• What is SAP Fiori and why it matters to you
• How a better user experience drives measurable business benefits
• How to get started with SAP Fiori today
• How SAP Fiori elements accelerates application development
• How SAP Build Code includes SAP Fiori tools and other generative artificial intelligence capabilities
• How SAP Fiori paves the way for using AI in SAP apps
Smart TV Buyer Insights Survey 2024 (91mobiles)
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Climate Impact of Software Testing at Nordic Testing Days (Kari Kakkonen)
My slides at Nordic Testing Days 6.6.2024
The climate impact / sustainability of software testing is discussed in the talk. ICT and testing must carry their part of the global responsibility to help with climate warming. We can minimize the carbon footprint, but we can also have a carbon handprint: a positive impact on the climate. Quality characteristics can be extended with sustainability, and then measured continuously. Test environments can be used less, at smaller scale, and on demand. Test techniques can be used to optimize or minimize the number of tests. Test automation can be used to speed up testing.
Essentials of Automations: The Art of Triggers and Actions in FME (Safe Software)
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Securing your Kubernetes cluster: a step-by-step guide to success! (KatiaHIMEUR1)
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Elevating Tactical DDD Patterns Through Object Calisthenics (Dorra BARTAGUIZ)
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
DevOps and Testing slides at DASA Connect (Kari Kakkonen)
My and Rik Marselis' slides at the 30.5.2024 DASA Connect conference. We discuss what testing is, then what agile testing is, and finally what testing in DevOps is. We ended with a lovely workshop in which participants tried to find different ways to think about quality and testing in different parts of the DevOps infinity loop.
Pushing the limits of ePRTC: 100ns holdover for 100 days (Adtran)
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
Generative AI Deep Dive: Advancing from Proof of Concept to Production (Aggregage)
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
Removing Uninteresting Bytes in Software Fuzzing (Aftab Hussain)
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutil's readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format) files. Our preliminary results show that AFL+DIAR not only discovers new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
GraphRAG is All You Need? LLM & Knowledge Graph (Guy Korland)
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
2. Machine learning for functional connectomes
Gaël Varoquaux
Outline:
1 Intuitions on machine learning
2 Machine learning on rest fMRI
Pointers to code in nilearn & scikit-learn
nilearn.github.io — scikit-learn.org
Use the “API reference” to look up functions, and scroll down for examples of usage
3. 1 Intuitions on machine learning
Adjusting models for prediction
4. 1 Machine learning in a nutshell: an example
Face recognition
Andrew Bill Charles Dave
5. 1 Machine learning in a nutshell: an example
Face recognition
Andrew Bill Charles Dave
?
6. 1 Machine learning in a nutshell
A simple method:
1 Store all the known (noisy) images and the names that go with them.
2 For a new (noisy) image, find the stored image that is most similar.
“Nearest neighbor” method
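A minimal scikit-learn sketch of this nearest-neighbor method (toy digit images standing in for the faces):

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Small labeled images, standing in for the labeled faces
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 1-nearest neighbor: predict the label of the most similar stored image
knn = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)
print("train accuracy:", knn.score(X_train, y_train))  # 1.0: test = train
print("test accuracy: ", knn.score(X_test, y_test))    # honest estimate
```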
7. 1 Machine learning in a nutshell
A simple method:
1 Store all the known (noisy) images and the names that go with them.
2 For a new (noisy) image, find the stored image that is most similar.
“Nearest neighbor” method
How many errors on already-known images?
... 0: no errors
Test data = train data
9. 1 Machine learning in a nutshell: intuitions
A single descriptor: one dimension (figure: data points y plotted against x)
10. 1 Machine learning in a nutshell: intuitions
A single descriptor: one dimension (figure: two candidate fits of y against x)
Which model to prefer?
11. 1 Machine learning in a nutshell: intuitions
A single descriptor: one dimension (figure: two candidate fits of y against x)
Problem of “over-fitting”
Minimizing error is not always the best strategy (learning noise)
Test data = train data
12. 1 Machine learning in a nutshell: intuitions
A single descriptor: one dimension (figure: two candidate fits of y against x)
Prefer simple models = concept of “regularization”
Balance the number of parameters to learn with the amount of data
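A toy scikit-learn illustration of over-fitting versus regularization (my own example, not from the slides):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Few noisy points, many parameters: a recipe for over-fitting
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=30)
y = np.sin(3 * x) + 0.3 * rng.normal(size=30)

for alpha in (1e-8, 1.0):  # tiny vs substantial regularization
    model = make_pipeline(PolynomialFeatures(degree=15), Ridge(alpha=alpha))
    score = cross_val_score(model, x[:, None], y).mean()
    print(f"alpha={alpha:g}: cross-validated R^2 = {score:.2f}")
```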
13. 1 Machine learning in a nutshell: intuitions
A single descriptor: one dimension (figure: y against x)
Two descriptors: 2 dimensions (figure: y against x_1 and x_2)
The higher the number of descriptors, the more the trouble
14. 1 Machine learning in a nutshell: intuitions
A single descriptor: one dimension (figure: y against x)
Two descriptors: 2 dimensions (figure: y against x_1 and x_2)
The higher the number of descriptors, the more the trouble
The higher the required number of subjects
15. 1 Testing prediction: generalization and cross-validation [Varoquaux... 2017]
(figure: two candidate fits of y against x)
16. 1 Testing prediction: generalization and cross-validation [Varoquaux... 2017]
(figure: two candidate fits of y against x)
⇒ Need to test on independent, unseen data
Train set / validation set: measures prediction accuracy
sklearn.model_selection.train_test_split
17. 1 Testing prediction: generalization and cross-validation [Varoquaux... 2017]
(figure: two candidate fits of y against x)
⇒ Need to test on independent, unseen data
Loop over splits of the full data into train set / test set
sklearn.model_selection.cross_val_score
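A minimal sketch of the two utilities named on these slides (synthetic data for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_classification(n_samples=200, random_state=0)

# Single split: hold out a validation set
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = LogisticRegression().fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))

# Cross-validation: loop over several train/test splits
scores = cross_val_score(LogisticRegression(), X, y, cv=5)
print("cross-validated accuracy:", scores.mean())
```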
18. 2 Machine learning on rest fMRI
for population imaging: finding differences between subjects in functional connectomes
19. From rest-fMRI to biomarkers
No salient features in rest fMRI
21. From rest-fMRI to biomarkers
Define functional regions
Learn interactions
22. From rest-fMRI to biomarkers
Define functional regions
Learn interactions
Find differences
23. From rest-fMRI to biomarkers
(pipeline figure: RS-fMRI → region definition → time-series extraction → functional-connectivity matrix → supervised learning)
Typical pipeline [Varoquaux and Craddock 2013]:
1. Define regions
2. Extract time series
3. Build functional-connectivity matrix
4. Apply supervised machine learning
24. 2 Defining regions from rest-fMRI
Clustering: nilearn.regions.Parcellations
k-means: fast (in nilearn); no spatial model ⇒ smooth the data
25. 2 Defining regions from rest-fMRI
Clustering: nilearn.regions.Parcellations
k-means: fast (in nilearn); no spatial model ⇒ smooth the data
Ward agglomerative clustering: recursive merges of clusters; the spatial model constrains merges ⇒ fast
26. 2 Defining regions from rest-fMRI
Clustering: nilearn.regions.Parcellations
k-means: fast (in nilearn); no spatial model ⇒ smooth the data
Ward agglomerative clustering: recursive merges of clusters; the spatial model constrains merges ⇒ fast
Decomposition models (figure: the time × voxels data matrix Y written as a mixture of spatial maps S, plus noise E)
27. 2 Defining regions from rest-fMRI
Clustering: nilearn.regions.Parcellations
k-means: fast (in nilearn); no spatial model ⇒ smooth the data
Ward agglomerative clustering: recursive merges of clusters; the spatial model constrains merges ⇒ fast
Decomposition models:
ICA (nilearn.decomposition.CanICA): seeks independence of maps
Sparse dictionary learning (nilearn.decomposition.DictLearning): seeks sparse maps
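A minimal nilearn sketch of the estimators named on these slides (the file names are placeholders, and constructor arguments may vary across nilearn versions):

```python
# func_imgs is a hypothetical list of 4D rest-fMRI files
from nilearn.regions import Parcellations
from nilearn.decomposition import CanICA, DictLearning

func_imgs = ["sub-01_rest.nii.gz", "sub-02_rest.nii.gz"]  # placeholder paths

# Hard parcellation: Ward clustering with a spatial constraint
ward = Parcellations(method="ward", n_parcels=100)
ward.fit(func_imgs)
atlas = ward.labels_img_  # one integer label per voxel

# Soft parcellations: decomposition models
canica = CanICA(n_components=20, random_state=0).fit(func_imgs)
dict_learn = DictLearning(n_components=20, random_state=0).fit(func_imgs)
maps = dict_learn.components_img_  # one spatial map per component
```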
28. 2 For connectome prediction [Dadi... 2018]
(pipeline figure: RS-fMRI → 1 ROIs → 2 time series → 3 functional connectivity → 4 diagnosis)
Choice of regions for best prediction?
30. 2 Region definition: resulting parcellations
(figure: parcellations from dictionary learning, group ICA, Ward clustering, and k-means clustering)
33. 2 Time-series extraction
Extract the ROI-average signal:
Optional low-pass filter (≈ .1 Hz – .3 Hz)
Regress out confounds (movement parameters, CSF & white-matter signals, CompCor, global mean)
Hard parcellations (eg from clustering): nilearn.input_data.NiftiLabelsMasker
Soft parcellations (eg from ICA): nilearn.input_data.NiftiMapsMasker
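A minimal nilearn sketch of this extraction step (file names are placeholders; recent nilearn versions also expose these maskers under nilearn.maskers):

```python
from nilearn.input_data import NiftiLabelsMasker, NiftiMapsMasker

# Hard parcellation: one label image, one averaged signal per region
labels_masker = NiftiLabelsMasker(labels_img="ward_atlas.nii.gz",
                                  standardize=True, low_pass=0.1, t_r=2.0)
ts = labels_masker.fit_transform("sub-01_rest.nii.gz",
                                 confounds="confounds.csv")  # placeholders
print(ts.shape)  # (n_timepoints, n_regions)

# Soft parcellation: signals obtained by regressing the data on the maps
maps_masker = NiftiMapsMasker(maps_img="canica_maps.nii.gz", standardize=True)
ts_soft = maps_masker.fit_transform("sub-01_rest.nii.gz")
```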
34. 2 Connectome: building a connectivity matrix
How to capture and represent interactions?
37. 2 Information geometry: uniform-error parametrization
Subject-specific noise: covariances form a manifold
The tangent space removes coupling in the coefficients
(figure: controls and a patient on the covariance manifold, with the tangent-space embedding dΣ) [Varoquaux... 2010]
39. 2 For connectome prediction [Dadi... 2018]
(pipeline figure: RS-fMRI → 1 ROIs → 2 time series → 3 functional connectivity → 4 diagnosis)
Connectivity matrix: correlation, partial correlations, or tangent space
nilearn.connectome.ConnectivityMeasure
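A minimal sketch of this step (random arrays stand in for per-subject time series):

```python
import numpy as np
from nilearn.connectome import ConnectivityMeasure

# Random arrays standing in for (n_timepoints, n_regions) time series
rng = np.random.default_rng(0)
subject_timeseries = [rng.standard_normal((120, 39)) for _ in range(20)]

# kind can be "correlation", "partial correlation", or "tangent";
# vectorize=True returns one feature vector per subject
conn = ConnectivityMeasure(kind="tangent", vectorize=True)
X = conn.fit_transform(subject_timeseries)
print(X.shape)  # (n_subjects, n_features)
```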
41. 2 Supervised learning step [Dadi... 2018]
(pipeline figure: RS-fMRI → 1 ROIs → 2 time series → 3 functional connectivity → 4 diagnosis)
Supervised learning: stick with linear models
sklearn.linear_model.LogisticRegression
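Tying steps 3 and 4 together, a minimal end-to-end sketch (random placeholder data, so the score should hover around chance):

```python
import numpy as np
from nilearn.connectome import ConnectivityMeasure
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Placeholder data: per-subject time series and diagnosis labels
rng = np.random.default_rng(0)
subject_timeseries = [rng.standard_normal((120, 39)) for _ in range(40)]
y = rng.integers(0, 2, size=40)

# Steps 3 and 4: connectivity features, then a linear classifier
X = ConnectivityMeasure(kind="tangent", vectorize=True).fit_transform(
    subject_timeseries)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print("cross-validated accuracy:", scores.mean())
```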
43. Predicting from brain activity at rest
(pipeline figure: RS-fMRI → 1 ROIs → 2 time series → 3 functional connectivity → 4 diagnosis)
1. Functional regions (eg clustering, decomposition, or the BASC atlas)
2. Filtering and/or confound removal
3. Tangent-space parametrization
4. Supervised linear models (eg SVMs)
44. 3 References I
A. Abraham, E. Dohmatob, B. Thirion, D. Samaras, and G. Varoquaux. Extracting brain regions from rest fMRI with total-variation constrained dictionary learning. In MICCAI, page 607, 2013.
K. Dadi, M. Rahim, A. Abraham, D. Chyzhyk, M. Milham, B. Thirion, and G. Varoquaux. Benchmarking functional connectome-based predictive models for resting-state fMRI. 2018.
G. Varoquaux and R. C. Craddock. Learning and comparing functional connectomes across subjects. NeuroImage, 80:405, 2013.
G. Varoquaux and B. Thirion. How machine learning is shaping cognitive neuroimaging. GigaScience, 3:28, 2014.
G. Varoquaux, F. Baronnet, A. Kleinschmidt, P. Fillard, and B. Thirion. Detection of brain functional-connectivity difference in post-stroke patients using group-level covariance modeling. In MICCAI, 2010.
45. 3 References II
G. Varoquaux, P. R. Raamana, D. A. Engemann, A. Hoyos-Idrobo, Y. Schwartz, and B. Thirion. Assessing and tuning brain decoders: cross-validation, caveats, and guidelines. NeuroImage, 145:166–179, 2017.