The document discusses how Python became a popular language for data science. It describes how scientists and web developers, who have different backgrounds and ways of working, were able to collaborate using Python. NumPy and SciPy provided fast numerical computing capabilities that scientists needed, while packages like Pandas, scikit-learn, and Beautiful Soup enabled data analysis and web scraping. By building on these foundations, the Python community was able to create powerful tools that have made data science widely accessible in Python.
** Python Certification Training: https://www.edureka.co/python **
This Edureka tutorial on "Python Tutorial for Beginners" (Python Blog Series: https://goo.gl/nKQJHQ) covers all the basics of Python. It includes python programming examples, so try it yourself and mention in the comments section if you have any doubts. Following are the topics included in this PPT:
Introduction to Python
Reasons to choose Python
Installing and running Python
Development Environments
Basics of Python Programming
Starting with code
Python Operators
Python Lists
Python Tuples
Python Sets
Python Dictionaries
Conditional Statements
Looping in Python
Python Functions
Python Arrays
Classes and Objects (OOP)
Conclusion
Follow us to never miss an update in the future.
Instagram: https://www.instagram.com/edureka_learning/
Facebook: https://www.facebook.com/edurekaIN/
Twitter: https://twitter.com/edurekain
LinkedIn: https://www.linkedin.com/company/edureka
Python for Science and Engineering: a presentation to A*STAR and the Singapor... (pythoncharmers)
An introduction to Python in science and engineering.
The presentation was given by Dr Edward Schofield of Python Charmers (www.pythoncharmers.com) to A*STAR and the Singapore Computational Sciences Club in June 2011.
Vskills certification for Python Developers assesses a candidate's ability to develop Python-based applications. The certification tests candidates on various areas of developing Python-based software, including knowledge of installation, usage, syntax, and semantics of the Python programming language.
http://www.vskills.in/certification/Certified-Python-Developer
Python Tutorial | Python Tutorial for Beginners | Python Training | Edureka (Edureka!)
This Edureka Python tutorial will help you understand the fundamentals of Python programming, with detailed examples. It covers the following topics:
1. Introduction to Python
2. Who uses Python
3. Features of Python
4. Operators in Python
5. Datatypes in Python
6. Flow Control
7. Functions in Python
8. File Handling in Python
Calling a C++ function from Python
OS: Windows 8.1 (Windows 7 also works)
Compiler: Visual Studio 2013 Update 4
Python: python-3.3.5.msi (x86) [1] (other versions may be incompatible)
SWIG: swigwin-3.0.2 [2] (unzip and copy the swigwin-3.0.2 folder to C:\)
Python Scripting Tutorial for Beginners | Python Tutorial | Python Training |... (Edureka!)
( Python Training : https://www.edureka.co/python-scripting )
This "Python Scripting Tutorial" introduces Python, a scripting language with which you can easily build powerful applications. The video covers the following topics:
1. Scripting language Vs Programming language
2. Introduction to Python
3. Modules in Python
4. AWS scripting Using Boto Module
5. GUI Programming
Python programming | Fundamentals of Python programming (KrishnaMildain)
Basic Fundamentals of Python Programming.
What is Python, History of python, Advantages, Disadvantages, feature of python, scope, and many more.
Data Structure using Python, Object Oriented Programming using
Python Loops Tutorial | Python For Loop | While Loop Python | Python Training... (Edureka!)
This Edureka "Python Loops" tutorial (Python Tutorial Blog: https://goo.gl/wd28Zr) will help you in understanding different types of loops used in Python. You will be learning how to implement all the loops in python practically. Below are the topics covered in this tutorial:
1) Why use loops?
2) What are loops?
3) Types of loops in Python: While, For, Nested
4) Demo on each Python loop
Python is one of the premier open-source languages: flexible, powerful, easy to learn, easy to use, and equipped with powerful libraries for data manipulation and analysis. Its simple, easy-to-learn syntax is very accessible to new programmers and is similar to Matlab, C/C++, Java, or Visual Basic. Python is general purpose and is increasingly adopted for analytical and quantitative computing. For over a decade, Python has been used in scientific computing and highly quantitative domains such as finance, oil and gas, physics, and signal processing.
What is Range Function? | Range in Python Explained | Edureka (Edureka!)
YouTube Link: https://youtu.be/0Hp7AThTZhQ
** Python Certification Training: https://www.edureka.co/data-science-python-certification-course **
This Edureka video on 'Range In Python' will help you understand how we can use Range Function in Python with various examples for better understanding. Following are the topics discussed:
What Is Range In Python?
Range Parameters
Range With For Loop
Float Numbers In Range
Reverse Range In Python
Range vs XRange
Range Function Examples
Points To Remember
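The topics listed above can be illustrated with a short sketch. The values below are illustrative and are not taken from the video; `float_range` is a hypothetical helper, since the built-in `range` accepts only integers.

```python
import math

# range(stop), range(start, stop) and range(start, stop, step)
print(list(range(5)))         # 0 up to, but not including, 5
print(list(range(2, 6)))      # start at 2, stop before 6
print(list(range(0, 10, 3)))  # every third integer below 10

# Range with a for loop (here inside a list comprehension)
squares = [n * n for n in range(4)]

# Reverse range: use a negative step
countdown = list(range(5, 0, -1))

# In Python 3, range is lazy, as xrange was in Python 2:
# it produces values on demand instead of building a list.
lazy = range(10**12)          # cheap to create, nothing is materialized


def float_range(start, stop, step):
    """Hypothetical helper: floats from start up to (roughly) stop, exclusive.

    Values are computed as start + i * step to avoid accumulating
    rounding error; floating-point rounding can still affect the count.
    """
    count = max(0, math.ceil((stop - start) / step))
    return [start + i * step for i in range(count)]


halves = float_range(0.0, 2.0, 0.5)
print(squares, countdown, halves)
```

Computing each float from its index, rather than adding `step` repeatedly, is a design choice that keeps rounding error from growing along the sequence.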
PyTorch Python Tutorial | Deep Learning Using PyTorch | Image Classifier Usin... (Edureka!)
( ** Deep Learning Training: https://www.edureka.co/ai-deep-learning-with-tensorflow ** )
This Edureka PyTorch Tutorial (Blog: https://goo.gl/4zxMfU) will help you in understanding various important basics of PyTorch. It also includes a use-case in which we will create an image classifier that will predict the accuracy of an image data-set using PyTorch.
Below are the topics covered in this tutorial:
1. What is Deep Learning?
2. What are Neural Networks?
3. Libraries available in Python
4. What is PyTorch?
5. Use-Case of PyTorch
6. Summary
Lawrence Berkeley National Laboratory, Sep 2015 - Jupyter Talk
Scientific facilities are increasingly generating large data sets. Next-generation scientific productivity relies on user-friendly tools and efficient, effective and seamless access to resources and data. Traditional approaches to research and software development for science focus on the hardware and software of the machine and do not consider the user. In this talk, I will highlight a different approach to building software for scientific users by including user knowledge in the process. I will illustrate a few example projects where this has been used to date.
GitHub repository: https://github.com/Carreau/talks/tree/master/labtech-2015
Data science calls for rapid experimentation and building intuitions from the data. Yet, data science also underpins crucial decisions and operational logic. Writing production-ready and robust statistical analysis without cognitive overhead may seem a conundrum. I will explore simple, and less simple, practices for fast turnaround and consolidation of data-science code. I will discuss how these considerations led to the design of scikit-learn, which enables easy machine learning yet is used in production. Finally, I will mention some scikit-learn gems, new or forgotten.
Personal point of view on scikit-learn: past, present, and future.
This talk gives a bit of history, mentions exciting developments, and offers a personal vision of the future.
Succeeding in academia despite doing good_software (Gael Varoquaux)
Hacking academia for fun and profit
Thoughts on succeeding in academia despite doing good software
Keynote I gave at the Scipyconf Argentina 2014 conference
The advancement of science is a noble cause, and academia a fierce battlefield for tenure. Software is seen as a mere technicality, not worth a line on an academic CV. I claim that, on the contrary, software is the new medium of the scientific method. I claim that succeeding in academia can be achieved not despite writing good software but via such an accomplishment. The key is to choose the right battles and to win them.
What is the emerging role of software in the scientific workflow? Which are the software challenges that can have impact? How to balance software quality assurance and the quick turn-around random-walk of research? What does "good design" mean for research software? What Python patterns can boost productivity and reuse in exploratory scientific computing?
I will try to answer these questions, based on my personal experience of growing up to become an academic Pythonista.
Machine learning and cognitive neuroimaging: new tools can answer new questions (Gael Varoquaux)
Machine learning is geared towards prediction. However, aside from diagnosis or prognosis in the clinic, cognitive neuroimaging strives to uncover insights from the data rather than to minimize prediction error. I review various inferences on brain function that have been drawn using pattern recognition techniques, focusing on decoding. In particular, I discuss using generalization as a test for information, multivariate analysis to interpret overlapping activation patterns, and decoding for principled reverse inference. Each time, I give a statistical view and a cognitive imaging view.
Brain maps from machine learning? Spatial regularizations (Gael Varoquaux)
Pattern Recognition for NeuroImaging (PR4NI)
We will show empirically how commonly used pattern recognition techniques, such as SVMs, provide low-quality brain maps, even though they give very good prediction accuracy. We will give an overview of recently developed techniques to impose priors on patterns particularly well suited to neuroimaging: selecting a small number of spatially-structured predictive brain regions. These tools reconcile machine learning with brain mapping by giving maps more useful to draw neuroscientific conclusions. In addition, they are more robust to cross-individual spatial variability and thus generalize well across subjects.
Scikit-learn for easy machine learning: the vision, the tool, and the project (Gael Varoquaux)
Scikit-learn is a popular machine learning tool. What can it do for you? Why would you want to use it? What can you do with it? Where is it going? In this talk, I will discuss why and how scikit-learn became popular. I will argue that it is successful because of its vision: it fills an important slot in the rich ecosystem of data science. I will demonstrate how scikit-learn makes predictive analysis easy and yet versatile. I will shed some light on our development process: how do we, as a community, ensure the quality and the growth of scikit-learn?
Inter-site autism biomarkers from resting state fMRI (Gael Varoquaux)
We present an automated pipeline to learn predictive biomarkers from resting-state fMRI. We apply it to classifying autism on unseen sites, demonstrating the feasibility of biomarkers on weakly standardized functional imaging data.
We study the steps of the pipeline that are important to predict and can show that 1) the choice of atlas is the most important choice. Ideally the atlas should be made of functional regions learned from the data. 2) "tangent space" parametrization of the connectivity is the best performer.
We conclude with general recommendations for predictive biomarkers from resting-state fMRI.
Building a cutting-edge data processing environment on a budget (Gael Varoquaux)
As a penniless academic I wanted to do "big data" for science. Open source, Python, and simple patterns were the way forward. Staying on top of today's growing datasets is an arms race. Data analytics machinery (clusters, NOSQL, visualization, Hadoop, machine learning, ...) can spread a team's resources thin. Focusing on simple patterns, lightweight technologies, and a good understanding of the applications gets us most of the way for a fraction of the cost.
I will present a personal perspective on ten years of scientific data processing with Python. What are the emerging patterns in data processing? How can modern data-mining ideas be used without a big engineering team? What constraints and design trade-offs govern software projects like scikit-learn, Mayavi, or joblib? How can we make the most out of distributed hardware with simple framework-less code?
High-level talk about machine learning: the statistical and computational challenges, as well as how they can be answered by the scikit-learn Python toolkit. In French.
Processing biggish data on commodity hardware: simple Python patterns (Gael Varoquaux)
Scipy 2013 talk on simple Python patterns to efficiently process large datasets.
The talk focuses on the patterns and the concepts rather than the implementations. The implementations can be found by looking at the joblib and scikit-learn codebases.
Talk given at PRNI 2016 for the paper https://arxiv.org/pdf/1606.06439v1.pdf
Abstract: Spatially-sparse predictors are good models for brain decoding: they give accurate predictions and their weight maps are interpretable as they focus on a small number of regions. However, the state of the art, based on total variation or graph-net, is computationally costly. Here we introduce sparsity in the local neighborhood of each voxel with social-sparsity, a structured shrinkage operator. We find that, on brain imaging classification problems, social-sparsity performs almost as well as total-variation models and better than graph-net, for a fraction of the computational cost. It also very clearly outlines predictive regions. We give details of the model and the algorithm.
Connectomics: Parcellations and Network Analysis Methods (Gael Varoquaux)
Simple tutorial on methods for functional connectome analysis: learning regions, extracting functional signal, inferring the network structure, and comparing it across subjects.
Brain reading, compressive sensing, fMRI and statistical learning in Python (Gael Varoquaux)
Talk given at Gipsa-lab on using machine learning to learn from fMRI brain patterns and regions related to behavior. This talk focuses on the signal and inverse-problem aspects of the equation, as well as on the software.
Why Python Should Be Your First Programming Language (Edureka!)
Programmers love Python because of how fast and easy it is to use. Python cuts development time in half with its simple-to-read syntax and easy compilation feature. Debugging your programs is a breeze in Python with its built-in debugger. Using Python makes programmers more productive and their programs ultimately better. Python continues to be a favourite option for data scientists, who use it for building and using machine learning applications and other scientific computations.
Python runs on Windows, Linux/Unix, Mac OS and has been ported to Java and .NET virtual machines. Python is free to use, even for the commercial products, because of its OSI-approved open source license.
Python has evolved into the most preferred language for data analytics, and the increasing search trends on Python also indicate that Python is the next "Big Thing" and a must for professionals in the data analytics domain.
This is a Python course for beginners, intended for both classroom teaching and self-study.
The course is designed for 2 days of class followed by another week of homework assignments.
Tweepy is an open source Python package that gives you a very convenient way to access the Twitter API with Python. Tweepy includes a set of classes and methods that represent Twitter's models and API endpoints, and it transparently handles various implementation details, such as data encoding and decoding.
Python has been the fastest growing language globally in the last 5 years. In India, the average salary for a Python developer is 4.8 lacs per annum at the entry level. This deck gets you started on fundamentals of Python, Python constructs and guides you to set up the environment and write your first Python program. It introduces you to the Spotle Certificate Masterclass In Python which covers the complete Python fundamentals through interactive videos, live classes, integrated hands-ons and projects.
What is Python? An overview of Python for science. (Nicholas Pringle)
A brief introduction to the use of Python for scientists. Python is fast becoming a popular programming language for scientists. It is free, open source and constantly improving. Being an easy language to learn, it has a large community of users. Its many favourable qualities make it the perfect language for scientific collaboration.
** Python Certification Training: https://www.edureka.co/python **
This Edureka PPT on Python Projects will help you establish a foothold on Python by helping you assess and obtain skills which are used to design, develop and analyze projects built in Python.
1. Introduction to Python
2. Installation and Working with Python
3. Python Projects: 3 levels
4. Practical approach - Code
Python Tutorial Playlist: https://goo.gl/WsBpKe
Blog Series: http://bit.ly/2sqmP4s
Are you ready to embark on an exciting journey into the realm of programming, fuelled by your passion for computer science? If so, diving headfirst into the world of Python programming language can serve as your launchpad to a plethora of opportunities. In this blog post, we'll unveil five valuable tips to expedite your Python learning process, guiding you towards proficiency and success in your programming pursuits.
Useful Link: https://maps.app.goo.gl/MwBrdxNaFdj1JPjb7
Are you looking to embark on a rewarding career in web development? Look no further! Our comprehensive Python Full Stack Training in Noida is designed to equip you with the skills and knowledge you need to become a proficient web developer. In today's digital age, web development is a highly sought-after skill, and Python has emerged as a popular choice among developers. Its simplicity, versatility, and vast ecosystem of libraries and frameworks make it a powerful tool for building dynamic and interactive websites. At our training institute in Noida, we offer a well-structured and hands-on Python Full Stack Training program that covers every aspect of web development. Whether you're a beginner or an experienced programmer, our course is tailored to meet the needs of learners at all levels.
http://rb.gy/2jvud
Similar to Scientist meets web dev: how Python became the language of data (20)
Evaluating machine learning models and their diagnostic value (Gael Varoquaux)
Model evaluation is, in my opinion, the most overlooked step of the machine-learning pipeline. Reliably estimating a model's performance for a given purpose is crucial and difficult. In this talk, I first discuss choosing metrics that are informative for the application, stressing the importance of class prevalence in classification settings. I then discuss procedures to estimate generalization performance, drawing a distinction between evaluating a learning procedure and evaluating a prediction rule, and discussing how to give confidence intervals for the performance estimates.
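One standard way to see why class prevalence matters is the positive predictive value given by Bayes' rule. The sketch below is an illustration under assumed numbers (a classifier with 90% sensitivity and 90% specificity); it is not taken from the talk, and the function name is hypothetical.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability that a positive prediction is a true positive (Bayes' rule).

    Hypothetical helper for illustration; all inputs are fractions in [0, 1].
    """
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive_rate / (true_positive_rate + false_positive_rate)


# Same classifier, two assumed populations: balanced classes vs. a rare condition.
balanced = positive_predictive_value(0.9, 0.9, prevalence=0.5)
rare = positive_predictive_value(0.9, 0.9, prevalence=0.01)

print(f"PPV at 50% prevalence: {balanced:.3f}")
print(f"PPV at 1% prevalence:  {rare:.3f}")
```

With these assumed numbers, most positive predictions are correct when the classes are balanced, but only about one in twelve is correct when the condition affects 1% of the population, even though sensitivity and specificity are unchanged.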
Measuring mental health with machine learning and brain imaging (Gael Varoquaux)
The study of mental health relies heavily on behavioral testing and questionnaires. I discuss how machine learning on large brain-imaging cohorts can open new avenues for markers of mental health. My claims are that the challenge is the amount of diagnosed conditions rather than the heterogeneity of the conditions, and that we should turn to proxy labels. I discuss another fundamental challenge to this agenda: the external and construct validity of brain-imaging based markers.
A tutorial on machine learning to build prediction models with missing values.
The slides cover both theoretical results (statistical learning) and practical advice, with a focus on implementation in Python with scikit-learn.
Dirty data science: machine learning on non-curated data (Gael Varoquaux)
These slides are a one-hour course on machine learning with non-curated data.
According to industry surveys, the number one hassle of data scientists is cleaning the data to analyze it. Here, I survey which kinds of "dirtyness" force time-consuming cleaning. We will then cover two specific aspects of dirty data: non-normalized entries and missing values. I show how, for these two problems, machine-learning practice can be adapted to work directly on a data table without curation. The normalization problem can be tackled by adapting methods from natural language processing. The missing-values problem will lead us to revisit classic statistical results in the setting of supervised learning.
Representation learning in limited-data settings (Gael Varoquaux)
A 4-hour didactic course on simple notions of representations and how to use them in limited-data settings:
- A supervised learning point of view, giving intuitions and math on what representations are and why they matter
- Building simple unsupervised learning models to extract representation: from matrix decomposition for signals to embeddings of entities
- Evaluating models in limited-data settings, often a bottleneck
This slide-deck was given as a course at the 2021 DeepLearn summer school.
Better neuroimaging data processing: driven by evidence, open communities, an... (Gael Varoquaux)
My current thoughts about methods validity and design in brain imaging.
Data processing is a significant part of a neuroimaging study. The choice of corresponding methods and tools is crucial. I will give an opinionated view on a path to building better data processing for neuroimaging. I will draw examples from endeavors that I contributed to: defining standards for functional-connectivity analysis, the nilearn neuroimaging tool, and the scikit-learn machine-learning toolbox (an industry standard with a million regular users). I will cover not only the technical process (statistics, signal processing, software engineering) but also the epistemology of methods development. Methods govern our results; they are more than a technical detail.
Functional-connectome biomarkers to meet clinical needs? (Gael Varoquaux)
Extracting Functional-Connectome Biomarkers with Machine Learning: a talk in the symposium "How do current predictive connectivity models meet clinician's needs?"
This talk is a bit provocative and first sets visions, before bringing a few technical suggestions.
Atlases of cognition with large-scale human brain mapping (Gael Varoquaux)
Cognitive neuroscience uses neuroimaging to identify brain systems engaged in specific cognitive tasks. However, linking unequivocally brain systems with cognitive functions is difficult: each task probes only a small number of facets of cognition, while brain systems are often engaged in many tasks. We develop a new approach to generate a functional atlas of cognition, demonstrating brain systems selectively associated with specific cognitive functions. This approach relies upon an ontology that defines specific cognitive functions and the relations between them, along with an analysis scheme tailored to this ontology. Using a database of thirty neuroimaging studies, we show that this approach provides a highly-specific atlas of mental functions, and that it can decode the mental processes engaged in new tasks.
Similarity encoding for learning on dirty categorical variables (Gael Varoquaux)
For statistical learning, categorical variables in a table are usually considered as discrete entities and encoded separately to feature vectors, e.g., with one-hot encoding. "Dirty" non-curated data gives rise to categorical variables with a very high cardinality but redundancy: several categories reflect the same entity. In databases, this issue is typically solved with a deduplication step. We show that a simple approach that exposes the redundancy to the learning algorithm brings significant gains. We study a generalization of one-hot encoding, similarity encoding, that builds feature vectors from similarities across categories. We perform a thorough empirical validation on non-curated tables, a problem seldom studied in machine learning. Results on seven real-world datasets show that similarity encoding brings significant gains in prediction in comparison with known encoding methods for categories or strings, notably one-hot encoding and bag of character n-grams. We draw practical recommendations for encoding dirty categories: 3-gram similarity appears to be a good choice to capture morphological resemblance. For very high-cardinality, dimensionality reduction significantly reduces the computational cost with little loss in performance: random projections or choosing a subset of prototype categories still outperforms classic encoding approaches.
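The idea of similarity encoding described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function names, the example categories, the padding with spaces, and the use of a Jaccard overlap of character 3-grams as the similarity are all assumptions made for the example.

```python
def char_ngrams(text, n=3):
    """Set of character n-grams of a lower-cased, space-padded string."""
    padded = f" {text.lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_similarity(a, b, n=3):
    """Assumed similarity: shared n-grams divided by all n-grams (Jaccard)."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a and not grams_b:
        return 1.0  # two strings too short to have n-grams count as identical
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def similarity_encode(values, prototypes, n=3):
    """Encode each value as its similarities to a list of prototype categories.

    One-hot encoding is the special case where the similarity is 1 for an
    exact match and 0 otherwise.
    """
    return [[ngram_similarity(v, p, n) for p in prototypes] for v in values]


# Hypothetical dirty column: the second entry is a misspelling of the first.
prototypes = ["police officer", "teacher"]
rows = similarity_encode(["police officer", "Police Oficer", "teacher"], prototypes)

for row in rows:
    print([round(x, 2) for x in row])
```

Unlike one-hot encoding, the misspelled entry receives a feature vector close to that of the category it resembles, which is how the redundancy between categories is exposed to the learning algorithm.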
Machine learning for functional connectomes
Gael Varoquaux
A tutorial on using machine-learning for functional-connectomes, for instance on resting-state fMRI. This is typically useful for population imaging: comparing traits or conditions across subjects.
Towards psychoinformatics with machine learning and brain imaging
Gael Varoquaux
Informatics in the psychological sciences brings fascinating challenges, as mental processes and pathologies have fuzzy definitions and are hard to quantify. Brain imaging brings rich data on the neural substrate of these concepts, yet the link is non-trivial.
The goal of this presentation is to put forward basic ideas of "psychoinformatics", using advanced processing on brain images to quantify better the elements of psychology.
It discusses how machine learning can bridge brain images to behavior: to describe better mental processes involved in brain activity, or to extract biomarkers of pathologies, individual traits, or cognition.
A tutorial on Machine Learning, with illustrations for MR imaging
Gael Varoquaux
Machine learning builds predictive models from data. It is massively used on medical images these days, for a variety of applications ranging from segmentation to diagnosis.
This is an introductory tutorial to machine learning, giving intuitions from the statistical point of view. It introduces the methodology, the concepts behind the central models, the validation framework, and some caveats to look for.
It also discusses some applications to drawing conclusions from brain imaging, and uses these applications to highlight various technical aspects of running machine learning models on high-dimensional data such as medical imaging.
Scikit-learn and nilearn: Democratisation of machine learning for brain imaging
Gael Varoquaux
This talk describes our efforts to bring easily usable machine learning to brain mapping. It covers both the questions that machine learning can answer and two software packages developed to facilitate machine learning and its application to neuroimaging.
Computational practices for reproducible science
Gael Varoquaux
Reconciling bleeding-edge scientific results and reproducible research may seem a conundrum in our fast-paced high-pressure academic world. I discuss the practices that I found useful in computational work. At a high level, it is important to navigate the space between rapid experimentation and industrial-grade software development. I advocate adopting more and more software-engineering best practices as a project matures. I will also discuss how to turn the computational work into libraries, and to ensure the quality of the resulting libraries. And I conclude on how those libraries need to fit in the larger picture of the exercise of research to give better science.
Slides for my keynote at Scipy 2017
https://youtu.be/eVDDL6tgsv8
Computing has been driving forward a revolution in how science and technology can solve new problems. Python has grown to be a central player in this game, from computational physics to data science. I would like to explore some lessons learned doing science with Python as well as doing Python libraries for science. What are the ingredients that the scientists need? What technical and project-management choices drove the success of projects I've been involved with? How do these demands and offers shape our ecosystem?
In this talk, I'd like to share a few thoughts on how we code for science and innovation, with the modest goal of changing the world.
Estimating Functional Connectomes: Sparsity's Strength and Limitations
Gael Varoquaux
Talk given at the OHBM 2017 education course.
I present the challenges and techniques of estimating meaningful brain functional connectomes from fMRI: why sparsity in inverse covariance leads to models that can be interpreted as interactions between regions.
Then I discuss the limitations of sparse estimators and introduce shrinkage as an alternative. Finally, I discuss how to compare multiple functional connectomes.
Scientist meets web dev: how Python became the language of data
1. Scientist meets web dev:
how Python became the language of data
Gaël Varoquaux
2. Very diverse community
This talk: a reflection on what we have in common, Python
I am talking about
things you don't understand (my science)
and things I don't understand (web dev)
3. I actually did a PhD
in quantum physics
Hence I think I qualify as
a "scientist"
G Varoquaux 3
5. I now do computer science for neuroscience
Try to link neural activity to thoughts and cognition
We attack it as a machine learning problem
Python software: nilearn
G Varoquaux 5
6. On the way, we created
a machine-learning library:
scikit-learn
G Varoquaux 6
8. Data science with Python is hot
Huge success.
Cool.
Data science is THE thing.
Python is the go-to language
How did it happen?
We built scikit-learn, others pandas, etc...,
but these were built on solid foundations
G Varoquaux 7
9. 1 Scientists come from Jupiter
And web devs from Saturn?
And sysadmins from Neptune?
G Varoquaux 8
10. 1 We're different
numbers (in arrays)
arrays (of numbers)
arrays
arrays
strings
databases
object-oriented programming
flow control
A bit of a culture gap
G Varoquaux 9
13. 1 Let's do something together: sort EuroPython site
205 talks:
How OpenStack makes Python better (and vice-versa)
Introduction to aiohttp
So you think your Python startup is worth $10 million...
SQLAlchemy as the backbone of a Data Science company
Learn Python The Fun Way
Scaling Microservices with Crossbar.io
If you can read this you don't need glasses
Let's find some common topics with data science
Abstract excerpts:
"Anyone who has used Python to search text for substring patterns has at least heard of the regular expression module. Many of us use it extensively for parsers and lexers,"
"The py.test tool presents a rapid and simple way to write tests for your Python code. This training gives a quick introduction with exercises into some distinguishing features."
"Chat with the core developers about how to extend django CMS or how to integrate your own apps seamlessly. Lets talk about your plugins, apphooks, toolbar extensions"
import urllib2, bs4
import sklearn, wordcloud
G Varoquaux 10
14. 1 Let's do something together: sort EuroPython site
Crawl
the schedule to get a list of titles and URLs
talk pages to retrieve abstract and tags
bs4: beautiful soup, matchings on the DOM tree
G Varoquaux 11
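A minimal offline sketch of the crawl step, for readers who want to see the shape of it. The slide uses bs4; this stand-in relies only on the standard library's html.parser so that it runs without extra installs. The markup, the "talk" class name, and the URLs are made up for illustration and are not the real schedule page.

```python
from html.parser import HTMLParser


class TalkLinkParser(HTMLParser):
    """Collect (title, url) pairs from <a class="talk"> elements."""

    def __init__(self):
        super().__init__()
        self.talks = []
        self._href = None  # href of the talk link currently being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("class") == "talk":
            self._href = attrs.get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.talks.append(("".join(self._text).strip(), self._href))
            self._href = None


# Hypothetical schedule markup, inlined so that the sketch runs offline.
SCHEDULE_HTML = """
<ul>
  <li><a class="talk" href="/talks/aiohttp">Introduction to aiohttp</a></li>
  <li><a class="talk" href="/talks/fun">Learn Python The Fun Way</a></li>
  <li><a href="/about">About</a></li>
</ul>
"""

parser = TalkLinkParser()
parser.feed(SCHEDULE_HTML)
talks = parser.talks
```

In a real crawl the HTML would come from an HTTP request, and each talk URL would be fetched in turn to retrieve the abstract and tags.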
17. 1 Let's do something together: sort EuroPython site
Vectorize
Term          Term freq   All docs   Ratio
a                    20       1321    .015
can                  10        540    .018
code                  4        208    .019
is                   14        964    .014
module                3        123    .023
profiling             2          7    .286
performance           1          6    .167
Python                9        191    .047
the                  18       1450    .012
TF-IDF in scikit-learn
sklearn.feature_extraction.text.TfidfVectorizer
G Varoquaux 11
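The "Ratio" column above divides a term's count in one document by its count over all documents. A small standard-library sketch of that ratio follows; the three toy documents are invented. Note that TfidfVectorizer does not compute this plain ratio: by default it applies a log-scaled inverse document frequency and normalizes each row, so this is only an illustration of the idea.

```python
from collections import Counter

# Invented toy corpus, one string per "abstract".
docs = [
    "python code performance profiling",
    "the python module is a code module",
    "the test tool can test the code",
]


def term_ratios(doc, corpus):
    """Ratio of each term's count in `doc` to its count over the corpus."""
    in_doc = Counter(doc.split())
    in_all = Counter(word for d in corpus for word in d.split())
    return {term: count / in_all[term] for term, count in in_doc.items()}


ratios = term_ratios(docs[0], docs)
```

Terms that appear everywhere ("code") get a low ratio, while terms specific to one document ("profiling") get a high one, which is what makes them useful for finding topics.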
19. 1 Let's do something together: sort EuroPython site
Term-document matrix
[Figure: a matrix of counts, with documents along one axis and the terms (the, Python, performance, profiling, module, is, code, can, a) along the other]
Can be a sparse matrix
[Figure: the same matrix with its zero entries left blank]
G Varoquaux 12
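To make "can be a sparse matrix" concrete, here is a sketch that stores only the non-zero counts in a dictionary keyed by (row, column). The numbers are invented, and this dictionary-of-keys layout is just one simple scheme; scikit-learn's vectorizers return scipy.sparse matrices, which use more compact formats.

```python
# A small dense term-document matrix: one row per document, one column per term.
terms = ["the", "Python", "performance", "profiling", "module"]
dense = [
    [3, 0, 0, 7, 0],
    [0, 0, 9, 0, 0],
    [0, 4, 0, 0, 1],
]

# Dictionary-of-keys storage: keep only the non-zero counts.
sparse = {
    (row, col): count
    for row, counts in enumerate(dense)
    for col, count in enumerate(counts)
    if count != 0
}


def lookup(matrix, row, col):
    """Return the stored count, or 0 for an entry that was never stored."""
    return matrix.get((row, col), 0)


stored = len(sparse)
total = len(dense) * len(dense[0])
print(f"{stored} of {total} entries stored")
```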
20. 1 Let's do something together: sort EuroPython site
[Figure: the documents × terms matrix factorized into a topics × terms matrix and a documents × topics matrix]
What terms are in a topic
What documents are in a topic
A matrix factorization
Often with non-negative constraints
sklearn.decomposition.NMF
G Varoquaux 13
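A rough sketch of a non-negative matrix factorization, to show what the two factors look like. It uses the classic multiplicative-update rule written directly in NumPy, which is an assumption made for brevity and not necessarily how sklearn.decomposition.NMF is implemented. The function name, the toy matrix, and the iteration count are all illustrative.

```python
import numpy as np


def nmf(X, n_topics, n_iter=200, eps=1e-9, seed=0):
    """Factorize non-negative X (documents x terms) as W @ H.

    W is documents x topics, H is topics x terms; both stay non-negative.
    """
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], n_topics))
    H = rng.random((n_topics, X.shape[1]))
    for _ in range(n_iter):
        # Multiplicative updates: factors are only rescaled by
        # non-negative ratios, so they never become negative.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Toy counts: documents 0-1 share terms 0-1, documents 2-3 share terms 2-3.
X = np.array([
    [3.0, 2.0, 0.0, 0.0],
    [6.0, 4.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 5.0],
    [0.0, 0.0, 2.0, 10.0],
])
W, H = nmf(X, n_topics=2)
error = np.linalg.norm(X - W @ H)
```

Rows of H answer "what terms are in a topic", and rows of W answer "what topics are in a document".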
21. 1 Let's do something together: sort EuroPython site
EuroPython abstracts
Topic 1
G Varoquaux 14
22. 1 Let's do something together: sort EuroPython site
EuroPython abstracts
Topic 2
G Varoquaux 14
23. 1 Let's do something together: sort EuroPython site
EuroPython abstracts
Topic 3
G Varoquaux 14
25. 1 Let's do something together: sort EuroPython site
EuroPython abstracts
Add one of Python's great templating engines
... get a usable website
https://gaelvaroquaux.github.io/my_topics/ep16
G Varoquaux 14
29. Want to try it?
$ pip install scikit-learn
...
ImportError: Numerical Python (NumPy) is not installed.
scikit-learn requires NumPy >= 1.6.1
C:> pip install numpy
...
error: Unable to find vcvarsall.bat
G Varoquaux 15
31. 1 We're different
Well
fast linear algebra
ATLAS (Fortran) 70x faster
libfortran.so.3 ??
you're kidding me
Packaging is a major roadblock for scientific Python
A lot of compiled code + shared libraries
→ library + ABI compatibility issues
Progress:
- Manylinux wheels: PEP 513, RT. McGibbon, NJ. Smith
rely on a conservative core set of libs
- Openblas: pure-C, fast linear algebra
G Varoquaux 16
32. 1 We're different
But working together gives us awesome things
Text mining → intelligent interfaces
G Varoquaux 17
33. 2 The scientist's view of code
Numerics versus control flow
Numerics versus databases
Numerics versus strings
Numerics versus the world
G Varoquaux 18
35. 2 Why we love numpy
100 000 term frequency vs inverse doc frequency:
In [*]: %timeit [t * i for t, i in izip(tf, idf)]
100 loops, best of 3: 6.2 ms per loop
The numpy style:
In [*]: %timeit tf * idf
1000 loops, best of 3: 74.2 µs per loop
[Figure: time per element (1 ns to 1 µs) against number of elements (10^2 to 10^6), for lists and numpy]
Array computing can be more readable
tf * idf
vs
[t * i for t, i in izip(tf, idf)]
G Varoquaux 19
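The comparison above can be rerun outside IPython. This sketch assumes Python 3, where the built-in zip plays the role of the slide's izip, and uses random stand-in data. Timings depend on the machine, so none are quoted here.

```python
import timeit

import numpy as np

n = 100_000
rng = np.random.default_rng(0)
tf = rng.random(n)   # stand-in term frequencies
idf = rng.random(n)  # stand-in inverse document frequencies

# Pure-Python style: one interpreted multiplication per element.
tf_list, idf_list = tf.tolist(), idf.tolist()
product_list = [t * i for t, i in zip(tf_list, idf_list)]

# Array style: a single vectorized multiplication.
product_array = tf * idf

loop_time = timeit.timeit(
    lambda: [t * i for t, i in zip(tf_list, idf_list)], number=10)
array_time = timeit.timeit(lambda: tf * idf, number=10)
print(f"list: {loop_time / 10:.2e} s, numpy: {array_time / 10:.2e} s")
```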
37. 2 arrays are nothing but pointers
A numpy array = memory address + data type + shape + strides
[Figure: a 2D grid of numbers annotated with shape 1, shape 2, stride 1 and stride 2, next to the same numbers laid out contiguously in memory]
Represents any regular data in a structured way: how to access elements via pointer arithmetic (computing offsets)
Matches the memory model of numerical libraries
→ Enables copyless interactions
Numpy is really a memory model
G Varoquaux 20
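The offset computation can be checked by hand. In this sketch, the helper element_at is a made-up name; it recomputes an element's position in the underlying buffer from the strides alone, for both an array and its transposed view.

```python
import numpy as np

buffer = np.arange(12, dtype=np.int64)  # the underlying block of memory
a = buffer.reshape(3, 4)                # a view: same memory, 2D shape
t = a.T                                 # transposed view: strides swapped, no copy

print(a.shape, a.strides)  # strides are given in bytes
print(t.shape, t.strides)


def element_at(view, i, j):
    """Read view[i, j] from the shared buffer using only the strides."""
    offset_bytes = i * view.strides[0] + j * view.strides[1]
    return buffer[offset_bytes // buffer.itemsize]
```

Because transposing only swaps the strides, no data moves: both views describe the same memory, which is what makes copy-free exchange with numerical libraries possible.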
38. 2 Array computing is fast
[Figure: time per element (1 ns to 1 µs) against number of elements (10^2 to 10^6), for lists and numpy]
tf_idf = tf * idf
No type checking
Direct sequential memory access
Vector operations (SIMD)
G Varoquaux 21
40. 2 Array computing is limited by CPU starvation
[Figure: time per element (1 ns to 5 ns) against number of elements (10^3 to 10^6); 10^5 elements ∼ size of the CPU cache]
Memory is much slower than CPU
tf_idf = tf * idf
[Figure: only part of each array fits in the CPU cache]
2x slowdown past a certain size
G Varoquaux 22
41. 2 Array computing is limited by CPU starvation
[Figure: time per element (1 ns to 5 ns) against number of elements (10^3 to 10^6)]
Memory is much slower than CPU
tf_idf = tf * idf - 1
It gets worse for complex expressions
G Varoquaux 22
44. 2 Array computing is limited by CPU starvation
tf_idf = tf * idf - 1
What's going on:
1. tmp ← tf * idf
2. tf_idf ← tmp - 1
Big temporary: moving data in & out of cache
In [*]: %timeit tf * idf
10000 loops, best of 3: 74.2 µs per loop
In [*]: %timeit tf * idf - 1
1000 loops, best of 3: 418 µs per loop
In-place operations: reuse the allocation
In [*]: %timeit tmp = tf * idf; tmp -= 1
10000 loops, best of 3: 112 µs per loop
G Varoquaux 22
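A short sketch of the in-place idea, with random stand-in data. All three variants give the same numbers; they differ only in how many full-size arrays get allocated. The explicit out= form is an addition here, shown because it makes the reuse of the buffer visible.

```python
import numpy as np

rng = np.random.default_rng(0)
tf = rng.random(100_000)
idf = rng.random(100_000)

# Two-step expression: `tf * idf` allocates a temporary array, then
# `- 1` allocates a second array for the final result.
tf_idf = tf * idf - 1

# In-place variant: one allocation, then the subtraction overwrites it.
tmp = tf * idf
tmp -= 1

# Same idea with an explicit, preallocated output buffer.
out = np.empty_like(tf)
np.multiply(tf, idf, out=out)
np.subtract(out, 1, out=out)
```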
46. 2 Array computing is limited by CPU starvation
[Figure: time per element (1 ns to 5 ns) against number of elements (10^3 to 10^6), for numpy and np inplace]
tmp = tf * idf
tmp -= 1
A compilation problem: turning
tf_idf = tf * idf - 1
into
tf_idf = tf * idf
tf_idf -= 1
G Varoquaux 22
47. 2 Array computing is limited by CPU starvation
[Figure: time per element (1 ns to 5 ns) against number of elements (10^3 to 10^6), for numpy, np inplace, and numexpr]
A compilation problem:
• Removing/reusing temporaries
• Operating on "chunks" that fit in cache
Addressed by numexpr, with string expressions
numexpr.evaluate('tf * idf - 1', locals())
G Varoquaux 22
48. 2 Array computing is limited by CPU starvation
A compilation problem:
• Removing/reusing temporaries
• Operating on "chunks" that fit in cache
Addressed by numexpr, with string expressions
Addressed by numba, with bytecode inspection
lazyarray
Similar problem to pagination with SQL queries
G Varoquaux 22
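The two bullet points (no temporaries, cache-sized chunks) can be imitated by hand in NumPy. This is only a sketch of the idea and not how numexpr or numba work internally; the function name and the chunk size of 4096 elements are arbitrary choices for illustration.

```python
import numpy as np


def chunked_tf_idf(tf, idf, chunk_size=4096):
    """Compute tf * idf - 1 one chunk at a time, without a full temporary."""
    out = np.empty_like(tf)
    for start in range(0, tf.shape[0], chunk_size):
        block = slice(start, start + chunk_size)
        # Both steps write into the same output chunk.
        np.multiply(tf[block], idf[block], out=out[block])
        out[block] -= 1
    return out


rng = np.random.default_rng(0)
tf = rng.random(100_000)
idf = rng.random(100_000)
result = chunked_tf_idf(tf, idf)
```

The Python-level loop adds overhead per chunk, which is one reason the slide frames this as a job for a compiler: chunks that are too small pay for the loop, chunks that are too big fall out of cache.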
49. 2 Array computing is limited by CPU starvation
tf_idf = tf * idf
Too small:
overhead
Too BIG:
Out of cache
BIG Data
$$$
G Varoquaux 23
51. 2 Numerics versus control flow
What if there is an if
tf_idf = tf / idf
tf_idf[idf == 0] = 0
Suppose that we are looking at ages in a population:
ages[gender == 'male'].mean() - ages[gender == 'female'].mean()
This is really starting to look like databases
pandas: something in between arrays and an in-memory database
Great for queries, less great for numerics.
G Varoquaux 24
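Both patterns from this slide run as-is on small arrays. The sketch below uses invented numbers; for the division it swaps in the where= argument of np.divide, which is a variation on the slide's two-line version that avoids dividing by zero in the first place.

```python
import numpy as np

tf = np.array([2.0, 0.0, 3.0, 1.0])
idf = np.array([4.0, 0.0, 0.0, 2.0])

# Divide only where idf is non-zero; the other entries keep the value 0.
tf_idf = np.zeros_like(tf)
np.divide(tf, idf, out=tf_idf, where=idf != 0)

# Group comparison with boolean masks (made-up ages and labels).
ages = np.array([31, 45, 27, 52, 38])
gender = np.array(["male", "female", "female", "male", "female"])
gap = ages[gender == "male"].mean() - ages[gender == "female"].mean()
```

The mask `gender == "male"` plays the role of a WHERE clause, which is why this style starts to resemble a database query.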
57. numerics vs databases
numerics efficient on regularly spaced data
But numpy creates cache misses for big arrays
→ Need to remove temporaries and chunk data
selection and grouping efficient with indexes or trees
→ Need to group queries
Compilation is unpythonic
A computation & query language? numexpr
I hate domain-specific languages (SQL)
Numpy is very expressive
PonyORM: Compiling Python to optimized SQL
Datascience with SQL: Ibis & Blaze
G Varoquaux 26
58. numerics vs databases
numerics efficient on regularly spaced data
But numpy creates cache misses for big arrays
→ Need to remove temporaries and chunk data
selection and grouping efficient with indexes or trees
→ Need to group queries
Spark: java-world "big data" rising star
combines distributed store + computing model
We (scikit-learn) are faster when data fits in RAM
G Varoquaux 26
60. Operations on chunks, or algorithms on chunks
Machine learning, data mining = numerics
[Figure: blocks of numeric data]
ETL (extract, transform, & load)
Multivariate statistics
G Varoquaux 27
62. Making the data-science magic happen
from sklearn import
[Figure: the term-document matrix factorized into a topics × terms matrix and a documents × topics matrix]
What terms are in a topic
What documents are in a topic
G Varoquaux 28
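The three matrices in the figure can be reproduced on a toy corpus. The slide does not say which estimator it has in mind, so the choice of `NMF` here is an assumption, and the six documents are invented for illustration.

```python
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus, made up for this sketch.
documents = [
    "python code profiling performance",
    "profiling python module performance",
    "the module is code",
    "soup recipe with tomato and basil",
    "tomato basil pasta recipe",
    "basil soup and pasta",
]

# documents x terms count matrix
vectorizer = CountVectorizer()
doc_term = vectorizer.fit_transform(documents)

# Factorize it into documents x topics and topics x terms
# (NMF is an assumed choice, not taken from the slides).
model = NMF(n_components=2, init="nndsvda", random_state=0, max_iter=500)
doc_topic = model.fit_transform(doc_term)
topic_term = model.components_

# What terms are in a topic: the highest-weighted columns of each row.
terms = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)
for row in topic_term:
    print([terms[i] for i in row.argsort()[::-1][:3]])
```

Reading `doc_topic` row by row answers the other question, which documents belong to a topic.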
63. Making the data-science magic happen
from sklearn import
Turning applied maths papers to robust code
High-level, readable, simple syntax reduces cognitive load
Thanks
68. 3 Ingredients for future data flows
Distributed computation & Run-time analysis
Reflexivity is central
Debugging
Interactive work
Code analysis
Persistence
Parallel computing
Pickle
distribute code and data without data model
serialize intermediate results
deep hash of any data structure: joblib.hash
Very limited (eg no lambda #19272)
→ variants: dill, cloudpickle
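A short standard-library sketch of both sides of pickle: data round-trips without any declared model, but a lambda cannot be serialized. The variable names are illustrative only.

```python
import pickle
from math import sqrt

# A function importable by name is pickled as a reference (module + name).
restored = pickle.loads(pickle.dumps(sqrt))

# A lambda has no importable name, so the standard pickler refuses it.
square = lambda x: x * x
try:
    pickle.dumps(square)
    lambda_pickled = True
except (pickle.PicklingError, AttributeError):
    lambda_pickled = False

# Plain data structures round-trip without any declared data model.
intermediate = {"step": 3, "values": [1.0, 2.5], "tags": ("a", "b")}
roundtrip = pickle.loads(pickle.dumps(intermediate))
```

The variants named on the slide, dill and cloudpickle, work around this by serializing such functions by value rather than by reference.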
69. 3 Ingredients for future data flows
joblib:
Simple parallel syntax:
Parallel(n_jobs=2)(delayed(sqrt)(i) for i in range(10))
Fast persistence:
joblib.dump(anything, 'filename.pkl.gz')
Primitive for out of core:
pointer = mem.cache(f).call_and_shelve(big_data)
• Non-invasive syntax / paradigm
• Fast on big numpy arrays
• Soon backend system (job broker and persistence)
Gets job management into algorithms (eg in scikit-learn)
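The three joblib one-liners above can be put together in a runnable sketch. It assumes joblib is installed. The temporary directory, the file name `data.pkl.gz`, and the function `expensive` are placeholders invented for the example.

```python
import os
import tempfile
from math import sqrt

import joblib
from joblib import Memory, Parallel, delayed

# Simple parallel syntax: evaluate sqrt on two workers.
roots = Parallel(n_jobs=2)(delayed(sqrt)(i) for i in range(10))

workdir = tempfile.mkdtemp()

# Fast persistence: the .gz extension selects gzip compression.
path = os.path.join(workdir, "data.pkl.gz")
joblib.dump({"roots": roots}, path)
loaded = joblib.load(path)

# Deep hash of an arbitrary data structure.
digest = joblib.hash(loaded)


def expensive(n):
    # Hypothetical stand-in for a costly computation.
    return [i * i for i in range(n)]


# Out-of-core primitive: compute once, keep only a reference to the
# result stored on disk, and load it back on demand.
mem = Memory(location=workdir, verbose=0)
pointer = mem.cache(expensive).call_and_shelve(5)
squares = pointer.get()
```

The `pointer` object is small and cheap to pass around, so the result itself only needs to be in memory where `get()` is called.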
71. 3 The Python VM is great
The simplicity of the VM is our strength
Software Transactional Memory... would be nice
But, I want to use foreign memory
Java gained jmalloc for foreign memory
Better garbage collection
Yes but, I easily plug into reference counting
A strength of Python is its clear C API
→ Easy foreign functionality
Cython: the best of C and Python
Add types for speed (numpy arrays as float*)
Call C to bind external libraries: surprisingly easy
no pointer arithmetic
An adaptation layer between Python VM and C
74. 4 Scikit-learn is easy machine learning
As easy as py
from sklearn import svm
classifier = svm.SVC()
classifier.fit(X_train, Y_train)
Y_test = classifier.predict(X_test)
People love the encapsulation
classifier is a semi black box
The power of a simple object-oriented API
Documentation-driven development
High-level, readable, simple API reduces cognitive load
PyData loves Python in return
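The four lines on the slide leave the data undefined. A self-contained version might look like the following sketch, where the iris dataset and the 75/25 split are assumptions added only to make it runnable.

```python
from sklearn import svm
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Example data (assumed here, the slide does not name a dataset).
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_true = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# The same fit / predict calls as on the slide.
classifier = svm.SVC()
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)

# Fraction of held-out samples classified correctly.
accuracy = classifier.score(X_test, y_true)
```

Swapping `svm.SVC()` for another estimator leaves the `fit` and `predict` lines unchanged, which is the encapsulation the slide refers to.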
76. 4 Difference is richness, but requires outreach
We all do different things
We can all benefit from others
though we don't know how
Being didactic outside one's community is crucial
Avoiding jargon (take that, machine learning)
Prioritizing information
"Simple is better than complex"
Students learning numerics don't care about unicode
Build documentation upon very simple examples
Think stackoverflow
Sphinx + Sphinx-gallery
77. @GaelVaroquaux
Scientist web dev: Python is the language for data
Python language & VM is perfect to manipulate
low-level constructs
with high-level wordings
Connects to other paradigms, eg C
Dynamism and reflexivity
→ meta-programming and debugging
Needs for compilation and dynamism:
a difficult balance
PEP 509: guards on run-time modification
PEP 510: function specialization
PyData will use DB and concurrency from web
PyData can give knowledge engineering + AI