Python makes data science easy. In this deck we walk through a complete example of creating and evaluating a predictive model using Decision Trees and Random Forests. All of the code is included in the slides.
Analysis of Fatal Utah Avalanches with Python. From Scraping, Analysis, to In...Matt Harrison
I gave this presentation at Code Camp. As a data scientist and backcountry skier, I was interested in looking at fatal avalanche data. This covers scraping the data, analysis with Python, pandas and IPython Notebook. The final result is an infographic
This presentation covers Python most important data structures like Lists, Dictionaries, Sets and Tuples. Exception Handling and Random number generation using simple python module "random" also covered. Added simple python programs at the end of the presentation
Inspired by Josh Bloch's Java Puzzlers, we put together our own Python Puzzlers. This slide deck brings you a set of 10 python puzzlers, that are fun and educational. Each puzzler will show you a piece of python code. Your task if to figure out what happens when the code is run. Whether you're a python beginner or a passionate python veteran, we hope that there's something to learn for everybody.
This slide deck was first presented at shopkick. Nandan Sawant and Ryan Rueth are engineers at shopkick. Keeping the audience in mind, most of the puzzlers are based on python 2.x.
These are the slides of the second part of this multi-part series, from Learn Python Den Haag meetup group. It covers List comprehensions, Dictionary comprehensions and functions.
Slides from Advaned Python lectures I gave recently in Haifa Linux club
Advanced python, Part 2:
- Slots vs Dictionaries
- Basic and Advanced Generators
- Async programming
Analysis of Fatal Utah Avalanches with Python. From Scraping, Analysis, to In...Matt Harrison
I gave this presentation at Code Camp. As a data scientist and backcountry skier, I was interested in looking at fatal avalanche data. This covers scraping the data, analysis with Python, pandas and IPython Notebook. The final result is an infographic
This presentation covers Python most important data structures like Lists, Dictionaries, Sets and Tuples. Exception Handling and Random number generation using simple python module "random" also covered. Added simple python programs at the end of the presentation
Inspired by Josh Bloch's Java Puzzlers, we put together our own Python Puzzlers. This slide deck brings you a set of 10 python puzzlers, that are fun and educational. Each puzzler will show you a piece of python code. Your task if to figure out what happens when the code is run. Whether you're a python beginner or a passionate python veteran, we hope that there's something to learn for everybody.
This slide deck was first presented at shopkick. Nandan Sawant and Ryan Rueth are engineers at shopkick. Keeping the audience in mind, most of the puzzlers are based on python 2.x.
These are the slides of the second part of this multi-part series, from Learn Python Den Haag meetup group. It covers List comprehensions, Dictionary comprehensions and functions.
Slides from Advaned Python lectures I gave recently in Haifa Linux club
Advanced python, Part 2:
- Slots vs Dictionaries
- Basic and Advanced Generators
- Async programming
Installing and Using Python
Basic I/O
Variables and Expressions
Conditional Code
Functions
Loops and Iteration
Python Data Structures
Errors and Exceptions
Object Oriented with Python
Multithreaded Programming with Python
Install/Create and Using Python Library
Compile Python Script
Resources
===========================
and 7 Quizzes
Following a game show format made popular by Joshua Bloch and Neal Gafter's Java Puzzlers this presentation intends to both entertain and inform. Snippets of Python code the whose behaviour is not entirely obvious are shown, the audience will then be asked to pick from a number of options what the behaviour of the program is. The correct and sometimes non-intuitive answer will then be given along with a brief explanation of the idea the puzzle exposes. Only a modest working knowledge of the Python language is required to understand the puzzles, but the puzzles may also entertain the more experienced Python programmer.
The basics of Python are rather straightforward. In a few minutes you can learn most of the syntax. There are some gotchas along the way that might appear tricky. This talk is meant to bring programmers up to speed with Python. They should be able to read and write Python.
En esta charla veremos con detalle algunas de las construcciones más pythonicas y las posibilidades de expresar de forma clara, concisa y elegante cosas que en otros lenguajes nos obligarían a dar muchos rodeos.
A veces es fácil olvidar algunos recursos como que una función puede devolver varios valores, cómo manipular listas y diccionarios de forma sencilla, contextos, generadores... En esta charla veremos de forma entretenida y práctica cómo mejorar nuestro nivel de Python "nativo".
Python 101++: Let's Get Down to Business!Paige Bailey
You've started the Codecademy and Coursera courses; you've thumbed through Zed Shaw's "Learn Python the Hard Way"; and now you're itching to see what Python can help you do. This is the workshop for you!
Here's the breakdown: we're going to be taking you on a whirlwind tour of Python's capabilities. By the end of the workshop, you should be able to easily follow any of the widely available Python courses on the internet, and have a grasp on some of the more complex aspects of the language.
Please don't forget to bring your personal laptop!
Audience: This course is aimed at those who already have some basic programming experience, either in Python or in another high level programming language (such as C/C++, Fortran, Java, Ruby, Perl, or Visual Basic). If you're an absolute beginner -- new to Python, and new to programming in general -- make sure to check out the "Python 101" workshop!
Characteristics of Java and basic programming constructs like Data types, Variables, Operators, Control Statements, Arrays are discussed with relevant examples
The presentation from SPb Python Interest Group community meetup.
The presentation tells about the dictionaries in Python, reviews the implementation of dictionary in CPython 2.x, dictionary in CPython 3.x, and also recent changes in CPython 3.6. In addition to CPython the dictionaries in alternative Python implementations such as PyPy, IronPython and Jython are reviewed.
Installing and Using Python
Basic I/O
Variables and Expressions
Conditional Code
Functions
Loops and Iteration
Python Data Structures
Errors and Exceptions
Object Oriented with Python
Multithreaded Programming with Python
Install/Create and Using Python Library
Compile Python Script
Resources
===========================
and 7 Quizzes
Following a game show format made popular by Joshua Bloch and Neal Gafter's Java Puzzlers this presentation intends to both entertain and inform. Snippets of Python code the whose behaviour is not entirely obvious are shown, the audience will then be asked to pick from a number of options what the behaviour of the program is. The correct and sometimes non-intuitive answer will then be given along with a brief explanation of the idea the puzzle exposes. Only a modest working knowledge of the Python language is required to understand the puzzles, but the puzzles may also entertain the more experienced Python programmer.
The basics of Python are rather straightforward. In a few minutes you can learn most of the syntax. There are some gotchas along the way that might appear tricky. This talk is meant to bring programmers up to speed with Python. They should be able to read and write Python.
En esta charla veremos con detalle algunas de las construcciones más pythonicas y las posibilidades de expresar de forma clara, concisa y elegante cosas que en otros lenguajes nos obligarían a dar muchos rodeos.
A veces es fácil olvidar algunos recursos como que una función puede devolver varios valores, cómo manipular listas y diccionarios de forma sencilla, contextos, generadores... En esta charla veremos de forma entretenida y práctica cómo mejorar nuestro nivel de Python "nativo".
Python 101++: Let's Get Down to Business!Paige Bailey
You've started the Codecademy and Coursera courses; you've thumbed through Zed Shaw's "Learn Python the Hard Way"; and now you're itching to see what Python can help you do. This is the workshop for you!
Here's the breakdown: we're going to be taking you on a whirlwind tour of Python's capabilities. By the end of the workshop, you should be able to easily follow any of the widely available Python courses on the internet, and have a grasp on some of the more complex aspects of the language.
Please don't forget to bring your personal laptop!
Audience: This course is aimed at those who already have some basic programming experience, either in Python or in another high level programming language (such as C/C++, Fortran, Java, Ruby, Perl, or Visual Basic). If you're an absolute beginner -- new to Python, and new to programming in general -- make sure to check out the "Python 101" workshop!
Characteristics of Java and basic programming constructs like Data types, Variables, Operators, Control Statements, Arrays are discussed with relevant examples
The presentation from SPb Python Interest Group community meetup.
The presentation tells about the dictionaries in Python, reviews the implementation of dictionary in CPython 2.x, dictionary in CPython 3.x, and also recent changes in CPython 3.6. In addition to CPython the dictionaries in alternative Python implementations such as PyPy, IronPython and Jython are reviewed.
Presented at Data Day Texas 2020 and attempts to show the tradeoffs between bigger data, better math, and better data. Uses Fashion MNIST as the use case, and a progression of better math from Random Forest to Gradient Boosted Trees to Feedforward Neural Nets to Convolutional Neural Nets.
Oh, and Cthulhu
This tutor introduces the basic idea of machine learning with a very simple example. Machine learning teaches machines (and me too) to learn to carry out tasks and concepts by themselves. It is that simple, so here is an overview:
http://www.softwareschule.ch/examples/machinelearning.jpg
A short list of the most useful R commands
reference: http://www.personality-project.org/r/r.commands.html
R programı ile ilgilenen veya yeni öğrenmeye başlayan herkes için hazırlanmıştır.
Random Forests: The Vanilla of Machine Learning - Anna QuachWithTheBest
This speech was about coding and forest trees, as well as how to use and apply these concepts to your work.
Anna Quach, PhD Student at Utah State University working under Dr. Adele Cutler
Generics are one of the most complex features of Java. They are often poorly understood and lead to confusing errors. Unfortunately, it won’t get easier. Java 10, release planned for 2018, extends Generics. It’s now time to understand generics or risk being left behind.
We start by stepping back into the halcyon days of 2004 and explain why generics were introduced in the first place back. We also explain why Java’s implementation is unique compared to similar features in other programming languages.
Then we travel to the present to explaining how to make effective use of Generics. We then explore various entertaining code examples and puzzlers of how Generics are used today.
Finally, this talk sheds light on the planned changes in Java 10 with practical code examples and related ideas from other programming languages. If you ever wanted to understand the buzz around higher kinded types or declaration site variance now is your chance!
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, Companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology is pushing into IT I was wondering myself, as an “infrastructure container kubernetes guy”, how get this fancy AI technology get managed from an infrastructure operational view? Is it possible to apply our lovely cloud native principals as well? What benefit’s both technologies could bring to each other?
Let me take this questions and provide you a short journey through existing deployment models and use cases for AI software. On practical examples, we discuss what cloud/on-premise strategy we may need for applying it to our own infrastructure to get it to work from an enterprise perspective. I want to give an overview about infrastructure requirements and technologies, what could be beneficial or limiting your AI use cases in an enterprise environment. An interactive Demo will give you some insides, what approaches I got already working for real.
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
3. About Matt
● Author, consultant and trainer in Python and
Data Science.
● Python since 2000
● Experience across Data Science, BI, Web, Open
Source Stack Management, and Search.
● http://metasnake.com/
7. Supervised Classification
Sometimes called "Predictive Modeling."
1.Identify patterns from labeled examples.
(Training Set => Model)
2.Based on those patterns, try to guess labels for
other examples. (Predict)
8.
9.
10. Examples
Binary Classification
● Is this student at high risk for dropping out?
● Is this individual at high risk for defaulting on a loan?
● Is this person at high risk for becoming infected with a certain
disease?
Multi-class Classification
● What is the most likely disease given this individual's current
symptoms?
● Which ad is the user most likely to click on?
13. No Free Lunch
If an algorithm performs well on a certain class of
problems then it necessarily pays for that with
degraded performance on the set of all remaining
problems.
David Wolpert and William Macready. No Free Lunch Theorems for
Optimization. IEEE Transactions on Evolutionary Computation, 1:67, 1997.
14. No Free Lunch
No single algorithm will build the best model on
every dataset.
16. Premise
Manuel Fernández-Delgado, Eva Cernadas, Senén
Barro, and Dinani Amorim. Do we Need
Hundreds of Classifiers to Solve Real World
Classification Problems? Journal of Machine
Learning Research, 15(Oct):3133−3181, 2014.
http://jmlr.csail.mit.edu/papers/v15/delgado14a.ht
19. Premise
"The Random Forest is clearly the best family of
classifiers, followed by SVM, neural networks,
and boosting ensembles."
● On average, RF achieved 94.1% of the
theoretical maximum accuracy for each dataset.
● RF achieved over 90% of maximum accuracy in
84.3% of datasets.
22. Classification Task
Predict who did and did not survive the disaster
on the Titanic
(and see if we can get some idea of why it turned
out that way)
24. Get Data
>>> import pandas as pd
>>> df = pd.read_excel('data/titanic3.xls')
http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/titanic3.xls
25. Columns
● class: Passenger class (1 = first; 2 = second; 3 = third)
● name: Name
● sex: Sex
● age: Age
● sibsp: Number of siblings/spouses aboard
● parch: Number of parents/children aboard
● ticket: Ticket number
● fare: Passenger fare
● cabin: Cabin
● embarked: Port of embarkation (C = Cherbourg; Q = Queenstown; S = Southampton)
● boat: Lifeboat (if survived)
● body: Body number (if did not survive and body was recovered)
26. Label Column
The column giving the label for our classification
task:
● survival: Survival (0 = no; 1 = yes)
31. Decision Trees
>>> from sklearn import tree
>>> model = tree.DecisionTreeClassifier(random_state=42)
>>> ignore = set('boat,body,home.dest,name,ticket'.split(','))
>>> cols = [c forc in df.columns if c != 'survived' and c not in ignore]
>>> X = df[cols]
>>> y = df.survived
>>> model.fit(X, y)
Traceback (most recent call last):
. . .
ValueError: could not convert string to float: 'S'
32. Create Dummy Variables
>>> dummy_cols = 'pclass,sex,cabin,embarked'.split(",")
>>> df2 = pd.get_dummies(df, columns=dummy_cols)
>>> model = tree.DecisionTreeClassifier(random_state=42)
>>> ignore = set('boat,body,home.dest,name,ticket'.split(','))
>>> cols = [c forc in df2.columns if c != 'survived' and c
... not in ignore and c not in dummy_cols]
>>> X = df2[cols]
>>> y = df2.survived
33. Try Again
>>> model.fit(X, y)
Traceback (most recent call last):
. . .
ValueError: Input contains NaN, infinity or a value
too large for dtype('float32').
34. Imputing
Fancy term for filling in values. Mean is a good
choice for decision trees as it doesn't bias the
splitting, whereas 0 would
35. Try Again
>>> X = X.fillna(X.mean())
>>> X.dtypes
age float64
sibsp int64
parch int64
fare float64
pclass_1 float64
pclass_2 float64
pclass_3 float64
sex_female float64
sex_male float64
cabin_A10 float64
cabin_A11 float64
cabin_A14 float64
62. Condorcet's Jury Theorem
From 1785 Essay on the Application of Analysis to the
Probabililty of Majority Decisions. If each member of
jury has p > .5 of predicting correct choice, adding
more jury members increases probability of
correct choice.
63. Random Forest
Algorithm:
● Sample from training set N (random WITH
REPLACEMENT - lets us do OOB)
● Select m input variables (subset of M total input variables)
● Grow a tree
● Repeat above (create ensemble)
● Predict by aggregation predictions of forest (votes for
classification, average for regression)
70. Tuning
● max_features - Don't want to use all of the features (all tree will look the same). By taking
samples of features, you reduce bias and correlation amoung trees
● n_estimators - More is better, but diminishing returns (don't need too many jurors, takes a
longer time to train, lots of memory)
● max_depth - too tall, overfitting. Can't know ahead of time what good size could be. Can
use these parameters to constrain depth as well:
– min_samples_leaf - smaller is more prone to overfitting (capturing noise)
– max_leaf_nodes - Can't be more than this many leaves
– min_weight_fraction_leaf - The minimum weighted fraction of the input samples required to be at a
leaf node. Note: this parameter is tree-specific.
77. Summary
● Scalable (can build trees per CPU)
● Reduces variance in decision tree
● No normalization of data (don't want money range ($0 - $10,000,000)
overidding age (0-100)
● Feature importance (look at "mean decrease of impurity" where this
node appears)
● Helps with missing data, outliers, and dimension reduction
● Works with both regression and classification
● Sampling allows "out of bag" estimate, removing need for test set
78. Why Python?
● Efficient algorithm
● Close to metal, 3000+ lines of Cython
● Faster than OpenCV (C++), Weka (Java),
RandomForest (R/Fortran)