A talk given by Eugene Dubossarsky on predictive analytics at the Big Data Analytics meetup in Sydney this month. The talk is available at http://www.youtube.com/watch?v=aG16YSFgtLY
2. What This Talk Isn’t About
But worth mentioning anyway:
• R and the Sydney Users of R Forum
• Analyst First
• My Courses
3. Sydney Users of R Forum
• Just one shy of 500 members
• Regular meetups
• Study groups: introduction to R, “Machine Learning for Hackers”, “Elements of Statistical Learning”
5. R
• Do a Google image search for “ggplot2”
• Search for “r4stats” and “popularity”
• Join SURF
• Download R and start using it.
7. Analyst First
• Strategic, Cultural, Organisational, Human issues in analytics
• Making analytics work in organisations
• Focus on the Human side of analytics
• International: Aust, NZ, Singapore, US, Japan, India, Hong Kong
• analystfirst.com: see “core principles” and “what is analyst first?”
8. My Analytics Training Courses
• Predictive Modelling, Data Mining, R, Forensic Analytics, Visualisation, Forecasting training courses
• Sydney, Melbourne, Canberra, Singapore
• Public and in-house
• Pre-prepared or customised
• Informal coaching/mentoring
• Strategy, Review, Advice and Assistance with Analytics Capability Development in your organisation
9. The Zen of Predictive Modelling
PredictiveModels
• The Most Important Part of My “Predictive Modelling and Data Mining Course”
• What every user of predictive modelling should know
• What every manager and owner of predictive modelling capability must know
• “Open Secrets” known to the masters
10. The Zen of Predictive Modelling
• To save people time
• To see the forest for the trees
• To get real value out of predictive analytics
11. The Right Point of View
Which is unlike the other two?
• Kohonen neural network
• Backpropagation neural network
• CART decision tree
12. The Right Point of View
Which is unlike the other two?
• CART decision tree
• Random Forest
• Support Vector Machine
13. The Right Point of View
Which is unlike the other two?
• Backpropagation Neural Network
• Linear Model
• CART Decision Tree
14. The Right Point of View
• Out Of Sample Accuracy
• Robustness (Out of Time Accuracy)
• Interpretability
• Implementability
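A minimal sketch of the first two criteria, in plain Python (standard library only). The data, the `t`/`y` field names and the 70/30 cut-off are hypothetical. A random holdout measures out-of-sample accuracy; training on earlier records and testing on later ones measures out-of-time accuracy. The "model" is deliberately trivial (the training mean), so any difference comes from the split rather than the algorithm.

```python
import random
from statistics import mean


def random_holdout(records, test_frac=0.3, seed=0):
    """Out-of-sample split: hold back a random subset drawn from the same period."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]


def out_of_time_split(records, cutoff):
    """Out-of-time split: train on records before `cutoff`, test on later ones."""
    train = [r for r in records if r["t"] < cutoff]
    test = [r for r in records if r["t"] >= cutoff]
    return train, test


def mean_model_mae(train, test):
    """Fit the simplest possible model (the training mean) and return its test MAE."""
    prediction = mean(r["y"] for r in train)
    return mean(abs(r["y"] - prediction) for r in test)


# Hypothetical data whose target drifts upwards over time.
records = [{"t": t, "y": float(t)} for t in range(100)]

random_mae = mean_model_mae(*random_holdout(records))
time_mae = mean_model_mae(*out_of_time_split(records, cutoff=70))
print(f"out-of-sample MAE: {random_mae:.1f}")
print(f"out-of-time MAE:   {time_mae:.1f}")
```

Because the target drifts, the out-of-time error here comes out clearly worse than the random-holdout error, which is why robustness gets its own line on the list.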
17. The Right Point of View
Why build predictive models?
• Insights
• Operational prediction
• “What-if” analysis
18. What Do All Predictive Models Have in Common?
All Predictive Models:
• Have a training set of predictors and outcomes
• Probably have a cross-validation and test set of predictors and outcomes too.
• Are “fit” (optimised) to minimise an error function between their predicted and target outcomes
• Are probably cross-validated to control overfitting on an out-of-sample data set
• Provide information on the relationship between the predictors and outcomes in the data
• Can be used to score new data (make new predictions)
• Can be deployed in IT systems
• Can be interrogated for insights
• Are only as accurate as the data allows
• Provide a (fairly) accurate estimate of how well they will predict on new data
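The fit / cross-validate / score cycle above can be sketched in a few lines of standard-library Python. This is an illustration under assumptions, not the author's workflow: a one-predictor least-squares line stands in for any model family, and the data are synthetic, with Gaussian noise of standard deviation 1.

```python
import random


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x, i.e. minimise squared error on the training data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def k_fold_mse(xs, ys, k=5, seed=0):
    """Estimate out-of-sample mean squared error with k-fold cross-validation."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    squared_errors = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        # Score the held-out fold with a model that never saw it.
        squared_errors.extend((ys[i] - (a + b * xs[i])) ** 2 for i in held_out)
    return sum(squared_errors) / len(squared_errors)


# Hypothetical data: a linear signal plus Gaussian noise with standard deviation 1.
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [1 + 2 * x + rng.gauss(0, 1) for x in xs]

a, b = fit_line(xs, ys)
cv_mse = k_fold_mse(xs, ys)
print(f"fitted y = {a:.2f} + {b:.2f}x, 5-fold CV MSE = {cv_mse:.2f}")
```

Since the noise variance is 1, the cross-validated MSE should land close to 1: that noise is the limit of accuracy the data allows, whatever the model.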
19. What Do All Predictive Model Insights Have in Common?
All Predictive Models:
• Have variable importance measures (a number of which can be applied to any model)
• Allow plotting predictors vs outcomes
• Have variable accuracy measures
• Can be resampled for more robust measures of accuracy
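Permutation importance is one variable importance measure that can be applied to any model, because it only needs predictions: shuffle one predictor, re-score, and record how much the error rises. A minimal standard-library sketch, with a hypothetical fitted formula standing in for the model:

```python
import random


def mse(model, X, y):
    """Mean squared error of a model exposed as a function row -> prediction."""
    return sum((model(row) - t) ** 2 for row, t in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Average increase in error when one predictor column is shuffled.

    Only calls model(row), so it works the same way for any model family."""
    rng = random.Random(seed)
    baseline = mse(model, X, y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between predictor j and the outcome
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(model, X_perm, y) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical data: y depends on x0 only; x1 is pure noise.
rng = random.Random(1)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(500)]
y = [3 * row[0] + rng.gauss(0, 0.5) for row in X]


def model(row):
    """Stand-in for any fitted model: a formula that happens to use x0 only."""
    return 3 * row[0]


print(permutation_importance(model, X, y))
```

A predictor the model ignores scores exactly zero; a predictor it relies on scores well above zero.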
20. What Do All Predictive Model Predictions Have in Common?
All Predictive Models:
• Make numeric predictions: estimates of amount for regression, and probabilities for classification
• Make predictions by applying the underlying model structure and parameters (formula) to new predictor data sets
• Make deterministic predictions. Once a model is fitted, the prediction for a given record will be the same every time. (The prediction may be a distribution rather than a fixed point. Also note that model fitting itself may be random: some models differ slightly each time they are fitted to the same data set.)
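A small illustration of the last point, assuming a bagged ensemble of least-squares lines as the example of randomised fitting (a simplified cousin of how random forests resample data, not randomForest itself). Refitting with a different seed gives a slightly different model, but any one fitted model returns the same prediction for the same record every time.

```python
import random


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def fit_bagged_line(xs, ys, n_models=25, seed=None):
    """Randomised fitting: average the lines fitted to bootstrap resamples."""
    rng = random.Random(seed)
    n = len(xs)
    coefs = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        coefs.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    a = sum(c[0] for c in coefs) / n_models
    b = sum(c[1] for c in coefs) / n_models
    # The fitted model is just a formula with frozen parameters.
    return lambda x: a + b * x


# Hypothetical data.
rng = random.Random(7)
xs = [rng.uniform(0, 10) for _ in range(50)]
ys = [1 + 2 * x + rng.gauss(0, 1) for x in xs]

model_a = fit_bagged_line(xs, ys, seed=1)
model_b = fit_bagged_line(xs, ys, seed=2)
print(model_a(3.0), model_a(3.0))  # same model, same record: identical predictions
print(model_b(3.0))                # refitted with a different seed: slightly different
```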
21. How Do Predictive Model Families Differ?
• Classification vs Regression (most families can do both)
• Predictive accuracy vs insights
• Predictive accuracy vs stability
• Deterministic fitting vs randomised fitting
• Specific insights
• Structure and complexity
• Model assumptions (linear models, neural nets)
• Model structure (trees vs additive models vs SVM vs Neural Nets etc)
• The kinds of insights models provide
• Tendency to overfit (most, but not all)
• Dependence on metrics
• Sensitivity to missing values and categorical variables
22. Becoming a Master of Modelling Kung Fu
• Predictive models should initially be thought of as a “black box”, while recognising the characteristics that all models have in common
• The focus should be on the data, not the model.
• Focusing on the specific characteristics of the model is important when deciding on the degree of accuracy desired and the kinds of insights desired.
• It is good to start by working with one highly accurate, simple-to-use method (randomForest is a good choice) and one or two highly interpretable models (rpart decision trees and (generalised) linear models are good here).
• In fact, you can go a long way with just randomForest alone.
23. Becoming a Master of Modelling Kung Fu
• Master an adequate tool.
• Empty your mind of the tool. It is an illusion.
• Meditate on the data.
24. Meditating on Data
• Start with a highly accurate, nonparametric model you are comfortable with.
• The accuracy of a highly accurate method is close to the theoretical limit of accuracy possible on the data. World-class experts may get closer, but not a whole lot closer.
• So once you build the model, forget about the specific family you used. It is just a tool.
• Each predictor may provide a unique amount of predictability to the model. Measure it.
• Each predictor may be masked by other predictors. Be careful.
• Check relationships between data and strongest predictors
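Checking the relationship between the outcome and a strong predictor is usually done with a plot. A text-only stand-in, sketched here in standard-library Python with hypothetical data, averages the outcome within equal-width bins of the predictor:

```python
def binned_means(x, y, n_bins=5):
    """Mean outcome within equal-width bins of one predictor: a text-only 'plot'."""
    lo, hi = min(x), max(x)
    if hi == lo:
        raise ValueError("predictor is constant")
    width = (hi - lo) / n_bins
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for xi, yi in zip(x, y):
        b = min(int((xi - lo) / width), n_bins - 1)  # the maximum goes in the last bin
        sums[b] += yi
        counts[b] += 1
    return [
        (lo + (b + 0.5) * width, sums[b] / counts[b] if counts[b] else None)
        for b in range(n_bins)
    ]


# Hypothetical predictor with a curved (quadratic) relationship to the outcome.
x = list(range(10))
y = [xi ** 2 for xi in x]
for centre, avg in binned_means(x, y):
    print(f"x ~ {centre:4.1f}: mean y = {avg}")
```

The bin means rise faster than linearly, revealing a curved relationship that a single linear coefficient would hide.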
25. Meditating on Data
• There are at least 3 ways that a predictor can be important. They are not the same:
• What is the unique contribution of the predictor to the accuracy of the model?
• What is the individual predictive power of the predictor alone?
• How vital is the predictor to the structure of a particular model?
• The first two are about the data; the third is more about the specific model. Which is more important?
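The first two, data-centric measures can be sketched directly, together with the masking effect mentioned on the previous slide. Assumptions: standard-library Python, ordinary least squares standing in for "a highly accurate model", and synthetic data in which `x1` is a near-copy of `x0`. Unique contribution is measured by dropping a predictor and refitting; individual power by fitting with that predictor alone. The third measure depends on the internals of a particular model family and is not shown.

```python
import random


def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][k] * w[k] for k in range(r + 1, n))) / M[r][r]
    return w


def fit_ols(rows, y):
    """Linear least squares with an intercept, via the normal equations."""
    Z = [[1.0] + list(r) for r in rows]
    p = len(Z[0])
    A = [[sum(z[i] * z[j] for z in Z) for j in range(p)] for i in range(p)]
    b = [sum(z[i] * t for z, t in zip(Z, y)) for i in range(p)]
    w = solve(A, b)
    return lambda r: w[0] + sum(wi * xi for wi, xi in zip(w[1:], r))


def holdout_mse(cols, train_set, test_set):
    """Refit using only predictor columns `cols`; return the MSE on test_set."""
    def pick(X):
        return [[row[c] for c in cols] for row in X]
    model = fit_ols(pick(train_set[0]), train_set[1])
    X_te, y_te = pick(test_set[0]), test_set[1]
    return sum((model(r) - t) ** 2 for r, t in zip(X_te, y_te)) / len(y_te)


rng = random.Random(0)


def make_data(n):
    """Hypothetical data: x1 is a near-copy of x0, x2 is noise, y depends on x0."""
    X, y = [], []
    for _ in range(n):
        x0 = rng.gauss(0, 1)
        x1 = x0 + rng.gauss(0, 0.1)
        x2 = rng.gauss(0, 1)
        X.append([x0, x1, x2])
        y.append(2 * x0 + rng.gauss(0, 0.5))
    return X, y


train_set, test_set = make_data(300), make_data(300)
full = holdout_mse([0, 1, 2], train_set, test_set)
baseline = holdout_mse([], train_set, test_set)  # intercept only: predict the training mean
results = {}
for j, name in enumerate(["x0", "x1", "x2"]):
    others = [c for c in range(3) if c != j]
    unique = holdout_mse(others, train_set, test_set) - full  # drop-column: unique contribution
    alone = baseline - holdout_mse([j], train_set, test_set)  # predictive power on its own
    results[name] = (unique, alone)
    print(f"{name}: unique contribution {unique:.3f}, individual power {alone:.3f}")
```

Expect both `x0` and `x1` to show strong individual power but little unique contribution, because each masks the other; `x2` should score near zero on both.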
27. The Predictive Modelling Master’s Data Meditation
• Start with a highly accurate, nonparametric model you are comfortable with.
• The accuracy of a highly accurate method is close to the theoretical limit of accuracy possible on the data. World-class experts may get closer, but not a whole lot closer.
• So once you build the model, forget about the specific family you used. It is just a tool.
• Measure model accuracy on out-of-sample data. Pay attention to any imbalances in accuracy across classes or data subsets.
• Measure model stability if necessary (it almost always is)
• Measure the importance of all variables, using the three main techniques.
• Measure again, holding some of the main predictors constant
• Measure (visualise) the effects of each predictor
• Build an interpretable model to help tell the story
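The first measurement step, with attention to class imbalance, in a minimal standard-library sketch. The outcome and the "lazy" model are hypothetical: on a 95/5 split, always predicting the majority class looks 95% accurate overall while getting every minority case wrong.

```python
from collections import Counter


def accuracy_by_class(y_true, y_pred):
    """Share of records predicted correctly, computed separately for each true class."""
    totals = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {cls: correct[cls] / totals[cls] for cls in totals}


# Hypothetical imbalanced outcome and a lazy model that always predicts the majority class.
y_true = ["no"] * 95 + ["yes"] * 5
y_pred = ["no"] * 100

overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"overall accuracy: {overall:.2f}")                 # looks good
print(f"by class: {accuracy_by_class(y_true, y_pred)}")  # reveals the problem
```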
28. The Master Sharpens the Sword : Getting More Accuracy
• There is never enough data
• Some extra accuracy can come from trying other model families. Usually not much, and it is not the best use of time, though for some reason it is the favourite activity of new data miners.
• Some more accuracy can come from tweaking model parameters. This is perhaps less of a waste of time, but still not the ideal focus.
• The most dramatic improvement in model accuracy comes from new predictors.
• New predictors may be entirely new data sets, or complex new transformations of existing data.
• A large, multi-table data set may well hold information that has not yet been captured in the predictors.
• The most common information of this type involves relations between individual records (e.g. time-series windows, geographic neighbourhoods or social-network statistics per record).
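One common way to turn relations between records into a new predictor is a trailing time-series window per entity. A sketch under assumptions (standard-library Python; the `customer`, `t` and `amount` fields are hypothetical): each record gets the mean of that customer's previous values, using only earlier records so the feature cannot leak the current value.

```python
from collections import defaultdict, deque


def add_trailing_mean(records, key, value, window=3):
    """Add the mean of the previous `window` values for the same entity.

    Only earlier records are used, so the feature never includes the
    record's own value."""
    history = defaultdict(lambda: deque(maxlen=window))
    enriched = []
    for r in sorted(records, key=lambda rec: (rec[key], rec["t"])):
        past = history[r[key]]
        feature = sum(past) / len(past) if past else None
        enriched.append({**r, f"{value}_mean_{window}": feature})
        past.append(r[value])
    return enriched


# Hypothetical transactions: one row per customer per time step.
records = [
    {"customer": "A", "t": 1, "amount": 10},
    {"customer": "B", "t": 1, "amount": 5},
    {"customer": "A", "t": 2, "amount": 20},
    {"customer": "A", "t": 3, "amount": 30},
    {"customer": "A", "t": 4, "amount": 40},
]
for row in add_trailing_mean(records, key="customer", value="amount"):
    print(row)
```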
29. Illusions On the Path
• Colossal wastes of time can include:
• Trying to find the “right” model family
• Getting stuck in data preprocessing trying to get all the predictors “right”
• Trying to figure out what the targets should be (usually a sign that the business problem is not well understood)
• Trying to “improve” the model without defining what that means
30. The Sun Tzu of Modelling: Be Prepared
• Know what you are modelling and for what purpose.
• Know what your target variable is. You may have more than one.
• Do not hesitate: model with what you have, and add more predictors later.
• Messy data is better than no data
• Use the right error measures
• Know the connection between the model and your business
• Evaluate and interrogate the model accordingly
• Always question the business value of the analysis
• Always be ready to suggest the business use of the analysis
• Don’t assume that the client understands what to do with the model
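A minimal illustration of choosing the error measure to match the business, with hypothetical costs (a missed case costs 100 times a false alarm) and hypothetical predictions. Plain accuracy prefers the model that never flags anything; the cost-based measure prefers the one that catches every case.

```python
def accuracy(y_true, y_pred):
    """Share of records predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def business_cost(y_true, y_pred, cost_missed=100.0, cost_false_alarm=1.0):
    """Total cost under hypothetical per-error costs."""
    missed = sum(t and not p for t, p in zip(y_true, y_pred))
    false_alarms = sum(p and not t for t, p in zip(y_true, y_pred))
    return missed * cost_missed + false_alarms * cost_false_alarm


# Hypothetical: 5 true cases among 100 records.
y_true = [True] * 5 + [False] * 95
never_flags = [False] * 100
flags_20 = [True] * 20 + [False] * 80  # catches all 5 cases plus 15 false alarms

for name, pred in [("never_flags", never_flags), ("flags_20", flags_20)]:
    print(f"{name}: accuracy {accuracy(y_true, pred):.2f}, cost {business_cost(y_true, pred):.0f}")
```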
31. Strategy and Tactics
• Why are you (re)building the model?
• If Strategic: what is going to be done with the insights? By whom?
• If Operational: what are the key metrics – accuracy, value, deployability?