In this tutorial, we will learn the following topics -
+ Training and Visualizing a Decision Tree
+ Making Predictions
+ Estimating Class Probabilities
+ The CART Training Algorithm
+ Computational Complexity
+ Gini Impurity or Entropy?
+ Regularization Hyperparameters
+ Regression
+ Instability
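The first two topics can be sketched in a few lines of scikit-learn. This is an illustrative example on the iris dataset, not the tutorial's exact code (the hyperparameters are arbitrary):

```python
# A minimal sketch of training and inspecting a decision tree with
# scikit-learn (assumes scikit-learn is installed; parameters are illustrative).
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X, y = iris.data, iris.target

# max_depth regularizes the tree (see "Regularization Hyperparameters")
clf = DecisionTreeClassifier(max_depth=2, random_state=42)
clf.fit(X, y)

# Class probabilities are the class ratios in the leaf a sample falls into;
# this setosa-like sample lands in a pure leaf
print(clf.predict_proba([[5.0, 3.5, 1.3, 0.2]]))

# A text rendering of the learned tree (graphviz gives a graphical one)
print(export_text(clf, feature_names=iris.feature_names))
```

Rendering the tree as text or graphviz output is what makes the visualization and debugging topics below so approachable.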
No machine learning algorithm dominates in every domain, but random forests are usually tough to beat by much, and they have some advantages compared to other models: little input preparation is needed, they perform implicit feature selection, they are fast to train, and the model can be visualized. While it is easy to get started with random forests, a good understanding of the model is key to getting the most out of them.
This talk will cover decision trees, from theory to their implementation in scikit-learn. An overview of ensemble methods and bagging will follow, ending with an explanation and implementation of random forests and a comparison with other state-of-the-art models.
The talk will have a very practical approach, using examples and real cases to illustrate how to use both decision trees and random forests.
We will see how the simplicity of decision trees is a key advantage compared to other methods. Unlike black-box methods, or methods that are hard to represent in multivariate cases, decision trees can easily be visualized, analyzed, and debugged until we see that our model is behaving as expected. This exercise can increase our understanding of the data and the problem, while making our model perform in the best possible way.
Random Forests randomize and ensemble decision trees to increase their predictive power, while keeping most of their properties.
The main topics covered will include:
* What are decision trees?
* How are decision trees trained?
* Understanding and debugging decision trees
* Ensemble methods
* Bagging
* Random Forests
* When should decision trees and random forests be used?
* Python implementation with scikit-learn
* Analysis of performance
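As a taste of the scikit-learn implementation and performance-analysis items above, a minimal sketch (the dataset and hyperparameters here are illustrative, not the talk's own material):

```python
# A hedged sketch: random forest on iris with cross-validated accuracy
# and impurity-based feature importances (assumes scikit-learn is installed).
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=100, random_state=42)

# Cross-validation gives a quick, honest performance estimate
scores = cross_val_score(forest, X, y, cv=5)
print(scores.mean())

# Implicit feature selection: importances sum to 1 across features
forest.fit(X, y)
print(forest.feature_importances_)
```

The `feature_importances_` attribute is one concrete way random forests keep some of the interpretability of single trees.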
Recurrent and Recursive Networks (Part 1), by sohaib_alam
The first half of the chapter on Sequence Modeling: Recurrent and Recursive Nets from the book "Deep Learning" by I. Goodfellow, Y. Bengio and A. Courville.
Random Forest Algorithm - Random Forest Explained | Random Forest In Machine ..., by Simplilearn
This Random Forest Algorithm presentation will explain how the Random Forest algorithm works in Machine Learning. By the end of this video, you will be able to understand what Machine Learning is, what a classification problem is, the applications of Random Forest, why we need Random Forest, how it works with simple examples, and how to implement the Random Forest algorithm in Python.
Below are the topics covered in this Machine Learning Presentation:
1. What is Machine Learning?
2. Applications of Random Forest
3. What is Classification?
4. Why Random Forest?
5. Random Forest and Decision Tree
6. Comparing Random Forest and Regression
7. Use case - Iris Flower Analysis
- - - - - - - -
About Simplilearn Machine Learning course:
A form of artificial intelligence, Machine Learning is revolutionizing the world of computing as well as all people’s digital interactions. Machine Learning powers such innovative automated technologies as recommendation engines, facial recognition, fraud protection and even self-driving cars. This Machine Learning course prepares engineers, data scientists and other professionals with the knowledge and hands-on skills required for certification and job competency in Machine Learning.
- - - - - - -
Why learn Machine Learning?
Machine Learning is taking over the world, and with that there is a growing need among companies for professionals who know the ins and outs of Machine Learning.
The Machine Learning market size is expected to grow from USD 1.03 Billion in 2016 to USD 8.81 Billion by 2022, at a Compound Annual Growth Rate (CAGR) of 44.1% during the forecast period.
- - - - - -
What skills will you learn from this Machine Learning course?
By the end of this Machine Learning course, you will be able to:
1. Master the concepts of supervised, unsupervised and reinforcement learning and modeling.
2. Gain practical mastery over principles, algorithms, and applications of Machine Learning through a hands-on approach which includes working on 28 projects and one capstone project.
3. Acquire thorough knowledge of the mathematical and heuristic aspects of Machine Learning.
4. Understand the concepts and operation of support vector machines, kernel SVM, naive Bayes, decision tree classifier, random forest classifier, logistic regression, K-nearest neighbors, K-means clustering and more.
5. Be able to model a wide variety of robust Machine Learning algorithms including deep learning, clustering, and recommendation systems
- - - - - - -
How, when, and why to perform feature scaling?
Different types of feature scaling techniques.
When to perform feature scaling?
Why to perform feature scaling?
Min-max feature scaling techniques.
Unit vector scaling.
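The two techniques named above can be written out directly. A small NumPy sketch with made-up data (the array values are arbitrary):

```python
import numpy as np

X = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])

# Min-max scaling: x' = (x - min) / (max - min), per feature -> range [0, 1]
X_minmax = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# Unit vector scaling: divide each sample (row) by its Euclidean norm,
# so every row ends up with norm 1
X_unit = X / np.linalg.norm(X, axis=1, keepdims=True)

print(X_minmax)
print(np.linalg.norm(X_unit, axis=1))
```

Note the different axes: min-max scaling operates per feature (column), while unit vector scaling operates per sample (row).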
Module 4: Model Selection and Evaluation, by Sara Hooker
Delta Analytics is a 501(c)3 non-profit in the Bay Area. We believe that data is powerful, and that anybody should be able to harness it for change. Our teaching fellows partner with schools and organizations worldwide to work with students excited about the power of data to do good.
Welcome to the course! These modules will teach you the fundamental building blocks and the theory necessary to be a responsible machine learning practitioner in your own community. Each module focuses on accessible examples designed to teach you about good practices and the powerful (yet surprisingly simple) algorithms we use to model data.
To learn more about our mission or provide feedback, take a look at www.deltanalytics.org.
This presentation covers the Decision Tree as a supervised machine learning technique, discussing the Information Gain and Gini Index methods with their related algorithms.
▸ Machine Learning / Deep Learning models require setting the values of many hyperparameters
▸ Common examples: regularization coefficients, dropout rate, or number of neurons per layer in a Neural Network
▸ Instead of relying on some "expert advice", this presentation shows how to find optimal hyperparameters automatically
▸ Exhaustive Search, Monte Carlo Search, Bayesian Optimization, and Evolutionary Algorithms are explained with concrete examples
Decision Trees for Classification: A Machine Learning Algorithm, by Palin analytics
Decision Trees in Machine Learning - The decision tree method is a commonly used data mining method for establishing classification systems based on several covariates or for developing prediction algorithms for a target variable.
What is an "ensemble learner"? How can we combine different base learners into an ensemble in order to improve the overall classification performance? In this lecture, we are providing some answers to these questions.
Introduction to the Perceptron and how it is used in Machine Learning and Artificial Neural Networks.
This presentation was prepared by Zaid Al-husseini as a lecture for third-stage undergraduate students in the Software department, faculty of IT, University of Babylon, Iraq.
It is publicly available for beginners to learn, in theory and mathematically, how the Perceptron works.
Notice: the slides are not detailed and need a teacher to explain them in depth.
Decision Tree Algorithm With Example | Decision Tree In Machine Learning | Da..., by Simplilearn
This Decision Tree Algorithm in Machine Learning presentation will help you understand the basics of Decision Trees, along with what Machine Learning is, problems in Machine Learning, what a Decision Tree is, the advantages and disadvantages of Decision Trees, and how the Decision Tree algorithm works with solved examples. At the end we will implement a Decision Tree use case/demo in Python on loan payment prediction. This Decision Tree tutorial is ideal for both beginners and professionals who want to learn Machine Learning algorithms.
Below topics are covered in this Decision Tree Algorithm Presentation:
1. What is Machine Learning?
2. Types of Machine Learning
3. Problems in Machine Learning
4. What is Decision Tree?
5. What are the problems a Decision Tree Solves?
6. Advantages of Decision Tree
7. How does Decision Tree Work?
8. Use Case - Loan Repayment Prediction
What is Machine Learning: Machine Learning is an application of Artificial Intelligence (AI) that provides systems the ability to automatically learn and improve from experience without being explicitly programmed.
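How a decision tree works comes down to choosing splits that reduce a node impurity measure such as the Gini impurity mentioned elsewhere on this page. A tiny illustrative helper (not from the presentation):

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions in a node."""
    n = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

# A pure node has impurity 0; a 50/50 binary node has impurity 0.5
print(gini(["yes"] * 4))        # 0.0
print(gini(["yes", "no"] * 2))  # 0.5
```

The CART algorithm greedily picks the feature and threshold whose split yields the lowest weighted Gini impurity of the two children.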
A decision tree is a map of the possible outcomes of a series of related decisions. It allows an individual or organization to weigh possible actions against one another based on their costs, probabilities, and benefits. Decision trees can be used either to drive informal discussion or to map out an algorithm that predicts the best choice mathematically.
Basics of Decision Tree Learning. This slide deck includes the definition of a decision tree, a basic example, the basic construction of a decision tree, and a MATLAB example.
Machine Learning Unit-5 Decesion Trees & Random Forest.pdf, by AdityaSoraut
It's all about machine learning. Machine learning is a field of artificial intelligence (AI) that focuses on the development of algorithms and statistical models that enable computers to perform tasks without explicit programming instructions. Instead, these algorithms learn from data, identifying patterns and making decisions or predictions based on that data.
There are several types of machine learning approaches, including:
Supervised Learning: In this approach, the algorithm learns from labeled data, where each example is paired with a label or outcome. The algorithm aims to learn a mapping from inputs to outputs, such as classifying emails as spam or not spam.
Unsupervised Learning: Here, the algorithm learns from unlabeled data, seeking to find hidden patterns or structures within the data. Clustering algorithms, for instance, group similar data points together without any predefined labels.
Semi-Supervised Learning: This approach combines elements of supervised and unsupervised learning, typically by using a small amount of labeled data along with a large amount of unlabeled data to improve learning accuracy.
Reinforcement Learning: This paradigm involves an agent learning to make decisions by interacting with an environment. The agent receives feedback in the form of rewards or penalties, enabling it to learn the optimal behavior to maximize cumulative rewards over time. Machine learning algorithms can be applied to a wide range of tasks, including:
Classification: Assigning inputs to one of several categories. For example, classifying whether an email is spam or not.
Regression: Predicting a continuous value based on input features. For instance, predicting house prices based on features like square footage and location.
Clustering: Grouping similar data points together based on their characteristics.
Dimensionality Reduction: Reducing the number of input variables to simplify analysis and improve computational efficiency.
Recommendation Systems: Predicting user preferences and suggesting items or actions accordingly.
Natural Language Processing (NLP): Analyzing and generating human language text, enabling tasks like sentiment analysis, machine translation, and text summarization.
Machine learning has numerous applications across various domains, including healthcare, finance, marketing, cybersecurity, and more. It continues to be an area of active research and
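For instance, classification can be as simple as a nearest-neighbor rule that copies the label of the closest training point. A toy sketch with made-up data:

```python
# A minimal 1-nearest-neighbor classifier on invented one-feature data,
# purely to illustrate the classification task described above.
train = [(1.0, "a"), (1.2, "a"), (3.0, "b"), (3.3, "b")]

def nearest_neighbor(x):
    """Return the label of the training point closest to x."""
    return min(train, key=lambda point: abs(point[0] - x))[1]

print(nearest_neighbor(1.1))  # closest to the "a" cluster
print(nearest_neighbor(3.1))  # closest to the "b" cluster
```

Regression works the same way with a numeric target instead of a label, and clustering drops the labels entirely.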
Get to know in detail the terminology of Random Forest, the types of algorithms used in its workflow, and its advantages and disadvantages relative to its predecessors.
Thanks for your time. If you enjoyed this short article, there are tons of topics in advanced analytics, data science, and machine learning available in my Medium repo: https://medium.com/@bobrupakroy
Detailed discussion of the decision tree regressor and classifier, including how to find the right algorithm to split.
Let me know if anything is required. Ping me at google #bobrupakroy
Ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone.
An ensemble is itself a supervised learning algorithm, because it can be trained and then used to make predictions. The trained ensemble, therefore, represents a single hypothesis. This hypothesis, however, is not necessarily contained within the hypothesis space of the models from which it is built.
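The simplest way an ensemble combines its base learners is majority (hard) voting over their predictions. A toy sketch of that combination rule:

```python
from collections import Counter

def hard_vote(predictions):
    """Majority vote across base learners' predictions for one sample."""
    return Counter(predictions).most_common(1)[0][0]

# Three hypothetical base learners disagree; the ensemble takes the majority
print(hard_vote(["spam", "spam", "ham"]))
```

Note how the voted prediction can differ from every individual learner's decision boundary, which is exactly the sense in which the ensemble's hypothesis need not lie in the base models' hypothesis space.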
Decision tree is a type of supervised learning algorithm (having a pre-defined target variable) that is mostly used in classification problems. It is a tree in which each branch node represents a choice between a number of alternatives, and each leaf node represents a decision.
Understanding computer vision with Deep Learning, by CloudxLab
Computer vision is a branch of computer science which deals with recognising objects, people and identifying patterns in visuals. It is basically analogous to the vision of an animal.
Topics covered:
1. Overview of Machine Learning
2. Basics of Deep Learning
3. What is computer vision and its use-cases?
4. Various algorithms used in Computer Vision (mostly CNN)
5. Live hands-on demo of either Auto Cameraman or Face recognition system
6. What next?
( Machine Learning & Deep Learning Specialization Training: https://goo.gl/5u2RiS )
This CloudxLab Reinforcement Learning tutorial helps you to understand Reinforcement Learning in detail. Below are the topics covered in this tutorial:
1) What is Reinforcement?
2) Reinforcement Learning an Introduction
3) Reinforcement Learning Example
4) Learning to Optimize Rewards
5) Policy Search - Brute Force Approach, Genetic Algorithms and Optimization Techniques
6) OpenAI Gym
7) The Credit Assignment Problem
8) Inverse Reinforcement Learning
9) Playing Atari with Deep Reinforcement Learning
10) Policy Gradients
11) Markov Decision Processes
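Many of these ideas (rewards, policies, the credit assignment problem) already appear in the simplest setting, a multi-armed bandit. A toy epsilon-greedy sketch, not taken from the tutorial:

```python
import random

random.seed(42)
# Hypothetical two-armed bandit: arm 1 pays off more often than arm 0
payoff = [0.3, 0.7]
counts, values = [0, 0], [0.0, 0.0]

for step in range(2000):
    # Epsilon-greedy policy: mostly exploit the best-looking arm,
    # explore a random arm 10% of the time
    if random.random() < 0.1:
        arm = random.randrange(2)
    else:
        arm = values.index(max(values))
    reward = 1.0 if random.random() < payoff[arm] else 0.0
    counts[arm] += 1
    # Incremental average of observed rewards, per arm
    values[arm] += (reward - values[arm]) / counts[arm]

print(values)  # the estimate for arm 1 should approach its true payoff
```

Full reinforcement learning extends this with states and delayed rewards, which is where Markov Decision Processes and policy gradients come in.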
Apache Spark - Key Value RDD - Transformations | Big Data Hadoop Spark Tutori..., by CloudxLab
Big Data with Hadoop & Spark Training: http://bit.ly/2sm5Ekd
This CloudxLab Key-Value RDD Transformations tutorial helps you to understand Key-Value RDD transformations in detail. Below are the topics covered in this tutorial:
1) Transformations on Key-Value Pair RDD - keys(), values(), groupByKey(), combineByKey(), sortByKey(), subtractByKey(), join(), leftOuterJoin(), rightOuterJoin(), cogroup(), countByKey() and lookup()
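To show what two of these transformations compute, here is a pure-Python sketch of the semantics of groupByKey() and reduceByKey() on a small pair list. This is not actual Spark code, just the per-key logic those operations distribute across a cluster:

```python
from collections import defaultdict

pairs = [("a", 1), ("b", 2), ("a", 3), ("b", 4)]

# groupByKey(): gather all values per key
grouped = defaultdict(list)
for k, v in pairs:
    grouped[k].append(v)
print(dict(grouped))

# reduceByKey(lambda a, b: a + b): combine values per key as they arrive,
# which avoids materializing the full value lists
reduced = {}
for k, v in pairs:
    reduced[k] = reduced.get(k, 0) + v
print(reduced)
```

In Spark, the reduceByKey() form is generally preferred over groupByKey() followed by an aggregation because it combines values before shuffling.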
Advanced Spark Programming - Part 2 | Big Data Hadoop Spark Tutorial | CloudxLab
Big Data with Hadoop & Spark Training: http://bit.ly/2kyRTuW
This CloudxLab Advanced Spark Programming tutorial helps you to understand Advanced Spark Programming in detail. Below are the topics covered in this slide:
1) Shared Variables - Accumulators & Broadcast Variables
2) Accumulators and Fault Tolerance
3) Custom Accumulators - Version 1.x & Version 2.x
4) Examples of Broadcast Variables
5) Key Performance Considerations - Level of Parallelism
6) Serialization Format - Kryo
7) Memory Management
8) Hardware Provisioning
Apache Spark - Dataframes & Spark SQL - Part 2 | Big Data Hadoop Spark Tutori..., by CloudxLab
Big Data with Hadoop & Spark Training: http://bit.ly/2sm9c61
This CloudxLab Introduction to Spark SQL & DataFrames tutorial helps you to understand Spark SQL & DataFrames in detail. Below are the topics covered in this slide:
1) Loading XML
2) What is RPC - Remote Process Call
3) Loading AVRO
4) Data Sources - Parquet
5) Creating DataFrames From Hive Table
6) Setting up Distributed SQL Engine
Apache Spark - Dataframes & Spark SQL - Part 1 | Big Data Hadoop Spark Tutori..., by CloudxLab
Big Data with Hadoop & Spark Training: http://bit.ly/2sf2z6i
This CloudxLab Introduction to Spark SQL & DataFrames tutorial helps you to understand Spark SQL & DataFrames in detail. Below are the topics covered in this slide:
1) Introduction to DataFrames
2) Creating DataFrames from JSON
3) DataFrame Operations
4) Running SQL Queries Programmatically
5) Datasets
6) Inferring the Schema Using Reflection
7) Programmatically Specifying the Schema
Apache Spark - Running on a Cluster | Big Data Hadoop Spark Tutorial | CloudxLab
Big Data with Hadoop & Spark Training: http://bit.ly/2IUsWca
This CloudxLab Running in a Cluster tutorial helps you to understand running Spark in the cluster in detail. Below are the topics covered in this tutorial:
1) Spark Runtime Architecture
2) Driver Node
3) Scheduling Tasks on Executors
4) Understanding the Architecture
5) Cluster Managers
6) Executors
7) Launching a Program using spark-submit
8) Local Mode & Cluster-Mode
9) Installing Standalone Cluster
10) Cluster Mode - YARN
11) Launching a Program on YARN
12) Cluster Mode - Mesos and AWS EC2
13) Deployment Modes - Client and Cluster
14) Which Cluster Manager to Use?
15) Common flags for spark-submit
Introduction to SparkR | Big Data Hadoop Spark Tutorial | CloudxLab
Big Data with Hadoop & Spark Training: http://bit.ly/2LCTufA
This CloudxLab Introduction to SparkR tutorial helps you to understand SparkR in detail. Below are the topics covered in this tutorial:
1) SparkR (R on Spark)
2) SparkR DataFrames
3) Launch SparkR
4) Creating DataFrames from Local DataFrames
5) DataFrame Operation
6) Creating DataFrames - From JSON
7) Running SQL Queries from SparkR
Introduction to NoSQL | Big Data Hadoop Spark Tutorial | CloudxLab
Big Data with Hadoop & Spark Training: http://bit.ly/2kyP2Ct
This CloudxLab Introduction to NoSQL tutorial helps you to understand NoSQL in detail. Below are the topics covered in this slide:
1) Introduction to NoSQL
2) Scaling Out vs Scaling Up
3) ACID - Properties of DB Transactions
4) RDBMS - Story
5) What is NoSQL?
6) Types Of NoSQL Stores
7) CAP Theorem
8) Serialization
9) Column Oriented Database
10) Column Family Oriented DataStore
Introduction to MapReduce - Hadoop Streaming | Big Data Hadoop Spark Tutorial..., by CloudxLab
Big Data with Hadoop & Spark Training: http://bit.ly/2sh5b3E
This CloudxLab Hadoop Streaming tutorial helps you to understand Hadoop Streaming in detail. Below are the topics covered in this tutorial:
1) Hadoop Streaming and Why Do We Need it?
2) Writing Streaming Jobs
3) Testing Streaming jobs and Hands-on on CloudxLab
Introduction To TensorFlow | Deep Learning Using TensorFlow | CloudxLab
( Machine Learning & Deep Learning Specialization Training: https://goo.gl/6n3vko )
This CloudxLab TensorFlow tutorial helps you to understand TensorFlow in detail. Below are the topics covered in this tutorial:
1) Why TensorFlow?
2) What are Tensors?
3) What is TensorFlow?
4) Creating your First Graph
5) Linear Regression with TensorFlow
6) Implementing Gradient Descent using TensorFlow
7) Implementing Gradient Descent Using autodiff
8) Implementing Gradient Descent Using an Optimizer
9) Graph Visualization using TensorBoard
10) Name Scopes in TensorFlow
11) Modularity in TensorFlow
12) Sharing Variables in TensorFlow
Introduction to Deep Learning | CloudxLabCloudxLab
( Machine Learning & Deep Learning Specialization Training: https://goo.gl/goQxnL )
This CloudxLab Deep Learning tutorial helps you to understand Deep Learning in detail. Below are the topics covered in this tutorial:
1) What is Deep Learning
2) Deep Learning Applications
3) Artificial Neural Network
4) Deep Learning Neural Networks
5) Deep Learning Frameworks
6) AI vs Machine Learning
In this tutorial, we will learn the the following topics -
+ The Curse of Dimensionality
+ Main Approaches for Dimensionality Reduction
+ PCA - Principal Component Analysis
+ Kernel PCA
+ LLE
+ Other Dimensionality Reduction Techniques
In this tutorial, we will learn the the following topics -
+ Voting Classifiers
+ Bagging and Pasting
+ Random Patches and Random Subspaces
+ Random Forests
+ Boosting
+ Stacking
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
JMeter webinar - integration with InfluxDB and GrafanaRTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
2. Machine Learning - Classification
Decision Trees
In this session we’ll learn about Decision Trees.
Decision Trees are versatile Machine Learning algorithms that can perform:
● Classification tasks
● Regression tasks
● And even multi-output tasks
They are very powerful algorithms, capable of fitting complex datasets. They are also used in Random Forests (remember our project?).
3. Example of a Decision Tree
4. What we’ll learn in this session
● Working with Decision Trees
○ Train
○ Visualize
○ Make predictions with a Decision Tree
● The CART training algorithm used by Scikit-Learn
● How to regularize trees and use them for regression tasks
● Regression with Decision Trees
● Limitations of Decision Trees
5. Training and Visualizing a Decision Tree
Let’s build a Decision Tree and take a look at how it makes predictions.
We’ll train a DecisionTreeClassifier using Scikit-Learn on the famous Iris dataset, and then we’ll see how it works.
6. Iris Dataset - Training and Visualizing a Decision Tree
The Iris dataset consists of 4 features:
● Petal length
● Petal width
● Sepal length
● Sepal width
There are three classes:
● Iris Setosa
● Iris Versicolor
● Iris Virginica
7. Iris Dataset - Training and Visualizing a Decision Tree (figure)
8. Training and Visualizing a Decision Tree
The Iris dataset has 4 features: petal length, petal width, sepal length, and sepal width.
Here we’ll use only two of them: petal length and petal width.
9. Training and Visualizing a Decision Tree
We will follow these steps to train and visualize our Decision Tree:
1. Load the Iris dataset using Scikit-Learn
2. Select only the petal length and petal width features
3. Train our Decision Tree classifier on the Iris dataset
4. Visualize our Decision Tree using export_graphviz()
5. export_graphviz() gives us a file in .dot format, which we convert to PNG using the dot command-line tool
10. Training and Visualizing a Decision Tree
1. Load the Iris dataset using Scikit-Learn
>>> from sklearn.datasets import load_iris
>>> iris = load_iris()
Run it in a Jupyter notebook
11. Training and Visualizing a Decision Tree
2. Select only the petal length and petal width features
>>> X = iris.data[:, 2:]  # petal length and width
>>> y = iris.target
Run it in a Jupyter notebook
12. Training and Visualizing a Decision Tree
3. Train our Decision Tree classifier on the Iris dataset
>>> from sklearn.tree import DecisionTreeClassifier
>>> tree_clf = DecisionTreeClassifier(max_depth=2)
>>> tree_clf.fit(X, y)
Run it in a Jupyter notebook
13. Training and Visualizing a Decision Tree
4. Visualize our Decision Tree using export_graphviz()
We can visualize the trained Decision Tree using the export_graphviz() function.
>>> from sklearn.tree import export_graphviz
>>> export_graphviz(
...     tree_clf,
...     out_file="iris_tree.dot",
...     feature_names=iris.feature_names[2:],
...     class_names=iris.target_names,
...     rounded=True,
...     filled=True,
... )
14. Training and Visualizing a Decision Tree
5. Converting to a PNG file
You can then convert this .dot file to a variety of formats such as PDF or PNG using the dot command-line tool from the graphviz package.
This command converts the .dot file to a .png image file:
$ dot -Tpng iris_tree.dot -o iris_tree.png
Run it in the console
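Putting steps 1–5 together, here is a minimal self-contained sketch. One assumption to note: passing out_file=None makes export_graphviz return the DOT source as a string, which avoids needing a writable file path.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_graphviz

iris = load_iris()
X = iris.data[:, 2:]  # petal length and petal width
y = iris.target

tree_clf = DecisionTreeClassifier(max_depth=2)
tree_clf.fit(X, y)

# With out_file=None, export_graphviz returns the DOT source as a string
dot_source = export_graphviz(
    tree_clf,
    out_file=None,
    feature_names=iris.feature_names[2:],
    class_names=iris.target_names,
    rounded=True,
    filled=True,
)
print(dot_source.splitlines()[0])  # first line of the DOT description
```

You can paste the printed DOT source into any Graphviz renderer instead of running the dot command-line tool.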
15. Training and Visualizing a Decision Tree (figure: iris_tree.png)
16. Understanding the Decision Tree
A node’s value attribute tells you how many training instances of each class the node applies to. For example, the bottom-right node applies to 0 Iris-Setosa, 1 Iris-Versicolor, and 45 Iris-Virginica instances.
17. Understanding the Decision Tree
A node’s gini attribute measures its impurity: a node is “pure” (gini=0) if all the training instances it applies to belong to the same class. For example, since the depth-1 left node applies only to Iris-Setosa training instances, it is pure and its gini score is 0.
18. Making Predictions
To make a prediction, the classifier follows these steps:
● Start at the root node (depth 0, at the top). This node asks whether the flower’s petal length is smaller than or equal to 2.45 cm.
19. Making Predictions
○ If it is, you move down to the root’s left child node (depth 1, left). In this case it is a leaf node, so the flower is predicted as Iris-Setosa.
20. Making Predictions
○ If it is not, you move down to the root’s right child node (depth 1, right). Since this is not a leaf node, it asks a further question: is the petal width smaller than or equal to 1.75 cm?
21. Making Predictions
● If it is, then your flower is most likely an Iris-Versicolor (depth 2, left).
● If it is not, it is most likely an Iris-Virginica (depth 2, right).
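The walkthrough above can be checked in code. The two sample flowers below are made-up illustrations; the thresholds 2.45 cm and 1.75 cm are the ones the tree learns on the Iris data.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X = iris.data[:, 2:]  # petal length and petal width
y = iris.target

tree_clf = DecisionTreeClassifier(max_depth=2)
tree_clf.fit(X, y)

# petal length 1.5 cm <= 2.45 cm -> left leaf -> Iris-Setosa (class 0)
# petal length 5 cm > 2.45 cm and petal width 2.0 cm > 1.75 cm -> Iris-Virginica (class 2)
print(tree_clf.predict([[1.5, 0.5], [5.0, 2.0]]))
```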
22. Decision Tree’s decision boundaries
Boundary made at depth 0: petal length <= 2.45
23. Decision Tree’s decision boundaries
Boundary made at depth 1: petal width <= 1.75
24. Decision Tree’s decision boundaries
This would have been the decision boundary dividing the tree further, i.e. if max_depth had been set to 3.
25. How is Gini Impurity Calculated?
A node’s gini attribute measures its impurity:
● A node is “pure” (gini=0) if all the training instances it applies to belong to the same class.
● A gini score approaching 1 expresses maximal impurity: the training instances are spread evenly across many classes.
26. How is Gini Impurity Calculated?
The gini impurity score of the i-th node is:
G_i = 1 − Σ_{k=1}^{c} p_{i,k}²
Here p_{i,k} is the ratio of class-k instances among the training instances in the node whose gini score is being calculated, and there are c classes in total.
27. How is Gini Impurity Calculated?
For example, since the depth-1 left node applies only to Iris-Setosa training instances, it is pure and its gini score is 0.
28. How is Gini Impurity Calculated?
In our example, the depth-2 left node has a gini score equal to
1 − (0/54)² − (49/54)² − (5/54)² ≈ 0.168
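A tiny helper makes the arithmetic concrete; the class counts 0/49/5 are those of the depth-2 left node above.

```python
def gini(counts):
    """Gini impurity of a node given per-class instance counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

# depth-2 left node: 0 Iris-Setosa, 49 Iris-Versicolor, 5 Iris-Virginica
print(round(gini([0, 49, 5]), 3))  # → 0.168
```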
29. Model Interpretation: White Box versus Black Box
A White Box Model:
● Is fairly intuitive
● Its decisions are easy to interpret
● E.g. Decision Trees
30. Model Interpretation: White Box versus Black Box
A Black Box Model:
● Makes great predictions
● It is easy to check the calculations that were performed to make these predictions
31. Model Interpretation: White Box versus Black Box
A Black Box Model:
● However, it is usually hard to explain in simple terms why the predictions were made
● E.g. Random Forests, Neural Networks
32. Model Interpretation: White Box versus Black Box (figure)
33. Estimating Class Probabilities
A Decision Tree can also estimate the probability that an instance belongs to a particular class k. To do this it follows these steps:
● First it traverses the tree to find the leaf node for the instance
● Then it returns the ratio of training instances of class k in that node
34. Estimating Class Probabilities
Suppose you have found a flower whose petals are:
● 5 cm long and
● 1.5 cm wide
35. Estimating Class Probabilities
The tree first asks whether petal length <= 2.45. Since the condition is false, it goes to the right subtree.
36. Estimating Class Probabilities
It then asks whether petal width <= 1.75. Since the condition is true, it goes to the left subtree.
37. Estimating Class Probabilities
The Decision Tree should output the following probabilities:
● 0% for Iris-Setosa (0/54)
● 90.7% for Iris-Versicolor (49/54)
● 9.3% for Iris-Virginica (5/54)
38. Estimating Class Probabilities
If we ask it to predict the class, it should output Iris-Versicolor (class 1) since it has the highest probability.
Let’s verify it:
>>> tree_clf.predict_proba([[5, 1.5]])
array([[0.        , 0.90740741, 0.09259259]])
>>> tree_clf.predict([[5, 1.5]])
array([1])
Run it in a Jupyter notebook
39. The CART Training Algorithm
Scikit-Learn uses the Classification And Regression Tree (CART) algorithm to train Decision Trees.
The idea is really quite simple:
● The algorithm first splits the training set into two subsets using a single feature k and a threshold t_k
40. The CART Training Algorithm
How does it choose k and t_k?
It searches for the pair (k, t_k) that produces the purest subsets, weighted by their size.
(Figure: splitting on feature k at threshold t_k yields the purest subsets.)
41. The CART Training Algorithm
The cost function that the algorithm tries to minimize is:
J(k, t_k) = (m_left / m) G_left + (m_right / m) G_right
where G_left/right measures the impurity of the left/right subset and m_left/right is the number of instances in the left/right subset.
42. The CART Training Algorithm
● Once it has successfully split the training set in two, it splits the subsets using the same logic, then the sub-subsets, and so on, recursively.
● It stops recursing once it reaches the maximum depth (defined by the max_depth hyperparameter), or if it cannot find a split that will reduce impurity.
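The greedy split search can be sketched in a few lines. This is a toy illustration of the CART idea above, not Scikit-Learn’s optimized implementation; the helper names (gini, best_split) and the tiny dataset are made up for the example.

```python
import numpy as np

def gini(y):
    """Gini impurity of a label array."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / len(y)
    return 1.0 - np.sum(p ** 2)

def best_split(X, y):
    """Greedily find the (feature k, threshold t_k) minimizing the
    size-weighted gini of the two resulting subsets (the CART cost J)."""
    m, n = X.shape
    best = (None, None, float("inf"))
    for k in range(n):
        for t in np.unique(X[:, k]):
            left, right = y[X[:, k] <= t], y[X[:, k] > t]
            if len(left) == 0 or len(right) == 0:
                continue
            cost = len(left) / m * gini(left) + len(right) / m * gini(right)
            if cost < best[2]:
                best = (k, t, cost)
    return best

# Toy data: the class is decided by feature 0 alone
X = np.array([[1.0, 0.2], [1.4, 0.2], [4.7, 1.4], [5.1, 1.8]])
y = np.array([0, 0, 1, 1])
k, t, cost = best_split(X, y)
print(k, t, cost)  # splits on feature 0; both subsets are pure, so the cost is 0.0
```

A full CART trainer would then recurse on each subset until max_depth is reached or no split reduces impurity.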
43. Important points on the CART Training Algorithm
● It is a greedy algorithm: it greedily searches for an optimum split at the top level,
● then repeats the process at each level.
44. Important points on the CART Training Algorithm
● It does not check whether or not the split will lead to the lowest possible impurity several levels down.
● A greedy algorithm often produces a reasonably good solution, but it is not guaranteed to be the optimal solution.
45. Computational Complexity of Decision Trees
Complexity of prediction:
● Making predictions requires traversing the Decision Tree from the root to a leaf.
● Decision Trees are generally approximately balanced, so traversing the tree requires going through roughly O(log2(m)) nodes, where m is the total number of training instances.
46. Computational Complexity of Decision Trees
Complexity of prediction:
● Since each node only requires checking the value of one feature (just petal length at the root, just petal width at depth 1), the overall prediction complexity is just O(log2(m)).
47. Computational Complexity of Decision Trees
Complexity of prediction:
● Hence, the complexity of prediction is independent of the number of features.
So predictions are very fast, even when dealing with large training sets.
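To get a feel for how small O(log2(m)) is, a back-of-the-envelope illustration:

```python
import math

# A roughly balanced tree over 1,000,000 training instances needs only
# about log2(1,000,000) ≈ 20 feature checks per prediction.
m = 1_000_000
print(math.ceil(math.log2(m)))  # → 20
```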
48. Computational Complexity of Decision Trees
Complexity of training:
● The training algorithm compares all features (or fewer, if max_features is set) on all samples at each node.
● This results in a training complexity of O(n × m log(m)), where n is the number of features: all n features are compared over the m instances at each of the O(log(m)) levels of the tree.
49. Computational Complexity of Decision Trees
Complexity of training:
For small training sets (less than a few thousand instances), Scikit-Learn can speed up training by presorting the data (set presort=True), but this slows down training considerably for larger training sets. (Note: the presort parameter has since been deprecated and removed in recent Scikit-Learn versions.)
50. Which measure to use? Gini Impurity or Entropy?
By default the Gini impurity measure is used, but you can select the entropy impurity measure instead by setting the criterion hyperparameter to "entropy".
Entropy measures the degree of randomness.
51. Which measure to use? Gini Impurity or Entropy?
The entropy of the i-th node is:
H_i = − Σ_{k=1, p_{i,k}≠0}^{c} p_{i,k} log(p_{i,k})
where p_{i,k} is the ratio of class-k instances in the node.
52. Which measure to use? Gini Impurity or Entropy?
For example, the entropy of the depth-2 left node is
−(49/54) log(49/54) − (5/54) log(5/54) ≈ 0.31
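Checking the arithmetic (using the natural logarithm, which is what the ≈ 0.31 figure above corresponds to; with log base 2 the value would be ≈ 0.445):

```python
import math

def entropy(counts):
    """Entropy of a node given per-class instance counts (natural log)."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

# depth-2 left node: 0 Iris-Setosa, 49 Iris-Versicolor, 5 Iris-Virginica
print(round(entropy([0, 49, 5]), 2))  # → 0.31
```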
53. Which measure to use? Gini Impurity or Entropy?
The truth is, most of the time it does not make a big difference: they lead to similar trees.
● Gini impurity is slightly faster to compute, so it is a good default.
● However, Gini impurity tends to isolate the most frequent class in its own branch of the tree,
● while entropy tends to produce slightly more balanced trees.
54. Regularization Hyperparameters
● If left unconstrained, the tree structure will adapt itself to the training data, fitting it very closely and most likely overfitting it.
● To avoid overfitting the training data, you need to restrict the Decision Tree’s freedom during training.
55. Regularization Hyperparameters
● Restricting the freedom of the model during training is called regularization. The regularization hyperparameters depend on the algorithm used, but generally you can at least restrict the maximum depth of the Decision Tree.
56. Parametric and Non-parametric Models
● Decision Trees are often called nonparametric models, because the number of parameters is not determined prior to training, so the model structure is free to stick closely to the data.
● In contrast, a parametric model such as a linear model has a predetermined number of parameters, so its degrees of freedom are limited, reducing the risk of overfitting but increasing the risk of underfitting.
57. Regularization parameters for the DecisionTreeClassifier class
● max_depth → restricts the maximum depth of the Decision Tree
● min_samples_split → the minimum number of samples a node must have before it can be split
● min_samples_leaf → the minimum number of samples a leaf node must have
58. Regularization parameters for the DecisionTreeClassifier class
● min_weight_fraction_leaf → same as min_samples_leaf but expressed as a fraction of the total number of weighted instances
● max_leaf_nodes → maximum number of leaf nodes
● max_features → maximum number of features that are evaluated for splitting at each node
Increasing min_* hyperparameters or reducing max_* hyperparameters will regularize the model.
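A quick sketch of the effect. The moons dataset, sample count, and noise level here are made-up illustrations (the figures on the next slides use a similar two-class setup):

```python
from sklearn.datasets import make_moons
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=200, noise=0.25, random_state=42)

# Unconstrained tree: grows until every leaf is pure -> overfits the noise
free_tree = DecisionTreeClassifier(random_state=42).fit(X, y)
# Regularized tree: each leaf must cover at least 4 training samples
reg_tree = DecisionTreeClassifier(min_samples_leaf=4, random_state=42).fit(X, y)

print(free_tree.get_n_leaves(), reg_tree.get_n_leaves())
# The regularized tree is much smaller and will probably generalize better.
```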
59. Regularization parameters for the DecisionTreeClassifier class
The Decision Tree above is trained with the default hyperparameters, i.e. no restrictions. This model is overfitting the data.
60. Regularization parameters for the DecisionTreeClassifier class
The Decision Tree above is trained with min_samples_leaf=4. This model will probably generalize better.
61. Regression with Decision Trees
Decision Trees are also capable of performing regression tasks. Let’s build a regression tree using Scikit-Learn’s DecisionTreeRegressor class, training it on a noisy quadratic dataset with max_depth=2.
62. Regression with Decision Trees
>>> from sklearn.tree import DecisionTreeRegressor
>>> tree_reg = DecisionTreeRegressor(max_depth=2)
>>> tree_reg.fit(X, y)
Run it in a Jupyter notebook
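The snippet above assumes X and y already hold the noisy quadratic dataset; a self-contained sketch might look like this (the dataset size, noise level, and seed are made-up illustrations):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Noisy quadratic dataset: y = x^2 plus Gaussian noise
rng = np.random.RandomState(42)
X = rng.rand(200, 1) - 0.5
y = X[:, 0] ** 2 + 0.025 * rng.randn(200)

tree_reg = DecisionTreeRegressor(max_depth=2)
tree_reg.fit(X, y)

# With max_depth=2 the tree has at most 4 leaves, i.e. it predicts at most
# 4 distinct constant values (the mean target of each region)
print(tree_reg.get_n_leaves())
```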
63. Regression with Decision Trees
The main difference is that instead of predicting a class in each node, it predicts a value.
64. Regression with Decision Trees
Notice how the predicted value for each region is always the average target value of the instances in that region.
65. Regression with Decision Trees
The CART algorithm works the same way, except that instead of trying to split the training set in a way that minimizes impurity, it now tries to split it in a way that minimizes the MSE.
66. Regression with Decision Trees
The cost function for regression is:
J(k, t_k) = (m_left / m) MSE_left + (m_right / m) MSE_right
where MSE_node = (1 / m_node) Σ_{i ∈ node} (ŷ_node − y⁽ⁱ⁾)² and ŷ_node is the mean target value of the instances in the node.
67. Regression with Decision Trees
● Just like for classification tasks, Decision Trees are prone to overfitting when dealing with regression tasks.
● Without any regularization the model may overfit the data.
68. Regression with Decision Trees
Without any regularization, i.e. using the default hyperparameters, you get the model above. It is obviously overfitting the training set very badly.
69. Regression with Decision Trees
Just setting min_samples_leaf=10 results in a much more reasonable model, represented by the figure above.
70. Demerits of Decision Trees
● Decision Trees love orthogonal decision boundaries (all splits are perpendicular to an axis).
● This makes them sensitive to training set rotation.
● They are very sensitive to small variations in the training data.
71. Demerits of Decision Trees
The figure above shows a simple linearly separable dataset: on the left, a Decision Tree can split it easily, while on the right, after the dataset is rotated by 45°, the decision boundary looks unnecessarily convoluted.
72. Demerits of Decision Trees
Although both Decision Trees fit the training set perfectly, it is very likely that the model on the right will not generalize well.
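The rotation sensitivity is easy to reproduce. This is a toy sketch: the uniform data, seed, and 45° rotation matrix are made-up illustrations of the effect described above.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(42)
X = rng.uniform(-1, 1, size=(200, 2))
y = (X[:, 0] > 0).astype(int)  # linearly separable by a vertical line

# Axis-aligned boundary: a single split on feature 0 separates the classes
tree_a = DecisionTreeClassifier(random_state=42).fit(X, y)

# Rotate the dataset by 45 degrees: same problem, now a diagonal boundary
theta = np.pi / 4
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta), np.cos(theta)]])
X_rot = X @ R.T
tree_b = DecisionTreeClassifier(random_state=42).fit(X_rot, y)

# The rotated problem forces a staircase of axis-aligned splits
print(tree_a.get_depth(), tree_b.get_depth())
```

Random Forests mitigate this instability by averaging many trees, which is where we are heading next.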