Concepts include decision tree with its examples. Measures used for splitting in decision tree like gini index, entropy, information gain, pros and cons, validation. Basics of random forests with its example and uses.
Intro and maths behind Bayes theorem. Bayes theorem as a classifier. NB algorithm and examples of bayes. Intro to knn algorithm, lazy learning, cosine similarity. Basics of recommendation and filtering methods.
Intro to SVM with its maths and examples. Types of SVM and its parameters. Concept of vector algebra. Concepts of text analytics and Natural Language Processing along with its applications.
In this slide, variables types, probability theory behind the algorithms and its uses including distribution is explained. Also theorems like bayes theorem is also explained.
Introduction to linear regression and the maths behind it like line of best fit, regression matrics. Other concepts include cost function, gradient descent, overfitting and underfitting, r squared.
Intro and maths behind Bayes theorem. Bayes theorem as a classifier. NB algorithm and examples of bayes. Intro to knn algorithm, lazy learning, cosine similarity. Basics of recommendation and filtering methods.
Intro to SVM with its maths and examples. Types of SVM and its parameters. Concept of vector algebra. Concepts of text analytics and Natural Language Processing along with its applications.
In this slide, variables types, probability theory behind the algorithms and its uses including distribution is explained. Also theorems like bayes theorem is also explained.
Introduction to linear regression and the maths behind it like line of best fit, regression matrics. Other concepts include cost function, gradient descent, overfitting and underfitting, r squared.
Data Science - Part III - EDA & Model SelectionDerek Kane
This lecture introduces the concept of EDA, understanding, and working with data for machine learning and predictive analysis. The lecture is designed for anyone who wants to understand how to work with data and does not get into the mathematics. We will discuss how to utilize summary statistics, diagnostic plots, data transformations, variable selection techniques including principal component analysis, and finally get into the concept of model selection.
very detailed illustration of Log of Odds, Logit/ logistic regression and their types from binary logit, ordered logit to multinomial logit and also with their assumptions.
Thanks, for your time, if you enjoyed this short article there are tons of topics in advanced analytics, data science, and machine learning available in my medium repo. https://medium.com/@bobrupakroy
A ppt based on predicting prices of houses. Also tells about basics of machine learning and the algorithm used to predict those prices by using regression technique.
Types of Probability Distributions - Statistics IIRupak Roy
Get to know in detail the definitions of the types of probability distributions from binomial, poison, hypergeometric, negative binomial to continuous distribution like t-distribution and much more.
Let me know if anything is required. Ping me at google #bobrupakroy
Machine Learning Decision Tree AlgorithmsRupak Roy
Details discussion about the Tree Algorithms like Gini, Information Gain, Chi-square for categorical and Reduction in variance for continuous variable. Let me know if anything is required. Happy to help. Enjoy machine learning! #bobrupakroy
Multiple sample test - Anova, Chi-square, Test of association, Goodness of Fit Rupak Roy
Detailed demonstration of Multiple Sample Test like Analysis of Variance (ANOVA), kinds of ANOVA One Way, Two Way, Chi-square with their assumptions and applications using excel, and much more.
Let me know if anything is needed. Happy to help. ping @ #bobrupakroy
Machine Learning Tutorial Part - 1 | Machine Learning Tutorial For Beginners ...Simplilearn
This presentation on Machine Learning will help you understand why Machine Learning came into picture, what is Machine Learning, types of Machine Learning, Machine Learning algorithms with a detailed explanation on linear regression, decision tree & support vector machine and at the end you will also see a use case implementation where we classify whether a recipe is of a cupcake or muffin using SVM algorithm. Machine learning is a core sub-area of artificial intelligence; it enables computers to get into a mode of self-learning without being explicitly programmed. When exposed to new data, these computer programs are enabled to learn, grow, change, and develop by themselves. So, to put simply, the iterative aspect of machine learning is the ability to adapt to new data independently. Now, let us get started with this Machine Learning presentation and understand what it is and why it matters.
Below topics are explained in this Machine Learning presentation:
1. Why Machine Learning?
2. What is Machine Learning?
3. Types of Machine Learning
4. Machine Learning Algorithms
- Linear Regression
- Decision Trees
- Support Vector Machine
5. Use case: Classify whether a recipe is of a cupcake or a muffin using SVM
About Simplilearn Machine Learning course:
A form of artificial intelligence, Machine Learning is revolutionizing the world of computing as well as all people’s digital interactions. Machine Learning powers such innovative automated technologies as recommendation engines, facial recognition, fraud protection and even self-driving cars. This Machine Learning course prepares engineers, data scientists and other professionals with knowledge and hands-on skills required for certification and job competency in Machine Learning.
Why learn Machine Learning?
Machine Learning is taking over the world- and with that, there is a growing need among companies for professionals to know the ins and outs of Machine Learning
The Machine Learning market size is expected to grow from USD 1.03 Billion in 2016 to USD 8.81 Billion by 2022, at a Compound Annual Growth Rate (CAGR) of 44.1% during the forecast period.
We recommend this Machine Learning training course for the following professionals in particular:
1. Developers aspiring to be a data scientist or Machine Learning engineer
2. Information architects who want to gain expertise in Machine Learning algorithms
3. Analytics professionals who want to work in Machine Learning or artificial intelligence
4. Graduates looking to build a career in data science and Machine Learning
Learn more at: https://www.simplilearn.com/
This presentation is aimed at fitting a Simple Linear Regression model in a Python program. IDE used is Spyder. Screenshots from a working example are used for demonstration.
Application of Machine Learning in AgricultureAman Vasisht
With the growing trend of machine learning, it is needless to say how machine learning can help reap benefits in agriculture. It will be boon for the farmer welfare.
Data Science - Part V - Decision Trees & Random Forests Derek Kane
This lecture provides an overview of decision tree machine learning algorithms and random forest ensemble techniques. The practical example includes diagnosing Type II diabetes and evaluating customer churn in the telecommunication industry.
Data Science - Part III - EDA & Model SelectionDerek Kane
This lecture introduces the concept of EDA, understanding, and working with data for machine learning and predictive analysis. The lecture is designed for anyone who wants to understand how to work with data and does not get into the mathematics. We will discuss how to utilize summary statistics, diagnostic plots, data transformations, variable selection techniques including principal component analysis, and finally get into the concept of model selection.
very detailed illustration of Log of Odds, Logit/ logistic regression and their types from binary logit, ordered logit to multinomial logit and also with their assumptions.
Thanks, for your time, if you enjoyed this short article there are tons of topics in advanced analytics, data science, and machine learning available in my medium repo. https://medium.com/@bobrupakroy
A ppt based on predicting prices of houses. Also tells about basics of machine learning and the algorithm used to predict those prices by using regression technique.
Types of Probability Distributions - Statistics IIRupak Roy
Get to know in detail the definitions of the types of probability distributions from binomial, poison, hypergeometric, negative binomial to continuous distribution like t-distribution and much more.
Let me know if anything is required. Ping me at google #bobrupakroy
Machine Learning Decision Tree AlgorithmsRupak Roy
Details discussion about the Tree Algorithms like Gini, Information Gain, Chi-square for categorical and Reduction in variance for continuous variable. Let me know if anything is required. Happy to help. Enjoy machine learning! #bobrupakroy
Multiple sample test - Anova, Chi-square, Test of association, Goodness of Fit Rupak Roy
Detailed demonstration of Multiple Sample Test like Analysis of Variance (ANOVA), kinds of ANOVA One Way, Two Way, Chi-square with their assumptions and applications using excel, and much more.
Let me know if anything is needed. Happy to help. ping @ #bobrupakroy
Machine Learning Tutorial Part - 1 | Machine Learning Tutorial For Beginners ...Simplilearn
This presentation on Machine Learning will help you understand why Machine Learning came into picture, what is Machine Learning, types of Machine Learning, Machine Learning algorithms with a detailed explanation on linear regression, decision tree & support vector machine and at the end you will also see a use case implementation where we classify whether a recipe is of a cupcake or muffin using SVM algorithm. Machine learning is a core sub-area of artificial intelligence; it enables computers to get into a mode of self-learning without being explicitly programmed. When exposed to new data, these computer programs are enabled to learn, grow, change, and develop by themselves. So, to put simply, the iterative aspect of machine learning is the ability to adapt to new data independently. Now, let us get started with this Machine Learning presentation and understand what it is and why it matters.
Below topics are explained in this Machine Learning presentation:
1. Why Machine Learning?
2. What is Machine Learning?
3. Types of Machine Learning
4. Machine Learning Algorithms
- Linear Regression
- Decision Trees
- Support Vector Machine
5. Use case: Classify whether a recipe is of a cupcake or a muffin using SVM
About Simplilearn Machine Learning course:
A form of artificial intelligence, Machine Learning is revolutionizing the world of computing as well as all people’s digital interactions. Machine Learning powers such innovative automated technologies as recommendation engines, facial recognition, fraud protection and even self-driving cars. This Machine Learning course prepares engineers, data scientists and other professionals with knowledge and hands-on skills required for certification and job competency in Machine Learning.
Why learn Machine Learning?
Machine Learning is taking over the world- and with that, there is a growing need among companies for professionals to know the ins and outs of Machine Learning
The Machine Learning market size is expected to grow from USD 1.03 Billion in 2016 to USD 8.81 Billion by 2022, at a Compound Annual Growth Rate (CAGR) of 44.1% during the forecast period.
We recommend this Machine Learning training course for the following professionals in particular:
1. Developers aspiring to be a data scientist or Machine Learning engineer
2. Information architects who want to gain expertise in Machine Learning algorithms
3. Analytics professionals who want to work in Machine Learning or artificial intelligence
4. Graduates looking to build a career in data science and Machine Learning
Learn more at: https://www.simplilearn.com/
This presentation is aimed at fitting a Simple Linear Regression model in a Python program. IDE used is Spyder. Screenshots from a working example are used for demonstration.
Application of Machine Learning in AgricultureAman Vasisht
With the growing trend of machine learning, it is needless to say how machine learning can help reap benefits in agriculture. It will be boon for the farmer welfare.
Data Science - Part V - Decision Trees & Random Forests Derek Kane
This lecture provides an overview of decision tree machine learning algorithms and random forest ensemble techniques. The practical example includes diagnosing Type II diabetes and evaluating customer churn in the telecommunication industry.
A decision tree is a guide to the potential results of a progression of related choices. It permits an individual or association to gauge potential activities against each other dependent on their costs, probabilities, and advantages. They can be utilized either to drive casual conversation or to outline a calculation that predicts the most ideal decision scientifically.
Decision Trees for Classification: A Machine Learning AlgorithmPalin analytics
Decision Trees in Machine Learning - Decision tree method is a commonly used data mining method for establishing classification systems based on several covariates or for developing prediction algorithms for a target variable.
ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone.
An ensemble is itself a supervised learning algorithm, because it can be trained and then used to make predictions. The trained ensemble, therefore, represents a single hypothesis. This hypothesis, however, is not necessarily contained within the hypothesis space of the models from which it is built.
Machine Learning Unit-5 Decesion Trees & Random Forest.pdfAdityaSoraut
Its all about Machine learning .Machine learning is a field of artificial intelligence (AI) that focuses on the development of algorithms and statistical models that enable computers to perform tasks without explicit programming instructions. Instead, these algorithms learn from data, identifying patterns, and making decisions or predictions based on that data.
There are several types of machine learning approaches, including:
Supervised Learning: In this approach, the algorithm learns from labeled data, where each example is paired with a label or outcome. The algorithm aims to learn a mapping from inputs to outputs, such as classifying emails as spam or not spam.
Unsupervised Learning: Here, the algorithm learns from unlabeled data, seeking to find hidden patterns or structures within the data. Clustering algorithms, for instance, group similar data points together without any predefined labels.
Semi-Supervised Learning: This approach combines elements of supervised and unsupervised learning, typically by using a small amount of labeled data along with a large amount of unlabeled data to improve learning accuracy.
Reinforcement Learning: This paradigm involves an agent learning to make decisions by interacting with an environment. The agent receives feedback in the form of rewards or penalties, enabling it to learn the optimal behavior to maximize cumulative rewards over time.Machine learning algorithms can be applied to a wide range of tasks, including:
Classification: Assigning inputs to one of several categories. For example, classifying whether an email is spam or not.
Regression: Predicting a continuous value based on input features. For instance, predicting house prices based on features like square footage and location.
Clustering: Grouping similar data points together based on their characteristics.
Dimensionality Reduction: Reducing the number of input variables to simplify analysis and improve computational efficiency.
Recommendation Systems: Predicting user preferences and suggesting items or actions accordingly.
Natural Language Processing (NLP): Analyzing and generating human language text, enabling tasks like sentiment analysis, machine translation, and text summarization.
Machine learning has numerous applications across various domains, including healthcare, finance, marketing, cybersecurity, and more. It continues to be an area of active research and
Get to know in detail the termonologies of Random Forest with their types of algorithms used in the workflow along with their advantages and disadvantages of their predecessors.
Thanks, for your time, if you enjoyed this short article there are tons of topics in advanced analytics, data science, and machine learning available in my medium repo. https://medium.com/@bobrupakroy
Similar to Machine learning session6(decision trees random forrest) (20)
Operation “Blue Star” is the only event in the history of Independent India where the state went into war with its own people. Even after about 40 years it is not clear if it was culmination of states anger over people of the region, a political game of power or start of dictatorial chapter in the democratic setup.
The people of Punjab felt alienated from main stream due to denial of their just demands during a long democratic struggle since independence. As it happen all over the word, it led to militant struggle with great loss of lives of military, police and civilian personnel. Killing of Indira Gandhi and massacre of innocent Sikhs in Delhi and other India cities was also associated with this movement.
Biological screening of herbal drugs: Introduction and Need for
Phyto-Pharmacological Screening, New Strategies for evaluating
Natural Products, In vitro evaluation techniques for Antioxidants, Antimicrobial and Anticancer drugs. In vivo evaluation techniques
for Anti-inflammatory, Antiulcer, Anticancer, Wound healing, Antidiabetic, Hepatoprotective, Cardio protective, Diuretics and
Antifertility, Toxicity studies as per OECD guidelines
The Roman Empire A Historical Colossus.pdfkaushalkr1407
The Roman Empire, a vast and enduring power, stands as one of history's most remarkable civilizations, leaving an indelible imprint on the world. It emerged from the Roman Republic, transitioning into an imperial powerhouse under the leadership of Augustus Caesar in 27 BCE. This transformation marked the beginning of an era defined by unprecedented territorial expansion, architectural marvels, and profound cultural influence.
The empire's roots lie in the city of Rome, founded, according to legend, by Romulus in 753 BCE. Over centuries, Rome evolved from a small settlement to a formidable republic, characterized by a complex political system with elected officials and checks on power. However, internal strife, class conflicts, and military ambitions paved the way for the end of the Republic. Julius Caesar’s dictatorship and subsequent assassination in 44 BCE created a power vacuum, leading to a civil war. Octavian, later Augustus, emerged victorious, heralding the Roman Empire’s birth.
Under Augustus, the empire experienced the Pax Romana, a 200-year period of relative peace and stability. Augustus reformed the military, established efficient administrative systems, and initiated grand construction projects. The empire's borders expanded, encompassing territories from Britain to Egypt and from Spain to the Euphrates. Roman legions, renowned for their discipline and engineering prowess, secured and maintained these vast territories, building roads, fortifications, and cities that facilitated control and integration.
The Roman Empire’s society was hierarchical, with a rigid class system. At the top were the patricians, wealthy elites who held significant political power. Below them were the plebeians, free citizens with limited political influence, and the vast numbers of slaves who formed the backbone of the economy. The family unit was central, governed by the paterfamilias, the male head who held absolute authority.
Culturally, the Romans were eclectic, absorbing and adapting elements from the civilizations they encountered, particularly the Greeks. Roman art, literature, and philosophy reflected this synthesis, creating a rich cultural tapestry. Latin, the Roman language, became the lingua franca of the Western world, influencing numerous modern languages.
Roman architecture and engineering achievements were monumental. They perfected the arch, vault, and dome, constructing enduring structures like the Colosseum, Pantheon, and aqueducts. These engineering marvels not only showcased Roman ingenuity but also served practical purposes, from public entertainment to water supply.
2024.06.01 Introducing a competency framework for languag learning materials ...Sandy Millin
http://sandymillin.wordpress.com/iateflwebinar2024
Published classroom materials form the basis of syllabuses, drive teacher professional development, and have a potentially huge influence on learners, teachers and education systems. All teachers also create their own materials, whether a few sentences on a blackboard, a highly-structured fully-realised online course, or anything in between. Despite this, the knowledge and skills needed to create effective language learning materials are rarely part of teacher training, and are mostly learnt by trial and error.
Knowledge and skills frameworks, generally called competency frameworks, for ELT teachers, trainers and managers have existed for a few years now. However, until I created one for my MA dissertation, there wasn’t one drawing together what we need to know and do to be able to effectively produce language learning materials.
This webinar will introduce you to my framework, highlighting the key competencies I identified from my research. It will also show how anybody involved in language teaching (any language, not just English!), teacher training, managing schools or developing language learning materials can benefit from using the framework.
The French Revolution, which began in 1789, was a period of radical social and political upheaval in France. It marked the decline of absolute monarchies, the rise of secular and democratic republics, and the eventual rise of Napoleon Bonaparte. This revolutionary period is crucial in understanding the transition from feudalism to modernity in Europe.
For more information, visit-www.vavaclasses.com
Instructions for Submissions thorugh G- Classroom.pptxJheel Barad
This presentation provides a briefing on how to upload submissions and documents in Google Classroom. It was prepared as part of an orientation for new Sainik School in-service teacher trainees. As a training officer, my goal is to ensure that you are comfortable and proficient with this essential tool for managing assignments and fostering student engagement.
Introduction to AI for Nonprofits with Tapp NetworkTechSoup
Dive into the world of AI! Experts Jon Hill and Tareq Monaur will guide you through AI's role in enhancing nonprofit websites and basic marketing strategies, making it easy to understand and apply.
2. Decision Trees
A supervised learning predictive model that uses a set of binary rules to calculate a target value
Used for either classification(categorical variables) or regression(continuous variables)
In this method, the given data is split into wo or more homogeneous sets based on most
significant input variable
Tree generating algorithm determines
- Which variable to split at a node
- Decision to stop or make a split again
- Assign terminal nodes to a class
3. Decision Trees
Sno Species Name Body_Temp Gives_Birth Type
1 SP101 Cold-Blooded No Non-Mammal
2 SP102 Warm-Blodded No Non-Mammal
3 SP103 Cold-Blooded No Non-Mammal
4 SP104 Warm-Blodded Yes Mammal
5 SP105 Warm-Blodded No Non-Mammal
6 SP106 Warm-Blodded No Non-Mammal
7 SP107 Warm-Blodded Yes Mammal
8 SP108 Cold-Blooded No Non-Mammal
9 SP109 Cold-Blooded No Non-Mammal
10 SP110 Warm-Blodded Yes Mammal
11 SP111 Warm-Blodded Yes Mammal
12 SP112 Warm-Blodded No Non-Mammal
13 SP113 Cold-Blooded No Non-Mammal
14 SP114 Warm-Blodded Yes Mammal
4. Decision Trees
The decision tree will be created as follows:
1. The splits will be done to partition the data into purer subsets. So, when the split is done on Body
Temperature, all the species that are cold-blooded are found to be “non-mammals”. So when the
Body Temperature is “Cold- Blooded”, we get a purest subset where all the species belong to “non-
Mammals”.
2. But for “warm-blooded” species, we still have both types present: mammals and non-mammals. So,
now the algorithm splits on the values of Gives_Birth. This time, when the values are “Yes”, we get
all type values as “Mammals” and when the values are “No”, we get all type values as “Non-
Mammals”.
3. The nodes represents Attributes while the edges represent values.
5. Decision Tree – An Example
Body
Temperature
Gives Birth
Mammal
Non-
Mammal
Non-
Mammals
Warm Cold
Yes No
9. Gini
Gini Index: It is the measure of inequality of distribution. It says if we select two items from a
population at random then they must be of same class and probability for this is 1 if population is pure.
It works with categorical target variable “Success” or “Failure”.
It performs only Binary splits
Higher the value of Gini, higher the homogeneity.
CART (Classification and Regression Tree) uses Gini method to create binary splits.
Process to calculate Gini Measure:
P(j) is the Probability of Class j
10. Entropy
Entropy is a way to measure impurity.
Less impure node requires less information to describe it and more impure node
requires more information. If the sample is completely homogeneous, then the entropy
is zero and if the sample is an equally divided it has entropy of one.
11. Information Gain
Information Gain is simply a mathematical way to capture the amount of information
one gains(or reduction in randomness) by picking a particular attribute
In a decision algorithm, we start at the tree root and split the data on the feature that
results in the largest information gain (IG). In other words, IG tells us how important a
given attribute is.
The Information Gain (IG) can be defined as follows:
Where I could be entropy or Gini index. D(p), D(Left) and D(Right) are the dataset of
the parent, left and right child node.
15. Pros and Cons of Decision Trees
Advantages of Decision Trees:
• Its very interpretable and hence easy to understand
• It can be used to identify the most significant variables in your data-set
Disadvantages:
The model has very high chances of “over-fitting”
16. Avoiding Overfitting in Decision Trees
• Overfitting is the key challenge in case of Decision Trees.
• If no limit is set, in the worst case, it will end up putting each observation into a leaf node.
• It can be avoided by setting Constraints on Tree Size
17. Parameters for Decision Trees
Following parameters are used for limiting Tree Size:
Minimum samples for a node split: It defines minimum number of observations which are
required in a node to be considered for splitting.
Minimum samples for a terminal node (leaf): Defines the minimum samples (or
observations) required in a terminal node or leaf. Generally lower values should be chosen for
imbalanced class problems because the regions in which the minority class will be in majority
will be very small.
Maximum depth of tree: Higher depth can cause overfitting
Maximum number of terminal nodes
Maximum features to consider for split: As a thumb-rule, square root of the total number of
features works great but we should check upto 30-40% of the total number of features.
18. Validation
When creating a predictive model, we'd like to get an accurate sense of its ability to
generalize to unseen data before actually going out and using it on unseen data.
A complex model may fit the training data extremely closely but fail to generalize to
new, unseen data. We can get a better sense of a model's expected performance on
unseen data by setting a portion of our training data aside when creating a model, and
then using that set aside data to evaluate the model's performance.
This technique of setting aside some of the training data to assess a model's ability to
generalize is known as validation.
Holdout validation and cross validation are two common methods for assessing a
model before using it on test data.
19. Holdout Validation
Holdout validation involves splitting the training data into two parts, a training set and a
validation set, building a model with the training set and then assessing performance with
the validation set.
Some Disadvantages:
Reserving a portion of the training data for a holdout set means you aren't using all the
data at your disposal to build your model in the validation phase. This can lead to
suboptimal performance, especially in situations where you don't have much data to
work with.
If you use the same holdout validation set to assess too many different models, you
may end up finding a model that fits the validation set well due to chance that won't
necessarily generalize well to unseen data.
20. Cross-Validation
Cross-Validation is a process of leaving a sample on which you do not train the model and test
the model on this sample.
To do 10-fold cross validation:
we divide the training data equally into 10 sets.
Train the model over 9 sets, while holding back the 10th set
Test the model on the set which is held back and save the results
Repeat the process 9 more times by changing the data set which is held back
Average out the results
The result is a more reliable estimate of the performance of the algorithm on new data. It is more
accurate because the algorithm is trained and evaluated multiple times on different data. The
choice of k must allow the size of each test partition to be large enough to be a reasonable
sample of the problem, whilst allowing enough repetitions of the train-test evaluation of the
algorithm to provide a fair estimate of the algorithms performance on unseen data. For modest
sized datasets in the thousands or tens of thousands of records, k values of 3, 5 and 10 are
common.
21. Ensembling
Ensembling: Ensembling is a process of combining the results of multiple models to solve
a given prediction or classification problem.
The three most popular methods for combining the predictions from different models are:
Bagging: Building multiple models (typically of the same type) from different subsamples
of the training dataset. A limitation of bagging is that the same greedy algorithm is used
to create each tree, meaning that it is likely that the same or very similar split points will
be chosen in each tree making the different trees very similar (trees will be correlated).
This, in turn, makes their predictions similar. We can force the decision trees to be
different by limiting the features that the greedy algorithm can evaluate at each split point
when creating the tree. This is called the Random Forest algorithm.
Boosting: Building multiple models (typically of the same type) each of which learns to fix
the prediction errors of a prior model in the sequence of models.
Voting: Building multiple models (typically of differing types) and simple statistics (like
calculating the mean) are used to combine predictions.
23. Random Forests
Random Forests: Random Forests is an ensemble(bagging) modeling technique that
works by building large number of decision tress. It takes a subset of observations and a
subset of variables to build several decision trees. Each tree will play a role in determining
the final outcome.
A random forest builds many trees. In forming each tree, it randomly selects the
independent variables. This means that data used as training data for each tree is selected
randomly with replacement. Each tree will give its output. The final output will be a
function of each individual tree output.
The function can be:
Mean of individual probability outcomes
Vote Frequency
The important parameters used while using RandomForestClassifier in Python are:
n_estimators is used to fix number of trees in Random Forest.
max_depth - parameter determining when the splitting up of the decision tree stops.
min_samples_split - parameter monitoring the amount of observations in a bucket
25. An Example
Lets say we have to decide whether a cricket player is Good or Bad. We can have the as
“average runs scored” against various variables like “Opposition Played”, “Country
Played”, “Matches Won”, “Matches Lost” and many others. We can use a decision tree
which splits on only certain attributes, to determine whether this player is Good or Bad.
But we can improve our prediction, if we build a number of decision trees which split on
different combination of variables. Then using the outcome of each tree, we can decide
upon our final classification. This will greatly improve the performance which we get
using a single tree. Another example can be the judgements delivered by Supreme Court.
Supreme Court usually employs a bench of judges to deliver a judgement. Each judge
gives its own judgment and then the outcome which gets the maximum voting becomes
the final judgement.
26. An Example
Suppose, you want to watch a movie ‘X’ . You ask your friend ‘Amar’ if you should watch this movie.
Amar asks you a series of questions:
What previous movies you watched? Which of these movies you liked or disliked? Is X a romantic
movie? Does Salman Khan act in movie X?
….. And several other questions. Finally Amar tells you either Yes or No.
Amar is a decision tree for your movie preference.
But Amar may not give you accurate answer. In order to be more sure, you ask couple of friends –
Rita, Suneet and Megha, the same question. They will each vote for the movie X. (This is ensemble
model aka Random Forest in this case)
If you have a similar circle of friends, they may all have the exact same process of questions, so to
avoid them all having the exact same answer, you’ll want to give them each a different sample from
your list of movies. You decide to cut up your list and place it in a bag, then randomly draw from the
bag, tell your friend whether or not you enjoyed that movie, and then place that sample back in the
bag. This means you’ll be randomly drawing a sub sample from your original list with replacement
(bootstrapping your original data)
And you will just make sure that , each of your friends asks different questions randomly.
27. Uses – Random Forest
Variable Selection: One of the best use cases for random forest is feature selection. One
of the byproducts of trying lots of decision tree variations is that you can examine which
variables are working best/worst in each tree.
Classification: Random forest is also great for classification. It can be used to make
predictions for categories with multiple possible values