Random Variables
- A random variable (also known as a stochastic variable) is a variable whose value is subject to variation due to chance, i.e. randomness. Its possible values form a set determined by a random experiment.
- A random experiment is an experiment whose set of possible outcomes can be specified beforehand, but whose actual outcome is subject to chance, e.g. throwing a die or flipping a coin. The outcome variable of a statistical experiment is usually a random variable.
- An EVENT is a single result of an experiment.
- So we have an EXPERIMENT, and we assign a value to each EVENT of the experiment. The set of these values is a random variable.
A random variable is different from an algebraic variable: if x + 3 = 7, then x = 4, a single fixed value. A random variable, by contrast, is a "set" of values.
X = {1, 2, 3, 4}: X could be 1, 2, 3 or 4 at random, and each value can have a different probability of occurrence.
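To make the idea concrete, here is a minimal Python sketch that draws many values of X = {1, 2, 3, 4}. The probabilities 0.1, 0.2, 0.3 and 0.4 are hypothetical, chosen only to show that each value can have a different probability of occurrence.

```python
import random
from collections import Counter

# Hypothetical probabilities for X = {1, 2, 3, 4}; they must sum to 1.
values = [1, 2, 3, 4]
probabilities = [0.1, 0.2, 0.3, 0.4]

rng = random.Random(42)  # fixed seed so the run is reproducible
# Draw 10,000 independent observations of X.
draws = rng.choices(values, weights=probabilities, k=10_000)
counts = Counter(draws)

# The observed relative frequencies should be close to the assigned probabilities.
for v in values:
    print(v, counts[v] / len(draws))
```

No single draw can be predicted, but over many draws the frequencies settle near the assigned probabilities.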
Types of Random Variables
There are 3 types:
> Discrete: it can take only distinct, countable values, e.g. integers such as 0, 1, -1, 2, 3, 4
> Continuous: it can take any value within a range of values
> Categorical: it can take only one value from a fixed set of categories
- The actual value of a random variable cannot be determined beforehand. However, the range of values it can take can be determined in advance, e.g. the roll of a die or the length of a tweet.
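The three types can be illustrated with Python's standard `random` module. This is a sketch only; the 0 to 10 minute waiting-time range is a hypothetical example, not from the original. In each case the set of possible values is known in advance, even though the individual draw is not.

```python
import random

rng = random.Random(0)  # seeded for reproducibility

# Discrete: a fair die roll takes one of the integer values 1..6.
die_roll = rng.randint(1, 6)

# Continuous: any real value in a range, e.g. a hypothetical waiting time of 0 to 10 minutes.
waiting_time = rng.uniform(0.0, 10.0)

# Categorical: one label from a fixed set of categories.
coin_face = rng.choice(["Heads", "Tails"])

print(die_roll, round(waiting_time, 2), coin_face)
```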
Probability
Probability is a measure of how likely something is to happen, expressed as a number between 0 (impossible) and 1 (certain).
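When all outcomes of an experiment are equally likely, a probability can be computed by counting: favourable outcomes divided by total outcomes. A small sketch for rolling an even number on a fair die (the helper name `classical_probability` is hypothetical):

```python
from fractions import Fraction

def classical_probability(event, sample_space):
    """P(event) = favourable outcomes / total outcomes, assuming equally likely outcomes."""
    favourable = sum(1 for outcome in sample_space if outcome in event)
    return Fraction(favourable, len(sample_space))

die = [1, 2, 3, 4, 5, 6]
even = {2, 4, 6}
print(classical_probability(even, die))  # 1/2
```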
Types of Events
Events can be:
- Independent: not affected by other events, e.g. the toss of a coin.
- Dependent (conditional): affected by other events.
- Mutually exclusive: the events cannot happen at the same time.
Independent Events
Independent Events are not affected by previous events.
- A coin does not "know" that it came up heads before.
- Each toss of a coin is a perfectly isolated event.
You toss a coin and it comes up "Heads" three times. What is the chance that the next toss will also be "Heads"? The chance is simply ½ (or 0.5), just like ANY toss of the coin.
What it did in the past does not affect the current toss!
The probability that two or more independent events all occur is calculated by multiplying the probabilities of the individual events.
Probability of 3 heads in a row: 0.5 × 0.5 × 0.5 = 0.125
P(A and B) = P(A) × P(B)
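The product rule can be checked against a simulation. This minimal sketch (the function name `simulate_three_heads` is hypothetical) tosses three independent fair coins many times and counts how often all three come up heads; the estimate should land close to the exact 0.125.

```python
import random

def simulate_three_heads(trials, seed=1):
    """Estimate P(3 heads in a row) by simulating independent fair coin tosses."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Each toss is independent: rng.random() < 0.5 counts as "Heads".
        if all(rng.random() < 0.5 for _ in range(3)):
            hits += 1
    return hits / trials

exact = 0.5 * 0.5 * 0.5  # product rule for independent events
estimate = simulate_three_heads(100_000)
print(exact, estimate)
```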
Dependent Events
Dependent Events are affected by previous events.
Example:
Marbles in a bag:
A bag contains 5 marbles, 2 of which are blue (the other 3 are red).
Probability(Blue Marble) = 2/5
But after taking one out, the chances change!
So the next time:
- if we got a red marble before, the chance of a blue marble next is 2 in 4
- if we got a blue marble before, the chance of a blue marble next is 1 in 4
Conditional Probability
For dependent events, the probability of an event B "given" that A has happened is known as the conditional probability of B given A (in the context of Bayes' theorem it is also called the posterior probability). It is denoted:
P(B|A)
P(A and B) = P(A) × P(B|A)
or, equivalently (when P(A) > 0),
P(B|A) = P(A and B) / P(A)
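The marble example can be checked with this formula by brute-force enumeration. The sketch below lists every ordered way to draw two of the 5 marbles without replacement (all equally likely) and computes P(second is blue | first colour) as P(A and B) / P(A). The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

# 5 marbles: 2 blue ("B") and 3 red ("R"); two are drawn without replacement.
bag = ["B", "B", "R", "R", "R"]
# Every ordered pair of two different marbles is equally likely (20 pairs).
outcomes = list(permutations(bag, 2))

def prob(event):
    """Probability of an event, as a fraction of equally likely outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def p_blue_second_given_first(colour):
    """P(second is blue | first is `colour`) = P(A and B) / P(A)."""
    p_a = prob(lambda o: o[0] == colour)
    p_a_and_b = prob(lambda o: o[0] == colour and o[1] == "B")
    return p_a_and_b / p_a

print(p_blue_second_given_first("R"))  # 1/2, i.e. 2 in 4
print(p_blue_second_given_first("B"))  # 1/4
```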
Conditional Probability
In a group of 10 people, the probability that a randomly selected person uses an iPhone is:
P(iPhone) = 5/10 = 0.5
What is the probability that a randomly selected person uses an iPhone, given that the person uses a Mac laptop?
There are 4 people who use both a Mac and an iPhone, so P(iPhone and Mac) = 4/10 = 0.4, and the probability of a random person using a Mac is P(Mac) = 6/10 = 0.6.
So the probability that a person uses an iPhone, given that the person uses a Mac, is
P(iPhone|Mac) = P(iPhone and Mac) / P(Mac) = 0.4/0.6 ≈ 0.667
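The same calculation can be done directly from counts. The split of the 10 people below (4 both, 1 iPhone only, 2 Mac only, 3 neither) is derived from the totals in the example (5 iPhone users, 6 Mac users, 4 both); the list itself is a hypothetical reconstruction for illustration.

```python
from fractions import Fraction

# Hypothetical group of 10 people as (uses_mac, uses_iphone) pairs, consistent with
# the counts above: 4 use both, 1 only an iPhone, 2 only a Mac, 3 neither.
people = [(True, True)] * 4 + [(False, True)] * 1 + [(True, False)] * 2 + [(False, False)] * 3
n = len(people)

p_iphone = Fraction(sum(iphone for _, iphone in people), n)
p_mac = Fraction(sum(mac for mac, _ in people), n)
p_both = Fraction(sum(mac and iphone for mac, iphone in people), n)

# Conditional probability: P(iPhone | Mac) = P(iPhone and Mac) / P(Mac)
p_iphone_given_mac = p_both / p_mac
print(p_iphone, p_mac, p_both, p_iphone_given_mac)  # 1/2 3/5 2/5 2/3
```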
Mutually Exclusive Events
Mutually exclusive events are events that cannot happen at the same time:
- You can go either left or right, but not both at the same time
- A coin will turn up either Heads or Tails
- Kings and Aces are mutually exclusive (a single card cannot be both)
Events that are not mutually exclusive:
- Turning left and scratching your head
- Kings and Hearts in a deck, because we can draw the King of Hearts
Probability of Mutually Exclusive Events
If A and B are mutually exclusive, then
P(A and B) = 0
E.g. if a card is drawn at random from a deck, what is the probability that it is a King AND a Queen? 0
But we can find the probability of Event A OR Event B:
P(A or B) = P(A) + P(B)
Probability(Card is King OR Card is Queen) = 1/13 + 1/13 = 2/13
When the events are not mutually exclusive:
P(A or B) = P(A) + P(B) - P(A and B)
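Both addition rules can be verified by enumerating a standard 52-card deck. This sketch counts cards directly and compares the counts with the formulas: King or Queen gives 2/13, and King or Heart gives 4/13 once the King of Hearts is subtracted.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["Hearts", "Diamonds", "Clubs", "Spades"]
deck = list(product(ranks, suits))  # 52 (rank, suit) cards

def p(event):
    """Probability that a card drawn uniformly at random satisfies `event`."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))

def is_king(card):
    return card[0] == "K"

def is_queen(card):
    return card[0] == "Q"

def is_heart(card):
    return card[1] == "Hearts"

# Mutually exclusive: no card is both a King and a Queen, so probabilities add.
p_king_and_queen = p(lambda c: is_king(c) and is_queen(c))
p_king_or_queen = p(lambda c: is_king(c) or is_queen(c))

# Not mutually exclusive: the King of Hearts would be counted twice unless subtracted.
p_king_and_heart = p(lambda c: is_king(c) and is_heart(c))
p_king_or_heart = p(lambda c: is_king(c) or is_heart(c))

print(p_king_and_queen, p_king_or_queen, p_king_or_heart)  # 0 2/13 4/13
```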
Bayes' Theorem
P(h|D) = P(D|h) × P(h) / P(D), where the denominator can be expanded as
P(D) = P(D|h) × P(h) + P(D|~h) × P(~h)
0.8% of the people in the U.S. have diabetes. There is a simple blood test that helps determine whether someone has it. The test is binary: it comes back either POS or NEG. When the disease is present, the test returns a correct POS result 98% of the time; when the disease is not present, it returns a correct NEG result 97% of the time.
Suppose a patient takes the test for diabetes and the result comes back positive.
Which is more likely: the patient has diabetes, or the patient does not have diabetes?
Bayes Theorem
P(disease) = 0.008
P(~disease) = 0.992
P(POS|disease) = 0.98
P(NEG|disease) = 0.02
P(NEG|~disease) = 0.97
P(POS|~disease) = 0.03
P(disease|POS) = ??
As per Bayes' Theorem:
P(disease|POS) = P(POS|disease) × P(disease) / P(POS)
P(POS) = P(POS|disease) × P(disease) + P(POS|~disease) × P(~disease)
P(disease|POS) = 0.98 × 0.008 / (0.98 × 0.008 + 0.03 × 0.992) = 0.00784 / 0.0376 ≈ 0.21
P(~disease|POS) = 0.03 × 0.992 / (0.98 × 0.008 + 0.03 × 0.992) = 0.02976 / 0.0376 ≈ 0.79
So even after a positive test, the patient has only about a 21% chance of actually having diabetes; not having it is more likely.
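The calculation above can be reproduced in a few lines. This sketch (the function name `posterior` is hypothetical) expands P(POS) with the law of total probability and then applies Bayes' theorem, using the numbers from the diabetes example.

```python
def posterior(prior, p_pos_given_disease, p_pos_given_no_disease):
    """P(disease | POS) via Bayes' theorem."""
    # Law of total probability: P(POS) = P(POS|d)P(d) + P(POS|~d)P(~d)
    p_pos = p_pos_given_disease * prior + p_pos_given_no_disease * (1 - prior)
    return p_pos_given_disease * prior / p_pos

p_disease_given_pos = posterior(0.008, 0.98, 0.03)
p_no_disease_given_pos = 1 - p_disease_given_pos
print(round(p_disease_given_pos, 2), round(p_no_disease_given_pos, 2))  # 0.21 0.79
```

The low prior (0.8%) is what keeps the posterior small: false positives from the large healthy group outnumber true positives.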
Probability Distribution
- A probability distribution is a table or function that links each outcome of a statistical experiment with its probability of occurrence.
Let's take a statistical experiment in which we pick a user at random from the entire group of Facebook users. We have data tracking the country of the users who log in to Facebook each day. Here, Country is the random variable. The percentages of users logging in are as follows:
Country      % of Users
USA          10%
India        7%
Brazil       5%
Indonesia    4%
Others       74%
Now we can get the probability that the picked user belongs to the USA:
P(X="USA") = 10/100 = 0.1
- If every outcome of a statistical experiment has the same probability, the experiment is said to follow a uniform probability distribution, e.g. throwing a fair die, where each outcome has a probability of 1/6.
- Depending on the type of random variable, the probability distribution can also be discrete or continuous.
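A discrete probability distribution like the Country table maps naturally onto a dictionary. This sketch converts the percentages into probabilities, checks that they form a valid distribution, samples a few "random users" from it, and contrasts it with the uniform distribution of a fair die.

```python
import random

# Share of users logging in by country, taken from the Country table (percent).
country_share = {"USA": 10, "India": 7, "Brazil": 5, "Indonesia": 4, "Others": 74}

# Turn the percentages into probabilities for the random variable Country.
total = sum(country_share.values())
distribution = {country: share / total for country, share in country_share.items()}
print(distribution["USA"])  # 0.1

# A valid discrete distribution: each probability is in [0, 1] and they sum to 1.
assert all(0 <= p <= 1 for p in distribution.values())
assert abs(sum(distribution.values()) - 1) < 1e-9

# Picking a user at random is the same as sampling Country from this distribution.
rng = random.Random(7)
picked = rng.choices(list(distribution), weights=list(distribution.values()), k=5)
print(picked)

# A uniform distribution, by contrast: a fair die gives each face probability 1/6.
fair_die = {face: 1 / 6 for face in range(1, 7)}
```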
The NORMAL Distribution
- In the real world, the following type of distribution is very commonly seen:
[Figure: bell-shaped curve of a normal distribution]
- The x-axis is the value of the random variable.
- The y-axis is the probability density, i.e. how likely values near that point are.
For example, try measuring the heights of the employees in your company. In most situations, a few employees will have very low measurements, a few will have very large measurements, and most will be centred around a particular value. Because this pattern is seen so frequently, it is called the normal distribution.
- The curve peaks at the mean (average). The width of the curve describes the spread of the variable and is measured by a parameter called the standard deviation (SD).
- The mean and SD completely describe a normal distribution. Given these 2 numbers, one can calculate the probability that the random variable falls in a given range by using standard normal tables. But before assuming that a random variable follows a normal distribution, you need to perform tests for normality.
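In code, a normal distribution's cumulative distribution function plays the role of the standard tables. This sketch uses `statistics.NormalDist` from the Python standard library (3.8+); the height parameters (mean 170 cm, SD 10 cm) are hypothetical. It also shows that standardising with z = (x - mean) / SD gives the same answer on the standard normal N(0, 1).

```python
from statistics import NormalDist

# Hypothetical employee heights: mean 170 cm, standard deviation 10 cm.
heights = NormalDist(mu=170, sigma=10)

# The cdf replaces a lookup in a standard normal table.
p_below_180 = heights.cdf(180)
p_between = heights.cdf(180) - heights.cdf(160)  # within 1 SD of the mean

# Standardising: z = (x - mean) / SD gives the same answer on N(0, 1).
z = (180 - heights.mean) / heights.stdev
assert abs(NormalDist().cdf(z) - p_below_180) < 1e-12

print(round(p_below_180, 4), round(p_between, 4))
```

The two printed values are about 0.8413 and 0.6827; only the mean and SD were needed to get them.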
The Normal Distribution
Central Limit Theorem:
Regardless of the underlying distribution (provided it has a finite variance), if we draw many sufficiently large samples and plot the mean of each sample, the distribution of these sample means approximates a normal distribution. The Empirical Rule states that approximately 68%, 95% and 99.7% of the data in a normal distribution lie within 1, 2 and 3 standard deviations of the mean, respectively.
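A quick simulation illustrates both ideas. This sketch draws 2000 samples of size 50 from an exponential distribution with rate 1 (strongly skewed, population mean 1), takes each sample's mean, and then checks how many of those means fall within 1, 2 and 3 standard deviations. The sample sizes and seed are arbitrary choices for illustration.

```python
import random
from statistics import fmean, pstdev

rng = random.Random(0)  # seeded for reproducibility

def sample_mean(n):
    """Mean of n draws from an exponential distribution (rate 1), which is skewed, not normal."""
    return fmean(rng.expovariate(1.0) for _ in range(n))

n = 50                                         # size of each sample
means = [sample_mean(n) for _ in range(2000)]  # 2000 sample means

# CLT: the sample means are approximately normal, centred on the
# population mean (1.0) with standard deviation 1 / sqrt(n).
mu, sd = fmean(means), pstdev(means)
print(round(mu, 3), round(sd, 3), round(1 / n ** 0.5, 3))

# Empirical rule: share of sample means within 1, 2 and 3 SDs of their mean.
shares = {k: sum(abs(m - mu) <= k * sd for m in means) / len(means) for k in (1, 2, 3)}
print(shares)
```

The shares come out close to 68%, 95% and 99.7%, even though the individual exponential draws are far from normal.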
Skewness and kurtosis are two other characteristics used to understand a distribution. Skewness is a measure of asymmetry: a negatively skewed curve has a long left tail, and a positively skewed curve has a long right tail. Kurtosis is a measure of "peakedness" (more precisely, of how heavy the tails are). Distributions with sharper peaks and heavier tails than the normal distribution have positive (excess) kurtosis, and flatter ones have negative kurtosis. The following diagrams make these parameters clearer.
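Both measures are built from central moments. This sketch uses the simple population (moment-based) formulas; library implementations may apply bias corrections, so their values can differ slightly on small samples. The data sets are toy examples.

```python
from statistics import fmean

def _central_moment(data, order):
    """Population central moment: mean of (x - mean) ** order."""
    m = fmean(data)
    return fmean((x - m) ** order for x in data)

def skewness(data):
    """Population skewness: E[(X - mean)^3] / SD^3."""
    return _central_moment(data, 3) / _central_moment(data, 2) ** 1.5

def excess_kurtosis(data):
    """Population excess kurtosis: E[(X - mean)^4] / SD^4 - 3 (0 for a normal distribution)."""
    return _central_moment(data, 4) / _central_moment(data, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]
right_tail = [1, 1, 1, 2, 10]      # long right tail -> positive skew
left_tail = [-10, -2, -1, -1, -1]  # long left tail -> negative skew

print(skewness(symmetric), skewness(right_tail), skewness(left_tail))
print(excess_kurtosis(symmetric))  # negative: flatter than a normal curve
```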
Probability Distributions and ML
- The "features" that we select in a machine learning problem are generally random variables.
- Many machine learning techniques make assumptions about the probability distributions of these random variables.
- Statisticians and mathematicians have studied many random variables in nature and noticed recurring patterns. They have defined a set of standard distributions, and most random variables encountered in practice can be described by one of them.
Analytics Landscape
Reporting: A report describes what events have happened in the business. It provides what is asked for and is typically standardized, e.g. a monthly sales summary report showing sales by region.
Analysis: An analysis tries to answer why the events in the business happened. E.g. an analysis of the sales summary report may show that sales peak on specific holidays or weekends. Basic analytics involves slicing and dicing data, monitoring large volumes of data in real time, and anomaly detection.
Advanced Analytics: Advanced analytics extends the insights provided by analysis by assessing the impact on the business and prescribing the next steps to take. It includes predictive modeling, text analytics and advanced data mining algorithms. The purpose of any "data analysis" is to derive meaningful information from data. One way to extract information is to study the variability in the data points: the more variability there is, the more carefully you have to explore the dataset to capture all of its meaning.
Data Science: Data science is about using data to make decisions that drive actions.
Data science involves:
- Finding data
- Acquiring data
- Cleaning and transforming data
- Understanding relationships in data
- Delivering value from data
Forecasting is the process of estimating the future based on past events, typically at an aggregate level, e.g. the number of calls expected at a call center, or the number of passengers expected to travel from an airport next month.
Predictive modeling makes predictions or estimates at a more granular level, e.g. which customers are expected to buy a printer in the next 30 days.
Doing Analytics – Step by Step
- Understand the business process
- Understand the data involved in that business process: data profiling and exploration
- Modeling
- Testing and validation
- Deployment
Exploratory Data Analysis
EDA refers to the process of exploring data for the purpose of doing analytics. It is primarily concerned with looking at data, summarizing it and finding its main characteristics, usually with visual aids.
- Identify the dependent and independent variables (target and predictors).
- Univariate analysis: for continuous variables, check the distribution and summary statistics of each attribute (mean, median, range, inter-quartile range, standard deviation). For categorical variables, use frequency tables to understand the distribution of each category, measured as the count and count% of each category.
- Bivariate analysis: find out the relationship between pairs of variables.
- Handling missing values: if you have a lot of data and only a few missing values, it might make sense to simply delete the records with missing values. On the other hand, if you have more than a handful of missing values, removing those records could discard a lot of data. Missing values in categorical data are not particularly troubling, because you can simply treat NA as an additional category. Missing values in numeric variables are more troublesome, since you cannot simply treat a missing value as a number.
- Handling outliers
- Variable transformation
- Variable creation
Exploratory Data Analysis
1. Do I need all of the variables?
2. Should I transform any variables?
3. Are there NA values, outliers or other strange values?
4. Should I create new variables?
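The univariate analysis step described above can be sketched with Python's standard `statistics` module and `collections.Counter`. The `ages` and `countries` columns are a hypothetical toy dataset; in practice a dataframe library would usually do this work.

```python
from collections import Counter
from statistics import mean, median, quantiles, stdev

# Hypothetical toy dataset: one continuous and one categorical variable.
ages = [23, 35, 31, 52, 46, 29, 38, 41, 33, 60]
countries = ["USA", "India", "USA", "Brazil", "India", "USA", "Others", "India", "USA", "Brazil"]

# Univariate summary of a continuous variable.
q1, q2, q3 = quantiles(ages, n=4)
summary = {
    "mean": mean(ages),
    "median": median(ages),
    "range": max(ages) - min(ages),
    "iqr": q3 - q1,
    "std": stdev(ages),  # sample standard deviation
}
print(summary)

# Frequency table (Count and Count%) for a categorical variable.
counts = Counter(countries)
for category, count in counts.most_common():
    print(f"{category:<8} {count:>3} {100 * count / len(countries):5.1f}%")
```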
Handling Missing Values
1. If the dataset contains very few missing values, you can drop those records.
2. Replace the null values with 0s.
3. Replace the null values with a central value such as the mean or median.
4. Impute values (estimate them using statistical or predictive modeling methods).
5. Split the dataset into two parts: one where records have a value for the variable (e.g. Age) and another where it is null.
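Options 1, 2, 3 and 5 can be sketched on plain Python records; option 4 (model-based imputation) is left out because it depends on the model chosen. The records and the `fill_missing` helper are hypothetical, with `None` standing in for a missing Age value.

```python
from statistics import median

# Hypothetical records; None marks a missing Age value.
records = [
    {"name": "A", "age": 25},
    {"name": "B", "age": None},
    {"name": "C", "age": 40},
    {"name": "D", "age": 31},
    {"name": "E", "age": None},
]

def fill_missing(rows, value):
    """Return copies of rows with a missing age replaced by `value`."""
    return [{**r, "age": value if r["age"] is None else r["age"]} for r in rows]

known_ages = [r["age"] for r in records if r["age"] is not None]

# 1. Drop records that have a missing value.
dropped = [r for r in records if r["age"] is not None]

# 2. Replace nulls with 0.
zero_filled = fill_missing(records, 0)

# 3. Replace nulls with a central value (the median here; the mean works the same way).
median_filled = fill_missing(records, median(known_ages))

# 5. Split the dataset into records with and without an Age value.
with_age = [r for r in records if r["age"] is not None]
without_age = [r for r in records if r["age"] is None]
```

Note that filling with 0 can distort summaries such as the mean, which is why a central value is often preferred.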
Plots for Data Exploration
Histogram: A histogram is a univariate plot (a plot that displays one variable) that groups a
numeric variable into bins and displays the number of observations that fall within each bin. A
histogram is a useful tool for getting a sense of the distribution of a numeric variable.
Boxplot: Boxplots are another type of univariate plot for summarizing the distribution of numeric data graphically. They show outliers in the data very clearly. The central box of the boxplot represents the middle 50% of the observations, the central bar is the median, and the bars at the ends of the dotted lines (whiskers) encapsulate the great majority of the observations. Circles that lie beyond the ends of the whiskers are data points that may be outliers.
Scatterplot: Scatterplots are bivariate (two variable) plots that take two numeric variables and
plot data points on the x/y plane.
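The numbers behind a histogram and a boxplot can be computed without any plotting. This sketch counts observations per fixed-width bin and computes the quartiles, whisker limits and potential outliers. The whisker limit of 1.5 × IQR beyond the box is an assumption here: it is a common convention (and matplotlib's default), but other tools may use different rules.

```python
import math
from collections import Counter
from statistics import quantiles

def histogram_counts(values, bin_width):
    """Count observations per bin; each key is the lower edge of a bin."""
    return dict(sorted(Counter(math.floor(v / bin_width) * bin_width for v in values).items()))

def boxplot_stats(values):
    """Quartiles, whisker limits and potential outliers using the 1.5 * IQR rule."""
    q1, q2, q3 = quantiles(values, n=4)
    iqr = q3 - q1                               # height of the central box
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # whisker limits
    outliers = [v for v in values if v < low or v > high]
    return {"q1": q1, "median": q2, "q3": q3, "low": low, "high": high, "outliers": outliers}

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(histogram_counts(data, 10))
print(boxplot_stats(data))
```

A plotting library such as matplotlib (`plt.hist`, `plt.boxplot`, `plt.scatter`) draws these plots directly from the raw data.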

call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 

Machine learning session2

  • 1. Random Variables
    - A random variable (also known as a stochastic variable) is a variable whose value is subject to variation due to chance, i.e. randomness. It takes one of a set of possible values, determined by the outcome of a random experiment.
    - A random experiment is an experiment whose set of possible outcomes can be specified beforehand, but whose actual outcome is subject to chance, e.g. throwing a die or flipping a coin. The outcome variable of a statistical experiment is usually a random variable.
    - An event is a single result of an experiment.
    - So we have an experiment, and we assign a value to each event of that experiment. The set of these values is the random variable.
    It is different from an algebraic variable: if x + 3 = 7, then x = 4. A random variable, by contrast, ranges over a set of values. For X = {1, 2, 3, 4}, X can randomly be 1, 2, 3 or 4, and each value can have a different probability of occurrence.
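A minimal sketch of "assigning a value to each event": the experiment (two fair coin tosses) and the variable X = number of heads are illustrative choices, not taken from the slide. It shows that different values of X can have different probabilities.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Random experiment: toss two fair coins. Each outcome is a tuple such as ("H", "T").
outcomes = list(product("HT", repeat=2))

def number_of_heads(outcome):
    """The random variable X: assigns a number to each outcome of the experiment."""
    return outcome.count("H")

# Every outcome is equally likely, so P(X = x) = (outcomes with value x) / (all outcomes).
counts = Counter(number_of_heads(o) for o in outcomes)
distribution = {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}

for x, p in distribution.items():
    print(f"P(X = {x}) = {p}")
# P(X = 0) = 1/4
# P(X = 1) = 1/2
# P(X = 2) = 1/4
```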
  • 2. Types of Random Variables
    A random variable can be of three types:
    - Discrete: it can take only distinct, countable values, e.g. the integers [0, 1, -1, 2, 3, 4]
    - Continuous: it can take any value within a range of values
    - Categorical: it can take only a value from a fixed set of values (categories)
    - The actual value of a random variable cannot be determined beforehand. However, the range of values it can take can be determined in advance, e.g. for the roll of a die or the length of a tweet.
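A small sketch drawing one value of each type with Python's `random` module. The specific variables (a die roll, a hypothetical waiting time in minutes, a colour label) are illustrative assumptions; in each case the value is unknown beforehand while its possible range is fixed.

```python
import random

rng = random.Random(42)  # fixed seed so the run is reproducible

# Discrete: a die roll takes one of the countable values 1..6.
die_roll = rng.randint(1, 6)

# Continuous: any real value in a range, e.g. a hypothetical waiting time in [0, 10] minutes.
waiting_time = rng.uniform(0, 10)

# Categorical: one label from a fixed set; the labels have no numeric meaning.
colour = rng.choice(["red", "green", "blue"])

print(die_roll, round(waiting_time, 2), colour)
```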
  • 3. Probability Probability is a measure of how likely something is to happen
  • 4. Types of Events
    Events can be:
    - Independent: not affected by other events, e.g. the toss of a coin
    - Dependent (conditional): affected by other events
    - Mutually exclusive: the events cannot happen at the same time
  • 5. Independent Events
    Independent events are not affected by previous events.
    - A coin does not "know" it came up heads before.
    - Each toss of a coin is a completely isolated event.
    You toss a coin and it comes up "Heads" three times. What is the chance that the next toss will also be "Heads"? The chance is simply ½ (or 0.5), just as for ANY toss of the coin. What happened in the past does not affect the current toss!
    The probability that two or more independent events all occur is calculated by multiplying the probabilities of the individual events:
    P(A and B) = P(A) × P(B)
    Probability of 3 heads in a row: 0.5 × 0.5 × 0.5 = 0.125
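The product rule and the "coin has no memory" point can both be checked with a quick simulation. This is a sketch using a seeded pseudo-random coin; the trial counts and seeds are arbitrary choices, so the simulated values are only approximately 0.125 and 0.5.

```python
import random
from fractions import Fraction

# Product rule for independent events: P(A and B) = P(A) * P(B).
p_head = Fraction(1, 2)
p_three_heads = p_head ** 3
print(p_three_heads, float(p_three_heads))  # 1/8 0.125

def simulate_tosses(trials, n_tosses, seed):
    """Each trial is a list of n_tosses fair coin tosses (True = heads)."""
    rng = random.Random(seed)
    return [[rng.random() < 0.5 for _ in range(n_tosses)] for _ in range(trials)]

# Monte Carlo check of the product rule: fraction of 3-toss trials that are all heads.
three = simulate_tosses(trials=100_000, n_tosses=3, seed=1)
print(sum(all(t) for t in three) / len(three))  # close to 0.125

# The coin has no memory: among 4-toss trials that start with three heads,
# the fourth toss is heads about half the time.
four = simulate_tosses(trials=200_000, n_tosses=4, seed=2)
after_three_heads = [t[3] for t in four if all(t[:3])]
print(sum(after_three_heads) / len(after_three_heads))  # close to 0.5
```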
  • 6. Dependent Events
    Dependent events are affected by previous events.
    Example: marbles in a bag. We have 2 blue marbles in a bag of 5, so P(blue marble) = 2/5. But after taking one out, the chances change! So the next time:
    - if we got a red marble before, the chance of a blue marble next is 2 in 4
    - if we got a blue marble before, the chance of a blue marble next is 1 in 4
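The marble example can be computed exactly with fractions. This sketch assumes the other 3 marbles are red and that marbles are drawn without replacement; it also applies P(A and B) = P(A) × P(B|A) to get the chance of two blues in a row.

```python
from fractions import Fraction

def second_draw_probabilities(blue=2, red=3):
    """Exact probabilities for two draws without replacement."""
    total = blue + red
    p_first_blue = Fraction(blue, total)
    # The second-draw probabilities depend on which colour was removed first.
    p_blue_after_blue = Fraction(blue - 1, total - 1)
    p_blue_after_red = Fraction(blue, total - 1)
    return p_first_blue, p_blue_after_blue, p_blue_after_red

p_first, p_bb, p_br = second_draw_probabilities()
print(p_first, p_bb, p_br)  # 2/5 1/4 1/2   (1/2 is "2 in 4")

# P(blue then blue) = P(blue first) * P(blue second | blue first)
print(p_first * p_bb)  # 1/10
```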
  • 7. Conditional Probability
    For dependent events, the probability of an event B, given that A has happened, is known as the conditional probability of B given A (in Bayesian terms, the posterior probability). It is denoted P(B|A):
    P(A and B) = P(A) × P(B|A)
    or, equivalently,
    P(B|A) = P(A and B) / P(A)
  • 9. Conditional Probability
    The probability that a randomly selected person uses an iPhone: P(iPhone) = 5/10 = 0.5
    What is the probability that a randomly selected person uses an iPhone, given that the person uses a Mac laptop?
    There are 4 people (out of 10) who use both a Mac and an iPhone, so P(iPhone and Mac) = 4/10 = 0.4. The probability of a random person using a Mac is P(Mac) = 6/10 = 0.6.
    So the probability that a person uses an iPhone, given that the person uses a Mac, is P(iPhone|Mac) = P(iPhone and Mac) / P(Mac) = 0.4/0.6 = 0.667
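A sketch that recomputes the iPhone/Mac example by counting. The 10-person table is a hypothetical reconstruction consistent with the slide's counts (5 iPhone users, 6 Mac users, 4 using both); the split into "only iPhone", "only Mac" and "neither" is inferred from those totals.

```python
from fractions import Fraction

# Hypothetical 10-person survey reconstructed from the slide's counts:
# 4 use both, 1 uses only an iPhone, 2 use only a Mac, 3 use neither.
people = (
    [{"iphone": True, "mac": True}] * 4
    + [{"iphone": True, "mac": False}] * 1
    + [{"iphone": False, "mac": True}] * 2
    + [{"iphone": False, "mac": False}] * 3
)

def prob(predicate):
    """Relative frequency of people satisfying the predicate."""
    return Fraction(sum(1 for p in people if predicate(p)), len(people))

p_iphone = prob(lambda p: p["iphone"])
p_mac = prob(lambda p: p["mac"])
p_both = prob(lambda p: p["iphone"] and p["mac"])

# Conditional probability: P(iPhone | Mac) = P(iPhone and Mac) / P(Mac)
p_iphone_given_mac = p_both / p_mac

print(p_iphone, p_mac, p_both)  # 1/2 3/5 2/5
print(p_iphone_given_mac)       # 2/3, about 0.667
```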
  • 10. Mutually Exclusive Events
    Mutually exclusive events are events that cannot happen at the same time:
    - You can go either left or right, but not both at the same time
    - A coin will turn up either Heads or Tails
    - Kings and Aces are mutually exclusive (when drawing a single card)
    Events that are not mutually exclusive:
    - Turning left and scratching your head
    - Kings and Hearts in a deck, because we can have a King of Hearts
  • 11. Probability of Mutually Exclusive Events
    If A and B are mutually exclusive, then P(A and B) = 0, e.g. if a card is drawn randomly from a deck, what is the probability that it is a King AND a Queen? 0.
    But we can find the probability of event A OR event B:
    P(A or B) = P(A) + P(B)
    P(card is a King OR card is a Queen) = 1/13 + 1/13 = 2/13
    When events are not mutually exclusive:
    P(A or B) = P(A) + P(B) − P(A and B)
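Both addition rules can be verified by enumerating a standard 52-card deck. In this sketch the card representation (rank, suit) and the helper names are illustrative choices; "King or Heart" is used as the non-exclusive case because the King of Hearts belongs to both events.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
DECK = list(product(RANKS, SUITS))  # 52 (rank, suit) cards

def p(event):
    """Probability of an event: favourable cards / all cards."""
    return Fraction(sum(1 for card in DECK if event(card)), len(DECK))

def is_king(card):
    return card[0] == "K"

def is_queen(card):
    return card[0] == "Q"

def is_heart(card):
    return card[1] == "hearts"

# Mutually exclusive: no card is both a King and a Queen.
p_king_and_queen = p(lambda c: is_king(c) and is_queen(c))
p_king_or_queen = p(lambda c: is_king(c) or is_queen(c))
print(p_king_and_queen, p_king_or_queen, p(is_king) + p(is_queen))  # 0 2/13 2/13

# Not mutually exclusive: subtract the overlap (the King of Hearts).
p_king_or_heart = p(lambda c: is_king(c) or is_heart(c))
formula = p(is_king) + p(is_heart) - p(lambda c: is_king(c) and is_heart(c))
print(p_king_or_heart, formula)  # 4/13 4/13
```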
  • 12. Bayes' Theorem
    P(h|D) = P(D|h) × P(h) / P(D), where P(D) = P(D|h) × P(h) + P(D|~h) × P(~h)
    0.8% of the people in the U.S. have diabetes. There is a simple blood test that will help us determine whether someone has it. The test is binary: it comes back either POS or NEG. When the disease is present, the test returns a correct POS result 98% of the time; when the disease is not present, it returns a correct NEG result 97% of the time.
    Suppose a patient takes the test for diabetes and the result comes back positive. What is more likely: the patient has diabetes, or the patient does not have diabetes?
  • 13. Bayes' Theorem
    P(disease) = 0.008
    P(~disease) = 0.992
    P(POS|disease) = 0.98
    P(NEG|disease) = 0.02
    P(NEG|~disease) = 0.97
    P(POS|~disease) = 0.03
    P(disease|POS) = ??
    As per Bayes' theorem:
    P(disease|POS) = P(POS|disease) × P(disease) / P(POS)
    P(POS) = P(POS|disease) × P(disease) + P(POS|~disease) × P(~disease) = 0.98 × 0.008 + 0.03 × 0.992 = 0.0376
    P(disease|POS) = 0.98 × 0.008 / (0.98 × 0.008 + 0.03 × 0.992) = 0.21
    P(~disease|POS) = 0.03 × 0.992 / (0.98 × 0.008 + 0.03 × 0.992) = 0.79
    So, despite the positive result, the patient has only about a 21% chance of having diabetes; it is more likely (79%) that the patient does not have it.
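The same calculation as a small function. This is a sketch, and the parameter names `sensitivity` (P(POS|disease)) and `specificity` (P(NEG|~disease)) are standard terms introduced here, not from the slide. The denominator is the law of total probability from the previous slide.

```python
def posterior(prior, sensitivity, specificity):
    """P(disease | POS) via Bayes' theorem."""
    false_positive_rate = 1 - specificity  # P(POS | ~disease)
    # Law of total probability: P(POS) = P(POS|d) P(d) + P(POS|~d) P(~d)
    p_pos = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_pos

p_disease_given_pos = posterior(prior=0.008, sensitivity=0.98, specificity=0.97)
print(round(p_disease_given_pos, 2), round(1 - p_disease_given_pos, 2))  # 0.21 0.79
```

The low posterior comes from the small prior: false positives from the 99.2% of healthy people outnumber the true positives.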
  • 14. Probability Distribution
    - A probability distribution is a table or function that links each outcome of a statistical experiment with its probability of occurrence.
    Let's take a statistical experiment wherein we pick a user at random from the entire group of Facebook users. We have data tracking the country of the users who log in to Facebook each day. Here, Country is the random variable. The percentages of users logging in are as follows:

    Country    | % of Users
    USA        | 10%
    India      | 7%
    Brazil     | 5%
    Indonesia  | 4%
    Others     | 74%

    Now we can get the probability that the picked user is from the USA: P(X = "USA") = 10/100 = 0.1
    - If every outcome of a statistical experiment has the same probability, the experiment is said to follow a uniform probability distribution, e.g. throwing a die, where each outcome has a probability of 1/6.
    - Depending on the type of random variable, the probability distribution can also be discrete or continuous.
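A discrete distribution can be stored as a mapping from outcome to probability. This sketch uses the country percentages from this slide, checks that they form a valid distribution, and simulates random picks (the seed and sample size are arbitrary). A fair die is included as the uniform case.

```python
import random
from collections import Counter
from fractions import Fraction

# The "Country" random variable, with the slide's percentages as probabilities.
country_dist = {"USA": 0.10, "India": 0.07, "Brazil": 0.05, "Indonesia": 0.04, "Others": 0.74}

# A valid discrete distribution has non-negative probabilities that sum to 1.
assert all(p >= 0 for p in country_dist.values())
assert abs(sum(country_dist.values()) - 1.0) < 1e-9

print(country_dist["USA"])  # P(X = "USA") = 0.1

# Simulate picking users at random; observed frequencies approach the probabilities.
rng = random.Random(7)
picks = rng.choices(list(country_dist), weights=list(country_dist.values()), k=50_000)
freq = Counter(picks)
print({country: round(freq[country] / len(picks), 3) for country in country_dist})

# Uniform distribution: each face of a fair die has probability 1/6.
die_dist = {face: Fraction(1, 6) for face in range(1, 7)}
print(sum(die_dist.values()))  # 1
```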
  • 15. The Normal Distribution
    - In the real world, the following type of distribution (a symmetric, bell-shaped curve) is very commonly seen.
    - The x-axis is the value of the random variable.
    - The y-axis is the probability density at that value (for a continuous variable, probabilities are areas under the curve). E.g. try measuring the heights of the employees in your company. In most situations, there will be a couple of employees with very low measurements, a couple with very high measurements, and most of them centred on a particular value. Since this pattern is seen so frequently, it is called the normal distribution.
    - The peak of the curve lies at the mean (average). The width of the curve reflects the spread of the variable and is measured by a parameter called the standard deviation (SD).
    - The mean and SD completely describe a normal distribution. Given these two numbers, one can calculate the probability that the random variable falls within any range by using standard normal tables. But before assuming that a random variable follows a normal distribution, you should perform tests for normality.
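Instead of looking values up in standard tables, Python's `statistics.NormalDist` (Python 3.8+) computes normal probabilities from the mean and SD alone. The height parameters below (mean 170 cm, SD 10 cm) are hypothetical.

```python
from statistics import NormalDist

# Hypothetical employee heights: mean 170 cm, standard deviation 10 cm.
heights = NormalDist(mu=170, sigma=10)

# Probability that a randomly chosen height lies between 160 and 180 cm (area under the curve).
p_between = heights.cdf(180) - heights.cdf(160)
print(round(p_between, 4))  # 0.6827

# Probability of being taller than 190 cm (two SDs above the mean).
p_above = 1 - heights.cdf(190)
print(round(p_above, 4))  # 0.0228

# The height of the curve is a density, not a probability; it peaks at the mean.
print(round(heights.pdf(170), 4))  # 0.0399
```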
  • 16. The Normal Distribution
    - Central Limit Theorem: regardless of the underlying distribution (as long as it has a finite variance), if we draw many sufficiently large samples and plot the mean of each sample, the distribution of those sample means approximates a normal distribution.
    - The Empirical Rule states that approximately 68%, 95% and 99.7% of the data in a normal distribution lie within 1, 2 and 3 standard deviations of the mean, respectively.
    - Skewness and kurtosis are two other characteristics used to understand a distribution. Skewness is a measure of asymmetry: a negatively skewed curve has a long left tail, and a positively skewed curve has a long right tail. Kurtosis is a measure of "peakedness": distributions with higher peaks than the normal distribution have positive excess kurtosis, and vice versa. The following diagrams will make these parameters clearer.
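A sketch of both ideas on this slide. The empirical-rule percentages come straight from the standard normal CDF. For the CLT, the exponential distribution is used as an example of a skewed underlying distribution; the sample size (50), number of samples (2000) and seed are arbitrary choices.

```python
import random
from statistics import NormalDist, fmean, pstdev

# Empirical rule: share of a normal distribution within k SDs of the mean.
std_normal = NormalDist()  # mean 0, SD 1
rule = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}
print({k: round(v, 4) for k, v in rule.items()})  # {1: 0.6827, 2: 0.9545, 3: 0.9973}

# CLT sketch: means of samples drawn from a skewed (exponential) distribution.
def sample_means(n_samples=2000, sample_size=50, seed=3):
    rng = random.Random(seed)
    return [fmean(rng.expovariate(1.0) for _ in range(sample_size)) for _ in range(n_samples)]

means = sample_means()
m, s = fmean(means), pstdev(means)
# Exponential(1) has mean 1 and SD 1, so the sample means should centre near 1
# with an SD near 1 / sqrt(50), about 0.141.
print(round(m, 3), round(s, 3))

# If the sample means are roughly normal, about 68% lie within 1 SD of their average.
within_1sd = sum(abs(x - m) <= s for x in means) / len(means)
print(round(within_1sd, 3))
```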
  • 17. Probability Distributions and ML
    - The "features" that we select in a machine learning problem are generally random variables.
    - Many machine learning techniques make assumptions about the probability distributions of these random variables.
    - Statisticians and mathematicians have studied many random variables in nature and realised that there are some recurring patterns. They have defined some standard distributions, and many of the random variables encountered in practice follow one of these standard distributions.
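As one illustration of a technique that makes a distributional assumption, Gaussian naive Bayes assumes each feature is normally distributed within each class. The sketch below applies that idea to a single hypothetical feature for two hypothetical classes, assuming equal class priors. It is not a full classifier.

```python
from statistics import NormalDist

# Hypothetical feature values (e.g. a length in cm) observed for two classes.
class_a = [1.4, 1.3, 1.5, 1.4, 1.6, 1.4]
class_b = [4.7, 4.5, 4.9, 4.0, 4.6, 4.5]

# Assumption: within each class the feature is normally distributed,
# so each class is summarised by an estimated mean and SD.
dist_a = NormalDist.from_samples(class_a)
dist_b = NormalDist.from_samples(class_b)

def more_likely_class(x):
    """Pick the class whose fitted normal density is higher at x (equal priors assumed)."""
    return "a" if dist_a.pdf(x) > dist_b.pdf(x) else "b"

print(more_likely_class(1.5), more_likely_class(4.2))  # a b
```

If the normality assumption is badly wrong for a feature, the densities, and therefore the decisions, can be misleading, which is why the distribution is checked first.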
  • 18. Analytics Landscape
    - Reporting: a report describes what events have happened in the business. It provides what is asked for and is typically standardized, e.g. a monthly sales summary report shows monthly sales by region.
    - Analysis: an analysis tries to answer why those events happened, e.g. an analysis of the sales summary report may show that sales peak on specific holidays or weekends. Basic analytics involves slicing and dicing data, monitoring large volumes of data in real time, and anomaly detection.
    - Advanced Analytics: advanced analytics extends the insights provided by analysis by assessing the impact on the business and prescribing the next steps to take. It includes predictive modeling, text analytics and advanced data mining algorithms.
    The purpose of any data analysis is to derive meaningful information from the data. One way to extract information is to study the variability of the data points: the more variability there is, the more carefully you have to explore the dataset to capture all of its meaning.
    - Data Science: data science is about using data to make decisions that drive actions. It involves:
      - finding data
      - acquiring data
      - cleaning and transforming data
      - understanding relationships in data
      - delivering value from data
    - Forecasting is the process of estimating the future based on past events, at an aggregate level, e.g. the number of calls expected at a call center, or the number of passengers expected to travel from an airport next month.
    - Predictive modeling makes the prediction or estimate at a more granular level, e.g. which customers are expected to buy a printer in the next 30 days.
  • 19. Doing Analytics: Step by Step
    - Understand the business process
    - Understand the data involved in that business process (data profiling and exploration)
    - Modeling
    - Testing and validation
    - Deployment
  • 20. Exploratory Data Analysis
    EDA refers to the process of exploring data for the purpose of doing analytics. It is primarily concerned with looking at the data, summarizing it and finding its main characteristics, usually with visual aids.
    - Identify the dependent and independent variables (target and predictors).
    - Univariate analysis: for continuous variables, check the distribution and summary of each attribute (mean, median, range, inter-quartile range, standard deviation). For categorical variables, use frequency tables to understand the distribution of each category, measured by the count and count % of each category.
    - Bivariate analysis: find out the relationships between pairs of variables.
    - Handling missing values: when you have a lot of data and only a few missing values, it might make sense to simply delete the records with missing values. On the other hand, if you have more than a handful of missing values, removing those records could cause you to get rid of a lot of data. Missing values in categorical data are not particularly troubling, because you can simply treat NA as an additional category. Missing values in numeric variables are more troublesome, since you can't just treat a missing value as a number.
    - Handling outliers
    - Variable transformation
    - Variable creation
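A minimal univariate-analysis sketch using only the standard library: a numeric summary (mean, median, range, IQR, standard deviation) and a count / count % frequency table. Both columns are hypothetical data; in practice a library such as pandas would typically be used for the same steps.

```python
from collections import Counter
from statistics import mean, median, quantiles, stdev

# Hypothetical columns: a continuous variable and a categorical variable.
ages = [23, 31, 27, 45, 38, 29, 52, 31, 40, 34]
plans = ["basic", "premium", "basic", "basic", "free",
         "premium", "basic", "free", "basic", "premium"]

def summarize_numeric(values):
    """Summary statistics for a continuous variable."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    return {
        "mean": mean(values),
        "median": median(values),
        "range": max(values) - min(values),
        "iqr": q3 - q1,
        "std": stdev(values),
    }

def frequency_table(values):
    """Count and count % of each category, most common first."""
    counts = Counter(values)
    total = len(values)
    return {k: (c, round(100 * c / total, 1)) for k, c in counts.most_common()}

print(summarize_numeric(ages))
print(frequency_table(plans))  # {'basic': (5, 50.0), 'premium': (3, 30.0), 'free': (2, 20.0)}
```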
  • 21. Exploratory Data Analysis
    1. Do I need all of the variables?
    2. Should I transform any variables?
    3. Are there NA values, outliers or other strange values?
    4. Should I create new variables?
  • 22. Handling Missing Values
    1. If the dataset contains very few missing values, you can drop those records.
    2. Replace the null values with 0s.
    3. Replace the null values with a central value such as the mean or median.
    4. Impute values (estimate them using statistical or predictive modeling methods).
    5. Split the dataset into two parts: one set where records have an Age value and another set where Age is null.
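A sketch of strategies 1, 2, 3 and 5 on a few hypothetical records, with `None` marking a missing Age. Strategy 4 (model-based imputation) is omitted because it depends on the model chosen. The helper `fill_missing` is a hypothetical name introduced for illustration.

```python
from statistics import median

# Hypothetical records; None marks a missing Age.
records = [
    {"name": "A", "age": 29},
    {"name": "B", "age": None},
    {"name": "C", "age": 41},
    {"name": "D", "age": 35},
    {"name": "E", "age": None},
]

def fill_missing(rows, value):
    """Return copies of rows with a missing age replaced by value."""
    return [{**r, "age": value if r["age"] is None else r["age"]} for r in rows]

known_ages = [r["age"] for r in records if r["age"] is not None]

# 1. Drop the records that have a missing value.
dropped = [r for r in records if r["age"] is not None]

# 2. Replace missing values with 0.
zero_filled = fill_missing(records, 0)

# 3. Replace missing values with a central value (median here; the mean works the same way).
median_filled = fill_missing(records, median(known_ages))

# 5. Split into records with and without Age, to be handled separately.
with_age = dropped
without_age = [r for r in records if r["age"] is None]

print([r["age"] for r in zero_filled])    # [29, 0, 41, 35, 0]
print([r["age"] for r in median_filled])  # [29, 35, 41, 35, 35]
```

Filling with 0 can distort numeric summaries (here it drags the mean age down), which is why a central value or imputation is usually preferred for numeric columns.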
  • 23. Plots for Data Exploration
    - Histogram: a histogram is a univariate plot (a plot that displays one variable) that groups a numeric variable into bins and displays the number of observations that fall within each bin. A histogram is a useful tool for getting a sense of the distribution of a numeric variable.
    - Boxplot: boxplots are another type of univariate plot for summarizing distributions of numeric data graphically. They can show outliers in data very clearly. The central box of the boxplot represents the middle 50% of the observations, the central bar is the median, and the bars at the ends of the dotted lines (whiskers) encapsulate the great majority of the observations. Circles that lie beyond the ends of the whiskers are data points that may be outliers.
    - Scatterplot: scatterplots are bivariate (two-variable) plots that take two numeric variables and plot the data points on the x/y plane.
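The numbers behind a histogram and a boxplot can be computed without a plotting library. This sketch uses hypothetical data with one suspicious value and equal-width bins. For the whiskers it assumes the common 1.5 × IQR convention (matplotlib's default, for example); other tools may use different rules.

```python
from statistics import quantiles

# Hypothetical numeric data with one suspicious value.
data = [12, 15, 14, 10, 18, 21, 13, 16, 17, 15, 14, 60]

def histogram(values, bins=5):
    """Equal-width bins: (start, end, count) for each bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # the maximum goes into the last bin
        counts[idx] += 1
    return [(lo + i * width, lo + (i + 1) * width, c) for i, c in enumerate(counts)]

def boxplot_stats(values, whis=1.5):
    """Quartiles, whisker ends and outliers under the 1.5 x IQR rule."""
    q1, med, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low_fence, high_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {"q1": q1, "median": med, "q3": q3,
            "whiskers": (min(inside), max(inside)), "outliers": outliers}

for start, end, count in histogram(data):
    print(f"{start:5.1f} to {end:5.1f} {'#' * count}")
print(boxplot_stats(data))  # the value 60 is flagged as an outlier
```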