www.edureka.in/data-science
Slide 1
Inject Intelligence Into Business Decisions Using Data Science
Slide 2
www.edureka.co/r-for-analytics
Twitter @edurekaIN, Facebook /edurekaIN, use #AskEdureka for Questions
Objectives
At the end of this session, you will be able to understand:
→ What data mining is
→ The stages of data mining
→ What R is
→ What data science is
→ What it takes to be a data scientist
→ The roles and responsibilities of a data scientist
→ Logistic regression
Slide 3
Data Science Applications: Wine Recommendation
Slide 4
Data Science Applications: Predict Accidents
Slide 5
Stages of Analytics / Data Mining
Cross-Industry Standard Process for Data Mining (CRISP-DM)
The six CRISP-DM phases are Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation and Deployment.
Slide 6
Components of Data Science
Slide 7
Components of Data Science
R Programming Language
Slide 8
Data Science: Demand-Supply Gap
Filled vs. unfilled jobs in big data (% of positions)
Role                         Filled (%)   Unfilled (%)
Big Data Analyst                 50            50
Big Data Architect               43            57
Big Data Engineer                44            56
Big Data Research Analyst        31            69
Big Data Visualizer              23            77
Data Scientist                   18            82
Gartner Says Big Data Creates Big Jobs: 4.4 Million IT Jobs Globally to Support Big Data By 2015 (http://www.gartner.com/newsroom/id/2207915)
Slide 9
Hadoop and R together
Slide 10
Machine Learning
→ Many data mining algorithms can be used to build systems that learn from past data, generalize to future data, and derive useful insights from it.
→ Machine learning focuses on developing computer programs that can teach themselves to grow and change when exposed to new data.
Slide 11
Types of Learning
Supervised Learning
1. Uses a known dataset to make predictions.
2. The training dataset includes input data and response values.
3. From it, the supervised learning algorithm builds a model to predict the response values for a new dataset.

Unsupervised Learning
1. Draws inferences from datasets consisting of input data without labeled responses.
2. Used for exploratory data analysis to find hidden patterns or groupings in data.
3. The most common unsupervised learning method is cluster analysis.
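To make the contrast concrete, here is a minimal Python sketch. The course itself uses R; Python is used here only for illustration. The supervised half fits a straight line to labeled (input, response) pairs by ordinary least squares and can then predict a response for a new input. The unsupervised half groups unlabeled points with a one-dimensional k-means, a simple form of cluster analysis, and only discovers the grouping. The function names `fit_line` and `kmeans_1d` and the toy data are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Supervised: learn y = a + b*x from labeled pairs by ordinary least squares."""
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def kmeans_1d(points, centers, iters=10):
    """Unsupervised: group unlabeled points around len(centers) cluster centers."""
    groups = []
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            # assign each point to its nearest current center
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            groups[nearest].append(p)
        # move each center to the mean of its group (keep it if the group is empty)
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


# Labeled data (input, response): the fitted model predicts a response for a new x.
a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
prediction = a + b * 10

# Unlabeled data: k-means only finds the hidden grouping, there is no response to predict.
centers, groups = kmeans_1d([1, 2, 3, 10, 11, 12], centers=[0, 5])
```

The starting centers passed to `kmeans_1d` are an arbitrary choice for this sketch; real k-means implementations usually pick them randomly and restart several times.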
Slide 12
• Common Machine Learning Algorithms
Slide 13
Logistic Regression
Slide 14
Logistic Regression
→ In statistics, logistic regression (also called logit regression or the logit model) is a direct probability model.
→ Rather than modeling the response Y directly, logistic regression models the probability that Y belongs to a particular category.
→ In logistic regression, we model this probability with the logistic function: $p(X) = \frac{e^{A+BX}}{1+e^{A+BX}}$
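A short Python sketch of the logistic function, assuming hypothetical coefficient values A = -3 and B = 1.5 (illustrative only, not estimated from any data). It uses the algebraically equivalent form 1 / (1 + e^-(A+BX)). Whatever A and B are, the output mathematically lies strictly between 0 and 1, which is what lets it be read as a probability.

```python
import math


def logistic(x, a, b):
    """p(X) = e^(A+BX) / (1 + e^(A+BX)), computed as the equivalent 1 / (1 + e^-(A+BX))."""
    return 1.0 / (1.0 + math.exp(-(a + b * x)))


# Hypothetical coefficients A = -3, B = 1.5; evaluate p(X) for X = -2, ..., 6.
probs = [logistic(x, -3.0, 1.5) for x in range(-2, 7)]
```

With B > 0 the probabilities rise with X in an S-shape, passing 0.5 where A + BX = 0 (here at X = 2).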
Slide 15
Logistic Regression
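→ Rearranging the logistic function gives $1 - p(X) = \frac{1}{1+e^{A+BX}}$, so dividing $p(X)$ by $1 - p(X)$ yields: $\frac{p(X)}{1-p(X)} = e^{A+BX}$
→ The quantity $p(X)/[1-p(X)]$ is called the odds, and can take on any value between 0 and ∞.
→ Odds close to 0 indicate a very low probability, and odds close to ∞ indicate a very high probability.
→ Taking the logarithm of both sides gives $\log\left(\frac{p(X)}{1-p(X)}\right) = A + BX$; the left-hand side is called the log-odds or logit.
→ Logistic regression is therefore linear in X: the logit, not $p(X)$ itself, is a linear function of X.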
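The identities above can be checked numerically. This Python sketch uses hypothetical coefficients A = -1 and B = 0.5. It confirms that the logit of p(X) recovers A + BX, and shows one consequence of linearity in X: each one-unit increase in X adds B to the log-odds, which multiplies the odds by e^B. The helper names `p_of_x`, `odds` and `logit` are illustrative.

```python
import math


def p_of_x(x, a, b):
    """Logistic model p(X) with intercept A and slope B."""
    return 1.0 / (1.0 + math.exp(-(a + b * x)))


def odds(p):
    """p / (1 - p): ranges from 0 to infinity as p goes from 0 to 1."""
    return p / (1.0 - p)


def logit(p):
    """Log-odds: log(p / (1 - p))."""
    return math.log(odds(p))


A, B = -1.0, 0.5  # hypothetical coefficients
for x in range(5):
    p = p_of_x(x, A, B)
    # The logit recovers the linear predictor A + B*x (up to rounding).
    assert math.isclose(logit(p), A + B * x, abs_tol=1e-9)
    # Each one-unit increase in X multiplies the odds by e^B.
    assert math.isclose(odds(p_of_x(x + 1, A, B)) / odds(p), math.exp(B), rel_tol=1e-9)
```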
Slide 16
Sigmoid Function for Logistic Regression
www.edureka.in/pmi-acp
Slide 17
Slide 18
Maximum Likelihood Estimation (MLE)
→ MLE is a statistical method for estimating the coefficients of a model.
→ The likelihood function (L) measures the probability of observing the particular set of dependent-variable values that occur in the sample. Writing $p_1, p_2, \ldots, p_n$ for the probabilities of the n observed values, and assuming the observations are independent:
$L = p_1 \times p_2 \times \cdots \times p_n = \prod_{i=1}^{n} p_i$
→ The higher L is, the more probable the observed sample is under the model.
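Because L is a product of many probabilities smaller than 1, it quickly becomes tiny, so in practice one works with its logarithm LL, which turns the product into a sum. The Python sketch below assumes a logistic model with 0/1 outcomes; the toy data and both coefficient pairs are hypothetical. It compares a "flat" model (every p = 0.5) with coefficients that follow the data. The second pair gives the higher likelihood, i.e. the observed sample is more probable under it.

```python
import math


def log_likelihood(a, b, xs, ys):
    """LL = sum of log-probabilities the logistic model assigns to the observed 0/1 outcomes."""
    ll = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-(a + b * x)))  # model probability that Y = 1
        ll += math.log(p if y == 1 else 1.0 - p)   # probability of the value actually observed
    return ll


xs = [1, 2, 3, 4]
ys = [0, 0, 1, 1]
ll_flat = log_likelihood(0.0, 0.0, xs, ys)  # every p = 0.5, so L = 0.5 ** 4
ll_fit = log_likelihood(-5.0, 2.0, xs, ys)  # hypothetical coefficients that track the data
L_flat, L_fit = math.exp(ll_flat), math.exp(ll_fit)
```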
Slide 19
Maximum Likelihood Estimation (MLE)
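→ MLE involves finding the coefficients (A, B) that make the log of the likelihood function, LL, as large as possible (LL < 0, because L is a probability less than 1).
→ Equivalently, MLE finds the coefficients that make -2 times the log-likelihood (-2LL) as small as possible.
→ The maximum likelihood estimates solve the following condition, summed over all observations i = 1, …, n:
$\sum_{i=1}^{n} \left[Y_i - p(Y_i = 1)\right] X_i = 0$
There is one such equation per coefficient; for the intercept A, $X_i = 1$ for every observation.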
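The condition above has no closed-form solution, so it is solved iteratively. The following Python sketch uses Newton-Raphson iterations (the idea behind iteratively reweighted least squares, which R's `glm()` uses to fit logistic regression) on hypothetical toy data. The data are deliberately not perfectly separable, since otherwise the maximum likelihood estimates do not exist. After fitting, both score sums, for the intercept (X_i = 1) and for the slope, are numerically zero. The function names are illustrative.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, iters=25):
    """Estimate A and B by maximizing LL with Newton-Raphson steps."""
    a = b = 0.0
    for _ in range(iters):
        ps = [sigmoid(a + b * x) for x in xs]
        # Score (gradient of LL): sum of (Y_i - p_i) * X_i, with X_i = 1 and X_i = x_i
        g_a = sum(y - p for y, p in zip(ys, ps))
        g_b = sum((y - p) * x for x, y, p in zip(xs, ys, ps))
        # Information matrix (negative Hessian of LL), with weights p_i * (1 - p_i)
        w = [p * (1.0 - p) for p in ps]
        h_aa = sum(w)
        h_ab = sum(wi * x for wi, x in zip(w, xs))
        h_bb = sum(wi * x * x for wi, x in zip(w, xs))
        det = h_aa * h_bb - h_ab * h_ab
        # Newton step: solve the 2x2 system H * delta = g and move uphill in LL
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return a, b


# Hypothetical, not perfectly separable data (otherwise the MLE does not exist).
xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 1, 0, 1, 1]
A_hat, B_hat = fit_logistic(xs, ys)
residuals = [y - sigmoid(A_hat + B_hat * x) for x, y in zip(xs, ys)]
```

A fixed iteration count keeps the sketch short; a production fitter would instead stop once the change in -2LL or in the coefficients falls below a tolerance.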
Slide 20
Course Topics
→ Module 1
» Introduction to Data Science
→ Module 2
» Basic Data Manipulation using R
→ Module 3
» Machine Learning Techniques using R Part 1
- Clustering
- TF-IDF and Cosine Similarity
- Association Rule Mining
→ Module 4
» Machine Learning Techniques using R Part 2
- Supervised and Unsupervised Learning
- Decision Tree Classifier
→ Module 5
» Machine Learning Techniques using R Part 3
- Random Forest Classifier
- Naïve Bayes Classifier
→ Module 6
» Introduction to Hadoop Architecture
→ Module 7
» Integrating R with Hadoop
→ Module 8
» Mahout Introduction and Algorithm Implementation
→ Module 9
» Additional Mahout Algorithms and Parallel Processing in R
→ Module 10
» Project
Slide 21
Questions?
Enroll for the complete course at: www.edureka.in/data_science
Please don't forget to fill in the survey.
The class recording and presentation will be available within 24 hours at:
http://www.edureka.in/blog/application-of-clustering-in-data-science-using-real-life-examples/

Logistic Regression In Data Science
