A Discussion on Data Science as a Career Option
- By Anshik
- Under Student Mentorship Program
Overview
As data has multiplied, so has the ability to collect, organize, and analyze it. Data
storage is cheaper than ever, processing power is greater than ever, and tools for
mining the huge amount of available data for business intelligence are more accessible
than ever.
The McKinsey Global Institute predicted that by 2018 the U.S. could face a shortage of
1.5 million people who know how to leverage data analysis to make effective decisions.
Enter: you, taking stock of your three main career options: data analyst, data scientist,
and data engineer.
Career options and the differences between them
Data Analyst (1.6L - 8L)
● Solves problems using existing tools.
● No mathematical or research background required.
● Manages the quality of scraped data, queries databases, and serves data as visualizations.
Data Scientist (3.5L - 18L)
● Similar to a data analyst in many respects.
● Responsible for undirected research and for tackling open-ended problems and questions.
● A data analyst summarizes the past; a data scientist strategizes for the future.
Data Engineer (3L - 21L)
● Does the groundwork for the former two.
● Responsible for compiling and installing database systems, writing complex queries, scaling to multiple machines, and putting disaster recovery systems into place.
Should you put in the time and effort?
What do you think?
Consider a data set that contains the salaries of people who work at an organization.
-- What questions can be formed?
-- What interpretations can be made?
1. Most of the positions sought Master's / PhD students (especially in statistics).
2. Learning from MOOCs is not easy and is time-consuming.
3. Condense what you know in a presentable manner.
On the brighter side...
Srikanth Velamakanni, CEO of CA-headquartered Fractal Analytics:
“In the next few years, the size of the analytics market will evolve to at least one-thirds of the global IT market from the current one-tenths”
Big Data Analytics Job Trends
Key points
● Huge Job Opportunities & Meeting the Skill Gap
● Salary Aspects
● The Rise of Unstructured and Semistructured Data Analytics
● Used Everywhere
Total Enterprise Data Growth 2005-2015
The way we capture, store, analyze, and distribute data is transforming.
Deduplication, compression, and analysis tools are lowering costs.
Tools and Resources
Categories and Links
Books: ISLR, R for Dummies, Advanced R, Machine Learning for Hackers (Py), NLP with Python
Websites and Blogs: Analytics Vidhya, R-bloggers, Kaggle Scripts, CrowdAnalytics, students.brown.edu, github.io
Statistics and Linear Algebra: Inferential and Descriptive Statistics by Udacity, MSR sir’s Prob & Stats slides, Khan Academy (Linear Algebra)
Machine Learning and AI: Andrew Ng's ML class, Johns Hopkins Data Analysis, Deepak Khemani (AI, NPTEL)
Data Storage and Visualization: MongoDB (Udacity), D3.js documentation and wiki
Timeline (Weeks) [Beginners]
Week markers: 1, 3, 5, 7, 10, 12, 14, 20
● Learn the language: R/Python
● Start doing hackathons/pet projects
● Practice the language, finish an intro to ML
● Do more advanced ML, start optimizing your code, and start reading git commits
Intro To ML & R
Installing Packages :-
To install a package, use the install.packages() function. Once a package is installed, it must be loaded
into your current R session with library() or require() before it can be used. Think of this as taking the
book off of the shelf and opening it up to read.
TIP :- Use require() when you need to check whether a package is available: it returns FALSE (with a warning) if the package is not found, whereas library() stops with an error.
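The difference between checking for a package and loading it can be sketched as below. This is only an illustration: `load_if_available` is a hypothetical helper name, not something from the slides, and the second package name is assumed not to be installed.

```r
# Hypothetical helper (not from the slides): try to load a package and
# report availability instead of stopping the script with an error.
load_if_available <- function(pkg) {
  # requireNamespace() checks that the package is installed without attaching it
  if (!requireNamespace(pkg, quietly = TRUE)) {
    message("Package '", pkg, "' is not installed; try install.packages(\"", pkg, "\")")
    return(FALSE)
  }
  # character.only = TRUE lets us pass the package name as a string;
  # require() returns TRUE or FALSE rather than raising an error
  suppressWarnings(require(pkg, character.only = TRUE, quietly = TRUE))
}

ok_stats   <- load_if_available("stats")                 # ships with R
ok_missing <- load_if_available("notARealPackage12345")  # assumed not installed

print(c(stats = ok_stats, missing = ok_missing))
```

In a script where the package is mandatory, library() is usually the simpler choice, since failing early with an error is often what you want.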
Data Types :-
R has a number of basic data types.
1. Numeric :- Also known as double. The default type when dealing with numbers.
Examples: 1, 1.0, 42.5
2. Integer :- Examples: 1L, 2L, 42L
3. Complex :- Example: 4 + 2i
4. Logical :- Two possible values: TRUE and FALSE. You can also use T and F, but this is not
recommended. NA is also of logical type by default.
5. Character :- Examples: "a", "Statistics", "1 plus 2."
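A quick way to see these types in a session is typeof(). The sketch below uses the example values from the list above; the variable names are illustrative.

```r
# One example value per basic type, taken from the list above
values <- list(
  numeric   = 42.5,
  integer   = 42L,
  complex   = 4 + 2i,
  logical   = TRUE,
  character = "Statistics"
)

# typeof() reports the internal storage type of each value
types <- vapply(values, typeof, character(1))
print(types)

# A number typed without the L suffix is stored as a double, not an integer
plain_one_type <- typeof(1)

# A bare NA is logical unless it is coerced by its context
na_type <- typeof(NA)
```

Note that typeof() reports "double" for numeric values, which is why the two names are used interchangeably.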
R Object-Oriented Systems
S3
● Lacks a formal class definition
● Objects are created by setting the class attribute
● Attributes are accessed using $
● Methods belong to generic functions
● Follows copy-on-modify semantics
S4
● Class defined using setClass()
● Objects are created using new()
● Attributes are accessed using @
● Methods belong to generic functions
● Follows copy-on-modify semantics
Reference Classes
● Class defined using setRefClass()
● Objects are created using generator functions
● Attributes are accessed using $
● Methods belong to the class
● Does not follow copy-on-modify semantics
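The three systems can be compared side by side with a toy "car" object. This is a minimal sketch; the class names (`s3_car`, `S4Car`, `RCCar`) and the `accelerate` method are made up for illustration.

```r
library(methods)  # S4 and Reference Classes are provided by the methods package

# S3: no formal definition; build a list and set its class attribute
s3_car <- structure(list(speed = 10), class = "s3_car")
# An S3 method is a function attached to a generic (here print()) by its name
print.s3_car <- function(x, ...) cat("S3 car with speed", x$speed, "\n")

# S4: formal definition with setClass(), objects from new(), slots via @
setClass("S4Car", representation(speed = "numeric"))
s4_car <- new("S4Car", speed = 10)

# Reference Classes: setRefClass() returns a generator; methods live in the class
RCCar <- setRefClass(
  "RCCar",
  fields  = list(speed = "numeric"),
  methods = list(
    accelerate = function(by) {
      speed <<- speed + by  # modifies the object in place
      invisible(.self)
    }
  )
)
rc_car <- RCCar$new(speed = 10)

# Copy-on-modify: changing a copy of an S3 or S4 object leaves the original alone
s3_copy <- s3_car
s3_copy$speed <- 99
s4_copy <- s4_car
s4_copy@speed <- 99

# Reference semantics: rc_alias and rc_car are the same underlying object
rc_alias <- rc_car
rc_alias$accelerate(5)

print(s3_car)
print(c(s3 = s3_car$speed, s4 = s4_car@speed, rc = rc_car$speed))
```

After running this, the S3 and S4 originals still hold their initial speed, while the Reference Class object reflects the change made through its alias.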
Simple Linear Regression using R
We will use the built-in cars dataset, which ships with base R.
The data were gathered during the 1920s and record the speed of cars
and the resulting distance it takes for each car to come to a
stop.
Objective :- How far does a car travel before stopping when
traveling at a certain speed?
What sort of function should we use for f(X) in Y = f(X) + ϵ for the cars data?
- A horizontal line?
We see this doesn’t seem to do a very good job. Many of
the data points are very far from the orange line
representing the constant c. This is an example of underfitting.
- Make f(x) depend on x
- As speed increases, the distance required to come to a
stop increases. There is still some variation about this
line, but it seems to capture the overall trend.
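The two candidate choices for f can be compared numerically. The sketch below fits an intercept-only model (the horizontal line) and a model that depends on speed, then compares their residual sums of squares; the names `const_model`, `line_model`, and `rss` are illustrative.

```r
# Horizontal line: f(x) = c, fitted as an intercept-only model
const_model <- lm(dist ~ 1, data = cars)

# Line that depends on x: f(x) = b0 + b1 * x
line_model <- lm(dist ~ speed, data = cars)

# Residual sum of squares: how far the points are from the fitted line
rss <- function(model) sum(residuals(model)^2)

rss_const <- rss(const_model)
rss_line  <- rss(line_model)
print(c(constant = rss_const, line = rss_line))

# The fitted constant is simply the mean stopping distance
fitted_constant <- unname(coef(const_model))
```

A much larger residual sum of squares for the horizontal line is one way to quantify the underfitting described above.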
Assumptions of Linear Regression
LINE
Linear. The relationship between Y and x is linear, of the form β0 + β1x.
Independent. The errors ϵ are independent.
Normal. The errors ϵ are normally distributed. That is, the “error” around the line follows a
normal distribution.
Equal Variance. At each value of x, the variance of Y is the same, σ².
We have to find the line that minimizes the sum of the squared vertical distances from the points to the line.
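For simple linear regression, the line that minimizes the sum of squared vertical distances has a well-known closed form: the slope is Sxy / Sxx and the intercept is the mean of y minus the slope times the mean of x. The sketch below computes both by hand for the cars data and compares them with lm(); the variable names are illustrative.

```r
x <- cars$speed
y <- cars$dist

# Sums of cross-deviations and squared deviations
Sxy <- sum((x - mean(x)) * (y - mean(y)))
Sxx <- sum((x - mean(x))^2)

# Least squares estimates of the slope and the intercept
beta1_hat <- Sxy / Sxx
beta0_hat <- mean(y) - beta1_hat * mean(x)

# Compare with the estimates reported by lm()
lm_coefs <- unname(coef(lm(dist ~ speed, data = cars)))
print(rbind(by_hand = c(beta0_hat, beta1_hat), lm = lm_coefs))
```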
lm()
stop_dist_model = lm(dist ~ speed, data = cars)
The abline() function is used to add lines of the form a + bx to a plot. (Hence abline.) When we give it
stop_dist_model as an argument, it automatically extracts the regression coefficient estimates (β̂0 and
β̂1) and uses them as the intercept and slope of the line. Here we also use lwd to modify the width of
the line, as well as col to modify the color of the line.
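The plotting call itself is not reproduced in the slide text, so the following is a sketch of what it could look like. The axis labels, point style, line width, and color are assumptions chosen for illustration.

```r
# Refit the model so this block is self-contained
stop_dist_model <- lm(dist ~ speed, data = cars)

# Scatter plot of the data (labels and point style are illustrative choices)
plot(dist ~ speed, data = cars,
     xlab = "Speed (mph)",
     ylab = "Stopping distance (ft)",
     main = "Stopping distance vs speed",
     pch  = 20)

# abline() pulls the intercept and slope out of the fitted model;
# lwd sets the line width and col sets the color
abline(stop_dist_model, lwd = 3, col = "darkorange")
```

When run non-interactively (for example with Rscript), the plot is written to a file by the default graphics device rather than shown on screen.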
The lm() function returns an object of class "lm".
We can access its members using the $ operator.
> names(stop_dist_model)
> stop_dist_model$residuals
Use summary() to summarize the output of a linear regression. The summary() command also returns a
list, and we can again use names() to learn about the elements of this list.
> names(summary(stop_dist_model))
> summary(stop_dist_model)$r.squared
Use the predict() function to predict the output for given input values.
> predict(stop_dist_model, data.frame(speed = c(8, 21, 50)))
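The prediction call above can be run as a script as follows. One point worth checking is whether the requested speeds fall inside the range of speeds in the data; the `extrapolated` column is an illustrative addition, not part of the slides.

```r
# Refit the model so this block is self-contained
stop_dist_model <- lm(dist ~ speed, data = cars)

# New inputs must be supplied as a data frame with the same column name
new_speeds <- data.frame(speed = c(8, 21, 50))
new_speeds$predicted_dist <- predict(stop_dist_model, newdata = new_speeds)

# Flag inputs that lie outside the speeds the model was fitted on
speed_range <- range(cars$speed)
new_speeds$extrapolated <- new_speeds$speed < speed_range[1] |
  new_speeds$speed > speed_range[2]

print(speed_range)
print(new_speeds)
```

Predictions for flagged rows rely on the linear trend continuing beyond the observed data, so they deserve more caution than the others.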
Thank You
-Anshik
8826274098 (WhatsApp)

Data Science as a Career and Intro to R
