A Discussion on Data Science as a
Career option
-By Anshik
- Under Student Mentorship Prog.
Overview
As data has multiplied, so has the ability to collect, organize, and analyze it. Data
storage is cheaper than ever, processing power is greater than ever, and tools for mining
the huge amount of available data for business intelligence are more accessible than ever.
The McKinsey Global Institute predicted that by 2018 the U.S. could face a shortage of
1.5 million people who know how to leverage data analysis to make effective decisions.
Enter: you, taking stock of your three main career options: data analyst, data scientist,
and data engineer.
Career options and the differences between them
Data Analyst (1.6L - 8L)
● Solves problems using existing tools.
● No mathematical or research background required.
● Manages the quality of scraped data, queries databases, and serves data as visualizations.
Data Scientist (3.5L - 18L)
● Similar to a data analyst in many respects.
● Responsible for undirected research and for tackling open-ended problems and questions.
● A data analyst summarizes the past; a data scientist strategizes for the future.
Data Engineer (3L - 21L)
● Does the groundwork for the former two.
● Responsible for compiling and installing database systems, writing complex queries, scaling to multiple machines, and putting disaster recovery systems into place.
Should you put in
the time and
effort?
What do you think?
Consider a data set that contains the
salaries of people who work
at an organization.
-- What questions can be
formed?
-- What interpretations can
be made?
1. Most of the positions sought Master's / PhD students (especially in statistics).
2. Learning from MOOCs is not easy and is time-consuming.
3. Condense what you know in a presentable manner.
On a brighter side...
Srikanth Velamakanni, CEO of CA-headquartered
Fractal Analytics:
“In the next few years, the size of the
analytics market will evolve to at
least one-thirds of the global IT
market from the current one-tenths”
Big Data
Analytics Jobs
Trends
Key points
● Huge Job Opportunities & Meeting the Skill Gap
● Salary Aspects
● The Rise of Unstructured and Semistructured Data Analytics
● Used Everywhere
Total Enterprise Data Growth 2005-2015
The way we capture, store,
analyze, and distribute data
is transforming.
Deduplication, compression,
and analysis tools are
lowering costs.
Tools and
Resources
Categories and Links
Books: ISLR, R for Dummies, Advanced R, Machine Learning for Hackers (Py), NLP with Python
Websites and Blogs: Analytics Vidhya, R-bloggers, Kaggle Scripts, CrowdAnalytics, students.brown.edu, github.io
Statistics and Linear Algebra: Inferential and Descriptive Statistics by Udacity, MSR sir’s Prob & Stats slides, Khan Academy (Linear Algebra)
Machine Learning and AI: Andrew Ng's ML class, Johns Hopkins Data Analysis, Deepak Khemani (AI, NPTEL)
Data Storage and Visualization: MongoDB (Udacity), D3.js documentation and wiki
Timeline (Weeks) [Beginners]
Week markers: 1, 3, 5, 7, 10, 12, 14, 20
● Learn the language: R/Python
● Start doing hackathons/pet projects
● Practice the language, finish an intro to ML
● Do more advanced ML, start optimizing your code, and start reading git commits
Intro To ML & R
Installing Packages :-
To install a package, use the install.packages() function. Once a package is installed, it must be loaded
into your current R session with library() or require() before it can be used. Think of this as taking the
book off the shelf and opening it up to read.
TIP :- require() returns FALSE (with a warning) if the package is not found, whereas library() stops with an error, so use require() when you want to test whether a package is available.
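The difference between the two loaders can be checked directly. The sketch below is illustrative only: load_or_install() is a hypothetical helper (not part of base R or of the slides), and "aPackageThatDoesNotExist" is a made-up name assumed not to be installed.

```r
# Hypothetical helper (an assumption, not from the slides): load a package,
# installing it first only if it is not already available.
load_or_install <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  library(pkg, character.only = TRUE)
}

# require() returns FALSE for a missing package instead of stopping.
ok <- suppressWarnings(
  require("aPackageThatDoesNotExist", character.only = TRUE, quietly = TRUE)
)
print(ok)

# library() signals an error for a missing package; tryCatch() intercepts it.
result <- tryCatch(
  {
    library("aPackageThatDoesNotExist", character.only = TRUE)
    "loaded"
  },
  error = function(e) "error"
)
print(result)

# "stats" ships with R, so nothing is downloaded here.
load_or_install("stats")
```

Because require() hands back a logical value, it fits naturally inside an if() condition, while library() is the safer choice at the top of a script where a missing package should halt execution.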
Data Types :-
R has a number of basic data types.
1. Numeric :- Also known as Double. The default type when dealing with numbers.
Examples: 1, 1.0, 42.5
2. Integer :- Examples: 1L, 2L, 42L
3. Complex :- Example: 4 + 2i
4. Logical :- Two possible values: TRUE and FALSE. You can also use T and F, but this is not
recommended.
NA is also considered logical.
5. Character :- Examples: "a", "Statistics", "1 plus 2."
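The types listed above can be inspected with typeof(). This is a small sketch; the variable names values, types, and t_is_true are chosen here for illustration. It also shows one reason T and F are discouraged: unlike TRUE and FALSE, they are ordinary variables that can be overwritten.

```r
# One example value per basic type, plus NA.
values <- list(1, 1L, 4 + 2i, TRUE, NA, "a")
types <- vapply(values, typeof, character(1))
print(types)
# "double" "integer" "complex" "logical" "logical" "character"

# T is just a variable bound to TRUE, so it can be reassigned by accident.
T <- 0
t_is_true <- isTRUE(T)   # FALSE while the reassignment is in effect
rm(T)                    # remove our copy; T refers to base R's TRUE again
```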
R Object-Oriented Systems
S3
● Lacks a formal class definition
● Objects are created by setting the class attribute
● Attributes are accessed using $
● Methods belong to generic functions
● Follows copy-on-modify semantics
S4
● Class defined using setClass()
● Objects are created using new()
● Attributes are accessed using @
● Methods belong to generic functions
● Follows copy-on-modify semantics
Reference Classes
● Class defined using setRefClass()
● Objects are created using generator functions
● Attributes are accessed using $
● Methods belong to the class
● Does not follow copy-on-modify semantics
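A minimal sketch of the three systems side by side may help. All class, generic, and object names here (animal, speak, Person, greet, Counter) are invented for illustration and do not come from the slides.

```r
library(methods)  # S4 and Reference Classes live in the methods package

# S3: no formal definition; an object is a list with a class attribute.
dog <- structure(list(name = "Rex"), class = "animal")
speak <- function(x, ...) UseMethod("speak")        # the generic
speak.animal <- function(x, ...) paste(x$name, "makes a sound")
s3_out <- speak(dog)

# S3 objects follow copy-on-modify: changing the copy leaves dog untouched.
dog2 <- dog
dog2$name <- "Max"

# S4: formal class via setClass(), objects via new(), slots via @.
setClass("Person", representation(name = "character", age = "numeric"))
setGeneric("greet", function(obj, ...) standardGeneric("greet"))
setMethod("greet", "Person", function(obj, ...) paste("Hello,", obj@name))
alice <- new("Person", name = "Alice", age = 30)
s4_out <- greet(alice)

# Reference Classes: methods belong to the class, objects are mutable.
Counter <- setRefClass("Counter",
  fields = list(count = "numeric"),
  methods = list(
    increment = function() {
      count <<- count + 1
      invisible(.self)
    }
  )
)
a <- Counter$new(count = 0)
b <- a            # b refers to the same object, not a copy
b$increment()
print(a$count)    # the change made through b is visible through a
```

The last lines show the practical consequence of reference semantics: an independent copy of a Reference Class object needs an explicit a$copy().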
Simple Linear Regression using R
We will use the built-in cars dataset that ships with base R.
The data were gathered during the 1920s and record the speed of cars
and the resulting distance it takes for the car to come to a
stop.
Objective :- How far does a car travel before stopping when
traveling at a certain speed?
What sort of function should we use for f(X) in the model Y = f(X) + ϵ for
the cars data?
- A horizontal line?
We see this doesn’t seem to do a very good job. Many of
the data points are very far from the orange line
representing the constant c. This is an example of underfitting.
- Make f(x) depend on x.
As speed increases, the distance required to come to a
stop increases. There is still some variation about this
line, but it seems to capture the overall trend.
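The two candidate fits can be compared numerically. This sketch assumes the horizontal line is placed at the mean stopping distance (the least-squares choice for a constant); the names flat_pred, rss_flat, line_fit, and rss_line are introduced here for illustration.

```r
# Candidate 1: a horizontal line f(x) = c, with c taken as the mean of dist.
flat_pred <- rep(mean(cars$dist), nrow(cars))
rss_flat <- sum((cars$dist - flat_pred)^2)

# Candidate 2: a line that depends on speed, fitted by least squares.
line_fit <- lm(dist ~ speed, data = cars)
rss_line <- sum(residuals(line_fit)^2)

# A smaller residual sum of squares means the points sit closer to the fit.
print(c(flat = rss_flat, line = rss_line))

# The relative reduction in RSS is the R-squared of the linear model.
print(1 - rss_line / rss_flat)
```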
Assumptions of Linear Regression
LINE
Linear. The relationship between Y and x is linear, of the form β0 + β1x.
Independent. The errors ϵ are independent.
Normal. The errors ϵ are normally distributed. That is, the “error” around the line follows a
normal distribution.
Equal Variance. At each value of x, the variance of Y is the same, σ².
We have to find the line that minimizes the sum of squared vertical distances from the points to the line.
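For simple linear regression this minimization has a well-known closed-form solution: the slope is Sxy / Sxx and the intercept is mean(y) minus the slope times mean(x). The sketch below computes both by hand on the cars data and prints them next to the coefficients from lm(); the variable names are illustrative.

```r
x <- cars$speed
y <- cars$dist

# Sums of cross-products and squares about the means.
Sxy <- sum((x - mean(x)) * (y - mean(y)))
Sxx <- sum((x - mean(x))^2)

# Least-squares estimates of the slope and intercept.
beta_1_hat <- Sxy / Sxx
beta_0_hat <- mean(y) - beta_1_hat * mean(x)
print(c(intercept = beta_0_hat, slope = beta_1_hat))

# lm() solves the same minimization, so the values should agree.
print(coef(lm(dist ~ speed, data = cars)))
```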
lm()
stop_dist_model = lm(dist ~ speed, data = cars)
The abline() function is used to add lines of the form a+bx to a plot. (Hence abline.) When we give it
stop_dist_model as an argument, it automatically extracts the regression coefficient estimates ( β̂0 and
β̂1) and uses them as the intercept and slope of the line. The lwd argument modifies the width of
the line, and col modifies its color.
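The plotting call described here is not shown on the slide, so the following is a reconstruction under assumptions: the axis labels, title, point style, and the specific lwd and col values are illustrative choices, not the author's. The units (mph and feet) follow the documentation of the cars dataset.

```r
# Fit the model first so that this snippet runs on its own.
stop_dist_model <- lm(dist ~ speed, data = cars)

# Scatter plot of the raw data.
plot(dist ~ speed, data = cars,
     xlab = "Speed (mph)",
     ylab = "Stopping distance (ft)",
     main = "Stopping Distance vs Speed",
     pch  = 20,
     col  = "grey")

# Overlay the fitted line; abline() reads the intercept and slope from the model.
abline(stop_dist_model, lwd = 3, col = "darkorange")
```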
The lm() function returns an object of class "lm".
We can access its components using the $ operator:
> names(stop_dist_model)
> stop_dist_model$residuals
Use summary() to summarize the output of a linear regression. The summary() command also returns a
list, and we can again use names() to learn about the elements of this list.
> names(summary(stop_dist_model))
> summary(stop_dist_model)$r.squared
Use the predict() function to predict the output for given input values:
> predict(stop_dist_model, data.frame(speed = c(8, 21, 50)))
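One caution about the call above: a speed of 50 lies outside the speeds observed in cars, so that prediction is an extrapolation. The sketch below repeats the prediction with 95% prediction intervals and flags inputs that fall outside the observed range; new_speeds, preds, and extrapolated are names introduced here.

```r
stop_dist_model <- lm(dist ~ speed, data = cars)
new_speeds <- data.frame(speed = c(8, 21, 50))

# interval = "prediction" adds lower and upper bounds for a new observation.
preds <- predict(stop_dist_model, newdata = new_speeds,
                 interval = "prediction", level = 0.95)

# Flag inputs outside the range of speeds the model was fitted on.
speed_range <- range(cars$speed)
extrapolated <- new_speeds$speed < speed_range[1] |
  new_speeds$speed > speed_range[2]

print(cbind(new_speeds, preds, extrapolated))
```

Predictions flagged as extrapolated rely on the linear trend continuing beyond the data, which the fitted model cannot confirm.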
Thank You
-Anshik
8826274098 (WhatsApp)
