Agenda
Why Data Science?
What is Data Science?
Who is a Data Scientist?
What does a Data Scientist do?
How to solve a problem in Data Science?
Data Science Tools
Demo
Why Data Science?
www.edureka.co/data-science Data Science Certification Course using R
With data science you can make better decisions, reduce your production costs by finding more efficient ways of working, and give your customers what they actually want!
Cost Reduction
Faster & Better Decision Making
Improved Services and Products
Risk Detection
Data Science can help detect and prevent fraudulent transactions using advanced Machine Learning algorithms, avoiding large monetary losses.
What is Data Science?
Data Science is a blend of various tools, algorithms, and machine learning principles with the goal of discovering hidden patterns in raw data.
[Figure: DATA SCIENCE, surrounded by the terms Analysis, Structure, Algorithm, Process, Programming, Insight]
It is an inter-disciplinary field deploying scientific methods, processes and systems to gain insight from data in various forms.
Tell us something we don’t know already.
[Figure: Statistics, Code, Business]
How is this different from what statisticians have been doing for years?
[Figure: Business Analyst compared with Data Scientist across Business Administration, Exploratory Data Analysis, Machine Learning & Advanced Algorithms, Data Product Engineering]
Who is a Data Scientist?
Statistics
Discrete Theory
Combinatorics
Decision Theory
Machine Learning
Economics
Finance
Operations Management
Business Intelligence
Computer Science
Software Engineering
Systems Development
What does a Data Scientist do?
➢ Processing & Cleansing Data
➢ Data Mining
➢ Building Prediction Models
➢ Extending Data
➢ Optimizing and building classifiers using Machine Learning
How to solve a problem in Data Science?
1. Discovery
2. Data Preparation
3. Model Planning
4. Model Building
5. Operationalize
6. Communicating Results
Phase 1: Discovery
➢ Discovery involves acquiring data from all identified internal and external sources that can help with a business solution.
➢ You assess whether you have the required resources, in terms of people, technology, time and data, to support the project.
Phase 2: Data Preparation
➢ In this phase, you need an analytical sandbox in which you can perform analytics for the entire duration of the project.
➢ Working in the sandbox involves four steps: Preparing the Analytics Sandbox, Performing ETLT, Data Conditioning, and Survey & Visualize.
➢ ETLT stands for Extract, Transform, Load and Transform.
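The ETLT sequence can be sketched roughly as follows. This is only an illustration under stated assumptions: the sample CSV, the `sales` table and every function name are hypothetical, and an in-memory SQLite database stands in for the analytics sandbox.

```python
import csv
import io
import sqlite3

# Hypothetical raw extract; in practice this would come from a source system.
RAW_CSV = """customer,amount
alice, 120.50
bob,
alice,30
carol,75.25
"""


def extract(text):
    """Extract: read raw rows from CSV text."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """First transform: trim whitespace and drop rows with no amount."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if amount:
            cleaned.append((row["customer"].strip(), float(amount)))
    return cleaned


def load(conn, rows):
    """Load: write the cleaned rows into the sandbox database."""
    conn.execute("CREATE TABLE sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)


def transform_in_sandbox(conn):
    """Second transform: aggregate inside the sandbox using SQL."""
    cursor = conn.execute(
        "SELECT customer, SUM(amount) FROM sales "
        "GROUP BY customer ORDER BY customer"
    )
    return cursor.fetchall()


def run_etlt(text):
    """Run Extract, Transform, Load and Transform end to end."""
    conn = sqlite3.connect(":memory:")  # stand-in for the sandbox
    try:
        load(conn, transform(extract(text)))
        return transform_in_sandbox(conn)
    finally:
        conn.close()


if __name__ == "__main__":
    print(run_etlt(RAW_CSV))
```

The second transform runs inside the sandbox so that the loaded data stays available for further conditioning and exploration.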
Phase 3: Model Planning
➢ You will apply Exploratory Data Analysis (EDA) using various statistical formulas and visualization tools.
Common Tools for Model Planning
R
SAS/ACCESS
SQL Server Analysis Services
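As a small illustration of the statistical side of EDA, the sketch below computes summary statistics and a Pearson correlation using only the Python standard library. The `ad_spend` and `sales` figures are made-up sample data, and the function names are hypothetical.

```python
import math
import statistics


def summarize(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up sample data for illustration only.
ad_spend = [10, 20, 30, 40, 50]
sales = [25, 41, 62, 79, 103]

if __name__ == "__main__":
    print(summarize(sales))
    print(round(pearson(ad_spend, sales), 3))
```

A strong correlation between two variables at this stage is a hint about which relationships are worth modelling in the next phase, not proof of cause and effect.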
Phase 4: Model Building
➢ In this phase, you will develop datasets for training and testing
purposes.
Common Tools for Model Building
SAS Miner
WEKA
SPSS
MATLAB
Alpine Miner
Statistica
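One common way to develop the training and testing datasets is a random split. The sketch below assumes a 75/25 split and a fixed seed for reproducibility; both values, and the function name, are illustrative choices rather than anything prescribed by the slides.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=42):
    """Shuffle rows reproducibly and split them into (train, test) lists."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    train, test = train_test_split(range(20))
    print(len(train), len(test))
```

Keeping the test rows out of model fitting is what lets them give an honest estimate of how the model behaves on unseen data.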
Phase 5: Operationalize
➢ In this phase, you deliver final reports, briefings, code and technical documents.
➢ In addition, a pilot project is sometimes implemented in a real-time production environment.
➢ This gives you a clear picture of the performance and other related constraints on a small scale before full deployment.
Phase 6: Communicate Results
➢ You do the following things in this phase:
1. Identify all the key findings
2. Communicate them to the stakeholders
3. Look for performance constraints, if any
4. Determine whether the results of the project are a success or a failure
How to Choose an Algorithm in Data Science?
Is it A or B? Classification Algorithm
Is this weird? Anomaly Detection Algorithm
How much / How many? Regression Algorithm
How is this organised? Clustering Algorithm
What should I do next? Reinforcement Learning
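To make one of these questions concrete, here is a minimal sketch of the "Is this weird?" case using a simple z-score rule. The z-score rule is just one possible anomaly detection technique, chosen here for brevity; the threshold of 2.0 and the transaction amounts are assumptions for illustration.

```python
import statistics


def find_anomalies(values, threshold=2.0):
    """Return values whose z-score (distance from the mean in standard
    deviations) exceeds the threshold."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mean) / stdev > threshold]


# Made-up transaction amounts; one is far larger than the rest.
amounts = [52, 48, 50, 51, 49, 50, 47, 53, 500]

if __name__ == "__main__":
    print(find_anomalies(amounts))
```

The same idea underlies the fraud prevention use case mentioned earlier: a transaction that sits far from the usual pattern is flagged for review.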
What is Machine Learning?
It is a type of Artificial Intelligence that makes computers capable of learning on their own, i.e. without being explicitly programmed. With machine learning, a model adjusts its own parameters (rather than having its code rewritten by hand) whenever it comes across new data.
Categories of Algorithm
1. Supervised Learning is a type of machine learning algorithm that uses a known (labelled) dataset to make predictions.
2. Unsupervised Learning is a type of machine learning algorithm that uses input datasets without labelled responses to draw inferences.
3. Reinforcement Learning is a type of algorithm inspired by behaviourist psychology, concerned with taking actions to maximise reward.
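As a hedged illustration of supervised learning, the sketch below uses a 1-nearest-neighbour classifier: it predicts the label of a new point by copying the label of the closest known example. The nearest-neighbour method is only one of many supervised techniques, and the labelled points and function name are made up for this example.

```python
import math


def nearest_neighbor_predict(training_data, point):
    """Predict a label for `point` from (features, label) training pairs
    by returning the label of the closest training example."""
    _, label = min(training_data, key=lambda item: math.dist(item[0], point))
    return label


# Made-up labelled dataset: two features per example, labels "A" or "B".
labelled = [
    ((1.0, 1.0), "A"),
    ((1.5, 2.0), "A"),
    ((8.0, 8.0), "B"),
    ((9.0, 7.5), "B"),
]

if __name__ == "__main__":
    print(nearest_neighbor_predict(labelled, (2.0, 1.5)))
    print(nearest_neighbor_predict(labelled, (8.5, 8.0)))
```

An unsupervised method would receive the same points without the "A" and "B" labels and would have to discover the two groups on its own.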
Data Science Tools
➢ Datasets
➢ Hadoop
➢ Big Data
➢ R programming
➢ Spark
Demo
Data Science For Beginners | Who Is A Data Scientist? | Data Science Tutorial Using R | Edureka
