Introduction to
Data Science
Presentation
Presented by: Chirag Gautam
22L9MBA34102
Data Cleaning
• Data cleaning is the process of
fixing or removing incorrect,
corrupted, incorrectly
formatted, duplicate, or
incomplete data within a
dataset.
Steps of Data
Cleaning
• Step 1: Remove irrelevant
data.
• Step 2: Deduplicate your
data.
• Step 3: Fix structural errors.
• Step 4: Deal with missing
data.
• Step 5: Filter out outliers.
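The five steps above can be sketched in plain Python. This is a minimal illustration, not a definitive pipeline: the records, the field names (`id`, `city`, `age`, `note`), the median imputation and the 0 to 120 age range are all assumptions made for the example. The steps run in the order listed on the slide.

```python
from statistics import median

# Hypothetical raw records; field names and values are illustrative only.
raw = [
    {"id": 1, "city": " delhi ", "age": 34, "note": "x"},
    {"id": 1, "city": " delhi ", "age": 34, "note": "x"},   # exact duplicate
    {"id": 2, "city": "MUMBAI", "age": None, "note": "y"},  # missing age
    {"id": 3, "city": "Pune", "age": 29, "note": "z"},
    {"id": 4, "city": "pune", "age": 31, "note": "w"},
    {"id": 5, "city": "Delhi", "age": 420, "note": "v"},    # implausible age
]

def clean(records, keep=("id", "city", "age"), age_range=(0, 120)):
    # Step 1: remove irrelevant data (drop fields not listed in `keep`).
    rows = [{k: r[k] for k in keep} for r in records]

    # Step 2: deduplicate, keeping the first occurrence of each record.
    seen, unique = set(), []
    for r in rows:
        key = tuple(r[k] for k in keep)
        if key not in seen:
            seen.add(key)
            unique.append(r)

    # Step 3: fix structural errors (stray whitespace, inconsistent case).
    for r in unique:
        r["city"] = r["city"].strip().title()

    # Step 4: deal with missing data (assumed strategy: median of known ages).
    known = [r["age"] for r in unique if r["age"] is not None]
    fill = median(known)
    for r in unique:
        if r["age"] is None:
            r["age"] = fill

    # Step 5: filter out outliers (assumed rule: a simple plausibility range).
    lo, hi = age_range
    return [r for r in unique if lo <= r["age"] <= hi]

cleaned = clean(raw)
for row in cleaned:
    print(row)
```

Other choices are equally valid for steps 4 and 5, such as dropping incomplete rows or using a statistical outlier rule; the right one depends on the dataset.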
Data
Integration
• Data integration is the
process of bringing data
from disparate sources
together to provide
users with a unified
view. The goal of
data integration is to
make data more freely
available and easier for
systems and users to
consume and process.
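As a small illustration of a unified view, the sketch below combines two sources that describe the same customers under different key names. The sources (`crm`, `billing`), their fields and the outer-join behaviour are assumptions made for the example, not something the slide specifies.

```python
# Two hypothetical sources that describe the same customers differently.
crm = [
    {"customer_id": 1, "name": "Asha"},
    {"customer_id": 2, "name": "Ravi"},
]
billing = [
    {"cust": 1, "total_paid": 250.0},
    {"cust": 3, "total_paid": 90.0},
]

def integrate(crm_rows, billing_rows):
    """Build one record per customer from both sources (a full outer join)."""
    unified = {}
    for row in crm_rows:
        cid = row["customer_id"]
        unified[cid] = {"customer_id": cid, "name": row["name"], "total_paid": None}
    for row in billing_rows:
        cid = row["cust"]  # map the billing key onto the shared key name
        rec = unified.setdefault(
            cid, {"customer_id": cid, "name": None, "total_paid": None}
        )
        rec["total_paid"] = row["total_paid"]
    # Sort by customer id so the unified view has a stable order.
    return [unified[cid] for cid in sorted(unified)]

view = integrate(crm, billing)
for record in view:
    print(record)
```

Customers known to only one source still appear in the view, with `None` marking the fields the other source would have supplied.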
Data Reduction
• Data reduction is the process of
reducing the amount of capacity
required to store data. Data
reduction can increase storage
efficiency and reduce costs. Storage
vendors often describe storage
capacity in terms of raw capacity
and effective capacity; effective capacity is the
amount of data that can be stored once reduction is applied.
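A rough sketch of the idea, using lossless compression from Python's standard `zlib` module as one example of a reduction technique. The sample data and the 2:1 ratio are assumptions, and treating effective capacity as raw capacity multiplied by the reduction ratio is a simplification of how vendors calculate it.

```python
import zlib

def reduction_ratio(data: bytes) -> float:
    """Return original size divided by compressed size."""
    compressed = zlib.compress(data, level=9)
    return len(data) / len(compressed)

def effective_capacity(raw_capacity_tb: float, ratio: float) -> float:
    """Estimate how much pre-reduction data fits in the raw capacity."""
    return raw_capacity_tb * ratio

# Highly repetitive sample data compresses well; real data varies widely.
sample = b"sensor_id=7,status=OK\n" * 1000
ratio = reduction_ratio(sample)
print(f"measured reduction ratio on the sample: {ratio:.1f}:1")

# With an assumed 2:1 ratio, 10 TB of raw capacity holds about 20 TB of data.
print(f"effective capacity: {effective_capacity(10, 2.0):.0f} TB")
```

The ratio measured on one sample should not be assumed for other data: already-compressed files such as images or video usually reduce very little.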
Data
Transformation
• Data transformation
is the process of
converting data from
one format to another,
typically from the format
of a source system into
the required format of a
destination system.
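A minimal sketch of a source-to-destination conversion: a CSV export is turned into JSON with typed values and ISO dates. The column names, the day/month/year date format and the destination field names (`orderId`, `orderDate`, `amount`) are hypothetical.

```python
import csv
import io
import json
from datetime import datetime

# Hypothetical source export: CSV with day/month/year dates and text amounts.
source_csv = (
    "order_id,order_date,amount\n"
    "101,05/03/2024,1200.50\n"
    "102,17/11/2023,89\n"
)

def transform(csv_text: str) -> str:
    """Convert the source CSV into the JSON layout a destination might expect."""
    out = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        parsed_date = datetime.strptime(row["order_date"], "%d/%m/%Y").date()
        out.append({
            "orderId": int(row["order_id"]),        # text -> integer
            "orderDate": parsed_date.isoformat(),   # DD/MM/YYYY -> YYYY-MM-DD
            "amount": float(row["amount"]),         # text -> number
        })
    return json.dumps(out, indent=2)

print(transform(source_csv))
```

The transformation changes both the container format (CSV to JSON) and the representation of individual values, which is typical when the source and destination systems follow different conventions.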
Two Ways of
Transformation
Note: the two definitions below describe genetic transformation
(DNA uptake by cells), not data transformation.
Natural transformation:
The uptake and incorporation of
naked DNA from the cell’s natural environment.
Artificial transformation:
A wide array of methods for
inducing uptake of exogenous DNA.
DATA
DISCRETIZATION
• Discretization is the process of
putting values into buckets so that
there are a limited number of
possible states. The buckets
themselves are treated as ordered
and discrete values. You can
discretize both numeric and string
columns. There are several methods
that you can use to discretize data.
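Two simple ways to put numeric values into ordered buckets are sketched below: equal-width bins and explicit cut points. The slide does not say which methods it has in mind, so both are illustrative choices, and the ages, edges and labels are made-up sample values.

```python
from bisect import bisect_right

def equal_width_bins(values, n_bins):
    """Assign each value to one of n_bins equal-width buckets (0 .. n_bins-1)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0 for _ in values]
    # min() keeps the maximum value inside the last bucket.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def bucket_by_edges(values, edges, labels):
    """Map values to ordered labels using explicit, sorted cut points."""
    # A value equal to an edge falls into the bucket above that edge.
    return [labels[bisect_right(edges, v)] for v in values]

ages = [3, 17, 25, 42, 68, 90]
print(equal_width_bins(ages, 3))
# -> [0, 0, 0, 1, 2, 2]
print(bucket_by_edges(ages, [18, 65], ["minor", "adult", "senior"]))
# -> ['minor', 'minor', 'adult', 'adult', 'senior', 'senior']
```

In both cases the result has a limited number of possible states, and the buckets keep a meaningful order.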
TECHNIQUES
IDS Presentation.pptx
