MODY UNIVERSITY OF SCIENCE
AND TECHNOLOGY
Colloquium Presentation
CS 14.371
Submitted to-
Dr. Pervesh Kumar
Bishnoi
Ms. Sonal Shukla
Submitted by-
Akshita Kanther
B.Tech. IIIrd yr C2
Er.No.-180161
CONTENTS
Introduction
Why Should We Prepare Our Data
Python
Python Libraries
Pandas
Features of Pandas
Core Components Of Pandas
Pandas Operations
Typical Pipeline For Data Preparation
Common Tasks Involved In Data Preparation
Applications Of Pandas
Companies using Pandas
Summary
INTRODUCTION
Data preparation is the first step after you get your
hands on any kind of dataset. This is the step when you
pre-process raw data into a form that can be easily and
accurately analyzed. Proper data preparation allows for
efficient analysis - it can eliminate errors and
inaccuracies that could have occurred during the data
gathering process and can thus help in removing some
bias resulting from poor data quality. This is why analysts
spend much of their time on this vital step.
WHY SHOULD WE PREPARE OUR DATA
Garbage in, garbage out
Reduce errors
Remove duplicate records
Fix missing values
Correct range values
Fix formatting (e.g. date, text, number)
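The cleaning steps listed above can be sketched in pandas. This is only a minimal illustration, not a method taken from the slides: the `raw` table, its column names, and the 0 to 120 age range are all made up for the example.

```python
import pandas as pd

# Hypothetical raw records with a duplicate row, a missing value,
# an out-of-range age, and inconsistently formatted text and dates.
raw = pd.DataFrame({
    "name": [" alice", "Bob ", "Bob ", "carol", "dave"],
    "age": [34, 29, 29, None, 250],
    "joined": ["2021-01-05", "2021-02-10", "2021-02-10", "2021-03-15", "not a date"],
})

# Remove duplicate records.
clean = raw.drop_duplicates().copy()

# Fix formatting: trim and title-case text, parse dates (bad ones become NaT).
clean["name"] = clean["name"].str.strip().str.title()
clean["joined"] = pd.to_datetime(clean["joined"], errors="coerce")

# Correct range values: treat ages outside the assumed 0-120 range as missing.
clean["age"] = clean["age"].where(clean["age"].between(0, 120))

# Fix missing values: here, fill missing ages with the median of the valid ones.
clean["age"] = clean["age"].fillna(clean["age"].median())

print(clean)
```

Filling with the median is just one possible choice; dropping the affected rows or leaving the gaps marked as missing may suit other datasets better.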
PYTHON
• Object-oriented, high-level
programming language
• Used as a scripting language to
connect existing components
together
• Simple, easy to learn syntax
emphasizes readability
• Supports modules and
packages
PYTHON LIBRARIES
Many popular Python
toolboxes/libraries:-
• NumPy
• SciPy
• Pandas
• SciKit-Learn
Visualization libraries:-
• matplotlib
• Seaborn
PANDAS
• Pandas is a software library written for Python
• Pandas has so many uses that it might make sense to list
the things it can't do instead of what it can do
• This tool is essentially your data’s home. Through pandas,
you get acquainted with your data by cleaning,
transforming, and analyzing it
• Pandas is well suited for different kinds of data, such as:
  • Tabular data with heterogeneously-typed columns
  • Ordered and unordered time series data
  • Arbitrary matrix data with row & column labels
  • Unlabelled data
  • Any other form of observational or statistical data sets
To use the pandas library, you first need to import it. Just type
this in your Python console:
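The slide's screenshot is not reproduced in this transcript. The import most commonly seen in practice, assuming the usual `pd` alias, looks like this:

```python
# Conventional import; the alias "pd" is a community habit, not a requirement.
import pandas as pd

# Printing the version is a quick way to confirm the library is installed.
print(pd.__version__)
```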
CORE COMPONENTS OF PANDAS
The two primary components of pandas are:
• Series
• DataFrame
A Series is essentially a single column, and a DataFrame is a
two-dimensional table made up of a collection of Series.
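A short sketch may make the relationship between the two concrete. The item names and prices below are invented for illustration.

```python
import pandas as pd

# A Series: one labelled column of values.
prices = pd.Series([2.5, 4.0, 1.2], name="price")

# A DataFrame: a table whose columns are Series sharing the same index.
items = pd.DataFrame({
    "item": ["butter", "flour", "sugar"],
    "price": prices,
})

# Selecting one column of a DataFrame gives back a Series.
column = items["price"]
print(type(column).__name__)
print(items.shape)
```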
PANDAS OPERATIONS
Using Python pandas, you can perform a lot of operations with series, data frames, missing data,
group by etc. Some of the common operations for data manipulation are listed below:
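The slide's original list of operations is not reproduced in this transcript. As a stand-in, the sketch below shows a few operations of the kind the sentence mentions (slicing, missing data, group by, merging). The `sales` and `managers` tables and all their values are hypothetical.

```python
import pandas as pd

# Hypothetical example tables.
sales = pd.DataFrame({
    "region": ["north", "south", "north", "south"],
    "units": [10, None, 7, 3],
})
managers = pd.DataFrame({
    "region": ["north", "south"],
    "manager": ["Asha", "Ravi"],
})

# Slicing: first two rows by position.
first_two = sales.iloc[:2]

# Missing data: count the gaps, then fill them with 0.
n_missing = int(sales["units"].isna().sum())
filled = sales.fillna({"units": 0})

# Group by: total units per region.
totals = filled.groupby("region")["units"].sum()

# Merging: attach the manager for each region.
merged = filled.merge(managers, on="region", how="left")

print(totals)
print(merged)
```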
TYPICAL PIPELINE FOR DATA PREPARATION
• The first step of a data preparation pipeline is to gather data from various
sources and locations
• Before any processing is done, we wish to discover what the data is about.
At this stage, we understand the data within the context of business goals.
Visualizing the data is also helpful here
• The next stage is to cleanse the data of missing values and invalid values.
We also reformat data to standard forms
• Next we transform the data for a specific outcome or audience
• We can enrich data by merging different datasets to enable richer insights
• Finally, we store the data or directly send it out for analytics
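The stages above can be sketched end to end in pandas. This is an illustrative outline under stated assumptions, not the pipeline used by the presenter: the two in-memory CSV sources, their column names, and the rule that amounts must be positive are all invented for the example.

```python
import io
import pandas as pd

# Gather: two in-memory CSV "sources" stand in for real files or databases.
orders_csv = io.StringIO(
    "order_id,customer_id,amount,date\n"
    "1,c1,100.0,2024-01-03\n"
    "2,c2,,2024-01-04\n"
    "3,c1,-5.0,2024-01-09\n"
    "4,c3,40.0,2024-02-01\n"
)
customers_csv = io.StringIO("customer_id,city\nc1,Jaipur\nc2,Delhi\nc3,Pune\n")
orders = pd.read_csv(orders_csv)
customers = pd.read_csv(customers_csv)

# Discover: inspect types, gaps, and summary statistics.
orders.info()
print(orders.describe())

# Cleanse: drop rows with missing or invalid (non-positive) amounts,
# and convert dates to a standard datetime type.
orders = orders.dropna(subset=["amount"])
orders = orders[orders["amount"] > 0].copy()
orders["date"] = pd.to_datetime(orders["date"])

# Transform: derive a month column for the intended report.
orders["month"] = orders["date"].dt.strftime("%Y-%m")

# Enrich: merge in customer attributes.
enriched = orders.merge(customers, on="customer_id", how="left")

# Store: write to CSV text here; a real pipeline would pass a file path.
output = enriched.to_csv(index=False)
print(output)
```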
COMMON TASKS INVOLVED IN DATA PREPARATION
Data preparation involves one or more of the following tasks:
• Aggregation: Multiple columns are reduced to fewer columns.
Records are summarized
• Anonymization: Sensitive values are removed for the sake of
privacy
• Augmentation: Expand the dataset size without collecting more
data. For example, image data is augmented via cropping or
rotating
• Blending: Combine and link related data from various sources. For
example, combine an employee's HR data with payroll data
• Decomposing: Decompose a data column that has sub-fields. For
example, "6 ounces butter" is decomposed into three columns
representing value, unit and ingredient
• Deletion: Duplicates and outliers are removed. Exploratory Data
Analysis (EDA) may be used to identify outliers
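Several of these tasks map onto short pandas calls. The sketch below covers blending, deletion of duplicates, anonymization, aggregation, and decomposing; augmentation of image data and outlier detection are left out because they need more context than a small example allows. The `hr`, `payroll`, and `recipe` tables, the `emp_id` key, and all values are hypothetical.

```python
import pandas as pd

# Blending: link HR data with payroll data on a shared (hypothetical) key.
hr = pd.DataFrame({
    "emp_id": [1, 2, 3, 3],
    "name": ["Asha", "Ravi", "Meena", "Meena"],
    "dept": ["ops", "ops", "it", "it"],
})
payroll = pd.DataFrame({"emp_id": [1, 2, 3], "salary": [50, 60, 900]})
staff = hr.merge(payroll, on="emp_id")

# Deletion: drop exact duplicate records.
staff = staff.drop_duplicates()

# Anonymization: remove the sensitive column.
staff = staff.drop(columns=["name"])

# Aggregation: summarize records per department.
by_dept = staff.groupby("dept")["salary"].agg(["count", "mean"])

# Decomposing: split "6 ounces butter" into value, unit and ingredient.
recipe = pd.DataFrame({"line": ["6 ounces butter", "2 cups flour"]})
parts = recipe["line"].str.split(" ", n=2, expand=True)
parts.columns = ["value", "unit", "ingredient"]
parts["value"] = pd.to_numeric(parts["value"])

print(by_dept)
print(parts)
```

Splitting on the first two spaces only works when every line follows the same "value unit ingredient" pattern; messier text would need a regular expression or manual review.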
APPLICATIONS OF PANDAS
COMPANIES USING PANDAS
SUMMARY
Raw data is usually not suitable for direct analysis.
This is because the data might come from different
sources in different formats. Moreover, real-world
data is not clean. Some data points might be
missing. Some others might be out of range.
There could be duplicates. Data preparation is
therefore an essential task that transforms or
prepares data into a form that's suitable for
analysis.
Data preparation is usually taken to begin once data has
already been collected, although some consider data
collection and data ingestion to be part of it. Within
data preparation, it's common to identify sub-stages
that might include data pre-processing, data wrangling,
and data transformation.
Presentation on data preparation with pandas