SlideShare a Scribd company logo
1 of 26
Transformation techniques
Let's dive more into how we can perform other types of data transformations
including cleaning, filtering, deduplication.
1. Performing data deduplication:
Removing them is essential to enhance the quality of the dataset.
This can be done with the following steps:
1. Let's consider a simple dataframe, as follows:
2. Replacing values- It is essential to find and replace some values
inside a dataframe.
This can be done with the following steps:
3.Handling missing data
• Whenever there are missing values, a NaN value is used, which
indicates that there is no value specified for that particular index.
• There could be several reasons why a value could be NaN:
• When data is retrieved from an external source and there are some
incomplete values in the dataset.
• It can also happen when we join two different datasets and some values are
not matched.
• Missing values due to data collection errors.
• When the shape of data changes, there are new additional rows or columns
that are not determined.
NaN values in pandas objects
Dropping missing values
• One of the ways to handle missing values is to simply remove them
from our dataset. We have seen that we can use the isnull() and
notnull() functions from the pandas library to determine null values:
Filling missing values
Interpolating missing values
Outlier detection and filtering
• Outliers are data points that diverge from other observations for
several reasons.
• During the EDA phase, one of our common tasks is to detect and
filter these outliers.
• The main reason for this detection and filtering of outliers is that the
presence of such outliers can cause serious issues in statistical
analysis.
We are going to perform simple outlier detection and
filtering:
Grouping Datasets
• During data analysis, it is often essential to cluster or group data
together based on certain criteria.
• For example, an e-commerce store might want to group all the sales
that were done during the New year period
• These grouping concepts occur in several parts of data analysis
groupby() method
• During the data analysis phase, categorizing a dataset into multiple
categories or groups is often essential.
• The pandas groupby function is one of the most efficient and time-
saving features for doing this.
• Groupby provides functionalities that allow us to split-apply-combine
throughout the dataframe; that is, it can be used for
splitting, applying, and combining dataframes.
https://www.w3schools.com/python/pandas/trypython.asp?filename=demo
_ref_df_groupby
• The pandas groupby method performs two essential functions:
 It splits the data into groups based on some criteria.
 It applies a function to each group independently.
Find the average co2 consumption for each car brand:
import pandas as pd
data = {
'co2': [95, 90, 99, 104, 105, 94, 99, 104],
'model': ['Citigo', 'Fabia', 'Fiesta', 'Rapid',
'Focus', 'Mondeo’, 'Octavia', 'B-Max'],
'car': ['Skoda', 'Skoda', 'Ford', 'Skoda', 'Ford’,
'Ford', 'Skoda’, 'Ford’]
}
df = pd.DataFrame(data)
print(df)
Find the average co2 consumption for each car brand:
groupby() can be used to group the dataset on the basis of car column
get_group() can be used to get value items in the group
Data aggregation
• Aggregation is the process of implementing any mathematical
operation on a dataset or a subset of it.
• Aggregation is one of the many techniques in pandas that's used to
manipulate the data in the dataframe for data analysis.
• The Dataframe.aggregate() function is used to apply aggregation across
one or more columns.
• We can apply aggregation in a DataFrame, df, as df.aggregate() or
df.agg().
• Aggregation only works with numeric type columns, Some of the most
frequently used aggregations are as follows:
Transformation techniques.pptx
Transformation techniques.pptx
Transformation techniques.pptx

More Related Content

Similar to Transformation techniques.pptx

Unit 3 part ii Data mining
Unit 3 part ii Data miningUnit 3 part ii Data mining
Unit 3 part ii Data miningDhilsath Fathima
 
Presentation on data preparation with pandas
Presentation on data preparation with pandasPresentation on data preparation with pandas
Presentation on data preparation with pandasAkshitaKanther
 
Data preprocessing using Machine Learning
Data  preprocessing using Machine Learning Data  preprocessing using Machine Learning
Data preprocessing using Machine Learning Gopal Sakarkar
 
Group 1 Report CRISP - DM METHODOLOGY.pptx
Group 1 Report CRISP - DM METHODOLOGY.pptxGroup 1 Report CRISP - DM METHODOLOGY.pptx
Group 1 Report CRISP - DM METHODOLOGY.pptxellamangapis2003
 
Pre-Processing and Data Preparation
Pre-Processing and Data PreparationPre-Processing and Data Preparation
Pre-Processing and Data PreparationUmair Shafique
 
overview of_data_processing
overview of_data_processingoverview of_data_processing
overview of_data_processingFEG
 
Exploratory Data Analysis - Satyajit.pdf
Exploratory Data Analysis - Satyajit.pdfExploratory Data Analysis - Satyajit.pdf
Exploratory Data Analysis - Satyajit.pdfAmmarAhmedSiddiqui2
 
data structures and its importance
 data structures and its importance  data structures and its importance
data structures and its importance Anaya Zafar
 
Data mining and data warehouse lab manual updated
Data mining and data warehouse lab manual updatedData mining and data warehouse lab manual updated
Data mining and data warehouse lab manual updatedYugal Kumar
 
Introduction to data mining and data warehousing
Introduction to data mining and data warehousingIntroduction to data mining and data warehousing
Introduction to data mining and data warehousingEr. Nawaraj Bhandari
 
Meetup Junio Data Analysis with python 2018
Meetup Junio Data Analysis with python 2018Meetup Junio Data Analysis with python 2018
Meetup Junio Data Analysis with python 2018DataLab Community
 
finalestkddfinalpresentation-111207021040-phpapp01.pptx
finalestkddfinalpresentation-111207021040-phpapp01.pptxfinalestkddfinalpresentation-111207021040-phpapp01.pptx
finalestkddfinalpresentation-111207021040-phpapp01.pptxshumPanwar
 
Lecture 1 Pandas Basics.pptx machine learning
Lecture 1 Pandas Basics.pptx machine learningLecture 1 Pandas Basics.pptx machine learning
Lecture 1 Pandas Basics.pptx machine learningmy6305874
 
Cssu dw dm
Cssu dw dmCssu dw dm
Cssu dw dmsumit621
 
Pandas Data Cleaning and Preprocessing PPT.pptx
Pandas Data Cleaning and Preprocessing PPT.pptxPandas Data Cleaning and Preprocessing PPT.pptx
Pandas Data Cleaning and Preprocessing PPT.pptxbajajrishabh96tech
 

Similar to Transformation techniques.pptx (20)

Unit 3 part ii Data mining
Unit 3 part ii Data miningUnit 3 part ii Data mining
Unit 3 part ii Data mining
 
Presentation on data preparation with pandas
Presentation on data preparation with pandasPresentation on data preparation with pandas
Presentation on data preparation with pandas
 
Data preprocessing using Machine Learning
Data  preprocessing using Machine Learning Data  preprocessing using Machine Learning
Data preprocessing using Machine Learning
 
Group 1 Report CRISP - DM METHODOLOGY.pptx
Group 1 Report CRISP - DM METHODOLOGY.pptxGroup 1 Report CRISP - DM METHODOLOGY.pptx
Group 1 Report CRISP - DM METHODOLOGY.pptx
 
Pre-Processing and Data Preparation
Pre-Processing and Data PreparationPre-Processing and Data Preparation
Pre-Processing and Data Preparation
 
Chapter 3.pdf
Chapter 3.pdfChapter 3.pdf
Chapter 3.pdf
 
overview of_data_processing
overview of_data_processingoverview of_data_processing
overview of_data_processing
 
Exploratory Data Analysis - Satyajit.pdf
Exploratory Data Analysis - Satyajit.pdfExploratory Data Analysis - Satyajit.pdf
Exploratory Data Analysis - Satyajit.pdf
 
Data mining
Data miningData mining
Data mining
 
data structures and its importance
 data structures and its importance  data structures and its importance
data structures and its importance
 
Data mining and data warehouse lab manual updated
Data mining and data warehouse lab manual updatedData mining and data warehouse lab manual updated
Data mining and data warehouse lab manual updated
 
Introduction to data mining and data warehousing
Introduction to data mining and data warehousingIntroduction to data mining and data warehousing
Introduction to data mining and data warehousing
 
Lecture2 (1).ppt
Lecture2 (1).pptLecture2 (1).ppt
Lecture2 (1).ppt
 
More on Pandas.pptx
More on Pandas.pptxMore on Pandas.pptx
More on Pandas.pptx
 
Meetup Junio Data Analysis with python 2018
Meetup Junio Data Analysis with python 2018Meetup Junio Data Analysis with python 2018
Meetup Junio Data Analysis with python 2018
 
finalestkddfinalpresentation-111207021040-phpapp01.pptx
finalestkddfinalpresentation-111207021040-phpapp01.pptxfinalestkddfinalpresentation-111207021040-phpapp01.pptx
finalestkddfinalpresentation-111207021040-phpapp01.pptx
 
Lecture 1 Pandas Basics.pptx machine learning
Lecture 1 Pandas Basics.pptx machine learningLecture 1 Pandas Basics.pptx machine learning
Lecture 1 Pandas Basics.pptx machine learning
 
Cssu dw dm
Cssu dw dmCssu dw dm
Cssu dw dm
 
Pandas Data Cleaning and Preprocessing PPT.pptx
Pandas Data Cleaning and Preprocessing PPT.pptxPandas Data Cleaning and Preprocessing PPT.pptx
Pandas Data Cleaning and Preprocessing PPT.pptx
 
Data Preparation.pptx
Data Preparation.pptxData Preparation.pptx
Data Preparation.pptx
 

Recently uploaded

Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxVishalSingh1417
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxAreebaZafar22
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdfQucHHunhnh
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...pradhanghanshyam7136
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSCeline George
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 
Dyslexia AI Workshop for Slideshare.pptx
Dyslexia AI Workshop for Slideshare.pptxDyslexia AI Workshop for Slideshare.pptx
Dyslexia AI Workshop for Slideshare.pptxcallscotland1987
 
Single or Multiple melodic lines structure
Single or Multiple melodic lines structureSingle or Multiple melodic lines structure
Single or Multiple melodic lines structuredhanjurrannsibayan2
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.MaryamAhmad92
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...christianmathematics
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...Poonam Aher Patil
 
Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsKarakKing
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxDenish Jangid
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17Celine George
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxRamakrishna Reddy Bijjam
 
Understanding Accommodations and Modifications
Understanding  Accommodations and ModificationsUnderstanding  Accommodations and Modifications
Understanding Accommodations and ModificationsMJDuyan
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.christianmathematics
 

Recently uploaded (20)

Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
 
Spatium Project Simulation student brief
Spatium Project Simulation student briefSpatium Project Simulation student brief
Spatium Project Simulation student brief
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POS
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Dyslexia AI Workshop for Slideshare.pptx
Dyslexia AI Workshop for Slideshare.pptxDyslexia AI Workshop for Slideshare.pptx
Dyslexia AI Workshop for Slideshare.pptx
 
Single or Multiple melodic lines structure
Single or Multiple melodic lines structureSingle or Multiple melodic lines structure
Single or Multiple melodic lines structure
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functions
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 
Understanding Accommodations and Modifications
Understanding  Accommodations and ModificationsUnderstanding  Accommodations and Modifications
Understanding Accommodations and Modifications
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 

Transformation techniques.pptx

  • 2. Let's dive more into how we can perform other types of data transformations including cleaning, filtering, deduplication. 1. Performing data deduplication: Removing them is essential to enhance the quality of the dataset. This can be done with the following steps: 1. Let's consider a simple dataframe, as follows:
  • 3.
  • 4.
  • 5. 2. Replacing values- It is essential to find and replace some values inside a dataframe. This can be done with the following steps:
  • 6.
  • 7. 3.Handling missing data • Whenever there are missing values, a NaN value is used, which indicates that there is no value specified for that particular index. • There could be several reasons why a value could be NaN: • When data is retrieved from an external source and there are some incomplete values in the dataset. • It can also happen when we join two different datasets and some values are not matched. • Missing values due to data collection errors. • When the shape of data changes, there are new additional rows or columns that are not determined.
  • 8. NaN values in pandas objects
  • 9. Dropping missing values • One of the ways to handle missing values is to simply remove them from our dataset. We have seen that we can use the isnull() and notnull() functions from the pandas library to determine null values:
  • 12. Outlier detection and filtering • Outliers are data points that diverge from other observations for several reasons. • During the EDA phase, one of our common tasks is to detect and filter these outliers. • The main reason for this detection and filtering of outliers is that the presence of such outliers can cause serious issues in statistical analysis.
  • 13. We are going to perform simple outlier detection and filtering:
  • 14.
  • 15.
  • 16. Grouping Datasets • During data analysis, it is often essential to cluster or group data together based on certain criteria. • For example, an e-commerce store might want to group all the sales that were done during the New year period • These grouping concepts occur in several parts of data analysis
  • 17. groupby() method • During the data analysis phase, categorizing a dataset into multiple categories or groups is often essential. • The pandas groupby function is one of the most efficient and time- saving features for doing this. • Groupby provides functionalities that allow us to split-apply-combine throughout the dataframe; that is, it can be used for splitting, applying, and combining dataframes.
  • 18. https://www.w3schools.com/python/pandas/trypython.asp?filename=demo _ref_df_groupby • The pandas groupby method performs two essential functions:  It splits the data into groups based on some criteria.  It applies a function to each group independently.
  • 19. Find the average co2 consumption for each car brand: import pandas as pd data = { 'co2': [95, 90, 99, 104, 105, 94, 99, 104], 'model': ['Citigo', 'Fabia', 'Fiesta', 'Rapid', 'Focus', 'Mondeo’, 'Octavia', 'B-Max'], 'car': ['Skoda', 'Skoda', 'Ford', 'Skoda', 'Ford’, 'Ford', 'Skoda’, 'Ford’] } df = pd.DataFrame(data) print(df)
  • 20. Find the average co2 consumption for each car brand: groupby() can be used to group the dataset on the basis of car column get_group() can be used to get value items in the group
  • 21.
  • 22.
  • 23. Data aggregation • Aggregation is the process of implementing any mathematical operation on a dataset or a subset of it. • Aggregation is one of the many techniques in pandas that's used to manipulate the data in the dataframe for data analysis. • The Dataframe.aggregate() function is used to apply aggregation across one or more columns. • We can apply aggregation in a DataFrame, df, as df.aggregate() or df.agg(). • Aggregation only works with numeric type columns, Some of the most frequently used aggregations are as follows: