Customer Data Cleansing Project
Natalie Offenbacker
Introduction
It is often said that data analysts and data scientists spend 80% of their
time cleaning the datasets given to them and only the remaining 20% on
actual analysis. This project involved cleaning and wrangling a customer
CSV file with half a million records. This presentation goes into detail
on how it was done in Excel.
The Data
An Overview
The dataset shown in the previous slide has 500,002 records and
10 columns, starting with customer ID and ending with zip code.
Even at a glance, some errors are obvious: the column title for
email is mistyped, the gender column contains an invalid
category, “F!”, and the DOB and Date of Joining columns have
incorrect date types.
Excel Overview Continued
Even though there were glaring errors at the
beginning of the file, it is always good to
assume that there are more errors within the
dataset. Further investigation of the customer
CSV shows many occurrences of future dates in
the Date of Joining column, DOBs in the 1700s,
and gender values that are not comprehensible.
How Do We Fix It?
Very rarely can a data analyst fix something quickly and
manually. Because it is unknown how many times these errors
occur within a dataset of half a million records, one of the
quicker ways to fix and clean the data is to use formulas.
Text to Columns
The first step in fixing date errors in a large dataset is to split each
date into multiple columns. If this is not done, a formula written to fix
the date error has no way to recognize which number to fix. Here, that
means finding the Text to Columns button and splitting the column by
delimiter. As the image shows, the user navigates to the Data tab and
then finds the button.
Step One: Text to Columns Continued
1. The Data tab in Excel has a button called Text to
Columns. When pressed, it opens a wizard that converts
text to multiple columns by delimiter.
2. In this case the columns are split by the “Other”
delimiter shown in step two of the wizard, and the
delimiter “/” is typed in the “Other” box.
3. In the final step of the wizard the user selects the
destination for the new data columns. In this case the
first free column, “$O$1”, was picked.
4. The rightmost snapshot is the result of the Text to
Columns wizard. Date of birth is separated into
three columns.
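For readers who would rather script this step, the split that the Text to Columns wizard performs can be sketched in Python. This is only an illustration and not part of the original Excel workflow: the function name `split_date` and the sample value are hypothetical, and it assumes the dates are stored as text with “/” as the delimiter and the year as the last part.

```python
def split_date(text, delimiter="/"):
    """Split a date string into its parts, like Text to Columns with an 'Other' delimiter."""
    # Strip stray spaces around each piece so " 9 / 4 / 1792 " still splits cleanly.
    return [part.strip() for part in text.split(delimiter)]


# Hypothetical sample value; the real file holds one date per row.
first, second, year = split_date("9/4/1792")
print(first, second, year)
```

Each returned part corresponds to one of the three new columns the wizard creates.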
IF Statement
After the columns are separated, the dates are ready for cleaning. To correct Date of
Birth years from the 1700s back to the 1900s, an IF formula is used.
=IF(LEFT(Q2,2)="17","19"&RIGHT(Q2,2),Q2)
This formula means that if the two leftmost characters in Q2 are “17”, they are changed to
“19” and the last two digits are kept. Otherwise the value remains unchanged.
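The same rule can be written as a small Python function, shown here as a hedged sketch rather than as the method used in the project. The name `fix_century` and its parameters are hypothetical; the code assumes the year has already been split into its own value, as in column Q.

```python
def fix_century(year, wrong="17", right="19"):
    """Mirror =IF(LEFT(Q2,2)="17","19"&RIGHT(Q2,2),Q2) for a single year value."""
    year = str(year)                 # the Excel text functions also work on text
    if year[:2] == wrong:            # LEFT(Q2,2)="17"
        return right + year[-2:]     # "19"&RIGHT(Q2,2)
    return year                      # otherwise leave the value unchanged


print(fix_century("1792"))
print(fix_century("1985"))
```

As with the Excel formula, the result is text, and only years that start with the wrong prefix are touched.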
IF Statement Continued
Date of Birth Temporary Column
Result of IF Formula
Explanation
1. The two images show the result of
the IF formula. There are no longer
any DOBs with dates in the 1700s.
2. After the formula has been applied,
copy DOB Year Temp and then choose
Paste Values, which is the second
paste option.
3. This removes the formula from the
column and leaves only the values.
One Last Step for DOB
• Date of Birth column errors have been corrected, but the
dates must be joined together again instead of sitting in
three separate columns. This is done with a concatenation
formula: the & operator, like CONCAT, joins the columns together.
• =N2&"/"&O2&"/"&P2
• As the previous slide mentioned, after the formula is completed
we want to remove the formula from the column, so the next
step is to copy DOB Temp and then choose Paste Values on the
next available column.
• The last thing to do is to delete the separate DOB
columns.
The image shows the result of
the concatenation formula. The
three separate date columns
were returned to one column.
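A scripted equivalent of the concatenation step might look like the following. It is a minimal sketch under the assumption that the three parts are available as separate values; `join_date` is a hypothetical name, not something from the original workbook.

```python
def join_date(first, second, year, delimiter="/"):
    """Mirror =N2&"/"&O2&"/"&P2: glue three parts back into one date string."""
    return delimiter.join(str(part) for part in (first, second, year))


print(join_date(9, 4, "1992"))
```

Like the `&` formula, this returns text rather than a true date value.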
Doing It Again for Date of Joining
• The same steps (starting with the Text to
Columns wizard) are performed to fix date
errors in the Date of Joining column; however,
the formula is a little different. Instead of
changing years from “17” to “19”, we want to
change them from “21” to “20”, so the formula
looks like this:
• =IF(LEFT(M2,2)="21","20"&RIGHT(M2,2),M2)
• After the dates are corrected, DOJ Temp is
copied and then pasted as values.
• The last step is to concatenate and
join DOJ together again.
• After these two data cleaning processes are
completed, the dataset will look like the second
image.
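Because the Date of Joining fix repeats the whole split, correct, and rejoin sequence, the three steps can be sketched as one Python function. This is an illustration only: `clean_date` and its parameters are hypothetical, and the code assumes “/”-delimited text dates with the year as the last part.

```python
def clean_date(text, wrong="21", right="20", delimiter="/"):
    """Split a date, correct the century of the year part, and join it again."""
    parts = [p.strip() for p in text.split(delimiter)]   # Text to Columns
    year = parts[-1]                                     # assumes the year comes last
    if year[:2] == wrong:                                # LEFT(M2,2)="21"
        parts[-1] = right + year[-2:]                    # "20"&RIGHT(M2,2)
    return delimiter.join(parts)                         # concatenation step


print(clean_date("9/9/2114"))
print(clean_date("9/9/2014"))
```

Passing `wrong="17", right="19"` would apply the same routine to the Date of Birth column.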
Are We Ready to Move On?
The new DOB and Date of Joining
columns look good as new, but are
they finished? A good habit to get
into is to check what was corrected
one more time. Doing so reveals
that the DOJ column still has some
future dates. There are only a
couple of instances of these dates, so
they can be fixed manually from “20”
to “19”, but if they had not been
caught, the analysis would be incorrect.
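This kind of re-check can also be automated. The sketch below flags any date that lies after a reference day; it is not the check used in the project, which was done by eye. The function name, the `%m/%d/%Y` format, and the assumption that row 1 holds the header are all hypothetical.

```python
from datetime import date, datetime


def find_future_dates(date_strings, today=None, fmt="%m/%d/%Y"):
    """Return (row, text) pairs for dates that lie after `today`."""
    today = today or date.today()
    flagged = []
    # Start counting at 2 on the assumption that row 1 is the header row.
    for row, text in enumerate(date_strings, start=2):
        parsed = datetime.strptime(text, fmt).date()
        if parsed > today:
            flagged.append((row, text))
    return flagged


# Hypothetical sample values with a fixed reference day for repeatability.
sample = ["9/9/2014", "3/1/2031", "12/25/1999"]
print(find_future_dates(sample, today=date(2024, 1, 1)))
```

An empty result would suggest the column is ready; any flagged rows can then be corrected by hand as described above.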
Gender Column Correction
The gender column has a couple of
incomprehensible values. Since the
options are assumed to be only male
and female, a formula is used to
correct them to “F” and “M”.
IF ISNUMBER
• =IF(ISNUMBER(SEARCH("F",K2)),"F","M")
• This formula first checks whether the text
in K2 contains “F”. If it does, the formula
outputs “F”; if it does not, the formula
outputs “M”.
• IF is a conditional function; here its
condition is ISNUMBER(SEARCH("F",K2)).
SEARCH returns the position of “F” when it
is found and an error otherwise, so ISNUMBER
is TRUE only when “F” is present.
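The logic of this formula can be sketched in Python as follows. The function name `normalize_gender` and the sample values are hypothetical. Excel's SEARCH is not case-sensitive, so the sketch compares in lower case to behave the same way.

```python
def normalize_gender(value):
    """Mirror =IF(ISNUMBER(SEARCH("F",K2)),"F","M")."""
    # SEARCH ignores case, so "f" and "F" both count as a match.
    return "F" if "f" in str(value).lower() else "M"


for raw in ["F!", "Female", "m", "Male"]:
    print(raw, "->", normalize_gender(raw))
```

One consequence worth noting: any value without an “F”, including a blank cell, comes out as “M”, so the rule is only safe under the stated assumption that the options are male and female.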
Gender Column Correction Last Step
• After the gender column is
corrected, it is copied and then
pasted as values to detach the
formulas from the column.
• After gender is pasted as
values, make sure that
everything looks okay, and
then that’s it!
• The two images show the
result of the IF ISNUMBER
formula.
Business Preferences and Insights
Sometimes businesses have preferences for how they want their data to
look. An example would be changing a 10-digit zip code to only 5 digits.
Also, for better business insights, a data analyst may derive additional
columns that give further information about the data. In this dataset,
examples are the ages of customers based on the DOB column and the
years of membership based on the Date of Joining column. The next
couple of slides show how to perform these functions.
5 Digit Zip Code (TRIM)
• =LEFT(K2,5)
• This formula starts at the left
of the value in K2 and returns the
first 5 characters, trimming off
the rest of the zip code.
• The result of the formula is
shown in the two images.
• A 5 Digit Zip Code column is
created.
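In Python the same trimming is a one-line slice, shown here as a minimal sketch. The name `zip5` and the sample value are hypothetical, and the code assumes the zip code is stored as text so that any leading zeros are preserved.

```python
def zip5(value):
    """Mirror =LEFT(K2,5): keep the first five characters of the zip code."""
    return str(value)[:5]


# Hypothetical sample in a five-plus-four layout.
print(zip5("12345-6789"))
```

A value that is already five characters long passes through unchanged.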
Age Based on DOB Column
• =DATEDIF(I2,NOW(),"Y")
• This formula takes the date in cell I2
and converts it to the number of complete
years as of today. I2’s date is 9/4/1992,
so at the time of writing the DATEDIF
formula returns 31 years.
• The second image shows the
conversion of dates to years as of
today.
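The “complete years” calculation that DATEDIF performs with the "Y" unit can be sketched in Python like this. The function name `full_years` is hypothetical, the sample reads 9/4/1992 as month/day/year (an assumption), and the reference day is fixed so the result does not drift over time the way NOW() does.

```python
from datetime import date


def full_years(start, today=None):
    """Complete years between start and today, like DATEDIF(start, NOW(), "Y")."""
    today = today or date.today()
    years = today.year - start.year
    # Subtract one if this year's anniversary has not been reached yet.
    if (today.month, today.day) < (start.month, start.day):
        years -= 1
    return years


# Fixed reference day chosen for illustration only.
print(full_years(date(1992, 9, 4), today=date(2024, 1, 15)))
```

The same function applies to any start date, so it would also cover a years-of-membership column.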
Membership Years Based on DOJ Column
• The membership years column is
created the same way as the age
column; the only difference is the
cell reference.
• =DATEDIF(M2,NOW(),"Y")
• The date in M2 is 9/9/14, so at the
time of writing the DATEDIF formula
returns 9 years.
• The second image shows the result
of the DATEDIF formula for the
membership years column.
Final Steps
Although the dataset has all the components
needed and all the data is cleaned for
analysis, the dataset itself is
unorganized. DOB and age should
sit closer to the customer information,
and Date of Joining and Membership
Years should be placed within the
dataset rather than spaced out.
Final Product
Thank you
Natalie Offenbacker
natkcn07@gmail.com
