An Introduction to
data cleaning with
spreadsheets
Anders Pedersen, @anpe
School of Data
Spreadsheets: The beginning of
each and every data story
• Which were the top growth sectors in this
quarter?
• How did crime in the capital region in
2013 compare with 2012?
• Is there a housing bubble waiting around the
corner?
It is time for journalists themselves to
tame this beast called spreadsheets!
Spreadsheets: Excel or
Google Docs
Some basic terminology
• data is organized in rows and columns
(rows go across the page, columns go top
down)
• each field holding data is called a cell
• rows are numbered
• columns are referred to by letters
• each cell has a column and a row, which
together give it a specific code (e.g. A1 is
the top left cell)
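The cell codes described above can be taken apart in a few lines of Python. This is only an illustrative sketch; the function names `parse_cell` and `column_index` are made up for this example and are not part of any spreadsheet program.

```python
import re

def parse_cell(ref):
    """Split a cell code such as 'A1' into (column_letters, row_number)."""
    match = re.fullmatch(r"([A-Za-z]+)([1-9][0-9]*)", ref)
    if match is None:
        raise ValueError(f"not a cell reference: {ref!r}")
    letters, digits = match.groups()
    return letters.upper(), int(digits)

def column_index(letters):
    """Convert column letters to a 1-based position: A -> 1, Z -> 26, AA -> 27."""
    index = 0
    for ch in letters.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index

column, row = parse_cell("H2")
print(column, row, column_index(column))  # H 2 8
```

So H2 is the cell in the eighth column, second row.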
Some key features to
explore today
• Sorting and filtering
• Basic formulas
• Pivot tables
Tricky bits:
- don’t include summary rows (such as totals) in
the data behind a pivot table
- a pivot table may not pick up later changes to
your data, so refresh or rebuild it after editing
Data sources for exercise
• Education: Secondary school enrollment for
2012 from Data.gov.ph
http://data.gov.ph/catalogue/dataset/sy-2012-enrollment-data-secondary
Sorting - finding the best
and the worst
• The 10 best paid sectors
• The 10 oldest cities
• The 10 poorest countries
• …
• If Excel is a toolbox for journalists, sorting
is the hammer!
How to sort
• 1) Select all your data
• 2) In the Data menu, go to
Sort range
Sorting...
• 3) Tick the “Data has
header row” checkbox
• 4) Select the
column you want to
sort by
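For readers who also work in code, the same sorting steps can be sketched in Python. The school names and numbers below are invented sample data, not taken from the exercise dataset.

```python
# Hypothetical sample data: the first row is the header row.
rows = [
    ["school", "enrollment"],
    ["School A", 1200],
    ["School B", 450],
    ["School C", 3100],
    ["School D", 980],
]

header, data = rows[0], rows[1:]          # keep the header row out of the sort
sort_column = header.index("enrollment")  # the column to sort by

# Largest first; use reverse=False for smallest first.
ranked = sorted(data, key=lambda r: r[sort_column], reverse=True)

top = ranked[:10]  # a "top 10" list (this sample has only four rows)
for name, value in top:
    print(name, value)
```

Separating the header first mirrors the header-row checkbox: without it, the column titles would be sorted along with the data.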
Filtering - getting a better
sense of your data
• 1) Turn on Filtering
via the Data tab
(Data → Filter)
Filtering...
• 2) Filter options now appear at the top of each column
Filtering...
• 3) Now click on the
blue triangular arrow
Filtering...
• 4) Select the values
you wish to filter by
Filtering...
• 5) A green arrow
will now appear on top
of the column
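What the filter does can be sketched in Python as follows. The regions, schools and numbers are invented sample data, and the variable names are only illustrative.

```python
# Hypothetical sample rows, one dictionary per spreadsheet row.
rows = [
    {"region": "Region I", "school": "School A", "enrollment": 1200},
    {"region": "Region II", "school": "School B", "enrollment": 450},
    {"region": "Region I", "school": "School C", "enrollment": 3100},
    {"region": "Region III", "school": "School D", "enrollment": 980},
]

# The distinct values of a column are what the filter menu lets you tick.
options = sorted({row["region"] for row in rows})

# Keep only rows whose value is ticked; the other rows are hidden, not deleted.
selected = {"Region I"}
visible = [row for row in rows if row["region"] in selected]

print(options)
for row in visible:
    print(row["school"], row["enrollment"])
```

Note that `rows` itself is unchanged: filtering only changes what you look at, just as in the spreadsheet.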
Moving forward!
• Sorting and filtering - check!
• Basic formulas
• Pivot tables
Basic formulas
• Let us now try to sum up some of the
values in the dataset…
• What it is good for: doing analysis
and checking whether calculations
by your colleagues are right
Basic formulas
• Go to column H: In the second row
(cell H2), type “=sum(f2:g2)”
Basic formulas
• We now have a sum
• Now try calculating the average for the
same cells instead: “=average(f2:g2)”
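As a cross-check outside the spreadsheet, the sum and the average of two cells can be reproduced in Python. The values standing in for cells F2 and G2 are invented for this sketch.

```python
from statistics import mean

# Hypothetical values standing in for cells F2 and G2.
f2, g2 = 120, 80

total = sum([f2, g2])     # what the sum formula in H2 returns
average = mean([f2, g2])  # what the average formula returns

print(total, average)
```

Recomputing a figure this way is one quick method of checking a colleague's calculation.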
Basic formulas
• You can also copy your calculations across
cells
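When a formula is copied down a row, its cell references shift with it. The sketch below imitates that behaviour in Python; it is a simplified illustration, not how any spreadsheet program is implemented, and the helper name `copy_down` is hypothetical. References pinned with `$` before the row number are left alone, and function names that end in digits are not handled.

```python
import re

def copy_down(formula, rows_down=1):
    """Shift the row number of every relative cell reference in a formula."""
    def shift(match):
        column, row = match.group(1), int(match.group(2))
        return f"{column}{row + rows_down}"
    # Matches references such as F2 or g2 (one to three letters, then a row number).
    return re.sub(r"\b([A-Za-z]{1,3})([1-9][0-9]*)\b", shift, formula)

print(copy_down("=average(f2:g2)"))     # =average(f3:g3)
print(copy_down("=average(f2:g2)", 2))  # =average(f4:g4)
```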
Now only Pivot tables to go
• Sorting and filtering - check!
• Basic formulas - check!
• Pivot tables
Pivot tables
• finding stories inside datasets
• particularly well suited to organised
datasets with clear categories and
sub-categories
Pivot tables
• Select the full area of the dataset
• Go to Data → Pivot table report
Pivot tables
• Pivot tables allow you to work with rows,
columns, values and filters
• We start by dropping
a column header into Rows
• Then we drop one of our
value columns into Values
Pivot tables
• We now have a nice summary of the
budget for each department
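Behind the scenes, this kind of pivot table groups the rows by one column and adds up another. A minimal Python sketch of that idea follows; the departments and amounts are invented, and `pivot_sum` is a hypothetical helper name.

```python
from collections import defaultdict

# Hypothetical budget lines, one dictionary per spreadsheet row.
budget_lines = [
    {"department": "Education", "item": "Salaries", "budget": 500},
    {"department": "Education", "item": "Buildings", "budget": 200},
    {"department": "Health", "item": "Salaries", "budget": 300},
    {"department": "Health", "item": "Medicine", "budget": 150},
]

def pivot_sum(records, row_field, value_field):
    """Group records by row_field and add up value_field within each group."""
    totals = defaultdict(int)
    for record in records:
        totals[record[row_field]] += record[value_field]
    return dict(totals)

summary = pivot_sum(budget_lines, "department", "budget")
print(summary)  # {'Education': 700, 'Health': 450}
```

If the input already contained a "Total" row, it would be added on top of the real lines, which is why summary rows should be left out of the source data.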
Filtering pivot tables
• We can now go ahead and filter the Pivot
table
• Add the column you wish
to filter by
Filtering pivot tables
• Then select one or more categories within
the column you
wish to keep
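Filtering a pivot table amounts to dropping the rows you did not select before the totals are calculated. A hedged Python sketch, using invented data and a hypothetical helper `filtered_pivot`:

```python
from collections import defaultdict

# Hypothetical budget lines with a "type" column to filter by.
budget_lines = [
    {"department": "Education", "type": "Salaries", "budget": 500},
    {"department": "Education", "type": "Capital", "budget": 200},
    {"department": "Health", "type": "Salaries", "budget": 300},
    {"department": "Health", "type": "Capital", "budget": 150},
]

def filtered_pivot(records, row_field, value_field, filter_field, keep):
    """Sum value_field per row_field, using only records whose filter_field is in keep."""
    totals = defaultdict(int)
    for record in records:
        if record[filter_field] in keep:
            totals[record[row_field]] += record[value_field]
    return dict(totals)

salaries = filtered_pivot(budget_lines, "department", "budget", "type", {"Salaries"})
print(salaries)  # {'Education': 500, 'Health': 300}
```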
Pivot tables
• We can finally add several value columns
to the pivot table
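With several value columns, the pivot table keeps one running total per value column for each group. The sketch below shows the idea in Python; the column names `budget_2013` and `budget_2014` and all amounts are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical budget lines with two value columns.
budget_lines = [
    {"department": "Education", "budget_2013": 450, "budget_2014": 500},
    {"department": "Education", "budget_2013": 180, "budget_2014": 200},
    {"department": "Health", "budget_2013": 310, "budget_2014": 300},
]

def pivot_sums(records, row_field, value_fields):
    """Sum each of value_fields separately within every group of row_field."""
    totals = defaultdict(lambda: dict.fromkeys(value_fields, 0))
    for record in records:
        for field in value_fields:
            totals[record[row_field]][field] += record[field]
    return dict(totals)

summary = pivot_sums(budget_lines, "department", ["budget_2013", "budget_2014"])
for department, values in summary.items():
    print(department, values["budget_2013"], values["budget_2014"])
```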
Exercises
• Find the sectors of the national budget that
grew the most in percentage terms
• Identify the budget lines that had the
biggest absolute increase in the budget
• Generate a pivot table based on the
national budget comparing 2014 and 2013
in specific sectors
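The first two exercises rest on the difference between absolute and percentage change. As a rough sketch with invented sector totals (not the real budget figures), the two measures can be computed like this in Python:

```python
# Hypothetical sector totals: (2013 budget, 2014 budget).
sectors = {
    "Education": (630, 700),
    "Health": (310, 300),
    "Transport": (100, 150),
}

def changes(before, after):
    """Return (absolute change, percentage change); before must not be zero."""
    absolute = after - before
    percent = absolute / before * 100
    return absolute, percent

results = {name: changes(before, after) for name, (before, after) in sectors.items()}

biggest_absolute = max(results, key=lambda name: results[name][0])
biggest_percent = max(results, key=lambda name: results[name][1])

print(biggest_absolute)  # Education
print(biggest_percent)   # Transport
```

In this made-up sample the two rankings disagree, which is why the exercises ask for both.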
