The life changing magic of tidying up your data: The art and science of making data usable


Webinar presentation by John Spencer in August 2017

Published in: Data & Analytics

  1. John Spencer, MEASURE Evaluation, University of North Carolina at Chapel Hill. Webinar, August 24, 2017. The life changing magic of tidying up your data: The art and science of making data usable
  2. Keep only those things that bring a “spark of joy.” (The life changing magic of tidying up, Marie Kondo)
  3. https://www.measureevaluation.org/news/tidy-data-and-how-to-get-it
  4. Outline: What is tidy data? Why is it important in GIS? What tools exist to help?
  5. Image from: R for Data Science, Grolemund and Wickham, http://r4ds.had.co.nz
  6. File format vs file structure. File format is the specific way information is encoded for storage in a computer file. (Wikipedia)
  7. File format vs file structure. File structure is how the data is stored in the file.
  8. You’ve found a great new data repository and you can’t wait to get data from it and start doing stuff like this
  9. Messy Data. Once you get the data, it will need to be cleaned up before using it.
  10. Making Messy Data Tidy. Messy data needs to be tidied up before it can be used.
  11. Tidy Data: an organized structure for data.
      1. Each variable forms a column.
      2. Each observation forms a row.
      3. Each type of observational unit forms a table.
      Wickham, H. (2014). Tidy data. Journal of Statistical Software, 59(10).
  12. Untidy Data
      1. Column names represent data values instead of variable names.
      2. A single column contains data on multiple variables instead of a single variable.
      3. Variables are contained in both rows and columns instead of just columns.
      4. A single table contains more than one observational unit.
      5. Data about an observational unit is spread across multiple data sets.
      Wickham, H. (2014). Tidy data. Journal of Statistical Software, 59(10).
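As an illustration of the first problem on slide 12 (column names that are really data values), here is a minimal sketch in plain Python. The table, the year columns, and the staff counts are made up for the example, and `wide_to_long` is a hypothetical helper, not something from the presentation or from any library.

```python
# Hypothetical wide table: the column names "2015" and "2016" are data values
# (years), not variable names. All numbers are invented for illustration.
wide_rows = [
    {"facility": "Eastern Health Clinic", "2015": 11, "2016": 13},
    {"facility": "Southern Health Clinic", "2015": 5, "2016": 4},
]


def wide_to_long(rows, id_column, variable_name, value_name):
    """Turn each non-id column of each row into its own (id, variable, value) row."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue  # the id column is repeated on every output row instead
            long_rows.append(
                {id_column: row[id_column], variable_name: column, value_name: value}
            )
    return long_rows


# After tidying, "year" and "staff" are variables, and each row is one observation.
tidy_rows = wide_to_long(wide_rows, "facility", "year", "staff")
for tidy_row in tidy_rows:
    print(tidy_row)
```

Dedicated tools such as the ones listed later in the slides do this kind of reshaping for you; the loop is only meant to show what the reshaping does.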
  13. “Happy families are all alike; every unhappy family is unhappy in its own way.” (Leo Tolstoy)
      “Tidy datasets are all alike, but every messy dataset is messy in its own way.” (Hadley Wickham)
  14. Let’s get messy
  15. Messy version of the table:
      Class      Number of feet
      Mammal
      Horse      4
      Dog        4
      Cat        4
      Reptile
      Snake      0
      Turtle     4
      Bird
      Eagle      2
      Ostrich    2
      Problems: multiple data classes and species mixed in the same column; blank rows. Easy for a human to read, hard for a computer.
  16. Tidy data:
      Animal     Number of feet   Class
      Horse      4                Mammal
      Eagle      2                Bird
      Turtle     4                Reptile
      Dog        4                Mammal
      Snake      0                Reptile
      Ostrich    2                Bird
      Cat        4                Mammal
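The move from the messy table on slide 15 to the tidy table on slide 16 can be sketched in a few lines of plain Python. The sketch assumes that a row with no "number of feet" value is a class heading that applies to the rows below it. `tidy_animals` and the output column names are illustrative choices, not part of the presentation.

```python
# Rows as they appear on slide 15: class headings and animals share one column.
# Assumption: a row whose feet value is None is a class heading.
messy_rows = [
    ("Mammal", None), ("Horse", 4), ("Dog", 4), ("Cat", 4),
    ("Reptile", None), ("Snake", 0), ("Turtle", 4),
    ("Bird", None), ("Eagle", 2), ("Ostrich", 2),
]


def tidy_animals(rows):
    """Carry each class heading down onto the animal rows that follow it."""
    tidy = []
    current_class = None
    for label, feet in rows:
        if feet is None:  # heading row: remember the class, emit nothing
            current_class = label
            continue
        tidy.append({"animal": label, "number_of_feet": feet, "class": current_class})
    return tidy


tidy_rows = tidy_animals(messy_rows)
for tidy_row in tidy_rows:
    print(tidy_row)
```

The test is `feet is None` rather than `not feet`, so the snake, which has 0 feet, is not mistaken for a heading.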
  17. Make it easier for computer programs to read data
  18. Often found on websites or in reports
  19. “Tidy untidy United Nations’ migration data with tidyr,” Kan Nishida. https://blog.exploratory.io/tidy-untidy-united-nations-migration-data-with-tidyr-167cbd24c5c2
  20. Messy data and GIS
  21. GIS wants to see well structured data
      Facility ID   Name                     Latitude    Longitude   Number of staff
      3K4R200       Eastern Health Clinic    -47.48516   61.69449    13
      27LS611       Southern Health Clinic   -6.05422    19.66357    4
      1N291B2       Western Health Clinic    -48.36875   109.76463   9
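Because the table on slide 21 has one facility per row and one variable per column, a program can read it without special handling. The sketch below assumes the table has been saved as comma-separated text with the snake_case headers shown. Those headers and the `load_facilities` helper are illustrative, not from the presentation.

```python
import csv
import io

# The slide 21 table as CSV text. The header names are an assumption.
FACILITY_CSV = """facility_id,name,latitude,longitude,number_of_staff
3K4R200,Eastern Health Clinic,-47.48516,61.69449,13
27LS611,Southern Health Clinic,-6.05422,19.66357,4
1N291B2,Western Health Clinic,-48.36875,109.76463,9
"""


def load_facilities(text):
    """Parse the CSV, convert the numeric columns, and check coordinate ranges."""
    facilities = []
    for row in csv.DictReader(io.StringIO(text)):
        latitude = float(row["latitude"])
        longitude = float(row["longitude"])
        if not (-90 <= latitude <= 90 and -180 <= longitude <= 180):
            raise ValueError(f"Coordinates out of range for {row['facility_id']}")
        facilities.append({
            "facility_id": row["facility_id"],
            "name": row["name"],
            "latitude": latitude,
            "longitude": longitude,
            "number_of_staff": int(row["number_of_staff"]),
        })
    return facilities


facilities = load_facilities(FACILITY_CSV)
for facility in facilities:
    print(facility)
```

The range check is a cheap way to catch swapped or malformed coordinates before the points reach a GIS.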
  22. Applicable beyond GIS data
  23. Following basic tidy data protocols will make analysis with many other software programs easier to do.
  24. R: Hadley Wickham has an R package, tidyr, that can be very helpful in tidying data. https://cran.r-project.org/web/packages/tidyr/vignettes/tidy-data.html
  25. Python: Jean-Nicholas Hould has an overview of tools in the Python programming language, “Tidy data in Python.” http://www.jeannicholashould.com/tidy-data-in-python.html
  26. Stata: Stata provides tools; an overview of some of them is available via the Carolina Population Center website. http://www.cpc.unc.edu/research/tools/data_analysis/statatutorial
  27. Excel: Excel is not necessarily the best tool to change untidy data into tidy data, but there are some things it can do. Microsoft has a page describing how to clean data and offers some plugins that could be helpful: https://goo.gl/WGiUvp
  28. Excel: A good overview of some useful Excel functions can be found here: http://myexcelonline.com/blog/top-excel-data-cleansing-techniques/
  29. Other Data Formats
      XML
      • Extensible Markup Language
      • Designed to store and transport data
      • Well defined schema
      JSON
      • JavaScript Object Notation
      • Increasingly common
      • GeoJSON variation for geographic data
      By definition the data is “tidy”
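To make the GeoJSON bullet on slide 29 concrete, here is a minimal sketch that turns tidy facility rows (the values shown on slide 21) into a GeoJSON FeatureCollection using only the standard library. The `to_geojson` helper and the property names are illustrative assumptions. Note that GeoJSON lists coordinates as longitude first, then latitude.

```python
import json

# Tidy rows with the values from slide 21. The key names are an assumption.
facilities = [
    {"facility_id": "3K4R200", "name": "Eastern Health Clinic",
     "latitude": -47.48516, "longitude": 61.69449, "number_of_staff": 13},
    {"facility_id": "27LS611", "name": "Southern Health Clinic",
     "latitude": -6.05422, "longitude": 19.66357, "number_of_staff": 4},
    {"facility_id": "1N291B2", "name": "Western Health Clinic",
     "latitude": -48.36875, "longitude": 109.76463, "number_of_staff": 9},
]


def to_geojson(rows):
    """Build a FeatureCollection with one Point feature per tidy row."""
    features = []
    for row in rows:
        # Everything except the coordinates becomes a feature property.
        properties = {
            key: value for key, value in row.items()
            if key not in ("latitude", "longitude")
        }
        features.append({
            "type": "Feature",
            "geometry": {
                "type": "Point",
                # GeoJSON order is [longitude, latitude].
                "coordinates": [row["longitude"], row["latitude"]],
            },
            "properties": properties,
        })
    return {"type": "FeatureCollection", "features": features}


collection = to_geojson(facilities)
print(json.dumps(collection, indent=2))
```

Each row maps to exactly one feature, which is the sense in which one-observation-per-row data converts cleanly to a format a GIS can load.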
  30. Advice
  31. Advice for data producers
      • Include tidy data download options
      • Think about potential users of your data and what they need to use data effectively
  32. Advice for data users
      • Look for tools that make the job easier
      • Look for alternative download sources that provide the data in tidy format
      • Share tools that you create
  33. https://www.measureevaluation.org/news/tidy-data-and-how-to-get-it
  34. This presentation was produced with the support of the United States Agency for International Development (USAID) under the terms of MEASURE Evaluation cooperative agreement AID-OAA-L-14-00004. MEASURE Evaluation is implemented by the Carolina Population Center, University of North Carolina at Chapel Hill in partnership with ICF International; John Snow, Inc.; Management Sciences for Health; Palladium; and Tulane University. Views expressed are not necessarily those of USAID or the United States government. www.measureevaluation.org
