Data Curation - Data probity in a time of COVID

Presentation by Gavin Chait to the SIKM Leaders Community on June 15, 2021

  1. Whythawk Data Curation: Data probity in a time of COVID. SIKM, June 2021
  2. The screaming need for data: Who is affected? How are they affected? What can we do about it? What might happen in response? How do we recover afterwards? Will things ever be the same? (Image: www.vperemen.com, CC BY-SA 4.0, via Wikimedia Commons)
  3. The intersection of policy & politics: Data, analysis & the evidence illusion; post-hoc support & plausible deniability; competing self-interest; changing circumstance, changing evidence. (Image: Badics, CC BY-SA 3.0, via Wikimedia Commons)
  4. Harvesting longitudinal data is not joyful: Instant answers don’t happen instantly; longitudinal source data are incoherent; data probity takes method, practice & time. (Image: Esayas Ayele, CC BY-SA 4.0, via Wikimedia Commons)
  5. What we talk about when we talk about probity: Identifiable source; transparent methods; publication before analysis; point data before aggregation; a repeatable, auditable trail. (Image: CDC Global, CC BY-SA 2.0, via Flickr)
  6. Transparency in practice: Pre-publication of research protocol, methods & data; systematic review; open licences; no trust without support for peer review & validation. (Image: Yakuzakorat, CC BY 4.0, via Wikimedia Commons)
  7. Protocols & ambiguity: Maintain your source; pick sensible defaults; make no destructive changes; document every action; expect to be audited. (Photo by Clay Banks on Unsplash)
  8. Uncertainty & the distant future: Data harvested today must answer unknown questions about unknown problems in an unknown, but different, future environment. (Photo by Lubo Minar on Unsplash)
  9. Poverty is expensive: A legacy of futility risks becoming self-perpetuating. (Image: Olga Ernst, CC BY-SA 4.0, via Wikimedia Commons)
  10. A history in 35 million rows
  11. Where are businesses compared to where we think they are? Does a change in tax rates cause business closure? How should we measure energy consumption? Who wins & loses from COVID commute changes?
  12. Who wants to be a millionaire?
  13. Protocol with sensible defaults (Photo by Sylvie Tittel on Unsplash):
      1. All units are occupied & pay full rates.
      2. When data are ambiguous, refer to 1.
      3. Ask for data, even when you know they’ll say no.
      4. Never delete anything.
      5. Document everything.
      6. When in doubt, ask the data source.
      7. Accept the weird but keep looking for answers.
      8. Ensure the process is public.
  14. Step 1: Track every step
  15. Step 2: Disclose every request
  16. Step 3: Non-destructive, auditable transformation
  17. Step 4: Always be ready to explain
  18. Step 5: Make the data useful
  19. Because …
  20. Sqwyre data probity protocol (Photo by Sylvie Tittel on Unsplash):
      1. Instant answers don’t happen instantly.
      2. Data probity takes method, practice & patience.
      3. Maintain all source data.
      4. Pick sensible & transparent defaults.
      5. Transformations must be documented.
      6. Make no destructive changes.
      7. Point data before aggregation or analysis.
      8. Open licences to encourage use & reuse.
      9. Collaborate to make the data wanted & useful.
      10. Be ready to explain & be audited.
  21. Know your business. (Image: Hansueli Krapf, CC BY-SA 3.0, via Wikimedia Commons)
  22. Whythawk: Gavin Chait, gchait@whythawk.com, https://whythawk.com/
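
The protocol's rules on sensible defaults, non-destructive transformation and documenting every action can be illustrated with a small sketch. This is not Sqwyre's implementation: it assumes a hypothetical row format with `unit_id`, `occupied` and `full_rates` fields, and the names `AuditedDataset` and `default_occupied_full_rates` are invented for illustration. The source rows are never modified, each transformation is recorded in an append-only log, and the working view is rebuilt from the source on demand, so any step can be audited, replayed or dropped.

```python
import copy
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable

Row = dict
Step = Callable[[Row], Row]


@dataclass
class AuditedDataset:
    """Hypothetical helper: keeps source rows untouched and records every step."""
    source: tuple  # raw rows exactly as received; never modified
    steps: list = field(default_factory=list)  # (description, function) pairs
    log: list = field(default_factory=list)    # append-only audit trail

    def add_step(self, description: str, fn: Step) -> None:
        # Document every action: what was done, in which order, and when.
        self.steps.append((description, fn))
        self.log.append({
            "step": len(self.steps),
            "description": description,
            "recorded_at": datetime.now(timezone.utc).isoformat(),
        })

    def current(self) -> list:
        # Rebuild the working view from deep copies of the source, so no
        # step can alter the original data and every step can be replayed.
        rows = [copy.deepcopy(r) for r in self.source]
        for _, fn in self.steps:
            rows = [fn(r) for r in rows]
        return rows


def default_occupied_full_rates(row: Row) -> Row:
    # Sensible default (field names are assumptions): when occupancy or rates
    # status is unknown, assume the unit is occupied and pays full rates,
    # and flag which values were assumed rather than observed.
    out = dict(row)
    assumed = []
    for key in ("occupied", "full_rates"):
        if out.get(key) is None:
            out[key] = True
            assumed.append(key)
    out["assumed_fields"] = assumed
    return out


raw = (
    {"unit_id": "A1", "occupied": False, "full_rates": None},
    {"unit_id": "B2", "occupied": None, "full_rates": None},
)
ds = AuditedDataset(source=raw)
ds.add_step("Default: occupied and paying full rates when unknown",
            default_occupied_full_rates)
view = ds.current()
print(view[1])
# {'unit_id': 'B2', 'occupied': True, 'full_rates': True, 'assumed_fields': ['occupied', 'full_rates']}
print(raw[1]["occupied"])  # None: the source row is unchanged
```

Because defaulted values are flagged in `assumed_fields` rather than silently overwritten, a reviewer can always separate observed data from values filled in under the "all units are occupied & pay full rates" default.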