Data Wrangling II:
Programming on the Whiteboard
February 26, 2016
Paige Morgan
Digital Humanities Librarian
Starting Activity:
Open Syllabus Project
http://opensyllabusproject.org/
Open Syllabus Project
• Use the syllabus explorer to examine
the data.
• Keep track of each step you take as
you drill down.
• Goal: develop a research question
based on your explorations.
• What other data would you need to
answer this research question?
Last week...
• The work of creating usable data
• Forms that this data might take:
• markup language
• Spreadsheets (MySQL & relational
DBs)
• Non-relational databases (RDF/Linked
Open Data)
This week:
• Caveat Curator (challenges of working
with data)
• Programming on the Whiteboard, i.e.,
conceptualizing the specific steps that
you need to take to accomplish your
goals
Goals/Takeaways
• A better understanding of the
workflow for dealing with data
• How to start small and scale up
effectively
• Greater ability to talk about what
you’re trying to do
Why this focus on data?
• Understanding your data, and your
intended actions, is a key skill for
developing any digital project (big or
small).
• You may have one big project – but
your data may support several
small/intermediary projects.
Image: Josh Lee, @wtrsld, via Twitter, January
2014.
What if your data is
crowdsourced?
You can require a particular
format for submissions
You can even put
programmatic limits on the
formats available for
submission
But in the end, you’re
probably still going to need
to scrub and/or format.
This is true even for data
from supposedly
reputable sources, like
government or media
organizations.
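
As a rough illustration of the point above, here is a minimal Python sketch of a format rule applied to submissions, followed by the scrubbing that is usually still needed. The field names (`title`, `year`), the four-digit-year rule, and the sample submissions are all hypothetical and invented for this example.

```python
import re


def scrub(submission):
    """Return a cleaned copy of one submission, or None if it cannot be used."""
    # Collapse stray spaces and tabs inside the title.
    title = " ".join(submission.get("title", "").split())
    year = submission.get("year", "").strip()
    # Hypothetical format rule: the year must be exactly four digits.
    if not title or not re.fullmatch(r"\d{4}", year):
        return None
    return {"title": title, "year": int(year)}


# Invented sample submissions, including the kind of mess a form can let through.
submissions = [
    {"title": "  A  Sample   Title ", "year": "1816 "},
    {"title": "Another Title", "year": "c. 1820"},
    {"title": "", "year": "1850"},
]

cleaned = [row for row in (scrub(s) for s in submissions) if row is not None]
print(cleaned)
```

Rejected rows are simply dropped here; in practice you would probably want to set them aside for manual review rather than lose them.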
Example: Doctor Who
Villains dataset
http://tinyurl.com/doctorwhovillains
Data Dictionaries
If you are thinking about
your data, and the tasks that
you need to accomplish,
then it’s easier to determine
what sort of language or
platform your project needs.
Pseudocode
• Used by programmers to break down a
complex task into single steps
• Easily adaptable for use by non-
programmers
Pseudocode Example (Visible Prices)
• Computer has a file that contains prices from different
texts.
• Computer must know that each price amount is
connected with an object, and with a bibliographical
record.
• Users can input a price amount, and computer will
retrieve all objects that match the price, and display
them to the user, along with bibliographical information.
• (More complex): Computer is able to retrieve prices
linked with certain categories (clothing, food, etc.)
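
The pseudocode steps above could be turned into a small Python sketch along the following lines. This is only one possible reading of those steps, not the actual Visible Prices implementation: the `PriceRecord` structure, its field names, and the sample records are placeholders invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class PriceRecord:
    amount: str    # price as written in the text (kept as plain text in this sketch)
    item: str      # the object the price is connected with
    source: str    # bibliographical record for the text
    category: str  # used only by the "more complex" step


# Placeholder records, not real Visible Prices data.
RECORDS = [
    PriceRecord("5s", "gloves", "Placeholder Text A", "clothing"),
    PriceRecord("5s", "bread", "Placeholder Text B", "food"),
    PriceRecord("1s", "ribbon", "Placeholder Text A", "clothing"),
]


def find_by_price(records, amount):
    """Retrieve every record whose price matches the amount the user entered."""
    return [r for r in records if r.amount == amount]


def find_by_category(records, category):
    """More complex step: retrieve prices linked with one category."""
    return [r for r in records if r.category == category]


# Display matching objects along with their bibliographical information.
for record in find_by_price(RECORDS, "5s"):
    print(f"{record.item} ({record.amount}) - {record.source}")
```

Each bullet in the pseudocode maps to one piece of the sketch: the file becomes `RECORDS`, the connection between price, object, and record becomes `PriceRecord`, and the two retrieval steps become the two functions.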
It is likely that your data will
have a longer life span than
any specific project you
create.
In many instances, it may be
as useful to focus on
data curation as on any
single project.
Getting Data
• Figshare
• Datahub.io
• Project websites
• APIs
Cleaning Data
• OpenRefine http://openrefine.org/
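
One common cleaning task is grouping variant spellings of the same value, which OpenRefine supports through its clustering feature. The Python sketch below is a simplified approximation of that general idea, not OpenRefine's own code; the function names and the sample names are assumptions made for illustration.

```python
import string
from collections import defaultdict


def fingerprint(value):
    """Build a rough comparison key: lowercase, no punctuation, sorted unique words."""
    no_punct = str.maketrans("", "", string.punctuation)
    cleaned = value.strip().lower().translate(no_punct)
    return " ".join(sorted(set(cleaned.split())))


def cluster(values):
    """Group values that share a key; return only groups worth reviewing."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(group) > 1]


# Invented sample values showing typical inconsistencies.
names = ["Austen, Jane", "jane austen", "Jane  Austen.", "Mary Shelley"]
print(cluster(names))
```

The sketch only suggests candidates for merging; deciding which spelling to keep is still a human judgment.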
Key DH Values
• Adaptive
• Sustainable/resource-aware
• Collaborative
• Social
Key skills
• Thinking flexibly about your data (and
potential project)
• Are there portions of your dataset that
could be extracted for use in a particular
tool?
• How can you adjust your data in order to
show it to people (and be better able to
talk/write/present about your research
interests)?
And now, it’s your turn...
Group Activity
• What questions can you ask and
answer with this data as it is?
• What data would you need in order to
ask & answer other research
questions?
• What are the steps that you would
need to take in order to answer those
research questions?
Next steps
• What’s the smallest version of your
dataset possible? (useful for testing out
tools)
• Possible tools to examine (as ways of
presenting your data)
• Omeka (http://www.omeka.net)
• Scalar (http://scalar.usc.edu)
• Simile (http://www.simile-widgets.org)
• Google Fusion Tables
(https://support.google.com/fusiontables/answer/2571232)
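
For the "smallest version of your dataset" question above, one option is to pull a small, repeatable sample before trying out any tool. The following is a minimal Python sketch under stated assumptions: the data is CSV with a header row, and the `sample_rows` helper, the column names, and the generated rows are all hypothetical.

```python
import csv
import io
import random


def sample_rows(csv_text, n, seed=0):
    """Return the header plus up to n randomly chosen data rows."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    rng = random.Random(seed)  # fixed seed so the same sample comes back each time
    picked = rng.sample(data, min(n, len(data)))
    return [header] + picked


# Invented stand-in for a larger file; real data would be read from disk.
full = "id,label\n" + "\n".join(f"{i},item {i}" for i in range(1, 101))

small = sample_rows(full, 5)
for row in small:
    print(row)
```

A fixed seed means the test sample stays the same between runs, which makes it easier to compare how different tools handle identical input.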
Thank you!
• Questions? Ideas? Book a consult at
http://paigecmorgan.youcanbook.me

ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
 

Feb.2016 Demystifying Digital Humanities - Workshop 3

  • 1. Data Wrangling II: Programming on the Whiteboard February 26, 2016 Paige Morgan Digital Humanities Librarian
  • 2. Starting Activity: Open Syllabus Project http://opensyllabusproject.org/
  • 3. Open Syllabus Project • Use the syllabus explorer to examine the data. • Keep track of each step you take as you drill down. • Goal: develop a research question based on your explorations. • What other data would you need to answer this research question?
  • 4. Last week... • The work of creating usable data • Forms that this data might take: • markup language • Spreadsheets (MySQL & relational DBs) • Non-relational databases (RDF/Linked Open Data)
  • 5. This week: • Caveat Curator (challenges of working with data) • Programming on the Whiteboard, i.e., conceptualizing the specific steps that you need to take to accomplish your goals
  • 6. Goals/Takeaways • A better understanding of the workflow for dealing with data • How to start small and scale up effectively • Greater ability to talk about what you’re trying to do
  • 7. Why this focus on data? • Understanding your data, and your intended actions, is a key skill for developing any digital project (big or small). • You may have one big project – but your data may support several small/intermediary projects.
  • 8. Image: Josh Lee, @wtrsld, via Twitter, January 2014.
  • 9. What if your data is crowdsourced?
  • 10. You can require a particular format for submissions
  • 11. You can even put programmatic limits on the formats available for submission
  • 12. But in the end, you’re probably still going to need to scrub and/or format.
  • 13. This is true even for data from supposedly reputable sources, like government or media organizations.
  • 14. Example: Doctor Who Villains dataset http://tinyurl.com/doctorwhovillains
  • 16. If you are thinking about your data, and the tasks that you need to accomplish, then it’s easier to determine what sort of language or platform your project needs.
  • 17. Pseudocode • Used by programmers to break down a complex task into single steps • Easily adaptable for use by non-programmers
  • 18. Pseudocode Example (Visible Prices) • Computer has a file that contains prices from different texts. • Computer must know that each price amount is connected with an object, and with a bibliographical record. • Users can input a price amount, and computer will retrieve all objects that match the price, and display them to the user, along with bibliographical information. • (More complex): Computer is able to retrieve prices linked with certain categories (clothing, food, etc.)
  • 19. It is likely that your data will have a longer life span than any specific project you create.
  • 20. In many instances, it may be more useful to focus on curating the data as much as on any single project.
  • 21. Getting Data • Figshare • Datahub.io • Project websites • APIs
  • 22. Cleaning Data • OpenRefine http://openrefine.org/
  • 23. Key DH Values • Adaptive • Sustainable/resource-aware • Collaborative • Social
  • 24. Key skills • Thinking flexibly about your data (and potential project) • Are there portions of your dataset that could be extracted for use in a particular tool? • How can you adjust your data in order to show it to people (and be better able to talk/write/present about your research interests)?
  • 25. And now, it’s your turn...
  • 26. Group Activity • What questions can you ask and answer with this data as it is? • What data would you need in order to ask & answer other research questions? • What are the steps that you would need to take in order to answer those research questions?
  • 27. Next steps • What’s the smallest version of your dataset possible? (useful for testing out tools) • Possible tools to examine (as ways of presenting your data) • Omeka (http://www.omeka.net) • Scalar (http://scalar.usc.edu) • Simile (http://www.simile-widgets.org) • Google Fusion Tables (https://support.google.com/fusiontables/answer/2571232)
  • 28. Thank you! • Questions? Ideas? Book a consult at http://paigecmorgan.youcanbook.me
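
The pseudocode on slide 18 (Visible Prices) can be turned into a small working sketch. The Python below is one possible reading of those steps, not the project's actual implementation: the record fields, the choice of pence as a single price unit, the function names, and the sample rows are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PriceRecord:
    """One price, tied to an object and to a bibliographical record."""
    amount_pence: int  # assumption: every price is normalized to pence
    obj: str           # the object the price is attached to
    category: str      # e.g. "food", "clothing" (the "more complex" step)
    source: str        # bibliographical record for the text


# Hypothetical sample data standing in for "a file that contains prices".
RECORDS = [
    PriceRecord(12, "loaf of bread", "food", "Hypothetical Novel A (1790)"),
    PriceRecord(240, "pair of gloves", "clothing", "Hypothetical Novel B (1801)"),
    PriceRecord(12, "ribbon", "clothing", "Hypothetical Novel A (1790)"),
]


def find_by_price(records, amount_pence):
    """Retrieve every record whose price matches the amount the user entered."""
    return [r for r in records if r.amount_pence == amount_pence]


def find_by_category(records, category):
    """Retrieve every record linked with a given category."""
    return [r for r in records if r.category == category]


def display(records):
    """Format each match together with its bibliographical information."""
    return [f"{r.obj} ({r.amount_pence}d): {r.source}" for r in records]


if __name__ == "__main__":
    for line in display(find_by_price(RECORDS, 12)):
        print(line)
```

Each function corresponds to one bullet of the pseudocode, which is the point of the exercise: once the steps are written down one at a time, it becomes easier to see what structure the data needs (here, a price, an object, a category, and a source on every row) before choosing a language or platform.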

Editor's Notes

  1. Challenge: the dataset probably isn’t structured enough for you to answer your question