DH101: Exploring the Computational Turn in the Humanities
Nora McGregor
Curator, Digital Research
@ndalyrose
www.bl.uk 2
Over the next hour we will…
• Try to define digital humanities
• Understand some of the buzzwords in and around DH:
  • Text/Data Mining & Machine Learning
  • Data & Data Visualisation
  • Georeferencing
  • and a little Computer Vision & 3D modelling for good measure
  …through lots of examples!
• Get tips for finding further info & support
But first, who am I?!
Founded in 2010, the Digital Scholarship Department at the British Library supports researchers and staff to make innovative use of our digital collections and data.
We are a group of cross-disciplinary experts in digitisation, librarianship, digital history and humanities, and computer and data science, looking at how technology is transforming research and, in turn, our services.
@BL_DigiSchol
Getting (& staying) in the game
The Digital Scholarship Training Programme is an internal staff training initiative run by the Digital Curator team, launched in November 2012.
The programme is informed by the digital humanities: we look at what researchers in the field are learning and doing.
What does “digital humanities” mean to you? (https://whatisdigitalhumanities.com/)
The origin story
“Unlike many other interdisciplinary experiments, humanities computing has a very well-known beginning. In 1949, an Italian Jesuit priest, Father Roberto Busa, began what even to this day is a monumental task: to make an index verborum of all the words in the works of St Thomas Aquinas and related authors, totaling some 11 million words of medieval Latin.”
http://www.digitalhumanities.org/companion/view?docId=blackwell/9781405103213/9781405103213.xml&chunk.id=ss1-2-1
The origin story, part II
“The real origin of that term [digital humanities] was in conversation with Andrew McNeillie, the original acquiring editor for the Blackwell Companion to Digital Humanities. We started talking with him about that book project in 2001, in April, and by the end of November we’d lined up contributors and were discussing the title, for the contract. Ray [Siemens] wanted “A Companion to Humanities Computing” as that was the term commonly used at that point; the editorial and marketing folks at Blackwell wanted “Companion to Digitized Humanities.” I suggested “Companion to Digital Humanities” to shift the emphasis away from simple digitization.”
-John Unsworth, founding director of the Institute for Advanced Technology in the Humanities at the University of Virginia and co-editor of the Blackwell Companion to Digital Humanities
Defining digital humanities (DH)
• An area of scholarly activity, born from humanities computing, at the intersection of computing/digital technologies and the humanities.
• The field both employs technology in the pursuit of humanities research and subjects technology to humanistic questioning and interrogation.
• DH is collaborative, cross-disciplinary, and computationally engaged research, teaching, and publishing.
https://en.wikipedia.org/wiki/Digital_humanities
“The emergence of the new digital humanities isn’t an isolated academic phenomenon. The institutional and disciplinary changes are part of a larger cultural shift, inside and outside the academy, a rapid cycle of emergence and convergence in technology and culture.”
Steven E. Jones, The Emergence of the Digital Humanities (2014)
http://lisacharlotterost.github.io/2015/06/20/Searching-through-the-years/
Defining digital humanities (DH)
Is it a discipline? Or a set of methods that can be used across disciplines (like textual criticism)?
Lots of debate, but for today we can safely agree…
DH combines the methodologies of traditional humanities & social science disciplines…
…with computational tools provided by computing disciplines:
• Machine learning
• Data mining
• Georeferencing
• Text mining
• Data visualisation
• Crowdsourcing
How might digital humanities techniques benefit your research?
• Explore a larger body of material computationally than you could by reading every text individually
• Sometimes see trends, patterns and relationships not apparent from close reading
• Gain a broad overview of a topic
• Test an idea or hypothesis on a large dataset
• Gain skills and tools for keeping your research data clean
• Open up new sources of funding, collaborations and connections
• …and more!
Behind the Buzzwords
Text & Data Mining
Using a variety of computational techniques to derive information from, and find patterns in, texts and large datasets. Two common text mining (TM) tasks:
• Named-entity recognition: find and classify words in texts that might refer to names of things, such as a person, place or company
• Topic modelling: a method for discovering groups of words (i.e. topics) that tend to occur together across a collection of documents and that best represent what the collection is about.
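To make the topic-modelling idea concrete, here is a minimal sketch of Latent Dirichlet Allocation (LDA), one widely used topic model, fitted with collapsed Gibbs sampling in plain Python. The function names `lda_gibbs` and `top_words`, the toy documents and the `alpha`/`beta` settings are illustrative assumptions; a real project would normally use an established tool such as MALLET or gensim, and far more text.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, k, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit a toy LDA topic model by collapsed Gibbs sampling.

    docs is a list of token lists; returns k dicts mapping word -> count.
    """
    rng = random.Random(seed)
    v = len({w for doc in docs for w in doc})           # vocabulary size
    doc_topic = [[0] * k for _ in docs]                 # tokens per (document, topic)
    topic_word = [defaultdict(int) for _ in range(k)]   # tokens per (topic, word)
    topic_total = [0] * k                               # tokens per topic
    assign = []
    for d, doc in enumerate(docs):                      # random initial topic per token
        z_doc = [rng.randrange(k) for _ in doc]
        for w, z in zip(doc, z_doc):
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(z_doc)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assign[d][i]                        # take this token out of the counts
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # P(topic t | all other assignments), up to a constant
                weights = [(doc_topic[d][t] + alpha)
                           * (topic_word[t][w] + beta) / (topic_total[t] + v * beta)
                           for t in range(k)]
                z = rng.choices(range(k), weights=weights)[0]
                assign[d][i] = z                        # put it back under the sampled topic
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return topic_word


def top_words(topic_word, n=3):
    """Most frequent words per topic (ties broken alphabetically)."""
    return [[w for w in sorted(tw, key=lambda x: (-tw[x], x)) if tw[w] > 0][:n]
            for tw in topic_word]


docs = [
    "ship voyage cargo port ship".split(),
    "harvest wheat field farm harvest".split(),
    "cargo port voyage ship".split(),
    "farm field wheat harvest".split(),
]
topics = lda_gibbs(docs, k=2)
print(top_words(topics))
```

With so little text the result is fragile: whether the shipping words and the farming words separate into two clean topics depends on the random seed and the number of iterations.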
Machine Learning
• Constructing algorithms that can learn from data and make predictions on it; employed in a range of computing tasks relevant to humanities scholarship, such as text mining and automatic Handwritten Text Recognition (HTR)
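As a small illustration of "learning from data and making predictions", the sketch below trains a multinomial naive Bayes classifier, one of the simplest machine-learning methods for text, on a handful of labelled catalogue-style descriptions. The labels, example texts and function names are invented for illustration; HTR systems such as Transkribus rely on much more complex models.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(examples):
    """Count words per label from (tokens, label) training pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab


def predict(model, tokens):
    """Return the label with the highest log-probability for the tokens."""
    label_counts, word_counts, vocab = model
    n_docs = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        # log prior + log likelihood, with add-one (Laplace) smoothing
        denom = sum(word_counts[label].values()) + len(vocab)
        score = math.log(label_counts[label] / n_docs)
        for t in tokens:
            score += math.log((word_counts[label][t] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)


training = [
    ("engraved map of the county with roads".split(), "map"),
    ("chart of the coast and harbour".split(), "map"),
    ("songs for voice and piano".split(), "music"),
    ("sonata for violin and piano".split(), "music"),
]
model = train_naive_bayes(training)
print(predict(model, "map of the harbour".split()))  # map
print(predict(model, "sonata for piano".split()))    # music
```

The "learning" here is just counting; the prediction step then asks which label makes the new words most probable.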
Stanford Named Entity Tagger
http://nlp.stanford.edu:8080/ner/
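The Stanford tagger uses a trained statistical model (a conditional random field) to decide which words are entities. The much cruder sketch below only looks names up in a small, hypothetical gazetteer, but it shows the shape of NER output: each entity string, its class and its position in the text. `GAZETTEER` and `tag_entities` are illustrative names, not part of any Stanford tool.

```python
import re

# Hypothetical gazetteer of known names and their entity classes.
GAZETTEER = {
    "Roberto Busa": "PERSON",
    "Thomas Aquinas": "PERSON",
    "IBM": "ORGANIZATION",
    "British Library": "ORGANIZATION",
    "Venice": "LOCATION",
}


def tag_entities(text, gazetteer=GAZETTEER):
    """Return (name, class, start offset) for each gazetteer name found in text."""
    found, taken = [], set()
    # Try longer names first so a long name wins over any shorter overlapping one.
    for name in sorted(gazetteer, key=len, reverse=True):
        for m in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            span = set(range(m.start(), m.end()))
            if span & taken:
                continue  # already covered by a longer match
            taken |= span
            found.append((name, gazetteer[name], m.start()))
    return sorted(found, key=lambda item: item[2])


sentence = ("Father Roberto Busa visited IBM in search of support "
            "for his index of Thomas Aquinas.")
for entity in tag_entities(sentence):
    print(entity)
```

A lookup list cannot handle names it has never seen or tell "Washington" the person from "Washington" the place; that is what trained taggers add.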
Transkribus
Transkribus is open-source software for the automated recognition, transcription, indexing and enrichment of handwritten archival documents. It relies on crowdsourcing and machine learning: each contribution helps train the model for automatic recognition.
Political Meetings Mapper
Dr. Katrina Navickas, a self-professed Luddite, wanted to know how many Chartist movement meetings took place in the 19th century, and where, and whether there was a more efficient way to extract this information programmatically from our digitised newspapers rather than by hand.
5,519 meetings held from 1838 to 1850 were discovered in 462 towns and villages across the UK!
These will be added to her existing findings: http://protesthistory.org.uk/the-story-1789-1848/database-of-meetings
“I was able to do in minutes with a python code what I’d spent the last ten years trying to do by hand!”
-Dr. Katrina Navickas, BL Labs Winner 2015
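The editor's notes below describe the workflow: OCR the Northern Star pages, then use Python to pull out each meeting's place (geocoded against a gazetteer) and date. What follows is only a minimal sketch of that idea under assumed conditions, not Dr. Navickas's actual code: the notice wording, the three-entry `GAZETTEER` with approximate coordinates, and the `parse_notice` function are all hypothetical.

```python
import re
from datetime import date

# Hypothetical gazetteer: place name -> approximate (latitude, longitude).
GAZETTEER = {
    "Manchester": (53.48, -2.24),
    "Leeds": (53.80, -1.55),
    "Bradford": (53.79, -1.75),
}

MONTHS = {name: number for number, name in enumerate(
    ["January", "February", "March", "April", "May", "June", "July",
     "August", "September", "October", "November", "December"], start=1)}

# Matches dates written like "March 7th, 1842" or "March 7 1842".
DATE_RE = re.compile(r"\b(" + "|".join(MONTHS) + r")\s+(\d{1,2})(?:st|nd|rd|th)?,?\s+(18\d\d)\b")


def parse_notice(text):
    """Extract the first known place and the first date from a meeting notice."""
    place = next((p for p in GAZETTEER
                  if re.search(r"\b" + re.escape(p) + r"\b", text)), None)
    m = DATE_RE.search(text)
    when = date(int(m.group(3)), MONTHS[m.group(1)], int(m.group(2))) if m else None
    return {"place": place, "coords": GAZETTEER.get(place), "date": when}


notice = ("A public meeting of the Chartists will be held in Leeds "
          "on March 7th, 1842, at seven o'clock in the evening.")
print(parse_notice(notice))
```

Real OCR output would need fuzzier matching than this, since a single misread letter defeats an exact gazetteer lookup.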
Data Visualisation
• The graphical display of quantitative or qualitative information to create insights by highlighting patterns, trends, variations and anomalies.
• For 'sense-making (also called data analysis) and communication' (Stephen Few)
• '…interactive, visual representations of abstract data to amplify cognition' (Card et al.)
• We perceive patterns visually faster than we can read them; interactive visualisations let you move between the overall shape and the fine detail of a collection
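A visualisation does not need special software to make the point about perception: even a text bar chart lets the eye pick out the largest and smallest values faster than a column of numbers does. The sketch below scales each value to a fixed bar width; `text_bar_chart` and the counts are invented for illustration.

```python
def text_bar_chart(counts, width=40, symbol="#"):
    """Render a label -> value mapping as horizontal bars scaled to `width`."""
    if not counts:
        return ""
    peak = max(counts.values())
    label_width = max(len(str(label)) for label in counts)
    lines = []
    for label, value in counts.items():
        # The largest value gets the full width; others are scaled proportionally.
        bar = symbol * round(width * value / peak) if peak > 0 else ""
        lines.append(f"{str(label).ljust(label_width)} | {bar} {value}")
    return "\n".join(lines)


# Hypothetical counts of items per decade.
items_per_decade = {"1830s": 12, "1840s": 30, "1850s": 6}
print(text_bar_chart(items_per_decade, width=10))
# 1830s | #### 12
# 1840s | ########## 30
# 1850s | ## 6
```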
http://datavizproject.com/
Big Data History of Music
How can vast amounts of bibliographic data held by research libraries be unlocked for music researchers to analyse?
Can this data be interrogated in ways that challenge the traditional narratives of music history?
Analyses and visualisations exposed previously uncharted patterns in the history of music, for instance the rise and fall of music printing in 16th- and 17th-century Europe (huge dips in output in Venice were down to plague and war).
https://www.royalholloway.ac.uk/music/research/abigdatahistoryofmusic/home.aspx
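The kind of analysis behind the Venice finding can be sketched in a few lines: count records per decade for one place of publication, then flag decades where output falls sharply. The records below are invented, the 50% threshold is an arbitrary assumption, and the function names are hypothetical; the project itself used tools such as OpenRefine, Google Fusion Tables and Palladio rather than this code.

```python
from collections import Counter


def counts_by_decade(records, place):
    """Count (year, place) records per decade for a single place of publication."""
    return Counter((year // 10) * 10 for year, city in records if city == place)


def find_dips(decade_counts, drop=0.5):
    """Decades whose count falls below `drop` times the previous decade's count."""
    if not decade_counts:
        return []
    decades = range(min(decade_counts), max(decade_counts) + 10, 10)
    # Counter returns 0 for decades with no records, so gaps count as dips too.
    return [cur for prev, cur in zip(decades, decades[1:])
            if decade_counts[cur] < drop * decade_counts[prev]]


# Invented (year, place) records, not real catalogue data.
records = [(1570, "Venice"), (1572, "Venice"), (1574, "Venice"), (1579, "Venice"),
           (1583, "Venice"), (1591, "Venice"), (1594, "Venice"), (1598, "Venice"),
           (1575, "Antwerp")]
venice = counts_by_decade(records, "Venice")
print(sorted(venice.items()))   # [(1570, 4), (1580, 1), (1590, 3)]
print(find_dips(venice))        # [1580]
```

Spotting a dip is only the start: explaining it (plague, war, a gap in the catalogue) is the historian's job.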
Example: Mapping the Republic of Letters (video)
https://youtu.be/nw0oS-AOIPE?t=8s
Georeferencing
• Linking data with a physical location. It relates information (documents, texts, maps, images) to geographic locations through place names and place codes, or through geospatial referencing (longitude and latitude coordinates).
• Some representative modes of enquiry enabled by georeferencing:
  • Correspondence, Networks & Relationships (Republic of Letters)
  • Mapping Literature (Willa Cather)
  • Historical Social Movements (Political Meetings Mapper)
  • Historical reconstructions (Orbis)
  • Cities & Memory (Bomb Sight)
  • Spread of Technology & Ideas (Atlas of Early Printing)
  • Human-Environment Interaction (London Sound Survey)
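Once records have coordinates, a common next step is to export them in a format that mapping tools can read. A minimal sketch, assuming records are Python dicts with hypothetical `lat`/`lon` keys, converts them to GeoJSON (RFC 7946), which stores each point as [longitude, latitude]; mixing up that order is an easy way to put points in the wrong place.

```python
import json


def to_geojson(records):
    """Convert dict records with 'lat'/'lon' keys into a GeoJSON FeatureCollection.

    Records without coordinates (e.g. places that could not be georeferenced)
    are skipped. GeoJSON orders coordinates as [longitude, latitude].
    """
    features = []
    for rec in records:
        if rec.get("lat") is None or rec.get("lon") is None:
            continue
        properties = {k: v for k, v in rec.items() if k not in ("lat", "lon")}
        features.append({
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [rec["lon"], rec["lat"]]},
            "properties": properties,
        })
    return {"type": "FeatureCollection", "features": features}


# Hypothetical records; coordinates are approximate.
letters = [
    {"id": "letter-1", "place": "Paris", "lat": 48.86, "lon": 2.35},
    {"id": "letter-2", "place": "Geneva", "lat": 46.20, "lon": 6.14},
    {"id": "letter-3", "place": "unknown", "lat": None, "lon": None},
]
print(json.dumps(to_geojson(letters), indent=2))
```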
Orbis: "Google Maps for Ancient Rome"
Video: https://www.youtube.com/watch?v=eWz7vXzmreg
Interactive map and project site: http://orbis.stanford.edu/
ORBIS, the Stanford Geospatial Network Model of the Roman World, reconstructs the time cost and financial expense associated with a wide range of different types of travel in antiquity.
ORBIS was created using data from both primary sources and computational geography simulations of travel, wind and sea patterns, seasonal access, costs and other considerations to plot realistic transport networks.
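A network model like ORBIS treats places as nodes and routes as edges weighted by a cost (days, or money), and a route finder then searches for the cheapest path. The sketch below uses Dijkstra's shortest-path algorithm, a standard choice for this kind of query, though not necessarily what ORBIS itself uses; the three places, the day costs and the function names are invented and are not ORBIS data.

```python
import heapq


def cheapest_route(graph, start, goal):
    """Dijkstra's algorithm over graph: node -> list of (neighbour, cost).

    Returns (total_cost, path), or (inf, []) if goal is unreachable.
    """
    queue = [(0, start, [start])]
    best = {start: 0}
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if cost > best.get(node, float("inf")):
            continue  # an outdated, more expensive entry for this node
        for nxt, step in graph.get(node, []):
            new_cost = cost + step
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(queue, (new_cost, nxt, path + [nxt]))
    return float("inf"), []


network = {}


def add_route(a, b, days):
    """Add a two-way route; the day costs used below are invented."""
    network.setdefault(a, []).append((b, days))
    network.setdefault(b, []).append((a, days))


add_route("Roma", "Ostia", 1)       # short road to the port
add_route("Ostia", "Massilia", 6)   # sea leg
add_route("Roma", "Massilia", 20)   # overland road
print(cheapest_route(network, "Roma", "Massilia"))  # (7, ['Roma', 'Ostia', 'Massilia'])
```

Changing what the weights measure (time versus money, or sailing times for a different season) can change which route wins, which is roughly the kind of question a model like ORBIS lets users explore.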
Canada Through the Lens: mapping a collection
Phil Hatfield, Curator, created an interactive map enabling access to the Canadian copyright collection by location, providing users with metadata and, where possible, access to the rights-cleared (public domain) images held on the Library's Wikimedia site.
He used a freely available tool (Google Fusion Tables), which automatically georeferenced the data for him.
He discovered that much of the collection followed closely along railway lines.
Computer Vision
• Closely related to Machine Learning, it’s concerned with the automatic extraction, analysis and understanding of useful information from a single image or a sequence of images.
It’s not ALL text-based!
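Much of computer vision starts from simple operations on pixel values. A classic example is edge detection with the Sobel operator, which estimates how sharply brightness changes at each pixel; edges are often a first clue to where text lines, illustrations or object boundaries sit on a digitised page. This sketch works on a tiny grayscale image stored as nested lists; `sobel_magnitude` is an illustrative name, and real work would use a library such as OpenCV or scikit-image.

```python
def sobel_magnitude(img):
    """Approximate edge strength at each interior pixel of a grayscale image.

    img is a list of equal-length rows of brightness values; border pixels stay 0.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Horizontal change: right column minus left column, centre row weighted 2.
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            # Vertical change: bottom row minus top row, centre column weighted 2.
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            out[y][x] = (gx * gx + gy * gy) ** 0.5
    return out


# A dark region (0) meeting a bright region (255): the edge lies between columns 2 and 3.
image = [[0, 0, 0, 255, 255] for _ in range(3)]
edges = sobel_magnitude(image)
print(edges[1])  # [0.0, 0.0, 1020.0, 1020.0, 0.0]
```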
3D modelling
• Creating a three-dimensional computer model that represents a three-dimensional object. 3D models are made from points, or vertices, in 3D space connected by geometric data such as lines and curves. This forms a wireframe representation, which can be displayed with a solid surface through a process called ‘rendering’. Textures and images can then be mapped onto the surfaces of the 3D model to create ‘visualisations’.
It’s not ALL text-based!
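The vertex-and-edge idea can be shown with a cube: eight vertices, twelve edges joining vertices that differ in one coordinate, and a simple pinhole perspective projection that flattens the wireframe onto a 2D image plane (a first step before any surfaces are rendered). The camera distance, focal length and names used here are arbitrary assumptions for illustration.

```python
from itertools import product

# The 8 vertices of a cube with side 2, centred on the origin.
VERTICES = list(product((-1, 1), repeat=3))
# The 12 edges join pairs of vertices that differ in exactly one coordinate.
EDGES = [(i, j) for i in range(len(VERTICES)) for j in range(i + 1, len(VERTICES))
         if sum(a != b for a, b in zip(VERTICES[i], VERTICES[j])) == 1]


def project(vertex, camera_distance=4.0, focal=1.0):
    """Pinhole perspective projection for a camera at z = -camera_distance.

    Points further from the camera (larger z) end up closer to the image centre.
    """
    x, y, z = vertex
    depth = camera_distance + z
    return (focal * x / depth, focal * y / depth)


# The projected wireframe: one 2D line segment per 3D edge.
wireframe_2d = [(project(VERTICES[i]), project(VERTICES[j])) for i, j in EDGES]
print(len(VERTICES), len(EDGES))  # 8 12
for start, end in wireframe_2d[:3]:
    print(start, end)
```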
Humanities Data
• Facts and statistics collected together for reference or analysis
• Humanities data might be sets of bibliographic information, images, image processing details, texts, texts with mark-up and annotations, historical tabular data, archived webpages…you name it!
• A dataset is a distinct collection of data, ideally packaged, preserved and made accessible for enquiry.
• Humanities data can be “big”, “small” or “smart”… but is mostly complex!
Messiness in historical data
• 'Begun in Kiryu, Japan, finished in France'
• 'Bali? Java? Mexico?'
• Variations on USA:
– U.S.
– U.S.A
– U.S.A.
– USA
– United States of America
– USA ?
– United States (case)
• Inconsistency in uncertainty
– U.S.A. or England
– U.S.A./England ?
– England & U.S.A.
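One way to tame variants like these is a hand-built lookup table applied after light cleaning, while recording any '?' as a separate uncertainty flag instead of throwing it away. The sketch below is a hypothetical example of that approach (`CANONICAL` and `normalise_country` are illustrative names); it deliberately leaves mixed values such as 'U.S.A. or England' untouched, because deciding what they mean is a curatorial judgement, not a string operation.

```python
import re

# Hypothetical lookup table built after inspecting the variants in the data.
CANONICAL = {
    "us": "United States",
    "usa": "United States",
    "united states": "United States",
    "united states of america": "United States",
}


def normalise_country(raw):
    """Return (normalised value, uncertain flag) for a raw place string."""
    uncertain = "?" in raw
    # Turn full stops and question marks into spaces, lowercase, collapse whitespace.
    cleaned = " ".join(re.sub(r"[.?]", " ", raw).lower().split())
    compact = cleaned.replace(" ", "")  # "u s a" -> "usa"
    for key in (cleaned, compact):
        if key in CANONICAL:
            return CANONICAL[key], uncertain
    return raw.strip(), uncertain  # leave anything unrecognised for a human


for value in ["U.S.", "U.S.A", "U.S.A.", "USA", "United States of America",
              "USA ?", "U.S.A. or England"]:
    print(repr(value), "->", normalise_country(value))
```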
OpenRefine
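OpenRefine's clustering feature groups values that probably refer to the same thing. Its default "fingerprint" key-collision method roughly works by trimming, lowercasing, removing punctuation, folding accented characters to ASCII, then sorting and de-duplicating the words; values with the same key land in one cluster. The sketch below approximates that idea in Python (it is not OpenRefine's own code, and `fingerprint`/`cluster` are illustrative names). It shows both the method's power and the caveat in the editor's notes: the '?' uncertainty marker vanishes from the key.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Approximation of OpenRefine's 'fingerprint' keying method."""
    s = value.strip().lower()
    # Drop punctuation (note: this also throws away '?' uncertainty markers).
    s = "".join(ch for ch in s if ch not in string.punctuation)
    # Fold accented characters to plain ASCII, e.g. 'é' -> 'e'.
    s = unicodedata.normalize("NFKD", s).encode("ascii", "ignore").decode("ascii")
    # Sort and de-duplicate the words so word order no longer matters.
    return " ".join(sorted(set(s.split())))


def cluster(values):
    """Group values by fingerprint; keep only keys shared by different spellings."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return {k: vs for k, vs in groups.items() if len(set(vs)) > 1}


values = ["U.S.A.", "U.S.A", "USA", "USA ?", "U.S.",
          "United States", "united states", "States, United"]
for key, members in cluster(values).items():
    print(key, "->", members)
```

Note that "U.S." gets the key "us" and so is not grouped with "USA": key collision only catches differences the key deliberately ignores.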
Ships' Log Books & Modern Forecast Models
The East India Company archives include 900 ships' log-books containing daily instrumental measurements of temperature and pressure, and subjective estimates of wind speed and direction, from voyages across the Atlantic and Indian Oceans between 1789 and 1834.
The Met Office digitised and transcribed these books, providing 273,000 new weather records. These offer an unprecedentedly detailed view of the weather and climate of the late eighteenth and early nineteenth centuries in certain locations, and can be used to test the accuracy of the Met Office's forecasting models.
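The editor's notes mention that some recorded positions, once plotted, were wildly off because of sailors' errors. A simple plausibility check flags any day-to-day jump in position that no sailing ship could have made. In this sketch the 500 km/day ceiling, the function names and the noon positions (including a deliberately flipped latitude sign) are assumptions for illustration; distances use the haversine great-circle formula on a spherical Earth.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a sphere with Earth's mean radius."""
    r = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def flag_implausible(positions, max_km_per_day=500):
    """Indices of daily (lat, lon) positions implying an impossible day's sailing."""
    return [i for i in range(1, len(positions))
            if haversine_km(*positions[i - 1], *positions[i]) > max_km_per_day]


# Hypothetical noon positions; the third has its latitude sign flipped (N for S).
log = [(-34.0, 18.0), (-34.5, 20.0), (34.5, 20.0), (-35.0, 22.0)]
print(flag_implausible(log))  # [2, 3]
```

A single bad entry is flagged twice, once for the jump into it and once for the jump out, so the flags point a human towards the entry to check rather than correcting anything automatically.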
Things to consider: Data + Tools
• Cultural heritage records contain uncertainty and fuzziness (e.g. date ranges, multiple values, uncertain or unavailable information). Curators and staff at institutions often have unique expertise in deciphering these anomalies: ask them! ([1960] vs. 1960 can have a big impact depending on what you’re doing.)
• Optical Character Recognition (OCR) in particular is an imperfect art: consider how bad it is, how this might affect your findings, and what needs doing to mitigate it.
• Keeping data clean, organised, open and well described will not only make your life easier, but also enable its widespread re-use beyond the life of your PhD and increase its future impact. (Datasets you’ve created in the course of your research projects could even be used to enhance national collections!)
• Decisions always need to be made while normalising information for visualisation. Documenting them is important for your research, but also for future re-use!
• Is your aim enquiry or presentation? The answer will shape the tools and data-cleaning choices you make.
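To make the "[1960] vs. 1960" point concrete, here is a hypothetical parser that turns catalogue date strings into an (earliest, latest, certain) triple instead of a single number. The conventions it handles, and especially the choice to treat "early" as the first third of a century, are assumptions you would need to check with curators and document, which is exactly the kind of decision the bullets above warn about. `parse_catalogue_date` is an illustrative name.

```python
import re

# Assumed split of a century into thirds; document whatever rule you choose.
CENTURY_PARTS = {"early": (0, 32), "mid": (33, 66), "late": (67, 99), "": (0, 99)}
CENTURY_RE = re.compile(r"\b(early|mid|late)?\s*(\d{1,2})(?:st|nd|rd|th) century\b", re.I)


def parse_catalogue_date(raw):
    """Return (earliest, latest, certain) for a catalogue date string, or None."""
    text = raw.strip()
    # Square brackets, '?' or 'probably' signal an inferred or doubtful date.
    certain = not (text.startswith("[") or "?" in text or "probably" in text.lower())
    m = CENTURY_RE.search(text)
    if m:
        # Assumed convention: the "18th century" is treated as 1700-1799.
        start = (int(m.group(2)) - 1) * 100
        lo, hi = CENTURY_PARTS[(m.group(1) or "").lower()]
        return start + lo, start + hi, certain
    m = re.search(r"\d{4}", text)
    if m:
        year = int(m.group())
        return year, year, certain
    return None


for value in ["1960", "[1960]", "1836 (probably)", "early 18th century", "n.d."]:
    print(value, "->", parse_catalogue_date(value))
```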
#digitalhumanities
dancohen/lists/digitalhumanities
@ProfHacker
@Dhnow
@BL_DigiSchol
And more links to resources here: http://scottbot.net/teaching-yourself-to-code-in-dh/
Contacts
Email: digitalresearch@bl.uk
Blog: http://britishlibrary.typepad.co.uk/digital-scholarship/
Web: https://www.bl.uk/subjects/digital-scholarship
Thank you!

AHRC CDP Digital Humanities 101


Editor's Notes

  • #5 https://librarycarpentry.github.io/
  • #7 For more details on the process: http://www.historyofinformation.com/expanded.php?id=2321 “Father Busa imagined that a machine might be able to help him, and, having heard of computers, went to visit Thomas J. Watson at IBM in the United States in search of support (Busa 1980). The entire texts were gradually transferred to punched cards and a concordance program written.”
  • #17 400 Million views since 2013
  • #18 Video: http://www.bl.uk/case-studies/political-meetings-mapper
    Research Question: Chartism was the biggest popular movement for democracy in 19th-century British history; the Chartists campaigned for the vote for all men. They advertised their meetings in the Northern Star newspaper from 1838 to 1850. The question is: how many of these meetings took place, and where? We started with 1841-1845.
    Source Collections: 19th Century Digitised Newspapers, specifically the Northern Star newspaper; Digitised and Georeferenced Map of Oxford Street
    Digital/Computational Techniques: The images of the relevant pages of the Northern Star were run through an Optical Character Recognition program (ABBYY FineReader 12) and the resulting text was checked manually. We developed a set of Python scripts to extract and geocode the place of each meeting, using a gazetteer of places, and to parse the date of the meeting.
    Outcome: 5,519 meetings discovered in 462 towns and villages across the UK! http://politicalmeetingsmapper.co.uk/maps/
  • #21 Research Question: The project brought together for the first time the world's biggest datasets about published sheet music, music manuscripts and classical concerts (in excess of 5 million records) for statistical analysis, manipulation and visualisation. The aim was to unlock musical-bibliographical data held by libraries in order to create new research opportunities. The project cleaned and enhanced aspects of the British Library catalogues of printed and manuscript music, which are now available as open data from www.bl.uk/bibliographic/download.html, and piloted big data research techniques on these and five other datasets.
    Source Collections: Data from seven existing databases and catalogues were used as the basis of this project: the British Library's catalogues of printed and manuscript music; the bibliographies created by Répertoire International des Sources Musicales (RISM) that list European music printed 1500-1800 and music manuscripts in European libraries; the RISM UK Music Manuscripts Database; and the Concert Programmes Project database.
    Digital/Computational Techniques: Data wrangling using OpenRefine and MarcEdit. Data visualisation using Google Fusion Tables and Palladio.
    Project slides: http://www.slideshare.net/historyspot/ihr-big-data-history-of-music-9-june15
    Outcome: Analyses and visualisations of these datasets exposed previously uncharted patterns in the history of music, for instance the rise and fall of music printing in 16th- and 17th-century Europe (huge dips in output in Venice were down to plague and war!), or the rise of nationalist colourings in music of the late 18th and early 19th centuries. The detection of these long-term trends permits new ways of linking music history to wider histories of culture, economics, society and politics.
  • #22 http://web.stanford.edu/group/toolingup/rplviz/
  • #25 http://blogs.bl.uk/magnificentmaps/2017/03/canada-through-the-lens-mapping-a-collection-on-display.html
  • #26 https://www.clarifai.com/demo http://www.robots.ox.ac.uk/~vgg/research/ https://cloud.google.com/vision/ https://azure.microsoft.com/en-gb/services/cognitive-services/computer-vision/
  • #27 Sketchfab has tutorials: https://blog.sketchfab.com/category/tutorial/. In general, Sketchfab is a good place to start: explore its models, blogs and tutorials.
  • #30 Examples from the Cooper Hewitt collection. I spent 3/5 of my time at the Cooper Hewitt just trying to get the data clean enough to vaguely represent the collection. The problem is that computers think U.S., U. S., U.S.A., U. S. A., United States and United States of America are six different places. Fields also contain things like internal notes about potential duplicates and unexpected extra information, such as notes on what type of location it is. There are lots of inconsistencies: uncertainty and date ranges are expressed in different ways. More common GLAM issues: what year is 'early 18th century'? What do you do with '1836 (probably)'?
  • #31 OpenRefine is an amazing tool, and I wouldn't have gotten anywhere at Cooper Hewitt without it. It will suggest ways to make the data more consistent. You can then export the data and keep working on it in other tools, or bring it back into OpenRefine. Because OpenRefine runs locally, it can be used for sensitive data you mightn't put online. One issue is that GLAMs tend to use question marks to record uncertainty in attribution, but OpenRefine's default clustering method ignores punctuation, so you have to be careful about preserving it (if that's what you want). It takes in TSV, CSV, *SV, Excel (.xls and .xlsx), JSON, XML, RDF as XML, and Google Data documents. Useful advice: http://freeyourmetadata.org/cleanup/
  • #32 When plotted on a map, though, some of the recorded location data was WAY off. This wasn’t down to modern transcriber error; it was sailor error! http://www.clim-past.net/8/1551/2012/cp-8-1551-2012.html More on the project: http://blogs.bl.uk/untoldlives/2013/04/history-and-science-meet-1.html Video: https://vimeo.com/43884291