When will there be a digital revolution in the Humanities?

Martin Wynne
martin.wynne@it.ox.ac.uk
Director of User Involvement, CLARIN ERIC
Oxford e-Research Centre & IT Services (formerly OUCS) & Faculty of Linguistics, Philology and Phonetics, University of Oxford

Was können und wollen Digital Humanities (What can the Digital Humanities do, and what do they want?)
Austrian National Library, Vienna
Friday 25th October 2013

The 'take-home messages'

In the era of the data deluge, web science and digital scholarship,
there is the possibility to transform the study of the Humanities...
...but we need to do some hard technical and legal stuff...
...and some even harder social and intellectual stuff, because...
...some aspects of traditional scholarship in the Humanities make
digital research difficult, and some aspects of emerging e-research
make humanistic enquiry difficult.
Or, to put it another way:
Can we follow the example of the physical
sciences in deciding priorities and adopting
standards, reducing complexity and variety,
to promote shared facilities and
infrastructures?

Where's the exemplary project?

Are there a number of really good examples of:
• research made possible by digital tools and methods,
• finding new answers to old questions, or
• making it possible to ask new questions?

The data deluge
Nature 474, 436-440 (2011) | doi:10.1038/474436a

Three barriers to the digital revolution

1. Technical, legal and administrative barriers, and a lack of connectivity in a fragmented environment (silos and fishtanks)
2. Methods: digital resources make it easy to do bad research
3. Some of the methods and traditions of the Humanities make it difficult to do e-Research

Three barriers to the digital revolution

1. Technical, legal and administrative barriers, and a lack of connectivity in a fragmented environment (silos and fishtanks)
2. Digital resources make it easy to do bad research
3. Some of the methods and traditions of the Humanities make it difficult to do e-Research

Interoperability and sustainability for digital textual scholarship

Well-known problems with digital resources in the humanities:
• fragmentation of communities, resources and tools;
• lack of connectedness and interoperability;
• sustainability of online services;
• lack of deployment of tools as reliable and available services.
There is a potential solution in distributed, federated infrastructure services.

Silos or fishtanks?

Let's talk about fishtanks rather than silos...
There are lots of fishtanks out there, some very elaborate, big, pretty...
But they're all in different places and unconnected.
And if I want to keep a fish, I have to build a fishtank (or put it in yours)...
And who's going to carry on feeding the fish?
Let's not all make our own fishtanks.

Wouldn't it be better to have an ecosystem where we can all set our fishes free?

You can access all of the riches of the deep, and it's a lot easier to get into fish research.

The CLARIN Vision
A researcher in Vienna, from his desktop computer, can:
• do a single sign-on, with local authentication, and then:
• search for, find and obtain authorization to use resources in Oxford, Prague and Berlin
• select the precise dataset to work on, and save that selection
• run semantic analysis tools from Budapest and statistical tools from Tübingen over the dataset
• use computational power from the local, national or other computing centre where necessary
• obtain advice and support for carrying out all technical and methodological procedures
• save the workflow and results of the analysis, and share those results with collaborators in Paris, Edinburgh and Zagreb
• discuss, iteratively adapt and re-run the analyses with collaborators
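The federated search step in this vision can be illustrated with a toy model. The Python sketch below sends one query to several sites in parallel and merges the hits, tagging each with its origin. The sites are hypothetical in-memory stand-ins named after the cities above; real infrastructure services would involve remote protocols, authentication and authorization, none of which is modelled here, and make_site and federated_search are illustrative names only.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# A "site" is modelled as a function from a query string to matching
# document identifiers. Hypothetical stand-in for a remote search service.
SearchFn = Callable[[str], List[str]]


def make_site(documents: Dict[str, str]) -> SearchFn:
    """Build a toy case-insensitive substring search over {doc_id: text}."""
    def search(query: str) -> List[str]:
        q = query.lower()
        return [doc_id for doc_id, text in documents.items() if q in text.lower()]
    return search


def federated_search(sites: Dict[str, SearchFn], query: str) -> List[Tuple[str, str]]:
    """Send the same query to every site in parallel and merge the hits,
    tagging each hit with the name of the site it came from."""
    results: List[Tuple[str, str]] = []
    with ThreadPoolExecutor(max_workers=max(1, len(sites))) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in sites.items()}
        for name, future in futures.items():
            results.extend((name, doc_id) for doc_id in future.result())
    return sorted(results)


if __name__ == "__main__":
    # Hypothetical toy collections, for illustration only.
    sites = {
        "Oxford": make_site({"ox-1": "The state of the realm", "ox-2": "A Tudor chronicle"}),
        "Prague": make_site({"pr-1": "Reports on the state"}),
        "Berlin": make_site({"be-1": "Letters and diaries"}),
    }
    print(federated_search(sites, "state"))
```

Keeping the site name on every hit preserves provenance, which matters later when results are shared with collaborators who need to know where each text came from and whether they may access it.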

Three barriers to the digital revolution

1. Technical, legal and administrative barriers, and a lack of connectivity in a fragmented environment (silos and fishtanks)
2. Digital resources make it easy to do bad research
3. Some of the methods and traditions of the Humanities make it difficult to do e-Research

A new opportunity

"It is not easy to justify assertions about the alleged frequency or infrequency of some particular belief or attitude in the past. How many examples does one need to cite in order to prove the point? Lacking any satisfactory method of quantifying these matters, all I can do is to record my impressions after long immersion in the period."

Thomas, Keith, The Ends of Life, Oxford University Press, 2010.

Intellectual History
“We cannot hope to understand the behaviour of people
long dead, unless we can reconstruct the mental
assumptions which led them to act as they did.”
Thomas, Keith, The Ends of Life, Oxford University Press, 2010.

Evidence:
● writing
● speech
● thoughts
● actions
● artefacts (art, architecture, cooking, etc.)
● other?

Some testable assertions

State

“...no political writer before the middle of the sixteenth century used the word 'state' in
anything like its modern political sense [referring to the machinery of government and
social control]” (Skinner, Quentin, The Foundations of Modern Political Thought,
Cambridge University Press, 1978).

Tudor

“The idea of a "Tudor era" in history is a misleading invention, claims an Oxford University
historian. Cliff Davies says his research shows the term "Tudor" was barely ever used
during the time of Tudor monarchs.” (http://www.bbc.co.uk/news/education-18240901
May 2012)

Holocaust

“I will argue that ‘The Holocaust’ is an ideological representation of the Nazi
holocaust... Until recently, however, the Nazi holocaust barely figured in American life.
Between the end of World War II and the late 60s, only a handful of books and films
touched on the subject” (Norman Finkelstein, The Holocaust Industry, Verso, 2000).
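Assertions like these are, in principle, testable against a large dated corpus: if Davies is right, for example, the word "Tudor" should be rare in texts from the Tudor period itself. A minimal Python sketch of such a test follows. It assumes a hypothetical corpus supplied as (year, text) pairs, a crude tokeniser that treats runs of ASCII letters as words, and a search pattern written as a regular expression (for example r"tudors?") so that simple variant forms are caught; real historical spelling variation would need far more than this. The function name and the toy data are illustrative only.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Crude tokeniser (an assumption): a "word" is a run of ASCII letters.
WORD = re.compile(r"[A-Za-z]+")


def frequency_per_period(texts: Iterable[Tuple[int, str]], pattern: str,
                         period: int = 10) -> Dict[int, float]:
    """Return occurrences of `pattern` per million words, grouped by period.

    `texts` is an iterable of (year, text) pairs. Years are bucketed into
    periods of `period` years, e.g. with period=10, 1537 falls in 1530.
    Each token must match `pattern` in full, ignoring case.
    """
    target = re.compile(pattern, re.IGNORECASE)
    hits: Dict[int, int] = defaultdict(int)
    words: Dict[int, int] = defaultdict(int)
    for year, text in texts:
        bucket = year - year % period
        tokens = WORD.findall(text)
        words[bucket] += len(tokens)
        hits[bucket] += sum(1 for t in tokens if target.fullmatch(t))
    # Normalise by corpus size so that periods with more surviving text
    # do not look more "Tudor" simply because they are bigger.
    return {b: hits[b] * 1_000_000 / words[b] for b in sorted(words) if words[b]}


if __name__ == "__main__":
    # Hypothetical toy corpus, for illustration only (not real data).
    corpus = [(1535, "The king of England and his realm"),
              (1538, "Tudor rose"),
              (1602, "The Tudor dynasty ended")]
    print(frequency_per_period(corpus, r"tudors?"))
```

Normalising to occurrences per million words matters because the amount of surviving text can differ greatly from one period to another; raw counts can end up reflecting corpus size rather than how often a word was used.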

The perils of interpretation
How do we interpret the results? We need to ask these questions:
● What's in my dataset? What's missing?
● What will the sampling procedure miss?
● What population of texts in the world can I make claims about by searching this dataset?
● What is the right tool for the job?
● Will I successfully retrieve all occurrences of the word forms which I am looking for?
● How can I make my search term more sophisticated?
● What claims can I make about the significance of the frequencies?
● How can I improve the process and refine the results?
● What do I need to go on to investigate further?
● How can I share my results and methods?
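For the question about the significance of frequencies, one widely used option in corpus linguistics is Dunning's log-likelihood statistic (G2), which compares how often a word occurs in two corpora of different sizes. The Python sketch below implements the two-cell form commonly used for this; it is one possible technique, not a method proposed in this talk, and the function names are illustrative. A G2 value above about 3.84 corresponds to p < 0.05 with one degree of freedom.

```python
import math

# Chi-square critical value for p < 0.05 with one degree of freedom.
CRITICAL_P05 = 3.84


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """Dunning's log-likelihood (G2), two-cell form.

    a: occurrences of the word in corpus 1;  c: total words in corpus 1
    b: occurrences of the word in corpus 2;  d: total words in corpus 2
    """
    total = a + b
    e1 = c * total / (c + d)  # expected count in corpus 1
    e2 = d * total / (c + d)  # expected count in corpus 2
    g2 = 0.0
    for observed, expected in ((a, e1), (b, e2)):
        if observed > 0:  # the term 0 * ln(0) is taken to be 0
            g2 += observed * math.log(observed / expected)
    return 2 * g2


def is_significant(a: int, b: int, c: int, d: int) -> bool:
    """True if the difference is significant at p < 0.05 under this test."""
    return log_likelihood(a, b, c, d) > CRITICAL_P05
```

A significant G2 only says that the difference is unlikely to be due to chance under the test's assumptions, which treat every word as an independent draw; it says nothing about the sampling questions above, such as what the dataset is missing.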

Data-intensive Humanities

Three barriers to the digital revolution

1. Technical, legal and administrative barriers, and a lack of connectivity in a fragmented environment (silos and fishtanks)
2. Digital resources make it easy to do bad research
3. Some of the methods and traditions of the Humanities make it difficult to do e-Research

What is digital scholarship in the Humanities?

Issues and assumptions in e-Science:
• Consensus (and compromise) about funding priorities
• Adoption of technical standards
• Standards for the representation of knowledge and interpretations
(agreement on concepts and categories!)
• Reproducibility and replicability of research
• Sharing of generic tools
• Curation of tools and data in professional service centres
• Support for software sustainability
• Promotion of interoperability of resources and tools
• Sharing research outputs
• Research leading to an accumulation of knowledge
• Increasingly data-driven research

In defence of the enlightenment

"[There is] a monolithic conception of social space, according to which
it would suffice to have the right information to make the right decisions.
But in point of fact, information itself is far from homogenous and no
purely quantitative approach is satisfying. Having ever greater amounts
of information at our fingertips not only does not make us more
virtuous, as Rousseau already predicted, but it does not even make us
more knowledgeable."
[Tzvetan Todorov, In Defence of the Enlightenment, 2009]

Steering a difficult path

• One extreme: e-Humanities is like e-Science: data-driven,
empirical, evidence-based, practical, based on shared
facilities, tools, resources and methods
• Another extreme: it is in the intrinsic nature of the
Humanities that we should constantly question the basic
received ideas and categories, and therefore we should not
and cannot expect to have shared assumptions and methods
• Can Digital Humanities steer a route in between?

The simple challenge then...

“The challenge for CLARIN is to create an ecosystem in which better
hermeneutically-informed software can be created and deployed. This would
enable the sharing of resources with provisional, ad hoc, but agreed categories
for representing our analyses and interpretations. Criticism and scholarship
relating to the nature of the interpretations implicit in these annotations would
not come to a halt, but achieving agreements such as these would remove
barriers to the creation of large-scale shared digital facilities.
We need to follow the sciences in deciding priorities, adopting standards,
reducing complexity and variety, but only as pragmatic measures to promote
shared facilities and infrastructures. At the same time, we need to avoid the
promotion of an excessively data-driven, empirical and scientistic view of the
humanities, and continue to defend the traditions of qualitative research in the
humanities, and pursue the humanities for their own sake.”

Read more...

'Silos or Fishtanks?'
http://blogs.it.ox.ac.uk/martinw/2012/04/06/silos-or-fishtanks/

'The Role of CLARIN in Digital Transformations in the Humanities'
Martin Wynne
International Journal of Humanities and Arts Computing 7.1-2 (2013):
89–104
DOI: 10.3366/ijhac.2013.0083
Edinburgh University Press 2013

The 'take-home messages'

In the era of the data deluge, web science and digital scholarship,
there is the possibility to transform the study of the Humanities...
...but we need to do some hard technical and legal stuff...
...and some even harder social and intellectual stuff, because...
...some aspects of traditional scholarship in the Humanities make
digital research difficult, and some aspects of emerging e-research
make humanistic enquiry difficult.
Or, to put it another way:
Can we follow the example of the physical
sciences in deciding priorities and adopting
standards, reducing complexity and variety,
to promote shared facilities and
infrastructures?