Digital.Humanities@Oxford Summer School, 3 July 2012 Humanities Research Data – Rate me! Wolfram Horstmann
The Research Data Question http://www.flickr.com/photos/desconciertos/160752180/ Data-driven research is called the 4th Paradigm in the sciences. Where are the humanities in the current discussion about research data?
Ratings, Skepticism & Anxiety http://www.flickr.com/photos/komoda/7187391601/ The Research Excellence Framework is a reality. But some object: “Humanities research threatened by demands for economic impact” (Guardian, 13 October 2009).
Outline The current awareness of the importance of research data provides opportunities for the humanities to show their value. ~ The challenge is to communicate what research data means for the humanities. ~ The proposal is to state the obvious more clearly: texts and images as the research data of the humanities, and libraries as humanities research facilities.
Texts and Images as Data http://www.Flickr.com/photos/gorgmorg/9944210/ The humanities work with texts and images as other subject areas work with matter, wetware, hardware or numbers.
Libraries as Research Facilities http://vi.sualize.us/carl_spitzweg_bucherworm_1850_books_library_ladder_reading_picture_2Qp9.html The humanities institutionalized their research facilities centuries ago; other subject areas did so much later, with labs and centers like CERN or EMBL.
The Advent of the Digital http://www.flickr.com/photos/flex/27334821/ http://tei.oucs.ox.ac.uk/Talks/2008-08-kazan/exercise-2.xml http://www.bodley.ox.ac.uk/librarian/rpc/manchesterpres/slide15.jpg Transforming the physical research facilities into digital ones is a laborious and expensive exercise, and its potential is not yet exploited.
Digital Humanities & Libraries http://adamcrymble.blogspot.com.es/2012/01/is-old-bailey-online-film-or-science.html World Data Centers or the EBI are centralized. Can Humanities Data Centers exist at each institution?
Digital Resources in the Bodleian ~ approaching petabyte scale of highly structured storage for texts and images ~ 2,000,000 digitized images, another million to come in the next 3 years, plus 350,000 Google Books (REFERENCE MISSING) ~ 100 virtual machines … and by far most of these are resources of the humanities.
Cultures of Knowledge h>p://www.history.ox.ac.uk/coj/ An example of highly structured, intellectually curated data: more than 12,000 unique people and 3,500 locations identified in 60,000 letters with 25,000 annotations.
What’s the Score? http://www.whats-the-score.org/ In only a few months, over 10,000 scores have been described by the public.
Broadside Ballads http://ballads.bodley.ox.ac.uk Collaborative research introduces novel qualities into humanities research data management.
Size matters! http://randommization.com/2011/03/08/library-has-giant-books-for-facade/ Even though the humanities often use qualitative and hermeneutic rather than quantitative methodology, the size of their data is significant.
Structure matters! http://cacm.acm.org/magazines/2010/4/81499-the-data-structure-canon/fulltext Sizable numbers alone will not give a thorough idea of digital humanities data; structure is equally important. This can only be understood by example.
Collaboration matters! http://www.flickr.com/photos/ludovicmauduit/2646525907 Involving colleagues in collaborative research and the public in crowdsourcing makes a difference.
1st Challenge: Diversity http://www.ucl.ac.uk/archaeology/studying/undergraduate/courses/ARCL2037 The humanities have a varied typology of research data, often requiring idiographic approaches. Thus, standardization is difficult (cf. citation), and so is finding computational skills.
2nd Challenge: Openness http://www.flickr.com/photos/uncene/364730693/ As for all researchers, competition, privacy and exploitation are impediments to data sharing. Do the humanities, more than others, keep the “ivory tower” attitude?
Accessibility of Humanities Texts Waltinger, U., Mehler, A., Lösch, M., & Horstmann, W. (2011). Hierarchical Classification of OAI Metadata Using the DDC Taxonomy. In Chambers et al. (Eds.), Advanced Language Technologies for Digital Libraries (Vol. 6699, pp. 29-40). Berlin / Heidelberg: Springer. Lösch, M., Waltinger, U., Horstmann, W., & Mehler, A. (2011). Building a DDC-annotated Corpus from OAI Metadata. Journal of Digital Information, 12(2). From some 30,000,000 bibliographic records it is hard to fill the humanities corpus. This might constrain the discoverability of humanities resources.
3rd Challenge: Inherent Obstacles Humanities research data show some peculiarities. An extreme example is the closure of archaeological data to protect sites against tomb raiders. Hogenaar, A., H. Tjalsma, & M. Priddy. 2011. “Research in the Humanities and Social Sciences.” http://dx.doi.org/10.2390/PUB-2011-7
4th Challenge: Implementing Policy Deposit of resources or datasets: “Grant Holders in all areas must make any significant electronic resources or datasets created as a result of research funded by the Council available in an accessible and appropriate depository for at least three years after the end of their grant. The choice of depository should be appropriate to the nature of the project and accessible to the targeted audiences for the material produced.” http://www.ahrc.ac.uk/FundingOpportunities/Documents/Research%20Funding%20Guide.pdf Funders' policies are one approach to opening up data, but the humanities produce much data outside of the regular project life cycle.
1st Opportunity: Public Understanding http://www.queenvictoriasjournals.org/home.do Humanities research data are often more easily understood by the public than science data. The “Impact Regime” may even be an advantage for the humanities.
2nd Opportunity: Cultural Heritage http://www.europeana.eu/portal/ Humanities research data are more likely to be accessed and preserved than research data in other subject areas.
3rd Opportunity: Infrastructure National Library of China The infrastructure requirements for many humanities research data resemble those of digital libraries. No new research facilities have to be built.
4th Opportunity: New Metrics http://newsinfo.iu.edu/pub/libs/images/usr/9584_h.jpg It is likely that humanities research data have a web impact advantage. High societal interest could result in higher webometric and usage statistics ratings.
Another mindset? …to see texts & images as humanities research data. ~ …to see the humanities as data intensive. ~ …to see a web impact advantage for the humanities. ~ …to see libraries as humanities research facilities.
Recommendations Exploit the good accessibility of humanities research themes through newspapers, exhibitions, crowdsourcing and citizen science. ~ Make as many research outputs web accessible as possible. ~ Invest in and support new metrics such as usage statistics and web impact. ~ Strengthen partnerships between the humanities, other disciplines and libraries.