From Literary and Linguistic Computing to Digital Humanities: a look back at 40 years of relations between the humanities and computing
By Lou Burnard. All rights reserved

Published in: Education

  • The programming to integrate the datasets virtually and to develop applications for accessing them easily has been done here in Oxford over the past year, thanks to a grant from the Fell Fund. Only one post was funded, and only for one year, but the work was so exciting that others gave freely of their time. In addition to those who are speaking today, I would like to acknowledge the work of Sebastian Rahtz in OUCS, who programmes the LGPN, Reinhard Foertsch and Robert Kummer in Cologne, and Anne-Violaine Szabados in Paris. My vision for the future sees CLAROS in museums, libraries, universities, schools and homes. I see a global registry of antiquities and I see public participation in creating it. That future is now and the University of Oxford is leading it.
  • Graham…. The CLAROS system provides what we might call a search and discovery service over the different data sources. They have each developed independently over many years and each uses different underlying technologies to deliver information about the same very large subject. Beazley has pottery, engraved gems, sculpture in the form of plaster casts, and antiquarian books and photographs. Arachne and DAI have sculpture, antiquarian books and photographs. LIMC has all types of classical art with mythological subjects. LGPN has personal names of ancient Greeks.
  • Each also uses different technologies and different vocabularies to represent information. CLAROS needed a way to bring them together. Instead of developing our own bespoke system, we looked very carefully at the work that had already been done and found two standards that seemed to suit our needs. The first is CIDOC-CRM. This is an ISO standard that was developed under the aegis of UNESCO for ICOM – for the world’s museums. It provides a flexible framework and vocabulary, and seems to offer greater potential for creating a common framework than other systems we know, such as OAI, CDWA and MuseumDat. It also seems to be well suited to other domains, such as describing scientific experiments – the type of work that I usually do with computers. And this seems to us to be a good thing. The second is RDF. This is a World Wide Web standard, a kind of universal data format, and it is the base format for current work on the next-generation web, often called a “Semantic Web” or “Web of Data”.
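
The idea in the notes above, mapping independently developed sources onto one shared vocabulary expressed as RDF-style statements, can be sketched roughly as follows. This is a minimal illustration only, not the CLAROS implementation: the record field names, identifiers and values are invented, the `crm:` prefix merely stands in for the CIDOC-CRM namespace, and plain strings are used where a real mapping would use URIs and typed nodes.

```python
# Hypothetical source records: field names and values are invented for
# illustration and do not reflect the real Beazley or Arachne schemas.
beazley_record = {"id": "vase-1", "Shape": "amphora", "Collection": "Oxford"}
arachne_record = {"objekt_id": "obj-7", "kategorie": "amphora", "standort": "Berlin"}

CRM = "crm:"  # placeholder prefix standing in for the CIDOC-CRM namespace


def to_triples(subject, object_type, location):
    """Express one object as (subject, predicate, object) statements."""
    return [
        (subject, "rdf:type", CRM + "E22_Man-Made_Object"),
        (subject, CRM + "P2_has_type", object_type),
        (subject, CRM + "P55_has_current_location", location),
    ]


def map_beazley(rec):
    # Each source gets its own small mapping onto the shared vocabulary.
    return to_triples("beazley:" + rec["id"], rec["Shape"], rec["Collection"])


def map_arachne(rec):
    return to_triples("arachne:" + rec["objekt_id"], rec["kategorie"], rec["standort"])


def find_subjects(graph, predicate, obj):
    """Return every subject that has the given predicate and object."""
    return sorted(s for s, p, o in graph if p == predicate and o == obj)


# Once both sources use the same predicates, one query covers them all.
graph = map_beazley(beazley_record) + map_arachne(arachne_record)
amphorae = find_subjects(graph, CRM + "P2_has_type", "amphora")
print(amphorae)
```

The design point is that each source keeps its own schema and technology; only the small per-source mapping functions need to know about it.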
  • Transcript

    • 1. Beyond the Document Lou Burnard
    • 2. The message
      • The metaphor of the digital book is so pervasive that we can barely see it.
      • But going digital is not only about producing cheaper and more accessible simulations of printed or painted pages.
      • Digital applications should enable us to do more with a text than simply read it from beginning to end, or attach annotations to it for others to read, or link it to other digital texts
      • We are at last moving beyond the document, towards a distributed world, in which “the books in the library can talk to each other”
    • 3. Plan
      • What's that noise in the digital library?
      • From Literary and Linguistic Computing to Humanities Computing to Digital Humanities
      • A classical case study
      • What should we be proud of?
    • 4. Three simple truths
      • There is no going back : the knowledge infrastructure is now irrevocably digital
      • The business models of the knowledge infrastructure have changed irrevocably
      • The quantitative changes facilitated by digital technologies approximate qualitative change
    • 5. Irrevocable digitality
      • The objects of Humanities scholarship are now digital, even if its methods are not
      • And our methods are changing all around us...
      • We are moving from hyper text to hyper data
      • From a web of documents to a web of data
        • The technology is here (more or less)
        • The problems are mostly socio-politico-cultural
      • But first, a little history lesson
    • 6. Literary & Linguistic Computing
    • 7. 1960-1980
      • The Heroic age...
        • Father Busa and the Index Thomisticus
        • The Brown Corpus
        • Thesaurus Linguae Graecae
      • concordances, stylistic analysis, authorship studies, language corpora
      • technical barriers, impenetrable for all but the determined (or mad)
    • 8. LLC is also a journal, and an annual conference
    • 9. LLC is alive and well and living in France
      • Text as a statistical phenomenon
      • Factor analysis and data mining
      • Textométrie
    • 10. Humanities Computing
    • 11. 1980-1994
      • Institutionalization
      • Is Humanities Computing an Academic Discipline?
      • The “text encoding” project
    • 12. Institutionalization
    • 13. The rise of the HC centre
      • In the home, the eighties was a decade of technology that nearly worked
      • In academia, digital methods and resources, though perceived as alien and difficult, were also finding their place
      • In the UK
        • Computers in Teaching Initiative
        • Arts and Humanities Data Service
      • Something new, or something old done better?
    • 14. Communities
      • E-mail and e-mail lists: Humanist
      • Electronic Text paradigms
        • Oxford Text Archive
        • Project Gutenberg
      • NLP (TALN)
      • Public funding becomes important
        • Computers in Teaching Initiative (CTI)
      • And private enterprise is curious
        • Electronic Publishing SIG
    • 15. The challenge for HC
      • Once we have made our digital surrogates, what then?
      • Traditions (“scholarly primitives”)
        • finding by means of external characteristics
        • analysing by means of internal features
        • associating by means of shared perceptions
      • What tools and methods will help combine these approaches?
      • What theory will inform their application?
    • 16. Resources (diagram labels: digital resources, encoding, analysis, abstract model)
    • 17. Transmitting our interpretations
      • scholarship depends on continuity
      • it is not enough to preserve the bytes of an encoding
      • there must also be a continuity of comprehension: the encoding must be self-descriptive
    • 18. TEI: the main achievement of HC?
      • Originally a response to the multiplicity of formats and lack of standards
      • The TEI emerged as a single, encyclopaedic model of the “significant particularities” of textual resources
      • And also an adaptable architecture able to respond to changing needs and priorities
    • 19. Digital Humanities
    • 20. 1995 - ?
      • While we were talking about the theory....
        • digital libraries
        • mass digitization
        • commodity computing, folksonomies, cloud computing...
      • Convergence and collaboration
        • rethinking scholarly editing
        • redefining the discipline
      • New infrastructures?
    • 21. The rise of the digital library
      • “Public good” digitization efforts
        • From Gallica to JISC Digitisation Programme
      • The metadata challenge
        • Authority and link-rot: Resource Discovery Network to Intute
        • From Dublin Core to OAI-PMH
        • Can systems be self-organizing?
      • What is the right business model?
    • 22. An alternative model
      • What works for software could work equally well for digital resources
        • When programmers can read, redistribute, and modify the source code for a piece of software, the software evolves. People improve it, people adapt it, people fix bugs.
        • When developers can access, redistribute, and enhance the digital resources underlying a digital application, new applications can evolve. People can add value, people can adapt it, people can fix bugs.
    • 23. Open up the data warehouse!
    • 24. Digital humanities manifesto 2.0 Digital Humanities is not a unified field but an array of convergent practices that explore a universe in which: a) print is no longer the exclusive or the normative medium in which knowledge is produced and/or disseminated; instead, print finds itself absorbed into new, multimedia configurations; and b) digital tools, techniques, and media have altered the production and dissemination of knowledge in the arts, human and social sciences.
    • 25. ibid... Digital Humanities implies the multi-purposing and multiple channeling of humanistic knowledge : no channel excludes the other. Its economy is abundance based, not one based upon scarcity .... though notions of humanistic research are everywhere under institutional pressure, there is (potentially) plenty for all. And, indeed, there is plenty to do.
    • 26. The importance of not reading
      • “What can you do with a million books?” (Greg Crane)
      • “Although there is still a need for close-reading... we never don't not read” (John Unsworth)
      • A new synergy of methods:
        • Corpus linguistics
        • Pattern recognition
        • Data mining
    • 27. How to not read
      • We need to find ways of cross-searching, decomposing, and re-composing
        • rich xml documents
        • complex relational database structures
        • simple presentation-focussed websites
        • sound, image, video...
      • The challenge is to do this in an open and standards-compliant manner
      • And on a massive scale
    • 28. Escaping from the text
      • From footnote to hypertext
    • 29.  
    • 30. A classical case study
    • 31. CLAROS, for example
      • (Current) Partners
        • University of Oxford: Faculty of Classics
          • Beazley Archive: documentation of pottery, jewels, etc.
          • Lexicon of Greek Personal Names: attested names
        • University of Cologne
          • Arachne Archive: data about sculpture
        • German Archaeological Institute, Berlin
          • Images from archaeological sites
        • University of Paris X
          • Lexicon Iconographicum Mythologiae Classicae:
      • Over 2 million records and images
      • Four different database systems
    • 32. A mix of technologies...
      • Sources: Beazley Archive; DAI Arachne; LGPN (Oxford); LIMC (Paris)
      • Application layers: .NET / ASP; XSLT, PHP; Java; XSLT
      • Data stores: relational database (MS SQL Server); XML database; relational database (MySQL); relational database (MySQL)
      • Client: a browser in every case
    • 33. ...but a common conceptual model
    • 34. How does it actually work?
    • 35. What makes this possible?
      • It's not rocket science!
      • XML markup with a shared semantics (TEI)
      • Appropriate use of new technologies (e.g. Unicode, javascript)
      • A willingness to open up our data
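
A rough sketch of what “XML markup with a shared semantics” buys in practice: because `persName` and `placeName` mean the same thing in any TEI document, one generic query works on any source that uses them. The fragment below is invented for illustration and is not a complete, valid TEI document (it has no `teiHeader`); only the TEI namespace and the element names are taken from the standard.

```python
import xml.etree.ElementTree as ET

# The TEI namespace; element names below are standard TEI elements.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented sample fragment, for illustration only.
sample = (
    '<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>'
    "<p><persName>Theseus</persName> sailed to "
    "<placeName>Delos</placeName>.</p>"
    "</body></text></TEI>"
)

root = ET.fromstring(sample)

# The same two queries would work on any TEI text that tags names this way.
people = [e.text for e in root.iterfind(".//tei:persName", TEI_NS)]
places = [e.text for e in root.iterfind(".//tei:placeName", TEI_NS)]

print(people, places)
```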
    • 36. Rethinking the digital edition
      • The insights of critical editing/edition philology need to be re-discovered and re-applied in the new context
      • We need a new synergy of semiotics and hermeneutics
      • Combined with the traditional virtues of skepticism and empiricism
    • 37. Components of the digital edition
      • Manuscript page images
      • Annotated transcriptions
      • Critical (synthetic) edition
      • Modern translation and summary
      • Notes, glossary, foreword, bibliography, etc.
      • Manuscript descriptions and metadata
      • “Factoids” about the real world
    • 38. The textual trinity
      • Textual descriptions tend to focus on one of:
        • its linguistic nature (because texts are made of words used in particular ways)
        • its physical state (because texts are made up of glyphs arranged in particular ways)
        • its intentions (because texts are supposed to tell us something about the world)
      • Likewise, software tends to distinguish
        • document management and production systems
        • image management and production systems
        • database systems
    • 39. Convergence
      • But the digital agenda requires us to mash these things up: for example to combine
        • a GIS database about places in the Aegean sea
        • a historical gazetteer of placenames in the same area
        • a corpus of texts mentioning those placenames
      • TEI has recently expanded its scope to support this kind of convergence
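
The three-way combination listed above can be sketched as a simple join on placenames. Everything here is an assumption made for illustration: the coordinates are approximate, the name variants and the two short texts are made up, and the whole-word matching is far cruder than anything a real project would use.

```python
# Stand-in for a GIS database: canonical place -> approximate (lat, lon).
gis = {"Naxos": (37.10, 25.38), "Delos": (37.39, 25.27)}

# Stand-in for a historical gazetteer: canonical place -> name variants.
gazetteer = {"Naxos": ["Naxos", "Naxia"], "Delos": ["Delos", "Dilos"]}

# Stand-in for a text corpus (invented sentences).
corpus = {
    "t1": "The fleet sailed from Naxia to Dilos.",
    "t2": "Offerings were sent to Delos.",
}


def locate_mentions(texts, gazetteer, gis):
    """Link each text to the places it mentions and their coordinates."""
    # Invert the gazetteer so that any variant resolves to its canonical place.
    variant_to_place = {
        variant.lower(): place
        for place, variants in gazetteer.items()
        for variant in variants
    }
    hits = set()
    for text_id, text in texts.items():
        # Naive tokenisation: split on whitespace, strip common punctuation.
        words = {w.strip(".,;:!?").lower() for w in text.split()}
        for variant, place in variant_to_place.items():
            if variant in words:
                hits.add((text_id, place, gis[place]))
    return sorted(hits)


mentions = locate_mentions(corpus, gazetteer, gis)
for text_id, place, coords in mentions:
    print(text_id, place, coords)
```

The gazetteer acts as the bridge: the texts know only names and the GIS knows only coordinates, so neither resource needs to change for the combination to work.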
    • 40. Conclusions
    • 41. A key role for the Humanities
      • We know about textual objects
        • how is this discourse represented?
        • what stories does it tell?
      • We know about hermeneutics
        • what does this discourse mean?
        • what does it say aside from its denotational content?
      • This is our contribution to the semantic web