Ladies and gentlemen, dear colleagues, I would like to say a few words about the Manuscriptorium digital library and about the challenges and perspectives that it brings to the humanities. Since the humanities cover a wide range of branches and disciplines, I will speak more concretely about the historical auxiliary sciences, and especially about codicology. Of course, the Manuscriptorium digital library gathers and provides material for many other fields of history, but I will leave that for another occasion.
The Manuscriptorium digital library is an integrated resource provided by the National Library of the Czech Republic and technically maintained by the company AIP Beroun Ltd. It contains compound digital documents, or, as the case may be, distributed compound digital documents, that consist of catalogue records and digital images and in some cases also of full texts. There are other ways to build a compound digital document and other parts it might contain, but this is its typical and perhaps simplest form. For example, when music is to be represented by a compound digital document, we imagine that it would contain not only images of the original manuscript and an electronic representation of the music notation but also audio of a musical performance. Technically this is quite feasible, and creating such a document poses no substantial problems. The copyright of the performers, however, presents great difficulties, so that it can be, and is, too expensive for routine work. Moreover, creating full texts is difficult enough, because it is highly sophisticated and consequently very slow work. And last but not least, the digitization of manuscripts is not a cheap business. Therefore, providing images alone in critical mass is a high standard for now, and providing full texts as well is a big added value at present.
As I have already said, the Manuscriptorium digital library is an integrated resource. It covers several levels of data aggregation. Firstly, it is a result of the digitization activities of the National Library of the Czech Republic, which had their very beginnings in 1992 and started systematically in 1995. Secondly, it is an output of the Czech national digitization programme called Memoriae mundi series Bohemica, funded since 1999. Thirdly, it is the first step towards international cooperation in the field of manuscript digitization, which began within the CEE MASTER project in 2002. Fourthly, it is a consequence of the European ENRICH project, which started in 2007 and whose main aim was to create a base for the European digital library of manuscripts, which was done successfully. And fifthly, there is a global level based on the wide cooperation activities of the National Library of the Czech Republic, e.g. effective contacts with Turkey or Kazakhstan. Thus, the aim of the Manuscriptorium digital library is to become a global player in providing the results of manuscript digitization worldwide.
By providing a huge amount of data, the Manuscriptorium digital library makes it possible to enter the field of digital codicology. What is digital codicology? This question can be answered through a comparison with codicology in its traditional understanding. Codicology is a historical auxiliary science whose name is based on the Latin word codex, which is one of the basic material forms (a so-called avatar) of text transmission, like the book. Although traditional codicology does not deal exclusively with the carrier, that is, the material container of the informational content, it deals with the external attributes or formal features of the text rather than with its internal attributes or content features. Consequently, traditional codicology is oriented towards the study of individual phenomena that are purely concrete. Digital codicology, on the other hand, is oriented towards the study of many entities together, so that it typically deals with mass phenomena. Thus, quantification and quantitative evaluation are crucial for digital codicology. Consequently, digital codicology is very close to the study of transcultural and intercultural problems, while traditional codicology is limited rather to intracultural study alone. And last but not least, whereas traditional codicology is a highly specialized discipline with few relations to other disciplines, such as palaeography, digital codicology is open to cooperation with many more disciplines. This is a big challenge, because the transition from traditional codicology to digital codicology presupposes a paradigm shift.
A substantial difference between traditional and digital codicology is that digital codicology is based on the use of technical tools. We are now only at the very beginning of digital codicology, so the number of such tools is very limited, but some of them have already been created and tested, i.e. they really exist. Thus, at present there are three types of technical tools enabling work in the field of digital codicology that are implemented in the Manuscriptorium digital library, or at least are going to be implemented in the near future. Firstly, there are tools for heuristics, such as the tool for creating virtual collections, i.e. for selecting and organizing historical sources. Secondly, there are tools for preparing intermediates, such as the tool for creating virtual documents, i.e. for arranging historical sources in detail. Thirdly, there are tools for comparing texts, such as the text comparator based on vector statistics, i.e. a tool that outputs results for immediate interpretation and publication. These tools (with the exception of the text comparator, which exists only as a prototype now but will be implemented next year, or in 2012 at the latest) can be used after creating a Manuscriptorium personal account, which gives access to the Manuscriptorium virtual research area.
A tool for creating virtual collections cannot be used as an off-line tool. It must be implemented among the utilities of an integrated and networked digital resource as an on-line tool that uses both the documents of the resource in which it is implemented and the documents of other resources. It enables the selection of documents from an original, i.e. not yet integrated, resource. The selected documents can be arranged into one or more virtual collections according to one or more parameters. There are two types of virtual collections, static and dynamic. A static virtual collection is based on manual selection, i.e. it is rather closed and finished, although in principle it can be enriched with added documents in the future. On the other hand, when a high number of documents is selected, it is not easily manageable. A dynamic virtual collection is based on saving a query and running a new search according to this query every time the collection is opened. Thus, it reflects all updates of the resource database. Virtual collections can also be changed, i.e. a new virtual collection can be made from already existing virtual collections. The tool for creating virtual collections is relatively new; it was implemented in January this year. Consequently, there is too little feedback for now. Nevertheless, we have tested it on quite real research topics, and its usability seems to be good.
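The difference between the two types of collection can be illustrated with a minimal sketch. It does not reproduce the Manuscriptorium implementation: the names `Document`, `StaticCollection` and `DynamicCollection`, and the fields `language` and `century`, are hypothetical and serve only to show that a static collection stores a fixed selection while a dynamic collection stores a query and evaluates it again each time it is opened.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(frozen=True)
class Document:
    # Hypothetical, heavily simplified catalogue record.
    doc_id: str
    language: str
    century: int


@dataclass
class StaticCollection:
    # A static collection remembers the manually selected identifiers.
    doc_ids: List[str] = field(default_factory=list)

    def documents(self, catalogue: List[Document]) -> List[Document]:
        by_id = {d.doc_id: d for d in catalogue}
        return [by_id[i] for i in self.doc_ids if i in by_id]


@dataclass
class DynamicCollection:
    # A dynamic collection remembers only the saved query.
    query: Callable[[Document], bool]

    def documents(self, catalogue: List[Document]) -> List[Document]:
        # The query is run again on every call, so new records appear.
        return [d for d in catalogue if self.query(d)]


catalogue = [Document("A", "lat", 14), Document("B", "cze", 15)]
static = StaticCollection(["B"])
dynamic = DynamicCollection(lambda d: d.century == 15)

# A new record is added to the resource after both collections exist.
catalogue.append(Document("C", "lat", 15))

print([d.doc_id for d in static.documents(catalogue)])
print([d.doc_id for d in dynamic.documents(catalogue)])
```

In this sketch the static collection still contains only the manually chosen record after the catalogue grows, while the dynamic collection also picks up the newly added record that matches the saved query.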
A tool for creating virtual documents is in fact a module of the tool for creating distributed compound digital documents, which is implemented as a regular Manuscriptorium on-line service available through a Manuscriptorium personal account. This is very important, because on the one hand it makes the tool easier to use, and on the other it allows us to understand virtual reality not as a suspect manipulation but as a hidden aspect of actual reality. In short, a virtual document is nothing but another arrangement of partial document entities or items. It can be used both as a purely heuristic tool for arranging material during the interpretation process and as a tool for preparing evidence during the process of electronic publishing on the net. Of course, spiritus ubi vult flat, so tools alone are not enough to bring scholarly results. On the other hand, no scholarly work is possible without efficient and effective tools, including purely technical ones. The tool for creating virtual documents is also new; it was implemented in the Manuscriptorium platform at the same time as the tool for creating virtual collections, i.e. in January this year. Thus, there is little feedback here, too. Nevertheless, testing has shown that it is sufficient for practical scholarly work, although it will of course be improved in the future.
At present, the last tool that leads from traditional codicology to digital codicology is a tool for the comparison of full texts. Originally and in principle, it is a tool built within the framework of computational linguistics for philological work, but it can also be used outside the philological area. It is a tool that facilitates work with internal attributes or content features, which is a crucial step both towards enlarging traditional codicological themes and towards interdisciplinarity. The comparison of full texts is based on vector statistics, i.e. it compares not simply words but word strings or text strings. A big problem is that in the past stages of language evolution, language systems, not only in Latin but especially in the vernacular languages, lacked any conception of strict prescription, i.e. languages in the past used neither a prescriptive grammar nor a normative orthography. Thus, more sophisticated techniques and methods, such as the handling of graphical variants (for texts that use no normative orthography), must be used for the comparison to be effective and successful. The tool works with standard formats such as XML, HTML, and TXT. For now, the full text comparator is only an off-line prototype, not yet implemented in the Manuscriptorium platform. The representation of results is only numerical and statistical, but after the tool is implemented as a regular on-line service (expected in 2011 or early 2012), it will be enriched with a graphical representation of results, too. The tool for the comparison of full texts is an important step into the virtual research environment and into the area of digital codicology.
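To make the idea of comparing word strings as vectors more concrete, here is a minimal sketch. It is not the Manuscriptorium comparator, whose method is not described here in detail. It assumes one common technique: counting word pairs (bigrams) in each text and measuring the cosine similarity of the two count vectors. The table `VARIANTS` is a purely hypothetical stand-in for the handling of graphical variants, and all function names are illustrative.

```python
import math
from collections import Counter

# Hypothetical table of graphical variants folded to one form
# (an assumption for illustration, not a philological standard).
VARIANTS = {"v": "u", "j": "i", "y": "i"}


def normalize(text: str) -> str:
    # Lower-case, replace non-letters by spaces, fold spelling variants.
    letters = (ch if ch.isalpha() else " " for ch in text.lower())
    return "".join(VARIANTS.get(ch, ch) for ch in letters)


def word_ngrams(text: str, n: int = 2) -> Counter:
    # Count overlapping strings of n consecutive words.
    words = normalize(text).split()
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    # Cosine of the angle between two sparse count vectors.
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def compare(text_a: str, text_b: str, n: int = 2) -> float:
    return cosine_similarity(word_ngrams(text_a, n), word_ngrams(text_b, n))


print(compare("In principio erat verbum", "Jn principio erat uerbum"))
print(compare("In principio erat verbum", "Arma virumque cano Troiae"))
```

Because spelling variants are folded before counting, two witnesses of the same passage that differ only in orthography score as nearly identical, while unrelated passages that share no word pairs score zero. The output is purely numerical, which corresponds to the numerical and statistical representation of results mentioned above.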
Ladies and gentlemen, dear colleagues, thank you very much for your attention!
Manuscriptorium Digital Library and Digital Codicology. Zdeněk Uhlíř, National Library of the Czech Republic. New York, 17.06.2010
Manuscriptorium: catalogue records; digital images; full texts; perhaps other types of material in future
Manuscriptorium network, several levels: National Library of the Czech Republic; Czech Lands: Memoriae mundi series Bohemica; Central Europe: CEE MASTER; Europe: ENRICH; Global: contacts outside Europe
Digital codicology: mass vs. individual phenomena; quantification of specific features vs. dealing with individual features; trans- and inter-cultural vs. intra-cultural study; interdisciplinarity vs. specialization
Support tools: tools for heuristics; tools for creating intermediates; tools for data comparison