Digitization of Documentary Heritage Collections in Indic Language: Comparative Study of Five Major Digital Library Initiatives in India
Dr. Anup Kumar Das
Jawaharlal Nehru University (JNU), New Delhi, India
http://www.anupkumardas.blogspot.in/
Presented at the International Conference on the Memory of the World in the Digital Age: Digitization and Preservation, 26-28 September 2012, Vancouver, British Columbia, Canada
Outline
• Introduction
• Indicative Multilingual DL Initiatives in India
• Digital Library of India (DLI) project
• IGNCA maintained Digital Libraries
• National Mission for Manuscripts
• DL Initiatives with Single Indic Language Contents
• Challenges Ahead
• Examining Semantic Web Principles
• Conclusion
Introduction
• Article 6 of the UNESCO Universal Declaration on Cultural Diversity: “Towards Access for All to Cultural Diversity”.
• Mandates of Networked Knowledge Societies.
• Digital libraries (DLs) as a vehicle for widely disseminating documentary heritage.
• Indian DL initiatives aim to produce a vast amount of multilingual, multicultural digitized content covering different forms of recorded human knowledge, ranging from rare manuscripts to current literature.
Introduction (continued)
• Culturally diverse content in multilingual DLs supports intercultural understanding and intercultural dialogue, a building block for inclusive knowledge societies.
• When establishing a digital library with a large collection, collaboration is inevitable.
• Indian DL initiatives achieved multi-stakeholder participation with increased international, regional, national and local collaboration.
• Providing metadata in Indic languages is one of the major challenges for Indic-language DLs.
Indicative Multi-/Bi-lingual DL Initiatives in India
Name of the Initiative | Implementing Agency | Funding Agency | Website
Digital Library of India (DLI) | Indian Institute of Science; IIIT, Hyderabad; C-DAC | MCIT and others | http://www.new1.dli.ernet.in ; http://www.new.dli.ernet.in ; http://dli.cdacnoida.in
Kalasampada: Digital Library Resources for Indian Cultural Heritage (DL-RICH) | IGNCA | MCIT | http://www.ignca.nic.in/dlr
National Databank on Indian Art and Culture (NDBIAC) | IGNCA | MCIT | http://ignca.nic.in/ndb_000
Kritisampada: National Database of Manuscripts | National Mission for Manuscripts, IGNCA | Ministry of Culture | http://www.namami.org/pdatabase.aspx
Panjab Digital Library (PDL) | Panjab Digital Library | Nanakshahi Trust and others | http://www.panjabdigilib.org/
Digital Repository of WBPLN (DR-WBLLN) | West Bengal Public Library Network (WBPLN), CDAC Kolkata | Directorate of Library Services, West Bengal | http://dspace.wbpublibnet.gov.in/dspace/
Indicative Multi-/Bi-lingual DL Initiatives in India (continued)
Name of the Initiative | Implementing Agency | Funding Agency | Website
Open Access to Oriya Books – Project OaOb | National Institute of Technology, Rourkela | NITR; Srujanika, Bhubaneswar; Pragati Utkal Sangh R | http://oaob.nitrkl.ac.in
Archives of Indian Labour (AIL) | V. V. Giri National Labour Institute & Association of Indian Labour Historians | Ministry of Labour | http://www.indialabourarchives.org
Muktabodha Digital Library | Muktabodha Indological Research Institute | Donations from Individuals & Trusts | http://muktalib5.org/digital_library.htm
Traditional Knowledge Digital Library (TKDL) | Council of Scientific and Industrial Research (CSIR) | Department of Ayurveda, Yoga… (AYUSH) | http://www.tkdl.res.in
National Science Digital Library | NISCAIR, India | Council of Scientific and Industrial Research (CSIR) | http://nsdl.niscair.res.in
Vigyan Prasar Digital Library | Vigyan Prasar, India | Department of Science and Technology | http://www.vigyanprasar.gov.in/digilib/
Digital Library of India
• A partner project of the Universal Digital Library (UDL), also known as the Million Books Project (MBP).
• Initiated in India in 2002 as a spin-off of the Universal Digital Library project.
• 355,000+ documents; the top six languages are, in order, English, Sanskrit, Hindi, Telugu, Bengali and Urdu, together accounting for about 91.3% of the books on the main DLI site http://www.new1.dli.ernet.in.
• Serves as a testbed for Indian language technologies, facilitating development of OCR (optical character recognition), TTS (text-to-speech) and other related software for Indian language computing.
• Challenge 1: Indic language contents are not OCR-ed.
• Challenge 2: Metadata is not available in Indic languages for Indic language documents.
• Challenge 3: A document can only be downloaded page by page in image, HTML or TXT format; the whole document cannot be downloaded in a single click, e.g. as a PDF file.
• Challenge 4: Broken links and unavailable pages, which are signs of aging.
Multi-stakeholders’ Participation
• Principal Coordinator (International): Carnegie Mellon University
• Principal Coordinator (National): Indian Institute of Science (IISc), Bangalore
• Research Coordinator (National): International Institute of Information Technology (IIIT), Hyderabad
• Infrastructure Agency: ERNET Society
• Funding Agencies: MCIT, NSF, PSA
• Software and Hardware Solutions: Industrial Partners
• Operational Agencies
  – Regional Mega Scanning Centres (RMSCs)
  – Scanning Centres
  – Source Libraries
Participation in Content Generation
[Diagram: contributors to the Digital Library of India]
• Academic Institutions (e.g. Anna University)
• Research Institutions (e.g. CDAC-Noida)
• Cultural Agencies (e.g. Salarjung Museum)
• Religious Institutions (e.g. Tirumala Tirupati Devasthanam)
• Industrial Agencies (e.g. Thrinaina Informatics Ltd.)
• Government Agencies (e.g. Rashtrapati Bhavan)
IGNCA maintained Digital Libraries
• Partially open access multilingual and multimedia digital contents
  – Kalasampada: Digital Library Resources for Indian Cultural Heritage (DL-RICH)
  – Cultural Heritage Digital Library in Hindi (CHDLH)
  – National Databank on Indian Art and Culture
  – National Digital Library of Manuscripts
• Supported by DIT, MCIT; Ministry of Culture
  – Content Development and IT Localisation Network (COILNET) Programme
  – Technology Development for Indian Languages (TDIL) Programme
  – National Mission for Manuscripts
Collaborative Digital Libraries on Indian Cultural Heritage
[Diagram: IGNCA’s Partner Institutions]
• Archaeological Survey of India
• Manuscript Libraries (e.g. Allama Iqbal Library)
• National Mission for Manuscripts
• Oriental Institutions (e.g. Oriental Research Library)
• Government Agencies (e.g. Asiatic Society)
• Academic Institutions (e.g. Visva-Bharati)
• Museums (e.g. National Museum)
National Mission for Manuscripts
• Launched in February 2003 by the Ministry of Tourism and Culture, Government of India.
• An ambitious five-year project with the specific objectives of locating, documenting, conserving and disseminating the knowledge content of India’s manuscripts.
• Established a network of 47 Manuscript Resource Centres, 32 Manuscript Conservation Centres (MCCs), 32 Manuscript Partner Centres (MPCs) and more than 200 Manuscript Conservation Partner Centres (MCPCs) across the country.
• NMM identified 45 collections as Manuscript Treasures of India (MTI). These are unique and rare collections of manuscripts.
• 5 MTIs have already been inscribed on the Memory of the World Register (MoWR).
• Out of 6 inscriptions from India, 5 are MTIs.
• The National Digital Manuscripts Library will provide full-text access to all MTIs, including those inscribed on the MoWR.
• Kritisampada: The National Database of Manuscripts provides access to metadata for the manuscript collections of NMM partners.
DL Initiatives with Single Indic Language Contents
Name of Digital Library | Organization | Focused Language | Whether Metadata in Indic Language | S/W used
Digital Repository of W.B. Public Library Network | West Bengal State Central Library & CDAC Kolkata | Bengali | Yes, Partial* | DSpace
Panjab Digital Library | Panjab Digital Library; Nanakshahi | Punjabi | No* | -
Open Access to Oriya Books – Project OaOb | National Institute of Technology, Rourkela; Srujanika, BBS | Oriya | No* | EPrints
Digital Repository of VPM | Vidya Prasarak Mandal, Thane | Marathi | Yes, Partial* | DSpace
ASI Digital Library | Archaeological Survey of India; IGNCA New Delhi | English and Sanskrit | No* | -
E-Gyankosh | Indira Gandhi National Open University, New Delhi | English and Hindi | No* | DSpace
* Metadata available mostly in transliterated English
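Since most of these libraries hold their metadata in transliterated English, the same title may be entered with or without diacritical marks. The following is a minimal sketch, not a method used by any of the libraries listed, of how such variants could be folded to a common search key. The function name `fold_diacritics` and the sample title strings are assumptions made for illustration; the sketch handles Latin-script transliteration only and does not convert Indic scripts.

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Return a case-folded search key with combining diacritical marks removed."""
    # NFD splits a letter such as "ā" into "a" plus a combining macron.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop combining marks (Unicode category "Mn") and keep the base letters.
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return stripped.casefold()


# Hypothetical title strings, as two cataloguers might transliterate them.
with_marks = "Gītāñjali"
without_marks = "Gitanjali"
print(fold_diacritics(with_marks) == fold_diacritics(without_marks))
```

Folding is lossy (distinct words can collapse to the same key), so a key like this would suit search and duplicate detection rather than display.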
Challenges Ahead
• Lack of a national practice establishing principles of interoperability, cross-search, metadata harvesting, etc.
• Enabling harvesting of metadata from South Asian digital libraries
  – The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) can be adopted
  – Other similar harvesting methods can be applied
• Standardization of transliterated metadata, or of metadata with diacritical marks
• Stock-taking of South Asian documentary heritage collections available worldwide
• Innovation in DL development is needed to integrate features of the interactive Web 2.0 (such as user interaction and content sharing), multimedia, and M-Science (accessibility using mobile devices).
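As an illustration of the metadata harvesting mentioned above, here is a minimal sketch of an OAI-PMH `ListRecords` request and of reading Dublin Core titles from a response. The endpoint `http://example.org/oai`, the helper names and the sample record are assumptions for illustration and are not taken from any of the libraries discussed; the verb, the `oai_dc` metadata prefix and the XML namespaces follow the OAI-PMH 2.0 specification. The sketch parses an embedded sample response instead of contacting a server.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH ListRecords request URL for a repository endpoint."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def extract_titles(response_xml: str) -> list[str]:
    """Collect every Dublin Core title found in a ListRecords response."""
    root = ET.fromstring(response_xml)
    titles = []
    for record in root.iter(f"{OAI_NS}record"):
        for title in record.iter(f"{DC_NS}title"):
            titles.append(title.text)
    return titles


# Hypothetical response: one record carrying an Indic-script title
# alongside its transliterated form.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>গীতাঞ্জলি</dc:title>
          <dc:title>Gitanjali</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
""".strip()

print(list_records_url("http://example.org/oai"))
print(extract_titles(SAMPLE_RESPONSE))
```

A real harvester would also need to follow the `resumptionToken` that OAI-PMH uses for paging through large result sets, which this sketch omits.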
Examining Semantic Web Principles
• Indic language metadata: providing metadata in all major Indian languages for a full-text document
• Whether an ontology-based structure is followed (related term RT, broader term BT, narrower term NT…)
  – Standard vocabulary / structured subject headings / subject thesaurus vs. user-generated keywords
• Whether a permanent link is available for a document or a dynamic link is generated
  – Rate of link failure or dead links (links to full-text contents, images, etc.)
• Whether contents can be accessed using handheld devices
• Whether text-to-speech (TTS) can be applied
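The rate of link failure listed above can be expressed as the share of checked links that do not resolve. The following is a minimal sketch under that assumption; the function names `is_reachable` and `link_failure_rate` and the sample URLs are hypothetical, and the study itself does not state how link failures were measured.

```python
from typing import Callable, Iterable
from urllib.request import Request, urlopen


def is_reachable(url: str, timeout: float = 10.0) -> bool:
    """Return True if a HEAD request to the URL succeeds."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as response:
            return response.status < 400
    except (OSError, ValueError):
        # URLError, HTTPError and timeouts are OSError subclasses;
        # ValueError covers malformed URLs.
        return False


def link_failure_rate(
    urls: Iterable[str], check: Callable[[str], bool] = is_reachable
) -> float:
    """Return the fraction of links for which the check fails (0.0 if no links)."""
    urls = list(urls)
    if not urls:
        return 0.0
    dead = sum(1 for url in urls if not check(url))
    return dead / len(urls)


# Usage with a stub checker, so the example runs without network access.
sample_links = [
    "http://example.org/a",
    "http://example.org/b",
    "http://example.org/c",
    "http://example.org/d",
]
print(link_failure_rate(sample_links, check=lambda url: not url.endswith("/b")))
```

Passing the checker as a parameter keeps the rate calculation separate from the network call, so the same function could be reused with a crawler's stored status codes.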
Conclusion
• Indian DL initiatives have helped bridge the digital divide in the country by making Indian language documents freely available to the masses.
• They have helped push content localization efforts.
• “Lean backward” to digitize important documentary heritage collections.
• “Lean forward” to include born-digital contents in multilingual OA repositories.
• National DLs should include rare and out-of-print books and manuscripts in all Indian languages.
• Metadata harvesters are needed for these DLs.
Acknowledgement
• UNESCO, UBC and JNU for travel and technical support
Thank you. Any questions?