Union Catalog and Knowledge Engineering for TELDAP

  1. 1. Union Catalog and Knowledge Engineering for TELDAP. Keh-Jiann Chen, Principal Investigator, Core Platforms for Digital Contents Project, TELDAP; Research Fellow, Research Center for Information Technology Innovation & Institute of Information Science, Academia Sinica
  2. 2. Outline • Introduction • Union catalog • Databases and metadata for digital contents and websites • Knowledge engineering • Future perspective
  3. 3. Introduction • The integration and management of digital content have become important issues as the amount of digital content produced by different projects and institutions increases rapidly. • The goal of our project is to optimize the preservation, retrieval, and presentation of digital collections.
  4. 4. Outline • Introduction • Union catalog • Databases and metadata for digital contents and websites • Knowledge engineering • Future perspective
  5. 5. What is the union catalog? • It is a catalog and portal for all digital collections of TELDAP. • It is an integrated platform for browsing and searching all of TELDAP's digital content. • Its metadata provides core descriptions and licensing information for each digital collection.
  6. 6. Home page of the Union Catalog (figure) • Browsing by topics • Search by keywords
  7. 7. Outline • Introduction • Union catalog • Databases and metadata for digital contents and websites • Knowledge engineering • Future perspective
  8. 8. Metadata models for different types of objects • Archived digital items – Union catalog metadata model: Dublin Core+ • Web sites – DCCAP (Dublin Core Collections Application Profile) – Fields for internal use only: Unique Identifier, Format, Evaluation, Cataloging History • Documents – Document metadata: Dublin Core
  9. 9. Metadata for digital items: over 3 million digital items and still increasing
     Element: Definition
     Title: A name given to the resource
     Creator: An entity primarily responsible for making the content of the resource
     Subject and Keywords: The topic of the content of the resource
     Description: An account of the content of the resource
     Publisher: An entity responsible for making the resource available
     Contributor: An entity responsible for making contributions to the content of the resource
     Date: A date associated with an event in the life cycle of the resource
     Resource Type: The nature or genre of the content of the resource
     Format: The physical or digital manifestation of the resource
     Resource Identifier: An unambiguous reference to the resource within a given context
     Source: A reference to a resource from which the present resource is derived
     Language: A language of the intellectual content of the resource
     Relation: A reference to a related resource
     Coverage: The extent or scope of the content of the resource
     Rights Management: Information about rights held in and over the resource
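The fifteen elements on the slide above correspond to the Dublin Core Metadata Element Set. As a minimal sketch of how one item record might be serialized with those elements, the following uses Python's standard `xml.etree.ElementTree`. The `record` wrapper element, the sample values, and the `to_dc_xml` helper are hypothetical; the slides do not show the union catalog's actual XML schema or its "Dublin Core+" extensions.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"

# The fifteen element names, in the order used on the slide.
DC_ELEMENTS = [
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
]


def to_dc_xml(record):
    """Serialize {element name: [values]} to XML (hypothetical layout)."""
    unknown = set(record) - set(DC_ELEMENTS)
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("record")  # wrapper element name is an assumption
    for name in DC_ELEMENTS:
        # Dublin Core elements are repeatable, so each field holds a list.
        for value in record.get(name, []):
            child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            child.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample record; the values are illustrative only.
sample = {
    "title": ["Sample painting"],
    "subject": ["painting", "landscape"],
    "identifier": ["example:0001"],
}
print(to_dc_xml(sample))
```

Keeping every field as a list reflects that Dublin Core elements may repeat, for example several subjects for one item.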
  11. 11. Metadata for websites: over 500 websites and still increasing • Metadata model: DCCAP (Dublin Core Collections Application Profile) • 19 data fields in total
  12. 12. Metadata for websites • The Website Homepage Picture URL, Project Information Type, Name, Author, Subject, Description, Language, Item Type, Target • Archived Information: URL, time, authorization • Copyright, Purpose, Other Information • Figure: http://digitalarchives.tw
  13. 13. Dynamic categorization • User-oriented categorization – General public, elementary school students, high school students, researchers, etc. • Topic-based categorization – Archaeology, painting, animal, plant, document, etc. • Function-based categorization – Research, education, business, technology, etc. • Categorization based on institutions – Academia Sinica, Taiwan U., Palace Museum, etc.
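One way to read "dynamic categorization" is that each item carries values for several facets (audience, topic, function, institution) and the catalog groups items on whichever facet the user selects. The sketch below illustrates that reading only; the facet names, item ids, and the `categorize` helper are hypothetical and not taken from the TELDAP system.

```python
from collections import defaultdict


def categorize(items, facet):
    """Group item ids by the values they carry for one facet."""
    groups = defaultdict(list)
    for item in items:
        # An item may belong to several categories of the same facet.
        for value in item.get(facet, []):
            groups[value].append(item["id"])
    return dict(groups)


# Hypothetical items; the facet values echo the examples on the slide.
items = [
    {"id": "site-1", "audience": ["researchers"], "topic": ["archaeology"],
     "function": ["research"], "institution": ["Academia Sinica"]},
    {"id": "site-2", "audience": ["elementary school students"],
     "topic": ["animal", "plant"], "function": ["education"],
     "institution": ["Taiwan U."]},
    {"id": "site-3", "audience": ["researchers"], "topic": ["painting"],
     "function": ["research", "education"], "institution": ["Palace Museum"]},
]

print(categorize(items, "audience"))
print(categorize(items, "function"))
```

Because the grouping is computed on request, the same items can be presented under any of the four categorization schemes without being stored in a fixed hierarchy.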
  14. 14. Digitalarchives.tw (figure: http://digitalarchives.tw) • Purpose: Education • Target: Elementary school student, Junior high school student, Teacher… • Purpose: Creative applications • Purpose: Academic research • Subject: Animal, Archaeology, Anthropology…
  15. 15. Metadata for project documents: over 14,000 documents and still increasing • Metadata model: Dublin Core • Construct Teldapwiki, a Wikipedia for TELDAP: http://wiki.teldap.tw/
  16. 16. Outline • Introduction • Union catalog • Databases and metadata for digital contents and websites • Knowledge engineering • Future perspective
  17. 17. Plans for building knowledge structures for TELDAP • Construct metadata models for different objects. • Establish hyperlinks between contents and objects. – Develop keyword extraction tools. – Design automatic hyperlink tagging tools. • Construct TELDAP ontology and thesaurus. – Getty Art & Architecture Thesaurus (AAT) – Chinese WordNet
  18. 18. (1) Metadata models for different objects • Digital collections – Union catalog metadata model: Dublin Core+ • Web sites – DCCAP (Dublin Core Collections Application Profile) – Public fields – Private fields: Unique Identifier, Format, Evaluation, Cataloging History • Documents – Document metadata: Dublin Core
  19. 19. (2) Establish hyperlinks between contents and objects • Identify keywords in contents. • Tag keywords with hyperlinks to related objects.
  20. 20. Develop hyperlink tagging tools • Word segmentation tools – Resolve word segmentation ambiguities and identify keywords. – CKIP word segmentation system: http://ckipsvr.iis.sinica.edu.tw/
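To make the segmentation step concrete, here is a minimal sketch of forward maximum matching, a simple dictionary-based baseline for Chinese word segmentation. It is not the method used by the CKIP system, which the slides do not describe; the `forward_max_match` function and the small sample dictionary are assumptions for illustration. Greedy longest-match resolves some ambiguities (it prefers 故宮博物院 over 故宮 + 博物院) but cannot handle cases that need context.

```python
def forward_max_match(text, dictionary, max_len=None):
    """Greedy left-to-right segmentation: take the longest dictionary word."""
    if max_len is None:
        max_len = max(1, max((len(w) for w in dictionary), default=1))
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking one character at a time.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted as a fallback token.
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical mini-dictionary; a real keyword dictionary is far larger.
dictionary = {"故宮", "故宮博物院", "博物院", "收藏", "青銅器"}
print(forward_max_match("故宮博物院收藏青銅器", dictionary))
# → ['故宮博物院', '收藏', '青銅器']
```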
  21. 21. Develop hyperlink tagging tools • TELDAP keyword dictionary – Extract keywords from metadata and establish object-keyword relations. ▪ Extract text from the XML data for each object. ▪ The text is classified by topic, title, description, author, location, era, etc. ▪ Extract keywords from each class of text file by automatic word segmentation, keyword extraction, and manual post-editing. – The current dictionary contains more than 50,000 keywords.
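The extraction steps on the slide above can be sketched roughly as follows: pull the text of selected fields from each object's XML, then record which keywords occur in which object and field. The XML layout, the field names, and both helper functions are hypothetical, and plain substring matching against a candidate vocabulary stands in for the automatic segmentation, keyword extraction, and manual post-editing that the slide describes.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical object metadata; the real TELDAP XML schema is not shown.
SAMPLE_XML = """
<objects>
  <object id="obj-1">
    <title>Bronze vessel</title>
    <description>A bronze ritual vessel</description>
  </object>
  <object id="obj-2">
    <title>Landscape painting</title>
    <description>Ink painting with a bronze seal</description>
  </object>
</objects>
"""


def extract_field_text(xml_text, fields):
    """Return {object id: {field: text}} for the requested fields."""
    root = ET.fromstring(xml_text)
    texts = {}
    for obj in root.iter("object"):
        texts[obj.get("id")] = {f: (obj.findtext(f) or "") for f in fields}
    return texts


def build_keyword_index(texts, vocabulary):
    """Map each keyword to the (object id, field) pairs where it occurs."""
    index = defaultdict(set)
    for obj_id, by_field in texts.items():
        for field, text in by_field.items():
            lowered = text.lower()
            for keyword in vocabulary:
                if keyword in lowered:  # stand-in for real keyword extraction
                    index[keyword].add((obj_id, field))
    return dict(index)


texts = extract_field_text(SAMPLE_XML, ["title", "description"])
index = build_keyword_index(texts, {"bronze", "painting"})
print(sorted(index["bronze"]))
```

Keeping the field name alongside the object id preserves the per-class grouping (title, description, and so on) mentioned on the slide.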
  22. 22. Prototype system for hyperlink tagger • Identify and select keywords from the input text
  23. 23. Prototype system for hyperlink tagger • Produce text with keywords and hyperlinks
  24. 24. Prototype system for hyperlink tagger • Hyperlinks point to the related digital collections
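The three prototype steps above (identify keywords, produce hyperlinked text, point the links at related collections) can be sketched as a small function that wraps each known keyword in an HTML anchor. This is an illustrative sketch only, not the prototype's implementation: the `tag_hyperlinks` name, the `example.org` URLs, and the longest-keyword-first matching rule are assumptions.

```python
import html
import re


def tag_hyperlinks(text, keyword_urls):
    """Wrap each known keyword in an anchor pointing at its collection URL."""
    keys = [k for k in keyword_urls if k]
    if not keys:
        return html.escape(text)
    # Longest keywords first, so "bronze vessel" wins over "bronze".
    keys.sort(key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    parts = []
    last = 0
    for match in pattern.finditer(text):
        parts.append(html.escape(text[last:match.start()]))
        keyword = match.group(0)
        url = html.escape(keyword_urls[keyword], quote=True)
        parts.append(f'<a href="{url}">{html.escape(keyword)}</a>')
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)


# Hypothetical keyword-to-collection links.
links = {
    "bronze": "https://example.org/collections/bronze",
    "bronze vessel": "https://example.org/collections/bronze-vessel",
}
print(tag_hyperlinks("A bronze vessel & a bronze bell", links))
```

Matching is case-sensitive and purely string-based here; for Chinese text, a segmentation step would normally run first so that keywords are matched on word boundaries.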
  25. 25. (3) Construct TELDAP ontology and thesaurus • Establish association links between Chinese keywords and the Getty AAT. • Merge TELDAP keywords with the Chinese AAT.
  26. 26. Outline • Introduction • Union catalog • Databases and metadata for digital contents and websites • Knowledge engineering • Future perspective
  27. 27. Future perspective • Technology development – Construct multilingual thesauri (extend the Getty AAT). – Maintain the TELDAP keyword-and-object relation database. – Construct name authority files, gazetteers, and universal calendars. – Design hyperlink taggers and keyword extension tools. – Design an authoring tool that automatically provides hyperlinks to keyword-related digital contents. – Design a knowledge-based content retrieval system.
  28. 28. Future perspective • Content enrichment – Within TELDAP: ▪ Standardize the object metadata model and data format. ▪ Provide object metadata using controlled vocabularies. ▪ Write scripts and stories for different topics with a Wiki-like knowledge structure. ▪ Enrich the digital collections. ▪ Establish hyperlinks between textbooks and TELDAP collections. – Extend the knowledge sources, e.g. Wikipedia.
