Application of semantic tagging to generate superimposed information on a digital encyclopedia

Sánchez-Alonso, S. and Athanasiadis, I. (eds.) Metadata and Semantic Research. 4th International Conference MTSR 2010, Communications in Computer and Information Science 108. Heidelberg: Springer, 2010, 84-94.

Published in: Technology, Education

    1. Application of semantic tagging to generate superimposed information on a digital encyclopedia. 4th Metadata & Semantics Research Conference (MTSR 2010), 20-22 October 2010. P. Garrido (1), J. Tramullas (2), F. J. Martínez (1). (1) Department of Computer Science and Systems Engineering; (2) Department of Librarianship and Information Science. University of Zaragoza.
    2. WHAT? This research work proposes the automated processing of textual documents containing historical information in Spanish, using free software technologies.
    3. WHERE? The online Gran Enciclopedia Aragonesa (GEA, the Great Aragonese Encyclopaedia). Voices are marked up with the XML labels <voz> </voz>, <vozId> </vozId>, <nombre> </nombre> and <descripcion> </descripcion> (Table 2. Input Voice Labels).
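The voice labels listed on the WHERE? slide can be read with a standard XML parser. The following is a minimal sketch, not the authors' code: it assumes that `<vozId>`, `<nombre>` and `<descripcion>` are direct children of `<voz>` (the slide lists the labels but not their nesting), and the sample content and the `parse_voice` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample voice: the tag names come from the slide,
# the nesting and the content are assumptions made for illustration.
SAMPLE_VOICE = """<voz>
  <vozId>0</vozId>
  <nombre>Ejemplo</nombre>
  <descripcion>Texto de ejemplo de una voz.</descripcion>
</voz>"""


def parse_voice(xml_text):
    """Return the id, name and description of one <voz> element as a dict."""
    root = ET.fromstring(xml_text)
    if root.tag != "voz":
        raise ValueError(f"expected <voz>, got <{root.tag}>")
    return {
        "id": root.findtext("vozId", default="").strip(),
        "name": root.findtext("nombre", default="").strip(),
        "description": root.findtext("descripcion", default="").strip(),
    }


voice = parse_voice(SAMPLE_VOICE)
print(voice["name"], "-", voice["description"])
```

Returning a plain dictionary keeps the sketch independent of whatever internal representation the actual system uses.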
    4. Index (alphabetical order). Categories & Subcategories (topic). Common Integrated Form.
    5. WHY? Analysis of GEA is important because: (i) it requires continuous updating; (ii) the analysis of the Spanish language is very interesting and unusual; (iii) it promotes technology transfer (universities & enterprises). AIMS: Dynamic version (Product); R+D+I (Project).
    6. WHICH?
        ● Facilitate the task of automating the documentary analysis of content (semantic description) with this kind of textual documents,
        ● Permit a more thorough representation of the contents,
        ● Increase the possibilities of retrieving requested information,
        ● Adapt their use to each user's needs.
        Combining several models helps accomplish richer semantics to subsequently enable simpler indexing, and allows more efficient search processes.
    7. HOW? (I) The algorithm used for automatic processing purposes must be capable of:
        ● reading and interpreting the text,
        ● detecting relevant information,
        ● extracting it,
        ● shaping the association among entities or for an entity-related event,
        ● storing it in one or several ways.
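The five capabilities listed on the HOW? (I) slide can be pictured as a small pipeline. The sketch below is only a toy illustration under strong assumptions, not the algorithm presented in the talk: it treats capitalised words as candidate entities and four-digit numbers as years, pairs entities that share a sentence as associations, and stores the result as JSON. All function names and regular expressions are hypothetical.

```python
import json
import re
from itertools import combinations

# Assumption: a capitalised word (or run of capitalised words) is a candidate entity.
ENTITY_RE = re.compile(r"\b[A-ZÁÉÍÓÚÑ][a-záéíóúñ]+(?:\s+[A-ZÁÉÍÓÚÑ][a-záéíóúñ]+)*")
# Assumption: a four-digit number between 1000 and 2099 is a year.
YEAR_RE = re.compile(r"\b(1\d{3}|20\d{2})\b")


def read_sentences(text):
    """Step 1: read the text and split it into sentences."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def detect(sentence):
    """Steps 2-3: detect and extract candidate entities and years."""
    return ENTITY_RE.findall(sentence), YEAR_RE.findall(sentence)


def process_voice(text):
    """Step 4: shape entity associations and entity-related events."""
    associations, events = [], []
    for sentence in read_sentences(text):
        entities, years = detect(sentence)
        # Entities that co-occur in one sentence are taken to be associated.
        associations.extend(list(pair) for pair in combinations(entities, 2))
        # A year in the sentence is taken to mark an event involving its entities.
        events.extend({"year": int(y), "entities": entities} for y in years)
    return {"associations": associations, "events": events}


record = process_voice("Goya nació en Fuendetodos en 1746.")
# Step 5: store the result; JSON is one of several possible target formats.
stored = json.dumps(record, ensure_ascii=False)
print(stored)
```

A sentence-initial common word would be picked up as an entity by this heuristic, which is one reason a real system needs linguistic analysis rather than pattern matching alone.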
    8. HOW? (II)
    9. HOW? (III)
    10. HOW? (IV)
    11. HOW? (& V) Input Voice (a); Algorithm (Table 4); Output Voice (b). (topic-oriented approach) (subject-centric computing) (superimposed information) (AI techniques)
    12. PERFORMANCE EVALUATION
        Numbers/Algorithms    First-level    Second-level
        Voices analyzed       192            192
        Associations          Hundreds       Thousands (3392)
        Events                Dozens         Hundreds (897)
        Reliability           70%            90%
        NOTE: An example trace is available in http://e-archivo.uc3m.es/bitstream/10016/4945/1/Tesis.pdf (pages 324-328)
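To put the second-level figures in proportion, the short calculation below derives averages per voice. It assumes that 3392 and 897 are totals over the 192 analysed voices, which the slide suggests but does not state; the variable names are illustrative.

```python
# Figures taken from the PERFORMANCE EVALUATION slide (second-level algorithm).
VOICES_ANALYZED = 192
SECOND_LEVEL_TOTALS = {"associations": 3392, "events": 897}

# Assumption: the totals cover all 192 voices, so a simple mean is meaningful.
per_voice = {
    name: round(total / VOICES_ANALYZED, 2)
    for name, total in SECOND_LEVEL_TOTALS.items()
}

for name, average in per_voice.items():
    print(f"{name}: about {average} per voice")
```

Under that assumption the second-level algorithm yields roughly 17.67 associations and 4.67 events per voice.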
    13. CONCLUSIONS (I) With the proposed architecture we managed to:
        ● Enhance user-friendliness, especially for non-specialised users,
        ● Capture the search that this work contemplated in natural language without having to simulate a performance,
        ● Merge it with other types of external information sources.
    14. CONCLUSIONS (& II) With the incorporation of Artificial Intelligence techniques in the algorithm, we provide coverage of:
        ● The peculiarities of the Spanish language, such as semantic ambiguity and the wide spectrum of available linguistic formulae to express the same thing.
        ● Restructuring the volume of information to work properly.
        Our proposal provides a framework within which 'things' can be represented as they are.