Slide 1

Published in: Technology, Education
  • Characteristics and examples of each

    1. CILAB Seminar 2008/03/21 jwchoi@world.kaist.ac.kr
    2. • Paper Overview
       • Wrapper Induction
       • Pattern-based Approach
       • Rule-based Approach
       • Conclusion
    3. • Surveys on the Semantic Annotation Platform
       • Authors: Lawrence Reeve, Hyoil Han
       • Affiliation: Drexel University (Philadelphia)
       • ACM Symposium on Applied Computing
       • They examined Semantic Web annotation platforms
         ▪ Platform Classification, Overview, Evaluation Comparison
       • What I want to get …
       • Annotation hints for our project
       • Term unification – Pattern, Rule, …
       • For my research
    4. • Pattern-based
         ▪ Discovery: Seed expansion
         ▪ Rules: Taxonomy Label Matching
       • Machine Learning-based
         ▪ Probabilistic: HMM, N-gram analysis
         ▪ Induction: Linguistic, Structural
    5. Platform             Method             Machine Learning  Manual Rules  Bootstrap Ontology
       AeroDAML             Rule               N                 Y             WordNet
       Armadillo            Pattern Discovery  N                 Y             User
       KIM                  Rule               N                 Y             KIMO
       MnM                  Wrapper Induction  Y                 N             KMi
       MUSE                 Rule               N                 Y             User
       Ont-O-Mat: Amilcare  Wrapper Induction  Y                 N             User
       Ont-O-Mat: PANKOW    Pattern Discovery  N                 N             User
       SemTag               Rule               N                 N             TAP
    6. (Same platform comparison table as slide 5)
    7. A framework for analyzing semi-structured data (mostly on the web)
    8. Information extraction from semi-structured data by creating wrappers automatically
       “Wrapper Induction for Information Extraction” - Nicholas Kushmerick (264p)
    9. • High precision
       • Useful bootstrapping method
       • Many other semantic annotation platforms use this method
       • Amilcare: Wrapper Induction Tool
         ▪ MnM
         ▪ OntoMat
         ▪ Armadillo
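As a rough illustration of the wrapper idea on these slides, the following Python sketch induces a pair of left/right delimiters from a few labeled pages and then applies them to an unseen page. The function names (`induce_lr`, `apply_lr`) and the sample HTML are hypothetical, and this is a heavily simplified left-right wrapper, not Kushmerick's algorithm or the Amilcare implementation.

```python
import os


def induce_lr(pages_with_labels):
    """Induce (left, right) delimiters from labeled examples.

    Each example is (page_text, target_value). The left delimiter is the
    longest common suffix of the text before each target; the right
    delimiter is the longest common prefix of the text after it.
    """
    lefts, rights = [], []
    for page, value in pages_with_labels:
        start = page.index(value)
        lefts.append(page[:start])
        rights.append(page[start + len(value):])
    # Longest common suffix = reversed common prefix of the reversed strings.
    left = os.path.commonprefix([s[::-1] for s in lefts])[::-1]
    right = os.path.commonprefix(rights)
    return left, right


def apply_lr(page, left, right):
    """Extract every string that sits between the left and right delimiters."""
    if not left or not right:
        raise ValueError("both delimiters must be non-empty")
    results, pos = [], 0
    while True:
        start = page.find(left, pos)
        if start == -1:
            break
        start += len(left)
        end = page.find(right, start)
        if end == -1:
            break
        results.append(page[start:end])
        pos = end + len(right)
    return results


# Hypothetical training pages: each one has a single labeled country name.
examples = [
    ("<li><b>Congo</b> <i>242</i></li>", "Congo"),
    ("<li><b>Spain</b> <i>34</i></li>", "Spain"),
]
left, right = induce_lr(examples)
new_page = "<li><b>Egypt</b> <i>20</i></li><li><b>Belize</b> <i>501</i></li>"
countries = apply_lr(new_page, left, right)
```

The induced delimiters only generalize when the pages share a regular template, which is why the approach suits semi-structured web data.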
    10. • PANKOW
        • Pattern-based Annotation through Knowledge on the Web
        • Plugin for OntoMat
        • Institute AIFB, University of Karlsruhe
    11. • Hearst Patterns
    12. • Definites
        • Apposition and Copula
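To make the pattern families above more concrete, here is a minimal Python sketch of one Hearst-style pattern ("<CONCEPT>s such as <INSTANCE>") written as a regular expression. The regex, the function name `find_h1`, and the sample sentence are assumptions for illustration; real systems would rely on part-of-speech tagging rather than capitalization.

```python
import re

# Simplified H1 pattern: "<CONCEPT>s such as <INSTANCE>".
# Assumption: the concept is one or two lowercase words with a plural "s",
# and the instance is a single capitalized token.
H1 = re.compile(r"\b((?:[a-z]+ )?[a-z]+)s such as ([A-Z][A-Za-z0-9]*)")


def find_h1(text):
    """Return (instance, concept) pairs matched by the simplified H1 pattern."""
    return [(m.group(2), m.group(1)) for m in H1.finditer(text)]


pairs = find_h1(
    "We compared markup languages such as XML "
    "with scripting languages such as Python."
)
```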
    13. Proper Noun Extraction (Term Extraction)
        Hypothesis Phrase Construction Using Pattern
    14. Web Page:
          The Extensible Markup Language ( XML ) is a general-purpose markup language. It is classified as an extensible language because it allows its users to define their own tags.
        Web Page with Proper Noun Phrases:
          The |Extensible Markup Language|( |XML| ) is a general-purpose |markup language|. It is classified as an |extensible language| because it allows its users to define their own |tags|.
        Hypothesis Phrases:
          H1: <CONCEPT>s such as <INSTANCE>
            Extensible Markup Languages such as XML
            Extensible Markup Languages such as markup language
            XMLs such as markup language
            markup languages such as XML
            …
          DEFINITE1: the <INSTANCE> <CONCEPT>
            the markup language XML
            …
            the tags markup language
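The hypothesis-phrase step on this slide can be sketched in Python as follows: every pattern template is filled with every ordered pair of candidate noun phrases. The templates follow the slide; the names `PATTERNS` and `hypothesis_phrases` are hypothetical, and the naive plural "s" mirrors the slide's own examples rather than any real morphology handling.

```python
from itertools import permutations

# Pattern templates as written on the slide; the keys are just labels.
PATTERNS = {
    "H1": "{concept}s such as {instance}",
    "DEFINITE1": "the {instance} {concept}",
}


def hypothesis_phrases(noun_phrases):
    """Instantiate every pattern with every ordered (concept, instance) pair."""
    phrases = []
    for name, template in PATTERNS.items():
        for concept, instance in permutations(noun_phrases, 2):
            phrases.append(
                (name, template.format(concept=concept, instance=instance))
            )
    return phrases


phrases = hypothesis_phrases(["XML", "markup language"])
```

Most generated phrases are nonsense on purpose; the hit counts in the next step are what separate plausible pairs from implausible ones.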
    15. Hypothesis Phrases:
          H1: <CONCEPT>s such as <INSTANCE>
            Extensible Markup Languages such as XML
            Extensible Markup Languages such as markup language
            XMLs such as markup language
            markup languages such as XML
            …
          DEFINITE1: the <INSTANCE> <CONCEPT>
            the markup language XML
            …
            the tags markup language
        Number of hits for phrase:
          Extensible Markup Languages such as XML -- 3
          Extensible Markup Languages such as markup language -- 0
          XMLs such as markup language -- 0
          markup languages such as XML -- 834
    16. Number of hits for phrase:
          Extensible Markup Languages such as XML -- 3
          Extensible Markup Languages such as markup language -- 0
          XMLs such as markup language -- 0
          markup languages such as XML -- 834
        Annotated Document:
          The Extensible Markup Language ( <Term id="2" instanceOf="3">XML</Term> ) is a general-purpose <Term id="3" conceptOf="2">markup language</Term>. It is classified as an extensible language because it allows its users to define their own tags.
        Computer Language, Markup Language, Programming Language
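The selection step shown here can be sketched in Python: for each instance, keep the concept whose hypothesis phrase received the most hits. The counts are copied from the slide; the `HITS` layout and the function name `best_concept` are assumptions, and the web query that would produce the counts is deliberately left out.

```python
# Hit counts copied from the slide, keyed by (instance, concept).
HITS = {
    ("XML", "Extensible Markup Language"): 3,
    ("markup language", "Extensible Markup Language"): 0,
    ("markup language", "XML"): 0,
    ("XML", "markup language"): 834,
}


def best_concept(instance, hits):
    """Pick the concept with the most hits for the given instance."""
    candidates = {c: n for (i, c), n in hits.items() if i == instance and n > 0}
    if not candidates:
        return None  # no supporting evidence: leave the term unannotated
    return max(candidates, key=candidates.get)


xml_concept = best_concept("XML", HITS)
ml_concept = best_concept("markup language", HITS)
```

With these counts, "XML" is annotated as an instance of "markup language", which matches the annotated document on the slide.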
    17. • Upper Ontology
        • Named Entity Recognition
        • Mapping
    18. • A semantic annotation system requires a light-weight upper-level ontology focused on named entity classes
        • RDF(S), with compliance and possible extensions to OWL Lite, is the best choice of knowledge representation language for the ontology and the KB
        • More expressive power would unnecessarily degrade scale and performance
        • The documents and the metadata (annotations) should be kept decoupled from each other and separate from the ontology and the knowledge base
    19. • List of mapping rules
        • Already 80,000 mapping rules
          ▪ Date, Person, Organization, Location, Percent, Money
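As a rough picture of what rule-based named entity annotation looks like, the Python sketch below maps a few regular-expression rules to three of the entity classes listed above. The rules, the `RULES` table, and the function name `annotate` are hypothetical; they are not KIM's actual mapping rules.

```python
import re

# Hypothetical rules: (entity class, regular expression).
RULES = [
    ("Date", re.compile(r"\b\d{4}/\d{2}/\d{2}\b")),
    ("Percent", re.compile(r"\b\d+(?:\.\d+)?%")),
    ("Money", re.compile(r"\$\d+(?:,\d{3})*(?:\.\d+)?")),
]


def annotate(text):
    """Return (start, end, entity_class, surface_text) tuples sorted by position."""
    spans = []
    for label, pattern in RULES:
        for m in pattern.finditer(text):
            spans.append((m.start(), m.end(), label, m.group()))
    return sorted(spans)


annotations = annotate("On 2008/03/21 the price rose 4.5% to $1,200.")
labels = [(label, surface) for _, _, label, surface in annotations]
```

Returning character offsets instead of rewriting the text keeps the annotations decoupled from the document, in line with the design point made on slide 18.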
    20. Framework          Precision  Recall  F-Measure
        Armadillo          91.0       74.0    87.0
        KIM                86.0       82.0    84.0
        MnM                95.0       90.0    n/a
        MUSE               93.5       92.3    92.9
        Ont-O-Mat: PANKOW  65.0       28.2    24.9
        SemTag             82.0       n/a     n/a
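For reference, the balanced F-measure is the harmonic mean of precision and recall. The short Python sketch below, with a hypothetical helper `f_measure`, reproduces the values listed in the KIM and MUSE rows; it only checks those two rows and assumes the balanced variant of the measure.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


kim = round(f_measure(86.0, 82.0), 1)    # KIM row
muse = round(f_measure(93.5, 92.3), 1)   # MUSE row
```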
    21. • The definitions and scope of semantic annotation differ between platforms
        • PANKOW: concept and instance annotation
        • Armadillo: Restricted NE Annotation (Human, Paper)
        • KIM: NE Annotation (Date, Person, Organization, Location, Percent, Money)
        To the best of our knowledge there is no well established term for this task; Neither there is a well established meaning for the term “semantic annotation” - From “KIM – Semantic Annotation Platform”
    22. • Terms like pattern, rule, and semantic annotation are very ambiguous
        • Defining these terms in a way that suits our project is important
        • Wrapper Induction for bootstrapping data
        • PANKOW term extraction method
        • The upper ontology is important
        • Every annotation tool has an upper ontology and maps extracted entities to it
        • KIMO is well-defined
        • Separation of relation extraction from concept gathering
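    23. • Named Entity (narrowing down what you want to extract makes the task easier)
        • Separate concept registration from relation building
        • Use Upper Ontology
        • Use an annotation tool that fits your own purpose. Tools that use the same term do not necessarily do the same thing.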
    24. • Named Entity Recognition
    25. • Pattern
        • Rule
        • Machine Learning
        • Extracting patterns from new triples is not Machine Learning.
    26. • Example of Ont-O-Mat: PANKOW
        • PANKOW
          ▪ Pattern-based Annotation through Knowledge on the Web
        • Patterns in PANKOW
        • Linguistic Patterns (similar to our patterns)
          ▪ Hearst Patterns
          ▪ Definites
          ▪ Apposition and Copula
        • They use patterns to extract concepts and instances from text
    27. • Evaluation methods
        • Precision
        • Recall
        • Where did they get their evaluation sets?
    28. • Examples of major programs
        • KIM
        • ?
        • Format of the annotations stored by each program
    29. • MMAX2
        • EML
        • OntoNote
    30. • Which program would be most useful for our project?
