Knowledge Acquisition in a
          System
                 Christopher Thomas
Ohio Center of Excellence in Knowledge-enabled Computing -
                         Kno.e.sis,
                  Wright State University
                        Dayton, OH
                    topher@knoesis.org
Circle of knowledge in a System




           Knowledge Enabled Information and Services Science   2
Dissertation Overview
                                Conceptual Knowledge: Ontologies, LoD
                                                                Knowledge Representation
                                                                [IJSWIS, CR, FLSW]
                                                                Ontology design [WWW, FOIS]
Knowledge merging/
Ontology alignment
[AAAI, WebSem2,                                                      Textual Information:
SWSWPC]                                                                Wikipedia, Web
                                                                          Information
                                                                          Quality[WI2]
                                                                          Social processes
                                                                          for content creation
                                                                          [CHB]


 Social processes
                                                                  Doozer++:
 for knowledge
                                                                  Taxonomy extraction
 validation
                                                                  Relationship/Fact
 [IHI,WebSci, CHB]
                                                                  extraction
                                                                  [IHI, WebSem1, IEEE-
                                                                  IC, WebSci, WI1]

Talk Contents
                                                       What is knowledge?




How do we turn
propositions/beliefs into
knowledge?
                                                                          How do we
                                                                          acquire
                                                                          information?


Talk outline

 • Motivation
 • Knowledge Acquisition (KA) Overview
 • KA in a loosely connected system – Doozer++
   – Automatic formal domain model creation
   – Information Extraction
       • Top-Down
       • Bottom-Up
   – Information Validation “in use”
 • Conclusion


Larger Context of automated KA

    • Increasing significance of knowledge
      economy
          – “Knowledge Workers” spend 38% of their time
            searching for information (McDermott, 2005)
          – Vital to get a quick and still comprehensive
            understanding of a field through pertinent
            concepts/entities and relations/interactions
    • Increased demand for formally available
      knowledge in semantic models
          – Filtering, browsing, annotation, reasoning

McDermott, M. "Knowledge Workers: How can you gauge their effectiveness." Leadership Excellence. Vol. 22.10. October 2005


Motivating Scenario

 • Learn about a new subject
   – E.g. gain a quick overview of a current or
     historical event
 • Use a formal representation of the gained
   overview to filter information
   – Facilitate in-depth exploration
 • Use the formalized information and the
   user interaction to create knowledge from
   information

Motivating Scenario

 •     Google: India



 • Brief description –
   demographic-,
   geographic
   information, etc.



Motivating Scenario

 •     Google: India



 • Regular Web results




Motivating Scenario

 • Clicking on a link to the Wikipedia entry
   shows that there have been conflicts with
   Pakistan over the region of Kashmir
 ⇒ Investigate more




Motivating Scenario

 •     Google: India
       Pakistan Kashmir

 • Only Web results and
   news
 So far, search
 engines only display
 facts about entities, not
 relationships or larger
 contexts
Motivating Scenario

 • Beneficial to get an overview of a domain
   “at a glance”
 • Automated approach to creating knowledge
   models for focused areas of interest
 • Create models around an incomplete or
   rudimentary keyword description and
   “anticipate” the user's intentions with
   respect to the full context



Motivating Scenario

      Doozer++: india pakistan kashmir
      • Important concepts and relationships
        describing the context




Motivating Scenario

• Filtered IR using
  concepts in the
  model
• Concepts and
  relationships that
  contributed to
  clicked results gain
  support
• User can explicitly
  approve content
Circle of Knowledge (Example)




Motivating Scenario

 • On-demand creation of domain knowledge
   improves individual comprehension of an
   event
 • Formal models are easy to use in
   information filtering
 • Validated information → Knowledge
   – Can be given back to the community to
     improve the overall amount of formal
     knowledge available on the Web
   – E.g. “Unknown” to DBPedia that the region of
     Kashmir belongs to both India and Pakistan

Importance of Model creation

 • Models support an individual user or knowledge
   worker, but also groups or systems
    – More efficient communication through small,
      shared, agreeable conceptualizations
      • People ↔ people
      • People ↔ system
      • System ↔ system
    – Classify or filter pertinent and topical
      information using models
    – Model-assisted searching and faceted or
      exploratory browsing using relationships
    – Reuse of validated knowledge
Domain Knowledge Models
• Scientific applications
   – In-depth description of concepts
   – Narrow field
   – People ↔ system, system ↔ system
     • Annotation, reasoning
     ⇒ Absolute correctness necessary (as far as possible)
• General applications
   – Broad coverage of the field
   – Context – how does the new information fit in?
   – People ↔ people, people ↔ system
     • Individual domain comprehension, filtering, annotation
     ⇒ Relative correctness sufficient
Model Creation Resources

 • Large models are available as reference
    – DBPedia, YAGO, UMLS, MeSH, GO …
    – Too big to be efficiently and effectively usable
       • Prior knowledge required to find pertinent resources
 • Other information is available in great
   abundance, but unformalized
    – Tacit expert knowledge
    – Scientific databases
    – Free text
       • Peer-reviewed journals and proceedings
       • General Web content

Epistemological Considerations

 • Knowledge
    – Ensure epistemological soundness of
      automated knowledge acquisition
 • Reference
    – Ensure that nodes in the models refer to real-
      world concepts/entities




Knowledge

 • Functional Definition
   – Knowledge = “Know-How”
   – Practical, but weak,
     Includes “Actionable Information”
 • Categorical Definition
   – Knowledge = Justified true belief
   – S knows that p iff
      i.     p is true;
      ii.    S believes that p;
      iii.   S is justified in believing that p.

Belief and Justification

 • Belief
    – Statements held by the system
 • Justification
    – Trusted sources
    – Extraction algorithms
       • Bayesian, deductive or inductive reasoning
       • Macro-Reading algorithms → Wisdom of the crowds
    – Validation



Truth assessment of a statement

 • Is truth correspondence?
    – “A” is true iff A (a true statement corresponds
      to an actual state of affairs)
      (callout: No Access)
 • Is truth coherence?
    – Does the statement fit into the system of other
      statements?
 • Is truth consensus?
    – Agreement on correctness amongst a group
 ⇒ In the cyclical model, achieve a high degree
   of certainty by allowing constant validation
Domain Model – Reference

 • Model of a domain conceptually split
   – Domain Definition
      • Concepts identified by URIs (classes, entities,
        relationship types) → ensures reference
      • Remains static – necessity
      • Rigid designators (Kripke)
   – Domain Description
      • Relationships describe concepts
      • Subject to change – possibility
      • Definite descriptions (Russell)


Domain Definition

 • Top-down concept identification
 • Achieved through
    – Manual creation based on consensus in a
      group
    – Extraction from community-created or peer-
      reviewed conceptualization
      • Wikipedia
      • MeSH or UMLS Semantic Network




Domain Description

 • Possible to do top-down extraction of the
   domain description, e.g. from DBPedia
 • Problem: Formal concept descriptions are
   sparse
    – On average, DBPedia has fewer than 2 object
      properties per entity
 • Extract descriptions (facts) bottom-up
    – Available in text, DBs, etc.
    – Domain-specific molecular structure extractors
      (GlycO)
    – Domain independent IE techniques (Doozer++)


Knowledge Acquisition Approaches
 • KA in a tightly connected system
   – GlycO: domain-specific BioChemistry ontology
      •   Manual domain definition and description
      •   Partial automatic domain description
      •   Domain-specific automatic validation
      •   Manual validation for false negatives
 • KA in a loosely connected system
   – Doozer++: general domain-model creation framework
      • Automatic domain definition, top-down concept extraction
      • Automatic domain description, bottom-up fact extraction
           – Extraction from trusted sources
           – A trusted extraction and validation procedure
      • Domain-independent community-based validation



Knowledge Acquisition Approaches
               Knowledge     Traditional
               Engineering   Extraction
               Approach      Approach      GlycO                     Doozer++

 Definition    Top-Down      Bottom-up     Top-Down                  Top-Down
                                           Knowledge Engineering     Conceptually, by extraction
                                                                     from Top-Down corpus
 Description   Top-Down      Bottom-up     Bottom-up, restricted     Bottom-up, restricted
                                           by Top-down definition    by Top-down definition
 Verification  Manual        Manual        Correctness: automatic    Community-based
                                           Exceptions: added         validation
                                           manually
KA on the Web - Vision

 • Web searches, browsing sessions or
   classification tasks can be seen as creating
   an implicit domain model
   – World view, Concept coverage, Facts
 • Make models explicit and reusable using
   formal descriptions (RDF, OWL)
 • Validate the contained information and
   share with the community
 ⇒ Increase the system's knowledge by
     “doing what you do”: search, browse,
     click, communicate
KA in a Loosely Connected System
[Cycle diagram with nodes: Domain Definition, Domain Description, Validation]

Domain Model creation to gradually increase overall knowledge of the system
 • User-interest driven
 • Incentive to evaluate

 • Linked Data
 • Free text
    – Wikipedia
    – Web

Doozer++
 – Domain Definition: Top-down concept extraction
 – Domain Description: Pattern-based fact extraction

Validation: Scooner
 – Evaluation in use: semantic browsing and retrieval
 – Domain-independent, community-based
Domain Definition Requirements
• Identify concepts, concept
  labels (denotations) and
  concept hierarchy
• Challenge: define narrow
  boundaries for a domain while
  at the same time ensuring
  broad conceptual coverage
  within the domain


Domain definition - conceptual

 • Expand and Reduce approach
   – Start with “high recall” methods
      •   Exploration – Full text search
      •   Exploitation – Graph-Similarity Method
      •   Category growth
      •   “What could be in the domain?”
   – End with “high precision” methods
      • Apply restrictions on the concepts found
      • Remove terms and categories that fall outside the
        dense areas of the model graph
      • “What should be in the domain?”

Domain Description - Classifier

 • Concept-aware
   – Use concepts and concept labels from the
     domain definition step
 • Fact extraction as classification of
   concept pairs into relationship types
   – $f_{class}: C \times C \to R$
   – $R_{S,O} = \{R \mid p(R,S,O) > \varepsilon\}$
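
The thresholding step can be illustrated with a minimal Python sketch. The function name `candidate_relations` and the example scores are hypothetical; the slide only states that a relationship is kept when its probability for the concept pair exceeds ε.

```python
from typing import Dict, Set


def candidate_relations(scores: Dict[str, float], epsilon: float) -> Set[str]:
    """Return the set {R | p(R, S, O) > epsilon} for one (Subject, Object) pair."""
    return {rel for rel, p in scores.items() if p > epsilon}


# Hypothetical scores p(R, S, O) for a single concept pair.
scores = {"almaMater": 0.71, "employer": 0.22, "birthPlace": 0.03}
selected = candidate_relations(scores, epsilon=0.2)
print(sorted(selected))  # ['almaMater', 'employer']
```

A pair can therefore receive more than one relationship type, or none at all if no score clears the threshold.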




Domain Description

 • Combined Language model and Semantic
   classification model
 • Language model: surface-pattern-based
   – Pattern manifestations of relationships as
     features
   – Open to any corpus, language independent
   – Less computational overhead than NLP
 • Semantic Classification Model
   – Learned or assigned concept labels
   – Semantic types to aid classification

Domain Description - Implementation

 • Probabilistic Vector-space model
   – Each relationship is defined by vectors of
     • Pattern probabilities
     • Domain/range probabilities
   – Each concept is grounded by its semantic
     types and manifested by its labels and their
     probabilities of identifying the concept
   – Sparse pattern representation (density ~2%)
   – White-box, easily verifiable
   – Inherently parallel

Terminology
Symbol         Meaning                               Example

S, O           Subject and Object concepts           Kelly_Miller_(scientist), Howard_University
               (semantic)
L_S, L_O       Subject and Object labels             “Kelly Miller”, “Howard University”
P_{L_S,L_O}    Phrase instantiating the pattern      Kelly Miller graduated from Howard University
P              Pattern                               <Subject> graduated from <Object>
T_S, T_O       Semantic type of Subject or Object    Person, Educational_Institution
R              Relationship                          almaMater, birthPlace
Probabilistic Classifier

 • Labels taken from Lexicon or linked corpus
 • Semantic types: asserted in Ontology or learned from linked data
 • Patterns learned from free text
Probabilistic Classifier

How is Barack Obama related to Columbia University?
             p(R, Barack_Obama, Columbia_University)


                Sentence in corpus:
     Obama graduated in 1983 from Columbia University
      with a degree in political science and international
      relations.




                             (Regular classification requires multiple examples)

Probabilistic Classifier

  Obama graduated in 1983 from Columbia University
    p(almaMater ,Barack_Obama, Columbia_University) =
           p(almaMater | “<Subject> graduated in 1983 from <Object>”) *
           p(Barack_Obama | ”Obama”) *
           p(Columbia_University | ”Columbia University”) *
           p(almaMater | domain(person)) *
           p(almaMater | range(academic_institution))

    p(almaMater , Barack_Obama, Columbia_University)
           = 0.9 * 0.95 * 0.95 * 0.9 * 0.97

    p(almaMater, Barack_Obama, Columbia_University) = 0.70909425
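
The arithmetic above can be checked with a few lines of Python. The five factor values are taken from the slide; the dictionary keys are only descriptive labels introduced for this sketch.

```python
import math

# The five factors from the slide, in the order they are multiplied.
factors = {
    "p(almaMater | pattern)": 0.9,
    "p(Barack_Obama | 'Obama')": 0.95,
    "p(Columbia_University | 'Columbia University')": 0.95,
    "p(almaMater | domain(person))": 0.9,
    "p(almaMater | range(academic_institution))": 0.97,
}

# Joint score as the plain product of all factors.
joint = math.prod(factors.values())
print(f"{joint:.8f}")  # 0.70909425
```

Because every factor lies in [0, 1], the product can only shrink as factors are added, so a single weak factor (for example an ambiguous label) lowers the whole score.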




Pattern Generalization

 • Problem: Low recall in pattern-based IE
 • Substitute terms with wild cards
    – No POS tagging, hence only “*” wild cards
 • Mirrors shortest paths through parse trees
  <Subject>   graduated          in            1983            from    <Object>
  <Subject>       *              in            1983            from    <Object>
  <Subject>   graduated          *             1983            from    <Object>
  <Subject>       *              *             1983            from    <Object>
  <Subject>   graduated         in               *             from    <Object>
  <Subject>       *              in              *             from    <Object>
  <Subject>   graduated          *               *             from    <Object>
  <Subject>       *              *               *             from    <Object>
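
A minimal Python sketch of this wildcard substitution follows. The function `generalize` and its `positions` parameter are hypothetical names; restricting substitution to the first three tokens is an assumption made only to reproduce the eight variants listed above, in which "from" is never replaced.

```python
from itertools import product
from typing import List, Optional, Sequence


def generalize(tokens: Sequence[str],
               positions: Optional[Sequence[int]] = None) -> List[str]:
    """Return every variant of the pattern with a subset of tokens replaced by '*'.

    tokens    -- the words between <Subject> and <Object>
    positions -- indices that may be replaced; defaults to all tokens
    """
    if positions is None:
        positions = range(len(tokens))
    positions = list(positions)
    variants = []
    # Each mask decides, per eligible position, whether to insert a wildcard.
    for mask in product([False, True], repeat=len(positions)):
        out = list(tokens)
        for pos, wild in zip(positions, mask):
            if wild:
                out[pos] = "*"
        variants.append("<Subject> " + " ".join(out) + " <Object>")
    return variants


variants = generalize(["graduated", "in", "1983", "from"], positions=[0, 1, 2])
for v in variants:
    print(v)
```

With n eligible tokens this yields 2^n variants, so in practice the number of substitutable positions would have to stay small.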

Learning p(R|P)

 • Distantly Supervised Training
 • Collect pattern frequencies for training
   examples
   – Fact triples <S, R, O> e.g. from Linked Data
     (DBPedia, UMLS)
   – Manifestations of facts in text in the form of
     patterns (corpus e.g. Web, Wikipedia, MedLine)
 • For relationship Ri, aggregate pattern
   vectors representing <*, Ri, *>
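
The collection step can be sketched in Python as follows. This is a simplified illustration, not the Doozer++ implementation: it assumes labels are matched as literal strings with the subject preceding the object, and the helper name `collect_pattern_counts` is hypothetical. The example facts and sentences reuse the examples from the slides.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

Fact = Tuple[str, str, str]  # (subject label, relationship, object label)


def collect_pattern_counts(facts: Iterable[Fact],
                           sentences: Iterable[str]) -> Dict[str, Counter]:
    """Aggregate, per relationship, how often each surface pattern occurs."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    sentences = list(sentences)
    for subj, rel, obj in facts:
        # The text between the subject label and the object label is the pattern body.
        regex = re.compile(re.escape(subj) + r"\s+(.+?)\s+" + re.escape(obj))
        for sentence in sentences:
            for match in regex.finditer(sentence):
                counts[rel][f"<Subject> {match.group(1)} <Object>"] += 1
    return counts


facts = [
    ("Kelly Miller", "almaMater", "Howard University"),
    ("Obama", "almaMater", "Columbia University"),
]
sentences = [
    "Kelly Miller graduated from Howard University.",
    "Obama graduated in 1983 from Columbia University with a degree in political science.",
]
counts = collect_pattern_counts(facts, sentences)
print(counts["almaMater"])
```

Only positive examples are needed: every sentence that mentions both labels of a known fact contributes to the vector of that fact's relationship.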


Learning p(R|P) – naïve

 • For each vector Ri containing pattern
   frequencies for relationship Ri, compute




 • Number of occurrences of Pattern_j with terms denoting each
   <S, O> ∈ R_i, normalized by all pattern
   occurrences for R_i
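
Read literally, this description corresponds to a relative-frequency estimate. A sketch in LaTeX, where the count notation $\#(P_j, R_i)$ (occurrences of pattern $P_j$ with subject-object pairs of relationship $R_i$) is an assumption introduced here:

```latex
p(P_j \mid R_i) = \frac{\#(P_j, R_i)}{\sum_{k} \#(P_k, R_i)}
```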

Learning p(R|P) – naïve

 • Uniform distribution of relationships assumed
   – As the number of relationship types grows, the
     prior of each type goes towards 0.
   – Normalize the probabilities over the column
     vector to get p(Ri|Pj)



 • Vector space representation
   – Relationship-pattern matrix
   – $R2P_{ij} = p(R_i \mid P_j)$
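
The two normalization steps of the naïve variant can be sketched in Python. The function name `relationship_pattern_matrix` and the example counts are hypothetical; a dict of dicts is used here so that only non-zero entries are stored, in the spirit of a sparse representation.

```python
from collections import Counter, defaultdict
from typing import Dict


def relationship_pattern_matrix(counts: Dict[str, Counter]) -> Dict[str, Dict[str, float]]:
    """Build R2P[i][j] = p(R_i | P_j) from raw pattern counts per relationship."""
    # Step 1: normalize each row by all pattern occurrences for that relationship.
    p_pattern_given_rel: Dict[str, Dict[str, float]] = {}
    for rel, pattern_counts in counts.items():
        total = sum(pattern_counts.values())
        p_pattern_given_rel[rel] = (
            {pat: c / total for pat, c in pattern_counts.items()} if total else {}
        )
    # Step 2: with a uniform prior over relationships, normalize each column.
    column_sums: Dict[str, float] = defaultdict(float)
    for probs in p_pattern_given_rel.values():
        for pat, value in probs.items():
            column_sums[pat] += value
    return {
        rel: {pat: value / column_sums[pat] for pat, value in probs.items()}
        for rel, probs in p_pattern_given_rel.items()
    }


# Hypothetical counts for two relationship types.
example_counts = {
    "almaMater": Counter({"<Subject> graduated from <Object>": 3,
                          "<Subject> at <Object>": 1}),
    "employer": Counter({"<Subject> at <Object>": 2}),
}
r2p = relationship_pattern_matrix(example_counts)
print(r2p)
```

In this naïve form each column sums to 1, so relationship types that share a pattern compete for its probability mass.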

Problem: Relationship Similarities

 • Extensional similarity
   – Semantically different relationships can share
     Subject-Object pairs in training data
 • Intensional similarity
   – Overlap and entailment of relationship types
      • Types should not be seen as discrete
         – E.g. physical_part_of entails part_of
      • A priori unknown which types overlap unless a formal
        description is available
   – Semantically similar types compete for the
     same patterns
Relationship similarities


 Pertinence Measure
   similarity between pattern vectors as approximation
   of intensional similarity
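
The slide does not name the similarity function. As one possible illustration, the Python sketch below uses cosine similarity between sparse pattern vectors; this choice, the function name, and the example vectors are all assumptions and not necessarily the measure used in Doozer++.

```python
import math
from typing import Dict


def cosine_similarity(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors keyed by pattern string."""
    dot = sum(w * v[pat] for pat, w in u.items() if pat in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical pattern vectors for two overlapping relationship types.
part_of = {"<Subject> in the <Object>": 0.6, "<Subject> of the <Object>": 0.4}
physical_part_of = {"<Subject> in the <Object>": 0.5, "<Subject> inside <Object>": 0.5}

similarity = cosine_similarity(part_of, physical_part_of)
print(round(similarity, 3))
```

Relationship types that are expressed by the same patterns score close to 1, types with no shared patterns score 0.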




Pertinence for Relationships
 • Do not punish the occurrence of the same pattern
   with relationship types that are intensionally
   similar, but extensionally dissimilar
 • Reduce impact of extensionally similar relations




Pertinence Example


 Pattern: <Subject> in the right <Object>
 Relationship                                                           p(R|P)
 biological_process_has_associated_location                             0.968371381
 disease_has_associated_anatomic_site                                   0.880452774
 part_of                                                                0.622532958
 has_finding_site                                                       0.561041318
 has_location                                                           0.537424451
 has_direct_procedure_site                                              0.363832078
 Sum:                                                                   3.933654958

 Note: This never causes p(R,S,O) > 1

Similarities between relationships




Pertinence evaluation
[Figure: precision (y-axis, 0 to 0.8) against recall (x-axis, 0 to 0.5) for two series: Pertinence and No Pertinence]
Fact extraction evaluation - DBPedia
60% training set, 40% testing, DBPedia Infobox fact corpus, Wikipedia text corpus
[Figure: precision / recall plotted against the confidence threshold]

Strict evaluation: only the 1st-ranked extracted relation is compared to the gold standard.
Averaged over 107 relation types.
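
The strict evaluation can be sketched in Python as below. The function name, the data layout, and the example values are hypothetical; precision is taken over pairs whose top-ranked relation reaches the confidence threshold, and recall over all gold-standard pairs, which is one plausible reading of the slide.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (subject, object)


def strict_precision_recall(gold: Dict[Pair, str],
                            ranked: Dict[Pair, List[Tuple[str, float]]],
                            threshold: float) -> Tuple[float, float]:
    """Score only the top-ranked relation of each pair against the gold standard."""
    emitted = 0
    correct = 0
    for pair, candidates in ranked.items():
        if not candidates:
            continue
        top_rel, top_conf = max(candidates, key=lambda c: c[1])
        if top_conf < threshold:
            continue  # below the confidence threshold: nothing is extracted
        emitted += 1
        if gold.get(pair) == top_rel:
            correct += 1
    precision = correct / emitted if emitted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical gold standard and ranked extractions.
gold = {("A", "B"): "almaMater", ("C", "D"): "birthPlace", ("E", "F"): "almaMater"}
ranked = {
    ("A", "B"): [("almaMater", 0.8), ("employer", 0.3)],
    ("C", "D"): [("deathPlace", 0.7), ("birthPlace", 0.6)],
    ("E", "F"): [("almaMater", 0.4)],
}
precision, recall = strict_precision_recall(gold, ranked, threshold=0.5)
print(precision, recall)
```

Under this scheme a correct relation at rank 2 counts as a miss, which is why the reported numbers are a conservative estimate.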
Sample results (DBPedia)

  Subject :: Object                      Suggested      Extracted Rank 1      Rank 2               Rank 3
                                         Relationship   (Rel; Confidence)
  Howard Pawley :: Gary Filmon           after          successor; 0.799      after; 0.768         office; 0.686
  Mulan :: Tarzan                        after          nextSingle; 0.603     followedBy; 0.533    after; 0.416
  Species Deceases :: Midnight Oil       artist         producer; 0.761       artist; 0.719        genre; 0.467
  The Crystal City :: Orson Scott Card   author         artist; 0.625         author; 0.617        writer; 0.583
  Horatio Allen :: William Maxwell       before         predecessor; 0.629    before; 0.475
  Basdeo Panday :: Trinidad & Tobago     birthplace     deathPlace; 0.658     birthplace; 0.658    nationality; 0.330
  Bob Nystrom :: Stockholm               birthplace     cityOfBirth; 0.677    birthplace; 0.513
  Beccles railway station :: Suffolk     borough        district; 0.772       borough; 0.770       friend; 0.749
Fact extraction evaluation - UMLS
60% training set, 40% testing, UMLS fact corpus, MedLine text corpus
[Figure: precision / recall plotted against the confidence threshold]

Strict evaluation: only the 1st-ranked extracted relation is compared to the gold standard.
Averaged over ~100 relation types.
Sample results (UMLS)

  Subject :: Object                                             Suggested Relationship                Extracted Rank 1
  Teeth :: poisoning, fluoride                                  finding_site_of                       finding_site_of
  768 polyps :: polyp of cervix nos (disorder)                  associated_with                       associated_with
  neck of uterus :: polyp of cervix nos (disorder)              location_of                           finding_site_of
  benign neoplasms :: polyp of colon                            related_to                            associated_with
  brain ischemia :: brain                                       has_finding_site                      location_of
  gastrointestinal tract :: polyp of colon                      is_primary_anatomic_site_of_disease   location_of
  gamete structure (cell structure) ::                          is_normal_cell_origin_of_disease      is_normal_cell_origin_of_disease
    polyvesicular vitelline tumor
Comparison – DBPedia corpus
[Figure: precision (y-axis, 0 to 1) against recall (x-axis, 0 to 1) for four series: Mintz-POS, Mintz-NLP, Doozer++ (R), Doozer++ (P)]

 • Mintz: extraction of 102 relationship types from Freebase
 • Doozer: 107 from DBPedia
 • (R) Recall-oriented, using pattern generalization
 • (P) Precision-oriented, no generalization

M. Mintz, S. Bills, R. Snow, and D. Jurafsky, “Distant supervision for relation
extraction without labeled data,” in ACL 2009.
Evaluate Ad-Hoc Model Creation

 • On demand creation of models
   Domain                       Query                                               Number of   Precision
                                                                                    Concepts    (Domain Definition)
   Semantic Web                 “Semantic Web” OWL ontologies RDF                   143         0.98
   Harry Potter                 “Harry Potter” dumbledore gryffindor slytherin      134         0.98
   Beatles                      Beatles "John Lennon" "Paul McCartney" song         250         0.99
   India-Pakistan Relations     India Pakistan Kashmir                              129         0.99
   US Financial crisis - TARP   tarp "financial crisis" "toxic assets"              146         0.93
   German Chancellors           German chancellors "Angela Merkel" "Helmut Kohl"    124         0.91
Ad-Hoc Model Creation - Evaluation




Ad-Hoc Model Creation - Evaluation
 • Relative Recall: recall with respect to possible extraction, i.e. the
   maximum number of extracted facts marks 100% recall
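
The relative recall described on this slide can be sketched in a few lines of Python. This is a minimal illustration, assuming each extraction setting is summarized by its number of extracted facts. The function name, the setting names, and the counts are hypothetical and not taken from the evaluation.

```python
def relative_recall(extracted_counts):
    """Divide each setting's extracted-fact count by the largest count.

    The setting with the most extracted facts defines 100% recall.
    """
    if not extracted_counts:
        return {}
    max_count = max(extracted_counts.values())
    if max_count == 0:
        return {name: 0.0 for name in extracted_counts}
    return {name: count / max_count for name, count in extracted_counts.items()}


# Hypothetical counts for three confidence thresholds (illustration only).
counts = {"threshold_0.9": 40, "threshold_0.5": 120, "threshold_0.1": 200}
recall_by_setting = relative_recall(counts)
```

Relative recall only compares settings against each other. It says nothing about facts that no setting was able to extract.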




Related Work




          [Diagram labels: Surface patterns only; Mintz; SOFIE; Turney]




Main Differences

 •   Surface-patterns only
 •   Only positive training examples
 •   Pertinence measure for semantic similarity
 •   Concept-aware: start with defined concepts
 •   Include background knowledge in
     probabilistic classification instead of rule-
     based reasoning



Related work
 • Pattern-based fact extraction
    – E. Agichtein and L. Gravano. Snowball: Extracting
      relations from large plain-text collections. In JCDL,
      2000.
    – F. M. Suchanek, M. Sozio, and G. Weikum. SOFIE: A
      Self-Organizing Framework for Information Extraction.
      In WWW, 2009.
    – T. M. Mitchell, J. Betteridge, A. Carlson, E. Hruschka,
      and R. Wang. Populating the Semantic Web by Macro-
      Reading Internet Text. ISWC 2009.
    – M. Pasca, D. Lin, J. Bigham, A. Lifchits, and A. Jain.
      Organizing and searching the world wide web of facts -
      step one: the one-million fact extraction challenge. In
      AAAI, 2006.
Related work

 • Relationship-pattern computations
    – P. D. Turney and P. Pantel. From Frequency to
      Meaning: Vector Space Models of Semantics. Journal
      of Artificial Intelligence Research, 37, 2010.
    – P. D. Turney. Expressing implicit semantic relations
      without supervision. In ACL 2006




Summary Fact extraction

 • Pattern-based fact extraction with
   generalization and Pertinence achieves
   competitive precision and recall while being
   computationally feasible for large-scale
   extraction
   – Pertinence computation can also be a
     preprocessing step for other ML techniques
 • Different types of background knowledge
   incorporated into one statistical framework
   – Combined Language model and Semantic
     model
Application and Knowledge Validation
Example: Domain model as a basis for research in the area of human
cognitive performance.

 • Sources
    – 18 Million MedLine publications/abstracts
    – UMLS Metathesaurus
    – Wikipedia
 • Doozer++
    – Hierarchy extraction
    – Pattern-based fact extraction
 • Scooner: Semantic browsing and retrieval – Evaluation in Use


Domain Definition – Extracted Hierarchy




A hierarchy extracted for a cognitive science domain model.

The keyword description given to the system was a collection of terms relevant
to human performance and cognition.

Domain Description: Connect Concepts




Expert Evaluation of Facts in the Model
[Chart: Fraction (y-axis, 0 to 0.9) by expert Score (x-axis, 1 to 9).
 Series: Fraction in bin, Cumulative incorrect, Cumulative correct, Cumulative interesting]

 Score scale:
  • 1-2: Information that is overall incorrect
  • 3-4: Information that is somewhat correct
  • 5-6: Correct general information
  • 7-9: Correct information not commonly known
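
The score scale on this slide can be mapped to categories and per-category fractions as follows. This is a minimal sketch: the category labels paraphrase the scale, and the function names and sample scores are illustrative assumptions, not data from the expert evaluation.

```python
from collections import Counter


def score_category(score: int) -> str:
    """Map an expert score (1 to 9) to its category on the slide's scale."""
    if not 1 <= score <= 9:
        raise ValueError(f"score must be between 1 and 9, got {score}")
    if score <= 2:
        return "incorrect"
    if score <= 4:
        return "somewhat correct"
    if score <= 6:
        return "correct, general"
    return "correct, not commonly known"


def category_fractions(scores):
    """Return the fraction of scored facts that falls into each category."""
    counts = Counter(score_category(s) for s in scores)
    total = len(scores)
    return {category: n / total for category, n in counts.items()}


# Made-up scores for illustration only.
sample_scores = [1, 3, 5, 6, 7, 8, 9, 9]
fractions = category_fractions(sample_scores)
```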



Extractor Confidence vs. Correctness




• Analysis shows that the highest-quality extractions have the
  highest confidence, but some incorrectly extracted facts also have
  high confidence
→ High-quality patterns as well as some noise patterns have
  high indicative power.
Extractor Confidence vs. Correctness




• Many facts deemed interesting were extracted based on
  highly specialized patterns in the long tail of the frequency
  distribution.
• Noisy patterns also tend to occupy this space

Sources of Errors

 • Extracted relationship too specific or formally
   incorrect but metaphorically correct.
    – <Interpeduncular_Cistern → disease_has_associated_anatomic_site
      → Cerebral_peduncle> is incorrect,
       • Interpeduncular Cistern is not a disease. However, it does have
         the associated anatomic site Cerebral peduncle.
 • Incorrect directionality
    – <Pituitary_Gland → sends_output_to → Supraoptic_nucleus>
      should be <Supraoptic_nucleus → sends_output_to →
      Pituitary_Gland>
       • Direction in text often expressed in the context rather than the
         immediate pattern
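
The first error type on this slide could in principle be flagged by comparing argument types against a relationship signature. The following is a minimal sketch under stated assumptions: the type labels, the two lookup tables, and the function name are hypothetical, and a real system would draw types from a source such as UMLS rather than a hand-written dictionary.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical concept types, hand-written for illustration.
CONCEPT_TYPES = {
    "Interpeduncular_Cistern": "anatomic_site",
    "Cerebral_peduncle": "anatomic_site",
    "Pituitary_Gland": "anatomic_site",
    "Supraoptic_nucleus": "anatomic_site",
}

# Hypothetical (subject type, object type) signature per relationship.
PREDICATE_SIGNATURES = {
    "disease_has_associated_anatomic_site": ("disease", "anatomic_site"),
    "sends_output_to": ("anatomic_site", "anatomic_site"),
}


def matches_signature(triple: Triple) -> Optional[bool]:
    """Return True or False if the triple can be checked, None if not."""
    signature = PREDICATE_SIGNATURES.get(triple.predicate)
    subject_type = CONCEPT_TYPES.get(triple.subject)
    object_type = CONCEPT_TYPES.get(triple.obj)
    if signature is None or subject_type is None or object_type is None:
        return None
    return (subject_type, object_type) == signature


too_specific = Triple("Interpeduncular_Cistern",
                      "disease_has_associated_anatomic_site",
                      "Cerebral_peduncle")
wrong_direction = Triple("Pituitary_Gland", "sends_output_to", "Supraoptic_nucleus")
```

In this sketch `too_specific` fails the check because its subject is not typed as a disease, while `wrong_direction` passes because both arguments have the same type. A type check of this kind therefore cannot catch directionality errors, which is consistent with the slide's point that direction is often expressed in the surrounding context.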


Validation

 • Extracted statements need to be validated
   to be considered knowledge
   – Explicit validation, e.g. thumbs up/down
   – Implicit validation, e.g. by analyzing click streams




Explicit Validation

 • Certainty of reference
    – I.e. we know exactly which statement was
      validated
 • Validator credentials can be obtained
    – E.g. a small community of experts may evaluate
 • Extra work
    – Explicit validation is a task that is consciously
      performed




Implicit Validation

 • Find indications of correctness or
   incorrectness based on the way the users
   interact with the presented information
    – Every action taken on a piece of information is
      recorded and analyzed
    – The cumulative behavior of the users gives an
      indication of which propositions are correct or
      interesting




Implicit Validation

 • Examples for implicit community-validation
    – Games with a purpose (L. von Ahn)
    – Google search rankings
 • Scooner semantic browser
    – Browse literature along facts in a model
    – Browsing trails suggest correct extraction




Implicit Validation

 • A fact is browsed very often by different users.
    – The fact is interesting to many users.
    – The fact is surprising and interesting, but may be incorrect.

 • A user follows a trail of multiple fact-triples through
   a variety of documents.
    – The facts that were browsed have a high probability of being correct and support is
      added to the triples.
    – If the trail was longer than suggested by a small-world phenomenon, initial triples
      may have been incorrect, but led to interesting ones. For this reason, only the last
      k triples of the trail should garner support or the support should increase for the
      last k triples in the trail.
    – The last triple in the trail may have been incorrect and led to browsing results that
      caused the user to stop browsing. For this reason, the last triple of the trail should
      be treated with caution.
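
The trail heuristics above can be sketched as a small scoring routine. This is only an illustration under stated assumptions: the function name, the parameters `k`, `small_world_len`, and `last_weight`, and their default values are hypothetical, since the slide does not specify how much support a triple receives.

```python
from collections import defaultdict


def add_trail_support(support, trail, k=3, small_world_len=6, last_weight=0.5):
    """Add support for the triples of one browsing trail.

    Trails up to small_world_len credit every triple. Longer trails credit
    only the last k triples, because early triples may have been incorrect.
    The final triple always gets the reduced last_weight, since it may be
    the one that made the user stop browsing.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if not trail:
        return support
    credited = trail if len(trail) <= small_world_len else trail[-k:]
    for position, triple in enumerate(credited):
        is_last = position == len(credited) - 1
        support[triple] += last_weight if is_last else 1.0
    return support


# Hypothetical trail of four triples; longer than small_world_len=3.
support = defaultdict(float)
long_trail = [("A", "r1", "B"), ("B", "r2", "C"), ("C", "r3", "D"), ("D", "r4", "E")]
add_trail_support(support, long_trail, k=2, small_world_len=3)
```

Accumulating support over the trails of many users corresponds to the cumulative behavior the slides describe. A single trail is weak evidence on its own.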




Validation “through use”



 • Enter search terms
 • Choose entity of interest
 • Browse extracted facts
 • Choose relevant literature that supports the fact


Validation “through use”




 • Find another interesting fact
 • Fact trails are recorded




Validation “through use”
 • Path suggests that at least the first 2 triples are factually correct




Browsed Facts Examples




Related work

 • Evaluation and Use
    – E. Agichtein, E. Brill, and S. Dumais. Improving web
      search ranking by incorporating user behavior
      information. Proceedings of the 29th annual
      international ACM SIGIR conference on Research and
      development in information retrieval - SIGIR ’06, page
      19, 2006.
    – A. Das, M. Datar, A. Garg, and S. Rajaram. Google
      News Personalization: Scalable Online Collaborative
      Filtering. In Proceedings of the 16th international
      conference on World Wide Web, page 280. ACM,
      2007.


Summary Knowledge Acquisition
    • The model actually reflects what the user is
      interested in at the point of creation
       → Willingness to help validate facts
          – Applications allow for implicit and explicit
            evaluation
    • Validated statements can be merged with
      existing knowledge
       → Automated acquisition completed
       → Individual-driven KA improved overall system
• R. Kavuluru, C. Thomas et al. An Up-to-date Knowledge-Based Literature Search and Exploration Framework for Focused
  Bioscience Domains. IHI 2012
• Amit Sheth, Christopher Thomas, Pankaj Mehra, 'Continuous Semantics to Analyze Real-Time Data', IEEE IC, Nov./Dec. 2010
• C. Thomas et al. Improving Linked Open Data through On-Demand Model Creation. Web Science Conference, 2010.
• C. Thomas et al. Growing Fields of Interest - Using an Expand and Reduce Strategy for Domain Model Extraction. WIC 2008.

Future Directions

 • Active Learning to improve classification
   – Easy in tightly connected system (e.g. NELL)
   – Feedback mechanism for loosely connected
     systems
 • Improve depth of classification
   – Augment Domain Description with learned
     concept hierarchies from text (e.g. Navigli)
 • Knowledge management for background
   knowledge
   – Belief updates
   – Model evolution
Contributions
 • Conceptual Knowledge: Ontologies, LoD
    – Knowledge Representation [IJSWIS, CR, FLSW]
    – Ontology design [WWW, FOIS]
 • Knowledge merging / Ontology alignment [AAAI, WebSem2, SWSWPC]
 • Textual Information: Wikipedia, Web
    – Information Quality [WI2]
    – Social processes for content creation [CHB]
 • Social processes for knowledge validation [IHI, WebSci, CHB]
 • Taxonomy extraction [WI1, WebSci, WebSem1]
 • Event modeling [IEEE-IC]
 • Relationship/Fact/Event extraction [IHI, WebSem1, IEEE-IC, WebSci]

Journal/Conference Publications

 [WebSem1] C. Thomas, P. Mehra, A. Sheth, W. Wang, G. Weikum. Automatic
     domain model creation using pattern-based fact extraction. Submitted to
     Journal of Web Semantics.
 [IHI] R. Kavuluru, C. Thomas, A. Sheth, V. Chan, W. Wang, A. Smith, A. Sato and
     A. Walters. An Up-to-date Knowledge-Based Literature Search and
     Exploration Framework for Focused Bioscience Domains. IHI 2012 - 2nd
     ACM SIGHIT International Health Informatics Symposium, January 28-30,
     2012.
 [IEEE-IC] Amit Sheth, Christopher Thomas, Pankaj Mehra, 'Continuous
     Semantics to Analyze Real-Time Data', IEEE Internet Computing, vol. 14, no.
     6, pp. 84-89, Nov./Dec. 2010, doi:10.1109/MIC.2010.137
 [WebSci] C. Thomas, W. Wang, P. Mehra and A. Sheth. What Goes Around
     Comes Around - Improving Linked Open Data through On-Demand Model
     Creation. Web Science Conference, 2010.
 [WI1] C. Thomas, P. Mehra, R. Brooks, and A. Sheth. Growing Fields of Interest
     - Using an Expand and Reduce Strategy for Domain Model Extraction. Web
     Intelligence and Intelligent Agent Technology, IEEE/WIC/ACM International
     Conference on, 1:496–502, 2008.

Journal/Conference Publications


 [WI2] C. Thomas and A. Sheth. Semantic Convergence of Wikipedia Articles. In
    Proceedings of the 2007 IEEE/WIC International Conference on Web
    Intelligence, pages 600–606, Washington, DC, USA, November 2007. IEEE
    Computer Society.
 [WWW] S. S. Sahoo, C. Thomas, A. Sheth, W. S. York, and S. Tartir. Knowledge
    Modeling and its Application in Life Sciences: A Tale of two Ontologies. In
    WWW ’06: Proceedings of the 15th international conference on World Wide
    Web, pages 317–326, New York, NY, USA, 2006. ACM Press.
 [FOIS] C. Thomas, A. Sheth, and W. York. Modular Ontology Design Using
    Canonical Building Blocks in the Biochemistry Domain. In Proceeding of the
    2006 conference on Formal Ontology in Information Systems: Proceedings of
    the Fourth International Conference (FOIS 2006), pages 115–127,
    Amsterdam (NL), 2006. IOS Press.
 [AAAI] P. Doshi and C. Thomas. Inexact matching of ontology graphs using
    expectation-maximization. In AAAI ’06: Proceedings of the 21st national
    conference on Artificial intelligence, pages 1277–1282. AAAI Press, 2006.


Publications

 [CHB] C. Thomas and A. Sheth. Web Wisdom - An Essay on How Web 2.0 and
     Semantic Web can foster a Global Knowledge Society. Computers in Human
     Behavior, Elsevier.
 [WebSem2] P. Doshi, R. Kolli, and C. Thomas. Inexact matching of ontology
     graphs using expectation-maximization. Web Semantics: Science, Services
     and Agents on the World Wide Web, 7(2):90–106, 2009.
 [IJWGS] V. Kashyap, C. Ramakrishnan, C. Thomas, and A. Sheth. Taxaminer:
     an experimentation framework for automated taxonomy bootstrapping.
     International Journal of Web and Grid Services, 1(2):240–266, 2005.
 [IJSWIS] A. P. Sheth, C. Ramakrishnan, and C. Thomas. Semantics for the
     semantic web: The implicit, the formal and the powerful. Int. J. Semantic Web
     Inf. Syst., 1(1):1–18, 2005.
 [CR] S. Sahoo, C. Thomas, A. Sheth, C. Henson, and W. York. GLYDE - an
     expressive XML standard for the representation of glycan structure.
     Carbohydrate research, 340(18):2802–2807, 2005.




Other Publications

 Workshop Publications
 [SWLS] A. Sheth, W. York, C. Thomas, M. Nagarajan, J. Miller, K. Kochut, S.
    Sahoo, and X. Yi. Semantic Web technology in support of Bioinformatics for
    Glycan Expression. In W3C Workshop on Semantic Web for Life Sciences,
    pages 27–28, 2004.
 [SWSWPC] N. Oldham, C. Thomas, A. Sheth, and K. Verma. METEOR-S Web
    Service Annotation Framework with Machine Learning Classification.
    Semantic Web Services and Web Process Composition, pages 137–146,
    2005, Springer.
 Book Chapters
 [FLSW] C. Thomas and A. Sheth. On the expressiveness of the languages for
    the semantic web - making a case for a little more. Fuzzy Logic and the
    Semantic Web, pages 3–20, 2006.
 Patent
 [PAT] P. Mehra, R. Brooks and C. Thomas. ONTOLOGY CREATION BY
    REFERENCE TO A KNOWLEDGE CORPUS. Pub.No. US 2010/0280989 A1



• Research
   – KR
   – Domain model extraction / IE
• Collaborations
   – Complex Carbohydrate Research Center at UGA
   – HP Labs Palo Alto
   – Human Performance Directorate, AFRL
• Proposals
   – HP Incubation & Innovation grant for Doozer++
   – AFRL grant largely based on Doozer++
   – NSF proposal submitted with “very good” reviews
• Tools and Ontologies
   – GlycO
   – GlycoViz
   – Doozer++
   – Scooner
Thank you!



   Gerhard Weikum, Shaojun Wang, Amit Sheth, Pascal Hitzler, Pankaj Mehra

   Thanks to all Kno.e.sis Center Members - Past and Present

Thank you



Knowledge Enabled Information and Services Science   89

More Related Content

What's hot

20120718 linkedopendataandnextgenerationsciencemcguinnessesip final
20120718 linkedopendataandnextgenerationsciencemcguinnessesip final20120718 linkedopendataandnextgenerationsciencemcguinnessesip final
20120718 linkedopendataandnextgenerationsciencemcguinnessesip finalDeborah McGuinness
 
Information Visualization for Social Network Analysis,
 Information Visualization for Social Network Analysis,  Information Visualization for Social Network Analysis,
Information Visualization for Social Network Analysis, University of Maryland
 
Relationship Web: Trailblazing, Analytics and Computing for Human Experience
Relationship Web: Trailblazing, Analytics and Computing for Human ExperienceRelationship Web: Trailblazing, Analytics and Computing for Human Experience
Relationship Web: Trailblazing, Analytics and Computing for Human ExperienceAmit Sheth
 
Elizabeth Churchill, "Data by Design"
Elizabeth Churchill, "Data by Design"Elizabeth Churchill, "Data by Design"
Elizabeth Churchill, "Data by Design"summersocialwebshop
 
Oop principles a good book
Oop principles a good bookOop principles a good book
Oop principles a good booklahorisher
 
Looking for Commonsense in the Semantic Web
Looking for Commonsense in the Semantic WebLooking for Commonsense in the Semantic Web
Looking for Commonsense in the Semantic WebValentina Presutti
 
Metadata in a Crowd: Shared Knowledge Production
Metadata in a Crowd: Shared Knowledge ProductionMetadata in a Crowd: Shared Knowledge Production
Metadata in a Crowd: Shared Knowledge ProductionKevin Rundblad
 
How to utilize ‘big data’ on SNS for academic purpose?
How to utilize ‘big data’ on SNS  for academic purpose?How to utilize ‘big data’ on SNS  for academic purpose?
How to utilize ‘big data’ on SNS for academic purpose?Han Woo PARK
 
Mining in Ontology with Multi Agent System in Semantic Web : A Novel Approach
Mining in Ontology with Multi Agent System in Semantic Web : A Novel ApproachMining in Ontology with Multi Agent System in Semantic Web : A Novel Approach
Mining in Ontology with Multi Agent System in Semantic Web : A Novel Approachijma
 
Tutorial Cognition - Irene
Tutorial Cognition - IreneTutorial Cognition - Irene
Tutorial Cognition - IreneSSSW
 
Riding The Semantic Wave
Riding The Semantic WaveRiding The Semantic Wave
Riding The Semantic WaveKaniska Mandal
 
Text REtrieval Conference (TREC) Dynamic Domain Track 2015
Text REtrieval Conference (TREC) Dynamic Domain Track 2015Text REtrieval Conference (TREC) Dynamic Domain Track 2015
Text REtrieval Conference (TREC) Dynamic Domain Track 2015Grace Hui Yang
 
Why manage research data?
Why manage research data?Why manage research data?
Why manage research data?Graham Pryor
 
Rise of Crowd Computing (December 2012)
Rise of Crowd Computing (December 2012)Rise of Crowd Computing (December 2012)
Rise of Crowd Computing (December 2012)Matthew Lease
 
Information Architecture & Findability
Information Architecture & FindabilityInformation Architecture & Findability
Information Architecture & FindabilityAre Halland
 
20120419 linkedopendataandteamsciencemcguinnesschicago
20120419 linkedopendataandteamsciencemcguinnesschicago20120419 linkedopendataandteamsciencemcguinnesschicago
20120419 linkedopendataandteamsciencemcguinnesschicagoDeborah McGuinness
 
Online Data Preprocessing: A Case Study Approach
Online Data Preprocessing: A Case Study ApproachOnline Data Preprocessing: A Case Study Approach
Online Data Preprocessing: A Case Study ApproachIJECEIAES
 
Supporting Rationale Awareness in Large-Scale Online Open Participative Activ...
Supporting Rationale Awareness in Large-Scale Online Open Participative Activ...Supporting Rationale Awareness in Large-Scale Online Open Participative Activ...
Supporting Rationale Awareness in Large-Scale Online Open Participative Activ...Lu Xiao
 

What's hot (20)

20120718 linkedopendataandnextgenerationsciencemcguinnessesip final
20120718 linkedopendataandnextgenerationsciencemcguinnessesip final20120718 linkedopendataandnextgenerationsciencemcguinnessesip final
20120718 linkedopendataandnextgenerationsciencemcguinnessesip final
 
Information Visualization for Social Network Analysis,
 Information Visualization for Social Network Analysis,  Information Visualization for Social Network Analysis,
Information Visualization for Social Network Analysis,
 
Relationship Web: Trailblazing, Analytics and Computing for Human Experience
Relationship Web: Trailblazing, Analytics and Computing for Human ExperienceRelationship Web: Trailblazing, Analytics and Computing for Human Experience
Relationship Web: Trailblazing, Analytics and Computing for Human Experience
 
Elizabeth Churchill, "Data by Design"
Elizabeth Churchill, "Data by Design"Elizabeth Churchill, "Data by Design"
Elizabeth Churchill, "Data by Design"
 
Oop principles a good book
Oop principles a good bookOop principles a good book
Oop principles a good book
 
Looking for Commonsense in the Semantic Web
Looking for Commonsense in the Semantic WebLooking for Commonsense in the Semantic Web
Looking for Commonsense in the Semantic Web
 
Metadata in a Crowd: Shared Knowledge Production
Metadata in a Crowd: Shared Knowledge ProductionMetadata in a Crowd: Shared Knowledge Production
Metadata in a Crowd: Shared Knowledge Production
 
Ibrahim ramadan paper
Ibrahim ramadan paperIbrahim ramadan paper
Ibrahim ramadan paper
 
How to utilize ‘big data’ on SNS for academic purpose?
How to utilize ‘big data’ on SNS  for academic purpose?How to utilize ‘big data’ on SNS  for academic purpose?
How to utilize ‘big data’ on SNS for academic purpose?
 
Mining in Ontology with Multi Agent System in Semantic Web : A Novel Approach
Mining in Ontology with Multi Agent System in Semantic Web : A Novel ApproachMining in Ontology with Multi Agent System in Semantic Web : A Novel Approach
Mining in Ontology with Multi Agent System in Semantic Web : A Novel Approach
 
Tutorial Cognition - Irene
Tutorial Cognition - IreneTutorial Cognition - Irene
Tutorial Cognition - Irene
 
Riding The Semantic Wave
Riding The Semantic WaveRiding The Semantic Wave
Riding The Semantic Wave
 
Text REtrieval Conference (TREC) Dynamic Domain Track 2015
Text REtrieval Conference (TREC) Dynamic Domain Track 2015Text REtrieval Conference (TREC) Dynamic Domain Track 2015
Text REtrieval Conference (TREC) Dynamic Domain Track 2015
 
Why manage research data?
Why manage research data?Why manage research data?
Why manage research data?
 
Rise of Crowd Computing (December 2012)
Rise of Crowd Computing (December 2012)Rise of Crowd Computing (December 2012)
Rise of Crowd Computing (December 2012)
 
Information Architecture & Findability
Information Architecture & FindabilityInformation Architecture & Findability
Information Architecture & Findability
 
20120419 linkedopendataandteamsciencemcguinnesschicago
20120419 linkedopendataandteamsciencemcguinnesschicago20120419 linkedopendataandteamsciencemcguinnesschicago
20120419 linkedopendataandteamsciencemcguinnesschicago
 
Online Data Preprocessing: A Case Study Approach
Online Data Preprocessing: A Case Study ApproachOnline Data Preprocessing: A Case Study Approach
Online Data Preprocessing: A Case Study Approach
 
Supporting Rationale Awareness in Large-Scale Online Open Participative Activ...
Supporting Rationale Awareness in Large-Scale Online Open Participative Activ...Supporting Rationale Awareness in Large-Scale Online Open Participative Activ...
Supporting Rationale Awareness in Large-Scale Online Open Participative Activ...
 
James Robson - Digital and Online Ethnography
James Robson - Digital and Online EthnographyJames Robson - Digital and Online Ethnography
James Robson - Digital and Online Ethnography
 

Viewers also liked

Pablo Mendes' Defense: Adaptive Semantic Annotation of Entity and Concept Men...
Pablo Mendes' Defense: Adaptive Semantic Annotation of Entity and Concept Men...Pablo Mendes' Defense: Adaptive Semantic Annotation of Entity and Concept Men...
Pablo Mendes' Defense: Adaptive Semantic Annotation of Entity and Concept Men...Artificial Intelligence Institute at UofSC
 
Cartic Ramakrishnan's dissertation defense
Cartic Ramakrishnan's dissertation defenseCartic Ramakrishnan's dissertation defense
Cartic Ramakrishnan's dissertation defenseCartic Ramakrishnan
 
Delroy Cameron's Dissertation Defense: A Contenxt-Driven Subgraph Model for L...
Delroy Cameron's Dissertation Defense: A Contenxt-Driven Subgraph Model for L...Delroy Cameron's Dissertation Defense: A Contenxt-Driven Subgraph Model for L...
Delroy Cameron's Dissertation Defense: A Contenxt-Driven Subgraph Model for L...Artificial Intelligence Institute at UofSC
 
Kno.e.sis Approach to Impactful Research & Training for Exceptional Careers
Kno.e.sis Approach to Impactful Research & Training for Exceptional CareersKno.e.sis Approach to Impactful Research & Training for Exceptional Careers
Kno.e.sis Approach to Impactful Research & Training for Exceptional CareersAmit Sheth
 
Data Processing and Semantics for Advanced Internet of Things (IoT) Applicati...
Data Processing and Semantics for Advanced Internet of Things (IoT) Applicati...Data Processing and Semantics for Advanced Internet of Things (IoT) Applicati...
Data Processing and Semantics for Advanced Internet of Things (IoT) Applicati...Artificial Intelligence Institute at UofSC
 
Smart Data - How you and I will exploit Big Data for personalized digital hea...
Smart Data - How you and I will exploit Big Data for personalized digital hea...Smart Data - How you and I will exploit Big Data for personalized digital hea...
Smart Data - How you and I will exploit Big Data for personalized digital hea...Amit Sheth
 

Viewers also liked (20)

A Semantics-based Approach to Machine Perception
A Semantics-based Approach to Machine PerceptionA Semantics-based Approach to Machine Perception
A Semantics-based Approach to Machine Perception
 
Pablo Mendes' Defense: Adaptive Semantic Annotation of Entity and Concept Men...
Pablo Mendes' Defense: Adaptive Semantic Annotation of Entity and Concept Men...Pablo Mendes' Defense: Adaptive Semantic Annotation of Entity and Concept Men...
Pablo Mendes' Defense: Adaptive Semantic Annotation of Entity and Concept Men...
 
Prateek Jain's Dissertation Defense - Linked Open Data Alignment and Querying
Knowledge Acquisition in a System: Automatic Creation of Conceptual Domain Models

  • 1. Knowledge Acquisition in a System. Christopher Thomas, Ohio Center of Excellence in Knowledge-enabled Computing - Kno.e.sis, Wright State University, Dayton, OH. topher@knoesis.org
  • 2. Circle of knowledge in a System Knowledge Enabled Information and Services Science 2
  • 3. Dissertation Overview
      • Conceptual Knowledge: Ontologies, LoD
          – Knowledge Representation [IJSWIS, CR, FLSW]
          – Ontology design [WWW, FOIS]
      • Knowledge merging / Ontology alignment [AAAI, WebSem2, SWSWPC]
      • Textual Information: Wikipedia, Web
          – Information Quality [WI2]
          – Social processes for content creation [CHB]
      • Social processes for knowledge validation [IHI, WebSci, CHB]
      • Doozer++: Taxonomy extraction, Relationship/Fact extraction [IHI, WebSem1, IEEE-IC, WebSci, WI1]
  • 4. Talk Contents
      • What is knowledge?
      • How do we turn propositions/beliefs into knowledge?
      • How do we acquire information?
  • 5. Talk outline
      • Motivation
      • Knowledge Acquisition (KA) Overview
      • KA in a loosely connected system
          – Doozer++
          – Automatic formal domain model creation
          – Information Extraction
              • Top-Down
              • Bottom-Up
          – Information Validation “in use”
      • Conclusion
  • 6. Larger Context of automated KA
      • Increasing significance of knowledge economy
          – “Knowledge Workers” spend 38% of their time searching for information (McDermott, 2005)
          – Vital to get a quick and still comprehensive understanding of a field through pertinent concepts/entities and relations/interactions
      • Increased demand for formally available knowledge in semantic models
          – Filtering, browsing, annotation, reasoning
      McDermott, M. "Knowledge Workers: How can you gauge their effectiveness." Leadership Excellence. Vol. 22.10. October 2005
  • 7. Motivating Scenario
      • Learn about a new subject
          – E.g. gain a quick overview of a current or historical event
      • Use a formal representation of the gained overview to filter information
          – Facilitate in-depth exploration
      • Use the formalized information and the user interaction to create knowledge from information
  • 8. Motivating Scenario
      • Google: India
      • Brief description: demographic and geographic information, etc.
  • 9. Motivating Scenario
      • Google: India
      • Regular Web results
  • 10. Motivating Scenario
      • Clicking on a link to the Wikipedia entry shows that there have been conflicts with Pakistan over the region of Kashmir → Investigate more
  • 11. Motivating Scenario
      • Google: India Pakistan Kashmir
      • Only Web results and news
      So far, search engines only display facts about entities, not relationships or larger contexts
  • 12. Motivating Scenario
      • Beneficial to get an overview of a domain “at a glance”
      • Automated approach to creating knowledge models for focused areas of interest
      • Create models around an incomplete or rudimentary keyword description and “anticipate” the user's intentions with respect to the full context
  • 13. Motivating Scenario
      • Doozer++: india pakistan kashmir
      • Important concepts and relationships describing the context
  • 14. Motivating Scenario
      • Filtered IR using concepts in the model
      • Concepts and relationships that contributed to clicked results gain support
      • User can explicitly approve content
  • 15. Circle of Knowledge (Example)
  • 16. Motivating Scenario
      • On-demand creation of domain knowledge improves individual comprehension of an event
      • Formal models are easy to use in information filtering
      • Validated information → Knowledge
          – Can be given back to the community to improve the overall amount of formal knowledge available on the Web
          – E.g. “Unknown” to DBPedia that the region of Kashmir belongs to both India and Pakistan
  • 17. Importance of Model creation
      • Models support the individual user or knowledge worker, but also groups or systems
          – More efficient communication through small, shared, agreeable conceptualizations
              • People ↔ people
              • People ↔ system
              • System ↔ system
          – Classify or filter pertinent and topical information using models
          – Model-assisted searching and faceted or exploratory browsing using relationships
          – Reuse of validated knowledge
  • 18. Domain Knowledge Models
      • Scientific applications
          – In-depth description of concepts
          – Narrow field
          – People ↔ system, system ↔ system
              • Annotation, reasoning
          ⇒ Absolute correctness necessary (as far as possible)
      • General applications
          – Broad coverage of the field
          – Context: how does the new information fit in?
          – People ↔ people, people ↔ system
              • Individual domain comprehension, filtering, annotation
          ⇒ Relative correctness sufficient
  • 19. Model Creation Resources
      • Large models are available as reference
          – DBPedia, YAGO, UMLS, MeSH, GO …
          – Too big to be efficiently and effectively usable
              • Prior knowledge required to find pertinent resources
      • Other information is available in great abundance, but unformalized
          – Tacit expert knowledge
          – Scientific databases
          – Free text
              • Peer-reviewed journals and proceedings
              • General Web content
  • 20. Epistemological Considerations
      • Knowledge
          – Ensure epistemological soundness of automated knowledge acquisition
      • Reference
          – Ensure that nodes in the models refer to real-world concepts/entities
  • 21. Knowledge
      • Functional Definition
          – Knowledge = “Know-How”
          – Practical, but weak; includes “Actionable Information”
      • Categorical Definition
          – Knowledge = Justified true belief
          – S knows that p iff
              i. p is true;
              ii. S believes that p;
              iii. S is justified in believing that p.
  • 22. Belief and Justification
      • Belief
          – Statements held by the system
      • Justification
          – Trusted sources
          – Extraction algorithms
              • Bayesian, deductive or inductive reasoning
              • Macro-Reading algorithms → Wisdom of the crowds
          – Validation
  • 23. Truth assessment of a statement
      • Is truth correspondence?
          – “A” is true iff A (a true statement corresponds to an actual state of affairs) (slide annotation: “No Access”)
      • Is truth coherence?
          – Does the statement fit into the system of other statements?
      • Is truth consensus?
          – Agreement on correctness amongst a group
      ⇒ In the cyclical model, achieve a high degree of certainty by allowing constant validation
  • 24. Domain Model – Reference
      • Model of a domain conceptually split
          – Domain Definition
              • Concepts identified by URIs (classes, entities, relationship types) → ensures reference
              • Remains static – necessity
              • Rigid designators (Kripke)
          – Domain Description
              • Relationships describe concepts
              • Subject to change – possibility
              • Definite descriptions (Russell)
  • 25. Domain Definition
      • Top-down concept identification
      • Achieved through
          – Manual creation based on consensus in a group
          – Extraction from community-created or peer-reviewed conceptualization
              • Wikipedia
              • MeSH or UMLS Semantic Network
  • 26. Domain Description
      • Possible to do top-down extraction of the domain description, e.g. from DBPedia
      • Problem: Formal concept descriptions are sparse
          – On average, DBPedia has fewer than 2 object properties per entity
      • Extract descriptions (facts) bottom-up
          – Available in text, DBs, etc.
          – Domain-specific molecular structure extractors (GlycO)
          – Domain-independent IE techniques (Doozer++)
  • 27. Knowledge Acquisition Approaches
      • KA in a tightly connected system
          – GlycO: domain-specific BioChemistry ontology
              • Manual domain definition and description
              • Partial automatic domain description
              • Domain-specific automatic validation
              • Manual validation for false negatives
      • KA in a loosely connected system
          – Doozer++: general domain-model creation framework
              • Automatic domain definition, top-down concept extraction
              • Automatic domain description, bottom-up fact extraction
                  – Extraction from trusted sources
                  – A trusted extraction and validation procedure
              • Domain-independent community-based validation
  • 28. Knowledge Acquisition Approaches (comparison of Knowledge Engineering, Traditional Extraction, GlycO Approach, Doozer++ Approach)
      • Definition
          – Knowledge Engineering: Top-Down
          – Traditional Extraction: Bottom-up
          – GlycO Approach: Top-Down, Knowledge Engineering
          – Doozer++ Approach: Top-Down, conceptually, by extraction from Top-Down corpus
      • Description
          – Knowledge Engineering: Top-Down
          – Traditional Extraction: Bottom-up
          – GlycO Approach: Bottom-up, restricted by Top-down definition
          – Doozer++ Approach: Bottom-up, restricted by Top-down definition
      • Verification
          – Knowledge Engineering: Manual
          – Traditional Extraction: Manual
          – GlycO Approach: Correctness: automatic; Exceptions: added manually
          – Doozer++ Approach: Community-based validation
  • 29. KA on the Web - Vision
      • Web searches, browsing sessions or classification tasks can be seen as creating an implicit domain model
          – World view, Concept coverage, Facts
      • Make models explicit and reusable using formal descriptions (RDF, OWL)
      • Validate the contained information and share it with the community
      → Increase the system's knowledge by “doing what you do”: Search, browse, click, communicate
  • 30. KA in a Loosely Connected System
      Domain Model creation to gradually increase overall knowledge of the system
      • Sources: Linked Data, Free text, Wikipedia, Web
      • User-interest driven
      • Incentive to evaluate
      • Cycle stages: Domain Definition, Domain Description, Validation
      • Doozer++
          – Domain Definition: Top-down concept extraction
          – Domain Description: Pattern-based fact extraction
      • Scooner
          – Evaluation in Use: Semantic browsing and retrieval, Domain-independent, Community-based
  • 31. Domain Definition Requirements
      • Identify concepts, concept labels (denotations) and concept hierarchy
      • Challenge: define narrow boundaries for a domain while at the same time ensuring broad conceptual coverage within the domain
  • 32. Domain definition - conceptual
      • Expand and Reduce approach
          – Start with “high recall” methods
              • Exploration – Full text search
              • Exploitation – Graph-Similarity Method
              • Category growth
              • “What could be in the domain?”
          – End with “high precision” methods
              • Apply restrictions on the concepts found
              • Remove terms and categories that fall outside the dense areas of the model graph
              • “What should be in the domain?”
  • 33. Domain Description - Classifier
      • Concept-aware
          – Use concepts and concept labels from the domain definition step
      • Fact extraction as classification of concept pairs into relationship types
          – f_class: C × C → R
          – R_{S,O} = {R | p(R, S, O) > ε}
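The thresholding rule R_{S,O} = {R | p(R, S, O) > ε} can be sketched as follows. This is a minimal illustration, assuming the scores p(R, S, O) for one concept pair are already available as a plain dictionary. The function name, the scores and the ε value are hypothetical and not taken from Doozer++.

```python
def select_relations(scores, epsilon):
    """Return the set {R | p(R, S, O) > epsilon} for one (S, O) concept pair.

    `scores` maps a relationship name R to its probability p(R, S, O).
    """
    return {rel for rel, p in scores.items() if p > epsilon}


# Hypothetical scores for the pair (Barack_Obama, Columbia_University).
scores = {"almaMater": 0.71, "employer": 0.12, "birthPlace": 0.01}

selected = select_relations(scores, epsilon=0.5)
print(selected)
```

Because the result is a set, a concept pair may end up with several relationship types, or with none if no score exceeds ε.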
  • 34. Domain Description
      • Combined Language model and Semantic classification model
      • Language model: surface-pattern-based
          – Pattern manifestations of relationships as features
          – Open to any corpus, language independent
          – Less computational overhead than NLP
      • Semantic Classification Model
          – Learned or assigned concept labels
          – Semantic types to aid classification
  • 35. Domain Description - Implementation
      • Probabilistic Vector-space model
          – Each relationship is defined by vectors of
              • Pattern probabilities
              • Domain/range probabilities
          – Each concept is grounded by its semantic types and manifested by its labels and their probabilities of identifying the concept
          – Sparse pattern representation (density ~2%)
          – White-box, easily verifiable
          – Inherently parallel
  • 36. Terminology
    Symbol | Meaning | Example
    S, O | Subject and Object concepts (semantic) | Kelly_Miller_(scientist), Howard_University
    LS, LO | Subject and Object labels | “Kelly Miller”, “Howard University”
    PLS,LO | Phrase instantiating the pattern | Kelly Miller graduated from Howard University
    P | Pattern | <Subject> graduated from <Object>
    TS, TO | Semantic type of Subject or Object | Person, Educational_Institution
    R | Relationship | almaMater, birthPlace
  • 37. Probabilistic Classifier
    [Diagram: inputs to the classifier]
    • Semantic types: asserted in ontology or learned from linked data
    • Labels: taken from lexicon or linked corpus
    • Patterns: learned from free text
  • 38. Probabilistic Classifier How is Barack Obama related to Columbia University? p(R, Barack_Obama, Columbia_University) Sentence in corpus: Obama graduated in 1983 from Columbia University with a degree in political science and international relations. (Regular classification requires multiple examples) Knowledge Enabled Information and Services Science 38
  • 39. Probabilistic Classifier
    Obama graduated in 1983 from Columbia University
    p(almaMater, Barack_Obama, Columbia_University)
      = p(almaMater | “<Subject> graduated in 1983 from <Object>”)
      * p(Barack_Obama | “Obama”)
      * p(Columbia_University | “Columbia University”)
      * p(almaMater | domain(person))
      * p(almaMater | range(academic_institution))
      = 0.9 * 0.95 * 0.95 * 0.9 * 0.97
      = 0.70909425
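The arithmetic on the slide above can be checked with a few lines of Python. This is only a sketch for a single phrase occurrence; the function and parameter names are assumptions chosen for readability, and the averaging over multiple phrase occurrences mentioned in the editor's notes is not modeled here.

```python
import math


def joint_probability(p_rel_given_pattern, p_subject_given_label,
                      p_object_given_label, p_rel_given_domain,
                      p_rel_given_range):
    """Multiply the five factors used in the slide's worked example."""
    return math.prod([
        p_rel_given_pattern,    # p(almaMater | "<Subject> graduated in 1983 from <Object>")
        p_subject_given_label,  # p(Barack_Obama | "Obama")
        p_object_given_label,   # p(Columbia_University | "Columbia University")
        p_rel_given_domain,     # p(almaMater | domain(person))
        p_rel_given_range,      # p(almaMater | range(academic_institution))
    ])


p = joint_probability(0.9, 0.95, 0.95, 0.9, 0.97)
print(round(p, 8))  # 0.70909425
```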
  • 40. Pattern Generalization • Problem: Low recall in pattern-based IE • Substitute terms with wild cards – No POS tagging, hence only “*” wild cards • Mirrors shortest paths through parse trees
    <Subject> graduated in 1983 from <Object>
    <Subject> * in 1983 from <Object>
    <Subject> graduated * 1983 from <Object>
    <Subject> * * 1983 from <Object>
    <Subject> graduated in * from <Object>
    <Subject> * in * from <Object>
    <Subject> graduated * * from <Object>
    <Subject> * * * from <Object>
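One way to enumerate the wildcard variants listed on the slide above is sketched below. The slide does not state why the final token ("from") is never replaced, so the `keep_last` parameter is an assumption introduced here to reproduce the eight listed patterns; the function name `generalize` is also hypothetical.

```python
from itertools import product


def generalize(tokens, keep_last=1):
    """Return all variants of `tokens` where any subset of the leading tokens
    is replaced by '*'. The last `keep_last` tokens are never replaced
    (an assumption made to match the slide's list)."""
    n_free = len(tokens) - keep_last
    free, fixed = tokens[:n_free], tokens[n_free:]
    variants = []
    for mask in product([False, True], repeat=n_free):
        middle = ["*" if wild else tok for tok, wild in zip(free, mask)]
        variants.append(" ".join(["<Subject>", *middle, *fixed, "<Object>"]))
    return variants


patterns = generalize(["graduated", "in", "1983", "from"])
for pattern in patterns:
    print(pattern)
```

With four tokens and one held fixed, this yields 2^3 = 8 patterns, beginning with the unmodified pattern and ending with `<Subject> * * * from <Object>`.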
  • 41. Learning p(R|P) • Distantly Supervised Training • Collect pattern frequencies for training examples – Fact triples <S, R, O> e.g. from Linked Data (DBPedia, UMLS) – Manifestations of facts in text in the form of patterns (corpus e.g. Web, Wikipedia, MedLine) • For relationship Ri, aggregate pattern vectors representing <*, Ri, *> Knowledge Enabled Information and Services Science 41
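The distantly supervised collection step described above can be sketched as follows. This is a simplified illustration under stated assumptions: labels are matched as plain substrings, only subject-before-object order is considered, and the names `collect_pattern_counts`, `facts`, `labels`, and `sentences` are hypothetical rather than taken from Doozer++.

```python
from collections import Counter, defaultdict


def collect_pattern_counts(facts, labels, sentences):
    """Aggregate pattern frequencies per relationship, i.e. vectors for <*, Ri, *>.

    facts:     iterable of (subject, relationship, object) concept triples
    labels:    dict mapping a concept to its textual labels
    sentences: list of corpus sentences
    """
    counts = defaultdict(Counter)
    for subj, rel, obj in facts:
        for sentence in sentences:
            for subj_label in labels[subj]:
                s_pos = sentence.find(subj_label)
                if s_pos < 0:
                    continue
                s_end = s_pos + len(subj_label)
                for obj_label in labels[obj]:
                    o_pos = sentence.find(obj_label, s_end)
                    if o_pos < 0:
                        continue
                    # The text between the two labels becomes the pattern.
                    middle = sentence[s_end:o_pos].strip()
                    counts[rel][f"<Subject> {middle} <Object>"] += 1
    return counts


facts = [("Kelly_Miller_(scientist)", "almaMater", "Howard_University")]
labels = {
    "Kelly_Miller_(scientist)": ["Kelly Miller"],
    "Howard_University": ["Howard University"],
}
sentences = [
    "Kelly Miller graduated from Howard University.",
    "Howard University is in Washington.",
]
counts = collect_pattern_counts(facts, labels, sentences)
print(dict(counts["almaMater"]))
```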
  • 42. Learning p(R|P) – naïve • For each vector Ri containing pattern frequencies for relationship Ri, compute $p(P_j \mid R_i) = \frac{\#(P_j, R_i)}{\sum_k \#(P_k, R_i)}$ • That is, the number of occurrences of Pattern j with terms denoting each <S, O> in Ri, normalized by all pattern occurrences for Ri
  • 43. Learning p(R|P) – naïve • Uniform distribution of relationships assumed – As the number of relationship types grows, the prior of each type goes towards 0. – Normalize the probabilities over the column vector to get p(Ri|Pj) • Vector space representation – Relationship-pattern matrix – R2Pij = p(Ri|Pj)
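The two normalization steps of the naïve approach (per relationship, then per pattern column) can be sketched in plain Python. This is an illustrative reading of the slides, not the original implementation; the function name `learn_r2p` and the toy count matrix are assumptions.

```python
def learn_r2p(counts):
    """Build the relationship-pattern matrix R2P[i][j] = p(R_i | P_j).

    counts[i][j] is the frequency of pattern j observed with relationship i.
    """
    # Step 1: p(P_j | R_i), normalizing each relationship's row by its total.
    p_pattern_given_rel = []
    for row in counts:
        total = sum(row)
        p_pattern_given_rel.append([c / total if total else 0.0 for c in row])

    # Step 2: with a uniform prior over relationships, normalizing each
    # pattern column yields p(R_i | P_j).
    n_patterns = len(counts[0])
    col_sums = [sum(row[j] for row in p_pattern_given_rel)
                for j in range(n_patterns)]
    return [
        [row[j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n_patterns)]
        for row in p_pattern_given_rel
    ]


# Toy counts: 2 relationships x 3 patterns (illustrative values only).
counts = [[8, 2, 0],
          [1, 1, 8]]
r2p = learn_r2p(counts)
for row in r2p:
    print([round(v, 3) for v in row])
```

In this toy matrix the third pattern occurs only with the second relationship, so its column assigns that relationship a probability of 1.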
  • 44. Problem: Relationship Similarities • Extensional similarity – Semantically different relationships can share Subject-Object pairs in training data • Intensional similarity – Overlap and entailment of relationship types • Types should not be seen as discrete – E.g., physical_part_of entails part_of • A priori unknown which types overlap unless a formal description is available – Semantically similar types compete for the same patterns
  • 45. Relationship similarities Pertinence Measure similarity between pattern vectors as approximation of intensional similarity Knowledge Enabled Information and Services Science 45
  • 46. Pertinence for Relationships Do not punish the occurrence of the same pattern with relationship types that are intensionally similar, but extensionally dissimilar Reduce impact of extensionally similar relations Knowledge Enabled Information and Services Science 46
  • 47. Pertinence Example
    Pattern: <Subject> in the right <Object>
    Relationship | p(R|P)
    biological_process_has_associated_location | 0.968371381
    disease_has_associated_anatomic_site | 0.880452774
    part_of | 0.622532958
    has_finding_site | 0.561041318
    has_location | 0.537424451
    has_direct_procedure_site | 0.363832078
    Sum: | 3.933654958
    Note: This never causes p(R,S,O) > 1
  • 48. Similarities between relationships Knowledge Enabled Information and Services Science 48
  • 49. Pertinence evaluation
    [Plot: precision (y-axis) vs. recall (x-axis), comparing Pertinence with No Pertinence.]
  • 50. Fact extraction evaluation - DBPedia
    60% training set, 40% testing, DBPedia Infobox fact corpus, Wikipedia text corpus
    [Plot: precision / recall (y-axis) vs. confidence threshold (x-axis).]
    Strict evaluation: Only the 1st ranked extracted relation is compared to the gold standard. Averaged over 107 relation types.
  • 51. Sample results (DBPedia)
    Subject :: Object | Suggested relationship | Extracted rank 1 (Rel; Confidence) | Rank 2 | Rank 3
    Howard Pawley :: Gary Filmon | after | successor; 0.799 | after; 0.768 | office; 0.686
    Mulan :: Tarzan | after | nextSingle; 0.603 | followedBy; 0.533 | after; 0.416
    Species Deceases :: Midnight Oil | artist | producer; 0.761 | artist; 0.719 | genre; 0.467
    The Crystal City :: Orson Scott Card | author | artist; 0.625 | author; 0.617 | writer; 0.583
    Horatio Allen :: William Maxwell | before | predecessor; 0.629 | before; 0.475 |
    Basdeo Panday :: Trinidad & Tobago | birthplace | deathPlace; 0.658 | birthplace; 0.658 | nationality; 0.330
    Bob Nystrom :: Stockholm | birthplace | cityOfBirth; 0.677 | birthplace; 0.513 |
    Beccles railway station :: Suffolk | borough | district; 0.772 | borough; 0.770 | friend; 0.749
  • 52. Fact extraction evaluation - UMLS
    60% training set, 40% testing, UMLS fact corpus, MedLine text corpus
    [Plot: precision / recall (y-axis) vs. confidence threshold (x-axis).]
    Strict evaluation: Only the 1st ranked extracted relation is compared to the gold standard. Averaged over ~100 relation types.
  • 53. Sample results (UMLS)
    Subject :: Object | Suggested relationship | Extracted rank 1
    Teeth :: poisoning, fluoride | finding_site_of | finding_site_of 768
    polyps :: polyp of cervix nos (disorder) | associated_with | associated_with
    neck of uterus :: polyp of cervix nos (disorder) | location_of | finding_site_of
    benign neoplasms :: polyp of colon | related_to | associated_with
    brain ischemia :: brain | has_finding_site | location_of
    gastrointestinal tract :: polyp of colon | location_of | is_primary_anatomic_site_of_disease
    gamete structure (cell structure) :: polyvesicular vitelline tumor | is_normal_cell_origin_of_disease | is_normal_cell_origin_of_disease
  • 54. Comparison – DBPedia corpus
    Mintz: extraction of 102 relationship types from Freebase. Doozer: 107 from DBPedia.
    [Plot: precision (y-axis) vs. recall (x-axis) for Mintz-POS, Mintz-NLP, Doozer++ (R), and Doozer++ (P).]
    (R) Recall-oriented, using pattern generalization
    (P) Precision-oriented, no generalization
    M. Mintz, S. Bills, R. Snow, and D. Jurafsky, “Distant supervision for relation extraction without labeled data,” in ACL 2009.
  • 55. Evaluate Ad-Hoc Model Creation • On demand creation of models
    Domain | Query | Number of Concepts | Precision (Domain Definition)
    Semantic Web | “Semantic Web” OWL ontologies RDF | 143 | 0.98
    Harry Potter | “Harry Potter” dumbledore gryffindor slytherin | 134 | 0.98
    Beatles | Beatles "John Lennon" "Paul McCartney" song | 250 | 0.99
    India-Pakistan Relations | India Pakistan Kashmir | 129 | 0.99
    US Financial crisis - TARP | tarp "financial crisis" "toxic assets" | 146 | 0.93
    German Chancellors | German chancellors "Angela Merkel" "Helmut Kohl" | 124 | 0.91
  • 56. Ad-Hoc Model Creation - Evaluation Knowledge Enabled Information and Services Science 56
  • 57. Ad-Hoc Model Creation - Evaluation
    Relative Recall: recall with respect to possible extraction, i.e., the maximum number of extracted facts marks 100% recall.
  • 58. Related Work
    [Diagram positioning Mintz, SOFIE, and Turney; label: surface patterns only.]
  • 59. Main Differences • Surface-patterns only • Only positive training examples • Pertinence measure for semantic similarity • Concept-aware: start with defined concepts • Include background knowledge in probabilistic classification instead of rule- based reasoning Knowledge Enabled Information and Services Science 59
  • 60. Related work • Pattern-based fact extraction
    – E. Agichtein and L. Gravano. Snowball: Extracting relations from large plain-text collections. In JCDL, 2000.
    – F. M. Suchanek, M. Sozio, and G. Weikum. SOFIE: A Self-Organizing Framework for Information Extraction. WWW 2009.
    – T. M. Mitchell, J. Betteridge, A. Carlson, E. Hruschka, and R. Wang. Populating the Semantic Web by Macro-Reading Internet Text. ISWC 2009.
    – M. Pasca, D. Lin, J. Bigham, A. Lifchits, and A. Jain. Organizing and searching the world wide web of facts - step one: the one-million fact extraction challenge. In AAAI 2006.
  • 61. Related work • Relationship-pattern computations – P. D. Turney and P. Pantel. From Frequency to Meaning: Vector Space Models of Semantics. Journal of Artificial Intelligence Research, 37, 2010. – P. D. Turney. Expressing implicit semantic relations without supervision. In ACL 2006 Knowledge Enabled Information and Services Science 61
  • 62. Summary Fact extraction • Pattern-based fact extraction with generalization and Pertinence achieves competitive precision and recall while being computationally feasible for large-scale extraction – Pertinence computation can also be a preprocessing step for other ML techniques • Different types of background knowledge incorporated into one statistical framework – Combined Language model and Semantic model Knowledge Enabled Information and Services Science 62
  • 63. Application and Knowledge Validation
    Example: Domain model as a basis for research in the area of human cognitive performance.
    • 18 Million MedLine publications/abstracts
    • UMLS Metathesaurus
    • Wikipedia
    Doozer++
    – Hierarchy extraction
    – Pattern-based fact extraction
    Scooner: Semantic browsing and retrieval
    – Evaluation in Use
  • 64. Domain Definition – Extracted Hierarchy A hierarchy extracted for a cognitive science domain model. The keyword description given to the system was a collection of terms relevant to human performance and cognition. Knowledge Enabled Information and Services Science 64
  • 65. Domain Description: Connect Concepts Knowledge Enabled Information and Services Science 65
  • 66. Expert Evaluation of Facts in the Model
    [Plot: fraction (y-axis) by expert score 1 to 9 (x-axis). Series: Fraction in bin, Cumulative incorrect, Cumulative correct, Cumulative interesting.]
    Score scale:
    1-2: Information that is overall incorrect
    3-4: Information that is somewhat correct
    5-6: Correct general information
    7-9: Correct information not commonly known
  • 67. Extractor Confidence vs. Correctness • Analysis shows that the highest-quality extractions have the highest confidence, but incorrectly extracted facts also have high confidence → High-quality patterns as well as some noise patterns have high indicative power.
  • 68. Extractor Confidence vs. Correctness • Many facts deemed interesting were extracted based on highly specialized patterns in the long tail of the frequency distribution. • Noisy patterns also tend to occupy this space Knowledge Enabled Information and Services Science 68
  • 69. Sources of Errors
    • Extracted relationship too specific or formally incorrect but metaphorically correct.
      – <Interpeduncular_Cistern → disease_has_associated_anatomic_site → Cerebral_peduncle> is incorrect.
        • Interpeduncular Cistern is not a disease. However, it does have the associated anatomic site Cerebral peduncle.
    • Incorrect directionality
      – <Pituitary_Gland → sends_output_to → Supraoptic_nucleus> should be <Supraoptic_nucleus → sends_output_to → Pituitary_Gland>
        • Direction in text is often expressed in the context rather than the immediate pattern
  • 70. Validation • Extracted statements need to be validated to be considered knowledge – Explicit validation, e.g. thumbs up/down – Implicit validation, e.g. by analyzing click streams Knowledge Enabled Information and Services Science 70
  • 71. Explicit Validation • Certainty of reference – I.e. we know exactly which statement was validated • Validator credentials can be obtained – E.g. a small community of experts may evaluate • Extra work – Explicit validation is a task that is consciously performed Knowledge Enabled Information and Services Science 71
  • 72. Implicit Validation • Find indications of correctness or incorrectness based on the way the users interact with the presented information – Every action taken on a piece of information is recorded and analyzed – The cumulative behavior of the users gives an indication of which propositions are correct or interesting Knowledge Enabled Information and Services Science 72
  • 73. Implicit Validation • Examples for implicit community-validation – Games with a purpose (L. von Ahn) – Google search rankings • Scooner semantic browser – Browse literature along facts in a model – Browsing trails suggest correct extraction Knowledge Enabled Information and Services Science 73
  • 74. Implicit Validation
    • A fact is browsed very often by different users.
      – The fact is interesting to many users.
      – The fact is surprising and interesting, but may be incorrect.
    • A user follows a trail of multiple fact-triples through a variety of documents.
      – The facts that were browsed have a high probability of being correct and support is added to the triples.
      – If the trail was longer than suggested by a small-world phenomenon, initial triples may have been incorrect, but led to interesting ones. For this reason, only the last k triples of the trail should garner support or the support should increase for the last k triples in the trail.
      – The last triple in the trail may have been incorrect and led to browsing results that caused the user to stop browsing. For this reason, the last triple of the trail should be treated with caution.
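The trail heuristics above can be turned into a simple support-update rule. The sketch below is one possible reading, not the Scooner implementation: the function name `add_trail_support`, the default `k`, and the reduced weight for the final triple are all assumptions chosen for illustration.

```python
from collections import defaultdict


def add_trail_support(support, trail, k=3, last_weight=0.5):
    """Add support to the last k triples of a browsing trail.

    The final triple receives only `last_weight`, since it may be the
    incorrect fact that caused the user to stop browsing.
    """
    recent = trail[-k:] if k > 0 else []
    for i, triple in enumerate(recent):
        is_last = i == len(recent) - 1
        support[triple] += last_weight if is_last else 1.0
    return support


# Hypothetical trail of five fact triples, identified by short names.
trail = ["t1", "t2", "t3", "t4", "t5"]
support = add_trail_support(defaultdict(float), trail, k=3)
print(dict(support))
```

Accumulating these values over many users' trails would give the cumulative signal the slides describe, with early triples of long trails left uncredited.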
  • 75. Validation “through use”
    [Screenshot callouts:]
    • Enter search terms
    • Choose entity of interest
    • Browse extracted facts
    • Choose relevant literature that supports the fact
  • 76. Validation “through use” Find another interesting fact Fact trails are recorded Knowledge Enabled Information and Services Science 76
  • 77. Validation “through use” Path suggests that at least the first 2 triples are factually correct Knowledge Enabled Information and Services Science 77
  • 78. Browsed Facts Examples Knowledge Enabled Information and Services Science 78
  • 79. Related work • Evaluation and Use
    – E. Agichtein, E. Brill, and S. Dumais. Improving web search ranking by incorporating user behavior information. Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '06, page 19, 2006.
    – A. Das, M. Datar, A. Garg, and S. Rajaram. Google News Personalization: Scalable Online Collaborative Filtering. In Proceedings of the 16th international conference on World Wide Web, page 280. ACM, 2007.
  • 80. Summary Knowledge Acquisition
    • The model actually reflects what the user is interested in at the point of creation → Willingness to help validate facts
      – Applications allow for implicit and explicit evaluation
    • Validated Statements can be merged with existing knowledge → Automated acquisition completed → Individual-driven KA improved overall system
    • R. Kavuluru, C. Thomas et al. An Up-to-date Knowledge-Based Literature Search and Exploration Framework for Focused Bioscience Domains. IHI 2012
    • Amit Sheth, Christopher Thomas, Pankaj Mehra, 'Continuous Semantics to Analyze Real-Time Data', IEEE IC, Nov./Dec. 2010
    • C. Thomas et al. Improving Linked Open Data through On-Demand Model Creation. Web Science Conference, 2010.
    • C. Thomas et al. Growing Fields of Interest - Using an Expand and Reduce Strategy for Domain Model Extraction. WIC 2008.
  • 81. Future Directions • Active Learning to improve classification – Easy in tightly connected system (e.g. NELL) – Feedback mechanism for loosely connected systems • Improve depth of classification – Augment Domain Description with learned concept hierarchies from text (e.g. Navigli) • Knowledge management for background knowledge – Belief updates – Model evolution Knowledge Enabled Information and Services Science 81
  • 82. Contributions
    • Conceptual Knowledge: Ontologies, LoD
      – Knowledge Representation [IJSWIS, CR, FLSW]
      – Ontology design [WWW, FOIS]
    • Knowledge merging / Ontology alignment [AAAI, WebSem2, SWSWPC]
    • Textual Information: Wikipedia, Web
      – Information Quality [WI2]
      – Social processes for content creation [CHB]
    • Social processes for knowledge validation [IHI, WebSci, CHB]
    • Extraction
      – Taxonomy extraction [WI1, WebSci, WebSem1]
      – Event modeling [IEEE-IC]
      – Relationship/Fact/Event extraction [IHI, WebSem1, IEEE-IC, WebSci]
  • 83. Journal/Conference Publications
    [WebSem] C. Thomas, P. Mehra, A. Sheth, W. Wang, G. Weikum. Automatic domain model creation using pattern-based fact extraction. Submitted to Journal of Web Semantics.
    [IHI] R. Kavuluru, C. Thomas, A. Sheth, V. Chan, W. Wang, A. Smith, A. Sato and A. Walters. An Up-to-date Knowledge-Based Literature Search and Exploration Framework for Focused Bioscience Domains. IHI 2012 - 2nd ACM SIGHIT International Health Informatics Symposium, January 28-30, 2012.
    [IEEE-IC] Amit Sheth, Christopher Thomas, Pankaj Mehra, 'Continuous Semantics to Analyze Real-Time Data', IEEE Internet Computing, vol. 14, no. 6, pp. 84-89, Nov./Dec. 2010, doi:10.1109/MIC.2010.137
    [WebSci] C. Thomas, W. Wang, P. Mehra and A. Sheth. What Goes Around Comes Around - Improving Linked Open Data through On-Demand Model Creation. Web Science Conference, 2010.
    [WI1] C. Thomas, P. Mehra, R. Brooks, and A. Sheth. Growing Fields of Interest - Using an Expand and Reduce Strategy for Domain Model Extraction. Web Intelligence and Intelligent Agent Technology, IEEE/WIC/ACM International Conference on, 1:496–502, 2008.
  • 84. Journal/Conference Publications
    [WI2] C. Thomas and A. Sheth. Semantic Convergence of Wikipedia Articles. In Proceedings of the 2007 IEEE/WIC International Conference on Web Intelligence, pages 600–606, Washington, DC, USA, November 2007. IEEE Computer Society.
    [WWW] S. S. Sahoo, C. Thomas, A. Sheth, W. S. York, and S. Tartir. Knowledge Modeling and its Application in Life Sciences: A Tale of two Ontologies. In WWW '06: Proceedings of the 15th international conference on World Wide Web, pages 317–326, New York, NY, USA, 2006. ACM Press.
    [FOIS] C. Thomas, A. Sheth, and W. York. Modular Ontology Design Using Canonical Building Blocks in the Biochemistry Domain. In Proceeding of the 2006 conference on Formal Ontology in Information Systems: Proceedings of the Fourth International Conference (FOIS 2006), pages 115–127, Amsterdam (NL), 2006. IOS Press.
    [AAAI] P. Doshi and C. Thomas. Inexact matching of ontology graphs using expectation-maximization. In AAAI'06: proceedings of the 21st national conference on Artificial intelligence, pages 1277–1282. AAAI Press, 2006.
  • 85. Publications
    [CHB] C. Thomas and A. Sheth. Web Wisdom - An Essay on How Web 2.0 and Semantic Web can foster a Global Knowledge Society. Computers in Human Behavior, Elsevier.
    [WebSem2] P. Doshi, R. Kolli, and C. Thomas. Inexact matching of ontology graphs using expectation-maximization. Web Semantics: Science, Services and Agents on the World Wide Web, 7(2):90–106, 2009.
    [IJWGS] V. Kashyap, C. Ramakrishnan, C. Thomas, and A. Sheth. Taxaminer: an experimentation framework for automated taxonomy bootstrapping. International Journal of Web and Grid Services, 1(2):240–266, 2005.
    [IJSWIS] A. P. Sheth, C. Ramakrishnan, and C. Thomas. Semantics for the semantic web: The implicit, the formal and the powerful. Int. J. Semantic Web Inf. Syst., 1(1):1–18, 2005.
    [CR] S. Sahoo, C. Thomas, A. Sheth, C. Henson, and W. York. GLYDE - an expressive XML standard for the representation of glycan structure. Carbohydrate Research, 340(18):2802–2807, 2005.
  • 86. Other Publications Workshop Publications [SWLS] A. Sheth, W. York, C. Thomas, M. Nagarajan, J. Miller, K. Kochut, S. Sahoo, and X. Yi. Semantic Web technology in support of Bioinformatics for Glycan Expression. In W3C Workshop on Semantic Web for Life Sciences, pages 27–28, 2004. [SWSWPC] N. Oldham, C. Thomas, A. Sheth, and K. Verma. METEOR-S Web Service Annotation Framework with Machine Learning Classification. Semantic Web Services and Web Process Composition, pages 137–146, 2005, Springer. Book Chapters [FLSW] C. Thomas and A. Sheth. On the expressiveness of the languages for the semantic web - making a case for a little more. Fuzzy Logic and the Semantic Web, pages 3–20, 2006. Patent [PAT] P. Mehra, R. Brooks and C. Thomas. ONTOLOGY CREATION BY REFERENCE TO A KNOWLEDGE CORPUS. Pub.No. US 2010/0280989 A1 Knowledge Enabled Information and Services Science 86
  • 87.
    • Research
      – KR
      – Domain model extraction / IE
    • Collaborations
      – Complex Carbohydrate Research Center at UGA
      – HP Labs Palo Alto
      – Human Performance Directorate, AFRL
    • Tools and Ontologies
      – GlycO
      – GlycoViz
      – Doozer++
      – Scooner
    • Proposals
      – HP Incubation & Innovation grant for Doozer++
      – AFRL grant largely based on Doozer++
      – NSF proposal submitted with “very good” reviews
  • 88. Thank you! Shaojun Wang, Amit Sheth, Pascal Hitzler, Pankaj Mehra, Gerhard Weikum. Thanks to all Kno.e.sis Center Members – Past and Present
  • 89. Thank you Knowledge Enabled Information and Services Science 89

Editor's Notes

  1. Shared vocabulary
  2. To get the probability of seeing a relationship when given a concept pair, we average over all occurrences of phrases that contain labels for the concept pair, take into account the probabilities that the term pair actually denotes the concept pair and, if available, if the types of subject and object are likely to occur with that relationship.
  3. Show how pattern probabilities and background knowledge interact
  4. Shortcoming: patterns are seen as independent, even though they would have been in the same path through a parse tree.
  5. Pertinence has most influence in high-recall regions. Intuitively, as the threshold is increased, patterns that are highly indicative of specific relationships contribute more to the classification and thus the advantage of the pertinence method is slightly diminished.
  6. Doozer(R) – recall-oriented, generalized; Doozer(P) – precision-oriented, not generalized
  7. None of the facts were previously found in UMLS
  8. It is important to know how correct information was extracted. The probabilistic classifier easily allows for analysis of the patterns that were underlying the extraction. The slide shows how extraction quality measures up against extraction confidence.