Knowledge Acquisition in a System: Automatic Creation of Conceptual Domain Models

Knowledge Acquisition in a System
Christopher Thomas
Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis), Wright State University, Dayton, OH
topher@knoesis.org

Circle of knowledge in a System

Knowledge Enabled Information and Services Science 2

Dissertation Overview
• Conceptual Knowledge: Ontologies, LoD
– Knowledge Representation [IJSWIS, CR, FLSW]
– Ontology design [WWW, FOIS]
– Knowledge merging / Ontology alignment [AAAI, WebSem2, SWSWPC]
• Textual Information: Wikipedia, Web
– Information Quality [WI2]
– Social processes for content creation [CHB]
• Social processes for knowledge validation [IHI, WebSci, CHB]
• Doozer++: Taxonomy extraction, Relationship/Fact extraction [IHI, WebSem1, IEEE-IC, WebSci, WI1]


Talk Contents
• What is knowledge?
• How do we turn propositions/beliefs into knowledge?
• How do we acquire information?


Talk outline

• Motivation
• Knowledge Acquisition (KA) Overview
• KA in a loosely connected system – Doozer++
– Automatic formal domain model creation
– Information Extraction
• Top-Down
• Bottom-Up
– Information Validation “in use”
• Conclusion


Larger Context of automated KA

• Increasing significance of knowledge
economy
– “Knowledge Workers” spend 38% of their time
searching for information (McDermott, 2005)
– Vital to get a quick and still comprehensive
understanding of a field through pertinent
concepts/entities and relations/interactions
• Increased demand for formally available
knowledge in semantic models
– Filtering, browsing, annotation, reasoning

McDermott, M. "Knowledge Workers: How can you gauge their effectiveness?" Leadership Excellence, Vol. 22, No. 10, October 2005.


Motivating Scenario

• Learn about a new subject
– E.g. gain a quick overview over a current or
historical event
• Use a formal representation of the gained
overview to filter information
– Facilitate in-depth exploration
• Use the formalized information and the
user interaction to create knowledge from
information


Motivating Scenario

• Google: India

• Brief description: demographic and geographic information, etc.


Motivating Scenario

• Google: India

• Regular Web results


Motivating Scenario

• Clicking on a link to the Wikipedia entry shows that there have been conflicts with Pakistan over the region of Kashmir
⇒ Investigate more


Motivating Scenario

• Google: India Pakistan Kashmir
• Only Web results and news
• So far, search engines only display facts about entities, not relationships or larger contexts

Motivating Scenario

• Beneficial to get an overview “at a glance”
over a domain.
• Automated approach to creating knowledge
models for focused areas of interest
• Create models around an incomplete or rudimentary keyword description and “anticipate” the user's intentions with respect to the full context


Motivating Scenario

Doozer++: india pakistan kashmir
• Important concepts and relationships
describing the context


Motivating Scenario

• Filtered IR using
concepts in the
model
• Concepts and
relationships that
contributed to
clicked results gain
support
• User can explicitly
approve content

Circle of Knowledge (Example)


Motivating Scenario

• On-demand creation of domain knowledge
improves individual comprehension of an
event
• Formal models are easy to use in
information filtering
• Validated information ⇒ Knowledge
– Can be given back to the community to
improve the overall amount of formal
knowledge available on the Web
– E.g. “Unknown” to DBPedia that the region of
Kashmir belongs to both India and Pakistan


Importance of Model creation

• Models support the individual user or knowledge worker, but also groups or systems
– More efficient communication through small, shared, agreeable conceptualizations
• People ↔ people
• People ↔ system
• System ↔ system
– Classify or filter pertinent and topical
information using models
– Model-assisted searching and faceted or
exploratory browsing using relationships
– Reuse of validated knowledge

Domain Knowledge Models
• Scientific applications
– In-depth description of concepts
– Narrow field
– People ↔ system, system ↔ system
• Annotation, reasoning
⇒Absolute correctness necessary (as far as possible)
• General applications
– Broad coverage of the field
– Context – how does the new information fit in?
– People ↔ people, people ↔ system
• Individual domain comprehension, filtering, annotation
⇒Relative correctness sufficient

Model Creation Resources

• Large models are available as reference
– DBPedia, YAGO, UMLS, MeSH, GO …
– Too big to be efficiently and effectively usable
• Prior knowledge required to find pertinent resources
• Other information is available in great
abundance, but unformalized
– Tacit expert knowledge
– Scientific databases
– Free text
• Peer-reviewed journals and proceedings
• General Web content


Epistemological Considerations

• Knowledge
– Ensure epistemological soundness of
automated knowledge acquisition
• Reference
– Ensure that nodes in the models refer to real-
world concepts/entities


Knowledge

• Functional Definition
– Knowledge = “Know-How”
– Practical, but weak,
Includes “Actionable Information”
• Categorical Definition
– Knowledge = Justified true belief
– S knows that p iff
i. p is true;
ii. S believes that p;
iii. S is justified in believing that p.


Belief and Justification

• Belief
– Statements held by the system
• Justification
– Trusted sources
– Extraction algorithms
• Bayesian, deductive or inductive reasoning
• Macro-Reading algorithms ⇒ Wisdom of the crowds
– Validation


Truth assessment of a statement

• Is truth correspondence?
– “A” is true iff A (a true statement corresponds to an actual state of affairs)
– Slide annotation: No Access
• Is truth coherence?
– Does the statement fit into the system of other
statements?
• Is truth consensus?
– agreement of correctness amongst a group
⇒In the cyclical model, achieve high degree
of certainty by allowing constant validation

Domain Model – Reference

• Model of a domain conceptually split
– Domain Definition
• Concepts identified by URIs (classes, entities, relationship types) ⇒ ensures reference
• Remains static: necessity
• Rigid designators (Kripke)
– Domain Description
• Relationships describe concepts
• Subject to change: possibility
• Definite descriptions (Russell)


Domain Definition

• Top-down concept identification
• Achieved through
– Manual creation based on consensus in a
group
– Extraction from community-created or peer-
reviewed conceptualization
• Wikipedia
• MeSH or UMLS Semantic Network


Domain Description

• Possible to do top-down extraction of the
domain description, e.g. from DBPedia
• Problem: Formal concept descriptions are
sparse
– On average, DBPedia has fewer than 2 object properties per entity
• Extract descriptions (facts) bottom-up
– Available in text, DBs, etc.
– Domain-specific molecular structure extractors
(GlycO)
– Domain independent IE techniques (Doozer++)


Knowledge Acquisition Approaches
• KA in a tightly connected system
– GlycO: domain-specific BioChemistry ontology
• Manual domain definition and description
• Partial automatic domain description
• Domain-specific automatic validation
• Manual validation for false negatives
• KA in a loosely connected system
– Doozer++: general domain-model creation framework
• Automatic domain definition, top-down concept extraction
• Automatic domain description, bottom-up fact extraction
– Extraction from trusted sources
– A trusted extraction and validation procedure
• Domain-independent community-based validation


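Knowledge Acquisition Approaches
Compared: Knowledge Engineering Approach; Traditional Extraction Approach; GlycO; Doozer++
• Definition
– Knowledge Engineering Approach: Top-Down
– Traditional Extraction Approach: Bottom-up
– GlycO: Top-Down (Knowledge Engineering)
– Doozer++: Top-Down (conceptually, by extraction from corpus)
• Description
– Knowledge Engineering Approach: Top-Down
– Traditional Extraction Approach: Bottom-up
– GlycO: Bottom-up, restricted by Top-down definition
– Doozer++: Bottom-up, restricted by Top-down definition
• Verification
– Knowledge Engineering Approach: Manual
– Traditional Extraction Approach: Manual
– GlycO: Correctness: automatic; Exceptions: added manually
– Doozer++: Community-based validation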


KA on the Web - Vision

• Web searches, browsing sessions or
classification task can be seen as creating
an implicit domain model
– World view, Concept coverage, Facts
• Make models explicit and reusable using
formal descriptions (RDF, OWL)
• Validate the contained information and share with the community
⇒ Increase the system's knowledge by “doing what you do”: Search, browse, click, communicate

KA in a Loosely Connected System
• Domain Model creation to gradually increase overall knowledge of the system
– User-interest driven
– Incentive to evaluate
• Sources
– Linked Data
– Free text
– Wikipedia
– Web
• Doozer++
– Domain Definition: Top-down concept extraction
– Domain Description: Pattern-based fact extraction
• Scooner (Validation)
– Evaluation in Use: Semantic browsing and retrieval
– Domain-independent, Community-based

Domain Definition Requirements
• Identify concepts, concept
labels (denotations) and
concept hierarchy
• Challenge: define narrow
boundaries for a domain while
at the same time ensuring
broad conceptual coverage
within the domain


Domain definition - conceptual

• Expand and Reduce approach
– Start with “high recall” methods
• Exploration – Full text search
• Exploitation – Graph-Similarity Method
• Category growth
• “What could be in the domain?”
– End with “high precision” methods
• Apply restrictions on the concepts found
• Remove terms and categories that fall outside the
dense areas of the model graph
• “What should be in the domain?”


Domain Description - Classifier

• Concept-aware
– Use concepts and concept labels from the
domain definition step
• Fact extraction as classification of
concept pairs into relationship types
– $f_{class}: C \times C \to R$
– $R_{S,O} = \{R \mid p(R, S, O) > \varepsilon\}$
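
The thresholding step above can be sketched as follows. This is a minimal illustration, not the Doozer++ implementation: the function name `relation_set` and the example scores are hypothetical, and the scores p(R, S, O) are assumed to have been computed already for one Subject/Object pair.

```python
from typing import Dict


def relation_set(scores: Dict[str, float], epsilon: float) -> Dict[str, float]:
    """Return R_{S,O}: all relationship types whose score p(R, S, O) exceeds epsilon."""
    return {rel: p for rel, p in scores.items() if p > epsilon}


# Hypothetical scores for a single (Subject, Object) pair; the values are made up.
scores = {"almaMater": 0.71, "employer": 0.12, "birthPlace": 0.03}
selected = relation_set(scores, epsilon=0.5)
print(selected)
```

A pair can therefore receive zero, one, or several relationship types, depending on how many scores clear the threshold.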


Domain Description

• Combined Language model and Semantic
classification model
• Language model: Surface-pattern – based
– Pattern manifestations of relationships as
features
– Open to any corpus, language independent
– Less computational overhead than NLP
• Semantic Classification Model
– Learned or assigned concept labels
– Semantic types to aid classification


Domain Description - Implementation

• Probabilistic Vector-space model
– Each relationship is defined by vectors of
• Pattern probabilities
• Domain/range probabilities
– Each concept is grounded by its semantic types and manifested by its labels and their probabilities of identifying the concept
– Sparse pattern representation (density ~2%)
– White-box, easily verifiable
– Inherently parallel


Terminology
Symbol: Meaning; Example
• S, O: Subject and Object concepts (semantic); Kelly_Miller_(scientist), Howard_University
• L_S, L_O: Subject and Object labels; “Kelly Miller”, “Howard University”
• P_{L_S,L_O}: Phrase instantiating the pattern; Kelly Miller graduated from Howard University
• P: Pattern; <Subject> graduated from <Object>
• T_S, T_O: Semantic type of Subject or Object; Person, Educational_Institution
• R: Relationship; almaMater, birthPlace


Probabilistic Classifier

• Labels: taken from Lexicon or linked corpus
• Semantic types: asserted in Ontology or learned from linked data
• Patterns: learned from free text


How is Barack Obama related to Columbia University?
p(R, Barack_Obama, Columbia_University)

Sentence in corpus:
Obama graduated in 1983 from Columbia University
with a degree in political science and international
relations.

(Regular classification requires multiple examples)



Obama graduated in 1983 from Columbia University
p(almaMater ,Barack_Obama, Columbia_University) =
p(almaMater | “<Subject> graduated in 1983 from <Object>”) *
p(Barack_Obama | “Obama”) *
p(Columbia_University | “Columbia University”) *
p(almaMater | domain(person)) *
p(almaMater | range(academic_institution))

p(almaMater , Barack_Obama, Columbia_University)
= 0.9 * 0.95 * 0.95 * 0.9 * 0.97

p(almaMater, Barack_Obama, Columbia_University) = 0.70909425
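
The arithmetic above can be checked with a short sketch. The helper name `joint_score` is hypothetical; the five factors are the ones given on the slide (pattern, subject label, object label, domain, range), multiplied as independent terms.

```python
import math
from typing import Sequence


def joint_score(factors: Sequence[float]) -> float:
    """Multiply the individual factors that make up p(R, S, O)."""
    return math.prod(factors)


# Factors from the slide, in order: pattern, subject label, object label, domain, range.
factors = [0.9, 0.95, 0.95, 0.9, 0.97]
score = joint_score(factors)
print(round(score, 8))
```

Because every factor is a probability in [0, 1], the product can never exceed the smallest factor, which is why a single weak label or type match pulls the whole score down.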


Pattern Generalization

• Problem: Low recall in pattern-based IE
• Substitute terms with wild cards
– No POS tagging, hence only “*” wild cards
• Mirrors shortest paths through parse trees
<Subject> graduated in 1983 from <Object>
<Subject> * in 1983 from <Object>
<Subject> graduated * 1983 from <Object>
<Subject> * * 1983 from <Object>
<Subject> graduated in * from <Object>
<Subject> * in * from <Object>
<Subject> graduated * * from <Object>
<Subject> * * * from <Object>
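
A sketch of how such a list of generalized patterns might be enumerated. The function `generalize` and its `replaceable` parameter are hypothetical names; restricting the wildcards to the first three tokens is an assumption made only to reproduce the eight patterns listed above.

```python
from itertools import product
from typing import List, Optional, Sequence


def generalize(tokens: Sequence[str],
               replaceable: Optional[Sequence[int]] = None) -> List[str]:
    """Return every pattern obtained by replacing a subset of the
    replaceable token positions with the "*" wildcard."""
    positions = list(range(len(tokens)) if replaceable is None else replaceable)
    patterns = []
    for mask in product([False, True], repeat=len(positions)):
        out = list(tokens)
        for pos, wild in zip(positions, mask):
            if wild:
                out[pos] = "*"
        patterns.append("<Subject> " + " ".join(out) + " <Object>")
    return patterns


tokens = ["graduated", "in", "1983", "from"]
# Assumption: only positions 0-2 are wildcarded, matching the slide's list.
patterns = generalize(tokens, replaceable=[0, 1, 2])
for p in patterns:
    print(p)
```

With n replaceable tokens the enumeration yields 2^n patterns, so in practice the number of wildcard positions would need to be bounded for long phrases.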


Learning p(R|P)

• Distantly Supervised Training
• Collect pattern frequencies for training
examples
– Fact triples <S, R, O> e.g. from Linked Data
(DBPedia, UMLS)
– Manifestations of facts in text in the form of
patterns (corpus e.g. Web, Wikipedia, MedLine)
• For relationship Ri, aggregate pattern
vectors representing <*, Ri, *>


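Learning p(R|P) – naïve

• For each vector $R_i$ containing pattern frequencies for relationship $R_i$, compute
$p(P_j \mid R_i) = \frac{\#(P_j, R_i)}{\sum_k \#(P_k, R_i)}$
• $\#(P_j, R_i)$ is the number of occurrences of pattern $P_j$ with terms denoting each <S, O> pair in $R_i$; it is normalized by all pattern occurrences for $R_i$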


Learning p(R|P) – naïve

• Uniform distribution of relationships assumed
– As the number of relationship types grows, the prior of each type goes towards 0.
– Normalize the probabilities over the column vector to get $p(R_i \mid P_j) = \frac{p(P_j \mid R_i)}{\sum_k p(P_j \mid R_k)}$

• Vector space representation
– Relationship-pattern matrix
– $R2P_{ij} = p(R_i \mid P_j)$
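
The two normalization steps can be sketched with plain dictionaries. This is an illustrative reading of the naïve procedure, not the original code: the function name `learn_r_given_p`, the nested-dict layout, and the example counts are all assumptions.

```python
from collections import defaultdict
from typing import Dict

Counts = Dict[str, Dict[str, int]]  # relationship -> pattern -> frequency


def learn_r_given_p(counts: Counts) -> Dict[str, Dict[str, float]]:
    """Estimate p(R_i | P_j) from pattern frequencies under a uniform prior."""
    # Step 1: row normalization gives p(P_j | R_i).
    p_given_r: Dict[str, Dict[str, float]] = {}
    for rel, row in counts.items():
        total = sum(row.values())
        p_given_r[rel] = {pat: n / total for pat, n in row.items()} if total else {}
    # Step 2: column normalization gives p(R_i | P_j).
    col_sum: Dict[str, float] = defaultdict(float)
    for row in p_given_r.values():
        for pat, p in row.items():
            col_sum[pat] += p
    return {rel: {pat: p / col_sum[pat] for pat, p in row.items()}
            for rel, row in p_given_r.items()}


# Made-up training counts for two relationship types.
counts = {
    "almaMater": {"<Subject> graduated from <Object>": 8, "<Subject> at <Object>": 2},
    "employer": {"<Subject> works at <Object>": 6, "<Subject> at <Object>": 4},
}
r2p = learn_r_given_p(counts)
print(r2p["almaMater"]["<Subject> at <Object>"])
```

A pattern seen with only one relationship gets probability 1 for it, while an ambiguous pattern such as "<Subject> at <Object>" has its mass split across the competing types.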


Problem: Relationship Similarities

• Extensional similarity
– Semantically different relationships can share
Subject-Object pairs in training data
• Intensional similarity
– Overlap and entailment of relationship types
• Types should not be seen as discrete
– E.g. physical_part_of entails part_of
• A priori unknown which types overlap unless a formal description is available
– Semantically similar types compete for the
same patterns

Relationship similarities

Pertinence Measure
similarity between pattern vectors as approximation
of intensional similarity
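
The slides do not state which similarity function is used between pattern vectors. As one possible illustration only, the sketch below uses cosine similarity over sparse vectors; the choice of cosine, the function name, and the example vectors are assumptions, not the Pertinence measure itself.

```python
import math
from typing import Dict

SparseVector = Dict[str, float]  # pattern -> weight


def cosine(u: SparseVector, v: SparseVector) -> float:
    """Cosine similarity between two sparse pattern vectors."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Made-up pattern vectors for three relationship types.
part_of = {"<Subject> in the <Object>": 3.0, "<Subject> of the <Object>": 4.0}
physical_part_of = {"<Subject> in the <Object>": 6.0, "<Subject> of the <Object>": 8.0}
alma_mater = {"<Subject> graduated from <Object>": 1.0}

print(cosine(part_of, physical_part_of))
print(cosine(part_of, alma_mater))
```

Relationship types that share the same patterns in the same proportions score close to 1, which is the signal that could be read as intensional similarity; types with disjoint patterns score 0.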


Pertinence for Relationships
Do not punish the occurrence of the same pattern
with relationship types that are intensionally
similar, but extensionally dissimilar
Reduce impact of extensionally similar relations


Pertinence Example

Pattern: <Subject> in the right <Object>
Relationship p(R|P)
biological_process_has_associated_location 0.968371381
disease_has_associated_anatomic_site 0.880452774
part_of 0.622532958
has_finding_site 0.561041318
has_location 0.537424451
has_direct_procedure_site 0.363832078
Sum: 3.933654958

Note: This never causes p(R,S,O) > 1


Similarities between relationships


Pertinence evaluation
[Figure: precision (0 to 0.8) plotted against recall (0 to 0.5), comparing Pertinence with No Pertinence]


Fact extraction evaluation - DBPedia
60% training set, 40% testing, DBPedia Infobox fact corpus, Wikipedia text corpus
[Figure: precision / recall plotted against confidence threshold]
Strict evaluation: Only the 1st-ranked extracted relation is compared to the gold standard. Averaged over 107 relation types.
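
One way the strict evaluation could be computed at a given confidence threshold is sketched below. The tuple layout, the function name `precision_recall`, and the example predictions are hypothetical; recall is taken relative to all gold-standard facts in the test set, which is an assumption about the setup.

```python
from typing import List, Tuple

# (gold relation, rank-1 extracted relation, extractor confidence)
Prediction = Tuple[str, str, float]


def precision_recall(preds: List[Prediction], threshold: float) -> Tuple[float, float]:
    """Strict evaluation: only the rank-1 relation counts, and only if its
    confidence reaches the threshold."""
    kept = [(gold, top) for gold, top, conf in preds if conf >= threshold]
    correct = sum(1 for gold, top in kept if gold == top)
    precision = correct / len(kept) if kept else 0.0
    recall = correct / len(preds) if preds else 0.0
    return precision, recall


# Made-up predictions for four test facts.
preds = [
    ("almaMater", "almaMater", 0.9),
    ("birthPlace", "birthPlace", 0.7),
    ("author", "artist", 0.6),
    ("successor", "successor", 0.3),
]
for t in (0.5, 0.8):
    print(t, precision_recall(preds, t))
```

Raising the threshold drops low-confidence extractions, which typically trades recall for precision; sweeping it produces a curve of the kind shown in the figure.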


Sample results (DBPedia)

Subject :: Object (suggested relationship): extracted Rank 1; Rank 2; Rank 3 (relationship, confidence)
• Howard Pawley :: Gary Filmon (after): successor, 0.799; after, 0.768; office, 0.686
• Mulan :: Tarzan (after): nextSingle, 0.603; followedBy, 0.533; after, 0.416
• Species Deceases :: Midnight Oil (artist): producer, 0.761; artist, 0.719; genre, 0.467
• The Crystal City :: Orson Scott Card (author): artist, 0.625; author, 0.617; writer, 0.583
• Horatio Allen :: William Maxwell (before): predecessor, 0.629; before, 0.475
• Basdeo Panday :: Trinidad & Tobago (birthplace): birthplace, 0.658; deathPlace, 0.658; nationality, 0.330
• Bob Nystrom :: Stockholm (birthplace): cityOfBirth, 0.677; birthplace, 0.513
• Beccles railway station :: Suffolk (borough): district, 0.772; borough, 0.770; friend, 0.749


Fact extraction evaluation - UMLS
60% training set, 40% testing, UMLS fact corpus, MedLine text corpus
[Figure: precision / recall plotted against confidence threshold]
Strict evaluation: Only the 1st-ranked extracted relation is compared to the gold standard. Averaged over ~100 relation types.


Sample results (UMLS)

Subject :: Object: suggested relationship; extracted Rank 1
• Teeth :: poisoning, fluoride: finding_site_of; finding_site_of
• 768 polyps :: polyp of cervix nos (disorder): associated_with; associated_with
• neck of uterus :: polyp of cervix nos (disorder): location_of; finding_site_of
• benign neoplasms :: polyp of colon: related_to; associated_with
• brain ischemia :: brain: has_finding_site; location_of
• gastrointestinal tract :: polyp of colon: is_primary_anatomic_site_of_disease; location_of
• gamete structure (cell structure) :: polyvesicular vitelline tumor: is_normal_cell_origin_of_disease; is_normal_cell_origin_of_disease


Comparison – DBPedia corpus
[Figure: precision vs. recall (both 0 to 1) for Mintz-POS, Mintz-NLP, Doozer++ (R) and Doozer++ (P)]
• Mintz: extraction of 102 relationship types from Freebase
• Doozer: 107 from DBPedia
• (R) Recall-oriented, using pattern generalization
• (P) Precision-oriented, no generalization
M. Mintz, S. Bills, R. Snow, and D. Jurafsky, “Distant supervision for relation extraction without labeled data,” in ACL 2009.

Evaluate Ad-Hoc Model Creation

• On-demand creation of models
• Domain: Query; Number of Concepts; Precision (Domain Definition)
– Semantic Web: “Semantic Web” OWL ontologies RDF; 143; 0.98
– Harry Potter: “Harry Potter” dumbledore gryffindor slytherin; 134; 0.98
– Beatles: Beatles "John Lennon" "Paul McCartney" song; 250; 0.99
– India-Pakistan Relations: India Pakistan Kashmir; 129; 0.99
– US Financial crisis - TARP: tarp "financial crisis" "toxic assets"; 146; 0.93
– German Chancellors: German chancellors "Angela Merkel" "Helmut Kohl"; 124; 0.91


Ad-Hoc Model Creation - Evaluation


Ad-Hoc Model Creation - Evaluation
• Relative Recall: recall with respect to possible extraction, i.e. the maximum number of extracted facts marks 100% recall


Related Work

[Figure: related systems Mintz, SOFIE and Turney; annotation: Surface patterns only]

Main Differences

• Surface-patterns only
• Only positive training examples
• Pertinence measure for semantic similarity
• Concept-aware: start with defined concepts
• Include background knowledge in
probabilistic classification instead of rule-
based reasoning


Related work
• Pattern-based fact extraction
– E. Agichtein and L. Gravano. Snowball: Extracting
relations from large plain-text collections. In JCDL,
2000.
– F. M. Suchanek, M. Sozio, and G. Weikum. SOFIE: A Self-Organizing Framework for Information Extraction. WWW 2009.
– T. M. Mitchell, J. Betteridge, A. Carlson, E. Hruschka,
and R. Wang. Populating the Semantic Web by Macro-
Reading Internet Text. ISWC 2009.
– M. Pasca, D. Lin, J. Bigham, A. Lifchits, and A. Jain.
Organizing and searching the world wide web of facts-
step one: the one-million fact extraction challenge. In
AAAI 2006.

Related work

• Relationship-pattern computations
– P. D. Turney and P. Pantel. From Frequency to
Meaning: Vector Space Models of Semantics. Journal
of Artificial Intelligence Research, 37, 2010.
– P. D. Turney. Expressing implicit semantic relations
without supervision. In ACL 2006


Summary Fact extraction

• Pattern-based fact extraction with
generalization and Pertinence achieves
competitive precision and recall while being
computationally feasible for large-scale
extraction
– Pertinence computation can also be a
preprocessing step for other ML techniques
• Different types of background knowledge
incorporated into one statistical framework
– Combined Language model and Semantic
model

Application and Knowledge Validation
Example: Domain model
as a basis for research in • 18 Million MedLine
the area of human publications/abstracts
cognitive performance. • UMLS Metathesaurus
• Wikipedia

Scooner:
Semantic browsing Doozer++
and retrieval – – Hierarchy extraction
Evaluation in Use – Pattern-based fact
extraction


Domain Definition – Extracted Hierarchy

A hierarchy extracted for a cognitive science domain model.

The keyword description given to the system was a collection of terms relevant
to human performance and cognition.


Domain Description: Connect Concepts


Expert Evaluation of Facts in the Model
[Figure: fraction of facts per expert score (1 to 9), showing fraction in bin, cumulative incorrect, cumulative correct and cumulative interesting]
Score scale:
• 1-2: Information that is overall incorrect
• 3-4: Information that is somewhat correct
• 5-6: Correct general information
• 7-9: Correct information not commonly known


Extractor Confidence vs. Correctness

• Analysis shows that the highest-quality extractions have the highest confidence, but incorrectly extracted facts also have high confidence
⇒ High-quality patterns as well as some noise patterns have high indicative power.

Extractor Confidence vs. Correctness

• Many facts deemed interesting were extracted based on
highly specialized patterns in the long tail of the frequency
distribution.
• Noisy patterns also tend to occupy this space


Sources of Errors

• Extracted relationship too specific or formally incorrect but metaphorically correct.
– <Interpeduncular_Cistern → disease_has_associated_anatomic_site → Cerebral_peduncle> is incorrect.
• Interpeduncular Cistern is not a disease. However, it does have the associated anatomic site Cerebral peduncle.
• Incorrect directionality
– <Pituitary_Gland → sends_output_to → Supraoptic_nucleus> should be <Supraoptic_nucleus → sends_output_to → Pituitary_Gland>
• Direction in text is often expressed in the context rather than the immediate pattern


Validation

• Extracted statements need to be validated
to be considered knowledge
– Explicit validation, e.g. thumbs up/down
– Implicit validation, e.g. by analyzing click streams


Explicit Validation

• Certainty of reference
– I.e. we know exactly which statement was
validated
• Validator credentials can be obtained
– E.g. a small community of experts may evaluate
• Extra work
– Explicit validation is a task that is consciously
performed


Implicit Validation

• Find indications of correctness or
incorrectness based on the way the users
interact with the presented information
– Every action taken on a piece of information is
recorded and analyzed
– The cumulative behavior of the users gives an
indication of which propositions are correct or
interesting


Implicit Validation

• Examples for implicit community-validation
– Games with a purpose (L. von Ahn)
– Google search rankings
• Scooner semantic browser
– Browse literature along facts in a model
– Browsing trails suggest correct extraction


Implicit Validation

• A fact is browsed very often by different users.
– The fact is interesting to many users.
– The fact is surprising and interesting, but may be incorrect.

• A user follows a trail of multiple fact triples through a variety of documents.
– The facts that were browsed have a high probability of being correct, and support is added to the triples.
– If the trail was longer than suggested by a small-world phenomenon, the initial triples may have been incorrect but led to interesting ones. For this reason, only the last k triples of the trail should garner support, or the support should increase for the last k triples in the trail.
– The last triple in the trail may have been incorrect and led to browsing results that caused the user to stop browsing. For this reason, the last triple of the trail should be treated with caution.
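
The trail heuristics above could be turned into a support update along the following lines. This is a sketch under stated assumptions, not Scooner's implementation: the function name, the default `k`, and the reduced weight for the final triple are all hypothetical choices.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, relationship, object)


def add_trail_support(support: Dict[Triple, float], trail: List[Triple],
                      k: int = 3, last_weight: float = 0.5) -> None:
    """Credit only the last k triples of a browsing trail; the final triple
    gets a reduced weight because it may have ended the session."""
    if k <= 0 or not trail:
        return
    recent = trail[-k:]
    for i, triple in enumerate(recent):
        is_last = i == len(recent) - 1
        support[triple] += last_weight if is_last else 1.0


# Made-up trail of four browsed triples.
trail = [("A", "r1", "B"), ("B", "r2", "C"), ("C", "r3", "D"), ("D", "r4", "E")]
support: Dict[Triple, float] = defaultdict(float)
add_trail_support(support, trail, k=3)
print(dict(support))
```

Here the first triple receives no support, the two middle triples receive full support, and the last triple receives half, reflecting the caution the slide recommends.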


Validation “through use”

• Enter search terms
• Choose entity of interest
• Browse extracted facts
• Choose relevant literature that supports the fact
• Find another interesting fact
• Fact trails are recorded
• Path suggests that at least the first 2 triples are factually correct


Browsed Facts Examples


Related work

• Evaluation and Use
– E. Agichtein, E. Brill, and S. Dumais. Improving web search ranking by incorporating user behavior information. Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '06, page 19, 2006.
– A. Das, M. Datar, A. Garg, and S. Rajaram. Google
News Personalization: Scalable Online Collaborative
Filtering. In Proceedings of the 16th international
conference on World Wide Web, page 280. ACM,
2007.


Summary Knowledge Acquisition
• The model actually reflects what the user is interested in at the point of creation
⇒ Willingness to help validate facts
– Applications allow for implicit and explicit evaluation
• Validated statements can be merged with existing knowledge
⇒ Automated acquisition completed
⇒ Individual-driven KA improved overall system
• R. Kavuluru, C. Thomas et al. An Up-to-date Knowledge-Based Literature Search and Exploration Framework for Focused
Bioscience Domains. IHI 2012
• Amit Sheth, Christopher Thomas, Pankaj Mehra, 'Continuous Semantics to Analyze Real-Time Data', IEEE IC, Nov./Dec. 2010
• C. Thomas et al. Improving Linked Open Data through On-Demand Model Creation. Web Science Conference, 2010.
• C. Thomas et al. Growing Fields of Interest - Using an Expand and Reduce Strategy for Domain Model Extraction. WIC 2008.


Future Directions

• Active Learning to improve classification
– Easy in tightly connected system (e.g. NELL)
– Feedback mechanism for loosely connected
systems
• Improve depth of classification
– Augment Domain Description with learned
concept hierarchies from text (e.g. Navigli)
• Knowledge management for background
knowledge
– Belief updates
– Model evolution

Contributions
• Conceptual Knowledge: Ontologies, LoD
– Knowledge Representation [IJSWIS, CR, FLSW]
– Ontology design [WWW, FOIS]
– Knowledge merging / Ontology alignment [AAAI, WebSem2, SWSWPC]
• Textual Information: Wikipedia, Web
– Information Quality [WI2]
– Social processes for content creation [CHB]
• Social processes for knowledge validation [IHI, WebSci, CHB]
• Taxonomy extraction [WI1, WebSci, WebSem1]
• Event modeling [IEEE-IC]
• Relationship/Fact/Event extraction [IHI, WebSem1, IEEE-IC, WebSci]


Journal/Conference Publications

[WebSem1] C. Thomas, P. Mehra, A. Sheth, W. Wang, G. Weikum. Automatic
domain model creation using pattern-based fact extraction. Submitted to
Journal of Web Semantics.
[IHI] R. Kavuluru, C. Thomas, A. Sheth, V. Chan, W. Wang, A. Smith, A. Sato and
A. Walters. An Up-to-date Knowledge-Based Literature Search and
Exploration Framework for Focused Bioscience Domains. IHI 2012 - 2nd
ACM SIGHIT International Health Informatics Symposium, January 28-30,
2012.
[IEEE-IC] Amit Sheth, Christopher Thomas, Pankaj Mehra, 'Continuous
Semantics to Analyze Real-Time Data', IEEE Internet Computing, vol. 14, no.
6, pp. 84-89, Nov./Dec. 2010, doi:10.1109/MIC.2010.137
[WebSci] C. Thomas, W. Wang, P. Mehra and A. Sheth. What Goes Around
Comes Around - Improving Linked Open Data through On-Demand Model
Creation. Web Science Conference, 2010.
[WI1] C. Thomas, P. Mehra, R. Brooks, and A. Sheth. Growing Fields of Interest
- Using an Expand and Reduce Strategy for Domain Model Extraction. Web
Intelligence and Intelligent Agent Technology, IEEE/WIC/ACM International
Conference on, 1:496–502, 2008.


Journal/Conference Publications

[WI2] C. Thomas and A. Sheth. Semantic Convergence of Wikipedia Articles. In
Proceedings of the 2007 IEEE/WIC International Conference on Web
Intelligence, pages 600–606, Washington, DC, USA, November 2007. IEEE
Computer Society.
[WWW] S. S. Sahoo, C. Thomas, A. Sheth, W. S. York, and S. Tartir. Knowledge
Modeling and its Application in Life Sciences: A Tale of two Ontologies. In
WWW '06: Proceedings of the 15th international conference on World Wide
Web, pages 317–326, New York, NY, USA, 2006. ACM Press.
[FOIS] C. Thomas, A. Sheth, and W. York. Modular Ontology Design Using
Canonical Building Blocks in the Biochemistry Domain. In Proceeding of the
2006 conference on Formal Ontology in Information Systems: Proceedings of
the Fourth International Conference (FOIS 2006), pages 115–127,
Amsterdam (NL), 2006. IOS Press.
[AAAI] P. Doshi and C. Thomas. Inexact matching of ontology graphs using
expectation-maximization. In AAAI '06: Proceedings of the 21st national
conference on Artificial intelligence, pages 1277–1282. AAAI Press, 2006.


Publications

[CHB] C. Thomas and A. Sheth. Web Wisdom - An Essay on How Web 2.0 and
Semantic Web can foster a Global Knowledge Society. Computers in Human
Behavior, Elsevier.
[WebSem2] P. Doshi, R. Kolli, and C. Thomas. Inexact matching of ontology
graphs using expectation-maximization. Web Semantics: Science, Services
and Agents on the World Wide Web, 7(2):90–106, 2009.
[IJWGS] V. Kashyap, C. Ramakrishnan, C. Thomas, and A. Sheth. Taxaminer:
an experimentation framework for automated taxonomy bootstrapping.
International Journal of Web and Grid Services, 1(2):240–266, 2005.
[IJSWIS] A. P. Sheth, C. Ramakrishnan, and C. Thomas. Semantics for the
semantic web: The implicit, the formal and the powerful. Int. J. Semantic Web
Inf. Syst., 1(1):1–18, 2005.
[CR] S. Sahoo, C. Thomas, A. Sheth, C. Henson, and W. York. GLYDE - an
expressive XML standard for the representation of glycan structure.
Carbohydrate Research, 340(18):2802–2807, 2005.


Other Publications

Workshop Publications
[SWLS] A. Sheth, W. York, C. Thomas, M. Nagarajan, J. Miller, K. Kochut, S.
Sahoo, and X. Yi. Semantic Web technology in support of Bioinformatics for
Glycan Expression. In W3C Workshop on Semantic Web for Life Sciences,
pages 27–28, 2004.
[SWSWPC] N. Oldham, C. Thomas, A. Sheth, and K. Verma. METEOR-S Web
Service Annotation Framework with Machine Learning Classification.
Semantic Web Services and Web Process Composition, pages 137–146,
2005, Springer.
Book Chapters
[FLSW] C. Thomas and A. Sheth. On the expressiveness of the languages for
the semantic web - making a case for a little more. Fuzzy Logic and the
Semantic Web, pages 3–20, 2006.
Patent
[PAT] P. Mehra, R. Brooks and C. Thomas. ONTOLOGY CREATION BY
REFERENCE TO A KNOWLEDGE CORPUS. Pub.No. US 2010/0280989 A1


• Research
– KR
– Domain model extraction / IE
• Proposals
– HP Incubation & Innovation grant for Doozer++
– AFRL grant largely based on Doozer++
– NSF proposal submitted with “very good” reviews
• Collaborations
– Complex Carbohydrate Research Center at UGA
– HP Labs Palo Alto
– Human Performance Directorate, AFRL
• Tools and Ontologies
– GlycO
– GlycoViz
– Doozer++
– Scooner

Thank you!

Shaojun Wang, Amit Sheth, Pascal Hitzler, Pankaj Mehra, Gerhard Weikum

Thanks to all Kno.e.sis Center Members, Past and Present
