Natural Language Processing & Semantic Models in an Imperfect World

  1. 1. Natural Language Processing & Semantic Models in an Imperfect World
        • Presenter: Marc Hadfield
        • [email_address]
        • www.alitora.com
        Confidential. Copyright Alitora Systems, Inc. 2009
  2. 2. Marc Hadfield
        • CTO of Alitora Systems
        • Computer Science
        • Research in Bioinformatics
            • NLP
            • Big (Fuzzy) Networks
        • Generalized Semantic Data Platform
  3. 3. Alitora Systems
        • System Approach
        … Talk about Systems & Apps more than Modules.
  4. 4. Discussion Today
        • Storing Data – Semantic Repository
        • Generating Data – NLP
        • Modeling Data – Semantic Models
        • Analyze Data – Methodology
        • Using Data – Application
  5. 5. Alitora Systems Architecture
  6. 6. Alitora Systems API (ASAPI)
        • User Interfaces
        • ASAPI Collaboration
        • kHarmony™ Semantic DB
        • Alitora Foundry
            • Text-Mining
        • UMIS Secure Distributed URIs
            • URI to Named Graphs
  7. 7. ASAPI Cloud Multi-Billion Triples
  8. 8. kHarmony™ Semantic DB
        • Semantic / Graph DB
        • Cloud Deployable
            • Distribute Data over Servers
            • Layers of Cache
        • Data Analytics / Clustering
            • Determine High-Value Knowledge
            • Knowledge Relevancy
        • Embedded Scripting
        • Data Entitlements
            • Users, Teams, Organizations, Colleagues
        • Base Ontology
  9. 9. Alitora Foundry
        • Manages NLP processes
            • Annotators which add metadata to text
                • Includes external services like OpenCalais as annotators
            • Workflows to link annotators together
            • Common data representation across components
            • RDF in, RDF out
            • Ontology includes representation of certainty, error
  10. 10. Foundry Workflow
        • Independent Workflows based on type of text
        • Combine ML & Rule-based systems
  11. 11. Foundry Data Model
        • Two-dimensional representation of tokens
            • Labels/Spans to tag token ranges (features in machine learning)
        • Allows multiple interpretations of tokens
            • Chemical names tokenized differently than personal names
        • Sequence Recognition and Categorization (with scoring/likelihood)
            • Entities, Entity Types, Normalized (Disambiguated) Entities (ER vs. ER)
        • Shared across workflow steps
        • Direct RDF representation
        “Span”
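One way to picture the data model on this slide is a set of token layers, each carrying labeled spans over token ranges. The sketch below is only an illustration under that reading: the names `Span`, `TokenLayer`, `text_of`, and `best`, the sample tokens, and the scores are all hypothetical and are not taken from Foundry.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Span:
    """A label over the half-open token range [start, end) with a likelihood score."""
    start: int
    end: int
    label: str
    score: float = 1.0


@dataclass
class TokenLayer:
    """One tokenization of a text plus the spans that annotate it."""
    name: str
    tokens: list[str]
    spans: list[Span] = field(default_factory=list)

    def text_of(self, span: Span) -> str:
        """Return the tokens covered by a span, joined with single spaces."""
        return " ".join(self.tokens[span.start:span.end])

    def best(self, label: str) -> Span | None:
        """Return the highest-scoring span with the given label, if any."""
        candidates = [s for s in self.spans if s.label == label]
        return max(candidates, key=lambda s: s.score, default=None)


# Two interpretations of the same text: a general tokenizer splits on the
# hyphen, while a (hypothetical) chemical-name tokenizer keeps the name whole.
general = TokenLayer("general", ["2", "-", "methylpropane", "reacts"])
chemical = TokenLayer("chemical", ["2-methylpropane", "reacts"])
general.spans.append(Span(0, 3, "chemical", score=0.40))
chemical.spans.append(Span(0, 1, "chemical", score=0.92))
```

Keeping each tokenization in its own layer lets later workflow steps compare the competing interpretations by score instead of committing to one tokenizer up front.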
  12. 12. NLP In Action
  13. 13. Foundry Relationship Extraction
        • Sentence
          “Suppression of endogenous Bim greatly inhibits Gadd45a induction of apoptosis.”
        • Parse
          [action, inhibit,
            [action, suppress,
              [unknown],
              [gp, endogenous Bim]
            ],
            [action, induce,
              [gp, Gadd45a],
              [process, apoptosis]
            ],
          ]
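The parse on this slide is a nested tree, so it can be held in ordinary nested lists and walked recursively. The sketch below assumes each node has the shape `[type, value, child, ...]` and that `[unknown]` marks an unstated argument; the helper names `describe` and `actions` are hypothetical, and nothing here reflects how Foundry itself stores or traverses a parse.

```python
# Each node is [type, value, *children]; ["unknown"] marks an unstated argument.
parse = [
    "action", "inhibit",
    ["action", "suppress",
        ["unknown"],
        ["gp", "endogenous Bim"]],
    ["action", "induce",
        ["gp", "Gadd45a"],
        ["process", "apoptosis"]],
]


def describe(node):
    """Render a parse node as a compact predicate-style string."""
    kind = node[0]
    if kind == "unknown":
        return "?"
    if kind == "action":
        args = ", ".join(describe(child) for child in node[2:])
        return f"{node[1]}({args})"
    return f"{kind}:{node[1]}"


def actions(node):
    """Yield every action verb in the tree, outermost first."""
    if node[0] == "action":
        yield node[1]
        for child in node[2:]:
            yield from actions(child)


print(describe(parse))
print(list(actions(parse)))
```

Rendering the tree as a predicate string makes the nesting explicit: the outer "inhibit" action takes the "suppress" and "induce" actions as its arguments.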
  14. 14. Alitora Knowledge Ontology
        Data Representation: Each Object is a Named Graph. Unique URI. “Chunks” of RDF.
        OWL2 “Core” Model
  15. 15. Alitora Knowledge Ontology
        • Named Graphs:
        • URI
        • “Reified”
        • Provenance
            • Hash/Signature
            • Creation, Modification, Expiration Dates
        • Certainty/Error
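As a rough illustration of the properties listed on this slide, a named graph can be modeled as a small record that carries its own URI, provenance, and certainty. Everything in the sketch is an assumption made for the example: the `NamedGraph` class, its field names, the `urn:example:` URI, and the choice of SHA-256 over sorted triples as the signature are not taken from the Alitora Knowledge Ontology.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class NamedGraph:
    """A small 'chunk' of RDF with its own URI, provenance, and certainty."""
    uri: str
    triples: tuple[tuple[str, str, str], ...]
    certainty: float
    created: datetime
    expires: datetime | None = None

    def signature(self) -> str:
        """SHA-256 over the sorted triples, so triple order does not matter."""
        canonical = "\n".join(" ".join(t) for t in sorted(self.triples))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def is_live(self, now: datetime) -> bool:
        """A graph with no expiration date never expires."""
        return self.expires is None or now < self.expires


graph = NamedGraph(
    uri="urn:example:graph/1",  # hypothetical URI
    triples=(
        ("ex:Bim", "rdf:type", "ex:Protein"),
        ("ex:Bim", "ex:partOf", "ex:ApoptosisPathway"),
    ),
    certainty=0.8,
    created=datetime(2009, 1, 1, tzinfo=timezone.utc),
)
```

Because the metadata lives on the graph rather than on individual triples, a consumer can filter by certainty or expiry before looking at any statement inside it.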
  16. 16. Alitora Knowledge Ontology
        Lesson: “Reification” at the model level. Expose the topology of the knowledge.
  17. 17. Semantic Knowledge Statements Domain Ontology + Instance Statements Alitora Knowledge Ontology
  18. 18. Semantic Collaborative Statements Alitora Knowledge Ontology
  19. 19. Alitora Knowledge Ontology
        • Fact Representation
            • This example has 9 Named Graphs
            • The “Relation” is the head
            • Any number of Relation-Parts
            • Relation-Parts are chained
        “Company Merger”
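The head-plus-chain structure described here resembles a linked list: a Relation points at its first Relation-Part, and each part points at the next. The sketch below shows that shape only. The class names, the role names ("acquirer", "target", "date"), and the URIs are hypothetical, and it does not attempt to reproduce the nine named graphs of the slide's "Company Merger" example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RelationPart:
    """One argument of a fact; each part links to the next one in the chain."""
    uri: str
    role: str
    target_uri: str
    next: RelationPart | None = None


@dataclass
class Relation:
    """The head of a fact; it points at the first Relation-Part."""
    uri: str
    kind: str
    first: RelationPart | None = None

    def append(self, part: RelationPart) -> None:
        """Attach a part at the end of the chain."""
        if self.first is None:
            self.first = part
            return
        node = self.first
        while node.next is not None:
            node = node.next
        node.next = part

    def parts(self):
        """Walk the chain from the head, in order."""
        node = self.first
        while node is not None:
            yield node
            node = node.next


merger = Relation("urn:example:rel/1", "CompanyMerger")
merger.append(RelationPart("urn:example:part/1", "acquirer", "urn:example:company/A"))
merger.append(RelationPart("urn:example:part/2", "target", "urn:example:company/B"))
merger.append(RelationPart("urn:example:part/3", "date", "urn:example:date/2009"))
```

Since every part has its own URI, each one could be a separate named graph with its own certainty and provenance, which is what makes "any number of Relation-Parts" workable.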
  20. 20. Alitora Knowledge Ontology
        • OWL
        • “Reified”
        • Knowledge Representation
            • Certainty, Error, Provenance, …
        • Graph + Semantic
            • Topology Interpretation
            • Logical Interpretation
  21. 21. MemomicsBio Ontology (Domain)
        • Extends Alitora Knowledge Ontology
            • Inherits knowledge representation structures
        • OWL
        • Domain Specific
        • Defines types of “facts” specific to biomedical domain
        • A general AKO fact can be mapped/asserted into a Memomics BioOntology fact
  22. 22. Where are we?
        • Store Data
        • Generate data with NLP
        • Represent data in a general knowledge model
        • Have a domain specific ontology
            • Where the “action” happens
        • Need some analysis to push facts into the domain ontology
        • Query, Inference using the domain ontology
  23. 23. Relevancy
        • The shape or “topology” of the graph helps to identify relevant knowledge.
        • The “paths” connecting a User to knowledge, based on search usage, factor into Relevancy
        • “Knowledge Rank”
            • “Best” facts
        Relevancy based on Graph Topology
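The slides do not say how "Knowledge Rank" is computed. As a stand-in, the sketch below uses a much simpler path-based heuristic: a breadth-first walk from a user node in which each extra hop multiplies the score by a decay factor. The function name `relevancy`, the decay value, and the sample graph are all assumptions made for the example.

```python
from collections import deque


def relevancy(graph, user, decay=0.5):
    """Score nodes by shortest-path distance from the user: decay ** hops."""
    scores = {user: 1.0}
    queue = deque([user])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in scores:  # first visit is the shortest path
                scores[neighbor] = scores[node] * decay
                queue.append(neighbor)
    del scores[user]  # the user node itself is not a piece of knowledge
    return scores


# Hypothetical usage graph: a user's search links them to facts.
graph = {
    "user:ann": ["query:apoptosis"],
    "query:apoptosis": ["fact:1", "fact:2"],
    "fact:1": ["fact:3"],
}
scores = relevancy(graph, "user:ann")
```

Facts reached through shorter paths from the user score higher, which is one simple way that paths built from search usage could "factor into" relevancy.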
  24. 24. Scripting, Analysis, Inference
        • Submitted Scripts applied over Graph Walk
            • Groovy Scripts (Java Interface)
            • Can calculate “scores”
        • Offline Clustering and Analysis Algorithms
            • Grid/Cloud based
        • Inference process utilizes knowledge
            • Asserting statements (Relation → Statement)
            • Prolog, HiLog, F-Logic
            • Use all features in inferencing (such as certainty)
  25. 25. Certainty
        • How accurate (F-score) are your NLP extractions?
        • How accurate is the source material?
        • How dynamic is your domain?
        • Can facts be independently verified?
            • Do multiple sources reinforce a “fact”?
        • Can your community of users curate or validate information?
        • How sensitive are you to error?
            • Will users tolerate error (such as in search) or are you trying to infer over absolute “truth”?
  26. 26. Certainty
        • Choose to assert facts (or not) based on certainty assessments
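The two Certainty slides suggest combining evidence from several sources and then deciding whether to assert a fact. The slides do not name a combination rule, so the sketch below substitutes a noisy-OR (one minus the product of the individual doubts), which treats the sources as independent. The function names, the threshold value, and the sample numbers are hypothetical.

```python
import math


def combined_certainty(certainties):
    """Noisy-OR: the fact is doubted only if every independent source is wrong."""
    doubt = math.prod(1.0 - c for c in certainties)
    return 1.0 - doubt


def should_assert(certainties, threshold=0.75):
    """Assert the fact only when the combined certainty reaches the threshold."""
    return combined_certainty(certainties) >= threshold


# One weak extraction alone is held back; two reinforcing sources pass.
single = should_assert([0.6])
reinforced = should_assert([0.6, 0.5])
```

A search application that tolerates error could lower the threshold, while inference over facts meant to be "true" would raise it.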
  27. 27. Guided Inference
        • Inference is guided by ranked knowledge
        • Analysis can be performed offline
  28. 28. Guided Inference
        • Dynamic Inference / Rules
        • A question/query is posed to initiate the inference
        • Knowledge base is queried to collect relevant data
            • Certainty Thresholds can be used
            • Relevancy Thresholds can be used
        • AKO Relations are asserted as “facts” to extend the inference
        • Process is repeated to add assertions
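The loop on this slide (query, filter by thresholds, assert, repeat) can be sketched as a small forward-chaining procedure. This is an illustration under stated assumptions, not Alitora's engine: the `Candidate` record, the rule format (a set of premises and one conclusion), and the statement names are hypothetical, and the slides mention Prolog, HiLog, and F-Logic rather than Python for the actual inference.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    """A relation from the knowledge base with its certainty and relevancy."""
    statement: str
    certainty: float
    relevancy: float


def guided_inference(candidates, rules, min_certainty, min_relevancy, max_rounds=10):
    """Assert relations that pass both thresholds, then fire rules until nothing new appears."""
    asserted = set()
    for _ in range(max_rounds):
        # Query step: collect relations that pass both thresholds.
        added = {
            c.statement
            for c in candidates
            if c.certainty >= min_certainty and c.relevancy >= min_relevancy
        } - asserted
        # Inference step: a rule fires when all of its premises are known.
        known = asserted | added
        for premises, conclusion in rules:
            if premises <= known and conclusion not in known:
                added.add(conclusion)
        if not added:
            break
        asserted |= added
    return asserted


candidates = [
    Candidate("A", 0.90, 0.8),
    Candidate("B", 0.40, 0.9),  # too uncertain to assert
    Candidate("C", 0.95, 0.7),
]
rules = [
    (frozenset({"A", "C"}), "D"),
    (frozenset({"D"}), "E"),
    (frozenset({"B"}), "F"),
]
facts = guided_inference(candidates, rules, min_certainty=0.7, min_relevancy=0.5)
```

Raising either threshold shrinks the set of starting facts, so the same rules reach fewer conclusions; here "F" is never derived because its only premise fails the certainty threshold.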
  29. 29. Demonstrations
        • Alitora News Tracker
        • Sage Commons, Biomedical Domain
        • Match Engine, Consumer Application
  30. 30. Alitora News Tracker
        • Track highly relevant news in domain niche
        • Use NLP to extract entities and relations of interest
        • Use certainty assessments as thresholds to consider entities/relations
        • Use a score (an embedded script) to assign a relevancy to news articles
            • Heuristic including entity types in articles, relationship types, et cetera
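A scoring heuristic of the kind described on this slide might weight each extracted entity or relation by its type and skip anything below a certainty threshold. The sketch below is hypothetical throughout: the weights, the type names, and the annotation tuples are invented for the example, and the slides indicate the real scores are embedded Groovy scripts rather than Python.

```python
# Hypothetical weights: how much each extracted type matters to this niche.
ENTITY_WEIGHTS = {"Company": 2.0, "Person": 1.0}
RELATION_WEIGHTS = {"Merger": 5.0, "Hire": 1.5}


def article_score(annotations, min_certainty=0.6):
    """Sum type weights scaled by certainty, ignoring low-certainty extractions."""
    score = 0.0
    for kind, type_name, certainty in annotations:
        if certainty < min_certainty:
            continue  # below threshold: do not consider this extraction
        weights = ENTITY_WEIGHTS if kind == "entity" else RELATION_WEIGHTS
        score += weights.get(type_name, 0.0) * certainty
    return score


# Each annotation is (kind, type, certainty) as produced by an NLP step.
article = [
    ("entity", "Company", 1.0),
    ("relation", "Merger", 0.75),
    ("entity", "Person", 0.5),  # dropped by the certainty threshold
]
score = article_score(article)
```

Articles can then be ranked by score, with the threshold and weights tuned per domain niche.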
  31. 31. Application: News Tracker
  32. 32. Application: Sage Commons
        • Share networks of biomedical data across the community of researchers
            • Million-node networks, billions of triples
        • Extended AKO with Sage Ontology
            • Use for structured data and unstructured data
        • Allow combination of structured data with NLP-derived data
        • Use certainty thresholds to cut down on noise
        • Use relevancy for efficient queries
        • Expose data for guided inferencing
  33. 36. Application: Match Engine
        • Match Engine
        • Extended AKO with Match Ontology
        • Foundry for extracting music event entities
            • Performer, Venue, Price, Genre
        • Certainty for reducing noise
        • Match Engine uses inference with multiple sources of “evidence” to match users with events
        • Demo Application: Bandalay Facebook App
  34. 40. NLP and (Un)Certainty
        • Capture Error / Uncertainty in Model from NLP
        • “Reify” relationships so metadata will “fit”
        • Use multiple types of analysis
            • Rules, Machine Learning, Topology, Curation, User Feedback
        • Separate general model and domain model
            • Allows asserting a fact in the domain model or not (don’t “decide” everything at once)
        • Use semantics to make decisions about data
        • Inference can use thresholds to decide to assert facts (or not)
        • Guided Inference can make informed choices about facts to add/remove from model
  35. 41. Contact Information
        • 750 Menlo Ave, Suite 340, Menlo Park, CA 94025, (415) 310-4406
        • 155 Water Street, Brooklyn, NY 11201, (917) 463-4776
        • [email_address]
        • [email_address]