Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Lbd tm-2

804 views

Published on

Literature-based Discovery, Machine Reading, and Topic Maps

Published in: Internet

Lbd tm-2

  1. 1. Literature-based Discovery, Topic Maps, and Machine Reading AI Seminar University of Alberta 28 July, 2017 Jack Park TopicQuests Foundation © 2017, TopicQuests Foundation Img: Doc Searls: https://www.flickr.com/photos/docsearls/5500714140
  2. 2. Context and problems on the table? • Context • Improved human/machine systems capabilities in the face of complex, urgent problems (Engelbart) • Problems • Human activity • Collaboration • Information Silos • Knowledge representation • User experience 2
  3. 3. Solution research under way? • Collaboration and User Experience • Knowledge gardens • Subject of a different talk • Open Source: OpenSherlock • Information Silos • Literature-based discovery • Machine reading • Knowledge federation • Knowledge representation • A range of representation systems working together • Hybrid architectures • Pandemonium-style architectures 3
  4. 4. On Information Silos • An insular management system† • Silo effect arises from • Silo mentality • Incompatible data and information systems • Domain specialization • … • Silo effect creates • Reduced signal to noise ratio in public information systems • Redundant resources • Unconnected concepts (dots) • ... †https://en.wikipedia.org/wiki/Information_silo 4
  5. 5. Silo Effect: An Example • Alzheimer’s Context† • Dr. Trumbleand the Tsimané Project‡ • Anthropologist studying evolutionary medicine • Indigenous people, Bolivia • Higher elderly cognitive performance with the ApoE4 gene • Dr. Liddelow studying immune response in brains • Some people die without dementia but with brains clogged with Alzheimer’s pathology • A Quote (emphasis mine): • “I asked Dr. Liddelow whether he was familiar with the Tsimane research. He admitted that he was not — the field of evolutionary biology is distant from his own. But he said the hypothesis that the ApoE4 gene evolved to protect our brains from the effects of parasitic infection made perfect sense. “That’s absolutely in line with what we found. For our ancestors, an ApoE4 gene could have been beneficial,” Dr. Liddelow said, in part because it would have helped the astrocytes go on the attack.” † Kennedy, P. (2017, July 14). An Ancient Cure for Alzheimer's? The New York Times. ‡ University of New Mexico. The Tsimane Health and Life History Project. http://www.unm.edu/~tsimane/ 5
  6. 6. Literature-based Discovery • Using literature (documents) to find new relations in existing “knowledge” (discovery) • Mitigate silo effect • Literature-based Discovery, the field, was created by the information scientist Don R. Swanson in the 1980s.† • Does not produce new “knowledge” as would laboratory experiments Swanson Linking † see Swanson, Don (1988). "Migraine and Magnesium: Eleven Neglected Connections". Perspectives in Biology and Medicine. 31 (4): 526–557. Find Intermediate literature B which links topic A with topic C 6
  7. 7. Swanson’s ABC Linking Method† 1. Pick a topic of interest (Raynaud’s Disease) 2. Search to find literature A = {Raynaud’s} 3. Hypothesize that B (e.g., blood factors) should be studied in relation to Raynaud’s 4. Search literature A’ = A ∩ {blood} 5. Notice two common descriptors: blood viscosity, red blood cell rigidity 6. Search literature C = {blood viscosity} U { red blood cell rigidity } 7. Notice the term “Fish Oil” 8. Search literature C = {Fish Oil} 9. Show {Fish Oil} ∩ {Raynaud’s} = { } 10. Show plausible connection between Raynaud’s and Fish Oil Show the literature has not yet made the connection (empty set) †Adapted from: http://www.dimacs.rutgers.edu/~billp/pubs/JASISTLBD.pdf 7
  8. 8. Machine Reading: Natural Language Understanding • General approach • Engage • Lexicon / ontology • Parser • Grammar rules • Our approach • Engage • Lexicon / ontology (topic map) • Templates • Lenses • Grammar rules 8
  9. 9. On Knowledge Federation • Knowledge Federation on the Web† • International Workshop, Dagstuhl Castle, Germany, May 1-6, 2005 • First Semiannual Knowledge Federation Conference‡ • Dubrovnik, Croatia, October 20-22, 2008 † Jantke, K.P., Lunzer, A., Spyratos, N. and Tanaka, Y. (Editors; 2006) Federation over the Web. Springer-Verlag, Berlin Heidelberg. ‡ Karabeg, D and Park, J. (2008) First International Workshop on Knowledge Federation. CEUR workshop proceedings, http://ceur-ws.org. 9
  10. 10. A Working Hypothesis • A Topic Map can serve as a platform for knowledge federation • More accurately: information resource federation • Federation, in this context, is defined as bringing information resources together without filters, and keeping them organized in a topic-centric collection. 10
  11. 11. Research Questions • Can a topic map learn by reading? • Can reading by topic maps (the process of topic mapping) perform literature-based discovery? • Perhaps a form of automating Swanson’s ABC method • What software architectures support machine reading in the context of topic mapping? 11
  12. 12. Towards a Machine Reading Architecture • Workflow • Documents à Intermediate Structures à Higher-order Structures • TopicMaps • Conceptual Graphs • Probabilistic Graphs • … • Continuous Literature-based Discovery 12
  13. 13. Continuous Literature-based Discovery: C-LBD • A machine reading many documents is always exposed to surprising results • To the machine, nearly everything being read is unconnected (Swanson’s empty set) • Over time, connections may be discovered • For a machine to remember and begin to make connections • It must possess a long-term memory, part of a kind of cognitive architecture • A suggested framework: • Human-Machine teams which perform C-LBD 13
  14. 14. A Cognitive Architecture for C-LBD • “A cognitive architecture is, first, a software architecture that supports communication between software modules, shared resources and workspaces, and control. What makes it cognitive is that the modules implement abilities we generally call perceptual or cognitive.”† • Long-term Memory (LTM) • Short-term (working) Memory (STM) • Pattern recognition • Symbol manipulation • Decision making • Data • … †Cohen, P (2010) Lecture 3: Cognitive Architecture. http://w3.sista.arizona.edu/~cohen/cs-665-spring-2010/lectures/PDFs/Lecture3.pdf 14
  15. 15. Topic Maps: Long Term Memory (LTM) • A Topic Map is like a library without all the books • A container for a universe of topics • A Topic Map is indexical • Like a card catalog • Each topic has its own representation (proxy) • Improving on a card catalog, a topic can be identified many different ways • Captures metadata and optionally content • A Topic Map is relational • Like a good road map • Topics are connected by associations (relations) • Topics point to their occurrences in the territory • A Topic Map is organized • Multiple records on the same topic are co-located (stored as one topic) in the map • One location in the map for each topic Img: Jack Park 15
  16. 16. Observations 1 • A Topic Map is central to the key question research question • It serves as a kind of memory for social processes • It provides a robust platform for subject identity • It also serves as a repository for domain-specific vocabularies (ontologies, taxonomies, naming conventions,…) • It federates those vocabularies Cultural Algorithm Framework† † After Figure 1: Reynolds, RG & Peng, B (2005) CULTURAL ALGORITHMS: COMPUTATIONAL MODELING OF HOW CULTURES LEARN TO SOLVE PROBLEMS: AN ENGINEERING EXAMPLE. Cybernetics and Systems An International Journal Volume 36, 2005 - Issue 8 16
  17. 17. Observations 2 • A Topic Map is necessary but not sufficient to support discovery, learning, or problem solving • It really only provides a powerful indexicalstructure related to the key artifacts in any universe of discourse: • Actors • Their relations • Their states • Rules, laws, theories,… • To model those key artifacts, other representation strategies are required • Conceptual Graphs • Qualitative Process Theory • Belief Networks • Deep Learning • … 17
  18. 18. Knowledge Representation: Symbol Systems • A physical symbol system (also called a formal system) takes physical patterns (symbols), combining them into structures (expressions) and manipulating them (using processes) to produce new expressions.† • We consider the identifier of a topic in a topic map as a symbol • Example base structure: RDF triples • { subject symbol, predicate symbol, object symbol } • A goal: • Structures which contain more aspects of the situation being represented than simply the actors and their relationship † https://en.wikipedia.org/wiki/Physical_symbol_system 18
  19. 19. asA? • For taxonomy, we have isA • And its variants • subclassOf • instanceOf • For roles, we do not have asA • Instead, we model roles • Ulam quote • Towards an ideal structure • { subject symbol, subject role symbol, predicate symbol, object symbol, object role symbol } • Stanislaw Ulam to Gian-Carlo Rota † : • Convinced that perception is the key to intelligence, Ulam was trying to explain the subtlety ofhuman perception by showing how subjective it is, how influenced by context. He said to Rota, "When you perceive intelligently, you always perceive a function, never an object in the physical sense. Cameras always register objects, but human perception is always the perception of functional roles. The two processes could not be more different.... Your friends in AI are now beginning to trumpet the role of contexts, but they are not practicing their lesson. They still want to build machines that see by imitating cameras, perhaps with some feedback thrown in. Such an approach is bound to fail...” †Hofstadter, DR (1995). On Seeing A‘s and Seeing As https://web.stanford.edu/group/SHR/4-2/text/hofstadter.html 19
  20. 20. Context, Scopes • When a claim is asserted, that claim is surrounded by context • We can say that every claim has a biography • A biography can include • Historical records • Specific dates • Experiments • … • An ideal symbol structure would provide for representation of biographical entailment 20
  21. 21. TopicMap Structure •Topics as Actors •Topics as Role Players •Topics as Relations •Topics as Types •Topics as Biographies Relation Biography Actor Type Relation Type Actor Type Role Type Role Type Relation ActorActor The meaning of this Relation is precisely: An Actor playing a Role is in a particular Relation with another Actor playing a Role, in the context of biographical records. Note: Since this Relation is a Topic, it, too, can be an Actor in other Relations 21
  22. 22. Machine Reading: Swanson’s Example • Silo A • Silo B • Machine Reading collects graph structures from different sources • Form tuple-like structures which are graphs 22
  23. 23. Machine Reading: Topic Mapping • TopicMap Process • Rule: • One Location in the Map for each Subject • Federates (merges topics about the same subject) collected from different resources • Federation does a Union on topics from disparate sources • Federation Intersects silos around common topics • Topic Map 23
  24. 24. Towards a “Cognitive” Platform: OpenSherlock* • OpenSherlockis: • A Topic Map for information resource identity and organization • A HyperMembrane information fabric structure • A society of agents system which can • Read documents • Process information resources • Maintain the topic map • Maintain the HyperMembrane • Build and maintain models • Perform discovery tasks • Physical process discovery • Literature-based discovery • Theory formation • Answer questions • Agents are coordinated by: • A blackboard system • A dynamic task-based agenda • Event propagation and handling • Stream processing *OpenSherlock is an open source mostly Java-based research platform. 24
  25. 25. OpenSherlock’s Use Cases 1) Merge decision and implementation for topic maps 2) Research: machine reading 3) Literature-based discovery across the entire topic map 4) Question answering 5) (eventually) Theory formation, process discovery 25
  26. 26. Machine Reading in OpenSherlock • Process Loop: • For a given document • For every paragraph in that document • For every sentence in each paragraph • Read the sentence Each Document has a Blackboard (STM) Each Paragraph has a Blackboard (STM) Each Sentence has a Blackboard (STM) 26
  27. 27. Sentence Reading • First Step: • Process sentence into WordGrams† • WordGramobjects are shared across sentences • Count of sentence identifiers associated with each object serves as basis for probabilistic models • WordGrams can represent topic labels (names) • Second Step: • Where possible • Transform WordGrams into N-Tuples‡ • By way of Triple Structures: { Subject WordGram, Predicate WordGram, Object WordGram } • N-Tuples form the HyperMembrane • N-Tuples reflect structures in the TopicMap • Where not possible • Try to: • Locate noun phrases • Locate verb phrases † A container of words, from 1 to 8 words per container ‡ A container of symbols based on words in WordGrams and topics from the TopicMap. A collection of N-Tuples forms what we call a HyperMembrane. It is the final step between reading and adding resources to the topic map. 27
  28. 28. Sentence Reading as an Iterative Process • Since the same WordGram objects serve many sentences • Any change of state in a given WordGram affects every sentence it serves • Change from Unknown to Noun, Noun Phrase, Verb or Verb Phrase • WordGram change events enable unfinished (not fully read) sentences to iterate and possibly finish. 28
  29. 29. Sentence Reading Revisited • Sometimes • TopicMap recognizes terms • Mostly • Sentence is read by pattern recognition tools • IF-THEN sentence structure rules (Grammar rules) • Augmented by Lenses • Frame Semantics (e.g. FrameNet)† (Templates) • TopicMap learns from those results † Baker, CF, Fillmore, CJ, Lowe, JB (1998). The Berkeley Framenet Project. COLING '98 Proceedings of the 17th international conference on Computational linguistics, Volume 1 29
  30. 30. Lenses • Domain-specific interpreters which detect structures • Taxonomy • Causality • Biomedical • Geophysical • … • Can include • Rules • Conceptual Graphs • Deep learning • Process models • Built and maintained while reading • Predict while reading – Anticipatory Reading • … 30
  31. 31. Transform WordGram Triples to N-Tuples • Normalized N-Tuple • A structure where the subject, predicate, and object are normalized • Nouns and verbs transformed • CO2, Carbon Dioxide, … à CO2 • causes, is caused by, … à cause • Two sentence example • CO2 is a cause of climate change. • Climate change is caused by carbon dioxide. • Both result in the same structure: • { CO2, cause, climate change } 31
  32. 32. About N-Tuples • An N-Tuple is a structured record of • Topics in the topic map • Those topics are harvested from text • An N-Tuple takes the form: • { Subject, Predicate, Object } • Where • Subject and/or Object can be one of: • A topic from the topic map • Another N-Tuple • An N-Tuple is identifiedby the identities of the terms it contains • When thinking in terms of terms (words) read from documents, the identities (numeric representations) of those terms form the identity of the N-Tuple object. • N-Tuples are content addressable • Disambiguation of subjects is a topic mapping process • Learning means continuous refinement of subject identity • Ambiguities can also be solved through machine learning and/or human intervention Note: as a system of symbols, a subject, or object, or another N- Tuple refers to the symbol which serves as that object’s identifier 32
  33. 33. A Simple Example • Read this sentence: • Gene expression is caused by soluble hormones binding to a plasma membrane hormone receptor • Topic Map recognizes: • Gene expression ß GeneExpression • soluble hormones ß SolubleHormone • plasma membrane hormone receptor ß PlasmaMembraneReceptor • Software agents transform: • is caused by ß Cause • binding to ß Binds • Final semantic structure (N-Tuple): • { {SolubleHormone, Binds, PlasmaMembraneReceptor}, Cause, GeneExpression } 33
  34. 34. A Complex Example • The Sentence: • The pandemic of obesity, type 2 diabetes mellitus (T2DM) and nonalcoholic fatty liver disease (NAFLD) has frequently been associated with dietary intake of saturated fats (1) and specifically with dietary palm oil (PO) (2). • Expected WordGram Triples: • Obesity associated with saturated fats • Obesity associated with palm oil • T2DM associated with saturated fats • T2DM associated with palm oil • NAFLD associated with saturated fats • NAFLD associated with palm oil 34
  35. 35. Just One Triple—In 3 Slides—The Predicate { "pred": { "gramSize": "2", "sentences": ["ef471f0c-08c9-4097-ac17-354a32944fe1"], "vers": "1489448293804", "words": "associated with", "lexTypes": ["vp"], "predicateTense": "present", "id": "27637.27587", "gramType": "pair", "lensCodes": ["BioLens"] }, 35
  36. 36. The Object "obj": { "dbpo": { "@percentageOfSecondRank": "0.0", "@URI": "http://dbpedia.org/resource/Saturated_fat", "@support": "455", "@surfaceForm": "saturated fats", "@offset": "156", "@similarityScore": "1.0", "@types": "" }, "gramSize": "2", "sentences": ["ef471f0c-08c9-4097-ac17-354a32944fe1"], "synonyms": ["27739."], "vers": "1489448293876", "words": "saturated fats", "lexTypes": ["np"], "id": "27738.13882", "gramType": "pair" }, DBPedia recognized a topic WordNet returned a synonym 36
  37. 37. The Subject and WordGram Triple Identity "id": "27000._27637.27587_27738.13882T", "subj": { "dbpo": { "@percentageOfSecondRank": "1.0799338377810428E-21", "@URI": "http://dbpedia.org/resource/Obesity", "@support": "3012", "@surfaceForm": "obesity", "@offset": "16", "@similarityScore": "1.0", "@types": "DBpedia:Disease" }, "gramSize": "1", "sentences": ["ef471f0c-08c9-4097-ac17-354a32944fe1"], "vers": "1489448293875", "words": "obesity", "lexTypes": ["n"], "id": "27000.", "gramType": "singleton" } DBPedia recognized a topic 37
  38. 38. Pandemonium: Cognitive Architectures Again • Oliver Selfridge’s 1958 program Pandemonium† suggests a different framework: • Demons which provide opinions on some state of affairs (feature detectors) • Demons/agents which analyze those opinions • Demons/agents which make decisions • WordGrams as feature detectors Img: Wikipedia † Selfridge, OG (1959). Pandemonium: A paradigm for learning. In Symposium on the mechanization of thought processes. London: HM Stationary Office 38
  39. 39. WordGrams: Pattern Recognition Demons • A demon is a sensor • Individual demons focus on one thing. • A WordGram represents a word or a phrase • It serves as a sensor for instances of self in sentences being read • A demon shouts according to the strength of its response to a situation • A WordGram shouts according to the degree to which it is known to the TopicMap • If a WordGramrepresents the label for a single topic, it wins • If a WordGramrepresents the label for multiple topics, ambiguity exists • Expectation failure event • If a WordGramis a known noun/noun phrase, it shouts but does not necessarily win • If a WordGramis not yet known, it is relatively silent 39
  40. 40. Pandemonium Again • An Opportunity to Explore Related and Hybrid Architectures • More and different pattern recognition demons • Neural Nets • Analogy • Anaphoric Reference • Subject Disambiguation • … • Probabilistic Graphs • Bayesian belief networds • Neural-symbolic hybrids† 40 † Tarek R. Besold, Artur d’Avila Garcez, Isaac Noble (eds.) (2017) Pre- Proceedings of the 12thInternational Workshop on Neural-Symbolic Learning and Reasoning NeSy’17 http://daselab.cs.wright.edu/nesy/NeSy17/nesy17_pre-proceedings.pdf
  41. 41. Looking Ahead • Technologies I would like to explore with the OpenSherlock ecosystem: • Distributed Prolog • Probabilistic Prolog • UIMA-based ecosystems to include • Renaud Richardet’s Sherlok • BlueBrain’s Bluima • Petr Baudiš’s YodaQA • CMU’s OAQA • WordGrams • TensorFlow • … • A grand, unifying wishlist: • A global ecosystem of opensource “cognitive” systems, all working with humans on quests to solve and/or prevent complex, urgent problems. • Collaborations with commercial (closed source) platforms is included in that wish. 41
  42. 42. Completed Representation antioxidants kill free radicals Contraindicates macrophages use free radicals to kill bacteria Bacterial Infection Antioxidants Because Appropriate For Compromised Host Let us co-create woven information fabrics jackpark@topicquests.org http://www.topicquests.org/ https://slideshare.net/jackpark/ https://github.com/topicquests 42 Acknowledgments: Kennan Salinero Mark Szpakowski Patrick Durusau

×