
HyperMembrane Structures for Open Source Cognitive Computing


This talk covers open source "cognitive computing" systems, specifically OpenSherlock. It describes a HyperMembrane structure, a kind of information fabric, for machine reading, literature-based discovery, and deep question answering. The platform is open source and uses ElasticSearch, topic maps, JSON, link-grammar parsing, and qualitative process models.

Published in: Technology


  1. HyperMembrane Structures for Open Source Cognitive Computing. Japanese Agency for Science and Technology, Tokyo, Japan, 3 March 2015. Jack Park. © 2015 TopicQuests Foundation, CC BY-SA 4.0
  2. The Present Situation: "Upon this gifted age, in its dark hour, / Rains from the sky a meteoric shower / Of facts . . . they lie unquestioned, uncombined. / Wisdom enough to leech us of our ill / Is daily spun; but there exists no loom / To weave it into fabric" (Edna St. Vincent Millay, 1939)
  3. Topics To Cover • Discovery, learning, problem solving • Topic Maps • OpenSherlock • HyperMembranes • Open Source • Key reasons for building open source cognitive systems
  4. Cognitive Computing: My View • Cognitive Computing is: – Far less about what a computer knows – Far more about how computers can augment human cognitive capabilities – Based on the augmentation work of J.C.R. Licklider and Douglas Engelbart [Photos: J.C.R. Licklider, Douglas Engelbart. Images: Wikipedia]
  5. A Domain-specific Problem Statement • An Example: – Do these two sentences say the same thing? • CO2 is a causal factor in climate change. • Climate change is caused by carbon dioxide. • Problem Statement – Software agents need elegant methods for reading, representing, organizing, and modeling information resources to support discovery and answering questions.
  6. A Framing Thought • From [1] – The understanding of global brain organization and its large-scale integration remains a challenge for modern neurosciences. • To – The understanding of global conversations about topics that matter and their large-scale federation remains a challenge for modern information technology. [1] Petri G, Expert P, Turkheimer F, Carhart-Harris R, Nutt D, Hellyer PJ, Vaccarino F. (2014) Homological scaffolds of brain functional networks. J. R. Soc. Interface 11: 20140873.
  7. Our Goals • Improve Human-Tool Capabilities • Augment existing analytic methods – Increase opportunities for discovery – Improve already sophisticated methods • Build Looms – Read documents – Map and model topics read – Weave information fabrics [Photo: Douglas Engelbart]
  8. Discovery • Is it really possible for people to see everything? – Part of discovery is connecting dots not yet connected. – “Cognitive Agents” can help increase chances of serendipity. “Discovery consists of seeing what everybody has seen and thinking what nobody has thought.” – Albert Szent-Györgyi
  9. Related Work • Commercial – IBM Watson – Wolfram Alpha – Viv – Saffron 10 – Clueda – Siri – Google Now – Cortana – … • Open Source – OAQA – DeepDive – OpenCog – OpenNARS – Watsonsim – YodaQA – AKSW OpenQA – AKSW QA – AquaLog – OpenSherlock – OpenIRIS (CALO) – … • Research – Project Aristo – Project Halo – FREyA – CASIA – NLP-Reduce – EIS Sina – WDAqua ITM – Intui2 – …
  10. Biologically Inspired Design • Humans are blessed with: – Memory to keep concepts organized and connected – Internal mechanisms which map sensor data into memory for processing and storage – The abilities of complex, adaptive, anticipatory systems
  11. Memory: Introducing Topic Maps • A Topic Map is like a library without all the books* – A Topic Map is indexical • Like a card catalog – Each topic has its own representation • Improving on a card catalog, a topic can be identified many different ways • Captures metadata and optionally content – A Topic Map is relational • Like a good road map – Topics are connected by associations (relations) – Topics point to their occurrences in the territory – A Topic Map is organized • Multiple records on the same topic are co-located (stored as one topic) in the map *A map is not its territory
  12. TopicMap Structure • Topics as Actors • Topics as Relations • Topics as Types • Topics as Biographies
  13. Processing Mechanisms • Typically, software processes take the form of variants of NLP (natural language processing) – Parsers – Cluster analysis – Entity recognition – Relation detection – Role recognition – Probabilistic methods
  14. A Key Question in My Research • Can a Topic Map learn (construct itself) by “reading” literature? – Relevant issues: • Bootstrapping • Machine reading – NLP – Linguistics – Statistics – Analogy & Metaphor – … • Knowledge representation • Model building – Anticipation • Weaving information fabrics • Literature-based discovery • Deep Question Answering
  15. A Simple Example • Read this sentence: – Gene expression is caused by insoluble hormones binding to a plasma membrane hormone receptor • Topic Map recognizes: – Gene expression → GeneExpression – insoluble hormones → InsolubleHormone – plasma membrane hormone receptor → PlasmaMembraneReceptor • Software agents transform: – is caused by → Cause – binding to → Binds • Final semantic structure: • { {InsolubleHormone, Binds, PlasmaMembraneReceptor}, Cause, GeneExpression }
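
The final semantic structure on this slide is a nested triple: the subject of the Cause relation is itself a Binds relation. A minimal Python sketch of that shape follows. The `Triple` class and its `render` method are hypothetical names chosen for illustration; they are not taken from the OpenSherlock code.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Triple:
    """Hypothetical illustration: a triple whose subject or object may be another triple."""
    subject: Union[str, "Triple"]
    predicate: str
    obj: Union[str, "Triple"]

    def render(self) -> str:
        # Render nested triples recursively, using the slide's brace notation.
        def show(term: Union[str, "Triple"]) -> str:
            return term.render() if isinstance(term, Triple) else term

        return "{ " + ", ".join([show(self.subject), self.predicate, show(self.obj)]) + " }"


# The inner relation: the hormone binds to the receptor.
binding = Triple("InsolubleHormone", "Binds", "PlasmaMembraneReceptor")
# The outer relation: that binding causes gene expression.
structure = Triple(binding, "Cause", "GeneExpression")
print(structure.render())
```

Making the class frozen keeps each triple hashable, so a triple can be used as a dictionary key or set member when it is embedded in larger structures.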
  16. Introducing OpenSherlock • OpenSherlock is: – A Topic Map for information resource identity and organization – A HyperMembrane information fabric structure – A society of agents system which can • Read documents • Process information resources – Maintain the topic map – Maintain the HyperMembrane – Build and maintain models – Perform discovery tasks – Answer questions – Agents are coordinated by: • A blackboard system • A dynamic task-based agenda • Event propagation and handling
  17. Observations 1 • A Topic Map is central to the key question, and therefore to a thesis entailed by this research – It serves as a kind of memory for social processes – It provides a robust platform for subject identity – It can also serve as a repository for domain-specific vocabularies (ontologies, taxonomies, naming conventions, …)
  18. Observations 2 • A Topic Map is necessary but not sufficient to support discovery, learning, or problem solving – It really only provides a powerful indexical structure related to the key artifacts in any universe of discourse: • Actors • Their relations • Their states • Rules, laws, theories, … • To model those key artifacts, other representation strategies are required – Conceptual Graphs – Qualitative Process Theory – Belief Networks – …
  19. A Research Question • What processes are available which, if performed while harvesting (reading) documents, can reduce the amount of processing required later during question answering? – The question entails • Synthesis of ontology • Co-reference resolution • Re-representation during question lifting • …
  20. A Working Hypothesis • Process – Build and maintain a content-addressable memory of questions, claims, arguments, and evidence fields. • We call that a HyperMembrane – Note: • Every text object passed into the system is processed by the same algorithms – Sentences harvested from text – Questions and responses posed by humans
  21. Key Concept: HyperMembrane • HyperMembrane is a key concept in the working hypothesis that OpenSherlock seeks to explore and demonstrate – A growing graph as a collection of woven and intersecting fabrics • constructed from normalized tuples (n-tuples) which are designed to reduce the amount of NLP required to read documents • such that intersections of fabrics occur where named entities in the graph of n-tuples are the same – Inspired by Ted Nelson’s ZigZag Architecture
  22. Machine Reading in OpenSherlock • Goals: – Grow the topic map • Topic Map then serves to support fabrication of higher-order knowledge structures – Conceptual Graphs – Belief Networks – QP Theory Models – HyperMembrane – … • Process Loop: – For a given document • For every paragraph in that document – For every sentence in each paragraph » Read the sentence
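
The process loop on this slide (document, then paragraph, then sentence) can be sketched as below. This is only an illustration: the function name `read_document`, the blank-line paragraph rule, and the punctuation-based sentence splitter are all assumptions, and a real reader would need a proper sentence segmenter.

```python
import re
from typing import Callable, List


def read_document(text: str, read_sentence: Callable[[str], None]) -> None:
    """Hypothetical sketch of the slide's loop: document -> paragraphs -> sentences."""
    # Assumption: paragraphs are separated by blank lines.
    for paragraph in re.split(r"\n\s*\n", text.strip()):
        # Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", paragraph.strip()):
            if sentence:
                read_sentence(sentence)


seen: List[str] = []
read_document(
    "CO2 is a cause of climate change. Climate change is caused by carbon dioxide.\n\n"
    "Gene expression is caused by hormones.",
    seen.append,
)
print(seen)
```

Passing the per-sentence handler as a callback mirrors the slide's note that every text object goes through the same reading step.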
  23. Sentence Reading • First Step: – Process sentence into word grams* • Second Step: – Where possible • Transform word grams into n-tuples** • n-tuples form the HyperMembrane *A container of words, from 1 to 8 words per container **A container of symbols based on words in word grams
  24. Process Sentence into WordGrams • Approach – Break sentence into word grams* • WordGram objects are shared across sentences – Count of sentence identifiers associated with each object serves as basis for probabilistic models – Either • TopicMap recognizes terms – Or • Sentence is parsed by Link-Grammar Parser** • TopicMap learns from parse results *http://en.wikipedia.org/wiki/W-shingling **http://www.link.cs.cmu.edu/link/
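
The word-gram step can be sketched as w-shingling with window sizes 1 through 8, plus a shared index from each gram to the sentences that contain it. The names `word_grams` and `WordGramIndex`, the lowercasing, and the whitespace tokenizer are assumptions made for this sketch; the slides do not describe OpenSherlock's actual tokenization.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

MAX_WORDS = 8  # The slides state 1 to 8 words per word gram.


def word_grams(sentence: str, max_words: int = MAX_WORDS) -> List[Tuple[str, ...]]:
    """Return every contiguous run of 1..max_words words (w-shingling)."""
    # Assumption: lowercase, strip final punctuation, split on whitespace.
    words = sentence.lower().rstrip(".?!").split()
    grams = []
    for size in range(1, max_words + 1):
        for start in range(len(words) - size + 1):
            grams.append(tuple(words[start:start + size]))
    return grams


class WordGramIndex:
    """Hypothetical shared store: one entry per gram, tracking sentence identifiers."""

    def __init__(self) -> None:
        self.sentences_by_gram: Dict[Tuple[str, ...], Set[int]] = defaultdict(set)

    def add_sentence(self, sentence_id: int, sentence: str) -> None:
        for gram in word_grams(sentence):
            self.sentences_by_gram[gram].add(sentence_id)

    def count(self, gram: Tuple[str, ...]) -> int:
        # Number of distinct sentences in which the gram was seen.
        return len(self.sentences_by_gram.get(gram, ()))


index = WordGramIndex()
index.add_sentence(1, "CO2 is a cause of climate change.")
index.add_sentence(2, "Climate change is caused by carbon dioxide.")
print(index.count(("climate", "change")))
```

The per-gram sentence counts are the raw material the slide mentions as a basis for probabilistic models.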
  25. Transform WordGrams to N-Tuples • Normalized tuple (N-Tuple) – A structure where the subject, predicate, and object are normalized • Nouns and verbs transformed – CO2, Carbon Dioxide, … → CO2 – causes, is caused by, … → cause • Two sentence example – CO2 is a cause of climate change. – Climate change is caused by carbon dioxide. – Result: » { CO2, cause, climate change } – Normalization processes include general and domain specific lenses • Rule-based interpreters which detect structures – Taxonomy – Causality – Biomedical – Geophysical – … • Process models – Built and maintained while reading – Predict while reading – Anticipatory Reading
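
A toy version of the two-sentence example shows how normalization can map both surface forms to { CO2, cause, climate change }. The synonym table, the phrase patterns, and the function names below are invented for illustration and stand in for the topic map and the causality lens; they are not OpenSherlock's rules.

```python
from typing import Optional, Tuple

# Assumption: a tiny synonym table standing in for topic-map term recognition.
TERM_SYNONYMS = {"co2": "CO2", "carbon dioxide": "CO2"}

# Assumption: (surface phrase, normalized predicate, passive voice?) patterns
# standing in for a rule-based causality lens.
PREDICATE_PATTERNS = [
    (" is caused by ", "cause", True),
    (" is a cause of ", "cause", False),
    (" causes ", "cause", False),
]


def normalize_term(text: str) -> str:
    key = text.strip().lower()
    return TERM_SYNONYMS.get(key, key)


def to_ntuple(sentence: str) -> Optional[Tuple[str, str, str]]:
    """Return a normalized (subject, predicate, object), or None if no pattern matches."""
    body = sentence.strip().rstrip(".").lower()
    for phrase, predicate, passive in PREDICATE_PATTERNS:
        left, found, right = body.partition(phrase)
        if found:
            # Passive voice puts the causal agent on the right, so swap the sides.
            subject, obj = (right, left) if passive else (left, right)
            return (normalize_term(subject), predicate, normalize_term(obj))
    return None


first = to_ntuple("CO2 is a cause of climate change.")
second = to_ntuple("Climate change is caused by carbon dioxide.")
print(first, second)
```

Both sentences reduce to the same tuple, which is what allows later question answering to match on structure instead of re-parsing text.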
  26. About N-Tuples • An N-Tuple is a structured record of – Topics in the topic map – Those topics are harvested from text • An N-Tuple takes the form: – { Subject, Predicate, Object } – Where • Subject and/or Object can be one of: – A topic from the topic map – Another N-Tuple • An N-Tuple is identified by the identities of the terms it contains – When thinking in terms of terms (words) read from documents, the identities (numeric representations) of those terms form the identity of the N-Tuple object. • N-Tuples are content addressable • Disambiguation of subjects is a topic mapping process – Learning means continuous refinement of subject identity – Ambiguities can also be solved through human intervention
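
Content addressability, as described on this slide, means the identity of an N-Tuple is derived from the numeric identities of its terms. One way to sketch that idea is shown below. The `NTupleStore` class, the sequential numeric identifiers, and the dotted key format are assumptions made for illustration only.

```python
from typing import Dict, Tuple, Union

# A term is a string; an n-tuple is a 3-tuple whose subject or object may nest.
Item = Union[str, Tuple]


class NTupleStore:
    """Hypothetical content-addressable store keyed by the identities of contained terms."""

    def __init__(self) -> None:
        self.term_ids: Dict[str, int] = {}   # term -> numeric identity
        self.tuples: Dict[str, Tuple] = {}   # n-tuple identity -> n-tuple

    def term_id(self, term: str) -> int:
        # Assumption: identities are assigned sequentially on first sight.
        return self.term_ids.setdefault(term, len(self.term_ids) + 1)

    def identity(self, item: Item) -> str:
        # The identity of an n-tuple is composed from the identities of its parts.
        if isinstance(item, tuple):
            return "(" + ".".join(self.identity(part) for part in item) + ")"
        return str(self.term_id(item))

    def put(self, ntuple: Tuple) -> str:
        key = self.identity(ntuple)
        self.tuples.setdefault(key, ntuple)  # identical content shares one entry
        return key


store = NTupleStore()
inner_key = store.put(("A", "Bind", "B"))
outer_key = store.put((("A", "Bind", "B"), "Cause", "X"))
again_key = store.put(("A", "Bind", "B"))
print(inner_key, outer_key, again_key)
```

Because the key is computed from content, storing the same tuple twice finds the existing entry, which is the property that lets sentences from different documents land on the same record.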
  27. N-Tuples as HyperMembrane Tuples • {A, Bind, B} • {{A, Bind, B}, Cause, X} • {X, Bind, D} • {{X, Bind, D}, Cause, Y} [Diagram: graph with nodes A, B, X, D, Y and edges labeled Bind and Cause]
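
The four tuples on this slide intersect wherever they share a named entity, which is how the fabrics are said to weave together. The sketch below indexes each entity to the tuples that mention it, at any nesting depth. The function names `entities` and `weave` are hypothetical, and the index is only one possible reading of the intersection idea.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Set, Tuple


def entities(item) -> Iterator[str]:
    """Yield every named entity (subject or object leaf) in a possibly nested n-tuple."""
    if isinstance(item, tuple):
        subject, _predicate, obj = item
        yield from entities(subject)
        yield from entities(obj)
    else:
        yield item


def weave(ntuples: Iterable[Tuple]) -> Dict[str, Set[Tuple]]:
    """Hypothetical index: entity -> all n-tuples that mention it."""
    fabric: Dict[str, Set[Tuple]] = defaultdict(set)
    for ntuple in ntuples:
        for entity in entities(ntuple):
            fabric[entity].add(ntuple)
    return fabric


# The four tuples shown on the slide.
slide_tuples = [
    ("A", "Bind", "B"),
    (("A", "Bind", "B"), "Cause", "X"),
    ("X", "Bind", "D"),
    (("X", "Bind", "D"), "Cause", "Y"),
]
fabric = weave(slide_tuples)
# Intersections are entities shared by more than one tuple.
intersections = sorted(e for e, found in fabric.items() if len(found) > 1)
print(intersections)
```

In this example X is the join point between the two causal chains: it is the effect in one tuple and a participant in the next binding.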
  28. Current State of OpenSherlock [Architecture diagram; component labels: ElasticSearch; Titan or Blazegraph; Ontology Importer; Ontologies; PubMed Reader; PubMed Abstracts; HyperMembrane Engine; TellAsk; UMLS Importer; UMLS]
  29. Observations 3 • HyperMembrane is a reminding system – HyperMembrane is a record of federated human conversation • Harvested from books, papers, and recorded conversation • Includes statistical properties of recorded utterances – HyperMembrane records: • That which is common • That which is novel – Possibly wrong – Possibly game changing
  30. TellAsk Interface [Screenshot labels: Conversation Tree: user can click a node to select as parent for any user response. Response Type Selectors: selection required before response. User types here. Linear conversation flow. Entry Forms Selector List Map starts a new conversation with entered topic]
  31. The Open Source Stack • Persistence – ElasticSearch – Considering Titan – Considering Blazegraph (Bigdata™ RDF Store) • Libraries – Many from Apache Foundation and others – LinkGrammarParser (Java version) – XML PullParser – Simple JSON Parser • Tools – Eclipse
  32. Summary
  33. Current State of Development • Aim to answer simple questions about causality – Current focus on biomedical domain – Current focus on two lenses • Taxonomy • Causality – No Conceptual Graphs – No Process Models – No Probabilistic Models
  34. Future Work • Aim to complete an anticipatory system – Process models for anticipation – Conceptual graphs – Probabilistic models – More lenses • Pluggable lenses • Adaptive lenses – More domains
  35. Why Do This? • Augment human capabilities in problem solving • Participate in Open Science
  36. Augmenting Social Sensemaking [Diagram labels: 1, 2, 3; Creating Ideas; Refining Connections; Connecting Ideas; Cancer patient]
  37. Participate in Open Science
  38. Key Context for Open Science • A planet-wide, collaborative quest for Global Thrivability* – Issues include • Sociological events – Health, epidemics, wars, … • Geophysical events – Climate change, earthquakes, volcanoes, … • Astrophysical events – Asteroids, our Sun, … *Let’s call the quest: EarthMoonshot
  39. Completed Representation [Diagram labels: antioxidants kill free radicals; Contraindicates; macrophages use free radicals to kill bacteria; Bacterial Infection; Antioxidants; Because; Appropriate For; Compromised Host] • Let us co-create Cognitive Agents for Discovery • jackpark@topicquests.org • OpenSherlock documents at: http://debategraph.org/OpenSherlock • Code emerging at: https://github.com/opensherlock/ • Slides online at: http://slideshare.net/jackpark/ • Acknowledgments: Bob Gleichauf, David Alexander Price, Arun Majumdar, Robert S. Stephenson, Mark Szpakowski, Martin Radley, Sherry Jones, Alexander Wenzowski, Ted Kahn, Patrick Durusau
