Towards Cognitive Agents for BigData Discovery


Cognitive Agents to augment BigData analysis

Published in: Technology, Education

  1. Towards Cognitive Agents for BigData Discovery
     Finding Solutions to Complex, Urgent Problems
     Jack Park
     BigData Science Meetup, Fremont, CA: 19 April, 2014
     Shyam Sarkar, Organizer
     © 2014, TopicQuests Foundation. Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
  2. A Narrative Arc
  3. Context: Two Kinds of Discovery
     • Data-based
       – Harvesting nuggets from collected data
     • Literature-based: Deep Question Answering
       – Discovering connections between dots in the literature
  4. Target: Deep Question Answering
     Diagram labels: Breadth; Depth; Information Retrieval; Semantic Representation; Goal
     Diagram adapted from a talk by Percy Liang at Stanford, 7 April 2014
  5. Our Goals
     • Improve Human-Tool Capabilities
     • Augment existing analytic methods
       – Increase opportunities for discovery
       – Improve already sophisticated methods
     “Discovery consists of seeing what everybody has seen and thinking what nobody has thought.” –Albert Szent-Györgyi
  6. Our Approach
     • Explore and develop the technologies of so-called Cognitive Agents
       – Current examples
         • IBM’s Watson
         • SIRI
     • An opportunity
       – Couple two platforms
         • Berkeley Data Analytics Stack (BDAS)
         • SolrSherlock
  7. Berkeley Data Analytics Stack: Deep QA Issues*
     • Low latency queries
       – Perform faster inferences
       – Explore larger spaces
       – Better decisions
     • Sophisticated analysis
       – Better forecasts
       – Better decisions
     • Unification of existing data computation models
       – Integrate interactive queries, batch and streaming processing
     *
  8. An Observation
     In this context, interesting literature is about the social lives of data
  9. Literature-based Discovery
     • Forming bisociative links* between information in different literature sources which are not known to be related
     • Swanson example (simplified)**:
       – Literature associated with Raynaud’s
         • Raynaud’s therapy linked to blood thinners
       – Literature associated with fish oils
         • Fish oil linked to blood thinners
       – “Blood thinners” as an implicit link between fish oil and Raynaud’s Syndrome
     • Akin to the wormholes formed by tags on web pages or hashtags
     * Arthur Koestler (1964). The Act of Creation.
     ** Swanson, Don (1986). "Fish oil, Raynaud's syndrome, and undiscovered public knowledge." Perspectives in Biology and Medicine 30(1): 7-18.
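The Swanson example on slide 9 can be sketched in a few lines of Python. This is a minimal illustration, not the method used by SolrSherlock or by Swanson: the `literature` dictionary, its document names, and the `implicit_links` function are all hypothetical, and real systems extract concepts from text rather than starting from hand-built sets.

```python
from collections import defaultdict

# Toy "literatures": each document maps to the concepts it mentions.
# Contents are illustrative only, modeled on the simplified slide example.
literature = {
    "raynaud_paper": {"raynaud's syndrome", "blood thinners"},
    "fish_oil_paper": {"fish oil", "blood thinners"},
}


def implicit_links(docs, a, c):
    """Return concepts B that co-occur with both a and c,
    provided a and c never co-occur directly in any document."""
    neighbors = defaultdict(set)
    for concepts in docs.values():
        for concept in concepts:
            # Every other concept in the same document is a neighbor.
            neighbors[concept] |= concepts - {concept}
    if c in neighbors[a]:
        return set()  # already directly linked, so nothing is "undiscovered"
    return neighbors[a] & neighbors[c]


links = implicit_links(literature, "fish oil", "raynaud's syndrome")
```

Here `links` holds the bridging concept shared by the two otherwise unconnected literatures, which is the implicit link the slide describes.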
  10. Cognitive Agents
     • Examples
       – Proprietary
         • IBM’s Watson
         • SIRI
         • SRI’s CALO
           – Part of which, IRIS, was made open source as OpenIRIS
         • Others…
       – Open Source
         • Cougaar
         • Open Cog
         • Open Advancement of Question Answering Systems
           – Closely related to IBM’s Watson
         • SolrSherlock
         • Many others…
  11. Use Cases for Big Data Harvesting
     • Resource Collection
       – Federation
         • Bring together and organize without filters
     • Resource Augmentation
       – Tagging
       – Annotating
       – Debate
     • Knowledge Cartography
       – Connecting resources
       – Map maintenance
       – More Debate
     • Research Augmentation
       – Crowd-sourced discovery
       – Harvesting
       – Automated inferences/reasoning
       – Knowledge sharing
     Diagram labels: Federated Information Resources; Harvesting Activities
     Adapted from Slide 29
  12. A Strong Conjecture
     • A Knowledge Federation’s topic map provides a Rosetta Stone-like substrate
       – Reasoning by analogy
       – Big Data mined for clues
       – Map:
         • Where we have been
         • Where we haven’t (Dragons be here)
     Adapted from Slide 33
  13. Topic Maps for Knowledge Federation
     • Maintain a structure that is well organized by topic
     • Key issue:
       – For any given information resource added to a map:
         • Agents must answer this question:
           – Have I seen this before by any other name or description?
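The key question on slide 13 ("Have I seen this before by any other name or description?") can be illustrated with a small sketch. It assumes a deliberately simplified identity rule: two entries describe the same topic when they share an identifier or a normalized name. The `Topic` and `TopicMap` classes, the `normalize` helper, and the `id:` strings are hypothetical and are not taken from SolrSherlock.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    names: set = field(default_factory=set)
    identifiers: set = field(default_factory=set)


def normalize(name):
    """Case-fold and collapse whitespace so trivially different names match."""
    return " ".join(name.lower().split())


class TopicMap:
    def __init__(self):
        self.topics = []

    def find(self, names, identifiers):
        """Return an existing topic that shares an identifier or a
        normalized name with the incoming description, else None."""
        keys = {normalize(n) for n in names}
        ids = set(identifiers)
        for topic in self.topics:
            if topic.identifiers & ids:
                return topic
            if {normalize(n) for n in topic.names} & keys:
                return topic
        return None

    def add(self, names, identifiers=()):
        """Merge into a matching topic if one exists; otherwise create one."""
        topic = self.find(names, identifiers)
        if topic is None:
            topic = Topic()
            self.topics.append(topic)
        topic.names |= set(names)
        topic.identifiers |= set(identifiers)
        return topic


tm = TopicMap()
first = tm.add({"Raynaud's Syndrome"}, {"id:raynaud"})
second = tm.add({"Raynaud phenomenon"}, {"id:raynaud"})   # same identifier
third = tm.add({"raynaud's   syndrome"})                  # same name, different spelling
other = tm.add({"Fish oil"})
```

In this sketch the first three calls resolve to one topic that accumulates both names, while "Fish oil" becomes a second topic. The sketch does not handle the case where a new entry bridges two topics that already exist separately; a fuller version would merge those as well.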
  14. Are We There Yet?
     • We are now at the edges of discovery:
       – Deeper ways of representing
       – Deeper ways of knowing
     • Relational Biology
  15. Relational Biology
     • Paraphrasing Nicholas Rashevsky*:
       – We can tease open a living cell and count all its components, but we cannot put it back together and we have no clue why
     • Interpreting Robert Rosen**:
       – Rashevsky’s quest for a relational mathematics for biology (complex systems) entails topological algebras (Category Theory)
     • Category theory is said to facilitate modeling the social lives of members of the categories
     * **
  16. Relational Modeling 1
     • Starts with Ontologies
       – Ontologies grant uniform vocabularies to universes of discourse
         • Including describing data
       – Ontology-based frameworks provide ways to model social and other relational structures
         • SIOC: Semantically Interlinked Online Communities*
         • SWAN: Semantic Web Applications in Neuromedicine**
     * **
  17. SIOC Closer Look
     • A way to model components entailed by a situation (blog post in this case)
       – Uniform vocabulary
       – Structural relations
     • Creates a foundation for much deeper modeling
       – Including:
         • Other ontologies
         • Other structures
         • Feedback loops
     Figure: SIOC Blog Post*
     *
  18. Massive Connectivity and Feedback
     Complex Communication Processes
  19. Feedback Loops: Crucial to Learning
     Image: FEDERAL HEALTH FUTURES SUMMIT, LEADERSHIP LEARNING for TRANSFORMATIONAL CHANGE. September 10-11, 2012, Washington DC Metro Region, Page 23
  20. Relational Biology: Context
     • Context is about Relations among the components themselves
     • Context is about Relations among the components and their environment
     • Context is about Feedback
  21. Example from Breast Cancer 1
     Extracellular Matrix (EM) as Context
     Image labels: Complex Communication Processes; Milk-producing tissue
  22. Example from Breast Cancer 2
     Image labels: Cells missing their EM; Cells with restored EM
  23. Towards Cognitive Agents
     • Harvest and represent
       – Patterns
         • Actors
         • Relations
         • States
       – Context in which patterns exist
     • Discover
       – Processes
       – Unrecognized connections
       – …
  24. Watson’s Architecture* (Simplified)
     • Analysis determines answer type and topics in play
     • Hypothesis formation seeks candidate answers from sources
       – Pattern matching
     • Hypothesis scoring weighs evidence for each hypothesis
     • Answer ranking uses models to select answer
     Diagram labels: Question; Analysis; Answer Sources; Evidence Sources; Hypothesis Formation; Hypothesis Scoring; Answer Ranking; Answer
     *
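The four stages on slide 24 can be read as a simple pipeline. The sketch below is a toy illustration of that flow only; it is not IBM's implementation, and every function name, the stop-word list, and the keyword-overlap scoring are assumptions made for the example. Watson's actual scoring and ranking use trained models rather than counts.

```python
STOP_WORDS = {"what", "is", "the", "a", "of", "for", "which", "to"}


def analyze(question):
    """Question analysis: reduce the question to crude topic keywords."""
    words = {w.strip("?.,").lower() for w in question.split()}
    return words - STOP_WORDS


def form_hypotheses(topics, answer_sources):
    """Hypothesis formation: keep candidates whose source text
    pattern-matches (here: shares a word with) the topics in play."""
    return [
        candidate
        for candidate, text in answer_sources.items()
        if topics & set(text.lower().split())
    ]


def score(candidate, topics, evidence_sources):
    """Hypothesis scoring: count evidence passages that mention the
    candidate together with at least one topic keyword."""
    total = 0
    for passage in evidence_sources:
        lowered = passage.lower()
        if candidate.lower() in lowered and topics & set(lowered.split()):
            total += 1
    return total


def answer(question, answer_sources, evidence_sources):
    """Answer ranking: return the best-supported candidate, or None."""
    topics = analyze(question)
    hypotheses = form_hypotheses(topics, answer_sources)
    ranked = sorted(
        hypotheses,
        key=lambda c: score(c, topics, evidence_sources),
        reverse=True,
    )
    return ranked[0] if ranked else None


# Hypothetical toy data, invented for this example.
answer_sources = {
    "warfarin": "warfarin thins blood",
    "fish oil": "fish oil reduces blood viscosity",
    "aspirin": "aspirin relieves headache pain",
}
evidence_sources = [
    "fish oil lowers blood viscosity in patients",
    "blood viscosity is raised in raynaud patients",
]
best = answer("What reduces blood viscosity?", answer_sources, evidence_sources)
```

Two candidates survive hypothesis formation in this run, and the evidence passages decide between them, which mirrors the separation of answer sources from evidence sources in the slide's diagram.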
  25. SolrSherlock Architecture (Simplified)
     Diagram labels: Topic Map; Conceptual Graphs; Harvested Documents; Harvester: HyperMembrane; Information Fabrics, Agents
     • Literature-based Discovery: Process documents into structures (information fabrics) from which patterns are harvested.
     • Federate Data Analysis with Literature: Federate data observations and predictions with concepts and relations harvested from the literature
     • Model Processes, Structures, and Analogies
  26. SolrSherlock Component Diagram
     Diagram labels: Topic Map; Conceptual Graph; Information Fabrics; TSC; TM Provider; CG Provider; Machine Reader; TSC Provider; Open DeepQA; Harvester; Persistence; Providers; Agents
  27. Looking Forward
     • Coupling Literature-based research with BigData analysis
       – Common ontologies
       – Hypothesis formation
       – Evidence gathering
       – Relation discovery
  28. Completed Representation
     Diagram labels: antioxidants kill free radicals; Contraindicates; macrophages use free radicals to kill bacteria; Bacterial Infection; Antioxidants; Because; Appropriate For; Compromised Host
     Let us co-create Cognitive Agents for Discovery
     Thanks to Martin Radley, Patrick Durusau, Sherry Jones, and Mark Szpakowski for valuable comments
     SolrSherlock at: and