Towards Cognitive Agents for BigData Discovery
Cognitive Agents to augment BigData analysis

Published in: Technology, Education

Transcript

  • 1. Towards Cognitive Agents for BigData Discovery: Finding Solutions to Complex, Urgent Problems. Jack Park. BigData Science Meetup, Fremont, CA: 19 April, 2014. Shyam Sarkar, Organizer. © 2014, TopicQuests Foundation. Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
  • 2. A Narrative Arc
  • 3. Context: Two Kinds of Discovery • Data-based – Harvesting nuggets from collected data • Literature-based: Deep Question Answering – Discovering connections between dots in the literature
  • 4. Target: Deep Question Answering (Diagram labels: Breadth, Depth, Information Retrieval, Semantic Representation, Goal.) Diagram adapted from a talk by Percy Liang at Stanford, 20140407
  • 5. Our Goals • Improve Human-Tool Capabilities • Augment existing analytic methods – Increase opportunities for discovery – Improve already sophisticated methods “Discovery consists of seeing what everybody has seen and thinking what nobody has thought.” –Albert Szent-Györgyi
  • 6. Our Approach • Explore and develop the technologies of so-called Cognitive Agents – Current examples • IBM’s Watson • SIRI • An opportunity – Couple two platforms • Berkeley Data Analytics Stack (BDAS) • SolrSherlock
  • 7. Berkeley Data Analytics Stack Deep QA Issues* • Low latency queries – Perform faster inferences – Explore larger spaces – Better decisions • Sophisticated analysis – Better forecasts – Better decisions • Unification of existing data computation models – Integrate interactive queries, batch and streaming processing *http://strata.oreilly.com/2013/02/the-future-of-big-data-with-bdas-the-berkeley-data-analytics-stack.html
  • 8. An Observation In this context, interesting literature is about the social lives of data
  • 9. Literature-based Discovery • Forming bisociative links* between information in different literature sources which are not known to be related • Swanson example (simplified)**: – Literature associated with Raynaud’s • Raynaud’s therapy linked to blood thinners – Literature associated with fish oils • Fish oil linked to blood thinners – “Blood thinners” as an implicit link between fish oil and Raynaud’s Syndrome • Akin to the wormholes formed by tags on web pages or hashtags *Arthur Koestler (1964). The Act of Creation ** Swanson, Don (1986) "Fish oil, Raynaud's syndrome, and undiscovered public knowledge." Perspectives in Biology and Medicine 30(1): 7-18.
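The Swanson example on slide 9 can be sketched in a few lines. This is a minimal illustration, not SolrSherlock's method: the toy corpus, the concept sets, and the function names `build_cooccurrence` and `implicit_links` are all assumptions made for the example. Two concepts count as implicitly linked when they never appear in the same document but share at least one bridging concept.

```python
from collections import defaultdict

# Hypothetical toy corpus: each document is reduced to the set of concepts it mentions.
documents = {
    "raynaud_paper": {"raynaud's syndrome", "blood thinners"},
    "fish_oil_paper": {"fish oil", "blood thinners"},
    "nutrition_paper": {"fish oil", "omega-3"},
}

def build_cooccurrence(docs):
    """Map each concept to the set of concepts it co-occurs with in any document."""
    links = defaultdict(set)
    for concepts in docs.values():
        for concept in concepts:
            links[concept] |= concepts - {concept}
    return links

def implicit_links(docs, start):
    """Return {candidate: bridging concepts} for concepts never co-mentioned with start."""
    links = build_cooccurrence(docs)
    direct = set(links[start])
    found = defaultdict(set)
    for bridge in direct:
        for candidate in links[bridge]:
            # Skip the start concept itself and anything already directly linked.
            if candidate != start and candidate not in direct:
                found[candidate].add(bridge)
    return dict(found)

result = implicit_links(documents, "fish oil")
print(result)
```

Starting from "fish oil", the only candidate the sketch surfaces is Raynaud's syndrome, bridged by "blood thinners". A real system would also need concept extraction and some way to rank the many spurious bridges that a large corpus produces.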
  • 10. Cognitive Agents • Examples – Proprietary • IBM’s Watson • SIRI • SRI’s CALO – Part of which, IRIS, was made open source as OpenIRIS • Others… – Open Source • Cougaar – http://www.cougaar.org/ • OpenCog – http://opencog.org/ • Open Advancement of Question Answering Systems – Closely related to IBM’s Watson – http://oaqa.github.io/ • SolrSherlock – http://debategraph.org/SolrSherlock • Many others…
  • 11. Use Cases for Big Data Harvesting • Resource Collection – Federation • Bring together and organize without filters • Resource Augmentation – Tagging – Annotating – Debate • Knowledge Cartography – Connecting resources – Map maintenance – More Debate • Research Augmentation – Crowd-sourced discovery – Harvesting – Automated inferences/reasoning – Knowledge sharing (Diagram labels: Federated Information Resources; Harvesting Activities.) Adapted from http://www.slideshare.net/jackpark/big-datasciencemeetup-final Slide 29
  • 12. A Strong Conjecture • A Knowledge Federation’s topic map provides a Rosetta Stone-like substrate – Reasoning by analogy – Big Data mined for clues – Map: • Where we have been • Where we haven’t (Dragons be here) Adapted from http://www.slideshare.net/jackpark/big-datasciencemeetup-final Slide 33
  • 13. Topic Maps for Knowledge Federation • Maintain a well-organized, by-topic structure • Key issue: – For any given information resource added to a map: • Agents must answer this question: – Have I seen this before by any other name or description?
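The question on slide 13, "Have I seen this before by any other name or description?", is essentially a merge test. The sketch below is one hedged way to picture it, not the SolrSherlock implementation: the `TopicMap` class, its methods, and the `example.org` identifiers are hypothetical. Matching on lower-cased names is a deliberately naive assumption; the only grounded idea is that topics sharing an identifier for the same subject should be merged.

```python
class TopicMap:
    """Toy topic map: one topic per subject, merged on shared names or identifiers."""

    def __init__(self):
        self.topics = []   # every distinct topic currently in the map
        self.index = {}    # lookup key -> topic that owns it

    @staticmethod
    def _keys(names, identifiers):
        # Assumption: names match case-insensitively; identifiers match exactly.
        keys = {("name", n.strip().lower()) for n in names}
        keys |= {("id", i) for i in identifiers}
        return keys

    def add(self, names, identifiers=()):
        """Add a resource description; return the (possibly merged) topic for it."""
        matches = []
        for key in self._keys(names, identifiers):
            topic = self.index.get(key)
            if topic is not None and all(topic is not m for m in matches):
                matches.append(topic)
        if matches:
            # Seen before: fold every matching topic into the first one.
            topic = matches[0]
            for other in matches[1:]:
                topic["names"] |= other["names"]
                topic["identifiers"] |= other["identifiers"]
                self.topics = [t for t in self.topics if t is not other]
        else:
            topic = {"names": set(), "identifiers": set()}
            self.topics.append(topic)
        topic["names"] |= set(names)
        topic["identifiers"] |= set(identifiers)
        for key in self._keys(topic["names"], topic["identifiers"]):
            self.index[key] = topic
        return topic

tm = TopicMap()
a = tm.add(["Raynaud's syndrome"], ["http://example.org/raynaud"])
b = tm.add(["Raynaud phenomenon"], ["http://example.org/raynaud"])  # same identifier
c = tm.add(["raynaud phenomenon "])                                  # same name, new spelling
print(len(tm.topics), sorted(a["names"]))
```

All three descriptions end up in one topic, because each shares either an identifier or a normalized name with something already in the map. Matching "by description" rather than by name would need something richer than string equality, which this sketch does not attempt.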
  • 14. Are We There Yet? • We are now at the edges of discovery: – Deeper ways of representing – Deeper ways of knowing • Relational Biology
  • 15. Relational Biology • Paraphrasing Nicholas Rashevsky*: – We can tease open a living cell and count all its components, but we cannot put it back together and we have no clue why • Interpreting Robert Rosen**: – Rashevsky’s quest for a relational mathematics for biology (complex systems) entails topological algebras (Category Theory) • Category theory is said to facilitate modeling the social lives of members of the categories *http://en.wikipedia.org/wiki/Nicholas_Rashevsky **http://en.wikipedia.org/wiki/Robert_Rosen_(theoretical_biologist)
  • 16. Relational Modeling 1 • Starts with Ontologies – Ontologies grant uniform vocabularies to universes of discourse • Including describing data – Ontology-based frameworks provide ways to model social and other relational structures • SIOC: Semantically Interlinked Online Communities* • SWAN: Semantic Web Applications in Neuromedicine** *http://www.sioc-project.org/ **http://www.w3.org/TR/hcls-swan/
  • 17. SIOC Closer Look • A way to model components entailed by a situation (blog post in this case) – Uniform vocabulary – Structural relations • Creates a foundation for much deeper modeling – Including: • Other ontologies • Other structures • Feedback loops SIOC Blog Post* *http://rdfs.org/sioc/spec/
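To make the "uniform vocabulary" and "structural relations" of slide 17 concrete, here is a small sketch that describes a blog post and a reply as plain subject-predicate-object triples. The class and property names (`Post`, `UserAccount`, `Container`, `has_creator`, `has_container`, `has_reply`) come from the SIOC core vocabulary; the `example.org` resources and the helper functions `objects` and `reply_authors` are assumptions for illustration, and no RDF library is used.

```python
SIOC = "http://rdfs.org/sioc/ns#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"  # hypothetical namespace for the example data

# A blog post, its author, its container, and one reply, as (subject, predicate, object).
triples = [
    (EX + "blog", RDF_TYPE, SIOC + "Container"),
    (EX + "user/alice", RDF_TYPE, SIOC + "UserAccount"),
    (EX + "user/bob", RDF_TYPE, SIOC + "UserAccount"),
    (EX + "post/1", RDF_TYPE, SIOC + "Post"),
    (EX + "post/1", SIOC + "has_creator", EX + "user/alice"),
    (EX + "post/1", SIOC + "has_container", EX + "blog"),
    (EX + "post/1", SIOC + "has_reply", EX + "post/2"),
    (EX + "post/2", RDF_TYPE, SIOC + "Post"),
    (EX + "post/2", SIOC + "has_creator", EX + "user/bob"),
]

def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in the triple list."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)

def reply_authors(post):
    """Follow has_reply, then has_creator: who responded to this post?"""
    authors = set()
    for reply in objects(post, SIOC + "has_reply"):
        authors.update(objects(reply, SIOC + "has_creator"))
    return sorted(authors)

print(reply_authors(EX + "post/1"))
```

Because every resource uses the same vocabulary, a two-step walk over the relations answers a social question (who replied to whom) without any knowledge of the blog software that produced the data.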
  • 18. Massive Connectivity and Feedback http://geography.oii.ox.ac.uk/?page=home Complex Communication Processes
  • 19. Feedback Loops: Crucial to Learning Image: FEDERAL HEALTH FUTURES SUMMIT LEADERSHIP LEARNING for TRANSFORMATIONAL CHANGE. September 10-11, 2012 Washington DC Metro Region Page 23
  • 20. Relational Biology: Context • Context is about Relations among the components themselves • Context is about Relations among the components and their environment • Context is about Feedback
  • 21. Example from Breast Cancer 1 Extracellular Matrix (EM) as Context Complex Communication Processes Milk producing tissue http://www.ted.com/talks/mina_bissell_experiments_that_point_to_a_new_understanding_of_cancer
  • 22. Example from Breast Cancer 2 Cells missing their EM Cells with restored EM http://www.ted.com/talks/mina_bissell_experiments_that_point_to_a_new_understanding_of_cancer
  • 23. Towards Cognitive Agents • Harvest and represent – Patterns • Actors • Relations • States – Context in which patterns exist • Discover – Processes – Unrecognized connections – …
  • 24. Watson’s Architecture* (Simplified) • Analysis determines answer type and topics in play • Hypothesis formation seeks candidate answers from sources – Pattern matching • Hypothesis scoring weighs evidence for each hypothesis • Answer ranking uses models to select answer (Diagram labels: Question, Analysis, Answer Sources, Evidence Sources, Hypothesis Formation, Hypothesis Scoring, Answer Ranking, Answer.) *http://www.aaai.org/Magazine/Watson/watson.php
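The four stages on slide 24 can be pictured as a small pipeline. The code below is a toy sketch under stated assumptions and is in no way Watson's implementation or API: the passages, the stopword list, and every function name are invented for illustration. Analysis is reduced to keyword extraction (it does not determine an answer type), and scoring is a simple word-overlap count rather than a learned model.

```python
from dataclasses import dataclass

STOPWORDS = {"which", "is", "to", "a", "the"}

# Hypothetical answer sources: candidate answer -> passage that introduces it.
ANSWER_SOURCES = {
    "fish oil": "fish oil is a dietary supplement",
    "warfarin": "warfarin is a prescription blood thinner",
    "vitamin c": "vitamin c is a dietary supplement",
}

# Hypothetical evidence sources: passages used only for scoring.
EVIDENCE_SOURCES = [
    "fish oil intake is linked to blood thinners",
    "warfarin is a blood thinner",
    "vitamin c supports the immune system",
    "fish oil is sold as a supplement",
]

@dataclass
class Hypothesis:
    answer: str
    score: int = 0

def analyze(question):
    """Analysis (toy): reduce the question to the set of topic words in play."""
    words = {w.strip("?.,").lower() for w in question.split()}
    return words - STOPWORDS

def form_hypotheses(topics):
    """Hypothesis formation (toy pattern match): keep candidates whose passage shares a topic word."""
    return [Hypothesis(answer) for answer, passage in ANSWER_SOURCES.items()
            if topics & set(passage.split())]

def score_hypothesis(hypothesis, topics):
    """Hypothesis scoring (toy): count topic words in evidence passages mentioning the answer."""
    hypothesis.score = sum(len(topics & set(passage.split()))
                           for passage in EVIDENCE_SOURCES
                           if hypothesis.answer in passage)
    return hypothesis

def answer_question(question):
    """Answer ranking (toy): sort scored hypotheses, best first."""
    topics = analyze(question)
    scored = [score_hypothesis(h, topics) for h in form_hypotheses(topics)]
    return sorted(scored, key=lambda h: h.score, reverse=True)

ranked = answer_question("Which supplement is linked to blood thinners?")
best = ranked[0].answer
print(best, [(h.answer, h.score) for h in ranked])
```

The point of the sketch is the separation of concerns: candidates are generated cheaply and broadly, and evidence is gathered per candidate afterwards, so each stage can be improved or replaced independently.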
  • 25. SolrSherlock Architecture (Simplified) (Diagram labels: Topic Map; Conceptual Graphs; Harvested Documents; Harvester: HyperMembrane Information Fabrics, Agents.) • Literature-based Discovery: Process documents into structures (information fabrics) from which patterns are harvested. • Federate Data Analysis with Literature: Federate data observations and predictions with concepts and relations harvested from the literature. • Model Processes, Structures, and Analogies
  • 26. SolrSherlock Component Diagram (Diagram labels: Topic Map; Conceptual Graph; Information Fabrics; TSC; TM Provider; CG Provider; Machine Reader; TSC Provider; Open DeepQA; Harvester; Persistence; Providers; Agents.)
  • 27. Looking Forward • Coupling Literature-based research with BigData analysis – Common ontologies – Hypothesis formation – Evidence gathering – Relation discovery
  • 28. Completed Representation (Diagram labels: antioxidants kill free radicals; Contraindicates; macrophages use free radicals to kill bacteria; Bacterial Infection; Antioxidants; Because; Appropriate For; Compromised Host.) Let us co-create Cognitive Agents for Discovery. jackpark@topicquests.org. Thanks to Martin Radley, Patrick Durusau, Sherry Jones, and Mark Szpakowski for valuable comments. SolrSherlock at: http://debategraph.org/SolrSherlock and https://github.com/SolrSherlock