A01-Openness in knowledge-based systems
Upcoming SlideShare
Loading in...5

A01-Openness in knowledge-based systems



The role of openness in knowle

The role of openness in knowle



Total Views
Views on SlideShare
Embed Views



0 Embeds 0

No embeds



Upload Details

Uploaded via as Microsoft PowerPoint

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
Post Comment
Edit your comment
  • I will have more to say about the importance of broad knowledge later
  • Humans are very facile at generating explanations. Confabulation is what happens when the process is disconnected from relevant sources of information. Split brain patients explaining why the other hemisphere did something, or Capgrassymdome
  • I believe that both of these goals are within reach in the next generation.

A01-Openness in knowledge-based systems A01-Openness in knowledge-based systems Presentation Transcript

  • The Role of Openness in Creating a Mind for Life
  • Open Source, AI, & Biology
    An AI breakthrough can come from an application in biology
    It is imperative that this be open source
    Some steps toward (and questions about) creating an open source AI for understanding life
  • The first artificial mind will think about molecular biology
    “You can’t think about thinking without thinking about thinking about something.”
    Seymour Papert, 1974
    “A thorough study of Human Physiology is, in itself, an education broader and more comprehensive than much that passes under that name. There is no side of the intellect which it does not call into play, no region of human knowledge into which either its roots, or its branches, do not extend.”
    Thomas Huxley,1893
  • Why AI hasn’t succeeded (yet)
    People know a lot about the world implicitly
    Conversing with a partnerwho doesn’t know these basic things is very frustrating
    50 years of failing to capture this “common sense” information computationally suggests:
    Lack of explicit enumeration makes capture very expensive (encyclopedias don’t have it!)
    Still no idea of the extent of this knowledge
  • People don’t have implicit knowledge of molecular biology
    • Everything anyone knows about MolBio comes from some combination of:
    Scientific publications
    Databases (e.g. NCBI)
    Experiments done in one’s own lab
    • There is no elicitation barrier to capturing everything known about molecular biology
  • Why would biologists care?
    They have to understand genome-scale data, in the context of all that is already known.
    Magic bullets and biomarkers are not enough
    The idea of finding a single marker of disease state, and addressing it with a specifically targeted drug is not panningout as well as hoped.
  • X
    J.J. Hornberg et al. / BioSystems 83 (2006) 81–90
    Homeostatic networks foil single markers and drugs
  • Networks change through time
    Mjolsness, Sharp, Reinitz, A Connectionist Model of Development J. Theoretical Bio 1991
  • Understanding the data
    “We are close to having a $1,000 genome sequence, but this may be accompanied by a $1,000,000 interpretation.” - Bruce Korf, president American College of Medical Genetics
    Not only is the cost of sequencing essentially free, but big computers and big storage are cheap, too. What will keep us busy for the next 50 years is understanding the data” - Russ Altman, chair of Biomedical Engineering at Stanford
  • The Hard Problem
    Given a set of genomic regions, variants, gene products, and/or concentrations empirically involved in a defined phenotype…
    An explanation of the reasons that those genomic regions / variants / products / concentrations are (or are not) relevant to the phenotype
    Evidence to support the explanation(s)
    Alternative explanations
    Reasons to prefer one explanation over another
  • Answering Why? questions
    Fundamental to human cognitive development
    Amazing human facility
    Even to confabulation
    Causal explanation is central to science
    The only question “big data”doesn’t seem to be enoughto answer (cfRamachandran & Hovy, 2002)
  • Abductive inference
    “However man may have acquired his faculty of divining the ways of Nature, it has certainly not been by a self-controlled and critical logic. Even now he cannot give any exact reason for his best guesses…. For though it goes wrong oftener than right, yet the relative frequency with which it is right is on the whole the most wonderful thing in our constitution.”
    The Essential Peirce: Selected Philosophical Writings v. 2 p. 217
  • “Two paradoxes are better than one; they may even suggest a solution” –Edward Teller
    Molecular Systems Biology
    Artificial Intelligence
  • Explanation is hard
    Not just about the connection between an explanation and the thing explained, but must also be “consonant” with other explanations.
    Knowledge is key
    Have to know many other explanations.
    Need “judgment” to compare the qualities of alternative explanations.
    Racunas & Shah’s HyBrow system, but required extensive manually represented knowledge
    A “complete enough” knowledge-base?
  • Knowledge-based Computational Biology
    Widespread use, e.g.
    Simulation systems (e.g. BioCyc)
    Question answering systems (e.g. AskHermes or Watson Medicine)
    High-throughput result analysis (e.g. GOEAST, Ontologizer)
    Hypothesis generation / testing (e.g. HyQue)
    Anything that uses an ontology
    Annotations (e.g. GOA)
    Cross-species comparisons
  • KB for explanation
    Knowledge base quality
    Correctness, timeliness (tracking changes)
    A constantly receding goal, that obviously cannot be achieved, but is important anyway
    Need to cover the material in
    Journal articles
  • Explanatory inference
    Even if all the relevant knowledge were available in computationally tractable form…
    We need inferential methods to
    Identify possible explanations of complex biological phenomena (symbolic?)
    Compare alternative explanations in the light of existing evidence (numeric?)
    History of explanatory inference in AI is suggestive, but key open problems remain
  • Why does openness matter?
    Attacking hard problems efficiently
    Rapid assimilation of effective methods
    Building on (not ignoring) each other’s results
    Access to scientists with low budgets
    Distribution to the widest possible community
    Transparency for AI is a moral value
  • Transparency is a moral value
    AI matters – lots of social concerns about loss of control, etc. 2001, Robopocolypse
    AI is cheap to replicate, and will diverge (if you can build one mind, building millions more is easy). Too important to be private
    Technological development in the face of such broad social concern requires earning the trust of the society
  • Getting there
    Build on track records of openness
    OBO &Community-curated Ontologies
    Semantic Web / OWL / SPARQL / SWRL
    Open Access Publishing
    Linked Life Data
    Breaking down barriers
  • Opening a Bazzar
    To get the productivity advantage, infrastructure matters
    Technical infrastructure to share, compare and integrate code
    Social infrastructure to work together to solve hard problems
  • Confronting the temptations of being proprietary
    The temptations:
    Potential future payoff
    Avoid effort to conform to the infrastructure
    Fear of not being able to improve in the future
    Competition errors
    Wrong task / evaluation / supplied data
    Poor process (timing, execution, infrastructure)
    Doesn’t evolve toward worthy end
  • Goals
    Participation from many, previously disparate communities
    Bio focused: BioCreative, BioNLP,
    Comp Ling: ACL Shared Tasks, CONLL
    A living, open source collection of useful, modular, repurposable, state of the art software for understanding biomedical texts
    Major advances in AI
  • Facilitating an OS community
    Providing Resources
    Software (UIMA, U-COMPARE)
    Compute power
    Training data (CRAFT, Analysis of analysts)
    Signal Events
    Series of competitions based on CRAFT
    Prizes for significant achievements
  • http://bionlp-corpora.sourceforge.net/CRAFT/http://bionlp.sourceforge.net
  • Remaining challenges
    Pubmed Central and open access
    Corporate ownership (Ontotext & LLD)
    Semantic compatibility of various sources
    UMLS breadth vs. BFO logic
    Sharing inference methods & rules
    Rule syntax (SWRL) is not enough.
    DL inference is not enough
    UIMA equivalent?
  • How to participate
    Help design CRAFT competitions
    Confront publishers about PMC bulk downloads
    Help define inferential benchmarks