A01-Openness in knowledge-based systems


The role of openness in knowle

Published in: Technology, Education

  1. 1. The Role of Openness in Creating a Mind for Life<br />
  2. 2. Open Source, AI, & Biology<br />An AI breakthrough can come from an application in biology<br />It is imperative that this be open source<br />Some steps toward (and questions about) creating an open source AI for understanding life<br />
  3. 3. The first artificial mind will think about molecular biology<br />“You can’t think about thinking without thinking about thinking about something.”<br />Seymour Papert, 1974<br />“A thorough study of Human Physiology is, in itself, an education broader and more comprehensive than much that passes under that name. There is no side of the intellect which it does not call into play, no region of human knowledge into which either its roots, or its branches, do not extend.”<br />Thomas Huxley, 1893<br />
  4. 4. Why AI hasn’t succeeded (yet)<br />People know a lot about the world implicitly<br />Conversing with a partner who doesn’t know these basic things is very frustrating<br />50 years of failing to capture this “common sense” information computationally suggests:<br />Lack of explicit enumeration makes capture very expensive (encyclopedias don’t have it!)<br />Still no idea of the extent of this knowledge<br />
  5. 5. People don’t have implicit knowledge of molecular biology<br /><ul><li>Everything anyone knows about MolBio comes from some combination of:</li></ul>Textbooks<br />Scientific publications<br />Databases (e.g. NCBI)<br />Experiments done in one’s own lab<br /><ul><li>There is no elicitation barrier to capturing everything known about molecular biology</li></ul><ul><li>Why would biologists care?</li></ul>They have to understand genome-scale data, in the context of all that is already known.<br />Magic bullets and biomarkers are not enough<br />The idea of finding a single marker of disease state, and addressing it with a specifically targeted drug, is not panning out as well as hoped.<br />
  6. 6. Homeostatic networks foil single markers and drugs<br />[Figure: network diagram with nodes labeled “target” and “outcome”]<br />J.J. Hornberg et al. / BioSystems 83 (2006) 81–90<br />
  7. 7. Networks change through time<br />Mjolsness, Sharp, Reinitz, A Connectionist Model of Development J. Theoretical Bio 1991<br />
  8. 8. Understanding the data<br />“We are close to having a $1,000 genome sequence, but this may be accompanied by a $1,000,000 interpretation.” - Bruce Korf, president, American College of Medical Genetics<br />“Not only is the cost of sequencing essentially free, but big computers and big storage are cheap, too. What will keep us busy for the next 50 years is understanding the data” - Russ Altman, chair of Biomedical Engineering at Stanford<br />
  9. 9. The Hard Problem<br />Given a set of genomic regions, variants, gene products, and/or concentrations empirically involved in a defined phenotype…<br />Produce:<br />An explanation of the reasons that those genomic regions / variants / products / concentrations are (or are not) relevant to the phenotype<br />Evidence to support the explanation(s)<br />Alternative explanations<br />Reasons to prefer one explanation over another<br />
  10. 10. Answering Why? questions<br />Fundamental to human cognitive development<br />Amazing human facility<br />Even to confabulation<br />Causal explanation is central to science<br />The only question “big data” doesn’t seem to be enough to answer (cf. Ramachandran & Hovy, 2002)<br />
  11. 11. Abductive inference<br />“However man may have acquired his faculty of divining the ways of Nature, it has certainly not been by a self-controlled and critical logic. Even now he cannot give any exact reason for his best guesses…. For though it goes wrong oftener than right, yet the relative frequency with which it is right is on the whole the most wonderful thing in our constitution.”<br />The Essential Peirce: Selected Philosophical Writings v. 2 p. 217<br />
  12. 12. “Two paradoxes are better than one; they may even suggest a solution” –Edward Teller<br />Molecular Systems Biology<br />+ <br />Artificial Intelligence<br />
  13. 13. Explanation is hard<br />Not just about the connection between an explanation and the thing explained; it must also be “consonant” with other explanations.<br />Knowledge is key<br />Have to know many other explanations.<br />Need “judgment” to compare the qualities of alternative explanations.<br />Racunas & Shah’s HyBrow system attempted this, but required extensive manually represented knowledge<br />A “complete enough” knowledge-base?<br />
  14. 14. Knowledge-based Computational Biology<br />Widespread use, e.g.<br />Simulation systems (e.g. BioCyc)<br />Question answering systems (e.g. AskHermes or Watson Medicine)<br />High-throughput result analysis (e.g. GOEAST, Ontologizer)<br />Hypothesis generation / testing (e.g. HyQue)<br />Anything that uses an ontology<br />Annotations (e.g. GOA)<br />Cross-species comparisons <br />NCBO<br />
  15. 15. KB for explanation<br />Knowledge base quality<br />Correctness, timeliness (tracking changes)<br />Completeness<br />A constantly receding goal, that obviously cannot be achieved, but is important anyway<br />Need to cover the material in<br />Textbooks<br />Journal articles<br />Databases<br />
  16. 16. Explanatory inference<br />Even if all the relevant knowledge were available in computationally tractable form…<br />We need inferential methods to<br />Identify possible explanations of complex biological phenomena (symbolic?)<br />Compare alternative explanations in the light of existing evidence (numeric?)<br />History of explanatory inference in AI is suggestive, but key open problems remain<br />
  17. 17. Why does openness matter?<br />Productivity: <br />Attacking hard problems efficiently<br />Rapid assimilation of effective methods<br />Building on (not ignoring) each other’s results<br />Equity: <br />Access to scientists with low budgets<br />Distribution to the widest possible community<br />Ethics: <br />Transparency for AI is a moral value<br />
  18. 18. Transparency is a moral value<br />AI matters – lots of social concerns about loss of control, etc. (2001, Robopocalypse)<br />AI is cheap to replicate, and will diverge (if you can build one mind, building millions more is easy). Too important to be private<br />Technological development in the face of such broad social concern requires earning the trust of society<br />
  19. 19. Getting there<br />Build on track records of openness<br />OBO & Community-curated Ontologies<br />Semantic Web / OWL / SPARQL / SWRL<br />Open Access Publishing<br />Linked Life Data<br />Breaking down barriers<br />Infrastructure<br />Incentives<br />
  20. 20. Opening a Bazaar<br />To get the productivity advantage, infrastructure matters<br />Technical infrastructure to share, compare and integrate code<br />Social infrastructure to work together to solve hard problems<br />Motivation<br />Competition<br />Cooperation<br />
  21. 21. Confronting the temptations of being proprietary<br />The temptations:<br />Potential future payoff<br />Avoid effort to conform to the infrastructure<br />Fear of not being able to improve in the future<br />Competition errors<br />Wrong task / evaluation / supplied data<br />Poor process (timing, execution, infrastructure)<br />Doesn’t evolve toward worthy end<br />
  22. 22. Goals<br />Participation from many, previously disparate communities<br />Bio focused: BioCreative, BioNLP<br />Comp Ling: ACL Shared Tasks, CoNLL<br />NIST: TREC, TAC<br />A living, open source collection of useful, modular, repurposable, state of the art software for understanding biomedical texts<br />Major advances in AI<br />
  23. 23. Facilitating an OS community<br />Providing Resources<br />Software (UIMA, U-COMPARE)<br />Compute power<br />Training data (CRAFT, Analysis of analysts)<br />Signal Events<br />Series of competitions based on CRAFT<br />Incentives<br />Prizes for significant achievements<br />
  24. 24.<br />
  25. 25. Remaining challenges<br />PubMed Central and open access<br />Corporate ownership (Ontotext & LLD)<br />Semantic compatibility of various sources<br />UMLS breadth vs. BFO logic<br />Sharing inference methods & rules<br />Rule syntax (SWRL) is not enough<br />DL inference is not enough<br />UIMA equivalent?<br />
  26. 26. How to participate<br />Help design CRAFT competitions<br />Confront publishers about PMC bulk downloads<br />Help define inferential benchmarks<br />
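
Slides 9 and 16 describe the target output as explanations, supporting evidence, alternatives, and reasons to prefer one explanation over another. The following is only a minimal sketch of what such a data structure might look like. It is not taken from the slides: the class names, the additive scoring rule, and all example data are hypothetical placeholders, and the slides themselves leave the real comparison method as an open problem.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Evidence:
    source: str      # kind of source, e.g. "textbook", "journal article", "database"
    statement: str   # placeholder text, not a real finding
    weight: float    # hypothetical strength in [0, 1]


@dataclass
class Explanation:
    claim: str
    supporting: List[Evidence] = field(default_factory=list)
    contradicting: List[Evidence] = field(default_factory=list)

    def score(self) -> float:
        # Toy additive score (an assumption): support minus contradiction.
        support = sum(e.weight for e in self.supporting)
        against = sum(e.weight for e in self.contradicting)
        return support - against


def rank_explanations(candidates: List[Explanation]) -> List[Explanation]:
    # Highest score first; sorted() is stable, so ties keep input order.
    return sorted(candidates, key=lambda x: x.score(), reverse=True)


def reason_to_prefer(better: Explanation, other: Explanation) -> str:
    margin = better.score() - other.score()
    return f"Prefer '{better.claim}' over '{other.claim}' (score margin {margin:.2f})"


# Hypothetical example data; V, G and P are stand-ins, not real entities.
direct = Explanation(
    claim="variant V alters gene product G, which drives phenotype P",
    supporting=[
        Evidence("journal article", "placeholder finding 1", 0.5),
        Evidence("database", "placeholder annotation", 0.25),
    ],
)
indirect = Explanation(
    claim="variant V is merely linked to a causal region",
    supporting=[Evidence("database", "placeholder linkage record", 0.5)],
    contradicting=[Evidence("journal article", "placeholder finding 2", 0.25)],
)

ranked = rank_explanations([indirect, direct])
print(reason_to_prefer(ranked[0], ranked[1]))
```

A single numeric score is the simplest possible stand-in for the "judgment" slide 13 asks for; a real system would also need the symbolic step of generating candidate explanations from a knowledge base, which this sketch does not attempt.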