2. Overview: Introduction; Two kinds of knowledge infrastructure; Ontological controversies: some examples; The nature of actual scientific representation; Representational pragmatism; Technical directions; Conclusions
3. My background: SSS, Artificial Intelligence, Knowledge Representation, Agent-based systems, Programming Languages, Media Science, Human Interface, Constructionism, Visual Programming, Scientific Software (@startups, large companies, open source projects, and now SRI), Scientific KM, Collaboration, Decision Support, Publishing, Standards, Philosophy of Science, Sociology, Cognitive Science, Narrative Theory
5. Synopsis Knowledge representation inevitably involves inconsistency, controversy, hence politics; Scientific representation does too, but it has worked-out practices for dealing with it; KR should work more like science rather than the other way around; Representational Pragmatism: a conceptual framework to make it happen
7. What’s a knowledge infrastructure? A system of Technologies, Institutions, Standards, and Practices that serve to support knowledge: collection, storage, curation, sharing, validation, …
8. Knowledge Infrastructure #1: Science The scientific community: an elaborate web of people (scientists and others), institutions (labs, journals, funding agencies, instrument makers…), and practices (publishing criteria, protocols, conferences). Works pretty well! The gold standard for knowledge, in fact. But there are issues of scaling, quality, inertia, siloing, epistemological closure…
9. Knowledge Infrastructure #2: The Semantic Web Set of technical standards for sharing formalized knowledge Aspires to be a universal framework for knowledge A grand vision of global-scale knowledge representation And tremendously important and needed.
12. These two are becoming one… Bioscience is by far the largest application area for semantic web technology
14. Some non-robust properties of the semantic web Too inexpressive (can’t represent default reasoning or n-way predicates). Too complex (prevents widespread acceptance). Too logic-based (emphasizes the wrong things).
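To make the “n-way predicates” point concrete: an RDF statement has exactly one subject, one predicate, and one object, so a three-place fact has to be spread across an invented intermediate node, the workaround described in the W3C note on n-ary relations. The sketch below assumes plain Python tuples stand in for RDF triples; the `ex:` identifiers and the helper functions are hypothetical.

```python
# A toy triple store: a set of (subject, predicate, object) tuples.
# Every statement is binary: one subject, one predicate, one object.
Triple = tuple[str, str, str]

def reify_nary(node: str, relation: str, args: dict[str, str]) -> set[Triple]:
    """Encode one n-ary fact as several binary triples hanging off a new node."""
    triples = {(node, "rdf:type", relation)}
    for role, value in args.items():
        triples.add((node, role, value))
    return triples

def query_role(store: set[Triple], node: str, role: str) -> list[str]:
    """Return every object linked to `node` by predicate `role`."""
    return sorted(o for s, p, o in store if s == node and p == role)

# Hypothetical 3-place fact: "enzyme E1 catalyzes reaction R1 in organism O1".
# There is no single binary triple for it, so it becomes four triples about ex:cat1.
store = reify_nary(
    "ex:cat1",
    "ex:Catalysis",
    {"ex:enzyme": "ex:E1", "ex:reaction": "ex:R1", "ex:organism": "ex:O1"},
)
print(len(store))                                   # 4
print(query_role(store, "ex:cat1", "ex:organism"))  # ['ex:O1']
```

The fact survives, but only as a cluster of triples around an artificial node (`ex:cat1`) that has to be minted and then queried back together.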
16. Convergence and Controversy Ontologies are supposed to define a common understanding of a domain But “common” is easier said than done In practice: Many different constituencies With different ideas about what’s important Many side-factors complicate things (implementation cost, personal status, existing non-rigorous usages…) Compromise is necessary but rarely produces elegant results
17. Example: psychiatric illness What constitutes a mental illness? Not at all obvious that categories correspond to real phenomena Huge changes over time Currently defined by DSM-IV through a highly politicized process History of PTSD (Scott, 1990) “combat fatigue” or cowardice In and out of the DSM Finally recognized as PTSD, partly as a response to the Vietnam War
18. Psychiatric illness (2) Homosexuality Formerly a pathology, now not, through a highly politicized process Attention Deficit Disorder Cluster of symptoms, not clear what the boundaries should be Opinions often determined by theories of child-rearing or institutional aspects of school. Insurers and economics are important actors in the debate Summary: these disorders are socially constructed categories over a definite but unclear underlying reality.
19. Example: category fudging In Pathway Tools, SRI’s bioinformatics knowledge base This is a widely used system for curating genomes and metabolic pathways Underlying frame system Web based interface
21. Example: Gene/Protein conflation Genes and Proteins are different things But biologists tend to want to use the same name for a gene and its product Tension between formal ontology and actual scientific usage Equivalently, an argument between the computer scientists who build the system and the biologists who use it and curate it
25. Moral of this somewhat trivial example There are tensions (inconsistencies) between formal representation and actual usage And software makers end up having to cope with these tensions in design decisions Usually in a kludgy way! E.g., papering over the conflict in the user interface layer Would be nice to have a better theory of how to do this.
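One common shape of the kludge, as a Python sketch under stated assumptions: the frame model, frame ids, and helper functions are hypothetical and are not Pathway Tools' actual data model or API. Storage keeps gene and protein as distinct frames; the conflation biologists want lives only in a case-insensitive name lookup at the display layer.

```python
from dataclasses import dataclass, field

@dataclass
class Frame:
    """A minimal frame: an id, a class, and slot values (hypothetical model)."""
    frame_id: str
    kind: str                                   # "Gene" or "Protein"
    slots: dict[str, str] = field(default_factory=dict)

# Storage layer: the gene and its product are separate frames that point at each other.
gene = Frame("G-trpA", "Gene", {"name": "trpA", "product": "P-trpA"})
protein = Frame("P-trpA", "Protein", {"name": "TrpA", "gene": "G-trpA"})
kb = {f.frame_id: f for f in (gene, protein)}

def lookup_by_name(name: str) -> list[Frame]:
    """UI-layer lookup: case-insensitive, so one name matches both frames."""
    return [f for f in kb.values() if f.slots["name"].lower() == name.lower()]

def display(name: str) -> str:
    """Paper over the distinction: show a single entry covering gene and product."""
    kinds = "/".join(sorted(f.kind for f in lookup_by_name(name)))
    return f"{name} ({kinds})"

print(display("trpA"))  # trpA (Gene/Protein)
```

The ontology stays clean, but every consumer of `lookup_by_name` inherits the fudge, which is why a principled account of this tension would beat per-interface patches.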
26. Example: how do we classify mitochondria? Organelles (part of cell) But descended from separate endosymbiotic organisms With their own DNA (Generally but not universally accepted theory)
28. There are consequences “If we accept that mitochondria are bacteria, then the record books have to be rewritten. The first bacterial genome sequence was completed not by American arriviste Craig Venter …in 1995, but instead by … Fred Sanger, who completed the human mitochondrial genome sequence in 1981!”
29. Expressivity in Description Logics Description Logics (DLs) are the basis for semantic web ontology languages (OWL). Selected largely for computational tractability But DLs make it hard to do simple things such as representing defaults: All cats have hair Except for this one! Expressivity has been traded away A compromise, and perhaps not the right one
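A toy contrast between the two readings of “all cats have hair,” in Python. This is not a description logic reasoner and the data structures are hypothetical; it only shows that treating the rule as a hard axiom turns the exceptional cat into a contradiction, while treating it as a default lets the specific assertion win.

```python
# Class-level rule and individual assertions (hypothetical data).
defaults = {"Cat": {"has_hair": True}}           # "cats have hair"
instances = {
    "tom": {"isa": "Cat"},
    "hairless_cat": {"isa": "Cat", "has_hair": False},  # "except for this one!"
}

def strict_check(name: str) -> bool:
    """Axiom reading: the class rule must hold for every member.
    Returns False when the individual's own assertion contradicts it."""
    ind = instances[name]
    required = defaults[ind["isa"]]["has_hair"]
    return ind.get("has_hair", required) == required

def default_value(name: str, prop: str):
    """Default reading: an explicit assertion on the individual overrides
    the class value; otherwise the class value is inherited."""
    ind = instances[name]
    if prop in ind:
        return ind[prop]
    return defaults[ind["isa"]].get(prop)

print(default_value("tom", "has_hair"))           # True
print(default_value("hairless_cat", "has_hair"))  # False
print(strict_check("hairless_cat"))               # False
```

Under the axiom reading the knowledge base is simply inconsistent for `hairless_cat`; under the default reading it is an ordinary, local exception.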
31. Bruno Latour French philosopher and sociologist of science Roundly reviled for perceived anti-realism Started with anthropological studies of science in labs and fields Ends up with a distinctive view of representation and even metaphysics
32. Latour for dummies Science is a social construction (but not an arbitrary one) Network-based: a network consists of human and non-human actors (lab animals, instruments, funding institutions…) Agonistic: trials of strength between networks Understand how science works by tracing the flow of inscriptions, abstractions, and power through these networks An enriched realism that provides a rich account of the relation between phenomena and representation
33. Dual face of science Settled science: “That’s the way it is”; objective; black-boxed; politically established; natural. Science under construction: unsettled; contentious; searching for allies (people, funding, tools); building networks of alliance; social.
34. Science in the making: e.g., Watson and Crick’s work on the structure of DNA Speculations (a three-strand model was proposed) Contending theories Eventually a winner emerges Science made: now that the structure of DNA is known, it’s a “black box”; we can make instruments that measure it and representations of its sequence
37. Where the representation meets the road Science is “the transformation of rats and mice into paper.” Situated representations run from phenomena to the lab notebook to tables in articles to laws of nature: a progression from concrete and situated to abstract and objective.
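Read computationally, this chain is a provenance structure: each inscription keeps a pointer to the one it was derived from, so an abstract claim can be walked back to concrete phenomena. A minimal Python sketch; the class, the level names, and the example labels are hypothetical illustrations, not data from any study.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Inscription:
    """One representation, with a link to the inscription it was made from."""
    label: str
    level: str                                   # position on the concrete-to-abstract scale
    derived_from: Optional["Inscription"] = None

def trace(ins: Inscription) -> list[str]:
    """Follow derived_from links from an abstract claim back to the phenomena."""
    path: list[str] = []
    node: Optional[Inscription] = ins
    while node is not None:
        path.append(f"{node.level}: {node.label}")
        node = node.derived_from
    return path

# Hypothetical chain with illustrative labels only.
mice = Inscription("cohort of lab mice", "phenomenon")
notebook = Inscription("daily weight entries", "lab notebook", mice)
table = Inscription("summary table of weight changes", "article table", notebook)
law = Inscription("general claim about weight gain", "law or generalization", table)

for step in trace(law):
    print(step)
```

Keeping the back-links is what lets the abstract end of the chain stay connected to the concrete end, instead of floating free as a bare assertion.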
41. Analogizing to KR Knowledge Representation: realist, objective, settled, factual, established, abstract; graph structures. Knowledge Construction: situated representations, unsettled, bottom-up; user interfaces, ad-hoc structures.
42. A new view of the relation between world and representation Latour refocuses epistemology Less on the truth of representations, More on their connection to the world via networks of actants. Should be a natural fit for computationalists Who also make systems of symbols with causal connections to the world and each other
44. Realism vs Conceptualism Realism: a movement in philosophy of KR Led mostly by Barry Smith, SUNY Buffalo (e.g., “Beyond Concepts: Ontology as Reality Representation”, 2004) The problem: nobody knows what makes a good ontology His solution: Aristotelian universals Bad ontologies are…those whose general terms lack the relation to corresponding universals in reality, and thereby also to corresponding instances. Good ontologies are reality representations...
45. Realism is extremely annoying Both vacuous and wrong Vacuous: because it presupposes we know what is real beforehand Wrong: because it doesn’t correspond to actual scientific knowledge representation Examples of failure: Higgs bosons – we don’t know if they are real Genes – were hypothesized before their “implementation” was known; when were they real? Software for synthetic chemistry – mixes real and not-yet-real molecular structures
47. But Realism is Winning Basis of BFO (Basic Formal Ontology) Which is used by OBO Foundry and other bio-ontology efforts Nobody wants to be against “realism”… so they picked a good name
48. Realism only deals with half of science May work for ready-made science, hopeless for science-in-the-making Where we don’t know what’s real And which is where the action is
49. Representational Pragmatism Needed: a term with good connotations to compete with “realism”. Connects to a philosophical tradition (James, Peirce, Dewey, Rorty) “It is astonishing how many philosophical disputes collapse into insignificance the moment that you subject them to this simple test of tracing a concrete consequence” -- James Bottom-up rather than top-down; opposed to premature ontologizing; Latourian Support the divergent representational practices of actual science Help science towards convergence, objectivity, and realism, rather than demanding it upfront.
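One way a knowledge base could support science-in-the-making instead of only settled results, sketched in Python under assumed names (`Question`, `Hypothesis`, and `settle` are hypothetical): rival hypotheses are stored side by side, and settling a question marks the losers rejected instead of deleting them, so the controversy stays on record. The example reuses the DNA case from slide 34; the three-strand model was proposed by Pauling and Corey in 1953.

```python
from dataclasses import dataclass, field

@dataclass
class Hypothesis:
    statement: str
    proponents: list[str]
    status: str = "proposed"        # proposed -> accepted | rejected

@dataclass
class Question:
    """An open question that keeps every rival answer, not just the winner."""
    topic: str
    hypotheses: list[Hypothesis] = field(default_factory=list)

    def is_settled(self) -> bool:
        return any(h.status == "accepted" for h in self.hypotheses)

    def settle(self, winner: str) -> None:
        """Accept one hypothesis; rivals are marked rejected, not deleted."""
        for h in self.hypotheses:
            h.status = "accepted" if h.statement == winner else "rejected"

    def current_view(self) -> list[str]:
        """Settled: the accepted answer only. Unsettled: every live rival."""
        if self.is_settled():
            return [h.statement for h in self.hypotheses if h.status == "accepted"]
        return [h.statement for h in self.hypotheses]

dna = Question("structure of DNA", [
    Hypothesis("three-strand helix", ["Pauling", "Corey"]),
    Hypothesis("double helix", ["Watson", "Crick"]),
])
print(dna.current_view())                   # ['three-strand helix', 'double helix']
dna.settle("double helix")
print(dna.current_view())                   # ['double helix']
print([h.status for h in dna.hypotheses])   # ['rejected', 'accepted']
```

Convergence happens as an event inside the representation rather than as a precondition for entering it, which is the shift from demanding realism upfront to helping science reach it.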
51. Some encouraging developments Linked data vs semantic web: a somewhat more bottom-up, pragmatic approach to universal knowledge infrastructure Freebase, DBpedia, and similar efforts Open Science movement Open Access Journals (PLoS, etc.) Open Data (standards) Open Notebook (practices)
53. BioBike: a platform for symbolic biocomputing A web-based, programmable tool for advanced biocomputing Knowledge-based Programmable Social Really the inspiration of many of the ideas here Joint work with Jeff Shrager (Stanford), Jeff Elhai (VCU), and others
55. Reworked to be more social: Bio-computation, Bio-blog menu, Knowledge/data analysis, Integration with services, Commentary
57. Prototype-based KR How the mind categorizes (Rosch, Lakoff) A perennial minority theme in computation: 60s: Sutherland, Sketchpad; 70s: early frame-based KR systems; 80s: Ungar and Smith, SELF programming language; 90s: Ken Haase, Framer; now: JavaScript A structured way to manage inconsistency
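Prototype-based representation in miniature, as a Python sketch (the `Proto` class and its methods are hypothetical, loosely modeled on the delegation style of SELF and JavaScript): an object looks a slot up locally and otherwise delegates to its parent, so an exception is stored on the one object that has it instead of contradicting a class-wide rule.

```python
class Proto:
    """A prototype: local slots plus an optional parent to delegate to."""

    def __init__(self, name: str, parent: "Proto | None" = None, **slots):
        self.name = name
        self.parent = parent
        self.slots = dict(slots)

    def get(self, key: str):
        """Look the slot up locally first, then along the parent chain."""
        obj = self
        while obj is not None:
            if key in obj.slots:
                return obj.slots[key]
            obj = obj.parent
        raise KeyError(key)

    def clone(self, name: str, **overrides) -> "Proto":
        """Derive a new object that delegates to self; overrides are local exceptions."""
        return Proto(name, parent=self, **overrides)

# Hypothetical example: a wild-type strain and a mutant that differs in one trait.
wild_type = Proto("wild type", synthesizes_tryptophan=True, gram_negative=True)
trp_mutant = wild_type.clone("trpA mutant", synthesizes_tryptophan=False)

print(trp_mutant.get("synthesizes_tryptophan"))  # False (local exception)
print(trp_mutant.get("gram_negative"))           # True  (delegated to wild type)
print(wild_type.get("synthesizes_tryptophan"))   # True  (parent unchanged)
```

Because the override is local, the inconsistency is contained: the wild type's slot is untouched, and every other clone of it still inherits `True`.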
58. Biology is prototype-based Every feature of a biological class started out as an exception to a general case (aka mutation)! Classes are Aristotelian; prototypes are Darwinian.
60. The Problems Ontologies are plagued with inconsistencies (or compromise) because they are inevitably the product of different interests. Ontologies generally only try to capture the settled science Realism is vacuous, question-begging; if we knew at the start what was real we wouldn't need to do science Knowledge construction is social, tentative, situated, multi-viewpoint, and only objective at its endpoints.
61. The Solutions Tools that support how science is actually done, at web scale and with greater visibility and traceability A pragmatic view of scientific representation that lets scientists work bottom-up from their results, that foregrounds the concrete relations between representation and reality (circulating reference), and that connects science in progress with settled science, supporting and preserving controversy, unsettledness, and argument structure More simply: integrate data and knowledge and the processes that connect them. Open Science: institutions, standards, practices. A representational infrastructure that supports prototypes, default reasoning, and exceptions.