Ontology for the Financial Services Industry

Published in: Education, Technology
  • http://www.w3.org/People/Ivan/CorePresentations/HighLevelIntro/
  • Ivan Herman
  • http://dbpedia.org/fct/images/lod-datasets_2009-03-27_colored.png
  • http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&id=116006492 (sequence of X chromosome in baker’s yeast)
  • http://1105govinfoevents.com/EA/Presentations/EA09_2-2_Robinson.pdf

    1. 1. Reference Data Integration: A Strategy for the Future. Barry Smith, National Center for Ontological Research, University at Buffalo. Presented at FIMA, March 21, 2012
    2. 2. Who am I? National Center for Biomedical Ontology, based in Stanford Medical School, the Mayo Clinic and Buffalo Department of Philosophy • Cleveland Clinic Semantic Database • Duke University Health System • University of Pittsburgh Medical Center • German Federal Ministry of Health • European Union eHealth Directorate • Plant Genome Research Resource • Protein Information Resource
    3. 3. Who am I? National Center for Ontological Research (http://ncor.us) • Joint Warfighting Center, US Joint Forces Command • Intelligence and Information Warfare Directorate (I2WD) • US Department of the Army Net-Centric Data Strategy Center of Excellence • NextGen (Next Generation Air Transportation System) Ontology Team • National Nuclear Security Administration (NNSA), Department of Energy
    4. 4. Some questions • How to find data? • How to understand data when you find it? • How to use data when you find it? • How to compare and integrate with other data? • How to avoid data silos?
    5. 5. The Web (net-centricity) as part of the solution • You build a site • Others discover the site and they link to it • The more they link, the more well known the page becomes (Google …) • Your data becomes discoverable
    6. 6. The roots of Semantic Technology: 1. Make your data available in a standard way on the Web 2. Use controlled vocabularies (‘ontologies’) to capture common meanings, in ways understandable to both humans and computers – Web Ontology Language (OWL) 3. Build links among the datasets to create a ‘web of data’
    7. 7. Controlled vocabularies for tagging (‘annotating’) data • Hardware changes rapidly • Organizations rapidly forming and disbanding • Data is exploding • Meanings of common words change slowly • Use web architecture to annotate exploding data stores using ontologies to capture these common meanings in a stable way
    8. 8. Where we stand today • increasing availability of semantically enhanced data and semantic software • increasing use of XML, RDF, OWL in attempts to create useful integration of on-line data and information • “Linked Open Data” the New Big Thing
    9. 9. Ontology success stories, and some reasons for failure
    10. 10. as of September 2010
    11. 11. The problem: the more Semantic Technology is successful, the more it fails • The original idea was to break down silos via common controlled vocabularies for the tagging of data • The very success of the approach leads to the creation of ever new controlled vocabularies – semantic silos – as ever more ontologies are created in ad hoc ways • The Semantic Web framework as currently conceived and governed by the W3C yields minimal standardization • Multiplying (meta)data registries are creating data cemeteries
    12. 12. NCBO Bioportal (Ontology Registry)
    15. 15. Reasons for this effect • Low incentives for reuse of existing ontologies • Each organization wants its own ontology • Poor licensing regime, poor standards, poor training • People think: Information technology (hardware) is changing constantly, so it’s not worth the effort of getting things right • People have egos: “We have done it this way for 30 years, we are not going to change now”
    16. 16. Why should you care? • when there are many ad hoc systems, average quality will be low • constant need for ad hoc repair through manual effort • DoD alone spends $6 billion per annum on this problem • regulatory agencies are recognizing the need for common controlled vocabularies
    17. 17. So now people are scrambling • to learn how to create ontologies • serious lag in creating trained expertise • poor quality coding leads to poor quality ontologies • poor quality ontology management
    18. 18. How to do it right? • how to create an incremental, evolutionary process, where what is good survives? • how to bring about ontology death? A success story from biology
    19. 19. Old biology data
    21. 21. [Chart: “Ontology in PubMed”, one series plotted by year from 2000 to 2010; vertical axis from 0 to 1200]
    22. 22. By far the most successful: GO (Gene Ontology)
    23. 23. what cellular component? what molecular function? what biological process? the Gene Ontology is not an ontology of ge
    24. 24. [Figure: microarray gene tree (gene list: all genes, 14010), with conditions attacked, time and control, and gene clusters labeled puparial adhesion, molting cycle, hemocyanin, defense response, immune response, response to stimulus, Toll regulated genes, JAK-STAT regulated genes, amino acid catabolism, lipid metabolism, peptidase activity, protein catabolism] Microarray data shows changed expression of thousands of genes. How will you spot the patterns?
    25. 25. Why is GO successful • built by bench biologists • multi-species, multi-disciplinary, open source • compare use of kilograms, meters, seconds in formulating experimental results • natural language and logical definitions for all terms • initially low-tech to ensure aggressive use and testing
    26. 26. now used not just in biology but also in hospital research
    27. 27. Lab / pathology data • EHR data • Clinical trial data • Family history data • Medical imaging • Microarray data • Model organism data • Flow cytometry • Mass spec • Genotype / SNP data. How will you spot the patterns? How will you find the data you need?
    28. 28. over 11 million annotations relating UniProt, Ensembl and other databases to terms in the GO
    29. 29. Hierarchical view representing relations between represented types
    30. 30. A new kind of biomedical research • ~ $200 million invested in the GO so far • Over 11 million GO annotations to biomedical research literature freely available on the web • Powerful software tool support for navigating this data means that what used to take researchers months of data comparison effort can now be performed in milliseconds
    31. 31. If controlled vocabularies are to serve to remove silos • they have to be respected by many owners of data as resources that ensure accurate description of their data – GO maintained not by computer scientists but by biologists • they have to be willingly used in annotations by many owners of data • they have to be maintained by persons who are trained in common principles of ontology maintenance
    32. 32. The new profession of biocurator
    33. 33. GO has been amazingly successful • Has created a community consensus • Has created a web of feedback loops where users of the GO can easily report errors and gaps • Has identified principles for successful ontology management • Indispensable to every drug company and every biology lab
    34. 34. But GO is limited in its scope • it covers only generic biological entities of three sorts: – cellular components – molecular functions – biological processes • no diseases, symptoms, disease biomarkers, protein interactions, experimental processes …
    35. 35. Extending the GO methodology to other domains of biology and medicine
    36. 36. OBO (Open Biomedical Ontology) Foundry proposal (Gene Ontology in yellow). Columns, by relation to time: continuant (independent, dependent) and occurrent. Rows, by granularity. Organ and organism: Organism (NCBI Taxonomy), Anatomical Entity (FMA, CARO), Organ Function (FMP, CPRO), Phenotypic Quality (PaTO), Biological Process (GO). Cell and cellular component: Cell (CL), Cellular Component (FMA, GO), Cellular Function (GO). Molecule: Molecule (ChEBI, SO, RnaO, PrO), Molecular Function (GO), Molecular Process (GO)
    37. 37. The strategy of orthogonal modules. Columns, by relation to time: continuant (independent, dependent) and occurrent. Rows, by granularity. Organ and organism: Organism (NCBI Taxonomy), Anatomical Entity (FMA, CARO), Organ Function (FMP, CPRO), Phenotypic Quality (PaTO), Biological Process (GO). Cell and cellular component: Cell (CL), Cellular Component (FMA, GO), Cellular Function (GO). Molecule: Molecule (ChEBI, SO, RnaO, PrO), Molecular Function (GO), Molecular Process (GO)
    38. 38. Ontology | Scope | URL | Custodians
        Cell Ontology (CL) | cell types from prokaryotes to mammals | obo.sourceforge.net/cgi-bin/detail.cgi?cell | Jonathan Bard, Michael Ashburner, Oliver Hofman
        Chemical Entities of Biological Interest (ChEBI) | molecular entities | ebi.ac.uk/chebi | Paula Dematos, Rafael Alcantara
        Common Anatomy Reference Ontology (CARO) | anatomical structures in human and model organisms | (under development) | Melissa Haendel, Terry Hayamizu, Cornelius Rosse, David Sutherland
        Foundational Model of Anatomy (FMA) | structure of the human body | fma.biostr.washington.edu | JLV Mejino Jr., Cornelius Rosse
        Functional Genomics Investigation Ontology (FuGO) | design, protocol, data instrumentation, and analysis | fugo.sf.net | FuGO Working Group
        Gene Ontology (GO) | cellular components, molecular functions, biological processes | www.geneontology.org | Gene Ontology Consortium
        Phenotypic Quality Ontology (PaTO) | qualities of anatomical structures | obo.sourceforge.net/cgi-bin/detail.cgi?attribute_and_value | Michael Ashburner, Suzanna Lewis, Georgios Gkoutos
        Protein Ontology (PrO) | protein types and modifications | (under development) | Protein Ontology Consortium
        Relation Ontology (RO) | relations | obo.sf.net/relationship | Barry Smith, Chris Mungall
        RNA Ontology (RnaO) | three-dimensional RNA structures | (under development) | RNA Ontology Consortium
        Sequence Ontology (SO) | properties and features of nucleic sequences | song.sf.net | Karen Eilbeck
    39. 39. How to recreate the success of the GO in other areas: 1. create a portal for sharing of information about existing controlled vocabularies, needs and institutions operating in a given area 2. create a library of ontologies in this area 3. create a consortium of developers of these ontologies who agree to pool their efforts to create a single set of non-overlapping ontology modules – one ontology for each sub-area
    40. 40. NextGen Ontology Portal [Diagram labels: Portal, Communities, Search, Ontology Library, NextGen Enterprise Ontology] Ontology Portal • Two-Tiered Registry – NextGen Ontology – consists of vetted ontologies – Ontology Library – open to the wider community • Ontology Metadata – Ontology owner, domain, and location • Ontology Search* – Support ontology discovery
    41. 41. The OBO Foundry: a step-by-step, principles-based approach • Developers commit in advance to collaborating with developers of ontologies in adjacent domains • and to working to ensure that, for each domain, there is community convergence on a single ontology. http://obofoundry.org
    42. 42. OBO Foundry Principles • Common governance • Common training • Robust versioning • Common architecture
    43. 43. OBO Foundry Modular Organization. Top level: Basic Formal Ontology (BFO). Mid-level: Information Artifact Ontology (IAO), Ontology for Biomedical Investigations (OBI), Ontology of General Medical Science (OGMS). Domain level: Anatomy Ontology (FMA*, CARO), Environment Ontology (EnvO), Infectious Disease Ontology (IDO*), Biological Process Ontology (GO*), Cell Ontology (CL), Cellular Component Ontology (FMA*, GO*), Phenotypic Quality Ontology (PaTO), Subcellular Anatomy Ontology (SAO), Sequence Ontology (SO*), Molecular Function (GO*), Protein Ontology (PRO*)
    44. 44. UCore 2.0 / UCore SL Extension Strategy [Diagram levels: top level, mid-level, domain level] Military domain ontologies as extensions of the Universal Core Semantic Layer
    45. 45. Existing efforts to create modular ontology suites • NASA Sweet Ontologies • Military Intelligence Ontology Foundry. Planned OMG efforts: • OMG (CIA) Financial Event Ontology • Semantic Layer for ISO 20022 (Financial Industry Message Scheme)
    46. 46. Example: Financial Securities Ontology, Mike Bennett (2007)
    47. 47. Basic principles of ontology development – for formulating definitions – of modularity – of user feedback for error correction and gap identification – for ensuring compatibility between modules – for using ontologies to annotate legacy data – for using ontologies to create new data – for developing user-specific views
    48. 48. Modularity designed to ensure • non-redundancy • annotations can be additive • division of labor among SMEs • lessons learned in one module can benefit work on other modules • transferrable training • motivation of SME users
    49. 49. How should the FIMA Reference Data community solve this problem? Major financial institutions, major software vendors, major data management companies, EDMC and government principals – should pool information about the controlled vocabularies which already exist – create a common library of these controlled vocabularies – create a subset of thought leaders who agree to pool their efforts in the creation of a suite of ontology modules for common use – create a strategy to disseminate and evolve the selected modules – create a governance strategy to manage the modules over time – allow bad ontologies to die
    50. 50. Urgent need for trained ontologists. Severe shortage of persons with the needed expertise. University at Buffalo Online Training and Certification Program for Ontologists. For details: phismith@buffalo.edu
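
Slides 39 and 40 describe a portal with a two-tiered registry (a vetted tier plus an open library), ontology metadata (owner, domain, location), search, and the goal of one ontology per sub-area. The following is a minimal Python sketch of that idea, not the NextGen portal's actual design. The class names, the `vetted` flag and the exact-string comparison of domains are assumptions made for illustration; the sample owners, scopes and URLs are taken from the table on slide 38, while marking them as vetted and the "Local molecule list" entry are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class OntologyRecord:
    """Registry metadata named on slide 40: owner, domain, location."""
    name: str
    owner: str
    domain: str            # the sub-area the ontology covers
    location: str          # where the ontology can be obtained
    vetted: bool = False   # assumption: True = vetted tier, False = open library


class OntologyRegistry:
    """Hypothetical two-tiered registry: a vetted tier plus an open library."""

    def __init__(self) -> None:
        self._records: dict[str, OntologyRecord] = {}

    def register(self, record: OntologyRecord) -> None:
        if record.name in self._records:
            raise ValueError(f"{record.name!r} is already registered")
        if record.vetted:
            # Slide 39: a single set of non-overlapping modules,
            # one ontology for each sub-area. Overlap is approximated
            # here by an exact match on the domain string.
            for other in self._records.values():
                if other.vetted and other.domain == record.domain:
                    raise ValueError(
                        f"domain {record.domain!r} is already covered "
                        f"by {other.name!r}"
                    )
        self._records[record.name] = record

    def search(self, term: str, vetted_only: bool = False) -> list[str]:
        """Case-insensitive substring match on ontology name or domain."""
        needle = term.lower()
        return sorted(
            r.name
            for r in self._records.values()
            if (needle in r.name.lower() or needle in r.domain.lower())
            and (r.vetted or not vetted_only)
        )


registry = OntologyRegistry()
registry.register(OntologyRecord(
    "Gene Ontology (GO)", "Gene Ontology Consortium",
    "cellular components, molecular functions, biological processes",
    "www.geneontology.org", vetted=True))
registry.register(OntologyRecord(
    "Chemical Entities of Biological Interest (ChEBI)",
    "Paula Dematos, Rafael Alcantara",
    "molecular entities", "ebi.ac.uk/chebi", vetted=True))
# Hypothetical ad hoc vocabulary: allowed in the open library tier only.
registry.register(OntologyRecord(
    "Local molecule list", "hypothetical lab",
    "molecular entities", "example.org/local"))

print(registry.search("molecular"))
print(registry.search("molecular", vetted_only=True))
```

Registering a second vetted ontology for "molecular entities" raises a `ValueError` in this sketch, which is one simple way to enforce non-overlapping modules in the vetted tier while still letting the wider community contribute to the open library.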