Ontology for the Financial Services Industry
Published in: Education, Technology

  • Transcript

    • 1. Reference Data Integration: A Strategy For The Future. Barry Smith, National Center for Ontological Research, University at Buffalo. Presented at FIMA, March 21, 2012.
    • 2. Who am I? National Center for Biomedical Ontology, based in Stanford Medical School, the Mayo Clinic and Buffalo Department of Philosophy
      • Cleveland Clinic Semantic Database
      • Duke University Health System
      • University of Pittsburgh Medical Center
      • German Federal Ministry of Health
      • European Union eHealth Directorate
      • Plant Genome Research Resource
      • Protein Information Resource
    • 3. Who am I? National Center for Ontological Research
      • Joint Warfighting Center, US Joint Forces Command
      • Intelligence and Information Warfare Directorate (I2WD)
      • US Department of the Army Net-Centric Data Strategy Center of Excellence
      • NextGen (Next Generation Air Transportation System) Ontology Team
      • National Nuclear Security Administration (NNSA), Department of Energy
    • 4. Some questions
      • How to find data?
      • How to understand data when you find it?
      • How to use data when you find it?
      • How to compare and integrate with other data?
      • How to avoid data silos?
    • 5. The Web (net-centricity) as part of the solution
      • You build a site
      • Others discover the site and they link to it
      • The more they link, the more well known the page becomes (Google …)
      • Your data becomes discoverable
    • 6. The roots of Semantic Technology
      1. Make your data available in a standard way on the Web
      2. Use controlled vocabularies (‘ontologies’) to capture common meanings, in ways understandable to both humans and computers – Web Ontology Language (OWL)
      3. Build links among the datasets to create a ‘web of data’
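The three steps on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the vocabulary, the `EX:` term IDs, the synonym table, and the two "desk" datasets are all hypothetical and do not come from the talk or from any real ontology.

```python
from collections import defaultdict

# Hypothetical controlled vocabulary: stable term IDs mapped to preferred labels.
VOCAB = {
    "EX:0001": "corporate bond",
    "EX:0002": "equity share",
}

# Hypothetical local labels used by different data owners, mapped to shared term IDs.
SYNONYMS = {
    "corp bond": "EX:0001",
    "corporate bond": "EX:0001",
    "common stock": "EX:0002",
    "equity share": "EX:0002",
}


def annotate(records, field):
    """Tag each record with the vocabulary term ID matching its local label."""
    annotated = []
    for rec in records:
        term = SYNONYMS.get(rec[field].strip().lower())  # None if the label is unknown
        annotated.append({**rec, "term_id": term})
    return annotated


def link(*datasets):
    """Group record identifiers from all datasets under the term ID they share."""
    links = defaultdict(list)
    for name, records in datasets:
        for rec in records:
            if rec["term_id"] is not None:
                links[rec["term_id"]].append((name, rec["id"]))
    return dict(links)


# Two datasets that describe the same kinds of things with different local words.
desk_a = annotate([{"id": 1, "kind": "Corp Bond"}, {"id": 2, "kind": "Common Stock"}], "kind")
desk_b = annotate([{"id": "x9", "type": "corporate bond"}], "type")

links = link(("desk_a", desk_a), ("desk_b", desk_b))
for term_id, members in links.items():
    print(term_id, VOCAB[term_id], members)
```

The point of the sketch is that the records are never rewritten; each owner keeps its local labels, and the shared term ID is what makes the two datasets comparable.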
    • 7. Controlled vocabularies for tagging (‘annotating’) data
      • Hardware changes rapidly
      • Organizations rapidly forming and disbanding
      • Data is exploding
      • Meanings of common words change slowly
      • Use web architecture to annotate exploding data stores using ontologies to capture these common meanings in a stable way
    • 8. Where we stand today
      • increasing availability of semantically enhanced data and semantic software
      • increasing use of XML, RDF, OWL in attempts to create useful integration of on-line data and information
      • “Linked Open Data” the New Big Thing
    • 9. Ontology success stories, and some reasons for failure
    • 10. as of September 2010
    • 11. The problem: the more Semantic Technology is successful, the more it fails
      • The original idea was to break down silos via common controlled vocabularies for the tagging of data
      • The very success of the approach leads to the creation of ever new controlled vocabularies – semantic silos – as ever more ontologies are created in ad hoc ways
      • The Semantic Web framework as currently conceived and governed by the W3C yields minimal standardization
      • Multiplying (meta)data registries are creating data cemeteries
    • 12. NCBO Bioportal (Ontology Registry)
    • 15. Reasons for this effect
      • Low incentives for reuse of existing ontologies
      • Each organization wants its own ontology
      • Poor licensing regime, poor standards, poor training
      • People think: Information technology (hardware) is changing constantly, so it’s not worth the effort of getting things right
      • People have egos: “We have done it this way for 30 years, we are not going to change now”
    • 16. Why should you care?
      • when there are many ad hoc systems, average quality will be low
      • constant need for ad hoc repair through manual effort
      • DoD alone spends $6 billion per annum on this problem
      • regulatory agencies are recognizing the need for common controlled vocabularies
    • 17. So now people are scrambling
      • to learn how to create ontologies
      • serious lag in creating trained expertise
      • poor quality coding leads to poor quality ontologies
      • poor quality ontology management
    • 18. How to do it right?
      • how to create an incremental, evolutionary process, where what is good survives?
      • how to bring about ontology death?
      • A success story from biology
    • 19. Old biology data
    • 21. [Chart: Ontology in PubMed, by year, 2000 to 2010; vertical axis 0 to 1200]
    • 22. By far the most successful: GO (Gene Ontology)
    • 23. what cellular component? what molecular function? what biological process? the Gene Ontology is not an ontology of ge
    • 24. Microarray data shows changed expression of thousands of genes. How will you spot the patterns? [Figure: clustered gene expression for all genes (14010), with axis labels attacked, time, control; gene groups labeled: puparial adhesion, molting cycle, hemocyanin, defense response, immune response, response to stimulus, Toll regulated genes, JAK-STAT regulated genes, amino acid catabolism, lipid metabolism, peptidase activity, protein catabolism]
    • 25. Why is GO successful
      • built by bench biologists
      • multi-species, multi-disciplinary, open source
      • compare use of kilograms, meters, seconds in formulating experimental results
      • natural language and logical definitions for all terms
      • initially low-tech to ensure aggressive use and testing
    • 26. now used not just in biology but also in hospital research
    • 27. How will you spot the patterns? How will you find the data you need?
      • Lab / pathology data
      • EHR data
      • Clinical trial data
      • Family history data
      • Medical imaging
      • Microarray data
      • Model organism data
      • Flow cytometry
      • Mass spec
      • Genotype / SNP data
    • 28. over 11 million annotations relating UniProt, Ensembl and other databases to terms in the GO
    • 29. Hierarchical view representing relations between represented types
    • 30. ~ $200 mill. invested in the GO so far. A new kind of biomedical research
      • Over 11 million GO annotations to biomedical research literature freely available on the web
      • Powerful software tool support for navigating this data means that what used to take researchers months of data comparison effort can now be performed in milliseconds
    • 31. If controlled vocabularies are to serve to remove silos
      • they have to be respected by many owners of data as resources that ensure accurate description of their data
        – GO maintained not by computer scientists but by biologists
      • they have to be willingly used in annotations by many owners of data
      • they have to be maintained by persons who are trained in common principles of ontology maintenance
    • 32. The new profession of biocurator
    • 33. GO has been amazingly successful
      • Has created a community consensus
      • Has created a web of feedback loops where users of the GO can easily report errors and gaps
      • Has identified principles for successful ontology management
      • Indispensable to every drug company and every biology lab
    • 34. But GO is limited in its scope
      • it covers only generic biological entities of three sorts:
        – cellular components
        – molecular functions
        – biological processes
      • no diseases, symptoms, disease biomarkers, protein interactions, experimental processes …
    • 35. Extending the GO methodology to other domains of biology and medicine
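    • 36. OBO (Open Biomedical Ontology) Foundry proposal (Gene Ontology in yellow). Table arranged by relation to time (continuant, independent or dependent, versus occurrent) and by granularity:
      • Organ and organism. Independent continuant: Organism (NCBI Taxonomy), Anatomical Entity (FMA, CARO). Dependent continuant: Organ Function (FMP, CPRO), Phenotypic Quality (PaTO). Occurrent: Biological Process (GO)
      • Cell and cellular component. Independent continuant: Cell (CL), Cellular Component (FMA, GO). Dependent continuant: Cellular Function (GO)
      • Molecule. Independent continuant: Molecule (ChEBI, SO, RnaO, PrO). Dependent continuant: Molecular Function (GO). Occurrent: Molecular Process (GO)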
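    • 37. The strategy of orthogonal modules. Table arranged by relation to time (continuant, independent or dependent, versus occurrent) and by granularity:
      • Organ and organism. Independent continuant: Organism (NCBI Taxonomy), Anatomical Entity (FMA, CARO). Dependent continuant: Organ Function (FMP, CPRO), Phenotypic Quality (PaTO). Occurrent: Biological Process (GO)
      • Cell and cellular component. Independent continuant: Cell (CL), Cellular Component (FMA, GO). Dependent continuant: Cellular Function (GO)
      • Molecule. Independent continuant: Molecule (ChEBI, SO, RnaO, PrO). Dependent continuant: Molecular Function (GO). Occurrent: Molecular Process (GO)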
    • 38. Table columns: Ontology, Scope, URL, Custodians
      • Cell Ontology (CL). Scope: cell types from prokaryotes to […]. Custodians: […] Bard, Michael Ashburner, Oliver Hofman
      • Chemical Entities of Biological Interest (ChEBI). Scope: molecular entities. Custodians: […] Dematos, Rafael Alcantara
      • Common Anatomy Reference Ontology (CARO). Scope: anatomical structures in human and model organisms. URL: (under development). Custodians: Melissa Haendel, Terry Hayamizu, Cornelius Rosse, David Sutherland, […]
      • Foundational Model of Anatomy (FMA). Scope: structure of the human body. URL: fma.biostr.washington.edu. Custodians: JLV Mejino Jr., Cornelius Rosse
      • Functional Genomics Investigation Ontology (FuGO). Scope: design, protocol, data instrumentation, and […]. Custodians: FuGO Working Group
      • Gene Ontology (GO). Scope: cellular components, molecular functions, biological […]. Custodians: Gene Ontology Consortium
      • Phenotypic Quality Ontology (PaTO). Scope: qualities of […]. URL: […] detail.cgi?attribute_and_value. Custodians: Michael Ashburner, Suzanna Lewis, Georgios Gkoutos
      • Protein Ontology (PrO). Scope: protein types and modifications. URL: (under development). Custodians: Protein Ontology Consortium
      • Relation Ontology (RO). Scope: relations. Custodians: Barry Smith, Chris Mungall
      • RNA Ontology (RnaO). Scope: three-dimensional RNA structures. URL: (under development). Custodians: RNA Ontology Consortium
      • Sequence Ontology (SO). Scope: properties and features of nucleic […]. Custodians: Karen Eilbeck
    • 39. How to recreate the success of the GO in other areas
      1. create a portal for sharing of information about existing controlled vocabularies, needs and institutions operating in a given area
      2. create a library of ontologies in this area
      3. create a consortium of developers of these ontologies who agree to pool their efforts to create a single set of non-overlapping ontology modules – one ontology for each sub-area
    • 40. NextGen Ontology Portal [Diagram labels: Portal, Communities, Search, Ontology Library, NextGen Enterprise Ontology]
      • Two-Tiered Registry
        – NextGen Ontology – consists of vetted ontologies
        – Ontology Library – open to the wider community
      • Ontology Metadata
        – Ontology owner, domain, and location
      • Ontology Search*
        – Support ontology discovery
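One way to picture the two-tiered registry on this slide is as a single catalogue whose entries carry the listed metadata (owner, domain, location) plus a flag marking whether an ontology has been vetted. The Python sketch below is a hypothetical illustration only; the class names, fields, and example entries are assumptions and do not describe the actual NextGen portal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OntologyEntry:
    """Metadata for one registered ontology (field names are assumptions)."""
    name: str
    owner: str
    domain: str
    location: str
    vetted: bool = False  # True: vetted tier; False: open community library


class Registry:
    """A minimal two-tier registry with keyword search over name and domain."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        # Registering the same name again replaces the earlier entry.
        self._entries[entry.name] = entry

    def search(self, text, vetted_only=False):
        text = text.lower()
        hits = [
            e for e in self._entries.values()
            if (text in e.name.lower() or text in e.domain.lower())
            and (e.vetted or not vetted_only)
        ]
        return sorted(hits, key=lambda e: e.name)


registry = Registry()
registry.register(OntologyEntry("Airspace", "team-a", "aviation",
                                "http://example.org/airspace", vetted=True))
registry.register(OntologyEntry("Weather", "team-b", "aviation",
                                "http://example.org/weather"))

all_hits = [e.name for e in registry.search("aviation")]
vetted_hits = [e.name for e in registry.search("aviation", vetted_only=True)]
print(all_hits, vetted_hits)
```

Keeping both tiers in one catalogue means discovery works the same way for vetted and community ontologies, and promotion to the vetted tier is a change of one flag rather than a move between systems.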
    • 41. The OBO Foundry: a step-by-step, principles-based approach
      • Developers commit in advance to collaborating with developers of ontologies in adjacent domains and to working to ensure that, for each domain, there is community convergence on a single ontology
      • http://obofoundry.org
    • 42. OBO Foundry Principles
      • Common governance
      • Common training
      • Robust versioning
      • Common architecture
    • 43. OBO Foundry Modular Organization
      • top level: Basic Formal Ontology (BFO)
      • mid-level: Information Artifact Ontology (IAO), Ontology for Biomedical Investigations (OBI), Ontology of General Medical Science (OGMS)
      • domain level: Anatomy Ontology (FMA*, CARO), Environment Ontology (EnvO), Infectious Disease Ontology (IDO*), Biological Process Ontology (GO*), Cell Ontology (CL), Cellular Component Ontology (FMA*, GO*), Phenotypic Quality Ontology (PaTO), Subcellular Anatomy Ontology (SAO), Sequence Ontology (SO*), Molecular Function (GO*), Protein Ontology (PRO*)
    • 44. UCore 2.0 / UCore SL Extension Strategy: military domain ontologies as extensions of the Universal Core Semantic Layer [Diagram levels: top level, mid-level, domain level]
    • 45. Existing efforts to create modular ontology suites
      • NASA Sweet Ontologies
      • Military Intelligence Ontology Foundry
      • Planned OMG efforts:
        • OMG (CIA) Financial Event Ontology
        • Semantic Layer for ISO 20022 (Financial Industry Message Scheme)
    • 46. Example: Financial Securities Ontology, Mike Bennett (2007)
    • 47. Basic principles of ontology development
      – for formulating definitions
      – of modularity
      – of user feedback for error correction and gap identification
      – for ensuring compatibility between modules
      – for using ontologies to annotate legacy data
      – for using ontologies to create new data
      – for developing user-specific views
    • 48. Modularity designed to ensure
      • non-redundancy
      • annotations can be additive
      • division of labor among SMEs
      • lessons learned in one module can benefit work on other modules
      • transferrable training
      • motivation of SME users
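The non-redundancy goal above lends itself to a simple automated check: no term should be defined in more than one module. The Python sketch below shows one possible form of such a check; the module names and terms are invented for illustration and are not taken from any existing financial ontology.

```python
from collections import defaultdict


def find_overlaps(modules):
    """Return {term: sorted module names} for terms defined in more than one module."""
    owners = defaultdict(set)
    for module, terms in modules.items():
        for term in terms:
            owners[term.lower()].add(module)  # compare terms case-insensitively
    return {term: sorted(names) for term, names in owners.items() if len(names) > 1}


# Hypothetical modules; "option" is deliberately defined twice to trigger the check.
modules = {
    "instruments": {"bond", "equity", "option"},
    "parties": {"issuer", "counterparty"},
    "events": {"default", "coupon payment", "option"},
}

overlaps = find_overlaps(modules)
print(overlaps)
```

A matching check on labels alone is crude, since two modules can describe the same thing under different names; it is meant only to show where a governance process could hook in an automated first pass.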
    • 49. How should the FIMA Reference Data community solve this problem?
      • Major financial institutions, major software vendors, major data management companies, EDMC and government principals:
        – should pool information about the controlled vocabularies which already exist
        – create a common library of these controlled vocabularies
        – create a subset of thought leaders who agree to pool their efforts in the creation of a suite of ontology modules for common use
        – create a strategy to disseminate and evolve the selected modules
        – create a governance strategy to manage the modules over time
        – allow bad ontologies to die
    • 50. Urgent need for trained ontologists
      • Severe shortage of persons with the needed expertise
      • University at Buffalo Online Training and Certification Program for Ontologists
      • for details: