The document provides information about the Experimental Factor Ontology (EFO). EFO models experimental factors from genomic studies stored in the ArrayExpress archive, including species, diseases, and cell lines. It captures about 30% of terms not already in the UMLS. EFO uses reference ontologies and automatic mapping to import synonyms and definitions. Regression testing verifies ontology changes. EFO has a web interface and content negotiation support, and defines experimental factor hierarchies used in the Gene Expression Atlas to aggregate experiments.
Unifying ontology services for functional genomic annotations
1. Unifying ontology services for functional genomic annotations
Tomasz Adamusiak MD PhD
7omasz
Postdoc at LHC CgSB since 10/2011
EBI is an Outstation of the European Molecular Biology Laboratory.
2. The European Molecular Biology Laboratory, a “European NIH” for molecular biology
• Heidelberg: basic research in molecular biology, administration, EMBO
• Hamburg: structural biology
• Hinxton: bioinformatics
• Grenoble: structural biology
• Monterotondo: mouse biology
• 1500 staff
• >60 nationalities
4. Focus on providing database services to bioinformatics community
• Literature and ontologies: CiteXplore, GO
• Genomes: Ensembl, Ensembl Genomes, EGA
• Nucleotide sequence: ENA
• Functional genomics: ArrayExpress, Expression Atlas, EFO
• Protein families, motifs and domains: InterPro
• Macromolecular: PDBe
• Protein activity: IntAct, PRIDE
• Pathways: Reactome
• Protein sequences: UniProt
• Chemical entities: ChEBI
• Systems: BioModels
• BioSamples
• Chemogenomics: ChEMBL
5. ArrayExpress is the 2nd largest resource for public transcriptomics data (CIBEX < AE < GEO)
[Diagram: the annotations ‘blood cancer’, ‘hematological neoplasm’, ‘haematological neoplasm’, ‘lymphoma/leukemia’, ‘leukaemia’ and ‘haematological cancer’ are linked to EFO: lymphoid neoplasm]
Archive: 25k exps; Atlas: 2.6k exps
6. Experimental Factor Ontology (EFO)
• Modelling experimental factors currently in Archive: species, diseases, cell lines, etc.
• Capture ~30% not in UMLS
• Determined by Atlas, Ensembl, external requests (Upenn) and EBI site-wide search
7. Developed a process to automatically import metadata from reference ontologies and validate changes
[Chart: number of classes or synonyms (0 to 20000) over time (Aug-08 to Aug-11); series: SYNONYMS, CLASSES]
9. Did not evaluate Norm in this context
• Production requirements (Perl, OWL)
• Improvement (ngrams) over legacy code
• Primary use case mapping EFO against AE annotations:
• 2'-deoxy-5-azacytidine to 5-aza-2'-deoxycytidine CHEBI:50131
• Barrett's Esophagus to Barrett's esophagus
• Difficult to use MetaMap on non-UMLS ontologies
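The n-gram idea above can be illustrated with a minimal sketch. It assumes character trigrams and a Dice coefficient; the slides do not say which n-gram measure the production Perl code used, so the function names and the scoring here are illustrative only.

```python
def ngrams(text, n=3):
    """Return the set of character n-grams of a lower-cased, space-padded string."""
    padded = f"{' ' * (n - 1)}{text.lower()}{' ' * (n - 1)}"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def dice_similarity(a, b, n=3):
    """Dice coefficient over character n-gram sets (1.0 means identical sets)."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    if not grams_a and not grams_b:
        return 1.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


def best_match(annotation, labels, n=3):
    """Return (label, score) for the ontology label most similar to the annotation."""
    scored = ((label, dice_similarity(annotation, label, n)) for label in labels)
    return max(scored, key=lambda pair: pair[1])


# Labels taken from the examples on the slide plus one unrelated term.
labels = [
    "Barrett's esophagus",
    "5-aza-2'-deoxycytidine",
    "acute lymphoblastic leukemia",
]
print(best_match("Barrett's Esophagus", labels))
print(best_match("2'-deoxy-5-azacytidine", labels))
```

Case differences disappear after lower-casing, and reordered chemical name fragments still share most of their trigrams, which is why both slide examples resolve to the intended label in this sketch.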
10. Step 2: definitions and synonyms are pulled in from reference ontologies via NCBO BioPortal
acute lymphoblastic leukemia
http://www.ebi.ac.uk/efo/EFO_0000220
STEP 1: translate IDs
xref: SNOMEDCT:91857003
xref: DOID:9952
xref: NCIt:C3167 (shown twice in the diagram)
STEP 2: fetch + provenance
synonym: Acute lymphoid leukaemia, disease
definition: Leukemia with an acute onset [...]
bioportal_provenance: Acute Lymphocytic Leukaemia [accessedResource: NCIt:C3167] [accessDate: 05-04-2011]
bioportal_provenance: Leukemia with an acute onset [...] [accessedResource: NCIt:C3167] [accessDate: 05-04-2011]
11. Step 3: regression testing package produces a report for manual verification of the import
• 13 different tests
• Shared xrefs, e.g. NCIt:C17459 (Hispanic or Latino)
• Hispanic (EFO_0003169)
• Latino (EFO_0003166)
• Shared synonyms, e.g. head kidney (ZFA:0000669)
• pronephros (EFO_0000927)
• bone marrow (EFO_0000868)
• Changes in external sources (11/2010 vs. 5/2010):
• synonym Spinocerebellar Ataxias (EFO_0002624) no longer in DOID:1441
• definition Organ with organ cavity which connects the cavity of the urinary bladder to the exterior. […] (EFO_0000931) no longer in FMAID:1966
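Two of the checks named above (shared xrefs or synonyms, and values that disappeared from an external source) can be sketched as follows. The dictionary layout and the function names are assumptions made for illustration; the slides do not describe how the actual 13 tests are implemented.

```python
from collections import defaultdict


def find_shared(classes, field):
    """Map each value of `field` (e.g. 'xrefs') to the classes that share it."""
    owners = defaultdict(set)
    for class_id, annotations in classes.items():
        for value in annotations.get(field, ()):
            owners[value].add(class_id)
    # Only values attached to more than one class are worth reporting.
    return {value: sorted(ids) for value, ids in owners.items() if len(ids) > 1}


def find_dropped(previous, current):
    """Return values present in an earlier snapshot of a source but missing now."""
    return sorted(set(previous) - set(current))


# Hypothetical in-memory view of four EFO classes, using the slide's examples.
efo = {
    "EFO_0003169": {"label": "Hispanic", "xrefs": ["NCIt:C17459"]},
    "EFO_0003166": {"label": "Latino", "xrefs": ["NCIt:C17459"]},
    "EFO_0000927": {"label": "pronephros", "synonyms": ["head kidney"]},
    "EFO_0000868": {"label": "bone marrow", "synonyms": ["head kidney"]},
}
print(find_shared(efo, "xrefs"))
print(find_shared(efo, "synonyms"))
```

A report built from such checks lists candidates only; as the slide says, the final decision is left to manual verification.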
12. EFO has a unique XSLT-based web presence
http://www.ebi.ac.uk/efo/overview
13. EFO URIs are readable by humans and computers
14. Content negotiation is an alternative approach
Tuckey’s server-side UrlRewriteFilter
<rule>
  <condition name="Accept" type="header">application/rdf+xml</condition>
  <from>^/$</from>
  <to type="redirect">/efo/efo.owl</to>
</rule>
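The decision that the rewrite rule encodes can be sketched in a few lines of Python. This is an illustration of the logic only, not how UrlRewriteFilter evaluates conditions internally; the function name and the exact media-type matching are assumptions.

```python
def negotiate(path, accept_header):
    """Return a redirect target for RDF clients, or None to serve the HTML page."""
    # Drop parameters such as q-values: "application/rdf+xml;q=0.9" -> "application/rdf+xml"
    media_types = {
        part.split(";")[0].strip().lower()
        for part in accept_header.split(",")
    }
    # Mirrors the rule: only the root path is redirected, and only for RDF/XML clients.
    if path == "/" and "application/rdf+xml" in media_types:
        return "/efo/efo.owl"
    return None


print(negotiate("/", "application/rdf+xml"))
print(negotiate("/", "text/html,application/xhtml+xml"))
```

A browser sending `text/html` gets the human-readable page, while a client asking for `application/rdf+xml` at the same URI is redirected to the OWL file.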
15. The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries (W3C)
If you want to put something on the web there are three rules:
1. All kinds of conceptual things, they have names now that start with HTTP.
2. If I take one of these HTTP names and I look it up [...] I fetch the data using the HTTP protocol from the web, I will get back some data in a standard format
3. It's got relationships [...] the other thing that it's related to is given one of those names that starts HTTP. So, I can go ahead and look that thing up.
Sir Tim Berners-Lee on the next Web (TED2009)
16. RDF triple is the core concept underpinning the semantic web
subject: <http://www.example.com/index.html>
predicate: <http://purl.org/dc/elements/1.1/creator>
object: "John Smith"
[Diagram: example:index.html --dc:creator--> John Smith]
Entity Attribute Value (EAV) model with well-defined semantics
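As a small illustration, the triple on this slide can be held in a three-field record and written out as one N-Triples line. The `Triple` class and `to_ntriples` helper are hypothetical names for this sketch, which handles only URI subjects and predicates with a plain string literal as the object.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # URI of the thing being described
    predicate: str  # URI of the attribute
    obj: str        # plain string literal in this sketch


def to_ntriples(triple):
    """Serialise a triple with a literal object as a single N-Triples line."""
    escaped = triple.obj.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{triple.subject}> <{triple.predicate}> "{escaped}" .'


creator = Triple(
    "http://www.example.com/index.html",
    "http://purl.org/dc/elements/1.1/creator",
    "John Smith",
)
print(to_ntriples(creator))
```

The three fields correspond directly to entity, attribute and value in the EAV reading, with the URIs supplying the shared semantics.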
17. Open linked data lacks central URI reconciliation
• Responsibility for URIs:
http://bio2rdf.org/mesh:68009154
http://bio2rdf.org/pubmed:11992264
http://bio2rdf.org/go:0016458
http://purl.org/obo/owl/GO#GO_0016458
• Versioning:
http://sig.uw.edu/fma#Anatomical_entity (FMA 3.1)
http://sig.biostr.washington.edu/fma3.0#Anatomical_entity (FMA 3.0)
http://purl.obolibrary.org/obo/GO_0016458 (Foundry-compliant URI)
• Requires institutional support
• Would be great to have public UMLS in RDF
19. There is no single ontology resource that covers all the use cases
Local ontologies in OWL/OBO
NCBO BioPortal
EBI Ontology Lookup Service
...and no huffing and puffing will blow all of them down...
Leonard Leslie Brooke (1904)
20. EBI Ontology Lookup Service
• 82 ontologies
• OBO ontologies
• SOAP web services/Java client
• First out there
Cote RG, Jones P, Apweiler R, Hermjakob H. The ontology lookup service, a lightweight cross-platform tool for controlled vocabulary queries. BMC Bioinformatics. 2006 Feb 28;7(1):97
21. NCBO BioPortal
• 267 ontologies and growing
• Both OWL and OBO
• REST web services
• Rich in functionality
Noy, N.F., Shah, N.H., Whetzel, P.L., Dai, B., Dorf, M., Griffith, N., Jonquet, C., Rubin, D.L., Storey, M.A., Chute, C.G., Musen, M.A. BioPortal: ontologies and integrated data resources at the click of a mouse. Nucleic Acids Res. 2009 Jul 1;37(Web Server issue):W170-3.
23. OWL API
• Reference implementation for manipulating and serialising OWL2
• Multiple parsers (incl. OBO)
• Reasoner interfaces
• Low level access
Sean Bechhofer, Phillip Lord, Raphael Volz. Cooking the Semantic Web with the OWL API. 2nd International Semantic Web Conference, ISWC, Sanibel Island, Florida, October 2003
24. We wanted to annotate data with ontology terms within the MOLGENIS framework – ontology browser
[Diagram: OWL API, EFO, Bioportal, Import, Ontology Browser]
26. A simple facade to ontology resources providing a set of functions most common to ontology APIs (e.g. HL7 CTS2, UMLS API) under a single interface
http://www.ontocat.org
Resources: BioPortal, EBI OLS, OWL, OBO, ?
Functions: searchAll(), searchOntology(), getChildren(), getParents(), getSynonyms(), getDefinitions(), getAllParents(), getAllChildren(), getRelations(), ...
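The facade idea can be sketched in Python as one interface with interchangeable backends. This is not the OntoCAT API: the class names, the snake_case method names (loosely modelled on searchAll() and getParents()) and the EX: identifiers are all invented for illustration.

```python
from abc import ABC, abstractmethod


class OntologyService(ABC):
    """Common interface that every ontology resource is wrapped in."""

    @abstractmethod
    def search_all(self, query):
        """Return IDs of terms whose label contains the query."""

    @abstractmethod
    def get_parents(self, term_id):
        """Return IDs of the direct parents of a term."""


class InMemoryService(OntologyService):
    """Hypothetical backend holding labels and parent links in dictionaries."""

    def __init__(self, labels, parents):
        self.labels = labels
        self.parents = parents

    def search_all(self, query):
        needle = query.lower()
        return sorted(t for t, label in self.labels.items() if needle in label.lower())

    def get_parents(self, term_id):
        return list(self.parents.get(term_id, []))


class CompositeService(OntologyService):
    """Facade that queries several backends through the same interface."""

    def __init__(self, *services):
        self.services = services

    def search_all(self, query):
        hits = []
        for service in self.services:
            hits.extend(service.search_all(query))
        return hits

    def get_parents(self, term_id):
        # The first backend that knows the term answers.
        for service in self.services:
            parents = service.get_parents(term_id)
            if parents:
                return parents
        return []


# Toy data: the EX: identifiers are invented for illustration.
local = InMemoryService(
    {"EX:1": "leukemia", "EX:2": "acute lymphoblastic leukemia"},
    {"EX:2": ["EX:1"]},
)
remote = InMemoryService({"EX:9": "Leukemia, acute"}, {})
facade = CompositeService(local, remote)
print(facade.search_all("leukemia"))
print(facade.get_parents("EX:2"))
```

Calling code sees only `OntologyService`, so a local OWL file and a remote web service can be swapped or combined without changes to the caller.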
27. There are many ways you could use OntoCAT
• Store data and annotate with ontology terms
• OntoCAT database and browser
• Work with ontologies in R
• Bioconductor ontocat R package
• Integrate a number of ontologies in a local repository
• OntoCAT REST server
• Add ontology support to your GWT web application
• OntoCAT GoogleApp
http://www.ontocat.org/wiki/OntocatDownload
29. Developed for internal and external use cases
Example 11@ontocat.org
• Automatically obtain CUIs from UMLS sources for
extracted terms via BioPortal
• Shamim Mollah, Bleeding History Phenotype Ontology,
Rockefeller University Center for Clinical and
Translational Science, New York, NY
1. Get all terms from BHP
2. Search for corresponding UMLS terms (also MetaMap)
3. Obtain CUIs for mapped terms through BioPortal
31. Use case – explore beyond subsumption
Example 16@ontocat.org
• Requested by reviewer for partonomy in GO
• Easy in OBO, hard in OWL
• Computationally intensive (starting from the root node):
  1. classify all children of inverse_relation some class
  2. repeat step 1 on all new nodes
  3. finish when all nodes have been seen
• OWL API is not thread safe
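The three steps above amount to a breadth-first walk over the inverse of a relation. The sketch below is a simplified Python illustration under stated assumptions: the reasoner query "inverse_relation some class" is replaced by a plain dictionary lookup, the function name explore_partonomy is hypothetical, and the example edges are invented for the demonstration rather than taken from GO.

```python
from collections import deque


def explore_partonomy(root, part_of):
    """Breadth-first walk from `root` over the inverse of part_of.

    `part_of` maps each term to the terms it is part of. A real
    implementation would ask a reasoner for the children of
    "inverse_relation some class" instead of reading a dictionary.
    """
    # Invert the relation once: whole -> list of its direct parts.
    has_part = {}
    for part, wholes in part_of.items():
        for whole in wholes:
            has_part.setdefault(whole, []).append(part)

    seen = [root]
    queue = deque([root])
    while queue:                      # step 3: stop once nothing new remains
        node = queue.popleft()
        for child in sorted(has_part.get(node, [])):   # step 1
            if child not in seen:
                seen.append(child)
                queue.append(child)   # step 2: repeat on the new node
    return seen
```

Tracking the seen nodes is what guarantees termination, including on cyclic input. The cost noted on the slide comes from step 1, which in the OWL setting is a reasoner call per node rather than a dictionary lookup.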
32. Reasoning is fundamental to exploring the
hierarchies of more expressive ontologies
[Diagram: Heart, Heart Component, Left Heart and Mitral Valve, connected by partOf and is_a relations]
33. When ontologies classify as inconsistent it is often not
obvious why (Open World Assumption)
• Mary is_a CitizenOfFrance
Is Paul a citizen of France?
Closed World, e.g. SQL databases: NO
Ontologies: ?
• OWL is more expressive: classes, individuals, closure
axioms, value partitions, cardinality restrictions, property
chains; disjoint, reflexive, irreflexive, symmetric and
anti-symmetric, inverse or transitive properties
• Explanation in OWL
(http://owl.cs.manchester.ac.uk/explanation/)
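The Mary and Paul example can be made concrete with a small Python sketch. It is only an illustration of the two query semantics: the function names are hypothetical, and facts are looked up in plain sets, whereas an OWL reasoner would derive entailments from axioms.

```python
def closed_world_answer(facts, statement):
    """Closed world (e.g. an SQL database): anything not stated is false."""
    return statement in facts


def open_world_answer(facts, negated_facts, statement):
    """Open world: False only if the negation is known, otherwise unknown (None)."""
    if statement in facts:
        return True
    if statement in negated_facts:
        return False
    return None


facts = {("Mary", "is_a", "CitizenOfFrance")}
question = ("Paul", "is_a", "CitizenOfFrance")

print(closed_world_answer(facts, question))        # False
print(open_world_answer(facts, set(), question))   # None, i.e. unknown
```

The third outcome is the point: under the Open World Assumption an ontology answers "no" only when something, such as a disjointness or closure axiom, rules the statement out.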
34. The extra information is used in QC of EFO, but
not in query expansion
[Diagram: node labels include cardiomyopathy, atrial fibrillation, myocardium, atrial myocardium, cardiac ventricle, atrium and heart, plus a label beginning "ventricular"; edge labels are subClassOf, part_of and has_disease_location]
Heart disease?
38. EFO inferred is_a hierarchy defines how experiments are
aggregated in Atlas for re-analysis
http://www.ebi.ac.uk/gxa
39. It is possible to infer diseases of heart computationally
rather than asserting this information directly
[Diagram: node labels include cardiomyopathy, atrial fibrillation, myocardium, atrial myocardium, cardiac ventricle, atrium and heart, plus a label beginning "ventricular"; edge labels are subClassOf, part_of and has_disease_location]
heart disease ≡ has_disease_location ∃ (heart ∪ part_of ∃ heart)
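The class definition on this slide can be approximated procedurally: a disease counts as a heart disease if it, or one of its superclasses, has a disease location that is the heart or is transitively part of the heart. The Python sketch below rests on several assumptions: the function names are hypothetical, part_of is treated as transitive, the edges are one assumed reading of the diagram, and the hepatitis/liver pair is a control added for illustration. A reasoner would reach the same result from the OWL axioms rather than from dictionaries.

```python
def closure(term, edges):
    """All terms reachable from `term` by following `edges` one or more times."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        for nxt in edges.get(current, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def is_disease_of(disease, organ, sub_class_of, location, part_of):
    """True if the disease (or a superclass) is located in `organ` or a part of it."""
    classes = {disease} | closure(disease, sub_class_of)
    for cls in classes:
        for site in location.get(cls, ()):
            if site == organ or organ in closure(site, part_of):
                return True
    return False


# Assumed edges, loosely following the diagram.
sub_class_of = {"ventricular cardiomyopathy": ["cardiomyopathy"]}
location = {
    "cardiomyopathy": ["myocardium"],
    "atrial fibrillation": ["atrium"],
    "hepatitis": ["liver"],          # control case, not on the slide
}
part_of = {
    "myocardium": ["heart"],
    "atrial myocardium": ["myocardium"],
    "cardiac ventricle": ["heart"],
    "atrium": ["heart"],
}

diseases = ["atrial fibrillation", "cardiomyopathy", "hepatitis", "ventricular cardiomyopathy"]
heart_diseases = [
    d for d in diseases if is_disease_of(d, "heart", sub_class_of, location, part_of)
]
print(heart_diseases)
```

None of the three heart diseases is asserted to be one; each is found through its location, which is the advantage of inferring the grouping over maintaining it by hand.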
40. One RDF graph per experiment accession
Context-specific gene expression is grouped with blank nodes
[Figure: RDF graph for experiment accession gxa:E-AFMX-1 (E-AFMX-1); the legend distinguishes experiment accessions from predicates. Terms shown with identifiers: organism (OBI_0100026), is_about (IAO_0000136), gene (EFO_0002606), experimental factor (EFO_0000001), discretized differential expression (EFO_0004034), p value (OBI_0000175). Other labels: Homo sapiens, liver, efo:EFO_0004033, rdf:type, PRDX2 (ensembl:ENSG00000167815), NONDE, 1.0E30]
W3C Note on RDF Approach to Gene Expression Data (in progress)
Semantic Web for Health Care and Life Sciences Interest Group, BioRDF task force
41. One RDF graph per experiment accession
Context-specific gene expression is grouped with blank nodes
[Figure: the same RDF graph for experiment accession gxa:E-AFMX-1 (E-AFMX-1), extended with a second blank-node group. Terms shown with identifiers: organism (OBI_0100026), is_about (IAO_0000136), gene (EFO_0002606), experimental factor (EFO_0000001), discretized differential expression (EFO_0004034), p value (OBI_0000175). Other labels: Homo sapiens, liver, approximately 14 weeks, efo:EFO_0004033, rdf:type, PRDX2 (ensembl:ENSG00000167815), NONDE (twice), 1.0E30 (twice)]
W3C Note on RDF Approach to Gene Expression Data (in progress)
Semantic Web for Health Care and Life Sciences Interest Group, BioRDF task force
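To make the grouping idea concrete, here is a minimal Python sketch that builds one list of triples per experiment accession and uses a fresh blank node to tie together the values belonging to one observation. It is an illustration only: the ex: predicate names, the function names and the row layout are hypothetical placeholders, not the vocabulary of the W3C note, and the pairing of values in the example rows is assumed from the labels on the slide.

```python
from itertools import count


def observation_triples(row, blank_ids):
    """Triples for one observation, grouped under a fresh blank node."""
    node = f"_:obs{next(blank_ids)}"
    return [
        (row["accession"], "ex:hasObservation", node),   # ex: names are placeholders
        (node, "ex:gene", row["gene"]),
        (node, "ex:factorValue", row["factor_value"]),
        (node, "ex:expression", row["expression"]),
        (node, "ex:pValue", row["p_value"]),
    ]


def graphs_by_accession(rows):
    """Return {accession: [triples]}, i.e. one graph per experiment accession."""
    blank_ids = count(1)
    graphs = {}
    for row in rows:
        graphs.setdefault(row["accession"], []).extend(
            observation_triples(row, blank_ids)
        )
    return graphs


rows = [
    {"accession": "gxa:E-AFMX-1", "gene": "ensembl:ENSG00000167815",
     "factor_value": "liver", "expression": "NONDE", "p_value": "1.0E30"},
    {"accession": "gxa:E-AFMX-1", "gene": "ensembl:ENSG00000167815",
     "factor_value": "approximately 14 weeks", "expression": "NONDE", "p_value": "1.0E30"},
]
graphs = graphs_by_accession(rows)
```

Without the blank node, the two NONDE values and the two p values for the same gene could not be told apart; the blank node keeps each value attached to its own experimental context.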
43. Semantic Web is unlikely to take over the web,
but has the potential to unify all of bioinformatics
OntoCAT
EFO
Semantic
Atlas
http://gigaom.com/broadband/the-storage-vs-bandwidth-debate/
44. Acknowledgments
• Morris A. Swertz’s group at the Genomics Coordination Center (GCC),
University of Groningen
• K Joeri van der Velde
• Despoina Antonakaki
• Dasha Zhernakova
• James Malone
• Helen Parkinson
• FuzzyRecogniser: Emma Hastings
• Niran Abeygunawardena
• Ele Holloway
• Tim Rayner
• Zooma: Tony Burdett
• Bioconductor/R package: Natalja Kurbatova, Pavel Kurnosov, Misha
Kapushesky
This work was supported by the European Community's Seventh Framework
Programmes GEN2PHEN [grant number 200754], SLING [grant number 226073], and
SYBARIS [grant number 242220], the European Molecular Biology Laboratory, the
Netherlands Organisation for Scientific Research [NWO/Rubicon grant number
825.09.008], and the Netherlands Bioinformatics Centre [BioAssist/Biobanking
platform and BioRange grant SP1.2.3]
OntoCAT logo courtesy of Eamonn Maguire
Special thanks go to NCBO BioPortal and EBI OLS support teams for all the
comprehensive help they provide