Enabling knowledge management in the Agronomic Domain
Pierre Larmande
Institute of Computational Biology (IBC)
Pierre.larmande@ird.fr
Research areas
•  Multi-scale omics integration
•  Big Data
•  Workflow management
•  Semantic web
Outline
•  Data integration challenges in the Life Sciences
•  Ontologies / Semantic Web technologies
•  Agronomic Linked Data project
•  NGS Big Data project
Data landscape in the Life Sciences
•  The availability of biological data has increased
•  Advancements in:
   •  computational biology
   •  genome sequencing
   •  high-throughput technologies
•  Integrative approaches are necessary to understand the functioning of biological systems
Data integration challenges
•  The lack of effective approaches to integrate data has created a gap between data and knowledge
•  An effective method is needed to bridge the gap between data and its underlying meaning
•  Harness the power of overlaying different data sets
Today’s Web
•  Today’s Web content is suitable for human consumption:
   •  a collection of documents
   •  links that establish connections between documents
•  It is low on data interoperability and lacks semantics
Ontologies
•  Ontologies are formal representations of knowledge: definitions of concepts, their attributes and the relations between them
•  Integrating data and improving machine interoperability and data analysis require a conceptual scaffold
•  Ontological terms used across databases provide cross-domain common entry points in the description
•  An array of ontologies is being used to bring structured integration of various datasets
•  The Open Biomedical Ontologies (OBO) initiative serves as an umbrella for well-structured, orthogonal ontologies
•  Ontologies are represented in OBO format and OWL
Semantic Web Technology
•  An extension of current Web technologies
•  Enables navigation and meaningful use of digital resources
•  Supports aggregation and integration of information from diverse sources
•  Based on common, standard formats
Resource Description Framework (RDF)
•  Framework for representing information about resources on the Web
•  Provides a labeled connection between two resources
•  Uses Uniform Resource Identifiers (URIs)
•  Statements take the form of triples:

RDF triple:
   Subject      Predicate      Object
   <Gene_A>     <codes_for>    <Protein_A>
•  Combining the triples results in a directed, labeled graph
[Figure: example graph with the nodes <Gene_A>, <Protein_A>, <BP_A>, <MF_A> and <Gene_X>, and the edge labels <has_function> and <regulates>]
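The idea of combining triples into a directed, labeled graph can be sketched in a few lines of Python. This is only an illustration: the `codes_for` triple comes from the slides, while the other two triples are a hypothetical wiring of the labels visible in the figure, and `build_graph` and `reachable` are names invented for this sketch.

```python
from collections import defaultdict

# Only the first triple appears on the slides; the other two are
# hypothetical wirings of the node and edge labels shown in the figure.
TRIPLES = [
    ("Gene_A", "codes_for", "Protein_A"),
    ("Protein_A", "has_function", "MF_A"),
    ("Gene_X", "regulates", "Gene_A"),
]


def build_graph(triples):
    """Map each subject to its outgoing (predicate, object) edges."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return graph


def reachable(graph, start):
    """Return every node reachable from `start` by following edge direction."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for _, target in graph.get(node, []):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


graph = build_graph(TRIPLES)
print(graph["Gene_A"])
print(sorted(reachable(graph, "Gene_X")))
```

Because each object can itself be the subject of other triples, following the edges from `Gene_X` reaches the protein and its function, which is the kind of chained lookup a graph representation makes cheap.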
  
SPARQL
•  A language for querying RDF models (graphs)
•  Powerful and flexible
•  Its syntax is similar to that of SQL
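To make the query model concrete, here is a toy matcher written in plain Python. It is not a SPARQL engine; it only mimics how a set of triple patterns with `?variables` is matched against a graph. The data, the `http://example.org/` prefix and the function names are assumptions made for this sketch. The query it emulates is shown as a string for comparison.

```python
# The SPARQL query this sketch emulates (prefix and names are illustrative).
SPARQL_QUERY = """
PREFIX : <http://example.org/>
SELECT ?gene ?protein WHERE {
  ?gene    :codes_for    ?protein .
  ?protein :has_function :MF_A .
}
"""

# Toy data set; only the first triple appears on the slides.
TRIPLES = [
    ("Gene_A", "codes_for", "Protein_A"),
    ("Gene_B", "codes_for", "Protein_B"),
    ("Protein_A", "has_function", "MF_A"),
]


def match(pattern, triples, binding):
    """Yield extensions of `binding` for which `pattern` matches a triple."""
    for triple in triples:
        candidate = dict(binding)
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable must keep the value it was first bound to.
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield candidate


def select(patterns, triples):
    """Join the patterns one after another, keeping consistent bindings."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [ext for b in solutions for ext in match(pattern, triples, b)]
    return solutions


rows = select(
    [("?gene", "codes_for", "?protein"), ("?protein", "has_function", "MF_A")],
    TRIPLES,
)
print(rows)
```

The shared variable `?protein` plays the role a join column plays in SQL, which is one reason the two syntaxes feel similar.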
Agronomic Linked Data (AgroLD)
•  A Semantic Web-based system that captures knowledge pertaining to plant data
•  Aims:
   •  ability to answer complex real-life questions
   •  efficient information integration and retrieval
   •  easy extensibility
AgroLD – Phase I
•  AgroLD will be developed in phases
•  Phase I includes data on Oryza spp. and Arabidopsis thaliana
•  SPARQL endpoint: http://bioportal.lirmm.fr:8081/test/
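A SPARQL endpoint is normally queried over HTTP, with the query text passed in a `query` parameter as described by the SPARQL Protocol. The sketch below only builds such a request URL and sends nothing. The endpoint address is the one given on the slide and may no longer be reachable; whether it accepts requests at exactly that path is an assumption, and the query and helper names are illustrative.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Address taken from the slide; it may no longer be online.
ENDPOINT = "http://bioportal.lirmm.fr:8081/test/"

# Generic query that lists ten arbitrary triples (illustrative only).
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"


def sparql_get_url(endpoint, query):
    """Build a GET URL carrying the query in the `query` parameter."""
    return endpoint + "?" + urlencode({"query": query})


def query_from_url(url):
    """Recover the query text from a URL built by sparql_get_url."""
    return parse_qs(urlsplit(url).query)["query"][0]


url = sparql_get_url(ENDPOINT, QUERY)
print(url)
```

Keeping the URL construction separate from any network call makes it easy to inspect the encoded query before pointing a client at a live endpoint.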
Information in AgroLD
•  Integrates information from:
   •  Ontologies: Gene Ontology, Plant Ontology, Plant Trait Ontology, Plant Environment Ontology, Crop Ontology
   •  Other information sources:
      •  Gene/protein information: GOA, Gramene, TAIR, OryzaBase, UniProt
      •  QTL information: Gramene
      •  Pathway information: RiceCyc, AraCyc
      •  Germplasm information: Oryza Tag Line
      •  Homology prediction: GreenPhyl
Knowledge representation in AgroLD
[Figure: schema linking the entity types Gene, Target Gene, Protein, Protein Modification, Taxon, Pathway, Interaction, Biological Process, Cellular Component and Molecular Function through the relations has_participant, contains, has_function, has_agent, acts_on, is_member_of, codes_for, occurs_in and has_source]
Future directions
•  Phase II: wider and deeper coverage to promote comparative analysis
•  Include varied data types: gene expression data, protein-protein and transcription factor-target gene interactions
•  Develop methods to aid hypothesis generation, e.g. inference rules
•  Make AgroLD pluggable into workflow platforms, e.g. Galaxy, OpenAlea (VirtualPlants)
•  Engage with biologists to mobilise ‘user-pull’:
   •  Develop real-world use cases, e.g. studying the molecular mechanism of panicle differentiation in rice
Gigwa: Genotype Investigator for Genome-Wide Analyses
Aims of Gigwa
•  Manage genomic, transcriptomic and genotyping data resulting from NGS analyses
•  Handle large VCF files to filter, query and extract data
•  Provide a web user interface to make the system accessible to all
Main features
o  Based on NoSQL technology
o  Handles VCF (Variant Call Format) files and annotations
o  Supports multiple variant types: SNPs, InDels, SSRs, SVs
o  Powerful genotyping queries
o  Easily scalable with MongoDB sharding
o  Transparent access
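As a rough illustration of what filtering and querying VCF data involves, the following Python sketch parses a tiny in-memory VCF and searches it by reference sequence and by individual. It is not Gigwa's implementation, which stores data in MongoDB; the sample records, the individual names `ind1` and `ind2`, the function names and the simplified SNP/INDEL classification are all assumptions made for this example.

```python
# Tiny made-up VCF; columns are tab-separated as in real VCF files.
VCF_TEXT = """\
##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tind1\tind2
chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\t1/1
chr1\t200\t.\tAT\tA\t30\tPASS\t.\tGT\t0/0\t0/1
chr2\t300\t.\tC\tT\t99\tPASS\t.\tGT\t0/0\t0/0
"""


def variant_type(ref, alts):
    """Simplified typing: single-base alleles are SNPs, anything else INDEL."""
    return "SNP" if len(ref) == 1 and all(len(a) == 1 for a in alts) else "INDEL"


def parse_vcf(text):
    """Yield one dict per variant line, with genotypes keyed by individual."""
    samples = []
    for line in text.splitlines():
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        chrom, pos, _id, ref, alt = fields[:5]
        gt_index = fields[8].split(":").index("GT")
        genotypes = {
            name: value.split(":")[gt_index]
            for name, value in zip(samples, fields[9:])
        }
        alts = alt.split(",")
        yield {"chrom": chrom, "pos": int(pos), "ref": ref, "alt": alts,
               "type": variant_type(ref, alts), "genotypes": genotypes}


def carries_alt(genotype):
    """True if the genotype contains at least one non-reference allele."""
    alleles = genotype.replace("|", "/").split("/")
    return any(a not in ("0", ".") for a in alleles)


def search(variants, chrom, individuals):
    """Keep variants on `chrom` carried by any of the given individuals."""
    return [
        v for v in variants
        if v["chrom"] == chrom
        and any(carries_alt(v["genotypes"][name]) for name in individuals)
    ]


hits = search(parse_vcf(VCF_TEXT), "chr1", ["ind1"])
print([(v["pos"], v["type"]) for v in hits])
```

A linear scan like this works for small files only; handling large VCF files is precisely where an indexed, shardable store becomes useful.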
  
Demo – Database selection
Multi-database with restricted access
Demo – Data type selection
Several types of selection: variant search is restricted to the selected reference sequences and subset of individuals
Demo – Queries
Predefined queries
Demo – Filter selection
Several filters
Demo – Browsing results
Acknowledgements
Dr. Patrick Valduriez
Dr. Clement Jonquet
Dr. Manuel Ruiz
Guilhem Sempéré
Alexis Dereeper
Gautier Sarah
Dr. Aravind Venkatesan
Florian Philippe
Contact: pierre.larmande@ird.fr
