Semantic Web Applications and Tools for Life Sciences
November 2008

Genome and Proteome data integration in RDF
Nadia Anwar, Ela Hunt, Walter Kolch and Andy Pitt
[Figure: Genome, Transcripts, Proteins, Metabolites; Data to Discovery]
Outline
• Data Integration in Bioinformatics.


• Semantic data integration


• Francisella


• Integrating genome annotations with experimental proteomics data in RDF


• Further work
Data Integration is not a solved problem
Information discovery is not Integrated

[Figure: separate data silos, each with its own LIMS]
• High TP Sequencing → Genomics: Sequence, ORF Prediction, Genome Comparisons
• Microarray experiments, Computational analysis → Gene Expression: Transcript Profile, Transcript Abundance
• Proteomics experiments, Computational analysis → Proteomics: Peptide Profiles, Peptide Abundance, Protein Identification, Protein Interactions, PT-Modifications
• Systems Biology: Synthetic Networks/Pathways, Predictions
• Metabolomics
• Bottom row: Genome, Regulatory Networks, Metabolic Pathways, Translational Medicine
Semantic Data Integration across omes data silos




[Figure: Data → Genes, Transcripts, Peptides, Metabolites, Genotype → Information; Data → Discovery]
Proof of concept
Francisella tularensis




[Images: ulceroglandular tularaemia, respiratory tularaemia, oculoglandular tularaemia]
Bioterrorism
• Francisella tularensis is a very successful intracellular pathogen that causes
  severe disease (respiratory tularaemia is the most acute form of the disease)
• low infectious dose (10-50 bacteria, compared with anthrax, which requires
  8,000-15,000 spores)
• weaponisation fears
Data sources
Genome
RDF
  http://img.jgi.doe.gov/cgi-bin/pub/main.cgi?section=TaxonDetail&page=taxonDetail&taxon_oid=639633024#export




[Figure: RDF graph for IMG gene (4)IMG:gene_oid=639752258]
  (1)RDF:type                        RDF:description
  (2)RDFS:comment                    TPR
  (3)IMG_S:locus_tag                 FTN_0209
  (3)IMG_S:genomic_location_start    229107
  (3)IMG_S:genomic_location_end      229976
  (3)IMG_S:genomic_location_strand   +
Data sources
Genome annotations


[Figure: Francisella SuperFamily Data, from http://supfam.cs.bris.ac.uk/]
Subject: http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=protein&id=118496616
  RDF#type                                      RDF:description
  http://purl.uniprot.org/core/Protein_Family   SUPERFAMILY:cgi-bin/model.cgi?model=0040419
  SUPERFAMILY:Assignment_Region                 155-367
  SUPERFAMILY:Score                             5.1e-39
  SUPERFAMILY:SCOP_ID                           SUPERFAMILY:cgi-bin/scop.cgi?sunid=52540
  SUPERFAMILY:SCOP_Fold                         P-loop containing nucleoside triphosphate hydrolases
  SUPERFAMILY:Family_ID                         81269
  SUPERFAMILY:Evalue                            7.33e-06
  SUPERFAMILY:Family_Description                Extended AAA-ATPase domain
  SUPERFAMILY:Similar_Structure                 1l8q A:77-289
Data sources
Genome annotations - KEGG

[Figure: KEGG annotation graph]
Subject: http://www.genome.jp/dbget-bin/www_bget?ftn:FTN_0298
  http://img.jgi.doe.gov/schema#gene_name   glpX
  rdfs:comment                              fructose
  rdfs:seeAlso                              http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-e+[EC:3.1.3.11]
  rdfs:seeAlso                              http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-e+[SP:A0Q4N9_FRATN]
Other nodes in the figure:
  http://img.jgi.doe.gov/schema#gene
  http://www.genome.jp/dbget-bin/www_bget?pathway+ftn00010
  http://www.genome.jp/dbget-bin/www_bfind?F.tularensis_U112

Genome annotations - NCBI protein
[Figure: NCBI protein annotation graph]
Subject: http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=protein&id=118496616
  RDF:type                              RDF:description
  RDF:idsymbol                          YP_897666.1
  RDFS:#seeAlso                         http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?[refseqp-SeqVersion:YP_897666.1]+-e
  http://purl.uniprot.org/Annotation/   chromosomal
Other node in the figure:
  http://www.ncbi.nlm.nih.gov/sites/gquery?term=Francisella+tularensis+novicida
Data sources
Genome annotations - GO
[Figure: GO annotation graph]
Subject: http://www.genome.jp/dbget-bin/www_bget?ftn:FTN_0277
  RDF:type                                                 RDF:description
  mgla:GO_Annotation#ID                                    http://amigo.geneontology.org/cgi-bin/amigo/go.cgi?view=details&query=0006749
  mgla:GO_Annotation#Term                                  glutathione
  mgla:GO_Annotation#Ontology                              biological_process
  mgla:GO_Annotation#Level                                 7
  http://www.compbio.dundee.ac.uk/Software/GOtcha/iscore   0.879989490261963
  http://www.compbio.dundee.ac.uk/Software/GOtcha/cscore   5.7273821328517




Poson annotations - COGs
[Figure: COG annotation graph]
Subject: https://tools.nwrce.org/cgi-bin/fnu112/poson.cgi?poson=PSN082435
  mgla:cogNumber                          http://www.ncbi.nlm.nih.gov/sites/entrez?db=cdd&cmd=search&term=COG0508
  mgla:cogDomain                          AceF
  mgla:cogDescription, mgla:cogCategory   Pyruvate/2-oxoglutarate; dihydrolipoamide
Data sources - experiments
Transcriptomics
Data sources - experiments
Proteomics
Proteomics WT vs Mgla Mutant
Francisella tularensis novicida U112

[Figure: experimental design]
• Strains: WildType and MglA mutant
• Fractions per strain: Whole Cell (3), Soluble (3), Membrane (3)
• Each fraction: (4), searched with Sequest and DRAGON
• Results: Identification and Relative Abundance
• Two-sided t-test, P val < 0.01
RDF - excel conversion
[Figure: spreadsheet columns mapped to RDF triples (subject, predicate, object)]
• analysis (Identified Peptide): abundance (mgla:experiment), Peptide sequence, Pval, Pval-1
• analysis mgla:poson PSN
• PSN rdfs:seeAlso PSNV2 rdfs:seeAlso PSNV3 rdfs:seeAlso FTN (Genome)
• further rdfs:seeAlso links: DDBID, GO, SP, EC
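The spreadsheet-to-RDF step can be sketched in Python as below. This is a minimal sketch, not the authors' converter: the column names (analysis_id, sequence, abundance, poson) and the sample peptide sequence are hypothetical, while the mgla namespace, the mgla:sequence, mgla:experiment and mgla:poson predicates and the poson URL pattern are taken from the slides.

```python
import csv
import io

# Namespace and poson URL pattern as they appear on the slides.
MGLA = "http://www.francisella.org/novicida/schema/fnu112/experiments/mgla/"
POSON_URL = "https://tools.nwrce.org/cgi-bin/fnu112/poson.cgi?poson="


def literal(value):
    """Quote a string as an N-Triples plain literal."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def row_to_ntriples(row):
    """Turn one spreadsheet row (a dict) into a list of N-Triples lines."""
    subject = "<" + MGLA + row["analysis_id"] + ">"
    return [
        f"{subject} <{MGLA}sequence> {literal(row['sequence'])} .",
        f"{subject} <{MGLA}experiment> {literal(row['abundance'])} .",
        f"{subject} <{MGLA}poson> <{POSON_URL}{row['poson']}> .",
    ]


# Hypothetical CSV export of the spreadsheet: header plus one made-up row.
sheet = io.StringIO(
    "analysis_id,sequence,abundance,poson\n"
    "WholeCell_Lvl7_02.1,AEFVK,2594,PSN082435\n"
)
triples = [line for row in csv.DictReader(sheet) for line in row_to_ntriples(row)]
print("\n".join(triples))
```

Each spreadsheet cell becomes the object of one triple whose subject is the analysis row, which is why the later slides note that the data still need to be modelled before every query is possible.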
Data integration
Reconciled Identifiers

[Figure: graph of reconciled identifiers (source in parentheses)]
• (WashU-B) PSN.V1, PSN.V2, PSN.V3
• (COGs) COGID
• (NCBI) PROTEINID
• (IMG) GENEID
• (WashU-P) DDB
• (Fn ORF ID) FTN
• (Refseq) ACNo
• (Gene Ontology) GOID
• (ENZYME) E.C.No
• (Uniprot) ACNo
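One way to picture identifier reconciliation is as a walk over rdfs:seeAlso links from one identifier to everything it connects to. The sketch below assumes the links are held in a plain dictionary; the PSN and DDB identifiers and the exact link layout are hypothetical, and only FTN_0298 with its EC and SP cross-references comes from the slides.

```python
from collections import deque


def reachable_ids(see_also, start):
    """Breadth-first walk over seeAlso links; returns every identifier linked to start."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for target in see_also.get(current, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    seen.discard(start)
    return seen


# Hypothetical link table: subject identifier -> identifiers it points to via rdfs:seeAlso.
see_also = {
    "PSN.V1:0001": ["PSN.V2:0001"],
    "PSN.V2:0001": ["PSN.V3:0001"],
    "PSN.V3:0001": ["FTN_0298", "DDB:0001"],
    "FTN_0298": ["EC:3.1.3.11", "SP:A0Q4N9_FRATN"],
}
print(sorted(reachable_ids(see_also, "PSN.V1:0001")))
```

The walk tolerates cycles because visited identifiers are tracked in the seen set.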
Data Integration
Adding new experiments

[Figure: Experiment 1, Experiment 2, Experiment 3, Experiment 4 and public domain data attached to the same identifier chain]
• PSN rdfs:seeAlso PSNV2 rdfs:seeAlso PSNV3 rdfs:seeAlso FTN
• further rdfs:seeAlso links: DDBID, GO, AC No., EC
Data integration
Sesame
 NadiaAnwar:~ nadia$ openrdf-sesame-2.1/bin/console.sh
 Connected to default data directory

 Commands end with '.' at the end of a line
 Type 'help.' for help
 > connect http://127.0.0.1:8080/openrdf-sesame/.
 Disconnecting from default data directory
 Connected to http://127.0.0.1:8080/openrdf-sesame/
 > show r.
 +----------
 |SYSTEM ("System configuration repository")
 |ftnRepoNative ("Francisella Test")
 |FrancisellaNative ("FrancisellaTestStore")
 |FrancisellaReified ("Native store with RDF Schema inferencing")
 |FrancisellaReified_index2 ("Native store with RDF Schema inferencing")
 |Francisella ("Native store with RDF Schema inferencing")
 +----------
 > open FrancisellaReified_index2.
 Opened repository 'FrancisellaReified_index2'
Sesame
Data load (ftnRepoNative) - native (spoc,posc)

  Data File                                 time (s)    triples
  francisella_locus_tag.nt                      8.93      1,767
  interact-prot.nt                             88.51     20,682
  interact-prot-peptides.nt                              248,647
  mgla search db.fasta.blastp4 ypURL.n3         9.7       1,719
  NC_008601.nt                                 43.14     12,781
  Ft_novicidaU112go.nt                        359.14      2,548
  francisella.rdf2.nt                          43.41     10,434
  francisellaSUPERFAMILY.nt                    57.88     16,110
  francisellaPROTEIN.fasta.nt                  13.63      5,160
  Soluble.nt                                  588.87    336,761
  WholeCell.nt                                469.02    112,625
  Membranes.nt                               1003.19    298,771
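To compare rows of the load table, it can help to convert each one to a load rate. The short sketch below does that arithmetic for four rows copied from the table; the helper name triples_per_second is hypothetical, and the figures say nothing about why rates differ between files.

```python
# (seconds, triples) pairs copied from four rows of the load table.
loads = {
    "francisella_locus_tag.nt": (8.93, 1767),
    "Ft_novicidaU112go.nt": (359.14, 2548),
    "Soluble.nt": (588.87, 336761),
    "Membranes.nt": (1003.19, 298771),
}


def triples_per_second(seconds, triples):
    """Load rate for one file."""
    return triples / seconds


# List files from slowest to fastest load rate.
ranked = sorted(loads, key=lambda name: triples_per_second(*loads[name]))
for name in ranked:
    print(f"{name:28s} {triples_per_second(*loads[name]):8.1f} triples/s")
```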
Data Integration
Mgla data (ftnRepoNative)
[Figure: analysis (Identified Peptide, abundance, Peptide sequence, Experiment) mgla:poson PSN; PSN rdfs:seeAlso PSNV2 rdfs:seeAlso PSNV3 rdfs:seeAlso FTN; further rdfs:seeAlso links to DDBID, GO, SP, EC]

SELECT psn, ftn, ec FROM
{ftn} rdfs:seeAlso {ec},
{psn} rdfs:seeAlso {ftn},
{analysis} mgla:poson {psn}
WHERE ec LIKE "*[EC:*"
USING NAMESPACE
mgla = <http://www.francisella.org/novicida/schema/fnu112/experiments/mgla/>
Data Integration
Mgla data (ftnRepoNative)
[Figure: analysis (rdf:about, Identified Peptide) with mgla:sequence (Peptide sequence), mgla:experiment (abundance) and mgla:poson (PSN); PSN rdfs:seeAlso PSNV2 rdfs:seeAlso PSNV3 rdfs:seeAlso FTN; further rdfs:seeAlso links to DDBID, GO, SP, EC]

SELECT abundance, psn, ec, ftn FROM
{ftn} rdfs:seeAlso {ec},
{psn} rdfs:seeAlso {ftn},
{analysis} mgla:poson {psn},
{analysis} mgla:experiment {abundance}
WHERE ec LIKE "*[EC:*"
USING NAMESPACE
mgla = <http://www.francisella.org/novicida/schema/fnu112/experiments/mgla/>
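The way the four path expressions join on shared variables can be imitated over an in-memory list of triples. The sketch below is only an illustration of the join logic, not of how Sesame evaluates SeRQL; the subject names analysis1 and PSN1 are hypothetical, while FTN_0298, the two SRS URLs and the abundance value 2594 appear elsewhere in the slides.

```python
SEE_ALSO = "rdfs:seeAlso"
POSON = "mgla:poson"
EXPERIMENT = "mgla:experiment"

# Tiny hypothetical graph as (subject, predicate, object) tuples.
triples = [
    ("analysis1", POSON, "PSN1"),
    ("analysis1", EXPERIMENT, "2594"),
    ("PSN1", SEE_ALSO, "FTN_0298"),
    ("FTN_0298", SEE_ALSO, "http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-e+[EC:3.1.3.11]"),
    ("FTN_0298", SEE_ALSO, "http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-e+[SP:A0Q4N9_FRATN]"),
]


def objects(subject, predicate):
    """All objects of triples with the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def abundance_with_ec():
    """Rows of (abundance, psn, ec, ftn), mirroring the SELECT clause."""
    rows = []
    for analysis, predicate, psn in triples:
        if predicate != POSON:                                   # {analysis} mgla:poson {psn}
            continue
        for abundance in objects(analysis, EXPERIMENT):          # {analysis} mgla:experiment {abundance}
            for ftn in objects(psn, SEE_ALSO):                   # {psn} rdfs:seeAlso {ftn}
                for ec in objects(ftn, SEE_ALSO):                # {ftn} rdfs:seeAlso {ec}
                    if "[EC:" in ec:                             # WHERE ec LIKE "*[EC:*"
                        rows.append((abundance, psn, ec, ftn))
    return rows


print(abundance_with_ec())
```

The LIKE filter is what discards the SP cross-reference, since both cross-references hang off the same rdfs:seeAlso predicate.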
Really easy, but...
• A simple Excel-to-RDF conversion does not enable all queries
• It is not a simple conversion: the data needs to be “modelled”
[Figure, current model: analysis (rdf:about, Identified Peptide) with mgla:sequence (Peptide sequence), mgla:experiment (abundance) and mgla:poson (PSN)]
[Figure, target model: the statement { Peptide Sequence identifiedIn Experiment Replicate } hasAbundance abundance]
Data Integration
Reified statements
[Figure: reified model]
• analysis rdf:type Identified Peptide; analysis has a Peptide sequence
• analysis mgla:poson PSN; PSN rdfs:seeAlso PSNV2 rdfs:seeAlso PSNV3 rdfs:seeAlso FTN; further rdfs:seeAlso links to DDBID, GO, SP, EC
• analysis data rdf:type rdf:Statement
• analysis data rdf:subject analysis
• analysis data rdf:predicate InExperimentReplicate
• analysis data rdf:object Experiment Replicate
• analysis data mgla:PeptideAbundance abundance
Sesame
Reified Data load - native-RDFS (spoc,posc,posc)

  Data File                        time (s)    time (mins)    triples
  FnU112Version3.nt                  383.44           6.3      58,474
  PosonMappings.nt                    84.56           1.4      13,760
  francisella_locus_tag.nt            16.73           0.3       1,767
  ConstructHasGeneID.nt               23.00           0.4       1,719
  interact-prot.nt                   124.95           2.1      20,682
  interact-prot-pepteides.nt        1127.97          18.7     248,647
  interact-protSeeAlsoisbURL.nt       10.67           0.2       1,528
  goAnnotation_URLID.nt               74.14           1.2      20,501
  NC_008601.nt                        75.84           1.3      12,781
  Membranes_CogNumberURL.nt            8.60           0.1       2,548
  Ft_novicida_U112_go.nt             561.38           9.3       2,548
  francisella.rdf2.nt                 46.19           0.8      10,602
  francisellaSUPERFAMILY.nt           66.67           1.1      16,110
  francisellaPROTEIN.fasta.nt         15.27           0.3       5,160
  SolubleReifeid_3.rdf              1392.98          23.2     580,873
  WholeCellReified_3.rdf             941.16          15.6     184,221
  Membranes_3.rdf                   1026.66          17.1     416,086
  fnU112_draftRDFschemaV4.nt      215010.98       3,583.5         501
Queries
which posons have the most highly abundant peptides
select ftn, psn, exp, abundance from
{psn} rdfs:seeAlso {psnv2},
{psnv2} rdfs:seeAlso {psnv3},
{psnv3} rdfs:seeAlso {ftn},
{analysis} fnu112:poson {psn},
{analysis} rdf:type {rdf:Statement},
{analysis} rdf:object {exp},
{analysis} mgla:PeptideAbundance {abundance}
where xsd:integer(abundance) > 100000
and ftn LIKE "*FTN*"
using namespace
mgla = <http://www.francisella.org/novicida/schema/fnu112/experiments/mgla/>,
fnu112 = <http://www.francisella.org/novicida/fnu112/schema/fnu112/experiments/mgla#>
Queries
which posons have the most highly abundant peptides
Queries
which experiments have the most highly abundant peptides
Reified statements
 • Reified mgla data are much bigger (4 more statements per abundance value)
 • The really interesting queries return a Java out-of-memory error (-Xms1024M -Xmx1536M)
 • Haven’t yet tested the shortcut path expression
     { {reifSubj} reifPred {reifObj} } pred {obj}
     { {seq} identifiedIn {ExpRep} } hasAbundance {abd}
 [Figure: the statement { Peptide Sequence identifiedIn Experiment Replicate } hasAbundance abundance]
<#WholeCell_Lvl7_02.12> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement>.
<#WholeCell_Lvl7_02.12> <http://www.w3.org/1999/02/22-rdf-syntax-ns#subject> <http://www.francisella.org/novicida/schema/fnu112/experiments/mgla/WholeCell_Lvl7_02.1>.
<#WholeCell_Lvl7_02.12> <http://www.w3.org/1999/02/22-rdf-syntax-ns#predicate> <http://www.francisella.org/novicida/schema/fnu112/experiments/mgla/InExperimentReplicate>.
<#WholeCell_Lvl7_02.12> <http://www.w3.org/1999/02/22-rdf-syntax-ns#object> <http://www.francisella.org/novicida/schema/fnu112/experiments/mgla/wildtype/01_wc_01>.
<#WholeCell_Lvl7_02.12> <http://www.francisella.org/novicida/schema/fnu112/experiments/mgla/PeptideAbundance> "2594".
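The five statements above follow a fixed pattern: four standard RDF reification triples plus one triple that attaches the abundance to the reified statement. A minimal Python sketch of a generator for that pattern follows. The function name and its parameters are hypothetical, and the mgla namespace is written with http:// as in the query namespace declarations.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MGLA = "http://www.francisella.org/novicida/schema/fnu112/experiments/mgla/"


def reify_with_abundance(statement_id, peptide, replicate, abundance):
    """Return the five triples describing one peptide abundance measurement.

    The first four lines reify the statement
    (peptide, InExperimentReplicate, replicate); the fifth attaches the abundance.
    """
    node = f"<#{statement_id}>"  # relative node identifier, as written on the slide
    return [
        f"{node} <{RDF}type> <{RDF}Statement> .",
        f"{node} <{RDF}subject> <{MGLA}{peptide}> .",
        f"{node} <{RDF}predicate> <{MGLA}InExperimentReplicate> .",
        f"{node} <{RDF}object> <{MGLA}{replicate}> .",
        f'{node} <{MGLA}PeptideAbundance> "{abundance}" .',
    ]


lines = reify_with_abundance(
    "WholeCell_Lvl7_02.12", "WholeCell_Lvl7_02.1", "wildtype/01_wc_01", 2594
)
print("\n".join(lines))
```

Every abundance value costs five triples in this layout, which is consistent with the growth in triple counts between the plain and reified load tables.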
Comparison of integrated experimental data
Distinct and overlapping posons identified within each biological fraction (>20000)

[Venn diagram: mem 171, overlap 185, sol 146]

mem MINUS sol

select distinct psn from
{x} fns:poson {psn},
{x} fn:InExperimentReplicate {experiment},
{analysis} rdf:subject {x},
{analysis} rdf:object {exp},
{analysis} fn:PeptideAbundance {abundance}
where xsd:integer(abundance) > 20000
and experiment LIKE "*mem*"
MINUS
select distinct psn from
{x} fns:poson {psn},
{x} fn:InExperimentReplicate {experiment},
{analysis} rdf:subject {x},
{analysis} rdf:object {exp},
{analysis} fn:PeptideAbundance {abundance}
where xsd:integer(abundance) > 20000
and experiment LIKE "*sol*"
using namespace

INTERSECT

select distinct psn from
{x} fns:poson {psn},
{x} fn:InExperimentReplicate {experiment},
{analysis} rdf:subject {x},
{analysis} rdf:object {exp},
{analysis} fn:PeptideAbundance {abundance}
where xsd:integer(abundance) > 20000
and experiment LIKE "*sol*"
INTERSECT
select distinct psn from
{x} fns:poson {psn},
{x} fn:InExperimentReplicate {experiment},
{analysis} rdf:subject {x},
{analysis} rdf:object {exp},
{analysis} fn:PeptideAbundance {abundance}
where xsd:integer(abundance) > 20000
and experiment LIKE "*mem*"
using namespace

sol MINUS mem

select distinct psn from
{x} fns:poson {psn},
{x} fn:InExperimentReplicate {experiment},
{analysis} rdf:subject {x},
{analysis} rdf:object {exp},
{analysis} fn:PeptideAbundance {abundance}
where xsd:integer(abundance) > 20000
and experiment LIKE "*sol*"
MINUS
select distinct psn from
{x} fns:poson {psn},
{x} fn:InExperimentReplicate {experiment},
{analysis} rdf:subject {x},
{analysis} rdf:object {exp},
{analysis} fn:PeptideAbundance {abundance}
where xsd:integer(abundance) > 20000
and experiment LIKE "*mem*"
using namespace
Comparison of integrated experimental data
 Distinct and overlapping posons identified within each biological fraction (<5000)

                                                  219                 125
                                                            245
                                               mem                    sol

                              mem MINUS sol                                          sol MINUS mem
select distinct psn from                                                                    select distinct psn from
{x} fns:poson {psn},                                                                        {x} fns:poson {psn},
{x} fn:InExperimentReplicate {experiment},                                                  {x} fn:InExperimentReplicate {experiment},
{analysis} rdf:subject {x},                                                                 {analysis} rdf:subject {x},
{analysis} rdf:object {exp},                        INTERSECT                               {analysis} rdf:object {exp},
{analysis} fn:PeptideAbundance {abundance}                                                  {analysis} fn:PeptideAbundance {abundance}
                                             select distinct psn from
where xsd:integer(abundance) < 5000                                                         where xsd:integer(abundance) < 5000
                                             {x} fns:poson {psn},
and experiment LIKE "*mem*"                                                                 and experiment LIKE "*sol*"
                                             {x} fn:InExperimentReplicate {experiment},
MINUS                                                                                       MINUS
                                             {analysis} rdf:subject {x},
select distinct psn from                                                                    select distinct psn from
                                             {analysis} rdf:object {exp},
{x} fns:poson {psn},                                                                        {x} fns:poson {psn},
                                             {analysis} fn:PeptideAbundance {abundance}
{x} fn:InExperimentReplicate {experiment},                                                  {x} fn:InExperimentReplicate {experiment},
                                             where xsd:integer(abundance) < 5000
{analysis} rdf:subject {x},                                                                 {analysis} rdf:subject {x},
                                             and experiment LIKE "*sol*"
{analysis} rdf:object {exp},                                                                {analysis} rdf:object {exp},
                                             INTERSECT
{analysis} fn:PeptideAbundance {abundance}                                                  {analysis} fn:PeptideAbundance {abundance}
                                             select distinct psn from
where xsd:integer(abundance) < 5000                                                         where xsd:integer(abundance) < 5000
                                             {x} fns:poson {psn},
and experiment LIKE "*sol*"                                                                 and experiment LIKE "*mem*"
                                             {x} fn:InExperimentReplicate {experiment},
using namespace                                                                             using namespace
                                             {analysis} rdf:subject {x},
                                             {analysis} rdf:object {exp},
                                             {analysis} fn:PeptideAbundance {abundance}
                                             where xsd:integer(abundance) < 5000
                                             and experiment LIKE "*mem*"
                                             using namespace
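The three queries on this slide amount to set operations over the distinct posons seen in each fraction. A minimal Python sketch of that comparison follows, assuming the query results have already been fetched as (poson, experiment, abundance) rows. The row values, the experiment naming and the helper name `posons_in_fraction` are hypothetical, and the counts shown on the slide are not reproduced.

```python
def posons_in_fraction(rows, fraction, max_abundance):
    """Return distinct posons seen in a fraction below an abundance cut-off.

    Mirrors: where xsd:integer(abundance) < N and experiment LIKE "*frac*"
    """
    return {
        psn
        for psn, experiment, abundance in rows
        if fraction in experiment and int(abundance) < max_abundance
    }


# Hypothetical rows: (poson, experiment replicate, abundance as a string literal)
rows = [
    ("PSN000001", "wildtype/01_mem_01", "1200"),
    ("PSN000002", "wildtype/01_mem_02", "4300"),
    ("PSN000002", "wildtype/01_sol_01", "2594"),
    ("PSN000003", "wildtype/01_sol_02", "800"),
    ("PSN000004", "wildtype/01_sol_03", "90000"),  # above the cut-off, dropped
]

mem = posons_in_fraction(rows, "mem", 5000)
sol = posons_in_fraction(rows, "sol", 5000)

mem_only = mem - sol  # SeRQL: mem MINUS sol
shared = mem & sol    # SeRQL: mem INTERSECT sol
sol_only = sol - mem  # SeRQL: sol MINUS mem

print(sorted(mem_only), sorted(shared), sorted(sol_only))
```

Doing the set arithmetic on the client means each fraction is queried once rather than twice, which may matter given the memory errors reported for the reified store.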
Further work
• Queries are slow in the native repository; database repositories are probably
  faster.
• Adding a transcriptomic experiment:
   • WT vs mglA mutant
   • GEO accession GSE5468
• RDF-S inferencing?
Acknowledgements
• Funding: BBSRC - Radical Solutions for Researching the Proteome
• University of Glasgow, Glasgow
   • Prof. Walter Kolch
   • Dr Andy Pitt
• University of Strathclyde, Glasgow
   • Dr Ela Hunt (Scientific Advisor)
• University of Washington, Seattle
   • Prof. Dave Goodlett (Scientific Advisor)
   • Dr Mitch Brittnacher, Mathew Radey, Laurence Rohmer
   • Dr Tina Guina (MglA experiment)
Abundance thresholds
• SeRQL aggregate functions would be nice to have
• They would allow queries for high and low abundance values without hard-coded thresholds:
  • High: WHERE abundance BETWEEN MEDIAN(abundance) AND MAX(abundance)
  • Low: WHERE abundance BETWEEN MIN(abundance) AND MEDIAN(abundance)
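Until such aggregates are available in SeRQL, the median threshold could be computed on the client from the query results. A minimal Python sketch follows, assuming the abundances come back as string literals paired with a poson. The sample values and the function name `split_by_median` are hypothetical.

```python
from statistics import median


def split_by_median(rows):
    """Split (poson, abundance) rows into low and high groups around the median.

    BETWEEN is inclusive at both ends, so a row whose abundance equals the
    median lands in both groups, as it would with the clauses on the slide.
    """
    values = [int(abundance) for _, abundance in rows]
    mid = median(values)
    low = [(psn, int(a)) for psn, a in rows if int(a) <= mid]
    high = [(psn, int(a)) for psn, a in rows if int(a) >= mid]
    return mid, low, high


# Hypothetical (poson, abundance) pairs, abundance as a string literal
rows = [
    ("PSN000001", "2594"),
    ("PSN000002", "800"),
    ("PSN000003", "120000"),
    ("PSN000004", "20500"),
    ("PSN000005", "4300"),
]

mid, low, high = split_by_median(rows)
print(mid)                       # median abundance
print([psn for psn, _ in low])   # posons at or below the median
print([psn for psn, _ in high])  # posons at or above the median
```

The computed median could also be passed back into a SeRQL `where xsd:integer(abundance) > ...` filter in place of a fixed cut-off such as 20000 or 5000.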

Genome and Proteome data integration in RDF

  • 1. Semantic Web Applications and Tools for Life Sciences November 2008 Genome and Proteome data integration in RDF Nadia Anwar, Ela Hunt, Walter Kolch and Andy Pitt e Me ts tab nom Pr rip e olit ot G es sc ein an s Tr Data Discovery
  • 2. Outline • Data Integration in Bioinformatics. • Semantic data integration • Francisella • Integrating genome annotations with experimental proteomics data in RDF • Further work
  • 3. Data Integration is not a solved problem
  • 4. Information discovery is not Integrated High TP Microarray Proteomics Computational Computational Sequencing experiments experiments analysis analysis Systems Biology Synthetic Networks/ Genomics Proteomics Pathways Sequence Peptide Profiles Predictions Gene Expression ORF Prediction Transcript Profile Peptide Abundance Genome Transcript Protein Identification Comparisons Abundance Protein Interactions PT-Modifications Metabolomics LIMS LIMS LIMS LIMS Genome Regulatory Networks Metabolic Pathways Translational Medicine
  • 5. Semantic Data Integration across omes data silos Data Genes Transcripts Peptides Metabolites Genotype Information Data Discovery
  • 6. Proof of concept Francisella tularensis ulceroglandular tularaemia respiratory oculoglandular tularaemia tularaemia
  • 7. Bioterrorism • Francisella tularensis is a very successful intracellular pathogen that causes severe disease (respiratory tulareamia is the most acute form of the disease) • low infectious dose (10-50 bacterium compared to anthrax which requires 8,000-15,000 spores) • weaponisation fears
  • 9. RDF http://img.jgi.doe.gov/cgi-bin/pub/main.cgi?section=TaxonDetail&page=taxonDetail&taxon_oid=639633024#export 229976 + (3)IMG_S:genomic_location_strand 229107 TPR (3)IMG_S:genomic_location_end (2)RDFS:comment (3)IMG_S:genomic_location_start (1)RDF:type (4)IMG:gene_oid=639752258 (3)IMG_S:locus_tag FTN_0209 RDF:description
  • 10. Data sources Genome annotations http://supfam.cs.bris.ac.uk/ RDF#type RDF:description http://purl.uniprot.org/core/Protein_Family SUPERFAMILY:cgi-bin/model.cgi?model=0040419 SUPERFAMILY:Assignment_Region 155-367 SUPERFAMILY:Score 5.1e-39 SUPERFAMILY:SCOP_ID SUPERFAMILY:cgi-bin/scop.cgi?sunid=52540 http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=protein&id=118496616 SUPERFAMILY:SCOP_Fold P-loop containing nucleoside triphosphate hydrolases SUPERFAMILY:Family_ID SUPERFAMILY:Evalue 81269 7.33e-06 SUPERFAMILY:Family_Description Extended AAA-ATPase domain SUPERFAMILY:Similar_Structure 1l8q A:77-289 Francisella SuperFamily Data
  • 11. Data sources Genome annotations - KEGG http://www.genome.jp/dbget-bin/www_bget?pathway+ftn00010 http://img.jgi.doe.gov/schema#gene http://www.genome.jp/dbget-bin/www_bget?ftn:FTN_0298 http://img.jgi.doe.gov/schema#gene_name rdfs:comment rdfs:seeAlso rdfs:seeAlso glpX fructose http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-e+[EC:3.1.3.11] http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-e+[SP:A0Q4N9_FRATN] http://www.genome.jp/dbget-bin/www_bfind?F.tularensis_U112 Genome annotations - NCBI protein http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=protein&id=118496616 RDF:type RDF:idsymbol RDFS:#seeAlso http://purl.uniprot.org/Annotation/ RDF:description YP_897666.1 http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?[refseqp-SeqVersion:YP_897666.1]+-e chromosomal http://www.ncbi.nlm.nih.gov/sites/gquery?term=Francisella+tularensis+novicida
  • 12. Data sources Genome annotations - GO RDF:type RDF:description mgla:GO_Annotation#ID http://amigo.geneontology.org/cgi-bin/amigo/go.cgi?view=details&query=0006749 mgla:GO_Annotation#Term glutathione http://www.genome.jp/dbget-bin/www_bget?ftn:FTN_0277 mgla:GO_Annotation#Ontology biological_process mgla:GO_Annotation#Level 7 http://www.compbio.dundee.ac.uk/Software/GOtcha/iscore 0.879989490261963 http://www.compbio.dundee.ac.uk/Software/GOtcha/cscore 5.7273821328517 Poson annotations - Cogs http://www.ncbi.nlm.nih.gov/sites/entrez?db=cdd&cmd=search&term=COG0508 mgla:cogNumber mgla:cogDomain AceF https://tools.nwrce.org/cgi-bin/fnu112/poson.cgi?poson=PSN082435 mgla:cogDescription mgla:cogCategory Pyruvate/2-oxoglutarate dihydrolipoamide
  • 13. Data sources - experiments Transcriptomics
  • 14. Data sources - experiments Proteomics
  • 15. Proteomics WT vs Mgla Mutant
  • 16. Francisella tularensis novicida U112 WildType MglA mutant Whole Cell Soluble Membrane Whole Cell Soluble Membrane (3) (3) (3) (3) (3) (3) (4) (4) (4) (4) (4) (4) Sequest DRAGON Sequest DRAGON Sequest DRAGON Sequest DRAGON Sequest DRAGON Sequest DRAGON Identification Relative Abundance P val <0.01 Two-sided t-test
  • 17.
  • 18. RDF - excel conversion Pval Genome Pval-1 analysis Identified Peptide mgla:poson abundance mgla:experiment PSN rdfs:seeAlso PSNV2 rdfs:seeAlso PSNV3 rdfs:seeAlso FTN rdfs:seeAlso DDBID Peptide sequence predicate GO SP EC subject object
  • 19. Data integration Reconciled Identifiers (WashU-B) PSN.V1 (COGs) COGID (WashU-B) PSN.V2 (NCBI) PROTEINID (WashU-B) PSN.V3 (IMG) GENEID (WashU-P) DDB (Fn ORF ID) FTN (Refseq) ACNo (Gene Ontology) GOID (ENZYME) E.C.No (Uniprot) ACNo
  • 20. Data Integration Adding new experiments Experiment Public 2 Experiment domain data 1 PSN rdfs:seeAlso PSNV2 rdfs:seeAlso PSNV3 rdfs:seeAlso FTN rdfs:seeAlso Experiment 3 DDBID Experiment 4 GO AC No. EC
  • 21. Data integration Sesame NadiaAnwar:~ nadia$ openrdf-sesame-2.1/bin/console.sh Connected to default data directory Commands end with '.' at the end of a line Type 'help.' for help > connect http://127.0.0.1:8080/openrdf-sesame/. Disconnecting from default data directory Connected to http://127.0.0.1:8080/openrdf-sesame/ > show r. +---------- |SYSTEM ("System configuration repository") |ftnRepoNative ("Francisella Test") |FrancisellaNative ("FrancisellaTestStore") |FrancisellaReified ("Native store with RDF Schema inferencing") |FrancisellaReified_index2 ("Native store with RDF Schema inferencing") |Francisella ("Native store with RDF Schema inferencing") +---------- > open FrancisellaReified_index2. Opened repository 'FrancisellaReified_index2'
  • 22. Sesame Data load (ftnRepoNative) - native (spoc,posc) Data File time (s) triples francisella_locus_tag.nt 8.93 1,767 interact-prot.nt 88.51 20,682 interact-prot-peptides.nt 248,647 mgla search db.fasta.blastp4 ypURL.n3 9.7 1,719 NC_008601.nt 43.14 12,781 Ft_novicidaU112go.nt 359.14 2,548 francisella.rdf2.nt 43.41 10,434 francisellaSUPERFAMILY.nt 57.88 16,110 francisellaPROTEIN.fasta.nt 13.63 5,160 Soluble.nt 588.87 336,761 WholeCell.nt 469.02 112,625 Membranes.nt 1003.19 298,771
  • 23. Data Integration Mgla data (ftnRepoNative) analysis Identified Peptide mgla:poson abundance PSN rdfs:seeAlso PSNV2 rdfs:seeAlso PSNV3 rdfs:seeAlso FTN rdfs:seeAlso Experiment DDBID Peptide sequence SELECT psn, ftn, ec FROM {ftn} rdfs:seeAlso {ec}, GO SP EC {psn} rdfs:seeAlso {ftn}, {analysis} mgla:poson {psn} WHERE ec LIKE “*[EC:*” USING NAMESPACE mgla =<http://www.francisella.org/novicida/schema/fnu112/experiments/mgla/>
  • 24. Data Integration Mgla data (ftnRepoNative) analysis rdf:about Identified Peptide mgla:poson mgla:sequence mgla:experiment abundance PSN rdfs:seeAlso PSNV2 rdfs:seeAlso PSNV3 rdfs:seeAlso FTN Peptide sequence rdfs:seeAlso DDBID SELECT abundance, psn, ec, ftn FROM {ftn} rdfs:seeAlso {ec}, {psn} rdfs:seeAlso {ftn}, GO SP EC {analysis} mgla:poson {psn}, {analysis} mgla:experiment {abundance}, WHERE ec LIKE “*[EC:*” USING NAMESPACE mgla =<http://www.francisella.org/novicida/schema/fnu112/experiments/mgla/>
  • 25. Really easy, But.... • Simple excel to RDF conversion does not enable all queries • Not a simple conversion - Data needs to be “modelled” analysis rdf:about Identified Peptide mgla:poson mgla:sequence mgla:experiment abundance PSN identifiedIn Experiment Peptide Peptide Sequence Replicate { sequence hasAbundance abundance
  • 26. Data Integration Reified statements rdf:type analysis Identified Peptide Peptide sequence mgla:poson PSN rdfs:seeAlso PSNV2 rdfs:seeAlso PSNV3 rdfs:seeAlso FTN Experiment Replicate rdfs:seeAlso t jec rd f:ob DDBID analysis data rdf:type rdf:Statement rd rdf:s f: pr ubje ct GO SP EC ed analysis data ica mgla:PeptideAbundance te InExperimentReplicate abundance
  • 27. Sesame Reified Data load - native-RDFS (spoc,posc,posc) Data File time (s) time(mins) triples FnU112Version3.nt 383.44 6.3 58,474 PosonMappings.nt 84.56 1.4 13,760 francisella_locus_tag.nt 16.73 0.3 1,767 ConstructHasGeneID.nt 23.00 0.4 1,719 interact-prot.nt 124.95 2.1 20,682 interact-prot-pepteides.nt 1127.97 18.7 248,647 interact-protSeeAlsoisbURL.nt 10.67 0.2 1,528 goAnnotation_URLID.nt 74.14 1.2 20,501 NC_008601.nt 75.84 1.3 12,781 Membranes_CogNumberURL.nt 8.60 0.1 2,548 Ft_novicida_U112_go.nt 561.38 9.3 2,548 francisella.rdf2.nt 46.19 0.8 10,602 francisellaSUPERFAMILY.nt 66.67 1.1 16,110 francisellaPROTEIN.fasta.nt 15.27 0.3 5,160 SolubleReifeid_3.rdf 1392.98 23.2 580,873 WholeCellReified_3.rdf 941.16 15.6 184,221 Membranes_3.rdf 1026.66 17.111 416,086 fnU112_draftRDFschemaV4.nt 215010.98 3,583.5 501
  • 28. Queries which posons have the most highly abundant peptides select ftn , psn, exp, abundance from {psn} rdfs:seeAlso {psnv2}, {psnv2} rdfs:seeAlso {psnv3}, {psnv3} rdfs:seeAlso {ftn}, {analysis} fnu112:poson {psn}, {analysis} rdf:type {rdf:Statement}, {analysis} rdf:object {exp}, {analysis} mgla:PeptideAbundance {abundance} where xsd:integer(abundance) > 100000 and ftn LIKE "*FTN*" using namespace mgla=<http://www.francisella.org/novicida/schema/fnu112/experiments/mgla/>, fnu112=<http://www.francisella.org/novicida/fnu112/schema/fnu112/experiments/ mgla#>
  • 29. Queries which posons have the most highly abundant peptides
  • 30. Queries which experiments have the most highly abundant peptides
  • 31. Reified statements • Reified mgla data are much bigger (4 more statements/abundance) • The really interesting queries return Java out of memory error (-Xms-1024M - Xmx 1536M) identifiedIn Experiment Peptide Sequence Replicate { • Haven’t yet tested shortcut path expression hasAbundance { {reifSubj} reifPred {reifObj} } pred {obj} abundance { {seq} identifiedIn {ExpRep} } hasAbundance {abd} <#WholeCell_Lvl7_02.12> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement>. <#WholeCell_Lvl7_02.12> <http://www.w3.org/1999/02/22-rdf-syntax-ns#subject> <http:/www.francisella.org/novicida/schema/fnu112/experiments/mgla/WholeCell_Lvl7_02.1>. <#WholeCell_Lvl7_02.12> <http://www.w3.org/1999/02/22-rdf-syntax-ns#predicate> <http:/www.francisella.org/novicida/schema/fnu112/experiments/mgla/InExperimentReplicate>. <#WholeCell_Lvl7_02.12> <http://www.w3.org/1999/02/22-rdf-syntax-ns#object> <http:/www.francisella.org/novicida/schema/fnu112/experiments/mgla/wildtype/01_wc_01>. <#WholeCell_Lvl7_02.12> <http:/www.francisella.org/novicida/schema/fnu112/experiments/mgla/PeptideAbundance> "2594".
  • 32. Comparison of integrated experimental data Distinct and overlapping posons identified within each biological fraction (>20000) 171 146 185 mem sol mem MINUS sol sol MINUS mem select distinct psn from select distinct psn from {x} fns:poson {psn}, {x} fns:poson {psn}, {x} fn:InExperimentReplicate {experiment}, {x} fn:InExperimentReplicate {experiment}, {analysis} rdf:subject {x}, {analysis} rdf:subject {x}, {analysis} rdf:object {exp}, INTERSECT {analysis} rdf:object {exp}, {analysis} fn:PeptideAbundance {abundance} {analysis} fn:PeptideAbundance {abundance} select distinct psn from where xsd:integer(abundance) > 20000 where xsd:integer(abundance) > 20000 {x} fns:poson {psn}, and experiment LIKE "*mem*" and experiment LIKE "*sol*" {x} fn:InExperimentReplicate {experiment}, MINUS MINUS {analysis} rdf:subject {x}, select distinct psn from select distinct psn from {analysis} rdf:object {exp}, {x} fns:poson {psn}, {x} fns:poson {psn}, {analysis} fn:PeptideAbundance {abundance} {x} fn:InExperimentReplicate {experiment}, {x} fn:InExperimentReplicate {experiment}, where xsd:integer(abundance) > 20000 {analysis} rdf:subject {x}, {analysis} rdf:subject {x}, and experiment LIKE "*sol*" {analysis} rdf:object {exp}, {analysis} rdf:object {exp}, INTERSECT {analysis} fn:PeptideAbundance {abundance} {analysis} fn:PeptideAbundance {abundance} select distinct psn from where xsd:integer(abundance) > 20000 where xsd:integer(abundance) > 20000 {x} fns:poson {psn}, and experiment LIKE "*sol*" and experiment LIKE "*mem*" {x} fn:InExperimentReplicate {experiment}, using namespace using namespace {analysis} rdf:subject {x}, {analysis} rdf:object {exp}, {analysis} fn:PeptideAbundance {abundance} where xsd:integer(abundance) > 20000 and experiment LIKE "*mem*" using namespace
  • 33. Comparison of integrated experimental data • Distinct and overlapping posons identified within each biological fraction (abundance < 5000) • Venn diagram (mem, sol): 219, 125, 245
    mem MINUS sol:
      select distinct psn from
        {x} fns:poson {psn}, {x} fn:InExperimentReplicate {experiment},
        {analysis} rdf:subject {x}, {analysis} rdf:object {exp},
        {analysis} fn:PeptideAbundance {abundance}
      where xsd:integer(abundance) < 5000 and experiment LIKE "*mem*"
      MINUS
      select distinct psn from
        {x} fns:poson {psn}, {x} fn:InExperimentReplicate {experiment},
        {analysis} rdf:subject {x}, {analysis} rdf:object {exp},
        {analysis} fn:PeptideAbundance {abundance}
      where xsd:integer(abundance) < 5000 and experiment LIKE "*sol*"
      using namespace
    INTERSECT:
      select distinct psn from
        {x} fns:poson {psn}, {x} fn:InExperimentReplicate {experiment},
        {analysis} rdf:subject {x}, {analysis} rdf:object {exp},
        {analysis} fn:PeptideAbundance {abundance}
      where xsd:integer(abundance) < 5000 and experiment LIKE "*sol*"
      INTERSECT
      select distinct psn from
        {x} fns:poson {psn}, {x} fn:InExperimentReplicate {experiment},
        {analysis} rdf:subject {x}, {analysis} rdf:object {exp},
        {analysis} fn:PeptideAbundance {abundance}
      where xsd:integer(abundance) < 5000 and experiment LIKE "*mem*"
      using namespace
    sol MINUS mem:
      select distinct psn from
        {x} fns:poson {psn}, {x} fn:InExperimentReplicate {experiment},
        {analysis} rdf:subject {x}, {analysis} rdf:object {exp},
        {analysis} fn:PeptideAbundance {abundance}
      where xsd:integer(abundance) < 5000 and experiment LIKE "*sol*"
      MINUS
      select distinct psn from
        {x} fns:poson {psn}, {x} fn:InExperimentReplicate {experiment},
        {analysis} rdf:subject {x}, {analysis} rdf:object {exp},
        {analysis} fn:PeptideAbundance {abundance}
      where xsd:integer(abundance) < 5000 and experiment LIKE "*mem*"
      using namespace
  • 34. Further work • Queries are slow in the native repository; database-backed repositories are probably faster. • Adding a transcriptomic experiment: wild type vs. mglA mutant (GEO accession GSE5468) • RDF-S inferencing?
  • 35. Acknowledgements • Funding: BBSRC - Radical Solutions for Researching the Proteome • University of Glasgow, Glasgow • Prof. Walter Kolch • Dr Andy Pitt • University of Strathclyde, Glasgow • Dr Ela Hunt (Scientific Advisor) • University of Washington, Seattle • Prof. Dave Goodlett (Scientific Advisor) • Dr Mitch Brittnacher, Mathew Radey, Laurence Rohmer • Dr Tina Guina (MglA experiment)
  • 36. Abundance thresholds.... • SeRQL aggregate functions would be nice to have • Queries to find low and high abundance values: • High: WHERE abundance BETWEEN MEDIAN(abundance) AND MAX(abundance) • Low: WHERE abundance BETWEEN MIN(abundance) AND MEDIAN(abundance)
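    Until aggregate functions are available in the query language, the median split described on this slide could be computed on the client after fetching the abundance values. The following Python sketch assumes the values arrive as string literals (as in the reified example, where the abundance is "2594"); `split_by_median` is a hypothetical helper, not part of any SeRQL or Sesame API.

```python
from statistics import median


def split_by_median(abundances):
    """Split abundance values into low and high groups around the median.

    Both ranges are inclusive, mirroring BETWEEN, so a value equal to the
    median appears in both groups. Raises statistics.StatisticsError on an
    empty input.
    """
    values = [int(a) for a in abundances]  # literals are stored as strings
    mid = median(values)
    low = [v for v in values if v <= mid]   # MIN .. MEDIAN
    high = [v for v in values if v >= mid]  # MEDIAN .. MAX
    return mid, low, high


# Made-up abundance literals for illustration.
mid, low, high = split_by_median(["2594", "1200", "25000", "41000", "5000"])
print(mid, low, high)
```

    The resulting median could then be substituted as a constant into the `where xsd:integer(abundance) > ...` and `< ...` filters, in place of hand-picked thresholds.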