Unison: An Integrated Platform for
Computational Biology Discovery
Freely accessible at http://unison-db.org/.

Reece Hart, Kiran Mukhyala
Genentech, Inc.

Pacific Symposium on Biocomputing 2009
assert(Sequence Analysis != Sequence Mining)

Sequence Analysis, i.e., show predictions for a given sequence.
Typically involves minutes to hours of computing per sequence.

Feature-Based Mining, i.e., show sequences that contain specified features.
Typically entails days to months of computing to generate results.

[Figure: the data feature-based mining requires. Sequences: a non-redundant superset of all sequences. Feature types/models: HMM, TM, signal, etc. Parameters: execution arguments/options for every prediction type and result. Prediction results: method-specific data such as score, e-value, p-value, kinase probability, etc.]
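The split above comes down to which direction the prediction data are read. A minimal sketch, assuming a toy SQLite schema with hypothetical table and column names (seq, model, params, result) rather than Unison's real PostgreSQL schema: sequence analysis is a lookup by sequence, while feature-based mining is a lookup by feature, which only works because results were computed in advance for every sequence.

```python
import sqlite3

# Toy schema (hypothetical names): unique sequences, feature models, run
# parameters, and one row per prediction linking the three.
SCHEMA = """
CREATE TABLE seq    (seq_id INTEGER PRIMARY KEY, seq TEXT UNIQUE);
CREATE TABLE model  (model_id INTEGER PRIMARY KEY, name TEXT, kind TEXT);
CREATE TABLE params (params_id INTEGER PRIMARY KEY, args TEXT);
CREATE TABLE result (seq_id INTEGER REFERENCES seq,
                     model_id INTEGER REFERENCES model,
                     params_id INTEGER REFERENCES params,
                     start INTEGER, stop INTEGER, score REAL, eval REAL);
"""

def build_demo() -> sqlite3.Connection:
    """Create an in-memory database holding a few invented predictions."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.executemany("INSERT INTO seq VALUES (?, ?)",
                    [(1, "MKVLAAGIV"), (2, "MSTESMIRDV"), (3, "MPLLLLLAV")])
    con.executemany("INSERT INTO model VALUES (?, ?, ?)",
                    [(1, "ig", "HMM"), (2, "TM", "TM")])
    con.execute("INSERT INTO params VALUES (1, '--default')")
    # These rows stand in for predictions computed ahead of time and stored.
    con.executemany("INSERT INTO result VALUES (?, ?, ?, ?, ?, ?, ?)",
                    [(1, 1, 1, 2, 8, 30.0, 1e-5),
                     (1, 2, 1, 3, 9, None, None),
                     (3, 1, 1, 1, 7, 12.0, 0.5)])
    return con

def analyze(con: sqlite3.Connection, seq_id: int) -> list:
    """Sequence analysis: every stored prediction for one given sequence."""
    return con.execute(
        "SELECT m.name, r.start, r.stop FROM result r "
        "JOIN model m USING (model_id) WHERE r.seq_id = ? ORDER BY r.start",
        (seq_id,)).fetchall()

def mine(con: sqlite3.Connection, model_name: str, max_eval: float) -> list:
    """Feature-based mining: every sequence with a hit to the named model."""
    rows = con.execute(
        "SELECT DISTINCT r.seq_id FROM result r JOIN model m USING (model_id) "
        "WHERE m.name = ? AND r.eval < ? ORDER BY r.seq_id",
        (model_name, max_eval))
    return [seq_id for (seq_id,) in rows]
```

With this toy data, `analyze(build_demo(), 1)` returns `[('ig', 2, 8), ('TM', 3, 9)]` and `mine(build_demo(), 'ig', 1e-2)` returns `[1]`. Keeping parameters in their own table lets results from different runs of the same method coexist, so comparing parameter settings becomes a query rather than a rerun.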
Unison in a Nutshell




[Figure: protein sequences and annotations at the center, linked to domain, structure & homology predictions; structures & ligands; genomes, gene mapping & structure, probes; and auxiliary annotations (GO, RIF, SCOP, etc.).]

➢   Sequences and annotations: UniProt, IPI, Ensembl, RefSeq, PDB, STRING, PHANTOM, HUGE, ROUGE, MGC, Derwent, pataa, nr, etc.
    ●   >13M sequences, >17k species, 69 origins.
➢   Auxiliary data: HomoloGene, Gene Ontology, taxonomy, PDB, HUGO, SCOP, etc.
➢   Precomputed predictions: domains, homology, structure, TMs, localization, signals, disorder, etc.
    ●   >200M predictions, 23 types, ~6 CPU-years.
Unison has many applications.
➢   Unison Web Tools
➢   Other In-House Tools
➢   Ad Hoc Mining: mining and analysis projects
Unison Web Tools
Unison is a platform for diverse tools.




                                    Matt Brauer
                                    Guy Cavet
                                    Josh Kaminker
                                    Scott Lohr
                                    Kathryn Woods
                                    Jean Yuan
                                    Peng Yue
Unison facilitates complex mining.




Mining for TNF ligands
Mining for E3 ligases
Mining for 4H Cytokines
Mining for ITxM
Mining for deubiquitinases
Analyzing SNP impact on binding interfaces




                                             Jason Hackney
                                             Nandini Krishnamurthy
                                             Li Li
                                             Yun Li
                                             Jinfeng Liu
                                             Shiu-ming Loh
                                             Kiran Mukhyala
Mining for ITIMs the old way.

                 [Figure: target architecture: Ig domain, then TM segment, then ITIM motif]


➢   Collect sequences.
➢   Prune redundant sequences. (How?!)
➢   For each unique sequence, predict
    ●   Immunoglobulin domains.
    ●   Transmembrane domains.
    ●   ITIM motifs.
➢   Write a program that filters predictions.
➢   Summarize hits with external data.
➢   Do it again when source data are updated.
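The "(How?!)" in the list above has a common answer: collapse sequences by exact identity, so each distinct sequence is predicted on only once and every source accession becomes an alias for it. A minimal sketch, assuming redundancy means an exact match after removing whitespace and upper-casing, keyed by an MD5 digest of the sequence; the function names are hypothetical, and whether Unison normalizes sequences exactly this way is an assumption. Near-identical sequences are not merged; that would need clustering instead.

```python
import hashlib

def normalize(seq: str) -> str:
    # Assumption: whitespace and letter case carry no meaning in the sequence.
    return "".join(seq.split()).upper()

def seq_key(seq: str) -> str:
    # MD5 hex digest of the normalized sequence; identical sequences share a key.
    return hashlib.md5(normalize(seq).encode("ascii")).hexdigest()

def collapse(records):
    """Collapse (alias, sequence) records into unique sequences.

    Returns (unique, aliases): unique maps digest -> integer id, assigned in
    first-seen order; aliases maps each id -> the aliases that name it.
    """
    unique, aliases = {}, {}
    for alias, seq in records:
        key = seq_key(seq)
        if key not in unique:
            unique[key] = len(unique) + 1
        aliases.setdefault(unique[key], []).append(alias)
    return unique, aliases
```

Predictions then run once per id in `unique`, and refreshing a source only requires computing on digests not already present.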
Mining for ITIMs the Unison way.

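                 [Figure: target architecture: Ig domain, then TM segment, then ITIM motif]

-- Ig domains (Pfam HMM, E < 0.01) followed by a TM segment followed by an ITIM.
-- The best_annotation column in the output is not selected by this query.
SELECT IG.pseq_id,
       IG.start AS ig_start, IG.stop AS ig_stop, IG.score, IG.eval,
       TM.start AS tm_start, TM.stop AS tm_stop,
       ITIM.start AS itim_start, ITIM.stop AS itim_stop
  FROM pahmm_current_pfam_v IG
  JOIN pftmhmm_tms_v TM  ON IG.pseq_id = TM.pseq_id    -- TM C-terminal to the Ig domain
                        AND IG.stop < TM.start
  JOIN pfregexp_v ITIM   ON TM.pseq_id = ITIM.pseq_id  -- ITIM C-terminal to the TM segment
                        AND TM.stop < ITIM.start
 WHERE IG.name = 'ig' AND IG.eval < 1e-2
   AND ITIM.acc = 'MOD_TYR_ITIM';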

pseq_id  ig_start  ig_stop  score      eval  tm_start  tm_stop  itim_start  itim_stop  best_annotation
    234       262      316     30  7.40E-06       440      462         518        523  UniProtKB/Swiss-Prot:SIGL5_HUMAN (RecName: Fu
    254       158      213     36  1.90E-07       284      306         386        391  UniProtKB/Swiss-Prot:VSIG4_HUMAN (RecName: F
    544       157      215     24  6.60E-04       348      370         431        436  UniProtKB/Swiss-Prot:SIGL9_HUMAN (RecName: Fu
    797       254      312     40  7.60E-09      1099     1121        1361       1366  UniProtKB/Swiss-Prot:DCC_HUMAN (RecName: Ful
   1113        42      102     30  1.20E-05       243      265         300        305  UniProtKB/Swiss-Prot:KI2L2_HUMAN (RecName: Fu
   1114        42      102     30  6.50E-06       243      265         330        335  UniProtKB/Swiss-Prot:KI2L1_HUMAN (RecName: Fu
   1115        42      102     31  4.20E-06       243      265         301        306  UniProtKB/Swiss-Prot:KI2L3_HUMAN (RecName: Fu
   1116        42       97     30  1.10E-05       339      361         396        401  UniProtKB/TrEMBL:Q95368_HUMAN (SubName: Fu
   1134       340      388     26  1.40E-04       603      625         688        693  UniProtKB/Swiss-Prot:PECA1_HUMAN (RecName: F
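The ITIM query is a chain of joins on the same sequence ID, where IG.stop < TM.start and TM.stop < ITIM.start enforce N- to C-terminal order. A minimal sketch that runs the same SELECT in SQLite against invented stand-in tables: the three view names are copied from the slide, but their column sets and rows are assumptions, and SQLite stands in for PostgreSQL only for illustration.

```python
import sqlite3

# Stand-ins for the three views; only the columns the query touches are modeled.
DDL = """
CREATE TABLE pahmm_current_pfam_v (pseq_id INT, start INT, stop INT,
                                   score REAL, eval REAL, name TEXT);
CREATE TABLE pftmhmm_tms_v (pseq_id INT, start INT, stop INT);
CREATE TABLE pfregexp_v    (pseq_id INT, start INT, stop INT, acc TEXT);
"""

QUERY = """
SELECT IG.pseq_id,
       IG.start AS ig_start, IG.stop AS ig_stop, IG.score, IG.eval,
       TM.start AS tm_start, TM.stop AS tm_stop,
       ITIM.start AS itim_start, ITIM.stop AS itim_stop
  FROM pahmm_current_pfam_v IG
  JOIN pftmhmm_tms_v TM ON IG.pseq_id = TM.pseq_id   AND IG.stop < TM.start
  JOIN pfregexp_v ITIM  ON TM.pseq_id = ITIM.pseq_id AND TM.stop < ITIM.start
 WHERE IG.name = 'ig' AND IG.eval < 1e-2
   AND ITIM.acc = 'MOD_TYR_ITIM'
 ORDER BY IG.pseq_id
"""

def demo() -> list:
    """Load three invented proteins and return the rows the query keeps."""
    con = sqlite3.connect(":memory:")
    con.executescript(DDL)
    con.executemany("INSERT INTO pahmm_current_pfam_v VALUES (?,?,?,?,?,?)", [
        (1, 40, 100, 30, 1e-5, "ig"),  # Ig, TM, ITIM in order: kept
        (2, 40, 100, 30, 1e-5, "ig"),  # ITIM lies before the TM: dropped
        (3, 40, 100, 10, 0.5, "ig"),   # weak Ig hit fails the e-value cut: dropped
    ])
    con.executemany("INSERT INTO pftmhmm_tms_v VALUES (?,?,?)",
                    [(1, 200, 222), (2, 200, 222), (3, 200, 222)])
    con.executemany("INSERT INTO pfregexp_v VALUES (?,?,?,?)", [
        (1, 260, 265, "MOD_TYR_ITIM"),
        (2, 150, 155, "MOD_TYR_ITIM"),
        (3, 260, 265, "MOD_TYR_ITIM"),
    ])
    return con.execute(QUERY).fetchall()
```

Because the joins pair every qualifying Ig, TM, and ITIM hit, a protein with several Ig domains or TM segments returns several rows; selecting DISTINCT pseq_id gives a plain hit list.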
“Are you sure about this Stan? It seems odd that a
pointy head and a long beak is what makes them fly.”
                              J. Workman, Science 245:1399 (1989)
Kiran Mukhyala

Fernando Bazan, Matt Brauer, Jason Hackney, Pete Haverty,
Ken Jung, Josh Kaminker, Nandini Krishnamurthy, Li Li, Yun Li,
Shiuh-ming Loh, Jinfeng Liu, Peng Yue, Jianjun Zhang, Yan Zhang

http://unison-db.org/
Open access web site, downloads, documentation, references

unison-db.org:5432
PostgreSQL & ODBC/JDBC/SDBC access
Unison Contents
[Figure: the data Unison links to one protein sequence, illustrated with TNF-alpha (TNFA_HUMAN, Unison:98). Panels: patents, HUGO gene symbols, homologs, GO terms, SNPs, aliases, Entrez (gene_id, symbol, locus), sequences, protein features, taxonomy, alignments, genomic loci, structures, SCOP classes, genomes, and probes.]
Ex1: Mine for sequences with conserved features.
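Ex1 can be read as: keep a sequence only if its features of interest are also predicted in its homologs. A minimal sketch, assuming the predicted feature types and homolog lists have already been fetched into dictionaries; the identifiers, the conserved_hits function, and the presence-only test (no check that features align to the same positions) are assumptions for illustration.

```python
from typing import Dict, List, Set

def conserved_hits(features: Dict[str, Set[str]],
                   homologs: Dict[str, List[str]],
                   required: Set[str],
                   min_homologs: int = 1) -> List[str]:
    """Return query sequences that carry every required feature and whose
    homologs all carry them too (requiring at least min_homologs homologs)."""
    hits = []
    for query, partners in homologs.items():
        if not required <= features.get(query, set()):
            continue  # the query itself lacks a required feature
        if len(partners) < min_homologs:
            continue  # too few homologs to call the features conserved
        if all(required <= features.get(h, set()) for h in partners):
            hits.append(query)
    return sorted(hits)

# Invented example: predicted feature types per sequence, and homolog lists.
FEATURES = {
    "hs1": {"SS", "EGF", "TM"}, "mm1": {"SS", "EGF", "TM"}, "rn1": {"SS", "TM"},
    "hs2": {"SS", "TM"}, "mm2": {"SS"},
}
HOMOLOGS = {"hs1": ["mm1", "rn1"], "hs2": ["mm2"]}
```

Here `conserved_hits(FEATURES, HOMOLOGS, {"SS", "TM"})` keeps only `"hs1"`, because hs2's homolog lacks a TM prediction. Raising `min_homologs` is one way to demand orthologs from several species.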
Ex2: Locate SNPs and domains on structures.
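Ex2 pairs SNPs such as P84L with predicted features and with an alignment from sequence positions to structure residues. A minimal sketch of that mapping, assuming substitutions are written as reference residue, 1-based position, alternate residue, and that the sequence-to-structure alignment is available as a plain dictionary; the sequence, feature ranges, and residue numbers below are invented.

```python
import re
from typing import Dict, List, Optional, Tuple

# Substitution such as "P84L": reference residue, 1-based position, alternate residue.
SUBST_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")

def parse_subst(subst: str) -> Tuple[str, int, str]:
    """Split a substitution like 'P84L' into ('P', 84, 'L')."""
    m = SUBST_RE.match(subst)
    if m is None:
        raise ValueError(f"unrecognized substitution: {subst!r}")
    return m.group(1), int(m.group(2)), m.group(3)

def locate_snp(subst: str,
               seq: str,
               features: List[Tuple[str, int, int]],
               aa_to_resid: Dict[int, int]) -> Tuple[List[str], Optional[int]]:
    """Return the features covering the SNP position and the structure residue
    aligned to it (None when the structure does not cover that position)."""
    ref, pos, _alt = parse_subst(subst)
    if not 1 <= pos <= len(seq) or seq[pos - 1] != ref:
        raise ValueError(f"{subst} does not match the sequence")
    covering = [name for name, start, stop in features if start <= pos <= stop]
    return covering, aa_to_resid.get(pos)

# Invented example data (hypothetical, not from Unison).
SEQ = "MKTAYIAKQR"
FEATURES = [("dom1", 2, 6), ("TM", 5, 9)]   # 1-based, inclusive ranges
ALIGN = {p: p + 20 for p in range(3, 9)}    # sequence 3..8 -> structure residues 23..28
```

For example, `locate_snp("I6V", SEQ, FEATURES, ALIGN)` reports both features and structure residue 26, while `K2R` falls in dom1 but outside the modeled structure.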
Unison can also help you...
➢   Answer more sophisticated questions.
    ●   Require orthologs or a specified exon structure.
➢   Annotate hits.
    ●   Annotate with locus, probes, HUGO gene name,
        structures, PubMed refs, external links.
    ●   Group splice forms by locus.
➢   Explore alternatives.
    ●   How do parameters influence results?
    ●   Try other prediction algorithms.
➢   Stay current.
    ●   When new data are available, just rerun the query.
➢   Move on.
    ●   The same data are available to other projects and
        other people.
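Grouping splice forms by locus amounts to clustering hits whose genomic spans overlap on the same chromosome and strand. A minimal sketch, assuming each hit carries a span in a chromosome-strand:start-end notation such as 6+:31651498-31653288; overlap on one strand is only a proxy for "same gene", and the function names are hypothetical.

```python
import re
from typing import Dict, List, Tuple

Span = Tuple[str, str, int, int]  # (chromosome, strand, start, end)

# Assumed notation: chromosome, strand, colon, start-end.
LOCUS_RE = re.compile(r"^(\w+)([+-]):(\d+)-(\d+)$")

def parse_locus(text: str) -> Span:
    """Parse e.g. '6+:31651498-31653288' into ('6', '+', 31651498, 31653288)."""
    m = LOCUS_RE.match(text)
    if m is None:
        raise ValueError(f"unrecognized locus: {text!r}")
    return m.group(1), m.group(2), int(m.group(3)), int(m.group(4))

def group_by_locus(hits: Dict[str, str]) -> List[List[str]]:
    """Cluster hits whose spans overlap on the same chromosome and strand."""
    spans = sorted((parse_locus(loc), name) for name, loc in hits.items())
    groups: List[List[str]] = []
    current = None  # (chromosome, strand) of the open group
    reach = -1      # furthest end coordinate seen in the open group
    for (chrom, strand, start, end), name in spans:
        if (chrom, strand) == current and start <= reach:
            groups[-1].append(name)
            reach = max(reach, end)
        else:
            groups.append([name])
            current, reach = (chrom, strand), end
    return groups
```

Sorting by chromosome, strand, and start means each new span only needs to be compared with the group currently open, so the grouping is a single pass after the sort.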

Histor y of HAM Radio presentation slide
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 

Unison: An Integrated Platform for Computational Biology Discovery

  • 1. Unison: An Integrated Platform for Computational Biology Discovery. Freely accessible and available at http://unison-db.org/. Reece Hart, Kiran Mukhyala, Genentech, Inc. Pacific Symposium on Biocomputing 2009.
  • 2. assert(Sequence Analysis != Sequence Mining)
      Sequence Analysis, i.e., show predictions for a given sequence. Typically involves minutes to hours of computing per sequence.
      Feature-Based Mining, i.e., show sequences that contain specified features. Typically entails days to months of computing to produce results.
      Data model: sequences (a non-redundant superset of all sequences); feature types/models (HMM, TM, signal, etc.); parameters (execution arguments/options for every prediction type and result); prediction results (method-specific data such as score, e-value, p-value, kinase probability, etc.).
  • 3. Unison in a Nutshell
      Components: Protein Sequences and Annotations; Domain, Homology & Structure Predictions; Structures & Ligands; Genomes, Gene Mapping & Probes; Auxiliary Annotations (Structure, GO, RIF, SCOP, etc.).
      Sequences and Annotations: UniProt, IPI, Ensembl, RefSeq, PDB, MGC, Derwent, pataa, nr, etc.; >13M seqs, >17k species, 69 origins.
      Auxiliary Data: HomoloGene, Gene Ontology, taxonomy, STRING, PHANTOM, HUGE, ROUGE, PDB, HUGO, SCOP, etc.
      Precomputed predictions: domains, homology, structure, TMs, localization, signals, disorder, etc.; >200M predictions, 23 types, ~6 CPU-years.
  • 4. Unison has many applications. Unison Web Tools, other in-house tools, ad hoc mining, and mining and analysis projects all draw on the same sequences, annotations, auxiliary data, and precomputed predictions summarized in slide 3.
  • 6. Unison is a platform for diverse tools. Matt Brauer, Guy Cavet, Josh Kaminker, Scott Lohr, Kathryn Woods, Jean Yuan, Peng Yue
  • 7. Unison facilitates complex mining: mining for TNF ligands, E3 ligases, 4H cytokines, ITxM, and deubiquitinases; analyzing SNP impact on binding interfaces. Jason Hackney, Nandini Krishnamurthy, Li Li, Yun Li, Jinfeng Liu, Shiu-ming Loh, Kiran Mukhyala
  • 8. Mining for ITIMs the old way (target: Ig, then TM, then ITIM).
      ➢ Collect sequences.
      ➢ Prune redundant sequences. (How?!)
      ➢ For each unique sequence, predict
          ● immunoglobulin domains,
          ● transmembrane domains, and
          ● ITIM motifs.
      ➢ Write a program that filters predictions.
      ➢ Summarize hits with external data.
      ➢ Do it again when source data are updated.
  • 9. Mining for ITIMs the Unison way (Ig, then TM, then ITIM).

        SELECT IG.pseq_id,
               IG.start AS ig_start, IG.stop AS ig_stop, IG.score, IG.eval,
               TM.start AS tm_start, TM.stop AS tm_stop,
               ITIM.start AS itim_start, ITIM.stop AS itim_stop
          FROM pahmm_current_pfam_v IG
          JOIN pftmhmm_tms_v TM
            ON IG.pseq_id = TM.pseq_id AND IG.stop < TM.start
          JOIN pfregexp_v ITIM
            ON TM.pseq_id = ITIM.pseq_id AND TM.stop < ITIM.start
         WHERE IG.name = 'ig' AND IG.eval < 1e-2 AND ITIM.acc = 'MOD_TYR_ITIM';

         pseq_id | ig_start | ig_stop | score |   eval   | tm_start | tm_stop | itim_start | itim_stop | best_annotation
        ---------+----------+---------+-------+----------+----------+---------+------------+-----------+-----------------
             234 |      262 |     316 |    30 | 7.40E-06 |      440 |     462 |        518 |       523 | UniProtKB/Swiss-Prot:SIGL5_HUMAN (RecName: Fu
             254 |      158 |     213 |    36 | 1.90E-07 |      284 |     306 |        386 |       391 | UniProtKB/Swiss-Prot:VSIG4_HUMAN (RecName: F
             544 |      157 |     215 |    24 | 6.60E-04 |      348 |     370 |        431 |       436 | UniProtKB/Swiss-Prot:SIGL9_HUMAN (RecName: Fu
             797 |      254 |     312 |    40 | 7.60E-09 |     1099 |    1121 |       1361 |      1366 | UniProtKB/Swiss-Prot:DCC_HUMAN (RecName: Ful
            1113 |       42 |     102 |    30 | 1.20E-05 |      243 |     265 |        300 |       305 | UniProtKB/Swiss-Prot:KI2L2_HUMAN (RecName: Fu
            1114 |       42 |     102 |    30 | 6.50E-06 |      243 |     265 |        330 |       335 | UniProtKB/Swiss-Prot:KI2L1_HUMAN (RecName: Fu
            1115 |       42 |     102 |    31 | 4.20E-06 |      243 |     265 |        301 |       306 | UniProtKB/Swiss-Prot:KI2L3_HUMAN (RecName: Fu
            1116 |       42 |      97 |    30 | 1.10E-05 |      339 |     361 |        396 |       401 | UniProtKB/TrEMBL:Q95368_HUMAN (SubName: Fu
            1134 |      340 |     388 |    26 | 1.40E-04 |      603 |     625 |        688 |       693 | UniProtKB/Swiss-Prot:PECA1_HUMAN (RecName: F
  • 10. “Are you sure about this Stan? It seems odd that a pointy head and a long beak is what makes them fly.” J. Workman, Science 245:1399 (1989)
  • 11. Kiran Mukhyala; Fernando Bazan, Matt Brauer, Jason Hackney, Pete Haverty, Ken Jung, Josh Kaminker, Nandini Krishnamurthy, Li Li, Yun Li, Shiuh-ming Loh, Jinfeng Liu, Peng Yue, Jianjun Zhang, Yan Zhang.
      http://unison-db.org/: open-access web site, downloads, documentation, references.
      unison-db.org:5432: PostgreSQL & ODBC/JDBC/SDBC access.
  • 12. Unison Contents
      sequences: >Unison:98 MSTESMIRDVE...FGIIAL; >Unison:23782 VRSSSRTPSD...FGIIAL
      aliases: TNFA_HUMAN, Q1XHZ6, IPI00001671.1, INCY:1109711.FL1p, CCDS4702.1, gi:25952111
      Entrez: gene_id, symbol, locus
      HUGO: TNFSF9, TNFSF10, TNFSF11
      patents: Geneseq:AAP60074, 1991-10-29, SUNTORY, EP205038-A; New tumour...
      homologs: NP_000585.2 / NP_036807.1 | RAT; NP_000585.2 / NP_038721.1 | MOUSE; NP_000585.2 / XP_858423.1 | CANFA
      taxonomy: 9606 Homo sapiens; 10090 Mus musculus; 10028 Rattus rattus
      GO: Function, transcription, initiation, elongation
      SNPs: P84L, A94T
      protein features (start | stop | score | type): 1 | 23 | | SS; 108 | 143 | 1.8e-06 | EGF; 162 | 184 | | TM; 133 | 138 | | ITIM
      alignments: TNFA vs. 1tnfA, TNFA vs. 1tnfB, ..., TNFA vs. 5tswF
      aa-to-resid: MSTESMIR DVEFGIIA / TESMIRDV IIAMDAC
      loci: 1 233, 6+:31651498-31653288
      structures: 1tnf, 1a8m, 2tun, 4tsv, 5tsw
      SCOP: all alpha, all beta, alpha+beta; Ig, TNF-like
      genomes: Hs35, Hs36, RAT
      probes: HGU133P, WHG
  • 13. Ex1: Mine for sequences w/conserved features.
  • 14. Ex2: Locate SNPs and Domains on Structure
  • 15. Unison can also help you...
      ➢ Answer more sophisticated questions.
          ● Require orthologs or a specified exon structure.
      ➢ Annotate hits.
          ● Annotate with locus, probes, HUGO gene name, structures, PubMed refs, external links.
          ● Group splice forms by locus.
      ➢ Explore alternatives.
          ● How do parameters influence results?
          ● Try other prediction algorithms.
      ➢ Stay current.
          ● When new data are available, just rerun the query.
      ➢ Move on.
          ● The same data are available to other projects and other people.
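Slide 2's distinction between sequence analysis and feature-based mining comes down to how precomputed results are indexed: by sequence, or by feature. A minimal Python sketch, assuming a simplified in-memory model (the names `Prediction`, `PredictionStore`, `pseq_id`, and `params_id` are hypothetical and only loosely modeled on the slide, not Unison's actual schema), shows one set of results serving both access patterns:

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Prediction:
    pseq_id: int                    # non-redundant sequence id (hypothetical)
    params_id: int                  # execution arguments/options used (hypothetical)
    model: str                      # feature type/model, e.g. "ig" or "TM"
    start: int
    stop: int
    evalue: Optional[float] = None  # method-specific score, if any


class PredictionStore:
    """Indexes precomputed predictions for both access patterns."""

    def __init__(self, predictions):
        self.by_seq = defaultdict(list)   # pseq_id -> predictions
        self.by_model = defaultdict(set)  # (params_id, model) -> pseq_ids
        for p in predictions:
            self.by_seq[p.pseq_id].append(p)
            self.by_model[(p.params_id, p.model)].add(p.pseq_id)

    def analyze(self, pseq_id):
        """Sequence analysis: all predictions for one sequence, by start."""
        return sorted(self.by_seq.get(pseq_id, []), key=lambda p: p.start)

    def mine(self, params_id, *models):
        """Feature-based mining: sequences that have every listed feature."""
        if not models:
            return set()
        hits = [self.by_model.get((params_id, m), set()) for m in models]
        return set.intersection(*hits)


store = PredictionStore([
    Prediction(1, 1, "ig", 40, 100, 1e-5),
    Prediction(1, 1, "TM", 240, 262),
    Prediction(2, 1, "ig", 30, 90, 1e-3),
])
print(store.mine(1, "ig", "TM"))  # -> {1}
```

Keying the mining index by (parameter set, model) reflects the slide's point that a prediction result is only meaningful together with the execution arguments that produced it.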
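The "old way" on slide 8 needs two ad hoc programs: one to prune redundant sequences and one to filter predictions for an Ig domain, then a TM segment, then an ITIM. A minimal sketch, assuming redundancy means exact sequence identity and using an MD5 digest of the cleaned residues as the key (a common choice, not necessarily what Unison does), with predictions supplied as hypothetical `(kind, start, stop, evalue)` tuples:

```python
import hashlib
import re


def dedupe(records):
    """Collapse records whose cleaned sequences are identical.

    records: iterable of (name, sequence) pairs.
    Returns {md5_hex: {"seq": cleaned_sequence, "names": [names...]}}.
    """
    unique = {}
    for name, seq in records:
        cleaned = re.sub(r"\s+", "", seq).upper()  # drop whitespace, normalize case
        key = hashlib.md5(cleaned.encode("utf-8")).hexdigest()
        entry = unique.setdefault(key, {"seq": cleaned, "names": []})
        entry["names"].append(name)
    return unique


def has_ig_tm_itim(features, max_ig_eval=1e-2):
    """True if an Ig domain ends before a TM segment starts, and that TM
    segment ends before an ITIM starts (the same ordering as slide 9's joins).

    features: list of (kind, start, stop, evalue) tuples for one sequence,
    where kind is "ig", "tm" or "itim" (hypothetical labels).
    """
    igs = [f for f in features
           if f[0] == "ig" and f[3] is not None and f[3] < max_ig_eval]
    tms = [f for f in features if f[0] == "tm"]
    itims = [f for f in features if f[0] == "itim"]
    return any(ig[2] < tm[1] and tm[2] < itim[1]
               for ig in igs for tm in tms for itim in itims)
```

Every step here has to be rerun by hand when the source databases change, which is the burden the Unison approach removes.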
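To see what the slide 9 join does without access to the Unison server, the same query can be run against toy SQLite tables that borrow the view names. This is a sketch under stated assumptions: the tables hold only the columns the query touches, and the rows are invented to exercise each join condition.

```python
import sqlite3

# Toy stand-ins for the Unison views; columns are a minimal, assumed subset.
SCHEMA = """
CREATE TABLE pahmm_current_pfam_v (pseq_id INT, name TEXT, start INT, stop INT, score REAL, eval REAL);
CREATE TABLE pftmhmm_tms_v (pseq_id INT, start INT, stop INT);
CREATE TABLE pfregexp_v (pseq_id INT, acc TEXT, start INT, stop INT);
"""

QUERY = """
SELECT IG.pseq_id, IG.start AS ig_start, IG.stop AS ig_stop, IG.score, IG.eval,
       TM.start AS tm_start, TM.stop AS tm_stop,
       ITIM.start AS itim_start, ITIM.stop AS itim_stop
  FROM pahmm_current_pfam_v IG
  JOIN pftmhmm_tms_v TM ON IG.pseq_id = TM.pseq_id AND IG.stop < TM.start
  JOIN pfregexp_v ITIM ON TM.pseq_id = ITIM.pseq_id AND TM.stop < ITIM.start
 WHERE IG.name = 'ig' AND IG.eval < 1e-2 AND ITIM.acc = 'MOD_TYR_ITIM'
 ORDER BY IG.pseq_id
"""


def run_demo():
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.executemany("INSERT INTO pahmm_current_pfam_v VALUES (?,?,?,?,?,?)", [
        (1, "ig", 40, 100, 30.0, 1e-5),   # passes: Ig -> TM -> ITIM
        (2, "ig", 40, 100, 5.0, 0.5),     # fails: E-value above 1e-2
        (3, "ig", 300, 360, 30.0, 1e-6),  # fails: Ig lies after the TM
    ])
    con.executemany("INSERT INTO pftmhmm_tms_v VALUES (?,?,?)",
                    [(1, 240, 262), (2, 240, 262), (3, 240, 262)])
    con.executemany("INSERT INTO pfregexp_v VALUES (?,?,?,?)",
                    [(1, "MOD_TYR_ITIM", 300, 305),
                     (2, "MOD_TYR_ITIM", 300, 305),
                     (3, "MOD_TYR_ITIM", 400, 405)])
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


print(run_demo())  # -> [(1, 40, 100, 30.0, 1e-05, 240, 262, 300, 305)]
```

The positional conditions in the `ON` clauses (`IG.stop < TM.start`, `TM.stop < ITIM.start`) are what encode the domain order; the `WHERE` clause only selects which models and thresholds apply.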
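Slides 13 and 15 mention mining for conserved features and requiring orthologs. One simple interpretation keeps a hit only when its orthologs in chosen species are also hits. A minimal sketch, assuming ortholog assignments are already available as a dictionary (they could come from an auxiliary source such as HomoloGene; the function name and identifiers are hypothetical):

```python
def conserved_hits(hits, orthologs, required_species):
    """Keep hits whose orthologs in every required species are also hits.

    hits: set of sequence ids that satisfy a feature query
    orthologs: dict mapping a sequence id to {species: ortholog sequence id}
    required_species: iterable of species names that must carry the feature too
    """
    required = list(required_species)
    conserved = set()
    for seq_id in hits:
        orths = orthologs.get(seq_id, {})
        # A missing ortholog (None) is never in hits, so it fails the check.
        if all(orths.get(sp) in hits for sp in required):
            conserved.add(seq_id)
    return conserved


# Hypothetical identifiers: h1/h2 are human hits, m1/r1 their orthologs.
hits = {"h1", "h2", "m1"}
orthologs = {
    "h1": {"mouse": "m1", "rat": "r1"},
    "h2": {"mouse": "m2"},
}
print(sorted(conserved_hits(hits, orthologs, ["mouse"])))  # -> ['h1']
```

Requiring more species tightens the filter: with `["mouse", "rat"]` the result is empty, because the rat ortholog `r1` is not a hit.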
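The Ex2 question on slide 14 reduces to interval overlap plus a sequence-to-structure residue mapping. A minimal sketch, assuming feature coordinates and an alignment-derived position map are already in hand (the function names, the `chain:residue` labels, and the toy data are hypothetical, not Unison output):

```python
import re


def parse_substitution(snp):
    """Parse a protein substitution such as 'P84L' into (ref, position, alt)."""
    m = re.fullmatch(r"([A-Z])(\d+)([A-Z])", snp)
    if m is None:
        raise ValueError(f"unrecognized substitution: {snp!r}")
    return m.group(1), int(m.group(2)), m.group(3)


def locate_snp(snp, sequence, features, seq_to_resid):
    """Report which features overlap a substitution and its structure residue.

    features: list of (name, start, stop) in 1-based inclusive coordinates
    seq_to_resid: dict from 1-based sequence position to a structure residue
                  label, standing in for a sequence-to-structure alignment
    """
    ref, pos, alt = parse_substitution(snp)
    if not 1 <= pos <= len(sequence):
        raise ValueError(f"{snp}: position outside sequence of length {len(sequence)}")
    if sequence[pos - 1] != ref:
        raise ValueError(f"{snp}: sequence has {sequence[pos - 1]} at {pos}")
    overlapping = [name for name, start, stop in features if start <= pos <= stop]
    return {"snp": snp, "alt": alt, "features": overlapping,
            "resid": seq_to_resid.get(pos)}


# Toy data only; these are not real TNF-alpha coordinates.
seq = "MKTAYIAKQR"
feats = [("SS", 1, 3), ("EGF", 4, 8)]
resid = {4: "A:12", 5: "A:13"}
print(locate_snp("A4V", seq, feats, resid))
```

Checking the reference residue first guards against mixing coordinates from different sequence versions, a real hazard when SNPs, features, and structures come from separate sources.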
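Slide 15's "Group splice forms by locus" can be read as clustering sequences whose genomic spans overlap on the same chromosome and strand. A minimal sketch, assuming loci are strings in the `6+:31651498-31653288` style shown on slide 12 and interpreting that as chromosome, strand, start, and stop (an assumption; the function names are hypothetical):

```python
import re
from collections import defaultdict

LOCUS_RE = re.compile(r"(\w+)([+-]):(\d+)-(\d+)")


def parse_locus(locus):
    """Parse 'chrom strand:start-stop', e.g. '6+:31651498-31653288'."""
    m = LOCUS_RE.fullmatch(locus)
    if m is None:
        raise ValueError(f"unrecognized locus: {locus!r}")
    chrom, strand, start, stop = m.groups()
    return chrom, strand, int(start), int(stop)


def group_by_locus(loci):
    """Cluster sequences whose spans overlap on the same chromosome and strand.

    loci: dict mapping sequence id -> locus string
    Returns a sorted list of clusters, each a sorted list of sequence ids.
    """
    by_strand = defaultdict(list)
    for seq_id, locus in loci.items():
        chrom, strand, start, stop = parse_locus(locus)
        by_strand[(chrom, strand)].append((start, stop, seq_id))
    clusters = []
    for spans in by_strand.values():
        spans.sort()  # sweep left to right, merging overlapping spans
        current, current_end = [], None
        for start, stop, seq_id in spans:
            if current and start <= current_end:
                current.append(seq_id)
                current_end = max(current_end, stop)
            else:
                if current:
                    clusters.append(sorted(current))
                current, current_end = [seq_id], stop
        if current:
            clusters.append(sorted(current))
    return sorted(clusters)
```

Overlap-based clustering is deliberately loose; requiring shared exons instead would separate distinct genes that happen to overlap on the same strand.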