Vivek Krishnakumar¹, Chia-Yi Cheng¹, Maria Kim¹, Erik Ferlanti¹, Irina Belyaeva¹, Seth Schobel¹, Sergio Contrino³, Matthew R. Hanlon², Walter Moreira², Steve Mock², Joe Stubbs², Agnes P. Chan¹, Jason R. Miller¹, Matthew W. Vaughn², Gos Micklem³, Christopher D. Town¹

¹J. Craig Venter Institute, Rockville, MD, USA; ²Texas Advanced Computing Center, Austin, TX, USA; ³Cambridge University, Cambridge, UK
Araport, the Arabidopsis Information Portal (https://www.araport.org), is an open-access online resource for the Arabidopsis research community, funded by the NSF and BBSRC. Since its inception in late 2013, Araport's goal has been to provide users with a “one-stop-shop” through data federation. Araport exposes a searchable index of TAIR10 genomic data, along with additional datasets from UniProt (protein), BAR (expression), EPIC-CoGe (epigenomics), IntAct (interaction networks), ATTED-II (co-expression), PubMed (literature), and other diverse, geographically dispersed resources, using a combination of data warehousing and state-of-the-art web technologies. Araport incorporates and integrates GMOD software, including InterMine, JBrowse, GBrowse, WebApollo, Tripal, and Chado. Araport has inherited from TAIR the responsibility of providing continued access to up-to-date structural and functional annotation of the Col-0 genome. Later this year, Araport will release the Araport11 annotation update, which includes over 1,000 novel protein-coding gene loci and ~50k splice variants from ~28k gene loci, derived from 11 tissue-specific bins of RNA-seq datasets spanning over 100 SRA accessions, as well as various classes of non-coding RNA.
Araport: Data Integration for the Arabidopsis Research Community
Araport (https://www.araport.org)
“One-stop-shop” for Arabidopsis data
ThaleMine report pages present a comprehensive set of data integrated from a variety of sources.
The report below shows up-to-date information about EMBRYO DEFECTIVE 2770, including GO annotations, publications, array-based expression, protein–protein interactions, metabolic pathways, and homologs in other plant species.
Araport11 Annotation Pipeline (flowchart)
RNA-seq: 113 SRA accessions binned by 11 tissues/organs → TopHat alignment to TAIR10 → genome-guided Trinity assembly; in parallel, de novo Trinity assembly → de novo and genome-guided assemblies concatenated for each tissue/organ → 11 transcriptomes assembled by PASA
Reference: NCBI and MAKER-P assemblies → novel transcripts appended to TAIR10 → augmented TAIR10
Annotation update by PASA → consolidation of the 11 transcriptomes
UniProt proteins → protein alignment → filtering → unique models
Novel transcribed regions and literature → filtering → novel loci
Re-indexing of updated gene models → Araport11 protein-coding genes
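The binning and dual-assembly steps at the top of the pipeline can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' workflow: the run labels, FASTQ paths, tissue names, memory/CPU settings, the 10,000 bp maximum intron length, and the per-tissue TopHat output directory `tophat_<tissue>` are all hypothetical. The argument lists use Trinity's `--seqType`, `--left`/`--right`, `--genome_guided_bam` and `--genome_guided_max_intron` options; the code only builds the commands and does not run them.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Run:
    accession: str  # SRA run label (hypothetical placeholders below)
    tissue: str     # tissue/organ bin this run belongs to
    left: str       # read-1 FASTQ path (paired-end data assumed)
    right: str      # read-2 FASTQ path


def bin_by_tissue(runs: List[Run]) -> Dict[str, List[Run]]:
    """Group runs into one bin per tissue/organ label, preserving input order."""
    bins: Dict[str, List[Run]] = defaultdict(list)
    for run in runs:
        bins[run.tissue].append(run)
    return dict(bins)


def trinity_commands(tissue: str, runs: List[Run],
                     max_memory: str = "50G", cpu: int = 8) -> Tuple[List[str], List[str]]:
    """Build de novo and genome-guided Trinity argument lists for one bin.

    Assumes (hypothetically) that TopHat alignments for the bin are in
    tophat_<tissue>/accepted_hits.bam, which TopHat writes coordinate-sorted.
    """
    left = ",".join(r.left for r in runs)    # Trinity accepts comma-separated file lists
    right = ",".join(r.right for r in runs)
    de_novo = ["Trinity", "--seqType", "fq", "--max_memory", max_memory,
               "--CPU", str(cpu), "--left", left, "--right", right,
               "--output", f"trinity_denovo_{tissue}"]
    genome_guided = ["Trinity", "--genome_guided_bam", f"tophat_{tissue}/accepted_hits.bam",
                     "--genome_guided_max_intron", "10000",
                     "--max_memory", max_memory, "--CPU", str(cpu),
                     "--output", f"trinity_gg_{tissue}"]
    return de_novo, genome_guided


# Hypothetical example: three runs falling into two tissue bins.
runs = [
    Run("RUN_A", "root", "RUN_A_1.fq", "RUN_A_2.fq"),
    Run("RUN_B", "root", "RUN_B_1.fq", "RUN_B_2.fq"),
    Run("RUN_C", "leaf", "RUN_C_1.fq", "RUN_C_2.fq"),
]
for tissue, members in sorted(bin_by_tissue(runs).items()):
    for cmd in trinity_commands(tissue, members):
        print(" ".join(cmd))
```

Keeping each command as an argument list rather than a shell string means it could be passed directly to `subprocess.run` if one chose to execute it.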
The JBrowse genome viewer presents data organized into hierarchical and faceted track lists.
The genomic region shown below displays features in the vicinity of EMBRYO DEFECTIVE 2770, highlighting Col-0 methylation data retrieved on the fly from EPIC-CoGe, paired-end analysis of TSS (PEAT) peaks, TDNA-seq-based insertion sites, and 1001 Genomes variants alongside the updated Araport11 annotation set.
| Category | TAIR10 | Araport11 | Description |
|---|---|---|---|
| Long intergenic noncoding RNA (lincRNA) | | 2,708 | The 2,708 intergenic transcripts were detected by tiling arrays and confirmed by RNA-seq (Liu et al., 2012). |
| Natural antisense transcript (NAT) | | 2,980 | Li et al. (2013) identified 1,490 NAT pairs in whole-root samples using strand-specific RNA-seq followed by computational analysis (NASTIseq). |
| microRNA (miRNA) | 177 | 427 | miRBase 21 |
| Small nucleolar RNA (snoRNA) | 71 | 287 | Sherstnev et al. (2012) incorporated data from TAIR, PlantDB, Chen and Wu (2009), and Kim et al. (2010) and annotated 287 snoRNAs. |
| tRNA | 689 | 689 | |
| Small nuclear RNA (snRNA) | 13 | 13 | |
| Small RNA | | 24,575 | We used ShortStack (Axtell, 2013), a tool designed for annotating small RNA genes, to analyze public data sets (Law et al., 2013). ShortStack recapitulated >99% of the siRNA clusters reported by Law et al. (2013), which were based on the TAIR8 genome. We ran ShortStack in 'de novo discovery mode', supplemented with TAIR10 and miRBase 21 as references, and identified 24,575 non-miRNA, non-hairpin small RNA loci. |
| rRNA | 15 | 15 | |
| Other RNA | 394 | | |
| Total | 1,359 | 31,681 | |
Araport11 protein-coding gene annotation: The TAIR10 annotation was supplemented with novel transcripts from NCBI and MAKER-P assemblies and used as the reference annotation set. RNA-seq reads from SRA were grouped into 11 tissue/organ types and assembled with Trinity; tissue-specific transcriptomes were reconstructed from a hybrid of de novo and genome-guided assemblies. The PASA-based annotation update was performed independently for each tissue group to avoid creating chimeric transcripts, and the 11 resulting transcriptomes were consolidated using a custom Python script that collapses isoforms differing in terminal UTR length. Around 300 UniProt protein records inconsistent with TAIR10 were evaluated, filtered, and appended to the PASA-updated set. Additional novel transcripts extracted from PASA and from the literature were used to identify further novel loci. The updated gene models and novel loci in Araport11 will be re-indexed with appropriate locus and isoform identifiers and released for community review.
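The consolidation step is described only as a custom Python script. The sketch below is a hypothetical reimplementation of the stated idea, not the authors' script: it assumes that "isoforms differing in terminal UTR length" means transcripts sharing chromosome, strand, intron chain, and CDS span, and it keeps the longest transcript of each group, which is an assumed policy. All class, function, and transcript names are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive genomic coordinates


@dataclass(frozen=True)
class Transcript:
    tid: str
    chrom: str
    strand: str
    exons: Tuple[Interval, ...]  # sorted by start
    cds: Interval                # genomic span from start codon to stop codon

    @property
    def introns(self) -> Tuple[Interval, ...]:
        """Intron chain: the gaps between consecutive exons."""
        return tuple((prev_end + 1, next_start - 1)
                     for (_, prev_end), (next_start, _) in zip(self.exons, self.exons[1:]))

    @property
    def span(self) -> int:
        return self.exons[-1][1] - self.exons[0][0] + 1


def collapse_terminal_utr_variants(transcripts: List[Transcript]) -> Dict[str, List[str]]:
    """Collapse transcripts that differ only in terminal UTR length.

    Transcripts sharing chromosome, strand, intron chain and CDS span have the
    same internal structure and coding sequence; they can differ only in where
    the first exon starts and the last exon ends. The longest member of each
    group is kept. Returns {kept_id: [ids collapsed into it]}.
    """
    groups: Dict[tuple, List[Transcript]] = {}
    for t in transcripts:
        key = (t.chrom, t.strand, t.introns, t.cds)
        groups.setdefault(key, []).append(t)

    kept: Dict[str, List[str]] = {}
    for members in groups.values():
        rep = max(members, key=lambda t: t.span)  # first longest wins on ties
        kept[rep.tid] = [t.tid for t in members if t.tid != rep.tid]
    return kept


# Hypothetical transcripts from two tissue-specific PASA runs.
tx = [
    Transcript("root.t1", "Chr1", "+", ((100, 300), (400, 900)), (150, 800)),
    Transcript("leaf.t1", "Chr1", "+", ((120, 300), (400, 950)), (150, 800)),  # differs only in UTR ends
    Transcript("leaf.t2", "Chr1", "+", ((100, 300), (450, 900)), (150, 800)),  # different intron
]
print(collapse_terminal_utr_variants(tx))  # {'leaf.t1': ['root.t1'], 'leaf.t2': []}
```

Grouping on an exact key keeps the pass linear in the number of transcripts; a fuller version would also need a rule for non-coding transcripts, which have no CDS span to key on.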

Statistics: Araport11 updated 80.3% (28,429/35,385) of TAIR10 protein-coding gene models, of which 3.3% (933/28,429) and 88.2% (25,079/28,429) had altered CDS and UTRs, respectively. A total of 1,162 new loci and 14,880 new gene models were added. 38.3% of protein-coding genes (18% in TAIR10) now have additional splice variants. Overall, the Araport11 pre-release contains 28,565 protein-coding gene loci encompassing 50,265 gene models.
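The percentages in the statistics can be re-derived from the reported counts. A minimal Python check, whose only inputs are the numbers stated above:

```python
# Counts as reported in the Araport11 pre-release statistics.
tair10_models = 35_385     # TAIR10 protein-coding gene models
updated = 28_429           # models updated in Araport11
cds_altered = 933          # updated models with altered CDS
utr_altered = 25_079       # updated models with altered UTRs
araport11_loci = 28_565
araport11_models = 50_265


def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


print(pct(updated, tair10_models))                  # 80.3
print(pct(cds_altered, updated))                    # 3.3
print(pct(utr_altered, updated))                    # 88.2
print(round(araport11_models / araport11_loci, 2))  # 1.76 gene models per locus
```

Note that the CDS and UTR percentages use the 28,429 updated models, not all TAIR10 models, as the denominator.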
Araport11 non-coding RNA annotation
Publications
1. Araport: the Arabidopsis Information Portal. Nucleic Acids Research (2014). doi:10.1093/nar/gku1200
2. The Arabidopsis Information Portal: An Application Platform for Data Discovery. Proceedings of the 9th Gateway Computing Environments Workshop (2014). doi:10.1109/GCE.2014.10
Acknowledgements
We thank the NCBI RefSeq team and the Mark Yandell lab for sharing the TAIR10 re-annotation data, the authors of the RNA-seq data sets used in our coding and non-coding RNA annotation, and Michael Axtell (PSU) and Ho-Ming Chen (Academia Sinica) for helpful discussions.

Araport Data Integration - 2015 UMD Minisymposium

Araport: Data Integration for the Arabidopsis Research Community

Vivek Krishnakumar¹, Chia-Yi Cheng¹, Maria Kim¹, Erik Ferlanti¹, Irina Belyaeva¹, Seth Schobel¹, Sergio Contrino³, Matthew R. Hanlon², Walter Moreira², Steve Mock², Joe Stubbs², Agnes P. Chan¹, Jason R. Miller¹, Matthew W. Vaughn², Gos Micklem³, Christopher D. Town¹

¹J. Craig Venter Institute, Rockville, MD, USA; ²Texas Advanced Computing Center, Austin, TX, USA; ³Cambridge University, Cambridge, UK

Araport, the Arabidopsis Information Portal (https://www.araport.org), is an open-access online resource for the Arabidopsis research community, funded by the NSF and BBSRC. Since its inception in late 2013, the goal of Araport has been to provide users with a “one-stop-shop” through data federation. Araport exposes a searchable index of TAIR10 genomic data, as well as additional datasets from UniProt (protein), BAR (expression), EPIC-CoGe (epigenomics), IntAct (interaction networks), ATTED-II (co-expression), PubMed (literature), and other diverse, geographically dispersed resources, using a combination of warehousing and state-of-the-art web technologies. Araport incorporates and integrates GMOD software, including InterMine, JBrowse, GBrowse, WebApollo, Tripal, and Chado. Araport has inherited from TAIR the responsibility of providing continued access to up-to-date structural and functional annotation for the Col-0 genome. Later this year, the Araport11 annotation update will be released, including over 1,000 novel protein-coding gene loci and ~50k splice variants derived from ~28k gene loci using 11 tissue-specific bins of RNA-seq datasets spanning over 100 SRA accessions, as well as various classes of non-coding RNA.

Araport (https://www.araport.org)
“One-stop-shop” for Arabidopsis data

ThaleMine report pages present a comprehensive set of data integrated from a variety of sources.
The report below shows up-to-date information about EMBRYO DEFECTIVE 2770, such as GO annotations, publications, array-based expression, protein–protein interactions, metabolic pathways, and homologs in other plant species.

[Figure: Araport11 Annotation Pipeline. Flowchart boxes: 113 SRA accessions; binned by 11 tissue/organ types; TopHat alignment to TAIR10; genome-guided Trinity assembly; de novo Trinity assembly; concatenating de novo and genome-guided assemblies for each tissue/organ; 11 transcriptomes assembled by PASA; NCBI and MAKER-P assemblies; unique models filtering; augmented TAIR10; annotation update by PASA; consolidating 11 transcriptomes; UniProt protein; protein alignment; novel transcribed regions; literature; filtering novel loci; appending novel transcripts to TAIR10; re-indexing updated gene models; Araport11 protein-coding genes.]

The JBrowse genome viewer presents data organized into hierarchical and faceted track lists. The genomic region shown below contains the features in the vicinity of EMBRYO DEFECTIVE 2770, highlighting Col-0 methylation data retrieved on the fly from EPIC-CoGe, paired-end analysis of TSS (PEAT) peaks, TDNA-seq-based insertion sites, and 1001 Genomes variants alongside the updated Araport11 annotation set.

Category | TAIR10 | Araport11 | Description
Long intergenic noncoding RNA (lincRNA) | - | 2,708 | The 2,708 intergenic transcripts were detected by tiling array and confirmed by RNA-seq (Liu et al., 2012).
Natural antisense transcript (NAT) | - | 2,980 | Li et al. (2013) identified 1,490 NAT pairs in whole-root samples using strand-specific RNA-seq followed by computational analysis (NASTIseq).
microRNA (miRNA) | 177 | 427 | miRBase 21
Small nucleolar RNA (snoRNA) | 71 | 287 | Sherstnev et al. (2012) incorporated data from TAIR, PlantDB, Chen and Wu (2009), and Kim et al. (2010) and annotated 287 snoRNAs.
tRNA | 689 | 689 |
Small nuclear RNA (snRNA) | 13 | 13 |
Small RNA | - | 24,575 | We used ShortStack (Axtell, 2013), software designed for annotating small RNA genes, to analyze public data sets (Law et al., 2013). ShortStack recapitulated >99% of the siRNA clusters reported by Law et al. (2013), which were based on the TAIR8 genome. We ran ShortStack in 'de novo discovery mode', supplemented with TAIR10 and miRBase 21 as references, and identified 24,575 non-miRNA, non-hairpin small RNA (smRNA) loci.
rRNA | 15 | 15 |
Other RNA | 394 | - |
Total | 1,359 | 31,681 |

Araport11 protein-coding gene annotation: The TAIR10 annotation was supplemented with novel transcripts from NCBI and MAKER-P assemblies and used as the reference annotation set. RNA-seq reads from SRA were grouped into 11 tissue/organ types and assembled with Trinity; tissue-specific transcriptomes were reconstructed from a hybrid of de novo and genome-guided assemblies. The PASA-based annotation update was performed independently for each tissue group to avoid constructing chimeric transcripts, and the 11 transcriptomes were consolidated using a custom Python script that collapses isoforms differing in terminal UTR length. Around 300 UniProt protein records inconsistent with TAIR10 were evaluated, filtered, and appended to the PASA-updated set. Additional novel transcripts extracted from PASA and the literature were used to further quantify novel loci. Updated gene models and novel loci that are part of Araport11 will be re-indexed with appropriate locus and isoform identifiers and released for community review.

Statistics: Araport11 updated 80.3% (28,429/35,385) of TAIR10 protein-coding gene models, of which 3.3% (933/28,429) had altered CDS and 88.2% (25,079/28,429) had altered UTRs. A total of 1,162 new loci and 14,880 new gene models were added. 38.3% of protein-coding genes (versus 18% in TAIR10) now have additional splice variants. Overall, the Araport11 pre-release contains 28,565 protein-coding gene loci encompassing 50,265 gene models.
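The poster does not show the custom consolidation script, so the following is only a minimal Python sketch of one way to collapse isoforms across the 11 tissue transcriptomes. It assumes that "isoforms differing in terminal UTR length" can be approximated as multi-exon models on the same chromosome and strand with an identical intron chain; keeping the longest-spanning model and pooling the supporting tissue labels are illustrative choices. The `Transcript` class, `collapse_isoforms` function, and the example IDs and coordinates are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

Exon = Tuple[int, int]  # (start, end), 1-based, inclusive genomic coordinates


@dataclass
class Transcript:
    tid: str
    chrom: str
    strand: str
    exons: List[Exon]
    tissues: Set[str] = field(default_factory=set)  # tissue bins supporting this model

    def intron_chain(self) -> Tuple[Exon, ...]:
        """Return the ordered (start, end) introns implied by the exons."""
        ex = sorted(self.exons)
        return tuple((ex[i][1] + 1, ex[i + 1][0] - 1) for i in range(len(ex) - 1))

    def span(self) -> int:
        """Genomic length from the first exon start to the last exon end."""
        ex = sorted(self.exons)
        return ex[-1][1] - ex[0][0] + 1


def collapse_isoforms(transcripts: List[Transcript]) -> List[Transcript]:
    """Merge multi-exon isoforms sharing chromosome, strand and intron chain.

    Such isoforms can differ only in where the first exon starts and the last
    exon ends (assumed here to reflect terminal UTR length). The longest-spanning
    model is kept as the representative and the supporting tissues are pooled.
    Single-exon transcripts have no intron chain and are passed through as-is.
    """
    merged: Dict[tuple, Transcript] = {}
    singletons: List[Transcript] = []
    for t in transcripts:
        chain = t.intron_chain()
        if not chain:
            singletons.append(t)
            continue
        key = (t.chrom, t.strand, chain)
        rep = merged.get(key)
        if rep is None:
            # Copy so the caller's input objects are not mutated.
            merged[key] = Transcript(t.tid, t.chrom, t.strand, list(t.exons), set(t.tissues))
        else:
            rep.tissues |= t.tissues
            if t.span() > rep.span():
                rep.tid, rep.exons = t.tid, list(t.exons)
    return list(merged.values()) + singletons


# Hypothetical models from two tissue bins (IDs and coordinates are made up).
models = [
    Transcript("leaf.1", "Chr1", "+", [(100, 200), (300, 400), (500, 650)], {"leaf"}),
    # Same introns as leaf.1 but longer terminal exons: collapsed into one model.
    Transcript("root.7", "Chr1", "+", [(80, 200), (300, 400), (500, 700)], {"root"}),
    # Skips the middle exon, so its intron chain differs: kept as a separate isoform.
    Transcript("root.8", "Chr1", "+", [(100, 200), (500, 650)], {"root"}),
]
for m in collapse_isoforms(models):
    print(m.tid, sorted(m.tissues), m.exons)
```

Keying on the intron chain turns the merge into a single dictionary pass. A fuller version would also need a policy for single-exon transcripts (for example, merging only overlapping ones) and could extend the representative to the outermost start and end rather than choosing one existing model.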
[Panel: Araport11 non-coding RNA annotation]

Publications
1. Araport: the Arabidopsis Information Portal. Nucleic Acids Research (2014). doi: 10.1093/nar/gku1200
2. The Arabidopsis Information Portal: An Application Platform for Data Discovery. Proceedings of the 9th Gateway Computing Environments Workshop (2014). doi: 10.1109/GCE.2014.10

Acknowledgements
We thank the NCBI RefSeq team and the Mark Yandell lab for sharing the TAIR10 re-annotation data, the authors of the RNA-seq data sets used in our coding and non-coding RNA annotation, and Michael Axtell (PSU) and Ho-Ming Chen (Academia Sinica) for helpful discussions.