Predicting lncRNA Transcripts Out of Comprehensive Rat Renal Cell-Type-Specific Transcriptome Libraries
Gui Chen
11/20/2015
WHY LONG NON-CODING RNA?
➤ Many long non-coding RNA transcripts (lncRNAs) function in a variety of processes, including differentiation, the cell cycle, and the maintenance of stem-cell-like phenotypes, and their expression is cell-type specific. Yet very little is known about their regulation or their roles in disease states.
➤ A newly established rat renal gene expression database and the recently assembled rn6 genome sequence have paved the way for us to conduct such a study.
WHAT EXACTLY IS THE DATA SOURCE?
➤ 110 (renal tubule segments) + 5 (glomeruli) renal cell-type-specific gene expression profiles, produced by the work described in the paper shown on the left.
➤ 7 polyadenylated mRNA-seq (PA-seq) &

cortical collecting duct (4 control rats and 4 water-loaded rats)
➤ 125 libraries in total

WHAT IS THE FORMAT OF THE DATA?
➤ The original transcript data are stored in GTF format, a flat, tab-delimited file format that can be loaded directly into Excel.
➤ The following is a real example of what GTF records look like.
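For readers who want to process these records in code rather than Excel, the sketch below reads one GTF line. It assumes the standard 9-column, tab-delimited layout with `key "value";` attribute pairs in the last column; the helper names and the example record are hypothetical and made up for illustration, not taken from the rat data set.

```python
# Minimal GTF line reader (sketch). Assumes the standard 9 tab-separated
# columns; column 9 holds attribute pairs written as: key "value";
GTF_COLUMNS = ["seqname", "source", "feature", "start", "end",
               "score", "strand", "frame", "attributes"]


def parse_attributes(field):
    """Turn 'gene_id "G1"; transcript_id "T1.1";' into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attrs[key] = value.strip().strip('"')
    return attrs


def parse_gtf_line(line):
    """Parse one GTF record; return None for comment or blank lines."""
    if not line.strip() or line.startswith("#"):
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    record = dict(zip(GTF_COLUMNS, fields))
    record["start"] = int(record["start"])  # 1-based, inclusive
    record["end"] = int(record["end"])
    record["attributes"] = parse_attributes(record["attributes"])
    return record


# Hypothetical record, for illustration only.
example = 'chr1\texample\texon\t1001\t1200\t.\t+\t.\tgene_id "G1"; transcript_id "T1.1";'
rec = parse_gtf_line(example)
print(rec["feature"], rec["end"] - rec["start"] + 1, rec["attributes"]["transcript_id"])
# -> exon 200 T1.1
```

Splitting on `;` is a simplification; attribute values that themselves contain semicolons would need a proper tokenizer.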

GTF FILE EXAMPLE
How can we pick out the transcripts that are potentially long non-coding RNAs from thousands of transcripts?
1. What characteristics of lncRNAs do preliminary data and experience suggest?
➤ They are less conserved than protein-coding genes (PhyloCSF).
➤ Their ORFs (open reading frames) are much shorter than those of protein-coding genes. (They do not necessarily have an ORF; if they do, it is short and may occur by chance, or perhaps they originated from protein-coding genes?)
➤ When forcibly translated into protein, they have no counterpart in the nr (non-redundant protein) database (BLASTX).
➤ They are consistently and significantly expressed in at least one cell type.

2. Extract the records that satisfy all of the characteristics above.
A pipeline was built based on this idea.
In theory, the pipeline works like this:
➤ The biggest circle represents the whole search space.
➤ The small rectangles inside the big circle represent subsets of records in the search space, each satisfying a certain lncRNA characteristic.
➤ The intersection of all the small rectangles represents the predicted set of lncRNA transcripts.
[Figure: outer circle "all the transcripts"; overlapping rectangles "less conserved ones", "no counterpart in nr database", "short ORF", and "true positive expression"; their intersection: "Predicted lncRNAs"]
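The diagram amounts to a set intersection: each filter produces a set of transcript IDs, and only IDs present in every set are predicted lncRNAs. A minimal sketch, assuming each filter's output has already been reduced to a set of IDs; the function name and all IDs are hypothetical placeholders.

```python
def predict_lncrnas(*filter_sets):
    """Transcript IDs that pass every filter, i.e. the intersection of all sets."""
    if not filter_sets:
        return set()
    return set.intersection(*(set(s) for s in filter_sets))


# Hypothetical filter outputs, for illustration only.
all_transcripts = {"T1", "T2", "T3", "T4", "T5"}
less_conserved = {"T1", "T2", "T3", "T5"}     # e.g. PhyloCSF step
no_nr_counterpart = {"T1", "T3", "T4", "T5"}  # e.g. BLASTX vs nr step
short_orf = {"T1", "T3", "T5"}
expressed = {"T1", "T2", "T3"}                # replicate-consistent expression

predicted = predict_lncrnas(all_transcripts, less_conserved,
                            no_nr_counterpart, short_orf, expressed)
print(sorted(predicted))  # -> ['T1', 'T3']
```

Because intersection is commutative, the filters can run in any order; running the cheapest filter first simply shrinks the input to the expensive ones (such as BLASTX).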
What do we get at each step? (taking multi-exon transcripts as examples)
➤ Find transcripts with a short ORF (length < 150).
Because each record in the FASTA file spans two lines, a file of n lines actually contains n/2 records.
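A sketch of the short-ORF step under stated assumptions: an ORF is taken to run from ATG to the first in-frame stop codon, all six reading frames are scanned, and length is counted in nucleotides including the stop. The slide does not say whether "150" is in nucleotides or amino acids, so the unit here is an assumption, and the function names and sequences are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf_length(seq):
    """Length (nt) of the longest ATG-to-stop ORF in six frames; 0 if none.

    ORFs that run off the end of the transcript without a stop are ignored.
    """
    seq = seq.upper()
    best = 0
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            start = None
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    best = max(best, i + 3 - start)
                    start = None
    return best


def count_fasta_records(lines):
    """Count records by header lines; equals n/2 for a two-line-per-record FASTA."""
    return sum(1 for line in lines if line.startswith(">"))


# Hypothetical transcripts, for illustration only.
seqs = {"T1": "ATGAAATAG", "T2": "ATG" + "GCT" * 60 + "TAA"}
short = {tid for tid, s in seqs.items() if longest_orf_length(s) < 150}
print(sorted(short))  # -> ['T1']
```

Counting `>` headers gives the same answer as n/2 for two-line FASTA, but it stays correct if sequences are ever wrapped over several lines.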
What do we get at each step? (taking multi-exon transcripts as examples)
➤ Find transcripts with no counterpart in the nr database (E-value threshold > 10E-4).
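One way to apply this filter, sketched under assumptions: BLASTX results are in BLAST+ tabular format (`-outfmt 6`), where column 1 is the query ID and column 11 is the E-value. Queries with no hits do not appear in that output at all, so the sketch starts from the full set of query IDs and removes those with a hit at or below the cutoff. The function name and example lines are hypothetical.

```python
def no_nr_counterpart(all_ids, blast_tabular_lines, evalue_cutoff=10e-4):
    """IDs with no hit at or below evalue_cutoff (no hit at all also passes).

    Note: "10E-4" as written on the slide equals 1e-3 in E-notation.
    """
    has_significant_hit = set()
    for line in blast_tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        qseqid, evalue = fields[0], float(fields[10])  # outfmt 6: col 1, col 11
        if evalue <= evalue_cutoff:
            has_significant_hit.add(qseqid)
    return set(all_ids) - has_significant_hit


# Hypothetical outfmt 6 lines (12 columns), for illustration only.
hits = [
    "T1\tsubj1\t45.0\t100\t50\t2\t1\t300\t5\t105\t2e-20\t80.1",
    "T2\tsubj2\t30.0\t40\t25\t1\t10\t130\t1\t40\t0.5\t25.0",
]
print(sorted(no_nr_counterpart({"T1", "T2", "T3"}, hits)))  # -> ['T2', 'T3']
```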
What do we get at each step? (taking multi-exon transcripts as examples)
➤ Find transcripts that are consistently and significantly expressed in all replicates of at least one cell type (FPKM > 0.1).
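The expression criterion reads as "some cell type in which every replicate exceeds the threshold". A minimal sketch, assuming a hypothetical nested layout `fpkm[transcript_id][cell_type] -> list of replicate FPKM values`; the cell-type names and values are placeholders, not data from the libraries.

```python
def consistently_expressed(fpkm, threshold=0.1):
    """IDs with FPKM > threshold in every replicate of at least one cell type."""
    passing = set()
    for tid, by_cell_type in fpkm.items():
        if any(reps and all(v > threshold for v in reps)
               for reps in by_cell_type.values()):
            passing.add(tid)
    return passing


# Hypothetical values, for illustration only.
fpkm = {
    "T1": {"cell_type_A": [0.5, 0.8, 0.3, 0.2], "cell_type_B": [0.0, 0.0]},
    "T2": {"cell_type_A": [0.5, 0.05, 0.3, 0.2], "cell_type_B": [0.09, 0.2]},
}
print(sorted(consistently_expressed(fpkm)))  # -> ['T1']
```

Requiring *all* replicates (rather than the mean) to pass is what makes the filter "consistent": one high replicate cannot rescue a transcript that is silent in the others.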

Classification of lncRNAs
➤ Sense and antisense lncRNAs
➤ Sense lncRNAs can be further classified into intergenic, cons, incs, and ponds lncRNAs.
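A sketch of the sense/antisense split, under assumptions that are not the author's stated rules: intervals are `(chrom, start, end, strand)` with 1-based inclusive coordinates; a lncRNA overlapping any annotated gene on the opposite strand is called antisense, otherwise sense, and a sense lncRNA overlapping no gene at all is flagged intergenic. The other sense subclasses on the slide (cons, incs, ponds) are not modelled, and all names and coordinates are hypothetical.

```python
def overlaps(a, b):
    """True if two (chrom, start, end, strand) intervals share a base."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def classify(lnc, genes):
    """Return 'antisense', 'sense/intergenic', or 'sense' for one lncRNA."""
    hits = [g for g in genes if overlaps(lnc, g)]
    if any(g[3] != lnc[3] for g in hits):
        return "antisense"
    if not hits:
        return "sense/intergenic"
    return "sense"


# Hypothetical gene annotation, for illustration only.
genes = [("chr1", 1000, 5000, "+")]
print(classify(("chr1", 4000, 6000, "-"), genes))  # -> antisense
print(classify(("chr1", 9000, 9500, "+"), genes))  # -> sense/intergenic
```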
RESULT
THANK YOU
& Happy Thanksgiving!
