The Research and Application Progress of Transcriptome Sequencing Technology (II)
1.5. Whole transcriptome sequencing
The transcription level of mRNA is regulated by lncRNAs, small RNAs and circRNAs. Quantifying the biomolecular networks and regulatory pathways of a cell or tissue at a particular time and place therefore requires both quantitative and qualitative study of every RNA molecule in the transcriptome. Whole transcriptome sequencing can profile all complete transcripts in a sample, including mRNA and non-coding RNAs (lncRNA, circRNA and miRNA). It differs from conventional RNA-seq mainly in how the library is constructed: whole transcriptome sequencing builds either 2 libraries (an mRNA + lncRNA + circRNA library and a miRNA library) or 3 libraries (an mRNA + lncRNA library, a circRNA library and a miRNA library). Whole transcriptome data yield expression profiles for every class of transcript. On this basis, the different RNA molecules can be identified and annotated, their encoded proteins and regulatory functions analyzed, and the regulatory network of interactions between RNA molecules reconstructed, giving a comprehensive and systematic picture of the biology of specific cells at a given time and place.
1.6. Single-cell transcriptome sequencing
Single-cell RNA sequencing (scRNA-seq) emerged from combining single-cell cDNA amplification, either linear amplification by in vitro transcription or exponential amplification by PCR, with high-throughput sequencing. Single-cell transcriptome sequencing studies the entire transcriptome at the level of individual cells. It measures differences in gene expression between single cells, which avoids the false-negative results that arise when signals from different cell types are averaged together, and it can identify rare cell populations that go undetected in mixed-cell (bulk) measurements. Common single-cell sequencing platforms currently include Fluidigm, WaferGen, 10x Genomics and Illumina/Bio-Rad. Unlike other RNA sequencing technologies, scRNA-seq must first isolate individual cells and capture the complete transcriptome of each one. Single-cell isolation is therefore a key step; it is mainly achieved by serial dilution, micromanipulation, fluorescence-activated cell sorting (FACS) or microfluidics.
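Serial dilution works by diluting a cell suspension until most wells receive either no cell or a single cell. As a rough illustration, and assuming (this is not stated in the text) that cells land in wells at random so the number per well follows a Poisson distribution with mean λ, the sketch below computes the expected fractions of empty, single-cell and multi-cell wells. The function names are hypothetical.

```python
import math


def poisson_pmf(k, lam):
    """Probability that a well receives exactly k cells when the mean per well is lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def loading_stats(lam):
    """Expected fractions of empty, single-cell and multi-cell wells at mean occupancy lam."""
    empty = poisson_pmf(0, lam)
    single = poisson_pmf(1, lam)
    multiple = 1.0 - empty - single
    return {
        "empty": empty,
        "single": single,
        "multiple": multiple,
        # Among wells that contain at least one cell, the share holding more than one.
        "multiplet_rate": multiple / (1.0 - empty),
    }


for lam in (0.1, 0.3, 1.0):
    s = loading_stats(lam)
    print(f"lambda={lam}: single={s['single']:.3f}, multiplet rate={s['multiplet_rate']:.3f}")
```

Under this assumption, lowering λ reduces the share of occupied wells that contain more than one cell, at the cost of leaving most wells empty.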
2. Construction of Transcriptome Sequencing Library
For transcriptome sequencing, total RNA is extracted from the sample, rRNA is removed, and the RNA molecules of interest are enriched to build a sequencing library. Sequencing libraries are either non-strand-specific or strand-specific. In a non-strand-specific library, RNA is reverse transcribed into double-stranded cDNA and adapters (linkers) are attached at random, without recording which strand came from the original RNA. Because the double-stranded cDNA is sequenced, the transcriptional direction of the mRNA cannot be determined. Strand-specific libraries fall into two categories. The first marks one strand chemically, for example by treating RNA molecules with bisulfite, or by incorporating dUTP during second-strand cDNA synthesis and then degrading the U-containing strand. The second ligates different adapters to the 5' and 3' ends of the RNA molecules or of the synthesized cDNA strands, so that sense and antisense strands can be told apart.
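To make the dUTP case concrete: in paired-end dUTP libraries, read 1 is conventionally the reverse complement of the original RNA and read 2 matches it, which aligners expose as a library-type option (for example, HISAT2's `--rna-strandness RF`). The sketch below, with a hypothetical function name and only two library types, shows how the strand of the source RNA can be recovered from a mate's alignment strand.

```python
def transcript_strand(read_reverse, is_read1, library):
    """Infer the genomic strand of the source RNA from one mate's alignment.

    read_reverse: True if the mate aligned to the reverse (minus) strand.
    is_read1:     True for read 1 of the pair, False for read 2.
    library:      "dutp" (dUTP second-strand marking) or "unstranded".
    """
    if library == "unstranded":
        return "."  # orientation of the RNA cannot be recovered
    if library != "dutp":
        raise ValueError(f"unsupported library type: {library}")
    read_on_plus = not read_reverse
    # Read 1 is antisense to the RNA, so its strand is flipped; read 2 is sense.
    rna_on_plus = (not read_on_plus) if is_read1 else read_on_plus
    return "+" if rna_on_plus else "-"


# A read 1 aligned to the plus strand implies a minus-strand transcript.
print(transcript_strand(read_reverse=False, is_read1=True, library="dutp"))
```

With an unstranded library the same alignment carries no orientation information, which is exactly the limitation described above.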
In transcriptome sequencing, knowing which strand a read originated from prevents reads from the antisense strand of a gene from being counted toward the sense transcript, which improves the accuracy of both transcript identification and transcript quantification. In de novo assembly from transcriptome data, strand information also helps to delimit transcript boundaries and to determine the strand from which each transcript is expressed.
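Whether a library actually retains strand information can be checked empirically, in the spirit of RSeQC's `infer_experiment.py`: compare the alignment strand of read 1 with the annotated strand of the gene it overlaps. The sketch below is a simplified, hypothetical version; the 0.8 threshold is an assumption, not a standard value.

```python
from collections import Counter


def infer_strandedness(pairs, threshold=0.8):
    """Classify a library from (read1_strand, gene_strand) pairs of "+"/"-" strings.

    Returns a label and the fraction of reads whose strand agrees with the gene.
    """
    counts = Counter("same" if read == gene else "opposite" for read, gene in pairs)
    total = counts["same"] + counts["opposite"]
    if total == 0:
        raise ValueError("no informative reads")
    frac_same = counts["same"] / total
    if frac_same >= threshold:
        return "stranded (read 1 sense)", frac_same
    if frac_same <= 1 - threshold:
        return "stranded (read 1 antisense, e.g. dUTP)", frac_same
    return "unstranded", frac_same


pairs = [("-", "+")] * 95 + [("+", "+")] * 5
print(infer_strandedness(pairs))
```

An unstranded library gives roughly half of the reads on each strand, so antisense reads cannot be separated from sense reads during counting.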
3. Transcriptome Data Processing
When transcriptome sequencing data are used to compare gene-level or transcript-level expression between groups, the basic analysis workflow consists of raw data preprocessing, read alignment, transcript assembly, new transcript prediction and expression quantification. Depending on the aim of the experiment, one can go on to analyze differences in transcript expression between the experimental and control groups, cluster samples by their gene expression patterns, and integrate the results with other omics data.
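As a minimal illustration of the downstream comparisons, the sketch below computes a log2 fold change between group means (with a pseudocount, an assumed choice) and a Pearson correlation between samples, the kind of similarity measure often used when clustering expression profiles. It is not a differential-expression test; count-based statistical models such as those in DESeq2 or edgeR are normally used for that. Function names are hypothetical.

```python
import math
from statistics import mean


def log2_fold_change(treated, control, pseudocount=1.0):
    """log2 ratio of mean normalized expression, treated vs control.

    The pseudocount avoids division by zero for genes not detected in one group.
    """
    return math.log2((mean(treated) + pseudocount) / (mean(control) + pseudocount))


def pearson(x, y):
    """Pearson correlation between two samples' expression vectors."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


print(log2_fold_change([7, 7, 7], [1, 1, 1]))
print(pearson([1.0, 5.0, 3.0], [2.0, 9.0, 7.0]))
```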
3.1. Raw data preprocessing
After raw data are obtained from second-generation (next-generation) sequencing, data quality must be evaluated and quality control (QC) performed. The evaluation covers data output, GC content, rRNA content, base quality distribution and duplicated sequences. Low-quality reads and adapter (linker) sequences are removed, and the resulting clean data are used for subsequent analysis.
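The trimming step can be sketched as follows, assuming Phred+33 quality encoding and an exact match to a known adapter prefix (here `AGATCGGAAGAGC`, the prefix commonly used for Illumina adapters). Dedicated trimmers such as cutadapt also tolerate mismatches and partial adapters; the thresholds and function names below are hypothetical.

```python
def phred_scores(qual, offset=33):
    """Convert a FASTQ quality string to integer Phred scores (Phred+33 assumed)."""
    return [ord(c) - offset for c in qual]


def gc_content(seq):
    """Fraction of G and C bases in a read."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def trim_read(seq, qual, adapter, min_q=20, min_len=20):
    """Return (seq, qual) after adapter and quality trimming, or None if too short."""
    # 1. Cut at the first exact occurrence of the adapter.
    pos = seq.find(adapter)
    if pos != -1:
        seq, qual = seq[:pos], qual[:pos]
    # 2. Remove low-quality bases from the 3' end.
    scores = phred_scores(qual)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    seq, qual = seq[:end], qual[:end]
    # 3. Discard reads that became too short to align reliably.
    return (seq, qual) if len(seq) >= min_len else None


read = "ACGT" * 6 + "AGATCGGAAGAGC"
print(trim_read(read, "I" * len(read), "AGATCGGAAGAGC"))
print(gc_content("GGCCAATT"))
```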
3.2. Read alignment
Transcriptome data derive mainly from the exonic sequences of the genome. When the sequenced reads are aligned back to the genome, a read that spans an exon-exon junction is split into segments separated by the intron that lies between those exons.
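Splice-aware aligners such as HISAT2 or STAR record the skipped intron with the `N` operation of the CIGAR string defined in the SAM specification. The sketch below (hypothetical function name) converts a 1-based SAM `POS` and a CIGAR string into the genomic blocks that a read actually covers, so a spliced read yields one block per exon segment.

```python
import re

# CIGAR operations from the SAM specification: length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def aligned_blocks(pos, cigar):
    """Return genomic (start, end) blocks covered by a read, 1-based inclusive."""
    blocks = []
    ref = pos
    block_start = pos
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op in "M=XD":
            ref += length  # consumes reference within the current exon block
        elif op == "N":
            if ref > block_start:
                blocks.append((block_start, ref - 1))
            ref += length  # skip over the intron
            block_start = ref
        # I, S, H and P do not consume reference positions
    if ref > block_start:
        blocks.append((block_start, ref - 1))
    return blocks


# A 100-base read split across a 1000-base intron.
print(aligned_blocks(100, "50M1000N50M"))
```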
3.3. Transcript assembly
Transcript assembly reconstructs transcripts from the sequencing reads. For species with a reference genome, the alignment results reveal how exons are connected, from which transcript structures are built. For transcriptome data without a reference genome sequence, the short RNA-seq reads must be assembled de novo to obtain complete transcript sequences. Studies of species without a reference genome therefore usually need more sequencing data to meet the requirements of de novo assembly: the more usable data available, the more transcripts are assembled, the more complete they are, and the easier it is to detect transcripts expressed at low levels.
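De novo transcriptome assemblers such as Trinity are built on de Bruijn graphs. The toy sketch below only illustrates the idea: it breaks reads into k-mers, links overlapping (k-1)-mers, and walks non-branching paths into contigs. It ignores sequencing errors, coverage, reverse complements and isolated cycles, all of which real assemblers must handle; function names are hypothetical.

```python
from collections import defaultdict


def build_debruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def assemble_unitigs(graph):
    """Walk maximal non-branching paths; isolated cycles are not reported."""
    indeg = defaultdict(int)
    for succs in graph.values():
        for s in succs:
            indeg[s] += 1
    nodes = set(graph) | set(indeg)

    def one_in_one_out(n):
        return indeg[n] == 1 and len(graph.get(n, ())) == 1

    contigs = []
    for node in nodes:
        if one_in_one_out(node):
            continue  # interior node: reached from its branching predecessor
        for nxt in graph.get(node, ()):
            path = node + nxt[-1]
            while one_in_one_out(nxt):
                nxt = next(iter(graph[nxt]))
                path += nxt[-1]
            contigs.append(path)
    return contigs


# Two overlapping reads reconstruct one longer contig.
print(assemble_unitigs(build_debruijn(["ACGTTG", "GTTGCA"], 4)))
```

More reads mean more k-mers observed, which is one way to see why deeper sequencing helps de novo assembly recover complete and lowly expressed transcripts.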
3.4. Transcript prediction
Most genes have multiple splice forms and can produce multiple transcripts encoding different proteins, which may give a single gene multiple functions. Assembling the sequencing data yields not only known transcripts but also new transcript sequences, and these new transcripts need to be identified and annotated, especially the less-studied ncRNA transcripts.
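One simple way to separate known from new transcripts is to compare intron chains with the annotation, a much simplified version of what tools such as gffcompare report. The sketch below, with hypothetical names and a three-way classification chosen for illustration, assumes exons are 1-based inclusive intervals on the same strand and does not handle single-exon transcripts.

```python
def introns(exons):
    """Intron intervals implied by a sorted list of (start, end) exons."""
    return [(a_end + 1, b_start - 1) for (_, a_end), (b_start, _) in zip(exons, exons[1:])]


def classify(assembled, known_transcripts):
    """Label an assembled multi-exon transcript relative to annotated ones.

    "known":         intron chain identical to an annotated transcript
    "novel isoform": shares at least one splice junction with an annotated transcript
    "novel":         no shared junction (candidate new gene or unannotated ncRNA)
    """
    chain = introns(sorted(assembled))
    known_chains = [introns(sorted(t)) for t in known_transcripts]
    if chain and chain in known_chains:
        return "known"
    junctions = set(chain)
    if any(junctions & set(k) for k in known_chains):
        return "novel isoform"
    return "novel"


annotation = [[(100, 200), (300, 400), (500, 600)]]
print(classify([(100, 200), (300, 400), (500, 650), (800, 900)], annotation))
```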
For species with a reference genome and existing transcript annotation, transcript structures are determined mainly by aligning the sequenced reads: the reads cover the transcript sequences, and the genome sequence is used to assemble complete transcripts. For species without a reference genome, gene or transcript sequences must be assembled de novo. The resulting sequences can be compared with UniGene and EST databases of the same or closely related species to judge their reliability; BLAST is commonly used in this step to identify sequence similarity quickly. To identify new lncRNAs, transcripts with a total exon length > 200 nt are first extracted from the transcriptome data, based on this characteristic of lncRNA molecules. Their coding potential is then predicted from open reading frames, and they are compared against known protein databases to separate lncRNAs from mRNAs.
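The lncRNA filtering steps can be sketched as below. The > 200 nt exon-length rule comes from the text; the 100-codon ORF cutoff is an assumed, commonly used heuristic, and the protein-database hit is represented by a boolean standing in for a BLAST search. The ORF scan only looks at the three forward frames (appropriate for stranded data) and ignores ORFs that run off the end without a stop codon. All names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_codons(seq):
    """Length in codons (stop excluded) of the longest ATG..stop ORF in the forward frames."""
    seq = seq.upper()
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, (i - start) // 3)
                start = None
    return best


def is_lncrna_candidate(exon_lengths, seq, has_protein_hit, max_orf_codons=100):
    """Apply length, ORF and protein-similarity filters to one assembled transcript."""
    if sum(exon_lengths) <= 200:
        return False  # too short to qualify as a long non-coding RNA
    if longest_orf_codons(seq) >= max_orf_codons:
        return False  # long ORF suggests protein-coding potential
    return not has_protein_hit  # similarity to known proteins also implies coding


transcript = "C" * 200 + "ATG" + "AAA" * 10 + "TAA"
print(longest_orf_codons(transcript), is_lncrna_candidate([136, 100], transcript, False))
```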
3.5. Analysis of transcript expression levels
After reads have been aligned to their genomic positions, or transcripts have been assembled de novo, the number of reads falling on each gene or transcript reflects its expression abundance to some extent. However, samples can differ substantially in total data output and in the number of genes expressed, genes within a sample differ in length, and reads may be distributed differently among the transcripts of the same gene. Expression levels must therefore be normalized before they are compared between samples.
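Three widely used within-sample normalizations can be written in a few lines: CPM corrects only for sequencing depth, while FPKM and TPM also correct for gene length (TPM normalizes by length first, so every sample sums to one million). This is a sketch with hypothetical function names; between-sample methods such as edgeR's TMM or DESeq2's median-of-ratios additionally address differences in RNA composition.

```python
def cpm(counts):
    """Counts per million: corrects for sequencing depth only."""
    total = sum(counts)
    return [c / total * 1e6 for c in counts]


def fpkm(counts, lengths_bp):
    """Fragments per kilobase of transcript per million mapped fragments."""
    total = sum(counts)
    return [c / (length / 1e3) / (total / 1e6) for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then scale to sum to 1e6."""
    rates = [c / (length / 1e3) for c, length in zip(counts, lengths_bp)]
    scale = sum(rates)
    return [r / scale * 1e6 for r in rates]


# A gene twice as long with twice the reads has the same per-kilobase expression.
counts, lengths = [10, 20], [1000, 2000]
print(cpm(counts), fpkm(counts, lengths), tpm(counts, lengths))
```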
To be continued in Part III…
