An Introduction to Next-Generation Sequencing Technology
www.illumina.com/technology/next-generation-sequencing.html
For Research Use Only. Not for use in diagnostic procedures.
Table of Contents
I. Welcome to Next-Generation Sequencing
a. The Evolution of Genomic Science
b. The Basics of NGS Chemistry
c. Advances in Sequencing Technology
Paired-End Sequencing
Tunable Coverage and Unlimited Dynamic Range
Advances in Library Preparation
Multiplexing
Flexible, Scalable Instrumentation
II. NGS Methods
a. Genomics
Whole-Genome Sequencing
Exome Sequencing
De novo Sequencing
Targeted Sequencing
b. Transcriptomics
Total RNA and mRNA Sequencing
Targeted RNA Sequencing
Small RNA and Noncoding RNA Sequencing
c. Epigenomics
Methylation Sequencing
ChIP Sequencing
Ribosome Profiling
III. Illumina DNA-to-Data NGS Solutions
a. The Illumina NGS Workflow
b. Integrated Data Analysis
IV. Glossary
V. References
I. Welcome to Next-Generation Sequencing
a. The Evolution of Genomic Science
DNA sequencing has come a long way since the days of two-dimensional chromatography in the 1970s. With the advent of the Sanger chain termination method¹ in 1977, scientists gained the ability to sequence DNA in a reliable, reproducible manner. A decade later, Applied Biosystems introduced the first automated, capillary electrophoresis (CE)-based sequencing instruments, the AB370 in 1987 and the AB3730xl in 1998, instruments that became the primary workhorses for the NIH-led and Celera-led Human Genome Projects.² While these “first-generation” instruments were considered high throughput for their time, the Genome Analyzer emerged in 2005 and took sequencing runs from 84 kilobase (kb) per run to 1 gigabase (Gb) per run.³ The short read, massively parallel sequencing technique was a fundamentally different approach that revolutionized sequencing capabilities and launched the “next generation” in genomic science. From that point forward, the data output of next-generation sequencing (NGS) has outpaced Moore’s law, more than doubling each year (Figure 1).
Figure 1: Sequencing Cost and Data Output Since 2000—The dramatic rise of data output and concurrent falling cost of sequencing since 2000. The Y-axes on both sides
of the graph are logarithmic.
In 2005, a single run on the Genome Analyzer could produce roughly one gigabase of data. By 2014, the rate climbed to 1.8 terabases (Tb) of data in a single sequencing run, an astounding increase of more than 1000×. It is remarkable to reflect on the fact that the first human genome, famously copublished in Science and Nature in 2001, required 15 years to sequence and cost nearly three billion dollars. In contrast, the HiSeq X® Ten System, released in 2014, can sequence over 45 human genomes in a single day for approximately $1000 each (Figure 2).⁴
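The growth claim can be checked with a little arithmetic. The sketch below takes the two per-run figures quoted above (roughly 1 Gb in 2005 and 1.8 Tb in 2014) and computes the overall fold increase and the implied average yearly growth factor. The function names are illustrative, and the conversion assumes decimal units (1 Tb = 1000 Gb).

```python
def fold_increase(start_output_gb: float, end_output_gb: float) -> float:
    """Ratio of the later per-run output to the earlier one."""
    return end_output_gb / start_output_gb


def annual_growth_factor(start_output_gb: float, end_output_gb: float, years: int) -> float:
    """Average yearly multiplier implied by the overall fold increase."""
    return fold_increase(start_output_gb, end_output_gb) ** (1.0 / years)


if __name__ == "__main__":
    genome_analyzer_2005_gb = 1.0      # roughly 1 Gb per run (from the text)
    hiseq_x_2014_gb = 1.8 * 1000       # 1.8 Tb per run, assuming 1 Tb = 1000 Gb
    years = 2014 - 2005

    print("fold increase:", fold_increase(genome_analyzer_2005_gb, hiseq_x_2014_gb))
    print("average yearly growth factor:",
          round(annual_growth_factor(genome_analyzer_2005_gb, hiseq_x_2014_gb, years), 2))
```

Under these assumptions the yearly factor comes out above 2, which is consistent with the statement that output more than doubled each year.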
Beyond the massive increase in data output, the introduction of NGS technology has transformed the way scientists think about genetic information. The $1000 genome enables population-scale sequencing and establishes the foundation for personalized genomic medicine as part of standard medical care. Researchers can now analyze thousands to tens of thousands of samples in a single year. As Eric Lander, founding director of the Broad Institute of MIT and Harvard and principal leader of the Human Genome Project, states:
“The rate of progress is stunning. As costs continue to come down, we are entering a period where we are going to be able to get the complete catalog of disease genes. This will allow us to look at thousands of people and see the differences among them, to discover critical genes that cause cancer, autism, heart disease, or schizophrenia.”⁵
Figure 2: Human Genome Sequencing Over the Decades. The capacity to sequence all 3.2 billion bases of the human genome (at 30× coverage) has increased
exponentially since the 1990s. In 2005, with the introduction of the Illumina Genome Analyzer System, 1.3 human genomes could be sequenced annually. Nearly 10 years
later, with the Illumina HiSeq X Ten fleet of sequencing systems, the number has climbed to 18,000 human genomes a year.
b. The Basics of NGS Chemistry
In principle, the concept behind NGS technology is similar to CE sequencing. DNA polymerase catalyzes the incorporation of fluorescently labeled deoxyribonucleotide triphosphates (dNTPs) into a DNA template strand during sequential cycles of DNA synthesis. During each cycle, at the point of incorporation, the nucleotides are identified by fluorophore excitation. The critical difference is that, instead of sequencing a single DNA fragment, NGS extends this process across millions of fragments in a massively parallel fashion. More than 90% of the world's sequencing data are generated by Illumina sequencing by synthesis (SBS) chemistry.* It delivers high accuracy, a high yield of error-free reads, and a high percentage of base calls above Q30.⁶⁻⁸
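The text does not define "Q30". Assuming it refers to the standard Phred quality scale, where Q = -10 · log10(P) and P is the estimated probability that a base call is wrong, Q30 corresponds to an error probability of 1 in 1000. The following is a minimal sketch of that conversion; the function names are illustrative and not part of any Illumina software.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Convert a Phred quality score to the probability of an incorrect base call."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Convert an error probability (0 < p <= 1) to a Phred quality score."""
    return -10 * math.log10(p)


def fraction_at_or_above(qualities, threshold: int = 30) -> float:
    """Fraction of base calls whose quality meets or exceeds the threshold."""
    qualities = list(qualities)
    if not qualities:
        return 0.0
    return sum(q >= threshold for q in qualities) / len(qualities)


if __name__ == "__main__":
    print(phred_to_error_probability(30))          # error probability at Q30
    print(fraction_at_or_above([10, 30, 35, 40]))  # share of calls at or above Q30
```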
Illumina NGS workflows include four basic steps:
1. Library Preparation: The sequencing library is prepared by random fragmentation of the DNA or cDNA sample, followed by 5′ and 3′ adapter ligation (Figure 3A). Alternatively, “tagmentation” combines the fragmentation and ligation reactions into a single step that greatly increases the efficiency of the library preparation process.⁹ Adapter-ligated fragments are then PCR amplified and gel purified.
2. Cluster Generation: For cluster generation, the library is loaded into a flow cell where fragments are captured on a lawn of surface-bound oligos complementary to the library adapters. Each fragment is then amplified into distinct, clonal clusters through bridge amplification (Figure 3B). When cluster generation is complete, the templates are ready for sequencing.
3. Sequencing: Illumina SBS technology uses a proprietary reversible terminator–based method that detects single bases as they are incorporated into DNA template strands (Figure 3C). As all four reversible terminator–bound dNTPs are present during each sequencing cycle, natural competition minimizes incorporation bias and greatly reduces raw error rates compared to other technologies.⁶,⁷ The result is highly accurate base-by-base sequencing that virtually eliminates sequence context–specific errors, even within repetitive sequence regions and homopolymers.
4. Data Analysis: During data analysis and alignment, the newly identified sequence reads are aligned to a reference genome (Figure 3D). Following alignment, many variations of analysis are possible, such as single nucleotide polymorphism (SNP) or insertion-deletion (indel) identification, read counting for RNA methods, phylogenetic or metagenomic analysis, and more.
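The alignment and variant identification described in step 4 can be illustrated with a deliberately naive sketch. It slides a read along a short reference, keeps the position with the fewest mismatches, and reports the mismatching bases as candidate SNPs. This is a toy, ungapped approach chosen for clarity. It is not how production aligners work (they rely on indexed data structures and handle indels), and all names here are hypothetical.

```python
def best_alignment(read: str, reference: str):
    """Return (position, mismatch_count) of the ungapped placement with the
    fewest mismatches; ties resolve to the leftmost position."""
    best_pos, best_mismatches = -1, len(read) + 1
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mismatches = sum(1 for a, b in zip(read, window) if a != b)
        if mismatches < best_mismatches:
            best_pos, best_mismatches = pos, mismatches
    return best_pos, best_mismatches


def candidate_snps(read: str, reference: str, pos: int):
    """List (reference_position, reference_base, read_base) for each mismatch
    of a read placed at the given reference position."""
    return [
        (pos + i, reference[pos + i], base)
        for i, base in enumerate(read)
        if reference[pos + i] != base
    ]


if __name__ == "__main__":
    reference = "ACGTTGCAAGCT"
    read = "TGCTAG"  # differs from the reference at one base
    pos, mismatches = best_alignment(read, reference)
    print(pos, mismatches, candidate_snps(read, reference, pos))
```

In practice a single mismatching read is not enough to call a variant; many overlapping reads at the same position are needed to separate true variants from sequencing errors.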
A detailed animation of SBS chemistry is available at www.illumina.com/SBSvideo.
*Data calculations on file. Illumina, Inc., 2015.
Figure 3: Next-Generation Sequencing Chemistry Overview. Illumina NGS includes four steps: (A) library preparation, (B) cluster generation, (C) sequencing, and (D) alignment and data analysis.
c. Advances in Sequencing Technology
Paired-End Sequencing
A major advance in NGS technology occurred with the development of paired-end (PE) sequencing (Figure 4). PE sequencing involves sequencing both ends of the DNA fragments in a library and aligning the forward and reverse reads as read pairs. In addition to producing twice the number of reads for the same time and effort in library preparation, sequences aligned as read pairs enable more accurate read alignment and the ability to detect indels, which is not possible with single-read data.⁸ Analysis of differential read-pair spacing also allows removal of PCR duplicates, a common artifact resulting from PCR amplification during library preparation. Furthermore, PE sequencing produces a higher number of SNV calls following read-pair alignment.⁸,⁹ While some methods are best served by single-read sequencing, such as small RNA sequencing, most researchers currently use the paired-end approach.
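One way to picture duplicate removal from read-pair spacing is sketched below. It assumes, as a simplification, that two read pairs are PCR duplicates when they align to the same chromosome with identical outer coordinates (forward-read start and reverse-read end), and it keeps the pair with the higher mean base quality. The record layout and the tie-breaking rule are assumptions for illustration; the source does not specify a particular algorithm.

```python
from typing import Iterable, List, NamedTuple


class AlignedPair(NamedTuple):
    name: str            # read-pair identifier
    chrom: str           # reference sequence the pair aligned to
    start: int           # leftmost aligned position of the forward read
    end: int             # rightmost aligned position of the reverse read
    mean_quality: float  # mean base quality across both reads


def remove_duplicates(pairs: Iterable[AlignedPair]) -> List[AlignedPair]:
    """Keep one pair per (chrom, start, end), preferring higher mean quality."""
    best = {}
    for pair in pairs:
        key = (pair.chrom, pair.start, pair.end)
        if key not in best or pair.mean_quality > best[key].mean_quality:
            best[key] = pair
    return sorted(best.values(), key=lambda p: (p.chrom, p.start, p.end))


if __name__ == "__main__":
    pairs = [
        AlignedPair("a", "chr1", 100, 400, 30.0),
        AlignedPair("b", "chr1", 100, 400, 35.0),  # same coordinates as "a"
        AlignedPair("c", "chr1", 100, 420, 20.0),  # different fragment end
    ]
    print([p.name for p in remove_duplicates(pairs)])
```

A single-read library only exposes the start coordinate, so two distinct fragments that happen to start at the same base cannot be told apart; the second coordinate from the paired read is what makes this check more specific.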
Figure 4: Paired-End Sequencing and Alignment. Paired-end sequencing enables both ends of the DNA fragment to be sequenced. Because the distance between each
paired read is known, alignment algorithms can use this information to map the reads over repetitive regions more precisely. This results in better alignment of reads,
especially across difficult-to-sequence, repetitive regions of the genome.
Tunable Coverage and Unlimited Dynamic Range
The digital nature of NGS allows a virtually unlimited dynamic range for read-counting methods, such as gene expression analysis. Microarrays measure continuous signal intensities and the detection range is limited by noise at the low end and signal saturation at the high end, while NGS quantifies discrete, digital sequencing read counts. By increasing or decreasing the number of sequencing reads, researchers can tune the sensitivity of an experiment to accommodate various study objectives. Because the dynamic range with NGS is adjustable and nearly unlimited, researchers can quantify subtle gene expression changes with much greater sensitivity than traditional microarray-based methods. Sequencing runs can be tailored to zoom in with high resolution on particular regions of the genome, or provide a more expansive view with lower resolution.
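To make the idea of digital read counts concrete, the sketch below scales raw per-gene read counts to counts per million (CPM), a common normalization that lets samples sequenced to different depths be compared. CPM is not named in the text above; it is used here only as a simple, widely known example, and the gene names are made up.

```python
from typing import Dict


def counts_per_million(read_counts: Dict[str, int]) -> Dict[str, float]:
    """Scale raw per-gene read counts so that they sum to one million."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("no reads were counted")
    return {gene: count * 1_000_000 / total for gene, count in read_counts.items()}


if __name__ == "__main__":
    # A rare transcript next to a highly abundant one: nothing saturates,
    # and sequencing more reads simply refines the low-end estimate.
    sample = {"rare_gene": 1, "abundant_gene": 999_999}
    print(counts_per_million(sample))
```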
The ability to easily tune the level of coverage offers several experimental design advantages. For instance, somatic mutations may only exist within a small proportion of cells in a given tissue sample. Using mixed tumor–normal cell samples, the region of DNA harboring the mutation must be sequenced at extremely high coverage, often upwards of 1000×, to detect these low-frequency mutations within the mixed cell population. On the other side of the coverage spectrum, a method like genome-wide variant discovery usually requires a much lower coverage level. In this case, the study design involves sequencing many samples (hundreds to thousands) at lower resolution, to achieve greater statistical power within a given population.
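The need for very deep coverage when mutations are rare can be made quantitative with a simple binomial model. The sketch below assumes each read covering the site independently carries the mutation with probability equal to the variant allele fraction, and it ignores sequencing error. The threshold of five supporting reads is an arbitrary illustrative choice, not a recommendation from the source.

```python
from math import comb


def detection_probability(depth: int, allele_fraction: float,
                          min_supporting_reads: int) -> float:
    """Probability of observing at least `min_supporting_reads` variant reads
    among `depth` reads, under a binomial model with no sequencing error."""
    p_too_few = sum(
        comb(depth, k) * allele_fraction ** k * (1 - allele_fraction) ** (depth - k)
        for k in range(min_supporting_reads)
    )
    return 1.0 - p_too_few


if __name__ == "__main__":
    # A mutation present in 1% of the reads, requiring 5 supporting reads.
    for depth in (30, 100, 1000):
        print(depth, detection_probability(depth, 0.01, 5))
```

Under these assumptions a 1% variant is almost never seen five times at 30× coverage but is very likely to be seen at 1000×, which matches the contrast the paragraph draws between the two ends of the coverage spectrum.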
Advances in Library Preparation
With Illumina NGS, library preparation has undergone rapid improvements. The first NGS library prep protocols involved random fragmentation of the DNA or RNA sample, gel-based size selection, ligation of platform-specific oligonucleotides, PCR amplification, and several purification steps. While the 1–2 days required to generate these early NGS libraries were a great improvement over traditional cloning techniques, current NGS protocols, such as Nextera® XT DNA Library Preparation, have reduced the library prep time to less than 90 minutes.¹⁰ PCR-free and gel-free kits are also available for sensitive sequencing methods. PCR-free library preparation kits result in superior coverage of traditionally challenging areas such as high AT/GC-rich regions, promoters, and homopolymeric regions.¹¹
For a complete list of Illumina library preparation kits, visit www.illumina.com/products/by-type/sequencing-kits/library-prep-kits.html.
Multiplexing
In addition to the rise of data output per run, the sample throughput per run in NGS has also increased over time. Multiplexing allows large numbers of libraries to be pooled and sequenced simultaneously during a single sequencing run (Figure 5). With multiplexed libraries, unique index sequences are added to each DNA fragment during library preparation so that each read can be identified and sorted before final data analysis. With PE sequencing and multiplexing, NGS has dramatically reduced the time to data for multisample studies and enabled researchers to go from experiment to data quickly and easily.
Gains in throughput from multiplexing come with an added layer of complexity, as sequencing reads from pooled libraries need to be identified and sorted computationally in a process called demultiplexing before final data analysis (Figure 5). The phenomenon of index misassignment between multiplexed libraries is a known issue that has impacted NGS technologies from the time sample multiplexing was developed.¹² Index hopping is a specific cause of index misassignment that can result in incorrect assignment of libraries from the expected index to a different index in the pool, leading to misalignment and inaccurate sequencing results.
For more information regarding index hopping, including mechanisms by which it occurs, how Illumina measures index hopping, and best practices for mitigating the impact of index hopping on sequencing data quality, read the Effects of Index Misassignment on Multiplexing and Downstream Analysis White Paper.
Figure 5: Library Multiplexing Overview—(A) Unique index sequences are added to two different libraries during library preparation. (B) Libraries are pooled together and
loaded into the same flow cell lane. (C) Libraries are sequenced together during a single instrument run. All sequences are exported to a single output file. (D) A
demultiplexing algorithm sorts the reads into different files according to their indexes. (E) Each set of reads is aligned to the appropriate reference sequence.
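The demultiplexing step in Figure 5 can be sketched in a few lines. The example below assigns each read to a library by comparing its index sequence with the expected index of every library, tolerating a small number of mismatches, and sets aside reads that match no library or more than one. The data layout, the one-mismatch tolerance, and the "undetermined" bin are assumptions made for illustration and do not describe any specific Illumina tool.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads: Iterable[Tuple[str, str]],
                library_indexes: Dict[str, str],
                max_mismatches: int = 1) -> Dict[str, List[str]]:
    """Sort (index_sequence, insert_sequence) reads into per-library bins.

    A read is assigned only when exactly one library index is within
    `max_mismatches`; everything else goes to the "undetermined" bin.
    """
    bins = defaultdict(list)
    for index_seq, insert_seq in reads:
        matches = [
            library
            for library, expected in library_indexes.items()
            if len(expected) == len(index_seq)
            and hamming(expected, index_seq) <= max_mismatches
        ]
        library = matches[0] if len(matches) == 1 else "undetermined"
        bins[library].append(insert_seq)
    return dict(bins)


if __name__ == "__main__":
    indexes = {"library_A": "ACGTACGT", "library_B": "TGCATGCA"}
    reads = [
        ("ACGTACGT", "AAAA"),  # exact match to library_A
        ("TGCATGCT", "CCCC"),  # one mismatch from library_B
        ("GGGGGGGG", "TTTT"),  # matches neither index
    ]
    print(demultiplex(reads, indexes))
```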
Flexible, Scalable Instrumentation
While the latest NGS platforms can produce massive data output, NGS technology is also highly flexible and scalable. Sequencing systems are available for every method and scale of study, from small laboratories to large genome centers (Figure 6). Illumina NGS instruments range from the benchtop MiniSeq™ System, with output ranging from 1.8 to 7.5 Gb for targeted sequencing studies, to the NovaSeq™ 6000 System, which can generate an impressive 6 Tb and 20 billion reads in ~2 days† for population-scale studies.
Flexible run configurations are also engineered into the design of Illumina NGS sequencers. For example, the HiSeq® 2500 System offers two run modes and single or dual flow cell sequencing while the NextSeq® Series of Sequencing Systems offers two flow cell types to accommodate different throughput requirements. The HiSeq 3000/4000 Series uses the same patterned flow cell technology as the HiSeq X instruments for cost-effective production-scale sequencing. The new NovaSeq Series of systems unites the latest high-performance imaging with the next generation of Illumina patterned flow cell technology to deliver massive increases in throughput. This flexibility allows researchers to configure runs tailored to their specific study requirements, with the instrument of their choice.
For an in-depth comparison of Illumina platforms, visit www.illumina.com/systems/sequencing.html or explore the Sequencing Platform Comparison Tool at www.illumina.com/systems/sequencing-platforms/comparison-tool.html.
Figure 6: Sequencing Systems for Virtually Every Scale—Illumina offers innovative NGS platforms that deliver exceptional data quality and accuracy over a wide scale,
from small benchtop sequencers to production-scale sequencing systems.
II. NGS Methods
NGS platforms enable a wide variety of methods, allowing researchers to ask virtually any question related to the genome, transcriptome, or epigenome of any organism. Sequencing methods differ primarily by how the DNA or RNA samples are obtained (eg, organism, tissue type, normal vs. affected, experimental conditions, etc) and by the data analysis options used. After the sequencing libraries are prepared, the actual sequencing stage remains fundamentally the same, regardless of the method. There are various standard library preparation kits that offer protocols for whole-genome sequencing (WGS), RNA sequencing (RNA-Seq), targeted sequencing (such as exome sequencing or 16S sequencing), custom-selected regions, protein-binding regions, and more. Although the number of NGS methods is constantly growing, a brief overview of the most common methods is presented here.
a. Genomics
Whole-Genome Sequencing
Microarray-based, genome-wide association studies (GWAS) have been a common approach for identifying disease associations across the whole genome. While GWAS microarrays can interrogate over four million markers per sample, the most comprehensive method of interrogating the 3.2 billion bases of the human genome is WGS. The rapid drop in sequencing cost and the ability of WGS to produce large volumes of data rapidly make it a powerful tool for genomics research. While WGS is commonly associated with sequencing human genomes, the scalable, flexible nature of the method makes it equally useful for sequencing any species, such as agriculturally important livestock, plant genomes, or disease-related microbial genomes. This broad utility was demonstrated during the recent E. coli outbreak in Europe in 2011, which prompted a rapid scientific response. Using the latest NGS systems, researchers quickly sequenced the bacterial strain, enabling them to track the origins and transmission of the outbreak as well as identify genetic mutations conferring the increased virulence.¹³
Exome Sequencing
Exome sequencing is a widely used targeted sequencing method. The exome represents less than 2% of the human genome, but contains most of the known disease-causing variants, making whole-exome sequencing (WES) a cost-effective alternative to WGS.¹⁴ With WES, the protein-coding portion of the genome is selectively captured and sequenced. It can efficiently identify variants across a wide range of applications, including population genetics, genetic disease, and cancer studies.
†With dual flow cell mode enabled.
De novo Sequencing
De novo sequencing refers to sequencing a novel genome where there is no reference sequence available for alignment. Sequence reads are assembled as contigs and the coverage quality of de novo sequence data depends on the size and continuity of the contigs (ie, the number of gaps in the data). Another important factor in generating high-quality de novo sequences is the diversity of insert sizes included in the library. Combining short-insert paired-end and long-insert mate pair sequences is the most powerful approach for maximal coverage across the genome (Figure 7). The combination of insert sizes enables detection of the widest range of structural variant types and is essential for accurately identifying more complex rearrangements. The short-insert reads, sequenced at higher depths, can fill in gaps not covered by the long inserts, which are often sequenced at lower read depths. Therefore, using a combined approach results in higher quality assemblies. In parallel with NGS technology improvements, many algorithmic advances have emerged in sequence assemblers for short-read data. Researchers can perform high-quality de novo assembly using NGS reads and publicly available short-read assembly tools with existing computer resources in the laboratory.
Figure 7: Mate Pairs and De novo Assembly—Using a combination of short and long insert sizes with paired-end sequencing results in maximal coverage of the genome for
de novo assembly.
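How overlapping reads become contigs can be shown with a toy greedy assembler. It repeatedly merges the two sequences with the longest suffix-to-prefix overlap until no overlap of at least `min_length` bases remains; sequences that never overlap stay as separate contigs, which is how gaps appear. This is a teaching sketch only. Real short-read assemblers use graph-based methods and account for sequencing errors, reverse complements, repeats, and paired-end or mate pair distances, none of which are handled here.

```python
from typing import List


def overlap(a: str, b: str, min_length: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`,
    or 0 if no overlap of at least `min_length` exists."""
    for length in range(min(len(a), len(b)), min_length - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: List[str], min_length: int = 3) -> List[str]:
    """Merge reads greedily by longest overlap and return the resulting contigs."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    length = overlap(a, b, min_length)
                    if length > best_len:
                        best_len, best_i, best_j = length, i, j
        if best_len == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


if __name__ == "__main__":
    print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTTA"]))  # one contig
    print(greedy_assemble(["AAACCC", "GGGTTT"]))               # a gap: two contigs
```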
Targeted Sequencing
With targeted sequencing, a subset of genes or regions of the genome are isolated and sequenced. Targeted sequencing allows researchers to focus time, expenses, and data analysis on specific areas of interest and enables sequencing at much higher coverage levels. For example, a typical WGS study achieves coverage levels of 30–50× per genome, while a targeted resequencing project can easily cover the target region at 500–1000× or higher. This higher coverage allows researchers to identify rare variants, variants that would be too rare and too expensive to identify with WGS or CE-based sequencing.
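The trade-off between target size and coverage follows from the usual approximation: mean coverage = (number of reads × read length) / target size. The sketch below applies it in both directions. The 150 bp read length and the 500 kb panel size are illustrative assumptions, and the calculation ignores duplicates, off-target reads, and uneven coverage, all of which raise the number of reads needed in practice.

```python
import math


def mean_coverage(num_reads: int, read_length: int, target_size: int) -> float:
    """Average number of times each target base is sequenced."""
    return num_reads * read_length / target_size


def reads_required(coverage: float, read_length: int, target_size: int) -> int:
    """Minimum number of reads needed to reach the requested mean coverage."""
    return math.ceil(coverage * target_size / read_length)


if __name__ == "__main__":
    read_length = 150                 # assumed read length in bases
    human_genome = 3_200_000_000      # 3.2 billion bases (from the text)
    panel = 500_000                   # hypothetical 500 kb targeted panel

    print("WGS at 30x:", reads_required(30, read_length, human_genome), "reads")
    print("Panel at 1000x:", reads_required(1000, read_length, panel), "reads")
```

Even at 1000×, the hypothetical panel needs far fewer reads than a 30× whole genome, which is why targeted sequencing makes deep coverage affordable.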
Targeted sequencing panels can be purchased with fixed, preselected content or can be custom designed. A wide variety of targeted sequencing library prep kits are available, including kits with probe sets focused on specific areas of interest such as cancer, cardiomyopathy, or autism. Custom probe sets are available through DesignStudio™ Software enabling researchers to target regions of the genome relevant to specific research interests. Custom targeted sequencing is ideal for examining genes in specific pathways, or for follow-up studies from GWAS or WGS. Illumina currently supports two methods for targeted sequencing, target enrichment and amplicon generation (Figure 8).
Target enrichment captures regions of 10 kb to 62 Mb, depending on the library prep kit parameters. Amplicon sequencing allows researchers to sequence 16–1536 targets at a time, spanning 2.4–652.8 kb of total content, depending on the library prep kit used. This highly multiplexed approach enables a wide range of applications for discovery, validation, or screening of genetic variants. Amplicon sequencing is useful for discovery of rare somatic mutations in complex samples (eg, cancerous tumors mixed with germline DNA).¹⁵,¹⁶ Another common amplicon application is sequencing the bacterial 16S rRNA gene across multiple species, a widely used method for phylogeny and taxonomy studies, particularly in diverse metagenomic samples.¹⁷
For more information on Illumina targeted, WGS, exome, or de novo sequencing solutions, visit www.illumina.com/applications/sequencing/dna_sequencing.html.
Figure 8: Target Enrichment and Amplicon Generation Workflows. With target enrichment, specific regions of interest are captured by hybridization to biotinylated probes, then isolated by magnetic pulldown. Amplicon sequencing involves the amplification and purification of regions of interest using highly multiplexed PCR oligo sets.
b. Transcriptomics
Library preparation methods for RNA-Seq typically begin with total RNA sample preparation followed by a ribosomal RNA (rRNA) removal step. The total RNA sample is then converted to cDNA before standard NGS library preparation. RNA-Seq focused on mRNA, small RNA, noncoding RNA, or microRNAs can be achieved by including additional isolation or enrichment steps before cDNA synthesis (Figure 9).
Figure 9: A Complete View of Transcriptomics with NGS—A broad range of methods for transcriptomics with NGS have emerged over the past 10 years including total
RNA-Seq, mRNA-Seq, small RNA-Seq, and targeted RNA-Seq.
Total RNA and mRNA Sequencing
Transcriptome sequencing is a major advance in the study of gene expression because it allows a snapshot of the whole transcriptome rather than a predetermined subset of genes. Whole-transcriptome sequencing provides a comprehensive view of a cellular transcriptional profile at a given biological moment and greatly enhances the power of RNA discovery methods. As with any sequencing method, an almost unlimited dynamic range allows identification and quantification of both common and rare transcripts. Additional capabilities include aligning sequencing reads across splice junctions, and detection of isoforms, novel transcripts, and gene fusions. Library preparation kits that support precise detection of strand orientation are available for both total RNA-Seq and mRNA-Seq methods.
Targeted RNA Sequencing
Targeted RNA sequencing is a method for measuring transcripts of interest to detect differential expression, allele-specific expression, gene fusions, isoforms, cSNPs, and splice junctions. Illumina TruSeq® Targeted RNA Sequencing Kits include preconfigured, experimentally validated panels focused on specific cellular pathways or disease states such as apoptosis, cardiotoxicity, NFκB pathway, and more. Custom content can be designed and ordered for analysis of specific genes of interest. Targeted RNA sequencing is a powerful method for the investigation of specific pathways of interest or for the validation of gene expression microarray or whole-transcriptome sequencing results.
Small RNA and Noncoding RNA Sequencing
Small noncoding RNAs, or microRNAs, are short (18–22 bp) nucleotide sequences that play a role in the regulation of gene expression, often as gene repressors or silencers. The study of microRNAs has grown as their role in transcriptional and translational regulation has become more evident.¹⁸,¹⁹
For more information regarding Illumina solutions for small RNA (noncoding RNA), targeted RNA, total RNA, and mRNA sequencing, visit www.illumina.com/applications/sequencing/rna.html.
c. Epigenomics
While genomics involves the study of heritable or acquired alterations in the DNA sequence, epigenetics is the study of heritable changes in gene activity caused by mechanisms other than DNA sequence changes. Mechanisms of epigenetic activity include DNA methylation, small RNA–mediated regulation, DNA–protein interactions, histone modification, and more.
Methylation Sequencing
A critical focus in epigenetics is the study of cytosine methylation (5mC) states across specific areas of regulation, such as promoters or heterochromatin. Cytosine methylation can significantly modify temporal and spatial gene expression and chromatin remodeling.²⁰ While there are many methods for the study of genetic methylation, methylation sequencing leverages the advantages of NGS technology and genome-wide analysis while assessing methylation states at the single-nucleotide level. Two methylation sequencing methods are widely used: whole-genome bisulfite sequencing (WGBS) and reduced representation bisulfite sequencing (RRBS). With WGBS, sodium bisulfite chemistry converts nonmethylated cytosines to uracils, which are then read as thymines in the sequence reads or data output. In RRBS, DNA is digested with MspI, a restriction enzyme unaffected by methylation status. Fragments in the 100–150 bp size range are isolated to enrich for CpG and promoter-containing DNA regions. Sequencing libraries are then constructed using the standard NGS protocols.
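The logic of bisulfite sequencing can be mimicked in silico. In the sketch below, `bisulfite_convert` turns every unmethylated C into T (the way a converted uracil appears in the read) while methylated cytosines stay as C, and `call_methylation` compares a converted read with the original reference to infer which cytosines were methylated. It models a single strand with a perfectly aligned, error-free read. Real analyses must handle both strands, incomplete conversion, and true C-to-T variants. All names are hypothetical.

```python
from typing import Dict, FrozenSet


def bisulfite_convert(sequence: str,
                      methylated_positions: FrozenSet[int] = frozenset()) -> str:
    """Simulate the read-out of bisulfite-treated DNA on one strand:
    unmethylated C becomes T, methylated C is protected and stays C."""
    converted = []
    for position, base in enumerate(sequence.upper()):
        if base == "C" and position not in methylated_positions:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


def call_methylation(reference: str, converted_read: str) -> Dict[int, bool]:
    """For each reference cytosine, report True if the read still shows C
    (methylated) or False if it shows T (unmethylated)."""
    calls = {}
    for position, (ref_base, read_base) in enumerate(
            zip(reference.upper(), converted_read.upper())):
        if ref_base == "C":
            if read_base == "C":
                calls[position] = True
            elif read_base == "T":
                calls[position] = False
    return calls


if __name__ == "__main__":
    reference = "ACGTCCGA"
    read = bisulfite_convert(reference, frozenset({1}))  # only the first C is methylated
    print(read)
    print(call_methylation(reference, read))
```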
For more information on methylation sequencing solutions, visit www.illumina.com/techniques/sequencing/methylation-sequencing.html.
ChIP Sequencing
Protein–DNA or protein–RNA interactions have a significant impact on many biological processes and disease states. These interactions can be surveyed with NGS by combining chromatin immunoprecipitation (ChIP) assays and NGS methods. ChIP-Seq protocols begin with the chromatin immunoprecipitation step (ChIP protocols vary widely as they must be specific to the species, tissue type, and experimental conditions).
For more information on ChIP-Seq, visit www.illumina.com/techniques/sequencing/dna-sequencing/chip-seq.html.
Ribosome Profiling
Ribosome profiling is a method based on deep sequencing of ribosome-protected mRNA fragments. Purification and sequencing of these fragments provides a “snapshot” of all the ribosomes active in a cell at a specific time point. This information can determine what proteins are being actively translated in a cell, and can be useful for investigating translational control, measuring gene expression, determining the rate of protein synthesis, or predicting protein abundance. Ribosome profiling enables systematic monitoring of cellular translation processes and prediction of protein abundance. Determining what regions of a transcript are being translated can help define the proteome of complex organisms. With NGS, ribosome profiling allows detailed and accurate in vivo analysis of protein production.
To learn more about Illumina ribosome profiling, visit www.illumina.com/applications/sequencing/rna.html.
III. Illumina DNA-to-Data NGS Solutions
a. The Illumina NGS Workflow
Illumina offers a comprehensive solution for the NGS workflow, from library preparation to data analysis (Figure 10). Library preparation kits are available for all NGS methods, including WGS, exome sequencing, targeted sequencing, RNA-Seq, and more. Illumina library preparation protocols can accommodate a range of throughput needs, from manual protocols for smaller laboratories to fully automated library preparation workstations for larger laboratories or genome centers. Likewise, Illumina offers a full portfolio of sequencing platforms, from the benchtop MiniSeq and MiSeq® Systems to the factory-scale HiSeq X and NovaSeq Series of Sequencing Systems that deliver the right level of speed, capacity, and cost for various laboratory or sequencing center requirements. For the last step in the NGS workflow, Illumina offers user-friendly bioinformatics tools that are easily accessible through the web, on instrument, or through onsite servers.
Figure 10: Illumina DNA-to-Data Solutions—Illumina provides fully integrated, DNA-to-data solutions, with technology and support for every step of the NGS workflow
including library preparation, sequencing, and final data analysis.
b. Integrated Data Analysis
Data from any Illumina sequencing system can be streamed into BaseSpace® Sequence Hub, a user-friendly genomics cloud computing platform that offers simplified data management, analytical sequencing tools, and data storage. BaseSpace Sequence Hub is optimized to automate processing of the large volume of data generated. Researchers will find a rich ecosystem of commercial and open-source tools, from Illumina and third-party developers, for data analysis, including alignment and variant detection, annotation, visualization, interpretation, and somatic variant calling. BaseSpace Onsite Sequence Hub is a local version of BaseSpace Sequence Hub that enables data storage and analysis onsite through an installed local server. On-instrument access to BaseSpace Sequence Hub enables the integration of many workflow steps, including library prep planning with BaseSpace Prep,‡ run set-up and chemistry validation, and real-time automatic data transfer to the BaseSpace computing environment.
The NGS workflow then proceeds seamlessly through alignment and subsequent data analysis steps with BaseSpace Apps. BaseSpace Apps offer a wide variety of analysis pipelines, including analysis for de novo assembly, SNP and indel variant analysis, RNA expression profiling, 16S metagenomics, tumor-normal comparisons, epigenetic/gene regulation analysis, and many more. Illumina collaborates closely with commercial and academic software developers to create a full ecosystem of data analysis tools that address the needs of various research objectives. In the final stages of the NGS workflow, data can be shared with collaborators or delivered instantly to customers around the world.§
To learn more about BaseSpace Sequence Hub, visit www.illumina.com/basespace.
‡Currently available with MiniSeq and NextSeq 500/550 Systems only. HiSeq and MiSeq Systems can use Illumina Experiment Manager (IEM) for the same planning and validation functions.
§Cloud-based environment only. BaseSpace Onsite Sequence Hub restricts data sharing to local users.
IV. Glossary
adapters: The oligos bound to the 5′ and 3′ end of each DNA fragment in a sequencing library. The adapters are complementary to the lawn of oligos present on the surface of Illumina sequencing flow cells.
bridge amplification: An amplification reaction that occurs on the surface of an Illumina flow cell. During flow cell manufacturing, the surface is coated with a lawn of two distinct oligonucleotides often referred to as “P5” and “P7.” In the first step of bridge amplification, a single-stranded sequencing library (with complementary adapter ends) is loaded into the flow cell. Individual molecules in the library bind to complementary oligos as they “flow” across the oligo lawn. Priming occurs as the opposite end of a ligated fragment bends over and “bridges” to another complementary oligo on the surface. Repeated denaturation and extension cycles (similar to PCR) result in localized amplification of single molecules into millions of unique, clonal clusters across the flow cell. This process, also known as “clustering,” occurs in an automated, flow cell instrument called a cBot System or in an onboard cluster module within an NGS instrument.
clusters: A clonal grouping of template DNA bound to the surface of a flow cell. Each cluster is seeded by a single template DNA strand and is clonally amplified through bridge amplification until the cluster has ~1000 copies. Each cluster on the flow cell produces a single sequencing read. For example, 10,000 clusters on the flow cell would produce 10,000 single reads or 20,000 paired-end reads.
contigs: A stretch of continuous sequence, in silico, generated by aligning overlapping sequencing reads.
coverage level: The average number of sequenced bases that align to each base of the reference DNA. For example, a whole genome sequenced at 30× coverage means that, on average, each base in the genome was sequenced 30 times.
flow cell: A glass slide with one, two, or eight physically separated lanes, depending on the instrument platform. Each lane is coated with a lawn of surface-bound, adapter-complementary oligos. A single library or a pool of up to 96 multiplexed libraries can be run per lane, depending on application parameters.
indexes/barcodes/tags: A unique DNA sequence ligated to fragments within a sequencing library for downstream in silico sorting and identification. Indexes are typically a component of adapters or PCR primers and are ligated to the library fragments during the sequencing library preparation stage. Illumina indexes are typically 8–12 bp long. Libraries with unique indexes can be pooled together, loaded into one lane of a sequencing flow cell, and sequenced in the same run. Reads are later identified and sorted via bioinformatic software. All together, this process is known as “multiplexing.”
insert: During the library preparation stage, the sample DNA is fragmented, and the fragments of a specific size (typically 200–
500 bp, but can be larger) are ligated or “inserted” in between two oligo adapters. The original sample DNA fragments are
also referred to as “inserts.”
mate pair library: A sequencing library with long inserts, ranging in size from 2 to 5 kb, that is typically run as a paired-end
library. The long gap length in between the sequence pairs is useful for building contigs in de novo sequencing, identification
of indels, and other methods.
multiplexing: See “indexes/barcodes/tags.”
read: NGS uses sophisticated instruments to determine the nucleotide sequence of a DNA or RNA sample. In general terms,
a sequence “read” refers to the data string of A, T, C, and G bases corresponding to the sample DNA or RNA. With Illumina
technology, millions of reads are generated in a single sequencing run.
reference genome: A reference genome is a fully sequenced and assembled genome that acts as a scaffold against which
new sequence reads are aligned and compared. Typically, reads generated from a sequencing run are aligned to a
reference genome as a first step in data analysis.
sequencing by synthesis (SBS): SBS technology uses four fluorescently labeled nucleotides to sequence the tens of millions
of clusters on the flow cell surface in parallel. During each sequencing cycle, a single labeled dNTP is added to the nucleic
acid chain. The nucleotide label serves as a “reversible terminator” for polymerization: after dNTP incorporation, the
fluorescent dye is identified through laser excitation and imaging, then enzymatically cleaved to allow the next round of
incorporation. Base calls are made directly from signal intensity measurements during each cycle.
15. V. References
1. Sanger F, Nicklen S, Coulson AR. DNA sequencing with chain-terminating inhibitors. PNAS. 1977;74(12):5463–5467.
2. Collins FS, Morgan M, Patrinos A. The human genome project: lessons from large-scale biology. Science. 2003;300(5617):286–290.
3. Davies K. 13 years ago, a beer summit in an English pub led to the birth of Solexa. BioIT World. September 28, 2010. (www.bio-itworld.com/2010/issues/sept-oct/solexa.html).
4. Illumina. HiSeq X Ten Series of Sequencing Systems. 2014. (www.illumina.com/documents/products/datasheets/datasheet-hiseq-x-ten.pdf).
5. Fallows J. When will genomics cure cancer? The Atlantic. January 2014. (www.theatlantic.com/magazine/archive/2014/01/when-will-genomics-cure-cancer/355739/).
6. Ross MG, Russ C, Costello M, et al. Characterizing and measuring bias in sequence data. Genome Biol. 2013;14(5):R51.
7. Bentley DR, Balasubramanian S, Swerdlow HP, et al. Accurate whole human genome sequencing using reversible terminator chemistry. Nature. 2008;456(7218):53–59.
8. Nakazato T, Ohta T, Bono H. Experimental design-based functional mining and characterization of high-throughput sequencing data in the sequence read archive.
PLoS One. 2013;8(10):e77910.
9. Illumina. Nextera DNA Library Preparation Kits data sheet. 2014. (www.illumina.com/documents/products/datasheets/datasheet_nextera_dna_sample_prep.pdf).
10. Illumina. Nextera XT DNA Library Preparation Kit data sheet. 2014. (www.illumina.com/documents/products/datasheets/datasheet_nextera_xt_dna_sample_prep.pdf).
11. Illumina. TruSeq DNA PCR-Free Library Preparation Kit data sheet. 2013. (www.illumina.com/documents/products/datasheets/datasheet_truseq_dna_pcr_free_sample_prep.pdf).
12. Kircher M, Sawyer S, Meyer M. Double indexing overcomes inaccuracies in multiplex sequencing on the Illumina platform. Nucleic Acids Res. 2012:2513–2524.
13. Grad YH, Lipsitch M, Feldgarden M, et al. Genomic epidemiology of the Escherichia coli O104:H4 outbreaks in Europe, 2011. PNAS. 2012;109(8):3065–3070.
14. van Dijk EL, Auger H, Jaszczyszyn Y, Thermes C. Ten years of next-generation sequencing technology. Trends Genet. 2014;(9):418–426.
15. McEllistrem MC. Genetic diversity of the pneumococcal capsule: implications for molecular-based serotyping. Future Microbiol. 2009;4(7):857–865.
16. Lo YMD, Chiu RWK. Next-generation sequencing of plasma/serum DNA: an emerging research and molecular diagnostic tool. Clin Chem. 2009;55:607–608.
17. Ram JL, Karim AS, Sendler ED, Kato I. Strategy for microbiome analysis using 16S rRNA gene sequence analysis on the Illumina sequencing platform.
Syst Biol Reprod Med. 2011;57(3):117–118.
18. Wang Y, Kim S, Kim IM. Regulation of metastasis by microRNAs in ovarian cancer. Front Oncol. 2014;10:143.
19. Dior UP, Kogan L, Chill HH, Eizenberg N, Simon A, Revel A. Emerging roles of microRNA in the embryo-endometrium cross talk. Semin Reprod Med. 2014;32(5):402–409.
20. Phillips T. The role of methylation in gene expression. Nat Education. 2008;1(1):116.