SlideShare a Scribd company logo
1 of 46
CLOUD BIOINFORMATICS
BY.A.ARPUTHA SELVARAJ
Bioinformatics is…
The development of computational methods
for studying the structure, function, and
evolution of genes, proteins, and whole
genomes;
The development of methods for the
management and analysis of biological
information arising from genomics and high-
throughput biological experiments.
3
Why is there Bioinformatics?
 Lots of new sequences being added
- Automated sequencers
- Genome Projects
- Metagenomics
- RNA sequencing, microarray studies, proteomics,…
Patterns in datasets that can be analyzed
using computers
Huge datasets
4
 Gramicidine S (Consden et al., 1947), partial insulin sequence
(Sanger and Tuppy, 1951)
 1961: tRNA fragments
 Francis Crick, Sydney Brenner, and colleagues propose the
existence of transfer RNA that uses a three base code and
mediates in the synthesis of proteins (Crick et al., 1961)
General nature of genetic code for proteins. Nature 192: 1227-
1232. In Microbiology: A Centenary Perspective, edited by
Wolfgang K. Joklik, ASM Press. 1999, p.384
 First codon assignment UUU/phe (Nirenberg and Matthaei,
1961)
Need for informatics in biology: origins
5
 The key to the whole field of nucleic acid-based identification
of microorganisms…
…the introduction molecular systematics using proteins and
nucleic acids by the American Nobel laureate Linus Pauling.
Zuckerkandl, E., and L. Pauling. "Molecules as Documents of
Evolutionary History." 1965. Journal of Theoretical Biology
8:357-366
 Another landmark: Nucleic acid sequencing (Sanger and
Coulson, 1975)
Need for informatics in biology: origins
6
Need for informatics in biology: origins
• First genomes sequenced:
– 3.5 kb RNA bacteriophage MS2 (Fiers et al.,
1976)
– 5.4 kb bacteriophage X174 (Sanger et al.,
1977)
– 1.83 Mb First complete genome sequence of a free-living
organism: Haemophilus influenzae KW20 (Fleischmann
et al., 1995)
– First multicellular organism to be sequenced:
C. elegans (C. elegans sequencing consortium, 1998)
• Early databases: Dayhoff, 1972; Erdmann, 1978
• Early programs: restriction enzyme sites, promoters, etc…
circa 1978.
• 1978 – 1993: Nucleic Acids Research published supplemental
information
7(from the National Centre for Biotechnology Information)
Genbank and associated
resources doubles faster
than Moore’s Law!
(< every 18 months)
http://en.wikipedia.org/wiki/Moore’s_law
8
Today: So many genomes…
As of mid-August 2010, according to the GOLD GenomesOnline database….
 Eukaryotic genome projects are in progress?
(Genome and ESTs)
1548 (517 - 5 years ago)
 Prokaryote genome projects are in progress?
5006 (740 - 5 years ago)
 Metagenome projects are in progress?
133 (Zero - 5 years ago)
TOTAL 6687 projects (As of Sept 2011: >10,000)
14
The Human Genome
The genome sequence is complete - almost!
 approximately 3.5 billion base pairs.
15
Work ongoing to locate all genes and
regulatory regions and describe their
functions… …bioinformatics plays a critical
role
16
Identifying single nucleotide polymorphisms
(SNPs) and other changes between individuals
17
Bioinformatics helps with…….
Sequence Similarity Searching/Comparison
 What is similar to my sequence?
 Searching gets harder as the databases
get bigger - and quality changes
 Tools: BLAST and FASTA = early time
saving heuristics (approximate methods)
 Need better methods for SNP analysis!
 Statistics + informed judgment of the
biologist
18
Bioinformatics helps with…….
Structure-Function Relationships
 Can we predict the function of protein
molecules from their sequence?
sequence > structure > function
 Prediction of some simple 3-D structures
possible (a-helix, b-sheet, membrane
spanning, etc.)
19
 Can we define evolutionary
relationships between organisms
by comparing DNA sequences?
- Lots of methods and software,
what is the best analysis approach?
Bioinformatics helps with…….
Phylogenetics
WHAT IS NEXT GENERATION
SEQUENCING (NGS)?
Sanger (“dideoxy sequencing or chain
termination”) Sequencing
 Single stranded DNA
from sample* extended
by polymerase from
primer then randomly
terminated by dideoxy
nucleotide (ddNTP)
 Variable length DNA
fragments radiolabelled
or fluorescently
detected ddNTP
*sample derived from amplified cDNA, genomic clones or whole genome shotgun
Sanger Pro’s & Con’s
• Advantages
– Relatively accurate
– Relatively long (500 –
1500) bp reads
• Disadvantage
– Relatively costly in terms
of reagents and
relatively low
throughput
Next Generation Sequencing (NGS)
Sequence
Assembly
on HPC
Roche 454
Life Tech. Ion Torrent
Illumina HiSeq
Life Tech SOLiD Oxford
Nanopore
“GridION”
Polonator
HeliScope
Pacific
Biosciences
SMRT Cell
(General) NGS Pro’s & Con’s
• Advantages
– Very high throughput
– Very cheap data
production
• Disadvantages
– Relatively short reads
– Relatively higher error
rates
– Bioinformatics of
assembly is much more
challenging
General Workflow
1. Template preparation
2. Sequencing & imaging
3. Genome alignment/assembly
COPING WITH THE
BIOINFORMATICS CHALLENGE
Challenge
 Assembling “next generation sequence” (NGS) data
requires a great deal of computing power and gigabytes
memory
 Software often can execute in parallel on all available
computer processing unit (CPU) cores.
 Many functional annotation processes (e.g. database
searching, gene expression statistical analyses) also
demand a lot of computing power
“High Performance
Computing” and “Cloud
Computing”
Computer
Nodes
Network
Storage
Your local
workstation/
laptop
What is Cloud Computing?
 Pooled resources: shared with many
users (remotely accessed)
 Virtualization: high utilization of
hardware resources (no idling)
 Elasticity: dynamic scaling without
capital expenditure and time delay
 Automation: build, deploy, configure,
provision, and move without manual
intervention
 Metered billing: “pay-as-you-go, only
for what you use
Cloud Computing
Cloud Bioinformatics Module
Raw Data/
Results/
Snapshots
Task-
Specialized
Server
Input Job
Message Queue
Output Job
Message Queue
Job Status
Notification
Customized
Machine
Image
Start-up
(w/parameters)
A More Complete Picture…
Raw Data + Results
Web
Portal
Project
Relational
Database
Database
Loader
Case Study in Bioinformatics on the Cloud
 Used Amazon Web Services
http://aws.amazon.com
 Assembled ~99 raw NGS transcriptome sequence
datasets from 83 species, on 16 Amazon EC2 instances
with 8 CPU cores, 68 GB of RAM, ~200 hours of
computer time, total run in less than one working day.
 Each single machine of the required size would likely
have cost at least ~$10,000 (and time) to purchase,
and incur significant operating costs overhead
(machine room space, power supplies, networking, air
conditioning, staff salaries, etc.)
 The above run could be started up in a few minutes
and cost ~ $500 to complete. Once done, no machines
left idling and unused…
Software for (NGS) Bioinformatics
Bundled with sequencing machines:
e.g. Newbler assembler with Roche 454
3rd party commercial:
DNA Star (www.dnastar.com)
Geneious (http://www.geneious.com/)
GeneWiz (http://www.genewiz.com)
And others…
Open Source:
Lots (selected examples to be covered in this
workshop)
What do I need to run bioinformatics software locally?
Some common bioinformatics software is
platform independent, hence will run equally
under Windows and UNIX (Linux, OSX)
Most other software targets Unix systems. If
you are running Microsoft Windows and want
to run such software locally, the easiest way to
do this(?) is to install some version of Linux
(suggest “Ubuntu”) as a dual boot or (less
intrusively) as a guest operating system in a
virtual machine, e.g.
http://www.vmware.com/products/player/
But, what are *we* going to use here?
WestGrid @ SFU / IRMACS
WestGrid is a consortium member of
“Computer Canada”
https://computecanada.org/
“bugaboo” cluster: 4328 cores total: 1280
cores, 8 cores/node, 16 GB/node, x86_64, IB.
Plus 3048 cores, 12 cores/node, 24GB/node,
x86_64, IB. capability cluster, 40 Core Years
Access to other Westgrid resources through
LAN and WAN
More details from Brian Corrie tomorrow…
Galaxy Genomics Workbench
http://galaxy.psu.edu/
(also http://main.g2.bx.psu.edu/)
THE ROADMAP
What is Bioinformatics?
Road Map
Annotation
Sequences
(Formats)
Visualization of
Sequence &
Annotation
Search &
Alignments
NGS
Sequence
Databases
Sequence
Assembly
Specific Applications
 Sequence Assembly of Transcriptomes
 Sequence Assembly of Whole Genomes
 Annotation of de novo Assembled Sequences
 Identification and Analysis of Sequence Variation
 Comparative Genomic Analysis and Visualization
 Meta-Analysis of Annotated Sequence Data
Survey: Workshop Expectations I
 How to find significance in the huge amount of
data that Next Gen sequencing, but also
microarrays etc. generate.
 A basic understanding of how to analyse next
generation sequencing data.
 Learn some hands-on computer experience
 learning to use software for analysing sequence
data; what can be done and how to do it.
 genome assembly + meta-analysis
Survey: Workshop Expectations II
 The basics of alignment and SNP calling with next-
gen sequencing, and what kind of programs are
out there to do these tasks and then analyze the
large datasets (I've been trying to figure this out
on my own through reading the literature and it's
quite time consuming so any info provided
through the workshop would be very helpful -
thanks)
 The main workflow for processing sequence data
from the beginning to the more specific paths of
analyses. Also the concepts, significance of the
adjustable parameters behind the various
algorithms used in the workflow.
Survey: Workshop Expectations III
 I expect to learn the basic bioinformatics tools.
 Learn different sequence alignment
software/technologies (i.e. BWA, Abyss, etc.). Learn
more about the complexities of NGS sequencing
 Next generation sequencing, data analysis etc.
 Parameters regulating assembly of contigs. How to
take raw data to an assembly, control the main
parameters for assembly, mass analyze data for
annotation and SNPs
 How to compare expression profiles using RNA
transcriptomes.
 Want to learn new things
Survey: Operating System Being Used
Microsoft Windows on Intel/AMD – 14 (86.7%)
 Most running Windows 7 (some XP & Vista)
 One uses Linux through Westgrid and the IRMACS
cluster
Some of you also thinking of running Linux
Apple OS X – 2 (13.3%)
 Snow Leopard Release
 Apple Lion, running Windows 7 using Parallels
Linux on Intel - 2 (13.3%)
Looking Ahead…
What will you need for this workshop?
Mainly, just a laptop running a web browser
(Optional) access to Linux/Unix locally (VM Player)
Reading list:
Will give review citations for future lectures
For next week, suggest that you surf to
http://www.ncbi.nlm.nih.gov/
Thank You

More Related Content

What's hot

BioChain Next Generation Sequencing Products
BioChain Next Generation Sequencing ProductsBioChain Next Generation Sequencing Products
BioChain Next Generation Sequencing Productsbiochain
 
White Paper: Next-Generation Genome Sequencing Using EMC Isilon Scale-Out NAS...
White Paper: Next-Generation Genome Sequencing Using EMC Isilon Scale-Out NAS...White Paper: Next-Generation Genome Sequencing Using EMC Isilon Scale-Out NAS...
White Paper: Next-Generation Genome Sequencing Using EMC Isilon Scale-Out NAS...EMC
 
Data analysis & integration challenges in genomics
Data analysis & integration challenges in genomicsData analysis & integration challenges in genomics
Data analysis & integration challenges in genomicsmikaelhuss
 
GENOMICS AND BIOINFORMATICS
GENOMICS AND BIOINFORMATICSGENOMICS AND BIOINFORMATICS
GENOMICS AND BIOINFORMATICSsandeshGM
 
16S rRNA Analysis using Mothur Pipeline
16S rRNA Analysis using Mothur Pipeline16S rRNA Analysis using Mothur Pipeline
16S rRNA Analysis using Mothur PipelineEman Abdelrazik
 
White Paper: Next-Generation Genome Sequencing Using EMC Isilon Scale-Out NAS...
White Paper: Next-Generation Genome Sequencing Using EMC Isilon Scale-Out NAS...White Paper: Next-Generation Genome Sequencing Using EMC Isilon Scale-Out NAS...
White Paper: Next-Generation Genome Sequencing Using EMC Isilon Scale-Out NAS...EMC
 
Data analytics challenges in genomics
Data analytics challenges in genomicsData analytics challenges in genomics
Data analytics challenges in genomicsmikaelhuss
 
Next Generation Sequencing (NGS)
Next Generation Sequencing (NGS)Next Generation Sequencing (NGS)
Next Generation Sequencing (NGS)LOGESWARAN KA
 
BITS training - UCSC Genome Browser - Part 2
BITS training - UCSC Genome Browser - Part 2BITS training - UCSC Genome Browser - Part 2
BITS training - UCSC Genome Browser - Part 2BITS
 
Computational Genomics - Bioinformatics - IK
Computational Genomics - Bioinformatics - IKComputational Genomics - Bioinformatics - IK
Computational Genomics - Bioinformatics - IKIlgın Kavaklıoğulları
 
Introduction to second generation sequencing
Introduction to second generation sequencingIntroduction to second generation sequencing
Introduction to second generation sequencingDenis C. Bauer
 
The Matched Annotation from NCBI and EMBL-EBI (MANE) Project
The Matched Annotation from NCBI and EMBL-EBI (MANE) ProjectThe Matched Annotation from NCBI and EMBL-EBI (MANE) Project
The Matched Annotation from NCBI and EMBL-EBI (MANE) ProjectGenome Reference Consortium
 

What's hot (20)

BioChain Next Generation Sequencing Products
BioChain Next Generation Sequencing ProductsBioChain Next Generation Sequencing Products
BioChain Next Generation Sequencing Products
 
White Paper: Next-Generation Genome Sequencing Using EMC Isilon Scale-Out NAS...
White Paper: Next-Generation Genome Sequencing Using EMC Isilon Scale-Out NAS...White Paper: Next-Generation Genome Sequencing Using EMC Isilon Scale-Out NAS...
White Paper: Next-Generation Genome Sequencing Using EMC Isilon Scale-Out NAS...
 
Data analysis & integration challenges in genomics
Data analysis & integration challenges in genomicsData analysis & integration challenges in genomics
Data analysis & integration challenges in genomics
 
GENOMICS AND BIOINFORMATICS
GENOMICS AND BIOINFORMATICSGENOMICS AND BIOINFORMATICS
GENOMICS AND BIOINFORMATICS
 
Biological databases
Biological databasesBiological databases
Biological databases
 
16S rRNA Analysis using Mothur Pipeline
16S rRNA Analysis using Mothur Pipeline16S rRNA Analysis using Mothur Pipeline
16S rRNA Analysis using Mothur Pipeline
 
Introduction to 16S Microbiome Analysis
Introduction to 16S Microbiome AnalysisIntroduction to 16S Microbiome Analysis
Introduction to 16S Microbiome Analysis
 
White Paper: Next-Generation Genome Sequencing Using EMC Isilon Scale-Out NAS...
White Paper: Next-Generation Genome Sequencing Using EMC Isilon Scale-Out NAS...White Paper: Next-Generation Genome Sequencing Using EMC Isilon Scale-Out NAS...
White Paper: Next-Generation Genome Sequencing Using EMC Isilon Scale-Out NAS...
 
Data analytics challenges in genomics
Data analytics challenges in genomicsData analytics challenges in genomics
Data analytics challenges in genomics
 
Rna seq
Rna seq Rna seq
Rna seq
 
Next Generation Sequencing (NGS)
Next Generation Sequencing (NGS)Next Generation Sequencing (NGS)
Next Generation Sequencing (NGS)
 
BITS training - UCSC Genome Browser - Part 2
BITS training - UCSC Genome Browser - Part 2BITS training - UCSC Genome Browser - Part 2
BITS training - UCSC Genome Browser - Part 2
 
Computational Genomics - Bioinformatics - IK
Computational Genomics - Bioinformatics - IKComputational Genomics - Bioinformatics - IK
Computational Genomics - Bioinformatics - IK
 
Variant analysis and whole exome sequencing
Variant analysis and whole exome sequencingVariant analysis and whole exome sequencing
Variant analysis and whole exome sequencing
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
 
Introduction to second generation sequencing
Introduction to second generation sequencingIntroduction to second generation sequencing
Introduction to second generation sequencing
 
Bioinformatics seminar
Bioinformatics seminarBioinformatics seminar
Bioinformatics seminar
 
2015 bioinformatics wim_vancriekinge
2015 bioinformatics wim_vancriekinge2015 bioinformatics wim_vancriekinge
2015 bioinformatics wim_vancriekinge
 
The Matched Annotation from NCBI and EMBL-EBI (MANE) Project
The Matched Annotation from NCBI and EMBL-EBI (MANE) ProjectThe Matched Annotation from NCBI and EMBL-EBI (MANE) Project
The Matched Annotation from NCBI and EMBL-EBI (MANE) Project
 
RNA-Seq with R-Bioconductor
RNA-Seq with R-BioconductorRNA-Seq with R-Bioconductor
RNA-Seq with R-Bioconductor
 

Viewers also liked

Biodiversidad En MéXico
Biodiversidad En MéXico Biodiversidad En MéXico
Biodiversidad En MéXico beatrizaaa
 
How to think like a startup
How to think like a startupHow to think like a startup
How to think like a startupLoic Le Meur
 
32 Ways a Digital Marketing Consultant Can Help Grow Your Business
32 Ways a Digital Marketing Consultant Can Help Grow Your Business32 Ways a Digital Marketing Consultant Can Help Grow Your Business
32 Ways a Digital Marketing Consultant Can Help Grow Your BusinessBarry Feldman
 
Teaching Students with Emojis, Emoticons, & Textspeak
Teaching Students with Emojis, Emoticons, & TextspeakTeaching Students with Emojis, Emoticons, & Textspeak
Teaching Students with Emojis, Emoticons, & TextspeakShelly Sanchez Terrell
 
Hype vs. Reality: The AI Explainer
Hype vs. Reality: The AI ExplainerHype vs. Reality: The AI Explainer
Hype vs. Reality: The AI ExplainerLuminary Labs
 

Viewers also liked (6)

Biodiversidad En MéXico
Biodiversidad En MéXico Biodiversidad En MéXico
Biodiversidad En MéXico
 
Inaugural Addresses
Inaugural AddressesInaugural Addresses
Inaugural Addresses
 
How to think like a startup
How to think like a startupHow to think like a startup
How to think like a startup
 
32 Ways a Digital Marketing Consultant Can Help Grow Your Business
32 Ways a Digital Marketing Consultant Can Help Grow Your Business32 Ways a Digital Marketing Consultant Can Help Grow Your Business
32 Ways a Digital Marketing Consultant Can Help Grow Your Business
 
Teaching Students with Emojis, Emoticons, & Textspeak
Teaching Students with Emojis, Emoticons, & TextspeakTeaching Students with Emojis, Emoticons, & Textspeak
Teaching Students with Emojis, Emoticons, & Textspeak
 
Hype vs. Reality: The AI Explainer
Hype vs. Reality: The AI ExplainerHype vs. Reality: The AI Explainer
Hype vs. Reality: The AI Explainer
 

Similar to Cloud bioinformatics 2

Bioinformatics_1_ChenS.pptx
Bioinformatics_1_ChenS.pptxBioinformatics_1_ChenS.pptx
Bioinformatics_1_ChenS.pptxxRowlet
 
Cool Informatics Tools and Services for Biomedical Research
Cool Informatics Tools and Services for Biomedical ResearchCool Informatics Tools and Services for Biomedical Research
Cool Informatics Tools and Services for Biomedical ResearchDavid Ruau
 
Bioinformatics t1-introduction wim-vancriekinge_v2013
Bioinformatics t1-introduction wim-vancriekinge_v2013Bioinformatics t1-introduction wim-vancriekinge_v2013
Bioinformatics t1-introduction wim-vancriekinge_v2013Prof. Wim Van Criekinge
 
Next generation genomics: Petascale data in the life sciences
Next generation genomics: Petascale data in the life sciencesNext generation genomics: Petascale data in the life sciences
Next generation genomics: Petascale data in the life sciencesGuy Coates
 
Microarry andd NGS.pdf
Microarry andd NGS.pdfMicroarry andd NGS.pdf
Microarry andd NGS.pdfnedalalazzwy
 
BITS: Basics of sequence databases
BITS: Basics of sequence databasesBITS: Basics of sequence databases
BITS: Basics of sequence databasesBITS
 
Analyzing Genomic Data with PyEnsembl and Varcode
Analyzing Genomic Data with PyEnsembl and VarcodeAnalyzing Genomic Data with PyEnsembl and Varcode
Analyzing Genomic Data with PyEnsembl and VarcodeAlex Rubinsteyn
 
The UCSC genome browser: A Neuroscience focused overview
The UCSC genome browser: A Neuroscience focused overviewThe UCSC genome browser: A Neuroscience focused overview
The UCSC genome browser: A Neuroscience focused overviewVictoria Perreau
 
Next generation sequencing by Muhammad Abbas
Next generation sequencing by Muhammad AbbasNext generation sequencing by Muhammad Abbas
Next generation sequencing by Muhammad AbbasMuhammadAbbaskhan9
 
Role of bioinformatics in life sciences research
Role of bioinformatics in life sciences researchRole of bioinformatics in life sciences research
Role of bioinformatics in life sciences researchAnshika Bansal
 
Computation and Knowledge
Computation and KnowledgeComputation and Knowledge
Computation and KnowledgeIan Foster
 
Giab jan2016 intro and update 160128
Giab jan2016 intro and update 160128Giab jan2016 intro and update 160128
Giab jan2016 intro and update 160128GenomeInABottle
 
Managing & Processing Big Data for Cancer Genomics, an insight of Bioinformatics
Managing & Processing Big Data for Cancer Genomics, an insight of BioinformaticsManaging & Processing Big Data for Cancer Genomics, an insight of Bioinformatics
Managing & Processing Big Data for Cancer Genomics, an insight of BioinformaticsRaul Chong
 
2016 bioinformatics i_wim_vancriekinge_vupload
2016 bioinformatics i_wim_vancriekinge_vupload2016 bioinformatics i_wim_vancriekinge_vupload
2016 bioinformatics i_wim_vancriekinge_vuploadProf. Wim Van Criekinge
 
NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Bar...
NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Bar...NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Bar...
NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Bar...VHIR Vall d’Hebron Institut de Recerca
 

Similar to Cloud bioinformatics 2 (20)

Bioinformatics_1_ChenS.pptx
Bioinformatics_1_ChenS.pptxBioinformatics_1_ChenS.pptx
Bioinformatics_1_ChenS.pptx
 
2016 davis-plantbio
2016 davis-plantbio2016 davis-plantbio
2016 davis-plantbio
 
2016 bergen-sars
2016 bergen-sars2016 bergen-sars
2016 bergen-sars
 
Cool Informatics Tools and Services for Biomedical Research
Cool Informatics Tools and Services for Biomedical ResearchCool Informatics Tools and Services for Biomedical Research
Cool Informatics Tools and Services for Biomedical Research
 
Bioinformatics t1-introduction wim-vancriekinge_v2013
Bioinformatics t1-introduction wim-vancriekinge_v2013Bioinformatics t1-introduction wim-vancriekinge_v2013
Bioinformatics t1-introduction wim-vancriekinge_v2013
 
Next generation genomics: Petascale data in the life sciences
Next generation genomics: Petascale data in the life sciencesNext generation genomics: Petascale data in the life sciences
Next generation genomics: Petascale data in the life sciences
 
Microarry andd NGS.pdf
Microarry andd NGS.pdfMicroarry andd NGS.pdf
Microarry andd NGS.pdf
 
BITS: Basics of sequence databases
BITS: Basics of sequence databasesBITS: Basics of sequence databases
BITS: Basics of sequence databases
 
Genome Assembly
Genome AssemblyGenome Assembly
Genome Assembly
 
Analyzing Genomic Data with PyEnsembl and Varcode
Analyzing Genomic Data with PyEnsembl and VarcodeAnalyzing Genomic Data with PyEnsembl and Varcode
Analyzing Genomic Data with PyEnsembl and Varcode
 
The UCSC genome browser: A Neuroscience focused overview
The UCSC genome browser: A Neuroscience focused overviewThe UCSC genome browser: A Neuroscience focused overview
The UCSC genome browser: A Neuroscience focused overview
 
Next generation sequencing by Muhammad Abbas
Next generation sequencing by Muhammad AbbasNext generation sequencing by Muhammad Abbas
Next generation sequencing by Muhammad Abbas
 
Role of bioinformatics in life sciences research
Role of bioinformatics in life sciences researchRole of bioinformatics in life sciences research
Role of bioinformatics in life sciences research
 
Computation and Knowledge
Computation and KnowledgeComputation and Knowledge
Computation and Knowledge
 
Giab jan2016 intro and update 160128
Giab jan2016 intro and update 160128Giab jan2016 intro and update 160128
Giab jan2016 intro and update 160128
 
Managing & Processing Big Data for Cancer Genomics, an insight of Bioinformatics
Managing & Processing Big Data for Cancer Genomics, an insight of BioinformaticsManaging & Processing Big Data for Cancer Genomics, an insight of Bioinformatics
Managing & Processing Big Data for Cancer Genomics, an insight of Bioinformatics
 
2016 bioinformatics i_wim_vancriekinge_vupload
2016 bioinformatics i_wim_vancriekinge_vupload2016 bioinformatics i_wim_vancriekinge_vupload
2016 bioinformatics i_wim_vancriekinge_vupload
 
NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Bar...
NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Bar...NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Bar...
NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Bar...
 
T1 2018 bioinformatics
T1 2018 bioinformaticsT1 2018 bioinformatics
T1 2018 bioinformatics
 
ngs.pptx
ngs.pptxngs.pptx
ngs.pptx
 

Recently uploaded

Call Girls Service Noida Maya 9711199012 Independent Escort Service Noida
Call Girls Service Noida Maya 9711199012 Independent Escort Service NoidaCall Girls Service Noida Maya 9711199012 Independent Escort Service Noida
Call Girls Service Noida Maya 9711199012 Independent Escort Service NoidaPooja Gupta
 
Russian Call Girls Gunjur Mugalur Road : 7001305949 High Profile Model Escort...
Russian Call Girls Gunjur Mugalur Road : 7001305949 High Profile Model Escort...Russian Call Girls Gunjur Mugalur Road : 7001305949 High Profile Model Escort...
Russian Call Girls Gunjur Mugalur Road : 7001305949 High Profile Model Escort...narwatsonia7
 
Call Girls Service Nandiambakkam | 7001305949 At Low Cost Cash Payment Booking
Call Girls Service Nandiambakkam | 7001305949 At Low Cost Cash Payment BookingCall Girls Service Nandiambakkam | 7001305949 At Low Cost Cash Payment Booking
Call Girls Service Nandiambakkam | 7001305949 At Low Cost Cash Payment BookingNehru place Escorts
 
call girls in green park DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in green park  DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️call girls in green park  DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in green park DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️saminamagar
 
Call Girls Whitefield Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Whitefield Just Call 7001305949 Top Class Call Girl Service AvailableCall Girls Whitefield Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Whitefield Just Call 7001305949 Top Class Call Girl Service Availablenarwatsonia7
 
Book Call Girls in Yelahanka - For 7001305949 Cheap & Best with original Photos
Book Call Girls in Yelahanka - For 7001305949 Cheap & Best with original PhotosBook Call Girls in Yelahanka - For 7001305949 Cheap & Best with original Photos
Book Call Girls in Yelahanka - For 7001305949 Cheap & Best with original Photosnarwatsonia7
 
Call Girls Thane Just Call 9910780858 Get High Class Call Girls Service
Call Girls Thane Just Call 9910780858 Get High Class Call Girls ServiceCall Girls Thane Just Call 9910780858 Get High Class Call Girls Service
Call Girls Thane Just Call 9910780858 Get High Class Call Girls Servicesonalikaur4
 
See the 2,456 pharmacies on the National E-Pharmacy Platform
See the 2,456 pharmacies on the National E-Pharmacy PlatformSee the 2,456 pharmacies on the National E-Pharmacy Platform
See the 2,456 pharmacies on the National E-Pharmacy PlatformKweku Zurek
 
Call Girls Hebbal Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Hebbal Just Call 7001305949 Top Class Call Girl Service AvailableCall Girls Hebbal Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Hebbal Just Call 7001305949 Top Class Call Girl Service Availablenarwatsonia7
 
Call Girl Service Bidadi - For 7001305949 Cheap & Best with original Photos
Call Girl Service Bidadi - For 7001305949 Cheap & Best with original PhotosCall Girl Service Bidadi - For 7001305949 Cheap & Best with original Photos
Call Girl Service Bidadi - For 7001305949 Cheap & Best with original Photosnarwatsonia7
 
Call Girls Kanakapura Road Just Call 7001305949 Top Class Call Girl Service A...
Call Girls Kanakapura Road Just Call 7001305949 Top Class Call Girl Service A...Call Girls Kanakapura Road Just Call 7001305949 Top Class Call Girl Service A...
Call Girls Kanakapura Road Just Call 7001305949 Top Class Call Girl Service A...narwatsonia7
 
College Call Girls Vyasarpadi Whatsapp 7001305949 Independent Escort Service
College Call Girls Vyasarpadi Whatsapp 7001305949 Independent Escort ServiceCollege Call Girls Vyasarpadi Whatsapp 7001305949 Independent Escort Service
College Call Girls Vyasarpadi Whatsapp 7001305949 Independent Escort ServiceNehru place Escorts
 
Russian Call Girls Chickpet - 7001305949 Booking and charges genuine rate for...
Russian Call Girls Chickpet - 7001305949 Booking and charges genuine rate for...Russian Call Girls Chickpet - 7001305949 Booking and charges genuine rate for...
Russian Call Girls Chickpet - 7001305949 Booking and charges genuine rate for...narwatsonia7
 
Call Girl Koramangala | 7001305949 At Low Cost Cash Payment Booking
Call Girl Koramangala | 7001305949 At Low Cost Cash Payment BookingCall Girl Koramangala | 7001305949 At Low Cost Cash Payment Booking
Call Girl Koramangala | 7001305949 At Low Cost Cash Payment Bookingnarwatsonia7
 
Pharmaceutical Marketting: Unit-5, Pricing
Pharmaceutical Marketting: Unit-5, PricingPharmaceutical Marketting: Unit-5, Pricing
Pharmaceutical Marketting: Unit-5, PricingArunagarwal328757
 
Call Girls Hsr Layout Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Hsr Layout Just Call 7001305949 Top Class Call Girl Service AvailableCall Girls Hsr Layout Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Hsr Layout Just Call 7001305949 Top Class Call Girl Service Availablenarwatsonia7
 
Call Girls Service Chennai Jiya 7001305949 Independent Escort Service Chennai
Call Girls Service Chennai Jiya 7001305949 Independent Escort Service ChennaiCall Girls Service Chennai Jiya 7001305949 Independent Escort Service Chennai
Call Girls Service Chennai Jiya 7001305949 Independent Escort Service ChennaiNehru place Escorts
 
High Profile Call Girls Jaipur Vani 8445551418 Independent Escort Service Jaipur
High Profile Call Girls Jaipur Vani 8445551418 Independent Escort Service JaipurHigh Profile Call Girls Jaipur Vani 8445551418 Independent Escort Service Jaipur
High Profile Call Girls Jaipur Vani 8445551418 Independent Escort Service Jaipurparulsinha
 
Call Girl Lucknow Mallika 7001305949 Independent Escort Service Lucknow
Call Girl Lucknow Mallika 7001305949 Independent Escort Service LucknowCall Girl Lucknow Mallika 7001305949 Independent Escort Service Lucknow
Call Girl Lucknow Mallika 7001305949 Independent Escort Service Lucknownarwatsonia7
 
Call Girls Electronic City Just Call 7001305949 Top Class Call Girl Service A...
Call Girls Electronic City Just Call 7001305949 Top Class Call Girl Service A...Call Girls Electronic City Just Call 7001305949 Top Class Call Girl Service A...
Call Girls Electronic City Just Call 7001305949 Top Class Call Girl Service A...narwatsonia7
 

Recently uploaded (20)

Call Girls Service Noida Maya 9711199012 Independent Escort Service Noida
Call Girls Service Noida Maya 9711199012 Independent Escort Service NoidaCall Girls Service Noida Maya 9711199012 Independent Escort Service Noida
Call Girls Service Noida Maya 9711199012 Independent Escort Service Noida
 
Russian Call Girls Gunjur Mugalur Road : 7001305949 High Profile Model Escort...
Russian Call Girls Gunjur Mugalur Road : 7001305949 High Profile Model Escort...Russian Call Girls Gunjur Mugalur Road : 7001305949 High Profile Model Escort...
Russian Call Girls Gunjur Mugalur Road : 7001305949 High Profile Model Escort...
 
Call Girls Service Nandiambakkam | 7001305949 At Low Cost Cash Payment Booking
Call Girls Service Nandiambakkam | 7001305949 At Low Cost Cash Payment BookingCall Girls Service Nandiambakkam | 7001305949 At Low Cost Cash Payment Booking
Call Girls Service Nandiambakkam | 7001305949 At Low Cost Cash Payment Booking
 
call girls in green park DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in green park  DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️call girls in green park  DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in green park DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
 
Call Girls Whitefield Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Whitefield Just Call 7001305949 Top Class Call Girl Service AvailableCall Girls Whitefield Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Whitefield Just Call 7001305949 Top Class Call Girl Service Available
 
Book Call Girls in Yelahanka - For 7001305949 Cheap & Best with original Photos
Book Call Girls in Yelahanka - For 7001305949 Cheap & Best with original PhotosBook Call Girls in Yelahanka - For 7001305949 Cheap & Best with original Photos
Book Call Girls in Yelahanka - For 7001305949 Cheap & Best with original Photos
 
Call Girls Thane Just Call 9910780858 Get High Class Call Girls Service
Call Girls Thane Just Call 9910780858 Get High Class Call Girls ServiceCall Girls Thane Just Call 9910780858 Get High Class Call Girls Service
Call Girls Thane Just Call 9910780858 Get High Class Call Girls Service
 
See the 2,456 pharmacies on the National E-Pharmacy Platform
See the 2,456 pharmacies on the National E-Pharmacy PlatformSee the 2,456 pharmacies on the National E-Pharmacy Platform
See the 2,456 pharmacies on the National E-Pharmacy Platform
 
Call Girls Hebbal Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Hebbal Just Call 7001305949 Top Class Call Girl Service AvailableCall Girls Hebbal Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Hebbal Just Call 7001305949 Top Class Call Girl Service Available
 
Call Girl Service Bidadi - For 7001305949 Cheap & Best with original Photos
Call Girl Service Bidadi - For 7001305949 Cheap & Best with original PhotosCall Girl Service Bidadi - For 7001305949 Cheap & Best with original Photos
Call Girl Service Bidadi - For 7001305949 Cheap & Best with original Photos
 
Call Girls Kanakapura Road Just Call 7001305949 Top Class Call Girl Service A...
Call Girls Kanakapura Road Just Call 7001305949 Top Class Call Girl Service A...Call Girls Kanakapura Road Just Call 7001305949 Top Class Call Girl Service A...
Call Girls Kanakapura Road Just Call 7001305949 Top Class Call Girl Service A...
 
College Call Girls Vyasarpadi Whatsapp 7001305949 Independent Escort Service
College Call Girls Vyasarpadi Whatsapp 7001305949 Independent Escort ServiceCollege Call Girls Vyasarpadi Whatsapp 7001305949 Independent Escort Service
College Call Girls Vyasarpadi Whatsapp 7001305949 Independent Escort Service
 
Russian Call Girls Chickpet - 7001305949 Booking and charges genuine rate for...
Russian Call Girls Chickpet - 7001305949 Booking and charges genuine rate for...Russian Call Girls Chickpet - 7001305949 Booking and charges genuine rate for...
Russian Call Girls Chickpet - 7001305949 Booking and charges genuine rate for...
 
Call Girl Koramangala | 7001305949 At Low Cost Cash Payment Booking
Call Girl Koramangala | 7001305949 At Low Cost Cash Payment BookingCall Girl Koramangala | 7001305949 At Low Cost Cash Payment Booking
Call Girl Koramangala | 7001305949 At Low Cost Cash Payment Booking
 
Pharmaceutical Marketting: Unit-5, Pricing
Pharmaceutical Marketting: Unit-5, PricingPharmaceutical Marketting: Unit-5, Pricing
Pharmaceutical Marketting: Unit-5, Pricing
 
Call Girls Hsr Layout Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Hsr Layout Just Call 7001305949 Top Class Call Girl Service AvailableCall Girls Hsr Layout Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Hsr Layout Just Call 7001305949 Top Class Call Girl Service Available
 
Call Girls Service Chennai Jiya 7001305949 Independent Escort Service Chennai
Call Girls Service Chennai Jiya 7001305949 Independent Escort Service ChennaiCall Girls Service Chennai Jiya 7001305949 Independent Escort Service Chennai
Call Girls Service Chennai Jiya 7001305949 Independent Escort Service Chennai
 
High Profile Call Girls Jaipur Vani 8445551418 Independent Escort Service Jaipur
High Profile Call Girls Jaipur Vani 8445551418 Independent Escort Service JaipurHigh Profile Call Girls Jaipur Vani 8445551418 Independent Escort Service Jaipur
High Profile Call Girls Jaipur Vani 8445551418 Independent Escort Service Jaipur
 
Call Girl Lucknow Mallika 7001305949 Independent Escort Service Lucknow
Call Girl Lucknow Mallika 7001305949 Independent Escort Service LucknowCall Girl Lucknow Mallika 7001305949 Independent Escort Service Lucknow
Call Girl Lucknow Mallika 7001305949 Independent Escort Service Lucknow
 
Call Girls Electronic City Just Call 7001305949 Top Class Call Girl Service A...
Call Girls Electronic City Just Call 7001305949 Top Class Call Girl Service A...Call Girls Electronic City Just Call 7001305949 Top Class Call Girl Service A...
Call Girls Electronic City Just Call 7001305949 Top Class Call Girl Service A...
 

Cloud bioinformatics 2

  • 2. Bioinformatics is… The development of computational methods for studying the structure, function, and evolution of genes, proteins, and whole genomes; The development of methods for the management and analysis of biological information arising from genomics and high- throughput biological experiments.
  • 3. 3 Why is there Bioinformatics?  Lots of new sequences being added - Automated sequencers - Genome Projects - Metagenomics - RNA sequencing, microarray studies, proteomics,… Patterns in datasets that can be analyzed using computers Huge datasets
  • 4. 4  Gramicidine S (Consden et al., 1947), partial insulin sequence (Sanger and Tuppy, 1951)  1961: tRNA fragments  Francis Crick, Sydney Brenner, and colleagues propose the existence of transfer RNA that uses a three base code and mediates in the synthesis of proteins (Crick et al., 1961) General nature of genetic code for proteins. Nature 192: 1227- 1232. In Microbiology: A Centenary Perspective, edited by Wolfgang K. Joklik, ASM Press. 1999, p.384  First codon assignment UUU/phe (Nirenberg and Matthaei, 1961) Need for informatics in biology: origins
  • 5. 5  The key to the whole field of nucleic acid-based identification of microorganisms… …the introduction molecular systematics using proteins and nucleic acids by the American Nobel laureate Linus Pauling. Zuckerkandl, E., and L. Pauling. "Molecules as Documents of Evolutionary History." 1965. Journal of Theoretical Biology 8:357-366  Another landmark: Nucleic acid sequencing (Sanger and Coulson, 1975) Need for informatics in biology: origins
  • 6. 6 Need for informatics in biology: origins • First genomes sequenced: – 3.5 kb RNA bacteriophage MS2 (Fiers et al., 1976) – 5.4 kb bacteriophage X174 (Sanger et al., 1977) – 1.83 Mb First complete genome sequence of a free-living organism: Haemophilus influenzae KW20 (Fleischmann et al., 1995) – First multicellular organism to be sequenced: C. elegans (C. elegans sequencing consortium, 1998) • Early databases: Dayhoff, 1972; Erdmann, 1978 • Early programs: restriction enzyme sites, promoters, etc… circa 1978. • 1978 – 1993: Nucleic Acids Research published supplemental information
  • 7. 7(from the National Centre for Biotechnology Information) Genbank and associated resources doubles faster than Moore’s Law! (< every 18 months) http://en.wikipedia.org/wiki/Moore’s_law
  • 8. 8 Today: So many genomes… As of mid-August 2010, according to the GOLD GenomesOnline database….  Eukaryotic genome projects are in progress? (Genome and ESTs) 1548 (517 - 5 years ago)  Prokaryote genome projects are in progress? 5006 (740 - 5 years ago)  Metagenome projects are in progress? 133 (Zero - 5 years ago) TOTAL 6687 projects (As of Sept 2011: >10,000)
  • 9.
  • 10.
  • 11.
  • 12.
  • 13.
  • 14. 14 The Human Genome The genome sequence is complete - almost!  approximately 3.5 billion base pairs.
  • 15. 15 Work ongoing to locate all genes and regulatory regions and describe their functions… …bioinformatics plays a critical role
  • 16. 16 Identifying single nucleotide polymorphisms (SNPs) and other changes between individuals
  • 17. 17 Bioinformatics helps with……. Sequence Similarity Searching/Comparison  What is similar to my sequence?  Searching gets harder as the databases get bigger - and quality changes  Tools: BLAST and FASTA = early time saving heuristics (approximate methods)  Need better methods for SNP analysis!  Statistics + informed judgment of the biologist
  • 18. 18 Bioinformatics helps with……. Structure-Function Relationships  Can we predict the function of protein molecules from their sequence? sequence > structure > function  Prediction of some simple 3-D structures possible (a-helix, b-sheet, membrane spanning, etc.)
  • 19. 19  Can we define evolutionary relationships between organisms by comparing DNA sequences? - Lots of methods and software, what is the best analysis approach? Bioinformatics helps with……. Phylogenetics
  • 20. WHAT IS NEXT GENERATION SEQUENCING (NGS)?
  • 21. Sanger (“dideoxy sequencing or chain termination”) Sequencing  Single stranded DNA from sample* extended by polymerase from primer then randomly terminated by dideoxy nucleotide (ddNTP)  Variable length DNA fragments radiolabelled or fluorescently detected ddNTP *sample derived from amplified cDNA, genomic clones or whole genome shotgun
  • 22. Sanger Pro’s & Con’s • Advantages – Relatively accurate – Relatively long (500 – 1500) bp reads • Disadvantage – Relatively costly in terms of reagents and relatively low throughput
  • 23. Next Generation Sequencing (NGS) Sequence Assembly on HPC Roche 454 Life Tech. Ion Torrent Illumina HiSeq Life Tech SOLiD Oxford Nanopore “GridION” Polonator HeliScope Pacific Biosciences SMRT Cell
  • 24. (General) NGS Pro’s & Con’s • Advantages – Very high throughput – Very cheap data production • Disadvantages – Relatively short reads – Relatively higher error rates – Bioinformatics of assembly is much more challenging
  • 25. General Workflow 1. Template preparation 2. Sequencing & imaging 3. Genome alignment/assembly
  • 27. Challenge  Assembling “next generation sequence” (NGS) data requires a great deal of computing power and gigabytes memory  Software often can execute in parallel on all available computer processing unit (CPU) cores.  Many functional annotation processes (e.g. database searching, gene expression statistical analyses) also demand a lot of computing power
  • 28. “High Performance Computing” and “Cloud Computing” Computer Nodes Network Storage Your local workstation/ laptop
  • 29. What is Cloud Computing?  Pooled resources: shared with many users (remotely accessed)  Virtualization: high utilization of hardware resources (no idling)  Elasticity: dynamic scaling without capital expenditure and time delay  Automation: build, deploy, configure, provision, and move without manual intervention  Metered billing: “pay-as-you-go, only for what you use Cloud Computing
  • 30. Cloud Bioinformatics Module Raw Data/ Results/ Snapshots Task- Specialized Server Input Job Message Queue Output Job Message Queue Job Status Notification Customized Machine Image Start-up (w/parameters)
  • 31. A More Complete Picture… Raw Data + Results Web Portal Project Relational Database Database Loader
  • 32. Case Study in Bioinformatics on the Cloud  Used Amazon Web Services http://aws.amazon.com  Assembled ~99 raw NGS transcriptome sequence datasets from 83 species, on 16 Amazon EC2 instances with 8 CPU cores, 68 GB of RAM, ~200 hours of computer time, total run in less than one working day.  Each single machine of the required size would likely have cost at least ~$10,000 (and time) to purchase, and incur significant operating costs overhead (machine room space, power supplies, networking, air conditioning, staff salaries, etc.)  The above run could be started up in a few minutes and cost ~ $500 to complete. Once done, no machines left idling and unused…
  • 33. Software for (NGS) Bioinformatics Bundled with sequencing machines: e.g. Newbler assembler with Roche 454 3rd party commercial: DNA Star (www.dnastar.com) Geneious (http://www.geneious.com/) GeneWiz (http://www.genewiz.com) And others… Open Source: Lots (selected examples to be covered in this workshop)
  • 34. What do I need to run bioinformatics software locally? Some common bioinformatics software is platform independent, hence will run equally under Windows and UNIX (Linux, OSX) Most other software targets Unix systems. If you are running Microsoft Windows and want to run such software locally, the easiest way to do this(?) is to install some version of Linux (suggest “Ubuntu”) as a dual boot or (less intrusively) as a guest operating system in a virtual machine, e.g. http://www.vmware.com/products/player/
  • 35. But, what are *we* going to use here?
  • 36. WestGrid @ SFU / IRMACS WestGrid is a consortium member of “Computer Canada” https://computecanada.org/ “bugaboo” cluster: 4328 cores total: 1280 cores, 8 cores/node, 16 GB/node, x86_64, IB. Plus 3048 cores, 12 cores/node, 24GB/node, x86_64, IB. capability cluster, 40 Core Years Access to other Westgrid resources through LAN and WAN More details from Brian Corrie tomorrow…
  • 39. What is Bioinformatics? Road Map Annotation Sequences (Formats) Visualization of Sequence & Annotation Search & Alignments NGS Sequence Databases Sequence Assembly
  • 40. Specific Applications  Sequence Assembly of Transcriptomes  Sequence Assembly of Whole Genomes  Annotation of de novo Assembled Sequences  Identification and Analysis of Sequence Variation  Comparative Genomic Analysis and Visualization  Meta-Analysis of Annotated Sequence Data
  • 41. Survey: Workshop Expectations I  How to find significance in the huge amount of data that Next Gen sequencing, but also microarrays etc. generate.  A basic understanding of how to analyse next generation sequencing data.  Learn some hands-on computer experience  learning to use software for analysing sequence data; what can be done and how to do it.  genome assembly + meta-analysis
  • 42. Survey: Workshop Expectations II  The basics of alignment and SNP calling with next- gen sequencing, and what kind of programs are out there to do these tasks and then analyze the large datasets (I've been trying to figure this out on my own through reading the literature and it's quite time consuming so any info provided through the workshop would be very helpful - thanks)  The main workflow for processing sequence data from the beginning to the more specific paths of analyses. Also the concepts, significance of the adjustable parameters behind the various algorithms used in the workflow.
  • 43. Survey: Workshop Expectations III  I expect to learn the basic bioinformatics tools.  Learn different sequence alignment software/technologies (i.e. BWA, Abyss, etc.). Learn more about the complexities of NGS sequencing  Next generation sequencing, data analysis etc.  Parameters regulating assembly of contigs. How to take raw data to an assembly, control the main parameters for assembly, mass analyze data for annotation and SNPs  How to compare expression profiles using RNA transcriptomes.  Want to learn new things
  • 44. Survey: Operating System Being Used Microsoft Windows on Intel/AMD – 14 (86.7%)  Most running Windows 7 (some XP & Vista)  One uses Linux through Westgrid and the IRMACS cluster Some of you also thinking of running Linux Apple OS X – 2 (13.3%)  Snow Leopard Release  Apple Lion, running Windows 7 using Parallels Linux on Intel - 2 (13.3%)
  • 45. Looking Ahead… What will you need for this workshop? Mainly, just a laptop running a web browser (Optional) access to Linux/Unix locally (VM Player) Reading list: Will give review citations for future lectures For next week, suggest that you surf to http://www.ncbi.nlm.nih.gov/

Editor's Notes

  1. Summer 2002
  2. Information sources: (Rhesus macaque) Robert F. Service. Science 311: 5767. 1544-1546 (2006). 454 press release, May 31, 2007. http://www.454.com/about-454/news/index.asp?display=detail&id=68 Wellcome Trust Sanger Institute press release, July 2, 2008. http://www.sanger.ac.uk/Info/Press/2008/080702.shtml Complete Genomics article in Bio-IT World: http://www.bio-itworld.com/BioIT_Article.aspx?id=82058 Applied Biosystems press release, October 1, 2008. http://phx.corporate-ir.net/phoenix.zhtml?c=61498&p=irol-abiNewsArticle&ID=1207598&highlight=
  3. Summer 2002
  4. Summer 2002
  5. Summer 2002
  6. Roadmap of the workshop (10 minutes, 3 slides - program revisited; + tech structure/flow diagram(?)