Intro to field_of_bioinformatics

Published in: Technology, Business

Transcript

  • 1. 09/05/13, K-INBRE Bioinformatics Core, KSU. Introduction to the field of bioinformatics. Sept, 2013. Jennifer Shelton, K-INBRE Bioinformatics Core, KSU
  • 2. Outline: I. Basic concepts: i. Definition of bioinformatics; ii. Databases (flat-file and relational); iii. Assembly (overlap-layout-consensus). II. Steps you can take on your own
  • 3. Definition of bioinformatics: “Bioinformatics: Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.” -NIH Biomedical Information Science and Technology Initiative Consortium 2000. Diagram labels: acquire data; store/archive data; organize data; analyze data; visualize data (biological, medical, behavioral, or health).
  • 5. Problem with volume: “We believe the field of bioinformatics for genetic analysis will be one of the biggest areas of disruptive innovation in life science tools over the next few years,” -Isaac Ro, Goldman Sachs. Per year, worldwide, we can generate ~13,000,000,000,000,000 bp of data. (Image: Mark Smiciklas, Flickr.com/photos/intersectionconsulting)
  • 6. Problem with volume: “This unprecedented amount of sequencing information poses bottlenecks that vary, depending on application, at the level of data extraction, analysis, and interpretation” “These challenges have become part and parcel of the biomedical research community where investigators have increasingly needed to incorporate bioinformatics and biostatistics into their armamentarium.” Source: Opportunities and Challenges Associated with Clinical Diagnostic Genome Sequencing: A Report of the Association for Molecular Pathology. The Journal of Molecular Diagnostics, November 2012. (Image: Mark Smiciklas, Flickr.com/photos/intersectionconsulting)
  • 7. Problem with volume: “It sounds like an analog solution in a digital age,” -Sifei He, head of cloud computing for BGI (referring to FedExing disks of data because internet connections are often too slow). Source: NY Times 2011 article, “DNA Sequencing Caught in a Deluge of Data”, http://www.nytimes.com/2011/12/01/business/dna-sequencing-caught-in-deluge-of-data.html?pagewanted=all&_r=0
  • 8. Examples of bioinformatics tools [image]
  • 9. Outline: I. Basic concepts: i. Definition of bioinformatics; ii. Databases (flat-file and relational); iii. Assembly (overlap-layout-consensus). II. Steps you can take on your own
  • 10. Flat-file databases. Example: GenBank, http://www.ncbi.nlm.nih.gov/genbank/ ‘Records’: the data about one unique object. ‘Fields’: the same kind of data about different objects.
  • 11. Flat-file databases: Any flat-file database, like GenBank, can be thought of as a single spreadsheet, called a ‘table’, made up of ‘fields’ and ‘records’.
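The table picture above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tab-separated layout, the field names, and the accession values are made up for the example and are not GenBank's actual record format.

```python
import csv
import io

# Hypothetical flat-file table: one record per line, one field per column.
# Accessions and values are invented for illustration only.
FLAT_FILE = """accession\torganism\tlength_bp
EX000001\tHomo sapiens\t1200
EX000002\tMus musculus\t950
EX000003\tHomo sapiens\t2310
"""


def read_records(text):
    """Return each record (row) as a dict keyed by field name (column header)."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


records = read_records(FLAT_FILE)

# One record: every field for a single object.
first_record = records[0]

# One field: the same kind of data taken from every record.
organisms = [record["organism"] for record in records]
```

Reading a record gives everything known about one object, while reading a field gives one kind of value across all objects.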
  • 12. Relational databases have multiple tables, with some fields shared between tables and some that differ. (‘Fields’: the same kind of data about different objects.) Example: KEGG PATHWAY, http://www.genome.jp/kegg/pathway.html
  • 13. Relational databases: Relational databases are like multiple tables that are linked with a shared field. This acts like a “key” between them. Example (9/25/12): KEGG PATHWAY: hsa05204, www.genome.jp/dbget-bin/www_bget?pathway+hsa05204. Organism: Homo sapiens (human) [GN:hsa]. Gene:
      1543 CYP1A1; cytochrome P450, family 1, subfamily A, polypeptide 1 (EC:1.14.14.1) [KO:K07408] [EC:1.14.14.1]
      1576 CYP3A4; cytochrome P450, family 3, subfamily A, polypeptide 4 (EC:1.14.13.67 1.14.13.97 1.14.13.32) [KO:K07424] [EC:1.14.14.1]
      1577 CYP3A5; cytochrome P450, family 3, subfamily A, polypeptide 5 (EC:1.14.14.1) [KO:K07424] [EC:1.14.14.1]
      1551 CYP3A7; cytochrome P450, family 3, subfamily A, polypeptide 7 (EC:1.14.14.1) [KO:K07424] [EC:1.14.14.1]
      64816 CYP3A43; cytochrome P450, family 3, subfamily A, polypeptide 43 (EC:1.14.14.1) [KO:K07424] [EC:1.14.14.1]
      5743 PTGS2; prostaglandin-endoperoxide synthase 2 (prostaglandin G/H synthase and cyclooxygenase) (EC:1.14.99.1) [KO:K11987] [EC:1.14.99.1]
      10 NAT2; N-acetyltransferase 2 (arylamine N-acetyltransferase) (EC:2.3.1.5) [KO:K00622] [EC:2.3.1.5]
      9 NAT1; N-acetyltransferase 1 (arylamine N-acetyltransferase) (EC:2.3.1.5) [KO:K00622] [EC:2.3.1.5]
      1544 CYP1A2; cytochrome P450, family 1, subfamily A, polypeptide 2 (EC:1.14.14.1) [KO:K07409] [EC:1.14.14.1]
      6799 SULT1A2; sulfotransferase family, cytosolic, 1A, phenol-preferring, member 2 (EC:2.8.2.1) [KO:K01014] [EC:2.8.2.1]
      6817 SULT1A1; sulfotransferase family, cytosolic, 1A, phenol-preferring, member 1 (EC:2.8.2.1) [KO:K01014] [EC:2.8.2.1]
      6818 SULT1A3; sulfotransferase family, cytosolic, 1A, phenol-preferring, member 3 (EC:2.8.2.1) [KO:K01014] [EC:2.8.2.1]
      445329 SULT1A4; sulfotransferase family, cytosolic, 1A, phenol-preferring, member 4 (EC:2.8.2.1) [KO:K01014] [EC:2.8.2.1]
      1545 CYP1B1; cytochrome P450, family 1, subfamily B, polypeptide 1 (EC:1.14.14.1) [KO:K07410] [EC:1.14.14.1]
      1558 CYP2C8; cytochrome P450, family 2, subfamily C, polypeptide 8 (EC:1.14.14.1) [KO:K07413] [EC:1.14.14.1]
      1562 CYP2C18; cytochrome P450, family 2, subfamily C, polypeptide 18 (EC:1.14.14.1) [KO:K07413] [EC:1.14.14.1]
      1557 CYP2C19; cytochrome P450, family 2, subfamily C, polypeptide 19 (EC:1.14.13.48 1.14.13.49 1.14.13.80) [KO:K07413] [EC:1.14.14.1]
      1559 CYP2C9; cytochrome P450, family 2, subfamily C, polypeptide 9 (EC:1.14.13.48 1.14.13.49 1.14.13.80) [KO:K07413] [EC:1.14.14.1]
      2052 EPHX1; epoxide hydrolase 1, microsomal (xenobiotic)
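As a rough sketch of how a shared field links tables, the example below builds two small tables in an in-memory SQLite database and joins them on a KO identifier. The gene IDs, symbols, KO numbers, and EC numbers are copied from the KEGG listing on this slide; the table names, column names, and two-table layout are assumptions made for illustration and are not KEGG's actual schema.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical layout: both tables carry a `ko` field, which acts as the key.
conn.executescript("""
    CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, symbol TEXT, ko TEXT);
    CREATE TABLE orthology (ko TEXT PRIMARY KEY, ec TEXT);
""")
conn.executemany("INSERT INTO gene VALUES (?, ?, ?)", [
    (1543, "CYP1A1", "K07408"),
    (1576, "CYP3A4", "K07424"),
    (1577, "CYP3A5", "K07424"),
    (10, "NAT2", "K00622"),
])
conn.executemany("INSERT INTO orthology VALUES (?, ?)", [
    ("K07408", "1.14.14.1"),
    ("K07424", "1.14.14.1"),
    ("K00622", "2.3.1.5"),
])


def genes_with_ec(connection):
    """Join the two tables on the shared `ko` field, ordered by gene ID."""
    query = """
        SELECT gene.symbol, orthology.ec
        FROM gene
        JOIN orthology ON gene.ko = orthology.ko
        ORDER BY gene.gene_id
    """
    return connection.execute(query).fetchall()


rows = genes_with_ec(conn)
```

Storing the EC number once per KO entry, rather than once per gene, is the kind of saving the shared key makes possible: CYP3A4 and CYP3A5 both point at the same orthology row.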
  • 14. Outline: I. Basic concepts: i. Definition of bioinformatics; ii. Databases (flat-file and relational); iii. Assembly (overlap-layout-consensus). II. Steps you can take on your own
  • 15. Assembly: Of the ~13,000,000,000,000,000 bp of sequence data we can generate each year, most does not span the full length of the molecule of DNA or RNA. Instead, scientists get back multiple copies of their genome (or transcriptome), but all in short segments (between 50 bp and several kb). Steps of Overlap-Layout-Consensus (OLC): 1) Let’s think of a genome like the text of a book. We get back multiple copies of the book.
  • 16. OLC Assembly: 1) Instead of being nicely bound, we get randomly shredded text, all mixed together, from our multiple copies:
      ice was beginning to get very tired of sitting by her tister on the bank, and of having nothing to do
      Alice was beginning to get vory tired of sitting by her sister on the bank, and of having nothing to do: once
      lice was beginning to get very tired of sitting by her sister on the bank, and of having nothing
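To make the shredded-book picture concrete, here is a minimal sketch that cuts random fixed-length fragments out of a text. The function name `shred`, its parameters, and the fixed read length are assumptions chosen for illustration; real sequencing reads vary in length and contain errors like the “tister” and “vory” typos above, which this sketch does not simulate.

```python
import random


def shred(text, n_reads, read_len, seed=0):
    """Return n_reads fragments of length read_len cut at random positions.

    Assumes read_len <= len(text). The seed only makes the example repeatable.
    """
    rng = random.Random(seed)
    reads = []
    for _ in range(n_reads):
        # Pick a start so the whole fragment fits inside the text.
        start = rng.randrange(len(text) - read_len + 1)
        reads.append(text[start:start + read_len])
    return reads


book = "Alice was beginning to get very tired of sitting by her sister on the bank"
reads = shred(book, n_reads=10, read_len=20)
```

Because the start positions are random, many fragments overlap one another, which is what the next step of OLC relies on.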
  • 17. OLC Assembly: 2) We look for lines that overlap for more than some minimum number of letters (in these programs all overlaps are found, then a single “path” is found through this “graph” of overlaps):
      ice was beginning to get very tired of sitting by her tister on the bank, and of having nothing to do
      Alice was beginning to get vory tired of sitting by her sister on the bank, and of having nothing to do: once
      lice was beginning to get very tired of sitting by her sister on the bank, and of having nothing
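The overlap step can be sketched as follows, assuming exact matches only: for every ordered pair of reads, find the longest end of one read that equals the start of the other, and keep it if it meets a minimum length. The function names and the short example reads are hypothetical, and real assemblers use faster methods that also tolerate sequencing errors.

```python
def overlap_length(left, right, min_overlap):
    """Length of the longest suffix of `left` that is also a prefix of `right`.

    Returns 0 if no exact overlap of at least `min_overlap` letters exists.
    """
    longest_possible = min(len(left), len(right))
    for length in range(longest_possible, min_overlap - 1, -1):
        if left.endswith(right[:length]):
            return length
    return 0


def overlap_graph(reads, min_overlap):
    """Map (i, j) to the overlap length wherever read i runs into read j."""
    graph = {}
    for i, left in enumerate(reads):
        for j, right in enumerate(reads):
            if i != j:
                length = overlap_length(left, right, min_overlap)
                if length:
                    graph[(i, j)] = length
    return graph


reads = ["Alice was beginning", "beginning to get very", "very tired of sitting"]
graph = overlap_graph(reads, min_overlap=4)
```

Each key in `graph` is an edge from one read to the next; following the edges from read 0 to read 1 to read 2 is the “path” that lays the reads out in order.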
  • 19. OLC Assembly: 3) We move column by column, counting the letters in each column and making a note of the most common letter (take the consensus):
      ice was beginning to get very tired of sitting by her tister on the bank, and of having nothing to do
      Alice was beginning to get vory tired of sitting by her sister on the bank, and of having nothing to do: once
      lice was beginning to get very tired of sitting by her sister on the bank, and of having nothing
      Consensus: Alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do
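The column-by-column vote can be sketched like this, using the three reads from the slides. The offsets (how far each read is shifted relative to the start of the layout) are assumed to be known from the overlap step; the function name and the `(offset, text)` input format are choices made for this illustration only.

```python
from collections import Counter


def consensus(aligned_reads):
    """Take the most common letter in each column of the laid-out reads.

    aligned_reads is a list of (offset, text) pairs, where offset is the
    column at which that read starts.
    """
    width = max(offset + len(text) for offset, text in aligned_reads)
    letters = []
    for column in range(width):
        # Count the letter each read contributes to this column, if any.
        votes = Counter(
            text[column - offset]
            for offset, text in aligned_reads
            if offset <= column < offset + len(text)
        )
        # A column covered by no read is marked with "-".
        letters.append(votes.most_common(1)[0][0] if votes else "-")
    return "".join(letters)


aligned = [
    (2, "ice was beginning to get very tired of sitting by her tister "
        "on the bank, and of having nothing to do"),
    (0, "Alice was beginning to get vory tired of sitting by her sister "
        "on the bank, and of having nothing to do: once"),
    (1, "lice was beginning to get very tired of sitting by her sister "
        "on the bank, and of having nothing"),
]
result = consensus(aligned)
```

In the columns where a single read disagrees (“vory”, “tister”), the other two reads outvote it. Columns covered by only one read simply keep that read's letter, which is one reason deeper coverage gives a more reliable consensus.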
  • 24. Assemblies. [Figures: relative reflectance of EWC vs. wavelength (nm) for big bluestem and sand bluestem (intact and removed). For sand bluestem, balsam, and bittersweet: “assembly length and number of contigs” (cumulative length of sequences in Mb; number of sequences ×10^5) and “N values” (N75, N50, N25 contig length in kb), each plotted against assembly k-mer value or name (k-mer values, merge, CDH cluster, MIRA cluster, MIRA (454)). Other labels: Red flour beetle. Names shown: Bischof B., Day E. Image credits: homenursery.com, gardeninginsomnia.com]
  • 25. Outline: I. Basic concepts: i. Definition of bioinformatics; ii. Databases (flat-file and relational); iii. Assembly (overlap-layout-consensus). II. Steps you can take on your own
  • 26. What can you do to get prepared? If you are interested in studying bioinformatics, here is an outline of increasingly complex levels of skills you might work towards:
      Layer 1 – Using web to analyze biological data
      Layer 2 – Ability to install and run new programs
      Layer 3 – Writing own scripts for analysis in PERL, python or R
      Layer 4 – High level coding in C/C++/Java for implementing existing algorithms or modifying existing codes for new functionality
      Layer 5 – Thinking mathematically, developing own algorithms and implementing in C/C++/Java
      -Manoj Samanta, http://www.homolog.us/blogs/2011/07/22/a-beginners-guide-to-bioinformatics-part-i/
  • 27. K-INBRE resources: Over the fall semester, the Bioinformatics Core and Virginia Rider from Pittsburg State University will be hosting an undergraduate bioinformatics club. Our first topic will be command-line BLAST. Students will get an account on Beocat (Kansas’ largest compute cluster). http://bioinformaticsk-state-undergrad.blogspot.com
  • 28. K-INBRE resources: K-INBRE hosts a journal club, Wednesday at noon, via PolyCom, to discuss current bioinformatics tools. http://bioinformaticsk-state.blogspot.com/
  • 29. K-INBRE resources: Bradley Olson and K-INBRE – Perl; Justin Blumenstiel et al. – Python. http://bioinformaticskstateperl.blogspot.com/
  • 30. K-INBRE resources: K-INBRE and i5K have begun a GitHub script-sharing organization to archive and share scripts. https://github.com/i5K-KINBRE-script-share [Diagram levels: GitHub organization (i5K-KINBRE-script-share); category of ‘omics’ tool (RNA-Seq annotation and comparison; genome annotation and comparison; genome and transcriptome assembly; read cleaning and format conversion); lab or research group (KSU bioinfo lab; Olson lab); list and description of scripts (read me)]
  • 31. K-INBRE resources:
      - Git has very well developed version control built in: http://git-scm.com/video/what-is-version-control
      - Easy to search
      - More advantages are reviewed in this quick introduction: http://git-scm.com/video/quick-wins
      - Provides continuity within labs (as students and postdocs rotate out)
      - Increases collaboration and sharing of workflows within our community
      - It is also a good way to distribute the code you describe in a publication.
      - Git is widely used by beginners as well as by developers of technology and software in the omics community, including:
        https://github.com/broadinstitute (The Broad Institute)
        https://github.com/lh3 (Li H., developer of BWA etc.)
        https://github.com/dzerbino (Daniel Zerbino, developer of oases and velvet)
        https://github.com/PacificBiosciences
  • 32. Questions? Contact information: sheltonj@ksu.edu. K-INBRE Bioinformatics Core: http://www.kumc.edu/kinbre/bioinformatics.html and http://bioinformatics.k-state.edu/
