20120622 fridayadelboden
 

This was a last-minute microtalk given at the Swiss Institute of Bioinformatics (SIB) 2011 Summer School in Bioinformatics & Population Genomics in Adelboden (http://edu.isb-sib.ch/course/view.php?id=111).

  • # This is my project intro

    Yes oh yes ants are the best

    # Results

    Lorem ipsum **dolor sit amet**, consectetur adipiscing elit. Morbi a quam et urna fringilla facilisis. Sed commodo, turpis et luctus pellentesque, nisl nunc luctus mauris, ut sollicitudin enim massa eu dolor. Phasellus interdum neque porta lorem vehicula auctor. Etiam justo magna, aliquam at tempus non, adipiscing vitae nibh. Integer pharetra laoreet eros, at ultrices leo gravida vel. Integer sollicitudin nibh eros, ut ullamcorper tellus. *Nulla ac tortor sed massa bibendum accumsan et fringilla ligula*. Etiam at metus lorem, vitae euismod metus. Maecenas sollicitudin elit eget nulla consequat fermentum tincidunt ipsum adipiscing. Donec ut fringilla turpis. Nunc augue purus, elementum id imperdiet et, volutpat vel magna. Donec euismod libero non augue varius sed venenatis magna tempor. Suspendisse rhoncus felis velit, et scelerisque risus.

    ## They really are

    Uh-huh

    ## They really really are

    Ok good job because:

     * bla
     * blabla
     * blablabla

    # Conclusion

    You win: Ants are cool. I want to look at them and crush them and sequence them and genotype them.
  • Many output formats.

Presentation Transcript

  • Some Timesavers
  • Programming better
    • “Being able to use, understand and improve your code in 6 months & in 60 years” (approximately Damian Conway)
    • Variable naming
    • Coding width: 100 characters
    • Indenting
    • Follow conventions, e.g. “Google R Style”
    • Versioning: Dropbox & http://github.com/
    • Automated testing:
        preprocess_snps <- function(snp_table, testing=FALSE) {
          if (testing) {
            # run a bunch of tests of extreme situations.
            # quit if a test gives a weird result.
          }
          # real part of function.
        }
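The `testing` switch on the slide can be sketched in Ruby, the language the talk recommends. This is only an illustration under assumptions: the SNP table is taken to be an array of hashes with `:id` and `:genotypes` keys, and both the filtering rule (drop SNPs with missing calls) and the helper name `drop_incomplete_snps` are hypothetical, not from the talk.

```ruby
# Hypothetical "real part" of the function: keep only SNPs where every
# sample has a genotype call (nil marks a missing call).
def drop_incomplete_snps(snp_table)
  snp_table.reject { |snp| snp[:genotypes].empty? || snp[:genotypes].any?(&:nil?) }
end

# Mirrors the slide's preprocess_snps: with testing: true it first runs a few
# checks on extreme inputs and raises if any of them gives a weird result.
def preprocess_snps(snp_table, testing: false)
  if testing
    complete   = { id: 'snp1', genotypes: %w[AA AG GG] }
    incomplete = { id: 'snp2', genotypes: ['AA', nil, 'GG'] }
    no_samples = { id: 'snp3', genotypes: [] }

    raise 'empty table should stay empty' unless drop_incomplete_snps([]).empty?
    kept = drop_incomplete_snps([complete, incomplete, no_samples])
    raise 'only the complete SNP should survive' unless kept == [complete]
  end

  drop_incomplete_snps(snp_table)
end

# Made-up input, purely for illustration.
table = [
  { id: 'rs1', genotypes: %w[AA AA AG] },
  { id: 'rs2', genotypes: ['GG', nil, 'GG'] }
]
puts preprocess_snps(table, testing: true).map { |snp| snp[:id] }.inspect
```

Keeping the checks inside the function makes them hard to forget, at the cost of mixing test and production code; a separate test file is the more conventional alternative.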
  • [Slide shows the first page of the article] Education: A Quick Guide to Organizing Computational Biology Projects. William Stafford Noble, Department of Genome Sciences, School of Medicine, and Department of Computer Science and Engineering, University of Washington, Seattle, Washington, United States of America. Legible excerpts:
    • “Most commonly, however, that ‘someone’ is you. A few months from now, you may not remember what you were up to when you created a particular set of files, or you may not remember what conclusions you drew. You will either have to then spend time reconstructing your previous experiments or lose whatever insights you gained from those experiments.”
    • “This leads to the second principle, which is actually more like a version of Murphy’s Law: Everything you do, you will probably have to do over again.”
    • “Within a given project, I use a top-level organization that is logical, with chronological organization at the next level, and logical organization below that. A sample project, called msms, is shown in Figure 1. At the root of most of my projects, I have a data directory for storing fixed data sets, a results directory for tracking computational experiments performed on that data, a doc directory with one subdirectory per manuscript, and directories such as src for source code and bin for compiled binaries or scripts.”
  • [Slide overlays Figure 1 of the article, doi:10.1371/journal.pcbi.1000424.g001] Figure 1. Directory structure for a sample project. Directory names are in large typeface, and filenames are in smaller typeface. Only a subset of the files are shown here. Note that the dates are formatted <year>-<month>-<day> so that they can be sorted in chronological order. The source code src/ms-analysis.c is compiled to create bin/ms-analysis and is documented in doc/ms-analysis.html. The README files in the data directories specify who downloaded the data files from what URL on what date. The driver script results/2009-01-15/runall automatically generates the three subdirectories split1, split2, and split3, corresponding to three cross-validation splits. The bin/parse-sqt.py script is called by both of the runall driver scripts.
  • In each results folder:
    • script: getResults.rb or WHATIDID.txt
    • intermediates
    • output
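The “in each results folder” convention could be set up with a few lines of Ruby. This is a minimal sketch, not the speaker's tooling: the function name `create_results_folder` is hypothetical, and placing dated folders under `results/` follows the article's Figure 1 layout.

```ruby
require 'fileutils'
require 'date'
require 'tmpdir'

# Creates results/<year>-<month>-<day>/ under project_root with the three
# things the slide asks for: a record of what was done, intermediates, output.
# Returns the path of the results folder.
def create_results_folder(project_root, date: Date.today)
  folder = File.join(project_root, 'results', date.strftime('%Y-%m-%d'))
  FileUtils.mkdir_p(File.join(folder, 'intermediates'))
  FileUtils.mkdir_p(File.join(folder, 'output'))

  notes = File.join(folder, 'WHATIDID.txt')
  # Never overwrite existing notes when the folder is reused later that day.
  File.write(notes, "#{date.iso8601}\n") unless File.exist?(notes)
  folder
end

# Demo in a throwaway directory so nothing is left behind.
Dir.mktmpdir do |root|
  folder = create_results_folder(root, date: Date.new(2012, 6, 22))
  puts File.basename(folder)
  puts Dir.children(folder).sort.inspect
end
```

Because the folder name sorts chronologically, a plain directory listing doubles as a timeline of the project.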
  • Markdown.
  • A few tools
  • knitr (sweave): Analyzing & Reporting in a single file.
    MyFile.Rnw:
      \documentclass{article}
      \usepackage[sc]{mathpazo}
      \usepackage[T1]{fontenc}
      \begin{document}

      <<setup, include=FALSE, cache=FALSE, echo=FALSE>>=
      # this is equivalent to \SweaveOpts{...}
      opts_chunk$set(fig.path='figure/minimal-', fig.align='center', fig.show='hold')
      options(replace.assign=TRUE,width=90)
      @

      \title{A Minimal Demo of knitr}
      \author{Yihui Xie}
      \maketitle

      You can test if \textbf{knitr} works with this minimal demo. OK, let's
      get started with some boring random numbers:

      <<boring-random,echo=TRUE,cache=TRUE>>=
      set.seed(1121)
      (x=rnorm(20))
      mean(x);var(x)
      @

      The first element of \texttt{x} is \Sexpr{x[1]}. Boring boxplots
      and histograms recorded by the PDF device:

      <<boring-plots,cache=TRUE,echo=TRUE>>=
      ## two plots side by side (option fig.show='hold')
      par(mar=c(4,4,.1,.1),cex.lab=.95,cex.axis=.9,mgp=c(2,.7,0),tcl=-.3,las=1)
      boxplot(x)
      hist(x,main='')
      @

      Do the above chunks work? You should be able to compile the \TeX{}
      document and get a PDF file like this one: \url{https://github.com/downloads/
      [file is cut off on the slide]
    ### in R:
      library(knitr)
      knit("MyFile.Rnw")     # --> creates MyFile.tex
    ### in shell:
      pdflatex MyFile.tex    # --> creates MyFile.pdf
    [Resulting PDF, excerpt] “A Minimal Demo of knitr”, Yihui Xie, February 26, 2012. Each chunk's code is shown with its output: mean(x) gives ## [1] 0.3217 and var(x) gives ## [1] 0.5715. The inline \Sexpr{x[1]} renders as “The first element of x is 0.145.” The boxplot and the histogram of x appear side by side.
  • ggplot2: beautiful & (almost) effortless R plots
      ggplot(mtcars, aes(factor(cyl))) + geom_bar()
      [Plot: bar chart of count against factor(cyl), with bars for 4, 6 and 8]
      ggplot(mtcars, aes(factor(cyl), fill=factor(gear))) + geom_bar()
      [Plot: the same bar chart with bars filled by factor(gear); legend shows 3, 4 and 5]
  • Ruby. “Friends don’t let friends do Perl” (reddit user)
  • Getting help.
    • In real life: Make friends with people. Talk to them.
    • Online:
      • Specific discussion mailing lists (e.g.: R, Stacks, bioruby, MAKER...)
      • Programming: http://stackoverflow.com
      • Bioinformatics: http://www.biostars.org
      • Sequencing-related: http://seqanswers.com
  • Once I wanted to set up a BLAST server.
    • Anurag Priyam, mechanical engineering student, Kharagpur
    • Aim: An open source idiot-proof web-interface for custom BLAST
  • http://www.sequenceserver.com/
    1. Installing.
         gem install sequenceserver
       Do you have BLAST+? If not:
         gem install blast
       Do you have BLAST-formatted databases? If not:
         sequenceserver format-databases /path/to/fastas
    2. Configure.
         # .sequenceserver.conf
         bin: ~/ncbi-blast-2.2.25+/bin/
         database: /Users/me/blast_databases/
    3. Launch.
         sequenceserver
         ### Launched SequenceServer at: http://0.0.0.0:4567
  • http://0.0.0.0:4567
  • So what did we do this week? CummeRbund? SOAP? WTF?
    Aim: first stages of working with a non-model organism.
  • Read quality: FastQC [required for all data!]
  • Genome
    • Assembly: SOAPdenovo
    • Assembly quality:
      • Internal metrics (scaffold size, number).
      • Comparison with other data (assembled RNA)
    • Gene identification
      • MAKER (automated; uses many tools)
      • Apollo (fixing MAKER’s gene models)
  • RNA
    • de novo assembly: Trinity
    • Gene expression comparison (Queen vs Worker vs Male)
      • TopHat (mapping to genome)
      • Cufflinks (de novo gene prediction & quantification)
      • CummeRbund (easy visualization)
  • SNPs & population stuff
    • from mapping of pools of RNA
    • from RAD (Stacks)
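The “internal metrics (scaffold size, number)” step can be illustrated with a short Ruby sketch. The talk does not say which size statistic was used; N50 is included here as an assumption because it is a commonly reported one, and the function name `assembly_metrics` and the input lengths are made up.

```ruby
# Summarises scaffold lengths (in bp). N50 is the length L such that scaffolds
# of length >= L together contain at least half of all assembled bases.
def assembly_metrics(scaffold_lengths)
  raise ArgumentError, 'no scaffolds given' if scaffold_lengths.empty?

  sorted  = scaffold_lengths.sort.reverse # longest first
  total   = sorted.sum
  running = 0
  # First scaffold at which the cumulative length reaches half of the total.
  n50 = sorted.find { |length| (running += length) * 2 >= total }

  { count: sorted.size, total_bp: total, largest_bp: sorted.first, n50_bp: n50 }
end

# Made-up scaffold lengths, purely for illustration.
metrics = assembly_metrics([10, 50, 30, 20, 40])
metrics.each { |name, value| puts "#{name}: #{value}" }
```

In practice the lengths would come from the assembler's FASTA output; reading that file is left out to keep the sketch self-contained.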
  • What is special about my genome?
    • After assembly:
      • Candidate genes?
      • Gene expression comparisons?
      • Genome-wide scans for enrichment (of protein domains; of pathways....)
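One way a scan for enrichment could be scored is a one-sided hypergeometric test, sketched below in Ruby. The talk does not name a test, so this choice is an assumption; the function names and the example counts are hypothetical.

```ruby
# Binomial coefficient using exact integer arithmetic.
def choose(n, k)
  return 0 if k < 0 || k > n

  k = [k, n - k].min
  (1..k).reduce(1) { |acc, i| acc * (n - k + i) / i }
end

# Probability of seeing at least `observed` annotated genes when drawing
# `sample_size` genes without replacement from `population_size` genes,
# of which `annotated` carry the domain or pathway of interest.
def enrichment_p_value(population_size, annotated, sample_size, observed)
  upper = [annotated, sample_size].min
  favourable = (observed..upper).sum do |k|
    choose(annotated, k) * choose(population_size - annotated, sample_size - k)
  end
  Rational(favourable, choose(population_size, sample_size)).to_f
end

# Made-up counts: 10 genes in total, 5 annotated, 5 candidates, all 5 annotated.
puts enrichment_p_value(10, 5, 5, 5)
```

A genome-wide scan would repeat this for every domain or pathway, so the resulting p-values would also need a multiple-testing correction, which this sketch omits.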