Programming in R 
?
If/else
Logical Operators
going further
SBSM035 - Stats/ 
Bioinformatics/ 
Programming 
Reproducible Research & 
Sustainable Software 
@yannick__ http://yannick.poulet.org
Why care?
LETTERS 
edited by Etta Kavanagh 
Retraction 
We wish to retract our Research Article “Structure of MsbA from E. coli: A homolog of the multidrug resistance ATP binding cassette (ABC) transporters” and both of our Reports “Structure of the ABC transporter MsbA in complex with ADP•vanadate and lipopolysaccharide” and “X-ray structure of the EmrE multidrug transporter in complex with a substrate” (1–3).
The recently reported structure of Sav1866 (4) indicated that our MsbA structures (1, 2, 5) were incorrect in both the hand of the structure and the topology. Thus, our biological interpretations based on these inverted models for MsbA are invalid.
An in-house data reduction program introduced a change in sign for anomalous differences. This program, which was not part of a conventional data processing package, converted the anomalous pairs (I+ and I-) to (F- and F+), thereby introducing a sign change. As the diffraction data collected for each set of MsbA crystals and for the EmrE crystals were processed with the same program, the structures reported in (1–3, 5, 6) had the wrong hand.
The error in the topology of the original MsbA structure was a consequence of the low resolution of the data as well as breaks in the electron density for the connecting loop regions. Unfortunately, the use of the multicopy refinement procedure still allowed us to obtain reasonable refinement values for the wrong structures.
The Protein Data Bank (PDB) files 1JSQ, 1PF4, and 1Z2R for MsbA and 1S7B and 2F2M for EmrE have been moved to the archive of obsolete PDB entries. The MsbA and EmrE structures will be recalculated from the original data using the proper sign for the anomalous differences, and the new Cα coordinates and structure factors will be deposited.
We very sincerely regret the confusion that these papers have caused and, in particular, subsequent research efforts that were unproductive as a result of our original findings.
GEOFFREY CHANG, CHRISTOPHER B. ROTH, 
CHRISTOPHER L. REYES, OWEN PORNILLOS, 
YEN-JU CHEN, ANDY P. CHEN 
Department of Molecular Biology, The Scripps Research Institute, La Jolla, CA 92037, USA. 
References 
1. G. Chang, C. B. Roth, Science 293, 1793 (2001). 
2. C. L. Reyes, G. Chang, Science 308, 1028 (2005). 
3. O. Pornillos, Y.-J. Chen, A. P. Chen, G. Chang, Science 310, 1950 (2005). 
4. R. J. Dawson, K. P. Locher, Nature 443, 180 (2006). 
5. G. Chang, J. Mol. Biol. 330, 419 (2003). 
6. C. Ma, G. Chang, Proc. Natl. Acad. Sci. U.S.A. 101, 2852 (2004). 
Downloaded from www.sciencemag.org on September 24, 2014
Reproducible Research & 
Sustainable Software 
• Avoid costly mistakes 
• Be faster: “stand on the shoulders of giants” 
• Increase impact / visibility
“Big data” biology 
is hard.
• Biology/life is complex 
• Field is young. 
• Biologists lack computational training. 
• Generally, analysis tools suck. 
• badly written 
• badly tested 
• hard to install 
• output quality… often questionable. 
• Understanding/visualizing/massaging data is hard. 
• Datasets continue to grow!
We need great tools.
Some sources of inspiration
Best Practices for Scientific Computing
Greg Wilson, D.A. Aruliah, C. Titus Brown, Neil P. Chue Hong, Matt Davis, Richard T. Guy, Steven H.D. Haddock, Katy Huff, Ian M. Mitchell, Mark D. Plumbley, Ben Waugh, Ethan P. White, Paul Wilson
arXiv:1210.0530v3 [cs.MS] 29 Nov 2012
Scientists spend an increasing amount of time building and using software. However, most scientists are never taught how to do this efficiently. As a result, many are unaware of tools and practices that would allow them to write more reliable and maintainable code with less effort. We describe a set of best practices for scientific software development that have solid foundations in research and experience, and that improve scientists’ productivity and the reliability of their software.
Software is as important to modern scientific research as telescopes and test tubes. From groups that work exclusively on computational problems, to traditional laboratory and field scientists, more and more of the daily operation of science revolves around computers. This includes the development of new algorithms, managing and analyzing the large amounts of data that are generated in single research projects, and combining disparate datasets to assess synthetic problems.
1. Write programs for people, not computers. 
2. Automate repetitive tasks. 
3. Use the computer to record history. 
4. Make incremental changes. 
5. Use version control. 
6. Don’t repeat yourself (or others). 
7. Plan for mistakes. 
8. Optimize software only after it works correctly. 
9. Document the design and purpose of code rather than its mechanics.
10. Conduct code reviews.
Specific Approaches/Tools 
• Planning for mistakes 
• Automated testing 
• Continuous integration 
•Writing for people: use style guide
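To make "automated testing" concrete, here is a minimal sketch in Ruby using Minitest, which is bundled with standard Ruby installations. The `gc_content` helper and its test cases are hypothetical illustrations, not taken from the slides; the point is only that each expectation is written down once and re-checked automatically on every run.

```ruby
require "minitest/autorun"

# Hypothetical helper (an assumption for illustration):
# fraction of G and C bases in a DNA sequence.
def gc_content(sequence)
  return 0.0 if sequence.empty?
  sequence.upcase.count("GC") / sequence.length.to_f
end

# Each test method states one expectation about the helper.
class TestGcContent < Minitest::Test
  def test_balanced_sequence
    assert_in_delta 0.5, gc_content("ATGC"), 1e-9
  end

  def test_lowercase_input
    assert_in_delta 1.0, gc_content("ggcc"), 1e-9
  end

  def test_empty_sequence
    assert_equal 0.0, gc_content("")
  end
end
```

Running the file with `ruby` executes all three tests; a continuous integration service would simply run the same command after every change.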
Code for people: Use a style guide 
• For R: http://r-pkgs.had.co.nz/style.html
Coding for people: Indent your code!
R style guide extract 
Line length 
Strive to limit your code to 80 characters per line. This fits comfortably on a printed page with a 
reasonably sized font. If you find yourself running out of room, this is a good indication that you 
should encapsulate some of the work in a separate function. 

ant_measurements <- read.table(file = '~/Downloads/Web/ant_measurements.txt', header=TRUE, sep='

ant_measurements <- read.table(file = '~/Downloads/Web/ant_measurements.txt', header=TRUE,
sep='\t', col.names = c('colony', 'individual', 'headwidth', 'mass'))

ant_measurements <- read.table(file = '~/Downloads/Web/ant_measurements.txt',
    header = TRUE,
    sep = '\t',
    col.names = c('colony', 'individual', 'headwidth', 'mass')
)
Code for people: Use a style guide 
• For R: http://r-pkgs.had.co.nz/style.html 
• For Ruby: https://github.com/bbatsov/ruby-style-guide 
Automatically check your code: 
install.packages("lint") # once
library(lint) # every time
lint("file_to_check.R")
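To give a feel for what an automatic style check does, here is a rough Ruby sketch of a single rule: flagging lines longer than 80 characters. The `long_lines` function and the sample input are assumptions made up for illustration; a real linter applies many more rules and this is not how any particular tool is implemented.

```ruby
# Hypothetical single-rule checker: report lines longer than a limit.
MAX_LINE_LENGTH = 80

# Returns [line_number, length] pairs for every line exceeding the limit.
def long_lines(source, limit = MAX_LINE_LENGTH)
  source.each_line.with_index(1)
        .select { |line, _number| line.chomp.length > limit }
        .map { |line, number| [number, line.chomp.length] }
end

# Made-up input: one short line and one overly long line.
code = "short <- 1\n" \
       "x <- c(" + (["'value'"] * 12).join(", ") + ")\n"

long_lines(code).each do |number, length|
  puts "line #{number}: #{length} characters (limit #{MAX_LINE_LENGTH})"
end
```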
Eliminate redundancy 
DRY: Don’t Repeat Yourself
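A small Ruby sketch of the DRY idea: instead of copy-pasting the same calculation for every column, write it once and apply it to each. The column names echo the ant measurements example; the values and helper names (`mean`, `standard_deviation`, `z_scores`) are made up for illustration.

```ruby
# Arithmetic mean of a list of numbers.
def mean(values)
  values.sum / values.length.to_f
end

# Sample standard deviation (n - 1 in the denominator).
def standard_deviation(values)
  m = mean(values)
  Math.sqrt(values.sum { |v| (v - m)**2 } / (values.length - 1))
end

# Standardise values to z-scores; the logic lives in exactly one place.
def z_scores(values)
  m = mean(values)
  sd = standard_deviation(values)
  values.map { |v| (v - m) / sd }
end

# Hypothetical measurements, one list per column.
measurements = {
  headwidth: [1.0, 2.0, 3.0],
  mass: [10.0, 20.0, 30.0]
}

# One call handles every column; adding a column needs no new code.
standardised = measurements.transform_values { |values| z_scores(values) }
puts standardised.inspect
```

If the standardisation ever needs fixing, there is only one function to change.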
knitr (Sweave): analyzing & reporting in a single file.
analysis.Rmd
A minimal R Markdown example
I know the value of pi is 3.1416, and 2 times pi is 6.2832. To compile: library(knitr); knit("minimal.Rmd")
A paragraph here. A code chunk below:
1+1
## [1] 2
### in R:
library(knitr)
knit("analysis.Rmd")
# -- creates analysis.md
### in shell:
pandoc analysis.md -o analysis.pdf
# -- creates analysis.pdf
.4-.7+.3 # what? it is not zero!
## [1] 5.551e-17
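The non-zero result comes from binary floating-point rounding, and it is not specific to R. A minimal Ruby sketch of the usual workaround, comparing with a tolerance rather than with `==`, is shown here; the `nearly_equal?` helper and its default tolerance are assumptions for illustration, and the right tolerance depends on your data.

```ruby
# Compare floating-point numbers with a tolerance instead of ==.
# The default tolerance is an illustrative assumption.
def nearly_equal?(a, b, tolerance = 1e-9)
  (a - b).abs <= tolerance
end

result = 0.4 - 0.7 + 0.3

puts result == 0.0               # exact comparison: false
puts nearly_equal?(result, 0.0)  # tolerant comparison: true
```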
Graphics work too 
library(ggplot2) 
qplot(speed, dist, data = cars) + geom_smooth() 
[Figure 1: A scatterplot of cars. x axis: speed; y axis: dist]
Organize mindfully
Education 
A Quick Guide to Organizing Computational Biology 
Projects 
William Stafford Noble
Department of Genome Sciences, School of Medicine, University of Washington, Seattle, Washington, United States of America; Department of Computer Science and Engineering, University of Washington, Seattle, Washington, United States of America
Introduction
Most bioinformatics coursework focuses on algorithms, with perhaps some components devoted to learning programming skills and learning how to use existing bioinformatics software. Unfortunately, for students who are preparing for a research career, this type of curriculum fails to address many of the day-to-day organizational challenges associated with performing computational experiments. In practice, the principles behind organizing and documenting computational experiments are often learned on the fly, and this learning is strongly influenced by personal predilections as well as by chance interactions with collaborators or colleagues.
The purpose of this article is to describe one good strategy for carrying out computational experiments. I will not describe profound issues such as how to formulate hypotheses, design experiments, or draw conclusions. Rather, I will focus on […]
[…] understanding your work or who may be evaluating your research skills. Most commonly, however, that ‘‘someone’’ is you. A few months from now, you may not remember what you were up to when you created a particular set of files, or you may not remember what conclusions you drew. You will either have to then spend time reconstructing your previous experiments or lose whatever insights you gained from those experiments.
This leads to the second principle, which is actually more like a version of Murphy’s Law: Everything you do, you will probably have to do over again. Inevitably, you will discover some flaw in your initial preparation of the data being analyzed, or you will get access to new data, or you will decide that your parameterization of a particular model was not broad enough. This means that the experiment you did last week, or even the set of experiments you’ve been working on over the past month, will probably […]
[…] under a common root directory. The exception to this rule is source code or scripts that are used in multiple projects. Each such program might have a project directory of its own.
Within a given project, I use a top-level organization that is logical, with chronological organization at the next level, and logical organization below that. A sample project, called msms, is shown in Figure 1. At the root of most of my projects, I have a data directory for storing fixed data sets, a results directory for tracking computational experiments performed on that data, a doc directory with one subdirectory per manuscript, and directories such as src for source code and bin for compiled binaries or scripts.
Within the data and results directories, it is often tempting to apply a similar, logical organization. For example, you may have two or three data sets against which you plan to benchmark your algorithms, so you could create one directory for each of them under data. […]
[…] with this approach, the distinction between data and results may not be useful. Instead, one could imagine a top-level directory called something like experiments, with subdirectories with names like 2008-12-19. Optionally, the directory name might also include a word or two indicating the topic of the experiment […]
The Lab Notebook
In parallel with this chronological directory structure, I find it useful to maintain a chronologically organized lab notebook. This is a document that resides in the root of the results directory and that records your progress in detail. Entries in the notebook should be dated, […]
[…] These types of entries provide a complete picture of the development of the project over time.
In practice, I ask members of my research group to put their lab notebooks online, behind password protection if necessary. When I meet with a member of my lab or a project team, we can refer […]
Figure 1. Directory structure for a sample project. Directory names are in large typeface, and filenames are in smaller typeface. Only a subset of the files are shown here. Note that the dates are formatted <year>-<month>-<day> so that they can be sorted in chronological order. The source code src/ms-analysis.c is compiled to create bin/ms-analysis and is documented in doc/ms-analysis.html. The README files in the data directories specify who downloaded the data files from what URL on what date. The driver script results/2009-01-15/runall automatically generates the three subdirectories split1, split2, and split3, corresponding to three cross-validation splits. The bin/parse-sqt.py script is called by both of the runall driver scripts.
doi:10.1371/journal.pcbi.1000424.g001
In each results folder:
• script getResults.rb
• intermediates
• output
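The layout described in the excerpt (data, results, doc, src and bin at the top level, with dated subdirectories under results) could be scaffolded with a few lines of Ruby. This is a rough sketch: the `scaffold_project` function is a hypothetical helper, and the demo runs inside a temporary directory so that nothing is left behind.

```ruby
require "fileutils"
require "date"
require "tmpdir"

# Hypothetical helper: create the top-level directories plus a dated
# results directory (year-month-day, so that names sort chronologically).
def scaffold_project(root, date = Date.today)
  %w[data results doc src bin].each do |name|
    FileUtils.mkdir_p(File.join(root, name))
  end
  dated_results = File.join(root, "results", date.strftime("%Y-%m-%d"))
  FileUtils.mkdir_p(dated_results)
  dated_results
end

# Demo in a throwaway directory, using the project name and date from the excerpt.
Dir.mktmpdir do |tmp|
  project = File.join(tmp, "msms")
  dated = scaffold_project(project, Date.new(2009, 1, 15))
  puts Dir.children(project).sort.inspect
  puts File.basename(dated)
end
```

Calling `scaffold_project` without a date uses today's date, which fits the habit of starting each day's experiments in a fresh results directory.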
Track versions of everything
Github: Facebook for code
• Easy versioning 
• Random people use your stuff 
• And find problems and fix and improve it! 
• Greater impact / better planet 
• Easily update 
• Easily collaborate 
• Identify trends 
• Build online reputation 
Demo
Learn how: 
https://try.github.io/levels/1/challenges/1
Programming languages
Choosing a programming language 
Language                             Good                                               Bad
Excel                                quick & dirty                                      easy to make mistakes; doesn’t scale
R                                    numbers, stats, genomics                           programming
Unix command-line == shell == bash   Can’t escape it. Quick & dirty. HPC.               programming, complicated things
Java                                 1990s; user interfaces                             overcomplicated
Perl                                 1980s. Everything.
Python                               scripting, text                                    ugly
Ruby                                 scripting, text
Javascript/Node                      scripting, flexibility (web & client), community   only little bio-stuff
Ruby. “Friends don’t let friends do Perl” - reddit user 
• example: “reverse each line in file” 
• read file; with each line 
• remove the invisible “end of line” character 
• reverse the contents 
• print the reversed line 
### in PERL:
open INFILE, "my_file.txt";
while (defined ($line = <INFILE>)) {
  chomp($line);
  @letters = split(//, $line);
  @reverse_letters = reverse(@letters);
  $reverse_string = join("", @reverse_letters);
  print $reverse_string, "\n";
}
### in Ruby:
File.open("a").each { |line|
  puts line.chomp.reverse
}
More ruby examples. 
5.times {
  puts "Hello world"
}
# Sorting people 
people_sorted_by_age = people.sort_by { |person| person.age} 
+many tools for bio-data - e.g. check http://biogems.info
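The sorting one-liner assumes that a `people` collection already exists. A self-contained sketch, where the `Person` struct and the sample names and ages are made up for illustration:

```ruby
# Hypothetical data: a lightweight Person type with a name and an age.
Person = Struct.new(:name, :age)

people = [
  Person.new("Ada", 36),
  Person.new("Linus", 21),
  Person.new("Grace", 45)
]

# sort_by calls the block once per element and sorts by the returned value.
people_sorted_by_age = people.sort_by { |person| person.age }

puts people_sorted_by_age.map(&:name).join(", ")
# prints "Linus, Ada, Grace"
```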
Getting help. 
• In real life: Make friends with people. Talk to them. 
• Online: 
• Specific discussion mailing lists (e.g.: R, Stacks, bioruby, MAKER...) 
• Programming: http://stackoverflow.com 
• Bioinformatics: http://www.biostars.org 
• Sequencing-related: http://seqanswers.com 
• Stats: http://stats.stackexchange.com 
• Codeschool!
“Can you BLAST this for me?”
“Sure, I can help you…”
Anurag Priyam, mechanical engineering student, IIT Kharagpur
“Can you BLAST this for me?” 
Antgenomes.org SequenceServer 
BLAST made easy 
(well, we’re trying...) 
Aim: An open source idiot-proof 
web-interface for custom BLAST
Today: SequenceServer 
Used in 200 labs
xkcd

More Related Content

What's hot

The Taverna Workflow Management Software Suite - Past, Present, Future
The Taverna Workflow Management Software Suite - Past, Present, FutureThe Taverna Workflow Management Software Suite - Past, Present, Future
The Taverna Workflow Management Software Suite - Past, Present, FuturemyGrid team
 
Seminario en CIFASIS, Rosario, Argentina - Seminar in CIFASIS, Rosario, Argen...
Seminario en CIFASIS, Rosario, Argentina - Seminar in CIFASIS, Rosario, Argen...Seminario en CIFASIS, Rosario, Argentina - Seminar in CIFASIS, Rosario, Argen...
Seminario en CIFASIS, Rosario, Argentina - Seminar in CIFASIS, Rosario, Argen...Alejandra Gonzalez-Beltran
 
2015 ohsu-metagenome
2015 ohsu-metagenome2015 ohsu-metagenome
2015 ohsu-metagenomec.titus.brown
 
Ontomaton icbo2013-alternative order-t_wv3
Ontomaton icbo2013-alternative order-t_wv3Ontomaton icbo2013-alternative order-t_wv3
Ontomaton icbo2013-alternative order-t_wv3Philippe Rocca-Serra
 
BioSharing.org - mapping the landscape of community standards, databases, dat...
BioSharing.org - mapping the landscape of community standards, databases, dat...BioSharing.org - mapping the landscape of community standards, databases, dat...
BioSharing.org - mapping the landscape of community standards, databases, dat...Alejandra Gonzalez-Beltran
 
Web Apollo Workshop University of Exeter
Web Apollo Workshop University of ExeterWeb Apollo Workshop University of Exeter
Web Apollo Workshop University of ExeterMonica Munoz-Torres
 
Next generation genomics: Petascale data in the life sciences
Next generation genomics: Petascale data in the life sciencesNext generation genomics: Petascale data in the life sciences
Next generation genomics: Petascale data in the life sciencesGuy Coates
 
Managing Genomics Data at the Sanger Institute
Managing Genomics Data at the Sanger InstituteManaging Genomics Data at the Sanger Institute
Managing Genomics Data at the Sanger Instituteinside-BigData.com
 
Web Apollo at Genome Informatics 2014
Web Apollo at Genome Informatics 2014Web Apollo at Genome Informatics 2014
Web Apollo at Genome Informatics 2014Monica Munoz-Torres
 
Oxford DTP - Sansone curation tools - Dec 2014
Oxford DTP - Sansone curation tools - Dec 2014Oxford DTP - Sansone curation tools - Dec 2014
Oxford DTP - Sansone curation tools - Dec 2014Susanna-Assunta Sansone
 
VariantSpark: applying Spark-based machine learning methods to genomic inform...
VariantSpark: applying Spark-based machine learning methods to genomic inform...VariantSpark: applying Spark-based machine learning methods to genomic inform...
VariantSpark: applying Spark-based machine learning methods to genomic inform...Denis C. Bauer
 
The beauty of workflows and models
The beauty of workflows and modelsThe beauty of workflows and models
The beauty of workflows and modelsmyGrid team
 

What's hot (20)

NETTAB 2013
NETTAB 2013NETTAB 2013
NETTAB 2013
 
The Taverna Workflow Management Software Suite - Past, Present, Future
The Taverna Workflow Management Software Suite - Past, Present, FutureThe Taverna Workflow Management Software Suite - Past, Present, Future
The Taverna Workflow Management Software Suite - Past, Present, Future
 
Seminario en CIFASIS, Rosario, Argentina - Seminar in CIFASIS, Rosario, Argen...
Seminario en CIFASIS, Rosario, Argentina - Seminar in CIFASIS, Rosario, Argen...Seminario en CIFASIS, Rosario, Argentina - Seminar in CIFASIS, Rosario, Argen...
Seminario en CIFASIS, Rosario, Argentina - Seminar in CIFASIS, Rosario, Argen...
 
CV_10/17
CV_10/17CV_10/17
CV_10/17
 
CSHALS 2013
CSHALS 2013CSHALS 2013
CSHALS 2013
 
NETTAB 2012
NETTAB 2012NETTAB 2012
NETTAB 2012
 
Drug Discovery- ELRIG -2012
Drug Discovery- ELRIG -2012Drug Discovery- ELRIG -2012
Drug Discovery- ELRIG -2012
 
2015_CV_J_SHELTON_linked
2015_CV_J_SHELTON_linked2015_CV_J_SHELTON_linked
2015_CV_J_SHELTON_linked
 
Beyond the PDF 2, 2013
Beyond the PDF 2, 2013Beyond the PDF 2, 2013
Beyond the PDF 2, 2013
 
2015 ohsu-metagenome
2015 ohsu-metagenome2015 ohsu-metagenome
2015 ohsu-metagenome
 
Ontomaton icbo2013-alternative order-t_wv3
Ontomaton icbo2013-alternative order-t_wv3Ontomaton icbo2013-alternative order-t_wv3
Ontomaton icbo2013-alternative order-t_wv3
 
BioSharing.org - mapping the landscape of community standards, databases, dat...
BioSharing.org - mapping the landscape of community standards, databases, dat...BioSharing.org - mapping the landscape of community standards, databases, dat...
BioSharing.org - mapping the landscape of community standards, databases, dat...
 
Web Apollo Workshop University of Exeter
Web Apollo Workshop University of ExeterWeb Apollo Workshop University of Exeter
Web Apollo Workshop University of Exeter
 
Next generation genomics: Petascale data in the life sciences
Next generation genomics: Petascale data in the life sciencesNext generation genomics: Petascale data in the life sciences
Next generation genomics: Petascale data in the life sciences
 
Managing Genomics Data at the Sanger Institute
Managing Genomics Data at the Sanger InstituteManaging Genomics Data at the Sanger Institute
Managing Genomics Data at the Sanger Institute
 
Web Apollo at Genome Informatics 2014
Web Apollo at Genome Informatics 2014Web Apollo at Genome Informatics 2014
Web Apollo at Genome Informatics 2014
 
Facilitating Scientific Discovery through Crowdsourcing and Distributed Parti...
Facilitating Scientific Discovery through Crowdsourcing and Distributed Parti...Facilitating Scientific Discovery through Crowdsourcing and Distributed Parti...
Facilitating Scientific Discovery through Crowdsourcing and Distributed Parti...
 
Oxford DTP - Sansone curation tools - Dec 2014
Oxford DTP - Sansone curation tools - Dec 2014Oxford DTP - Sansone curation tools - Dec 2014
Oxford DTP - Sansone curation tools - Dec 2014
 
VariantSpark: applying Spark-based machine learning methods to genomic inform...
VariantSpark: applying Spark-based machine learning methods to genomic inform...VariantSpark: applying Spark-based machine learning methods to genomic inform...
VariantSpark: applying Spark-based machine learning methods to genomic inform...
 
The beauty of workflows and models
The beauty of workflows and modelsThe beauty of workflows and models
The beauty of workflows and models
 

Viewers also liked

20120622 fridayadelboden
20120622 fridayadelboden20120622 fridayadelboden
20120622 fridayadelbodenYannick Wurm
 
2014 10-01-assembly summaryvariantsoverview
2014 10-01-assembly summaryvariantsoverview2014 10-01-assembly summaryvariantsoverview
2014 10-01-assembly summaryvariantsoverviewYannick Wurm
 
Masters bioinfo 2013-11-14-15
Masters bioinfo 2013-11-14-15Masters bioinfo 2013-11-14-15
Masters bioinfo 2013-11-14-15Yannick Wurm
 
Semut programming minds
Semut programming mindsSemut programming minds
Semut programming mindsAo Hayabuza
 
Evolution lectures 5 6 2012b
Evolution lectures 5   6 2012bEvolution lectures 5   6 2012b
Evolution lectures 5 6 2012bYannick Wurm
 
GS Rubber Industries
GS Rubber IndustriesGS Rubber Industries
GS Rubber Industriesjrodriguesjr
 
Blue Designs Presentation
Blue Designs PresentationBlue Designs Presentation
Blue Designs PresentationClaireCardwell
 
Improvement of Military Planes From War to War
Improvement of Military Planes From War to WarImprovement of Military Planes From War to War
Improvement of Military Planes From War to WarTodd Roberts
 
yw jakartarb20101031
yw jakartarb20101031yw jakartarb20101031
yw jakartarb20101031Yannick Wurm
 
A Isings Joomla Presentation[1]
A Isings Joomla Presentation[1]A Isings Joomla Presentation[1]
A Isings Joomla Presentation[1]guest4cbfd6
 
2014 evolution-week1
2014 evolution-week12014 evolution-week1
2014 evolution-week1Yannick Wurm
 
Mundo perdido
Mundo perdidoMundo perdido
Mundo perdidoAVATARX1X
 
Evolution lectures1&2 2012 slideshare
Evolution lectures1&2 2012 slideshareEvolution lectures1&2 2012 slideshare
Evolution lectures1&2 2012 slideshareYannick Wurm
 

Viewers also liked (18)

20120622 fridayadelboden
20120622 fridayadelboden20120622 fridayadelboden
20120622 fridayadelboden
 
2014 10-01-assembly summaryvariantsoverview
2014 10-01-assembly summaryvariantsoverview2014 10-01-assembly summaryvariantsoverview
2014 10-01-assembly summaryvariantsoverview
 
Masters bioinfo 2013-11-14-15
Masters bioinfo 2013-11-14-15Masters bioinfo 2013-11-14-15
Masters bioinfo 2013-11-14-15
 
Semut programming minds
Semut programming mindsSemut programming minds
Semut programming minds
 
2014 12-09-oulu
2014 12-09-oulu2014 12-09-oulu
2014 12-09-oulu
 
Evolution lectures 5 6 2012b
Evolution lectures 5   6 2012bEvolution lectures 5   6 2012b
Evolution lectures 5 6 2012b
 
GS Rubber Industries
GS Rubber IndustriesGS Rubber Industries
GS Rubber Industries
 
Blue Designs Presentation
Blue Designs PresentationBlue Designs Presentation
Blue Designs Presentation
 
Improvement of Military Planes From War to War
Improvement of Military Planes From War to WarImprovement of Military Planes From War to War
Improvement of Military Planes From War to War
 
yw jakartarb20101031
yw jakartarb20101031yw jakartarb20101031
yw jakartarb20101031
 
Linux Routing
Linux RoutingLinux Routing
Linux Routing
 
Employment.Konrad + Ilkay
Employment.Konrad + IlkayEmployment.Konrad + Ilkay
Employment.Konrad + Ilkay
 
Human evolution
Human evolutionHuman evolution
Human evolution
 
A Isings Joomla Presentation[1]
A Isings Joomla Presentation[1]A Isings Joomla Presentation[1]
A Isings Joomla Presentation[1]
 
2014 evolution-week1
2014 evolution-week12014 evolution-week1
2014 evolution-week1
 
Mundo perdido
Mundo perdidoMundo perdido
Mundo perdido
 
Evolution lectures1&2 2012 slideshare
Evolution lectures1&2 2012 slideshareEvolution lectures1&2 2012 slideshare
Evolution lectures1&2 2012 slideshare
 
Photo mix
Photo mixPhoto mix
Photo mix
 

Similar to 2014 11-13-sbsm032-reproducible research

2015 10-7-11am-reproducible research
2015 10-7-11am-reproducible research2015 10-7-11am-reproducible research
2015 10-7-11am-reproducible researchYannick Wurm
 
2014-10-10-SBC361-Reproducible research
2014-10-10-SBC361-Reproducible research2014-10-10-SBC361-Reproducible research
2014-10-10-SBC361-Reproducible researchYannick Wurm
 
2013 10-30-sbc361-reproducible designsandsustainablesoftware
2013 10-30-sbc361-reproducible designsandsustainablesoftware2013 10-30-sbc361-reproducible designsandsustainablesoftware
2013 10-30-sbc361-reproducible designsandsustainablesoftwareYannick Wurm
 
Open-source tools for generating and analyzing large materials data sets
Open-source tools for generating and analyzing large materials data setsOpen-source tools for generating and analyzing large materials data sets
Open-source tools for generating and analyzing large materials data setsAnubhav Jain
 
Belak_ICME_June02015
Belak_ICME_June02015Belak_ICME_June02015
Belak_ICME_June02015Jim Belak
 
Software tools for calculating materials properties in high-throughput (pymat...
Software tools for calculating materials properties in high-throughput (pymat...Software tools for calculating materials properties in high-throughput (pymat...
Software tools for calculating materials properties in high-throughput (pymat...Anubhav Jain
 
The Symbiotic Nature of Provenance and Workflow
The Symbiotic Nature of Provenance and WorkflowThe Symbiotic Nature of Provenance and Workflow
The Symbiotic Nature of Provenance and WorkflowEric Stephan
 
2014 nicta-reproducibility
2014 nicta-reproducibility2014 nicta-reproducibility
2014 nicta-reproducibilityc.titus.brown
 
Software Sustainability: Better Software Better Science
Software Sustainability: Better Software Better ScienceSoftware Sustainability: Better Software Better Science
Software Sustainability: Better Software Better ScienceCarole Goble
 
Computation and Knowledge
Computation and KnowledgeComputation and Knowledge
Computation and KnowledgeIan Foster
 
Software tools for high-throughput materials data generation and data mining
Software tools for high-throughput materials data generation and data miningSoftware tools for high-throughput materials data generation and data mining
Software tools for high-throughput materials data generation and data miningAnubhav Jain
 
Docker in Open Science Data Analysis Challenges by Bruce Hoff
Docker in Open Science Data Analysis Challenges by Bruce HoffDocker in Open Science Data Analysis Challenges by Bruce Hoff
Docker in Open Science Data Analysis Challenges by Bruce HoffDocker, Inc.
 
Introduction to Next Generation Sequencing
Introduction to Next Generation SequencingIntroduction to Next Generation Sequencing
Introduction to Next Generation SequencingEdizonJambormias2
 
HPC-MAQ : A PARALLEL SHORT-READ REFERENCE ASSEMBLER
HPC-MAQ : A PARALLEL SHORT-READ REFERENCE ASSEMBLERHPC-MAQ : A PARALLEL SHORT-READ REFERENCE ASSEMBLER
HPC-MAQ : A PARALLEL SHORT-READ REFERENCE ASSEMBLERcscpconf
 

2014 11-13-sbsm032-reproducible research

  • 9. SBSM035 - Stats/ Bioinformatics/ Programming Reproducible Research & Sustainable Software @yannick__ http://yannick.poulet.org
  • 13. Science, Letters (edited by Etta Kavanagh). Retraction. We wish to retract our Research Article “Structure of MsbA from E. coli: A homolog of the multidrug resistance ATP binding cassette (ABC) transporters” and both of our Reports “Structure of the ABC transporter MsbA in complex with ADP•vanadate and lipopolysaccharide” and “X-ray structure of the EmrE multidrug transporter in complex with a substrate” (1–3). The recently reported structure of Sav1866 (4) indicated that our MsbA structures (1, 2, 5) were incorrect in both the hand of the structure and the topology. Thus, our biological interpretations based on these inverted models for MsbA are invalid. An in-house data reduction program introduced a change in sign for anomalous differences. This program, which was not part of a conventional data processing package, converted the anomalous pairs (I+ and I-) to (F- and F+), thereby introducing a sign change. As the diffraction data collected for each set of MsbA crystals and for the EmrE crystals were processed with the same program, the structures reported in (1–3, 5, 6) had the wrong hand. The error in the topology of the original MsbA structure was a consequence of the low resolution of the data as well as breaks in the electron density for the connecting loop regions. Unfortunately, the use of the multicopy refinement procedure still allowed us to obtain reasonable refinement values for the wrong structures. The Protein Data Bank (PDB) files 1JSQ, 1PF4, and 1Z2R for MsbA and 1S7B and 2F2M for EmrE have been moved to the archive of obsolete PDB entries.
The MsbA and EmrE structures will be recalculated from the original data using the proper sign for the anomalous differences, and the new Cα coordinates and structure factors will be deposited. We very sincerely regret the confusion that these papers have caused and, in particular, subsequent research efforts that were unproductive as a result of our original findings. Geoffrey Chang, Christopher B. Roth, Christopher L. Reyes, Owen Pornillos, Yen-Ju Chen, Andy P. Chen. Department of Molecular Biology, The Scripps Research Institute, La Jolla, CA 92037, USA. References: 1. G. Chang, C. B. Roth, Science 293, 1793 (2001). 2. C. L. Reyes, G. Chang, Science 308, 1028 (2005). 3. O. Pornillos, Y.-J. Chen, A. P. Chen, G. Chang, Science 310, 1950 (2005). 4. R. J. Dawson, K. P. Locher, Nature 443, 180 (2006). 5. G. Chang, J. Mol. Biol. 330, 419 (2003). 6. C. Ma, G. Chang, Proc. Natl. Acad. Sci. U.S.A. 101, 2852 (2004). Downloaded from www.sciencemag.org on September 24, 2014
  • 14. Reproducible Research & Sustainable Software • Avoid costly mistakes • Be faster: “stand on the shoulders of giants” • Increase impact / visibility
  • 16. “Big data” biology is hard. • Biology/life is complex • Field is young. • Biologists lack computational training. • Generally, analysis tools suck. • badly written • badly tested • hard to install • output quality… often questionable. • Understanding/visualizing/massaging data is hard. • Datasets continue to grow!
  • 17. We need great tools.
  • 18. Some sources of inspiration
  • 20. Best Practices for Scientific Computing. Greg Wilson, D.A. Aruliah, C. Titus Brown, Neil P. Chue Hong, Matt Davis, Richard T. Guy, Steven H.D. Haddock, Katy Huff, Ian M. Mitchell, Mark D. Plumbley, Ben Waugh, Ethan P. White, Paul Wilson. arXiv:1210.0530v3 [cs.MS] 29 Nov 2012. Scientists spend an increasing amount of time building and using software. However, most scientists are never taught how to do this efficiently. As a result, many are unaware of tools and practices that would allow them to write more reliable and maintainable code with less effort. We describe a set of best practices for scientific software development that have solid foundations in research and experience, and that improve scientists’ productivity and the reliability of their software. Software is as important to modern scientific research as telescopes and test tubes. From groups that work exclusively on computational problems, to traditional laboratory and field scientists, more and more of the daily operation of science revolves around computers. This includes the development of new algorithms, managing and analyzing the large amounts of data that are generated in single research projects, and combining disparate datasets to assess synthetic problems. 1. Write programs for people, not computers. 2. Automate repetitive tasks. 3. Use the computer to record history. 4. Make incremental changes. 5. Use version control. 6. Don’t repeat yourself (or others). 7. Plan for mistakes. 8. Optimize software only after it works correctly. 9. Document the design and purpose of code rather than its mechanics. 10. Conduct code reviews.
  • 23. Specific Approaches/Tools • Planning for mistakes • Automated testing • Continuous integration • Writing for people: use style guide
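As a minimal sketch of what "automated testing" can look like, the Ruby snippet below checks a small function against inputs whose correct answers are known in advance. The function `reverse_complement`, the `EXPECTED` table and the `failed_checks` helper are illustrative assumptions, not taken from the slides. A real project would more likely use a test framework such as Minitest or RSpec, and have a continuous integration service rerun the tests on every change.

```ruby
# Hypothetical example function: reverse complement of a DNA sequence.
def reverse_complement(sequence)
  sequence.upcase.reverse.tr("ACGT", "TGCA")
end

# Inputs paired with the outputs we expect (illustrative cases only).
EXPECTED = {
  "A"    => "T",
  "ACGT" => "ACGT", # this sequence is its own reverse complement
  "aacg" => "CGTT", # lower-case input is accepted
  ""     => ""
}.freeze

# Returns the cases for which the function gives an unexpected answer.
def failed_checks
  EXPECTED.reject { |input, expected| reverse_complement(input) == expected }
end

if failed_checks.empty?
  puts "All #{EXPECTED.size} checks passed"
else
  puts "FAILED for inputs: #{failed_checks.keys.inspect}"
end
```

Checks like these are cheap to rerun after every edit, which is the point of "planning for mistakes": an accidental change in behaviour shows up immediately rather than after publication.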
  • 24. Code for people: Use a style guide • For R: http://r-pkgs.had.co.nz/style.html
  • 25. R style guide extract
  • 26. Coding for people: Indent your code!
  • 27. R style guide extract. Line length: Strive to limit your code to 80 characters per line. This fits comfortably on a printed page with a reasonably sized font. If you find yourself running out of room, this is a good indication that you should encapsulate some of the work in a separate function.
    Hard to read (one long line):
    ant_measurements <- read.table(file = '~/Downloads/Web/ant_measurements.txt', header=TRUE, sep='\t', col.names = c('colony', 'individual', 'headwidth', 'mass'))
    Easier to read (wrapped):
    ant_measurements <- read.table(file      = '~/Downloads/Web/ant_measurements.txt',
                                   header    = TRUE,
                                   sep       = '\t',
                                   col.names = c('colony', 'individual', 'headwidth', 'mass')
                                   )
  • 28. Code for people: Use a style guide • For R: http://r-pkgs.had.co.nz/style.html • For Ruby: https://github.com/bbatsov/ruby-style-guide Automatically check your code: install.packages("lint") # once library(lint) # every time lint("file_to_check.R")
  • 29. Eliminate redundancy DRY: Don’t Repeat Yourself
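To make the DRY idea concrete, here is a small Ruby sketch. The colony names and head-width values are made up for illustration, as are the names `mean`, `HEAD_WIDTHS` and `colony_means`. Instead of copying the same averaging expression once per colony, the calculation lives in one function and is applied to every colony.

```ruby
# Repetitive version that DRY asks us to avoid (shown as comments only):
#   mean_a = widths_a.sum / widths_a.size.to_f
#   mean_b = widths_b.sum / widths_b.size.to_f
#   ... and so on for every new colony

# The calculation is written once.
def mean(values)
  values.sum / values.size.to_f
end

# Hypothetical measurements: head widths grouped by colony.
HEAD_WIDTHS = {
  "colony_a" => [1.0, 1.2, 1.1],
  "colony_b" => [0.9, 1.0]
}.freeze

# Apply the same function to every colony; adding a colony needs no new code.
def colony_means(widths_by_colony)
  widths_by_colony.transform_values { |widths| mean(widths) }
end

colony_means(HEAD_WIDTHS).each do |colony, value|
  puts "#{colony}: #{value.round(2)}"
end
```

If the definition of the summary ever changes (for example to a median), only `mean` needs editing, so the colonies cannot drift out of sync.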
  • 30. knitr (sweave): Analyzing & Reporting in a single file. analysis.Rmd: A minimal R Markdown example. I know the value of pi is 3.1416, and 2 times pi is 6.2832. To compile: library(knitr); knit("minimal.Rmd") A paragraph here. A code chunk below: 1+1 ## [1] 2 .4-.7+.3 # what? it is not zero! ## [1] 5.551e-17 Graphics work too: library(ggplot2) qplot(speed, dist, data = cars) + geom_smooth() [Figure 1: A scatterplot of cars; x axis: speed, y axis: dist] ### in R: library(knitr) knit("analysis.Rmd") # -- creates analysis.md ### in shell: pandoc analysis.md -o analysis.pdf # -- creates analysis.pdf
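The surprising `.4-.7+.3` result on the slide above is not specific to R: any language using binary floating-point numbers behaves the same way. The Ruby sketch below reproduces it and shows one common workaround, comparing with a tolerance instead of testing for exact equality. The helper `nearly_equal?` and its default tolerance of 1e-9 are assumptions chosen for illustration; a suitable tolerance depends on the data.

```ruby
# Same arithmetic as on the slide (Ruby requires the leading zero).
RESULT = 0.4 - 0.7 + 0.3

puts RESULT        # a tiny non-zero number, not 0.0
puts RESULT == 0.0 # false: exact comparison of floats is fragile

# Hypothetical helper: treat two numbers as equal if they differ by
# no more than a small tolerance.
def nearly_equal?(a, b, tolerance = 1e-9)
  (a - b).abs <= tolerance
end

puts nearly_equal?(RESULT, 0.0) # true
```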
  • 32. Education: A Quick Guide to Organizing Computational Biology Projects. William Stafford Noble. Department of Genome Sciences, School of Medicine, University of Washington, Seattle, Washington, United States of America; Department of Computer Science and Engineering, University of Washington, Seattle, Washington, United States of America. Introduction: Most bioinformatics coursework focuses on algorithms, with perhaps some components devoted to learning programming skills and learning how to use existing bioinformatics software. Unfortunately, for students who are preparing for a research career, this type of curriculum fails to address many of the day-to-day organizational challenges associated with performing computational experiments. In practice, the principles behind organizing and documenting computational experiments are often learned on the fly, and this learning is strongly influenced by personal predilections as well as by chance interactions with collaborators or colleagues. The purpose of this article is to describe one good strategy for carrying out computational experiments. I will not describe profound issues such as how to formulate hypotheses, design experiments, or draw conclusions. Rather, I will focus on [...]
[...] understanding your work or who may be evaluating your research skills. Most commonly, however, that "someone" is you. A few months from now, you may not remember what you were up to when you created a particular set of files, or you may not remember what conclusions you drew. You will either have to then spend time reconstructing your previous experiments or lose whatever insights you gained from those experiments. This leads to the second principle, which is actually more like a version of Murphy’s Law: Everything you do, you will probably have to do over again. Inevitably, you will discover some flaw in your initial preparation of the data being analyzed, or you will get access to new data, or you will decide that your parameterization of a particular model was not broad enough. This means that the experiment you did last week, or even the set of experiments you’ve been working on over the past month, will probably [...] under a common root directory. The exception to this rule is source code or scripts that are used in multiple projects. Each such program might have a project directory of its own. Within a given project, I use a top-level organization that is logical, with chronological organization at the next level, and logical organization below that. A sample project, called msms, is shown in Figure 1. At the root of most of my projects, I have a data directory for storing fixed data sets, a results directory for tracking computational experiments performed on that data, a doc directory with one subdirectory per manuscript, and directories such as src for source code and bin for compiled binaries or scripts. Within the data and results directories, it is often tempting to apply a similar, logical organization. For example, you may have two or three data sets against which you plan to benchmark your algorithms, so you could create one directory for each of them under data.
[...] with this approach, the distinction between data and results may not be useful. Instead, one could imagine a top-level directory called something like experiments, with subdirectories with names like 2008-12-19. Optionally, the directory name might also include a word or two indicating the topic of the experiment [...] The Lab Notebook: In parallel with this chronological directory structure, I find it useful to maintain a chronologically organized lab notebook. This is a document that resides in the root of the results directory and that records your progress in detail. Entries in the notebook should be dated, [...] These types of entries provide a complete picture of the development of the project over time. In practice, I ask members of my research group to put their lab notebooks online, behind password protection if necessary. When I meet with a member of my lab or a project team, we can refer [...]
Figure 1. Directory structure for a sample project. Directory names are in large typeface, and filenames are in smaller typeface. Only a subset of the files are shown here. Note that the dates are formatted <year>-<month>-<day> so that they can be sorted in chronological order. The source code src/ms-analysis.c is compiled to create bin/ms-analysis and is documented in doc/ms-analysis.html. The README files in the data directories specify who downloaded the data files from what URL on what date. The driver script results/2009-01-15/runall automatically generates the three subdirectories split1, split2, and split3, corresponding to three cross-validation splits. The bin/parse-sqt.py script is called by both of the runall driver scripts. doi:10.1371/journal.pcbi.1000424.g001
In each results folder: • script getResults.rb • intermediates • output
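The dated results folders described above can be created by a script rather than by hand, which keeps the naming consistent. The following Ruby sketch is one possible way to do it. The function name `create_results_dir`, its `topic` option and the use of a temporary directory for the demonstration are assumptions for illustration, not part of the article or the slides.

```ruby
require "date"
require "fileutils"
require "tmpdir"

# Creates <project_root>/results/<year>-<month>-<day>[-topic] and returns its path.
# ISO-style dates sort chronologically when listed alphabetically.
def create_results_dir(project_root, date: Date.today, topic: nil)
  name = date.strftime("%Y-%m-%d")
  name += "-#{topic}" if topic
  path = File.join(project_root, "results", name)
  FileUtils.mkdir_p(path) # no error if the directory already exists
  path
end

# Demonstration in a throwaway temporary directory.
Dir.mktmpdir do |project_root|
  path = create_results_dir(project_root, date: Date.new(2009, 1, 15))
  puts File.basename(path) # 2009-01-15
end
```

In a real project the root would be the project directory itself, and the driver script for that day's analysis would then be saved inside the returned folder.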
  • 33. Track versions of everything
  • 36. Github: Facebook for code • Easy versioning • Random people use your stuff • And find problems and fix and improve it! • Greater impact / better planet • Easily update • Easily collaborate • Identify trends • Build online reputation Demo
  • 39. Choosing a programming language (good / bad) • Excel: good: quick, dirty; bad: easy to make mistakes, doesn’t scale • R: good: numbers, stats, genomics; bad: programming • Unix command-line == shell == bash: good: can’t escape it, quick, dirty, HPC; bad: programming, complicated things • Java: 1990s; good: user interfaces; bad: overcomplicated • Perl: 1980s. Everything. • Python: good: scripting, text; bad: ugly • Ruby: good: scripting, text • Javascript/Node: good: scripting, flexibility (web client), community; bad: only little bio-stuff
  • 40. Ruby. “Friends don’t let friends do Perl” - reddit user • example: “reverse each line in file” • read file; with each line • remove the invisible “end of line” character • reverse the contents • print the reversed line ### in PERL: open INFILE, "my_file.txt"; while (defined ($line = <INFILE>)) { chomp($line); @letters = split(//, $line); @reverse_letters = reverse(@letters); $reverse_string = join("", @reverse_letters); print $reverse_string, "\n"; } ### in Ruby: File.open("a").each { |line| puts line.chomp.reverse }
  • 41. More Ruby examples. 5.times { puts "Hello world" } # Sorting people people_sorted_by_age = people.sort_by { |person| person.age } + many tools for bio-data, e.g. check http://biogems.info
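The sorting one-liner above assumes that a collection called `people` already exists. A self-contained sketch, with a hypothetical `Person` struct and made-up names and ages, could look like this:

```ruby
# Hypothetical record type and data, for illustration only.
Person = Struct.new(:name, :age)

PEOPLE = [
  Person.new("Ada", 36),
  Person.new("Grace", 85),
  Person.new("Linus", 21)
].freeze

# sort_by returns a new array ordered by the block's value (youngest first).
def sort_by_age(people)
  people.sort_by { |person| person.age }
end

sort_by_age(PEOPLE).each do |person|
  puts "#{person.name} (#{person.age})"
end
```

Because `sort_by` returns a new array, the original `PEOPLE` list keeps its order.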
  • 42. Getting help. • In real life: Make friends with people. Talk to them. • Online: • Specific discussion mailing lists (e.g.: R, Stacks, bioruby, MAKER...) • Programming: http://stackoverflow.com • Bioinformatics: http://www.biostars.org • Sequencing-related: http://seqanswers.com • Stats: http://stats.stackexchange.com • Codeschool!
  • 45. “Can you BLAST this for me?”
  • 46. Anurag Priyam, Sure, I can help you… Mechanical engineering student, IIT Kharagpur
  • 47. “Can you BLAST this for me?” Antgenomes.org SequenceServer BLAST made easy (well, we’re trying...) Aim: An open source idiot-proof web-interface for custom BLAST
  • 49. Anurag Priyam, Sure, I can help you… Mechanical engineering student, IIT Kharagpur
  • 51. xkcd