Coding & Best Practice in Programming in the NGS era
1.
Coding & Best Practice in Programming
Why it matters so much in the NGS era
Lex Nederbragt
Norwegian Sequencing Centre and
Centre for Evolutionary and Ecological Synthesis
lex.nederbragt@ibv.uio.no
@lexnederbragt
2.
Who am I
@lexnederbragt flxlexblog.wordpress.com
10.
Cod genome project phase 2
From Wikimedia commons, user Sagar Joshi
11.
In summary
From flickr, user lesterpubliclibrary
12.
Challenges in the next-generation sequencing era
13.
High-throughput sequencing
Phase 1: more is better
Phase 2: smaller is better
Phase 3: single-molecule
Phase 4: nanopores
14.
Democratization of sequencing
MinION
512 nanopores
150 Mb/hour
Up to 6 hours
$900
15.
Sequencing cost
Thanks to Matt Clark (TGAC), modified from http://bit.ly/1iiajcS
Platforms labelled on the plot: 454 & polony, Solexa & SOLiD, GAII, HiSeq, HiSeq X Ten
End of the gold rush?
16.
More more more
Data Software
Mathias Bigge, Ricordisamoa, others (wikimedia commons)
TCTCCTAACAACCCCCcACACACACACACTGGTA
CTGATGCCATTCTGCTTTACACCTATACACATCA
TATACATtATACACACACACACACACACACAACA
CTCTCCTAACCCACACACACTGGTACAGATGCCA
GTCTGCTTAACACCTACGCACGTATTATACACAC
ACACACACAACGCTCTCCTAACCCACACACACAC
CAGTCTGCTTTAAACCTACACACATATTATACAA
ACGAGTTGGTGACGTAAGGTTGATAAGGGATATT
GGTAAGGGTTAAGGGTAGGGTTGGTGTTAGGGGC
AAGGGTTAGGGTTAGTGTAAGGGGTAAGGGTTAG
TGTAaGGAGTAAGGGTTAGTGTAAGGGGTTAGTG
TTATTGTAAGGGGCTAGTGTTAGTGTTAGTGTTC
AGGGTTAGTGTTAGGGGTAGGGTTAATgTTTAGG
GTAATGTTTAGGGTTAGGGGTATGGGTTAGTGCT
AGGGGTCAGGGTTAGTGTTAGGGTTAGACAACCC
ACCTGAGAGAACCAGTGCGATGCCGCCGCAGGCG
TTGGGCGAGGACATGGAGGTGCCGTTCATCAGCT
GGGTCCCCCGGAGGGTCCAGTTGGGGACGGAGGC
GATGGCTCCCCCCGGAGCGCTGATGCTGACCCCC
AGGGCGCCGTCGATGCTGGGTCCCCGAGACGACC
AGGTGTACTGGTTGGCCGGGAGCTTCTCCCTCAG
GGAGTACTCCGCCACCATCATGTCGGGGGTCACG
TAGGCCCCAACCCCTGGGGACAGACGGAGCGCGT
TACACACCTCAACCCCTTACCCTCGGAGCCTACA
17.
Software
Constant stream of new software
http://wwwdev.ebi.ac.uk/fg/hts_mappers
88 short-read mappers
18.
Software
Constant stream of new software
http://neidetcher.com/ubuntu_package_dependency.html
Installation
Judging quality
Wikimedia commons, user Thebestofall007
20.
Do we need to be worried?
Self-taught bioinformaticians
ACCCCCcACACACACACACTGGTACTGATGCC
ACACCTATACACATCATATACATtATACACAC
ACACAACACTCTCCTAACCCACACACACTGGT
GTCTGCTTAACACCTACGCACGTATTATACAC
AACGCTCTCCTAACCCACACACACACCAGTCT
TACACACATATTATACAAACGAGTTGGTGACG
AAGGGATATTGGTAAGGGTTAAGGGTAGGGTT
GCAAGGGTTAGGGTTAGTGTAAGGGGTAAGGG
GAGTAAGGGTTAGTGTAAGGGGTTAGTGTTAT
TAGTGTTAGTGTTAGTGTTCAGGGTTAGTGTT
TTAATgTTTAGGGTAATGTTTAGGGTTAGGGG
TGCTAGGGGTCAGGGTTAGTGTTAGGGTTAGA
GAGAGAACCAGTGCGATGCCGCCGCAGGCGTT
ATGGAGGTGCCGTTCATCAGCTGGGTCCCCCG
TTGGGGACGGAGGCGATGGCTCCCCCCGGAGC
ACCCCCAGGGCGCCGTCGATGCTGGGTCCCCG
GTGTACTGGTTGGCCGGGAGCTTCTCCCTCAG
GCCACCATCATGTCGGGGGTCACGTAGGCCCC
GACAGACGGAGCGCGTTACACACCTCAACCCC
AGCCTACATAACCCAACCCTCTGGAGACGGCA
AGTCAGAAATAGaGCTGACCGATTCATCAAAT
lots of data
lots of software
recipe for disaster?
21.
Correctness of results
http://www.it.bton.ac.uk/staff/je/java/jewl/tutorial/tutorial.html
22.
Reproducibility
doi:10.1038/sj.embor.7401143
A reproducibility crisis?
23.
Reproducibility and reusability
http://upload.wikimedia.org/wikipedia/commons/4/48/Recycle.jpg
25.
My (given) title
Coding & Best Practice in Programming
Why it matters so much in the NGS era
Why it matters so much in science
Next-generation sequencing specific?
26.
Diagnostic sequencing
Wikimedia commons, user Bill Branson
34.
Best practices
Use version control
https://www.atlassian.com/git/workflows
35.
Best practices
From my own work:
$ cd scripts
$ ls
blat_parse4.pl old_versions snps_flanks_2_fastq.pl
$ ls old_versions/
blat_parse2.pl blat_parse_attemp1.pl
blat_parse.pl.bak blat_parse.pl
blat_parse3_backup.pl
blat_parse3.pl
36.
Best practices
test, test, test
def test_zero():
    assert run_the_function(0) == 0

assert x > 0, "cannot handle negative numbers"
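The two ideas on this slide, small test functions and assertions that guard against bad input, can be sketched together as follows. This is only an illustration: reverse_complement and the test names are hypothetical and do not come from the talk. The tests are written so that a runner such as pytest would pick them up, but they can also be called directly.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only).

    Hypothetical example function, not taken from the slides.
    """
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    seq = seq.upper()
    # Fail early on unexpected input instead of silently returning nonsense.
    assert set(seq) <= set(complement), "cannot handle non-ACGT characters"
    return "".join(complement[base] for base in reversed(seq))


def test_empty():
    # Edge case: an empty sequence stays empty.
    assert reverse_complement("") == ""


def test_known_sequence():
    # A small case worked out by hand.
    assert reverse_complement("ACCTG") == "CAGGT"


def test_lowercase_input():
    # Lowercase input is normalised to uppercase before processing.
    assert reverse_complement("acgt") == "ACGT"


def test_rejects_ambiguous_base():
    # The guarding assertion should fire for characters such as N.
    try:
        reverse_complement("ACNT")
    except AssertionError:
        return
    raise RuntimeError("expected an AssertionError for 'ACNT'")


if __name__ == "__main__":
    test_empty()
    test_known_sequence()
    test_lowercase_input()
    test_rejects_ambiguous_base()
    print("all tests passed")
```

Note that Python skips assert statements when run with the -O flag, so a guard that must always run is better written as an explicit check that raises an exception.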
42.
khmer
Integrated code coverage analysis
The “GitHub Flow” model of code review
Semantic versioning
Continuous integration
Integration and acceptance testing
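Semantic versioning, one of the practices listed for khmer, encodes compatibility in a MAJOR.MINOR.PATCH number: a change in MAJOR signals a breaking change. The sketch below shows how a script could check that an installed tool version satisfies a requirement. The function names are hypothetical, and the parser assumes plain three-part versions without pre-release or build suffixes.

```python
def parse_version(text):
    """Split a plain 'MAJOR.MINOR.PATCH' string into a tuple of ints.

    Assumption: no pre-release or build suffixes such as '-rc1'.
    """
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


def is_compatible(installed, required):
    """Return True if 'installed' can stand in for 'required'.

    Under semantic versioning this means the same MAJOR number and a
    version that is at least as new as the required one.
    """
    inst = parse_version(installed)
    req = parse_version(required)
    return inst[0] == req[0] and inst >= req


if __name__ == "__main__":
    print(is_compatible("1.4.2", "1.3.0"))   # newer minor release, same major
    print(is_compatible("2.0.0", "1.3.0"))   # major bump, may break the script
```

Comparing tuples of integers rather than raw strings avoids the classic mistake of ranking "1.10.0" below "1.9.0".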