What makes a good programming language for bioinformatics education?
Greg Slodkowicz, Chris Workman group, Center for Biological Sequence Analysis
Overview
  • Language usage trends
  • Practical language comparison
  • Perl design
  • Bioinformatics library comparison
Programming language trends
Programming language trends
  • 2001: Java, C, C++, (Visual) Basic, Perl, PHP, Python, C#, (Objective C), (Lua)
  • 2011: Java, C, C++, C#, PHP, (Visual) Basic, Objective C, Python, Perl, Lua
Comparing programming languages
  • ‘Religious’ aspect (holy language wars)
  • Few hard facts; folk wisdom and software engineering practice instead
  • Difficult to compare across universities (different student intake etc.)
  • Personal preference
How to compare languages?
  • Make an army of clones, teach them programming, do statistics
Software engineering perspective
  • Effort (or time) to solve a problem is proportional to the number of lines needed for the solution [1]
  • The number of bugs per line of code is constant regardless of the language used [2]
[1] F. Brooks, 1995
[2] L. Hatton, 1995
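Taken together, the two rules of thumb above suggest that a shorter solution to the same problem should need less effort and contain fewer bugs. The following is a minimal Python sketch of that arithmetic. It assumes a defect density in the range mentioned in the speaker notes (1-1.5 bugs per 100 lines). The function name and the line counts are hypothetical illustrations, not measurements from the talk.

```python
def estimated_bugs(lines_of_code, bugs_per_100_lines=1.0):
    """Expected bug count if the defect density per line is constant."""
    return lines_of_code * bugs_per_100_lines / 100.0


# Hypothetical line counts for the same task written in two styles.
for label, loc in [("short script", 40), ("verbose program", 200)]:
    low = estimated_bugs(loc, 1.0)
    high = estimated_bugs(loc, 1.5)
    print(f"{label}: {loc} lines -> {low:.1f} to {high:.1f} expected bugs")
```

Under these assumptions the estimate scales linearly with program length, which is why code length is used as the yardstick in the comparison that follows.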
Languages in the comparison
Language   Paradigm     Execution model
Perl       Mixed        Interpreted
Python     Mixed        Interpreted
Java       OO           VM
C          Procedural   Compiled
C++        OO           Compiled
Practical comparison
We compare several small problems implemented in each language:
  • dna2prot: translate DNA to an amino acid sequence
  • fasta*: generate and write random DNA sequences
  • reverse-complement*: read DNA sequences and write their reverse complement
  • k-nucleotide*: repeatedly update hashtables and k-nucleotide strings
  • regex-dna*: match DNA 8-mers and substitute nucleotides for IUB codes
* From the Computer Language Benchmarks Game
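To give a feel for the size of these tasks, here is a minimal Python sketch of three of them (dna2prot, reverse-complement and k-mer counting). It is an illustration written for this text, not the code compared in the talk. The function names are hypothetical, and unknown codons are rendered as "X" by assumption.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons enumerated in TCAG order; "*" is a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """dna2prot: translate DNA into a one-letter amino acid string."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")  # "X" for codons not in the table
        for i in range(0, usable, 3)
    )


def reverse_complement(dna):
    """Complement each base, then reverse the sequence."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def count_kmers(dna, k):
    """Count every overlapping k-nucleotide substring in a hashtable."""
    return Counter(dna[i:i + k] for i in range(len(dna) - k + 1))


print(translate("ATGGCCTAA"))
print(reverse_complement("ATGC"))
print(count_kmers("ATATA", 2))
```

Each task fits in a handful of lines here because string slicing, translation tables and hashtables are built into the language. The versions from the benchmark collection also handle FASTA input and output, which this sketch leaves out.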
Code lengths compared
Unix legacy
  • Many more syntactic structures in Perl
  • Things happen by ‘magic’

Perl              Python                                     Java
$str =~ /xxx$/    str.endswith("xxx")                        str.endsWith("xxx")
$str !~ /xxx$/    not str.endswith("xxx")                    !str.endsWith("xxx")
`cmd`             subprocess.check_output(cmd, shell=True)   Runtime.getRuntime().exec(cmd)
s/xxx/yyy/g       str.replace("xxx", "yyy")                  str.replace("xxx", "yyy")
tr/abc/xyz/       str.translate(trantab)                     (no built-in equivalent)
<FILE>            f.readline()                               reader.readLine() (BufferedReader)
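The Python side of the comparison above can be tried directly. This is a small sketch with made-up example strings. The Perl forms in the comments are approximate counterparts and not exact equivalents (for instance, Perl's /xxx$/ also matches just before a trailing newline).

```python
name = "sample_xxx"

ends_with = name.endswith("xxx")           # Perl: $str =~ /xxx$/
does_not_end = not name.endswith("yyy")    # Perl: $str !~ /yyy$/
replaced = name.replace("xxx", "yyy")      # Perl: s/xxx/yyy/g

# Perl: tr/abc/xyz/ ; Python needs an explicit translation table.
trantab = str.maketrans("abc", "xyz")
translated = "aabbcc".translate(trantab)

print(ends_with, does_not_end, replaced, translated)
```

Every operation is a named method call on a string, with no operators or implicit variables involved. This is the contrast with Perl that the slide draws.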
Perl special variables
Perl philosophy
“(…) Perl is chock-full of exceptions to its rules. This is a good thing, as real life is chock-full of exceptions to rules.”
Schwartz et al., Learning Perl, O’Reilly
Bio* main projects
  • BioPerl
  • BioPython
  • BioJava
  • BioRuby
Bio* commit activity (plots: activity by year; total number of commits)
Summary
  • Changing landscape of programming practice
  • Scripting languages are better suited for teaching bioinformatics
  • New languages emerge and can make teaching bioinformatics easier
  • There are more and more viable bioinformatics libraries
Questions?
Acknowledgements: Chris Workman, Peter Wad Sackett, Nils Weinhold

Programming languages vienna


Editor's Notes

  • #2 My name is Greg Slodkowicz and I’ll be talking about how well suited different programming languages are for solving common problems in bioinformatics, in particular in bioinformatics education.
  • #4 To put it into context, here’s a plot of programming language popularity over the last 10 years. This is general, so it’s not something specific to bioinformatics. Java, C and C++ are heavy-duty compiled languages. Visual Basic has been promoted by Microsoft and is used mainly by small businesses; we can see from the plot that it’s on its way out in favour of C# and other things. PHP is still the most popular language for the web.
  • #5 You need a bit of context information to interpret this
  • #7 It’s always dangerous to compare programming languages.
  • #8 ‘Clinical trial’. ‘Religious’ aspect (holy language wars). Few hard facts; folk wisdom and software engineering practice instead. Difficult to compare across universities (different student intake etc.). Personal preference.
  • #9 My background. So of course we can’t do it ‘scientifically’ to get a definite answer, but we can reason about the properties of each language, and we can use software engineering practice to help us. If you’re a professional software developer: 1-1.5 bugs per 100 lines of code. This is not because
  • #10 It’s always dangerous to compare programming languages. The selection has to do with overall popularity trends, and also with the courses which are taught. Ruby is also quite tempting to try out, but it’s really slow and there’s not much of a scientific community.
  • #11 Decided to implement solutions to several simple problems that could occur in an introductory bioinformatics course. IUB: degenerate base codes. The benchmark programs are optimized, so they are not the ideal case for introductory material, but they were chosen out of many submissions; I pick the shortest, not the fastest.
  • #14 Scoping, bolt-on error handling, a profusion of control structures which do the same thing. You have to work against the language to teach effectively. This makes it difficult to understand somebody else’s code, and also difficult for self-study.
  • #15 Worst offenders: $_ and @_. Implicit behaviours: values are implicitly put into variables with arbitrary names.
  • #16 You can make up your own mind whether it’s a good thing; it’s loosely reasoned and in fact increases the cognitive load. It means you have to work against the language to produce good code. In theory, people say that it gives more power, but not for beginners.
  • #18 OBF There are some affiliated