FBW
25-09-2018
Wim Van Criekinge
Bioinformatics.be
Minerva.ugent.be
Google Calendar
Overview
What is Python ?
Why Python in Bioinformatics ?
How to Python
IDE: Eclipse & PyDev / Athena
Code Sharing: Git(hub)
Examples
Hello World
PIthon
Programming Language
• Formal notation for specifying computations
– Syntax (usually specified by a context-free
grammar)
– Semantics for each syntactic construct
– Practical implementation on a real or virtual
machine
• Compilation vs. interpretation
• Efficiency vs. portability
• Assembly Languages
– Invented by machine designers in the early 1950s
– Reusable macros and subroutines
FORTRAN
• Procedural, imperative language
– Still used in scientific computation
• Developed at IBM in the 1950s by
John Backus (1924-2007)
– Backus’s 1977 Turing award lecture made the
case for functional programming
– On FORTRAN: “We did not know what we
wanted and how to do it. It just sort of grew. The
first struggle was over what the language would
look like. Then how to parse expressions – it was
a big problem…”
• BNF: Backus-Naur form for defining context-free
grammars
LISP
• Invented by John McCarthy (1927-2011, Turing Award 1971)
• Formal notation for lambda-calculus
• Pioneered many PL concepts
– Automated memory management (garbage collection)
– Dynamic typing
– No distinction between code and data
• Still in use: ACL2, Scheme, …
“Anyone could learn Lisp in one day,
except that if they already knew FORTRAN, it would take
three days”
- Marvin Minsky
PASCAL
• Designed by Niklaus Wirth
– 1984 Turing Award
• Revised type system of Algol
– Good data structure concepts
• Records, variants, subranges
– More restrictive than Algol 60/68
• Procedure parameters cannot have procedure
parameters
• Popular teaching language
• Simple one-pass compiler
C
• Bell Labs 1972 (Dennis Ritchie)
• Development closely related to UNIX
– 1983 Turing Award to Thompson and Ritchie
• Compiles to native code
• 1973-1980: new features; compiler ported
– unsigned, long, union, enums
• 1978: K&R C book published
• 1989: ANSI C standardization
– Function prototypes as in C++
• 1999: ISO 9899:1999 also known as “C99”
– Inline functions, C++-like decls, bools, variable arrays
• Concurrent C, Objective C, C*, C++, C#
• “Portable assembly language”
– Early C++, Modula-3, Eiffel source-translated to C
JAVA
• Sun 1991-1995 (James Gosling)
– Originally called Oak, intended for set top boxes
• Mixture of C and Modula-3
– Unlike C++
• No templates (generics), no multiple inheritance, no
operator overloading
– Like Modula-3 (developed at DEC SRC)
• Explicit interfaces, single inheritance, exception
handling, built-in threading model, references &
automatic garbage collection (no explicit pointers!)
• “Generics” added later
Other Important Languages
• Algol-like
– Modula, Oberon, Ada
• Functional
– ISWIM, FP, SASL, Miranda, Haskell, LCF,
ML, Caml, Ocaml, Scheme, Common LISP
• Object-oriented
– Smalltalk, Objective-C, Eiffel, Modula-3,
Self, C#, CLOS
• Logic programming
– Prolog, Gödel, LDL, ACL2, Isabelle, HOL
… and more
• Data processing and databases
– Cobol, SQL, 4GLs, XQuery
• Systems programming
– PL/I, PL/M, BLISS
• Specialized applications
– APL, Forth, Icon, Logo, SNOBOL4,
GPSS, Visual Basic
• Concurrent, parallel, distributed
– Concurrent Pascal, Concurrent C, C*,
SR, Occam, Erlang, Obliq
… and more
• Programming tool “mini-languages”
– awk, make, lex, yacc, autoconf …
• Command shells, scripting and “web”
languages
– sh, csh, tcsh, ksh, zsh, bash …
– Perl, JavaScript, PHP, Python, Rexx, Ruby, Tcl,
AppleScript, VBScript …
• Web application frameworks and technologies
– ASP.NET, AJAX, Flash, Silverlight …
• Note: HTML/XML are markup languages, not
programming languages, but they often embed
executable scripts like Active Server Pages (ASPs) &
Java Server Pages (JSPs)
What is scripting ?
• Wikipedia has an informative and detailed explanation, “A
scripting language, script language or extension language
is a programming language that allows control of one or more
software applications. "Scripts" are distinct from the core
code of the application, as they are usually written in a
different language and are often created or at least modified
by the end-user. Scripts are often interpreted from source
code or bytecode, whereas the applications they control are
traditionally compiled to native machine code. Scripting
languages are nearly always embedded in the applications
they control.
• The name "script" is derived from the written script of the
performing arts, in which dialogue is set down to be spoken
by human actors. Early script languages were often called
batch languages or job control languages. Such early
scripting languages were created to shorten the traditional
edit-compile-link-run process”.
What is Python ?
• Python is an interpreted, object-oriented, high-level
programming language with dynamic semantics.
• Its high-level built in data structures, combined with dynamic
typing and dynamic binding, make it very attractive for Rapid
Application Development, as well as for use as a scripting or
glue language to connect existing components together.
• Python supports modules and packages, which encourages
program modularity and code reuse. The Python interpreter
and the extensive standard library are available in source or
binary form without charge for all major platforms, and can be
freely distributed.
• When he began implementing Python, Guido van Rossum
was also reading the published scripts from “Monty Python's
Flying Circus”, a BBC comedy series from the 1970s. Van
Rossum thought he needed a name that was short, unique,
and slightly mysterious, so he decided to call the language
Python.
What’s Driving Their Evolution?
• Constant search for better ways to build software
tools for solving computational problems
– Many PLs are general purpose tools
– Others are targeted at specific kinds of problems
• For example, massively parallel computations or
graphics
• Useful ideas evolve into language designs
– Algol → Simula → Smalltalk → C with Classes → C++
• Often design is driven by expediency
– Scripting languages: Perl, Tcl, Python, PHP, etc.
• “PHP is a minor evil perpetrated by incompetent
amateurs, whereas Perl is a great and insidious evil,
perpetrated by skilled but perverted professionals.” -
Jon Ribbens
What Do They Have in Common?
• Lexical structure and analysis
– Tokens: keywords, operators, symbols, variables
– Regular expressions and finite automata
• Syntactic structure and analysis
– Parsing, context-free grammars
• Pragmatic issues
– Scoping, block structure, local variables
– Procedures, parameter passing, iteration,
recursion
– Type checking, data structures
• Semantics
– What do programs mean and are they correct
Visual history of programming languages
http://cdn.oreillystatic.com/news/graphics/prog_lang_poster.pdf
The most valuable programming skills to have on a resume
The 2015 Top Ten Programming Languages
http://spectrum.ieee.org/computing/software/the-2015-top-ten-programming-languages
3/05/2016 Project Biological Databases
2015-2016
Biological Databases
Bruno Verstraeten, Arthur Zwaenepoel,
Jules Haezebrouck, Laurenz De Cock, Jonathan
Walgraeve, Cedric Bogaert, Dries Schaumont
What is Minecraft?
• Sandbox game
• Designed by Markus “Notch” Persson
• Developed by Mojang
• Bought by Microsoft in 2014
• 70 million copies sold (June 2015)
Minecraft programming from Python
Third-party mods
• Extra content made by users
• Adding items, magic and features to
the original game
• The true beauty of Minecraft
And now Sciencecraft
• Visualizing proteins in Minecraft
• Minecraft Tools Python package
• Data directly from PDB flat files or
from the PDB server
• Spigot Minecraft server
The basics
1. Start a server with Minecraft Tools
2. Import the PDB file using Python
3. Retrieve the atom coordinates from the file
4. Use the setBlock function to place blocks of
specific colours in the Minecraft server to
represent the protein
5. Fly around and take screenshots
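Steps 2 to 4 could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the helper names `parse_atom_coordinates` and `place_atoms`, the `FakeWorld` stand-in, and the default block id 35 are all chosen here for illustration. With a running server, the connection object returned by `Minecraft.create()` from `mcpi` would be passed in place of `FakeWorld`.

```python
def parse_atom_coordinates(pdb_lines):
    """Return (x, y, z) tuples from the ATOM/HETATM records of a PDB flat file."""
    coords = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            # The PDB flat-file format keeps coordinates in fixed columns 31-54.
            x = float(line[30:38])
            y = float(line[38:46])
            z = float(line[46:54])
            coords.append((x, y, z))
    return coords


def place_atoms(world, coords, block_id=35):
    """Place one block per atom, rounding coordinates to whole blocks.

    block_id=35 is an assumption made for this sketch.
    """
    for x, y, z in coords:
        world.setBlock(int(round(x)), int(round(y)), int(round(z)), block_id)


class FakeWorld:
    """Hypothetical stand-in that records setBlock calls instead of using a server."""

    def __init__(self):
        self.blocks = []

    def setBlock(self, x, y, z, block_id):
        self.blocks.append((x, y, z, block_id))


# One ATOM record built with the fixed-width PDB layout. The values come from
# the atom_site example for entry 1O9K shown in the XML slide of this deck.
sample_line = "ATOM  {:5d} {:<4s} {:3s} {:1s}{:4d}    {:8.3f}{:8.3f}{:8.3f}".format(
    1, " N", "MET", "A", 379, 13.258, 142.706, 30.410)

world = FakeWorld()
place_atoms(world, parse_atom_coordinates([sample_line]))
print(world.blocks)
```

A real structure would probably also need an offset and a scale factor so that the protein fits inside the buildable height of the world; that is left out here.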
Minecraft programming from Python
# Connect to Minecraft
from mcpi.minecraft import Minecraft
mc = Minecraft.create()
# Set x, y, and z variables to represent coordinates
x = 10.0
y = 110.0
z = 12.0
# Change the player's position
mc.player.setPos(x, y, z)
Verotoxin
Apolipoprotein A1
Kinesin
Retrieving PDB data using SPARQL
• PDB is available in RDF (wwPDB)
• Queried from Python with the SPARQLWrapper package
Using SPARQL with Python – SPARQLWrapper
SPARQL endpoint
Using SPARQL with Python – SPARQLWrapper
“Search engine”
• Naive, regex-based
• Returns a list of all PDB entries containing a certain keyword,
with organism name and full description
• A PDB entry can then be retrieved with the previous query
Retrieve the .xml.gz file:
• The actual structure information is in the XML file
<?xml version="1.0" encoding="UTF-8" ?>
<PDBx:datablock datablockName="1O9K"
xmlns:PDBx="http://pdbml.pdb.org/schema/pdbx-v40.xsd"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://pdbml.pdb.org/schema/pdbx-v40.xsd pdbx-v40.xsd">
<PDBx:atom_siteCategory>
<PDBx:atom_site id="1">
<PDBx:B_iso_or_equiv>62.42</PDBx:B_iso_or_equiv>
<PDBx:Cartn_x>13.258</PDBx:Cartn_x>
<PDBx:Cartn_y>142.706</PDBx:Cartn_y>
<PDBx:Cartn_z>30.410</PDBx:Cartn_z>
<PDBx:auth_asym_id>A</PDBx:auth_asym_id>
<PDBx:auth_atom_id>N</PDBx:auth_atom_id>
<PDBx:auth_comp_id>MET</PDBx:auth_comp_id>
<PDBx:auth_seq_id>379</PDBx:auth_seq_id>
<PDBx:group_PDB>ATOM</PDBx:group_PDB>
<PDBx:label_alt_id xsi:nil="true" />
<PDBx:label_asym_id>A</PDBx:label_asym_id>
<PDBx:label_atom_id>N</PDBx:label_atom_id>
<PDBx:label_comp_id>MET</PDBx:label_comp_id>
<PDBx:label_entity_id>1</PDBx:label_entity_id>
<PDBx:label_seq_id>8</PDBx:label_seq_id>
<PDBx:occupancy>1.00</PDBx:occupancy>
….
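Reading the coordinates out of such a file could look roughly like the sketch below. It uses only the standard library. The function names `atom_sites` and `read_pdbml` are made up for this illustration, and `SAMPLE` is a shortened copy of the fragment above, closed off so that it is well-formed.

```python
import gzip
import xml.etree.ElementTree as ET

# Namespace prefix mapping, taken from the xmlns:PDBx attribute in the slide.
PDBX = {"PDBx": "http://pdbml.pdb.org/schema/pdbx-v40.xsd"}

SAMPLE = """<PDBx:datablock datablockName="1O9K"
    xmlns:PDBx="http://pdbml.pdb.org/schema/pdbx-v40.xsd">
  <PDBx:atom_siteCategory>
    <PDBx:atom_site id="1">
      <PDBx:Cartn_x>13.258</PDBx:Cartn_x>
      <PDBx:Cartn_y>142.706</PDBx:Cartn_y>
      <PDBx:Cartn_z>30.410</PDBx:Cartn_z>
      <PDBx:label_atom_id>N</PDBx:label_atom_id>
      <PDBx:label_comp_id>MET</PDBx:label_comp_id>
    </PDBx:atom_site>
  </PDBx:atom_siteCategory>
</PDBx:datablock>"""


def atom_sites(root):
    """Yield (atom, residue, x, y, z) for every atom_site element under root."""
    for site in root.iterfind(".//PDBx:atom_site", PDBX):
        yield (
            site.findtext("PDBx:label_atom_id", namespaces=PDBX),
            site.findtext("PDBx:label_comp_id", namespaces=PDBX),
            float(site.findtext("PDBx:Cartn_x", namespaces=PDBX)),
            float(site.findtext("PDBx:Cartn_y", namespaces=PDBX)),
            float(site.findtext("PDBx:Cartn_z", namespaces=PDBX)),
        )


def read_pdbml(path):
    """Parse a gzip-compressed PDBML file (path is supplied by the caller)."""
    with gzip.open(path, "rb") as handle:
        return ET.parse(handle).getroot()


for record in atom_sites(ET.fromstring(SAMPLE)):
    print(record)
```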
New kid in the coding block …
Python
• Programming languages are overrated
– If you are going into bioinformatics you will probably
learn and need several
– If you know one you know 90% of a second
• Choice does matter but it matters far less than people think it
does
• Why Python?
– Lets you write useful programs as soon as possible
– Built-in libraries, plus third-party packages such as Biopython
– Free, runs on most platforms, widely used in science
• Versus Perl?
– Incredibly similar
– Consistent syntax, indentation-based blocks
http://www.python.org
Should I use Python 2 or Python 3 for my development activity?
• Short version: Python 2.x is legacy, Python 3.x is the present and future of the language
• Python 3.0 was released in 2008. The final 2.x version 2.7 release came out in mid-
2010, with a statement of extended support for this end-of-life release. The 2.x branch
will see no new major releases after that. 3.x is under active development and has
already seen over five years of stable releases, including version 3.3 in 2012 and 3.4 in
2014. This means that all recent standard library improvements, for example, are only
available by default in Python 3.x.
• Guido van Rossum (the original creator of the Python language) decided to clean up
Python 2.x properly, with less regard for backwards compatibility than is the case for
new releases in the 2.x range. The most drastic improvement is the better Unicode
support (with all text strings being Unicode by default) as well as saner bytes/Unicode
separation.
• Besides, several aspects of the core language (such as print and exec being statements,
integers using floor division) have been adjusted to be easier for newcomers to learn and
to be more consistent with the rest of the language, and old cruft has been removed (for
example, all classes are now new-style, "range()" returns a memory efficient iterable,
not a list as in 2.x).
• The What's New in Python 3.0 document provides a good overview of the major
language changes and likely sources of incompatibility with existing Python 2.x code.
Nick Coghlan (one of the CPython core developers) has also created a relatively
extensive FAQ regarding the transition.
• However, the broader Python ecosystem has amassed a significant amount of quality
software over the years. The downside of breaking backwards compatibility in 3.x is that
some of that software (especially in-house software in companies) still doesn't work on
3.x yet.
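A few of the differences listed above can be checked directly in a Python 3 interpreter. The snippet below is only an illustration; the function name `python3_examples` and the dictionary keys are made up here.

```python
def python3_examples():
    """Collect a few Python 3 behaviours that differ from Python 2."""
    results = {}
    # "/" is true division for integers; "//" is floor division.
    results["true_division"] = 7 / 2
    results["floor_division"] = 7 // 2
    # range() returns a lazy, memory-efficient object rather than a list.
    results["range_is_list"] = isinstance(range(3), list)
    # Text strings are Unicode; encoding them gives a separate bytes type.
    results["encoded"] = "\u00e9".encode("utf-8")
    results["encoded_type"] = type(results["encoded"]).__name__
    return results


# print is a function in Python 3, so the parentheses are required.
for name, value in python3_examples().items():
    print(name, "->", value)
```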
How to install ?
• On Windows you’ll need administrator rights
• Portable Python distribution?
Takes 500 MB and more than 2 hours
Version 2.7 and 3.4 on http://athena.ugent.be
Interactive “Shell”
• Great for learning the language
• Great for experimenting with the library
• Great for testing your own modules
• Two variations: IDLE (GUI),
python (command line)
• Type statements or expressions at prompt:
>>> print("Hello, world")
Hello, world
>>> x = 12**2
>>> x//2
72
>>> # this is a comment
IDE: Integrated Development Environment
• You can type scripts using
Notepad(++)
• Better: PyCharm
– available for free on most OS but you
need to be administrator to install it
• We will use Eclipse in combination
with PyDev
What is Eclipse?
• Eclipse started as a proprietary IBM product
(IBM VisualAge for Smalltalk/Java)
– Embracing the open source model, IBM opened the
product up
• Open Source
– It is a general purpose open platform that facilitates
and encourages the development of third party plug-ins
• Best known as an Integrated Development
Environment (IDE)
– Provides tools for coding, building, running and
debugging applications
• Originally designed for Java, now supports
many other languages
– Good support for C, C++
– Python, PHP, Ruby, etc…
Prerequisites for Running Eclipse
• Eclipse is written in Java and will
thus need an installed JRE or JDK
in which to execute
– JDK recommended
Selecting a Workspace
• In Eclipse, all of your code will live under a
workspace
• A workspace is nothing more than a location
where we will store our source code and
where Eclipse will write out our preferences
• Eclipse allows you to have multiple
workspaces – each tailored in its own way
• Choose a location where you want to store
your files, then click OK
Eclipse IDE Components
Menubars
Full drop down menus plus quick
access to common functions
Editor Pane
This is where we edit
our source code
Perspective Switcher
We can switch between
various perspectives
here
Outline Pane
This contains a hierarchical
view of a source file
Package Explorer Pane
This is where our
projects/files are listed
Miscellaneous Pane
Various components can appear in this
pane – typically this contains a console
and a list of compiler problems
Task List Pane
This contains a list of
“tasks” to complete
PYTHON
PyDev: Python plug-in for Eclipse
• Syntax highlighting
• Debugger
• Code completion
• An extensive preference menu
that can be used to edit the
plug-in’s attributes and options.
Installation
• The plug-in can be installed through
Software Updates
Setting Up
In Eclipse, go to:
Window, Preferences, PyDev,
Interpreter-Python,
and click New.
Select the python.exe
file in the Python
directory, click OK and
OK in the Preferences
window again. Wait for
the creating procedure
to finish.
Create Python Project and File
Project: go to File, New, Project, select PyDev, Python Project,
click Next, enter a name, choose the Python version,
and click Finish.
File: click on File, New, choose File,
click on the Python project folder,
enter a file name ending in .py, and click Finish.
Running Python
To run Python code click on Run, Run As, and
select Python Run.
Let’s try “Hello World!” from athena.ugent.be
Where is the workspace ?
Make PyDev Project
Which Python interpreter is used … check Preferences or run version.py
Create new file …
… Hello_world.py
Run Hello_world.py
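The slides do not show the contents of Hello_world.py or version.py. A plausible minimal version, written here as an assumption, prints the greeting and then reports which interpreter PyDev actually picked up (the helper name `interpreter_summary` is made up for this sketch):

```python
import sys


def interpreter_summary():
    """Return a one-line description of the interpreter running this script."""
    major, minor, micro = sys.version_info[:3]
    return "Python {}.{}.{} at {}".format(major, minor, micro, sys.executable)


print("Hello World!")
print(interpreter_summary())
```

If the reported version is not the one expected, the interpreter configured under Window, Preferences, PyDev is the first thing to check.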
git is an open source,
distributed version control
system designed for speed
and efficiency
Git: A distributed version control system
• Version control (or revision control, or source
control) is all about managing multiple versions of
documents, programs, web sites, etc.
– Almost all “real” projects use some kind of version
control
– Essential for team projects, but also very useful for
individual projects
• Some well-known version control systems are
CVS, Subversion, Mercurial, and Git
– CVS and Subversion use a “central” repository; users
“check out” files, work on them, and “check them in”
– Mercurial and Git treat all repositories as equal
• Distributed systems like Mercurial and Git are
newer and are gradually replacing centralized
systems like CVS and Subversion
Why version control?
• For working by yourself:
– Gives you a “time machine” for going back to
earlier versions
– Gives you great support for different versions
(standalone, web app, etc.) of the same basic
project
• For working with others:
– Greatly simplifies concurrent work, merging
changes
• For getting an internship or job:
– Any company with a clue uses some kind of
version control
– Companies without a clue are bad places to work
Why Git?
• Git has many advantages over earlier
systems such as CVS and Subversion
– More efficient, better workflow, etc.
– See the literature for an extensive list of reasons
– Of course, there are always those who disagree
• It works from within Eclipse, also when
started from Athena
No network needed:
(almost) everything is local
• Performing a diff
• Viewing file history
• Committing changes
• Merging branches
• Obtaining any other
revision of a file
• Switching branches
GitHub: Hosted GIT
• Largest open source git hosting site
• Public and private options
• User-centric rather than project-centric
• http://github.ugent.be (use your UGent
login and password)
• URI: https://github.ugent.be/wvcrieki/Bioinformatics_2018.py.git
Typical workflow
Person A
• Set up project & repo
• Push code onto GitHub
• edit/commit
• edit/commit
• pull/push
Person B
• Clone code from GitHub
• edit/commit/push
• edit…
• edit… commit
• pull/push
This is just the flow, specific commands on following slides.
It’s also possible to create your project first on github, then clone (i.e., no git init)
URI (Uniform Resource Identifier):
https://github.ugent.be/wvcrieki/Bioinformatics_2018.py.git
Hello_world.py
Variables
• No need to declare
• Need to assign (initialize)
• Use of an unassigned variable raises an exception (NameError)
• Names are not typed; the values they refer to are
    if friendly: greeting = "hello world"
    else: greeting = 12**2
    print(greeting)
• Everything is a "variable":
• Even functions, classes, modules
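The points above can be tried out in a few lines. This is a small illustrative sketch for Python 3; the names `describe`, `shout` and `undefined_name_example` are invented for the example.

```python
def describe(value):
    """Return the name of the type of the object a variable refers to."""
    return type(value).__name__


greeting = "hello world"
kind_before = describe(greeting)    # the name refers to a str
greeting = 12 ** 2                  # the same name now refers to an int
kind_after = describe(greeting)

# Functions are objects too and can be bound to another name.
shout = str.upper
shouted = shout("acgt")

# Using a name that was never assigned raises NameError.
try:
    undefined_name_example
except NameError as error:
    message = str(error)

print(kind_before, kind_after, shouted, message)
```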
Numbers
• The usual suspects
• 12, 3.14, 0xFF, 0377 (octal; written 0o377 in Python 3), (-1+2)*3/4**5, abs(x), 0<x<=5
• C-style shifting & masking
• 1<<16, x&0xff, x|1, ~x, x^y
• Integer division truncates in Python 2 :-(
• 1/2 -> 0 # 1./2. -> 0.5, float(1)/2 -> 0.5
• Fixed in Python 3: 1/2 -> 0.5, and 1//2 -> 0 is explicit floor division
• Long (arbitrary precision), complex
• 2L**100 -> 1267650600228229401496703205376L
– In Python 2.2 and beyond, 2**100 does the same thing; Python 3 has a single int type and no L suffix
• 1j**2 -> (-1+0j)
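For comparison, here is how the same expressions behave under Python 3. This is a short sketch assuming a Python 3 interpreter; the dictionary `examples` and its labels are just a convenient way to print the results.

```python
examples = {
    "true division 1/2": 1 / 2,
    "floor division 1//2": 1 // 2,
    "hex 0xFF": 0xFF,
    "octal 0o377": 0o377,          # Python 3 spelling of the octal literal
    "shift 1<<16": 1 << 16,
    "mask 0x1234 & 0xff": 0x1234 & 0xFF,
    "big int 2**100": 2 ** 100,    # ints have arbitrary precision
    "complex 1j**2": 1j ** 2,
}

for label, value in examples.items():
    print(label, "->", value)
```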
Control Structures
if condition:
    statements
[elif condition:
    statements] ...
else:
    statements
while condition:
    statements
for var in sequence:
    statements
break
continue
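A small worked example that uses each of these constructs, assuming Python 3. The functions `classify`, `gc_fraction` and `halvings`, and the use of "*" as an end marker, are invented purely for illustration.

```python
def classify(base):
    """if / elif / else: sort a nucleotide into a pairing class."""
    if base in ("G", "C"):
        return "strong"
    elif base in ("A", "T"):
        return "weak"
    else:
        return "unknown"


def gc_fraction(sequence):
    """for / break / continue: GC fraction of the bases before the first '*'."""
    strong = weak = 0
    for base in sequence.upper():
        if base == "*":          # assumed end marker: stop scanning
            break
        kind = classify(base)
        if kind == "unknown":    # e.g. N: skip without counting
            continue
        if kind == "strong":
            strong += 1
        else:
            weak += 1
    total = strong + weak
    return strong / total if total else 0.0


def halvings(n):
    """while: count how often n can be halved before it reaches 1."""
    steps = 0
    while n > 1:
        n //= 2
        steps += 1
    return steps


print(gc_fraction("GGCCNAT*GGGG"), halvings(8))
```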
Example Function
def gcd(a, b):
    "greatest common divisor"
    while a != 0:
        a, b = b%a, a # parallel assignment
    return b
>>> gcd.__doc__
'greatest common divisor'
>>> gcd(12, 20)
4
How to generate random numbers
The standard random module implements a random number
generator.
import random
print (random.random())
This prints a random floating point number in the range [0, 1) (that is,
between 0 and 1, including 0.0 but always smaller than 1.0).
There are also many other specialized generators in this module,
such as:
randrange(a, b) chooses an integer in the range [a, b).
uniform(a, b) chooses a floating point number in the range [a, b).
normalvariate(mean, sdev) samples the normal (Gaussian)
distribution.
Some higher-level functions operate on sequences directly, such as:
choice(S) chooses a random element from a given sequence (the
sequence must have a known length).
shuffle(L) shuffles a list in-place, i.e. permutes it randomly
There’s also a Random class you can instantiate to create
independent multiple random number generators.
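The functions mentioned above can be tried as follows. The seed value 2018 and the variable names are arbitrary choices for this sketch; a seeded `random.Random` instance is used so that repeated runs give the same numbers.

```python
import random

rng = random.Random(2018)           # independent generator with a fixed seed

u = rng.random()                    # float in [0, 1)
die = rng.randrange(1, 7)           # integer in [1, 7), i.e. 1..6
f = rng.uniform(-1.0, 1.0)          # float between -1.0 and 1.0
g = rng.normalvariate(0.0, 1.0)     # sample from a standard normal
base = rng.choice("ACGT")           # one random element of a sequence

codons = ["ATG", "GCC", "TAA"]
rng.shuffle(codons)                 # permutes the list in place

print(u, die, f, g, base, codons)
```

Two generators created with the same seed produce the same sequence, which is convenient when a simulation has to be reproducible.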
First program: PI-thon.py
• How good are the random numbers ?
• If they are good, you should be able to
“measure” PI
Measure Pi with two random numbers … many of them …
[Figure: axes x and y, with side length 1]
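The slide does not spell out the method, so the sketch below assumes the usual Monte Carlo approach: draw points (x, y) uniformly in the unit square and count the fraction that falls inside the quarter circle x² + y² ≤ 1, which approaches π/4. The function name `estimate_pi` and its parameters are chosen for this illustration and need not match the course's PI-thon.py.

```python
import math
import random


def estimate_pi(n_points, seed=None):
    """Estimate pi from n_points random points in the unit square."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_points):
        x = rng.random()
        y = rng.random()
        if x * x + y * y <= 1.0:     # the point lies inside the quarter circle
            inside += 1
    return 4.0 * inside / n_points


if __name__ == "__main__":
    for n in (100, 10000, 100000):
        estimate = estimate_pi(n, seed=1)
        print(n, estimate, abs(estimate - math.pi))
```

The error shrinks only slowly, roughly with the square root of the number of points, so a poor generator or a small sample shows up as a visibly wrong value.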
e (Euler’s number)
• The value of e is also equal
to 1/0! + 1/1! + 1/2! + 1/3! + 1/4! + 1/5! +
1/6! + 1/7! + ... (etc)
• (Note: "!" means factorial)
• The first few terms add up to: 1 + 1
+ 1/2 + 1/6 + 1/24 + 1/120 + 1/720 =
2.718055556
• In fact Euler himself used this method
to calculate e to 18 decimal places.
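The series can be summed in a few lines. This is a minimal sketch using ordinary floating point numbers; the function name `e_series` is made up here.

```python
import math


def e_series(n_terms):
    """Sum the first n_terms of 1/0! + 1/1! + 1/2! + ..."""
    total = 0.0
    term = 1.0                 # 1/0!
    for k in range(n_terms):
        total += term
        term /= (k + 1)        # turns 1/k! into 1/(k+1)!
    return total


for n in (3, 7, 20):
    approx = e_series(n)
    print(n, approx, abs(approx - math.e))
```

Floating point limits the result to about 16 significant digits, so reproducing all 18 decimal places would need exact arithmetic, for example with the `fractions` or `decimal` modules.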
Euler’s identity
import math
print ((math.e**(math.pi*1j)).real + 1)
Python Videos
http://python.org/
- documentation, tutorials, beginners guide, core
distribution, ...
Books include:
• Learning Python by Mark Lutz
• Python Essential Reference by David Beazley
• Python Cookbook, ed. by Martelli, Ravenscroft and Ascher
P1 2018 python

  • 2.
  • 3.
  • 4.
  • 5.
  • 6.
    Overview What is Python? Why Python in Bioinformatics ? How to Python IDE: Eclipse & PyDev / Athena Code Sharing: Git(hub) Examples Hello World PIthon
  • 7.
    Overview What is Python? Why Python 4 Bioinformatics ? How to Python IDE: Eclipse & PyDev / Athena Code Sharing: Git(hub) Examples Hello World PIthon
  • 8.
    Programming Language • Formalnotation for specifying computations – Syntax (usually specified by a context-free grammar) – Semantics for each syntactic construct – Practical implementation on a real or virtual machine • Compilation vs. interpretation • Efficiency vs. portability • Assembly Languages – Invented by machine designers the early 1950s – Reusable macros and subroutines
  • 9.
    FORTRAN • Procedural, imperativelanguage – Still used in scientific computation • Developed at IBM in the 1950s by John Backus (1924-2007) – Backus’s 1977 Turing award lecture made the case for functional programming – On FORTRAN: “We did not know what we wanted and how to do it. It just sort of grew. The first struggle was over what the language would look like. Then how to parse expressions – it was a big problem…” • BNF: Backus-Naur form for defining context-free grammars
  • 10.
    LISP • Invented byJohn McCarthy (b. 1927, Turing award: 1971) • Formal notation for lambda-calculus • Pioneered many PL concepts – Automated memory management (garbage collection) – Dynamic typing – No distinction between code and data • Still in use: ACL2, Scheme, … “Anyone could learn Lisp in one day, except that if they already knew FORTRAN, it would take three days” - Marvin Minsky
  • 11.
    PASCAL • Designed byNiklaus Wirth – 1984 Turing Award • Revised type system of Algol – Good data structure concepts • Records, variants, subranges – More restrictive than Algol 60/68 • Procedure parameters cannot have procedure parameters • Popular teaching language • Simple one-pass compiler
  • 12.
    C • Bell Labs1972 (Dennis Ritchie) • Development closely related to UNIX – 1983 Turing Award to Thompson and Ritchie • Compiles to native code • 1973-1980: new features; compiler ported – unsigned, long, union, enums • 1978: K&R C book published • 1989: ANSI C standardization – Function prototypes as in C++ • 1999: ISO 9899:1999 also known as “C99” – Inline functions, C++-like decls, bools, variable arrays • Concurrent C, Objective C, C*, C++, C# • “Portable assembly language” – Early C++, Modula-3, Eiffel source-translated to C
  • 13.
    JAVA • Sun 1991-1995(James Gosling) – Originally called Oak, intended for set top boxes • Mixture of C and Modula-3 – Unlike C++ • No templates (generics), no multiple inheritance, no operator overloading – Like Modula-3 (developed at DEC SRC) • Explicit interfaces, single inheritance, exception handling, built-in threading model, references & automatic garbage collection (no explicit pointers!) • “Generics” added later
  • 14.
    Other Important Languages •Algol-like – Modula, Oberon, Ada • Functional – ISWIM, FP, SASL, Miranda, Haskell, LCF, ML, Caml, Ocaml, Scheme, Common LISP • Object-oriented – Smalltalk, Objective-C, Eiffel, Modula-3, Self, C#, CLOS • Logic programming – Prolog, Gödel, LDL, ACL2, Isabelle, HOL
  • 15.
    … and more •Data processing and databases – Cobol, SQL, 4GLs, XQuery • Systems programming – PL/I, PL/M, BLISS • Specialized applications – APL, Forth, Icon, Logo, SNOBOL4, GPSS, Visual Basic • Concurrent, parallel, distributed – Concurrent Pascal, Concurrent C, C*, SR, Occam, Erlang, Obliq
  • 16.
    … and more •Programming tool “mini-languages” – awk, make, lex, yacc, autoconf … • Command shells, scripting and “web” languages – sh, csh, tcsh, ksh, zsh, bash … – Perl, JavaScript, PHP, Python, Rexx, Ruby, Tcl, AppleScript, VBScript … • Web application frameworks and technologies – ASP.NET, AJAX, Flash, Silverlight … • Note: HTML/XML are markup languages, not programming languages, but they often embed executable scripts like Active Server Pages (ASPs) & Java Server Pages (JSPs)
  • 17.
    What is scripting? • Wikipedia has an informative and detailed explanation, “A scripting language, script language or extension language is a programming language that allows control of one or more software applications. "Scripts" are distinct from the core code of the application, as they are usually written in a different language and are often created or at least modified by the end-user.[1] Scripts are often interpreted from source code or bytecode, whereas the applications they control are traditionally compiled to native machine code. Scripting languages are nearly always embedded in the applications they control.[2] • The name "script" is derived from the written script of the performing arts, in which dialogue is set down to be spoken by human actors. Early script languages were often called batch languages or job control languages. Such early scripting languages were created to shorten the traditional edit-compile-link-run process”.
  • 18.
    What is Python? • Python is an interpreted, object-oriented, high-level programming language with dynamic semantics. • Its high-level built in data structures, combined with dynamic typing and dynamic binding, make it very attractive for Rapid Application Development, as well as for use as a scripting or glue language to connect existing components together. • Python supports modules and packages, which encourages program modularity and code reuse. The Python interpreter and the extensive standard library are available in source or binary form without charge for all major platforms, and can be freely distributed. • When he began implementing Python, Guido van Rossum was also reading the published scripts from “Monty Python's Flying Circus”, a BBC comedy series from the 1970s. Van Rossum thought he needed a name that was short, unique, and slightly mysterious, so he decided to call the language Python.
  • 19.
    What’s Driving TheirEvolution? • Constant search for better ways to build software tools for solving computational problems – Many PLs are general purpose tools – Others are targeted at specific kinds of problems • For example, massively parallel computations or graphics • Useful ideas evolve into language designs – Algol  Simula  Smalltalk  C with Classes  C++ • Often design is driven by expediency – Scripting languages: Perl, Tcl, Python, PHP, etc. • “PHP is a minor evil perpetrated by incompetent amateurs, whereas Perl is a great and insidious evil, perpetrated by skilled but perverted professionals.” - Jon Ribbens
  • 20.
    What Do TheyHave in Common? • Lexical structure and analysis – Tokens: keywords, operators, symbols, variables – Regular expressions and finite automata • Syntactic structure and analysis – Parsing, context-free grammars • Pragmatic issues – Scoping, block structure, local variables – Procedures, parameter passing, iteration, recursion – Type checking, data structures • Semantics – What do programs mean and are they correct
  • 21.
    Visual history ofprogramming languages http://cdn.oreillystatic.com/news/graphics/prog_lang_poster.pdf
  • 23.
    The most valuableprogramming skills to have on a resume
  • 24.
    The most valuableprogramming skills to have on a resume
  • 25.
    The 2015 TopTen Programming Languages http://spectrum.ieee.org/computing/software/the-2015-top-ten-programming-languages
  • 28.
    3/05/2016 Project BiologicalDatabases 2015-2016 Biological Databases Bruno Verstraeten, Arthur Zwaenepoel, Jules Haezebrouck, Laurenz De Cock, Jonathan Walgraeve, Cedric Bogaert, Dries Schaumont
  • 29.
    What is minecraft •Sandbox game • Designed by Markus “Notch” Persson • Mojang • Bought by Microsoft in 2014 • 70 million sold copies (june 2015)
  • 31.
  • 32.
    Third party mods •Extra content made by users • Adding items, magic and features to the original game • The true beauty of minecraft
  • 33.
    And now Sciencecraft •Visualizing proteins in minecraft • Minecraft Tools python package • Data directly from PDB flat files or from the PDB server • Spigot minecraft server
  • 34.
    The basics 1. Starta server with Minecraft Tools 2. Using python import the pdb file 3. Retrieve the coordinates from the file 4. Using the setBlock function blocks of specific colours are placed in the minecraft server to represent the protein 5. Fly around and take screenshots
  • 35.
    Minecraft programming fromPython # Connect to Minecraft from mcpi.minecraft import Minecraft mc = Minecraft.create() # Set x, y, and z variables to represent coordinates x = 10.0 y = 110.0 z = 12.0 # Change the player's position mc.player.setPos(x, y, z)
  • 36.
  • 37.
  • 38.
  • 39.
    Retrieving PDB datausing SPARQL • PDB available in RDF (wwPDB) • Using python SPARQLwrapper
  • 40.
    Using SPARQL withPython – SPARQLWrapper SPARQL endpoint
  • 41.
    Using SPARQL withPython – SPARQLWrapper “Search engine” • Naive regex based • Returns list of all pdb entries containing a certain keyword with organism name and full description • PDB entry can be retrieved with previous query
  • 43.
    Retrieve .xml.gz file: Actual structure information in xml file <?xml version="1.0" encoding="UTF-8" ?> <PDBx:datablock datablockName="1O9K" xmlns:PDBx="http://pdbml.pdb.org/schema/pdbx-v40.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://pdbml.pdb.org/schema/pdbx- v40.xsd pdbx-v40.xsd"> <PDBx:atom_siteCategory> <PDBx:atom_site id="1"> <PDBx:B_iso_or_equiv>62.42</PDBx:B_iso_or_equiv> <PDBx:Cartn_x>13.258</PDBx:Cartn_x> <PDBx:Cartn_y>142.706</PDBx:Cartn_y> <PDBx:Cartn_z>30.410</PDBx:Cartn_z> <PDBx:auth_asym_id>A</PDBx:auth_asym_id> <PDBx:auth_atom_id>N</PDBx:auth_atom_id> <PDBx:auth_comp_id>MET</PDBx:auth_comp_id> <PDBx:auth_seq_id>379</PDBx:auth_seq_id> <PDBx:group_PDB>ATOM</PDBx:group_PDB> <PDBx:label_alt_id xsi:nil="true" /> <PDBx:label_asym_id>A</PDBx:label_asym_id> <PDBx:label_atom_id>N</PDBx:label_atom_id> <PDBx:label_comp_id>MET</PDBx:label_comp_id> <PDBx:label_entity_id>1</PDBx:label_entity_id> <PDBx:label_seq_id>8</PDBx:label_seq_id> <PDBx:occupancy>1.00</PDBx:occupancy> …. Using SPARQL with Python – SPARQLWrapper
  • 44.
    New kid in the coding block …
  • 45.
    Overview What is Python? Why Python 4 Bioinformatics ? How to Python IDE: Eclipse & PyDev / Athena Code Sharing: Git(hub) Examples Hello World PIthon
  • 46.
    Python • Programming languages are overrated – If you are going into bioinformatics you will probably learn/need multiple – If you know one you know 90% of a second • Choice does matter but it matters far less than people think it does • Why Python? – Lets you write useful programs quickly – Built-in libraries, plus packages such as Biopython – Free, runs on most platforms, widely used in science • Versus Perl? – Incredibly similar – Consistent syntax, indentation
  • 47.
  • 48.
    Should I use Python 2 or Python 3 for my development activity? • Short version: Python 2.x is legacy, Python 3.x is the present and future of the language • Python 3.0 was released in 2008. The final 2.x version 2.7 release came out in mid-2010, with a statement of extended support for this end-of-life release. The 2.x branch will see no new major releases after that. 3.x is under active development and has already seen over five years of stable releases, including version 3.3 in 2012 and 3.4 in 2014. This means that all recent standard library improvements, for example, are only available by default in Python 3.x. • Guido van Rossum (the original creator of the Python language) decided to clean up Python 2.x properly, with less regard for backwards compatibility than is the case for new releases in the 2.x range. The most drastic improvement is the better Unicode support (with all text strings being Unicode by default) as well as saner bytes/Unicode separation. • Besides, several aspects of the core language (such as print and exec being statements, integers using floor division) have been adjusted to be easier for newcomers to learn and to be more consistent with the rest of the language, and old cruft has been removed (for example, all classes are now new-style, "range()" returns a memory efficient iterable, not a list as in 2.x). • The What's New in Python 3.0 document provides a good overview of the major language changes and likely sources of incompatibility with existing Python 2.x code. Nick Coghlan (one of the CPython core developers) has also created a relatively extensive FAQ regarding the transition. • However, the broader Python ecosystem has amassed a significant amount of quality software over the years. The downside of breaking backwards compatibility in 3.x is that some of that software (especially in-house software in companies) still doesn't work on 3.x yet.
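The language changes listed above can be seen in a few lines. This is a small illustrative sketch that assumes a Python 3 interpreter; the variable names are arbitrary.

```python
# print is a function in Python 3, not a statement.
print("Hello, world")

# Division of integers: "/" is true division, "//" is floor division.
true_division = 7 / 2
floor_division = 7 // 2

# range() returns a lazy, memory-efficient object rather than a list.
numbers = range(5)
as_list = list(numbers)

# Text strings are Unicode; bytes are a separate type.
text = "café"
data = text.encode("utf-8")

print(true_division, floor_division, as_list, len(text), len(data))
```

Under Python 2.7 the same `7 / 2` would give 3, which is one of the most common sources of silent differences when old scripts are moved to 3.x.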
  • 49.
    How to install? • On Windows you’ll need administrator rights • Portable Python distribution? Takes 500 MB and >2 hours
  • 50.
    Version 2.7 and 3.4 on http://athena.ugent.be
  • 52.
    Interactive “Shell” • Great for learning the language • Great for experimenting with the library • Great for testing your own modules • Two variations: IDLE (GUI), python (command line) • Type statements or expressions at prompt:
        >>> print "Hello, world"
        Hello, world
        >>> x = 12**2
        >>> x/2
        72
        >>> # this is a comment
  • 53.
    Overview What is Python? Why Python 4 Bioinformatics ? How to Python IDE: Eclipse & PyDev / Athena Code Sharing: Git(hub) Examples Hello World PIthon
  • 54.
    IDE: Integrated Development Environment • You can type scripts using Notepad(++) • Better: PyCharm – available for free on most OS but you need to be administrator to install • We will use Eclipse in combination with PyDev
  • 55.
    What is Eclipse? • Eclipse started as a proprietary IBM product (IBM VisualAge for Smalltalk/Java) – Embracing the open source model, IBM opened the product up • Open Source – It is a general purpose open platform that facilitates and encourages the development of third party plug-ins • Best known as an Integrated Development Environment (IDE) – Provides tools for coding, building, running and debugging applications • Originally designed for Java, now supports many other languages – Good support for C, C++ – Python, PHP, Ruby, etc…
  • 56.
    Prerequisites for Running Eclipse • Eclipse is written in Java and will thus need an installed JRE or JDK in which to execute – JDK recommended
  • 57.
    Selecting a Workspace • In Eclipse, all of your code will live under a workspace • A workspace is nothing more than a location where we will store our source code and where Eclipse will write out our preferences • Eclipse allows you to have multiple workspaces – each tailored in its own way • Choose a location where you want to store your files, then click OK
  • 58.
    Eclipse IDE Components • Menubars: full drop-down menus plus quick access to common functions • Editor Pane: this is where we edit our source code • Perspective Switcher: we can switch between various perspectives here • Outline Pane: this contains a hierarchical view of a source file • Package Explorer Pane: this is where our projects/files are listed • Miscellaneous Pane: various components can appear in this pane, typically a console and a list of compiler problems • Task List Pane: this contains a list of “tasks” to complete
  • 59.
    PyDev: Python plug-in for Eclipse • Syntax highlighting • Debugger • Code completion • An extensive preference menu that can be used to edit the plug-in’s attributes and options.
  • 60.
    Installation • The plug-in can be installed through Software Updates:
  • 61.
    Setting Up In Eclipse, go to: Window, Preferences, PyDev, Interpreter-Python, and click New. Select the python.exe file in the Python directory, click OK, and click OK again in the Preferences window. Wait for the configuration to finish.
  • 62.
    Create Python Project and File Go to File, New, Project, select PyDev, Python Project, click Next, write the name, choose the Python version, and click Finish. Then click on File, New, choose File, click on the Python project folder, write a file name ending in .py, and click Finish.
  • 63.
    Running Python To run Python code click on Run, Run As, and select Python Run.
  • 64.
    Let’s try “Hello World!” from athena.ugent.be
  • 65.
    Where is the workspace?
  • 66.
  • 68.
    Which Python interpreter is used … check Preferences or run version.py
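The contents of version.py are not shown in the slides. A script along the following lines would answer the question; it is an assumed reconstruction using only the standard `sys` and `platform` modules, and `interpreter_info` is a hypothetical helper name.

```python
import platform
import sys


def interpreter_info():
    """Collect the version and location of the running Python interpreter."""
    return {
        "version": platform.python_version(),
        "implementation": platform.python_implementation(),
        "executable": sys.executable,
    }


info = interpreter_info()
for key in sorted(info):
    print("%s: %s" % (key, info[key]))
```

The `executable` entry is the useful one when several interpreters are installed, since it shows which one Eclipse/PyDev actually launched.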
  • 69.
  • 70.
  • 71.
  • 72.
    Overview What is Python? Why Python 4 Bioinformatics ? How to Python IDE: Eclipse & PyDev / Athena Code Sharing: Git(hub) Examples Hello World PIthon
  • 73.
    Git is an open source, distributed version control system designed for speed and efficiency
  • 74.
    Git: A distributed version control system • Version control (or revision control, or source control) is all about managing multiple versions of documents, programs, web sites, etc. – Almost all “real” projects use some kind of version control – Essential for team projects, but also very useful for individual projects • Some well-known version control systems are CVS, Subversion, Mercurial, and Git – CVS and Subversion use a “central” repository; users “check out” files, work on them, and “check them in” – Mercurial and Git treat all repositories as equal • Distributed systems like Mercurial and Git are newer and are gradually replacing centralized systems like CVS and Subversion
  • 75.
    Why version control? • For working by yourself: – Gives you a “time machine” for going back to earlier versions – Gives you great support for different versions (standalone, web app, etc.) of the same basic project • For working with others: – Greatly simplifies concurrent work, merging changes • For getting an internship or job: – Any company with a clue uses some kind of version control – Companies without a clue are bad places to work
  • 76.
    Why Git? • Git has many advantages over earlier systems such as CVS and Subversion – More efficient, better workflow, etc. – See the literature for an extensive list of reasons – Of course, there are always those who disagree • It works from within Eclipse, also when started from Athena
  • 77.
    No network needed: (almost) everything is local • Performing a diff • Viewing file history • Committing changes • Merging branches • Obtaining any other revision of a file • Switching branches
  • 80.
    GitHub: Hosted Git • Largest open source git hosting site • Public and private options • User-centric rather than project-centric • http://github.ugent.be (use your UGent login and password) • URI: https://github.ugent.be/wvcrieki/Bioinformatics_2018.py.git
  • 81.
  • 82.
  • 83.
    Typical workflow Person A • Set up project & repo • push code onto GitHub • edit/commit • edit/commit • pull/push Person B • clone code from GitHub • edit/commit/push • edit… • edit… commit • pull/push This is just the flow, specific commands on following slides. It’s also possible to create your project first on GitHub, then clone (i.e., no git init)
  • 84.
  • 85.
  • 86.
  • 87.
  • 88.
    GitHub: Hosted Git URI (Uniform Resource Identifier): https://github.ugent.be/wvcrieki/Bioinformatics_2018.py.git
  • 89.
  • 90.
  • 91.
  • 92.
  • 93.
  • 94.
    Overview What is Python? Why Python 4 Bioinformatics ? How to Python IDE: Eclipse & PyDev / Athena Code Sharing: Git(hub) Examples Hello_World.py PI-thon.py
  • 95.
  • 96.
    Overview What is Python? Why Python 4 Bioinformatics ? How to Python IDE: Eclipse & PyDev / Athena Code Sharing: Git(hub) Examples Hello_World.py PI-thon.py
  • 97.
    Variables • No need to declare • Need to assign (initialize) • Use of an uninitialized variable raises an exception • Not typed:
        if friendly:
            greeting = "hello world"
        else:
            greeting = 12**2
        print greeting
    • Everything is a "variable": even functions, classes, modules
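The same points can be tried out in Python 3 syntax. In this short sketch `friendly`, `shout` and `undefined_name` are names chosen for illustration only.

```python
friendly = True

# The same name can be bound to a string or to an integer.
if friendly:
    greeting = "hello world"
else:
    greeting = 12 ** 2
print(greeting)

# Functions are objects too, so they can be assigned to a variable.
shout = str.upper
loud = shout(greeting)
print(loud)

# Using a name that was never assigned raises a NameError.
try:
    print(undefined_name)
except NameError as error:
    message = str(error)
    print("NameError:", message)
```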
  • 98.
    Numbers • The usual suspects • 12, 3.14, 0xFF, 0377, (-1+2)*3/4**5, abs(x), 0<x<=5 • C-style shifting & masking • 1<<16, x&0xff, x|1, ~x, x^y • Integer division truncates in Python 2 :-( • 1/2 -> 0 # 1./2. -> 0.5, float(1)/2 -> 0.5 • Fixed in Python 3: 1/2 -> 0.5, 1//2 -> 0 • Long (arbitrary precision), complex • 2L**100 -> 1267650600228229401496703205376L – In Python 2.2 and beyond, 2**100 does the same thing • 1j**2 -> (-1+0j)
  • 99.
    Control Structures
        if condition:
            statements
        [elif condition:
            statements] ...
        else:
            statements

        while condition:
            statements

        for var in sequence:
            statements

        break
        continue
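A small worked example may make these constructs concrete. The DNA string and the counting task are made up for illustration and are not taken from the slides.

```python
sequence = "ATGCNNGGCTA"

# for / if / elif / else / continue: count G+C and A+T, skipping unknown bases.
counts = {"GC": 0, "AT": 0}
for base in sequence:
    if base == "N":
        continue              # skip ambiguous bases
    elif base in "GC":
        counts["GC"] += 1
    else:
        counts["AT"] += 1

# while / break: find the index of the first ambiguous base.
position = 0
while position < len(sequence):
    if sequence[position] == "N":
        break                 # stop at the first N
    position += 1

print(counts, position)
```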
  • 100.
    Example Function
        def gcd(a, b):
            "greatest common divisor"
            while a != 0:
                a, b = b%a, a    # parallel assignment
            return b

        >>> gcd.__doc__
        'greatest common divisor'
        >>> gcd(12, 20)
        4
  • 101.
    How to generate random numbers The standard random module implements a random number generator.
        import random
        print(random.random())
    This prints a random floating point number in the range [0, 1) (that is, between 0 and 1, including 0.0 but always smaller than 1.0). There are also many other specialized generators in this module, such as: • randrange(a, b) chooses an integer in the range [a, b). • uniform(a, b) chooses a floating point number in the range [a, b). • normalvariate(mean, sdev) samples the normal (Gaussian) distribution. Some higher-level functions operate on sequences directly, such as: • choice(S) chooses a random element from a given sequence (the sequence must have a known length). • shuffle(L) shuffles a list in-place, i.e. permutes it randomly. There’s also a Random class you can instantiate to create multiple independent random number generators.
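The functions listed above can be tried together in one short script. The sketch uses a separate `random.Random` instance with a fixed seed (the seed value 42 is arbitrary) so that repeated runs give the same numbers.

```python
import random

# An independent generator with a fixed seed gives reproducible results.
rng = random.Random(42)

unit = rng.random()                    # float in [0, 1)
die = rng.randrange(1, 7)              # integer in [1, 7), i.e. 1..6
offset = rng.uniform(-1.0, 1.0)        # float between -1 and 1
noise = rng.normalvariate(0.0, 1.0)    # sample from a standard normal
base = rng.choice("ACGT")              # one element of a sequence

deck = list(range(10))
rng.shuffle(deck)                      # permutes the list in place

print(unit, die, offset, noise, base, deck)
```

Seeding is useful while debugging; leaving the seed out gives different numbers on every run.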
  • 102.
    First program: PI-thon.py • How good are the random numbers? • If they are good, you should be able to “measure” PI
  • 103.
    Measure Pi with two random numbers … many of them … (figure: unit square with axes x and y, side length 1)
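The actual PI-thon.py is not reproduced in this transcript, so the following is only one plausible sketch of the idea. It assumes the usual Monte Carlo approach: draw many (x, y) pairs in the unit square and count the fraction that falls inside the quarter circle of radius 1, which approaches pi/4. The function name `estimate_pi`, the sample size and the seed are illustrative choices.

```python
import math
import random


def estimate_pi(n_points, seed=None):
    """Estimate pi from the fraction of random points inside a quarter circle."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_points):
        x = rng.random()
        y = rng.random()
        if x * x + y * y <= 1.0:   # point lies within distance 1 of the origin
            inside += 1
    # Area of the quarter circle / area of the unit square = pi / 4.
    return 4.0 * inside / n_points


estimate = estimate_pi(100000, seed=1)
print("estimate:", estimate, "error:", abs(estimate - math.pi))
```

The error shrinks only with the square root of the number of points, so each extra correct digit costs about a hundred times more samples.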
  • 104.
    e (Euler’s number) • The value of e is also equal to 1/0! + 1/1! + 1/2! + 1/3! + 1/4! + 1/5! + 1/6! + 1/7! + ... (etc) • (Note: "!" means factorial) • The first few terms add up to: 1 + 1 + 1/2 + 1/6 + 1/24 + 1/120 + 1/720 = 2.718055556 • In fact Euler himself used this method to calculate e to 18 decimal places.
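The factorial series for e, the sum of 1/k! for k = 0, 1, 2, ..., is easy to check numerically. In this sketch `approximate_e` is a hypothetical helper that adds up the first `n_terms` terms, updating each term from the previous one instead of recomputing the factorial.

```python
import math


def approximate_e(n_terms):
    """Sum the first n_terms terms of the series 1/0! + 1/1! + 1/2! + ..."""
    total = 0.0
    term = 1.0                 # 1/0!
    for k in range(n_terms):
        total += term
        term /= (k + 1)        # turns 1/k! into 1/(k+1)!
    return total


for n in (7, 10, 20):
    value = approximate_e(n)
    print(n, value, abs(value - math.e))
```

Seven terms (up to 1/6!) give about 2.718055556, and twenty terms already agree with `math.e` to within floating-point precision.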
  • 105.
    Euler’s identity
        import math
        print((math.e**(math.pi*1j)).real + 1)
  • 106.
    Python Videos http://python.org/ - documentation, tutorials, beginners guide, core distribution, ... Books include: • Learning Python by Mark Lutz • Python Essential Reference by David Beazley • Python Cookbook, ed. by Martelli, Ravenscroft and Ascher