6. Overview
What is Python?
Why Python in Bioinformatics?
How to Python
IDE: Eclipse & PyDev / Athena
Code Sharing: Git(hub)
Examples
Hello World
PIthon
8. Programming Language
• Formal notation for specifying computations
– Syntax (usually specified by a context-free
grammar)
– Semantics for each syntactic construct
– Practical implementation on a real or virtual
machine
• Compilation vs. interpretation
• Efficiency vs. portability
• Assembly Languages
– Invented by machine designers in the early 1950s
– Reusable macros and subroutines
9. FORTRAN
• Procedural, imperative language
– Still used in scientific computation
• Developed at IBM in the 1950s by
John Backus (1924-2007)
– Backus’s 1977 Turing award lecture made the
case for functional programming
– On FORTRAN: “We did not know what we
wanted and how to do it. It just sort of grew. The
first struggle was over what the language would
look like. Then how to parse expressions – it was
a big problem…”
• BNF: Backus-Naur form for defining context-free
grammars
10. LISP
• Invented by John McCarthy (1927-2011, Turing Award:
1971)
• Formal notation for lambda-calculus
• Pioneered many PL concepts
– Automated memory management (garbage collection)
– Dynamic typing
– No distinction between code and data
• Still in use: ACL2, Scheme, …
“Anyone could learn Lisp in one day,
except that if they already knew FORTRAN, it would take
three days”
- Marvin Minsky
11. PASCAL
• Designed by Niklaus Wirth
– 1984 Turing Award
• Revised type system of Algol
– Good data structure concepts
• Records, variants, subranges
– More restrictive than Algol 60/68
• Procedure parameters cannot have procedure
parameters
• Popular teaching language
• Simple one-pass compiler
12. C
• Bell Labs 1972 (Dennis Ritchie)
• Development closely related to UNIX
– 1983 Turing Award to Thompson and Ritchie
• Compiles to native code
• 1973-1980: new features; compiler ported
– unsigned, long, union, enums
• 1978: K&R C book published
• 1989: ANSI C standardization
– Function prototypes as in C++
• 1999: ISO 9899:1999 also known as “C99”
– Inline functions, C++-like declarations, bools, variable-length arrays
• Concurrent C, Objective C, C*, C++, C#
• “Portable assembly language”
– Early C++, Modula-3, Eiffel source-translated to C
13. JAVA
• Sun 1991-1995 (James Gosling)
– Originally called Oak, intended for set top boxes
• Mixture of C and Modula-3
– Unlike C++
• No templates (generics), no multiple inheritance, no
operator overloading
– Like Modula-3 (developed at DEC SRC)
• Explicit interfaces, single inheritance, exception
handling, built-in threading model, references &
automatic garbage collection (no explicit pointers!)
• “Generics” added later
14. Other Important Languages
• Algol-like
– Modula, Oberon, Ada
• Functional
– ISWIM, FP, SASL, Miranda, Haskell, LCF,
ML, Caml, Ocaml, Scheme, Common LISP
• Object-oriented
– Smalltalk, Objective-C, Eiffel, Modula-3,
Self, C#, CLOS
• Logic programming
– Prolog, Gödel, LDL, ACL2, Isabelle, HOL
15. … and more
• Data processing and databases
– Cobol, SQL, 4GLs, XQuery
• Systems programming
– PL/I, PL/M, BLISS
• Specialized applications
– APL, Forth, Icon, Logo, SNOBOL4,
GPSS, Visual Basic
• Concurrent, parallel, distributed
– Concurrent Pascal, Concurrent C, C*,
SR, Occam, Erlang, Obliq
16. … and more
• Programming tool “mini-languages”
– awk, make, lex, yacc, autoconf …
• Command shells, scripting and “web”
languages
– sh, csh, tcsh, ksh, zsh, bash …
– Perl, JavaScript, PHP, Python, Rexx, Ruby, Tcl,
AppleScript, VBScript …
• Web application frameworks and technologies
– ASP.NET, AJAX, Flash, Silverlight …
• Note: HTML/XML are markup languages, not
programming languages, but they often embed
executable scripts like Active Server Pages (ASPs) &
Java Server Pages (JSPs)
17. What is scripting ?
• Wikipedia has an informative and detailed explanation, “A
scripting language, script language or extension language
is a programming language that allows control of one or more
software applications. "Scripts" are distinct from the core
code of the application, as they are usually written in a
different language and are often created or at least modified
by the end-user. Scripts are often interpreted from source
code or bytecode, whereas the applications they control are
traditionally compiled to native machine code. Scripting
languages are nearly always embedded in the applications
they control.
• The name "script" is derived from the written script of the
performing arts, in which dialogue is set down to be spoken
by human actors. Early script languages were often called
batch languages or job control languages. Such early
scripting languages were created to shorten the traditional
edit-compile-link-run process”.
18. What is Python ?
• Python is an interpreted, object-oriented, high-level
programming language with dynamic semantics.
• Its high-level built-in data structures, combined with dynamic
typing and dynamic binding, make it very attractive for Rapid
Application Development, as well as for use as a scripting or
glue language to connect existing components together.
• Python supports modules and packages, which encourages
program modularity and code reuse. The Python interpreter
and the extensive standard library are available in source or
binary form without charge for all major platforms, and can be
freely distributed.
• When he began implementing Python, Guido van Rossum
was also reading the published scripts from “Monty Python's
Flying Circus”, a BBC comedy series from the 1970s. Van
Rossum thought he needed a name that was short, unique,
and slightly mysterious, so he decided to call the language
Python.
19. What’s Driving Their Evolution?
• Constant search for better ways to build software
tools for solving computational problems
– Many PLs are general purpose tools
– Others are targeted at specific kinds of problems
• For example, massively parallel computations or
graphics
• Useful ideas evolve into language designs
– Algol → Simula → Smalltalk → C with Classes → C++
• Often design is driven by expediency
– Scripting languages: Perl, Tcl, Python, PHP, etc.
• “PHP is a minor evil perpetrated by incompetent
amateurs, whereas Perl is a great and insidious evil,
perpetrated by skilled but perverted professionals.” -
Jon Ribbens
20. What Do They Have in Common?
• Lexical structure and analysis
– Tokens: keywords, operators, symbols, variables
– Regular expressions and finite automata
• Syntactic structure and analysis
– Parsing, context-free grammars
• Pragmatic issues
– Scoping, block structure, local variables
– Procedures, parameter passing, iteration,
recursion
– Type checking, data structures
• Semantics
– What do programs mean, and are they correct?
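As a rough illustration of the lexical-analysis step, the sketch below splits a short piece of source text into tokens using regular expressions. The token classes and the `tokenize` helper are invented for this example; real language front ends use far more elaborate lexers.

```python
import re

# Hypothetical token classes for a toy language (illustration only).
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("KEYWORD", r"\b(?:if|else|while)\b"),
    ("NAME", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=<>()]"),
    ("SKIP", r"\s+"),
]
# One combined pattern with a named group per token class.
TOKEN_RE = re.compile("|".join("(?P<%s>%s)" % pair for pair in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (kind, value) pairs; raise ValueError on unknown input."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError("unexpected character %r at position %d" % (text[pos], pos))
        if match.lastgroup != "SKIP":  # whitespace separates tokens but is not one
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(tokenize("while x < 10"))
```

The order of the entries matters: keywords are tried before general names, so `while` is classified as a keyword while `iffy` still comes out as a name.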
21. Visual history of programming languages
http://cdn.oreillystatic.com/news/graphics/prog_lang_poster.pdf
32. Third party mods
• Extra content made by users
• Adding items, magic and features to
the original game
• The true beauty of Minecraft
33. And now Sciencecraft
• Visualizing proteins in Minecraft
• Minecraft Tools Python package
• Data directly from PDB flat files or
from the PDB server
• Spigot Minecraft server
34. The basics
1. Start a server with Minecraft Tools
2. Using Python, import the PDB file
3. Retrieve the atom coordinates from the file
4. Use the setBlock function to place blocks of
specific colours in the Minecraft world to
represent the protein
5. Fly around and take screenshots
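Steps 2 to 4 might look roughly like the sketch below. It is not the Sciencecraft code: `parse_atoms`, `place_atoms` and the colour table are hypothetical names made up here, and the block ids assume the classic numeric scheme in which wool is block 35 with a colour data value. The `mc` argument is assumed to be a connection object offering `setBlock(x, y, z, block_id, data)`, such as the one returned by `Minecraft.create()` in mcpi.

```python
def parse_atoms(pdb_lines):
    """Yield (element, x, y, z) for each ATOM/HETATM record of a PDB flat file."""
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            # The PDB format uses fixed columns: x, y, z sit in columns 31-54.
            x = float(line[30:38])
            y = float(line[38:46])
            z = float(line[46:54])
            # Element symbol in columns 77-78; fall back to the atom name's first letter.
            element = line[76:78].strip() or line[12:16].strip()[:1]
            yield element, x, y, z


# Hypothetical colour scheme: wool (id 35) with one data value per element.
WOOL = 35
ELEMENT_COLOUR = {"C": 15, "N": 11, "O": 14, "S": 4}  # black, blue, red, yellow


def place_atoms(mc, atoms, origin=(0, 64, 0)):
    """Place one block per atom, rounded to whole block positions; return the count."""
    ox, oy, oz = origin
    placed = 0
    for element, x, y, z in atoms:
        colour = ELEMENT_COLOUR.get(element, 0)  # white for any other element
        mc.setBlock(ox + int(round(x)), oy + int(round(y)), oz + int(round(z)), WOOL, colour)
        placed += 1
    return placed
```

With a running server, a call such as `place_atoms(mc, parse_atoms(open(path)))` would draw the structure, where `path` points at a downloaded PDB file. Rounding to whole blocks is crude, so scaling the coordinates first may give a more readable model.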
35. Minecraft programming from Python
# Connect to Minecraft
from mcpi.minecraft import Minecraft
mc = Minecraft.create()
# Set x, y, and z variables to represent coordinates
x = 10.0
y = 110.0
z = 12.0
# Change the player's position
mc.player.setPos(x, y, z)
41. Using SPARQL with Python – SPARQLWrapper
“Search engine”
• Naive, regex-based
• Returns a list of all PDB entries containing a
certain keyword, with organism name and
full description
• A PDB entry can be retrieved with the previous
query
43. Retrieve .xml.gz file:
Actual structure information in xml file
<?xml version="1.0" encoding="UTF-8" ?>
<PDBx:datablock datablockName="1O9K"
xmlns:PDBx="http://pdbml.pdb.org/schema/pdbx-v40.xsd"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://pdbml.pdb.org/schema/pdbx-v40.xsd pdbx-v40.xsd">
<PDBx:atom_siteCategory>
<PDBx:atom_site id="1">
<PDBx:B_iso_or_equiv>62.42</PDBx:B_iso_or_equiv>
<PDBx:Cartn_x>13.258</PDBx:Cartn_x>
<PDBx:Cartn_y>142.706</PDBx:Cartn_y>
<PDBx:Cartn_z>30.410</PDBx:Cartn_z>
<PDBx:auth_asym_id>A</PDBx:auth_asym_id>
<PDBx:auth_atom_id>N</PDBx:auth_atom_id>
<PDBx:auth_comp_id>MET</PDBx:auth_comp_id>
<PDBx:auth_seq_id>379</PDBx:auth_seq_id>
<PDBx:group_PDB>ATOM</PDBx:group_PDB>
<PDBx:label_alt_id xsi:nil="true" />
<PDBx:label_asym_id>A</PDBx:label_asym_id>
<PDBx:label_atom_id>N</PDBx:label_atom_id>
<PDBx:label_comp_id>MET</PDBx:label_comp_id>
<PDBx:label_entity_id>1</PDBx:label_entity_id>
<PDBx:label_seq_id>8</PDBx:label_seq_id>
<PDBx:occupancy>1.00</PDBx:occupancy>
….
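One possible way to pull the coordinates out of such a file uses the standard-library `gzip` and `xml.etree.ElementTree` modules. This is a sketch, not the code used in the course: `load_pdbml` and `atom_coordinates` are hypothetical helper names, and `SAMPLE` is a shortened, well-formed stand-in for the real file. The namespace URI is taken from the excerpt above.

```python
import gzip
import xml.etree.ElementTree as ET

# Namespace copied from the xmlns:PDBx attribute in the excerpt.
NS = {"PDBx": "http://pdbml.pdb.org/schema/pdbx-v40.xsd"}


def load_pdbml(path):
    """Parse a gzip-compressed PDBML file and return its root element."""
    with gzip.open(path) as handle:
        return ET.parse(handle).getroot()


def atom_coordinates(root):
    """Return a list of (atom id, atom name, x, y, z) tuples."""
    atoms = []
    for site in root.iterfind(".//PDBx:atom_site", NS):
        atoms.append((
            site.get("id"),
            site.findtext("PDBx:label_atom_id", namespaces=NS),
            float(site.findtext("PDBx:Cartn_x", namespaces=NS)),
            float(site.findtext("PDBx:Cartn_y", namespaces=NS)),
            float(site.findtext("PDBx:Cartn_z", namespaces=NS)),
        ))
    return atoms


# Shortened stand-in for a real PDBML file (illustration only).
SAMPLE = """<PDBx:datablock datablockName="1O9K"
    xmlns:PDBx="http://pdbml.pdb.org/schema/pdbx-v40.xsd">
  <PDBx:atom_siteCategory>
    <PDBx:atom_site id="1">
      <PDBx:Cartn_x>13.258</PDBx:Cartn_x>
      <PDBx:Cartn_y>142.706</PDBx:Cartn_y>
      <PDBx:Cartn_z>30.410</PDBx:Cartn_z>
      <PDBx:label_atom_id>N</PDBx:label_atom_id>
    </PDBx:atom_site>
  </PDBx:atom_siteCategory>
</PDBx:datablock>"""

print(atom_coordinates(ET.fromstring(SAMPLE)))
```

For a downloaded file, `atom_coordinates(load_pdbml(path))` would return the same kind of list, with `path` being wherever the `.xml.gz` file was saved.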
46. Python
• Programming languages are overrated
– If you are going into bioinformatics you will probably
need to learn several
– If you know one, you know 90% of a second
• The choice does matter, but it matters far less than people think it
does
• Why Python?
– Lets you write useful programs quickly
– Built-in libraries, plus third-party packages such as Biopython
– Free, runs on most platforms, widely used in science
• Versus Perl?
– Incredibly similar
– Python has a more consistent syntax and enforces indentation
48. Should I use Python 2 or Python 3 for my development activity?
• Short version: Python 2.x is legacy, Python 3.x is the present and future of the language
• Python 3.0 was released in 2008. The final 2.x version 2.7 release came out in mid-
2010, with a statement of extended support for this end-of-life release. The 2.x branch
will see no new major releases after that. 3.x is under active development and has
already seen over five years of stable releases, including version 3.3 in 2012 and 3.4 in
2014. This means that all recent standard library improvements, for example, are only
available by default in Python 3.x.
• Guido van Rossum (the original creator of the Python language) decided to clean up
Python 2.x properly, with less regard for backwards compatibility than is the case for
new releases in the 2.x range. The most drastic improvement is the better Unicode
support (with all text strings being Unicode by default) as well as saner bytes/Unicode
separation.
• Besides, several aspects of the core language (such as print and exec being statements,
integers using floor division) have been adjusted to be easier for newcomers to learn and
to be more consistent with the rest of the language, and old cruft has been removed (for
example, all classes are now new-style, "range()" returns a memory efficient iterable,
not a list as in 2.x).
• The What's New in Python 3.0 document provides a good overview of the major
language changes and likely sources of incompatibility with existing Python 2.x code.
Nick Coghlan (one of the CPython core developers) has also created a relatively
extensive FAQ regarding the transition.
• However, the broader Python ecosystem has amassed a significant amount of quality
software over the years. The downside of breaking backwards compatibility in 3.x is that
some of that software (especially in-house software in companies) still doesn't work on
3.x yet.
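A few of the changes listed above can be seen in a short Python 3 snippet. It is only a small illustrative selection, with variable names made up for this example.

```python
# Runs under Python 3; each line shows a behaviour that differs from 2.x.
print("Hello, world")            # print is a function, not a statement

true_div = 1 / 2                 # "/" is true division
floor_div = 1 // 2               # "//" is floor division

numbers = range(5)               # a lazy range object, not a list
as_list = list(numbers)          # materialise it when a list is needed

text = "Gödel"                   # str holds Unicode text
data = text.encode("utf-8")      # bytes are a separate type

print(true_div, floor_div, as_list, len(text), len(data))
```

Under Python 2 the `1 / 2` line would give 0, and `range(5)` would build a full list in memory.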
49. How to install?
• On Windows you’ll need administrator
rights
• Portable Python distribution?
Takes 500 MB and more than 2 hours
52. Interactive “Shell”
• Great for learning the language
• Great for experimenting with the library
• Great for testing your own modules
• Two variations: IDLE (GUI),
python (command line)
• Type statements or expressions at prompt:
>>> print("Hello, world")
Hello, world
>>> x = 12**2
>>> x // 2
72
>>> # this is a comment
54. IDE: Integrated Development Environment
• You can type scripts in a plain text editor
such as Notepad(++)
• Better: PyCharm
– available for free on most operating systems, but you
need to be administrator to install it
• We will use Eclipse in combination
with PyDev
55. What is Eclipse?
• Eclipse started as a proprietary IBM product
(IBM VisualAge for Smalltalk/Java)
– Embracing the open source model IBM opened the
product up
• Open Source
– It is a general purpose open platform that facilitates
and encourages the development of third party plug-
ins
• Best known as an Integrated Development
Environment (IDE)
– Provides tools for coding, building, running and
debugging applications
• Originally designed for Java, now supports
many other languages
– Good support for C, C++
– Python, PHP, Ruby, etc…
56. Prerequisites for Running Eclipse
• Eclipse is written in Java and will
thus need an installed JRE or JDK
in which to execute
– JDK recommended
57. Selecting a Workspace
• In Eclipse, all of your code will live under a
workspace
• A workspace is nothing more than a location
where we will store our source code and
where Eclipse will write out our preferences
• Eclipse allows you to have multiple
workspaces – each tailored in its own way
• Choose a location where you want to store
your files, then click OK
58. Eclipse IDE Components
• Menubars: full drop-down menus plus quick
access to common functions
• Editor Pane: this is where we edit our source code
• Perspective Switcher: we can switch between
various perspectives here
• Outline Pane: contains a hierarchical
view of a source file
• Package Explorer Pane: this is where our
projects/files are listed
• Miscellaneous Pane: various components can appear in this
pane; typically it contains a console
and a list of compiler problems
• Task List Pane: contains a list of
“tasks” to complete
59. PyDev: Python plug-in for Eclipse
• Syntax highlighting
• Debugger
• Code completion
• An extensive preference menu
that can be used to edit the
plug-in’s attributes and options.
61. Setting Up
• In Eclipse, go to Window, Preferences, PyDev,
Interpreter-Python, and click New.
• Select the python.exe file in the Python
directory, click OK, then click OK again in the
Preferences window.
• Wait for the setup procedure to finish.
62. Create Python Project and File
1. Create the project: go to File, New, Project, select
PyDev, Python Project, click Next,
write the name, choose the Python version,
and click Finish.
2. Create the file: click on File, New, choose File,
click on the Python project folder,
write a file name ending in
.py, and click Finish.
73. git is an open source,
distributed version control
system designed for speed
and efficiency
74. Git: A distributed version control system
• Version control (or revision control, or source
control) is all about managing multiple versions of
documents, programs, web sites, etc.
– Almost all “real” projects use some kind of version
control
– Essential for team projects, but also very useful for
individual projects
• Some well-known version control systems are
CVS, Subversion, Mercurial, and Git
– CVS and Subversion use a “central” repository; users
“check out” files, work on them, and “check them in”
– Mercurial and Git treat all repositories as equal
• Distributed systems like Mercurial and Git are
newer and are gradually replacing centralized
systems like CVS and Subversion
75. Why version control?
• For working by yourself:
– Gives you a “time machine” for going back to
earlier versions
– Gives you great support for different versions
(standalone, web app, etc.) of the same basic
project
• For working with others:
– Greatly simplifies concurrent work, merging
changes
• For getting an internship or job:
– Any company with a clue uses some kind of
version control
– Companies without a clue are bad places to work
76. Why Git?
• Git has many advantages over earlier
systems such as CVS and Subversion
– More efficient, better workflow, etc.
– See the literature for an extensive list of reasons
– Of course, there are always those who disagree
• It works from within Eclipse, also when
started from Athena
77. No network needed:
(almost) everything is local
• Performing a diff
• Viewing file history
• Committing changes
• Merging branches
• Obtaining any other
revision of a file
• Switching branches
80. GitHub: Hosted GIT
• Largest open source git hosting site
• Public and private options
• User-centric rather than project-centric
• http://github.ugent.be (use your UGent
login and password)
• URI:
– https://github.ugent.be/wvcrieki/Bioinformatics_2017.py.git
84. Typical workflow
Person A
• set up project & repo
• push code onto GitHub
• edit/commit
• edit/commit
• pull/push
Person B
• clone code from GitHub
• edit/commit/push
• edit…
• edit… commit
• pull/push
This is just the flow; the specific commands are on the following slides.
It’s also possible to create your project on GitHub first, then clone it (i.e., no git init)
97. Overview
What is Python ?
Why Python 4 Bioinformatics ?
How to Python
IDE: Eclipse & PyDev / Athena
Code Sharing: Git(hub)
Examples
Hello_World.py
PI-thon.py
98. Variables
• No need to declare
• Need to assign (initialize)
• Use of an uninitialized variable raises an exception
• Not typed (a name can be rebound to a value of any type)
friendly = True
if friendly:
    greeting = "hello world"
else:
    greeting = 12**2
print(greeting)
• Everything is a "variable":
• Even functions, classes, modules
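The points above can be tried out directly. The snippet below is a small sketch with made-up names (`undefined_name`, `square`, `alias`), shown here only to make the behaviour concrete.

```python
# Using a name before assigning it raises NameError.
try:
    print(undefined_name)
except NameError as error:
    message = str(error)

# The same name can be rebound to values of different types.
value = "hello world"
value = 12 ** 2

# Functions are objects too, so they can be bound to other names.
def square(x):
    return x * x

alias = square
print(message)
print(value, alias(4))
```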
99. Numbers
• The usual suspects
• 12, 3.14, 0xFF, 0o377 (0377 in Python 2), (-1+2)*3/4**5, abs(x), 0<x<=5
• C-style shifting & masking
• 1<<16, x&0xff, x|1, ~x, x^y
• Integer division truncates in Python 2 :-(
• 1/2 -> 0 # 1./2. -> 0.5, float(1)/2 -> 0.5
• Fixed in Python 3: 1/2 -> 0.5, and 1//2 -> 0
• Long (arbitrary precision), complex
• 2L**100 -> 1267650600228229401496703205376L (Python 2)
– In Python 2.2 and beyond, 2**100 does the same thing; Python 3 has no L suffix
• 1j**2 -> (-1+0j)
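For readers on Python 3, the same ideas can be written as follows. This is a sketch in Python 3 syntax with variable names chosen for this example.

```python
hex_value = 0xFF                 # hexadecimal literal
octal_value = 0o377              # octal literal (written 0377 in Python 2)
shifted = 1 << 16                # left shift
masked = 0x1234 & 0xFF           # keep only the lowest byte
negative_floor = -7 // 2         # floor division rounds toward negative infinity
big = 2 ** 100                   # integers have arbitrary precision, no L suffix
imaginary_squared = 1j ** 2      # complex numbers are built in

print(hex_value, octal_value, shifted, masked)
print(negative_floor, big, imaginary_squared)
```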
101. Example Function
def gcd(a, b):
    "greatest common divisor"
    while a != 0:
        a, b = b%a, a # parallel assignment
    return b
>>> gcd.__doc__
'greatest common divisor'
>>> gcd(12, 20)
4
102. How to generate random numbers
The standard random module implements a random number
generator.
import random
print(random.random())
This prints a random floating point number in the range [0, 1) (that is,
between 0 and 1, including 0.0 but always smaller than 1.0).
There are also many other specialized generators in this module,
such as:
• randrange(a, b) chooses an integer in the range [a, b).
• uniform(a, b) chooses a floating point number in the range [a, b).
• normalvariate(mean, sdev) samples the normal (Gaussian)
distribution.
Some higher-level functions operate on sequences directly, such as:
• choice(S) chooses a random element from a given sequence (the
sequence must have a known length).
• shuffle(L) shuffles a list in-place, i.e. permutes it randomly.
There’s also a Random class you can instantiate to create
multiple independent random number generators.
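The functions mentioned above can be exercised in a few lines. The sketch below uses a seeded `random.Random` instance so that runs are repeatable. The seed value and the DNA-flavoured sample data are arbitrary choices for this illustration.

```python
import random

rng = random.Random(42)              # independent generator with its own seed
other = random.Random(42)            # same seed, so it produces the same sequence

value = rng.random()                 # float in [0.0, 1.0)
die = rng.randrange(1, 7)            # integer in [1, 7), i.e. 1 to 6
height = rng.normalvariate(170, 10)  # sample with mean 170 and standard deviation 10
base = rng.choice("ACGT")            # works on any sequence, including strings

codons = ["ATG", "TGA", "GGC", "CCA"]
rng.shuffle(codons)                  # reorders the list in place

print(value, die, height, base, codons)
```

Seeding is useful when a result needs to be reproduced, for example when debugging a simulation. Without a seed, each run draws different numbers.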
103. First program: PI-thon.py
• How good are the random numbers?
• If they are good, you should be able to
“measure” PI
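The original PI-thon.py is not reproduced here, so the following is only one plausible way to do it: a Monte Carlo estimate. Random points are drawn in the unit square, and the fraction that falls inside the quarter circle of radius 1 approaches pi/4. The function name `estimate_pi` and the sample sizes are choices made for this sketch.

```python
import math
import random


def estimate_pi(samples, rng=random):
    """Estimate pi from the fraction of random points inside the unit quarter circle."""
    inside = 0
    for _ in range(samples):
        x = rng.random()
        y = rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / samples


if __name__ == "__main__":
    for n in (1000, 100000):
        estimate = estimate_pi(n)
        print(n, estimate, abs(estimate - math.pi))
```

The error shrinks roughly with the square root of the number of samples, so each extra digit of precision costs about a hundred times more points. A generator with poor randomness would tend to converge to a visibly wrong value.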