
P1 2017 python



  1. 1. FBW 26-09-2017 Wim Van Criekinge
  3. 3. Google Calendar
  4. 4. Overview What is Python ? Why Python in Bioinformatics ? How to Python IDE: Eclipse & PyDev / Athena Code Sharing: Git(hub) Examples Hello World PIthon
  5. 5. Overview What is Python ? Why Python 4 Bioinformatics ? How to Python IDE: Eclipse & PyDev / Athena Code Sharing: Git(hub) Examples Hello World PIthon
  6. 6. Programming Language • Formal notation for specifying computations – Syntax (usually specified by a context-free grammar) – Semantics for each syntactic construct – Practical implementation on a real or virtual machine • Compilation vs. interpretation • Efficiency vs. portability • Assembly Languages – Invented by machine designers in the early 1950s – Reusable macros and subroutines
  7. 7. FORTRAN • Procedural, imperative language – Still used in scientific computation • Developed at IBM in the 1950s by John Backus (1924-2007) – Backus’s 1977 Turing award lecture made the case for functional programming – On FORTRAN: “We did not know what we wanted and how to do it. It just sort of grew. The first struggle was over what the language would look like. Then how to parse expressions – it was a big problem…” • BNF: Backus-Naur form for defining context-free grammars
  8. 8. LISP • Invented by John McCarthy (b. 1927, Turing award: 1971) • Formal notation for lambda-calculus • Pioneered many PL concepts – Automated memory management (garbage collection) – Dynamic typing – No distinction between code and data • Still in use: ACL2, Scheme, … “Anyone could learn Lisp in one day, except that if they already knew FORTRAN, it would take three days” - Marvin Minsky
  9. 9. PASCAL • Designed by Niklaus Wirth – 1984 Turing Award • Revised type system of Algol – Good data structure concepts • Records, variants, subranges – More restrictive than Algol 60/68 • Procedure parameters cannot have procedure parameters • Popular teaching language • Simple one-pass compiler
  10. 10. C • Bell Labs 1972 (Dennis Ritchie) • Development closely related to UNIX – 1983 Turing Award to Thompson and Ritchie • Compiles to native code • 1973-1980: new features; compiler ported – unsigned, long, union, enums • 1978: K&R C book published • 1989: ANSI C standardization – Function prototypes as in C++ • 1999: ISO 9899:1999 also known as “C99” – Inline functions, C++-like decls, bools, variable arrays • Concurrent C, Objective C, C*, C++, C# • “Portable assembly language” – Early C++, Modula-3, Eiffel source-translated to C
  11. 11. JAVA • Sun 1991-1995 (James Gosling) – Originally called Oak, intended for set top boxes • Mixture of C and Modula-3 – Unlike C++ • No templates (generics), no multiple inheritance, no operator overloading – Like Modula-3 (developed at DEC SRC) • Explicit interfaces, single inheritance, exception handling, built-in threading model, references & automatic garbage collection (no explicit pointers!) • “Generics” added later
  12. 12. Other Important Languages • Algol-like – Modula, Oberon, Ada • Functional – ISWIM, FP, SASL, Miranda, Haskell, LCF, ML, Caml, Ocaml, Scheme, Common LISP • Object-oriented – Smalltalk, Objective-C, Eiffel, Modula-3, Self, C#, CLOS • Logic programming – Prolog, Gödel, LDL, ACL2, Isabelle, HOL
  13. 13. … and more • Data processing and databases – Cobol, SQL, 4GLs, XQuery • Systems programming – PL/I, PL/M, BLISS • Specialized applications – APL, Forth, Icon, Logo, SNOBOL4, GPSS, Visual Basic • Concurrent, parallel, distributed – Concurrent Pascal, Concurrent C, C*, SR, Occam, Erlang, Obliq
  14. 14. … and more • Programming tool “mini-languages” – awk, make, lex, yacc, autoconf … • Command shells, scripting and “web” languages – sh, csh, tcsh, ksh, zsh, bash … – Perl, JavaScript, PHP, Python, Rexx, Ruby, Tcl, AppleScript, VBScript … • Web application frameworks and technologies – ASP.NET, AJAX, Flash, Silverlight … • Note: HTML/XML are markup languages, not programming languages, but they often embed executable scripts like Active Server Pages (ASPs) & Java Server Pages (JSPs)
  15. 15. What is scripting ? • Wikipedia has an informative and detailed explanation, “A scripting language, script language or extension language is a programming language that allows control of one or more software applications. "Scripts" are distinct from the core code of the application, as they are usually written in a different language and are often created or at least modified by the end-user.[1] Scripts are often interpreted from source code or bytecode, whereas the applications they control are traditionally compiled to native machine code. Scripting languages are nearly always embedded in the applications they control.[2] • The name "script" is derived from the written script of the performing arts, in which dialogue is set down to be spoken by human actors. Early script languages were often called batch languages or job control languages. Such early scripting languages were created to shorten the traditional edit-compile-link-run process”.
  16. 16. What is Python ? • Python is an interpreted, object-oriented, high-level programming language with dynamic semantics. • Its high-level built in data structures, combined with dynamic typing and dynamic binding, make it very attractive for Rapid Application Development, as well as for use as a scripting or glue language to connect existing components together. • Python supports modules and packages, which encourages program modularity and code reuse. The Python interpreter and the extensive standard library are available in source or binary form without charge for all major platforms, and can be freely distributed. • When he began implementing Python, Guido van Rossum was also reading the published scripts from “Monty Python's Flying Circus”, a BBC comedy series from the 1970s. Van Rossum thought he needed a name that was short, unique, and slightly mysterious, so he decided to call the language Python.
  17. 17. What’s Driving Their Evolution? • Constant search for better ways to build software tools for solving computational problems – Many PLs are general purpose tools – Others are targeted at specific kinds of problems • For example, massively parallel computations or graphics • Useful ideas evolve into language designs – Algol → Simula → Smalltalk → C with Classes → C++ • Often design is driven by expediency – Scripting languages: Perl, Tcl, Python, PHP, etc. • “PHP is a minor evil perpetrated by incompetent amateurs, whereas Perl is a great and insidious evil, perpetrated by skilled but perverted professionals.” - Jon Ribbens
  18. 18. What Do They Have in Common? • Lexical structure and analysis – Tokens: keywords, operators, symbols, variables – Regular expressions and finite automata • Syntactic structure and analysis – Parsing, context-free grammars • Pragmatic issues – Scoping, block structure, local variables – Procedures, parameter passing, iteration, recursion – Type checking, data structures • Semantics – What do programs mean and are they correct
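The link between tokens and regular expressions can be made concrete with a very small lexer. This is only a sketch for a toy expression language: the token names, the keyword set and the `tokenize` function are illustrative assumptions, not part of the slides.

```python
import re

# Each token kind is described by one regular expression (assumed toy language).
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("NAME", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=()]"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile("|".join("(?P<%s>%s)" % pair for pair in TOKEN_SPEC))
KEYWORDS = {"if", "else", "while", "for"}


def tokenize(text):
    """Split text into (kind, value) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError("unexpected character %r at position %d" % (text[pos], pos))
        kind = match.lastgroup
        value = match.group()
        if kind == "NAME" and value in KEYWORDS:
            kind = "KEYWORD"          # keywords are names with a reserved meaning
        if kind != "SKIP":
            tokens.append((kind, value))
        pos = match.end()
    return tokens
```

A parser would then consume this token list according to a context-free grammar; that step is left out here.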
  19. 19. Visual history of programming languages
  20. 20. The most valuable programming skills to have on a resume
  22. 22. The 2015 Top Ten Programming Languages
  23. 23. 3/05/2016 Project Biological Databases 2015-2016 Biological Databases Bruno Verstraeten, Arthur Zwaenepoel, Jules Haezebrouck, Laurenz De Cock, Jonathan Walgraeve, Cedric Bogaert, Dries Schaumont
  24. 24. What is Minecraft • Sandbox game • Designed by Markus “Notch” Persson • Mojang • Bought by Microsoft in 2014 • 70 million copies sold (June 2015)
  25. 25. Minecraft programming from Python
  26. 26. Third party mods • Extra content made by users • Adding items, magic and features to the original game • The true beauty of Minecraft
  27. 27. And now Sciencecraft • Visualizing proteins in Minecraft • Minecraft Tools Python package • Data directly from PDB flat files or from the PDB server • Spigot Minecraft server
  28. 28. The basics 1. Start a server with Minecraft Tools 2. Using Python, import the PDB file 3. Retrieve the coordinates from the file 4. Using the setBlock function, blocks of specific colours are placed in the Minecraft server to represent the protein 5. Fly around and take screenshots
  29. 29. Minecraft programming from Python
      # Connect to Minecraft
      from mcpi.minecraft import Minecraft
      mc = Minecraft.create()
      # Set x, y, and z variables to represent coordinates
      x = 10.0
      y = 110.0
      z = 12.0
      # Change the player's position
      mc.player.setPos(x, y, z)
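The steps listed under "The basics" could be sketched roughly as follows. This is not the project's actual code: `parse_atom_coordinates`, `to_blocks`, `place_atoms`, the scale and origin, the wool block ID and the colour table are all illustrative assumptions. Only `setBlock` comes from the `mcpi` package, and the column slices follow the fixed-width layout of PDB `ATOM` records.

```python
# Hypothetical sketch: turn PDB ATOM records into Minecraft block positions.

WOOL = 35  # assumed block ID for wool; the data value selects the colour
ELEMENT_COLOURS = {"C": 15, "N": 11, "O": 14, "S": 4}  # assumed colour codes
DEFAULT_COLOUR = 0


def parse_atom_coordinates(pdb_lines):
    """Return (element, x, y, z) tuples from ATOM/HETATM records."""
    atoms = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            x = float(line[30:38])   # columns 31-38
            y = float(line[38:46])   # columns 39-46
            z = float(line[46:54])   # columns 47-54
            element = line[76:78].strip().upper()  # columns 77-78
            atoms.append((element, x, y, z))
    return atoms


def to_blocks(atoms, scale=1.0, origin=(0, 100, 0)):
    """Round scaled coordinates to integer block positions relative to origin."""
    ox, oy, oz = origin
    return [
        (ox + round(x * scale), oy + round(y * scale), oz + round(z * scale),
         ELEMENT_COLOURS.get(element, DEFAULT_COLOUR))
        for element, x, y, z in atoms
    ]


def place_atoms(mc, blocks):
    """Place one coloured wool block per atom through an mcpi connection."""
    for x, y, z, colour in blocks:
        mc.setBlock(x, y, z, WOOL, colour)
```

With a running server, `place_atoms(Minecraft.create(), to_blocks(parse_atom_coordinates(lines)))` would draw the structure, where `lines` stands for the lines of a PDB flat file.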
  30. 30. Verotoxin
  31. 31. Apolipoprotein A1
  32. 32. Kinesin
  33. 33. Retrieving PDB data using SPARQL • PDB available in RDF (wwPDB) • Using Python SPARQLWrapper
  34. 34. Using SPARQL with Python – SPARQLWrapper SPARQL endpoint
  35. 35. Using SPARQL with Python – SPARQLWrapper “Search engine” • Naive regex based • Returns list of all pdb entries containing a certain keyword with organism name and full description • PDB entry can be retrieved with previous query
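A naive regex-based keyword search of this kind might look like the sketch below. The query shape, the function names and the `endpoint_url` parameter are assumptions made for illustration; the slides do not show the actual query or the RDF vocabulary, so the pattern simply matches any literal. The `SPARQLWrapper` calls used (`setQuery`, `setReturnFormat`, `query().convert()`) are part of that third-party package.

```python
def build_keyword_query(keyword, limit=20):
    """Build a naive regex-based SPARQL query for a keyword search."""
    # Escape backslashes and double quotes so the keyword stays inside the literal.
    safe = keyword.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "SELECT DISTINCT ?entry ?text WHERE {\n"
        "  ?entry ?property ?text .\n"
        '  FILTER regex(str(?text), "%s", "i")\n'
        "}\n"
        "LIMIT %d" % (safe, int(limit))
    )


def run_query(endpoint_url, query):
    """Send the query to a SPARQL endpoint and return the result bindings."""
    from SPARQLWrapper import SPARQLWrapper, JSON  # third-party package

    sparql = SPARQLWrapper(endpoint_url)
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()
    return results["results"]["bindings"]
```

Because the keyword is passed to `regex`, characters such as `.` or `*` keep their regular-expression meaning, and a filter over every literal is slow on a large store; a real query would restrict `?property` to the description and organism fields.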
  36. 36. Using SPARQL with Python – SPARQLWrapper • Retrieve .xml.gz file: → Actual structure information in XML file
      <?xml version="1.0" encoding="UTF-8" ?>
      <PDBx:datablock datablockName="1O9K" xmlns:PDBx="" xmlns:xsi="" xsi:schemaLocation=" v40.xsd pdbx-v40.xsd">
        <PDBx:atom_siteCategory>
          <PDBx:atom_site id="1">
            <PDBx:B_iso_or_equiv>62.42</PDBx:B_iso_or_equiv>
            <PDBx:Cartn_x>13.258</PDBx:Cartn_x>
            <PDBx:Cartn_y>142.706</PDBx:Cartn_y>
            <PDBx:Cartn_z>30.410</PDBx:Cartn_z>
            <PDBx:auth_asym_id>A</PDBx:auth_asym_id>
            <PDBx:auth_atom_id>N</PDBx:auth_atom_id>
            <PDBx:auth_comp_id>MET</PDBx:auth_comp_id>
            <PDBx:auth_seq_id>379</PDBx:auth_seq_id>
            <PDBx:group_PDB>ATOM</PDBx:group_PDB>
            <PDBx:label_alt_id xsi:nil="true" />
            <PDBx:label_asym_id>A</PDBx:label_asym_id>
            <PDBx:label_atom_id>N</PDBx:label_atom_id>
            <PDBx:label_comp_id>MET</PDBx:label_comp_id>
            <PDBx:label_entity_id>1</PDBx:label_entity_id>
            <PDBx:label_seq_id>8</PDBx:label_seq_id>
            <PDBx:occupancy>1.00</PDBx:occupancy>
            ….
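Coordinates could be pulled out of such a file with the standard library alone. The sketch below assumes the element names shown on the slide (`atom_site`, `Cartn_x`, `label_atom_id`, ...) and ignores the namespace, since its URI is not visible in the transcript; `local_name` and `atom_coordinates` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def atom_coordinates(xml_text):
    """Return (atom_name, residue, x, y, z) for every atom_site element."""
    root = ET.fromstring(xml_text)
    atoms = []
    for element in root.iter():
        if local_name(element.tag) != "atom_site":
            continue
        # Map each child's local tag name to its text content.
        fields = {local_name(child.tag): child.text for child in element}
        atoms.append((
            fields.get("label_atom_id"),
            fields.get("label_comp_id"),
            float(fields["Cartn_x"]),
            float(fields["Cartn_y"]),
            float(fields["Cartn_z"]),
        ))
    return atoms
```

A downloaded `.xml.gz` file can be read with `gzip.open(path, "rt")` before passing its text to `atom_coordinates`; for very large entries, `ET.iterparse` would avoid loading the whole tree.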
  37. 37. New kid in the coding block …
  38. 38. Overview What is Python ? Why Python 4 Bioinformatics ? How to Python IDE: Eclipse & PyDev / Athena Code Sharing: Git(hub) Examples Hello World PIthon
  39. 39. Python • Programming languages are overrated – If you are going into bioinformatics you will probably learn/need several – If you know one you know 90% of a second • Choice does matter but it matters far less than people think it does • Why Python? – Lets you write useful programs quickly – Built-in libraries, plus third-party packages incl. Biopython – Free, runs on most platforms, widely used in science • Versus Perl? – Incredibly similar – Consistent syntax, indentation
  41. 41. Should I use Python 2 or Python 3 for my development activity? • Short version: Python 2.x is legacy, Python 3.x is the present and future of the language • Python 3.0 was released in 2008. The final 2.x version 2.7 release came out in mid-2010, with a statement of extended support for this end-of-life release. The 2.x branch will see no new major releases after that. 3.x is under active development and has already seen over five years of stable releases, including version 3.3 in 2012 and 3.4 in 2014. This means that all recent standard library improvements, for example, are only available by default in Python 3.x. • Guido van Rossum (the original creator of the Python language) decided to clean up Python 2.x properly, with less regard for backwards compatibility than is the case for new releases in the 2.x range. The most drastic improvement is the better Unicode support (with all text strings being Unicode by default) as well as saner bytes/Unicode separation. • Besides, several aspects of the core language (such as print and exec being statements, integers using floor division) have been adjusted to be easier for newcomers to learn and to be more consistent with the rest of the language, and old cruft has been removed (for example, all classes are now new-style, "range()" returns a memory efficient iterable, not a list as in 2.x). • The What's New in Python 3.0 document provides a good overview of the major language changes and likely sources of incompatibility with existing Python 2.x code. Nick Coghlan (one of the CPython core developers) has also created a relatively extensive FAQ regarding the transition. • However, the broader Python ecosystem has amassed a significant amount of quality software over the years. The downside of breaking backwards compatibility in 3.x is that some of that software (especially in-house software in companies) still doesn't work on 3.x.
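The language changes named on this slide are easy to see in a few lines. The snippet is a small illustration written for Python 3; the variable names are arbitrary.

```python
# Python 3 behaviour for the changes listed on the slide.

print("Hello, world")          # print is a function, not a statement

true_quotient = 7 / 2          # true division gives a float
floor_quotient = 7 // 2        # floor division must be requested explicitly

numbers = range(3)             # a lazy range object, not a list
as_list = list(numbers)        # materialise it only when a list is needed

text = "Gödel"                 # str is Unicode text
data = text.encode("utf-8")    # bytes are a separate type
```

Under Python 2.7 the first line still runs, but `7 / 2` gives `3` and `range(3)` builds a list.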
  42. 42. How to install ? • On Windows you’ll need administrator rights • Portable Python distribution ? Takes 500 MB and >2 hours
  43. 43. Version 2.7 and 3.4 on
  44. 44. Interactive “Shell” • Great for learning the language • Great for experimenting with the library • Great for testing your own modules • Two variations: IDLE (GUI), python (command line) • Type statements or expressions at prompt:
      >>> print "Hello, world"
      Hello, world
      >>> x = 12**2
      >>> x/2
      72
      >>> # this is a comment
  45. 45. Overview What is Python ? Why Python 4 Bioinformatics ? How to Python IDE: Eclipse & PyDev / Athena Code Sharing: Git(hub) Examples Hello World PIthon
  46. 46. IDE: Integrated Development Environment • You can type scripts using Notepad(++) • Better: PyCharm – available for free on most OS but you need to be administrator to install • We will use Eclipse in combination with PyDev
  47. 47. What is Eclipse? • Eclipse started as a proprietary IBM product (IBM VisualAge for Smalltalk/Java) – Embracing the open source model IBM opened the product up • Open Source – It is a general purpose open platform that facilitates and encourages the development of third party plug-ins • Best known as an Integrated Development Environment (IDE) – Provides tools for coding, building, running and debugging applications • Originally designed for Java, now supports many other languages – Good support for C, C++ – Python, PHP, Ruby, etc…
  48. 48. Prerequisites for Running Eclipse • Eclipse is written in Java and will thus need an installed JRE or JDK in which to execute – JDK recommended
  49. 49. Selecting a Workspace • In Eclipse, all of your code will live under a workspace • A workspace is nothing more than a location where we will store our source code and where Eclipse will write out our preferences • Eclipse allows you to have multiple workspaces – each tailored in its own way • Choose a location where you want to store your files, then click OK
  50. 50. Eclipse IDE Components • Menubars: Full drop-down menus plus quick access to common functions • Editor Pane: This is where we edit our source code • Perspective Switcher: We can switch between various perspectives here • Outline Pane: This contains a hierarchical view of a source file • Package Explorer Pane: This is where our projects/files are listed • Miscellaneous Pane: Various components can appear in this pane – typically this contains a console and a list of compiler problems • Task List Pane: This contains a list of “tasks” to complete
  51. 51. PyDev: Python plug-in for Eclipse • Syntax highlighting • Debugger • Code completion • An extensive preference menu that can be used to edit the plug-in’s attributes and options.
  52. 52. Installation • The plug-in can be installed through Software Updates:
  53. 53. Setting Up In Eclipse, go to: Window, Preferences, PyDev, Interpreter-Python, and click New. Select the python.exe file in the Python directory, click OK and OK in the Preferences window again. Wait for the creating procedure to finish.
  54. 54. Create Python Project and File First create the project: go to File, New, Project, select PyDev, Python Project, click Next, write the project name, choose the Python version, and click Finish. Then create a file: click on File, New, choose File, click on the Python project folder, write a file name ending in .py, and click Finish.
  55. 55. Running Python To run Python code click on Run, Run As, and select Python Run.
  56. 56. Let's try for “Hello World!” from
  57. 57. Where is the workspace ?
  58. 58. Make PyDev Project
  59. 59. Which Python interpreter is used … check Preferences or run
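One way to check this from inside a script is to ask the `sys` module. The slide does not show which command it had in mind, so this is only one possible check; the variable names are arbitrary.

```python
import sys

interpreter_path = sys.executable      # full path of the running interpreter
version = sys.version_info             # named fields: major, minor, micro, ...

print("Interpreter:", interpreter_path)
print("Version: %d.%d.%d" % (version.major, version.minor, version.micro))
```

Running it through Run As, Python Run prints the interpreter that the PyDev project is configured to use.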
  60. 60. Create new file …
  61. 61. …
  62. 62. Run
  63. 63. Overview What is Python ? Why Python 4 Bioinformatics ? How to Python IDE: Eclipse & PyDev / Athena Code Sharing: Git(hub) Examples Hello World PIthon
  64. 64. git is an open source, distributed version control system designed for speed and efficiency
  65. 65. Git: A distributed version control system • Version control (or revision control, or source control) is all about managing multiple versions of documents, programs, web sites, etc. – Almost all “real” projects use some kind of version control – Essential for team projects, but also very useful for individual projects • Some well-known version control systems are CVS, Subversion, Mercurial, and Git – CVS and Subversion use a “central” repository; users “check out” files, work on them, and “check them in” – Mercurial and Git treat all repositories as equal • Distributed systems like Mercurial and Git are newer and are gradually replacing centralized systems like CVS and Subversion
  66. 66. Why version control? • For working by yourself: – Gives you a “time machine” for going back to earlier versions – Gives you great support for different versions (standalone, web app, etc.) of the same basic project • For working with others: – Greatly simplifies concurrent work, merging changes • For getting an internship or job: – Any company with a clue uses some kind of version control – Companies without a clue are bad places to work
  67. 67. Why Git? • Git has many advantages over earlier systems such as CVS and Subversion – More efficient, better workflow, etc. – See the literature for an extensive list of reasons – Of course, there are always those who disagree • It works from within Eclipse, also when started from Athena
  68. 68. No network needed: (almost) everything is local • Performing a diff • Viewing file history • Committing changes • Merging branches • Obtaining any other revision of a file • Switching branches
  69. 69. GitHub: Hosted GIT • Largest open source git hosting site • Public and private options • User-centric rather than project-centric • (use your Ugent login and password) • URI: –
  73. 73. Typical workflow Person A: • Set up project & repo • push code onto GitHub • edit/commit • edit/commit • pull/push Person B: • clone code from GitHub • edit/commit/push • edit… • edit… commit • pull/push This is just the flow, specific commands on following slides. It’s also possible to create your project first on GitHub, then clone (i.e., no git init)
  78. 78. GitHub: Hosted GIT URI (Uniform Resource Identifier):
  84. 84. Overview What is Python ? Why Python 4 Bioinformatics ? How to Python IDE: Eclipse & PyDev / Athena Code Sharing: Git(hub) Examples
  86. 86. Overview What is Python ? Why Python 4 Bioinformatics ? How to Python IDE: Eclipse & PyDev / Athena Code Sharing: Git(hub) Examples
  87. 87. Variables • No need to declare • Need to assign (initialize) • use of uninitialized variable raises exception • Not typed:
      if friendly: greeting = "hello world"
      else: greeting = 12**2
      print greeting
      • Everything is a "variable": • Even functions, classes, modules
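The same points can be tried out in Python 3. The names `friendly`, `shout` and `undefined_name` are made up for the illustration.

```python
friendly = True
if friendly:
    greeting = "hello world"   # greeting refers to a str
else:
    greeting = 12 ** 2         # the same name could refer to an int instead

# Using a name before assigning it raises NameError.
try:
    undefined_name
except NameError as error:
    message = str(error)

# Functions are objects too, so a name can be bound to one.
shout = str.upper
loud_greeting = shout(greeting)

print(greeting, "|", loud_greeting, "|", message)
```

The type belongs to the object, not to the name: rebinding `greeting` to a number later is legal, which is what "not typed" means here.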
  88. 88. Numbers • The usual suspects • 12, 3.14, 0xFF, 0377, (-1+2)*3/4**5, abs(x), 0<x<=5 • C-style shifting & masking • 1<<16, x&0xff, x|1, ~x, x^y • Integer division truncates in Python 2 :-( • 1/2 -> 0 # 1./2. -> 0.5, float(1)/2 -> 0.5 • Fixed in Python 3: 1/2 -> 0.5, 1//2 -> 0 • Long (arbitrary precision), complex • 2L**100 -> 1267650600228229401496703205376L – In Python 2.2 and beyond, 2**100 does the same thing • 1j**2 -> (-1+0j)
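The slide uses Python 2 literals. The Python 3 equivalents are sketched below; the variable names are arbitrary.

```python
octal = 0o377                         # Python 3 octal literal (0377 is Python 2 only)
hexadecimal = 0xFF
expression = (-1 + 2) * 3 / 4 ** 5    # ** binds tighter than * and /

shifted = 1 << 16                     # C-style shift
masked = 0x1234 & 0xFF                # keep the low byte
inverted = ~5                         # bitwise NOT

true_div = 1 / 2                      # true division in Python 3
floor_div = 1 // 2                    # explicit floor division

big = 2 ** 100                        # ints have arbitrary precision, no L suffix
imaginary_squared = 1j ** 2           # complex arithmetic is built in
```

In Python 3 the separate `long` type and the `L` suffix are gone, so `2L**100` is a syntax error there.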
  89. 89. Control Structures
      if condition:
          statements
      [elif condition:
          statements] ...
      else:
          statements
      while condition:
          statements
      for var in sequence:
          statements
      break
      continue
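A short example that uses each of these constructs once, in Python 3. Both functions are invented for illustration and are not taken from the course material.

```python
def gc_content(sequence):
    """Fraction of G/C bases, skipping N and stopping at the first unexpected character."""
    gc = 0
    counted = 0
    for base in sequence.upper():
        if base == "N":
            continue            # skip unknown bases
        elif base in "GC":
            gc += 1
        elif base not in "AT":
            break               # stop at the first unexpected character
        counted += 1
    return gc / counted if counted else 0.0


def first_power_of_two_above(limit):
    """Smallest power of two strictly greater than limit, found with a while loop."""
    value = 1
    while value <= limit:
        value *= 2
    return value
```

Indentation alone marks the blocks; there are no braces or `end` keywords.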
  90. 90. Example Function
      def gcd(a, b):
          "greatest common divisor"
          while a != 0:
              a, b = b%a, a    # parallel assignment
          return b
      >>> gcd.__doc__
      'greatest common divisor'
      >>> gcd(12, 20)
      4
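The function can be checked against the standard library. The `lcm` helper and the comparison range below are additions for illustration only.

```python
import math


def gcd(a, b):
    """Greatest common divisor by Euclid's algorithm."""
    while a != 0:
        a, b = b % a, a   # right-hand side is evaluated first, then unpacked
    return b


def lcm(a, b):
    """Least common multiple derived from gcd (hypothetical helper)."""
    return a * b // gcd(a, b) if a and b else 0


# Compare with math.gcd on a small grid of non-negative inputs.
matches_stdlib = all(
    gcd(a, b) == math.gcd(a, b) for a in range(30) for b in range(30)
)
```

The parallel assignment matters: writing `a = b % a` and then `b = a` on separate lines would use the already updated `a`.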
  91. 91. How to generate random numbers The standard random module implements a random number generator.
      import random
      print(random.random())
      This prints a random floating point number in the range [0, 1) (that is, between 0 and 1, including 0.0 but always smaller than 1.0). There are also many other specialized generators in this module, such as: • randrange(a, b) chooses an integer in the range [a, b). • uniform(a, b) chooses a floating point number in the range [a, b). • normalvariate(mean, sdev) samples the normal (Gaussian) distribution. Some higher-level functions operate on sequences directly, such as: • choice(S) chooses a random element from a given sequence (the sequence must have a known length). • shuffle(L) shuffles a list in-place, i.e. permutes it randomly. There’s also a Random class you can instantiate to create multiple independent random number generators.
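The functions named on this slide can be exercised together as below. The seeds and the example values are arbitrary choices; using a `Random` instance keeps the example independent of the module-level generator.

```python
import random

rng = random.Random(2017)                 # independent generator with a fixed seed

unit = rng.random()                       # float in [0, 1)
die = rng.randrange(1, 7)                 # integer in [1, 7), i.e. 1..6
offset = rng.uniform(-1.0, 1.0)           # float between -1.0 and 1.0
height = rng.normalvariate(170.0, 10.0)   # mean 170, standard deviation 10

bases = ["A", "C", "G", "T"]
picked = rng.choice(bases)                # one random element
rng.shuffle(bases)                        # in-place permutation

# Two generators created with the same seed produce the same sequence.
first = random.Random(42)
second = random.Random(42)
same_sequence = (
    [first.random() for _ in range(5)] == [second.random() for _ in range(5)]
)
```

Fixing the seed makes a run reproducible, which helps when debugging a simulation.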
  92. 92. First program: • How good are the random numbers ? • If they are good, you should be able to “measure” PI
  93. 93. Measure Pi with two random numbers … many of them … [Figure: axes x and y, side length 1]
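One common way to do this is the Monte Carlo method sketched below: draw points (x, y) uniformly in the unit square and count how many fall inside the quarter circle of radius 1. The function name, the sample count and the seed are assumptions; the slides do not show the course's own solution.

```python
import math
import random


def estimate_pi(samples, seed=None):
    """Estimate pi from the fraction of random points inside the quarter circle."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(samples):
        x = rng.random()
        y = rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    # Area of the quarter circle / area of the unit square = pi / 4.
    return 4.0 * inside / samples


estimate = estimate_pi(100000, seed=1)
error = abs(estimate - math.pi)
print("estimate: %.4f, error: %.4f" % (estimate, error))
```

The error shrinks only with the square root of the number of samples, so each extra correct digit costs roughly a hundred times more points.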
  94. 94. Python Videos - documentation, tutorials, beginner's guide, core distribution, ... Books include: • Learning Python by Mark Lutz • Python Essential Reference by David Beazley • Python Cookbook, ed. by Martelli, Ravenscroft and Ascher