A Kanterakis - PyPedia: a python crowdsourcing development environment for bioinformatics and computational biology

PyPedia
The free programming environment
that anyone can edit!
AlexandrosKanterakis

Genomics Coordination Center, Department of Genetics,
University Medical Center, Groningen, The Netherlands

How not to be a bioinformatician
• Stay low level at every level
• Be open source without being open
• Make tools that make no sense to scientists
• Do not ever share your results and do not reuse
• Never maintain your databases and web services
• Be unreachable and isolated

So, you think you can be a
bioinformatician…
• Imagine you only have: A personal computer
with a browser and an Internet connection
• Answer the following question:
- Who is the current prime minister of Latvia?

SYTYCBAB
• Imagine you only have: A personal computer with
a browser and an Internet connection
• Answer the following question:
Compute the Hardy-Weinberg equilibriums of a set of
genotypes
Execute
Source
Documentation

Execute
Source
Documentation

Execute
Source

Documentation

Execute
Source
Documentation
But what about…
? Web environment, online execution
? Open Source
? Integrate with other tools
? Edit a method and share it
? Examples and Unit tests
? Deploy in the cloud
? Frequency of new releases

Apython sandbox to the rescue
From:
http://wiki.python.org/moin/SandboxedPython

So:
Google App Engine + MediaWiki = PyPedia

Executing a method in a remote computer

• Edit your user page and add an “ssh” section:

==ssh==
host=ec2-107-22-59-115.compute-1.amazonaws.com
username=JohnDoe
path=/home/JohnDoe/runPyPedia

• This content is NOT shown to anyone
• Install the PyPedia client on remote
computer(details on pypedia.com)

“Execute on remote computer”

Example:
Fixed_point_user_JohnDoe

The cloud instance contains:
numpy, scipy, matplotlib

Like SAGE but with custom
execution environments
(i.eBioPython, PyCogent, …)

Cool, but I want to call the function from my local computer..

• Install the PyPedia python library:
git clone git://github.com/kantale/pypedia.git

• Load the function in python:
import pypedia
from pypedia importPairwise_linkage_disequilibrium
Pairwise_linkage_disequilibrium([(A,A), (A,G), (G,G),
(G,A)], [(A,A), (A,G), (G,G), (A,A)])

{'haplotypes': [('AA', 0.49999999997393502, 0.3125), ('AG',
2.606498430642265e-11, 0.1875), ('GA', 0.12500000002606498,
0.3125), ('GG', 0.37499999997393502, 0.1875)], 'R_sq':
0.59999999983318408, 'Dprime': 0.99999999986098675}

• You can call the method of any user and your method can be
called by anyone.
• Edit locally, push changes.

• On the top of each article there is a button:

• Creates a personalized version of the article that only
you can edit.

• This is similar to the Github’s “fork” feature.

Using PyPedia for open science
• A complete analysis can be hosted in PyPedia

• Any finding generated or published should be
easily shared and reproduced.

• The reproduction of a finding takes time even
when the source code is released.

Reproducible science
• PyPedia offers a REST interface:
• www.pypedia.com/index.php?
b_timestamp=YYYYMMDDHHMMSS
get_code=python code
• Get the most recent version of the python
code that is edited before the timestamp.

• Reproduce the analysis by sharing a single URL:
http://www.pypedia.com/index.php?b_timestamp=20120102101010get_code=print
Pairwise_linkage_disequilibrium([(A,A), (A,G), (G,G), (G,A)], [(A,A), (A,G), (G,G),
(A,A)])

Reproducing an experiment
# curl
--data-urlencode 'b_timestamp=20120501010101'
--data-urlencode 'get_code=print
Pairwise_linkage_disequilibrium([(A,A), (A,G), (G,G),
(G,A)], [(A,A), (A,G), (G,G), (A,A)])'
http://www.pypedia.com/index.php
--output code.py

# python code.py
{'haplotypes': [('AA', 0.49999999997393502, 0.3125), ('AG',
2.606498430642265e-11, 0.1875), ('GA', 0.12500000002606498, 0.3125),
('GG', 0.37499999997393502, 0.1875)], 'R_sq': 0.59999999983318408,
'Dprime': 0.99999999986098675}

Meta-webserver
• HTML injection is allowed
and encouraged!
http://www.pypedia.com/index.php/Draw_face_user_Kantale

• Example run an HTML code
posted on gist:
http://www.pypedia.com/index.php?
run_code=
import urllib2
print urllib2.urlopen(
‘https://raw.github.com/gist/2689822/bbea0c43b278d7c4c04
b3f7a23ba43f558fba98b/index_full.html’).read()
Click me!

• All content is under the Simplified BSD License
• Two namespaces:
– Validated articles. i.e: Minor_allele_frequency
• Safe, only admins can edit
– User articles. i.e: Minor_allele_frequency_user_John
• Unsafe, edited by individual user
– Qualitative articles from User namespace is
promoted to the Validated namespace
– Validated articles cannot call User articles (duh..)

Some thoughts
(in the embarrassing occasion I have some minutes left)

Code as wiki, program as wiki concept
• Multidimensional expansion
• As Mao said: Let a thousand flowers scripts bloom (and
some of them rot in hell)
• Minimize the distance:
Dsanity(SCRIPTmade_by_IT_guy, SCRIPTuseful_to_biologists)
• Encyclopedialize™ your scripts because open source isn’t
enough!

Future steps:
• Attract editors, make communities!
• If it can be done in python, why not Ruby, …?

• Contact: admin@pypedia.com
• Source code license: GPL v3
• Content license: Simplified BSD license
• Join us in google groups:
http://groups.google.com/group/pypedia
• Twitter: @PyPedia

• PyPedia’s source code:
– Mediawiki extension:
https://github.com/kantale/PyPedia_server
– Python library:
https://github.com/kantale/pypedia

• Acknowledgements:
– Despoina Antonakaki
– Kostas Tselios Posters:
– Morris A. Swertz BOSC: 11
ISMB: E12

A Kanterakis - PyPedia: a python crowdsourcing development environment for bioinformatics and computational biology

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Viewers also liked

Viewers also liked (6)

Similar to A Kanterakis - PyPedia: a python crowdsourcing development environment for bioinformatics and computational biology

Similar to A Kanterakis - PyPedia: a python crowdsourcing development environment for bioinformatics and computational biology (20)

More from Jan Aerts

More from Jan Aerts (20)

Recently uploaded

Recently uploaded (20)

A Kanterakis - PyPedia: a python crowdsourcing development environment for bioinformatics and computational biology