SlideShare a Scribd company logo
1 of 32
Download to read offline
minutes
Python for Chemistry in 21 days



                Dr. Noel O'Boyle

   Dr. John Mitchell and Prof. Peter Murray-Rust

                 UCC Talk, Sept 2005
 Available at http://www-mitchell.ch.cam.ac.uk/noel/
Introduction
●   This talk will cover
     –   what Python is
     –   why you should be interested in it
     –   how you can use Python in chemistry
Introduction
●   This talk will cover
     –   what Python is
     –   why you should be interested in it
     –   how you can use Python in chemistry
●   This talk will not cover
     –   how to program in Python
●   See references at end of talk



                       Word of warning!!
“My mental eye could now distinguish larger structures, of
manifold conformation; long rows, sometimes more closely
fitted together; all twining and twisting in snakelike motion.
But look! What was that? One of the snakes had seized
hold of its own tail, and the form whirled mockingly before
my eyes. As if by the flash of lightning I awoke...Let us learn
to dream, gentlemen”
                          Friedrich August Kekulé (1829-1896)
What is Python?
●   For a computer scientist...
    –   a high-level programming language
    –   interpreted (byte-compiled)
    –   dynamic typing
    –   object-oriented
What is Python?
●   For a computer scientist...
    –   a high-level programming language
    –   interpreted (byte-compiled)
    –   dynamic typing
    –   object-oriented
●   For everyone else...
    –   a scripting language (like Perl or Ruby) released by
        Guido von Rossum in 1991
    –   easy to learn
    –   easy to read (!)
What is Python?
●   For a computer scientist...
    –   a high-level programming language
    –   interpreted (byte-compiled)
    –   dynamic typing
    –   object-oriented
●   For everyone else...
    –   a scripting language (like Perl or Ruby) released by
        Guido von Rossum in 1991
    –   easy to learn
    –   easy to read (!)
    –   named after Cambridge comedians
The Great Debate

Sir Lancelot:
We were in the nick of time. You were in great Perl.
Sir Galahad:
I don't think I was.
Sir Lancelot:
You were, Sir Galahad. You were in terrible Perl.
Sir Galahad:
Look, let me go back in there and face the Perl.
Sir Lancelot:
No, it's too perilous.

                      (adapted from Monty Python and the Holy Grail)
Why you should be interested (1)

●   Python has been adopted by the cheminformatics community
●   For example, AstraZeneca has moved some of its codebase
    from 'the other scripting language' to Python


Job Description – Research Software Developer/Informatician
[section deleted]
Required Skills:

At least one object oriented programming language, e.g., Python, C++, Java.
Web-based application development (design/construction/maintenance)
UNIX, UNIX scripting & Linux OS

                            Position in AstraZeneca R&D, 02-09-05
Why you should be interested (2)

●   Scientific computing: scipy/pylab (like Matlab)
●   Molecular dynamics: MMTK
●   Statistics: scipy, rpy (R), pychem
●   3D-visualisation: VTK (mayavi)
●   2D-visualisation: scipy, pylab, rpy
●   coming soon, a wrapper around OpenBabel
●   cheminformatics: OEChem, frowns, PyDaylight, pychem
●   bioinformatics: BioPython
●   structural biology: PyMOL
●   computational chemistry: GaussSum
●   you can still use Java libraries...like the CDK
>>> from scipy import *
         >>> from pylab import *
         >>> a = arange(0,4)       > a <- seq(0,3)
         >>> a                     >a
scipy/                                               R
         [0,1,2,3]                 [1] 0 1 2 3
pylab    >>> mean(a)               > mean(a)
         1.5                       [1] 1.5
         >>> a**2                  > a**2
         [0,1,4,9]                 [1] 0 1 4 9
         >>> plot(a,a**2)          > plot(a,a**2)
         >>> show()                > lines(a,a**2)
Scipy
–   cluster: information theory functions (currently, vq and kmeans)
–   weave: compilation of numeric expressions to C++ for fast execution
–   cow: parallel programming via a Cluster Of Workstations
–   fftpack: fast Fourier transform module based on fftpack and fftw when
    available
–   ga: genetic algorithms
–   io: reading and writing numeric arrays, MATLAB .mat, and Matrix
    Market .mtx files
–   integrate: numeric integration for bounded and unbounded ranges. ODE
    solvers.
–   interpolate: interpolation of values from a sample data set.
–   optimize: constrained and unconstrained optimization methods and
    root-finding algorithms
–   signal: signal processing (1-D and 2-D filtering, filter design, LTI
    systems, etc.)
–   special: special function types (bessel, gamma, airy, etc.)
–   stats: statistical functions (stdev, var, mean, etc.)
–   linalg: linear algebra and BLAS routines based on the ATLAS
    implementation of LAPACK
–   sparse: Some sparse matrix support. LU factorization and solving
    sparse linear systems
Scipy statistical functions

●   descriptive statistics: variance, standard deviation, standard error, mean,
    mode, median
●   correlation: Pearson r, Spearman r, Kendall tau
●   statistical tests: chi-squared, t-tests, binomial, Wilcoxon, Kruskal,
    Kolmogorov-Smirnov, Anderson, etc.

●   linear regression

●   analysis of variance (ANOVA)

●   (and more)
pylab
pychem: Using scipy for chemoinformatics

●   Many multivariate analysis techniques are based on matrix
    algebra
●   scipy has wrappers around well-known C and Fortran
    numerical libraries (ATLAS, LAPACK)
●   pychem can do:
     –   principal component analysis, partial least squares
         regression, Fisher's discriminant analysis
     –   clustering: k-means, hierarchical (used Open Source
         clustering library)
     –   feature selection: genetic algorithms with PLS
     –   additional methods can be added
Python and R
  ●   Advantages of R:
       –   a large number of statistical libraries are available
  ●   Disadvantages of R:
       –   difficult to write algorithms
       –   slow (most R libraries are written in C)
       –   chokes on large datasets (use scan instead of read.table)

           Reading in data                   Principal component analysis
Method         300K    600K    1.6M        Method         300K    600K     1.6M
Python             6.8    13.9      41     Python             2.2      3.6      42
R (read.table)      42     105             R (read.table)       5       10
R (scan)             9      20      56     R (scan)             3        5      29




                                http://www.redbrick.dcu.ie/~noel/RversusPython.html
Python and R
●   rpy module allows Python programs to interface with R
●   have the best of both worlds
     –   access to the statistical functions of R
     –   access to the numerous modules available for Python
     –   can program in Python, instead of in R!!

             Python                                    R
>>> from rpy import r
>>> x = [5.05, 6.75, 3.21, 2.66]         > x <- c(5.05, 6.75, 3.21, 2.66)
>>> y = [1.65, 26.5, -5.93, 7.96]        > y <- c(1.65, 26.5, -5.93, 7.96)
>>> print r.lsfit(x,y)['coefficients']   > lsfit(x, y)$coefficients
{'X': 5.3935773611970212,                 Intercept        X
 'Intercept': -16.281127993087839}       -16.281128 5.393577
Python and R
                                             > hc$merge
Problem: Analyse a hierarchical clustering         [,1] [,2]
                                               [1,] -32 -33
Solution: Use R to cluster, and Python to      [2,] -39 -71
analyse the merge object of the cluster        [3,] -43 -47
                                               [4,] -10 -55
                                               [5,] -19 -36
                                               [6,] -5 -24
                                               [7,] -62 -63
                                               [8,] -74 -75
                                               [9,] -35 -76
                                              [10,] -1 -84
                                              [11,] -41 -42
                                              [12,] -83 -96
                                              [13,] -2 -29
                                              [14,] -7 -21
                                              [15,] -61 4
Problem 1
Graphically show the distribution of molecular weights of
molecules in an SD file. The molecular weight is stored in
a field of the SD file.

1,2-Diaminoethane
  MOE2004           3D

  6 3 0 0 0 0 0 0 0 0999 V2000
   -0.6900   -0.6620  0.0000 C 0   0   0   0   0   0   0   0   0   0   0   0
    0.5850   -1.9590  0.8240 H 0   0   0   0   0   0   0   0   0   0   0   0
    0.5350    1.5040  0.0000 N 0   0   0   0   0   0   0   0   0   0   0   0
    0.6060    2.0500  0.8460 H 0   0   0   0   0   0   0   0   0   0   0   0
    1.3040    0.8520 -0.0460 H 0   0   0   0   0   0   0   0   0   0   0   0
  1 2 1 0 0 0 0
  1 3 1 0 0 0 0
  1 4 1 0 0 0 0
M END
> <chem.name>
1,2-Diaminoethane

> <molecular.weight>
60.0995

$$$$
Object-oriented approach

 object                        SD file



 object           Molecule 1      Molecule 2      Molecule n


attributes                               fields
               name        atoms

  The code for creating these objects is stored in sdparser.py,
  which can be imported and used by any scripts that need to
  parse SD files.
Solution

from sdparser import SDFile
from rpy import *

inputfile = "mddr_complete.sd"

allmolweights = []

for molecule in SDFile(inputfile):
   molweight = molecule.fields['molecular.weight']
   allmolweights.append(float(molweight))

r.png(file="molwt_r.png")
r.hist(allmolweights,xlab="Mol. weights",main="MDDR", col="red")
r.dev_off()
Solution

from sdparser import SDFile
from rpy import *

inputfile = "mddr_complete.sd"

allmolweights = []

for molecule in SDFile(inputfile):
   molweight = molecule.fields['molecular.weight']
   allmolweights.append(float(molweight))

r.png(file="molwt_r.png")
r.hist(allmolweights,xlab="Mol. weights",main="MDDR", col="red")
r.dev_off()
Solution

from sdparser import SDFile
from rpy import *

inputfile = "mddr_complete.sd"

allmolweights = []

for molecule in SDFile(inputfile):
   molweight = molecule.fields['molecular.weight']
   allmolweights.append(float(molweight))

r.png(file="molwt_r.png")
r.hist(allmolweights,xlab="Mol. weights",main="MDDR", col="red")
r.dev_off()
Solution

from sdparser import SDFile
from rpy import *

inputfile = "mddr_complete.sd"

allmolweights = []

for molecule in SDFile(inputfile):
   molweight = molecule.fields['molecular.weight']
   allmolweights.append(float(molweight))

r.png(file="molwt_r.png")
r.hist(allmolweights,xlab="Mol. weights",main="MDDR", col="red")
r.dev_off()
Problem 2
Every molecule in an SD file is missing the name. To be
compatible with proprietary program X, we need to set the
name equal to the value of the field “chem.name”.

(MISSING NAME!)
  MOE2004          3D

  6 3 0 0 0 0 0 0 0 0999 V2000
   -0.6900   -0.6620  0.0000 C 0   0   0   0   0   0   0   0   0   0   0   0
    0.5850   -1.9590  0.8240 H 0   0   0   0   0   0   0   0   0   0   0   0
    0.5350    1.5040  0.0000 N 0   0   0   0   0   0   0   0   0   0   0   0
    0.6060    2.0500  0.8460 H 0   0   0   0   0   0   0   0   0   0   0   0
    1.3040    0.8520 -0.0460 H 0   0   0   0   0   0   0   0   0   0   0   0
  1 2 1 0 0 0 0
  1 3 1 0 0 0 0
  1 4 1 0 0 0 0
M END
> <chem.name>
1,2-Diaminoethane

> <molecular.weight>
60.0995

$$$$
Solution

from sdparser import SDFile

inputfile = "mddr_complete.sd"
outputfile = "mddr_withnames.sd"

for molecule in SDFile(inputfile):
   molecule.name = molecule.fields['chemical.name']
   outputfile.write(molecule)

inputfile.close()
outputfile.close()

print "We are the knights who say....SD!!!"
Solution

from sdparser import SDFile

inputfile = "mddr_complete.sd"
outputfile = "mddr_withnames.sd"

for molecule in SDFile(inputfile):
   molecule.name = molecule.fields['chemical.name']
   outputfile.write(molecule)

inputfile.close()
outputfile.close()

print "We are the knights who say....SD!!!"
Python and Java
 ●  It's easy to use Java libraries from Python
      – using either Jython or JPype
      – see http://www.redbrick.dcu.ie/~noel/CDKJython.html

Example: using the CDK to calculate the number of rings in a molecule
(given a string variable containing CML)
from jpype import *
startJVM("jdk1.5.0_03/jre/lib/i386/server/libjvm.so")
cdk = JPackage("org").openscience.cdk
SSSRFinder = cdk.ringsearch.SSSRFinder
CMLReader = cdk.io.CMLReader

def getNumRings(molecule):
  # Convert to a CDK molecule
  reader = CMLReader(java.io.StringReader(molXmlValue))
  chemFile = reader.read(cdk.ChemFile())
  cdkMol = chemFile.getChemSequence(0).getChemModel(0).getSetOfMolecules().getMolecule(0)
  # Calculate the number of rings
  sssrFinder = SSSRFinder(cdkMol)
  sssr = sssrFinder.findSSSR().size()
  return sssr
3D visualisation
●   VTK (Visualisation Toolkit) from Kitware
    –   open source, freely available
    –   scalar, tensor, vector and volumetric methods
    –   advanced modeling techniques such as implicit modelling,
        polygon reduction, mesh smoothing, cutting, contouring, and
        Delaunay triangulation
●   MayaVi
    –   easy to use GUI interface to VTK, written in Python
    –   can create input files and visualise them using Python scripts

                                 Demo
Python Resources
●   http://www.python.org
●   Guido's Tutorial
     –   http://www.python.org/doc/current/tut/tut.html
●   O'Reilly's “Learning Python” or Visual Quickstart Guide
    to Python
     –   Make sure it's Python 2.3 or 2.4 though
●   For Windows, consider the Enthought edition
     –   http://www.enthought.com/
Python for Chemistry

More Related Content

What's hot

energy minimization
energy minimizationenergy minimization
energy minimizationpradeep kore
 
Virtual screening strategies and applications
Virtual screening strategies and applicationsVirtual screening strategies and applications
Virtual screening strategies and applicationsAshishkumar3249
 
King_Notes_Density_of_States_2D1D0D.pdf
King_Notes_Density_of_States_2D1D0D.pdfKing_Notes_Density_of_States_2D1D0D.pdf
King_Notes_Density_of_States_2D1D0D.pdfLogeshwariC2
 
In Silico methods for ADMET prediction of new molecules
 In Silico methods for ADMET prediction of new molecules In Silico methods for ADMET prediction of new molecules
In Silico methods for ADMET prediction of new moleculesMadhuraDatar
 
heck reaction, suzuki coupling and sharpless epoxidation
heck reaction, suzuki coupling and sharpless epoxidationheck reaction, suzuki coupling and sharpless epoxidation
heck reaction, suzuki coupling and sharpless epoxidationVISHAL PATIL
 
Understanding Chemical Reaction Mechanisms with Quantum Chemistry
Understanding Chemical Reaction Mechanisms with Quantum ChemistryUnderstanding Chemical Reaction Mechanisms with Quantum Chemistry
Understanding Chemical Reaction Mechanisms with Quantum Chemistrywinterschool
 
CHEMISTRY OF PEPTIDES [M.PHARM, M.SC, BSC, B.PHARM]
CHEMISTRY OF PEPTIDES [M.PHARM, M.SC, BSC, B.PHARM]CHEMISTRY OF PEPTIDES [M.PHARM, M.SC, BSC, B.PHARM]
CHEMISTRY OF PEPTIDES [M.PHARM, M.SC, BSC, B.PHARM]Shikha Popali
 
Process chemistry AS PER PCI SYLLABUS FOR M.PHARM
Process chemistry AS PER PCI SYLLABUS FOR M.PHARMProcess chemistry AS PER PCI SYLLABUS FOR M.PHARM
Process chemistry AS PER PCI SYLLABUS FOR M.PHARMShikha Popali
 
Process chemistry
Process chemistryProcess chemistry
Process chemistryDivya V
 
DENOVO DRUG DESIGN AS PER PCI SYLLABUS M.PHARM
DENOVO DRUG DESIGN AS PER PCI SYLLABUS M.PHARMDENOVO DRUG DESIGN AS PER PCI SYLLABUS M.PHARM
DENOVO DRUG DESIGN AS PER PCI SYLLABUS M.PHARMShikha Popali
 
Molecular and Quantum Mechanics in drug design
Molecular and Quantum Mechanics in drug designMolecular and Quantum Mechanics in drug design
Molecular and Quantum Mechanics in drug designAjay Kumar
 
Two diametional (2 d) spectroscopy
Two diametional (2 d) spectroscopyTwo diametional (2 d) spectroscopy
Two diametional (2 d) spectroscopySurendra Singh
 
A review on ipce and pec measurements and materials p.basnet
A review on ipce and pec measurements and materials p.basnetA review on ipce and pec measurements and materials p.basnet
A review on ipce and pec measurements and materials p.basnetPradip Basnet
 
Density Functional Theory
Density Functional TheoryDensity Functional Theory
Density Functional Theorykrishslide
 

What's hot (20)

energy minimization
energy minimizationenergy minimization
energy minimization
 
Virtual screening strategies and applications
Virtual screening strategies and applicationsVirtual screening strategies and applications
Virtual screening strategies and applications
 
King_Notes_Density_of_States_2D1D0D.pdf
King_Notes_Density_of_States_2D1D0D.pdfKing_Notes_Density_of_States_2D1D0D.pdf
King_Notes_Density_of_States_2D1D0D.pdf
 
Lc ftir
Lc ftir Lc ftir
Lc ftir
 
Molecular Dynamics
Molecular DynamicsMolecular Dynamics
Molecular Dynamics
 
In Silico methods for ADMET prediction of new molecules
 In Silico methods for ADMET prediction of new molecules In Silico methods for ADMET prediction of new molecules
In Silico methods for ADMET prediction of new molecules
 
heck reaction, suzuki coupling and sharpless epoxidation
heck reaction, suzuki coupling and sharpless epoxidationheck reaction, suzuki coupling and sharpless epoxidation
heck reaction, suzuki coupling and sharpless epoxidation
 
Chemical dynamics, intro,rrk, rrkm theory
Chemical dynamics, intro,rrk, rrkm theoryChemical dynamics, intro,rrk, rrkm theory
Chemical dynamics, intro,rrk, rrkm theory
 
Understanding Chemical Reaction Mechanisms with Quantum Chemistry
Understanding Chemical Reaction Mechanisms with Quantum ChemistryUnderstanding Chemical Reaction Mechanisms with Quantum Chemistry
Understanding Chemical Reaction Mechanisms with Quantum Chemistry
 
CHEMISTRY OF PEPTIDES [M.PHARM, M.SC, BSC, B.PHARM]
CHEMISTRY OF PEPTIDES [M.PHARM, M.SC, BSC, B.PHARM]CHEMISTRY OF PEPTIDES [M.PHARM, M.SC, BSC, B.PHARM]
CHEMISTRY OF PEPTIDES [M.PHARM, M.SC, BSC, B.PHARM]
 
Process chemistry AS PER PCI SYLLABUS FOR M.PHARM
Process chemistry AS PER PCI SYLLABUS FOR M.PHARMProcess chemistry AS PER PCI SYLLABUS FOR M.PHARM
Process chemistry AS PER PCI SYLLABUS FOR M.PHARM
 
Process chemistry
Process chemistryProcess chemistry
Process chemistry
 
DENOVO DRUG DESIGN AS PER PCI SYLLABUS M.PHARM
DENOVO DRUG DESIGN AS PER PCI SYLLABUS M.PHARMDENOVO DRUG DESIGN AS PER PCI SYLLABUS M.PHARM
DENOVO DRUG DESIGN AS PER PCI SYLLABUS M.PHARM
 
Molecular and Quantum Mechanics in drug design
Molecular and Quantum Mechanics in drug designMolecular and Quantum Mechanics in drug design
Molecular and Quantum Mechanics in drug design
 
Free Radical Reactions-Guest Lecture.pdf
Free Radical Reactions-Guest Lecture.pdfFree Radical Reactions-Guest Lecture.pdf
Free Radical Reactions-Guest Lecture.pdf
 
31-P NMR SPECTROSCOPY
31-P NMR SPECTROSCOPY31-P NMR SPECTROSCOPY
31-P NMR SPECTROSCOPY
 
Cosy,nosy
Cosy,nosyCosy,nosy
Cosy,nosy
 
Two diametional (2 d) spectroscopy
Two diametional (2 d) spectroscopyTwo diametional (2 d) spectroscopy
Two diametional (2 d) spectroscopy
 
A review on ipce and pec measurements and materials p.basnet
A review on ipce and pec measurements and materials p.basnetA review on ipce and pec measurements and materials p.basnet
A review on ipce and pec measurements and materials p.basnet
 
Density Functional Theory
Density Functional TheoryDensity Functional Theory
Density Functional Theory
 

Similar to Python for Chemistry

Seven waystouseturtle pycon2009
Seven waystouseturtle pycon2009Seven waystouseturtle pycon2009
Seven waystouseturtle pycon2009A Jorge Garcia
 
A Hands-on Intro to Data Science and R Presentation.ppt
A Hands-on Intro to Data Science and R Presentation.pptA Hands-on Intro to Data Science and R Presentation.ppt
A Hands-on Intro to Data Science and R Presentation.pptSanket Shikhar
 
Migrating from matlab to python
Migrating from matlab to pythonMigrating from matlab to python
Migrating from matlab to pythonActiveState
 
IIBMP2019 講演資料「オープンソースで始める深層学習」
IIBMP2019 講演資料「オープンソースで始める深層学習」IIBMP2019 講演資料「オープンソースで始める深層学習」
IIBMP2019 講演資料「オープンソースで始める深層学習」Preferred Networks
 
A Workshop on R
A Workshop on RA Workshop on R
A Workshop on RAjay Ohri
 
Data Wrangling For Kaggle Data Science Competitions
Data Wrangling For Kaggle Data Science CompetitionsData Wrangling For Kaggle Data Science Competitions
Data Wrangling For Kaggle Data Science CompetitionsKrishna Sankar
 
Using R in remote computer clusters
Using R in remote computer clustersUsing R in remote computer clusters
Using R in remote computer clustersBurak Himmetoglu
 
RR & Docker @ MuensteR Meetup (Sep 2017)
RR & Docker @ MuensteR Meetup (Sep 2017)RR & Docker @ MuensteR Meetup (Sep 2017)
RR & Docker @ MuensteR Meetup (Sep 2017)Daniel Nüst
 
Open-source from/in the enterprise: the RDKit
Open-source from/in the enterprise: the RDKitOpen-source from/in the enterprise: the RDKit
Open-source from/in the enterprise: the RDKitGreg Landrum
 
Reproducible Computational Research in R
Reproducible Computational Research in RReproducible Computational Research in R
Reproducible Computational Research in RSamuel Bosch
 
Profiling and optimization
Profiling and optimizationProfiling and optimization
Profiling and optimizationg3_nittala
 
2015-10-23_wim_davis_r_slides.pptx on consumer
2015-10-23_wim_davis_r_slides.pptx on consumer2015-10-23_wim_davis_r_slides.pptx on consumer
2015-10-23_wim_davis_r_slides.pptx on consumertirlukachaitanya
 
Python For Scientists
Python For ScientistsPython For Scientists
Python For Scientistsaeberspaecher
 
Regex Considered Harmful: Use Rosie Pattern Language Instead
Regex Considered Harmful: Use Rosie Pattern Language InsteadRegex Considered Harmful: Use Rosie Pattern Language Instead
Regex Considered Harmful: Use Rosie Pattern Language InsteadAll Things Open
 
Provenance for Data Munging Environments
Provenance for Data Munging EnvironmentsProvenance for Data Munging Environments
Provenance for Data Munging EnvironmentsPaul Groth
 
DRUG - RDSTK Talk
DRUG - RDSTK TalkDRUG - RDSTK Talk
DRUG - RDSTK Talkrtelmore
 
R and RMarkdown crash course
R and RMarkdown crash courseR and RMarkdown crash course
R and RMarkdown crash courseOlga Scrivner
 

Similar to Python for Chemistry (20)

Seven waystouseturtle pycon2009
Seven waystouseturtle pycon2009Seven waystouseturtle pycon2009
Seven waystouseturtle pycon2009
 
Python Orientation
Python OrientationPython Orientation
Python Orientation
 
A Hands-on Intro to Data Science and R Presentation.ppt
A Hands-on Intro to Data Science and R Presentation.pptA Hands-on Intro to Data Science and R Presentation.ppt
A Hands-on Intro to Data Science and R Presentation.ppt
 
Migrating from matlab to python
Migrating from matlab to pythonMigrating from matlab to python
Migrating from matlab to python
 
IIBMP2019 講演資料「オープンソースで始める深層学習」
IIBMP2019 講演資料「オープンソースで始める深層学習」IIBMP2019 講演資料「オープンソースで始める深層学習」
IIBMP2019 講演資料「オープンソースで始める深層学習」
 
A Workshop on R
A Workshop on RA Workshop on R
A Workshop on R
 
Data Wrangling For Kaggle Data Science Competitions
Data Wrangling For Kaggle Data Science CompetitionsData Wrangling For Kaggle Data Science Competitions
Data Wrangling For Kaggle Data Science Competitions
 
Using R in remote computer clusters
Using R in remote computer clustersUsing R in remote computer clusters
Using R in remote computer clusters
 
RR & Docker @ MuensteR Meetup (Sep 2017)
RR & Docker @ MuensteR Meetup (Sep 2017)RR & Docker @ MuensteR Meetup (Sep 2017)
RR & Docker @ MuensteR Meetup (Sep 2017)
 
Open-source from/in the enterprise: the RDKit
Open-source from/in the enterprise: the RDKitOpen-source from/in the enterprise: the RDKit
Open-source from/in the enterprise: the RDKit
 
Reproducible Computational Research in R
Reproducible Computational Research in RReproducible Computational Research in R
Reproducible Computational Research in R
 
DS LAB MANUAL.pdf
DS LAB MANUAL.pdfDS LAB MANUAL.pdf
DS LAB MANUAL.pdf
 
Profiling and optimization
Profiling and optimizationProfiling and optimization
Profiling and optimization
 
2015-10-23_wim_davis_r_slides.pptx on consumer
2015-10-23_wim_davis_r_slides.pptx on consumer2015-10-23_wim_davis_r_slides.pptx on consumer
2015-10-23_wim_davis_r_slides.pptx on consumer
 
Python For Scientists
Python For ScientistsPython For Scientists
Python For Scientists
 
Regex Considered Harmful: Use Rosie Pattern Language Instead
Regex Considered Harmful: Use Rosie Pattern Language InsteadRegex Considered Harmful: Use Rosie Pattern Language Instead
Regex Considered Harmful: Use Rosie Pattern Language Instead
 
Provenance for Data Munging Environments
Provenance for Data Munging EnvironmentsProvenance for Data Munging Environments
Provenance for Data Munging Environments
 
DRUG - RDSTK Talk
DRUG - RDSTK TalkDRUG - RDSTK Talk
DRUG - RDSTK Talk
 
R for hadoopers
R for hadoopersR for hadoopers
R for hadoopers
 
R and RMarkdown crash course
R and RMarkdown crash courseR and RMarkdown crash course
R and RMarkdown crash course
 

More from baoilleach

We need to talk about Kekulization, Aromaticity and SMILES
We need to talk about Kekulization, Aromaticity and SMILESWe need to talk about Kekulization, Aromaticity and SMILES
We need to talk about Kekulization, Aromaticity and SMILESbaoilleach
 
Open Babel project overview
Open Babel project overviewOpen Babel project overview
Open Babel project overviewbaoilleach
 
So I have an SD File... What do I do next?
So I have an SD File... What do I do next?So I have an SD File... What do I do next?
So I have an SD File... What do I do next?baoilleach
 
Chemistrify the Web
Chemistrify the WebChemistrify the Web
Chemistrify the Webbaoilleach
 
Universal Smiles: Finally a canonical SMILES string
Universal Smiles: Finally a canonical SMILES stringUniversal Smiles: Finally a canonical SMILES string
Universal Smiles: Finally a canonical SMILES stringbaoilleach
 
What's New and Cooking in Open Babel 2.3.2
What's New and Cooking in Open Babel 2.3.2What's New and Cooking in Open Babel 2.3.2
What's New and Cooking in Open Babel 2.3.2baoilleach
 
Intro to Open Babel
Intro to Open BabelIntro to Open Babel
Intro to Open Babelbaoilleach
 
Protein-ligand docking
Protein-ligand dockingProtein-ligand docking
Protein-ligand dockingbaoilleach
 
Cheminformatics
CheminformaticsCheminformatics
Cheminformaticsbaoilleach
 
Making the most of a QM calculation
Making the most of a QM calculationMaking the most of a QM calculation
Making the most of a QM calculationbaoilleach
 
Data Analysis in QSAR
Data Analysis in QSARData Analysis in QSAR
Data Analysis in QSARbaoilleach
 
Large-scale computational design and selection of polymers for solar cells
Large-scale computational design and selection of polymers for solar cellsLarge-scale computational design and selection of polymers for solar cells
Large-scale computational design and selection of polymers for solar cellsbaoilleach
 
My Open Access papers
My Open Access papersMy Open Access papers
My Open Access papersbaoilleach
 
Improving the quality of chemical databases with community-developed tools (a...
Improving the quality of chemical databases with community-developed tools (a...Improving the quality of chemical databases with community-developed tools (a...
Improving the quality of chemical databases with community-developed tools (a...baoilleach
 
De novo design of molecular wires with optimal properties for solar energy co...
De novo design of molecular wires with optimal properties for solar energy co...De novo design of molecular wires with optimal properties for solar energy co...
De novo design of molecular wires with optimal properties for solar energy co...baoilleach
 
Cinfony - Bring cheminformatics toolkits into tune
Cinfony - Bring cheminformatics toolkits into tuneCinfony - Bring cheminformatics toolkits into tune
Cinfony - Bring cheminformatics toolkits into tunebaoilleach
 
Density functional theory calculations on Ruthenium polypyridyl complexes inc...
Density functional theory calculations on Ruthenium polypyridyl complexes inc...Density functional theory calculations on Ruthenium polypyridyl complexes inc...
Density functional theory calculations on Ruthenium polypyridyl complexes inc...baoilleach
 
Application of Density Functional Theory to Scanning Tunneling Microscopy
Application of Density Functional Theory to Scanning Tunneling MicroscopyApplication of Density Functional Theory to Scanning Tunneling Microscopy
Application of Density Functional Theory to Scanning Tunneling Microscopybaoilleach
 
Towards Practical Molecular Devices
Towards Practical Molecular DevicesTowards Practical Molecular Devices
Towards Practical Molecular Devicesbaoilleach
 
Why multiple scoring functions can improve docking performance - Testing hypo...
Why multiple scoring functions can improve docking performance - Testing hypo...Why multiple scoring functions can improve docking performance - Testing hypo...
Why multiple scoring functions can improve docking performance - Testing hypo...baoilleach
 

More from baoilleach (20)

We need to talk about Kekulization, Aromaticity and SMILES
We need to talk about Kekulization, Aromaticity and SMILESWe need to talk about Kekulization, Aromaticity and SMILES
We need to talk about Kekulization, Aromaticity and SMILES
 
Open Babel project overview
Open Babel project overviewOpen Babel project overview
Open Babel project overview
 
So I have an SD File... What do I do next?
So I have an SD File... What do I do next?So I have an SD File... What do I do next?
So I have an SD File... What do I do next?
 
Chemistrify the Web
Chemistrify the WebChemistrify the Web
Chemistrify the Web
 
Universal Smiles: Finally a canonical SMILES string
Universal Smiles: Finally a canonical SMILES stringUniversal Smiles: Finally a canonical SMILES string
Universal Smiles: Finally a canonical SMILES string
 
What's New and Cooking in Open Babel 2.3.2
What's New and Cooking in Open Babel 2.3.2What's New and Cooking in Open Babel 2.3.2
What's New and Cooking in Open Babel 2.3.2
 
Intro to Open Babel
Intro to Open BabelIntro to Open Babel
Intro to Open Babel
 
Protein-ligand docking
Protein-ligand dockingProtein-ligand docking
Protein-ligand docking
 
Cheminformatics
CheminformaticsCheminformatics
Cheminformatics
 
Making the most of a QM calculation
Making the most of a QM calculationMaking the most of a QM calculation
Making the most of a QM calculation
 
Data Analysis in QSAR
Data Analysis in QSARData Analysis in QSAR
Data Analysis in QSAR
 
Large-scale computational design and selection of polymers for solar cells
Large-scale computational design and selection of polymers for solar cellsLarge-scale computational design and selection of polymers for solar cells
Large-scale computational design and selection of polymers for solar cells
 
My Open Access papers
My Open Access papersMy Open Access papers
My Open Access papers
 
Improving the quality of chemical databases with community-developed tools (a...
Improving the quality of chemical databases with community-developed tools (a...Improving the quality of chemical databases with community-developed tools (a...
Improving the quality of chemical databases with community-developed tools (a...
 
De novo design of molecular wires with optimal properties for solar energy co...
De novo design of molecular wires with optimal properties for solar energy co...De novo design of molecular wires with optimal properties for solar energy co...
De novo design of molecular wires with optimal properties for solar energy co...
 
Cinfony - Bring cheminformatics toolkits into tune
Cinfony - Bring cheminformatics toolkits into tuneCinfony - Bring cheminformatics toolkits into tune
Cinfony - Bring cheminformatics toolkits into tune
 
Density functional theory calculations on Ruthenium polypyridyl complexes inc...
Density functional theory calculations on Ruthenium polypyridyl complexes inc...Density functional theory calculations on Ruthenium polypyridyl complexes inc...
Density functional theory calculations on Ruthenium polypyridyl complexes inc...
 
Application of Density Functional Theory to Scanning Tunneling Microscopy
Application of Density Functional Theory to Scanning Tunneling MicroscopyApplication of Density Functional Theory to Scanning Tunneling Microscopy
Application of Density Functional Theory to Scanning Tunneling Microscopy
 
Towards Practical Molecular Devices
Towards Practical Molecular DevicesTowards Practical Molecular Devices
Towards Practical Molecular Devices
 
Why multiple scoring functions can improve docking performance - Testing hypo...
Why multiple scoring functions can improve docking performance - Testing hypo...Why multiple scoring functions can improve docking performance - Testing hypo...
Why multiple scoring functions can improve docking performance - Testing hypo...
 

Recently uploaded

Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
MENTAL STATUS EXAMINATION format.docx
MENTAL     STATUS EXAMINATION format.docxMENTAL     STATUS EXAMINATION format.docx
MENTAL STATUS EXAMINATION format.docxPoojaSen20
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsKarinaGenton
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxRoyAbrique
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Celine George
 
PSYCHIATRIC History collection FORMAT.pptx
PSYCHIATRIC   History collection FORMAT.pptxPSYCHIATRIC   History collection FORMAT.pptx
PSYCHIATRIC History collection FORMAT.pptxPoojaSen20
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application ) Sakshi Ghasle
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppCeline George
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxmanuelaromero2013
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsanshu789521
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 

Recently uploaded (20)

Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
MENTAL STATUS EXAMINATION format.docx
MENTAL     STATUS EXAMINATION format.docxMENTAL     STATUS EXAMINATION format.docx
MENTAL STATUS EXAMINATION format.docx
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its Characteristics
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
 
PSYCHIATRIC History collection FORMAT.pptx
PSYCHIATRIC   History collection FORMAT.pptxPSYCHIATRIC   History collection FORMAT.pptx
PSYCHIATRIC History collection FORMAT.pptx
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application )
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website App
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha elections
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 

Python for Chemistry

  • 1. minutes Python for Chemistry in 21 days Dr. Noel O'Boyle Dr. John Mitchell and Prof. Peter Murray-Rust UCC Talk, Sept 2005 Available at http://www-mitchell.ch.cam.ac.uk/noel/
  • 2. Introduction ● This talk will cover – what Python is – why you should be interested in it – how you can use Python in chemistry
  • 3. Introduction ● This talk will cover – what Python is – why you should be interested in it – how you can use Python in chemistry ● This talk will not cover – how to program in Python ● See references at end of talk Word of warning!!
  • 4. “My mental eye could now distinguish larger structures, of manifold conformation; long rows, sometimes more closely fitted together; all twining and twisting in snakelike motion. But look! What was that? One of the snakes had seized hold of its own tail, and the form whirled mockingly before my eyes. As if by the flash of lightning I awoke...Let us learn to dream, gentlemen” Friedrich August Kekulé (1829-1896)
  • 5. What is Python? ● For a computer scientist... – a high-level programming language – interpreted (byte-compiled) – dynamic typing – object-oriented
  • 6. What is Python? ● For a computer scientist... – a high-level programming language – interpreted (byte-compiled) – dynamic typing – object-oriented ● For everyone else... – a scripting language (like Perl or Ruby) released by Guido von Rossum in 1991 – easy to learn – easy to read (!)
  • 7. What is Python? ● For a computer scientist... – a high-level programming language – interpreted (byte-compiled) – dynamic typing – object-oriented ● For everyone else... – a scripting language (like Perl or Ruby) released by Guido von Rossum in 1991 – easy to learn – easy to read (!) – named after Cambridge comedians
  • 8. The Great Debate Sir Lancelot: We were in the nick of time. You were in great Perl. Sir Galahad: I don't think I was. Sir Lancelot: You were, Sir Galahad. You were in terrible Perl. Sir Galahad: Look, let me go back in there and face the Perl. Sir Lancelot: No, it's too perilous. (adapted from Monty Python and the Holy Grail)
  • 9. Why you should be interested (1) ● Python has been adopted by the cheminformatics community ● For example, AstraZeneca has moved some of its codebase from 'the other scripting language' to Python Job Description – Research Software Developer/Informatician [section deleted] Required Skills: At least one object oriented programming language, e.g., Python, C++, Java. Web-based application development (design/construction/maintenance) UNIX, UNIX scripting & Linux OS Position in AstraZeneca R&D, 02-09-05
  • 10. Why you should be interested (2) ● Scientific computing: scipy/pylab (like Matlab) ● Molecular dynamics: MMTK ● Statistics: scipy, rpy (R), pychem ● 3D-visualisation: VTK (mayavi) ● 2D-visualisation: scipy, pylab, rpy ● coming soon, a wrapper around OpenBabel ● cheminformatics: OEChem, frowns, PyDaylight, pychem ● bioinformatics: BioPython ● structural biology: PyMOL ● computational chemistry: GaussSum ● you can still use Java libraries...like the CDK
  • 11. >>> from scipy import * >>> from pylab import * >>> a = arange(0,4) > a <- seq(0,3) >>> a >a scipy/ R [0,1,2,3] [1] 0 1 2 3 pylab >>> mean(a) > mean(a) 1.5 [1] 1.5 >>> a**2 > a**2 [0,1,4,9] [1] 0 1 4 9 >>> plot(a,a**2) > plot(a,a**2) >>> show() > lines(a,a**2)
  • 12. Scipy – cluster: information theory functions (currently, vq and kmeans) – weave: compilation of numeric expressions to C++ for fast execution – cow: parallel programming via a Cluster Of Workstations – fftpack: fast Fourier transform module based on fftpack and fftw when available – ga: genetic algorithms – io: reading and writing numeric arrays, MATLAB .mat, and Matrix Market .mtx files – integrate: numeric integration for bounded and unbounded ranges. ODE solvers. – interpolate: interpolation of values from a sample data set. – optimize: constrained and unconstrained optimization methods and root-finding algorithms – signal: signal processing (1-D and 2-D filtering, filter design, LTI systems, etc.) – special: special function types (bessel, gamma, airy, etc.) – stats: statistical functions (stdev, var, mean, etc.) – linalg: linear algebra and BLAS routines based on the ATLAS implementation of LAPACK – sparse: Some sparse matrix support. LU factorization and solving sparse linear systems
  • 13. Scipy statistical functions ● descriptive statistics: variance, standard deviation, standard error, mean, mode, median ● correlation: Pearson r, Spearman r, Kendall tau ● statistical tests: chi-squared, t-tests, binomial, Wilcoxon, Kruskal, Kolmogorov-Smirnov, Anderson, etc. ● linear regression ● analysis of variance (ANOVA) ● (and more)
  • 14. pylab
  • 15. pychem: Using scipy for chemoinformatics ● Many multivariate analysis techniques are based on matrix algebra ● scipy has wrappers around well-known C and Fortran numerical libraries (ATLAS, LAPACK) ● pychem can do: – principal component analysis, partial least squares regression, Fisher's discriminant analysis – clustering: k-means, hierarchical (used Open Source clustering library) – feature selection: genetic algorithms with PLS – additional methods can be added
  • 16. Python and R ● Advantages of R: – a large number of statistical libraries are available ● Disadvantages of R: – difficult to write algorithms – slow (most R libraries are written in C) – chokes on large datasets (use scan instead of read.table) Reading in data Principal component analysis Method 300K 600K 1.6M Method 300K 600K 1.6M Python 6.8 13.9 41 Python 2.2 3.6 42 R (read.table) 42 105 R (read.table) 5 10 R (scan) 9 20 56 R (scan) 3 5 29 http://www.redbrick.dcu.ie/~noel/RversusPython.html
  • 17. Python and R ● rpy module allows Python programs to interface with R ● have the best of both worlds – access to the statistical functions of R – access to the numerous modules available for Python – can program in Python, instead of in R!! Python R >>> from rpy import r >>> x = [5.05, 6.75, 3.21, 2.66] > x <- c(5.05, 6.75, 3.21, 2.66) >>> y = [1.65, 26.5, -5.93, 7.96] > y <- c(1.65, 26.5, -5.93, 7.96) >>> print r.lsfit(x,y)['coefficients'] > lsfit(x, y)$coefficients {'X': 5.3935773611970212, Intercept X 'Intercept': -16.281127993087839} -16.281128 5.393577
  • 18. Python and R > hc$merge Problem: Analyse a hierarchical clustering [,1] [,2] [1,] -32 -33 Solution: Use R to cluster, and Python to [2,] -39 -71 analyse the merge object of the cluster [3,] -43 -47 [4,] -10 -55 [5,] -19 -36 [6,] -5 -24 [7,] -62 -63 [8,] -74 -75 [9,] -35 -76 [10,] -1 -84 [11,] -41 -42 [12,] -83 -96 [13,] -2 -29 [14,] -7 -21 [15,] -61 4
  • 19. Problem 1 Graphically show the distribution of molecular weights of molecules in an SD file. The molecular weight is stored in a field of the SD file. 1,2-Diaminoethane MOE2004 3D 6 3 0 0 0 0 0 0 0 0999 V2000 -0.6900 -0.6620 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 0.5850 -1.9590 0.8240 H 0 0 0 0 0 0 0 0 0 0 0 0 0.5350 1.5040 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0 0.6060 2.0500 0.8460 H 0 0 0 0 0 0 0 0 0 0 0 0 1.3040 0.8520 -0.0460 H 0 0 0 0 0 0 0 0 0 0 0 0 1 2 1 0 0 0 0 1 3 1 0 0 0 0 1 4 1 0 0 0 0 M END > <chem.name> 1,2-Diaminoethane > <molecular.weight> 60.0995 $$$$
  • 20. Object-oriented approach object SD file object Molecule 1 Molecule 2 Molecule n attributes fields name atoms The code for creating these objects is stored in sdparser.py, which can be imported and used by any scripts that need to parse SD files.
  • 21. Solution from sdparser import SDFile from rpy import * inputfile = "mddr_complete.sd" allmolweights = [] for molecule in SDFile(inputfile): molweight = molecule.fields['molecular.weight'] allmolweights.append(float(molweight)) r.png(file="molwt_r.png") r.hist(allmolweights,xlab="Mol. weights",main="MDDR", col="red") r.dev_off()
  • 22. Solution from sdparser import SDFile from rpy import * inputfile = "mddr_complete.sd" allmolweights = [] for molecule in SDFile(inputfile): molweight = molecule.fields['molecular.weight'] allmolweights.append(float(molweight)) r.png(file="molwt_r.png") r.hist(allmolweights,xlab="Mol. weights",main="MDDR", col="red") r.dev_off()
  • 23. Solution from sdparser import SDFile from rpy import * inputfile = "mddr_complete.sd" allmolweights = [] for molecule in SDFile(inputfile): molweight = molecule.fields['molecular.weight'] allmolweights.append(float(molweight)) r.png(file="molwt_r.png") r.hist(allmolweights,xlab="Mol. weights",main="MDDR", col="red") r.dev_off()
  • 24. Solution from sdparser import SDFile from rpy import * inputfile = "mddr_complete.sd" allmolweights = [] for molecule in SDFile(inputfile): molweight = molecule.fields['molecular.weight'] allmolweights.append(float(molweight)) r.png(file="molwt_r.png") r.hist(allmolweights,xlab="Mol. weights",main="MDDR", col="red") r.dev_off()
  • 25.
  • 26. Problem 2 Every molecule in an SD file is missing the name. To be compatible with proprietary program X, we need to set the name equal to the value of the field “chem.name”. (MISSING NAME!) MOE2004 3D 6 3 0 0 0 0 0 0 0 0999 V2000 -0.6900 -0.6620 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 0.5850 -1.9590 0.8240 H 0 0 0 0 0 0 0 0 0 0 0 0 0.5350 1.5040 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0 0.6060 2.0500 0.8460 H 0 0 0 0 0 0 0 0 0 0 0 0 1.3040 0.8520 -0.0460 H 0 0 0 0 0 0 0 0 0 0 0 0 1 2 1 0 0 0 0 1 3 1 0 0 0 0 1 4 1 0 0 0 0 M END > <chem.name> 1,2-Diaminoethane > <molecular.weight> 60.0995 $$$$
  • 27. Solution from sdparser import SDFile inputfile = "mddr_complete.sd" outputfile = "mddr_withnames.sd" for molecule in SDFile(inputfile): molecule.name = molecule.fields['chemical.name'] outputfile.write(molecule) inputfile.close() outputfile.close() print "We are the knights who say....SD!!!"
  • 28. Solution from sdparser import SDFile inputfile = "mddr_complete.sd" outputfile = "mddr_withnames.sd" for molecule in SDFile(inputfile): molecule.name = molecule.fields['chemical.name'] outputfile.write(molecule) inputfile.close() outputfile.close() print "We are the knights who say....SD!!!"
  • 29. Python and Java ● It's easy to use Java libraries from Python – using either Jython or JPype – see http://www.redbrick.dcu.ie/~noel/CDKJython.html Example: using the CDK to calculate the number of rings in a molecule (given a string variable containing CML) from jpype import * startJVM("jdk1.5.0_03/jre/lib/i386/server/libjvm.so") cdk = JPackage("org").openscience.cdk SSSRFinder = cdk.ringsearch.SSSRFinder CMLReader = cdk.io.CMLReader def getNumRings(molecule): # Convert to a CDK molecule reader = CMLReader(java.io.StringReader(molXmlValue)) chemFile = reader.read(cdk.ChemFile()) cdkMol = chemFile.getChemSequence(0).getChemModel(0).getSetOfMolecules().getMolecule(0) # Calculate the number of rings sssrFinder = SSSRFinder(cdkMol) sssr = sssrFinder.findSSSR().size() return sssr
  • 30. 3D visualisation ● VTK (Visualisation Toolkit) from Kitware – open source, freely available – scalar, tensor, vector and volumetric methods – advanced modeling techniques such as implicit modelling, polygon reduction, mesh smoothing, cutting, contouring, and Delaunay triangulation ● MayaVi – easy to use GUI interface to VTK, written in Python – can create input files and visualise them using Python scripts Demo
  • 31. Python Resources ● http://www.python.org ● Guido's Tutorial – http://www.python.org/doc/current/tut/tut.html ● O'Reilly's “Learning Python” or Visual Quickstart Guide to Python – Make sure it's Python 2.3 or 2.4 though ● For Windows, consider the Enthought edition – http://www.enthought.com/