Talk given by Xevi Biarnés (IQS-URL) at the 19th edition of the Trobada de l'Anella Científica (TAC), held at Tecnocampus Mataró-Maresme on 30 June 2015.
The presentation covers large-scale computation and data analysis for the design of enzymes with applications in the biotechnology industry.
Massive computation and data analysis for the design of enzymes with applications in the biotechnology industry
1. TAC’2015 30/juny/2015 Xevi Biarnés @xevibiarnes
Massive computation and data analysis for the design
of enzymes with biotechnological applications
Xevi Biarnés
@xevibiarnes
xevi.biarnes@iqs.edu
Department of Bioengineering
IQS School of Engineering
TAC’2015 Meeting
Mataró, 30 June 2015
2. TAC’2015 30/juny/2015 Xevi Biarnés @xevibiarnes
IQS School of Engineering
IQS School of Management
Via Augusta, 390
Barcelona
3. TAC’2015 30/juny/2015 Xevi Biarnés @xevibiarnes
Bioengineering Department
Laboratory of Biochemistry
Laboratory of Biomaterials
Laboratory of Tissue Engineering
Laboratory of Microbiology
Laboratory of Bioprocesses
Degree in Biotechnology
Master’s of Science in Bioengineering
PhD in Bioengineering
4. TAC’2015 30/juny/2015 Xevi Biarnés @xevibiarnes
Laboratory of Biochemistry
Main R&D topics:
Protein engineering and enzymology of glycosidases and glycosyltransferases
Therapeutic targets in infectious diseases
Amyloidogenic proteins in neurodegenerative diseases
Biocatalysis: enzyme redesign, directed evolution of enzymes
Metabolic Engineering for the production of glycoglycerolipids
Headed by Prof. Antoni Planas
5 permanent staff
8 PhD students
6 MSc students
4 undergrad students
2 research assistants
5. TAC’2015 30/juny/2015 Xevi Biarnés @xevibiarnes
Bioinformatics and Molecular Modelling Unit
Laboratory of Biochemistry
BIOMMIQS
Bioinformatics for comparative
analysis of genomic sequences
Protein Structure Prediction
In silico tools to assist in
experimental Protein Engineering
Simulation of small and macro
molecules conformational dynamics
6. TAC’2015 30/juny/2015 Xevi Biarnés @xevibiarnes
BIOMMIQS @IQS and Anella Científica
BIOMMIQS
Direct access to
consortium services:
CBUC/CCUC
EDUROAM
8. TAC’2015 30/juny/2015 Xevi Biarnés @xevibiarnes
Re-Evolution of Genomes
optimization of the genetic codes of
living organisms to adapt to their
living environments
9. TAC’2015 30/juny/2015 Xevi Biarnés @xevibiarnes
Burst of Genomic Data
• Growth of the GenBank database
http://www.ncbi.nlm.nih.gov/genbank/statistics
700 GBytes
of raw data
10. TAC’2015 30/juny/2015 Xevi Biarnés @xevibiarnes
ACTAACCCCTCAGTTTTTGTCAAGCTGTCAGACCCTCCAGCGCAGGTTTCAGTGCC
ATTCATGTCACCTGCGAGTGCTTATCAATGGTTTTATGACGGATATCCCACATTCGGA
GAACACAAACAGGAGAAAGATCTTGAATACGGGGCATGTCCTAATAACATGATGG
GCACGTTCTCAGTGCGGACTGTGGGGACCTCCAAGTCCAAGTACCCTTTAGTGGT
AGGATCTACATGAGAATGAAGCACGTCAGGGCGTGGATACCTCGCCCGATGCGTA
CCAGAACTACCTATTCAAAGCCAACCCAAATTATGCTGGCAACTCCATTAAGCCAAC
TGGTGCCAGTCGTACAGCGATCACCACTCTTGGGAAATTTGGACAACAGTCTGGG
GCTATTTATGTGGGCAACTTTAGAGTGGTCAACCGACATCTTGCCACTCACAATGAT
TGGGCAAATCTTGTTTGGGAAGACAGCTCTCGCGACTTGCTCGTGTCATGAACCAC
CGCCCAAGGCTGTGACACGATTGCTCGTTGCGATTGCCAGACAGGGGTGTACTAC
GTAACTCGATGAGAAAACACTACCCAGTCAGTTTTTCAAAACCCAGCCTGATCTATG
TAGAGGCTAGCGAGTATTACCCAGCCAGGTACCAATCACATCTCATGCTCGCACAG
GGTCACTCAGAACCTGGTGATTGCGGTGGTATCCTTAGATGCCAACATGGCGTCGT
CGGCATAGTGTCTACTGGTGGTAATGGGCTCGTTGGCTTTGCAGACGTTAGAGACC
TCTTGTGGTTAGATGAAGAAGCTATGGAACAGGGCGTGTCCGACTACATCAAGGG
TCTCGGAGATGCTTTTGGAACAGGCTTCACTGACGCAGTCTCAAGGGAGGTTGAA
GCTCTCAAGAACTATCTTATAGGGTCTGAAGGAGCAGTTGAGAAAATCTTGAAAAA
TCTTATTAAACTAATCTCTGCACTGGTATTGTGATCAGAAGTGATTACGACATGGTTA
Where is
the information?
12. TAC’2015 30/juny/2015 Xevi Biarnés @xevibiarnes
Proteins are the Machinery of the Cells
genes (DNA) only store the information
proteins (amino acids) perform the function
The Inner Life of the Cell (youtube)
XVIVO Scientific Animation
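The step from gene to protein mentioned on this slide can be sketched in a few lines. This is a minimal illustration, assuming the standard genetic code (NCBI translation table 1) and a coding sequence read in frame from its first base; the function name `translate` and the example sequence are hypothetical and not taken from the talk.

```python
# Standard genetic code (NCBI table 1), with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a DNA coding sequence (frame 0) up to the first stop codon."""
    dna = dna.upper()
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        # "X" marks any codon that contains a non-ACGT character.
        residue = CODON_TABLE.get(dna[pos:pos + 3], "X")
        if residue == "*":  # stop codon ends translation
            break
        protein.append(residue)
    return "".join(protein)


# Hypothetical 12-base example: ATG GCC AAA TAA
print(translate("ATGGCCAAATAA"))  # MAK
```

Real genome annotation also has to locate the genes and choose the reading frame, which this sketch does not attempt.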
13. TAC’2015 30/juny/2015 Xevi Biarnés @xevibiarnes
Proteins are 3D Objects
nanometer size
>sequence_of_aminoacids
MYCSASTCTCSTTATRHYGCKLMNDSSCRFGH
KLISPRDTEDFSGFRTCSKLIPSCSFACVIPL
PSFACEERERWQSRTNCVISCRTEDPLKISCF
GRSRACGRSTTRSGCSPLYPLREDTSWASDFR
The 3D structure and function of proteins are dictated
by their amino acid sequence
14. TAC’2015 30/juny/2015 Xevi Biarnés @xevibiarnes
Proteins are Dynamic 3D Objects
hemoglobin:
oxygen
transporter
in blood
eppur si muove
a = F / m
x(t) = x₀ + v₀·t + ½·a·t²
protein motion can be simulated on the computer:
Molecular Dynamics (MD)
typical simulation:
1 billion steps!
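The two formulas on this slide are enough to sketch one Molecular Dynamics step. The example below is a minimal sketch, not the code used at BIOMMIQS: it assumes the velocity Verlet integrator (a common choice that the slide does not name) and a toy harmonic spring force in one dimension in place of a real protein force field.

```python
import math


def force(x, k=1.0):
    """Toy force F = -k*x (assumption: stands in for a real force field)."""
    return -k * x


def velocity_verlet(x, v, mass, dt, n_steps):
    """Propagate one particle in 1D and return the positions and final velocity."""
    a = force(x) / mass                        # a = F / m
    positions = [x]
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt * dt     # x(t) = x0 + v0*t + 1/2*a*t^2
        a_new = force(x) / mass                # acceleration at the new position
        v = v + 0.5 * (a + a_new) * dt         # average old and new acceleration
        a = a_new
        positions.append(x)
    return positions, v


positions, v_final = velocity_verlet(x=1.0, v=0.0, mass=1.0, dt=0.01, n_steps=1000)
# For this toy spring the exact answer is cos(t), so the error can be checked.
print(f"x(10) = {positions[-1]:.4f}, exact = {math.cos(10.0):.4f}")
```

A production simulation repeats this loop for every atom in three dimensions, which is where the billion steps and the CPU cost come from.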
15. TAC’2015 30/juny/2015 Xevi Biarnés @xevibiarnes
Computer Simulations of Biological Processes @ BIOMMIQS
Implementation of
computational algorithms
based on Molecular Dynamics
(metadynamics)
that enhance the simulation of
biologically relevant processes
BIOMMIQS
Protein Folding
Protein Aggregation
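As an illustration of how metadynamics enhances sampling, the sketch below deposits small repulsive Gaussian "hills" at positions already visited, so the system is gradually pushed out of the energy well it starts in. Everything here is an assumption made for illustration: a one-dimensional double-well potential, overdamped Langevin dynamics with unit friction, and arbitrary parameter values. It is a textbook toy, not the implementation used in the work presented.

```python
import math
import random


def well_force(s):
    """Force from the toy double-well potential U(s) = (s^2 - 1)^2."""
    return -4.0 * s * (s * s - 1.0)


def bias_force(s, centers, height, width):
    """Force from the accumulated Gaussian hills, F = -dV_bias/ds."""
    total = 0.0
    for c in centers:
        d = s - c
        total += height * d / width ** 2 * math.exp(-d * d / (2.0 * width ** 2))
    return total


def run_metadynamics(n_steps=10000, dt=1e-3, kT=0.1, height=0.05,
                     width=0.1, stride=100, seed=0):
    """Overdamped Langevin dynamics on s, adding one hill every `stride` steps."""
    rng = random.Random(seed)
    s = -1.0                      # start in the left well
    centers = []                  # positions of the deposited hills
    trajectory = []
    for step in range(1, n_steps + 1):
        f = well_force(s) + bias_force(s, centers, height, width)
        s += f * dt + math.sqrt(2.0 * kT * dt) * rng.gauss(0.0, 1.0)
        trajectory.append(s)
        if step % stride == 0:
            centers.append(s)     # history-dependent bias grows here
    return centers, trajectory


centers, trajectory = run_metadynamics()
print(f"{len(centers)} hills deposited; "
      f"s ranged from {min(trajectory):.2f} to {max(trajectory):.2f}")
```

In a real application the coordinate `s` would be a collective variable of the protein, such as a measure of how folded it is.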
16. TAC’2015 30/juny/2015 Xevi Biarnés @xevibiarnes
Computer Simulations of Biological Processes @ BIOMMIQS
Prion protein folding
Prion protein (the causative agent of spongiform encephalopathy)
is an unstable protein that can adopt different structures.
One of these structures tends to form precipitates in the central
nervous system tissue, leading to neurodegeneration.
17. TAC’2015 30/juny/2015 Xevi Biarnés @xevibiarnes
Computer Simulations of Biological Processes @ BIOMMIQS
Prion protein folding
The structural determinants of prion protein
stability were identified in-silico by extensive
computer simulations.
Benetti F. and Biarnés X. et al, JMB 2014
The simulations consumed
1,000,000 hours of total CPU time
and generated
1 TeraByte of data.
18. TAC’2015 30/juny/2015 Xevi Biarnés @xevibiarnes
Computer Simulations of Biological Processes @ BIOMMIQS
Amyloid-β peptide aggregation
Amyloid-β is an intrinsically disordered
protein. The final segment of this
protein can form aggregates. These
aggregates are associated with
Alzheimer’s disease.
18 molecules of the final Amyloid-β
segment were simulated, and a nascent
fibril was detected.
Baftizadeh F. et al, PRL 2013
19. TAC’2015 30/juny/2015 Xevi Biarnés @xevibiarnes
Technical Details of Computer Simulations
• Highly CPU-demanding
– Millions of CPU hours → supercomputers
• Big Data Storage
– 100 GBytes per simulation (3-4 months) → cloud storage?
• Data Transfer
– 1 GByte of data generated daily
– Need to transfer locally for visualization → efficient communications
– Current download rates:
• From CESVIMA (UPM Madrid) to IQS: 5.3 MBytes/s
• From BSC (UPC Barcelona) to IQS: ??
• From SISSA (Trieste, Italy) to IQS: 9.5 MBytes/s
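The figures on this slide translate into transfer times with simple arithmetic. The sketch below assumes 1 GByte = 1000 MBytes and that the quoted download rates are sustained for the whole transfer; the function name `transfer_seconds` is hypothetical.

```python
# Download rates quoted on the slide, in MBytes per second.
RATES_MB_PER_S = {
    "CESVIMA (UPM Madrid)": 5.3,
    "SISSA (Trieste, Italy)": 9.5,
}


def transfer_seconds(size_gbytes, rate_mb_per_s):
    """Time to move `size_gbytes` at a sustained rate (assumes 1 GByte = 1000 MBytes)."""
    return size_gbytes * 1000.0 / rate_mb_per_s


for site, rate in RATES_MB_PER_S.items():
    daily_min = transfer_seconds(1, rate) / 60.0       # 1 GByte generated daily
    full_hours = transfer_seconds(100, rate) / 3600.0  # 100 GBytes per simulation
    print(f"{site}: daily output in {daily_min:.1f} min, "
          f"full simulation in {full_hours:.1f} h")
```

Under these assumptions the daily output moves in a few minutes, while a complete simulation takes several hours per site.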
20. TAC’2015 30/juny/2015 Xevi Biarnés @xevibiarnes
Technical Details of Computer Simulations
• Visualization
– Renders are done off-line in local computers
• Visualization on-line?
– Remote desktops are one solution, but not a satisfactory one
– “Streaming” of xyz coordinates could be a solution?
C 3.23 2.22 4.34
O 2.31 1.34 3.41
H 2.88 2.35 5.32
C 3.21 2.11 1.22
…
…
30000-50000 atoms
× 100000 frames
Minimal Atomic Coordinates File
atom x y z
3D renders are generated by specific software
based on an atomic coordinate file containing
the xyz coordinates of each atom
in the protein structure
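To get a feel for what streaming xyz coordinates would involve, the sketch below parses the minimal "atom x y z" lines shown on this slide, packs only the coordinates as 32-bit floats, and estimates the size of a full trajectory. The binary layout (little-endian float32, element labels sent once and not repeated per frame) is an assumption made for illustration, not a format proposed in the talk.

```python
import struct


def parse_xyz_line(line):
    """Split one 'atom x y z' line into an element label and three floats."""
    element, x, y, z = line.split()
    return element, float(x), float(y), float(z)


def pack_frame(atoms):
    """Pack the coordinates of one frame as little-endian float32 values."""
    flat = [coord for _, x, y, z in atoms for coord in (x, y, z)]
    return struct.pack(f"<{len(flat)}f", *flat)


def trajectory_bytes(n_atoms, n_frames, bytes_per_coord=4):
    """Raw size of a trajectory: 3 coordinates per atom per frame."""
    return n_atoms * 3 * bytes_per_coord * n_frames


frame_text = """C 3.23 2.22 4.34
O 2.31 1.34 3.41
H 2.88 2.35 5.32
C 3.21 2.11 1.22"""

atoms = [parse_xyz_line(line) for line in frame_text.splitlines()]
payload = pack_frame(atoms)
print(len(payload), "bytes for one 4-atom frame")

for n_atoms in (30000, 50000):
    size_gb = trajectory_bytes(n_atoms, 100000) / 1e9
    print(f"{n_atoms} atoms x 100000 frames: {size_gb:.0f} GBytes uncompressed")
```

With these assumptions a single frame of a 30000-50000 atom system is well under 1 MByte, so it is the number of frames that drives the total volume.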
21. TAC’2015 30/juny/2015 Xevi Biarnés @xevibiarnes
Proteins as Chemical Machines: ENZYMES
Chemical transformations rule our life
6·CO2 + 6·H2O → C6H12O6 + 6·O2
Enzymes decrease the activation
energy required for a chemical
transformation: they act as catalysts
PHOTOSYNTHESIS
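How much a lower activation energy matters can be estimated with the Arrhenius equation, k = A·exp(−Ea / (R·T)). The sketch below assumes that the catalysed and uncatalysed reactions share the same pre-exponential factor A, so the speed-up depends only on the reduction in activation energy. The 20 kJ/mol figure is an arbitrary illustrative value, not a number from the talk.

```python
import math

R = 8.314462618  # gas constant, J/(mol·K)


def rate_enhancement(delta_ea_kj_per_mol, temperature_k=298.15):
    """Ratio k_catalysed / k_uncatalysed for a given drop in activation energy.

    Assumes both reactions follow Arrhenius with the same pre-exponential factor.
    """
    return math.exp(delta_ea_kj_per_mol * 1000.0 / (R * temperature_k))


# Illustrative value only: lowering the barrier by 20 kJ/mol at 25 °C.
print(f"speed-up: {rate_enhancement(20.0):.0f}x")
```

Because the dependence is exponential, even modest changes in the barrier produce large changes in rate.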
22. TAC’2015 30/juny/2015 Xevi Biarnés @xevibiarnes
Proteins as “bio”catalysts (enzymes)
Protein structures are tightly
tuned to accommodate their
natural ligands.
Maximum catalytic efficiency of
enzymes is attained, in part, by the
binding forces in the active site.
23. TAC’2015 30/juny/2015 Xevi Biarnés @xevibiarnes
Industrial applications of enzymes
Amylases
production of sugars from starch in
syrup manufacturing
Glucanases
starch degradation prior to
fermentation in beer production
Proteases
Cellulases
cellulose degradation prior to fermentation in
bioethanol production
Lipases
esterification of lipids in biodiesel production
Amylases, Xylanases, Cellulases, Ligninases
starch degradation to lower viscosity, aiding
sizing and coating paper
24. TAC’2015 30/juny/2015 Xevi Biarnés @xevibiarnes
Industrial applications of enzymes
PRODUCTION OF NON-NATURAL ADDED-VALUE
COMPOUNDS
Pharmaceuticals Pigments Biomaterials
Complementing traditional chemical industry
···
- Process optimization
- Green chemistry
25. TAC’2015 30/juny/2015 Xevi Biarnés @xevibiarnes
Enzymes to Produce Added-Value Compounds
natural
compound
novel
compound
there is room for enzyme optimization
by PROTEIN ENGINEERING
Natural enzymes are not optimized for non-natural compounds
26. TAC’2015 30/juny/2015 Xevi Biarnés @xevibiarnes
The Protein Engineering Dilemma
MSDTAGSPWFSHSLKRNQDFGFYY
SDFCNARSDTPQSCWREGQNESDR
QTAVWPYRTSCNMLKCSRYTCVPM
Protein Engineering can be guided by
Computer Simulations and Genomic-Data Mining
27. TAC’2015 30/juny/2015 Xevi Biarnés @xevibiarnes
Protein Engineering by Data Mining @ BIOMMIQS
Setting up an
integrative platform
to assist in protein engineering
experiments
BIOMMIQS
The platform is based on biological data integration
from different database sources
and complemented with computer simulations
28. TAC’2015 30/juny/2015 Xevi Biarnés @xevibiarnes
Protein Engineering by Data Mining @ BIOMMIQS
UNIPROT: 50,000,000 protein sequences (http://uniprot.org)
PDB: 110,000 protein structures (http://pdb.org)
PFAM: 16,000 protein functions (http://pfam.xfam.org)
CAZY: 340,000 enzymes active on carbohydrates (http://cazy.org)
GENBANK: 185,000,000 genomic sequences (http://www.ncbi.nlm.nih.gov/genbank/)
29. TAC’2015 30/juny/2015 Xevi Biarnés @xevibiarnes
Protein Engineering by Data Mining @ BIOMMIQS
Protein engineering of chitin deacetylases for the
biotechnological production of chitosan
http://nano3bio.eu
from chitin
to chitosan
30. TAC’2015 30/juny/2015 Xevi Biarnés @xevibiarnes
Protein Engineering by Data Mining @ BIOMMIQS
Natural chitin deacetylase enzymes producing different chitosans
Andrés E. et al, Angew Chem Intl Ed 2014
Engineering of a non-natural
chitin deacetylase to produce
new-to-nature chitosans
31. TAC’2015 30/juny/2015 Xevi Biarnés @xevibiarnes
Technical Details of Data Mining
• Public database sizes
– GENBANK 700 GBytes
– UNIPROT 27 GBytes
– PDB 373 GBytes
– PFAM 195 GBytes
– No local copies! For general purposes, public web services are used:
• BLAST
• JMOL
• HMMSEARCH
• Public databases are updated regularly (weekly)
– Need to update local copies → mirrors?
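Whether weekly local mirrors are practical can be roughly estimated from the sizes listed on this slide. The sketch below assumes a full re-download of every database on each update (no incremental synchronisation) and borrows the 5.3 MBytes/s rate that the talk reports for a different link (CESVIMA to IQS). Both are assumptions, and real mirror servers may behave very differently.

```python
# Database sizes from the slide, in GBytes.
DATABASE_SIZES_GB = {"GENBANK": 700, "UNIPROT": 27, "PDB": 373, "PFAM": 195}

HOURS_PER_WEEK = 7 * 24


def mirror_refresh_hours(sizes_gb, rate_mb_per_s):
    """Hours to re-download every database in full (assumes 1 GByte = 1000 MBytes)."""
    total_mb = sum(sizes_gb.values()) * 1000.0
    return total_mb / rate_mb_per_s / 3600.0


total_gb = sum(DATABASE_SIZES_GB.values())
hours = mirror_refresh_hours(DATABASE_SIZES_GB, rate_mb_per_s=5.3)  # assumed rate
print(f"{total_gb} GBytes in total, about {hours:.0f} h per full refresh "
      f"({100.0 * hours / HOURS_PER_WEEK:.0f}% of a week)")
```

Under these assumptions a full weekly refresh fits within the week but occupies the link for a large part of it.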
32. TAC’2015 30/juny/2015 Xevi Biarnés @xevibiarnes
Bioinformatics and Molecular Modelling Unit
Laboratory of Biochemistry
BIOMMIQS
Bioinformatics for comparative
analysis of genomic sequences
Protein Structure Prediction
In silico tools to assist in
experimental Protein Engineering
Simulation of small and macro
molecules conformational dynamics
33. TAC’2015 30/juny/2015 Xevi Biarnés @xevibiarnes
Massive computation and data analysis for the design
of enzymes with biotechnological applications
Xevi Biarnés
@xevibiarnes
xevi.biarnes@iqs.edu
Department of Bioengineering
IQS School of Engineering
TAC’2015 Meeting
Mataró, 30 June 2015