1. Department of Electrical Engineering,
University of Engineering and Technology, Lahore
Protein Quantitation Pipeline For Top Down
Proteomics
(Hudiara Drain Case Study)
Group No: 2014-FYP-14
Project Advisor: Dr. Khalid Mahmood Ul Hasan
Undergraduate Final Year Project Presentation
Dated: 4th April, 2018
2. Team Introduction
Muhammad Ahsan Ali (Team Leader)
2014-EE-024
Specialization: Computer
Rimsha Nadeem
2014-EE-057
Specialization: Computer
Mujtaba Saboor
2014-EE-016
Specialization: Computer
Shifa Imran
2014-EE-158
Specialization: Computer
3. Problem Statement
• Lack of publicly available open-source, open-architecture
software for analysing mass spectrometric data using the
top-down approach.
• Limited research work on protein quantitation through
top-down proteomics.
4. Problem Statement
• Extremely dangerous heavy metal contamination in Hudiara
drain, Lahore.
• Water contaminated with heavy metals is used for drinking
and for irrigating crops, causing serious health effects in
the general public.
5. Problem Statement
• Pollution Load Contributed by Hudiara drain to River
Ravi: 141.5 tons/day[2]
• Its annual average discharge at its confluence with the
river Ravi is 178 cusecs[3]
• Bioremediation is at risk because antibiotics from
industrial effluents of both India and Pakistan are
entering the polluted Hudiara drain.
6. Heavy Metals Reported In Hudiara Drain
Effects of Metals on Human Health:
• Cr: Damage to liver, kidneys, nerve tissues,
cancers and skin irritation.
• Mn: Manganism, tremors, and increased
neurological disorders in children,
hyperactivity, memory issues.
• Fe: Acne, Eczema, Hemochromatosis,
which can lead to liver, heart and pancreatic
damage, as well as diabetes.
• Cu: Short-term exposure to high levels can
cause gastrointestinal distress. Long-term
exposure and severe cases of copper
poisoning can cause anaemia and disrupt
liver and kidney functions.
• Cd: Kidney Dysfunction and Osteoporosis.
• Pb: Young children, infants, and foetuses
are particularly vulnerable
• As: Skin, bladder and lung cancers, skin
lesions, cardiovascular disease,
neurotoxicity and diabetes.
7. Problem Statement: Hudiara Drain
Map of the Hudiara case-study area. Pinned locations are industries, the blue line is Hudiara Drain, the black line is
Sattukatla Drain, sites marked with arrows and stars are sampling sites, the yellow region is a residential area, and the
green region is Quaid-e-Azam Industrial Estate.
8. Proposed Solution: Hudiara Case Study
Bioremediation is the use of microbes to clean up
contaminated soil and groundwater.
(1) Biosorption: Binding of metals to the cellular surface.
(2) Bioaccumulation: Active uptake and accumulation in the cytoplasm/periplasmic space.
(3) Bioprecipitation: Metal ions combine with the anionic species excreted by microbial metabolism to give less toxic, insoluble metal salts.
(4) Biotransformation: Chemical modification of a toxic compound into a less toxic compound.
(5) Bioleaching: Conversion of insoluble metal sulfides to soluble metal sulfates.
9. Proposed Solution: Hudiara Case Study
• Analyse the protein content of the microbes present in
Hudiara drain that are responsible for heavy metal
fixation. This could unlock many ways to address heavy
metal contamination in Pakistan and around the world.
• Identify the proteins responsible for fixing each heavy
metal, and quantify the amount needed for
bioremediation substantial enough to make the water
non-injurious to human health.
10. Proposed Solution
• Develop a high-performance pipeline for protein MS/MS
mass spectra deconvolution, theoretical pattern
generation, protein identification and quantitation
for this purpose.
• Use Graphics Processing Units (GPUs) to implement the
computationally intensive algorithms.
• Implement the entire workflow as a RESTful API web
service, with its core built in C#, a front-end developed
using Angular 4 (a JavaScript framework), and an
MSSQL database at its back-end.
11. Ultimate Goal
• Opening a new paradigm of research by finding specific
proteins responsible for each heavy metal fixation.
• Using our results, help bioengineer novel ways to combat the
issue of heavy metal contamination employing the technique
of bioremediation.
• Make the entire pipeline open-source and open-architecture
so that the general public can build on it in future.
• Move on to protein analysis in milk, blood and other
substances for further expansion.
14. Web Architecture
• The algorithms are written in
C#, and specific hot paths are
moved to the GPU using CUDA C
(host and kernel approach).
• All input parameters, results and
other essentials are stored in the
database using the code-first
technique.
• The RESTful API is connected to
the Angular 4 front-end.
16. GPU Introduction
• The CPU (central processing unit) is considered the brain of a
computing device. It is general purpose: it can perform any
computation and handle a wide variety of tasks, whereas the GPU is a
specialized unit that performs certain tasks more efficiently.
• A GPU has hundreds of cores that can handle thousands of threads at a
time, which suits large volumes of data, while a CPU handles only a
few software threads at a time. A CPU thread switch costs
hundreds of cycles, whereas a GPU switches between threads at almost
no cost.
• Since our input files contain a very large number of mass-intensity
pairs, it is much more efficient to implement our algorithms on GPUs.
18. Algorithms
State-of-the-art algorithms have been designed under
these categories:
I. Theoretical Pattern Generation using Averagine[4].
II. Deconvolution of protein spectra using a modified
version of THRASH[5].
III. Extraction of missing peaks using Polynomial
Regression.
IV. Protein Quantitation using Spectral Counting[6].
V. Protein Quantitation using XIC[7].
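The two quantitation approaches in items IV and V can be illustrated with a minimal Python sketch. This is a generic illustration of the techniques, not the project's implementation: the function names, the input layouts and the trapezoidal integration are all assumptions made for the example.

```python
from collections import Counter

def spectral_counts(identifications):
    """Count identified MS/MS spectra per protein (label-free spectral counting).

    identifications: iterable of (scan_id, protein_name) pairs (assumed layout).
    """
    return Counter(protein for _scan, protein in identifications)

def xic_area(scans, target_mz, tol):
    """Trapezoidal area of the extracted ion chromatogram for target_mz +/- tol.

    scans: list of (retention_time, [(mz, intensity), ...]) sorted by time
    (assumed layout).
    """
    # Build the chromatogram: summed intensity near target_mz at each time point.
    trace = [(rt, sum(i for mz, i in peaks if abs(mz - target_mz) <= tol))
             for rt, peaks in scans]
    # Integrate the trace over retention time.
    return sum((t1 - t0) * (y0 + y1) / 2.0
               for (t0, y0), (t1, y1) in zip(trace, trace[1:]))
```

Spectral counting uses the number of identified spectra as a proxy for abundance, while the XIC approach integrates the signal of one ion over retention time.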
19. I. Theoretical Pattern Generator
• For an element with two isotopes, the binomial expansion (the two-term
case of the multinomial expansion) gives the intensities for a
specific number of atoms:
$$(p + q)^n = \sum_{x=0}^{n} C(n, x)\, p^{(n-x)} q^{x} \qquad \text{Eq(1)}$$
where $C(n, x) = \dfrac{n!}{(n-x)!\,x!}$,
$p$ = abundance of the lighter isotope,
$q$ = abundance of the heavier isotope,
$n$ = number of atoms.
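For small n, Eq(1) can be evaluated term by term. The following is a minimal Python sketch, assuming a two-isotope element; the function name is hypothetical, and the abundances 0.9893 and 0.0107 (approximate values for carbon) are used only for illustration.

```python
from math import comb

def binomial_pattern(n, p, q):
    """Terms of Eq(1): relative abundance of the species with x heavy atoms, x = 0..n."""
    return [comb(n, x) * p ** (n - x) * q ** x for x in range(n + 1)]

# Illustrative input: 100 atoms of a two-isotope element (assumed abundances).
pattern = binomial_pattern(100, 0.9893, 0.0107)
base = max(pattern)
normalised = [100.0 * v / base for v in pattern]  # base peak scaled to 100
```

Because p + q = 1, the terms sum to 1; scaling by the largest term gives intensities relative to a base peak of 100.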
20. I. Theoretical Pattern Generator (continued)
• For large molecules, evaluating the factorials directly becomes very
expensive, so the calculation is done in logarithms and the factorials are
replaced by Stirling's approximation. Taking the natural log of one term of
equation 1:
$$\ln\left[\frac{n!}{(n-x)!\,x!}\, p^{(n-x)} q^{x}\right] = \ln n! + (n-x)\ln p + x\ln q - \ln x! - \ln (n-x)!$$
where $\ln n! \approx n \ln n - n + \tfrac{1}{2}\ln(2\pi n)$.
• The exponential (anti-log) of this result gives the required theoretical
intensity for that particular term.
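The log-space evaluation can be sketched in Python as follows. This is an illustration under stated assumptions rather than the project's code: the function names are hypothetical, natural logarithms are used throughout, and Stirling's approximation is taken in its standard form, which is least accurate for small arguments.

```python
import math

def ln_factorial_stirling(n):
    """Stirling's approximation: ln n! ~ n ln n - n + 0.5 ln(2 pi n)."""
    if n < 2:
        return 0.0  # 0! = 1! = 1, so the logarithm is exactly 0
    return n * math.log(n) - n + 0.5 * math.log(2.0 * math.pi * n)

def log_intensity(n, x, p, q):
    """ln of one term of Eq(1), C(n, x) p^(n-x) q^x, computed in log space."""
    ln_c = (ln_factorial_stirling(n)
            - ln_factorial_stirling(x)
            - ln_factorial_stirling(n - x))
    return ln_c + (n - x) * math.log(p) + x * math.log(q)

def intensity(n, x, p, q):
    """Anti-log of the log-space result gives the theoretical intensity."""
    return math.exp(log_intensity(n, x, p, q))
```

Working with sums of logarithms avoids overflow from the very large factorials, and only the final result is exponentiated.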
21. II. Deconvolution
Flowchart of the deconvolution procedure (steps and decision points):
START
• Get .RAW file; extract mass list and intensities.
• Decision: end of mass/intensity list? (YES: End; NO: continue)
• Find local maxima within a user-defined window.
• Identify peak spacing (z).
• Decision: 1/z an integer? (YES/NO)
• Get molecular formula from Averagine and generate theoretical spectrum.
• Find goodness of fit between experimental and theoretical spectra.
• Decision: p-value less than threshold? (YES/NO)
• Subtract 1.0023 from the average mass.
• Create mass list of monoisotopic masses.
• Process original spectrum.
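Two of the flowchart steps, finding local maxima within a window and reading the charge from the isotope peak spacing, can be sketched in Python. This is a simplified illustration, not the modified THRASH implementation: the function names are hypothetical, and the quadratic-time neighbour scan is chosen for clarity rather than speed.

```python
def local_maxima(mz, intensity, window):
    """Indices of points that are the most intense within +/- window m/z."""
    peaks = []
    for i, (m, h) in enumerate(zip(mz, intensity)):
        # Intensities of all other points that fall inside the window.
        neighbours = [intensity[j] for j in range(len(mz))
                      if j != i and abs(mz[j] - m) <= window]
        if all(h > other for other in neighbours):
            peaks.append(i)
    return peaks

def charge_from_spacing(peak_mz):
    """Estimate charge z from the mean spacing of adjacent isotope peaks (about 1/z)."""
    gaps = [b - a for a, b in zip(peak_mz, peak_mz[1:])]
    mean_gap = sum(gaps) / len(gaps)
    return round(1.0 / mean_gap)
```

For example, isotope peaks spaced 0.5 m/z apart correspond to a charge of 2.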
22. Comparison With a Chinese Toolbox
• Results of our In-house
deconvolution tool
• Results of a Chinese research
group
C100 H200 O60 N50 S3
MASS INTENSITY
3156 75.7747
3157 100.0000
3158 74.9352
3159 40.8380
3160 17.8889
3162 6.6406
3163 2.1641
3164 0.6284
3165 0.1665
24. Applications
• Identification and quantitation of proteins is of critical
importance in the pharmaceutical industry.
• Diagnosis of diseases caused by a malfunctioning
protein.
• Beyond water pollution, many other case studies are
possible, since identifying the protein responsible for a
specific function can open broad avenues of research, for
example hemoglobin for oxygen transport.
• Analysis of the protein content of milk, thus
determining milk quality.
25. Audience
• All kinds of computer engineers, biologists, chemists,
doctors, pharmacists and bioengineers who work in any
kind of lab and want to test their results with the help of
mass spectrometry.
26. What Have We Done So Far?
• All the designed algorithms have been verified and
benchmarked in MATLAB.
• A MATLAB version of our toolbox is already available on
GitHub.
• Web service is live at: 203.135.63.99/perceptron_v1
• Hudiara samples have been collected from six different
sites and sent for mass spectrometry after the proper wet-lab
procedure; the raw data files are awaited.
27. What Have We Done So Far?
• Mass Tuner, PST and Spectral Counting are
implemented on GPU for higher performance.
28. GUI Developed
• Perceptron: A RESTful API Web-Service (GPU-based)
• The front-end developed using Angular 4 (a JavaScript framework).
• Core built using C#
• MSSQL database present at the back end.
29. GUI Developed
• Spectrum: A MATLAB toolbox
30. Future Deliverables
• Incorporate quantitation of labelled data samples, e.g.
SILAC, iTRAQ and ICAT.
• Interface multiple GPUs to prevent queued jobs from
stalling and to further speed up computation.
• Help innovate new ways to combat the issue of water
contamination employing bioengineering.
31. Gantt Chart
[Figure: Gantt chart of milestones achieved from 15 Feb 2017 to 1 April 2018 (58 weeks). Tasks: Deconvolution, Quantitation, Spectrum, Perceptron. Legend: GUI/front-end, GPU transfer, algorithm design. Horizontal axis: 0 to 70 weeks.]
32. References
[1] “Environmental Monitoring of River Ravi,” in EPA, pp. 1-86, 2009.
[2] M. T. Yamin, N. Ahmad, (2007). “Influence of Hudiara Drain
Water Irrigation on Trace Elements Load in Soil and Uptake by
Vegetables,” Journal of Applied Sciences and Environmental
Management [Online], vol. 11, no. 2, pp. 169-172.
https://doi.org/10.4314/jasem.v11i2.55029
33. References
[3] M. S. Nobile, “cuTauLeaping: A GPU-Powered Tau-Leaping
Stochastic Simulator for Massive Parallel Analyses of Biological
Systems”, in PLOS ONE, 2014.
[4] D. Valkenborg, I. Jansen, T. Burzykowski, “A Model-Based
Method for the Prediction of the Isotopic Distribution of
Peptides”, in Journal of American Society for Mass Spectrometry,
2008.
34. References
[5] D. M. Horn, “Automated reduction and interpretation of high
resolution electrospray mass spectra of large molecules”, in Journal
of American Society for Mass Spectrometry, 2000.
[6] K. Aoshima, “A Simple Peak Detection and Label Free
Quantitation Algorithm for Chromatography Mass Spectrometry”,
in BMC, 2014.
[7] D. Fermin, “Abacus: A computational Tool for Extracting and
Pre-processing Spectral Count Data” in Proteomics, vol. 11, no. 7,
2011.