Summer internship report
FAiS, Human Genome Center,
The University of Tokyo
Vaibhav Kulshrestha
2018
Summary
1. Studied Non-negative matrix factorization (NMF)
2. Applied it to JFCR39 and DepMap-Avana datasets
3. Learnt how to use a supercomputer
4. Made a Python package - BigNMF
About NMF
Non-negative matrix factorization (NMF) refers to a set of algorithms which factorize a
matrix X into two matrices W and H such that X ≈ WH and all three matrices are
non-negative.
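The factorization above can be written as an optimization problem. The block below shows one common formulation, a Frobenius-norm objective with multiplicative update rules. The slides do not say which objective or update rule was used, so this is an assumption. The symbols n, m and the rank k are introduced here for illustration, the circle denotes element-wise multiplication, and the fractions are element-wise divisions.

```latex
\min_{W \ge 0,\; H \ge 0} \; \lVert X - WH \rVert_F^2,
\qquad X \in \mathbb{R}_{\ge 0}^{n \times m},\;
W \in \mathbb{R}_{\ge 0}^{n \times k},\;
H \in \mathbb{R}_{\ge 0}^{k \times m}

H \leftarrow H \circ \frac{W^{\top} X}{W^{\top} W H},
\qquad
W \leftarrow W \circ \frac{X H^{\top}}{W H H^{\top}}
```

Because every factor in the updates is non-negative, W and H stay non-negative as long as they are initialized that way.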
This is a very versatile algorithm and finds use in many fields, one of which is
computational biology as the resultant factorized matrices are easy to interpret
and inspect.
There are many variations of this algorithm, such as Sparse NMF and Integrative NMF.
This paper popularized the use of NMF.
Getting to know NMF
I started my internship by reading about NMF and its uses for the first few days.
Then, I studied different modifications of NMF and why they were made.
After this, I implemented the algorithm on my own. I used Standard and Sparse
NMF.
To fine tune my implementation, I tried the algorithm on simulated/toy data and
made changes as necessary.
This was to ensure that the algorithm would give a correct factorization when I used it
on real data.
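A check on toy data like the one described above might look like the following minimal sketch. It is not the author's implementation. It assumes Standard NMF with multiplicative updates on a Frobenius loss, and the function name `nmf`, the matrix sizes and the iteration count are all chosen here for illustration.

```python
import numpy as np


def nmf(X, rank, n_iter=500, eps=1e-9, seed=0):
    """Standard NMF via multiplicative updates (assumed update rule)."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    # Random non-negative starting point
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        # eps in the denominators guards against division by zero
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Toy data: a non-negative matrix that is exactly rank 3 by construction
rng = np.random.default_rng(42)
W_true = rng.random((30, 3))
H_true = rng.random((3, 20))
X = W_true @ H_true

W, H = nmf(X, rank=3)
rel_error = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
print(f"relative reconstruction error: {rel_error:.4f}")
```

Since the toy matrix has a known rank, a small relative error at that rank is a quick sanity check before moving to real data.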
Project #1 - Biomarker discovery using
JFCR39 dataset
About the JFCR39 dataset
I worked on biomarker discovery using NMF.
This dataset contains information about cell
lines and their response to 69 drugs, mutation
status of 7 genes and protein expression of 19
proteins.
1. The drugs were a mix of chemotherapeutic
and molecular targeted drugs.
2. The mutations are either oncogenic or
tumor suppressor.
3. The proteins in the database can be
classified into 3 broad pathways - PI3K/Akt
pathway, mTOR pathway and MEK
pathway.
Workflow
1. Since this was a multi-omics dataset, I had to use Joint NMF, which is the
combined factorization of multiple matrices that share one dimension.
2. I chose Standard NMF instead of Sparse NMF as Standard NMF gave better
clustering at all ranks.
3. Then I found the most suitable rank for analysis (3) with the help of
consensus matrices and cophenetic correlation (the errors decreased with
increasing rank).
4. I also read about the different types of drugs and their mechanisms of action,
following which I manually classified the genes and proteins in the data for
verification of NMF results.
5. After fixing rank 3, I found the output matrices and then analysed them.
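The Joint NMF idea in step 1 can be sketched as follows. This is a minimal illustration, not the author's code. It assumes the shared dimension is the cell lines (rows), so one matrix W is shared and each data type gets its own H. The function name `joint_nmf`, the multiplicative updates and the random stand-in data are assumptions. The column counts 69, 7 and 19 come from the dataset description, and 39 rows is assumed from the dataset name.

```python
import numpy as np


def joint_nmf(Xs, rank, n_iter=500, eps=1e-9, seed=0):
    """Factorize several matrices with a shared row dimension.

    Xs maps a name to a non-negative array; all arrays have the same
    number of rows. Returns the shared W and a dict of per-matrix H.
    """
    rng = np.random.default_rng(seed)
    n = next(iter(Xs.values())).shape[0]
    W = rng.random((n, rank)) + eps
    Hs = {k: rng.random((rank, X.shape[1])) + eps for k, X in Xs.items()}
    for _ in range(n_iter):
        # Each H is updated against its own data matrix
        for k, X in Xs.items():
            Hs[k] *= (W.T @ X) / (W.T @ W @ Hs[k] + eps)
        # The shared W pools the contributions of all data matrices
        num = sum(X @ Hs[k].T for k, X in Xs.items())
        den = W @ sum(Hs[k] @ Hs[k].T for k in Xs) + eps
        W *= num / den
    return W, Hs


# Random stand-in data with the shapes described for JFCR39 (illustrative only)
rng = np.random.default_rng(1)
Xs = {
    "drugs": rng.random((39, 69)),
    "mutations": rng.random((39, 7)),
    "proteins": rng.random((39, 19)),
}
W, Hs = joint_nmf(Xs, rank=3, n_iter=200)
print(W.shape, {k: H.shape for k, H in Hs.items()})
```

With equal weights on every data type, this is equivalent to running Standard NMF on the matrices concatenated column-wise. Keeping them separate makes it easier to weight or inspect each data type on its own.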
Figures
Joint NMF consensus matrices (panels: Rank 2, Rank 3, Rank 4)
Consensus matrices for rank 3 were the most clearly clustered. The 4 matrices
(clockwise from top left) are for cell lines, drugs, mutations and proteins.
Cophenetic correlation plot: cophenetic correlation (higher is better) was the
highest for rank 3 at 0.97.
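Consensus matrices and the cophenetic correlation can be computed along the lines of the sketch below. The slides do not describe the exact procedure, so this assumes the usual consensus-clustering recipe: repeat the factorization with different random starts, record which items land in the same cluster, average, and then compare the resulting distances with those of an average-linkage tree. The function names and the hard-coded label runs are illustrative, and SciPy is assumed to be available.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.spatial.distance import squareform


def consensus_matrix(label_runs):
    """Fraction of runs in which each pair of items shares a cluster."""
    label_runs = np.asarray(label_runs)
    n_runs, n_items = label_runs.shape
    consensus = np.zeros((n_items, n_items))
    for labels in label_runs:
        # Connectivity matrix of one run: 1 where the labels match
        consensus += labels[:, None] == labels[None, :]
    return consensus / n_runs


def cophenetic_correlation(consensus):
    """Cophenetic correlation of the distances 1 - consensus."""
    dist = 1.0 - consensus
    condensed = squareform(dist, checks=False)
    tree = linkage(condensed, method="average")
    coeff, _ = cophenet(tree, condensed)
    return coeff


# Cluster labels from repeated runs (in practice: argmax over NMF factors).
# Label numbering may differ between runs; only co-membership matters.
stable_runs = [[0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 2], [2, 2, 1, 1, 0, 0]]
noisy_runs = [[0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 2], [0, 0, 1, 2, 2, 0]]

coeff_stable = cophenetic_correlation(consensus_matrix(stable_runs))
coeff_noisy = cophenetic_correlation(consensus_matrix(noisy_runs))
print(coeff_stable, coeff_noisy)
```

Running this for each candidate rank and comparing the coefficients is one way to arrive at a choice such as rank 3.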
Results/outputs
Clustered output matrices for JFCR39, rank 3 (panels: Cell lines, Drugs, Mutations, Proteins)
Cluster labels shown in the figure:
0 - PI3K inhibitors, 1 - Chemotherapeutic, 2 - Akt inhibitors
0 - Akt, 1 - PI3K + mTOR, 2 - Mixed
0 - TS (tumor suppressor), 2 - Oncogenic
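Cluster numbers such as 0, 1 and 2 can be read off a factor matrix by giving each row to its strongest component. The slides do not say how cluster membership was assigned, so the rule below is an assumption, and the small matrix is made-up data for illustration.

```python
import numpy as np


def assign_clusters(W):
    """Assign each row to the component with the largest weight."""
    return np.argmax(W, axis=1)


# Made-up factor matrix: 4 items, 3 components
W = np.array([
    [0.9, 0.1, 0.0],
    [0.2, 0.7, 0.1],
    [0.0, 0.3, 0.8],
    [0.6, 0.3, 0.1],
])
labels = assign_clusters(W)
# Reorder rows so members of the same cluster sit together when plotted
order = np.argsort(labels, kind="stable")
print(labels)
print(W[order])
```

For a matrix H with components along the rows, the same rule applies column-wise with `axis=0`.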
Analysis
1. PTEN mutants are sensitive to PI3K inhibitors.
2. PTEN mutation affects Akt phosphorylation.
3. mTOR is downstream of PI3K and Akt, yet mTOR and downstream proteins are not
affected by PTEN mutation - other proteins?
4. PTEN mutation can be a good biomarker for sensitivity to PI3K inhibitors.
5. PI3K inhibitors and mTOR inhibitors cluster together, suggesting similar mechanisms of
action.
6. Chemotherapeutic drugs show no relation to any mutation.
7. Some chemotherapeutic drugs are clustered with Akt inhibitors and PI3K inhibitors -
similar effects?
8. Mutations occur in PIK3C subunits, KRAS and BRAF, and they affect a wide range of
pathways.
9. Cell lines in each cluster are sensitive to the drugs in the respective cluster.
Project #2 - Tumor dependency using
DepMap CRISPR Avana dataset
About the DepMap - CRISPR Avana dataset
DepMap is a repository of data on tumors and their dependencies, intended to support the
development of precision treatments.
The data measured the survival of each cell line after gene knockout using CRISPR.
It contained cell lines and their sensitivities to the knockout of different genes. The more negative
the value, the more dependent the cell line was on the gene.
We wanted to find tumor dependencies and group cancers based on them.
https://depmap.org/portal/
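Because NMF needs non-negative input while these scores are mostly meaningful when negative, some transformation is required before factorizing. The slides do not say which one was used. The sketch below shows one plausible option, flipping the sign and clipping at zero so that larger values mean stronger dependency. The function name and the sample scores are made up for illustration.

```python
import numpy as np


def dependency_strength(scores):
    """Map scores (negative = dependent) to non-negative strengths."""
    scores = np.asarray(scores, dtype=float)
    # Flip the sign, then set anything that was positive to zero
    return np.clip(-scores, 0.0, None)


# Made-up scores: rows are cell lines, columns are genes
scores = np.array([
    [-1.2, 0.1, -0.3],
    [0.0, -0.8, 0.4],
])
X = dependency_strength(scores)
print(X)
```

An alternative would be to shift the whole matrix by its minimum, which keeps the positive values but changes what a zero entry means.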
Workflow
1. Initially chose rank 3 again due to good clustering and high cophenetic
correlation.
2. But since it was a very large dataset, I would need a higher rank to clearly
factorize the data.
3. So, I plotted consensus matrices and cophenetic correlation for ranks greater
than 40.
4. Chose rank 50 due to good clustering and high cophenetic correlation.
5. Read about pathway analysis, GO analysis and PANTHER analysis.
6. Plotted pie charts for cell line clusters and corresponding genes.
7. The genes in each cluster were tumor suppressors for the cell lines in the
corresponding cluster.
Figures
Pie chart representation of some clusters (figure labels: P53, PTEN, Tumor Suppressors).
Pathway analysis of clusters would shed more light on cell line-gene dependencies.
Supercomputer experience - Shirokane
1. This was my first time using a supercomputer.
2. I learnt various commands and utilities - qsub, qstat
3. I also learnt how to check memory usage - qfree, qavail
4. After basic usage of qsub, learnt different parameters with which I could
submit jobs.
5. Made shell scripts which could run multiple programs in parallel with different
parameters for faster workflow and convenience.
6. Had to customize my environment to work with the supercomputer better.
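Submitting the same program with several parameter combinations, as in step 5, can be sketched as below. The original work used shell scripts. This Python stand-in only builds the command lines and does not submit anything. The script name `run_nmf.sh`, its `--rank` and `--method` arguments and the `s_vmem` memory request are assumptions for illustration and should be checked against the cluster's own documentation.

```python
import itertools
import shlex


def build_qsub_commands(script, ranks, methods, memory="8G"):
    """Build one qsub command line per (rank, method) combination."""
    commands = []
    for rank, method in itertools.product(ranks, methods):
        job_name = f"nmf_{method}_k{rank}"
        cmd = [
            "qsub",
            "-N", job_name,            # job name shown by qstat
            "-cwd",                    # run in the current directory
            "-l", f"s_vmem={memory}",  # memory request (assumed resource name)
            script,
            "--rank", str(rank),       # hypothetical script arguments
            "--method", method,
        ]
        commands.append(shlex.join(cmd))
    return commands


for line in build_qsub_commands(
    "run_nmf.sh", ranks=[2, 3, 4], methods=["standard", "sparse"]
):
    print(line)
```

Each printed line is an independent job, so the scheduler can run the combinations in parallel.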
Python Package - BigNmf
Made a package BigNmf which is now available to
download from PyPI.
Run pip3 install bignmf.
This package implements both single and joint
NMF with the help of Standard, Sparse and
Integrative NMF.
You can get output matrices, consensus matrices,
cophenetic correlations and errors.
Proper documentation is available at -
https://bignmf.readthedocs.io/en/latest/
Learnings and new experiences
1. Great introduction to the world of research
2. Vastly improved knowledge about bioinformatics and biology in general
3. Improved coding practices by reading other people's code
4. Supercomputer!
5. Interaction with people from different cultures
Always wanted to visit and live in Japan!
Thank you everyone! :)
