IEEE BIBE 2013

13th IEEE International Conference on
Bioinformatics and Bioengineering,
11th November, Chania, Greece, EU

A Discrete Optimization Approach
for SVD Best Truncation Choice based
on ROC Curves
Davide Chicco, Marco Masseroli
davide.chicco@elet.polimi.it
Summary
1. The context & the problem
• Biomolecular annotations

• Prediction of biomolecular annotations
• SVD (Singular Value Decomposition)
• SVD Truncation

2. The proposed solution
• ROC Area Under the Curve comparison
• Truncation level choices
3. Evaluation
• Evaluation data set & results

4. Conclusions
“A Discrete Optimization Approach for SVD Best Truncation Choice based on ROC Curves”

Biomolecular annotations
• The concept of annotation: association of nucleotide or amino
acid sequences with useful information describing their features

• This information is expressed through controlled vocabularies,
sometimes structured as ontologies, where every controlled
term of the vocabulary is associated with a unique
alphanumeric code
• The association of such a code with a gene or protein ID
constitutes an annotation
[Diagram: a gene / protein is linked to a biological function feature by an annotation (gene2bff)]

Biomolecular annotations (2)
• The association of an information/feature with a gene or
protein ID constitutes an annotation

• Annotation example:
• gene: GD4
• feature: “is present in the mitochondrial membrane”
Prediction of biomolecular annotations
• Many available annotations in different databanks
• However, available annotations are incomplete
• Only a few of them represent highly reliable, human–curated
information

• To support and speed up the time-consuming curation process,
prioritized lists of computationally predicted annotations
are extremely useful
• These lists can be generated by software that implements
machine learning algorithms

Annotation prediction through
Singular Value Decomposition – SVD

• Annotation matrix $A \in \{0, 1\}^{m \times n}$
− m rows: genes / proteins
− n columns: annotation terms

A(i,j) = 1 if gene / protein i is annotated to term j or to any
descendant of j in the considered ontology structure (true
path rule)
A(i,j) = 0 otherwise (it is unknown)
          term01  term02  term03  term04  …  termN
gene01    0       0       0       0       …  0
gene02    0       1       1       0       …  1
…         …       …       …       …       …  …
geneM     0       0       0       0       …  0
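A minimal sketch of how such a matrix could be filled in while applying the true path rule. The function name, the toy ontology and the `direct` / `parents` dictionaries are illustrative assumptions, not part of the original slides.

```python
def build_annotation_matrix(genes, terms, direct, parents):
    """Return A as a list of rows: A[i][j] = 1 if gene i is annotated to
    term j directly or through a descendant of j (true path rule)."""
    col = {t: j for j, t in enumerate(terms)}
    A = [[0] * len(terms) for _ in genes]
    for i, g in enumerate(genes):
        stack = list(direct.get(g, ()))
        seen = set()
        while stack:
            t = stack.pop()
            if t in seen:
                continue
            seen.add(t)
            A[i][col[t]] = 1
            # Propagate the annotation to every ancestor of t.
            stack.extend(parents.get(t, ()))
    return A


# Hypothetical toy ontology: t3 is a child of t2, which is a child of t1.
terms = ["t1", "t2", "t3", "t4"]
parents = {"t3": ["t2"], "t2": ["t1"]}
genes = ["gene01", "gene02"]
direct = {"gene02": ["t3"]}  # gene02 is directly annotated to t3 only

A = build_annotation_matrix(genes, terms, direct, parents)
```

Here gene02 ends up annotated to t1, t2 and t3, while every other entry stays at 0, meaning "unknown".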

Singular Value Decomposition – SVD
Compute SVD:
$A = U \Sigma V^T$

Compute reduced rank approximation:
$A_k = U_k \Sigma_k V_k^T$

• An annotation prediction is performed by computing a reduced
rank approximation Ak of the annotation matrix A
(where 0 < k < r, with r the number of nonzero singular values
of A, i.e. the rank of A)
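A minimal sketch of the reduced rank approximation using NumPy's `numpy.linalg.svd`. The helper name `truncated_svd` and the 4 x 4 toy matrix are assumptions made for illustration only.

```python
import numpy as np


def truncated_svd(A, k):
    """Rank-k approximation A_k = U_k Sigma_k V_k^T."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]


# Toy annotation matrix (4 genes x 4 terms), chosen only for illustration.
A = np.array([[0, 0, 0, 0],
              [0, 1, 1, 0],
              [0, 1, 1, 1],
              [1, 0, 0, 1]], dtype=float)

r = int(np.linalg.matrix_rank(A))       # number of nonzero singular values
s = np.linalg.svd(A, compute_uv=False)  # singular values, largest first
A2 = truncated_svd(A, 2)                # real-valued prediction scores, 0 < k < r
```

Keeping all r singular values reproduces A, and the Frobenius error of A2 is the root of the sum of the squared discarded singular values.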

Singular Value Decomposition – SVD (2)
• Ak contains real-valued entries related to the likelihood that
gene i should be annotated to term j
For a given real threshold τ:
if Ak(i,j) > τ, gene i is predicted to be annotated to term j
− The threshold τ can be chosen so as to obtain the
best predicted annotations [Khatri et al., 2005]

Singular Value Decomposition – SVD (3)
• The SVD can be rewritten in an equivalent form, such that the
predicted annotation profile is given by:
$a_{k,i}^T = a_i^T V_k V_k^T$
where $a_i^T$ is the row of A for gene i and $a_{k,i}^T$ is a row
vector containing the predictions for gene i
• Note that $V_k$ depends on the whole set of genes
• Indeed, the columns of $V_k$ are a set of eigenvectors of the
global term-to-term correlation matrix $T = A^T A$, estimated from
the whole set of available annotations
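The equivalence above can be checked numerically. This is a small sketch on an assumed toy matrix: it projects one gene's annotation row onto the first k right singular vectors and compares the result with the matching row of Ak.

```python
import numpy as np

# Toy annotation matrix (4 genes x 4 terms), chosen only for illustration.
A = np.array([[0, 0, 0, 0],
              [0, 1, 1, 0],
              [0, 1, 1, 1],
              [1, 0, 0, 1]], dtype=float)
k = 2

U, s, Vt = np.linalg.svd(A, full_matrices=False)
Vk = Vt[:k, :].T                            # n x k: first k right singular vectors
Ak = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]  # reduced rank approximation

i = 1                                       # any gene index
profile = A[i] @ Vk @ Vk.T                  # a_i^T V_k V_k^T

# The columns of V_k are eigenvectors of T = A^T A, with eigenvalues s_j^2.
T = A.T @ A
```

The projected profile matches row i of Ak, which is why a prediction for one gene depends on the annotations of all the others through Vk.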

Evaluation of the prediction
To evaluate the prediction, we compare each element A(i,j) with its
corresponding Ak(i,j) for each real threshold τ, with 0 ≤ τ ≤ 1.0:

• if A(i,j) = 1 & Ak(i,j) > τ: AC, Annotation Confirmed (AC <- AC+1)
• if A(i,j) = 1 & Ak(i,j) ≤ τ: AR, Annotation to be Reviewed (AR <- AR+1)
• if A(i,j) = 0 & Ak(i,j) ≤ τ: NAC, No Annotation Confirmed (NAC <- NAC+1)
• if A(i,j) = 0 & Ak(i,j) > τ: AP, Annotation Predicted (AP <- AP+1)
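The four counters can be sketched as a single pass over the two matrices. The function name `evaluation_counts` and the toy values below are assumptions for illustration.

```python
def evaluation_counts(A, Ak, tau):
    """Count AC, AR, NAC and AP for one threshold tau."""
    counts = {"AC": 0, "AR": 0, "NAC": 0, "AP": 0}
    for row_a, row_k in zip(A, Ak):
        for a, score in zip(row_a, row_k):
            if a == 1:
                # Known annotation: confirmed if the score exceeds tau.
                counts["AC" if score > tau else "AR"] += 1
            else:
                # Unknown annotation: predicted if the score exceeds tau.
                counts["AP" if score > tau else "NAC"] += 1
    return counts


# Toy input matrix and toy prediction scores (2 genes x 3 terms).
A = [[0, 1, 1],
     [1, 0, 0]]
Ak = [[0.1, 0.8, 0.3],
      [0.9, 0.6, 0.0]]

counts = evaluation_counts(A, Ak, tau=0.5)
```

With τ = 0.5 the two scores 0.8 and 0.9 confirm known annotations, 0.3 falls into AR, and 0.6 is the only newly predicted annotation.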

SVD truncation
• The main problem of truncated SVD: how to choose the
truncation level k?
• Where to truncate?

[Figure annotated "How to choose the k here?"]
New concept: Receiver Operating Characteristic
(ROC) curve
Starting from the annotation prediction evaluation counts we just
introduced (Input: annotation present in A; Output: annotation
predicted from Ak):

                                   Input   Output
• AC: Annotation Confirmed         Yes     Yes
• AR: Annotation to be Reviewed    Yes     No
• NAC: No Annotation Confirmed     No      No
• AP: Annotation Predicted         No      Yes


We can draw a Receiver Operating Characteristic curve for every
prediction, starting from two rates:

• Annotation to be reviewed rate: $AR / (AC + AR)$
• Annotation predicted rate: $AP / (AP + NAC)$

New concept: Receiver Operating Characteristic
(ROC) curve (2)

• On the y axis, the annotation confirmed rate: $AC / (AC + AR)$
• On the x axis, the annotation predicted rate: $AP / (AP + NAC)$

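A minimal sketch of how the ROC points and the area under the curve could be computed from a threshold sweep. The function names, the added end points (0, 0) and (1, 1), and the trapezoidal rule are assumptions; the slides do not say how the area is integrated. The sketch also assumes that A contains at least one 1 and one 0.

```python
def roc_points(A, Ak, thresholds):
    """Return (APrate, ACrate) points sorted by APrate, then ACrate."""
    pairs = [(a, s) for ra, rk in zip(A, Ak) for a, s in zip(ra, rk)]
    pos = sum(1 for a, _ in pairs if a == 1)   # AC + AR
    neg = len(pairs) - pos                     # AP + NAC
    points = {(0.0, 0.0), (1.0, 1.0)}          # assumed curve end points
    for tau in thresholds:
        ac = sum(1 for a, s in pairs if a == 1 and s > tau)
        ap = sum(1 for a, s in pairs if a == 0 and s > tau)
        points.add((ap / neg, ac / pos))
    return sorted(points)


def auc(points):
    """Trapezoidal area under the piecewise-linear ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy input matrix and toy prediction scores (2 genes x 3 terms).
A = [[0, 1, 1],
     [1, 0, 0]]
Ak = [[0.1, 0.8, 0.3],
      [0.9, 0.6, 0.0]]

points = roc_points(A, Ak, thresholds=[0.0, 0.25, 0.5, 0.75, 1.0])
area = auc(points)
```

In this toy case eight of the nine (known, unknown) score pairs are ranked correctly, so the area is 8/9.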
SVD truncation choice
Algorithm:
1) Choose some possible truncation levels
2) Compute the Receiver Operating Characteristic for each
SVD prediction of those truncation levels
3) Compute the Area Under the Curve (AUC) of each ROC
4) Choose the truncation level of the ROC that has
maximum AUC
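The four steps can be sketched end to end as follows. This is an illustrative sketch, not the authors' implementation: the function names, the toy matrix, the threshold grid and the trapezoidal AUC are all assumptions. Because the prediction is scored against A itself, k = r would reproduce A exactly, so the candidates are kept below r.

```python
import numpy as np


def roc_auc(A, Ak, thresholds):
    """Trapezoidal AUC of the (APrate, ACrate) curve from a threshold sweep."""
    labels = A.ravel() == 1
    scores = Ak.ravel()
    xs, ys = [0.0, 1.0], [0.0, 1.0]            # assumed curve end points
    for tau in thresholds:
        predicted = scores > tau
        ys.append(np.sum(predicted & labels) / np.sum(labels))
        xs.append(np.sum(predicted & ~labels) / np.sum(~labels))
    order = np.lexsort((ys, xs))               # sort by x, then by y
    xs, ys = np.array(xs)[order], np.array(ys)[order]
    return float(np.sum(np.diff(xs) * (ys[1:] + ys[:-1]) / 2))


def best_truncation(A, candidates, thresholds):
    """Steps 2) to 4): score every candidate k and keep the highest AUC."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    aucs = {}
    for k in candidates:
        Ak = (U[:, :k] * s[:k]) @ Vt[:k, :]    # U_k Sigma_k V_k^T
        aucs[k] = roc_auc(A, Ak, thresholds)
    return max(aucs, key=aucs.get), aucs


# Toy annotation matrix of rank 3, chosen only for illustration.
A = np.array([[0, 0, 0, 0],
              [0, 1, 1, 0],
              [0, 1, 1, 1],
              [1, 0, 0, 1]], dtype=float)

best, aucs = best_truncation(A, candidates=[1, 2],
                             thresholds=np.linspace(0, 1, 21))
```

The SVD is computed once and reused for every candidate, so the cost of each extra candidate is one matrix product plus one ROC sweep.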

Steps 2) to 4) are quite easy. Step 1), choosing the possible
truncation levels, is quite challenging!

Maximum AUC among all the ROCs of various
truncation levels
1) Choose some possible truncation levels
We cannot compute the SVD, its ROC and its AUC for every
truncation value, because it would be too expensive (in time
and resources).
Algorithm:
1) Since the matrix A has m rows (genes) and n columns
(annotation terms), we take p = min(m, n)
2) Since r ≤ p is the number of nonzero singular values
along the diagonal of Σ, the best truncation value is in the
interval [1; r]
3) newInterval = {1, r}
4) k = firstElement(newInterval)
5) step = length(newInterval) / numStep
Maximum AUC among all the ROCs of various
truncation levels (2)
4. We sample the N nonzero singular values at constant
intervals of size step (step = 10% * N)

5. For every sampled singular value (truncation level), we compute
the truncated SVD and its corresponding ROC AUC for ACrate in
[0%, 100%] and APrate in [0%, 1%]
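A minimal sketch of the sampling and of an AUC restricted to a small APrate range. Reading "APrate in [0%, 1%]" as a partial AUC clipped at x = 0.01 is an interpretation, and the names `sample_levels` and `partial_auc` are hypothetical.

```python
def sample_levels(lo, hi, num_steps=10):
    """Evenly spaced integer truncation levels in [lo, hi]."""
    step = max(1, (hi - lo + 1) // num_steps)   # about 10% of the interval
    return list(range(lo, hi + 1, step))


def partial_auc(points, max_x=0.01):
    """Trapezoidal area under sorted ROC points for x in [0, max_x]."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_x:
            break
        if x1 > max_x:
            # Clip the segment at max_x by linear interpolation.
            y1 = y0 + (y1 - y0) * (max_x - x0) / (x1 - x0)
            x1 = max_x
        area += (x1 - x0) * (y0 + y1) / 2
    return area


levels = sample_levels(1, 100)                        # 1, 11, 21, ..., 91
chance = partial_auc([(0.0, 0.0), (1.0, 1.0)])        # diagonal ROC curve
perfect = partial_auc([(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)])
```

On this restricted range a perfect ranking scores 0.01 and a random one scores 0.00005, so the values are only comparable with each other, not with a full AUC.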

Maximum AUC among all the ROCs of various
truncation levels (3)
Given the first AUC, if the AUCs of all three subsequent
samples decrease, we take its index for the next zoom step.
This means we found a local maximum.

[Figure: AUC samples, with the local best index marked as the point to zoom into]

Maximum AUC among all the ROCs of various
truncation levels (4)
If the AUC differences of the last three singular values are
lower than gamma = 10%, we take the index for the next zoom step.
This means that the AUCs do not grow enough.

[Figure: AUC samples, with the chosen index marked as the point to zoom into]

Maximum AUC among all the ROCs of various
truncation levels (5)
Once we have chosen the index where to zoom, we re-run the
algorithm in the sub-interval,
until one of the previously described conditions is satisfied
or the maximum number of zooms (numZoom = 4) is reached
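The zooming idea can be sketched as a coarse-to-fine search. This is a simplified stand-in, not the procedure on the slides: it zooms around the best sample of each pass instead of applying the three-decreasing-samples and gamma tests. The function `zoom_search`, its parameters and the toy AUC curve are assumptions for illustration.

```python
def zoom_search(auc_of, lo, hi, num_steps=10, num_zoom=4):
    """Coarse-to-fine search for the truncation level with the highest AUC.

    auc_of is any callable that maps a truncation level k to its AUC.
    """
    best_k, best_auc = lo, auc_of(lo)
    for _ in range(num_zoom):
        step = max(1, (hi - lo) // num_steps)
        levels = list(range(lo, hi + 1, step))
        scores = [auc_of(k) for k in levels]
        idx = max(range(len(levels)), key=scores.__getitem__)
        if scores[idx] > best_auc:
            best_k, best_auc = levels[idx], scores[idx]
        if step == 1:
            break  # every level of the interval has been evaluated
        # Zoom into the sub-interval around the best sample.
        lo = levels[max(idx - 1, 0)]
        hi = levels[min(idx + 1, len(levels) - 1)]
    return best_k, best_auc


# Toy stand-in for the AUC as a function of k, with a single peak at k = 40.
def toy_auc(k):
    return 1.0 - ((k - 40) / 100) ** 2


best_k, best_auc = zoom_search(toy_auc, lo=1, hi=100)
```

On this toy curve the first pass samples every ninth level and the second pass scans levels 28 to 46 one by one, so the peak is found after evaluating about 30 levels instead of 100.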
Example
Dataset: annotations of Gallus gallus genes to Biological
Process Gene Ontology terms

Results
• To evaluate the performance of our method, we used
annotations of
− terms: Biological process (BP), Cellular component (CC) and
Molecular function (MF) GO features
− organisms: Bos taurus, Danio rerio and Gallus gallus genes
• Available in July 2009, in an old version of the Gene Ontology

Results (2)
We then compared the percentage of annotations predicted with our
SVD method and our optimized truncation level against the percentage
of annotations predicted by the SVD method with the fixed truncation
level (k = 500) used by Draghici et al. in the paper “A semantic
analysis of the annotations of the human genome” (2005)

Conclusions

Problem: choosing the SVD truncation
level in the context of genomic
annotation prediction

Proposed solution: finding the
truncation level corresponding
to the maximum AUC of the
ROC curve, and it’s near to
zero

Conclusions (2)
• To avoid computing the SVD for all the possible truncation levels
(too expensive!), we proposed an algorithm that searches for
local and global maxima by zooming into sub-intervals

• The best SVD truncation levels suggested by this algorithm for
our dataset (annotations of Bos taurus, Danio rerio, and Gallus
gallus genes, and GO terms) gave better results than other
truncation levels, in a reasonable time.

Future developments
• To obtain the best sampling, we could study the gradient
variations in the distribution of the AUC values for different
truncation levels and the histogram of the eigenvalues
• Our approach is not limited to the Gene Ontology and can be
applied to any controlled annotations

A Discrete Optimization Approach for SVD Best
Truncation Choice based on ROC Curves

Thanks for your attention!!!
www.DavideChicco.it

davide.chicco@elet.polimi.it


A Discrete Optimization Approach for SVD Best Truncation Choice based on ROC Curves

  • 1. IEEE BIBE 2013 13rd IEEE International Conference on Bioinformatics and Bioengineering, 11st November, Chania, Greece, EU A Discrete Optimization Approach for SVD Best Truncation Choice based on ROC Curves Davide Chicco, Marco Masseroli davide.chicco@elet.polimi.it
  • 2. Summary 1. The context & the problem • Biomolecular annotations • Prediction of biomolecular annotations • SVD (Singular Value Decomposition) • SVD Truncation 2. The proposed solution • ROC Area Under the Curve comparison • Truncation level choices 3. Evaluation • Evaluation data set & results 4. Conclusions “A Discrete Optimization Approach for SVD Best Truncation Choice based on ROC Curves” 2
  • 3. Biomolecular annotations • The concept of annotation: association of nucleotide or amino acid sequences with useful information describing their features • This information is expressed through controlled vocabularies, sometimes structured as ontologies, where every controlled term of the vocabulary is associated with a unique alphanumeric code • The association of such a code with a gene or protein ID constitutes an annotation Biological function feature Gene / Protein Annotation gene2bff “A Discrete Optimization Approach for SVD Best Truncation Choice based on ROC Curves” 3
  • 4. Biomolecular annotations (2) • The association of an information/feature with a gene or protein ID constitutes an annotation • Annotation example: • gene: GD4 • feature: “is present in the mitochondrial membrane” Biological function feature Gene / Protein Annotation gene2bff “A Discrete Optimization Approach for SVD Best Truncation Choice based on ROC Curves” 4
  • 5. Prediction of biomolecular annotations • Many available annotations in different databanks • However, available annotations are incomplete • Only a few of them represent highly reliable, human–curated information • To support and quicken the time–consuming curation process, prioritized lists of computationally predicted annotations are extremely useful • These lists could be generated softwares based that implement Machine Learning algorithms “A Discrete Optimization Approach for SVD Best Truncation Choice based on ROC Curves” 5
  • 6. Annotation prediction through Singular Value Decomposition – SVD • Annotation matrix $A \in \{0, 1\}^{m \times n}$ − m rows: genes / proteins − n columns: annotation terms. $A(i,j) = 1$ if gene / protein i is annotated to term j or to any descendant of j in the considered ontology structure (true path rule); $A(i,j) = 0$ otherwise (the annotation is unknown).
               term01  term02  term03  term04  …  termN
      gene01      0       0       0       0    …    0
      gene02      0       1       1       0    …    1
      …           …       …       …       …    …    …
      geneM       0       0       0       0    …    0
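The true path rule in the annotation matrix definition can be illustrated with a minimal Python sketch. The gene names, term names, and the `parents` map are hypothetical toy data, and `build_annotation_matrix` is an illustrative helper, not something taken from the slides.

```python
def ancestors(term, parents):
    """Collect every ancestor of `term` in a toy ontology given as a child -> parents map."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))
    return seen


def build_annotation_matrix(genes, terms, direct, parents):
    """Return A as a list of rows, with A[i][j] = 1 if gene i is annotated
    to term j or to any descendant of j (true path rule), else 0."""
    col = {t: j for j, t in enumerate(terms)}
    A = [[0] * len(terms) for _ in genes]
    for i, g in enumerate(genes):
        for t in direct.get(g, ()):
            # A direct annotation to t also implies annotations to all ancestors of t.
            for u in {t} | ancestors(t, parents):
                A[i][col[u]] = 1
    return A


# Hypothetical toy data: term03 is a child of term02, which is a child of term01.
genes = ["gene01", "gene02"]
terms = ["term01", "term02", "term03"]
parents = {"term03": ["term02"], "term02": ["term01"]}
direct = {"gene02": ["term03"]}

A = build_annotation_matrix(genes, terms, direct, parents)
print(A)
```

Here gene02 is directly annotated only to term03, yet its row is filled for term02 and term01 as well, because they are ancestors of term03.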
  • 8. Singular Value Decomposition – SVD. Compute the SVD: $A = U \Sigma V^T$. Compute the reduced rank approximation: $A_k = U_k \Sigma_k V_k^T$ • An annotation prediction is performed by computing a reduced rank approximation $A_k$ of the annotation matrix $A$ (where $0 < k < r$, with $r$ the number of non-zero singular values of $A$, i.e. the rank of $A$)
  • 9. Singular Value Decomposition – SVD (2) • $A_k$ contains real valued entries related to the likelihood that gene i shall be annotated to term j. For a certain real threshold τ: if $A_k(i,j) > \tau$, gene i is predicted to be annotated to term j − The threshold τ can be chosen in order to obtain the best predicted annotations [Khatri et al., 2005]
  • 10. Singular Value Decomposition – SVD (3) • It is possible to rewrite the SVD in an equivalent form, such that the predicted annotation profile is given by: $a_{k,i}^T = a_i^T V_k V_k^T$, where $a_{k,i}^T$ is a row vector containing the predictions for gene i • Note that $V_k$ depends on the whole set of genes • Indeed, the columns of $V_k$ are a set of eigenvectors of the global term-to-term correlation matrix $T = A^T A$, estimated from the whole set of available annotations
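The two forms of the prediction given in the SVD slides, $A_k = U_k \Sigma_k V_k^T$ and $a_{k,i}^T = a_i^T V_k V_k^T$, can be checked against each other numerically. This is a minimal NumPy sketch on a made-up 5 x 4 annotation matrix. The helper name `truncated_svd_prediction` is an assumption for illustration.

```python
import numpy as np


def truncated_svd_prediction(A, k):
    """Return (A_k, V_k): the rank-k approximation U_k Sigma_k V_k^T and the
    first k right singular vectors of A (as columns)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    A_k = (U[:, :k] * s[:k]) @ Vt[:k, :]   # U_k Sigma_k V_k^T
    return A_k, Vt[:k, :].T


# Made-up toy annotation matrix (rows: genes, columns: terms).
A = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 0, 1, 1],
              [0, 1, 1, 1],
              [1, 0, 0, 0]], dtype=float)

k = 2
A_k, V_k = truncated_svd_prediction(A, k)

# Equivalent form: project each gene profile a_i^T onto the span of V_k.
A_k_alt = A @ V_k @ V_k.T

print(np.allclose(A_k, A_k_alt))
```

The equality holds because $A V_k V_k^T = U \Sigma V^T V_k V_k^T$, and $V^T V_k$ selects the first k columns of $U \Sigma$.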
  • 11. Evaluation of the prediction. To evaluate the prediction, we compare each element $A(i,j)$ to its corresponding $A_k(i,j)$, for each real threshold τ with 0 ≤ τ ≤ 1.0:
      • if $A(i,j) = 1$ and $A_k(i,j) > \tau$: AC, Annotation Confirmed (AC <- AC+1)
      • if $A(i,j) = 1$ and $A_k(i,j) \le \tau$: AR, Annotation to be Reviewed (AR <- AR+1)
      • if $A(i,j) = 0$ and $A_k(i,j) \le \tau$: NAC, No Annotation Confirmed (NAC <- NAC+1)
      • if $A(i,j) = 0$ and $A_k(i,j) > \tau$: AP, Annotation Predicted (AP <- AP+1)
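The four evaluation counters can be sketched in a few lines of Python. The matrices below are made-up toy values, and `evaluation_counts` is an illustrative name rather than anything from the slides.

```python
def evaluation_counts(A, A_k, tau):
    """Count AC, AR, NAC and AP by comparing each A(i,j) with A_k(i,j) at threshold tau."""
    counts = {"AC": 0, "AR": 0, "NAC": 0, "AP": 0}
    for row, row_k in zip(A, A_k):
        for a, p in zip(row, row_k):
            if a == 1:
                # Known annotation: confirmed if predicted, otherwise to be reviewed.
                counts["AC" if p > tau else "AR"] += 1
            else:
                # No known annotation: newly predicted if above threshold.
                counts["AP" if p > tau else "NAC"] += 1
    return counts


# Made-up toy input matrix and reduced rank approximation.
A = [[1, 0, 1],
     [0, 0, 1]]
A_k = [[0.9, 0.2, 0.4],
       [0.6, 0.1, 0.8]]

counts = evaluation_counts(A, A_k, tau=0.5)
print(counts)  # {'AC': 2, 'AR': 1, 'NAC': 2, 'AP': 1}
```

In standard classification terms, AC, AR, NAC and AP play the roles of true positives, false negatives, true negatives and false positives with respect to the input matrix.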
  • 12. SVD truncation • The main problem of truncated SVD: how to choose the truncation level? • Where to truncate? How to choose k?
  • 13. New concept: Receiver Operating Characteristic (ROC) curve. Starting from the annotation prediction evaluation counts we just introduced:
      Category                         Input   Output
      AC: Annotation Confirmed         Yes     Yes
      AR: Annotation to be Reviewed    Yes     No
      NAC: No Annotation Confirmed     No      No
      AP: Annotation Predicted         No      Yes
      We can draw a Receiver Operating Characteristic curve for every prediction:
      • On the x, the annotation to be reviewed rate: $AR / (AC + AR)$
      • On the y, the annotation predicted rate: $AP / (AP + NAC)$
  • 14. New concept: Receiver Operating Characteristic (ROC) curve (2) • On the y, the annotation confirmed rate: $AC / (AC + AR)$ • On the x, the annotation predicted rate: $AP / (AP + NAC)$
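A ROC curve with the annotation confirmed rate on the y axis and the annotation predicted rate on the x axis, and its area under the curve, can be sketched as follows. This is a minimal illustration under stated assumptions: the threshold grid, the explicit end points (0, 0) and (1, 1), the trapezoidal rule, and the toy labels and scores are all choices made here, and the sketch integrates over the full x range rather than a restricted AP rate interval.

```python
def rates(labels, scores, tau):
    """Return (AP rate, AC rate) at threshold tau; assumes both classes are present."""
    ac = sum(1 for a, p in zip(labels, scores) if a == 1 and p > tau)
    ar = sum(1 for a, p in zip(labels, scores) if a == 1 and p <= tau)
    ap = sum(1 for a, p in zip(labels, scores) if a == 0 and p > tau)
    nac = sum(1 for a, p in zip(labels, scores) if a == 0 and p <= tau)
    return ap / (ap + nac), ac / (ac + ar)


def roc_auc(labels, scores, num_thresholds=101):
    """Area under the ROC curve, sampled on an even grid of thresholds in [0, 1]."""
    taus = [t / (num_thresholds - 1) for t in range(num_thresholds)]
    # End points added explicitly (assumption): no prediction at all, and everything predicted.
    points = {(0.0, 0.0), (1.0, 1.0)}
    points.update(rates(labels, scores, tau) for tau in taus)
    pts = sorted(points)
    # Trapezoidal rule over consecutive (x, y) points.
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


# Made-up flattened entries of A (labels) and A_k (scores).
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.1]

auc = roc_auc(labels, scores)
print(round(auc, 4))
```

On this toy input, 8 of the 9 (annotated, non-annotated) pairs are ranked correctly, so the area comes out at 8/9.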
  • 15. SVD truncation choice Algorithm: 1) Choose some possible truncation levels 2) Compute the Receiver Operating Characteristic for each SVD prediction of those truncation levels 3) Compute the Area Under the Curve (AUC) of each ROC 4) Choose the truncation level of the ROC that has maximum AUC
  • 16. SVD truncation choice (2) Algorithm: 1) Choose some possible truncation levels 2) Compute the Receiver Operating Characteristic for each SVD prediction of those truncation levels 3) Compute the Area Under the Curve (AUC) of each ROC 4) Choose the truncation level of the ROC that has maximum AUC Quite easy!
  • 17. SVD truncation choice (3) Algorithm: Quite challenging! 1) Choose some possible truncation levels 2) Compute the Receiver Operating Characteristic for each SVD prediction of those truncation levels 3) Compute the Area Under the Curve (AUC) of each ROC 4) Choose the truncation level of the ROC that has maximum AUC
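Steps 2 to 4 of the algorithm can be sketched in NumPy for a fixed, hand-picked list of candidate truncation levels. This is only an illustration under several assumptions: the random toy matrix carries no biological meaning, the full SVD is computed once and truncated (a shortcut suitable only for small matrices), the AUC covers the full x range, and the helper names `roc_auc` and `best_truncation` are hypothetical.

```python
import numpy as np


def roc_auc(A, A_k, taus):
    """AUC of the curve (AP rate, AC rate) over thresholds taus, by the trapezoidal rule."""
    pos, neg = (A == 1), (A == 0)
    pts = [(0.0, 0.0), (1.0, 1.0)]          # explicit end points (assumption)
    for tau in taus:
        pred = A_k > tau
        ac_rate = (pred & pos).sum() / pos.sum()
        ap_rate = (pred & neg).sum() / neg.sum()
        pts.append((float(ap_rate), float(ac_rate)))
    pts.sort()
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


def best_truncation(A, candidates, taus):
    """Return (best_k, aucs): the candidate k with maximum AUC and all computed AUCs."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)   # computed once for the toy case
    aucs = {}
    for k in candidates:
        A_k = (U[:, :k] * s[:k]) @ Vt[:k, :]
        aucs[k] = roc_auc(A, A_k, taus)
    return max(aucs, key=aucs.get), aucs


# Random toy annotation matrix (30 genes x 20 terms), for illustration only.
rng = np.random.default_rng(0)
A = (rng.random((30, 20)) < 0.2).astype(int)

candidates = [2, 5, 10, 15]
taus = np.linspace(0.0, 1.0, 101)
best_k, aucs = best_truncation(A, candidates, taus)
print(best_k, aucs)
```

The hard part, as the slide notes, is step 1: this sketch simply takes the candidate list as given.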
  • 18. Maximum AUC among all the ROCs of various truncation levels. 1) Choose some possible truncation levels: we cannot compute the SVD, its ROC and its AUC for every truncation value, because this would be too expensive in time and resources. Algorithm:
      1) Since the matrix $A$ has m rows (genes) and n columns (annotation terms), we take p = min(m, n)
      2) Since r ≤ p is the number of non-zero singular values along the diagonal of $\Sigma$, the best truncation value is in the interval [1; r]
      3) newInterval = {1, r}
      4) k = firstElement(newInterval)
      5) step = length(newInterval) / numStep
  • 19. Maximum AUC among all the ROCs of various truncation levels (2) 4. We sample the N non-zero singular values at constant intervals of size step (step = 10% * N) 5. For every sampled singular value, we compute the SVD and the corresponding ROC AUC, for ACrate in [0%, 100%] and APrate in [0%, 1%]
  • 20. Maximum AUC among all the ROCs of various truncation levels (3) Given the first AUC, if the AUCs of all three subsequent samples decrease, we take its index (the local best index) for the next zoom step. This means we found a local maximum.
  • 21. Maximum AUC among all the ROCs of various truncation levels (4) If the AUC differences of the last three sampled singular values are lower than gamma = 10%, we take that index (the chosen index) for the next zoom step. This means that the AUCs are no longer growing enough.
  • 22. Maximum AUC among all the ROCs of various truncation levels (5) Once we have chosen the index where to zoom, we re-run the algorithm in the zoom sub-interval, until one of the previously described conditions is satisfied or the maximum number of zooms (numZoom = 4) is reached.
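The sampling and zooming idea can be sketched as a simplified coarse-to-fine search in Python. It is not the exact procedure from the slides: the gamma = 10% rule is omitted, the "three subsequent samples decrease" rule is read as "three later samples fail to beat the current best", and the zoom window is assumed to be one step on each side of the best sample. The function `auc_of` stands in for "compute the SVD at level k and return its ROC AUC". It is a hypothetical callable, replaced here by a synthetic curve.

```python
def zoom_search(auc_of, lo, hi, num_step=10, num_zoom=4, patience=3):
    """Search [lo, hi] for the truncation level with the largest AUC,
    sampling about num_step levels per pass and zooming at most num_zoom times."""
    cache = {}

    def evaluate(k):
        # Each AUC stands for an expensive SVD + ROC computation, so cache it.
        if k not in cache:
            cache[k] = auc_of(k)
        return cache[k]

    for _ in range(num_zoom):
        step = max(1, (hi - lo) // num_step)
        best_k, worse = lo, 0
        for k in range(lo, hi + 1, step):
            if evaluate(k) > evaluate(best_k):
                best_k, worse = k, 0
            elif k != best_k:
                worse += 1
                if worse >= patience:
                    break          # local maximum found: later samples did not improve
        if step == 1:
            break                  # the interval cannot be refined any further
        # Zoom into the sub-interval around the local best index (assumed window).
        lo, hi = max(lo, best_k - step), min(hi, best_k + step)

    best = max(cache, key=cache.get)
    return best, cache[best]


# Synthetic stand-in for the real AUC curve, with a single peak at k = 50.
calls = []

def toy_auc(k):
    calls.append(k)
    return 1.0 - ((k - 50) / 100) ** 2

best_k, best_auc = zoom_search(toy_auc, 1, 100)
print(best_k, best_auc, len(calls))
```

On the synthetic curve the search reaches k = 50 after evaluating far fewer than the 100 possible levels, which is the saving the zoom strategy aims for. Like the slides' procedure, it can stop at a local maximum when the AUC curve has several peaks.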
  • 23. Example. Dataset: annotations of Gallus gallus genes to Biological Process Gene Ontology terms
  • 24. Results • To evaluate the performance of our method, we used annotations of: − terms: Biological process (BP), Cellular component (CC) and Molecular function (MF) GO features − organisms: Bos taurus, Danio rerio and Gallus gallus genes • Available in July 2009, in an old version of the Gene Ontology
  • 25. Results (2) We then checked the percentage of annotations predicted with our SVD method and our optimized truncation level against the percentage of annotations predicted by the SVD method with the fixed truncation level (k=500) used by Draghici et al. in the paper “A semantic analysis of the annotations of the human genome” (2005)
  • 26. Conclusions. Problem: choosing the SVD truncation level in the context of genomic annotation prediction. Proposed solution: finding the truncation level corresponding to the maximum AUC of the ROC curve, and it is near to zero
  • 27. Conclusions (2) • To avoid computing the SVD for all the possible truncation levels (too expensive!), we proposed an algorithm that searches for local and global maxima by zooming into sub-intervals • The best SVD truncation levels suggested by this algorithm for our dataset (annotations of Bos taurus, Danio rerio and Gallus gallus genes to GO terms) gave better results than other truncation levels, in a reasonable time.
  • 28. Future developments • To obtain the best sampling, we could study the gradient variations in the distribution of the AUC values for different truncation levels and the histogram of the eigenvalues • Our approach is not limited to the Gene Ontology and can be applied to any controlled annotations
  • 29. A Discrete Optimization Approach for SVD Best Truncation Choice based on ROC Curves. Thanks for your attention!!! www.DavideChicco.it davide.chicco@elet.polimi.it