Evidence for tissue and stage-specific composition of the ribosome: machine learning analysis of ribosomal protein mRNA data
June 2020
Michael Biehl www.cs.rug.nl/~biehl m.biehl@rug.nl
Bernoulli Inst. for Mathematics, Computer Science and Artificial Intelligence
2. Groningen, December 2021
2016: Aspen Center for Physics http://www.aspenphys.org
working group, initiated by Gyan Bhanot
development of ideas and first data analysis (LVQ & SOM)
here: emphasis on computational analysis, selected results
Francis Crick 1958: The Central Dogma
This states that once "information" has passed into protein, it cannot get
out again. In more detail, the transfer of information from nucleic acid to
nucleic acid, or from nucleic acid to protein may be possible, but transfer
from protein to protein, or from protein to nucleic acid is impossible.
Information means here the precise determination of sequence, either of
bases in the nucleic acid or of amino acid residues in the protein.
the “central dogma” of molecular biology
Rosalind Franklin
1920-1958
https://www.electricvoicetheatre.co.uk/rosalind-franklin-centenary-celebrations
https://de.wikipedia.org/wiki/Desoxyribonukleinsäure
the ribosome
• ancient molecular machine, “3D-printer” for proteins
• ~ $10^7$ ribosomes per cell
• believed to have universal function
• believed to have the same composition in all tissues
Courtesy: National Human Genome Research Institute, NIH www.genome.gov
remark: mRNA vaccines
the ribosome
• consists of RNA and ribosomal proteins (RP)
• RP are also coded by DNA, which is transcribed to mRNA
• we consider 78 RP via their mRNA expression
public domain data sets
GTEx (v6p), www.gtexportal.org: ca. 10000 normal samples from 53 different tissues (with >50 samples)
TCGA (The Cancer Genome Atlas; NCI-GDC, v7), www.cancer.gov: ca. 10000 tumor samples, 730 tumor-adjacent normals
Klijn et al., Nature Biotechnol. 33(3): 306-312 (2015): 675 cancer cell lines
PCA, SOM, t-SNE, UMAP, LVQ
normalization: constant sum of reads for each of the 78 RP
all results robust w.r.t. pre-processing (log, z-score) and choice of distance measures
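The pre-processing variants named above can be sketched as follows. This is only an illustration under stated assumptions: the "constant sum of reads" is read here as rescaling each sample so that its 78 RP values sum to a fixed total, and the helper names, the pseudo-count, and the random toy counts are hypothetical, not taken from the study.

```python
import numpy as np

def normalize_constant_sum(counts, total=1.0):
    """Rescale every sample (row) so that its RP values sum to `total`."""
    counts = np.asarray(counts, dtype=float)
    return total * counts / counts.sum(axis=1, keepdims=True)

def log_transform(x, pseudo=1e-6):
    """Log transform; the small pseudo-count is an assumption of this sketch."""
    return np.log(x + pseudo)

def zscore(x):
    """Z-score every feature (column) across the samples."""
    return (x - x.mean(axis=0)) / x.std(axis=0)

# toy data: 5 samples, 78 RP read counts (random numbers, not real data)
rng = np.random.default_rng(0)
counts = rng.integers(1, 1000, size=(5, 78))
x = normalize_constant_sum(counts)
z = zscore(log_transform(x))
```

Checking that a result is stable under such variants amounts to repeating the downstream analysis once with `x` and once with `z`.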
GTEx ribosome data
ca. 10000 normal samples from 53 different tissues
exclude 5 tissues with < 50 samples
select randomly 88 samples per tissue to avoid bias
set of feature vectors $\{x^\mu \in \mathbb{R}^N\}_{\mu=1}^{P}$ with target labels (tissue)
$N = 78$, $P = 4224$
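A class-balanced random selection of this kind might look as follows. The helper `balanced_subsample`, its parameters, and the toy tissue names are hypothetical; how tissues with between 50 and 88 samples would be handled is not stated in the slides, so the sketch simply assumes sampling with replacement in that case.

```python
import numpy as np

def balanced_subsample(labels, n_per_class=88, min_size=50, seed=0):
    """Return indices of a class-balanced random subset (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    chosen = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        if idx.size < min_size:
            continue  # exclude tissues with too few samples
        # assumption of this sketch: draw with replacement only if a
        # tissue has fewer than n_per_class samples
        replace = idx.size < n_per_class
        chosen.append(rng.choice(idx, size=n_per_class, replace=replace))
    return np.concatenate(chosen)

# toy labels: two large tissues and one that is too small to be kept
labels = np.repeat(["blood", "brain", "rare"], [200, 120, 30])
idx = balanced_subsample(labels)
```

With 48 retained tissues and 88 samples each, the same procedure gives the 4224 feature vectors quoted above.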
unsupervised: low-dimensional representation and visualization
• Principal Component Analysis (PCA)
• Self-Organizing Map (SOM)
• t-distributed Stochastic Neighbor Embedding (t-SNE)
• post-hoc labelling
supervised: classification and class-discriminative visualization
• Learning Vector Quantization (LVQ)
• Relevance Learning (Matrix Relevance LVQ)
computational analysis / machine learning
spoiler: different tissues have different RP mRNA signatures!
basic ideas, limited mathematical detail
focus on aspects relevant for this particular study
Principal Component Analysis (PCA)
feature vectors $\{x^\mu \in \mathbb{R}^N\}_{\mu=1}^{P}$, centered: $\langle x^\mu \rangle = \frac{1}{P} \sum_{\mu=1}^{P} x^\mu = 0$
$w_1$: direction of largest variance in the data
$w_2 \perp w_1$: largest variance in orthogonal space
$w_3 \perp \{w_1, w_2\}$
…
$w_M \perp \{w_1, w_2, \ldots, w_{M-1}\}$
$w_k$: eigenvectors of covariance matrix $C = \frac{1}{P} \sum_{\mu=1}^{P} x^\mu x^{\mu\top} \in \mathbb{R}^{N \times N}$
projections: $y_k^\mu = w_k^\top x^\mu$, $\vec{y}^\mu = [y_1^\mu, y_2^\mu, \ldots, y_M^\mu] \in \mathbb{R}^M$
typically: $M < N$, low-dimensional linear projections of high-dim. data with optimal information content for Gaussian data
frequently: $M = 2, 3$ for visualization of high-dim. data
[figure: data cloud in the $(x_1, x_2)$ plane with principal directions $w_1$, $w_2$]
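The PCA recipe above (center the data, diagonalize the covariance matrix, project onto the leading eigenvectors) can be sketched in a few lines of NumPy. This is a minimal illustration on synthetic data, not the code used in the study; the function name `pca` and the toy data are assumptions.

```python
import numpy as np

def pca(X, M=2):
    """Project the rows of X onto the M leading eigenvectors of the covariance."""
    Xc = X - X.mean(axis=0)                  # centering: <x> = 0
    C = Xc.T @ Xc / Xc.shape[0]              # covariance matrix C (N x N)
    evals, evecs = np.linalg.eigh(C)         # eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:M]      # indices of the M largest
    W = evecs[:, order].T                    # rows are w_1, ..., w_M
    Y = Xc @ W.T                             # projections y_k = w_k^T x
    return W, Y, evals[order]

# synthetic data: 200 samples in 5 dimensions with unequal variances
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) * np.array([5.0, 2.0, 1.0, 0.5, 0.1])
W, Y, var = pca(X, M=2)
```

The returned eigenvalues equal the variances of the projections, which is the sense in which $w_1$ is the direction of largest variance.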
GTEx data (normal samples)
Principal Component Analysis
- tissue labels not used
- low-dimensional projections of 78-dim. mRNA expression data
[figure: PCA projections of the data, axes $y_1$, $y_2$, $y_3$]
GTEx data (normal samples)
Principal Component Analysis
- tissue labels not used
- low-dimensional projections of 78-dim. mRNA expression data
- post-labelling (coloring) of data points according to tissue (group)
[figure: PCA projections, axes $y_1$, $y_2$, $y_3$, colored by group: whole blood - brain tissues - all other tissues]
different tissues have different RP mRNA signatures!
Competitive Learning
Vector Quantization: identify typical representatives of data which capture essential features
VQ system: set of prototypes $w^1, w^2, \ldots, w^K$, $w^k \in \mathbb{R}^N$
data: set of feature vectors $x^1, x^2, \ldots, x^P$, $x^\mu \in \mathbb{R}^N$
based on dis-similarity/distance measure $d(w, x) \geq 0$
most popular: (squared) Euclidean distance $d(w, x) = \sum_{i=1}^{N} (w_i - x_i)^2$
assignment to prototypes: e.g. Nearest Prototype Scheme
given vector $x^\mu$, determine winner or best matching unit (BMU) $w^* = \mathrm{argmin}_j \, d(w^j, x^\mu)$ → assign $x^\mu$ to prototype $w^*$
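The Nearest Prototype Scheme reduces to one distance computation and one argmin. A small sketch with made-up prototypes (the function names and the numbers are illustrative only):

```python
import numpy as np

def sq_euclidean(W, x):
    """Squared Euclidean distance d(w, x) between x and every row w of W."""
    return ((W - x) ** 2).sum(axis=1)

def winner(W, x):
    """Index of the best matching unit (BMU), i.e. the closest prototype."""
    return int(np.argmin(sq_euclidean(W, x)))

W = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]])   # K = 3 prototypes
x = np.array([0.9, 1.2])                              # one feature vector
bmu = winner(W, x)
```

Here the three squared distances are 2.25, 0.05 and 11.05, so `x` is assigned to the second prototype (index 1).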
Competitive Learning
initially: randomized $w^k$, e.g. in randomly selected data points
random sequential (repeated) presentation of data $\{x^\mu\}$
… BMU: The Winner Takes it All: $w^* = \mathrm{argmin}_j \, d(w^j, x^\mu)$
update: $w^* \to w^* + \eta \, (x^\mu - w^*)$
η (<1): learning rate, step size of update
repeated presentation of the data set
prototypes → typical representatives of data
possibly reflect structures in the data set, e.g. clusters
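The winner-takes-all update can be written as a short training loop. The sketch below uses a constant learning rate and synthetic clustered data; both, as well as the function name and default parameters, are assumptions made for illustration.

```python
import numpy as np

def competitive_learning(X, K=3, eta=0.05, epochs=20, seed=0):
    """Winner-takes-all vector quantization with K prototypes."""
    rng = np.random.default_rng(seed)
    # initialize prototypes in randomly selected data points
    W = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    for _ in range(epochs):
        for mu in rng.permutation(len(X)):       # random sequential presentation
            d = ((W - X[mu]) ** 2).sum(axis=1)   # squared Euclidean distances
            s = np.argmin(d)                     # the winner (BMU)
            W[s] += eta * (X[mu] - W[s])         # move only the winner towards x
    return W

# toy data: three Gaussian clusters in two dimensions
rng = np.random.default_rng(1)
centers = ([0, 0], [3, 3], [0, 3])
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 2)) for c in centers])
W = competitive_learning(X, K=3)
```

Because each update is a convex combination of the old prototype and a data point (η < 1), the prototypes never leave the region covered by the data.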
Self-Organizing Map (SOM)
T. Kohonen. Self-Organizing Maps (Springer 1995)
neighborhood relation on a pre-defined low-dim. lattice
d-dim. lattice A of prototypes (neurons) $w_r \in \mathbb{R}^N$ at $r \in \mathbb{R}^d$
e.g. two-dim. square lattice, triangular, etc.
upon presentation of $x^\mu$:
- determine the Best Matching Unit $w_s$ at position $s$ in the lattice
- update BMU and lattice neighborhood: $w_r \to w_r + \eta \, h_\rho(r, s) \, (x^\mu - w_r)$
where $h_\rho(r, s) = \exp\left( - \frac{\| r - s \|_A^2}{2 \rho^2} \right)$, range ρ w.r.t. distances in lattice A
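A compact sketch of the SOM update on a two-dimensional square lattice follows. For brevity the learning rate η and the range ρ are kept constant, whereas practical implementations usually decrease both during training; the function name, lattice size, and toy data are assumptions of this sketch.

```python
import numpy as np

def train_som(X, rows=4, cols=4, eta=0.1, rho=1.0, epochs=10, seed=0):
    """Train a SOM on a rows x cols square lattice; returns positions and prototypes."""
    rng = np.random.default_rng(seed)
    # lattice positions r of the neurons
    R = np.array([(i, j) for i in range(rows) for j in range(cols)], dtype=float)
    # one prototype w_r per lattice site, initialized in random data points
    W = X[rng.choice(len(X), size=len(R), replace=True)].astype(float)
    for _ in range(epochs):
        for mu in rng.permutation(len(X)):
            s = np.argmin(((W - X[mu]) ** 2).sum(axis=1))    # BMU in feature space
            # Gaussian neighborhood h_rho(r, s), measured in the lattice
            h = np.exp(-((R - R[s]) ** 2).sum(axis=1) / (2 * rho ** 2))
            W += eta * h[:, None] * (X[mu] - W)              # update BMU and neighbors
    return R, W

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 5))      # toy data, 5-dimensional
R, W = train_som(X)
```

The only difference from plain competitive learning is the factor $h_\rho(r, s)$: neurons close to the winner in the lattice are dragged along, which produces the topology-preserving map.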
GTEx data (normal samples)
subsets of RP mRNA data:
[figure: maps for subsets of RP mRNA data; legend entry: "if majority < 50%"]
different tissues have different RP mRNA signatures!
low-dimensional embedding
- represent high-dim data in low-dim. embedding space
- no functional mapping from $\mathbb{R}^N \to \mathbb{R}^M$, $M < N$ (as in PCA)
instead: representation of $\{x^\mu\}_{\mu=1}^{P}$ by explicit counterparts $\{\vec{y}^\mu\}_{\mu=1}^{P}$
Multi-Dimensional Scaling: pair-wise distances
$d_N(x^\mu, x^\nu) = \left[ \sum_{j=1}^{N} (x_j^\mu - x_j^\nu)^2 \right]^{1/2}$
$d_M(\vec{y}^\mu, \vec{y}^\nu) = \left[ \sum_{k=1}^{M} (y_k^\mu - y_k^\nu)^2 \right]^{1/2}$
find $\{\vec{y}^\mu\}$ → distances approx. reproduced:
$\min_{\{\vec{y}^\mu\}} \sum_{\mu, \nu, \mu \neq \nu} \left[ d_N(x^\mu, x^\nu) - d_M(\vec{y}^\mu, \vec{y}^\nu) \right]^2$
[figure: points in $(x_1, x_2, x_3)$ with distance $d_N$ and their embedded counterparts with distance $d_M$]
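The distance-matching cost can be minimized by plain gradient descent on the embedded points. The sketch below is one simple way to do this; the optimizer, the step-size heuristic, and the toy data are assumptions, and dedicated MDS solvers typically use more refined schemes.

```python
import numpy as np

def pairwise_dist(Z):
    """Matrix of Euclidean distances between all rows of Z."""
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def stress(DN, Y):
    """Sum over all ordered pairs of [d_N - d_M]^2 (the diagonal contributes zero)."""
    return float(((DN - pairwise_dist(Y)) ** 2).sum())

def mds(X, M=2, steps=500, seed=0):
    """Gradient descent on the stress; returns the embedding and the matrix d_N."""
    rng = np.random.default_rng(seed)
    P = len(X)
    DN = pairwise_dist(X)
    Y = rng.normal(scale=1e-2, size=(P, M))    # small random initial embedding
    lr = 0.05 / P                              # heuristic step size (assumption)
    for _ in range(steps):
        diff = Y[:, None, :] - Y[None, :, :]
        DM = np.sqrt((diff ** 2).sum(axis=-1))
        np.fill_diagonal(DM, 1.0)              # avoid 0/0; diff vanishes there anyway
        coef = (DM - DN) / DM
        grad = 4.0 * (coef[:, :, None] * diff).sum(axis=1)
        Y -= lr * grad
    return Y, DN

# toy data: a two-dimensional point cloud embedded in five dimensions
rng = np.random.default_rng(0)
X = np.hstack([rng.normal(size=(30, 2)), np.zeros((30, 3))])
Y, DN = mds(X, M=2)
```

A useful baseline is the stress of the collapsed embedding in which all points coincide, `stress(DN, np.zeros_like(Y))`; a reasonable embedding should be far below it.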
stochastic neighbor embedding
• consider distance-based probability for pair-wise neighborhood in the original feature space, e.g.
$p_{\mu|\nu} \propto \exp\left( - \frac{1}{2} \, (x^\mu - x^\nu)^2 / \sigma_\nu^2 \right)$, $p_{\mu,\nu} = \frac{1}{2P} \left( p_{\mu|\nu} + p_{\nu|\mu} \right)$
local std. deviations $\sigma_\mu$ determined by local density of data
• analogous in embedding space: pairwise probabilities $q_{\mu,\nu}$
e.g. Gaussian density in Stochastic Neighbor Embedding (SNE)
t-SNE: student-t distribution
$q_{\mu,\nu} = \frac{\left[ 1 + (\vec{y}^\mu - \vec{y}^\nu)^2 \right]^{-1}}{\sum_{\rho, \tau, \rho \neq \tau} \left[ 1 + (\vec{y}^\rho - \vec{y}^\tau)^2 \right]^{-1}}$
find $\{\vec{y}^\mu\}$ → similar probabilities, with emphasis on local neighborhoods:
$\min_{\{\vec{y}^\mu\}} \sum_{\mu, \nu, \mu \neq \nu} p_{\mu,\nu} \ln \frac{p_{\mu,\nu}}{q_{\mu,\nu}} = D_{KL}(P \| Q)$ (Kullback-Leibler divergence)
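The two sets of pair-wise probabilities and the Kullback-Leibler cost can be evaluated directly. In the sketch below a single fixed σ is used for all points, which is a simplifying assumption: t-SNE implementations normally tune each local σ to a target perplexity. Function names and toy data are illustrative, and the optimization over the embedding is left out.

```python
import numpy as np

def sq_dists(Z):
    """Matrix of squared Euclidean distances between all rows of Z."""
    diff = Z[:, None, :] - Z[None, :, :]
    return (diff ** 2).sum(axis=-1)

def p_joint(X, sigma=1.0):
    """Symmetrized neighborhood probabilities p in the original feature space."""
    P = len(X)
    sigma = np.broadcast_to(np.asarray(sigma, dtype=float), (P,))
    cond = np.exp(-0.5 * sq_dists(X) / sigma ** 2)   # entry [mu, nu] ~ p_{mu|nu}
    np.fill_diagonal(cond, 0.0)
    cond = cond / cond.sum(axis=0, keepdims=True)    # normalize over mu for each nu
    return (cond + cond.T) / (2 * P)

def q_joint(Y):
    """Student-t based probabilities q in the embedding space."""
    num = 1.0 / (1.0 + sq_dists(Y))
    np.fill_diagonal(num, 0.0)
    return num / num.sum()

def kl_divergence(Pm, Qm):
    """D_KL(P || Q), summed over all pairs with non-zero p."""
    mask = Pm > 0
    return float((Pm[mask] * np.log(Pm[mask] / Qm[mask])).sum())

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))     # toy high-dimensional data
Y = rng.normal(size=(20, 2))     # a random candidate embedding
Pm, Qm = p_joint(X), q_joint(Y)
cost = kl_divergence(Pm, Qm)
```

Both matrices sum to one over all pairs, so `cost` is a proper divergence: it is non-negative and vanishes only if the embedding reproduces the neighborhood probabilities exactly.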
GTEx data (normal samples)
t-SNE analysis, post-hoc labelling according to tissues:
clustering of tissue-types
here: no intuitive interpretation of $y_1, y_2$
see NAR publication for more detailed maps
different tissues have different RP mRNA signatures!
(supervised) Learning Vector Quantization
N-dimensional data, feature vectors
∙ identification of prototype vectors from labeled example data
∙ distance based classification (e.g. Euclidean)
supervised competitive learning: LVQ1 [Kohonen, 1990]
• initialize prototype vectors for different classes
• present a single example
• identify the winner (closest prototype)
• move the winner
  - closer towards the data (same class)
  - away from the data (different class)
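The LVQ1 steps listed above translate almost line by line into code. The following is a minimal sketch with one prototype per class, a constant learning rate, and synthetic two-class data; these choices and all names are assumptions made for illustration.

```python
import numpy as np

def lvq1(X, y, W, c, eta=0.05, epochs=30, seed=0):
    """LVQ1 training: W holds the initial prototypes, c their class labels."""
    rng = np.random.default_rng(seed)
    W = W.astype(float)                      # astype returns a copy
    for _ in range(epochs):
        for mu in rng.permutation(len(X)):   # present a single example
            s = np.argmin(((W - X[mu]) ** 2).sum(axis=1))   # identify the winner
            sign = 1.0 if c[s] == y[mu] else -1.0           # attract or repel
            W[s] += sign * eta * (X[mu] - W[s])
    return W

def predict(X, W, c):
    """Nearest prototype classification."""
    d = ((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=-1)
    return c[np.argmin(d, axis=1)]

# toy data: two well separated classes in two dimensions
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([-3, 0], 0.5, size=(50, 2)),
               rng.normal([3, 0], 0.5, size=(50, 2))])
y = np.repeat([0, 1], 50)
c = np.array([0, 1])                                   # one prototype per class
W0 = np.array([X[y == k].mean(axis=0) for k in c])     # initialize in class means
W = lvq1(X, y, W0, c)
accuracy = float((predict(X, W, c) == y).mean())
```

Several prototypes per class are handled by the same code: only `W0` and `c` grow.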
(supervised) Learning Vector Quantization
∙ tessellation of feature space [piece-wise linear]
∙ distance-based classification [here: Euclidean distances]
∙ aim: discrimination of classes ( ≠ vector quantization or density estimation )
Matrix Relevance LVQ
generalized quadratic distance in LVQ:
$d(w, x) = \sum_{i,j=1}^{N} (w_i - x_i) \, \Lambda_{ij} \, (w_j - x_j)$
$\Lambda_{ij}$ = contribution of the pair of features i,j
$\Lambda_{jj}$ = relevance of feature j
with $\Lambda_{ii} = 1$ for all $i$, $\Lambda_{ij} = 0$ for $i \neq j$: $d(w, x) = \sum_{i} (w_i - x_i)^2$, squared Euclidean distance
Matrix Relevance LVQ
generalized quadratic distance in LVQ:
$d(w, x) = (w - x)^\top \Lambda \, (w - x) = [\, \Omega \, (w - x) \,]^2$
training: move prototypes and change matrix Ω in order to
decrease $d(w, x)$ if labels agree
increase $d(w, x)$ if labels disagree
after training:
prototypes: represent typical class properties
Relevance Matrix: $\Lambda_{ij}$ quantify the relevance of features / pairs
leading eigenvectors of Λ mark the most discriminative directions in feature space → low-dim. representation of labelled data sets
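Equating the two forms of the distance implies $\Lambda = \Omega^\top \Omega$, which keeps Λ positive semi-definite. The sketch below only evaluates the distance and the eigenvector-based projection for a given Ω; the training of prototypes and Ω is not shown, the random Ω stands in for a trained matrix, and all function names are illustrative.

```python
import numpy as np

def gmlvq_distance(w, x, Omega):
    """Generalized quadratic distance d(w, x) = [Omega (w - x)]^2."""
    z = Omega @ (w - x)
    return float(z @ z)

def relevance_matrix(Omega):
    """Lambda = Omega^T Omega; its diagonal holds the feature relevances."""
    return Omega.T @ Omega

def discriminative_projection(X, Lam, M=2):
    """Project the rows of X onto the M leading eigenvectors of Lambda."""
    evals, evecs = np.linalg.eigh(Lam)       # eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:M]
    return X @ evecs[:, order]

rng = np.random.default_rng(0)
N = 4
Omega = rng.normal(size=(N, N))              # stand-in for a trained matrix
Lam = relevance_matrix(Omega)
w, x = rng.normal(size=N), rng.normal(size=N)
d = gmlvq_distance(w, x, Omega)
Z = discriminative_projection(rng.normal(size=(10, N)), Lam, M=2)
```

Setting `Omega = np.eye(N)` recovers the squared Euclidean distance, the special case in which every feature has relevance one and no pairs contribute.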
TCGA data: tumor samples
kidney cancer has 3 distinct subtypes (cell of origin): KIRP, KIRC, KICH
six tumor types have ribosomal subtypes, e.g. bladder cancer (BLCA), skin melanoma (SKCM), eye melanoma (UVM)
[figure: t-SNE maps for BLCA, SKCM, UVM]
different tumors and subtypes have different RP mRNA signatures!
TCGA data: tumor samples
RP-profile tumor subtypes display significant differences in survival
examples: uveal melanoma, bladder cancer
TCGA data: tumor samples
classification: e.g. KIRP vs. KIRC kidney tumors
ROC evaluated in 60/40% validation
LVQ relevances
univariate statistical test identifies discriminative RP
conclusion
main findings: (not all presented here, see NAR paper)
RP mRNA signatures vary with
• tissue type • tumor type and sub-type • developmental stage
RP translation rates are proportional to RP mRNA levels
RP mRNA and profiling different in cell cultures vs. cells in-vivo
speculative (yet plausible) conclusions:
RP composition and function
• is tissue-, tumor-, development-, environment-specific
• adds a novel layer to the regulatory network of the cell
• might play an important role in cancer
caveats: composition could be independent of RP abundance
possible extra-ribosomal functions of RP
direct inspection of ribosome is difficult