Evidence for tissue and stage-specific composition of the ribosome:
machine learning analysis of ribosomal protein mRNA data
June 2020
Michael Biehl www.cs.rug.nl/~biehl m.biehl@rug.nl
Bernoulli Inst. for Mathematics, Computer Science and Artificial Intelligence
Groningen, December 2021
2016: Aspen Center for Physics http://www.aspenphys.org
working group, initiated by Gyan Bhanot
development of ideas and first data analysis (LVQ & SOM)
here: emphasis on computational analysis, selected results

the “central dogma” of molecular biology
transcription: DNA → (m)RNA
translation (Ribosome): mRNA → proteins
protein → function

Francis Crick 1958: The Central Dogma
This states that once "information" has passed into protein, it cannot get
out again. In more detail, the transfer of information from nucleic acid to
nucleic acid, or from nucleic acid to protein may be possible, but transfer
from protein to protein, or from protein to nucleic acid is impossible.
Information means here the precise determination of sequence, either of
bases in the nucleic acid or of amino acid residues in the protein.
the “central dogma” of molecular biology
Rosalind Franklin
1920-1958
https://www.electricvoicetheatre.co.uk/rosalind-franklin-centenary-celebrations
https://de.wikipedia.org/wiki/Desoxyribonukleinsäure
remark: mRNA vaccines
Courtesy: National Human Genome Research Institute, NIH, www.genome.gov

the ribosome
• ancient molecular machine, “3D-printer” for proteins
• ~ $10^7$ ribosomes per cell
• believed to have universal function
• believed to have the same composition in all tissues
• consists of RNA and ribosomal proteins (RP)
  RP are also coded by DNA, which is transcribed to mRNA
  we consider the mRNA expression of 78 RP

public domain data sets
GTEx (v6p), www.gtexportal.org
  ca. 10000 normal samples from 53 different tissues (with > 50 samples)
TCGA (The Cancer Genome Atlas; NCI-GDC, v7), www.cancer.gov
  ca. 10000 tumor samples, 730 tumor-adjacent normals
Klijn et al., Nature Biotechnol. 33(3): 306-312 (2015)
  675 cancer cell lines
methods applied to the mRNA expression data: PCA, SOM, t-SNE, UMAP, LVQ
normalization: constant sum of reads for each of the 78 RP
all results robust w.r.t. pre-processing (log, z-score) and choice of distance measures

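The pre-processing variants named on this slide can be sketched as follows. This is a minimal sketch, not the pipeline used in the study: it assumes a samples × genes matrix of read counts and interprets the constant-sum normalization as rescaling each sample so that its reads over the 78 RP add up to the same total. The function names, the pseudocount and the synthetic data are illustrative assumptions.

```python
import numpy as np

def constant_sum(counts, total=1.0):
    """Rescale each sample (row) so that its RP reads add up to `total`."""
    counts = np.asarray(counts, dtype=float)
    return total * counts / counts.sum(axis=1, keepdims=True)

def log_transform(x, pseudocount=1e-6):
    """Logarithm with a small pseudocount (assumed value) to avoid log(0)."""
    return np.log(x + pseudocount)

def z_score(x):
    """Standardize each feature (column) to zero mean and unit variance."""
    return (x - x.mean(axis=0)) / x.std(axis=0)

rng = np.random.default_rng(0)
raw = rng.integers(1, 1000, size=(20, 78))   # synthetic stand-in for read counts
x = constant_sum(raw)                        # every row now sums to 1
z = z_score(log_transform(x))                # optional log + z-score variant
```

Comparing results obtained from `x` and from `z` is one way to check the robustness statement for a given analysis.
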
GTEx ribosome data
ca. 10000 normal samples from 53 different tissues
exclude 5 tissues with < 50 samples
select randomly 88 samples per tissue to avoid bias
set of feature vectors $\{ x^\mu \in \mathbb{R}^N \}_{\mu=1}^{P}$ with $N = 78$, $P = 4224$ (48 tissues × 88 samples)
with target labels (tissue)

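The balanced selection described above can be sketched like this. It is an illustration under assumptions: the function name and the toy label vector are hypothetical, and the code expects every retained class to contain at least `per_class` samples (otherwise `rng.choice` raises an error).

```python
import numpy as np

def balanced_subsample(labels, per_class, min_size, seed=0):
    """Indices of `per_class` random samples for each class with >= `min_size` samples."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    picked = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        if len(idx) < min_size:
            continue                      # class too small: excluded
        picked.append(rng.choice(idx, size=per_class, replace=False))
    return np.sort(np.concatenate(picked))

# toy example: three "tissues" with 120, 30 and 95 samples
labels = np.repeat([0, 1, 2], [120, 30, 95])
idx = balanced_subsample(labels, per_class=88, min_size=50)
counts = np.bincount(labels[idx], minlength=3)   # samples kept per tissue
```
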
computational analysis / machine learning
unsupervised: low-dimensional representation and visualization
• Principal Component Analysis (PCA)
• Self-Organizing Map (SOM)
• t-distributed Stochastic Neighbor Embedding (t-SNE)
• post-hoc labelling
supervised: classification and class-discriminative visualization
• Learning Vector Quantization (LVQ)
• Relevance Learning (Matrix Relevance LVQ)
spoiler: different tissues have different RP mRNA signatures!
basic ideas, limited mathematical detail
focus on aspects relevant for this particular study

Principal Component Analysis (PCA)
[figure: data cloud in the ($x_1$, $x_2$) plane with the principal directions $w_1$ and $w_2$]
centered feature vectors $\{ x^\mu \in \mathbb{R}^N \}_{\mu=1}^{P}$ with $\langle x^\mu \rangle = \frac{1}{P} \sum_{\mu=1}^{P} x^\mu = 0$
$w_1$: direction of largest variance in the data
$w_2$: largest variance in orthogonal space, $w_2 \perp w_1$
$w_3 \perp \{ w_1, w_2 \}$
…
$w_M \perp \{ w_1, w_2, \ldots, w_{M-1} \}$
Principal Component Analysis: eigenvectors of covariance matrix
$C = \frac{1}{P} \sum_{\mu=1}^{P} x^\mu x^{\mu\top} \in \mathbb{R}^{N \times N}$
projections: $y_k^\mu = w_k^\top x^\mu$, $\vec{y}^\mu = [y_1^\mu, y_2^\mu, \ldots, y_M^\mu] \in \mathbb{R}^M$
typically: M < N, low-dimensional linear projections of high-dim. data
with optimal information content for Gaussian data
frequently: M = 2, 3 for visualization of high-dim. data

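A minimal NumPy sketch of the procedure on this slide: center the data, build the covariance matrix, take its leading eigenvectors and project. The function name and the synthetic data are assumptions made for illustration only.

```python
import numpy as np

def pca(x, m):
    """Return the m leading principal directions (rows), projections and variances."""
    x = x - x.mean(axis=0)                  # centering: <x> = 0
    c = x.T @ x / x.shape[0]                # covariance matrix C (N x N)
    evals, evecs = np.linalg.eigh(c)        # eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:m]     # indices of the m largest eigenvalues
    w = evecs[:, order].T                   # rows w_1, ..., w_M (orthonormal)
    y = x @ w.T                             # projections y_k = w_k^T x
    return w, y, evals[order]

rng = np.random.default_rng(1)
x = rng.normal(size=(500, 5)) * np.array([5.0, 2.0, 1.0, 0.5, 0.1])
w, y, variances = pca(x, m=2)
```

The variance of each projection `y[:, k]` equals the corresponding eigenvalue of `C`, which is why the leading eigenvectors are the directions of largest variance.
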
GTEx data (normal samples)
Principal Component Analysis
- tissue labels not used
- low-dimensional projections of 78-dim. mRNA expression data
- post-labelling (coloring) of data points according to tissue (group):
  whole blood - brain tissues - all other tissues
[figure: scatter plots of the leading PCA projections $y_1$, $y_2$, $y_3$]
different tissues have different RP mRNA signatures!

Competitive Learning
Vector Quantization: identify typical representatives of data which capture essential features
VQ system: set of prototypes $w^1, w^2, \ldots, w^K$ with $w^k \in \mathbb{R}^N$
data: set of feature vectors $x^1, x^2, \ldots, x^P$ with $x^\mu \in \mathbb{R}^N$
based on dis-similarity/distance measure $d(w, x) \geq 0$
most popular: (squared) Euclidean distance $d(w, x) = \sum_{i=1}^{N} (w_i - x_i)^2$
assignment to prototypes: e.g. Nearest Prototype Scheme
given vector $x^\mu$, determine winner or best matching unit (BMU)
$w^* = w^{j^*}$ with $j^* = \operatorname{argmin}_j \, d(w^j, x^\mu)$ → assign $x^\mu$ to prototype $w^*$

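The nearest prototype scheme amounts to an argmin over distances. A minimal sketch with made-up prototypes (the function names and numbers are illustrative assumptions):

```python
import numpy as np

def squared_euclidean(w, x):
    """Squared Euclidean distance between x and each row of w."""
    return np.sum((w - x) ** 2, axis=-1)

def best_matching_unit(prototypes, x):
    """Index j* of the prototype closest to x."""
    return int(np.argmin(squared_euclidean(prototypes, x)))

prototypes = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]])
x = np.array([0.9, 1.2])
j_star = best_matching_unit(prototypes, x)   # distances: 2.25, 0.05, 11.05
```
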
Competitive Learning
initially: randomized $w^k$, e.g. in randomly selected data points
random sequential (repeated) presentation of data $\{ x^\mu \}$
… BMU: The Winner Takes it All: $w^* = w^{j^*}$ with $j^* = \operatorname{argmin}_j \, d(w^j, x^\mu)$
update of the winner: $w^* \to w^* + \eta \, (x^\mu - w^*)$
η (< 1): learning rate, step size of update
repeated presentation of the data set:
prototypes → typical representatives of data
possibly reflect structures in the data set, e.g. clusters

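The winner-takes-all training loop can be sketched as below. This is an illustrative sketch only: the function name, the learning rate, the number of epochs and the two-cluster toy data are assumptions, not settings from the study.

```python
import numpy as np

def train_vq(x, k, eta=0.05, epochs=20, seed=0):
    """Online competitive learning with squared Euclidean distance."""
    rng = np.random.default_rng(seed)
    # initialize prototypes in randomly selected data points
    w = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(epochs):
        for mu in rng.permutation(len(x)):          # random sequential presentation
            j = np.argmin(np.sum((w - x[mu]) ** 2, axis=1))   # winner (BMU)
            w[j] += eta * (x[mu] - w[j])            # move only the winner towards x
    return w

rng = np.random.default_rng(1)
x = np.vstack([rng.normal(0.0, 0.3, size=(50, 2)),
               rng.normal(5.0, 0.3, size=(50, 2))])
w = train_vq(x, k=2)
```

Because every update is a convex combination of the old prototype and a data point, the prototypes never leave the region covered by the data.
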
Self-Organizing Map (SOM)
T. Kohonen. Self-Organizing Maps (Springer 1995)
neighborhood relation on a pre-defined low-dim. lattice
d-dim. lattice A of prototypes (neurons) $w_r \in \mathbb{R}^N$ at $r \in \mathbb{R}^d$
e.g. two-dim. square lattice, triangular, etc.
upon presentation of $x^\mu$:
- determine the Best Matching Unit $w_s$ at position s in the lattice
- update BMU and lattice neighborhood:
  $w_r \to w_r + \eta \, h_\rho(r, s) \, (x^\mu - w_r)$
  where $h_\rho(r, s) = \exp\left( - \frac{\| r - s \|_A^2}{2 \rho^2} \right)$
  range ρ w.r.t. distances in lattice A

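A single SOM update step, written out as a sketch. The function name, the 3 × 3 square lattice and the parameter values are illustrative assumptions; the lattice distance is taken to be Euclidean in lattice coordinates.

```python
import numpy as np

def som_step(w, grid, x, eta, rho):
    """One SOM update. w: (K, N) prototypes, grid: (K, d) lattice positions."""
    s = int(np.argmin(np.sum((w - x) ** 2, axis=1)))      # BMU index
    lattice_d2 = np.sum((grid - grid[s]) ** 2, axis=1)    # ||r - s||^2 in the lattice
    h = np.exp(-lattice_d2 / (2.0 * rho ** 2))            # neighborhood function
    return w + eta * h[:, None] * (x - w), s

# 3 x 3 square lattice of prototypes in a 4-dim. feature space
grid = np.array([(i, j) for i in range(3) for j in range(3)], dtype=float)
rng = np.random.default_rng(0)
w = rng.normal(size=(9, 4))
x = rng.normal(size=4)
w_new, s = som_step(w, grid, x, eta=0.5, rho=1.0)
```

The BMU receives the full step `eta`, while lattice neighbors move by the smaller amount `eta * h`. In practice `eta` and `rho` are usually decreased during training.
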
Self-Organizing Map (SOM)
prototype lattice deforms, reflecting the density of observations
typical case: high-dim. original feature space → two-dim. lattice
[figure: SOM lattice adapting to a data set, © Wikipedia]
SOM: provides topology/neighborhood preserving
low-dimensional representation (visualization, clusters …)
Frequently:
unsupervised analysis, post-hoc labelling of prototypes
according to the majority of assigned data points

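Post-hoc labelling by majority vote can be sketched as follows. The function name, the marker `-1` for prototypes without assigned data and the toy data are assumptions made for illustration.

```python
import numpy as np

def majority_labels(w, x, labels, n_classes):
    """Label each prototype by the majority class of the data assigned to it."""
    d = ((x[:, None, :] - w[None, :, :]) ** 2).sum(axis=2)   # (P, K) squared distances
    bmu = d.argmin(axis=1)                                   # winner for every data point
    proto_label = np.full(len(w), -1)                        # -1: no data assigned
    for k in range(len(w)):
        assigned = labels[bmu == k]
        if assigned.size:
            proto_label[k] = np.bincount(assigned, minlength=n_classes).argmax()
    return proto_label

w = np.array([[0.0, 0.0], [10.0, 10.0], [100.0, 100.0]])
x = np.array([[0.0, 1.0], [1.0, 0.0], [0.5, 0.5], [9.0, 10.0], [10.0, 11.0]])
labels = np.array([0, 0, 1, 1, 1])
proto_label = majority_labels(w, x, labels, n_classes=2)
```
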
GTEx data (normal samples)
[figure: two SOM lattices with post-hoc labelled prototypes; panels: individual tissue labels, suggested groups of tissues (15 groups: other tissues, ovary, liver, adrenal gland, pancreas, muscle, heart, cells (fibroblasts), cells (ebv), skin, pituitary, brain (other), cerebellum, testis, blood); grayscale: avg. distance from neighbor prototypes]

GTEx data (normal samples)
subsets of RP mRNA data:
[figure: SOM maps for subsets of the RP mRNA data; special marking if majority < 50%]
different tissues have different RP mRNA signatures!

low-dimensional embedding
- represent high-dim data in low-dim. embedding space
- no functional mapping $\mathbb{R}^N \to \mathbb{R}^M$, $M < N$ (as in PCA)
  instead: representation of $\{ x^\mu \}_{\mu=1}^{P}$ by explicit counterparts $\{ \vec{y}^\mu \}_{\mu=1}^{P}$
Multi-Dimensional Scaling: pair-wise distances
$d_N(x^\mu, x^\nu) = \left[ \sum_{j=1}^{N} (x_j^\mu - x_j^\nu)^2 \right]^{1/2}$
$d_M(\vec{y}^\mu, \vec{y}^\nu) = \left[ \sum_{k=1}^{M} (y_k^\mu - y_k^\nu)^2 \right]^{1/2}$
find $\{ \vec{y}^\mu \}$ → distances approx. reproduced
$\min_{\{ \vec{y}^\mu \}} \sum_{\mu, \nu, \mu \neq \nu} \left[ d_N(x^\mu, x^\nu) - d_M(\vec{y}^\mu, \vec{y}^\nu) \right]^2$
[figure: points $x_1$, $x_2$, $x_3$ with pair-wise distances $d_N$ and their embedded counterparts with distances $d_M$]

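The MDS cost function can be evaluated directly from the two distance matrices. The sketch below only computes the cost for a given embedding; the minimization itself (e.g. by gradient descent) is left out, and all names and data are illustrative assumptions.

```python
import numpy as np

def pairwise_distances(z):
    """Matrix of Euclidean distances between all rows of z."""
    diff = z[:, None, :] - z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def mds_cost(x, y):
    """Sum over all pairs (mu != nu) of [d_N(x^mu, x^nu) - d_M(y^mu, y^nu)]^2."""
    delta = pairwise_distances(x) - pairwise_distances(y)
    return float((delta ** 2).sum())        # diagonal terms vanish

rng = np.random.default_rng(0)
x = rng.normal(size=(30, 5))                # high-dim. data (N = 5)
y_random = rng.normal(size=(30, 2))         # some candidate embedding (M = 2)
theta = 0.7
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
cost = mds_cost(x, y_random)
cost_moved = mds_cost(x, y_random @ rot.T + 3.0)   # rotated and shifted embedding
```

The cost depends on the embedding only through pair-wise distances, so rotating or shifting all $\vec{y}^\mu$ leaves it unchanged. This is one reason why the embedding axes have no intrinsic meaning.
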
stochastic neighbor embedding
• consider distance-based probability for pair-wise neighborhood
  in the original feature space, e.g.
  $p_{\mu|\nu} \propto \exp\left[ - \frac{1}{2} \, (x^\mu - x^\nu)^2 / \sigma_\nu^2 \right]$, $p_{\mu,\nu} = \frac{1}{2P} \left( p_{\mu|\nu} + p_{\nu|\mu} \right)$
  local std. deviations $\sigma_\mu$ determined by local density of data
• analogous in embedding space: pairwise probabilities $q_{\mu,\nu}$
  e.g. Gaussian density in Stochastic Neighbor Embedding (SNE)
  t-SNE: student-t distribution
  $q_{\mu,\nu} = \frac{\left[ 1 + (\vec{y}^\mu - \vec{y}^\nu)^2 \right]^{-1}}{\sum_{\rho, \tau, \rho \neq \tau} \left[ 1 + (\vec{y}^\rho - \vec{y}^\tau)^2 \right]^{-1}}$
find $\{ \vec{y}^\mu \}$ → similar probabilities with emphasis on local neighborhoods
Kullback-Leibler divergence:
$\min_{\{ \vec{y}^\mu \}} \sum_{\mu, \nu, \mu \neq \nu} p_{\mu,\nu} \ln\left[ \frac{p_{\mu,\nu}}{q_{\mu,\nu}} \right] = D_{KL}(P|Q)$

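The quantities entering the t-SNE cost can be sketched in a few lines. This is a simplified illustration, not a full t-SNE implementation: it uses one global width `sigma` instead of locally adapted widths, normalizes the symmetrized probabilities by twice the number of data points, and only evaluates the cost for a given embedding. All names and data are assumptions.

```python
import numpy as np

def sq_dists(z):
    """Matrix of squared Euclidean distances between all rows of z."""
    diff = z[:, None, :] - z[None, :, :]
    return (diff ** 2).sum(axis=2)

def p_matrix(x, sigma=1.0):
    """Symmetrized neighborhood probabilities in the original space."""
    n = len(x)
    g = np.exp(-0.5 * sq_dists(x) / sigma ** 2)
    np.fill_diagonal(g, 0.0)                      # no self-neighborhood
    cond = g / g.sum(axis=0, keepdims=True)       # cond[mu, nu] = p_{mu|nu}
    return (cond + cond.T) / (2.0 * n)

def q_matrix(y):
    """Student-t based probabilities in the embedding space."""
    t = 1.0 / (1.0 + sq_dists(y))
    np.fill_diagonal(t, 0.0)
    return t / t.sum()

def kl_divergence(p, q):
    """Kullback-Leibler divergence, summed over pairs with p > 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

rng = np.random.default_rng(0)
x = rng.normal(size=(25, 6))
y = rng.normal(size=(25, 2))
p, q = p_matrix(x), q_matrix(y)
cost = kl_divergence(p, q)
```
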
GTEx data (normal samples)
t-SNE analysis, post-hoc labelling according to tissues:
clustering of tissue-types
here: no intuitive interpretation of $y_1$, $y_2$
see NAR publication for more detailed maps
different tissues have different RP mRNA signatures!

(supervised) Learning Vector Quantization
N-dimensional data, feature vectors
∙ identification of prototype vectors from labeled example data
∙ distance based classification (e.g. Euclidean)
supervised competitive learning: LVQ1 [Kohonen, 1990]
• initialize prototype vectors for different classes
• present a single example
• identify the winner (closest prototype)
• move the winner
  - closer towards the data (same class)
  - away from the data (different class)

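The LVQ1 steps listed above can be sketched as follows. This is a minimal illustration with one prototype per class; the initialization in the class means, the learning rate and the toy data are assumptions and not the settings of the study.

```python
import numpy as np

def lvq1_step(w, w_labels, x, label, eta=0.05):
    """Present one example: move the winner towards x (same class) or away (other class)."""
    j = int(np.argmin(np.sum((w - x) ** 2, axis=1)))    # closest prototype
    sign = 1.0 if w_labels[j] == label else -1.0
    w[j] += sign * eta * (x - w[j])                     # in-place update of the winner
    return j

def train_lvq1(x, labels, eta=0.05, epochs=30, seed=0):
    rng = np.random.default_rng(seed)
    classes = np.unique(labels)
    # one prototype per class, initialized in the class mean (assumption)
    w = np.array([x[labels == c].mean(axis=0) for c in classes])
    for _ in range(epochs):
        for mu in rng.permutation(len(x)):
            lvq1_step(w, classes, x[mu], labels[mu], eta)
    return w, classes

def classify(w, w_labels, x):
    """Nearest prototype classification."""
    d = ((x[:, None, :] - w[None, :, :]) ** 2).sum(axis=2)
    return w_labels[d.argmin(axis=1)]

rng = np.random.default_rng(0)
x = np.vstack([rng.normal(0.0, 1.0, size=(100, 2)),
               rng.normal(6.0, 1.0, size=(100, 2))])
labels = np.repeat([0, 1], 100)
w, w_labels = train_lvq1(x, labels)
accuracy = np.mean(classify(w, w_labels, x) == labels)
```
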
(supervised) Learning Vector Quantization
N-dimensional data, feature vectors
∙ identification of prototype vectors from labeled example data
∙ distance-based classification [here: Euclidean distances]
∙ tessellation of feature space [piece-wise linear]
∙ aim: discrimination of classes
  ( ≠ vector quantization or density estimation )

Matrix Relevance LVQ
generalized quadratic distance in LVQ:
$d(w, x) = \sum_{i,j=1}^{N} (w_i - x_i)\, \Lambda_{ij}\, (w_j - x_j)$
$\Lambda_{ij}$ = contribution of the pair of features i,j
$\Lambda_{jj}$ = relevance of feature j
special case $\Lambda_{ii} = 1$ for all $i$, $\Lambda_{ij} = 0$ for $i \neq j$:
$d(w, x) = \sum_{j} (w_j - x_j)^2$ (squared Euclidean distance)
Matrix Relevance LVQ
generalized quadratic distance in LVQ:
$d(w, x) = (w - x)^\top \Lambda\, (w - x) = [\, \Omega\, (w - x)\, ]^2$, with $\Lambda = \Omega^\top \Omega$
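A small numerical check may make the two forms of the distance concrete. The sketch below assumes the parameterization Λ = ΩᵀΩ, which is what the equality of the two expressions implies. The helper names and the toy matrix are hypothetical and chosen only for this example.

```python
# Generalized quadratic distance, computed in two equivalent ways.
# Helper names and the example matrix are illustrative assumptions.

def mat_vec(m, v):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def lambda_from_omega(omega):
    """Relevance matrix Lambda = Omega^T Omega."""
    n = len(omega[0])
    return [[sum(row[i] * row[j] for row in omega) for j in range(n)]
            for i in range(n)]


def distance_lambda(w, x, lam):
    """d(w, x) = sum_ij (w_i - x_i) Lambda_ij (w_j - x_j)."""
    diff = [wi - xi for wi, xi in zip(w, x)]
    return sum(di * lij * dj
               for di, row in zip(diff, lam)
               for lij, dj in zip(row, diff))


def distance_omega(w, x, omega):
    """d(w, x) = squared norm of Omega (w - x)."""
    diff = [wi - xi for wi, xi in zip(w, x)]
    return sum(c ** 2 for c in mat_vec(omega, diff))


omega = [[1.0, 2.0],
         [0.0, 1.0]]
w, x = [1.0, 0.0], [0.0, 1.0]

lam = lambda_from_omega(omega)          # [[1.0, 2.0], [2.0, 5.0]]
print(distance_lambda(w, x, lam))       # 2.0
print(distance_omega(w, x, omega))      # 2.0

# With the identity matrix the distance reduces to squared Euclidean.
identity = [[1.0, 0.0], [0.0, 1.0]]
print(distance_lambda(w, x, identity))  # 2.0
```

Writing Λ as ΩᵀΩ guarantees that the distance is never negative, whatever values Ω takes during training.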
Matrix Relevance LVQ
training:
move prototypes and change matrix Ω in order to
- decrease $d(w, x)$ if labels agree
- increase $d(w, x)$ if labels disagree
after training:
prototypes: represent typical class properties
Relevance Matrix: $\Lambda_{ij}$ quantifies the relevance of features / pairs of features
leading eigenvectors of $\Lambda$ mark the most discriminative directions
in feature space → low-dim. representation of labelled data sets
generalized quadratic distance in LVQ:
$d(w, x) = (w - x)^\top \Lambda\, (w - x) = [\, \Omega\, (w - x)\, ]^2$
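How the prototypes and Ω have to move in order to decrease or increase the distance follows from its derivatives. The lines below are a sketch under the assumption $\Lambda = \Omega^\top \Omega$. The cost function and step sizes actually used in training are not given on this slide, so only the raw gradients are shown:

```latex
\begin{align*}
d(w, x) &= (w - x)^\top \Omega^\top \Omega\, (w - x) \\
\frac{\partial d}{\partial w} &= 2\, \Omega^\top \Omega\, (w - x) = 2\, \Lambda\, (w - x) \\
\frac{\partial d}{\partial \Omega} &= 2\, \Omega\, (w - x)(w - x)^\top
\end{align*}
```

A step against these gradients decreases $d(w, x)$ (labels agree), a step along them increases it (labels disagree). In practice Ω is usually renormalized after each step, for example so that the diagonal of Λ sums to one, which keeps the relevances comparable.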
GTEx data (normal samples)
different tissues have different RP mRNA signatures!
[figure labels: brain tissues (7-12), blood, spleen, brain cerebellum (8+9), muscle, pancreas, testis, heart (21,22), liver, pituitary (28)]
TCGA data: tumor samples
different tumors and subtypes have different RP mRNA signatures!
kidney cancer has 3 distinct subtypes (cell of origin): KIRP, KIRC, KICH
six tumor types have ribosomal subtypes, e.g. bladder cancer (BLCA), skin melanoma (SKCM), eye melanoma (UVM)
[figure: t-SNE panels for BLCA, SKCM, UVM]
TCGA data: tumor samples
RP-profile tumor subtypes display significant differences in survival
examples: uveal melanoma, bladder cancer
TCGA data: tumor samples
classification, e.g. KIRP vs. KIRC kidney tumors
LVQ relevances
ROC evaluated in 60/40% validation
univariate statistical test identifies discriminative RP
[figure: KIRP vs. KIRC, axis from 0.0 to 1.0]
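The evaluation step can be illustrated with a short sketch. It assumes that "60/40% validation" means a random split into 60% training and 40% validation data, which is an interpretation and not stated on the slide. The function names, the seed and the toy scores are likewise made up for this example. The area under the ROC curve is computed as the fraction of (positive, negative) pairs that the classifier score ranks correctly.

```python
import random

# Sketch of a 60/40 split and an ROC AUC computation (pure Python).
# The split interpretation, names and toy numbers are assumptions.

def split_60_40(samples, seed=0):
    """Randomly split samples into 60% training and 40% validation."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = int(round(0.6 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def roc_auc(scores, labels):
    """Area under the ROC curve for binary labels (1 = positive, 0 = negative).

    Equals the fraction of positive/negative pairs in which the positive
    example has the higher score; ties count as one half.
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


train, val = split_60_40(list(range(10)))
print(len(train), len(val))  # 6 4

# toy validation scores, e.g. a distance difference between two class prototypes
print(roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))  # 0.75
```

Repeating the random split several times and averaging the resulting curves gives a more stable estimate than a single split.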
conclusion
main findings: (not all presented here, see NAR paper)
RP mRNA signatures vary with
∙ tissue type ∙ tumor type and sub-type ∙ developmental stage
RP translation rates are proportional to RP mRNA levels
RP mRNA and profiling differ in cell cultures vs. cells in vivo
speculative (yet plausible) conclusions:
RP composition and function
∙ is tissue-, tumor-, development-, environment-specific
∙ adds a novel layer to the regulatory network of the cell
∙ might play an important role in cancer
caveats:
∙ composition could be independent of RP abundance
∙ possible extra-ribosomal functions of RP
∙ direct inspection of the ribosome is difficult
Thanks!
Questions?

 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 
Caco-2 cell permeability assay for drug absorption
Caco-2 cell permeability assay for drug absorptionCaco-2 cell permeability assay for drug absorption
Caco-2 cell permeability assay for drug absorption
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
 

Evidence for tissue and stage-specific composition of the ribosome: machine learning analysis of ribosomal protein mRNA data

  • 1. Evidence for tissue and stage-specific composition of the ribosome: machine learning analysis of ribosomal protein mRNA data June 2020 Michael Biehl www.cs.rug.nl/~biehl m.biehl@rug.nl Bernoulli Inst. for Mathematics, Computer Science and Artificial Intelligence
  • 2. Groningen, December 2021. 2016: Aspen Center for Physics http://www.aspenphys.org working group, initiated by Gyan Bhanot: development of ideas and first data analysis (LVQ & SOM). here: emphasis on computational analysis, selected results
  • 3. the “central dogma” of molecular biology. transcription: DNA → (m)RNA
  • 4. the “central dogma” of molecular biology. transcription: DNA → (m)RNA; translation: mRNA → proteins; protein → function
  • 5. the “central dogma” of molecular biology. transcription: DNA → (m)RNA; translation (Ribosome): mRNA → proteins; protein → function
  • 6. the “central dogma” of molecular biology. Francis Crick 1958: The Central Dogma: "This states that once "information" has passed into protein, it cannot get out again. In more detail, the transfer of information from nucleic acid to nucleic acid, or from nucleic acid to protein may be possible, but transfer from protein to protein, or from protein to nucleic acid is impossible. Information means here the precise determination of sequence, either of bases in the nucleic acid or of amino acid residues in the protein." Rosalind Franklin 1920-1958 https://www.electricvoicetheatre.co.uk/rosalind-franklin-centenary-celebrations https://de.wikipedia.org/wiki/Desoxyribonukleinsäure
  • 7. the ribosome • ancient molecular machine, “3D-printer” for proteins • ~10^7 ribosomes per cell • believed to have universal function • believed to have the same composition in all tissues
  • 8. remark: mRNA vaccines. Courtesy: National Human Genome Research Institute, NIH www.genome.gov
  • 9. the ribosome • ancient molecular machine, “3D-printer” for proteins • ~10^7 ribosomes per cell • believed to have universal function • believed to have the same composition in all tissues • consists of RNA and ribosomal proteins (RP); the RP are also coded by DNA, which is transcribed to mRNA. we consider 78 RP ← mRNA expression
  • 10. mRNA expression: public domain data sets • Klijn et al. Nature Biotechnol. 33(3): 306-312 (2015): 675 cancer cell lines • GTEx (v6p) www.gtexportal.org: ca. 10000 normal samples from 53 different tissues (with >50 samples) • TCGA (The Cancer Genome Atlas; NCI-GDC, v7) www.cancer.gov: ca. 10000 tumor samples, 730 tumor-adjacent normals. methods: PCA, SOM, t-SNE, UMAP, LVQ. normalization: constant sum of reads for each of the 78 RP. all results robust w.r.t. pre-processing (log, z-score) and choice of distance measures
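The normalization mentioned on slide 10 can be illustrated with a small sketch. It assumes one particular reading of "constant sum of reads", namely that each sample's 78 RP read counts are rescaled so that they add up to the same total. The function name `normalize_constant_sum`, the target total of 1.0, and the three example counts are hypothetical and not taken from the study.

```python
def normalize_constant_sum(sample, total=1.0):
    """Rescale one sample's read counts so that they sum to `total`.

    Assumption: 'constant sum of reads' is applied per sample,
    across its RP genes.
    """
    s = sum(sample)
    if s == 0:
        raise ValueError("sample has no reads")
    return [total * value / s for value in sample]


# Hypothetical read counts for three RP genes in one sample.
reads = [120.0, 30.0, 50.0]
normalized = normalize_constant_sum(reads)
```

After this step only the relative composition of the RP mRNA levels in a sample remains, which is the quantity a tissue signature would be built on.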
  • 11. GTEx ribosome data: ca. 10000 normal samples from 53 different tissues; exclude 5 tissues with < 50 samples; select randomly 88 samples per tissue to avoid bias. set of feature vectors $\{x^\mu \in \mathbb{R}^N\}_{\mu=1}^{P}$ with $N = 78$, $P = 4224$, with target labels (tissue)
  • 12. computational analysis / machine learning. unsupervised: low-dimensional representation and visualization • Principal Component Analysis (PCA) • Self-Organizing Map (SOM) • Stochastic Neighborhood Embedding (t-SNE) • post-hoc labelling. supervised: classification and class-discriminative visualization • Learning Vector Quantization (LVQ) • Relevance Learning (Matrix Relevance LVQ). spoiler: different tissues have different RP mRNA signatures! basic ideas, limited mathematical detail; focus on aspects relevant for this particular study
  • 13. Principal Component Analysis (PCA). feature vectors $\{x^\mu \in \mathbb{R}^N\}_{\mu=1}^{P}$, centered: $\langle x^\mu \rangle = \frac{1}{P} \sum_{\mu=1}^{P} x^\mu = 0$. $w_1$: direction of largest variance in the data; $w_2$: largest variance in orthogonal space, $w_2 \perp w_1$; $w_3 \perp \{w_1, w_2\}$; … ; $w_M \perp \{w_1, w_2, \ldots, w_{M-1}\}$. The $w_k$ are eigenvectors of the covariance matrix $C = \frac{1}{P} \sum_{\mu=1}^{P} x^\mu x^{\mu\top} \in \mathbb{R}^{N \times N}$. projections: $y_k^\mu = w_k^\top x^\mu$, $\vec{y}^\mu = [y_1^\mu, y_2^\mu, \ldots, y_M^\mu] \in \mathbb{R}^M$. typically: $M < N$, low-dimensional linear projections of high-dim. data with optimal information content for Gaussian data; frequently: $M = 2, 3$ for visualization of high-dim. data [figure: data cloud in the $x_1$, $x_2$ plane with principal directions]
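The PCA steps of slide 13 (centering, covariance matrix, leading eigenvector, projection) can be traced in a minimal sketch. It uses plain Python and power iteration to find only the first principal direction. This is an illustrative substitute for a full eigendecomposition, not the procedure used in the study, and the five two-dimensional data points are made up.

```python
import math


def center(data):
    """Subtract the mean vector so that <x> = 0."""
    n = len(data[0])
    mean = [sum(x[i] for x in data) / len(data) for i in range(n)]
    return [[x[i] - mean[i] for i in range(n)] for x in data]


def covariance(data):
    """C = (1/P) * sum_mu x^mu x^mu^T for centered data."""
    p, n = len(data), len(data[0])
    return [[sum(x[i] * x[j] for x in data) / p for j in range(n)]
            for i in range(n)]


def leading_eigenvector(c, iterations=200):
    """Power iteration; assumes the start vector is not orthogonal to w_1."""
    n = len(c)
    w = [1.0] * n
    for _ in range(iterations):
        w = [sum(c[i][j] * w[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
    return w


def project(data, w):
    """y^mu = w^T x^mu for every sample."""
    return [sum(wi * xi for wi, xi in zip(w, x)) for x in data]


# Made-up 2-dim. data scattered along the diagonal.
data = [[2.0, 1.9], [1.0, 1.1], [0.0, 0.1], [-1.0, -0.9], [-2.0, -2.2]]
centered = center(data)
w1 = leading_eigenvector(covariance(centered))
y1 = project(centered, w1)
```

The variance of the projections `y1` equals the largest eigenvalue of `C`, so it exceeds the variance along either original coordinate axis. Further directions $w_2, w_3, \ldots$ would be obtained in the subspace orthogonal to those already found.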
  • 14. GTEx data (normal samples). Principal Component Analysis - tissue labels not used - low-dimensional projections of 78-dim. mRNA expression data [figure: scatter plots of the projections, axes $y_1$, $y_2$, $y_3$]
  • 15. GTEx data (normal samples). Principal Component Analysis - tissue labels not used - low-dimensional projections of 78-dim. mRNA expression data - post-labelling (coloring) of data points according to tissue (group): whole blood - brain tissues - all other tissues [figure: colored scatter plots of the projections, axes $y_1$, $y_2$, $y_3$]. different tissues have different RP mRNA signatures!
  • 16. Competitive Learning. Vector Quantization: identify typical representatives of data which capture essential features. VQ system: set of prototypes $w^1, w^2, \ldots, w^K$, $w^k \in \mathbb{R}^N$; data: set of feature vectors $x^1, x^2, \ldots, x^P$, $x^\mu \in \mathbb{R}^N$. based on dis-similarity/distance measure $d(w, x) \geq 0$; most popular: (squared) Euclidean distance $d(w, x) = \sum_{i=1}^{N} (w_i - x_i)^2$. assignment to prototypes, e.g. Nearest Prototype Scheme: given vector $x^\mu$, determine winner or best matching unit (BMU) $w^* = \mathrm{argmin}_j \, d(w^j, x^\mu)$ → assign $x^\mu$ to prototype $w^*$
  • 17. Competitive Learning. initially: randomized $w^k$, e.g. in randomly selected data points; random sequential (repeated) presentation of data $\{x^\mu\}$ … BMU: $w^* = \mathrm{argmin}_j \, d(w^j, x^\mu)$. The Winner Takes it All: $w^* \to w^* + \eta \, (x^\mu - w^*)$, with $\eta$ ($< 1$): learning rate, step size of update. repeated presentation of the data set: prototypes → typical representatives of data, possibly reflect structures in the data set, e.g. clusters
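The nearest prototype scheme and the winner-takes-all update of slides 16 and 17 can be sketched as follows. The function names, the learning rate, the number of epochs and the toy data set are illustrative choices, not values from the study.

```python
import random


def squared_euclidean(w, x):
    """d(w, x) = sum_i (w_i - x_i)^2."""
    return sum((wi - xi) ** 2 for wi, xi in zip(w, x))


def winner(prototypes, x):
    """Index of the best matching unit (closest prototype)."""
    return min(range(len(prototypes)),
               key=lambda j: squared_euclidean(prototypes[j], x))


def wta_update(prototypes, x, eta):
    """Move only the winner towards x: w* <- w* + eta * (x - w*)."""
    j = winner(prototypes, x)
    prototypes[j] = [wi + eta * (xi - wi) for wi, xi in zip(prototypes[j], x)]
    return j


def train_vq(data, k, eta=0.05, epochs=20, seed=0):
    """Initialize prototypes in random data points, then present the
    data repeatedly in random sequential order."""
    rng = random.Random(seed)
    prototypes = [list(x) for x in rng.sample(data, k)]
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # random permutation
            wta_update(prototypes, x, eta)
    return prototypes


# Toy data: two small groups of points in the plane.
data = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
        [5.0, 5.1], [5.2, 5.0], [5.1, 5.2]]
prototypes = train_vq(data, k=2)
```

Because each update is a convex combination of the old prototype and a data point (for $\eta < 1$), the prototypes never leave the region spanned by the data.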
  • 18. Self-Organizing Map (SOM). T. Kohonen. Self-Organizing Maps (Springer 1995). neighborhood relation on a pre-defined low-dim. lattice: d-dim. lattice A of prototypes (neurons) $w^r \in \mathbb{R}^N$ at $r \in \mathbb{R}^d$, e.g. two-dim. square lattice, triangular, etc. upon presentation of $x^\mu$: - determine the Best Matching Unit $w^s$ at position $s$ in the lattice - update BMU and lattice neighborhood: $w^r \to w^r + \eta \, h_\rho(r, s) \, (x^\mu - w^r)$, where $h_\rho(r, s) = \exp\left( - \frac{\| r - s \|_A^2}{2 \rho^2} \right)$ with range $\rho$ w.r.t. distances in lattice A
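A single SOM update as described on slide 18 can be sketched like this. The sketch assumes a two-dimensional square lattice with Euclidean lattice distances and stores the prototypes in a dictionary keyed by lattice position. The helper names and the example values for $\eta$ and $\rho$ are hypothetical.

```python
import math


def neighborhood(r, s, rho):
    """h_rho(r, s) = exp(-|r - s|^2 / (2 rho^2)), distances taken in the lattice."""
    d2 = sum((ri - si) ** 2 for ri, si in zip(r, s))
    return math.exp(-d2 / (2.0 * rho ** 2))


def som_step(weights, x, eta, rho):
    """Present one example x: find the BMU position s, then update all
    prototypes in place, weighted by their lattice distance from s."""
    s = min(weights,
            key=lambda r: sum((wi - xi) ** 2 for wi, xi in zip(weights[r], x)))
    for r in list(weights):
        h = neighborhood(r, s, rho)
        weights[r] = [wi + eta * h * (xi - wi) for wi, xi in zip(weights[r], x)]
    return s


# 2 x 2 lattice; each lattice position carries a prototype in feature space.
weights = {(0, 0): [0.0, 0.0], (0, 1): [1.0, 0.0],
           (1, 0): [0.0, 1.0], (1, 1): [1.0, 1.0]}
bmu = som_step(weights, x=[-1.0, -1.0], eta=0.5, rho=1.0)
```

The BMU receives the full step, while prototypes farther away in the lattice move less. In practice $\eta$ and $\rho$ are usually decreased during training, which is not shown here.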
  • 19. Self-Organizing Map (SOM). prototype lattice deforms, reflecting the density of observations; typical case: high-dim. original feature space → two-dim. lattice [figure © Wikipedia]. SOM: provides topology/neighborhood preserving low-dimensional representation (visualization, clusters …). Frequently: unsupervised analysis, post-hoc labelling of prototypes according to the majority of assigned data points
  • 20. GTEx data (normal samples) [figure: two SOM lattice maps with numbered prototype labels, one showing individual tissue labels and one showing suggested groups of tissues (15 groups): other tissues, ovary, liver, adrenal gland, pancreas, muscle, heart, cells (fibroblasts), cells (ebv), skin, pituitary, brain (other), cerebellum, testis, blood; grayscale: avg. distance from neighbor prototypes]
  • 21. GTEx data (normal samples). subsets of RP mRNA data [figure: two maps, each annotated "if majority < 50%"]. different tissues have different RP mRNA signatures!
  • 22. low-dimensional embedding - represent high-dim. data in low-dim. embedding space - no functional mapping $\mathbb{R}^N \to \mathbb{R}^M$, $M < N$ (as in PCA); instead: representation of $\{x^\mu\}_{\mu=1}^{P}$ by explicit counterparts $\{\vec{y}^\mu\}_{\mu=1}^{P}$. Multi-Dimensional Scaling: pair-wise distances $d_N(x^\mu, x^\nu) = \left[ \sum_{j=1}^{N} (x_j^\mu - x_j^\nu)^2 \right]^{1/2}$ and $d_M(\vec{y}^\mu, \vec{y}^\nu) = \left[ \sum_{k=1}^{M} (y_k^\mu - y_k^\nu)^2 \right]^{1/2}$; find $\{\vec{y}^\mu\}$ → distances approx. reproduced: $\min_{\{\vec{y}^\mu\}} \sum_{\mu, \nu, \mu \neq \nu} \left[ d_N(x^\mu, x^\nu) - d_M(\vec{y}^\mu, \vec{y}^\nu) \right]^2$ [figure: points in $x_1$, $x_2$, $x_3$ space with distances $d_N$, and their embedded counterparts with distances $d_M$]
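The Multi-Dimensional Scaling cost function of slide 22 can be evaluated directly. The sketch only computes the cost for a given embedding and does not perform the minimization over $\{\vec{y}^\mu\}$. The function names and the example points are made up.

```python
import math


def dist(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def mds_cost(xs, ys):
    """sum over mu != nu of [d_N(x^mu, x^nu) - d_M(y^mu, y^nu)]^2."""
    total = 0.0
    for mu in range(len(xs)):
        for nu in range(len(xs)):
            if mu != nu:
                total += (dist(xs[mu], xs[nu]) - dist(ys[mu], ys[nu])) ** 2
    return total


# Three points in 3 dimensions that happen to lie in a plane ...
xs = [[0.0, 0.0, 5.0], [3.0, 0.0, 5.0], [0.0, 4.0, 5.0]]
# ... so a 2-dim. embedding can reproduce all distances exactly.
ys_good = [[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]]
ys_poor = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

cost_good = mds_cost(xs, ys_good)
cost_poor = mds_cost(xs, ys_poor)
```

For genuinely high-dimensional data the cost cannot reach zero, and the embedding is the set of points $\vec{y}^\mu$ that makes it as small as possible, typically found by gradient-based optimization.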
  • 23. stochastic neighborhood embedding • consider distance-based probability for pair-wise neighborhood in the original feature space, e.g. $p_{\mu|\nu} \propto \exp\left[ - \frac{1}{2} \frac{(x^\mu - x^\nu)^2}{\sigma_\nu^2} \right]$ and $p_{\mu,\nu} = \frac{1}{2N} \left( p_{\mu|\nu} + p_{\nu|\mu} \right)$ (in this formula $N$ denotes the number of samples); local std. deviations $\sigma_\nu$ determined by local density of data • analogous in embedding space: pairwise probabilities $q_{\mu,\nu}$, e.g. Gaussian density in Stochastic Neighborhood Embedding (SNE); t-SNE: student-t distribution $q_{\mu,\nu} = \frac{\left[ 1 + (\vec{y}^\mu - \vec{y}^\nu)^2 \right]^{-1}}{\sum_{\rho, \tau, \rho \neq \tau} \left[ 1 + (\vec{y}^\rho - \vec{y}^\tau)^2 \right]^{-1}}$. find $\{\vec{y}^\mu\}$ → similar probabilities, with emphasis on local neighborhoods: $\min_{\{\vec{y}^\mu\}} \sum_{\mu, \nu, \mu \neq \nu} p_{\mu,\nu} \ln\left[ \frac{p_{\mu,\nu}}{q_{\mu,\nu}} \right] = D_{KL}(P \| Q)$, the Kullback-Leibler divergence
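The quantities on slide 23 can be computed for a toy example. The sketch makes one loud simplification: it uses a single fixed width `sigma` for all points, whereas the slide states that the local standard deviations are determined by the local density of the data. It evaluates the Kullback-Leibler cost for a given embedding and does not optimize it. All names and data points are illustrative.

```python
import math


def sqdist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def joint_p(xs, sigma=1.0):
    """Symmetrized neighborhood probabilities in the original space.

    Simplification: one fixed sigma instead of locally adapted widths.
    """
    n = len(xs)
    cond = []  # cond[nu][mu] = p_{mu|nu}, normalized over mu != nu
    for nu in range(n):
        row = [0.0 if mu == nu
               else math.exp(-sqdist(xs[mu], xs[nu]) / (2.0 * sigma ** 2))
               for mu in range(n)]
        z = sum(row)
        cond.append([v / z for v in row])
    return [[(cond[nu][mu] + cond[mu][nu]) / (2.0 * n) for nu in range(n)]
            for mu in range(n)]


def joint_q(ys):
    """Student-t based pairwise probabilities in the embedding space."""
    n = len(ys)
    w = [[0.0 if mu == nu else 1.0 / (1.0 + sqdist(ys[mu], ys[nu]))
          for nu in range(n)] for mu in range(n)]
    z = sum(sum(row) for row in w)
    return [[v / z for v in row] for row in w]


def kl_divergence(p, q):
    """D_KL(P||Q) = sum over mu != nu of p ln(p / q)."""
    n = len(p)
    return sum(p[mu][nu] * math.log(p[mu][nu] / q[mu][nu])
               for mu in range(n) for nu in range(n)
               if mu != nu and p[mu][nu] > 0.0)


xs = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [4.0, 4.0, 4.0]]
ys = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [3.0, 3.0]]
p = joint_p(xs)
q = joint_q(ys)
cost = kl_divergence(p, q)
```

Both `p` and `q` sum to one over all pairs, so `cost` is a proper divergence and never negative. t-SNE moves the points `ys` by gradient descent to reduce it.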
  • 24. t-SNE analysis of GTEx data (normal samples), with post-hoc labelling according to tissue: the samples cluster by tissue type. Here the embedding coordinates y1, y2 have no intuitive interpretation. Different tissues have different RP mRNA signatures! See the NAR publication for more detailed maps.
  • 25. (supervised) Learning Vector Quantization
      N-dimensional data, feature vectors
      ∙ identification of prototype vectors from labeled example data
      ∙ distance-based classification (e.g. Euclidean)
      supervised competitive learning: LVQ1 [Kohonen, 1990]
      • initialize prototype vectors for the different classes
      • present a single example
      • identify the winner (closest prototype)
      • move the winner
        - closer towards the data (same class)
        - away from the data (different class)
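The LVQ1 steps listed on this slide can be sketched as follows. This is a minimal illustration with squared Euclidean distance and a constant learning rate; the function names, the learning rate `eta`, the number of epochs and the toy data are assumptions made here, not settings from the talk.

```python
import random

def sq_euclidean(w, x):
    """Squared Euclidean distance between prototype w and example x."""
    return sum((wi - xi) ** 2 for wi, xi in zip(w, x))

def winner_index(prototypes, x):
    """Index of the closest prototype (the winner)."""
    return min(range(len(prototypes)), key=lambda k: sq_euclidean(prototypes[k], x))

def lvq1_step(prototypes, proto_labels, x, label, eta=0.1):
    """Present one example and move the winner in place; returns the winner index."""
    k = winner_index(prototypes, x)
    # Towards the example if the class labels agree, away from it otherwise.
    sign = 1.0 if proto_labels[k] == label else -1.0
    prototypes[k] = [wi + sign * eta * (xi - wi) for wi, xi in zip(prototypes[k], x)]
    return k

def train_lvq1(data, labels, prototypes, proto_labels, eta=0.05, epochs=30, seed=0):
    """Sweep over the data in random order for a fixed number of epochs."""
    rng = random.Random(seed)
    order = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            lvq1_step(prototypes, proto_labels, data[i], labels[i], eta)
    return prototypes

def classify(prototypes, proto_labels, x):
    """Assign the label of the closest prototype."""
    return proto_labels[winner_index(prototypes, x)]

# Hypothetical toy data: two well-separated classes in two dimensions.
data = [[0.0, 0.0], [0.2, 0.1], [0.1, -0.1], [3.0, 3.0], [3.1, 2.9], [2.8, 3.2]]
labels = [0, 0, 0, 1, 1, 1]
prototypes = train_lvq1(data, labels, [[1.0, 1.0], [2.0, 2.0]], [0, 1])
```

After training, `classify(prototypes, [0, 1], x)` labels a new feature vector `x` by its nearest prototype, which is the distance-based classification scheme named on the slide.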
  • 26. (supervised) Learning Vector Quantization
      N-dimensional data, feature vectors
      ∙ identification of prototype vectors from labeled example data
      ∙ distance-based classification [here: Euclidean distances]
      ∙ tessellation of feature space [piece-wise linear]
      ∙ aim: discrimination of classes ( ≠ vector quantization or density estimation )
  • 27. Matrix Relevance LVQ
      generalized quadratic distance in LVQ:
      \[ d(w, x) = \sum_{i,j=1}^{N} (w_i - x_i)\, \Lambda_{ij}\, (w_j - x_j) \]
      $\Lambda_{ij}$ = contribution of the pair of features i, j
      $\Lambda_{jj}$ = relevance of feature j
      special case: $\Lambda_{ii} = 1$ for all $i$ and $\Lambda_{ij} = 0$ for $i \neq j$ gives the squared Euclidean distance
      \[ d(w, x) = \sum_{j} (w_j - x_j)^2 \]
  • 28. Matrix Relevance LVQ
      generalized quadratic distance in LVQ:
      \[ d(w, x) = (w - x)^\top \Lambda\, (w - x) = \left[ \Omega\, (w - x) \right]^2 \]
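The two forms of the generalized quadratic distance agree when Λ = ΩᵀΩ, which is the relation implied by the equality on this slide. A short numerical check, with hypothetical helper names and an arbitrary example matrix Ω chosen only for illustration:

```python
def matvec(M, v):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]

def relevance_matrix(omega):
    """Lambda = Omega^T Omega, symmetric and positive semi-definite by construction."""
    n = len(omega[0])
    return [[sum(row[i] * row[j] for row in omega) for j in range(n)] for i in range(n)]

def distance_lambda(w, x, lam):
    """d(w, x) = (w - x)^T Lambda (w - x)."""
    diff = [wi - xi for wi, xi in zip(w, x)]
    return sum(d * ld for d, ld in zip(diff, matvec(lam, diff)))

def distance_omega(w, x, omega):
    """d(w, x) = [Omega (w - x)]^2, the squared length of the transformed difference."""
    diff = [wi - xi for wi, xi in zip(w, x)]
    return sum(c * c for c in matvec(omega, diff))

# Hypothetical example: a prototype w, a feature vector x and a 2x2 matrix Omega.
omega = [[1.0, 2.0], [0.0, 1.0]]
w, x = [1.0, 0.0], [0.0, 1.0]
d_lambda = distance_lambda(w, x, relevance_matrix(omega))
d_omega = distance_omega(w, x, omega)
identity = [[1.0, 0.0], [0.0, 1.0]]
d_euclid = distance_lambda(w, x, identity)  # special case: squared Euclidean distance
```

Writing the distance in terms of Ω guarantees that it can never become negative, whatever values the matrix elements take during training.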
  • 29. Matrix Relevance LVQ
      generalized quadratic distance in LVQ:
      \[ d(w, x) = (w - x)^\top \Lambda\, (w - x) = \left[ \Omega\, (w - x) \right]^2 \]
      training: move prototypes and change matrix $\Omega$ in order to
      - decrease $d(w, x)$ if labels agree
      - increase $d(w, x)$ if labels disagree
      after training:
      ∙ prototypes: represent typical class properties
      ∙ Relevance Matrix: the $\Lambda_{ij}$ quantify the relevance of features / pairs
      ∙ leading eigenvectors of $\Lambda$ mark the most discriminative directions in feature space → low-dim. representation of labelled data sets
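The low-dimensional representation mentioned on this slide amounts to projecting feature vectors onto the leading eigenvectors of the relevance matrix. The sketch below uses plain power iteration with deflation to stay dependency-free; this is a choice made for illustration (in practice a library eigensolver would be the natural tool), and the function names and the example matrix are hypothetical.

```python
import math

def matvec(M, v):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]

def leading_eigenvectors(lam, k=2, iters=500):
    """Top-k eigenpairs of a symmetric positive semi-definite matrix via power iteration."""
    n = len(lam)
    A = [row[:] for row in lam]
    values, vectors = [], []
    for idx in range(k):
        # Arbitrary normalized start vector; assumed not orthogonal to the target eigenvector.
        v = [1.0 / (i + 1 + idx) for i in range(n)]
        norm = math.sqrt(sum(c * c for c in v))
        v = [c / norm for c in v]
        for _ in range(iters):
            u = matvec(A, v)
            norm = math.sqrt(sum(c * c for c in u))
            if norm < 1e-12:  # remaining spectrum is numerically zero
                break
            v = [c / norm for c in u]
        value = sum(vi * ui for vi, ui in zip(v, matvec(A, v)))
        values.append(value)
        vectors.append(v)
        # Deflation: remove the component that was just found.
        A = [[A[i][j] - value * v[i] * v[j] for j in range(n)] for i in range(n)]
    return values, vectors

def project(x, vectors):
    """Coordinates of x along the given eigenvectors."""
    return [sum(vi * xi for vi, xi in zip(v, x)) for v in vectors]

# Hypothetical relevance matrix in which only the direction (1, 1) is discriminative.
lam = [[1.0, 1.0], [1.0, 1.0]]
values, vectors = leading_eigenvectors(lam, k=1)
coords = project([1.0, 1.0], vectors)
```

With `k=2`, `project` yields the two coordinates used for a scatter plot of the labelled data in the most discriminative plane.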
  • 30. GTEx data (normal samples): different tissues have different RP mRNA signatures!
      [Figure: low-dimensional map of the samples; labelled clusters: brain tissues (7-12), brain cerebellum (8+9), blood, spleen, muscle, pancreas, testis, heart (21, 22), liver, pituitary (28)]
  • 31. TCGA data (tumor samples): different tumors and subtypes have different RP mRNA signatures!
      ∙ kidney cancer has 3 distinct subtypes (cell of origin): KIRP, KIRC, KICH
      ∙ six tumor types have ribosomal subtypes, e.g. bladder cancer (BLCA), skin melanoma (SKCM), eye melanoma (UVM)
      [Figure: t-SNE maps for BLCA, SKCM and UVM]
  • 32. TCGA data (tumor samples): RP-profile tumor subtypes display significant differences in survival. Examples: uveal melanoma, bladder cancer.
  • 33. TCGA data (tumor samples), e.g. KIRP vs. KIRC kidney tumors
      ∙ classification: LVQ, ROC evaluated in 60/40% validation
      ∙ LVQ relevances and a univariate statistical test identify discriminative RP
      [Figure: ROC curve for the classification vs. KIRC, axes from 0.0 to 1.0]
  • 34. conclusion
      main findings (not all presented here, see NAR paper):
      ∙ RP mRNA signatures vary with
        - tissue type
        - tumor type and sub-type
        - developmental stage
      ∙ RP translation rates are proportional to RP mRNA levels
      ∙ RP mRNA and profiling differ between cell cultures and cells in vivo
      speculative (yet plausible) conclusions: RP composition and function
        - is tissue-, tumor-, development-, environment-specific
        - adds a novel layer to the regulatory network of the cell
        - might play an important role in cancer
      caveats:
      ∙ composition could be independent of RP abundance
      ∙ possible extra-ribosomal functions of RP
      ∙ direct inspection of the ribosome is difficult