3. Kay et al., 2008, Nature.
Static images
Nishimoto et al., 2011, Curr Biol.
Movie frames
4. Huth et al., 2016, Nature.
Semantics in natural speech
de Heer et al., 2017, J Neurosci.
Acoustic vs. linguistic contributions
5. What is a “linearized encoding model”?
Nunez-Elizalde et al., 2019, NeuroImage.
6. How is it different from a GLM?
• Estimation (ridge regression)
• Delay modeling (finite impulse response; sometimes also used in GLMs)
• Optimization (regularization) and evaluation (cross-validation); see the sketch below
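A minimal sketch of these ingredients (made-up data; the delays, lambda, and train/test split are arbitrary illustrative choices, not values from any paper):

import numpy as np

rng = np.random.default_rng(0)

# Made-up stimulus features: 200 fMRI time points (TRs), 5 features
n_trs, n_feats = 200, 5
stim = rng.standard_normal((n_trs, n_feats))

def add_fir_delays(X, delays=(1, 2, 3, 4)):
    # Stack copies of X shifted by each delay (in TRs) to model the hemodynamic lag (FIR)
    cols = []
    for d in delays:
        shifted = np.zeros_like(X)
        shifted[d:] = X[:-d]
        cols.append(shifted)
    return np.hstack(cols)

X = add_fir_delays(stim)                        # (n_trs, n_feats * n_delays)
beta_true = rng.standard_normal(X.shape[1])
y = X @ beta_true + rng.standard_normal(n_trs)  # simulated single-voxel response

# Train / test split over time points
X_tr, X_te, y_tr, y_te = X[:150], X[150:], y[:150], y[150:]

# Ridge regression: beta = (X'X + lam I)^-1 X'y   (lam is arbitrary here)
lam = 10.0
beta_hat = np.linalg.solve(X_tr.T @ X_tr + lam * np.eye(X.shape[1]), X_tr.T @ y_tr)

# Evaluate by correlating predicted and observed held-out responses
r = np.corrcoef(X_te @ beta_hat, y_te)[0, 1]
print("held-out prediction r =", r)

In practice the lambda would itself be chosen by cross-validation, as discussed later in the slides.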
8. Consider a linear model for a single-pixel time series*
y = Xβ + ε
* Without autocorrelation
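A standard intermediate step the slides skip (not from the source): minimizing the squared error gives the normal equations, which is where the (XᵀX)⁻¹ on the next slide comes from.

β̂ = argmin_β ‖y − Xβ‖₂²
∇_β ‖y − Xβ‖₂² = −2Xᵀ(y − Xβ) = 0  ⇒  XᵀXβ = Xᵀy  ⇒  β̂_OLS = (XᵀX)⁻¹ Xᵀy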
13. What if (XᵀX)⁻¹ doesn't exist?
When the columns of X are not linearly independent (e.g., collinear features, or more features than time points)
β̂_OLS = (XᵀX)⁻¹ Xᵀy
β̂ = argmin_β ‖y − Xβ‖₂²
Tikhonov regularization
β̂_Tik = (XᵀX + ΓᵀΓ)⁻¹ Xᵀy
β̂ = argmin_β ‖y − Xβ‖₂² + ‖Γβ‖₂²
Ridge regularization (Hoerl & Kennard, 1970)
β̂_ridge = (XᵀX + λI)⁻¹ Xᵀy
β̂ = argmin_β ‖y − Xβ‖₂² + λ‖β‖₂²
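A NumPy sketch of the three estimators above on made-up data whose first two columns are nearly collinear (the situation in the question); the lambda and Gamma values are arbitrary illustrations.

import numpy as np

rng = np.random.default_rng(1)

# Made-up design matrix whose first two columns are nearly collinear
n = 100
x1 = rng.standard_normal(n)
X = np.column_stack([x1, x1 + 1e-6 * rng.standard_normal(n), rng.standard_normal(n)])
y = X @ np.array([1.0, 1.0, -0.5]) + 0.1 * rng.standard_normal(n)

lam = 1.0                              # arbitrary ridge penalty
Gamma = np.diag([1.0, 1.0, 2.0])       # arbitrary (invertible) Tikhonov matrix
I = np.eye(X.shape[1])

# OLS: (X'X)^-1 X'y  (pinv used; X'X is nearly singular, so this estimate is unstable)
beta_ols = np.linalg.pinv(X.T @ X) @ X.T @ y

# Ridge: (X'X + lam I)^-1 X'y
beta_ridge = np.linalg.solve(X.T @ X + lam * I, X.T @ y)

# Tikhonov: (X'X + Gamma'Gamma)^-1 X'y
beta_tik = np.linalg.solve(X.T @ X + Gamma.T @ Gamma, X.T @ y)

print("OLS:     ", beta_ols)
print("Ridge:   ", beta_ridge)
print("Tikhonov:", beta_tik)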
21. How do we find the optimal lambda?
Optimization methods
• Ridge trace (pick the lambda where the shrinkage is stable): ≤ # lambdas
• Cross-validation (pick the lambda that maximizes prediction accuracy; see the sketch after this list): # lambdas × # CV folds
• PRESS (predicted residual sum of squares; Allen, 1971): # lambdas (only for LOOCV)
• Generalized cross-validation: # lambdas
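A minimal grid-search sketch of the cross-validation option above (made-up data; the lambda grid and 5-fold scheme are arbitrary). The cost is exactly # lambdas × # CV folds model fits; PRESS and GCV avoid the inner loop with closed-form leave-one-out formulas.

import numpy as np

rng = np.random.default_rng(2)

# Made-up regression problem
X = rng.standard_normal((200, 20))
y = X @ rng.standard_normal(20) + rng.standard_normal(200)

lambdas = np.logspace(-2, 4, 13)     # arbitrary grid
n_folds = 5
folds = np.array_split(np.arange(len(y)), n_folds)

cv_score = np.zeros(len(lambdas))    # mean held-out correlation per lambda
for i, lam in enumerate(lambdas):
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        Xtr, ytr = X[train_idx], y[train_idx]
        beta = np.linalg.solve(Xtr.T @ Xtr + lam * np.eye(X.shape[1]), Xtr.T @ ytr)
        pred = X[test_idx] @ beta
        cv_score[i] += np.corrcoef(pred, y[test_idx])[0, 1] / n_folds

print("best lambda:", lambdas[np.argmax(cv_score)])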
22. Over-optimization
“Cross-validation failure”
• If you tune the regularization on a TEST SET, can you then use the same set to evaluate model performance?
• NO. Optimization itself introduces a bias toward the SET used for optimization.
• You need THREE partitions: training / optimization / evaluation sets (e.g., nested CV; see the sketch below)
Varoquaux et al., 2017, NeuroImage.
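A minimal nested-CV sketch with scikit-learn (synthetic data; the alpha grid and fold counts are arbitrary): the inner GridSearchCV sees only the optimization folds, and the outer loop scores on folds never used to pick alpha.

from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, cross_val_score

# Synthetic data standing in for (features, voxel response)
X, y = make_regression(n_samples=200, n_features=20, noise=10.0, random_state=0)

# Inner loop (optimization set): pick the ridge penalty alpha by 5-fold CV
inner = GridSearchCV(Ridge(), {"alpha": [0.1, 1.0, 10.0, 100.0]}, cv=5)

# Outer loop (evaluation set): score the tuned model on folds never used for tuning
scores = cross_val_score(inner, X, y, cv=5)
print("nested CV R^2 per outer fold:", scores)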
23. So what's up with the multivariate normal prior?
Nunez-Elizalde et al., 2019, NeuroImage.
24. OLS in Bayesian terms: MLE
β̂ = argmin_β ‖y − Xβ‖₂²
OLS = MLE (maximum likelihood estimate)
Nunez-Elizalde et al., 2019, NeuroImage.
25. Ridge in Bayesian terms: imposing a prior on β
Ridge = MAP (maximum a posteriori)
β̂ = argmin_β ‖y − Xβ‖₂² + λ‖β‖₂²
Nunez-Elizalde et al., 2019, NeuroImage.
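A standard derivation connecting the MLE and MAP views above (the notation here is mine, not from the slides): with Gaussian noise and a zero-mean multivariate normal prior on β, the negative log-posterior is the ridge/Tikhonov cost, so the MAP estimate is the regularized solution.

y | β ~ N(Xβ, σ²I),  β ~ N(0, Σ_β)
−log p(β | y) = (1/(2σ²)) ‖y − Xβ‖₂² + (1/2) βᵀ Σ_β⁻¹ β + const
β̂_MAP = (XᵀX + σ² Σ_β⁻¹)⁻¹ Xᵀy
Σ_β = (σ²/λ) I  ⇒  β̂_MAP = β̂_ridge;  a general Σ_β gives the Tikhonov form with ΓᵀΓ = σ² Σ_β⁻¹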
26. How to solve it? Linear transform (whitening)
Nunez-Elizalde et al., 2019, NeuroImage.
Generalized least squares
Linear transform: Tikhonov regression → Ridge regression
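A small numerical check of the transform idea, as I understand it (illustrative Gamma; my own sketch, not code from the paper): rewriting the features as X̃ = XΓ⁻¹ turns the Tikhonov problem into plain ridge, and mapping the ridge solution back through Γ⁻¹ recovers the Tikhonov estimate.

import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((100, 4))
y = rng.standard_normal(100)
Gamma = np.diag([0.5, 1.0, 2.0, 4.0])   # arbitrary invertible Tikhonov matrix

# Direct Tikhonov solution: (X'X + Gamma'Gamma)^-1 X'y
beta_tik = np.linalg.solve(X.T @ X + Gamma.T @ Gamma, X.T @ y)

# Transformed ("whitened") features: X_tilde = X Gamma^-1, then plain ridge with lambda = 1
Gamma_inv = np.linalg.inv(Gamma)
X_t = X @ Gamma_inv
beta_tilde = np.linalg.solve(X_t.T @ X_t + np.eye(4), X_t.T @ y)

# Map the ridge solution back to the original parameterization: beta = Gamma^-1 beta_tilde
print(np.allclose(beta_tik, Gamma_inv @ beta_tilde))   # True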