Scale-invariant Bayesian Neural Networks
with Connectivity Tangent Kernel
Sungyub Kim, Sihwan Park, Kyungsu Kim, Eunho Yang
ICLR 2023
Graduate School of AI, KAIST
Background
• Data-dependent PAC-Bayes bound
• PAC-Bayes bound: the generalization gap is bounded by the KL divergence between the prior and the posterior
• Data-dependent PAC-Bayes bound: use a data-dependent PAC-Bayes prior
• Partition the training set 𝒮 into 𝒮_ℙ and 𝒮_ℚ
• Pre-train a prior ℙ_𝒟 with 𝒮_ℙ (therefore, ℙ_𝒟 and 𝒮_ℚ are independent)
• Fine-tune the posterior ℚ with the entire dataset 𝒮
$$\mathrm{err}_{\mathcal{D}}(\mathbb{Q}) \le \mathrm{err}_{\mathcal{S}}(\mathbb{Q}) + \sqrt{\frac{\mathrm{KL}[\mathbb{Q}\,\|\,\mathbb{P}] + \log(2\sqrt{N}/\delta)}{2N}}$$

$$\mathrm{err}_{\mathcal{D}}(\mathbb{Q}) \le \mathrm{err}_{\mathcal{S}_{\mathbb{Q}}}(\mathbb{Q}) + \sqrt{\frac{\mathrm{KL}[\mathbb{Q}\,\|\,\mathbb{P}_{\mathcal{D}}] + \log(2\sqrt{N_{\mathbb{Q}}}/\delta)}{2N_{\mathbb{Q}}}}$$

(In the second bound, the prior ℙ_𝒟 and the held-out data 𝒮_ℚ are independent.)
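The complexity term of the bounds above is easy to evaluate numerically. A minimal sketch; the KL values and sample sizes below are made-up numbers chosen only to illustrate why a data-dependent prior (smaller KL, even on fewer samples) can tighten the bound:

```python
import math

def pac_bayes_gap(kl, n, delta=0.05):
    """Complexity term of the PAC-Bayes bound:
    sqrt((KL[Q||P] + log(2*sqrt(N)/delta)) / (2N))."""
    return math.sqrt((kl + math.log(2 * math.sqrt(n) / delta)) / (2 * n))

# Uninformative prior: large KL over the full training set of size N.
loose = pac_bayes_gap(kl=5000.0, n=50000)
# Data-dependent (pre-trained) prior: much smaller KL, evaluated on the
# held-out half N_Q = N/2 -- the bound can still come out tighter.
tight = pac_bayes_gap(kl=50.0, n=25000)
```

The log term grows only logarithmically in N, so the KL divergence dominates the trade-off.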
Background
• Invariance of generalization bounds for function-preserving transformations
• Function-preserving scaling transformations (FPST)
• Scale transformation (diagonal matrix) of the parameters that preserves the function
• Practical examples
• 1) Rescaling transformations of positively homogeneous (e.g., ReLU) NNs
• 2) Weight decay (WD) with Batch Normalization (BN) layers
$$f(x, T(\theta)) = f(x, \theta), \quad \forall x \in \mathbb{R}^D,\ \forall \theta \in \mathbb{R}^P$$

$$(R_{\alpha,l,k}(\theta))_i = \begin{cases} \alpha \cdot \theta_i & \text{if } \theta_i \in \{\text{param. subset connecting as input edges to the } k\text{-th activation at the } l\text{-th layer}\} \\ \theta_i / \alpha & \text{if } \theta_i \in \{\text{param. subset connecting as output edges to the } k\text{-th activation at the } l\text{-th layer}\} \\ \theta_i & \text{otherwise} \end{cases}$$

$$(S_{\alpha,l,k}(\theta))_i = \begin{cases} \alpha_k \cdot \theta_i & \text{if } \theta_i \in \{\text{param. subset connecting as input edges to the } k\text{-th activation at the } l\text{-th layer}\} \\ \theta_i & \text{otherwise} \end{cases}$$
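The rescaling transformation R_{α,l,k} can be checked numerically on a two-layer ReLU network. A minimal NumPy sketch; the layer sizes, the scale α, and the chosen hidden unit k are arbitrary illustration values:

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(16, 8))   # first layer weights
W2 = rng.normal(size=(4, 16))   # second layer weights
x = rng.normal(size=(8,))

def forward(W1, W2, x):
    # ReLU is positively homogeneous: relu(a*h) = a*relu(h) for a > 0
    return W2 @ np.maximum(W1 @ x, 0.0)

# R_{alpha,l,k}: scale the input edges of hidden unit k by alpha and its
# output edges by 1/alpha -- the network function is unchanged.
alpha, k = 10.0, 3
W1s, W2s = W1.copy(), W2.copy()
W1s[k, :] *= alpha   # input edges of unit k
W2s[:, k] /= alpha   # output edges of unit k

assert np.allclose(forward(W1, W2, x), forward(W1s, W2s, x))
```

The parameters change by a factor of 10, yet every output coordinate is identical; this is exactly the degree of freedom that sharpness metrics fail to ignore.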
Background
• Invariance of generalization bounds for function-preserving transformations
• Existing sharpness metrics are vulnerable to FPSTs.
• Dinh et al. (2017) showed the vulnerability of sharpness to rescaling transformations.
• Rescaling two successive layers can make an NN arbitrarily sharp while preserving its function.
• Li et al. (2018) showed the vulnerability of sharpness to weight decay.
• WD improves generalizability but sharpens NNs.
• Prior works focused only on the scale-invariance of sharpness metrics (not of generalization bounds).
• The generalization bounds of Tsuzuku et al. (2020) and Kwon et al. (2021) include scale-dependent terms.
• The generalization bound of Petzka et al. (2021) only holds for single-layer NNs.
Key Idea – Modification of Jacobian
• Many sharpness metrics / generalization bounds are based on the Jacobian.
• Hessian, Fisher, and Gauss-Newton (GN) matrices: (Jacobian)ᵀ(Jacobian)
• Neural Tangent Kernel (NTK): (Jacobian)(Jacobian)ᵀ
• However, the Jacobian w.r.t. the parameters is not invariant to FPSTs.
• To mitigate this issue, we consider the Jacobian w.r.t. the connectivity.
• Note that we relax the binary constraint on the connectivity to obtain a differentiable formulation.
$$G = J_\theta^\top J_\theta, \qquad \Theta = J_\theta J_\theta^\top$$

$$J_{T(\theta)}(x, T(\theta)) = J_\theta(x, \theta)\, T^{-1} \neq J_\theta(x, \theta)$$

$$J_c(x, T(\theta) \odot c)\big|_{c=\mathbf{1}_P} = J_\theta(x, T(\theta))\,\mathrm{diag}(T(\theta)) = J_\theta(x, \theta)\, T^{-1} T\, \mathrm{diag}(\theta) = J_c(x, \theta \odot c)\big|_{c=\mathbf{1}_P}$$
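The cancellation above — the connectivity Jacobian J_c = J_θ diag(θ) is invariant to an FPST even though J_θ is not — can be verified with finite differences. A minimal NumPy sketch; the network sizes, the scale α, and the rescaled unit k are arbitrary choices, and the positive W1 and x only ensure every ReLU stays active so the numeric Jacobian is well defined:

```python
import numpy as np

rng = np.random.default_rng(1)
W1 = np.abs(rng.normal(size=(6, 4)))   # positive => all pre-activations > 0
W2 = rng.normal(size=(3, 6))
x = np.abs(rng.normal(size=(4,)))

def forward(theta):
    W1_, W2_ = theta[:24].reshape(6, 4), theta[24:].reshape(3, 6)
    return W2_ @ np.maximum(W1_ @ x, 0.0)

def param_jacobian(theta, eps=1e-6):
    # Central finite-difference Jacobian of f w.r.t. the parameters
    J = np.empty((3, theta.size))
    for i in range(theta.size):
        d = np.zeros_like(theta)
        d[i] = eps
        J[:, i] = (forward(theta + d) - forward(theta - d)) / (2 * eps)
    return J

theta = np.concatenate([W1.ravel(), W2.ravel()])

# FPST T: scale the input edges of hidden unit k by alpha, output edges by 1/alpha.
alpha, k = 5.0, 2
T = np.ones_like(theta)
T[k * 4:(k + 1) * 4] *= alpha   # row k of W1 (input edges of unit k)
T[24 + k::6] /= alpha           # column k of W2 (output edges of unit k)
theta_t = T * theta

J, Jt = param_jacobian(theta), param_jacobian(theta_t)
assert not np.allclose(J, Jt)                           # J_theta changes under the FPST
assert np.allclose(J * theta, Jt * theta_t, atol=1e-4)  # J_theta @ diag(theta) does not
```

Multiplying each Jacobian column by its parameter absorbs the T⁻¹ factor exactly as in the derivation above.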
Key Idea – Design of PAC-Bayes prior and posterior
• Apply Bayesian linear regression for the PAC-Bayes prior and posterior
• PAC-Bayes prior: isotropic Gaussian distribution centered at the pre-trained parameters
• PAC-Bayes posterior: posterior of Bayesian linear regression given the PAC-Bayes prior
$$\mathbb{P}_{\theta^*}(\theta) := \mathcal{N}(\theta \mid \theta^*,\ \alpha^2\, \mathrm{diag}(\theta^*)^2)$$

$$\mathbb{Q}_{\theta^*}(\theta) = \mathcal{N}(\theta \mid \theta^* + \theta^* \odot \mu_{\mathbb{Q}},\ \mathrm{diag}(\theta^*)\, \Sigma_{\mathbb{Q}}\, \mathrm{diag}(\theta^*))$$

where

$$\mu_{\mathbb{Q}} := \frac{\Sigma_{\mathbb{Q}}\, J_c^\top (Y - f(X, \theta^*))}{\sigma^2} = \frac{\Sigma_{\mathbb{Q}}\, \mathrm{diag}(\theta^*)\, J_\theta^\top (Y - f(X, \theta^*))}{\sigma^2}$$

$$\Sigma_{\mathbb{Q}} := \left( \frac{I_P}{\alpha^2} + \frac{J_c^\top J_c}{\sigma^2} \right)^{-1} = \left( \frac{I_P}{\alpha^2} + \frac{\mathrm{diag}(\theta^*)\, J_\theta^\top J_\theta\, \mathrm{diag}(\theta^*)}{\sigma^2} \right)^{-1}$$

• Check Appendix D for the detailed derivation!
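Given a Jacobian and residuals, the posterior above is ordinary Bayesian linear regression in the connectivity coordinates. A minimal NumPy sketch; the Jacobian, residuals, and hyperparameters below are random stand-ins for illustration, not the paper's actual model:

```python
import numpy as np

rng = np.random.default_rng(2)
N, P = 20, 5
theta_star = rng.normal(size=(P,))   # pre-trained parameters (stand-in)
J_theta = rng.normal(size=(N, P))    # Jacobian of f at theta_star (stand-in)
resid = rng.normal(size=(N,))        # residuals Y - f(X, theta_star) (stand-in)
alpha, sigma = 1.0, 0.5              # prior scale and noise std

# Connectivity Jacobian: J_c = J_theta @ diag(theta_star)
J_c = J_theta * theta_star

# Posterior covariance and mean of Bayesian linear regression in c:
Sigma_Q = np.linalg.inv(np.eye(P) / alpha**2 + J_c.T @ J_c / sigma**2)
mu_Q = Sigma_Q @ J_c.T @ resid / sigma**2

# Map the posterior mean back to parameter space: theta* + theta* (*) mu_Q
theta_post = theta_star + theta_star * mu_Q
```

Because the precision matrix is an identity plus a Gram matrix, Σ_Q is symmetric positive definite by construction.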
Theoretical results
• PAC-Bayes-CTK and its invariance
• The perturbation term relates to the Complexity Measure of Data (CMD) term of Arora et al. (2019).
• The sharpness term is positively correlated with the eigenvalues of the CTK.
• Each term in PAC-Bayes-CTK is invariant to FPSTs.
Empirical results
• Invariance of PAC-Bayes-CTK
• The tightness of PAC-Bayes-CTK is not changed by FPSTs.
Empirical results
• Robustness to the selection of the prior scale
• Standard Linearized Laplace (LL) is sensitive to the prior scale (α) and has a different optimal sharpness for each metric.
• Our PAC-Bayes posterior, called Connectivity Laplace (CL), is robust to the prior scale and consistent across metrics.
Conclusion
• Our contribution is threefold:
• We introduced a novel PAC-Bayes bound that guarantees invariance to FPSTs for a broad class of networks. We empirically verified that this bound gives tight results for a ResNet with 11M parameters.
• Based on the sharpness term of our bound, we provided a low-complexity sharpness metric, called Connectivity Sharpness.
• To prevent overconfident predictions, we showed how our PAC-Bayes posterior can be used to resolve the pitfalls of WD with BN, demonstrating its practicality.

More Related Content

Similar to S. Kim, ICLR 2023, MLILAB, KAISTAI

V2 final presentation 08-12-2014 (akash gupta's conflicted copy 2014-12-08)
V2 final presentation 08-12-2014 (akash gupta's conflicted copy 2014-12-08)V2 final presentation 08-12-2014 (akash gupta's conflicted copy 2014-12-08)
V2 final presentation 08-12-2014 (akash gupta's conflicted copy 2014-12-08)
0309akash
ย 
V2 final presentation 08-12-2014
V2 final presentation 08-12-2014V2 final presentation 08-12-2014
V2 final presentation 08-12-2014
0309akash
ย 
Breaking the 49 qubit barrier in the simulation of quantum circuits
Breaking the 49 qubit barrier in the simulation of quantum circuitsBreaking the 49 qubit barrier in the simulation of quantum circuits
Breaking the 49 qubit barrier in the simulation of quantum circuits
hquynh
ย 
ICFHR'18-DataAug.pptx
ICFHR'18-DataAug.pptxICFHR'18-DataAug.pptx
ICFHR'18-DataAug.pptx
KartikDutta10
ย 

Similar to S. Kim, ICLR 2023, MLILAB, KAISTAI (20)

Quantum Computation for Predicting Electron and Phonon Properties of Solids
Quantum Computation for Predicting Electron and Phonon Properties of SolidsQuantum Computation for Predicting Electron and Phonon Properties of Solids
Quantum Computation for Predicting Electron and Phonon Properties of Solids
ย 
Bayesian Generalization Error and Real Log Canonical Threshold in Non-negativ...
Bayesian Generalization Error and Real Log Canonical Threshold in Non-negativ...Bayesian Generalization Error and Real Log Canonical Threshold in Non-negativ...
Bayesian Generalization Error and Real Log Canonical Threshold in Non-negativ...
ย 
Bayesian Model-Agnostic Meta-Learning
Bayesian Model-Agnostic Meta-LearningBayesian Model-Agnostic Meta-Learning
Bayesian Model-Agnostic Meta-Learning
ย 
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...
ย 
Scalable constrained spectral clustering
Scalable constrained spectral clusteringScalable constrained spectral clustering
Scalable constrained spectral clustering
ย 
GPU Kernels for Block-Sparse Weights
GPU Kernels for Block-Sparse WeightsGPU Kernels for Block-Sparse Weights
GPU Kernels for Block-Sparse Weights
ย 
V2 final presentation 08-12-2014 (akash gupta's conflicted copy 2014-12-08)
V2 final presentation 08-12-2014 (akash gupta's conflicted copy 2014-12-08)V2 final presentation 08-12-2014 (akash gupta's conflicted copy 2014-12-08)
V2 final presentation 08-12-2014 (akash gupta's conflicted copy 2014-12-08)
ย 
High performance large-scale image recognition without normalization
High performance large-scale image recognition without normalizationHigh performance large-scale image recognition without normalization
High performance large-scale image recognition without normalization
ย 
Synthesis of a Sparse 2D-Scanning Array using Particle Swarm Optimization for...
Synthesis of a Sparse 2D-Scanning Array using Particle Swarm Optimization for...Synthesis of a Sparse 2D-Scanning Array using Particle Swarm Optimization for...
Synthesis of a Sparse 2D-Scanning Array using Particle Swarm Optimization for...
ย 
Continual learning: Variational continual learning
Continual learning: Variational continual learningContinual learning: Variational continual learning
Continual learning: Variational continual learning
ย 
QX Simulator and quantum programming - 2020-04-28
QX Simulator and quantum programming - 2020-04-28QX Simulator and quantum programming - 2020-04-28
QX Simulator and quantum programming - 2020-04-28
ย 
Bruce Bassett - Constraining Exotic Cosmologies
Bruce Bassett - Constraining Exotic CosmologiesBruce Bassett - Constraining Exotic Cosmologies
Bruce Bassett - Constraining Exotic Cosmologies
ย 
K-means clustering-based WSN protocol for energy efficiency improvement
K-means clustering-based WSN protocol for energy efficiency improvement K-means clustering-based WSN protocol for energy efficiency improvement
K-means clustering-based WSN protocol for energy efficiency improvement
ย 
defect_supercell_finite_size_schemes_10-09-18
defect_supercell_finite_size_schemes_10-09-18defect_supercell_finite_size_schemes_10-09-18
defect_supercell_finite_size_schemes_10-09-18
ย 
4 high performance large-scale image recognition without normalization
4 high performance large-scale image recognition without normalization4 high performance large-scale image recognition without normalization
4 high performance large-scale image recognition without normalization
ย 
V2 final presentation 08-12-2014
V2 final presentation 08-12-2014V2 final presentation 08-12-2014
V2 final presentation 08-12-2014
ย 
Breaking the 49 qubit barrier in the simulation of quantum circuits
Breaking the 49 qubit barrier in the simulation of quantum circuitsBreaking the 49 qubit barrier in the simulation of quantum circuits
Breaking the 49 qubit barrier in the simulation of quantum circuits
ย 
Tree_Chain
Tree_ChainTree_Chain
Tree_Chain
ย 
NIST-JARVIS infrastructure for Improved Materials Design
NIST-JARVIS infrastructure for Improved Materials DesignNIST-JARVIS infrastructure for Improved Materials Design
NIST-JARVIS infrastructure for Improved Materials Design
ย 
ICFHR'18-DataAug.pptx
ICFHR'18-DataAug.pptxICFHR'18-DataAug.pptx
ICFHR'18-DataAug.pptx
ย 

More from MLILAB

I. Chung, AAAI 2020, MLILAB, KAIST AI
I. Chung, AAAI 2020, MLILAB, KAIST AII. Chung, AAAI 2020, MLILAB, KAIST AI
I. Chung, AAAI 2020, MLILAB, KAIST AI
MLILAB
ย 
H. Shim, NeurIPS 2018, MLILAB, KAIST AI
H. Shim, NeurIPS 2018, MLILAB, KAIST AIH. Shim, NeurIPS 2018, MLILAB, KAIST AI
H. Shim, NeurIPS 2018, MLILAB, KAIST AI
MLILAB
ย 

More from MLILAB (20)

J. Jeong, AAAI 2024, MLILAB, KAIST AI..
J. Jeong,  AAAI 2024, MLILAB, KAIST AI..J. Jeong,  AAAI 2024, MLILAB, KAIST AI..
J. Jeong, AAAI 2024, MLILAB, KAIST AI..
ย 
J. Yun, NeurIPS 2023, MLILAB, KAISTAI
J. Yun,  NeurIPS 2023,  MLILAB,  KAISTAIJ. Yun,  NeurIPS 2023,  MLILAB,  KAISTAI
J. Yun, NeurIPS 2023, MLILAB, KAISTAI
ย 
S. Kim, NeurIPS 2023, MLILAB, KAISTAI
S. Kim,  NeurIPS 2023,  MLILAB,  KAISTAIS. Kim,  NeurIPS 2023,  MLILAB,  KAISTAI
S. Kim, NeurIPS 2023, MLILAB, KAISTAI
ย 
C. Kim, INTERSPEECH 2023, MLILAB, KAISTAI
C. Kim, INTERSPEECH 2023, MLILAB, KAISTAIC. Kim, INTERSPEECH 2023, MLILAB, KAISTAI
C. Kim, INTERSPEECH 2023, MLILAB, KAISTAI
ย 
Y. Jung, ICML 2023, MLILAB, KAISTAI
Y. Jung, ICML 2023, MLILAB, KAISTAIY. Jung, ICML 2023, MLILAB, KAISTAI
Y. Jung, ICML 2023, MLILAB, KAISTAI
ย 
J. Song, S. Kim, ICML 2023, MLILAB, KAISTAI
J. Song, S. Kim, ICML 2023, MLILAB, KAISTAIJ. Song, S. Kim, ICML 2023, MLILAB, KAISTAI
J. Song, S. Kim, ICML 2023, MLILAB, KAISTAI
ย 
K. Seo, ICASSP 2023, MLILAB, KAISTAI
K. Seo, ICASSP 2023, MLILAB, KAISTAIK. Seo, ICASSP 2023, MLILAB, KAISTAI
K. Seo, ICASSP 2023, MLILAB, KAISTAI
ย 
G. Kim, CVPR 2023, MLILAB, KAISTAI
G. Kim, CVPR 2023, MLILAB, KAISTAIG. Kim, CVPR 2023, MLILAB, KAISTAI
G. Kim, CVPR 2023, MLILAB, KAISTAI
ย 
Y. Kim, ICLR 2023, MLILAB, KAISTAI
Y. Kim, ICLR 2023, MLILAB, KAISTAIY. Kim, ICLR 2023, MLILAB, KAISTAI
Y. Kim, ICLR 2023, MLILAB, KAISTAI
ย 
J. Yun, AISTATS 2022, MLILAB, KAISTAI
J. Yun, AISTATS 2022, MLILAB, KAISTAIJ. Yun, AISTATS 2022, MLILAB, KAISTAI
J. Yun, AISTATS 2022, MLILAB, KAISTAI
ย 
J. Song, J. Park, ICML 2022, MLILAB, KAISTAI
J. Song, J. Park, ICML 2022, MLILAB, KAISTAIJ. Song, J. Park, ICML 2022, MLILAB, KAISTAI
J. Song, J. Park, ICML 2022, MLILAB, KAISTAI
ย 
J. Park, J. Song, ICLR 2022, MLILAB, KAISTAI
J. Park, J. Song, ICLR 2022, MLILAB, KAISTAIJ. Park, J. Song, ICLR 2022, MLILAB, KAISTAI
J. Park, J. Song, ICLR 2022, MLILAB, KAISTAI
ย 
J. Park, H. Shim, AAAI 2022, MLILAB, KAISTAI
J. Park, H. Shim, AAAI 2022, MLILAB, KAISTAIJ. Park, H. Shim, AAAI 2022, MLILAB, KAISTAI
J. Park, H. Shim, AAAI 2022, MLILAB, KAISTAI
ย 
J. Park, AAAI 2022, MLILAB, KAIST AI
J. Park, AAAI 2022, MLILAB, KAIST AIJ. Park, AAAI 2022, MLILAB, KAIST AI
J. Park, AAAI 2022, MLILAB, KAIST AI
ย 
J. Song, et. al., ASRU 2021, MLILAB, KAIST AI
J. Song, et. al., ASRU 2021, MLILAB, KAIST AIJ. Song, et. al., ASRU 2021, MLILAB, KAIST AI
J. Song, et. al., ASRU 2021, MLILAB, KAIST AI
ย 
J. Song, H. Shim et al., ICASSP 2021, MLILAB, KAIST AI
J. Song, H. Shim et al., ICASSP 2021, MLILAB, KAIST AIJ. Song, H. Shim et al., ICASSP 2021, MLILAB, KAIST AI
J. Song, H. Shim et al., ICASSP 2021, MLILAB, KAIST AI
ย 
T. Yoon, et. al., ICLR 2021, MLILAB, KAIST AI
T. Yoon, et. al., ICLR 2021, MLILAB, KAIST AIT. Yoon, et. al., ICLR 2021, MLILAB, KAIST AI
T. Yoon, et. al., ICLR 2021, MLILAB, KAIST AI
ย 
G. Park, J.-Y. Yang, et. al., NeurIPS 2020, MLILAB, KAIST AI
G. Park, J.-Y. Yang, et. al., NeurIPS 2020, MLILAB, KAIST AIG. Park, J.-Y. Yang, et. al., NeurIPS 2020, MLILAB, KAIST AI
G. Park, J.-Y. Yang, et. al., NeurIPS 2020, MLILAB, KAIST AI
ย 
I. Chung, AAAI 2020, MLILAB, KAIST AI
I. Chung, AAAI 2020, MLILAB, KAIST AII. Chung, AAAI 2020, MLILAB, KAIST AI
I. Chung, AAAI 2020, MLILAB, KAIST AI
ย 
H. Shim, NeurIPS 2018, MLILAB, KAIST AI
H. Shim, NeurIPS 2018, MLILAB, KAIST AIH. Shim, NeurIPS 2018, MLILAB, KAIST AI
H. Shim, NeurIPS 2018, MLILAB, KAIST AI
ย 

Recently uploaded

result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college project
Tonystark477637
ย 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Christo Ananth
ย 

Recently uploaded (20)

University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdf
ย 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college project
ย 
Vivazz, Mieres Social Housing Design Spain
Vivazz, Mieres Social Housing Design SpainVivazz, Mieres Social Housing Design Spain
Vivazz, Mieres Social Housing Design Spain
ย 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
ย 
data_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfdata_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdf
ย 
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxBSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
ย 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.ppt
ย 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
ย 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
ย 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . ppt
ย 
Unit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfUnit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdf
ย 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)
ย 
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
ย 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
ย 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - V
ย 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduits
ย 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
ย 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
ย 
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELLPVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
ย 
Call for Papers - International Journal of Intelligent Systems and Applicatio...
Call for Papers - International Journal of Intelligent Systems and Applicatio...Call for Papers - International Journal of Intelligent Systems and Applicatio...
Call for Papers - International Journal of Intelligent Systems and Applicatio...
ย 

S. Kim, ICLR 2023, MLILAB, KAISTAI

  • 1. Scale-invariant Bayesian Neural Networks with Connectivity Tangent Kernel Sungyub Kim, Sihwan Park, Kyungsu Kim, Eunho Yang ICLR 2023 Graduate School of AI, KAIST
  • 2. Background โ€ข Data-dependent PAC-Bayes bound โ€ข PAC-Bayes bound: Generalization gap is bounded by KL divergence between prior and posterior โ€ข Data-dependent PAC-Bayes bound: Use data-dependent PAC-Bayes prior โ€ข Partition training ๐’ฎ into ๐’ฎโ„™ and ๐’ฎโ„š โ€ข Pre-train a โ„™๐’Ÿ with ๐’ฎโ„™ (Therefore, โ„™๐’Ÿ and ๐’ฎโ„š are independent.) โ€ข Fine-tuning โ„š with entire dataset ๐’ฎ. <latexit sha1_base64="0fOhK+Uw2PjQWa3dwlm09D4NvWY=">AAACi3icdVFdaxNBFJ1dq43RatTHvgwNQooQd2O1RXwo1EKhpaRo2kJ2CbOTu+nQ2Q9n7gphnD/jT/LNf9PZTUo11QsD5557DjP3TFJKoTEIfnv+g7WHj9Zbj9tPnm48e9558fJcF5XiMOKFLNRlwjRIkcMIBUq4LBWwLJFwkVwf1POL76C0KPKvOC8hztgsF6ngDB016fyMMoZXKjOglJ2YpuNMms/W9pomScyZ3aaRBPof6ZcV6Rsa6W8KTZQqxs2t6fjEju9U0Y9bPLRx7ZDFrDdY+E7t22gKEtm2NYNTayedbtAPmqL3QbgEXbKs4aTzK5oWvMogRy6Z1uMwKDE2TKHgEmw7qjSUjF+zGYwdzFkGOjZNlpa+dsyUpoVyJ0fasH86DMu0nmeJU9Yb6NVZTf5rNq4w3YuNyMsKIeeLi9JKUixo/TF0KhRwlHMHGFfCvZXyK+YiRPd9bRdCuLryfXA+6Icf+u/Pdrr7wTKOFtkkW6RHQrJL9skRGZIR4V7L63u73p6/4b/zP/qfFlLfW3pekb/KP7wBSynIyQ==</latexit> errD(Q) ๏ฃฟ errS(Q) + s KL[QkP] + log(2 p N/ ) 2N <latexit sha1_base64="B+qECs1sRNnoMn188cNLIEIWGe4=">AAACv3icdVHfa9swEJbdbmuzX+n2uBfRsJEyyOywtXvYQ8v2MFgpKVvaQmyMrJxTEVl2JbkQVP2Texj0v5nsuLRptwPBd9/dd3e6S0vOlA6Ca89fW3/0+MnGZufps+cvXna3Xp2oopIUxrTghTxLiQLOBIw10xzOSgkkTzmcpvOvdfz0EqRihfilFyXEOZkJljFKtKOS7p8oJ/pc5gaktIlpPEq4+WZtv3HS1BzbHdzBzt5FHPB/BD9tcpu/qn2PI3UhtYkySai50f84tJPbrOjqBo/aQssp4lrNi1l/uKxxdLfLh2gKXJMda4YrvMVJtxcMgsbwQxC2oIdaGyXd39G0oFUOQlNOlJqEQaljQ6RmlIPtRJWCktA5mcHEQUFyULFp9m/xW8dMcVZI94TGDXtXYUiu1CJPXWY9pLofq8l/xSaVzj7Hhomy0iDoslFWcawLXB8TT5kEqvnCAUIlc7Niek7clrU7ecctIbz/5YfgZDgIdwefjj/29oN2HRvoDdpGfRSiPbSPvqMRGiPqffFSb+5x/8Cf+cIvl6m+12peoxXzF38Bxd7dyA==</latexit> errD(Q) ๏ฃฟ errSQ (Q) + s KL[QkPD] + log(2 p NQ/ ) 2NQ Independent
  • 3. Background โ€ข Invariance of generalization bounds for function-preserving transformations โ€ข Function-preserving scaling transformations (FPST) โ€ข Scale transformation (diagonal matrix) of parameters that preserves function โ€ข Practical examples โ€ข 1) Rescaling transformation of positive homogeneous (e.g., ReLU) NNs โ€ข 2) Weight decay (WD) with Batch Normalization (BN) layers <latexit sha1_base64="Fwg+qLM54Y+yNbGsKfO00TITrOE=">AAACT3icbVFbS8MwGE3rfd6qPvoSHMIGY7TiDUQQ9MHHKc4N1jrSLN2CaVqSVByl/9AXffNv+OKDIqZbEd38IOTknPORLyd+zKhUtv1qmDOzc/MLi0ul5ZXVtXVrY/NWRonApIkjFom2jyRhlJOmooqRdiwICn1GWv79ea63HoiQNOI3ahgTL0R9TgOKkdJU1wqCymMNuiFSA4xYepNV3FjSahWewrGSn/R2At0gEogx+Ahdyscdvp9eZ3fpRVb7UXP/lKGRda2yXbdHBaeBU4AyKKrRtV7cXoSTkHCFGZKy49ix8lIkFMWMZCU3kSRG+B71SUdDjkIivXSURwZ3NdODeiK9uIIj9ndHikIph6GvnfmUclLLyf+0TqKCYy+lPE4U4Xh8UZAwqCKYhwt7VBCs2FADhAXVs0I8QAJhpb+gpENwJp88DW736s5h/eBqv3y2V8SxCLbBDqgABxyBM3AJGqAJMHgCb+ADfBrPxrvxZRZW0yjAFvhT5tI3dOKx7g==</latexit> f(x, T ( )) = f(x, ), 8x 2 RD , 8 2 RP <latexit sha1_base64="h6rSnbDr/lAOHe02dX1jCp5081Y=">AAADpXicnVLbbhMxEN1kuZRwS+GRl1ETIJVCSBA3ISFV4qUvSG3VpJXiaOV1ZhMrXntZz1aE1X4Zf8Ebf4N3k0BoEUjMg3V0xmOfOTNhoqSlfv97re5fu37j5s6txu07d+/db+4+GFmTpQKHwiiTnofcopIahyRJ4XmSIo9DhWfh4kOZP7vA1EqjT2mZ4CTmMy0jKTg5KtitfQUXLDKGtCG08gvmHRZzmguu8pMiyNmMxzHvqu6i6DCaI/H9/SCXBbyHRlUb4kzqXDgRtlgxVQUwMTUEq5JAPukyws+Ug4ygvSGBSd0Glic85XEPbBZaJBBGaxQk9Qy4BamTjACnM7RABtqL9jOaA3f5i6oH4ARtVZGKLzFlRcHYSsfml+cbSX8R8Q8VJqP/kMEUD1Hl+OlditYZ6h4rtqU5G39Kikz6S1PbtQ0Og3FHCtvmop6uzS6CZqvf61cBV8FgDVreOo6C5jc2NSKLUZNQ3NrxoJ/QJOcpSaGwaLDMYsLFgs9w7KDmMdpJXm1ZAY8dM61kRkaX/jh2uyLnsbXLOHQ3y/2xl3Ml+afcOKPo7SSv5oxarD6KMlXaXK4sTGXqBqGWDnCRSqcVxNzNSpBb7IYzYXC55atg9KI3eN17dfyyddBf27HjPfL2vI438N54B96hd+QNPVHfqx/Wj+sn/lP/o3/qj1ZX67V1zUPvt/CDHwILJTc=</latexit> (R ,l,k(โœ“))i = 8 < : ยท โœ“i , if โœ“i 2 {param. subset connecting as input edges to k-th activation at l-th layer} โœ“i/ , if โœ“i 2 {param. 
subset connecting as output edges to k-th activation at l-th layer} โœ“i , for โœ“i in the other cases <latexit sha1_base64="dhL2SMsbctMdhmwrcgOMZoaTEBY=">AAADFnicbVLLjtMwFE3CayiP6cCSzRUNaEYqVYp4bZBGYsNyEHRmpLqKHOcmterYUXwzokT5Cjb8ChsWIMQWseNvcNpKDFPuwjo6x76Pc52USlqKot9+cOnylavXdq73bty8dXu3v3fn2Jq6EjgRRpnqNOEWldQ4IUkKT8sKeZEoPEkWrzr95AwrK41+R8sSZwXPtcyk4OSoeM8fMoUZ7bOC01xw1bxt44blvCj4UA0X7T6jORI/YJXM53QQN7KFl9ADFywzhrQhtPIDNizBXOpGuF5su9ZXWeJm0QITqSFYp+pSPBwywvfUgMwg3NASmNQhsKbkFS9GYOvEIoEwWqMgqXPgFqQuawJMc7RABsJF+IjmwJ1+thoIOEGoVqTiS6xY2zK27mareGaqv8VDlxocBuOOCs6PgTrdjNXG/UE0ilYB22C8AQNvE0dx/xdLjagL1CQUt3Y6jkqaNbwiKRS2PVZbLLlY8BynDmpeoJ01q7W28MAxKXRtZkZ3Tjj2/IuGF9Yui8Td7LZnL2od+T9tWlP2YtasvEQt1oWyWnWOdn8EUlk5y9XSAS4q6XoFMXdbEeR+Us+ZML448jY4fjwaPxs9ffNkcBht7Njx7nn3vX1v7D33Dr3X3pE38YT/0f/sf/W/BZ+CL8H34Mf6auBv3tz1/ong5x/KQ/j2</latexit> (S ,l,k(โœ“))i = โ‡ข k ยท โœ“i , if โœ“i 2 {param. subset connecting as input edges to k-th activation at l-th layer} โœ“i , for โœ“i in the other cases
  • 4. Background โ€ข Invariance of generalization bounds for function-preserving transformations โ€ข Existing sharpness metric are vulnerable to FPSTs. โ€ข Dinh et al. (2017) showed the vulnerability of sharpness to rescaling transformations. โ€ข Rescaling two successive layers can arbitrarily sharpens NNs with preserving functions. โ€ข Li et al. (2018) showed the vulnerability of sharpness to weight decay. โ€ข WD improves generalizability, but sharpens NNs. โ€ข Prior works focused only on the scale-invariance of sharpness metrics (and not generalization bounds) โ€ข G-bounds of Tsuzuku et al. (2020) and Kwon et al. (2021) include scale-dependent terms. โ€ข G-bounds of Petzka et al. (2021) only holds for single-layer NNs.
  • 5. Key Idea – Modification of the Jacobian
• Many sharpness metrics / generalization bounds are based on the Jacobian.
• Hessian, Fisher, and Gauss-Newton (GN) matrix: $G = J_\theta^\top J_\theta$
• Neural Tangent Kernel (NTK): $\Theta = J_\theta J_\theta^\top$
• However, the Jacobian w.r.t. the parameters is not invariant to FPSTs:
$J_{T(\theta)}(x, T(\theta)) = J_\theta(x, \theta)\, T^{-1} \neq J_\theta(x, \theta)$
• To mitigate this issue, we consider the Jacobian w.r.t. the connectivity $c$:
$J_c(x, T(\theta) \odot c)\big|_{c=\mathbf{1}_P} = J_\theta(x, T(\theta))\,\mathrm{diag}(T(\theta)) = J_\theta(x, \theta)\, T^{-1} T\, \mathrm{diag}(\theta) = J_c(x, \theta \odot c)\big|_{c=\mathbf{1}_P}$
• Note that we relax the binary constraints of connectivity for a differentiable formulation.
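A small numerical check of the last identity (the toy ReLU net and the finite-difference evaluation are illustrative, not the paper's setup): the Jacobian w.r.t. connectivity, evaluated at $c = \mathbf{1}_P$, is unchanged by the layer-rescaling FPST, even though the parameter Jacobian is not.

```python
import numpy as np

relu = lambda z: np.maximum(z, 0.0)

# Tiny ReLU net f(x, theta) with theta = [W1 (2x3), W2 (1x2)] flattened.
def f(x, theta):
    W1, W2 = theta[:6].reshape(2, 3), theta[6:].reshape(1, 2)
    return W2 @ relu(W1 @ x)

def conn_jacobian(x, theta, eps=1e-6):
    # Jacobian of f(x, theta ⊙ c) w.r.t. the connectivity c, at c = 1_P.
    J = np.zeros((1, theta.size))
    for i in range(theta.size):
        cp = np.ones_like(theta); cp[i] += eps
        cm = np.ones_like(theta); cm[i] -= eps
        J[:, i] = (f(x, theta * cp) - f(x, theta * cm)) / (2 * eps)
    return J

theta = np.array([1., -1., 0.5, 0.5, 2., -1., 1., -2.])
x = np.array([1., 0.5, 2.])
lam = 10.0
theta_t = np.concatenate([lam * theta[:6], theta[6:] / lam])  # FPST

# J_c = J_theta diag(theta) is FPST-invariant:
print(np.allclose(conn_jacobian(x, theta), conn_jacobian(x, theta_t)))  # True
```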
  • 6. Key Idea – Design of PAC-Bayes prior and posterior
• Apply Bayesian linear regression for the PAC-Bayes prior and posterior
• PAC-Bayes prior: isotropic Gaussian distribution centered at the pre-trained parameter:
$P_{\theta^*}(\psi) := \mathcal{N}(\psi \mid \theta^*, \, \alpha^2 \mathrm{diag}(\theta^*)^2)$
• PAC-Bayes posterior: posterior of Bayesian linear regression given the PAC-Bayes prior:
$Q_{\theta^*}(\psi) = \mathcal{N}(\psi \mid \theta^* + \theta^* \odot \mu_Q, \, \mathrm{diag}(\theta^*)\,\Sigma_Q\,\mathrm{diag}(\theta^*))$
where
$\mu_Q := \Sigma_Q J_c^\top (Y - f(X, \theta^*)) / \sigma^2 = \Sigma_Q\, \mathrm{diag}(\theta^*)\, J_\theta^\top (Y - f(X, \theta^*)) / \sigma^2$
$\Sigma_Q := \left( \frac{I_P}{\alpha^2} + \frac{J_c^\top J_c}{\sigma^2} \right)^{-1} = \left( \frac{I_P}{\alpha^2} + \frac{\mathrm{diag}(\theta^*)\, J_\theta^\top J_\theta\, \mathrm{diag}(\theta^*)}{\sigma^2} \right)^{-1}$
• Check Appendix D for the detailed derivation!
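The posterior moments above are standard Bayesian-linear-regression algebra and can be sketched directly (all sizes and the random stand-ins for the Jacobian, parameters, and residual are synthetic, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(0)
N, P = 20, 5
J_theta = rng.normal(size=(N, P))   # parameter Jacobian at theta*
theta_star = rng.normal(size=P)     # pre-trained parameters
residual = rng.normal(size=N)       # stands in for Y - f(X, theta*)
alpha, sigma = 1.0, 0.5             # prior scale / observation noise

# Connectivity Jacobian: J_c = J_theta @ diag(theta*).
J_c = J_theta * theta_star

# Posterior of Bayesian linear regression in connectivity space.
Sigma_Q = np.linalg.inv(np.eye(P) / alpha**2 + J_c.T @ J_c / sigma**2)
mu_Q = Sigma_Q @ J_c.T @ residual / sigma**2

# Induced PAC-Bayes posterior over parameters:
# mean theta* + theta* ⊙ mu_Q, covariance diag(theta*) Sigma_Q diag(theta*).
post_mean = theta_star + theta_star * mu_Q
post_cov = np.diag(theta_star) @ Sigma_Q @ np.diag(theta_star)

# The two forms of Sigma_Q on the slide agree:
D = np.diag(theta_star)
Sigma_Q_alt = np.linalg.inv(np.eye(P) / alpha**2
                            + D @ J_theta.T @ J_theta @ D / sigma**2)
print(np.allclose(Sigma_Q, Sigma_Q_alt))  # True
```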
  • 7. Theoretical results
• PAC-Bayes-CTK and its invariance
• The perturbation term relates to the Complexity Measure of Data (CMD) term of Arora et al. (2019).
• The sharpness term is positively correlated with the eigenvalues of the CTK.
• Each term in PAC-Bayes-CTK is invariant to FPSTs.
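The FPST-invariance can be illustrated algebraically: if a diagonal FPST $T$ maps $\theta \mapsto T\theta$ and (as on slide 5) the parameter Jacobian transforms as $J_\theta \mapsto J_\theta T^{-1}$, then the CTK $J_c J_c^\top = J_\theta\,\mathrm{diag}(\theta)^2 J_\theta^\top$ is unchanged, and so are its eigenvalues. A synthetic sketch (random matrices stand in for an actual network's Jacobian):

```python
import numpy as np

rng = np.random.default_rng(0)
N, P = 6, 10
J = rng.normal(size=(N, P))         # parameter Jacobian at theta
theta = rng.normal(size=P)

# Diagonal FPST: theta -> T theta, Jacobian -> J T^{-1}.
T = np.diag(rng.uniform(0.5, 2.0, size=P))
theta_t = T @ theta
J_t = J @ np.linalg.inv(T)

# CTK = J_c J_c^T with J_c = J diag(theta).
ctk   = J   @ np.diag(theta)**2   @ J.T
ctk_t = J_t @ np.diag(theta_t)**2 @ J_t.T

print(np.allclose(ctk, ctk_t))  # True: CTK (hence its spectrum) is invariant
```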
  • 8. Empirical results
• Invariance of PAC-Bayes-CTK
• The tightness of PAC-Bayes-CTK is not changed by FPSTs.
  • 9. Empirical results
• Robustness to the selection of the prior scale
• The standard Linearized Laplace (LL) is sensitive to the prior scale (α), and its optimal value differs across metrics.
• Our PAC-Bayes posterior, called Connectivity Laplace (CL), is robust to the prior scale and consistent across metrics.
  • 10. Conclusion
• Our contribution is threefold:
• We introduced a novel PAC-Bayes bound that guarantees invariance to FPSTs for a broad class of networks. We empirically verified that this bound gives tight results for a ResNet with 11M parameters.
• Based on the sharpness term of our bound, we provided a low-complexity sharpness metric, called Connectivity Sharpness.
• To prevent overconfident predictions, we showed how our PAC-Bayes posterior can be used to resolve the pitfalls of WD with BN layers, demonstrating its practicality.