Masayuki Tanaka
Jun. 17, 2016
Derivation of the closed-form soft threshold solution of the Lasso regression
Lasso regression
The cost function of Lasso regression:
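\[
L(\boldsymbol{\beta}, \lambda) = \frac{1}{2}\,\|\boldsymbol{Y} - \boldsymbol{X}\boldsymbol{\beta}\|_2^2 + \lambda\,\|\boldsymbol{\beta}\|_1
\]
Y: Data matrix
X: System matrix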
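The cost function can be evaluated directly in a few lines. The following is a minimal sketch, assuming NumPy and treating Y as a single data vector; the function name `lasso_cost` and the numbers are illustrative and do not come from the slides.

```python
import numpy as np

def lasso_cost(Y, X, beta, lam):
    """Return 0.5 * ||Y - X beta||_2^2 + lam * ||beta||_1."""
    residual = Y - X @ beta
    return 0.5 * np.sum(residual ** 2) + lam * np.sum(np.abs(beta))

# Small hand-checkable example (values chosen for illustration only).
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
Y = np.array([3.0, -1.0, 2.0])
beta = np.array([2.0, 0.0])

cost = lasso_cost(Y, X, beta, lam=0.5)
print(cost)
```

The first term penalizes the squared residual and the second term penalizes the absolute size of each coefficient, weighted by lambda.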
Orthonormal Lasso regression
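\[
L(\boldsymbol{\beta}, \lambda) = \frac{1}{2}\,\|\boldsymbol{Y} - \boldsymbol{X}\boldsymbol{\beta}\|_2^2 + \lambda\,\|\boldsymbol{\beta}\|_1,
\qquad \text{where } \boldsymbol{X}^T\boldsymbol{X} = \boldsymbol{I} \text{ (orthonormal)}
\]
The closed-form soft threshold solution:
\[
\beta_j = \operatorname{sign}\!\left(\beta_j^{\mathrm{OLS}}\right)\left(\left|\beta_j^{\mathrm{OLS}}\right| - \lambda\right)_+
\]
where
\[
\boldsymbol{\beta}^{\mathrm{OLS}} = \arg\min_{\boldsymbol{\beta}} \frac{1}{2}\,\|\boldsymbol{Y} - \boldsymbol{X}\boldsymbol{\beta}\|_2^2
= \left(\boldsymbol{X}^T\boldsymbol{X}\right)^{-1}\boldsymbol{X}^T\boldsymbol{Y} = \boldsymbol{X}^T\boldsymbol{Y}
\]
\[
\operatorname{sign}(\xi) =
\begin{cases}
-1 & (\xi < 0) \\
0 & (\xi = 0) \\
1 & (\xi > 0)
\end{cases}
\qquad
(\xi)_+ = \max(\xi, 0) =
\begin{cases}
\xi & (\xi > 0) \\
0 & (\xi \le 0)
\end{cases}
\]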
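The definitions above translate almost literally into code. The sketch below assumes NumPy; the name `soft_threshold`, the random data, and the use of a reduced QR factorization to obtain a matrix with orthonormal columns are illustrative choices, not part of the slides.

```python
import numpy as np

def soft_threshold(beta_ols, lam):
    """sign(b) * max(|b| - lam, 0), applied element-wise."""
    return np.sign(beta_ols) * np.maximum(np.abs(beta_ols) - lam, 0.0)

# Build a design matrix with orthonormal columns (X^T X = I) via reduced QR.
rng = np.random.default_rng(0)
X, _ = np.linalg.qr(rng.standard_normal((20, 4)))
Y = rng.standard_normal(20)

# With X^T X = I, the OLS solution (X^T X)^{-1} X^T Y reduces to X^T Y.
beta_ols = X.T @ Y
beta_lasso = soft_threshold(beta_ols, lam=0.3)

print(beta_ols)
print(beta_lasso)
```

Every coefficient whose OLS value has magnitude at most lambda is set to zero, and the remaining ones are shrunk toward zero by lambda.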
Derivation of the soft threshold solution
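\[
\begin{aligned}
&\arg\min_{\boldsymbol{\beta}} \; \frac{1}{2}\,\|\boldsymbol{Y} - \boldsymbol{X}\boldsymbol{\beta}\|_2^2 + \lambda\,\|\boldsymbol{\beta}\|_1 \\
&= \arg\min_{\boldsymbol{\beta}} \; \frac{1}{2}\left(\boldsymbol{Y}^T\boldsymbol{Y} - 2\boldsymbol{\beta}^T\boldsymbol{X}^T\boldsymbol{Y} + \boldsymbol{\beta}^T\boldsymbol{X}^T\boldsymbol{X}\boldsymbol{\beta}\right) + \lambda\,\|\boldsymbol{\beta}\|_1 \\
&= \arg\min_{\boldsymbol{\beta}} \; \frac{1}{2}\left(-2\boldsymbol{\beta}^T\boldsymbol{\beta}^{\mathrm{OLS}} + \boldsymbol{\beta}^T\boldsymbol{\beta}\right) + \lambda\,\|\boldsymbol{\beta}\|_1
\end{aligned}
\]
using
\[
\boldsymbol{Y}^T\boldsymbol{Y} = \mathrm{const}, \qquad
\boldsymbol{X}^T\boldsymbol{Y} = \boldsymbol{\beta}^{\mathrm{OLS}}, \qquad
\boldsymbol{X}^T\boldsymbol{X} = \boldsymbol{I}
\]
(We can consider the problem element-wise.)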
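The element-wise remark can be made explicit by writing the inner products and the 1-norm as sums over the coefficients. This is a short worked step under the same assumptions as above; nothing new is introduced beyond the summation index j.

```latex
\[
\frac{1}{2}\left(-2\boldsymbol{\beta}^T\boldsymbol{\beta}^{\mathrm{OLS}} + \boldsymbol{\beta}^T\boldsymbol{\beta}\right) + \lambda\,\|\boldsymbol{\beta}\|_1
= \sum_{j}\left(\frac{1}{2}\beta_j^2 - \beta_j^{\mathrm{OLS}}\beta_j + \lambda\,|\beta_j|\right)
\]
```

Each term of the sum depends on a single coefficient, so minimizing the sum amounts to minimizing each term on its own.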
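\[
\arg\min_{\beta_j} C(\beta_j) = \arg\min_{\beta_j} \; \frac{1}{2}\beta_j^2 - \beta_j^{\mathrm{OLS}}\beta_j + \lambda\,|\beta_j|
\]
Both branches below pass through $C = 0$ at $\beta_j = 0$.

For $\beta_j > 0$:
\[
C(\beta_j) = \frac{1}{2}\beta_j^2 - \beta_j^{\mathrm{OLS}}\beta_j + \lambda\beta_j
= \beta_j\left(\frac{1}{2}\beta_j - \beta_j^{\mathrm{OLS}} + \lambda\right)
\]
Setting $dC/d\beta_j = \beta_j - \beta_j^{\mathrm{OLS}} + \lambda = 0$ gives the stationary point
\[
\beta_j = \beta_j^{\mathrm{OLS}} - \lambda
\]
For $\beta_j < 0$:
\[
C(\beta_j) = \frac{1}{2}\beta_j^2 - \beta_j^{\mathrm{OLS}}\beta_j - \lambda\beta_j
= \beta_j\left(\frac{1}{2}\beta_j - \beta_j^{\mathrm{OLS}} - \lambda\right)
\]
Setting $dC/d\beta_j = \beta_j - \beta_j^{\mathrm{OLS}} - \lambda = 0$ gives the stationary point
\[
\beta_j = \beta_j^{\mathrm{OLS}} + \lambda
\]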
Derivation of the soft threshold solution
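[Figures: for each case, a number line for $\beta_j^{\mathrm{OLS}}$ with ticks at $-\lambda$ and $\lambda$, and a plot of $C(\beta_j)$ against $\beta_j$]

Case: $\beta_j^{\mathrm{OLS}} < -\lambda$
Here $\beta_j^{\mathrm{OLS}} - \lambda < 0$ and $\beta_j^{\mathrm{OLS}} + \lambda < 0$, so only the stationary point of the $\beta_j < 0$ branch lies in its own domain. The minimum is at
\[
\beta_j = \beta_j^{\mathrm{OLS}} + \lambda
\]
Case: $-\lambda \le \beta_j^{\mathrm{OLS}} \le \lambda$
Here $\beta_j^{\mathrm{OLS}} - \lambda \le 0$ and $\beta_j^{\mathrm{OLS}} + \lambda \ge 0$, so neither stationary point lies inside its branch's domain. The minimum is at the kink:
\[
\beta_j = 0
\]
Case: $\lambda < \beta_j^{\mathrm{OLS}}$
Here $\beta_j^{\mathrm{OLS}} - \lambda > 0$ and $\beta_j^{\mathrm{OLS}} + \lambda > 0$, so only the stationary point of the $\beta_j > 0$ branch lies in its own domain. The minimum is at
\[
\beta_j = \beta_j^{\mathrm{OLS}} - \lambda
\]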
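The three cases can be checked numerically by minimizing the scalar cost on a fine grid. This is only a brute-force sanity check, assuming NumPy; the value of lambda, the grid, and the three sample OLS values are arbitrary illustrative choices.

```python
import numpy as np

def scalar_cost(b, b_ols, lam):
    """C(b) = 0.5*b^2 - b_ols*b + lam*|b| for one coefficient."""
    return 0.5 * b ** 2 - b_ols * b + lam * np.abs(b)

lam = 1.0
grid = np.linspace(-5.0, 5.0, 100001)  # grid spacing of 1e-4

# One sample per case: (b_ols, minimizer predicted by the case analysis).
cases = [(-2.5, -2.5 + lam),  # b_ols < -lam
         (0.4, 0.0),          # -lam <= b_ols <= lam
         (3.0, 3.0 - lam)]    # lam < b_ols

results = []
for b_ols, expected in cases:
    b_star = grid[np.argmin(scalar_cost(grid, b_ols, lam))]
    results.append((b_ols, float(b_star), expected))
    print(f"b_ols={b_ols:+.2f}  grid minimizer={b_star:+.4f}  predicted={expected:+.4f}")
```

The grid minimizer should agree with the predicted value up to the grid spacing.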
Derivation of the soft threshold solution
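Case: $\beta_j^{\mathrm{OLS}} < -\lambda$, $\beta_j = \beta_j^{\mathrm{OLS}} + \lambda$
Case: $-\lambda \le \beta_j^{\mathrm{OLS}} \le \lambda$, $\beta_j = 0$
Case: $\lambda < \beta_j^{\mathrm{OLS}}$, $\beta_j = \beta_j^{\mathrm{OLS}} - \lambda$

The three cases combine into
\[
\beta_j = \operatorname{sign}\!\left(\beta_j^{\mathrm{OLS}}\right)\left(\left|\beta_j^{\mathrm{OLS}}\right| - \lambda\right)_+
\]
Check for $\beta_j^{\mathrm{OLS}} < -\lambda$:
\[
\beta_j = \operatorname{sign}\!\left(\beta_j^{\mathrm{OLS}}\right)\left(\left|\beta_j^{\mathrm{OLS}}\right| - \lambda\right)_+
= (-1)\left(-\beta_j^{\mathrm{OLS}} - \lambda\right)_+
= \beta_j^{\mathrm{OLS}} + \lambda
\]
Check for $-\lambda \le \beta_j^{\mathrm{OLS}} \le \lambda$:
\[
\beta_j = \operatorname{sign}\!\left(\beta_j^{\mathrm{OLS}}\right)\left(\left|\beta_j^{\mathrm{OLS}}\right| - \lambda\right)_+
= \operatorname{sign}\!\left(\beta_j^{\mathrm{OLS}}\right) \times 0 = 0
\]
Check for $\lambda < \beta_j^{\mathrm{OLS}}$:
\[
\beta_j = \operatorname{sign}\!\left(\beta_j^{\mathrm{OLS}}\right)\left(\left|\beta_j^{\mathrm{OLS}}\right| - \lambda\right)_+
= (+1)\left(\beta_j^{\mathrm{OLS}} - \lambda\right)_+
= \beta_j^{\mathrm{OLS}} - \lambda
\]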
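The equivalence between the case-by-case solution and the compact sign formula can also be confirmed in code. The sketch below assumes NumPy; both function names and the value of lambda are hypothetical and chosen only for this illustration.

```python
import numpy as np

def soft_threshold_cases(b_ols, lam):
    """Piecewise form taken from the three cases."""
    if b_ols < -lam:
        return b_ols + lam
    if b_ols > lam:
        return b_ols - lam
    return 0.0

def soft_threshold_compact(b_ols, lam):
    """Compact form: sign(b_ols) * (|b_ols| - lam)_+."""
    return np.sign(b_ols) * max(abs(b_ols) - lam, 0.0)

lam = 0.7
samples = np.linspace(-3.0, 3.0, 601)

# Largest disagreement between the two forms over the sample points.
max_gap = max(abs(soft_threshold_cases(b, lam) - soft_threshold_compact(b, lam))
              for b in samples)
print(max_gap)
```

The sample range covers all three cases, so a gap at floating-point rounding level indicates that the two forms agree there.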
Reference
High-dimensional data analysis, Lecture 6 (Lasso Regression) by Wessel van Wierin
