### (1) Principle of the FTRL Algorithm

The FTRL algorithm combines the respective strengths of FOBOS and RDA in how they treat the gradient and the regularization terms, while avoiding their weaknesses. Its weight-update rule is:

$W^{(t+1)}=\arg\min_{W}\left\{G^{(1:t)}\cdot W+\lambda_{1}\|W\|_{1}+\frac{\lambda_{2}}{2}\|W\|_{2}^{2}+\frac{1}{2}\sum_{s=1}^{t}\sigma^{(s)}\|W-W^{(s)}\|_{2}^{2}\right\}$

where $G^{(1:t)}=\sum_{s=1}^{t}G^{(s)}$ denotes the accumulated gradient over the first $t$ rounds.

Expanding the proximal term and dropping constants that do not depend on $W$, the objective is equivalent (up to a constant) to:

$\left(G^{(1:t)}-\sum_{s=1}^{t}\sigma^{(s)}W^{(s)}\right)\cdot W+\lambda_{1}\|W\|_{1}+\frac{1}{2}\left(\lambda_{2}+\sum_{s=1}^{t}\sigma^{(s)}\right)\|W\|_{2}^{2}$

Letting $Z^{(t)}=G^{(1:t)}-\sum_{s=1}^{t}\sigma^{(s)}W^{(s)}$, which can be maintained incrementally as $Z^{(t)}=Z^{(t-1)}+G^{(t)}-\sigma^{(t)}W^{(t)}$, the update becomes:

$W^{(t+1)}=\arg\min_{W}\left\{Z^{(t)}\cdot W+\lambda_{1}\|W\|_{1}+\frac{1}{2}\left(\lambda_{2}+\sum_{s=1}^{t}\sigma^{(s)}\right)\|W\|_{2}^{2}\right\}$

This objective separates across coordinates, so each component $w_{i}$ can be solved for independently:

$w_{i}^{(t+1)}=\arg\min_{w_{i}}\left\{z_{i}^{(t)}w_{i}+\lambda_{1}|w_{i}|+\frac{1}{2}\left(\lambda_{2}+\sum_{s=1}^{t}\sigma^{(s)}\right)w_{i}^{2}\right\}$

which has the closed-form solution

$w_{i}^{(t+1)}=\begin{cases}0 & \text{if } |z_{i}^{(t)}|\le\lambda_{1}\\ -\left(\lambda_{2}+\sum_{s=1}^{t}\sigma^{(s)}\right)^{-1}\left(z_{i}^{(t)}-\lambda_{1}\,\mathrm{sgn}(z_{i}^{(t)})\right) & \text{otherwise.}\end{cases}$

The first case is what produces sparsity: any coordinate whose accumulated $z_{i}^{(t)}$ stays within the $L_{1}$ threshold is set exactly to zero.
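The case analysis above translates directly into code; a minimal per-coordinate sketch (the function and variable names are my own, not from the original):

```python
def ftrl_coordinate_update(z_i, sigma_sum, lambda1, lambda2):
    """Closed-form FTRL solution for a single coordinate.

    z_i       : accumulated z_i^{(t)} for this coordinate
    sigma_sum : sum of sigma^{(s)} up to round t
    """
    if abs(z_i) <= lambda1:
        return 0.0  # L1 threshold: the coordinate is exactly zero (sparse)
    sign = 1.0 if z_i > 0 else -1.0
    # soft-threshold z_i by lambda1, then scale by the quadratic coefficient
    return -(z_i - lambda1 * sign) / (lambda2 + sigma_sum)
```

For example, with $\lambda_{1}=1$ any coordinate whose $|z_{i}|\le 1$ is zeroed outright, regardless of the other parameters.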

### (2) The Learning Rate

FTRL uses a separate learning rate for each coordinate, decaying with that coordinate's accumulated squared gradients:

$\eta_{i}^{(t)}=\frac{\alpha}{\beta+\sqrt{\sum_{s=1}^{t}(g_{i}^{(s)})^{2}}}$

The weights $\sigma^{(s)}$ in the proximal term are chosen as $\sigma^{(s)}=1/\eta_{i}^{(s)}-1/\eta_{i}^{(s-1)}$, so their sum telescopes to the reciprocal of the current learning rate:

$\sum_{s=1}^{t}\sigma^{(s)}=\frac{1}{\eta_{i}^{(t)}}=\frac{\beta+\sqrt{\sum_{s=1}^{t}(g_{i}^{(s)})^{2}}}{\alpha}$
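This telescoping can be sanity-checked numerically; in the sketch below the $\alpha$, $\beta$, and gradient values are arbitrary illustrations:

```python
import math

alpha, beta = 0.1, 1.0
grads = [0.3, -0.2, 0.5]  # hypothetical per-round gradients for one coordinate

def inv_eta(q):
    """1/eta for accumulated squared gradient q, per the formula above."""
    return (beta + math.sqrt(q)) / alpha

q = 0.0
sigma_sum = inv_eta(0.0)  # baseline 1/eta^{(0)} = beta / alpha
for g in grads:
    q_new = q + g * g
    # sigma^{(t)} = 1/eta^{(t)} - 1/eta^{(t-1)}; beta cancels, leaving
    # (sqrt(q + g^2) - sqrt(q)) / alpha, as used in the algorithm below
    sigma_sum += inv_eta(q_new) - inv_eta(q)
    q = q_new

# the running sum telescopes to the reciprocal of the current learning rate
assert abs(sigma_sum - inv_eta(q)) < 1e-9
```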

### (3) The FTRL Algorithm

FTRL Algorithm

(1) Input $\alpha, \beta, \lambda_{1}, \lambda_{2}$; initialize $W\in\mathbb{R}^{N}$, $Z=0\in\mathbb{R}^{N}$, $Q=0\in\mathbb{R}^{N}$

(2) for $t=1,2,3,\ldots$
$G=\nabla_{W}\ell(W,X^{(t)},Y^{(t)})$
for $i=1,2,\ldots,N$
$\sigma_{i}=\alpha^{-1}(\sqrt{q_{i}+g_{i}^{2}}-\sqrt{q_{i}})$ // equals $(\eta^{(t)})^{-1}-(\eta^{(t-1)})^{-1}$
$q_{i}=q_{i}+g_{i}^{2}$
$z_{i}=z_{i}+g_{i}-\sigma_{i}w_{i}$
if $|z_{i}|\le\lambda_{1}$: $w_{i}=0$
otherwise: $w_{i}=-\left(\lambda_{2}+\frac{\beta+\sqrt{q_{i}}}{\alpha}\right)^{-1}\left(z_{i}-\lambda_{1}\,\mathrm{sgn}(z_{i})\right)$
end
end
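A direct, dense-vector transcription of this pseudocode in Python (class name and structure are my own; production implementations usually work with sparse features and compute weights lazily):

```python
import math

class FTRLProximal:
    """Sketch of FTRL-Proximal with per-coordinate learning rates.

    Maintains z (shifted gradient sums) and q (squared-gradient sums)
    per coordinate; weights follow from the closed-form solution.
    """

    def __init__(self, n, alpha=0.1, beta=1.0, lambda1=1.0, lambda2=1.0):
        self.alpha, self.beta = alpha, beta
        self.lambda1, self.lambda2 = lambda1, lambda2
        self.z = [0.0] * n
        self.q = [0.0] * n
        self.w = [0.0] * n

    def update(self, grad):
        """One round: grad is the loss gradient at the current weights."""
        for i, g in enumerate(grad):
            # sigma = 1/eta^{(t)} - 1/eta^{(t-1)} for this coordinate
            sigma = (math.sqrt(self.q[i] + g * g) - math.sqrt(self.q[i])) / self.alpha
            self.q[i] += g * g
            self.z[i] += g - sigma * self.w[i]
            if abs(self.z[i]) <= self.lambda1:
                self.w[i] = 0.0  # L1 threshold keeps the coordinate sparse
            else:
                sign = 1.0 if self.z[i] > 0 else -1.0
                denom = self.lambda2 + (self.beta + math.sqrt(self.q[i])) / self.alpha
                self.w[i] = -(self.z[i] - self.lambda1 * sign) / denom
        return self.w
```

Note how a small gradient (first coordinate below) leaves its weight at exactly zero, while a large one moves its weight away from zero:

```python
opt = FTRLProximal(2)
w = opt.update([0.5, 3.0])  # w[0] == 0.0, w[1] < 0
```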



### (4) The FTRL Form of Logistic Regression

For logistic regression the prediction is $p_{t}=\left(1+e^{-W\cdot X^{(t)}}\right)^{-1}$, and the per-example loss is the log loss:

$\ell^{(t)}(W)=-y_{t}\log(p_{t})-(1-y_{t})\log(1-p_{t})$
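The remarkably simple gradient used in the pseudocode below follows from the chain rule, since $\partial p_{t}/\partial w_{i}=p_{t}(1-p_{t})\,x_{i}$ for the sigmoid:

```latex
\frac{\partial \ell^{(t)}}{\partial w_{i}}
  = \left(-\frac{y_{t}}{p_{t}} + \frac{1-y_{t}}{1-p_{t}}\right)
    \frac{\partial p_{t}}{\partial w_{i}}
  = \frac{p_{t}-y_{t}}{p_{t}(1-p_{t})} \cdot p_{t}(1-p_{t})\,x_{i}
  = (p_{t}-y_{t})\,x_{i}
```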

FTRL Algorithm (Logistic Regression)

(1) Input $\alpha, \beta, \lambda_{1}, \lambda_{2}$; initialize $W\in\mathbb{R}^{N}$, $Z=0\in\mathbb{R}^{N}$, $Q=0\in\mathbb{R}^{N}$

(2) for $t=1,2,3,\ldots$
$p_{t}=\left(1+e^{-W\cdot X^{(t)}}\right)^{-1}$
for $i=1,2,\ldots,N$
$g_{i}=(p_{t}-y_{t})x_{i}$ // gradient of the log loss
$\sigma_{i}=\alpha^{-1}(\sqrt{q_{i}+g_{i}^{2}}-\sqrt{q_{i}})$ // equals $(\eta^{(t)})^{-1}-(\eta^{(t-1)})^{-1}$
$q_{i}=q_{i}+g_{i}^{2}$
$z_{i}=z_{i}+g_{i}-\sigma_{i}w_{i}$
if $|z_{i}|\le\lambda_{1}$: $w_{i}=0$
otherwise: $w_{i}=-\left(\lambda_{2}+\frac{\beta+\sqrt{q_{i}}}{\alpha}\right)^{-1}\left(z_{i}-\lambda_{1}\,\mathrm{sgn}(z_{i})\right)$
end
end
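Putting the pieces together, a compact sketch of the trainer (the function name and plain-list data format are illustrative, not from the original):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def ftrl_logistic(data, n, alpha=0.1, beta=1.0, lambda1=1.0, lambda2=1.0):
    """Train logistic regression with FTRL-Proximal on (x, y) pairs.

    data : iterable of (x, y) with x a length-n list and y in {0, 1}
    """
    z = [0.0] * n  # shifted gradient sums
    q = [0.0] * n  # squared-gradient sums
    w = [0.0] * n
    for x, y in data:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        for i in range(n):
            g = (p - y) * x[i]  # gradient of the log loss
            sigma = (math.sqrt(q[i] + g * g) - math.sqrt(q[i])) / alpha
            q[i] += g * g
            z[i] += g - sigma * w[i]
            if abs(z[i]) <= lambda1:
                w[i] = 0.0  # L1 threshold keeps the coordinate sparse
            else:
                sign = 1.0 if z[i] > 0 else -1.0
                denom = lambda2 + (beta + math.sqrt(q[i])) / alpha
                w[i] = -(z[i] - lambda1 * sign) / denom
    return w
```

On a toy stream where the single feature always co-occurs with $y=1$, the learned weight is positive when $\lambda_{1}$ is small, and clamped to exactly zero when $\lambda_{1}$ dominates the accumulated evidence:

```python
w = ftrl_logistic([([1.0], 1)] * 50, 1, lambda1=0.01)   # w[0] > 0
w2 = ftrl_logistic([([1.0], 1)] * 5, 1, lambda1=100.0)  # w2[0] == 0.0
```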



1. Yusen Zhan says:

Not finished yet? Your FTRL objective function is written a bit clumsily; it takes a while to tell the loss function apart from the regularizer. Shai Shalev-Shwartz, one of the finest writers in ML, wrote a survey on online learning that is worth consulting.

Click to access OLsurvey.pdf