# The Five Phases of PhD Motivation: The Rollercoaster Curve

## The original English text:

Transition Curve
Motivation during your PhD is not constant; it resembles the phases entrepreneurs experience, which Tim Ferriss describes in his post Harnessing Entrepreneurial Manic Depression: Making The Rollercoaster Work for You. Tim's advice is aimed at entrepreneurs, but it adapts easily to research and PhD life.

### Phase 1: Uninformed optimism

You start your PhD, everything is new, and you find your project really cool. It feels like you are going to solve a big problem, and if you are ambitious and work well you might get a big prize, maybe a patent, maybe a paper in a high-impact journal. Sound familiar? It is the same feeling as starting a new job: everybody seems nicer than at the previous one and everything is far better organized. Well, give it a few months; you'll realize it is not that great.

### Phase 2: Informed Pessimism

You have been working on your project for some time, and you understand the field better, but unfortunately you are still lost. You don't see any good results in the near future, and you start to realize that this project might be a bit too big for you. This phase is more severe if your PhD is not a continuation of previous work, for instance if you switched fields.

### Phase 3: Crisis of Meaning

You are more or less in the middle of your PhD, and you have a crisis like 40-year-olds have. Since you don't have the money to buy a Porsche, you just cry silently in a corner. You think, "Is this all? Am I a failure?" The project is not as rosy as you dreamt; in fact, you are going to struggle and work your ass off to finish a minimally decent body of work. You feel that you have wasted a lot of time on little projects that now seem useless. But you never know; maybe later you will connect the dots and they will turn out to be the starting points of something great.

### Phase 4: Crash and Burn (optional)

While in Phase 3, if you don't step away quickly from your negative feelings, you are going to be screwed. Negativity might take over, leading you into a mini depression. At this stage, many people conclude they have been wasting their time and give up. They walk away with an unfinished PhD. Needless to say, we want to avoid this.

### Phase 5: Informed Optimism

Slowly you start to accept that your PhD thesis is not going to be as awesome as you thought. Whatever. At least you'll get some publications, enough to graduate. Maybe the Nature paper has to wait for your post-doc. Who cares. Better to finish a half-assed PhD than nothing. You are getting a grip on your field, and you can contribute (something) to the state of the art. It should be enough. Good enough; you don't need perfect.

This curve is fitted to PhD data collected over many years, which means everybody will experience some deviation from the values predicted here. Some phases will be mild while others can be extreme. At any stage, don't get carried away by over-optimism or over-pessimism. Stay cool; be water, my friend.

Interested in becoming a Scientist 2.0? Then visit my blog: http://juliopeironcely.com/

# Some Thoughts After Starting a WeChat Official Account

PS: I have not been in the working world for long, so some of these views may be biased. I welcome your feedback.

# Outlier Detection Algorithms (2)

### (1) Principal Component Analysis

(1) It makes the data set easier to use;

(2) it reduces the computational cost of many algorithms;

(3) it removes noise;

(4) it makes the results easier to describe.

Subtract the mean.

Basic properties of Principal Component Analysis:

Principle component analysis provides a set of eigenvectors satisfying the following properties:

(1) If the top-k eigenvectors are picked (by largest eigenvalue), then the k-dimensional hyperplane defined by these eigenvectors and passing through the mean of the data is the hyperplane for which the mean squared distance of all data points to it is as small as possible among all hyperplanes of dimensionality k.

（2）If the data is transformed to the axis-system corresponding to the orthogonal eigenvectors, the variance of the transformed data along each eigenvector dimension is equal to the corresponding eigenvalue. The covariances of the transformed data in this new representation are 0.

（3）Since the variances of the transformed data along the eigenvectors with small eigenvalues are low, significant deviations of the transformed data from the mean values along these directions may represent outliers.

### (2) Outlier Detection Based on Matrix Decomposition

$X=PDP^{T},$

$Y=dataMat\times P.$

$Y^{j}=dataMat \times P^{j},$

$R^{j}=(P^{j}\times (Y^{j})^{T})^{T}=Y^{j}\times (P^{j})^{T},$

$score(dataMat_{i})=\sum_{j=1}^{p}(|dataMat_{i}-R_{i}^{j}|)\times ev(j)$

$ev(j)=\sum_{k=1}^{j}\lambda_{k}/\sum_{k=1}^{p}\lambda_{k}$
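The matrix-decomposition score defined by the formulas above can be sketched in Python with NumPy. The function name `pca_outlier_scores` is illustrative, not from the original post; `dataMat`, $P$, $Y^{j}$, $R^{j}$, and $ev(j)$ follow the notation of the equations:

```python
import numpy as np

def pca_outlier_scores(dataMat):
    """Score each row of dataMat by its reconstruction error over every
    top-j eigenvector subspace (j = 1..p), weighted by ev(j)."""
    X = dataMat - dataMat.mean(axis=0)            # remove the mean
    cov = np.cov(X, rowvar=False)                 # p x p covariance matrix
    eigvals, P = np.linalg.eigh(cov)              # X = P D P^T (ascending order)
    order = np.argsort(eigvals)[::-1]             # sort by largest eigenvalue
    eigvals, P = eigvals[order], P[:, order]

    n, p = X.shape
    scores = np.zeros(n)
    for j in range(1, p + 1):
        Yj = X @ P[:, :j]                         # Y^j = dataMat x P^j
        Rj = Yj @ P[:, :j].T                      # R^j = Y^j x (P^j)^T
        ev_j = eigvals[:j].sum() / eigvals.sum()  # ev(j)
        scores += np.abs(X - Rj).sum(axis=1) * ev_j
    return scores
```

Rows with larger scores deviate more from the principal subspaces and are more likely to be outliers.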

# Outlier Detection Algorithms (1)

### (1) Univariate Outlier Detection Based on the Normal Distribution

$\mu=\sum_{i=1}^{n}x_{i}/n,$

$\sigma^{2}=\sum_{i=1}^{n}(x_{i}-\mu)^{2}/n.$
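A minimal sketch of the univariate rule: estimate $\mu$ and $\sigma^{2}$ as above and flag points far from the mean. The 3σ cutoff is an illustrative choice, not fixed by the formulas:

```python
import numpy as np

def univariate_outliers(x, n_sigma=3.0):
    """Flag points more than n_sigma standard deviations from the mean."""
    x = np.asarray(x, dtype=float)
    mu = x.sum() / len(x)                       # mu = sum(x_i) / n
    sigma2 = ((x - mu) ** 2).sum() / len(x)     # sigma^2 = sum((x_i - mu)^2) / n
    return np.abs(x - mu) > n_sigma * np.sqrt(sigma2)
```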

### (2) Multivariate Outlier Detection Methods

#### (1) Outlier Detection Based on Univariate Normal Distributions

$\mu_{j}=\sum_{i=1}^{m}x_{i,j}/m$

$\sigma_{j}^{2}=\sum_{i=1}^{m}(x_{i,j}-\mu_{j})^{2}/m$

$p(\vec{x})=\prod_{j=1}^{n} p(x_{j};\mu_{j},\sigma_{j}^{2})=\prod_{j=1}^{n}\frac{1}{\sqrt{2\pi}\sigma_{j}}\exp(-\frac{(x_{j}-\mu_{j})^{2}}{2\sigma_{j}^{2}})$
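The independence-based density above can be sketched as follows (function names are illustrative); a point is typically flagged when $p(\vec{x})$ falls below a threshold $\varepsilon$ chosen on held-out data:

```python
import numpy as np

def fit_independent_gaussians(X):
    """Per-dimension mu_j and sigma_j^2 from the m x n data matrix X."""
    mu = X.mean(axis=0)
    sigma2 = ((X - mu) ** 2).mean(axis=0)
    return mu, sigma2

def density(x, mu, sigma2):
    """p(x) = prod_j N(x_j; mu_j, sigma_j^2), treating dimensions as independent."""
    coef = 1.0 / (np.sqrt(2 * np.pi) * np.sqrt(sigma2))
    return np.prod(coef * np.exp(-(x - mu) ** 2 / (2 * sigma2)))
```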

#### (2) Outlier Detection with a Multivariate Gaussian Distribution

$\vec{\mu}=(E(x_{1}),...,E(x_{n}))$

The $n\times n$ covariance matrix:

$\Sigma=[Cov(x_{i},x_{j})], i,j \in \{1,...,n\}$

$p(\vec{x})=\frac{1}{(2\pi)^{\frac{n}{2}}|\Sigma|^{\frac{1}{2}}} \exp(-\frac{1}{2}(\vec{x}-\vec{\mu})^{T}\Sigma^{-1}(\vec{x}-\vec{\mu}))$

#### (3) Detecting Multivariate Outliers with the Mahalanobis Distance

$MDist(a,\overline{a})=\sqrt{(a-\overline{a})^{T}S^{-1}(a-\overline{a})},$
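The distance above can be computed for every row at once; this is a sketch with an illustrative function name, where $S$ is the sample covariance matrix:

```python
import numpy as np

def mahalanobis_distances(X):
    """MDist(a, abar) = sqrt((a - abar)^T S^{-1} (a - abar)) for each row a of X."""
    abar = X.mean(axis=0)
    S = np.cov(X, rowvar=False)        # sample covariance matrix S
    S_inv = np.linalg.inv(S)
    diff = X - abar
    # quadratic form diff_i^T S^{-1} diff_i for every row i
    return np.sqrt(np.einsum('ij,jk,ik->i', diff, S_inv, diff))
```

Rows whose distance is unusually large (for example, above a high quantile of the distances) can then be flagged as outliers.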

#### (4) Detecting Multivariate Outliers with the $\chi^{2}$ Statistic

$\chi^{2}=\sum_{i=1}^{n}(a_{i}-E_{i})^{2}/E_{i}.$
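The statistic is a one-liner; here $E_{i}$ is the expected value of dimension $i$ (e.g. the per-dimension mean), and a large $\chi^{2}$ suggests the point is an outlier:

```python
import numpy as np

def chi2_statistic(a, E):
    """chi^2 = sum_i (a_i - E_i)^2 / E_i for an observation a with expected values E."""
    a, E = np.asarray(a, dtype=float), np.asarray(E, dtype=float)
    return ((a - E) ** 2 / E).sum()
```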

# Outlier Detection Algorithms (3): Replicator Neural Networks

An outlier is an observation that deviates so much from other observations as to arouse suspicion that it was generated by a different mechanism.

## The Main Idea of the RNN Algorithm

$e_{i}=\sum_{j=1}^{6}(x_{i j}-r_{i j})^{2}/6$

$\theta=I_{ki}=\sum_{j=0}^{L_{k-1}}w_{kij}Z_{(k-1)j}$

$S_{k}(\theta)=\tanh(a_{k}\theta) \text{ for } k=2 \text{ or } 4,$

$S_{3}(\theta)=\frac{1}{2}+\frac{1}{4}\sum_{j=1}^{N-1}\tanh(a_{3}(\theta-\frac{j}{N})).$
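The staircase activation $S_{3}$ can be implemented directly. Here $a_{3}$ and $N$ are hyperparameters; with the $\frac{1}{4}$ coefficient the output stays in $[0,1]$ for $N=3$, so the defaults below are illustrative choices:

```python
import numpy as np

def s3(theta, a3=100.0, N=3):
    """Staircase activation S3(theta) = 1/2 + 1/4 * sum_{j=1}^{N-1} tanh(a3*(theta - j/N)).
    A large a3 quantizes activations onto N roughly discrete levels."""
    j = np.arange(1, N)
    theta = np.asarray(theta, dtype=float)
    return 0.5 + 0.25 * np.tanh(a3 * (theta[..., None] - j / N)).sum(axis=-1)
```

Quantizing the hidden activations in this way groups the training points into clusters, which the RNN approach uses to separate normal data from outliers.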

## The Backpropagation Algorithm:

$\bold{x}_{i}=\bold{y}_{i}\in\mathbb{R}^{n} \text{ for all } 1\leq i\leq m$.

$\alpha_{h} = \sum_{i=1}^{n}v_{i h}x_{i} \text{ for all } 1\leq h \leq q.$

$(\alpha_{1},\cdots,\alpha_{q})=(x_{1},\cdots,x_{n})\begin{bmatrix} v_{11} & \cdots & v_{1q} \\ \vdots & \ddots & \vdots \\ v_{n1} & \cdots & v_{nq} \end{bmatrix}.$

$\beta_{j}=\sum_{h=1}^{q}w_{h j}b_{h} \text{ for all } 1\leq j\leq n,$

$(\beta_{1},\cdots,\beta_{n})=(b_{1},\cdots,b_{q})\begin{bmatrix} w_{11} & \cdots & w_{1n} \\ \vdots & \ddots & \vdots \\ w_{q1} & \cdots & w_{qn} \end{bmatrix}.$

$E_{k} =\frac{1}{2}\sum_{j=1}^{n}(\hat{y}_{kj}-y_{kj})^{2},$

$E = \frac{1}{m}\sum_{k=1}^{m}E_{k} = \frac{1}{2m}\sum_{k=1}^{m}\sum_{j=1}^{n}(\hat{y}_{kj}-y_{kj})^{2}$

### Standard BP Algorithm:

$v \leftarrow v+\Delta v.$

$\Delta w_{hj} = -\eta \frac{\partial E_{k}}{\partial w_{hj}},$

$\frac{\partial E_{k}}{\partial w_{hj}} = \frac{\partial E_{k}}{\partial \hat{y}_{kj}} \cdot \frac{\partial \hat{y}_{kj}}{\partial \beta_{j}} \cdot \frac{\partial \beta_{j}}{\partial w_{hj}}$

$\frac{\partial E_{k}}{\partial w_{hj}} = (\hat{y}_{kj}-y_{kj})\cdot\hat{y}_{kj}\cdot(1-\hat{y}_{kj})\cdot b_{h}$

$g_{j}=-\frac{\partial E_{k}}{\partial \beta_{j}}=-\frac{\partial E_{k}}{\partial \hat{y}_{kj}}\cdot \frac{\partial \hat{y}_{kj}}{\partial \beta_{j}}$

$g_{j}=\hat{y}_{kj}\cdot(1-\hat{y}_{kj})\cdot(y_{kj}-\hat{y}_{kj})$

$\Delta \theta_{j}=-\eta\cdot\frac{\partial E_{k}}{\partial \theta_{j}}, \Delta v_{ih}=-\eta\cdot\frac{\partial E_{k}}{\partial v_{ih}}, \Delta \gamma_{h}=-\eta\cdot\frac{\partial E_{k}}{\partial \gamma_{h}}$.

$\frac{\partial E_{k}}{\partial \theta_{j}}=\frac{\partial E_{k}}{\partial \hat{y}_{kj}}\cdot\frac{\partial\hat{y}_{kj}}{\partial\theta_{j}}=(\hat{y}_{kj}-y_{kj})\cdot(-1)\cdot f^{'}(\beta_{j}-\theta_{j})=(y_{kj}-\hat{y}_{kj})\cdot\hat{y}_{kj}\cdot(1-\hat{y}_{kj})=g_{j}$

$\frac{\partial E_{k}}{\partial v_{ih}}=\frac{\partial E_{k}}{\partial\alpha_{h}}\cdot\frac{\partial\alpha_{h}}{\partial v_{ih}}=\frac{\partial E_{k}}{\partial b_{h}}\cdot\frac{\partial b_{h}}{\partial \alpha_{h}}\cdot\frac{\partial\alpha_{h}}{\partial v_{ih}}$

$\frac{\partial \alpha_{h}}{\partial v_{ih}}=x_{ki}$

$\frac{\partial b_{h}}{\partial\alpha_{h}}=f^{'}(\alpha_{h}-\gamma_{h})=f(\alpha_{h}-\gamma_{h})\cdot(1-f(\alpha_{h}-\gamma_{h}))=b_{h}\cdot(1-b_{h})$

$\frac{\partial E_{k}}{\partial b_{h}}=\sum_{j=1}^{n}\frac{\partial E_{k}}{\partial \beta_{j}}\cdot\frac{\partial \beta_{j}}{\partial b_{h}}=\sum_{j=1}^{n}(-g_{j})\cdot w_{hj}$

$\Delta v_{ih}=\eta(\sum_{j=1}^{n}g_{j}w_{hj})\cdot b_{h}\cdot (1-b_{h})x_{ki} = \eta e_{h}x_{ki},$ where $e_{h}=-\partial E_{k}/\partial\alpha_{h}=(\sum_{j=1}^{n}g_{j}w_{hj})\cdot b_{h}\cdot(1-b_{h}).$

$\Delta \gamma_{h}=(-\eta)\cdot\frac{\partial E_{k}}{\partial\gamma_{h}}=(-\eta)\cdot\frac{\partial E_{k}}{\partial b_{h}}\cdot\frac{\partial b_{h}}{\partial\gamma_{h}}=\eta\cdot(\sum_{j=1}^{n}g_{j}w_{hj})\cdot(-1)\cdot f^{'}(\alpha_{h}-\gamma_{h})=(-\eta)\cdot(\sum_{j=1}^{n}g_{j}w_{hj})\cdot b_{h}\cdot(1-b_{h})=(-\eta)\cdot e_{h} .$

$\Delta w_{hj}=\eta g_{j}b_{h} \text{ for all } 1\leq j\leq n, 1\leq h \leq q,$

$\Delta \theta_{j}=-\eta g_{j} \text{ for all } 1\leq j\leq n,$

$\Delta v_{ih}=\eta e_{h}x_{ki} \text{ for all } 1\leq i\leq n, 1\leq h\leq q,$

$\Delta \gamma_{h}=-\eta e_{h} \text{ for all } 1\leq h\leq q,$

1. Randomly initialize all connection weights and thresholds in the network within (0, 1)
2. repeat
for all $(\bold{x}_{k},\bold{y}_{k})$ in the training set do
compute the current output $\hat{\bold{y}}_{k}$; compute $g_{j}$ and $e_{h}$ as above; update $w_{hj}$, $\theta_{j}$, $v_{ih}$, and $\gamma_{h}$ according to the update rules
end for
3. until the stopping condition is reached
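The standard BP loop can be sketched in NumPy for a single sigmoid hidden layer, following the notation of the derivation ($v,\gamma$ for the hidden layer, $w,\theta$ for the output layer). The function name, shapes, and learning rate are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bp_epoch(X, Y, v, gamma, w, theta, eta=0.1):
    """One pass of standard BP: for each example, apply
    delta_w = eta*g_j*b_h, delta_theta = -eta*g_j,
    delta_v = eta*e_h*x_i, delta_gamma = -eta*e_h."""
    for x, y in zip(X, Y):
        b = sigmoid(x @ v - gamma)               # b_h = f(alpha_h - gamma_h)
        y_hat = sigmoid(b @ w - theta)           # y_hat_j = f(beta_j - theta_j)
        g = y_hat * (1 - y_hat) * (y - y_hat)    # g_j
        e = b * (1 - b) * (w @ g)                # e_h = b_h(1-b_h) sum_j g_j w_hj
        w += eta * np.outer(b, g)                # delta w_hj = eta g_j b_h
        theta -= eta * g                         # delta theta_j = -eta g_j
        v += eta * np.outer(x, e)                # delta v_ih = eta e_h x_i
        gamma -= eta * e                         # delta gamma_h = -eta e_h
    return v, gamma, w, theta
```

For a replicator network, the targets are the inputs themselves, so the loop is called with `Y = X`.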

### Accumulated BP Algorithm:

The goal of the BP algorithm is to minimize the accumulated error over the training set, $E=\sum_{k=1}^{m}E_{k}/m,$ where $m$ is the number of samples in the training set. However, standard BP updates the connection weights and thresholds for only one training example at a time; that is, the update rules of standard BP are derived from the individual errors $E_{k}$. An analogous derivation yields update rules that minimize the accumulated error directly, which gives the accumulated error backpropagation algorithm. Standard BP needs more iterations but updates the parameters frequently, whereas accumulated BP performs one update only after a full pass over the training set. Moreover, once the accumulated error has fallen to a certain level, further decreases become noticeably slow; at that point standard BP often reaches a good solution faster, especially when the training set is large.

## Training Procedure:

（1）Normalize every column of the data set;

（2）use 70% of the data as the training set and 30% as the validation set, or a training : validation split of 8 : 2; the exact split depends on the situation;

（3）randomly generate a three-layer neural network whose weights are all drawn at random from [0,1]. The output layer is trained to reproduce the input layer's data, and the number of nodes in the middle layer is half that of the input layer;

（4）train the model with the back-propagation algorithm. Two strategies are commonly used to prevent the neural network from overfitting. (i) The first is early stopping: when the training error keeps decreasing but the validation error starts to increase, stop training and return the network with the smallest validation error. (ii) The second is regularization: add a term to the error objective that describes the complexity of the network, for example the sum of squares of the connection weights and thresholds.
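The early-stopping strategy in step（4）can be sketched generically. Here `train_step` and `val_error` are caller-supplied stand-ins (e.g. one BP epoch over the training set and the validation-set error), and `patience` is an illustrative hyperparameter; `train_step` is assumed to return new parameters rather than mutate them:

```python
import numpy as np

def early_stopping_fit(train_step, val_error, params, max_epochs=1000, patience=10):
    """Train until the validation error has not improved for `patience`
    consecutive epochs; return the parameters with the smallest validation error."""
    best_err, best_params, bad_epochs = np.inf, params, 0
    for _ in range(max_epochs):
        params = train_step(params)
        err = val_error(params)
        if err < best_err - 1e-12:               # validation error improved
            best_err, best_params, bad_epochs = err, params, 0
        else:                                    # no improvement this epoch
            bad_epochs += 1
            if bad_epochs >= patience:
                break
    return best_params, best_err
```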

## References:

[1] Hoang Anh Dau, Vic Ciesielski, Andy Song. Anomaly Detection Using Replicator Neural Networks Trained on Examples of One Class.

[2] Laszlo Toth, Gabor Gosztolya. Replicator Neural Networks for Outlier Modeling in Segmental Speech Recognition.

[3] Simon Hawkins, Hongxing He, Graham Williams, Rohan Baxter. Outlier Detection Using Replicator Neural Networks.