# UCL course – 2016

Hado van Hasselt

Together with Joseph Modayil, this year I am teaching the reinforcement learning part of the Advanced Topics in Machine Learning course at UCL.

# Lectures

Note that there will be two lectures about AlphaGo on March 24.  We will talk about AlphaGo in the context of the whole course at the normal place and time (9:15am in Roberts 412), and in addition David Silver will give a seminar that afternoon.  Neither of these will be required for the exam.

1. Introduction to reinforcement learning updated January 14 (Lecture: January 14)
2. Exploration and Exploitation updated January 21 (Lecture: January 21)
3. Markov decision processes updated January 27 (Lecture: January 28)
4. Dynamic programming updated February 3 (Lecture: February 4)
5. Learning to predict updated February 10 (Lecture: February 11)
6. Learning to control updated March 16 (Lecture: February 25)
7. Value function approximation updated March 2 (Lecture: March 3)
8. Policy-gradient algorithms updated March 9 (Lecture:…


# Playing Text Games with Reinforcement Learning

The action-value function is defined as

$Q^{\pi}(s,a)=E\left[\sum_{i=0}^{\infty}\gamma^{i}r_{t+i}\,\middle|\,s_{t}=s,\ a_{t}=a\right].$

Actions are chosen with a softmax policy over the Q values:

$\pi(a_{t}=a_{t}^{i}\mid s_{t}) = \frac{\exp(\alpha\cdot Q(s_{t},a_{t}^{i}))}{\sum_{j=1}^{|A_{t}|}\exp(\alpha\cdot Q(s_{t},a_{t}^{j}))}.$

The state text and each candidate action text are embedded by two separate networks:

$h_{1,s} = f(W_{1,s}s_{t}+b_{1,s}), \qquad h_{1,a}^{i} = f(W_{1,a}a_{t}^{i}+b_{1,a}),$

$h_{\ell,s} = f(W_{\ell,s}h_{\ell-1,s}+b_{\ell,s}), \qquad h_{\ell,a}^{i} = f(W_{\ell,a}h_{\ell-1,a}^{i}+b_{\ell,a}).$

Compared with the other two models, DRRN's innovation is that it uses two separate networks to map the state text and the action text. Concatenating a long state text and a short action text directly into a single network can degrade the quality of the Q values, so the state-text and action-text are learned in separate networks, and the final Q value is obtained by combining the two embeddings with an inner product, which works better.
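As a rough sketch of the idea above (pure Python with toy linear-tanh layers; `linear_tanh`, `drrn_q`, and `softmax_policy` are illustrative names and shapes, not from the paper): the state text vector and each action text vector pass through their own stack of layers, Q is the inner product of the two final embeddings, and the policy is a softmax over those Q values.

```python
import math

def linear_tanh(x, W, b):
    # One fully connected layer with tanh activation: f(Wx + b),
    # playing the role of f(W h + b) in the equations above.
    return [math.tanh(sum(w * xj for w, xj in zip(row, x)) + bi)
            for row, bi in zip(W, b)]

def drrn_q(state_vec, action_vecs, state_layers, action_layers):
    # Separate networks for state text and action text;
    # Q(s, a^i) is the inner product of the final embeddings.
    h_s = state_vec
    for W, b in state_layers:
        h_s = linear_tanh(h_s, W, b)
    qs = []
    for a in action_vecs:
        h_a = a
        for W, b in action_layers:
            h_a = linear_tanh(h_a, W, b)
        qs.append(sum(x * y for x, y in zip(h_s, h_a)))
    return qs

def softmax_policy(qs, alpha=1.0):
    # pi(a^i | s) = exp(alpha * Q_i) / sum_j exp(alpha * Q_j)
    exps = [math.exp(alpha * q) for q in qs]
    z = sum(exps)
    return [e / z for e in exps]

# Tiny example with hand-picked weights (2-dim text vectors, one layer each).
state_layers = [([[0.1, -0.2], [0.3, 0.4]], [0.0, 0.1])]
action_layers = [([[0.2, 0.5], [-0.1, 0.3]], [0.1, 0.0])]
qs = drrn_q([1.0, 0.5], [[0.2, 0.1], [-0.3, 0.4]], state_layers, action_layers)
probs = softmax_policy(qs, alpha=2.0)
```

In a real DRRN the text vectors would be bag-of-words or learned word embeddings and the weights would be trained by Q-learning; the point of the sketch is only the two-tower structure and the inner-product combination.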

1. Deep Reinforcement Learning with a Natural Language Action Space