A fundamental reinforcement learning algorithm: Q-learning and the Bellman equation. Experiment on a simple grid-world game: Frozen Lake.
Done on 4x4, 5x5, and 8x8 maps.
Bellman equation
\[\begin{align} Q(S_t, a_t) \leftarrow Q(S_t, a_t)+\alpha \left [r_{t+1}+\lambda \max_a Q(S_{t+1},a)-Q(S_t, a_t) \right ] \end{align}\]
where
\[\begin{align} Q:& \text{ quality (an entry of the Q-table)}\\ S:& \text{ state}\\ a:& \text{ action}\\ \alpha:& \text{ learning rate}\\ r:& \text{ reward}\\ \lambda:& \text{ discount factor, balancing short-term and long-term reward} \end{align}\]
For more details, see my GitHub.
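As a rough illustration, one Q-learning update step can be sketched in plain Python. This is a minimal sketch, not the code from the repository: the function name `q_update`, the list-of-lists Q-table, and the example transition are all assumptions made for illustration. The update bootstraps from the best action value in the next state, and `lam` plays the role of the discount \(\lambda\).

```python
def q_update(q_table, state, action, reward, next_state, alpha=0.1, lam=0.9):
    """Apply one tabular Q-learning update in place and return the new value.

    q_update is a hypothetical helper written for this sketch.
    """
    best_next = max(q_table[next_state])            # max over a of Q(S_{t+1}, a)
    td_target = reward + lam * best_next            # r_{t+1} + lambda * max_a Q(S_{t+1}, a)
    td_error = td_target - q_table[state][action]   # bracketed term of the update
    q_table[state][action] += alpha * td_error
    return q_table[state][action]


# Assumed layout: a 4x4 map gives 16 states, with 4 actions per state.
n_states, n_actions = 16, 4
q_table = [[0.0] * n_actions for _ in range(n_states)]

# Hypothetical transition: state 14, action 2 leads to state 15 with reward 1.
new_value = q_update(q_table, state=14, action=2, reward=1.0,
                     next_state=15, alpha=0.5, lam=0.9)
print(new_value)  # 0.5
```

With an all-zero table, the first rewarded step moves the entry halfway toward the reward because `alpha=0.5`. Later updates from neighboring states then pick up that value through the `max` term, which is how reward information propagates backward across the map.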