
    agent

    An RL agent may include one or more of the following components:

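    Policy, which maps a state to an action:

    \[a = \pi(s)\]

    Value function, the expected discounted return from state \(s\) when following \(\pi\):

    \[v_\pi(s) = E_\pi [R_t + \gamma R_{t+1} + \gamma^2 R_{t+2} + \dots \mid S_t = s]\]

    Model, which predicts the next state and the expected reward:

    \[P_{SS'}^a = P[S' = s' \mid S = s, A = a]\]

    \[R_s^a = E[R \mid S = s, A = a]\]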
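As a rough illustration of how these quantities fit together, the sketch below builds a hypothetical two-state MDP and evaluates a fixed policy on it. The states, actions, reward numbers, discount factor, and the names `P`, `R`, `policy`, and `evaluate_policy` are all assumptions made up for this example; none of them come from the original notes. It uses the same indexing as the formulas above, so the value of a state is its immediate expected reward plus the discounted value of the next state.

```python
GAMMA = 0.5  # discount factor (assumed value for this example)

STATES = [0, 1]
STAY, MOVE = 0, 1  # two hypothetical actions

# Model, transition part: P[s][a][s2] = P[S' = s2 | S = s, A = a]
P = {
    0: {STAY: {0: 1.0, 1: 0.0}, MOVE: {0: 0.0, 1: 1.0}},
    1: {STAY: {0: 0.0, 1: 1.0}, MOVE: {0: 1.0, 1: 0.0}},
}

# Model, reward part: R[s][a] = E[R | S = s, A = a]
R = {
    0: {STAY: 0.0, MOVE: 1.0},
    1: {STAY: 2.0, MOVE: 0.0},
}


def policy(s):
    """Deterministic policy a = pi(s): move out of state 0, stay in state 1."""
    return MOVE if s == 0 else STAY


def evaluate_policy(pi, n_sweeps=100):
    """Approximate v_pi by repeatedly applying
    v(s) <- R[s][a] + GAMMA * sum over s2 of P[s][a][s2] * v(s2), with a = pi(s)."""
    v = {s: 0.0 for s in STATES}
    for _ in range(n_sweeps):
        v = {
            s: R[s][pi(s)]
            + GAMMA * sum(p * v[s2] for s2, p in P[s][pi(s)].items())
            for s in STATES
        }
    return v


v_pi = evaluate_policy(policy)
print(v_pi)
```

For this toy model the values can be checked by hand: staying in state 1 earns 2 at every step, so \(v_\pi(1) = 2 / (1 - 0.5) = 4\), and \(v_\pi(0) = 1 + 0.5 \cdot 4 = 3\). An agent that stores only `policy` is policy-based, one that stores only a value function is value-based, and one that never builds `P` and `R` is model-free.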

    Categories of reinforcement learning

    Value-based

    Policy-based

    Actor Critic

    Model Free

    Two fundamental problems in sequential decision making

    Reinforcement learning:

    Planning: