\item 在机器翻译中,强化学习的应用还有很多,比如,MIXER算法用混合策略梯度和极大似然估计的目标函数来更新模型{\red Sequence Level Training with Recurrent Neural Networks},DAgger{\red A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning}以及DAD{\red Improving Multi-step Prediction of Learned Time Series Models}等算法在训练过程之中逐渐让模型适应推断阶段的模式。此外,强化学习的效果目前还相当不稳定,研究人员提出了大量的方法来进行改善,比如降低方差{\red An Actor-Critic Algorithm for Sequence Prediction;Reinforcement Learning for Bandit Neural Machine Translation with Simulated Human Feedback}、使用单语语料{\red Improving Neural Machine Translation Models with Monolingual Data;A Study of Reinforcement Learning for Neural Machine Translation}等等。由于强化学习能从反馈的奖励中学习的特性,有不少研究探究如何在交互式场景中使用强化学习来提升系统性能。典型的例子就是对话系统,人类的反馈可以被用来训练系统,例如small-talk{\red A Deep Reinforcement Learning Chatbot}以及面向任务的对话{\red Continuously Learning Neural Dialogue Management}。