Gabriel Poesia

Deep Reinforcement Learning with a Natural Language Action Space (@ ACL 2016)

Ji He, Jianshu Chen, Xiaodong He, Jianfeng Gao, Lihong Li, Li Deng and Mari Ostendorf


This paper studies Reinforcement Learning in environments where states and actions are given in unbounded natural language, using text-based games as the test bed. The architecture they propose (the Deep Reinforcement Relevance Network, DRRN) is intuitive: separate encoders embed the state text and the text of each candidate action, and the dot product of the two embeddings is used as the estimate of $Q(s, a)$, the value of taking action $a$ in state $s$. Training therefore pushes the encoders to align each state with its good actions in embedding space.
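To make the two-encoder idea concrete, here is a minimal, untrained sketch of the scoring step only. Several things are assumptions for illustration, not taken from the paper: bag-of-words inputs over a tiny hypothetical vocabulary, one tanh layer per encoder, random weights, and the helper names (`bag_of_words`, `make_layer`, `encode`, `q_value`, `pick_action`). The Q-learning updates that would actually train the encoders are omitted.

```python
import math
import random

def bag_of_words(text, vocab):
    """Count vector over a fixed vocabulary (hypothetical featurization)."""
    counts = [0.0] * len(vocab)
    for token in text.lower().split():
        token = token.strip(".,!?")
        if token in vocab:
            counts[vocab[token]] += 1.0
    return counts

def make_layer(n_in, n_out, rng):
    """Random n_out x n_in weight matrix; untrained, illustration only."""
    return [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]

def encode(x, weights):
    """Single tanh layer: h = tanh(W x)."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]

def q_value(state_vec, action_vec, state_w, action_w):
    """Q(s, a) as the dot product of separately encoded state and action."""
    h_s = encode(state_vec, state_w)    # state encoder
    h_a = encode(action_vec, action_w)  # action encoder (different weights)
    return sum(a * b for a, b in zip(h_s, h_a))

def pick_action(state_text, candidates, vocab, state_w, action_w):
    """Score every candidate action text and return the greedy choice."""
    s = bag_of_words(state_text, vocab)
    scores = [q_value(s, bag_of_words(a, vocab), state_w, action_w)
              for a in candidates]
    best = max(range(len(scores)), key=scores.__getitem__)
    return candidates[best], scores

# Usage with a made-up game state and made-up candidate actions.
rng = random.Random(0)
words = "you are in a dark room there is door open the go north take lamp".split()
vocab = {w: i for i, w in enumerate(dict.fromkeys(words))}
hidden = 8
state_w = make_layer(len(vocab), hidden, rng)
action_w = make_layer(len(vocab), hidden, rng)
candidates = ["open the door", "go north", "take lamp"]
action, scores = pick_action("You are in a dark room. There is a door.",
                             candidates, vocab, state_w, action_w)
print(action, scores)
```

Because the weights are random, the chosen action here is arbitrary; the point is the shape of the computation. Each action is encoded independently of the state, so the set of candidate actions can vary from step to step without changing the network.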

This is one of the few viable baselines I can use for an education-related environment I'm proposing. This space seems surprisingly underexplored, possibly because of the lack of interesting environments. Dialogue systems are a cousin of this setup, but a much messier one (data is expensive to collect, rewards are less well-defined, etc.).