This paper builds on ReAct and proposes an interesting paradigm of "verbal reinforcement learning", in which the agent accumulates knowledge from one trial to the next via self-generated "lessons learned" expressed in natural language. This is a clever way to get a frozen LLM to improve as it attempts the same task multiple times. When the model has enough domain knowledge to distill useful hints from its failures, this can be vastly more sample-efficient than standard RL, which trains the policy directly on the final reward signal, typically far too little data to learn from.
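To make the mechanism concrete, here is a minimal sketch of the trial-reflect-retry loop as I understand it, not the paper's actual implementation: the `llm` and `run_trial` callables are hypothetical stand-ins for a language-model call and a task environment, and the prompt wording is my own.

```python
def reflexion_loop(task: str, llm, run_trial, max_trials: int = 5) -> bool:
    lessons: list[str] = []  # accumulated "lessons learned", in natural language
    for trial in range(max_trials):
        # The frozen LLM attempts the task, conditioned on prior lessons.
        memory = "\n".join(f"- {lesson}" for lesson in lessons)
        success, trajectory = run_trial(task, context=memory)
        if success:
            return True
        # On failure, the same model verbalizes what went wrong; this
        # self-generated hint is the only "policy update" between trials.
        reflection = llm(
            f"Task: {task}\nFailed attempt:\n{trajectory}\n"
            "In one or two sentences, state what went wrong and what to try next."
        )
        lessons.append(reflection)
    return False
```

The key point is that no weights change between trials; all cross-trial learning lives in the natural-language `lessons` buffer that is prepended to the next attempt's context.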