Generate and Revise: Reinforcement Learning in Neural Poetry

Several recent studies have applied natural language processing to reproducing artistic behaviors. However, existing poem generators work differently from human poets. A poet revisits and adjusts a poem many times before reaching the final version, whereas poem generators produce a poem in a single pass.


A recent study looks into the possibility of improving the quality of a poem by repeatedly revisiting and correcting it.

It uses Proximal Policy Optimization (PPO), a reinforcement learning method that is rarely applied to natural language generation. The agent is never told which words form a rhyme; nevertheless, it learns to attend to the final word of each verse. It also learns that other words in the poem may need to change to keep the poem coherent. The authors suggest that the method could be applied to other text generation tasks in the future.
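For readers unfamiliar with PPO, the snippet below is a minimal sketch of its standard clipped surrogate objective, not the authors' implementation. The function name `ppo_clip_objective`, the default clipping value `epsilon=0.2`, and the idea that each "action" corresponds to a word edit in the poem are illustrative assumptions; the paper's exact policy network, advantage estimator, and hyperparameters are not described here.

```python
import math
from typing import Sequence


def ppo_clip_objective(
    new_logprobs: Sequence[float],
    old_logprobs: Sequence[float],
    advantages: Sequence[float],
    epsilon: float = 0.2,  # assumed clipping range, a common default
) -> float:
    """Mean PPO clipped surrogate objective (to be maximized).

    Hypothetical sketch: each entry describes one action, e.g. one word
    edit proposed by the revising agent.
    """
    if not (len(new_logprobs) == len(old_logprobs) == len(advantages)):
        raise ValueError("all inputs must have the same length")
    total = 0.0
    for new_lp, old_lp, adv in zip(new_logprobs, old_logprobs, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s), computed in log space.
        ratio = math.exp(new_lp - old_lp)
        # Keep the ratio inside [1 - epsilon, 1 + epsilon].
        clipped = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
        # Pessimistic bound: take the smaller of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

The clipping keeps each policy update close to the previous policy, which is one reason PPO is considered stable enough for long sequences of small revisions.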

Writers, poets, and singers usually do not create their compositions in a single breath. Text is revisited, adjusted, modified, and rephrased, often multiple times, in order to better convey the meanings, emotions, and feelings that the author wants to express. Among the noble written arts, poetry is probably the one that requires the most elaboration, since the composition has to formally respect predefined meter and rhyming schemes. In this paper, we propose a framework to generate poems that are repeatedly revisited and corrected, as humans do, in order to improve their overall quality. We frame the problem of revising poems in the context of Reinforcement Learning and, in particular, Proximal Policy Optimization. Our model generates poems from scratch and learns to progressively adjust the generated text in order to match a target criterion. We evaluate this approach on the task of matching a rhyming scheme, without having any information on which words are responsible for creating rhymes or on how to coherently alter the poem's words. The proposed framework is general and, with appropriate reward shaping, can be applied to other text generation problems.
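To make "matching a rhyming scheme" concrete, here is a toy reward sketch. It is not the paper's reward: it assumes a crude heuristic in which two verses rhyme if their final words share the last `k` letters, and it assumes an AABB scheme by default. The names `ending`, `rhyme_reward`, and `revision_reward` are hypothetical, as is the shaping choice of rewarding the improvement between two successive drafts.

```python
from typing import List


def ending(word: str, k: int = 2) -> str:
    """Last k letters of a word (crude, assumed stand-in for a real rhyme test)."""
    letters = "".join(ch for ch in word.lower() if ch.isalpha())
    return letters[-k:]


def rhyme_reward(verses: List[str], scheme: str = "AABB", k: int = 2) -> float:
    """Fraction of verse pairs that the scheme requires to rhyme and that do."""
    if len(verses) != len(scheme):
        raise ValueError("need exactly one verse per scheme letter")
    last_words = [v.split()[-1] if v.split() else "" for v in verses]
    # Pairs of verse indices that share a scheme letter must rhyme.
    pairs = [
        (i, j)
        for i in range(len(scheme))
        for j in range(i + 1, len(scheme))
        if scheme[i] == scheme[j]
    ]
    if not pairs:
        return 0.0
    matched = sum(
        1
        for i, j in pairs
        if ending(last_words[i], k) and ending(last_words[i], k) == ending(last_words[j], k)
    )
    return matched / len(pairs)


def revision_reward(before: List[str], after: List[str], scheme: str = "AABB") -> float:
    """Hypothetical shaping: reward the improvement produced by one revision."""
    return rhyme_reward(after, scheme) - rhyme_reward(before, scheme)


draft = ["the cat sat", "upon the hill", "a bird in flight", "it sang all night"]
revised = ["the cat sat", "upon the mat", "a bird in flight", "it sang all night"]
print(rhyme_reward(draft), rhyme_reward(revised), revision_reward(draft, revised))
```

In this toy example, changing only the last word of the second verse raises the score from 0.5 to 1.0, which mirrors the behavior described above: the agent learns that verse endings are what the reward depends on.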

Research paper: Zugarini, A., Pasqualini, L., Melacci, S., and Maggini, M., “Generate and Revise: Reinforcement Learning in Neural Poetry”, 2021. Link:
