Steadily Learn to Drive with Virtual Memory

Reinforcement learning has achieved great success in fields such as games and robotics. It also has the potential to be applied to autonomous driving, but collecting data in the real world is expensive, and the instability of the method may lead to accidents.

A recent study addresses these problems by proposing a novel actor-critic algorithm called Learn to drive with Virtual Memory (LVM).

Image credit: AImotive via Wikimedia (CC BY-SA 4.0)

The algorithm learns a latent model of the environment from real interaction data. This learned model is then used to predict how the environment evolves, and the imagined trajectories are recorded as the virtual memory. The policy is optimized on this virtual memory, without the need for further real interaction data.
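To make the idea of imagined trajectories concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the functions `latent_step`, `policy`, and `imagine`, the linear dynamics, and the quadratic reward are all hypothetical stand-ins for the learned neural networks. The sketch only shows the data flow, in which rollouts happen entirely inside the learned model and are stored as virtual memory.

```python
import random

def latent_step(z, a):
    """Hypothetical stand-in for a learned latent dynamics model.

    Returns the predicted next latent state and a predicted reward.
    """
    z_next = [0.9 * zi + 0.1 * a for zi in z]
    reward = -sum(zi * zi for zi in z_next)  # assumed reward: stay near zero
    return z_next, reward

def policy(z, gain=-0.5):
    """Hypothetical linear policy that acts on the latent state."""
    return gain * sum(z) / len(z)

def imagine(z0, horizon):
    """Roll the policy forward inside the latent model only.

    No call to the real environment is made here.
    """
    trajectory, z = [], list(z0)
    for _ in range(horizon):
        a = policy(z)
        z_next, r = latent_step(z, a)
        trajectory.append((z, a, r, z_next))
        z = z_next
    return trajectory

# Start states would normally be latent encodings of real observations;
# here they are random vectors (an assumption for illustration).
random.seed(0)
start_states = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(8)]

# The virtual memory is the collection of imagined trajectories.
virtual_memory = [imagine(z0, horizon=15) for z0 in start_states]
```

In a full system the policy and critics would be trained on `virtual_memory`, which is why the number of real interactions can stay small.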

A double-critic approach stabilizes training by reducing the overestimation of state values caused by errors and noise. In a lane-keeping task in a roundabout, the proposed model achieved more stable training and better control performance than current approaches.
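The article does not spell out how the two critics are combined, so the following sketch assumes one common choice: the bootstrap target uses the smaller of the two critics' estimates. The names `conservative_target` and `single_target`, the noise level, and the discount factor are illustrative assumptions, not details taken from the paper.

```python
import random

def single_target(reward, v_next, gamma=0.99):
    """Bootstrap target from one critic's estimate of the next state value."""
    return reward + gamma * v_next

def conservative_target(reward, v1_next, v2_next, gamma=0.99):
    """Bootstrap target using the smaller of two critic estimates.

    Assumed combination rule: taking the minimum keeps an optimistic
    error in either critic from propagating into the target.
    """
    return reward + gamma * min(v1_next, v2_next)

# Simulate two critics whose estimates are the true value plus independent noise.
rng = random.Random(0)
true_value, reward, noise = 1.0, 0.0, 0.5
single, double = [], []
for _ in range(1000):
    v1 = true_value + rng.gauss(0.0, noise)
    v2 = true_value + rng.gauss(0.0, noise)
    single.append(single_target(reward, v1))
    double.append(conservative_target(reward, v1, v2))

mean_single = sum(single) / len(single)
mean_double = sum(double) / len(double)
```

By construction, each two-critic target is no larger than the corresponding single-critic target, so `mean_double` never exceeds `mean_single`. The trade-off is that the minimum can underestimate values, which is usually considered less harmful to training stability than overestimation.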

Reinforcement learning has shown great potential in developing high-level autonomous driving. However, for high-dimensional tasks, current RL methods suffer from low data efficiency and oscillation in the training process. This paper proposes an algorithm called Learn to drive with Virtual Memory (LVM) to overcome these problems. LVM compresses the high-dimensional information into compact latent states and learns a latent dynamic model to summarize the agent’s experience. Various imagined latent trajectories are generated as virtual memory by the latent dynamic model. The policy is learned by propagating gradient through the learned latent model with the imagined latent trajectories and thus leads to high data efficiency. Furthermore, a double critic structure is designed to reduce the oscillation during the training process. The effectiveness of LVM is demonstrated by an image-input autonomous driving task, in which LVM outperforms the existing method in terms of data efficiency, learning stability, and control performance.

Research paper: Zhang, Y., “Steadily Learn to Drive with Virtual Memory”, 2021. Link:

