Implementation details of reinforcement learning algorithms.

This post was automatically translated from Chinese by an LLM. If you find any translation errors, please leave a comment to help me improve it. Thanks!

Implementing reinforcement learning algorithms calls for care: attention to detail is crucial for convergence and training performance. This article documents pitfalls I have encountered and details worth noting while implementing various reinforcement learning algorithms, and it will be updated continuously.

My own implementations of these RL algorithms are available here: https://github.com/KezhiAdore/RL-Algorithm


Common

  • In PyTorch, the cross-entropy loss function torch.nn.functional.cross_entropy applies log_softmax to its input internally, i.e., it expects raw logits rather than probabilities; keep this in mind when using it to compute policy-gradient losses 1.
  • In some environments a trajectory is truncated artificially before true termination (e.g., hitting the maximum step count in the Cart Pole environment). This must be distinguished from termination due to failure: on true termination \(q(s,a)=r\), whereas on truncation the target should still bootstrap, \(q(s,a)=r+\gamma V(s')\) 2. A sketch illustrating both notes follows this list.
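
A minimal sketch of both notes, assuming a Gymnasium-style step API with separate terminated and truncated flags; the function and variable names here are illustrative, not taken from the original post:

```python
import torch
import torch.nn.functional as F

# Note 1: F.cross_entropy applies log_softmax internally, so feed it raw logits.
logits = torch.randn(4, 2)            # unnormalized action scores for a batch of 4 states
actions = torch.tensor([0, 1, 1, 0])  # sampled actions
neg_log_prob = F.cross_entropy(logits, actions, reduction="none")
# neg_log_prob[i] == -log softmax(logits[i])[actions[i]], i.e. -log pi(a|s)
assert torch.allclose(
    neg_log_prob,
    -F.log_softmax(logits, dim=-1).gather(1, actions.unsqueeze(1)).squeeze(1),
)

# Note 2: bootstrap on truncation, but not on true termination.
def one_step_target(reward, next_value, terminated, truncated, gamma=0.99):
    """Target for q(s,a): r on true termination, r + gamma * V(s') otherwise."""
    if terminated:  # failure / genuine terminal state
        return reward
    # ordinary step or time-limit truncation: keep the bootstrap term
    return reward + gamma * next_value
```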

REINFORCE

  • When computing the loss, make sure \(-\ln\pi_\theta(a|s)\) and the discounted returns have the same shape before multiplying them elementwise; a silent broadcast here yields an incorrect loss.
  • After computing the cross-entropy terms, summing them with torch.sum works better in practice than averaging with torch.mean 3. A sketch of the resulting loss follows this list.
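
A minimal sketch of a REINFORCE loss that respects both points; policy_net, the tensor shapes, and the variable names are assumptions for illustration:

```python
import torch
import torch.nn.functional as F

def reinforce_loss(policy_net, states, actions, returns):
    """states: (T, obs_dim); actions: (T,) long; returns: (T,) discounted returns G_t."""
    logits = policy_net(states)  # (T, num_actions), raw logits
    # cross_entropy applies log_softmax itself, so this is -log pi_theta(a_t | s_t) per step
    neg_log_prob = F.cross_entropy(logits, actions, reduction="none")  # shape (T,)
    assert neg_log_prob.shape == returns.shape  # keep dimensions consistent before multiplying
    # sum over the trajectory instead of averaging (see footnote 3)
    return torch.sum(neg_log_prob * returns)
```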

DQN Series

  • When computing q_target, mask out the bootstrap term on terminal transitions (done = 1): q_target = reward + self._gamma * max_next_q_value * (1 - done). Omitting the (1 - done) factor causes significant oscillations in reward during training.
  • Synchronize the target_network with the online network periodically; without this, convergence is very hard to achieve. A sketch covering both points follows this list.
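
A minimal sketch of the target computation and periodic synchronization; q_net, target_net, and sync_every are placeholder names, not the post's actual code:

```python
import torch

def dqn_targets(target_net, rewards, next_states, dones, gamma=0.99):
    """rewards, dones: float tensors of shape (B,); next_states: (B, obs_dim)."""
    with torch.no_grad():
        max_next_q = target_net(next_states).max(dim=1).values  # (B,)
    # (1 - dones) zeroes the bootstrap term on terminal transitions
    return rewards + gamma * max_next_q * (1.0 - dones)

def maybe_sync(step, q_net, target_net, sync_every=1000):
    """Hard-copy the online network's weights into the target network every sync_every steps."""
    if step % sync_every == 0:
        target_net.load_state_dict(q_net.state_dict())
```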

References


  1. Inconsistency in CrossEntropyLoss and Cross-Entropy Calculation in PyTorch | Kezhi's Blog

  2. Terminated and Truncated in Reinforcement Learning | Kezhi's Blog

  3. Discussion on Loss Mean vs. Sum in Vanilla Policy Gradient | Kezhi's Blog