Happy Graduation 2023


This is an automatically translated post by LLM. The original post is in Chinese. If you find any translation errors, please leave a comment to help me improve the translation. Thanks!
The older we get, the faster time seems to pass. In the blink of an eye, it is already 2023, and the seniors one year ahead of me are about to graduate. This post mainly consists of photos and does not have much text.
I wish all the graduating seniors a bright future and a wonderful life. 🎉🎉🎉


Recently, Tsinghua University and SenseTime published a paper titled "Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory," GITM for short. It is quite interesting; if you are interested, I recommend reading the original paper.


The process of deep reinforcement learning can be abstracted as repeating the following steps:
This article mainly discusses how natural episode endings (terminated: the goal is reached, the agent fails, and so on) and artificial truncation (truncated: the episode is cut off after a fixed number of steps) affect experience collection and training, how each case should be handled, and presents some experiments comparing performance.
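The core difference between the two cases can be sketched as follows; the function name, discount factor, and example values below are made up for illustration. When an episode is truncated rather than terminated, the next state still has value, so the bootstrap term of the one-step TD target must be kept:

```python
# A sketch of the key difference, with a made-up discount factor.
GAMMA = 0.99

def td_target(reward, next_value, terminated):
    """One-step TD target; bootstrap unless the episode truly ended."""
    if terminated:
        # Natural termination: the next state has no value.
        return reward
    # Ordinary step OR artificial truncation: keep the bootstrap term.
    return reward + GAMMA * next_value

# Goal reached (terminated): no bootstrapping.
print(td_target(1.0, next_value=5.0, terminated=True))   # 1.0
# Step-limit cutoff (truncated, not terminated): still bootstrap.
print(td_target(0.0, next_value=5.0, terminated=False))  # ~4.95
```

Treating a truncated step as if it were terminal silently biases value estimates downward, which is why the two flags need to be handled separately.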

Reinforcement learning demands careful attention to implementation details; otherwise the network may fail to converge or training may have no effect. This article records the pitfalls I have encountered while implementing various reinforcement learning algorithms and the details that require attention. Continuously updated...

Here is the RL algorithm library I implemented myself: https://github.com/KezhiAdore/RL-Algorithm

This article mainly discusses the inconsistency between the manual calculation of cross-entropy and the results obtained from the CrossEntropyLoss module in PyTorch. After consulting the official PyTorch documentation, it was found that the inconsistency is caused by the SoftMax operation that CrossEntropyLoss performs on the input probability distribution before calculating the cross-entropy.
In reinforcement learning, the loss function commonly used in policy learning is \(l=-\ln\pi_\theta(a|s)\cdot g\), where \(\pi_\theta\) is a probability distribution over actions given state \(s\), \(a\) is the action selected in state \(s\), and \(g\) is a weighting term such as the return. Therefore, we have:
\[ -\ln\pi_\theta(a|s) = -\sum_{a'\in A}p(a')\cdot \ln q(a') \]
\[ p(a') = \begin{cases} 1 & a'=a\\ 0 & \text{otherwise} \end{cases} \]
\[ q(a') = \pi_\theta(a'|s) \]
Thus, this loss function is transformed into the cross-entropy between two probability distributions, so we can use PyTorch's built-in torch.nn.functional.cross_entropy function (referred to as F.cross_entropy below) to calculate it. In practice, however, the results calculated with this function were inconsistent with the results calculated manually, which led to a series of investigations.
First, we compute the cross-entropy of a small example both manually in Python and with the F.cross_entropy function, as shown in the code below:
```python
import torch
import torch.nn.functional as F

# Illustrative values: a predicted distribution and the true class index
p = torch.tensor([[0.1, 0.2, 0.3, 0.4]])
target = torch.tensor([3])

# Manual cross-entropy: -ln q(a) for the true class
manual = -torch.log(p[0, target[0]])
print(f"Manually calculated cross-entropy: {manual.item():.4f}")

# PyTorch's built-in computation
builtin = F.cross_entropy(p, target)
print(f"F.cross_entropy result: {builtin.item():.4f}")
```
The results of the above code are as follows:
```
Manually calculated cross-entropy: 0.9163
F.cross_entropy result: 1.2425
```
From the results, it can be seen that the two calculations are not consistent. Therefore, we consulted the official PyTorch documentation to understand the implementation of F.cross_entropy.
The description of the F.cross_entropy function in the documentation does not include the specific calculation process; it only explains the correspondence between the dimensions of the input data and the output result. However, there is one sentence in the introduction of this function:

See CrossEntropyLoss for details.
So we turned to the documentation of CrossEntropyLoss and finally found the calculation process of cross-entropy in PyTorch. For an input \(x\) of unnormalized scores and a target class \(y\), the loss is
\[ \ell(x, y) = -\ln\frac{\exp(x_y)}{\sum_{c}\exp(x_c)} \]
It can be seen that the official documentation of the cross-entropy calculation is very clear. In summary, the F.cross_entropy function requires at least two parameters: the predicted probability distribution and the index of the target true class. Importantly, F.cross_entropy does not require the input probability distribution to sum to 1 or every entry to be greater than 0, because the function performs a SoftMax operation on the input before calculating the cross-entropy.
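This tolerance can be checked directly; the scores below are made-up values. Because SoftMax normalizes its input and is invariant to adding a constant to every entry, F.cross_entropy accepts arbitrary real-valued scores:

```python
import torch
import torch.nn.functional as F

# Made-up scores: contain a negative value and do not sum to 1
scores = torch.tensor([[3.0, -1.0, 0.5]])
target = torch.tensor([0])

# SoftMax is shift-invariant, so adding a constant changes nothing
shifted = scores + 10.0
print(torch.allclose(F.cross_entropy(scores, target),
                     F.cross_entropy(shifted, target)))  # True
```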
Performing the SoftMax operation before calculating the cross-entropy makes the function more tolerant of its input. However, if a SoftMax operation has already been applied when constructing the output of the neural network, the computed loss will be distorted; this is exactly why the calculation results in the previous section are inconsistent.
According to the official PyTorch documentation, if we add a SoftMax operation to the manual calculation of cross-entropy, we obtain the same result as the F.cross_entropy function. The following code verifies this:
```python
import torch
import torch.nn.functional as F

# Illustrative values: a predicted distribution and the true class index
p = torch.tensor([[0.1, 0.2, 0.3, 0.4]])
target = torch.tensor([3])

# Apply SoftMax first, then take -ln of the true-class probability
q = torch.softmax(p, dim=1)
manual = -torch.log(q[0, target[0]])
print(f"Manually calculated cross-entropy: {manual.item():.4f}")

builtin = F.cross_entropy(p, target)
print(f"F.cross_entropy result: {builtin.item():.4f}")
```
The output of the above code is as follows:
```
Manually calculated cross-entropy: 1.2425
F.cross_entropy result: 1.2425
```
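The practical takeaway for the policy-gradient loss from the beginning of this article: if the policy network outputs raw unnormalized scores (logits) rather than probabilities, F.cross_entropy computes \(-\ln\pi_\theta(a|s)\) correctly, because it applies the SoftMax itself. A minimal sketch, with made-up logits and weighting term:

```python
import torch
import torch.nn.functional as F

# Illustrative policy logits for 4 actions in some state s
logits = torch.tensor([[2.0, 0.5, -1.0, 0.3]])
action = torch.tensor([0])  # the action taken in s
g = 1.7                     # illustrative return/weighting term

# Manual: pi_theta = SoftMax(logits), loss = -ln pi_theta(a|s) * g
pi = torch.softmax(logits, dim=1)
manual_loss = -torch.log(pi[0, action[0]]) * g

# Built-in: F.cross_entropy applies the SoftMax internally, so pass logits
builtin_loss = F.cross_entropy(logits, action) * g

print(torch.allclose(manual_loss, builtin_loss))  # True
```

Equivalently: do not end the network with a SoftMax layer if its output is fed to F.cross_entropy; normalize only where the probabilities themselves are needed, for example when sampling actions.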