# Research Problem Record

This post was automatically translated by an LLM; the original is in Chinese. If you find any translation errors, please leave a comment to help me improve the translation. Thanks!

## Introduction

In scientific research and engineering practice, solving one problem often raises new ones. These problems usually arise in specific scenarios, but their abstractions tend to be general enough to serve as starting points for research.

## Game Theory Problems

Counterfactual Regret Minimization (CFR) is commonly used to solve discrete games that can be modeled as game trees. In real-world problems, however, state transitions often form a cyclic graph rather than a tree. Unrolling the graph by including the history in each state produces a very large tree, and similar states cannot be processed efficiently. How can the regret minimization approach be applied to such problems?
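For context, the core update inside CFR is regret matching. The sketch below is a minimal illustration, assuming a two-player zero-sum matrix game (rock-paper-scissors) instead of a full game tree; `PAYOFF`, `regret_matching`, and `solve` are hypothetical names chosen for this example. It does not answer the cyclic-graph question. It only shows the regret-minimization step that any extension to graphs would presumably need to keep.

```python
import numpy as np

# Row player's payoff matrix for rock-paper-scissors (zero-sum game).
# This game is an assumption made for illustration only.
PAYOFF = np.array([[0, -1, 1],
                   [1, 0, -1],
                   [-1, 1, 0]], dtype=float)


def regret_matching(cum_regret):
    """Play each action in proportion to its positive cumulative regret."""
    positive = np.maximum(cum_regret, 0.0)
    total = positive.sum()
    if total > 0:
        return positive / total
    # No positive regret yet: fall back to the uniform strategy.
    return np.full(len(cum_regret), 1.0 / len(cum_regret))


def solve(payoff, iterations=10000):
    """Self-play with regret matching; returns both average strategies."""
    n_rows, n_cols = payoff.shape
    regret_row = np.zeros(n_rows)
    regret_col = np.zeros(n_cols)
    # Asymmetric start so the dynamics do not sit at uniform from step one.
    regret_row[0] = 1.0
    sum_row = np.zeros(n_rows)
    sum_col = np.zeros(n_cols)
    for _ in range(iterations):
        s_row = regret_matching(regret_row)
        s_col = regret_matching(regret_col)
        sum_row += s_row
        sum_col += s_col
        # Expected utility of each pure action against the opponent's
        # current mixed strategy (the column player receives -payoff).
        u_row = payoff @ s_col
        u_col = -(s_row @ payoff)
        # Regret = utility of the action minus utility of the current strategy.
        regret_row += u_row - s_row @ u_row
        regret_col += u_col - s_col @ u_col
    return sum_row / iterations, sum_col / iterations
```

In a zero-sum game the average strategies, not the last iterates, are the ones that approach a Nash equilibrium, which is why `solve` returns the running averages.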

In game theory, discrete problems are usually represented as game trees. Iterating over the tree yields an optimal solution or an approximate Nash equilibrium. Continuous problems are often harder to solve. How can a continuous problem be transformed into one that can be modeled as a game tree, so that iterative algorithms for discrete problems can be applied? This question deserves further thought.
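One naive way to make the question above concrete is uniform discretization. The sketch below assumes a one-shot, two-player zero-sum game in which each player picks an action in [0, 1]; `discretize_game` and `example_payoff` are hypothetical names, and the payoff function is invented purely for illustration. It is a baseline under those assumptions, not a proposed solution: the grid grows quickly with dimension and may miss equilibria that lie between grid points.

```python
import numpy as np


def example_payoff(x, y):
    """Hypothetical row-player payoff: the row player gains by being far from y."""
    return (x - y) ** 2


def discretize_game(payoff_fn, n_points):
    """Sample both action intervals [0, 1] on a uniform grid.

    Returns the grid and the row player's payoff matrix, where
    matrix[i, j] = payoff_fn(grid[i], grid[j]).
    """
    grid = np.linspace(0.0, 1.0, n_points)
    matrix = np.array([[payoff_fn(x, y) for y in grid] for x in grid])
    return grid, matrix


grid, matrix = discretize_game(example_payoff, 5)
```

The resulting `matrix` is an ordinary finite game, so iterative solvers for discrete games can be run on it directly; the open question is how to choose or refine the grid so that the discrete solution stays close to the continuous one.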

## Reinforcement Learning Problems

- One approach in multi-agent reinforcement learning is a hierarchical architecture. The upper level makes macro decisions based on global information, while the lower level carries out micro operations based on those macro decisions. In this setting, many small teams inevitably form and collaborate on different tasks. These teams may differ in size, in how heterogeneous their members are, and in how they cooperate. How can such self-organizing heterogeneous teams be handled elegantly within this reinforcement learning architecture?