Cooperation in Multi-Agent Reinforcement Learning Systems

The QMIX paper, published in 2018 by T. Rashid et al., examines a hybrid value-based method for multi-agent reinforcement learning. It adds a constraint and a mixing network structure that make learning more stable, faster, and ultimately better in a controlled setting.

A key concept for understanding QMIX is the centralized training with decentralized execution paradigm, also known as CTDE. Agents are trained in a centralized manner, with access to the overall action-observation history (τ) and the global state during training, but during execution each agent has access only to its own local action-observation history (τ^a).
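The information split in CTDE can be sketched in a few lines of Python. This is only an illustration, not the paper's implementation: the linear per-agent Q-functions, the sizes, and the names `agent_q_values`, `act_decentralized` and `centralized_training_inputs` are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
N_AGENTS, N_ACTIONS, OBS_DIM, STATE_DIM = 3, 4, 5, 8  # assumed toy sizes

# Hypothetical per-agent Q-functions: one linear map per agent from an
# encoded local action-observation history to one value per action.
agent_weights = rng.normal(size=(N_AGENTS, OBS_DIM, N_ACTIONS))

def agent_q_values(agent_id, local_history):
    """Q-values of one agent, computed from its own local history only."""
    return local_history @ agent_weights[agent_id]

def act_decentralized(local_histories):
    """Execution: each agent picks a greedy action from its local history."""
    return [int(np.argmax(agent_q_values(a, h)))
            for a, h in enumerate(local_histories)]

def centralized_training_inputs(local_histories, global_state, actions):
    """Training: the learner sees every agent's history and the global state."""
    chosen_qs = np.array([agent_q_values(a, h)[u]
                          for a, (h, u) in enumerate(zip(local_histories, actions))])
    return chosen_qs, global_state

local_histories = rng.normal(size=(N_AGENTS, OBS_DIM))
global_state = rng.normal(size=STATE_DIM)
actions = act_decentralized(local_histories)
chosen_qs, state_seen_by_learner = centralized_training_inputs(
    local_histories, global_state, actions)
```

Note that `act_decentralized` never touches `global_state`: only the training-side function receives it, which is the asymmetry the paradigm describes.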

One of the first major ideas is to enforce a constraint that makes the relationship between the global action-value function Q_tot and the action-value function Q_a of each agent monotonic: Q_tot never decreases when any individual Q_a increases. This constraint allows each agent to take part in decentralized execution simply by choosing greedy actions with respect to its own action-value function.
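The effect of the monotonicity constraint can be checked on a toy example. The sketch below assumes a simplified mixer in which the mixing weights are produced from the global state by linear hypernetworks and made non-negative with an absolute value. The sizes, the linear hypernetworks, and the names `mix` and `q_tot` are assumptions for illustration, not the authors' exact architecture. Here Q_tot denotes the global action value and Q_a the value of agent a.

```python
from itertools import product

import numpy as np

rng = np.random.default_rng(0)
N_AGENTS, N_ACTIONS, STATE_DIM, EMBED_DIM = 3, 4, 6, 8  # assumed toy sizes

# Hypothetical linear hypernetworks: they map the global state to the
# weights and biases of the mixing network.
hyper_w1 = rng.normal(size=(STATE_DIM, N_AGENTS * EMBED_DIM))
hyper_b1 = rng.normal(size=(STATE_DIM, EMBED_DIM))
hyper_w2 = rng.normal(size=(STATE_DIM, EMBED_DIM))
hyper_b2 = rng.normal(size=STATE_DIM)

def elu(x):
    """Monotonically increasing activation."""
    return np.where(x > 0, x, np.exp(np.minimum(x, 0)) - 1)

def mix(agent_qs, state):
    """Combine per-agent Q-values into Q_tot using non-negative weights."""
    w1 = np.abs(state @ hyper_w1).reshape(N_AGENTS, EMBED_DIM)  # weights >= 0
    b1 = state @ hyper_b1                                       # biases are unconstrained
    hidden = elu(agent_qs @ w1 + b1)
    w2 = np.abs(state @ hyper_w2)                               # weights >= 0
    b2 = state @ hyper_b2
    return float(hidden @ w2 + b2)

state = rng.normal(size=STATE_DIM)
per_agent_q = rng.normal(size=(N_AGENTS, N_ACTIONS))  # Q_a for every action

def q_tot(joint_action):
    """Q_tot of a joint action, given each agent's chosen Q-value."""
    chosen = per_agent_q[np.arange(N_AGENTS), list(joint_action)]
    return mix(chosen, state)

# Decentralized choice: each agent maximizes its own Q_a.
greedy_joint = tuple(int(a) for a in per_agent_q.argmax(axis=1))

# Centralized choice: brute-force search over all joint actions.
best_joint = max(product(range(N_ACTIONS), repeat=N_AGENTS), key=q_tot)

print(greedy_joint, best_joint, q_tot(greedy_joint), q_tot(best_joint))
```

Because every weight applied to the agents' values is non-negative and the activation is increasing, the joint action assembled from the individual greedy choices attains the same Q_tot as the exhaustive search, without any agent needing the global state at execution time.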



