In this article, we propose a new approach to multi-agent reinforcement learning called Higher-Order Models (HOM). The goal is to simplify the policy search by restricting the policy space to policies built from two abstractions that have consistently performed well in multi-agent tasks: roles and role interactions.
To understand how HOM works, imagine a recipe book with different roles (e.g., baker, chef). Each role has a set of instructions (algorithms) for preparing a dish. The instructions are immutable: once a recipe is written, they cannot be changed. However, the ingredients and cooking time can be adjusted based on the current state of the kitchen, so the same recipe yields different variations of a dish depending on the situation.
HOM follows the same pattern. A set of instructions (algorithms) defines how each agent should act based on its role and the current state of the environment. These instructions are immutable: they cannot be changed once the policy is defined. The parameters of the algorithms, however, can be adjusted based on feedback from the environment, allowing the policy to adapt over time.
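To make this concrete, here is a minimal sketch of the idea in Python. The RolePolicy class, the softmax decision rule, and the REINFORCE-style update are our own illustrative assumptions, not HOM's actual implementation: the rule mapping preferences to actions stays fixed, while the preference vector theta is adapted from reward feedback.

```python
import numpy as np

class RolePolicy:
    """Fixed decision rule (softmax over action preferences) whose
    mutable parameters ``theta`` are adapted from reward feedback."""

    def __init__(self, role, n_actions, lr=0.1, seed=0):
        self.role = role                       # immutable role label
        self.n_actions = n_actions
        self.lr = lr
        self.rng = np.random.default_rng(seed)
        self.theta = np.zeros(n_actions)       # mutable parameters

    def action_probs(self):
        z = self.theta - self.theta.max()      # shift for numerical stability
        e = np.exp(z)
        return e / e.sum()

    def act(self):
        # The instruction itself (sample from a softmax) never changes.
        return int(self.rng.choice(self.n_actions, p=self.action_probs()))

    def update(self, action, reward):
        # REINFORCE-style gradient step: shift theta toward rewarded
        # actions; only the parameters move, not the decision rule.
        probs = self.action_probs()
        grad = -probs
        grad[action] += 1.0
        self.theta += self.lr * reward * grad

# Toy usage: one "baker" agent learns a preference from scalar rewards.
baker = RolePolicy("baker", n_actions=3)
for _ in range(200):
    a = baker.act()
    baker.update(a, reward=1.0 if a == 2 else 0.0)
print(np.round(baker.action_probs(), 3))       # mass concentrates on action 2
```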
The key insight behind HOM is that restricting the policy space to these abstractions simplifies the search and improves the efficiency of multi-agent reinforcement learning. Pairing immutable instructions with mutable parameters also balances exploration and exploitation: agents can adapt to changing environments while still following their assigned roles.
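As an illustration of that balance, the sketch below keeps the softmax rule fixed and varies only a temperature parameter tau (our notation, not necessarily HOM's): high temperatures spread probability mass for exploration, low temperatures concentrate it for exploitation.

```python
import numpy as np

def softmax_policy(preferences, tau):
    # Fixed instruction: softmax over preferences at temperature tau.
    z = np.asarray(preferences, dtype=float) / tau
    z -= z.max()                  # numerical stability
    e = np.exp(z)
    return e / e.sum()

prefs = [1.0, 2.0, 0.5]           # hypothetical learned action preferences
for tau in (5.0, 1.0, 0.1):       # annealing tau shifts toward exploitation
    print(f"tau={tau}: {np.round(softmax_policy(prefs, tau), 3)}")
```

Annealing tau over training is one simple way such a mutable parameter could mediate the shift from exploration to exploitation without touching the underlying rule.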
In summary, HOM is a new approach to multi-agent reinforcement learning that simplifies the policy space by restricting it to policies built from the role and role-interaction abstractions. Combining immutable instructions with mutable parameters improves both the efficiency and the adaptability of multi-agent systems.