Improving Multi-Agent Reinforcement Learning for Beer Game by Reward Design Based on Payment Mechanism

Keywords: Multi-Agent Reinforcement Learning, Beer Game, Reward Shaping, Supply Chain Management, VCG Mechanism


Supply chain management aims to maximize the profits of supply chain partners by managing the flow of information and products. Multiagent reinforcement learning, a topic in artificial intelligence research, has been applied to supply chain management. The beer game is an example problem in supply chain management and has also been studied as a cooperation problem in multiagent systems. A previous study applied SRDQN, a solution method based on deep reinforcement learning and reward shaping, to the beer game. Introducing a single reinforcement learning agent with SRDQN as a participant in the beer game reduced the cost of beer inventory. However, that study did not address multiagent reinforcement learning because of the difficulty of achieving cooperation among agents. To address the multiagent case, we apply RDPM, a reward shaping technique based on mechanism design, to SRDQN and improve the cooperative policies learned in multiagent reinforcement learning. Furthermore, we propose two reward design methods that modify the design of the state value function in RDPM to handle various patterns of consumer demand for beer in the supply chain. We empirically evaluate the effectiveness of the proposed approaches.



Technical Papers