Bi-Level Proximal Policy Optimization for Stochastic Coordination of EV Charging Load with Uncertain Wind Power

Details

15:50 - 16:10 | Mon 19 Aug | Lau, 6-211 | MoC4.2

Session: Reinforcement Learning

Abstract

Energy and environmental issues are among the biggest technological challenges facing human society in the new century, and electric vehicles (EVs) and wind power are promising solutions on the demand side and the supply side, respectively. Scheduling the EV charging load to match the uncertain wind power supply is therefore important, both for economic reasons and for sustainable development. However, many classical methods are difficult to apply to this problem because of its huge state space, high uncertainty, and complex dynamic evolution. Proximal policy optimization (PPO) offers a new way to solve large-scale stochastic optimization problems. In this paper, we make the following major contributions. First, exploiting the problem's structure across different spatial scales, we formulate the system as a bi-level Markov decision process (MDP). EV aggregators (EVAs) are used to group EVs by parking region. On the upper level, based on brief state information, the independent system operator (ISO) decides only how much power to buy from the state grid and how much power to allocate to each EVA. On the lower level, each EVA decides every EV's charging process based on the dispatched power, after which the system state evolves. Second, we develop a bi-level proximal policy optimization (BPPO) algorithm to solve this bi-level MDP, in which the upper-level and lower-level networks are interrelated. Third, numerical results show that our algorithm outperforms three other algorithms in reducing the total charging cost.
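
The abstract does not give implementation details, so the following is a minimal PyTorch sketch of how the coupled upper-level (ISO) and lower-level (EVA) policy networks and the standard PPO clipped objective might be wired together. All class names, state dimensions, and interface choices (e.g., feeding the dispatched power to the lower policy as an extra input) are illustrative assumptions, not the authors' code.

    # A minimal sketch of the bi-level policy structure, NOT the authors'
    # implementation; names and dimensions are illustrative assumptions.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class UpperPolicy(nn.Module):
        """ISO policy: from brief system state, decide total power to buy
        from the grid and how to allocate it across EVAs."""
        def __init__(self, state_dim: int, n_evas: int, hidden: int = 64):
            super().__init__()
            self.body = nn.Sequential(
                nn.Linear(state_dim, hidden), nn.Tanh(),
                nn.Linear(hidden, hidden), nn.Tanh(),
            )
            self.purchase_head = nn.Linear(hidden, 1)    # total power bought
            self.alloc_head = nn.Linear(hidden, n_evas)  # allocation logits

        def forward(self, state: torch.Tensor):
            h = self.body(state)
            purchase = F.softplus(self.purchase_head(h))       # non-negative
            alloc = torch.softmax(self.alloc_head(h), dim=-1)  # sums to 1
            return purchase, alloc

    class LowerPolicy(nn.Module):
        """EVA policy: from local EV states and the power dispatched by
        the ISO, decide a charging fraction in [0, 1] for each EV."""
        def __init__(self, local_dim: int, n_evs: int, hidden: int = 64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(local_dim + 1, hidden), nn.Tanh(),  # +1: dispatched power
                nn.Linear(hidden, n_evs),
            )

        def forward(self, local_state: torch.Tensor, dispatched: torch.Tensor):
            x = torch.cat([local_state, dispatched], dim=-1)
            return torch.sigmoid(self.net(x))  # per-EV charging fractions

    def ppo_clip_loss(new_logp, old_logp, advantage, eps: float = 0.2):
        """Standard PPO clipped surrogate loss, applicable at each level."""
        ratio = torch.exp(new_logp - old_logp)
        clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps)
        return -torch.min(ratio * advantage, clipped * advantage).mean()

    # Toy forward pass: one ISO decision feeding two EVA policies.
    upper = UpperPolicy(state_dim=8, n_evas=2)
    lowers = [LowerPolicy(local_dim=16, n_evs=10) for _ in range(2)]
    state = torch.randn(1, 8)
    purchase, alloc = upper(state)
    for i, lower in enumerate(lowers):
        dispatched = purchase * alloc[:, i : i + 1]    # power share for EVA i
        rates = lower(torch.randn(1, 16), dispatched)  # per-EV charging rates

In this sketch, the levels are interrelated because the lower policy's input contains the upper policy's output; how the paper actually couples the two networks' training (e.g., shared rewards or interleaved updates) is not specified in the abstract.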