Peng Zhang1, Jie Zhang2, Bingzhang Hu1, Yang Long3
16:30 - 16:50 | Mon 19 Aug | Lau, 6-211 | MoC4.4
Batch processes are an important manufacturing route for the agile production of high-value-added products, yet they are typically difficult to control due to highly nonlinear characteristics, unknown disturbances, and model-plant mismatches. Neural networks and traditional reinforcement learning have been applied to control and optimize batch processes, but they often lack robustness and accuracy, leading to unsatisfactory performance. To overcome these problems, this paper proposes a stochastic multi-step action Q-learning algorithm (SMSA) based on multiple-step action Q-learning (MSA). In MSA, the control horizon is divided into time steps of equal length, so a non-optimal action may be held continuously and compulsorily over a long interval, which can slow learning. SMSA modifies this by allowing the time steps to differ in length and by using a modified greedy algorithm, improving the speed, efficiency, and flexibility of the algorithm. The proposed method is applied to a simulated fed-batch process and gives better optimization control performance than other control strategies.
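The multi-step idea in the abstract can be sketched in code. The following is a minimal, hedged illustration only: the toy environment, state/action sizes, candidate hold durations, epsilon-greedy selection, and all hyperparameters are assumptions for demonstration, not the paper's fed-batch model or its exact SMSA rule. It shows the core mechanic of holding one action for a randomly chosen duration and discounting the bootstrap term by gamma raised to that duration.

```python
import random

random.seed(0)

N_STATES, N_ACTIONS = 10, 4
STEP_LENGTHS = [1, 2, 4]            # unequal hold durations (illustrative)
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.2

# Tabular Q-function: Q[state][action]
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

def toy_step(state, action):
    """Placeholder one-step dynamics: a bounded random walk with a
    toy action-dependent reward. Stands in for the batch-process model."""
    next_state = min(N_STATES - 1, max(0, state + random.choice([-1, 0, 1])))
    reward = 1.0 if action == next_state % N_ACTIONS else 0.0
    return next_state, reward

def choose_action(state):
    """Epsilon-greedy selection, one plausible 'modified greedy' rule
    (assumption; the paper's exact rule may differ)."""
    if random.random() < EPSILON:
        return random.randrange(N_ACTIONS)
    return max(range(N_ACTIONS), key=lambda a: Q[state][a])

def episode(length=50):
    state, t = random.randrange(N_STATES), 0
    while t < length:
        action = choose_action(state)
        hold = random.choice(STEP_LENGTHS)   # stochastic multi-step duration
        total_r, discount, s = 0.0, 1.0, state
        for _ in range(hold):                # apply the same action `hold` times
            s, r = toy_step(s, action)
            total_r += discount * r
            discount *= GAMMA                # after the loop, discount == GAMMA**hold
        # Multi-step Q-update: bootstrap term discounted by gamma**hold
        target = total_r + discount * max(Q[s])
        Q[state][action] += ALPHA * (target - Q[state][action])
        state, t = s, t + hold

for _ in range(200):
    episode()
```

Holding an action over a variable-length interval reduces the number of decisions per episode, while the gamma-to-the-hold discounting keeps the update consistent with the one-step Bellman target.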