[1]吴继浩. 面向航天产品的多目标动态生产调度方法研究及应用[D]. 绵阳:西南科技大学, 2019.
WU Jihao. Research and Application of Multi-objective Dynamic Production Scheduling Method for Aerospace Products[D]. Mianyang:Southwest University of Science and Technology, 2019.
[2]BANSAL N. Algorithms for Flow Time Scheduling[D]. Pennsylvania:Carnegie Mellon University, 2003.
[3]LEONARDI S, RAZ D. Approximating Total Flow Time on Parallel Machines[J]. Journal of Computer and System Sciences, 2007, 73(6):875-891.
[4]SITTERS R. Efficient Algorithms for Average Completion Time Scheduling[C]∥Integer Programming and Combinatorial Optimization. Lausanne, 2010:411-423.
[5]HALL L A, SHMOYS D B, WEIN J. Scheduling to Minimize Average Completion Time:Off-line and On-line Approximation Algorithms[J]. Mathematics of Operations Research, 1997, 22(3):513-544.
[6]MAO H, ALIZADEH M, MENACHE I, et al. Resource Management with Deep Reinforcement Learning[C]∥Proceedings of the 15th ACM Workshop on Hot Topics in Networks. Atlanta, 2016:50-56.
[7]柳丹丹, 龚祝平, 邱磊. 改进遗传算法求解同类并行机优化调度问题[J]. 机械设计与制造, 2020(4):262-265.
LIU Dandan, GONG Zhuping, QIU Lei. Improved Genetic Algorithm for the Optimal Scheduling Problem of Uniform Parallel Machine[J]. Machinery Design & Manufacture, 2020(4):262-265.
[8]许显杨, 陈璐. 考虑设备可靠性与能耗的平行机调度[J]. 上海交通大学学报, 2020, 54(3):247-255.
XU Xianyang, CHEN Lu. Parallel Machine Scheduling Problem Considering Machine Reliability and Energy Consumption[J]. Journal of Shanghai Jiao Tong University, 2020, 54(3):247-255.
[9]GUPTA D, MARAVELIAS C T, WASSICK J M. From Rescheduling to Online Scheduling[J]. Chemical Engineering Research and Design, 2016, 116:83-97.
[10]ZHANG R, CHANG P, SONG S, et al. A Multi-objective Artificial Bee Colony Algorithm for Parallel Batch-processing Machine Scheduling in Fabric Dyeing Processes[J]. Knowledge-based Systems, 2017, 116:114-129.
[11]PINEDO M L. Scheduling:Theory, Algorithms, and Systems[M]. New York:Springer, 2018.
[12]TAO J, LIU T. WSPT’s Competitive Performance for Minimizing the Total Weighted Flow Time:from Single to Parallel Machines[J]. Mathematical Problems in Engineering, 2013, 2013:343287.
[13]ANDERSON E J, POTTS C N. Online Scheduling of a Single Machine to Minimize Total Weighted Completion Time[J]. Mathematics of Operations Research, 2004, 29(3):686-697.
[14]TAO J. A Better Online Algorithm for the Parallel Machine Scheduling to Minimize the Total Weighted Completion Time[J]. Computers & Operations Research, 2014, 43(1):215-224.
[15]ABBEEL P, COATES A, QUIGLEY M, et al. An Application of Reinforcement Learning to Aerobatic Helicopter Flight[M]∥SCHÖLKOPF B, PLATT J, HOFMANN T. Advances in Neural Information Processing Systems 19:Proceedings of the 2006 Conference. Cambridge:MIT Press, 2007:1-8.
[16]吴晓光, 刘绍维, 杨磊, 等. 基于深度强化学习的双足机器人斜坡步态控制方法[J]. 自动化学报, 2020, 46:1-12.
WU Xiaoguang, LIU Shaowei, YANG Lei, et al. A Gait Control Method for Biped Robot on Slope Based on Deep Reinforcement Learning[J]. Acta Automatica Sinica, 2020, 46:1-12.
[17]王云鹏, 郭戈. 基于深度强化学习的有轨电车信号优先控制[J]. 自动化学报, 2019, 45(12):2366-2377.
WANG Yunpeng, GUO Ge. Signal Priority Control for Trams Using Deep Reinforcement Learning[J]. Acta Automatica Sinica, 2019, 45(12):2366-2377.
[18]袁兆麟, 何润姿, 姚超, 等. 基于强化学习的浓密机底流浓度在线控制算法[J]. 自动化学报, 2021, 47(7):1558-1571.
YUAN Zhaolin, HE Runzi, YAO Chao, et al. Online Reinforcement Learning Control Algorithm for Concentration of Thickener Underflow[J]. Acta Automatica Sinica, 2021, 47(7):1558-1571.
[19]CUNHA B, MADUREIRA A M, FONSECA B, et al. Deep Reinforcement Learning as a Job Shop Scheduling Solver:a Literature Review[C]∥International Conference on Hybrid Intelligent Systems. Porto, 2018:350-359.
[20]SUTTON R S, BARTO A G. Reinforcement Learning:an Introduction[M]. Cambridge:MIT Press, 2018.
[21]LIU C L, CHANG C C, TSENG C J. Actor-critic Deep Reinforcement Learning for Solving Job Shop Scheduling Problems[J]. IEEE Access, 2020, 8:71752-71762.
[22]GABEL T, RIEDMILLER M. Distributed Policy Search Reinforcement Learning for Job-shop Scheduling Tasks[J]. International Journal of Production Research, 2012, 50(1):41-61.
[23]王世进, 孙晟, 周炳海, 等. 基于Q-学习的动态单机调度[J]. 上海交通大学学报, 2007(8):1227-1243.
WANG Shijin, SUN Sheng, ZHOU Binghai, et al. Q-Learning Based Dynamic Single Machine Scheduling[J]. Journal of Shanghai Jiao Tong University, 2007(8):1227-1243.
[24]WANG J, HE J, ZHANG J. A Reinforcement Learning Method to Optimize the Priority of Product for Scheduling the Large-scale Complex Manufacturing Systems[C]∥48th International Conference on Computers & Industrial Engineering (CIE48). Auckland, 2018:2-5.
[25]ZHANG Z, ZHENG L, LI N, et al. Minimizing Mean Weighted Tardiness in Unrelated Parallel Machine Scheduling with Reinforcement Learning[J]. Computers & Operations Research, 2012, 39(7):1315-1324.
[26]GUAN Y, REN Y, LI S E, et al. Centralized Cooperation for Connected and Automated Vehicles at Intersections by Proximal Policy Optimization[J]. IEEE Transactions on Vehicular Technology, 2020, 69(11):12597-12608.
[27]WEI H, LIU X, MASHAYEKHY L, et al. Mixed-autonomy Traffic Control with Proximal Policy Optimization[C]∥IEEE Vehicular Networking Conference (VNC). Los Angeles, 2019:19529967.
[28]GANGAPURWALA S, MITCHELL A, HAVOUTIS I. Guided Constrained Policy Optimization for Dynamic Quadrupedal Robot Locomotion[J]. IEEE Robotics and Automation Letters, 2020, 5(2):3642-3649.
[29]CHEN Y, MA L. Rocket Powered Landing Guidance Using Proximal Policy Optimization[C]∥4th International Conference on Automation, Control and Robotics Engineering. Shenzhen, 2019:1-6.
[30]ZHU J, WANG H, ZHANG T. A Deep Reinforcement Learning Approach to the Flexible Flowshop Scheduling Problem with Makespan Minimization[C]∥2020 IEEE 9th Data Driven Control and Learning Systems Conference. Liuzhou, 2020:20256682.
[31]RUMMUKAINEN H, NURMINEN J K. Practical Reinforcement Learning - Experiences in Lot Scheduling Application[J]. IFAC-PapersOnLine, 2019, 52(13):1415-1420.
[32]SUTTON R S, MCALLESTER D A, SINGH S P, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation[C]∥Proceedings of the 12th International Conference on Neural Information Processing Systems. Denver, 1999:1057-1063.
[33]SCHULMAN J, LEVINE S, ABBEEL P, et al. Trust Region Policy Optimization[C]∥32nd International Conference on Machine Learning. Lille, 2015:1889-1897.
[34]MNIH V, BADIA A P, MIRZA M, et al. Asynchronous Methods for Deep Reinforcement Learning[C]∥International Conference on Machine Learning. New York, 2016:1928-1937.
[35]KINGMA D P, BA J. Adam:a Method for Stochastic Optimization[C]∥3rd International Conference on Learning Representations. San Diego, 2015. arXiv:1412.6980.