China Mechanical Engineering ›› 2023, Vol. 34 ›› Issue (09): 1100-1110. DOI: 10.3969/j.issn.1004-132X.2023.09.011


Research on Underwater Gliders Path Tracking Based on Reinforcement Learning Algorithm

SHI Qingqing1; ZHANG Runfeng1,3,4; ZHANG Lianhong1,2; LAN Shiquan1,2

  1. Key Laboratory of Mechanism Theory and Equipment Design of Ministry of Education, School of Mechanical Engineering, Tianjin University, Tianjin, 300350
  2. The Joint Laboratory of Ocean Observing and Detection, Pilot National Laboratory for Marine Science and Technology (Qingdao), Qingdao, Shandong, 266237
  3. Tianjin Key Laboratory for Advanced Mechatronic System Design and Intelligent Control, School of Mechanical Engineering, Tianjin University of Technology, Tianjin, 300384
  4. National Demonstration Center for Experimental Mechanical and Electrical Engineering Education (Tianjin University of Technology), Tianjin, 300384
  • Online: 2023-05-10  Published: 2023-05-31

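  • Corresponding author: LAN Shiquan, male, born in 1988, senior engineer. His research focuses on the industrialization and application of unmanned underwater vehicles. E-mail: yxlx2010@163.com.
  • About the author: SHI Qingqing, female, born in 1997, master's student. Her research focuses on path planning and intelligent decision-making for unmanned equipment in complex environments.
  • Funding:
    Tianjin Major Science and Technology Project on New-Generation Artificial Intelligence (19ZXZNGX00050)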

Abstract: To address the large deviations between the actual and planned paths of underwater gliders under the influence of ocean currents, an attention mechanism was introduced into the conventional long short-term memory (LSTM) network to build a neural network model for ocean current prediction. A deep neural network was used to generate a dynamic Q-table of underwater glider motions, and a reinforcement learning algorithm selected the optimal motion attitude. Taking the influence of ocean currents into account, a path tracking algorithm for underwater gliders was constructed based on deep reinforcement learning. The results show that the attention-based LSTM network achieves lower mean square error and root mean square error in ocean current prediction than both the conventional autoregressive integrated moving average (ARIMA) model and the plain LSTM network, indicating good predictive ability. Compared with conventional PID control, the deep reinforcement learning model can reduce the root mean square error of the underwater glider trajectory by 50.9%, significantly improving path tracking accuracy.

Key words: underwater glider, path tracking, attention mechanism, reinforcement learning
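
The abstract compares methods by mean square error (MSE) and root mean square error (RMSE), and reports a relative RMSE reduction. As a minimal sketch of how such figures are typically computed, the Python below defines the two metrics and the percentage reduction. The function names and the sample tracks are hypothetical and chosen only for illustration; they are not data or code from the paper.

```python
import math


def mse(predicted, actual):
    """Mean square error between two equal-length sequences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(predicted)


def rmse(predicted, actual):
    """Root mean square error: the square root of the MSE."""
    return math.sqrt(mse(predicted, actual))


def relative_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline * 100.0


# Hypothetical cross-track positions in metres (illustrative only, not from the paper).
planned = [0.0, 0.0, 0.0, 0.0]
baseline_track = [2.0, -2.0, 2.0, -2.0]   # stand-in for a baseline controller
improved_track = [1.0, -1.0, 1.0, -1.0]   # stand-in for an improved controller

baseline_rmse = rmse(baseline_track, planned)
improved_rmse = rmse(improved_track, planned)
print(f"baseline RMSE: {baseline_rmse:.2f} m")
print(f"improved RMSE: {improved_rmse:.2f} m")
print(f"reduction: {relative_reduction(baseline_rmse, improved_rmse):.1f}%")
```

Because RMSE is in the same units as the tracked quantity, a relative reduction such as the reported 50.9% can be read directly as a proportional shrinkage of the typical tracking deviation.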

