Machine Learning Lab PhD Student Forum Series (Session 26): Understanding Unsupervised Reinforcement Learning
Speaker: Hao Jin (PKU)
Time: 2022-04-20 15:10-16:10
Venue: Room 212, Jingyuan Courtyard No. 6 & Tencent Meeting 723 1564 5542
Abstract: Unsupervised reinforcement learning (URL), as its name indicates, refers to policy learning in an MDP without an explicit reward signal. Instead of the usual reward signal supplied by the MDP, URL uses an intrinsic reward generated during training. There are multiple ways to design such an intrinsic reward, and the choice depends on the specific scenario in which URL is applied. Generally speaking, the performance of URL reflects how well the agent(s) understand the MDP dynamics, which is also referred to as the transition problem.
In this talk, we will start with several applications that use unsupervised reward as an auxiliary task, then move on to a number of different URL methods, and finally introduce some theoretical works that analyze URL methods from different perspectives.