[Please note that the room has changed (now: room G1-201, just in front of the usual
Policy Optimization via Importance Sampling
Francesco Faccio (IDSIA)
Policy optimization is an effective Reinforcement Learning approach to solve continuous
control tasks. Recent achievements have shown that alternating online and offline
optimization is a successful choice for efficient trajectory reuse. However, deciding when
to stop optimizing and collect new trajectories is non-trivial, as it requires accounting
for the variance of the objective function estimate. In this talk, we propose a novel,
model-free policy search algorithm, POIS, applicable in both action-based and
parameter-based settings. We first derive a high-confidence bound for importance sampling
estimation; then we define a surrogate objective function, which is optimized offline
whenever a new batch of trajectories is collected. Finally, the algorithm is tested on a
selection of continuous control tasks, with both linear and deep policies, and compared
with state-of-the-art policy optimization methods.
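
To give a feel for the idea of reusing a batch through importance sampling while accounting for the variance of the estimate, here is a minimal sketch. It is not the POIS algorithm and it does not use the bound derived in the talk: the one-dimensional Gaussian policies, the toy return function, the effective-sample-size penalty, and all function names are assumptions made purely for illustration.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian, used here as a toy policy."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def importance_weights(samples, target_mean, behavior_mean, std):
    """Ratio target density / behavioral density for each collected sample."""
    return [
        gaussian_pdf(x, target_mean, std) / gaussian_pdf(x, behavior_mean, std)
        for x in samples
    ]


def is_estimate(weights, returns):
    """Plain importance sampling estimate of the target policy's expected return."""
    return sum(w * r for w, r in zip(weights, returns)) / len(weights)


def effective_sample_size(weights):
    """(sum w)^2 / sum w^2: equals N when all weights are equal, smaller otherwise."""
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    return total * total / total_sq


def penalized_surrogate(samples, returns, target_mean, behavior_mean, std, penalty):
    """IS estimate minus a penalty that grows as the weights degenerate.

    The penalty form (penalty / sqrt(ESS)) is an illustrative assumption,
    not the high-confidence bound from the talk.
    """
    w = importance_weights(samples, target_mean, behavior_mean, std)
    return is_estimate(w, returns) - penalty / math.sqrt(effective_sample_size(w))


def toy_return(x):
    """Hypothetical return, maximized at x = 1."""
    return -(x - 1.0) ** 2


# Collect one batch with the behavioral policy, then score candidates offline.
rng = random.Random(0)
behavior_mean, std = 0.0, 1.0
samples = [rng.gauss(behavior_mean, std) for _ in range(1000)]
returns = [toy_return(x) for x in samples]

for candidate in (0.0, 0.5, 1.0, 3.0):
    score = penalized_surrogate(samples, returns, candidate, behavior_mean, std, penalty=1.0)
    print(candidate, score)
```

In this sketch, candidates far from the behavioral policy produce a few very large weights, so the effective sample size shrinks and the penalty grows. This is one simple way to signal that offline optimization has moved too far from the collected data and that new trajectories are needed.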
Joint work with Alberto Maria Metelli, Matteo Papini and Marcello Restelli from
Politecnico di Milano. To appear at the 32nd Conference on Neural Information Processing
Systems (NIPS 2018). Selected for an oral presentation.
*When: Tuesday, 27th of November 2018, 12:00-13:00
*Location: Manno, Galleria 1, 2nd floor, room G1-20
*Registration: Pizza (or alternative food) and drinks will be offered at the end of the
talk. If you plan to attend, please register in a timely fashion at the following link so
that we will have no shortage of food:
Francesco Faccio is a Master's student in Mathematical Engineering at Politecnico di Milano.
He is currently working as an intern at IDSIA, where he completed his Master's thesis.
His main research interests include Reinforcement Learning, Recurrent Neural Networks and
Bayesian Statistics.