Preprints

A Benchmark for Low-Switching-Cost Reinforcement Learning
D3PG: Deep Differentiable Deterministic Policy Gradients
Pretrain Soft Q-Learning with Imperfect Demonstrations