MARKO ZARIĆ

Research

Here is a list of all my publications. Follow the links at the bottom for more information (pdf, code, etc.).

Publications

Pink Noise LQR: How does Colored Noise affect the Optimal Policy in RL?

Jakob Hollenstein, Marko Zaric, Samuele Tosatto, Justus Piater

ICML 2024 Workshop: Foundations of Reinforcement Learning and Control -- Connections and Perspectives, 2024

Abstract:

Colored noise, a class of temporally correlated noise processes, has shown promising results for improving exploration in deep reinforcement learning for both off-policy and on-policy algorithms. However, it is unclear how temporally correlated colored noise affects policy learning apart from changing exploration properties. In this paper, we investigate the influence of colored noise on the optimal policy in a simplified linear quadratic regulator (LQR) setting. We show that the expected trajectory remains independent of the noise color for a given linear policy. We derive a closed-form solution for the expected cost and find that the noise affects both the expected cost and the optimal policy. The cost splits into two parts: a state-cost term equaling the cost for the unperturbed system and a noise-cost term independent of the initial state. Far from the goal state, the state cost dominates and the effect of the noise is negligible: the policy approaches the optimal policy of the unperturbed system. Near the goal state, the noise cost dominates, changing the optimal policy.
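The paper's results are analytical, but the setting is easy to probe numerically. The sketch below (not the paper's code; all function names, dynamics parameters, and the frequency-domain sampling method are my own illustrative choices) draws colored noise with power spectrum roughly proportional to 1/f^beta and runs Monte-Carlo rollouts of a scalar closed-loop system x_{t+1} = a*x_t + b*(k*x_t + eps_t). Averaging over rollouts, the mean trajectory matches the unperturbed closed loop regardless of noise color, consistent with the expected-trajectory claim above.

```python
import numpy as np

def colored_noise(beta, n, rng):
    """Sample a length-n noise sequence with power spectrum ~ 1/f^beta
    (beta=0: white, beta=1: pink) by shaping white noise in the frequency domain."""
    f = np.fft.rfftfreq(n)
    f[0] = f[1]                       # avoid division by zero at the DC bin
    spec = np.fft.rfft(rng.standard_normal(n)) * f ** (-beta / 2)
    eps = np.fft.irfft(spec, n=n)
    return eps / eps.std()            # normalize per-sequence scale

def mean_trajectory(beta, a=0.9, b=1.0, k=-0.5, x0=5.0, horizon=30,
                    rollouts=10000, seed=0):
    """Monte-Carlo estimate of E[x_t] under x_{t+1} = a*x_t + b*(k*x_t + eps_t)
    with colored action noise eps_t. Parameters are illustrative, not from the paper."""
    rng = np.random.default_rng(seed)
    total = np.zeros(horizon)
    for _ in range(rollouts):
        eps = colored_noise(beta, horizon, rng)
        x = x0
        for t in range(horizon):
            total[t] += x
            x = a * x + b * (k * x + eps[t])
    return total / rollouts
```

Since each eps_t has zero mean whatever the color, the expectation of the rollout follows the deterministic closed loop x_{t+1} = (a + b*k)*x_t, so white and pink noise produce the same mean trajectory.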

Unsupervised Learning of Effective Actions in Robotics

Marko Zaric, Jakob Hollenstein, Justus Piater, Erwan Renaudo

Proceedings of the First Austrian Symposium on AI, Robotics, and Vision (AIRoV 2024), 2024

Abstract:

Learning actions that are relevant to decision-making and can be executed effectively is a key problem in autonomous robotics. Current state-of-the-art action representations in robotics lack proper effect-driven learning of the robot's actions. Although successful in solving manipulation tasks, deep learning methods also lack this ability, in addition to their high cost in terms of memory or training data. In this paper, we propose an unsupervised algorithm to discretize a continuous motion space and generate "action prototypes", each producing different effects in the environment. After an exploration phase, the algorithm automatically builds a representation of the effects and groups motions into action prototypes, where motions more likely to produce an effect are represented more than those that lead to negligible changes. We evaluate our method on a simulated stair-climbing reinforcement learning task, and the preliminary results show that our effect-driven discretization outperforms uniformly and randomly sampled discretizations in convergence speed and maximum reward.
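The core idea, grouping explored motions by the effects they produce rather than by the motions themselves, can be sketched with a simple k-means clustering in effect space. This is a hypothetical simplification for illustration only, not the paper's algorithm: function names, the choice of plain k-means, and the nearest-to-centroid prototype rule are all my assumptions.

```python
import numpy as np

def effect_driven_prototypes(motions, effects, n_prototypes=4, iters=50, seed=0):
    """Illustrative sketch: cluster observed effects with k-means, then return,
    for each effect cluster, the motion whose effect lies closest to the centroid.
    motions: (N, d_motion) explored motion parameters
    effects: (N, d_effect) measured environment changes for each motion"""
    rng = np.random.default_rng(seed)
    # initialize centroids from randomly chosen observed effects
    centroids = effects[rng.choice(len(effects), n_prototypes, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(effects[:, None] - centroids[None], axis=2)
        labels = np.argmin(dists, axis=1)
        for j in range(n_prototypes):
            if np.any(labels == j):
                centroids[j] = effects[labels == j].mean(axis=0)
    # one prototype motion per non-empty effect cluster
    protos = []
    for j in range(n_prototypes):
        idx = np.where(labels == j)[0]
        if len(idx):
            best = idx[np.argmin(np.linalg.norm(effects[idx] - centroids[j], axis=1))]
            protos.append(motions[best])
    return np.array(protos)
```

A downstream RL agent could then use the returned prototypes as its discrete action set, so that regions of motion space producing distinct effects each contribute an action, while motions with negligible effect are under-represented.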