# Maximum Causal Tsallis Entropy Imitation Learning

@article{Lee2018MaximumCT, title={Maximum Causal Tsallis Entropy Imitation Learning}, author={Kyungjae Lee and Sungjoon Choi and Songhwai Oh}, journal={ArXiv}, year={2018}, volume={abs/1805.08336} }

In this paper, we propose a novel maximum causal Tsallis entropy (MCTE) framework for imitation learning which can efficiently learn a sparse multi-modal policy distribution from demonstrations. We provide the full mathematical analysis of the proposed framework. First, the optimal solution of an MCTE problem is shown to be a sparsemax distribution, whose supporting set can be adjusted. The proposed method has advantages over a softmax distribution in that it can exclude unnecessary actions by…
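The sparsemax distribution named in the abstract is the Euclidean projection of a score vector onto the probability simplex (Martins & Astudillo, 2016); unlike softmax, it can assign exactly zero probability to low-scoring actions, which is what lets the learned policy exclude unnecessary actions. A minimal NumPy sketch of this projection (an illustration, not the paper's implementation):

```python
import numpy as np

def sparsemax(z):
    """Project scores z onto the probability simplex.

    Unlike softmax, the result can be sparse: actions whose scores fall
    below a data-dependent threshold tau receive exactly zero probability.
    """
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]          # scores in descending order
    k = np.arange(1, len(z) + 1)
    cumsum = np.cumsum(z_sorted)
    support = 1 + k * z_sorted > cumsum  # actions kept in the support set
    k_z = k[support][-1]                 # size of the support
    tau = (cumsum[support][-1] - 1) / k_z
    return np.maximum(z - tau, 0.0)

# A dominant score takes all the mass; weaker actions are zeroed out.
print(sparsemax([2.0, 1.0, 0.1]))  # → [1. 0. 0.]
```

The adjustable support set mentioned in the abstract corresponds to the threshold `tau`: scaling the scores widens or narrows which actions survive the projection.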

#### 13 Citations

Entropic Regularization of Markov Decision Processes

- Computer Science, Mathematics
- Entropy
- 2019

A broader family of f-divergences, and more concretely α-divergences, is considered; these inherit the beneficial property of providing the policy improvement step in closed form while yielding a corresponding dual objective for policy evaluation.

Imitation Learning as f-Divergence Minimization

- Computer Science, Mathematics
- WAFR
- 2021

This work proposes a general imitation learning framework for estimating and minimizing any f-divergence, and shows that the approximate I-projection technique is able to imitate multi-modal behaviors more reliably than GAIL and behavior cloning.

Correlated Adversarial Imitation Learning

- Computer Science, Mathematics
- ArXiv
- 2020

A novel imitation learning algorithm is introduced by applying a game-theoretic notion of correlated equilibrium to the generative adversarial imitation learning, equipped with queues of discriminators and agents, in contrast with the classical approach.

Semi-Supervised Imitation Learning with Mixed Qualities of Demonstrations for Autonomous Driving

- Computer Science
- ArXiv
- 2021

The experimental results demonstrate the validity of the proposed algorithm using unlabeled trajectories with mixed qualities, and hardware experiments show that the proposed method can be applied to real-world applications.

Divergence-Augmented Policy Optimization

- Computer Science
- NeurIPS
- 2019

Empirical experiments show that in the data-scarce scenario where the reuse of off-policy data becomes necessary, the method can achieve better performance than other state-of-the-art deep reinforcement learning algorithms.

MixGAIL: Autonomous Driving Using Demonstrations with Mixed Qualities

- Computer Science
- 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
- 2020

A novel method is proposed, called mixed generative adversarial imitation learning (MixGAIL), which incorporates both expert demonstrations and negative demonstrations, such as vehicle collisions; the method converges faster than the other baseline methods.

A Survey of Inverse Reinforcement Learning: Challenges, Methods and Progress

- Computer Science, Mathematics
- Artif. Intell.
- 2021

The survey formally introduces the IRL problem along with its central challenges, which include accurate inference, generalizability, correctness of prior knowledge, and growth in solution complexity with problem size, and elaborates how current methods mitigate these challenges.

Generative Adversarial Imitation Learning with Deep P-Network for Robotic Cloth Manipulation

- Computer Science
- 2019 IEEE-RAS 19th International Conference on Humanoid Robots (Humanoids)
- 2019

Experimental results suggest both fast and stable imitation learning ability and sample efficiency of P-GAIL in robotic cloth manipulation.

Sparse Randomized Shortest Paths Routing with Tsallis Divergence Regularization

- Computer Science, Mathematics
- Data Min. Knowl. Discov.
- 2021

The sparse RSP is a promising model of movements on a graph, balancing sparse exploitation and exploration in an optimal way, and the derived dissimilarity measures based on expected routing costs provide state-of-the-art results.

Inverse Decision Modeling: Learning Interpretable Representations of Behavior

- Computer Science
- ICML
- 2021

This paper develops an expressive, unifying perspective on inverse decision modeling: a framework for learning parameterized representations of sequential decision behavior, which formalizes the forward problem (as a normative standard), subsuming common classes of control behavior.

#### References

Showing 1–10 of 31 references

Path Consistency Learning in Tsallis Entropy Regularized MDPs

- Computer Science, Mathematics
- ICML
- 2018

A class of novel path consistency learning (PCL) algorithms, called *sparse PCL*, is proposed for the sparse ERL problem; it can work with both on-policy and off-policy data, and is empirically compared with its soft counterpart, showing its advantage, especially in problems with a large number of actions.

Infinite Time Horizon Maximum Causal Entropy Inverse Reinforcement Learning

- Mathematics, Computer Science
- IEEE Transactions on Automatic Control
- 2018

The maximum causal entropy framework is extended to the infinite time horizon setting, and a gradient-based algorithm for the maximum discounted causal entropy formulation is developed that enjoys the desired feature of being model-agnostic, a property that is absent in many previous IRL algorithms.

Sparse Markov Decision Processes With Causal Sparse Tsallis Entropy Regularization for Reinforcement Learning

- Computer Science, Mathematics
- IEEE Robotics and Automation Letters
- 2018

A sparse Markov decision process (MDP) with novel causal sparse Tsallis entropy regularization is proposed, along with a sparse value iteration method that solves the sparse MDP; the convergence and optimality of sparse value iteration are proved using the Banach fixed-point theorem, and the approach outperforms existing methods in terms of convergence speed and performance.

Maximum Entropy Inverse Reinforcement Learning

- Computer Science
- AAAI
- 2008

A probabilistic approach based on the principle of maximum entropy that provides a well-defined, globally normalized distribution over decision sequences, while providing the same performance guarantees as existing methods, is developed.

Modeling purposeful adaptive behavior with the principle of maximum causal entropy

- Computer Science
- 2010

The principle of maximum causal entropy is introduced, a general technique for applying information theory to decision-theoretic, game-theoretic, and control settings where relevant information is sequentially revealed over time.

Robust Bayesian Inverse Reinforcement Learning with Sparse Behavior Noise

- Computer Science
- AAAI
- 2014

This paper develops a robust IRL framework that can accurately estimate the reward function in the presence of behavior noise, and introduces a novel latent variable characterizing the reliability of each expert action and uses Laplace distribution as its prior.

Reinforcement Learning with Deep Energy-Based Policies

- Computer Science
- ICML
- 2017

A method for learning expressive energy-based policies for continuous states and actions, which had previously been feasible only in tabular domains, is proposed, along with a new algorithm, called soft Q-learning, that expresses the optimal policy via a Boltzmann distribution.

Robust Imitation of Diverse Behaviors

- Computer Science, Mathematics
- NIPS
- 2017

A new version of GAIL is developed that is much more robust than a purely supervised controller, especially with few demonstrations, and avoids mode collapse, capturing many diverse behaviors when GAIL on its own does not.

Generative Adversarial Imitation Learning

- Computer Science, Mathematics
- NIPS
- 2016

A new general framework for directly extracting a policy from data, as if it were obtained by reinforcement learning following inverse reinforcement learning, is proposed, and a certain instantiation of this framework draws an analogy between imitation learning and generative adversarial networks.

Maximum Entropy Deep Inverse Reinforcement Learning

- Computer Science
- 2015

It is shown that the Maximum Entropy paradigm for IRL lends itself naturally to the efficient training of deep architectures, and the approach achieves performance commensurate with the state of the art on existing benchmarks while exceeding it on an alternative benchmark based on highly varying reward structures.