Q-Learning for robust satisfaction of signal temporal logic specifications
Derya Aksaray, Austin Jones, Zhaodan Kong, Mac Schwager, Calin Belta
2016 IEEE 55th Conference on Decision and Control (CDC), 2016
Abstract
In this paper, we address the problem of learning optimal policies for satisfying signal temporal logic (STL) specifications by agents with unknown stochastic dynamics. The system is modeled as a Markov decision process, in which the states represent partitions of a continuous space and the transition probabilities are unknown. We formulate two synthesis problems where the desired STL specification is enforced by maximizing 1) the probability of satisfaction, and 2) the expected robustness degree, i.e., a measure quantifying the quality of satisfaction. We discuss why Q-learning is not directly applicable to these problems: under the quantitative semantics of STL, neither the probability of satisfaction nor the expected robustness degree is in the standard objective form of Q-learning (i.e., the sum of instantaneous rewards). To resolve this issue, we propose an approximation of the STL synthesis problems that can be solved via Q-learning, and we derive performance bounds for the policies obtained by the approximate approach. Finally, we present simulation results to demonstrate the performance of the proposed method.
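The following is a minimal, self-contained Python sketch of the kind of workaround the abstract describes: tabular Q-learning on a state augmented with a short history window, where the per-step reward is a crude robustness score computed over that window. It is not the paper's exact construction; the 1-D grid world, the +/-1 robustness proxy for an "eventually reach region A" fragment, and all parameter values (TAU, alpha, gamma, eps) are illustrative assumptions. The point is only that once the reward depends on the augmented state alone, the objective becomes a sum of instantaneous rewards and standard Q-learning applies.

import random
from collections import defaultdict, deque

# Hypothetical 1-D grid world with unknown slip dynamics: the learner only
# samples transitions through step(); it never reads SLIP.
N_STATES = 10
ACTIONS = (-1, +1)
SLIP = 0.1
GOAL = {7, 8}   # region A in an STL fragment like "eventually, x in A"
TAU = 3         # history length over which robustness is scored

def step(s, a):
    if random.random() < SLIP:
        a = -a                              # action slips with probability SLIP
    return min(max(s + a, 0), N_STATES - 1)

def robustness(window):
    # Crude quantitative score for "eventually within the window, x in GOAL":
    # max over time of a +/-1 membership proxy (a stand-in for signed distance).
    return max(1.0 if s in GOAL else -1.0 for s in window)

Q = defaultdict(float)                      # Q[(history, action)] -> value
alpha, gamma, eps = 0.1, 0.95, 0.2

for episode in range(5000):
    s = random.randrange(N_STATES)
    hist = deque([s] * TAU, maxlen=TAU)     # augmented state: last TAU raw states
    for t in range(50):
        h = tuple(hist)
        if random.random() < eps:           # epsilon-greedy exploration
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda u: Q[(h, u)])
        s = step(s, a)
        hist.append(s)
        h2 = tuple(hist)
        r = robustness(hist)                # reward depends only on the augmented state,
        best_next = max(Q[(h2, u)] for u in ACTIONS)
        Q[(h, a)] += alpha * (r + gamma * best_next - Q[(h, a)])

History augmentation trades a larger state space for Markovian rewards; the paper's contribution includes bounds on the quality of the resulting approximate policies, which this sketch does not attempt to reproduce.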
BibTeX
@inproceedings{aksaray_q-learning_2016,
  address = {Las Vegas, NV, USA},
  title = {Q-{Learning} for robust satisfaction of signal temporal logic specifications},
  isbn = {978-1-5090-1837-6},
  url = {http://ieeexplore.ieee.org/document/7799279/},
  abstract = {In this paper, we address the problem of learning optimal policies for satisfying signal temporal logic (STL) specifications by agents with unknown stochastic dynamics. The system is modeled as a Markov decision process, in which the states represent partitions of a continuous space and the transition probabilities are unknown. We formulate two synthesis problems where the desired STL specification is enforced by maximizing 1) the probability of satisfaction, and 2) the expected robustness degree, i.e., a measure quantifying the quality of satisfaction. We discuss why Q-learning is not directly applicable to these problems: under the quantitative semantics of STL, neither the probability of satisfaction nor the expected robustness degree is in the standard objective form of Q-learning (i.e., the sum of instantaneous rewards). To resolve this issue, we propose an approximation of the STL synthesis problems that can be solved via Q-learning, and we derive performance bounds for the policies obtained by the approximate approach. Finally, we present simulation results to demonstrate the performance of the proposed method.},
  language = {en},
  urldate = {2020-09-15},
  booktitle = {2016 {IEEE} 55th {Conference} on {Decision} and {Control} ({CDC})},
  publisher = {IEEE},
  author = {Aksaray, Derya and Jones, Austin and Kong, Zhaodan and Schwager, Mac and Belta, Calin},
  month = dec,
  year = {2016},
  keywords = {reinforcement learning, planning, safety},
  pages = {6565--6570}
}