Online Learning of Rested and Restless Bandits
By: Mingyan Liu; C. Tekin
2012 / IEEE
This item was taken from the IEEE periodical 'Online Learning of Rested and Restless Bandits'.
In this paper, we study the online learning problem involving rested and restless bandits, in both a centralized and a decentralized setting. In the centralized setting, the system consists of a single player/user and a set of K finite-state discrete-time Markov chains (arms) with unknown state spaces (rewards) and statistics. The objective of the player is to decide, in each step of a sequence of trials, which M of the K arms to play so as to maximize its long-term reward. In the decentralized setting, multiple uncoordinated players each decide independently which arm to play in a step; if two or more players select the same arm simultaneously, a collision results and none of the players selecting that arm gets a reward. The objective of each player is again to maximize its long-term reward. We first show that logarithmic regret algorithms exist for both the centralized rested and the centralized restless bandit problems. For the decentralized setting, we propose an algorithm with logarithmic regret with respect to the optimal centralized arm allocation. Numerical results and extensive discussion are also provided to highlight insights obtained from this study.
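The collision rule of the decentralized setting can be illustrated with a minimal Python sketch. It is not the algorithm proposed in the paper; it only shows how per-step rewards are assigned when several players pick arms. The function name `collision_rewards` and the sample numbers are hypothetical, chosen for illustration.

```python
from collections import Counter


def collision_rewards(choices, arm_rewards):
    """Per-player rewards for one step under the collision rule.

    choices[i]     -- index of the arm picked by player i (hypothetical input)
    arm_rewards[a] -- reward arm a would pay this step (hypothetical input)
    A player is paid only if no other player picked the same arm.
    """
    counts = Counter(choices)  # how many players picked each arm
    return [arm_rewards[a] if counts[a] == 1 else 0.0 for a in choices]


# Three players, four arms: players 0 and 1 collide on arm 2,
# player 2 is alone on arm 0 and collects its reward.
rewards = collision_rewards([2, 2, 0], [0.5, 0.1, 0.9, 0.3])
print(rewards)  # [0.0, 0.0, 0.5]
```

In this example the best arm (arm 2) pays nothing because two players chose it, which is why the decentralized players are measured against the optimal centralized arm allocation.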
Discrete Time Systems
Optimal Centralized Arm Allocation
K Finite-state Discrete-time Markov Chains
Algorithm Design And Analysis
Eigenvalues And Eigenfunctions
Opportunistic Spectrum Access (OSA)
Computing And Processing