Your Search Results

Use this resource - and many more! - in your textbook!

AcademicPub holds over eight million pieces of educational content for you to mix-and-match your way.

Experience the freedom of customizing your course pack with AcademicPub!
Not an educator but still interested in using this content? No problem! Visit our provider's page to contact the publisher and get permission directly.

Online Learning of Rested and Restless Bandits

By: Mingyan Liu; Tekin, C.;

2012 / IEEE


This item was taken from the IEEE Periodical ' Online Learning of Rested and Restless Bandits ' In this paper, we study the online learning problem involving rested and restless bandits, in both a centralized and a decentralized setting. In a centralized setting, the system consists of a single player/user and a set of K finite-state discrete-time Markov chains (arms) with unknown state spaces (rewards) and statistics. The objective of the player is to decide in each step which M of the K arms to play over a sequence of trials so as to maximize its long-term reward. In a decentralized setting, multiple uncoordinated players each makes its own decision on which arm to play in a step, and if two or more players select the same arm simultaneously, a collision results and none of the players selecting that arm gets a reward. The objective of each player again is to maximize its long-term reward. We first show that logarithmic regret algorithms exist both for the centralized rested and restless bandit problems. For the decentralized setting, we propose an algorithm with logarithmic regret with respect to the optimal centralized arm allocation. Numerical results and extensive discussion are also provided to highlight insights obtained from this study.