Abstract |
This paper studies Online Convex Optimization (OCO) problems where the constraints have additive perturbations that (i) vary over time and (ii) are not known at the time to make a decision. Perturbations may not be i.i.d. generated and can be used, for example, to model a time-varying budget or time- varying requests in resource allocation problems. Our goal is to design a policy that obtains sublinear regret and satisfies the constraints in the long-term. To this end, we present an online primal-dual proximal gradient algorithm that has O(Tε ∨ T1−ε) regret and O(Tε) constraint violation, where ε ∈ [0, 1) is a parameter in the learning rate. The proposed algorithm obtains optimal rates when ε = 1/2, and can compare against a stronger comparator (the set of fixed decisions in hindsight) than previous work. |