Machine learning relies heavily on optimization algorithms to solve its models. Constrained problems constitute a major class of optimization problems, and the alternating direction method of multipliers (ADMM) is a widely used algorithm for solving constrained problems, especially linearly constrained ones. Written by experts in machine learning and optimization, this is the first book to provide a state-of-the-art review of ADMM across a range of scenarios: deterministic and convex optimization, nonconvex optimization, stochastic optimization, and distributed optimization. Offering a rich blend of ideas, theories, and proofs, the book is up-to-date and self-contained. It is an excellent reference for readers seeking a relatively universal algorithm for constrained problems, and graduate students and researchers can use it to grasp the frontiers of ADMM in machine learning in a short period of time.
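For readers new to the method, the core two-block iteration that the book builds on can be sketched in its standard textbook form; the notation below (f, g, A, B, c, penalty parameter β, multiplier λ) is generic and not quoted from the book. For the linearly constrained problem of minimizing f(x) + g(z) subject to Ax + Bz = c, ADMM minimizes the augmented Lagrangian over each primal block in turn and then takes a gradient-ascent step on the multiplier:

\[
\begin{aligned}
L_\beta(x, z, \lambda) &= f(x) + g(z) + \langle \lambda,\, Ax + Bz - c \rangle + \tfrac{\beta}{2}\,\|Ax + Bz - c\|^2, \\
x^{k+1} &= \operatorname*{argmin}_x\; L_\beta(x, z^k, \lambda^k), \\
z^{k+1} &= \operatorname*{argmin}_z\; L_\beta(x^{k+1}, z, \lambda^k), \\
\lambda^{k+1} &= \lambda^k + \beta\,(A x^{k+1} + B z^{k+1} - c).
\end{aligned}
\]

The alternating primal updates are what distinguish ADMM from the classical augmented Lagrangian method, which minimizes over x and z jointly; splitting the minimization keeps each subproblem cheap when f and g are simple in isolation.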
About the Authors
Zhouchen Lin is a leading expert in the fields of machine learning and optimization. He is currently a professor with the Key Laboratory of Machine Perception (Ministry of Education), School of Artificial Intelligence, Peking University. Prof. Lin has served many times as an area chair for prestigious conferences, including CVPR, ICCV, NIPS/NeurIPS, ICML, ICLR, IJCAI, and AAAI. He is a Program Co-Chair of ICPR 2022 and a Senior Area Chair of ICML 2022. Prof. Lin is an associate editor of the International Journal of Computer Vision and of Optimization Methods and Software. He is a Fellow of CSIG, IAPR, and IEEE.
Huan Li received a doctoral degree in machine learning from Peking University in 2019. He is currently an assistant researcher at the School of Artificial Intelligence, Nankai University. His research interests include optimization and machine learning.
Cong Fang received a doctoral degree in machine learning from Peking University in 2019. He is currently an assistant professor at the School of Artificial Intelligence, Peking University. His research interests include optimization and machine learning.
Contents
1 Introduction
1.1 Examples of Constrained Optimization Problems in Machine Learning
1.2 Sketch of Representative Works on ADMM
1.3 About the Book
References
2 Derivations of ADMM
2.1 Lagrangian Viewpoint of ADMM
2.1.1 Dual Ascent
2.1.2 Augmented Lagrangian Method
2.1.3 Alternating Direction Method of Multipliers
2.1.4 Relation to the Split Bregman Method
2.2 Operator Splitting Viewpoint of ADMM
2.2.1 Douglas-Rachford Splitting
2.2.2 From DRS to ADMM
References
3 ADMM for Deterministic and Convex Optimization
3.1 Original ADMM
3.1.1 Convergence Analysis
3.1.2 Sublinear Convergence Rate
3.1.3 Linear Convergence Rate
3.2 Bregman ADMM
3.2.1 Sublinear Convergence
3.2.2 Linear Convergence
3.3 Accelerated Linearized ADMM
3.3.1 Sublinear Convergence Rate
3.3.2 Linear Convergence Rate
3.4 Special Case: Linearized Augmented Lagrangian Method and Its Acceleration
3.5 Multi-block ADMM
3.5.1 Gaussian Back Substitution
3.5.2 Linearized ADMM with Parallel Splitting
3.5.3 Combining the Serial and the Parallel Update Orders
3.6 The Constraint Is Nonlinear
References
4 ADMM for Nonconvex Optimization
4.1 Multi-block Bregman ADMM
4.1.1 No Assumptions on the Linear Constraint but More Assumptions on the Objectives
4.2 Proximal ADMM with Exponential Averaging
4.3 ADMM for Multilinearly Constrained Optimization
4.3.1 Nonnegative Matrix Factorization
4.3.2 Robust PCA
References
5 Stochastic ADMM
5.1 Stochastic ADMM
5.2 Variance Reduction
5.3 Momentum Acceleration
5.4 Non-convex Stochastic ADMM
5.4.1 Non-convex SADMM
5.4.2 SPIDER Acceleration
References
6 ADMM for Distributed Optimization
6.1 Centralized Optimization
6.1.1 ADMM
6.1.2 Linearized ADMM …