Quasi-Newton methods are widely used in machine learning to minimize a risk function, but they can converge slowly, especially on large datasets. This article explores an approach called "divide-and-combine" that improves the convergence rate of standard quasi-Newton methods without requiring additional computational resources.
The key idea is to divide the workers that share the computation into smaller groups, with each group maintaining its own local estimate of the Hessian matrix. The groups then combine their local estimates into a single global estimate. This allows faster convergence without compromising accuracy. The authors show that the approach achieves a non-asymptotic super-linear convergence rate, a significant improvement over previous methods.
To understand how this works, imagine a group of people trying to solve a complex problem together. Each person starts with their own idea of the solution and works independently at first. They soon realize that their individual solutions are not accurate enough, so they come together and share their ideas. By combining their knowledge, they reach a better solution more quickly.
The "divide-and-combine" approach works the same way. Each group of workers maintains its own local Hessian estimate, which plays the role of an individual's tentative solution. The groups then share their estimates to form a global estimate, which leads to faster and more accurate optimization.
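The divide-then-combine pattern described above can be illustrated with a small sketch. It is not the authors' algorithm: it assumes each group holds a simple quadratic loss, uses the standard BFGS formula for the local updates, and combines the local estimates by plain averaging. The function names, the test problem, the unit step size, and the initial estimate `5 * I` are all hypothetical choices made for illustration.

```python
import numpy as np

def bfgs_update(B, s, y, eps=1e-12):
    """Standard BFGS update of a Hessian estimate B from step s and gradient change y."""
    Bs = B @ s
    sBs = s @ Bs
    ys = y @ s
    # Skip the update when the step is numerically negligible (avoids dividing by ~0).
    if sBs <= eps or ys <= eps:
        return B
    return B - np.outer(Bs, Bs) / sBs + np.outer(y, y) / ys

def divide_and_combine(group_grads, x0, B0, n_iters=100):
    """Each group keeps a local Hessian estimate; the average is used for the step.

    Assumption: averaging is used as the combine rule and the step size is 1.
    """
    x = np.asarray(x0, dtype=float)
    local_B = [np.array(B0, dtype=float) for _ in group_grads]
    for _ in range(n_iters):
        old_grads = [g(x) for g in group_grads]
        # Combine: average local estimates and local gradients into global ones.
        B_global = sum(local_B) / len(local_B)
        grad = sum(old_grads) / len(old_grads)
        # Quasi-Newton step using the combined Hessian estimate.
        x_new = x - np.linalg.solve(B_global, grad)
        s = x_new - x
        # Divide: each group refreshes its own estimate from its own gradients.
        local_B = [
            bfgs_update(B, s, g(x_new) - g_old)
            for B, g, g_old in zip(local_B, group_grads, old_grads)
        ]
        x = x_new
    return x

# Hypothetical test problem: two groups, each with loss 0.5 * x^T A x - b^T x.
A1, b1 = np.array([[3.0, 1.0], [1.0, 2.0]]), np.array([1.0, 0.0])
A2, b2 = np.array([[2.0, 0.0], [0.0, 4.0]]), np.array([0.0, 1.0])
group_grads = [lambda x: A1 @ x - b1, lambda x: A2 @ x - b2]

# Start from 5 * I, which upper-bounds both local Hessians in this example.
x_hat = divide_and_combine(group_grads, x0=np.zeros(2), B0=5.0 * np.eye(2))

# Exact minimizer of the averaged loss, for comparison.
x_star = np.linalg.solve((A1 + A2) / 2, (b1 + b2) / 2)
print(x_hat, x_star)
```

Starting every local estimate above the true local Hessian is a deliberate choice in this sketch: on quadratic losses it keeps the unit-step iteration stable without a line search. A practical implementation would need its own safeguards.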
Here, super-linear convergence means that the ratio between successive errors shrinks toward zero, so the error eventually falls faster than any fixed linear rate. Non-asymptotic means that the guarantee is an explicit bound that holds after a finite number of iterations rather than only in the limit. This is a significant improvement over previous methods, which often have slower convergence rates or require additional computational resources.
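For reference, the standard textbook notion of super-linear convergence can be written out as below. The symbols are assumptions introduced here, not the authors' notation: x_k is the k-th iterate, x^* the minimizer, and c_k a generic sequence of constants. The second line shows only the general shape of a non-asymptotic bound, not the specific rate proved in the article.

```latex
% Super-linear convergence: successive error ratios vanish in the limit.
\lim_{k \to \infty} \frac{\lVert x_{k+1} - x^* \rVert}{\lVert x_k - x^* \rVert} = 0

% Generic shape of a non-asymptotic super-linear bound (illustrative only):
% an explicit inequality valid from a finite iteration k_0 onward.
\lVert x_k - x^* \rVert \le c_k \, \lVert x_0 - x^* \rVert
\quad \text{for all } k \ge k_0,
\qquad \text{with } \frac{c_{k+1}}{c_k} \to 0 .
```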
In summary, "divide-and-combine" is a new approach to quasi-Newton optimization that improves the convergence rate without requiring more resources. By dividing workers into smaller groups and having them share their local Hessian estimates, this approach achieves faster and more accurate optimization. This article demonstrates the effectiveness of this approach through theoretical analysis and simulations.
Mathematics, Optimization and Control