In this article, we examine sudden performance collapse in deep neural networks during training, a phenomenon that leads to unstable learning behavior. The authors analyze the influence of the relevant hyperparameters and show that learning performance must be balanced against stability, since an overly aggressive aggregation factor destabilizes training. As a compromise between these competing goals, they propose setting α = 10⁻³ and kGW,OD = 100.
The article begins by introducing the problem of sudden performance collapse in deep neural networks, which can occur after approximately 50 training iterations. The authors attribute this phenomenon to the interaction between the learning rate α and the aggregation factor kGW: for some hyperparameter combinations, the learning curve shows a sudden increase in error, indicating a collapse in performance.
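As a rough illustration (not the authors' code), such a collapse can be flagged by monitoring the test-error curve for a sudden jump relative to recent iterations. The sketch below assumes one error value per training iteration; the window size and jump factor are illustrative choices, not values from the paper.

```python
# Sketch: flag a sudden performance collapse in a test-error curve.
# `test_errors` holds one test-error value per training iteration; the
# window size and jump factor are illustrative, not taken from the paper.

def detect_collapse(test_errors, window=10, jump_factor=2.0):
    """Return the first iteration whose test error exceeds `jump_factor`
    times the average of the preceding `window` iterations, or None."""
    for i in range(window, len(test_errors)):
        recent_avg = sum(test_errors[i - window:i]) / window
        if test_errors[i] > jump_factor * recent_avg:
            return i
    return None


# Example: a curve that improves for ~50 iterations and then blows up.
errors = [1.0 / (k + 1) for k in range(50)] + [5.0] * 10
print(detect_collapse(errors))  # -> 50
```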
To better understand this behavior, the authors analyze in detail the average decrease in test error during training (δMY) for different values of α and kGW,OD. They find that for some combinations, such as (α = 10⁻³, kGW,OD = 200), the performance does not collapse as severely, but training still becomes unstable after approximately 50 iterations.
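The exact definition of δMY is not reproduced here; the sketch below is one plausible reading of an "average decrease in test error" metric, simply averaging the per-iteration drop in error over a run, and should be treated as an assumption rather than the paper's formula.

```python
import numpy as np


def average_error_decrease(test_errors):
    """Mean per-iteration decrease in test error (positive = improving).

    This is only an assumed stand-in for the delta_MY metric: it averages
    the differences e[t] - e[t+1] over the run, so a late collapse (a
    large error increase) pulls the value down or makes it negative.
    """
    errors = np.asarray(test_errors, dtype=float)
    return float(np.mean(errors[:-1] - errors[1:]))


# A stable run gives a positive value; a run that collapses after about
# 50 iterations gives a much smaller or negative one.
stable = np.linspace(1.0, 0.1, 100)
collapsing = np.concatenate([np.linspace(1.0, 0.1, 50), np.full(50, 2.0)])
print(average_error_decrease(stable))      # ~0.009
print(average_error_decrease(collapsing))  # negative
```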
The authors also investigate how the aggregation factor affects the learning behavior of deep neural networks. For smaller aggregation factors, the convergence curves show more stable learning behavior, although they have not converged within 100 iterations. For larger aggregation factors, the performance collapses more severely, indicating that too much coarsening leads to unstable learning behavior.
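To make this trade-off concrete, a small grid sweep over learning rates and aggregation factors could look like the sketch below. Here, train_and_evaluate is a hypothetical placeholder for the actual training pipeline, and apart from α = 10⁻³ and kGW,OD ∈ {100, 200}, the grid values are purely illustrative.

```python
from itertools import product


def train_and_evaluate(alpha, k_gw_od, iterations=100):
    """Hypothetical placeholder: run training with the given learning rate
    and aggregation factor, return the list of test errors per iteration."""
    raise NotImplementedError("plug in the real training pipeline here")


def sweep(alphas, k_values, iterations=100):
    """Grid sweep over (alpha, k_GW,OD) pairs, recording the final error and
    a crude stability flag (did the error ever double after iteration 50?)."""
    results = {}
    for alpha, k in product(alphas, k_values):
        errors = train_and_evaluate(alpha, k, iterations)
        results[(alpha, k)] = {
            "final_error": errors[-1],
            "collapsed": any(
                errors[i] > 2.0 * errors[i - 1] for i in range(50, len(errors))
            ),
        }
    return results


# Usage once train_and_evaluate is implemented; the grid echoes the values
# discussed in the text, including the proposed compromise (1e-3, 100):
#   results = sweep(alphas=[1e-2, 1e-3, 1e-4], k_values=[50, 100, 200])
```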
To address this issue, the authors propose α = 10⁻³ and kGW,OD = 100 as a compromise between learning performance and stability, and demonstrate that this combination yields more stable learning behavior without significantly sacrificing performance.
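As a minimal sketch, the proposed compromise could be recorded as a default configuration like the one below; the parameter names and the surrounding structure are assumptions, only the two values come from the article.

```python
# Assumed parameter names; only the two values (1e-3 and 100) come from the
# text, the surrounding structure is an illustrative default configuration.
DEFAULT_HYPERPARAMETERS = {
    "alpha": 1e-3,          # learning rate proposed as the compromise
    "k_gw_od": 100,         # aggregation factor proposed as the compromise
    "max_iterations": 100,  # training horizon discussed in the article
}
```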
In conclusion, the article highlights the importance of balancing learning performance and stability when training deep neural networks. By carefully selecting hyperparameters such as α and kGW,OD, sudden performance collapses can be avoided and better generalization can be achieved. The proposed combination of α = 10⁻³ and kGW,OD = 100 offers a good trade-off between these competing goals and can serve as a starting point for further tuning.