Deep learning: a statistical viewpoint
Acta Numerica (IF 14.2), Pub Date: 2021-08-04, DOI: 10.1017/s0962492921000027
Peter L. Bartlett, Andrea Montanari, Alexander Rakhlin

The remarkable practical success of deep learning has revealed some major surprises from a theoretical perspective. In particular, simple gradient methods easily find near-optimal solutions to non-convex optimization problems, and despite giving a near-perfect fit to training data without any explicit effort to control model complexity, these methods exhibit excellent predictive accuracy. We conjecture that specific principles underlie these phenomena: that overparametrization allows gradient methods to find interpolating solutions, that these methods implicitly impose regularization, and that overparametrization leads to benign overfitting, that is, accurate predictions despite overfitting training data. In this article, we survey recent progress in statistical learning theory that provides examples illustrating these principles in simpler settings. We first review classical uniform convergence results and why they fall short of explaining aspects of the behaviour of deep learning methods. We give examples of implicit regularization in simple settings, where gradient methods lead to minimal norm functions that perfectly fit the training data. Then we review prediction methods that exhibit benign overfitting, focusing on regression problems with quadratic loss. For these methods, we can decompose the prediction rule into a simple component that is useful for prediction and a spiky component that is useful for overfitting but, in a favourable setting, does not harm prediction accuracy. We focus specifically on the linear regime for neural networks, where the network can be approximated by a linear model. In this regime, we demonstrate the success of gradient flow, and we consider benign overfitting with two-layer networks, giving an exact asymptotic analysis that precisely demonstrates the impact of overparametrization. We conclude by highlighting the key challenges that arise in extending these insights to realistic deep learning settings.
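As a concrete illustration of the implicit regularization described in the abstract, the following minimal sketch (not taken from the paper; the dimensions, step size and iteration count are illustrative assumptions) runs gradient descent from the origin on an overparametrized least-squares problem and checks that it interpolates the training data while converging to the minimum-norm solution given by the pseudoinverse.

import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 200                         # n samples, d parameters (d >> n): overparametrized
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

theta = np.zeros(d)                    # start at the origin, so iterates stay in the row space of X
lr = 1.0 / np.linalg.norm(X, 2) ** 2   # step size below 2/||X||^2, ensuring convergence
for _ in range(1000):
    theta -= lr * X.T @ (X @ theta - y)    # gradient step on 0.5 * ||X theta - y||^2

theta_min_norm = np.linalg.pinv(X) @ y     # minimum-norm interpolating solution

print("training error:", np.linalg.norm(X @ theta - y))                      # ~ 0: perfect fit
print("gap to min-norm solution:", np.linalg.norm(theta - theta_min_norm))   # ~ 0: implicit regularization

In this toy setting the fitted predictor interpolates the data exactly, yet among all interpolating solutions gradient descent selects the one of minimal norm, which is the kind of implicitly regularized solution whose benign overfitting the article analyses.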

Updated: 2021-08-04