In this paper, the authors aim to understand how deep neural networks generalize to new examples, focusing specifically on how the generalization behavior of individual classes can differ from the network's overall generalization. They introduce the concept of "class-generalization error," which captures the generalization gap of each individual class rather than of the network as a whole. The authors derive novel information-theoretic bounds for this quantity and experimentally validate them on various neural network architectures.
To explain this topic, let's use an analogy: imagine you're trying to learn a new language. Each class in a deep neural network is like a different word in that language. The network's overall performance can be thought of as its ability to understand and use all of those words correctly, whereas the class-generalization error measures how well the network performs on each individual word (class) separately.
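To make this concrete, here is a minimal sketch (not the authors' code) of how the per-class generalization gap could be estimated empirically: for each class, compare the model's error on test examples of that class with its error on training examples of that class. The function name, array names, and toy data below are hypothetical and purely illustrative.

```python
import numpy as np

def class_generalization_gaps(train_labels, train_preds,
                              test_labels, test_preds, num_classes):
    """For each class c, return (test error on class c) - (train error on class c)."""
    gaps = np.zeros(num_classes)
    for c in range(num_classes):
        train_mask = train_labels == c
        test_mask = test_labels == c
        # 0-1 error restricted to examples whose true label is class c
        train_err = np.mean(train_preds[train_mask] != train_labels[train_mask])
        test_err = np.mean(test_preds[test_mask] != test_labels[test_mask])
        gaps[c] = test_err - train_err
    return gaps

# Toy illustration with random labels and predictions over 3 classes.
rng = np.random.default_rng(0)
y_train, y_test = rng.integers(0, 3, 1000), rng.integers(0, 3, 1000)
p_train, p_test = rng.integers(0, 3, 1000), rng.integers(0, 3, 1000)
print(class_generalization_gaps(y_train, p_train, y_test, p_test, num_classes=3))
```

A class with a large positive gap generalizes poorly relative to how well it was fit during training, even if the network's average gap across all classes looks small.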
The authors propose several new bounds on this class-generalization error using a technique known as the super-sample method. These bounds are expressed in terms of the KL divergence, a quantity that measures how much one probability distribution differs from another. Using these bounds, the authors show that the class-generalization error can be significantly smaller than previously believed, especially for certain neural network architectures.
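Since the bounds are stated in terms of the KL divergence, the small sketch below illustrates the generic definition of KL divergence between two discrete distributions, D_KL(P || Q) = sum_x P(x) log(P(x)/Q(x)). This is only an illustration of the quantity itself, not the paper's actual bound, and the helper function is hypothetical.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL divergence between two discrete distributions given as arrays of weights."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()  # normalize to valid probability distributions
    q = q / q.sum()
    # eps guards against log(0) and division by zero for near-zero entries
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

# Similar distributions yield a small divergence; very different ones yield a larger value.
print(kl_divergence([0.5, 0.3, 0.2], [0.4, 0.4, 0.2]))    # small
print(kl_divergence([0.9, 0.05, 0.05], [0.1, 0.45, 0.45]))  # larger
```

Intuitively, the tighter the distributions involved in the bound are to one another (small KL divergence), the smaller the guaranteed class-generalization error.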
In summary, this paper deepens our understanding of how deep neural networks generalize to new examples by introducing the class-generalization error and deriving novel bounds for estimating it. These bounds help explain when and why deep neural networks perform well or poorly on particular classes in different contexts.
Computer Science, Machine Learning