Large language models are like complex recipes: they take enormous computation to create, but once they are cooked, they can be used for many tasks. However, these models also demand a great deal of memory and energy to run, so we need ways to shrink them without losing their usefulness. In this article, we explore methods for compressing large language models while preserving their quality.
We begin by examining the tradeoffs between model compression and performance using general metrics such as perplexity or standardized benchmark tasks. These measures give a rough sense of how well a compressed model will perform, but they offer little guidance for choosing a compression approach for a specific model or task.
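As a concrete reference point, the short sketch below shows how perplexity is commonly computed from the log-probabilities a model assigns to each token of a text. The function and the toy numbers are illustrative only and are not drawn from our experiments.

import math

def perplexity(token_log_probs):
    # Average negative log-likelihood per token, exponentiated.
    # Lower perplexity means the model finds the text less "surprising".
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)

# A model that assigns probability 0.25 to each of four tokens
# has a perplexity of 4 on that sequence.
print(perplexity([math.log(0.25)] * 4))  # prints 4.0 (up to floating-point error)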
Next, we review related work on the knowledge capabilities of large language models and how they acquire factual and commonsense knowledge. We discuss techniques such as probing, which reveals what a model knows, but existing studies of this kind are typically limited to particular models or model families.
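For readers unfamiliar with probing, the following sketch trains a simple linear probe on synthetic hidden-state activations. In practice the activations would be extracted from a frozen language model and the labels would encode the knowledge being tested; all names and data here are placeholders rather than details of the related work discussed above.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Placeholder data: in a real probe, hidden_states would be activations from a
# frozen language model, and labels the property being probed for
# (e.g., whether a stated fact is true).
rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(1000, 768))  # 1000 examples, 768-dimensional activations
labels = rng.integers(0, 2, size=1000)        # a binary property we try to recover

X_train, X_test, y_train, y_test = train_test_split(
    hidden_states, labels, test_size=0.2, random_state=0)

# If a lightweight linear classifier can recover the property from the
# activations, the model plausibly encodes that information.
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"probe accuracy: {probe.score(X_test, y_test):.2f}")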
We then introduce our main contribution: a large-scale study of the fine-grained effects of compression on quantities such as parametric knowledge. We investigate a range of compression schemes across multiple model families and offer practical insights into which types of compression affect models the least and which affect them the most.
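To make "compression scheme" concrete, here is a minimal illustration of one common family, unstructured magnitude pruning, applied to a toy weight matrix. It is only an example of the kind of technique studied, not a description of the specific schemes or settings used in our experiments.

import numpy as np

def magnitude_prune(weights, sparsity=0.5):
    # Zero out the smallest-magnitude entries of a weight matrix.
    # threshold is the magnitude below which weights are dropped.
    threshold = np.quantile(np.abs(weights), sparsity)
    return np.where(np.abs(weights) >= threshold, weights, 0.0)

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))
W_pruned = magnitude_prune(W, sparsity=0.5)
print(f"fraction of weights zeroed: {np.mean(W_pruned == 0):.2f}")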
In summary, this article is like a cookbook for large language models: it offers practical advice on how to simplify them without sacrificing their performance. By demystifying complex concepts with everyday language and engaging metaphors, we hope this work helps readers develop intuition for selecting appropriate compression techniques for large language models.