Dataset distillation is a technique that compresses a large dataset into a much smaller, yet representative one. The goal is to condense the rich distributional information of the original dataset into just a few samples that capture its essence. The technique has attracted significant attention because it can sharply reduce data volume while preserving the integrity and quality of the information. In this paper, we present a new approach to dataset distillation based on Wasserstein metrics from optimal transport theory.
Wasserstein metrics are distance measures that quantify the similarity between two probability distributions, and they have been applied in areas ranging from image registration to risk management. Our method leverages these metrics to learn core representations of the data distribution, effectively bridging the gap between efficiency and performance. Using Wasserstein metrics, we can identify the most representative samples of a dataset, providing new insight into its distribution.
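To build intuition for the metric itself, the sketch below computes the Wasserstein-1 (earth mover's) distance between two equal-size one-dimensional empirical distributions. This is a minimal illustration of the distance measure, not our distillation algorithm; the function name and the sample values are hypothetical. In one dimension, the optimal transport plan simply matches order statistics, so the distance reduces to the mean absolute difference of the sorted samples.

```python
def wasserstein_1d(u, v):
    """Exact W1 distance between two equal-size 1-D samples.

    For sorted samples u_(1) <= ... <= u_(n) and v_(1) <= ... <= v_(n),
    optimal transport matches the i-th order statistics, giving
        W1 = (1/n) * sum_i |u_(i) - v_(i)|.
    """
    if len(u) != len(v):
        raise ValueError("samples must have equal size")
    return sum(abs(a - b) for a, b in zip(sorted(u), sorted(v))) / len(u)

# Shifting a distribution by a constant c moves every unit of mass
# a distance c, so W1 equals c:
print(wasserstein_1d([0.0, 1.0, 2.0], [1.0, 2.0, 3.0]))  # 1.0
```

In higher dimensions or with unequal sample weights, the optimal transport problem no longer has this closed form and is typically solved by linear programming or entropic (Sinkhorn) approximations.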
Our approach achieves strong performance across a range of benchmarks, setting new standards in dataset distillation. An ablation study supports our design rationale and demonstrates the power of Wasserstein metrics for learning core representations of the data distribution. The results show that our method substantially reduces data volume while preserving its integrity and quality, making it a practical solution for high-resolution datasets.
In conclusion, this paper introduces a new approach to dataset distillation based on Wasserstein metrics from optimal transport theory. The method bridges the gap between efficiency and performance, handling high-resolution datasets while preserving their integrity and quality, and the ablation study supports our design rationale, highlighting the effectiveness of Wasserstein metrics for learning core representations of the data distribution.
Subjects: Computer Science, Computer Vision and Pattern Recognition