Deep Learning Recommendation Models (DLRMs) are widely used in personalized recommendation systems to improve user experience. DLRMs consist of two major components: memory-bound embedding layers and computation-bound fully-connected (FC) layers. These models handle both dense and sparse features, with the latter stored as embedding vectors in tables. During inference, these vectors are accessed via indexes, resulting in multiple random memory accesses. To address this challenge, DLRMs can be partitioned across multiple FPGAs using a checkerboard block decomposition, allowing for efficient storage and computation.
To understand how DLRMs work, imagine a large library with millions of books. Each book represents a feature (e.g., user demographics or item attributes), and the features are stored in different rooms throughout the library. When a user searches for a book, they want to find similar books quickly and efficiently. To do this, the system needs to access the relevant features from each room and compare them to the searched book. This process is computationally expensive due to the large number of feature comparisons required.
To address this challenge, DLRMs use embedding vectors that map the features to a lower-dimensional space. These vectors are stored in tables, similar to a bookshelf with books organized by author or genre. When a user searches for a book, the system only needs to access the relevant books from the shelf and compare them directly, rather than searching through every book in the library. This process is much faster and more efficient, making it possible to provide personalized recommendations in real-time.
One limitation of DLRMs is that they require a large number of FPGAs to store the embedding vectors. However, modern FPGAs have a limited capacity, so they cannot accommodate all the required vectors. To address this challenge, researchers proposed partitioning DLRMs across multiple FPGAs using a checkerboard block decomposition. This allows for efficient storage and computation, making it possible to scale DLRMs to larger datasets and improve personalized recommendations.
In summary, DLRMs are powerful models used in personalized recommendation systems to improve user experience. They consist of memory-bound embedding layers and computation-bound FC layers that handle both dense and sparse features. To address the computational challenges associated with DLRMs, researchers proposed partitioning them across multiple FPGAs using a checkerboard block decomposition, allowing for efficient storage and computation. By leveraging this technique, it is possible to scale DLRMs to larger datasets and improve personalized recommendations in real-time.
Computer Science, Distributed, Parallel, and Cluster Computing