Deep Voice 3: In this paper, Wei Ping et al. propose a fully convolutional encoder-decoder model for text-to-speech synthesis. The model incorporates a speaker representation to capture different speech styles, and uses position-augmented attention to handle longer context. The authors report state-of-the-art mean opinion scores (MOS) in human evaluations.
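To make "position-augmented attention" concrete, the sketch below adds sinusoidal position features to a sequence of vectors before they would be used as attention queries or keys. It follows the standard Transformer sinusoid formulation; the function names and the `rate` parameter are hypothetical, and whether this matches the exact scheme in the paper is an assumption rather than something the summary above confirms.

```python
import math

def positional_encoding(num_positions, dim, rate=1.0):
    """Return a num_positions x dim table of sinusoidal position features.

    Even channels use sine and odd channels use cosine. `rate` is a
    hypothetical knob that rescales the position index.
    """
    table = []
    for pos in range(num_positions):
        row = []
        for k in range(dim):
            # Channels 2i and 2i+1 share the same wavelength.
            angle = rate * pos / (10000 ** (2 * (k // 2) / dim))
            row.append(math.sin(angle) if k % 2 == 0 else math.cos(angle))
        table.append(row)
    return table

def add_positions(vectors, rate=1.0):
    """Add position features elementwise to a list of equal-length vectors."""
    enc = positional_encoding(len(vectors), len(vectors[0]), rate)
    return [[v + p for v, p in zip(vec, pe)] for vec, pe in zip(vectors, enc)]
```

Applying `add_positions` to both queries and keys makes their dot products depend on relative position as well as content, which is the general idea behind position-aware attention.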
PixelSNAIL: In this paper, the authors propose applying SNAIL [254], a general-purpose autoregressive meta-learning model that combines causal convolutions with self-attention, to sequential image generation. The approach maximizes the context available to each pixel in order to improve likelihood (density estimation) performance, at the cost of additional computation.
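The causal self-attention ingredient can be illustrated with a minimal single-head sketch in which each position attends only to itself and earlier positions. This is not the PixelSNAIL architecture: queries, keys and values are the raw inputs (no learned projections, no convolutions), and the function name is hypothetical.

```python
import math

def causal_self_attention(x):
    """Single-head self-attention where position t attends only to 0..t.

    x is a list of T vectors (lists of floats). Queries, keys and values are
    all x itself; leaving out learned projections is a simplification.
    """
    dim = len(x[0])
    outputs, all_weights = [], []
    for t, query in enumerate(x):
        # Scaled dot-product scores against the current and earlier positions only.
        scores = [sum(q * k for q, k in zip(query, x[s])) / math.sqrt(dim)
                  for s in range(t + 1)]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(score - top) for score in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted average of the visible value vectors.
        outputs.append([sum(w * x[s][j] for s, w in enumerate(weights))
                        for j in range(dim)])
        all_weights.append(weights)
    return outputs, all_weights
```

Because position t never sees later inputs, changing a later element leaves earlier outputs unchanged, which is the property an autoregressive image model relies on. The quadratic number of score computations also shows where the extra cost of a large context comes from.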
Subscale Pixel Network (SPN): In this paper, the authors propose a network architecture for image generation that splits an image into smaller slices and feeds them into a convolutional encoder. This allows for faster computation and improved scalability while maintaining competitive image generation performance.
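One way to picture the slicing step is strided subsampling, where every s-th row and column goes to the same slice. The sketch below assumes this interleaved layout; the summary above only says "smaller slices", so the layout, the function names and the scale parameter `s` are assumptions for illustration.

```python
def subscale_slices(image, s):
    """Split a 2D image (list of rows) into s*s interleaved slices.

    Slice (i, j) keeps rows i, i+s, i+2s, ... and columns j, j+s, j+2s, ...
    """
    return {(i, j): [row[j::s] for row in image[i::s]]
            for i in range(s) for j in range(s)}

def merge_slices(slices, s):
    """Inverse of subscale_slices: interleave the slices back into one image.

    Assumes the image has at least s rows and s columns.
    """
    height = sum(len(slices[(i, 0)]) for i in range(s))
    width = sum(len(slices[(0, j)][0]) for j in range(s))
    image = [[None] * width for _ in range(height)]
    for (i, j), block in slices.items():
        for r, row in enumerate(block):
            for c, value in enumerate(row):
                image[i + r * s][j + c * s] = value
    return image
```

For a 4x4 image and `s = 2`, this yields four 2x2 slices, each a low-resolution view of the whole image. A model can then process slices that are a quarter of the original size instead of the full image at once.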
In summary, these papers represent significant advancements in various areas of deep learning, including natural language processing, computer vision, music generation, and image generation. The proposed methods show promising results and have the potential to enable new applications and use cases in these fields.