In this article, the authors analyze short videos on social media platforms using multimodal emotion analysis. They designed selection criteria for the short videos: each must feature one or two main characters, contain clear speech in a single language, and run under three minutes. The audio segments are then transcribed with a Whisper model, and the resulting text is analyzed for emotion using a multimodal emotion analysis method.
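The selection criteria above can be sketched as a simple filter. This is a minimal illustration, not the authors' implementation; the `ShortVideo` record and its field names are hypothetical stand-ins for whatever metadata the pipeline actually extracts.

```python
from dataclasses import dataclass

# Hypothetical metadata for a candidate video; the fields mirror the
# paper's stated criteria, not any real dataset schema.
@dataclass
class ShortVideo:
    num_main_characters: int   # main characters detected in the clip
    speech_is_clear: bool      # speech intelligible enough to transcribe
    single_language: bool      # all speech in the same language
    duration_seconds: float    # total clip length

def meets_criteria(v: ShortVideo) -> bool:
    """Selection criteria: one or two main characters, clear speech
    in a single language, and a duration under three minutes."""
    return (
        1 <= v.num_main_characters <= 2
        and v.speech_is_clear
        and v.single_language
        and v.duration_seconds < 180
    )

# A 2-minute clip with one clearly speaking character passes;
# a clip with three main characters does not.
print(meets_criteria(ShortVideo(1, True, True, 120)))  # True
print(meets_criteria(ShortVideo(3, True, True, 120)))  # False
```

Videos that pass this filter would then be handed to the transcription and emotion-analysis stages described above.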
The authors explain that short videos convey simple but strong emotions, a consequence of their fragmented mode of transmission and their purpose of attracting likes and comments. They argue that multimodal emotion analysis of short videos can therefore yield more accurate results than analysis of traditional videos, making short videos a valuable resource for understanding public attitudes and anticipating future opinions.
The authors introduce the concept of multimodal data, which combines multiple modalities such as audio, video, and text to analyze emotions. They emphasize the importance of considering the context in which short videos are created and disseminated on social media platforms.
In summary, the article presents a new approach to analyzing short videos on social media through multimodal emotion analysis. By combining audio, video, and text data, the method can achieve more accurate results than traditional video analysis, offering a valuable tool for understanding public attitudes and anticipating future opinions.
Computer Science, Social and Information Networks