In this article, Yongjun Zhang explores the potential of large language and vision models (LLVMs) to analyze and understand protests in news articles. The author acknowledges support from several institutions, including the Institute for Advanced Computational Science, for access to high-performance computing systems and OpenAI APIs.
The article discusses several limitations that researchers should be aware of when using LLVMs for protest analysis. For instance, fine-tuning transformer models requires deep learning and programming skills, which can be a barrier for scholars without a computational background. In addition, the training data used to fine-tune the Longformer model is time-constrained: most of the positive news articles date from 1960-1995, which may limit the model's ability to infer protests in more recent coverage.
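As a concrete illustration of the fine-tuning step described above, the following is a minimal sketch of training a Longformer classifier for binary protest identification with the Hugging Face transformers library. The checkpoint name, file paths, label scheme, and hyperparameters are illustrative assumptions, not the exact configuration used in the paper.

```python
# Minimal sketch: fine-tuning a Longformer for protest identification.
# Dataset paths and labels are hypothetical (1 = protest article, 0 = not).
from datasets import load_dataset
from transformers import (
    LongformerForSequenceClassification,
    LongformerTokenizerFast,
    Trainer,
    TrainingArguments,
)

MODEL_NAME = "allenai/longformer-base-4096"  # base checkpoint; the paper's exact checkpoint may differ

tokenizer = LongformerTokenizerFast.from_pretrained(MODEL_NAME)
model = LongformerForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

# Hypothetical CSV files with columns "text" (full article) and "label".
dataset = load_dataset("csv", data_files={"train": "train.csv", "validation": "dev.csv"})

def tokenize(batch):
    # Longformer handles long documents; truncate at its 4096-token limit.
    return tokenizer(batch["text"], truncation=True, max_length=4096)

dataset = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="longformer-protest",
    per_device_train_batch_size=2,   # long sequences are memory-hungry
    gradient_accumulation_steps=8,
    num_train_epochs=3,
    evaluation_strategy="epoch",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
    tokenizer=tokenizer,  # enables dynamic padding via the default collator
)
trainer.train()
```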
To overcome these limitations, Zhang proposes a new framework, LLVMs4Protest, which harnesses LLVMs for deciphering protests in the news. The author argues that by using ChatGPT to generate, annotate, and debug code, scholars can lower the barriers to entry in computational social science.
The article also highlights several directions for future research. For instance, comparing fine-tuned models with GPT-4 annotation could offer more nuance on when to choose generative AI versus transfer learning with pretrained models. In addition, the classifiers used in the study focus solely on protest identification and a limited set of protest attributes, which may not capture the full complexity of protest events.
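To make the proposed comparison concrete, here is a minimal sketch of zero-shot GPT-4 annotation for the same binary task, which could be scored against the fine-tuned classifier's predictions. The prompt wording and the model name "gpt-4" are assumptions for illustration, not the study's setup.

```python
# Minimal sketch: zero-shot GPT-4 annotation of news articles via the OpenAI API.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def annotate_article(text: str) -> str:
    """Ask GPT-4 whether a news article reports a protest; returns 'yes' or 'no'."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "You label news articles. Answer only 'yes' or 'no'."},
            {"role": "user",
             "content": f"Does the following article report a protest event?\n\n{text}"},
        ],
        temperature=0,  # deterministic labels make comparison with a classifier easier
    )
    return response.choices[0].message.content.strip().lower()

# Usage on a single made-up snippet:
print(annotate_article("Hundreds of workers marched downtown demanding higher wages."))
```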
In conclusion, Zhang’s article demonstrates the potential of LLVMs for analyzing and understanding protests in news articles. By leveraging large language and vision models, scholars can move beyond the limitations of traditional methods and gain new insights into the complex and dynamic nature of protests.