Computer Science, Social and Information Networks

Uncovering the Topics of Online Discussions: A Text Mining Approach

This article looks at how online discussions are analyzed with a structural approach, one that captures how content goes viral and estimates polarization in online audiences. The author argues that, although this approach is already insightful, it can be extended in more human-centered directions by incorporating the sentiment, emotions, topics, and ideas embedded in online posts.
The author begins by describing text preprocessing: a standard pipeline is applied and words are lemmatized, that is, reduced to their dictionary base forms (lemmas). This reduces surface variation and standardizes how language is represented. Next, the author discusses the implementation of BERTopic, a topic-modeling algorithm that offers several choices for each of its components. To extract individual words as well as bigrams and trigrams, the author runs BERTopic with an n-gram range of (1, 3), that is, unigrams through trigrams.
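To make the preprocessing and n-gram steps concrete, here is a minimal sketch using only the Python standard library. It is not the author's pipeline: the TOY_LEMMAS table and the function names are hypothetical stand-ins, and a real workflow would use a lemmatizer from an NLP library rather than a hand-written lookup.

```python
import re

# Hypothetical toy lemma table, for illustration only. A real pipeline
# would call a proper lemmatizer instead of a hand-written lookup.
TOY_LEMMAS = {"posts": "post", "were": "be", "going": "go"}


def preprocess(text):
    """Lowercase the text, keep alphabetic tokens, and map each to its lemma."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [TOY_LEMMAS.get(tok, tok) for tok in tokens]


def ngrams(tokens, n_min=1, n_max=3):
    """Return every contiguous n-gram with n_min <= n <= n_max, as strings."""
    out = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            out.append(" ".join(tokens[i:i + n]))
    return out


tokens = preprocess("Posts were going viral")
terms = ngrams(tokens, 1, 3)
print(tokens)  # ['post', 'be', 'go', 'viral']
print(terms)   # 4 unigrams, 3 bigrams, 2 trigrams
```

With a range of (1, 3), a four-token post yields nine candidate terms, so multiword expressions such as "go viral" can surface in topic descriptions alongside single words.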
The author then explains that they tested different dimensionality reduction and clustering methods and opted for UMAP and HDBSCAN. Specifically, they set a minimum cluster size of 200 to avoid small, noisy topics. The author notes that while a purely structural analysis can provide valuable insights, it risks neglecting the textual, cognitive, and affective components of human interactions.
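The configuration described above could be assembled roughly as follows. This is a sketch under stated assumptions, not the author's code: only the n-gram range (1, 3) and the minimum cluster size of 200 come from the text, while the remaining UMAP and HDBSCAN values are assumed placeholders, and build_topic_model is a hypothetical helper name. The bertopic, umap-learn, hdbscan, and scikit-learn packages are needed only when the function is called.

```python
# Values stated in the text.
NGRAM_RANGE = (1, 3)          # unigrams through trigrams
MIN_CLUSTER_SIZE = 200        # suppresses small, noisy topics

# Assumed placeholder settings; the source does not report these.
UMAP_PARAMS = {
    "n_neighbors": 15,
    "n_components": 5,
    "min_dist": 0.0,
    "metric": "cosine",
    "random_state": 42,
}
HDBSCAN_PARAMS = {
    "min_cluster_size": MIN_CLUSTER_SIZE,
    "metric": "euclidean",
    "cluster_selection_method": "eom",
    "prediction_data": True,
}


def build_topic_model():
    """Build a BERTopic model with explicit UMAP, HDBSCAN, and n-gram settings."""
    # Imported lazily so the settings above can be inspected without
    # having the third-party packages installed.
    from bertopic import BERTopic
    from hdbscan import HDBSCAN
    from sklearn.feature_extraction.text import CountVectorizer
    from umap import UMAP

    return BERTopic(
        umap_model=UMAP(**UMAP_PARAMS),
        hdbscan_model=HDBSCAN(**HDBSCAN_PARAMS),
        vectorizer_model=CountVectorizer(ngram_range=NGRAM_RANGE),
    )


# Usage, where docs is a hypothetical list of preprocessed post strings:
# topic_model = build_topic_model()
# topics, probs = topic_model.fit_transform(docs)
```

Because no cluster smaller than 200 documents can form, a setting like this only makes sense for a corpus much larger than 200 posts; on smaller collections most documents would be labeled as outliers.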
The author concludes by emphasizing the need to expand structural investigations to more human-centered directions by capturing sentiment, emotions, topics, and ideas as embedded in online posts. By doing so, researchers can gain a deeper understanding of the dynamics of online discussions and improve their analysis of content going viral and polarization in online audiences.
In summary, this article argues for extending the structural approach to analyzing online discussions by capturing the sentiment, emotions, topics, and ideas embedded in online posts. Lemmatized text and BERTopic, configured with UMAP and HDBSCAN, are the concrete tools described for moving the study of virality and polarization in that direction.