Exploring Global Discourse Evolution: Applying BERTopic, NMF, and LDA to the United Nations General Debate Corpus in the Pre/Post-Millennium Development Goals Era
We studied the United Nations (UN) General Debate Corpus, a collection of 7,314 speeches delivered between 1970 and 2014, to understand how the focus and emphasis of speeches at the UN General Assembly changed after the adoption of the Millennium Development Goals (MDGs). We applied three topic modeling techniques, BERTopic, NMF, and LDA, to identify the key themes in the speeches. Of the three, BERTopic, a neural topic modeling algorithm, generated the most coherent topics and handled this complex dataset effectively. We also encountered limitations. BERTopic relies on pre-trained embeddings, which may not capture domain-specific information well, and it can struggle with noisy data. Our dataset was challenging in this respect: it included scanned documents and complete text, deviating from the standard format of UN General Assembly speeches. The resulting topics showed a degree of semantic similarity, but interpreting them proved difficult.
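As a rough illustration of what comparing topics by coherence across the two periods can look like, the sketch below splits a toy corpus at a cutoff year and scores candidate topics with UMass coherence. The abstract does not say which coherence metric or cutoff year was used, so the UMass measure, the cutoff of 2000 (the year the Millennium Declaration was adopted), the helper names `split_by_period` and `umass_coherence`, and the four toy speeches are all assumptions made for this example, not the study's actual pipeline or data.

```python
import math
from itertools import combinations

# Assumed cutoff: the Millennium Declaration was adopted in September 2000.
# The study's actual pre/post split is not stated in the abstract.
MDG_YEAR = 2000


def split_by_period(speeches, cutoff=MDG_YEAR):
    """Split (year, tokens) pairs into pre- and post-cutoff token lists."""
    pre = [tokens for year, tokens in speeches if year <= cutoff]
    post = [tokens for year, tokens in speeches if year > cutoff]
    return pre, post


def umass_coherence(topic_words, documents):
    """UMass coherence of one topic over a reference corpus.

    Sums log((D(w_i, w_j) + 1) / D(w_j)) over word pairs where w_j is
    ranked above w_i; D counts the documents containing the given words.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        return sum(1 for doc in doc_sets if all(w in doc for w in words))

    score = 0.0
    for j, i in combinations(range(len(topic_words)), 2):
        w_j, w_i = topic_words[j], topic_words[i]  # j < i: w_j ranked higher
        d_j = doc_freq(w_j)
        if d_j == 0:
            continue  # skip words that never occur in the reference corpus
        score += math.log((doc_freq(w_i, w_j) + 1) / d_j)
    return score


# Toy, invented speeches (year, tokens); not drawn from the real corpus.
speeches = [
    (1985, ["nuclear", "disarmament", "peace", "security"]),
    (1992, ["peace", "security", "nuclear", "treaty"]),
    (2005, ["poverty", "development", "goals", "education"]),
    (2010, ["development", "goals", "poverty", "health"]),
]

pre_docs, post_docs = split_by_period(speeches)

coherent_topic = ["poverty", "development", "goals"]
mixed_topic = ["poverty", "nuclear", "goals"]

coherent_score = umass_coherence(coherent_topic, post_docs)
mixed_score = umass_coherence(mixed_topic, post_docs)
print(f"coherent: {coherent_score:.3f}  mixed: {mixed_score:.3f}")
```

Scores closer to zero indicate words that co-occur more consistently. In practice the top words of each fitted model (BERTopic, NMF, or LDA) would take the place of the hand-written topic lists here.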