The Korean Association of Language Sciences



언어과학 (Journal of Language Sciences), Vol. 24 (2017)

A Topic Modeling Analysis of a Corpus of English Abstracts of Shakespeare Research Articles through Text Mining













This study explores a Shakespeare Research Article Abstract Corpus through topic modeling, a machine-learning technique that automatically identifies topics in a corpus. First, we identify which topics are prominent across the entire corpus. We then investigate the top 20 topics in each decade (the 1980s, 1990s, 2000s, and 2010s) and examine patterns, trends, and ranking changes over time, such as falling, rising, and curved contours. In addition, we extract corpus keywords using the cross-validation method in WordSmith Tools 6.0 in order to compare topic modeling keywords with corpus keywords for similarities and differences. We also select each group of absolute keywords, which have zero frequency in the reference corpora, to examine which words are associated with new trends in each period and to explore which words are shared by topic modeling keywords and corpus keywords. Finally, each group of non-absolute keywords extracted from the three corpora is discussed to check whether it shows the same patterns and trends as topic modeling. The results of this comparison confirm that it is hard to assert that topic modeling keywords are grouped into research subjects better than corpus keywords, or that they show trends over time better than corpus keywords, because both types of keywords have their own merits.
Keywords: Shakespeare, corpus linguistics, topic modeling, machine learning, keyword analysis, trend analysis
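
The distinction the abstract draws between absolute keywords (zero frequency in the reference corpus) and non-absolute keywords can be illustrated with a minimal sketch. It is not the WordSmith Tools procedure used in the study: the log-likelihood statistic, the `min_freq` cutoff, the function names, and the toy token lists are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood keyness for one word (an assumed statistic).

    a: frequency in the target corpus, b: frequency in the reference corpus,
    c: target corpus size in tokens, d: reference corpus size in tokens.
    """
    expected_target = c * (a + b) / (c + d)
    expected_reference = d * (a + b) / (c + d)
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / expected_target)
    if b > 0:
        ll += b * math.log(b / expected_reference)
    return 2 * ll


def split_keywords(target_tokens, reference_tokens, min_freq=2):
    """Split target-corpus words into absolute and non-absolute keywords.

    Absolute keywords never occur in the reference corpus and are ranked by
    raw frequency. Non-absolute keywords occur in both corpora but are
    relatively more frequent in the target, and are ranked by log-likelihood.
    """
    target = Counter(target_tokens)
    reference = Counter(reference_tokens)
    c, d = len(target_tokens), len(reference_tokens)

    absolute, non_absolute = [], []
    for word, a in target.items():
        if a < min_freq:  # hypothetical cutoff to drop rare words
            continue
        b = reference[word]
        if b == 0:
            absolute.append((word, a))
        elif a / c > b / d:
            non_absolute.append((word, log_likelihood(a, b, c, d)))

    # Highest score first; ties broken alphabetically for stable output.
    absolute.sort(key=lambda item: (-item[1], item[0]))
    non_absolute.sort(key=lambda item: (-item[1], item[0]))
    return absolute, non_absolute


# Toy token lists standing in for one decade (target) and the others (reference).
target_tokens = "gender gender gender race race hamlet hamlet hamlet text film".split()
reference_tokens = "hamlet text text text imagery imagery text text text text".split()

absolute, non_absolute = split_keywords(target_tokens, reference_tokens)
print("absolute:", absolute)
print("non-absolute:", non_absolute)
```

In this toy data, words that appear only in the target list fall into the absolute group, while a word present in both lists but proportionally more frequent in the target falls into the non-absolute group.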
