Publications

Thesis

Su-Jin Shin, Incorporating Domain Knowledge into Hierarchical Topic Models with Dirichlet Forest Priors, Master's Thesis, Department of Industrial and Systems Engineering, KAIST, 2015


Abstract:

Despite the proliferation of topic models, the structured organization of the topics that these probabilistic models produce needs to be improved. The improvement can be achieved in two ways: a better-structured presentation of topics and the incorporation of domain knowledge about the corpus. A structured presentation, i.e., a hierarchical topic model, helps in categorizing similar topics, while the incorporation of domain knowledge concentrates the sampling of predefined keywords during mixture model training. This paper presents the first topic model that both clusters topics hierarchically and incorporates domain knowledge, which I named the Guided Hierarchical Topic Model (GHTM). Specifically, I allocated the prior information from the domain knowledge to a Dirichlet tree distribution, which becomes the prior of the hierarchical topic model. By adjusting the prior, I obtained a topic tree guided by the domain knowledge. Using the Reuters Corpus Volume and the 20 Newsgroups datasets, I compared the performance of the GHTM with that of the Hierarchical Topic Model (HTM) in terms of hierarchical classification accuracy. I found that the micro and macro F-measures of the classification improved with the enhanced structured organization.
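
As a reading aid for the evaluation metric named in the abstract, the following is a minimal sketch of how micro- and macro-averaged F-measures differ. It assumes flat, single-label predictions; the function name f_measures and the toy labels are hypothetical and are not taken from the thesis, whose exact hierarchical evaluation protocol is not described on this page.

```python
from collections import Counter


def f_measures(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    def f1(tp_, fp_, fn_):
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Micro: pool the counts over all classes, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per class, then take the unweighted mean.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro


# Hypothetical toy labels for illustration only, not data from the thesis.
y_true = ["sports", "sports", "sports", "politics", "science"]
y_pred = ["sports", "sports", "politics", "politics", "sports"]
micro, macro = f_measures(y_true, y_pred)
print(f"micro-F1 = {micro:.3f}, macro-F1 = {macro:.3f}")
```

Micro-averaging weights every document equally, so frequent classes dominate the score, whereas macro-averaging weights every class equally, so a rare class that is never predicted correctly pulls the score down. Reporting both gives a fuller picture when class sizes are imbalanced.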


@mastersthesis{Shin:2015,
  author  = {Su-Jin Shin},
  advisor = {Il-Chul Moon},
  title   = {Incorporating Domain Knowledge into Hierarchical Topic Models with Dirichlet Forest Priors},
  school  = {KAIST},
  year    = {2015}
}