Community Question Retrieval in Health Forums

Hamman Samuel¹, Mi-Young Kim², Sankalp Prabhakar³, Mohomed Shazan Mohomed Jabbar¹, Osmar Zaiane¹

  • ¹ University of Alberta
  • ² Alberta Machine Intelligence Institute
  • ³ Dept. of Computing Science, University of Alberta

Details

14:30 - 14:40 | Thu 16 Feb | Salon 5 | ThB1.1

Session: Thu1.2: Health Informatics (Public/Lifestyle)

Abstract

Community Question Answering (CQA) has emerged as a popular type of service that enables users to ask and answer questions and to access an existing knowledge base. CQA archives contain a large amount of useful user-generated content and have been recognized as important information resources on the web. Making this body of knowledge accessible requires effective and efficient question retrieval: given a new user question, the goal is to identify and retrieve existing questions in the archive that are relevant to it. The objective of this study is to develop a question retrieval system that can sift through such forums and identify the existing questions most similar to a user-provided question. We focus on health forums and propose a CQA system that uses weighted TF-IDF, relevance heuristics, and term expansion. We compare our proposed algorithm against other well-known methods and demonstrate that it outperforms the Latent Dirichlet Allocation (LDA) topic model, Latent Semantic Indexing (LSI), language-model-based information retrieval, BM25, the vector space model, Word2Vec, and semantic similarity approaches. Our initial experiments use datasets from the IEEE Healthcare Data Analytics Challenge 2015. We also present our efforts towards developing a Bronze Standard for question similarity evaluation, using self-annotations and annotations provided by affiliates of Mayo Clinic.
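The abstract does not specify the TF-IDF weighting, the relevance heuristics, or the resource used for term expansion. Purely as an illustration of the general pattern, here is a minimal Python sketch of TF-IDF question retrieval with cosine similarity and query-side term expansion. It is not the authors' system. The `SYNONYMS` table, the stopword list, the smoothed IDF formula, and the function names (`tokenize`, `expand`, `retrieve`) are all hypothetical choices made for this example.

```python
import math
import re
from collections import Counter

# Hypothetical synonym table standing in for term expansion; the abstract
# does not name the expansion resource actually used.
SYNONYMS = {
    "headache": ["migraine"],
    "tired": ["fatigue"],
}

# Small illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"a", "an", "the", "i", "my", "is", "of", "and", "to", "what",
             "for", "do", "have", "can", "how", "with", "in"}


def tokenize(text):
    """Lowercase, split on non-letters, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def expand(tokens):
    """Append synonyms of each token (simple query-side term expansion)."""
    out = list(tokens)
    for t in tokens:
        out.extend(SYNONYMS.get(t, []))
    return out


def tfidf_vectors(docs):
    """Return per-document TF-IDF dicts and the IDF table (smoothed IDF)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log((1 + n) / (1 + c)) + 1 for term, c in df.items()}
    vecs = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vecs, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def retrieve(query, archive, k=3):
    """Rank archived questions by cosine similarity to the expanded query."""
    docs = [tokenize(q) for q in archive]
    vecs, idf = tfidf_vectors(docs)
    q_tokens = expand(tokenize(query))
    # Query terms that never occur in the archive carry no weight.
    q_vec = {t: tf * idf[t] for t, tf in Counter(q_tokens).items() if t in idf}
    scored = [(cosine(q_vec, v), q) for v, q in zip(vecs, archive)]
    scored.sort(key=lambda s: s[0], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    archive = [
        "What causes a migraine after exercise?",
        "How much sleep do I need to avoid fatigue?",
        "Is it safe to take ibuprofen with coffee?",
    ]
    for score, question in retrieve("Why do I get a headache after running?", archive):
        print(f"{score:.3f}  {question}")
```

In this toy archive, the query word "headache" matches the archived "migraine" question only through expansion, which is the gap term expansion is meant to close in informal health-forum text. Expanding only the query keeps the archive index unchanged. A real system would likely draw on a medical vocabulary rather than a hand-written table.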