A Topological Collapse for Document Summarization

Hui Guan1, Wen Tang, Hamid Krim2, James Keiser, Andrew J. Rindos, Radmila Sazdanovic

  • 1North Carolina State University
  • 2North Carolina State University

Details

14:30 - 16:00 | Tue 5 Jul | Salisbury C | S6.2

Session: Big data signal processing in communications and networking

Abstract

Keyphrase extraction is a useful tool for summarizing documents: it selects a set of single- or multi-word terms, called keyphrases, that capture the primary topics discussed in a document. In this paper we propose DoCollapse, an unsupervised keyphrase extraction method based on topological collapse, which represents a document as a network linking candidate keyphrases by their semantic relatedness. A semantic graph is built with candidate keyphrases as vertices and then reduced to its core using a topological collapse algorithm to facilitate final keyphrase selection. Iteratively collapsing dominated vertices helps remove noisy candidates and reveal important points. We conducted experiments on two standard evaluation datasets composed of scientific papers and found that DoCollapse outperforms state-of-the-art methods. The results show that simplifying a document graph by homology-preserving topological collapse benefits keyphrase extraction.
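
The collapse step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the standard notion of domination from strong collapses, where a vertex v is dominated by a neighbour w when the closed neighbourhood of v is contained in that of w. The function name collapse_dominated, the adjacency-dict representation, and the toy graph are all hypothetical, and the construction of the semantic graph and the final keyphrase ranking are not shown.

```python
def collapse_dominated(adj):
    """Iteratively remove dominated vertices from an undirected graph.

    adj maps each vertex to the set of its neighbours.
    Assumption (not taken from the abstract): v is dominated by a
    neighbour w when N[v] is a subset of N[w], where N[x] is the
    closed neighbourhood of x (x together with its neighbours).
    """
    # Work on a copy so the caller's graph is left untouched.
    g = {v: set(nbrs) for v, nbrs in adj.items()}
    changed = True
    while changed:
        changed = False
        for v in sorted(g):
            closed_v = g[v] | {v}
            # Look for a neighbour whose closed neighbourhood covers N[v].
            dominator = next(
                (w for w in sorted(g[v]) if closed_v <= g[w] | {w}), None
            )
            if dominator is not None:
                # Remove v and every edge incident to it, then rescan.
                for u in g[v]:
                    g[u].discard(v)
                del g[v]
                changed = True
                break
    return g


# Toy graph: a 4-cycle a-b-c-d-a with a pendant vertex e attached to a.
graph = {
    "a": {"b", "d", "e"},
    "b": {"a", "c"},
    "c": {"b", "d"},
    "d": {"a", "c"},
    "e": {"a"},
}
core = collapse_dominated(graph)
print(sorted(core))  # ['a', 'b', 'c', 'd']
```

In this toy case the pendant vertex e is dominated by a and is removed, while the 4-cycle survives because no vertex on it is dominated. The loop in the graph is preserved, which is consistent with the homology-preserving behaviour the abstract refers to.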