Clustering Sinhala News Articles Using Corpus-Based Similarity Measures

Purnima Nanayakkara¹, Surangika Ranathunga¹

¹University of Moratuwa, Sri Lanka

Details

10:00 - 10:15 | Thu 31 May | Seminar Room | T.1.3-5

Session: Big Data, Machine Learning, and Cloud Computing

Abstract

News aggregators help readers handle large numbers of news items conveniently by collecting them in a single place and organizing them into meaningful groups. Such aggregators and clustering tools are available for English and some other widely used languages, but none are available for the Sinhala language. To address this gap, this paper presents a system that collects news articles published across the web and groups related articles using corpus-based similarity measures. Despite the simplicity of the technique and the morphological richness of Sinhala, we achieved very promising results that demonstrate the viability of the presented technique.
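
The abstract does not say which similarity measures or which clustering procedure the system uses, so the following is only an illustrative sketch of the general idea of grouping articles by a similarity score. It assumes TF-IDF weighted cosine similarity (term weights are derived from the whole collection), whitespace tokenization, a greedy single-pass grouping, and an arbitrary threshold of 0.3. All function names and parameters are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (a dict) per document.

    IDF is computed over the whole collection, so the weights depend on
    the corpus rather than on a single pair of documents.
    Assumption: whitespace tokenization, no stemming or stop-word removal.
    """
    tokenized = [doc.split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF, so that no term receives a zero or negative weight.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in doc_freq.items()}
    return [
        {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def cluster(docs, threshold=0.3):
    """Greedy single-pass grouping (an assumed procedure, not the paper's).

    Each article joins the first existing group whose first member is at
    least `threshold` similar to it; otherwise it starts a new group.
    Returns a list of groups, each a list of document indices.
    """
    vectors = tfidf_vectors(docs)
    groups = []
    for i, vec in enumerate(vectors):
        for members in groups:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            groups.append([i])
    return groups
```

Called on a list of article texts, `cluster` returns groups of article indices. For a morphologically rich language such as Sinhala, a real system would likely need language-specific tokenization or normalization before this step, and the threshold would have to be tuned on labeled data.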