A framework for spatial clustering of textual objects: applications in topic clustering and text segmentation

How to cite

Guex, G. (2025). A framework for spatial clustering of textual objects: applications in topic clustering and text segmentation. Cahiers Du Centre De Linguistique Et Des Sciences Du Langage, (69), 73–96. https://doi.org/10.26034/la.cdclsl.2025.8346

Abstract

We present a general, classical framework for spatial clustering that can be applied to various textual objects (e.g., character n-grams, words, sentences). The framework clusters objects according to a user-defined linguistic similarity while keeping objects spatially coherent within clusters. Two methods are derived from this formalism: SpatialWord, which applies to word-tokens, and SpatialSent, which operates on sentences. Both balance the semantic similarity of objects against their position along the textual sequence. We show that these unsupervised methods, along with semi-supervised variants, can jointly perform two operations that methods in the literature often carry out separately: (1) extracting a desired number of topics from a document, along with a list of words to interpret them; and (2) segmenting the document so that the segments reflect these extracted topics. Case studies show that these methods perform competitively against state-of-the-art methods on baseline datasets.

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International license.