A framework for spatial clustering of textual objects: applications in topic clustering and text segmentation

Guillaume Guex

doi:10.26034/la.cdclsl.2025.8346

No. 69 (2025), Articles

Published 2025-06-27

Guillaume Guex

Université de Lausanne

How to cite

Guex, G. (2025). A framework for spatial clustering of textual objects: applications in topic clustering and text segmentation. Cahiers Du Centre De Linguistique Et Des Sciences Du Langage, (69), 73–96. https://doi.org/10.26034/la.cdclsl.2025.8346

Abstract

We present a general, classical framework for spatial clustering that can be applied to various textual objects (e.g. character n-grams, words, sentences). This framework clusters objects according to a user-defined linguistic similarity while preserving the spatial coherence of objects within clusters. Two methods are derived from this formalism: SpatialWord, which applies to word-tokens, and SpatialSent, which operates on sentences. Both balance the semantic similarity of objects against their position along the textual sequence. We show that these unsupervised methods, along with semi-supervised variants, can jointly perform two operations that methods in the literature often carry out separately: (1) the extraction of a desired number of topics from a document, along with a list of words to interpret them; and (2) the textual segmentation of the document reflecting these extracted topics. Case studies show that these methods perform competitively against state-of-the-art methods on baseline datasets.

This work is licensed under a Creative Commons Attribution 4.0 International license.
