We will have a talk tomorrow (Wednesday 18 April) in the meeting room.
::Guest:
Mirco Kocher
Computational Linguistics group
Computer Science Department, Université de Neuchâtel
::Title: Dynamic Thresholds for Author Clustering
::Abstract:
In author clustering, we are given a corpus of texts and want to group
them so that each cluster corresponds to a distinct author.
We propose to use a dynamic threshold for single-link hierarchical
clustering. First, we estimate the probability that each pair of texts
was written by the same person. To do so, we present an effective
feature selection strategy and compute the pairwise distances between
the documents in the collection. Assuming that each text's distances to
the other texts follow a Gaussian distribution, we can identify the
relatively shortest distances and assign them higher probabilities.
Comparisons with other systems submitted to PAN at CLEF show that this
approach achieves some of the highest clustering scores and also has
one of the shortest runtimes.
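The abstract does not give implementation details, so what follows is only a minimal Python sketch of the general idea, built on assumptions of our own rather than the speaker's actual system. Features are taken to be the relative frequencies of the corpus's most frequent words, and distances are L1 (Manhattan). Each text's distances to the others are standardised with that text's own mean and standard deviation, then turned into a same-author probability with the Gaussian CDF. Pairs above a fixed probability cut-off are linked, and the connected components of those links are exactly the single-link clusters at that level. The function names and the `n_terms` and `threshold` values are hypothetical.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def features(texts, n_terms=50):
    """Relative frequencies of the corpus's n_terms most frequent words (assumed feature set)."""
    tokenized = [t.lower().split() for t in texts]
    corpus_counts = Counter(w for toks in tokenized for w in toks)
    vocab = [w for w, _ in corpus_counts.most_common(n_terms)]
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = max(len(toks), 1)
        vectors.append([counts[w] / total for w in vocab])
    return vectors


def l1(u, v):
    """Manhattan distance between two feature vectors (assumed distance measure)."""
    return sum(abs(a - b) for a, b in zip(u, v))


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def same_author_probabilities(vectors):
    """Per-text Gaussian model: unusually short distances get probabilities near 1."""
    n = len(vectors)
    dist = [[l1(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]
    prob = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = [dist[i][j] for j in range(n) if j != i]
        mu = mean(others)
        sigma = pstdev(others) or 1e-9  # guard against all-equal distances
        for j in range(n):
            if j != i:
                prob[i][j] = 1.0 - normal_cdf((dist[i][j] - mu) / sigma)
    # Symmetrise by averaging the two texts' views of the same pair.
    return [[(prob[i][j] + prob[j][i]) / 2 for j in range(n)] for i in range(n)]


def cluster(texts, threshold=0.9, n_terms=50):
    """Single-link clustering: link pairs with probability >= threshold, return components."""
    n = len(texts)
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    if n > 1:
        prob = same_author_probabilities(features(texts, n_terms))
        for i in range(n):
            for j in range(i + 1, n):
                if prob[i][j] >= threshold:
                    parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    docs = ["a b c a", "a b c a", "x y z x", "x y z x"]
    print(cluster(docs))  # → [[0, 1], [2, 3]]
```

Because every text is measured against its own distance distribution, the single probability cut-off corresponds to a different distance cut-off for each text. This is one plausible reading of "dynamic threshold", not necessarily the one used in the talk.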