* 683k clusters, comprising
* 820k comments
* there are a few larger clusters with 20 to 4000 comments (~2k, < 1% of all clusters). These take the bulk of the clustering time (n**2 algorithm), as they come from large sitesummaries.
* there are ~35k small clusters (2-20 comments, mostly 2 or 3)
* and a very long tail of single-comment "clusters"
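
A rough sketch of why the few large clusters dominate the clustering time. It assumes the n**2 cost means one comparison per unordered pair of comments within a cluster; the actual algorithm is not shown on this page, so the function name `pairwise_comparisons` and the sample sizes are illustrative only.

```python
def pairwise_comparisons(n: int) -> int:
    """Number of unordered pairs among n comments: n * (n - 1) / 2."""
    return n * (n - 1) // 2


# Hypothetical cluster sizes, picked from the ranges listed above
# purely to illustrate how the pair count scales.
for size in (1, 3, 20, 4000):
    print(size, pairwise_comparisons(size))
```

Under this assumption a single 4000-comment cluster needs 7,998,000 comparisons, while a 3-comment cluster needs 3 and a single-comment "cluster" needs none.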