On May 11, 7:54 am, "ozgun.harmanci" <ozgun.harma...@[EMAIL PROTECTED]
> wrote:
> Hello,
> We have been clustering data to compare samples generated by two
> different methods. A method generates a sample x_1; we then cluster
> x_1 using diana in R and choose the optimal clustering by maximizing
> the Calinski-Harabasz index (as calculated by R). diana (divisive
> analysis) is a hierarchical divisive clustering method that computes
> a tree, or dendrogram.
>
> Our hypothesis is that one method should generate data that is less
> scattered, meaning that cluster analysis should yield fewer
> clusters.
>
> However, when we cluster the generated samples, we see no clear
> difference in the number of clusters between the two methods. But if
> I look at the trees generated by diana, it is obvious to me that the
> method we expect to have fewer clusters has less spread in the tree.
>
> I am thinking that we should also use the variance of the data within
> the clusters, in addition to the number of clusters, to compare the
> sampling methods. However, I could not find a theoretical basis for
> doing that. Could you suggest ideas, papers, or books to follow up on
> this problem?
>
> I hope this makes sense.
> Arif.
You need a good definition of "less scattered" and then compare based
on that definition. For example, would comparing the variance work?
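
To make the variance idea concrete, here is a minimal sketch in Python (not R, and not tied to diana). It assumes "scatter" means either the total variance of a sample (the sum of per-coordinate sample variances) or the pooled within-cluster variance given some cluster labels. The function names and the toy data are made up for illustration; they are not from your analysis.

```python
from statistics import fmean


def total_variance(points):
    """Sum of per-coordinate sample variances (trace of the covariance matrix)."""
    n = len(points)
    dims = len(points[0])
    means = [fmean(p[d] for p in points) for d in range(dims)]
    ss = sum((p[d] - means[d]) ** 2 for p in points for d in range(dims))
    return ss / (n - 1)


def pooled_within_variance(points, labels):
    """Within-cluster sum of squares divided by (n - k).

    Assumes there are more points than clusters, so n - k > 0.
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    n, k = len(points), len(clusters)
    wss = 0.0
    for members in clusters.values():
        dims = len(members[0])
        centroid = [fmean(m[d] for m in members) for d in range(dims)]
        wss += sum((m[d] - centroid[d]) ** 2 for m in members for d in range(dims))
    return wss / (n - k)


# Hypothetical toy samples: same shape, different spread.
tight = [(0, 0), (0, 2), (2, 0), (2, 2)]
wide = [(0, 0), (0, 4), (4, 0), (4, 4)]
print(total_variance(tight) < total_variance(wide))

# Hypothetical two-cluster sample with labels from any clustering method.
sample = [(0, 0), (0, 2), (10, 0), (10, 2)]
labels = [0, 0, 1, 1]
print(pooled_within_variance(sample, labels))
```

Whether either number is the right definition of "less scattered" for your data is a separate question; the sketch only shows that once a definition is fixed, the comparison between the two methods is mechanical.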
Aniko

