Cluster Analyses for Analyzing Two-Way Classification Data

Lin, C. S., Butler, G.
Agronomy journal 1990 v.82 no.2 pp. 344-348
genotype-environment interaction, analysis of variance, regression analysis, algorithms, data analysis, genotype, cluster analysis
The interaction structure of two-way classification data often can be identified if the data are stratified into homogeneous subsets. Four cluster methods, two new and two originally developed for investigating genotype × environment (GE) interaction, are proposed for this purpose. The four methods differ in their dissimilarity indices, depending on whether the regression model or the ANOVA model is used, and on whether similarity is specified with respect to the GE interaction alone or with respect to the genetic effect and GE interaction combined. The same cluster algorithm, averaging the dissimilarity indices between all pairs of individuals in the two groups, is used for all four. A unique feature of these methods is that the dissimilarity index defined or constructed at any cluster cycle is equivalent to the mean square of the respective ANOVA for the grouped genotypes. This direct link between the cluster analysis and conventional ANOVA provides a convenient way of determining the cutoff point, based on the F-ratio of the smallest dissimilarity index to the error estimate. When the calculated F-ratio exceeds the tabular F-value, the cluster process should be stopped. Since the smallest dissimilarity index at each cluster cycle is monotonically increasing, this stopping rule assures that all individuals within the groups are homogeneous with respect to the defined characteristics. The stratification of data sets by these methods leads to the revelation of complex data structures, as demonstrated by several works cited.
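
The clustering cycle and stopping rule described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes one plausible ANOVA-type index (the GE interaction mean square of the two-genotype subtable), averages that index over all between-group pairs, and stops when the ratio of the smallest index to the error estimate exceeds a critical F-value supplied by the caller. The function names, the example data, `error_ms`, and `f_critical` are all hypothetical, and the abstract does not specify how the degrees of freedom for the tabular F-value are chosen.

```python
from itertools import combinations


def interaction_ms(a, b):
    """GE interaction mean square of the 2 x q subtable formed by rows a and b.

    Assumed index for this sketch: sum((d_k - mean(d))^2) / (2 * (q - 1)),
    where d_k is the difference between the two genotypes in environment k.
    """
    q = len(a)
    diffs = [x - y for x, y in zip(a, b)]
    mean_diff = sum(diffs) / q
    return sum((d - mean_diff) ** 2 for d in diffs) / (2 * (q - 1))


def cluster_genotypes(table, error_ms, f_critical):
    """Merge genotypes until the smallest index / error_ms exceeds f_critical.

    table: dict mapping genotype name -> list of means, one per environment.
    error_ms and f_critical are supplied by the caller (hypothetical inputs).
    """
    names = list(table)
    # Pairwise dissimilarity indices between individual genotypes.
    pair_index = {
        frozenset(pair): interaction_ms(table[pair[0]], table[pair[1]])
        for pair in combinations(names, 2)
    }
    groups = [(name,) for name in names]

    def group_dissimilarity(g1, g2):
        # Average of the indices over all pairs, one member from each group.
        values = [pair_index[frozenset((i, j))] for i in g1 for j in g2]
        return sum(values) / len(values)

    while len(groups) > 1:
        smallest, g1, g2 = min(
            ((group_dissimilarity(a, b), a, b) for a, b in combinations(groups, 2)),
            key=lambda item: item[0],
        )
        # Stopping rule: F-ratio of the smallest index to the error estimate.
        if smallest / error_ms > f_critical:
            break
        groups = [g for g in groups if g not in (g1, g2)] + [g1 + g2]
    return groups


# Made-up data: A and B respond in parallel across environments, as do C and D.
table = {
    "A": [10, 12, 14],
    "B": [11, 13, 15],
    "C": [10, 14, 10],
    "D": [12, 16, 12],
}
groups = cluster_genotypes(table, error_ms=1.0, f_critical=3.0)
print(groups)  # [('A', 'B'), ('C', 'D')]
```

In this made-up example the two parallel pairs merge at an index of zero, and the process stops when the only remaining merge would give a ratio of about 4.67, above the assumed critical value of 3.0.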