Biases of tree-independent-character-subsampling methods

Simmons, Mark P., Gatesy, John
Molecular phylogenetics and evolution 2016 v.100 pp. 424-443
data collection, sampling, simulation models, synapomorphy
Observed Variability (OV) and Tree Independent Generation of Evolutionary Rates (TIGER) are quick and easy-to-apply tree-independent methods that have been proposed to provide unbiased estimates of each character’s rate of evolution and serve as the basis for excluding rapidly evolving characters. Both methods have been applied to multiple phylogenomic datasets, and in many cases the authors considered their trees inferred from the OV- and TIGER-delimited sub-matrices to be better estimates of the phylogeny than their trees based on all characters. In this study we use four sets of simulations and an empirical phylogenomic example to demonstrate that both methods share a systematic bias against characters with more symmetric distributions of character states, against characters with greater observed character-state space, and against large clades in the context of character conflict. As a result these methods can favor convergences and reversals over synapomorphy, exacerbate long-branch attraction, and produce mutually exclusive phylogenetic inferences that are dependent upon differential taxon sampling. We assert that neither OV nor TIGER should be relied upon to increase the ratio of phylogenetic to non-phylogenetic signal in a data matrix. We also assert that skepticism is warranted for empirical phylogenetic results that are based on OV- and/or TIGER-based character deletion wherein a small clade is supported after deletion of characters, yet is contradicted by a larger clade when the entire data matrix was analyzed.
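
The bias against symmetric character-state distributions and against greater observed character-state space can be illustrated with a toy calculation. The sketch below is not the authors' code and not the published OV or TIGER implementation. It assumes, as a simplification not stated in the abstract, that an OV-style score for a character is the fraction of taxon pairs showing different states. The function name and the three eight-taxon example characters are hypothetical.

```python
from itertools import combinations


def pairwise_mismatch_score(states):
    """Fraction of taxon pairs whose states differ at one character.

    Assumption: this stands in for an OV-style variability score;
    it is an illustrative simplification, not the published method.
    """
    pairs = list(combinations(states, 2))
    mismatches = sum(1 for a, b in pairs if a != b)
    return mismatches / len(pairs)


# Hypothetical characters scored for eight taxa.
asymmetric = list("AAAAAATT")  # 6:2 split, could mark a small clade
symmetric = list("AAAATTTT")   # 4:4 split, could mark a large clade
four_state = list("AACCGGTT")  # four observed states, two taxa each

for name, char in [("asymmetric", asymmetric),
                   ("symmetric", symmetric),
                   ("four_state", four_state)]:
    print(f"{name}: {pairwise_mismatch_score(char):.3f}")
```

Both binary characters could arise from a single change on a tree, yet under this assumed score the 4:4 character ranks as more variable than the 6:2 character, and the four-state character ranks highest of all. A cutoff that deletes the highest-scoring characters would therefore tend to discard evenly split and multi-state characters first, whatever their actual rate of change.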