
Multi-element comparisons of tapes evidence using dimensionality reduction for calculating likelihood ratios

Gupta, Anjali; Martinez-Lopez, Claudia; Curran, James M.; Almirall, Jose R.
Forensic Science International, 2019, v. 301, pp. 426-434
data collection, forensic sciences, lead, models, normal distribution, principal component analysis, stable isotopes, weight-of-evidence
Computing the likelihood ratio (LR) as a measure of the weight of evidence has traditionally been difficult for multi-element evidence. The forensic community has adopted a solution based on multivariate random effects models. However, this solution is unstable and tends toward extreme values, and the problem grows as the number of variables increases. In this study, we reduce the dimensionality of the problem using principal component analysis (PCA), followed by a post-hoc calibration step suggested by van Es et al. [1]. We evaluate the performance of this method on multi-element data collected from electrical tapes, with up to 18 elements measured.

A set of 90 tapes known to originate from different sources was analyzed by LA-ICP-MS. We used the additive log-ratio transformation with respect to the ²⁰⁸Pb signal to transform the 18-dimensional data into 17 log-ratios. This transformation changed the scale of the signals. More importantly, the transformed signals exhibited characteristics similar to a normal distribution. To compare the tapes, we used the scores of the first five principal components (PCs) as input to the LR formula given by Aitken and Lucy [2], assuming a multivariate normal between-source distribution (LR MVN).

The calculated LRs were extreme, with log LRs far above and far below zero, and did not conform to the definition of well-calibrated LRs. We therefore calibrated the likelihood ratios with the post-hoc method of van Es et al. [1], which brought the calibrated LRs into an appropriate range.

The study included five scenarios, each defined by the number of principal components used to compare the samples. The first scenario used only the first PC, the second used the first two PCs together, and so on. The last scenario, LR5, used the first five PCs.
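The pipeline described above consists of three steps: an additive log-ratio transform, PCA, and a normal between-source LR computed on the first k PC scores. It can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' code:

- The synthetic data, the function names (`alr`, `fit_pca`, `mvn_logpdf`, `fit_random_effects`, `log10_lr`) and the balanced replicate design are all hypothetical.
- The last column merely stands in for ²⁰⁸Pb.
- The LR is written as a ratio of Gaussian densities for a normal-normal random-effects model. It should be algebraically equivalent to the multivariate normal form of Aitken and Lucy's formula, but it is not transcribed from the paper.
- The van Es et al. calibration step is not included.

```python
import numpy as np


def alr(signals: np.ndarray, ref_col: int) -> np.ndarray:
    """Additive log-ratio transform: log(x_j / x_ref) for every column j != ref_col."""
    logs = np.log(signals)
    return np.delete(logs - logs[:, [ref_col]], ref_col, axis=1)


def fit_pca(X: np.ndarray):
    """Return column means, principal axes (rows, by decreasing variance), explained fractions."""
    mean = X.mean(axis=0)
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt, s**2 / np.sum(s**2)


def mvn_logpdf(x, mean, cov):
    """Log density of a multivariate normal, via slogdet and a linear solve."""
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (len(x) * np.log(2 * np.pi) + logdet + maha)


def fit_random_effects(scores, labels):
    """Estimate overall mean mu, within-source U and between-source C (balanced design assumed)."""
    sources = np.unique(labels)
    src_means = np.array([scores[labels == s].mean(axis=0) for s in sources])
    resid = np.vstack([scores[labels == s] - src_means[i] for i, s in enumerate(sources)])
    m = len(sources)
    n = len(scores) // m  # replicates per source
    U = resid.T @ resid / (len(scores) - m)  # pooled within-source covariance
    dev = src_means - src_means.mean(axis=0)
    C = dev.T @ dev / (m - 1) - U / n  # moment estimate; may need a PD fix on real data
    return src_means.mean(axis=0), U, C


def log10_lr(y1, y2, mu, U, C):
    """log10 LR comparing two groups of replicate measurements (rows = replicates)."""
    m1, m2 = y1.mean(axis=0), y2.mean(axis=0)
    D1, D2 = U / len(y1), U / len(y2)
    # Hp: both means share one source mean theta ~ N(mu, C), so they covary through C.
    joint_cov = np.block([[C + D1, C], [C, C + D2]])
    num = mvn_logpdf(np.concatenate([m1, m2]), np.concatenate([mu, mu]), joint_cov)
    # Hd: the two means come from independent sources.
    den = mvn_logpdf(m1, mu, C + D1) + mvn_logpdf(m2, mu, C + D2)
    return (num - den) / np.log(10)


# Hypothetical synthetic data: 30 sources, 6 replicates each, 18 element signals.
rng = np.random.default_rng(0)
n_sources, n_reps, n_elements = 30, 6, 18
theta = rng.normal(0.0, 1.0, size=(n_sources, n_elements))  # true log-signal per source
noise = rng.normal(0.0, 0.01, size=(n_sources * n_reps, n_elements))
logs = np.repeat(theta, n_reps, axis=0) + noise  # replicate measurements
labels = np.repeat(np.arange(n_sources), n_reps)
Z = alr(np.exp(logs), ref_col=n_elements - 1)  # last column stands in for 208Pb

mean, axes, explained = fit_pca(Z)
for k in range(1, 6):  # scenario using the first k PCs (k = 5 corresponds to LR5)
    scores = (Z - mean) @ axes[:k].T
    mu, U, C = fit_random_effects(scores, labels)
    same = log10_lr(scores[0:3], scores[3:6], mu, U, C)  # two halves of source 0
    diff_src = log10_lr(scores[0:3], scores[6:9], mu, U, C)  # source 0 vs source 1
    print(k, round(float(explained[:k].sum()), 3), round(float(same), 1), round(float(diff_src), 1))
```

With within-source noise this small relative to the between-source spread, the uncalibrated different-source log10 LRs typically come out far below zero. This illustrates why a calibration step is needed. PCA is done with a plain SVD of the centred data, so `explained[:k].sum()` is the fraction of variance each scenario retains. On real data, the moment estimate of `C` should be checked for positive definiteness.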

Comparing the results of these five scenarios showed how the sensitivity of the method depends on the percentage of information used in the comparisons. The LR5 scenario gave the lowest false exclusion (Type I) and false inclusion (Type II) error rates of all the scenarios: a false inclusion rate of 3.7% and a false exclusion rate of 2.2%, using only 5 of the 17 PCs. The false exclusion rate of 2.2% means that only two same-source comparisons had LR < 1. The proposed method overcomes the problem of using high-dimensional data for the comparisons while still using a high percentage of the information present in the original data.
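The error rates above can be computed directly from the log10 LRs of the same-source and different-source comparisons. The following is a minimal sketch in plain Python. The function name `error_rates` and the toy inputs are hypothetical. The count of 90 same-source comparisons is an assumption inferred from the stated 2.2% (two comparisons out of about 90, one per tape); the abstract does not state it.

```python
from typing import Sequence


def error_rates(same_source: Sequence[float], diff_source: Sequence[float]) -> tuple[float, float]:
    """Return (false exclusion rate, false inclusion rate) from log10 LRs.

    False exclusion (Type I): a same-source comparison with LR < 1 (log10 LR < 0).
    False inclusion (Type II): a different-source comparison with LR > 1 (log10 LR > 0).
    An LR of exactly 1 is counted as neither.
    """
    false_excl = sum(v < 0 for v in same_source) / len(same_source)
    false_incl = sum(v > 0 for v in diff_source) / len(diff_source)
    return false_excl, false_incl


# Hypothetical example: 90 same-source comparisons, two of which give LR < 1.
same = [1.5] * 88 + [-0.4, -1.2]
diff = [-3.0, -6.5, 0.3, -2.2]  # toy different-source log10 LRs
fe, fi = error_rates(same, diff)
print(f"false exclusion {100 * fe:.1f}%, false inclusion {100 * fi:.1f}%")
# prints: false exclusion 2.2%, false inclusion 25.0%
```

Working on the log10 scale makes the LR = 1 threshold a simple sign test.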