Ensuring the quality of data packages in the LTER network data management system

O'Brien, Margaret, Costa, Duane, Servilla, Mark
Ecological informatics 2016 v.36 pp. 237-246
extensibility, information management, pasta, provenance
Many data analyses use automated workflows to ingest data from public repositories, and rely on data packages of high structural quality. The Long Term Ecological Research (LTER) Network now screens all packages entering its long-term archive to ensure completeness and quality, and to ascertain that metadata and data are structurally congruent, i.e., that the data typing and formats expressed in metadata agree with those found in data entities. The EML Congruence Checker (ECC) system is a component of the LTER Provenance Aware Synthesis Tracking Architecture (PASTA), and operates on data tables in packages described with Ecological Metadata Language using the EML Data Manager Library, written in Java. Checking is extensible for other data types and customizable via a template. Reports are retained as part of the submitted data package, and summaries here reflect the general usability of LTER data for a variety of purposes. On average in 2015, site-contributed data in the LTER catalog were 95% compliant (valid) with the current suite of checks.
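
To make the idea of structural congruence concrete, the following is a minimal sketch in Python. It is not the ECC or the EML Data Manager Library (which is written in Java); the names `ATTRIBUTES`, `SAMPLE_TABLE`, `value_matches`, and `check_congruence` are hypothetical, and the attribute list is a hand-written stand-in for what would normally be read from EML metadata. It illustrates only two kinds of check: that each row has the declared number of fields, and that each value parses as its declared type or date format.

```python
import csv
import io
from datetime import datetime

# Hypothetical attribute declarations standing in for what metadata would
# state about each column: (name, storage type, optional date format).
ATTRIBUTES = [
    ("site", "string", None),
    ("sample_date", "date", "%Y-%m-%d"),
    ("temperature_c", "float", None),
]

# Hypothetical data table with one badly formatted date and one short row.
SAMPLE_TABLE = (
    "site,sample_date,temperature_c\n"
    "A1,2015-06-01,12.5\n"
    "A2,06/02/2015,13.1\n"
    "A3,2015-06-03\n"
)


def value_matches(value, storage_type, fmt=None):
    """Return True if a raw text value parses as the declared type."""
    try:
        if storage_type == "float":
            float(value)
        elif storage_type == "integer":
            int(value)
        elif storage_type == "date":
            datetime.strptime(value, fmt)
        # "string" (and any type not listed above) accepts any text.
        return True
    except ValueError:
        return False


def check_congruence(table_text, attributes, header_lines=1):
    """Compare a comma-delimited table against declared attributes.

    Returns a list of findings; an empty list means these particular
    checks found no mismatch between declarations and data.
    """
    findings = []
    rows = list(csv.reader(io.StringIO(table_text)))
    first_data_row = header_lines + 1  # 1-based line number in the table
    for row_number, row in enumerate(rows[header_lines:], start=first_data_row):
        if len(row) != len(attributes):
            findings.append(
                f"row {row_number}: expected {len(attributes)} fields, "
                f"found {len(row)}"
            )
            continue  # field-level checks are unreliable on a short/long row
        for value, (name, storage_type, fmt) in zip(row, attributes):
            if not value_matches(value, storage_type, fmt):
                findings.append(
                    f"row {row_number}: '{value}' in '{name}' "
                    f"is not a valid {storage_type}"
                )
    return findings


for finding in check_congruence(SAMPLE_TABLE, ATTRIBUTES):
    print(finding)
```

In this sketch a report is simply a list of strings; a production checker would presumably record structured results with a status per check so that they can be stored alongside the data package, as the abstract describes.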