Towards integration of data-driven agronomic experiments with data provenance
- Cruz, Sérgio Manuel Serra da, Nascimento, José Antonio Pires do
- Computers and electronics in agriculture 2019 v.161 pp. 14-28
- computer simulation, computer software, concrete, metadata, models, provenance, research projects
- With improvements in computing and communications, the amount of scientific data in agriculture has been exploding. Thus, researchers must rely on computational simulations to model data-driven in silico agronomic experiments, that is, experiments executed entirely by means of computer models. Reproducibility, transparency, and independent verification are major features of science. However, even agricultural research of exemplary quality may have irreproducible empirical findings because of random or systematic error. Funding agencies, researchers, and reviewers are demanding improved processes and the use of open data to increase the reproducibility of such experiments. Currently, there are no scientific criteria to evaluate the integration of data-driven agronomic experiments with data provenance. We propose RFlow, a framework that helps researchers manage, share, and enact the in silico experiments of research projects that use reusable R scripts. The framework uses open data standards and transparently captures the provenance of agronomic experiments. RFlow is non-intrusive, can be connected to workflow systems, and does not require researchers to change their way of working. Our computational experiments show that the framework can collect provenance metadata and enrich a scientific project. This study shows how RFlow can serve as the primary integration platform for statistical systems such as R, with implications for other data- and compute-intensive agronomic projects. As a proof of concept, we demonstrate the concrete effectiveness and expressive power of RFlow, evaluated through a set of data-driven agronomic in silico experiments and provenance SQL queries that exemplify the kind of information gathered.
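The abstract describes capturing the provenance of R-script runs and then querying it with SQL. The following is a minimal sketch of that general idea only, using Python's built-in `sqlite3` module. The table layout, the file and script names, and the helpers `record_run` and `lineage` are hypothetical illustrations. They are not RFlow's actual schema, API, or queries. The `lineage` function uses a recursive SQL query to walk backwards from a result file to every script execution that contributed to it.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical, simplified provenance schema (NOT RFlow's actual schema):
# one row per script execution, plus the files each execution read and wrote.
SCHEMA = """
CREATE TABLE execution (
    id INTEGER PRIMARY KEY,
    script TEXT NOT NULL,
    started_at TEXT NOT NULL
);
CREATE TABLE artifact (
    execution_id INTEGER NOT NULL REFERENCES execution(id),
    path TEXT NOT NULL,
    role TEXT NOT NULL CHECK (role IN ('input', 'output'))
);
"""


def record_run(conn, script, inputs, outputs):
    """Store one script execution with the files it used and produced."""
    cur = conn.execute(
        "INSERT INTO execution (script, started_at) VALUES (?, ?)",
        (script, datetime.now(timezone.utc).isoformat()),
    )
    run_id = cur.lastrowid
    rows = [(run_id, p, "input") for p in inputs]
    rows += [(run_id, p, "output") for p in outputs]
    conn.executemany(
        "INSERT INTO artifact (execution_id, path, role) VALUES (?, ?, ?)", rows
    )
    return run_id


def lineage(conn, path):
    """Provenance query: every script whose outputs fed, directly or
    indirectly, into `path`, ordered by execution id."""
    rows = conn.execute(
        """
        WITH RECURSIVE upstream(execution_id) AS (
            -- start from the execution(s) that wrote the requested file
            SELECT execution_id FROM artifact
            WHERE path = ? AND role = 'output'
            UNION
            -- then follow each input back to the execution that produced it
            SELECT prod.execution_id
            FROM upstream u
            JOIN artifact inp
              ON inp.execution_id = u.execution_id AND inp.role = 'input'
            JOIN artifact prod
              ON prod.path = inp.path AND prod.role = 'output'
        )
        SELECT e.script FROM execution e
        JOIN upstream u ON u.execution_id = e.id
        ORDER BY e.id
        """,
        (path,),
    )
    return [r[0] for r in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Two hypothetical steps of an agronomic analysis pipeline.
    record_run(conn, "clean_yield.R", ["field_trial.csv"], ["yield_clean.csv"])
    record_run(conn, "fit_model.R", ["yield_clean.csv"], ["model_summary.txt"])
    print(lineage(conn, "model_summary.txt"))  # ['clean_yield.R', 'fit_model.R']
```

Using `UNION` rather than `UNION ALL` in the recursive query discards duplicate rows, which keeps the traversal from looping if a file is ever both read and rewritten by the same chain of scripts.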