Nutrient estimation from 24-hour food recalls using machine learning and database mapping: a case study with lactose

Chin Elizabeth Laura Su Mei, Simmons Gabriel, Bouzid Yasmine Y., Kan Annie, Burnett Dustin J., Tagkopoulos Ilias, Lemay Danielle G.
Nutrients 2019 v.11 no.12 pp. 1-27
algorithms, artificial intelligence, automation, case studies, computational methodology, data collection, diet recall, food recalls, foods, lactose, models, nutrient databanks, nutrients, nutrition assessment, prediction
The Automated Self-Administered 24-Hour Dietary Assessment Tool (ASA24) is a free dietary recall system that outputs fewer nutrients than the Nutrition Data System for Research (NDSR). NDSR uses the Nutrition Coordinating Center (NCC) Food and Nutrient Database; both require a license. Manually looking up ASA24 foods in NDSR is time-consuming but is currently the only way to acquire NCC-exclusive nutrients. Using lactose as an example, we evaluated machine learning and database matching methods to estimate this NCC-exclusive nutrient from ASA24 reports. ASA24-reported foods were manually looked up in NDSR to obtain lactose estimates and split into training (n = 378) and test (n = 189) datasets. Nine machine learning models were developed to predict lactose from the nutrients common to ASA24 and the NCC database. Database matching algorithms were developed to match NCC foods to an ASA24 food using only nutrients (“nutrient-only”) or both nutrients and food descriptions (“nutrient+text”). For both methods, the lactose predictions were compared to the manual curation. Among machine learning models, Bounded-LASSO and Bounded-Ridge performed best on held-out test data (R² = 0.52 and 0.50, respectively). For the database matching method, nutrient+text matching yielded the best lactose estimates (R² = 0.76), a vast improvement over the status quo of no estimate. These results suggest that computational methods can successfully estimate an NCC-exclusive nutrient for foods reported in the ASA24.
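
The database matching idea described above can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify; it assumes a plain Euclidean distance over shared nutrients and a Jaccard word overlap for the text component. The reference table, the nutrient values, the function names, and the `text_weight` parameter are all hypothetical and invented for illustration only.

```python
import math

# Toy reference table standing in for a licensed nutrient database.
# All names and values are invented for illustration (per 100 g).
REFERENCE_FOODS = [
    {"name": "milk whole",
     "nutrients": {"energy": 61.0, "protein": 3.2, "fat": 3.3, "carb": 4.8},
     "lactose": 4.8},
    {"name": "cheese cheddar",
     "nutrients": {"energy": 403.0, "protein": 25.0, "fat": 33.0, "carb": 1.3},
     "lactose": 0.2},
    {"name": "soy milk plain",
     "nutrients": {"energy": 54.0, "protein": 3.3, "fat": 1.8, "carb": 6.0},
     "lactose": 0.0},
]


def nutrient_distance(a, b):
    """Euclidean distance over the nutrients present in both records."""
    shared = set(a) & set(b)
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in shared))


def text_similarity(a, b):
    """Jaccard overlap of lowercase word tokens (0 = disjoint, 1 = identical)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def match_food(query, reference, use_text=False, text_weight=10.0):
    """Return the reference food with the lowest matching score.

    Nutrient-only: score is the nutrient distance.
    Nutrient+text: the distance is reduced by text_weight * text similarity
    (an assumed way to combine the two signals).
    """
    def score(ref):
        s = nutrient_distance(query["nutrients"], ref["nutrients"])
        if use_text:
            s -= text_weight * text_similarity(query["name"], ref["name"])
        return s

    return min(reference, key=score)


# Hypothetical recalled food whose nutrient profile sits close to soy milk.
query = {"name": "milk whole 3.25%",
         "nutrients": {"energy": 56.0, "protein": 3.3, "fat": 2.2, "carb": 5.6}}

nutrient_only = match_food(query, REFERENCE_FOODS)
nutrient_text = match_food(query, REFERENCE_FOODS, use_text=True)
print(nutrient_only["name"], nutrient_only["lactose"])
print(nutrient_text["name"], nutrient_text["lactose"])
```

In this toy case the nutrient-only match lands on the soy product because its macronutrients happen to be closer, while adding the description pulls the match back to a dairy food. A real pipeline would likely standardize nutrient scales first, since unscaled energy values dominate a raw Euclidean distance.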