The “dark matter” of nutrition – just what are you eating?

Phytochemicals and other molecules in foods can have many health benefits; but what might they be and by what mechanism? Much of this is still poorly understood due to the presence of nutrition "dark matter" - spectral data that cannot be assigned to known molecules. Parnell and colleagues have investigated this dark matter using a computational prediction tool called PhyteByte; the results of which have now been published in BMC Bioinformatics. He explains more about this work here.

Imagine digging into a bowl of strawberries. They are in season and make for a delicious, healthy snack. The deep red color and the aroma of the sliced berries are striking. That’s pelargonidin 3-glucoside (1) and methoxyfuraneol (2) captivating the senses. There’s gamma-decalactone (3) in those berries too. In fact, there are hundreds of other biochemical compounds naturally present in this fruit. The situation is no different for other foods like turmeric, ginger, garlic, romaine lettuce, chocolate, coffee, and even wine.


Some compounds found in strawberries include pelargonidin 3-glucoside (1, PubChem ID 443648), methoxyfuraneol (2, 53929577) and gamma-decalactone (3, 12813).

But what health effects can these compounds produce? The answers are many, of course, because the catalog of compounds in any one food is long and diverse, and testing every compound experimentally is prohibitive. However, some answers can arise from putting rigorous computational methods to the task of exploring nutrition’s “dark matter.”

The dark matter in nutrition

In metabolomics, this “dark matter” is the massive amount of spectral data that cannot be assigned easily to known molecules. As relating to nutrition, this notion of dark matter recently has been explored in two ways. First, there are efforts to expand the catalogs of compounds present in raw, fermented, processed, stored, and metabolized food. With this foundation, the second exploratory step is naturally to identify the biological functions of each compound. To make progress in step two, we elected to use the well-characterized universe of pharmaceutical compounds as an anchor to predictions about food compounds. Specifically, we asked the question: “Can we predict biological function of a compound by identifying structurally similar pharmacological agents?” And so, the design and coding of PhyteByte began.

What PhyteByte does

PhyteByte is software built to predict biological effects of food compounds.

PhyteByte is software built to predict biological effects of food compounds. The implementation is simple: the end-user inputs a protein target, and PhyteByte identifies food compounds with a relatively high likelihood of interacting with the target, thus predicting biological effects. PhyteByte leverages a random forest classifier to perform much of the heavy lifting. This model is trained to identify pharmaceutical compounds from databases like ChEMBL. Compounds are encoded as “fingerprints” – simple bit vectors that store complex cheminformatics sub-structure information. Our model learns the sub-structures that it is looking for – for a given target – and generalizes this to identify novel food compounds in our test dataset.

Importantly, PhyteByte is a prediction tool. Its output is a list of molecules and their source foods that are very likely to have specific biological effects, but it is not a tool to devise a meal plan or provide for dietary recommendations.

The road ahead for computational nutrition

PhyteByte is just one of numerous examples of new computational methods – like complex systems theory and machine learning – being layered atop genomics, metabolomics, and genetics data in nutrition research. In fact, one can make the point that computational nutrition is merely the natural offspring of the nutrition-omics marriage.

Computational nutrition likely will develop in several ways. One, given a phenotype or protein of interest, foods are proposed that may affect that phenotype or protein, and done in a targeted, macronutrient-free manner. Two, more detailed epidemiological and food frequency questionnaire data can support tests for connections between certain foods and health outcomes. Here, the foods under study contain compounds that potentially offer narrowly defined health benefits based on similarity to known drugs. Three, such computationally derived pharmacological-food compound links can inform drug development. For example, drug efficacy studies could instruct volunteers to avoid specific foods that contain compounds potentially exhibiting physiological effects like the drug.

Importantly, while some of the above scientific endeavors have been woven into nutrition research for some time, all of what is described here requires that efforts be put forth at a much grander scale. And, for such approaches to be most effective, there must occur a concerted, broad effort to identify and accurately quantify the roughly 2000 different compounds in each food, a theme central to the emerging foodomics field.


View the latest posts on the BMC Series blog homepage