4.3.11. Molecular epidemiology - II. The exposome and internal molecular markers

 

(draft)

Authors: Karen Vrijens and Michelle Plusquin

Reviewers: Frank Van Belleghem,

 

Learning objectives

You should be able to

 

 

Exposome

The exposome concept was described by Christopher Wild in 2005 as a measure of all life-long human exposures, including the process by which these exposures relate to health. An important aim of the exposome is to explain how non-genetic exposures contribute to the onset or development of major chronic diseases. The concept represents the totality of exposures from three broad domains, i.e. internal, specific external and general external (Figure 1) (Wild, 2012). The internal exposome includes processes such as metabolism, endogenous circulating hormones, body morphology, physical activity, gut microbiota, inflammation, and aging. The specific external exposures include diverse agents, for example radiation, infections, chemical contaminants and pollutants, diet, lifestyle factors (e.g. tobacco, alcohol) and medical interventions. The wider social, economic and psychological influences on the individual make up the general external exposome, which includes, but is not limited to, social capital, education, financial status, psychological stress, the urban-rural environment and climate1.

 

Figure 1. The exposome consists of 3 domains: the general external, the specific external and the internal exposome.

 

The exposome is a theoretical concept in which the three domains overlap; this description nevertheless serves to illustrate its full breadth. The exposome model is characterized by the application of a wide range of tools from rapidly developing fields. Advances in exposure monitoring via wearables, in modelling and in internal biological measurements have recently been developed and implemented to estimate lifelong exposures2-4. As these approaches generate extensive amounts of data, statistical and data science frameworks are needed to analyze the exposome. Besides several biostatistical advances that combine multiple levels of exposures, biological responses and layers of personal characteristics, machine learning algorithms are being developed to fully exploit the collected data5,6.

The exposome concept clearly illustrates the complexity of the environment to which humans are exposed, and how this can impact human health. There is a need for internal biomarkers of exposure (see section on Human biomonitoring) as well as biomarkers of effect to disentangle the complex interplay between several exposures that may occur simultaneously and at different concentrations throughout life. Advances in biomedical sciences and molecular biology that collect holistic information on the epigenome, transcriptome (see section on Gene expression), metabolome (see section on Metabolomics), etc. are at the forefront of identifying biomarkers of exposure as well as of effect.

 

Internal molecular markers of the exposome

Meet in the middle model

To determine the health effect of environmental exposure, markers that can detect early changes before disease arises are essential and can be implemented in preventative medicine. These types of markers can be seen as intermediate biomarkers of effect, and their discovery relies on large-scale studies at different levels of biology (transcriptomics, genomics, metabolomics). The term “omics” refers to the quantitative measurement of global sets of molecules in biological samples using high throughput techniques (i.e. automated experiments that enable large scale repetition)7, in combination with advanced biostatistics and bioinformatics tools8. Given the availability of data from high-throughput omics platforms, together with reliable measurements of external exposures, the use of omics enhances the search for markers playing a role in the biological pathway linking exposure to disease risk.

The meet-in-the-middle (MITM) concept was suggested as a way to address the challenge of identifying causal relationships linking exposures and disease outcome (Figure 2). The first step of this approach consists of investigating the association between exposure and biomarkers of exposure. The next step consists of studying the relationship between (biomarkers of) exposure and intermediate omics biomarkers of early effects. In the third step, the relation between the disease outcome and intermediate omics biomarkers is assessed. The MITM stipulates that the causal nature of an association is reinforced if it is found in all three steps. Molecular markers that indicate susceptibility to certain environmental exposures are beginning to be uncovered and can aid in targeted prevention strategies. This approach is heavily dependent on new developments in molecular epidemiology, in which molecular biology is merged into epidemiological studies. Below, the different levels of molecular biology currently studied to identify markers of exposure and effect are discussed in detail.

 

Figure 2. The meet in the middle approach. Biological samples are examined to identify molecules that represent intermediate markers of early effect. These can then be used to link exposure measures or markers with disease endpoints. Figure adapted from Vineis & Perera (2007).
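The three MITM steps described above can be illustrated with a minimal sketch in Python. Everything in it is an assumption made for illustration: the data are simulated, the function names (pearson, meet_in_the_middle) are hypothetical, and a simple correlation cut-off of 0.2 stands in for the regression models with confounder adjustment that a real study would use. It only shows the logic that an association is considered reinforced when it is observed in all three steps.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def meet_in_the_middle(exposure, exposure_marker, omics_marker, outcome,
                       threshold=0.2):
    """Check the three MITM steps with a crude correlation cut-off.

    The threshold is an arbitrary illustrative value, not a formal test.
    """
    steps = {
        "exposure ~ biomarker of exposure": pearson(exposure, exposure_marker),
        "biomarker of exposure ~ omics marker": pearson(exposure_marker, omics_marker),
        "omics marker ~ disease outcome": pearson(omics_marker, outcome),
    }
    supported = all(abs(r) >= threshold for r in steps.values())
    return steps, supported


# Simulated cohort (hypothetical): each layer depends on the previous one.
rng = random.Random(42)
n_subjects = 500
exposure = [rng.gauss(0, 1) for _ in range(n_subjects)]
exposure_marker = [e + rng.gauss(0, 0.5) for e in exposure]
omics_marker = [0.8 * m + rng.gauss(0, 0.5) for m in exposure_marker]
outcome = [0.8 * o + rng.gauss(0, 0.5) for o in omics_marker]

# A candidate omics marker that is pure noise, unrelated to exposure or outcome.
noise_marker = [rng.gauss(0, 1) for _ in range(n_subjects)]

steps_true, supported_true = meet_in_the_middle(
    exposure, exposure_marker, omics_marker, outcome)
steps_noise, supported_noise = meet_in_the_middle(
    exposure, exposure_marker, noise_marker, outcome)

for name, r in steps_true.items():
    print(f"{name}: r = {r:.2f}")
print("intermediate marker supported in all three steps:", supported_true)
print("noise marker supported in all three steps:", supported_noise)
```

In this simulation the marker that lies on the pathway passes all three steps, whereas the noise marker fails the second and third steps and would not be retained as an intermediate biomarker.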

 

Levels

Intermediate biomarkers can be identified as measurable indicators of certain biological states at different levels of the cellular machinery, and vary in their response time, duration, site and mechanism of action. Different molecular markers might be preferred depending on the exposure(s) under study.  

 

Gene expression

Changes at the mRNA level can be studied following a candidate approach, in which mRNAs with a biological role suspected to be involved in the molecular response to a certain type of exposure (e.g. inflammatory mRNAs in the case of exposure to tobacco smoke) are selected a priori and measured using quantitative PCR technology. Alternatively, they can be studied at the level of the whole genome by means of microarray analyses or Next Generation Sequencing technology10. Changes at the transcriptome level are studied by analysing the totality of RNA molecules present in a cell type or sample.

 

Both types of studies have proven their utility in molecular epidemiology. About a decade ago the first study was published reporting on candidate gene expression profiles that were associated with exposure to diverse carcinogens11. Around the same time, the first studies on transcriptomics were published, including transcriptomic profiles for a dioxin-exposed population12, profiles associated with diesel-exhaust exposure13, and comparisons of smokers versus non-smokers both in blood14 and in airway epithelium cells15. More recently, attention has focused on prenatal exposures in association with transcriptomic signatures, as this fits within the scope of the exposome concept. Transcriptomic profiles have thus been described in association with maternal smoking, assessed in placental tissue16, and with particulate matter exposure, assessed in cord blood samples17.

Epigenetics

Epigenetics refers to all heritable changes in gene expression that do not affect the DNA sequence itself. The most widely studied epigenetic mechanism in environmental epidemiology to date is DNA methylation, the process in which methyl groups are added to a DNA sequence. These methylation changes can alter the expression of a DNA segment without altering its sequence. DNA methylation can be studied by a candidate gene approach using a digestion-based design or, more commonly, bisulfite conversion followed by pyrosequencing, methylation-specific PCR or a bead array. Bisulfite treatment of DNA mediates the deamination of cytosine into uracil, and these converted residues are read as thymine after PCR amplification and sequencing. However, 5-methylcytosine (5mC) residues are resistant to this conversion and are still read as cytosine (Figure 3).

 

Figure 3. A. Restriction-digest based design. A methylated (CH3) region of genomic DNA is digested with either of two restriction enzymes, one that is blocked by CpG methylation (HpaII) and one that is not (MspI). Smaller fragments are discarded (X), enriching for methylated DNA in the HpaII-treated sample. B. Bisulfite conversion of DNA. DNA is denatured and then treated with sodium bisulfite to convert unmethylated cytosine to uracil, which is converted to thymine by PCR. Importantly, following bisulfite conversion the DNA strands are no longer complementary, so primers are designed to assay the methylation status of a specific strand.
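The read-out logic of bisulfite conversion can be made concrete with a short Python sketch. It is a simplification under stated assumptions: it models a single DNA strand, assumes complete conversion of every unmethylated cytosine, and uses a made-up ten-base sequence with hypothetical methylated positions. The function names are illustrative and do not belong to any existing library.

```python
def bisulfite_convert(sequence, methylated_positions):
    """Return the read obtained after bisulfite treatment and PCR.

    Unmethylated cytosine is deaminated to uracil and read as thymine (T);
    methylated cytosine (5mC) resists conversion and is still read as C.
    """
    methylated = set(methylated_positions)
    converted = []
    for index, base in enumerate(sequence.upper()):
        if base == "C" and index not in methylated:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


def call_methylation(reference, converted_read):
    """Infer methylation at each cytosine by comparing read and reference."""
    calls = {}
    for index, (ref_base, read_base) in enumerate(
            zip(reference.upper(), converted_read)):
        if ref_base == "C":
            # A cytosine that is still read as C was protected by methylation.
            calls[index] = read_base == "C"
    return calls


# Hypothetical reference strand; positions are 0-based.
reference = "ACGTCCGGAC"
methylated_sites = {1, 5}  # the two cytosines located in CpG context

read = bisulfite_convert(reference, methylated_sites)
calls = call_methylation(reference, read)

print(read)
print(calls)
```

Comparing the converted read with the untreated reference recovers the methylation state of every cytosine, which is the principle exploited by pyrosequencing, methylation-specific PCR and bead arrays.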

 

If an untargeted approach is desirable, several strategies can be followed to obtain whole-genome methylation data, including sequencing. Epigenotyping technologies such as the human methylation BeadChips18 generate a methylation-state-specific ‘pseudo-SNP’ through bisulfite conversion, thereby translating differences in DNA methylation patterns into sequence differences that can be analyzed using quantitative genotyping methods19.

An interesting characteristic of DNA methylation is that it can have transgenerational effects (i.e. effects that act across multiple generations). This was first shown in a study on a population that was prenatally exposed to famine during the Dutch Hunger Winter in 1944–1945. These individuals had less DNA methylation of the imprinted gene coding for insulin-like growth factor 2 (IGF2) measured 6 decades later compared with their unexposed, same-sex siblings. The association was specific for peri-conceptional exposure (i.e. exposure during the period from before conception to early pregnancy), reinforcing that very early mammalian development is a crucial period for establishing and maintaining epigenetic marks20.

 

Post-translational modifications of histones (i.e. biochemical modifications of these proteins following their biosynthesis) have recently gained more attention, as they are known to be induced by oxidative stress21 (see section on Oxidative stress) and specific inflammatory mediators22. Besides their function in the structure of chromatin in eukaryotic cells, histones have been shown to have toxic and pro-inflammatory activities when they are released into the extracellular space23. Much attention has gone to the associations between metal exposures and histone modifications24, although a first human study on the association between particulate matter exposure and histone H3 modification has recently been published25.

 

MicroRNAs (miRNAs) are small noncoding RNAs of ∼22 nt in length that regulate gene expression at the posttranscriptional level by degrading their target mRNAs and/or inhibiting their translation (Ambros, 2004). Their expression has also been shown to serve as a valuable marker of exposure: both candidate and untargeted approaches have identified miRNA expression patterns that are associated with exposure to smoking26, particulate matter27, and chemicals such as polychlorinated biphenyls (PCBs)28.

 

 

Metabolomics

Metabolomics has been proposed as a valuable approach to address the challenges of the exposome. Metabolomics, the study of metabolism at the whole-body level, involves assessment of the entire repertoire of small-molecule metabolic products present in a biological sample. Unlike genes, transcripts and proteins, metabolites are not encoded in the genome. They are also chemically diverse, consisting of carbohydrates, amino acids, lipids, nucleotides and more. Humans are expected to contain a few thousand metabolites, including those they make themselves as well as nutrients and pollutants from their environment and substances produced by microbes in the gut. The study of metabolomics increases knowledge of the interactions between gene and protein expression and the environment29. Metabolomic profiles can serve as biomarkers of effect of environmental exposure, as they allow for the full characterization of biochemical changes that occur during xenobiotic metabolism (see section on Xenobiotic metabolism and defence). Recent technological developments have reduced the sample volume necessary for the analysis of the full metabolome, allowing for the assessment of system-wide metabolic changes that occur as a result of an exposure or in conjunction with a health outcome30. As for all biomarkers discussed here, both targeted metabolomics, in which specific metabolites are measured in order to characterize a pathway of interest, and untargeted metabolomic approaches are available. Among “omics” methodologies, metabolomics interrogates a relatively low number of features, as there are about 2900 known human metabolites versus ~30,000 genes. Because fewer features are tested, the multiple-testing burden is lower, which gives metabolomics strong statistical power compared to transcriptome-wide and genome-wide studies31. Metabolomics is, therefore, a potentially sensitive method for identifying biochemical effects of external stressors.
Even though the developing field of “environmental metabolomics” seeks to employ metabolomic methodologies to characterize the effects of environmental exposures on organism function and health, the relationship between most chemicals and their effects on the human metabolome has not yet been studied.
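The feature-count argument above can be made tangible with a small calculation. The sketch below assumes a Bonferroni correction at a family-wise significance level of 0.05; the text does not specify which multiple-testing correction is meant, so this choice is purely illustrative. The feature counts are the approximate numbers quoted above.

```python
def bonferroni_threshold(alpha, n_features):
    """Per-feature significance threshold under a Bonferroni correction."""
    return alpha / n_features


ALPHA = 0.05           # assumed family-wise significance level
N_METABOLITES = 2900   # approximate number of known human metabolites
N_GENES = 30000        # approximate number of genes

metabolome_threshold = bonferroni_threshold(ALPHA, N_METABOLITES)
transcriptome_threshold = bonferroni_threshold(ALPHA, N_GENES)

# How much less stringent the per-feature threshold is for the metabolome.
ratio = metabolome_threshold / transcriptome_threshold

print(f"metabolome-wide threshold:    {metabolome_threshold:.2e}")
print(f"transcriptome-wide threshold: {transcriptome_threshold:.2e}")
print(f"ratio: {ratio:.1f}")
```

Under these assumptions, an individual metabolite needs to clear a p-value threshold that is roughly ten times less stringent than that for an individual gene, so an effect of the same size is easier to detect in a study of the same size.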

 

Challenges

Limitations of molecular epidemiological studies include the difficulty of obtaining samples, the need for large study populations to identify significant relations between exposure and the biomarker, and the need for complex statistical methods to analyse the data. To circumvent the issue of sample collection, much effort has been focused on eliminating the need for blood or serum samples by using saliva samples, buccal cells or nail clippings to read out molecular markers. Although these samples can be easily collected in a non-invasive manner, care must be taken to show that they accurately reflect the body’s response to exposure rather than a local effect. For DNA methylation, it has been shown that this depends heavily on the locus under study: for certain CpG sites, the correlation in methylation levels between tissues is much higher than for other sites32. For those sites that do not correlate well across tissues, it has furthermore been demonstrated that DNA methylation levels can differ in their associations with clinical outcomes33, so care must be taken in epidemiological study design to overcome these issues.
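The locus dependence of cross-tissue agreement can be checked site by site, as in the following minimal Python sketch. The methylation values, the CpG identifiers (cg_site_A, cg_site_B) and the correlation cut-off of 0.8 are all hypothetical and chosen only to illustrate the idea; they are not taken from the cited studies.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical methylation levels (fraction methylated) for five individuals,
# measured in paired blood and saliva samples at two CpG sites.
methylation = {
    "cg_site_A": {
        "blood": [0.10, 0.20, 0.30, 0.40, 0.50],
        "saliva": [0.15, 0.25, 0.35, 0.45, 0.55],  # tracks blood closely
    },
    "cg_site_B": {
        "blood": [0.10, 0.20, 0.30, 0.40, 0.50],
        "saliva": [0.30, 0.10, 0.50, 0.20, 0.40],  # largely unrelated to blood
    },
}

MIN_CORRELATION = 0.8  # arbitrary illustrative cut-off

# Per-site correlation between the two tissues across individuals.
correlations = {
    site: pearson(values["blood"], values["saliva"])
    for site, values in methylation.items()
}

# Sites where saliva could plausibly stand in for blood.
concordant_sites = [
    site for site, r in correlations.items() if r >= MIN_CORRELATION
]

for site, r in correlations.items():
    print(f"{site}: blood-saliva r = {r:.2f}")
print("sites usable as surrogate:", concordant_sites)
```

In a real study, such a check would be run on paired samples from the same individuals before a non-invasive tissue is adopted as a surrogate for the target tissue.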