4.3.12. Gene expression

Author: Nico M. van Straalen

Reviewers: Dick Roelofs, Dave Spurgeon

 

Learning objectives:

You should be able to

 

Keywords: genomics, transcriptomics, proteomics, metabolomics, risk assessment

 

 

Synopsis

 

Low-dose exposure to toxicants induces biochemical changes in an organism, which aim to maintain homeostasis of the internal environment and to prevent damage. One aspect of these changes is a high abundance of transcripts of biotransformation enzymes, oxidative stress defence enzymes, heat shock proteins and many other proteins related to the cellular stress response. Such defence mechanisms are often highly inducible, that is, their activity is greatly upregulated in response to a toxicant. It is also known that most stress responses are specific to the type of toxicant. This principle may be reversed: if an upregulated stress response is observed, this implies that the organism is exposed to a certain stress factor; the nature of the stress factor may even be derived from the transcription profile. For this reason, microarrays, RNA sequencing and other techniques of transcriptome analysis have been applied in a large variety of contexts, both in laboratory experiments and in field surveys. These studies suggest that transcriptomics scores high on (in decreasing order) (1) rapidity, (2) specificity, and (3) sensitivity. While the promises of genomics applications in environmental toxicology are high, most of the applications are in mode-of-action studies rather than in risk assessment.

 

Introduction

No organism is defenceless against environmental toxicants. Even at exposures below phenotypically visible no-effect levels, a host of physiological and biochemical defence mechanisms are already active and contribute to the organism’s homeostasis. These regulatory mechanisms often involve upregulation of defence mechanisms such as oxidative stress defence, biotransformation (xenobiotic metabolism), heat shock responses, induction of metal-binding proteins, hypoxia response, repair of DNA damage, etc. At the same time, downregulation is observed for energy metabolism and functions related to growth and reproduction. In addition to these targeted regulatory mechanisms, there are usually many secondary effects and dysfunctional changes arising from damage. A comprehensive overview of all these adjustments can be obtained from analysis of the transcriptome.

 

In this module we will review the various approaches adopted in “omics”, with an emphasis on transcriptomics. “Omics” is a container term comprising five different activities. Table 1 provides a list of these approaches and their possible contribution to environmental toxicology. Genomics and transcriptomics deal with DNA and mRNA sequencing, proteomics relies on mass spectrometry, while metabolomics involves a variety of separation and detection techniques, depending on the class of compounds analysed. The various approaches gain strength when applied jointly. For example, proteomics analysis is much more insightful if it can be linked to an annotated genome sequence, and metabolism studies can profit greatly from transcription profiles that include the enzymes responsible for metabolic reactions. Systems biology aims to integrate the different approaches using mathematical models. However, it is fair to say that the correlation between responses at the different levels is often rather poor. Upregulation of a transcript does not always imply more protein; more protein can be generated without transcriptional upregulation; and the concentration of a metabolite is not always correlated with upregulation of the enzymes supposed to produce it. In this module we will focus on transcriptomics only. Metabolomics is dealt with in a separate section.

 

Table 1. Overview of the various “omics” approaches

| Term | Description | Relevance for environmental toxicology |
|---|---|---|
| Genomics | Genome sequencing and assembly, comparison of genomes, phylogenetics, evolutionary analysis | Explanation of species and lineage differences in susceptibility from the structure of targets and metabolic potential, relationship between toxicology, evolution and ecology |
| Transcriptomics | Genome-wide transcriptome (mRNA) analysis, gene expression profiling | Target and metabolism expression indicating activity, analysis of modes of action, diagnosis of substance-specific effects, early warning instrument for risk assessment |
| Proteomics | Analysis of the protein complement of the cell or tissue | Systemic metabolism and detoxification, diagnosis of physiological status, long-term or permanent effects |
| Metabolomics | Analysis of all metabolites from a certain class, pathway analysis | Functional read-out of the physiological state of a cell or tissue |
| Systems biology | Integration of the various “omics” approaches, network analysis, modelling | Understanding of coherent responses, extrapolation to whole-body phenotypic responses |

 

 

Transcriptomics analysis

The aim of transcriptomics in environmental toxicology is to gain a complete overview of all changes in mRNA abundance in a cell or tissue as a function of exposure to environmental chemicals. This is usually done in the following sequence of steps:

  1. Exposure of organisms to an environmental toxicant, including a range of concentrations, time-points, etc., depending on the objectives of the experiment.
  2. Isolation of total RNA from individuals or a sample of pooled individuals. The number of biological replicates is determined at this stage, by the number of independent RNA isolations, not by technical replication further on in the procedure.
  3. Reverse transcription. mRNAs are transcribed to cDNA using the enzyme reverse transcriptase, which initiates at the poly(A) tail of mRNAs. Because ribosomal RNAs lack a poly(A) tail, they are (in principle) not transcribed to cDNA. This is followed by size selection and sometimes labelling of cDNAs with barcodes to facilitate sequencing.
  4. Sequencing of the cDNA pool and transcriptome assembly. The assembly preferably makes use of a reference genome for the species. If no reference genome is available, the transcriptome is assembled de novo, which requires a greater sequencing depth and usually ends in many incomplete transcripts. A variety of corrections are applied to equalize effects of total RNA yield, library size, sequencing depth, gene length, etc.
  5. Gene expression analysis and estimation of fold regulation. This is done, in principle, by counting the normalized number of transcripts per gene for every gene in the genome, for each of the different conditions to which the organism was exposed. The response per gene is expressed as fold regulation, by expressing the transcripts relative to a standard or control condition. Tests are conducted to separate significant changes from noise.
  6. Annotation and assessment of pathways and functions as influenced by exposure. An integrative picture is developed, taking all evidence together, of the functional changes in the organism.
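
The arithmetic behind steps 4 and 5 can be illustrated with a minimal sketch. It assumes a simple counts-per-million scaling as the library-size correction and a log2 ratio to the control as the measure of fold regulation; the gene names, read counts and the pseudocount of 1 are hypothetical and serve only as illustration. Real analyses use more robust normalization and replicate-based statistical tests, neither of which is shown here.

```python
import math


def counts_per_million(counts):
    """Scale the raw read counts of one library so that they sum to one million."""
    total = sum(counts.values())
    return {gene: 1e6 * n / total for gene, n in counts.items()}


def log2_fold_change(treated, control, pseudocount=1.0):
    """Per gene: log2 of normalized expression in treatment relative to control.

    The pseudocount (an assumption of this sketch) avoids division by zero
    for genes without reads in the control.
    """
    return {
        gene: math.log2((treated[gene] + pseudocount) / (control[gene] + pseudocount))
        for gene in control
    }


# Hypothetical raw counts; the treated library was sequenced twice as deep.
control_counts = {"geneA": 100, "geneB": 300, "geneC": 600}  # 1,000 reads in total
treated_counts = {"geneA": 800, "geneB": 600, "geneC": 600}  # 2,000 reads in total

control_cpm = counts_per_million(control_counts)
treated_cpm = counts_per_million(treated_counts)
lfc = log2_fold_change(treated_cpm, control_cpm)

# geneB: the raw count doubles, but so does the library size,
# so its log2 fold change after normalization is 0.
for gene, value in lfc.items():
    print(gene, round(value, 2))
```

The example shows why normalization precedes the estimation of fold regulation: without it, the doubled read count of geneB would be mistaken for upregulation.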

In the recent past, step 4 was done by microarray hybridization rather than by direct sequencing. In this technique, two pools of cDNA (e.g. a control and a treatment) are hybridized to a large number of probes fixed onto a small glass plate. The probes are designed to represent the complete gene complement of the organism. Positive hybridization signals are taken as evidence for upregulated gene expression. Microarray hybridization arose in the years 1995-2005 but has now been largely overtaken by ultrafast, high-throughput next-generation sequencing methods. However, owing to its cost-efficiency, the relative simplicity of the bioinformatics analysis, and the standardization of the assessed genes, it is still often used.

 

We illustrate the principles of transcriptomics analysis, and the kind of data analysis that follows it, with an example from the work of Bundy et al. (2008). These authors exposed earthworms (Lumbricus rubellus) to soils experimentally amended with copper, quite a toxic element for earthworms. The copper-induced transcriptome was surveyed using a custom-made microarray, and metabolic profiles were established using NMR (nuclear magnetic resonance) spectroscopy. Of the 8,209 probes on the microarray, 329 showed a significant alteration of expression under the influence of copper. The data were plotted in a “heat map” diagram (Figures 1A and 1B), providing a quick overview of upregulated and downregulated genes. The expression profiles were also analysed in reduced dimensionality using principal component analysis (PCA). This showed that the profiles varied considerably with treatment. In particular, the highest and second-highest exposures generated a profile very different from the control (Figure 1C). The genes could be allocated to four clusters: (1) genes upregulated by copper over all exposures (Figure 1D), (2) genes downregulated by copper (Figure 1E), (3) genes upregulated by low exposures but unaffected at higher exposures (Figure 1F), and (4) genes upregulated by low exposure but downregulated by higher concentrations (Figure 1G). Analysis of gene identity combined with metabolite analysis suggested that the changes were due to an effect of copper on mitochondrial respiration, reducing the amount of energy generated by oxidative phosphorylation. This mechanism underlay the reduction of body growth observed at the phenotypic level.
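
To make the four response patterns concrete, the sketch below assigns a gene to one of them from its log2 fold changes along the exposure gradient. This is a deliberately simple rule-based stand-in, not the cluster analysis used by Bundy et al. (2008); the threshold of 1, the split of the gradient into a "low" and a "high" half, and all gene names and values are assumptions made for illustration.

```python
def classify_response(log2_fc, threshold=1.0):
    """Assign a gene to a response pattern from its log2 fold changes.

    log2_fc lists the change relative to the control, ordered from the lowest
    to the highest exposure (at least two exposure levels). The first half of
    the list is treated as "low" exposure, the remainder as "high" exposure.
    """
    def direction(values):
        mean = sum(values) / len(values)
        if mean >= threshold:
            return "up"
        if mean <= -threshold:
            return "down"
        return "flat"

    half = len(log2_fc) // 2
    low, high = direction(log2_fc[:half]), direction(log2_fc[half:])
    patterns = {
        ("up", "up"): "upregulated at all exposures",
        ("down", "down"): "downregulated",
        ("up", "flat"): "upregulated at low exposure only",
        ("up", "down"): "up at low, down at high exposure",
    }
    # Any other combination falls outside the four patterns described in the text.
    return patterns.get((low, high), "other")


# Hypothetical profiles over four increasing exposure levels.
profiles = {
    "gene1": [1.5, 1.8, 2.0, 2.2],
    "gene2": [-1.2, -1.5, -2.0, -2.5],
    "gene3": [1.6, 1.4, 0.2, 0.1],
    "gene4": [1.3, 1.1, -1.0, -1.8],
}
groups = {gene: classify_response(fc) for gene, fc in profiles.items()}
for gene, group in groups.items():
    print(gene, "->", group)
```

In practice such groups emerge from clustering all differentially expressed genes at once, so that the number and shape of the patterns are derived from the data rather than fixed in advance.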

 

Figure 1. Example of a transcriptomics analysis aiming to understand copper toxicity to earthworms. A. “Heat map” of individual replicates (four in each of five copper treatments). Expression is indicated for each of the 329 differentially expressed genes (arranged from top to bottom) in red (downregulated) or green (upregulated). A cluster analysis showing the similarities is indicated above the profiles. B. The same data, but with the four replicates per copper treatment joined. The data show that at 40 mg/kg of copper in soil some of the earthworm’s genes are starting to be downregulated, while at 160 mg/kg and 480 mg/kg significant upregulation and downregulation is occurring. C. Principal component analysis of the changes in expression profile. The multivariate expression profile is reduced to two dimensions and the position of each replicate is indicated by a single point in the biplot; the confidence interval over four replicates of each copper treatment is indicated by horizontal and vertical bars. The profiles of the different copper treatments (joined by a dashed line) differ significantly from each other. D, E, F, and G. Classification of the 329 genes in four groups according to their responses to copper (plotted on the horizontal axis). Redrawn from Bundy et al. (2008) by Wilma IJzerman.

 

Omics in risk assessment

How could omics-technology, especially transcriptomics, contribute to risk assessment of chemicals? Three possible advantages have been put forward:

  1. Gene expression analysis is rapid. Gene regulation takes place on a time-scale of hours and results can be obtained within a few days. This compares very favourably with traditional toxicity testing (Daphnia, 48 hours; Folsomia, 28 days).
  2. Gene expression is specific. Because a transcription profile involves hundreds to thousands of endpoints (genes), the information content is potentially very large. By comparing a new profile, generated by an unknown compound, with a trained data set, the compound can usually be identified quite precisely.
  3. Gene expression is sensitive. Because gene regulation is among the very first biochemical responses in an organism, it is expected to respond to lower dosages, at which whole-body parameters such as survival, growth and reproduction are not yet responding.

Among these advantages, the second one (specificity) has proven to be the most consistent and possibly brings the largest advantage. This can be illustrated by a study by Dom et al. (2012) in which gene expression profiles were generated for Daphnia magna exposed to different alcohols and chlorinated anilines (Figure 2).

 

Figure 2. Clustered gene expression profiles of Daphnia magna exposed to seven different compounds. Replicates exposed to the same compound are clustered together, except for ethanol. The first split separates exposures that at the EC10 level (reproduction) did not show any effects on growth and energy reserves (right) from exposures that caused significant effects of this kind (left). Reproduced from Dom et al. (2012) by Wilma IJzerman.

 

The profiles of replicates exposed to the same compound were always clustered together, except in one case (ethanol), showing that gene expression is quite specific to the compound. It is possible to reverse this argument: from the gene expression profile the compound causing it can be deduced. In addition, the example cited showed that the first separation in the cluster analysis was between exposures that did and did not affect energy reserves and growth. So the gene expression profiles are not only indicative of the compound, but also of the type of effects expected.
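
The reversed argument, deducing the compound from the profile, can be sketched as a comparison of an unknown profile with a set of reference profiles. The example below assumes Pearson correlation as the similarity measure and picks the best-matching reference; the compound labels and expression values are hypothetical, and this is not the clustering procedure used by Dom et al. (2012).

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long profiles."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def most_similar_compound(unknown, references):
    """Return the reference compound whose profile correlates best, plus all scores."""
    scores = {name: pearson(unknown, profile) for name, profile in references.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical reference profiles: log2 fold changes of the same five genes.
references = {
    "compound_X": [2.0, 1.5, 0.0, -1.0, -0.5],
    "compound_Y": [-1.0, 0.2, 2.5, 1.8, 0.0],
    "compound_Z": [0.1, -2.0, -0.3, 0.4, 2.2],
}
unknown_profile = [-0.8, 0.0, 2.1, 2.0, 0.3]

best, scores = most_similar_compound(unknown_profile, references)
print("best match:", best)
```

With real data the profiles contain hundreds to thousands of genes, and a match would only be accepted if the similarity clearly exceeds that between replicates of different compounds.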

 

The claim of rapidity has also proved true, but it does not always translate into a practical advantage. Speed may matter when quick decisions are crucial (evaluating a truck loaded with suspected contaminated soil, or deciding whether or not to discharge a certain waste stream into a lake), but for regular risk assessment procedures it has proved to be less of an advantage than sometimes expected. Finally, greater sensitivity of gene expression, in the sense of lower no-observed-effect concentrations than for classical endpoints, is a potential advantage but proves to be less spectacular in practice. Nevertheless, there are clear examples in which exposures below phenotypic effect levels were shown to induce gene expression responses, indicating that the organism was able to compensate for any negative effects by adjusting its biochemistry.

 

Another strategy for using gene expression in risk assessment is to focus not on genome-wide transcriptomes but on selected biomarker genes. This strategy aims for genes whose expression shows (1) consistent dose-dependency, (2) responses over a wide range of contaminants, and (3) correlations with biological damage. For example, De Boer et al. (2015) analysed a composite data set including experiments with six heavy metals, six chlorinated anilines, tetrachlorobenzene, phenanthrene, diclofenac and isothiocyanate, all previously used in standardized experiments with the soil-living collembolan Folsomia candida. Across all treatments, a selection was made of 61 genes that were responsive in all cases and fulfilled the three criteria listed above. Some of these marker genes showed a very good and reproducible dose-related response to soil contamination. Two biomarkers are shown in Figure 3. This experiment, designed to diagnose a field soil with complex unknown contamination, clearly demonstrated the presence of Cyp-inducing organic toxicants.
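
How the first two selection criteria might be applied can be sketched as a simple filter. The example assumes that dose-dependency is approximated by expression that never decreases with dose and reaches a minimum log2 fold change at the highest dose, and that a gene must meet this for every compound tested. The third criterion (correlation with biological damage) needs phenotypic data and is not covered. All gene names, compound labels, values and the threshold are hypothetical; this is not the selection procedure of De Boer et al. (2015).

```python
def is_dose_dependent(responses, min_change=1.0):
    """True if expression never decreases with dose and the top dose reaches min_change.

    responses holds log2 fold changes ordered from the lowest to the highest dose.
    Only upregulated markers are considered in this sketch.
    """
    rising = all(b >= a for a, b in zip(responses, responses[1:]))
    return rising and responses[-1] >= min_change


def select_biomarkers(data):
    """Keep genes that respond in a dose-dependent way to every compound tested."""
    return sorted(
        gene
        for gene, by_compound in data.items()
        if all(is_dose_dependent(r) for r in by_compound.values())
    )


# Hypothetical log2 fold changes at three increasing doses of two compound classes.
data = {
    "gene_a": {"metal": [0.2, 0.9, 1.8], "organic": [0.1, 1.2, 2.4]},
    "gene_b": {"metal": [0.3, 1.5, 2.0], "organic": [0.0, 0.1, 0.2]},  # fails criterion 2
    "gene_c": {"metal": [1.9, 0.4, 1.2], "organic": [0.5, 1.0, 1.6]},  # fails criterion 1
}
selected = select_biomarkers(data)
print(selected)
```

A gene such as gene_b, which responds strongly to one class of compounds only, would be a poor general biomarker but could still be useful for diagnosing that specific class.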

 

Figure 3. Gene expression, relative to control expression, for two selected biomarker genes (encoding cytochrome P450 phase I biotransformation enzymes) in the genome of the soil-living collembolan Folsomia candida, in response to the concentration of contaminated field soil spiked into a clean soil at different rates. Reproduced from Roelofs et al. (2012) by Wilma IJzerman.

 

Of course there are also disadvantages associated with transcriptomics in environmental toxicology, for example:

  1. Gene expression analysis requires a knowledge-intensive infrastructure, including a high level of expertise for some of the bioinformatics analyses. Also, adequate molecular laboratory facilities are needed; some techniques are quite expensive.
  2. Gene expression analysis is most fruitful when the species used are backed up by adequate genomic resources, especially a well-annotated genome assembly, although this is becoming less of a problem with the increasing availability of genomic resources.
  3. The relationship between gene expression and ecologically relevant variables such as growth and reproduction of the animal is not always clear.

 

 

Conclusions

Gene expression analysis has occupied a designated niche in environmental toxicology since about 2005. The field is highly driven by technology and has changed continuously over recent years. It may contribute significantly to risk assessment in the context of mode-of-action studies and as a source of designated biomarker techniques. Finally, transcriptomics data are well suited to inform the identification of key events: important biochemical alterations that are causally linked, up to the level of the phenotype, to form an adverse outcome pathway. We refer to the section on Adverse outcome pathways for further reading.

 

 

References

Bundy, J.G., Sidhu, J.K., Rana, F., Spurgeon, D.J., Svendsen, C., Wren, J.F., Stürzenbaum, S.R., Morgan, A.J., Kille, P. (2008). “Systems toxicology” approach identifies coordinated metabolic responses to copper in a terrestrial non-model invertebrate, the earthworm Lumbricus rubellus. BMC Biology 6, 25.

De Boer, T.E., Janssens, T.K.S., Legler, J., Van Straalen, N.M., Roelofs, D. (2015). Combined transcriptomics analysis for classification of adverse effects as a potential end point in effect based screening. Environmental Science and Technology 49, 14274-14281.

Dom, N., Vergauwen, L., Vandenbrouck, T., Jansen, M., Blust, R., Knapen, D. (2012). Physiological and molecular effect assessment versus physico-chemistry based mode of action schemes: Daphnia magna exposed to narcotics and polar narcotics. Environmental Science and Technology 46, 10-18.

Gibson, G., Muse, S.V. (2002). A Primer of Genome Science. Sinauer Associates Inc., Sunderland.

Gibson, G. (2008). The environmental contribution to gene expression profiles. Nature Reviews Genetics 9, 575-581.

Roelofs, D., De Boer, M., Agamennone, V., Bouchier, P., Legler, J., Van Straalen, N. (2012). Functional environmental genomics of a municipal landfill soil. Frontiers in Genetics 3, 85.

Van Straalen, N.M., Feder, M.E. (2012). Ecological and evolutionary functional genomics - how can it contribute to the risk assessment of chemicals? Environmental Science and Technology 46, 3-9.

Van Straalen, N.M., Roelofs, D. (2008). Genomics technology for assessing soil pollution. Journal of Biology 7, 19.