Author: Joop Hermens
Reviewer: Steven Droge, Monika Nendza
Learning objectives:
You should be able to:
Keywords: Quantitative structure-property relationships (QSPR), quantitative structure-activity relationships (QSAR), octanol-water partition coefficients, hydrogen bonding, multivariate techniques.
Introduction
Risk assessment needs input data for fate and effect parameters. These data are not available for many of the existing chemicals and predictions via estimation models will provide a good alternative to actual testing. Examples of estimation models are Quantitative Structure-Property Relationships (QSPRs) and Quantitative Structure-Activity Relationships (QSARs). The term "activity" is often used in relation to models for toxicity, while "property" usually refers to physical-chemical properties or fate parameters.
In a QSAR or QSPR, a certain environmental parameter is related to a physical-chemical or structural property, or a combination of properties.
The elements in a QSPR or QSAR are shown in Figure 1 and include:
Figure 1. The principle of a QSPR or QSAR. See text for explanation.
The Y-variable
Estimation models have been developed for many endpoints such as sorption to sediment, humic acids, lipids and proteins, chemical degradation, biodegradation, bioconcentration and ecotoxic effects.
The X-variable
An overview of the chemical parameters (the X-variable) used in estimation models is given in Table 1. Chemical properties are divided into three categories: (i) parameters related to hydrophobicity, (ii) parameters related to charge and charge distribution in a molecule and (iii) parameters related to the size or volume of a molecule. Hydrophobicity is discussed in more detail in the section on Relevant chemical properties.
Other QSPR approaches use a large number of parameters derived from chemical graphs. The CODESSA Pro software, for example, generates molecular (494) and fragment (944) descriptors, classified as (i) constitutional, (ii) topological, (iii) geometrical, (iv) charge related, and (v) quantum chemical (Katritzky et al. 2009). Some models are based on structural fragments in a molecule. The polyparameter linear free energy relationships (pp-LFER) use parameters that represent interactions between molecules (see under pp-LFER).
Table 1. Examples of parameters related to hydrophobicity and electronic and steric parameters (the X variable).
Category | Parameters
Hydrophobic parameters | Aqueous solubility; octanol-water partition coefficient (Kow); hydrophobic fragment constant π
Electronic parameters | Atomic charges (q); dipole moment; hydrogen bond acidity (H bond-donating); hydrogen bond basicity (H bond-accepting); Hammett constant σ
Steric parameters | Total Surface Area (TSA); Total Molecular Volume (TMV); Taft constant for steric effects (Es)
The model
Most models are based on correlations between Y and X. Such a relationship is derived for a “training set” that consists of a limited number of carefully selected chemicals. The validity of such a model should be tested by applying it to a "validation set", i.e. a set of compounds for which experimental data can be compared with the predictions. Different techniques can be used to develop an empirical model, such as:
Linear equations take the form:
Y(i) = a1X1(i) + a2X2(i) + a3X3(i) + ... + b (1)
where Y(i) is the value of the dependent parameter of chemical i (for example sorption coefficients); X1-X3(i) are values for the independent parameters (the chemical properties) of chemical i; a1-a3 are regression coefficients (usually 95% confidence limits are given); b is the intercept of the linear equation. The quality of the equation is presented via the correlation coefficient (r) and the standard error of estimate (s). The closer r is to 1.0, the better the fit of the relationship. More information about the statistical quality of models can be found under "Reliability and limitations of QSPR".
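The simplest case of equation 1, with a single descriptor X, can be sketched as follows. This is a minimal illustration only: the training-set numbers are made up, the function name `fit_line` is hypothetical, and real QSPR work would normally use a statistics package that also reports confidence limits.

```python
import math

def fit_line(x, y):
    """Ordinary least squares for Y = a*X + b with one descriptor X."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx                      # regression coefficient (slope)
    b = mean_y - a * mean_x            # intercept
    r = sxy / math.sqrt(sxx * syy)     # correlation coefficient
    ss_res = sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(ss_res / (n - 2))    # standard error of estimate (2 fitted parameters)
    return a, b, r, s

# Hypothetical training set (made-up numbers, for illustration only)
x_train = [2.0, 3.0, 4.0, 5.0, 6.0]   # e.g. a hydrophobicity descriptor
y_train = [1.1, 1.8, 2.9, 3.6, 4.5]   # e.g. a log sorption coefficient

a, b, r, s = fit_line(x_train, y_train)
print(f"Y = {a:.2f} X + {b:.2f}  (n = {len(x_train)}, r = {r:.3f}, s = {s:.2f})")
```

With more than one descriptor the same least-squares principle applies, but the coefficients are obtained by solving a set of linear equations rather than from the closed-form expressions used here.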
The classical approach in QSAR and QSPR studies is the Hansch approach, which was developed in the 1960s. The Hansch equation (Hansch et al., 1963) describes the influence of substituents on the biological activity in a series of compounds that share a parent structure but carry different substituents (equation 2). Substituents are, for example, an atom or chemical group (Cl, F, Br, OH, NH2) attached to a parent aromatic ring structure.
log 1/C = c π + c' σ + c'' Es + c''' (2)
in which:
C is the molar concentration of a chemical with a particular effect,
π is a substituent constant for hydrophobic effects,
σ is a substituent constant for electronic effects, and
Es is a substituent constant for steric effects.
c, c', c'' and c''' are constants that are obtained by fitting experimental data.
For example, the hydrophobic substituent constant is based on Kow and is defined as:
π (X) = log Kow (RX) - log Kow (RH) (3)
where RX and RH are the substituted and unsubstituted parent compound, respectively.
The Hammett and Taft constants are derived in a similar way.
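Equations 2 and 3 can be illustrated with a short sketch. The log Kow values for benzene and chlorobenzene are commonly quoted literature values and should be treated as assumptions here; the function and parameter names are hypothetical, and no fitted Hansch coefficients are implied.

```python
def pi_constant(log_kow_rx, log_kow_rh):
    """Hydrophobic substituent constant, equation 3: pi(X) = log Kow(RX) - log Kow(RH)."""
    return log_kow_rx - log_kow_rh

def hansch_log_inv_c(pi, sigma, es, c_pi, c_sigma, c_es, c_const):
    """Equation 2: log 1/C = c*pi + c'*sigma + c''*Es + c'''."""
    return c_pi * pi + c_sigma * sigma + c_es * es + c_const

# Assumed literature values (illustrative, not taken from this text)
log_kow_benzene = 2.13         # RH, the unsubstituted parent compound
log_kow_chlorobenzene = 2.84   # RX, with X = Cl

pi_cl = pi_constant(log_kow_chlorobenzene, log_kow_benzene)
print(f"pi(Cl) = {pi_cl:.2f}")
```

A positive π means that the substituent makes the parent compound more hydrophobic. The function `hansch_log_inv_c` only evaluates equation 2; its four constants have to come from a fit to experimental data for the series of interest.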
Multivariate techniques may be very useful to develop structure-activity relationships, in particular in cases where a large number of chemical parameters is involved. Principal Component Analysis (PCA) can be applied to reduce the number of variables to a few principal components. The next step is to find a relationship between Y and X via, for example, Partial Least Squares (PLS) analysis. The advantage of PCA and PLS is that they can deal with a large number of chemical descriptors and that they can also cope with collinear (correlated) properties. More information on these multivariate techniques and examples in the field of environmental science are given by Eriksson et al. (1995).
Poly-parameter Linear Free Energy Relationship (pp-LFER)
The pp-LFER approach has a strong mechanistic basis because it includes the different types of interactions between molecules (Goss and Schwarzenbach, 2001). For example, the sorption coefficient of a chemical from an aqueous phase to soil or to phospholipids (the sorbent) depends on the interaction of the chemical with water and its interaction with the sorbent phase. One of the driving forces behind sorption is hydrophobicity. Hydrophobicity means fear (phobia) of water (hydro). A hydrophobic chemical prefers to “escape from the aqueous phase” or, in other words, “it does not like to dissolve in water”. Water molecules are tightly bound to each other via hydrogen bonds. For a chemical to dissolve, a cavity has to be formed in the aqueous phase (Figure 2) and this costs energy. More hydrophobic compounds will often sorb more strongly (see more information in the section on Relevant chemical properties).
Hydrophobicity mainly depends on two molecular properties:
Figure 2. The formation of a cavity in water for chemical X and the interaction with another phase (here, a soil particle).
In the interaction with the sorbent (soil, membrane lipids, storage lipids, humic acids), the major interactions are van der Waals interactions and hydrogen bonding (Table 2). Van der Waals interactions are attractive and occur between all kinds of molecules, and their strength depends on the contact area. Therefore, the strength of van der Waals interactions is related to the size of a molecule. A hydrogen bond is an electrostatic attraction between a hydrogen atom (H), which is usually covalently bound to an electronegative atom (N, O, F), and another electronegative atom bearing a lone pair of electrons. Table 2 lists the interactions with examples of chemical structures.
A pp-LFER is a linear equation developed to model partition or sorption coefficients (K) using parameters that represent the interactions (Abraham, 1993). The model equation is based on five descriptors:
log K = c + eE + sS + aA + bB + vV (4)
with:
E | excess molar refraction
S | dipolarity/polarizability parameter
A | solute H-bond acidity (H-bond donor)
B | solute H-bond basicity (H-bond acceptor)
V | molar volume
The partition or sorption coefficient K may be expressed as the sum of five interaction terms, with the uppercase parameters describing compound specific properties. E depends on the valence electronic structure, S represents polarity and polarizability, A is the hydrogen bond (HB) donor strength (HB acidity), B the HB acceptor strength (HB basicity), V is the so-called characteristic volume related to the molecule size, and c is a constant. The lower-case parameters express the corresponding properties of the respective two-phase system, and can thus be taken as the relative importance of the compound properties for the particular partitioning or sorption process. In this introductory section, we only focus on the volume factor (V) and the two hydrogen bond parameters (A and B).
Numerous pp-LFERs have been developed for all kinds of environmental processes and an overview is given by Endo and Goss (2014).
Table 2. Types of interactions between molecules and the phase to which they sorb with examples of chemicals (Goss and Schwarzenbach, 2003).
Compound a) | Interactions | Examples
Apolar | only van der Waals | alkanes, chlorobenzenes, PCBs
Monopolar | van der Waals + H-acceptor (e-donor) | alkenes, alkynes, alkylaromatic compounds, ethers, ketones, esters, aldehydes
Monopolar | van der Waals + H-donor (e-acceptor) | CHCl3, CH2Cl2
Bipolar | van der Waals + H-donor + H-acceptor | R–NH2, R2–NH, R–COOH, R–OH
a) Apolar: no polar group present; mono/bipolar: one or two polar groups present in a molecule
Examples of QSPR for bioconcentration to fish
Kow based model
Predictive models for bioconcentration have a long history. The octanol-water partition coefficient (KOW) is a good measure of hydrophobicity, and bioconcentration factors (BCFs) are often correlated with KOW (see more information in the section on Bioaccumulation). The success of these KOW based models has been explained by the resemblance between partitioning into octanol and into bulk lipid in the organisms, at least for neutral hydrophobic compounds. A well-known example is the linear QSAR model of Veith et al. (1979) for log BCF (Y variable) based on log KOW (X variable):
log BCF = 0.85 log KOW - 0.70 (5)
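Equation 5 is simple enough to evaluate directly. The sketch below does this for a few arbitrary log KOW values; the function name is hypothetical, and the result is only meaningful for chemicals inside the domain of the regression (neutral, hydrophobic, not metabolised).

```python
def log_bcf_veith(log_kow):
    """Equation 5 (Veith et al., 1979): log BCF = 0.85 log Kow - 0.70."""
    return 0.85 * log_kow - 0.70

# Arbitrary example inputs, chosen for illustration only
for log_kow in (3.0, 5.0, 6.0):
    log_bcf = log_bcf_veith(log_kow)
    print(f"log Kow = {log_kow:.1f} -> log BCF = {log_bcf:.2f} (BCF about {10 ** log_bcf:.0f})")
```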
Figure 3 gives a classical example of such a correlation for the BCF in guppy of a series of chlorinated benzenes and polychlorinated biphenyls. When lipophilic chemicals are metabolised, the relation shown in Figure 3 is no longer valid and the BCF will be lower than predicted from KOW. Another deviation from this BCF-Kow relation is found for highly lipophilic chemicals with log Kow > 7. For such chemicals, the BCF often decreases again with increasing Kow (see Figure 3). The apparent BCF curve with Kow as the X variable tends to follow a nonlinear curve with an optimum at log Kow 7-8. This phenomenon may be explained from molecular size: molecules of chemicals like decachlorobiphenyl may be so large that they have difficulties in passing membranes. A more likely explanation, however, is that aqueous concentrations of highly lipophilic chemicals may be overestimated. It is not easy to separate chemicals bound to particles from the aqueous phase (see box 1 in the section on Sorption) and this may lead to measured concentrations that are higher than the bioavailable (freely dissolved) concentration (Jonker and van der Heijden 2007; Kraaij et al. 2003). For example, at a dissolved organic carbon (DOC) concentration of 1 mg-DOC/L, a chemical with a log Koc of 7 will be 90% bound to particles, and this bound fraction is not part of the dissolved concentration that equilibrates with the (fish) tissue. This shows that these models are also interesting because they may reveal trends in the data that lead to a better understanding of processes.
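The 90% figure in the DOC example follows from a simple mass balance. The sketch below assumes linear equilibrium partitioning between water and DOC, with Koc in L/kg and the DOC concentration converted to kg/L; the function name is hypothetical.

```python
def fraction_bound(log_koc, doc_mg_per_l):
    """Fraction of a chemical bound to DOC at equilibrium.

    f_bound = Koc * [DOC] / (1 + Koc * [DOC]), with Koc in L/kg and [DOC] in kg/L.
    """
    doc_kg_per_l = doc_mg_per_l * 1e-6        # 1 mg = 1e-6 kg
    ratio = (10 ** log_koc) * doc_kg_per_l     # bound / freely dissolved
    return ratio / (1.0 + ratio)

f_bound = fraction_bound(log_koc=7.0, doc_mg_per_l=1.0)
print(f"bound: {f_bound:.1%}, freely dissolved: {1 - f_bound:.1%}")
```

With log Koc = 7 and 1 mg-DOC/L, the product Koc × [DOC] equals 10, so the bound fraction is 10/11, or roughly 90%. A BCF calculated from the total measured aqueous concentration would then be about a factor of 11 lower than one based on the freely dissolved concentration.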
Figure 3. The relationship between bioconcentration factors in guppy and the octanol-water partition coefficients, with data from Bruggeman et al. (1984) and Könemann and Van Leeuwen (1980).
Examples of QSPR for sorption to lipids
Kow based models are successful because octanol probably has properties similar to those of fish lipids. There are several types of lipids, however, and membrane lipids differ in properties and structure from, for example, storage lipids (see Figure 4, and more details in the section on Biota). More refined BCF models treat storage lipids, membrane lipids and proteins as separate sorptive phases (Armitage et al. 2013). pp-LFER is a very suitable approach to model these sorption or partitioning processes, and results for two large data sets are presented in Table 3. The coefficients e, s, b and v are rather similar. The only parameter that clearly differs between these two models is coefficient a, which represents the contribution of the hydrogen bond (HB) donating properties (A) of the chemicals in the data set. This effect makes sense because the phosphate group in the phospholipid structure has strong HB accepting properties. This example shows the strength of the pp-LFER approach, because it closely represents the mechanism of interactions.
Figure 4. Structure of a phospholipid and a triglyceride. Note the similar glycerol part in both lipids.
Table 3. LFERs for storage lipid-water partition coefficients (KSL-W) and membrane lipid-water partition coefficients (KML-W (liposome)). Listed are the parameters (with standard error in brackets), the number of compounds with which the LFER was calibrated (n), the squared correlation coefficient (r2), and the standard error of estimate (SE). log K = c + eE + sS + aA + bB + vV.
Parameter | c | e | s | a | b | v | n | r2 | SE | Source
KSL-W | -0.07 (0.07) | 0.70 (0.06) | -1.08 (0.08) | -1.72 (0.13) | -4.14 (0.09) | 4.11 (0.06) | 247 | 0.997 | 0.29 | Geisler et al. (2012)
KML-W (liposome) | 0.26 (0.08) | 0.85 (0.05) | -0.75 (0.08) | 0.29 (0.09) | -3.84 (0.10) | 3.35 (0.09) | 131 | 0.979 | 0.28 | Endo et al. (2011)
KSL-W: storage lipid partition coefficients are mean values for different types of oil. Raw data and pp-LFER (for 37 °C) reported in (Geisler et al. 2012).
KML-W (liposome): data from liposomes made up of phosphatidylcholine (PC) or PC mixed with other membrane lipids. Raw data (20-40 °C) and pp-LFER reported in (Endo et al. 2011).
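The two pp-LFERs of Table 3 can be applied to a solute as sketched below. The system coefficients are copied from Table 3; the solute descriptors are made up for a hypothetical H-bond donating compound and do not refer to a real chemical. Class and function names are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SystemCoefficients:
    """Lower-case pp-LFER coefficients of a two-phase system."""
    c: float
    e: float
    s: float
    a: float
    b: float
    v: float

# System coefficients from Table 3
STORAGE_LIPID = SystemCoefficients(c=-0.07, e=0.70, s=-1.08, a=-1.72, b=-4.14, v=4.11)
MEMBRANE_LIPID = SystemCoefficients(c=0.26, e=0.85, s=-0.75, a=0.29, b=-3.84, v=3.35)

def log_k(system, E, S, A, B, V):
    """log K = c + eE + sS + aA + bB + vV."""
    return (system.c + system.e * E + system.s * S
            + system.a * A + system.b * B + system.v * V)

# Hypothetical solute descriptors (made up for illustration)
solute = dict(E=0.8, S=0.9, A=0.6, B=0.3, V=0.8)

log_k_sl = log_k(STORAGE_LIPID, **solute)
log_k_ml = log_k(MEMBRANE_LIPID, **solute)
# Contribution of the H-bond donor term (aA) to the difference between the two phases
delta_a_term = (MEMBRANE_LIPID.a - STORAGE_LIPID.a) * solute["A"]

print(f"log KSL-W = {log_k_sl:.2f}, log KML-W = {log_k_ml:.2f}")
print(f"difference due to the aA term: {delta_a_term:.2f} log units")
```

For this made-up solute, most of the difference between the two partition coefficients comes from the aA term, in line with the interpretation that membrane lipids accept hydrogen bonds more readily than storage lipids.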
Examples of QSPR for sorption to soil
Numerous QSPRs are available for soil sorption (see section on Sorption). The organic carbon normalized sorption coefficient (Koc) is also linearly related to the octanol-water partition coefficient (see Figure 5).
Figure 5. Correlation between the organic carbon normalized sorption coefficient to soil (Koc) and the octanol-water partition coefficient (Kow) for data from (Sabljic et al. 1995).
The model in Figure 5 is only valid for neutral, non-polar hydrophobic organic chemicals such as chlorinated aromatic compounds, polycyclic aromatic hydrocarbons (PAHs), polychlorinated biphenyls (PCBs) and chlorinated insecticides or, in general, compounds that only contain carbon, hydrogen and halogen atoms. It does not apply to polar and ionized organic compounds, nor to metals. For polar chemicals, other interactions may also influence sorption, and a pp-LFER approach would be useful here as well.
The sorption of ionic chemicals is more complex. For the sorption of cationic organic compounds, clay minerals can be as important a sorption phase as organic matter because of their negative surface charge and large surface area. The sorption of organic cations is mainly an adsorption process that reaches a maximum at the cation exchange capacity (CEC) of a particle (see section on Soil). Models for predicting the sorption of cationic compounds are therefore more complicated, and first attempts have been made only recently (Droge and Goss, 2013). The major sorption mechanism for anionic chemicals is sorption into organic matter. The sorption coefficient of anionic chemicals is substantially lower than that of the neutral form of the chemical, roughly a factor 10-100 for KOC (Tülp et al. 2009). In the case of weakly dissociating chemicals such as carboxylic acids, the sorption coefficient can often be estimated from the sorption coefficient of the non-ionic form and the fraction of the chemical that is present in the non-ionized form (see section on Relevant chemical properties).
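The estimate for weakly dissociating acids can be sketched as follows, assuming a monoprotic acid whose neutral fraction follows the Henderson-Hasselbalch relation. The pKa and Koc values are made up, the function names are hypothetical, and the optional anion term is an addition for illustration that is set to zero by default.

```python
def neutral_fraction(ph, pka):
    """Fraction of a monoprotic acid present in the non-ionized form."""
    return 1.0 / (1.0 + 10 ** (ph - pka))

def apparent_koc(koc_neutral, ph, pka, koc_anion=0.0):
    """Apparent Koc as the fraction-weighted sum of neutral and anionic sorption.

    With koc_anion = 0 this reduces to f_neutral * Koc(neutral), i.e. sorption
    of the anion is neglected.
    """
    f_n = neutral_fraction(ph, pka)
    return f_n * koc_neutral + (1.0 - f_n) * koc_anion

# Hypothetical acid: pKa 4.5, Koc of the neutral form 1000 L/kg
for ph in (4.5, 6.0, 7.5):
    only_neutral = apparent_koc(1000.0, ph, 4.5)
    with_anion = apparent_koc(1000.0, ph, 4.5, koc_anion=10.0)  # anion assumed 100x lower
    print(f"pH {ph}: Koc = {only_neutral:.1f} (neutral only), {with_anion:.1f} (with anion term)")
```

At pH values well above the pKa the neutral fraction becomes very small, and the contribution of the anion, although its sorption coefficient is much lower, can start to dominate the apparent Koc.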
Reliability and limitations of QSPR
Predictive models have limitations and it is important to know these limitations. There is not one single model that can predict a parameter for all chemicals. Each model has a domain of applicability and it is important to apply a model only to chemicals within that domain. Therefore, guidance has to be defined on how to select a specific model. It is also important to realize that in many computer programs (such as fate modeling programs), estimates and predictions are implicitly incorporated.
Another aspect is the reliability of the prediction. The model itself can show a good fit (high r2) for the training set (the chemicals used to develop the model), but the actual reliability should be tested with a separate set of chemicals (the validation set) and a number of statistical procedures can be applied to test the accuracy and predictive power of the model. The OECD has developed a set of rules that should be applied in the validation of QSPR and QSAR models.
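One simple way to look at predictive power and domain of applicability is sketched below. It is not the OECD procedure, only an illustration: the model coefficients and all data are made up, the domain check is a plain range check on the descriptor, and the function names are hypothetical.

```python
import math

def validation_stats(observed, predicted):
    """Return (predictive r2, root mean squared error) for a validation set."""
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot, math.sqrt(ss_res / n)

def in_domain(x, x_train):
    """Crude applicability check: is x within the descriptor range of the training set?"""
    return min(x_train) <= x <= max(x_train)

# Hypothetical model Y = 0.86 X - 0.66, calibrated on descriptor values 2.0 to 6.0
x_train = [2.0, 3.0, 4.0, 5.0, 6.0]
slope, intercept = 0.86, -0.66

# Hypothetical validation set (made-up numbers)
x_val = [2.5, 4.5, 5.5, 7.5]
y_val = [1.4, 3.3, 4.0, 4.9]

# Only chemicals inside the domain are used to judge the model
inside = [(x, y) for x, y in zip(x_val, y_val) if in_domain(x, x_train)]
y_obs = [y for _, y in inside]
y_pred = [slope * x + intercept for x, _ in inside]

r2_pred, rmse = validation_stats(y_obs, y_pred)
print(f"{len(inside)} of {len(x_val)} validation chemicals are inside the domain")
print(f"predictive r2 = {r2_pred:.3f}, RMSE = {rmse:.2f}")
```

In practice the domain of applicability is defined by more than a single descriptor range, for example by chemical class or mode of action, but the principle of excluding chemicals outside the domain before applying the model is the same.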
References
Abraham, M.H. (1993). Scales of solute hydrogen-bonding - their construction and application to physicochemical and biochemical processes. Chemical Society Reviews 22, 73-83.
Armitage, J.M., Arnot, J.A., Wania, F., Mackay, D. (2013). Development and evaluation of a mechanistic bioconcentration model for ionogenic organic chemicals in fish. Environmental Toxicology and Chemistry 32, 115-128.
Bruggeman, W.A., Opperhuizen, A., Wijbenga, A., Hutzinger, O. (1984). Bioaccumulation of super-lipophilic chemicals in fish. Toxicological and Environmental Chemistry 7, 173-189.
Droge, S.T.J., Goss, K.U. (2013). Development and evaluation of a new sorption model for organic cations in soil: Contributions from organic matter and clay minerals. Environmental Science and Technology 47, 14233-14241.
Endo, S., Escher, B.I., Goss, K.U. (2011). Capacities of membrane lipids to accumulate neutral organic chemicals. Environmental Science and Technology 45, 5912-5921.
Endo, S., Goss, K.U. (2014). Applications of polyparameter linear free energy relationships in environmental chemistry. Environmental Science and Technology 48, 12477-12491.
Eriksson, L., Hermens, J.L.M., Johansson, E., Verhaar, H.J.M., Wold, S. (1995). Multivariate analysis of aquatic toxicity data with PLS. Aquatic Sciences 57, 217-241.
Geisler, A., Endo, S., Goss, K.U. (2012). Partitioning of organic chemicals to storage lipids: Elucidating the dependence on fatty acid composition and temperature. Environmental Science and Technology 46, 9519-9524.
Goss, K.U., Schwarzenbach, R.P. (2001). Linear free energy relationships used to evaluate equilibrium partitioning of organic compounds. Environmental Science and Technology 35, 1-9.
Goss, K.U., Schwarzenbach, R.P. (2003). Rules of thumb for assessing equilibrium partitioning of organic compounds: Successes and pitfalls. Journal of Chemical Education 80, 450-455.
Hansch, C., Streich, M., Geiger, F., Muir, R.M., Maloney, P.P., Fujita, T. (1963). Correlation of biological activity of plant growth regulators and chloromycetin derivatives with Hammett constants and partition coefficients. Journal of the American Chemical Society 85, 2817.
Jonker, M.T.O., van der Heijden, S.A. (2007). Bioconcentration factor hydrophobicity cutoff: An artificial phenomenon reconstructed. Environmental Science and Technology 41, 7363-7369.
Katritzky, A.R., Slavov, S., Radzvilovits, M., Stoyanova-Slavova, I., Karelson, M. (2009). Computational chemistry approaches for understanding how structure determines properties. Zeitschrift für Naturforschung Section B: A Journal of Chemical Sciences 64, 773-777.
Könemann, H., Van Leeuwen, K. (1980). Toxicokinetics in fish: Accumulation and elimination of six chlorobenzenes by guppies. Chemosphere 9, 3-19.
Kraaij, R., Mayer, P., Busser, F.J.M., Bolscher, M.V., Seinen, W., Tolls, J. (2003). Measured pore-water concentrations make equilibrium partitioning work - a data analysis. Environmental Science and Technology 37, 268-274.
Sabljic, A., Güsten, H., Verhaar, H.J.M., Hermens, J.L.M. (1995). QSAR modelling of soil sorption. Improvements and systematics of log Koc vs. log Kow correlations. Chemosphere 31, 4489-4514.
Tülp, H.C., Fenner, K., Schwarzenbach, R.P., Goss, K.U. (2009). pH-dependent sorption of acidic organic chemicals to soil organic matter. Environmental Science and Technology 43, 9189-9195.
Veith, G.D., Defoe, D.L., Bergstedt, B.V. (1979). Measuring and estimating the bioconcentration factor of chemicals in fish. Journal of the Fisheries Research Board of Canada 36, 1040-1048.