Estimating species sensitivity distributions for microplastics by quantitively considering particle characteristics using a recently created ecotoxicity database

Estimation of a species sensitivity distribution (SSD) by fitting a statistical distribution to ecotoxicity data is a promising approach to deriving “safe” concentrations for microplastics. However, most existing SSDs do not quantitatively consider the diverse characteristics of microplastics, such as particle size and shape. To address this issue, based on 38 mass-based chronic no observed effect concentrations (NOECs) obtained from a recently created database, we estimated SSDs that quantitatively consider the influences of three types of microplastic characteristics (particle length, shape, and polymer type) and habitat of the test species (freshwater vs. marine) by using Bayesian modeling. We selected the best SSD model among all possible models using the widely applicable information criterion. The best SSD model included particle length (range: 0.05–280 μm) and a binary dummy variable corresponding to the fiber shape. Lower chronic NOECs were associated with decreasing particle size and with toxicity tests that included fibers in this model. Combined with the fact that the null model (i.e., an SSD model with no predictor variable) was ranked 27th among the 64 candidate SSD models, our results support the need to incorporate particle characteristics such as length and shape (e.g., fiber) into estimations of SSDs for microplastics. The medians of the hazardous concentration of 5% of species (HC5) for microplastic spheres and fragments, estimated by the posterior distributions of individual parameters in the best SSD model, ranged from 0.02 to 2 µg/L, depending on the particle length (0.1–100 μm). For microplastic fibers, the HC5 values were estimated to be approximately 100 times lower than those for microplastic spheres and fragments with the same particle length. However, the 95% Bayesian credible intervals for HC5 estimates for fibers were considerable, expanded by up to five orders of magnitude. Despite many remaining challenges, the Bayesian SSD modeling utilized in this study provides unique opportunities to simultaneously investigate the influences of multiple microplastic characteristics on the NOECs of multiple species, which would otherwise be difficult to discern.


Introduction
Microplastics, defined as plastics with a particle length of less than 5 mm, are ubiquitous in aquatic environments [1][2][3][4][5] and potentially pose risks to the environment as well as human health [6][7][8][9].The number of laboratory studies reporting ecotoxicity of microplastics to biological species has been growing as researchers try to better understand and infer the ecological impacts on aquatic ecosystems [5,10,11].To assess and manage ecological risk based on such ecotoxicity data, it is important to derive predicted no effect concentrations (PNECs) below which adverse ecological impacts of concern are unlikely to occur.
A species sensitivity distribution (SSD) is usually estimated by fitting a statistical distribution to ecotoxicity data [12], and a hazardous concentration of 5% of species (HC5) is used to derive a PNEC.SSDs are now internationally accepted and used to develop environmental water quality benchmarks (e.g., standards, guidelines, and other criteria) [13,14].Many attempts have been made to estimate SSDs for microplastics [7,8,[15][16][17][18], but most existing SSDs do not consider the diverse characteristics of microplastics, such as particle size and shape (sphere, fragment, and fiber) or polymer type (e.g., polystyrene [PS] and polyethylene [PE]), which can affect their ecotoxicity and HC5 values [17,[19][20][21].
In this study, we used a recently created ecotoxicity database for microplastics (Toxicity of Microplastics Explorer [ToMEx]; [11]) to estimate SSDs that quantitatively consider the influences of three types of microplastic characteristics (i.e., particle size/length, particle shape, and polymer type) and the habitat of test species (i.e., freshwater vs. marine) by using Bayesian modeling.A previous application of Bayesian hierarchical modeling to SSDs was based on a limited number of chronic lowest observed effect concentrations (LOECs; n = 26) for spherical microplastics [17].By using this more comprehensive database, we extended the Bayesian SSD modeling to microplastic fragments and fibers.The specific objective was to gain insights into which characteristics were important in the prediction of microplastic ecotoxicity, as well as to validate and further investigate the findings of the previous study [17].By assessing the improvements in data availability for SSD estimation, this study also contributes to identifying the research and data needs for developing more scientifically defensible SSDs.

Data and assumptions
All of the ecotoxicity data used in this study were acquired from the ToMEx Aquatic Organisms database on 10 August 2022 (https://microplastics.sccwrp.org; [11]).This database is a comprehensive collection of ecotoxicity data for microplastics, including detailed information on particle length and shape, polymer type, and habitat ("Environment" in the ToMEx database), as well as effect concentrations such as no observed effect concentration (NOEC), LOEC, and 50% effect concentration (EC50).The database includes a total of 5853 microplastic effect records (3444 acute and 2409 chronic data points) extracted from a total of 162 publications [11].
For SSD estimation, Mehinto et al. [7] explicitly used two effect mechanisms (food dilution and tissue translocation) and suggested that HC5 values based on food dilution were lower than those based on tissue translocation.Following Mehinto et al. [7], in our SSD modeling, we assumed that the ingestion of microplastics is a necessary step leading to the organism-level effects such as growth inhibition and reduced reproduction.This assumption resulted in the exclusion of effect concentrations derived from tests with algal species because ingestion is not a plausible mechanism [7].Furthermore, mass-based concentrations were used in our SSD modeling by assuming that food dilution is likely a key effect mechanism and that the choice of volume-or massbased concentrations should not materially affect results given that the plastic densities are generally close to 1 [11,22,23].We might expect that none of the microplastic characteristics would be suggested as important by SSD modeling if food dilution is the only responsible effect mechanism.However, this expectation may not be valid because the available data were still limited and incomplete (see the Results and Discussion section).Despite this limitation, our modeling results are still valuable in the evaluation of microplastic characteristics that are important for SSD estimation.
We selected effect concentrations for the SSD estimation if (1) the exposure route was water and the experimental type was "particle only" (e.g., excluding effect data obtained from tests with plastic leachate) and (2) the essential quality criteria ("red criteria") for risk assessment were passed (see [11] for details).Effect concentrations were excluded if they were obtained from tests with particle mixtures or the information about polymer type or shape was unavailable because such effect data could not be adequately considered in our modeling.In addition, highest observed no effect concentrations were not included in our analysis because of their limited reliability [7,15].Effect concentrations based on the organism-level endpoints, such as growth, development, mortality, and reproduction, were included in our analysis.Whether endpoints can be regarded as organism level was judged from information about the level of biological organization for the observed endpoints and the "endpoints score" (see [11] for more details).Note that, for these selection processes, we simply used the available information in the ToMEx database.Then, for each unique combination of DOI (digital object identifier), species, polymer type, shape, and particle length, a single effect concentration was selected by following the priority order of chronic NOEC, chronic LOEC, acute NOEC, acute LOEC, and acute EC50.These selection processes and the corresponding numbers of resulting effect records are summarized in Supplementary Table S1.The original effect concentrations were then converted to chronic NOECs by applying assessment factors available in the ToMEx database.As a result, a total of 38 chronic NOECs that included 18 species across 6 groups (Cnidaria, Crustacea, Echinoderm, Fish, Mollusca, and Rotifera) were used for SSD modeling.This dataset is available on the GitHub website at https://github.com/yuichiwsk/SSDmodeling_ToMEx2022.Of the 38 NOECs, 20 and 18 were tested with species inhabiting freshwater and marine environments, respectively.

Species sensitivity distribution
Bayesian SSD modeling was performed in accordance with the procedures detailed in Takeshita et al. [17] and considered the influences of particle size (particle length: 0.05-280 μm), particle shape, polymer type, and/or type of habitat on the chronic NOECs of microplastics.We assumed that effect concentrations followed a log-normal distribution, which is a commonly used distribution for SSD estimation [24,25].The log-normal SSD models (i.e., normal SSD models for log 10 -transformed NOECs) were expressed by the following equations: where NOEC is the chronic NOEC, and µ and σ are the mean and standard deviation of the normal SSD, respectively.The parameters α and β i are the intercept and coefficient for the predictor variable X i , respectively, for deriving the mean value (i.e., µ) of the chronic NOECs.The six predictor variables (and β i ) were particle length (β length ) and 5 binary dummy variables corresponding to habitat (β habitat ), fragment (β fragment ), fiber (β fiber ), PS (β polystyrene ), and PE (β polyethylene ).These six predictor variables were selected following the previous study [17], and considering the limited data availability, we decided not to include additional variables.Details of the six predictor variables are described below.Although Takeshita et al. [17] incorporated reference-level random effects to account for variations in effect concentrations due to reference-specific unmodeled factors such as particle source and cleaning, we chose not to do so because the incorporation of such random effects may result in the underestimation of the SSD standard deviation and, consequently, an overestimation of HC5 values.This is particularly true when there are limited instances of pseudoreplication (i.e., cases where multiple chronic NOECs are obtained from a single study).Note that, due to the exclusion of the random effects, we referred the modeling approach utilized in this study as Bayesian SSD modeling (rather than Bayesian hierarchical SSD modeling; [17]).The range of particle length was 0.05-280 μm, and the log 10 -transformed values were used to consider the influence of particle length on SSD mean (i.e., µ).A binary dummy variable corresponding to type of habitat (marine: 0, freshwater: 1) was used to examine the influence of the habitat [17,26].To consider the influences of microplastic shapes, we used two binary dummy variables representing microplastic fragments (fragment: 1, other shapes: 0) and fibers (fiber: 1, other shapes: 0).This implies that if the shape was spherical, both of these binary variables were set to zero.Totals of 18, 17, and 3 NOECs were obtained from toxicity tests with microplastic shapes of sphere, fragment, and fiber, respectively.Similarly, two binary dummy variables corresponding to PS (PS: 1, other polymer types: 0) and PE (PE: 1, other polymer types: 0) were included as the predictors in the SSD modeling.Polymer types other than PS (n = 15) and PE (n = 14) included polypropylene (n = 2), polyethylene terephthalate (n = 4), polylactic acid (n = 1), polyurethane (n = 1), and polyvinylchloride (n = 1).Because the numbers of these other polymer types were limited, only two binary variables concerning PS and PE were used in our analysis.Similar to the effect data selection process, data about these variables were acquired from the ToMEx database.As in our previous study [17], we assumed that the standard deviation of SSD was not affected by these characteristics because of the limited data availability, but this assumption will need to be tested in future studies.

Parameter estimation, model selection, and estimation of HC5 values
Based on all possible combinations of the six predictor variables, we developed a total of 64 candidate SSD models.Parameter estimation of these SSD models was performed using a Bayesian framework.Hamiltonian Monte Carlo sampling was used to obtain posterior samples of the parameters; the sampling was performed using Stan [27] with R 3.6.3[28] (R Core Team, 2018) and the package "rstan" ver.2.21.2 [29].
The parameter for standard deviation (σ) was lowercensored at zero.We employed Cauchy and half-Cauchy distributions as weakly informative prior distributions of individual parameters [29].Specifically, we used a Cauchy distribution with location parameter 0 and scale parameter 10 or 5 for α and β i , respectively.For σ, we used a half-Cauchy distribution with location parameter 0 and scale parameter 5.The R and Stan code for the SSD model with all the predictor variables, as well as the dataset used, can be found on the GitHub website at https://github.com/yuichiwsk/SSDmodeling_ToMEx2022.We ran three chains in parallel with a burn-in of 10,000 samples, which were discarded, followed by 20,000 samples that were thinned to retain every 10th sample, resulting in a total of 6000 samples as the posterior distributions of the individual parameters.The convergence of sampling was assured with the criterion that the Gelman-Rubin statistic R was less than 1.1 [30].
We ranked the candidate SSD models based on the widely applicable information criterion (WAIC; Watanabe [31]), a measure of model fit and complexity, with the R package "loo" ver.2.5.1 [32].Models with smaller WAIC values have higher predictive power for the dependent variable value (i.e., the log 10 -transformed chronic NOEC) among the models evaluated.A smaller difference in WAIC values indicates similar predictive powers between models.However, there is no theoretical criterion for a cut-off value for the difference in WAIC values to conclude whether the model with a larger WAIC value is not supported.Thus, we selected the model with the minimum WAIC value as the best model and discuss the model selection results considering other competitive models.The SSD curve for the best model was obtained by using the Hamiltonian Monte Carlo samples of parameters, and the posterior distribution of HC5 was derived from the curve.

SSD model selection
The SSD model with the minimum WAIC value among the 64 candidate models was the model with two predictors: particle length and the binary dummy variable corresponding to fiber (Table 1; Fig. 1).Particle length was positively associated with µ (i.e., the posterior median of the parameter β length ; Table 1) in the best model, indicating lower chronic NOECs with decreasing particle size in this model.In addition, the posterior median of the coefficient for microplastic fibers (β fiber ; Table 1) was negative, indicating that chronic NOECs obtained from toxicity tests with fibers were lower than those with other shapes (i.e., sphere and fragment) in this SSD model.However, the 95% Bayesian credible intervals of the posterior distributions for β fiber included zero, indicating that careful interpretation is required.
The best SSD model did not include variables corresponding to habitat, fragment, PS, and PE; these predictors were first included in the second, fifth, seventh, and third-ranked models, respectively.Although a cutoff value for the WAIC difference is unavailable, the null model (i.e., no predictor variable) was ranked 27th, and the WAIC difference between the null and best models was 3.8; Table 1).These results at least supported the incorporation of particle characteristics such as Table 1 Posterior median (95% Bayesian credible intervals) of the parameters and the widely applicable information criterion (WAIC) for the top 5 (rank 1-5), full, and null species sensitivity distribution (SSD) models  A model with a smaller WAIC value has higher predictive power, but there is no theoretical criterion for a cut-off value for the delta WAIC.See Eqs.1-2 and related text for details of α, β, and σ length and shape (e.g., fiber) into estimations of SSDs for microplastics.

HC5 estimates
The median HC5 values for microplastic spheres and fragments, estimated by the posterior distributions of individual parameters in the best SSD model, ranged from 0.02 to 2 µg/L, depending on the particle length (0.1-100 μm; Fig. 2).Despite differences in the effect datasets and methodologies used, these HC5 estimates were generally within the same order of magnitude as those reported in previous studies, such as 0.14 µg/L [33], 1.7 and 5.4 µg/L [18], and 0.5 µg/L [15].For microplastic fibers, the HC5 values were estimated to be approximately 100 times lower than those for microplastic spheres and fragments with the same particle length (Fig. 2).However, the 95% Bayesian credible intervals for HC5 estimates for fibers were expanded by up to five orders of magnitude.Even though the estimation of SSD parameters was performed simultaneously using all the effect data, a likely reason for this result is only three effect data points were included in our analysis (Fig. 1).These results clearly indicate the need of more effect data for microplastic fibers to accurately capture the influence of the fiber shape and to estimate HC5 values, as has also been generally suggested by previous studies [34,35].

Factors affecting SSD estimation
The 95% Bayesian credible interval for β length in the best SSD model did not include zero (Table 1), but the influence of the particle length should be carefully interpreted.Our previous study [17] showed a negative influence of particle length, which is the opposite of the current finding.It is difficult to determine the reasons for the opposite results because the effect data used in the two studies were different (e.g., LOECs vs. NOECs; only spheres included versus three shapes included), and previous studies have drawn mixed conclusions (Jacob et al. [36], Yang and Nowack [37]; also see the discussion in Takeshita et al. [17]).Although the effects of tissue translocation have been suggested to be relatively minor based on the comparison of HC5 values estimated from the SSDs [7], such effects may have been responsible for the negative influence of particle size in our modeling.Furthermore, the polymer type for all the effect data with particle length of < 1 μm was PS (Fig. S1), which could have been a confounding factor.Indeed, the SSD model with two binary variables corresponding to fiber and PS was ranked 7th (WAIC = 155.5).Therefore, given these issues, it is still too early to reach a firm conclusion about the influence of particle length in SSD modeling.
The parameters corresponding to habitat (marine vs. freshwater) and polymer type (PS and PE) were rarely included in the top 5 models (i.e., ≤ 1 model; Table 1).For the influence of habitat, our results suggest that the chronic NOECs obtained from tests with marine species were on average 0.3 times lower than those with freshwater species.A similar trend for lower LOECs in marine media/species was observed in Takeshita et al. [17].However, considering that the ratios of freshwater to seawater SSD means for chemicals were mostly within a factor of 10 [26], such differences in chronic NOECs indicated by the SSD modeling might be within the margin of inherent errors.
It is important to note that our Bayesian SSD modeling did not incorporate the identity of biological species, largely because only a few effect data were available for most species (Fig. S2).For instance, only two freshwater cladoceran species (Daphnia magna and Ceriodaphnia dubia), which are commonly used as standard test species, had four or more effect data points.This implies that data on microplastic characteristics currently available for toxicity tests in the ToMEx database are very limited in terms of being used to comprehensively examine their influences on individual species.In addition to this issue, many challenges remain, such as the use of assessment factors to derive chronic NOECs and the alignment of HC5 values to exposure data, the latter of which is addressed in Koelmans et al. [38] and Mehinto et al. [7].However, the Bayesian SSD modeling utilized in this study provides unique opportunities to simultaneously investigate the influences of multiple microplastic characteristics on NOECs of multiple species, which would otherwise be difficult to discern.In combination with efforts to generate more relevant effect data [10,39] and update the databases such as ToMEx [11], the use of the Bayesian SSD modeling should facilitate more comprehensive discussions about the effects of microplastics on aquatic communities.

Fig. 1
Fig. 1 Species sensitivity distributions (SSDs, lines) illustrated based on the best model estimated by Bayesian modeling Median values of particle length were used for this illustration (particle length: 3.5 and 47.5 μm for sphere/fragment and fiber, respectively).The 90% Bayesian credible intervals are represented by broken lines.Chronic no observed effect concentrations (open circles, squares, and triangles) are included using Hazen plotting (p i = (i -0.5)/n)