Developing relational tables
Approach and assumptions
We compiled 68 English-language survey lists from various countries and organizations, including government, research, nonprofit, and academic groups, that describe trash survey types for freshwater, marine, and terrestrial ecosystems. Throughout this report, we italicize class names when referring to trash typology. Three groups of classes were found across most of the surveys, which describe trash in terms of material (the resource used to make the item, e.g., plastic or paper), item (a description of the form of the object, e.g., bottle or fragment), and brand (the logo or manufacturer’s name identified on the item) (Fig. 1). We also recognized two relational systems within the data: alias (synonymous words, e.g., cap and lid) and hierarchy (words that are parents or nested as children, e.g., spoon, fork, and knife are nested under utensils). We developed relational tables for comparing words used within and between these structures that originated from the 68 selected survey sheets. To provide potential users with the definitions we operationalized in this study, we present a glossary of terms used (Table S1).
The primary assumption within the TTT is that there are no differences in the definitions of a given class between surveys. An example of a violation of this assumption would be two surveys that define fragment based on size but with different criteria, such as fragment = particles > 1 mm versus fragment = particles < 5 mm. These surveys would classify different sets of objects using the same word. There are other types of information held within the methodological distinctions in definitions that we did not investigate further (e.g., color, shape, size) unless the methodological limitation was encoded in the class name (e.g., rope diameter < 1 cm). This study compared the words used to describe trash and how they relate to one another, based on professional experience with trash nomenclature.
Material-item relational table
We compiled a table listing the materials and items described by each organization’s classification system that we reviewed for our study (Fig. 2). Each row represents a unique material-item relationship (e.g., plastic and straw being listed in a row together). Sometimes it was unclear whether a class described a material class or an item class (e.g., disposable fork, typically made of plastic). To avoid introducing bias and adding words not used explicitly in the surveys, these classes were placed in the item class, and the material class was not inferred.
Misaligned category table
We defined misaligned classes as classes that did not fit within the material, item, or brand classes. If the class was too ambiguous, did not conform to the standard one-parent rule for hierarchical databases, or did not describe trash in environments, we added it to a separate document called the misaligned class table. Examples of misaligned classes include construction materials, fishing gear, and tree.
Alias and hierarchical tables
We developed alias tables for material, item, and brand classes independently of one another (Fig. 2). In the item and material alias tables, each row links all words found to have the same meaning: the first column holds the prime word, which serves as the key for joining to the hierarchy (Fig. 2), and all other columns hold its aliases (e.g., fork and forks fall under the same prime word). Break Free From Plastic, a nonprofit organization promoting a global movement to create a future free from plastic pollution, developed the brand alias table by researching the manufacturers who own the brands found during their annual Brand Audit in 2018 and 2019 [12]. This table was formatted with the manufacturer class repeated in one column for each brand owned by that manufacturer. In the alias tables, prime words can be merged with the hierarchical tables and vice versa. We established a single alias rule for every word in the alias tables, so that any word could join to only one prime word, to simplify analysis procedures using the tables.
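The alias lookup and the single alias rule can be illustrated with a minimal Python sketch. The example rows, the choice of prime words, and the helper name `build_alias_lookup` are hypothetical and only illustrate the idea; the TTT itself is implemented in R.

```python
# Hypothetical alias rows: the first entry is the prime word, the rest are aliases.
ALIAS_ROWS = [
    ["fork", "forks"],
    ["cap", "lid", "caps", "lids"],
]


def build_alias_lookup(rows):
    """Map every word (prime or alias) to its prime word.

    Raises ValueError if a word would join to more than one prime word,
    which is how this sketch enforces the single alias rule.
    """
    lookup = {}
    for prime, *aliases in rows:
        for word in [prime, *aliases]:
            key = word.strip().lower()
            if key in lookup and lookup[key] != prime:
                raise ValueError(
                    f"{key!r} joins to both {lookup[key]!r} and {prime!r}"
                )
            lookup[key] = prime
    return lookup


alias_lookup = build_alias_lookup(ALIAS_ROWS)
print(alias_lookup["lids"])  # cap
```

Storing the lookup as a flat word-to-prime mapping makes the later join to the hierarchy a single dictionary access per survey class.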
Additionally, we developed hierarchy tables for material classes (Fig. 3A) and item classes (Fig. 3B). These tables specify the hierarchical position of prime words through multi-level grouping (e.g., the utensils class encompasses forks, knives, spoons, and straws; plastic in materials includes foam and soft plastic). The hierarchy tables describe only the prime words from the alias tables, since each prime word stands in for its aliases. Hierarchical groups were sometimes obvious. For example, one survey we reviewed used the class glass/ceramic while another split the classes into glass and ceramic. In other cases, the relationships were more nuanced. For example, organic is a more general material description that includes materials like wood and cloth. We established a single parent rule for the hierarchical tables, under which every word could have at most one parent, to simplify analysis procedures using the tables.
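The single parent rule can be sketched in Python by storing the hierarchy as a child-to-parent mapping. The rows below reuse the material examples from the text, and the helper name `build_parent_map` is hypothetical; this is an illustration, not the R code behind the TTT.

```python
# Hypothetical (child, parent) rows of prime words from a material hierarchy table.
MATERIAL_ROWS = [
    ("foam", "plastic"),
    ("soft plastic", "plastic"),
    ("wood", "organic"),
    ("cloth", "organic"),
]


def build_parent_map(rows):
    """Return a child -> parent mapping, rejecting any second parent.

    A dictionary can hold only one parent per child, so a conflicting row
    signals a violation of the single parent rule.
    """
    parent_of = {}
    for child, parent in rows:
        if child in parent_of and parent_of[child] != parent:
            raise ValueError(
                f"{child!r} has two parents: {parent_of[child]!r} and {parent!r}"
            )
        parent_of[child] = parent
    return parent_of


material_parent = build_parent_map(MATERIAL_ROWS)
print(material_parent["foam"])  # plastic
```

With at most one parent per word, the hierarchy is a forest, so every word has exactly one path to its root and counts can be summed upward without double counting.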
Database query tool development
The Trash Taxonomy Tool (TTT) is a database with a set of query tools and all previously mentioned relational tables, accessible via an online application (openanalysis.org/trashtaxonomy). The site was created using the shiny [18], dplyr [19], data.table [20], shinyjs [21], shinythemes [22], DT [23], shinyhelper [24], data.tree [25], and collapsibleTree [26] packages in R (4.0.5) and RStudio (1.4.1106). This site allows users to upload a comma-separated values (CSV) file of their survey list to process using the alias and hierarchy framework; an example of the exact required formatting is provided in the supplemental information (Table S2). In technical terms, the TTT is a schema matching tool because it matches and maps schemas from trash surveys to a unified format [27]. The TTT first uses an alias lookup to match and map the user-provided survey classes to prime word keys for material and item words. It then locates the prime word in the hierarchy and allows users to display all recognized words that are more or less specific in their item and material columns. It finds all parent words when the less specific function is called and all child words when the more specific function is called. If the user provides a word that is not in the relational tables, the tool returns a notification for that particular word. More detailed documentation and a video tutorial (https://youtu.be/sqeLaJKyol8) can be found on the TTT website.
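The two-step query described above (alias lookup, then hierarchy traversal) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the TTT's R implementation: the tables are tiny hypothetical excerpts, the top-level class `foodware` is invented to show a second level, and the function names `less_specific` and `more_specific` simply mirror the wording of the text.

```python
# Hypothetical excerpts of an alias table (word -> prime word) and an
# item hierarchy table (child prime word -> parent prime word).
ALIAS = {"fork": "fork", "forks": "fork", "knife": "knife",
         "spoon": "spoon", "utensils": "utensils", "foodware": "foodware"}
PARENT_OF = {"fork": "utensils", "knife": "utensils",
             "spoon": "utensils", "utensils": "foodware"}


def to_prime(word):
    """Alias lookup; returns None for words missing from the tables."""
    return ALIAS.get(word.strip().lower())


def less_specific(word):
    """All parent words of the matched prime word, nearest parent first."""
    prime = to_prime(word)
    if prime is None:
        return None  # stands in for the tool's notification
    parents = []
    while prime in PARENT_OF:
        prime = PARENT_OF[prime]
        parents.append(prime)
    return parents


def more_specific(word):
    """All child words nested under the matched prime word, at any depth."""
    prime = to_prime(word)
    if prime is None:
        return None  # stands in for the tool's notification
    found, frontier = [], [prime]
    while frontier:
        current = frontier.pop()
        children = sorted(c for c, p in PARENT_OF.items() if p == current)
        found.extend(children)
        frontier.extend(children)
    return found


print(less_specific("forks"))  # ['utensils', 'foodware']
print(sorted(more_specific("utensils")))  # ['fork', 'knife', 'spoon']
```

Returning `None` for an unknown word keeps "not found" distinct from "found, but has no parents or children", which returns an empty list.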
Relational table cleaning and validation
We cleaned the relational tables using several tests. We created basic queries to identify duplicated terms, remove them, and ensure that all relationship links between the tables (Fig. 2) were equivalent in both directions for the alias to hierarchy relationships. The material-item table keys are equivalent to the alias and misaligned prime keys combined. The key column in the ‘material alias’ table has the same terms as the key column in the ‘material hierarchy’ table. We created a visualization within our online tool to inspect all the relational tables for nuanced relationships like semantic relationships within and between the tables. We also uploaded the material-item relational table to the query tool, then returned the relational table’s results and visually assessed the matches.
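Two of the checks described above, finding duplicated terms and confirming that key columns agree between tables, can be sketched in Python as follows. The function names and example terms are hypothetical, and normalizing terms by trimming and lowercasing is an assumption of this sketch rather than a documented step.

```python
from collections import Counter


def duplicated_terms(terms):
    """Return the terms that occur more than once after normalization."""
    counts = Counter(term.strip().lower() for term in terms)
    return sorted(term for term, n in counts.items() if n > 1)


def key_mismatches(alias_keys, hierarchy_keys):
    """Report keys present in one table but missing from the other."""
    alias_set, hierarchy_set = set(alias_keys), set(hierarchy_keys)
    return {
        "only_in_alias": sorted(alias_set - hierarchy_set),
        "only_in_hierarchy": sorted(hierarchy_set - alias_set),
    }


# Hypothetical key columns from a material alias table and hierarchy table.
alias_keys = ["plastic", "foam", "glass", "Foam "]
hierarchy_keys = ["plastic", "foam", "ceramic"]

print(duplicated_terms(alias_keys))  # ['foam']
print(key_mismatches({"plastic", "foam", "glass"}, hierarchy_keys))
```

Both mismatch lists being empty corresponds to the condition that the relationship links are equivalent in both directions.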
Assessment of the current state of trash typology
Summary statistics
We calculated summary statistics on each of the relational tables. The total number of classes was assessed by summing the number of unique words used within the survey lists (e.g., fork or spoon and fork/spoon are considered separate words). We assessed the number of unique classes by summing the unique prime aliases in the alias table (e.g., the two previously mentioned categories are joined to the same class). The number of levels of the hierarchies was assessed using the maximum number of levels of any given branch in the hierarchy tables. Diagrams were developed to demonstrate the depth and complexity of the hierarchical tables.
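The three summary statistics can be sketched in Python as follows. The survey words, the alias and hierarchy excerpts, the choice of prime words, and the top-level class `foodware` are all hypothetical; the depth function assumes the hierarchy contains no cycles.

```python
# Hypothetical words pooled from several survey lists (repeats included).
survey_words = ["fork or spoon", "fork/spoon", "cap", "lid", "cap"]

# Hypothetical alias excerpt (word -> prime word) and hierarchy excerpt
# (child prime word -> parent prime word).
ALIAS = {"fork or spoon": "fork or spoon", "fork/spoon": "fork or spoon",
         "cap": "cap", "lid": "cap"}
PARENT_OF = {"fork or spoon": "utensils", "utensils": "foodware"}


def hierarchy_depth(parent_of):
    """Maximum number of levels along any branch (assumes no cycles)."""
    def levels(word):
        n = 1
        while word in parent_of:
            word = parent_of[word]
            n += 1
        return n

    words = set(parent_of) | set(parent_of.values())
    return max(levels(word) for word in words)


total_classes = len(set(survey_words))                   # unique words as written
unique_classes = len({ALIAS[w] for w in survey_words})   # unique prime words
depth = hierarchy_depth(PARENT_OF)

print(total_classes, unique_classes, depth)  # 4 2 3
```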
Factor analysis
The similarities between the groups of survey types (organization, ecosystem, substrate) were assessed with multiple correspondence analysis (MCA), using FactoMineR [28] and FactoShiny [29]. MCA is recommended for factor analysis of categorical variables instead of PCA [28]. We expected that the survey lists of similar types (e.g., marine trash surveys) would use similar trash typology since they would have similar study goals. We split survey types by organization into research, nonprofit, and academic; ecosystems were split into marine, riverine, estuary, or land; substrates were split into beach, surface water, underwater, or roadside. First, we joined all classes used in each survey’s materials-items table to the alias tables. Second, we converted all classes to a matrix with zero denoting that the survey list did not have the class and one denoting that the survey list had the material and item class (one-hot encoding). The MCA’s supplemental information (information not used to inform the model development) included organization, ecosystem, and substrate types (Table S3). V test statistics were assessed for each supplemental category's first and second dimensions. V tests are used to determine if a supplemental category has an MCA dimension significantly different from zero. We assigned a V test statistic value of 2 as the cutoff for significance (Table S4, Table S5).
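The one-hot encoding step can be illustrated with a short Python sketch. The survey names and material-item pairs are hypothetical, and the sketch covers only the construction of the presence/absence matrix, not the MCA itself, which the study ran with FactoMineR in R.

```python
# Hypothetical material-item classes per survey list, already joined to
# prime words through the alias tables.
surveys = {
    "survey_a": {("plastic", "bottle"), ("glass", "bottle")},
    "survey_b": {("plastic", "bottle"), ("plastic", "straw")},
}

# Column order: every class that appears in at least one survey list.
classes = sorted(set().union(*surveys.values()))

# One row per survey list; 1 = the list has the class, 0 = it does not.
one_hot = {
    name: [1 if cls in held else 0 for cls in classes]
    for name, held in surveys.items()
}

print(classes)
print(one_hot["survey_a"])  # [1, 1, 0]
print(one_hot["survey_b"])  # [0, 1, 1]
```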
Comparability analysis
We assessed the comparability of each survey list to all the others by calculating the one-way percent of overlapping items or materials after joining them to the alias table:
$$\text{Comparability Metric}_{X,Y}=\frac{\sum \text{classes in sheet } X \text{ equivalent with classes in sheet } Y}{\sum \text{all classes in sheet } Y}$$
(1)
where $\text{Comparability Metric}_{X,Y}$ is a one-way test for how comparable survey X is with survey Y. The metric defines the proportion of the classes in survey list Y that are accounted for by the classes in survey list X after joining the lists to the alias table. The comparability metric helps describe how much one survey accounts for the classes in another survey, a typical operation when merging trash survey lists. We then averaged all comparability metrics for each survey by material and item independently and plotted them to identify the most comparable surveys and discuss strategies for creating a 100% comparable survey list.
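Equation 1 can be sketched in Python as follows. The survey lists and alias excerpt are hypothetical. The sketch also makes two assumptions that the text does not spell out: classes are counted once per prime word after the alias join, and words missing from the alias table are kept as written.

```python
# Hypothetical alias excerpt (word -> prime word).
ALIAS = {"fork": "fork", "forks": "fork", "cap": "cap", "lid": "cap"}


def comparability(x_classes, y_classes, alias=ALIAS):
    """Proportion of the classes in list Y accounted for by list X (Eq. 1).

    Assumption: words missing from the alias table are kept as written.
    """
    x = {alias.get(word, word) for word in x_classes}
    y = {alias.get(word, word) for word in y_classes}
    if not y:
        raise ValueError("survey list Y has no classes")
    return len(x & y) / len(y)


survey_x = ["forks", "lid", "bottle"]
survey_y = ["fork", "cap", "straw", "bottle"]

print(comparability(survey_x, survey_y))  # 0.75
print(comparability(survey_y, survey_x))  # 1.0
```

The two printed values differ because the metric is one-way: list X covers three of the four classes in list Y, while list Y covers all three classes in list X.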
Another way to compare trash surveys is to lump them together using the hierarchy. We used the hierarchy and alias tables to compare the Stormwater Monitoring Coalition (SMC) survey list with the NOAA survey list. First, we added randomly sampled trash counts (a standard trash survey method) between 1 and 10 for each trash typology. We joined both surveys to the alias table and then to the hierarchical tables. We used the data.tree [25] package in R to sum up the hierarchies to demonstrate how the two surveys are related based on the hierarchy.
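Summing counts up the hierarchy can be sketched in Python as follows. The study used the data.tree package in R for this step; the sketch below is a plain-Python stand-in with hypothetical classes, a hypothetical hierarchy excerpt, and an arbitrary random seed, and it does not reproduce the SMC or NOAA survey lists.

```python
import random
from collections import defaultdict

# Hypothetical item hierarchy excerpt (child prime word -> parent prime word).
PARENT_OF = {"fork": "utensils", "knife": "utensils", "spoon": "utensils"}


def roll_up(counts, parent_of):
    """Add each class's count to itself and to every ancestor."""
    totals = defaultdict(int)
    for word, n in counts.items():
        totals[word] += n
        while word in parent_of:
            word = parent_of[word]
            totals[word] += n
    return dict(totals)


rng = random.Random(42)  # arbitrary seed, fixed only for repeatability

# A detailed list records individual utensils; a coarse list records only
# the parent class. Counts are sampled between 1 and 10, inclusive.
detailed_counts = {w: rng.randint(1, 10) for w in ["fork", "knife", "spoon"]}
coarse_counts = {"utensils": rng.randint(1, 10)}

detailed_totals = roll_up(detailed_counts, PARENT_OF)
coarse_totals = roll_up(coarse_counts, PARENT_OF)

# Both lists can now be compared at the shared "utensils" level.
print(detailed_totals["utensils"], coarse_totals["utensils"])
```

Because every word has at most one parent, each count travels along a single path to the root, so the rolled-up totals are not double counted.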