Introduction

Dissolved organic matter (DOM) is considered to be one of the most reactive and dynamic pools of organic material in the Earth’s system (Hansell and Carlson 2015; Hedges 2002). Lakes have key reprocessing functions in the global biogeochemical cycles of DOM (Tranvik et al. 2009): in these environments, DOM is transformed into atmospheric gases and molecular by-products via photochemical, microbial, and other processes (e.g., dark Fenton reactions). These by-products (“reprocessed DOM”) are later exported to the global oceans via rivers or sequestered in lake sediments via flocculation and complexation processes. Lake outgassing and DOM burial are of the same magnitude as carbon sequestration in the oceans (Tranvik et al. 2009), which underscores the importance of lake biogeochemistry. However, the dynamics of DOM in lake systems remain understudied, even for the largest lakes on Earth (Minor and Oyler 2021; Queimalinos et al. 2019; Zhou et al. 2016).

The focus of this study is Lake George (LG), a temperate, oligotrophic, medium-sized lake (114 km2) in northeastern New York State (U.S.). It is nicknamed the “Queen of American Lakes”. LG is a natural reservoir in a heavily forested watershed with a sporadic abundance of wetlands, agricultural farms, and urban developments. LG is a popular recreational and tourist destination and its watershed is 10% developed at present (Swinton et al. 2015). The urbanized areas contain old septic systems (Harrison et al. 2021), which, in addition to three wastewater treatment facilities (WWTFs), contribute to enrichment of DOM with biological material and nutrients (Lusk et al. 2017). The LG watershed contains at least 140 tributaries (Sutherland et al. 2001), which range in size and discharge, and provide a unique opportunity to study how DOM in temperate and oligotrophic lake systems is impacted by watershed characteristics (e.g., presence of wetlands) or anthropogenic activities (e.g., wastewater seepage, agriculture). LG has been extensively studied for more than 50 years via its monitoring programs conducted by the Darrin Fresh Water Institute. This institute was established with the primary goal to monitor water quality for sustaining the lake’s health and identifying key issues requiring scientific research to prevent ecological decline in the LG watershed. The majority of previous LG research was focused on nutrient loadings (nitrogen, phosphorus) along with introduced fish and plant species, but later the LG research program expanded to road salt and other contaminants (Boylen et al. 2014). The onset of the “Jefferson Project” (https://dfwi.rpi.edu/jefferson-project-lake-george) spurred the probing of lake biogeochemistry using in-situ sensors, as well as studying various other biogeochemical and ecological aspects such as harmful algal blooms. 
LG is intended to become a “smart lake” through extensive in-situ monitoring of the lake, its streams, and the weather, which has already yielded a large body of research data.

Fluorescence sensors are one of the tools that have been used for historic monitoring of LG water quality. However, the composition of the fluorescent DOM (FDOM) fraction is complex and usually contains multiple underlying components. In this study, we identify and quantify these components using parallel factor analysis (PARAFAC) modeling, a common approach for deconvoluting excitation-emission matrix (EEM) fluorescence spectra of DOM (Bro 1997). We particularly focus on assessing FDOM in the tributaries of LG and hypothesize that FDOM composition will vary during different seasons (temporally) and with different land use (spatially) as there are tributaries affected by agriculture, wastewater, and wetlands. The fieldwork was planned strategically to enhance the PARAFAC model for successful deconvolution of tributary EEM spectra. We contextualize FDOM composition with bulk DOM and chromophoric DOM (CDOM) measurements as dissolved organic carbon (DOC) and ultraviolet–visible spectra, respectively. Evaluating the composition of FDOM in the LG watershed will enhance the knowledge of biogeochemistry of temperate lake watersheds and determine how different watershed features impact the reactivity and cycling of DOM, which impact lake ecology and water quality. Furthermore, the developed PARAFAC model and resultant knowledge of FDOM composition will allow for a better understanding of in-situ fluorescence data in future studies, which will provide a much more detailed picture of the spatiotemporal variability of DOM in this watershed. More broadly, as fluorescence spectroscopy is a highly popular analytical method to characterize tributary and lake waters, our developed PARAFAC model would be useful for deconvoluting EEM data and providing biogeochemical insights for other oligotrophic lake environments in temperate and even boreal forested environments similar to the LG watershed.

Materials and methods

Description of LG watershed and sampling sites

LG (43° 35′ N, 73° 35′ W) is a deep, dimictic, oligotrophic lake in the Adirondack Mountains region of New York State, USA (size = 114 km2, mean depth = 18 m; maximum depth = 58 m; volume = 2.1 km3; Aulenbach et al. 1981, Boylen et al. 2014). The watershed geology is mostly shallow sandy till overlaying bedrock, with many granite outcrops and large boulders. It has a small drainage area of 492 km2 and receives 57% of its water as surface water inflow from tributaries, 25% from precipitation directly on the lake, and 18% from groundwater (Shuster et al. 1994). The catchment is heavily forested with some urbanization around the lake's shoreline. There are 3 WWTFs as well as multiple wetlands and several horse farms. Further details on Lake George and its watershed can be found in Boylen et al. (2014) and Harrison et al. (2021).

There are 11 major tributaries (Northwest Bay Brook, Indian Brook, Hague Brook, West Brook, English Brook, Shelving Rock, Finkle Brook, East Brook, Sucker Brook, Pole Hill Brook, and Sunset Brook); the remaining ~ 130 are minor, with ~ 50 being ephemeral. For this study, 64 tributaries of various sizes were sampled (Fig. 1), with some sites having unique wastewater, agricultural, or wetland influences. These influences were determined empirically based on the location of the tributary and sampling point (e.g., tributaries sampled downstream of a wetland are categorized as “wetland-influenced”). Most tributaries were sampled near the shoreline. Several major tributaries were sampled at numerous locations along their length. The lake itself was sampled at 12 pelagic locations. The final dataset contained 213 samples and 5 procedural blanks. Fieldwork consisted of 4 major sampling expeditions (one in November 2020, two in May 2021, and one in September 2021) and 4 minor sampling expeditions (one in August 2020, one in July 2021, one in September 2021, and one in October 2021). All sampling trips were at least 5 days after a rain event to ensure baseflow DOM conditions. Stream hydrographs (if available) were checked to additionally confirm that water levels were at baseflow conditions. Sampling numerous different tributaries across the 8 fieldwork expeditions provided sufficient spatiotemporal diversity of FDOM inputs into LG, which enhanced the dataset variance and improved the PARAFAC deconvolution (see “Parallel factor analysis (PARAFAC) modeling” section of the SI). The supplementary Excel file contains information for all samples (labeled EEM 1 through EEM 218), such as the name of the tributary/site they were collected from, any specific influences (wetland, agriculture, wastewater), the date/time of collection, the sampling trip, and site characteristics from StreamStats (Ries III et al. 2017) such as forest coverage and annual precipitation.

Fig. 1

Map of the Lake George (LG) watershed showing the sampling sites, which are color-coded by specific influences. The three major wastewater treatment facilities (WWTFs) are also shown. The insert shows New York State and includes the outline of the Adirondack Park, with the LG watershed located in its southeast corner

Sample collection and processing

Grab surface samples were collected in acid-cleaned 250-mL high-density polyethylene Nalgene™ bottles after three bottle rinses in the field with sampled water. Samples were immediately transported to the laboratory for processing. Samples were filtered using pre-combusted glass-fiber filters (Pall Laboratory, Type A/E, 47 mm, pore size 1 µm at first and decreasing with sample volume filtered). Future research should employ filters with a narrower pore size (0.22 µm or less) to exclude colloidal organic matter that may affect fluorescence results. Our current dataset nevertheless remains comparable for future meta-analysis studies (Nimptsch et al. 2014). Filtrates were aliquoted into acid-cleaned and pre-combusted amber vials prior to instrumental analyses. Procedural blanks of ultrapure water, used to assess background DOM contamination, were also included in the subdatasets of most sampling trips. Between processing and analysis, samples were refrigerated at 4 °C for no more than 5 days to prevent sample degradation (Spencer et al. 2007).

Bulk and chromophoric DOM measurements

Dissolved organic carbon (DOC) and total dissolved nitrogen (TDN) were measured using a Shimadzu TOC-L analyzer following standard protocols. The instrument was calibrated using potassium hydrogen phthalate (for DOC) and potassium nitrate (for TDN). Samples were analyzed in triplicate with relative standard deviations no higher than 5%.

Ultraviolet–visible absorbance spectra were used for characterizing the chromophoric dissolved organic matter (CDOM) fraction. Spectra were acquired using a Horiba Aqualog spectrofluorometer along with the measurement of fluorescence spectra (see below). Absorbance spectra were acquired over 230–700 nm in 2 nm steps. CDOM was quantified based on the absorbance at 254 nm, which was converted to Napierian units (on an ln-basis) and normalized to the cuvette pathlength (1 cm). The spectral slope ratio was computed as the ratio of the spectral slope over 275–295 nm to the spectral slope over 350–400 nm (Helms et al. 2008). The specific ultraviolet absorbance at 254 nm (SUVA254) was calculated as the decadic absorbance at 254 nm (on a log10-basis) normalized to DOC content (Weishaar et al. 2003).
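The three CDOM metrics described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the processing used in the study: the spectral slopes are obtained here from a least-squares line through log-transformed absorption, SUVA254 is expressed per metre of pathlength following the common convention of Weishaar et al. (2003), and the spectrum is synthetic. All function names are hypothetical.

```python
import math

LN10 = math.log(10.0)


def napierian_coefficient(decadic_absorbance, pathlength_cm=1.0):
    """Convert decadic (log10) absorbance to a Napierian (ln) value per cm."""
    return LN10 * decadic_absorbance / pathlength_cm


def spectral_slope(wavelengths_nm, decadic_absorbances, lo_nm, hi_nm):
    """Spectral slope S (nm^-1) over [lo_nm, hi_nm].

    Assumption: S is taken from a least-squares line through ln(a) versus
    wavelength; a nonlinear exponential fit is an equally common choice.
    """
    pts = [(w, math.log(napierian_coefficient(a)))
           for w, a in zip(wavelengths_nm, decadic_absorbances)
           if lo_nm <= w <= hi_nm and a > 0]
    n = len(pts)
    mean_w = sum(w for w, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxy = sum((w - mean_w) * (y - mean_y) for w, y in pts)
    sxx = sum((w - mean_w) ** 2 for w, _ in pts)
    return -sxy / sxx  # absorption decays with wavelength, so S is positive


def slope_ratio(wavelengths_nm, decadic_absorbances):
    """Slope ratio: S(275-295 nm) divided by S(350-400 nm)."""
    s_short = spectral_slope(wavelengths_nm, decadic_absorbances, 275, 295)
    s_long = spectral_slope(wavelengths_nm, decadic_absorbances, 350, 400)
    return s_short / s_long


def suva254(decadic_a254, doc_mg_c_per_l, pathlength_cm=1.0):
    """SUVA254 in L mg-C^-1 m^-1 (decadic absorbance per metre over DOC)."""
    return (decadic_a254 / pathlength_cm) * 100.0 / doc_mg_c_per_l


# Synthetic single-exponential spectrum on the 230-700 nm, 2 nm grid.
wavelengths = list(range(230, 701, 2))
absorbances = [0.057 * math.exp(-0.015 * (w - 254)) for w in wavelengths]
a254 = absorbances[wavelengths.index(254)]
print(napierian_coefficient(a254), slope_ratio(wavelengths, absorbances),
      suva254(a254, 1.9))
```

Because the synthetic spectrum follows a single exponential, both slopes are identical and the slope ratio is 1; real spectra deviate from this, which is what makes the ratio informative.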

Spectrofluorometry and data processing

Fluorescence was measured using a HORIBA Aqualog spectrofluorometer. When necessary, samples were analyzed using 2- or 3-fold dilution to ensure that UV absorbance at 230 nm was below 0.3 to avoid uncorrectable inner-filter effects (Miller et al. 2010). Three-dimensional EEM spectra were acquired using excitation wavelengths of 230–700 nm in 2 nm steps. Fluorescence emission was monitored over emission wavelengths of 246.01 to 828.00 nm in 2.33 nm steps (equivalent to a 4-pixel resolution) using an integration time of 0.5 s and the normal gain of the charge-coupled device detector. The software automatically corrected the EEM spectra for background contributions from an ultrapure laboratory water blank as well as for instrument-specific responses (Cory et al. 2010).

A water Raman scan for calibration to water Raman units (RU) was acquired for each instrument run using the default settings in the RU tool within the software: excitation at 350 nm, emission coverage at 246.01–828.00 nm in 0.58 nm steps (equivalent to 1-pixel resolution), integration time of 30 s.
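Calibration to Raman units amounts to dividing every EEM intensity by the integrated area of the water Raman peak measured at 350 nm excitation. The sketch below illustrates the idea with a trapezoidal integral. The integration window of 371–428 nm is an assumption (a commonly used range for this peak), not a setting reported in this study, and the function names are hypothetical.

```python
def raman_area(emission_nm, intensity, lo_nm=371.0, hi_nm=428.0):
    """Trapezoidal area of the water Raman peak inside [lo_nm, hi_nm].

    Assumption: the window limits are illustrative defaults, not values
    taken from the instrument software.
    """
    pts = [(w, i) for w, i in zip(emission_nm, intensity)
           if lo_nm <= w <= hi_nm]
    area = 0.0
    for (w0, i0), (w1, i1) in zip(pts, pts[1:]):
        area += 0.5 * (i0 + i1) * (w1 - w0)
    return area


def to_raman_units(eem, area):
    """Divide every intensity in an EEM (list of rows) by the Raman area."""
    if area <= 0:
        raise ValueError("Raman area must be positive")
    return [[value / area for value in row] for row in eem]


# Synthetic, flat Raman scan (arbitrary counts) on a 1 nm emission grid.
emission = list(range(350, 451))
scan = [2.0] * len(emission)
area = raman_area(emission, scan)
eem_ru = to_raman_units([[114.0, 228.0]], area)
```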

Data processing and PARAFAC deconvolution of EEM spectra

All data (EEM, UV–Vis, and water Raman spectra) were exported and loaded in MATLAB. First, UV–Vis and water Raman spectra were reformatted using the TEnvR toolbox, version 2021, as described by Goranov et al. (2023). All further processing (Raman normalization, inner-filter effect correction, scattering removal and interpolation) was done using the drEEM toolbox, version 0.6.3, as described by Murphy et al. (2013). Raw and processed EEMs, UV–Vis and water Raman spectra have been published in the Mendeley Data repository (https://doi.org/10.17632/yv9f3z3xb6.2).

EEM deconvolution using PARAFAC modeling was done following Murphy et al. (2013). Modeling was done in three different stages as described in Sect. 1.1 of the SI. The final PARAFAC model, containing 197 samples and developed with non-negativity constraints, was validated with a 4-split, 6-combination, and 3-test (S4C6T3) alternating split-half analysis (Murphy et al. 2013) with a convergence criterion of 1 × 10⁻⁸. The online repository OpenFluor (K. R. Murphy et al. 2014a, b) was used to find other PARAFAC models with similar components using a Tucker congruence coefficient (Tucker 1951) threshold of 0.95 in both the excitation and emission dimensions. Our model was published in the OpenFluor repository under the name Goranov_LakeGeorge (model ID: 11658). The second preliminary model (see supporting information, Sect. 1.1) was also published in the OpenFluor repository under the name Goranov_LakeGeorge_Prelim (model ID: 11640). Further details about modeling and validation can be found in Sect. 1.2 of the SI. The component contributions (Fmax values) from the final 6-component model can be found in the supplementary Excel file.
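The spectral matching step relies on the Tucker congruence coefficient, which is the cosine of the angle between two loading vectors. A minimal sketch of the comparison is given below. It assumes that both spectra have already been interpolated onto the same wavelength grid and that a match requires the 0.95 threshold to be met in both dimensions; the function names are hypothetical and do not correspond to the OpenFluor implementation.

```python
import math


def tucker_congruence(x, y):
    """Tucker congruence coefficient between two equally long loadings."""
    if len(x) != len(y):
        raise ValueError("loadings must share the same wavelength grid")
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return dot / norm


def components_match(ex_a, em_a, ex_b, em_b, threshold=0.95):
    """True when excitation AND emission loadings both reach the threshold."""
    return (tucker_congruence(ex_a, ex_b) >= threshold
            and tucker_congruence(em_a, em_b) >= threshold)


# Made-up loadings: congruence ignores overall scaling of a spectrum.
excitation = [0.1, 0.6, 0.9, 0.4]
emission = [0.2, 0.8, 0.5, 0.1]
scaled_emission = [2.0 * v for v in emission]
print(components_match(excitation, emission, excitation, scaled_emission))
```

The coefficient is insensitive to the scale of the loadings, which is why models normalized differently can still be compared.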

Statistics

As some of the evaluated sets of data comprised even numbers of samples, exclusive medians were used for box and whisker plots throughout this study. Data averages are reported as medians and the ± range is calculated as the difference between the upper and lower quartiles (i.e., Q3–Q1). Kruskal–Wallis tests were used to determine if significant differences existed among multiple subsets of samples. Sequential Tukey’s honestly significant difference post-hoc tests were used to determine how subsets compared to each other. Principal component analysis was done using the TEnvR toolbox, version 2021, as described by Goranov et al. (2023). All statistical evaluations were done in MATLAB R2022a with a confidence level of 95%, corresponding to a significance p-value threshold of 0.05.
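The "median ± (Q3–Q1)" summary used throughout can be reproduced with the Python standard library, as sketched below. Treating the exclusive quartile convention as equivalent to `method="exclusive"` in `statistics.quantiles` is an assumption, since quartile definitions differ slightly between software packages; the helper name and the data are hypothetical.

```python
import statistics


def median_and_iqr(values):
    """Return (median, Q3 - Q1) using exclusive quartiles.

    Assumption: the exclusive convention is approximated by the
    'exclusive' method of statistics.quantiles.
    """
    q1, q2, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return q2, q3 - q1


# Made-up values, used only to show the call.
median, iqr = median_and_iqr([1, 2, 3, 4, 5, 6, 7, 8])
print(f"{median} ± {iqr}")
```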

Results and discussion

Overview of bulk DOM and CDOM in the LG watershed

Bulk DOM concentrations in tributary and lake samples of the LG watershed, expressed as dissolved organic carbon (DOC), varied between 0.6 and 9.0 ppm-C (mg-C L−1), with a median of 1.9 ± 1.1 ppm-C. Total dissolved nitrogen (TDN) levels varied from 0.03 to 12 ppm-N (median 0.2 ± 0.1). The corresponding C/N ratio (mol DOC/mol TDN) varied from 0.1 to 47.0 (median 13.6 ± 7.6). The average DOC of LG waters is lower than that of global lakes (5.7 – 9.6 ppm-C, Massicotte et al. 2017; Queimalinos et al. 2019; Sobek et al. 2007), but is comparable to the DOC of the oligotrophic Great Lakes nearby (Minor and Oyler 2021). This is likely due to the lower annual inputs of carbon, though the biogeochemistry of such environments is highly understudied.
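The molar C/N ratio follows from the mass concentrations by dividing each by its atomic mass. The short sketch below shows the conversion; the function name is hypothetical and the atomic masses are standard values.

```python
ATOMIC_MASS_C = 12.011  # g/mol
ATOMIC_MASS_N = 14.007  # g/mol


def molar_cn_ratio(doc_mg_c_per_l, tdn_mg_n_per_l):
    """Molar C/N ratio from DOC (mg-C/L) and TDN (mg-N/L)."""
    if tdn_mg_n_per_l <= 0:
        raise ValueError("TDN must be positive")
    mol_c = doc_mg_c_per_l / ATOMIC_MASS_C
    mol_n = tdn_mg_n_per_l / ATOMIC_MASS_N
    return mol_c / mol_n


# Ratio computed from the median DOC and median TDN reported above.
ratio_of_medians = molar_cn_ratio(1.9, 0.2)
print(round(ratio_of_medians, 1))
```

The ratio of the two medians need not equal the median of the per-sample ratios reported above, because the median is not a linear operation.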

Chromophoric DOM (CDOM) was characterized using ultraviolet–visible absorbance spectra. CDOM is commonly quantified as the Napierian absorbance at 254 nm, α254. CDOM varied from 4 to 79 AU/cm, with a median of 13 ± 10 AU/cm. The specific ultraviolet absorbance at 254 nm (SUVA254) is a useful indicator of the abundance of CDOM per unit DOC (Weishaar et al. 2003). SUVA254 varied from 0.7 to 4.0, with a median of 3.0 ± 0.8. Lastly, the slope ratio (SR), obtained from the shape of the ultraviolet–visible absorbance spectra, can be used to judge the size of CDOM clusters (Helms et al. 2008). SR ranged from 0.18 to 2.16, with a median of 1.90 ± 1.10. The wide ranges of the DOM and CDOM measurements indicate an abundance of outlier samples. These are samples with specific characteristics (e.g., wetland-influence, wastewater-influence) that are described in the sections below.

It is worth mentioning that all of these results were obtained at baseflow conditions. It is expected that stormflow conditions will increase the DOM loadings (Buffam et al. 2001; Garcia et al. 2023), which will likely be dependent on spatial watershed characteristics (Singh et al. 2015). With the ongoing changes in climate towards warmer weather and increased storms in the LG watershed area (Walsh et al. 2014), it is expected that tributary and lake DOM will become browner. Thus, establishing baseflow conditions in this study is critical for monitoring and evaluating future changes to this watershed’s biogeochemistry.

Description of deconvoluted PARAFAC components in LG FDOM

A PARAFAC model was developed to explore the composition of FDOM and observe how the underlying components vary across the LG watershed. The developed model successfully extracts quantitative measures of six separate components (C1–C6, Fig. 2). To assess the environmental implications of the variability in the underlying FDOM constituents, each PARAFAC component was assigned a source and a biogeochemical function.

Fig. 2

An example excitation-emission matrix (EEM) fluorescence spectrum and the obtained six components (C1–C6) after parallel factor deconvolution (PARAFAC) modeling along with (1) Fluorophore type per the classification by Coble et al. (2014), (2) Potential structure, and (3) Assigned biogeochemical role in the LG watershed. The shown EEM spectrum is of Finkle Brook (pristine tributary) sampled next to the Darrin Fresh Water Institute (Bolton Landing, NY) on 08 August 2020. White regions on the EEM spectrum represent spectral artifacts (noise or scattering bands) that have been removed

The first component (C1) emits fluorescence at long wavelengths (> 400 nm), which suggests a highly aromatic fluorophore. Previous studies have identified the C1 fluorophore as terrestrial in origin (Fellman et al. 2010; Lambert et al. 2016a, b; Moona et al. 2021; Wauthy et al. 2018), and it likely represents lignin degradation products, such as syringaldehyde and other lignin phenols derived from plant litter (K.R. Murphy et al. 2014a, b; Stedmon et al. 2007a, b; Walker et al. 2009). In terms of reactivity, the C1 fluorophore has been shown to be both bio-degradable (Yang et al. 2019) and photo-degradable (Du et al. 2016; Zhou et al. 2019). Thus, C1 does not appear to be an end-member and likely represents a myriad of aromatic, terrestrial organic matter degradation products, which explains why C1 has been previously observed in a large range of inland waters, including lakes and rivers in NY state (Wang et al. 2020, 2021, 2022a, b).

The second component (C2) emits fluorescence at even longer wavelengths (> 500 nm) suggesting a fluorophore of higher molecular weight relative to C1 and likely originating from plant remnants/microbial biomass or detrital DOM (Osburn et al. 2015). C2 represents “humic-type” molecules that are environmentally ubiquitous (Kowalczuk et al. 2009; Obrador et al. 2018) though more prominent in terrestrial environments such as sediments/soils (Kida et al. 2021; Moona et al. 2021; Osburn et al. 2012, 2016; Yamashita et al. 2010a, b) and environments with high DOM loadings (Obrador et al. 2018). Based on its spectral properties C2 could represent large molecules containing reduced quinone moieties (Cory & McKnight 2005). In terms of reactivity, C2 is highly photo-labile and of similar photo-reactivity to C1 (Murphy et al. 2018). C2 has been found to be bio-degradable (Lambert et al. 2016a, b; Wauthy et al. 2018), but it can be also microbially produced (Derrien et al. 2020). Lastly, C2 has been found in wastewater-influenced DOM (Wunsch & Murphy 2021) as well as lake DOM in NY State (Wasswa et al. 2022).

The third component (C3) emits fluorescence at short wavelengths (< 400 nm) suggesting a low molecular weight fluorophore that possibly contains oxidized quinone moieties (Ishii & Boyer 2012; Peleato et al. 2017). C3 is microbially produced in terrestrial and marine aquatic environments (Graeber et al. 2021; Gullian-Klanian et al. 2021; Lambert et al. 2016a, b; Peleato et al. 2017; Tanaka et al. 2014) and is overall found in systems with high microbial activities (Fellman et al. 2010; Meng et al. 2013; Murphy et al. 2011; Osburn et al. 2015; Yamashita et al. 2008). C3 can be produced from non-fluorescent precursors in photo-bleached DOM (Bittar et al. 2015) and has been observed in lake DOM (Chen et al. 2018; Wang et al. 2020) and wastewater-influenced DOM (Wunsch & Murphy 2021). In terrestrial systems, C3 is in higher proportions in stream DOM relative to soil DOM (Eder et al. 2022) suggesting that its aquatic production outcompetes its production in soil systems. C3 has been found to be resistant to bio-degradation (Bittar et al. 2015) suggesting it is not an intermediate fluorophore but a microbial end-product. As with many aromatic species, C3 is difficult to remove via bio-filtration additionally proving its biological recalcitrance (Peleato et al. 2016). This is why C3 has been suggested as a wastewater/nutrient-enrichment tracer (Murphy et al. 2011). Lastly, C3 has been found to be labile to oxidation (Peleato et al. 2017) indicating that while biologically refractory, it can be degraded by aquatic photochemical or other oxidative (e.g., Fenton) processes.

The fourth component (C4) appears to represent N-type fluorophores, with presently unknown chemical composition. These species appear to be environmentally ubiquitous and are in highest abundance in forested and wetland environments (Fellman et al. 2010) though also observed in agriculturally influenced DOM (Graeber et al. 2012; Hernes et al. 2009; Søndergaard et al. 2003), treated wastewater and algal DOM (Søndergaard et al. 2003), and in limnologic environments (Du et al. 2016) including lakes in NY state (Wang et al. 2020). It has been suggested that C4 is a product of heterotrophic microbes that consume terrestrial DOM as substrate (Lambert et al. 2017). C4 has been suggested to be comprised of labile fluorophores associated with freshly produced DOM (Fellman et al. 2010). It has been shown that C4 is a photochemical by-product but is also photoreactive and can be further photodegraded (Du et al. 2016). Lastly, C4 has been found in high concentrations in areas of high urbanization (Lambert et al. 2017).

The fifth component (C5) is a rare fluorescent species observed only in two previous studies (Harjung et al. 2018; Jutaporn et al. 2020) based on spectral matching in OpenFluor (Tucker congruence coefficients > 0.95). It has been identified in wastewater effluent DOM and natural surface DOM (Jutaporn et al. 2020) as well as in forested streams (Harjung et al. 2018) suggesting that this is a terrestrial plant/soil-derived fluorophore.

The sixth component (C6) appears to represent a mixture of proteinaceous tryptophan-like and tyrosine-like fluorophores (Coble 1996; Yamashita et al. 2015; Yamashita et al. 2010a, b; Zhou et al. 2019). C6 has been suggested to be biologically produced by phytoplankton (Coble 1996; Dall’Osto et al. 2022; Osburn et al. 2017; Stedmon and Markager 2005) as well as by algae or other microbes (Yamashita & Tanoue 2003). This claim has been supported by the observed significant correlations of C6 and chlorophyll (Goncalves-Araujo et al. 2016). However, another study has shown an insignificant correlation of C6 and chlorophyll (Zhou et al. 2019), indicating that the fluorescent species represented by C6 may have other sources. Yamashita et al. (2010a, b) also conclude that C6 is not exclusively derived from heterotrophs as C6 appears to be generated by microbial communities, periphyton, and from leachates of higher plants (Scully et al. 2004). It has been suggested that C6 includes some lignin-like or tannin-like polyphenols (Hernes et al. 2009; Maie et al. 2007; Romero et al. 2017; Schafer et al. 2021), though C6 does not co-vary with lignin phenols (Walker et al. 2009). C6 has been found in plant leachate without co-existence of humic components (Zhuang et al. 2021). Collectively, these findings indicate that this “protein-like” C6 fluorophore could also represent plant-derived non-ligninaceous aromatic fluorophores. Photochemistry has been excluded as a potential C6 source as C6 has not been observed to be a major photo-product in the degradation of fluvial DOM (Zhou et al. 2019). Even though C6 is likely an aromatic fluorophore, it has been observed to be poorly photo-reactive in some studies (Dainard et al. 2015; Stedmon et al. 2007a, b) and completely photo-refractory by Zhou et al. (2019). When C6 was observed to photo-degrade (Stedmon et al. 2007a, b), it was suggested to be a source of ammonia, indicating that nitrogen is present in the molecules comprising C6, which agrees with the operational label of tyrosine- and tryptophan-like fluorophores. Evidence from ultrahigh resolution mass spectrometry also suggests C6 to be comprised of proteinaceous material (Stubbins et al. 2014). In addition to being commonly proposed as a microbial exudate, C6 has been suggested to be bioavailable (Graeber et al. 2012; Podgorski et al. 2021), a by-product of microbial activities, or a metabolic substrate for microbial utilization (Yang et al. 2019), collectively indicating that C6 includes biological compounds that are intermediates in aquatic productivity. C6 appears to be an environmentally ubiquitous fluorophore: it has been found in fluvial DOM, though not as a primary FDOM constituent (Harjung et al. 2018; Osburn et al. 2018), wetlands (Zhou et al. 2019), and wastewaters (Zhou et al. 2023). C6 is resistant to adsorption to minerals (Groeneveld et al. 2020) and is highly persistent in lakes (Kothawala et al. 2014), which is likely due to slow degradation kinetics and/or internal production. C6 correlates with total nitrogen, total phosphorus, and ammonia in effluents (Ryan et al. 2022), suggesting it could be a tracer for wastewater. C6 has also been found to co-vary with various organic micropollutants in NY lakes, indicating that C6 is likely a key non-humic water quality parameter to monitor in future studies (Wang et al. 2022a, b). Related to this, C6 has been found to have a similar spectral signature to that of polycondensed aromatic hydrocarbons, PAHs (Murphy et al. 2006), but C6 did not correlate with benzenepolycarboxylic acids (Yamashita et al. 2021), which are compound-specific markers for quantifying condensed aromatic structures like PAHs (Wagner et al. 2017). Thus, C6 may contain some small aromatics such as 2–3 ring PAHs, most likely originating from combustion by-products of boat gasoline or other fossil fuels.

Collectively, these previous observations for the six fluorescent components establish a baseline of knowledge, which paired with the observed FDOM dynamics in LG, allowed for assigning a function/source for each component in the LG watershed (Fig. 2, Table S7):

  • C1 is a syringaldehyde-like fluorophore. It is derived from lignin and appears to be an intermediate in aquatic biotic and abiotic processes. It is a C-type fluorophore per the classifications described by Coble et al. (2014).

  • C2 is a D/E-type reduced quinone-like fluorophore. It corresponds to freshly leached humic material from soil or plant detritus and likely has relatively high molecular weight.

  • C3 is an M-type oxidized quinone-like fluorophore. It is a by-product of aquatic microbiological processes and likely has relatively low molecular weight.

  • C4 is an N-type fluorophore of unknown structure comprised of microbial exudates and photochemical degradation by-products.

  • C5 is a poorly characterized fluorophore with unknown structure that likely originates from plant litter/soil organic matter.

  • C6 is a B/T-type fluorophore, appearing like a mixture of tyrosine-like and tryptophan-like fluorophores. It appears to represent proteinaceous material from primary production, plant-derived non-ligninaceous aromatics, or possibly even small PAHs.

Notably, some of the EEM spectra exhibited higher leverages and sums of squared errors (Figure S6) during PARAFAC modeling. These were samples from categories that were not as well represented in the dataset (lake samples, wastewater-influenced samples) as the pristine tributaries. This suggests that there are possibly other underlying fluorescent components in this dataset, though they are likely of minor significance.

Comparison of FDOM composition in tributaries versus the lake

The six PARAFAC components identified in the dataset were present in all samples indicating that the C1-C6 components are not specific to certain environments (e.g., tributary, lake, wetland, wastewater, etc.). This is in agreement with the latest findings about the composition of FDOM (Wünsch et al. 2019) revealing that most fluorescent components are likely ubiquitous in a wide range of different environments and are not tied to a biogeochemical origin (i.e., are not source-specific). However, fluorescent components vastly differ in concentrations across different biogeochemical interfaces, which is showcased when PARAFAC components in the LG tributaries and lake are compared (Fig. 3).

Fig. 3

Box and whisker plots showing the relative contribution of each PARAFAC component (C1–C6) in tributary (Tribs., N = 186) versus lake (N = 11) environments. Please note that while the lake was sampled in 12 locations, one of the samples was an outlier and was excluded during the PARAFAC deconvolution. Outliers are labeled on the figure as their sample number and listed here with the site name and specific influence: 15 = Cedar Lane (wastewater-influenced); 19 = Tahoe (Lake Front Terrace, pristine tributary); 39 and 158 = Jeremy’s Stream (wetland-influenced); 41, 131, and 190 = Dula Pond (wastewater-influenced); 83 and 191 = Hondah (wastewater-influenced); 156 = Blind Brook (agriculture-influenced); 177 = West Brook ('87 uppermost, pristine tributary); 179 = West Brook (below seep side inlet, wastewater-influenced); 180 = West Brook (downstream site, wastewater-influenced); 203 = Warner Bay (shallow bay in the lake); 206 = 99 Cotton Point Road (pristine tributary); 214 = West Brook (seep at motel, wastewater-influenced)

The tributary samples appear to be enriched in fluorescent components in the following order: C1 (i.e., highest concentration) > C2 > C3 > C5 > C4 > C6 (i.e., lowest concentration). The primary lignin-derived component C1 explains most of the variance in the dataset and is of the highest abundance (median of 37%). The C2 component, which is the secondary terrestrial component (representing high molecular weight FDOM), is in lower amounts (17%), likely due to the poor solubility of high molecular weight molecules in water. C1 and C2 are slightly correlated (Figure S7), suggesting a similar source and production in the tributary waters. The three terrestrial components (C1 + C2 + C5) collectively comprise about 2/3 of the FDOM signal. The C3 and C4 components, which are degradation by-products from microbial, photochemical, or other oxidative processes, collectively represent about a third of FDOM (C3 = 16%, C4 = 12%). This indicates that these streams and brooks are photobiochemically active and rework FDOM even before it reaches LG. The C6 component accounted for little of the FDOM composition of LG tributaries (4%). This suggests that it is not sourced from soils or other landscape sources. However, C6 may be related to DOM processing that does occur in the tributaries. This would explain its presence in the tributaries, and the low concentrations can be explained by its slow degradation kinetics yielding high persistence (Kothawala et al. 2014): C6 would be discharged into the lake before it has time to accumulate in the tributaries.
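The percentages quoted above are relative contributions of each component to a sample's FDOM signal. One simple way to obtain such values is to express each Fmax as a share of the summed Fmax of all six components, as sketched below. Whether the study normalized in exactly this way is an assumption, the function name is hypothetical, and the Fmax values are arbitrary numbers chosen only to show the arithmetic.

```python
def relative_contributions(fmax):
    """Express each component's Fmax as a percentage of the summed Fmax."""
    total = sum(fmax.values())
    if total <= 0:
        raise ValueError("summed Fmax must be positive")
    return {name: 100.0 * value / total for name, value in fmax.items()}


# Arbitrary Fmax values (Raman units) for one hypothetical sample.
sample_fmax = {"C1": 2.0, "C2": 1.0, "C3": 1.0,
               "C4": 0.5, "C5": 0.3, "C6": 0.2}
shares = relative_contributions(sample_fmax)

# Combined share of the three terrestrial components (C1 + C2 + C5).
terrestrial = shares["C1"] + shares["C2"] + shares["C5"]
print(shares, terrestrial)
```

Working with shares rather than raw Fmax removes the effect of overall DOM concentration, so samples of different strength can be compared by composition alone.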

The lake samples appear to be enriched in fluorescent components in the following order: C6 (i.e., highest concentration) > C4 > C1 > C3 > C2 > C5 (i.e., lowest concentration). Thus, the lake samples are enriched in protein-like fluorophores (C6) and DOM reprocessing by-products. This is expected for lake systems where DOM is reprocessed and intermixed with new microbial DOM formed by aquatic microbiological processes such as primary production. The three terrestrial components (C1, C2, and C5) are present in low quantities. The observed significant differences in FDOM composition between tributary and lake samples agree with previous literature (Biers et al. 2007; Ma and Green 2004), and the observed trends provide confidence in the presented PARAFAC model and in its applicability for assessment of the LG watershed. However, there is an obvious discrepancy between the oligotrophic character of LG, which implies low biological activity, and the high concentrations of protein-like C6 fluorescence, which imply high biological activity. This suggests that C6 may not represent proteinaceous compounds alone but also others, such as plant-derived non-ligninaceous aromatics or possibly even small PAHs.

In contrast with FDOM, bulk DOM characteristics of tributaries and lake did not exhibit significant differences (Figure S9): DOC (1.8 ± 1.2 ppm-C vs. 1.9 ± 0.1 ppm-C), TDN (0.19 ± 0.12 ppm-N vs. 0.19 ± 0.02 ppm-N), and C/N ratio (13.7 ± 8.0 vs. 12.0 ± 0.4) remained fairly consistent throughout the sampling period for tributaries and lake samples, respectively. This indicates that FDOM characterization in this watershed can be more useful than bulk DOM measurements for deciphering biogeochemical changes. The tributaries were richer in CDOM (higher SUVA values), which was also of higher molecular weight (lower SR values). This is expected because CDOM is degraded by sunlight, microbes, and other processes, in addition to being diluted, when it is exported to the lake. This agrees with the increased levels of humic components C1, C2, and C5 in the tributaries.
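For readers less familiar with the absorbance metrics mentioned above, the following sketch shows the commonly used definitions of the Napierian absorption coefficient and SUVA254. It assumes decadic absorbance measured in a cuvette of known path length and DOC expressed in mg-C per litre (treated here as equivalent to ppm-C for dilute freshwater). The function names and input values are hypothetical, and the exact conventions applied to this dataset may differ.

```python
import math


def napierian_absorption_coefficient(absorbance, path_length_m):
    """Absorption coefficient a (m^-1) from decadic absorbance: a = ln(10) * A / L."""
    if path_length_m <= 0:
        raise ValueError("path length must be positive")
    return math.log(10) * absorbance / path_length_m


def suva254(absorbance_254, path_length_m, doc_mg_per_l):
    """SUVA254 (L mg-C^-1 m^-1): decadic absorbance per metre divided by DOC."""
    if path_length_m <= 0 or doc_mg_per_l <= 0:
        raise ValueError("path length and DOC must be positive")
    return (absorbance_254 / path_length_m) / doc_mg_per_l


# Hypothetical reading: A254 = 0.054 in a 1 cm (0.01 m) cuvette, DOC = 1.8 mg-C/L.
a254 = napierian_absorption_coefficient(0.054, 0.01)
suva = suva254(0.054, 0.01, 1.8)
```

Because SUVA254 is normalized to DOC, two samples with nearly identical DOC can still differ markedly in SUVA254, which is one reason optical metrics can separate sample groups that bulk DOC does not.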

Impact of watershed characteristics on FDOM composition in LG tributaries

Several samples appear as outliers (Fig. 3) and these are mainly tributaries under a specific influence: Blind Brook (label 156) is agriculture-influenced; Jeremy’s Stream (labels 39 and 158) is wetland-influenced; and Dula Pond, Hondah, West Brook (seep at motel), West Brook (below seep side inlet), West Brook (downstream site), and Cedar Lane (labels 15, 41, 83, 131, 179, 180, 190, 191, 214) are all wastewater-influenced. This indicates that watershed characteristics affect the LG FDOM, which is generally expected (Hosen et al. 2014; Wagner et al. 2015). The remaining tributary outliers (label 19 = Tahoe (Lake Front Terrace), label 177 = West Brook ('87 uppermost), and label 206 = 99 Cotton Point Road) are from pristine tributaries that are, to our knowledge, not influenced by any specific watershed characteristics. This suggests that LG FDOM composition may have some intermittent temporal variations that remain to be explored with further sampling. The last outlier, the lake sample Warner Bay (label 203), whose FDOM composition is very similar to that of the tributaries, is fed by a large nearby wetland. Warner Bay is very shallow (less than 5 m at the sampling spot) and has a very short residence time. Thus, its waters are constantly replenished from the wetland without enough time for lake reworking processes to occur, resulting in a composition that differs from the rest of LG.

To more robustly explore the abundance of PARAFAC components in tributaries of different influences, the PARAFAC scores (Fig. 3) were grouped into four categories: pristine, agriculture-, wastewater-, and wetland-influenced tributaries (Fig. 4). The distribution of PARAFAC components in agriculture- and wastewater-influenced tributaries was significantly different from that in pristine tributaries. Wetland-influenced tributaries showed no significant difference in their fluorescence composition relative to the pristine tributaries. Since wetlands accumulate terrestrial material and are therefore enriched in aromatic DOM, it is not surprising that their distribution of molecular groups is similar to that of the DOM released into the adjacent tributaries. Additional evidence for this is the lack of correlation between component scores and the percentage area of storage (Figure S8), a parameter from StreamStats describing the abundance of ponds, wetlands, or other reservoirs (Ries III et al. 2017). Interestingly, the land-use analysis of the watershed also showed that the percentages of forest coverage and developed (urban) land did not correlate with the abundance of any PARAFAC component. This suggests that FDOM composition is consistent regardless of land use (non-point sources of DOM), except where point sources of DOM such as WWTFs, wetlands, or agricultural farms are present.

Fig. 4

PARAFAC component distributions in agriculture-influenced (N = 6), wastewater-influenced (N = 23), wetland-influenced (N = 13), and pristine tributaries (N = 143). Significant differences of the disturbed tributaries relative to the pristine tributaries are denoted with an asterisk (p < 0.05)
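A comparison of a disturbed group against the pristine group, as marked by the asterisks in Fig. 4, can be sketched as follows. The statistical test behind the figure is not stated in this section, so the sketch substitutes a two-sided permutation test on the difference of medians as a stand-in that copes with unequal and small group sizes. The function name, the abundance values, and the number of permutations are all hypothetical.

```python
import random
from statistics import median


def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation test on the absolute difference of medians."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(median(group_a) - median(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(median(pooled[:n_a]) - median(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical C1 abundances (%) for illustration only.
pristine = [38.1, 36.5, 37.9, 39.2, 35.8, 37.0, 38.4, 36.9]
wastewater = [29.5, 31.2, 27.8, 30.4, 28.9, 32.0]

p_value = permutation_p_value(pristine, wastewater)
significant = p_value < 0.05
```

With group sizes as unbalanced as those in Fig. 4 (N = 6 versus N = 143), a rank-based or permutation approach is one reasonable choice because it does not rely on normality. Running one test per component and per group would also raise the question of multiple-comparison correction, which this sketch does not address.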

The tributaries with agricultural influence exhibited a significantly different fluorescence distribution, in agreement with previous findings (Graeber et al. 2012). Specifically, the abundance of C1 and C5 fluorophores was lower relative to the pristine tributaries. The agriculturally disturbed tributaries are next to horse farms. One possible explanation is that physical soil disturbances (tillage, drainage, etc.) affect the complexes between soil organic matter and minerals and could cause the release of DOM that would otherwise be sequestered in undisturbed soils (Ogle et al. 2005). Another possible explanation is that fertilizer use has altered the soil chemistry and enhanced microbial processes, which collectively affect the composition of DOM released into LG tributaries (Heinz et al. 2015; Wilson and Xenopoulos 2009). Surprisingly, only C1 and C5 were significantly affected by agriculture, indicating that they are particularly susceptible to this type of disturbance. While C1 has been well described in the literature and can be affected by a variety of biogeochemical processes, C5 is less well studied and appears to be far less ubiquitous. Thus, we suggest that C5 may be used as a novel agricultural disturbance tracer in the LG watershed and potentially in other systems as well.

All PARAFAC components had significantly different distributions in the wastewater-influenced tributaries (Fig. 4). The humic components C1, C2, and C5 were lower whereas the protein-like/microbial-derived components C3, C4, and C6 were elevated relative to the pristine tributaries. This was expected based on the large number of previous reports on how WWTFs change DOM fluorescence composition (Coble et al. 2014). In brief, humic components are generally removed by flocculation and oxidation processes (Yang et al. 2015). Proteinaceous and other biological compounds are generally produced in the biological treatment of wastewater (Hudson et al. 2007) and thus would be found downstream. Knowledge of exactly how the three WWTFs in the LG watershed affect water quality should allow measures to be taken to prevent the health of LG from deteriorating and to maintain its ecological and societal productivity.

In the context of bulk DOM and CDOM, the tributaries also differed in quantity and quality depending on catchment features (Figure S10): the wastewater-influenced tributaries had higher TDN, lower C/N and SR ratios, and were less aromatic than the pristine tributaries. The DOC and CDOM loadings were significantly higher in wetland-influenced tributaries than in pristine ones even though this was not observed when comparing FDOM components (Fig. 4). Thus, FDOM characterization appears less powerful than DOM and CDOM characterization for tracing wetland biogeochemical effects in this watershed. Conversely, FDOM characterization distinguished the agricultural impact whereas DOM and CDOM characterization did not. Thus, combining DOC/TDN measurements with optical analyses can provide a comprehensive view of the biogeochemistry of this watershed. This is discussed and exemplified in the next section.

Temporal variability of FDOM composition in LG tributaries

The quantity and speciation of fluorophores are expected to vary throughout the year. The temporal variability in FDOM composition was evaluated for Northwest Bay Brook, the largest LG tributary (also a pristine one); West Brook, a tributary downstream from one of the WWTFs; and Jeremy’s Stream, a tributary affected by a nearby wetland (Fig. 5). Though the dataset is limited to 4–5 time points per stream, we have two time points each for the fall and summer seasons of the LG watershed, which is sufficient for obtaining preliminary results that would seed large-scale temporal studies in the future using EEM and/or in-situ data. Unfortunately, the data for the two agriculture-influenced tributaries (Blind Brook and Ice House) were quite limited and thus they were not evaluated for temporal variability of FDOM composition.

Fig. 5

Temporal variability in PARAFAC components in three selected tributaries. Error bars indicate a propagated 5% uncertainty. Sample labels are shown on the C1 panel

In these three tributaries, the primary humic component C1 appears to be of stable abundance throughout the sampling period, varying by no more than 6% (relative standard deviation of all C1 measurements). The results for the minor humic components C2 and C5 are more variable (up to 17% relative standard deviation), but their abundances are still within 2–5% absolute difference of each other. This indicates that the inputs of humic material to LG are relatively constant for tributaries of different watershed characteristics and that humic inputs do not appear to be drastically affected by seasonal changes such as differences in precipitation or cloud coverage.
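The distinction drawn above between relative standard deviation and absolute difference in percentage points can be made concrete with a short sketch. It assumes the sample (n − 1) standard deviation, which may or may not be the convention used for the reported values, and the time series below are hypothetical.

```python
from statistics import mean, stdev


def relative_std_dev(values):
    """Sample relative standard deviation in percent: 100 * s / |mean|."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean is zero; RSD is undefined")
    return 100.0 * stdev(values) / abs(m)


def absolute_range(values):
    """Largest absolute difference between any two measurements."""
    return max(values) - min(values)


# Hypothetical component abundances (%) over five sampling dates.
c1_series = [36.0, 38.5, 37.2, 35.4, 39.0]
c2_series = [15.0, 18.5, 17.0, 14.2, 19.1]

rsd_c1 = relative_std_dev(c1_series)
rsd_c2 = relative_std_dev(c2_series)
range_c2 = absolute_range(c2_series)
```

A minor component can show a noticeably higher relative standard deviation than a major one even when both vary by a similar number of percentage points, because the same absolute spread is divided by a smaller mean.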

The biologically produced C3, C4, and C6 fluorophores are also generally of similar abundance throughout the year, suggesting that microbiological transformation and production of biological fluorophores in tributaries is relatively constant. Interestingly, C4 exhibited an increase of about 2% (absolute units) between the two summer samplings (1 week apart). Given that all other components exhibited similar abundances during these two samplings, a specific event likely enhanced the production of this fluorophore. Considering that C4 is likely composed of microbial exudates and photochemical by-products, it is likely that the abundance of this fluorescent constituent is weather dependent: microbial productivity would be suppressed at lower temperatures and photochemical transformation of DOM would be decreased in cloudy weather. This is supported by the fact that the first sampling day (5-11-2021) was much colder (4–13 °C range) than the second sampling day (5-18-2021, 9–26 °C range; https://www.wunderground.com/).

The protein-like fluorophore C6 shows high variability in all tributaries (17–28% relative standard deviations). This fluorophore was low in concentration in the summer and higher in concentration in the fall. Given that warmer summer temperatures enhance primary productivity, if C6 were microbially produced it would have increased in the summer months, and not decreased as shown by our data (Fig. 5). Notably, photochemistry has been excluded as a potential C6 source (Zhou et al. 2019) and it has also been found that C6 is poorly photo-reactive (Dainard et al. 2015; Stedmon et al. 2007a, b) or maybe even entirely photo-refractory (Zhou et al. 2019). The lack of a drastic shift in C6 between the two summer samplings agrees with the proposition that C6 is not directly related to short-term sunlight conditions, and it would likely have taken months of changing weather to affect its concentration. One potential explanation for the observed decrease in C6 in summer months is that, if C6 is biologically produced and sunlight degrades other DOM species that microbes would otherwise use as food, the biological production of C6 would decrease. Another possible explanation is that C6 is a biologically labile compound that is actively utilized as food. In summer months the enhancement of microbial activity by sunlight (Antony et al. 2018; Bostick et al. 2021; Kieber et al. 1989; Wetzel et al. 1995) would yield higher C6 consumption, in agreement with our data. However, this does not agree with the high C6 persistence shown by Kothawala et al. (2014) for oligotrophic lakes. Thus, while C6 exhibits protein-like fluorescence, it might not be a proteinaceous compound at all and may actually be a photo-refractory degradation by-product of humics (C1, C2, C5), in agreement with numerous previous suggestions (Hernes et al. 2009; Maie et al. 2007; Romero et al. 2017; Schafer et al. 2021). This is corroborated by the strong negative correlation between C6 and C1 (Figure S7), suggesting that C6 is a product of C1 degradation occurring during downstream in-tributary processing or post-export during in-lake DOM processing.
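The kind of correlation referred to here can be computed as in the sketch below. It assumes a Pearson correlation coefficient on paired component abundances, although the statistic underlying Figure S7 is not specified in this section, and the paired values are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    sxx = sum(d * d for d in dx)
    syy = sum(d * d for d in dy)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    sxy = sum(a * b for a, b in zip(dx, dy))
    return sxy / sqrt(sxx * syy)


# Hypothetical paired abundances (%) of C1 and C6 in five samples.
c1 = [38.0, 36.5, 34.0, 31.0, 28.5]
c6 = [3.0, 4.2, 6.1, 8.0, 10.4]

r_c1_c6 = pearson_r(c1, c6)
```

A coefficient close to −1 indicates that the two components vary inversely across samples. On its own it does not show that one component is converted into the other.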

Another observation is that the wastewater-influenced tributary, West Brook, exhibited unexpected variability. The sudden decrease in C1, C2, and C5 and the sudden increase in C3 and C6 for this tributary indicate that the influence of WWTF effluent on DOM composition is intermittent. This variability is likely related to the amount of wastewater being treated at the WWTF at different points in time.

The tributaries also showed temporal variations in their bulk DOM and CDOM contents (Figure S11). The pristine and wetland-influenced tributaries generally exhibited minor variations. DOC and TDN remained relatively constant though the C/N ratio slightly increased for both from ~11 to ~17. This may be due to the warmer temperatures in October 2021 (11.52 °C average temperature) than in October 2020 (9.42 °C average temperature, https://www.wunderground.com/). Higher temperatures allow for more material to solubilize from soils, which would explain why these two tributaries were more humic in October 2021. This is also consistent with the increase in CDOM content (higher α254), as it is expected that if more humic material is mobilized, both DOM and CDOM would increase. The slope ratio SR and SUVA254 remained relatively consistent, indicating that the bulk quality (composition) of DOM remained the same, which additionally supports this proposition. The wastewater-influenced tributary had similar trends with the exception of a much larger shift in TDN: it decreased from ~2.3 ppm-N to ~1.3 ppm-N within a year. This may be due to operational changes at the corresponding WWTF, improved piping seals to reduce seepage, or lower wastewater inputs into the plant in 2021.

The DOC/TDN and absorbance parameters appear complementary to the fluorescence C1-C6 measurements; however, they do not correlate with them (Figure S12), especially in the tributary samples. This is explainable by the different analytical windows of these techniques. Nevertheless, our findings so far indicate that the combination of the three techniques makes it possible to distinguish the different types of disturbances to LG tributaries. Thus, measuring DOC, TDN, and absorbance and fluorescence spectra appears to be the most effective approach for studying LG biogeochemistry. These data can be combined in a principal component analysis (PCA) model, which can simultaneously differentiate the different types of samples and allow for the assessment of spatiotemporal variability (Figure S13). The trends of this PCA model mirror those described in the paragraphs above, showing that a holistic PCA approach is an effective way to look at LG biogeochemistry instead of the fragmented approach with box and whisker plots that we used to separately assess DOC, TDN, CDOM, and FDOM metrics.
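How bulk, absorbance, and fluorescence variables might be combined in a single PCA can be sketched as follows. This is a minimal illustration under several assumptions: each variable is z-scored before analysis, only the first principal component is extracted, and power iteration is used in place of a full eigendecomposition so that the example needs no external libraries. The variable set, the sample values, and the function names are hypothetical and do not reproduce the model behind Figure S13.

```python
from math import sqrt
from statistics import mean, stdev


def standardize(rows):
    """Z-score each column so variables with different units are comparable."""
    columns = list(zip(*rows))
    means = [mean(c) for c in columns]
    sds = [stdev(c) for c in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def covariance(z):
    """Sample covariance matrix of already-centered data."""
    n, p = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(p)]
            for i in range(p)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def first_pc(rows, iterations=500):
    """Leading principal component via power iteration.

    Returns (explained variance fraction, unit loadings, sample scores).
    """
    z = standardize(rows)
    cov = covariance(z)
    p = len(cov)
    # Asymmetric start vector so it is unlikely to be orthogonal to PC1.
    v = [float(i + 1) for i in range(p)]
    for _ in range(iterations):
        w = matvec(cov, v)
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, matvec(cov, v)))
    trace = sum(cov[i][i] for i in range(p))
    scores = [sum(a * b for a, b in zip(row, v)) for row in z]
    return eigenvalue / trace, v, scores


# Hypothetical samples: columns are DOC (ppm-C), TDN (ppm-N), a254 (m^-1), C1 (%).
samples = [
    [1.6, 0.15, 4.1, 38.0],   # pristine-like
    [1.8, 0.17, 4.6, 37.0],   # pristine-like
    [2.0, 0.20, 5.0, 36.5],   # pristine-like
    [3.5, 0.22, 9.8, 39.0],   # wetland-like
    [2.1, 2.30, 4.0, 28.0],   # wastewater-like
    [2.2, 1.30, 4.3, 30.0],   # wastewater-like
]

explained, loadings, scores = first_pc(samples)
```

Standardizing first keeps a variable with large numeric values, such as an absorption coefficient, from dominating the components simply because of its units. The sign of the loadings is arbitrary, so only the relative positions of samples along a component are interpretable.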

Future directions for research on the Lake George watershed using PARAFAC modeling

The LG watershed contains 92 permanent tributaries and another 50 ephemeral ones that annually contribute DOM inputs to LG. For this study only 64 tributaries were sampled, including all 11 major tributaries. Sampling tributaries at higher spatiotemporal resolution will capture the seasonal and hydrologic variability of DOC, CDOM, and FDOM delivered to LG. The data at present suggest that the inputs of DOM at baseflow conditions are relatively constant, although we would expect DOM quantity and composition to vary with rainfall or other runoff events, which could result in short-term changes to DOM cycling and biogeochemistry. For example, DOM dynamics differ significantly during storms and snowmelt conditions (Garcia et al. 2023; Humbert et al. 2019; Packer et al. 2020; Pellerin et al. 2012). Future work would benefit from assessing the DOM dynamics of LG during such events and comparing them to the DOM dynamics at baseflow. Furthermore, tributaries affected by agriculture, wetlands, wastewater, or other activities should also receive more attention in future studies as FDOM in such environments appears to be slightly more variable. Additional sample-paired measurements of nutrients (e.g., nitrate) or other tracers may be necessary to more precisely categorize and quantify wastewater, agriculture, wetland, or other disturbances. Future research on tributaries will allow for comprehensively evaluating and quantifying the inputs of DOM into LG and their fluctuations on spatiotemporal scales throughout the watershed.

Given that the biogeochemistry of lakes varies most critically along depth gradients, it would be useful to obtain LG samples from different lake strata (epilimnion, metalimnion, hypolimnion layers) in addition to surface pelagic waters. Increasing the number of lake samples in this dataset will be a critical aspect of future research in order to create a robust PARAFAC model for successful deconvolution of both tributary and lake EEM spectra. It may even be necessary to treat the lake samples in a separate PARAFAC model for more precise deconvolution of underlying lake FDOM components (Pitta and Zeri 2021).

Additionally, chlorophyll and light intensity measurements should be taken and paired with tributary and lake samples to distinguish the roles of microbiology and photochemistry in the FDOM cycling in LG (Berg et al. 2022). These fieldwork observations should be paired with laboratory experiments, which can help to better understand the biogeochemical roles and underlying structural components of each fluorescent DOM component. Photochemical, microbial, and abiotic (e.g., Fenton oxidation) incubations should be conducted to determine the lability/stability of each component as well as to determine if the degradation of one component could yield another one (Murphy et al. 2018). The observed strong negative correlation between C1 and C6 (Figure S7) suggests that C6 is a by-product of the degradation of C1; however, a carefully designed empirical study must confirm this.

Conclusion

Characterizing DOM, CDOM, and FDOM at baseflow conditions and assessing them spatiotemporally established a fundamental understanding of DOM variability in the LG watershed. Results from tributaries with known watershed disturbances from WWTFs, farms, and wetlands indicated that future research should focus on such “impacted” areas to establish quantitative relationships between anthropogenic stressors and DOM quantity and quality. Continued research on LG will help prevent the health of the lake from deteriorating and sustain the watershed’s productivity for future generations. More broadly, exploring the dynamics of DOC, TDN, CDOM, and PARAFAC components of FDOM contributes to a better understanding of DOM biogeochemistry in temperate lake watersheds. Most notably, the PARAFAC model presented here describes six different FDOM components in tributary and lake DOM, whose reactivity and overall biogeochemical cycling appear to differ, and will be useful in future biogeochemical studies of oligotrophic lakes in boreal and temperate regions.