Methods

Authored By: K. Pierce, K. Brewer, J. Ohmann

Subsections found in Methods include synopses of the study area, moderate resolution predictor data, the Gradient Nearest Neighbor (GNN) method of predictive vegetation mapping, and model evaluation and accuracy assessment.

Encyclopedia ID: p3447

The Study Area

Authored By: K. Pierce, K. Brewer, J. Ohmann

Three western regions covering temperate steppe, coastal forest, and Mediterranean ecosystems were mapped using GNN imputation for a Joint Fire Sciences Program study (see figure at right). The original study examined the feasibility of mapping wildland fuels and vegetation structure to provide data for fire and fuels management planning (Pierce and others, in review; Wimberly and others 2003). The 2.86-million-ha coastal forest site was located in the Coast Range of Oregon and extended as far inland as the western edge of the Willamette Valley. Its forests are primarily coniferous, with hardwoods occupying riparian and disturbed areas. The 4.1-million-ha Mediterranean site was located in the central Sierra Nevada, which is occupied by savannah, chaparral, mixed conifer, and alpine woodland vegetation types. The site stretches from the northern border of Sequoia National Park north through the Plumas National Forest. The 5-million-ha temperate steppe site in northeastern Washington was bounded on the west by the Cascade crest and on the south by the Columbia and Spokane rivers. It is dominated by a combination of mixed coniferous forest and extensive shrub steppe.

Vegetation Data from Field Plots

Vegetation data were derived in each of the three regions from multiple regional inventories, including Forest Inventory and Analysis (FIA) plots, Current Vegetation Survey/R6 (CVS) plots, R5 and BLM plots, and research ecology plots in North Cascades National Park (NCNP) (provided by Dave L. Peterson) and Yosemite National Park (provided by Jan Van Wagtendonk). The FIA, R5, and CVS plots were installed on systematic grids. CVS/R6 and R5 plots covered the national forests, whereas FIA installed plots on all ownerships. FIA and CVS inventory plots used five-subplot arrays within a 1-ha area. Small trees, snags, coarse woody debris line-intercept transects, and ground cover were sampled on each subplot.

Because vegetation data were derived from multiple inventories with different sampling protocols, all individual tree records were converted to per-hectare values. For plots with multiple vegetation or land cover conditions, only the forested portion was used, with expansion factors adjusted accordingly. Plot-level summary variables were then calculated for each plot.

Stand-summary variables included total basal area, basal area by species, trees per hectare, quadratic-mean diameter, snags per hectare, percent tree canopy cover, and down-wood volume. Different inventories collected down-wood data using different sampling schemes and minimum sizes. As a result, we focused primarily on species basal area.
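The per-hectare conversion and a few of the stand-summary variables above can be sketched in Python. This is a minimal illustration rather than the inventory processing actually used in the study: the function name `stand_summary`, the record layout, and the sample trees are hypothetical, and each tree is assumed to carry its own expansion factor (the number of trees per hectare it represents).

```python
import math

def stand_summary(trees):
    """Summarize tree records to per-hectare stand variables.

    trees: list of (dbh_cm, species, tph_factor) tuples, where tph_factor is
    the assumed expansion factor (trees per hectare represented by the record).
    """
    tph = 0.0            # trees per hectare
    ba_total = 0.0       # total basal area, m^2 per hectare
    ba_by_species = {}   # basal area by species, m^2 per hectare
    for dbh_cm, species, tph_factor in trees:
        # Cross-sectional area of one stem in m^2 (dbh in cm -> radius in m).
        stem_ba = math.pi * (dbh_cm / 200.0) ** 2
        tph += tph_factor
        ba_total += stem_ba * tph_factor
        ba_by_species[species] = ba_by_species.get(species, 0.0) + stem_ba * tph_factor
    # Quadratic mean diameter (cm): diameter of the tree of mean basal area.
    qmd = 200.0 * math.sqrt(ba_total / tph / math.pi) if tph > 0 else 0.0
    return {"tph": tph, "ba_total": ba_total, "ba_by_species": ba_by_species, "qmd_cm": qmd}

# Hypothetical plot: two 30-cm trees, each representing 10 trees per hectare.
example = stand_summary([(30.0, "PSME", 10.0), (30.0, "TSHE", 10.0)])
print(example)
```

Keeping the expansion factor on each record is what allows inventories with different subplot sizes, or plots where only the forested portion is used, to be summarized on a common per-hectare basis.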

Encyclopedia ID: p3448

Moderate Resolution Predictor Data

Authored By: K. Pierce, K. Brewer, J. Ohmann

Beginning in 2003, RSAC, in cooperation with the FIA remote sensing band, developed a national predictors database to support FIA national mapping efforts (e.g., national forest type maps and national biomass maps). The original database included about 60 layers consisting primarily of MODIS imagery. The national predictors database has been used extensively, with additional data layers added each year. The current version has more than 700 layers, which include DAYMET climate data, additional MODIS imagery, derived MODIS-based vegetation indices, STATSGO soil layers, topography, and several derived thematic products (Table: Comparison of spatial data). These data cover the entire conterminous United States as well as Alaska and Puerto Rico.

To prepare these data for Gradient Nearest Neighbor analysis, all layers were sampled at all plot locations. With 30-m data, 13-pixel footprints were used to sample the spatial data to account for plots occupying more area than a single 30-m pixel. With 250-m data, only a single pixel was necessary to represent each plot because the plot area is less than the area of one 250-m pixel. All predictor images were split into separate bands, yielding over 700 individual predictor layers. For example, some STATSGO soil layers were composed of 11 soil horizons, which were split into 11 separate predictor layers. Before statistical modeling, the 700 bands were reduced so that no remaining predictor had a correlation of >0.90 with any other remaining predictor. For each of the three study areas, this left about 100-150 predictor layers. These were entered as the predictor matrix in a stepwise canonical correspondence analysis. In each case, 10-15 predictors were sufficient to explain the primary gradients in vegetation composition.
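The correlation screening step can be illustrated with a short Python sketch. The text gives only the 0.90 threshold, so the details here are assumptions: the filter is greedy (columns are considered in their given order), it uses the absolute Pearson correlation, and the function name `drop_correlated` and the toy matrix are hypothetical.

```python
import numpy as np

def drop_correlated(X, threshold=0.90):
    """Return indices of columns kept by a greedy correlation filter.

    X: 2-D array (plots x predictors). A column is kept only if its absolute
    Pearson correlation with every previously kept column is <= threshold.
    Constant columns produce NaN correlations and should be removed beforehand.
    """
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in kept):
            kept.append(j)
    return kept

# Toy predictor matrix: column 1 is a linear function of column 0,
# column 2 is only weakly related to either.
a = np.arange(10, dtype=float)
X_demo = np.column_stack([a, 2.0 * a + 1.0, np.tile([1.0, -1.0], 5)])
kept_demo = drop_correlated(X_demo)
print(kept_demo)
```

Because the filter is order-dependent, which member of a correlated pair survives depends on how the predictor layers are sorted; a study would normally order them by some preference before screening.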

Satellite Imagery

For the 30-m study, Landsat Thematic Mapper (TM) imagery was mosaicked and histogram matched. For the 250-m study, several products derived from MODIS imagery were used (Table: Comparison of spatial data). These products cover our entire study regions, so no image matching was necessary.

Biophysical Environment Data

For both studies, biophysical data, including slope, aspect, and elevation, were derived from digital elevation models. Both studies used climate data derived from DAYMET, though the 250-m data set used many more calculated variables, including year-to-year variability by month. The DAYMET data are provided at 1-km resolution, so they were interpolated or resampled to 30 m and 250 m for the respective studies.
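As a rough illustration of bringing 1-km climate data onto a 250-m grid, the sketch below uses nearest-neighbor replication, in which each coarse cell is copied into a 4 x 4 block of fine cells. The text does not say which interpolation or resampling method was used, so this choice, the function name `resample_nearest`, and the sample values are assumptions.

```python
import numpy as np

def resample_nearest(grid, factor):
    """Replicate each cell of a 2-D grid into a factor x factor block."""
    return np.repeat(np.repeat(grid, factor, axis=0), factor, axis=1)

# Hypothetical 2 x 2 grid of 1-km climate values (e.g., mean temperature).
coarse = np.array([[10.0, 12.0],
                   [11.0, 15.0]])
fine = resample_nearest(coarse, 4)   # 1 km / 250 m = 4 fine cells per side
print(fine.shape)
```

Replication adds no information: every 250-m cell inside a 1-km cell carries the same value, which is why resampled climate layers vary gradually compared with imagery.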

The 250-m data set also included STATSGO data layers, such as available water-holding capacity, soil-bulk density, soil permeability, and soil pH. No analog for these data existed in the 30-m data set.

 

Encyclopedia ID: p3450

The Gradient Nearest Neighbor (GNN) Method of Predictive Vegetation Mapping

Authored By: K. Pierce, K. Brewer, J. Ohmann

Imputation is a process in which values are assigned to unmeasured locations from either measured values or a statistical summary, such as a mean, of a few selected measured values (Moeur and Stage 1995). Unlike regression-based predictions, the assigned value is not a product of predictor variables and coefficients. Instead, the predictor variables are used to rank sample plots by their similarity to a target location (a 30-m or 250-m square pixel). In Gradient Nearest Neighbor (GNN) imputation, we used the loadings for the ordination axes and their eigenvalues from a Canonical Correspondence Analysis (CCA) to relate the target locations to the locations of sample plots. This is achieved by calculating a Euclidean distance in eight-axis ordination space between the target pixel and each plot, using the ordination loadings to weight the spatial variables and the eigenvalues to weight each axis. The distance in gradient space between the target location and each plot is used to rank the sample plots for potential assignment at each target pixel in our study regions (Ohmann and Gregory 2002). Because we use a multivariate response, which is analogous to a community representation of a plot, the variables selected must be of similar type. Our primary modeling scenarios have been species modeling, which uses the total basal area of each individual species, and structure modeling, which has used basal area in different size classes of hardwoods and conifers, snag density, coarse woody debris volume, and canopy cover.
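The distance calculation described above can be sketched as follows. This is an illustration under stated assumptions, not the GNN implementation itself: the loadings and eigenvalues are taken as given (fitting the CCA is not shown), axis scores are computed as a plain matrix product of predictors and loadings, and each axis's squared difference is multiplied by its eigenvalue. The function name `rank_plots` and the toy numbers are hypothetical.

```python
import numpy as np

def rank_plots(pixel_env, plot_env, loadings, eigenvalues):
    """Rank plots by weighted Euclidean distance to a pixel in ordination space.

    pixel_env:   (n_predictors,) predictor values at the target pixel
    plot_env:    (n_plots, n_predictors) predictor values at the plots
    loadings:    (n_predictors, n_axes) ordination loadings (assumed given)
    eigenvalues: (n_axes,) axis eigenvalues used as axis weights (assumed given)
    """
    pixel_scores = pixel_env @ loadings   # pixel position on each axis
    plot_scores = plot_env @ loadings     # plot positions on each axis
    diff = plot_scores - pixel_scores
    dist = np.sqrt((eigenvalues * diff ** 2).sum(axis=1))
    order = np.argsort(dist, kind="stable")   # nearest plot first
    return order, dist[order]

# Toy case with two predictors and two axes (the study used eight axes).
loadings_demo = np.eye(2)
eig_demo = np.array([1.0, 0.25])
plots_demo = np.array([[3.0, 0.0], [0.0, 4.0], [1.0, 1.0]])
order_demo, dist_demo = rank_plots(np.zeros(2), plots_demo, loadings_demo, eig_demo)
print(order_demo, dist_demo)
```

In the toy case the second plot lies farther from the pixel in raw predictor space than the first, yet ranks ahead of it because its difference falls on the axis with the smaller eigenvalue.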

When GNN is run as a single-neighbor imputation, the closest plot in gradient space is assigned to each pixel in the landscape. Once the assignment has been made, any attribute calculated for all plots can be mapped, and the original covariance structure among the mapped attributes is maintained. Additionally, the ranking of potential neighbors provides a sample neighborhood of similar plots from which natural variability and sample sufficiency can be evaluated (Pierce and others, in review).
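A minimal sketch of single-neighbor assignment is shown below, assuming a matrix of gradient-space distances between pixels and plots has already been computed. The function name `impute_nearest`, the distances, and the attribute table are hypothetical.

```python
import numpy as np

def impute_nearest(dist, plot_attrs):
    """Assign each pixel the full attribute record of its nearest plot.

    dist:       (n_pixels, n_plots) distances in gradient space
    plot_attrs: (n_plots, n_attributes) attribute table for the plots
    """
    nearest = np.argmin(dist, axis=1)     # index of the closest plot per pixel
    return nearest, plot_attrs[nearest]   # whole plot rows are copied intact

# Two pixels, three plots; attribute columns might be basal area and snag density.
dist_demo = np.array([[0.5, 0.2, 0.9],
                      [0.1, 0.4, 0.3]])
attrs_demo = np.array([[10.0, 1.0],
                       [20.0, 2.0],
                       [30.0, 3.0]])
nearest_demo, mapped_demo = impute_nearest(dist_demo, attrs_demo)
print(nearest_demo, mapped_demo)
```

Because each pixel receives an entire plot record rather than separately predicted attributes, every combination of values on the map is one that was actually observed on a plot.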

Gradient Nearest Neighbor Results Summary from Previous Studies

In the 30-m study, coastal Oregon had favorable results for structure models but substandard results for species models. This pattern was reversed in both Washington and California, where species models worked well but structure models did a poor job with attributes such as quadratic mean diameter and trees per hectare. The purpose of the original study was to map wildland fuels and vegetation structure. Wildland fuel components, such as coarse woody debris and snag density, were not mapped adequately in any of the three sites, although canopy-related fuels variables were more satisfactory. This was partially because remote sensing does not directly detect wildland fuels, which are generally below the canopy. Another factor is that coarse woody debris data are collected on only a relatively short transect on FIA plots, so the resulting sample is too small to characterize individual plots. The coarse woody debris sampling was originally designed to create estimates for a region.

We expect that species models will perform more favorably than structure models with 250-m data. In the previous study, remote sensing imagery was very important for mapping structure, and it has a much finer spatial grain than our climate data. Although we had climate data at 30 m, they were interpolated from 1-km resolution DAYMET data, which are themselves interpolated from weather stations plus higher resolution covariates (elevation, topography, etc.). Climate data therefore change gradually over the course of a kilometer, whereas TM data can change abruptly from one 30-m pixel to the next. As a result, our change in resolution loses much more information for predictions relying heavily on imagery than for those relying on climate.

Encyclopedia ID: p3453

Model Evaluation and Accuracy Assessment

Authored By: K. Pierce, K. Brewer, J. Ohmann

To compare the results of the 30-m study with those of the 250-m study, two primary approaches were used. For continuous variables, we calculated second-nearest-neighbor correlations, which are analogous to r-squares from cross-validation (see Ohmann and Gregory 2002). For discrete variables, such as species presence/absence, we used standard confusion matrices. Producer's accuracy, user's accuracy, and Kappa statistics were calculated for each species and compared for the two studies (Wilkie and Finn 1996). The Kappa statistic accounts for the probability of assigning a plot to its correct class by chance. For example, the probability of correctly assigning by chance a species with a frequency of 0.9 is very high, so Kappa statistics tend to be low for such species because there is little room to improve upon randomness. Producer's accuracy is the proportion of sample plots with a species present in which the species is predicted to occur. User's accuracy is the proportion of plots with a predicted species occurrence that actually had the species in the inventory.
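The presence/absence statistics defined above can be computed from a two-by-two confusion matrix, as in the following sketch. It uses the standard Cohen's Kappa formula; the function name `presence_absence_accuracy` and the ten-plot example are hypothetical and are not results from either study.

```python
def presence_absence_accuracy(observed, predicted):
    """Producer's accuracy, user's accuracy, and Kappa for one species.

    observed, predicted: equal-length sequences of 0/1 (absent/present) per plot.
    """
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o and p)          # present, predicted present
    fn = sum(1 for o, p in pairs if o and not p)      # present, predicted absent
    fp = sum(1 for o, p in pairs if not o and p)      # absent, predicted present
    tn = sum(1 for o, p in pairs if not o and not p)  # absent, predicted absent
    n = tp + fn + fp + tn
    nan = float("nan")
    producers = tp / (tp + fn) if (tp + fn) else nan
    users = tp / (tp + fp) if (tp + fp) else nan
    p_obs = (tp + tn) / n                                           # observed agreement
    p_exp = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n**2  # chance agreement
    kappa = (p_obs - p_exp) / (1 - p_exp) if p_exp < 1 else nan
    return {"producers": producers, "users": users, "kappa": kappa}

# Hypothetical species present on 9 of 10 plots (frequency 0.9).
obs_demo = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]
pred_demo = [1, 1, 1, 1, 1, 1, 1, 1, 0, 1]
stats_demo = presence_absence_accuracy(obs_demo, pred_demo)
print(stats_demo)
```

In this example eight of ten plots are classified correctly, but chance agreement is already 0.82 for such a common species, so Kappa comes out slightly negative (about -0.11) even though producer's and user's accuracy are both about 0.89.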

Any discrepancies arising from changing the scale of the analysis stem from two primary effects, both of which are aspects of spatial averaging. First, with the 30-m product, the predicted value for a plot is the average of the 13 pixels imputed to the plot footprint, so it is smoothed over those 13 imputations. With the 250-m imputation, only a single plot is imputed for the same ~1 ha space. Therefore, the 250-m results will be slightly more variable. The second effect involves the spatial averaging of the predictor data set. In the 30-m study, the predictor variables are the averages of the 13 pixels within the individual predictor layers. In addition, we calculate two texture indices for the remote sensing data, which provide an estimate of variability within the footprint. With the 250-m data, a single pixel covers an area considerably larger than the ~1-ha plot footprint (62,500 m² vs. 11,700 m²). Thus, the spectral values are averaged over a larger area, and we have no estimate of within-plot variability.
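The two areas quoted above follow directly from the pixel sizes, as this short calculation shows. The variable names are illustrative only.

```python
# Area of the 13-pixel footprint used with 30-m data.
footprint_area_m2 = 13 * 30 ** 2      # 13 pixels x 900 m^2 each

# Area of a single 250-m pixel.
modis_pixel_area_m2 = 250 ** 2

# How many times larger the 250-m pixel is than the 30-m footprint.
area_ratio = modis_pixel_area_m2 / footprint_area_m2

print(footprint_area_m2, modis_pixel_area_m2, round(area_ratio, 2))
```

A single 250-m pixel therefore averages spectral values over roughly 5.3 times the area of the 13-pixel footprint.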

Encyclopedia ID: p3455