USGS gage data streamflow

Hi Hazhir -

It is hard to come up with detailed, specific instructions for working with gauge data, since each dataset differs in format and content depending on the data source and on what each station has available.

I’m not sure which aspect of this you’re looking for guidance on, but we recently drafted some general calibration methods for the freshwater models, which I intend to add to the InVEST User Guide but haven’t gotten around to yet. I’ll paste them here, and I’d be interested to hear whether they address your questions, and if not, what’s missing.

General calibration steps

  1. Find observed data within the watershed of interest, usually from gauge stations.
  • Gauge station data often comes from government agencies, but may also be provided by water-related utilities such as hydropower operators, or by other sources.
  2. Review the observed data for required measurements, duration, and completeness.
  • Required measurements: values that correspond to the output of the model you’re calibrating. For example, if you’re calibrating SDR sediment export, the gauge data must include either sediment load values or a combination of sediment concentration and water flow data that can be used to calculate sediment load.
  • Duration: ideally at least 10 years of continuous daily data, covering the same time period as the climate data you’re using as input to the InVEST model.
  • Completeness: no large gaps in the data. A gap in one year is OK if the other years have data for that period. But if most or all years are missing data for, say, a whole month or season, the results are unlikely to be good.
  • Even better if someone has already processed the observed data into monthly or annual average values. This is rare, but worth asking about.
  3. Prepare the observed data by summarizing it into a value that can be compared directly with model results.
  • This process will differ depending on the data you’re working with and the model output you’re calibrating, so it’s hard to generalize.
  • In the end, you want at least one value that represents average annual sediment load, nutrient load, or water flow at the gauge station, in units that match the model output. (For the Seasonal Water Yield model, you could instead use 12 averages, one for each month of the year, but would need to decide how to distribute the annual baseflow result across the months.)
  4. Compare the observed values you calculated with the modeled results.
  • Summarize the modeled results within the watershed that drains to the point where the observed data was collected. See the section “Delineating watersheds” below for more information.
  • The modeled result is unlikely to match the observed value, and may be very different. Remember that these are simple models, and that any model (even a complex one) needs calibration to bring its results close to reality before you can have confidence in the absolute values.
  5. Do a sensitivity analysis to determine which model parameters are worth adjusting during calibration.
  • This requires many model runs, which are most efficiently done with a script that iterates over a range of biophysical table values, input rasters, or other parameters.
  • Vary biophysical table values (associated with the land use/land cover map) and global model parameters one at a time, within reasonable ranges based on those reported in the literature. You can also vary spatial input layers if you have data for the area of interest from different sources that differ significantly from each other.
  • The parameters that have the greatest effect on results should be used for calibration.
  6. Once you’ve chosen the parameters with the greatest effect, do another set of model runs that varies all of them at the same time across a range of values, so that each model run uses a different combination of parameter values.
  7. Use statistical methods to compare the results from step 6 with the observed data, and select the parameter set whose results come satisfactorily close to the observed value.
  • This can be as simple as calculating the percent error:
    • percent_error = ((modeled_value - observed_value) / observed_value) * 100
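For steps 2 and 3, here’s a minimal Python sketch of one way to check completeness and compute a mean annual value from a daily record. It assumes the gauge data has already been read into a dictionary mapping each date to a daily value (None for missing days). The function names and the 95% completeness cutoff are hypothetical choices for illustration, not InVEST functions or recommendations. It also assumes that missing days in a year that passes the cutoff can be filled with that year’s mean daily value. The sediment load conversion relies on 1 mg/L being equal to 1 g/m³.

```python
from collections import defaultdict
from datetime import date

SECONDS_PER_DAY = 86_400


def daily_sediment_load_tonnes(conc_mg_per_l, flow_m3_per_s):
    """Daily sediment load (tonnes/day) from concentration and mean daily flow.

    1 mg/L equals 1 g/m^3, so concentration * flow gives grams per second;
    multiply by seconds per day, then divide by 1e6 to convert grams to tonnes.
    """
    return conc_mg_per_l * flow_m3_per_s * SECONDS_PER_DAY / 1e6


def days_in_year(year):
    return (date(year + 1, 1, 1) - date(year, 1, 1)).days


def completeness_by_year(records):
    """Fraction of days in each calendar year that have a value (None = missing)."""
    counts = defaultdict(int)
    for day, value in records.items():
        if value is not None:
            counts[day.year] += 1
    years = sorted({day.year for day in records})
    return {year: counts[year] / days_in_year(year) for year in years}


def month_coverage(records):
    """For each month (1-12), the fraction of years with at least one value in it.

    Low values flag months that are missing in most or all years.
    """
    all_years = {day.year for day in records}
    covered = defaultdict(set)
    for day, value in records.items():
        if value is not None:
            covered[day.month].add(day.year)
    return {month: len(covered[month]) / len(all_years) for month in range(1, 13)}


def mean_annual_total(daily_totals, min_completeness=0.95):
    """Average annual total over the years that meet the completeness cutoff.

    daily_totals maps date -> daily amount (e.g. m^3/day or tonnes/day).
    Assumption: missing days in a kept year are filled with that year's
    mean daily value, i.e. annual total = mean daily value * days in year.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for day, value in daily_totals.items():
        if value is not None:
            sums[day.year] += value
            counts[day.year] += 1
    completeness = completeness_by_year(daily_totals)
    annual = [
        sums[year] / counts[year] * days_in_year(year)
        for year, fraction in completeness.items()
        if fraction >= min_completeness
    ]
    if not annual:
        raise ValueError("no year meets the completeness cutoff")
    return sum(annual) / len(annual)
```

For flow, convert each mean daily discharge in m³/s to a daily volume (multiply by 86,400 seconds) before calling `mean_annual_total`, so the result is an average annual volume in m³. Whether that matches your model’s output units is worth checking against the model documentation.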
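Steps 5 to 7 are easiest to run from a script. Here’s a minimal sketch, assuming a hypothetical `run_model(params)` function that you’d write yourself: it would run the InVEST model with the given parameter values and return the result summarized over the gauge’s watershed. Nothing here is part of the InVEST API. The sensitivity scan varies one parameter at a time from a baseline, and the calibration search tries every combination of the chosen parameters (a simple grid search; random sampling is a common alternative when there are many parameters), ranking runs by the percent error formula from step 7.

```python
import itertools


def percent_error(modeled_value, observed_value):
    """((modeled - observed) / observed) * 100, as in step 7."""
    return (modeled_value - observed_value) / observed_value * 100


def one_at_a_time(run_model, baseline, ranges):
    """Step 5: vary each parameter alone, holding the others at baseline.

    Returns {name: output_spread}; a larger spread means a more
    influential parameter.
    """
    spread = {}
    for name, values in ranges.items():
        results = [run_model({**baseline, name: value}) for value in values]
        spread[name] = max(results) - min(results)
    return spread


def best_parameter_set(run_model, baseline, ranges, observed_value):
    """Steps 6-7: run every combination of the chosen parameters and return
    (params, percent_error) for the run closest to the observed value."""
    names = list(ranges)
    best = None
    for combo in itertools.product(*(ranges[name] for name in names)):
        params = {**baseline, **dict(zip(names, combo))}
        error = percent_error(run_model(params), observed_value)
        if best is None or abs(error) < abs(best[1]):
            best = (params, error)
    return best
```

With a stand-in model such as `lambda p: 1000 * p["k"] + 10 * p["c"]` (where `k` and `c` are made-up parameter names), the scan shows `k` producing a much larger spread than `c`, so `k` would be the parameter to carry into the calibration runs.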

Delineating watersheds

When calibrating freshwater models with observed data, we need to delineate the watershed that drains to the point where the gauge is located. Then we can summarize the relevant model result (such as sediment export) within that watershed and compare that summary with the observed value.

Many tools can create watersheds, and you can use whichever one you’re comfortable with. InVEST includes DelineateIt, a simple, effective tool for creating watersheds.

Whichever tool you use, it will generally require, at a minimum, a digital elevation model (DEM) raster and a vector (such as a shapefile or GeoPackage) containing the point location(s) to use as outlets. Here, the outlet is the location of the gauge station where the observed data comes from. The DEM must be the same one used as input to the InVEST freshwater model you’re calibrating.

After running the delineation tool, look carefully at the resulting watershed to make sure it appears correct. One common problem is a delineated watershed that is very small. This is usually because the outlet point does not lie directly on a stream generated by the delineation tool. To fix this, many delineation tools have a “snap” function: you specify a distance around the outlet point within which the tool should look for a stream, and if it finds one, the tool moves (“snaps”) the point onto the stream and delineates the watershed from there. If the tool has no snap feature, you can manually move the point onto the stream network generated by the delineation tool.
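To make the snapping idea concrete, here’s a minimal sketch in plain Python. It assumes the stream network is available as a grid of 0/1 values (1 = stream pixel) and that the outlet has already been converted to a row/column index on that grid. The function name is hypothetical, and the rule shown (move to the nearest stream pixel by straight-line distance, within the search distance) is just one simple rule; some tools instead snap to the cell with the highest flow accumulation within the search distance.

```python
import math


def snap_to_stream(stream, row, col, max_distance_px):
    """Return the nearest stream cell to (row, col) within max_distance_px.

    stream is a 2D grid of 0/1 values (1 = stream pixel). If no stream pixel
    lies within the search distance, the original (row, col) is returned.
    """
    n_rows, n_cols = len(stream), len(stream[0])
    radius = math.floor(max_distance_px)
    best, best_distance = (row, col), math.inf
    # Only scan the square window that can contain cells within the radius.
    for r in range(max(0, row - radius), min(n_rows, row + radius + 1)):
        for c in range(max(0, col - radius), min(n_cols, col + radius + 1)):
            if stream[r][c] != 1:
                continue
            distance = math.hypot(r - row, c - col)
            if distance <= max_distance_px and distance < best_distance:
                best, best_distance = (r, c), distance
    return best
```

Distances here are in pixels; to work with a distance in map units, divide it by the pixel size first.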

Once the watershed is correct, use a GIS tool such as Zonal Statistics to sum the relevant model result raster (such as sediment export) within the watershed, and compare this sum with the observed value. Alternatively, you can use the generated watershed as an input to the model, which will do the summarizing for you and output a vector layer whose attribute table contains the summarized values.
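Under the hood, a zonal sum just adds up the result pixels that fall inside the watershed. Here’s a minimal sketch, assuming the model output and the watershed (already rasterized to a 0/1 mask) are 2D grids with the same extent and pixel size, and that nodata is a finite number rather than NaN. The name `zonal_sum` is hypothetical; in practice a GIS tool or raster library handles the rasterizing and alignment. Summing only makes sense when each pixel holds an amount (such as sediment exported from that pixel) rather than a rate or an average.

```python
def zonal_sum(values, watershed_mask, nodata=None):
    """Sum the result pixels that fall inside the watershed.

    values and watershed_mask are 2D grids with the same shape; mask cells
    equal to 1 are inside the watershed. Cells equal to nodata (assumed to
    be a finite number, not NaN) are skipped.
    """
    if len(values) != len(watershed_mask) or any(
        len(v) != len(m) for v, m in zip(values, watershed_mask)
    ):
        raise ValueError("result raster and watershed mask must be aligned")
    total = 0.0
    for value_row, mask_row in zip(values, watershed_mask):
        for value, inside in zip(value_row, mask_row):
            if inside == 1 and value != nodata:
                total += value
    return total
```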


Step 3 (preparing the observed data) is probably the trickiest to advise on, since that’s where every dataset is different, and it’s probably also the step people want the most help with. But do let me know whether this helps and what could be improved.

~ Stacie
