Input datasets for SDR (data sources & pre-processing)

Many thanks to the incredibly helpful developers and GIS experts who provide support on these fora…

I have recently completed some of the InVEST SDR tutorials, both in the first module and in the later data acquisition and processing section. The modules give a good sense of what the input data might look like. I am now trying to work out how best to generate these inputs for my own case study area.

The guidelines here are fairly comprehensive too: SDR: Sediment Delivery Ratio — InVEST documentation

I have highlighted the remaining gaps in my knowledge for the SDR model here in yellow:

The first gap is erosivity. I aim to download the rainfall data from one of the recommended datasets:
  1. https://esdac.jrc.ec.europa.eu/content/global-rainfall-erosivity#tabs-0-description=1
  2. https://esdac.jrc.ec.europa.eu/content/global-rainfall-erosivity#tabs-0-description=1&tabs-0-description-2=

Questions

  1. I am not sure about the best way to convert this rainfall data into erosivity. Is this done with a geoprocessing tool or raster calculator in QGIS / ArcGIS, or with an Excel formula?

  2. Do I need to do something similar for erodibility?

  3. For the watersheds vector, do I need to generate this from DelineateIt or from hydrological geoprocessing toolboxes in GIS?

  4. Is the biophysical table also generated as the attribute table of the watersheds vector through the same process?

  5. Is drainage also generated through DelineateIt or from hydrological geoprocessing toolboxes in GIS?

Thank you for your time and support.

Hi @ndmetherall -

The rainfall erosivity layer you point to is a very cool global dataset to know about! The metadata says that the units are already MJ mm ha-1 h-1 yr-1, which is what the model needs, so there is no need to convert it. You should be able to use it as is.

Erodibility usually requires some processing, unless you’ve found a SOTER layer that includes it already calculated with the correct units. This is one input where we are actually working on scripts and additional guidance, since it’s a pain to create from ISRIC Soil Grids (which most of us use), but they’re not ready yet. If you do an article search about calculating erodibility/Kfactor, you’ll find several different equations, which use different soil properties (sand/silt/clay, organic matter, permeability…). Some of the equations are place-specific, which may or may not be the place you’re working, so be aware of that. Otherwise, I’m sure you found that the User Guide has some guidance, although it’s still somewhat laborious. One thing that’s nice about erodibility is that you only really need to consider the top layer of soil, so you don’t need to process horizons. Sorry I don’t have more straightforward advice on this. The only place that I know it’s relatively easy to create is in the US, which has GIS tools for producing derived soil properties from a soil database.

You can generate watersheds using any tool you like. The one important thing is to create them from the DEM that you’re using as input to the model, so the watersheds are hydrologically complete.

The biophysical table is based on the land use/land cover map, where each LULC class is assigned usle_c and usle_p values. It is not based on watersheds.

The Drainage input is optional, and usually is not generated from the DEM. It’s intended to represent irrigation ditches or some other similar human-created artificial drainage system, that would come from a separate source.

~ Stacie

Dear @swolny

This advice is great. Thanks for your detailed and insightful responses. I have looked into these datasets you have recommended and those from the guide in greater detail and am chipping away at the original datasets one at a time. It is challenging working in study areas outside of the U.S. for the reasons you have mentioned.

I am now following up on your points and have reached the following new questions:

  1. I have been downloading LULC data from the dataset you recommend on the AppEEARS platform. Do you recommend the MODIS 500m Combined Landcover type or another dataset?

  2. I have searched for papers on soil K-factors and spatial datasets in the country, and now I am looking at the ISRIC datasets so I can hopefully add a field to the attribute tables or a value to the rasters to align with these K-factor values for my case study area (Fiji).

  • Which of the following ISRIC soil datasets do you recommend:

A Globally Distributed Soil Spectral Library Mid Infrared Diffuse Reflectance Spectra

https://data.isric.org/geonetwork/srv/eng/catalog.search#/metadata/1b65024a-cd9f-11e9-a8f9-a0481ca9e724

A homogenized soil data file for global environmental research: A subset of FAO, ISRIC and NRCS profiles

https://data.isric.org/geonetwork/srv/eng/catalog.search#/metadata/0f85c381-e496-47d9-89d8-f1fe2ee1a517

WISE derived soil properties on a 0.5 by 0.5 degree global grid, version 3.0

https://data.isric.org/geonetwork/srv/eng/catalog.search#/metadata/d9eca770-29a4-4d95-bf93-f32e1ab419c3

WISE - Global Soil Profile Data, version 3.1

https://data.isric.org/geonetwork/srv/eng/catalog.search#/metadata/a351682c-330a-4995-a5a1-57ad160e621c

WISE derived soil properties on a 30 by 30 arc-seconds global grid

https://data.isric.org/geonetwork/srv/eng/catalog.search#/metadata/dc7b283a-8f19-45e1-aaed-e9bd515119bc

Or any others?

Many thanks again for all your technical support.

Kind regards.

Hi @ndmetherall -

It’s hard for me to recommend a particular LULC layer without being familiar with the area you’re working in. We will often collect several land cover maps (global, like MODIS or ESA, or, preferably, more local/national), and compare them with a basemap, as well as get feedback from partners or other local experts, to determine which one represents the project area the best. I recently compared MODIS and ESA against a satellite basemap for one of my projects, and they were each ok in some ways, and obviously wrong in others. So you’ll have to decide which one works the best for your needs.

As for soil, again it’s hard for me to judge. It’s really unfortunate that ISRIC doesn’t provide Kfactor directly. Since you’ll be calculating this layer, I’d say that the two things to look for are 1/ whether the datasets contain the properties that you need to calculate K, and 2/ resolution. For example, one of the datasets you list is 0.5x0.5 degrees, which is very coarse, and I’d recommend going with one that’s higher-resolution, such as those that are 30 arc-seconds.
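To put those two resolutions side by side, here is a rough back-of-envelope conversion to ground distance. It assumes about 111.32 km per degree at the equator, which is only an approximation, and the function name is just for illustration. East-west cell width shrinks further as you move away from the equator.

```python
import math

# Approximate length of one degree at the equator (an assumption for this sketch).
KM_PER_DEGREE_AT_EQUATOR = 111.32


def cell_size_km(cell_size_degrees, latitude_degrees=0.0):
    """Return the approximate (north-south, east-west) cell size in km."""
    north_south = cell_size_degrees * KM_PER_DEGREE_AT_EQUATOR
    east_west = north_south * math.cos(math.radians(latitude_degrees))
    return north_south, east_west


coarse = cell_size_km(0.5)        # 0.5 x 0.5 degree grid
fine = cell_size_km(30 / 3600)    # 30 arc-second grid

print(round(coarse[0], 1), "km vs", round(fine[0], 2), "km")
```

So a 0.5 degree cell is on the order of 55 km across, while a 30 arc-second cell is a bit under 1 km, which is why the finer grid is usually the better starting point for a small study area.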

For the soil properties, it may help to consider how you want to calculate K. If you want to use the table provided in the User Guide, you’ll need to know the textural class (clay/clay loam/etc), or %sand/silt/clay, plus %organic matter content. If you want to use a different equation, then you’ll need to choose a soil database that provides whatever properties go into the equation.

The ISRIC website is a bit confusing. They have so very much data that it’s hard to figure out what’s best. Their latest product is SoilGrids, where you can zoom into your area of interest, select the soil properties you’re interested in, choose only the top depth (0-5cm, since erodibility is concerned with surface erosion) and download the layers already in grid form. Then you can do raster calculations on them, perhaps more easily than these other datasets provide.

~ Stacie

This is great advice again @swolny thank you very much again.

SOIL
I am supporting a project in the South Pacific and we have had access to some local soil datasets. When lucky, there has been a vector file with the soil descriptions including the soil composition you have outlined. In this case, I have used the table in the user guide and joined it to the vector attribute table in GIS then turned the polygon into a raster to meet the data format requirements.

In cases, where we have not been so lucky, I have had access to old scanned soil maps from ISRIC. We may have to digitise each layer and then give it a value.
https://edepot.wur.nl/486972

However, I prefer your advice to use the SoilGrids link you shared and then work with that. I am assuming that I should download each of the separate raster layers following the instructions you have outlined (silt, sand, clay, bulk density etc., 0-5 cm mean values) as shapefiles, join them all into a final collated soil layer in GIS, then use the raster calculator to assign K values?

Please excuse so many questions. I appreciate the inputs here.

Thanks again

Dear Stacie.

Thanks again. I will try to download the data from SoilGrids as you have suggested. I am just wondering if I will need to download each of the separate raster layers following the instructions you have outlined (silt, sand, clay, bulk density etc., 0-5 cm mean values) as shapefiles, join them all into a final collated soil layer in GIS, then use the raster calculator to assign K values? Is this the approach you would recommend?

Best wishes.

I think ISRIC Soil Grids are already in raster format, so if you’re using them directly, you should be able to reproject to the same projected coordinate system, clip them to the study area if needed, and use Raster Calculator to create K values, if you’re using an equation to calculate K.
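As a rough illustration of what the Raster Calculator step does, here is a minimal sketch in plain Python. Nested lists stand in for aligned sand/silt/clay rasters (in practice you would read them with a raster library), the -9999 nodata value is an assumption, and `placeholder_equation` is deliberately NOT a real K-factor formula; you would swap in whichever published equation you choose.

```python
NODATA = -9999.0  # assumed nodata value; check your own rasters


def raster_calc(layers, equation, nodata=NODATA):
    """Apply `equation` cell by cell; write nodata where any input is nodata."""
    rows, cols = len(layers[0]), len(layers[0][0])
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            values = [layer[r][c] for layer in layers]
            if any(v == nodata for v in values):
                out_row.append(nodata)
            else:
                out_row.append(equation(*values))
        out.append(out_row)
    return out


def placeholder_equation(sand, silt, clay):
    """Stand-in only: returns the texture fractions' sum, not a K value."""
    return (sand + silt + clay) / 100.0


# Tiny 2x2 example layers in percent, already aligned to the same grid.
sand = [[40.0, NODATA], [20.0, 60.0]]
silt = [[40.0, 30.0], [50.0, 25.0]]
clay = [[20.0, 30.0], [30.0, 15.0]]

result = raster_calc([sand, silt, clay], placeholder_equation)
print(result)
```

The point of the sketch is the masking: the layers must share the same grid, and any cell missing in one layer should come out as nodata rather than a misleading number.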

If, however, you’re doing the translation from sand/silt/clay to texture, then mapping to the values in the table in the User Guide, I suppose you could still do all that in raster format, but it’s perhaps more confusing that way. You could turn the rasters into shapefiles, do the texture/K-factor mapping in the attribute table, then go back to raster. If you combine the sand/silt/clay/OM layers, such that each polygon has all values, you can export the shapefile attribute table to Excel and work with the rest of it there, which I often do, it can be more efficient. Then join the resulting table back to the shapefile and copy over your final K values, and convert back to raster.

~ Stacie

This makes sense, Stacie. Thank you for sharing this instruction and experience.

I was making some progress with joining attribute tables to the vector shapefiles as you have mentioned. However, for other geographic areas, I may have to rely more on the SoilGrids rasters you have also mentioned and then use an equation. Do you recommend trying to find an equation from a journal article published for a similar geographic region? Or is there an article / formula for the raster calculator you can recommend more widely?

I am now delving into land use and land cover. In the Gura case study sample data from the online module (attached), there are 10 classes. I notice these do not always have to align with the 6 IPCC landcover classes but can include other custom classes. I was wondering how we attribute the values 1-19 to each of these raster classes? I was unsure whether all classifications should always follow this example with these values?

According to the user guide there are a range of classes and sub-classes. I was not sure how to or whether to incorporate the P and C coefficients into the raster values of the landcover / landuse values?

Is there a reference document / reading with the values for landclasses needed for the InVEST model?

Thanks again for all the technical support.

@ndmetherall I’ll have to let Stacie answer your question about soil grids, but I can answer your other questions about LULC classes.

The LULC classes that we distribute with the InVEST sample data, including the ones you mention for the Gura study area, are for demonstration purposes only, and merely show what the structure/format of the inputs should be. The LULC classes you use could align with the 6 IPCC landcover classes (or any other standard set of landcover classes), but they don’t have to. If you have your own classification system, that would work fine too.

Maybe this section (which provides some links to resources for estimating C and P coefficients) in the SDR User’s Guide chapter would be useful? Forgive me if you’ve already taken a look at it: SDR: Sediment Delivery Ra

Hey @ndmetherall -

It’s always nice if we can find equations that were created more specifically for our study area, but they often don’t exist. So I would take a look, but not spend too much time on it, since you can use the table in the User Guide to get good general values for K. When I start a study with SDR in a new place, I always do a web search for “USLE” and my area of interest and see what comes up. You might find publications with equations for R or K that you think are applicable, or sources for USLE C and P values etc.

As @jdouglass said, your landcover map can have any set of classes, there are no requirements for how many, or which types, or the specific ID values they have. We expect that every project will have a different LULC map, from one of many possible sources. (I am working with one now that has 157 classes, including over 80 extremely specific forest types!) The only requirements are that each class have a unique integer ID in the raster’s Value field, and your biophysical table must have an entry for each LULC class that’s in the raster. You can see this correspondence in the Gura sample data, by comparing the LULC raster with the biophysical table.
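If it helps, those two requirements can be checked before running the model. The sketch below is only an illustration: a nested list stands in for the LULC raster values, the rows mimic a biophysical table read from CSV, and the function names and the 255 nodata value are assumptions.

```python
def missing_lucodes(lulc_values, biophysical_rows, nodata=None):
    """Return raster class IDs that have no matching row in the table."""
    raster_codes = {v for row in lulc_values for v in row if v != nodata}
    table_codes = {int(row["lucode"]) for row in biophysical_rows}
    return sorted(raster_codes - table_codes)


def duplicate_lucodes(biophysical_rows):
    """Return lucode values that appear more than once in the table."""
    seen, duplicates = set(), set()
    for row in biophysical_rows:
        code = int(row["lucode"])
        if code in seen:
            duplicates.add(code)
        seen.add(code)
    return sorted(duplicates)


# Toy raster (255 = nodata here) and a table that forgot class 3.
lulc = [[1, 1, 2], [3, 255, 2]]
table = [{"lucode": "1"}, {"lucode": "2"}]

print(missing_lucodes(lulc, table, nodata=255))
print(duplicate_lucodes(table))
```

An empty list from both checks means every class in the raster has exactly one row in the table.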

~ Stacie

Thank you @jdouglass for these insights. I have referred back to the guide again and with your points on the LULC, I will look up the c and p coefficients in the guide

So from my understanding of the advice and the user guide, these p and c coefficients are used as inputs in the biophysical table corresponding to the LULC raster file with the same number of classes and no gaps? Correct? Does the raster need to be joined to the biophysical table using a geoprocessing toolbox? OR do we just use the raster calculator?

While the LS factor is calculated automatically from the DEM, the P and C factors can be estimated from the tables in the guidelines. Is this understanding correct?

But how would we know which areas are different farming types? Do we have to run a classification from high-resolution imagery?

Dear Stacie. Many thanks for your inputs here…

This is great advice. I will look into the values in the literature as you have advised.

Wow, 157 classes in your current study. That must be a complex catchment / area.

In terms of the biophysical table and the LULC raster with the corresponding value fields, I wasn’t quite sure if we needed to run a spatial join to link the table to the raster?

Below I have the biophysical table from Gura sample data

In the biophysical table, there are 10 LULC classes. Each has the following fields:

  1. unique LUC code
  2. USLE_c > this is the c value that might come from the literature or the default values in the SDR and FAO guides? Not sure how to best classify / identify the different c value for each area? Would we use a classifier?
  3. USLE_p > this is the p value that might come from the literature or the default values in the SDR and FAO guides? Not sure how to best classify / identify the different p value for each area? Would we use a classifier?
  4. load_p > Not sure what this is? Where would we find this information?
  5. eff_p > Not sure what this is? Where would we find this information?
  6. crit_lan_p > Not sure what this is? Where would we find this information?
  7. root_depth > Where would we find this information?
  8. Kc > unsure about this variable?
  9. LULC_veg > Is this a boolean where 0 is no vegetation and 1 is vegetation?

Any advice on these variables and which ones are required inputs (sources) as well as the outputs would be much appreciated.

Kind regards.

Hi @ndmetherall -

If you look in the Data Needs section of the User Guide, and the sample data, you’ll see that you provide the land use raster as one input and the biophysical table as a separate input. The model will join the raster to the biophysical table using the Value field in the raster and the lucode field in the table, so you do not need to do this.
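Conceptually, that join is just a lookup from each pixel's class ID to the matching table row. The sketch below illustrates the idea only; it is not InVEST's actual code, and the function name and the usle_c numbers are made-up placeholders.

```python
def map_parameter(lulc_values, biophysical_rows, column):
    """Replace each LULC class ID with its value from `column` in the table."""
    lookup = {int(row["lucode"]): float(row[column]) for row in biophysical_rows}
    return [[lookup[value] for value in row] for row in lulc_values]


# Placeholder table: the usle_c / usle_p numbers are illustrative only.
table = [
    {"lucode": "1", "usle_c": "0.1", "usle_p": "1"},
    {"lucode": "2", "usle_c": "0.5", "usle_p": "1"},
]
lulc = [[1, 2], [2, 2]]

usle_c_grid = map_parameter(lulc, table, "usle_c")
print(usle_c_grid)
```

Because the lookup is keyed on the class ID, a pixel value with no row in the table would raise a `KeyError` here, which mirrors why every class in the raster needs a table entry.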

In the biophysical table, for SDR you only need to provide usle_c and usle_p (alongside the lucode column). The other columns in the sample data correspond to parameters required for other hydrology models (NDR and Annual Water Yield). When we do projects using multiple models, it’s often most efficient to keep all of the models’ biophysical parameters in one table, but I’m seeing that it might not be a good idea to do that with the sample data, since it’s often confusing when just getting started with InVEST.

You provide a usle_c and usle_p value for each land cover class, so your land cover map should already be classified. Then you’ll need to use your understanding of the different land cover types, and a literature search, to assign values to each class. We often start out with usle_p set to all 1s, if we don’t know if/how/where different sediment management practices are done.
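One way to get started is to generate a table template from the class IDs in your land cover map, with usle_p defaulting to 1 and usle_c left blank to fill in from your literature search. This is a minimal sketch; the function name is hypothetical, and the blank usle_c cells must be filled in before the table is usable.

```python
import csv
import io


def biophysical_template(lucodes, default_p=1):
    """Build CSV text with one row per class: blank usle_c, default usle_p."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["lucode", "usle_c", "usle_p"])
    for code in sorted(set(lucodes)):
        # usle_c is intentionally empty: assign it per class from the literature.
        writer.writerow([code, "", default_p])
    return buffer.getvalue()


template = biophysical_template([3, 1, 2, 2])
print(template)
```

You could write the returned text to a .csv file and then edit usle_c (and usle_p, where management practices are known) class by class.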

~ Stacie