Input datasets for SDR (data sources & pre-processing)

Many thanks to the incredibly helpful developers and GIS experts who provide support on these fora…

I have recently completed some of the tutorials for InVEST SDR in both the first module and the data acquisition and processing section in the latter section. The modules allow us to get an understanding of what the input data might look like. I am now in the process of trying to work out how best to generate these data inputs for my own case study area.

The guidelines here are fairly comprehensive too: SDR: Sediment Delivery Ratio — InVEST documentation

I have highlighted the remaining gaps in my knowledge for the SDR model here in yellow:

The first gap is erosivity. I aim to download the rainfall data from one of the recommended datasets: 2.


  1. I am not sure about the best way to convert this result into erosivity - is this done through a geoprocessing / calculator in QGIS / ArcGIS ? OR through an excel formula?

  2. Do I need to do something similar for erodibility?

  3. For the watersheds vector, do I need to generate this from DelineatIt or from hydrological geoprocessing toolboxes in GIS?

  4. Is the biophysical table also generated as the attribute table of the watersheds vector through the same process?

  5. Is drainage also generated through the DelineateIt or from hydrological geoprocessing toolboxes in GIS?

Thank you for your time and support.

Hi @ndmetherall -

The rainfall erosivity layer you point to is a very cool global dataset to know about! The metadata says that the units are already MJ mm ha-1 h-1 yr-1, which is what the model needs, so there is no need to convert it, you should be able to use it as is.

Erodibility usually requires some processing, unless you’ve found a SOTER layer that includes it already calculated with correct units. This is one input where we are actually working on scripts and additional guidance, since it’s a pain to create from ISRIC Soil Grids (which most of us use) but they’re not ready yet. If you do an article search about calculating erodibility/Kfactor, you’ll find several different equations, which use different soil properties (sand/silt/clay, organic matter, permeability…) Some of the equations are place-specific, which may or may not be the place you’re working, so be aware of that. Otherwise, I’m sure you found that the User Guide has some guidance, although it’s still somewhat laborious. One thing that’s nice about erosivity is that you only really need to consider the top layer of soil, so don’t need to process horizons. Sorry I don’t have more straightforward advice on this. The only place that I know it’s relatively easy to create is in the US, which has GIS tools for producing derived soil properties from a soil database.

You can generate watersheds using any tool you like, the one important thing is to create them from the DEM that you’re using as input to the model, so the watersheds are hydrologically complete.

The biophysical table is based on the land use/land cover map, where each LULC class is assigned usle_c and usle_p values. It is not based on watersheds.

The Drainage input is optional, and usually is not generated from the DEM. It’s intended to represent irrigation ditches or some other similar human-created artificial drainage system, that would come from a separate source.

~ Stacie


Dear @swolny

This advice is great. Thanks for your detailed and insightful responses. I have looked into these datasets you have recommended and those from the guide in greater detail and am chipping away at the original datasets one at a time. It is challenging working in study areas outside of the U.S. for the reasons you have mentioned.

I am now following up on your points and have reached the following new questions:

  1. I have been downloading LULC data from the dataset your recommend on APPEEARS platform. Do you recommend the MODIS 500m Combined Landcover type or another dataset?

  2. I have searched for some papers for the soil K - factors and spatial datasets in the country and now I am looking at the ISRIC datasets so I can hopefully add a field to the attribute tables or a value to the rasters to align with these k-factor values for my case study area (Fiji).

  • Which of the following ISRIC soil datasets do you recommend:

A Globally Distributed Soil Spectral Library Mid Infrared Diffuse Reflectance Spectra

A homogenized soil data file for global environmental research: A subset of FAO, ISRIC and NRCS profiles

WISE derived soil properties on a 0.5 by 0.5 degree global grid, version 3.0

WISE - Global Soil Profile Data, version 3.1

WISE derived soil properties on a 30 by 30 arc-seconds global grid

Or any others?

Many thanks again for all your technical support.

Kind regards.

Hi @ndmetherall -

It’s hard for me to recommend a particular LULC layer without being familiar with the area you’re working in. We will often collect several land cover maps (global, like MODIS or ESA, or, preferably, more local/national), and compare them with a basemap, as well as get feedback from partners or other local experts, to determine which one represents the project area the best. I recently compared MODIS and ESA against a satellite basemap for one of my projects, and they were each ok in some ways, and obviously wrong in others. So you’ll have to decide which one works the best for your needs.

As for soil, again it’s hard for me to judge. It’s really unfortunate that ISRIC doesn’t provide Kfactor directly. Since you’ll be calculating this layer, I’d say that the two things to look for are 1/ whether the datasets contain the properties that you need to calculate K, and 2/ resolution. For example, one of the datasets you list is 0.5x0.5 degrees, which is very coarse, and I’d recommend going with one that’s higher-resolution, such as those that are 30 arc-seconds.

For the soil properties, it may help to consider how you want to calculate K. If you want to use the table provided in the User Guide, you’ll need to know the textural class (clay/clay loam/etc), or %sand/silt/clay, plus %organic matter content. If you want to use a different equation, then you’ll need to choose a soil database that provides whatever properties go into the equation.

The ISRIC website is a bit confusing. They have so very much data that it’s hard to figure out what’s best. Their latest product is SoilGrids, where you can zoom into your area of interest, select the soil properties you’re interested in, choose only the top depth (0-5cm, since erodibility is concerned with surface erosion) and download the layers already in grid form. Then you can do raster calculations on them, perhaps more easily than these other datasets provide.

~ Stacie

This is great advice again @swolny thank you very much again.

I am supporting a project in the South Pacific and we have had access to some local soil datasets. When lucky, there has been a vector file with the soil descriptions including the soil composition you have outlined. In this case, I have used the table in the user guide and joined it to the vector attribute table in GIS then turned the polygon into a raster to meet the data format requirements.

In cases, where we have not been so lucky, I have had access to old scanned soil maps from ISRIC. We may have to digitise each layer and then give it a value.

However, I prefer your advice to use the soil grids link you shared and then work with that. I am assuming that I should just download each of the separate raster layers following the instructions you have outlined - (silt, sand, clay, bulk density etc…) - 0-5m mean values as shapefiles and join them all in to a final collated soil value in GIS then use raster calculators to give them all k-values?

Please excuse so many questions. I appreciate the inputs here.

Thanks again

Dear Stacie.

Thanks again. I will try to download the data from soil grids as you have suggested. I am just wondering if I will need to download each of the separate raster layers following the instructions you have outlined - (silt, sand, clay, bulk density etc…) - 0-5m mean values as shapefiles and join them all in to a final collated soil value in GIS then use raster calculators to give them all k-values? Is this the approach you would recommend?

Best wishes.

I think ISRIC Soil Grids are already in raster format, so if you’re using them directly, you should be able to reproject to the same projected coordinate system, clip them to the study area if needed, and use Raster Calculator to create K values, if you’re using an equation to calculate K.

If, however, you’re doing the translation from sand/silt/clay to texture, then mapping to the values in the table in the User Guide, I suppose you could still do all that in raster format, but it’s perhaps more confusing that way. You could turn the rasters into shapefiles, do the texture/K-factor mapping in the attribute table, then go back to raster. If you combine the sand/silt/clay/OM layers, such that each polygon has all values, you can export the shapefile attribute table to Excel and work with the rest of it there, which I often do, it can be more efficient. Then join the resulting table back to the shapefile and copy over your final K values, and convert back to raster.

~ Stacie

1 Like

This makes sense, Stacie. Thank you for sharing this instruction and experience.

I was having some progress with joining attribute tables to the vector shapefiles as you have mentioned. However, for other geographic areas, I may have to rely more on the Soil Grids rasters you have also mentioned and then using an equation. Do you recommend trying to find an equation from a journal article published for a similar geographic region? Or is there are an article / formula for the raster calculator you can recommend more widely?

I am now delving into the landuse and landcover. I notice from the sample data from the Gura case study from the online module. In the example, (Gura attached) there are 10 classes

. I notice these do not always have to align with the 6 IPCC landcover classes but can include other custom classes. I was wondering how we attribute the values 1-19 to each of these raster classes? I was unsure whether all classifications should always follow this example with these values?
Screen Shot 2021-03-18 at 8.45.48 am

According to the user guide there are a range of classes and sub-classes. I was not sure how to or whether to incorporate the P and C coefficients into the raster values of the landcover / landuse values?

Is there a reference document / reading with the values for landclasses needed for the InVEST model?

Thanks again for all the technical support.

@ndmetherall I’ll have to let Stacie answer your question about soil grids, but I can answer your other questions about LULC classes.

The LULC classes that we distribute with the InVEST sample data, including the ones you mention for the Gura study area that you mention here, are for demonstration purposes only, and are merely to demonstrate what the structure/format of the inputs should be. The LULC classes you use could align with the 6 IPCC landcover classes (or any other standard set of landcover classes), but they don’t have to. If you have your own classification system, that would work fine too.

Maybe this section (which provides some links to resources for estimating C and P coefficients) in the SDR User’s Guide chapter would be useful? Forgive me if you’ve already taken a look at it: SDR: Sediment Delivery Ra

1 Like

Hey @ndmetherall -

It’s always nice if we can find equations that were created more specifically for our study area, but they often don’t exist. So I would take a look, but not spend too much time on it, since you can use the table in the User Guide to get good general values for K. When I start a study with SDR in a new place, I always do a web search for “USLE” and my area of interest and see what comes up, you might find publications with equations for R or K that you think are applicable, or sources for USLE C and P values etc.

As @jdouglass said, your landcover map can have any set of classes, there are no requirements for how many, or which types, or the specific ID values they have. We expect that every project will have a different LULC map, from one of many possible sources. (I am working with one now that has 157 classes, including over 80 extremely specific forest types!) The only requirements are that each class have a unique integer ID in the raster’s Value field, and your biophysical table must have an entry for each LULC class that’s in the raster. You can see this correspondence in the Gura sample data, by comparing the LULC raster with the biophysical table.

~ Stacie


Thank you @jdouglass for these insights. I have referred back to the guide again and with your points on the LULC, I will look up the c and p coefficients in the guide

So from my understanding of the advice and the user guide, these p and c coefficients are used as inputs in the biophysical table corresponding to the LULC raster file with the same number of classes and no gaps? Correct? Does the raster need to be joined to the biophysical table using a geoprocessing toolbox? OR do we just use the raster calculator?

While, the LS factor is calculated automatically from the DEM, the P and C factors can be estimated from the tables in the guidelines, is this understanding correct?

But how would we know which areas are different farming types? Do we have to run a classification from high-resolution imagery?

Dear Stacie. Many thanks for your inputs here…

This is great advice. I will look into the values in the literature as you have advised.

Wow, 157 classes in your current study. That must be a complex catchment / area.

In terms of the biophysical table and the LULC raster with the corresponding value fields, I wasn’t quite sure if we needed to run a spatial join to link the table to the raster?

Below I have the biophysical table from Gura sample data

In the biophysical table, there are 10 LULC classes. Each has the following fields:

  1. unique LUC code
  2. USLE_c this is the c value that might come from the literature or the default values in the SDR and FAO guides? Not sure how to best classify / identify the different c value for each area? Would we use a classifier?
  3. USLE_p > this is the p value that might come from the literature or the default values in the SDR and FAO guides? Not sure how to best classify / identify the different p value for each area? Would we use a classifier?
  4. load_p > Not sure what this is? Where would we find this information?
  5. eff_p > Not sure what this is? Where would we find this information?
  6. crit_lan_p > Not sure what this is? Where would we find this information?
  7. root_depth > Where would we find this information?
  8. Kc > unsure about this variable?
  9. LULC_veg > Is this a boolean where 0 is no vegetation and 1 is vegetation?

Any advice on these variables and which ones are required inputs (sources) as well as the outputs would be much appreciated.

Kind regards.

Hi @ndmetherall -

If you look in the Data Needs section of the User Guide, and the sample data, you’ll see that you provide the land use raster as one input and the biophysical table as a separate input. The model will join the raster to the biophysical table using the Value field in the raster and the lucode field in the table, so you do not need to do this.

In the biophysical table, for SDR you only need to provide usle_c and usle_p, the other columns in the sample data correspond to parameters required for other hydrology models (NDR and Annual Water Yield). When we do projects using multiple models, it’s often most efficient to keep all of the models’ biophysical parameters in one table, but I’m seeing that it might not be a good idea to do that with the sample data, it’s often confusing when just getting started with InVEST.

You provide a usle_c and usle_p value for each land cover class, so your land cover map should already be classified. Then you’ll need to use your understanding of the different land cover types, and a literature search, to assign values to each class. We often start out with usle_p set to all 1s, if we don’t know if/how/where different sediment management practices are done.

~ Stacie

1 Like

Many thanks @swolny - I think I am getting closer thanks to all your help and the advice from @jdouglass and team.

I had a quick question : this is my erodibility data based on local soil datasets:

Vector version

Raster version

I have added K values to the vector attribute table and then turned it into a raster as it is my understanding that the input format should be a raster.

My question:
There are gaps in the dataset. Do I need to fill these gaps for the model to run? Should I interpolate the values or fill with a global dataset? Or can I leave them blank?

Thanks for the time and advice.

Hi @ndmetherall -

Soil data often has gaps in places where the soil is covered in bodies of water, glaciers etc (and I suppose there may be gaps due to other reasons, but these are most common.) Yes, these gaps should be filled in, since if there are NoData values in any of your input layers, you’ll get NoData values in the output maps, and sometimes the model will simply throw an error and not run properly.

Depending on what’s going on in the landscape where the data is missing, I do one of the following:

  • If it is actually exposed land, or in some other situation where you’d expect erosion to occur and impact water quality, I usually assign the K value that’s dominant around the gap. This makes the assumption that the soil in the gap area is the same as the soil surrounding it.

  • If it is something like a glacier, where (hopefully) the soil is not actually exposed, and the soil surrounding it might actually be very different, I usually assign a low value of K.

For consistency, you could use the first method to fill in all instances of missing data. The model is generally less sensitive to K, and more sensitive to USLE C, which you’ll assign by land cover type, so you’ll be able to account for the actual land cover through that parameter.

And yes, the K input needs to be a raster.

~ Stacie


Many thanks @swolny - this is really helpful. You are correct, it is mostly the first method that makes sense for this context (in Fiji). I will use the dominant K value. Is there a particular toolbox / geoprocessing tool you recommend for filling these numerous gaps?

I am also looking into the grids data in greater detail. I have downloaded the 0-5cm depth profile data for the different physical soil compositions as rasters. I wanted to get your advice about the next geoprocessing toolbox step? Should I combine the rasters into an average and then work out the K-values? Or can you recommend a suitable approach to using these soil grids rasters?

Thank you again.

Hi @ndmetherall -

What method are you using to calculate K? If it’s an equation using the different parameters that you have grids for, it should be straightforward to do math with Raster Calculator.

If you’re using the table in the User Guide, probably not so straightforward, since you’d have to translate these grids to soil texture type (using different ranges of %sand/silt/clay), then translate texture type + organic matter to K values. I’d have to think more about how to do it, but it could be a tedious set of conditional statements (my favorite Arc tool is Con, which I don’t think has an equivalent in QGIS), one for each texture type (clay, clay loam, sandy loam etc) + organic matter range.

~ Stacie

1 Like

Dear @swolny

Thanks again for all the advice. I was planning to use the following methods to work out K. Please find both text and screenshot of guidelines for calculating K that I intend to follow for this context in the South Pacific:

K was derived from the global World Inventory of Soil Emission Potentials (WISE )soil database (Batjes 2016). A special case is the K value for water bodies, for which soil maps may not indicate any soil type. A value of 0 can be used, assuming that no soil loss occurs in water bodies. We clipped the soil map to Vanuatu and joined the table (Table: HW30s_MapUnit) from the global database to obtain the soil types for the region. The dataset included spatially categorized soil types based on similarities in soil characteristics. Each category contained information on percent organic matter, the product of the primary particle size fraction, and the percent of the top six abundant soil types. For this study, we only focused on the topsoil. So, we only needed to join with the topsoil Table PRID (HW30s_ParEst) from the global database. The K-factor was derived using equation 3 (Williams 1995) and the WISE derived soil properties on a 30 by 30 arc-seconds global grid (Batjes 2016).

K=fcsand . fcl-si . forgc . fhisand (3)

Where: fcsand = a factor that gives low soil erodibility factors for soils with high coarse-sand contents and high values for soils with little sand, expressed in (ton/ha)*( or t⋅ha⋅hr⋅(MJ⋅mm⋅ha)−1, fcl-si = a factor that gives low soil erodibility factors for soils with high clay to silt ratios, forgc = a factor that reduce soil erodibility for soils with high organic carbon content, fhisand = a factor that reduces soil erodibility for soils with extremely high sand contents. The soil erodibility values (K) in this table are in US customary units, and require the 0.1317 conversion (FAO 2007). The input factors of equation 3 are calculated with equations 4-7 below (Williams 1995):

fcsand =0.2 + 0.3 . exp [-0.256 . ms . (1 - msilt/100)] (4)

fcl-si = msilt/(mc+ msilt)0.3 (5)

forgc =1–(0.0256. orgC /(orgC+exp [3.72-2.95. orgC]) (6)

fhisand=1–(0.7. (1–ms/100))/((1–ms/100)+exp(-5.51 +22.9. (1–ms/100)) (7)

Where: ms = % sand content, msilt = % silt content, mc = % clay content, orgC = % organic carbon. Soil erodibility is represented by a raster dataset, with a soil erodibility value for each cell (Fig S1.40-53).

However, when I download the WISE dataset, I get a georaster tif wise_30sec_v1.tif and a HW30s_ParEst table without any unifying field to join them together. Can you advise on how best to join the raster dataset to the table dataset from WISE?

Here is the attribute table for the raster

Here is the table
HW30s_ParEst.csv (629.4 KB)

I cannot find out how best to join them in order to do the raster calculations. Any advice on how best to work out K from these datasets would be much appreciated.

Best wishes.

It’s totally lame that they’d provide spatial data and a table with no common join field. Like the text says, they usually join on a “map unit”. It’s possible that the Value column in the raster is the map unit, but the table does not appear to have a corresponding column. The text also mentions using the column PRID in the table, but again there’s no corresponding value in the raster (which wouldn’t be the usual way to do it anyway, since the values of PRID aren’t unique but hey…) I have never used this particular soil data, so don’t really know what to advise. It sounds like you’ve done a lot of digging already, but that’s all I can think of, in case there’s a different table that contains the join field, or perhaps a field that’s hidden for some reason, I just don’t know.

~ Stacie

1 Like

Continuing the discussion from Input datasets for SDR (data sources & pre-processing):

Dear Stacie. Thanks again for the advice. This soil erodibility has required a lot of digging to be sure.

I was hoping to just get your advice about precisely which dataset you would recommend I use since the one recommended in this method does not seem to provide all the info needed to be able to spatially join the various soil properties to each pixel.

My understanding is I need a dataset that can give me % sand content, % silt content, % clay content, % organic carbon. Do you recommend just using the GRIDS soil to work out the percentage of each of the pixels compositions and using raster calculator to calculate through this formula for each pixel?

Is this all that is needed? If this is the dataset you recommend, I will use this instead of the method I had originally been referred to above. At this stage, I would just like to get the model to run and then I can do fine tuning later.

Your patience and support is much appreciated.

Kind regards.