Input datasets for SDR (data sources & pre-processing)

Looking at these equations, it seems like you should be able to download the SoilGrids rasters for %sand, %silt, %clay and %orgC for the top layer of soil, and use these equations as they are to calculate K using a GIS Raster Calculator. This way you don’t need to fuss with the WISE soil database, which was likely incorporated into Soil Grids anyway. It’s nice that you have these equations and instructions for your study area!

~ Stacie

Dear @swolny - thank you again for these inputs. I have honestly been amazed by the responsiveness of technical experts, developers and academic staff of NATCAP like yourself in this community. It is much appreciated. If there is anywhere I can leave a positive review, please let me know.

I was wondering about the landcover datasets. At the moment, I am using the Copernicus 100m landcover layer here from the Earth Engine catalogue. But I have also done some of my own supervised classifications using 10m sentinel imagery (however, this is much more time intensive). Do you foresee any issues with InVEST SDR running with either of these options?

Next, I am just hoping to confirm that my DEM and erosivity don’t need to be processed any further to be used in InVEST SDR?

For the SI LULC clip, as you have explained in past correspondence, I need to set up my biophysical table to align to the landcover classes / uses.

And for the drainage layer, is there anywhere I can see examples of this? Is it usually just a digitized set of polygons? I note this is optional so I may leave it out for the first run.

Kind regards.

Really, you can use whatever land cover map you feel represents your area well. Especially in terms of including the land use/cover types that are important for your study, and having appropriate resolution for whatever you’re using the results for. I don’t see any problem with the Copernicus layers you reference, they look like pretty standard LULC maps, if they seem useful for your purpose.

The User Guide describes the basic requirements, in terms of having the same projected coordinate system as your other spatial inputs, and providing USLE C and P parameters for each land cover type. Often, the latter is the tricky bit, and, for example, we end up giving the same C values for several (or all) forest types, because we can’t find studies that differentiate between them. And that’s ok, we just use the best data we can find.

For the DEM, if you haven’t already, I recommend reading through the Working with the DEM section of the User Guide. It describes different aspects to pay attention to, especially evaluating how well the stream network that’s generated from the DEM corresponds to reality, and provides pointers about how to do the relevant processing.

I don’t think I’ve ever used the Drainage input to SDR, so don’t have an example to share. It really is entirely optional, and was intended for cases where there’s a lot of human-engineered drainage systems, particularly in agricultural areas (or could be roads, etc). It would look rather like a stream map, where the drainage lines have a value of 1, and everything else has a value of 0. Then the model will treat it like a stream, such that when sediment moves down the landscape and reaches a drainage line, it will stop and be considered export.

~ Stacie

1 Like

Very useful @swolny - thank you!

Also I have been downloading the GRIDS data one square at a time and then appending all the rasters together for each of the components (i.e. sand 1,2,3,4 appended together to make one sand raster and replicating this for each component…).

Each component
image
All components appended

The K equation I am working towards in the raster calculator requires % silt, % clay etc…

|755px;x540px;

However, each raster gives a value of 1 to approx 400. I note that this is a unit of g / kg - therefore it would be around 400/1000 = 40%. Perhaps then the easiest way to get from the raster value to a percent value would be to divide by 10? Would you recommend this approach?

Does this mean I should be dividing each of these components by the total maximum for each raster (in the raster calculator) to get to a % component for each?

I am planning to append each component (sand (1,2,3,4 together), clay (1,2,3,4 together), silt (1,2,3,4 together), SOC (1,2,3,4 together)) and then do a raster calculation to to work out each f-factor (fsand, fclay, fsilt, fSOC) and then working out the K equation last. Do you think this is ok? Or should I just be doing everything at once in the raster calculator?

Also when appending to the raster dataset - I notice that the scale of the target raster dataset becomes the scale of the output appended dataset. Does this mean I should be appending to the dataset with the largest scale? In the example below I would append to SI_SD_3 with a raster value range of 0-420 rather than SI_SD_1 with a raster value of 0-388 so I could capture the entire range?
image

Also when working in the raster calculator is there a way to save the raster calculation to automate the process so we do not have to retype each component every time we want to do a raster calculation? I might just copy the equation into a doc so I can copy it back in and change the layers as needed. Isn’t there a way to record the algorithm for replication for automation in ArcGIS or QGIS?

Thanks again! With your support I feel like we are getting closer to completing the recipe for this model.

Best wishes.

I think you should be able to only do the conversion of g/kg to % and not divide by the total maximum. Try doing the conversion for each of the components, add them up and see if they come out to 100%.

Personally, I would also do the steps of mosaicking each set of 4 rasters, working out the f-factor, then calculating K, separately, as you suggest. I’m not quite sure if/how you could accomplish all of that in Raster Calculator itself.

Which GIS and mosaicking tool are you using? When you mosaic rasters together, you should be able to do all of them at once, and the result should include all of the values contained in all of the 4 rasters, it shouldn’t matter what order you enter them.

I haven’t done any automation with QGIS, but in Arc you can do a few things. One is to make sure your Geoprocessing Options are saving your history, then you can use the Results window to re-run a calculation. Or, you can use ModelBuilder to create a more complex workflow. Or you can script it with Python. Looking now, QGIS has a Process Modeler, which sounds similar to Model Builder, and I know you can script it. Looking at the History window shows the command that was run, so you could add it to a script, but for some reason it doesn’t allow you to just re-run it from there, which would be a useful addition.

~ Stacie

1 Like

Dear @swolny

Thank you again for this. In response to your follow up questions

  1. I am using append in ArcGIS Pro to join the rasters (is this alright?) > should I instead be using mosaic or mosaic to new raster?
  2. In terms of automation, I will follow your suggestions and test out what works best.
  3. For the calculation of percent > I will look for this in the raster calculations as you have suggested.

For the DEM - I have been reading through the user guide and following the instructions that I have summarised here:

For preparing the DEM

  1. Upload SRTM / DEM
  2. Reproject / warp to 1956 Fiji UTM CRS (as this is close to my study area)
  3. Fill sinks : DEM (I am not sure how many times I need to repeat the fill sink geoprocessing run. Is there a way to check when I have run this enough times?)

For generating streams orders and basins to compare the DEM against existing streams and watersheds
4. Flow direction : Use fill sinks > flow direction
5. Flow accumulation : Use flow direction > flow accumulation
6. (Is there an additional step to delineate outlet area needed?
7. Basins : Use flow direction > basins (8)

To generate basins / watersheds I can use:
8. Raster to polygon : Flowdir > Watersheds OR the DelineateIt module.

Would appreciate your thoughts on these assumptions. Is there anything I am missing here for the DEM? Do the watersheds vector input to the model need to be delineated from the same DEM or can we use a pre-existing vector? I am working on this now and will share pictures later.

follow up:

Outputs of Erodibility K raster calculations:

My erodibility K value output looks like this


:
With values from -0.08 to 0.572 - does this seem off to you?

RUNNING SDR MODEL

I have managed to input a whole range of files:


However, I am stuck on a last error for the watersheds vector:

“File could not be opened as a GDAL vector”

Do you know how best I can address this issue?

  1. I have reprojected all inputs to a UTM based on the advice of this website: What UTM Zone am I in ? - Interactive Web Map
  2. I have clipped all inputs to the same size for the bounding boxes to be compatible (was this necessary?)

Thanks again. We are getting close now I hope.

Best wishes.

Hi @ndmetherall ,

I believe Stacie is unavailable this week, so I will attempt to answer some of your questions now.

I recommend using the Mosaic to New Raster tool in ArcGIS. You should be able to include as many tiles as you need in a single use of the tool. When you are verifying the stream networks, check that stream segments are contiguous. If not, and streams are terminating where they shouldn’t, you may need to fill the DEM again. It is definitely advisable that the watershed input vector be delineated from the same DEM you are using for the input raster.

The error message you shared in the last screenshot is because the watersheds vector you have populated is not a vector file, it is an XML file. Notice the file extension ends in .xml. Instead, input the watershed shapefile, which has the extension .shp (without .xml after it).

It’s a good idea to clip the inputs so they have the same extents as one another, but as Stacie advised, be sure the mask has ample buffer so that the edges of coarse rasters are not clipped out.

Stacie should return next week and can provide more details then if needed.

Be well,
Jesse

3 Likes

Very useful. Thank you, @jesseG - I have followed your advice to use the mosaic tool.

And to use the .shp file for the watersheds vector. I am having a strange error with the watersheds where when I try to reproject them on the same UTM projection as the other layers they tend to disappear and when I try to run the model with this shapefile, I get the error “bounding boxes do not intersect”.

Any advice about how I can rectify this will be much appreciated.

Thank you.

If you’re getting a bounding box error (and even more so, if the watersheds are disappearing from your GIS view) then take a close look at all of your spatial inputs and make sure that they have exactly the same coordinate system. Which tool did you use for reprojecting?

A couple of notes on the K/erodibility values. They should not be negative, so I’d recommend setting all negative values to 0, and assessing whether that makes sense in those areas (perhaps they’re solid rock or something.) And the values look more like they’re in US units, not metric, so check your data to verify, and convert to metric using the 0.1317 value mentioned in the User Guide.

~ Stacie

2 Likes

Thanks for this advice @swolny

To answer your questions:

  1. I have been using define projection to change the projections to CRS UTM WGS_1984_UTM_Zone_57S - my understanding is that this is a linear units.
  2. I have checked all my datasets as you have suggested and they all have the following metadata:

|535px;x540px;

  1. I thought everything was metric - based on the linear unit being set to "Meters (1.0). However, I notice that the angular unit is in degrees? I thought that it would have to be in degrees since it is an angle. Are there any other additional steps needed to confirm everything is metric?

  2. I have used a mixture of mosaic and append to join the erodibility factor rasters? Is this a problem?

  3. I will set all negative values to 0 as you have prescribed. Do I need to multiply the resulting K value dataset by 0.1317 or does this need to be done for each of the f-factors or inputs into the K equation individually?

Thank you once again.

If you’re working in ArcGIS, use Data Management Tools > Projections and Tranformations > Raster > Project Raster, not Define Projection. The Project Raster tool translates the raster’s location from whatever coordinate system it starts out in, to the new coordinate system, which is what you want.

Define Projection does not translate location, it just changes the metadata to read that it has a new coordinate system. So even though the data is actually in, say, geographic system WGS84, you’re now telling it that it’s in WGS_1984_UTM_Zone_57S, even though the underlying data isn’t. This causes all sorts of problems. So go back to the original data and use Project Raster.

Yes, there are always linear units (which should be in meters for InVEST) and angular units in degrees (which is fine, as you say, it’s an angle after all).

I’ve never used the Append tool, so don’t know how it works. If I need to stitch together multiple rasters to cover an area, I’ve always used Mosaic to New Raster. Append might work fine, I just don’t have experience with it. My advice is to look closely at the joined edges and make sure they look ok, and check for any changes to patterns or major changes to values before and after appending. If things look ok, seems fine to me.

Multiply the resulting K value dataset by 0.1317.

~ Stacie

2 Likes

This is very useful, @swolny - am working on this now.

I am curious as to what tool you would recommend for the vector component (5. watersheds) > when I search for “project raster” in geoprocessing it does come up. But when I search for “project vector” the top search results is just “Project” - is this ok?

I will also look into the edges of the raster as suggested and finally check the 0.1317 multiplication and normalisation of negative values to 0 as instructed.

Thank you.

Yes, in ArcGIS the vector version is just called Project.

~ Stacie

Great. Thank you. I have taken the steps outlined and have made some further progress thanks to your help @swolny

When I have tried to run the SDR model again, I get the following error message:

|577px;x444px;

In my original landcover raster, I seem to have the correct values - corresponding to the biophysical table I have made in csv.

Original raster attribute table

Biophysical table (csv)

However, when I project raster to UTM and linear units, the attribute table is no longer accessible - greyed out. I am not sure if the biophysical table has been altered or lost when projecting raster to the UTM linear units projection. Do I need to use the geoprocessing tools ‘build a new raster attribute table’ or ‘join table’?

When I have tried to build a new table for the LULC raster input, I get the following error:

I have been using the following dataset for my landcover input: Copernicus Global Land Cover Layers: CGLS-LC100 collection 3

I will try to attach the files here:
Biophysical table: SI_joined_table.csv (339 Bytes)

I have tried to revise the biophysical table to address the error but not sure how to join this to the raster since the values in the original don’t seem to have a corresponding ‘count’ column
SI_joined_table2.csv (434 Bytes)

Original raster file for LULC (with attribute table)
SI_LULC.tif (30.0 KB)

Project raster to UTM and clipped to same bounding box version (attribute table seems to have been lost here)
Solomons_LULC_2.zip (12.7 KB)

Any advice about how best to address this would be much appreciated. Thanks again.

Kind regards.

The Values in your original LULC raster do match the lucodes in your biophysical table, except that the LULC has an additional value of 0 (which will need to be added to the biophysical table if it represents a land cover type, but I suspect it actually should be NoData). None of the missing values listed in the SDR error are actually in the LULC rasters you uploaded, so I have no idea why that error is happening. Did you upload the same files that you used to generate the SDR error?

I see what you mean about the re-projected layer not being able to have a raster attribute table created. That’s also puzzling, since it is a single-band integer. The one other obvious difference between it and the original LULC is that the re-projected raster is signed, which technically shouldn’t matter as long as it’s integer. I tried reprojecting SI_LULC.tif to the same UTM coordinate system as Solomons_LULC_2, and had no trouble creating an attribute table for it, so I’m not sure what’s different with how you did it. My projected raster is unsigned integer though.

For doing any joining between biophysical table and LULC, use the Value/lucode fields, not Count.

~ Stacie

2 Likes

Dear @swolny

Thanks again for these inputs. I have tried to add more details to the dataset and restart the LULC layer from the beginning.

Sometimes, when I try to export the raster as a tiff, I only get .aux and .ovr files. I have been struggling to get the file types that I need i.e. tiff and or .adf as was used previously. I have managed to redo this without losing the attribute table. I have done a join to connect the attribute table to the values needed.

See revised raster file: LULC4.zip (29.1 KB)
See refomed biophysical table: SI_joined_table_4.csv (337 Bytes)

I am not sure how to best proceed as I now have the issue that I am unable to export the file in the right format to input into the InVEST SDR model software.

I had another go today at trying to run the model. I managed to get the LULC and therefore managed to get the SDR model to run again for a second attempt. However, my resulting output files generated still do not visualise or display when loaded into ArcGIS Pro. Can you advise about this? I load in all the raster output files and they all have the same raster scale. Should this be the result? Is there a way I can better visualise it?
This is what it looks like when I zoom to the SDR output layers :

The SDR output layers seem to be to the correct UTM projection - same as the model inputs:

Workspace results:
Outputs:
outputs.zip (1.4 MB)

Uploading: aligned_dem.tif…
Uploading: aligned_erodibility.tif…
Uploading: aligned_erosivity.tif…
aligned_lulc.tif (2.4 MB)
Uploading: cp.tif…Uploading: d_dn_bare_soil.tif…
Uploading: d_up.tif…
Uploading: d_up_bare_soil.tif…Uploading: f.tif…
Uploading: flow_accumulation.tif…Uploading: ic.tif…
Uploading: ic_bare_soil.tif…
Uploading: ls.tif…Uploading: sdr_bare_soil.tif…
Uploading: sdr_factor.tif…
Uploading: slope.tif…
Uploading: slope_threshold.tif…
Uploading: s_accumulation.tif…
Uploading: s_bar.tif…
Uploading: s_inverse.tif…
Uploading: w.tif…
Uploading: weighted_avg_aspect.tif…
Uploading: ws_inverse.tif…
Uploading: w_accumulation.tif…Uploading: w_threshold.tif…
InVEST-Sediment-Delivery-Ratio-Model-(SDR)-log-2021-07-10–17_54_35.txt (931.2 KB)
InVEST-Sediment-Delivery-Ratio-Model-(SDR)-log-2021-07-12–00_10_59.txt (475.8 KB)
InVEST-Sediment-Delivery-Ratio-Model-(SDR)-log-2021-07-16–19_10_02.txt (916.4 KB)
Uploading: rkls.tif…Uploading: sed_export.tif…
Uploading: sed_retention.tif…
Uploading: sed_retention_index.tif…
stream.tif (2.5 MB)
Uploading: usle.tif…

As shown in the results the usle, sed_retention and sed_export all have the same range of -3.4 e^38 to 3.4 e^38. Should there be different ranges and non-negative ranges?

I am still trying to locate the source of these issues. My suspicion is that it is likely from the erodibility layer here:
3_k_f.zip (242.3 KB)

As shown here I have made the negative values = 0 and have multiplied all values by 0.1317 as you have advised. The results range from 0 to 0.075. I am not sure if this is acceptable? The ocean is given a 0 value and the land seems to be closer to the upper range. Does this look ok?

Any advice as to how to complete this model properly will be much appreciated. Nearly there I hope…

Looking forward to discussing.

Kind regards.

Hey @ndmetherall -

I can’t open the ArcGIS Pro layer package that’s in the LULC4.zip file, I don’t generally use Pro and my license seems to not be working. If you still want me to check out the LULC, try just zipping the raster, not a layer package - thanks.

What tool or method are you using to export your raster? What format is it in to start with? I’ve never had a problem like that creating TIFFs, they’re generally the easiest and best-supported format, although you can use other formats (like ESRI GRID, which I guess you’ve tried since you mentioned .adf files). But, it looks like you might have figured that out if you’re running the model now.

You’re right that those results are not correct. Can you post your log file? Do you see any warnings or errors in it that might indicate the issue?

One thing I recommend is going back through the Intermediate files to see where the model started producing bad values. By comparing those with the sequence in the User Guide, you might get an idea of which inputs or processes to troubleshoot.

The values of your K raster seem mostly fine, and looking at it in Arc it looks ok too, although kinda odd to have a value of -0. Also, we typically don’t assign values to ocean, we just assign them to land for this model and mask out the ocean, but I don’t think that should cause problems, since presumably your other input layers have NoData in the ocean, so you should end up with NoData there in the results.

~ Stacie

Dear @swolny

Thank you for getting back so quick again. Really appreciate it. I will try to address your points of advice:

  1. Have tried to include the raster here but I can only export the .adf:
    w001001.zip (235.0 KB)

image

  1. I am using the export raster tool for this part. As shown in the screenshots below, I can use the ‘TIFF’ option in the ‘Output Format’ prompt. However, the output that results is still the .adf as shown above.


image

  1. Here is the log file - thank you.
    InVEST-Sediment-Delivery-Ratio-Model-(SDR)-log-2021-07-16–19_10_02.txt (916.4 KB)

When I read through the log file I can see some errors at the end:
2021-07-16 19:19:36,709 geoprocessing.raster_calculator(480) INFO Waiting for raster stats worker result.
2021-07-16 19:19:36,710 threading.run(870) WARNING No valid pixels were received, sending None.
2021-07-16 19:19:37,826 geoprocessing.zonal_statistics(1184) DEBUG <osgeo.gdal.Dataset; proxy of <Swig Object of type ‘GDALDatasetShadow *’ at 0x0000029D003D6930> >
2021-07-16 19:19:37,833 geoprocessing.zonal_statistics(1224) ERROR aggregate vector C:\Users\Nicholas Metherall\Documents\sdr_workspace\watershed_results_sdr.shp does not intersect with the raster (‘C:\Users\Nicholas Metherall\Documents\sdr_workspace\usle.tif’, 1)
2021-07-16 19:19:37,836 geoprocessing.zonal_statistics(1184) DEBUG <osgeo.gdal.Dataset; proxy of <Swig Object of type ‘GDALDatasetShadow *’ at 0x0000029D003D6480> >
2021-07-16 19:19:37,843 geoprocessing.zonal_statistics(1224) ERROR aggregate vector C:\Users\Nicholas Metherall\Documents\sdr_workspace\watershed_results_sdr.shp does not intersect with the raster (‘C:\Users\Nicholas Metherall\Documents\sdr_workspace\sed_export.tif’, 1)
2021-07-16 19:19:37,845 geoprocessing.zonal_statistics(1184) DEBUG <osgeo.gdal.Dataset; proxy of <Swig Object of type ‘GDALDatasetShadow *’ at 0x0000029D003D6780> >
2021-07-16 19:19:37,853 geoprocessing.zonal_statistics(1224) ERROR aggregate vector C:\Users\Nicholas Metherall\Documents\sdr_workspace\watershed_results_sdr.shp does not intersect with the raster (‘C:\Users\Nicholas Metherall\Documents\sdr_workspace\sed_retention.tif’, 1)
2021-07-16 19:19:37,855 geoprocessing.zonal_statistics(1184) DEBUG <osgeo.gdal.Dataset; proxy of <Swig Object of type ‘GDALDatasetShadow *’ at 0x0000029D003D6570> >
2021-07-16 19:19:37,862 geoprocessing.zonal_statistics(1224) ERROR aggregate vector C:\Users\Nicholas Metherall\Documents\sdr_workspace\watershed_results_sdr.shp does not intersect with the raster (‘C:\Users\Nicholas Metherall\Documents\sdr_workspace\sed_deposition.tif’, 1)
2021-07-16 19:19:37,870 model._logged_target(1655) INFO Execution finished
2021-07-16 19:19:37,875 utils.prepare_workspace(129) INFO Elapsed time: 9m 35.83000000000004s

These seem to be the only references to ‘error’ I can find.

  1. I will go back through the intermediate files.

  2. Glad that K seems to be ok but I will also check the intermediate files as you have suggested.

Many thanks for your patience and support.

Kind regards.

Dear @swolny in relation to the intermediate outputs here is what I have found when loading them up and comparing to the user guide as advised:

  1. The values for most of the rasters are the same range as the final outputs discussed above -3.4 e^38 to 3.4 e^38

  2. One of the main exceptions are some of the hydrological outputs including the flow accumulation layer as shown here. When I try to visualise the layers most do not show up but the flow accumulation does visualise a large black rectangle:

  3. watersheds and other layers do not visualise when brought into the map.

image

If you have any further advice or info on what might be the problem, I will try to revise this. I am concerned that the results may mean I have to just start again from scratch?

Looking forward to discussing.

Best wishes.

Hm. It appears that the DEM raster is not being processed correctly, which could be your fundamental issue. It’s hard to tell why this would be happening. It could be because of the raster format, although the model usually throws an error if it has problems with that.

Is there a particular reason that you’re using ESRI GRID format for your rasters? They should work fine, but are complicated with all of the related files and file naming limitations. It’s been a while since I’ve used them, but try providing “hdr.adf” as the model input (instead of w001001.adf) and see if that changes anything. TIFF is just easier, but you seem to be having problems exporting a TIFF, which is not something I’ve ever experienced.

One random thing, which is probably not an issue here, but be aware that models sometimes have problems when writing to cloud folders like Dropbox, since there’s latency, syncing etc that can cause odd things to happen. We generally recommend keeping input and output on a local drive if possible.

~ Stacie

2 Likes