SDR crashing after completing a task

I have been attempting to run the SDR model, and for some reason it keeps crashing after completing a task at 100%. It then starts the task over and crashes again. InVEST-Sediment-Delivery-Ratio-Model-(SDR)-log-2019-10-09–16_09_39.txt (2.5 KB)

Thanks for posting. It’s hard to say what the problem is exactly. If you would like to send a link to your input data I will try to reproduce the problem.

Another option is to go to File > Settings, change the Logfile logging threshold and Taskgraph logging threshold to DEBUG, and then share your logfile again. I can’t promise that will reveal the problem, though.
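If you happen to be running the model from a Python script rather than the launcher, a rough equivalent is to turn up the verbosity with the standard library’s logging module before calling the model (a minimal sketch; the filename is just a placeholder):

```python
import logging

# Capture DEBUG-level messages from every logger (including taskgraph and
# pygeoprocessing) in a logfile so the full task history is recorded.
logging.basicConfig(
    filename='sdr_debug_log.txt',
    level=logging.DEBUG,
    format='%(asctime)s %(name)s %(levelname)s %(message)s')
```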

Hi Dave,

Here is the logfile with the options you recommended. InVEST-Sediment-Delivery-Ratio-Model-(SDR)-log-2019-10-18–14_18_25.txt (13.7 KB)

I will send you a link to the data if this logfile doesn’t reveal the problem.

Thanks!

Hmm, I don’t see any red flags. I think the reason the first logfile you shared was so brief is because many of the tasks in the model were completed during previous runs and their intermediate data are being re-used to avoid re-computation.

At the bottom of the more verbose log we see the flow accumulation task begin and report INFO 0% complete, and then that’s all. How long have you let this run? I don’t have a sense for how long it should take, but maybe @swolny does.

What is the extent (overall dimensions) of your input data? And the grid cell size of the DEM?

Hi Dave,

The grid cell size of the DEM is 30m; it was initially 10m, but that wasn’t working well either. The extent of the data is approximately a 220 by 100 mile watershed boundary area. I would be happy to send over the data via email. I initially ran it at 10m resolution and it completed successfully in about 8 hours total, but my output sediment values were blank. Ever since then, whether at 10m or 30m, it has only taken about 5-20 minutes for the crash to appear.

Hi there,

It sounds like there may be more than one issue at play here. So far the logs you have shared don’t contain any errors. Are you seeing an error message?

You may send me a link to your input data, here or in a private message.

Hi Dave,

I am not seeing any error messages when it quits. Here is the link to the files I was using https://tnc.box.com/s/jnnfxy8nr32jv2gnt6yhm23gscowqqjd

BS_30mFill1.tif = 30m DEM of the Big Sioux watershed (I tried using this to see if it would work any better)

Fill_BigSioux10mdem.tif = 10m DEM of the Big Sioux

K_factor.tif = K factor raster (derived from WSS K values, which I believe are already in metric?)

R_factor.tif = R factor raster (converted to metric by multiplying by 17.02; see the conversion sketch below)

Landuse.tif = landcover raster derived from CDL

SDR.csv = landcover reclass biophysical table

I have been using the default input values for the rest of the model:

Threshold flow = 1000

Borselli k parameter = 2

Borselli IC0 parameter = 0.5

Max SDR value = 0.8
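For reference, the R factor unit conversion mentioned above can be scripted in one pass over the raster. This is only a minimal sketch, assuming a hypothetical un-converted input path and GDAL’s Python bindings plus numpy; it is not the exact workflow I used:

```python
import numpy as np
from osgeo import gdal

# Hypothetical paths; only the already-converted R_factor.tif is in the box link.
src_path = 'R_factor_us_units.tif'
dst_path = 'R_factor.tif'

src = gdal.Open(src_path)
band = src.GetRasterBand(1)
nodata = band.GetNoDataValue()
array = band.ReadAsArray().astype(np.float64)

# Multiply by ~17.02 to convert R from US customary units to SI
# (MJ*mm/(ha*hr*yr)), leaving nodata pixels untouched.
valid = np.ones(array.shape, dtype=bool) if nodata is None else array != nodata
array[valid] *= 17.02

# Copy georeferencing from the source; assumes the source band is floating point.
dst = gdal.GetDriverByName('GTiff').CreateCopy(dst_path, src)
dst.GetRasterBand(1).WriteArray(array)
dst.FlushCache()
dst = None
src = None
```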

I appreciate your help!

-Sam

Thanks for sharing your data. It looks like the crash happens because you ran out of memory, at least that’s what happened to me on a laptop with 8GB. The flow accumulation step of SDR consumed all available memory.

@rich that doesn’t seem reasonable, does it? I tested pygeoprocessing.routing.flow_accumulation_mfd on the latest develop as well.

Here’s the flow direction raster (120MB)
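In case anyone wants to reproduce the test in isolation, here is a minimal sketch of what I ran, assuming pygeoprocessing’s (path, band) tuple convention; the file paths are placeholders for wherever the SDR intermediate outputs ended up:

```python
import logging

import pygeoprocessing.routing

logging.basicConfig(level=logging.DEBUG)

# Placeholder paths: the MFD flow direction raster comes from the SDR
# workspace's intermediate outputs; the target is a new raster.
flow_dir_mfd_path = 'intermediate_outputs/flow_direction.tif'
flow_accum_path = 'flow_accumulation_test.tif'

# pygeoprocessing raster arguments are (path, band_index) tuples.
pygeoprocessing.routing.flow_accumulation_mfd(
    (flow_dir_mfd_path, 1), flow_accum_path)
```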

Hi @sfix, everything looks okay on my end too.

Sometimes we get a case where the landcover or the DEM is in WGS84 and the reprojection creates a huge raster that crashes everything, but that doesn’t seem to be the case here. One thing I note, though: in your log it looks as though you are using pit_filled_dem_BS2.tif and landuse1111222.tif for the DEM/landcover on the first run and 30mFill_BS_DEM.tif and landuse1111222.tif on the last run, but the box link you sent only has “landuse.tif” and “BS_30mFill1.tif” in it.

Can you take a look at those rasters and ensure everything is projected and/or box us pit_filled_dem_BS2.tif, 30mFill_BS_DEM.tif, and landuse1111222.tif so we can see if there’s anything unusual there?
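If it helps, here is a minimal sketch for checking the projection, dimensions, and pixel size of each raster with GDAL’s Python bindings (the file names are just the ones from the box link; adjust the paths as needed):

```python
from osgeo import gdal, osr

rasters = ['BS_30mFill1.tif', 'Fill_BigSioux10mdem.tif',
           'K_factor.tif', 'R_factor.tif', 'Landuse.tif']

for path in rasters:
    ds = gdal.Open(path)
    srs = osr.SpatialReference(wkt=ds.GetProjection())
    geotransform = ds.GetGeoTransform()
    print(path)
    # Projected rasters report a PROJCS name; unprojected ones fall back to GEOGCS.
    print('  projection:', srs.GetAttrValue('PROJCS') or srs.GetAttrValue('GEOGCS'))
    print('  size: %d x %d pixels' % (ds.RasterXSize, ds.RasterYSize))
    print('  pixel size: %.2f x %.2f' % (geotransform[1], abs(geotransform[5])))
    ds = None
```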

Hi Rich and Dave,

I just renamed those files: landuse1111222.tif became landuse.tif, and pit_filled_demBS2.tif became BS_30mFill1.tif.
I ran it again yesterday just to check, and it crashed again. Here is that log file: InVEST-Sediment-Delivery-Ratio-Model-(SDR)-log-2019-10-22–12_15_17.txt (24.1 KB)

I also ran it with the same files on a computer with 32 GB of RAM, and I still had the crash. I also just checked the projections, and everything is in NAD 1983 UTM Zone 15N.

@rich, to clarify, the story here is that pygeoprocessing.routing.flow_accumulation_mfd consumes over 32GB of memory on a 120MB flow direction input (my gdrive link above).

I ran SDR on these inputs and none of the intermediate files seem suspect, though I don’t have much experience with them.

So is the next step here to run it on a computer with more than 32 GB of RAM?

Oops, sorry to both of you; I hadn’t followed the issue closely. The next step is for me to fix the bug that’s causing an out-of-memory error in PyGeoprocessing; that should never happen. I’m working on it now.

I am picking up where sfix left off on this Big Sioux sediment model run. I’m running into a similar problem: the model just exits suddenly at 36.9% of the flow accumulation task.
(1) I succeeded in developing a flow accumulation model outside of InVEST - can I just substitute that into the code and pick up where the model is crashing?
(2) Can I break the model up into 4 smaller watersheds? I can see how that would work for the headwater HUC8s but not the middle and lower ones.
FYI: I am running InVEST 3.6 (3.6.0.post204+hb4caaf03e97b) because I ran into a problem with NDR in 3.7, though I did get it to run with 3.6.0.post204+hb4caaf03e97b based on another post in these forums.

Update: I have now successfully run the model on the two headwater HUC8s. I’m trying the middle and lower watershed HUC8s now; not sure how it will handle the flow accumulation grid without the upstream watersheds.

Thanks for your patience as we address this. We now have a development build that should address the memory error discussed above. https://ci.appveyor.com/api/buildjobs/231gw9sa8bt3knxa/artifacts/dist%2FInVEST_3.7.0.post656%2Bh79f5d28bd594_x86_Setup.exe

(1) This would require a large amount of Python scripting, so it is definitely not the first thing I would try.

(2) I would guess that one of those smaller watersheds would still encounter the bug that raised the memory error.
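If you do go the per-watershed route, the clipping itself is straightforward to script; a minimal sketch, assuming GDAL’s Python Warp API and a hypothetical HUC8 boundary shapefile (this is not something the InVEST interface does for you):

```python
from osgeo import gdal

# Hypothetical file names: one HUC8 boundary polygon and the full-basin inputs.
cutline = 'huc8_upper.shp'
inputs = {
    'BS_30mFill1.tif': 'BS_30mFill1_huc8.tif',
    'K_factor.tif': 'K_factor_huc8.tif',
    'R_factor.tif': 'R_factor_huc8.tif',
    'Landuse.tif': 'Landuse_huc8.tif',
}

for src, dst in inputs.items():
    # Crop each raster to the cutline polygon and mask pixels outside it.
    gdal.Warp(dst, src, cutlineDSName=cutline,
              cropToCutline=True, dstNodata=-9999)
```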

Let us know if you have success with the new development build!

Well, for whatever reason I did get this approach to work: I completed the model run for the whole basin by running separate models for each of the 4 HUC8s, and the results for the lower and middle watersheds look good. I’m going to run with that for now.