I’m planning to batch-process the SDR model, so I’m trying to run it from a Python script. The script runs but fails partway through, apparently while computing the zonal statistics of the raster files: the output watershed shapefile (watershed_results_sdr) doesn’t have the fields usle_tot, sed_export, sed_retent and sed_dep.
Here is the log file: log_error.txt (19.9 KB)
The model works fine using the GUI, all the outputs are created correctly.
I’m using InVEST 3.9.0, Python 3.8.6, and Windows 10, with the sample data from the Gura watershed.
I installed the package versions listed in the natcap.invest requirements (invest/requirements.txt at main · natcap/invest · GitHub). However, while I was updating some of the packages, the system told me that natcap.invest actually needed a different version.
Can someone please help me to figure this out?
Thank you,
Ligia
You’re exactly right that this error is happening during a zonal_statistics function call, as part of the final computational step of the model. This zonal_statistics call is made just after the watershed_results_sdr.shp vector is created, but before any fields are added, which is why the vector exists but has no fields.
Looking at your logfile, the crash appears to be this error:
ValueError: GEOSGeom_createLinearRing_r returned a NULL pointer
I can’t say for certain what’s causing this specific error in this case, but I can think of a few possible causes.
First, could you check that your watersheds vector does have geometries and that the geometries are valid? Occasionally invalid geometries (geometries that self-intersect or are not closed rings) can cause unexpected behavior.
If that doesn’t do the trick, could you tell me where you installed shapely and gdal from, and which versions of each are installed in your Python installation? In particular, if you grabbed a GDAL wheel from Christoph Gohlke’s page, could you make sure that your shapely install is also from Gohlke’s page? These two packages can sometimes be built against conflicting GEOS versions, which can cause issues.
The InVEST build process for 3.9.0 uses conda and the packages on conda-forge, so if all else fails, you could always try setting up a conda environment and installing the needed requirements there. It shouldn’t be necessary, but it’s nice to at least have a fallback.
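If it helps, here is a minimal sketch for listing the installed versions using only the standard library (importlib.metadata is available from Python 3.8). The helper name installed_version is just for illustration, and the distribution names are my assumption of how these packages are registered with pip. Note that this reports versions only; it can’t tell you whether a package came from PyPI or from a Gohlke wheel.

```python
from importlib import metadata  # standard library in Python 3.8+


def installed_version(distribution_name):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        return None


# Distribution names are assumptions; adjust them to match `pip list`.
for name in ("GDAL", "Shapely", "Rtree", "natcap.invest"):
    print(name, installed_version(name) or "not installed")
```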
Thanks for the reply. The vector geometries are valid.
I guess the error may be due to the conflict you described. Shapely was installed via pip and GDAL was installed using the wheel. Package versions:
GDAL==3.1.4
Shapely==1.7.1
What do you suggest? Should I uninstall Shapely and reinstall it using the wheel from Gohlke’s page?
Yes, I think uninstalling shapely and re-installing it via the wheel from Gohlke could very well solve the issue you’re seeing. When setting up an InVEST development environment with pip, we usually grab gdal, shapely, and rtree at the very least from Gohlke wheels.
Like James mentioned, we’ve transitioned over to Conda for a lot of our development needs. If you have miniconda installed and are familiar with conda, the Makefile in the InVEST repository has a nice command: make env, which will set up a Conda development environment. Let us know if you’re interested in trying that as well.
I uninstalled Shapely and reinstalled it via the wheel, and the script works now. I ran it twice, changing only ‘lulc_path’ and ‘results suffix’, and the outputs look ok. However, I noticed a lot of messages in the log like “Task.is_precalculated(1379) INFO not precalculated (fill pits (1)), Task hash exists, but there are these mismatches: Recorded path not in target path list c:\users\ligia\documents\saida\intermediate_outputs\pit_filled_dem_test1.tif” log2.txt (42.6 KB)
I read on the forum that this indicates the task was done before but with different inputs, so it needs to be recalculated. Am I correct?
Then two questions came up. Could you please clarify for me? 1) Even if only one input changes in each run, does the model recalculate the files that won’t change (e.g. LS factor)?
2) Does the model have to write all the intermediate files, or is it possible to ask it to write only the outputs?
Great to hear that reinstalling shapely via the Gohlke wheel did the trick!
For the logging, you can silence a lot of that extra Task.is_precalculated ... output if you like (there is usually a whole lot of it) by adjusting the logging configuration at the top of your script.
Calling logging.basicConfig sets up a logging handler to your console at a slightly higher level (INFO), which limits the taskgraph logging, and setting the taskgraph logger’s level to ERROR further limits the taskgraph messages you’ll see.
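A minimal sketch of that setup follows. It assumes taskgraph’s loggers live under the name 'taskgraph' (so that child loggers such as the one behind Task.is_precalculated inherit the level); if the messages still show up, check the logger name in your log output.

```python
import logging

# Console handler at INFO, so DEBUG-level messages are dropped.
logging.basicConfig(level=logging.INFO)

# Assumption: taskgraph logs under the 'taskgraph' logger hierarchy.
# Raising it to ERROR hides its INFO and WARNING messages.
logging.getLogger('taskgraph').setLevel(logging.ERROR)
```

Put this before the call that runs the model so the levels are in place when the first task is logged.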
Yes, that is correct! Changing the suffix changes the filepath, and so (in most models) the model will recalculate the affected files.
The model will recalculate only what it needs to in order to produce the outputs. So for SDR, if you change the LULC but not the DEM, then the LS factor will not be recalculated. Only those outputs that depend on the LULC will be recalculated.
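To make that rule concrete, here is a toy sketch of the check suggested by the log message (“Task hash exists ... Recorded path not in target path list”). This is not taskgraph’s actual code; the function names and the in-memory dictionary are hypothetical, and the real library tracks more than this. It only illustrates that a task is skipped when both its inputs and its target path match an earlier run.

```python
import hashlib
import json

# Toy stand-in for a task cache: maps a hash of a task's inputs to the
# target paths already written for those inputs.
_completed = {}


def _input_hash(task_name, inputs):
    """Hash the task name and its inputs into a stable key."""
    payload = json.dumps([task_name, inputs], sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def record(task_name, inputs, target_path):
    """Remember that this task produced target_path from these inputs."""
    key = _input_hash(task_name, inputs)
    _completed.setdefault(key, []).append(target_path)


def needs_recalculation(task_name, inputs, target_path):
    """Return True unless the same inputs already produced target_path."""
    recorded_targets = _completed.get(_input_hash(task_name, inputs))
    if recorded_targets is None:
        return True  # these inputs were never seen
    return target_path not in recorded_targets  # seen, but other target


record("fill pits", {"dem": "dem.tif"}, "pit_filled_dem_test1.tif")
same_suffix = needs_recalculation(
    "fill pits", {"dem": "dem.tif"}, "pit_filled_dem_test1.tif")
new_suffix = needs_recalculation(
    "fill pits", {"dem": "dem.tif"}, "pit_filled_dem_test2.tif")
```

In this toy model, same_suffix is False (nothing to redo), while new_suffix is True even though the DEM is unchanged, because the target filename differs.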
InVEST does not currently have a way to avoid writing these intermediate files. In general, the intermediate files written can be useful in certain circumstances, and they also help us to ensure that the model makes efficient use of memory.
I was just worried about the memory required to store all those files, as all intermediate files for each run are being written, even those that do not change (e.g., LS factor). I guess this happens due to the change in the suffix, right?