Habitat Risk: how to perform QAQC on summary statistics

What is the issue or question you have?

I am having trouble obtaining exact numbers from summary statistics (output folder as well as max numbers in intermediate folder) when QAQC data results manually on a cell-by-cell basis and obtaining summarizing results manually in R. Expected values by habitat and stressor or habitat and all stressors do not equal final results in output folder (Summary statistics).

What do you expect to happen?

I am hoping to be able to QAQC our own results but cannot find the issue.

What have you tried so far?

Current and previous HRA versions 3.1.4.1, 3.1.4.2, 3.1.4.3, Different resolutions, different projections, using mean vs row sums by cell. Accounting for spatial overlap for habitat/stressors. Running as a single vs multiple regions (Dissolved Barkley Sound and Clay Sound), assigning each region by cell and removing those outside Barkley Sound and Clay Sound, Using all cells regardless of assigned region. Habitat attributes are accounted for in Consequence, Max values for E and C values appear OK yet means and mins are off.

Upload the logfile using the :outbox_tray: button

No issue running data in HRA, just issue QAQC, willing to share my R script and outputs using InVEST Sample Data which holds all calculations derived from the InVEST User Guide.

Hi @Caelin , thanks for reporting the issue. It’s always possible there is a bug, but it’s also possible there are explainable differences between what the model does and what your R script does.

If you would like to see in detail what the model is doing, and how that might differ from your approach. You could review the source code:

and some tests that we have:

If you have a concise example of your approach, we can take a look at that also.

Thanks,

Hi Dave,
Thanks for your response, I do have a concise example and have spent a lot of time looking at the python code hra.py and replicating it in arc gis and in R but am still not getting the same values.

Is there a better platform to share the R code and concrete example outputs using the InVEST sample data? I couldn’t attach it previously because it is too large.

For example, I seem to get the correct E _Max value for soft bottom rec fishing of 2.63364 and a min of 0 but I get a mean of 0.138304811571419 but it should equal:1.2189637422561646

I am running a resolution of 1000, euclidean distance, no decay
In the code shared I tried with a max criteria of 2 and 3. I also expanded the sample data AOI to the extent of the eelgrass connectivity.

Because we are running for the entire Atlantic Canada, I need to ensure our results are accurate.

Is there currently a QAQC method in place for this software that you use as developers? Perhaps there is a better method to QAQC?? I seem to have no issue running any data including my own, just ensuring its correct!

SampleData_E_C_Calcualted_Summ_by_hab_stressors_account_overlap_sub_regions_V4.csv (3.1 KB)
SUMMARY_STATISTICS.csv (6.1 KB)
3. Calcualte_ExpConseq_sub_regions_V4.txt (9.4 KB)

I ran the sample data through version 3.14.3, using the parameters we include in the sample data .json file. If I calculate the statistics from intermediate/E_softbottom_rec_fishing.tif they match what appears in the SUMMARY_STATISTICS.csv.

One thing to be careful about is handling nodata. There are lots of nodata pixels in the summary region, and the model ignores them when calculating the mean. In other words, the mean is the sum of all non-nodata pixel values divided by the number of non-nodata pixels.

Hi Dave, Thanks for the response I have been off sick but will try this and let you know how I get along!

Hi Dave,
I have spent a few days checking my code and ensuring the no data is handled correctly but still cannot get the same means. I had tried the no data handling in the past as well. Would you be willing to share 1 example with for example soft bottom and rec fishing? I could also share my updated methods.

I have tried manually outside of R as well in ArcGIS following the hra.py steps in both, it is strange only the means are still not quite right on my end.

Would you be willing to share 1 example with for example soft bottom and rec fishing?

Here’s an example using the sample data and invest version 3.14.3:

library(raster)
library(exactextractr)
library(sf)

filename <- "E_softbottom_rec_fishing.tif"
workspace <- "C:/Users/dmf/projects/forum/hra/sample_3_14_3"
raster_path <- file.path(workspace, "intermediate_outputs", filename)
aoi_path <- file.path(workspace, "intermediate_outputs", "simplified_aoi.gpkg")
csv_path <- file.path(workspace, "outputs", "SUMMARY_STATISTICS.csv")

data <- read.csv(csv_path)
r <- raster(raster_path)
aoi <- st_read(aoi_path)

# Means calculated by HRA
data[data$HABITAT == "softbottom" & data$STRESSOR == "rec_fishing", "E_MEAN"]

# Means calculated by exactextractr
exact_extract(r, aoi, 'mean')

# Means calculated by raster::extract
extracted_values <- raster::extract(r, aoi)
mean(extracted_values[[1]], na.rm = TRUE)
mean(extracted_values[[2]], na.rm = TRUE)

This should print out 3 different pairs of means. Pairs because there are 2 polygons in the AOI; each gets a mean calculation. The methods do not produce identical results, but they are very close and the differences are likely due to how each method handles edge cases, where a polygon only partially overlaps a raster pixel, for example. Anyway, the results are close enough that I am confident all are “correct”.

Thanks Dave, I appreciate your patience and sharing code in R. It is reassuring to hear the values differ slightly in the intermediate outputs and summary statistics! I have found my issue in spatial overlap as I was misassigning no spatial overlap and no data both as 0. I have been able to obtain the exact numbers in the intermediate folder using raw #'s.

Thank you again for your response!

I will also note I made several changes to my code shared above in case anyone else tries it out!