Urban InVEST Cooling parallel processing bug

Hi all,
I am running the Urban InVEST Cooling model in Python through a function that takes each city from a list of 760 U.S. cities, prepares its arguments (i.e., clips the NLCD and evapotranspiration rasters and calculates the rural reference temperature and UHI), and then runs the cooling model for each summer month of a given year. To speed this up, I have been using Pool.map from the multiprocessing module. All results are saved to the same workspace, with a results suffix that encodes the city and month.
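For readers following along, here is a minimal sketch of the setup described above, assuming Python's standard multiprocessing.Pool. The city names, month labels, and the process_city/run_all helpers are hypothetical, and the real data preparation and model call are reduced to a placeholder comment.

```python
from multiprocessing import Pool

# Hypothetical month labels; the real run covers the summer months of one year.
SUMMER_MONTHS = ["jun", "jul", "aug"]


def process_city(city, months=SUMMER_MONTHS):
    """Run every summer month for one city and return the suffixes used."""
    suffixes = []
    for month in months:
        suffix = f"{city}_{month}"
        # Hypothetical placeholder: the real function clips the NLCD/ET rasters,
        # computes the rural reference temperature and UHI, then runs the model
        # with this suffix. Here we only record the suffix.
        suffixes.append(suffix)
    return suffixes


def run_all(cities, processes=4):
    # Pool.map hands cities to the worker processes and returns the
    # per-city results in the same order as `cities`.
    with Pool(processes=processes) as pool:
        return pool.map(process_city, cities)


if __name__ == "__main__":
    print(run_all(["city_a", "city_b"], processes=2))
```

In this arrangement several cities run at once, and with a shared workspace each concurrent run relies on the suffix alone to keep its files separate.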

I’m running into an issue: my code works when the cities run sequentially (i.e., looping through the list of cities), but when they run in parallel the processes seem to interfere with one another and corrupt the results. Some cities have the correct avd_eng_cn in the uhi.shp results but every other attribute is wrong; others are wrong throughout. The code also works correctly when only one city is passed to the multiprocessing call.

I have also tried a version of the function without the data-preparation step, run on a small subsample of cities that had the correct avd_eng_cn but nothing else correct in uhi.shp. In this version every input the function reads is specific to its city, so the parallel calls should never read from or write to the same files. It turns out that the incorrect values of the other attributes in the uhi.shp files (avg_cc, avg_tmp_v, etc.) are the correct values for another city in the subsample. Since the sequential loop gives correct results, each function call must be generating the right inputs for its own city, so I suspect the parallel processes are interfering with one another rather than running independently.

I’ve also tried the n_workers argument on the subsample. That seems to work, but it didn’t speed things up much, at least for such a small subsample.

I don’t have a log file, but let me know if there is anything you would like me to show for context. I added a picture showing what is happening in the multiprocessing results vs. what I should expect:


Hi @libby_kula, welcome to the community. This sounds like a really interesting problem.

Offhand, the first thing I would try is using a different workspace for each model run. In theory a different suffix should be enough, since every file the model creates should be unique to each run, but in practice that may not hold. Giving each process its own workspace seems more reliable, because InVEST does a lot of reading and writing to disk.
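A minimal sketch of that suggestion, assuming each run is configured through an InVEST-style args dictionary. workspace_dir and results_suffix are the standard InVEST argument keys; make_run_args, the base-directory layout, and the city/month naming are hypothetical, and the remaining city-specific inputs would still need to be added.

```python
import os
import tempfile


def make_run_args(base_dir, city, month):
    """Return model args whose workspace is unique to one city/month run.

    Only 'workspace_dir' and 'results_suffix' are set here; the caller would
    add the city-specific raster, vector, and temperature inputs.
    """
    suffix = f"{city}_{month}"
    workspace = os.path.join(base_dir, suffix)
    os.makedirs(workspace, exist_ok=True)
    return {
        "workspace_dir": workspace,
        # Keeping the suffix as well makes output names self-describing
        # if files are later copied into one folder.
        "results_suffix": suffix,
    }


if __name__ == "__main__":
    base = tempfile.mkdtemp()
    a = make_run_args(base, "city_a", "jul")
    b = make_run_args(base, "city_b", "jul")
    print(a["workspace_dir"] != b["workspace_dir"])  # True
```

Because no two runs share a directory, any file the model writes without the suffix stays inside its own run's workspace.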


Hi @dave, thank you so much for your speedy reply! This seems to be working!!


That’s great! Perhaps it is a bug in the results suffix handling: if one or more files are named without the suffix, the parallel processes would overwrite each other’s files.
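One way to test this hypothesis, sketched with only the Python standard library: run a single city in its own fresh workspace, then list every file whose name does not contain that run's suffix. The function name files_missing_suffix is hypothetical, and the check assumes the model embeds the results suffix verbatim in the name of each suffixed output.

```python
import os


def files_missing_suffix(workspace_dir, suffix):
    """Return paths (relative to workspace_dir) of files whose names lack suffix.

    If several runs share one workspace, any file listed here would get the
    same name in every run and could collide between concurrent runs.
    """
    missing = []
    for root, _dirs, files in os.walk(workspace_dir):
        for name in files:
            # Substring test, so multi-part extensions such as ".tif.aux.xml"
            # still count as suffixed when the suffix appears in the name.
            if suffix not in name:
                path = os.path.join(root, name)
                missing.append(os.path.relpath(path, workspace_dir))
    return sorted(missing)
```

For example, files_missing_suffix(workspace, "city_a_jul") after a one-city run would list any candidates; an empty result would point away from unsuffixed files as the cause.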
