Urban InVEST Cooling parallel processing bug

libby_kula · October 24, 2023, 7:06pm

Hi all,
I am running the Urban InVEST Cooling model in Python via a function that takes each city in a list of 760 U.S. cities, prepares its arguments (i.e., clips the NLCD/evapotranspiration rasters, calculates the rural reference temperature and UHI), and runs the cooling model for each of the summer months in a given year. I have been using the multiprocessing module's Pool.map function to speed this up. I save all the results to the same workspace, with the results suffix set to the city and month.

I’m running into an issue where my code works when each city is run sequentially (i.e., looping through the cities list), but during parallel processing the results from different cities seem to get mixed together. Some cities have the correct avd_eng_cn in the uhi.shp results but none of the other information is correct; others have incorrect information throughout. The code works correctly when only one city is included in the multiprocessing call.

I have tried running versions of the code without the data preparation within the function for a small subsample of cities that had the correct avd_eng_cn but nothing else correct in the uhi.shp. In this version, all of the data sources read in are specific to the city (so the parallel calls of the function should not be reading/writing to the same places). I notice that the incorrect values of other attributes in the uhi.shp files (avg_cc, avg_tmp_v, etc.) are the correct values for another city in the subsample. All of the inputs should be generated for the given city specifically within each call of the function since the loop works correctly, so I am left to assume the threads are intermingling rather than running in parallel.

I’ve also tried using the n_workers argument for the subsample and it seems to work, but it didn’t seem to speed things up much, at least for this small subsample.

I don’t have a log file, but let me know if there is anything you would like me to show for context. I added a picture showing what is happening in the multiprocessing results vs. what I should expect:

dave · October 24, 2023, 7:59pm

Hi @libby_kula , welcome to the community. This sounds like a really interesting problem.

Offhand the first thing I would try is using a different workspace for each model run. In theory I think using a different suffix should work – all files created by the model should be unique to each model run – but in practice maybe this is not true. Letting each process use its own workspace seems more reliable, as InVEST does a lot of reading & writing to disk.
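A minimal sketch of the one-workspace-per-run idea, assuming the model takes the usual `workspace_dir` and `results_suffix` entries in its args dictionary. The helper names (`build_args`, `run_one`, `run_all`) and the `done.txt` marker file are hypothetical and only for illustration; the real model call would replace the marker-file step, and the remaining city-specific inputs would be added to the dictionary.

```python
import os
from itertools import product
from multiprocessing import Pool


def build_args(city, month, base_dir):
    """Return per-run settings with a workspace unique to this city and month."""
    tag = f"{city}_{month:02d}"
    return {
        "workspace_dir": os.path.join(base_dir, tag),  # nothing shared between runs
        "results_suffix": tag,  # optional once every run has its own workspace
        # The remaining model inputs for this city would be added here.
    }


def run_one(job):
    """Worker: create an isolated workspace, then run one city/month in it."""
    city, month, base_dir = job
    args = build_args(city, month, base_dir)
    os.makedirs(args["workspace_dir"], exist_ok=True)
    # Placeholder for the real model call (hypothetical here): this sketch only
    # writes a marker file so that it runs without InVEST installed.
    with open(os.path.join(args["workspace_dir"], "done.txt"), "w") as f:
        f.write(f"{city} {month}\n")
    return args["workspace_dir"]


def run_all(cities, months, base_dir, processes=4):
    """Run every city/month combination in a pool of worker processes."""
    jobs = [(city, month, base_dir) for city, month in product(cities, months)]
    with Pool(processes=processes) as pool:
        return pool.map(run_one, jobs)
```

When calling `run_all` from a script, it is safest to do so under an `if __name__ == "__main__":` guard, since some platforms start worker processes by re-importing the main module.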

libby_kula · October 24, 2023, 8:57pm

Hi @dave, thank you so much for your speedy reply! This seems to be working!!

dave · October 25, 2023, 12:53pm

That’s great! Perhaps it is a bug with the results suffix, where one or more files are being named without the suffix and, as a consequence, the different processes are overwriting each other’s files.
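One way to check that hypothesis, sketched here under the assumption that suffixed outputs are named `<name>_<suffix>.<ext>`: walk a finished workspace and list every file whose base name does not end in the suffix. The function name `files_missing_suffix` is hypothetical, and some hits (caches or databases, for example) may be expected rather than bugs.

```python
import os


def files_missing_suffix(workspace_dir, suffix):
    """Return relative paths of files whose base name does not end in _<suffix>."""
    missing = []
    for root, _dirs, files in os.walk(workspace_dir):
        for name in files:
            # Take everything before the first dot so that sidecar files such as
            # "x_suffix.tif.aux.xml" are still recognised as suffixed.
            # Assumes the suffix itself contains no dots.
            stem = name.split(".", 1)[0]
            if not stem.endswith(f"_{suffix}"):
                full_path = os.path.join(root, name)
                missing.append(os.path.relpath(full_path, workspace_dir))
    return sorted(missing)
```

Any file reported here that is written during a model run is a candidate for being overwritten when several runs share one workspace.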

