

Most of the COVID-19 maps that I see are choropleth maps at the country scale, which means they assume a uniform distribution within each geographical unit. Some other maps use a point symbology, but those points often overlap each other. The approach adopted here, on the other hand, increases the spatial resolution and granularity of the information conveyed to the public.
Most other COVID-19 maps and applications focus purely on confirmed cases and deaths, paying little attention to the quantification of potential risks. For example, if you look at some of the most current maps, you will see that populous countries like India and Nigeria do not yet appear to have a big problem, even though their large populations alone increase their risk.
These are some of the reasons why I developed a customizable open-source program for COVID-19 hazard risk mapping using up-to-date data. It is a very simple methodology for assessing hazard risks from COVID-19 geographically. However, it is by no means fully accurate and needs to be used with care. Ideally, it should be reviewed by data scientists, geographical epidemiologists, other health professionals and policymakers and adjusted before drawing any overall insights into the pandemic.
Purpose and logic
The program is based on an open-source approach where other users have access to all the code and data, and can customize their maps according to their own criteria. Programmers and data scientists can build on this code, enhance it with additional data and methods, and make it much more useful for informing the public and potentially policymaking. The program works on a simple logic that relies on the hazard risk approach. In hazards research, it is generally accepted that hazard risk = hazard magnitude x vulnerability.
The hazard magnitude is known only to a certain extent and is estimated here as some function of the current confirmed cases, deaths and recovered cases. The vulnerability, on the other hand, can be defined by the number of people that are vulnerable to the disease, and is taken as some function of the population.
For the hazard component, the confirmed cases and deaths collected across the world are used. For the vulnerability component, a 1 km population grid is used. The population grid has already been downloaded, aggregated to 10 km resolution and included in the repository (ppp_2020_10km_Aggregated.zip).
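For reference, block-summing with numpy is one way such an aggregation could be done; the minimal sketch below assumes the 1 km grid has already been read into a numpy array (the function name and aggregation factor are illustrative, not taken from the program).

```python
import numpy as np

def aggregate_grid(pop_1km: np.ndarray, factor: int = 10) -> np.ndarray:
    """Sum population counts in non-overlapping factor x factor blocks."""
    rows, cols = pop_1km.shape
    # Trim the edges so the array divides evenly into blocks
    pop_1km = pop_1km[: rows - rows % factor, : cols - cols % factor]
    blocks = pop_1km.reshape(
        pop_1km.shape[0] // factor, factor,
        pop_1km.shape[1] // factor, factor,
    )
    # Summing over the block axes turns 1 km cells into 10 km cells
    return blocks.sum(axis=(1, 3))
```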
Assessing risk
Multiplying confirmed cases by the population gives one risk measure, but since testing is not uniform across the world and the number of deaths may be more reliable, death numbers are multiplied by the population to form the second risk component. Finally, the larger the population, the more risk it carries, at an exponential level, even if there are no confirmed cases yet. That is why the population is squared to generate the third risk component.
In the program, each risk component is scaled between 0 and 1000, and the total risk is then calculated as component a + b + c/2. This is a first attempt to quantify the risk, and it is acknowledged that countless other factors should ideally go into this mapping (e.g. temperatures, connectivity and human flows, existing policies, type of medical system, economics, level of social isolation, etc.).
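A minimal sketch of this calculation, assuming the three inputs have already been read into equally sized numpy arrays (the function names are illustrative and not necessarily those used in covid19RiskMap.py):

```python
import numpy as np

def scale(component: np.ndarray, top: float = 1000.0) -> np.ndarray:
    """Linearly rescale a risk component onto the 0-1000 range."""
    cmin, cmax = component.min(), component.max()
    if cmax == cmin:
        return np.zeros_like(component, dtype=float)
    return (component - cmin) / (cmax - cmin) * top

def total_risk(confirmed: np.ndarray, deaths: np.ndarray,
               population: np.ndarray) -> np.ndarray:
    """Combine the three scaled components as a + b + c/2."""
    a = scale(confirmed * population)   # component a: confirmed cases x population
    b = scale(deaths * population)      # component b: deaths x population
    c = scale(population ** 2)          # component c: population squared
    return a + b + c / 2
```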
Program flow
The program reads all the constants and file names from the covConst.py file, so the outcome can be changed simply by changing these variables (such as the size of the low pass filter). The program covid19RiskMap.py pulls the COVID-19 data from the COVID-19 (2019-nCoV) Data Repository by Johns Hopkins CSSE. It then creates a shapefile containing the confirmed cases and deaths with their lat/long, and from that creates two rasters, one for confirmed cases and one for deaths. Because the rasters are created at a relatively fine spatial resolution, a low pass filter using a Gaussian kernel is applied to them for a more meaningful spatial distribution (essentially distributing the confirmed cases and deaths to neighboring pixels), as sketched below.
During this process, the geographical references disappear, so they have to be reassigned. The program also adjusts the size of the population grid, since the raster calculation is done with numpy and the arrays therefore need to be of identical size. Each raster is read into a numpy array, the calculations described above are carried out, and the result is saved as a raster named 'covid-risk.tif'.
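The smoothing step could look roughly like the sketch below, which uses scipy.ndimage.gaussian_filter as a stand-in for whatever kernel the program configures through covConst.py (the sigma value here is illustrative):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def smooth_raster(raster: np.ndarray, sigma: float = 3.0) -> np.ndarray:
    """Spread point counts to neighboring pixels with a Gaussian low pass filter.

    sigma stands in for the kernel-size constant read from covConst.py.
    """
    return gaussian_filter(raster.astype(float), sigma=sigma)
```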
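A minimal sketch of restoring the georeferencing and writing the output, here using rasterio as an assumed stand-in for whatever raster library the program actually employs (the template path is illustrative):

```python
import numpy as np
import rasterio

def write_risk_raster(risk: np.ndarray, template_path: str,
                      out_path: str = "covid-risk.tif") -> None:
    """Copy CRS/transform from a template raster and save the risk array as GeoTIFF."""
    with rasterio.open(template_path) as src:
        # The georeferencing lost during the numpy calculations is restored from here
        profile = src.profile
    profile.update(dtype=rasterio.float32, count=1)
    with rasterio.open(out_path, "w", **profile) as dst:
        dst.write(risk.astype(np.float32), 1)
```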
Visualization
For the screenshots, ArcGIS Pro was used, but an open-source solution like QGIS can easily be used as well. The final version might include the capability to visualize the results in Jupyter Notebooks; ideally, the user would be able to zoom into portions of the final map in their browser.
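As an illustration of what such a notebook-based view could look like, the sketch below renders the output raster with rasterio and matplotlib (the library choice and styling are assumptions, not part of the current program):

```python
import matplotlib.pyplot as plt
import rasterio
from rasterio.plot import show

# Minimal sketch of displaying the result inside a Jupyter Notebook cell
with rasterio.open("covid-risk.tif") as src:
    fig, ax = plt.subplots(figsize=(12, 6))
    show(src, ax=ax, cmap="Reds", title="COVID-19 hazard risk")
    plt.show()
```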