Some data are big, some have some great information but aren’t useful, and some just need a little nudge in the right direction. Quite often we have information that needs finer categorizing or the values remapped. If you go to the help pages of the Geographic Information System (GIS) software, you will find that they have tools that do a statistical remap or suggest a method that involves hard coding the information as a field into the attributes. This causes a big problem when you are working with a client who keeps changing their mind over how a feature should be classified.
Quite often in business, product owners, data scientists, developers, and even the solution architect all have a say in how the data is classified so that it meets a plan or other categories used elsewhere. Even when you are working on your own, you may find that it works better for you to use a separate file that can be altered several times without affecting the core data. The best way is to create a lookup table.
Why create a lookup table? Isn’t this creating more information? By creating a lookup table it can be easily manipulated and adjusted with minimum disruption to the core data when working with many of the different geospatial formats. Also in most GIS, it is easier to join a CSV file than it is to keep going in and editing again.
Creating the lookup table is as simple as opening a blank CSV file, adding two fields (‘before’ field and ‘categorized’ field), then copying and pasting the current values from the table into the before field. Then, in the after field, put in the categorized values.
How you use the lookup depends on the system you are using. If using geopackage, shapefile or Esri geodatabase, then you can use the join function in the GIS to combine the data. The join is virtual and you can edit the CSV file that is referenced. When all changes are complete, just export the data and the changes will be set into the new data.
Example: Categorizing town size using an external CSV and then joining based on name. Above shows original data and the CSV created to categorize town size. Below shows how to join data in QGIS and the resulting table.
A more efficient program is PostGIS; you can use the method above in a GIS or you can just use QGIS to drag and drop the CSV file into your database and then create a ’view’. A view, to those who are not Structured Query Language (SQL) users is literally just that: you select information from a table and use a view from it. The great thing is that you can use it as a data. So, the best practice for PostGIS would be to create a view of your original data and then join it to the new table based on the before field.
We often use lookup tables from authorities and third party data providers but rarely create them ourselves, though in practice, when you consider the wider business need, it turns out that they can be a valuable resource.
One financial benefit of creating lookup tables is the resale value. Often there is some industry expertise that may be added to data that cannot be obtained elsewhere. By creating a lookup table you can resell the table without having to provide whole data. This has been made even more accessible by the opening of the UK Unique Property Reference Numbers (UPRNs) and Topographic Identifiers (TOIDs). This allows for providing tables of information which may be linked to the geometries of the geographic data. An example of this would be the current EPC or Energy Performance Certificate data that is provided by the UK government. This has energy information for every home and business in the UK that has had an energy review. Currently, it is approximately 80 percent of the buildings in the UK. The data is provided as a set of CSV files which have a UPRN field. This can be matched to the Ordnance Survey data with either an open product or their proprietary products to create data with geometry.