Metadata: What is it and Why Do I Need it?

Metadata is a vital component of geospatial data. Understanding how the data is captured or the intention with which it is captured is sometimes more valuable than the data itself.

By Nicholas Duggan

If you work in the geospatial industry, you will know that shapefiles are these annoying data transmission files you can’t live without. Look closely and you will see that along with the shapefile (*.shp), the projection (*.prj), the attribution database (*.dbf), indexes (*.shx) and encoding file (*.cpg), there is another file that may be present — the metadata file (*.shp.xml).
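As a rough illustration of the file family described above, the following Python sketch reports which companion files sit next to a given shapefile. The function names `shapefile_sidecars` and `has_metadata` are hypothetical, and the sketch assumes all files share the shapefile's base name, with the metadata file named by appending `.xml` to the full `.shp` name.

```python
from pathlib import Path

# Companion extensions named in the text. The metadata file is handled
# separately because it is "<name>.shp.xml", not "<name>.xml".
SIDECAR_SUFFIXES = (".prj", ".dbf", ".shx", ".cpg")


def shapefile_sidecars(shp_path):
    """Return a dict mapping each companion extension to True/False (exists)."""
    shp = Path(shp_path)
    found = {suffix: shp.with_suffix(suffix).exists() for suffix in SIDECAR_SUFFIXES}
    # The metadata file keeps the ".shp" part of the name and appends ".xml".
    found[".shp.xml"] = Path(str(shp) + ".xml").exists()
    return found


def has_metadata(shp_path):
    """True if a "<name>.shp.xml" file sits next to the shapefile."""
    return shapefile_sidecars(shp_path)[".shp.xml"]
```

A check like this only says whether a metadata file is present; it says nothing about how complete the record inside it is.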

Data about data

Metadata, you will be told, is “data about data”, which always sounds a little confusing. To be clear, metadata describes what the data you are providing contains. This can be as little as the extent or bounding box of the data, though if it follows ISO/INSPIRE standards (more on those later) it should contain a lot more: resolution, scale, method of recording, ISO type classification, ISO theme classification, time of recording, person responsible, and contact information, among other details.

In essence, rather than the recipient of a file having to send an email with a thousand questions, the metadata travels with the data as a file that fully describes what you have, how it was made and who to contact for more information. As a matter of course, all geospatial data should have metadata; it is an industry standard, defined in ISO 19115 (2003) and ISO 19139. There are a few other standards too, such as the US Government FGDC, INSPIRE, UK GEMINI and Dublin Core.

Along with the complexity of applying a standard to the catalog of data you manage, there is also the amount of time it takes to fill out. At one point, I had a team of five GIS (Geographic Information System) staff who spent six months bringing our data up to the INSPIRE standard (this covered us for ISO 19115 too). This creates a bit of a conundrum, as many GIS jobs require the use of known primary GIS sources. So, what happens when a source doesn’t contain metadata? How do you complete your own metadata? Further, if you are digitizing a quick boundary or an area of interest, does it really require an hour of filling out all the metadata?

Vital component

The biggest issue the geospatial industry has at present is how rapidly it is moving. The need for data and the speed at which it must be delivered are leading to shortcuts, which are detrimental to geospatial standards. If a product manager has to choose between delivering the data on time and completing the metadata, it isn’t hard to see what will happen. Those who do not handle or work with geospatial data will find it difficult to comprehend how important the supporting information is.

Over the last year, my personal experience has been that around 60 percent of the data I have received, whether downloaded from official sites or sent by email, has had some form of metadata. Many of the data owners I have contacted do not have much of the information that is required.

In my opinion, metadata is a vital component of geospatial data. Understanding how the data was captured or the intention with which it was captured is sometimes more valuable than the data itself. Any data provided to another user should contain metadata, though I question the amount of detail that is required. Under current standards, a complete record is close to the length of a novel and could be very confusing for new geospatial users.

Many of the geospatial metadata standards were written a decade or more ago, and there are far superior technologies and capabilities available today. This brings me to a presentation by Jo Cook of Astun Technology. She has looked at this very problem and found ways to automate a large part of this metadata, leaving the data creator more time to focus on important tasks. I think she has nailed the major missing component in our modern complex mapping systems. If geospatial software providers added some of this semi-automation to metadata capture, and perhaps applied a little Machine Learning or Artificial Intelligence to offer a much shorter metadata form, it could make form filling easier and give geospatial data users more enthusiasm for it.
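To make the semi-automation idea concrete, here is a minimal Python sketch of the general principle: pre-fill whatever can be derived from the data itself and leave only the rest for a person. It is not the approach from the presentation mentioned above. The function names and field names are hypothetical and are not ISO 19115 element names, and the input is assumed to be a plain list of (x, y) coordinate pairs.

```python
from datetime import date


def draft_metadata(coords, title, today=None):
    """Pre-fill the metadata fields that can be derived without human input."""
    if not coords:
        raise ValueError("coords must contain at least one (x, y) pair")
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return {
        "title": title,
        # Extent / bounding box as (min_x, min_y, max_x, max_y).
        "extent": (min(xs), min(ys), max(xs), max(ys)),
        # Defaults to the current date unless one is supplied.
        "date_recorded": (today or date.today()).isoformat(),
        # Fields only a person can supply are left empty on purpose.
        "method_of_recording": None,
        "person_responsible": None,
        "contact": None,
    }


def missing_fields(record):
    """List the fields that still need to be filled in by hand."""
    return [name for name, value in record.items() if value is None]
```

With a draft like this, someone digitizing a quick boundary would only be asked for the handful of fields returned by `missing_fields`, rather than facing the full form.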

As our need for better, more complex geospatial data increases, we must make it easier and faster to fill out metadata; it needs to be something people want to complete, not an afterthought. The only way we can do this is with the support of geospatial software providers and improvements to international standards.