Picterra, the Swiss geospatial Machine Learning company, has released a new data curation and exploration technology that allows users to gain more effective insights from datasets and improve model accuracy.
The need for data visualization
Visualizing data can be difficult, especially when it involves large, complex aerial imagery on a global scale.
The Data Exploration Report reveals visual patterns in users' data and provides key insights for building better, more robust detectors.
“Dataset exploration is a game changer for Picterra users. It’s the first in a series of advanced data curation tools that will enable users to effortlessly take the performance of their detectors to the next level”, says Julien Rebetez, Chief Technology Officer, Picterra.
The Data Exploration Report allows a quick assessment of the training coverage and identifies areas where the user should concentrate in future iterations.
- Improve dataset quality: ensure the data covers the variety of appearances of an object that will be seen in production (e.g., “building on grass”, “building on snow”, etc.). Better datasets lead to better models.
- Ensure the validation set is representative: when the validation set covers the variety of the dataset, the validation score better reflects how well the model will perform in production on new data.
- Data curation: distribute and focus annotation effort on the dataset’s most impactful images/regions.
The features are based on unsupervised learning and clustering techniques and allow a user to evaluate the distribution of their dataset. This is important because it allows users to spot “annotation gaps” in their datasets.
The report divides large imagery into small tiles, then groups tiles together based on their visual similarity (e.g., forest, water, urban, etc.). These tiles are then visualized within the interactive report, allowing users to see which regions are covered by the current training dataset and to make adjustments where necessary.
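The tile-and-group idea can be illustrated with a small sketch. This is not Picterra's implementation: the article does not say which features or clustering algorithm the report uses. The per-tile features (mean and standard deviation of pixel intensity), the simple k-means routine, and all function names below are assumptions made for illustration only.

```python
from statistics import mean, pstdev

def split_into_tiles(image, tile_size):
    """Yield every full tile_size x tile_size block of a 2D grid, row by row."""
    rows, cols = len(image), len(image[0])
    for r in range(0, rows - tile_size + 1, tile_size):
        for c in range(0, cols - tile_size + 1, tile_size):
            yield [row[c:c + tile_size] for row in image[r:r + tile_size]]

def tile_features(tile):
    """Assumed stand-in for visual similarity: (mean, std) of pixel intensity."""
    pixels = [p for row in tile for p in row]
    return (mean(pixels), pstdev(pixels))

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=20):
    """Plain k-means with deterministic farthest-first initialization."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        # Move each center to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centers[j] = tuple(mean(dim) for dim in zip(*members))
    return labels

def coverage_by_cluster(labels, annotated):
    """Return {cluster: (annotated_tiles, total_tiles)}."""
    stats = {}
    for i, label in enumerate(labels):
        done, total = stats.get(label, (0, 0))
        stats[label] = (done + (1 if i in annotated else 0), total + 1)
    return stats

# Toy 8x8 "image": dark left half, bright right half.
image = [[10] * 4 + [200] * 4 for _ in range(8)]
tiles = list(split_into_tiles(image, tile_size=4))
labels = kmeans([tile_features(t) for t in tiles], k=2)

# Suppose only the two dark tiles (indices 0 and 2) are annotated so far.
stats = coverage_by_cluster(labels, annotated={0, 2})
gaps = sorted(c for c, (done, _) in stats.items() if done == 0)
print("coverage per group:", stats)
print("groups with no annotations:", gaps)
```

In this toy case the bright tiles form a group with no annotations at all, which is the kind of "annotation gap" the report is meant to surface.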
Why dataset exploration?
Dataset exploration can also be used for “data curation”: deciding which images to assign to a team of annotators. By selecting the regions to annotate using the Data Exploration Report, you distribute the annotation workforce as efficiently as possible, because annotators work on the regions that maximize the diversity of appearance covered by the dataset. This leads to more robust detectors.
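One way to picture this kind of allocation is a greedy heuristic that always draws the next tile from the visual group with the lowest annotated share, then deals the chosen tiles out to annotators in turn. The sketch below is a hypothetical illustration, not the method the report uses; the function names, the group labels, and the annotator names are all invented for the example.

```python
from collections import defaultdict

def pick_tiles_to_annotate(labels, annotated, budget):
    """Greedily pick up to `budget` unannotated tiles, each time from the
    group whose annotated share is currently lowest (ties: lowest group id)."""
    unannotated = defaultdict(list)
    done = defaultdict(int)
    total = defaultdict(int)
    for i, label in enumerate(labels):
        total[label] += 1
        if i in annotated:
            done[label] += 1
        else:
            unannotated[label].append(i)

    picked = []
    while len(picked) < budget:
        candidates = [c for c in unannotated if unannotated[c]]
        if not candidates:
            break  # nothing left to annotate
        target = min(candidates, key=lambda c: (done[c] / total[c], c))
        picked.append(unannotated[target].pop(0))
        done[target] += 1
    return picked

def assign_to_annotators(tiles, annotators):
    """Deal tiles out to annotators round-robin."""
    assignment = {name: [] for name in annotators}
    for i, tile in enumerate(tiles):
        assignment[annotators[i % len(annotators)]].append(tile)
    return assignment

# Eight tiles in three visual groups; group 0 is already well covered.
labels = [0, 0, 0, 0, 1, 1, 2, 2]
annotated = {0, 1, 2}

picked = pick_tiles_to_annotate(labels, annotated, budget=3)
assignment = assign_to_annotators(picked, ["ana", "ben"])
print("tiles to annotate next:", picked)
print("assignment:", assignment)
```

Here the whole budget goes to groups 1 and 2, which had no annotations, rather than to the fourth tile of the already well-covered group 0.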
“This innovation gives GIS and data science teams an indication of where – based on image variability – they need to place new training areas in order to achieve maximum model accuracy. This is a powerful tool to speed up and streamline the annotation process and reduce the likelihood of false positives”, states Rebetez.
An example using satellite imagery from Morocco shows how the Data Exploration Report can be used to solve real-world problems. The goal of the detector in this case was to identify man-made holes used for reforestation, a natural solution that both preserves and strengthens biodiversity and combats climate change.
The Data Exploration Report can pinpoint the visual variety within the dataset, allowing you to quickly identify and annotate the most diverse areas.
Beyond Morocco, many upcoming use cases relate to the diversity of urban areas across different geographical regions and climates.
“Predicting wildfire risk, detecting vegetation encroachment, identifying landslide risk on infrastructure, and responding to climate change better are just a few of the global challenges that can be tackled using the platform and the new Data Exploration Report functionality”, says Rebetez.
“For instance, you may be trying to identify certain types of buildings or even solar panels on the roofs of those buildings. There will be a wide variety of building materials being used on the roofs as well as differences in the surrounding environment”, he adds.