Abstract
This article focuses on the role of crowd sourcing in global urban area mapping. Crowd sourcing is an approach in which non-expert people, called crowds, join a data production project by following simple procedures. To introduce crowd sourcing into our global urban area mapping, the researchers of this study constructed a crowd-sourcing platform with open source GIS software and developed a ground truth data development system on the platform. The system produces ground truth data by digitising the boundaries of urban areas through visual interpretation of satellite images. Using the system, the researchers developed over 160,000 records of boundary data in five months. They also conducted an experiment with the system to measure working time for several sizes of work unit: 80 km × 80 km, 20 km × 20 km, and 10 km × 10 km. The median working times were 87.2, 6.2, and 1.4 hours, respectively. The results can help estimate the total working time needed to crowd-source ground truth data by visual interpretation and can contribute to the progress of data-intensive studies in geospatial information, remote sensing, and photogrammetry.
1. INTRODUCTION
Global urban area maps, which show the location, extent, and shape of urban areas around the world, have been used to address critical global issues. Many attempts have been made to develop urban area maps with high resolution and high accuracy; however, ground truth data for urban area mapping are still lacking (Miyazaki et al. 2011).
Efficient methods of collecting such data are needed for further improvement. In recent years, several data-intensive science projects have adopted crowd sourcing, in which many operators work together on routine data collection over the Internet (Hand 2010). In the field of geographical information in particular, volunteers collect environmental and civil information tagged with geographical locations derived from the Global Positioning System (GPS) and post it to public databases. Information collected through such voluntary activities is called volunteered geographic information (VGI; Goodchild 2008).
The most active VGI project is OpenStreetMap (OSM; Haklay & Weber 2008), which was started with the purpose of providing freely available road map data. The Degree Confluence Project (DCP; https://confluence.org) is another outstanding crowd-sourcing project for building a ground truth database. The objective of the DCP is to archive ground information at the intersections of integer degrees of latitude and longitude by visiting them and documenting the state of their surroundings (Iwao et al. 2006). The ground information provided by the DCP is statistically applicable as ground truth data for land cover classification (Iwao et al. 2006). Geo-Wiki (https://www.geo-wiki.org) is also an outstanding crowd-sourcing project for building a ground truth database. Geo-Wiki provides a web-based interface for collecting validation information on global land cover maps. Its users can determine the correct land cover at pixels where global land cover maps disagree by interpreting high-resolution imagery in Google Earth (Fritz et al. 2009).
These crowd-sourcing approaches to geographical information should be applicable to global urban area mapping. This paper presents the development of a crowd-sourcing GIS platform for producing ground truth data for global urban area mapping by visual interpretation. We also discuss issues in managing the ‘crowds’ for efficient operation of the system.
In this study, we defined urban areas as places covered with a built environment, that is, non-vegetative, human-constructed elements (e.g., roads, buildings, runways, and industrial facilities), following the remote sensing literature (Angel et al. 2005; Schneider et al. 2009; Orenstein et al. 2011).
2. METHODOLOGY
2.1. Crowd sourcing for global urban area mapping
Ground truth data are generally developed by remote sensing experts through visual interpretation. However, when collecting ground truth data at a global scale, the work capacity of experts alone would not be enough to gather the ground information needed for good accuracy.
Crowd sourcing could be a good solution to this kind of problem, which requires a large amount of accurate information. In this method, data are developed by non-expert operators, called the ‘crowd’. To make the crowd’s work efficient, we focused on the following challenges in implementing crowd sourcing.
Challenge 1. Defining simple tasks: For non-experts, complicated work could be so burdensome that its precision would be degraded. Complicated tasks also require more sophisticated skills, which implies a high cost of training the crowd and a considerable obstacle to joining the project.
Challenge 2. Quality assurance: Although the positional accuracy of crowd-sourced data can be fairly good (Haklay 2010), such data can be heterogeneous in their production processes, scales of production, and compliance with standards and specifications (Girres & Touya 2010). This problem could also occur in urban area mapping by visual interpretation of satellite images, which is affected by varying conditions such as the landscape of the region to be digitized and the operator’s expertise and motivation.
Challenge 3. Managing various types of efforts: A crowd-sourcing project involves operators from various backgrounds (Coleman et al. 2009). Management of their efforts has to be flexible enough to handle unexpected consequences caused by uneven expertise in remote sensing. For example, if an operator gives up on completing his/her assignment, the rest of the assignment has to be reassigned to another operator. This makes task management more complex.
Challenge 4. Keeping motivation up: Producing ground truth data can be boring if there is no stimulating outcome. Intellectual stimulation, such as improving technical skills, knowledge, and experience, has been suggested as a motivation in crowd-sourcing projects (Coleman et al. 2009).
Challenge 5. Reference information for visual interpretation: Beginners in remote sensing need complementary information when interpreting satellite images visually. Map services on the Internet would be a good source of such reference information. Therefore, integrating the system with other map services is expected to improve work efficiency. For these challenges, we proposed the following solutions.
Solution 1. Drawing only urban areas: In urban area mapping, ground truth data have to be classified into ‘urban’ and ‘non-urban’. Because these classes are mutually exclusive, ground truth data can be developed just by drawing the boundaries of urban areas; everything outside them is non-urban. Therefore, we defined the task simply as drawing the boundaries of urban areas. We also defined a class ‘unknown’ for areas that cannot be clearly classified as urban or non-urban (e.g., due to cloud contamination or haze). In summary, the task consisted of drawing polygons and classifying them as ‘urban’ or ‘unknown’.
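Because only ‘urban’ and ‘unknown’ polygons are drawn, the label of any location can be derived afterwards: inside an ‘urban’ polygon it is urban, inside an ‘unknown’ polygon it is unknown, and anywhere else it is non-urban. The Python sketch below illustrates this rule with an even-odd ray-casting point-in-polygon test. The function names and the plain coordinate-list polygon representation are hypothetical illustrations, not part of the system described here.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Ring = List[Point]  # polygon vertices in order; closing vertex may be omitted


def point_in_polygon(pt: Point, ring: Ring) -> bool:
    """Even-odd ray casting: count how many edges a ray towards +x crosses."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def classify(pt: Point, drawn: List[Tuple[str, Ring]]) -> str:
    """Label of the first drawn polygon containing pt; otherwise 'non-urban'.

    Hypothetical helper: operators never draw 'non-urban', so it is the default.
    """
    for label, ring in drawn:
        if point_in_polygon(pt, ring):
            return label
    return "non-urban"
```

For example, with one ‘urban’ square and one ‘unknown’ square drawn, a point in neither square is reported as non-urban without anyone having digitized it.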
Solution 2. Quality assurance by fixing the map scale and by expert checks: The map scale used in visual interpretation affects the precision with which ground objects are digitized; therefore, fixing the map scale alone could keep quality within a certain range. In addition, to assure the quality of outputs produced by operators with different backgrounds, checks and control by trained experts would be required (Coleman et al. 2009).
Solution 3. Tiny assignments: Non-expert people are unlikely to keep working for a long time, such as tens of hours. To avoid the complicated management mentioned in Challenge 3, the assignment unit should be as small as possible.
We conducted an experiment to identify a suitable size for the work unit.
Solution 4. Showing achievements: Seeing their achievements gives operators a good opportunity to recognize what they have accomplished with their own hands. This can help keep motivation up through the intellectual stimulation mentioned in Challenge 4.
Figure 1: Overview of the system architecture.
Solution 5. Building the system with open standards: We built the system on widely accepted open standards so that it could be integrated with other map services (e.g., Google Maps, OpenStreetMap, and Bing Maps). To make the system compatible with those map services, we applied the Tile Map Service (TMS) scheme to provide background satellite images and reference map images from other map services. For our global urban area mapping, TMS has the advantage of organizing map tiles of the same size consistently all over the world. In addition, the tiles are organized at multiple resolutions, which is a further advantage for the experiment in Solution 3, because work efficiency can be measured for each size of work unit.
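To illustrate how a TMS-style tiling turns each zoom level into one work-unit size, the sketch below computes the tile that contains a longitude/latitude at a given zoom level. It assumes the spherical (Web) Mercator global profile with a single tile at zoom 0, which is only one of several possible tiling profiles; the profile actually used by the system is not stated here, so the ground widths this sketch reports should not be read as the exact unit sizes of the experiment. The property that matters for Solution 3 holds for any quadtree tiling: each step up in zoom level halves the tile side, so one tile at zoom 10 covers the same area as 16 tiles at zoom 12 and 64 tiles at zoom 13.

```python
import math
from typing import Tuple

EARTH_RADIUS_M = 6378137.0  # WGS 84 semi-major axis, as used by Web Mercator


def lonlat_to_tms(lon: float, lat: float, zoom: int) -> Tuple[int, int]:
    """Tile (x, y) containing (lon, lat) at `zoom`, with y counted from the south (TMS)."""
    n = 2 ** zoom  # tiles per axis at this zoom level
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    # Row counted from the top, as in the common XYZ ("slippy map") convention
    y_xyz = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that lon = 180 or latitudes beyond the Mercator limit stay in range
    x = min(max(x, 0), n - 1)
    y_xyz = min(max(y_xyz, 0), n - 1)
    # TMS flips the row index so that y = 0 is the southernmost row
    return x, n - 1 - y_xyz


def tile_width_m_at_equator(zoom: int) -> float:
    """Ground width of one tile at the equator in this profile; halves per zoom level."""
    return 2 * math.pi * EARTH_RADIUS_M / 2 ** zoom
```

Because the tiling is a quadtree, integer-dividing a zoom-12 tile’s x and y by 4 gives its zoom-10 parent, which is what makes comparing work units across zoom levels straightforward.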
2.2. Server-side system development
An overview of the system architecture is shown in Figure 1. The system consists of a server side (right column of Figure 1) and a client side (left column of Figure 1). On the server side, the key technologies are the Web Map Service (WMS) and the Web Feature Service (WFS). Here, we describe the basic features of WMS and WFS.
Web Map Service (WMS)
Conventionally, developing ground truth data involves transferring large image files of hundreds of megabytes or more by optical media, hard disk drives, FTP, and so on. Preparing reference images on such media consumes considerable time and labor. It can also cause security problems for the source data. When transferring reference images to operators, the conductor has to ask the operators to sign covenants not to use the data for other purposes. However, the covenants do not guarantee the security of the reference data; the risk of technical incidents and human errors always remains. WMS is a suitable way to avoid such unnecessary labor, time, and risk. WMS is a standard for publishing map data over the Internet with a server-client model (Open Geospatial Consortium 2011b). In a basic use case, when a client requests map images for a certain geographical extent, the server sends reference images only for the requested extent.
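As an illustration of the “request only the extent you need” pattern, the sketch below builds a WMS 1.1.1 GetMap request for one geographic bounding box. The parameter names (SERVICE, VERSION, REQUEST, LAYERS, STYLES, SRS, BBOX, WIDTH, HEIGHT, FORMAT) come from the WMS 1.1.1 specification; the server URL and the layer name `aster_vnir` are hypothetical placeholders, and a real MapServer endpoint may also need its own parameters (such as the path to its map file).

```python
from typing import Tuple
from urllib.parse import urlencode


def getmap_url(base_url: str, layer: str, bbox: Tuple[float, float, float, float],
               width: int, height: int) -> str:
    """Build a WMS 1.1.1 GetMap URL for one extent.

    bbox is (min_lon, min_lat, max_lon, max_lat) in EPSG:4326; WMS 1.1.1 uses the
    SRS parameter and lon/lat axis order for this CRS.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty value selects the server's default style
        "SRS": "EPSG:4326",
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint and layer: the server returns pixels for this box only.
url = getmap_url("https://example.org/wms", "aster_vnir",
                 (139.6, 35.6, 139.8, 35.8), 512, 512)
```

The operator’s browser never holds more than the rendered extent, which is why no physical media or per-operator copies of the scenes are needed.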
The server-client system enabled us to control access to the reference images easily. For example, if an operator is assigned visual interpretation for a certain extent and the images are distributed on physical media, the conductor has to consider the capacity of the media and ensure that the geographical extent of the assigned tasks matches that of the reference images given to the operator. Owing to the on-demand delivery implemented by WMS, we only had to put the reference images on the server.
Figure 2: Process of task management. The numbers denote the order of steps.
In addition, because WMS runs over a very common protocol, HTTP, it allowed us to control operators’ access easily. This feature was very useful when adding operators to, and removing them from, a project of developing ground truth data by visual interpretation.
Web Feature Service (WFS)
In a visual interpretation project, besides managing reference images, organizing the results posted by operators would require considerable labor and time. Project managers have to keep the transactions of ground information among operators consistent manually. A typical series of steps consists of receiving spreadsheets and vector map data by email, associating them with each other, and integrating the outputs. Such steps hinder the flexible operation required by labor-intensive projects that develop a ground truth database.
To make the operation flexible, we proposed introducing WFS into the development of the ground truth database. WFS is a standard for transferring vector map data over the Internet. In addition, WFS supports transactions on vector maps between server and clients, such as editing map data over the Internet (Open Geospatial Consortium 2011a).
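For readers unfamiliar with WFS transactions, the sketch below builds the kind of XML document a client posts to a WFS server to insert one digitized polygon: a WFS 1.1.0 `Transaction` containing an `Insert` action, with the geometry encoded in GML 3. The feature type `urban:boundary`, its namespace URI, and the property names `class` and `the_geom` are hypothetical; a real deployment would use the names defined for its own PostGIS-backed layer, and the expected axis order for EPSG:4326 depends on the server configuration.

```python
import xml.etree.ElementTree as ET
from typing import Sequence, Tuple

WFS = "http://www.opengis.net/wfs"
GML = "http://www.opengis.net/gml"
APP = "http://example.org/urban"  # hypothetical namespace of the feature type

ET.register_namespace("wfs", WFS)
ET.register_namespace("gml", GML)
ET.register_namespace("urban", APP)


def insert_transaction(label: str, ring: Sequence[Tuple[float, float]],
                       srs: str = "EPSG:4326") -> str:
    """Return a WFS 1.1.0 Transaction document inserting one labelled polygon."""
    ring = list(ring)
    if ring[0] != ring[-1]:
        ring.append(ring[0])  # GML linear rings must be explicitly closed
    tx = ET.Element(f"{{{WFS}}}Transaction", service="WFS", version="1.1.0")
    insert = ET.SubElement(tx, f"{{{WFS}}}Insert")
    feature = ET.SubElement(insert, f"{{{APP}}}boundary")
    ET.SubElement(feature, f"{{{APP}}}class").text = label  # 'urban' or 'unknown'
    geom = ET.SubElement(feature, f"{{{APP}}}the_geom")
    polygon = ET.SubElement(geom, f"{{{GML}}}Polygon", srsName=srs)
    exterior = ET.SubElement(polygon, f"{{{GML}}}exterior")
    linear_ring = ET.SubElement(exterior, f"{{{GML}}}LinearRing")
    # posList is a flat, space-separated list of coordinate pairs
    ET.SubElement(linear_ring, f"{{{GML}}}posList").text = " ".join(
        f"{a} {b}" for a, b in ring)
    return ET.tostring(tx, encoding="unicode")
```

The server, not the manager, then writes the feature into the shared database, which is what removes the manual email-and-merge step described above.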
WFS is basically operated on a geographical information database; thus, the project manager does not need to associate vector data with spreadsheets. Additionally, with a few configuration settings, the user name and time of each transaction can be recorded automatically. Another useful feature of WFS is that it accepts requests over the Internet into a single geographical database. Owing to this function, we were able to avoid integrating records of visual interpretation sent separately by operators. It also enabled us to add and remove operators for visual interpretation flexibly.
2.3. Client-side system requirements
To avoid complicated operations for drawing urban boundaries, we did not use a fully functional GIS but developed our own GIS interface optimized for our crowd sourcing, focusing on simplicity for beginners. The key technologies are as follows.
Web-based GIS: Most beginners are not familiar with installing GIS software. In addition, the operating environment may place critical constraints on installing software (e.g., the operating system or restrictions set by a system administrator). To solve such problems, we chose to develop a web-based GIS (Web-GIS), which can be operated with only a web browser.
Recently, web-based applications have become popular among Internet users (e.g., applications by Google). They are developed in JavaScript, which runs in common web browsers. Therefore, web-based applications depend less on the operating environment and could be a good solution for crowd-sourcing projects.
Figure 3: Screenshot of the Web-GIS. Left window is for drawing polygons and lines of urban area by visual interpretation of false color composite of ASTER satellite images. Right window is for showing reference map of other map services.
Task management: Task management had to be automated because many operators were expected to join the crowd-sourcing project. Therefore, we developed a system that automatically assigns a task to an operator when he/she logs in to the system or finishes an assignment. We proposed the following task management process (Figure 2). First, an operator accesses the platform and visits the assignment page.
Second, the system automatically assigns a region from the prioritized regions and passes the geographical extent of the region to the Web-GIS. Third, the operator is redirected to the Web-GIS and begins producing ground truth data by visual interpretation. After finishing the assignment, the operator visits the assignment page again to begin the next assignment.
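The assignment steps above can be sketched as a priority queue. This is a hypothetical, in-memory illustration: the class and method names are ours, a lower number is assumed to mean a higher priority, and the actual system kept its state on the server. It also shows one way to handle Challenge 3: a region abandoned by one operator is simply returned to the queue and handed to the next operator who visits the assignment page.

```python
import heapq
from typing import Dict, List, Optional, Tuple


class TaskQueue:
    """Hypothetical sketch: hand out the highest-priority unassigned region."""

    def __init__(self, regions: List[Tuple[int, str]]):
        # (priority, region_id) pairs; lower numbers are served first
        self._heap = list(regions)
        heapq.heapify(self._heap)
        self._priority: Dict[str, int] = {rid: p for p, rid in regions}
        self.assigned: Dict[str, str] = {}  # region_id -> operator

    def assign(self, operator: str) -> Optional[str]:
        """Step 2: called when an operator visits the assignment page."""
        if not self._heap:
            return None  # nothing left to hand out
        _, region = heapq.heappop(self._heap)
        self.assigned[region] = operator
        return region

    def release(self, region: str) -> None:
        """Put an abandoned region back so that another operator receives it."""
        self.assigned.pop(region)
        heapq.heappush(self._heap, (self._priority[region], region))
```

Region identifiers could, for instance, be TMS tile keys, so that the extent passed to the Web-GIS follows directly from the identifier.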
In addition to the operator’s process on the client system, we proposed an approval process by experts for quality assurance. Finished assignments were registered in a queue for approval by an expert. If an output was not good enough to assure quality, it was returned to the operator for revision. This approval process by experts serves as the quality assurance described in Solution 2 in Section 2.1.
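The approval loop can likewise be sketched as a small state machine with a first-in, first-out review queue. The status names and the class below are hypothetical illustrations of the process just described, not the system’s actual data model.

```python
from collections import deque
from enum import Enum
from typing import Deque, Dict


class Status(Enum):
    IN_PROGRESS = "in progress"
    AWAITING_APPROVAL = "awaiting approval"
    RETURNED = "returned for revision"
    APPROVED = "approved"


class ApprovalQueue:
    """Hypothetical sketch of expert approval with return-for-revision."""

    def __init__(self) -> None:
        self.status: Dict[str, Status] = {}
        self._queue: Deque[str] = deque()  # oldest submission is reviewed first

    def start(self, region: str) -> None:
        self.status[region] = Status.IN_PROGRESS

    def submit(self, region: str) -> None:
        # Fresh work and revisions re-enter the same review queue
        if self.status[region] not in (Status.IN_PROGRESS, Status.RETURNED):
            raise ValueError(f"{region} cannot be submitted from {self.status[region]}")
        self.status[region] = Status.AWAITING_APPROVAL
        self._queue.append(region)

    def review(self, accept: bool) -> str:
        """An expert takes the oldest submission and approves or returns it."""
        region = self._queue.popleft()
        self.status[region] = Status.APPROVED if accept else Status.RETURNED
        return region
```

An operator’s achievement list can then be read directly from `status`, which shows each output together with its approval state.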
3. SYSTEM IMPLEMENTATION
3.1. System implementation of server side
We implemented the server-side system with WMS and WFS using the following open source GIS software: MapServer (https://mapserver.org), PostgreSQL (https://www.postgresql.org), PostGIS, and GeoServer (https://www.geoserver.org). Figure 1 shows an overview of the implementation.
MapServer is one of the most popular WMS server software packages. We selected it for the system because of three attractive functions. First, it supports the various image file formats handled by the Geospatial Data Abstraction Library (GDAL; https://www.gdal.org). Second, for a requested extent, it dynamically assembles separate image files into a single layer with minimal computation. This matters for global urban area mapping: because urban areas are sparsely distributed, mosaicking the reference images in advance would yield many unnecessary pixels. Third, it dynamically constructs a look-up table for the RGB composite to enhance contrast within a requested extent. This function is also useful for our project because it avoids preparing a huge number of reference images enhanced scene by scene.
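The second and third features can be illustrated in a few lines: select only the scenes whose footprints intersect the requested extent, and compute a contrast stretch from the pixels of that extent alone. This is a simplified sketch of the idea, not MapServer’s implementation; the footprint index, the 2nd and 98th percentile defaults, and the single-band look-up table are assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple

BBox = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def intersects(a: BBox, b: BBox) -> bool:
    """Boxes overlap unless one lies entirely to one side of the other."""
    return not (a[2] <= b[0] or b[2] <= a[0] or a[3] <= b[1] or b[3] <= a[1])


def scenes_for_extent(index: Dict[str, BBox], extent: BBox) -> List[str]:
    """Only the scene files whose footprint touches the requested extent."""
    return sorted(name for name, footprint in index.items()
                  if intersects(footprint, extent))


def stretch_lut(values: Sequence[int], lo_pct: float = 2.0,
                hi_pct: float = 98.0) -> List[int]:
    """256-entry linear stretch computed from the pixels inside the extent only."""
    if not values:
        raise ValueError("no pixels in the requested extent")
    ordered = sorted(values)
    lo = ordered[int(lo_pct / 100 * (len(ordered) - 1))]
    hi = ordered[int(hi_pct / 100 * (len(ordered) - 1))]
    span = max(hi - lo, 1)  # avoid division by zero on flat images
    # Map lo..hi onto 0..255 and clip values outside that range
    return [min(255, max(0, round((v - lo) * 255 / span))) for v in range(256)]
```

Computing the stretch per request is what lets a dark desert city and a bright coastal city each get usable contrast without storing separately enhanced copies of every scene.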
Figure 4: Screenshot of the assignment interface.
We used false color composites of ASTER/VNIR satellite images archived in GEO Grid, AIST, Japan (Yamamoto et al. 2006). A total of 11,802 scenes covering 3,372 cities of the world were indexed in the MapServer catalogue. To manage ground truth data, we set up a relational database management system (RDBMS) for spatial data using PostgreSQL and PostGIS. PostgreSQL has a long history of development by an open community and is one of the leading RDBMSs. PostGIS is a spatial data extension of PostgreSQL. Owing to its open development, it offers flexible compatibility and extensibility with WFS software. In addition, it provides useful functions for querying and manipulating spatial data, which help analyze and summarize crowd-sourced outputs.
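As an example of the summaries mentioned above, the sketch below totals digitized area per operator and class. In the actual system such a summary would presumably be a PostGIS query (for instance using `ST_Area`); here the same arithmetic is written in pure Python with the shoelace formula. It assumes polygon coordinates are already in a projected coordinate system in metres, since the shoelace formula applied to raw longitude/latitude does not give ground area.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Ring = List[Tuple[float, float]]


def ring_area(ring: Ring) -> float:
    """Shoelace formula: planar area of a simple polygon (coordinate units squared)."""
    s = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def area_by_operator_and_class(
        records: Iterable[Tuple[str, str, Ring]]) -> Dict[Tuple[str, str], float]:
    """records: (operator, label, ring) tuples -> {(operator, label): total area}."""
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for operator, label, ring in records:
        totals[(operator, label)] += ring_area(ring)
    return dict(totals)
```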
For implementing WFS, we used GeoServer, a highly functional server for geospatial data. Its WFS function is compatible with PostGIS databases, so we could easily set up the WFS interface for the system.
The server-side system was constructed on Debian GNU/Linux 6.0.2 with Apache 2.2.16. All of the software except GeoServer was available as precompiled binary packages, so stable installation was very easy.
3.2. System implementation of client side
The Web-GIS client of the system was developed as a web application that runs in a web browser.
Operators can use the Web-GIS on any operating system without installing specialized software. We confirmed that the Web-GIS client works with Google Chrome and Mozilla Firefox, both of which are common web browsers available for several operating systems, including Windows, Mac OS, and Linux. We also confirmed that the Web-GIS works correctly on an iOS tablet. Figure 3 shows a screenshot of the Web-GIS client on Google Chrome.
We developed the Web-GIS with OpenLayers (https://openlayers.org/), a JavaScript library for developing web-based GIS. It can display WMS, WFS, and external map services such as Google, Bing, and Yahoo. In addition, it can edit vector data via WFS. We developed the user interface with only the minimum functions needed, to keep the procedures streamlined. Polygons and lines drawn in the Web-GIS were saved automatically, which helped avoid losing work due to incidental problems such as a web browser crash or loss of Internet connection. Also, to assure quality through a fixed map scale, operators could begin editing polygons and lines in the Web-GIS only at a map scale of 1:10,000.
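The fixed-scale rule can be expressed as a check on the current map resolution. The sketch below converts ground resolution (metres per screen pixel) to a scale denominator using the 0.28 mm standardized rendering pixel size defined in OGC specifications, and enables editing only near 1:10,000. The tolerance and function names are hypothetical, and client libraries such as OpenLayers may assume a different screen resolution, so the conversion factor would have to match whatever the Web-GIS actually used.

```python
OGC_PIXEL_SIZE_M = 0.00028  # 0.28 mm standardized rendering pixel used by OGC specs


def scale_denominator(resolution_m_per_px: float,
                      pixel_size_m: float = OGC_PIXEL_SIZE_M) -> float:
    """N in the map scale 1:N for a view showing this many metres per screen pixel."""
    return resolution_m_per_px / pixel_size_m


def editing_enabled(resolution_m_per_px: float, required: float = 10_000,
                    tolerance: float = 0.05) -> bool:
    """Hypothetical guard: allow drawing only when the view is close to 1:10,000."""
    return abs(scale_denominator(resolution_m_per_px) - required) <= tolerance * required
```

Under this convention, 1:10,000 corresponds to about 2.8 m of ground per screen pixel, so every operator digitizes with roughly the same precision.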
Figure 5: Screenshot of the achievement list.
The assignment interface is shown in Figure 4. Simply by accessing this interface, an operator was assigned one of the prioritized regions. The operator began producing ground truth data by clicking a hyperlink to the Web-GIS for the extent of the assigned region. Figure 5 shows a screenshot of the achievement list in a web browser, which was developed as an implementation of Solution 4 in Section 2.1. Operators could check their outputs and their approval status by experts. This interface also provides hyperlinks to the Web-GIS for revising outputs.
4. EXPERIMENT RESULTS
Experimental work was conducted for prioritized regions that had major errors in global-scale urban area mapping using ASTER satellite data (Miyazaki et al. 2011). Assignment units were defined by zoom levels 10, 12, and 13 of the TMS scheme (80 km × 80 km, 20 km × 20 km, and 10 km × 10 km, respectively).
Figure 6 shows the distribution of working time by assignment unit. We conducted experimental operation on 12, 38, and 92 tiles at zoom levels 10, 12, and 13, respectively. The median working times were 87.2, 6.2, and 1.4 hours, the means were 117.8, 6.8, and 1.7 hours, and the standard deviations were 72.4, 4.7, and 1.4 hours, respectively. Working time per unit decreased with the size of the unit. The statistics also show that working time varied widely among tiles.
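A back-of-the-envelope use of the reported medians: if working time is assumed to scale with tile area, the medians can be converted to hours per unit area and used to estimate the effort needed to cover a larger region, or the tile side that would take about one hour. This is a rough sketch only; medians of individual tiles do not add up to the median of a total, and the large standard deviations reported above mean that individual tiles can differ greatly from these figures.

```python
# Reported median working time (hours) per assignment, keyed by tile side (km)
MEDIAN_HOURS = {80: 87.2, 20: 6.2, 10: 1.4}


def hours_per_km2(side_km: float, hours: float) -> float:
    return hours / side_km ** 2


def hours_to_cover(area_km2: float, side_km: int) -> float:
    """Rough total: tiles needed x median hours per tile (assumes time scales with area)."""
    n_tiles = area_km2 / side_km ** 2
    return n_tiles * MEDIAN_HOURS[side_km]


def side_for_target_hours(target_h: float, side_km: int) -> float:
    """Tile side that would take about target_h, if time were proportional to area."""
    rate = hours_per_km2(side_km, MEDIAN_HOURS[side_km])
    return (target_h / rate) ** 0.5


if __name__ == "__main__":
    for side, h in MEDIAN_HOURS.items():
        print(f"{side} km tiles: {hours_per_km2(side, h) * 100:.2f} h per 100 km^2, "
              f"{hours_to_cover(80 * 80, side):.1f} h per 80 km x 80 km")
    print(f"~1 h tile side (from the 10 km rate): {side_for_target_hours(1.0, 10):.1f} km")
```

With these medians, covering one 80 km × 80 km area works out to about 87 hours as a single unit, about 99 hours as 16 units of 20 km, and about 90 hours as 64 units of 10 km, and a one-hour unit would be roughly 8.5 km on a side.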
5. DISCUSSIONS AND CONCLUSION
The system as a whole was successfully developed and operated for the experiment described above. However, we note that several incidents affected system operation. To make system operation securely sustainable, cloud services such as IaaS (Infrastructure as a Service) and PaaS (Platform as a Service) are possible solutions for outsourcing basic parts of the system operation.
According to the results of the experiment, the work unit has to be smaller than 10 km × 10 km for an assignment to take about one hour. Much remains to be investigated regarding the conditions of assigned regions, such as the complexity of urban areas and the quality of satellite images. In addition, operators’ background experience in remote sensing is likely to be a significant factor in work efficiency.
Although further investigation is required for the practical implementation of crowd-sourcing projects, these results would help estimate the total working time of an urban area mapping project by crowd sourcing. These results may also be applicable to other land cover classes.
Figure 6: Distribution of working time by the size of assigned region.
For further study, we suggest the following issues: (i) investigating how operators learn visual interpretation and (ii) applying a ‘gamification’ approach, in which crowds are motivated to complete the tasks of projects.
ACKNOWLEDGEMENT
This research used ASTER Data beta processed by the AIST GEO Grid from ASTER Data owned by the Ministry of Economy, Trade and Industry of Japan.
REFERENCES
- Angel, S., Sheppard, S.C. & Civco, D.L., 2005. The Dynamics of Global Urban Expansion.
- Coleman, D.J., Georgiadou, Y. & Labonte, J., 2009. Volunteered Geographic Information: The Nature and Motivation of Produsers. International Journal of Spatial Data Infrastructures Research, 4(2009), pp.332–358.
- Fritz, S. et al., 2009. Geo-Wiki.Org: The Use of Crowdsourcing to Improve Global Land Cover. Remote Sensing, 1(3), pp.345–354.
- Girres, J.-F. & Touya, G., 2010. Quality Assessment of the French OpenStreetMap Dataset. Transactions in GIS, 14(4), pp.435–459.
- Goodchild, M.F., 2008. Commentary: whither VGI? GeoJournal, 72(3-4), pp.239–244.
- Haklay, M., 2010. How good is volunteered geographical information? A comparative study of OpenStreetMap and Ordnance Survey datasets. Environment and Planning B: Planning and Design, 37(4), pp.682–703.
- Haklay, M. & Weber, P., 2008. OpenStreetMap: User-Generated Street Maps. Pervasive Computing, IEEE, 7(4), pp.12–18.
- Hand, E., 2010. Citizen science: People power. Nature, 466(7307), pp.685–7.
- Iwao, K. et al., 2006. Validating land cover maps with Degree Confluence Project information. Geophysical Research Letters, 33, p.L23404.
- Miyazaki, H., Itabashi, K., et al., 2011. High-Resolution Urban Area Map for 3372 Cities of the World. In 32nd Asian Conference on Remote Sensing. Taipei, p. PS–3.
- Miyazaki, H., Iwao, K. & Shibasaki, R., 2011. Development of a New Ground Truth Database for Global Urban Area Mapping from a Gazetteer. Remote Sensing, 3(6), pp.1177–1187.
- Open Geospatial Consortium, 2011a. Web Feature Service. Open Geospatial Consortium. Available at: https://www.opengeospatial.org/standards/wfs [Accessed August 31, 2011].
- Open Geospatial Consortium, 2011b. Web Map Service. Open Geospatial Consortium. Available at: https://www.opengeospatial.org/standards/wms [Accessed August 31, 2011].
- Orenstein, D. et al., 2011. How much is built? Quantifying and interpreting patterns of built space from different data sources. International Journal of Remote Sensing, 32(9), pp.2621–2644.
- Schneider, A., Friedl, M.A. & Potere, D., 2009. A new map of global urban extent from MODIS satellite data. Environmental Research Letters, 4(4), p.44003.
- Yamamoto, N. et al., 2006. GEO Grid: Grid Infrastructure for Integration of Huge Satellite Imagery and Geoscience Data Sets. In Proceedings of The Sixth IEEE International Conference on Computer and Information Technology. p. 75.