Experiments on the city of Milan and results (OSMit 2016)
OSMit 2016
Milan, 20 May 2016

OpenStreetMap data quality: experiments on the city of Milan and results
Marco Minghini & Monia Elisa Molinari
Politecnico di Milano | GEOlab

Outline
✔ We will present some ongoing research on OpenStreetMap:
  ➔ An automated procedure to compare OSM and authoritative road network datasets
  ➔ An automated methodology for converting OSM data into a Land Use/Cover map
  ➔ Positional accuracy assessment of the OSM buildings in Milan
  ➔ Open geodata quality in Milan
GEOlab, Como Campus

An automated procedure to compare OSM and authoritative road network datasets
Monia Elisa Molinari, Marco Minghini, Maria Antonia Brovelli
Politecnico di Milano, Como Campus, DICA, via Valleggio 11, 22100 Como (Italy)

Motivation of the work – VGI & OSM quality
✔ Increasing popularity of OpenStreetMap (OSM) as today's most notable Volunteered Geographic Information (VGI) project on the Internet
✔ Increasing concern about VGI (and OSM) data quality:
  ➔ spatial accuracy
  ➔ completeness
  ➔ temporal accuracy
  ➔ semantic accuracy
  ➔ up-to-dateness
✔ Increasing availability of open data from NMAs and CSC that can be used as a source of comparison for VGI (and OSM) data:
  ➔ comparing two spatial datasets against each other is a challenging geocomputation problem!
GEOlab, Politecnico di Milano – Como Campus

Motivation of the work – OSM comparisons
✔ The literature provides plenty of works assessing OSM quality or comparing it against that of authoritative datasets:
  ➔ strongly focused on the road network
  ➔ OSM compared to data from NMAs (UK Ordnance Survey, French NMA, USGS TNM/TIGER, etc.) and CSC (Navteq, TeleAtlas, etc.)
  ➔ semi- or fully-automated
✔ Comparison techniques are very strong and fit for purpose, but mostly application- and dataset-specific:
  ➔ hard to replicate
  ➔ difficult to extend to other dataset comparisons

Our methodology
✔ Novel methodology to compare OSM and authoritative road datasets:
  ➔ fully automated
  ➔ focused on spatial accuracy and completeness
  ➔ flexible, i.e. not developed for a specific dataset
  ➔ built with FOSS4G (Free and Open Source Software for Geospatial)
    ✗ reusable and extensible in case of need

Our methodology – Overview
✔ Currently developed as 3 GRASS GIS modules:
  ➔ written in Python
  ➔ available with a Graphical User Interface (GUI)
✔ The comparison between OSM and reference road network datasets is composed of 3 consecutive steps:
  ➔ 1. Preliminary comparison of the datasets and computation of global statistics
  ➔ 2. Geometric preprocessing of the OSM dataset to extract a subset which is fully comparable with the reference dataset
  ➔ 3. Evaluation of OSM spatial accuracy using a grid-based approach

Case study: Paris
[Figure: data © IGN and © OpenStreetMap contributors]

Step 1: Preliminary comparison of the datasets
✔ Compute the total length of the OSM and IGN datasets and their length difference, both in map units and as a percentage
  ➔ output values are returned in a text file
    ✗ ≅450 km more in the OSM than in the IGN dataset!
✔ Compute the length of the OSM and IGN datasets contained in a user-specified buffer around the IGN and OSM dataset, respectively
  ➔ output values are returned in a text file
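
The two Step 1 computations can be illustrated with a minimal sketch in plain Python. This is not the code of the GRASS GIS modules mentioned above. It assumes that each road is a list of (x, y) vertices in projected map units, all function names are hypothetical, and the buffer test is approximated by sampling short pieces of each road instead of performing a true buffer overlay.

```python
import math


def line_length(line):
    """Length of a polyline given as a list of (x, y) vertices (map units)."""
    return sum(math.dist(a, b) for a, b in zip(line, line[1:]))


def total_length(dataset):
    """Total length of a dataset, i.e. a list of polylines."""
    return sum(line_length(line) for line in dataset)


def length_comparison(osm, ref):
    """Total lengths and their difference, in map units and as a percentage."""
    osm_len, ref_len = total_length(osm), total_length(ref)
    diff = osm_len - ref_len
    return {"osm": osm_len, "ref": ref_len,
            "diff": diff, "diff_pct": 100.0 * diff / ref_len}


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg2 = dx * dx + dy * dy
    if seg2 == 0.0:
        return math.dist(p, a)
    # Parameter of the orthogonal projection, clamped to the segment.
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg2
    t = max(0.0, min(1.0, t))
    return math.dist(p, (a[0] + t * dx, a[1] + t * dy))


def length_within_buffer(lines, other, width, step=1.0):
    """Approximate length of `lines` lying within `width` of `other`.

    Assumption: each segment is cut into pieces of at most `step` map units
    and a piece counts as inside when its midpoint is within `width`.
    """
    segments = [(a, b) for line in other for a, b in zip(line, line[1:])]
    inside = 0.0
    for line in lines:
        for a, b in zip(line, line[1:]):
            seg_len = math.dist(a, b)
            n = max(1, math.ceil(seg_len / step))
            for i in range(n):
                t = (i + 0.5) / n
                mid = (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
                if any(point_segment_distance(mid, s, e) <= width
                       for s, e in segments):
                    inside += seg_len / n
    return inside


# Toy data (hypothetical): one reference road, two OSM roads.
ref_roads = [[(0.0, 0.0), (100.0, 0.0)]]
osm_roads = [[(0.0, 2.0), (100.0, 2.0)], [(50.0, 2.0), (50.0, 52.0)]]

stats = length_comparison(osm_roads, ref_roads)
inside = length_within_buffer(osm_roads, ref_roads, width=5.0)
print(stats)
print(inside)
```

In the toy data the second OSM road runs perpendicular to the reference road, so only its first few metres fall inside the 5 m buffer. This is the kind of surplus length that the buffer-based statistic is meant to expose.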
Step 2: Preprocessing of the OSM dataset
✔ Cleaning of the OSM dataset to make it comparable with the IGN dataset
✔ Apply a buffer of user-specified width around each IGN segment
  ➔ suitable buffer width derived from Step 1
  ➔ delete all the OSM roads falling outside the buffer
✔ Further clean the OSM dataset:
  ➔ compute the angular coefficient of each IGN segment and all the OSM segments included in the buffer around it
  ➔ compare the difference between IGN and OSM angular coefficients with a user-specified threshold
✔ Outputs from Step 2 are saved and can be used for further analysis:
  ➔ Area 2: buffer = 11 m, angular coefficient threshold = 30°
    ✗ preprocessed OSM has ≅50 km less than original OSM
    ✗ preprocessed OSM still has ≅50 km more than IGN

Step 3: Grid-based evaluation of OSM accuracy
✔ Use a grid to take into account the heterogeneous nature of OSM:
  ➔ import a vector layer to be used as grid
✔ For each grid cell, find the OSM maximum deviation from IGN:
  ➔ Area 2: perc = 98%
  [Map legend: 5-6 m, 6-7 m, 7-8 m, 8-9 m, 9-10 m, 10-11 m]
✔ For each grid cell, evaluate OSM accuracy against one or more threshold values of OSM deviation from IGN:
  ➔ length percentage of OSM roads included in the threshold buffer
  ➔ Area 2: tol_eval = 6 m
  [Map legend: 85-90%, 90-95%, 95-100%]

Transposition of the procedure as a WPS
✔ Work in progress, currently available just for Step 1
  ➔ available at http://131.175.143.84/WPS

Future work
  ➔ reduce computational time (especially for Step 2)
  ➔ increase usability through a WPS implementation (also for Steps 2 & 3)
  ➔ extend the procedure to also compare attributes

Links & publications
✔ Links:
  ➔ source code: https://github.com/MoniaMolinari/OSM-roads-comparison
  ➔ WPS client: http://131.175.143.84/WPS
✔ Related publications:
  ➔ Brovelli M. A., Minghini M., Molinari M. and Mooney P. (in press) Towards an automated comparison of OpenStreetMap with authoritative road datasets. Transactions in GIS.
  ➔ Antunes F., Fonte C. C., Brovelli M. A., Minghini M., Molinari M. and Mooney P. (2015) Assessing OSM Road Positional Quality with Authoritative Data. Proceedings of the VIII Conferência Nacional de Cartografia e Geodesia, Lisbon (Portugal), October 29-30, 2015.
  ➔ Brovelli M. A., Minghini M., Molinari M. and Mooney P. (2015) A FOSS4G-based procedure to compare OpenStreetMap and authoritative road network datasets. Geomatics Workbooks 12, 235-238, ISSN 1591-092X.

An automated methodology for converting OSM data into a Land Use/Cover map
Marco Minghini, Cidália Fonte, Vyron Antoniou, Linda See, Joaquim Patriarca, Maria Antonia Brovelli, Grega Milcinski

EU COST Actions on VGI
✔ COST (Cooperation in Science and Technology) is an EU framework to provide networking of nationally funded research activities.
✔ We are currently involved in 2 EU COST Actions on Volunteered Geographic Information (VGI):
  ➔ COST Action TD1202 - “Mapping and the Citizen Sensor” (http://www.citizensensor-cost.eu)
  ➔ COST Action IC1203 - “European Network Exploring Research into Geospatial Information Crowdsourcing: software and methodologies for harnessing geographic information from the crowd (ENERGIC)” (http://vgibox.eu)

Land Use/Land Cover (LULC) maps
✔ LULC maps are crucial products for multiple areas of application:
  ➔ modeling climate and biochemistry of the Earth
  ➔ natural resources management
  ➔ planning/urban studies
  ➔ many others
✔ LULC maps are created through the classification of satellite imagery and validated using reference data:
  ➔ the creation and updating process is long, costly and time-consuming, hence insufficient to describe rapidly changing environments
  ➔ level of detail and spatial coverage are inadequate for many applications

OSM as a source of LULC maps
✔ Exploiting OSM as a source for LULC maps has a number of advantages:
  ➔ OSM full spatial coverage in the world
  ➔ OSM richness
  ➔ OSM non-stop updating
  ➔ OSM open license
✔ Exploiting OSM as a source for LULC maps has some disadvantages:
  ➔ OSM uneven spatial coverage
  ➔ OSM positional accuracy & geometrical inconsistencies
  ➔ OSM semantic inconsistencies
✔ Purpose: creating an automated procedure which converts OSM data in a specific area into a LULC map

OSM as a source of LULC maps
✔ The nomenclature chosen for LULC classes is that of the EU Urban Atlas:

Level 1
  1. Artificial Surfaces
  2. Agricultural, semi-natural areas, wetlands
  3. Forests
  5. Water

Level 2 and Level 3 (class 1, Artificial Surfaces)
  1.1 Urban Fabric
    1.1.1 Continuous urban fabric
    1.1.2 Discontinuous urban fabric
    1.1.3 Isolated Structures
  1.2 Industrial, commercial, public, military, private and transport units
    1.2.1 Industrial, commercial, public, military and private units
    1.2.2 Road and rail network and associated land
    1.2.3 Port areas
    1.2.4 Airports
  1.3 Mine, dump and construction sites
    1.3.1 Mineral extraction and dump sites
    1.3.3 Construction sites
    1.3.4 Land without current use
  1.4 Artificial non-agricultural vegetated areas
    1.4.1 Green urban areas
    1.4.2 Sports and leisure facilities

Procedure to convert OSM to LULC maps
✔ The procedure consists of transforming available OSM features into LULC classes corresponding to Urban Atlas (UA) levels 1 and 2
  ➔ this is done considering any OSM data that might be associated with classes in all three levels of the UA nomenclature
✔ At this initial stage of the procedure, the only keys considered are:
  ➔ OSM ways (polylines)
    ✗ highway
    ✗ railway
    ✗ waterway
  ➔ OSM ways (polygons)
    ✗ building
    ✗ landuse
    ✗ natural
✔ The procedure is implemented in Python and makes use of GRASS GIS, GDAL/OGR Python bindings and PostgreSQL/PostGIS.
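
As a rough illustration of the key selection listed above, the sketch below filters features by geometry type and key. It is plain Python and does not reproduce the authors' GRASS GIS / GDAL / PostGIS implementation. The feature structure (dicts with `geometry` and `tags`) and the function name are assumptions made only for this example.

```python
# Keys considered at this stage, grouped by geometry type (as listed above).
CONSIDERED_KEYS = {
    "line": ("highway", "railway", "waterway"),
    "polygon": ("building", "landuse", "natural"),
}


def select_features(features):
    """Group features by the first considered key they carry.

    `features` is a hypothetical list of dicts with a 'geometry' entry
    ('line' or 'polygon') and a 'tags' dict of OSM key/value pairs.
    Features without a considered key for their geometry type are dropped.
    """
    selected = {}
    for feature in features:
        for key in CONSIDERED_KEYS.get(feature["geometry"], ()):
            if key in feature["tags"]:
                selected.setdefault(key, []).append(feature)
                break
    return selected


sample = [
    {"geometry": "line", "tags": {"highway": "primary"}},
    {"geometry": "polygon", "tags": {"landuse": "forest"}},
    # Dropped: 'amenity' is not among the considered keys.
    {"geometry": "polygon", "tags": {"amenity": "school"}},
    # Dropped: 'building' is only considered for polygons.
    {"geometry": "line", "tags": {"building": "yes"}},
]

groups = select_features(sample)
print(sorted(groups))
```

Features carrying any other key are ignored at this stage, which is one reason why empty areas can remain in the resulting map.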
Procedure to convert OSM to LULC maps
✔ The main steps of the procedure are:
  ➔ identification of the keys available in the OSM data to be processed
  ➔ conversion of the linear features into areas using spatial analysis, and merging them with areas in the polygon features that have values of predefined keys corresponding to the themes of the linear features
  ➔ conversion of the polygon features to LULC classes according to themes
  ➔ application of priority rules to solve remaining inconsistencies
  ➔ conversion of the map into the appropriate Minimum Mapping Unit (MMU), merging small features with their neighboring features

Example: roads (highway=*) → UA class 1.2
✔ The main steps of the conversion are:
  ➔ choose the values of the “highway” key to be considered for the conversion
  ➔ identify a maximum and a typical width for each road type in the region of interest (typically larger for primary roads and smaller for less important roads); predefined default values may also be considered
  ➔ compute the distance between each road segment and the buildings within the maximum defined width of roads; store the minimum value
  ➔ generate an area feature for each road segment where the distance to buildings is larger than zero, using the width to generate a buffer; for those segments where the distance to buildings was not obtained, the typical width chosen above is applied
  ➔ merge and dissolve the created buffers

Procedure to convert OSM to LULC maps
✔ Priority rules are defined to solve remaining inconsistencies:
  ➔ some classes prevail over others according to a hierarchy of importance

Priority   UA class (level 2)   Class description
1          1.2                  Industrial, commercial, public, military, private and transport units
2          5.0                  Water
3          1.4                  Artificial non-agricultural vegetated areas
4          1.3                  Mine, dump and construction sites
5          1.1                  Urban Fabric
6          2.0                  Agricultural, semi-natural areas, wetlands
7          3.0                  Forests

Case studies
✔ Two square areas with a side of 10 km (100 km² each):
  ➔ 60 km NW of Paris: agricultural and forest region, low urban density
  ➔ Milan city centre: very dense urban area, small agricultural and rural areas, parks and a cemetery

Results
✔ Paris [figure]
✔ Milan [figure]

Conclusions
✔ Results are generally satisfactory:
  ➔ very good correspondence with UA classes
  ➔ OSM advantages: up-to-dateness, higher resolution data
✔ There are a number of problems:
  ➔ empty areas in the final LULC map
    ✗ no OSM data at all
    ✗ use of GeoFabrik data, where some data are missing
    ✗ many tags not (yet) taken into account
  ➔ values chosen for buffers can be highly dependent on specific regions
✔ Next/future developments:
  ➔ switch to the Planet OSM file
  ➔ increase/detail the number of OSM tags considered
  ➔ implementation of a Web service to expose the procedure on the Web

Reference
✔ Fonte C.C., Minghini M., Antoniou V., See L., Patriarca J., Brovelli M.A. & Milcinski G. (in press) “An automated methodology for converting OSM data into a Land Use/Cover map”. Proceedings of the 6th International Conference on Cartography & GIS, Albena, Bulgaria, 13-17 June 2016.

Positional accuracy assessment of the OSM buildings in Milan
Monia Elisa Molinari, Marco Minghini, Maria Antonia Brovelli, Giorgio Zamboni
Politecnico di Milano | GEOlab

Introduction
✔ The earliest works on OSM quality assessment all focused on streets, which were the initial mapping target of the OSM project.
✔ In recent years, research on other objects of the OSM database has begun to appear:
  ➔ points of interest
  ➔ land use features
  ➔ building footprints

Objective
✔ The purpose of this study is to contribute to the quality assessment of OSM building data in Milan.
✔ The quality assessment has been performed by comparing the OSM data (downloaded in January 2016) with the building layer of the official vector cartography of Milan Municipality (produced in 2012)
✔ Two different analyses have been performed:
  ➔ Completeness evaluation based on methods suggested by the literature
  ➔ Positional accuracy evaluation by means of a novel approach developed at GEOlab - PoliMI

Completeness assessment
✔ The completeness analysis has been performed through the unit-based area ratio method proposed by Hecht et al. (2013):

  $C = A_{OSM} / A_{REF}$

  C = completeness
  A_REF = total area of reference buildings
  A_OSM = total area of OSM buildings

✔ The completeness parameter should be calculated within a defined spatial unit (e.g. administrative or geometrical) in order to take into account the heterogeneity of OSM data.

Completeness assessment
✔ The area ratio method can overestimate C because of excess data available in OSM. For this reason the computation of three additional rates is recommended:
  ➔ True Positive (TP): the areas of agreement between the datasets
  ➔ False Positive (FP): the OSM building areas which do not exist in the REF dataset
  ➔ False Negative (FN): the REF building areas which do not exist in the OSM dataset

Completeness analysis results
✔ Spatial distribution of the C rate in the Milan area

C values           %
C > 100%           28.9%
80% < C < 100%     27.7%
60% < C < 80%      18.5%
40% < C < 60%      8.0%
C < 40%            16.9%

✔ The completeness of the OSM dataset is very high in the city center and gradually decreases when moving towards the periphery.
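
A minimal sketch of the completeness measures for one spatial unit follows. It makes two simplifying assumptions that are not in the slides: building footprints are rasterized into sets of equal-area grid cells, so that overlays become set operations, and the TP, FP and FN areas are all expressed relative to A_REF. The function and variable names are hypothetical.

```python
def completeness_rates(osm_cells, ref_cells, cell_area=1.0):
    """Area-ratio completeness and TP/FP/FN rates for one spatial unit.

    `osm_cells` and `ref_cells` are sets of (col, row) cells covered by
    OSM and reference buildings (a rasterized simplification).
    Assumption: every rate is normalized by the reference area A_REF.
    """
    a_osm = len(osm_cells) * cell_area
    a_ref = len(ref_cells) * cell_area
    if a_ref == 0:
        raise ValueError("no reference buildings in this spatial unit")
    tp = len(osm_cells & ref_cells) * cell_area  # areas of agreement
    fp = len(osm_cells - ref_cells) * cell_area  # in OSM but not in REF
    fn = len(ref_cells - osm_cells) * cell_area  # in REF but not in OSM
    return {"C": a_osm / a_ref, "TP": tp / a_ref,
            "FP": fp / a_ref, "FN": fn / a_ref}


# Toy unit: a 4 x 4 reference block and an OSM block of the same size
# shifted by two cells, so that only half of the area overlaps.
ref_cells = {(x, y) for x in range(4) for y in range(4)}
osm_cells = {(x, y) for x in range(2, 6) for y in range(4)}

rates = completeness_rates(osm_cells, ref_cells)
print(rates)
```

In this toy unit the two datasets have the same total area, so C alone would suggest full completeness, while the TP rate shows that only half of the reference area is matched. This is the overestimation effect that the additional rates are meant to reveal.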
TP analysis results
✔ Spatial distribution of the TP rate in the Milan area

TP values          %
60% < TP < 100%    63.9%
40% < TP < 60%     16.0%
TP < 40%           20.1%

✔ Results largely confirm the trend observed for C: OSM completeness is high in the city center and gradually lower in the peripheral areas

Positional accuracy assessment
✔ An advanced algorithm implemented in an application (Brovelli and Zamboni, 2004) allowed:
  ➔ the quasi-automated detection of homologous points between REF and OSM by means of geometric, topological and semantic analyses
  ➔ the application of a set of warping transformations to the OSM dataset in order to optimize its match with the REF dataset

Positional accuracy analysis results
✔ The number of homologous pairs detected is approximately 100,000

Cell   Points   Transf.   ΔY μ [m]   ΔX μ [m]   d μ [m]
0      19135    None      0.45       0.46       0.81
1-2    16480    None      0.35       0.46       0.77
2      18732    None      0.44       0.43       0.79
3-1    4318     None      0.28       0.41       0.71

✔ The positional accuracy is the same in both Milan center and periphery

Positional accuracy analysis results
✔ Using the homologous points detected, it is possible to estimate the parameters of an affine or spline MR transformation to remove the systematic translation and reduce the mean distance

Cell   Points   Transf.     ΔY μ [m]   ΔX μ [m]   d μ [m]
0      19135    None        0.45       0.46       0.81
                Affine      0.00       0.00       0.56
                Spline MR   0.00       0.00       0.50
1-2    16480    None        0.35       0.46       0.77
                Affine      0.00       0.00       0.57
                Spline MR   0.00       0.00       0.50
2      18732    None        0.44       0.43       0.79
                Affine      0.00       0.00       0.55
                Spline MR   0.00       0.00       0.49
3-1    4318     None        0.28       0.41       0.71
                Affine      0.00       0.00       0.55
                Spline MR   0.00       0.00       0.48

References
✔ Brovelli M.A., Minghini M., Molinari M.E. & Zamboni G. (in press) “Positional accuracy assessment of the OpenStreetMap buildings layer through automatic homologous pair detection: the method and a case study”. International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences.
✔ Brovelli M.A. & Zamboni G. (2004) “A step towards geographic interoperability: the automatic detection of maps homologous pairs”. Proceedings of Urban Data Management Society Conference (UDMS ’04), Chioggia, Italy, 27-29 October 2004.
✔ Hecht C., Kunze C. & Hahmann S. (2013) “Measuring Completeness of Building Footprints in OpenStreetMap over Space and Time”. ISPRS International Journal of Geo-Information, 2(4), pp. 1066-1091.

Open geodata quality in Milan
Marco Minghini, Monia Elisa Molinari, Miriam Molteni, Maria Antonia Brovelli
Politecnico di Milano | GEOlab

Open (geo)data
✔ Open data (http://opendefinition.org):
  ➔ permissions: use, modification, separation, redistribution, compilation, non-discrimination, propagation, application to any purpose, no charge
  ➔ conditions: attribution, integrity, share-alike, notice, source, technical restriction prohibition, non-aggression
✔ Out of a total of 13 domains of open governance data, the geospatial one has been recognized as the one with the highest commercial value (Carrara, W., Chan, W. S., Fischer, S., van Steenbergen, E., 2015. Creating Value through Open Data: Study on the Impact of Re-use of Public Data Resources. European Commission, European Data Portal, http://www.europeandataportal.eu/sites/default/files/edp_creating_value_through_open_data_0.pdf)
✔ What is the quality of open (geo)data?
  ➔ focus on Milan Municipality (only data published by Italian institutions)
  ➔ quality check performed using ISO guidelines

Open geodata in Milan Municipality
✔ Classification [figures] according to:
  ➔ the provider
  ➔ the content
  ➔ the format
  ➔ the scale
  ➔ the license
  ➔ the year of publication
  ➔ the content vs. the provider

Example of quality evaluation
✔ Example of quality (positional accuracy) evaluation:
  ➔ dataset: orthophoto
  ➔ provider: Italian Environmental Ministry
  ➔ producer: AGEA (Agenzia per le Erogazioni in Agricoltura)
  ➔ content: airborne observations
  ➔ format: Web Map Service (WMS)
  ➔ scale: national
  ➔ license: CC-BY-SA
  ➔ year of publication: 2012
  ➔ declared planimetric accuracy: 4 m (3 m, according to AGEA)
✔ Ground truth dataset for quality check:
  ➔ buildings DBTR Milan Municipality (2012): scale 1:1000, accuracy 20 cm

Example of quality evaluation
✔ Procedure:
  ➔ extract DBTR buildings according to a random stratified sampling on a hexagonal grid (ISO guidelines)
  ➔ identification of the corresponding buildings on the orthophoto and manual digitization of 3 corners of the roof of each building
  ➔ computation of accuracy measures on the homologous points
    ✗ statistics on planimetric errors
    ✗ measures of error confidence

Example of quality evaluation
✔ Statistics on planimetric errors (number of homologous pairs = 1450):

Index   eX        eY        e
μ       1.66 m    3.65 m    4.32 m
Me      1.04 m    2.19 m    3.03 m
σ       2.49 m    5.32 m    4.24 m
min     0.00 m    0.00 m    0.04 m
max     13.03 m   29.70 m   29.94 m
n       245       615       758

Example of quality evaluation
✔ Measures of error confidence:

Index    Value
CE39.4   4.15 m
CE50     4.89 m
CE90     8.91 m
CE95     10.16 m
CE99.8   14.53 m

Example of quality evaluation
✔ Error visualization: [figures]

Reference
✔ Brovelli M.A., Minghini M., Molinari M.E. & Molteni M. (in press) “Do open geodata actually have the quality they declare? The case study of Milan, Italy”. International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences.

Marco Minghini & Monia Elisa Molinari
Politecnico di Milano, GEOlab – Como Campus
Via Valleggio 11, 22100 Como (Italy)
@MarcoMinghini, @MoniaMolinari