sperimentazioni sulla città di Milano e risultati OSMit 2016

Transcript

sperimentazioni sulla città di Milano e risultati OSMit 2016
OSMit 2016
Milano, 20 maggio 2016
Qualità dei dati OpenStreetMap:
sperimentazioni sulla città di Milano e risultati
Marco Minghini & Monia Elisa Molinari
Politecnico di Milano | GEOlab
Outline
✔ We will present some ongoing research works on OpenStreetMap:
➔
An automated procedure to compare OSM and authoritative road
network datasets
➔
An automated methodology for converting OSM data into a Land
Use/Cover map
➔
Positional accuracy assessment of the OSM buildings in Milan
➔
Open geodata quality in Milan
GEOlab, Como Campus
An automated procedure to compare
OSM and authoritative road network
datasets
Monia Elisa Molinari, Marco Minghini, Maria Antonia Brovelli
Politecnico di Milano, Como Campus, DICA, via Valleggio 11, 22100 Como (Italy)
2
Motivation of the work – VGI & OSM quality
✔ Increasing popularity of OpenStreetMap (OSM) as today's most notable
Volunteered Geographic Information (VGI) project on the Internet
✔ Increasing concern on VGI (and OSM) data quality:
➔
➔
➔
➔
➔
spatial accuracy
completeness
temporal accuracy
semantic accuracy
up-to-dateness
✔ Increasing availability of open data from NMAs and CSC that can be used
as a source of comparison for VGI (and OSM) data:
➔
comparing two spatial datasets against each other is a challenging
geocomputation problem!
GEOlab, Politecnico di Milano – Como Campus
3
Motivation of the work – OSM comparisons
✔ Literature provides plenty of works assessing or comparing OSM quality
against that of authoritative datasets:
➔
➔
➔
strongly focused on road network
OSM compared to data from NMA (UK Ordnance Survey, French
NMA, USGS TNM/TIGER, etc.) and CSC (Navteq, TeleAtlas, etc.)
semi- or fully-automated
✔ Comparison techniques are very strong and fit for purpose, but mostly
application and dataset specific:
➔
➔
hard to replicate
difficult to extend to other dataset comparisons
GEOlab, Politecnico di Milano – Como Campus
Our methodology
4
✔ Novel methodology to compare OSM and authoritative road datasets:
➔
➔
➔
➔
fully automated
focused on spatial accuracy and completeness
flexible, i.e. not developed for a specific dataset
built with FOSS4G (Free and Open Source Software for Geospatial)
✗ reusable and extensible in case of need
GEOlab, Politecnico di Milano – Como Campus
Our methodology – Overview
5
✔ Currently developed as 3 GRASS GIS modules:
➔
➔
written in Python
available with a Graphical User Interface (GUI)
✔ Comparison between OSM and reference road network datasets
composed of 3 consecutive steps:
➔
➔
➔
1. Preliminary comparison of the datasets and computation of
global statistics
2. Geometric preprocessing of the OSM dataset to extract a subset
which is fully comparable with the reference dataset
3. Evaluation of OSM spatial accuracy using a grid-based approach
GEOlab, Politecnico di Milano – Como Campus
Case study: Paris
6
data © IGN and © OpenStreetMap contributors
GEOlab, Politecnico di Milano – Como Campus
7
Step 1: Preliminary comparison of the datasets
✔ Compute the total length of OSM and IGN datasets and their length
difference, both in map units and percentage
➔
output values are returned in a text file
✗ ≅450 km more in OSM than IGN dataset!
GEOlab, Politecnico di Milano – Como Campus
8
Step 1: Preliminary comparison of the datasets
✔ Compute the length of OSM and IGN datasets contained in a user-
specified buffer around the IGN and OSM dataset, respectively
➔
output values are returned in a text file
✗ ≅450 km more in OSM than IGN dataset!
GEOlab, Politecnico di Milano – Como Campus
9
Step 2: preprocessing of the OSM dataset
✔ Cleaning of OSM dataset to make it comparable with IGN dataset
✔ Apply a buffer of user-specified width around each IGN segment
➔
➔
suitable buffer width derived from Step 1
delete all the OSM roads falling outside the buffer
GEOlab, Politecnico di Milano – Como Campus
10
Step 2: preprocessing of the OSM dataset
✔ Further clean the OSM dataset:
➔
➔
compute the angular coefficient of each IGN segment and all the
OSM segments included in the buffer around it
compare the difference between IGN and OSM angular coefficients
with a user-specified threshold
GEOlab, Politecnico di Milano – Como Campus
11
Step 2: preprocessing of the OSM dataset
✔ Further clean the OSM dataset:
GEOlab, Politecnico di Milano – Como Campus
12
Step 2: preprocessing of the OSM dataset
✔ Outputs from Step 2 are saved and can be used for further analysis:
➔
Area 2: buffer = 11 m, angular coefficient threshold = 30°
✗ preprocessed OSM has ≅50 km less than original OSM
✗ preprocessed OSM has still ≅50 km more than IGN
GEOlab, Politecnico di Milano – Como Campus
13
Step 3: grid-based evaluation of OSM accuracy
✔ Use a grid to take into account OSM heterogeneous nature:
➔
import a vector layer to be used as grid
GEOlab, Politecnico di Milano – Como Campus
14
Step 3: grid-based evaluation of OSM accuracy
✔ For each grid cell, find the OSM maximum deviation from IGN:
➔
Area 2: perc=98%
5-6 m
6-7 m
7-8 m
8-9 m
9 - 10 m
10 - 11 m
GEOlab, Politecnico di Milano – Como Campus
15
Step 3: grid-based evaluation of OSM accuracy
✔ For each grid cell, evaluate OSM accuracy against one or more threshold
values of OSM deviation from IGN:
➔
➔
length percentage of OSM roads included in the threshold buffer
Area 2: tol_eval = 6 m
85 - 90%
90 - 95%
95 - 100%
GEOlab, Politecnico di Milano – Como Campus
Transposition of the procedure as a WPS
✔ Work in progress, currently available just for Step 1
➔
available at http://131.175.143.84/WPS
GEOlab, Politecnico di Milano – Como Campus
16
Future work
➔
➔
➔
17
reduce computational time (especially for Step 2)
increase usability through a WPS implementation (also for Steps 2 & 3)
extend the procedure to also compare attributes
GEOlab, Politecnico di Milano – Como Campus
Links & publications
18
✔ Links:
➔
➔
source code: https://github.com/MoniaMolinari/OSM-roads-comparison
WPS client: http://131.175.143.84/WPS
✔ Related publications:
➔
➔
➔
Brovelli M. A., Minghini M., Molinari M. and Mooney P (in press).
Towards an automated comparison of OpenStreetMap with
authoritative road datasets. Transactions in GIS.
Antunes F., Fonte C. C., Brovelli M. A., Minghini M., Molinari M. and
Mooney P. (2015) Assessing OSM Road Positional Quality with
Authoritative Data. Proceedings of the VIII Conferência Nacional de
Cartografia e Geodesia, Lisbon (Portugal), October 29-30, 2015.
Brovelli M. A., Minghini M., Molinari M. and Mooney P. (2015) A
FOSS4G-based procedure to compare OpenStreetMap and
authoritative road network datasets. Geomatics Workbooks 12,
235-238, ISSN 1591-092X.
GEOlab, Politecnico di Milano – Como Campus
An automated methodology for converting
OSM data into a Land Use/Cover map
Marco Minghini, Cidália Fonte, Vyron Antoniou, Linda See,
Joaquim Patriarca, Maria Antonia Brovelli, Grega Milcinski
EU COST Actions on VGI
✔ COST (Cooperation in Science and Technology) is an EU framework to
provide networking of nationally funded research activities.
✔ We are currently involved in 2 EU COST Actions on Volunteered
Geographic Information (VGI):
➔
COST Action TD1202 - “Mapping and the Citizen Sensor”
(http://www.citizensensor-cost.eu)
➔
COST Action IC1203 - “European Network Exploring Research into
Geospatial Information Crowdsourcing: software and methodologies
for harnessing geographic information from the crowd (ENERGIC)”
(http://vgibox.eu)
Land Use/Land Cover (LULC) maps
✔ LULC maps are crucial products for multiple areas of application:
➔
modeling climate and biochemistry of the Earth
➔
natural resources managem
➔
planning/urban studies
➔
many others
✔ LULC maps are created through the classification of satellite imagery
and validated using reference data:
➔
the creation and updating process is long, costly, and time-consuming
– insufficient to describe rapidly-changing environments
➔
level of detail and spatial coverage inadequate for many applications
OSM as a source of LULC maps
✔ Exploiting OSM as a source for LULC maps has a number of advantages:
➔
OSM full spatial coverage in the world
➔
OSM richness
➔
OSM non-stop updating
➔
OSM open license
✔ Exploiting OSM as a source for LULC maps has some disadvantages:
➔
OSM uneven spatial coverage
➔
OSM positional accuracy & geometrical inconsistencies
➔
OSM semantic inconsistencies
✔ Purpose: creating an automated procedure which converts OSM data in
a specific area into a LULC map
OSM as a source of LULC maps
✔ The nomenclature chosen for LULC classes is the one of EU Urban Atlas:
Level 1
1. Artificial Surfaces
2. Agricultural, seminatural areas, wetlands
3. Forests
5. Water
Level 2
Level 3
1.1 Urban Fabric
1.1.1 Continuous urban fabric
1.1.2 Discontinuous urban fabric
1.1.3 Isolated Structures
1.2 Industrial, commercial, public,
military, private and transport
units
1.2.1 Industrial, commercial, public,
military and private units
1.2.2 Road and rail network and
associated land
1.2.3 Port areas
1.2.4 Airports
1.3 Mine, dump and construction
sites
1.3.1 Mineral extraction and dump sites
1.3.3 Construction sites
1.3.4 Land without current use
1.4 Artificial non-agricultural
vegetated areas
1.4.1 Green urban areas
1.4.2 Sports and leisure facilities
Procedure to convert OSM to LULC maps
✔ The procedure consists of transforming available OSM features into
LULC corresponding to Urban Atlas (UA) levels 1 and 2
➔
this is done considering any OSM data that might be associated with
classes in all three levels of the UA nomenclature
✔ At this initial stage of the procedure, the only keys considered are:
➔
OSM ways (polylines)
✗ highway
✗ railway
✗ waterway
➔
OSM ways (polygons)
✗ building
✗ landuse
✗ natural
✔ The procedure is implemented in Python and makes use of GRASS GIS,
GDAL/OGR Python bindings and PostgreSQL/PostGIS.
Procedure to convert OSM to LULC maps
✔ The main steps of the procedure are:
➔
identification of the keys available in the OSM data to be processed
➔
conversion of the linear features into areas using spatial analysis, and
merging them with areas in the polygon features that have values of
predefined keys corresponding to the themes of the linear features
➔
conversion of the polygon features to the LUCC according to themes
➔
application of priority rules to solve remaining inconsistencies
➔
conversion of the map into the appropriate Minimum Mapping Unit
(MMU), merging small features with their neighboring features
Example: roads (highway=*) → UA class 1.2
✔ The main steps of the conversion are:
➔
choose the values of “highway” key to be considered for the conversion
➔
identify a maximum and typical width for each road type in the region
of interest (typically larger for primary roads and smaller for other less
important roads); predefined default values may also be considered
➔
compute the distance between each road segment and the buildings
within the maximum defined width of roads; store the minimum value
➔
generate an area feature for each road segment where the distance to
buildings is larger than zero, using the width to generate a buffer; for
those segments where the distance to buildings was not obtained, the
typical width chosen above is applied
➔
merge and dissolve the created buffers
Procedure to convert OSM to LULC maps
✔ Priority rules are defined to solve remaining inconsistencies:
➔
some classes prevail on other according to a hierarchy of importance
Level of UA Class
priority (level 2)
Class Description
1
1.2
Industrial, commercial, public,
military, private and transport units
2
5.0
Water
3
1.4
Artificial non-agricultural
vegetated areas
4
1.3
Mine, dump and construction sites
5
1.1
Urban Fabric
6
2.0
Agricultural, semi-natural areas,
wetlands
7
3.0
Forests
Case studies
✔ Two squared areas of 10 km on one side for a total area of 100 km 2:
➔
60 km NW of Paris – agriculture and forest region, low urban density
➔
Milan city centre – very dense urban area, small agricultural and rural
areas, parks and a cemetery
Results
✔ Paris
Results
✔ Milan
Conclusions
✔ Results are generally satisfactory:
➔
very good correspondence with UA classes
➔
OSM advantages: up-to-dateness, higher resolution data
✔ There are a number of problems:
➔
empty areas in the final LULC map
✗ no OSM data at all
✗ use of GeoFabrik data, where some data are missing
✗ many tags not (yet) taken into account
➔
values chosen for buffers can be highly dependent on specific regions
✔ Next/future developments:
➔
switch to the Planet OSM file
➔
increase/detail the number of OSM tags considered
➔
implementation of a Web service to expose the procedure on the Web
Reference
✔ Fonte C.C., Minghini M., Antoniou V., See L., Patriarca J., Brovelli M.A. &
Milcinski, G. (in press) “An automated methodology for converting OSM
data into a Land Use/Cover map”. Proceedings of the 6th International
Conference on Cartography & GIS, Albena, Bulgaria, 13-17 June 2016.
Positional accuracy assessment
of the OSM buildings in Milan
Monia Elisa Molinari, Marco Minghini, Maria Antonia
Brovelli, Giorgio Zamboni
Politecnico di Milano | GEOlab
Introduction
✔ Earliest works on OSM quality assessment were all focused on
streets, that were the initial mapping target of the OSM project.
✔ In recent years, researches on other objects of OSM database began to
appear:
✔
point of interest
✔
land use features
✔
building footprints
Objective
✔ The purpose of this study is to contribute to the quality assessment of
OSM building data quality in Milan.
✔ The quality assessment has been performed by comparing the OSM
data (downloaded in January 2016) with the building layer of the
official vector cartography of Milan Municipality (produced in 2012)
✔ Two different analyses have been performed:
✔
✔
Completeness evaluation based on methods suggested by
literature
Positional accuracy evaluation by means of a novel approach
developed at GEOlab - PoliMI
Completeness assessment
✔ The completeness analysis has been performed through the area ratio
unit-based method proposed by Hecht et al. (2013)
C = AOSM /AREF
C = completeness
AREF = total area of reference buildings
AOSM = total area of OSM buildings
The completeness parameter should be calculated within a
defined spatial unit (e.g. administrative or geometrical) in order
to take into account the heterogeneity of OSM data.
Completeness assessment
●
The area ratio method can introduce an overestimation of C due to
exceeding data available in OSM. For this reason the computation of
three additional rates is recommended:
●
●
●
True Positive (TP): the areas of agreement between the datasets
False Positive (FP): the OSM building areas which do not exist in the
REF dataset
False Negative (FN): the REF building areas which do not exist in the
OSM dataset.
Completeness analysis results
✔ Spatial distribution of C rate in Milan area
C VALUES
%
> 100%
28.9%
80% < C < 100%
27.7%
60% < C < 80%
18.5%
40% < C < 60%
8.0%
C < 40%
16.9%
The completeness of the OSM dataset is very high in the city center
and gradually decreases when moving towards the periphery.
TP analysis results
✔ Spatial distribution of TP rate in Milan area
TP VALUES
%
60% < C < 100%
63.9%
60% < C < 40%
16.0%
C < 40%
20.1%
Results largely confirm the trend observed for C: OSM completeness is
high in the city center and gradually lower in the peripheral areas
Positional accuracy assessment
✔ An advanced algorithm implemented in an application (Brovelli and
Zamboni, 2004) allowed the:
✔
Quasi-automated detection of homologous points between REF
and OSM by means of geometric, topological and semantic
analyses.
Positional accuracy assessment
✔ An advanced algorithm implemented in an application (Brovelli and
Zamboni, 2004) allowed the:
✔
Quasi-automated detection of homologous points between REF
and OSM by means of geometric, topological and semantic
analyses.
✔
Application of a set of warping transformations to the OSM dataset
in order to optimize its match with the REF dataset
Positional accuracy analysis results
✔ The number of homologous pairs detected is approximately 100000
Cell
Points
Trasf.
ΔY
μ [m]
ΔX
μ [m]
d
μ [m]
0
19135
None
0.45
0.46
0.81
1-2
16480
None
0.35
0.46
0.77
2
18732
None
0.44
0.43
0.79
3-1
4318
None
0.28
0.41
0.71
The positional accuracy is the same in
both Milan center and periphery
Positional accuracy analysis results
✔ Using the homologous points detected it is possible to estimate the
parameters of an affine or spline MR transformation to remove the
systematic translation and reduce the mean distance
Cell
0
1-2
2
3-1
Points
19135
16480
18732
4318
Trasf.
ΔY
μ [m]
ΔX
μ [m]
d
μ [m]
None
0.45
0.46
0.81
Affine
0.00
0.00
0.56
Spline MR
0.00
0.00
0.50
None
0.35
0.46
0.77
Affine
0.00
0.00
0.57
Spline MR
0.00
0.00
0.50
None
0.44
0.43
0.79
Affine
0.00
0.00
0.55
Spline MR
0.00
0.00
0.49
None
0.28
0.41
0.71
Affine
0.00
0.00
0.55
Spline MR
0.00
0.00
0.48
References
✔ Brovelli M.A., Minghini M., Molinari M.E. & Zamboni G. (in press)
“Positional accuracy assessment of the OpenStreetMap buildings layer
through automatic homologous pair detection: the method and a case
study”. International Archives of the Photogrammetry, Remote Sensing
and Spatial Information Sciences.
✔ Brovelli M.A., Zamboni G., 2004. A step towards geographic
interoperability: the automatic detection of maps homologous pairs. In:
Proceedings of Urban Data Management Society Conference (UDMS
’04), Chioggia, Italy, 27-29 October 2004.
✔ Hecht, C., Kunze, C., Hahmann, S., 2013. Measuring Completeness of
Building Footprints in OpenStreetMap over Space and Time. ISPRS
International Journal of Geo-Information, 2(4), pp. 1066-1091.
Open geodata quality in Milan
Marco Minghini, Monia Elisa Molinari, Miriam
Molteni, Maria Antonia Brovelli
Politecnico di Milano | GEOlab
Open (geo)data
✔ Open data (http://opendefinition.org):
➔
permissions: use, modification, separation, redistribution, compilation,
non-discrimination, propagation, application to any purpose, no charge
➔
conditions: attribution, integrity, share-alike, notice, source, technical
restriction prohibition, non-aggression
✔ On a total of 13 domains of open governance data, the geospatial one
has been recognized as the one with the highest commercial value
(Carrara, W., Chan, W. S., Fischer, S., van Steenbergen, E., 2015. Creating Value through Open Data: Study
on the Impact of Re-use of Public Data Resources. European Commission, European Data Portal,
http://www.europeandataportal.eu/sites/default/files/edp_creating_value_through_open_data_0.pdf
✔ Which is the quality of open (geo)data?
➔
focus on Milan Municipality (only data published by Italian institutions)
➔
quality check performed using ISO guidelines
Open geodata in Milan Municipality
✔ Classification according to the provider:
Open geodata in Milan Municipality
✔ Classification according to the content:
Open geodata in Milan Municipality
✔ Classification according to the format:
Open geodata in Milan Municipality
✔ Classification according to the scale:
Open geodata in Milan Municipality
✔ Classification according to the license:
Open geodata in Milan Municipality
✔ Classification according to the year of publication:
Open geodata in Milan Municipality
✔ Classification according to the content vs. the provider:
Example of quality evaluation
✔ Example of quality (positional accuracy) evaluation:
➔
dataset: orthophoto
➔
provider: Italian Environmental Ministry
➔
producer: AGEA (Agenzia per le Erogazioni in Agricoltura)
➔
content: airborne observations
➔
format: Web Map Service (WMS)
➔
scale: national
➔
license: CC-BY-SA
➔
year of publication: 2012
➔
declared planimetric accuracy: 4 m (3 m, according to AGEA)
✔ Ground truth dataset for quality check:
➔
buildings DBTR Milan Municipality (2012): scale 1:1000, accuracy 20 cm
Example of quality evaluation
✔ Procedure:
➔
extract DBTR buildings according
to a random stratified sampling on
a hexagonal grid (ISO guidelines)
➔
identification of the corresponding
buildings on the orthophoto and
manual digitization of 3 corners of
the roof of each building
➔
computation of accuracy measures
on the homologous points
✗ statistics on planimetric errors
✗ measures of error confidence
Example of quality evaluation
✔ Statistics on planimetric errors (number of homologous pairs = 1450):
Index
eX
eY
e
μ
1.66 m
3.65 m
4.32 m
Me
1.04 m
2.19 m
3.03 m
σ
2.49 m
5.32 m
4.24 m
min
0.00 m
0.00 m
0.04 m
max
13.03 m
29.70 m
29.94 m
n
245
615
758
Example of quality evaluation
✔ Measures of error confidence:
Index
Value
CE39.4
4.15 m
CE50
4.89 m
CE90
8.91 m
CE95
10.16 m
CE99.8
14.53 m
Example of quality evaluation
✔ Error visualization:
Example of quality evaluation
✔ Error visualization:
Reference
✔ Brovelli M.A., Minghini M., Molinari M.E. & Molteni M. (in press) “Do
open geodata actually have the quality they declare? The case study of
Milan, Italy”. International Archives of the Photogrammetry, Remote
Sensing and Spatial Information Sciences.
Marco Minghini & Monia Elisa Molinari
Politecnico di Milano, GEOlab – Como Campus
Via Valleggio 11, 22100 Como (Italy)
[email protected], [email protected]
@MarcoMinghini, @MoniaMolinari