Massive standardization and geocoding of postal addresses through ETL processes using unified digital street map of Andalusia (CDAU) web services
DOI: https://doi.org/10.59192/mapping.391
Keywords: Postal code, Address standardization, Geocoding, ETL, Pentaho Data Integration, Unified Digital Street Map of Andalusia, ISE, Inventory of governmental headquarters and public services, REST, JSON, WPS
Abstract
The Inventory of Headquarters and Public Services of the «Junta de Andalucía» (ISE) provides an overview of where the services of the Andalusian Regional Government are located. For each facility it gives both the geographical location and the most relevant alphanumeric data. The ISE addresses the problems caused by the wide dispersion of the data, and of the formats in which each responsible body publishes them, by integrating everything into a single standardized PostgreSQL/PostGIS database. This database can be consulted through a web viewer and through interoperable OGC web services. A major task of the project is to process the source information so that postal addresses are normalized and each facility or headquarters is located by a geometric point. These two operations, normalization and geocoding, are executed automatically using the web processing service (WPS) of the Unified Digital Street Map of Andalusia (CDAU), the official reference source for roads and portals (numbered building entrances) in Andalusia. To streamline the task, the queries have been automated within workflows built with the Kettle ETL tool (Pentaho Data Integration). Specifically, the normalization and geocoding functions are invoked repeatedly through REST requests, and each JSON response is parsed to isolate the values of interest (road type, road name, portal number and coordinates), from which the corresponding geometries are generated.
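The request-and-parse step described in the abstract can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' Kettle workflow. The JSON keys (results, road_type, road_name, portal_number, coordinates) are hypothetical placeholders rather than the documented CDAU response schema. The query callable stands in for the actual REST request, so no endpoint URL is assumed. SRID 25830 (ETRS89 / UTM zone 30N) is an assumed output reference system.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional


@dataclass
class GeocodedAddress:
    """The values the abstract names as being of interest."""
    road_type: str
    road_name: str
    portal_number: Optional[str]
    x: float
    y: float

    def to_ewkt(self, srid: int = 25830) -> str:
        # EWKT string accepted by PostGIS ST_GeomFromEWKT.
        # ASSUMPTION: SRID 25830 is not stated in the source.
        return f"SRID={srid};POINT({self.x} {self.y})"


def parse_geocoder_response(payload: str) -> List[GeocodedAddress]:
    """Extract the fields of interest from one JSON response.

    HYPOTHETICAL schema: the keys below are placeholders, not the
    documented CDAU response format.
    """
    data = json.loads(payload)
    candidates = []
    for item in data.get("results", []):
        coords = item.get("coordinates") or []
        if len(coords) != 2:
            continue  # no usable coordinate pair, so no point can be built
        candidates.append(
            GeocodedAddress(
                road_type=item.get("road_type", ""),
                road_name=item.get("road_name", ""),
                portal_number=item.get("portal_number"),
                x=float(coords[0]),
                y=float(coords[1]),
            )
        )
    return candidates


def geocode_all(
    addresses: Iterable[str], query: Callable[[str], str]
) -> Dict[str, Optional[GeocodedAddress]]:
    """Call the service once per address and keep the first candidate.

    `query` is a stand-in for the REST call (e.g. an HTTP GET that
    returns JSON text); injecting it avoids assuming an endpoint URL.
    """
    results: Dict[str, Optional[GeocodedAddress]] = {}
    for address in addresses:
        candidates = parse_geocoder_response(query(address))
        results[address] = candidates[0] if candidates else None
    return results
```

Keeping only the first candidate is a simplification. A production workflow would probably check whatever match-quality information the service returns before accepting a point. In PDI, these stages would roughly correspond to a REST Client step followed by a JSON Input step, although the source does not say which steps the authors used.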
References
IECA. 2019. «Manual de buenas prácticas para la normalización de fuentes y registros administrativos de la Junta de Andalucía. Versión 2.0». 66 p.
IECA. 2021. «Manual de integración – WS-CDAU y CdauProxyWS. Versión 2.11.0». 129 p.
IECA. 2022. «Inventario de Equipamientos y Sedes de la Junta de Andalucía (ISE). Especificaciones del proyecto». 28 p. Available at: https://www.juntadeandalucia.es/institutodeestadisticaycartografia/mapa_equipamientos/documentos/Especificaciones_ISE.pdf
Information about Pentaho Data Integration. Available at: https://help.hitachivantara.com/Documentation/Pentaho/9.3/Products/Learn_about_the_PDI_client
Information about CDAU. Available at: https://www.callejerodeandalucia.es/portal/proyecto
Inventario de Sedes y Equipamientos de la Junta de Andalucía. Available at: http://www.juntadeandalucia.es/institutodeestadisticaycartografia/mapa_equipamientos/index.htm
License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.