path: root/webmap-import
Commit message  Author  Age  Files
* MVT: Generate metadata.json with copyright and timing information.Guilhem Moulin37 hours1
    So the information can be exposed to the webmap's info dialog.
* webmap-import: Rename --compress-tiles option to --mvt-compress.Guilhem Moulin2025-05-211
* webmap-import: Remove option --mvtdir-tmp.Guilhem Moulin2025-05-211
    Having a shared temporary directory, flock(2)'ed to avoid races, is a
    great idea in theory but unfortunately doesn't work so well with
    systemd.exec(5)'s ReadWritePaths settings since

        ReadWritePaths=/var/www/webmap/tiles
        ReadWritePaths=/var/www/webmap/tiles.tmp

    creates multiple mount points pointing at the same file system and
    rename(2)/renameat2(2) can't cope with that. Quoting the manual:

        EXDEV  oldpath and newpath are not on the same mounted filesystem.
               (Linux permits a filesystem to be mounted at multiple
               points, but rename() does not work across different mount
               points, even if the same filesystem is mounted on both.)

    So the options are to either use a single
    ReadWritePaths=/var/www/webmap, or
    --mvtdir-tmp=/var/www/webmap/tiles/.tmp. Both kind of defeat the point
    (we'd in fact want to use --mvtdir-tmp=/var/tmp/webmap/tiles), so we
    use mkdtemp(3) instead.
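    The mkdtemp(3) approach can be sketched as follows. This is a minimal
    illustration rather than the script's actual code: the function name
    `publish_dir` and the choice of creating the scratch directory next to
    the destination are assumptions made for the example.

```python
import os
import shutil
import tempfile
from typing import Callable


def publish_dir(dest: str, populate: Callable[[str], None]) -> None:
    """Populate a fresh scratch directory, then rename it to `dest`."""
    parent = os.path.dirname(os.path.abspath(dest))
    # Assumption: the scratch directory is a sibling of `dest`, so both
    # paths live under one mount point and rename(2) should not hit EXDEV.
    tmpdir = tempfile.mkdtemp(prefix=".tmp-", dir=parent)
    try:
        populate(tmpdir)
        # `dest` must not exist yet (or must be an empty directory).
        os.rename(tmpdir, dest)
    except BaseException:
        # Don't leave half-written output behind on failure.
        shutil.rmtree(tmpdir, ignore_errors=True)
        raise
```

    Because each run gets its own uniquely named directory, no lock on a
    shared temporary directory is needed.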
* webmap-import: Add option to generate Mapbox Vector Tiles (MVT).Guilhem Moulin2025-05-211
* Factor out densification logic from getExtent() into own function.Guilhem Moulin2025-05-211
    And only densify if need be. Most sources are already in SWEREF 99
    (modulo axis mapping strategy), so in practice we can use mere
    rectangles as spatial filters.
* Move part of the fingerprinting logic into PostgreSQL when possible.Guilhem Moulin2025-05-201
    This allows ordering features before hashing, which is required for
    layers from Naturvårdsverket and Skogsstyrelsen (features appear to be
    randomly ordered in daily exports, so normalization and fingerprinting
    are needed to detect whether there are new changes).

    On the downside, this makes the cache a PostgreSQL-only feature. It's
    also marginally slower than the old logic because for some reason
    PostgreSQL doesn't seem to use the UNIQUE index and instead does a seq
    scan followed by a quicksort.

    Without fingerprinting logic:

        $ time -f "%E (%U user, %S sys) %Mk maxres" /usr/local/bin/webmap-import \
            --cachedir=/var/cache/webmap \
            --lockfile=/run/lock/webmap/lock \
            --lockdir-sources=/run/lock/webmap/cache \
            --force \
            "sks:UtfordAvverk"
        […]
        INFO: Layer "sks:UtfordAvverk" has 313044 features
        […]
        3:54.45 (85.28 user, 26.19 sys) 72520k maxres

    With old fingerprinting logic (full client-side SHA-256 digest of
    features as they are being imported):

        $ time -f "%E (%U user, %S sys) %Mk maxres" /usr/local/bin/webmap-import \
            --cachedir=/var/cache/webmap \
            --lockfile=/run/lock/webmap/lock \
            --lockdir-sources=/run/lock/webmap/cache \
            --force \
            "sks:UtfordAvverk"
        […]
        INFO: Imported 313044 features from source layer "UtfordAvverkningYta"
        […]
        INFO: Updated layer "sks:UtfordAvverk" has new fingerprint e655a97a
        4:15.65 (108.46 user, 26.73 sys) 80672k maxres

    With new fingerprinting logic (hybrid client/server SHA-256 digest and
    hash_record_extended() calls after the import process):

        $ time -f "%E (%U user, %S sys) %Mk maxres" /usr/local/bin/webmap-import \
            --cachedir=/var/cache/webmap \
            --lockfile=/run/lock/webmap/lock \
            --lockdir-sources=/run/lock/webmap/cache \
            --force \
            "sks:UtfordAvverk"
        […]
        INFO: Layer "sks:UtfordAvverk" has 313044 features
        […]
        4:30.77 (87.02 user, 25.67 sys) 72856k maxres

    Same but without ORDER BY (or ORDER BY ogc_fid):

        4:07.52 (88.23 user, 26.58 sys) 72060k maxres

    (A server-side incremental hash function would be better, but there is
    no such thing currently and the only way to hash fully server side is
    to aggregate rows in an array, which would be too expensive memory-wise
    for large tables.)
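    Why ordering matters before hashing can be illustrated with a small
    client-side sketch. It is not the hybrid logic described above: it
    swaps the server-side ORDER BY for an in-memory sorted() call, and the
    row representation (tuples of plain values) is an assumption made for
    the example.

```python
import hashlib
from typing import Iterable, Tuple


def fingerprint(rows: Iterable[Tuple]) -> str:
    """Return a SHA-256 digest that does not depend on row order."""
    digest = hashlib.sha256()
    # Sorting first means two exports holding the same features in a
    # different order feed identical bytes to the hash function.
    for row in sorted(rows):
        digest.update(repr(row).encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()
```

    Sorting in memory only works for small layers; for a layer with
    hundreds of thousands of features the ordering is better left to the
    database, as the commit does.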
* importSources(): Return either success, error, or no change.Guilhem Moulin2025-05-011
    That way we can detect when the import of all layers is a no-op
    (besides changing last_updated) and exit gracefully.
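    A three-way result of this kind could look like the sketch below. The
    names `ImportStatus` and `all_unchanged` are hypothetical and chosen
    for the example only.

```python
from enum import Enum
from typing import Iterable


class ImportStatus(Enum):
    """Possible outcomes of importing one destination layer."""
    SUCCESS = "success"
    ERROR = "error"
    NO_CHANGE = "no change"


def all_unchanged(statuses: Iterable[ImportStatus]) -> bool:
    """True when every layer import was a no-op (also for an empty list)."""
    return all(s is ImportStatus.NO_CHANGE for s in statuses)
```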
* webmap-import: Fingerprint destination layers to detect changes.Guilhem Moulin2025-05-011
    Comparing modification times is not enough since some sources (for
    instance Naturvårdsverket's SCI_Rikstackande) are updated on the server
    even though no objects are being added; the source layer remains
    unchanged but the file differs because of OBJECTID changes we are not
    interested in.
* webmap-import: Fix fd leak, open lockfiles only once.Guilhem Moulin2025-04-281
    Some layers, for instance svk:*, use the same source file, and we want
    a single lock per file.
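    Opening each lockfile only once can be sketched with a dictionary keyed
    by path. The function name `lock_shared` and the lockfile paths are
    assumptions made for the example; only fcntl.flock() and os.open() are
    real library calls.

```python
import fcntl
import os
from typing import Dict, Iterable


def lock_shared(lockfiles: Iterable[str]) -> Dict[str, int]:
    """Take one shared flock(2) per distinct lockfile path."""
    held: Dict[str, int] = {}
    for path in lockfiles:
        if path in held:
            # Several layers may share a source file: reuse the
            # descriptor instead of opening (and leaking) another one.
            continue
        fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_CLOEXEC, 0o664)
        try:
            fcntl.flock(fd, fcntl.LOCK_SH)
        except OSError:
            os.close(fd)
            raise
        held[path] = fd
    return held
```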
* Set and restore umask to ensure lockfiles are atomically created with mode 0664.Guilhem Moulin2025-04-281
    Using the default umask 0022 yields lock files with g-w, so trying to
    flock(2) from a different user failed.
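    Setting and restoring the umask around the open(2) call can be sketched
    as follows; the function name `create_lockfile` is hypothetical.

```python
import os


def create_lockfile(path: str) -> int:
    """Create (or open) `path` with mode 0664 regardless of the caller's umask."""
    # With umask 0002 the requested mode 0664 is applied as is, so the
    # file is group-writable from the moment it is created.
    old_umask = os.umask(0o002)
    try:
        return os.open(path, os.O_WRONLY | os.O_CREAT | os.O_CLOEXEC, 0o664)
    finally:
        # Always restore the previous umask, even if os.open() fails.
        os.umask(old_umask)
```

    Passing the mode at creation time avoids the window that a separate
    chmod(2) call after the fact would leave open.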
* Move layer transactional logic to importSources().Guilhem Moulin2025-04-241
    It's much clearer that way. The destination layer is cleared and
    updated in that function, so it makes sense if that's also where
    transactions (or SAVEPOINTs) are committed or rolled back.
* Change layer cache logic to target destination layers rather than sources.Guilhem Moulin2025-04-241
    In a future commit we'll fingerprint layers to detect changes.
    Comparing modification times is not enough since some sources (for
    instance Naturvårdsverket's SCI_Rikstackande) are updated on the server
    even though no objects are being added; the source layer remains
    unchanged but the file differs because of OBJECTID changes we are not
    interested in.

    Rather than using another cache layer/table for fingerprints, we cache
    destination layernames rather than triplets (source_path,
    archive_member, layername), along with the time at which the import was
    started rather than source_path's mtime.

    There is indeed no value in having exact source_path's mtime in the
    cache. What we need is simply a way to detect whether source paths have
    been updated in a subsequent run. Thanks to the shared locks the ctime
    of any updated source path will be at least the time when the locks are
    released, thereby exceeding the last_updated value.
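    The resulting staleness check can be sketched as below. The function
    name `sources_changed` is hypothetical, and `last_updated` is assumed
    to be a Unix timestamp for the sake of the example.

```python
import os
from typing import Iterable


def sources_changed(source_paths: Iterable[str], last_updated: float) -> bool:
    """True if any source path was changed after the last import started."""
    # A source replaced after the previous run has a ctime later than the
    # recorded start time of that run.
    return any(os.stat(path).st_ctime > last_updated for path in source_paths)
```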
* typofixGuilhem Moulin2025-04-231
* webmap-import: Add a cache layer and store the source file's last modification time.Guilhem Moulin2025-04-231
    That way we can avoid the expensive unpack+import when the source
    file(s) have not been updated since the last run. The check can be
    bypassed with a new flag `--force`.

    We use a sequence for the FIDs (primary key) and a UNIQUE constraint on
    triplets (source_path, archive_member, layername) as GDAL doesn't
    support multi-column primary keys.

    To avoid races between the stat(2) calls, gdal.OpenEx() and updates via
    `webmap-download` runs we place a shared lock on the downloaded files.
    One could resort to some tricks to eliminate the race between the first
    two, but there is also some value in having consistency during the
    entire execution of the script (a single source file can be used by
    multiple layers for instance, and it makes sense to use the very same
    file for all layers in that case).

    We also intersperse dso.FlushCache() calls between _importSource()
    calls in order to force the PG driver to call EndCopy() to detect
    errors and trigger a rollback when _importSource() fails.
* webmap-import: Break down into separate modules.Guilhem Moulin2025-04-211
* webmap-import: Major refactoring.Guilhem Moulin2025-04-191
* Add type hints and refactor a bit to please pylint.Guilhem Moulin2025-04-191
* webmap-import: Show the list of ignored source fields.Guilhem Moulin2024-10-271
* PostgreSQL: Add NOT NULL constraints on the geometry columns.Guilhem Moulin2024-10-271
    Among other things this allows CLUSTERing on the GIST indices, cf.
    https://postgis.net/docs/manual-3.3/performance_tips.html#database_clustering
* webmap-import: Improve wording.Guilhem Moulin2024-09-261
* Add `webmap-publish` script to export layers to Mapbox Vector Tiles.Guilhem Moulin2024-09-251
* webmap-import: add option --lockfile to obtain an exclusive lock.Guilhem Moulin2024-09-201
    This avoids starting multiple imports in parallel. Some layers, such as
    Skogsstyrelsen's, are quite large and filtering/importing causes rather
    high load.
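    An exclusive lock of this kind can be sketched with fcntl.flock(). The
    function name `acquire_exclusive` is hypothetical, and failing fast
    with LOCK_NB (rather than waiting for the other import to finish) is an
    assumption made for the example.

```python
import fcntl
import os


def acquire_exclusive(lockfile: str) -> int:
    """Take an exclusive flock(2) on `lockfile` or raise BlockingIOError."""
    fd = os.open(lockfile, os.O_RDWR | os.O_CREAT | os.O_CLOEXEC, 0o664)
    try:
        # LOCK_NB makes flock(2) fail immediately if another open file
        # description already holds the lock.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        os.close(fd)
        raise
    return fd
```

    The lock is released when the returned descriptor is closed, which
    also happens automatically when the process exits.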
* webmap-import: Don't crash when trying to insert a feature without geometry.Guilhem Moulin2024-06-221
    Cf. for instance

        $ ogrinfo ./LST.vbk_projekteringsomraden.shp -sql "SELECT * FROM \"LST.vbk_projekteringsomraden\" WHERE OMRID = '1452-V-008'"
        […]
        Layer name: LST.vbk_projekteringsomraden
        Geometry: Polygon
        Feature Count: 1
        Extent: (-907106.000000, 727.000000) - (914131.738200, 7573766.311200)
        Layer SRS WKT:
        PROJCRS["SWEREF99 TM",
        […]
        OGRFeature(LST.vbk_projekteringsomraden):2043
          OMRID (String) = 1452-V-008
          PROJNAMN (String) = Grimsås Äspås
          ANTALVERK (Integer64) = 0
          AntalejXY (Integer64) = (null)
          CALPROD (Real) = 0.000000000000000
          PBYGGSTART (String) = (null)
          PDRIFT (String) = (null)
          Andringsan (String) = (null)
          UnderByggn (String) = (null)
          ORGNAMN (String) = Kraftö AB
          ORGNR (String) = 556708-7456
          EJAKTUELL (String) = Yes
          KOMNAMN (String) = Tranemo
          LANSNAMN (String) = Västra Götalands l
          EL_NAMN (String) = (null)
          Raderat (String) = No
          ArendeStat (String) = (null)
* webmap-import: Improve OGRFieldDefn::[GS]et*() capability detection.Guilhem Moulin2024-06-211
    The PostgreSQL driver doesn't support AlternativeName, for instance.
* webmap-import: Don't crash if the destination layer has no SRS.Guilhem Moulin2024-06-211
    This is the case for the PGDump driver, for instance.
* Conditionally use GetTZFlag()/SetTZFlag() depending on the GDAL version.Guilhem Moulin2024-06-201
    The OGRFieldDefn GetTZFlag()/SetTZFlag() methods were added in OGR
    3.8.0, cf. https://github.com/OSGeo/gdal/blob/master/NEWS.md#core-3 .
    Don't comment out TZ on field definitions. Instead we check the
    GDAL/OGR version and ignore TZ on field definitions if the OGR version
    is too old.
* webmap-import: Improve debug messages.Guilhem Moulin2024-06-201
* Conditionally use GetComment()/SetComment() depending on the GDAL version.Guilhem Moulin2024-06-201
    The OGRFieldDefn GetComment()/SetComment() methods were added in OGR
    3.7.0, cf. https://github.com/OSGeo/gdal/blob/master/NEWS.md#core-5 .
    Don't comment out comments on field definitions. Instead we check the
    GDAL/OGR version and ignore comments on field definitions if the OGR
    version is too old.
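    Version gating of this kind can be sketched without importing GDAL.
    The helper names below are hypothetical; the numeric encoding follows
    GDAL's VERSION_NUM convention (3.6.2 is 3060200), and with the real
    bindings the number would come from
    int(gdal.VersionInfo("VERSION_NUM")).

```python
def version_num(major: int, minor: int, rev: int = 0) -> int:
    """Encode a version the way GDAL's VERSION_NUM does."""
    return major * 1000000 + minor * 10000 + rev * 100


# Minimum versions taken from the commit messages: field comments need
# 3.7.0 and the TZ flag needs 3.8.0.
MIN_VERSION = {
    "comment": version_num(3, 7),
    "tzflag": version_num(3, 8),
}


def supports(feature: str, current: int) -> bool:
    """True if the running GDAL/OGR version provides `feature`."""
    return current >= MIN_VERSION[feature]
```

    On a system shipping GDAL 3.6, both checks are false and the
    corresponding field definition settings are simply ignored.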
* webmap-import: Improve variable name.Guilhem Moulin2024-06-191
* Add logic for field regex substitution.Guilhem Moulin2024-06-191
    This is useful to replace a YYYYMMDD formatted date with YYYY-MM-DD.
    The target field can then be set to not-nullable and its type set to
    Date, as the OGR_F_SetField*() functions will take care of the
    conversion. We could also do that via an SQL query, but in our case the
    sources are not proper RDBMS so SQL is emulated anyway.
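    The date rewrite mentioned above can be sketched with re.sub(). The
    pattern and the function name `normalize_date` are illustrative
    assumptions, not the configuration syntax used by the script.

```python
import re
from typing import Optional

# Eight digits and nothing else, captured as year, month and day.
_YYYYMMDD = re.compile(r"^(\d{4})(\d{2})(\d{2})$")


def normalize_date(value: Optional[str]) -> Optional[str]:
    """Rewrite YYYYMMDD as YYYY-MM-DD; leave other values untouched."""
    if value is None:
        return None
    return _YYYYMMDD.sub(r"\1-\2-\3", value)
```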
* webmap-import: Improve variable name.Guilhem Moulin2024-06-191
* Add logic to replace field value literals.Guilhem Moulin2024-06-191
    And set them to NULL.
* webmap-import: Use the identity mapping if no ‘field-map’ is specified.Guilhem Moulin2024-06-161
    The previous default map was [-1] * n, i.e. all source fields were
    ignored.
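    The difference between the two defaults can be sketched as follows,
    assuming an ogr2ogr-style index list in which entry i is the
    destination field index for source field i and -1 means "ignore". The
    function name `resolve_field_map` is hypothetical.

```python
from typing import List, Optional, Sequence


def resolve_field_map(n_source_fields: int,
                      field_map: Optional[Sequence[int]] = None) -> List[int]:
    """Return the effective source-to-destination field index map."""
    if field_map is None:
        # Identity mapping: source field i goes to destination field i.
        # The previous default, [-1] * n_source_fields, dropped everything.
        return list(range(n_source_fields))
    if len(field_map) != n_source_fields:
        raise ValueError("field map must have one entry per source field")
    return list(field_map)
```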
* webmap-import: Rename ‘fields’ list/dict to ‘field-map’.Guilhem Moulin2024-06-161
* Don't warn about nonexistent fields for empty GeoJSON sources.Guilhem Moulin2024-06-121
* Use systemd.journal to log to journald when started via .service files.Guilhem Moulin2024-06-111
    This enables proper filtering by level etc. (incl. journald coloring).
* config.yml: Add field comments.Guilhem Moulin2024-06-111
    (Commented out for now since Bookworm has only GDAL v3.6.)
* webmap-import: Add error-checking for CreateFeature().Guilhem Moulin2024-06-111
    Despite using gdal.UseExceptions() a failed call doesn't raise an
    exception, so we need to check the return value to avoid missing
    features.
* webmap-import: Improve INFO message.Guilhem Moulin2024-06-111
* Improve comments.Guilhem Moulin2024-06-111
* webmap-import: Add geometry conversion support.Guilhem Moulin2024-06-111
* Fix extent logic when the SRS of the output layer is not the destination SRS.Guilhem Moulin2024-06-111
    The extent is expressed in config['SRS'] in traditional GIS order
    (easting/northing ordered: minX, minY, maxX, maxY), but the destination
    layers might be pre-existing and use other SRSes or mapping strategies.
* WordingGuilhem Moulin2024-06-111
* Add support for reprojection into the destination SRS.Guilhem Moulin2024-06-111
    The configured extent is always expressed in the destination SRS, so it
    needs to be transformed into the source SRS. Like
    apps/ogr2ogr_lib.cpp, we segmentize it to make sure it is sufficiently
    densified.
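    Densifying a rectangular extent before reprojecting it can be sketched
    in plain Python. This is a stand-in for illustration only (OGR
    geometries offer a Segmentize() method for this purpose); the function
    name `densify_extent` and the `max_segment` parameter are assumptions.

```python
import math
from typing import List, Tuple


def densify_extent(minx: float, miny: float, maxx: float, maxy: float,
                   max_segment: float) -> List[Tuple[float, float]]:
    """Return a closed ring whose segments are at most `max_segment` long."""
    corners = [(minx, miny), (maxx, miny), (maxx, maxy),
               (minx, maxy), (minx, miny)]
    ring = [corners[0]]
    for (x0, y0), (x1, y1) in zip(corners, corners[1:]):
        length = math.hypot(x1 - x0, y1 - y0)
        # Number of equal pieces needed so none exceeds max_segment.
        pieces = max(1, math.ceil(length / max_segment))
        for i in range(1, pieces + 1):
            t = i / pieces
            ring.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return ring
```

    The extra vertices matter because a straight edge in one SRS is in
    general a curve in another: transforming only the four corners could
    cut off features near the middle of an edge.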
* Add TZFlag support (for GDAL ≥3.8).Guilhem Moulin2024-06-101
    (Commented out in config.yml for now since Bookworm has only v3.6.)
* webmap-import: Don't try to set description if it is unset in config.yml.Guilhem Moulin2024-06-101
* webmap-import: Rename getFieldTypeCode() to parseFieldType().Guilhem Moulin2024-06-101
    And getFieldSubTypeCode() to parseSubFieldType().
* Add `webmap-import` script to import source layers.Guilhem Moulin2024-06-101
    There are still a few things to do (such as reprojection and geometry
    changes) but it's mostly working.

    We roll our own ogr2ogr/GDALVectorTranslate()-like function because
    GDALVectorTranslate() insists on calling StartTransaction()
    https://github.com/OSGeo/gdal/issues/3403 while we want a single
    transaction for the entire destination layer, including truncation,
    source imports, and metadata changes.

    Surprisingly our version is not much slower than the C++ one. Importing
    the 157446 (of 667034) features from sksUtfordAvverk-2000-2015.shp
    takes 14.3s while

        ogr2ogr -f PostgreSQL \
            -doo ACTIVE_SCHEMA=postgis \
            --config PG_USE_COPY YES \
            --config OGR_TRUNCATE YES \
            -append \
            -fieldmap "0,-1,-1,-1,-1,1,2,3,4,5,6,7,8,9,10,11,12,13" \
            -nlt MULTIPOLYGON -nlt PROMOTE_TO_MULTI \
            -gt unlimited \
            -spat 110720 6927136 1159296 7975712 \
            -nln "sks:UtfordAvverk" \
            PG:"dbname='webmap' user='webmap_import'" \
            /tmp/x/sksUtfordAvverk-2000-2015.shp \
            sksUtfordAvverk-2000-2015

    takes 14s.

    Merely opening /tmp/x/sksUtfordAvverk-2000-2015.shp and looping through
    its (extent-filtered) features results in a runtime of 4.3s.