author | Guilhem Moulin <guilhem@fripost.org> | 2025-05-01 21:20:44 +0200 |
---|---|---|
committer | Guilhem Moulin <guilhem@fripost.org> | 2025-05-20 09:51:54 +0200 |
commit | 12bd18ed5e01a84b03be7c21570bac6547759970 (patch) | |
tree | ec491f29beca20bc4657f34ae7244b9f52321b0a /webmap-import | |
parent | 3edce255b3010244ab5d7fae59cbda11926f50f1 (diff) |
Move part of the fingerprinting logic into PostgreSQL when possible.
This allows ordering features before hashing, which is required for
layers from Naturvårdsverket and Skogsstyrelsen (features appear to be
randomly ordered in daily exports, so normalization and fingerprinting
are needed to detect whether there are new changes).
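To illustrate why ordering matters, here is a minimal sketch (not the code from webmap-import): a digest computed over rows in arrival order changes when the same rows are shuffled, while sorting first yields a stable value. The `fingerprint` helper, the sample rows and the use of `repr()` as a serialization are assumptions made only for this illustration.

```python
import hashlib

def fingerprint(rows, normalize=False):
    """Return a SHA-256 hex digest of the rows, optionally sorting them first."""
    if normalize:
        rows = sorted(rows)  # normalization step: impose a deterministic order
    h = hashlib.sha256()
    for row in rows:
        # repr() is only a stand-in serialization for this sketch
        h.update(repr(row).encode("utf-8"))
    return h.hexdigest()

# Hypothetical feature rows (attribute, geometry as WKT)
monday = [("a", "POINT(0 0)"), ("b", "POINT(1 1)"), ("c", "POINT(2 2)")]
tuesday = [monday[2], monday[0], monday[1]]  # same features, different order

print(fingerprint(monday) == fingerprint(tuesday))              # order-sensitive
print(fingerprint(monday, True) == fingerprint(tuesday, True))  # order-independent
```

Without normalization, a reshuffled export would look like a changed layer and trigger a needless re-import.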
On the downside, this makes the cache a PostgreSQL-only feature. It's
also marginally slower than the old logic because for some reason
PostgreSQL doesn't seem to use the UNIQUE index and instead does a seq
scan followed by a quicksort.
Without fingerprinting logic:
$ time -f "%E (%U user, %S sys) %Mk maxres" \
/usr/local/bin/webmap-import \
--cachedir=/var/cache/webmap \
--lockfile=/run/lock/webmap/lock \
--lockdir-sources=/run/lock/webmap/cache \
--force \
"sks:UtfordAvverk"
[…]
INFO: Layer "sks:UtfordAvverk" has 313044 features
[…]
3:54.45 (85.28 user, 26.19 sys) 72520k maxres
With old fingerprinting logic (full client-side SHA-256 digest of
features as they are being imported):
$ time -f "%E (%U user, %S sys) %Mk maxres" \
/usr/local/bin/webmap-import \
--cachedir=/var/cache/webmap \
--lockfile=/run/lock/webmap/lock \
--lockdir-sources=/run/lock/webmap/cache \
--force \
"sks:UtfordAvverk"
[…]
INFO: Imported 313044 features from source layer "UtfordAvverkningYta"
[…]
INFO: Updated layer "sks:UtfordAvverk" has new fingerprint e655a97a
4:15.65 (108.46 user, 26.73 sys) 80672k maxres
With new fingerprinting logic (hybrid client/server SHA-256 digest and
hash_record_extended() calls after the import process):
$ time -f "%E (%U user, %S sys) %Mk maxres" \
/usr/local/bin/webmap-import \
--cachedir=/var/cache/webmap \
--lockfile=/run/lock/webmap/lock \
--lockdir-sources=/run/lock/webmap/cache \
--force \
"sks:UtfordAvverk"
[…]
INFO: Layer "sks:UtfordAvverk" has 313044 features
[…]
4:30.77 (87.02 user, 25.67 sys) 72856k maxres
Same but without ORDER BY (or ORDER BY ogc_fid):
4:07.52 (88.23 user, 26.58 sys) 72060k maxres
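For a quick comparison, the wall-clock figures above can be converted to seconds and expressed as overhead relative to the run without fingerprinting. This is only a small arithmetic sketch: the `to_seconds` helper and the run labels are made up for the illustration, and each figure comes from a single run, so the percentages are indicative at best.

```python
def to_seconds(elapsed):
    """Convert an 'M:SS.ss' elapsed time (GNU time's %E format) to seconds."""
    minutes, seconds = elapsed.split(":")
    return int(minutes) * 60 + float(seconds)

# Elapsed times copied from the runs above
runs = {
    "no fingerprinting": "3:54.45",
    "old (client-side)": "4:15.65",
    "new (hybrid)": "4:30.77",
    "new, no ORDER BY": "4:07.52",
}

baseline = to_seconds(runs["no fingerprinting"])
overhead = {name: to_seconds(t) - baseline for name, t in runs.items()}

for name, extra in overhead.items():
    print(f"{name:20s} +{extra:6.2f}s ({100 * extra / baseline:4.1f}%)")
```

On these numbers the old logic costs about 9% over the baseline, the new logic about 15.5%, and the new logic without ORDER BY about 5.6%, which suggests most of the extra time goes into the sort.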
(A server-side incremental hash function would be better, but there is no
such thing currently, and the only way to hash fully server-side is to
aggregate rows in an array, which would be too expensive memory-wise for
large tables.)
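The hybrid approach can be sketched as follows, under stated assumptions: the server returns one 64-bit hash per row (hash_record_extended() yields a bigint) in a fixed order, and the client folds those values into a single SHA-256 digest as they arrive, so memory use stays constant regardless of table size. The `stream_fingerprint` function, the big-endian packing and the `fake_cursor` generator (standing in for a database cursor) are hypothetical; the actual query and encoding used by webmap-import are not shown in this view.

```python
import hashlib
import struct

def stream_fingerprint(row_hashes):
    """Fold an iterable of signed 64-bit row hashes into one SHA-256 hex digest."""
    h = hashlib.sha256()
    for value in row_hashes:
        # Pack each bigint as 8 big-endian bytes; only one row is held at a time
        h.update(struct.pack(">q", value))
    return h.hexdigest()

def fake_cursor():
    """Stand-in for a cursor yielding per-row hashes in a fixed order."""
    yield from (-42, 7, 2**63 - 1)

print(stream_fingerprint(fake_cursor()))
```

Because the digest is order-sensitive, the rows still have to come back from the server in a deterministic order for the fingerprint to be comparable between imports.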
Diffstat (limited to 'webmap-import')
-rwxr-xr-x | webmap-import | 4 |
1 files changed, 4 insertions, 0 deletions
diff --git a/webmap-import b/webmap-import
index 6f514a9..f20fdef 100755
--- a/webmap-import
+++ b/webmap-import
@@ -377,6 +377,10 @@ def validateLayerCacheField(defn : ogr.FeatureDefn, idx : int,
 def validateCacheLayer(ds : gdal.Dataset, name : str) -> bool:
     """Validate layer cache table."""
+    drvName = ds.GetDriver().ShortName
+    if drvName != 'PostgreSQL': # we need hash_record_extended(), sha256() and ST_AsEWKB()
+        logging.warning('Unsupported cache layer for output driver %s', drvName)
+        return False
     lyr = ds.GetLayerByName(name)
     if lyr is None:
         logging.warning('Table "%s" does not exist', name)