Having a shared temporary directory, flock(2)'ed to avoid races, is a
great idea in theory, but unfortunately it doesn't work so well with
systemd.exec(5)'s ReadWritePaths settings, since

    ReadWritePaths=/var/www/webmap/tiles
    ReadWritePaths=/var/www/webmap/tiles.tmp

creates multiple mount points backed by the same file system, and
rename(2)/renameat2(2) can't cope with that. Quoting the manual:

    EXDEV  oldpath and newpath are not on the same mounted filesystem.
           (Linux permits a filesystem to be mounted at multiple points,
           but rename() does not work across different mount points, even
           if the same filesystem is mounted on both.)

So the options are to either use a single ReadWritePaths=/var/www/webmap,
or --mvtdir-tmp=/var/www/webmap/tiles/.tmp. Both kind of defeat the
point (we'd in fact want to use --mvtdir-tmp=/var/tmp/webmap/tiles), so
we use mkdtemp(3) instead.
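A minimal sketch of the mkdtemp(3) approach (paths hypothetical): the
temporary directory is created under the same parent as the
destination, so the final rename(2) cannot fail with EXDEV:

    import os
    import shutil
    import tempfile

    mvtdir = '/var/www/webmap/tiles'
    # Same parent directory => same mount point => rename(2) is safe.
    tmpdir = tempfile.mkdtemp(prefix='.tiles-', dir=os.path.dirname(mvtdir))
    ...  # write the new tile tree into tmpdir
    old = mvtdir + '.old'
    os.rename(mvtdir, old)     # move the published tree aside
    os.rename(tmpdir, mvtdir)  # publish the new tree
    shutil.rmtree(old)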
And only densify if need be. Most sources are already in SWEREF 99
(modulo axis mapping strategy), so in practice we can use mere
rectangles as spatial filters.
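In OGR's Python API the rectangle filter amounts to the following
sketch (the source path is hypothetical; the extent values are the
-spat rectangle appearing further down this log):

    from osgeo import ogr

    ds = ogr.Open('sksUtfordAvverk-2000-2015.shp')
    lyr = ds.GetLayer(0)
    # Source data already in the destination SRS: a plain rectangle
    # suffices, no densified/reprojected filter polygon needed.
    lyr.SetSpatialFilterRect(110720, 6927136, 1159296, 7975712)
    for feature in lyr:
        pass  # only features intersecting the rectangle are returned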
This allows ordering features before hashing, which is required for
layers from Naturvårdsverket and Skogsstyrelsen (features appear to be
randomly ordered in their daily exports, so normalization and
fingerprinting are needed to detect whether anything has changed).
On the downside, this makes the cache a PostgreSQL-only feature. It's
also marginally slower than the old logic because, for some reason,
PostgreSQL doesn't use the UNIQUE index and instead does a sequential
scan followed by a quicksort.
Without fingerprinting logic:
$ time -f "%E (%U user, %S sys) %Mk maxres" \
    /usr/local/bin/webmap-import \
        --cachedir=/var/cache/webmap \
        --lockfile=/run/lock/webmap/lock \
        --lockdir-sources=/run/lock/webmap/cache \
        --force \
        "sks:UtfordAvverk"
[…]
INFO: Layer "sks:UtfordAvverk" has 313044 features
[…]
3:54.45 (85.28 user, 26.19 sys) 72520k maxres
With the old fingerprinting logic (a full client-side SHA-256 digest of
the features as they are being imported):
$ time -f "%E (%U user, %S sys) %Mk maxres" \
    /usr/local/bin/webmap-import \
        --cachedir=/var/cache/webmap \
        --lockfile=/run/lock/webmap/lock \
        --lockdir-sources=/run/lock/webmap/cache \
        --force \
        "sks:UtfordAvverk"
[…]
INFO: Imported 313044 features from source layer "UtfordAvverkningYta"
[…]
INFO: Updated layer "sks:UtfordAvverk" has new fingerprint e655a97a
4:15.65 (108.46 user, 26.73 sys) 80672k maxres
With the new fingerprinting logic (a hybrid client/server SHA-256
digest using hash_record_extended() calls after the import):
$ time -f "%E (%U user, %S sys) %Mk maxres" \
    /usr/local/bin/webmap-import \
        --cachedir=/var/cache/webmap \
        --lockfile=/run/lock/webmap/lock \
        --lockdir-sources=/run/lock/webmap/cache \
        --force \
        "sks:UtfordAvverk"
[…]
INFO: Layer "sks:UtfordAvverk" has 313044 features
[…]
4:30.77 (87.02 user, 25.67 sys) 72856k maxres
Same but without ORDER BY (or ORDER BY ogc_fid):
4:07.52 (88.23 user, 26.58 sys) 72060k maxres
(A server-side incremental hash function would be better, but no such
thing currently exists, and the only way to hash fully server side is
to aggregate rows into an array, which would be too expensive
memory-wise for large tables.)
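The hybrid digest is computed along these lines (a sketch, not the
actual implementation; it assumes psycopg2 and PostgreSQL >= 14 for
hash_record_extended(), and normalizes feature order by sorting on the
row hashes themselves):

    import hashlib
    import psycopg2  # assumed driver

    def fingerprint(conn, table):
        # Hash each row server side with hash_record_extended(), then
        # fold the 64-bit row hashes into a client-side SHA-256 digest
        # in a deterministic order.
        digest = hashlib.sha256()
        with conn.cursor() as cur:
            cur.execute('SELECT hash_record_extended(t.*, 0) '
                        'FROM "{}" AS t ORDER BY 1'.format(table))
            for (h,) in cur:
                digest.update(int(h).to_bytes(8, 'big', signed=True))
        return digest.hexdigest()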
That way we can detect when the import of all layers is a no-op
(besides updating last_updated) and exit gracefully.
Comparing modification times is not enough since some sources (for
instance Naturvårdsverket's SCI_Rikstackande) are updated on the server
even though no objects are being added; the source layer remains
unchanged but the file differs because of OBJECTID changes we are not
interested in.
Some layers, for instance svk:*, use the same source file, and we want a
single lock per file.
Using the default umask of 0022 yields lock files with g-w (no group
write), so trying to flock(2) them from a different user failed.
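The locking pattern is roughly as follows (paths hypothetical; the
umask must preserve group write so another user can open the same lock
file for writing and flock(2) it):

    import fcntl
    import os

    os.umask(0o002)  # the default 0022 would create the lock file g-w
    fd = os.open('/run/lock/webmap/cache/source.zip',
                 os.O_RDWR | os.O_CREAT, 0o664)
    fcntl.flock(fd, fcntl.LOCK_SH)  # shared; writers take LOCK_EX
    try:
        ...  # stat(2), gdal.OpenEx(), import
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)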
It's much clearer that way. The destination layer is cleared and
updated in that function, so it makes sense if that's also where
transactions (or SAVEPOINTs) are committed or rolled back.
In a future commit we'll fingerprint layers to detect changes.
Comparing modification times is not enough since some sources (for
instance Naturvårdsverket's SCI_Rikstackande) are updated on the server
even though no objects are being added; the source layer remains
unchanged but the file differs because of OBJECTID changes we are not
interested in.
Rather than adding another cache layer/table for fingerprints, we now
key the cache on destination layernames instead of triplets
(source_path, archive_member, layername), and store the time at which
the import was started rather than source_path's mtime.
There is indeed no value in having source_path's exact mtime in the
cache. What we need is simply a way to detect whether source paths have
been updated in a subsequent run. Thanks to the shared locks, the ctime
of any updated source path will be at least the time at which the locks
are released, thereby exceeding the last_updated value.
modification time.
That way we can avoid the expensive unpack+import when the source
file(s) have not been updated since the last run. The check can be
bypassed with a new flag `--force`.
We use a sequence for the FID:s (primary key) and a UNIQUE constraint
on the triplets (source_path, archive_member, layername), as GDAL
doesn't support multi-column primary keys.
To avoid races between the stat(2) calls, gdal.OpenEx(), and updates
via `webmap-download` runs, we place a shared lock on the downloaded
files.
One could resort to some tricks to eliminate the race between the first
two, but there is also some value in having consistency during the
entire execution of the script (a single source file can be used by
multiple layers for instance, and it makes sense to use the very same
file for all layers in that case).
We also intersperse dso.FlushCache() calls between _importSource() calls
in order to force the PG driver to call EndCopy() to detect errors and
trigger a rollback when _importSource() fails.
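The cache table is shaped roughly as follows when created through OGR
(a sketch; identifiers and connection string are illustrative):

    from osgeo import ogr

    ds = ogr.Open('PG:dbname=webmap', update=1)
    lyr = ds.CreateLayer('layercache', geom_type=ogr.wkbNone,
                         options=['FID=ogc_fid'])   # sequence-backed FID
    for name in ('source_path', 'archive_member', 'layername'):
        lyr.CreateField(ogr.FieldDefn(name, ogr.OFTString))
    lyr.CreateField(ogr.FieldDefn('mtime', ogr.OFTDateTime))
    # GDAL doesn't support multi-column primary keys, so enforce the
    # triplet's uniqueness with a separate constraint:
    ds.ExecuteSQL('ALTER TABLE layercache ADD UNIQUE '
                  '(source_path, archive_member, layername)')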
Among other things this allows CLUSTERing on the GIST indices, cf.
https://postgis.net/docs/manual-3.3/performance_tips.html#database_clustering
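The clustering itself can then be done along these lines (a sketch via
OGR's ExecuteSQL(); the index name is hypothetical):

    # Physically reorder the table along its GiST index, cf. the
    # PostGIS performance tips linked above.
    ds.ExecuteSQL('CLUSTER "sks:UtfordAvverk" '
                  'USING "sks:UtfordAvverk_wkb_geometry_geom_idx"')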
This avoids starting multiple imports in parallel. Some layers, such as
Skogsstyrelsen's, are quite large and filtering/importing causes rather
high load.
Cf. for instance
$ ogrinfo ./LST.vbk_projekteringsomraden.shp -sql "SELECT * FROM \"LST.vbk_projekteringsomraden\" WHERE OMRID = '1452-V-008'"
[…]
Layer name: LST.vbk_projekteringsomraden
Geometry: Polygon
Feature Count: 1
Extent: (-907106.000000, 727.000000) - (914131.738200, 7573766.311200)
Layer SRS WKT:
PROJCRS["SWEREF99 TM",
[…]
OGRFeature(LST.vbk_projekteringsomraden):2043
OMRID (String) = 1452-V-008
PROJNAMN (String) = Grimsås Äspås
ANTALVERK (Integer64) = 0
AntalejXY (Integer64) = (null)
CALPROD (Real) = 0.000000000000000
PBYGGSTART (String) = (null)
PDRIFT (String) = (null)
Andringsan (String) = (null)
UnderByggn (String) = (null)
ORGNAMN (String) = Kraftö AB
ORGNR (String) = 556708-7456
EJAKTUELL (String) = Yes
KOMNAMN (String) = Tranemo
LANSNAMN (String) = Västra Götalands l
EL_NAMN (String) = (null)
Raderat (String) = No
ArendeStat (String) = (null)
The PostgreSQL driver doesn't support AlternativeName, for instance.
This is the case for the PGDump driver, for instance.
OGRFieldDefn's GetTZFlag()/SetTZFlag() methods were added in OGR 3.8.0,
cf. https://github.com/OSGeo/gdal/blob/master/NEWS.md#core-3 .
Don't comment out TZ on field definitions. Instead we check the
GDAL/OGR version and ignore TZ on field definitions if the OGR version
is too old.
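The version check amounts to the following sketch (the field name and
TZ flag value are illustrative):

    from osgeo import gdal, ogr

    fld = ogr.FieldDefn('ts', ogr.OFTDateTime)
    if int(gdal.VersionInfo('VERSION_NUM')) >= 3080000:
        fld.SetTZFlag(ogr.TZFLAG_UTC)  # needs OGR >= 3.8.0
    # else: silently ignore the TZ setting on older OGR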
OGRFieldDefn's GetComment()/SetComment() methods were added in OGR
3.7.0, cf. https://github.com/OSGeo/gdal/blob/master/NEWS.md#core-5 .
Don't comment out comments on field definitions. Instead we check the
GDAL/OGR version and ignore comments on field definitions if the OGR
version is too old.
This is useful to replace a YYYYMMDD-formatted date with YYYY-MM-DD.
The target field can then be set to not-nullable and its type set to
Date, as OGR_F_SetField*() will take care of the conversion.
We could also do that via an SQL query, but in our case the sources are
not proper RDBMSes so SQL is emulated anyway.
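Such a replace rule boils down to the following sketch (field and
variable names are made up):

    import re

    def fixup_date(value):
        # '20240131' -> '2024-01-31'; OGR then parses the string as Date.
        return re.sub(r'^(\d{4})(\d{2})(\d{2})$', r'\1-\2-\3', value)

    feature.SetField('date_field', fixup_date(raw_value))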
And set them to NULL.
The previous default map was [-1] * n, i.e. all source fields were
ignored.
This enables proper filtering by level etc. (incl. journald coloring).
(Commented out for now since Bookworm has only GDAL v3.6.)
Despite using gdal.UseExceptions() a failed call doesn't raise an
exception, so we need to check the return value to avoid missing
features.
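The pattern is hence (a sketch; the commit doesn't name the affected
call, CreateFeature() is used for illustration):

    from osgeo import ogr

    if layer.CreateFeature(feature) != ogr.OGRERR_NONE:
        # Not raised as an exception despite gdal.UseExceptions().
        raise RuntimeError('CreateFeature() failed')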
The extent is expressed in config['SRS'] in traditional GIS order
(easting/northing ordered: minX, minY, maxX, maxY), but the destination
layers might be pre-existing and use other SRS:es or axis mapping
strategies.
The configured extent is always expressed in the destination SRS, so it
needs to be transformed into the source SRS. Like apps/ogr2ogr_lib.cpp,
we segmentize it to make sure it is sufficiently densified.
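In the Python API that transformation looks roughly like this (a
sketch; the segmentization step is a guess, apps/ogr2ogr_lib.cpp
derives its own from the extent):

    from osgeo import ogr, osr

    def extent_to_source_srs(minx, miny, maxx, maxy, dst_srs, src_srs):
        ring = ogr.Geometry(ogr.wkbLinearRing)
        for x, y in ((minx, miny), (maxx, miny), (maxx, maxy),
                     (minx, maxy), (minx, miny)):
            ring.AddPoint_2D(x, y)
        poly = ogr.Geometry(ogr.wkbPolygon)
        poly.AddGeometry(ring)
        # Densify so the reprojected edges follow the true curves.
        poly.Segmentize((maxx - minx) / 100)
        poly.Transform(osr.CoordinateTransformation(dst_srs, src_srs))
        return poly  # for lyr.SetSpatialFilter(poly)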
(Commented out in config.yml for now since Bookworm has only v3.6.)
And getFieldSubTypeCode() to parseSubFieldType().
There are still a few things to do (such as reprojection and geometry
changes) but it's mostly working.
We roll our own ogr2ogr/GDALVectorTranslate()-like function because
GDALVectorTranslate() insists on calling StartTransaction(), cf.
https://github.com/OSGeo/gdal/issues/3403 , while we want a single
transaction for the entire destination layer, including truncation,
source imports, and metadata changes.
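The resulting function is shaped roughly as follows (a sketch with
made-up helper names; only the transaction structure is the point):

    def translate(dst_ds, dst_lyr, table, sources):
        dst_lyr.StartTransaction()   # one transaction for everything
        try:
            # Clear the destination layer, then re-import every source.
            dst_ds.ExecuteSQL('TRUNCATE TABLE "{}"'.format(table))
            for source in sources:
                import_source(dst_lyr, source)   # hypothetical helper
            update_metadata(dst_lyr)             # hypothetical helper
            dst_lyr.CommitTransaction()
        except Exception:
            dst_lyr.RollbackTransaction()
            raise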
Surprisingly, our version is not much slower than the C++ one.
Importing the 157446 (of 667034) features from
sksUtfordAvverk-2000-2015.shp takes 14.3s while
ogr2ogr -f PostgreSQL \
-doo ACTIVE_SCHEMA=postgis \
--config PG_USE_COPY YES \
--config OGR_TRUNCATE YES \
-append \
-fieldmap "0,-1,-1,-1,-1,1,2,3,4,5,6,7,8,9,10,11,12,13" \
-nlt MULTIPOLYGON -nlt PROMOTE_TO_MULTI \
-gt unlimited \
-spat 110720 6927136 1159296 7975712 \
-nln "sks:UtfordAvverk" \
PG:"dbname='webmap' user='webmap_import'" \
/tmp/x/sksUtfordAvverk-2000-2015.shp \
sksUtfordAvverk-2000-2015
takes 14s.
Merely opening /tmp/x/sksUtfordAvverk-2000-2015.shp and looping through
its (extent-filtered) features results in a runtime of 4.3s.