| |
All features have the same export date value, so the date is arguably a
schema attribute, not a feature attribute. Since exports from the
mineral register happen daily nowadays, the date should never be more
than a day off compared with the zip file's mtime.
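As a sketch of that invariant (GDAL Python bindings assumed; the
'EXPORT_DATE' field name is hypothetical — the real attribute name may
differ):
import datetime, os
from osgeo import ogr

ZIP = 'mineralrattigheter.zip'
mtime = datetime.date.fromtimestamp(os.stat(ZIP).st_mtime)
ds = ogr.Open('/vsizip/' + ZIP)  # GDAL reads inside the archive
layer = ds.GetLayer(0)
feat = layer.GetNextFeature()
export = datetime.date.fromisoformat(feat.GetField('EXPORT_DATE'))
assert abs((mtime - export).days) <= 1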
|
| |
The file appears to be official and updated as often as the GPKG
nowadays. Both have 101 distinct features.
And unlike the GPKG the multipolygons are not split, so we add the
UNIQUE CONSTRAINT on "OmrID".
For some reason there are a lot of duplicates in the GPKG.
sqlite> SELECT "ObjectID", COUNT(*) FROM "lst_vbk_havsbaserad_vindkraft" GROUP BY 1 ORDER BY 2;
101|9
1|24
2|24
[…]
100|24
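A sketch of the constraint, assuming the layer lands in a PostgreSQL
table of the same name under the postgis schema (psycopg2 and DSN
assumed):
import psycopg2

conn = psycopg2.connect('dbname=webmap')  # assumed DSN
with conn, conn.cursor() as cur:
    cur.execute('ALTER TABLE postgis."lst_vbk_havsbaserad_vindkraft" '
                'ADD CONSTRAINT lst_vbk_havsbaserad_vindkraft_omrid_key '
                'UNIQUE ("OmrID")')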
|
| |
So the information can be exposed in the webmap's info dialog.
|
| |
We don't want to bump ctimes since they are used in the layer cache.
|
| |
The Web Application doesn't need the original full/non-simplified
geometry. After all, OpenLayers has fetched the tiles already and the
(visible part of the) geometry is already cached in the target SRS with
sensible simplification factors.
So there is really no need to transfer megabytes of high-precision data
to the client to highlight the feature. This change means that CGI
responses will remain small and can hence be buffered.
To speed things up further one could store the geometry's
area/perimeter/length as generated columns (along with the lists of
municipality and county codes) and serve rows directly from PostgreSQL
(bypassing PostGIS). Also, since buffering is now an option, the rows
don't need to be served in the same order as the request, so one could
batch queries using ANY(ARRAY[…]) or IN.
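For instance, a batched lookup could look like this sketch (psycopg2,
DSN and the generated columns assumed):
import psycopg2

conn = psycopg2.connect('dbname=webmap')  # assumed DSN
fids = [4711, 42, 1234]
with conn.cursor() as cur:
    # One round trip for the whole batch; rows may come back in any
    # order since the response is buffered and reassembled afterwards.
    cur.execute('SELECT "ogc_fid", geom_area, geom_perimeter '
                'FROM postgis."nvr:SCI_Rikstackande" '
                'WHERE "ogc_fid" = ANY(%s)', (fids,))
    rows = {fid: rest for fid, *rest in cur}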
|
| |
We subdivide administrative polygons to speed things up, cf.
https://symphony.is/about-us/blog/boosting-postgis-performance
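The technique from the blog post, as a sketch with illustrative table
names (psycopg2 and DSN assumed):
import psycopg2

conn = psycopg2.connect('dbname=webmap')  # assumed DSN
with conn, conn.cursor() as cur:
    # Split each polygon into pieces of at most 256 vertices so index
    # probes compare against much smaller geometries.
    cur.execute('CREATE TABLE kommuner_subdivided AS '
                'SELECT kommunkod, ST_Subdivide(geom, 256) AS geom '
                'FROM kommuner')
    cur.execute('CREATE INDEX ON kommuner_subdivided USING GIST (geom)')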
|
| |
This causes overhead on some geometries though. Querying 100000 random
features in each table and measuring yields the following.
Without geom_area/geom_perimeter/geom_length
============================================
nvr:SCI_Rikstackande (forced to 2D)
Time: min=0.000s, max=0.668s, avg=0.003s
ren:riks_ren
Time: min=0.000s, max=0.090s, avg=0.012s
sametinget:flyttled
Time: min=0.000s, max=0.003s, avg=0.000s
sks:UtfordAvverk
Time: min=0.000s, max=0.180s, avg=0.001s
With geom_area/geom_perimeter/geom_length
=========================================
nvr:SCI_Rikstackande (forced to 2D)
Time: min=0.000s, max=1.242s, avg=0.004s
ren:riks_ren
Time: min=0.000s, max=0.092s, avg=0.012s
sametinget:flyttled
Time: min=0.000s, max=0.016s, avg=0.000s
sks:UtfordAvverk
Time: min=0.000s, max=0.190s, avg=0.001s
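The columns behind these numbers would be declared like so (a sketch;
table name, DSN and exact expressions assumed):
import psycopg2

conn = psycopg2.connect('dbname=webmap')  # assumed DSN
with conn, conn.cursor() as cur:
    cur.execute('ALTER TABLE postgis."ren:riks_ren" '
                'ADD COLUMN geom_area double precision '
                '  GENERATED ALWAYS AS (ST_Area(wkb_geometry)) STORED, '
                'ADD COLUMN geom_perimeter double precision '
                '  GENERATED ALWAYS AS (ST_Perimeter(wkb_geometry)) STORED')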
|
| |
And skip geometry columns other than the first one. Some sources
(svk:stolpar, nvr:SPA_Rikstackande, nvr:OSPAR, svk:stationsomraden,
nvr:HELCOM, nvr:SCI_Rikstackande) come as 3D and we don't flatten them
during import.
Dropping the Z (and/or M) component during GeoJSON export avoids
transferring data which will be ignored by OpenLayers anyway. The
overhead of ST_Force2D() seems negligible in comparison with
ST_AsGeoJSON(). Querying 100000 random features in each table and
measuring yields the following.
Simple query
============
SELECT convert_to(ST_AsGeoJSON(m.*,geom_column=>'wkb_geometry',pretty_bool=>'f'),'UTF8') AS "GeoJSON"
FROM postgis."nvr:SCI_Rikstackande" m WHERE m."ogc_fid" = %s
nvr:HELCOM
Time: min=0.000s, max=0.002s, avg=0.000s
Size: min=1.24kiB, max=24.23kiB, avg=9.47kiB
nvr:SCI_Rikstackande
Time: min=0.000s, max=0.663s, avg=0.002s
Size: min=0.84kiB, max=61613.04kiB, avg=259.67kiB
(Yes that's >50MiB, cf. SE0820430 Torne och Kalix älvsystem.)
nvr:SPA_Rikstackande
Time: min=0.000s, max=0.008s, avg=0.001s
Size: min=1.36kiB, max=392.37kiB, avg=35.23kiB
svk:stolpar
Time: min=0.000s, max=0.003s, avg=0.000s
Size: min=0.19kiB, max=0.20kiB, avg=0.19kiB
svk:stationsomraden
Time: min=0.000s, max=0.003s, avg=0.000s
Size: min=0.35kiB, max=1.22kiB, avg=0.56kiB
sks:UtfordAvverk
Time: min=0.000s, max=0.023s, avg=0.001s
Size: min=0.51kiB, max=1117.39kiB, avg=9.75kiB
Advanced query
==============
WITH feature AS (
SELECT m."ogc_fid",
m.…
ST_Force2D(m."wkb_geometry") AS "wkb_geometry",
%s AS layer_group,
%s AS layer
FROM postgis."nvr:SCI_Rikstackande" m
WHERE m."ogc_fid" = %s
)
SELECT convert_to(ST_AsGeoJSON(feature.*,geom_column=>'wkb_geometry',pretty_bool=>'f'),'UTF8') AS "GeoJSON"
FROM feature
(We add `layer_group` and `layer` to the latter, which adds 48 bytes —
the size of `, "layer_group": "layer_group", "layer": "layer"` — to the
output.)
nvr:HELCOM
Time: min=0.000s, max=0.004s, avg=0.000s
Size: min=1.21kiB, max=22.62kiB, avg=8.86kiB
nvr:SCI_Rikstackande
Time: min=0.000s, max=0.893s, avg=0.002s
Size: min=0.88kiB, max=57799.83kiB, avg=243.62kiB
nvr:SPA_Rikstackande
Time: min=0.000s, max=0.011s, avg=0.001s
Size: min=1.37kiB, max=367.70kiB, avg=33.13kiB
svk:stolpar
Time: min=0.000s, max=0.010s, avg=0.000s
Size: min=0.23kiB, max=0.23kiB, avg=0.23kiB
svk:stationsomraden
Time: min=0.000s, max=0.004s, avg=0.000s
Size: min=0.38kiB, max=1.21kiB, avg=0.58kiB
sks:UtfordAvverk (already in 2D so skip ST_Force2D() call)
Time: min=0.000s, max=0.085s, avg=0.001s
Size: min=0.55kiB, max=1117.44kiB, avg=9.80kiB
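The ST_Force2D() call can be made conditional on the layer's
dimensionality; a sketch using the GDAL Python bindings:
from osgeo import ogr

def geometry_expression(layer):
    # Only pay for ST_Force2D() when the layer actually stores Z or M.
    gtype = layer.GetGeomType()
    if ogr.GT_HasZ(gtype) or ogr.GT_HasM(gtype):
        return 'ST_Force2D(m."wkb_geometry") AS "wkb_geometry"'
    return 'm."wkb_geometry"'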
|
| |
application() returns an iterator, so we serve responses immediately as
they are received from the server (no buffering).
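A minimal sketch of the pattern (the real application's internals
differ; fetch_chunks() is hypothetical):
def application(environ, start_response):
    start_response('200 OK', [('Content-Type', 'application/geo+json')])
    # Returning a generator lets the WSGI gateway forward each chunk as
    # soon as it is yielded, instead of buffering the whole response.
    return (chunk for chunk in fetch_chunks(environ))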
|
| |
This avoids exporting feature properties and geometries as GeoJSON.
(We could do the same for tiles, but it would be too slow.) The
downside is that the tables can't be read from while `webmap-import` is
running, so metadata lookups might briefly time out during the daily
update.
It's still worth doing as it greatly simplifies deployment and saves
GiBs of disk space.
|
| |
Having a shared temporary directory, flock(2)'ed to avoid races, is a
great idea in theory but unfortunately doesn't work so well with
systemd.exec(5)'s ReadWritePaths settings since
ReadWritePaths=/var/www/webmap/tiles
ReadWritePaths=/var/www/webmap/tiles.tmp
creates multiple mount points pointing at the same file system and
rename(2)/renameat2(2) can't cope with that. Quoting the manual:
EXDEV oldpath and newpath are not on the same mounted filesystem.
(Linux permits a filesystem to be mounted at multiple points,
but rename() does not work across different mount points, even
if the same filesystem is mounted on both.)
So the options are to either use a single ReadWritePaths=/var/www/webmap,
or --mvtdir-tmp=/var/www/webmap/tiles/.tmp. Both kind of defeat the
point (we'd in fact want to use --mvtdir-tmp=/var/tmp/webmap/tiles), so
we use mkdtemp(3) instead.
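A sketch of the resulting swap (the real script's paths and error
handling differ):
import os, tempfile

tiles = '/var/www/webmap/tiles'
# Staging directory beside the target => same mount, so rename(2)
# cannot fail with EXDEV.
tmp = tempfile.mkdtemp(prefix='.tiles.', dir=os.path.dirname(tiles))
# ... generate the new tile tree under tmp ...
old = tiles + '.old'
os.rename(tiles, old)  # two-step swap: rename(2) can't replace a
os.rename(tmp, tiles)  # non-empty directory in one go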
|
| |
The webmap currently doesn't go beyond zoom level 7 (8m/px) so it makes
little sense to create tiles for higher zoom levels.
It speeds things up and saves CPU resources and disk space, too. For
group ‘ren’ on the desktop (GPKG destination dataset), before
(max-zoom=9):
INFO: Exported 4488 features to 4 MVT layers in 00:06:02.926
INFO: Tile count: 75972 [min=33 B, max=128.16 kiB, sum=15.10 MiB, avg=208 B]
vs. after (max-zoom=7):
INFO: Exported 4488 features to 4 MVT layers in 00:00:25.548
INFO: Tile count: 5031 [min=35 B, max=128.16 kiB, sum=4.80 MiB, avg=0.98 kiB]
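With the GDAL Python bindings the cap amounts to a dataset creation
option (a sketch; source DSN assumed):
from osgeo import gdal

gdal.VectorTranslate(
    '/var/www/webmap/tiles',   # destination MVT directory
    'PG:dbname=webmap',        # assumed source DSN
    format='MVT',
    datasetCreationOptions=['MINZOOM=0', 'MAXZOOM=7'],
)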
|
| |
It's done automatically by the MVT driver. Also, we don't want to clip
at the extent boundaries, but instead leave some headroom so the
clipped polygon border is not visible on the map. The MVT driver does
that.
It seems that GDAL 3.6.2 from Debian Bookworm generates too many tiles
though. It yields the following tile counts for group ‘ren’:
no manual clipping, BUFFER=32: 83718 tiles [min=33 B, max=117.70 kiB, sum=15.73 MiB, avg=197 B]
no manual clipping, BUFFER=0: 83676 tiles
clip at extent, BUFFER=32: 76256 tiles
GDAL 3.10.3 from Debian Trixie yields less surprising tile counts:
no manual clipping, BUFFER=32: 75972 tiles [min=33 B, max=128.16 kiB, sum=15.10 MiB, avg=208 B]
no manual clipping, BUFFER=0: 75939 tiles
clip at extent, BUFFER=32: 75972 tiles
(Interestingly, the largest tile — 0/0/0.pbf — is also over 10 kiB
larger with the more recent GDAL version.)
|
| |
And only densify if need be. Most sources are already in SWEREF 99
(modulo axis mapping strategy) so in practice we can use mere rectangles
as spatial filters.
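A sketch of the filter setup (GDAL Python bindings; the densification
step is illustrative):
from osgeo import ogr

def set_spatial_filter(layer, rect, target_srs):
    xmin, ymin, xmax, ymax = rect
    if layer.GetSpatialRef().IsSame(target_srs):
        # Same SRS: a mere rectangle is an exact spatial filter.
        layer.SetSpatialFilterRect(xmin, ymin, xmax, ymax)
        return
    poly = ogr.CreateGeometryFromWkt(
        f'POLYGON(({xmin} {ymin},{xmax} {ymin},{xmax} {ymax},'
        f'{xmin} {ymax},{xmin} {ymin}))')
    poly.AssignSpatialReference(target_srs)
    poly.Segmentize((xmax - xmin) / 100)  # densify before reprojecting
    poly.TransformTo(layer.GetSpatialRef())
    layer.SetSpatialFilter(poly)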
|
| |
This allows ordering features before hashing, which is required for
layers from Naturvårdsverket and Skogsstyrelsen (features appear to be
randomly ordered in daily exports, so normalization and fingerprinting
are needed to detect whether there are any changes).
On the downside, this makes the cache a PostgreSQL-only feature. It's
also marginally slower than the old logic because for some reason
PostgreSQL doesn't seem to use the UNIQUE index and instead does a seq
scan followed by a quicksort.
Without fingerprinting logic:
$ time -f "%E (%U user, %S sys) %Mk maxres"
/usr/local/bin/webmap-import \
--cachedir=/var/cache/webmap \
--lockfile=/run/lock/webmap/lock \
--lockdir-sources=/run/lock/webmap/cache \
--force \
"sks:UtfordAvverk"
[…]
INFO: Layer "sks:UtfordAvverk" has 313044 features
[…]
3:54.45 (85.28 user, 26.19 sys) 72520k maxres
With the old fingerprinting logic (a full client-side SHA-256 digest of
features as they are being imported):
$ time -f "%E (%U user, %S sys) %Mk maxres"
/usr/local/bin/webmap-import \
--cachedir=/var/cache/webmap \
--lockfile=/run/lock/webmap/lock \
--lockdir-sources=/run/lock/webmap/cache \
--force \
"sks:UtfordAvverk"
[…]
INFO: Imported 313044 features from source layer "UtfordAvverkningYta"
[…]
INFO: Updated layer "sks:UtfordAvverk" has new fingerprint e655a97a
4:15.65 (108.46 user, 26.73 sys) 80672k maxres
With the new fingerprinting logic (a hybrid client/server SHA-256
digest using hash_record_extended() calls after the import process):
$ time -f "%E (%U user, %S sys) %Mk maxres"
/usr/local/bin/webmap-import \
--cachedir=/var/cache/webmap \
--lockfile=/run/lock/webmap/lock \
--lockdir-sources=/run/lock/webmap/cache \
--force \
"sks:UtfordAvverk"
[…]
INFO: Layer "sks:UtfordAvverk" has 313044 features
[…]
4:30.77 (87.02 user, 25.67 sys) 72856k maxres
Same but without ORDER BY (or ORDER BY ogc_fid):
4:07.52 (88.23 user, 26.58 sys) 72060k maxres
(A server-side incremental hash function would be better, but no such
thing currently exists, and the only way to hash fully server side is
to aggregate rows into an array, which would be too expensive
memory-wise for large tables.)
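A sketch of the hybrid digest (table name and DSN assumed;
hash_record_extended() needs PostgreSQL 14 or later):
import hashlib
import psycopg2

conn = psycopg2.connect('dbname=webmap')  # assumed DSN
digest = hashlib.sha256()
with conn.cursor() as cur:
    # Server-side 64-bit hash per row, in deterministic order; the
    # client folds the stream into a single SHA-256 digest.
    cur.execute('SELECT hash_record_extended(m.*, 0) '
                'FROM postgis."sks:UtfordAvverk" m '
                'ORDER BY m."ogc_fid"')
    for (h,) in cur:
        digest.update(h.to_bytes(8, 'big', signed=True))
fingerprint = digest.hexdigest()[:8]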
|
| |
That way we can detect when the import of all layers is a no-op
(besides changing last_updated) and exit gracefully.
|
| |
Comparing modification times is not enough since some sources (for
instance Naturvårdsverket's SCI_Rikstackande) are updated on the server
even though no objects are being added; the source layer remains
unchanged but the file differs because of OBJECTID changes we are not
interested in.
|
| |
Some layers, for instance svk:*, use the same source file, and we want a
single lock per file.
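A sketch of the deduplication (lock-file layout assumed):
import fcntl, os

_locks = {}  # canonical source path -> lock file descriptor

def lock_source(path):
    real = os.path.realpath(path)
    if real not in _locks:
        fd = os.open(real + '.lock', os.O_CREAT | os.O_RDWR, 0o664)
        fcntl.flock(fd, fcntl.LOCK_SH)
        _locks[real] = fd
    return _locks[real]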
|
| |
Using the default umask of 0022 yields lock files without group write
permission, so trying to open and flock(2) them from a different user
failed.
|
| |
Cf. for instance svk:transmissionsnatsprojekt. It has an import
definition, but no source path/URL for webmap-download to operate on.
|
| |
Thanks to EU Directive 2019/1024 of the European Parliament and of the
Council of 20 June 2019 on open data and the re-use of public sector
information, and the Commission Implementing Regulation (EU) 2023/138 of
21 December 2022, the mineral register is now available under the terms
of the CC0 1.0 Universal licence, see
https://www.sgu.se/produkter-och-tjanster/geologiska-data/malmer-och-mineral--geologiska-data/mineralrattigheter-och-prospektering/
Given we no longer need to parse SVG images from the webmap, we drop
webmap-download-mrr.py and add layers for expired and forbidden permits
(ut_metaller_industrimineral_forfallna, ut_olja_gas_diamant_forfallna,
bearbetningskoncessioner_forfallna, ut_metaller_industrimineral_forbud,
ut_diamant_forbud) as well as markanvisningar_bk_ansokta.
Unfortunately the GeoPackage file doesn't include peat concessions, so
we drop them from config.yml for now. Should they be of interest we can
always restore webmap-download-mrr.py and/or order the register from
Bergsstaten, cf. https://resource.sgu.se/bergsstaten/exporter-ur-mrr-info.pdf .
|
| |
It's much clearer that way. The destination layer is cleared and
updated in that function, so it makes sense if that's also where
transactions (or SAVEPOINTs) are committed or rolled back.
|
| |
In a future commit we'll fingerprint layers to detect changes.
Comparing modification times is not enough since some sources (for
instance Naturvårdsverket's SCI_Rikstackande) are updated on the server
even though no objects are being added; the source layer remains
unchanged but the file differs because of OBJECTID changes we are not
interested in.
Rather than using another cache layer/table for fingerprints, we cache
destination layernames rather than triplets (source_path, archive_member,
layername), along with the time at which the import was started rather
than source_path's mtime.
There is indeed no value in having source_path's exact mtime in the
cache. What we need is simply a way to detect whether source paths have
been updated in a subsequent run. Thanks to the shared locks the ctime
of any updated source path will be at least the time when the locks are
released, thereby exceeding the last_updated value.
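The staleness test then reduces to a sketch like this:
import os

def is_stale(source_path, last_updated):
    # last_updated: epoch seconds recorded when the previous import
    # started; any later download bumps the source's ctime past it.
    return os.stat(source_path).st_ctime > last_updated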
|
| |
Cf. https://gdal.org/en/stable/api/vector_c_api.html#c.OGR_TZFLAG_UTC .
|
| |
files.
|
| |
modification time.
That way we can avoid the expensive unpack+import when the source
file(s) have not been updated since the last run. The check can be
bypassed with a new flag `--force`.
We use a sequence for the FIDs (primary key) and a UNIQUE constraint on
triplets (source_path, archive_member, layername), as GDAL doesn't
support multi-column primary keys.
To avoid races between the stat(2) calls, gdal.OpenEx() and updates via
`webmap-download` runs, we place a shared lock on the downloaded files.
One could resort to some tricks to eliminate the race between the first
two, but there is also some value in having consistency during the
entire execution of the script (a single source file can be used by
multiple layers for instance, and it makes sense to use the very same
file for all layers in that case).
We also intersperse dso.FlushCache() calls between _importSource() calls
in order to force the PG driver to call EndCopy() to detect errors and
trigger a rollback when _importSource() fails.
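A sketch of the resulting loop (_importSource() is the function named
above; the surrounding plumbing is illustrative):
import fcntl, os
from osgeo import gdal

def import_all(layers, dso, force=False):
    for layer in layers:
        fd = os.open(layer.source_path, os.O_RDONLY)
        fcntl.flock(fd, fcntl.LOCK_SH)  # shared with webmap-download
        try:
            if not force and os.fstat(fd).st_mtime <= layer.cached_mtime:
                continue  # source unchanged since the last run
            src = gdal.OpenEx(layer.source_path, gdal.OF_VECTOR)
            _importSource(dso, src, layer)
            dso.FlushCache()  # force EndCopy() so errors surface here
        finally:
            os.close(fd)  # also releases the flock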
|
| |
This avoids duplication when the same source file is used multiple times
(either by the same layer or by multiple layers). This change breaks
webmap-import, but that one will be refactored shortly.
It also breaks webmap-import-mrr.py, which is no longer used since
mineralrattigheter.zip can be downloaded from SGU's site directly.
|
| |
And add an optional precision argument.
|
| |
It appears it is no longer stripped in Bergsstaten's mineralregister.
|
| |
Apparently there are duplicates for dnr BS 22-28-2000
(bearbetningskoncessioner_beviljade) and BS 23-126-2007,
BS 23-116-2005, BS 23-226-1930, BS 23-149-1960, BS 23-7-2000,
and BS 23-105-2010 (markanvisningar_bk_beviljade).
|
| |
Forgot that PostgreSQL space-pads the latter, and that the former
doesn't in fact yield extra storage costs, cf.
https://www.postgresql.org/docs/13/datatype-character.html
|
| |
As of today SK 117-2024 has no contract date set, so the constraint
fails during import.
https://www.skogsstyrelsen.se/skogens-parlor/NVAvtal/?objektid=4020527
|
| |
Among other things this allows CLUSTERing on the GIST indices, cf.
https://postgis.net/docs/manual-3.3/performance_tips.html#database_clustering
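A sketch (table, index name and DSN assumed):
import psycopg2

conn = psycopg2.connect('dbname=webmap')  # assumed DSN
conn.autocommit = True
with conn.cursor() as cur:
    # Physically order the table by its GIST index so spatially close
    # features end up in nearby heap pages.
    cur.execute('CLUSTER postgis."ren:riks_ren" '
                'USING "riks_ren_wkb_geometry_geom_idx"')  # assumed index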
|
| |
Unfortunately the dam register is way too noisy, and it looks like
there is no way to download a pre-curated dam register with only
noteworthy dams used for production (SvK's dammar_pf.shp comes close but
still misses some), so we manually remove the ones which are more than
2 km from a production site or a power station.
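The manual pruning amounts to something like this sketch (table names
and DSN assumed; distances in metres under SWEREF 99 TM):
import psycopg2

conn = psycopg2.connect('dbname=webmap')  # assumed DSN
with conn, conn.cursor() as cur:
    cur.execute('DELETE FROM dammar d WHERE NOT EXISTS ('
                '  SELECT 1 FROM produktionsplatser p '
                '  WHERE ST_DWithin(d.geom, p.geom, 2000))')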
|