| |
All features have the same export date value, so the date is arguably a
schema attribute, not a feature attribute. Since exports from the
mineral register happen daily nowadays, the date should never be more
than a day off compared with the zip file's mtime.
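As a sketch of that invariant (GDAL Python bindings assumed; the
'EXPORT_DATE' field name is hypothetical — the real attribute name may
differ):
import datetime, os
from osgeo import ogr

ZIP = 'mineralrattigheter.zip'
mtime = datetime.date.fromtimestamp(os.stat(ZIP).st_mtime)
ds = ogr.Open('/vsizip/' + ZIP)  # GDAL reads inside the archive
layer = ds.GetLayer(0)
feat = layer.GetNextFeature()
export = datetime.date.fromisoformat(feat.GetField('EXPORT_DATE'))
assert abs((mtime - export).days) <= 1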
|
| |
The file appears to be official and updated as often as the GPKG
nowadays. Both have 101 distinct features.
And unlike the GPKG the multipolygons are not split, so we add the
UNIQUE CONSTRAINT on "OmrID".
For some reason there are a lot of duplicates in the GPKG.
sqlite> SELECT "ObjectID", COUNT(*) FROM "lst_vbk_havsbaserad_vindkraft" GROUP BY 1 ORDER BY 2;
101|9
1|24
2|24
[…]
100|24
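A sketch of the constraint, assuming the layer lands in a PostgreSQL
table of the same name under the postgis schema (psycopg2 and DSN
assumed):
import psycopg2

conn = psycopg2.connect('dbname=webmap')  # assumed DSN
with conn, conn.cursor() as cur:
    cur.execute('ALTER TABLE postgis."lst_vbk_havsbaserad_vindkraft" '
                'ADD CONSTRAINT lst_vbk_havsbaserad_vindkraft_omrid_key '
                'UNIQUE ("OmrID")')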
|
| |
So the information can be exposed in the webmap's info dialog.
|
| |
We don't want to bump ctimes since they are used in the layer cache.
|
| |
The Web Application doesn't need the original full/non-simplified
geometry. After all, OpenLayers has fetched the tiles already and the
(visible part of the) geometry is already cached in the target SRS with
sensible simplification factors.
So there is really no need to transfer megabytes of high-precision data
to the client to highlight the feature. This change means that CGI
responses will remain small and can hence be buffered.
To speed things up further one could store the geometry's
area/perimeter/length as generated columns (along with the lists of
municipality and county codes) and serve rows directly from PostgreSQL
(bypassing PostGIS). Also, since buffering is now an option, the rows
don't need to be served in the same order as the request, so one could
batch queries using ANY(ARRAY[…]) or IN.
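For instance, a batched lookup could look like this sketch (psycopg2,
DSN and the generated columns assumed):
import psycopg2

conn = psycopg2.connect('dbname=webmap')  # assumed DSN
fids = [4711, 42, 1234]
with conn.cursor() as cur:
    # One round trip for the whole batch; rows may come back in any
    # order since the response is buffered and reassembled afterwards.
    cur.execute('SELECT "ogc_fid", geom_area, geom_perimeter '
                'FROM postgis."nvr:SCI_Rikstackande" '
                'WHERE "ogc_fid" = ANY(%s)', (fids,))
    rows = {fid: rest for fid, *rest in cur}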
|
| |
We subdivide administrative polygons to speed things up, cf.
https://symphony.is/about-us/blog/boosting-postgis-performance
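The technique from the blog post, as a sketch with illustrative table
names (psycopg2 and DSN assumed):
import psycopg2

conn = psycopg2.connect('dbname=webmap')  # assumed DSN
with conn, conn.cursor() as cur:
    # Split each polygon into pieces of at most 256 vertices so index
    # probes compare against much smaller geometries.
    cur.execute('CREATE TABLE kommuner_subdivided AS '
                'SELECT kommunkod, ST_Subdivide(geom, 256) AS geom '
                'FROM kommuner')
    cur.execute('CREATE INDEX ON kommuner_subdivided USING GIST (geom)')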
|
| |
This causes overhead on some geometries though. Querying 100000 random
features in each table and measuring yields the following.
Without geom_area/geom_perimeter/geom_length
============================================
nvr:SCI_Rikstackande (forced to 2D)
Time: min=0.000s, max=0.668s, avg=0.003s
ren:riks_ren
Time: min=0.000s, max=0.090s, avg=0.012s
sametinget:flyttled
Time: min=0.000s, max=0.003s, avg=0.000s
sks:UtfordAvverk
Time: min=0.000s, max=0.180s, avg=0.001s
With geom_area/geom_perimeter/geom_length
=========================================
nvr:SCI_Rikstackande (forced to 2D)
Time: min=0.000s, max=1.242s, avg=0.004s
ren:riks_ren
Time: min=0.000s, max=0.092s, avg=0.012s
sametinget:flyttled
Time: min=0.000s, max=0.016s, avg=0.000s
sks:UtfordAvverk
Time: min=0.000s, max=0.190s, avg=0.001s
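The columns behind these numbers would be declared like so (a sketch;
table name, DSN and exact expressions assumed):
import psycopg2

conn = psycopg2.connect('dbname=webmap')  # assumed DSN
with conn, conn.cursor() as cur:
    cur.execute('ALTER TABLE postgis."ren:riks_ren" '
                'ADD COLUMN geom_area double precision '
                '  GENERATED ALWAYS AS (ST_Area(wkb_geometry)) STORED, '
                'ADD COLUMN geom_perimeter double precision '
                '  GENERATED ALWAYS AS (ST_Perimeter(wkb_geometry)) STORED')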
|
| |
And skip geometry columns other than the first one. Some sources
(svk:stolpar, nvr:SPA_Rikstackande, nvr:OSPAR, svk:stationsomraden,
nvr:HELCOM, nvr:SCI_Rikstackande) come as 3D and we don't flatten them
during import.
Dropping the Z (and/or M) component during GeoJSON export avoids
transferring data which will be ignored by OpenLayers anyway. The
overhead of ST_Force2D() seems negligible in comparison with
ST_AsGeoJSON(). Querying 100000 random features in each table and
measuring yields the following.
Simple query
============
SELECT convert_to(ST_AsGeoJSON(m.*,geom_column=>'wkb_geometry',pretty_bool=>'f'),'UTF8') AS "GeoJSON"
FROM postgis."nvr:SCI_Rikstackande" m WHERE m."ogc_fid" = %s
nvr:HELCOM
Time: min=0.000s, max=0.002s, avg=0.000s
Size: min=1.24kiB, max=24.23kiB, avg=9.47kiB
nvr:SCI_Rikstackande
Time: min=0.000s, max=0.663s, avg=0.002s
Size: min=0.84kiB, max=61613.04kiB, avg=259.67kiB
(Yes that's >50MiB, cf. SE0820430 Torne och Kalix älvsystem.)
nvr:SPA_Rikstackande
Time: min=0.000s, max=0.008s, avg=0.001s
Size: min=1.36kiB, max=392.37kiB, avg=35.23kiB
svk:stolpar
Time: min=0.000s, max=0.003s, avg=0.000s
Size: min=0.19kiB, max=0.20kiB, avg=0.19kiB
svk:stationsomraden
Time: min=0.000s, max=0.003s, avg=0.000s
Size: min=0.35kiB, max=1.22kiB, avg=0.56kiB
sks:UtfordAvverk
Time: min=0.000s, max=0.023s, avg=0.001s
Size: min=0.51kiB, max=1117.39kiB, avg=9.75kiB
Advanced query
==============
WITH feature AS (
SELECT m."ogc_fid",
m.…
ST_Force2D(m."wkb_geometry") AS "wkb_geometry",
%s AS layer_group,
%s AS layer
FROM postgis."nvr:SCI_Rikstackande" m
WHERE m."ogc_fid" = %s
)
SELECT convert_to(ST_AsGeoJSON(feature.*,geom_column=>'wkb_geometry',pretty_bool=>'f'),'UTF8') AS "GeoJSON"
FROM feature
(We add `layer_group` and `layer` to the latter, which adds 48 bytes —
the size of `, "layer_group": "layer_group", "layer": "layer"` — to the
output.)
nvr:HELCOM
Time: min=0.000s, max=0.004s, avg=0.000s
Size: min=1.21kiB, max=22.62kiB, avg=8.86kiB
nvr:SCI_Rikstackande
Time: min=0.000s, max=0.893s, avg=0.002s
Size: min=0.88kiB, max=57799.83kiB, avg=243.62kiB
nvr:SPA_Rikstackande
Time: min=0.000s, max=0.011s, avg=0.001s
Size: min=1.37kiB, max=367.70kiB, avg=33.13kiB
svk:stolpar
Time: min=0.000s, max=0.010s, avg=0.000s
Size: min=0.23kiB, max=0.23kiB, avg=0.23kiB
svk:stationsomraden
Time: min=0.000s, max=0.004s, avg=0.000s
Size: min=0.38kiB, max=1.21kiB, avg=0.58kiB
sks:UtfordAvverk (already in 2D so skip ST_Force2D() call)
Time: min=0.000s, max=0.085s, avg=0.001s
Size: min=0.55kiB, max=1117.44kiB, avg=9.80kiB
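The ST_Force2D() call can be made conditional on the layer's
dimensionality; a sketch using the GDAL Python bindings:
from osgeo import ogr

def geometry_expression(layer):
    # Only pay for ST_Force2D() when the layer actually stores Z or M.
    gtype = layer.GetGeomType()
    if ogr.GT_HasZ(gtype) or ogr.GT_HasM(gtype):
        return 'ST_Force2D(m."wkb_geometry") AS "wkb_geometry"'
    return 'm."wkb_geometry"'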
|
| |
application() returns an iterator, so we serve responses immediately as
they are received from the server (no buffering).
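A minimal sketch of the pattern (the real application's internals
differ; fetch_chunks() is hypothetical):
def application(environ, start_response):
    start_response('200 OK', [('Content-Type', 'application/geo+json')])
    # Returning a generator lets the WSGI gateway forward each chunk as
    # soon as it is yielded, instead of buffering the whole response.
    return (chunk for chunk in fetch_chunks(environ))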
|
| |
This avoids exporting feature properties and geometries as GeoJSON.
(We could do the same for tiles, but it would be too slow.) The
downside is that the tables can't be read from while `webmap-import` is
running, so metadata lookups might briefly time out during the daily
update.
It's still worth doing as it greatly simplifies deployment and saves
GiBs of disk space.
|
| |
Having a shared temporary directory, flock(2)'ed to avoid races, is a
great idea in theory but unfortunately doesn't work so well with
systemd.exec(5)'s ReadWritePaths settings since
ReadWritePaths=/var/www/webmap/tiles
ReadWritePaths=/var/www/webmap/tiles.tmp
creates multiple mount points pointing at the same file system and
rename(2)/renameat2(2) can't cope with that. Quoting the manual:
EXDEV oldpath and newpath are not on the same mounted filesystem.
(Linux permits a filesystem to be mounted at multiple points,
but rename() does not work across different mount points, even
if the same filesystem is mounted on both.)
So the options are to either use a single ReadWritePaths=/var/www/webmap,
or --mvtdir-tmp=/var/www/webmap/tiles/.tmp. Both kind of defeat the
point (we'd in fact want to use --mvtdir-tmp=/var/tmp/webmap/tiles), so
we use mkdtemp(3) instead.
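A sketch of the resulting swap (the real script's paths and error
handling differ):
import os, tempfile

tiles = '/var/www/webmap/tiles'
# Staging directory beside the target => same mount, so rename(2)
# cannot fail with EXDEV.
tmp = tempfile.mkdtemp(prefix='.tiles.', dir=os.path.dirname(tiles))
# ... generate the new tile tree under tmp ...
old = tiles + '.old'
os.rename(tiles, old)  # two-step swap: rename(2) can't replace a
os.rename(tmp, tiles)  # non-empty directory in one go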
|
| |
The webmap currently doesn't go beyond zoom level 7 (8m/px) so it makes
little sense to create tiles for higher zoom levels.
It speeds things up and saves CPU resources and disk space, too. For
group ‘ren’ on the desktop (GPKG destination dataset), before
(max-zoom=9):
INFO: Exported 4488 features to 4 MVT layers in 00:06:02.926
INFO: Tile count: 75972 [min=33 B, max=128.16 kiB, sum=15.10 MiB, avg=208 B]
vs. after (max-zoom=7):
INFO: Exported 4488 features to 4 MVT layers in 00:00:25.548
INFO: Tile count: 5031 [min=35 B, max=128.16 kiB, sum=4.80 MiB, avg=0.98 kiB]
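With the GDAL Python bindings the cap amounts to a dataset creation
option (a sketch; source DSN assumed):
from osgeo import gdal

gdal.VectorTranslate(
    '/var/www/webmap/tiles',   # destination MVT directory
    'PG:dbname=webmap',        # assumed source DSN
    format='MVT',
    datasetCreationOptions=['MINZOOM=0', 'MAXZOOM=7'],
)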
|
| |
It's done automatically by the MVT driver. Also, we don't want to clip
at the extent boundaries, but instead leave some headroom so the
clipped polygon border is not visible on the map. The MVT driver does
that.
It seems that GDAL 3.6.2 from Debian Bookworm generates too many tiles
though. It yields the following tile counts for group ‘ren’:
no manual clipping, BUFFER=32: 83718 tiles [min=33 B, max=117.70 kiB, sum=15.73 MiB, avg=197 B]
no manual clipping, BUFFER=0: 83676 tiles
clip at extent, BUFFER=32: 76256 tiles
GDAL 3.10.3 from Debian Trixie yields less surprising tile counts:
no manual clipping, BUFFER=32: 75972 tiles [min=33 B, max=128.16 kiB, sum=15.10 MiB, avg=208 B]
no manual clipping, BUFFER=0: 75939 tiles
clip at extent, BUFFER=32: 75972 tiles
(Interestingly, the largest tile — 0/0/0.pbf — is also over 10 kiB
larger with the more recent GDAL version.)
|
| |
And only densify if need be. Most sources are already in SWEREF 99
(modulo axis mapping strategy) so in practice we can use mere rectangles
as spatial filters.
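A sketch of the filter setup (GDAL Python bindings; the densification
step is illustrative):
from osgeo import ogr

def set_spatial_filter(layer, rect, target_srs):
    xmin, ymin, xmax, ymax = rect
    if layer.GetSpatialRef().IsSame(target_srs):
        # Same SRS: a mere rectangle is an exact spatial filter.
        layer.SetSpatialFilterRect(xmin, ymin, xmax, ymax)
        return
    poly = ogr.CreateGeometryFromWkt(
        f'POLYGON(({xmin} {ymin},{xmax} {ymin},{xmax} {ymax},'
        f'{xmin} {ymax},{xmin} {ymin}))')
    poly.AssignSpatialReference(target_srs)
    poly.Segmentize((xmax - xmin) / 100)  # densify before reprojecting
    poly.TransformTo(layer.GetSpatialRef())
    layer.SetSpatialFilter(poly)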
|
| |
This allows ordering features before hashing, which is required for
layers from Naturvårdsverket and Skogsstyrelsen (features appear to be
randomly ordered in daily exports, so normalization and fingerprinting
are needed to detect whether there are any changes).
On the downside, this makes the cache a PostgreSQL-only feature. It's
also marginally slower than the old logic because for some reason
PostgreSQL doesn't seem to use the UNIQUE index and instead does a seq
scan followed by a quicksort.
Without fingerprinting logic:
$ time -f "%E (%U user, %S sys) %Mk maxres"
/usr/local/bin/webmap-import \
--cachedir=/var/cache/webmap \
--lockfile=/run/lock/webmap/lock \
--lockdir-sources=/run/lock/webmap/cache \
--force \
"sks:UtfordAvverk"
[…]
INFO: Layer "sks:UtfordAvverk" has 313044 features
[…]
3:54.45 (85.28 user, 26.19 sys) 72520k maxres
With the old fingerprinting logic (a full client-side SHA-256 digest of
features as they are being imported):
$ time -f "%E (%U user, %S sys) %Mk maxres"
/usr/local/bin/webmap-import \
--cachedir=/var/cache/webmap \
--lockfile=/run/lock/webmap/lock \
--lockdir-sources=/run/lock/webmap/cache \
--force \
"sks:UtfordAvverk"
[…]
INFO: Imported 313044 features from source layer "UtfordAvverkningYta"
[…]
INFO: Updated layer "sks:UtfordAvverk" has new fingerprint e655a97a
4:15.65 (108.46 user, 26.73 sys) 80672k maxres
With the new fingerprinting logic (a hybrid client/server SHA-256
digest using hash_record_extended() calls after the import process):
$ time -f "%E (%U user, %S sys) %Mk maxres"
/usr/local/bin/webmap-import \
--cachedir=/var/cache/webmap \
--lockfile=/run/lock/webmap/lock \
--lockdir-sources=/run/lock/webmap/cache \
--force \
"sks:UtfordAvverk"
[…]
INFO: Layer "sks:UtfordAvverk" has 313044 features
[…]
4:30.77 (87.02 user, 25.67 sys) 72856k maxres
Same but without ORDER BY (or ORDER BY ogc_fid):
4:07.52 (88.23 user, 26.58 sys) 72060k maxres
(A server-side incremental hash function would be better, but no such
thing currently exists, and the only way to hash fully server side is
to aggregate rows into an array, which would be too expensive
memory-wise for large tables.)
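A sketch of the hybrid digest (table name and DSN assumed;
hash_record_extended() needs PostgreSQL 14 or later):
import hashlib
import psycopg2

conn = psycopg2.connect('dbname=webmap')  # assumed DSN
digest = hashlib.sha256()
with conn.cursor() as cur:
    # Server-side 64-bit hash per row, in deterministic order; the
    # client folds the stream into a single SHA-256 digest.
    cur.execute('SELECT hash_record_extended(m.*, 0) '
                'FROM postgis."sks:UtfordAvverk" m '
                'ORDER BY m."ogc_fid"')
    for (h,) in cur:
        digest.update(h.to_bytes(8, 'big', signed=True))
fingerprint = digest.hexdigest()[:8]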
|
| |
That way we can detect when the import of all layers is a no-op
(besides changing last_updated) and exit gracefully.
|
| |
Comparing modification times is not enough since some sources (for
instance Naturvårdsverket's SCI_Rikstackande) are updated on the server
even though no objects are being added; the source layer remains
unchanged but the file differs because of OBJECTID changes we are not
interested in.
|
| |
Some layers, for instance svk:*, use the same source file, and we want a
single lock per file.
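A sketch of the deduplication (lock-file layout assumed):
import fcntl, os

_locks = {}  # canonical source path -> lock file descriptor

def lock_source(path):
    real = os.path.realpath(path)
    if real not in _locks:
        fd = os.open(real + '.lock', os.O_CREAT | os.O_RDWR, 0o664)
        fcntl.flock(fd, fcntl.LOCK_SH)
        _locks[real] = fd
    return _locks[real]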
|
| |
Using the default umask of 0022 yields lock files without group write
permission, so trying to open and flock(2) them from a different user
failed.
|
| |
Cf. for instance svk:transmissionsnatsprojekt. It has an import
definition, but no source path/URL for webmap-download to operate on.
|
| |
Thanks to EU Directive 2019/1024 of the European Parliament and of the
Council of 20 June 2019 on open data and the re-use of public sector
information, and the Commission Implementing Regulation (EU) 2023/138 of
21 December 2022, the mineral register is now available under the terms
of the CC0 1.0 Universal licence, see
https://www.sgu.se/produkter-och-tjanster/geologiska-data/malmer-och-mineral--geologiska-data/mineralrattigheter-och-prospektering/
Given we no longer need to parse SVG images from the webmap, we drop
webmap-download-mrr.py and add layers for expired and forbidden permits
(ut_metaller_industrimineral_forfallna, ut_olja_gas_diamant_forfallna,
bearbetningskoncessioner_forfallna, ut_metaller_industrimineral_forbud,
ut_diamant_forbud) as well as markanvisningar_bk_ansokta.
Unfortunately the GeoPackage file doesn't include peat concessions, so
we drop them from config.yml for now. Should they be of interest we can
always restore webmap-download-mrr.py and/or order the register from
Bergsstaten, cf. https://resource.sgu.se/bergsstaten/exporter-ur-mrr-info.pdf .
|
| |
It's much clearer that way. The destination layer is cleared and
updated in that function, so it makes sense if that's also where
transactions (or SAVEPOINTs) are committed or rolled back.
|
| |
In a future commit we'll fingerprint layers to detect changes.
Comparing modification times is not enough since some sources (for
instance Naturvårdsverket's SCI_Rikstackande) are updated on the server
even though no objects are being added; the source layer remains
unchanged but the file differs because of OBJECTID changes we are not
interested in.
Rather than using another cache layer/table for fingerprints, we cache
destination layernames rather than triplets (source_path, archive_member,
layername), along with the time at which the import was started rather
than source_path's mtime.
There is indeed no value in having source_path's exact mtime in the
cache. What we need is simply a way to detect whether source paths have
been updated in a subsequent run. Thanks to the shared locks the ctime
of any updated source path will be at least the time when the locks are
released, thereby exceeding the last_updated value.
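The staleness test then reduces to a sketch like this:
import os

def is_stale(source_path, last_updated):
    # last_updated: epoch seconds recorded when the previous import
    # started; any later download bumps the source's ctime past it.
    return os.stat(source_path).st_ctime > last_updated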
|
| |
Cf. https://gdal.org/en/stable/api/vector_c_api.html#c.OGR_TZFLAG_UTC .
|
| |
files.
|
| |
modification time.
That way we can avoid the expensive unpack+import when the source
file(s) have not been updated since the last run. The check can be
bypassed with a new flag `--force`.
We use a sequence for the FIDs (primary key) and a UNIQUE constraint on
triplets (source_path, archive_member, layername), as GDAL doesn't
support multi-column primary keys.
To avoid races between the stat(2) calls, gdal.OpenEx() and updates via
`webmap-download` runs, we place a shared lock on the downloaded files.
One could resort to some tricks to eliminate the race between the first
two, but there is also some value in having consistency during the
entire execution of the script (a single source file can be used by
multiple layers for instance, and it makes sense to use the very same
file for all layers in that case).
We also intersperse dso.FlushCache() calls between _importSource() calls
in order to force the PG driver to call EndCopy() to detect errors and
trigger a rollback when _importSource() fails.
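A sketch of the resulting loop (_importSource() is the function named
above; the surrounding plumbing is illustrative):
import fcntl, os
from osgeo import gdal

def import_all(layers, dso, force=False):
    for layer in layers:
        fd = os.open(layer.source_path, os.O_RDONLY)
        fcntl.flock(fd, fcntl.LOCK_SH)  # shared with webmap-download
        try:
            if not force and os.fstat(fd).st_mtime <= layer.cached_mtime:
                continue  # source unchanged since the last run
            src = gdal.OpenEx(layer.source_path, gdal.OF_VECTOR)
            _importSource(dso, src, layer)
            dso.FlushCache()  # force EndCopy() so errors surface here
        finally:
            os.close(fd)  # also releases the flock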
|
| |
This avoids duplication when the same source file is used multiple times
(either by the same layer or by multiple layers). This change breaks
webmap-import, but that one will be refactored shortly.
It also breaks webmap-import-mrr.py, which is no longer used since
mineralrattigheter.zip can be downloaded from SGU's site directly.
|
| |
And add an optional precision argument.
|
| |
It appears it is no longer stripped in Bergsstaten's mineralregister.
|
| |
Apparently there are duplicates for dnr BS 22-28-2000
(bearbetningskoncessioner_beviljade) and BS 23-126-2007,
BS 23-116-2005, BS 23-226-1930, BS 23-149-1960, BS 23-7-2000,
and BS 23-105-2010 (markanvisningar_bk_beviljade).
|
| |
Forgot that PostgreSQL space-pads the latter, and that the former
doesn't in fact yield extra storage costs, cf.
https://www.postgresql.org/docs/13/datatype-character.html
|
| |
As of today SK 117-2024 has no contract date set, so the constraint
fails during import.
https://www.skogsstyrelsen.se/skogens-parlor/NVAvtal/?objektid=4020527
|
| |
Among other things this allows CLUSTERing on the GIST indices, cf.
https://postgis.net/docs/manual-3.3/performance_tips.html#database_clustering
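A sketch (table, index name and DSN assumed):
import psycopg2

conn = psycopg2.connect('dbname=webmap')  # assumed DSN
conn.autocommit = True
with conn.cursor() as cur:
    # Physically order the table by its GIST index so spatially close
    # features end up in nearby heap pages.
    cur.execute('CLUSTER postgis."ren:riks_ren" '
                'USING "riks_ren_wkb_geometry_geom_idx"')  # assumed index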
|
| |
Unfortunately the dam register is way too noisy, and it looks like
there is no way to download a pre-curated dam register with only
noteworthy dams used for production (SvK's dammar_pf.shp comes close but
still misses some), so we manually remove the ones which are more than
2 km from a production site or a power station.
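The manual pruning amounts to something like this sketch (table names
and DSN assumed; distances in metres under SWEREF 99 TM):
import psycopg2

conn = psycopg2.connect('dbname=webmap')  # assumed DSN
with conn, conn.cursor() as cur:
    cur.execute('DELETE FROM dammar d WHERE NOT EXISTS ('
                '  SELECT 1 FROM produktionsplatser p '
                '  WHERE ST_DWithin(d.geom, p.geom, 2000))')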
|