|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In a future commit we'll fingerprint layers to detect changes.
Comparing modification times is not enough since some sources (for
instance Naturvårdsverket's SCI_Rikstackande) are updated on the server
even though no objects are being added; the source layer remains
unchanged but the file differs because of OBJECTID changes we are not
interested in.
Rather than using another cache layer/table for fingerprints, we cache
destination layernames rather than triplets (source_path, archive_member,
layername), along with the time at which the import was started rather
than source_path's mtime.
There is indeed no value in having exact source_path's mtime in the
cache. What we need is simply a way to detect whether source paths have
been updated in a subsequent run. Thanks to the shared locks the ctime
of any updated source path will be at least the time when the locks are
released, thereby exceeding the last_updated value.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
modification time.
That way we can avoid the expensive unpack+import when the source
file(s) have not been updated since the last run. The check can be
bypassed with a new flag `--force`.
We use a sequence for the FID:s (primary key) and a UNIQUE constraint on
triplets (source_path, archive_member, layername) as GDAL doesn't
support multicolumns primary keys.
To avoid races between the stat(2) calls, gdal.OpenEx() and updates via
`webmap-download` runs we place a shared lock on the downloaded files.
One could resort to some tricks to eliminate the race between the first
two, but there is also some value in having consistency during the
entire execution of the script (a single source file can be used by
multiple layers for instance, and it makes sense to use the very same
file for all layers in that case).
We also intersperse dso.FlushCache() calls between _importSource() calls
in order to force the PG driver to call EndCopy() to detect errors and
trigger a rollback when _importSource() fails.
|