Harmonise and integrate geometries into a standardised format
normGeometry(
input = NULL,
pattern = NULL,
query = NULL,
thresh = 10,
beep = NULL,
simplify = FALSE,
stringdist = TRUE,
strictMatch = FALSE,
verbose = FALSE
)
input
character(1) path of the file to normalise. If this is left empty, all files at stage two, as subset by pattern, are chosen.
pattern
character(1) an optional regular expression. Only dataset names which match the regular expression will be processed.
query
character(1) part of the SQL query (starting from WHERE) used to subset the input geometries, for example "where NAME_0 = 'Estonia'". The first part of the query (where the layer is defined) is derived from the meta-data of the currently handled geometry.
thresh
integerish(1) percent value of overlap below which two geometries (the input and the base) are considered to be the same. This is required because polygons from different sources, albeit describing the same territorial unit, often aren't completely the same.
beep
integerish(1) number specifying which sound is played to signal to the user that the program has reached a point of interaction, see beep.
simplify
logical(1) whether or not to simplify geometries.
stringdist
logical(1) whether or not to use string distance to find matches (should not be used for large datasets or when a memory error is shown).
strictMatch
logical(1) whether or not matches are strict, i.e., there should be clear one-to-one relationships and no changes in broader concepts.
verbose
logical(1) be verbose about what is happening (default FALSE). Furthermore, you can use suppressMessages to make this function completely silent.
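As a minimal sketch of how the query argument could combine with the layer definition: the layer name "example_layer" and the "select * from" prefix below are assumptions made for illustration, not the function's confirmed internals.

```r
# Hypothetical layer name, standing in for the value derived from the meta-data
layer <- "example_layer"

# User-supplied part of the query, starting from WHERE (example from above)
query <- "where NAME_0 = 'Estonia'"

# Assumed assembly: the layer-defining part first, then the user's filter
full_query <- paste0("select * from ", layer, " ", query)
print(full_query)
```

A string of this shape is what sf::st_read() accepts in its query argument.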
This function harmonises and integrates so far unprocessed geometries at stage two into stage three of the geospatial database. For each main polygon (e.g. nation) in the registered geometries, it produces a spatial file of the specified file type.
To normalise geometries, this function proceeds as follows:
Read in input and extract initial metadata from the file name.
In case filters are set, the new geometry is filtered by those.
The territorial names are matched with the gazetteer to harmonise new territorial names (at this step, the function might ask the user to edit the file 'matching.csv' to align new names with already harmonised names).
Loop through every nation potentially included in the file that shall be processed and carry out the following steps:
In case the geometries are provided as a list of simple feature POLYGONS, they are dissolved into a single MULTIPOLYGON per main polygon.
In case the nation to which a geometry belongs has not yet been created at stage three, the following steps are carried out:
Store the current geometry as the basis of the respective level (the user needs to make sure that all following levels of the same dataseries are perfectly nested into those parent territories, for example by using the GADM dataset).
In case the nation to which the geometry belongs has already been created, the following steps are carried out:
Check whether the new geometries have the same coordinate reference system as the already existing database and re-project the new geometries if this is not the case.
Check whether all new geometries are already exactly matched spatially and stop if that is the case.
Check whether the new geometries are all within the already defined parents, and save those that are not as a new geometry.
Calculate spatial overlap and distinguish the geometries into those that overlap with more and those with less than thresh.
For all units that did match, copy gazID from the geometries they overlap.
For all units that did not match, rebuild metadata and a new gazID.
Store the processed geometry at stage three.
Move the geometry to the folder '/processed', if it is fully processed.
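The overlap step can be illustrated with a small base R sketch. The areas are made-up numbers, and the column names and the way the percentage is computed (shared area relative to the area of the new unit) are assumptions; the function itself works on real geometries and may compute overlap differently.

```r
# Made-up example: area of each new unit and the area it shares with the
# base geometry it overlaps most (same units for both columns)
units <- data.frame(
  name         = c("A", "B", "C"),
  area         = c(100, 250, 80),
  overlap_area = c(97, 20, 80),
  stringsAsFactors = FALSE
)
thresh <- 10

# Percentage of each new unit that is covered by the base geometry
units$overlap_pct <- units$overlap_area / units$area * 100

# Split into units overlapping by more than thresh and those at or below it
more <- units[units$overlap_pct > thresh, ]
less <- units[units$overlap_pct <= thresh, ]
```

How the two groups are then treated follows the steps listed above.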
Other normalise functions:
normTable()
if(dev.interactive()){
  library(sf)

  # build the example database
  adb_example(until = "regGeometry", path = tempdir())

  # normalise all geometries ...
  normGeometry(pattern = "estonia")

  # ... and check the result
  st_layers(paste0(tempdir(), "/geometries/stage3/Estonia.gpkg"))
  output <- st_read(paste0(tempdir(), "/geometries/stage3/Estonia.gpkg"))
}