Harmonise and integrate geometries into a standardised format

normGeometry(
  input = NULL,
  pattern = NULL,
  query = NULL,
  thresh = 10,
  beep = NULL,
  simplify = FALSE,
  stringdist = TRUE,
  strictMatch = FALSE,
  verbose = FALSE
)

Arguments

input

character(1)
path of the file to normalise. If this is left empty, all files at stage two as subset by pattern are chosen.

pattern

character(1)
an optional regular expression. Only dataset names which match the regular expression will be processed.

query

character(1)
part of the SQL query (starting from WHERE) used to subset the input geometries, for example "where NAME_0 = 'Estonia'". The first part of the query (where the layer is defined) is derived from the meta-data of the currently handled geometry.

thresh

integerish(1)
percent value of overlap below which two geometries (the input and the base) are considered to be the same. This is required because polygons from different sources often differ slightly, even when they describe the same territorial unit.

beep

integerish(1)
number specifying which sound to play to signal that the program has reached a point of interaction, see beep.

simplify

logical(1)
whether or not to simplify geometries.

stringdist

logical(1)
whether or not to use string distance to find matches (should not be used for large datasets or when a memory error is shown).

strictMatch

logical(1)
whether or not matches are strict, i.e., there should be clear one-to-one relationships and no changes in broader concepts.

verbose

logical(1)
be verbose about what is happening (default FALSE). Furthermore, you can use suppressMessages to make this function completely silent.
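
The way the query argument is combined with a layer name can be sketched as follows. This is only an illustration under stated assumptions, not the function's internal code: the file nc.shp shipped with sf stands in for a stage-two geometry, the layer name is read from the file rather than from the metadata, and the way the two parts are pasted together (full_query) is assumed.

```r
library(sf)

# example file shipped with sf; stands in for a stage-two geometry
dsn <- system.file("shape/nc.shp", package = "sf")

# in normGeometry the layer comes from the metadata; here it is read from the file
layer <- st_layers(dsn)$name[1]

# the user-supplied part, as it would be passed to 'query'
query <- "where NAME = 'Ashe'"

# assumed assembly of the full SQL statement
full_query <- paste0("select * from ", layer, " ", query)

# read only the features that satisfy the WHERE clause
subset_geom <- st_read(dsn, query = full_query, quiet = TRUE)
```

Filtering at read time keeps features that do not match out of memory, which is why only the part starting from WHERE has to be supplied.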

Value

This function harmonises and integrates geometries at stage two that have not yet been processed into stage three of the geospatial database. For each main polygon (e.g. nation) in the registered geometries, it produces a spatial file of the specified file-type.

Details

To normalise geometries, this function proceeds as follows:

  1. Read in input and extract initial metadata from the file name.

  2. In case filters are set, the new geometry is filtered by those.

  3. The territorial names are matched with the gazetteer to harmonise new territorial names (at this step, the function might ask the user to edit the file 'matching.csv' to align new names with already harmonised names).

  4. Loop through every nation potentially included in the file that shall be processed and carry out the following steps:

    • In case the geometries are provided as a list of simple feature POLYGONS, they are dissolved into a single MULTIPOLYGON per main polygon.

    • In case the nation to which a geometry belongs has not yet been created at stage three, the following steps are carried out:

      1. Store the current geometry as basis of the respective level (the user needs to make sure that all following levels of the same dataseries are perfectly nested into those parent territories, for example by using the GADM dataset).

    • In case the nation to which the geometry belongs has already been created, the following steps are carried out:

      1. Check whether the new geometries have the same coordinate reference system as the already existing database and re-project the new geometries if this is not the case.

      2. Check whether all new geometries are already exactly matched spatially and stop if that is the case.

      3. Check whether the new geometries are all within the already defined parents, and save those that are not as a new geometry.

      4. Calculate spatial overlap and split the geometries into those that overlap by more than thresh and those that overlap by less.

      5. For all units for which dName matches, copy gazID from the geometries they overlap.

      6. For all units for which dName does not match, rebuild the metadata and build a new gazID.

    • Store the processed geometry at stage three.

  5. Move the geometry to the folder '/processed', if it is fully processed.
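
The spatial parts of step 4 (dissolving, the CRS check and the overlap calculation) can be illustrated with plain sf calls. This is a minimal sketch on made-up rectangles, not the code used inside normGeometry. The helper rect(), the object names and the reading of thresh as a tolerated deviation in percent are assumptions made for this illustration.

```r
library(sf)

# hypothetical helper: an axis-aligned rectangle as an sf polygon
rect <- function(x0, y0, w, h) {
  st_polygon(list(rbind(
    c(x0, y0), c(x0 + w, y0), c(x0 + w, y0 + h),
    c(x0, y0 + h), c(x0, y0)
  )))
}

# base geometry as it might already exist at stage three (made-up data)
base_geom <- st_sfc(rect(0, 0, 1, 2), crs = 3857)

# new geometry delivered as two adjacent POLYGONs, shifted by 0.05 in x
new_parts <- st_sfc(rect(0.05, 0, 1, 1), rect(0.05, 1, 1, 1), crs = 3857)

# dissolve the parts into a single MULTIPOLYGON
new_geom <- st_cast(st_union(new_parts), "MULTIPOLYGON")

# re-project the new geometry if its CRS differs from the base
if (st_crs(new_geom) != st_crs(base_geom)) {
  new_geom <- st_transform(new_geom, st_crs(base_geom))
}

# share of the new geometry that is covered by the base, in percent
overlap_area <- st_area(st_intersection(new_geom, base_geom))
overlap_pct <- as.numeric(overlap_area / st_area(new_geom)) * 100

# assumption: thresh is the tolerated deviation from full overlap
thresh <- 10
same_unit <- (100 - overlap_pct) < thresh
```

In this toy case the two geometries share most of their area, so under the assumed rule they would be treated as the same territorial unit.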

See also

Other normalise functions: normTable()

Examples

if(dev.interactive()){
  library(sf)

  # build the example database
  adb_example(until = "regGeometry", path = tempdir())

  # normalise all geometries ...
  normGeometry(pattern = "estonia")

  # ... and check the result
  st_layers(paste0(tempdir(), "/geometries/stage3/Estonia.gpkg"))
  output <- st_read(paste0(tempdir(), "/geometries/stage3/Estonia.gpkg"))
}