This function takes a raw schema description and updates values that were
only given as wildcard or implied values. It is automatically called by
reorganise, but can also be used in concert with the getters to debug
a schema.
validateSchema(schema = NULL, input = NULL)
Returns an updated schema description.
The core idea of a schema description is that it can be written very
generically, as long as it specifies sufficiently where in a table each
variable can be found. One generic approach is to use the function .find
to identify the initially unknown cell locations of a variable on the fly,
for example when it is known only that a variable must be in the table,
but not where it is.
validateSchema matches a schema with an input table and inserts the
accordingly evaluated positions (of clusters, filters and variables),
adapts some of the metadata and ensures formal consistency of the schema.
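To make the idea of resolving an unknown position concrete, the following base-R sketch looks up the column that contains a given label. It is only an illustration of the concept, not the package's implementation: the helper find_col and the toy table are hypothetical and do not exist in the package, and the real .find may support more options than a plain pattern match.

```r
# Toy input table in which the first row holds the column labels,
# similar in spirit to the tidy example table (values are made up).
input <- data.frame(
  X1 = c("territories", "unit 1", "unit 2"),
  X2 = c("period", "year 1", "year 2"),
  X3 = c("harvested", "1111", "2111"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of the package): return the indices of all
# columns in which at least one cell matches the regular expression `pattern`.
find_col <- function(input, pattern) {
  hits <- vapply(input, function(x) any(grepl(pattern, x)), logical(1))
  unname(which(hits))
}

# The wildcard "period" is resolved to a concrete column index.
year_col <- find_col(input, "period")
year_col
```

Under these assumptions, the pattern "period" is replaced by a concrete column index, which is the kind of substitution validateSchema performs when it evaluates a schema against an input table.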
library(tabshiftr)
library(magrittr)  # provides %>% in case it is not already attached
# build a schema for an already tidy table
(tidyTab <- tabs2shift$tidy)
#> # A tibble: 10 × 7
#> X1 X2 X3 X4 X5 X6 X7
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 territories period commodities other_observed harvested production empty_col
#> 2 unit 1 year 1 soybean xyz 1111 1112 NA
#> 3 unit 1 year 1 maize xyz 1121 1122 NA
#> 4 unit 1 year 2 soybean xyz 1211 1212 NA
#> 5 unit 1 year 2 maize xyz 1221 1222 NA
#> 6 NA NA NA NA NA NA NA
#> 7 unit 2 year 1 soybean xyz 2111 2112 NA
#> 8 unit 2 year 1 maize xyz 2121 2122 NA
#> 9 unit 2 year 2 soybean xyz 2211 2212 NA
#> 10 unit 2 year 2 maize xyz 2221 2222 NA
schema <-
setIDVar(name = "territories", col = 1) %>%
setIDVar(name = "year", col = .find(pattern = "period")) %>%
setIDVar(name = "commodities", col = 3) %>%
setObsVar(name = "harvested", col = 5) %>%
setObsVar(name = "production", col = 6)
# before ...
schema
#> 1 cluster (whole spreadsheet)
#>
#> variable type col
#> ------------- ---------- --------
#> territories id 1
#> year id period
#> commodities id 3
#> harvested observed 5
#> production observed 6
# ... after
validateSchema(schema = schema, input = tidyTab)
#> 1 cluster
#> origin : 1|1 (row|col)
#>
#> filter [rows 2, 3, 4, 5, 7, 8, 9, 10]
#>
#> variable type top col
#> ------------- ---------- ----- -----
#> territories id 1
#> year id 2
#> commodities id 3
#> harvested observed 1 5
#> production observed 1 6