This function takes a raw schema description and replaces values that were
given only as wildcards or left implicit with their concrete values. It is
called automatically by reorganise, but it can also be used together with
the getters to debug a schema.
validateSchema(schema = NULL, input = NULL)
An updated schema description.
The core idea of a schema description is that it can be written in a
very generic way, as long as it describes sufficiently where in a table
each variable can be found. One such generic way is to use the
function .find to identify the initially unknown cell locations of a
variable on the fly, for example when it is known only that a variable
must be in the table, but not where.
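The on-the-fly lookup described above can be pictured with a few lines of base R. This is a minimal sketch and not the implementation of .find: the helper find_cols and the toy table are hypothetical, and the sketch assumes that the pattern is matched as a regular expression against every cell of a column.

```r
# Toy table that mimics a tidy input whose header is stored in row 1.
tab <- data.frame(
  X1 = c("territories", "unit 1", "unit 2"),
  X2 = c("period", "year 1", "year 2"),
  X3 = c("commodities", "soybean", "maize"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of tabshiftr): return the indices of all
# columns in which at least one cell matches the pattern.
find_cols <- function(tab, pattern) {
  hits <- vapply(tab, function(x) any(grepl(pattern, x)), logical(1))
  unname(which(hits))
}

# Resolve the pattern to a concrete column position.
period_col <- find_cols(tab, "period")
period_col
```

In this toy table the pattern "period" occurs only in the second column, so the lookup yields a single position that could stand in for a hard-coded column number.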
validateSchema matches a schema against an input table, inserts the
positions evaluated from that table (of clusters, filters and variables),
adapts some of the metadata and ensures the formal consistency of the schema.
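In the example output, the validated schema carries a filter that skips the header row and the row consisting only of NA values. How validateSchema derives this filter is not stated here; the following base R sketch merely illustrates one way such rows could be identified. The variable names and the toy table are hypothetical, and the sketch assumes that the header occupies row 1.

```r
# Toy table with a header in row 1 and one row that is entirely NA.
tab <- data.frame(
  X1 = c("territories", "unit 1", NA, "unit 2"),
  X2 = c("harvested", "1111", NA, "2111"),
  stringsAsFactors = FALSE
)

# Assumption for this sketch: the variable names sit in the first row.
header_row <- 1

# A row is empty when none of its cells holds a value.
all_na <- rowSums(!is.na(tab)) == 0

# Keep every row that is neither empty nor the header row.
keep_rows <- unname(setdiff(which(!all_na), header_row))
keep_rows
```

Here rows 2 and 4 remain, which mirrors the shape of the filter printed for the validated schema.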
library(tabshiftr)  # provides tabs2shift, the set*() functions, .find and validateSchema
# build a schema for an already tidy table
(tidyTab <- tabs2shift$tidy)
#> # A tibble: 10 × 7
#>    X1          X2     X3          X4             X5        X6         X7
#>    <chr>       <chr>  <chr>       <chr>          <chr>     <chr>      <chr>
#>  1 territories period commodities other_observed harvested production empty_col
#>  2 unit 1      year 1 soybean     xyz            1111      1112       NA
#>  3 unit 1      year 1 maize       xyz            1121      1122       NA
#>  4 unit 1      year 2 soybean     xyz            1211      1212       NA
#>  5 unit 1      year 2 maize       xyz            1221      1222       NA
#>  6 NA          NA     NA          NA             NA        NA         NA
#>  7 unit 2      year 1 soybean     xyz            2111      2112       NA
#>  8 unit 2      year 1 maize       xyz            2121      2122       NA
#>  9 unit 2      year 2 soybean     xyz            2211      2212       NA
#> 10 unit 2      year 2 maize       xyz            2221      2222       NA
schema <-
  setIDVar(name = "territories", col = 1) %>%
  setIDVar(name = "year", col = .find(pattern = "period")) %>%
  setIDVar(name = "commodities", col = 3) %>%
  setObsVar(name = "harvested", col = 5) %>%
  setObsVar(name = "production", col = 6)
# before ...
schema
#> 1 cluster (whole spreadsheet)
#>
#>    variable      type       col
#>   ------------- ---------- --------
#>    territories   id         1
#>    year          id         period
#>    commodities   id         3
#>    harvested     observed   5
#>    production    observed   6
# ... after
validateSchema(schema = schema, input = tidyTab)
#> 1 cluster
#> origin : 1|1 (row|col)
#>
#> filter [rows 2, 3, 4, 5, 7, 8, 9, 10]
#>
#>    variable      type       top   col
#>   ------------- ---------- ----- -----
#>    territories   id               1
#>    year          id               2
#>    commodities   id               3
#>    harvested     observed   1     5
#>    production    observed   1     6