This function summarises groups of rows, splices the header back into the table as row 1 and fills in values that should not be missing.
validateInput(schema = NULL, input = NULL)
Returns a table where grouped rows are summarised and, if applicable, the header row is spliced back in as row 1.
validateInput is called automatically by reorganise and
does not usually need to be called directly. It performs two pre-processing
steps on the input table before variable extraction begins:
1. If setFormat(header = TRUE) was used, the column names that
   were consumed by R when reading the file are spliced back into the table as
   row 1. This makes row numbers stable and consistent with the schema
   description.
2. If setGroups was used, the specified groups of rows are
   summarised into single rows according to the aggregation functions provided
   to .sum. Character columns are collapsed with
   paste0(na.omit(x), collapse = " ") by default; numeric columns are
   summed. Missing values within a group can be filled before aggregation by
   passing a fill direction to .sum.
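The default aggregation described above can be illustrated with a minimal base R sketch. The helper collapse_group is hypothetical and not part of the package. It assumes that a column whose non-missing values all parse as numbers is summed, and that any other column is collapsed with paste0 as stated above. The package's actual implementation may differ.

```r
# Hypothetical helper (not part of the package): collapse one column of a
# row group into a single value.
collapse_group <- function(x) {
  vals <- x[!is.na(x)]                      # drop missing values first
  if (length(vals) == 0) {
    return(NA_character_)                   # nothing to aggregate
  }
  nums <- suppressWarnings(as.numeric(vals))
  if (!anyNA(nums)) {
    as.character(sum(nums))                 # assumption: numeric-like values are summed
  } else {
    paste0(vals, collapse = " ")            # character default named in the text
  }
}

# Two rows that belong to one group, stored as character like the example table.
group <- data.frame(
  X1 = c("unit 1", NA),
  X2 = c(NA, "year 1"),
  X3 = c("1000", "111"),
  stringsAsFactors = FALSE
)

# Apply the helper column by column to obtain a single summarised row.
collapsed <- as.data.frame(lapply(group, collapse_group), stringsAsFactors = FALSE)
print(collapsed)
```

Applying the helper column by column turns the two partial rows into one complete row, which is the effect setGroups has on rows 3 and 4 in the example that follows.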
# validateInput is called implicitly by reorganise(); the example below shows
# its effect when setGroups is used to collapse pairs of rows before extraction.
(input <- tabs2shift$group_sum)
#> # A tibble: 8 × 6
#>   X1          X2     X3        X4    X5         X6
#>   <chr>       <chr>  <chr>     <chr> <chr>      <chr>
#> 1 territories period harvested NA    production NA
#> 2 NA          NA     soybean   maize soybean    maize
#> 3 unit 1      NA     1000      1000  1000       1000
#> 4 NA          year 1 111       121   112        122
#> 5 unit 1      year 2 1211      1221  1212       1222
#> 6 NA          year 1 2000      2000  2000       2000
#> 7 unit 2      NA     111       121   112        122
#> 8 unit 2      year 2 2211      2221  2212       2222
schema <-
  setGroups(rows = .sum(c(3, 4))) |>
  setGroups(rows = .sum(c(6, 7))) |>
  setIDVar(name = "territories", columns = 1) |>
  setIDVar(name = "year", columns = 2) |>
  setIDVar(name = "commodities", columns = c(3:6), rows = 2) |>
  setObsVar(name = "harvested", columns = c(3, 4)) |>
  setObsVar(name = "production", columns = c(5, 6))
# inspect the pre-processed table directly
schema_validated <- validateSchema(schema = schema, input = input)
validateInput(schema = schema_validated, input = input)
#> # A tibble: 6 × 6
#>   X1          X2     X3        X4    X5         X6
#>   <chr>       <chr>  <chr>     <chr> <chr>      <chr>
#> 1 territories period harvested NA    production NA
#> 2 NA          NA     soybean   maize soybean    maize
#> 3 unit 1      year 1 1111      1121  1112       1122
#> 4 unit 1      year 2 1211      1221  1212       1222
#> 5 unit 2      year 1 2111      2121  2112       2122
#> 6 unit 2      year 2 2211      2221  2212       2222