Pre-process input table — validateInput • tabshiftr

This function summarises groups of rows, splices the header back into the table and fills missing values where none should occur.

validateInput(schema = NULL, input = NULL)

Arguments

schema: [schema(1)]
the validated schema description of input, as returned by validateSchema.
input: [data.frame(1)]
table to reorganise.

Value

a table where grouped rows are summarised and, if applicable, the header row is spliced back in as row 1.

Details

validateInput is called automatically by reorganise and does not usually need to be called directly. It performs two pre-processing steps on the input table before variable extraction begins:

If setFormat(header = TRUE) was used, the column names that were consumed by R when reading the file are spliced back into the table as row 1. This makes row numbers stable and consistent with the schema description.
If setGroups was used, the specified groups of rows are summarised into single rows according to the aggregation functions provided to .sum. By default, character values are collapsed with paste0(na.omit(x), collapse = " ") and numeric values are summed; in the example below, the values 1000 and 111 in rows 3 and 4 become 1111. Missing values within a group can be filled before aggregation by passing a fill direction to .sum.
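
The two steps above can be approximated in base R. The sketch below is only an illustration and not the package's implementation: the helper collapse_group, the toy tables tab and grp, and the rule "sum when every non-missing value parses as a number, otherwise paste" are assumptions chosen to mirror the behaviour described here.

```r
# Step 1 (sketch): splice consumed column names back in as row 1.
# 'tab' stands in for a table that was read with header = TRUE.
tab <- data.frame(
  territories = c("unit 1", "unit 2"),
  year = c("year 1", "year 2"),
  stringsAsFactors = FALSE
)
# turn the column names into a one-row table with the same columns
header <- as.data.frame(
  setNames(as.list(names(tab)), names(tab)),
  stringsAsFactors = FALSE
)
spliced <- rbind(header, tab)
# generic names, as if the file had been read without a header
names(spliced) <- paste0("X", seq_along(spliced))

# Step 2 (sketch): collapse one group of rows into a single row.
# Hypothetical helper, not part of tabshiftr.
collapse_group <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) return(NA_character_)
  num <- suppressWarnings(as.numeric(x))
  if (!anyNA(num)) {
    # every remaining value is numeric: add them up
    as.character(sum(num))
  } else {
    # otherwise concatenate the remaining values
    paste0(x, collapse = " ")
  }
}

# a group resembling rows 3 and 4 of the example table (first four columns)
grp <- data.frame(
  X1 = c("unit 1", NA),
  X2 = c(NA, "year 1"),
  X3 = c("1000", "111"),
  X4 = c("1000", "121"),
  stringsAsFactors = FALSE
)
collapsed <- as.data.frame(lapply(grp, collapse_group), stringsAsFactors = FALSE)
```

Here spliced holds the former column names in its first row, and collapsed is a single row in which the missing identifiers have been dropped and the numeric values added together. In practice validateInput derives both steps from the schema, so a helper like this is not needed.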

Examples

# validateInput is called implicitly by reorganise(); the example below shows
# its effect when setGroups is used to collapse pairs of rows before extraction.
(input <- tabs2shift$group_sum)
#> # A tibble: 8 × 6
#>   X1          X2     X3        X4    X5         X6   
#>   <chr>       <chr>  <chr>     <chr> <chr>      <chr>
#> 1 territories period harvested NA    production NA   
#> 2 NA          NA     soybean   maize soybean    maize
#> 3 unit 1      NA     1000      1000  1000       1000 
#> 4 NA          year 1 111       121   112        122  
#> 5 unit 1      year 2 1211      1221  1212       1222 
#> 6 NA          year 1 2000      2000  2000       2000 
#> 7 unit 2      NA     111       121   112        122  
#> 8 unit 2      year 2 2211      2221  2212       2222 

schema <-
  setGroups(rows = .sum(c(3, 4))) |>
  setGroups(rows = .sum(c(6, 7))) |>
  setIDVar(name = "territories", columns = 1) |>
  setIDVar(name = "year", columns = 2) |>
  setIDVar(name = "commodities", columns = c(3:6), rows = 2) |>
  setObsVar(name = "harvested", columns = c(3, 4)) |>
  setObsVar(name = "production", columns = c(5, 6))

# inspect the pre-processed table directly
schema_validated <- validateSchema(schema = schema, input = input)
validateInput(schema = schema_validated, input = input)
#> # A tibble: 6 × 6
#>   X1          X2     X3        X4    X5         X6   
#>   <chr>       <chr>  <chr>     <chr> <chr>      <chr>
#> 1 territories period harvested NA    production NA   
#> 2 NA          NA     soybean   maize soybean    maize
#> 3 unit 1      year 1 1111      1121  1112       1122 
#> 4 unit 1      year 2 1211      1221  1212       1222 
#> 5 unit 2      year 1 2111      2121  2112       2122 
#> 6 unit 2      year 2 2211      2221  2212       2222