Reorganise a table (reorganise, tabshiftr)

This function takes a disorganised (messy) table and rearranges its columns and rows into a tidy table, based on a schema description.

reorganise(input = NULL, schema = NULL)

Arguments

input: [data.frame(1)]
the table to reorganise.
schema: [symbol(1)]
the schema description of input, put together with functions such as setCluster(), setIDVar() and setObsVar() (see Examples).

Value

A (tidy) table which is the result of reorganising input based on schema.
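
To make "tidy" concrete, here is a minimal base R sketch. It assumes the usual tidy-data convention (one column per variable, one row per observation), which this page does not spell out. The rows are copied from the first four rows of the example output under Examples; the names `tidy` and `id_vars` are illustrative only and are not part of tabshiftr.

```r
# First four rows of the example output, typed in by hand (base R only).
tidy <- data.frame(
  territories = c("unit 1", "unit 1", "unit 1", "unit 1"),
  year        = c("year 1", "year 1", "year 2", "year 2"),
  commodities = c("soybean", "maize", "soybean", "maize"),
  harvested   = c(1111, 1121, 1211, 1221),
  production  = c(1112, 1122, 1212, 1222),
  stringsAsFactors = FALSE
)

# Assumption: the identifying variables together label one observation,
# so no combination of them should occur more than once.
id_vars <- c("territories", "year", "commodities")
no_duplicates <- anyDuplicated(tidy[id_vars]) == 0

# The observed variables are numeric, as in the printed output (<dbl>).
obs_numeric <- is.numeric(tidy$harvested) && is.numeric(tidy$production)
```

The same two checks could be run on the full result of reorganise() as a quick sanity test after reshaping.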

Examples

# load the package; magrittr supplies the %>% pipe used below
library(tabshiftr)
library(magrittr)

# a rather disorganised table with messy clusters and a distinct variable
(input <- tabs2shift$clusters_messy)
#> # A tibble: 13 × 7
#>    X1          X2        X3         X4          X5        X6         X7    
#>    <chr>       <chr>     <chr>      <chr>       <chr>     <chr>      <chr> 
#>  1 commodities harvested production NA          NA        NA         NA    
#>  2 unit 1      NA        NA         NA          NA        NA         NA    
#>  3 soybean     1111      1112       year 1      NA        NA         NA    
#>  4 maize       1121      1122       year 1      NA        NA         NA    
#>  5 soybean     1211      1212       year 2      NA        NA         NA    
#>  6 maize       1221      1222       year 2      NA        NA         NA    
#>  7 NA          NA        NA         NA          NA        NA         NA    
#>  8 commodities harvested production commodities harvested production NA    
#>  9 unit 2      NA        NA         unit 3      NA        NA         NA    
#> 10 soybean     2111      2112       soybean     3111      3112       year 1
#> 11 maize       2121      2122       maize       3121      3122       year 1
#> 12 soybean     2211      2212       soybean     3211      3212       year 2
#> 13 maize       2221      2222       maize       3221      3222       year 2

# put together schema description by ...
# ... identifying cluster positions
schema <- setCluster(id = "territories", left = c(1, 1, 4), top = c(1, 8, 8))

# ... specifying the cluster ID as id variable (obligatory)
schema <- schema %>%
    setIDVar(name = "territories", columns = c(1, 1, 4), rows = c(2, 9, 9))

# ... specifying the distinct variable (explicit position)
schema <- schema %>%
    setIDVar(name = "year", columns = 4, rows = 3:6, distinct = TRUE)

# ... specifying a tidy variable (by giving its column in each cluster)
schema <- schema %>%
    setIDVar(name = "commodities", columns = c(1, 1, 4))

# ... identifying the (tidy) observed variables
schema <- schema %>%
    setObsVar(name = "harvested", columns = c(2, 2, 5)) %>%
    setObsVar(name = "production", columns = c(3, 3, 6))

# get the tidy output
reorganise(input, schema)
#> filling NA-values in downwards direction in column 'commodities'.
#> # A tibble: 12 × 5
#>    territories year   commodities harvested production
#>    <chr>       <chr>  <chr>           <dbl>      <dbl>
#>  1 unit 1      year 1 soybean          1111       1112
#>  2 unit 1      year 1 maize            1121       1122
#>  3 unit 1      year 2 soybean          1211       1212
#>  4 unit 1      year 2 maize            1221       1222
#>  5 unit 2      year 1 soybean          2111       2112
#>  6 unit 2      year 1 maize            2121       2122
#>  7 unit 2      year 2 soybean          2211       2212
#>  8 unit 2      year 2 maize            2221       2222
#>  9 unit 3      year 1 soybean          3111       3112
#> 10 unit 3      year 1 maize            3121       3122
#> 11 unit 3      year 2 soybean          3211       3212
#> 12 unit 3      year 2 maize            3221       3222
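
The positions passed to the schema functions above are plain row and column indices into input. The base R sketch below does not use tabshiftr and is not a description of how reorganise() works internally. It only rebuilds the printed input as a character matrix and looks up the cells that `left`/`top`, and the `columns`/`rows` of the "territories" and "year" variables, point at. The name `messy` and all other variable names are hypothetical.

```r
# The printed input, typed in row by row (base R only).
rows <- list(
  c("commodities", "harvested", "production", NA, NA, NA, NA),
  c("unit 1", NA, NA, NA, NA, NA, NA),
  c("soybean", "1111", "1112", "year 1", NA, NA, NA),
  c("maize",   "1121", "1122", "year 1", NA, NA, NA),
  c("soybean", "1211", "1212", "year 2", NA, NA, NA),
  c("maize",   "1221", "1222", "year 2", NA, NA, NA),
  rep(NA, 7),
  c("commodities", "harvested", "production",
    "commodities", "harvested", "production", NA),
  c("unit 2", NA, NA, "unit 3", NA, NA, NA),
  c("soybean", "2111", "2112", "soybean", "3111", "3112", "year 1"),
  c("maize",   "2121", "2122", "maize",   "3121", "3122", "year 1"),
  c("soybean", "2211", "2212", "soybean", "3211", "3212", "year 2"),
  c("maize",   "2221", "2222", "maize",   "3221", "3222", "year 2")
)
messy <- do.call(rbind, rows)  # 13 x 7 character matrix

# Cluster origins, as given to setCluster().
left <- c(1, 1, 4)
top  <- c(1, 8, 8)

# Indexing a matrix with a two-column (row, column) matrix picks single cells.
header_cells <- messy[cbind(top, left)]               # top-left cell of each cluster
cluster_ids  <- messy[cbind(c(2, 9, 9), c(1, 1, 4))]  # cells named in setIDVar("territories")
year_cells   <- messy[3:6, 4]                         # cells named in setIDVar("year")

# Read off the printed table: values start two rows below each cluster's top
# row (header row, then the unit row). Column 2, 2, 5 holds "harvested".
first_harvested <- messy[cbind(top + 2, c(2, 2, 5))]
```

Here `cluster_ids` holds "unit 1", "unit 2" and "unit 3", which are the values that end up in the territories column of the tidy output.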