Every table makes assumptions about its data, but in common table formats these assumptions are mostly not recorded explicitly. This concerns, for example, the symbol(s) that mark "not available" values or the symbol that is used as the decimal sign.

setFormat(
  schema = NULL,
  header = FALSE,
  decimal = NULL,
  thousand = NULL,
  na_values = NULL,
  zero_values = NULL,
  flags = NULL
)

Arguments

schema

[schema(1)]
If this information is added to an already existing schema, provide that schema here; its previous format information is overwritten.

header

[logical(1)]
Whether the table was read with its header row already consumed as column names (e.g., by the default of read.csv()). If TRUE, the column names are spliced back into the table as row 1 before variables are extracted. Ideally, tables are read with header = FALSE so that row numbers remain stable; in that case, leave this argument at FALSE (the default).
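To illustrate why row numbers shift, the following base R sketch reads the same small CSV once with header = FALSE and once with the read.csv() default. The helper splice_header() is hypothetical and only mimics the splice-back behaviour described above; it is not the package's internal code.

```r
# A small CSV whose first line holds column labels
csv <- "territory,year,harvested\nunit 1,2020,100\nunit 2,2020,200"

# header = FALSE: the labels stay in row 1, so row numbers match the raw file
raw <- read.csv(text = csv, header = FALSE, colClasses = "character")

# read.csv() default (header = TRUE): the labels become column names and
# every data row moves up by one
consumed <- read.csv(text = csv, colClasses = "character")

# Hypothetical helper: put the consumed column names back as row 1
splice_header <- function(df) {
  out <- rbind(names(df), df)          # names(df) becomes the first row
  names(out) <- paste0("V", seq_along(out))
  rownames(out) <- NULL                # reset to 1, 2, 3, ...
  out
}

spliced <- splice_header(consumed)
all(as.matrix(spliced) == as.matrix(raw))
```

After splicing, the header row is again row 1 and the data rows are back at the positions they have in the raw file.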

decimal

[character(1)]
The symbol that should be interpreted as the decimal separator.

thousand

[character(1)]
The symbol that should be interpreted as the thousands separator.
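The decimal and thousand arguments only record which symbols a table uses. As a rough illustration of what that information allows, here is a base R sketch with a hypothetical helper as_number() (not part of the package) that converts such values to numbers: thousands separators are dropped first, then the decimal symbol is replaced by the "." that R expects.

```r
# Hypothetical helper, not part of the package: convert character values that
# use the given decimal and thousands separators into numbers
as_number <- function(x, decimal = ".", thousand = NULL) {
  if (!is.null(thousand)) {
    x <- gsub(thousand, "", x, fixed = TRUE)  # drop thousands separators first
  }
  if (decimal != ".") {
    x <- gsub(decimal, ".", x, fixed = TRUE)  # R parses "." as decimal sign
  }
  as.numeric(x)
}

as_number(c("1.234,5", "12,75"), decimal = ",", thousand = ".")
as_number("1,234.5", thousand = ",")
```

The order matters: replacing the decimal symbol before removing thousands separators would corrupt values such as "1.234,5".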

na_values

[character(.)]
The symbols that should be interpreted as NA.

zero_values

[character(.)]
The symbols that should be interpreted as 0.
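A minimal base R sketch of how na_values and zero_values could be applied to a column of observed values, assuming ".." marks missing values and "-" marks zeros (both symbols are made up for this example). The helper recode_symbols() is hypothetical and not part of the package.

```r
# Hypothetical helper, not part of the package: map placeholder symbols to
# NA or 0 and return the column as numbers
recode_symbols <- function(x, na_values = NULL, zero_values = NULL) {
  x[x %in% na_values] <- NA     # "not available" symbols become NA
  x[x %in% zero_values] <- "0"  # zero symbols become the digit 0
  as.numeric(x)
}

recode_symbols(c("12", "..", "-", "7"), na_values = "..", zero_values = "-")
```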

flags

[data.frame(2)]
The flags, typically characters, that should be shaved off observed variables so that these can be identified as numeric values. This must be a data.frame with two columns named flag and value.
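The following sketch builds a flags table with the required columns flag and value. The flag symbols are made up, and the value column is assumed here to hold a short description of each flag. The hypothetical helper strip_flags() (not part of the package) further assumes that flags trail the number, and shows how shaving them off leaves values that can be read as numeric.

```r
# Flags table with the two required columns; symbols and meanings are made up
flags <- data.frame(
  flag  = c("e", "p"),
  value = c("estimated", "provisional"),
  stringsAsFactors = FALSE
)

# Hypothetical helper, not part of the package: remove trailing flags from
# observed values and convert the remainder to numbers
strip_flags <- function(x, flags) {
  x <- trimws(x)
  for (f in flags$flag) {
    has_flag <- endsWith(x, f)  # fixed (non-regex) match at the end
    x[has_flag] <- trimws(substr(x[has_flag], 1, nchar(x[has_flag]) - nchar(f)))
  }
  as.numeric(x)
}

strip_flags(c("120e", "95", "87 p"), flags)
```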

Value

An object of class schema.

Details

Please also take a look at the currently suggested strategy for setting up a schema description.

See also

Other functions to describe table arrangement: setCluster(), setFilter(), setGroups(), setIDVar(), setObsVar()

Examples

# please check the vignette for examples