Feature Request: `assert_cols()` #121

billdenney · 2023-02-23T03:00:17Z

I often want to run a set of checks on the columns of data frames before checking the values within each column.

For example, I want to check that all required columns are present. Rather than using the `has_names()` function with `verify()`, I'd like the output to specify which column or columns are missing. Similarly, I currently write `verify(is.numeric(numeric_column_1)) %>% verify(is.numeric(numeric_column_2))` when something closer to `assert(is.numeric, numeric_column_1, numeric_column_2)` would be cleaner and give a clearer report.

What would you think about an `assert_cols()` function?

tonyfischetti · 2023-03-13T17:03:45Z

I really like that idea. I'd like to make sure the semantics work a lot like `assert()`. Would it be like `assert()`, but operating on the vector of column names?

I think a lot of great functionality could come from `assert_cols()`, such as checking:

- that there are no duplicate column names
- that column names contain no ridiculous characters
- that no columns are missing (like you said)
- the data type of each column (like you said)
- that column names fit a regex pattern
- etc.
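A minimal base-R sketch of the name-level checks listed above, purely for illustration. `assert_col_names()` is a hypothetical name, not an existing assertr function; "ridiculous characters" is approximated here as "not a syntactically valid R name" via `make.names()`; and every failure is collected into one error message instead of stopping at the first problem. It returns the data invisibly so it could sit in a `%>%` chain the way `verify()` does.

```r
# Hypothetical sketch: assert_col_names() is NOT part of assertr. It only
# illustrates the column-name checks discussed in this thread, in base R.
assert_col_names <- function(data, required = character(0), pattern = NULL) {
  nms <- names(data)
  problems <- character(0)

  # Duplicate column names (each duplicated name reported once)
  dups <- unique(nms[duplicated(nms)])
  if (length(dups) > 0) {
    problems <- c(problems, paste("duplicated:", paste(dups, collapse = ", ")))
  }

  # Names that are not syntactically valid R names (assumed stand-in for
  # "ridiculous characters"): make.names() would alter them
  bad <- nms[make.names(nms) != nms]
  if (length(bad) > 0) {
    problems <- c(problems, paste("non-syntactic:", paste(bad, collapse = ", ")))
  }

  # Required columns that are absent, reported by name
  missing_cols <- setdiff(required, nms)
  if (length(missing_cols) > 0) {
    problems <- c(problems, paste("missing:", paste(missing_cols, collapse = ", ")))
  }

  # Names that do not match an optional regular expression
  if (!is.null(pattern)) {
    no_match <- nms[!grepl(pattern, nms)]
    if (length(no_match) > 0) {
      problems <- c(problems,
                    paste("not matching pattern:", paste(no_match, collapse = ", ")))
    }
  }

  if (length(problems) > 0) {
    stop(paste(problems, collapse = "; "), call. = FALSE)
  }
  invisible(data)  # pass the data through so the check can be chained
}
```

For example, `assert_col_names(df, required = c("id", "value", "dose"))` on a data frame lacking `dose` would stop with an error that names `dose` as the missing column.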

billdenney · 2023-03-14T18:11:18Z

Yeah, that covers a lot of the space I was thinking of. Generally, I'm thinking that it would be used in two different ways:

- On the column names (duplicate names, character checks, missing columns, name regex)
- On the column as a whole (data type is the main use I see here. A regex pattern on the values within a column seems like it would be handled by `assert()`, but if you wanted the result reported by column instead of by row, the regex method could apply here, too.)

I see these two as different ways to use the data, so I'd think they should be either two different functions (e.g., `assert_col_names()` and `assert_col()`, my preference) or one function with two modes of use (e.g., `assert_col(..., assert_type = c("names", "values"))`).
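A similarly hypothetical base-R sketch of the second mode: `assert_col()` below is not an assertr function, and its signature is an assumption. It applies a predicate to each named column as a whole and reports failures by column name rather than by row. For simplicity it takes column names as strings, whereas the `assert()` call in the first comment uses unquoted column names.

```r
# Hypothetical sketch: assert_col() is NOT part of assertr. It tests each
# named column as a whole with `predicate` and reports failures per column.
assert_col <- function(data, predicate, ...) {
  cols <- c(...)                               # column names given as strings
  pred_name <- deparse(substitute(predicate))  # e.g. "is.numeric" for messages

  # Columns that are absent cannot be tested; report them separately
  absent <- setdiff(cols, names(data))
  present <- intersect(cols, names(data))

  # One TRUE/FALSE per present column; isTRUE() treats NA or non-scalar as FALSE
  passed <- vapply(present, function(col) isTRUE(predicate(data[[col]])),
                   logical(1))
  failed <- present[!passed]

  problems <- character(0)
  if (length(absent) > 0) {
    problems <- c(problems, paste("missing:", paste(absent, collapse = ", ")))
  }
  if (length(failed) > 0) {
    problems <- c(problems,
                  paste(pred_name, "failed for:", paste(failed, collapse = ", ")))
  }
  if (length(problems) > 0) {
    stop(paste(problems, collapse = "; "), call. = FALSE)
  }
  invisible(data)  # pass the data through so the check can be chained
}
```

So `assert_col(df, is.numeric, "numeric_column_1", "numeric_column_2")` would stop with a single message naming whichever of those columns are missing or non-numeric, instead of needing one `verify()` per column.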

FYI, janitor's `clean_names()` (https://sfirke.github.io/janitor/reference/clean_names.html) can help a lot with sensible column naming, but it corrects names rather than checking them.

What do you think? Are there other use cases?
