Combine records helper functions
Source: R/combine_records_class.R
combine_records_helper_functions.Rd
This page documents helper functions for use with combine_records().
Usage
.mode(ties = FALSE, na.rm = TRUE)
.mean()
.median()
.collapse(separator, na_string = "NA")
.select_max(max_col, use_abs = FALSE, keep_NA = FALSE)
.select_min(min_col, use_abs = FALSE, keep_NA = FALSE)
.select_match(match_col, search_col, separator, na_string = "NA")
.select_exact(match_col, match, separator, na_string = "NA")
.unique(separator, na_string = "NA", digits = 6)
.prioritise(match_col, priority, separator, no_match = NA, na_string = "NA")
.nothing()
.count()
.select_grade(grade_col, keep_NA = FALSE, upper_case = TRUE)
Arguments
- ties
(logical) If TRUE then all records matching the tied groups are returned. Otherwise the first record is returned.
- na.rm
(logical) If TRUE then NA values are ignored.
- separator
(character, NULL) If not NULL, this string is used to collapse matches with the same priority.
- na_string
(character) NA values are replaced with this string
- max_col
(character) the column name to search for the maximum value.
- use_abs
(logical) If TRUE then the sign of the values is ignored.
- keep_NA
(logical) If TRUE then records with NA values are kept.
- min_col
(character) the column name to search for the minimum value.
- match_col
(character) the column with labels to prioritise
- search_col
(character) the name of a column to use as a reference for locating values in the matching column.
- match
(character) a value to search for in the matching column.
- digits
(numeric) the number of digits to use when converting numerical values to characters when determining if values are unique.
- priority
(character) a list of labels in priority order
- no_match
(character, NULL) If not NULL, annotations not matching any of the priority labels are replaced with this value.
- grade_col
(character) the name of a column containing grades
- upper_case
(logical) If TRUE then grades are compared to upper case letters to determine their ordering, otherwise lower case.
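The interaction of digits, separator and na_string can be illustrated with a minimal base R sketch. The function collapse_unique() below is hypothetical and is not the package's implementation of .unique(); in particular, treating digits as significant digits via signif() is an assumption.

```r
# Hypothetical stand-in for the kind of reduction .unique() describes.
# Assumption: `digits` is interpreted as significant digits.
collapse_unique <- function(x, separator = ", ", na_string = "NA", digits = 6) {
  if (is.numeric(x)) {
    # Round before comparing so that values differing only beyond
    # `digits` significant digits are treated as the same value.
    x <- signif(x, digits)
  }
  x <- as.character(x)
  # Replace missing values with the chosen placeholder string.
  x[is.na(x)] <- na_string
  # Keep one copy of each value and join them into a single string.
  paste(unique(x), collapse = separator)
}

collapse_unique(c(1.0000001, 1.0000002, 2.5, NA))
#> [1] "1, 2.5, NA"
```

With a larger digits value the first two numbers would no longer be considered identical and both would appear in the collapsed string.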
Value
A function for use with combine_records()
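Each helper is therefore a function factory: it is called once with its settings and returns a function that is applied later. The sketch below shows this pattern in base R. The names make_collapse() and records are hypothetical, and the use of split() and sapply() to apply the function per group is an assumption made for illustration, not a description of how combine_records() works internally.

```r
# Hypothetical factory: captures its arguments and returns a function
# that reduces the values of one group to a single string.
make_collapse <- function(separator, na_string = "NA") {
  function(x) {
    x <- as.character(x)
    x[is.na(x)] <- na_string
    paste(x, collapse = separator)
  }
}

# Configure once ...
collapse_fcn <- make_collapse(separator = " || ")

# ... then apply to each group of records (illustrative data only).
records <- data.frame(
  id = c("a", "a", "b"),
  source = c("CD", "LS", "CD"),
  stringsAsFactors = FALSE
)
combined <- sapply(split(records$source, records$id), collapse_fcn)
combined
#>          a          b
#> "CD || LS"       "CD"
```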
Functions
- .mode(): returns the most common value, excluding NA. If ties = TRUE then all tied values are returned, otherwise the first value in a sorted unique list is returned (equal to the minimum if numeric). If na.rm = FALSE then NA values are included when searching for the modal value and are placed last if ties = FALSE (values are returned preferentially over NA).
- .mean(): calculates the mean value, excluding NA if na.rm = TRUE.
- .median(): calculates the median value, excluding NA if na.rm = TRUE.
- .collapse(): collapses multiple matching records into a single string using the provided separator.
- .select_max(): selects a record based on the index of the maximum value in another column.
- .select_min(): selects a record based on the index of the minimum value in another column.
- .select_match(): returns all records based on the indices of identical matches in a second column and collapses them using the provided separator.
- .select_exact(): returns records based on the indices of values identical to the match parameter within the current column, and collapses them using the provided separator if necessary.
- .unique(): collapses a set of records to a set of unique values using the provided separator. digits can be provided for numeric columns to control the precision used when determining unique values.
- .prioritise(): reduces a set of annotations by prioritising values according to the input. If there are multiple matches with the same priority then they are collapsed using a separator.
- .nothing(): a pass-through function to allow some annotation table columns to remain unchanged.
- .count(): adds a new column indicating the number of annotations that match the given grouping variable.
- .select_grade(): returns records based on the index of the best grade in a second list. The best grade is defined as "A" for upper_case = TRUE or "a" for upper_case = FALSE, and the worst grade is "Z" or "z". Any non-exact matches to a character in LETTERS or letters are replaced with NA.
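Some of the selection rules described above can be sketched in base R. The functions mode_value(), select_max_value() and best_grade_index() are hypothetical names used only for illustration; they follow the descriptions on this page and are not the package's own code.

```r
# Most common value. Candidates are sorted with NA last, so when
# ties = FALSE the first (smallest) tied value is returned and real
# values are preferred over NA.
mode_value <- function(x, ties = FALSE, na.rm = TRUE) {
  if (na.rm) x <- x[!is.na(x)]
  if (length(x) == 0) return(NA)
  candidates <- sort(unique(x), na.last = TRUE)
  count_of <- function(v) {
    if (is.na(v)) sum(is.na(x)) else sum(x == v, na.rm = TRUE)
  }
  counts <- vapply(candidates, count_of, numeric(1))
  tied <- candidates[counts == max(counts)]
  if (ties) tied else tied[1]
}

# Value of `column` in the row where `max_col` is largest.
# which.max() skips NA scores.
select_max_value <- function(group, column, max_col, use_abs = FALSE) {
  score <- group[[max_col]]
  if (use_abs) score <- abs(score)
  group[[column]][which.max(score)]
}

# Index of the best grade, where "A" (or "a") is best.
# Entries that are not a single letter of the chosen case become NA.
best_grade_index <- function(grades, upper_case = TRUE) {
  alphabet <- if (upper_case) LETTERS else letters
  which.min(match(grades, alphabet))
}

mode_value(c(3, 1, 3, 1, 2, NA))               # 1 and 3 are tied; returns 1
mode_value(c(3, 1, 3, 1, 2, NA), ties = TRUE)  # returns both tied values

group <- data.frame(
  name = c("x", "y", "z"),
  score = c(0.2, -0.9, 0.5),
  stringsAsFactors = FALSE
)
select_max_value(group, "name", "score")                  # "z"
select_max_value(group, "name", "score", use_abs = TRUE)  # "y"

best_grade_index(c("C", "A", "b", NA))                      # 2
best_grade_index(c("C", "A", "b", NA), upper_case = FALSE)  # 3
```

The index returned by best_grade_index() could then be used to pick the corresponding record from another column, in the same way that select_max_value() uses which.max().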
Examples
# Select records that exactly match a value
M = combine_records(
  group_by = 'example',
  default_fcn = .select_exact(
    match_col = 'match_column',
    match = 'find_me',
    separator = ', ',
    na_string = 'NA')
)
# Collapse unique values
M = combine_records(
  group_by = 'example',
  default_fcn = .unique(
    digits = 6,
    separator = ', ',
    na_string = 'NA')
)
# Prioritise by source
M = combine_records(
  group_by = 'InChiKey',
  default_fcn = .prioritise(
    match_col = 'source',
    priority = c('CD','LS'),
    separator = ' || ')
)
# Do nothing to all columns
M = combine_records(
  group_by = 'InChiKey',
  default_fcn = .nothing()
)
# Add a column with the number of records with a matching inchikey
M = combine_records(
  group_by = 'InChiKey',
  fcns = list(
    count = .count()
  ))
# Select annotation with highest (best) grade
M = combine_records(
  group_by = 'InChiKey',
  default_fcn = .select_grade(
    grade_col = 'grade',
    keep_NA = FALSE,
    upper_case = TRUE
  ))