Combine records helper functions — combine_records_helper

This page documents helper functions for use with combine_records().

Usage

compute_mode(ties = FALSE, na.rm = TRUE)

compute_mean(na.rm = TRUE)

compute_median(na.rm = TRUE)

fuse(separator, na_string = "NA")

select_max(max_col, use_abs = FALSE, keep_NA = FALSE)

select_min(min_col, use_abs = FALSE, keep_NA = FALSE)

select_match(match_col, search_col, separator, na_string = "NA")

select_exact(match_col, match, separator, na_string = "NA")

fuse_unique(
  separator,
  na_string = "NA",
  digits = 6,
  drop_na = FALSE,
  sort = FALSE
)

prioritise(match_col, priority, separator, no_match = NA, na_string = "NA")

nothing()

count_records()

select_grade(grade_col, keep_NA = FALSE, upper_case = TRUE)

Arguments

ties: (logical) If TRUE then all records matching the tied groups are returned. Otherwise the first record is returned.
na.rm: (logical) If TRUE then NA values are ignored.
separator: (character, NULL) if !NULL this string is used to collapse matches with the same priority
na_string: (character) NA values are replaced with this string
max_col: (character) the column name to search for the maximum value.
use_abs: (logical) If TRUE then the sign of the values is ignored.
keep_NA: (logical) If TRUE keeps records with NA values
min_col: (character) the column name to search for the minimum value.
match_col: (character) the column with labels to prioritise
search_col: (character) the name of a column to use as a reference for locating values in the matching column.
match: (character) a value to search for in the matching column.
digits: (numeric) the number of digits to use when converting numerical values to characters when determining if values are unique.
drop_na: (logical) exclude NA from the list of unique entries
sort: (logical) sort the values before collapsing.
priority: (character) a list of labels in priority order
no_match: (character, NULL) if !NULL then annotations not matching any of the priority labels are replaced with this value
grade_col: (character) the name of a column containing grades
upper_case: (logical) If TRUE then grades are compared to upper case letters to determine their ordering, otherwise lower case.
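
To make the interaction between separator, na_string, digits, drop_na and sort more concrete, here is a minimal base R sketch of one way such a collapse could behave. The name collapse_unique_sketch is hypothetical and is not part of the package. Using signif() to apply digits is an assumption, since this page does not say how numeric values are converted.

```r
# Hypothetical illustration only; not the package's implementation.
collapse_unique_sketch <- function(x, separator = ", ", na_string = "NA",
                                   digits = 6, drop_na = FALSE, sort = FALSE) {
    # Optionally exclude missing values before looking for unique entries
    if (drop_na) {
        x <- x[!is.na(x)]
    }
    # Assumption: numeric values are reduced to `digits` significant digits
    if (is.numeric(x)) {
        x <- signif(x, digits)
    }
    # Convert to character and substitute the NA placeholder
    s <- as.character(x)
    s[is.na(s)] <- na_string
    u <- unique(s)
    # Optionally sort the unique strings before collapsing
    if (sort) {
        u <- base::sort(u)
    }
    paste(u, collapse = separator)
}

# Values that agree to 6 significant digits are treated as one entry
collapse_unique_sketch(c(1.0000001, 1.0000002, NA, 2.5))
# Drop NA and sort before collapsing with a custom separator
collapse_unique_sketch(c("b", "a", NA, "b"), separator = "; ",
                       drop_na = TRUE, sort = TRUE)
```

In this sketch the first call gives "1, NA, 2.5" and the second gives "a; b".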

Value

A function for use with combine_records()
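
Each helper is called once with its options and returns a function that is applied later. The sketch below shows this general function-factory pattern in base R. The name make_fuse_sketch and the single-vector argument of the returned function are hypothetical; the arguments that combine_records() actually passes to the returned function are not described on this page.

```r
# Hypothetical function factory; mirrors the idea, not the package internals.
make_fuse_sketch <- function(separator, na_string = "NA") {
    # The returned closure remembers `separator` and `na_string`
    function(x) {
        s <- as.character(x)
        s[is.na(s)] <- na_string
        paste(s, collapse = separator)
    }
}

# Configure once, then reuse the stored function
fuse_fcn <- make_fuse_sketch(separator = " | ")

# Apply the stored function to each group of a small example vector
values <- c("a", NA, "b", "c")
groups <- c("g1", "g1", "g2", "g2")
result <- tapply(values, groups, fuse_fcn)
```

Here result holds one collapsed string per group: "a | NA" for g1 and "b | c" for g2.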

Functions

compute_mode(): returns the most common value. If ties = TRUE then all tied values are returned, otherwise the first value in a sorted unique list is returned (equal to the minimum if numeric). If na.rm = TRUE then NA values are excluded. If na.rm = FALSE then NA values are included when searching for the modal value and are placed last if ties = FALSE (non-NA values are returned in preference to NA).
compute_mean(): calculates the mean value, excluding NA if na.rm = TRUE
compute_median(): calculates the median value, excluding NA if na.rm = TRUE
fuse(): collapses multiple matching records into a single string using the provided separator.
select_max(): selects a record based on the index of the maximum value in another column.
select_min(): selects a record based on the index of the minimum value in another column.
select_match(): returns all records based on the indices of identical matches in a second column and collapses them using the provided separator.
select_exact(): returns records based on the indices of values identical to the match parameter within the current column, and collapses them using the provided separator if necessary.
fuse_unique(): collapses a set of records to a set of unique values using the provided separator. digits can be provided for numeric columns to control the precision used when determining unique values.
prioritise(): reduces a set of annotations by prioritising values according to the input. If there are multiple matches with the same priority then they are collapsed using a separator.
nothing(): a pass-through function to allow some annotation table columns to remain unchanged.
count_records(): adds a new column indicating the number of annotations that match the given grouping variable.
select_grade(): returns records based on the index of the best grade in a second list. The best grade is defined as "A" for upper_case = TRUE or "a" for upper_case = FALSE and the worst grade is "Z" or "z". Any non-exact matches to a character in LETTERS or letters are replaced with NA.
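
The tie handling described for compute_mode() and the grade ordering described for select_grade() can be illustrated with base R alone. Both functions below are hypothetical sketches written from the descriptions above. They cover only the na.rm = TRUE and upper_case = TRUE cases and are not the package's code.

```r
# Hypothetical sketch of modal-value selection with tie handling (NA dropped).
mode_sketch <- function(x, ties = FALSE) {
    x <- x[!is.na(x)]
    u <- sort(unique(x))                 # sorted unique values
    counts <- tabulate(match(x, u))      # occurrences of each unique value
    modal <- u[counts == max(counts)]    # all values sharing the top count
    if (ties) modal else modal[1]        # otherwise the first sorted value
}

# Hypothetical sketch of locating the best upper-case grade.
best_grade_index_sketch <- function(grades) {
    # Anything that is not exactly one of LETTERS becomes NA
    grade_rank <- match(grades, LETTERS)
    # "A" ranks 1 and "Z" ranks 26; NA entries are ignored by which.min()
    which.min(grade_rank)
}

mode_sketch(c(3, 1, 3, 1, 2))                  # 1 and 3 tie, so 1 is returned
mode_sketch(c(3, 1, 3, 1, 2), ties = TRUE)     # both tied values: 1 3
best_grade_index_sketch(c("C", "b", "B", NA))  # index 3, the "B"
```

In the last call the lower-case "b" is not an exact match to LETTERS, so it is treated as NA and the upper-case "B" at position 3 is the best available grade.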

Examples


# Select matching records
M <- combine_records(
    group_by = "example",
    default_fcn = select_exact(
        match_col = "match_column",
        match = "find_me",
        separator = ", ",
        na_string = "NA"
    )
)

# Collapse unique values
M <- combine_records(
    group_by = "example",
    default_fcn = fuse_unique(
        digits = 6,
        separator = ", ",
        na_string = "NA",
        sort = FALSE
    )
)

# Prioritise by source
M <- combine_records(
    group_by = "InChiKey",
    default_fcn = prioritise(
        match_col = "source",
        priority = c("CD", "LS"),
        separator = "  || "
    )
)

# Do nothing to all columns
M <- combine_records(
    group_by = "InChiKey",
    default_fcn = nothing()
)

# Add a column with the number of records with a matching inchikey
M <- combine_records(
    group_by = "InChiKey",
    fcns = list(
        count = count_records()
    )
)

# Select annotation with highest (best) grade
M <- combine_records(
    group_by = "InChiKey",
    default_fcn = select_grade(
        grade_col = "grade",
        keep_NA = FALSE,
        upper_case = TRUE
    )
)