
This function reads an Excel file of excerpt data and cleans it by:

  • Reading all columns as text to avoid type guessing issues,

  • Dropping columns whose names end with "Range" or "Weight",

  • Converting code columns (those starting with "Code: ") from text to logical, interpreting "true" (case-insensitive) as TRUE and any other value as FALSE,

  • Renaming code columns by removing the prefix "Code: " and suffix " Applied",

  • Filtering the data so that, for each Media Title, only the excerpts from the highest-ranked coder in preferred_coders are kept.
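The column-cleaning steps above can be sketched in base R as follows. This is a minimal illustration, not necessarily how clean_data is implemented. It assumes the sheet has already been read with every column as text (for example with readxl::read_excel(filepath, col_types = "text")), and the helper name clean_columns_sketch is hypothetical.

```r
# Hypothetical helper: applies the column-cleaning steps to a data frame
# whose columns were all read as text.
clean_columns_sketch <- function(df) {
  # Drop columns whose names end with "Range" or "Weight"
  df <- df[, !grepl("(Range|Weight)$", names(df)), drop = FALSE]

  # Identify code columns by their "Code: " prefix
  code_cols <- startsWith(names(df), "Code: ")

  # "true" in any letter case becomes TRUE; anything else (including NA) becomes FALSE
  df[code_cols] <- lapply(df[code_cols], function(x) {
    !is.na(x) & tolower(x) == "true"
  })

  # Strip the "Code: " prefix and the " Applied" suffix from code column names
  names(df)[code_cols] <- sub(" Applied$", "", sub("^Code: ", "", names(df)[code_cols]))
  df
}
```

For instance, a column named "Code: Trust Applied" becomes a logical column named "Trust", while "Code: Trust Range" is dropped before the conversion step.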

Usage

clean_data(filepath, preferred_coders)

Arguments

filepath

A character string giving the path to the Excel file to read.

preferred_coders

A character vector listing coders in order of preference. The function keeps excerpts only from the highest-ranked coder per Media Title.

Value

A cleaned tibble (data frame) of excerpts in which the code columns are logical and, for each Media Title, only the excerpts from the highest-ranked preferred coder are retained.

Details

The function expects columns starting with "Code: " to contain the text values "true" or "false"; "true" (in any letter case) is converted to TRUE and any other value to FALSE. Columns ending with "Range" or "Weight" are removed. Excerpts are then filtered so that, for each Media Title, only the excerpts from the coder ranked highest in preferred_coders are retained.
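The preferred-coder filter described above can be sketched in base R by ranking each row's coder with match() and keeping, within each Media Title, the rows that hold the best (lowest) rank. This is a hypothetical illustration, not the package's code: the helper name keep_preferred_coder_sketch and the coder column name "Coder" are assumptions, and it also assumes that excerpts from coders not listed in preferred_coders are dropped.

```r
# Hypothetical helper: keep, per Media Title, only the excerpts from the
# highest-ranked coder. The "Coder" column name is an assumption.
keep_preferred_coder_sketch <- function(df, preferred_coders,
                                        title_col = "Media Title",
                                        coder_col = "Coder") {
  # Position of each row's coder in preferred_coders (NA if not listed)
  coder_rank <- match(df[[coder_col]], preferred_coders)

  # Best (lowest) rank present for each Media Title; NA if no listed coder
  best_rank <- ave(coder_rank, df[[title_col]], FUN = function(r) {
    if (all(is.na(r))) NA_integer_ else min(r, na.rm = TRUE)
  })

  # Keep rows whose coder holds the best rank for their title
  keep <- !is.na(coder_rank) & coder_rank == best_rank
  df[keep, , drop = FALSE]
}
```

Under these assumptions, a title coded by both Coder1 and Coder2 keeps only Coder1's excerpts, while a title coded only by an unlisted coder disappears from the result.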

Examples

if (FALSE) { # \dontrun{
preferred <- c("Coder1", "Coder2", "Coder3")
cleaned_data <- clean_data("path/to/excerpts.xlsx", preferred)
} # }