Plot Code Saturation by Quality Indicators — plot

Creates a horizontal stacked or dodged bar plot visualizing counts or proportions of quality indicator annotations per code. Only codes that have all specified quality indicators present at least once (count > 0) are shown.

Usage

plot_saturation(
  df_all_summary,
  df_qual_summary,
  qual_indicators = NULL,
  min_counts = NULL,
  stacked = TRUE,
  as_proportion = FALSE
)

Arguments

df_all_summary: A data frame (tibble) summarizing codes. Must contain at least the Code and total_preferred_coder columns.
df_qual_summary: A data frame (tibble) containing quality indicator counts per code. Must contain a Code column and one column for each entry in qual_indicators, with column names matching those entries exactly.
qual_indicators: A character vector of quality indicator names to plot (e.g., c("Priority excerpt", "Heterogeneity")). These determine which columns to use and filter on.
min_counts: Optional named numeric vector giving the minimum count a code must reach on each quality indicator to be included (e.g., c("Priority excerpt" = 20, "Heterogeneity" = 30)). Codes whose count falls below the threshold for the respective quality indicator are excluded.
stacked: Logical; if TRUE (default), bars for quality indicators will be stacked; if FALSE, bars will be dodged (side-by-side).
as_proportion: Logical; if TRUE, the bars represent proportions of counts per code rather than raw counts. Defaults to FALSE.
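
To make the expected input shapes concrete, here is a minimal sketch of two hand-built data frames that satisfy the column requirements above. The code names and counts are invented for illustration and are not output of the package; in practice these inputs would normally come from summarize_codes() and quality_indicators().

```r
# Hypothetical toy inputs; code names and counts are invented for illustration.
df_all_summary <- data.frame(
  Code = c("Access", "Cost", "Trust"),
  total_preferred_coder = c(40, 25, 10)
)

df_qual_summary <- data.frame(
  Code = c("Access", "Cost", "Trust"),
  `Priority excerpt` = c(12, 5, 0),
  Heterogeneity = c(8, 2, 3),
  check.names = FALSE  # keep the space in "Priority excerpt"
)

qual_indicators <- c("Priority excerpt", "Heterogeneity")
min_counts <- c("Priority excerpt" = 3, "Heterogeneity" = 3)

# Every indicator must be a column of df_qual_summary, and every
# name in min_counts should refer to one of the indicators.
stopifnot(all(qual_indicators %in% names(df_qual_summary)))
stopifnot(all(names(min_counts) %in% qual_indicators))
```

Note that data.frame() rewrites column names containing spaces unless check.names = FALSE is set, which would break the exact-name match with qual_indicators.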

Value

A ggplot object displaying the counts or proportions of quality indicator annotations by code.

Details

The function filters to only display codes that have counts greater than zero for all specified quality indicators.
The plot orders codes by descending total counts from total_preferred_coder.
Colors are generated with a discrete gradient palette for visual clarity.
Input data frames should be the outputs of summarize_codes() and quality_indicators(), or have an equivalent structure.
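
The filtering and ordering rules described above can be sketched in base R as follows. This is an illustrative approximation on invented toy data, not the function's actual implementation; in particular, computing proportions as each indicator's share of the plotted indicator counts within a code is an assumption about what as_proportion = TRUE does.

```r
# Hypothetical toy data (invented for illustration).
df_all_summary <- data.frame(
  Code = c("Access", "Cost", "Trust", "Stigma"),
  total_preferred_coder = c(25, 40, 10, 30)
)
df_qual_summary <- data.frame(
  Code = c("Access", "Cost", "Trust", "Stigma"),
  `Priority excerpt` = c(5, 12, 0, 2),
  Heterogeneity = c(4, 8, 3, 6),
  check.names = FALSE
)
qual_indicators <- c("Priority excerpt", "Heterogeneity")
min_counts <- c("Priority excerpt" = 3, "Heterogeneity" = 3)

# Indicator counts as a matrix with one row per code.
counts <- as.matrix(df_qual_summary[, qual_indicators, drop = FALSE])
rownames(counts) <- df_qual_summary$Code

# Keep only codes where every indicator has a count greater than zero.
keep <- rowSums(counts > 0) == length(qual_indicators)

# Apply the per-indicator minimum thresholds, if any were given.
for (ind in names(min_counts)) {
  keep <- keep & counts[, ind] >= min_counts[[ind]]
}
kept <- counts[keep, , drop = FALSE]

# Order the remaining codes by descending total_preferred_coder.
totals <- df_all_summary$total_preferred_coder[
  match(rownames(kept), df_all_summary$Code)
]
kept <- kept[order(totals, decreasing = TRUE), , drop = FALSE]

# Assumed reading of as_proportion = TRUE: shares within each code.
props <- kept / rowSums(kept)
```

In this toy example "Trust" is dropped because it has a zero count on one indicator, and "Stigma" is dropped because it falls below the min_counts threshold for "Priority excerpt".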

Examples

if (FALSE) { # \dontrun{
summary_data <- summarize_codes(
  excerpts,
  preferred_coders,
  output_type = "tibble"
)
quality_data <- quality_indicators(
  excerpts,
  preferred_coders,
  qual_indicators = c("Priority excerpt", "Heterogeneity")
)

plot_saturation(
  summary_data,
  quality_data,
  qual_indicators = c("Priority excerpt", "Heterogeneity"),
  min_counts = c("Priority excerpt" = 3, "Heterogeneity" = 3),
  stacked = TRUE,
  as_proportion = FALSE
)
} # }