
Creates a horizontal stacked or dodged bar plot visualizing counts or proportions of quality indicator annotations per code. Only codes that have all specified quality indicators present at least once (count > 0) are shown.

Usage

plot_saturation(
  df_all_summary,
  df_qual_summary,
  qual_indicators = NULL,
  min_counts = NULL,
  stacked = TRUE,
  as_proportion = FALSE
)

Arguments

df_all_summary

A data frame (tibble) summarizing codes; it must contain at least the Code and total_preferred_coder columns.

df_qual_summary

A data frame (tibble) containing quality indicator counts per code. It must have a Code column plus one column for each entry in qual_indicators, named exactly as in that vector.

qual_indicators

A character vector of quality indicator names to plot (e.g., c("Priority excerpt", "Heterogeneity")). These names select the columns of df_qual_summary that are plotted and used for filtering.

min_counts

Optional named numeric vector giving, for each named quality indicator, the minimum count a code must reach to be included (e.g., c("Priority excerpt" = 20, "Heterogeneity" = 30)). A code whose count for any named indicator falls below that indicator's threshold is excluded.

stacked

Logical; if TRUE (default), bars for quality indicators will be stacked; if FALSE, bars will be dodged (side-by-side).
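The stacked versus dodged choice corresponds to ggplot2's position adjustments (position_stack() and position_dodge()). The following is a minimal illustration on made-up data, assuming ggplot2 is installed; the helper bar_sketch() and the toy data frame are hypothetical and are not taken from how plot_saturation() builds its plot.

```r
library(ggplot2)

# Hypothetical long-format counts: one row per code and quality indicator.
toy <- data.frame(
  Code = rep(c("Code A", "Code B"), each = 2),
  indicator = rep(c("Priority excerpt", "Heterogeneity"), times = 2),
  count = c(5, 3, 2, 4)
)

# Hypothetical helper: stacks bars when stacked = TRUE, otherwise places
# the indicator bars for each code side by side.
bar_sketch <- function(data, stacked = TRUE) {
  pos <- if (stacked) position_stack() else position_dodge()
  ggplot(data, aes(x = Code, y = count, fill = indicator)) +
    geom_col(position = pos) +
    coord_flip()  # draw the bars horizontally
}

p_stacked <- bar_sketch(toy, stacked = TRUE)
p_dodged  <- bar_sketch(toy, stacked = FALSE)
```

With stacking, both indicator bars for a code share one position and sit end to end; with dodging, each code gets one narrower bar per indicator.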

as_proportion

Logical; if TRUE, the value axis shows proportions of counts per code rather than raw counts; if FALSE (default), it shows raw counts.

Value

A ggplot object displaying the counts or proportions of quality indicator annotations by code.

Details

  • The function displays only codes that have counts greater than zero for all specified quality indicators.

  • The plot orders codes by descending total counts from total_preferred_coder.

  • Colors are generated with a discrete gradient palette for visual clarity.

  • The input data frames should be the outputs of summarize_codes() and quality_indicators(), respectively, or have an equivalent structure.
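The filtering and ordering rules above can be sketched in base R as follows. This is a hypothetical re-creation based only on the documented behaviour, not the package's source code; the helper filter_codes_sketch() and the toy counts are made up for illustration, and the sketch assumes min_counts thresholds are inclusive (a code at exactly the threshold is kept).

```r
# Hypothetical helper (not part of the package): applies the documented
# filtering and ordering rules to the two summary tables.
filter_codes_sketch <- function(df_all_summary, df_qual_summary,
                                qual_indicators, min_counts = NULL) {
  # Keep the Code column plus the requested indicator columns.
  df <- df_qual_summary[, c("Code", qual_indicators), drop = FALSE]
  # Look up each code's total_preferred_coder value by matching on Code.
  df$total_preferred_coder <-
    df_all_summary$total_preferred_coder[match(df$Code, df_all_summary$Code)]
  # Rule 1: every requested indicator must have a count greater than zero.
  counts <- as.matrix(df[, qual_indicators, drop = FALSE])
  keep <- rowSums(counts > 0) == length(qual_indicators)
  # Rule 2 (optional): per-indicator minimum counts from a named vector.
  if (!is.null(min_counts)) {
    for (ind in names(min_counts)) {
      keep <- keep & df[[ind]] >= min_counts[[ind]]
    }
  }
  df <- df[keep, , drop = FALSE]
  # Rule 3: order the remaining codes by descending total_preferred_coder.
  df[order(df$total_preferred_coder, decreasing = TRUE), , drop = FALSE]
}

# Toy inputs with made-up counts.
qual <- data.frame(
  Code = c("A", "B", "C"),
  `Priority excerpt` = c(5, 0, 4),
  Heterogeneity = c(3, 2, 1),
  check.names = FALSE
)
totals <- data.frame(Code = c("A", "B", "C"),
                     total_preferred_coder = c(10, 50, 20))

res <- filter_codes_sketch(totals, qual, c("Priority excerpt", "Heterogeneity"))
res$Code  # "C" "A": B is dropped because its Priority excerpt count is 0
```

Adding min_counts = c(Heterogeneity = 2) to the same call would also drop C, whose Heterogeneity count is 1, leaving only A.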

Examples

if (FALSE) { # \dontrun{
# `excerpts` and `preferred_coders` are assumed to be defined already
summary_data <- summarize_codes(excerpts, preferred_coders,
                                output_type = "tibble")
quality_data <- quality_indicators(excerpts, preferred_coders,
                                   qual_indicators = c("Priority excerpt", "Heterogeneity"))

plot_saturation(
  summary_data,
  quality_data,
  qual_indicators = c("Priority excerpt", "Heterogeneity"),
  min_counts = c("Priority excerpt" = 3, "Heterogeneity" = 3),
  stacked = TRUE,
  as_proportion = FALSE
)
} # }