
Redact a Zoom/WebVTT transcript and write VTT/DOCX/TXT
Parses a Zoom/WebVTT transcript as raw cues (no data.frame), redacts interviewee names (and optionally interviewer names) plus other phrases using boundary-aware matching, and writes the result as a WebVTT, Word, or plain text file.
Usage
ghost_vtt(
  filepath,
  interviewers,
  interviewees = character(),
  redact_other = character(),
  redact_interviewer = FALSE,
  include_common_names = FALSE,
  redacted_token = "[REDACTED]",
  add_blank_line_between_turns = TRUE,
  output_path = NULL,
  suffix = "_redacted",
  out_format = c("vtt", "docx", "txt"),
  report_redacted = FALSE
)
Arguments
- filepath
Path to a .vtt file.
- interviewers
Character vector of interviewer names.
- interviewees
Character vector of interviewee/participant names.
- redact_other
Other words/phrases to redact.
- redact_interviewer
If TRUE, also redact interviewer names in the transcript text. Interviewer names in the speaker field are always redacted.
- include_common_names
If TRUE, also redact a default list of common names (e.g., top US baby names, if available via ghosted::common_names_default).
- redacted_token
Replacement token used for redactions (names and other phrases).
- add_blank_line_between_turns
Logical; for DOCX/TXT outputs, insert a blank line between turns.
- output_path
Full path for the output file. If NULL, uses the folder of filepath with the input base name plus suffix and an extension based on out_format.
- suffix
Suffix to append to the base filename (default: "_redacted").
- out_format
One of "vtt", "docx", or "txt", controlling the output file extension.
- report_redacted
If TRUE, prints which phrases were found and redacted.
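The default output location described for output_path, suffix, and out_format can be illustrated with a small base R sketch. The helper name default_output_path() and the example file path are hypothetical and exist only for this illustration; the package's internal code may build the path differently.

```r
# Hypothetical helper (not part of the package): mimics the documented rule
# for output_path = NULL, i.e. same folder as the input, input base name
# plus suffix, and an extension taken from out_format.
default_output_path <- function(filepath,
                                suffix = "_redacted",
                                out_format = c("vtt", "docx", "txt")) {
  # match.arg() picks the first choice ("vtt") when out_format is not supplied
  out_format <- match.arg(out_format)
  base <- tools::file_path_sans_ext(basename(filepath))
  file.path(dirname(filepath), paste0(base, suffix, ".", out_format))
}

# Example with a made-up input path
default_output_path("interviews/session01.vtt", out_format = "docx")
# "interviews/session01_redacted.docx"
```

Supplying output_path explicitly bypasses this rule, so suffix only matters when output_path is NULL.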
Details
Redaction mirrors the standalone logic used in ghost_docx() and
ghost_txt(): in addition to the full names themselves, each name is split
into parts (e.g., first/last and hyphenated pieces), and all phrases are
replaced longest-first with redacted_token.
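The splitting and longest-first replacement described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's actual implementation: the helpers expand_name_parts() and redact_text() are hypothetical, matching is assumed to be case-sensitive, and a "boundary" is assumed to mean "not adjacent to a letter or digit".

```r
# Hypothetical helper: expand full names into whitespace-separated parts
# and hyphen-separated pieces, keeping the full names as well.
expand_name_parts <- function(names) {
  ws_parts <- unlist(strsplit(names, "[[:space:]]+"))
  hy_parts <- unlist(strsplit(ws_parts, "-", fixed = TRUE))
  parts <- unique(c(names, ws_parts, hy_parts))
  parts[nzchar(parts)]
}

# Hypothetical helper: replace each phrase, longest first, only where it is
# not directly adjacent to another letter or digit.
redact_text <- function(text, phrases, token = "[REDACTED]") {
  phrases <- phrases[order(nchar(phrases), decreasing = TRUE)]
  for (p in phrases) {
    # \Q...\E makes PCRE treat the phrase literally; the lookarounds
    # implement the (assumed) boundary rule.
    pattern <- paste0("(?<![[:alnum:]])\\Q", p, "\\E(?![[:alnum:]])")
    text <- gsub(pattern, token, text, perl = TRUE)
  }
  text
}

phrases <- expand_name_parts("Mary-Jane Smith")
redact_text("Mary-Jane Smith met Smithson and Jane.", phrases)
# "[REDACTED] met Smithson and [REDACTED]."
```

Replacing the longest phrase first means a full name collapses into a single token rather than one token per part, and the boundary check leaves longer words such as "Smithson" untouched.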