
Read a Word file and redact it (no data.frame)
Reads a .docx file as raw paragraphs, applies in-function redaction, and writes a
redacted file. No speaker/text data.frame is created; the document is treated
as a sequence of paragraphs. You can choose the output format (DOCX/TXT/VTT)
via out_format, as in ghost_vtt() and ghost_batch().
Usage
ghost_docx(
  filepath,
  interviewers,
  interviewees = character(),
  redact_other = character(),
  redact_interviewer = FALSE,
  include_common_names = FALSE,
  redacted_token = "[REDACTED]",
  add_blank_line_between_turns = TRUE,
  output_path = NULL,
  suffix = "_redacted",
  out_format = c("docx", "txt", "vtt"),
  report_redacted = FALSE
)
Arguments
- filepath
Path to a .docx file.
- interviewers
Character vector of interviewer names.
- interviewees
Character vector of interviewee/participant names.
- redact_other
Other words/phrases to redact.
- redact_interviewer
If TRUE, also redact interviewer names.
- include_common_names
If TRUE, also redact a default list of common names (e.g., top US baby names, if available via ghosted::common_names_default).
- redacted_token
Replacement token used for redactions (names and other phrases).
- add_blank_line_between_turns
Logical; for TXT/DOCX outputs when converting formats, insert a blank line between turns. This does not affect DOCX→DOCX.
- output_path
Path for the redacted file. If
NULL(default), set to the same directory and base name asfilepathwith_redactedbefore the extension. The extension is chosen based onout_format(e.g.report.docx->report_redacted.docxforout_format = "docx", orreport_redacted.txt/report_redacted.vttotherwise).- suffix
Suffix to append to the base filename (default: "_redacted"). Only used when output_path is NULL.
- out_format
One of "docx", "txt", or "vtt", controlling the output file type. Defaults to "docx".
- report_redacted
If TRUE, print to the R console which phrases were found and redacted (names and other phrases).
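The default naming rule described under output_path and suffix can be sketched in base R. This is only an illustration of the documented behaviour, not the package's actual code; default_output_path() is a hypothetical helper that does not exist in ghosted.

```r
# Hypothetical helper (not part of ghosted): derive a default output path
# from the input path, a suffix, and the requested output format.
default_output_path <- function(filepath,
                                suffix = "_redacted",
                                out_format = c("docx", "txt", "vtt")) {
  out_format <- match.arg(out_format)  # first choice ("docx") is the default
  # Drop the directory and the original extension, keeping the base name.
  base <- tools::file_path_sans_ext(basename(filepath))
  # Same directory, base name + suffix, extension taken from out_format.
  file.path(dirname(filepath), paste0(base, suffix, ".", out_format))
}

default_output_path("data/report.docx")
# "data/report_redacted.docx"
default_output_path("data/report.docx", out_format = "txt")
# "data/report_redacted.txt"
```

Passing an explicit output_path to ghost_docx() bypasses this rule entirely, which is why suffix is only used when output_path is NULL.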
Examples
# Writes report_redacted.docx in same folder, returns path:
# ghost_docx("report.docx", interviewers = "Dr. Smith", interviewees = "Jane Doe")
# Write as TXT instead of DOCX:
# ghost_docx("report.docx", interviewers = "Dr. Smith", interviewees = "Jane Doe",
#            out_format = "txt")
# With common names and redaction report:
# ghost_docx("report.docx", interviewers = "Dr. Smith", interviewees = "Jane Doe",
#            include_common_names = TRUE, report_redacted = TRUE)
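To make the "sequence of paragraphs" model concrete, here is a minimal base-R sketch of paragraph-level redaction. It is an assumption-laden illustration, not how ghost_docx() is implemented: redact_paragraphs() is a hypothetical helper, it matches phrases literally and case-sensitively, and the package's real matching rules (case handling, word boundaries) may differ.

```r
# Hypothetical helper (not part of ghosted): replace every occurrence of each
# phrase in a character vector of paragraphs with a replacement token.
redact_paragraphs <- function(paragraphs, phrases, token = "[REDACTED]") {
  # Process longer phrases first so "Jane Doe" is replaced before "Jane".
  phrases <- phrases[order(nchar(phrases), decreasing = TRUE)]
  for (phrase in phrases) {
    # fixed = TRUE treats the phrase as literal text, so the "." in
    # "Dr. Smith" is not interpreted as a regular-expression wildcard.
    paragraphs <- gsub(phrase, token, paragraphs, fixed = TRUE)
  }
  paragraphs
}

paras <- c("Dr. Smith: How are you, Jane?",
           "Jane Doe: Fine, thanks.")
redact_paragraphs(paras, c("Jane", "Jane Doe"))
# "Dr. Smith: How are you, [REDACTED]?" "[REDACTED]: Fine, thanks."
```

Here the interviewer name is left intact because it was not passed in phrases, which mirrors the documented default of redact_interviewer = FALSE.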