ac_extract() returns one list element per document. Each element contains
the matched text and the corresponding pattern values.
Usage
ac_extract(ac, doc, overlapping = FALSE, na = c("keep", "empty", "error"))
Arguments
- ac
An <ac_automaton> object created by ac_build().
- doc
A character vector of documents to search.
- overlapping
Default is FALSE. If TRUE, extract overlapping matches. This is only supported when ac was built with match_kind = "standard".
- na
How to handle NA documents. "keep" returns one row with missing matches and patterns values (default); "empty" treats missing documents as no matches; "error" fails.
Value
A list with the same length as doc. Each element is a data frame
with one row per match and two columns:
- matches: Text matched in the document.
- patterns: Pattern values corresponding to each match.
Examples
if (
  requireNamespace("dplyr", quietly = TRUE) &&
    requireNamespace("tibble", quietly = TRUE) &&
    requireNamespace("tidyr", quietly = TRUE)
) {
  ac <- ac_build(c("hello", "world"))
  # ac_extract() returns a list of data frames, stored here as a list-column.
  tibble::tibble(doc = c("hello world", "nothing", "world")) |>
    dplyr::mutate(extracted = ac_extract(ac, doc)) |>
    # unnest() drops documents with no matches by default (here, "nothing").
    tidyr::unnest(extracted)
}
#> # A tibble: 3 × 3
#>   doc         matches patterns
#>   <chr>       <chr>   <chr>
#> 1 hello world hello   hello
#> 2 hello world world   world
#> 3 world       world   world
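
The overlapping and na arguments described under Arguments can be explored with a short sketch like the one below. It assumes the package that provides ac_build() and ac_extract() is already attached, and that ac_build() accepts match_kind = "standard" as the description of overlapping implies. The patterns and documents are made up for illustration, and no output is shown because it has not been verified here.

```r
# Assumption: the package providing ac_build() and ac_extract() is attached.
# match_kind = "standard" is passed explicitly because overlapping
# extraction is documented as requiring it.
ac <- ac_build(c("abc", "bc"), match_kind = "standard")

# "abc" and "bc" overlap inside this document, so overlapping = TRUE
# is the setting under which both can be reported.
ac_extract(ac, "xabcx", overlapping = TRUE)

# Compare the three documented NA policies on a vector with a missing document.
docs <- c("abc", NA)
ac_extract(ac, docs, na = "keep")   # one row with missing values for the NA document
ac_extract(ac, docs, na = "empty")  # NA document treated as having no matches
try(ac_extract(ac, docs, na = "error"))  # documented to fail; try() lets the script continue
```

Inspecting the second list element of each na call is a quick way to see how the three policies differ for the same input.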