ac_locate() searches a character vector with a compiled automaton and
returns one list element per document. Character offsets are 1-based and
inclusive, so they can be used directly with substr().
Usage
ac_locate(ac, doc, overlapping = FALSE, na = c("keep", "empty", "error"))
Arguments
- ac
An <ac_automaton> object created by ac_build().
- doc
A character vector of documents to search.
- overlapping
Default is FALSE. If TRUE, report overlapping matches. This is only supported when ac was built with match_kind = "standard".
- na
How to handle NA documents. "keep" returns one row with missing pattern_id, start, and end values (default); "empty" treats missing documents as no matches; "error" fails.
Value
A list with the same length as doc. Each element is a data frame
with one row per match and three columns:
- pattern_id: Index of the matched pattern in ac_patterns(ac).
- start: 1-based index of the first character in each match.
- end: 1-based index of the last character in each match.
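Because start and end are 1-based and inclusive, they can be passed to substr() unchanged to recover the matched text. The following is a minimal base R sketch of that idea. The hits data frame is written by hand as a stand-in for one element of the returned list (it mirrors the matches for "hello world" in the Examples), and the text column is a name introduced here for illustration only.

```r
# Hand-written stand-in for one element of the list returned by ac_locate():
# the two matches in "hello world" from the Examples section.
doc <- "hello world"
hits <- data.frame(
  pattern_id = c(1L, 2L),
  start = c(1L, 7L),
  end = c(5L, 11L)
)

# substr() returns a result as long as its first argument, so the document
# is repeated once per match before slicing with the inclusive offsets.
hits$text <- substr(rep(doc, nrow(hits)), hits$start, hits$end)
hits$text
#> [1] "hello" "world"
```

No offset arithmetic is needed: end is the position of the last matched character, which is what the stop argument of substr() expects.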
Examples
if (
  requireNamespace("dplyr", quietly = TRUE) &&
    requireNamespace("tibble", quietly = TRUE) &&
    requireNamespace("tidyr", quietly = TRUE)
) {
  ac <- ac_build(c("hello", "world"))
  tibble::tibble(doc = c("hello world", "nothing", "world")) |>
    dplyr::mutate(hits = ac_locate(ac, doc)) |>
    tidyr::unnest(hits)
}
#> # A tibble: 3 × 4
#>   doc         pattern_id start   end
#>   <chr>            <int> <int> <int>
#> 1 hello world          1     1     5
#> 2 hello world          2     7    11
#> 3 world                2     1     5
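The na and overlapping arguments can be exercised in the same way. The sketch below is hedged: it assumes that match_kind is an argument of ac_build(), which the description of overlapping implies but this page does not show. The block is guarded so that it does nothing unless ac_build() and ac_locate() are available in the session. The variable names are illustrative, and no output is shown because it depends on the installed version.

```r
# Runs only if ac_build() and ac_locate() can be found, for example after
# attaching the package that provides them.
if (exists("ac_build", mode = "function") &&
    exists("ac_locate", mode = "function")) {
  docs <- c("hello world", NA)

  # Assumption: ac_build() accepts match_kind, as the overlapping
  # argument's description suggests.
  ac_std <- ac_build(c("hello", "world"), match_kind = "standard")

  # Default na = "keep": the NA document yields one row of missing values.
  kept <- ac_locate(ac_std, docs)

  # na = "empty": the NA document is treated as having no matches.
  emptied <- ac_locate(ac_std, docs, na = "empty")

  # na = "error": an NA document makes the call fail, so catch the error.
  failed <- tryCatch(
    ac_locate(ac_std, docs, na = "error"),
    error = function(e) conditionMessage(e)
  )

  # overlapping = TRUE is only supported for match_kind = "standard".
  overlaps <- ac_locate(ac_std, docs, overlapping = TRUE)
}
```

Whichever option is chosen, the result is still a list with one element per document, so it can be stored in a list column as in the example above.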