ac_locate_bytes() searches a character vector with a compiled automaton
and returns byte offsets from the Rust aho-corasick crate. Byte offsets are
0-based, and byte_end is end-exclusive.
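The offset convention can be illustrated with base R alone. The following is a minimal sketch, not part of the package: slice_bytes() is a hypothetical helper, and the offsets passed to it are counted by hand for illustration, not taken from ac_locate_bytes() output. It assumes the document is converted to UTF-8 before the bytes are counted.

```r
# Hypothetical helper (an assumption, not a package function): return the text
# covered by a 0-based, end-exclusive byte range [byte_start, byte_end).
slice_bytes <- function(x, byte_start, byte_end) {
  bytes <- charToRaw(enc2utf8(x))
  stopifnot(byte_start >= 0, byte_start <= byte_end, byte_end <= length(bytes))
  # An empty range has no bytes to extract.
  if (byte_end == byte_start) return("")
  # R vectors are 1-based and inclusive: shift the start by one and keep
  # the exclusive end as the inclusive last index.
  out <- rawToChar(bytes[(byte_start + 1):byte_end])
  Encoding(out) <- "UTF-8"
  out
}

doc <- "na\u00efve cafe"   # "ï" occupies two bytes in UTF-8

slice_bytes(doc, 0, 6)     # the first six bytes spell the five-character word
slice_bytes(doc, 7, 11)    # bytes 7 to 10 spell "cafe"
```

Because the offsets count bytes and not characters, passing them straight to substr() gives wrong results whenever the text before a match contains multibyte characters.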
Usage
ac_locate_bytes(ac, doc, overlapping = FALSE, na = c("omit", "keep", "error"))
Arguments
- ac
An <ac_automaton> object created by ac_build().
- doc
A character vector of documents to search.
- overlapping
Default is FALSE. If TRUE, report overlapping matches. This is only supported when ac was built with match_kind = "standard".
- na
How to handle NA documents. "omit" drops missing documents (default); "keep" returns one row with missing result columns for each missing document; "error" fails.
Value
A data frame with one row per match and four columns:
doc_id, pattern_id, byte_start, and byte_end.