ac_locate_file() searches files with a compiled automaton and returns one
list element per file. Character offsets are 1-based and inclusive, so they
can be used directly with substr().
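Because the offsets are 1-based and inclusive, a match table can be passed straight to base R's substring(), which is vectorised over its start and stop arguments. The sketch below hard-codes the match table that the Examples section reports for "hello world" rather than calling ac_locate_file(), so it runs on its own. For a real file, the text handed to substring() would need to be the file contents exactly as the search saw them (line endings included). How the package counts offsets across lines is an assumption here.

```r
# Text and match table mirroring the Examples section (hard-coded stand-ins,
# not a call to ac_locate_file()).
text <- "hello world"
matches <- data.frame(
  pattern_id = c(1L, 2L),
  start = c(1L, 7L),
  end = c(5L, 11L)
)

# 1-based, inclusive offsets need no adjustment before substring().
matches$matched <- substring(text, matches$start, matches$end)
print(matches)
```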
Arguments
- ac
  An <ac_automaton> object created by ac_build().
- path
  A vector of file paths to search.
- overlapping
  Default is FALSE. If TRUE, report overlapping matches. This is only supported when ac was built with match_kind = "standard".
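To show what overlapping = TRUE means, the sketch below lists every occurrence of every pattern by brute force, including occurrences that overlap one another. It is a naive reference check in base R. It is not Aho-Corasick and not how the package works. The helper name all_occurrences() is hypothetical, and its row order (by pattern, then by start) may differ from the package's.

```r
# Hypothetical brute-force helper: every (possibly overlapping) occurrence of
# each pattern in `text`, as 1-based inclusive character offsets.
all_occurrences <- function(patterns, text) {
  n <- nchar(text)
  rows <- list()
  for (i in seq_along(patterns)) {
    len <- nchar(patterns[[i]])
    if (len == 0L || len > n) next
    # Try every start position where the pattern could still fit.
    for (s in seq_len(n - len + 1L)) {
      if (substr(text, s, s + len - 1L) == patterns[[i]]) {
        rows[[length(rows) + 1L]] <-
          data.frame(pattern_id = i, start = s, end = s + len - 1L)
      }
    }
  }
  if (length(rows) == 0L) {
    return(data.frame(pattern_id = integer(0), start = integer(0),
                      end = integer(0)))
  }
  do.call(rbind, rows)
}

# "abc" and "bcd" share the characters "bc", so both are reported.
all_occurrences(c("abc", "bcd"), "abcd")
```

With overlapping = FALSE, the reported matches can only be a subset of such a table. Which occurrences are kept depends on match_kind.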
Value
A list with the same length as path. Each element is a data frame
with one row per match and three columns:
- pattern_id: Index of the matched pattern in ac_patterns(ac).
- start: 1-based index of the first character in each match.
- end: 1-based index of the last character in each match.
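Because the result is a list aligned with path, the per-file tables can be stacked into one data frame. pattern_id can then be mapped back to the pattern text. The sketch below hard-codes stand-ins for ac_locate_file(ac, paths) and ac_patterns(ac) so it runs without the package. It also assumes, without confirmation from the source, that a file with no matches yields a zero-row data frame.

```r
# Hard-coded stand-ins for ac_locate_file(ac, paths) and ac_patterns(ac).
paths <- c("a.txt", "b.txt")
patterns <- c("hello", "world")
res <- list(
  data.frame(pattern_id = c(1L, 2L), start = c(1L, 7L), end = c(5L, 11L)),
  # Assumed shape for a file with no matches: a zero-row data frame.
  data.frame(pattern_id = integer(0), start = integer(0), end = integer(0))
)

# res[[i]] belongs to paths[i]; tag each table with its path, then stack.
tagged <- Map(function(df, p) {
  df$path <- rep(p, nrow(df))
  df
}, res, paths)
all_matches <- do.call(rbind, tagged)

# pattern_id indexes the pattern vector, so it maps back to the pattern text.
all_matches$pattern <- patterns[all_matches$pattern_id]
print(all_matches)
```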
Details
File location search is always non-streaming: each file is searched in memory.
Converting byte offsets from a streaming search into R-facing character
offsets would require a second pass over the same file to reconstruct UTF-8
character boundaries, so ac_locate_file() keeps to a simple in-memory search
and avoids that extra pass.
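The sketch below shows why byte offsets alone are not enough in a UTF-8 file. Turning a byte range into a character range means decoding the bytes that precede it, which is the second pass over the text described above. byte_to_char() is a hypothetical helper, not part of the package, and it assumes the byte range is 1-based and inclusive. It says nothing about how ac_locate_file() itself computes offsets.

```r
# Hypothetical helper: convert a match's 1-based inclusive *byte* range in a
# UTF-8 string to a 1-based inclusive *character* range.
byte_to_char <- function(x, byte_start, byte_end) {
  bytes <- charToRaw(enc2utf8(x))
  # Re-decode the bytes before the match and up to its end; counting the
  # characters in each prefix is the extra pass over the text.
  before <- rawToChar(bytes[seq_len(byte_start - 1L)])
  through <- rawToChar(bytes[seq_len(byte_end)])
  Encoding(before) <- "UTF-8"
  Encoding(through) <- "UTF-8"
  c(start = nchar(before, type = "chars") + 1L,
    end = nchar(through, type = "chars"))
}

x <- "h\u00e9llo w\u00f6rld"    # "héllo wörld": é and ö take 2 bytes each
pos <- byte_to_char(x, 8L, 13L) # bytes 8..13 hold "wörld"
pos
substr(x, pos[["start"]], pos[["end"]])
```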
Examples
ac <- ac_build(c("hello", "world"))
path <- tempfile()
writeLines("hello world", path)
ac_locate_file(ac, path)
#> [[1]]
#>   pattern_id start end
#> 1          1     1   5
#> 2          2     7  11
#>