ac_locate_file() searches files with a compiled automaton and returns one list element per file. Character offsets are 1-based and inclusive, so they can be used directly with substr().

Usage

ac_locate_file(ac, path, overlapping = FALSE)

Arguments

ac

An <ac_automaton> object created by ac_build().

path

A character vector of file paths to search.

overlapping

If TRUE, report overlapping matches; this is only supported when ac was built with match_kind = "standard". Default is FALSE.
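To make "overlapping" concrete, the following base R sketch locates two patterns independently and checks whether their character ranges share a position. It does not call the package: the text "abcd", the patterns, and the variable names are illustrative assumptions, and the sketch does not claim to reproduce which matches ac_locate_file() reports for either setting.

```r
# Illustration only: base R, not the package's automaton.
text <- "abcd"
patterns <- c("abc", "bcd")

# Locate each pattern on its own and collect 1-based inclusive ranges.
hits <- do.call(rbind, lapply(seq_along(patterns), function(i) {
  m <- gregexpr(patterns[i], text, fixed = TRUE)[[1]]
  if (m[1] == -1L) return(NULL)
  data.frame(
    pattern_id = i,
    start = as.integer(m),
    end = as.integer(m) + attr(m, "match.length") - 1L
  )
}))

# Two inclusive ranges overlap when each starts no later than the other ends.
overlaps <- hits$start[2] <= hits$end[1] && hits$start[1] <= hits$end[2]
overlaps
#> [1] TRUE
```

Here "abc" covers characters 1 to 3 and "bcd" covers characters 2 to 4, so the two matches overlap on characters 2 and 3.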

Value

A list with the same length as path. Each element is a data frame with one row per match and three columns:

  • pattern_id: Index of the matched pattern in ac_patterns(ac).

  • start: 1-based index of the first character of the match.

  • end: 1-based index of the last character of the match.
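Because start and end are 1-based and inclusive, they can be passed straight to base R's substring(). The sketch below builds a data frame by hand with the three documented columns, using the values from the Examples output, rather than calling ac_locate_file(); the text and variable names are assumptions for illustration.

```r
# Hand-built stand-in for one element of the returned list.
text <- "hello world"
hits <- data.frame(
  pattern_id = c(1L, 2L),
  start = c(1L, 7L),
  end = c(5L, 11L)
)

# substring() recycles the single text across all start/end pairs.
matched <- substring(text, hits$start, hits$end)
matched
#> [1] "hello" "world"

# Inclusive bounds mean the length is end - start + 1.
match_length <- hits$end - hits$start + 1L
```

No off-by-one adjustment is needed in either direction, which is the practical benefit of the inclusive convention.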

Details

ac_locate_file() never streams: each file is searched in memory. A streaming search yields byte offsets, and converting those into the character offsets R expects would require a second pass over the same file to reconstruct UTF-8 character boundaries. A simple in-memory search avoids that second pass and is the clearest implementation.
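The gap between byte offsets and character offsets can be seen in base R with any string containing a multi-byte UTF-8 character. The sketch below is not the package's implementation; the string and helper variables are assumptions chosen for illustration. It marks the bytes that begin a character (every byte that is not a UTF-8 continuation byte of the form 10xxxxxx) and uses a running count to map byte positions to character positions.

```r
# "é" occupies two bytes in UTF-8, so byte and character offsets diverge after it.
x <- enc2utf8("caf\u00e9 world")
bytes <- as.integer(charToRaw(x))

# A byte starts a character unless its top two bits are 10 (continuation byte).
is_char_start <- bitwAnd(bytes, 0xC0L) != 0x80L

# Running count of character starts gives the character index of every byte.
char_index <- cumsum(is_char_start)

# "world" occupies bytes 7 to 11 of x.
byte_start <- 7L
byte_end <- 11L
char_start <- char_index[byte_start]
char_end <- char_index[byte_end]

c(char_start, char_end)
#> [1]  6 10
substr(x, char_start, char_end)
#> [1] "world"
```

The mapping needs every byte before the match, which is why a streaming search that has already discarded earlier input cannot produce character offsets without rereading the file.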

Examples

ac <- ac_build(c("hello", "world"))
path <- tempfile()
writeLines("hello world", path)
ac_locate_file(ac, path)
#> [[1]]
#>   pattern_id start end
#> 1          1     1   5
#> 2          2     7  11
#>