ac_extract() returns one list element per document. Each element contains the matched text and the corresponding pattern values.

Usage

ac_extract(ac, doc, overlapping = FALSE, na = c("keep", "empty", "error"))

Arguments

ac

An <ac_automaton> object created by ac_build().

doc

A character vector of documents to search.

overlapping

Default is FALSE. If TRUE, extract overlapping matches. This is only supported when ac was built with match_kind = "standard".

na

How to handle NA documents. "keep" (the default) returns one row in which both matches and patterns are NA; "empty" treats an NA document as having no matches; "error" fails.

Value

A list with the same length as doc. Each element is a data frame with one row per match and two columns:

  • matches: Text matched in the document.

  • patterns: Pattern values corresponding to each match.
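Because the result is a plain list of data frames, it can be flattened without any extra packages. The sketch below uses a hand-built stand-in for the return value (the list `extracted` is an assumption written to follow the shape described above, not real output from ac_extract()) and stacks it into a single data frame that records the source document of each match.

```r
# Hand-built stand-in for the value of ac_extract() on three documents.
# It follows the documented shape: one data frame per document, with the
# columns `matches` and `patterns`. Real output may differ in details.
docs <- c("hello world", "nothing", "world")
extracted <- list(
  data.frame(matches = c("hello", "world"), patterns = c("hello", "world")),
  data.frame(matches = character(0), patterns = character(0)),
  data.frame(matches = "world", patterns = "world")
)

# Number of matches found in each document.
n_matches <- vapply(extracted, nrow, integer(1))

# Stack the per-document data frames into one.
flat <- do.call(rbind, extracted)

# Repeat each document once per match so rows line up with their source.
flat$doc <- rep(docs, times = n_matches)
flat <- flat[, c("doc", "matches", "patterns")]
```

Documents with no matches contribute zero rows, so they drop out of `flat`; `n_matches` still keeps one entry per document.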

Examples

if (
  requireNamespace("dplyr", quietly = TRUE) &&
    requireNamespace("tibble", quietly = TRUE) &&
    requireNamespace("tidyr", quietly = TRUE)
) {
  ac <- ac_build(c("hello", "world"))
  tibble::tibble(doc = c("hello world", "nothing", "world")) |>
    dplyr::mutate(extracted = ac_extract(ac, doc)) |>
    tidyr::unnest(extracted)
}
#> # A tibble: 3 × 3
#>   doc         matches patterns
#>   <chr>       <chr>   <chr>   
#> 1 hello world hello   hello   
#> 2 hello world world   world   
#> 3 world       world   world
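
The overlapping and na arguments can be exercised without the tidyverse. This is a minimal sketch, assuming the package that provides ac_build() and ac_extract() is attached and that match_kind is an argument of ac_build(), as the description of overlapping suggests. The patterns and documents are made up for illustration, and no output is shown because it has not been verified here.

```r
# Assumes the package providing ac_build() and ac_extract() is attached.
docs <- c("new york city", NA, "nothing")

# Overlapping extraction is documented as supported only when the automaton
# was built with match_kind = "standard" (assumed to be an ac_build() argument).
ac_std <- ac_build(c("new york", "york"), match_kind = "standard")
res_overlap <- ac_extract(ac_std, docs, overlapping = TRUE)

# na = "empty": the NA document is treated as having no matches.
res_empty <- ac_extract(ac_std, docs, na = "empty")

# na = "error": the call is documented to fail on NA documents.
# tryCatch() is used on the assumption that the failure is a regular R error.
res_error <- tryCatch(
  ac_extract(ac_std, docs, na = "error"),
  error = function(e) conditionMessage(e)
)

# Each result should have one element per document.
stopifnot(length(res_overlap) == length(docs))
stopifnot(length(res_empty) == length(docs))
```

Inspecting `res_overlap[[1]]` and `res_empty[[2]]` is a quick way to see how each option changes the per-document data frame.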