Remove selected words from a segmented character vector or from each element of a list of segmented character vectors.
Details
This is a modern reimplementation of jiebaR::filter_segment() with the
same core filtering behavior under the default settings.
In the reproducible benchmark, this version runs about 110x to 140x
faster than jiebaR::filter_segment() on the tested workloads.
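To make the filtering behavior concrete, here is a minimal base R sketch of the same idea. It is only an illustration, not the package's actual implementation: `sketch_filter()` is a hypothetical name, the default `keep_na = TRUE` is an assumption, and the treatment of `NA` is inferred from the `keep_na = FALSE` example on this page.

```r
# Hypothetical helper for illustration only; not part of this package or jiebaR.
sketch_filter <- function(x, words, keep_na = TRUE) {
  # Lists of segmented vectors are handled element by element.
  if (is.list(x)) {
    return(lapply(x, sketch_filter, words = words, keep_na = keep_na))
  }
  # Mark tokens that exactly match one of the words to remove.
  drop <- x %in% words
  # Assumption: keep_na = FALSE additionally drops missing tokens.
  if (!keep_na) {
    drop <- drop | is.na(x)
  }
  x[!drop]
}

sketch_filter(c("a", NA, "b", "a"), "b", keep_na = FALSE)
#> [1] "a" "a"
```

Exact matching with `%in%` is used here because it is simple and vectorized; the real function may differ in how it matches words and handles edge cases.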
Examples
filter_segment(c("abc", "def", " ", "."), c("abc"))
#> [1] "def" " " "."
filter_segment(c("a", NA, "b", "a"), c("b"), keep_na = FALSE)
#> [1] "a" "a"
input <- list(
  c("\u6211", "\u662f", "\u6d4b\u8bd5"),
  c("\u6d4b\u8bd5", "\u6587\u672c", "\u6211")
)
filter_segment(input, "\u6211")
#> [[1]]
#> [1] "是" "测试"
#>
#> [[2]]
#> [1] "测试" "文本"
#>