Convenience wrapper around segment() for multi-string input. When
batch is omitted, segment_batch() returns a list.
Usage
segment_batch(texts, jiebar, ..., batch = c("list", "data.frame", "flatten"))
Details
segment_batch() is a convenience wrapper around segment() for explicit
batch processing. It always treats texts as multi-string input. The
returned object depends on batch:
"list": one character vector per input string.
"data.frame": a data frame with doc_id and word columns.
"flatten": one concatenated character vector.
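How the three shapes relate can be sketched in base R without calling the package. The tokens below are hand-written stand-ins rather than real segment_batch() output, and the doc_id values (the position of each input string) are an assumption, since the page does not say what doc_id contains.

```r
# Hand-written tokens standing in for batch = "list" output:
# one character vector per input string (illustrative only).
as_list <- list(c("我", "爱", "北京"), c("天安门"))

# Assumed equivalent of batch = "flatten": all tokens in one vector.
as_flat <- unlist(as_list, use.names = FALSE)

# Assumed equivalent of batch = "data.frame": one row per token.
# Using the input position as doc_id is an assumption, not documented behavior.
as_df <- data.frame(
  doc_id = rep(seq_along(as_list), lengths(as_list)),
  word = as_flat,
  stringsAsFactors = FALSE
)
```

The list form keeps document boundaries, the data frame keeps them in a column, and the flattened form discards them.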
In the current release benchmarks on the bundled Fortress Besieged and
Dream of the Red Chamber texts, batch segmentation reaches about 7x to
12x speedup over the comparable jiebaR workflow on many-string inputs.
For very long texts, splitting into about 32 to 128 chunks before calling
segment_batch() is recommended for good throughput.
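One way to prepare a long text for this is sketched below in base R. chunk_paragraphs() is a hypothetical helper, not part of the package. It assumes the text contains newline-separated paragraphs and groups consecutive paragraphs into chunks, so no word is cut at a chunk boundary. The default of 64 chunks is simply a value inside the recommended range.

```r
# Hypothetical helper (not part of the package): group the paragraphs of one
# long string into at most `n_chunks` chunks of consecutive paragraphs.
chunk_paragraphs <- function(text, n_chunks = 64L) {
  stopifnot(is.character(text), length(text) == 1L, n_chunks >= 1L)

  # Split on runs of newlines and drop empty pieces.
  paragraphs <- strsplit(text, "\n+")[[1]]
  paragraphs <- paragraphs[nzchar(paragraphs)]
  if (length(paragraphs) == 0L) return(character(0))

  # Never ask for more chunks than there are paragraphs.
  n_chunks <- min(n_chunks, length(paragraphs))

  # Assign each paragraph a chunk index with near-equal paragraph counts.
  group <- ceiling(seq_along(paragraphs) * n_chunks / length(paragraphs))

  # Re-join the paragraphs of each chunk into one string.
  vapply(split(paragraphs, group), paste, character(1),
         collapse = "\n", USE.NAMES = FALSE)
}

long_text <- paste(sprintf("paragraph %d", 1:10), collapse = "\n")
chunks <- chunk_paragraphs(long_text, n_chunks = 4L)

# With a jiebar object set up as the package expects, the chunks could then
# be passed on, for example:
# words <- segment_batch(chunks, jiebar, batch = "flatten")
```

Chunks are balanced by paragraph count, not by character count, so texts with very uneven paragraphs may need a size-aware grouping instead.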