get_tuple() is kept only for compatibility with jiebaR. New code should use count_ngrams() instead.

Usage

get_tuple(x, size = 2, dataframe = TRUE)

Arguments

x

A character vector of tokens or a list of character vectors.

size

A single integer >= 2. For compatibility with jiebaR, this is an upper bound rather than an exact length: all contiguous n-grams of every length from 2 through size are counted.

dataframe

Whether to return a data frame. If FALSE, a named integer vector is returned.

Value

If dataframe = TRUE, a data frame with name and count columns, sorted by descending count. Otherwise, a named integer vector.

Details

This function is deprecated and should not be used in new code. It is provided only as a compatibility wrapper around count_ngrams() and replicates the behavior of jiebaR::get_tuple().

Prefer count_ngrams() because the original jiebaR::get_tuple() interface has several design problems:

  1. Its n-gram extraction behavior does not match the most obvious reading of the argument name: size = n counts all contiguous n-grams of every length from 2 to n, not just those of length exactly n.

  2. Its documentation says it accepts list input, but the original exported implementation does not reliably support lists.

  3. It concatenates tokens without a separator, which makes tuple boundaries ambiguous.
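
Problems 1 and 3 can be illustrated with a minimal base R sketch. The helper tuple_sketch() and its sep argument are hypothetical and exist only for this illustration. It is not the implementation of get_tuple() or count_ngrams(); it only mimics the semantics described above.

```r
# Hypothetical helper for illustration only; not part of any package.
# Counts all contiguous n-grams of lengths 2..size and joins tokens with `sep`.
tuple_sketch <- function(tokens, size = 2, sep = "") {
  stopifnot(is.character(tokens), length(size) == 1, size >= 2)
  grams <- character(0)
  for (n in 2:size) {
    # Stop once the window is longer than the input
    if (length(tokens) < n) break
    starts <- seq_len(length(tokens) - n + 1)
    grams <- c(grams, vapply(
      starts,
      function(i) paste(tokens[i:(i + n - 1)], collapse = sep),
      character(1)
    ))
  }
  counts <- table(grams)
  out <- stats::setNames(as.vector(counts), names(counts))
  sort(out, decreasing = TRUE)
}

# Problem 1: size = 3 yields the bigrams "ab" and "bc" as well as the trigram "abc".
tuple_sketch(c("a", "b", "c"), size = 3)

# Problem 3: without a separator, the pairs ("ab", "c") and ("a", "bc")
# both collapse to "abc", so "abc" is counted twice.
tuple_sketch(c("ab", "c", "a", "bc"), size = 2)

# With a separator (an assumption of this sketch), the two pairs stay distinct.
tuple_sketch(c("ab", "c", "a", "bc"), size = 2, sep = " ")
```

In the second call the result cannot tell which tokens formed "abc", which is the ambiguity described in item 3.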

See also

Examples

suppressWarnings(get_tuple(c("sd", "sd", "sd", "rd"), 2))
#>   name count
#> 1 sdsd     2
#> 2 sdrd     1