Generate an IDF (inverse document frequency) dictionary from a list of documents.

Usage

get_idf(x, stop_word = NULL, stop_word_file = NULL, path = NULL)

Arguments

x

A list of character vectors. Each vector represents one document and contains its already-segmented words.

stop_word

Optional character vector of stop words supplied directly.

stop_word_file

Optional file path containing one stop word per line.

path

Optional output file path. When NULL, a data frame is returned. Otherwise, the result is written to the file as word idf_value per line (the format expected by worker(type = "keywords", idf = ...)) and the path is returned invisibly.

Value

A data frame with name and count columns, where name is the word and count holds its IDF value, or the file path (invisibly) when path is supplied.

Details

The input is a list of character vectors; each vector holds the words of one document.

Stop words, whether given through stop_word or stop_word_file, are removed from the result.

If path is not NULL, the result is written to that file instead of being returned as a data frame.
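
The values in the example output on this page are consistent with the common definition idf(w) = log(N / df(w)), where N is the number of documents and df(w) is the number of documents that contain w. The base R sketch below reproduces those numbers under that assumption. idf_sketch is a hypothetical helper written for illustration, not part of the package, and the package's own implementation and row order may differ.

```r
# Hypothetical helper: assumes idf(w) = log(N / df(w)) with the natural log.
idf_sketch <- function(x, stop_word = NULL) {
  n_docs <- length(x)
  # Count each word at most once per document.
  doc_words <- unlist(lapply(x, unique))
  # Drop stop words before tallying.
  doc_words <- doc_words[!doc_words %in% stop_word]
  # Number of documents in which each remaining word appears.
  df <- table(doc_words)
  data.frame(
    name = names(df),
    count = log(n_docs / as.numeric(df)),
    stringsAsFactors = FALSE
  )
}

res <- idf_sketch(list(c("abc", "def"), c("abc", " ")))
```

Here "abc" appears in both documents, so its value is log(2 / 2) = 0, while "def" and " " each appear in one document and get log(2 / 1).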

Examples

get_idf(list(c("abc", "def"), c("abc", " ")))
#>   name     count
#> 1  abc 0.0000000
#> 2  def 0.6931472
#> 3      0.6931472
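
To illustrate the file layout described for the path argument (one word idf_value pair per line), the following base R sketch writes a small data frame in that shape and reads it back. It does not call get_idf; the single-space separator and the temporary file name are assumptions made for the illustration.

```r
# Illustration only: assumes a single space separates the word and its IDF value.
idf <- data.frame(name = c("abc", "def"), count = c(0, log(2)),
                  stringsAsFactors = FALSE)

out <- tempfile(fileext = ".txt")

# Write one "word idf_value" pair per line, with no header or row names.
write.table(idf, out, quote = FALSE, sep = " ",
            row.names = FALSE, col.names = FALSE)

lines_written <- readLines(out)

# Read the file back into the same two-column shape.
back <- read.table(out, sep = " ", col.names = c("name", "count"),
                   stringsAsFactors = FALSE)
```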