WordlistLoader

Loader for text files that represent a list of stopwords.

See also

to obtain Reader instances

Functions

fun getLines(stream: InputStream, charset: Charset): MutableList<String>

Reads the given stream using the given character encoding and returns the non-comment lines that contain data.
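As a rough illustration of the behavior described above, the following sketch reads data lines from a stream with plain Kotlin/JVM standard library calls. It is not the library's implementation: the function name `readDataLines`, the `#` comment marker, and the use of `java.io.InputStream` are assumptions made for this example only.

```kotlin
import java.io.ByteArrayInputStream
import java.io.InputStream
import java.nio.charset.Charset

// Hypothetical stand-in for getLines; the "#" default comment marker is an assumption.
fun readDataLines(stream: InputStream, charset: Charset, comment: String = "#"): MutableList<String> {
    val lines = mutableListOf<String>()
    stream.bufferedReader(charset).useLines { seq ->
        for (raw in seq) {
            if (raw.startsWith(comment)) continue // skip comment lines
            val line = raw.trim()
            if (line.isEmpty()) continue          // skip blank lines
            lines.add(line)
        }
    }
    return lines
}

fun main() {
    val text = "# header comment\nfoo\n\n  bar  \n"
    val stream = ByteArrayInputStream(text.toByteArray(Charsets.UTF_8))
    println(readDataLines(stream, Charsets.UTF_8)) // prints [foo, bar]
}
```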

Reads stopwords from a stopword list in Snowball format.
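For orientation, Snowball stopword lists use the vertical bar (`|`) as the comment character, allow trailing comments, and may place several whitespace-separated words on one line. The sketch below parses that format with the Kotlin/JVM standard library only. The name `parseSnowballWords` and the plain `Set<String>` return type are assumptions for this example; the documented function returns a CharArraySet.

```kotlin
import java.io.Reader
import java.io.StringReader

// Hypothetical helper, not the library's implementation.
fun parseSnowballWords(reader: Reader): Set<String> {
    val words = linkedSetOf<String>()
    reader.buffered().useLines { lines ->
        for (raw in lines) {
            // Drop everything from the first '|' onward (a full-line or trailing comment).
            val data = raw.substringBefore('|').trim()
            if (data.isEmpty()) continue
            // A line may hold several whitespace-separated words.
            words.addAll(data.split(Regex("\\s+")))
        }
    }
    return words
}

fun main() {
    val list = " | An English stop word list\ni | subject pronoun\nme my\n\n"
    println(parseSnowballWords(StringReader(list))) // prints [i, me, my]
}
```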

Reads a stem dictionary. Each line contains:

Reads lines from an InputStream using the UTF-8 charset and adds every line as an entry to a CharArraySet (omitting leading and trailing whitespace). Every line of the stream should contain only one word. The words need to be in lowercase if you use an Analyzer that applies LowerCaseFilter (like StandardAnalyzer).

Reads lines from a Reader and adds every line as an entry to a CharArraySet (omitting leading and trailing whitespace). Every line of the Reader should contain only one word. The words need to be in lowercase if you make use of an Analyzer which uses LowerCaseFilter (like StandardAnalyzer).
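The lowercase requirement can be illustrated with a small stand-alone sketch. It assumes a case-sensitive stopword set and uses `String.lowercase()` as a stand-in for LowerCaseFilter; the function `filterStopwords` is hypothetical and not part of the library.

```kotlin
// Hypothetical illustration: tokens are lowercased first, then matched
// case-sensitively against the stopword set.
fun filterStopwords(tokens: List<String>, stopwords: Set<String>): List<String> =
    tokens.map { it.lowercase() }       // stand-in for LowerCaseFilter
        .filterNot { it in stopwords }  // stand-in for stopword removal

fun main() {
    // "The" is not lowercase, so it never matches the lowercased token "the".
    val stopwords = setOf("The", "and")
    println(filterStopwords(listOf("The", "Cat", "AND", "dog"), stopwords)) // prints [the, cat, dog]
}
```

In this sketch the entry "and" removes the token "AND", while the entry "The" has no effect, which is why word list files should be written in lowercase for such analyzers.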

fun getWordSet(stream: InputStream, comment: String): CharArraySet

Reads lines from an InputStream using the UTF-8 charset and adds every non-comment line as an entry to a CharArraySet (omitting leading and trailing whitespace). Every line of the stream should contain only one word. The words need to be in lowercase if you use an Analyzer that applies LowerCaseFilter (like StandardAnalyzer).

fun getWordSet(stream: InputStream, charset: Charset): CharArraySet

Reads lines from an InputStream using the given charset and adds every line as an entry to a CharArraySet (omitting leading and trailing whitespace). Every line of the stream should contain only one word. The words need to be in lowercase if you use an Analyzer that applies LowerCaseFilter (like StandardAnalyzer).

fun getWordSet(reader: Reader, comment: String): CharArraySet

Reads lines from a Reader and adds every non-comment line as an entry to a CharArraySet (omitting leading and trailing whitespace). Every line of the Reader should contain only one word. The words need to be in lowercase if you make use of an Analyzer which uses LowerCaseFilter (like StandardAnalyzer).

Reads lines from a Reader and adds every non-blank line as an entry to a CharArraySet (omitting leading and trailing whitespace). Every line of the Reader should contain only one word. The words need to be in lowercase if you make use of an Analyzer which uses LowerCaseFilter (like StandardAnalyzer).

fun getWordSet(stream: InputStream, charset: Charset, comment: String): CharArraySet

Reads lines from an InputStream using the given charset and adds every non-comment line as an entry to a CharArraySet (omitting leading and trailing whitespace). Every line of the stream should contain only one word. The words need to be in lowercase if you use an Analyzer that applies LowerCaseFilter (like StandardAnalyzer).

fun getWordSet(reader: Reader, comment: String, result: CharArraySet): CharArraySet

Reads lines from a Reader and adds every non-blank non-comment line as an entry to a CharArraySet (omitting leading and trailing whitespace). Every line of the Reader should contain only one word. The words need to be in lowercase if you make use of an Analyzer which uses LowerCaseFilter (like StandardAnalyzer).
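The three-argument form described above adds words to a set supplied by the caller. A minimal sketch of that pattern follows. It uses a plain `MutableSet<String>` in place of CharArraySet and `java.io.Reader` as the reader type, and the name `addWords` is hypothetical. Trimming before the comment check is also an assumption of this sketch, not a statement about the library's order of operations.

```kotlin
import java.io.Reader
import java.io.StringReader

// Hypothetical stand-in: MutableSet<String> replaces CharArraySet here.
fun addWords(reader: Reader, comment: String, result: MutableSet<String>): MutableSet<String> {
    reader.buffered().useLines { lines ->
        for (raw in lines) {
            val word = raw.trim()                                     // omit surrounding whitespace
            if (word.isEmpty() || word.startsWith(comment)) continue // skip blank and comment lines
            result.add(word)
        }
    }
    return result // the same instance that was passed in
}

fun main() {
    val stopwords = linkedSetOf("the")
    addWords(StringReader("# custom additions\n and \n\nor\n"), "#", stopwords)
    println(stopwords) // prints [the, and, or]
}
```

Passing an existing set lets several word list files be merged into one collection without intermediate copies.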