WordBreakTestUnicode_12_1_0

This class was automatically generated by generateJavaUnicodeWordBreakTest.main.kts from: http://www.unicode.org/Public/12.1.0/ucd/auxiliary/WordBreakTest.txt

WordBreakTest.txt indicates the points in the provided character sequences at which conforming implementations must and must not break words. This class tests that the expected tokens are extracted from each of the test sequences in WordBreakTest.txt. An expected token is a character sequence that is bounded by word breaks and contains at least one character from one of the following character sets:

\p{Script = Han}                 (From http://www.unicode.org/Public/12.1.0/ucd/Scripts.txt)
\p{Script = Hiragana}
\p{LineBreak = Complex_Context}  (From http://www.unicode.org/Public/12.1.0/ucd/LineBreak.txt)
\p{WordBreak = ALetter}          (From http://www.unicode.org/Public/12.1.0/ucd/auxiliary/WordBreakProperty.txt)
\p{WordBreak = Hebrew_Letter}
\p{WordBreak = Katakana}
\p{WordBreak = Numeric}
\p{Extended_Pictographic}        (From http://www.unicode.org/Public/emoji/12.1/emoji-data.txt)
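
The token rule above can be sketched in a few lines of Kotlin. This is only an illustration, not the code used by the generator script or by this class: the helper name expectedTokens is hypothetical, it relies on JVM-only APIs (Character.isLetterOrDigit, StringBuilder.appendCodePoint), and Character.isLetterOrDigit is a rough stand-in for the character sets listed above (it does not cover Extended_Pictographic, for example). It assumes the WordBreakTest.txt line format, in which ÷ marks a break, × marks a non-break, code points are written in hexadecimal, and anything after # is a comment.

```kotlin
// Hypothetical helper: approximates which tokens a WordBreakTest.txt line should yield.
fun expectedTokens(testLine: String): List<String> {
    // Drop the trailing "# ..." comment, then split the rest on whitespace.
    val fields = testLine.substringBefore('#')
        .trim()
        .split(Regex("\\s+"))
        .filter { it.isNotEmpty() }

    val tokens = mutableListOf<String>()
    val segment = StringBuilder()
    var keep = false // true once the current segment contains a "word" character

    // Close the current segment; keep it only if it qualified as a token.
    fun flush() {
        if (keep) tokens.add(segment.toString())
        segment.setLength(0)
        keep = false
    }

    for (field in fields) {
        when (field) {
            "÷" -> flush()   // break allowed here: the segment ends
            "×" -> {}        // no break: keep accumulating
            else -> {
                val codePoint = field.toInt(16)
                segment.appendCodePoint(codePoint)
                // ASSUMPTION: letter-or-digit approximates the listed character sets.
                if (Character.isLetterOrDigit(codePoint)) keep = true
            }
        }
    }
    flush()
    return tokens
}
```

For a hand-written line in the same format, `÷ 0061 × 0027 × 0061 ÷ 0020 ÷ 0062 ÷`, the helper returns the two tokens a'a and b. The segment that contains only the space is dropped because it has no qualifying character.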

Functions

test

fun test(analyzer: Analyzer)