Companion

Types

enum UnicodeScript : Enum<Character.Companion.UnicodeScript>

A family of character subsets representing the character scripts defined in Unicode Standard Annex #24: Script Names. Every Unicode character is assigned to a single Unicode script: either a specific script, such as Latin, or one of three special values, Inherited, Common, or Unknown.

Properties

BYTES

const val BYTES: Int

The number of bytes used to represent a char value in unsigned binary form.

COMBINING_SPACING_MARK

const val COMBINING_SPACING_MARK: Byte = 8

General category "Mc" in the Unicode specification.

CONNECTOR_PUNCTUATION

const val CONNECTOR_PUNCTUATION: Byte = 23

General category "Pc" in the Unicode specification.

CONTROL

const val CONTROL: Byte = 15

General category "Cc" in the Unicode specification.

CURRENCY_SYMBOL

const val CURRENCY_SYMBOL: Byte = 26

General category "Sc" in the Unicode specification.

DASH_PUNCTUATION

const val DASH_PUNCTUATION: Byte = 20

General category "Pd" in the Unicode specification.

DECIMAL_DIGIT_NUMBER

const val DECIMAL_DIGIT_NUMBER: Byte = 9

General category "Nd" in the Unicode specification.

DIRECTIONALITY_ARABIC_NUMBER

const val DIRECTIONALITY_ARABIC_NUMBER: Byte = 6

Weak bidirectional character type "AN" in the Unicode specification.

DIRECTIONALITY_BOUNDARY_NEUTRAL

const val DIRECTIONALITY_BOUNDARY_NEUTRAL: Byte = 9

Weak bidirectional character type "BN" in the Unicode specification.

DIRECTIONALITY_COMMON_NUMBER_SEPARATOR

const val DIRECTIONALITY_COMMON_NUMBER_SEPARATOR: Byte = 7

Weak bidirectional character type "CS" in the Unicode specification.

DIRECTIONALITY_EUROPEAN_NUMBER

const val DIRECTIONALITY_EUROPEAN_NUMBER: Byte = 3

Weak bidirectional character type "EN" in the Unicode specification.

DIRECTIONALITY_EUROPEAN_NUMBER_SEPARATOR

const val DIRECTIONALITY_EUROPEAN_NUMBER_SEPARATOR: Byte = 4

Weak bidirectional character type "ES" in the Unicode specification.

DIRECTIONALITY_EUROPEAN_NUMBER_TERMINATOR

const val DIRECTIONALITY_EUROPEAN_NUMBER_TERMINATOR: Byte = 5

Weak bidirectional character type "ET" in the Unicode specification.

DIRECTIONALITY_FIRST_STRONG_ISOLATE

const val DIRECTIONALITY_FIRST_STRONG_ISOLATE: Byte = 21

Weak bidirectional character type "FSI" in the Unicode specification.

DIRECTIONALITY_LEFT_TO_RIGHT

const val DIRECTIONALITY_LEFT_TO_RIGHT: Byte = 0

Strong bidirectional character type "L" in the Unicode specification.

DIRECTIONALITY_LEFT_TO_RIGHT_EMBEDDING

const val DIRECTIONALITY_LEFT_TO_RIGHT_EMBEDDING: Byte = 14

Strong bidirectional character type "LRE" in the Unicode specification.

DIRECTIONALITY_LEFT_TO_RIGHT_ISOLATE

const val DIRECTIONALITY_LEFT_TO_RIGHT_ISOLATE: Byte = 19

Weak bidirectional character type "LRI" in the Unicode specification.

DIRECTIONALITY_LEFT_TO_RIGHT_OVERRIDE

const val DIRECTIONALITY_LEFT_TO_RIGHT_OVERRIDE: Byte = 15

Strong bidirectional character type "LRO" in the Unicode specification.

DIRECTIONALITY_NONSPACING_MARK

const val DIRECTIONALITY_NONSPACING_MARK: Byte = 8

Weak bidirectional character type "NSM" in the Unicode specification.

DIRECTIONALITY_OTHER_NEUTRALS

const val DIRECTIONALITY_OTHER_NEUTRALS: Byte = 13

Neutral bidirectional character type "ON" in the Unicode specification.

DIRECTIONALITY_PARAGRAPH_SEPARATOR

const val DIRECTIONALITY_PARAGRAPH_SEPARATOR: Byte = 10

Neutral bidirectional character type "B" in the Unicode specification.

DIRECTIONALITY_POP_DIRECTIONAL_FORMAT

const val DIRECTIONALITY_POP_DIRECTIONAL_FORMAT: Byte = 18

Weak bidirectional character type "PDF" in the Unicode specification.

DIRECTIONALITY_POP_DIRECTIONAL_ISOLATE

const val DIRECTIONALITY_POP_DIRECTIONAL_ISOLATE: Byte = 22

Weak bidirectional character type "PDI" in the Unicode specification.

DIRECTIONALITY_RIGHT_TO_LEFT

const val DIRECTIONALITY_RIGHT_TO_LEFT: Byte = 1

Strong bidirectional character type "R" in the Unicode specification.

DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC

const val DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC: Byte = 2

Strong bidirectional character type "AL" in the Unicode specification.

DIRECTIONALITY_RIGHT_TO_LEFT_EMBEDDING

const val DIRECTIONALITY_RIGHT_TO_LEFT_EMBEDDING: Byte = 16

Strong bidirectional character type "RLE" in the Unicode specification.

DIRECTIONALITY_RIGHT_TO_LEFT_ISOLATE

const val DIRECTIONALITY_RIGHT_TO_LEFT_ISOLATE: Byte = 20

Weak bidirectional character type "RLI" in the Unicode specification.

DIRECTIONALITY_RIGHT_TO_LEFT_OVERRIDE

const val DIRECTIONALITY_RIGHT_TO_LEFT_OVERRIDE: Byte = 17

Strong bidirectional character type "RLO" in the Unicode specification.

DIRECTIONALITY_SEGMENT_SEPARATOR

const val DIRECTIONALITY_SEGMENT_SEPARATOR: Byte = 11

Neutral bidirectional character type "S" in the Unicode specification.

DIRECTIONALITY_UNDEFINED

const val DIRECTIONALITY_UNDEFINED: Byte

Undefined bidirectional character type. Undefined char values have undefined directionality in the Unicode specification.

DIRECTIONALITY_WHITESPACE

const val DIRECTIONALITY_WHITESPACE: Byte = 12

Neutral bidirectional character type "WS" in the Unicode specification.

ENCLOSING_MARK

const val ENCLOSING_MARK: Byte = 7

General category "Me" in the Unicode specification.

END_PUNCTUATION

const val END_PUNCTUATION: Byte = 22

General category "Pe" in the Unicode specification.

ERROR

const val ERROR: Int

Error flag. Use int (code point) to avoid confusion with U+FFFF.

FINAL_QUOTE_PUNCTUATION

const val FINAL_QUOTE_PUNCTUATION: Byte = 30

General category "Pf" in the Unicode specification.

FORMAT

const val FORMAT: Byte = 16

General category "Cf" in the Unicode specification.

INITIAL_QUOTE_PUNCTUATION

const val INITIAL_QUOTE_PUNCTUATION: Byte = 29

General category "Pi" in the Unicode specification.

LETTER_NUMBER

const val LETTER_NUMBER: Byte = 10

General category "Nl" in the Unicode specification.

LINE_SEPARATOR

const val LINE_SEPARATOR: Byte = 13

General category "Zl" in the Unicode specification.

LOWERCASE_LETTER

const val LOWERCASE_LETTER: Byte = 2

General category "Ll" in the Unicode specification.

MATH_SYMBOL

const val MATH_SYMBOL: Byte = 25

General category "Sm" in the Unicode specification.

MAX_CODE_POINT

const val MAX_CODE_POINT: Int

MAX_HIGH_SURROGATE

const val MAX_HIGH_SURROGATE: Char = '\uDBFF'

The maximum value of a Unicode high-surrogate code unit in the UTF-16 encoding, constant '\uDBFF'. A high-surrogate is also known as a leading-surrogate.

MAX_LOW_SURROGATE

const val MAX_LOW_SURROGATE: Char = '\uDFFF'

The maximum value of a Unicode low-surrogate code unit in the UTF-16 encoding, constant '\uDFFF'. A low-surrogate is also known as a trailing-surrogate.

MAX_RADIX

const val MAX_RADIX: Int = 36

The maximum radix available for conversion to and from strings. The constant value of this field is the largest value permitted for the radix argument in radix-conversion methods such as the digit method, the forDigit method, and the toString method of class Integer.

MAX_SURROGATE

const val MAX_SURROGATE: Char

The maximum value of a Unicode surrogate code unit in the UTF-16 encoding, constant '\uDFFF'.

MAX_VALUE

const val MAX_VALUE: Char = '\uFFFF'

The constant value of this field is the largest value of type char, '\uFFFF'.

MIN_CODE_POINT

const val MIN_CODE_POINT: Int = 0

MIN_HIGH_SURROGATE

const val MIN_HIGH_SURROGATE: Char = '\uD800'

The minimum value of a Unicode high-surrogate code unit in the UTF-16 encoding, constant '\uD800'. A high-surrogate is also known as a leading-surrogate.

MIN_LOW_SURROGATE

const val MIN_LOW_SURROGATE: Char = '\uDC00'

The minimum value of a Unicode low-surrogate code unit in the UTF-16 encoding, constant '\uDC00'. A low-surrogate is also known as a trailing-surrogate.

MIN_RADIX

const val MIN_RADIX: Int = 2

The minimum radix available for conversion to and from strings. The constant value of this field is the smallest value permitted for the radix argument in radix-conversion methods such as the digit method, the forDigit method, and the toString method of class Integer.
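
As an illustration of how the radix bounds constrain conversion, here is a minimal sketch of a digit-to-character helper. The names MIN_RADIX_SKETCH, MAX_RADIX_SKETCH, and forDigitOrNull are hypothetical and are not part of this API; the bounds are assumed to match the MIN_RADIX and MAX_RADIX values documented here.

```kotlin
// Assumed to mirror the documented MIN_RADIX (2) and MAX_RADIX (36).
const val MIN_RADIX_SKETCH = 2
const val MAX_RADIX_SKETCH = 36

// Hypothetical helper: returns the character for `digit` in `radix`,
// or null when the radix or the digit is out of range.
fun forDigitOrNull(digit: Int, radix: Int): Char? {
    if (radix !in MIN_RADIX_SKETCH..MAX_RADIX_SKETCH) return null
    if (digit !in 0 until radix) return null
    // Digits 0-9 map to '0'..'9'; digits 10-35 map to 'a'..'z'.
    return if (digit < 10) '0' + digit else 'a' + (digit - 10)
}
```

The upper bound of 36 follows from the alphabet used: ten decimal digits plus twenty-six letters.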

MIN_SUPPLEMENTARY_CODE_POINT

const val MIN_SUPPLEMENTARY_CODE_POINT: Int = 65536

MIN_SURROGATE

const val MIN_SURROGATE: Char

The minimum value of a Unicode surrogate code unit in the UTF-16 encoding, constant '\uD800'.

MIN_VALUE

const val MIN_VALUE: Char = '\u0000'

The constant value of this field is the smallest value of type char, '\u0000'.

MODIFIER_LETTER

const val MODIFIER_LETTER: Byte = 4

General category "Lm" in the Unicode specification.

MODIFIER_SYMBOL

const val MODIFIER_SYMBOL: Byte = 27

General category "Sk" in the Unicode specification.

NON_SPACING_MARK

const val NON_SPACING_MARK: Byte = 6

General category "Mn" in the Unicode specification.

OTHER_LETTER

const val OTHER_LETTER: Byte = 5

General category "Lo" in the Unicode specification.

OTHER_NUMBER

const val OTHER_NUMBER: Byte = 11

General category "No" in the Unicode specification.

OTHER_PUNCTUATION

const val OTHER_PUNCTUATION: Byte = 24

General category "Po" in the Unicode specification.

OTHER_SYMBOL

const val OTHER_SYMBOL: Byte = 28

General category "So" in the Unicode specification.

PARAGRAPH_SEPARATOR

const val PARAGRAPH_SEPARATOR: Byte = 14

General category "Zp" in the Unicode specification.

PRIVATE_USE

const val PRIVATE_USE: Byte = 18

General category "Co" in the Unicode specification.

SIZE

const val SIZE: Int = 16

SPACE_SEPARATOR

const val SPACE_SEPARATOR: Byte = 12

General category "Zs" in the Unicode specification.

START_PUNCTUATION

const val START_PUNCTUATION: Byte = 21

General category "Ps" in the Unicode specification.

SURROGATE

const val SURROGATE: Byte = 19

General category "Cs" in the Unicode specification.

TITLECASE_LETTER

const val TITLECASE_LETTER: Byte = 3

General category "Lt" in the Unicode specification.

UNASSIGNED

const val UNASSIGNED: Byte = 0

General category "Cn" in the Unicode specification.

UPPERCASE_LETTER

const val UPPERCASE_LETTER: Byte = 1

General category "Lu" in the Unicode specification.

Functions

charCount

fun charCount(codePoint: Int): Int

codePointAt

fun codePointAt(seq: CharSequence, index: Int): Int

Returns the code point at the given index of the CharSequence. If the char value at the given index in the CharSequence is in the high-surrogate range, the following index is less than the length of the CharSequence, and the char value at the following index is in the low-surrogate range, then the supplementary code point corresponding to this surrogate pair is returned. Otherwise, the char value at the given index is returned.

fun codePointAt(a: CharArray, index: Int, limit: Int): Int

Returns the code point at the given index of the char array, where only array elements with index less than limit can be used. If the char value at the given index in the char array is in the high-surrogate range, the following index is less than the limit, and the char value at the following index is in the low-surrogate range, then the supplementary code point corresponding to this surrogate pair is returned. Otherwise, the char value at the given index is returned.
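
The surrogate-pair rule described above can be sketched with the Kotlin standard library alone. codePointAtSketch is a hypothetical stand-in written for illustration, not the implementation in this package.

```kotlin
// Hypothetical stand-in for codePointAt(seq, index).
fun codePointAtSketch(seq: CharSequence, index: Int): Int {
    val high = seq[index]
    if (high.isHighSurrogate() && index + 1 < seq.length) {
        val low = seq[index + 1]
        if (low.isLowSurrogate()) {
            // Each surrogate carries 10 bits of the offset above U+10000.
            return 0x10000 + ((high.code - 0xD800) shl 10) + (low.code - 0xDC00)
        }
    }
    // Not the start of a complete pair: return the char value itself.
    return high.code
}
```

Note that an unpaired surrogate is returned unchanged rather than rejected, which matches the "otherwise" clause of the description.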

codePointAtImpl

fun codePointAtImpl(a: CharArray, index: Int, limit: Int): Int

codePointCount

fun codePointCount(a: CharArray, offset: Int, count: Int): Int

Returns the number of Unicode code points in a subarray of the char array argument.
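
A minimal sketch of the counting logic, assuming that an unpaired surrogate inside the subarray counts as one code point. codePointCountSketch is a hypothetical name, and the use of require for bounds checking is a choice made for this example only.

```kotlin
// Hypothetical stand-in for codePointCount(a, offset, count).
fun codePointCountSketch(a: CharArray, offset: Int, count: Int): Int {
    require(offset >= 0 && count >= 0 && offset <= a.size - count) { "range out of bounds" }
    val end = offset + count
    var n = 0
    var i = offset
    while (i < end) {
        // A high surrogate followed, inside the range, by a low surrogate is one code point.
        if (a[i].isHighSurrogate() && i + 1 < end && a[i + 1].isLowSurrogate()) i += 2 else i += 1
        n++
    }
    return n
}
```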

compare

fun compare(x: Char, y: Char): Int

Compares two char values numerically. The result is negative if x < y, zero if x == y, and positive if x > y.
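
A numeric comparison of two char values can be sketched as a subtraction of their UTF-16 code unit values. compareSketch is a hypothetical name used only for this example.

```kotlin
// Hypothetical stand-in for compare(x, y): orders chars by code unit value.
// The difference cannot overflow because both operands are in 0..0xFFFF.
fun compareSketch(x: Char, y: Char): Int = x.code - y.code
```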

getNumericValue

fun getNumericValue(codePoint: Int): Int

Returns the numeric value of the specified character (Unicode code point).

getType

fun getType(codePoint: Int): Int

Returns a value indicating a character's general category.
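
To show how the general category constants on this page relate to Unicode category codes, here is a sketch built on the Kotlin standard library's Char.category. The names categoryValues and generalCategoryValueSketch are hypothetical, and unlike getType the helper accepts a Char, so it cannot classify supplementary code points.

```kotlin
// Values copied from the general category constants documented on this page,
// keyed by the two-letter Unicode category code.
val categoryValues: Map<String, Int> = mapOf(
    "Cn" to 0, "Lu" to 1, "Ll" to 2, "Lt" to 3, "Lm" to 4, "Lo" to 5,
    "Mn" to 6, "Me" to 7, "Mc" to 8, "Nd" to 9, "Nl" to 10, "No" to 11,
    "Zs" to 12, "Zl" to 13, "Zp" to 14, "Cc" to 15, "Cf" to 16,
    "Co" to 18, "Cs" to 19, "Pd" to 20, "Ps" to 21, "Pe" to 22,
    "Pc" to 23, "Po" to 24, "Sm" to 25, "Sc" to 26, "Sk" to 27,
    "So" to 28, "Pi" to 29, "Pf" to 30,
)

// Hypothetical helper: looks up the documented value for a single Char.
fun generalCategoryValueSketch(ch: Char): Int = categoryValues.getValue(ch.category.code)
```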

highSurrogate

fun highSurrogate(codePoint: Int): Char

Returns the leading surrogate (a high surrogate code unit) of the surrogate pair representing the specified supplementary character (Unicode code point) in the UTF-16 encoding. If the specified character is not a supplementary character, an unspecified char is returned.

isBmpCodePoint

fun isBmpCodePoint(codePoint: Int): Boolean

Determines whether the specified character (Unicode code point) is in the Basic Multilingual Plane (BMP). Such code points can be represented using a single char.

isDigit

fun isDigit(codePoint: Int): Boolean

Determines if the specified character (Unicode code point) is a digit.

isExtendedPictographic

fun isExtendedPictographic(codePoint: Int): Boolean

Determines if the specified character (Unicode code point) is an Extended Pictographic.

isHighSurrogate

fun isHighSurrogate(ch: Char): Boolean

Determines if the given char value is a Unicode high-surrogate code unit (also known as leading-surrogate code unit).

isLetter

fun isLetter(codePoint: Int): Boolean

Determines if the specified character (Unicode code point) is a letter.

isLowerCase

fun isLowerCase(codePoint: Int): Boolean

Determines if the specified character (Unicode code point) is a lowercase character.

isLowSurrogate

fun isLowSurrogate(ch: Char): Boolean

Determines if the given char value is a Unicode low-surrogate code unit (also known as trailing-surrogate code unit).
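
Both surrogate tests reduce to range checks against the surrogate bounds listed under Properties. The sketch below uses the documented literal values; the function names are hypothetical.

```kotlin
// High (leading) surrogates occupy MIN_HIGH_SURROGATE..MAX_HIGH_SURROGATE.
fun isHighSurrogateSketch(ch: Char): Boolean = ch in '\uD800'..'\uDBFF'

// Low (trailing) surrogates occupy MIN_LOW_SURROGATE..MAX_LOW_SURROGATE.
fun isLowSurrogateSketch(ch: Char): Boolean = ch in '\uDC00'..'\uDFFF'
```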

isSupplementaryCodePoint

fun isSupplementaryCodePoint(codePoint: Int): Boolean

Determines whether the specified character (Unicode code point) is in the supplementary character range.

isValidCodePoint

fun isValidCodePoint(codePoint: Int): Boolean

Determines whether the specified code point is a valid Unicode code point value.
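
The three code point range checks (BMP, supplementary, valid) can be sketched as follows. All names are hypothetical. The value of MAX_CODE_POINT is not shown on this page, so the sketch assumes U+10FFFF, the highest code point defined by Unicode.

```kotlin
// 65536, as documented for MIN_SUPPLEMENTARY_CODE_POINT.
const val MIN_SUPPLEMENTARY_SKETCH = 0x10000
// Assumption: U+10FFFF, the Unicode maximum.
const val MAX_CODE_POINT_SKETCH = 0x10FFFF

// BMP code points fit in 16 bits; negative inputs fail because ushr fills with zeros.
fun isBmpSketch(codePoint: Int): Boolean = codePoint ushr 16 == 0

fun isSupplementarySketch(codePoint: Int): Boolean =
    codePoint in MIN_SUPPLEMENTARY_SKETCH..MAX_CODE_POINT_SKETCH

fun isValidSketch(codePoint: Int): Boolean = codePoint in 0..MAX_CODE_POINT_SKETCH
```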

isWhitespace

fun isWhitespace(codePoint: Int): Boolean

Determines if the specified character (Unicode code point) is white space according to Java.

lowSurrogate

fun lowSurrogate(codePoint: Int): Char

Returns the trailing surrogate (a low surrogate code unit) of the surrogate pair representing the specified supplementary character (Unicode code point) in the UTF-16 encoding. If the specified character is not a supplementary character, an unspecified char is returned.
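
For reference, the standard UTF-16 arithmetic behind the leading and trailing surrogates can be sketched like this. The function names are hypothetical, and the results are only meaningful for supplementary code points, in line with the "unspecified char" caveat.

```kotlin
// Leading surrogate: the top 10 bits of (codePoint - 0x10000), offset from U+D800.
fun highSurrogateSketch(codePoint: Int): Char =
    (0xD800 + ((codePoint - 0x10000) ushr 10)).toChar()

// Trailing surrogate: the low 10 bits of the code point, offset from U+DC00.
fun lowSurrogateSketch(codePoint: Int): Char =
    (0xDC00 + (codePoint and 0x3FF)).toChar()
```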

offsetByCodePoints

fun offsetByCodePoints(a: CharArray, start: Int, count: Int, index: Int, codePointOffset: Int): Int

Returns the index within the given char subarray that is offset from the given index by codePointOffset code points.
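
One way to picture the offset computation is to step over one code point at a time, forward for a positive codePointOffset and backward for a negative one. The sketch below is a hypothetical stand-in, and its choice of exceptions is an assumption made for this example.

```kotlin
// Hypothetical stand-in for offsetByCodePoints(a, start, count, index, codePointOffset).
fun offsetByCodePointsSketch(a: CharArray, start: Int, count: Int, index: Int, codePointOffset: Int): Int {
    require(start >= 0 && count >= 0 && start <= a.size - count) { "subarray out of bounds" }
    val limit = start + count
    require(index in start..limit) { "index outside subarray" }
    var x = index
    var remaining = codePointOffset
    while (remaining > 0 && x < limit) {
        // Step forward over one code point (two chars for a surrogate pair).
        if (a[x++].isHighSurrogate() && x < limit && a[x].isLowSurrogate()) x++
        remaining--
    }
    while (remaining < 0 && x > start) {
        // Step backward over one code point.
        if (a[--x].isLowSurrogate() && x > start && a[x - 1].isHighSurrogate()) x--
        remaining++
    }
    // The subarray ran out before the requested number of code points was skipped.
    if (remaining != 0) throw IndexOutOfBoundsException("not enough code points in subarray")
    return x
}
```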

toChars

fun toChars(codePoint: Int, dst: CharArray, dstIndex: Int): Int

Converts the specified character (Unicode code point) to its UTF-16 representation. If the specified code point is a BMP (Basic Multilingual Plane or Plane 0) value, the same value is stored in dst[dstIndex], and 1 is returned. If the specified code point is a supplementary character, its surrogate values are stored in dst[dstIndex] (high-surrogate) and dst[dstIndex+1] (low-surrogate), and 2 is returned.
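
The return-value contract (1 for a BMP value, 2 for a supplementary character) is illustrated by the sketch below. toCharsSketch is a hypothetical name, and rejecting invalid code points with require is an assumption of this example rather than documented behavior.

```kotlin
// Hypothetical stand-in for toChars(codePoint, dst, dstIndex).
fun toCharsSketch(codePoint: Int, dst: CharArray, dstIndex: Int): Int {
    require(codePoint in 0..0x10FFFF) { "not a valid code point: $codePoint" }
    return if (codePoint < 0x10000) {
        // BMP value: stored as a single char.
        dst[dstIndex] = codePoint.toChar()
        1
    } else {
        // Supplementary character: split into a high and a low surrogate.
        val offset = codePoint - 0x10000
        dst[dstIndex] = (0xD800 + (offset ushr 10)).toChar()
        dst[dstIndex + 1] = (0xDC00 + (offset and 0x3FF)).toChar()
        2
    }
}
```

Because the return value is the number of chars written, a caller can advance its write position by that amount when encoding a sequence of code points.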

toCodePoint

fun toCodePoint(high: Char, low: Char): Int

Converts the specified surrogate pair to its supplementary code point value. This method does not validate the specified surrogate pair. The caller must validate it using isSurrogatePair if necessary.
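
Since the conversion itself performs no validation, a caller would typically check the pair first. The following sketch shows both steps with hypothetical names; isSurrogatePairSketch is assumed to behave like the isSurrogatePair check mentioned above.

```kotlin
// Hypothetical validation step: a high surrogate followed by a low surrogate.
fun isSurrogatePairSketch(high: Char, low: Char): Boolean =
    high.isHighSurrogate() && low.isLowSurrogate()

// Hypothetical stand-in for toCodePoint(high, low); performs no validation.
fun toCodePointSketch(high: Char, low: Char): Int =
    ((high.code - 0xD800) shl 10) + (low.code - 0xDC00) + 0x10000

// Combines both steps, returning null for anything that is not a valid pair.
fun toCodePointOrNull(high: Char, low: Char): Int? =
    if (isSurrogatePairSketch(high, low)) toCodePointSketch(high, low) else null
```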

toLowerCase

fun toLowerCase(codePoint: Int): Int

Converts the character (Unicode code point) argument to lowercase using case mapping information from the UnicodeData file.

toSurrogates

fun toSurrogates(codePoint: Int, dst: CharArray, index: Int)

toUpperCase

fun toUpperCase(codePoint: Int): Int

Converts the character (Unicode code point) argument to uppercase using case mapping information from the UnicodeData file.