UnicodeUtil
Utility class for encoding Java's UTF-16 char[] into a UTF-8 byte[] without always allocating a new byte[], as String.getBytes(StandardCharsets.UTF_8) does.
Types
Holds a code point along with the number of bytes required to represent it in UTF-8.
Properties
Maximum number of UTF8 bytes per UTF16 character.
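The bound follows from the encoding rules: a single UTF-16 char outside the surrogate range needs at most 3 UTF-8 bytes, while a supplementary code point needs 4 bytes but occupies 2 chars (2 bytes per char). A minimal sketch that checks this with the JDK encoder; the class name, the helper, and the constant value 3 are assumptions for illustration, not the class's own members:

```java
import java.nio.charset.StandardCharsets;

final class MaxBytesPerCharSketch {
    // Assumed value of the bound described above (hypothetical constant).
    static final int MAX_UTF8_BYTES_PER_CHAR = 3;

    /** Number of UTF-8 bytes the JDK produces for one valid code point. */
    static int utf8Bytes(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        int[] samples = {0x41, 0xE9, 0x20AC, 0x1F600};
        for (int cp : samples) {
            int chars = Character.charCount(cp); // 1 for BMP, 2 for supplementary
            int bytes = utf8Bytes(cp);
            // bytes never exceeds MAX_UTF8_BYTES_PER_CHAR * chars
            System.out.println(Integer.toHexString(cp) + ": " + chars + " char(s), " + bytes + " byte(s)");
        }
    }
}
```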
Functions
Calculates the number of UTF8 bytes necessary to write a UTF16 string.
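As an illustration of how such a length can be computed without producing any bytes, here is a minimal sketch. The class and method names are hypothetical, and counting an unpaired surrogate as 3 bytes (the size of U+FFFD) is an assumption about the intended behavior:

```java
final class Utf8LengthSketch {
    /** Counts the UTF-8 bytes needed for s without encoding it. */
    static int utf8Length(CharSequence s) {
        int bytes = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                bytes += 1; // ASCII
            } else if (c < 0x800) {
                bytes += 2;
            } else if (Character.isHighSurrogate(c)
                    && i + 1 < s.length()
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                bytes += 4; // valid surrogate pair: one supplementary code point
                i++;        // skip the low surrogate
            } else {
                // Remaining BMP chars need 3 bytes. Assumption: an unpaired
                // surrogate is also counted as 3 bytes (replacement U+FFFD).
                bytes += 3;
            }
        }
        return bytes;
    }
}
```

For valid UTF-16 input the result should match the length of the array returned by String.getBytes(StandardCharsets.UTF_8).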
Computes the code point, and its length in bytes, at the specified offset in the provided UTF-8 byte array. As with other related methods in this class, this assumes valid UTF-8 input and does not perform full UTF-8 validation. Passing invalid UTF-8, or a position that is not a valid header byte position, may result in undefined behavior. This makes no attempt to synchronize or validate.
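A minimal sketch of decoding driven only by the header byte, under the same "valid input" assumption. The Decoded holder and the decode method are hypothetical names used for illustration and are not this class's API:

```java
final class CodePointAtSketch {
    /** Hypothetical holder: a code point plus its UTF-8 length in bytes. */
    static final class Decoded {
        final int codePoint;
        final int numBytes;

        Decoded(int codePoint, int numBytes) {
            this.codePoint = codePoint;
            this.numBytes = numBytes;
        }
    }

    /** Decodes the code point whose header byte is at utf8[pos]; no validation. */
    static Decoded decode(byte[] utf8, int pos) {
        int lead = utf8[pos] & 0xFF;
        if (lead < 0x80) { // 0xxxxxxx
            return new Decoded(lead, 1);
        }
        if (lead < 0xE0) { // 110xxxxx 10xxxxxx
            return new Decoded(((lead & 0x1F) << 6) | (utf8[pos + 1] & 0x3F), 2);
        }
        if (lead < 0xF0) { // 1110xxxx 10xxxxxx 10xxxxxx
            return new Decoded(((lead & 0x0F) << 12)
                    | ((utf8[pos + 1] & 0x3F) << 6)
                    | (utf8[pos + 2] & 0x3F), 3);
        }
        // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        return new Decoded(((lead & 0x07) << 18)
                | ((utf8[pos + 1] & 0x3F) << 12)
                | ((utf8[pos + 2] & 0x3F) << 6)
                | (utf8[pos + 3] & 0x3F), 4);
    }
}
```

Because the continuation bytes are masked but never checked, calling this on a non-header position or on malformed input silently yields a meaningless value, which is the kind of behavior the description warns about.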
Returns the number of code points in this UTF8 sequence.
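One simple way to count code points in valid UTF-8 is to count every byte that is not a continuation byte (10xxxxxx). The sketch below uses that technique. It is an illustration with hypothetical names, not necessarily how this class implements the count, and it does not detect malformed input:

```java
final class CodePointCountSketch {
    /** Counts code points in utf8[offset, offset + length), assuming valid UTF-8. */
    static int codePointCount(byte[] utf8, int offset, int length) {
        int count = 0;
        for (int i = offset; i < offset + length; i++) {
            // Every code point has exactly one byte that is not of the form 10xxxxxx.
            if ((utf8[i] & 0xC0) != 0x80) {
                count++;
            }
        }
        return count;
    }
}
```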
Returns the maximum number of UTF-8 bytes required to encode UTF-16 input (e.g., a Java char[] or String).
Encode characters from a char[] source, starting at offset for length chars. It is the responsibility of the caller to make sure that the destination array is large enough.
Encode characters from this String, starting at offset for length characters. It is the responsibility of the caller to make sure that the destination array is large enough.
Encode characters from this String, starting at offset for length characters. Output to the destination array will begin at outOffset. It is the responsibility of the caller to make sure that the destination array is large enough.
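The caller-sized buffer pattern described by the encode methods can be sketched as follows. This is an illustration under stated assumptions: the class and method names are hypothetical, the return value (the index just past the last byte written) is a choice made for the sketch, and replacing an unpaired surrogate with U+FFFD is assumed rather than confirmed by the descriptions above.

```java
final class EncodeIntoBufferSketch {
    /**
     * Encodes s[offset, offset + length) as UTF-8 into out, starting at outOffset.
     * Returns the index just past the last byte written. The caller must size out.
     */
    static int encode(CharSequence s, int offset, int length, byte[] out, int outOffset) {
        int upto = outOffset;
        int end = offset + length;
        for (int i = offset; i < end; i++) {
            int c = s.charAt(i);
            if (c < 0x80) {
                out[upto++] = (byte) c;
            } else if (c < 0x800) {
                out[upto++] = (byte) (0xC0 | (c >> 6));
                out[upto++] = (byte) (0x80 | (c & 0x3F));
            } else if (c < 0xD800 || c > 0xDFFF) {
                out[upto++] = (byte) (0xE0 | (c >> 12));
                out[upto++] = (byte) (0x80 | ((c >> 6) & 0x3F));
                out[upto++] = (byte) (0x80 | (c & 0x3F));
            } else if (c < 0xDC00 && i + 1 < end
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                // Valid surrogate pair: combine into one code point, emit 4 bytes.
                int cp = Character.toCodePoint((char) c, s.charAt(++i));
                out[upto++] = (byte) (0xF0 | (cp >> 18));
                out[upto++] = (byte) (0x80 | ((cp >> 12) & 0x3F));
                out[upto++] = (byte) (0x80 | ((cp >> 6) & 0x3F));
                out[upto++] = (byte) (0x80 | (cp & 0x3F));
            } else {
                // Assumption: an unpaired surrogate becomes U+FFFD (EF BF BD).
                out[upto++] = (byte) 0xEF;
                out[upto++] = (byte) 0xBF;
                out[upto++] = (byte) 0xBD;
            }
        }
        return upto;
    }
}
```

A caller that wants to reuse one buffer across many strings can size it once for the worst case, outOffset + 3 * length bytes, and then read only the range between outOffset and the returned index.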
Utility method for UTF8toUTF16.
Interprets the given byte array as UTF-8 and converts to UTF-16. It is the responsibility of the caller to make sure that the destination array is large enough.
This method assumes valid UTF-8 input and does not perform full UTF-8 validation: it checks only the first byte of each code point (for multi-byte sequences, any bytes after the head are skipped). It is the responsibility of the caller to make sure that the destination array is large enough.
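A minimal sketch of a conversion that trusts the header byte, assuming valid input. The names are hypothetical and the code is not taken from this class. It also shows one safe way to size the destination: a char[] as long as the UTF-8 input always suffices, because every code point takes at least as many bytes as chars.

```java
final class Utf8ToUtf16Sketch {
    /**
     * Converts utf8[offset, offset + length) to UTF-16 in out and returns the
     * number of chars written. Only the header byte selects the sequence
     * length; continuation bytes are masked but never checked.
     */
    static int decode(byte[] utf8, int offset, int length, char[] out) {
        int outUpto = 0;
        int i = offset;
        int limit = offset + length;
        while (i < limit) {
            int b = utf8[i++] & 0xFF;
            if (b < 0xC0) {
                // ASCII in valid input (a stray continuation byte would pass unchecked).
                out[outUpto++] = (char) b;
            } else if (b < 0xE0) {
                out[outUpto++] = (char) (((b & 0x1F) << 6) | (utf8[i++] & 0x3F));
            } else if (b < 0xF0) {
                out[outUpto++] = (char) (((b & 0x0F) << 12)
                        | ((utf8[i] & 0x3F) << 6)
                        | (utf8[i + 1] & 0x3F));
                i += 2;
            } else {
                int cp = ((b & 0x07) << 18)
                        | ((utf8[i] & 0x3F) << 12)
                        | ((utf8[i + 1] & 0x3F) << 6)
                        | (utf8[i + 2] & 0x3F);
                i += 3;
                // Writes a surrogate pair for supplementary code points.
                outUpto += Character.toChars(cp, out, outUpto);
            }
        }
        return outUpto;
    }
}
```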