abstract static class Utf8.Processor
extends java.lang.Object
Constructor and Description |
---|
Processor() |
Modifier and Type | Method and Description |
---|---|
(package private) abstract int | encodeUtf8(java.lang.CharSequence in, byte[] out, int offset, int length) - Encodes an input character sequence (in) to UTF-8 in the target array (out). |
(package private) void | encodeUtf8(java.lang.CharSequence in, java.nio.ByteBuffer out) - Encodes an input character sequence (in) to UTF-8 in the target buffer (out). |
(package private) void | encodeUtf8Default(java.lang.CharSequence in, java.nio.ByteBuffer out) - Encodes the input character sequence to a ByteBuffer instance using the ByteBuffer API, rather than potentially faster approaches. |
(package private) abstract void | encodeUtf8Direct(java.lang.CharSequence in, java.nio.ByteBuffer out) - Encodes the input character sequence to a direct ByteBuffer instance. |
(package private) boolean | isValidUtf8(byte[] bytes, int index, int limit) - Returns true if the given byte array slice is a well-formed UTF-8 byte sequence. |
(package private) boolean | isValidUtf8(java.nio.ByteBuffer buffer, int index, int limit) - Returns true if the given portion of the ByteBuffer is a well-formed UTF-8 byte sequence. |
private static int | partialIsValidUtf8(java.nio.ByteBuffer buffer, int index, int limit) - Performs validation for ByteBuffer instances using the ByteBuffer API rather than potentially faster approaches. |
(package private) abstract int | partialIsValidUtf8(int state, byte[] bytes, int index, int limit) - Tells whether the given byte array slice is a well-formed, malformed, or incomplete UTF-8 byte sequence. |
(package private) int | partialIsValidUtf8(int state, java.nio.ByteBuffer buffer, int index, int limit) - Indicates whether or not the given buffer contains a valid UTF-8 string. |
(package private) int | partialIsValidUtf8Default(int state, java.nio.ByteBuffer buffer, int index, int limit) - Performs validation for ByteBuffer instances using the ByteBuffer API rather than potentially faster approaches. |
(package private) abstract int | partialIsValidUtf8Direct(int state, java.nio.ByteBuffer buffer, int index, int limit) - Performs validation for direct ByteBuffer instances. |

final boolean isValidUtf8(byte[] bytes, int index, int limit)
Returns true if the given byte array slice is a well-formed UTF-8 byte sequence. The range of bytes to be checked extends from index index, inclusive, to limit, exclusive.
This is a convenience method, equivalent to partialIsValidUtf8(Utf8.COMPLETE, bytes, index, limit) == Utf8.COMPLETE.
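
The slice semantics above (index inclusive, limit exclusive) can be illustrated with a minimal sketch built only on the JDK. Utf8.Processor is package private, so this does not call it; the class name SliceValidation and the use of a strict CharsetDecoder are assumptions made for illustration, not the library's implementation.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class SliceValidation {
    // Returns true if bytes[index, limit) is well-formed UTF-8.
    // A strict JDK decoder stands in for the processor here.
    static boolean isValidUtf8(byte[] bytes, int index, int limit) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes, index, limit - index));
            return true;
        } catch (CharacterCodingException e) {
            // Malformed or truncated input is reported as an exception.
            return false;
        }
    }

    public static void main(String[] args) {
        // 'x' followed by U+00E9, which encodes as the two bytes C3 A9.
        byte[] data = "x\u00e9".getBytes(StandardCharsets.UTF_8);
        System.out.println(isValidUtf8(data, 0, 3)); // whole slice
        System.out.println(isValidUtf8(data, 0, 2)); // cut inside U+00E9
        System.out.println(isValidUtf8(data, 2, 3)); // lone continuation byte
    }
}
```

Only the first call sees a complete sequence; the other two slices split the two-byte character and are rejected.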

abstract int partialIsValidUtf8(int state, byte[] bytes, int index, int limit)
Tells whether the given byte array slice is a well-formed, malformed, or incomplete UTF-8 byte sequence. The range of bytes to be checked extends from index index, inclusive, to limit, exclusive.
Parameters:
state - either Utf8.COMPLETE (if this is the initial decoding operation) or the value returned from a call to a partial decoding method for the previous bytes
Returns:
Utf8.MALFORMED if the partial byte sequence is definitely not well-formed; Utf8.COMPLETE if it is well-formed (no additional input needed); or, if the byte sequence is "incomplete", i.e. apparently terminated in the middle of a character, an opaque integer "state" value containing enough information to decode the character when passed to a subsequent invocation of a partial decoding method.

final boolean isValidUtf8(java.nio.ByteBuffer buffer, int index, int limit)
Returns true if the given portion of the ByteBuffer is a well-formed UTF-8 byte sequence. The range of bytes to be checked extends from index index, inclusive, to limit, exclusive.
This is a convenience method, equivalent to partialIsValidUtf8(Utf8.COMPLETE, buffer, index, limit) == Utf8.COMPLETE.
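
The state contract of the partial validation methods (COMPLETE, MALFORMED, or an opaque value for an unfinished character) can be sketched as follows. Everything here is hypothetical: the class name ChunkedValidation, the sentinel values 0 and -1, and the state layout (the pending bytes packed into the low 24 bits) are assumptions chosen for the example, and a strict JDK decoder does the actual checking.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class ChunkedValidation {
    // Hypothetical sentinel values, used by this sketch only.
    static final int COMPLETE = 0;
    static final int MALFORMED = -1;

    static int partialIsValidUtf8(int state, byte[] bytes, int index, int limit) {
        if (state == MALFORMED) {
            return MALFORMED; // a malformed prefix can never become valid
        }
        // Unpack the bytes of the unfinished character left by the previous call.
        byte[] pending = new byte[3];
        int pendingCount = 0;
        while (state != COMPLETE && pendingCount < 3) {
            pending[pendingCount++] = (byte) state;
            state >>>= 8;
        }
        // Re-validate the pending bytes together with the new chunk.
        ByteBuffer in = ByteBuffer.allocate(pendingCount + (limit - index));
        in.put(pending, 0, pendingCount).put(bytes, index, limit - index);
        in.flip();
        CharBuffer out = CharBuffer.allocate(in.remaining());
        CoderResult result = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT)
            .decode(in, out, false); // false: more input may still follow
        if (result.isError()) {
            return MALFORMED;
        }
        // Pack a trailing unfinished character (at most 3 bytes) into the next state.
        int next = COMPLETE;
        for (int i = in.limit() - 1; i >= in.position(); i--) {
            next = (next << 8) | (in.get(i) & 0xFF);
        }
        return next;
    }

    public static void main(String[] args) {
        byte[] euro = "\u20ac".getBytes(StandardCharsets.UTF_8); // E2 82 AC
        int state = partialIsValidUtf8(COMPLETE, euro, 0, 2);    // stops mid-character
        System.out.println(state != COMPLETE && state != MALFORMED);
        state = partialIsValidUtf8(state, euro, 2, 3);           // supplies the last byte
        System.out.println(state == COMPLETE);
    }
}
```

A caller validating a stream would thread the returned value into the next call and treat the input as valid only if the final result is COMPLETE.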

final int partialIsValidUtf8(int state, java.nio.ByteBuffer buffer, int index, int limit)
Indicates whether or not the given buffer contains a valid UTF-8 string.
Parameters:
buffer - the buffer to check.
Returns:
true if the given buffer contains a valid UTF-8 string.

abstract int partialIsValidUtf8Direct(int state, java.nio.ByteBuffer buffer, int index, int limit)
Performs validation for direct ByteBuffer instances.

final int partialIsValidUtf8Default(int state, java.nio.ByteBuffer buffer, int index, int limit)
Performs validation for ByteBuffer instances using the ByteBuffer API rather than potentially faster approaches. This first completes validation for the current character (provided by state) and then finishes validation for the sequence.

private static int partialIsValidUtf8(java.nio.ByteBuffer buffer, int index, int limit)
Performs validation for ByteBuffer instances using the ByteBuffer API rather than potentially faster approaches.

abstract int encodeUtf8(java.lang.CharSequence in, byte[] out, int offset, int length)
Encodes an input character sequence (in) to UTF-8 in the target array (out).
For a string, this method is similar to

    byte[] a = string.getBytes(UTF_8);
    System.arraycopy(a, 0, out, offset, a.length);
    return offset + a.length;

but is more efficient in both time and space. One key difference is that this method requires paired surrogates, and therefore does not support chunking. While String.getBytes(UTF_8) replaces unpaired surrogates with the default replacement character, this method throws Utf8.UnpairedSurrogateException.
To ensure sufficient space in the output buffer, either call Utf8.encodedLength(java.lang.CharSequence) to compute the exact amount needed, or leave room for Utf8.MAX_BYTES_PER_CHAR * in.length(), which is the largest possible number of bytes that any input can be encoded to.
Parameters:
in - the input character sequence to be encoded
out - the target array
offset - the starting offset in out to start writing at
length - the length of out, starting from offset
Returns:
offset + Utf8.encodedLength(in)
Throws:
Utf8.UnpairedSurrogateException - if in contains ill-formed UTF-16 (unpaired surrogates)
java.lang.ArrayIndexOutOfBoundsException - if in encoded in UTF-8 is longer than out.length - offset
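
The contract above (write at offset, return the new offset, reject unpaired surrogates instead of replacing them) can be approximated with the JDK alone. This is a sketch under stated assumptions, not the processor's implementation: the class name StrictArrayEncode is hypothetical, and IllegalArgumentException stands in for Utf8.UnpairedSurrogateException, which is not reproduced here.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictArrayEncode {
    // Encodes `in` into out[offset, offset + length) and returns the new offset.
    static int encodeUtf8(CharSequence in, byte[] out, int offset, int length) {
        ByteBuffer target = ByteBuffer.wrap(out, offset, length);
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder()
            .onMalformedInput(CodingErrorAction.REPORT)       // report, do not replace
            .onUnmappableCharacter(CodingErrorAction.REPORT);
        CoderResult result = encoder.encode(CharBuffer.wrap(in), target, true);
        if (result.isOverflow()) {
            throw new ArrayIndexOutOfBoundsException("not enough room in out");
        }
        if (result.isError()) {
            // Stand-in for Utf8.UnpairedSurrogateException (assumption of this sketch).
            throw new IllegalArgumentException("unpaired surrogate in input");
        }
        // For a buffer created by wrap, position is an index into the array itself.
        return target.position();
    }

    public static void main(String[] args) {
        byte[] out = new byte[10];
        int end = encodeUtf8("h\u00e9", out, 2, 8); // 'h' is 1 byte, U+00E9 is 2 bytes
        System.out.println(end);                    // new offset: 2 + 3
        // String.getBytes silently replaces a lone surrogate instead of failing.
        String lone = String.valueOf((char) 0xD800);
        System.out.println(lone.getBytes(StandardCharsets.UTF_8).length);
    }
}
```

The strict variant fails fast on input that String.getBytes(UTF_8) would silently rewrite, which is the key difference the description calls out.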

final void encodeUtf8(java.lang.CharSequence in, java.nio.ByteBuffer out)
Encodes an input character sequence (in) to UTF-8 in the target buffer (out). Upon returning from this method, the out position will point to the position after the last encoded byte. This method requires paired surrogates, and therefore does not support chunking.
To ensure sufficient space in the output buffer, either call Utf8.encodedLength(java.lang.CharSequence) to compute the exact amount needed, or leave room for Utf8.MAX_BYTES_PER_CHAR * in.length(), which is the largest possible number of bytes that any input can be encoded to.
Parameters:
in - the source character sequence to be encoded
out - the target buffer
Throws:
Utf8.UnpairedSurrogateException - if in contains ill-formed UTF-16 (unpaired surrogates)
java.lang.ArrayIndexOutOfBoundsException - if in encoded in UTF-8 is longer than out.remaining()
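
The two sizing strategies described above, an exact count versus a worst-case bound, can be compared with a short sketch. The class name BufferSizing and its encodedLength helper are hypothetical stand-ins, not the library's code; the constant 3 is an assumption that reflects the fact that a single UTF-16 char never needs more than three UTF-8 bytes (a surrogate pair is two chars and four bytes).

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class BufferSizing {
    // Assumption: mirrors the role of Utf8.MAX_BYTES_PER_CHAR.
    static final int MAX_BYTES_PER_CHAR = 3;

    // Exact number of UTF-8 bytes needed for a well-formed UTF-16 sequence.
    static int encodedLength(CharSequence in) {
        int bytes = 0;
        for (int i = 0; i < in.length(); i++) {
            char c = in.charAt(i);
            if (c < 0x80) {
                bytes += 1;
            } else if (c < 0x800) {
                bytes += 2;
            } else if (!Character.isSurrogate(c)) {
                bytes += 3;
            } else if (Character.isHighSurrogate(c) && i + 1 < in.length()
                    && Character.isLowSurrogate(in.charAt(i + 1))) {
                bytes += 4; // one supplementary code point, two chars
                i++;
            } else {
                throw new IllegalArgumentException("unpaired surrogate at index " + i);
            }
        }
        return bytes;
    }

    public static void main(String[] args) {
        String s = "a\u00e9\u20ac\uD83D\uDE00"; // 1 + 2 + 3 + 4 bytes, 5 chars
        int exact = encodedLength(s);
        int worstCase = MAX_BYTES_PER_CHAR * s.length();
        ByteBuffer out = ByteBuffer.allocate(exact);
        out.put(s.getBytes(StandardCharsets.UTF_8)); // stand-in for the encoder
        // The position now sits just past the last encoded byte.
        System.out.println(exact + " <= " + worstCase
            + ", position=" + out.position() + ", remaining=" + out.remaining());
    }
}
```

The exact count costs an extra pass over the input; the worst-case bound avoids that pass at the price of a larger buffer.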

abstract void encodeUtf8Direct(java.lang.CharSequence in, java.nio.ByteBuffer out)
Encodes the input character sequence to a direct ByteBuffer instance.

final void encodeUtf8Default(java.lang.CharSequence in, java.nio.ByteBuffer out)
Encodes the input character sequence to a ByteBuffer instance using the ByteBuffer API, rather than potentially faster approaches.
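
To make "using the ByteBuffer API" concrete, the following sketch encodes through relative put calls only, so it works for heap and direct buffers alike. It is an illustration under assumptions rather than the library's code: the class name DefaultBufferEncode is hypothetical, IllegalArgumentException stands in for Utf8.UnpairedSurrogateException, and running out of room surfaces as the JDK's BufferOverflowException rather than the ArrayIndexOutOfBoundsException documented above.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class DefaultBufferEncode {
    // Writes UTF-8 one byte at a time through ByteBuffer.put, advancing out's position.
    static void encodeUtf8Default(CharSequence in, ByteBuffer out) {
        for (int i = 0; i < in.length(); i++) {
            char c = in.charAt(i);
            if (c < 0x80) {
                out.put((byte) c);
            } else if (c < 0x800) {
                out.put((byte) (0xC0 | (c >>> 6)));
                out.put((byte) (0x80 | (c & 0x3F)));
            } else if (!Character.isSurrogate(c)) {
                out.put((byte) (0xE0 | (c >>> 12)));
                out.put((byte) (0x80 | ((c >>> 6) & 0x3F)));
                out.put((byte) (0x80 | (c & 0x3F)));
            } else {
                // A high surrogate must be followed immediately by a low surrogate.
                if (!Character.isHighSurrogate(c) || i + 1 == in.length()
                        || !Character.isLowSurrogate(in.charAt(i + 1))) {
                    throw new IllegalArgumentException("unpaired surrogate at index " + i);
                }
                int cp = Character.toCodePoint(c, in.charAt(++i));
                out.put((byte) (0xF0 | (cp >>> 18)));
                out.put((byte) (0x80 | ((cp >>> 12) & 0x3F)));
                out.put((byte) (0x80 | ((cp >>> 6) & 0x3F)));
                out.put((byte) (0x80 | (cp & 0x3F)));
            }
        }
    }

    public static void main(String[] args) {
        String s = "a\u00e9\u20ac\uD83D\uDE00";
        ByteBuffer direct = ByteBuffer.allocateDirect(16); // has no accessible backing array
        encodeUtf8Default(s, direct);
        direct.flip();
        byte[] written = new byte[direct.remaining()];
        direct.get(written);
        System.out.println(Arrays.equals(written, s.getBytes(StandardCharsets.UTF_8)));
    }
}
```

Going through put on every byte is the portable path; a specialised path for direct buffers would presumably avoid that per-byte overhead, which is the trade-off the two method descriptions hint at.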