Copyright © 2010 Mochi Media, Inc.
Authors: Bob Ippolito (bob@mochimedia.com).
unichar() = unichar_low() | unichar_high()
unichar_high() = 57344..1114111
unichar_low() = 0..55295
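The two ranges leave a gap at 55296..57343 (16#D800..16#DFFF), the UTF-16 surrogate block, which is not a valid Unicode scalar value. The following is a minimal sketch of a membership check for unichar(). The module and function names are hypothetical and not part of this module's API.

```erlang
-module(unichar_sketch).
-export([is_unichar/1]).

%% Hypothetical helper mirroring the unichar() type:
%% unichar_low()  = 0..16#D7FF      (0..55295)
%% unichar_high() = 16#E000..16#10FFFF (57344..1114111)
%% Anything in the surrogate gap 16#D800..16#DFFF is rejected.
is_unichar(C) when is_integer(C), C >= 0, C =< 16#D7FF -> true;
is_unichar(C) when is_integer(C), C >= 16#E000, C =< 16#10FFFF -> true;
is_unichar(_) -> false.
```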
| Function | Description |
|---|---|
| bytes_foldl/3 | |
| bytes_to_codepoints/1 | |
| codepoint_foldl/3 | |
| codepoint_to_bytes/1 | Convert a Unicode codepoint to UTF-8 bytes. |
| codepoints_to_bytes/1 | Convert a list of codepoints to a UTF-8 binary. |
| len/1 | |
| read_codepoint/1 | |
| valid_utf8_bytes/1 | Return only the bytes in B that represent valid UTF-8. |
bytes_foldl(F::fun((binary(), term()) -> term()), Acc::term(), Bin::binary()) -> term()
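The fun type fun((binary(), term()) -> term()) suggests a left fold that hands F one binary per step. A minimal sketch, assuming F is called once per codepoint, left to right, with that codepoint's UTF-8 bytes, and assuming Bin is valid UTF-8 (invalid input crashes with function_clause here). The module name is hypothetical.

```erlang
-module(bytes_foldl_sketch).
-export([bytes_foldl/3]).

%% Sketch only: assumes F receives the raw UTF-8 bytes of each
%% codepoint as a binary, in order, and that Bin is valid UTF-8.
bytes_foldl(_F, Acc, <<>>) ->
    Acc;
bytes_foldl(F, Acc, <<_C/utf8, Rest/binary>> = Bin) ->
    %% The bytes consumed by the utf8 segment are everything before Rest.
    N = byte_size(Bin) - byte_size(Rest),
    <<Bytes:N/binary, _/binary>> = Bin,
    bytes_foldl(F, F(Bytes, Acc), Rest).
```

For example, folding with fun(B, A) -> [B | A] end over <<$a, 233/utf8>> collects the chunks in reverse order: the two-byte sequence for U+00E9, then <<"a">>.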
bytes_to_codepoints(B::binary()) -> [unichar()]
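For well-formed input, Erlang's standard library can produce the same kind of result: unicode:characters_to_list/1 decodes a UTF-8 binary into a list of codepoints. This is a swapped-in stdlib technique, not necessarily how this module works; the wrapper module is hypothetical, and it turns the stdlib's error or incomplete tuples into badarg.

```erlang
-module(bytes_to_codepoints_sketch).
-export([bytes_to_codepoints/1]).

%% Sketch only: delegates decoding to the stdlib. For a binary argument,
%% unicode:characters_to_list/1 assumes UTF-8 and returns a list of
%% codepoints, or an {error, _, _} / {incomplete, _, _} tuple.
bytes_to_codepoints(B) when is_binary(B) ->
    case unicode:characters_to_list(B) of
        L when is_list(L) -> L;
        _ -> erlang:error(badarg)
    end.
```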
codepoint_foldl(F::fun((unichar(), term()) -> term()), Acc::term(), Bin::binary()) -> term()
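Going by the name and type, codepoint_foldl/3 is a left fold over the decoded codepoints of Bin. A minimal sketch under that assumption, using the bit-syntax utf8 segment type and assuming valid UTF-8 input. The module name is hypothetical.

```erlang
-module(codepoint_foldl_sketch).
-export([codepoint_foldl/3]).

%% Sketch only: applies F(Codepoint, Acc) to each codepoint of Bin,
%% left to right. Invalid UTF-8 raises a function_clause error.
codepoint_foldl(_F, Acc, <<>>) ->
    Acc;
codepoint_foldl(F, Acc, <<C/utf8, Rest/binary>>) ->
    codepoint_foldl(F, F(C, Acc), Rest).
```

Counting with fun(_, N) -> N + 1 end and an initial accumulator of 0 gives the number of codepoints rather than the number of bytes.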
codepoint_to_bytes(C::unichar()) -> binary()
Convert a Unicode codepoint to UTF-8 bytes.
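The UTF-8 bit layout (RFC 3629) can be written directly with the bit syntax. This sketch shows the standard encoding, not necessarily this module's implementation; the module name is hypothetical. Erlang's built-in <<C/utf8>> produces the same bytes for any unichar().

```erlang
-module(codepoint_to_bytes_sketch).
-export([codepoint_to_bytes/1]).

%% Sketch of standard UTF-8 encoding. Integer segments keep only the
%% low bits that fit their size, so C:6 is the lowest 6 bits of C.
codepoint_to_bytes(C) when is_integer(C), C >= 0, C < 16#80 ->
    <<C>>;                                            % 0xxxxxxx
codepoint_to_bytes(C) when is_integer(C), C >= 16#80, C < 16#800 ->
    <<2#110:3, (C bsr 6):5, 2#10:2, C:6>>;            % 110xxxxx 10xxxxxx
codepoint_to_bytes(C) when is_integer(C), C >= 16#800, C < 16#10000,
                           (C < 16#D800 orelse C > 16#DFFF) ->
    <<2#1110:4, (C bsr 12):4, 2#10:2, (C bsr 6):6,    % 1110xxxx 10xxxxxx
      2#10:2, C:6>>;                                  % 10xxxxxx
codepoint_to_bytes(C) when is_integer(C), C >= 16#10000, C =< 16#10FFFF ->
    <<2#11110:5, (C bsr 18):3, 2#10:2, (C bsr 12):6,  % 11110xxx 10xxxxxx
      2#10:2, (C bsr 6):6, 2#10:2, C:6>>.             % 10xxxxxx 10xxxxxx
```

Surrogates and values above 16#10FFFF match no clause, consistent with the unichar() type excluding them.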
codepoints_to_bytes(L::[unichar()]) -> binary()
Convert a list of codepoints to a UTF-8 binary.
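A binary comprehension expresses the same idea compactly: encode each codepoint with the utf8 segment type and concatenate. This is a sketch with a hypothetical module name, not necessarily this module's implementation; a surrogate or out-of-range value raises badarg.

```erlang
-module(codepoints_to_bytes_sketch).
-export([codepoints_to_bytes/1]).

%% Sketch only: each C is encoded as 1-4 UTF-8 bytes and the pieces are
%% appended into a single binary.
codepoints_to_bytes(L) when is_list(L) ->
    << <<C/utf8>> || C <- L >>.
```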
len(B::binary()) -> non_neg_integer()
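A minimal sketch, assuming len/1 counts codepoints rather than bytes (byte_size/1 already gives the byte count) and that B is valid UTF-8. The module name is hypothetical.

```erlang
-module(len_sketch).
-export([len/1]).

%% Sketch only: counts UTF-8 encoded codepoints in B with a
%% tail-recursive helper. Invalid UTF-8 raises function_clause.
len(B) when is_binary(B) ->
    len(B, 0).

len(<<>>, N) ->
    N;
len(<<_C/utf8, Rest/binary>>, N) ->
    len(Rest, N + 1).
```

For <<$a, 233/utf8, 16#1F600/utf8>> this sketch returns 3, while byte_size/1 returns 7 (1 + 2 + 4 bytes).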
read_codepoint(Bin::binary()) -> {unichar(), binary(), binary()}
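The return type has three parts. A minimal sketch, assuming they are the decoded codepoint, the bytes that encoded it, and the remaining input, and assuming Bin starts with a valid UTF-8 sequence. The module name is hypothetical.

```erlang
-module(read_codepoint_sketch).
-export([read_codepoint/1]).

%% Sketch only: returns {Codepoint, BytesOfThatCodepoint, Rest}.
%% A leading invalid sequence raises function_clause.
read_codepoint(<<C/utf8, Rest/binary>> = Bin) ->
    N = byte_size(Bin) - byte_size(Rest),
    <<Bytes:N/binary, _/binary>> = Bin,
    {C, Bytes, Rest}.
```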
valid_utf8_bytes(B::binary()) -> binary()
Return only the bytes in B that represent valid UTF-8. Uses the following recursive algorithm: skip one byte if B does not start with well-formed UTF-8 syntax (a 1-4 byte encoding of some number); skip the whole 2-4 byte sequence if it is an overlong encoding or encodes a bad code point (a surrogate in U+D800..U+DFFF, or a value above U+10FFFF); otherwise keep the sequence. Then continue with the remaining bytes.
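The algorithm described above can be sketched by matching each 1-4 byte shape explicitly, so that an overlong or out-of-range sequence is dropped as a unit while a byte that fits no shape is dropped alone. This is an illustration under that reading, with a hypothetical module name, not the module's actual source.

```erlang
-module(valid_utf8_sketch).
-export([valid_utf8_bytes/1]).

valid_utf8_bytes(B) when is_binary(B) ->
    valid(B, []).

valid(<<>>, Acc) ->
    iolist_to_binary(lists:reverse(Acc));
%% 1 byte: 0xxxxxxx, always valid.
valid(<<0:1, C:7, Rest/binary>>, Acc) ->
    valid(Rest, [C | Acc]);
%% 2 bytes: reject overlong (< 16#80).
valid(<<2#110:3, A:5, 2#10:2, B:6, Rest/binary>> = Bin, Acc) ->
    C = (A bsl 6) bor B,
    keep_or_skip(C >= 16#80, 2, Bin, Rest, Acc);
%% 3 bytes: reject overlong (< 16#800) and surrogates.
valid(<<2#1110:4, A:4, 2#10:2, B:6, 2#10:2, D:6, Rest/binary>> = Bin, Acc) ->
    C = (A bsl 12) bor (B bsl 6) bor D,
    Ok = C >= 16#800 andalso (C < 16#D800 orelse C > 16#DFFF),
    keep_or_skip(Ok, 3, Bin, Rest, Acc);
%% 4 bytes: reject overlong (< 16#10000) and values above 16#10FFFF.
valid(<<2#11110:5, A:3, 2#10:2, B:6, 2#10:2, D:6, 2#10:2, E:6,
        Rest/binary>> = Bin, Acc) ->
    C = (A bsl 18) bor (B bsl 12) bor (D bsl 6) bor E,
    Ok = C >= 16#10000 andalso C =< 16#10FFFF,
    keep_or_skip(Ok, 4, Bin, Rest, Acc);
%% Not UTF-8 syntax at all: skip exactly one byte.
valid(<<_, Rest/binary>>, Acc) ->
    valid(Rest, Acc).

%% Keep the N-byte sequence at the head of Bin, or drop all N bytes.
keep_or_skip(true, N, Bin, Rest, Acc) ->
    <<Seq:N/binary, _/binary>> = Bin,
    valid(Rest, [Seq | Acc]);
keep_or_skip(false, _N, _Bin, Rest, Acc) ->
    valid(Rest, Acc).
```

For example, a stray 16#FF between two ASCII letters is removed on its own, while the overlong pair 16#C0 16#80 (an encoding of U+0000) is removed as a whole.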
Generated by EDoc