Package org.w3c.tidy
Class EncodingUtils
java.lang.Object
    org.w3c.tidy.EncodingUtils

public final class EncodingUtils extends java.lang.Object
Version: $Revision: 622 $ ($Author: fgiust $)
Author: Fabrizio Giustina

Field Summary
- static int FSM_ASCII: States for ISO-2022. A document in an ISO-2022-based encoding uses ESC sequences called "designators" to switch character sets.
- static int FSM_ESC: State ESC.
- static int FSM_ESCD: State ESCD.
- static int FSM_ESCDP: State ESCDP.
- static int FSM_ESCP: State ESCP.
- static int FSM_NONASCII: State NONASCII.
- static int HIGH_UTF16_SURROGATE: UTF-16 high surrogate.
- static int LOW_UTF16_SURROGATE: UTF-16 low surrogate.
- static int MAX_UTF16_FROM_UCS4: Maximum UTF-16 value.
- static int MAX_UTF8_FROM_UCS4: Maximum valid UTF-8 character value.
- static int UNICODE_BOM: The default (big-endian) Unicode BOM.
- static int UNICODE_BOM_BE: The big-endian (default) Unicode BOM.
- static int UNICODE_BOM_LE: The little-endian Unicode BOM.
- static int UNICODE_BOM_UTF8: The UTF-8 Unicode BOM.
- static int UTF16_HIGH_SURROGATE_BEGIN: UTF-16 surrogate pair areas: high surrogates begin.
- static int UTF16_HIGH_SURROGATE_END: UTF-16 surrogate pair areas: high surrogates end.
- static int UTF16_LOW_SURROGATE_BEGIN: UTF-16 surrogate pair areas: low surrogates begin.
- static int UTF16_LOW_SURROGATE_END: UTF-16 surrogate pair areas: low surrogates end.
- static int UTF16_SURROGATES_BEGIN: UTF-16 surrogates begin.

Method Summary
- protected static int decodeMacRoman(int c): Function to convert from MacRoman to Unicode.
- protected static int decodeWin1252(int c): Function for conversion from Windows-1252 to Unicode.

Field Detail
UNICODE_BOM_BE
public static final int UNICODE_BOM_BE
The big-endian (default) Unicode byte order mark (BOM).
See Also: Constant Field Values

UNICODE_BOM
public static final int UNICODE_BOM
The default (big-endian) Unicode byte order mark (BOM).
See Also: Constant Field Values

UNICODE_BOM_LE
public static final int UNICODE_BOM_LE
The little-endian Unicode byte order mark (BOM).
See Also: Constant Field Values

UNICODE_BOM_UTF8
public static final int UNICODE_BOM_UTF8
The UTF-8 Unicode byte order mark (BOM).
See Also: Constant Field Values

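The BOM constants exist so that the first bytes of an input can be matched against a known signature to choose a decoder. The following Java sketch illustrates that check under stated assumptions: it hard-codes the standard BOM byte sequences (EF BB BF for UTF-8, FE FF for big-endian UTF-16, FF FE for little-endian UTF-16) as local constants, which are presumed, not confirmed by this page, to equal UNICODE_BOM_UTF8, UNICODE_BOM_BE and UNICODE_BOM_LE. The class BomSniffer and its detect method are hypothetical and not part of JTidy.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class BomSniffer {
    // Assumed values: the standard BOM byte sequences, presumed (not confirmed)
    // to match the UNICODE_BOM_* constants.
    static final int BOM_BE = 0xFEFF;     // bytes FE FF
    static final int BOM_LE = 0xFFFE;     // bytes FF FE
    static final int BOM_UTF8 = 0xEFBBBF; // bytes EF BB BF

    /** Returns the charset implied by a leading BOM, or null when there is none. */
    static Charset detect(byte[] data) {
        // The 3-byte UTF-8 signature is checked first because it is the longest.
        if (data.length >= 3) {
            int first3 = ((data[0] & 0xFF) << 16) | ((data[1] & 0xFF) << 8) | (data[2] & 0xFF);
            if (first3 == BOM_UTF8) {
                return StandardCharsets.UTF_8;
            }
        }
        // Then the 2-byte UTF-16 signatures, read as a big-endian value.
        if (data.length >= 2) {
            int first2 = ((data[0] & 0xFF) << 8) | (data[1] & 0xFF);
            if (first2 == BOM_BE) {
                return StandardCharsets.UTF_16BE;
            }
            if (first2 == BOM_LE) {
                return StandardCharsets.UTF_16LE;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        byte[] utf8 = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        byte[] utf16le = {(byte) 0xFF, (byte) 0xFE, 'h', 0};
        System.out.println(detect(utf8));    // UTF-8
        System.out.println(detect(utf16le)); // UTF-16LE
    }
}
```

Because the sketch only distinguishes the three signatures named by the constants, a UTF-32LE signature (FF FE 00 00) would be reported as UTF-16LE.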
FSM_ASCII
public static final int FSM_ASCII
States for ISO-2022.
A document in an ISO-2022-based encoding uses ESC sequences called "designators" to switch character sets. The designators defined and used in ISO-2022-JP are:
- "ESC" + "(" + ? for ISO 646 variants
- "ESC" + "$" + ? and "ESC" + "$" + "(" + ? for multibyte character sets
State ASCII.
See Also: Constant Field Values

FSM_ESC
public static final int FSM_ESC
State ESC.
See Also: Constant Field Values

FSM_ESCD
public static final int FSM_ESCD
State ESCD.
See Also: Constant Field Values

FSM_ESCDP
public static final int FSM_ESCDP
State ESCDP.
See Also: Constant Field Values

FSM_ESCP
public static final int FSM_ESCP
State ESCP.
See Also: Constant Field Values

FSM_NONASCII
public static final int FSM_NONASCII
State NONASCII.
See Also: Constant Field Values

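Read together with the designators listed under FSM_ASCII, the state names suggest a machine that follows ESC sequences byte by byte: ESC right after an ESC byte, ESCD after "ESC $", ESCDP after "ESC $ (", ESCP after "ESC (", and NONASCII once a multibyte set has been designated. That reading is an interpretation of the names, not something this page states. The Java sketch below implements it with its own enum (the numeric values of the FSM_* constants are not listed here); the class Iso2022Fsm is hypothetical and is not JTidy's code.

```java
/**
 * Sketch of an ISO-2022 designator tracker. State names follow the FSM_*
 * constants; the transitions are an assumption inferred from the designators.
 */
final class Iso2022Fsm {
    enum State { ASCII, ESC, ESCD, ESCDP, ESCP, NONASCII }

    static final int ESC_BYTE = 0x1B;

    private State state = State.ASCII;

    /** Consumes one byte and returns the state after it. */
    State feed(int c) {
        if (c == ESC_BYTE) {
            state = State.ESC; // every ESC starts a new designator
            return state;
        }
        switch (state) {
            case ESC:
                if (c == '$') {
                    state = State.ESCD;  // ESC $ : a multibyte set follows
                } else if (c == '(') {
                    state = State.ESCP;  // ESC ( : an ISO 646 variant follows
                } else {
                    state = State.ASCII; // not a designator tracked here
                }
                break;
            case ESCD:
                // ESC $ ( F needs one more byte; ESC $ F ends here.
                state = (c == '(') ? State.ESCDP : State.NONASCII;
                break;
            case ESCDP:
                state = State.NONASCII;  // final byte of ESC $ ( F
                break;
            case ESCP:
                state = State.ASCII;     // final byte of ESC ( F
                break;
            default:
                break;                   // ASCII or NONASCII: unchanged until the next ESC
        }
        return state;
    }

    public static void main(String[] args) {
        Iso2022Fsm fsm = new Iso2022Fsm();
        // "A", then ESC $ B (JIS X 0208), two bytes of a kanji, then ESC ( B (back to ASCII)
        int[] input = {'A', ESC_BYTE, '$', 'B', 0x30, 0x21, ESC_BYTE, '(', 'B', 'z'};
        for (int b : input) {
            System.out.println(Integer.toHexString(b) + " -> " + fsm.feed(b));
        }
    }
}
```

A real decoder would also use the current state to decide how to interpret the bytes that follow each designator; the sketch only tracks which kind of character set is active.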
MAX_UTF8_FROM_UCS4
public static final int MAX_UTF8_FROM_UCS4
Maximum valid UTF-8 character value.
See Also: Constant Field Values

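A limit such as MAX_UTF8_FROM_UCS4 bounds the values a UTF-8 encoder should accept. As an illustration, the following Java sketch encodes a single UCS-4 value as UTF-8 bytes. It assumes the limit is U+10FFFF (the top of the Unicode code space, which is also the RFC 3629 limit) and additionally rejects surrogate code points as RFC 3629 requires; whether JTidy applies the same checks is not stated on this page. The class Utf8Encoder is hypothetical.

```java
final class Utf8Encoder {
    // Assumption: MAX_UTF8_FROM_UCS4 corresponds to the Unicode limit U+10FFFF.
    static final int MAX_CODE_POINT = 0x10FFFF;

    /** Encodes one UCS-4 value as UTF-8 bytes following RFC 3629. */
    static byte[] encode(int cp) {
        // Reject negatives, values above the limit, and UTF-16 surrogate code points.
        if (cp < 0 || cp > MAX_CODE_POINT || (cp >= 0xD800 && cp <= 0xDFFF)) {
            throw new IllegalArgumentException("not encodable as UTF-8: 0x" + Integer.toHexString(cp));
        }
        if (cp < 0x80) {
            return new byte[] {(byte) cp};                   // 0xxxxxxx
        }
        if (cp < 0x800) {
            return new byte[] {(byte) (0xC0 | (cp >> 6)),    // 110xxxxx
                               (byte) (0x80 | (cp & 0x3F))}; // 10xxxxxx
        }
        if (cp < 0x10000) {
            return new byte[] {(byte) (0xE0 | (cp >> 12)),   // 1110xxxx
                               (byte) (0x80 | ((cp >> 6) & 0x3F)),
                               (byte) (0x80 | (cp & 0x3F))};
        }
        return new byte[] {(byte) (0xF0 | (cp >> 18)),       // 11110xxx
                           (byte) (0x80 | ((cp >> 12) & 0x3F)),
                           (byte) (0x80 | ((cp >> 6) & 0x3F)),
                           (byte) (0x80 | (cp & 0x3F))};
    }

    public static void main(String[] args) {
        for (int cp : new int[] {0x41, 0xE9, 0x20AC, 0x1F600}) {
            StringBuilder hex = new StringBuilder();
            for (byte b : encode(cp)) {
                hex.append(String.format("%02X ", b & 0xFF));
            }
            System.out.println("U+" + Integer.toHexString(cp).toUpperCase() + " -> " + hex.toString().trim());
        }
    }
}
```

For the four sample values, the encoded bytes are 41, C3 A9, E2 82 AC and F0 9F 98 80.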
MAX_UTF16_FROM_UCS4
public static final int MAX_UTF16_FROM_UCS4
Maximum UTF-16 value.
See Also: Constant Field Values

LOW_UTF16_SURROGATE
public static final int LOW_UTF16_SURROGATE
UTF-16 low surrogate.
See Also: Constant Field Values

UTF16_SURROGATES_BEGIN
public static final int UTF16_SURROGATES_BEGIN
UTF-16 surrogates begin.
See Also: Constant Field Values

UTF16_LOW_SURROGATE_BEGIN
public static final int UTF16_LOW_SURROGATE_BEGIN
UTF-16 surrogate pair areas: low surrogates begin.
See Also: Constant Field Values

UTF16_LOW_SURROGATE_END
public static final int UTF16_LOW_SURROGATE_END
UTF-16 surrogate pair areas: low surrogates end.
See Also: Constant Field Values

UTF16_HIGH_SURROGATE_BEGIN
public static final int UTF16_HIGH_SURROGATE_BEGIN
UTF-16 surrogate pair areas: high surrogates begin.
See Also: Constant Field Values

UTF16_HIGH_SURROGATE_END
public static final int UTF16_HIGH_SURROGATE_END
UTF-16 surrogate pair areas: high surrogates end.
See Also: Constant Field Values

HIGH_UTF16_SURROGATE
public static final int HIGH_UTF16_SURROGATE
UTF-16 high surrogate.
See Also: Constant Field Values

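In UTF-16, code points from U+10000 upward are stored as a high/low surrogate pair, which is what the UTF16_* surrogate constants appear to describe. A Java sketch of the standard split and join arithmetic follows. It hard-codes the ranges from the Unicode standard (high surrogates D800-DBFF, low surrogates DC00-DFFF, pairs starting at U+10000, maximum U+10FFFF); these are assumed, not confirmed by this page, to be the values of the constants above. The class Utf16Surrogates is hypothetical.

```java
final class Utf16Surrogates {
    // Standard UTF-16 values; assumed, not confirmed, to match the UTF16_* constants.
    static final int SURROGATES_BEGIN = 0x10000; // first code point that needs a pair
    static final int HIGH_BEGIN = 0xD800;
    static final int HIGH_END = 0xDBFF;
    static final int LOW_BEGIN = 0xDC00;
    static final int LOW_END = 0xDFFF;
    static final int MAX_CODE_POINT = 0x10FFFF;

    /** Splits a supplementary code point into {high, low} surrogates. */
    static int[] split(int cp) {
        if (cp < SURROGATES_BEGIN || cp > MAX_CODE_POINT) {
            throw new IllegalArgumentException("no surrogate pair for 0x" + Integer.toHexString(cp));
        }
        int v = cp - SURROGATES_BEGIN; // 20-bit offset: top 10 bits -> high, bottom 10 bits -> low
        return new int[] {HIGH_BEGIN + (v >> 10), LOW_BEGIN + (v & 0x3FF)};
    }

    /** Joins a high and a low surrogate back into a code point. */
    static int combine(int high, int low) {
        if (high < HIGH_BEGIN || high > HIGH_END || low < LOW_BEGIN || low > LOW_END) {
            throw new IllegalArgumentException("not a valid surrogate pair");
        }
        return SURROGATES_BEGIN + ((high - HIGH_BEGIN) << 10) + (low - LOW_BEGIN);
    }

    public static void main(String[] args) {
        int[] pair = split(0x1F600);
        System.out.printf("%X %X -> %X%n", pair[0], pair[1], combine(pair[0], pair[1]));
    }
}
```

For U+1F600, main prints "D83D DE00 -> 1F600". In ordinary Java code, Character.toChars and Character.toCodePoint perform the same conversion.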
Method Detail
decodeWin1252
protected static int decodeWin1252(int c)
Function for conversion from Windows-1252 to Unicode.
Parameters:
c - char to decode
Returns:
decoded char

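Windows-1252 differs from ISO-8859-1 only in bytes 0x80-0x9F, so a conversion of this kind can be a 32-entry lookup table plus an identity mapping for every other byte. The Java sketch below shows that approach; it is not JTidy's source. The class Win1252Decoder is hypothetical, and mapping the five unassigned bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D) to U+FFFD is an assumption.

```java
final class Win1252Decoder {
    // Assumption: unassigned bytes map to U+FFFD; JTidy may handle them differently.
    private static final int UNDEFINED = 0xFFFD;

    // Code points for bytes 0x80..0x9F, the only range where Windows-1252 differs from ISO-8859-1.
    private static final int[] C1_RANGE = {
        0x20AC, UNDEFINED, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,   // 0x80-0x87
        0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, UNDEFINED, 0x017D, UNDEFINED, // 0x88-0x8F
        UNDEFINED, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,   // 0x90-0x97
        0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, UNDEFINED, 0x017E, 0x0178    // 0x98-0x9F
    };

    /** Maps a Windows-1252 byte value (0-255) to a Unicode code point. */
    static int decode(int c) {
        if (c >= 0x80 && c <= 0x9F) {
            return C1_RANGE[c - 0x80];
        }
        return c; // 0x00-0x7F and 0xA0-0xFF equal ISO-8859-1, hence the same Unicode code point
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(decode(0x80))); // 20ac (euro sign)
        System.out.println(Integer.toHexString(decode(0xE9))); // e9
    }
}
```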
decodeMacRoman
protected static int decodeMacRoman(int c)
Function to convert from MacRoman to Unicode.
Parameters:
c - char to decode
Returns:
decoded char
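Unlike Windows-1252, MacRoman reassigns the whole upper half (0x80-0xFF), so a hand-written table needs 128 entries. To keep the example short, the Java sketch below delegates to the JDK's "x-MacRoman" charset instead of a table; that is a substitution, not how JTidy is known to implement decodeMacRoman. It assumes the runtime includes the JDK's extended charsets (the jdk.charsets module); if it does not, Charset.forName throws UnsupportedCharsetException and class initialization fails. The class MacRomanDecoder is hypothetical.

```java
import java.nio.charset.Charset;

final class MacRomanDecoder {
    // Assumption: the runtime ships the "x-MacRoman" charset from the JDK's extended set.
    private static final Charset MAC_ROMAN = Charset.forName("x-MacRoman");

    /** Maps a MacRoman byte value (0-255) to a Unicode code point. */
    static int decode(int c) {
        if (c < 0 || c > 0xFF) {
            throw new IllegalArgumentException("not a byte value: " + c);
        }
        if (c < 0x80) {
            return c; // the lower half is plain ASCII
        }
        // Decode the single byte; every MacRoman byte maps to one BMP character.
        return new String(new byte[] {(byte) c}, MAC_ROMAN).codePointAt(0);
    }

    public static void main(String[] args) {
        for (int c : new int[] {0x41, 0x80, 0x8E, 0xA5}) {
            System.out.printf("0x%02X -> U+%04X%n", c, decode(c));
        }
    }
}
```

A table-driven version would avoid the runtime dependency at the cost of listing all 128 upper-half mappings.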