public class TextDetector extends java.lang.Object implements Detector
Note that text documents with a character encoding like UTF-16 are better
detected with MagicDetector
and an appropriate magic byte pattern.
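The reason for this note is that UTF-16 text routinely contains 0x00 bytes, which a control-byte test treats as a sign of binary data. The following is a minimal, self-contained sketch of what a magic byte pattern for UTF-16 checks. It does not use the MagicDetector API; the class name `Utf16BomSketch` and the method `sniffUtf16` are hypothetical names chosen for illustration, and the sketch only recognizes streams that start with a byte order mark.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not part of TextDetector or MagicDetector.
final class Utf16BomSketch {

    /**
     * Returns "UTF-16BE" or "UTF-16LE" when the prefix starts with the
     * corresponding byte order mark, or null when no such mark is present.
     */
    static String sniffUtf16(byte[] prefix) {
        if (prefix == null || prefix.length < 2) {
            return null;
        }
        int b0 = prefix[0] & 0xFF; // mask to get the unsigned byte value
        int b1 = prefix[1] & 0xFF;
        if (b0 == 0xFE && b1 == 0xFF) {
            return "UTF-16BE";
        }
        if (b0 == 0xFF && b1 == 0xFE) {
            return "UTF-16LE";
        }
        return null;
    }

    public static void main(String[] args) {
        // Java's UTF_16 charset writes a big-endian byte order mark when encoding.
        byte[] encoded = "hi".getBytes(StandardCharsets.UTF_16);
        System.out.println(sniffUtf16(encoded));
    }
}
```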
Modifier and Type | Field and Description |
---|---|
private int |
bytesToTest |
private static int |
DEFAULT_NUMBER_OF_BYTES_TO_TEST
The number of bytes from the beginning of the document stream
to test for control bytes.
|
private static boolean[] |
IS_CONTROL_BYTE
Lookup table for all the ASCII/ISO-Latin/UTF-8/etc. control bytes in the range below 0x20 (the space character).
|
private static long |
serialVersionUID
Serial version UID
|
Constructor and Description |
---|
TextDetector()
Constructs a
TextDetector which will look at the default number
of bytes from the beginning of the document. |
TextDetector(int bytesToTest)
Constructs a
TextDetector which will look at a given number of
bytes from the beginning of the document. |
Modifier and Type | Method and Description |
---|---|
MediaType |
detect(java.io.InputStream input,
Metadata metadata)
Looks at the beginning of the document input stream to determine
whether the document is text or not.
|
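As a rough illustration of this kind of prefix check, the sketch below reads up to `bytesToTest` bytes, looks for bytes that rarely occur in plain text, and then resets the stream so the caller can still read it from the start. It is an assumption-laden sketch, not the class's actual implementation: the class `TextSniffSketch`, its methods, the boolean return value (instead of a `MediaType`), and the treatment of an empty stream as text are all choices made here for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; the real detect() method returns a MediaType.
final class TextSniffSketch {

    /**
     * Assumed rule: bytes below 0x20 count as binary, except tab, line feed,
     * form feed, carriage return and escape.
     */
    static boolean isLikelyBinary(int b) {
        return b < 0x20
                && b != 0x09 && b != 0x0A && b != 0x0C && b != 0x0D && b != 0x1B;
    }

    /** Returns true if none of the first bytesToTest bytes looks binary. */
    static boolean looksLikeText(InputStream input, int bytesToTest) throws IOException {
        if (input == null || !input.markSupported()) {
            throw new IllegalArgumentException("a stream with mark/reset support is required");
        }
        input.mark(bytesToTest); // remember the start of the stream
        try {
            for (int i = 0; i < bytesToTest; i++) {
                int b = input.read();
                if (b == -1) {
                    break; // stream is shorter than bytesToTest
                }
                if (isLikelyBinary(b)) {
                    return false;
                }
            }
            return true; // in this sketch an empty stream also ends up here
        } finally {
            input.reset(); // leave the stream positioned at the start again
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream text = new ByteArrayInputStream(
                "plain text\n".getBytes(StandardCharsets.US_ASCII));
        System.out.println(looksLikeText(text, 512));
    }
}
```

Resetting in a `finally` block keeps the stream usable by later detectors even when the check returns early.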
private static final long serialVersionUID
private static final int DEFAULT_NUMBER_OF_BYTES_TO_TEST
private static final boolean[] IS_CONTROL_BYTE
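Lookup table for all the ASCII/ISO-Latin/UTF-8/etc. control bytes in the range below 0x20 (the space character). If an entry in this table is true, then that byte is very unlikely to occur in a plain text document.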
The contents of this lookup table are based on the following definition from section 4 of the "Content-Type Processing Model" Internet-draft (draft-abarth-mime-sniff-01).
+-------------------------+
| Binary data byte ranges |
+-------------------------+
| 0x00 -- 0x08            |
| 0x0B                    |
| 0x0E -- 0x1A            |
| 0x1C -- 0x1F            |
+-------------------------+
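The byte ranges listed above can be turned into a boolean lookup table along the following lines. This is a sketch under stated assumptions rather than the class's actual code: the class name `ControlByteTableSketch`, the helper `markRange`, the accessor `isControlByte`, and the table size of 0x20 entries are chosen here for illustration.

```java
// Hypothetical illustration only; sizes and helper names are assumptions.
final class ControlByteTableSketch {

    // One entry per byte value below 0x20 (the space character).
    private static final boolean[] IS_CONTROL_BYTE = new boolean[0x20];

    static {
        // Ranges from the "Binary data byte ranges" definition.
        markRange(0x00, 0x08);
        markRange(0x0B, 0x0B);
        markRange(0x0E, 0x1A);
        markRange(0x1C, 0x1F);
    }

    /** Marks every byte value from first to last (inclusive) as a control byte. */
    private static void markRange(int first, int last) {
        for (int b = first; b <= last; b++) {
            IS_CONTROL_BYTE[b] = true;
        }
    }

    /** Returns true if the unsigned byte value falls in one of the binary ranges. */
    static boolean isControlByte(int b) {
        return b >= 0 && b < IS_CONTROL_BYTE.length && IS_CONTROL_BYTE[b];
    }

    public static void main(String[] args) {
        // The bytes left unmarked below 0x20 are tab, line feed, form feed,
        // carriage return and escape.
        for (int b = 0; b < 0x20; b++) {
            if (!isControlByte(b)) {
                System.out.printf("0x%02X is allowed in text%n", b);
            }
        }
    }
}
```

A table lookup keeps the per-byte check to a single array access, which matters when the test runs over every byte of the inspected prefix.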
private final int bytesToTest
public TextDetector()
Constructs a TextDetector which will look at the default number
of bytes from the beginning of the document.
public TextDetector(int bytesToTest)
Constructs a TextDetector which will look at a given number of
bytes from the beginning of the document.