public class CommonsDigester extends java.lang.Object implements DigestingParser.Digester
Implementation of DigestingParser.Digester that relies on commons.codec.digest.DigestUtils to calculate digest hashes.
This digester tries to use the regular mark/reset protocol on the InputStream. It reads through an internal BoundedInputStream; if the bound is hit before the InputStream is fully read, it resets the stream, spools the InputStream to disk (via TikaInputStream), and then digests the file.
If a TikaInputStream is passed in and it has an underlying file that is longer than the markLimit, then this digester digests the file directly.
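The bounded mark/reset attempt described above can be illustrated with a minimal sketch that uses only the JDK. It is not the CommonsDigester source: the class name `BoundedDigestSketch` and the method `digestWithinLimit` are hypothetical, `java.security.MessageDigest` stands in for DigestUtils, and the fallback of spooling to disk is left to the caller, who would take over when the method returns `null`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical illustration class; not part of Tika.
class BoundedDigestSketch {

    /**
     * Digests the stream while reading at most markLimit + 1 bytes.
     * Returns the lowercase hex digest if the stream ended within markLimit
     * bytes, or null if the bound was hit first. The stream is reset to its
     * starting position in both cases. The stream must support mark/reset.
     */
    static String digestWithinLimit(InputStream is, String algorithm, int markLimit)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance(algorithm);
        // One extra byte of read-ahead tells us whether more data follows the bound.
        is.mark(markLimit + 1);
        try {
            byte[] buffer = new byte[4096];
            int total = 0;
            while (total <= markLimit) {
                int toRead = Math.min(buffer.length, markLimit + 1 - total);
                int n = is.read(buffer, 0, toRead);
                if (n == -1) {
                    // End of stream reached inside the bound: the digest is complete.
                    return toHex(md.digest());
                }
                md.update(buffer, 0, n);
                total += n;
            }
            // Bound hit: the caller should fall back to another strategy,
            // for example spooling the stream to a file and digesting that.
            return null;
        } finally {
            is.reset();
        }
    }

    private static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

Resetting in a `finally` block keeps the stream usable for the next consumer whether or not the digest succeeded, which is the contract the mark/reset protocol is meant to preserve.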
Modifier and Type | Class and Description |
---|---|
static class | CommonsDigester.DigestAlgorithm |
private class | CommonsDigester.SimpleBoundedInputStream: Very slight modification of Commons' BoundedInputStream so that we can figure out if this hit the bound or not. |
Modifier and Type | Field and Description |
---|---|
private java.util.List<CommonsDigester.DigestAlgorithm> | algorithms |
private int | markLimit |
Constructor and Description |
---|
CommonsDigester(int markLimit, CommonsDigester.DigestAlgorithm... algorithms) |
Modifier and Type | Method and Description |
---|---|
void | digest(java.io.InputStream is, Metadata m, ParseContext parseContext): Digests an InputStream and sets the appropriate value(s) in the metadata. |
private boolean | digestEach(CommonsDigester.DigestAlgorithm algorithm, java.io.InputStream is, Metadata metadata) |
private void | digestFile(java.io.File f, Metadata m) |
static CommonsDigester.DigestAlgorithm[] | parse(java.lang.String s) |
private final java.util.List<CommonsDigester.DigestAlgorithm> algorithms
private final int markLimit
public CommonsDigester(int markLimit, CommonsDigester.DigestAlgorithm... algorithms)
public void digest(java.io.InputStream is, Metadata m, ParseContext parseContext) throws java.io.IOException
Description copied from interface: DigestingParser.Digester
The given stream is guaranteed to support the mark feature, and the digester is expected to mark the stream before reading any bytes from it and to reset the stream before returning. The stream must not be closed by the digester.
Specified by: digest in interface DigestingParser.Digester
Parameters:
is - InputStream to digest
m - Metadata to set the values for
parseContext - ParseContext
Throws:
java.io.IOException
private void digestFile(java.io.File f, Metadata m) throws java.io.IOException
Throws:
java.io.IOException
private boolean digestEach(CommonsDigester.DigestAlgorithm algorithm, java.io.InputStream is, Metadata metadata) throws java.io.IOException
Parameters:
algorithm - algo to use
is - input stream to read from
metadata - metadata for reporting the digest
Throws:
java.io.IOException
public static CommonsDigester.DigestAlgorithm[] parse(java.lang.String s)
Parameters:
s - comma-delimited (no space) list of algorithms to use: md5,sha256
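As a rough illustration of the input format that parse expects, the sketch below splits a comma-delimited string and maps each token to an enum constant. It is an assumption about how such parsing could work, not the actual implementation: `AlgorithmListSketch` and its nested `Algo` enum are hypothetical stand-ins for CommonsDigester and CommonsDigester.DigestAlgorithm, and the set of constants shown is illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration class; not part of Tika.
class AlgorithmListSketch {

    /** Hypothetical stand-in for CommonsDigester.DigestAlgorithm. */
    enum Algo { MD5, SHA1, SHA256 }

    /**
     * Parses a comma-delimited list (no spaces) such as "md5,sha256".
     * An unknown name makes Enum.valueOf throw IllegalArgumentException.
     */
    static Algo[] parse(String s) {
        List<Algo> result = new ArrayList<>();
        for (String token : s.split(",")) {
            // Upper-case with a fixed locale so "md5" matches the constant MD5.
            result.add(Algo.valueOf(token.toUpperCase(Locale.ROOT)));
        }
        return result.toArray(new Algo[0]);
    }
}
```

In this sketch, a token with surrounding whitespace (for example `"md5, sha256"`) would fail to match a constant, which is consistent with the "no space" requirement stated for the parameter.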