Package | Description |
---|---|
org.apache.tika.embedder | |
org.apache.tika.extractor | Extraction of component documents. |
org.apache.tika.fork | Forked parser. |
org.apache.tika.parser | Tika parsers. |
org.apache.tika.parser.audio | |
org.apache.tika.parser.envi | |
org.apache.tika.parser.epub | |
org.apache.tika.parser.external | External parser process. |
org.apache.tika.parser.feed | |
org.apache.tika.parser.gdal | |
org.apache.tika.parser.iptc | |
org.apache.tika.parser.iwork | |
org.apache.tika.parser.strings | |
org.apache.tika.parser.utils | |
org.apache.tika.parser.video | |
org.apache.tika.parser.xml | |
org.apache.tika.utils | Utilities. |
Modifier and Type | Method and Description |
---|---|
void |
Embedder.embed(Metadata metadata,
java.io.InputStream originalStream,
java.io.OutputStream outputStream,
ParseContext context)
Embeds related document metadata from the given metadata object into the
given output stream.
|
void |
ExternalEmbedder.embed(Metadata metadata,
java.io.InputStream inputStream,
java.io.OutputStream outputStream,
ParseContext context)
Executes the configured external command to embed the given metadata into
the document read from the input stream, writing the result to the given output stream.
|
java.util.Set<MediaType> |
Embedder.getSupportedEmbedTypes(ParseContext context)
Returns the set of media types supported by this embedder when used with
the given parse context.
|
java.util.Set<MediaType> |
ExternalEmbedder.getSupportedEmbedTypes(ParseContext context) |
Modifier and Type | Field and Description |
---|---|
private ParseContext |
ParsingEmbeddedDocumentExtractor.context |
Modifier and Type | Method and Description |
---|---|
java.util.Set<MediaType> |
ParserContainerExtractor.RecursiveParser.getSupportedTypes(ParseContext context) |
void |
ParserContainerExtractor.RecursiveParser.parse(java.io.InputStream stream,
org.xml.sax.ContentHandler ignored,
Metadata metadata,
ParseContext context) |
Constructor and Description |
---|
ParsingEmbeddedDocumentExtractor(ParseContext context) |
Modifier and Type | Method and Description |
---|---|
java.util.Set<MediaType> |
ForkParser.getSupportedTypes(ParseContext context) |
void |
ForkParser.parse(java.io.InputStream stream,
org.xml.sax.ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Field and Description |
---|---|
private ParseContext |
ParsingReader.context
The parse context.
|
Modifier and Type | Method and Description |
---|---|
void |
DigestingParser.Digester.digest(java.io.InputStream is,
Metadata m,
ParseContext parseContext)
Digests an InputStream and sets the appropriate value(s) in the metadata.
|
java.util.Map<MediaType,java.util.List<Parser>> |
CompositeParser.findDuplicateParsers(ParseContext context)
Utility method that goes through all the component parsers and finds
all media types for which more than one parser declares support.
|
protected Parser |
DelegatingParser.getDelegateParser(ParseContext context)
Returns the parser instance to which parsing tasks should be delegated.
|
protected Parser |
CompositeParser.getParser(Metadata metadata,
ParseContext context) |
java.util.Map<MediaType,Parser> |
DefaultParser.getParsers(ParseContext context) |
java.util.Map<MediaType,Parser> |
CompositeParser.getParsers(ParseContext context) |
java.util.Set<MediaType> |
ParserDecorator.getSupportedTypes(ParseContext context)
Delegates the method call to the decorated parser.
|
java.util.Set<MediaType> |
NetworkParser.getSupportedTypes(ParseContext context) |
java.util.Set<MediaType> |
EmptyParser.getSupportedTypes(ParseContext context) |
java.util.Set<MediaType> |
CompositeParser.getSupportedTypes(ParseContext context) |
java.util.Set<MediaType> |
RecursiveParserWrapper.getSupportedTypes(ParseContext context) |
java.util.Set<MediaType> |
CryptoParser.getSupportedTypes(ParseContext context) |
java.util.Set<MediaType> |
DelegatingParser.getSupportedTypes(ParseContext context) |
java.util.Set<MediaType> |
Parser.getSupportedTypes(ParseContext context)
Returns the set of media types supported by this parser when used
with the given parse context.
|
java.util.Set<MediaType> |
ErrorParser.getSupportedTypes(ParseContext context) |
void |
ParserDecorator.parse(java.io.InputStream stream,
org.xml.sax.ContentHandler handler,
Metadata metadata,
ParseContext context)
Delegates the method call to the decorated parser.
|
void |
DigestingParser.parse(java.io.InputStream stream,
org.xml.sax.ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
NetworkParser.parse(java.io.InputStream stream,
org.xml.sax.ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
NetworkParser.ParsingTask.parse(java.io.InputStream stream,
org.xml.sax.ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
EmptyParser.parse(java.io.InputStream stream,
org.xml.sax.ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
AutoDetectParser.parse(java.io.InputStream stream,
org.xml.sax.ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
CompositeParser.parse(java.io.InputStream stream,
org.xml.sax.ContentHandler handler,
Metadata metadata,
ParseContext context)
Delegates the call to the matching component parser.
|
void |
RecursiveParserWrapper.parse(java.io.InputStream stream,
org.xml.sax.ContentHandler ignore,
Metadata metadata,
ParseContext context)
Acts like a regular parser except it ignores the ContentHandler
and it automatically sets/overwrites the embedded Parser in the
ParseContext object.
|
void |
RecursiveParserWrapper.EmbeddedParserDecorator.parse(java.io.InputStream stream,
org.xml.sax.ContentHandler ignore,
Metadata metadata,
ParseContext context) |
void |
ParserPostProcessor.parse(java.io.InputStream stream,
org.xml.sax.ContentHandler handler,
Metadata metadata,
ParseContext context)
Forwards the call to the delegated parser and post-processes the
results as described in the class documentation.
|
void |
CryptoParser.parse(java.io.InputStream stream,
org.xml.sax.ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
DelegatingParser.parse(java.io.InputStream stream,
org.xml.sax.ContentHandler handler,
Metadata metadata,
ParseContext context)
Looks up the delegate parser from the parsing context and
delegates the parse operation to it.
|
void |
Parser.parse(java.io.InputStream stream,
org.xml.sax.ContentHandler handler,
Metadata metadata,
ParseContext context)
Parses a document stream into a sequence of XHTML SAX events.
|
void |
ErrorParser.parse(java.io.InputStream stream,
org.xml.sax.ContentHandler handler,
Metadata metadata,
ParseContext context) |
private void |
NetworkParser.parse(TikaInputStream stream,
org.xml.sax.ContentHandler handler,
Metadata metadata,
ParseContext context) |
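
The DelegatingParser entries above describe looking up a delegate in the parse context and forwarding the parse call to it. The following is a minimal sketch of that lookup-and-delegate pattern using only the Java standard library. `TypedContext`, `TextParser`, and `DelegatingSketch` are hypothetical stand-ins invented for illustration; they are not Tika classes, and the empty fallback is an assumption of the sketch.

```java
import java.util.HashMap;
import java.util.Map;

final class DelegatingSketch {

    /** Hypothetical type-keyed context; a stand-in, not Tika's ParseContext. */
    static final class TypedContext {
        private final Map<String, Object> entries = new HashMap<>();

        <T> void set(Class<T> key, T value) {
            entries.put(key.getName(), value);
        }

        <T> T get(Class<T> key, T fallback) {
            Object value = entries.get(key.getName());
            return value != null ? key.cast(value) : fallback;
        }
    }

    /** Hypothetical minimal parser interface used only by this sketch. */
    interface TextParser {
        String parse(String input);
    }

    /** Assumed fallback: a parser that produces no output. */
    static final TextParser EMPTY = input -> "";

    /** Looks up the delegate in the context and forwards the call to it. */
    static String parse(String input, TypedContext context) {
        TextParser delegate = context.get(TextParser.class, EMPTY);
        return delegate.parse(input);
    }

    public static void main(String[] args) {
        TypedContext context = new TypedContext();
        // No delegate registered yet, so the empty fallback is used.
        System.out.println("[" + parse("hello", context) + "]");
        // Register a delegate; later calls are forwarded to it.
        context.set(TextParser.class, input -> input.toUpperCase());
        System.out.println("[" + parse("hello", context) + "]");
    }
}
```

Keying the context by class keeps the lookup type-safe without a fixed list of fields, which is why a caller can swap the delegate by setting a single entry before parsing.
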
Constructor and Description |
---|
ParsingReader(Parser parser,
java.io.InputStream stream,
Metadata metadata,
ParseContext context)
Creates a reader for the text content of the given binary stream
with the given document metadata.
|
ParsingReader(Parser parser,
java.io.InputStream stream,
Metadata metadata,
ParseContext context,
java.util.concurrent.Executor executor)
Creates a reader for the text content of the given binary stream
with the given document metadata.
|
Modifier and Type | Method and Description |
---|---|
java.util.Set<MediaType> |
AudioParser.getSupportedTypes(ParseContext context) |
java.util.Set<MediaType> |
MidiParser.getSupportedTypes(ParseContext context) |
void |
AudioParser.parse(java.io.InputStream stream,
org.xml.sax.ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
MidiParser.parse(java.io.InputStream stream,
org.xml.sax.ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
java.util.Set<MediaType> |
EnviHeaderParser.getSupportedTypes(ParseContext context) |
void |
EnviHeaderParser.parse(java.io.InputStream stream,
org.xml.sax.ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
java.util.Set<MediaType> |
EpubParser.getSupportedTypes(ParseContext context) |
java.util.Set<MediaType> |
EpubContentParser.getSupportedTypes(ParseContext context) |
void |
EpubParser.parse(java.io.InputStream stream,
org.xml.sax.ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
EpubContentParser.parse(java.io.InputStream stream,
org.xml.sax.ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
java.util.Set<MediaType> |
ExternalParser.getSupportedTypes(ParseContext context) |
void |
ExternalParser.parse(java.io.InputStream stream,
org.xml.sax.ContentHandler handler,
Metadata metadata,
ParseContext context)
Executes the configured external command and passes the given document
stream as a simple XHTML document to the given SAX content handler.
|
Modifier and Type | Method and Description |
---|---|
java.util.Set<MediaType> |
FeedParser.getSupportedTypes(ParseContext context) |
void |
FeedParser.parse(java.io.InputStream stream,
org.xml.sax.ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
java.util.Set<MediaType> |
GDALParser.getSupportedTypes(ParseContext context) |
void |
GDALParser.parse(java.io.InputStream stream,
org.xml.sax.ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
java.util.Set<MediaType> |
IptcAnpaParser.getSupportedTypes(ParseContext context) |
void |
IptcAnpaParser.parse(java.io.InputStream stream,
org.xml.sax.ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
java.util.Set<MediaType> |
IWorkPackageParser.getSupportedTypes(ParseContext context) |
void |
IWorkPackageParser.parse(java.io.InputStream stream,
org.xml.sax.ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
private void |
Latin1StringsParser.doParse(java.io.InputStream stream,
org.xml.sax.ContentHandler handler,
Metadata metadata,
ParseContext context)
Makes a best-effort attempt to extract Latin1 strings encoded with ISO-8859-1,
UTF-8 or UTF-16.
|
java.util.Set<MediaType> |
Latin1StringsParser.getSupportedTypes(ParseContext arg0) |
java.util.Set<MediaType> |
StringsParser.getSupportedTypes(ParseContext context) |
void |
Latin1StringsParser.parse(java.io.InputStream stream,
org.xml.sax.ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
StringsParser.parse(java.io.InputStream stream,
org.xml.sax.ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
CommonsDigester.digest(java.io.InputStream is,
Metadata m,
ParseContext parseContext) |
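
The `digest` methods listed on this page digest an InputStream and record the result in the metadata. The following is a rough sketch of that contract using only `java.security.MessageDigest`. The class name `DigestSketch`, the plain `Map` standing in for Metadata, the choice of MD5, and the key `digest:MD5` are all assumptions made for illustration; this is not CommonsDigester's actual implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

final class DigestSketch {

    /** Hashes the whole stream with MD5 and stores the hex digest in the map. */
    static void digest(InputStream in, Map<String, String> metadata)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            md5.update(buffer, 0, read);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md5.digest()) {
            hex.append(String.format("%02x", b));
        }
        // The key name is hypothetical, chosen only for this sketch.
        metadata.put("digest:MD5", hex.toString());
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> metadata = new HashMap<>();
        byte[] content = "abc".getBytes(StandardCharsets.UTF_8);
        digest(new ByteArrayInputStream(content), metadata);
        System.out.println(metadata.get("digest:MD5"));
    }
}
```

Because the sketch consumes the stream, a caller that also wants to parse the same content would need a resettable or buffered stream.
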
Modifier and Type | Method and Description |
---|---|
java.util.Set<MediaType> |
FLVParser.getSupportedTypes(ParseContext context) |
void |
FLVParser.parse(java.io.InputStream stream,
org.xml.sax.ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
protected org.xml.sax.ContentHandler |
DcXMLParser.getContentHandler(org.xml.sax.ContentHandler handler,
Metadata metadata,
ParseContext context) |
protected org.xml.sax.ContentHandler |
FictionBookParser.getContentHandler(org.xml.sax.ContentHandler handler,
Metadata metadata,
ParseContext context) |
protected org.xml.sax.ContentHandler |
XMLParser.getContentHandler(org.xml.sax.ContentHandler handler,
Metadata metadata,
ParseContext context) |
java.util.Set<MediaType> |
FictionBookParser.getSupportedTypes(ParseContext context) |
java.util.Set<MediaType> |
XMLParser.getSupportedTypes(ParseContext context) |
void |
XMLParser.parse(java.io.InputStream stream,
org.xml.sax.ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
static java.util.concurrent.Future |
ConcurrentUtils.execute(ParseContext context,
java.lang.Runnable runnable)
Executes a runnable using an ExecutorService from the ParseContext, if one is available.
|
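
The `ConcurrentUtils.execute` entry above prefers an ExecutorService taken from the context. The following is a minimal sketch of that "use the executor if present" decision. The class `ExecuteSketch`, the nullable `executorOrNull` parameter standing in for the context lookup, and the dedicated-thread fallback are assumptions of the sketch, not a statement of how the Tika method is written.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;
import java.util.concurrent.atomic.AtomicInteger;

final class ExecuteSketch {

    /**
     * Submits the task to the supplied executor when there is one;
     * otherwise starts a dedicated thread (assumed fallback).
     */
    static Future<?> execute(ExecutorService executorOrNull, Runnable task) {
        if (executorOrNull != null) {
            return executorOrNull.submit(task);
        }
        FutureTask<Void> future = new FutureTask<>(task, null);
        new Thread(future, "sketch-worker").start();
        return future;
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger runs = new AtomicInteger();

        // No executor available: the task runs on its own thread.
        execute(null, runs::incrementAndGet).get();

        // Executor available: the task is submitted to it.
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            execute(pool, runs::incrementAndGet).get();
        } finally {
            pool.shutdown();
        }
        System.out.println(runs.get());
    }
}
```

Returning a `Future` in both branches lets the caller wait for completion the same way regardless of where the task ran.
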