Package | Description |
---|---|
org.apache.tika |
Apache Tika.
|
org.apache.tika.detect |
Media type detection.
|
org.apache.tika.embedder | |
org.apache.tika.extractor |
Extraction of component documents.
|
org.apache.tika.fork |
Forked parser.
|
org.apache.tika.io |
IO utilities.
|
org.apache.tika.metadata |
Multi-valued metadata container, and set of constant metadata fields.
|
org.apache.tika.metadata.serialization | |
org.apache.tika.mime |
Media type information.
|
org.apache.tika.parser |
Tika parsers.
|
org.apache.tika.parser.audio | |
org.apache.tika.parser.envi | |
org.apache.tika.parser.epub | |
org.apache.tika.parser.external |
External parser process.
|
org.apache.tika.parser.feed | |
org.apache.tika.parser.gdal | |
org.apache.tika.parser.iptc | |
org.apache.tika.parser.iwork | |
org.apache.tika.parser.strings | |
org.apache.tika.parser.utils | |
org.apache.tika.parser.video | |
org.apache.tika.parser.xml | |
org.apache.tika.sax |
SAX utilities.
|
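
The `org.apache.tika.metadata` entry above describes a multi-valued metadata container. As a rough illustration of what "multi-valued" means, the sketch below stores several values under one name. The class `MultiValuedMetadataSketch` and its methods are hypothetical stand-ins built on the Java standard library. They are not Tika's `Metadata` class and make no claim about its internals.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a multi-valued metadata container (not Tika's Metadata). */
final class MultiValuedMetadataSketch {
    // Insertion-ordered map from a field name to all values recorded for it.
    private final Map<String, List<String>> values = new LinkedHashMap<>();

    /** Replaces every value stored under the name with a single value. */
    public void set(String name, String value) {
        List<String> list = new ArrayList<>();
        list.add(value);
        values.put(name, list);
    }

    /** Appends a value and keeps any earlier values for the same name. */
    public void add(String name, String value) {
        values.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    /** Returns the first value for the name, or null when none is stored. */
    public String get(String name) {
        List<String> list = values.get(name);
        return (list == null || list.isEmpty()) ? null : list.get(0);
    }

    /** Returns all values for the name, or an empty array when none is stored. */
    public String[] getValues(String name) {
        List<String> list = values.get(name);
        return list == null ? new String[0] : list.toArray(new String[0]);
    }

    public static void main(String[] args) {
        MultiValuedMetadataSketch m = new MultiValuedMetadataSketch();
        m.add("dc:creator", "Alice");
        m.add("dc:creator", "Bob");
        System.out.println(Arrays.toString(m.getValues("dc:creator"))); // prints "[Alice, Bob]"
    }
}
```

Keeping `set` and `add` separate lets a caller choose between overwriting a field and accumulating values for it.
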
Modifier and Type | Method and Description |
---|---|
java.lang.String |
Tika.detect(java.io.InputStream stream,
Metadata metadata)
Detects the media type of the given document.
|
java.io.Reader |
Tika.parse(java.io.InputStream stream,
Metadata metadata)
Parses the given document and returns the extracted text content.
|
java.lang.String |
Tika.parseToString(java.io.InputStream stream,
Metadata metadata)
Parses the given document and returns the extracted text content.
|
java.lang.String |
Tika.parseToString(java.io.InputStream stream,
Metadata metadata,
int maxLength)
Parses the given document and returns the extracted text content.
|
Modifier and Type | Method and Description |
---|---|
java.nio.charset.Charset |
EncodingDetector.detect(java.io.InputStream input,
Metadata metadata)
Detects the character encoding of the given text document, or returns
null if the encoding of the document cannot be detected. |
MediaType |
Detector.detect(java.io.InputStream input,
Metadata metadata)
Detects the content type of the given input document.
|
MediaType |
TrainedModelDetector.detect(java.io.InputStream input,
Metadata metadata) |
MediaType |
TypeDetector.detect(java.io.InputStream input,
Metadata metadata)
Detects the content type of an input document based on a type hint
given in the input metadata.
|
MediaType |
TextDetector.detect(java.io.InputStream input,
Metadata metadata)
Looks at the beginning of the document input stream to determine
whether the document is text or not.
|
MediaType |
MagicDetector.detect(java.io.InputStream input,
Metadata metadata) |
MediaType |
EmptyDetector.detect(java.io.InputStream input,
Metadata metadata) |
MediaType |
CompositeDetector.detect(java.io.InputStream input,
Metadata metadata) |
MediaType |
NameDetector.detect(java.io.InputStream input,
Metadata metadata)
Detects the content type of an input document based on the document
name given in the input metadata.
|
private static java.nio.charset.Charset |
AutoDetectReader.detect(java.io.InputStream input,
Metadata metadata,
java.util.List<EncodingDetector> detectors,
LoadErrorHandler handler) |
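
Several detectors in the table above inspect the beginning of the input stream. A common way to do that without consuming the caller's data is the `mark`/`reset` mechanism of `java.io.InputStream`. The sketch below shows that idea for a single hard-coded signature. The class `MagicPrefixSketch`, its `detect` method and the choice of the PDF signature are illustrative assumptions. It is not the code of `MagicDetector` or `MimeTypes`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical prefix-based detector; illustrates mark/reset only. */
final class MagicPrefixSketch {
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);
    static final String PDF = "application/pdf";
    static final String UNKNOWN = "application/octet-stream";

    /** Peeks at the stream prefix and restores the read position afterwards. */
    public static String detect(InputStream input) {
        // Without mark support the prefix cannot be read non-destructively.
        if (input == null || !input.markSupported()) {
            return UNKNOWN;
        }
        input.mark(PDF_MAGIC.length);
        try {
            try {
                byte[] prefix = new byte[PDF_MAGIC.length];
                int total = 0;
                // read() may return fewer bytes than requested, so loop until full or EOF.
                while (total < prefix.length) {
                    int n = input.read(prefix, total, prefix.length - total);
                    if (n < 0) {
                        break;
                    }
                    total += n;
                }
                return total == prefix.length && Arrays.equals(prefix, PDF_MAGIC) ? PDF : UNKNOWN;
            } finally {
                // Rewind so the caller can still read the document from the start.
                input.reset();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ByteArrayInputStream in =
                new ByteArrayInputStream("%PDF-1.7".getBytes(StandardCharsets.US_ASCII));
        System.out.println(detect(in));       // prints "application/pdf"
        System.out.println((char) in.read()); // prints "%"
    }
}
```

Because the stream is rewound in a `finally` block, a parser that runs after detection still sees the document from its first byte.
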
Constructor and Description |
---|
AutoDetectReader(java.io.BufferedInputStream stream,
Metadata metadata,
java.util.List<EncodingDetector> detectors,
LoadErrorHandler handler) |
AutoDetectReader(java.io.InputStream stream,
Metadata metadata) |
AutoDetectReader(java.io.InputStream stream,
Metadata metadata,
ServiceLoader loader) |
Modifier and Type | Method and Description |
---|---|
void |
Embedder.embed(Metadata metadata,
java.io.InputStream originalStream,
java.io.OutputStream outputStream,
ParseContext context)
Embeds related document metadata from the given metadata object into the
given output stream.
|
void |
ExternalEmbedder.embed(Metadata metadata,
java.io.InputStream inputStream,
java.io.OutputStream outputStream,
ParseContext context)
Executes the configured external command and passes the given document
stream as a simple XHTML document to the given SAX content handler.
|
protected java.util.List<java.lang.String> |
ExternalEmbedder.getCommandMetadataSegments(Metadata metadata)
Constructs a collection of command line arguments responsible for setting
individual metadata fields based on the given metadata. |
Modifier and Type | Method and Description |
---|---|
void |
ParserContainerExtractor.RecursiveParser.parse(java.io.InputStream stream,
org.xml.sax.ContentHandler ignored,
Metadata metadata,
ParseContext context) |
void |
EmbeddedDocumentExtractor.parseEmbedded(java.io.InputStream stream,
org.xml.sax.ContentHandler handler,
Metadata metadata,
boolean outputHtml)
Processes the supplied embedded resource, calling the delegating
parser with the appropriate details.
|
void |
ParsingEmbeddedDocumentExtractor.parseEmbedded(java.io.InputStream stream,
org.xml.sax.ContentHandler handler,
Metadata metadata,
boolean outputHtml) |
boolean |
DocumentSelector.select(Metadata metadata)
Checks if a document with the given metadata matches the specified
selection criteria.
|
boolean |
EmbeddedDocumentExtractor.shouldParseEmbedded(Metadata metadata) |
boolean |
ParsingEmbeddedDocumentExtractor.shouldParseEmbedded(Metadata metadata) |
Modifier and Type | Field and Description |
---|---|
private Metadata |
MetadataContentHandler.metadata |
Modifier and Type | Method and Description |
---|---|
void |
ForkParser.parse(java.io.InputStream stream,
org.xml.sax.ContentHandler handler,
Metadata metadata,
ParseContext context) |
Constructor and Description |
---|
MetadataContentHandler(Metadata metadata) |
Modifier and Type | Method and Description |
---|---|
static TikaInputStream |
TikaInputStream.get(java.sql.Blob blob,
Metadata metadata)
Creates a TikaInputStream from the given database BLOB.
|
static TikaInputStream |
TikaInputStream.get(byte[] data,
Metadata metadata)
Creates a TikaInputStream from the given array of bytes.
|
static TikaInputStream |
TikaInputStream.get(java.io.File file,
Metadata metadata)
Deprecated.
Use TikaInputStream.get(Path, Metadata) instead. In Tika 2.0,
this will be removed or modified to throw an IOException. |
static TikaInputStream |
TikaInputStream.get(java.nio.file.Path path,
Metadata metadata)
Creates a TikaInputStream from the file at the given path.
|
static TikaInputStream |
TikaInputStream.get(java.net.URI uri,
Metadata metadata)
Creates a TikaInputStream from the resource at the given URI.
|
static TikaInputStream |
TikaInputStream.get(java.net.URL url,
Metadata metadata)
Creates a TikaInputStream from the resource at the given URL.
|
Modifier and Type | Method and Description |
---|---|
static void |
XMPDM.ChannelTypePropertyConverter.convertAndSet(Metadata metadata,
java.lang.Object value)
Deprecated.
How convert+set might work
|
Modifier and Type | Method and Description |
---|---|
Metadata |
JsonMetadataDeserializer.deserialize(com.google.gson.JsonElement element,
java.lang.reflect.Type type,
com.google.gson.JsonDeserializationContext context)
Deserializes a json object (equivalent to: Map
|
static Metadata |
JsonMetadata.fromJson(java.io.Reader reader)
Read metadata from reader.
|
Modifier and Type | Method and Description |
---|---|
static java.util.List<Metadata> |
JsonMetadataList.fromJson(java.io.Reader reader)
Reads a list of Metadata objects from the reader.
|
Modifier and Type | Method and Description |
---|---|
java.lang.String[] |
JsonMetadataBase.SortedJsonMetadataSerializer.getNames(Metadata m) |
protected java.lang.String[] |
JsonMetadataSerializer.getNames(Metadata metadata)
Override to get a custom sort order
or to filter names.
|
com.google.gson.JsonElement |
JsonMetadataSerializer.serialize(Metadata metadata,
java.lang.reflect.Type type,
com.google.gson.JsonSerializationContext context)
Serializes a Metadata object into effectively Map
|
static void |
JsonMetadata.toJson(Metadata metadata,
java.io.Writer writer)
Serializes a Metadata object to Json.
|
Modifier and Type | Method and Description |
---|---|
static void |
JsonMetadataList.toJson(java.util.List<Metadata> metadataList,
java.io.Writer writer)
Serializes a list of Metadata objects to JSON.
|
Modifier and Type | Method and Description |
---|---|
MediaType |
ProbabilisticMimeDetectionSelector.detect(java.io.InputStream input,
Metadata metadata) |
MediaType |
MimeTypes.detect(java.io.InputStream input,
Metadata metadata)
Automatically detects the MIME type of a document based on magic
markers in the stream prefix and any given metadata hints.
|
Modifier and Type | Field and Description |
---|---|
private Metadata |
ParsingReader.metadata
Metadata associated with the document being parsed.
|
private Metadata |
NetworkParser.MetaHandler.metadata |
Modifier and Type | Field and Description |
---|---|
private java.util.List<Metadata> |
RecursiveParserWrapper.metadatas |
Modifier and Type | Method and Description |
---|---|
private Metadata |
RecursiveParserWrapper.deepCopy(Metadata m) |
private static Metadata |
ParsingReader.getMetadata(java.lang.String name)
Utility method that returns a Metadata instance
for a document with the given name. |
Modifier and Type | Method and Description |
---|---|
java.util.List<Metadata> |
RecursiveParserWrapper.getMetadata()
The first element in the returned list represents the
data from the outer container file.
|
Modifier and Type | Method and Description |
---|---|
private void |
RecursiveParserWrapper.addContent(org.xml.sax.ContentHandler handler,
Metadata metadata) |
private Metadata |
RecursiveParserWrapper.deepCopy(Metadata m) |
void |
DigestingParser.Digester.digest(java.io.InputStream is,
Metadata m,
ParseContext parseContext)
Digests an InputStream and sets the appropriate value(s) in the metadata.
|
protected Parser |
CompositeParser.getParser(Metadata metadata)
Returns the parser that best matches the given metadata.
|
protected Parser |
CompositeParser.getParser(Metadata metadata,
ParseContext context) |
java.lang.String |
PasswordProvider.getPassword(Metadata metadata)
Looks up the password for a document with the given metadata,
and returns it for the Parser.
|
private java.lang.String |
RecursiveParserWrapper.getResourceName(Metadata metadata) |
void |
AbstractParser.parse(java.io.InputStream stream,
org.xml.sax.ContentHandler handler,
Metadata metadata)
Deprecated.
Use the Parser.parse(InputStream, ContentHandler, Metadata, ParseContext) method instead. |
void |
AutoDetectParser.parse(java.io.InputStream stream,
org.xml.sax.ContentHandler handler,
Metadata metadata) |
void |
ParserDecorator.parse(java.io.InputStream stream,
org.xml.sax.ContentHandler handler,
Metadata metadata,
ParseContext context)
Delegates the method call to the decorated parser.
|
void |
DigestingParser.parse(java.io.InputStream stream,
org.xml.sax.ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
NetworkParser.parse(java.io.InputStream stream,
org.xml.sax.ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
NetworkParser.ParsingTask.parse(java.io.InputStream stream,
org.xml.sax.ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
EmptyParser.parse(java.io.InputStream stream,
org.xml.sax.ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
AutoDetectParser.parse(java.io.InputStream stream,
org.xml.sax.ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
CompositeParser.parse(java.io.InputStream stream,
org.xml.sax.ContentHandler handler,
Metadata metadata,
ParseContext context)
Delegates the call to the matching component parser.
|
void |
RecursiveParserWrapper.parse(java.io.InputStream stream,
org.xml.sax.ContentHandler ignore,
Metadata metadata,
ParseContext context)
Acts like a regular parser except it ignores the ContentHandler
and it automatically sets/overwrites the embedded Parser in the
ParseContext object.
|
void |
RecursiveParserWrapper.EmbeddedParserDecorator.parse(java.io.InputStream stream,
org.xml.sax.ContentHandler ignore,
Metadata metadata,
ParseContext context) |
void |
ParserPostProcessor.parse(java.io.InputStream stream,
org.xml.sax.ContentHandler handler,
Metadata metadata,
ParseContext context)
Forwards the call to the delegated parser and post-processes the
results as described in the ParserPostProcessor class documentation.
|
void |
CryptoParser.parse(java.io.InputStream stream,
org.xml.sax.ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
DelegatingParser.parse(java.io.InputStream stream,
org.xml.sax.ContentHandler handler,
Metadata metadata,
ParseContext context)
Looks up the delegate parser from the parsing context and
delegates the parse operation to it.
|
void |
Parser.parse(java.io.InputStream stream,
org.xml.sax.ContentHandler handler,
Metadata metadata,
ParseContext context)
Parses a document stream into a sequence of XHTML SAX events.
|
void |
ErrorParser.parse(java.io.InputStream stream,
org.xml.sax.ContentHandler handler,
Metadata metadata,
ParseContext context) |
private void |
NetworkParser.parse(TikaInputStream stream,
org.xml.sax.ContentHandler handler,
Metadata metadata,
ParseContext context) |
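
The `CompositeParser` entries above describe choosing the parser that best matches the given metadata and delegating the call to it. The sketch below shows one simple way such a dispatch could work, keyed on a content type string. Everything in it is an assumption made for illustration: the class `CompositeDispatchSketch`, the `"Content-Type"` key, the use of `Function<byte[], String>` in place of a parser, and the fallback behaviour. It does not reproduce Tika's matching rules.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical dispatcher that picks a component "parser" by content type. */
final class CompositeDispatchSketch {
    private final Map<String, Function<byte[], String>> byType = new LinkedHashMap<>();
    private final Function<byte[], String> fallback;

    CompositeDispatchSketch(Function<byte[], String> fallback) {
        this.fallback = fallback;
    }

    /** Registers a component for one media type and returns this for chaining. */
    public CompositeDispatchSketch register(String type, Function<byte[], String> parser) {
        byType.put(type.toLowerCase(Locale.ROOT), parser);
        return this;
    }

    /** Selects the component matching the metadata hint, or the fallback. */
    public Function<byte[], String> select(Map<String, String> metadata) {
        String type = metadata.get("Content-Type");
        if (type != null) {
            // Drop parameters such as "; charset=UTF-8" before the lookup.
            int semi = type.indexOf(';');
            if (semi >= 0) {
                type = type.substring(0, semi);
            }
            type = type.trim().toLowerCase(Locale.ROOT);
        }
        return byType.getOrDefault(type, fallback);
    }

    /** Delegates the call to the selected component. */
    public String parse(byte[] data, Map<String, String> metadata) {
        return select(metadata).apply(data);
    }

    public static void main(String[] args) {
        CompositeDispatchSketch composite = new CompositeDispatchSketch(d -> "")
                .register("text/plain", d -> new String(d, StandardCharsets.UTF_8));
        Map<String, String> hint = Collections.singletonMap("Content-Type", "text/plain");
        System.out.println(composite.parse("hello".getBytes(StandardCharsets.UTF_8), hint)); // prints "hello"
    }
}
```

An explicit fallback keeps `parse` total: input of an unknown type yields an empty result here instead of a null pointer failure.
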
Constructor and Description |
---|
MetaHandler(Metadata metadata) |
ParsingReader(Parser parser,
java.io.InputStream stream,
Metadata metadata,
ParseContext context)
Creates a reader for the text content of the given binary stream
with the given document metadata.
|
ParsingReader(Parser parser,
java.io.InputStream stream,
Metadata metadata,
ParseContext context,
java.util.concurrent.Executor executor)
Creates a reader for the text content of the given binary stream
with the given document metadata.
|
Modifier and Type | Method and Description |
---|---|
private void |
AudioParser.addMetadata(Metadata metadata,
java.util.Map<java.lang.String,java.lang.Object> properties) |
void |
AudioParser.parse(java.io.InputStream stream,
org.xml.sax.ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
MidiParser.parse(java.io.InputStream stream,
org.xml.sax.ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
EnviHeaderParser.parse(java.io.InputStream stream,
org.xml.sax.ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
void |
EpubParser.parse(java.io.InputStream stream,
org.xml.sax.ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
EpubContentParser.parse(java.io.InputStream stream,
org.xml.sax.ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
private void |
ExternalParser.extractMetadata(java.io.InputStream stream,
Metadata metadata) |
void |
ExternalParser.parse(java.io.InputStream stream,
org.xml.sax.ContentHandler handler,
Metadata metadata,
ParseContext context)
Executes the configured external command and passes the given document
stream as a simple XHTML document to the given SAX content handler.
|
private void |
ExternalParser.parse(TikaInputStream stream,
XHTMLContentHandler xhtml,
Metadata metadata,
TemporaryResources tmp) |
Modifier and Type | Method and Description |
---|---|
void |
FeedParser.parse(java.io.InputStream stream,
org.xml.sax.ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Method and Description |
---|---|
private void |
GDALParser.applyPatternsToOutput(java.lang.String output,
Metadata metadata,
java.util.Map<java.util.regex.Pattern,java.lang.String> metadataPatterns) |
private void |
GDALParser.extractMetFromOutput(java.lang.String output,
Metadata met) |
void |
GDALParser.parse(java.io.InputStream stream,
org.xml.sax.ContentHandler handler,
Metadata metadata,
ParseContext context) |
private void |
GDALParser.processOutput(org.xml.sax.ContentHandler handler,
Metadata metadata,
java.lang.String output) |
Modifier and Type | Method and Description |
---|---|
void |
IptcAnpaParser.parse(java.io.InputStream stream,
org.xml.sax.ContentHandler handler,
Metadata metadata)
Deprecated.
This method will be removed in Apache Tika 1.0.
|
void |
IptcAnpaParser.parse(java.io.InputStream stream,
org.xml.sax.ContentHandler handler,
Metadata metadata,
ParseContext context) |
private void |
IptcAnpaParser.setMetadata(Metadata metadata,
java.util.HashMap<java.lang.String,java.lang.String> properties) |
Modifier and Type | Field and Description |
---|---|
private Metadata |
KeynoteContentHandler.metadata |
private Metadata |
NumbersContentHandler.metadata |
private Metadata |
PagesContentHandler.metadata |
Modifier and Type | Method and Description |
---|---|
void |
IWorkPackageParser.parse(java.io.InputStream stream,
org.xml.sax.ContentHandler handler,
Metadata metadata,
ParseContext context) |
Constructor and Description |
---|
KeynoteContentHandler(XHTMLContentHandler xhtml,
Metadata metadata) |
NumbersContentHandler(XHTMLContentHandler xhtml,
Metadata metadata) |
PagesContentHandler(XHTMLContentHandler xhtml,
Metadata metadata) |
Modifier and Type | Method and Description |
---|---|
private void |
Latin1StringsParser.doParse(java.io.InputStream stream,
org.xml.sax.ContentHandler handler,
Metadata metadata,
ParseContext context)
Makes a best-effort attempt to extract Latin1 strings encoded with ISO-8859-1,
UTF-8 or UTF-16.
|
void |
Latin1StringsParser.parse(java.io.InputStream stream,
org.xml.sax.ContentHandler handler,
Metadata metadata,
ParseContext context) |
void |
StringsParser.parse(java.io.InputStream stream,
org.xml.sax.ContentHandler handler,
Metadata metadata,
ParseContext context) |
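
`Latin1StringsParser.doParse` above is described as a best-effort extraction of strings from arbitrary input. The general technique, familiar from the Unix `strings` tool, is to scan for runs of printable bytes and keep those above a minimum length. The sketch below covers only the single-byte ISO-8859-1 case. The class `PrintableRunsSketch`, the `minLength` parameter and the definition of "printable" are assumptions, and the UTF-8 and UTF-16 handling mentioned in the description is left out.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical strings-style scanner for ISO-8859-1 data. */
final class PrintableRunsSketch {
    /** Returns every run of printable bytes that is at least minLength long. */
    public static List<String> extract(byte[] data, int minLength) {
        List<String> runs = new ArrayList<>();
        int start = -1; // index where the current run began, or -1 when outside a run
        // Iterate one step past the end so a run touching the end is flushed too.
        for (int i = 0; i <= data.length; i++) {
            boolean printable = i < data.length && isPrintable(data[i] & 0xFF);
            if (printable) {
                if (start < 0) {
                    start = i;
                }
            } else if (start >= 0) {
                if (i - start >= minLength) {
                    runs.add(new String(data, start, i - start, StandardCharsets.ISO_8859_1));
                }
                start = -1;
            }
        }
        return runs;
    }

    // Assumed definition: tab, printable ASCII, and the upper Latin-1 range.
    private static boolean isPrintable(int b) {
        return b == '\t' || (b >= 0x20 && b <= 0x7E) || b >= 0xA0;
    }

    public static void main(String[] args) {
        byte[] data = {0, 'a', 'b', 'c', 'd', 1, 'x', 'y', 0, 'h', 'e', 'l', 'l', 'o'};
        System.out.println(extract(data, 4)); // prints "[abcd, hello]"
    }
}
```

The minimum length filters out short accidental matches, such as the two-byte run `xy` in the example data.
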
Modifier and Type | Method and Description |
---|---|
void |
CommonsDigester.digest(java.io.InputStream is,
Metadata m,
ParseContext parseContext) |
private boolean |
CommonsDigester.digestEach(CommonsDigester.DigestAlgorithm algorithm,
java.io.InputStream is,
Metadata metadata) |
private void |
CommonsDigester.digestFile(java.io.File f,
Metadata m) |
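
The `digest` methods listed above take an `InputStream` and a `Metadata` object. `DigestingParser.Digester.digest` is described as digesting the stream and setting the appropriate value(s) in the metadata. The sketch below shows that pattern with `java.security.MessageDigest`. The class `StreamDigestSketch`, the plain `Map` standing in for `Metadata`, and the key format `digest:<algorithm>` are assumptions for illustration. Unlike a real digester, this sketch consumes the stream and does not make it readable again for later parsing.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical digester that records a hex digest in a metadata map. */
final class StreamDigestSketch {
    /** Reads the stream to its end and stores the digest under "digest:" + algorithm. */
    public static void digest(InputStream is, Map<String, String> metadata, String algorithm) {
        try {
            MessageDigest md = MessageDigest.getInstance(algorithm);
            byte[] buffer = new byte[8192];
            int n;
            // Feed the stream to the digest in chunks so large inputs are not buffered whole.
            while ((n = is.read(buffer)) != -1) {
                md.update(buffer, 0, n);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b & 0xFF));
            }
            metadata.put("digest:" + algorithm, hex.toString());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unknown digest algorithm: " + algorithm, e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new HashMap<>();
        InputStream in = new ByteArrayInputStream("abc".getBytes(StandardCharsets.US_ASCII));
        digest(in, metadata, "MD5");
        System.out.println(metadata.get("digest:MD5")); // prints "900150983cd24fb0d6963f7d28e17f72"
    }
}
```
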
Modifier and Type | Method and Description |
---|---|
void |
FLVParser.parse(java.io.InputStream stream,
org.xml.sax.ContentHandler handler,
Metadata metadata,
ParseContext context) |
Modifier and Type | Field and Description |
---|---|
private Metadata |
FictionBookParser.BinaryElementsDataHandler.metadata |
private Metadata |
ElementMetadataHandler.metadata |
private Metadata |
MetadataHandler.metadata
Deprecated.
|
private Metadata |
AbstractMetadataHandler.metadata |
private Metadata |
AttributeDependantMetadataHandler.metadata |
Modifier and Type | Method and Description |
---|---|
protected org.xml.sax.ContentHandler |
DcXMLParser.getContentHandler(org.xml.sax.ContentHandler handler,
Metadata metadata,
ParseContext context) |
protected org.xml.sax.ContentHandler |
FictionBookParser.getContentHandler(org.xml.sax.ContentHandler handler,
Metadata metadata,
ParseContext context) |
protected org.xml.sax.ContentHandler |
XMLParser.getContentHandler(org.xml.sax.ContentHandler handler,
Metadata metadata,
ParseContext context) |
private static org.xml.sax.ContentHandler |
DcXMLParser.getDublinCoreHandler(Metadata metadata,
Property property,
java.lang.String element) |
void |
XMLParser.parse(java.io.InputStream stream,
org.xml.sax.ContentHandler handler,
Metadata metadata,
ParseContext context) |
Constructor and Description |
---|
AbstractMetadataHandler(Metadata metadata,
Property property) |
AbstractMetadataHandler(Metadata metadata,
java.lang.String name) |
AttributeDependantMetadataHandler(Metadata metadata,
java.lang.String nameHoldingAttribute,
java.lang.String namePrefix) |
AttributeMetadataHandler(java.lang.String uri,
java.lang.String localName,
Metadata metadata,
Property property) |
AttributeMetadataHandler(java.lang.String uri,
java.lang.String localName,
Metadata metadata,
java.lang.String name) |
ElementMetadataHandler(java.lang.String uri,
java.lang.String localName,
Metadata metadata,
Property targetProperty)
Constructor for Property metadata keys.
|
ElementMetadataHandler(java.lang.String uri,
java.lang.String localName,
Metadata metadata,
Property targetProperty,
boolean allowDuplicateValues,
boolean allowEmptyValues)
Constructor for Property metadata keys which allows change of behavior
for duplicate and empty entry values.
|
ElementMetadataHandler(java.lang.String uri,
java.lang.String localName,
Metadata metadata,
java.lang.String name)
Constructor for string metadata keys.
|
ElementMetadataHandler(java.lang.String uri,
java.lang.String localName,
Metadata metadata,
java.lang.String name,
boolean allowDuplicateValues,
boolean allowEmptyValues)
Constructor for string metadata keys which allows change of behavior
for duplicate and empty entry values.
|
MetadataHandler(Metadata metadata,
Property property)
Deprecated.
|
MetadataHandler(Metadata metadata,
java.lang.String name)
Deprecated.
|
Modifier and Type | Field and Description |
---|---|
private Metadata |
DIFContentHandler.metadata |
private Metadata |
PhoneExtractingContentHandler.metadata |
private Metadata |
XHTMLContentHandler.metadata
Metadata associated with the document.
|
Modifier and Type | Method and Description |
---|---|
private void |
XMPContentHandler.description(Metadata metadata,
java.lang.String prefix,
java.lang.String uri) |
void |
XMPContentHandler.metadata(Metadata metadata) |
Constructor and Description |
---|
DIFContentHandler(org.xml.sax.ContentHandler delegate,
Metadata metadata) |
PhoneExtractingContentHandler(org.xml.sax.ContentHandler handler,
Metadata metadata)
Creates a decorator for the given SAX event handler and Metadata object.
|
XHTMLContentHandler(org.xml.sax.ContentHandler handler,
Metadata metadata) |