public class ExternalEmbedder extends java.lang.Object implements Embedder

Modifier and Type | Field and Description |
---|---|
private java.lang.String[] | command: The external command to invoke. |
private java.lang.String | commandAppendOperator |
private java.lang.String | commandAssignmentDelimeter |
private java.lang.String | commandAssignmentOperator |
static java.lang.String | METADATA_COMMAND_ARGUMENTS_SERIALIZED_TOKEN: Token to be replaced with the metadata assignment command arguments serialized into a single string. |
static java.lang.String | METADATA_COMMAND_ARGUMENTS_TOKEN: Token to be replaced with a String array of metadata assignment command arguments. |
private java.util.Map<Property,java.lang.String[]> | metadataCommandArguments: Mapping of Tika metadata to command line parameters. |
private boolean | quoteAssignmentValues |
private static long | serialVersionUID |
private java.util.Set<MediaType> | supportedEmbedTypes: Media types supported by the external program. |
private TemporaryResources | tmp |

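
The commandAssignmentOperator, commandAppendOperator, and quoteAssignmentValues fields suggest how a single metadata assignment argument is put together. The following stdlib-only sketch assumes a simple scheme: a single value becomes parameter + assignment operator + value, each value of a multi-valued field is written with the append operator instead, and quoting wraps the value in single quotes. The class AssignmentSketch, its method segments, the map of parameters to values (standing in for metadataCommandArguments plus Metadata), and the example parameters -Author and -Keywords are all hypothetical. This is not Tika's implementation, and the assignment delimiter is not modeled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: builds command line assignment segments such as "-Author=Jane". */
class AssignmentSketch {
    private final String assignmentOperator; // e.g. "="
    private final String appendOperator;     // e.g. "+=" (assumed example value)
    private final boolean quoteValues;

    AssignmentSketch(String assignmentOperator, String appendOperator, boolean quoteValues) {
        this.assignmentOperator = assignmentOperator;
        this.appendOperator = appendOperator;
        this.quoteValues = quoteValues;
    }

    /** Wraps a value in single quotes when quoting is enabled (assumed quoting style). */
    private String format(String value) {
        return quoteValues ? "'" + value + "'" : value;
    }

    /** A single value is assigned; each value of a multi-valued field is appended. */
    List<String> segments(Map<String, List<String>> valuesByParameter) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : valuesByParameter.entrySet()) {
            String parameter = entry.getKey();
            List<String> values = entry.getValue();
            if (values.size() == 1) {
                result.add(parameter + assignmentOperator + format(values.get(0)));
            } else {
                for (String value : values) {
                    result.add(parameter + appendOperator + format(value));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> metadata = new LinkedHashMap<>();
        metadata.put("-Author", List.of("Jane Doe"));
        metadata.put("-Keywords", List.of("red", "blue"));
        AssignmentSketch sketch = new AssignmentSketch("=", "+=", true);
        // prints [-Author='Jane Doe', -Keywords+='red', -Keywords+='blue']
        System.out.println(sketch.segments(metadata));
    }
}
```

A LinkedHashMap is used only so that the generated arguments keep a predictable order.
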
Constructor and Description |
---|
ExternalEmbedder() |

Modifier and Type | Method and Description |
---|---|
static boolean | check(java.lang.String[] checkCmd, int... errorValue): Checks to see if the command can be run. |
static boolean | check(java.lang.String checkCmd, int... errorValue): Checks to see if the command can be run. |
void | embed(Metadata metadata, java.io.InputStream inputStream, java.io.OutputStream outputStream, ParseContext context): Executes the configured external command on the given document stream and writes the resulting document, with the metadata embedded, to the given output stream. |
java.lang.String[] | getCommand(): Gets the command to be run. |
java.lang.String | getCommandAppendOperator(): Gets the operator that the command line tool uses to append a value rather than replace it. |
java.lang.String | getCommandAssignmentDelimeter(): Gets the delimiter that separates multiple assignments for the command line tool. |
java.lang.String | getCommandAssignmentOperator(): Gets the assignment operator for the command line tool. |
protected java.util.List<java.lang.String> | getCommandMetadataSegments(Metadata metadata): Constructs a collection of command line arguments that set individual metadata fields based on the given metadata. |
java.util.Map<Property,java.lang.String[]> | getMetadataCommandArguments(): Gets the map of Metadata keys to command line parameters. |
java.util.Set<MediaType> | getSupportedEmbedTypes() |
java.util.Set<MediaType> | getSupportedEmbedTypes(ParseContext context): Returns the set of media types supported by this embedder when used with the given parse context. |
boolean | isQuoteAssignmentValues(): Gets whether or not to quote assignment values. |
private void | multiThreadedStreamCopy(java.io.InputStream inputStream, java.io.OutputStream outputStream): Creates a new thread for copying a given input stream to a given output stream. |
private void | sendInputStreamToStdIn(java.io.InputStream inputStream, java.lang.Process process): Sends the contents of the given input stream to the standard input of the given process. |
private void | sendStdErrToOutputStream(java.lang.Process process, java.io.OutputStream outputStream): Starts a thread that copies the standard error stream of the given process to the given output stream. |
private void | sendStdOutToOutputStream(java.lang.Process process, java.io.OutputStream outputStream): Sends the standard output of the given process to the given output stream. |
protected static java.lang.String | serializeMetadata(java.util.List<java.lang.String> metadataCommandArguments): Serializes a collection of metadata command line arguments into a single string. |
void | setCommand(java.lang.String... command): Sets the command to be run. |
void | setCommandAppendOperator(java.lang.String commandAppendOperator): Sets the operator that the command line tool uses to append a value rather than replace it. |
void | setCommandAssignmentDelimeter(java.lang.String commandAssignmentDelimeter): Sets the delimiter that separates multiple assignments for the command line tool. |
void | setCommandAssignmentOperator(java.lang.String commandAssignmentOperator): Sets the assignment operator for the command line tool. |
void | setMetadataCommandArguments(java.util.Map<Property,java.lang.String[]> arguments): Sets the map of Metadata keys to command line parameters. |
void | setQuoteAssignmentValues(boolean quoteAssignmentValues): Sets whether or not to quote assignment values. |
void | setSupportedEmbedTypes(java.util.Set<MediaType> supportedEmbedTypes) |

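
Several private helpers listed above (multiThreadedStreamCopy, sendStdOutToOutputStream, sendStdErrToOutputStream) copy one stream to another on a separate thread. A common reason for this pattern is that a child process can block if its standard output or error pipe fills up while nothing reads it, as the java.lang.Process documentation warns. The sketch below shows such a background copy. StreamCopySketch and its methods are hypothetical names; leaving both streams open follows the note on sendStdOutToOutputStream, while silently ending the copy on an IOException is an assumption made for brevity.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of copying one stream to another on a background thread. */
class StreamCopySketch {

    /** Starts a daemon thread that copies all bytes from in to out; neither stream is closed. */
    static Thread copyInBackground(InputStream in, OutputStream out) {
        Thread thread = new Thread(() -> {
            byte[] buffer = new byte[8192];
            try {
                int n;
                while ((n = in.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
                out.flush();
            } catch (IOException ignored) {
                // Assumption: an I/O error simply ends the copy.
            }
        });
        thread.setDaemon(true);
        thread.start();
        return thread;
    }

    /** Copies in to out on a background thread and waits for the copy to finish. */
    static void copyAndWait(InputStream in, OutputStream out) {
        Thread thread = copyInBackground(in, out);
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
        }
    }

    public static void main(String[] args) {
        InputStream in = new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        copyAndWait(in, out);
        System.out.println(new String(out.toByteArray(), StandardCharsets.UTF_8)); // prints hello
    }
}
```

With a real process, copyInBackground would typically be given process.getInputStream() (the child's standard output) or process.getErrorStream(), and the caller would join the thread before using the copied data.
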

private static final long serialVersionUID

public static final java.lang.String METADATA_COMMAND_ARGUMENTS_TOKEN

public static final java.lang.String METADATA_COMMAND_ARGUMENTS_SERIALIZED_TOKEN

private java.util.Set<MediaType> supportedEmbedTypes

private java.util.Map<Property,java.lang.String[]> metadataCommandArguments

private java.lang.String[] command
See Also: Runtime.exec(String[])

private java.lang.String commandAssignmentOperator

private java.lang.String commandAssignmentDelimeter

private java.lang.String commandAppendOperator

private boolean quoteAssignmentValues

private TemporaryResources tmp
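
The two token fields differ in how they are expanded: METADATA_COMMAND_ARGUMENTS_TOKEN stands for a String array of metadata assignment arguments, while serializeMetadata turns such a collection into a single string. The sketch below illustrates one plausible reading, in which a command element equal to the array token is replaced by several separate arguments and the serialized token is replaced inside a single argument. The placeholder strings, the class TokenExpansionSketch, the example program name mytool, and joining with a space are assumptions; they are not the actual constant values or the actual serialization format.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of expanding metadata placeholders inside a command array. */
class TokenExpansionSketch {
    // Assumed placeholder values; the real constants may differ.
    static final String ARGS_TOKEN = "${METADATA}";
    static final String SERIALIZED_TOKEN = "${METADATA_SERIALIZED}";

    static List<String> expand(String[] command, List<String> metadataArgs) {
        String serialized = String.join(" ", metadataArgs); // assumed serialization format
        List<String> result = new ArrayList<>();
        for (String segment : command) {
            if (segment.equals(ARGS_TOKEN)) {
                // The array token becomes several separate command line arguments.
                result.addAll(metadataArgs);
            } else {
                // The serialized token is replaced in place by one string.
                result.add(segment.replace(SERIALIZED_TOKEN, serialized));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> metadataArgs = List.of("-Author=Jane", "-Title=Report");
        String[] command = {"mytool", ARGS_TOKEN, "--summary=" + SERIALIZED_TOKEN};
        // prints [mytool, -Author=Jane, -Title=Report, --summary=-Author=Jane -Title=Report]
        System.out.println(expand(command, metadataArgs));
    }
}
```

Splicing the array form keeps each assignment as its own process argument, which avoids shell-style quoting problems; the serialized form suits tools that expect all assignments inside one argument.
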

public java.util.Set<MediaType> getSupportedEmbedTypes(ParseContext context)
Description copied from interface: Embedder
Returns the set of media types supported by this embedder when used with the given parse context. The method name differs from that of Parser.getSupportedTypes(ParseContext) so that parser implementations may also choose to implement this interface.
Specified by: getSupportedEmbedTypes in interface Embedder
Parameters:
context - parse context

public java.util.Set<MediaType> getSupportedEmbedTypes()

public void setSupportedEmbedTypes(java.util.Set<MediaType> supportedEmbedTypes)

public java.lang.String[] getCommand()
Gets the command to be run. The command may include INPUT_FILE_TOKEN or OUTPUT_FILE_TOKEN if it needs filenames.

public void setCommand(java.lang.String... command)
Sets the command to be run. The command may include INPUT_FILE_TOKEN or OUTPUT_FILE_TOKEN if it needs filenames.
See Also: Runtime.exec(String[])
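
When the command needs filenames, the INPUT_FILE_TOKEN and OUTPUT_FILE_TOKEN placeholders must be replaced with real paths before the command is run. The following sketch shows that substitution; the placeholder values, the class FileTokenSketch, the example program mytool, and the rule that a command without an input placeholder reads the document from standard input are assumptions for illustration only.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of substituting file placeholders into a command array. */
class FileTokenSketch {
    // Assumed placeholder values standing in for INPUT_FILE_TOKEN and OUTPUT_FILE_TOKEN.
    static final String INPUT_TOKEN = "${INPUT}";
    static final String OUTPUT_TOKEN = "${OUTPUT}";

    /** Replaces every placeholder occurrence with the matching file path. */
    static List<String> resolve(String[] command, Path input, Path output) {
        List<String> resolved = new ArrayList<>();
        for (String segment : command) {
            resolved.add(segment
                    .replace(INPUT_TOKEN, input.toString())
                    .replace(OUTPUT_TOKEN, output.toString()));
        }
        return resolved;
    }

    /** True when no segment names an input file (assumed to mean: feed the document to stdin). */
    static boolean usesStdIn(String[] command) {
        for (String segment : command) {
            if (segment.contains(INPUT_TOKEN)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] command = {"mytool", "-i", INPUT_TOKEN, "-o", OUTPUT_TOKEN};
        List<String> resolved = resolve(command, Paths.get("in.pdf"), Paths.get("out.pdf"));
        System.out.println(resolved);           // prints [mytool, -i, in.pdf, -o, out.pdf]
        System.out.println(usesStdIn(command)); // prints false
    }
}
```

The resolved list can be passed directly to new ProcessBuilder(resolved), which, like Runtime.exec(String[]), takes each element as a separate argument without shell parsing.
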
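
public java.lang.String getCommandAssignmentOperator()
Gets the assignment operator for the command line tool.

public void setCommandAssignmentOperator(java.lang.String commandAssignmentOperator)
Sets the assignment operator for the command line tool.
Parameters:
commandAssignmentOperator - the assignment operator

public java.lang.String getCommandAssignmentDelimeter()
Gets the delimiter that separates multiple assignments for the command line tool.

public void setCommandAssignmentDelimeter(java.lang.String commandAssignmentDelimeter)
Sets the delimiter that separates multiple assignments for the command line tool.
Parameters:
commandAssignmentDelimeter - the assignment delimiter

public java.lang.String getCommandAppendOperator()
Gets the operator that the command line tool uses to append a value rather than replace it.

public void setCommandAppendOperator(java.lang.String commandAppendOperator)
Sets the operator that the command line tool uses to append a value rather than replace it.
Parameters:
commandAppendOperator - the append operator

public boolean isQuoteAssignmentValues()
Gets whether or not to quote assignment values.

public void setQuoteAssignmentValues(boolean quoteAssignmentValues)
Sets whether or not to quote assignment values.
Parameters:
quoteAssignmentValues - whether or not to quote assignment values

public java.util.Map<Property,java.lang.String[]> getMetadataCommandArguments()
Gets the map of Metadata keys to command line parameters.

public void setMetadataCommandArguments(java.util.Map<Property,java.lang.String[]> arguments)
Sets the map of Metadata keys to command line parameters.
Parameters:
arguments - the map of Metadata keys to command line parameters

protected java.util.List<java.lang.String> getCommandMetadataSegments(Metadata metadata)
Constructs a collection of command line arguments that set individual metadata fields based on the given metadata.
Parameters:
metadata - the metadata to embed

protected static java.lang.String serializeMetadata(java.util.List<java.lang.String> metadataCommandArguments)
Serializes a collection of metadata command line arguments into a single string.
Parameters:
metadataCommandArguments - the metadata command line arguments to serialize

public void embed(Metadata metadata, java.io.InputStream inputStream, java.io.OutputStream outputStream, ParseContext context) throws java.io.IOException, TikaException
Executes the configured external command on the given document stream and writes the resulting document, with the metadata embedded, to the given output stream. Metadata is only embedded if setMetadataCommandArguments(Map) has been called to set arguments.
Specified by: embed in interface Embedder
Parameters:
metadata - document metadata (input and output)
inputStream - the document stream (input)
outputStream - the output stream to which the document with embedded metadata is written
context - parse context
Throws:
java.io.IOException - if the document stream could not be read
TikaException - if the document could not be parsed

private void multiThreadedStreamCopy(java.io.InputStream inputStream, java.io.OutputStream outputStream)
Creates a new thread for copying a given input stream to a given output stream.
Parameters:
inputStream - the source input stream
outputStream - the target output stream

private void sendInputStreamToStdIn(java.io.InputStream inputStream, java.lang.Process process)
Sends the contents of the given input stream to the standard input of the given process. Note that the given input stream is not closed by this method.
Parameters:
process - the process
inputStream - the input stream to send to the standard input of the process

private void sendStdOutToOutputStream(java.lang.Process process, java.io.OutputStream outputStream)
Sends the standard output of the given process to the given output stream. Note that the given output stream is not closed by this method.
Parameters:
process - the process
outputStream - the output stream to which the standard output of the process is sent

private void sendStdErrToOutputStream(java.lang.Process process, java.io.OutputStream outputStream)
Starts a thread that copies the standard error stream of the given process to the given output stream.
Parameters:
process - the process
outputStream - the output stream to which the standard error of the process is sent

public static boolean check(java.lang.String checkCmd, int... errorValue)
Checks to see if the command can be run.
Parameters:
checkCmd - the check command to run
errorValue - the process exit values that are treated as errors

public static boolean check(java.lang.String[] checkCmd, int... errorValue)
Checks to see if the command can be run.
Parameters:
checkCmd - the check command to run
errorValue - the process exit values that are treated as errors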
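
Both check overloads report whether a command can be run, with errorValue listing the exit values that count as failure. A minimal sketch of that idea using java.lang.ProcessBuilder follows. The class name CheckSketch, the default error value 127 (the exit status POSIX shells use for "command not found"), the 10-second timeout, and the probe command myapp --version are assumptions, not details taken from this page. A command that cannot be started at all, for example because it is not on the PATH, is reported as not runnable.

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of checking whether an external command can be run. */
class CheckSketch {

    static boolean check(String[] checkCmd, int... errorValue) {
        if (checkCmd.length == 0) {
            return false; // nothing to run
        }
        if (errorValue.length == 0) {
            errorValue = new int[] {127}; // assumed default: "command not found"
        }
        try {
            Process process = new ProcessBuilder(checkCmd)
                    .redirectErrorStream(true)                        // merge stderr into stdout
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD)  // and discard it
                    .start();
            if (!process.waitFor(10, TimeUnit.SECONDS)) {             // assumed timeout
                process.destroyForcibly();
                return false;
            }
            int exitValue = process.exitValue();
            for (int error : errorValue) {
                if (exitValue == error) {
                    return false;
                }
            }
            return true;
        } catch (IOException e) {
            return false; // the program could not be started, e.g. it is not on the PATH
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        // "myapp --version" is a hypothetical probe command.
        System.out.println(check(new String[] {"myapp", "--version"}));
    }
}
```

Discarding the child's output keeps the probe from blocking on a full pipe, so no copy thread is needed here.
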