Introduction

The file transport, also known as the VFS (Virtual File System) transport, can be used to read, mediate and write file content using Synapse. This transport allows Synapse to interface with the local file system and remote file systems via file transfer protocols such as FTP.

The file transport is based on Apache Commons VFS, and supports all the file transfer protocols supported by Commons VFS. This includes interactions with the local file system, HTTP, HTTPS, FTP and SFTP (i.e. file transfer over SSH).

There is a fundamental difference between the file transport and transports such as HTTP, and it is important to understand this difference in order to use the file transport correctly. The HTTP transport binds to a single protocol endpoint, i.e. a TCP port on which it accepts incoming HTTP requests, and dispatches each request to the appropriate service based on the request URI.

The file transport, on the other hand, only receives the payload of a message (i.e. the file), without any additional information that could be used to dispatch the message to a service. File system locations must therefore be explicitly mapped to services, using a set of service parameters. For Synapse, this means that the VFS transport listener can only be used in conjunction with proxy services. The relevant service parameters are specified in the proxy service configuration as follows:

<proxy name="MyVFSService" transports="vfs">
    <parameter name="transport.vfs.FileURI">file:///var/spool/synapse/in</parameter>
    <parameter name="transport.vfs.ContentType">application/xml</parameter>
    ...
    <target>
        ...
    </target>
</proxy>

In the above example the file system location file:///var/spool/synapse/in is explicitly bound to MyVFSService. Any file dropped in that location will be pre-dispatched to MyVFSService, bypassing any other configured dispatch mechanisms that would normally apply to messages received via HTTP.
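The explicit binding can be pictured as a lookup table from polled locations to proxy services. The following Java sketch is purely illustrative and is not Synapse's implementation. The class VfsDispatchSketch and its bind and serviceFor methods are hypothetical. The rule that a file belongs to the location of its parent directory (or to its own URI, if that is bound) is an assumption made for this example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration (not Synapse code): each polled location is bound
// to exactly one proxy service, because a file carries no URI or header that
// could be used to choose a service.
final class VfsDispatchSketch {
    private final Map<String, String> bindings = new HashMap<>();

    // Binds a polled location (a transport.vfs.FileURI value) to a proxy service.
    void bind(String fileUri, String proxyName) {
        bindings.put(stripTrailingSlash(fileUri), proxyName);
    }

    // Returns the proxy bound to the file itself or to its parent directory.
    Optional<String> serviceFor(String fileUri) {
        String normalized = stripTrailingSlash(fileUri);
        String direct = bindings.get(normalized);
        if (direct != null) {
            return Optional.of(direct);
        }
        int slash = normalized.lastIndexOf('/');
        if (slash < 0) {
            return Optional.empty();
        }
        return Optional.ofNullable(bindings.get(normalized.substring(0, slash)));
    }

    private static String stripTrailingSlash(String uri) {
        return uri.endsWith("/") ? uri.substring(0, uri.length() - 1) : uri;
    }
}
```

In this sketch, a file found outside every bound location has no service to go to, so serviceFor returns an empty Optional for it.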


Transport Configuration

The file transport consists of a transport listener component and a transport sender component. Proxy services can read files using the file transport listener, and they can write file content using the file transport sender. The following sections describe how to configure these two components.

File Transport Listener (VFS Listener)

Before a proxy service can read files, the VFS listener must be enabled in the SYNAPSE_HOME/repository/conf/axis2.xml file of Synapse. Look for the following XML configuration in the axis2.xml file, and uncomment it if it's commented out.

<transportReceiver name="vfs" class="org.apache.synapse.transport.vfs.VFSTransportListener"/>

To configure a proxy service to receive messages via the VFS listener (i.e. read files from some local or remote location), set the "transports" attribute on the proxy service element to "vfs":

<proxy name="MyVFSService" transports="vfs"> ... </proxy>

It is also possible to expose a proxy service on the VFS transport and several other transports at once. Simply specify the required transports as a space-separated list in the "transports" attribute:

<proxy name="MyVFSService" transports="vfs http https"> ... </proxy>

A proxy service configured with the VFS listener can be further customized by setting a number of parameters, some of which are required. The following table lists all the supported service parameters. Please refer to sample 254 for an example that demonstrates how to use some of these settings.

Each entry below gives the parameter name, a description with one or more examples, whether the parameter is required, and its default value.
transport.vfs.FileURI
The primary location to read the file contents from. This must be specified as a valid URI and may point to either a file or a directory. If a directory location is specified, the transport will attempt to read any file dropped into the directory.
<parameter name="transport.vfs.FileURI">file:///home/user/test/in</parameter>
<parameter name="transport.vfs.FileURI">sftp://bob:password@example.com/logs</parameter>
Required: Yes | Default: N/A
transport.vfs.ContentType
The expected content type for files retrieved for this service. The VFS transport uses this information to select the appropriate message builder.
<parameter name="transport.vfs.ContentType">text/xml</parameter>
Required: Yes | Default: N/A
transport.vfs.FileNamePattern
A regular expression that file names must match in order to be picked up when transport.vfs.FileURI points to a directory.
<parameter name="transport.vfs.FileNamePattern">.*\.xml</parameter>
Required: No | Default: N/A
transport.PollInterval
The polling interval in seconds.
<parameter name="transport.PollInterval">10</parameter>
Required: No | Default: 300
transport.vfs.ActionAfterProcess
Once a file has been read and successfully processed by Synapse (i.e. without any errors or runtime exceptions), the file should be either moved or deleted to prevent Synapse from processing it a second time. This parameter specifies which of these actions is taken. Allowed values are MOVE or DELETE.
<parameter name="transport.vfs.ActionAfterProcess">MOVE</parameter>
Required: No | Default: DELETE
transport.vfs.MoveAfterProcess
The location to which files are moved after they have been processed successfully. Required if transport.vfs.ActionAfterProcess is set to MOVE; ignored otherwise. The value must be a valid URI (local or remote).
<parameter name="transport.vfs.MoveAfterProcess">file:///home/test/original</parameter>
Required: No | Default: N/A
transport.vfs.ActionAfterFailure
If Synapse encounters an error while processing a file, the file should be either moved or deleted to prevent Synapse from processing it a second time. This parameter specifies which of these actions is taken. Allowed values are MOVE or DELETE.
<parameter name="transport.vfs.ActionAfterFailure">MOVE</parameter>
Required: No | Default: DELETE
transport.vfs.MoveAfterFailure
The location to which files are moved after a failure. Required if transport.vfs.ActionAfterFailure is set to MOVE; ignored otherwise. The value must be a valid URI (local or remote).
<parameter name="transport.vfs.MoveAfterFailure">file:///home/user/test/error</parameter>
Required: No | Default: N/A
transport.vfs.ReplyFileURI
The location (as a URI) where the reply file is written, in case the proxy service should generate a response message (file) after processing an input file.
<parameter name="transport.vfs.ReplyFileURI">file:///home/user/test/out</parameter>
Required: No | Default: N/A
transport.vfs.ReplyFileName
Name of the response file that should be generated by the proxy service.
<parameter name="transport.vfs.ReplyFileName">reply.xml</parameter>
Required: No | Default: response.xml or response.dat, depending on the content type of the response
transport.vfs.MoveTimestampFormat
A timestamp format string compatible with java.text.SimpleDateFormat. If specified, Synapse adds a timestamp in this format to the name of each file it moves to a new location (i.e. when moving a file after processing it or after a failure).
<parameter name="transport.vfs.MoveTimestampFormat">yy-MM-dd:HHmmss</parameter>
Required: No | Default: N/A
transport.vfs.Locking
File locking ensures that each file is accessed by only one proxy service at any given instant. This is important when multiple proxy services read files from the same location, or when one proxy service is configured to read the files written by another proxy service. By default, file locking is globally enabled in the VFS transport, and this parameter lets you configure the locking behavior on a per-service basis. Allowed values are enable and disable. Both values are useful: locking can be disabled globally in the transport receiver configuration (in axis2.xml) and then selectively enabled for a set of services. To configure the global locking behavior, set this parameter in axis2.xml under the VFS transport receiver configuration.
<parameter name="transport.vfs.Locking">disable</parameter>
Required: No | Default: enable
transport.vfs.Streaming
If this parameter is set to true, the transport will attempt to use a javax.activation.DataSource object (instead of a java.io.InputStream) to pass the content of the file to the message builder. Note that this is only supported by some message builders, e.g. those for plain text and binary content. This allows the message to be processed without storing the entire content in memory. It also has two other side effects:
  • The incoming file (or connection, in the case of a remote file) will only be opened on demand.
  • Since the data is not cached, the file might be read several times.
This option can be used to achieve streaming of large payloads. Note that this feature is still somewhat experimental and might be superseded by a more flexible mechanism in a future release.
<parameter name="transport.vfs.Streaming">true</parameter>
Required: No | Default: false
transport.vfs.MaxRetryCount
If the file transport listener encounters an error while trying to read a file, it will try to read the file again after some time. This parameter sets the maximum number of times the listener retries before giving up. Use the transport.vfs.ReconnectTimeout parameter to set the time between retries.
<parameter name="transport.vfs.MaxRetryCount">3</parameter>
Required: No | Default: 3
transport.vfs.ReconnectTimeout
The amount of time (in seconds) for which the current polling task is suspended after a failed attempt to resolve a file.
<parameter name="transport.vfs.ReconnectTimeout">30</parameter>
Required: No | Default: 30
transport.vfs.FailedRecordsFileName
Once a file has been fully processed, it is moved to a new location or deleted. If this operation fails, a log entry with the failure details can be written to a separate log file. This parameter controls the name of that failure log file.
<parameter name="transport.vfs.FailedRecordsFileName">move-errors.txt</parameter>
Required: No | Default: vfs-move-failed-records.properties
transport.vfs.FailedRecordsFileDestination
The location (directory path) of the log file that records failed move or delete operations after a file has been processed. To set the name of the log file, use the transport.vfs.FailedRecordsFileName parameter.
<parameter name="transport.vfs.FailedRecordsFileDestination">logs/</parameter>
Required: No | Default: repository/conf
transport.vfs.FailedRecordNextRetryDuration
When a move operation has failed, it is retried after this amount of time (in milliseconds).
<parameter name="transport.vfs.FailedRecordNextRetryDuration">5000</parameter>
Required: No | Default: 3000
transport.vfs.MoveAfterFailedMove
The destination to which a file is moved after a failed move attempt.
<parameter name="transport.vfs.MoveAfterFailedMove">repository/move-errors</parameter>
Required: No | Default: N/A
transport.vfs.MoveFailedRecordTimestampFormat
The timestamp format used when reporting failed move operations in the log.
<parameter name="transport.vfs.MoveFailedRecordTimestampFormat">HH:mm:ss</parameter>
Required: No | Default: dd/MM/yyyy/ HH:mm:ss
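The transport.vfs.FileNamePattern value is a regular expression. The following Java sketch shows how such a pattern behaves when it must match the whole file name, as with String.matches. Whether the listener applies the pattern in exactly this way is an assumption of this example, and the class and method names are hypothetical. The sketch also shows why it matters that an unescaped "." matches any character, not only a literal dot.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical filter (not Synapse code) showing how a FileNamePattern-style
// regex selects file names when the whole name has to match.
final class FileNamePatternSketch {
    static List<String> select(List<String> fileNames, String regex) {
        Pattern pattern = Pattern.compile(regex); // compile once, reuse for every name
        return fileNames.stream()
                .filter(name -> pattern.matcher(name).matches()) // whole-name match
                .collect(Collectors.toList());
    }
}
```

Given the names order.xml, order.xml.tmp, orderxml and notes.txt, the pattern .*\.xml selects only order.xml. The unescaped .*.xml also accepts orderxml. In the XML parameter the backslash is written once (.*\.xml); the doubled backslash in Java source code is only string-literal escaping.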
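The transport.vfs.MoveTimestampFormat and transport.vfs.MoveFailedRecordTimestampFormat values are SimpleDateFormat patterns. This Java sketch formats a fixed instant with such a pattern to show what the timestamp text looks like. The class name is hypothetical. Fixing the time zone and locale is a choice made here for reproducible output; the sketch does not show how Synapse combines the timestamp with the file name.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Illustrative only: the text a SimpleDateFormat pattern produces for an instant.
final class MoveTimestampSketch {
    static String format(Date instant, String pattern, TimeZone zone) {
        // Locale.ROOT avoids locale-specific symbols from the JVM default locale.
        SimpleDateFormat fmt = new SimpleDateFormat(pattern, Locale.ROOT);
        fmt.setTimeZone(zone);
        return fmt.format(instant);
    }
}
```

For example, format(new Date(0L), "yy-MM-dd:HHmmss", TimeZone.getTimeZone("UTC")) yields 70-01-01:000000. The colon then becomes part of the file name. Colons are not permitted in Windows file names, so a pattern without colons may be safer when files are moved to a Windows file system. A SimpleDateFormat instance is not thread-safe, which is why the sketch creates a new instance per call.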


File Transport Sender (VFS Sender)

The file transport sender allows writing outgoing messages to local or remote files. To activate the file transport sender, simply uncomment the following transport sender configuration in the SYNAPSE_HOME/repository/conf/axis2.xml file.

<transportSender name="vfs" class="org.apache.synapse.transport.vfs.VFSTransportSender"/>

To send a message using the file transport, define a Synapse endpoint with an address that starts with the prefix 'vfs:'. The rest of the address should be a valid local or remote file URI. An example is shown below:

<endpoint> <address uri="vfs:file:///var/spool/synapse/out"/> </endpoint>

Some more example file URIs are listed below. Remember to prefix each URI with the string 'vfs:' when using these to define Synapse endpoints. Refer to http://commons.apache.org/vfs/filesystems.html for a complete list of the protocols supported by Commons VFS and their corresponding URI formats.

  • file:///directory/filename.ext
  • file:////somehost/someshare/afile.txt
  • jar:../lib/classes.jar!/META-INF/manifest.mf
  • jar:zip:outer.zip!/nested.jar!/somedir
  • ftp://myusername:mypassword@somehost/pub/downloads/somefile.tgz

File Locking

By default file locking is globally enabled for the file transport sender. This behavior can be overridden at the endpoint level by specifying transport.vfs.Locking as a URL query parameter with the appropriate value (enable/disable) on a given endpoint:

<endpoint> <address uri="vfs:file:///var/spool/synapse/out?transport.vfs.Locking=disable"/> </endpoint>

You can also change the global locking behavior by setting the transport.vfs.Locking parameter in the file transport sender configuration in the axis2.xml file.

FTP Passive Mode

When writing to remote file locations using a protocol such as FTP, you might want Synapse to communicate with the FTP server in passive mode. To configure this behavior, add the query parameter vfs.passive to the endpoint address:

<endpoint> <address uri="vfs:ftp://myusername:mypassword@somehost/pub/downloads/somefile.tgz?vfs.passive=true"/> </endpoint>
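Both transport.vfs.Locking and vfs.passive are passed as query parameters on the endpoint address. The following Java sketch splits such an address into the file URI and its options, then resolves the effective locking setting, with the per-endpoint value overriding the global default. The parsing rules are assumptions for illustration: the first "?" starts the options, pairs are "&"-separated, and values are not URL-decoded. The class and method names are hypothetical, and Synapse's own parsing may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser (not Synapse's): splits "vfs:<file-uri>?k1=v1&k2=v2"
// into the file URI and an options map.
final class VfsAddressSketch {
    final String fileUri;
    final Map<String, String> options;

    private VfsAddressSketch(String fileUri, Map<String, String> options) {
        this.fileUri = fileUri;
        this.options = options;
    }

    static VfsAddressSketch parse(String address) {
        if (!address.startsWith("vfs:")) {
            throw new IllegalArgumentException("not a vfs: address: " + address);
        }
        String rest = address.substring("vfs:".length());
        Map<String, String> options = new LinkedHashMap<>();
        int q = rest.indexOf('?');
        if (q >= 0) {
            for (String pair : rest.substring(q + 1).split("&")) {
                if (pair.isEmpty()) {
                    continue;
                }
                int eq = pair.indexOf('=');
                if (eq < 0) {
                    options.put(pair, ""); // flag without a value
                } else {
                    options.put(pair.substring(0, eq), pair.substring(eq + 1));
                }
            }
            rest = rest.substring(0, q); // everything before '?' is the file URI
        }
        return new VfsAddressSketch(rest, options);
    }

    // A per-endpoint transport.vfs.Locking value overrides the global default.
    boolean lockingEnabled(boolean globalDefault) {
        String value = options.get("transport.vfs.Locking");
        if ("enable".equalsIgnoreCase(value)) {
            return true;
        }
        if ("disable".equalsIgnoreCase(value)) {
            return false;
        }
        return globalDefault; // absent (or unrecognized, an assumption): use global setting
    }
}
```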

Retrying on Error

When the file transport sender encounters an error while trying to write a file, it can retry after some time. This helps recover from certain types of transient I/O errors and network connectivity issues. The following parameters can be configured as URL query parameters on file (vfs) endpoints to use this feature.

transport.vfs.MaxRetryCount
Maximum number of retries to perform before giving up.
Required: No | Default: 3
transport.vfs.ReconnectTimeout
Time duration (in seconds) between retry attempts.
Required: No | Default: 30
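The interplay of transport.vfs.MaxRetryCount and transport.vfs.ReconnectTimeout can be sketched as a bounded retry loop. This Java example is an illustration rather than Synapse's code. It assumes that MaxRetryCount counts the retries after the first attempt and that ReconnectTimeout is a fixed pause in seconds between attempts; the transport's exact counting may differ. The class and method names are hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;

// Hypothetical bounded-retry helper. Assumptions: maxRetryCount counts retries
// after the first attempt, and reconnectTimeoutSeconds is a fixed pause.
final class RetrySketch {
    static <T> T callWithRetry(Callable<T> task, int maxRetryCount,
                               long reconnectTimeoutSeconds) throws Exception {
        if (maxRetryCount < 0) {
            throw new IllegalArgumentException("maxRetryCount must be >= 0");
        }
        Exception last = null;
        for (int attempt = 0; attempt <= maxRetryCount; attempt++) {
            if (attempt > 0) {
                TimeUnit.SECONDS.sleep(reconnectTimeoutSeconds); // wait before retrying
            }
            try {
                return task.call();
            } catch (Exception e) {
                last = e; // remember the failure; retry if attempts remain
            }
        }
        throw last; // every attempt failed: surface the most recent error
    }
}
```

Under these assumptions, the defaults (3 retries, 30 seconds apart) allow up to four write attempts spread over roughly a minute and a half before the error is reported.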

Using Temporary Files

The file transport sender does not write file content atomically. Therefore, a process reading a file updated by Synapse may read partial content. To get around this limitation, temporary file support can be enabled on the target file (vfs) endpoint:

<endpoint> <address uri="vfs:file:///var/spool/synapse/out?transport.vfs.UseTempFile=true"/> </endpoint>

This forces the file transport sender to write the data to a temporary file and then move the temporary file to the actual destination configured in the file endpoint. On most operating systems (e.g. Unix/Linux, Windows), this delivers the desired atomic file update behavior. When the file endpoint points to a remote file system, the temporary files will be created on the remote file system, thus preserving the atomic update behavior.
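The write-to-a-temporary-file-then-move technique can be sketched with java.nio.file. This is a local-file-system illustration of the general idea, not the transport sender's implementation, which works through Commons VFS. The class and method names are hypothetical. The temporary file is created in the target's own directory so that both files are on the same file system. ATOMIC_MOVE makes the final move fail, rather than silently fall back to a copy, if it cannot be done atomically.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper (not Synapse code): write to a temporary file in the
// target directory, then atomically move it over the target.
final class AtomicWriteSketch {
    static void writeAtomically(Path target, byte[] content) throws IOException {
        Path dir = target.toAbsolutePath().getParent();
        // Same directory as the target, so the move stays on one file system.
        Path tmp = Files.createTempFile(dir, ".tmp-", ".part");
        try {
            Files.write(tmp, content);
            // Whether an existing target is replaced is platform-dependent per the
            // Files.move contract; POSIX rename replaces it atomically.
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        } finally {
            Files.deleteIfExists(tmp); // no-op after a successful move
        }
    }
}
```

If the destination directory is also polled by a VFS listener, a transport.vfs.FileNamePattern that does not match the temporary suffix (.part in this sketch) would keep the listener from picking up the file before it is complete.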

Appending to Files

When updating an existing file, the file transport sender overwrites the old content by default. To append to the file instead, set the transport.vfs.Append parameter on the target endpoint:

<endpoint> <address uri="vfs:file:///var/spool/synapse/out?transport.vfs.Append=true"/> </endpoint>
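The difference between the default overwrite behavior and transport.vfs.Append=true mirrors the two ways of opening a file for writing in java.nio.file. The sketch below is a local illustration with a hypothetical write helper; it does not show how the sender itself opens files.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper contrasting overwrite (the default) with append.
final class AppendSketch {
    static void write(Path file, byte[] content, boolean append) throws IOException {
        if (append) {
            // Like transport.vfs.Append=true: keep existing bytes, add new ones at the end.
            Files.write(file, content, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } else {
            // Default behavior: replace any existing content.
            Files.write(file, content, StandardOpenOption.CREATE,
                    StandardOpenOption.TRUNCATE_EXISTING, StandardOpenOption.WRITE);
        }
    }
}
```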

Out-only Message Exchange Pattern

By its nature, the file transport sender does not support synchronous responses, so it should only be invoked using the out-only message exchange pattern. In a Synapse mediation (sequence, proxy service or API), this can be enforced with the following property mediator:

<property name="OUT_ONLY" value="true"/>


Using SFTP

To avoid man-in-the-middle attacks, SSH clients will only connect to hosts with a known host key. When connecting for the first time to an SSH server, a typical command line SSH client would request confirmation from the user to add the server and its fingerprint to the list of known hosts.

The VFS transport supports SFTP through the JSch library, and this library also requires a list of known hosts. Since Synapse is not an interactive process, it cannot request confirmation from the user and is therefore unable to add a host to the list automatically. This means that the list of known hosts must be set up manually before the transport can connect.

JSch loads the list of known hosts from a file called known_hosts in the .ssh sub-directory of the user's home directory, i.e. $HOME/.ssh on Unix and %HOMEPATH%\.ssh on Windows. The location and format of this file are compatible with the OpenSSH client.

Since the file contains not only a list of host names but also their public host keys, the easiest way to add a new host to it is to use the OpenSSH client to open an SSH session on the target host. The client will then ask whether to add the host key to the known_hosts file. Note that if the SSH server is configured to allow only SFTP sessions and no interactive sessions, the connection will fail after this step. Since this failure does not roll back the change to the known_hosts file, the error can be ignored.
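Before starting Synapse, it can help to check whether the SFTP server already has an entry in known_hosts. The following Java sketch looks for a plain (unhashed) entry in the OpenSSH format. In that format, each line lists comma-separated host names followed by the key type and key, and non-default ports are written as [host]:port. This is a troubleshooting aid under simplifying assumptions. Hashed entries (lines starting with |1|), marker lines such as @revoked, wildcards and negated patterns are not interpreted. The defaultLocation helper assumes the file lives under the Java user.home directory. The class name is hypothetical, and nothing here is part of JSch or Synapse.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical troubleshooting helper (not part of JSch, Commons VFS or Synapse).
// Hashed names, @-marker lines, wildcards and negations are not interpreted.
final class KnownHostsSketch {
    // Assumed location: <user.home>/.ssh/known_hosts.
    static Path defaultLocation() {
        return Paths.get(System.getProperty("user.home"), ".ssh", "known_hosts");
    }

    // True if an unhashed entry lists the host; non-default ports use "[host]:port".
    static boolean hasPlainEntry(List<String> lines, String host, int port) {
        String wanted = (port == 22) ? host : "[" + host + "]:" + port;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")
                    || line.startsWith("@") || line.startsWith("|")) {
                continue; // comments, markers and hashed entries are skipped
            }
            String[] fields = line.split("\\s+");
            if (fields.length < 3) {
                continue; // expected: host-list key-type base64-key [comment]
            }
            for (String name : fields[0].split(",")) {
                if (name.equalsIgnoreCase(wanted)) {
                    return true;
                }
            }
        }
        return false;
    }

    // Reads the file; a missing known_hosts file means there are no entries.
    static boolean hasPlainEntry(Path knownHosts, String host, int port) throws IOException {
        if (!Files.exists(knownHosts)) {
            return false;
        }
        return hasPlainEntry(Files.readAllLines(knownHosts), host, port);
    }
}
```

A false result for a host that was added with hashing enabled does not mean the entry is missing, only that this simplified check cannot see it.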

Known issues

The VFS listener will start reading a file as soon as it appears in the configured location. To avoid processing half-written files, the creation of these files should be made atomic. On most platforms, this can be achieved by writing the data to a temporary file and then moving the file to the target location. Note, however, that a move operation is only atomic if the source and destination are on the same physical file system, so the location for the temporary file should be chosen with that constraint in mind.
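Since a move is only atomic within one file system, a producer can check up front that its staging directory and the polled directory share a file store. This Java sketch uses Files.getFileStore for that check. The class name is hypothetical. FileStore equality is a reasonable proxy for "same file system" on common platforms, but that is an assumption rather than a guarantee, and both paths must already exist.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical pre-flight check (not Synapse code) for choosing where a
// producer writes temporary files before moving them into a polled directory.
final class SameFileStoreSketch {
    // True if both existing paths report the same FileStore (likely one file system).
    static boolean sameFileStore(Path staging, Path polled) throws IOException {
        return Files.getFileStore(staging).equals(Files.getFileStore(polled));
    }
}
```

Even when this check passes, moving with StandardCopyOption.ATOMIC_MOVE is a useful safeguard. Files.move then throws AtomicMoveNotSupportedException instead of falling back to a non-atomic copy.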

It should also be noted that the VFS transport sender doesn't create files atomically. Use the transport.vfs.UseTempFile endpoint parameter to get around this issue.