Topcoder Challenge | Topcoder Community

Challenge Overview

In this challenge series we're building a microservice for integration purposes that listens for incoming files over SFTP and streams the output on a Kafka bus. The service will be based on Apache Camel.
In this challenge, we want to build an SFTP server that listens for incoming files and forwards them into the Camel pipeline. One specific feature we want to support is handling large files (up to 500GB) without storing them in full in memory or on disk. To do that, we need to modify how the SFTP server handles incoming files.
We have two suggestions on how to achieve this goal:

1. Override the SFTP server's “file open” subsystem method and keep a Java list of a data structure (class) that holds the filename, the file handle, and a pointer to a stream (whose other end is connected to the Camel “from:” pipeline). Also override the SFTP “write” subsystem method: use the data structure to look up the stream, then write the inbound byte array passed to the write method into that stream, which triggers Camel to start pipeline processing of the inbound data. This approach allows for a low memory footprint and parallel processing, since only chunks are processed and Camel handles the concatenation of complete messages until the inbound Camel stream has been closed.
2. Same principle as in 1 (override “file open” and “write” in the SFTP server implementation), but instead of Java streams, use a file-system FIFO (named pipe). Java does not natively support creating FIFO files, so they would have to be created by running an external command, e.g. Runtime.getRuntime().exec("mkfifo xyz"). The Apache Camel pipeline is then instructed to listen for file creations, and the FIFO is processed like any other file with Camel; once closed by both ends, its buffered data (RAM) is cleared. This approach also requires the FIFO files to be deleted once Apache Camel pipeline processing has completed. See https://linux.die.net/man/3/mkfifo. With this approach the microservice does not really care whether 500GB or 1TB files are being transmitted. The downstream (Kafka) does have to be sized to cater for maximum load, but in an enterprise stream integration scenario the idea is that Kafka consumers can start processing the chunked/split files directly as individual messages.
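As a rough illustration of the first suggestion, here is a minimal JDK-only sketch of the handle-to-stream lookup. The class names (OpenUpload, UploadRegistry) and the hook names (onOpen, onWrite, onClose) are hypothetical and not part of any SFTP library; they would be called from whatever "open", "write" and "close" methods the chosen SFTP server implementation lets you override. It also assumes that writes for a given handle arrive in offset order, and uses a piped stream pair as a stand-in for the stream that feeds the Camel "from:" side.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical holder for one open upload: filename plus both ends of a pipe. */
final class OpenUpload {
    final String filename;
    final PipedOutputStream sink;   // written by the SFTP "write" hook
    final PipedInputStream source;  // read by the pipeline consumer

    OpenUpload(String filename, int bufferBytes) throws IOException {
        this.filename = filename;
        this.source = new PipedInputStream(bufferBytes);
        this.sink = new PipedOutputStream(source);
    }
}

/** Hypothetical registry keyed by file handle (assumption, not a library API). */
final class UploadRegistry {
    private final Map<String, OpenUpload> uploads = new ConcurrentHashMap<>();
    private final int bufferBytes;

    UploadRegistry(int bufferBytes) {
        this.bufferBytes = bufferBytes;
    }

    /** Call from the overridden "file open"; returns the stream the consumer reads. */
    InputStream onOpen(String handle, String filename) throws IOException {
        OpenUpload upload = new OpenUpload(filename, bufferBytes);
        if (uploads.putIfAbsent(handle, upload) != null) {
            throw new IOException("Handle already open: " + handle);
        }
        return upload.source;
    }

    /** Call from the overridden "write"; blocks while the pipe buffer is full. */
    void onWrite(String handle, byte[] data, int offset, int length) throws IOException {
        OpenUpload upload = uploads.get(handle);
        if (upload == null) {
            throw new IOException("Unknown handle: " + handle);
        }
        upload.sink.write(data, offset, length);
    }

    /** Call from "close"; signals end of stream to the consumer and drops the handle. */
    void onClose(String handle) throws IOException {
        OpenUpload upload = uploads.remove(handle);
        if (upload != null) {
            upload.sink.close();
        }
    }

    /** Number of uploads currently open. */
    int openCount() {
        return uploads.size();
    }
}
```

Because the pipe buffer is bounded, a slow consumer applies back-pressure to the writer instead of letting data pile up in memory. The consumer must read from a different thread than the one calling onWrite, otherwise the pipe will block once its buffer fills.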

Here are two SFTP server libraries for Java that you can use as a starting point (you are free to use a different one if you find it more convenient):

Apache Mina SSHD - https://mina.apache.org/sshd-project/index.html
Apache Commons VFS 2 SFTP - https://commons.apache.org/proper/commons-vfs/filesystems.html#SFTP

The base code for this project is available in the forums; it contains a preconfigured Camel pipeline and application build/run/deploy steps using Docker containers. You can modify the processor to simply split the input content on newlines and send each line to Kafka (the Camel "to" route).
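The newline split can be sketched in plain Java as follows. This is only an illustration under assumptions: LineForwarder and forwardLines are hypothetical names, and the Consumer<String> stands in for whatever actually sends a message to Kafka in the base code. Within a Camel route, the Splitter EIP in streaming mode (for example split(body().tokenize("\n")).streaming()) is the usual way to get the same effect.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

/** Hypothetical helper: forwards a stream line by line without buffering the whole file. */
final class LineForwarder {
    private LineForwarder() {
    }

    /**
     * Reads the input as UTF-8 text, hands each line (without its line terminator)
     * to the sink, and returns the number of lines forwarded.
     */
    static long forwardLines(InputStream in, Consumer<String> sink) throws IOException {
        long count = 0;
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sink.accept(line); // stand-in for sending one message to Kafka
                count++;
            }
        }
        return count;
    }
}
```

Only one line is held in memory at a time, so this assumes individual lines are of bounded size even when the file itself is very large.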

Final Submission Guidelines

Submit the updated code
Submit the updated deployment/verification guide

SFTP2Kafka Connector - SFTP Server

ID: 30064522