Challenge Overview
In this challenge series we're building a microservice for integration purposes that listens for incoming files over SFTP and streams the output on a Kafka bus. The service will be based on Apache Camel.
In this challenge, we want to build an SFTP server that will listen for incoming files and forward them into the Camel pipeline. One specific feature that we want to support is handling large files (up to 500GB) without the need to store them in memory or on disk. To do that, we need to modify how the SFTP server handles the incoming files.
We have two suggestions on how to achieve this goal:
- Override the SFTP subsystem's “file open” method and keep a Java collection of a data structure (class) that holds the filename, the file handle and a reference to a stream whose other end is connected to the Camel “from:” pipeline. Also override the SFTP subsystem's “write” method: use the data structure to look up the stream, then write the inbound byte array received by the write method into that stream, which triggers Camel to start pipeline processing of the inbound data. This approach allows for a low memory footprint and parallel processing, because only chunks are processed and Camel handles the concatenation into complete messages until the inbound Camel stream has been closed.
- Same principle as the first suggestion: override “file open” and “write” in the SFTP server implementation, but use a file-system FIFO (named pipe) instead of Java streams. Java does not natively support the creation of FIFO files, so they would have to be created by running an external command, for example Runtime.getRuntime().exec("mkfifo xyz") (see https://linux.die.net/man/3/mkfifo). The Apache Camel pipeline is then instructed to listen for file creations, and the FIFO is processed like any other file in Camel; once both ends have closed it, its buffered data (RAM) is released. This approach also requires the FIFO files to be deleted when the Apache Camel pipeline processing has completed. With the above approach the microservice does not really care whether 500GB or 1TB files are being transmitted. The downstream (Kafka) must, however, be sized to cater for the maximum load, although in an enterprise stream integration scenario the idea is for Kafka consumers to be able to start processing the chunked/split files directly as individual messages.
- Apache Mina SSHD - https://mina.apache.org/sshd-project/index.html
- Apache Commons VFS 2 SFTP - https://commons.apache.org/proper/commons-vfs/filesystems.html#SFTP
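As a rough illustration of the first suggestion above (looking up a stream by file handle and pushing each write chunk into it), the sketch below uses only JDK piped streams. It deliberately does not use the Apache Mina SSHD or Camel APIs: the class UploadRegistry, its open/write/close methods, the handle strings and the 64 KB buffer size are all hypothetical names and values chosen for this example. In a real server the overridden SFTP open/write/close methods would call into something like it, and the returned InputStream would be handed to the Camel route.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical registry that maps SFTP file handles to streams (not a Mina SSHD class). */
class UploadRegistry {

    /** One entry per open upload: the file name plus the write end of the pipe. */
    private static final class Upload {
        final String filename;
        final PipedOutputStream sink;

        Upload(String filename, PipedOutputStream sink) {
            this.filename = filename;
            this.sink = sink;
        }
    }

    // Assumed buffer size; this bounds the memory held per upload.
    private static final int PIPE_BUFFER_BYTES = 64 * 1024;

    private final Map<String, Upload> uploads = new ConcurrentHashMap<>();

    /** For the overridden "file open": registers the handle and returns the read end for the pipeline. */
    InputStream open(String handle, String filename) throws IOException {
        PipedOutputStream sink = new PipedOutputStream();
        PipedInputStream source = new PipedInputStream(sink, PIPE_BUFFER_BYTES);
        if (uploads.putIfAbsent(handle, new Upload(filename, sink)) != null) {
            sink.close();
            throw new IOException("Handle already open: " + handle);
        }
        return source;
    }

    /** For the overridden "write": forwards one chunk; blocks while the pipe buffer is full. */
    void write(String handle, byte[] data, int offset, int length) throws IOException {
        Upload upload = uploads.get(handle);
        if (upload == null) {
            throw new IOException("Unknown handle: " + handle);
        }
        upload.sink.write(data, offset, length);
    }

    /** For the client's close request: signals end of stream to the consumer and forgets the handle. */
    void close(String handle) throws IOException {
        Upload upload = uploads.remove(handle);
        if (upload != null) {
            upload.sink.close();
        }
    }

    /** Number of uploads currently open. */
    int openCount() {
        return uploads.size();
    }

    public static void main(String[] args) throws Exception {
        UploadRegistry registry = new UploadRegistry();
        InputStream source = registry.open("handle-1", "big.dat");

        // This consumer thread stands in for the Camel route reading the stream.
        ByteArrayOutputStream received = new ByteArrayOutputStream();
        Thread consumer = new Thread(() -> {
            byte[] buf = new byte[8192];
            try {
                for (int n; (n = source.read(buf)) != -1; ) {
                    received.write(buf, 0, n);
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        consumer.start();

        // Each call models one SFTP write request carrying a chunk of the file.
        for (String chunk : new String[] {"first ", "second ", "third"}) {
            byte[] bytes = chunk.getBytes(StandardCharsets.UTF_8);
            registry.write("handle-1", bytes, 0, bytes.length);
        }
        registry.close("handle-1");
        consumer.join();
        System.out.println(received.toString("UTF-8"));
    }
}
```

Because the pipe buffer is bounded, a slow consumer makes write block instead of accumulating data, which is what keeps the memory footprint independent of file size. Note that JDK piped streams expect the reader and the writer to run on different threads, so the consumer side would need its own thread in a real deployment as well.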
Final Submission Guidelines
- Submit the updated code
- Submit the updated deployment/verification guide