Challenge Overview
Project Overview:
A common requirement when implementing a new system is to load historical documents from a legacy system into it. Anywhere from 1,000 to 1,000,000 files may be loaded through APIs. Typical file formats include PDF, DOC, TXT, PNG, and so on. Each file has associated values that determine where it is loaded in the new system. These values must be extracted using the filename and a provided index mapping file.
Competition Task Overview:
The requirement is to develop a program that will do the following:
1. Retrieve a zip archive from an SFTP server, decrypt it using the provided PGP Public Key, unzip it, locate the index file in the archive, and then loop through each file in the archive (excluding the index file) to gather the required values and Base64-encode the whole file.
2. Ultimately, each file will be uploaded individually to an endpoint through an API. However, since the API cannot be exposed, we only want an API stub that does not actually call the API but instead calls a local function and handles its response. The response should be a set of values for each file that will be used to load that file. These values must be written, for each file, to a log file that also records any issues encountered. A diagram of the process is below.
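Step 1 includes two pieces that need third-party libraries (an SFTP client such as JSch, and an OpenPGP implementation such as Bouncy Castle), so they are left out here. Note that in OpenPGP, decryption is performed with the recipient's private (secret) key; a public key can only encrypt data or verify signatures, so the key supplied at deployment will most likely need to be a private key plus its passphrase. The rest of step 1 needs only the JDK. Below is a minimal sketch, assuming the archive has already been decrypted into an InputStream; ArchiveExtractor and its nested Extracted type are hypothetical names. It finds the index file by its configured name and Base64-encodes every other entry as opaque bytes, without parsing the content.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical component: reads an already-decrypted ZIP archive and
// Base64-encodes every entry except the index file.
class ArchiveExtractor {

    /** Result of one extraction: raw index-file bytes plus the encoded documents. */
    static final class Extracted {
        final byte[] indexFile;                 // null if the index file was not found
        final Map<String, String> base64ByName; // file name -> Base64 content

        Extracted(byte[] indexFile, Map<String, String> base64ByName) {
            this.indexFile = indexFile;
            this.base64ByName = base64ByName;
        }
    }

    static Extracted extract(InputStream zipStream, String indexFileName) throws IOException {
        byte[] index = null;
        Map<String, String> encoded = new LinkedHashMap<>();
        try (ZipInputStream zin = new ZipInputStream(zipStream)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                // Assumption: key entries by their last path segment, so "docs/R_1.txt" -> "R_1.txt".
                String name = entry.getName();
                name = name.substring(name.lastIndexOf('/') + 1);
                byte[] bytes = zin.readAllBytes(); // reads the current entry only
                if (name.equals(indexFileName)) {  // exact, case-sensitive comparison
                    index = bytes;
                } else {
                    // Encoded as raw bytes; the document content is never interpreted.
                    encoded.put(name, Base64.getEncoder().encodeToString(bytes));
                }
            }
        }
        return new Extracted(index, encoded);
    }
}
```

Keying entries by their last path segment means two files with the same name in different folders would overwrite each other; a fuller version should log that case as an issue. Holding every encoded file in memory also will not scale to the upper end of the 1,000,000-file range, so a streaming version would encode and upload each entry inside the loop instead.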
Input Values (configurable at deployment time)
SFTP endpoint
SFTP directory
SFTP username
SFTP password
Target system username (stored for later use)
Target system password (stored for later use)
Index file name
Desired output filename (dynamic, built from the output values: it contains placeholders that are replaced with real values; see the example below)
PGP Public Key
Default Document Category (stored for later use)
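One way to make these inputs configurable at deployment time is a standard Java properties file. The sketch below assumes hypothetical key names (sftp.endpoint, sftp.directory, and so on) and a path to the PGP key file rather than the key text itself; AppConfig is likewise a hypothetical class name. It fails fast when a required setting is missing.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.Properties;

/** Deployment-time settings. All property key names here are hypothetical. */
final class AppConfig {
    final String sftpEndpoint;
    final String sftpDirectory;
    final String sftpUsername;
    final String sftpPassword;
    final String targetUsername;         // stored for later use
    final String targetPassword;         // stored for later use
    final String indexFileName;
    final String outputFilenameTemplate; // e.g. [userID]BR.[extension]
    final String pgpKeyFile;             // assumption: a path to the key file
    final String defaultCategory;        // stored for later use

    private AppConfig(Properties p) {
        sftpEndpoint = required(p, "sftp.endpoint");
        sftpDirectory = required(p, "sftp.directory");
        sftpUsername = required(p, "sftp.username");
        sftpPassword = required(p, "sftp.password");
        targetUsername = required(p, "target.username");
        targetPassword = required(p, "target.password");
        indexFileName = required(p, "index.filename");
        outputFilenameTemplate = required(p, "output.filename");
        pgpKeyFile = required(p, "pgp.keyfile");
        defaultCategory = required(p, "default.category");
    }

    /** Reads settings in the standard java.util.Properties text format. */
    static AppConfig load(Reader source) throws IOException {
        Properties p = new Properties();
        p.load(source);
        return new AppConfig(p);
    }

    // This method does not trim values, so names containing spaces are kept as written.
    private static String required(Properties p, String key) {
        String value = p.getProperty(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException("Missing required setting: " + key);
        }
        return value;
    }
}
```

In a real deployment the two passwords would more likely come from environment variables or a secrets store than from a plain-text file.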
Output Values (for each file)
filename
userID
base64 encoded file
category
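The API stub from step 2 can be modeled as a class that receives the four output values listed above, passes them to a local function in place of the real HTTP call, and writes one log line per file, including any error. This is a sketch only: UploadRequest, UploadResponse, and UploadApiStub are hypothetical types, and because the real API's request and response fields are not specified, the ones shown are placeholders.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.function.Function;

/** The output values for one file (field names are assumptions, not the real API). */
final class UploadRequest {
    final String filename, userId, base64Content, category;

    UploadRequest(String filename, String userId, String base64Content, String category) {
        this.filename = filename;
        this.userId = userId;
        this.base64Content = base64Content;
        this.category = category;
    }
}

/** Simplified response shape assumed for the stub. */
final class UploadResponse {
    final boolean success;
    final String message;

    UploadResponse(boolean success, String message) {
        this.success = success;
        this.message = message;
    }
}

/** Stands in for the real upload API: calls a local function instead of an endpoint. */
final class UploadApiStub {
    private final Function<UploadRequest, UploadResponse> handler;
    private final Writer log;

    UploadApiStub(Function<UploadRequest, UploadResponse> handler, Writer log) {
        this.handler = handler;
        this.log = log;
    }

    /** "Uploads" one file, logs its output values and outcome, and returns the response. */
    UploadResponse upload(UploadRequest req) throws IOException {
        UploadResponse resp;
        try {
            resp = handler.apply(req); // local call in place of the HTTP request
        } catch (RuntimeException e) {
            resp = new UploadResponse(false, "stub error: " + e.getMessage());
        }
        // One pipe-delimited line per file (Base64 never contains '|').
        log.write(String.join("|",
                resp.success ? "OK" : "ERROR",
                req.filename, req.userId, req.category, req.base64Content,
                resp.message) + System.lineSeparator());
        log.flush();
        return resp;
    }
}
```

Replacing the handler function with a real HTTP client later would leave the logging and error handling unchanged.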
The output values should be determined using the index file located in the zip archive: the name of each file is used to look up the other values. The index file is a pipe-delimited .txt file with three possible formats. RESUMEKEY is the userID. FILENAME, RESUMEPDFNAME, and ATTACHMENTFILENAME are the filename columns to match against.
Format 1:
RESUMEKEY|ADDEDON|FIRSTNAME|MIDDLENAME|LASTNAME|EMAIL|FILENAME
123456|2014-02-03 19:48:46|Test||One|cccccccc@gmail.com|R_123456.txt
456789|2014-02-03 20:22:56|Test|Matthew|Two|aaaa@gmail.com|R_456789.txt
Format 2:
ATTACHMENTID|RESUMEKEY|ADDEDON|CATEGORY|FIRSTNAME|MIDDLENAME|LASTNAME|EMAIL|RESUMEPDFNAME
A_6646692|123456|2015-02-19 13:12:58|R|Test|Adam|One|bbb@gmail.com|Sample_File_123445
A_6646695|456789|2013-07-11 13:51:29|R|Test||Two|aaa@yahoo.com|Another_File_451123
Format 3:
ATTACHMENTID|RESUMEKEY|ADDEDON|CATEGORY|FIRSTNAME|MIDDLENAME|LASTNAME|EMAIL|ATTACHMENTFILENAME
904137|123456|2011-06-18 16:56:33|OTHER|Test|P|One|aaaa@hotmail.com|200100_John_Resume.doc
904196|456789|2011-06-18 19:26:30|OTHER|Test||Two|bbbb@hotmail.com|30001 Resume.docx
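Because the three formats differ only in their columns, the header line can decide which filename column to use. A minimal sketch, assuming the index file has already been read into a list of lines (for example with Files.readAllLines); IndexFile is a hypothetical class name. It keeps every column of each row, so RESUMEKEY (the userID) and, where present, CATEGORY can both be looked up by exact filename.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Parses the pipe-delimited index file into an exact-match lookup: filename -> row. */
final class IndexFile {
    // The three filename columns named in the spec; the header decides which one applies.
    private static final List<String> FILENAME_COLUMNS =
            List.of("FILENAME", "RESUMEPDFNAME", "ATTACHMENTFILENAME");

    static Map<String, Map<String, String>> parse(List<String> lines) {
        if (lines.isEmpty()) {
            throw new IllegalArgumentException("Index file is empty");
        }
        String[] header = lines.get(0).split("\\|", -1);
        int fileCol = -1;
        for (int i = 0; i < header.length; i++) {
            if (FILENAME_COLUMNS.contains(header[i])) { // exact column-name match
                fileCol = i;
                break;
            }
        }
        if (fileCol < 0) {
            throw new IllegalArgumentException("No filename column in index header");
        }

        Map<String, Map<String, String>> byFilename = new HashMap<>();
        for (String line : lines.subList(1, lines.size())) {
            if (line.isEmpty()) {
                continue;
            }
            // Limit -1 keeps empty fields (e.g. a blank MIDDLENAME) in position.
            String[] fields = line.split("\\|", -1);
            if (fields.length != header.length) {
                continue; // malformed row skipped here; a fuller version would log it as an issue
            }
            Map<String, String> row = new HashMap<>();
            for (int i = 0; i < header.length; i++) {
                row.put(header[i], fields[i]);
            }
            // No trimming or case folding, so lookups are exact and case-sensitive.
            byFilename.put(fields[fileCol], row);
        }
        return byFilename;
    }
}
```

Because the map keys are the raw filename strings, a lookup for "30001 resume.docx" fails while "30001 Resume.docx" succeeds, which is the exact-match behavior the requirements ask for.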
Example (using the third format above)
Input file: 200100_John_Resume.doc
Desired output filename: [userID]BR.[extension]
Output filename: 123456BR.doc
Output userID: 123456
Output category: Default Document Category
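The placeholder substitution in this example can be expressed as plain string replacement. This sketch handles only the two placeholders that appear in the example ([userID] and [extension]); OutputFilename is a hypothetical name. The handling of files without an extension is an assumption: the ".[extension]" part is dropped rather than leaving a trailing dot.

```java
/** Fills the [userID] and [extension] placeholders; no other placeholders are assumed. */
final class OutputFilename {
    static String render(String template, String userId, String originalFilename) {
        int dot = originalFilename.lastIndexOf('.');
        // Extension without the dot, e.g. "doc"; empty when the name has no dot.
        String extension = dot >= 0 ? originalFilename.substring(dot + 1) : "";
        String result = template;
        if (extension.isEmpty()) {
            result = result.replace(".[extension]", ""); // assumption: avoid a trailing dot
        }
        // String.replace substitutes every literal occurrence.
        return result.replace("[userID]", userId).replace("[extension]", extension);
    }
}
```

With the example values, OutputFilename.render("[userID]BR.[extension]", "123456", "200100_John_Resume.doc") returns "123456BR.doc".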
Final Submission Guidelines
Additional Requirements:
Matching against the index file must be an exact match: case-sensitive, including spaces
The files should not be opened to read their content
Each piece outlined in the diagram should be its own component
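One way to keep each piece its own component is to hide every step behind a small interface and wire them together in one pipeline class, so that, for example, the SFTP fetcher or the API stub can be swapped out or tested in isolation. The interface and class names below (Fetcher, Decryptor, Unzipper, Indexer, Uploader, Pipeline) are hypothetical, and the step order follows the task overview. This is a structural sketch, not a complete implementation.

```java
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;
import java.util.Map;

// Hypothetical component boundaries, one per step of the process.
interface Fetcher { byte[] fetch(); }                          // e.g. SFTP download
interface Decryptor { byte[] decrypt(byte[] encrypted); }      // e.g. OpenPGP decryption
interface Unzipper { Map<String, byte[]> unzip(byte[] zip); }  // entry name -> bytes
interface Indexer { Map<String, String> userIdsByFilename(byte[] indexFile); }
interface Uploader { String upload(String filename, String userId, String base64, String category); }

/** Wires the components together and returns one log line per file. */
final class Pipeline {
    private final Fetcher fetcher;
    private final Decryptor decryptor;
    private final Unzipper unzipper;
    private final Indexer indexer;
    private final Uploader uploader;

    Pipeline(Fetcher f, Decryptor d, Unzipper z, Indexer i, Uploader u) {
        fetcher = f;
        decryptor = d;
        unzipper = z;
        indexer = i;
        uploader = u;
    }

    List<String> run(String indexFileName, String defaultCategory) {
        List<String> log = new ArrayList<>();
        Map<String, byte[]> files = unzipper.unzip(decryptor.decrypt(fetcher.fetch()));
        byte[] index = files.get(indexFileName);
        if (index == null) {
            log.add("ERROR|index file not found: " + indexFileName);
            return log;
        }
        Map<String, String> userIds = indexer.userIdsByFilename(index);
        for (Map.Entry<String, byte[]> file : files.entrySet()) {
            String name = file.getKey();
            if (name.equals(indexFileName)) {
                continue; // the index file itself is not uploaded
            }
            String userId = userIds.get(name); // exact, case-sensitive lookup
            if (userId == null) {
                log.add("ERROR|no index entry for " + name);
                continue;
            }
            String base64 = Base64.getEncoder().encodeToString(file.getValue());
            log.add("OK|" + name + "|" + uploader.upload(name, userId, base64, defaultCategory));
        }
        return log;
    }
}
```

Because every dependency is an interface, a test can pass in lambdas for each step, as a stand-in for the real SFTP, PGP, and API components.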
Technology Overview:
Java
Please include all code required by the application in your submission.zip file
Document the build process for your code, including how to configure, deploy, and run it, and how to verify the results