Register
Submit a solution
The challenge is finished.

Challenge Overview

Detailed Requirements
Describe what you want to accomplish.

  • Create python scripts to generate automated excel reports from given input files

Project background

  • Our client is looking forward to automate the report generation tasks which is currently performed by a team of experts in their company.

  • The reports are very crucial to our client and few of the reports are viewed everyday by C-Level executives

  • Many automated report development challenges are planned in this series, jump in right now to get familiar with automated report generation techniques

Technology Stack

  • Python (version 3.7.4)

Challenge Assets

  • You would need to follow the existing conventions in the starter pack during further development (Logging, Check reports, Styling guide, Mapping file)

  • Masked input files and sample output files are provided

  • Business requirement document for excel report in scope is shared in the forum.

Individual Requirements

General requirements

  • Python script should perform necessary validations before generating the report. If any validation fails, report generation should stop and the error should be reported to the user. Validation check involves

    • File existence check - Check whether all the files required for generating the report is available. If any of the file is missing, raise error

    • Missing values check - Check if any of the fields required for generating the report is missing. If any such field is missing, raise error

    • Valid values check - Check if any of the fields required for generating the report is having values like `#REF`, `NA` or having formulae without `=`. If there is any such field, raise error

  • Result of each such check should be logged properly to the log file. 

  • User should be kept updated about the progress by printing appropriate informational messages in the console

  • Formatting shown in the sample output reports need to be maintained in the reports generated by Python script as well

  • Python script should be agnostic enough to handle report generation for any date. There should not be any hardcoding of the dates or cell references in the code. 

  • Please keep in mind that the end users of this application are not tech savvy, hence the application should be designed and documented in such a way that even non technical users could run the application easily and generate their reports.

Challenge Requirements

  • Required Approach
    • Other than the tool created for the MIS report, you should implement a new approach for Country Monthly reports. (We will still share the MIS report tool for reference). 
    • If you look into the MIS report codebase, it's basically doing the following steps to set the cell value in output excel file:
      • locate the cell by row+column in output excel file by keywords and offset
      • locate the cells similarly in input excel files, and get their values
      • calculate the output value and set to the cell in output file
    • But the mapping file used in MIS report is too complicated and becomes very hard to maintain when it's getting larger. The following improvements should be made based upon its idea:
      • Define column and row aliases for the columns and rows in the excel file. And similar to the MIS report, the rows and columns will be matched by keywords. e.g. assign "mtd" alias to the column with "MTD" content. And we can also use "mtd+1" to represent the next column, and "mtd-1" to represent the previous column. 
      • After we define the aliases for the rows and columns, we can reference the cell by cells[rowAlias][columnAlias]. We can do the same for the input and output excel files, which makes it easy to access the data in the excel files. 
      • We can even define aliases for continuous rows and columns, then we can reference a row like: cells[rowAlias][column1Alias ... column2Alias] - note that aliases should be defined for all columns between column1Alias and column2Alias. Similarly, we can reference a column or even a region. 
      • With the cell/row/column/region references, now we can define some general actions:
        • copy values from one cell / row / column / region from one input file to the output file
        • calculate the cell value in input file from many cells from one or more input files
        • calculate the cell values for a row in input file with several other rows in the same input file, and each cell is calculated from several cells from other rows at the same column.  And similarly, we can calculate the cell values for a column in input file with several other columns in the same file. 
        • any other common actions
      • We may even define some context variables for some cells, so that it can be referenced directly. 
    • Note that the notations given above are high level, you can do whatever is needed to make it easy to implement. And it's expected that you create a row/column alias file for each excel file, and this file should also have the context variables. 
  • Code Structure
    • input folder - contain all the input files. 
    • output folder - output file is written to this folder
    • template folder - contain the output template file, which should contain no formula
    • mapping folder - contain the alias files for excel files and the mapping files. (note that these files should be TEXT format rather than BINARY format). e.g. you can use CSV format, but don't use Excel format. 
  • Scope
    • The input file name follows the "{Date} Country Financials_{Country}.xlsx format. 
    • if user runs "python inputReport.py", the input tab of all files following the format above should be processed, and the output file should only contain the Input tab. 
    • if user runs "python pbReport.py", the PB tab of all files following the format above should be processed, and the output file should only contain the PB tab. 
    • The Input and PB tabs of all 6 Countries and 7 SEAs excel files are in scope, and they have the same format. (Please let me if they are not). 
    • For static fields (with no formula), you just copy them as they are to the output file. 
    • For all the other fields, you should calculate the value and set to the output file.  Note that the output file should NOT contain any formula. 
    • Please make the code as generic as possible, so that we can extend it to support the other reports. 
  • Extra Notes:
    • The alias name must be user-friendly. you may even use rXXX to indicate it's an alias for a row, and cXXX to indicate it's for a column. 
    • We will give $200 bonus to the winner, but this includes the final-fix as well. 

Testing and Validation

  • During the review, we will be using a different set of files to validate the accuracy of your report generation. Accuracy of report matters a lot!

Important Notes:

  • Please pay attention to the shortened timelines of this challenge. Submission, Review and Appeals phase are shorter than general challenge.

  • Code should follow google python style guide. You could use pylint to validate the conformance of your python files with style guide.

  • Ensure that code has sufficient amount of comments to understand its purpose as well as there is no unnecessary comments which just clutter the file.

  • Clean up the unnecessary code before submitting

  • If you have any doubts, Please feel free to post it in the forum. We will be happy to resolve your queries as soon as possible.



Final Submission Guidelines

  • Zip your updated codebase and submit the zip file using the submit link in the challenge

  • After submitting, please ensure that you are able to download your submission from Online Review

Review style

Final Review

Community Review Board

Approval

User Sign-Off

ID: 30103574