Challenge Overview
Project Overview
Detect IT is a web application that is being developed by the World Wildlife Fund that allows comparison of trade data to identify discrepancies in reporting of imports and exports between countries which may be a sign of illegal activity. For example: if Russia reports significantly less exports of king crab to South Korea than South Korea reports in imports of king crab from Russia, this suggests there may be some illegal trade occurring in king crab between the two countries.
This challenge is part of the Living Progress Build Challenge Blitz program.
Competition Task Overview
In this challenge, we’re going to be preparing the data for the dashboard functions that are front and center in the application. You’ll be delivering both the C#/ASP.NET services and the database structures required to store this aggregated data in our SQL Server database.
Here are the dashboard components that we’re hoping to support:
1. Top 20 wild-caught Commodities (Worldwide) by Export Trading Value over 2005-2015
2. Leading Exporters of Seafood and their trading partners (show by volume and by year 2005-2015)
3. Leading Importers of Seafood and their trading partners (show by volume and by year 2005-2015)
4. Leading Importers of Seafood - list of what they import (show by value and by year 2005-2015)
5. Top 10 wild-caught Commodities by Import Trading Value per Kilogram (by year 2005-2015)
We have a Vertica instance set up which contains the trading data related to seafood commodities from the UNComtrade database. Here are the credentials to access this system but system access is already available through previous challenges done in the program stream.
Host: 54.149.189.221
Port: 5433
Database Name: TradeData
User: user1
Password: pass1
We should set up the following endpoints to support the dashboard components above:
/aggregatedCommoditiesByCountry
Parameters:
displayType=weight or value
tradeFlow=Import or Export
pageSize
page
sort
The data set includes some aggregated data for each reporting country. Namely, each country that submits data to the UN reports what the total amount of trade it does for each commodity to rest of world: Partner = 0 in our records table. We should be able to limit our queries to these Partner = All (0) records for aggregation of commodities by dollar value or weight.
Here are some queries that the challenge admin ran recently to validate that all the commodities/countries had “All” records which we could use for further aggregation.
select c.name, com.text, count(*) from record r, commodity com, country c where r.partner = 0 and r.commodity = com.id and r.reporter = c.code group by c.name, com.text order by c.name asc;
The idea here is we’ll store the aggregated results in our SQL Server database for quick access for our dashboards.
You should also create the endpoint to initiate the refresh of our aggregated data:
/refreshAggregatedCommodities
This endpoint will be initiated from an admin page to drop the current aggregated data and refresh the data set from Vertica.
/exporters
displayType=weight or value
children=tradingPartners or commodities
numberOfChildren
pageSize
page
sort
The exporters endpoints can aggregate records from the TradeData.Record table grouping data by Reporter records using the Partner = 0 and TradeFlow = 2. Here is a simple query to do that.
select c.name, r.year, sum(netweightKg) as weight from record r, country c where tradeFlow = 2 and partner = 0 and r.reporter = c.code and isLeafCode = true group by c.name, r.year order by r.year asc, weight desc;
In addition, the children parameter of the /exporters endpoint indicates whether the child records returned should be trading partners (Record.partner column) associated with each importer or commodities by year. You’ll need to query the individual commodity records and group them by year and country and sort them in descending order by volume or value.
We should just stage all the aggregation types weight and value, trading partners and commodity in SQL Server so that it is ready for quick retrieval on the dashboards.
/refreshExporters
numberOfChildren
This endpoint will be initiated from an admin page to drop the current aggregated exporters data and refresh the data set from Vertica.
/importers
displayType=weight or value
children=tradingPartners or commodities
numberOfChildren
pageSize
page
sort
The importers endpoints can aggregate records from the TradeData.Record table grouping data by Reporter records using the Partner = 0 and TradeFlow = 1. Here is a simple query to do that.
select c.name, r.year, sum(netweightKg) as weight from record r, country c where tradeFlow = 1 and partner = 0 and r.reporter = c.code and isLeafCode = true group by c.name, r.year order by r.year asc, weight desc;
In addition, the children parameter of the /importers endpoint indicates whether the child records returned should be trading partners (Record.partner column) associated with each importer or commodities by year. You’ll need to query the individual commodity records and group them by year and country and sort them in descending order by volume or value.
We should just stage all the aggregation types weight and value, trading partners and commodity in SQL Server so that it is ready for quick retrieval on the dashboards.
/refreshImporters
numberOfChildren
This endpoint will be initiated from an admin page to drop the current aggregated importers data and refresh the data set from Vertica.
/commodityValueForWeight
pageSize
page
sort
For this endpoint we need to divide value by weight for all commodity. Here’s a sample query:
select com.text, r.year, sum(tradeValueUSD)/sum(netweightKg) as dollarsPerKilo from record r, commodity com where tradeFlow = 1 and partner != 0 and isLeafCode = true and netweightKg > 0 and r.commodity = com.id group by com.text, r.year order by r.year asc, dollarsPerKilo desc;
/refreshCommodityValueForWeight
This endpoint will be initiated from an admin page to drop the current aggregated CommodityValueForWeight data and refresh the data set from Vertica.
Final Submission Guidelines
Submission Deliverables
- Full code that covers all the mentioned requirements
- SQL script to setup and clean database
- A readme with details on how to configure, deploy and verify your services