Challenge Overview

OpeanAccess interface provider is a bridge between data analytics tools (that query the data via SQL) and a data source that provides the data. More specifically, the OpenAccess SDK Interface Provider provides methods in Java to exchange data between any data source and the OpenAccess SDK SQL Engine. 

In this challenge we'll be building one such interface provider that uses an API as a data source. The winning submission will be used as a basis for a tutorial on building API drivers, so the goal is to showcase various OpenAccess features. Before diving into more details, take a look at this tutorial for building a sample API driver

We have two third party APIs that we can use for testing

In our new IP, the API responses should be cached to support the end to end data exchange between SQL clients and API endpoints. Caching layer should be implemented using open source technologies for REST API responses (ie, use a well supported library - don't invent another one). Use Apache HttpComponents library for making the API calls. There are many optimization strategies in OpenAccess, and the winning submission will leverage those and provide the best experience for exchanging data between SQL clients and the SODA APIs.

Requirements:

  1. IP should be enhanced to support SQL pass through mode for at least one API endpoint without using the SOQL query language since many REST APIs do not have a query language.
  2. Cache should be able to support both row based mode data exchange (dam_addxxx) and SQL pass through mode data exchange (dam_addresxxx or dam_addcolxxx).
  3. Optimize JOINs between two objects using one of the join modes available in the SDK.
Your IP will be verified using ISQL tool included with OpenAccess (a tool for executing SQL queries against a data source). The following queries should run and return the correct results:
1. show tables
2. show columns
Show tables will return list of objects and show columns will return a list of fields. Based on this result, those column and field combinations should work with the following queries. 'T' is any object returned from defined schema (a table); 'F' is any field returned from the defined schema (a column); and 'XY' is any literal value you choose to use in the query:
3. select count(*) from T - return at least 250,000 rows
4. select count(*) from T2 - return at least 250,000 rows
5. select F,count(*) as cou from T group by F order by cou
6. select F, avg(F1) from T group by F order by F
7. select * from T where F=‘ST’
8. select F, AVG(F1) from T WHERE F2=‘XY’ GROUP BY F
9. select * from T where F in (‘ST’,‘AB’,‘XY’)
10. select * from T where F in (select F2 from T2 where F2>4 )
11. select * from T AS O1 INNER JOIN T2 AS O2 ON O1.F1 =O2.F2
You can use the Weather Stations API for selecting the data from a single table and you can use Police API for years 2016 and 2017 for join queries.
Make sure to document your code properly, and showcase all the used OpenAccess features in your Deployment/Verification guide.

Final Submission Guidelines

Submit the complete code for interface provider
Submit a Deployment guide with steps for setting up Open Access and adding/configuring the IP as data source
Submit a verification guide - how to run the sql queries, against which APIs, etc
Submit a short video of your solution - only cover running an SQL query, no need to include setup/deployment in the video.

ELIGIBLE EVENTS:

2018 Topcoder(R) Open

REVIEW STYLE:

Final Review:

Community Review Board

Approval:

User Sign-Off

SHARE:

ID: 30060758