Challenge Overview
NOTE - This is a repost of an earlier challenge with reduced scope. The challenge will be judged by the client in collaboration with the copilot.
Welcome to the Actian Vector Schema Conversion Tool Challenge.
Actian is the hybrid data management, integration, and analytics company that enables enterprises to seamlessly manage and connect operational and analytic data for superior performance, insights, and business outcomes. Activate Your Data™ - Learn more about Actian’s products here and follow us on Twitter.
We need help from the TopCoder community to build a schema conversion tool that will make it easy for Business Systems Analysts, DevOps Engineers, Developers, Database Administrators, and Data Scientists to move data from an existing database into Actian Vector.
What is Actian Vector?
Actian Vector is a vectorized column-store analytics database from Actian Corporation, built for high-performance analytics. It was designed from the ground up to exploit performance features in today’s x86_64 CPUs, such as vectorization and larger chip caches that enable in-chip analytics. Learn how Actian Vector achieves record-breaking speed and performance here.
Examples of Actian Vector use cases include:
· Customer profile analytics: Granular, multi-channel, near real-time customer profile analytics can tell you about your customers, the best means to connect, the targeted offers that will resonate, their predilection to churn, and the best ways to personalize the entire customer experience to win more business and drive up loyalty levels.
· Micro-segmentation: By uncovering relationships between customers and key purchase drivers, and by predicting the value of each customer along thousands of customer attributes, you can uncover new segments that your competition isn’t thinking about yet, increasing conversions and gaining higher returns on your marketing investment.
· Customer lifetime value: Connect to all of your data, from account histories and demographics to mobile and social media interactions, and blend these disparate sources with speed and accuracy. Uncover key purchase drivers to understand why someone purchases or rejects your products. Assign customer value scores by correlating which characteristics and behaviors lead to value at various points of time in the future.
· Next best action: Use micro-segmentation models to find and classify small clusters of similar customers. Customer value models predict the value of each customer to the business at various intervals. Combining the output of these two models into a personalized recommendation engine gives you the information you need to take action that gives you a distinct competitive advantage. You can optimize your supply chain, customize campaigns with confidence, and ultimately drive meaningful, personalized engagements.
· Campaign optimization: Traditional campaign optimization models use limited samples of transactional data, which can lead to incomplete customer views. Actian Vector allows you to connect to social media and competitor web sites in real time to learn which competitive offerings are gaining traction in the marketplace.
· Churn analysis: Churn prediction models have been limited to account information and transactional history, a tiny fraction of available data. With Actian Vector, increase the accuracy of churn predictions by combining and analyzing traditional transactional and account datasets with call center text logs, past marketing and campaign response data, competitive offers, social media, and a host of other data sources.
· Market basket analysis: Actian Vector enables data science models and advanced analytics to go deeper into detailed associations on all product relationships, and segment customers and spending habits into similar groups to learn more about shoppers.
Actian Vector product overview and datasheets are available here:
Requirements:
We strongly recommend using Actian’s DBMV as a starting point for this challenge’s requirements. Please fork this repo and use it as your baseline code.
1. For each of the following Source databases, create a fully representative sample schema with supporting examples of all data types, and include an appropriate amount of test data to use as a test bed. Nullability and defaults should be driven by the source schema, so the input schemas should contain examples of both nullable and non-nullable columns, and of columns with and without defaults. These properties need to be passed through properly in addition to the types.
a) In-scope for this challenge:
1) SQL Server (on Azure)
b) NOT in scope for this challenge (Future challenges will require schemas and support for the following Source databases):
1) Oracle
2) Teradata
3) MySQL
4) Greenplum
5) Netezza
6) Postgres
7) Sybase IQ
2. Build a Schema Conversion Tool that runs on Azure and that will achieve the following:
a) In-scope for this challenge:
1) Successfully migrate each of the Source databases in Azure over to the target Actian Vector database, with its data, and not lose anything along the way.
2) Successfully migrate all data from the Source database in Azure to the target Actian Vector database, optionally deleting all data from the target first.
For the Source dataset, ensure that a) there is a decent quantity of data to test throughput and b) it has examples of all the core data types from each Source database. XML types are not supported by Actian Vector and therefore should not be included in these schemas or examples.
You may only use data that is open and that can be shared with anyone in the world and which is freely available and to which you have rights to use the data in submitting such data.
Source database objects to include for migration/augmenting:
- Schemas
- Tables and all data types (ideally include a data type mapping config ability to change the types used both globally and on a table-by-table basis)
- Views
- Alias/Synonyms
- Primary and unique keys
- Referential integrity constraints
You only need to convert the items in the list above. Typical schemas may include other objects that you do not need to convert, either because they are not necessary in Vector (such as indexes or other tuning objects) or because they are out of scope for this exercise (such as stored procedures, encryption, LOBs, and administrative activity commands).
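One way to picture the data type mapping config mentioned in the list above is a global map plus optional per-table overrides. The Python sketch below is only an illustration: the names `GLOBAL_TYPE_MAP`, `TABLE_TYPE_MAP`, and `resolve_type`, the table `sales.orders`, and the specific type pairs are all assumptions, and every target type should be checked against the Actian Vector documentation.

```python
from typing import Dict, Optional

# Illustrative global defaults (assumed, not an official mapping).
GLOBAL_TYPE_MAP: Dict[str, str] = {
    "bit": "TINYINT",
    "datetime2": "TIMESTAMP",
    "uniqueidentifier": "CHAR(36)",
}

# Hypothetical per-table overrides, keyed by "schema.table".
TABLE_TYPE_MAP: Dict[str, Dict[str, str]] = {
    "sales.orders": {"datetime2": "ANSIDATE"},
}


def resolve_type(source_type: str, table: Optional[str] = None) -> str:
    """Return the target type, preferring a per-table override."""
    key = source_type.lower()
    if table is not None:
        override = TABLE_TYPE_MAP.get(table.lower(), {}).get(key)
        if override is not None:
            return override
    # Fall back to the global map, then pass the type through unchanged.
    return GLOBAL_TYPE_MAP.get(key, source_type.upper())
```

In a real tool the two maps would more likely be loaded from a config file than hard-coded, so that users can change mappings without touching the code.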
Data migration needs to be scalable and able to run efficiently in parallel. It should support both staging and streaming of the data files, or only streaming (if staging is not an option).
For each of the migrations, demonstrate that the number of parallel threads of activity is configurable (e.g. a migration via the cloud might be bandwidth-limited and so need only two threads, whereas side-by-side servers on a LAN could use 32) and that the migration is ‘reasonably efficient’, meaning that it uses bulk loading operations rather than singleton inserts for each row.
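The combination of a configurable thread count and bulk loading can be sketched roughly as follows, in Python. This is an assumption-laden outline, not a prescribed design: `batches`, `migrate`, and the `load_batch` callable are hypothetical names, and `load_batch` stands in for whatever bulk-load mechanism the tool actually uses against the target. It is expected to load one batch and return the number of rows loaded.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Sequence

Row = Sequence[object]


def batches(rows: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Yield lists of up to `size` rows so each load is a bulk operation."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def migrate(
    rows: Iterable[Row],
    load_batch: Callable[[List[Row]], int],
    threads: int = 2,
    batch_size: int = 10_000,
) -> int:
    """Load rows in batches on `threads` workers; return total rows loaded."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # Note: Executor.map consumes the batch iterator eagerly, so a very
        # large table would need a bounded queue instead of this shortcut.
        return sum(pool.map(load_batch, batches(rows, batch_size)))
```

With this shape, a bandwidth-limited cloud run could call `migrate(rows, load_batch, threads=2)`, while a LAN run could pass `threads=32` without any other change.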
For this challenge, assume that live instances of both the Source and Target databases are available to work with at runtime, so you do not have to parse SQL scripts as text files. This should make the job a lot easier, as you can then use metadata obtained through a JDBC driver to work out data types, do conversions, etc.
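As a rough illustration of the metadata-driven approach, the Python sketch below turns column metadata into a CREATE TABLE statement that carries nullability and defaults through. The query targets SQL Server's standard `INFORMATION_SCHEMA.COLUMNS` view; the function names `column_ddl` and `create_table_ddl` are hypothetical, the `?` placeholders assume a qmark-style driver, and the generated DDL is generic SQL that may need adjusting for Actian Vector. Lengths, precision, and type mapping are deliberately left out to keep the sketch short.

```python
from typing import Iterable, Optional, Tuple

# One metadata row: (column name, data type, IS_NULLABLE, COLUMN_DEFAULT).
ColumnMeta = Tuple[str, str, str, Optional[str]]

# Query against SQL Server's INFORMATION_SCHEMA.COLUMNS view.
COLUMNS_SQL = """
SELECT COLUMN_NAME, DATA_TYPE, IS_NULLABLE, COLUMN_DEFAULT
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = ? AND TABLE_NAME = ?
ORDER BY ORDINAL_POSITION
"""


def column_ddl(
    name: str, data_type: str, is_nullable: str, default: Optional[str]
) -> str:
    """Build one column definition, keeping the default and nullability."""
    parts = [name, data_type.upper()]
    if default is not None:
        # SQL Server reports defaults wrapped in parentheses, e.g. "((0))";
        # a real tool would translate these for the target dialect.
        parts.append(f"DEFAULT {default}")
    if is_nullable == "NO":
        parts.append("NOT NULL")
    return " ".join(parts)


def create_table_ddl(table: str, columns: Iterable[ColumnMeta]) -> str:
    """Assemble a CREATE TABLE statement from metadata rows."""
    body = ",\n  ".join(column_ddl(*col) for col in columns)
    return f"CREATE TABLE {table} (\n  {body}\n)"
```

The rows passed to `create_table_ddl` would come from executing `COLUMNS_SQL` on the live Source connection, which is what makes parsing SQL scripts unnecessary.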
Need help with something?
· Please use the TopCoder forums for any questions relating to completing this challenge e.g. to clarify a requirement.
· Please use the Actian Vector community forum for questions relating to how to use Actian Vector e.g. to ask a question about SQL data types supported by Actian Vector.
Scorecard
We’ll use the Scorecard (1-10) for grading submissions. The submissions will be reviewed by the client, and there will be no appeals or appeals response phases.
Weightage distribution is as follows
Focus of assessment for Actian
· 10% Schemas and data
· 90% Schema Conversion Tool
Additional terms and conditions for all participants
By participating in this Competition, You acknowledge and agree that:
- You must comply with all applicable laws in submitting a Competition Submission, and you represent that you are authorized to submit the Competition Submission.
- Actian Corporation (“Actian”) is free to use, disclose, distribute or otherwise exploit Residual Knowledge. Residual Knowledge means information that is retained in the unaided memories of Actian’s employees and contractors who have had access to any Competition Submissions submitted by You. An employee’s or contractor’s memory will be considered unaided if the employee or contractor has not intentionally memorized the information for the purpose of retaining and subsequently using or disclosing it; and
- You are not entitled to any compensation from Actian or any of the benefits which Actian may make available to its employees, and You are not authorized to make any representation, contract or commitment on behalf of Actian.
- Employees and direct and indirect subcontractors of Actian Corporation and its subsidiaries and other affiliates, and employees and direct and indirect subcontractors of Actian’s partners (including TopCoder and its affiliates) are not eligible to participate in the challenge.
Final Submission Guidelines
Valid submissions require
· A fully documented GitHub package with MIT license.
o The readme file should include installation and setup instructions.
o The package should include sample data for each Source database, along with detailed instructions on how to use the package to move the data to Actian Vector.
· A short screencast walkthrough that demonstrates the tool working as intended (make the video available as an .mp4 file or share privately on YouTube or Vimeo).
· IMPORTANT: Include a file called "Submission details" with your entry. Include the following information:
o Actian ID – you created this when you registered on the Actian Community
o Links to any websites, source code repositories, videos, and blog posts related to this challenge that you have online
o Links to any and all data sources used in your submission
o If you are one of the winners and you would like Actian to contact you about opportunities to have your entry featured on the Actian blog, please confirm your interest and provide your full name and email address
o If you would like someone from Actian to contact you, please provide your full name, email address and a brief description of what you would like to discuss so that we can connect you with the appropriate person from Actian