Challenge Overview
Welcome to the Pioneer Technical Architecture Updates Challenge
In previous challenges we defined the high-level architecture for the event streaming platform. In this challenge we will expand the architecture of the subscriptions management tool by specifying the capabilities and behavior of its backend.
Background
We are building a scalable event streaming platform that will be used to provide customers with real-time notifications about events that occur in the system. The specific use case we're targeting is in the financial sector, but the platform will be designed as a generic event streaming solution. Scalability is a major concern, as the solution will be used to process millions of events daily.
The event streaming platform will consist of three parts:
- Producer - ingests source data into the Kafka cluster
- Aggregation and filtering of the source data
- Delivery of the events to end users
Most of the source data is generated in real time (e.g. Bob sent $5 to Alice), and some data is generated during a nightly process (e.g. the balance of Bob's account is $10). Regardless of how the data is generated, it is available in Kafka topics and will be consumed by our Producer to send event notifications.
See the project architecture document for more information (posted in challenge forums). You should read and understand the existing architecture before reading the challenge requirements below.
Task Details
Your task in this challenge is to design the backend API of the Subscriptions Management tool. Note that this is not a standard API design challenge with a strictly defined UI and requirements - you will need to understand the overall project architecture and do some research on how to appropriately design the API to support the use cases (see below).
The subscriptions management tool will be used by end clients to define what data they want to receive. Defining a data subscription consists of defining a ksqldb query based on data from the available topics (outputs of data ingestion, transformations, and aggregations - see the architecture for details). This can be as simple as copying the data from the available topics, or generating attribute projections or aggregations using ksqldb streams or tables.
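For illustration only, a subscription could boil down to a single ksqldb statement submitted over ksqldb's REST interface. Below is a minimal sketch in Python - the stream and topic names are hypothetical and the server URL is an assumption; the `/ksql` statement endpoint is the standard ksqldb REST API.

```python
import requests

# Assumed ksqldb server location; the /ksql endpoint accepts SQL statements.
KSQLDB_URL = "http://ksqldb-server:8088/ksql"

# Hypothetical subscription: a persistent query that filters a 'payments' topic.
statement = """
    CREATE STREAM alice_payments AS
      SELECT sender, receiver, amount
      FROM payments
      WHERE receiver = 'Alice'
      EMIT CHANGES;
"""

resp = requests.post(
    KSQLDB_URL,
    json={"ksql": statement, "streamsProperties": {}},
)
resp.raise_for_status()
print(resp.json())
```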
The subscriptions management tool will be a self-service tool used by end users (clients looking to transfer real-time data to their infrastructure) to manage the data transfer process. Use cases for the subscriptions tool are:
- managing subscriptions based on ksqldb streams,
- managing subscriptions based on ksqldb tables,
- monitoring the status of configured data ingestion and aggregation jobs.
There are no strict requirements on those individual use cases, but we're aiming to create a tool that will be flexible for the end user - not one that will limit users and force them to work around the tool to define subscriptions. The general rule to follow here is: if defining a data source/sink/transformation/aggregation is easy with Kafka Connect/ksqldb, it should be easy with the subscriptions management tool API as well.
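To make the use cases above concrete, here is one possible shape for the API surface, written as a minimal FastAPI skeleton. This is purely an illustrative assumption - the framework, paths, and payloads are not requirements, and the actual design is yours to define in the Swagger spec.

```python
from fastapi import FastAPI

app = FastAPI(title="Subscriptions Management API (illustrative sketch)")

# Illustrative resource layout only - bodies are left unimplemented.

@app.post("/subscriptions/streams")
def create_stream_subscription(definition: dict):
    """Register a subscription backed by a ksqldb stream."""
    ...

@app.post("/subscriptions/tables")
def create_table_subscription(definition: dict):
    """Register a subscription backed by a ksqldb table."""
    ...

@app.get("/subscriptions/{subscription_id}/status")
def get_subscription_status(subscription_id: str):
    """Report high-level health of the underlying ksqldb/Kafka Connect jobs."""
    ...
```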
Another important question to answer in this challenge is how the API should be implemented:
- Will it call some Kafka Connect or ksqldb management APIs?
- Or will it use the management APIs provided by Strimzi (see the architecture document)?
- What data will the API save, and where? (Naturally the ksqldb job data would be saved to a local database so the user can view/edit/update it.)
- Can job definitions be updated, or does the user have to delete and recreate them when changes are needed?
- Can the API validate the job parameters - e.g. how do we avoid defining two jobs that push data to the same topic, or overwriting existing ksqldb tables? Can the API do all the validation, or do we have to rely on handling error codes from Kafka Connect or ksqldb?
- Are the API endpoints synchronous or asynchronous - are the jobs created/updated in Kafka Connect/ksqldb as the user defines them, or do we need to track a status for each job definition (e.g. pending, created, started, failed, removed)? One possible asynchronous flow is sketched after this list.
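As one possible answer to the last two questions, the API could persist each job definition with a status and create the underlying connector asynchronously through Kafka Connect's standard REST API. The service URL, connector naming, and status values below are assumptions, not requirements:

```python
import enum
import requests

# Assumed Kafka Connect REST endpoint.
CONNECT_URL = "http://kafka-connect:8083"

class JobStatus(str, enum.Enum):
    PENDING = "pending"
    CREATED = "created"
    FAILED = "failed"

def create_connector(name: str, config: dict) -> JobStatus:
    """Attempt to create a connector in Kafka Connect and return a job status.

    In a real implementation, the definition would be stored in the tool's
    local database as PENDING before this call, and updated afterwards.
    """
    resp = requests.post(
        f"{CONNECT_URL}/connectors",
        json={"name": name, "config": config},
    )
    if resp.status_code == 409:
        # Kafka Connect already has a connector with this name - one example of
        # validation that may have to be delegated to the downstream system.
        return JobStatus.FAILED
    resp.raise_for_status()
    return JobStatus.CREATED
```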
Monitoring subscriptions is the last major requirement for this tool. Low-level details, such as a time series of the number of records processed per unit of time, are not required - those details will be available in Grafana/Prometheus (per the application architecture). Users are only interested in whether the jobs are configured correctly, whether the data is being processed, and perhaps high-level stats such as the total number of records processed since the job was created. It's up to you to define data that can easily be obtained from Kafka and is useful to users of the tool.
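For example, high-level job health of the kind described above could be assembled from Kafka Connect's connector status endpoint and ksqldb's SHOW QUERIES statement. A rough sketch, assuming default ports and that the tool knows which connector/query belongs to which subscription:

```python
import requests

# Assumed service locations.
CONNECT_URL = "http://kafka-connect:8083"
KSQLDB_URL = "http://ksqldb-server:8088"

def connector_state(name: str) -> str:
    """Return the connector's state (e.g. RUNNING, FAILED) from Kafka Connect."""
    resp = requests.get(f"{CONNECT_URL}/connectors/{name}/status")
    resp.raise_for_status()
    return resp.json()["connector"]["state"]

def running_queries() -> list:
    """List the persistent queries ksqldb currently knows about."""
    resp = requests.post(
        f"{KSQLDB_URL}/ksql",
        json={"ksql": "SHOW QUERIES;", "streamsProperties": {}},
    )
    resp.raise_for_status()
    # The response is a list of statement results; 'queries' holds the summaries.
    return resp.json()[0].get("queries", [])
```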
User authentication/authorization is out of scope for now - assume anyone can perform any operation on the API; we will add authorization rules later.
Submission Guidelines
These should be the contents of your submission:
- Architecture document explaining how the backend of the subscriptions management tool will work
- Swagger API spec
- Database structure - an ER diagram showing what entities are managed in the database