Easy | 250 Points | Topcoder Skill Builder Competition | Databricks | Apache Spark

Key Information

The challenge is finished.

Challenge Overview



This is the 250-point, Easy-level problem of the Topcoder Skill Builder Competition for Databricks and Apache Spark. For more context on the challenge, register for the Host Competition before submitting a solution to this problem.

 

Technology Stack

  • Python

  • Scala

  • SQL

  • Apache Spark

  • Databricks

You can use Python, Scala, SQL, or a combination of these in your notebook. Using Apache Spark is encouraged but optional.

Problem Statement

Notebook Setup

  • Sign up for the Databricks Community Edition

  • Note that during sign-up you will be prompted to choose between the Business and Community editions. Be sure to select the Community edition, which is the free version of Databricks

  • You will be working with the dataset available in GitHub Archive. You can either upload the dataset to your Databricks workspace and import it when working on the tasks below, or download it at runtime. You need to be familiar with GitHub and version control to understand the terminology used in the tasks below.

  • Once you have signed up, proceed with the steps below
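
For the download-at-runtime option, a minimal Python sketch might look like the following. It assumes that GH Archive serves one gzipped JSON-lines file per UTC hour from data.gharchive.org, with the hour in the file name not zero-padded. The base URL, the helper names archive_url and download_archive, and the destination path are illustrative assumptions and are not part of the challenge text.

```python
import urllib.request
from datetime import datetime, timezone

# Assumption: GH Archive publishes one gzipped JSON-lines file per UTC hour
# under this base URL. The challenge text itself does not state the URL.
GH_ARCHIVE_BASE = "https://data.gharchive.org"


def archive_url(hour_utc: datetime) -> str:
    """Build the hourly file URL, e.g. .../2020-10-01-9.json.gz (hour is not zero-padded)."""
    return f"{GH_ARCHIVE_BASE}/{hour_utc:%Y-%m-%d}-{hour_utc.hour}.json.gz"


def download_archive(hour_utc: datetime, dest_path: str) -> str:
    """Download the hourly file to dest_path and return that path."""
    urllib.request.urlretrieve(archive_url(hour_utc), dest_path)
    return dest_path


# The hour the task asks for: 1 October 2020, 9 AM UTC.
TARGET_HOUR = datetime(2020, 10, 1, 9, tzinfo=timezone.utc)
```

In a notebook cell, a call such as download_archive(TARGET_HOUR, "/tmp/2020-10-01-9.json.gz") would fetch the file; the /tmp path is only a placeholder.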

 

Data Ingestion Task

  • Create a notebook in Databricks

  • In this notebook, import the GitHub Archive data for 1 October 2020 at 9 AM UTC

  • Once imported, print, in tabular format, the first 10 entries in the dataset. You need to print ONLY the following attributes:

    • The event type (type attribute)

    • The handle of the actor associated with the event (the actor.login attribute).

    • The repository name associated with the event (the repo.name attribute)

    • The date and time of the event (the created_at attribute)

The output must include a header row.

  • Your notebook must contain the commands in the cells that you used to arrive at the above result.

  • Next, publish your notebook. Databricks will provide you with the public URL where your notebook can be accessed

  • That completes the task.
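
The ingestion steps above can be sketched in Python as follows. This is one possible approach under stated assumptions, not a reference solution: it assumes the hourly file is already available as a gzipped JSON-lines file, and the function names, the output column names actor_login and repo_name, and the table layout are all illustrative choices. Since Spark is optional, the sketch shows a plain-Python path alongside a Spark path; the Spark function expects the SparkSession that a Databricks notebook normally exposes as spark.

```python
import gzip
import json

# Output column header -> path of the attribute inside each event record.
COLUMNS = {
    "type": ("type",),
    "actor_login": ("actor", "login"),
    "repo_name": ("repo", "name"),
    "created_at": ("created_at",),
}


def extract_row(event):
    """Pull the four required attributes out of one decoded event."""
    row = []
    for path in COLUMNS.values():
        value = event
        for key in path:
            value = value.get(key) if isinstance(value, dict) else None
        row.append("" if value is None else str(value))
    return row


def format_table(rows):
    """Render rows as fixed-width text, with a header row and a separator line."""
    header = list(COLUMNS)
    table = [header] + rows
    widths = [max(len(r[i]) for r in table) for i in range(len(header))]
    lines = [" | ".join(cell.ljust(w) for cell, w in zip(r, widths)) for r in table]
    lines.insert(1, "-+-".join("-" * w for w in widths))
    return "\n".join(lines)


def first_events_table(path, n=10):
    """Plain-Python version: read the first n events from a .json.gz file."""
    rows = []
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            rows.append(extract_row(json.loads(line)))
            if len(rows) == n:
                break
    return format_table(rows)


def show_first_events_spark(spark, path, n=10):
    """Spark version: print the first n events with the four required columns."""
    from pyspark.sql.functions import col  # imported lazily; only needed with Spark

    df = spark.read.json(path)  # Spark reads gzip-compressed JSON lines directly
    df.select(
        col("type"),
        col("actor.login").alias("actor_login"),
        col("repo.name").alias("repo_name"),
        col("created_at"),
    ).show(n, truncate=False)
```

Either print(first_events_table(path)) or show_first_events_spark(spark, path) prints a header row followed by up to ten events. The aliases in the Spark version are there because selecting actor.login and repo.name directly would label the columns login and name, which is less clear in the header.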

 

Important Notes

  • Don’t just write the commands necessary to complete this task. You need to run all the cells in the notebook, display the output, verify that it meets the expectations, and then publish.

  • This contest is part of the Databricks Skill Builder Contest

  • Successfully completing the task will earn you 250 points on the Databricks Skill Builder Leaderboard.

 

Problems

  1. Easy: 250 Points - This contest

  2. Medium: 500 Points

  3. Hard: 1000 Points



Final Submission Guidelines

Submit a text file that contains the link to your published Databricks notebook.

 

ELIGIBLE EVENTS:

2021 Topcoder(R) Open

REVIEW STYLE:

Final Review:

Community Review Board

Approval:

User Sign-Off


ID: 30149032