The challenge is finished.

Challenge Overview

1. Project Overview

NASA has recorded over 100 terabytes of images, telemetry, models, and just about everything else one can imagine from all the planetary missions of the past 30 years. The data is stored in NASA's Planetary Data System (PDS), and all of it is available free of charge at http://pds.nasa.gov

However, while rich in depth and breadth, the PDS holdings have developed in a disparate fashion over the years, with different architectures and formats at the various nodes. Consequently, establishing a uniform approach to accessing data within the PDS is a significant challenge.

Working with TopCoder and Harvard IQSS via the NASA Tournament Lab, PDS developed a database of meta-data that allows user-friendly access to data held at the Small Bodies Node (SBN). As the next step, we would like to extend the import and persistence module to process the Cassini ISS datasets held at the Rings Node.

2. Competition Task Overview

In this competition you have to extend the current functionality of the import and persistence module so that it can support the Cassini ISS datasets. This includes extending the database to accommodate additional meta-data parameters, and modifying the API to populate the expanded database.

Raw data volumes: http://pds-challenge.seti.org/volumes/COISS_2xxx/

Supplemental tables of geometric metadata: http://pds-challenge.seti.org/supplemental_tables/COISS_2xxx/

There are different meta-data parameters in the Cassini ISS data labels: some are the same as in the SBN data, some are not in the SBN labels, and some of the parameters in the SBN data are probably not in the Cassini labels. There are 129 additional geometric meta-data parameters in the supplemental tables (moon summary 35, ring summary 59, saturn summary 35). And there are subtleties. For example, the fifth column in both the "moon summary" and "saturn summary" tables is MINIMUM_PLANETOCENTRIC_LATITUDE. The database needs to accommodate both; they are distinct, not duplicates. One is the minimum value for the specified target moon (specified in column 4) and the other is the minimum value for the planet.
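One possible way to keep same-named columns from different supplemental tables distinct is to identify each parameter by its source table together with its column name. The sketch below only illustrates that idea; the SummaryTable enum, the ParameterKey class, and the numeric values are hypothetical and are not taken from the existing module or from the data.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical names throughout; not part of the existing code base.
enum SummaryTable { MOON, RING, SATURN }

// Identifies a parameter by (source table, column name) so that
// identically named columns from different tables never collide.
final class ParameterKey {
    private final SummaryTable table;
    private final String column;

    ParameterKey(SummaryTable table, String column) {
        this.table = Objects.requireNonNull(table);
        this.column = Objects.requireNonNull(column);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof ParameterKey)) {
            return false;
        }
        ParameterKey other = (ParameterKey) o;
        return table == other.table && column.equals(other.column);
    }

    @Override
    public int hashCode() {
        return Objects.hash(table, column);
    }

    @Override
    public String toString() {
        return table + "." + column;
    }
}

final class ParameterKeyDemo {
    // Stores two same-named parameters and returns the number of distinct entries.
    static int countDistinct() {
        Map<ParameterKey, Double> values = new HashMap<>();
        // The latitude values are made up for illustration only.
        values.put(new ParameterKey(SummaryTable.MOON, "MINIMUM_PLANETOCENTRIC_LATITUDE"), -12.5);
        values.put(new ParameterKey(SummaryTable.SATURN, "MINIMUM_PLANETOCENTRIC_LATITUDE"), 3.0);
        return values.size();
    }

    public static void main(String[] args) {
        System.out.println(countDistinct());
    }
}
```

In a relational schema the same idea could translate to a composite key or to separate tables per summary type; which of these fits best depends on the existing schema.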

2.1.1. Changes Required

The current code supports processing of raw data volumes at the SBN. However, substantially more information is necessary to support the current PDS challenge. As per the current logic, the DataSetProcessorImpl#doProcessing method gets the data label file names from index/index.lbl and index/index.tab. Each data label file is then parsed and the meta-data it contains is persisted to the database. More details can be obtained by referring to the code in DataSetProcessorImpl.

For the Cassini data used in this challenge, additional meta-data, stored in supplemental tables, is also required. Consequently, the code must be revised not only to accommodate the meta-data in the data label files, but also to parse the meta-data from the supplemental tables.

For each volume of data (COISS_2001, COISS_2002, COISS_2003, etc.), there are four separate supplemental meta-data files (inventory, moon, ring, and saturn). Each table file has its own label file describing the contents of the table.
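As a rough illustration, the locations of the four table files and their labels could be derived from a volume ID as sketched below. The base URL and the "inventory" and "moon_summary" file names follow the COISS_2001 examples referenced in this section; the "ring_summary" and "saturn_summary" suffixes and the SupplementalFiles class itself are assumptions and should be checked against the actual folder contents.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of the existing module.
final class SupplementalFiles {
    static final String BASE_URL =
            "http://pds-challenge.seti.org/supplemental_tables/COISS_2xxx/";

    // "inventory" and "moon_summary" appear in this specification;
    // "ring_summary" and "saturn_summary" are assumed by analogy.
    private static final String[] SUFFIXES = {
        "inventory", "moon_summary", "ring_summary", "saturn_summary"
    };

    private SupplementalFiles() {
    }

    // Returns the URLs of the four supplemental .tab files for one volume.
    static List<String> tableUrls(String volumeId) {
        List<String> urls = new ArrayList<>();
        for (String suffix : SUFFIXES) {
            urls.add(BASE_URL + volumeId + "/" + volumeId + "_" + suffix + ".tab");
        }
        return urls;
    }

    // Returns the URL of the label file that describes the given table file.
    static String labelUrlFor(String tableUrl) {
        if (!tableUrl.endsWith(".tab")) {
            throw new IllegalArgumentException("Not a .tab file: " + tableUrl);
        }
        return tableUrl.substring(0, tableUrl.length() - ".tab".length()) + ".lbl";
    }

    public static void main(String[] args) {
        for (String url : tableUrls("COISS_2001")) {
            System.out.println(url + " described by " + labelUrlFor(url));
        }
    }
}
```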

Consider COISS_2001: the supplemental meta-data for the inventory of targets within the image field of view is available here - http://pds-challenge.seti.org/supplemental_tables/COISS_2xxx/COISS_2001/COISS_2001_inventory.tab. The label file for this table is COISS_2001_inventory.lbl, which is available in the same folder. The columns in the .tab file are Volume ID, file specification path to the data product label file, Ring Observation ID, and Target (this field recurs as many times as necessary). The "file specification path to the data product label file" is the location of the label file within the raw data volume. The .tab file contains one row for each data file (image) in the raw volume.

Similarly, again for COISS_2001, the 'moon summary' supplemental meta-data is available here - http://pds-challenge.seti.org/supplemental_tables/COISS_2xxx/COISS_2001/COISS_2001_moon_summary.tab. This table contains 35 columns of meta-data relating to the moons of the Saturn system. The contents of the individual columns in the table file are described in the corresponding label file. This table contains one row for each moon in the field of view for each image, so any single image may have from zero to many rows, depending on the image contents.

For example, line 1 of the COISS_2001 inventory .tab file is as follows.

"COISS_2001","data/1454725799_1455008789/N1454725799_1.LBL ","S/IMG/CO/ISS/1454725799/N","RHEA","HELENE","TELESTO"

The absolute file specification path to the data product label file will be http://pds-challenge.seti.org/volumes/COISS_2xxx/COISS_2001/data/1454725799_1455008789/N1454725799_1.LBL

This label file will have the pointers to the data objects (images in our case).
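The row layout and URL resolution described above can be sketched as follows. This is a minimal illustration, not the module's parser: the InventoryRow class is hypothetical, and it assumes every field is enclosed in double quotes with no embedded quote characters. Padding inside a field, such as the trailing space in the file specification of the sample row, is trimmed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical value class for one row of an inventory table.
final class InventoryRow {
    static final String VOLUMES_BASE_URL =
            "http://pds-challenge.seti.org/volumes/COISS_2xxx/";

    // Assumes fields are double-quoted and contain no embedded quotes.
    private static final Pattern QUOTED_FIELD = Pattern.compile("\"([^\"]*)\"");

    final String volumeId;
    final String labelPath;
    final String ringObservationId;
    final List<String> targets;

    private InventoryRow(String volumeId, String labelPath,
                         String ringObservationId, List<String> targets) {
        this.volumeId = volumeId;
        this.labelPath = labelPath;
        this.ringObservationId = ringObservationId;
        this.targets = targets;
    }

    // Parses one line: three fixed fields followed by zero or more targets.
    static InventoryRow parse(String line) {
        List<String> fields = new ArrayList<>();
        Matcher matcher = QUOTED_FIELD.matcher(line);
        while (matcher.find()) {
            fields.add(matcher.group(1).trim()); // drop fixed-width padding
        }
        if (fields.size() < 3) {
            throw new IllegalArgumentException("Expected at least 3 fields: " + line);
        }
        List<String> targets = new ArrayList<>(fields.subList(3, fields.size()));
        return new InventoryRow(fields.get(0), fields.get(1), fields.get(2),
                Collections.unmodifiableList(targets));
    }

    // Resolves the label path against the raw data volume location.
    String labelUrl() {
        return VOLUMES_BASE_URL + volumeId + "/" + labelPath;
    }

    public static void main(String[] args) {
        InventoryRow row = parse("\"COISS_2001\","
                + "\"data/1454725799_1455008789/N1454725799_1.LBL \","
                + "\"S/IMG/CO/ISS/1454725799/N\",\"RHEA\",\"HELENE\",\"TELESTO\"");
        System.out.println(row.labelUrl());
        System.out.println(row.targets);
    }
}
```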

2.1.2. Source Code

SVN - https://coder.topcoder.com/tcs/clients/ntl-pds/assets/assembly/pds_projects/import_and_persistence

Please go through the implementations in gov.nasa.pds.processors.impl and gov.nasa.pds.services.impl to understand how dataset processing and persistence are performed for the SBN datasets.

You can also go through the current architecture, which is located here.

Important - Do not break any existing functionality while updating the code.

2.1.2. Database

The new datasets might need DB schema updates. Make sure that you are strictly following 3NF.

2.1.3 Application Management

Please follow the standards set by the existing code for the application management.

2.1.3.1 Transactions

The services will use Spring to manage their transactions. All modifying methods should provide transactional control.

2.1.3.2 Configuration

The converter and all service implementations will use setter injection for configuration with the use of Spring. 

2.1.3.3 Persistence

This module will use JDBC queries to manage all data. It will use JdbcTemplate from the Spring framework.

2.1.3.4 Logging

The services will log activity and exceptions using the Logging Wrapper in this component.
It will log errors at ERROR level, potentially harmful situations at WARN level, and method entry/exit and input/output information at DEBUG level.

Please follow the logging pattern in the current code. 

2.1.3.5 Exception Handling

The Base Exception component will be used as the top-level exception for all other exceptions thrown by the components.

2.1.3.6 Scalability

With the use of object recycling and preloading during data reading, the application is quite scalable. Services such as validation preload all definitions that will be used for validation instead of accessing the database each time.

Object recycling means that each time a data file is loaded, the Table object is reused, thus minimizing the creation of new objects.
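The two ideas can be illustrated with a small sketch. The ReusableTable and RecyclingLoader classes below are hypothetical stand-ins, not the module's actual Table class or validation service: a single table instance is cleared and refilled for every file, and the definitions used for validation are loaded once up front rather than queried per file.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the module's Table object.
final class ReusableTable {
    private final List<String[]> rows = new ArrayList<>();

    void reset() {
        rows.clear(); // keeps the same instance alive between files
    }

    void addRow(String[] row) {
        rows.add(row);
    }

    int rowCount() {
        return rows.size();
    }
}

// Hypothetical loader that recycles one table and preloads definitions.
final class RecyclingLoader {
    private final ReusableTable table = new ReusableTable(); // created once
    private final Set<String> knownKeywords;                 // preloaded once

    RecyclingLoader(Set<String> preloadedKeywords) {
        this.knownKeywords = new HashSet<>(preloadedKeywords);
    }

    // Loads one file's rows into the shared table, replacing earlier content.
    ReusableTable load(List<String[]> fileRows) {
        table.reset();
        for (String[] row : fileRows) {
            table.addRow(row);
        }
        return table;
    }

    // Validation lookup served from memory instead of the database.
    boolean isKnown(String keyword) {
        return knownKeywords.contains(keyword);
    }
}
```

A consequence of this design is that a caller must finish with the returned table before loading the next file, because the next call to load overwrites its contents.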

3. Technology overview



Final Submission Guidelines

Submission Deliverables

A complete list of deliverables can be found in the TopCoder Assembly competition Tutorial at:

http://apps.topcoder.com/wiki/display/tc/Assembly+Competition+Tutorials

  • Source code and configuration files.
  • Deployment guide to configure and verify the application.

ELIGIBLE EVENTS:

2014 TopCoder(R) Open

Review style

Final Review

Community Review Board

Approval

User Sign-Off

ID: 30035961