The challenge is finished.

Challenge Overview

The objective of the Megahack is simple: develop a search portal that lets users search pending EPA laws and regulations using criteria that are not available today, see whether and how each result relates to Clean Air Act enforcement, and follow links from the laws or regulations to programs that might be related.

Contest Objective

To associate proposed rules with informative facts about them, such as NAICS codes, program names, and the CFR (Code of Federal Regulations) sections that implement them, the EPA provides a dataset called LRS. LRS stands for Legislative Rules Service. This dataset contains lists of current legislation (laws), indexed by chapter, subchapter, and section. It also establishes relationships that associate these laws and chapters with other relevant information, such as the programs the EPA uses to regulate activity, the NAICS codes that associate a law with a specific type of industry, and more.

In this challenge your job is to create a parser in Node.js that provides data for later contests, which will associate proposed rules (pulled from regulations.gov) with programs, codes, and citations. The interface will use the programs and codes to populate drop-downs.
---Build a Node.js module to parse the LRS XML files (provided in the forum)
---The LRS data are SKOS (Simple Knowledge Organization System) XML files
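Because the files are SKOS expressed as RDF/XML, most of the parser reduces to finding record elements and reading a few child values from each. The snippet below is a dependency-free sketch of that step. It assumes that each record is a <skos:Concept> element, which has not been confirmed against the forum files. It uses regular expressions only to stay self-contained. A real module would more likely use an XML parser from npm, such as sax or xml2js. The names RECORD_TAG, extractElements and childText are hypothetical.

```javascript
'use strict';

// Dependency-free sketch for pulling records out of an LRS SKOS/RDF XML string.
// ASSUMPTION: each record is a <skos:Concept ...>...</skos:Concept> element; check the
// forum files and change RECORD_TAG if they use another element (e.g. rdf:Description).
// Regular expressions are a shortcut for illustration, not a general XML parser.
const RECORD_TAG = 'skos:Concept';

// Escape characters that have special meaning inside a RegExp pattern.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Return the inner XML of every <tag ...>...</tag> element, in document order.
function extractElements(xml, tag) {
  const t = escapeRegExp(tag);
  const re = new RegExp('<' + t + '(?:\\s[^>]*)?>([\\s\\S]*?)</' + t + '>', 'g');
  const out = [];
  let m;
  while ((m = re.exec(xml)) !== null) {
    out.push(m[1]);
  }
  return out;
}

// Decode the five predefined XML entities (&amp; last, so "&amp;lt;" stays "&lt;").
function decodeEntities(s) {
  return s.replace(/&lt;/g, '<').replace(/&gt;/g, '>')
    .replace(/&quot;/g, '"').replace(/&apos;/g, "'").replace(/&amp;/g, '&');
}

// Return the decoded, trimmed text of the first <tag> child, or null if absent.
function childText(record, tag) {
  const found = extractElements(record, tag);
  return found.length ? decodeEntities(found[0].trim()) : null;
}

// Illustrative input only; real files carry more namespaces and elements.
const sample = '<rdf:RDF><skos:Concept rdf:about="#1">' +
  '<skos:prefLabel>Brownfields</skos:prefLabel></skos:Concept></rdf:RDF>';
const records = extractElements(sample, RECORD_TAG);
console.log(records.map(r => childText(r, 'skos:prefLabel'))); // [ 'Brownfields' ]
```

The code sticks to syntax that Node.js 6.x supports (no async/await, object spread, or the RegExp "s" flag), in line with the version requirement.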

Requirements
---Node.js 6.x should be used
---Files should be parsed and stored in a database (in-memory is fine) for easy querying.
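Since an in-memory database is acceptable, a Map keyed by termID may be enough for a first version. The tips use termID as the linking key. Below is a minimal sketch of such a store. The class name, the method names, and the assumption that every parsed record carries a termID are all hypothetical.

```javascript
'use strict';

// Hypothetical in-memory store (class and method names are illustrative, not a spec).
// Records are kept in a Map keyed by their LRS termID, which the tips use for linking.
class MemoryStore {
  constructor() {
    this.byId = new Map();
  }

  // Insert or replace a record; every record is assumed to carry a termID.
  put(record) {
    if (!record || record.termID === undefined || record.termID === null) {
      throw new Error('record must have a termID');
    }
    this.byId.set(String(record.termID), record);
  }

  // Look up one record by termID (numbers and strings are treated alike).
  get(termID) {
    return this.byId.has(String(termID)) ? this.byId.get(String(termID)) : null;
  }

  // Return every record matching the predicate, or all records if none is given.
  find(predicate) {
    const all = Array.from(this.byId.values());
    return predicate ? all.filter(predicate) : all;
  }
}

// Illustrative record; the termID here is made up.
const naics = new MemoryStore();
naics.put({ termID: 'example-1', code: '61111', name: '61111 Elementary and Secondary Schools' });
console.log(naics.find(r => r.code.indexOf('611') === 0).length); // 1
```

A separate store per list (NAICS codes, programs, laws) keeps each query simple. An embedded database could replace the Map later without changing callers.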

The XML files should be parsed such that users of the parser can:

1. Retrieve a list of NAICS codes

2. Retrieve a list of Program names, and the CFR citations associated with them

3. Retrieve a list of Laws, and the CFR citations associated with them

Note that a likely use of these lists is to populate drop-down autocomplete lists for 1 and 2, so the ability to filter for unique values will be useful.
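For the drop-down use case, the parser's users need two things. The first is a list without duplicates. The second is a way to narrow that list as the user types. The sketch below shows one way to do both. The function names, the trimming rule, and the default limit of 10 are assumptions.

```javascript
'use strict';

// Sketch of a "unique filter" plus case-insensitive prefix matching that a drop-down
// autocomplete could call. Function names and the default limit of 10 are assumptions.

// Trim each value and drop duplicates, keeping first-seen order.
function unique(values) {
  const seen = new Set();
  const out = [];
  values.forEach(v => {
    const key = String(v).trim();
    if (!seen.has(key)) {
      seen.add(key);
      out.push(key);
    }
  });
  return out;
}

// Unique values that start with the typed prefix, sorted and capped at `limit`.
function autocomplete(values, prefix, limit) {
  const p = String(prefix || '').toLowerCase();
  const max = limit || 10;
  return unique(values)
    .filter(v => v.toLowerCase().indexOf(p) === 0)
    .sort((a, b) => a.localeCompare(b))
    .slice(0, max);
}

// Illustrative inputs: repeated program names, as they may appear across files.
const programNames = ['Brownfields', 'Brownfields ', 'Hypothetical Program'];
console.log(autocomplete(programNames, 'brown')); // [ 'Brownfields' ]
```

The same helpers also work on NAICS codes held as strings. For example, a prefix of "611" narrows the list to codes that begin with 611.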

The following tips are provided to help you understand how the LRS files work. Note that the requirements are listed above.

1. Retrieve NAICS codes

NAICS: WHERE TO GET A LIST OF THEM
File: NAICS2012_LRSRelationships_RDF-SKOS_20160825
Code names: key on <skos:prefLabel>, e.g. <skos:prefLabel>61111 Elementary and Secondary Schools</skos:prefLabel>
Code: <zthes:label>, e.g. <zthes:label>61111</zthes:label>
Linking: use <zthes:termID>, e.g. <zthes:termID>2944589</zthes:termID>
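Following the three tips above, each NAICS concept maps to one small object. The code comes from zthes:label, the name from skos:prefLabel, and the linking key from zthes:termID. The sketch below makes that mapping. It assumes <skos:Concept> records with plain-text children. The sample concept combines the three example values from the tips for illustration only. The helper names are hypothetical.

```javascript
'use strict';

// Sketch: turn the NAICS2012_LRSRelationships_RDF-SKOS_20160825 file into a list of
// { termID, code, name } objects, following the tips above.
// ASSUMPTIONS: each concept is a <skos:Concept ...> element, and its children hold plain
// text (no CDATA or nested markup). Tag names passed in contain no RegExp metacharacters.

// Inner XML of every <tag ...>...</tag> element.
function elements(xml, tag) {
  const re = new RegExp('<' + tag + '(?:\\s[^>]*)?>([\\s\\S]*?)</' + tag + '>', 'g');
  const out = [];
  let m;
  while ((m = re.exec(xml)) !== null) out.push(m[1]);
  return out;
}

// Trimmed text of the first <tag> child, or null.
function first(xml, tag) {
  const found = elements(xml, tag);
  return found.length ? found[0].trim() : null;
}

// code <- zthes:label, name <- skos:prefLabel, termID <- zthes:termID (used for linking).
function parseNaics(xml) {
  return elements(xml, 'skos:Concept')
    .map(c => ({
      termID: first(c, 'zthes:termID'),
      code: first(c, 'zthes:label'),
      name: first(c, 'skos:prefLabel')
    }))
    .filter(r => r.code !== null); // skip concepts that carry no NAICS code
}

// Illustrative concept assembled from the three examples in the tips.
const sample = '<skos:Concept rdf:about="#x">' +
  '<skos:prefLabel>61111 Elementary and Secondary Schools</skos:prefLabel>' +
  '<zthes:label>61111</zthes:label>' +
  '<zthes:termID>2944589</zthes:termID>' +
  '</skos:Concept>';
console.log(parseNaics(sample).map(r => r.code)); // [ '61111' ]
```

In a real module the XML string would come from something like fs.readFileSync(path, 'utf8').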


2. Retrieve programs and their associated regulations

PROGRAMS: WHERE TO GET A LIST OF THEM
In EPAProgramProject_LRSRelationships_RDF-SKOS_20160825.xml, key on <skos:prefLabel>THIS WILL BE THE PROGRAM NAME</skos:prefLabel> to find the program names (such as "Brownfields").
Use it to build a list of programs and their associated termIDs.
Match each termID against <skm:PC rdf:resource="#<termID>"> in any "CFR2015Title40..." file to find the regulations related to that program.
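The matching step above is essentially a join on termID. First, index every CFR record by the program termIDs that its skm:PC elements reference. Then read the program list and attach the matching CFR entries. The sketch below works under several assumptions that should be checked against the forum files: records are <skos:Concept> elements, program IDs are in zthes:termID, and a CFR record is identified by its own skos:prefLabel. The sample termID and CFR label are made up, and the function names are hypothetical.

```javascript
'use strict';

// Sketch: list each program with the CFR records that reference its termID.
// ASSUMPTIONS (check against the forum files): records are <skos:Concept> elements;
// a program's name is skos:prefLabel and its id is zthes:termID; a CFR record links to
// a program via <skm:PC rdf:resource="#<termID>"> and is named by its own skos:prefLabel.

function elements(xml, tag) {
  const re = new RegExp('<' + tag + '(?:\\s[^>]*)?>([\\s\\S]*?)</' + tag + '>', 'g');
  const out = [];
  let m;
  while ((m = re.exec(xml)) !== null) out.push(m[1]);
  return out;
}

function first(xml, tag) {
  const found = elements(xml, tag);
  return found.length ? found[0].trim() : null;
}

// Every termID a record points at through skm:PC (the leading "#" is dropped).
function programRefs(record) {
  const re = /<skm:PC\s[^>]*rdf:resource="#([^"]+)"/g;
  const ids = [];
  let m;
  while ((m = re.exec(record)) !== null) ids.push(m[1]);
  return ids;
}

// programXml: EPAProgramProject file contents; cfrXmls: contents of CFR2015Title40 files.
function programsWithCitations(programXml, cfrXmls) {
  const citations = new Map(); // termID -> labels of CFR records that reference it
  cfrXmls.forEach(xml => {
    elements(xml, 'skos:Concept').forEach(rec => {
      const label = first(rec, 'skos:prefLabel');
      programRefs(rec).forEach(id => {
        if (!citations.has(id)) citations.set(id, []);
        citations.get(id).push(label);
      });
    });
  });
  return elements(programXml, 'skos:Concept').map(rec => {
    const termID = first(rec, 'zthes:termID');
    return { termID: termID, name: first(rec, 'skos:prefLabel'), cfr: citations.get(termID) || [] };
  });
}

// Illustrative inputs; the termID and CFR label are made up.
const programXml = '<skos:Concept><skos:prefLabel>Brownfields</skos:prefLabel>' +
  '<zthes:termID>100</zthes:termID></skos:Concept>';
const cfrXml = '<skos:Concept><skos:prefLabel>Hypothetical Part</skos:prefLabel>' +
  '<skm:PC rdf:resource="#100"/></skos:Concept>';
console.log(programsWithCitations(programXml, [cfrXml])[0].cfr); // [ 'Hypothetical Part' ]
```

The code builds the termID index once across all CFR files. Each program lookup is then a single Map access, so the code never rescans the CFR files for each program.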


3. Retrieve Clean Air Act rule information

REGULATIONS: HOW TO FIND THEM IN LRS DATA
They are in the files named CFR2015Title40.....xml. Ignore any file whose name does not begin with "CFR2015Title40".
Note that these files contain more regulations than we care about.
We care only about Parts 50 through 98, which are spread across many of the XML volumes. Search them to find the right Parts.
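The two rules above can be turned into simple predicates. The first selects files by their "CFR2015Title40" prefix. The second keeps only Parts 50 through 98. The tips do not say how a Part number appears inside a record. The partFromLabel helper below is therefore hypothetical: it assumes labels like "Part 60 ..." and would need to be adapted once the real files are inspected. The file names in the example are also hypothetical.

```javascript
'use strict';

// Sketch of the two filters described above. The filename rule comes from the tips;
// how a Part number appears inside a record is NOT specified, so partFromLabel() is a
// hypothetical helper that assumes labels like "Part 60 ..." and must be adapted.
const CFR_PREFIX = 'CFR2015Title40';
const MIN_PART = 50;
const MAX_PART = 98;

// True for .xml files whose base name starts with CFR2015Title40.
function isCfrTitle40File(filename) {
  const base = filename.split(/[\/\\]/).pop();
  return base.startsWith(CFR_PREFIX) && /\.xml$/i.test(base);
}

// Hypothetical: first "Part <n>" number in a label, or null if there is none.
function partFromLabel(label) {
  const m = /\bPart\s+(\d+)\b/i.exec(label || '');
  return m ? parseInt(m[1], 10) : null;
}

// Parts 50 through 98 inclusive.
function isPartOfInterest(part) {
  return part !== null && part >= MIN_PART && part <= MAX_PART;
}

// Hypothetical file names, for illustration only.
const files = ['CFR2015Title40Vol1.xml', 'CFR2014Title40Vol1.xml', 'NAICS2012_LRSRelationships_RDF-SKOS_20160825.xml'];
console.log(files.filter(isCfrTitle40File)); // [ 'CFR2015Title40Vol1.xml' ]
```

In practice, the list of names would come from fs.readdirSync(dir). Each matching file would then be read and its records filtered with isPartOfInterest.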


Since there is no Node.js module for parsing SKOS data, other options can be used. One example is a Python SKOS library (https://github.com/geo-data/python-skos). This is only a suggestion; competitors are free to propose other options.


Reference Documents
---Will be posted in the contest forum.

Final Submission Guidelines

---A Node.js module that can be installed with npm
---A GitHub or GitLab repository link with the source code. Add the handles coderReview and rsial2 as collaborators.
---Deployment and usage instructions should be included in a README.md file

Review style
---Final review: Community Review Board
---Approval: User Sign-Off

ID: 30055711