Challenge Overview
Contest Objective
To associate proposed rules with informative facts about them, such as NAICS codes, program names, and the CFR (Code of Federal Regulations) citations that implement them, the EPA provides a dataset called LRS. LRS stands for Legislative Rules Service. This dataset contains lists of current legislation (laws), indexed by chapter, subchapter and section. It also establishes relationships that associate these laws and chapters with other relevant information, such as the Programs the EPA uses to regulate activity and the NAICS codes that associate a law with a specific type of industry.
---Build a Node.js module to parse LRS XML files (provided in the forum)
---LRS data are SKOS (Simple Knowledge Organization System) XML files
Requirements
---Node.js 6.x should be used
---Files should be parsed and stored in a database (an in-memory store is fine) so they are easy to query.
The XML files should be parsed such that users of the parser can:
1. Retrieve a list of NAICS codes
2. Retrieve a list of Program names, and the CFR citations associated with them
3. Retrieve a list of Laws, and the CFR citations associated with them
Note that a likely use case for these lists is to populate drop-down autocomplete lists for items 1 and 2, so the ability to filter each list down to unique values will be useful.
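As an illustration of the unique filter mentioned above, here is a minimal sketch of how an autocomplete lookup might work over an in-memory list. The function name uniqueMatches and the "Example Program" entries are made up for this example; real values would come from the parsed LRS files.

```javascript
// Hypothetical helper (not part of any provided API): returns the unique
// labels that contain `query` (case-insensitive), sorted alphabetically.
function uniqueMatches(labels, query) {
  const needle = String(query || '').toLowerCase();
  const seen = new Set();
  labels.forEach((label) => {
    if (label.toLowerCase().indexOf(needle) !== -1) {
      seen.add(label); // a Set silently drops duplicate labels
    }
  });
  return Array.from(seen).sort();
}

// Made-up sample data; only "Brownfields" is a name taken from this spec.
const programNames = [
  'Brownfields',
  'Example Program A',
  'Brownfields',
  'Example Program B'
];

console.log(uniqueMatches(programNames, 'brown'));
console.log(uniqueMatches(programNames, 'example'));
```

A Set keeps the deduplication simple; a submission backed by a real database could get the same effect with a distinct query.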
To help you understand how the LRS files work, the following tips are provided. The requirements themselves are listed above.
1. Retrieve NAICS codes
NAICS: where to get a list of them. This file: NAICS2012_LRSRelationships_RDF-SKOS_20160825
Code Names: key on <skos:prefLabel> e.g. <skos:prefLabel>61111 Elementary and Secondary Schools</skos:prefLabel>
Code: <zthes:label> e.g. <zthes:label>61111</zthes:label>
Use <zthes:termID> for linking e.g. <zthes:termID>2944589</zthes:termID>
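The three tags above could be pulled out of the NAICS file roughly as follows. This is only a sketch: it assumes each entry is wrapped in a <skos:Concept> element (an assumption, check the real files), it uses regular expressions on a small inline sample instead of a proper XML parser, and the names firstText and parseNaics are hypothetical.

```javascript
// Inline sample built from the tag examples in this spec. The wrapping
// <skos:Concept> element is an ASSUMPTION about the real file layout.
const sampleNaicsXml = [
  '<skos:Concept>',
  '  <skos:prefLabel>61111 Elementary and Secondary Schools</skos:prefLabel>',
  '  <zthes:label>61111</zthes:label>',
  '  <zthes:termID>2944589</zthes:termID>',
  '</skos:Concept>'
].join('\n');

// Returns the text of the first <tag>...</tag> inside `block`, or null.
function firstText(block, tag) {
  const pattern = new RegExp('<' + tag + '(?:\\s[^>]*)?>([^<]*)</' + tag + '>');
  const match = pattern.exec(block);
  return match ? match[1].trim() : null;
}

// Hypothetical parser: one record per <skos:Concept> block.
function parseNaics(xml) {
  const blocks = xml.match(/<skos:Concept[\s>][\s\S]*?<\/skos:Concept>/g) || [];
  return blocks.map((block) => ({
    name: firstText(block, 'skos:prefLabel'), // code name shown to users
    code: firstText(block, 'zthes:label'),    // bare NAICS code
    termId: firstText(block, 'zthes:termID')  // key used for linking
  }));
}

console.log(parseNaics(sampleNaicsXml));
```

Regular expressions are used here only to keep the example free of dependencies; for the full files a streaming XML parser would likely be more robust.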
2. Retrieve Programs and their associated regulations
In EPAProgramProject_LRSRelationships_RDF-SKOS_20160825.xml, key on <skos:prefLabel>THIS WILL BE THE PROGRAM NAME</skos:prefLabel> to find your program names (like "Brownfields").
Use that to get a list of Programs and their associated termID.
Match each termID to <skm:PC rdf:resource="#<termID>" in any "CFR2015Title40..." file to find the regulations that are related to that program.
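The termID matching described above might look something like the sketch below. Several things here are assumptions rather than facts about the real data: that CFR entries are wrapped in <skos:Concept> elements, that <skm:PC> is a self-closing tag, and the termID values and "Example regulation" labels, which are invented. The function names are hypothetical as well.

```javascript
// Made-up CFR fragment. The <skos:Concept> wrapper, the self-closing
// <skm:PC .../> form, and all IDs and labels are ASSUMPTIONS.
const sampleCfrXml = [
  '<skos:Concept>',
  '  <skos:prefLabel>Example regulation A</skos:prefLabel>',
  '  <skm:PC rdf:resource="#1000001"/>',
  '</skos:Concept>',
  '<skos:Concept>',
  '  <skos:prefLabel>Example regulation B</skos:prefLabel>',
  '  <skm:PC rdf:resource="#1000002"/>',
  '</skos:Concept>',
  '<skos:Concept>',
  '  <skos:prefLabel>Example regulation C</skos:prefLabel>',
  '  <skm:PC rdf:resource="#1000001"/>',
  '  <skm:PC rdf:resource="#1000002"/>',
  '</skos:Concept>'
].join('\n');

// Returns the text of the first <tag>...</tag> inside `block`, or null.
function firstText(block, tag) {
  const pattern = new RegExp('<' + tag + '(?:\\s[^>]*)?>([^<]*)</' + tag + '>');
  const match = pattern.exec(block);
  return match ? match[1].trim() : null;
}

// Hypothetical lookup: labels of every entry that carries an <skm:PC>
// reference to the given program termID.
function regulationsForProgram(cfrXml, programTermId) {
  const blocks = cfrXml.match(/<skos:Concept[\s>][\s\S]*?<\/skos:Concept>/g) || [];
  const reference = 'rdf:resource="#' + programTermId + '"';
  return blocks
    .filter((block) => {
      const pcTags = block.match(/<skm:PC\b[^>]*>/g) || [];
      return pcTags.some((tag) => tag.indexOf(reference) !== -1);
    })
    .map((block) => firstText(block, 'skos:prefLabel'));
}

console.log(regulationsForProgram(sampleCfrXml, '1000001'));
```

Including the closing quote in the search string keeps a termID such as 100 from also matching 1000.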
3. Retrieve Clean Air Act rules information
They're in the files labeled CFR2015Title40.....xml. Ignore any files that don't begin with "CFR2015Title40".
Note that these files contain more regulations than we care about.
We care only about Parts 50 through 98, which are spread across many of the XML volumes. Search them to find the right Parts.
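The two filters described above (file-name prefix and Part range) can be sketched as follows. How a Part number is encoded inside the XML is not described here, so the example assumes each parsed record already carries a numeric part field; that field, the record titles, and the first sample file name are all hypothetical.

```javascript
// Keep only files whose names begin with "CFR2015Title40".
function isTitle40File(fileName) {
  return fileName.indexOf('CFR2015Title40') === 0;
}

// Keep only Parts 50 through 98, inclusive.
function isPartOfInterest(partNumber) {
  return Number.isInteger(partNumber) && partNumber >= 50 && partNumber <= 98;
}

// Hypothetical inputs for illustration only.
const fileNames = [
  'CFR2015Title40_example_volume.xml', // made-up name with the right prefix
  'EPAProgramProject_LRSRelationships_RDF-SKOS_20160825.xml',
  'README.md'
];
const records = [
  { part: 49, title: 'Example entry below the range' },
  { part: 50, title: 'Example entry at the lower bound' },
  { part: 98, title: 'Example entry at the upper bound' },
  { part: 99, title: 'Example entry above the range' }
];

const title40Files = fileNames.filter(isTitle40File);
const keptRecords = records.filter((record) => isPartOfInterest(record.part));

console.log(title40Files);
console.log(keptRecords.map((record) => record.part));
```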
Since there is no Node.js module to parse SKOS data, other options can be used. One example is a Python SKOS library (https://github.com/geo-data/python-skos). This is just a suggestion; competitors are free to propose other options.
Reference Documents
---Will be posted in the contest forum.
Final Submission Guidelines
---A Node.js module that can be installed using npm
---GitHub or GitLab repository link with the source code. Add handles coderReview and rsial2 as collaborators.
---Deployment and usage instructions should be included in a README.md file