Web crawler that checks for missing or expired content

The challenge is finished.

Challenge Overview

Our website consists of public and entitled content. We need a web crawler that can log on to the entitled portion of the site and check that different types of resources are available. The crawler must also be able to set the cookie that determines the country/language version of the content.
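As a rough illustration of the logon and cookie requirements, here is a minimal sketch using only the Python standard library. Everything site-specific is an assumption, not something stated in this challenge: the cookie name "locale", values such as "en-US", the form-based login, and the helper names make_locale_cookie, open_session and fetch are all hypothetical.

```python
import urllib.error
import urllib.parse
import urllib.request
from http.cookiejar import Cookie, CookieJar


def make_locale_cookie(domain, value, name="locale"):
    """Build a session cookie that selects the country/language version.

    Assumption: the real cookie name and value format are site-specific.
    """
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


def open_session(site_host, locale, login_url=None, credentials=None):
    """Return (opener, jar) with the locale cookie set and, optionally, logged on.

    Assumption: logon is a plain form POST of `credentials` (a dict of
    form fields) to `login_url`; the real site may use another scheme.
    """
    jar = CookieJar()
    jar.set_cookie(make_locale_cookie(site_host, locale))
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    if login_url and credentials:
        form = urllib.parse.urlencode(credentials).encode("utf-8")
        # Session cookies returned by the server are stored in `jar`.
        with opener.open(login_url, data=form, timeout=30):
            pass
    return opener, jar


def fetch(opener, url):
    """Return (status, content_type, body); HTTP errors become plain statuses."""
    try:
        with opener.open(url, timeout=30) as resp:
            ctype = resp.headers.get_content_type()
            # Only HTML bodies are needed later for link extraction.
            body = resp.read().decode("utf-8", errors="replace") if ctype == "text/html" else ""
            return resp.status, ctype, body
    except urllib.error.HTTPError as err:
        return err.code, "", ""
```

Because the cookie jar is shared by every request made through the opener, the same session can be reused for both public and entitled pages, and a second session with a different locale value can recheck the same URLs for another country/language version.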

The overall goal is to find expired and missing resources so that the affected content pages can be identified and corrected. The output should list the pages that need to be corrected.
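One way the reporting step could work is sketched below in Python. It is not a prescribed design. It assumes that a missing resource answers with HTTP 404 and an expired one with HTTP 410; the challenge does not define how expiry is signalled, so that mapping is a placeholder. The names ResourceExtractor and find_pages_to_correct are hypothetical, and fetch is any callable returning (status, content_type, body) for a URL.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse

# Assumption: 404 means "missing" and 410 means "expired" on this site.
PROBLEM_STATUSES = {404: "missing", 410: "expired"}


class ResourceExtractor(HTMLParser):
    """Collects every href/src value: links, images, scripts, documents."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.urls.append(value)


def find_pages_to_correct(start_url, fetch, max_pages=500):
    """Breadth-first crawl; returns {page_url: [(resource_url, problem), ...]}."""
    site = urlparse(start_url).netloc
    results = {}  # each URL is fetched at most once

    def get(url):
        if url not in results:
            results[url] = fetch(url)
        return results[url]

    report = {}
    queue = deque([start_url])
    queued = {start_url}
    crawled = 0
    while queue and crawled < max_pages:
        page = queue.popleft()
        status, ctype, body = get(page)
        if status != 200 or ctype != "text/html":
            continue
        crawled += 1
        parser = ResourceExtractor()
        parser.feed(body)
        for raw in parser.urls:
            url = urldefrag(urljoin(page, raw)).url  # absolute, no #fragment
            if urlparse(url).scheme not in ("http", "https"):
                continue  # skip mailto:, javascript:, and similar
            r_status, r_ctype, _ = get(url)
            if r_status in PROBLEM_STATUSES:
                # Record the problem against the page that references it.
                report.setdefault(page, []).append((url, PROBLEM_STATUSES[r_status]))
            elif r_ctype == "text/html" and urlparse(url).netloc == site and url not in queued:
                queued.add(url)
                queue.append(url)  # only same-site HTML pages are crawled further
    return report
```

Keeping fetch as a parameter means the crawl logic can be exercised against canned responses, while a production run would pass a function backed by a logged-on HTTP session. The report is keyed by the referencing page rather than by the broken resource, since the pages are what content editors need to correct.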

Solutions that build on open source crawlers will be considered.

Review style

Final Review

Community Review Board

Approval

User Sign-Off

Challenge links

ID: 30031571