Challenge Overview
Our website consists of public and entitled content. We need a web crawler that can log on to the entitled portion of the site and check that different types of resources are available. The crawler must also be able to set a cookie that determines the country/language version of the content.
The overall goal is to find expired and missing resources so that the content pages referencing them can be identified and corrected. The output should be a list of the pages that need correction.
Solutions that build on open source crawlers will be considered.
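To make the requirements concrete, here is a minimal sketch of the core checks, using only the Python standard library rather than an existing open source crawler. Everything specific in it is an assumption, not something the brief specifies: the cookie name `locale`, the login form fields `username` and `password`, and the helper names `make_opener`, `login`, `check_resource`, and `find_broken` are all hypothetical. A real site may use a different login mechanism entirely, and some servers reject HEAD requests, in which case a GET would be needed.

```python
from html.parser import HTMLParser
from http.cookiejar import Cookie, CookieJar
from urllib.parse import urlencode, urljoin
import urllib.error
import urllib.request


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from href/src attributes on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(urljoin(self.base_url, value))


def make_locale_cookie(domain, value, name="locale"):
    """Build a session cookie; the cookie name 'locale' is an assumption."""
    return Cookie(
        version=0, name=name, value=value, port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=False,
        path="/", path_specified=True, secure=False, expires=None,
        discard=True, comment=None, comment_url=None, rest={},
    )


def make_opener(domain, locale):
    """Return an opener that sends the locale cookie and keeps session cookies."""
    jar = CookieJar()
    jar.set_cookie(make_locale_cookie(domain, locale))
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def login(opener, login_url, username, password, timeout=10):
    """POST credentials as a form; the field names are hypothetical."""
    data = urlencode({"username": username, "password": password}).encode()
    with opener.open(login_url, data=data, timeout=timeout) as response:
        return response.status


def check_resource(opener, url, timeout=10):
    """Return the HTTP status code, or None if the request failed outright."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
    except urllib.error.URLError:
        return None


def find_broken(page_url, html, check):
    """Return (page_url, resource_url, status) for each failing resource."""
    parser = LinkExtractor(page_url)
    parser.feed(html)
    broken = []
    for link in parser.links:
        status = check(link)
        if status is None or status >= 400:
            broken.append((page_url, link, status))
    return broken
```

In use, `check` would typically be `lambda url: check_resource(opener, url)` after a successful `login`, and the distinct page URLs in the combined `find_broken` results would form the list of pages to correct. Passing `check` as a parameter also lets the reporting logic be exercised without network access.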