Challenge Overview
We have a couple of projects in progress that are producing bounding boxes on a set of images, and we need to be able to determine whether any of the boxes overlap and, if required, remove the overlapping bounding boxes. Your job in this challenge is to write a simple script that does the following:
- get the list of images (the distinct list is available in the IMAGE_OCR.NAME field)
- query for the list of phrases/rectangles within each image
- see if any of the rectangles overlap with each other. The coordinates for the bounding boxes/marks are in the IMAGE_OCR_PHRASE.X1, IMAGE_OCR_PHRASE.Y1, IMAGE_OCR_PHRASE.X2 and IMAGE_OCR_PHRASE.Y2 fields
- remove overlapping rectangles: where two rectangles clash, keep the one with the smaller area.
- save the kept records to a new database table - IMAGE_OCR_PHRASE_KEEP. IMAGE_OCR_PHRASE_KEEP should have exactly the same structure as IMAGE_OCR_PHRASE.
- save the reject records to a new database table - IMAGE_OCR_PHRASE_REJECT. IMAGE_OCR_PHRASE_REJECT should have exactly the same structure as IMAGE_OCR_PHRASE.
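The overlap test and the keep-the-smaller rule can be sketched in plain Python as below. This is a minimal sketch under assumptions the brief does not state: (X1, Y1) and (X2, Y2) are opposite corners of the box, boxes that only touch along an edge do not count as overlapping, and chains of clashes are resolved greedily by visiting boxes from smallest to largest area and keeping a box only if it overlaps nothing already kept. The `Box` type, its `id` field and the function names are hypothetical stand-ins for an IMAGE_OCR_PHRASE row and its primary key.

```python
from __future__ import annotations

from typing import NamedTuple


class Box(NamedTuple):
    """One IMAGE_OCR_PHRASE row; `id` is a hypothetical primary-key field."""
    id: int
    x1: float
    y1: float
    x2: float
    y2: float

    def normalized(self) -> tuple[float, float, float, float]:
        # Assumption: (X1, Y1) and (X2, Y2) are opposite corners, in any order.
        return (min(self.x1, self.x2), min(self.y1, self.y2),
                max(self.x1, self.x2), max(self.y1, self.y2))

    def area(self) -> float:
        left, top, right, bottom = self.normalized()
        return (right - left) * (bottom - top)


def overlaps(a: Box, b: Box) -> bool:
    """True if the boxes share a region of positive area (touching edges do not count)."""
    al, at, ar, ab = a.normalized()
    bl, bt, br, bb = b.normalized()
    return al < br and bl < ar and at < bb and bt < ab


def split_keep_reject(boxes: list[Box]) -> tuple[list[Box], list[Box]]:
    """Greedy pass over one image: smallest boxes first, keep a box only if it clashes with no kept box."""
    keep: list[Box] = []
    reject: list[Box] = []
    # Sort by area, then by id, so equal-area ties are resolved deterministically.
    for box in sorted(boxes, key=lambda b: (b.area(), b.id)):
        if any(overlaps(box, kept) for kept in keep):
            reject.append(box)
        else:
            keep.append(box)
    return keep, reject
```

For example, `split_keep_reject([Box(1, 0, 0, 10, 10), Box(2, 5, 5, 8, 8), Box(3, 20, 20, 30, 30)])` keeps boxes 2 and 3 and rejects box 1, the larger box in the only clashing pair. The check is pairwise, so it is O(n²) per image, which should be acceptable for the number of phrases found on a single image.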
Final Submission Guidelines
- Python 3 script
- Deployment instructions and the required installation modules (pip install ...).
- The mysqldump file of your output after running your app, including the manually inserted bounding boxes and the structures and data for the IMAGE_OCR_PHRASE_KEEP and IMAGE_OCR_PHRASE_REJECT tables.
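One way to give IMAGE_OCR_PHRASE_KEEP and IMAGE_OCR_PHRASE_REJECT exactly the same structure as IMAGE_OCR_PHRASE is MySQL's `CREATE TABLE ... LIKE`, which copies the column definitions and indexes of the source table. The sketch below assumes a DB-API 2.0 cursor that uses `%s` placeholders (as PyMySQL does) and a hypothetical primary-key column named `ID`; the real column name is not given in the brief, and the function names are illustrative only.

```python
from __future__ import annotations

from collections.abc import Iterable

SOURCE_TABLE = "IMAGE_OCR_PHRASE"
TARGET_TABLES = ("IMAGE_OCR_PHRASE_KEEP", "IMAGE_OCR_PHRASE_REJECT")


def create_target_tables(cursor) -> None:
    """(Re)create the KEEP/REJECT tables with the same structure as the source table."""
    for table in TARGET_TABLES:
        cursor.execute(f"DROP TABLE IF EXISTS {table}")
        # MySQL's CREATE TABLE ... LIKE copies column definitions and indexes, not data.
        cursor.execute(f"CREATE TABLE {table} LIKE {SOURCE_TABLE}")


def copy_rows(cursor, table: str, ids: Iterable[int]) -> None:
    """Copy the full source rows with the given ids into `table`."""
    ids = list(ids)
    if not ids:
        return  # "IN ()" is invalid SQL, so skip empty batches
    placeholders = ", ".join(["%s"] * len(ids))
    # `ID` is a hypothetical primary-key column name; adjust it to the real schema.
    cursor.execute(
        f"INSERT INTO {table} SELECT * FROM {SOURCE_TABLE} WHERE ID IN ({placeholders})",
        ids,
    )
```

With PyMySQL, for instance, the cursor would come from `pymysql.connect(...).cursor()`, and `connection.commit()` is needed after the inserts because autocommit is off by default. The kept and rejected ids would be the ones produced by the overlap pass for each image.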