Challenge Overview
In this challenge, we’re going to develop a Python desktop tool for marking regions of images with bounding boxes and recording the text that each marked region contains. This tool will be used to establish ground truth for the data files and to evaluate solutions.
I. Requirements
Here is what the tool needs to do:
- Allow users to select a local or remote input folder where a set of image files (mud log images) will reside. Local folders and Amazon S3 folders will be the most typical sources.
- Load the list of image files found in the selected folder.
- Save the IMAGE_URL, IMAGE_NAME, IMAGE_WIDTH, and IMAGE_HEIGHT to the IMAGE_OCR table in the database.
- Provide users with the ability to select a particular image from the list.
- Selecting a particular image will load the image (gif, jpg, png, or tiff format) into the main workspace of the app.
- The main workspace is essentially an image editor with a single function: a bounding box tool. It should allow users to zoom in and out on an image and to place and move multiple rectangular bounding boxes on the image.
- Users should have the ability to select and remove the boxes as well.
- Users should be able to save the bounding box characteristics into a database. Each bounding box will surround a phrase, and each bounding box/phrase will correspond to one record in the IMAGE_OCR_PHRASE table, which stores the bounding box coordinates.
- The app should provide a means (a text input control) for manually entering the text enclosed by each bounding box. It would be great if the text input control were context driven. For example, right-clicking a bounding box could bring up a small dialog box in which users enter the text.
- The tool should also have the ability to connect to an existing database of image data and bounding box metadata. Provided the URLs in the database are accessible, a user should be able to select any image in the database and see the loaded image along with the bounding boxes and phrases previously defined.
- Once an image has been loaded into the workspace and the bounding boxes have rendered, users should be able to adjust the existing bounding boxes and edit the text associated with them.
- There should be a save operation after adjustments have been made.
- Don’t worry about calculating IMAGE_OCR.PHRASE_COUNT or IMAGE_OCR.SCORE. We’ll develop that functionality later.
- Don’t worry about calculating IMAGE_OCR_PHRASE.SCORE or populating IMAGE_OCR_PHRASE.OCR_PHRASE_TYPE. We’ll develop that functionality later.
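The zoom and bounding box requirements above imply that boxes should be stored in image pixel coordinates, independent of the current zoom level. The following is a minimal sketch of one way to do that, not a required design. The names BoundingBox, box_from_drag, to_screen, and hit_test are hypothetical, and the sketch assumes a single zoom factor with no scroll offset.

```python
from typing import List, NamedTuple, Optional, Tuple


class BoundingBox(NamedTuple):
    """A phrase box in image pixel coordinates (hypothetical helper type)."""
    x1: int  # upper left x
    y1: int  # upper left y
    x2: int  # lower right x
    y2: int  # lower right y
    text: str = ""


def box_from_drag(start: Tuple[float, float], end: Tuple[float, float],
                  zoom: float, text: str = "") -> BoundingBox:
    """Build a box from two screen-space drag points at the given zoom."""
    # Divide by the zoom factor so stored values do not depend on zoom.
    xs = sorted((round(start[0] / zoom), round(end[0] / zoom)))
    ys = sorted((round(start[1] / zoom), round(end[1] / zoom)))
    # Sorting guarantees (x1, y1) is upper left and (x2, y2) is lower right,
    # whichever direction the user dragged.
    return BoundingBox(xs[0], ys[0], xs[1], ys[1], text)


def to_screen(box: BoundingBox, zoom: float) -> Tuple[int, ...]:
    """Scale a stored box back to screen pixels for drawing."""
    return tuple(round(v * zoom) for v in box[:4])


def hit_test(boxes: List[BoundingBox], screen_point: Tuple[float, float],
             zoom: float) -> Optional[int]:
    """Return the index of the most recently added box under a click."""
    px, py = screen_point[0] / zoom, screen_point[1] / zoom
    for index in range(len(boxes) - 1, -1, -1):
        b = boxes[index]
        if b.x1 <= px <= b.x2 and b.y1 <= py <= b.y2:
            return index
    return None
```

With this layout, selecting or removing a box reduces to calling hit_test on the click position and then highlighting or deleting the returned index, and the X1, Y1, X2, Y2 values can be written to the database unchanged.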
II. Database Schema
The IMAGE_OCR table contains one row per image, and must at least have the following fields:
- IMAGE_URL - URL of the file, including the image name (PK)
- IMAGE_NAME - Image name
- IMAGE_WIDTH - Image width in pixels
- IMAGE_HEIGHT - Image height in pixels
- PHRASE_COUNT - Total number of phrases identified in this image
- SCORE - Total score: the sum of the individual phrase scores for this image in IMAGE_OCR_PHRASE
CREATE TABLE IMAGE_OCR
(
    IMAGE_URL char(250) NOT NULL,
    IMAGE_NAME char(200) NOT NULL,
    IMAGE_WIDTH int DEFAULT 0 NOT NULL,
    IMAGE_HEIGHT int DEFAULT 0 NOT NULL,
    PHRASE_COUNT int DEFAULT 0 NOT NULL,
    SCORE int
);
CREATE UNIQUE INDEX IMAGE_NAME ON IMAGE_OCR(IMAGE_URL);
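As an illustration of the requirement to record each image's metadata, here is a minimal sketch of registering an image in IMAGE_OCR. It uses Python's built-in sqlite3 module purely as a stand-in so the example runs without a server; the real tool targets MySQL, where the driver and the parameter placeholder style will differ. The function names open_demo_db and register_image are hypothetical, and the width and height are assumed to have been read from the image already.

```python
import sqlite3

# Same table definition as above; SQLite accepts these column types.
IMAGE_OCR_SCHEMA = """
CREATE TABLE IMAGE_OCR
(
    IMAGE_URL char(250) NOT NULL,
    IMAGE_NAME char(200) NOT NULL,
    IMAGE_WIDTH int DEFAULT 0 NOT NULL,
    IMAGE_HEIGHT int DEFAULT 0 NOT NULL,
    PHRASE_COUNT int DEFAULT 0 NOT NULL,
    SCORE int
);
CREATE UNIQUE INDEX IMAGE_NAME ON IMAGE_OCR(IMAGE_URL);
"""


def open_demo_db():
    """Create an in-memory database with the IMAGE_OCR table (demo only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(IMAGE_OCR_SCHEMA)
    return conn


def register_image(conn, url, name, width, height):
    """Insert one IMAGE_OCR row unless this URL is already recorded.

    Returns True if a row was inserted, False if the image was known.
    """
    # sqlite3 uses "?" placeholders; MySQL drivers typically use "%s".
    cur = conn.execute("SELECT 1 FROM IMAGE_OCR WHERE IMAGE_URL = ?", (url,))
    if cur.fetchone() is not None:
        return False
    conn.execute(
        "INSERT INTO IMAGE_OCR (IMAGE_URL, IMAGE_NAME, IMAGE_WIDTH, IMAGE_HEIGHT) "
        "VALUES (?, ?, ?, ?)",
        (url, name, width, height),
    )
    conn.commit()
    return True
```

PHRASE_COUNT and SCORE are deliberately left at their defaults here, since calculating them is out of scope for this challenge.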
The IMAGE_OCR_PHRASE table contains one row per phrase, and must at least have the following fields:
- IMAGE_URL - URL of the file, including the image name (FK)
- IMAGE_NAME - Image name
- OCR_PHRASE - Phrase text identified
- OCR_PHRASE_TYPE - Type of phrase identified: "Show", "Stain", "Trace", or "Negative"
- SCORE - Score corresponding to phrase type: 3 for "Show", 2 for "Stain", 1 for "Trace", and 0 for "Negative"
- X1 - pixel x coordinate of upper left corner of phrase bounding box
- Y1 - pixel y coordinate of upper left corner of phrase bounding box
- X2 - pixel x coordinate of lower right corner of phrase bounding box
- Y2 - pixel y coordinate of lower right corner of phrase bounding box
CREATE TABLE IMAGE_OCR_PHRASE
(
    IMAGE_URL char(250) NOT NULL,
    IMAGE_NAME char(200) NOT NULL,
    OCR_PHRASE_TYPE char(10) NOT NULL,
    OCR_PHRASE longtext,
    SCORE int,
    X1 int,
    Y1 int,
    X2 int,
    Y2 int,
    CONTEXT char(200) NOT NULL
);
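To illustrate the save and reload cycle for bounding boxes, here is a minimal sketch that replaces all stored phrases for one image and reads them back. As an assumption for the sake of a runnable example, it uses Python's built-in sqlite3 module rather than MySQL. The names save_phrases and load_phrases are hypothetical. Because OCR_PHRASE_TYPE and CONTEXT are declared NOT NULL but are not populated by this tool, the sketch writes empty strings for them, which is also an assumption.

```python
import sqlite3

# Same table definition as above; SQLite accepts these column types.
PHRASE_SCHEMA = """
CREATE TABLE IMAGE_OCR_PHRASE
(
    IMAGE_URL char(250) NOT NULL,
    IMAGE_NAME char(200) NOT NULL,
    OCR_PHRASE_TYPE char(10) NOT NULL,
    OCR_PHRASE longtext,
    SCORE int,
    X1 int,
    Y1 int,
    X2 int,
    Y2 int,
    CONTEXT char(200) NOT NULL
);
"""


def save_phrases(conn, url, name, boxes):
    """Replace the stored phrases for one image.

    boxes is an iterable of (text, x1, y1, x2, y2) in image pixels.
    """
    rows = [(url, name, text, x1, y1, x2, y2) for text, x1, y1, x2, y2 in boxes]
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute("DELETE FROM IMAGE_OCR_PHRASE WHERE IMAGE_URL = ?", (url,))
        conn.executemany(
            "INSERT INTO IMAGE_OCR_PHRASE "
            "(IMAGE_URL, IMAGE_NAME, OCR_PHRASE_TYPE, OCR_PHRASE, "
            "X1, Y1, X2, Y2, CONTEXT) "
            "VALUES (?, ?, '', ?, ?, ?, ?, ?, '')",  # '' = assumed placeholder
            rows,
        )


def load_phrases(conn, url):
    """Return (text, x1, y1, x2, y2) tuples for one image, top to bottom."""
    cur = conn.execute(
        "SELECT OCR_PHRASE, X1, Y1, X2, Y2 FROM IMAGE_OCR_PHRASE "
        "WHERE IMAGE_URL = ? ORDER BY Y1, X1",
        (url,),
    )
    return cur.fetchall()
```

Deleting and reinserting inside one transaction keeps the save operation simple: whatever boxes are on screen when the user saves become the complete stored set for that image. SCORE is left NULL because scoring is out of scope for this challenge.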
Python 3.6.x
MySQL 5.7.x
Final Submission Guidelines
1. Please submit all code required by the application in your submission.zip
2. Document the build process for your code, including all dependencies (pip installs, etc.).
3. You may use any open-source Python libraries or technologies, provided they are available for commercial use.