This was an assignment from my Image Processing class. I implemented some basic image segmentation methods and compared their output to a "ground truth" ideal segmentation. The first method I used was thresholding. It takes a grayscale image and converts it to a binary (black-and-white) image based on a threshold value: every pixel above that value is set to white, and everything below it to black. This method really only works if the object we're looking for is noticeably brighter than the rest of the scene. In this case I was trying to select just the girl's face in the picture below.
On the left is the original image, and on the right is the "ground truth". I selected the girl's face as a single shape to use as a basis for comparison with my algorithm. Ideally the ground truth would also exclude the eyes, eyebrows, and mouth, but this will be good enough.
Here's the core snippet of code for the thresholding algorithm. It's short and sweet: it goes through every pixel in the image and checks whether its intensity is higher or lower than the input threshold value t.
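A minimal Python sketch of that per-pixel loop (not the original assignment code). It assumes the grayscale image is stored as a list of rows of integer intensities from 0 to 255. The function name `threshold` is hypothetical. Sending pixels exactly equal to t to black is also an assumed tie-breaking rule, since the description only covers values above and below t.

```python
def threshold(image, t):
    """Binarize a grayscale image: 255 (white) where intensity > t, else 0 (black).

    Assumes `image` is a list of rows of integer intensities in 0..255.
    Pixels exactly equal to t become black (an assumed tie-breaking rule).
    """
    return [[255 if pixel > t else 0 for pixel in row] for row in image]


if __name__ == "__main__":
    img = [[10, 60, 200],
           [49, 50, 51]]
    print(threshold(img, 50))  # [[0, 255, 255], [0, 0, 255]]
```

For real photos you'd more likely use an array library, but the comparison per pixel is the same.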
Above are the results of running this function at a variety of threshold values. I experimented with this parameter a bit and visually judged t = 50 to give the best output.
When comparing the output to the ground truth, each pixel falls into one of four categories. A true positive is a pixel that is white in both the output image and the ground truth. A false positive is white in the output but black in the ground truth, a true negative is black in both, and a false negative is black in the output but white in the ground truth.
To gauge the correctness of the algorithm's output, I generated an ROC curve for the function. ROC stands for Receiver Operating Characteristic: the chart measures true positives against false positives while varying some parameter of the function (in this case, the threshold value). The x axis is the false positive rate (the fraction of ground-truth black pixels marked white), and the y axis is the true positive rate (the fraction of ground-truth white pixels marked white). The goal is an algorithm whose curve sticks as close to the upper left corner as possible (maximizing TPs while minimizing FPs). A straight diagonal line represents a random classifier: it marks a pixel white with the same probability whether or not the pixel belongs to the face, so its true positive rate always equals its false positive rate.
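Here's a hedged Python sketch of that bookkeeping: counting TP/FP/TN/FN between two binary images, then sweeping the threshold to produce ROC points. It assumes images are lists of rows, treats any nonzero pixel as white, applies the "above t is white" rule, and normalizes the counts into rates as a standard ROC curve does. The names `confusion_counts` and `roc_points` are hypothetical.

```python
def confusion_counts(output, truth):
    """Compare two same-sized binary images pixel by pixel.

    A pixel counts as white (positive) when it is nonzero.
    Returns (tp, fp, tn, fn).
    """
    tp = fp = tn = fn = 0
    for out_row, gt_row in zip(output, truth):
        for o, g in zip(out_row, gt_row):
            if o and g:
                tp += 1  # white in both
            elif o:
                fp += 1  # white in output, black in ground truth
            elif g:
                fn += 1  # black in output, white in ground truth
            else:
                tn += 1  # black in both
    return tp, fp, tn, fn


def roc_points(image, truth, thresholds=range(256)):
    """Return one (false positive rate, true positive rate) pair per threshold."""
    points = []
    for t in thresholds:
        # Binarize: intensity strictly above t counts as white.
        binary = [[1 if p > t else 0 for p in row] for row in image]
        tp, fp, tn, fn = confusion_counts(binary, truth)
        tpr = tp / (tp + fn) if tp + fn else 0.0
        fpr = fp / (fp + tn) if fp + tn else 0.0
        points.append((fpr, tpr))
    return points


if __name__ == "__main__":
    image = [[200, 180, 30],
             [190, 40, 20]]
    truth = [[1, 1, 0],
             [1, 0, 0]]
    print(roc_points(image, truth, [0, 50, 255]))
    # [(1.0, 1.0), (0.0, 1.0), (0.0, 0.0)]
```

Plotting the returned pairs (x = false positive rate, y = true positive rate) gives the ROC curve. A threshold low enough to mark everything white lands at (1, 1), and one high enough to mark everything black lands at (0, 0).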
Above is the ROC curve. It shows the performance is better than random, but not stellar. Below is a visual depiction of the true positives (green), true negatives (blue), false positives (orange), and false negatives (pink) for the image thresholded at t = 50, the value I had visually judged to give a good segmentation. If the eyes and mouth had been excluded from the ground truth, the pink features would have been marked blue instead, since the algorithm already classified them as negative.
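The color-coded comparison can be produced with a small lookup. This Python sketch assumes binary images stored as lists of rows (nonzero = white) and returns an RGB image as rows of (R, G, B) tuples. The exact RGB shades and the names `classify` and `error_map` are assumptions; only the color assigned to each category comes from the description above.

```python
# Display colors as (R, G, B). Only the hue per category comes from the post;
# the exact RGB values are assumptions.
COLORS = {
    "TP": (0, 200, 0),      # green
    "TN": (0, 0, 255),      # blue
    "FP": (255, 165, 0),    # orange
    "FN": (255, 105, 180),  # pink
}


def classify(o, g):
    """Label one pixel from output value o and ground-truth value g (nonzero = white)."""
    if o and g:
        return "TP"
    if o:
        return "FP"
    if g:
        return "FN"
    return "TN"


def error_map(output, truth):
    """Build an RGB image (rows of tuples) coloring each pixel by its category."""
    return [[COLORS[classify(o, g)] for o, g in zip(out_row, gt_row)]
            for out_row, gt_row in zip(output, truth)]


if __name__ == "__main__":
    output = [[1, 0],
              [1, 0]]
    truth = [[1, 1],
             [0, 0]]
    for row in error_map(output, truth):
        print(row)
    # [(0, 200, 0), (255, 105, 180)]
    # [(255, 165, 0), (0, 0, 255)]
```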
The performance of this function is highly dependent on the input image. For comparison, here is a different input image and some output.
I also wrote a function that calculates an optimal threshold value from the TP/FP/TN/FN counts at every threshold level. The goal is to maximize true positives and true negatives while minimizing misclassified pixels. To do this, I graphed true positives and false positives as functions of the threshold value and took the intersection of the two curves as the desired operating point.
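The description leaves some room for interpretation, so here's a hedged Python sketch of one reading: compute raw TP and FP pixel counts at every threshold and take the first threshold where the FP count no longer exceeds the TP count. For comparison it also includes a different, commonly used criterion that follows the stated goal directly: pick the threshold that maximizes TP + TN (overall pixel accuracy). All function names are hypothetical, and images are assumed to be lists of rows with a 0/1 ground truth.

```python
def counts_at(image, truth, t):
    """Return (tp, fp, tn, fn) after thresholding `image` at t (above t is white)."""
    tp = fp = tn = fn = 0
    for img_row, gt_row in zip(image, truth):
        for p, g in zip(img_row, gt_row):
            predicted = p > t
            if predicted and g:
                tp += 1
            elif predicted:
                fp += 1
            elif g:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def crossing_threshold(image, truth, thresholds=range(256)):
    """First threshold where the TP-count curve meets or rises above the FP-count curve.

    One reading of "the intersection of the TP and FP curves": at low thresholds
    the (usually larger) background floods the output with false positives, and
    as t rises the FP count falls until it no longer exceeds the TP count.
    Returns None if TP never reaches FP over `thresholds`.
    """
    for t in thresholds:
        tp, fp, _, _ = counts_at(image, truth, t)
        if tp >= fp:
            return t
    return None


def max_accuracy_threshold(image, truth, thresholds=range(256)):
    """Alternative criterion: the threshold maximizing TP + TN (ties go to the lowest t)."""
    def correct(t):
        tp, _, tn, _ = counts_at(image, truth, t)
        return tp + tn
    return max(thresholds, key=correct)


if __name__ == "__main__":
    image = [[200, 180, 90],
             [190, 70, 60],
             [80, 20, 10]]
    truth = [[1, 1, 0],
             [1, 0, 0],
             [0, 0, 0]]
    print(crossing_threshold(image, truth))      # 60
    print(max_accuracy_threshold(image, truth))  # 90
```

On this toy image the two criteria disagree (60 vs. 90): at t = 60 three background pixels are still white, while t = 90 separates face and background perfectly. Which rule you pick can matter.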
At the top of this block are the operating-point (OP) curve graphs for both the girl and cat images, and below are the optimal outputs chosen by the program. Pretty good, I'd say!