Project 1: Colorizing the Prokudin-Gorskii photo collection

By Ethan Zhang

Overview

To colorize the Prokudin-Gorskii photo collection, the main task is to align the three color channels and then overlay them to create the final image.

To approach this problem, I mainly iterated through different alignment options and optimizations until I reached visually satisfactory results. In particular, I used NCC (Normalized Cross-Correlation) for similarity scoring, Canny edge detection for more robust matching, and a Gaussian filter to weight the center of the image more heavily.

Here is an overview of the progression:
Image 1
monastery.jpg (naive)
Image 2
monastery.jpg (canny)
Image 3
monastery.jpg (canny + gaussian)
Image 4
emir.jpg (naive)
Image 5
emir.jpg (canny)
Image 6
emir.jpg (canny + gaussian)

Single-Scaled Approach

For the single-scaled approach, I took the most modest route: exhaustively searching over a 15x15 window of displacements for both the red and green channels against the blue channel. For each displacement, I used Normalized Cross-Correlation (NCC) as the score, though the plain Euclidean distance seemed to work equally well. At this point, I also realized that I could run the normalization step once before the 15x15 window iterations, since normalization doesn't depend on the order of elements. This reduced the running time a little, though I figured the majority of it probably came from the dot product.
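A minimal sketch of this exhaustive search, assuming NumPy arrays as channels (the function names here are my own illustration, not the project's actual code). Because `np.roll` only reorders pixels, the mean and norm of a channel are unchanged by the shift, which is what lets the normalization be hoisted out of the loop:

```python
import numpy as np

def normalize(x):
    """Zero-mean, unit-norm version of a channel (precomputed once)."""
    x = x - x.mean()
    return x / np.linalg.norm(x)

def align_exhaustive(channel, reference, window=15):
    """Exhaustively search a window x window grid of displacements and
    return the (dy, dx) shift maximizing NCC against the reference."""
    half = window // 2
    # Normalize once outside the loop: rolling only permutes pixels,
    # so the mean and norm are shift-invariant.
    ch = normalize(channel)
    ref = normalize(reference)
    best_score, best_shift = -np.inf, (0, 0)
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            # NCC reduces to a dot product of the pre-normalized arrays.
            score = np.sum(np.roll(ch, (dy, dx), axis=(0, 1)) * ref)
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift
```

With a 15x15 window this is 225 scoring passes per channel, which is why it only stays fast on the small .jpg inputs.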

It worked pretty well for cathedral.jpg and tobolsk.jpg, where a 15x15 search window was large enough to cover most of the misalignment, and being a few pixels off did not noticeably affect image quality. However, for the larger .tif pictures, it seemed to have no effect at all.
cathedral.jpg
Displacements:
[ 1 -1] [ 7 -1]
tobolsk.jpg
Displacements:
[3 2] [6 3]

Image Pyramid Approach

Having noticed that the larger image files probably needed a larger search window, I implemented the image pyramid approach. The idea is to downscale the image channels and then progressively home in on the true displacement by restoring the resolution level by level, refining the displacement found at the previous level each time. I also realized that it wasn't worth running the pyramid at its highest levels, since those would downscale the image to just a few pixels or produce very large displacements (since they simply rolled the image back to the origin).
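The coarse-to-fine recursion can be sketched as below. This is an illustrative reconstruction, not the project's exact code: the 32-pixel cutoff, the plain striding for downscaling (a Gaussian blur first would be the more careful choice), and the small refinement window are all assumptions on my part:

```python
import numpy as np

def _ncc(a, b):
    """Normalized cross-correlation between two equal-shape channels."""
    a = a - a.mean()
    b = b - b.mean()
    return np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def _search(channel, reference, half):
    """Brute-force NCC search over shifts in [-half, half] on both axes."""
    best_score, best_shift = -np.inf, (0, 0)
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            score = _ncc(np.roll(channel, (dy, dx), axis=(0, 1)), reference)
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift

def align_pyramid(channel, reference, half=7):
    """Estimate the shift at a coarse scale, then refine while upscaling."""
    # Stop recursing once the image is tiny -- higher pyramid levels
    # would shrink it to a few pixels and give meaningless displacements.
    if min(channel.shape) < 32:
        return _search(channel, reference, half)
    # Downscale by 2 with striding and align the coarse versions first.
    coarse = align_pyramid(channel[::2, ::2], reference[::2, ::2], half)
    dy, dx = 2 * coarse[0], 2 * coarse[1]
    # Refine around the doubled coarse estimate with a small local search.
    fy, fx = _search(np.roll(channel, (dy, dx), axis=(0, 1)), reference, 2)
    return (dy + fy, dx + fx)
```

Each level doubles the displacement found at the level below it, so the full-size search only ever has to correct by a few pixels.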

Following this implementation, several images already aligned pretty well. For example, church.tif, icon.tif, and sculpture.tif all looked well aligned:
church.jpg
Displacements:
[0 -5] [52 -6]
icon.jpg
Displacements:
[42 16] [89 22]
sculpture.jpg
Displacements:
[ 33 -11] [140 -26]

However, there were also a few that stood out like emir.tif and harvesters.tif:
emir.jpg
Displacements:
[-3 7] [107 17]
harvesters.jpg
Displacements:
[118 -3] [257 -2]

For emir.tif, my guess was that the misalignment came from the largely solid blue-ish clothing. Since that saturates the blue channel around that region, it probably had a high correlation with the other channels at many displacements. I also suspected that the image borders could be contributing.

Similarly, for harvesters.tif, the greenery contained many repeated shapes and the people wore similarly colored clothing, which made it easy to find close matches at displacements significantly off from the true one.

To resolve these issues, I felt it might be better to correlate features that are consistent across the color channels. In particular, I looked into finding edges in the image, since I reasoned that edges would likely remain consistent across the colors.

Bells & Whistles (Better Feature, Edge Detection)

At this point, I discovered the Canny edge detector, which is conveniently built into scikit-image. Since it also produces a NumPy array representing an image for the specific channel, it was relatively simple to run the exact same correlation / image pyramid algorithms devised in the previous steps. However, it wasn't as effective as I first imagined: although images like emir.tif and harvesters.tif had dramatically improved alignment, some basic images like cathedral.jpg and monastery.jpg regressed badly.
emir.jpg
Displacements:
[49 23] [107 40]
harvesters.jpg
Displacements:
[60 18] [123 10]
cathedral.jpg
Displacements:
[-175 -151] [-180 -1]
monastery.jpg
Displacements:
[-76 0] [-82 1]
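The edge-based variant only needs a thin preprocessing step in front of the existing alignment code. A hedged sketch, assuming scikit-image is available (the wrapper name and the `sigma` value are my own choices, not from the project):

```python
import numpy as np
from skimage import feature

def edge_map(channel, sigma=2.0):
    """Canny edge image for one channel, cast to float so the same
    NCC / pyramid alignment code can run on it unchanged."""
    return feature.canny(channel, sigma=sigma).astype(float)
```

Because the output has the same shape as the input channel, it can be swapped in wherever the raw intensities were previously compared.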

In order to debug why this alignment was happening, I thought it might be helpful to print out the actual channels that were being compared:

After looking at this, it wasn't immediately clear why it was still misaligning, but I began to strongly suspect the image borders: they had the most overlap, and many of the other edges were missing from the other channel, which could lead the scoring to prioritize matching the border edges.

Gaussian Filter

To further improve the alignment, I looked for a way to de-emphasize the image borders. I reasoned that the photographer likely wanted the subject toward the center of the frame, and the debug images I printed out seemed to support this. To do this, I used a simple NumPy mask fitted to a Gaussian curve along both axes, scaled to [0, 1]. Running it on the same edge images yielded much more promising debug outputs:
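One way to build such a mask, as a sketch: a separable 2D Gaussian, rescaled so it peaks at 1 in the center and falls to 0 at the corners. The width parameter `sigma_frac` is an assumption; the mask would be multiplied into both channels before scoring:

```python
import numpy as np

def gaussian_mask(h, w, sigma_frac=0.25):
    """Center-weighted mask in [0, 1]: the outer product of a Gaussian
    along each axis, min-max rescaled so borders contribute least."""
    y = np.linspace(-1.0, 1.0, h)
    x = np.linspace(-1.0, 1.0, w)
    gy = np.exp(-y**2 / (2 * sigma_frac**2))
    gx = np.exp(-x**2 / (2 * sigma_frac**2))
    mask = np.outer(gy, gx)
    return (mask - mask.min()) / (mask.max() - mask.min())
```

Multiplying the edge maps by this mask drives the border edges toward zero, so matches there no longer dominate the NCC score.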

Bells & Whistles (Auto Contrast)

In an effort to introduce more contrast, I calculated the overall luminance of each pixel so that the darkest and brightest values in the image would be stretched to black and white. I used a linear transformation, but it didn't seem to have many obvious effects. I also disregarded the outer 1/8th of the pixels, since I reasoned they had the least impact on the perceived image colors. Using a curve for the transformation might have been better, but the images already seem well distributed in terms of their luminance histograms.
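A sketch of this linear stretch, with assumptions flagged: the Rec. 601 luma weights stand in for "overall luminance" (the writeup doesn't say which weighting was used), and the same offset and scale are applied to all three channels:

```python
import numpy as np

def auto_contrast(img, border_frac=0.125):
    """Linearly stretch an (h, w, 3) float image so the darkest and
    brightest interior luminance map to 0 and 1, clipping the rest.
    The outer 1/8 of the frame is ignored when picking the limits."""
    h, w = img.shape[:2]
    by, bx = int(h * border_frac), int(w * border_frac)
    # Rec. 601 luma weights as a stand-in for per-pixel luminance.
    lum = img @ np.array([0.299, 0.587, 0.114])
    interior = lum[by:h - by, bx:w - bx]
    lo, hi = interior.min(), interior.max()
    return np.clip((img - lo) / (hi - lo), 0.0, 1.0)
```

Since border pixels are excluded from the min/max, a dark frame edge can't pin the black point and flatten the interior contrast.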

Optimizations

Lastly, this project had a runtime requirement, and I noticed that during my image pyramid runs it was very rare for the displacement to change by more than 5-8 pixels from the displacement found at the previous level. I reasoned that this must be because each level is only a factor of two away from the previous one. I therefore designed the pyramid algorithm to use progressively smaller displacement search windows at the finer levels. In the end, my runs became about 60-70% faster.
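One way to express this schedule, as a sketch (the exact per-level widths here are my assumption, chosen to match the observed 5-8 pixel corrections): keep the full search half-width at the coarsest level, then shrink it toward a small floor as the levels get finer.

```python
def search_half_width(level, base_half=7, min_half=3):
    """Half-width of the displacement search at a pyramid level, where
    level 0 is the coarsest. Finer levels only refine by a few pixels,
    so their windows can shrink toward min_half."""
    return max(min_half, base_half - 2 * level)
```

Since the brute-force cost is quadratic in the window's half-width, shrinking the fine-level windows is where most of the speedup comes from.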

Final Products