CS 280A Proj1 - Leo Huang

Overview

Sergei Mikhailovich Prokudin-Gorskii (1863-1944) was a pioneer in photography, traveling across the Russian Empire to capture color images by documenting scenes in three separate color channels and later combining them to produce full-color effects. While many manual approaches have been used to colorize his images, this project aims to implement efficient algorithms to automate the alignment process for his photo collection.

Examples of separate color channel images

Approach

Alignment Metric: L2 vs. NCC

At first, I compared images using the L2 norm of their difference as a numerical measure of alignment, as suggested by the project specification. I then compared it against the other suggested metric, normalized cross-correlation (NCC), which is the dot product of the two images after each has been normalized. NCC performed slightly better, so I used it for the remainder of the project.
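As a rough illustration of the two metrics, here is a minimal NumPy sketch. The function names `l2_score` and `ncc_score` are hypothetical, and subtracting the mean before normalizing is an assumption about what "normalized" means here, not something confirmed by the project code. The L2 score is negated so that a larger value means better alignment for both metrics.

```python
import numpy as np

def l2_score(a, b):
    """Negative sum of squared differences, so a larger value means a better match."""
    return -float(np.sum((a - b) ** 2))

def ncc_score(a, b):
    """Dot product of the two images after each is zero-centered and scaled to unit norm.

    Zero-centering is an assumption; a variant that only divides by the norm also exists.
    """
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    # Guard against flat images, whose norm after centering is zero.
    return float(np.sum(a * b) / denom) if denom > 0 else 0.0
```

One practical difference: NCC is unchanged if one channel is uniformly brighter or has higher contrast than the other, while the L2 score is not.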

Naive Color Alignment:

Assuming that pixel brightness is directly correlated across the three color channels, I applied a naive double for-loop search that maximizes the alignment metric over a [-15, 15] range of pixel displacements in both the x and y directions. This worked reasonably well for many of the images. However, one example that performed poorly was Emir. This led me to believe that the assumption of correlated brightness across channels may not always hold, which is explored later in Bells & Whistles.
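The double for-loop search could look roughly like the sketch below. The names `ncc` and `align_exhaustive` are hypothetical, and using `np.roll` (which wraps pixels around the border) to apply each candidate shift is an assumption made for simplicity.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation of two equally sized images (zero-centering assumed)."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.sum(a * b) / denom) if denom > 0 else 0.0

def align_exhaustive(channel, reference, radius=15):
    """Try every (dy, dx) shift in [-radius, radius] and keep the best-scoring one."""
    best_score, best_shift = -np.inf, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = ncc(shifted, reference)
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift
```

With a radius of 15 this evaluates 31 x 31 = 961 candidate shifts, each of which touches every pixel, which is why the cost grows quickly with image size.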

Cathedral aligned using color correlated alignment

Emir misaligned using color correlated alignment

Pyramiding:

During testing, the search consistently took too long on the larger images. Therefore, I used pyramiding, as mentioned in the project specification: the image is blurred and downsampled to a more manageable size, usually through 4 levels of halving, and the [-15, 15] range of pixel displacements is searched at that coarsest level. For each subsequent, less downsampled level, I limit the search to a [-3, 3] range of pixel displacements around the estimate from the previous level. This greatly reduced the runtime of the alignment algorithm. For Emir, the algorithm ran about 15x faster, reducing the runtime per image to around 20 seconds.
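A coarse-to-fine search of this kind might be sketched as follows. All function names are hypothetical, and the 2x2 box average in `halve` is a stand-in assumption for the blur-and-downsample step; the actual project may use a different filter. The key step is doubling the shift found at a coarser level before refining it at the next finer level.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation of two equally sized images (zero-centering assumed)."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.sum(a * b) / denom) if denom > 0 else 0.0

def search(channel, reference, center, radius):
    """Exhaustive search over shifts within `radius` of `center`."""
    cy, cx = center
    best_score, best_shift = -np.inf, center
    for dy in range(cy - radius, cy + radius + 1):
        for dx in range(cx - radius, cx + radius + 1):
            score = ncc(np.roll(channel, (dy, dx), axis=(0, 1)), reference)
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift

def halve(img):
    """Blur and downsample by 2 using a 2x2 box average (an assumed, simple filter)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return (img[0::2, 0::2] + img[1::2, 0::2] + img[0::2, 1::2] + img[1::2, 1::2]) / 4.0

def align_pyramid(channel, reference, levels=4, coarse_radius=15, fine_radius=3):
    """Wide search at the coarsest level, then narrow refinement at each finer level."""
    if levels == 0:
        return search(channel, reference, (0, 0), coarse_radius)
    dy, dx = align_pyramid(halve(channel), halve(reference),
                           levels - 1, coarse_radius, fine_radius)
    # One pixel at the coarser level corresponds to two pixels at this level.
    return search(channel, reference, (2 * dy, 2 * dx), fine_radius)
```

With 4 levels, a [-15, 15] search at the coarsest level covers roughly [-240, 240] pixels at full resolution, while only 7 x 7 = 49 candidates are scored on each of the larger images.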

Train alignment image progression (gif)

Train alignment pyramid progression (gif)

Bells & Whistles

Edge Detection:

Since naive color alignment did not work for all cases, I then tried using edge detection to preprocess the images. Instead of assuming that brightness is correlated across channels, I now assume that the edges in each channel are correlated. I used a Sobel edge filter to detect vertical and horizontal edges, then combined the two filtered results by taking their absolute values, superimposing them, and normalizing.
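The preprocessing step could be sketched in plain NumPy as below. The function name `sobel_edges` is hypothetical, and two details are assumptions rather than facts from the project: borders are handled by replicating edge pixels, and "normalizing" is taken to mean dividing by the maximum response.

```python
import numpy as np

def sobel_edges(img):
    """Sum of absolute horizontal and vertical Sobel responses, scaled to [0, 1]."""
    # Replicate border pixels so the output has the same shape as the input (assumed).
    p = np.pad(img.astype(float), 1, mode="edge")
    # Horizontal gradient: right column minus left column, weighted 1-2-1 vertically.
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]) \
       - (p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2])
    # Vertical gradient: bottom row minus top row, weighted 1-2-1 horizontally.
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]) \
       - (p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:])
    edges = np.abs(gx) + np.abs(gy)
    peak = edges.max()
    # Dividing by the peak is an assumed normalization; a flat image stays all zeros.
    return edges / peak if peak > 0 else edges
```

Because the result is divided by its own peak, a channel that is uniformly darker or brighter produces the same edge map, which is the property that makes edges more comparable across channels than raw brightness.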

Emir better aligned using edge detection alignment

Emir preprocessed using Sobel Edge Filter

Cropping:

A final optimization that I introduced was removing the borders of the image, which could negatively affect the alignment metric, for example by causing the search to align the borders rather than the image content itself. This began as a naive crop that removed 4% off each horizontal edge and 7% off each vertical edge. However, I realized that there might be merit in cropping further, since most of each image's subject is near the center, so I implemented a more aggressive crop of 20% off each edge for large images. I kept a 10% crop off each edge for the smaller images to avoid losing more information than necessary. One mistake that occurred while performing these crops was zeroing out the border area instead of removing it. This unintentionally raised the alignment metric and potentially favored less ideal alignment candidates. Thus, I simply cropped the image, effectively reducing its size, and after finding the optimal alignment on the cropped version, applied the resulting roll to the original image. With this optimization, the runtime per image dropped to around 13 seconds.
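The crop-then-roll idea can be sketched as follows: the displacement is estimated on cropped copies, and the resulting shift is then applied to the uncropped channel. The names `crop_interior`, `best_shift`, and `align_with_crop` are hypothetical, and the small exhaustive search is only a placeholder for whichever alignment routine is actually used.

```python
import numpy as np

def crop_interior(img, frac):
    """Remove `frac` of the height and width from each side (slicing, not zeroing)."""
    h, w = img.shape[:2]
    dy, dx = int(h * frac), int(w * frac)
    return img[dy:h - dy, dx:w - dx]

def ncc(a, b):
    """Normalized cross-correlation of two equally sized images (zero-centering assumed)."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.sum(a * b) / denom) if denom > 0 else 0.0

def best_shift(channel, reference, radius=5):
    """Placeholder exhaustive search; any alignment routine could be used instead."""
    candidates = [(dy, dx) for dy in range(-radius, radius + 1)
                           for dx in range(-radius, radius + 1)]
    return max(candidates,
               key=lambda s: ncc(np.roll(channel, s, axis=(0, 1)), reference))

def align_with_crop(channel, reference, frac=0.10):
    """Estimate the shift on cropped copies, then roll the original, uncropped channel."""
    shift = best_shift(crop_interior(channel, frac), crop_interior(reference, frac))
    return np.roll(channel, shift, axis=(0, 1)), shift
```

Slicing rather than zeroing matters because a band of zeros shared by both channels contributes identical pixels to the comparison, which can inflate the score regardless of how well the image content lines up. Cropping also shrinks the arrays being compared, which is consistent with the reported drop in runtime.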