Sergei Mikhailovich Prokudin-Gorskii (1863-1944) was a pioneer of color photography who traveled across the Russian Empire, recording each scene as three separate color channels that could later be combined into a full-color image. While many manual approaches have been used to colorize his images, this project aims to implement efficient algorithms that automate the alignment of the channels across his photo collection.
At first, I compared images using the L2 norm of their difference, as suggested by the project specification, to obtain a numerical score that tracks alignment. I then tried the other suggested metric, normalized cross-correlation (NCC), which is the dot product of the two images after each has been normalized. NCC performed slightly better, so I used it for the remainder of the project.
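The two metrics described above can be sketched as follows. This is a minimal illustration, assuming NumPy arrays of equal shape. The function names `l2_distance` and `ncc` are hypothetical, and NCC is written here as a plain dot product of unit-length vectors without mean subtraction, which may differ from the exact variant used in the project.

```python
import numpy as np


def l2_distance(a, b):
    """Sum of squared differences; a lower value suggests better alignment."""
    diff = np.asarray(a, dtype=np.float64) - np.asarray(b, dtype=np.float64)
    return float(np.sum(diff ** 2))


def ncc(a, b):
    """Dot product of the two images after each is flattened and scaled to
    unit length; a higher value suggests better alignment."""
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0:
        return 0.0  # assumption: an all-zero image carries no alignment signal
    return float(a @ b / denom)
```

Note that the two metrics point in opposite directions: a search would minimize `l2_distance` but maximize `ncc`. Because of the normalization, `ncc` is unchanged when one image is uniformly brighter by a constant factor.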
Assuming that pixel brightness is directly correlated across the three color channels, I applied a naive double for-loop search that maximizes the alignment metric over displacements in a [-15, 15] pixel range in both the x and y directions. This worked reasonably well for many of the images. One example that aligned poorly, however, was Emir. This led me to suspect that the assumption of correlated brightness across all channels may be incorrect, which is explored later in Bells and Whistles.
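A minimal sketch of such an exhaustive search is shown below, assuming NumPy, circular shifts via `np.roll`, and NCC as the score. The names `score` and `align_exhaustive` are hypothetical, and the returned shift is ordered (dy, dx), meaning rows first and columns second, which is a convention chosen only for this example.

```python
import numpy as np


def score(a, b):
    """NCC without mean subtraction (an assumption); higher is better."""
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


def align_exhaustive(channel, reference, radius=15):
    """Try every (dy, dx) in [-radius, radius]^2 and keep the best score."""
    best_score, best_shift = -np.inf, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # np.roll wraps pixels around the border (circular shift).
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            s = score(shifted, reference)
            if s > best_score:
                best_score, best_shift = s, (dy, dx)
    return best_shift
```

With the default radius this evaluates 31 x 31 = 961 candidate shifts, each touching every pixel, which is why the cost grows quickly with image size.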
While testing both alignment metrics, I consistently found that the search took too long on larger images. Therefore, I used an image pyramid, as mentioned in the project specification: the image is blurred and downsampled to a more manageable size, usually through 4 levels of halving, and the full [-15, 15] range of pixel displacements is searched only at that coarsest level. At each subsequent finer level, I limit the search to a [-3, 3] range of pixel displacements around the estimate from the coarser level. This greatly reduced the runtime of the alignment algorithm. For Emir, the algorithm ran about 15x faster, reducing the runtime per image to around 20 seconds.
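The coarse-to-fine idea above can be sketched as a short recursion. This is only an illustration under stated assumptions: a 2x2 box average stands in for the blur-and-downsample step, the coarse estimate is doubled before refinement, shifts are circular and ordered (dy, dx), and the names `downsample`, `search`, and `align_pyramid` are hypothetical.

```python
import numpy as np


def ncc(a, b):
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


def downsample(img):
    """Halve each dimension by averaging 2x2 blocks (a crude blur + subsample)."""
    h, w = (img.shape[0] // 2) * 2, (img.shape[1] // 2) * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def search(channel, reference, center, radius):
    """Exhaustive search in a (2*radius+1)^2 window centered on `center`."""
    cy, cx = center
    best_score, best_shift = -np.inf, center
    for dy in range(cy - radius, cy + radius + 1):
        for dx in range(cx - radius, cx + radius + 1):
            s = ncc(np.roll(channel, (dy, dx), axis=(0, 1)), reference)
            if s > best_score:
                best_score, best_shift = s, (dy, dx)
    return best_shift


def align_pyramid(channel, reference, levels=4, coarse_radius=15, fine_radius=3):
    """Wide search at the coarsest level, then narrow refinements going up."""
    if levels == 0:
        return search(channel, reference, (0, 0), coarse_radius)
    dy, dx = align_pyramid(downsample(channel), downsample(reference),
                           levels - 1, coarse_radius, fine_radius)
    # One pixel at the coarser level corresponds to two pixels at this level.
    return search(channel, reference, (2 * dy, 2 * dx), fine_radius)
```

In this sketch the 961-candidate search runs only on the smallest image, and each finer level evaluates just 7 x 7 = 49 candidates. With 4 levels, a [-15, 15] search at the coarsest level covers roughly [-240, 240] pixels at full resolution.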
Since alignment on raw pixel brightness did not work for all cases, I then tried edge detection as a preprocessing step. Instead of assuming that brightness is correlated across channels, we now assume that the edges in each channel are correlated. I used a Sobel edge filter to detect vertical and horizontal edges, then combined the two filtered results by taking their absolute values, superimposing them, and normalizing.
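A rough sketch of this preprocessing step follows, written with plain NumPy slicing rather than a library filter. It assumes that "superimposing" means adding the two absolute responses and that "normalizing" means dividing by the maximum; both are guesses about the original implementation. The name `sobel_edges` is hypothetical.

```python
import numpy as np


def sobel_edges(img):
    """Return |Gx| + |Gy| from 3x3 Sobel kernels, scaled to [0, 1]."""
    img = np.asarray(img, dtype=np.float64)
    p = np.pad(img, 1, mode="edge")  # replicate borders so output keeps the shape
    # Horizontal gradient: right column minus left column (finds vertical edges).
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]) \
       - (p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2])
    # Vertical gradient: bottom row minus top row (finds horizontal edges).
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]) \
       - (p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:])
    edges = np.abs(gx) + np.abs(gy)
    peak = edges.max()
    return edges / peak if peak > 0 else edges
```

The resulting edge maps would then be passed to the alignment search in place of the raw channels. One property that may explain why this helps: in this sketch, scaling a channel's brightness or adding a constant offset leaves the normalized edge map unchanged, so a region that is bright in one channel and dark in another can still produce matching edges.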
A final optimization that I introduced was removing the borders of the image, which might otherwise have skewed the alignment metric, for example by aligning the borders rather than the image content itself. This began as a naive crop of 4% off each horizontal edge and 7% off each vertical edge. However, I realized that there might be merit in cropping further, as most of an image's subject lies near the center, so I implemented a more aggressive crop of 20% off each edge in large images. I kept a 10% crop off each edge in the smaller images to avoid losing more information than necessary. One mistake that occurred while performing these crops was zeroing out the area that contained the borders instead of removing it. This unintentionally raised the alignment metric and potentially favored less ideal alignment candidates. Instead, I cropped the image, effectively reducing its size, found the optimal alignment on the cropped version, and then applied the resulting roll to the original image. With this optimization, the runtime per image dropped to around 13 seconds.
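The crop-then-roll approach described above might look like the sketch below. It is an illustration only: the names `crop_border`, `exhaustive_shift`, and `align_with_crop` are hypothetical, a single fraction is used for all four edges, and a plain exhaustive NCC search stands in for whichever search is actually used.

```python
import numpy as np


def crop_border(img, frac):
    """Drop `frac` of the height and width from each edge (a view, no zeroing)."""
    h, w = img.shape[:2]
    dy, dx = int(h * frac), int(w * frac)
    return img[dy:h - dy, dx:w - dx]


def exhaustive_shift(channel, reference, radius=15):
    """Stand-in search: best (dy, dx) under NCC with circular shifts."""
    ref = np.asarray(reference, dtype=np.float64).ravel()
    ref_norm = np.linalg.norm(ref)
    best_score, best_shift = -np.inf, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            cand = np.roll(channel, (dy, dx), axis=(0, 1))
            cand = np.asarray(cand, dtype=np.float64).ravel()
            s = cand @ ref / (np.linalg.norm(cand) * ref_norm + 1e-12)
            if s > best_score:
                best_score, best_shift = s, (dy, dx)
    return best_shift


def align_with_crop(channel, reference, frac=0.10, find_shift=exhaustive_shift):
    """Estimate the shift on cropped copies, then roll the uncropped channel."""
    shift = find_shift(crop_border(channel, frac), crop_border(reference, frac))
    return np.roll(channel, shift, axis=(0, 1)), shift
```

Because the metric only ever sees the cropped arrays, the borders cannot influence the score, and the smaller arrays also make each evaluation cheaper. Under the scheme described above, `frac` would be 0.20 for large images and 0.10 for small ones.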
G: [2, 5] R: [3, 12]
G: [4, 25] R: [-4, 58]
G: [24, 49] R: [40, 107]
G: [18, 60] R: [14, 123]
G: [17, 41] R: [23, 90]
G: [9, 56] R: [12, 115]
G: [11, 81] R: [13, 178]
G: [2, -3] R: [2, 3]
G: [26, 51] R: [36, 108]
G: [-11, 33] R: [-27, 140]
G: [29, 78] R: [37, 175]
G: [13, 53] R: [9, 111]
G: [2, 3] R: [3, 6]
G: [8, 42] R: [32, 86]