Aligning RGB Channels Part 2

5 min read · Jun 15, 2022

Colorizing the Prokudin-Gorskii Photo Collection

This part of the blog aims to use image pyramids to find the best transformation for images of the Prokudin-Gorskii Photo collection.

Disclaimer: I spent a large amount of time tackling this problem and had to stop at an average result.

I will do my best to explain all the concepts involved. Hope this helps!

What’s an image pyramid?

In plain English, when we repeatedly subsample an image by a fixed scale factor, we get progressively smaller images which, if stacked together, form a pyramid-like structure.

A more formal definition goes along these lines:

“A lowpass pyramid is made by smoothing the image with a gaussian smoothing filter and then subsampling the smoothed image. The resulting image is then subjected to the same procedure, and the cycle is repeated multiple times.
Each cycle of this process results in a smaller image with increased smoothing, but with decreased spatial sampling density (that is, decreased image resolution).
If illustrated graphically, the entire multi-scale representation will look like a pyramid, with the original image on the bottom and each cycle’s resulting smaller image stacked one atop the other.” — Wikipedia (edited)

Why use image pyramids?

If you have read my earlier blog (link), you will have noticed that we literally translate the image pixel by pixel within a specified range (defaulting to [-15, 15]) along both axes. This gives great results for small images, but it falters on big images: the search range has to grow with the image, and the number of candidate shifts grows with the square of that range (not exponentially, but still fast enough to make the computation very expensive).
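To put rough numbers on that cost, here is a small sketch that counts the candidate shifts in a square search window. The function name and the 8x larger image are hypothetical, chosen only for illustration; each candidate also costs more to score on a bigger image, which this count does not include.

```python
def brute_force_shifts(max_disp: int) -> int:
    """Number of (dx, dy) candidates in a [-max_disp, max_disp] square window."""
    return (2 * max_disp + 1) ** 2

# Window used in the earlier post: [-15, 15] on both axes.
small_window = brute_force_shifts(15)

# Hypothetical image that is 8x larger per side, so the window is scaled by 8 too.
large_window = brute_force_shifts(8 * 15)

print(small_window, large_window)
```

An image pyramid avoids the large window by doing the wide search only on a heavily reduced image.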

To overcome this hurdle, we primarily do three things:

Apply a smoothing function/filter to the image.
Subsample the image.
Use the Laplacian form to calculate the Normalized Cross-Correlation for images at each stage.
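For reference, a minimal sketch of a Normalized Cross-Correlation score in NumPy. This is the mean-subtracted variant, which is an assumption on my part; the function name `ncc` is illustrative and not necessarily what the original code uses.

```python
import numpy as np

def ncc(a: np.ndarray, b: np.ndarray) -> float:
    """Normalized cross-correlation of two equally sized images, in [-1, 1]."""
    a = a.astype(np.float64) - a.mean()
    b = b.astype(np.float64) - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0:
        # A constant image has no structure to correlate with.
        return 0.0
    return float(np.sum(a * b) / denom)

rng = np.random.default_rng(0)
patch = rng.random((16, 16))
same_score = ncc(patch, patch)        # identical images
inverted_score = ncc(patch, -patch)   # inverted contrast
```

A score close to 1 means the two images line up well, so the alignment search keeps the shift with the highest score.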

Results:-

Let’s get into details!

I have used cv2.resize for subsampling the image.

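The formula for down-sampling by a rate r, where f is the input image and h is the smoothing kernel:

$$g(i, j) = \sum_{k,l} f(k, l)\, h\left(i - \frac{k}{r},\; j - \frac{l}{r}\right)$$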
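In code, down-sampling comes down to "blur, then keep every r-th row and column". Below is a minimal NumPy-only sketch of that idea. It is not the cv2.resize call used in this post, and the 5-tap binomial kernel and the name `decimate` are assumptions made for illustration.

```python
import numpy as np

def decimate(image: np.ndarray, kernel_1d: np.ndarray, r: int = 2) -> np.ndarray:
    """Blur with a separable kernel, then keep every r-th row and column."""
    pad = len(kernel_1d) // 2
    # Reflect-pad so the blurred image keeps the original size.
    padded = np.pad(image.astype(np.float64), pad, mode="reflect")
    # Separable convolution: filter along rows first, then along columns.
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel_1d, mode="valid")
    blurred = np.apply_along_axis(np.convolve, 0, rows, kernel_1d, mode="valid")
    return blurred[::r, ::r]

# Binomial kernel, a common cheap approximation of a Gaussian (assumed here).
binomial = np.array([1, 4, 6, 4, 1], dtype=np.float64) / 16.0

ones = np.ones((8, 8))
half = decimate(ones, binomial, r=2)
```

Because the kernel sums to 1, a constant image stays constant after blurring, which is a quick sanity check on the filter.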

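Why apply the Gaussian filter before subsampling?

We are essentially dropping rows and columns, and if we do not apply the Gaussian filter first, the image will look something like this:

The above images look sharp because the high frequencies of the original image are still present; simply dropping rows and columns does not remove them, and they show up as aliasing artifacts.

If we apply the Gaussian filter first, they look something like this: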

We are also using their Laplacian forms.

How do we get the Laplacian form? Subtract the blurred image from the original: original image minus blurred image.
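A minimal sketch of "original minus blurred" in NumPy. The helper names are hypothetical, and the 7-tap, sigma 1 Gaussian is chosen to mirror the filter size mentioned later in this post; treat it as one possible way to do it rather than the exact code behind the results.

```python
import numpy as np

def blur(image: np.ndarray, kernel_1d: np.ndarray) -> np.ndarray:
    """Separable blur that keeps the image size by reflect-padding."""
    pad = len(kernel_1d) // 2
    padded = np.pad(image.astype(np.float64), pad, mode="reflect")
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel_1d, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel_1d, mode="valid")

def laplacian_form(image: np.ndarray, kernel_1d: np.ndarray) -> np.ndarray:
    """Original image minus its blurred version: keeps edges and fine detail."""
    return image.astype(np.float64) - blur(image, kernel_1d)

# 1D Gaussian, 7 taps, sigma = 1, normalized to sum to 1.
taps = np.arange(7) - 3
gauss = np.exp(-taps**2 / 2.0)
gauss /= gauss.sum()

flat = np.full((8, 8), 5.0)   # no detail at all
edge = np.zeros((8, 8))
edge[:, 4:] = 1.0             # a vertical step edge

flat_detail = laplacian_form(flat, gauss)
edge_detail = laplacian_form(edge, gauss)
```

The flat image gives an all-zero result, while the step edge gives a response concentrated around the edge. That is why this form is useful for matching channels whose overall brightness differs.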

Implementation details:-

Let’s first create the gaussian filter.

Gaussian blur 2d source: https://www.projectrhea.org/rhea/images/5/5c/Math2.jpg

As shown in the above image, we implement the 2D Gaussian formula g(x, y) as a function that takes sigma and the filter size.

Function to create a 2D Gaussian blur filter.

gaussian blur 2d
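A minimal sketch of such a function in NumPy, assuming the standard 2D Gaussian formula and a final normalization so the weights sum to 1. The name `gaussian_kernel_2d` and its defaults are illustrative.

```python
import numpy as np

def gaussian_kernel_2d(size: int = 7, sigma: float = 1.0) -> np.ndarray:
    """size x size Gaussian filter centered at the origin, normalized to sum to 1."""
    # Coordinates centered on the middle of the filter, e.g. -3..3 for size 7.
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    kernel = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2)) / (2.0 * np.pi * sigma**2)
    # Renormalize: the truncated, sampled Gaussian does not sum to exactly 1.
    return kernel / kernel.sum()

kernel_2d = gaussian_kernel_2d(size=7, sigma=1.0)
```

Normalizing matters because a filter whose weights do not sum to 1 would brighten or darken the image every time it is applied.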

Just for fun, let’s also create a 1d gaussian blur.

gaussian blur 1d
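A 1D version can be sketched the same way. One useful property, shown here as a check, is that the Gaussian is separable: the outer product of the 1D filter with itself gives the normalized 2D filter. The function name is again illustrative.

```python
import numpy as np

def gaussian_kernel_1d(size: int = 7, sigma: float = 1.0) -> np.ndarray:
    """1D Gaussian filter centered at the origin, normalized to sum to 1."""
    ax = np.arange(size) - (size - 1) / 2.0
    kernel = np.exp(-ax**2 / (2.0 * sigma**2))
    return kernel / kernel.sum()

kernel_1d = gaussian_kernel_1d(size=7, sigma=1.0)

# Separability: filtering rows and then columns with the 1D kernel
# is equivalent to filtering once with this 2D kernel.
separable_2d = np.outer(kernel_1d, kernel_1d)
```

In practice this means two cheap 1D passes can replace one 2D pass over the image.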

The flow of the implementation is simple.

First we smooth the image, i.e. apply a Gaussian filter (in my implementation, a 7 x 7 filter with sigma 1, centered at the origin), and then we subsample the image, i.e. reduce it.
We repeat this until we reach the base case: either we hit the expected depth (the num_pyramids parameter) or the image becomes smaller than the sampling area, i.e. the [-15, 15] search window in both directions.
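The flow can be sketched as a short recursive function. This is a simplified illustration, not the code behind the results in this post: it uses a 2x2 box average instead of the Gaussian filter, raw intensities instead of the Laplacian form, and a refinement window of 2 pixels per level, all of which are assumptions. Only num_pyramids and the [-15, 15] window come from the text above.

```python
import numpy as np

def ncc(a, b):
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom else 0.0

def search(ref, img, center, radius):
    """Exhaustively try every (dy, dx) roll of img around center; keep the best."""
    best, best_score = center, -np.inf
    for dy in range(center[0] - radius, center[0] + radius + 1):
        for dx in range(center[1] - radius, center[1] + radius + 1):
            score = ncc(ref, np.roll(img, (dy, dx), axis=(0, 1)))
            if score > best_score:
                best, best_score = (dy, dx), score
    return best

def downsample(img):
    """2x2 box average, then keep every second pixel (stand-in for blur + resize)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return (img[0::2, 0::2] + img[1::2, 0::2] + img[0::2, 1::2] + img[1::2, 1::2]) / 4.0

def pyramid_align(ref, img, num_pyramids=3, radius=15):
    # Base case: depth exhausted, or the next level would be smaller than the window.
    if num_pyramids == 0 or min(ref.shape) // 2 < 2 * radius + 1:
        return search(ref, img, (0, 0), radius)
    coarse = pyramid_align(downsample(ref), downsample(img), num_pyramids - 1, radius)
    # One pixel at the coarser level is two pixels here; refine around that guess.
    guess = (2 * coarse[0], 2 * coarse[1])
    return search(ref, img, guess, 2)

rng = np.random.default_rng(1)
reference = rng.random((128, 128))
moved = np.roll(reference, (-12, 20), axis=(0, 1))  # synthetic misalignment
shift = pyramid_align(reference, moved)             # roll that undoes it
```

Only the smallest image gets the full [-15, 15] search; every larger level checks just a handful of shifts around the doubled estimate from the level below.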

Notice that I have tried to maintain the original image size using padding.

Now finally, let’s apply the above filter to our image and subsample it.

Note: At first I had implemented the above logic (smoothing, then subsampling by factors of two).

Later I realized that I should apply the smoothing to the smaller result, similar to lines 59–62 above, and then try a recursive approach. Hence I have tried a recursive method as well.

The is_out_of_bounds function simply checks whether the image size is greater than the sampling area, i.e. [-15, 15].

For example:

# check if the left side is out of bounds
if -width_1 > x_disp[0] or -width_2 > x_disp[0]:
    is_out_of_bounds = True

In a similar fashion, I have used it to check all the bounds.

Note: While experimenting with my code, I found some weird behavior for the red channel:

The jump for the first two layers (i.e. the deepest one and the one before it) is unreasonably large, and it is later offset as per the expected behaviour.

Note 2: I have used np.roll() in NCC, which wraps the part of the image that goes out of bounds around to the opposite side. For example, if the above image is rolled leftwards, the part that is lost beyond the left boundary is appended at the right end.
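A tiny example of that wrap-around behaviour, using a single row of made-up pixel values:

```python
import numpy as np

row = np.array([1, 2, 3, 4, 5])

# Roll two positions to the left: 1 and 2 fall off the left edge
# and reappear on the right.
rolled_left = np.roll(row, -2)
print(rolled_left)  # [3 4 5 1 2]

# For a 2D image, a (dy, dx) shift can be applied to both axes at once.
image = np.arange(9).reshape(3, 3)
rolled_2d = np.roll(image, (1, -1), axis=(0, 1))
```

Since the wrapped pixels do not belong next to their new neighbours, they add some noise to the NCC score near the borders.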

Thank you for reading this far!
Any feedback is encouraged.


Written by Siddhant Shah