Project 4: [Auto]Stitching Photo Mosaics

By Ethan Zhang

Overview

In two parts, this project implements a system that can automatically stitch photos together into a photo mosaic. First, correspondences between a pair of images were manually defined so that the images could be blended after recovering the homography between them. Then, an algorithm was devised to detect features, match them across corresponding images, and finally remove bad matches using Lowe's ratio test and Random Sample Consensus (RANSAC).

Image Warping and Mosaicing

Shoot the Pictures

In order to get some testing images for this portion, I made sure to fix the center of projection (COP) by standing in the same spot and only rotating the phone camera. This way, the transforms between the images would be projective.
Soda 1
Soda 2
Soda 3
Soda 4
Soda 5
Wall 1
Wall 2
Wall 3
Wall 4
Wall 5
Building 1
Building 2
Building 3
Building 4
Building 5
I also took pictures of two fridges in order to test the rectification homography matrix:
Soda Fridge
Apartment Fridge

Recover Homographies

From the sets of images taken, I chose to implement homography recovery on the following two pictures:
Soda 1 manually annotated with correspondence points
Soda 2 manually annotated with correspondence points
Knowing that the point-to-point transformation can be written as p' = Hp, I used the manual correspondence points to set up 2n linear equations (two per point pair), which can then be solved in a least-squares sense with np.linalg.lstsq.
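A minimal sketch of how this system can be set up (assuming the correspondences are stored as (x, y) rows of NumPy arrays, with the bottom-right entry of H fixed to 1):

```python
import numpy as np

def compute_homography(pts1, pts2):
    """Least-squares homography H such that pts2 ~ H @ pts1 (homogeneous)."""
    A, b = [], []
    for (x, y), (xp, yp) in zip(pts1, pts2):
        # Each correspondence contributes two equations once the projective
        # denominator (h31*x + h32*y + 1) is multiplied through.
        A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
        b.append(xp)
        A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
        b.append(yp)
    h, *_ = np.linalg.lstsq(np.asarray(A, float), np.asarray(b, float), rcond=None)
    return np.append(h, 1.0).reshape(3, 3)
```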

Warp the Images

With the homography matrix H, I can now apply the transformation globally to the first image to warp it toward the perspective of the second image. The straightforward way would be to set the destination points via p' = Hp, but I decided to use inverse warping, as in project 3, to avoid gaps between pixels (see the sketch after the figure below). In particular, I had to handle the possibility of negative coordinates by storing an offset alongside the bounding box of the warped image; the bounding box itself was calculated by warping the corners of the picture with H. As a result, here is the first image warped to the second image's perspective through the homography found between the manually marked correspondence points:
Soda 1 warped towards Soda 2
Soda 2
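For reference, a minimal sketch of the inverse warp described above (nearest-neighbor sampling for brevity; offset is the minimum corner of the warped bounding box, so negative warped coordinates still land inside the output canvas):

```python
import numpy as np

def warp_image(img, H, out_shape, offset):
    """Inverse-warp img with homography H onto a canvas of shape out_shape.

    offset = (x_min, y_min) of the warped bounding box, so canvas pixel
    (col, row) corresponds to warped coordinate (col + x_min, row + y_min).
    """
    H_inv = np.linalg.inv(H)
    h_out, w_out = out_shape
    rows, cols = np.mgrid[0:h_out, 0:w_out]
    # Canvas pixels expressed as homogeneous coordinates in the warped frame.
    pts = np.stack([cols.ravel() + offset[0],
                    rows.ravel() + offset[1],
                    np.ones(cols.size)])
    src = H_inv @ pts                      # back into source-image coordinates
    sx = np.round(src[0] / src[2]).astype(int)
    sy = np.round(src[1] / src[2]).astype(int)
    valid = (sx >= 0) & (sx < img.shape[1]) & (sy >= 0) & (sy < img.shape[0])
    out = np.zeros((h_out, w_out) + img.shape[2:], dtype=img.dtype)
    out.reshape(-1, *img.shape[2:])[valid] = img[sy[valid], sx[valid]]
    return out
```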

Image Rectification

In order to sanity-check that the image warp function behaves as expected in other situations, I tested it by warping two images containing rectangular objects into a perspective where their corners actually form a rectangle. The H matrices were calculated using the method defined above, and here are the results (after cropping back to the original image dimensions):
Soda Fridge
Fridge After Crop
Apartment Fridge
Fridge After Crop
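As a usage example, rectification simply reuses the compute_homography and warp_image sketches above: the source points are the clicked object corners and the destination points are an axis-aligned rectangle (the coordinates below are made up for illustration):

```python
import numpy as np

# Hypothetical hand-picked fridge corners (x, y), ordered
# top-left, top-right, bottom-right, bottom-left.
fridge_corners = np.array([[612, 284], [1490, 330], [1452, 1710], [575, 1668]])

# Destination: an axis-aligned rectangle with roughly the fridge's aspect ratio.
rect_corners = np.array([[0, 0], [900, 0], [900, 1400], [0, 1400]])

H_rect = compute_homography(fridge_corners, rect_corners)
# fridge_img: the loaded photo as an (H, W, 3) array
rectified = warp_image(fridge_img, H_rect, out_shape=(1400, 900), offset=(0, 0))
```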

Blend the Images into a Mosaic

Lastly, I used the multi-resolution blending technique implemented in project 2 to smoothly blend the warped image with the base image whose perspective is kept. In particular, I used the stored offsets to place the pair of images at their correct relative locations on the final canvas, then generated a mask by splitting the canvas along the perpendicular bisector of the two image centers.
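A rough sketch of that mask construction (assuming both images have already been placed on a shared canvas and their centers are known in canvas (x, y) coordinates):

```python
import numpy as np

def seam_mask(canvas_shape, center1, center2):
    """Binary mask: 1 on image 1's side of the perpendicular bisector of the
    segment joining the two image centers, 0 on image 2's side."""
    h, w = canvas_shape
    rows, cols = np.mgrid[0:h, 0:w]
    c1, c2 = np.asarray(center1, float), np.asarray(center2, float)
    # A pixel belongs to image 1 when it is closer to center1 than to center2.
    d1 = (cols - c1[0]) ** 2 + (rows - c1[1]) ** 2
    d2 = (cols - c2[0]) ** 2 + (rows - c2[1]) ** 2
    return (d1 <= d2).astype(float)
```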
Image 1 Warped to Image 2
Image 2 with offset
Mask to use for Blending
Soda 1
Soda 2
Soda 1 + 2
With manual correspondence matching, I also created two more mosaics with Wall 2/3 and Building 3/4:
Wall 2
Wall 3
Wall 2 + 3
Building 3
Building 4
Building 3 + 4

Feature Matching for Autostitching

In the previous parts, manually picking correspondences was quite tedious and made it hard to stitch together more photos of the same scene (since correspondences would have to be re-picked for every new perspective transform). As such, over the next few parts, I implemented feature matching to automate correspondence finding.

Corner Detection

For this part, I used the Harris corner detector to find points with high corner response. It turns out, there were a lot of corners! In the image Soda 1, I found 13k+ corners:
Harris Corners on Soda 1
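For reference, a minimal sketch of this detection step using scikit-image's Harris implementation (parameter choices here are illustrative):

```python
import numpy as np
from skimage.color import rgb2gray
from skimage.feature import corner_harris, peak_local_max

def get_harris_corners(img, min_distance=1):
    """Return the Harris response map and the (row, col) coordinates of the
    local maxima of that response."""
    response = corner_harris(rgb2gray(img))
    coords = peak_local_max(response, min_distance=min_distance)
    return response, coords
```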

Adaptive Non-Maximal Suppression (ANMS)

In order to reduce redundancy and space out the candidate feature points, I implemented ANMS by calculating pairwise distances between the detected corners. My laptop struggled to compute that many distances at once, so I batched the points while building the full distance matrix. Then, for each point, I found the distance to the closest point satisfying the robustness criterion (based on corner strength), and kept only the 500-1000 points whose suppressing neighbors were furthest away.
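A compact, unbatched sketch of the suppression-radius computation (in practice the inner distance computation is what I batched; c_robust = 0.9 follows the MOPS paper):

```python
import numpy as np

def anms(coords, strengths, num_keep=500, c_robust=0.9):
    """Adaptive non-maximal suppression: keep the num_keep points whose
    nearest sufficiently-stronger neighbor is furthest away."""
    n = len(coords)
    radii = np.full(n, np.inf)
    for i in range(n):
        # Points that dominate point i under the robustness criterion.
        stronger = strengths * c_robust > strengths[i]
        if np.any(stronger):
            d2 = np.sum((coords[stronger] - coords[i]) ** 2, axis=1)
            radii[i] = d2.min()
    return coords[np.argsort(-radii)[:num_keep]]
```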

After this part, the number of corners is reduced to 500, and they are much more spread out:
Soda 1 after ANMS

Feature Descriptor Extraction

In order to run similarity matching between corresponding features in two images, I vectorized a region around each interest point. Specifically, I created a patch from a 40x40 window around the interest point, downsampled it to 8x8, rotated it based on the corner's orientation, and stacked the values from each of the 3 color channels into one 192x1 vector (sketched after the example patches below). Here are some of the patches found in the first image:
Patch from Image 1
Patch from Image 1
Patch from Image 1
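A sketch of the extraction step described above, with the rotation deferred to the Bells & Whistles section (assuming an RGB image and an interest point far enough from the border for a full window):

```python
import numpy as np
from skimage.transform import resize

def extract_descriptor(img, row, col, window=40, out_size=8):
    """40x40 window around the interest point, downsampled to 8x8 and
    flattened across the 3 color channels into a 192-vector."""
    half = window // 2
    patch = img[row - half:row + half, col - half:col + half]
    small = resize(patch, (out_size, out_size) + patch.shape[2:], anti_aliasing=True)
    return small.flatten()  # 8 * 8 * 3 = 192 values
```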

Feature Matching

With these feature descriptors for each image in hand, they are still not useful without knowing which features in one image correspond to which in the other. Specifically, correspondences between the two input images must be found automatically in order to estimate the homography. To do this, I used the dist2 function to compute the pairwise distance between every feature patch in image 1 and image 2. Since my images contained a lot of bricks and repetitive patterns, simply taking the closest match as a correspondence was not good enough. Instead, I also applied Lowe's ratio test with the second-closest neighbor to filter the features down to those that were highly distinctive. In particular, I used a threshold of 0.27, chosen somewhat arbitrarily but guided by the Multi-Image Matching using Multi-Scale Oriented Patches paper, where outliers generally had a distance greater than about 0.4 and correct matches less than about 0.1.
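A sketch of the matching step (pairwise distances are computed directly here rather than via the provided dist2 helper; the 0.27 threshold is the one discussed above):

```python
import numpy as np

def match_features(desc1, desc2, ratio_thresh=0.27):
    """Nearest-neighbor matching with Lowe's ratio test.

    desc1: (n1, d), desc2: (n2, d). Returns (i, j) index pairs whose
    1-NN distance is much smaller than their 2-NN distance.
    """
    # Pairwise Euclidean distances between all descriptor pairs.
    d2 = np.sum(desc1 ** 2, axis=1)[:, None] \
         + np.sum(desc2 ** 2, axis=1)[None, :] \
         - 2.0 * desc1 @ desc2.T
    dists = np.sqrt(np.maximum(d2, 0))
    order = np.argsort(dists, axis=1)
    nearest, second = order[:, 0], order[:, 1]
    idx = np.arange(len(desc1))
    ratio = dists[idx, nearest] / dists[idx, second]
    keep = ratio < ratio_thresh
    return np.stack([idx[keep], nearest[keep]], axis=1)
```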

Testing this portion with image 1 and image 2, very similar patches were found to match:
Patches from Soda 1 and Soda 2
Patches from Soda 1 and Soda 2
Patches from Soda 1 and Soda 2
Visually, the kept correspondences already matched pretty well:
Kept patches location in Soda 1
Kept patches location in Soda 2

RANSAC

However, there were still some outliers that didn't correspond well. For example, I found mistakes around shadows and the edges of the steel beams: patches that matched closely but sat at the wrong locations overall. Thus, it was necessary to further remove outliers through RANSAC. This algorithm repeatedly samples 4 random correspondences, estimates a homography from them, and scores that homography by how many of the remaining correspondences it explains. With luck, it finds increasingly better models, and I keep the largest set of correspondences (the inliers) that agree with the best homography.
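A sketch of this loop, reusing the compute_homography routine from earlier (the iteration count and pixel tolerance are values I would expect to tune):

```python
import numpy as np

def ransac_homography(pts1, pts2, n_iters=2000, eps=2.0):
    """4-point RANSAC: keep the largest inlier set of correspondences that
    agree (within eps pixels) with a homography fit to a random sample."""
    n = len(pts1)
    ones = np.ones((n, 1))
    best_inliers = np.zeros(n, dtype=bool)
    for _ in range(n_iters):
        sample = np.random.choice(n, 4, replace=False)
        H = compute_homography(pts1[sample], pts2[sample])
        # Project all of image 1's points and measure the reprojection error.
        proj = H @ np.hstack([pts1, ones]).T
        proj = (proj[:2] / proj[2]).T
        err = np.linalg.norm(proj - pts2, axis=1)
        inliers = err < eps
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Refit the homography to every inlier of the best sample.
    return compute_homography(pts1[best_inliers], pts2[best_inliers]), best_inliers
```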

Again, using image 1 and image 2, the set of correspondences were reduced down to the points shown below:
Patches location in Soda 1 after RANSAC
Patches location in Soda 2 after RANSAC

Final Results

Putting everything together, I re-stitched together the previously created mosaics to create the following:
Soda 1 + 2 (Manual)
Soda 1 + 2 (Auto)
Wall 2 + 3 (Manual)
Wall 2 + 3 (Auto)
Building 3 + 4 (Manual)
Building 3 + 4 (Auto)

What have you learned?

I think the coolest thing overall from this project was the effectiveness of RANSAC. Even though it is basically taking a bunch of lucky guesses, it was cool to see it eventually come to a consensus and settle on points that visually looked optimal. In addition, implementing ANMS was really interesting for seeing how spatial separation can be enforced algorithmically. Lastly, I learned a lot about how corner detection and feature descriptors can be refined using rotational invariance. In particular, I was able to find the orientation of each corner and compensate for it so that all candidate patches are adjusted toward the same orientation.

Bells & Whistles: Rotational Invariance

Since the perspectives differ slightly between the images, the same corner patches may not line up exactly in the feature matching phase. Visually, I found that some corners were just a few degrees off between the pair of images. Initially, I planned to implement rotational invariance by rotating each patch by the opposite of its corner orientation. However, I noticed that for my images full of bricks, complete rotational invariance wasn't necessarily a good thing: I only wanted to match corners that were at most a few degrees apart. As such, I decided to rotate by the corner orientation divided by a constant, nudging patches toward a common direction without completely aligning them to a fixed axis. Empirically, this seemed to reduce mismatches.
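A sketch of that damped rotation (the blur sigma and the damping constant are knobs I would expect to tune; SciPy is used here for the gradient and the rotation):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, rotate

def corner_orientation(gray, row, col, sigma=4.5):
    """Orientation (degrees) of the blurred image gradient at the corner."""
    gy, gx = np.gradient(gaussian_filter(gray, sigma))
    return np.degrees(np.arctan2(gy[row, col], gx[row, col]))

def nudge_patch(patch, theta_deg, damping=4.0):
    """Rotate by only a fraction of the corner orientation: nearly aligned
    bricks stay distinct, but small perspective tilts get corrected."""
    return rotate(patch, -theta_deg / damping, reshape=False, mode='nearest')
```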

Below are some of the patches before and after the minor rotation correction (before down-scaling):
Patch Pre-Rotation
Patch Post-Rotation
Patch Pre-Rotation
Patch Post-Rotation
Patch Pre-Rotation
Patch Post-Rotation