Feature Matching & Stereo Vision

Computer vision project exploring feature detection, stereo correspondence, and depth estimation using OpenCV.

Motorbike example image
Base image of motorbike used for this project.

Feature Detection and Mapping

The first step in any stereo vision pipeline is identifying unique, repeatable points in an image: areas that can be reliably recognised from another viewpoint. These are known as keypoints. In this project, a sample motorbike image was used to illustrate how features are extracted before any matching occurs.

SIFT keypoints visualised
Detected SIFT keypoints visualised on the motorbike images.

SIFT Feature Detection

The Scale-Invariant Feature Transform (SIFT) algorithm was developed by David Lowe (ICCV, 1999). It identifies keypoints that remain consistent despite changes in scale, rotation, and lighting.
SIFT generates a 128-dimensional descriptor for each keypoint, capturing local gradient patterns that make the feature distinctive and robust.

For the example dataset, 11,761 keypoints were detected in the first image and 11,928 in the second, each represented as a 128-element feature vector.
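
A minimal sketch of this step, assuming the stereo pair is saved as left.png and right.png (the filenames are placeholders, not the project's actual paths):

```python
import cv2

# Load the stereo pair in grayscale (filenames are placeholders).
img_left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
img_right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# Detect keypoints and compute 128-dimensional SIFT descriptors.
sift = cv2.SIFT_create()
kp_left, des_left = sift.detectAndCompute(img_left, None)
kp_right, des_right = sift.detectAndCompute(img_right, None)

print(len(kp_left), des_left.shape)  # e.g. 11761 keypoints, shape (11761, 128)
```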

Visualising good feature matches between stereo images
Feature correspondences drawn between left and right motorbike images.

Good Matches

With features extracted from both images, the next step is matching corresponding points between them. This is typically done with a feature matcher such as OpenCV's FLANN-based or Brute-Force matcher, where each match links a keypoint in the left image to the keypoint with the most similar descriptor in the right image. Raw matches contain many false positives, so they are commonly filtered with Lowe's ratio test, which keeps a match only when its best descriptor distance is clearly smaller than the second-best.
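
A sketch of the matching stage, reusing the descriptors from the previous snippet; the FLANN parameters and the 0.7 ratio threshold are common illustrative defaults, not values taken from this project:

```python
import cv2

# FLANN with a KD-tree index, suitable for SIFT's float descriptors.
FLANN_INDEX_KDTREE = 1
flann = cv2.FlannBasedMatcher(
    dict(algorithm=FLANN_INDEX_KDTREE, trees=5),
    dict(checks=50),
)

# Retrieve the two nearest neighbours so each match has a runner-up.
matches = flann.knnMatch(des_left, des_right, k=2)

# Lowe's ratio test: keep a match only if it clearly beats the runner-up.
good = [m for m, n in matches if m.distance < 0.7 * n.distance]
```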

Epipolar lines visualisation
Epipolar geometry visualised across both stereo images.

Epipolar Lines

Once good matches are identified, the geometry of the camera setup can be recovered through the epipolar constraint: the match for a point in one image must lie along a specific line (the epipolar line) in the other image. By estimating the fundamental matrix from the matched points, these lines can be drawn to visualise the spatial relationship between the two views.
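
A sketch of estimating the fundamental matrix and the induced epipolar lines from the good matches above, using RANSAC to suppress outliers:

```python
import numpy as np
import cv2

# Pixel coordinates of the matched keypoints in each image.
pts_left = np.float32([kp_left[m.queryIdx].pt for m in good])
pts_right = np.float32([kp_right[m.trainIdx].pt for m in good])

# Estimate the fundamental matrix robustly; mask flags the inlier matches.
F, mask = cv2.findFundamentalMat(pts_left, pts_right, cv2.FM_RANSAC)

# Epipolar lines in the left image induced by points in the right image.
# Each line is returned as (a, b, c), meaning ax + by + c = 0.
lines_left = cv2.computeCorrespondEpilines(pts_right.reshape(-1, 1, 2), 2, F)
lines_left = lines_left.reshape(-1, 3)
```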

Disparity map showing depth contrast
Disparity map highlighting depth variations around the motorbike.

Disparity and Depth Map

The disparity map measures how far apart corresponding pixels are between the left and right images; areas with greater disparity are closer to the camera. OpenCV provides the StereoBM and StereoSGBM algorithms to compute this map, producing a grayscale output in which brighter regions represent closer objects.
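
A sketch using StereoSGBM on a rectified pair; the block size and disparity range here are illustrative and would need tuning for the motorbike scene:

```python
import cv2

# Semi-global block matching; numDisparities must be a multiple of 16.
stereo = cv2.StereoSGBM_create(
    minDisparity=0,
    numDisparities=128,
    blockSize=5,
)

# compute() returns fixed-point disparities scaled by 16.
disparity = stereo.compute(img_left, img_right).astype("float32") / 16.0

# Normalise for display: brighter pixels correspond to closer objects.
disp_vis = cv2.normalize(disparity, None, 0, 255, cv2.NORM_MINMAX)
cv2.imwrite("disparity.png", disp_vis.astype("uint8"))
```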

2.5D depth visualisation
2.5D visualisation of estimated scene depth based on disparity values.

Visualising Depth in 2.5D

The final stage projects the disparity map into a 2.5D depth plot. This offers a pseudo-3D view of the scene, where depth is inferred from pixel displacement rather than reconstructed as a full 3D model.
It gives a clear picture of how much geometric structure the stereo pipeline captures.
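
One way to produce such a plot, assuming matplotlib is available and reusing the disparity array from the previous step (the downsampling factor of 8 is arbitrary, chosen only to keep the surface plot responsive):

```python
import numpy as np
import matplotlib.pyplot as plt

# Downsample the disparity map so the surface plot stays responsive.
step = 8
z = disparity[::step, ::step]
y, x = np.mgrid[0:z.shape[0], 0:z.shape[1]]

# Plot disparity as height: taller regions are closer to the camera.
fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.plot_surface(x, y, z, cmap="viridis")
ax.set_zlabel("disparity")
plt.show()
```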

Summary

This project gave me insight into how stereo vision works, from detecting features and matching points to estimating geometry and depth. It showed me the importance of robust descriptors like SIFT for maintaining consistency across viewpoints, and demonstrated how geometric reasoning (through epipolar lines and disparity) enables machines to perceive depth.