Computer Vision for Robotics
A series of three hands-on laboratory projects covering camera geometry, feature-based tracking, and deep learning for image classification.
Project Overview
This collection of labs from the "Computer Vision for Robotics" course provided practical experience with the core techniques that allow robots to perceive and interpret the world. The projects were implemented in Python and covered three distinct areas: calibrating a camera's intrinsic properties, tracking objects using feature detection, and classifying images using a neural network.
Lab 1: Camera Calibration using Zhang's Method
Goal: To implement Zhang's classic calibration method to determine a camera's intrinsic matrix (focal lengths, principal point) from multiple images of a planar chessboard pattern.
Technical Deep Dive: The first task was to compute, for each image, the homography that maps points on the chessboard plane (world coordinates with Z = 0) to their corresponding 2D pixel coordinates. Each homography provides two constraints on the camera's intrinsic parameters, so taking at least three images of the chessboard from different orientations yields a (generally over-determined) system of linear equations. I implemented the Python code to solve this system and extract the intrinsic matrix `K` in closed form. A key step was normalizing the point data before estimating each homography, which improves the numerical stability of the linear solve, as described in the original paper.
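For reference, here is a minimal NumPy sketch of the pipeline described above, assuming chessboard corners have already been detected: a normalized DLT estimates each per-image homography, and the stacked Zhang constraints yield `K` in closed form. The function names (`homography_dlt`, `intrinsics_from_homographies`, `_v`) are illustrative, not the lab's actual code.

```python
import numpy as np

def _normalize(pts):
    """Similarity transform moving points to their centroid with mean distance sqrt(2)."""
    c = pts.mean(axis=0)
    s = np.sqrt(2) / np.sqrt(((pts - c) ** 2).sum(axis=1)).mean()
    return np.array([[s, 0, -s * c[0]],
                     [0, s, -s * c[1]],
                     [0, 0, 1.0]])

def homography_dlt(board_xy, pixels):
    """Normalized DLT: H maps planar board points (Z = 0) to pixel coordinates."""
    Tb, Tp = _normalize(board_xy), _normalize(pixels)
    b = (Tb @ np.column_stack([board_xy, np.ones(len(board_xy))]).T).T
    p = (Tp @ np.column_stack([pixels, np.ones(len(pixels))]).T).T
    A = []
    for (X, Y, _), (u, v, _) in zip(b, p):
        A.append([-X, -Y, -1, 0, 0, 0, u * X, u * Y, u])
        A.append([0, 0, 0, -X, -Y, -1, v * X, v * Y, v])
    _, _, Vt = np.linalg.svd(np.asarray(A))
    H = Vt[-1].reshape(3, 3)
    return np.linalg.inv(Tp) @ H @ Tb      # undo the normalization

def _v(H, i, j):
    """Constraint row v_ij such that v_ij . b = h_i^T B h_j, with B = K^-T K^-1."""
    hi, hj = H[:, i], H[:, j]
    return np.array([hi[0] * hj[0],
                     hi[0] * hj[1] + hi[1] * hj[0],
                     hi[1] * hj[1],
                     hi[2] * hj[0] + hi[0] * hj[2],
                     hi[2] * hj[1] + hi[1] * hj[2],
                     hi[2] * hj[2]])

def intrinsics_from_homographies(Hs):
    """Closed-form recovery of K from >= 3 homographies (Zhang's method)."""
    V = []
    for H in Hs:
        V.append(_v(H, 0, 1))                 # h1^T B h2 = 0
        V.append(_v(H, 0, 0) - _v(H, 1, 1))   # h1^T B h1 = h2^T B h2
    _, _, Vt = np.linalg.svd(np.asarray(V))
    b = Vt[-1]
    if b[0] < 0:                              # b is defined up to sign; keep B pos. def.
        b = -b
    B11, B12, B22, B13, B23, B33 = b
    v0 = (B12 * B13 - B11 * B23) / (B11 * B22 - B12 ** 2)
    lam = B33 - (B13 ** 2 + v0 * (B12 * B13 - B11 * B23)) / B11
    alpha = np.sqrt(lam / B11)
    beta = np.sqrt(lam * B11 / (B11 * B22 - B12 ** 2))
    gamma = -B12 * alpha ** 2 * beta / lam
    u0 = gamma * v0 / beta - B13 * alpha ** 2 / lam
    return np.array([[alpha, gamma, u0],
                     [0.0,   beta,  v0],
                     [0.0,   0.0,  1.0]])
```

With per-image corner detections in `detections`, usage would look like `K = intrinsics_from_homographies([homography_dlt(board_xy, d) for d in detections])`.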
Lab 2: Feature-Based Single-Object Tracking
Goal: To develop a simple single-object tracking algorithm using local feature matching and homography estimation.
Technical Deep Dive: The algorithm was implemented in Python using OpenCV. First, I used the ORB (Oriented FAST and Rotated BRIEF) detector to find keypoints and compute their binary descriptors on a manually selected object region in the first frame. For every subsequent frame, new keypoints were detected and matched against this initial set. From the matches, `cv2.findHomography` computed the transformation that maps the object from its original position to its current one, and `cv2.perspectiveTransform` applied this homography to the corners of the initial bounding box, tracking the object's location and orientation through the video.
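A condensed OpenCV sketch of that loop follows, assuming a BGR video stream and a manually chosen box `(x, y, w, h)`; the brute-force Hamming matching with cross-checking and the 5-px RANSAC reprojection threshold are my assumptions, not necessarily the lab's settings.

```python
import cv2
import numpy as np

def init_tracker(first_frame, bbox, nfeatures=1000):
    """Detect ORB keypoints inside the user-selected box of the first frame."""
    x, y, w, h = bbox
    gray = cv2.cvtColor(first_frame, cv2.COLOR_BGR2GRAY)
    orb = cv2.ORB_create(nfeatures=nfeatures)
    kp, des = orb.detectAndCompute(gray[y:y + h, x:x + w], None)
    pts = np.float32([k.pt for k in kp]) + np.float32([x, y])  # ROI -> frame coords
    corners = np.float32([[x, y], [x + w, y],
                          [x + w, y + h], [x, y + h]]).reshape(-1, 1, 2)
    return orb, pts, des, corners

def track(orb, ref_pts, ref_des, corners, frame):
    """Match the reference descriptors in a new frame and re-project the box."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    kp, des = orb.detectAndCompute(gray, None)
    if des is None:
        return None
    # Hamming distance suits ORB's binary descriptors; cross-check prunes outliers.
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(ref_des, des)
    if len(matches) < 4:                      # findHomography needs >= 4 points
        return None
    src = np.float32([ref_pts[m.queryIdx] for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    if H is None:
        return None
    return cv2.perspectiveTransform(corners, H)  # tracked box in the new frame
```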
Lab 3: Deep Learning for Image Classification
Goal: To build, train, and evaluate a simple Multi-Layer Perceptron (MLP) for classifying handwritten digits from the MNIST dataset using PyTorch.
Technical Deep Dive: This project covered the complete deep learning workflow. I implemented a custom `Dataset` class to load and transform the MNIST images and labels, and defined the MLP as a `torch.nn.Module` consisting of two linear layers separated by a ReLU activation. The model was trained with the cross-entropy loss and the Adam optimizer, using a training loop that iterates over the dataset in mini-batches, performs the forward and backward passes, and updates the model's weights. The final model was then evaluated on the test set to measure its classification accuracy.
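The sketch below condenses that workflow, substituting `torchvision.datasets.MNIST` for the lab's custom `Dataset` class; the hidden width, learning rate, batch sizes, and epoch count are illustrative choices, not the lab's settings.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

class MLP(nn.Module):
    """Two linear layers separated by a ReLU, as described above."""
    def __init__(self, in_dim=28 * 28, hidden=256, classes=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),                 # (N, 1, 28, 28) -> (N, 784)
            nn.Linear(in_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, classes),
        )

    def forward(self, x):
        return self.net(x)                # raw logits; CrossEntropyLoss handles softmax

tfm = transforms.ToTensor()
train_dl = DataLoader(datasets.MNIST("data", train=True, download=True, transform=tfm),
                      batch_size=64, shuffle=True)
test_ds = datasets.MNIST("data", train=False, download=True, transform=tfm)
test_dl = DataLoader(test_ds, batch_size=256)

model = MLP()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(3):
    model.train()
    for images, labels in train_dl:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)  # forward pass
        loss.backward()                          # backward pass
        optimizer.step()                         # weight update

# Evaluate classification accuracy on the held-out test set.
model.eval()
correct = 0
with torch.no_grad():
    for images, labels in test_dl:
        correct += (model(images).argmax(dim=1) == labels).sum().item()
print(f"test accuracy: {correct / len(test_ds):.4f}")
```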