posted on 2025-05-12, 14:02 authored by James Nguyen
<p>This thesis report goes through the process of developing a stereo visual odometry pipeline</p>
<p>and comparing its performance with a monocular visual odometry pipeline. Visual odometry (VO)</p>
<p>is an extremely important computer vision technique that allows a system to determine its</p>
<p>position and orientation based solely on visual information. This is beneficial for systems</p>
<p>that cannot operate with GPS due to environmental issues. In this study, two different</p>
<p>VO pipelines were implemented and tested: one using a single grayscale camera, known</p>
<p>as monocular VO, and the other using rectified stereo image pairs, known as stereo VO.</p>
<p>Both pipelines were tested on the KITTI odometry benchmark dataset, using</p>
<p>sequences that include ground truth for comparison. The performance was</p>
<p>evaluated in terms of trajectory accuracy, translational error, and rotational error. To</p>
<p>quantify the difference between monocular and stereo VO, both pipelines were assessed</p>
<p>using the root mean square error (RMSE) for translation and rotation. The</p>
<p>stereo VO pipeline achieved a 63.35% improvement in translational RMSE and a 31.55%</p>
<p>improvement in rotational RMSE over the monocular pipeline. It also exhibited improved robustness, as demonstrated by</p>
<p>the lower standard deviation values of 117.99 m versus 181.33 m for translation and 5.17°</p>
<p>versus 15.00° for rotation. The results of all tests show that even though monocular VO</p>
<p>provides a simpler and faster solution, it suffers from significant drift due to the lack</p>
<p>of depth perception. In contrast, stereo VO significantly improves trajectory</p>
<p>alignment with ground truth and reduces the errors by leveraging depth from disparity maps. This</p>
<p>improvement reduces drift and allows for a more stable estimation over time. Overall,</p>
<p>this thesis demonstrates that stereo VO provides a much more reliable and accurate result</p>
<p>and also highlights the trade-offs between performance and complexity.</p>
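<p>The RMSE comparison described above can be sketched as follows. This is a minimal illustration, not the thesis implementation: the abstract does not state how per-frame errors or the percentage improvement were computed, so the helper names <code>rmse</code> and <code>percent_improvement</code>, the definition of improvement as relative RMSE reduction, and the sample error values are all assumptions made for the example.</p>

```python
import math


def rmse(errors):
    """Root mean square error of a sequence of per-frame errors."""
    if not errors:
        raise ValueError("errors must be non-empty")
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def percent_improvement(baseline, candidate):
    """Relative reduction of `candidate` with respect to `baseline`, in percent.

    Assumed definition: 100 * (baseline - candidate) / baseline.
    """
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (baseline - candidate) / baseline


# Hypothetical per-frame translational errors in metres.
# These values are illustrative only and are NOT results from the thesis.
mono_errors = [3.0, 4.0, 5.0]
stereo_errors = [1.0, 2.0, 2.0]

mono_rmse = rmse(mono_errors)
stereo_rmse = rmse(stereo_errors)
improvement = percent_improvement(mono_rmse, stereo_rmse)

print(f"monocular RMSE: {mono_rmse:.3f} m")
print(f"stereo RMSE:    {stereo_rmse:.3f} m")
print(f"improvement:    {improvement:.2f} %")
```

<p>The same two helpers would apply unchanged to rotational errors expressed in degrees, since RMSE only assumes a list of scalar per-frame errors.</p>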