Recovering Feature and Observer Position by Projected Error Refinement
G. S. Bestor, Ph.D. Dissertation, Computer Sciences Department Technical Report 1381, University of Wisconsin - Madison, August 1998.

Abstract

Recovering three-dimensional information from images is a principal goal of computer vision. An approach called Structure From Motion (SFM) does so without imposing strict requirements on the observer or scene. In particular, SFM assumes that the camera motion is unknown and requires only that the scene be static. This thesis describes a new SFM technique called Projected Error Refinement that computes the positions of feature points (i.e., structure) and the locations of the camera or observer (i.e., motion) from a noisy image sequence. The technique addresses limitations that make existing SFM techniques unsuitable outside controlled environments: it models perspective projection, allows unconstrained camera motion, deals with outliers and occlusion, and is scalable. The technique is also recursive, so new images can be added at any time, which makes it suitable for video image streams.

Projected Error Refinement views SFM as a geometric inverse projection problem, with the goal of determining the positions of the cameras and feature points such that the projectors defined by each image optimally intersect (projectors are the lines of projection specifying the direction of each feature point from the camera's optical center). This is expressed as a global optimization problem whose objective function is the mean-squared angular projection error between the solution and the observed images, which is to be minimized. Occlusion is dealt with naturally in this approach because only visible feature points define projectors that are considered during optimization; occluded features are ignored. The technique models true perspective projection and is scalable to an arbitrary number of feature points and images. Projected Error Refinement is non-linear and uses an efficient parallel iterative refinement algorithm that takes an initial estimate of the structure and motion parameters and alternately refines the cameras' poses and the positions of the feature points. The solution can be refined to an arbitrary precision, or refinement can be terminated early when processing time is limited. The solution converges rapidly towards the global minimum even when started from a poor initial estimate. Experimental results are given for both 2D and 3D perspective projection using real and synthetic image sequences.
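
As an illustration of the objective described above, the following is a minimal sketch of a mean-squared angular projection error. It is not the dissertation's implementation. The function names, the dictionary layout of the observations, and the assumption that each observed projector direction is already expressed in world coordinates (so that camera orientation is folded into the observation) are all simplifications introduced here. Occluded features simply have no entry, so they contribute nothing to the error.

```python
import math


def angular_error(camera_center, point, observed_dir):
    """Angle in radians between the ray from camera_center to point
    and the observed projector direction (hypothetical helper)."""
    ray = [p - c for p, c in zip(point, camera_center)]
    dot = sum(r * d for r, d in zip(ray, observed_dir))
    norm = math.sqrt(sum(r * r for r in ray)) * math.sqrt(
        sum(d * d for d in observed_dir)
    )
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, dot / norm))
    return math.acos(cos_angle)


def mean_squared_angular_error(camera_centers, points, observations):
    """Mean of squared angular errors over all visible features.

    observations maps (camera_index, point_index) to an observed
    direction vector in world coordinates (an assumption of this
    sketch). Occluded features are absent from the mapping.
    """
    if not observations:
        raise ValueError("at least one visible feature is required")
    squared = [
        angular_error(camera_centers[i], points[j], direction) ** 2
        for (i, j), direction in observations.items()
    ]
    return sum(squared) / len(squared)


# Two cameras observing one feature point with noise-free projectors.
cameras = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
features = [(0.0, 0.0, 1.0)]
observed = {(0, 0): (0.0, 0.0, 1.0), (1, 0): (-1.0, 0.0, 1.0)}
error = mean_squared_angular_error(cameras, features, observed)
```

Because the functions operate on coordinate tuples of any length, the same sketch applies to both the 2D and 3D cases. A refinement step in the spirit of the abstract would adjust either the camera parameters or the feature positions to reduce this value while holding the other set fixed.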