Self-Calibrating Gaussian Splatting for Large Field of View Reconstruction


1Cornell University 2Netflix Eyeline Studio 3Stanford University
* denotes equal contributions

arXiv 2024


Abstract

In this paper, we present a self-calibrating framework that jointly optimizes camera parameters, lens distortion, and 3D Gaussian representations, enabling accurate and efficient scene reconstruction. In particular, our technique enables high-quality scene reconstruction from large field-of-view (FOV) imagery taken with wide-angle lenses, allowing the scene to be modeled from fewer images. Our approach introduces a novel method for modeling complex lens distortions using a hybrid network that combines invertible residual networks with explicit grids. This design effectively regularizes the optimization process, achieving greater accuracy than conventional camera models. Additionally, we propose a cubemap-based resampling strategy to support large-FOV images without sacrificing resolution or introducing distortion artifacts. Our method is compatible with the fast rasterization of Gaussian Splatting, adapts to a wide variety of camera lens distortions, and demonstrates state-of-the-art performance on both synthetic and real-world datasets.
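To make the hybrid design concrete, here is a minimal PyTorch sketch of one plausible arrangement; this is not the authors' implementation, and all names and sizes are hypothetical. An invertible residual branch models smooth global distortion, while an explicit low-resolution grid adds locally interpolated corrections:

```python
# A minimal sketch of a hybrid distortion field (hypothetical names/sizes).
import torch
import torch.nn as nn
import torch.nn.functional as F

class HybridDistortionField(nn.Module):
    def __init__(self, hidden=64, grid_res=16):
        super().__init__()
        # Invertible residual branch: spectral norm bounds each layer's
        # Lipschitz constant, and the 0.9 scale below makes the branch
        # strictly contractive, so uv -> uv + g(uv) is invertible
        # (the i-ResNet argument).
        self.res = nn.Sequential(
            nn.utils.parametrizations.spectral_norm(nn.Linear(2, hidden)),
            nn.Softplus(),
            nn.utils.parametrizations.spectral_norm(nn.Linear(hidden, 2)),
        )
        # Explicit low-resolution offset grid for local corrections.
        self.grid = nn.Parameter(torch.zeros(1, 2, grid_res, grid_res))

    def forward(self, uv):  # uv: (N, 2) normalized coords in [-1, 1]
        smooth = 0.9 * self.res(uv)              # smooth global distortion
        local = F.grid_sample(                   # bilinear lookup in the grid
            self.grid, uv.view(1, -1, 1, 2),
            mode="bilinear", align_corners=True,
        ).view(2, -1).t()
        return uv + smooth + local               # distorted coordinates
```

In such a setup, the distortion field is optimized jointly with the Gaussians: the residual branch absorbs the smooth global component of the lens, while the grid captures local irregularities that a parametric camera model would miss.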

Method


Conventional approaches require undistorting images into perspective views compatible with 3DGS rasterization. As the field of view increases, pixel stretching becomes progressively more severe, significantly compromising reconstruction quality. In contrast, our cubemap resampling strategy maintains a consistent pixel density across the entire field of view. This approach, combined with our hybrid distortion field, makes use of the peripheral regions (the annular area outside the blue box) without severe distortion or pixel stretching. Moreover, our method can handle fields of view up to 180°, as demonstrated by the green box, allowing for comprehensive and accurate reconstructions.
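To make the resampling concrete, here is a minimal NumPy sketch (helper names are hypothetical, and an ideal equidistant fisheye stands in for the learned distortion field). Each cube-face pixel is assigned a ray direction, the direction is pushed through the fisheye model, and the source image is sampled there:

```python
# A minimal sketch of cubemap resampling from a fisheye image.
import numpy as np

def face_directions(face, res):
    """Unit ray directions for each pixel of one cube face."""
    s = (np.arange(res) + 0.5) / res * 2.0 - 1.0          # [-1, 1]
    u, v = np.meshgrid(s, -s)                             # +v points up
    one = np.ones_like(u)
    axes = {  # illustrative axis conventions; +z is the optical axis
        "front": (u, v, one),   "back":  (-u, v, -one),
        "left":  (-one, v, u),  "right": (one, v, -u),
        "up":    (u, one, -v),  "down":  (u, -one, v),
    }
    d = np.stack(axes[face], axis=-1)
    return d / np.linalg.norm(d, axis=-1, keepdims=True)

def fisheye_project(d, f, cx, cy):
    """Equidistant model: image radius is proportional to the ray angle."""
    theta = np.arccos(np.clip(d[..., 2], -1.0, 1.0))      # angle from +z
    phi = np.arctan2(d[..., 1], d[..., 0])
    return cx + f * theta * np.cos(phi), cy + f * theta * np.sin(phi)

def resample_face(img, face, res, f):
    """Fill one cube face by sampling the source fisheye image."""
    h, w = img.shape[:2]
    x, y = fisheye_project(face_directions(face, res), f, w / 2, h / 2)
    xi = np.clip(np.round(x).astype(int), 0, w - 1)       # nearest-neighbor;
    yi = np.clip(np.round(y).astype(int), 0, h - 1)       # bilinear in practice
    return img[yi, xi]
```

For a 180° equidistant lens spanning the full image width, `f = w / np.pi`; rays outside the lens FOV (e.g., most of the back face) should be masked rather than clipped to the border as done in this sketch.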

FisheyeNeRF Dataset

We compare our method with Vanilla 3DGS and Fisheye-GS on the FisheyeNeRF dataset.
Without modeling lens distortion, Vanilla 3DGS fails to reconstruct the scene properly, resulting in numerous floaters.
Fisheye-GS struggles in peripheral areas where distortions are more pronounced.
Our method produces better reconstructions, especially in highly distorted regions.
After reconstruction, we also render the results as perspective views with a large FOV.


Captures in Netflix Studio

We captured scenes in Netflix Studio with wide-angle cameras.
The following compares our method with the conventional pipeline (e.g., undistorting images using calibration from SfM).
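For reference, this baseline can be approximated with OpenCV's fisheye model (a hedged sketch: the calibration values below are placeholders and would in practice come from an SfM calibration such as COLMAP):

```python
import cv2
import numpy as np

# Placeholder calibration; in practice K and D come from SfM (e.g., COLMAP).
K = np.array([[600.0,   0.0, 640.0],
              [  0.0, 600.0, 480.0],
              [  0.0,   0.0,   1.0]])
D = np.array([0.1, -0.05, 0.01, 0.0])    # fisheye coefficients k1..k4

img = cv2.imread("capture.png")
# A wider target FOV (shorter focal length in Knew) stretches peripheral
# pixels heavily, the failure mode the cubemap strategy avoids.
K_new = K.copy()
K_new[0, 0] = K_new[1, 1] = 300.0
undistorted = cv2.fisheye.undistortImage(img, K, D, Knew=K_new)
cv2.imwrite("undistorted.png", undistorted)
```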


180° Mitsuba Synthetic Dataset

We created an extremely challenging dataset rendered through a 180° fisheye lens using Mitsuba; the projection model is sketched below.
The following compares our method with a reconstruction from a set of regular-FOV captures.
Additionally, we show the effect of failing to model the peripheral region during reconstruction, as happens with Fisheye-GS.
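For context, the camera model behind these renders can be written generically (a sketch assuming an equidistant projection; this is generic ray generation, not Mitsuba's API):

```python
import numpy as np

def fisheye_rays(res):
    """Per-pixel unit ray directions for an equidistant 180-degree fisheye."""
    s = (np.arange(res) + 0.5) / res * 2.0 - 1.0
    x, y = np.meshgrid(s, -s)
    r = np.sqrt(x**2 + y**2)              # normalized image radius
    theta = r * (np.pi / 2)               # equidistant: angle = radius * 90 deg
    phi = np.arctan2(y, x)
    d = np.stack([np.sin(theta) * np.cos(phi),
                  np.sin(theta) * np.sin(phi),
                  np.cos(theta)], axis=-1)    # +z is the optical axis
    return d, r <= 1.0                    # directions and image-circle mask
```

Pixels on the image-circle boundary look 90° away from the optical axis, so the valid region covers a full hemisphere, which is what makes this setting so challenging for methods that assume perspective projection.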