ORStereo: Occlusion-Aware Recurrent Stereo Matching for 4K-Resolution Images

Stereo cameras can provide high-resolution 3D maps, but only if the image data is processed effectively. Most current stereo matching models are trained on low-resolution data because training on high-resolution images requires a large amount of GPU memory. A recent paper therefore focuses on generalizing such models to high-resolution images.

Computer-aided image editing. Image credit: Free-Photos | Free picture via Pixabay

The proposed method, ORStereo, works in two phases. First, it predicts an initial disparity map at reduced resolution. Then, it recurrently refines the disparity at full resolution, explicitly detecting occlusions to guide the updates. A novel refinement module equipped with special normalization operations lets the model generalize to previously unseen disparity ranges. The authors also collected a dataset of 4K-resolution stereo images for evaluation. The results show that the method can achieve state-of-the-art performance without any high-resolution training data.
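The two-phase idea can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: `upsample_disparity`, `normalize_patch`, `refine`, and the `residual_fn` callback are hypothetical names, the normalization is a plain mean/spread rescaling, and the occlusion mask simply freezes occluded pixels. In the paper, these steps are carried out by trained networks.

```python
import numpy as np

def upsample_disparity(disp_low, factor):
    """Nearest-neighbour upsampling of a low-resolution disparity map.

    Disparity is a horizontal pixel offset, so its values are multiplied
    by the same factor as the image size.
    """
    up = np.repeat(np.repeat(disp_low, factor, axis=0), factor, axis=1)
    return up * factor

def normalize_patch(disp_patch):
    """Rescale a patch to zero mean and roughly unit spread (assumed scheme).

    The intent is that a refiner always sees values in a familiar range,
    whatever the absolute disparity of the patch is.
    """
    mean = disp_patch.mean()
    scale = disp_patch.std() + 1e-6  # avoid division by zero on flat patches
    return (disp_patch - mean) / scale, mean, scale

def refine(disp, occlusion, residual_fn, patch=64, steps=2):
    """Recurrent residual refinement, processed patch by patch.

    residual_fn is a hypothetical stand-in for a learned refiner: it maps a
    normalized patch to a residual expressed in normalized units.
    Pixels flagged in the boolean occlusion mask are left unchanged.
    """
    out = disp.astype(np.float64).copy()
    for _ in range(steps):
        for y in range(0, out.shape[0], patch):
            for x in range(0, out.shape[1], patch):
                sl = (slice(y, y + patch), slice(x, x + patch))
                norm, _, scale = normalize_patch(out[sl])
                update = residual_fn(norm) * scale  # back to pixel units
                out[sl] = out[sl] + np.where(occlusion[sl], 0.0, update)
    return out
```

Working patch by patch keeps the memory needed for each refinement step bounded by the patch size rather than by the full 4K frame, which is the practical motivation for this kind of design.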

Stereo reconstruction models trained on small images do not generalize well to high-resolution data. Training a model on high-resolution images is hampered by limited data availability and is often infeasible due to limited computing resources. In this work, we present the Occlusion-aware Recurrent binocular Stereo matching (ORStereo), which deals with these issues by training only on available stereo images with low disparity ranges. ORStereo generalizes to unseen high-resolution images with large disparity ranges by formulating the task as residual updates and refinements of an initial prediction. ORStereo is trained on images with disparity ranges limited to 256 pixels, yet it can operate on 4K-resolution input with over 1000 disparities using limited GPU memory. We test the model’s capability on both synthetic and real-world high-resolution images. Experimental results demonstrate that ORStereo achieves performance on 4K-resolution images comparable to that of state-of-the-art methods trained on large disparity ranges. Compared to other methods that are only trained on low-resolution images, our method is 70% more accurate on 4K-resolution images.
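A short calculation shows why an initial low-resolution pass helps with the disparity ranges quoted in the abstract. Disparity is measured in pixels, so shrinking an image horizontally by some factor shrinks its disparities by the same factor. The helper name `min_downsample_factor` is hypothetical, and the factor the authors actually use is not stated here.

```python
import math

def min_downsample_factor(max_disparity_full, trained_range):
    """Smallest integer factor that brings the full-resolution disparity
    range inside the range the model was trained on."""
    return max(1, math.ceil(max_disparity_full / trained_range))

# Figures from the abstract: about 1000 disparities at 4K, 256 in training.
factor = min_downsample_factor(1000, 256)
print(factor)         # 4
print(1000 / factor)  # 250.0, which fits within the 256-pixel training range
```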

Research paper: Hu, Y., Wang, W., Yu, H., Zhen, W., and Scherer, S., “ORStereo: Occlusion-Aware Recurrent Stereo Matching for 4K-Resolution Images”, 2021. Link: https://arxiv.org/abs/2103.07798

