Computational Visual Media  2019, Vol. 05 Issue (04): 337-345    doi: 10.1007/s41095-019-0159-7
Research Article     
Reconstructing piecewise planar scenes with multi-view regularization
Weijie Xi1, Xuejin Chen1,(✉)
1University of Science and Technology of China, Hefei, 230026, China. E-mail: W. Xi, xiwj@mail.ustc.edu.cn

Abstract  

Reconstruction of man-made scenes from multi-view images is an important problem in computer vision and computer graphics. Observing that man-made scenes are usually composed of planar surfaces, we encode a planar shape prior when reconstructing them. Recent approaches for single-view reconstruction employ multi-branch neural networks to simultaneously segment planes and recover 3D plane parameters. However, the limited scale of available annotated data heavily restricts the generalizability and accuracy of these supervised methods. In this paper, we propose a multi-view regularization that enhances piecewise planar reconstruction during the training phase, without demanding extra annotated data. Our multi-view regularization enforces consistency among multiple views by making the feature embedding more robust to view changes and lighting variations. As a result, a neural network trained with multi-view regularization performs better over a wide range of views and lightings at test time. Based on the more consistent per-view predictions, we merge the models recovered from multiple views to reconstruct the scene. Our approach achieves state-of-the-art reconstruction performance compared to previous approaches on the public ScanNet dataset.
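The regularization described above can be sketched as a consistency penalty on per-pixel embeddings: source-view embeddings are warped into the reference view, and their disagreement with the reference embedding is penalized over mutually visible pixels. A minimal NumPy sketch, assuming the warping and validity masks are computed elsewhere; the function and argument names are illustrative, not the paper's code:

```python
import numpy as np

def multiview_consistency_loss(ref_emb, warped_embs, valid_masks):
    """Mean squared disagreement between the reference-view embedding and
    source-view embeddings already warped into the reference view.

    ref_emb:      (C, H, W) reference-view feature map
    warped_embs:  list of (C, H, W) source features, warped to the reference view
    valid_masks:  list of (H, W) bool masks of pixels visible in both views
    """
    total, count = 0.0, 0
    for emb, mask in zip(warped_embs, valid_masks):
        # Per-pixel squared feature difference, summed over channels
        sq_diff = ((ref_emb - emb) ** 2).sum(axis=0)  # (H, W)
        total += sq_diff[mask].sum()
        count += int(mask.sum())
    return total / max(count, 1)
```

Averaging only over valid pixels keeps occluded or out-of-frustum regions from dominating the penalty, which is why the visibility masks matter.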



Keywords: scene modeling; multi-view; regularization; neural network
Received: 17 December 2019      Published: 13 March 2020
Corresponding Authors: Xuejin Chen   

Cite this article:

Weijie Xi, Xuejin Chen. Reconstructing piecewise planar scenes with multi-view regularization. Computational Visual Media, 2019, 05(04): 337-345.

URL:

http://cvm.tsinghuajournals.com/10.1007/s41095-019-0159-7     OR     http://cvm.tsinghuajournals.com/Y2019/V05/I04/337

Fig. 1:  Given multiple images captured from different viewpoints, we reconstruct the scene by recovering plane segments and a depth map under the planar shape constraint for each image. With our multi-view regularization, the recovered planes and depths are more consistent across views and can be composed into a more complete scene reconstruction.
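Composing the per-view planar models into one scene requires expressing each camera-frame plane in a shared world frame. For a plane n·x = d in camera coordinates and a camera-to-world pose x_w = R x_c + t, the world-frame plane is (Rn)·x_w = d + (Rn)·t. A small illustrative helper (an assumption about the composition step, not the paper's code):

```python
import numpy as np

def plane_to_world(n_cam, d_cam, R, t):
    """Transform a plane n_cam . x = d_cam from camera to world coordinates,
    given the camera-to-world pose x_w = R x_c + t."""
    n_world = R @ n_cam            # rotate the normal
    d_world = d_cam + n_world @ t  # shift the offset by the translation
    return n_world, d_world
```

Once all planes live in the world frame, segments from different views with near-identical (n, d) can be merged into a single surface.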
Fig. 2:  The architecture of the proposed multi-view regularization. At the training phase, the input of the network contains a reference image with annotations and K images under different views. The embedding features of the K source images are projected from their original views onto the reference view. At the test phase, each color image is fed to the reconstruction network separately for 3D modeling, and the per-view models are then composed in 3D.
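The projection step in Fig. 2 is standard multi-view geometry: each reference pixel is back-projected with the reference depth, transformed into the source camera frame, and re-projected to sample source features. A geometric sketch, assuming known intrinsics K, relative pose (R, t), and reference depth; nearest-neighbour sampling stands in here for the differentiable bilinear sampling a trainable network would use:

```python
import numpy as np

def warp_source_to_reference(src_feat, ref_depth, K, R, t):
    """Warp source-view features into the reference view.

    src_feat:  (C, H, W) source-view feature map
    ref_depth: (H, W) reference-view depth map
    K:         (3, 3) camera intrinsics (assumed shared by both views)
    R, t:      pose mapping reference camera coords to source camera coords
    """
    C, H, W = src_feat.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=0).reshape(3, -1).astype(np.float64)
    # Back-project reference pixels to 3D using the reference depth
    cam_ref = (np.linalg.inv(K) @ pix) * ref_depth.reshape(1, -1)
    # Transform into the source camera frame and project
    cam_src = R @ cam_ref + t.reshape(3, 1)
    proj = K @ cam_src
    u = np.round(proj[0] / proj[2]).astype(int)
    v = np.round(proj[1] / proj[2]).astype(int)
    valid = (proj[2] > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    warped = np.zeros((C, H * W))
    warped[:, valid] = src_feat[:, v[valid], u[valid]]
    return warped.reshape(C, H, W), valid.reshape(H, W)
```

Pixels that fall outside the source frustum are marked invalid and excluded from the consistency penalty.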
Depth threshold σd (m)   0.05   0.10   0.15   0.20   0.25   0.30   0.35   0.40   0.45   0.50   0.55   0.60
Per-pixel recall (%)
  Baseline               30.65  51.26  62.54  68.10  71.62  73.79  74.85  76.03  76.53  76.78  77.01  77.31
  Ours                   31.85  53.09  63.47  68.44  72.12  74.25  75.14  75.89  76.86  77.13  77.47  77.53
Per-plane recall (%)
  Baseline               23.07  39.58  49.22  54.16  57.46  59.50  60.61  61.53  61.88  62.14  62.40  62.56
  Ours                   25.45  42.01  50.60  55.04  58.01  59.90  60.81  61.47  62.06  62.30  62.56  62.60
Table 1: Per-pixel and per-plane recall on the ScanNet test dataset compared with the baseline SVPNet [12]
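The per-pixel recall in Table 1 counts, at each depth threshold σd, the fraction of ground-truth planar pixels that are predicted as planar with depth error within σd. A plausible reading of this metric as a NumPy sketch (the paper's exact evaluation protocol may differ in detail):

```python
import numpy as np

def per_pixel_recall(pred_depth, gt_depth, pred_plane_mask, gt_plane_mask, sigma_d):
    """Percentage of ground-truth planar pixels that are predicted planar
    and whose predicted depth lies within sigma_d metres of the ground truth."""
    correct = pred_plane_mask & gt_plane_mask & (np.abs(pred_depth - gt_depth) <= sigma_d)
    return 100.0 * correct.sum() / max(gt_plane_mask.sum(), 1)
```

Sweeping sigma_d from 0.05 m to 0.60 m reproduces a recall curve like the rows of Table 1; per-plane recall applies the same idea at the level of matched plane instances rather than pixels.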
Fig. 3:  Comparison between SVPNet enhanced with our multi-view regularization term and three existing methods for single-view planar scene reconstruction on the ScanNet test set.
Fig. 4:  Comparison between the single-view reconstruction results of our method and the baseline SVPNet [12]. (a) Input images. (b) Plane clustering results by SVPNet and (c) by our method. (d) Depth maps inferred by SVPNet and (e) by our method. (f, g) Two novel views of the models recovered by SVPNet and (h, i) by our method.
Fig. 5:  Comparison of the plane segmentation results of multiple images under different views using our method and other approaches.
Fig. 6:  The reconstruction results of two scenes from multiple images using our method and SVPNet [12].
[1]   Gallup, D.; Frahm, J.-M.; Mordohai, P.; Yang, Q.; Pollefeys, M. Real-time plane-sweeping stereo with multiple sweeping directions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1-8, 2007.
[2]   Hirschmuller, H. Stereo processing by semiglobal matching and mutual information. IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 30, No. 2, 328-341, 2008.
[3]   Yao, Y.; Luo, Z. X.; Li, S. W.; Fang, T.; Quan, L. MVSNet: Depth inference for unstructured multi-view stereo. In: Computer Vision - ECCV 2018. Lecture Notes in Computer Science, Vol. 11212. Ferrari, V.; Hebert, M.; Sminchisescu, C.; Weiss, Y. Eds. Springer International Publishing, 785-801, 2018.
[4]   Yao, Y.; Luo, Z.; Li, S.; Shen, T.; Fang, T.; Quan, L. Recurrent MVSNet for high-resolution multi-view stereo depth inference. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 5525-5534, 2019.
[5]   Chen, R.; Han, S.; Xu, J.; Su, H. Point-based multi-view stereo network. In: Proceedings of the IEEE International Conference on Computer Vision, 1538-1547, 2019.
[6]   Luo, K.; Guan, T.; Ju, L.; Huang, H.; Luo, Y. P-MVSNet: Learning patch-wise matching confidence aggregation for multi-view stereo. In: Proceedings of the IEEE International Conference on Computer Vision, 10452-10461, 2019.
[7]   Yang, R.; Pollefeys, M. Multi-resolution real-time stereo on commodity graphics hardware. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003.
[8]   Monszpart, A.; Mellado, N.; Brostow, G. J.; Mitra, N. J. RAPter: Rebuilding man-made scenes with regular arrangements of planes. ACM Transactions on Graphics Vol. 34, No. 4, Article No. 103, 2015.
[9]   Liu, C.; Yang, J.; Ceylan, D.; Yumer, E.; Furukawa, Y. PlaneNet: Piece-wise planar reconstruction from a single RGB image. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2579-2588, 2018.
[10]   Yang, F. T.; Zhou, Z. H. Recovering 3D planes from a single image via convolutional neural networks. In: Computer Vision - ECCV 2018. Lecture Notes in Computer Science, Vol. 11214. Ferrari, V.; Hebert, M.; Sminchisescu, C.; Weiss, Y. Eds. Springer Cham, 87-103, 2018.
[11]   Liu, C.; Kim, K.; Gu, J.; Furukawa, Y.; Kautz, J. PlaneRCNN: 3D plane detection and reconstruction from a single image. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 4450-4459, 2019.
[12]   Yu, Z.; Zheng, J.; Lian, D.; Zhou, Z.; Gao, S. Single-image piece-wise planar 3D reconstruction via associative embedding. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1029-1037, 2019.
[13]   Zhang, Y. Z.; Xu, W. W.; Tong, Y. Y.; Zhou, K. Online structure analysis for real-time indoor scene reconstruction. ACM Transactions on Graphics Vol. 34, No. 5, Article No. 159, 2015.
[14]   Dai, A.; Chang, A. X.; Savva, M.; Halber, M.; Funkhouser, T.; Niessner, M. ScanNet: Richly-annotated 3D reconstructions of indoor scenes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 5828-5839, 2017.
[15]   Furukawa, Y.; Ponce, J. Accurate, dense, and robust multiview stereopsis. IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 32, No. 8, 1362-1376, 2010.
[16]   Schönberger, J. L.; Zheng, E. L.; Frahm, J. M.; Pollefeys, M. Pixelwise view selection for unstructured multi-view stereo. In: Computer Vision - ECCV 2016. Lecture Notes in Computer Science, Vol. 9907. Leibe, B.; Matas, J.; Sebe, N.; Welling, M. Eds. Springer International Publishing, 501-518, 2016.
[17]   Jensen, R.; Dahl, A.; Vogiatzis, G.; Tola, E.; Aanaes, H. Large scale multi-view stereopsis evaluation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 406-413, 2014.
[18]   Knapitsch, A.; Park, J.; Zhou, Q.-Y.; Koltun, V. Tanks and temples: Benchmarking large-scale scene reconstruction. ACM Transactions on Graphics Vol. 36, No. 4, Article No. 78, 2017.
[19]   Delage, E.; Lee, H.; Ng, A. Y. Automatic single-image 3D reconstructions of indoor Manhattan world scenes. In: Robotics Research. Springer Tracts in Advanced Robotics, Vol. 28. Thrun, S.; Brooks, R.; Durrant-Whyte, H. Eds. Springer Berlin Heidelberg, 305-321, 2007.
[20]   Barinova, O.; Konushin, V.; Yakubenko, A.; Lee, K.; Lim, H.; Konushin, A. Fast automatic single-view 3-D reconstruction of urban scenes. In: Computer Vision - ECCV 2008. Lecture Notes in Computer Science, Vol. 5303. Forsyth, D.; Torr, P.; Zisserman, A. Eds. Springer Berlin Heidelberg, 100-113, 2008.
[21]   Saxena, A.; Chung, S. H.; Ng, A. Y. Learning depth from single monocular images. In: Proceedings of the 18th International Conference on Neural Information Processing Systems, 1161-1168, 2005.
[22]   De Brabandere, B.; Neven, D.; Van Gool, L. Semantic instance segmentation for autonomous driving. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 7-9, 2017.
[23]   Paszke, A.; Gross, S.; Chintala, S.; Chanan, G.; Yang, E.; DeVito, Z.; Lin, Z.; Desmaison, A.; Antiga, L.; Lerer, A. Automatic differentiation in PyTorch. In: Proceedings of the 31st Conference on Neural Information Processing Systems, 2017.
[24]   Zhang, T. Solving large scale linear prediction problems using stochastic gradient descent algorithms. In: Proceedings of the 21st International Conference on Machine Learning, 2004.
[25]   Silberman, N.; Hoiem, D.; Kohli, P.; Fergus, R. Indoor segmentation and support inference from RGBD images. In: Computer Vision - ECCV 2012. Lecture Notes in Computer Science, Vol. 7576. Fitzgibbon, A.; Lazebnik, S.; Perona, P.; Sato, Y.; Schmid, C. Eds. Springer Berlin Heidelberg, 746-760, 2012.