Computational Visual Media  2019, Vol. 05 Issue (04): 417-428    doi: 10.1007/s41095-019-0152-1
Research Article     
SpinNet: Spinning convolutional network for lane boundary detection
Ruochen Fan1, Xuanrun Wang1, Qibin Hou2, Hanchao Liu1, Tai-Jiang Mu1,(✉)
1Tsinghua University, Beijing, 100084, China. E-mail: R. Fan, frc16@mails.tsinghua.edu.cn; X. Wang, xuanrun-16@mails.tsinghua.edu.cn; H. Liu, liuhc17@mails.tsinghua.edu.cn
2Nankai University, Tianjin, 300350, China. E-mail: andrewhoux@gmail.com.

Abstract  

In this paper, we propose a simple but effective framework for lane boundary detection, called SpinNet. Because cars and pedestrians often occlude lane boundaries, and the local features of lane boundaries are not distinctive, collecting and analyzing global context information is crucial for lane boundary detection. To this end, we design a novel spinning convolution layer and a brand-new lane parameterization branch in our network to detect lane boundaries from a global perspective. To extract features in narrow strip-shaped fields, we adopt strip-shaped convolutions with 1×n or n×1 kernels in the spinning convolution layer. To overcome the limitation that straight strip-shaped convolutions can only extract features in the vertical or horizontal direction, we introduce feature map rotation, which allows the convolutions to be applied in multiple directions so that more information about the whole lane boundary can be collected. Moreover, unlike most existing lane boundary detectors, which extract lane boundaries from segmentation masks, our lane boundary parameterization branch predicts a curve expression for the lane boundary at each pixel of the output feature map; the network also predicts weights for these curves, which are used to form the final lane boundaries. Our framework is easy to implement and end-to-end trainable. Experiments show that our proposed SpinNet outperforms state-of-the-art methods.
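As a concrete but purely illustrative sketch of the spinning convolution idea, the snippet below rotates a single-channel feature map, applies a 1×n strip kernel, and rotates the response back, once per rotation angle. This is our own NumPy/SciPy mock-up, not the paper's TensorFlow implementation; the averaging strip and all names are placeholders for the learned kernels in SpinNet.

import numpy as np
from scipy.ndimage import rotate, correlate

def spinning_conv(feature_map, strip_kernel, angles=(-60, -30, 0, 30, 60)):
    # For each angle: rotate the feature map, slide the 1xn strip kernel over it
    # (so the strip effectively sweeps the map along that direction), then rotate
    # the response back to the original orientation.
    responses = []
    for angle in angles:
        rotated = rotate(feature_map, angle, reshape=False, order=1, mode='nearest')
        filtered = correlate(rotated, strip_kernel, mode='nearest')
        restored = rotate(filtered, -angle, reshape=False, order=1, mode='nearest')
        responses.append(restored)
    return np.stack(responses, axis=0)  # one response map per rotation angle

# Toy usage: a 1x12 averaging strip stands in for a learned kernel.
fmap = np.random.rand(64, 64).astype(np.float32)
strip = np.full((1, 12), 1.0 / 12, dtype=np.float32)
print(spinning_conv(fmap, strip).shape)  # (5, 64, 64)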



Keywords: object detection; lane boundary detection; autonomous driving; deep learning
Received: 20 December 2019      Published: 13 March 2020
Corresponding Authors: Tai-Jiang Mu   

Cite this article:

Ruochen Fan, Xuanrun Wang, Qibin Hou, Hanchao Liu, Tai-Jiang Mu. SpinNet: Spinning convolutional network for lane boundary detection. Computational Visual Media, 2019, 05(04): 417-428.

URL:

http://cvm.tsinghuajournals.com/10.1007/s41095-019-0152-1     OR     http://cvm.tsinghuajournals.com/Y2019/V05/I04/417

Fig. 1:  The network structure of SpinNet and an illustration of the proposed spinning convolution. (A) shows the whole network structure of SpinNet, in which Conv_1 to Dilated_Conv_5 form the VGGNet backbone, and the elements in the dotted box together with the "Existence Vector" are the outputs of SpinNet.
Fig. 2:  The resulting feature maps of the spinning convolution. Given strip-shaped kernels, convolution performed in the traditional way can only use horizontal and vertical kernels, but in our proposed method the kernels can be spun to arbitrary angles, as shown in the upper-right corner of each feature map. (a) is the original input image. (b-f) are the output feature maps of the spinning convolution stack. Each feature map has 64 channels, and their mean values are shown. The kernels are rotated by -60, -30, 0, 30, and 60 degrees respectively. Clearly, a kernel with a given rotation angle collects more edge information from lane boundaries in that direction.
Fig. 3:  The curve aggregation pipeline. (a) shows part of an input picture. There are 4×4 grids; each grid corresponds to one pixel in the output feature maps and a square area in the original picture. The green horizontal line is the virtual baseline on which we want to solve for the intersection point. A grid is shown in orange only if it has the maximum M_i on its horizontal baseline. (b) shows the separate curves Q_i generated from the four orange grids. These curves have intersections x_{i,2} with the green horizontal baseline, as shown in (c). These intersections are summed after being weighted by their confidences M and by the distance between their baselines and the green one, giving the final answer x_2 on the green baseline. Repeating this process, we obtain all the x_j, and then perform a simple post-processing step to draw the predicted lane boundary, shown in orange in (d).
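The following minimal sketch illustrates this aggregation step under our own assumptions: the caption only states that intersections are weighted by confidence and by baseline distance, so the exponential distance weighting, the sigma parameter, and the function names below are illustrative choices, not the authors' implementation.

import numpy as np

def aggregate_intersection(curves, confidences, cell_rows, target_row, sigma=2.0):
    # curves:      callables Q_i(y) -> x, one per selected (orange) grid cell
    # confidences: confidence M_i of each cell
    # cell_rows:   row (baseline) index of each cell
    # target_row:  the baseline y_j whose intersection x_j we want
    xs = np.array([q(target_row) for q in curves])        # intersections x_{i,j}
    dist = np.abs(np.asarray(cell_rows, dtype=float) - target_row)
    weights = np.asarray(confidences, dtype=float) * np.exp(-dist / sigma)
    return float(np.sum(weights * xs) / np.sum(weights))  # weighted x_j

# Toy usage: per-cell quadratic curves x = 0.01*y^2 + c.
curves = [lambda y, c=c: 0.01 * y * y + c for c in (1.0, 1.2, 0.9, 1.1)]
print(aggregate_intersection(curves, [0.9, 0.8, 0.7, 0.6], [0, 1, 2, 3], target_row=2))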
Strategy               Precision   Recall   F1-score
w/o Spin-Conv          0.856       0.638    0.731
w/o Parameterization   0.853       0.635    0.728
w/o both               0.846       0.628    0.721
w/ all branches        0.879       0.628    0.742
Table 1: Ablation study of our proposed SpinNet. "w/o Spin-Conv" reports the performance when the spinning convolution is replaced with a traditional convolution stack. "w/o Parameterization" means disabling the parameterization branch and generating lane boundary predictions from the segmentation mask. The experiments show that SpinNet achieves the best performance when both proposed branches are used.
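Assuming the standard definition of the F1-score from precision P and recall R, the reported values can be checked directly, e.g., for the "w/o Spin-Conv" row:

F_1 = \frac{2PR}{P + R}, \qquad \frac{2 \times 0.856 \times 0.638}{0.856 + 0.638} \approx 0.731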
Angles                     Precision   Recall   F1-score
(±20°, ±10°, 0°)           0.878       0.632    0.735
(±40°, ±20°, 0°)           0.878       0.635    0.737
(±60°, ±30°, 0°)           0.852       0.657    0.742
(±80°, ±40°, 0°)           0.869       0.641    0.738
(±45°, 0°)                 0.847       0.615    0.713
(±60°, ±40°, ±20°, 0°)     0.850       0.651    0.737
Table 2: Performance of SpinNet when using different rotation angles in the spinning convolution. (±20°, ±10°, 0°) means that there are five sub-branches in the spinning convolution stack, with rotation angles ranging from -20° to 20°.
No.   Kernel size (n)   Precision   Recall   F1-score
1     7                 0.839       0.650    0.733
2     9                 0.851       0.656    0.740
3     12                0.852       0.657    0.742
4     15                0.847       0.652    0.737
5     18                0.879       0.628    0.732
Table 3: Performance of SpinNet when using different sizes of the strip-shaped large kernel in the spinning convolution. The kernels we use have size 1×n. The experiment reveals that kernels that are too long or too short harm performance; a 1×12 kernel is most appropriate for our task.
Method                 Normal   Crowded   Night   No line   Shadow   Arrow   Dazzle light   Curve   Total
SegNet [32]            0.792    0.617     0.496   0.145     0.294    0.710   0.389          0.456   0.572
SegNet-Ego-Lane [33]   0.754    0.620     0.578   0.177     0.310    0.714   0.476          0.430   0.584
SCNN [4]               0.883    0.753     0.686   0.365     0.593    0.821   0.531          0.594   0.720
Zhang et al. [34]      0.897    0.765     0.687   0.351     0.655    0.822   0.674          0.632   0.731
LineNet [35]           —        —         —       —         —        —       —              —       0.731
SpinNet (ours)         0.905    0.717     0.684   0.432     0.729    0.850   0.620          0.507   0.742
Table 4: Lane boundary detection results on the CULane dataset (F1-measure). The columns from "Normal" to "Curve" compare effectiveness in particular scenarios. The "Total" column shows overall performance on the whole CULane test set, indicating that our SpinNet achieves a new state-of-the-art result.
Fig. 4:  Selected examples produced by our SpinNet. Green lines are the ground truths. Blue lines are predicted true-positive lane boundaries, while red lines are incorrect predictions. The figure shows that our framework works well even on crowded roads with obvious occlusion and in scenes with low contrast.
[1]   Girshick, R. Fast R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, 1440-1448, 2015.
[2]   Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. In: Proceedings of the Advances in Neural Information Processing Systems 28, 91-99, 2015.
[3]   He, K. M.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, 2961-2969, 2017.
[4]   Pan, X.; Shi, J.; Luo, P.; Wang, X.; Tang, X. Spatial as deep: Spatial CNN for traffic scene understanding. In: Proceedings of the 32nd AAAI Conference on Artificial Intelligence, 7276-7283, 2018.
[5]   Lipton, Z. C.; Berkowitz, J.; Elkan, C. A critical review of recurrent neural networks for sequence learning. arXiv preprint arXiv:1506.00019, 2015.
[6]   Aly, M. Real time detection of lane markers in urban streets. In: Proceedings of the IEEE Intelligent Vehicles Symposium, 7-12, 2008.
[7]   Bar Hillel, A.; Lerner, R.; Levi, D.; Raz, G. Recent progress in road and lane detection: A survey. Machine Vision and Applications Vol. 25, No. 3, 727-745, 2014.
[8]   Son, J.; Yoo, H.; Kim, S.; Sohn, K. Real-time illumination invariant lane detection for lane departure warning system. Expert Systems with Applications Vol. 42, No. 4, 1816-1824, 2015.
[9]   Jung, S.; Youn, J.; Sull, S. Efficient lane detection based on spatiotemporal images. IEEE Transactions on Intelligent Transportation Systems Vol. 17, No. 1, 289-295, 2016.
[10]   Borkar, A.; Hayes, M.; Smith, M. T. A novel lane detection system with efficient ground truth generation. IEEE Transactions on Intelligent Transportation Systems Vol. 13, No. 1, 365-374, 2012.
[11]   Loose, H.; Franke, U.; Stiller, C. Kalman Particle Filter for lane recognition on rural roads. In: Proceedings of the IEEE Intelligent Vehicles Symposium, 60-65, 2009.
[12]   Chiu, K. Y.; Lin, S. F. Lane detection using color-based segmentation. In: Proceedings of the IEEE Intelligent Vehicles Symposium, 706-711, 2005.
[13]   Teng, Z.; Kim, J.-H.; Kang, D.-J. Real-time lane detection by using multiple cues. In: Proceedings of the International Conference on Control, Automation and Systems, 2334-2337, 2010.
[14]   Liu, G. L.; Wörgötter, F.; Markelić, I. Combining statistical Hough transform and particle filter for robust lane detection and tracking. In: Proceedings of the IEEE Intelligent Vehicles Symposium, 993-997, 2010.
[15]   TuSimple. TuSimple dataset. Available at .
[16]   Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 3431-3440, 2015.
[17]   Chen, L. C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A. L. DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 40, No. 4, 834-848, 2018.
[18]   Huval, B.; Wang, T.; Tandon, S.; Kiske, J.; Song, W.; Pazhayampallil, J.; Andriluka, M.; Rajpurkar, P.; Migimatsu, T.; Cheng-Yue, R. et al. An empirical evaluation of deep learning on highway driving. arXiv preprint arXiv:1504.01716, 2015.
[19]   Zou, Q.; Jiang, H. W.; Dai, Q. Y.; Yue, Y. H.; Chen, L.; Wang, Q. Robust lane detection from continuous driving scenes using deep neural networks. IEEE Transactions on Vehicular Technology doi: 10.1109/TVT.2019.2949603, 2019.
[20]   Zhang, W.; Mahale, T. End to end video segmentation for driving: Lane detection for autonomous car. arXiv preprint arXiv:1812.05914, 2018.
[21]   Quach, C. H.; Tran, V. L.; Nguyen, D. H.; Nguyen, V. T.; Pham, M. T.; Phung, M. D. Real-time lane marker detection using template matching with RGB-D camera. In: Proceedings of the 2nd International Conference on Recent Advances in Signal Processing, Telecommunications & Computing, 152-157, 2018.
[22]   Garnett, N.; Cohen, R.; Pe’er, T.; Lahav, R.; Levi, D. 3D-LaneNet: End-to-end 3D multiple lane detection. In: Proceedings of the IEEE International Conference on Computer Vision, 2921-2930, 2019.
[23]   Lee, S.; Kim, J.; Yoon, J. S.; Shin, S.; Bailo, O.; Kim, N.; Lee, T.-H.; Hong, H. S.; Han, S.-H.; Kweon, I. S. VPGNet: Vanishing point guided network for lane and road marking detection and recognition. In: Proceedings of the IEEE International Conference on Computer Vision, 1947-1955, 2017.
[24]   Neven, D.; De Brabandere, B.; Georgoulis, S.; Proesmans, M.; Gool, L. V. Towards end-to-end lane detection: An instance segmentation approach. In: Proceedings of the IEEE Intelligent Vehicles Symposium (IV), 286-291, 2018.
[25]   Van Gansbeke, W.; De Brabandere, B.; Neven, D.; Proesmans, M.; Van Gool, L. End-to-end lane detection through differentiable least-squares fitting. In: Proceedings of the IEEE International Conference on Computer Vision, 2019.
[26]   Chen, P. R.; Lo, S. Y.; Hang, H. M.; Chan, S. W.; Lin, J. J. Efficient road lane marking detection with deep learning. In: Proceedings of the IEEE 23rd International Conference on Digital Signal Processing, 1-5, 2018.
[27]   Worrall, D. E.; Garbin, S. J.; Turmukhambetov, D.; Brostow, G. J. Harmonic networks: Deep translation and rotation equivariance. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 5028-5037, 2017.
[28]   Ouyang, Z. Y.; Feng, J. J.; Su, F.; Cai, A. N. Fingerprint matching with rotation-descriptor texture features. In: Proceedings of the 18th International Conference on Pattern Recognition, 417-420, 2006.
[29]   Dieleman, S.; Willett, K. W.; Dambre, J. Rotation-invariant convolutional neural networks for galaxy morphology prediction. Monthly Notices of the Royal Astronomical Society Vol. 450, No. 2, 1441-1459, 2015.
[30]   Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
[31]   Abadi, M.; Barham, P.; Chen, J.; Chen, Z.; Davis, A.; Dean, J.; Devin, M.; Ghemawat, S.; Irving, G.; Isard, M. et al. TensorFlow: A system for large-scale machine learning. In: Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation, 265-283, 2016.
[32]   Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 39, No. 12, 2481-2495, 2017.
[33]   Kim, J.; Park, C. End-to-end ego lane estimation based on sequential transfer learning for self-driving cars. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 30-38, 2017.
[34]   Zhang, J.; Xu, Y.; Ni, B. B.; Duan, Z. Y. Geometric constrained joint lane segmentation and lane boundary detection. In: Computer Vision - ECCV 2018. Lecture Notes in Computer Science, Vol. 11205. Ferrari, V.; Hebert, M.; Sminchisescu, C.; Weiss, Y. Eds. Springer International Publishing, 502-518, 2018.
[35]   Liang, D.; Guo, Y.; Zhang, S.; Zhang, S.-H.; Hall, P.; Zhang, M.; Hu, S. LineNet: A zoomable CNN for crowdsourced high definition maps modeling in urban environments. arXiv preprint arXiv:1807.05696, 2018.