Computational Visual Media 2019, Vol. 5, Issue 4: 417-428    doi: 10.1007/s41095-019-0152-1
 Research Article
SpinNet: Spinning convolutional network for lane boundary detection
Ruochen Fan1, Xuanrun Wang1, Qibin Hou2, Hanchao Liu1, Tai-Jiang Mu1 (✉)
1Tsinghua University, Beijing, 100084, China. E-mail: R. Fan, frc16@mails.tsinghua.edu.cn; X. Wang, xuanrun-16@mails.tsinghua.edu.cn; H. Liu, liuhc17@mails.tsinghua.edu.cn
2Nankai University, Tianjin, 300350, China. E-mail: andrewhoux@gmail.com.

Abstract

In this paper, we propose a simple but effective framework for lane boundary detection, called SpinNet. Since cars and pedestrians often occlude lane boundaries and the local features of lane boundaries are not distinctive, analyzing and collecting global context information is crucial for lane boundary detection. To this end, we design a novel spinning convolution layer and a brand-new lane parameterization branch in our network to detect lane boundaries from a global perspective. To extract features in narrow strip-shaped fields, the spinning convolution layer adopts strip-shaped convolutions with kernels of size $1×n$ or $n×1$. Because straight strip-shaped convolutions can only extract features in the vertical or horizontal direction, we introduce feature map rotation so that the convolutions can be applied in multiple directions, allowing more information to be collected about a whole lane boundary. Moreover, unlike most existing lane boundary detectors, which extract lane boundaries from segmentation masks, our lane parameterization branch predicts a curve expression for the lane boundary at each pixel of the output feature map, and the network predicts the weights of these curves to better form the final lane boundaries. Our framework is easy to implement and end-to-end trainable. Experiments show that our proposed SpinNet outperforms state-of-the-art methods.
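To make the spinning convolution idea concrete, the following is a minimal, hypothetical PyTorch sketch (not the authors' implementation): each sub-branch rotates the feature map by a fixed angle, applies a $1×n$ strip-shaped convolution, and rotates the result back. The class names SpinConvBranch and SpinConvStack, the padding choice, and the summation used to fuse the directional responses are illustrative assumptions.

```python
# Minimal sketch of a spinning convolution, assuming PyTorch/torchvision.
import torch
import torch.nn as nn
import torchvision.transforms.functional as TF

class SpinConvBranch(nn.Module):
    """Rotate the feature map, apply a 1xN strip-shaped convolution, rotate back."""
    def __init__(self, channels: int, kernel_len: int = 12, angle: float = 30.0):
        super().__init__()
        self.angle = angle
        # Strip-shaped kernel; padding="same" keeps the spatial size (stride 1 only).
        self.strip_conv = nn.Conv2d(channels, channels,
                                    kernel_size=(1, kernel_len), padding="same")

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rotated = TF.rotate(x, self.angle)      # spin the feature map
        feat = self.strip_conv(rotated)         # gather long-range cues along one direction
        return TF.rotate(feat, -self.angle)     # spin back to the original orientation

class SpinConvStack(nn.Module):
    """Sub-branches at -60, -30, 0, 30, 60 degrees; fused by summation (an assumption)."""
    def __init__(self, channels: int = 64, angles=(-60, -30, 0, 30, 60)):
        super().__init__()
        self.branches = nn.ModuleList(SpinConvBranch(channels, angle=a) for a in angles)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.stack([b(x) for b in self.branches]).sum(dim=0)
```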

Received: 20 December 2019      Published: 13 March 2020
Corresponding Author: Tai-Jiang Mu
Fig. 1: The network structure of SpinNet and an illustration of the proposed spinning convolution. (A) shows the whole network structure of SpinNet, in which Conv_1 to Dilated_Conv_5 are the VGGNet backbone; the elements in the dotted box and the "Existence Vector" are the outputs of SpinNet.

Fig. 2: The resulting feature maps of the spinning convolution. Given strip-shaped kernels, performing convolution in the traditional way can only use horizontal and vertical kernels, but in our proposed method the kernels can spin to arbitrary angles, as shown in the upper-right corner of each feature map. (a) is the original input image. (b)-(f) are the output feature maps of the spinning convolution stack; each has 64 channels, and their mean values are shown. The kernels are rotated by $-60$, $-30$, $0$, $30$, and $60$ degrees respectively. A kernel with a given rotation angle clearly collects more edge information of lane boundaries in that direction.

Fig. 3: The curve aggregation pipeline. (a) shows part of an input picture. There are 4×4 grids, and each grid corresponds to one pixel in the output feature maps and a square area in the original picture. The green horizontal line is the virtual baseline on which we solve for the intersection point. A grid is shown in orange only if it has the maximum $M_i$ on its horizontal virtual baseline. (b) shows the separate curves $Q_i$ generated from the four orange grids. These curves have intersections $x_{i,2}$ with the green horizontal baseline, shown in (c). These intersections are summed after being weighted by their confidences $M$ and the distances between their baselines and the green one, giving the final answer $x_2$ on the green baseline. Repeating this process, we obtain all the $x_j$, and a simple post-process then draws the predicted lane boundary in orange, shown in (d).

Table 1: Ablation study of our proposed SpinNet. "w/o Spin-Conv" gives the performance when the spinning convolution is replaced by a traditional convolution stack. "w/o Parameterization" means the parameterization branch is disabled and lane boundary predictions are generated from the segmentation mask. The experiments show that SpinNet achieves the best performance when both proposed branches are used.

Table 2: Performance of SpinNet when using different rotation angles in the spinning convolution. $(\pm 20^{\circ}, \pm 10^{\circ}, 0^{\circ})$ means there are five sub-branches in the spinning convolution stack, with rotation angles from $-20^{\circ}$ to $20^{\circ}$.

Table 3: Performance of SpinNet when using strip-shaped large kernels of different sizes in the spinning convolution. The kernels we use are of size $1×n$. The experiment reveals that kernels that are too long or too short harm performance, and a $1×12$ kernel is most appropriate for our task.

Table 4: Lane boundary detection results on the CULane dataset (F1-measure). The columns from "Normal" to "Curve" compare effectiveness in particular scenarios. The "Total" column shows overall performance on the whole CULane test set, indicating that our SpinNet achieves a new state-of-the-art result.

Fig. 4: Selected examples produced by our SpinNet. Green lines are ground truths. Blue lines are correctly predicted (true positive) lane boundaries, while red lines are incorrect predictions. The figure shows that our framework works well even on crowded roads with obvious occlusion and in scenes with low contrast.
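To make the curve aggregation of Fig. 3 concrete, here is a minimal NumPy sketch under stated assumptions: each selected (orange) grid provides a confidence $M_i$, its own baseline row, and a curve $Q_i$ mapping a row to a horizontal position. The Gaussian distance weighting and the helper name aggregate_baseline are illustrative choices, not the paper's exact formulation.

```python
# Minimal sketch of curve aggregation across baselines, assuming NumPy.
import numpy as np

def aggregate_baseline(target_row, rows, confidences, curves, sigma=8.0):
    """Fuse per-grid curve predictions into one x-coordinate on `target_row`.

    rows        : baseline rows y_i of the selected (orange) grids
    confidences : confidences M_i of those grids
    curves      : callables Q_i(row) -> x, one per selected grid
    """
    # Intersection of each grid's curve with the target baseline.
    xs = np.array([q(target_row) for q in curves])
    # Weight by confidence and by how close each grid's own baseline is
    # to the target baseline (a Gaussian falloff is assumed here).
    dist_w = np.exp(-((np.asarray(rows) - target_row) ** 2) / (2 * sigma ** 2))
    w = np.asarray(confidences) * dist_w
    return float(np.sum(w * xs) / (np.sum(w) + 1e-8))

# Usage: repeat over every baseline row to trace the whole lane boundary.
# boundary = [aggregate_baseline(r, rows, m, curves) for r in baseline_rows]
```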