This commit is contained in:
ShqWW 2024-09-03 21:56:42 +08:00
parent ad893908bf
commit 9029182029

main.tex

@ -155,7 +155,7 @@ The lane detection aims to detect lane instances in a image. In this section, we
\begin{figure*}[ht]
\centering
\includegraphics[width=\linewidth]{thsis_figure/ovarall_architecture.png} % replace with your image file name
\caption{The overall pipeline of PolarRCNN. The architecture is simple and lightweight. The backbone (e.g., ResNet18) and FPN extract features from the image, and the local polar head proposes sparse line anchors. After pooling the features sampled along these anchors, the global polar head gives the final predictions. Triplet subheads are set in the global polar head: a one-to-one classification head (O2O cls head), a one-to-many classification head (O2M cls head), and a one-to-many regression head (O2M reg head). The O2O cls head aims to replace NMS post-processing by selecting only one positive prediction for each ground truth from the redundant predictions of the O2M head.}
\label{overall_architecture}
\end{figure*}
@ -188,12 +188,12 @@ The overall architecture of PolarRCNN is illustrated in Fig. \ref{overall_archit
$N^{nbr}_{i}$& set& The adjacent node set of the $i_{th}$ anchor node\\
$C_{o2m}$ & scalar& The positive threshold of one-to-many confidence\\
$C_{o2o}$ & scalar& The positive threshold of one-to-one confidence\\
% \midrule
% & & \\
% & & \\
% & & \\
% & & \\
% & & \\
\bottomrule
\end{tabular}
\end{adjustbox}
@ -203,7 +203,7 @@ The overall architecture of PolarRCNN is illustrated in Fig. \ref{overall_archit
\subsection{Lane and Line Anchor Representation}
Lanes are characterized by their thin, elongated, curved shapes. A suitable lane prior aids the model in extracting features, predicting locations, and modeling the shapes of lane curves with greater accuracy. Consistent with previous studies \cite{linecnn}\cite{laneatt}, our lane priors (also referred to as lane anchors) consist of straight lines. We sample a sequence of 2D points along each lane anchor, denoted as $ P\doteq \left\{ \left( x_1, y_1 \right) , \left( x_2, y_2 \right) , \ldots ,\left( x_N, y_N \right) \right\} $, where $N$ is the number of sampled points. The y-coordinates of these points are uniformly sampled along the vertical axis of the image, specifically $y_i=\frac{H}{N-1}\cdot i$, where $H$ is the image height. The same y-coordinates are used to sample the ground truth lane, and the model is tasked with regressing the x-coordinate offsets from the line anchor to the lane instance ground truth. The primary distinction between PolarRCNN and previous approaches lies in the description of the lane anchors (straight lines), which is detailed in the following sections.
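As a concrete illustration of this representation, the fixed y-grid and the offset-based decoding can be sketched in a few lines of Python (the function names and the exact index convention $i = 0,\ldots,N-1$ are our own assumptions, not taken from the paper's code):

```python
def sample_ys(H, N):
    """Uniform y-coordinates y_i = H/(N-1) * i for i = 0..N-1."""
    return [H / (N - 1) * i for i in range(N)]

def lane_points(anchor_xs, x_offsets, H):
    """Decode a lane: anchor x plus the regressed x-offset at each fixed y."""
    ys = sample_ys(H, len(anchor_xs))
    return [(ax + dx, y) for ax, dx, y in zip(anchor_xs, x_offsets, ys)]
```

With $H=320$ and $N=5$, the grid is $\{0, 80, 160, 240, 320\}$; only the x-offsets are learned.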
\textbf{Polar coordinate system.} Since lane anchors are typically represented as straight lines, they can be described using straight-line parameters. Previous approaches have used rays to describe 2D lane anchors, with parameters consisting of the coordinates of a starting point and an orientation/angle, denoted as $\left\{\theta, P_{xy}\right\}$, as shown in Fig. \ref{coord} (a). \cite{linecnn}\cite{laneatt} define the start points as lying on the three image boundaries. However, \cite{adnet} argues that this approach is problematic because the actual starting point of a lane could be located anywhere within the image. In our analysis, using a ray leads to ambiguity in line representation because a line has an infinite number of possible starting points, and the choice of starting point for a lane is subjective. As illustrated in Fig. \ref{coord} (a), the yellow (the visual start point) and green (the point located on the image boundary) starting points with the same orientation $\theta$ describe the same line, and either could be used in different datasets \cite{scnn}\cite{vil100}. This ambiguity arises because a straight line has two degrees of freedom, whereas a ray has three. To resolve this issue, we propose using polar coordinates to describe a lane anchor with only two parameters, radius and angle, denoted as $\left\{\theta, r\right\}$, where $\theta \in \left[-\frac{\pi}{2}, \frac{\pi}{2}\right)$ and $r \in \left(-\infty, +\infty\right)$. This representation is illustrated in Fig. \ref{coord} (b).
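A minimal sketch of decoding such a polar anchor into sampled x-coordinates, assuming the Hesse normal form $x\cos\theta + y\sin\theta = r$ with the pole at the image origin (the paper's exact pole placement and axis conventions may differ):

```python
import math

def polar_anchor_xs(theta, r, ys):
    """x-coordinate of the anchor line at each sampled y, assuming the line
    satisfies x*cos(theta) + y*sin(theta) = r (cos(theta) != 0 assumed,
    i.e., the anchor is not perfectly horizontal in this parameterization)."""
    return [(r - y * math.sin(theta)) / math.cos(theta) for y in ys]
```

Note that $(\theta, r)$ pins down the line itself with two degrees of freedom, so no starting point needs to be chosen.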
@ -229,7 +229,7 @@ We define two types of polar coordinate systems: the global coordinate system an
\subsection{Local Polar Head}
\textbf{Anchor formulation in the local polar head}. Inspired by the region proposal network in Faster R-CNN \cite{fasterrcnn}, the local polar head (LPH) aims to propose flexible, high-quality anchors around the lane ground truths within an image. As Fig. \ref{lph} and Fig. \ref{overall_architecture} demonstrate, the highest level $P_{3} \in \mathbb{R}^{C_{f} \times H_{f} \times W_{f}}$ of the FPN feature maps is selected as the input for the LPH. Following a downsampling operation, the feature map is fed into two branches: the regression branch $\phi _{reg}^{lph}\left(\cdot \right)$ and the classification branch $\phi _{cls}^{lph}\left(\cdot \right)$.
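The proposal step can be pictured as follows: each cell of the local polar map predicts a confidence together with its own $(\theta, r)$, and the most confident cells become the sparse anchors passed to the second stage. A simplified sketch (the names are ours; the real head operates on feature-map tensors rather than Python lists):

```python
def propose_anchors(cells, k):
    """cells: one (score, theta, r) triple per local-polar-map grid cell.
    Returns the top-k anchors by confidence, mirroring the top anchor
    selection used at evaluation time."""
    ranked = sorted(cells, key=lambda c: c[0], reverse=True)
    return [(theta, r) for _, theta, r in ranked[:k]]
```

Keeping only the top-$k$ cells is what lets the second stage run with as few as 20 anchors.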
\begin{equation}
\begin{aligned}
@ -441,7 +441,7 @@ Equation \ref{edge_layer} represents the implicit expression of equation \ref{al
Equation \ref{node_layer} serves as the implicit replacement for Equation \ref{al_1-4}. In this approach, we use elementwise max pooling of tensors instead of scalar-based max operations. The pooled tensor is then fed into a neural network with a sigmoid activation function to obtain the confidence directly. By eliminating the need for a predefined distance threshold, all confidence calculation patterns are derived from the training data.
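A toy sketch of this pooled-tensor idea, with a single linear layer standing in for the learned network (the weights `w`, `b` are placeholders, and the real head works on learned edge feature tensors):

```python
import math

def node_confidence(edge_feats, w, b):
    """edge_feats: equal-length feature vectors, one per adjacent edge.
    Pool them by elementwise max (instead of a scalar max over distances),
    then score the pooled vector with a linear layer + sigmoid."""
    pooled = [max(col) for col in zip(*edge_feats)]
    z = sum(wi * xi for wi, xi in zip(w, pooled)) + b
    return 1.0 / (1.0 + math.exp(-z))
```

Because the pooling is elementwise over learned features, the suppression pattern is trained rather than fixed by a hand-chosen threshold.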
It should be noted that the O2O cls head depends on the predictions of the O2M cls head, as outlined in Equation \ref{al_1-1}. From a probability perspective, the confidence output by the O2M cls head, $s_{j}$
,represents the probability that the $j_{th}$ detection is a positive sample. The confidence output by the O2O cls head, $\tilde{s}_i$, denotes the conditional probability that the $i_{th}$ sample should not be suppressed, given that the $i_{th}$ sample is identified as a positive sample:
\begin{equation}
\begin{aligned}
&s_j|_{j=1}^{N_A}\equiv P\left( a_j\,\,is\,\,pos \right) \,\,
@ -498,7 +498,7 @@ This cost function is more compact than those in previous work and takes both lo
\\
\end{aligned}
\end{equation}
where the one-to-one sample sets, $\varOmega _{pos}^{o2o}$ and $\varOmega _{neg}^{o2o}$, are restricted to the positive sample set of the O2M cls head:
\begin{equation}
\begin{aligned}
\varOmega _{pos}^{o2o}\cup \varOmega _{neg}^{o2o}=\left\{ i|s_i>C_{o2m} \right\}
@ -548,25 +548,25 @@ The first line in the loss function represents the loss for the local polar head
& Environment &urban and highway & highway&highway&railway&urban and highway\\
& Distribution &sparse&sparse&sparse&sparse&sparse and dense\\
\midrule
\multirow{2}*{Dataset Split}
& Evaluation &Test&Test&Test&Test&Val\\
& Visualization &Test&Test&Val&Test&Val\\
\midrule
\multirow{1}*{Data Preprocess}
& Crop Height &270&160&300&560&640, etc\\
\midrule
\multirow{6}*{Training Hyperparameter}
& Epoch Number &32&70&20&90&32\\
& Batch Size &40&24&32&40&40\\
& Warm-up iterations &800&200&800&400&800\\
& $w_{aux}$ &0.2&0 &0.2&0.2&0.2\\
& $w_{rank}$ &0.7&0.7&0.1&0.7&0 \\
\midrule
\multirow{4}*{Evaluation Hyperparameter}
& $H^{L}\times W^{L}$ &$4\times10$&$4\times10$&$4\times10$&$4\times10$&$6\times13$\\
& $K_{A}$ &20&20&20&12&50\\
& $C_{O2M}$ &0.48&0.40&0.40&0.40&0.45\\
& $C_{O2O}$ &0.46&0.46&0.46&0.46&0.44\\
\bottomrule
\end{tabular}
\end{adjustbox}
@ -585,7 +585,7 @@ The first line in the loss function represents the loss for the local polar head
\subsection{Dataset and Evaluation Metric}
We conducted experiments on four widely used lane detection benchmarks and one rail detection dataset: CULane\cite{scnn}, TuSimple\cite{tusimple}, LLAMAS\cite{llamas}, CurveLanes\cite{curvelanes}, and DL-Rail\cite{dalnet}. Among these datasets, CULane and CurveLanes are particularly challenging. The CULane dataset consists of various scenarios but has sparse lane distributions, whereas CurveLanes includes a large number of curved and dense lane types, such as forked and double lanes. The DL-Rail dataset, focused on rail detection across different scenarios, was chosen to evaluate our model's performance beyond traditional lane detection. The details of the five datasets are shown in Table \ref{dataset_info}.
We use the F1-score to evaluate our model on the CULane, LLAMAS, DL-Rail, and CurveLanes datasets, maintaining consistency with previous work. The F1-score is defined as follows:
\begin{equation}
@ -610,7 +610,7 @@ For Tusimple, the evaluation is formulated as follows:
Accuracy=\frac{\sum{C_{clip}}}{\sum{S_{clip}}}
\end{aligned}
\end{equation}
where $C_{clip}$ and $S_{clip}$ represent the number of correct points (predicted points within 20 pixels of the ground truth) and the number of ground truth points, respectively. If the accuracy exceeds 85\%, the prediction is considered correct. TuSimple also reports the False Positive Rate (FP=1-Precision) and the False Negative Rate (FN=1-Recall).
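These metrics can be sketched as follows (a simplified, hypothetical implementation: the official TuSimple evaluation additionally handles lane matching and invalid points):

```python
def tusimple_accuracy(pred_xs, gt_xs, px_thresh=20):
    """Fraction of ground-truth points whose predicted x lies within
    px_thresh pixels; a lane counts as correct when this exceeds 0.85."""
    correct = sum(1 for p, g in zip(pred_xs, gt_xs) if abs(p - g) <= px_thresh)
    return correct / len(gt_xs)

def fp_fn(num_pred, num_gt, num_correct):
    """FP = 1 - Precision and FN = 1 - Recall, as reported on TuSimple."""
    return 1 - num_correct / num_pred, 1 - num_correct / num_gt
```

For example, with four of five ground-truth lanes matched by four predictions, FP is 0 and FN is 0.2.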
\subsection{Implementation Details}
All input images are cropped and resized to $800\times320$. Similar to \cite{clrnet}, we apply random affine transformations and random horizontal flips. For the optimization process, we use the AdamW \cite{adam} optimizer with a learning rate warm-up and a cosine decay strategy. The initial learning rate is set to 0.006. The numbers of sampled points and regression points for each lane anchor are set to 36 and 72, respectively. Other parameters, such as the batch size and loss weights for each dataset, are detailed in Table \ref{dataset_info}. Since some test/validation sets for the five datasets are not accessible, the test/validation sets used are also listed in Table \ref{dataset_info}. All experiments are conducted on a single NVIDIA A100-40G GPU. To keep our model simple, we use only CNN-based backbones, namely ResNet\cite{resnet} and DLA34\cite{dla}.
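One common way to realize such a schedule, linear warm-up followed by cosine decay, is sketched below; the exact decay form and step counts are our assumptions, with the base learning rate and warm-up length taken from the settings above:

```python
import math

def learning_rate(step, total_steps, warmup_steps=800, base_lr=0.006):
    """Linear warm-up to base_lr over warmup_steps, then cosine decay
    toward zero over the remaining steps (one plausible realization)."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    t = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * base_lr * (1 + math.cos(math.pi * t))
```

The rate peaks at 0.006 exactly when warm-up ends and decays smoothly to zero at the final step.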
@ -781,15 +781,11 @@ All input images are cropped and resized to $800\times320$. Similar to \cite{clr
\end{table}
\subsection{Comparison with the state-of-the-art results}
The comparison results of our proposed model with other methods are shown in Tables \ref{culane result}, \ref{tusimple result}, \ref{llamas result}, \ref{dlrail result}, and \ref{curvelanes result}. We present results for two versions of our model: the NMS-based version, denoted as PolarRCNN-NMS, and the NMS-free version, denoted as PolarRCNN. The NMS-based version utilizes predictions obtained from the O2M head followed by NMS post-processing, while the NMS-free version derives predictions directly from the O2O classification head without NMS.
To ensure a fair comparison, we also include results for CLRerNet \cite{clrernet} on the CULane and CurveLanes datasets, as we use a similar training strategy and data split. As illustrated in the comparison results, our model demonstrates competitive performance across five datasets. Specifically, on the CULane, TuSimple, LLAMAS, and DL-Rail datasets (sparse scenarios), our model outperforms other anchor-based methods. Additionally, the performance of the NMS-free version is nearly identical to that of the NMS-based version, highlighting the effectiveness of the O2O head in eliminating redundant predictions. On the CurveLanes dataset, the NMS-free version achieves superior F1-measure and Recall compared to both NMS-based and segment\&grid-based methods.
We also compare the number of anchors and processing speed with other methods. Fig. \ref{anchor_num_method} illustrates the number of anchors used by several anchor-based methods on CULane. Our proposed model utilizes the fewest anchors (20) while achieving the highest F1-score on CULane. It remains competitive with state-of-the-art methods like CLRerNet, which uses 192 anchors and a cross-layer refinement strategy. Conversely, the sparse Laneformer, which also uses 20 anchors, does not achieve optimal performance. It is important to note that our model is designed with a simpler structure without additional refinement, indicating that the design of flexible anchors is crucial for performance in sparse scenarios. Furthermore, due to its simple structure and fewer anchors, our model exhibits lower latency compared to most methods, as shown in Fig. \ref{speed_method}. The combination of fast processing speed and a straightforward architecture makes our model highly deployable.
\begin{figure}[t]
\centering
@ -806,47 +802,14 @@ We also compare the number of anchors and processing speed with other methods. F
\label{speed_method}
\end{figure}
\subsection{Ablation Study and Visualization}
To validate and analyze the effectiveness and influence of the different components of PolarRCNN, we conduct several ablation experiments on the CULane and CurveLanes datasets.
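The two inference paths compared throughout these ablations can be sketched as follows; `dist` is a placeholder lane-distance function, and the default thresholds mirror the $C_{O2M}$/$C_{O2O}$ values used for CULane:

```python
def nms(preds, dist, thresh):
    """Greedy NMS over (score, lane) pairs: keep the highest-scoring lane,
    then drop any remaining lane closer than `thresh` to a kept one."""
    keep = []
    for score, lane in sorted(preds, key=lambda p: p[0], reverse=True):
        if all(dist(lane, kept) >= thresh for kept in keep):
            keep.append(lane)
    return keep

def nms_free(preds, c_o2m=0.48, c_o2o=0.46):
    """NMS-free selection: keep a lane only if both the O2M score s and the
    O2O conditional score s_tilde pass their thresholds."""
    return [lane for s, s_tilde, lane in preds if s > c_o2m and s_tilde > c_o2o]
```

The NMS path needs a hand-tuned geometric threshold, while the NMS-free path relies only on the two learned confidences.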
\textbf{Ablation study on the polar coordinate system and anchor number.} To assess the importance of the local polar coordinates of anchors, we examine the contribution of each component (i.e., angle and radius) to model performance. As shown in Table \ref{aba_lph}, both the angle and the radius contribute to performance to varying degrees. Additionally, we conduct experiments with the auxiliary loss using fixed anchors and PolarRCNN. Fixed anchors refer to the anchor settings trained by CLRNet, as illustrated in Fig. \ref{anchor setting} (b). Model performance improves by 0.48\% and 0.30\% under the fixed anchor paradigm and the proposal anchor paradigm, respectively.
We also explore the effect of different local polar map sizes on our model, as illustrated in Fig. \ref{anchor_num_testing}. The overall F1 measure improves with increasing local polar map size and tends to stabilize once the size is sufficiently large. Specifically, precision improves while recall decreases. A larger polar map includes more background anchors in the second stage (since we choose $k=4$ for SimOTA, there are no more than four positive samples per ground truth). Consequently, the model learns more negative samples, enhancing precision but reducing recall. Regarding the number of anchors chosen during the evaluation stage, recall and the F1 measure increase significantly in the early stages of anchor number expansion but stabilize later. This suggests that eliminating some anchors does not significantly affect performance. Fig. \ref{cam} displays the heat map and the distribution of the top-$k_{A}$ selected anchors in sparse scenarios. Brighter colors indicate a higher likelihood of an anchor being a foreground anchor. It is evident that most of the proposed anchors cluster around the lane ground truth.
\begin{table}[h]
\centering
\caption{Comparison between different anchor strategies}
\begin{adjustbox}{width=\linewidth}
@ -867,104 +830,30 @@ We also compare the number of anchors and processing speed with other methods. F
\bottomrule
\end{tabular}
\end{adjustbox}
\label{aba_lph}
\end{table}
\begin{table}[h]
\centering
\caption{NMS vs NMS-free on CurveLanes}
\begin{adjustbox}{width=\linewidth}
\begin{tabular}{l|l|ccc}
\toprule
\textbf{Paradigm} & \textbf{NMS thres(pixel)} & \textbf{F1(\%)} & \textbf{Precision(\%)} & \textbf{Recall(\%)} \\
\midrule
\multirow{7}*{PolarRCNN-NMS}
& 50 (default) &85.38&\textbf{91.01}&80.40\\
& 40 &85.97&90.72&81.68\\
& 30 &86.26&90.44&82.45\\
& 25 &86.38&90.27&82.83\\
& 20 &86.57&90.05&83.37\\
& 15 (optimal) &86.81&89.64&84.16\\
& 10 &86.58&88.62&\textbf{84.64}\\
\midrule
PolarRCNN (NMS-free) & - &\textbf{87.29}&90.50&84.31\\
\bottomrule
\end{tabular}
\end{adjustbox}
\end{table}
\begin{figure*}[t]
\centering
\def\subwidth{0.325\textwidth}
\def\imgwidth{\linewidth}
\begin{subfigure}{\subwidth}
\includegraphics[width=\imgwidth]{thsis_figure/anchor_num/anchor_num_testing_p.png}
\end{subfigure}
\begin{subfigure}{\subwidth}
\includegraphics[width=\imgwidth]{thsis_figure/anchor_num/anchor_num_testing_r.png}
\end{subfigure}
\begin{subfigure}{\subwidth}
\includegraphics[width=\imgwidth]{thsis_figure/anchor_num/anchor_num_testing.png}
\end{subfigure}
\caption{Anchor number and F1-score of different methods on CULane.}
\label{anchor_num_testing}
\end{figure*}
\begin{table}[h]
\centering
\caption{The ablation study for the stop-gradient strategy on the CULane test set}
\label{stop}
\begin{adjustbox}{width=\linewidth}
\begin{tabular}{c|c|lll}
\toprule
\multicolumn{2}{c|}{\textbf{Paradigm}} & \textbf{F1(\%)} & \textbf{Precision(\%)} & \textbf{Recall(\%)} \\
\midrule
\multirow{2}*{Baseline}
&o2m-B w/~ NMS &78.83&88.99&70.75\\
&o2o-G w/o NMS &71.68\textcolor{darkgreen}{~(7.15$\downarrow$)}&72.56\textcolor{darkgreen}{~(16.43$\downarrow$)}&70.81\textcolor{red}{~(0.06$\uparrow$)}\\
\midrule
\multirow{2}*{Stop grad}
&o2m-B w/~ NMS &80.81&88.53&74.33\\
&o2o-G w/o NMS &80.81\textcolor{red}{~(0.00$\uparrow$)}&88.52\textcolor{darkgreen}{~(0.01$\downarrow$)}&74.33\textcolor{red}{~(0.00$\uparrow$)} \\
\bottomrule
\end{tabular}
\end{adjustbox}
\end{table}
\begin{figure}[t]
\centering
\def\subwidth{0.24\textwidth}
@ -988,110 +877,135 @@ We also compare the number of anchors and processing speed with other methods. F
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/heatmap/anchor2.jpg}
\caption{}
\end{subfigure}
\caption{The heat map of the local polar map and the anchor selection during the evaluation stage.}
\label{cam}
\end{figure}
\textbf{Ablation study on the NMS-free block in sparse scenarios.} We conduct several experiments on the CULane dataset to evaluate the performance of the NMS-free head in sparse scenarios. As shown in Table \ref{aba_nmsfree_block}, without using the GNN to establish relationships between anchors, PolarRCNN fails to achieve an NMS-free paradigm, even with one-to-one assignment. Furthermore, the classification matrix (cls matrix) proves crucial, indicating that conditional probability is effective. Other components, such as the neighbor matrix (provided as a geometric prior) and the rank loss, also contribute to the performance of the NMS-free block.
To compare the NMS-free paradigm with the traditional NMS paradigm, we perform experiments with the NMS-free block under both proposal and fixed anchor strategies. Table \ref{nms vs nmsfree} presents the results of these experiments. Here, O2M-B refers to the O2M classification head, O2O-B refers to the O2O classification head with a plain structure, and O2O-G refers to the O2O classification head with our proposed GNN structure. To assess the ability to eliminate redundant predictions, NMS post-processing is applied to each head. The results show that NMS is necessary for the traditional O2M classification head. In the fixed anchor paradigm, although the O2O classification head with a plain structure effectively eliminates redundant predictions, it is less effective than the proposed GNN structure. In the proposal anchor paradigm, the O2O classification head with a plain structure fails to eliminate redundant predictions due to high anchor overlap and similar RoI features. Thus, the GNN is essential for PolarRCNN in the NMS-free paradigm. In both the fixed and proposal anchor paradigms, the O2O classification head with the GNN structure successfully eliminates redundant predictions, indicating that our GNN-based O2O classification head can replace NMS post-processing in sparse scenarios without a loss in performance. This confirms our earlier theory that both structure and label assignment are crucial for an NMS-free paradigm.
We also explore the stop-gradient strategy for the O2O classification head. As shown in Table \ref{stop}, the gradient of the O2O classification head negatively impacts both the O2M classification head (with NMS post-processing) and the O2O classification head. This suggests that one-to-one assignment introduces critical bias into feature learning.
\begin{table}[h]
\centering
\caption{Ablation study on the NMS-free block}
\begin{adjustbox}{width=\linewidth}
\begin{tabular}{cccc|ccc}
\toprule
\textbf{GNN}&\textbf{cls Mat}& \textbf{Nbr Mat}&\textbf{Rank Loss}&\textbf{F1@50(\%)}&\textbf{Precision(\%)} & \textbf{Recall(\%)} \\
\midrule
& & & &16.19&69.05&9.17\\
\checkmark&\checkmark& & &79.42&88.46&72.06\\
\checkmark& &\checkmark& &71.97&73.13&70.84\\
\checkmark&\checkmark&\checkmark& &80.74&88.49&74.23\\
\checkmark&\checkmark&\checkmark&\checkmark&\textbf{80.78}&\textbf{88.49}&\textbf{74.30}\\
\bottomrule
\end{tabular}
\end{adjustbox}
\label{aba_nmsfree_block}
\end{table}
\begin{figure*}[htbp]
\centering
\def\subwidth{0.24\textwidth}
\def\imgwidth{\linewidth}
\def\imgheight{0.5625\linewidth}
\begin{subfigure}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_nms/redun_gt.jpg}
\end{subfigure}
\begin{subfigure}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_nms/redun_pred50.jpg}
\end{subfigure}
\begin{subfigure}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_nms/redun_pred15.jpg}
\end{subfigure}
\begin{subfigure}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_nms/redun_nmsfree.jpg}
\end{subfigure}
\vspace{0.5em}
\begin{subfigure}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_nms/redun2_gt.jpg}
\end{subfigure}
\begin{subfigure}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_nms/redun2_pred50.jpg}
\end{subfigure}
\begin{subfigure}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_nms/redun2_pred15.jpg}
\end{subfigure}
\begin{subfigure}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_nms/redun2_nmsfree.jpg}
\end{subfigure}
\vspace{0.5em}
\begin{subfigure}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_nms/less_gt.jpg}
\end{subfigure}
\begin{subfigure}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_nms/less_pred50.jpg}
\end{subfigure}
\begin{subfigure}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_nms/less_pred15.jpg}
\end{subfigure}
\begin{subfigure}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_nms/less_nmsfree.jpg}
\end{subfigure}
\vspace{0.5em}
\begin{subfigure}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_nms/less2_gt.jpg}
\end{subfigure}
\begin{subfigure}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_nms/less2_pred50.jpg}
\end{subfigure}
\begin{subfigure}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_nms/less2_pred15.jpg}
\end{subfigure}
\begin{subfigure}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_nms/less2_nmsfree.jpg}
\end{subfigure}
\vspace{0.5em}
\begin{subfigure}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_nms/all_gt.jpg}
\end{subfigure}
\begin{subfigure}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_nms/all_pred50.jpg}
\end{subfigure}
\begin{subfigure}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_nms/all_pred15.jpg}
\end{subfigure}
\begin{subfigure}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_nms/all_nmsfree.jpg}
\end{subfigure}
\vspace{0.5em}
\begin{subfigure}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_nms/all2_gt.jpg}
\caption{GT}
\end{subfigure}
\begin{subfigure}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_nms/all2_pred50.jpg}
\caption{NMS@50}
\end{subfigure}
\begin{subfigure}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_nms/all2_pred15.jpg}
\caption{NMS@15}
\end{subfigure}
\begin{subfigure}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_nms/all2_nmsfree.jpg}
\caption{NMSFree}
\end{subfigure}
\vspace{0.5em}
\caption{Visualization of detection results in dense scenarios.}
\end{figure*}
\begin{table}[h]
\centering
\caption{Ablation study of NMS vs.\ NMS-free on the CULane test set}
\begin{adjustbox}{width=\linewidth}
\begin{tabular}{c|l|lll}
\toprule
\multicolumn{2}{c|}{\textbf{Anchor strategy~/~assign}} & \textbf{F1@50(\%)} & \textbf{Precision(\%)} & \textbf{Recall(\%)} \\
\midrule
\multirow{6}*{Fixed}
&O2M-B w/~ NMS &80.38&87.44&74.38\\
&O2M-B w/o NMS &44.03\textcolor{darkgreen}{~(36.35$\downarrow$)}&31.12\textcolor{darkgreen}{~(56.32$\downarrow$)}&75.23\textcolor{red}{~(0.85$\uparrow$)}\\
\cline{2-5}
&O2O-B w/~ NMS &78.72&87.58&71.50\\
&O2O-B w/o NMS &78.23\textcolor{darkgreen}{~(0.49$\downarrow$)}&86.26\textcolor{darkgreen}{~(1.32$\downarrow$)}&71.57\textcolor{red}{~(0.07$\uparrow$)}\\
\cline{2-5}
&O2O-G w/~ NMS &80.37&87.44&74.37\\
&O2O-G w/o NMS &80.27\textcolor{darkgreen}{~(0.10$\downarrow$)}&87.14\textcolor{darkgreen}{~(0.30$\downarrow$)}&74.40\textcolor{red}{~(0.03$\uparrow$)}\\
\midrule
\multirow{6}*{Proposal}
&O2M-B w/~ NMS &80.81&88.53&74.33\\
&O2M-B w/o NMS &36.46\textcolor{darkgreen}{~(44.35$\downarrow$)}&24.09\textcolor{darkgreen}{~(64.44$\downarrow$)}&74.93\textcolor{red}{~(0.6$\uparrow$)}\\
\cline{2-5}
&O2O-B w/~ NMS &77.27&92.64&66.28\\
&O2O-B w/o NMS &47.11\textcolor{darkgreen}{~(30.16$\downarrow$)}&36.48\textcolor{darkgreen}{~(56.16$\downarrow$)}&66.48\textcolor{red}{~(0.20$\uparrow$)}\\
\cline{2-5}
&O2O-G w/~ NMS &80.81&88.53&74.32\\
&O2O-G w/o NMS &80.81\textcolor{red}{~(0.00$\uparrow$)}&88.52\textcolor{darkgreen}{~(0.01$\downarrow$)}&74.33\textcolor{red}{~(0.01$\uparrow$)}\\
\bottomrule
\end{tabular}
\end{adjustbox}
\label{nms vs nmsfree}
\end{table}
\begin{table}[h]
\centering
\caption{Ablation study of the stop-gradient strategy on the CULane test set}
\begin{adjustbox}{width=\linewidth}
\begin{tabular}{c|c|lll}
\toprule
\multicolumn{2}{c|}{\textbf{Paradigm}} & \textbf{F1(\%)} & \textbf{Precision(\%)} & \textbf{Recall(\%)} \\
\midrule
\multirow{2}*{Baseline}
&O2M-B w/~ NMS &78.83&88.99&70.75\\
&O2O-G w/o NMS &71.68\textcolor{darkgreen}{~(7.15$\downarrow$)}&72.56\textcolor{darkgreen}{~(16.43$\downarrow$)}&70.81\textcolor{red}{~(0.06$\uparrow$)}\\
\midrule
\multirow{2}*{Stop grad}
&O2M-B w/~ NMS &80.81&88.53&74.33\\
&O2O-G w/o NMS &80.81\textcolor{red}{~(0.00$\uparrow$)}&88.52\textcolor{darkgreen}{~(0.01$\downarrow$)}&74.33\textcolor{red}{~(0.00$\uparrow$)} \\
\bottomrule
\end{tabular}
\end{adjustbox}
\label{stop}
\end{table}
\textbf{Ablation study on the NMS-free block in dense scenarios.} Although we have demonstrated that the O2O classification head can replace NMS in sparse scenarios, the shortcomings of NMS in dense scenarios remain. To investigate the performance of the NMS-free block in dense scenarios, we conduct experiments on the CurveLanes dataset, as detailed in Table \ref{aba_nms_dense}.
In traditional NMS post-processing \cite{clrernet}, the default IoU threshold is 50 pixels. However, this setting is not always optimal: in dense scenarios, some true lane predictions may be erroneously suppressed. Lowering the IoU threshold increases recall but decreases precision. To find the most effective threshold, we evaluated several values and found that 15 pixels achieves the best trade-off, yielding an F1-score of 86.81\%. In contrast, the NMS-free paradigm with the GNN-based O2O classification head achieves an overall F1-score of 87.29\%, which is 0.48\% higher than the optimal threshold setting in the NMS paradigm, with improvements in both precision and recall. This indicates that the GNN-based O2O classification head learns semantic distances between anchors in addition to geometric distances, providing a more effective solution for dense scenarios than traditional NMS.
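As a rough illustration of the threshold trade-off discussed above (a minimal sketch, not the authors' implementation), consider a greedy NMS over lane predictions sampled at common row anchors, where the hypothetical \texttt{lane\_distance} stands in for the pixel-based Line IoU:

```python
import numpy as np

def lane_distance(xa, xb):
    """Mean horizontal distance (pixels) between two lanes sampled at the
    same row anchors -- a simplified stand-in for a pixel-based Line IoU."""
    return float(np.mean(np.abs(xa - xb)))

def lane_nms(lanes, scores, thresh_px):
    """Greedy NMS: keep the highest-scoring lane, then drop any remaining
    lane closer than `thresh_px` pixels to an already-kept one."""
    order = np.argsort(scores)[::-1]  # highest score first
    keep = []
    for i in order:
        if all(lane_distance(lanes[i], lanes[j]) >= thresh_px for j in keep):
            keep.append(int(i))
    return keep

# Two nearly parallel lanes 20 px apart (a dense scenario) plus a distant lane.
lanes = [np.full(72, 100.0), np.full(72, 120.0), np.full(72, 400.0)]
scores = np.array([0.9, 0.8, 0.7])
print(lane_nms(lanes, scores, thresh_px=50))  # -> [0, 2]: true lane suppressed
print(lane_nms(lanes, scores, thresh_px=15))  # -> [0, 1, 2]: both lanes kept
```

A large threshold merges the close pair (a false negative), while a small one keeps both; when the close pair were redundant duplicates instead of two true lanes, the small threshold would fail the other way, which is exactly the dilemma the learned O2O head avoids.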
\begin{table}[h]
\centering
\caption{NMS vs NMS-free on CurveLanes validation set}
\begin{adjustbox}{width=\linewidth}
\begin{tabular}{l|l|ccc}
\toprule
\textbf{Paradigm} & \textbf{NMS thresh.~(pixels)} & \textbf{F1(\%)} & \textbf{Precision(\%)} & \textbf{Recall(\%)} \\
\midrule
\multirow{7}*{PolarRCNN-NMS}
& 50 (default) &85.38&\textbf{91.01}&80.40\\
& 40 &85.97&90.72&81.68\\
& 30 &86.26&90.44&82.45\\
& 25 &86.38&90.27&82.83\\
& 20 &86.57&90.05&83.37\\
& 15 (optimal) &86.81&89.64&84.16\\
& 10 &86.58&88.62&\textbf{84.64}\\
\midrule
PolarRCNN & - &\textbf{87.29}&90.50&84.31\\
\bottomrule
\end{tabular}
\end{adjustbox}
\label{aba_nms_dense}
\end{table}
\textbf{Visualization.} We present PolarRCNN predictions for both sparse and dense scenarios. Figure \ref{vis_sparse} displays predictions for sparse scenarios across four datasets. The local polar head effectively proposes anchors clustered around the ground truth, providing a strong prior for the RoI stage to produce the final lane predictions. Moreover, the number of anchors is significantly reduced compared with previous work, making our method theoretically faster than other anchor-based methods. Figure \ref{vis_dense} shows predictions for dense scenarios. We observe that NMS@50 mistakenly removes some true predictions, leading to false negatives, while NMS@15 fails to eliminate redundant predictions, resulting in false positives. This highlights the trade-off between large and small IoU thresholds and clearly demonstrates that geometric distance alone becomes unreliable in dense scenarios. Only the data-driven O2O classification head can address this issue, by capturing semantic distance beyond geometric distance. As shown in Figure \ref{vis_dense}, the O2O classification head successfully eliminates redundant predictions while retaining dense true predictions with small geometric distances.
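The contrast between the two paradigms can be sketched schematically (the scores below are invented for illustration and are not from the model): the O2M head assigns high scores to every plausible prediction, including duplicates, whereas the O2O head is trained to score down redundant predictions, so selection reduces to simple thresholding with no geometric suppression at all.

```python
# Hypothetical scores for three predictions: two true lanes only a few pixels
# apart (a dense scenario) and one redundant duplicate of the first lane.
o2m_scores = [0.95, 0.90, 0.85]   # one-to-many head: all three look positive
o2o_scores = [0.92, 0.88, 0.04]   # one-to-one head: the duplicate is scored down

def nms_free_select(scores, conf_thresh=0.5):
    """NMS-free selection: keep every prediction whose learned O2O score
    clears the confidence threshold; no geometric suppression is involved."""
    return [i for i, s in enumerate(scores) if s >= conf_thresh]

print(nms_free_select(o2o_scores))  # -> [0, 1]: close-but-distinct lanes survive
```

Because the decision is made per prediction from learned scores, two geometrically close true lanes can both survive while a geometrically close duplicate is still removed.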
\begin{figure*}[htbp]
\centering
\def\pagewidth{0.49\textwidth}
\def\subwidth{0.47\linewidth}
\vspace{0.5em}
\caption{Visualization of detection results in sparse scenarios.}
\label{vis_sparse}
\end{figure*}
\begin{figure*}[htbp]
\centering
\def\subwidth{0.24\textwidth}
\def\imgwidth{\linewidth}
\def\imgheight{0.5625\linewidth}
\begin{subfigure}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_nms/redun_gt.jpg}
\end{subfigure}
\begin{subfigure}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_nms/redun_pred50.jpg}
\end{subfigure}
\begin{subfigure}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_nms/redun_pred15.jpg}
\end{subfigure}
\begin{subfigure}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_nms/redun_nmsfree.jpg}
\end{subfigure}
\vspace{0.5em}
\begin{subfigure}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_nms/redun2_gt.jpg}
\end{subfigure}
\begin{subfigure}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_nms/redun2_pred50.jpg}
\end{subfigure}
\begin{subfigure}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_nms/redun2_pred15.jpg}
\end{subfigure}
\begin{subfigure}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_nms/redun2_nmsfree.jpg}
\end{subfigure}
\vspace{0.5em}
\begin{subfigure}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_nms/less_gt.jpg}
\end{subfigure}
\begin{subfigure}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_nms/less_pred50.jpg}
\end{subfigure}
\begin{subfigure}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_nms/less_pred15.jpg}
\end{subfigure}
\begin{subfigure}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_nms/less_nmsfree.jpg}
\end{subfigure}
\vspace{0.5em}
\begin{subfigure}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_nms/less2_gt.jpg}
\end{subfigure}
\begin{subfigure}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_nms/less2_pred50.jpg}
\end{subfigure}
\begin{subfigure}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_nms/less2_pred15.jpg}
\end{subfigure}
\begin{subfigure}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_nms/less2_nmsfree.jpg}
\end{subfigure}
\vspace{0.5em}
\begin{subfigure}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_nms/all_gt.jpg}
\end{subfigure}
\begin{subfigure}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_nms/all_pred50.jpg}
\end{subfigure}
\begin{subfigure}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_nms/all_pred15.jpg}
\end{subfigure}
\begin{subfigure}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_nms/all_nmsfree.jpg}
\end{subfigure}
\vspace{0.5em}
\begin{subfigure}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_nms/all2_gt.jpg}
\caption{GT}
\end{subfigure}
\begin{subfigure}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_nms/all2_pred50.jpg}
\caption{NMS@50}
\end{subfigure}
\begin{subfigure}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_nms/all2_pred15.jpg}
\caption{NMS@15}
\end{subfigure}
\begin{subfigure}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_nms/all2_nmsfree.jpg}
\caption{NMSFree}
\end{subfigure}
\vspace{0.5em}
\caption{Visualization of detection results in dense scenarios.}
\label{vis_dense}
\end{figure*}
\section{Conclusion}
In this paper, we propose PolarRCNN to address two key issues in anchor-based lane detection methods. By incorporating a local and global polar coordinate system, our model achieves improved performance with fewer anchors. Additionally, the introduction of a GNN-based O2O classification head allows us to replace the traditional NMS post-processing, and the NMS-free paradigm demonstrates superior performance in dense scenarios. Our model is highly flexible; the number of anchors can be adjusted based on the specific scenario. Users have the option to use either the O2M classification head with NMS post-processing or the O2O classification head for an NMS-free approach. PolarRCNN is also deployment-friendly due to its simple structure, making it a potential new baseline for lane detection. Future work could explore incorporating new structures, such as large kernels or attention mechanisms, and experimenting with new label assignment, training, and anchor sampling strategies. We also plan to extend PolarRCNN to video instance lane detection and 3D lane detection, utilizing advanced geometric modeling for these new tasks.