\documentclass[lettersize,journal]{IEEEtran} \usepackage{amsmath,amsfonts} \usepackage{algorithmic} \usepackage{algorithm} \usepackage{array} % \usepackage[caption=false,font=normalsize,labelfont=sf,textfont=sf]{subfig} \usepackage{textcomp} \usepackage{stfloats} \usepackage{url} \usepackage{verbatim} \usepackage{graphicx} \usepackage{cite} \usepackage{subcaption} \usepackage{multirow} \usepackage[T1]{fontenc} \usepackage{adjustbox} \usepackage{amssymb} \usepackage{booktabs} \usepackage{tikz} \usepackage{tabularx} \usepackage[colorlinks,bookmarksopen,bookmarksnumbered, linkcolor=red]{hyperref} % \usepackage[table,xcdraw]{xcolor} \definecolor{darkgreen}{RGB}{17,159,27} % \aboverulesep=0pt \belowrulesep=0pt \hyphenation{op-tical net-works semi-conduc-tor IEEE-Xpolare} % updated with editorial comments 8/9/2021 \begin{document} \title{Polar R-CNN:\@ End-to-End Lane Detection with Fewer Anchors} \author{Shengqi Wang, Junmin Liu, Xiangyong Cao, Zengjie Song, and Kai Sun\\ \thanks{This work was supported in part by the National Nature Science Foundation of China (Grant Nos. 62276208, 12326607) and in part by the Natural Science Basic Research Program of Shaanxi Province (Grant No. 2024JC-JCQN-02).}% \thanks{S. Wang, J. Liu, Z. Song and K. Sun are with the School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an 710049, China.} \thanks{X. Cao is with the School of Computer Science and Technology and the Ministry of Education Key Lab for Intelligent Networks and Network Security, Xi’an Jiaotong University, Xi’an 710049, China.} } %\thanks{Manuscript received April 19, 2021; revised August 16, 2021.}} \markboth{S. Wang \MakeLowercase{\textit{et al.}}: Polar R-CNN:\@ End-to-End Lane Detection with Fewer Anchors}% {S. 
Wang \MakeLowercase{\textit{et al.}}: Polar R-CNN:\@ End-to-End Lane Detection with Fewer Anchors} \maketitle \begin{abstract} Lane detection is a critical and challenging task in autonomous driving, particularly in real-world scenarios where traffic lanes can be slender, lengthy, and often obscured by other vehicles, complicating detection efforts. Existing anchor-based methods typically rely on prior lane anchors to extract features and refine the location and shape of lanes. While these methods achieve high performance, manually setting prior anchors is cumbersome, and ensuring sufficient coverage across diverse datasets often requires a large number of dense anchors. Furthermore, the use of \textit{Non-Maximum Suppression} (NMS) to eliminate redundant predictions complicates real-world deployment and may underperform in complex scenarios. In this paper, we propose \textit{Polar R-CNN}, an NMS-free anchor-based method for lane detection. By incorporating both local and global polar coordinate systems, Polar R-CNN facilitates flexible anchor proposals and significantly reduces the number of anchors required without compromising performance. Additionally, we introduce a heuristic \textit{Graph Neural Network} (GNN)-based NMS-free head that supports an end-to-end paradigm, enhancing deployment efficiency and performance in scenarios with dense lanes. Our method achieves competitive results on five popular lane detection benchmarks (\textit{TuSimple}, \textit{CULane}, \textit{LLAMAS}, \textit{CurveLanes}, and \textit{DL-Rail}) while maintaining a lightweight design and straightforward structure. Our source code is available at \href{https://github.com/ShqWW/PolarRCNN}{\textit{https://github.com/ShqWW/PolarRCNN}}. \end{abstract} \begin{IEEEkeywords} Lane Detection, NMS-Free, Graph Neural Network, Polar Coordinate System. 
\end{IEEEkeywords} \section{Introduction} \IEEEPARstart{L}{ane} detection is a critical task in computer vision and autonomous driving, aimed at identifying and tracking lane markings on the road. While extensive research has been conducted in ideal environments, it is still challenging in adverse scenarios such as night driving, glare, crowded scenes, and rainy conditions, where lanes may be occluded or damaged. Moreover, the slender shapes and complex topologies of lanes further complicate detection efforts. %Therefore, an effective lane detection method should take into account both global high-level semantic features and local low-level features to address these varied conditions and ensure robust performances in a real-time application. along with their global properties, \par Over the past few decades, many methods have primarily focused on handcrafted local feature extraction and lane shape modeling. Techniques such as the \textit{Canny edge detector} \cite{cannyedge}, \textit{Hough transform} \cite{houghtransform}, and \textit{deformable templates} \cite{kluge1995deformable} have been widely employed for lane fitting. However, these approaches often face limitations in real-world scenarios, especially when low-level and local features lack clarity and distinctiveness. \par In recent years, advancements in deep learning and the availability of large datasets have led to significant progress in lane detection, especially with deep models such as \textit{Convolutional Neural Networks} (CNNs) \cite{scnn} and \textit{transformer-based} architectures \cite{lstr}. Building on these models, earlier approaches typically framed lane detection as a \textit{segmentation task} \cite{lanenet}, which, despite being straightforward, required time-consuming computations. Other methods rely on \textit{parameter-based} models, which directly output lane curve parameters rather than pixel locations \cite{bezierlanenet}\cite{polylanenet}\cite{lstr}. 
Although these segmentation-based and parameter-based methods provide end-to-end solutions, their sensitivity to lane shape compromises their robustness. \begin{figure}[t] \centering \def\subwidth{0.24\textwidth} \def\imgwidth{\linewidth} \def\imgheight{0.5625\linewidth} \begin{subfigure}{\subwidth} \includegraphics[width=\imgwidth, height=\imgheight]{thesis_figure/anchor_demo/anchor_fix_init.jpg} \caption{} \end{subfigure} \begin{subfigure}{\subwidth} \includegraphics[width=\imgwidth, height=\imgheight]{thesis_figure/anchor_demo/anchor_fix_learned.jpg} \caption{} \end{subfigure} \begin{subfigure}{\subwidth} \includegraphics[width=\imgwidth, height=\imgheight]{thesis_figure/anchor_demo/anchor_proposal.jpg} \caption{} \end{subfigure} \begin{subfigure}{\subwidth} \includegraphics[width=\imgwidth, height=\imgheight]{thesis_figure/anchor_demo/gt.jpg} \caption{} \end{subfigure} \caption{Anchor settings of different methods. (a) The initial anchor settings of CLRNet. (b) The learned anchor settings of CLRNet trained on CULane. (c) The learned anchors of our method. (d) The ground truth.} \label{anchor setting} \end{figure} \begin{figure}[t] \centering \def\subwidth{0.24\textwidth} \def\imgwidth{\linewidth} \def\imgheight{0.5625\linewidth} \begin{subfigure}{\subwidth} \includegraphics[width=\imgwidth, height=\imgheight]{thesis_figure/nms_demo/less_gt.jpg} \caption{} \end{subfigure} \begin{subfigure}{\subwidth} \includegraphics[width=\imgwidth, height=\imgheight]{thesis_figure/nms_demo/redun_gt.jpg} \caption{} \end{subfigure} \begin{subfigure}{\subwidth} \includegraphics[width=\imgwidth, height=\imgheight]{thesis_figure/nms_demo/less_pred.jpg} \caption{} \end{subfigure} \begin{subfigure}{\subwidth} \includegraphics[width=\imgwidth, height=\imgheight]{thesis_figure/nms_demo/redun_pred.jpg} \caption{} \end{subfigure} \caption{Comparison of NMS thresholds in \textit{sparse} and \textit{dense} scenarios. 
(a) and (b) Ground truths in dense and sparse scenarios, respectively. (c) Predictions with a large NMS threshold in a dense scenario, resulting in a lane prediction being mistakenly suppressed. (d) Predictions with a small NMS threshold in a sparse scenario, where redundant prediction results are not effectively removed.} \label{NMS setting} \end{figure} %, where some lane instances are close with each others; , where the lane instance are far apart \par Drawing inspiration from object detection methods such as \textit{YOLO} \cite{yolov10} and \textit{Faster R-CNN} \cite{fasterrcnn}, several anchor-based approaches have been introduced for lane detection, with representative works including \textit{LaneATT} \cite{laneatt} and \textit{CLRNet} \cite{clrnet}. These methods have shown superior performance by leveraging anchor \textit{priors} (as shown in Fig. \ref{anchor setting}) and enabling larger receptive fields for feature extraction. However, anchor-based methods encounter drawbacks similar to those in general object detection, including the following: \begin{itemize} \item As shown in Fig. \ref{anchor setting}(a), a large number of lane anchors are predefined in the image, even in \textbf{\textit{sparse scenarios}}, i.e., situations where lanes are distributed widely and located far apart from each other, as illustrated in Fig. \ref{anchor setting}(d). \item A \textit{Non-Maximum Suppression} (NMS) post-processing step is required to eliminate redundant predictions but may struggle in \textbf{\textit{dense scenarios}} where lanes are close to each other, such as forked lanes and double lanes, as illustrated in Fig. \ref{NMS setting}(a). \end{itemize} \par Regarding the first issue, \cite{clrnet} introduced learned anchors that optimize the anchor parameters during training to better adapt to lane distributions, as shown in Fig. \ref{anchor setting}(b). 
However, the number of anchors remains excessive to adequately cover the diverse potential distributions of lanes. Furthermore, \cite{adnet} proposes flexible anchors for each image by generating start points, rather than using a fixed set of anchors. Nevertheless, these start points of lanes are subjective and lack clear visual evidence due to the global nature of lanes. In contrast, \cite{srlane} uses a local angle map to propose sketch anchors according to the direction of ground truth. While this approach considers directional alignment, it neglects precise anchor positioning, resulting in suboptimal performance. Overall, the abundance of anchors is unnecessary in sparse scenarios.% where lane ground truths are sparse. The trend in new methodologies is to reduce the number of anchors while offering more flexible anchor configurations.%, which negatively impacts its performance. They also employ cascade cross-layer anchor refinement to bring the anchors closer to the ground truth. in the absence of cascade anchor refinement \par Regarding the second issue, nearly all anchor-based methods \cite{laneatt}\cite{clrnet}\cite{adnet}\cite{srlane} rely on direct or indirect NMS post-processing to eliminate redundant predictions. Although it is necessary to eliminate redundant predictions, NMS remains a suboptimal solution. On one hand, NMS is not deployment-friendly because it requires defining and calculating distances between lane pairs using metrics such as \textit{Intersection over Union} (IoU). This task is more challenging than in general object detection due to the intricate geometry of lanes. On the other hand, NMS can struggle in dense scenarios. Typically, a large distance threshold may lead to false negatives, as some true positive predictions could be mistakenly eliminated, as illustrated in Fig. \ref{NMS setting}(a)(c). 
Conversely, a small distance threshold may fail to eliminate redundant predictions effectively, resulting in false positives, as shown in Fig. \ref{NMS setting}(b)(d). Therefore, achieving an optimal trade-off across all scenarios by manually setting the distance threshold is challenging. %The root of this problem lies in the fact that the distance definition in NMS considers only geometric parameters while ignoring the semantic context in the image. As a result, when two predictions are ``close'' to each other, it is nearly impossible to determine whether one of them is redundant.% where lane ground truths are closer together than in sparse scenarios;including those mentioned above, \par To address the above two issues, we propose Polar R-CNN, a novel anchor-based method for lane detection. For the first issue, we introduce local and global heads based on the polar coordinate system to create anchors with more accurate locations, thereby reducing the number of proposed anchors in sparse scenarios, as illustrated in Fig. \ref{anchor setting}(c). In contrast to \textit{State-Of-The-Art} (SOTA) methods \cite{clrnet}\cite{clrernet}, which utilize 192 anchors, Polar R-CNN employs only 20 anchors to effectively cover potential lane ground truths. For the second issue, we have revised Fast NMS to Graph-based Fast NMS, incorporating a new heuristic \textit{Graph Neural Network} (GNN) block (Polar GNN block) into the NMS head. The Polar GNN block offers an interpretable structure, achieving nearly equivalent performance in sparse scenarios and superior performance in dense scenarios. We conducted experiments on five major benchmarks: \textit{TuSimple} \cite{tusimple}, \textit{CULane} \cite{scnn}, \textit{LLAMAS} \cite{llamas}, \textit{CurveLanes} \cite{curvelanes}, and \textit{DL-Rail} \cite{dalnet}. Our proposed method demonstrates competitive performance compared to SOTA approaches. 
Our main contributions are summarized as follows: \begin{itemize} \item We design a strategy to simplify the anchor parameters by using local and global polar coordinate systems and apply it to a two-stage lane detection framework. Compared to other anchor-based methods, this strategy significantly reduces the number of proposed anchors while achieving better performance. \item We propose a novel Polar GNN block to implement an NMS-free paradigm. The block is inspired by Graph-based Fast NMS, providing enhanced interpretability. Our Polar GNN block supports end-to-end training and testing while still allowing for traditional NMS post-processing as an option for an NMS version of our model. \item By integrating the polar coordinate systems and the Polar GNN block, we present the Polar R-CNN model for fast and efficient lane detection. We conduct extensive experiments on five benchmark datasets to demonstrate that our model achieves high performance with fewer anchors and an NMS-free paradigm. %Additionally, our model features a straightforward structure—lacking cascade refinement or attention strategies—making it simpler to deploy. \end{itemize} % \section{Related Works} %As mentioned above, our model is based on deep learning. Generally, deep learning-based lane detection methods can be categorized into three groups: segmentation-based, parameter-based, and anchor-based methods. Additionally, NMS-free is an important technique for anchor-based methods, and it will also be described in this section. \par \textbf{Segmentation-based Methods.} These methods focus on pixel-wise prediction. They classify each pixel into categories corresponding to the different lane instances and the background \cite{lanenet}, predicting information pixel by pixel. However, they often overly emphasize low-level and local features, neglecting global semantic information and real-time requirements. To address this issue, \textit{SCNN} \cite{scnn} uses a larger receptive field. 
Some methods, such as \textit{UFLDv1-v2} \cite{ufld}\cite{ufldv2} and \textit{CondLaneNet} \cite{CondLaneNet}, utilize row-wise or column-wise classification instead of pixel classification to improve detection speed. Another issue with these methods is that the lane instance prior is learned by the model itself, leading to a lack of prior knowledge. For example, \textit{LaneNet} \cite{lanenet} uses post-clustering to distinguish each lane instance, while \textit{UFLDv1-v2} categorizes lane instances by angles and locations, allowing it to detect only a fixed number of lanes. In contrast, \textit{CondLaneNet} employs different conditional dynamic kernels to predict different lane instances. Additionally, some methods such as \textit{FOLOLane} \cite{fololane} and \textit{GANet} \cite{ganet} adopt bottom-up strategies to detect a few key points and model their global relations to form lane instances. \par \textbf{Parameter-based Methods.} Instead of predicting a series of point locations or pixel classifications, the parameter-based methods directly generate the curve parameters of lane instances. For example, \textit{PolyLaneNet} \cite{polylanenet} and \textit{LSTR} \cite{lstr} consider the lane instance as a polynomial curve, outputting the polynomial coefficients directly. \textit{BézierLaneNet} \cite{bezierlanenet} treats the lane instance as a Bézier curve, generating the locations of its control points, while \textit{BSLane} \cite{bsnet} uses B-Splines to describe the lane, with curve parameters that emphasize local lane shapes. These parameter-based methods are mostly end-to-end and do not require post-processing, resulting in faster inference speed. However, since the final visual lane shapes are sensitive to the predicted curve parameters, the robustness and generalization of these methods may not be optimal. 
\par \textbf{Anchor-Based Methods.} These methods are inspired by general object detection models, such as YOLO \cite{yolov10} and Faster R-CNN \cite{fasterrcnn}, for lane detection. The earliest work is Line-CNN, which utilizes line anchors designed as rays emitted from the three edges (left, bottom, and right) of an image. However, the model’s receptive field is limited to the edges, rendering it suboptimal for capturing the entirety of the lane. LaneATT \cite{laneatt} improves upon this by employing anchor-based feature pooling to aggregate features along the entire line anchor, achieving faster speeds and better performance. Nevertheless, its grid sampling strategy and label assignment still pose limitations. A key advantage of the anchor-based methods is their flexibility, allowing the integration of strategies from anchor-based object detection. For example, \textit{CLRNet} \cite{clrnet} enhances the performance with \textit{cross-layer refinement strategies}, \textit{SimOTA label assignment} \cite{yolox}, and \textit{LIOU loss}, outperforming many previous methods. However, anchor-based methods also have some essential drawbacks, e.g., lane anchors are often handcrafted and numerous. Some approaches, such as \textit{ADNet} \cite{adnet}, \textit{SRLane} \cite{srlane}, and \textit{Sparse Laneformer} \cite{sparse}, attempt to reduce the number of anchors and provide more flexible proposals; however, this can slightly impact performance. Additionally, methods such as \cite{clrernet}\cite{adnet} still rely on NMS post-processing, complicating NMS threshold settings and model deployment. Although one-to-one label assignment during training, without NMS \cite{detr}\cite{o2o} during evaluation, alleviates this issue, its performance is still less satisfactory compared to NMS-based models. \par \textbf{NMS-free Methods.} Due to the threshold sensitivity and computational overhead of NMS, many studies explore NMS-free methods, i.e., models that do not use NMS during the detection process. 
For example, \textit{DETR} \cite{detr} employs one-to-one label assignment to avoid redundant predictions without using NMS. Other NMS-free methods \cite{learnNMS}\cite{date}\cite{yolov10} have also been proposed to address this issue from two aspects: \textit{model architecture} and \textit{label assignment}. For example, the studies in \cite{date}\cite{yolov10} suggest that one-to-one assignments are crucial for NMS-free predictions, but maintaining one-to-many assignments is still necessary to ensure effective feature learning of the model. Meanwhile, some works \cite{o3d}\cite{relationnet} focus on the model’s expressive capacity to provide non-redundant predictions. However, compared to the extensive studies conducted in general object detection, there has been limited research analyzing the NMS-free paradigm. \par In this work, we aim to address the above two issues in the framework of anchor-based detection to achieve NMS-free and non-redundant lane predictions. % % \section{Polar R-CNN} \begin{figure*}[ht] \centering \includegraphics[width=0.85\linewidth]{thesis_figure/ovarall_architecture.png} \caption{An illustration of the Polar R-CNN architecture. It has a pipeline similar to that of Faster R-CNN for object detection, and consists of a backbone, an FPN with three levels of feature maps, denoted by $P_0, P_1, P_2$, followed by a \textit{local polar head}, and a RoI pooling module to extract features fed to a \textit{global polar head} for lane detection. Based on the designed lane representation and lane anchor representation in the polar coordinate system, the local polar head can propose sparse line anchors and the global polar head can produce robust and accurate lane predictions. 
The global polar head includes a triplet head, which comprises a \textit{one-to-one (O2O) classification head}, a \textit{one-to-many (O2M) classification head} , and a \textit{one-to-many (O2M) regression head}.} \label{overall_architecture} \end{figure*} \begin{figure}[t] \centering \def\subwidth{0.24\textwidth} \def\imgwidth{\linewidth} \def\imgheight{0.4\linewidth} \begin{subfigure}{\subwidth} \includegraphics[width=\imgwidth]{thesis_figure/coord/ray.png} \caption{} \end{subfigure} \begin{subfigure}{\subwidth} \includegraphics[width=\imgwidth]{thesis_figure/coord/polar.png} \caption{} \end{subfigure} \caption{Different descriptions for anchor parameters: (a) Ray: defined by its starting point and direction $\theta$. (b) Polar: defined by its radius and angle.} %rectangular coordinates \label{coord} \end{figure} \begin{figure}[t] \centering \includegraphics[width=\linewidth]{thesis_figure/coord/localpolar.png} \caption{Local (left) and global (right) polar coordinate system.} \label{lphlabel} \end{figure} % The overall architecture of our Polar R-CNN is illustrated in Fig. \ref{overall_architecture}. As shown in this figure, our Polar R-CNN for lane detection has a similar pipeline with Faster R-CNN \cite{fasterrcnn}, which consists of a backbone, a \textit{Feature Pyramid Network} (FPN), a \textit{Region Proposal Network} (RPN) followed by a local polar head, and \textit{Region of Interest} (RoI) pooling module followed by a global polar head. To investigate the fundamental factors affecting model performance, such as anchor settings and NMS post-processing, and also to enhance ease of deployment, our Polar R-CNN utilizes a simple and straightforward network structure. 
It relies only on basic components, including convolutional and pooling operations and \textit{Multi-Layer Perceptrons} (MLPs), while deliberately excluding advanced elements like \textit{attention mechanisms}, \textit{dynamic kernels}, and \textit{cross-layer refinement} used in previous works \cite{clrnet}\cite{clrernet}. \par In the following, based on a polar coordinate representation of lanes and lane anchors, we will further introduce the designed \textit{Local Polar Head} (LPH) and \textit{Global Polar Head} (GPH) in our Polar R-CNN. % \subsection{Representation of Lane and Lane Anchor} % Lanes are characterized by their thin, elongated, and curved shapes. A well-defined lane prior aids the model in feature extraction and location prediction. \par \textbf{Lane and Anchor Representation as Ray.} Given an input image with width $W$ and height $H$, a lane is represented by a set of 2D points with equally spaced y-coordinates $Y=\{y_1, y_2,\cdots, y_n\}$, where $y_i=i\times\frac{H}{n}$ and $n$ is the number of data points. Since the set $Y$ is fixed, a lane can be uniquely defined by its x-coordinates $X=\{x_1,x_2,\cdots,x_n\}$, with each $x_i$ corresponding to the respective $y_i\in Y$. Previous studies \cite{linecnn}\cite{laneatt} have introduced lane priors, also known as lane anchors, which are represented as straight lines in the image plane that serve as references. From a geometric perspective, a lane anchor can be viewed as a ray defined by a starting point $(x_{orig},y_{orig})$ located at the edge of an image (left/bottom/right boundaries), along with a direction $\theta$, as shown in Fig. \ref{coord}(a). The primary task of a lane detection model is to estimate the x-coordinate offset from the lane anchor to the ground truth of the lane instance. \par However, lane anchors, which are essentially straight lines represented as rays, have certain drawbacks, as illustrated in Fig. \ref{coord}(a). 
A lane anchor possesses an infinite number of potential starting points, making the definition of the anchor’s starting point ambiguous and subjective. Some methods, such as \cite{linecnn}\cite{dalnet}\cite{laneatt}, define the starting points as being located at the boundaries of an image (e.g., the green point in Fig. \ref{coord}(a)), while \cite{adnet} sets the starting points to correspond to the actual visual location within the image (e.g., the purple point in Fig. \ref{coord}(a)). Moreover, occlusion and damage to the lane significantly impact the detection of these starting points, necessitating that the model possess a large receptive field \cite{adnet}. Fundamentally, a straight lane has two degrees of freedom (for instance, the slope and the intercept under a Cartesian coordinate system), implying that the lane anchor could be described using just two parameters instead of the three redundant parameters (two for the start point and one for orientation) currently employed. \par \textbf{Representation in Polar Coordinates.} As stated above, lane anchors represented by rays have some drawbacks. To address these issues, we introduce the polar coordinate representation of lane anchors. In mathematics, the polar coordinate system is a two-dimensional coordinate system in which each point on a plane is determined by a distance from a reference point (also called the pole) and an angle $\theta$ from a reference direction (called the polar axis). As shown in Fig. \ref{coord}(b), a lane anchor for a straight line can be uniquely defined by two parameters: the radial distance from the pole (called the radius), $r$, and the counterclockwise angle from the polar axis, $\theta$, with $r \in \mathbb{R}$ and $\theta\in\left(-\frac{\pi}{2}, \frac{\pi}{2}\right]$. To better integrate the local inductive bias properties of CNNs, we define two types of polar coordinate systems: the local polar coordinate system and the global polar coordinate system. 
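As a brief illustration of this parameterization (a sketch in our own notation, not taken from the implementation), assume that the pole is located at $(c_x, c_y)$ and that $\theta$ is the direction of the perpendicular dropped from the pole onto the anchor. A point $(x, y)$ then lies on the anchor if and only if \begin{equation*} (x-c_x)\cos\theta + (y-c_y)\sin\theta = r. \end{equation*} Since the pairs $(r,\theta)$ and $(-r,\theta+\pi)$ satisfy the same equation, allowing a signed radius $r\in\mathbb{R}$ while restricting $\theta$ to a half-open interval of length $\pi$ assigns exactly one parameter pair to each straight line. For instance, with the pole at the origin, $(r,\theta)=(1,0)$ describes the vertical line $x=1$, whereas $(r,\theta)=(-1,0)$ describes $x=-1$. 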
In the local polar coordinate system, we introduce a set of reference points known as local poles. These local poles are positioned at the lattice points (or pixels) of the downsampled feature map, as illustrated in the left panel of Fig. \ref{lphlabel}. Each local pole, which we denote as $\mathbf{c}_{j}^{l}$, is responsible for predicting a single lane anchor, as with the green points shown in the left panel of Fig. \ref{lphlabel}. During training, the ground truth labels for each local pole are defined as follows: the radius ground truth is the shortest distance from a local pole to the ground truth lane curve, and the angle ground truth represents the orientation of the vector from the local pole to the nearest point on the curve. A local pole is labeled as a positive sample (the green points) if its radius label is below a threshold $\tau_{l}$; otherwise, it is considered a negative sample (the red points). Note that one lane curve instance is regressed by multiple local poles. Some local features around certain poles may be missed due to damage or occlusion of the lane curve, so this one-to-many approach is crucial for ensuring comprehensive anchor proposals. In the second stage (RoI pooling and final lane detection), we standardize the lane anchors by transforming them from multiple local polar coordinate systems into a single uniform global polar coordinate system. This system contains only one reference point, termed the global pole, denoted as $\mathbf{c}^{g}$. The location of the global pole is manually set, and in this work, it is positioned around the static vanishing point of the entire lane image dataset. % \newpage % Since lane anchors are typically represented as straight lines, they can be described using straight line parameters. 
Previous approaches have used rays to describe 2D lane anchors, with the parameters including the coordinates of the starting point and the orientation/angle, denoted as $\left\{\theta, P_{xy}\right\}$, as shown in Fig. \ref{coord}(a). \cite{linecnn}\cite{laneatt} define the start points as lying on the three image boundaries. However, \cite{adnet} argue that this approach is problematic because the actual starting point of a lane could be located anywhere within the image. In our analysis, using a ray can lead to ambiguity in line representation because a line can have an infinite number of starting points, and the choice of the starting point for a lane is subjective. As illustrated in Fig. \ref{coord}(a), the yellow (the visual start point) and green (the point located on the image boundary) starting points with the same orientation $\theta$ describe the same line, and either could be used in different datasets \cite{scnn}\cite{vil100}. This ambiguity arises because a straight line has two degrees of freedom, whereas a ray has three (two for the start point and one for orientation). To resolve this issue , we propose using polar coordinates to describe a lane anchor with only two parameters: radius and angle, deoted as $\left\{\theta, r\right\}$, where This representation is illustrated in Fig. \ref{coord}(b). % \newpage \begin{figure}[t] \centering \includegraphics[width=0.45\textwidth]{thesis_figure/local_polar_head.png} \caption{The main architecture of LPH.} \label{lph} \end{figure} % We define two types of polar coordinate systems: the global coordinate system and the local coordinate system, with the origin points denoted as the global origin $\boldsymbol{c}^{g}$ and the local origin $\boldsymbol{c}^{l}$, respectively. For convenience, the global origin is positioned near the static vanishing point of the entire lane image dataset, while the local origins are set at lattice points within the image. As illustrated in Fig. 
\ref{coord}(b), only the radius parameters are affected by the choice of the origin point, while the angle/orientation parameters remain consistent. \subsection{Local Polar Head} \textbf{Anchor Formulation in Local Polar Head.} Inspired by the region proposal network in Faster R-CNN \cite{fasterrcnn}, the local polar head (LPH) aims to propose flexible, high-quality anchors around the lane ground truths within an image. As Fig. \ref{lph} and Fig. \ref{overall_architecture} demonstrate, the highest level $P_{3} \in \mathbb{R}^{C_{f} \times H_{f} \times W_{f}}$ of the FPN feature maps is selected as the input for LPH. Following a downsampling operation $DS\left(\cdot\right)$, the feature map is then fed into two branches: the regression branch $\phi_{reg}^{lph}\left(\cdot\right)$ and the classification branch $\phi_{cls}^{lph}\left(\cdot\right)$: \begin{equation} \begin{aligned} &F_d\gets DS\left(P_{3}\right), \quad F_d\in \mathbb{R}^{C_f\times H^{l}\times W^{l}},\\ &F_{reg}\gets \phi_{reg}^{lph}\left(F_d\right), \quad F_{reg}\in \mathbb{R}^{2\times H^{l}\times W^{l}},\\ &F_{cls}\gets \phi_{cls}^{lph}\left(F_d\right), \quad F_{cls}\in \mathbb{R}^{H^{l}\times W^{l}}. \end{aligned} \label{lph equ} \end{equation} The regression branch aims to propose lane anchors by predicting two parameters $F_{reg} \equiv \left\{\theta_{j}, r^{l}_{j}\right\}_{j=1}^{H^{l}\times W^{l}}$ within the local polar coordinate system. These parameters represent the angle and the radius of each anchor. The classification branch predicts the heat map $F_{cls}\equiv \left\{c_j\right\}_{j=1}^{H^l\times W^l}$ of the local poles. By discarding local poles with lower confidence, the module increases the likelihood of selecting potential positive foreground lane anchors while removing background lane anchors to the greatest extent. 
To keep the design simple, the regression branch $\phi_{reg}^{lph}\left(\cdot\right)$ consists of one $1\times1$ convolutional layer, while the classification branch $\phi_{cls}^{lph}\left(\cdot\right)$ consists of two $1\times1$ convolutional layers. \textbf{Loss Function.} Once the regression and classification labels are established as shown in Fig. \ref{lphlabel}, the LPH can be trained using the smooth-L1 loss $d\left(\cdot\right)$ for regression and the binary cross-entropy loss $BCE\left(\cdot, \cdot\right)$ for classification. The LPH loss function is defined as follows: \begin{equation} \begin{aligned} \mathcal{L}_{lph}^{cls}&=BCE\left(F_{cls},F_{gt}\right), \\ \mathcal{L}_{lph}^{reg}&=\frac{1}{N_{lph}^{pos}}\sum_{j\in \left\{j|\hat{r}_j^{l}<\tau_{l}\right\}}{\left(d\left(\theta_j-\hat{\theta}_j\right) +d\left(r_j^{l}-\hat{r}_j^{l}\right)\right)}, \end{aligned} \label{loss_lph} \end{equation} where $\hat{\theta}_j$ and $\hat{r}_j^{l}$ denote the ground truth labels of the $j$-th local pole, so that the regression loss is averaged over the $N_{lph}^{pos}$ positive local poles only. \textbf{Top-$K_{a}$ Anchor Selection.} During the training stage, all $H^{l}\times W^{l}$ anchors are considered as candidate anchors and fed into the R-CNN module. This approach helps the R-CNN module to learn from sufficient features of negative (background) anchor samples. In the evaluation stage, however, only the top-$K_{a}$ anchors with the highest confidence scores are selected and fed into the R-CNN module. This strategy is designed to filter out potential negative (background) anchors and reduce the computational complexity of the R-CNN module. By doing so, it maintains the adaptability and flexibility of anchor distribution while decreasing the total number of anchors. The following experiments will demonstrate the effectiveness of our top-$K_{a}$ anchor selection strategy. \subsection{Global Polar Head} The global polar head (GPH) is a crucial component in the second stage of Polar R-CNN. It takes lane anchor pooling features as input and predicts the precise lane location and confidence. Fig. \ref{gph} illustrates the structure and pipeline of GPH. 
The GPH comprises an RoI pooling module and three subheads (the triplet head module), which are introduced in detail below.

\textbf{RoI Pooling Module.} The RoI pooling module transforms the features sampled from lane anchors into a standard feature tensor. Once the local polar parameters of a lane anchor are given, they can be converted to global polar coordinates using the following equation:
\begin{equation}
\begin{aligned}
r^{g}_{j}=r^{l}_{j}+\left( \textbf{c}^{l}_{j}-\textbf{c}^{g} \right) ^{T}\left[\cos\theta_{j}; \sin\theta_{j} \right],
\end{aligned}
\end{equation}
where $\textbf{c}^{l}_{j} \in \mathbb{R}^{2}$ and $\textbf{c}^{g} \in \mathbb{R}^{2}$ represent the Cartesian coordinates of the $j$-th local pole and the global pole, respectively. The angle $\theta_{j}$ is unchanged by the conversion. Next, feature points are sampled on the lane anchor. The $y$-coordinates of these points are uniformly sampled along the vertical direction of the image, as previously mentioned. The $x_{i}$ coordinates are computed in the global polar coordinate system with the following equation:
\begin{equation}
\begin{aligned}
x_{i}=-y_i\tan \theta +\frac{r^{g}}{\cos \theta}.
\end{aligned}
\end{equation}
\begin{figure}[t]
\centering
\includegraphics[width=\linewidth]{thesis_figure/detection_head.png}
\caption{The main architecture of GPH.}
\label{gph}
\end{figure}
Suppose that $P_{0}$, $P_{1}$ and $P_{2}$ denote the last three levels of the FPN and that $\boldsymbol{F}_{L}^{s}\in \mathbb{R} ^{N_p\times d_f}$ represents the sample-point features extracted from level $P_{L}$. The grid features from the three levels are extracted and fused together without the cross-layer cascade refinement used in CLRNet.
To reduce the number of parameters, we employ a weighted-sum strategy to combine features from different levels (indexed by $L$), similar to \cite{detr} but in a more compact form:
\begin{equation}
\begin{aligned}
\boldsymbol{F}^s=\sum_{L=0}^2{\boldsymbol{F}_{L}^{s}\times \frac{e^{\boldsymbol{w}_{L}^{s}}}{\sum_{L'=0}^2{e^{\boldsymbol{w}_{L'}^{s}}}}},
\end{aligned}
\end{equation}
where $\boldsymbol{w}_{L}^{s}\in \mathbb{R} ^{N_p}$ is a learnable aggregation weight. Instead of concatenating the three sampled features into a tensor in $\mathbb{R} ^{N_p\times d_f\times 3}$, the adaptive summation keeps the fused feature at $\boldsymbol{F}^s\in \mathbb{R} ^{N_p\times d_f}$, which is one third of the concatenated size. The weighted-sum tensors are then fed into fully connected layers to obtain the pooled RoI features of an anchor:
\begin{equation}
\begin{aligned}
\boldsymbol{F}^{roi}\gets FC_{pooling}\left( \boldsymbol{F}^s \right), \boldsymbol{F}^{roi}\in \mathbb{R} ^{d_r}.
\end{aligned}
\end{equation}

\textbf{Triplet Head.} The triplet head comprises three distinct heads: the one-to-one classification (O2O cls) head, the one-to-many classification (O2M cls) head, and the one-to-many regression (O2M reg) head. In various studies \cite{laneatt,clrnet,adnet,srlane}, the detection head predominantly follows the one-to-many paradigm. During the training phase, multiple positive samples are assigned to a single ground truth. Consequently, during the evaluation stage, redundant detection results are often predicted for each instance. These redundancies are typically addressed using NMS, which eliminates duplicate results and retains the highest-confidence detection for each ground truth. However, NMS relies on the definition of a distance between detection results, and this calculation can be complex for curved lanes and other irregular geometric shapes.
To achieve non-redundant detection results with an NMS-free paradigm, the one-to-one paradigm becomes crucial during training, as highlighted in \cite{o2o}. Nevertheless, merely adopting the one-to-one paradigm is insufficient; the structure of the detection head also plays a pivotal role in achieving NMS-free detection. This aspect is analyzed below.

\textbf{NMS vs NMS-free.} Let $\boldsymbol{F}^{roi}_{i}$ denote the RoI features extracted from the $i$-th anchor, which serve as the input to the three subheads. For now, let us focus on the O2M classification (O2M cls) head and the O2M regression (O2M reg) head, which follow the conventional paradigm used in previous work and serve as a baseline for the new one-to-one paradigm. To maintain simplicity and rigor, both the O2M classification head and the O2M regression head consist of two layers with activation functions, featuring a plain structure without any complex mechanisms such as attention or deformable convolution. As previously mentioned, merely replacing the one-to-many label assignment with one-to-one label assignment is insufficient for eliminating NMS post-processing. This is because anchors often exhibit significant overlap or are positioned very close to each other, as shown in Fig. \ref{anchor setting}(b)\&(c). Let $\boldsymbol{F}^{roi}_{i}$ and $\boldsymbol{F}^{roi}_{j}$ represent the features from two overlapping (or very close) anchors, so that $\boldsymbol{F}^{roi}_{i}$ and $\boldsymbol{F}^{roi}_{j}$ are almost identical. Let $f_{cls}^{plain}$ denote the neural structure used in the O2M classification head, and suppose it is trained with one-to-one label assignment.
If $\boldsymbol{F}^{roi}_{i}$ is a positive sample and $\boldsymbol{F}^{roi}_{j}$ is a negative sample, the ideal output should be as follows:
\begin{equation}
\begin{aligned}
&\boldsymbol{F}_{i}^{roi}\approx \boldsymbol{F}_{j}^{roi}, \\
&f_{cls}^{plain}\left( \boldsymbol{F}_{i}^{roi} \right) \rightarrow 1, \\
&f_{cls}^{plain}\left( \boldsymbol{F}_{j}^{roi} \right) \rightarrow 0.
\end{aligned}
\label{sharp fun}
\end{equation}
Eq. (\ref{sharp fun}) suggests that $f_{cls}^{plain}$ needs to be ``sharp'' enough to differentiate between two similar features. In other words, the output of $f_{cls}^{plain}$ must change rapidly over a small distance in feature space, which implies that $f_{cls}^{plain}$ needs to capture high-frequency information. This issue is also discussed in \cite{o3d}. Capturing high-frequency information with a plain structure is difficult because a naive MLP tends to capture low-frequency information \cite{xu2022overview}. In the most extreme case, where $\boldsymbol{F}_{i}^{roi} = \boldsymbol{F}_{j}^{roi}$, it becomes impossible to separate the two anchors into a positive and a negative sample; in practice, both confidences converge to around 0.5. This problem arises from the limitations of the input format and the structure of the naive MLP, which restrict its capacity to express high-frequency information. Therefore, it is crucial to establish relationships between anchors and to design a new model structure that can effectively represent ``sharp'' information.

It is easy to see that the ``ideal'' one-to-one branch is equivalent to the O2M classification branch combined with O2M regression and NMS post-processing. If NMS could be replaced by an equivalent but learnable function (e.g., a neural network with a specific structure), the O2O head could be trained to handle the one-to-one assignment. However, NMS involves sequential iteration and confidence sorting, which are challenging to reproduce with a neural network.
Although previous works, such as RNN-based approaches \cite{stewart2016end}, utilize an iterative format, they are time-consuming and introduce additional complexity into the model training process due to their iterative nature. To eliminate the iteration process, we proposed a equivalent format of Fast NMS\cite{yolact}. \begin{algorithm}[t] \caption{The Algorithm of the Graph-based Fast NMS} \begin{algorithmic}[1] %这个1 表示每一行都显示数字 \REQUIRE ~~\\ %算法的输入参数:Input The index of positive predictions, $1, 2, ..., i, ..., N_{pos}$;\\ The positive corresponding anchors, $[\theta_i, r_{i}^{global}]$;\\ The x axis of sampling points from positive anchors, $\boldsymbol{x}_{i}^{b}$;\\ The positive confidence get from o2m classification head, $s_i$;\\ The positive regressions get from o2m regression head, the horizontal offset $\varDelta \boldsymbol{x}_{i}^{roi}$ and end point location $\boldsymbol{e}_{i}$.\\ \ENSURE ~~\\ %算法的输出:Output \STATE Calculate the confidential adjacent matrix $\boldsymbol{C} \in \mathbb{R} ^{N_{pos} \times N_{pos}} $, where the element $C_{ij}$ in $\boldsymbol{C}$ is caculate as follows: \begin{equation} \begin{aligned} C_{ij}=\begin{cases} 1, s_i0$, the value range of GLaneIoU is $\left(-g, 1 \right]$. We then define the cost function between $i_{th}$ prediction and $j_{th}$ ground truth as follows like \cite{detr}: \begin{equation} \begin{aligned} \mathcal{C} _{ij}=\left(s_i\right)^{\beta_c}\times \left( GLaneIoU_{ij, g=0} \right) ^{\beta_r}. \end{aligned} \end{equation} This cost function is more compact than those in previous works\cite{clrnet}\cite{adnet} and takes both location and confidence into account. For label assignment, SimOTA (with k=4) \cite{yolox} is used for the two O2M heads with one-to-many assignment, while the Hungarian \cite{detr} algorithm is employed for the O2O classification head for one-to-one assignment. 
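
As a numerical illustration of this cost (the values of $s_i$ and GLaneIoU below are hypothetical and chosen only for illustration), take the exponents $\beta_c=1$ and $\beta_r=6$ reported in the implementation details and compare two candidate predictions for the same ground truth $j$:
\begin{equation*}
\begin{aligned}
s_1=0.9,\ GLaneIoU_{1j, g=0}=0.8:&\quad \mathcal{C}_{1j}=0.9\times 0.8^{6}\approx 0.236,\\
s_2=0.6,\ GLaneIoU_{2j, g=0}=0.9:&\quad \mathcal{C}_{2j}=0.6\times 0.9^{6}\approx 0.319.
\end{aligned}
\end{equation*}
Although the first prediction is more confident, the second one obtains the larger value of $\mathcal{C}_{ij}$, because the exponent $\beta_r=6$ amplifies differences in overlap relative to differences in confidence.
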
\begin{figure}[t]
\centering
\includegraphics[width=\linewidth]{thesis_figure/auxloss.png}
\caption{Auxiliary loss for segment parameter regression.}
\label{auxloss}
\end{figure}

\textbf{Loss function.} We use the focal loss \cite{focal} for the O2O classification head and the O2M classification head:
\begin{equation}
\begin{aligned}
\mathcal{L} _{o2m}^{cls}&=-\sum_{i\in \varOmega _{pos}^{o2m}}{\alpha _{o2m}\left( 1-s_i \right) ^{\gamma}\log \left( s_i \right)}\\&-\sum_{i\in \varOmega _{neg}^{o2m}}{\left( 1-\alpha _{o2m} \right) \left( s_i \right) ^{\gamma}\log \left( 1-s_i \right)}, \\
\mathcal{L} _{o2o}^{cls}&=-\sum_{i\in \varOmega _{pos}^{o2o}}{\alpha _{o2o}\left( 1-\tilde{s}_i \right) ^{\gamma}\log \left( \tilde{s}_i \right)}\\&-\sum_{i\in \varOmega _{neg}^{o2o}}{\left( 1-\alpha _{o2o} \right) \left( \tilde{s}_i \right) ^{\gamma}\log \left( 1-\tilde{s}_i \right)},
\end{aligned}
\end{equation}
where the one-to-one sample sets, $\varOmega _{pos}^{o2o}$ and $\varOmega _{neg}^{o2o}$, are restricted to the samples that the O2M classification head predicts as positive:
\begin{equation}
\begin{aligned}
\varOmega _{pos}^{o2o}\cup \varOmega _{neg}^{o2o}=\left\{ i|s_i>C_{o2m} \right\}.
\end{aligned}
\end{equation}
That is, only samples whose O2M confidence is larger than $C_{o2m}$ are chosen as candidate samples for the O2O classification head. Following \cite{pss}, to maintain feature quality during the training stage, the gradient of the O2O classification head is stopped from propagating back to the rest of the network (it is detached at the RoI feature $\boldsymbol{F}_{i}^{roi}$ of each anchor). Additionally, we use a rank loss to increase the gap between the positive and negative confidences of the O2O classification head:
\begin{equation}
\begin{aligned}
&\mathcal{L} _{rank}=\frac{1}{N_{rank}}\sum_{i\in \varOmega _{pos}^{o2o}}{\sum_{j\in \varOmega _{neg}^{o2o}}{\max \left( 0, \tau _{rank}-\tilde{s}_i+\tilde{s}_j \right)}},\\
&N_{rank}=\left| \varOmega _{pos}^{o2o} \right|\left| \varOmega _{neg}^{o2o} \right|.
\end{aligned}
\end{equation}
We directly use the GLaneIoU loss $\mathcal{L}_{GLaneIoU}$ (with $g=1$; written as $\mathcal{L}_{IoU}$ in the overall loss) to regress the horizontal offsets, and the smooth-L1 loss, denoted as $\mathcal{L} _{end}$, to regress the end points (namely, the $y$-coordinates of the start point and the end point). To help the model learn global features, we propose the auxiliary loss illustrated in Fig. \ref{auxloss}:
\begin{align}
\begin{aligned}
\mathcal{L}_{aux} &= \frac{1}{\left| \varOmega_{pos}^{o2m} \right| N_{seg}} \sum_{i \in \varOmega_{pos}^{o2o}} \sum_{m=j}^k \Bigg[ l \left( \theta_i - \hat{\theta}_{i}^{seg,m} \right) \\
&\quad + l \left( r_{i}^{global} - \hat{r}_{i}^{seg,m} \right) \Bigg].
\end{aligned}
\end{align}
The anchors and ground truths are divided into several segments. Each anchor segment is regressed to the main components of the corresponding segment of the assigned ground truth. This assists the anchors in learning more about the global geometric shape.

\subsection{Overall Loss Function}
The overall loss function of Polar R-CNN is given as follows:
\begin{equation}
\begin{aligned}
\mathcal{L}_{overall} &=\mathcal{L} _{lph}^{cls}+w_{lph}^{reg}\mathcal{L} _{lph}^{reg}\\&+w_{o2m}^{cls}\mathcal{L} _{o2m}^{cls}+w_{o2o}^{cls}\mathcal{L} _{o2o}^{cls}+w_{rank}\mathcal{L} _{rank}\\&+w_{IoU}\mathcal{L} _{IoU}+w_{end}\mathcal{L} _{end}+w_{aux}\mathcal{L} _{aux}.
\end{aligned}
\end{equation}
The first line in the loss function represents the loss for the LPH, which includes both classification and regression components. The second line contains the losses associated with the two classification heads (O2M and O2O), including the rank loss, while the third line represents the losses for the regression head within the triplet head. The weighting factors balance the contribution of each component to the gradient. The entire training process is end-to-end.
\begin{table*}[htbp]
\centering
\caption{Dataset information and hyperparameters for the five datasets.
For CULane, $*$ denotes the actual number of training samples used to train our model. Labels for some validation/test sets are unavailable, so different splits (test or validation set) are used for different datasets.}
\begin{adjustbox}{width=\linewidth}
\begin{tabular}{l|l|ccccc}
\toprule
\multicolumn{2}{c|}{\textbf{Dataset}} & CULane & TuSimple & LLAMAS & DL-Rail & CurveLanes \\
\midrule
\multirow{7}*{Dataset Description}
& Train &88,880/$55,698^{*}$&3,268 &58,269&5,435&100,000\\
& Validation &9,675 &358 &20,844&- &20,000 \\
& Test &34,680&2,782 &20,929&1,569&- \\
& Resolution &$1640\times590$&$1280\times720$&$1276\times717$&$1920\times1080$&$2560\times1440$, etc\\
& Lane &$\leqslant4$&$\leqslant5$&$\leqslant4$&$=2$&$\leqslant10$\\
& Environment &urban and highway & highway&highway&railway&urban and highway\\
& Distribution &sparse&sparse&sparse&sparse&sparse and dense\\
\midrule
\multirow{2}*{Dataset Split}
& Evaluation &Test&Test&Test&Test&Val\\
& Visualization &Test&Test&Val&Test&Val\\
\midrule
\multirow{1}*{Data Preprocess}
& Crop Height &270&160&300&560&640, etc\\
\midrule
\multirow{5}*{Training Hyperparameter}
& Epoch Number &32&70&20&90&32\\
& Batch Size &40&24&32&40&40\\
& Warm up iterations &800&200&800&400&800\\
& $w_{aux}$ &0.2&0 &0.2&0.2&0.2\\
& $w_{rank}$ &0.7&0.7&0.1&0.7&0 \\
\midrule
\multirow{4}*{Evaluation Hyperparameter}
& $H^{l}\times W^{l}$ &$4\times10$&$4\times10$&$4\times10$&$4\times10$&$6\times13$\\
& $K_{a}$ &20&20&20&12&50\\
& $C_{o2m}$ &0.48&0.40&0.40&0.40&0.45\\
& $C_{o2o}$ &0.46&0.46&0.46&0.46&0.44\\
\bottomrule
\end{tabular}
\end{adjustbox}
\label{dataset_info}
\end{table*}

\section{Experiment}
\subsection{Dataset and Evaluation Metric}
We conduct experiments on four widely used lane detection benchmarks and one rail detection dataset: CULane\cite{scnn}, TuSimple\cite{tusimple}, LLAMAS\cite{llamas}, CurveLanes\cite{curvelanes}, and DL-Rail\cite{dalnet}.
Among these datasets, CULane and CurveLanes are particularly challenging. The CULane dataset consists of various scenarios but has sparse lane distributions, whereas CurveLanes includes a large number of curved and dense lane types, such as forked and double lanes. The DL-Rail dataset, focused on rail detection across different scenarios, is chosen to evaluate our model's performance beyond traditional lane detection. The details of the five datasets are shown in Table \ref{dataset_info}.

We use the F1-score to evaluate our model on the CULane, LLAMAS, DL-Rail, and CurveLanes datasets, maintaining consistency with previous works. The F1-score is defined as follows:
\begin{equation}
\begin{aligned}
F1&=\frac{2\times Precision\times Recall}{Precision+Recall}, \\
Precision&=\frac{TP}{TP+FP}, \\
Recall&=\frac{TP}{TP+FN}.
\end{aligned}
\end{equation}
In our experiments, we use different IoU thresholds to calculate the F1-score for different datasets: F1@50 and F1@75 for CULane \cite{clrnet}, F1@50 for LLAMAS \cite{clrnet} and CurveLanes \cite{CondLaneNet}, and F1@50, F1@75, and mF1 for DL-Rail \cite{dalnet}. The mF1 is defined as:
\begin{equation}
\begin{aligned}
mF1=\left( F1@50+F1@55+\cdots+F1@95 \right) /10.
\end{aligned}
\end{equation}
For TuSimple, the evaluation is formulated as follows:
\begin{equation}
\begin{aligned}
Accuracy=\frac{\sum{C_{clip}}}{\sum{S_{clip}}},
\end{aligned}
\end{equation}
where $C_{clip}$ and $S_{clip}$ represent the number of correct points (predicted points within 20 pixels of the ground truth) and the number of ground truth points, respectively. A prediction is considered correct if its accuracy exceeds 85\%. TuSimple also reports the False Positive Rate (FPR $=1-$Precision) and the False Negative Rate (FNR $=1-$Recall).

\subsection{Implementation Details}
All input images are cropped and resized to $800\times320$. Similar to \cite{clrnet}, we apply random affine transformations and random horizontal flips.
For the optimization process, we use the AdamW \cite{adam} optimizer with a learning rate warm-up and a cosine decay strategy. The initial learning rate is set to 0.006. The numbers of sampled points and regression points for each lane anchor are set to 36 and 72, respectively. The power coefficients of the cost function, $\beta_{c}$ and $\beta_{r}$, are set to 1 and 6, respectively. We set different base semi-widths, denoted as $w_{b}^{assign}$, $w_{b}^{cost}$ and $w_{b}^{loss}$, for label assignment, cost function and loss function, respectively, following previous work\cite{clrernet}. Other parameters, such as batch size and loss weights for each dataset, are detailed in Table \ref{dataset_info}. Since some test/validation sets for the five datasets are not accessible, the test/validation sets used are also listed in Table \ref{dataset_info}. All the experiments are conducted on a single NVIDIA A100-40G GPU. To keep our model simple, we only use CNN-based backbones, namely ResNet\cite{resnet} and DLA34\cite{dla}.
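
The warm-up and cosine decay schedule mentioned above can be sketched as follows. The exact form is an assumption made here for illustration (linear warm-up over $T_w$ iterations, then cosine decay to zero at the final iteration $T$); only the initial learning rate $\eta_0=0.006$ and the warm-up lengths in Table \ref{dataset_info} are taken from the settings above:
\begin{equation*}
\eta_t=\begin{cases}
\eta_0\, t/T_w, & t<T_w,\\
\dfrac{\eta_0}{2}\left( 1+\cos \left( \pi \dfrac{t-T_w}{T-T_w} \right) \right), & T_w\leqslant t\leqslant T.
\end{cases}
\end{equation*}
Under this assumed form, with $T_w=800$ (the CULane setting), the learning rate is $0.003$ halfway through the warm-up ($t=400$) and peaks at $0.006$ at $t=800$ before decaying.
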
\begin{table*}[htbp]
\centering
\caption{Comparison results on CULane test set with other methods.}
\normalsize
\begin{adjustbox}{width=\linewidth}
\begin{tabular}{lrlllllllllll}
\toprule
\textbf{Method}& \textbf{Backbone}&\textbf{F1@50}$\uparrow$& \textbf{F1@75}$\uparrow$& \textbf{Normal}$\uparrow$&\textbf{Crowded}$\uparrow$&\textbf{Dazzle}$\uparrow$&\textbf{Shadow}$\uparrow$&\textbf{No line}$\uparrow$& \textbf{Arrow}$\uparrow$& \textbf{Curve}$\uparrow$& \textbf{Cross}$\downarrow$ & \textbf{Night}$\uparrow$ \\
\hline
\textbf{Seg \& Grid} \\
\cline{1-1}
SCNN\cite{scnn} &VGG-16 &71.60&39.84&90.60&69.70&58.50&66.90&43.40&84.10&64.40&1900&66.10\\
RESA\cite{resa} &ResNet50 &75.30&53.39&92.10&73.10&69.20&72.80&47.70&83.30&70.30&1503&69.90\\
LaneAF\cite{laneaf} &DLA34 &77.41&- &91.80&75.61&71.78&79.12&51.38&86.88&72.70&1360&73.03\\
UFLDv2\cite{ufldv2} &ResNet34 &76.0 &- &92.5 &74.8 &65.5 &75.5 &49.2 &88.8 &70.1 &1910&70.8 \\
CondLaneNet\cite{CondLaneNet} &ResNet101&79.48&61.23&93.47&77.44&70.93&80.91&54.13&90.16&75.21&1201&74.80\\
\cline{1-1}
\textbf{Parameter} \\
\cline{1-1}
BézierLaneNet\cite{bezierlanenet} &ResNet18&73.67&-&90.22&71.55&62.49&70.91&45.30&84.09&58.98&\textbf{996} &68.70\\
BSNet\cite{bsnet} &DLA34 &80.28&-&93.87&78.92&75.02&82.52&54.84&90.73&74.71&1485&75.59\\
Eigenlanes\cite{eigenlanes} &ResNet50&77.20&-&91.7 &76.0 &69.8 &74.1 &52.2 &87.7 &62.9 &1509&71.8 \\
\cline{1-1}
\textbf{Keypoint} \\
\cline{1-1}
CurveLanes-NAS-L\cite{curvelanes} &- &74.80&-&90.70&72.30&67.70&70.10&49.40&85.80&68.40&1746&68.90\\
FOLOLane\cite{fololane} &ResNet18 &78.80&-&92.70&77.80&75.20&79.30&52.10&89.00&69.40&1569&74.50\\
GANet-L\cite{ganet} &ResNet101&79.63&-&93.67&78.66&71.82&78.32&53.38&89.86&77.37&1352&73.85\\
\cline{1-1}
\textbf{Dense Anchor} \\
\cline{1-1}
LaneATT\cite{laneatt} &ResNet18 &75.13&51.29&91.17&72.71&65.82&68.03&49.13&87.82&63.75&1020&68.58\\
LaneATT\cite{laneatt} &ResNet122&77.02&57.50&91.74&76.16&69.47&76.31&50.46&86.29&64.05&1264&70.81\\
CLRNet\cite{clrnet} &ResNet18 &79.58&62.21&93.30&78.33&73.71&79.66&53.14&90.25&71.56&1321&75.11\\
CLRNet\cite{clrnet} &DLA34 &80.47&62.78&93.73&79.59&75.30&82.51&54.58&90.62&74.13&1155&75.37\\
CLRerNet\cite{clrernet} &DLA34 &81.12&64.07&94.02&80.20&74.41&\textbf{83.71}&56.27&90.39&74.67&1161&\textbf{76.53}\\
\cline{1-1}
\textbf{Sparse Anchor} \\
\cline{1-1}
ADNet \cite{adnet} &ResNet34&78.94&-&92.90&77.45&71.71&79.11&52.89&89.90&70.64&1499&74.78\\
SRLane \cite{srlane} &ResNet18&79.73&-&93.52&78.58&74.13&81.90&55.65&89.50&75.27&1412&74.58\\
Sparse Laneformer\cite{sparse} &ResNet50&77.83&-&- &- &- &- &- &- &- &- &- \\
\hline
\textbf{Proposed Method} \\
\cline{1-1}
Polar R-CNN-NMS &ResNet18&80.81&63.97&94.12&79.57&76.53&83.33&55.10&90.70&79.50&1088&75.25\\
Polar R-CNN &ResNet18&80.81&63.96&94.12&79.57&76.53&83.33&55.06&90.62&79.50&1088&75.25\\
Polar R-CNN &ResNet34&80.92&63.97&94.24&79.76&76.70&81.93&55.40&\textbf{91.12}&79.85&1158&75.71\\
Polar R-CNN &ResNet50&81.34&64.77&94.45&\textbf{80.42}&75.82&83.61&56.62&91.10&80.05&1356&75.94\\
Polar R-CNN-NMS &DLA34 &\textbf{81.49}&64.96&\textbf{94.44}&80.36&\textbf{76.79}&83.68&56.52&90.85&\textbf{80.09}&1133&76.32\\
Polar R-CNN &DLA34 &\textbf{81.49}&\textbf{64.97}&\textbf{94.44}&80.36&\textbf{76.79}&83.68&\textbf{56.55}&90.81&\textbf{79.80}&1133&76.33\\
\bottomrule
\end{tabular}
\end{adjustbox}
\label{culane result}
\end{table*}
\begin{table}[h]
\centering
\caption{Comparison results on TuSimple test set with other methods.}
\begin{adjustbox}{width=\linewidth}
\begin{tabular}{lrcccc}
\toprule
\textbf{Method}& \textbf{Backbone}& \textbf{Acc(\%)}&\textbf{F1(\%)}&\textbf{FP(\%)}&\textbf{FN(\%)} \\
\midrule
SCNN\cite{scnn} &VGG16 &96.53&95.97&6.17&\textbf{1.80}\\
PolyLanenet\cite{polylanenet}&EfficientNetB0&93.36&90.62&9.42&9.33\\
UFLDv2\cite{ufld} &ResNet34 &88.08&95.73&18.84&3.70\\
LaneATT\cite{laneatt} &ResNet34 &95.63&96.77&3.53&2.92\\
FOLOLane\cite{fololane} &ERFNet &\textbf{96.92}&96.59&4.47&2.28\\
CondLaneNet\cite{CondLaneNet}&ResNet101 &96.54&97.24&2.01&3.50\\
CLRNet\cite{clrnet} &ResNet18 &96.84&97.89&2.28&1.92\\
\midrule
Polar R-CNN-NMS &ResNet18&96.21&\textbf{97.98}&2.17&1.86\\
Polar R-CNN &ResNet18&96.20&97.94&2.25&1.87\\
\bottomrule
\end{tabular}
\end{adjustbox}
\label{tusimple result}
\end{table}
\begin{table}[h]
\centering
\caption{Comparison results on LLAMAS test set with other methods.}
\begin{adjustbox}{width=\linewidth}
\begin{tabular}{lrcccc}
\toprule
\textbf{Method}& \textbf{Backbone}&\textbf{F1@50(\%)}&\textbf{Precision(\%)}&\textbf{Recall(\%)} \\
\midrule
SCNN\cite{scnn} &ResNet34&94.25&94.11&94.39\\
BézierLaneNet\cite{bezierlanenet} &ResNet34&95.17&95.89&94.46\\
LaneATT\cite{laneatt} &ResNet34&93.74&96.79&90.88\\
LaneAF\cite{laneaf} &DLA34 &96.07&\textbf{96.91}&95.26\\
DALNet\cite{dalnet} &ResNet18&96.12&96.83&95.42\\
CLRNet\cite{clrnet} &DLA34 &96.12&- &- \\
\midrule
Polar R-CNN-NMS &ResNet18&96.05&96.80&95.32\\
Polar R-CNN &ResNet18&96.06&96.81&95.32\\
Polar R-CNN-NMS &DLA34&96.13&96.80&\textbf{95.47}\\
Polar R-CNN &DLA34&\textbf{96.14}&96.82&\textbf{95.47}\\
\bottomrule
\end{tabular}
\end{adjustbox}
\label{llamas result}
\end{table}
\begin{table}[h]
\centering
\caption{Comparison results on DL-Rail test set with other methods.}
\begin{adjustbox}{width=\linewidth}
\begin{tabular}{lrccc}
\toprule
\textbf{Method}& \textbf{Backbone}&\textbf{mF1(\%)}&\textbf{F1@50(\%)}&\textbf{F1@75(\%)} \\
\midrule
BézierLaneNet\cite{bezierlanenet} &ResNet18&42.81&85.13&38.62\\
GANet-S\cite{ganet} &ResNet18&57.64&95.68&62.01\\
CondLaneNet\cite{CondLaneNet} &ResNet18&52.37&95.10&53.10\\
UFLDv1\cite{ufld} &ResNet34&53.76&94.78&57.15\\
LaneATT(with RPN)\cite{dalnet} &ResNet18&55.57&93.82&58.97\\
DALNet\cite{dalnet} &ResNet18&59.79&96.43&65.48\\
\midrule
Polar R-CNN-NMS &ResNet18&\textbf{61.53}&\textbf{97.01}&\textbf{67.86}\\
Polar R-CNN &ResNet18&61.52&96.99&67.85\\
\bottomrule
\end{tabular}
\end{adjustbox}
\label{dlrail result}
\end{table}
\begin{table}[h]
\centering
\caption{Comparison results on CurveLanes validation set with other methods.}
\begin{adjustbox}{width=\linewidth}
\begin{tabular}{lrcccc}
\toprule
\textbf{Method}& \textbf{Backbone}&\textbf{F1@50 (\%)}&\textbf{Precision (\%)}&\textbf{Recall (\%)} \\
\midrule
SCNN\cite{scnn} &VGG16 &65.02&76.13&56.74\\
Enet-SAD\cite{enetsad} &- &50.31&63.60&41.60\\
PointLanenet\cite{pointlanenet} &ResNet101&78.47&86.33&72.91\\
CurveLane-S\cite{curvelanes} &- &81.12&93.58&71.59\\
CurveLane-M\cite{curvelanes} &- &81.80&93.49&72.71\\
CurveLane-L\cite{curvelanes} &- &82.29&91.11&75.03\\
UFLDv2\cite{ufldv2} &ResNet34 &81.34&81.93&80.76\\
CondLaneNet-M\cite{CondLaneNet} &ResNet34 &85.92&88.29&83.68\\
CondLaneNet-L\cite{CondLaneNet} &ResNet101&86.10&88.98&83.41\\
CLRNet\cite{clrnet} &DLA34 &86.10&91.40&81.39\\
CLRerNet\cite{clrernet} &DLA34 &86.47&91.66&81.83\\
\hline
Polar R-CNN &DLA34&\textbf{87.29}&90.50&\textbf{84.31}\\
\hline
\end{tabular}
\end{adjustbox}
\label{curvelanes result}
\end{table}

\subsection{Comparison with the state-of-the-art method}
The comparison results of our proposed model with other methods are shown in Tables \ref{culane result}, \ref{tusimple result}, \ref{llamas result}, \ref{dlrail result}, and \ref{curvelanes result}. We present results for two versions of our model: the NMS-based version, denoted as Polar R-CNN-NMS, and the NMS-free version, denoted as Polar R-CNN. The NMS-based version utilizes predictions obtained from the O2M head followed by NMS post-processing, while the NMS-free version derives predictions directly from the O2O classification head without NMS.

To ensure a fair comparison, we also include results for CLRerNet \cite{clrernet} on the CULane and CurveLanes datasets, as we use a similar training strategy and data split. As illustrated in the comparison results, our model demonstrates competitive performance across the five datasets.
Specifically, on the CULane, TuSimple, LLAMAS, and DL-Rail datasets (sparse scenarios), our model outperforms other anchor-based methods. Additionally, the performance of the NMS-free version is nearly identical to that of the NMS-based version, highlighting the effectiveness of the O2O head in eliminating redundant predictions. On the CurveLanes dataset, the NMS-free version achieves superior F1-measure and recall compared to both NMS-based and segment\&grid-based methods.

We also compare the number of anchors and processing speed with other methods. Fig. \ref{anchor_num_method} illustrates the number of anchors used by several anchor-based methods on CULane. Our proposed model utilizes the fewest proposal anchors (20 anchors) while achieving the highest F1-score on CULane. It remains competitive with state-of-the-art methods like CLRerNet, which uses 192 anchors and a cross-layer refinement strategy. Conversely, Sparse Laneformer, which also uses 20 anchors, does not achieve optimal performance. It is important to note that our model is designed with a simpler structure without additional refinement, indicating that the design of flexible anchors is crucial for performance in sparse scenarios. Furthermore, due to its simple structure and fewer anchors, our model exhibits lower latency than most methods, as shown in Fig. \ref{speed_method}. The combination of fast processing speed and a straightforward architecture makes our model highly deployable.

\subsection{Ablation Study and Visualization}
To validate and analyze the effectiveness and influence of the different components of Polar R-CNN, we conduct several ablation experiments on the CULane and CurveLanes datasets.

\textbf{Ablation study on polar coordinate system and anchor number.} To assess the importance of the local polar coordinates of anchors, we examine the contribution of each component (i.e., angle and radius) to model performance.
As shown in Table \ref{aba_lph}, both angle and radius contribute to performance to varying degrees. Additionally, we conduct experiments with the auxiliary loss using fixed anchors and Polar R-CNN. Fixed anchors refer to using the anchor settings trained by CLRNet, as illustrated in Fig. \ref{anchor setting}(b). With the auxiliary loss, F1@50 improves by 0.48\% and 0.30\% under the fixed anchor paradigm and the proposal anchor paradigm, respectively.

We also explore the effect of different local polar map sizes on our model, as illustrated in Fig. \ref{anchor_num_testing}. The overall F1 measure improves as the local polar map size increases and tends to stabilize when the size is sufficiently large. Specifically, precision improves, while recall decreases. A larger polar map size includes more background anchors in the second stage (since we choose k=4 for SimOTA, there are no more than four positive samples for each ground truth). Consequently, the model learns from more negative samples, enhancing precision but reducing recall. Regarding the number of anchors chosen during the evaluation stage, recall and F1 measure increase significantly while the anchor number is small but stabilize as it grows further. This suggests that eliminating some anchors does not significantly affect performance. Fig. \ref{cam} displays the heat map and the distribution of the top-$K_{a}$ selected anchors in sparse scenarios. Brighter colors indicate a higher likelihood of anchors being foreground anchors. It is evident that most of the proposed anchors are clustered around the lane ground truth.
\begin{figure}[t]
\centering
\includegraphics[width=\linewidth]{thesis_figure/anchor_num_method.png}
\caption{Anchor numbers vs F1@50 of different methods on CULane lane detection benchmark.}
\label{anchor_num_method}
\end{figure}
\begin{figure}[t]
\centering
\includegraphics[width=\linewidth]{thesis_figure/speed_method.png}
\caption{Latency vs F1@50 of different methods on CULane lane detection benchmark.}
\label{speed_method}
\end{figure}
\begin{table}[h]
\centering
\caption{Ablation study of anchor proposal strategies.}
\begin{adjustbox}{width=\linewidth}
\begin{tabular}{c|ccc|cc}
\toprule
\textbf{Anchor strategy}&\textbf{Local R}& \textbf{Local Angle}&\textbf{Auxloss}&\textbf{F1@50 (\%)}&\textbf{F1@75 (\%)}\\
\midrule
\multirow{2}*{Fixed}
&- &- & &79.90 &60.98\\
&- &- &\checkmark&80.38 &62.35\\
\midrule
\multirow{5}*{Proposal}
& & & &75.85 &58.97\\
&\checkmark& & &78.46 &60.32\\
& &\checkmark& &80.31 &62.13\\
&\checkmark&\checkmark& &80.51 &63.38\\
&\checkmark&\checkmark&\checkmark&\textbf{80.81}&\textbf{63.97}\\
\bottomrule
\end{tabular}
\end{adjustbox}
\label{aba_lph}
\end{table}
\begin{figure*}[t]
\centering
\def\subwidth{0.325\textwidth}
\def\imgwidth{\linewidth}
\begin{subfigure}{\subwidth}
\includegraphics[width=\imgwidth]{thesis_figure/anchor_num/anchor_num_testing_p.png}
\end{subfigure}
\begin{subfigure}{\subwidth}
\includegraphics[width=\imgwidth]{thesis_figure/anchor_num/anchor_num_testing_r.png}
\end{subfigure}
\begin{subfigure}{\subwidth}
\includegraphics[width=\imgwidth]{thesis_figure/anchor_num/anchor_num_testing.png}
\end{subfigure}
\caption{F1@50 performance of different polar map sizes and different top-$K_{a}$ anchor selections on CULane test set.}
\label{anchor_num_testing}
\end{figure*}
\begin{figure}[t]
\centering
\def\subwidth{0.24\textwidth}
\def\imgwidth{\linewidth}
\def\imgheight{0.4\linewidth}
\begin{subfigure}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thesis_figure/heatmap/cam1.jpg}
\caption{}
\end{subfigure}
\begin{subfigure}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thesis_figure/heatmap/anchor1.jpg}
\caption{}
\end{subfigure}
\begin{subfigure}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thesis_figure/heatmap/cam2.jpg}
\caption{}
\end{subfigure}
\begin{subfigure}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thesis_figure/heatmap/anchor2.jpg}
\caption{}
\end{subfigure}
\caption{(a)\&(c): The heat map of the local polar map; (b)\&(d): The final anchor selection during the evaluation stage.}
\label{cam}
\end{figure}

\textbf{Ablation study on NMS-free block in sparse scenarios.} We conduct several experiments on the CULane dataset to evaluate the performance of the NMS-free head in sparse scenarios. As shown in Table \ref{aba_NMSfree_block}, without using the GNN to establish relationships between anchors, Polar R-CNN fails to achieve an NMS-free paradigm, even with one-to-one assignment. Furthermore, the classification matrix (cls matrix) proves crucial, indicating that conditional probability is effective. Other components, such as the neighbor matrix (provided as a geometric prior) and the rank loss, also contribute to the performance of the NMS-free block.

To compare the NMS-free paradigm with the traditional NMS paradigm, we perform experiments with the NMS-free block under both proposal and fixed anchor strategies. Table \ref{NMS vs NMS-free} presents the results of these experiments. Here, O2M-B refers to the O2M classification head, O2O-B refers to the O2O classification head with a plain structure, and O2O-G refers to the O2O classification head with the Polar GNN block. To assess the ability to eliminate redundant predictions, each head is evaluated both with and without NMS post-processing. The results show that NMS is necessary for the traditional O2M classification head.
In the fixed anchor paradigm, although the O2O classification head with a plain structure effectively eliminates redundant predictions, it yields a lower F1@50 than the proposed Polar GNN block (78.23\% vs. 80.27\% without NMS). In the proposal anchor paradigm, the O2O classification head with a plain structure fails to eliminate redundant predictions due to high anchor overlap and similar RoI features. Thus, the GNN structure is essential for Polar R-CNN in the NMS-free paradigm. In both the fixed and proposal anchor paradigms, the O2O classification head with the GNN structure successfully eliminates redundant predictions, indicating that our GNN-based O2O classification head can replace NMS post-processing in sparse scenarios without a decrease in performance. This supports our earlier claim that both structure and label assignment are crucial for an NMS-free paradigm.

We also explore the stop-gradient strategy for the O2O classification head. As shown in Table~\ref{stop}, without stop-gradient (the baseline), the gradient of the O2O classification head negatively impacts both the O2M classification head (with NMS post-processing) and the O2O classification head. This suggests that one-to-one assignment introduces critical bias into feature learning.
\begin{table}[h]
\centering
\caption{Ablation study on Polar GNN block.}
\begin{adjustbox}{width=\linewidth}
\begin{tabular}{cccc|ccc}
\toprule
\textbf{GNN}&\textbf{cls Mat}& \textbf{Nbr Mat}&\textbf{Rank Loss}&\textbf{F1@50 (\%)}&\textbf{Precision (\%)} & \textbf{Recall (\%)} \\
\midrule
& & & &16.19&69.05&9.17\\
\checkmark&\checkmark& & &79.42&88.46&72.06\\
\checkmark& &\checkmark& &71.97&73.13&70.84\\
\checkmark&\checkmark&\checkmark& &80.74&88.49&74.23\\
\checkmark&\checkmark&\checkmark&\checkmark&\textbf{80.78}&\textbf{88.49}&\textbf{74.30}\\
\bottomrule
\end{tabular}
\end{adjustbox}
\label{aba_NMSfree_block}
\end{table}

\begin{table}[h]
\centering
\caption{The ablation study for NMS and NMS-free on CULane test set.}
\begin{adjustbox}{width=\linewidth}
\begin{tabular}{c|l|lll}
\toprule
\multicolumn{2}{c|}{\textbf{Anchor strategy~/~assign}} & \textbf{F1@50 (\%)} & \textbf{Precision (\%)} & \textbf{Recall (\%)} \\
\midrule
\multirow{6}*{Fixed} &O2M-B w/~ NMS &80.38&87.44&74.38\\
&O2M-B w/o NMS &44.03\textcolor{darkgreen}{~(36.35$\downarrow$)}&31.12\textcolor{darkgreen}{~(56.32$\downarrow$)}&75.23\textcolor{red}{~(0.85$\uparrow$)}\\
\cline{2-5}
&O2O-B w/~ NMS &78.72&87.58&71.50\\
&O2O-B w/o NMS &78.23\textcolor{darkgreen}{~(0.49$\downarrow$)}&86.26\textcolor{darkgreen}{~(1.32$\downarrow$)}&71.57\textcolor{red}{~(0.07$\uparrow$)}\\
\cline{2-5}
&O2O-G w/~ NMS &80.37&87.44&74.37\\
&O2O-G w/o NMS &80.27\textcolor{darkgreen}{~(0.10$\downarrow$)}&87.14\textcolor{darkgreen}{~(0.30$\downarrow$)}&74.40\textcolor{red}{~(0.03$\uparrow$)}\\
\midrule
\multirow{6}*{Proposal} &O2M-B w/~ NMS &80.81&88.53&74.33\\
&O2M-B w/o NMS &36.46\textcolor{darkgreen}{~(44.35$\downarrow$)}&24.09\textcolor{darkgreen}{~(64.44$\downarrow$)}&74.93\textcolor{red}{~(0.60$\uparrow$)}\\
\cline{2-5}
&O2O-B w/~ NMS &77.27&92.64&66.28\\
&O2O-B w/o NMS &47.11\textcolor{darkgreen}{~(30.16$\downarrow$)}&36.48\textcolor{darkgreen}{~(56.16$\downarrow$)}&66.48\textcolor{red}{~(0.20$\uparrow$)}\\
\cline{2-5}
&O2O-G w/~ NMS &80.81&88.53&74.32\\
&O2O-G w/o NMS &80.81\textcolor{red}{~(0.00$\uparrow$)}&88.52\textcolor{darkgreen}{~(0.01$\downarrow$)}&74.33\textcolor{red}{~(0.01$\uparrow$)}\\
\bottomrule
\end{tabular}
\end{adjustbox}
\label{NMS vs NMS-free}
\end{table}

\begin{table}[h]
\centering
\caption{Ablation study of the stop-gradient strategy on the CULane test set.}
\begin{adjustbox}{width=\linewidth}
\begin{tabular}{c|c|lll}
\toprule
\multicolumn{2}{c|}{\textbf{Paradigm}} & \textbf{F1@50 (\%)} & \textbf{Precision (\%)} & \textbf{Recall (\%)} \\
\midrule
\multirow{2}*{Baseline} &O2M-B w/~ NMS &78.83&88.99&70.75\\
&O2O-G w/o NMS &71.68\textcolor{darkgreen}{~(7.15$\downarrow$)}&72.56\textcolor{darkgreen}{~(16.43$\downarrow$)}&70.81\textcolor{red}{~(0.06$\uparrow$)}\\
\midrule
\multirow{2}*{Stop Grad} &O2M-B w/~ NMS &80.81&88.53&74.33\\
&O2O-G w/o NMS &80.81\textcolor{red}{~(0.00$\uparrow$)}&88.52\textcolor{darkgreen}{~(0.01$\downarrow$)}&74.33\textcolor{red}{~(0.00$\uparrow$)} \\
\bottomrule
\end{tabular}
\end{adjustbox}
\label{stop}
\end{table}

\textbf{Ablation study on NMS-free block in dense scenarios.} Although the O2O classification head can replace NMS in sparse scenarios, the shortcomings of NMS in dense scenarios remain. To investigate the performance of the NMS-free block in dense scenarios, we conduct experiments on the CurveLanes dataset, as detailed in Table~\ref{aba_NMS_dense}. In the traditional NMS post-processing \cite{clrernet}, the default NMS threshold is set to 50 pixels. However, this default setting may not always be optimal, especially in dense scenarios where some lane predictions might be erroneously eliminated. Lowering the threshold increases recall but decreases precision. To find the most effective threshold, we experimented with various values and found that a threshold of 15 pixels achieves the best trade-off, resulting in an F1-score of 86.81\%.
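As a consistency check on Table~\ref{aba_NMS_dense}, and assuming the F1-score is the usual harmonic mean of precision $P$ and recall $R$ (the symbols $P$ and $R$ are shorthand introduced here for illustration), the reported values at the default and optimal thresholds can be reproduced as
\begin{align*}
F_1 &= \frac{2PR}{P+R}, \\
F_1\big|_{50} &= \frac{2 \times 91.01 \times 80.40}{91.01 + 80.40} \approx 85.38, \\
F_1\big|_{15} &= \frac{2 \times 89.64 \times 84.16}{89.64 + 84.16} \approx 86.81.
\end{align*}
Moving from 50 to 15 pixels thus gives up 1.37 points of precision for 3.76 points of recall, which is why the F1-score rises by 1.43 points.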
In contrast, the NMS-free paradigm with the GNN-based O2O classification head achieves an overall F1-score of 87.29\%, which is 0.48 percentage points higher than the optimal threshold setting in the NMS paradigm. Both precision and recall also improve relative to the 15-pixel setting (90.50\% vs. 89.64\% and 84.31\% vs. 84.16\%, respectively). This indicates that the GNN-based O2O classification head is capable of learning implicit semantic distances between anchors in addition to explicit geometric distances, thus providing a more effective solution for dense scenarios than traditional NMS post-processing.

\begin{table}[h]
\centering
\caption{NMS vs. NMS-free on the CurveLanes validation set.}
\begin{adjustbox}{width=\linewidth}
\begin{tabular}{l|l|ccc}
\toprule
\textbf{Paradigm} & \textbf{NMS thres (pixel)} & \textbf{F1@50 (\%)} & \textbf{Precision (\%)} & \textbf{Recall (\%)} \\
\midrule
\multirow{7}*{Polar R-CNN-NMS} & 50 (default) &85.38&\textbf{91.01}&80.40\\
& 40 &85.97&90.72&81.68\\
& 30 &86.26&90.44&82.45\\
& 25 &86.38&90.27&82.83\\
& 20 &86.57&90.05&83.37\\
& 15 (optimal) &86.81&89.64&84.16\\
& 10 &86.58&88.62&\textbf{84.64}\\
\midrule
Polar R-CNN & - &\textbf{87.29}&90.50&84.31\\
\bottomrule
\end{tabular}
\end{adjustbox}
\label{aba_NMS_dense}
\end{table}

\textbf{Visualization.} We present the Polar R-CNN predictions for both sparse and dense scenarios. Fig.~\ref{vis_sparse} displays the predictions for sparse scenarios across four datasets. LPH effectively proposes anchors that are clustered around the ground truth, providing a robust prior for the RoI stage to achieve the final lane predictions. Moreover, the number of anchors is significantly smaller than in previous works, which in theory makes our method faster than other anchor-based methods. Fig.~\ref{vis_dense} shows the predictions for dense scenarios. We observe that NMS@50 mistakenly removes some predictions, leading to false negatives, while NMS@15 fails to eliminate redundant predictions, resulting in false positives.
This highlights the trade-off between using a large NMS threshold and a small one. The visualization shows that geometric distance becomes less effective in dense scenarios. The data-driven O2O classification head can address this issue by capturing semantic distance beyond geometric distance. As shown in Fig.~\ref{vis_dense}, the O2O classification head successfully eliminates redundant predictions while retaining dense predictions with small geometric distances.

\begin{figure*}[htbp]
\centering
\def\pagewidth{0.49\textwidth}
\def\subwidth{0.47\linewidth}
\def\imgwidth{\linewidth}
\def\imgheight{0.5625\linewidth}
\def\dashheight{0.8\linewidth}
\begin{subfigure}{\pagewidth}
\rotatebox{90}{\small{GT}}
\begin{minipage}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thesis_figure/view_dataset/culane/1_gt.jpg}
\end{minipage}
\begin{minipage}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thesis_figure/view_dataset/culane/2_gt.jpg}
\end{minipage}
\end{subfigure}
\begin{subfigure}{\pagewidth}
\begin{minipage}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thesis_figure/view_dataset/tusimple/1_gt.jpg}
\end{minipage}
\begin{minipage}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thesis_figure/view_dataset/tusimple/2_gt.jpg}
\end{minipage}
\end{subfigure}
\vspace{0.5em}
\begin{subfigure}{\pagewidth}
\raisebox{-1.5em}{\rotatebox{90}{\small{Anchors}}}
\begin{minipage}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thesis_figure/view_dataset/culane/1_anchor.jpg}
\end{minipage}
\begin{minipage}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thesis_figure/view_dataset/culane/2_anchor.jpg}
\end{minipage}
\end{subfigure}
\begin{subfigure}{\pagewidth}
\begin{minipage}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thesis_figure/view_dataset/tusimple/1_anchor.jpg}
\end{minipage}
\begin{minipage}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thesis_figure/view_dataset/tusimple/2_anchor.jpg}
\end{minipage}
\end{subfigure}
\vspace{0.5em}
\begin{subfigure}{\pagewidth}
\raisebox{-2em}{\rotatebox{90}{\small{Predictions}}}
\begin{minipage}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thesis_figure/view_dataset/culane/1_pred.jpg}
\end{minipage}
\begin{minipage}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thesis_figure/view_dataset/culane/2_pred.jpg}
\end{minipage}
\caption{CULane}
\end{subfigure}
\begin{subfigure}{\pagewidth}
\begin{minipage}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thesis_figure/view_dataset/tusimple/1_pred.jpg}
\end{minipage}
\begin{minipage}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thesis_figure/view_dataset/tusimple/2_pred.jpg}
\end{minipage}
\caption{TuSimple}
\end{subfigure}
\vspace{0.5em}
% \begin{tikzpicture}
% \draw[dashed, pattern=on 8pt off 2pt, color=gray, line width=1pt] (-\textwidth/2,0) -- (\textwidth/2.,0);
% \end{tikzpicture}
% \vspace{0.05em}
\begin{subfigure}{\pagewidth}
\rotatebox{90}{\small{GT}}
\begin{minipage}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thesis_figure/view_dataset/llamas/1_gt.jpg}
\end{minipage}
\begin{minipage}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thesis_figure/view_dataset/llamas/2_gt.jpg}
\end{minipage}
\end{subfigure}
\begin{subfigure}{\pagewidth}
\begin{minipage}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thesis_figure/view_dataset/dlrail/1_gt.jpg}
\end{minipage}
\begin{minipage}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thesis_figure/view_dataset/dlrail/2_gt.jpg}
\end{minipage}
\end{subfigure}
\vspace{0.5em}
\begin{subfigure}{\pagewidth}
\raisebox{-1.5em}{\rotatebox{90}{\small{Anchors}}}
\begin{minipage}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thesis_figure/view_dataset/llamas/1_anchor.jpg}
\end{minipage} \begin{minipage}{\subwidth} \includegraphics[width=\imgwidth, height=\imgheight]{thesis_figure/view_dataset/llamas/2_anchor.jpg} \end{minipage} \end{subfigure} \begin{subfigure}{\pagewidth} \begin{minipage}{\subwidth} \includegraphics[width=\imgwidth, height=\imgheight]{thesis_figure/view_dataset/dlrail/1_anchor.jpg} \end{minipage} \begin{minipage}{\subwidth} \includegraphics[width=\imgwidth, height=\imgheight]{thesis_figure/view_dataset/dlrail/2_anchor.jpg} \end{minipage} \end{subfigure} \vspace{0.5em} \begin{subfigure}{\pagewidth} \raisebox{-2em}{\rotatebox{90}{\small{Predictions}}} \begin{minipage}{\subwidth} \includegraphics[width=\imgwidth, height=\imgheight]{thesis_figure/view_dataset/llamas/1_pred.jpg} \end{minipage} \begin{minipage}{\subwidth} \includegraphics[width=\imgwidth, height=\imgheight]{thesis_figure/view_dataset/llamas/2_pred.jpg} \end{minipage} \caption{LLAMAS} \end{subfigure} \begin{subfigure}{\pagewidth} \begin{minipage}{\subwidth} \includegraphics[width=\imgwidth, height=\imgheight]{thesis_figure/view_dataset/dlrail/1_pred.jpg} \end{minipage} \begin{minipage}{\subwidth} \includegraphics[width=\imgwidth, height=\imgheight]{thesis_figure/view_dataset/dlrail/2_pred.jpg} \end{minipage} \caption{DL-Rail} \end{subfigure} \vspace{0.5em} \caption{The visualization of the detection results of sparse scenarios.} \label{vis_sparse} \end{figure*} \begin{figure*}[htbp!] 
\centering \def\subwidth{0.24\textwidth} \def\imgwidth{\linewidth} \def\imgheight{0.5625\linewidth} \begin{subfigure}{\subwidth} \includegraphics[width=\imgwidth, height=\imgheight]{thesis_figure/view_nms/redun_gt.jpg} \end{subfigure} \begin{subfigure}{\subwidth} \includegraphics[width=\imgwidth, height=\imgheight]{thesis_figure/view_nms/redun_pred50.jpg} \end{subfigure} \begin{subfigure}{\subwidth} \includegraphics[width=\imgwidth, height=\imgheight]{thesis_figure/view_nms/redun_pred15.jpg} \end{subfigure} \begin{subfigure}{\subwidth} \includegraphics[width=\imgwidth, height=\imgheight]{thesis_figure/view_nms/redun_NMSfree.jpg} \end{subfigure} \vspace{0.5em} \begin{subfigure}{\subwidth} \includegraphics[width=\imgwidth, height=\imgheight]{thesis_figure/view_nms/redun2_gt.jpg} \end{subfigure} \begin{subfigure}{\subwidth} \includegraphics[width=\imgwidth, height=\imgheight]{thesis_figure/view_nms/redun2_pred50.jpg} \end{subfigure} \begin{subfigure}{\subwidth} \includegraphics[width=\imgwidth, height=\imgheight]{thesis_figure/view_nms/redun2_pred15.jpg} \end{subfigure} \begin{subfigure}{\subwidth} \includegraphics[width=\imgwidth, height=\imgheight]{thesis_figure/view_nms/redun2_NMSfree.jpg} \end{subfigure} \vspace{0.5em} \begin{subfigure}{\subwidth} \includegraphics[width=\imgwidth, height=\imgheight]{thesis_figure/view_nms/less_gt.jpg} \end{subfigure} \begin{subfigure}{\subwidth} \includegraphics[width=\imgwidth, height=\imgheight]{thesis_figure/view_nms/less_pred50.jpg} \end{subfigure} \begin{subfigure}{\subwidth} \includegraphics[width=\imgwidth, height=\imgheight]{thesis_figure/view_nms/less_pred15.jpg} \end{subfigure} \begin{subfigure}{\subwidth} \includegraphics[width=\imgwidth, height=\imgheight]{thesis_figure/view_nms/less_NMSfree.jpg} \end{subfigure} \vspace{0.5em} \begin{subfigure}{\subwidth} \includegraphics[width=\imgwidth, height=\imgheight]{thesis_figure/view_nms/less2_gt.jpg} \end{subfigure} \begin{subfigure}{\subwidth} 
\includegraphics[width=\imgwidth, height=\imgheight]{thesis_figure/view_nms/less2_pred50.jpg}
\end{subfigure}
\begin{subfigure}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thesis_figure/view_nms/less2_pred15.jpg}
\end{subfigure}
\begin{subfigure}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thesis_figure/view_nms/less2_NMSfree.jpg}
\end{subfigure}
\vspace{0.5em}
\begin{subfigure}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thesis_figure/view_nms/all_gt.jpg}
\end{subfigure}
\begin{subfigure}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thesis_figure/view_nms/all_pred50.jpg}
\end{subfigure}
\begin{subfigure}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thesis_figure/view_nms/all_pred15.jpg}
\end{subfigure}
\begin{subfigure}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thesis_figure/view_nms/all_NMSfree.jpg}
\end{subfigure}
\vspace{0.5em}
\begin{subfigure}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thesis_figure/view_nms/all2_gt.jpg}
\caption{GT}
\end{subfigure}
\begin{subfigure}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thesis_figure/view_nms/all2_pred50.jpg}
\caption{NMS@50}
\end{subfigure}
\begin{subfigure}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thesis_figure/view_nms/all2_pred15.jpg}
\caption{NMS@15}
\end{subfigure}
\begin{subfigure}{\subwidth}
\includegraphics[width=\imgwidth, height=\imgheight]{thesis_figure/view_nms/all2_NMSfree.jpg}
\caption{NMS-free}
\end{subfigure}
\vspace{0.5em}
\caption{Visualization of the detection results in sparse and dense scenarios on the CurveLanes dataset.}
\label{vis_dense}
\end{figure*}

\section{Conclusion and Future Work}
In this paper, we propose Polar R-CNN to address two key issues in anchor-based lane detection methods. By incorporating local and global polar coordinate systems, Polar R-CNN achieves improved performance with fewer anchors.
Additionally, the introduction of the O2O classification head with the Polar GNN block allows us to replace the traditional NMS post-processing, and the NMS-free paradigm demonstrates superior performance in dense scenarios. Our model is highly flexible, and the number of anchors can be adjusted based on the specific scenario. Users have the option to use either the O2M classification head with NMS post-processing or the O2O classification head for an NMS-free approach. Polar R-CNN is also deployment-friendly due to its simple structure, making it a potential new baseline for lane detection. Future work could explore incorporating new structures, such as large kernels or attention mechanisms, and experimenting with new label assignment, training, and anchor sampling strategies. We also plan to extend Polar R-CNN to video instance lane detection and 3D lane detection, utilizing advanced geometric modeling for these new tasks.
%
%
%
\bibliographystyle{IEEEtran}
\bibliography{reference}
%\newpage
%
\begin{IEEEbiography}[{\includegraphics[width=1in,height=1.25in,clip,keepaspectratio]{thesis_figure/wsq.jpg}}]{Shengqi Wang} received the Master's degree from Xi'an Jiaotong University, Xi'an, China, in 2022. He is currently pursuing the Ph.D. degree in statistics at Xi'an Jiaotong University. His research interests include low-level computer vision and deep learning.
\end{IEEEbiography}
\begin{IEEEbiography}[{\includegraphics[width=1in,height=1.25in,clip,keepaspectratio]{thesis_figure/ljm.pdf}}]{Junmin Liu} was born in 1982. He received the Ph.D. degree in Mathematics from Xi'an Jiaotong University, Xi'an, China, in 2013. From 2011 to 2012, he served as a Research Assistant with the Department of Geography and Resource Management at the Chinese University of Hong Kong, Hong Kong, China. From 2014 to 2017, he worked as a Visiting Scholar at the University of Maryland, College Park, USA.
He is currently a full Professor at the School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an, China. His research interests are mainly focused on the theory and application of machine learning and image processing. He has published over 60 research papers in international conferences and journals.
\end{IEEEbiography}
\begin{IEEEbiography}[{\includegraphics[width=1in,height=1.25in,clip,keepaspectratio]{thesis_figure/xiangyongcao.jpg}}]{Xiangyong Cao (Member, IEEE)} received the B.Sc. and Ph.D. degrees from Xi’an Jiaotong University, Xi’an, China, in 2012 and 2018, respectively. From 2016 to 2017, he was a Visiting Scholar with Columbia University, New York, NY, USA. He is an Associate Professor with the School of Computer Science and Technology, Xi’an Jiaotong University. His research interests include statistical modeling and image processing.
\end{IEEEbiography}
\vfill
\end{document}