\maketitle
\begin{abstract}
Lane detection is a critical and challenging task in autonomous driving, particularly in real-world scenarios where traffic lanes can be slender, lengthy, and often obscured by other vehicles, complicating detection efforts. Existing anchor-based methods typically rely on prior lane anchors to extract features and refine the location and shape of lanes. While these methods achieve high performance, manually setting prior anchors is cumbersome, and ensuring sufficient coverage across diverse datasets often requires a large number of dense anchors. Furthermore,
the use of \textit{Non-Maximum Suppression} (NMS) to eliminate redundant predictions complicates real-world deployment and may underperform in complex scenarios. In this paper, we propose \textit{Polar R-CNN}, an NMS-free anchor-based method for lane detection. By incorporating both local and global polar coordinate systems, Polar R-CNN facilitates flexible anchor proposals and significantly reduces the number of anchors required without compromising performance. Additionally, we introduce a heuristic \textit{Graph Neural Network} (GNN)-based NMS-free head that supports an end-to-end paradigm, enhancing deployment efficiency and performance in scenarios with dense lanes. Our method achieves competitive results on five popular lane detection benchmarks—\textit{Tusimple}, \textit{CULane}, \textit{LLAMAS}, \textit{CurveLanes}, and \textit{DL-Rail}—while maintaining a lightweight design and straightforward structure. Our source code is available at \href{https://github.com/ShqWW/PolarRCNN}{\textit{https://github.com/ShqWW/PolarRCNN}}.
\end{abstract}
\begin{IEEEkeywords}
\label{coord}
\end{figure}
%
The overall architecture of our Polar R-CNN is illustrated in Fig. \ref{overall_architecture}. As shown in this figure, our Polar R-CNN for lane detection has a pipeline similar to that of Faster R-CNN \cite{fasterrcnn}: it consists of a backbone, a \textit{Feature Pyramid Network} (FPN), a \textit{Region Proposal Network} (RPN) followed by a \textit{Local Polar Module} (LPM), and a \textit{Region of Interest} (RoI) pooling module followed by a \textit{Global Polar Module} (GPM). In the following, we first introduce the polar coordinate representation of lanes and lane anchors, and then present the LPM and GPM designed for our Polar R-CNN. %To investigate the fundamental factors affecting model performance, such as anchor settings and NMS post-processing, and also to enhance ease of deployment, our Polar R-CNN utilizes a simple and straightforward network structure, relying only on basic components such as convolutional and pooling operations and \textit{Multi-Layer Perceptrons} (MLPs), while deliberately excluding advanced elements like \textit{attention mechanisms}, \textit{dynamic kernels}, and \textit{cross-layer refinement} used in previous works \cite{clrnet}\cite{clrernet}.
%\par

%
\begin{figure}[t]
\centering
\includegraphics[width=0.87\linewidth]{thesis_figure/coord/localpolar.png}
\caption{The local polar coordinate system. The ground truth radius $\hat{r}_{i}^{l}$ of a local pole is defined as the minimum distance from the pole to the lane curve instance. A pole is labeled positive if its radius $\hat{r}_{i}^{l}$ is below a threshold $\tau_{l}$, and negative otherwise. Additionally, the ground truth angle is determined by the angle between the radius vector (connecting the pole to the closest point on the lane) and the local polar axis.}
\label{lphlabel}
\end{figure}
\par
\textbf{Representation in Polar Coordinates.} As stated above, lane anchors represented by rays have some drawbacks. To address these issues, we introduce a polar coordinate representation of lane anchors. In mathematics, the polar coordinate system is a two-dimensional coordinate system in which each point on a plane is determined by a distance from a reference point (called the pole) and an angle from a reference direction (called the polar axis). As shown in Fig. \ref{coord}(b), given a pole corresponding to the yellow point, a lane anchor for a straight line can be uniquely defined by two parameters: the radial distance from the pole (called the radius), $r$, and the counterclockwise angle from the polar axis to the perpendicular line of the lane anchor, $\theta$, with $r \in \mathbb{R}$ and $\theta\in\left(-\frac{\pi}{2}, \frac{\pi}{2}\right]$.
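In Cartesian image coordinates, the pair $(\theta, r)$ pins down the anchor line as the set of points whose projection onto the unit normal $(\cos\theta, \sin\theta)$, taken relative to the pole, equals $r$. A minimal numpy sketch (the function name and the row-wise sampling scheme are illustrative assumptions, not from the paper):

```python
import numpy as np

def anchor_points(pole, theta, r, ys):
    # All (x, y) with (x - px)*cos(theta) + (y - py)*sin(theta) = r, i.e. the
    # foot of the perpendicular from the pole lies at signed distance r.
    # Solving for x at image rows ys is valid while cos(theta) != 0, which
    # holds for theta in (-pi/2, pi/2).
    px, py = pole
    xs = px + (r - (ys - py) * np.sin(theta)) / np.cos(theta)
    return np.stack([xs, ys], axis=-1)

pole = np.array([100.0, 200.0])
theta, r = 0.3, 40.0
pts = anchor_points(pole, theta, r, ys=np.linspace(0.0, 320.0, 5))
# Every sampled point projects onto the normal at exactly distance r:
assert np.allclose((pts - pole) @ np.array([np.cos(theta), np.sin(theta)]), r)
```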
\par
To better leverage the local inductive bias of CNNs, we define two types of polar coordinate systems: local and global. The local polar coordinate system is used to generate lane anchors, while the global coordinate system expresses these anchors in a unified form over the entire image and regresses them to the ground truth lane instances. Given the distinct roles of the local and global systems, we adopt a two-stage framework for our Polar R-CNN, similar to Faster R-CNN \cite{fasterrcnn}.
\par
The local polar system is designed to predict lane anchors adaptable to both sparse and dense scenarios. In this system, there are many poles, each located at a lattice point of the feature map, referred to as local poles. As illustrated on the left side of Fig. \ref{lphlabel}, there are two types of local poles: positive and negative. Positive local poles (e.g., the blue points) have a radius $r_{i}^{l}$ below a threshold $\tau_l$; otherwise, they are classified as negative local poles (e.g., the red points). Each local pole is responsible for predicting a single lane anchor, so a lane ground truth may generate multiple lane anchors; as shown in Fig. \ref{lphlabel}, there are three positive poles around the lane instance (the green lane), which are expected to generate three lane anchors. This one-to-many approach is essential for ensuring comprehensive anchor proposals, especially since some local features around certain poles may be lost due to damage or occlusion of the lane curve.
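The pole labeling above can be sketched as follows. This is a hypothetical numpy illustration: the name `label_local_poles`, the dense polyline sampling, and the angle convention via `arctan2` are our assumptions, and wrapping the angle into $\left(-\frac{\pi}{2}, \frac{\pi}{2}\right]$ is omitted.

```python
import numpy as np

def label_local_poles(poles, lane_pts, tau_l):
    # poles: (N, 2) feature-map lattice points; lane_pts: (M, 2) points densely
    # sampled along one ground-truth lane curve.
    d = np.linalg.norm(poles[:, None, :] - lane_pts[None, :, :], axis=-1)
    nearest = d.argmin(axis=1)
    r_gt = d[np.arange(len(poles)), nearest]      # min pole-to-curve distance
    vec = lane_pts[nearest] - poles               # radius vector: pole -> curve
    theta_gt = np.arctan2(vec[:, 1], vec[:, 0])   # its orientation
    return r_gt, theta_gt, r_gt < tau_l           # positive iff radius below tau_l

# A vertical lane at x = 50: the pole 5 px away is positive, the one 30 px away is not.
lane = np.stack([np.full(101, 50.0), np.linspace(0.0, 100.0, 101)], axis=1)
r, th, pos = label_local_poles(np.array([[45.0, 50.0], [80.0, 50.0]]), lane, tau_l=10.0)
assert pos[0] and not pos[1]
```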
\par
In the local polar coordinate system, the parameters of each lane anchor are determined by the location of its corresponding local pole. In practice, however, once a lane anchor is generated, its position becomes fixed and independent of its original local pole. To simplify the representation of lane anchors in the second stage of Polar R-CNN, we design a global polar system featuring a single pole that serves as a reference point for the entire image. The location of this global pole is set manually; in this work, it is positioned near the static vanishing point observed across the entire lane image dataset. This ensures a consistent and unified frame for expressing lane anchors within the global context of the image, facilitating accurate regression to the ground truth lane instances.

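Because the anchor line itself is unchanged, re-expressing a local anchor about the global pole leaves $\theta$ intact and only shifts the radius by the projection of the pole offset onto the line normal. This identity is our own derivation for illustration, not a formula stated in the paper:

```python
import numpy as np

def local_to_global_radius(local_pole, global_pole, theta, r_local):
    # The anchor is the line n . (p - local_pole) = r_local with unit normal
    # n = (cos(theta), sin(theta)); about the global pole the same line reads
    # n . (p - global_pole) = r_local + n . (local_pole - global_pole).
    n = np.array([np.cos(theta), np.sin(theta)])
    offset = np.asarray(local_pole, float) - np.asarray(global_pole, float)
    return r_local + float(n @ offset)

# The vertical line x = 15 has r = 5 about pole (10, 10) and r = 15 about (0, 0):
assert local_to_global_radius((10.0, 10.0), (0.0, 0.0), 0.0, 5.0) == 15.0
```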
\begin{figure}[t]
\centering
\includegraphics[width=0.45\textwidth]{thesis_figure/local_polar_head.png}
The downsampled feature map $F_d$ is then fed into two branches: a \textit{regression branch} and a \textit{classification branch}:
\begin{align}
F_{reg\,\,}\gets \phi _{reg}^{lpm}\left( F_d \right)\ &\text{and}\ F_{reg\,\,}\in \mathbb{R} ^{2\times H^{l}\times W^{l}},\\
F_{cls}\gets \phi _{cls}^{lpm}\left( F_d \right)\ &\text{and}\ F_{cls}\in \mathbb{R} ^{H^{l}\times W^{l}}. \label{lph equ}
\end{align}
The regression branch consists of a single $1\times1$ convolutional layer that generates lane anchors by outputting their angles $\theta^{l}_{j}$ and radii $r^{l}_{j}$, \textit{i.e.}, $F_{reg\,\,} \equiv \left\{\theta^{l}_{j}, r^{l}_{j}\right\}_{j=1}^{H^{l}\times W^{l}}$, in the local polar coordinate system defined previously. Similarly, the classification branch $\phi _{cls}^{lpm}\left(\cdot \right)$ consists of only two $1\times1$ convolutional layers for simplicity. This branch predicts the confidence heat map $F_{cls\,\,}\equiv \left\{ c_j^{l} \right\} _{j=1}^{H^l\times W^l}$ for local poles, each associated with a feature point. By discarding local poles with lower confidence, the module increases the likelihood of selecting potential positive foreground lane anchors while effectively removing background lane anchors.
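The two branches can be sketched in numpy as per-location linear maps over channels (which is exactly what a $1\times1$ convolution is); the hidden width, the ReLU, and the sigmoid on the confidence map are our assumptions, not details specified in the paper:

```python
import numpy as np

def conv1x1(x, w, b):
    # x: (C_in, H, W); w: (C_out, C_in); b: (C_out,). A 1x1 convolution is a
    # per-location linear map over channels.
    return np.einsum('oc,chw->ohw', w, x) + b[:, None, None]

def lpm_head(F_d, w_reg, b_reg, w_cls1, b_cls1, w_cls2, b_cls2):
    # Regression branch: one 1x1 conv -> (2, H, W) holding (theta, r) per pole.
    F_reg = conv1x1(F_d, w_reg, b_reg)
    # Classification branch: two 1x1 convs (ReLU between) -> (H, W) heat map.
    h = np.maximum(conv1x1(F_d, w_cls1, b_cls1), 0.0)
    logits = conv1x1(h, w_cls2, b_cls2)[0]
    F_cls = 1.0 / (1.0 + np.exp(-logits))   # confidence of each local pole
    return F_reg, F_cls

rng = np.random.default_rng(0)
F_reg, F_cls = lpm_head(rng.standard_normal((8, 4, 6)),
                        rng.standard_normal((2, 8)), np.zeros(2),
                        rng.standard_normal((16, 8)), np.zeros(16),
                        rng.standard_normal((1, 16)), np.zeros(1))
assert F_reg.shape == (2, 4, 6) and F_cls.shape == (4, 6)
```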
\par
\textbf{Loss Function for Training the LPM.} To train the local polar module, we define the ground truth labels for each local pole as follows: the ground truth radius, $\hat{r}^l_j$, is set to the minimum distance from the local pole to the corresponding lane curve, while the ground truth angle, $\hat{\theta}^l_j$, is set to the orientation of the vector extending from the local pole to the nearest point on the curve. A positive pole is labeled as one; otherwise, it is labeled as zero. Consequently, we have a label set of local poles $F_{gt}=\{\hat{c}_j^l\}_{j=1}^{H^l\times W^l}$, where $\hat{c}_j^l=1$ if the $j$-th local pole is positive and $\hat{c}_j^l=0$ if it is negative. Once the regression and classification labels are established, as shown in Fig. \ref{lphlabel}, the LPM can be trained using the \textit{smooth-L}1 loss $s\left(\cdot \right)$ for the regression branch and the \textit{binary cross-entropy} loss $BCE\left( \cdot , \cdot \right)$ for the classification branch. The loss functions for the LPM are given as follows:
\begin{align}
\mathcal{L} _{lpm}^{cls}&=BCE\left( F_{cls},F_{gt} \right), \\
\mathcal{L} _{lpm}^{r\mathrm{e}g}&=\frac{1}{N_{lpm}^{pos}}\sum_{j\in \left\{j|\hat{r}_j^l<\tau_{l} \right\}}{\left( s\left( \theta_j^l-\hat{\theta}_j^l \right) +s\left( r_j^l-\hat{r}_j^l \right) \right)}, \label{loss_lph}
\end{align}
where $N_{lpm}^{pos}=\left|\{j|\hat{r}_j^l<\tau_{l}\}\right|$ is the number of positive local poles in the LPM.
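A numpy sketch of the two losses above; the reduction choices (mean over all poles for the BCE term, division guard when no pole is positive) are our assumptions:

```python
import numpy as np

def smooth_l1(x):
    # Standard smooth-L1: quadratic below 1, linear above.
    ax = np.abs(x)
    return np.where(ax < 1.0, 0.5 * x * x, ax - 0.5)

def lpm_losses(c_pred, c_gt, theta_pred, r_pred, theta_gt, r_gt, tau_l):
    # Classification: binary cross-entropy over every local pole.
    eps = 1e-7
    l_cls = -np.mean(c_gt * np.log(c_pred + eps)
                     + (1.0 - c_gt) * np.log(1.0 - c_pred + eps))
    # Regression: smooth-L1 on (theta, r), averaged over positive poles only.
    pos = r_gt < tau_l
    l_reg = np.sum(smooth_l1(theta_pred[pos] - theta_gt[pos])
                   + smooth_l1(r_pred[pos] - r_gt[pos])) / max(pos.sum(), 1)
    return l_cls, l_reg
```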
\par
\textbf{Top-$K$ Anchor Selection.} As discussed above, all $H^{l}\times W^{l}$ anchors, each associated with a point in the feature map, are considered candidate anchors when training the LPM. This helps our Polar R-CNN learn from a sufficient variety of features, including negative anchor samples. However, only the top-$K$ anchors with the highest confidence scores $\{c_j^l\}$ are selected and fed into the next stage. This strategy effectively filters out potential negative anchors and reduces the computational complexity of our Polar R-CNN, maintaining the adaptability and flexibility of the anchor distribution while decreasing the total number of anchors. The following experiments will demonstrate the effectiveness of our top-$K$ anchor selection strategy.
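The selection step itself is a one-liner over the flattened heat map; a minimal numpy sketch with hypothetical names:

```python
import numpy as np

def topk_anchors(scores, thetas, radii, k):
    # Keep only the K poles with the highest confidence; their (theta, r)
    # pairs are the anchors passed on to the second stage.
    idx = np.argsort(scores)[::-1][:k]
    return idx, thetas[idx], radii[idx]

idx, th, r = topk_anchors(np.array([0.1, 0.9, 0.5]),
                          np.array([0.0, 1.0, 2.0]),
                          np.array([3.0, 4.0, 5.0]), k=2)
assert list(idx) == [1, 2]
```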
%
\subsection{Global Polar Module}
Similar to the pipeline of Faster R-CNN, the LPM serves as the first stage for generating lane proposals. As illustrated in Fig. \ref{overall_architecture}, we introduce a novel \textit{Global Polar Module} (GPM) as the second stage to achieve accurate lane prediction. The GPM takes features extracted by a \textit{Region of Interest} (RoI) pooling layer as input and outputs precise lane locations and confidence scores through a triplet head.