update

2024-09-18 10:23:07 +08:00 · 2024-09-18 10:23:07 +08:00 · 4fa2730c18
commit 4fa2730c18
parent 3d6f626c56
4 changed files with 32 additions and 28 deletions
--- a/main.tex
+++ b/main.tex
@@ -132,13 +132,7 @@ To address the above two issues, we propose Polar R-CNN, a novel anchor-based me
 \item By integrating the polar coordinate systems and the Polar GNN block, we present the Polar R-CNN model for fast and efficient lane detection. We conduct extensive experiments on five benchmark datasets to demonstrate that our model achieves high performance with fewer anchors and an NMS-free paradigm. %Additionally, our model features a straightforward structure, lacking cascade refinement or attention strategies, making it simpler to deploy. 
 \end{itemize}
 %
-\begin{figure*}[ht]
-	\centering
-	\includegraphics[width=0.85\linewidth]{thesis_figure/ovarall_architecture.png} 
-	\caption{An illustration of the Polar R-CNN architecture. It has a similar pipelines with the Faster R-CNN for the task of object detection, and consists of a backbone, a FPN with three levels of feature maps, respectively denote by $P_0, P_1, P_2$, followed by a \textit{local polar head}, and a RoI pooling module to extract features fed to a \textit{global polar head} for lane detection. Based on the designed lane representation and lane anchor representation in polar coordinate system, the local polar head can propose sparse line anchors and the global polar head can produce the robust and accurate lane predictions. The global polar head includes a triplet head, which comprises a \textit{one-to-one classification} (O2O Cls) head, a \textit{one-to-many classification} (O2M Cls), and a \textit{one-to-many regression} (O2M reg) head.}
-	\label{overall_architecture}
-\end{figure*}
-%
+
 \section{Related Works}
 %As mentioned above, our model is based on deep learning.
 Generally, deep learning-based lane detection methods can be categorized into three groups: segmentation-based, parameter-based, and anchor-based methods. Additionally, NMS-free is an important technique for anchor-based methods, and it will also be described in this section.
@@ -154,6 +148,17 @@ categorizes lane instances by angles and locations, allowing it to detect only a
 \par
 In this work, we aim to address the above two issues in the framework of anchor-based detection to achieve NMS-free and non-redundant lane predictions.
 %
+
+%
+\section{Polar R-CNN}
+
+\begin{figure*}[ht]
+	\centering
+	\includegraphics[width=0.85\linewidth]{thesis_figure/ovarall_architecture.png} 
+	\caption{An illustration of the Polar R-CNN architecture. Its pipeline is similar to that of Faster R-CNN for object detection and consists of a backbone, an FPN with three levels of feature maps, denoted by $P_0, P_1, P_2$, respectively, followed by a \textit{local polar head}, and an RoI pooling module that extracts features fed to a \textit{global polar head} for lane detection. Based on the designed lane representation and lane anchor representation in the polar coordinate system, the local polar head proposes sparse line anchors and the global polar head produces robust and accurate lane predictions. The global polar head includes a triplet head, which comprises a \textit{one-to-one (O2O) classification head}, a \textit{one-to-many (O2M) classification head}, and a \textit{one-to-many (O2M) regression head}.}
+	\label{overall_architecture}
+\end{figure*}
+
 \begin{figure}[t]
 	\centering
 	\def\subwidth{0.24\textwidth}
@@ -171,8 +176,14 @@ In this work, we aim to address the above two issues in the framework of anchor-
 	\caption{Different descriptions for anchor parameters: (a) Ray: defined by its starting point and direction $\theta$. (b) Polar: defined by its radius and angle.} %rectangular coordinates
 	\label{coord}
 \end{figure}
+
+\begin{figure}[t]
+        \centering
+        \includegraphics[width=\linewidth]{thesis_figure/coord/localpolar.png}
+        \caption{Label construction for local polar head.}
+        \label{lphlabel}
+\end{figure}
 %
-\section{Polar R-CNN}
 The overall architecture of our Polar R-CNN is illustrated in Fig. \ref{overall_architecture}. As shown in this figure, our Polar R-CNN for lane detection has a pipeline similar to that of Faster R-CNN \cite{fasterrcnn}, which consists of a backbone, a \textit{Feature Pyramid Network} (FPN), a \textit{Region Proposal Network} (RPN) followed by a local polar head, and a \textit{Region of Interest} (RoI) pooling module followed by a global polar head. To investigate the fundamental factors affecting model performance, such as anchor settings and NMS post-processing, and also to ease deployment, our Polar R-CNN uses a simple and straightforward network structure. It relies only on basic components, including convolutional and pooling operations and \textit{Multi-Layer Perceptrons} (MLPs), while deliberately excluding advanced elements such as \textit{attention mechanisms}, \textit{dynamic kernels}, and \textit{cross-layer refinement} used in previous works \cite{clrnet}\cite{clrernet}. 
 \par
 In the following, based on a polar coordinate representation of lanes and lane anchors, we further introduce the designed \textit{Local Polar Head} (LPH) and \textit{Global Polar Head} (GPH) in our Polar R-CNN.
@@ -181,29 +192,29 @@ In the following, based on a polar coordinate representation of lane and lane an
 %
 Lanes are characterized by their thin, elongated, and curved shapes. A well-defined lane prior aids the model in feature extraction and location prediction. 
 \par
-\textbf{Lane and Anchor Representation as Ray.} Given an input image with dimensions of length $W$ and height $H$, a lane is represented by a set of 2D points with equally spaced y-coordinates $Y=\{y_1, y_2,\cdots, y_n\}$, where $y_i=i\times\frac{H}{n}$ and $n$ is the number of data points. Since the set $Y$ is fixed, a lane can be uniquely defined by its x-coordinates $X=\{x_1,x_2,\cdots,x_n\}$, with each $x_i$ corresponding to the respective $y_i\in Y$. Previous studies \cite{linecnn}\cite{laneatt} have introduced lane priors, also known as lane anchors, which are represented as straight lines in the image plane that serve as references. From a geometric perspective, a lane anchor can be viewed as a ray defined by a starting point $(x_{orig},y_{orig})$ located at the edge of an image (left/bottom/right boundaries), along with a direction $\theta$, as shown in Fig. \ref{coord}(a). The primary task of a lane detection model is to estimate the x-coordinate offset from the lane anchor to the ground truth of the lane instance. However, ......
+\textbf{Lane and Anchor Representation as Ray.} Given an input image with width $W$ and height $H$, a lane is represented by a set of 2D points with equally spaced y-coordinates $Y=\{y_1, y_2,\cdots, y_n\}$, where $y_i=i\times\frac{H}{n}$ and $n$ is the number of data points. Since the set $Y$ is fixed, a lane can be uniquely defined by its x-coordinates $X=\{x_1,x_2,\cdots,x_n\}$, with each $x_i$ corresponding to the respective $y_i\in Y$. Previous studies \cite{linecnn}\cite{laneatt} have introduced lane priors, also known as lane anchors, which are represented as straight lines in the image plane that serve as references. From a geometric perspective, a lane anchor can be viewed as a ray defined by a starting point $(x_{orig},y_{orig})$ located at the edge of an image (left/bottom/right boundaries), along with a direction $\theta$, as shown in Fig. \ref{coord}(a). The primary task of a lane detection model is to estimate the x-coordinate offset from the lane anchor to the ground truth of the lane instance. 
+\par
+However, lane anchors, which are essentially straight lines represented as rays, have certain drawbacks, as illustrated in Fig. \ref{coord}(a). A lane anchor admits infinitely many potential starting points, making the definition of the anchor's starting point ambiguous and subjective. Some methods, such as \cite{linecnn}\cite{dalnet}\cite{laneatt}, define the starting points as being located at the boundaries of an image (e.g., the green point in Fig. \ref{coord}(a)), while \cite{adnet} sets the starting points at the actual visual location within the image (e.g., the purple point in Fig. \ref{coord}(a)). Moreover, occlusion and damage to the lane significantly impact the detection of these starting points, requiring the model to have a large receptive field \cite{adnet}. Fundamentally, a straight line has two degrees of freedom (for instance, the slope and the intercept in a Cartesian coordinate system), implying that a lane anchor can be described using just two parameters instead of the three redundant parameters (two for the starting point and one for the orientation) currently employed.

-
-
-\newpage
 \par
 \textbf{Representation in Polar Coordinate.} 
-As stated above, lane anchors represented by rays have some drawbacks. To address these issues, we introduce the polar coordinate representation of lane anchors. In mathematics, the polar coordinate is a two-dimensional coordinate system in which each point on a plane is determined by a distance from a reference point (also called the pole) and an angle $\theta$ from a reference direction (called polar axis). As shown in Fig. \ref{coord}(b), a lane anchor for a straight line can be uniquely defined by two parameters: the radial distance from the pole (called radius), $r$, and the counterclockwise angle from the polar axis, $\theta$, with $r\geq 0$ and $\theta\in\left(-\frac{\pi}{2}, \frac{\pi}{2}\right]$. 
+As stated above, lane anchors represented by rays have some drawbacks. To address these issues, we introduce the polar coordinate representation of lane anchors. In mathematics, the polar coordinate system is a two-dimensional coordinate system in which each point on a plane is determined by a distance from a reference point (also called the pole) and an angle $\theta$ from a reference direction (called the polar axis). As shown in Fig. \ref{coord}(b), a lane anchor for a straight line can be uniquely defined by two parameters: the radial distance from the pole (called the radius), $r$, and the counterclockwise angle from the polar axis, $\theta$, with $r \in \mathbb{R}$ and $\theta\in\left(-\frac{\pi}{2}, \frac{\pi}{2}\right]$. To better exploit the local inductive bias of CNNs, we define two types of polar coordinate systems: the local polar coordinate system and the global polar coordinate system.
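+\par
+As a minimal sketch, assuming the standard normal form of a straight line (a convention adopted here for illustration, with $\mathbf{c}=(x_c,y_c)$ as illustrative notation for the pole), an anchor with parameters $(r,\theta)$ can be written as the set of points $(x,y)$ satisfying
+\begin{equation}
+	% assumed normal-form convention; (x_c, y_c) is illustrative notation for the pole
+	(x-x_c)\cos\theta+(y-y_c)\sin\theta=r,
+\end{equation}
+so that $\theta$ is the direction of the line's normal and $r$ is the signed distance from the pole to the line. Under this reading, the pairs $(r,\theta)$ and $(-r,\theta+\pi)$ describe the same line, so restricting $\theta$ to $\left(-\frac{\pi}{2},\frac{\pi}{2}\right]$ while allowing $r\in\mathbb{R}$ assigns each line exactly one parameter pair.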

+In the local polar coordinate system, we introduce a set of reference points known as local poles. These local poles are positioned at the lattice points (or pixels) of the downsampled feature map, as illustrated in Fig. \ref{lphlabel}(a). Each local pole, denoted as $\mathbf{c}_{i}^{l}$, is responsible for predicting a single lane anchor, as with the green points shown in Fig. \ref{lphlabel}(a). During training, as depicted in Fig. \ref{lphlabel}(a), the ground truth labels for each local pole are defined as follows: the radius ground truth is the shortest distance from a grid point (local pole) to the ground truth lane curve, and the angle ground truth is the orientation of the vector from the grid point to the nearest point on the curve. A grid point is labeled as a positive sample (the green local poles) if its radius label is below a threshold $\tau_{l}$; otherwise, it is considered a negative sample (the red poles). Note that one lane curve instance is regressed by multiple local poles. Because the local features around some poles may be lost when the lane curve is damaged or occluded, this one-to-many assignment is crucial for ensuring comprehensive anchor proposals.
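+\par
+The label rule described above can be summarized as follows. This is only a sketch: $\mathcal{G}$ (the set of points on the ground truth lane curve), $\mathbf{p}_i^{*}$, $r_i^{gt}$, $\theta_i^{gt}$, and $c_i^{gt}$ are notation introduced here for illustration.
+\begin{equation}
+	\begin{aligned}
+		% nearest curve point and radius label for the i-th local pole
+		\mathbf{p}_i^{*}&=\mathop{\arg\min}_{\mathbf{p}\in\mathcal{G}}\left\|\mathbf{p}-\mathbf{c}_i^{l}\right\|_2, \qquad r_i^{gt}=\left\|\mathbf{p}_i^{*}-\mathbf{c}_i^{l}\right\|_2,\\
+		% angle label and positive/negative assignment with threshold \tau_l
+		\theta_i^{gt}&=\text{orientation of }\left(\mathbf{p}_i^{*}-\mathbf{c}_i^{l}\right), \qquad c_i^{gt}=\begin{cases}1, & r_i^{gt}<\tau_{l},\\ 0, & \text{otherwise}.\end{cases}
+	\end{aligned}
+\end{equation}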

+In the second stage (RoI pooling and final lane detection), we standardize the lane anchors by transforming them from multiple local polar coordinate systems into a single, uniform global polar coordinate system. This system contains only one reference point, termed the global pole and denoted as $\mathbf{c}^{g}$. The location of the global pole is set manually; in this work, it is placed near the static vanishing point of the entire lane image dataset.
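+\par
+As a sketch of this transformation, assume a line anchor is written in normal form, $(\mathbf{p}-\mathbf{c})^{\top}\mathbf{n}_{\theta}=r$ with $\mathbf{n}_{\theta}=(\cos\theta,\sin\theta)^{\top}$ (a convention assumed here for illustration, as are the symbols $\mathbf{n}_{\theta}$, $r_i^{l}$, $r_i^{g}$, and $\theta_i^{g}$). Substituting $\mathbf{p}-\mathbf{c}^{g}=(\mathbf{p}-\mathbf{c}_i^{l})+(\mathbf{c}_i^{l}-\mathbf{c}^{g})$ shows that moving the pole from $\mathbf{c}_i^{l}$ to $\mathbf{c}^{g}$ leaves the angle unchanged and shifts only the radius:
+\begin{equation}
+	\begin{aligned}
+		% radius picks up the projection of the pole offset onto the normal direction
+		r_i^{g}&=r_i^{l}+\left(\mathbf{c}_i^{l}-\mathbf{c}^{g}\right)^{\top}\mathbf{n}_{\theta_i},\\
+		\theta_i^{g}&=\theta_i.
+	\end{aligned}
+\end{equation}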

+% \newpage
+% Since lane anchors are typically represented as straight lines, they can be described using straight line parameters. Previous approaches have used rays to describe 2D lane anchors, with the parameters including the coordinates of the starting point and the orientation/angle, denoted as $\left\{\theta, P_{xy}\right\}$, as shown in Fig. \ref{coord}(a). \cite{linecnn}\cite{laneatt} define the start points as lying  on the three image boundaries. However, \cite{adnet} argue that this approach is problematic because the actual starting point of a lane could be located anywhere within the image. In our analysis, using a ray can lead to ambiguity in line representation because a line can have an infinite number of starting points, and the choice of the starting point for a lane is subjective. As illustrated in Fig. \ref{coord}(a), the yellow (the visual start point) and green (the point located on the image boundary) starting points with the same orientation $\theta$ describe the same line, and either could be used in different datasets \cite{scnn}\cite{vil100}. This ambiguity arises because a straight line has two degrees of freedom, whereas a ray has three (two for the start point and one for orientation). To resolve this issue , we propose using polar coordinates to describe a lane anchor with only two parameters: radius and angle, deoted as $\left\{\theta, r\right\}$, where This representation is illustrated in Fig. \ref{coord}(b).
+% \newpage

-\newpage
-Since lane anchors are typically represented as straight lines, they can be described using straight line parameters. Previous approaches have used rays to describe 2D lane anchors, with the parameters including the coordinates of the starting point and the orientation/angle, denoted as $\left\{\theta, P_{xy}\right\}$, as shown in Fig. \ref{coord}(a). \cite{linecnn}\cite{laneatt} define the start points as lying  on the three image boundaries. However, \cite{adnet} argue that this approach is problematic because the actual starting point of a lane could be located anywhere within the image. In our analysis, using a ray can lead to ambiguity in line representation because a line can have an infinite number of starting points, and the choice of the starting point for a lane is subjective. As illustrated in Fig. \ref{coord}(a), the yellow (the visual start point) and green (the point located on the image boundary) starting points with the same orientation $\theta$ describe the same line, and either could be used in different datasets \cite{scnn}\cite{vil100}. This ambiguity arises because a straight line has two degrees of freedom, whereas a ray has three (two for the start point and one for orientation). To resolve this issue , we propose using polar coordinates to describe a lane anchor with only two parameters: radius and angle, deoted as $\left\{\theta, r\right\}$, where This representation is illustrated in Fig. \ref{coord}(b).
- 
-\newpage
 \begin{figure}[t]
        \centering
        \includegraphics[width=0.45\textwidth]{thesis_figure/local_polar_head.png}
        \caption{The main architecture of LPH.}
        \label{lph}
 \end{figure}
-We define two types of polar coordinate systems: the global coordinate system and the local coordinate system, with the origin points denoted as the global origin $\boldsymbol{c}^{g}$ and the local origin $\boldsymbol{c}^{l}$, respectively. For convenience, the global origin is positioned near the static vanishing point of the entire lane image dataset, while the local origins are set at lattice points within the image. As illustrated in Fig. \ref{coord}(b), only the radius parameters are affected by the choice of the origin point, while the angle/orientation parameters remain consistent.
+% We define two types of polar coordinate systems: the global coordinate system and the local coordinate system, with the origin points denoted as the global origin $\boldsymbol{c}^{g}$ and the local origin $\boldsymbol{c}^{l}$, respectively. For convenience, the global origin is positioned near the static vanishing point of the entire lane image dataset, while the local origins are set at lattice points within the image. As illustrated in Fig. \ref{coord}(b), only the radius parameters are affected by the choice of the origin point, while the angle/orientation parameters remain consistent.
 \subsection{Local Polar Head}
 \textbf{Anchor formulation in local polar head.} Inspired by the region proposal network in Faster R-CNN \cite{fasterrcnn}, the local polar head (LPH) aims to propose flexible, high-quality anchors around the lane ground truths within an image. As shown in Fig. \ref{lph} and Fig. \ref{overall_architecture}, the highest level $P_{3} \in \mathbb{R}^{C_{f} \times H_{f} \times W_{f}}$ of FPN feature maps is selected as the input for LPH. Following a downsampling operation, the feature map is then fed into two branches: the regression branch $\phi _{reg}^{lph}\left(\cdot \right)$ and the classification branch $\phi _{cls}^{lph}\left(\cdot \right)$:
 \begin{equation}
@@ -217,9 +228,8 @@ We define two types of polar coordinate systems: the global coordinate system an

 The regression branch aims to propose lane anchors by predicting two parameters $F_{reg\,\,} \equiv \left\{\theta_{j}, r^{l}_{j}\right\}_{j=1}^{H^{l}\times W^{l}}$, within the local polar coordinate system. These parameters represent the angles and the radii. The classification branch predicts the heat map $F_{cls\,\,}\equiv \left\{ c_j \right\} _{j=1}^{H^l\times W^l}$ of the local polar origin grid. By discarding local origin points with lower confidence, the module increases the likelihood of selecting potential positive foreground lane anchors while removing background lane anchors to the greatest extent. For simplicity, the regression branch $\phi _{reg}^{lph}\left(\cdot \right)$ consists of one $1\times1$ convolutional layer, while the classification branch $\phi _{cls}^{lph}\left(\cdot \right)$ consists of two $1\times1$ convolutional layers.

-\textbf{Loss Function.} During the training phase, as illustrated in Fig. \ref{lphlabel}, the ground truth labels for LPH are constructed as follows. The radius ground truth is defined as the shortest distance from a grid point (local origin point) to the ground truth lane curve. The angle ground truth is defined as the orientation of the vector from the grid point to the nearest point on the curve. A grid point is designated as a positive sample if its radius label is less than a threshold $\tau_{L}$ ; otherwise, it is considered a negative sample.
-
-Once the regression and classification labels are established, the LPH can be trained using the smooth-L1 loss $d\left(\cdot \right)$ for regression and the binary cross-entropy loss $BCE\left( \cdot , \cdot \right)$ for classification. The LPH loss function is defined as follows:
+\textbf{Loss Function.} 
+Once the regression and classification labels are established as shown in Fig. \ref{lphlabel}, the LPH can be trained using the smooth-L1 loss $d\left(\cdot \right)$ for regression and the binary cross-entropy loss $BCE\left( \cdot , \cdot \right)$ for classification. The LPH loss function is defined as follows:
 \begin{equation}
        \begin{aligned}
                \mathcal{L} _{lph}^{cls}&=BCE\left( F_{cls},F_{gt} \right), \\
@@ -230,12 +240,6 @@ Once the regression and classification labels are established, the LPH can be tr

 \textbf{Top-$K_{a}$ Anchor Selection.} During the training stage, all $H^{l}\times W^{l}$ anchors are considered as candidate anchors and fed into the R-CNN module. This approach helps the R-CNN module to learn from sufficient features of negative (background) anchor samples. In the evaluation stage, however, only the top-$K_{a}$ anchors with the highest confidence scores are selected and fed into the R-CNN module. This strategy is designed to filter out potential negative (background) anchors and reduce the computational complexity of the R-CNN module. By doing so, it maintains the adaptability and flexibility of anchor distribution while decreasing the total number of anchors. The following experiments will demonstrate the effectiveness of our top-$K_{a}$ anchor selection strategy.

-\begin{figure}[t]
-        \centering
-        \includegraphics[width=\linewidth]{thesis_figure/coord/localpolar.png}
-        \caption{Label construction for LPH.}
-        \label{lphlabel}
-\end{figure}

 \subsection{Global Polar Head}
 The global polar head (GPH) is a crucial component in the second stage of Polar R-CNN. It takes the pooled features of lane anchors as input and predicts precise lane locations and confidence scores. Fig. \ref{gph} illustrates the structure and pipeline of the GPH, which comprises RoI pooling modules and three subheads (the triplet head module); each is introduced in detail.
@@ -1194,7 +1198,7 @@ In the traditional NMS post-processing \cite{clrernet}, the default IoU threshol
        \end{subfigure}
        \vspace{0.5em}

-        \caption{The visualization of the detection results of sparse\&dense scenarios on CurveLanes dataset.}
+        \caption{Visualization of the detection results in sparse and dense scenarios on the CurveLanes dataset.}
        \label{vis_dense}
 \end{figure*}

--- a/thesis_figure/coord/localpolar.png
+++ b/thesis_figure/coord/localpolar.png
--- a/thesis_figure/coord/polar.png
+++ b/thesis_figure/coord/polar.png
--- a/thesis_figure/~$thisis_pic.pptx
+++ b/thesis_figure/~$thisis_pic.pptx