This commit is contained in:
王老板 2024-09-18 10:28:56 +08:00
parent 4fa2730c18
commit bd2198fc44

@@ -200,7 +200,7 @@ However, lane anchors, which are essentially straight lines represented as rays,
 \textbf{Representation in Polar Coordinates.}
 As stated above, lane anchors represented by rays have some drawbacks. To address these issues, we introduce the polar coordinate representation of lane anchors. In mathematics, the polar coordinate system is a two-dimensional coordinate system in which each point on a plane is determined by a distance from a reference point (also called the pole) and an angle $\theta$ from a reference direction (called the polar axis). As shown in Fig. \ref{coord}(b), a lane anchor for a straight line can be uniquely defined by two parameters: the radial distance from the pole (called the radius), $r$, and the counterclockwise angle from the polar axis, $\theta$, with $r \in \mathbb{R}$ and $\theta\in\left(-\frac{\pi}{2}, \frac{\pi}{2}\right]$. To better exploit the local inductive bias of CNNs, we define two types of polar coordinate systems: the local polar coordinate system and the global polar coordinate system.
-In the polar coordinate system, we introduce a set of reference points known as local poles. These local poles are positioned at the lattice points (or pixels) of the downsampled feature map, as illustrated in Fig. \ref{lphlabel} (a). Each local pole, which we denoted as $\mathbf{c}_{i}^{l}$, is responsible for predicting a single lane anchor, similar to the green points shown in Fig. \ref{lphlabel} (a). During training, as depicted in Fig. \ref{lphlabel} (a), the ground truth labels for each local pole are defined as follows: the radius ground truth is the shortest distance from a grid point (local origin) to the ground truth lane curve, and the angle ground truth represents the orientation of the vector from the grid point to the nearest point on the curve. A grid point is labeled as a positive sample (the green local poles) if its radius label is below a threshold $\tau_{l}$; otherwise, it is considered a negative sample (the red poles). Note that one lane curve instance is regressed by multiple local poles. Some local features around certain poles may be missed due to damage or occlusion of the lane curve, so the one-to-many approach is crucial for ensuring comprehensive anchor proposals.
+In the polar coordinate system, we introduce a set of reference points known as local poles. These local poles are positioned at the lattice points (or pixels) of the downsampled feature map, as illustrated in Fig. \ref{lphlabel} (a). Each local pole, denoted as $\mathbf{c}_{i}^{l}$, is responsible for predicting a single lane anchor, similar to the green points shown in Fig. \ref{lphlabel} (a). During training, as depicted in Fig. \ref{lphlabel} (a), the ground truth labels for each local pole are defined as follows: the radius ground truth is the shortest distance from a local pole to the ground truth lane curve, and the angle ground truth represents the orientation of the vector from the local pole to the nearest point on the curve. A local pole is labeled as a positive sample (the green points) if its radius label is below a threshold $\tau_{l}$; otherwise, it is considered a negative sample (the red points). Note that one lane curve instance is regressed by multiple local poles. Some local features around certain poles may be missed due to damage or occlusion of the lane curve, so the one-to-many approach is crucial for ensuring comprehensive anchor proposals.
 In the second stage (RoI Pooling and final lane detection), we standardize the lane anchors by transforming them from multiple local polar coordinate systems into a single uniform global coordinate system. This system contains only one reference point, termed the global pole, denoted as $\mathbf{c}^{g}$. The location of the global pole is manually set, and in this work, it is positioned around the static vanishing point of the entire lane image dataset.
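The local-pole labeling rule described in this hunk can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the lane curve is assumed to be given as a polyline of $(x, y)$ points, the helper names `nearest_point_on_polyline` and `local_pole_label` are hypothetical, and folding the angle into $\left(-\frac{\pi}{2}, \frac{\pi}{2}\right]$ by flipping the sign of the radius is one plausible way to match the stated ranges of $r$ and $\theta$. The plain mathematical axis convention is used; image coordinates with a downward $y$ axis would mirror the angle.

```python
import math

def nearest_point_on_polyline(p, polyline):
    """Return the point on the polyline closest to p, and its distance to p."""
    best_pt, best_d = None, math.inf
    for (x0, y0), (x1, y1) in zip(polyline, polyline[1:]):
        dx, dy = x1 - x0, y1 - y0
        seg_len2 = dx * dx + dy * dy
        # Parameter of the orthogonal projection of p onto the segment, clamped to [0, 1].
        t = 0.0 if seg_len2 == 0 else ((p[0] - x0) * dx + (p[1] - y0) * dy) / seg_len2
        t = max(0.0, min(1.0, t))
        q = (x0 + t * dx, y0 + t * dy)
        d = math.hypot(q[0] - p[0], q[1] - p[1])
        if d < best_d:
            best_pt, best_d = q, d
    return best_pt, best_d

def local_pole_label(pole, polyline, tau_l):
    """Return (theta, r, is_positive) for one local pole.

    Assumption: theta is folded into (-pi/2, pi/2] and r takes the opposite
    sign whenever the fold is applied, so (theta, r) still describes the same line.
    """
    q, d = nearest_point_on_polyline(pole, polyline)
    theta = math.atan2(q[1] - pole[1], q[0] - pole[0])  # direction pole -> nearest point
    r = d
    if theta > math.pi / 2:
        theta, r = theta - math.pi, -d
    elif theta <= -math.pi / 2:
        theta, r = theta + math.pi, -d
    # Positive sample if the shortest distance to the lane is below the threshold.
    return theta, r, d < tau_l

# Hypothetical example: a vertical lane at x = 10 and three local poles.
lane = [(10.0, 0.0), (10.0, 100.0)]
left_label = local_pole_label((4.0, 50.0), lane, tau_l=8.0)
right_label = local_pole_label((16.0, 50.0), lane, tau_l=8.0)
far_label = local_pole_label((40.0, 50.0), lane, tau_l=8.0)
```

In this toy setup the poles on either side of the lane both fall within the threshold and receive the same angle with radii of opposite sign, while the distant pole is labeled negative.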
@@ -226,7 +226,7 @@ In the second stage (RoI Pooling and final lane detection), we standardize the l
 \label{lph equ}
 \end{equation}
-The regression branch aims to propose lane anchors by predicting two parameters $F_{reg\,\,} \equiv \left\{\theta_{j}, r^{l}_{j}\right\}_{j=1}^{H^{l}\times W^{l}}$, within the local polar coordinate system. These parameters represent the angles and the radius.The classification branch predicts the heat map $F_{cls\,\,}\equiv \left\{ c_j \right\} _{j=1}^{H^l\times W^l}$ of the local polar origin grid. By discarding local origin points with lower confidence, the module increases the likelihood of selecting potential positive foreground lane anchors while removing background lane anchors to the greatest extent. Keeping it simple, the regression branch $\phi _{reg}^{lph}\left(\cdot \right)$ consists of one $1\times1$ convolutional layer while the classification branch $\phi _{cls}^{lph}\left(\cdot \right)$ consists of two $1\times1$ convolutional layers.
+The regression branch aims to propose lane anchors by predicting two parameters $F_{reg\,\,} \equiv \left\{\theta_{j}, r^{l}_{j}\right\}_{j=1}^{H^{l}\times W^{l}}$, within the local polar coordinate system. These parameters represent the angles and the radii. The classification branch predicts the heat map $F_{cls\,\,}\equiv \left\{ c_j \right\} _{j=1}^{H^l\times W^l}$ of the local poles. By discarding local poles with lower confidence, the module increases the likelihood of selecting potential positive foreground lane anchors while removing background lane anchors to the greatest extent. For simplicity, the regression branch $\phi _{reg}^{lph}\left(\cdot \right)$ consists of one $1\times1$ convolutional layer, while the classification branch $\phi _{cls}^{lph}\left(\cdot \right)$ consists of two $1\times1$ convolutional layers.
 \textbf{Loss Function.}
 Once the regression and classification labels are established as shown in Fig. \ref{lphlabel}, the LPH can be trained using the smooth-L1 loss $d\left(\cdot \right)$ for regression and the binary cross-entropy loss $BCE\left( \cdot , \cdot \right)$ for classification. The LPH loss function is defined as follows:
@@ -250,7 +250,7 @@ Global polar head (GPH) is a crucial component in the second stage of Polar R-CN
 r^{g}_{j}=r^{l}_{j}+\left( \textbf{c}^{l}_{j}-\textbf{c}^{g} \right) ^{T}\left[\cos\theta_{j}; \sin\theta_{j} \right].
 \end{aligned}
 \end{equation}
-where $\textbf{c}^{l}_{j} \in \mathbb{R}^{2}$ and $\textbf{c}^{g} \in \mathbb{R}^{2}$ represent the Cartesian coordinates of local and global origins correspondingly.
+where $\textbf{c}^{l}_{j} \in \mathbb{R}^{2}$ and $\textbf{c}^{g} \in \mathbb{R}^{2}$ represent the Cartesian coordinates of the $j$-th local pole and the global pole, respectively.
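The local-to-global radius conversion above can be checked numerically with a short Python sketch. It assumes, as an interpretation rather than something stated in the text, that an anchor with parameters $(\theta, r)$ relative to a pole $\textbf{c}$ is the set of points $\textbf{p}$ satisfying $(\textbf{p}-\textbf{c})^{T}[\cos\theta; \sin\theta] = r$; under that reading the formula follows directly, because changing the pole only shifts the projection by $(\textbf{c}^{l}-\textbf{c}^{g})^{T}[\cos\theta; \sin\theta]$. The function names and the numeric values are hypothetical.

```python
import math

def local_to_global_radius(r_local, theta, c_local, c_global):
    """r^g = r^l + (c^l - c^g)^T [cos(theta); sin(theta)]; theta is unchanged."""
    dx = c_local[0] - c_global[0]
    dy = c_local[1] - c_global[1]
    return r_local + dx * math.cos(theta) + dy * math.sin(theta)

def signed_offset(point, pole, theta):
    """Projection of (point - pole) onto the unit normal [cos(theta), sin(theta)]."""
    return ((point[0] - pole[0]) * math.cos(theta)
            + (point[1] - pole[1]) * math.sin(theta))

# Hypothetical poles and local anchor parameters.
c_local = (120.0, 80.0)
c_global = (160.0, 40.0)
theta = math.pi / 6
r_local = 5.0

r_global = local_to_global_radius(r_local, theta, c_local, c_global)

# A point known to lie on the anchor: the foot of the perpendicular from the local pole.
foot = (c_local[0] + r_local * math.cos(theta),
        c_local[1] + r_local * math.sin(theta))

# The same point must satisfy the line equation written around the global pole.
consistent = abs(signed_offset(foot, c_global, theta) - r_global) < 1e-9
```

Because only the radius depends on the pole, anchors proposed by different local poles become directly comparable once they are all expressed relative to the single global pole.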
 Next, feature points are sampled on the lane anchor. The y-coordinates of these points are uniformly sampled vertically from the image, as previously mentioned. The $x_{i}$ coordinates are computed using the global polar axis with the following equation:
 \begin{equation}