update

commit 26e73877d0 (parent b732e11ca7)
main.tex (+182 −182)
@@ -18,6 +18,7 @@
\usepackage{booktabs}
\usepackage{tikz}
\usepackage{tabularx}
\usepackage{mathrsfs}
\usepackage[colorlinks,bookmarksopen,bookmarksnumbered,linkcolor=red]{hyperref}
% \usepackage[table,xcdraw]{xcolor}

@@ -241,15 +242,15 @@ where $N^{l}_{pos}=\left|\{j|\hat{r}_j^l<\tau^{l}\}\right|$ is the number of pos
\par
\textbf{RoI Pooling Layer.} It is designed to extract sampled features from lane anchors. For ease of the sampling operation, we first transform the radius of each positive lane anchor in the local polar coordinate system, $r_j^l$, into its equivalent in the global polar coordinate system, $r_j^g$, by the following equation:
\begin{align}
r_{j}^{g}&=r_{j}^{l}+\left[ \cos \theta _j; \sin \theta _j \right] ^T\left( \boldsymbol{c}_{j}^{l}-\boldsymbol{c}^g \right), \label{l2g}\\
j &= 1, 2, \cdots, K, \notag
\end{align}
where $\boldsymbol{c}^{g} \in \mathbb{R}^{2}$ and $\boldsymbol{c}^{l}_{j} \in \mathbb{R}^{2}$ represent the Cartesian coordinates of the global pole and the $j$-th local pole, respectively. It is noteworthy that the angle $\theta_j$ remains unaltered, as the local and global polar coordinate systems share the same polar axis. Next, the feature points are sampled on each lane anchor as follows:
\begin{align}
x_{i,j}^{s}&=-y_{i,j}^{s}\tan \theta _j+\frac{r_{j}^{g}+\left[ \cos \theta _j;\sin \theta _j \right] ^T\boldsymbol{c}^g}{\cos \theta _j},\label{positions}\\
i&=1,2,\cdots,N,\notag
\end{align}
where the y-coordinates $\boldsymbol{y}_{j}^{s}\equiv \{y_{1,j}^s,y_{2,j}^s,\cdots ,y_{N,j}^s\}$ of the $j$-th lane anchor are uniformly sampled vertically from the image, as previously mentioned. The x-coordinates $\boldsymbol{x}_{j}^{s}\equiv \{x_{1,j}^s,x_{2,j}^s,\cdots ,x_{N,j}^s\}$ are then calculated by Eq. (\ref{positions}). The derivation of Eqs. (\ref{l2g})-(\ref{positions}) can be found in Appendix \ref{appendix_coord}.
\par
Given the feature maps $\boldsymbol{P}_1, \boldsymbol{P}_2, \boldsymbol{P}_3$ from the FPN, we can extract the feature vectors corresponding to the positions of the feature points $\{(x_{1,j}^s,y_{1,j}^s),(x_{2,j}^s,y_{2,j}^s),\cdots,(x_{N,j}^s,y_{N,j}^s)\}_{j=1}^{K}$, respectively denoted as $\boldsymbol{F}_{1,j}, \boldsymbol{F}_{2,j}, \boldsymbol{F}_{3,j}\in \mathbb{R} ^{N\times C_f}$. To enhance the representation, similar to \cite{srlane}, we employ a weighted-sum strategy to combine features from different levels as:
\begin{equation}
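The fusion equation itself is cut off at this hunk boundary. As a hedged illustration of the weighted-sum strategy described above, here is a minimal PyTorch sketch under the assumption of one learnable scalar weight per FPN level; the names `fuse_levels` and `w` are ours, not from the paper:

```python
import torch

def fuse_levels(F1, F2, F3, w):
    """Weighted sum of per-level sampled features (hypothetical sketch).
    F1, F2, F3: (K, N, C_f) features gathered from P1, P2, P3.
    w: learnable tensor of shape (3,), e.g. nn.Parameter(torch.zeros(3))."""
    alpha = torch.softmax(w, dim=0)  # normalized level weights
    return alpha[0] * F1 + alpha[1] * F2 + alpha[2] * F3  # (K, N, C_f)
```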
@@ -819,28 +820,36 @@ In this paper, we propose Polar R-CNN to address two key issues in anchor-based
\renewcommand{\thefigure}{A\arabic{figure}}
\renewcommand{\thesection}{A\arabic{section}}
\renewcommand{\theequation}{A\arabic{equation}}
\section{Details about the Coordinate Systems}
In this section, we introduce the details of the coordinate systems employed in our model and the transformations between them.

For convenience, we adopt a Cartesian coordinate system instead of the image coordinate system, wherein the y-axis is oriented from bottom to top and the x-axis from left to right. The coordinates of the local poles $\left\{\boldsymbol{c}^l_i\right\}$, the global pole $\boldsymbol{c}^g$, and the sampled points $\{(x_{1,j}^s,y_{1,j}^s),(x_{2,j}^s,y_{2,j}^s),\cdots,(x_{N,j}^s,y_{N,j}^s)\}_{j=1}^{K}$ of the anchors are all expressed in this coordinate system by default.

We now furnish the derivations of Eq. (\ref{l2g}) and Eq. (\ref{positions}), with the crucial symbols elucidated in Fig. \ref{elu_proof}. These geometric transformations can be demonstrated with analytic geometry in Euclidean space. The derivation of Eq. (\ref{l2g}) is presented as follows:
\begin{align}
r_{j}^{g}&=\left\| \overrightarrow{c^gh_{j}^{g}} \right\| =\left\| \overrightarrow{h_{j}^{a}h_{j}^{l}} \right\| \notag\\
&=\left\| \overrightarrow{c_{j}^{l}h_{j}^{l}}-\overrightarrow{h_{j}^{a}c_{j}^{l}} \right\| =\left\| \overrightarrow{c_{j}^{l}h_{j}^{l}} \right\| -\left\| \overrightarrow{c_{j}^{l}h_{j}^{a}} \right\| \notag\\
&=\left\| \overrightarrow{c_{j}^{l}h_{j}^{l}} \right\| - \frac{\overrightarrow{c_{j}^{l}h_{j}^{a}}}{\left\| \overrightarrow{c_{j}^{l}h_{j}^{a}} \right\|}\cdot \overrightarrow{c_{j}^{l}h_{j}^{a}} =\left\| \overrightarrow{c_{j}^{l}h_{j}^{l}} \right\| +\frac{\overrightarrow{c_{j}^{l}h_{j}^{a}}}{\left\| \overrightarrow{c_{j}^{l}h_{j}^{a}} \right\|}\cdot \overrightarrow{c^gc_{j}^{l}} \notag\\
&=r_{j}^{l}+\left[ \cos \theta _j;\sin \theta _j \right] ^T\left( \boldsymbol{c}_{j}^{l}-\boldsymbol{c}^g \right),
\label{proof_l2g}
\end{align}
where $h_j^l$, $h_j^g$ and $h_j^a$ denote the feet of the respective perpendiculars in Fig. \ref{elu_proof}.

Analogously, the derivation of Eq. (\ref{positions}) is provided as follows:
\begin{align}
&\overrightarrow{c^gp_{i,j}^{s}}\cdot \overrightarrow{c^gh_{j}^{g}}=\overrightarrow{c^gh_{j}^{g}}\cdot \overrightarrow{c^gh_{j}^{g}} \notag\\
\Rightarrow &\overrightarrow{c^gp_{i,j}^{s}}\cdot \overrightarrow{c^gh_{j}^{g}}=\left\| \overrightarrow{c^gh_{j}^{g}} \right\| \left\| \overrightarrow{c^gh_{j}^{g}} \right\| \notag\\
\Rightarrow &\frac{\overrightarrow{c^gh_{j}^{g}}}{\left\| \overrightarrow{c^gh_{j}^{g}} \right\|}\cdot \overrightarrow{c^gp_{i,j}^{s}}=\left\| \overrightarrow{c^gh_{j}^{g}} \right\| \notag\\
\Rightarrow &\left[ \cos \theta _j;\sin \theta _j \right] ^T\left( \boldsymbol{p}_{i,j}^{s}-\boldsymbol{c}^g \right) =r_{j}^{g} \notag\\
\Rightarrow &x_{i,j}^{s}\cos \theta _j+y_{i,j}^{s}\sin \theta _j=r_{j}^{g}+\left[ \cos \theta _j;\sin \theta _j \right] ^T\boldsymbol{c}^g \notag\\
\Rightarrow &x_{i,j}^{s}=-y_{i,j}^{s}\tan \theta _j+\frac{r_{j}^{g}+\left[ \cos \theta _j;\sin \theta _j \right] ^T\boldsymbol{c}^g}{\cos \theta _j},
\label{proof_sample}
\end{align}
where $p_{i,j}^{s}$ represents the $i$-th sampled point of the $j$-th lane anchor, whose coordinate is $\boldsymbol{p}_{i,j}^{s}\equiv(x_{i,j}^s, y_{i,j}^s)$.

\label{appendix_coord}

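To make the two transformations concrete, the following NumPy sketch implements Eq. (\ref{l2g}) and Eq. (\ref{positions}) directly. It is our illustration (function names are hypothetical), assuming angles in radians and the shared polar axis described above:

```python
import numpy as np

def local_to_global_r(r_l, theta, c_l, c_g):
    """Eq. (l2g): local polar radius -> global polar radius.
    theta: anchor angle in radians (shared polar axis);
    c_l, c_g: Cartesian (2,) coordinates of the local and global poles."""
    n = np.array([np.cos(theta), np.sin(theta)])  # unit direction [cos; sin]
    return r_l + n @ (c_l - c_g)

def sample_x(y_s, r_g, theta, c_g):
    """Eq. (positions): x-coordinates of the sampled points at heights y_s.
    Assumes cos(theta) != 0, i.e., the anchor is not horizontal."""
    n = np.array([np.cos(theta), np.sin(theta)])
    return -y_s * np.tan(theta) + (r_g + n @ c_g) / np.cos(theta)

# Toy usage with made-up numbers: one anchor, N = 4 sampled heights.
c_g, c_l = np.array([160.0, 0.0]), np.array([100.0, 40.0])
r_g = local_to_global_r(5.0, 0.3, c_l, c_g)
x_s = sample_x(np.linspace(0.0, 120.0, 4), r_g, 0.3, c_g)
```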
\section{The Design Principles of the One-to-One Classification Head}
@@ -863,6 +872,15 @@ where $\boldsymbol{F}_{cls}^{plain}$ represents a classification head characteri

We draw inspiration from Fast NMS \cite{yolact} for the design of the O2O classification head. Fast NMS is an iteration-free post-processing algorithm derived from traditional NMS. We further incorporate a sort-free strategy along with geometric priors into Fast NMS, with the specifics delineated in Algorithm \ref{Graph Fast NMS}.

\begin{figure}[t]
\centering
\includegraphics[width=\linewidth]{thesis_figure/elu_proof.png}
\caption{The symbols employed in the derivation of coordinate transformations across different coordinate systems.}
\label{elu_proof}
\end{figure}

\begin{algorithm}[t]
\caption{Fast NMS with Geometric Prior.}
\begin{algorithmic}[1] % the [1] makes every line numbered
@@ -871,11 +889,12 @@ We draw inspiration from Fast NMS \cite{yolact} for the design of the O2O classi
The corresponding positive anchors, $\left\{ \theta _i,r_{i}^{g} \right\} |_{i=1}^{K}$;\\
The confidence scores emanating from the O2M classification head, $s_i^g$;\\
The regressions emanating from the O2M regression head, denoted as $\left\{ Lane_i \right\} |_{i=1}^{K}$;\\
The predetermined thresholds $\tau^\theta$, $\tau^r$, $\tau^d$ and $\lambda _{o2m}^{s}$.
\ENSURE ~~\\ % output of the algorithm
\STATE Calculate the confidence comparison matrix $\boldsymbol{A}^{C}\in\mathbb{R}^{K\times K}$, defined as follows:
\begin{align}
A_{ij}^{C}=\begin{cases}
1, s_i^g>s_j^g\,\,or\,\,\left( s_i^g=s_j^g\,\,and\,\,i>j \right)\\
0, \mathrm{otherwise}.\\
\end{cases}
\label{confidential matrix}
@@ -897,45 +916,45 @@ We draw inspiration from Fast NMS \cite{yolact} for the design of the O2O classi
\STATE Define the adjacency matrix $\boldsymbol{A} = \boldsymbol{A}^{C} \odot \boldsymbol{A}^{G}$; the final confidence $\tilde{s}_i^g$ is calculated as follows:
\begin{align}
\tilde{s}_{i}^{g}=\begin{cases}
1, \mathrm{if}\,\underset{D_{ki}\in \{D_{ki}\mid A_{ki}=1\}}{\max}D_{ki}<\left( \tau ^d \right) ^{-1},\\
0, \mathrm{otherwise}.\\
\end{cases}
\label{al_1-4}
\end{align}
\STATE Get the final selection set:
\begin{align}
\varOmega_{nms}^{pos}=\left\{ i|s_{i}^{g}>\lambda _{o2m}^{s}\,\,and\,\,\tilde{s}_{i}^{g}=1 \right\}
\label{al_1-5}
\end{align}
\RETURN The final selection result $\varOmega_{nms}^{pos}$.
\end{algorithmic}
\label{Graph Fast NMS}
\end{algorithm}

The new algorithm has a distinct format from the original one \cite{yolact}. The geometric prior $\boldsymbol{A}^{G}$ indicates that predictions associated with sufficiently proximate anchors are likely to suppress one another. It is straightforward to show that when all elements of $\boldsymbol{A}^{G}$ are set to 1 (disregarding the geometric prior), Algorithm \ref{Graph Fast NMS} is equivalent to Fast NMS. Building upon our newly proposed sort-free Fast NMS with geometric prior, we can design the structure of the one-to-one classification head.
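For concreteness, here is a minimal matrix-form sketch of Algorithm \ref{Graph Fast NMS} (our illustration, not the released implementation). The exact definitions of $\boldsymbol{A}^{G}$ and of the distance $D$ (Eq. (\ref{al_1-3})) fall outside this excerpt, so angle/radius proximity thresholds and a generic inverse distance are assumed here:

```python
import numpy as np

def graph_fast_nms(s, theta, r, D, tau_theta, tau_r, tau_d, lam_o2m):
    """s: (K,) O2M confidences; theta, r: (K,) anchor parameters;
    D: (K, K) pairwise *inverse* distances between predictions (Eq. al_1-3,
    not shown in this excerpt). Returns indices of retained predictions."""
    K = s.shape[0]
    idx = np.arange(K)
    # A^C (Eq. confidential matrix): k may suppress i if it ranks higher,
    # with index-based tie-breaking (sort-free total order).
    A_c = (s[:, None] > s[None, :]) | \
          ((s[:, None] == s[None, :]) & (idx[:, None] > idx[None, :]))
    # A^G: geometric prior, assumed here to be angle/radius proximity.
    A_g = (np.abs(theta[:, None] - theta[None, :]) < tau_theta) & \
          (np.abs(r[:, None] - r[None, :]) < tau_r)
    A = A_c & A_g
    # Eq. (al_1-4): i survives unless some admissible suppressor k is closer
    # than tau^d, i.e. unless max_k D[k, i] >= (tau^d)^{-1}.
    D_masked = np.where(A, D, -np.inf)
    s_tilde = D_masked.max(axis=0) < 1.0 / tau_d
    # Eq. (al_1-5): dual criteria on confidence and the suppression flag.
    return np.flatnonzero((s > lam_o2m) & s_tilde)
```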

The principal limitations of NMS lie in the geometric definition of distance (i.e., Eq. (\ref{al_1-3})) and the threshold $\left( \tau ^d \right)^{-1}$ employed to eliminate redundant predictions (i.e., Eq. (\ref{al_1-4})). For instance, in the scenario of double lines, despite the minimal geometric distance between the two lane instances, their semantic divergence is strikingly distinct. Consequently, we replace the above two steps with trainable neural networks, allowing them to learn the semantic distance in a data-driven fashion. The neural network blocks replacing Eq. (\ref{al_1-3}) are expressed as:
\begin{align}
\tilde{\boldsymbol{F}}_{i}^{roi}&\gets \mathrm{ReLU}\left( \boldsymbol{W}_{roi}\boldsymbol{F}_{i}^{roi}+\boldsymbol{b}_{roi} \right) ,\label{edge_layer_1_appendix}\\
\boldsymbol{F}_{ij}^{edge}&\gets \boldsymbol{W}_{in}\tilde{\boldsymbol{F}}_{j}^{roi}-\boldsymbol{W}_{out}\tilde{\boldsymbol{F}}_{i}^{roi},\label{edge_layer_2_appendix}\\
\tilde{\boldsymbol{F}}_{ij}^{edge}&\gets \boldsymbol{F}_{ij}^{edge}+\boldsymbol{W}_s\left( \boldsymbol{x}_{j}^{s}-\boldsymbol{x}_{i}^{s} \right) +\boldsymbol{b}_s,\label{edge_layer_3_appendix}\\
\boldsymbol{D}_{ij}^{edge}&\gets \mathrm{MLP}_{edge}\left( \tilde{\boldsymbol{F}}_{ij}^{edge} \right) ,\label{edge_layer_4_appendix}
\end{align}
where the inverse distance $\boldsymbol{D}_{ij}^{edge}\in\mathbb{R}^{d_n}$ is no longer a scalar but a tensor. We use element-wise max pooling over tensors to replace the max operation over scalars, so the predetermined $\left( \tau ^d \right) ^{-1}$ can no longer be employed as the distance threshold. Instead, we define a neural network as an implicit decision plane that produces the final score $\tilde{s}_{i}^{g}$. The replacement of Eq. (\ref{al_1-4}) is constructed as follows:
\begin{align}
\boldsymbol{D}_{i}^{node}&\gets \underset{\boldsymbol{D}_{ki}^{edge}\in \left\{ \boldsymbol{D}_{ki}^{edge}|A_{ki}=1 \right\}}{\max}\boldsymbol{D}_{ki}^{edge},\\
\boldsymbol{F}_{i}^{node}&\gets \mathrm{MLP}_{node}\left( \boldsymbol{D}_{i}^{node} \right) ,\\
\tilde{s}_{i}^{g}&\gets \sigma \left( \boldsymbol{W}_{node}\boldsymbol{F}_{i}^{node} + \boldsymbol{b}_{node} \right).
\label{node_layer_appendix}
\end{align}
In this expression, the score $\tilde{s}_{i}^{g}$ transitions from a binary score to a continuous soft score ranging from 0 to 1. We therefore introduce a new threshold $\lambda^s_{o2o}$ within the replacement criteria of Eq. (\ref{al_1-5}):
\begin{align}
\varOmega_{nms}^{pos}=\left\{ i|s_{i}^{g}>\lambda _{o2m}^{s}\,\,and\,\,\tilde{s}_{i}^{g}>\lambda^s_{o2o}\right\}.
\end{align}
This criterion is referred to as the \textit{dual confidence selection} in the main text.

\label{NMS_appendix}

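As an illustration only (the released code may differ), here is a compact PyTorch sketch of the edge and node blocks above; the hidden sizes, the zeroing of nodes that have no admissible suppressor, and the module names are our own placeholder choices:

```python
import torch
import torch.nn as nn

class O2OSelectionHead(nn.Module):
    """Sketch of Eqs. (edge_layer_1-4) and (node_layer): learn an implicit
    suppression decision instead of a hand-set distance threshold."""
    def __init__(self, c_roi, d_n, n_pts):
        super().__init__()
        self.roi_fc = nn.Linear(c_roi, c_roi)
        self.w_in = nn.Linear(c_roi, d_n, bias=False)
        self.w_out = nn.Linear(c_roi, d_n, bias=False)
        self.w_s = nn.Linear(n_pts, d_n)  # embeds x_j^s - x_i^s
        self.mlp_edge = nn.Sequential(nn.Linear(d_n, d_n), nn.ReLU(),
                                      nn.Linear(d_n, d_n))
        self.mlp_node = nn.Sequential(nn.Linear(d_n, d_n), nn.ReLU())
        self.w_node = nn.Linear(d_n, 1)

    def forward(self, F_roi, x_s, A):
        """F_roi: (K, c_roi) RoI features; x_s: (K, n_pts) sampled x-coords;
        A: (K, K) 0/1 adjacency from the confidence/geometric priors."""
        F = torch.relu(self.roi_fc(F_roi))                             # edge eq. 1
        D_edge = self.w_in(F)[None, :, :] - self.w_out(F)[:, None, :]  # edge eq. 2
        D_edge = D_edge + self.w_s(x_s[None, :, :] - x_s[:, None, :])  # edge eq. 3
        D_edge = self.mlp_edge(D_edge)                                 # edge eq. 4
        # element-wise max over admissible suppressors k (A_ki = 1)
        masked = D_edge.masked_fill(A.unsqueeze(-1) == 0, float('-inf'))
        D_node = masked.amax(dim=0)                                    # (K, d_n)
        D_node = torch.where(torch.isinf(D_node),
                             torch.zeros_like(D_node), D_node)  # no suppressor
        s_tilde = torch.sigmoid(self.w_node(self.mlp_node(D_node)))
        return s_tilde.squeeze(-1)  # soft scores in (0, 1)
```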
\begin{table*}[htbp]
@@ -981,33 +1000,72 @@ which is referred to as the \textit{dual confidence selection} in the main text.
\label{dataset_info}
\end{table*}

\begin{figure}[t]
\centering
\includegraphics[width=\linewidth]{thesis_figure/GLaneIoU.png} % replace with your image file name
\caption{Illustration of the GLaneIoU as redefined in our work.}
\label{glaneiou}
\end{figure}

\section{The Details of Intersection over Union between Lane Instances}
To make the IoU between lane instances consistent with that of general object detection methods \cite{iouloss}\cite{giouloss}, we have redefined the lane IoU. As illustrated in Fig. \ref{glaneiou}, the newly-defined IoU of lanes, which we term GLaneIoU, is articulated as follows:
\begin{align}
\Delta x_{i,p}^{d}&=x_{i+1,p}^{d}-x_{i-1,p}^{d},\,\, \Delta y_{i,p}^{d}=y_{i+1,p}^{d}-y_{i-1,p}^{d},\\
w_{i,p}&=\frac{\sqrt{\left( \Delta x_{i,p}^{d} \right) ^2+\left( \Delta y_{i,p}^{d} \right) ^2}}{\Delta y_{i,p}^{d}}w^b,\\
b_{i,p}^{l}&=x_{i,p}^{d}-w_{i,p},\,\, b_{i,p}^{r}=x_{i,p}^{d}+w_{i,p},
\end{align}
where $w^{b}$ is the base semi-width parameter and $w_{i,p}$ is the actual semi-width of the $p$-th lane instance at the $i$-th sampled point. $\left\{ b_{i,p}^{l} \right\} _{i=1}^{N}$ and $\left\{ b_{i,p}^{r} \right\} _{i=1}^{N}$ denote the left and right boundaries of the $p$-th lane instance. We then define the following distances between lane instances:
\begin{align}
d_{i,pq}^{\mathcal{O}}&=\max \left( \min \left( b_{i,p}^{r}, b_{i,q}^{r} \right) -\max \left( b_{i,p}^{l}, b_{i,q}^{l} \right) , 0 \right),\\
d_{i,pq}^{\xi}&=\max \left( \max \left( b_{i,p}^{l}, b_{i,q}^{l} \right) -\min \left( b_{i,p}^{r}, b_{i,q}^{r} \right) , 0 \right),\\
d_{i,pq}^{\mathcal{U}}&=\max \left( b_{i,p}^{r}, b_{i,q}^{r} \right) -\min \left( b_{i,p}^{l}, b_{i,q}^{l} \right),
\end{align}
where $\left\{d_{i,pq}^{\mathcal{O}}\right\}_{i=1}^{N}$, $\left\{d_{i,pq}^{\xi}\right\}_{i=1}^{N}$ and $\left\{d_{i,pq}^{\mathcal{U}}\right\}_{i=1}^{N}$ denote the overlap distance, the gap distance, and the union distance, respectively. These definitions are similar to, but slightly different from, those in \cite{clrnet} and \cite{adnet}, with adjustments made to ensure the values are non-negative. This format is intended to maintain consistency with the IoU definitions used for bounding boxes. Therefore, the overall GLaneIoU between the $p$-th and $q$-th lane instances is given as follows:
\begin{align}
GIoU_{lane}\left( p,q \right)=\frac{\sum\nolimits_{i=j}^k{d_{i,pq}^{\mathcal{O}}}}{\sum\nolimits_{i=j}^k{d_{i,pq}^{\mathcal{U}}}}-g\frac{\sum\nolimits_{i=j}^k{d_{i,pq}^{\xi}}}{\sum\nolimits_{i=j}^k{d_{i,pq}^{\mathcal{U}}}},
\end{align}
where $j$ and $k$ are the indices of the start and end points, respectively. It is straightforward to observe that when $g=0$, $GIoU_{lane}$ corresponds to the IoU for bounding boxes, with a value range of $\left[0, 1 \right]$; when $g=1$, it corresponds to the GIoU \cite{giouloss} for bounding boxes, with a value range of $\left(-1, 1 \right]$.

\label{giou_appendix}

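A small NumPy sketch of the GLaneIoU computation above (our illustration; it assumes x-coordinates sampled at shared, uniformly spaced y-positions, pads the endpoint widths, and uses arbitrary defaults for $w^b$):

```python
import numpy as np

def glane_iou(x_p, x_q, w_b=2.5, g=0.0):
    """GLaneIoU between two lanes given their x-coordinates at shared,
    uniformly spaced y-samples (unit spacing assumed)."""
    def boundaries(x):
        dx = x[2:] - x[:-2]                     # Delta x via central difference
        dy = 2.0                                # Delta y for unit y-spacing
        w = np.sqrt(dx ** 2 + dy ** 2) / dy * w_b
        w = np.concatenate([w[:1], w, w[-1:]])  # pad endpoint semi-widths
        return x - w, x + w                     # left/right boundaries b^l, b^r
    lp, rp = boundaries(x_p)
    lq, rq = boundaries(x_q)
    d_o = np.maximum(np.minimum(rp, rq) - np.maximum(lp, lq), 0.0)   # overlap
    d_xi = np.maximum(np.maximum(lp, lq) - np.minimum(rp, rq), 0.0)  # gap
    d_u = np.maximum(rp, rq) - np.minimum(lp, lq)                    # union
    return d_o.sum() / d_u.sum() - g * d_xi.sum() / d_u.sum()
```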
\section{Details about the Label Assignment and Loss Function}

\begin{figure}[t]
\centering
\includegraphics[width=\linewidth]{thesis_figure/detection_head_assign.png}
\caption{Label assignment and loss functions for the triplet head.}
\label{head_assign}
\end{figure}
We now furnish the cost functions and label assignments for the triplet head. We use a dual label assignment strategy \cite{date}, as illustrated in Fig. \ref{head_assign}. Specifically, we use one-to-many label assignment for both the O2M classification head and the O2M regression head; this part is almost the same as in previous work \cite{clrernet}. To equip our model with the NMS-free paradigm, we additionally add an O2O classification head and employ one-to-one label assignment for it.

The cost metrics for the one-to-one and one-to-many label assignments are given as follows:
\begin{align}
\mathcal{C} _{p,q}^{o2o}&=\tilde{s}_{p}^{g}\times \left( GIoU_{lane}\left( p,q \right) \right) ^{\beta}, \label{o2o_cost}\\
\mathcal{C} _{p,q}^{o2m}&=s_{p}^{g}\times \left( GIoU_{lane}\left( p,q \right) \right) ^{\beta}, \label{o2m_cost}
\end{align}
where $\mathcal{C} _{p,q}^{o2o}$ and $\mathcal{C} _{p,q}^{o2m}$ are the cost metrics between the $p$-th prediction and the $q$-th ground truth, and $g$ in $GIoU_{lane}$ is set to $0$ to keep it non-negative. These metrics imply that both the confidence score and the geometric distance contribute to the cost.
Suppose that there are $K$ predictions and $G$ ground truths. Let $\pi$ denote a one-to-one label assignment strategy, where $\pi(q)$ indicates that the $\pi(q)$-th prediction is assigned to the $q$-th ground truth. Additionally, $\mathscr{S}_{K, G}$ denotes the set of all possible one-to-one assignment strategies for $K$ predictions and $G$ ground truths. It is easy to show that the total number of one-to-one assignment strategies $\left| \mathscr{S} _{K,G} \right|$ is $\frac{K!}{\left( K-G \right)!}$. The final assignment $\hat{\pi}$ is determined as follows:
\begin{align}
\hat{\pi}=\underset{\pi \in \mathscr{S}_{K,G}}{\arg\max}\sum_{q=1}^G{\mathcal{C} _{\pi \left( q \right) ,q}^{o2o}}.
\end{align}
This assignment problem can be solved by the Hungarian algorithm \cite{detr}. Finally, $G$ predictions are assigned as positive samples and the remaining $K-G$ predictions as negative samples.
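This matching step can be sketched with SciPy's Hungarian solver, `linear_sum_assignment`, which minimizes total cost, so the score is negated to maximize the sum of Eq. (\ref{o2o_cost}) (our illustration; names are hypothetical):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def one_to_one_assign(s_tilde, giou, beta=1.0):
    """s_tilde: (K,) O2O scores; giou: (K, G) GLaneIoU (g=0) between
    predictions and ground truths. Returns one prediction index per gt."""
    cost = s_tilde[:, None] * np.clip(giou, 0.0, 1.0) ** beta  # Eq. (o2o_cost)
    rows, cols = linear_sum_assignment(-cost)  # maximize the total cost
    pi_hat = np.empty(cost.shape[1], dtype=int)
    pi_hat[cols] = rows                        # pi_hat[q] = assigned prediction
    return pi_hat
```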
For the one-to-many label assignment, we simply use SimOTA \cite{yolox}, in line with previous works \cite{clrernet}. Omitting the detailed process of SimOTA, we only introduce its inputs: the cost matrix $\boldsymbol{M}^C\in \mathbb{R}^{G\times K}$ and the IoU matrix $\boldsymbol{M}^{IoU}\in \mathbb{R}^{G\times K}$. The elements of the two matrices are defined as $M^C_{qp}=\mathcal{C} _{p,q}^{o2m}$ and $M^{IoU}_{qp}= GIoU_{lane}\left( p,q \right)$ (with $g=0$), respectively. The number of predictions assigned to each ground truth is not fixed but is bounded above by $k_{dynamic}$, which is set to $4$ in our experiments. Finally, there are $K_{pos}$ positive samples and $K-K_{pos}$ negative samples, where $K_{pos}$ ranges from $0$ to $Gk_{dynamic}$.
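For orientation only, a simplified dynamic-k sketch in the spirit of SimOTA; the actual algorithm additionally uses a center prior and a top-IoU-based estimate of $k$, so everything below is a hedged approximation:

```python
import numpy as np

def simota_assign(M_cost, M_iou, k_dynamic=4):
    """Simplified dynamic-k assignment (our sketch, not full SimOTA).
    M_cost, M_iou: (G, K) matrices defined above; the cost metric here is
    higher-is-better, so candidates are taken by descending cost."""
    G, K = M_cost.shape
    pos = np.zeros((G, K), dtype=bool)
    for q in range(G):
        # dynamic k per ground truth, capped by k_dynamic
        k = int(np.clip(M_iou[q].sum(), 1, k_dynamic))
        pos[q, np.argsort(-M_cost[q])[:k]] = True
    # resolve conflicts: each prediction serves at most one ground truth
    for p in np.flatnonzero(pos.sum(axis=0) > 1):
        best = np.argmax(M_cost[:, p] * pos[:, p])
        pos[:, p] = False
        pos[best, p] = True
    return pos  # pos[q, p] = True if prediction p is positive for gt q
```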
Given the ground-truth label generated by the label assignment strategy for each prediction, we can construct the loss functions used during the training phase. As illustrated in Fig. \ref{head_assign}, $\mathcal{L}_{cls}^{o2o}$ and $\mathcal{L}_{rank}$ are for the O2O classification head, $\mathcal{L}_{cls}^{o2m}$ is for the O2M classification head, whereas $\mathcal{L}_{GIoU}$ (with $g=1$), $\mathcal{L}_{end}$ and $\mathcal{L}_{aux}$ are for the O2M regression head. The training of the O2M classification and regression heads is almost the same as in previous works \cite{clrnet}.

\begin{figure*}[t]
\centering
\def\pagewidth{0.49\textwidth}
\def\subwidth{0.47\linewidth}
@@ -1145,7 +1203,7 @@ where j and k are the indices of the start point and the end point, respectively

\begin{figure*}[t]
\centering
\def\subwidth{0.24\textwidth}
\def\imgwidth{\linewidth}