This commit is contained in:
王老板 2024-11-02 18:48:34 +08:00
parent 263bea254e
commit 163eb2fa72
2 changed files with 28 additions and 9 deletions

View File

@@ -19,14 +19,31 @@
\usepackage{tikz}
\usepackage{tabularx}
\usepackage{mathrsfs}
\usepackage{etoolbox}
% Define a command to disable citation references
\newcommand{\disablecitations}{%
\renewcommand{\cite}[1]{}%
}
% Define a command to restore citation references
\newcommand{\enablecitations}{%
\let\cite\oldcite%
}
% Save the original \cite command
\let\oldcite\cite
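For reference, the intended call pattern of these toggles (a minimal sketch, not from the source): \disablecitations swaps in a stub that swallows one mandatory argument, and \enablecitations restores the copy saved in \oldcite, so the save above must run before the first disable.

```latex
\let\oldcite\cite   % already done above: keep a copy of the real \cite
\disablecitations   % from here on, \cite{key} prints nothing
% ... draft-only material without citation links ...
\enablecitations    % \cite restored from \oldcite
```

Note that the stub takes only one mandatory argument, so optional-argument forms such as \cite[p.~5]{key} would leave the bracketed text in the output.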
\usepackage[colorlinks,bookmarksopen,bookmarksnumbered, linkcolor=red]{hyperref}
\definecolor{darkgreen}{RGB}{17,159,27} %
\aboverulesep=0pt
\belowrulesep=0pt
\hyphenation{op-tical net-works semi-conduc-tor IEEE-Xpolare}
% updated with editorial comments 8/9/2021
% \renewcommand{\includegraphics}[2][]{} % redefine \includegraphics as a no-op
\begin{document}
\disablecitations
\enablecitations
\title{Polar R-CNN:\@ End-to-End Lane Detection with Fewer Anchors}
@@ -303,7 +320,7 @@ where $j=1,2,\cdots,K$ and $\mathrm{MPool}_{col}(\cdot|\boldsymbol{A}(:,j)=1)$ i
\end{align}
As stated above, the O2O classification subhead is formed from Eqs. (\ref{edge_layer_1})-(\ref{node_layer}) and can be viewed as a directed graph driven by neural networks, referred to as the \textit{graph neural network} (GNN) block.
\par
\textbf{Dual Confidence Selection with NMS-free.} With the help of the adjacency matrix $\boldsymbol{A}$, the variability among the semantic features $\{\boldsymbol{D}_j^{roi}\}$ has been enlarged, resulting in a significant gap in the confidence scores $\{\tilde{s}_{j}^{g}\}$ generated by the O2O classification subhead, which makes them easier to distinguish. Therefore, unlike conventional methods that feed the confidence scores $\{s_{j}^{g}\}$ obtained by the O2M classification subhead into the NMS post-processing stage to remove redundant candidates, we adopt the following dual confidence selection criterion for selecting positive anchors:
\begin{align}
\Omega^{pos}=\left\{i|\tilde{s}_{i}^{g}>\tau_{o2o} \right\} \cap \left\{ i|s_{i}^{g}>\tau_{o2m} \right\},
\end{align}
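In code, the dual criterion is simply the intersection of two thresholded index sets. A minimal sketch (the scores and the thresholds $\tau_{o2o}$, $\tau_{o2m}$ below are illustrative values, not the paper's):

```python
# Dual confidence selection: an anchor is kept as positive only if it
# passes BOTH the O2O and the O2M confidence thresholds.
tau_o2o, tau_o2m = 0.5, 0.4           # illustrative thresholds (assumed)
s_o2o = [0.9, 0.6, 0.2, 0.8]          # O2O subhead scores, \tilde{s}_i^g
s_o2m = [0.7, 0.3, 0.9, 0.5]          # O2M subhead scores, s_i^g

positive = [i for i in range(len(s_o2o))
            if s_o2o[i] > tau_o2o and s_o2m[i] > tau_o2m]
# -> [0, 3]: anchor 1 fails the O2M test, anchor 2 fails the O2O test
```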
@@ -754,7 +771,9 @@ In this paper, we propose Polar R-CNN to address two key issues in anchor-based
\newpage
% When the appendix contains multiple sections
\enablecitations
\begin{appendices}
\setcounter{table}{0} % start numbering from 0 so tables are displayed from A1
\setcounter{figure}{0}
\setcounter{section}{0}
@@ -974,10 +993,10 @@ Details about cost function and label assignments for the triplet head are furni
The cost metrics for both one-to-one and one-to-many label assignments are articulated as follows:
\begin{align}
\mathcal{C} _{p,q}^{o2o}=\tilde{s}_{p}^{g}\times \left( GIoU\left( p,q \right) \right) ^{\beta} \label{o2o_cost},\\
\mathcal{C} _{p,q}^{o2m}=s_{p}^{g}\times \left( GIoU\left( p,q \right) \right) ^{\beta}, \label{o2m_cost}
\end{align}
where $\mathcal{C} _{p,q}^{o2o}$ and $\mathcal{C} _{p,q}^{o2m}$ denote the cost metrics between the $p$-th prediction and the $q$-th ground truth, and $g$ in $GIoU$ is set to $0$ to ensure it remains non-negative. These metrics imply that both the confidence score and the geometric distance contribute to the cost.
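Concretely, each cost entry couples confidence with geometry. A toy computation (the exponent $\beta$ and the numbers are assumed values):

```python
# One cost entry C_{p,q}^{o2m} = s_p^g * GIoU(p, q)^beta, with g = 0 so
# the GIoU term is non-negative.
beta = 1.0          # cost exponent (assumed value)
s_p = 0.9           # O2M confidence of prediction p (toy value)
giou_pq = 0.75      # GIoU between prediction p and ground truth q (toy value)

cost_o2m = s_p * giou_pq ** beta
# -> 0.675: large only when confidence and overlap are both high
```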
Suppose there exist $K$ predictions and $G$ ground truths. Let $\pi$ denote a one-to-one label assignment strategy and $\pi(q)$ represent that the $\pi(q)$-th prediction is assigned to the $q$-th ground truth. Additionally, $\mathscr{S}_{K, G}$ denotes the set of all possible one-to-one assignment strategies for $K$ predictions and $G$ ground truths. It is straightforward to show that the total number of one-to-one assignment strategies $\left| \mathscr{S} _{K,G} \right|$ is $\frac{K!}{\left( K-G \right)!}$. The final optimal assignment $\hat{\pi}$ is determined as follows:
\begin{align}
@@ -1218,7 +1237,7 @@ This assignment problem can be solved by Hungarian algorithm \cite{detr}. Finall
\caption{Visualization of the detection outcomes in sparse and dense scenarios on the CurveLanes dataset.}
\label{vis_dense}
\end{figure*}
In the one-to-many label assignment, we simply use SimOTA \cite{yolox}, which aligns with previous works \cite{clrernet}. Omitting the detailed process of SimOTA, we only introduce its inputs, namely the cost matrix $\boldsymbol{M}^C\in \mathbb{R}^{G\times K}$ and the IoU matrix $\boldsymbol{M}^{IoU}\in \mathbb{R}^{G\times K}$. The elements of the two matrices are defined as $M^C_{qp}=\mathcal{C} _{p,q}^{o2m}$ and $M^{IoU}_{qp}= GIoU\left( p,q \right)$ (with $g=0$), respectively. The number of assigned predictions for each ground truth is variable but does not exceed an upper bound $k_{dynamic}$, which is set to $4$ in our experiments. Finally, there are $K_{pos}$ positive samples and $K-K_{pos}$ negative samples, where $K_{pos}$ ranges from $0$ to $Gk_{dynamic}$.
Given the ground truth label generated by the label assignment strategy for each prediction, we can construct the loss functions during the training phase. As illustrated in Fig. \ref{head_assign}, $\mathcal{L}_{cls}^{o2o}$ and $\mathcal{L}_{rank}$ are for the O2O classification subhead, $\mathcal{L}_{cls}^{o2m}$ is for the O2M classification subhead, while $\mathcal{L}_{GIOU}$ (with $g=1$), $\mathcal{L}_{end}$, and $\mathcal{L}_{aux}$ are for the O2M regression subhead.
\label{assign_appendix} \label{assign_appendix}
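The counting claim above is easy to check numerically: an injective map $\pi$ from $G$ ground truths into $K$ predictions is a $G$-permutation of $K$ items. A brute-force sketch with toy costs (assuming the optimal assignment maximizes the total cost, since $\mathcal{C}^{o2o}$ grows with both confidence and overlap; the model itself uses the Hungarian algorithm):

```python
from itertools import permutations
from math import factorial

K, G = 4, 2
# toy cost matrix: cost[q][p] = C_{p,q}^{o2o} (assumed values)
cost = [[0.9, 0.1, 0.2, 0.3],
        [0.2, 0.8, 0.1, 0.1]]

# every injective map q -> pi(q): there are K!/(K-G)! of them
strategies = list(permutations(range(K), G))
assert len(strategies) == factorial(K) // factorial(K - G)  # 12 for K=4, G=2

# brute-force optimum: the strategy with the largest total cost
best = max(strategies, key=lambda pi: sum(cost[q][pi[q]] for q in range(G)))
# -> (0, 1): prediction 0 serves ground truth 0, prediction 1 serves ground truth 1
```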
@@ -1236,5 +1255,5 @@ Fig. \ref{vis_sparse} illustrates the visualization outcomes in sparse scenarios
Fig. \ref{vis_dense} shows the visualization outcomes in dense scenarios. The first column displays the ground truth, while the second and third columns reveal the detection results under the NMS paradigm with large (\textit{i.e.}, the default threshold NMS@50 with 50 pixels) and small (\textit{i.e.}, the optimal threshold NMS@15 with 15 pixels) NMS thresholds, respectively. The final column shows the detection results under the NMS-free paradigm. We observe that NMS@50 mistakenly removes some predictions, leading to false negatives, while NMS@15 fails to eliminate some redundant predictions, leading to false positives. This underscores the trade-off between large and small NMS thresholds. The visualization distinctly demonstrates that geometric distance becomes less effective in dense scenarios.
Only the proposed O2O classification subhead, driven by data, can address this issue by capturing semantic distance beyond geometric distance. As shown in the last column of Fig. \ref{vis_dense}, the O2O classification subhead successfully eliminates redundant predictions while preserving dense predictions, despite their minimal geometric distances.
\label{vis_appendix}
\end{appendices}
\end{document}

View File

@@ -1,2 +1,2 @@
latexmk -C
latexmk -pdf main.tex