From a2e56810de0fb59f1956fa268eb72a9c33747cb6 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E7=8E=8B=E8=80=81=E6=9D=BF?=
Date: Fri, 11 Oct 2024 10:31:56 +0800
Subject: [PATCH] update

---
 main.tex | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/main.tex b/main.tex
index f3de097..5598a76 100644
--- a/main.tex
+++ b/main.tex
@@ -231,11 +231,11 @@ where $N^{l}_{pos}=\left|\{j|\hat{r}_j^l<\tau^{l}\}\right|$ is the number of pos
     \centering
     \includegraphics[width=0.89\linewidth]{thesis_figure/detection_head.png}
     \caption{The main pipeline of GPM. It comprises the RoI Pooling Layer alongside the triplet heads, namely the O2O classification head, the O2M classification head, and the O2M regression head. The predictions generated by the O2M classification head $\left\{s_i^g\right\}$ are redundant and require Non-Maximum Suppression (NMS) post-processing. Conversely, the O2O classification head serves as a substitute for NMS, directly delivering the non-redundant prediction scores (denoted as $\left\{\tilde{s}_i^g\right\}$) based on the redundant scores (denoted as $\left\{s_i^g\right\}$) from the O2M classification head.}
-    \label{gpm}
+    \label{g}
 \end{figure}
 \subsection{Global Polar Module}
-Similar to the pipeline of Faster R-CNN, the LPM serves as the first stage for generating lane anchor proposals. As illustrated in Fig. \ref{overall_architecture}, we introduce a novel \textit{Global Polar Module} (GPM) as the second stage to achieve final lane prediction. The GPM takes features samples from anchors and outputs the precise location and confidence scores of final lane detection results. The overall architecture of GPM is illustrated in the Fig. \ref{gpm}.
+Similar to the pipeline of Faster R-CNN, the LPM serves as the first stage for generating lane anchor proposals. As illustrated in Fig. \ref{overall_architecture}, we introduce a novel \textit{Global Polar Module} (GPM) as the second stage to achieve the final lane prediction.
+The GPM takes feature samples from anchors and outputs the precise locations and confidence scores of the final lane detection results. The overall architecture of GPM is illustrated in Fig. \ref{g}.
 \par
 \textbf{RoI Pooling Layer.} It is designed to extract sampled features from lane anchors. To ease the sampling operation, we first convert the radius of the positive lane anchors in the local polar coordinate system, $r_j^l$, to that in the global polar coordinate system, $r_j^g$, by the following equation
 \begin{align}
@@ -258,9 +258,9 @@ where $\boldsymbol{w}_{k}\in \mathbb{R} ^{N^{l}_{pos}}$ represents the learnable
 \boldsymbol{F}^{roi}\gets FC_{pool}\left( \boldsymbol{F}^s \right), \boldsymbol{F}^{roi}\in \mathbb{R} ^{d_r}.
 \end{equation}
-\textbf{Triplet Head.} The triplet head encompasses three distinct components: the one-to-one (O2O) classification head, the one-to-many (O2M) classification head, and the one-to-many (O2M) regression head, as depicted in Fig. \ref{gpm}. In numerous studies \cite{laneatt}\cite{clrnet}\cite{adnet}\cite{srlane}, the detection head predominantly adheres to the one-to-many paradigm. During the training phase, multiple positive samples are assigned to a single ground truth. Consequently, during the evaluation phase, redundant detection outcomes are frequently predicted for each instance. These redundancies are conventionally mitigated using Non-Maximum Suppression (NMS), which eradicates duplicate results. Nevertheless, NMS relies on the definition of the geometric distance between detection results, rendering this calculation intricate for curvilinear lanes. Moreover, NMS post-processing introduces challenges in balancing recall and precision, a concern highlighted in our previous analysis. To attain optimal non-redundant detection outcomes within a NMS-free paradigm (i.e., end-to-end detection), both the one-to-one and one-to-many paradigms become pivotal during the training stage, as underscored in \cite{o2o}.
-Drawing inspiration from \cite{o3d}\cite{pss} but with subtle variations, we architect the triplet head to achieve a NMS-free paradigm.
+\textbf{Triplet Head.} The triplet head encompasses three distinct components: the one-to-one (O2O) classification head, the one-to-many (O2M) classification head, and the one-to-many (O2M) regression head, as depicted in Fig. \ref{g}. In numerous studies \cite{laneatt}\cite{clrnet}\cite{adnet}\cite{srlane}, the detection head predominantly adheres to the one-to-many paradigm. During the training phase, multiple positive samples are assigned to a single ground truth. Consequently, during the evaluation phase, redundant detection outcomes are frequently predicted for each instance. These redundancies are conventionally mitigated using Non-Maximum Suppression (NMS), which eliminates duplicate results. Nevertheless, NMS relies on the definition of a geometric distance between detection results, rendering this calculation intricate for curvilinear lanes. Moreover, NMS post-processing introduces challenges in balancing recall and precision, a concern highlighted in our previous analysis. To attain optimal non-redundant detection outcomes within an NMS-free paradigm (i.e., end-to-end detection), both the one-to-one and one-to-many paradigms become pivotal during the training stage, as underscored in \cite{o2o}. Drawing inspiration from \cite{o3d}\cite{pss} but with subtle variations, we architect the triplet head to achieve an NMS-free paradigm.
-To ensure both simplicity and efficiency in our model, the O2M regression head and the O2M classification head are constructed using a straightforward architecture featuring two-layer Multi-Layer Perceptrons (MLPs). To facilitate the model’s transition to an end-to-end paradigm, we have developed an extended O2O classification head. As illustrated in Fig.
-\ref{gpm}, it is important to note that the detection process of the O2O classification head is not independent; rather, the confidence $\left\{ \tilde{s}_i \right\}$ output by the O2O classificatoin head relies upon the confidence $\left\{ s_i \right\} $ output by the O2M classification head.
+To ensure both simplicity and efficiency in our model, the O2M regression head and the O2M classification head are constructed using a straightforward architecture featuring two-layer Multi-Layer Perceptrons (MLPs). To facilitate the model’s transition to an end-to-end paradigm, we have developed an extended O2O classification head. As illustrated in Fig. \ref{g}, it is important to note that the detection process of the O2O classification head is not independent; rather, the confidence $\left\{ \tilde{s}_i \right\}$ output by the O2O classification head relies upon the confidence $\left\{ s_i \right\}$ output by the O2M classification head.
 \begin{figure}[t]
     \centering
@@ -340,9 +340,9 @@ We directly apply the redefined GLaneIoU loss (refer to Appendix \ref{giou_appen
 The final loss functions for GPM are given as follows:
 \begin{align}
-    \mathcal{L} _{cls}^{gpm}&=w_{o2m}^{cls}\mathcal{L} _{o2m}^{\mathrm{cls}}+w_{o2o}^{cls}\mathcal{L} _{\mathrm{o}2\mathrm{o}}^{\mathrm{cls}}+w_{rank}\mathcal{L}_{\mathrm{rank}},
+    \mathcal{L} _{cls}^{g}&=w_{o2m}^{cls}\mathcal{L} _{o2m}^{\mathrm{cls}}+w_{o2o}^{cls}\mathcal{L} _{\mathrm{o}2\mathrm{o}}^{\mathrm{cls}}+w_{rank}\mathcal{L}_{\mathrm{rank}},
 \\
-    \mathcal{L} _{reg}^{gpm}&=w_{GIoU}\mathcal{L} _{G\mathrm{IoU}}+w_{end}\mathcal{L}_{\mathrm{end}}+w_{aux}\mathcal{L} _{\mathrm{aux}}.
+    \mathcal{L} _{reg}^{g}&=w_{GIoU}\mathcal{L} _{G\mathrm{IoU}}+w_{end}\mathcal{L}_{\mathrm{end}}+w_{aux}\mathcal{L} _{\mathrm{aux}}.
 \end{align}
 % \begin{align}
 %     \mathcal{L}_{aux} &= \frac{1}{\left| \varOmega^{pos}_{o2m} \right| N_{seg}} \sum_{i \in \varOmega_{pos}^{o2o}} \sum_{m=j}^k \Bigg[ l \left( \theta_i - \hat{\theta}_{i}^{seg,m} \right) \\
@@ -350,7 +350,7 @@ The final loss functions for GPM are given as follows:
 % \end{align}
 \subsection{The Overall Loss Function.} The entire training process is orchestrated in an end-to-end manner, wherein both the LPM and the GPM are trained concurrently. The overall loss function is delineated as follows:
 \begin{align}
-\mathcal{L} =\mathcal{L} _{cls}^{l}+\mathcal{L} _{reg}^{l}+\mathcal{L} _{cls}^{gpm}+\mathcal{L} _{reg}^{gpm}.
+\mathcal{L} =\mathcal{L} _{cls}^{l}+\mathcal{L} _{reg}^{l}+\mathcal{L} _{cls}^{g}+\mathcal{L} _{reg}^{g}.
 \end{align}
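As a minimal numeric sketch of how the weighted terms in the patched loss equations combine, the snippet below mirrors $\mathcal{L}_{cls}^{g}$, $\mathcal{L}_{reg}^{g}$, and the overall loss $\mathcal{L}$. The function names and every numeric value (weights and per-term loss magnitudes) are placeholders for illustration, not the paper's actual settings or implementation.

```python
# Sketch of the GPM loss composition and the overall end-to-end objective.
# All weights default to 1.0 here purely as placeholders.

def gpm_cls_loss(l_o2m, l_o2o, l_rank, w_o2m=1.0, w_o2o=1.0, w_rank=1.0):
    """L_cls^g = w_o2m * L_o2m^cls + w_o2o * L_o2o^cls + w_rank * L_rank."""
    return w_o2m * l_o2m + w_o2o * l_o2o + w_rank * l_rank

def gpm_reg_loss(l_giou, l_end, l_aux, w_giou=1.0, w_end=1.0, w_aux=1.0):
    """L_reg^g = w_GIoU * L_GIoU + w_end * L_end + w_aux * L_aux."""
    return w_giou * l_giou + w_end * l_end + w_aux * l_aux

def overall_loss(l_cls_lpm, l_reg_lpm, l_cls_gpm, l_reg_gpm):
    """L = L_cls^l + L_reg^l + L_cls^g + L_reg^g.

    The four terms enter unweighted, matching the overall loss equation;
    per-term weighting happens inside the LPM/GPM losses themselves.
    """
    return l_cls_lpm + l_reg_lpm + l_cls_gpm + l_reg_gpm

# Placeholder per-term loss values for one training step.
l_cls_g = gpm_cls_loss(0.5, 0.3, 0.1)   # 0.9
l_reg_g = gpm_reg_loss(0.4, 0.2, 0.1)   # 0.7
total = overall_loss(0.6, 0.7, l_cls_g, l_reg_g)  # 2.9
```

Since both the LPM and GPM terms are summed into one scalar, a single backward pass trains both stages concurrently, which is what makes the pipeline end-to-end.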