This commit is contained in:
王老板 2024-10-01 19:08:03 +08:00
parent f0a77edf44
commit 37a9a10309
4 changed files with 51 additions and 54 deletions

main.tex

@ -220,9 +220,7 @@ The regression branch consists of a single $1\times1$ convolutional layer and wi
\par
\textbf{Loss Function for Training the LPM.} To train the local polar module, we define the ground truth labels for each local pole as follows: the ground truth radius, $\hat{r}^l_i$, is set to be the minimum distance from a local pole to the corresponding lane curve, while the ground truth angle, $\hat{\theta}_i$, is set to be the orientation of the vector extending from the local pole to the nearest point on the curve. A local pole is labeled as positive (one) if its ground truth radius is smaller than the threshold $\tau^{l}$; otherwise, it is labeled as negative (zero). Consequently, we have a label set of local poles $F_{gt}=\{\hat{s}_j^l\}_{j=1}^{H^l\times W^l}$, where $\hat{s}_j^l=1$ if the $j$-th local pole is positive and $\hat{s}_j^l=0$ if it is negative. Once the regression and classification labels are established, as shown in Fig. \ref{lpmlabel}, the LPM can be trained using the \textit{smooth-L}1 loss $S_{L1}\left(\cdot \right)$ for the regression branch and the \textit{binary cross-entropy} loss $BCE\left( \cdot , \cdot \right)$ for the classification branch. The loss functions for the LPM are given as follows:
\begin{align}
\mathcal{L} ^{lpm}_{cls}&=BCE\left( F_{cls},F_{gt} \right), \\
\mathcal{L} ^{lpm}_{reg}&=\frac{1}{N^{lpm}_{pos}}\sum_{j\in \left\{j|\hat{r}_j^l<\tau^{l} \right\}}{\left( S_{L1}\left( \theta_j-\hat{\theta}_j \right) +S_{L1}\left( r_j^l-\hat{r}_j^l \right) \right)}, \\
\mathcal{L} ^{lpm} &= \mathcal{L} ^{lpm}_{cls} + w^{lpm}_{reg}\mathcal{L} ^{lpm}_{reg},
\label{loss_lph}
\end{align}
where $N^{lpm}_{pos}=\left|\{j|\hat{r}_j^l<\tau^{l}\}\right|$ is the number of positive local poles in the LPM.
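For concreteness, a minimal PyTorch-style sketch of this loss is given below; the tensor layout, the function name \texttt{lpm\_loss}, and the use of logits for the classification branch are illustrative assumptions rather than the actual implementation.
\begin{verbatim}
import torch
import torch.nn.functional as F

def lpm_loss(theta_pred, r_pred, cls_logits, theta_gt, r_gt, tau_l, w_reg=1.0):
    # Classification labels: a pole is positive if its ground-truth
    # radius is below the threshold tau_l (see the definition above).
    labels = (r_gt < tau_l).float()
    loss_cls = F.binary_cross_entropy_with_logits(cls_logits, labels)

    # Smooth-L1 regression, averaged over the positive poles only.
    pos = r_gt < tau_l
    n_pos = pos.sum().clamp(min=1)
    loss_reg = (F.smooth_l1_loss(theta_pred[pos], theta_gt[pos], reduction='sum')
                + F.smooth_l1_loss(r_pred[pos], r_gt[pos], reduction='sum')) / n_pos
    return loss_cls + w_reg * loss_reg
\end{verbatim}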
@ -257,62 +255,58 @@ Given the feature maps $P_1, P_2, P_3$ from FPN, we can extract feature vectors
\end{equation}
where $\boldsymbol{w}_{k}\in \mathbb{R} ^{N^{lpm}_{pos}}$ represents the learnable aggregate weight, serving as a learned model weight. Instead of concatenating the three sampling features into $\boldsymbol{F}^s\in \mathbb{R} ^{N_p\times d_f\times 3}$ directly, the adaptive summation significantly reduces the feature dimensions to $\boldsymbol{F}^s\in \mathbb{R} ^{N_p\times d_f}$, which is one-third of the original dimension. The weighted sum tensors are then fed into fully connected layers to obtain the pooled RoI features of an anchor:
\begin{equation}
\boldsymbol{F}^{roi}\gets FC_{pool}\left( \boldsymbol{F}^s \right), \boldsymbol{F}^{roi}\in \mathbb{R} ^{d_r}.
\end{equation}
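As a rough illustration of the adaptive summation and pooling described above, consider the following sketch for a single anchor; the per-level scalar weights are a simplification of the learnable weights $\boldsymbol{w}_{k}$, and all layer sizes are assumptions.
\begin{verbatim}
import torch
import torch.nn as nn

class AdaptiveRoIPooling(nn.Module):
    """Weighted sum of the features sampled from P1, P2, P3,
    followed by a fully connected pooling layer (FC_pool)."""
    def __init__(self, n_points, d_feature, d_roi):
        super().__init__()
        self.w = nn.Parameter(torch.ones(3) / 3.0)    # one weight per FPN level
        self.fc_pool = nn.Linear(n_points * d_feature, d_roi)

    def forward(self, feats):                 # feats: [N_p, d_f, 3]
        fused = (feats * self.w).sum(dim=-1)  # -> [N_p, d_f]
        return self.fc_pool(fused.flatten())  # -> [d_r]
\end{verbatim}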
\textbf{Triplet Head.} The triplet head encompasses three distinct components: the one-to-one classification (O2O cls) head, the one-to-many classification (O2M cls) head, and the one-to-many regression (O2M reg) head, as depicted in Fig. \ref{gpm}. In numerous studies \cite{laneatt}\cite{clrnet}\cite{adnet}\cite{srlane}, the detection head predominantly adheres to the one-to-many paradigm. During the training phase, multiple positive samples are assigned to a single ground truth. Consequently, during the evaluation phase, redundant detection outcomes are frequently predicted for each instance. These redundancies are conventionally mitigated using Non-Maximum Suppression (NMS), which eradicates duplicate results. Nevertheless, NMS relies on the definition of the geometric distance between detection results, rendering this calculation intricate for curvilinear lanes and other irregular geometric shapes. Moreover, NMS post-processing introduces challenges in balancing recall and precision, a concern highlighted in our previous analysis. To attain optimal non-redundant detection outcomes within an NMS-free paradigm (i.e., end-to-end detection), both the one-to-one and one-to-many paradigms become pivotal during the training stage, as underscored in \cite{o2o}\cite{}. Drawing inspiration from \cite{} but with subtle variations, we architect the triplet head to achieve an NMS-free paradigm.
To ensure both simplicity and efficiency in our model, the O2M regression head and the O2M classification head are constructed using a straightforward architecture featuring two-layer Multi-Layer Perceptrons (MLPs). To facilitate the model's transition to an end-to-end paradigm, we have developed an extended O2O classification head. As illustrated in Fig. \ref{gpm}, it is important to note that the detection process of the O2O classification head is not independent; rather, the confidence $\left\{ \tilde{s}_i \right\}$ output by the O2O classification head relies upon the confidence $\left\{ s_i \right\} $ output by the O2M classification head.
\begin{figure}[t]
\centering
\includegraphics[width=0.9\linewidth]{thesis_figure/gnn.png} % replace with your image file name
\caption{The main architecture of the O2O classification head. Each anchor is conceived as a node within the Polar GNN. The interconnecting edges are formed through the amalgamation of three distinct matrices: the positive selection matrix $\left\{M_{ij}^{P}\right\}$, the confidence comparison matrix $\left\{M_{ij}^{C}\right\}$, and the geometric prior matrix $\left\{M_{ij}^{G}\right\}$. $\left\{M_{ij}^{P}\right\}$ and $\left\{M_{ij}^{C}\right\}$ are derived from the O2M classification head (the orange box), whereas $\left\{M_{ij}^{G}\right\}$ is constructed in accordance with the positional parameters of the anchors (the dashed box).}
\label{o2o_cls_head}
\end{figure}
As shown in Fig. \ref{o2o_cls_head}, we introduce a novel architecture that incorporates a \textit{graph neural network} \cite{gnn} (GNN) with a polar geometric prior, which we refer to as the Polar GNN. The Polar GNN is designed to model the relationships between features $\boldsymbol{F}_{i}^{roi}$ sampled from different anchors. Based on our previous analysis, the distance between lanes should not only be modeled by explicit geometric properties but should also consider implicit contextual semantics, such as “double” and “forked” lanes. These types of lanes, despite their tiny geometric differences, should not be removed as redundant predictions. The structural insight of the Polar GNN is derived from Fast NMS \cite{yolact}, which operates without iterative processes. The detailed design can be found in the appendix; here, we focus on elaborating the architecture of the Polar GNN.
In the Polar GNN, each anchor is conceptualized as a node, with the RoI features $\boldsymbol{F}_{i}^{roi}$ serving as the attributes of these nodes. A pivotal component of the GNN is the edge, represented by the adjacency matrix. This matrix is derived from three submatrices. The first component is the positive selection matrix, denoted as $\mathbf{M}^{P}\in\mathbb{R}^{K\times K}$:
\begin{align}
M_{ij}^{P}=\begin{cases}
1, \left( s_i\geqslant \tau _s\land s_j\geqslant \tau _s \right)\\
0, \mathrm{otherwise},\\
\end{cases}
\end{align}
where $\tau _s$ signifies the threshold for positive scores in the NMS paradigm. We employ this threshold to selectively retain positive redundant predictions.
The second component is the confidence comparison matrix $\mathbf{M}^{C}\in\mathbb{R}^{K\times K}$, defined as follows:
\begin{align}
M_{ij}^{C}=\begin{cases}
1, \left( s_i<s_j \right) \lor \left( s_i=s_j \land i<j \right)\\
0, \mathrm{otherwise}.\\
\end{cases}
\label{confidential matrix}
\end{align}
This matrix facilitates the comparison of scores for each pair of anchors.
The third component is the geometric prior matrix, denoted by $\mathbf{M}^{G}\in\mathbb{R}^{K\times K}$, which is defined as:
\begin{align}
M_{ij}^{G}=\begin{cases}
1,\left| \theta _i-\theta _j \right|<\theta _{\tau}\land \left| r_{i}^{global}-r_{j}^{global} \right|<r_{\tau}\\
0, \mathrm{otherwise}.\\
\end{cases}
\label{geometric prior matrix}
\end{align}
This matrix indicates that an edge (\textit{i.e.}, a relationship between two nodes) is considered to exist between the two corresponding nodes if the anchors are sufficiently close.
With the aforementioned three matrices, we can define the overall adjacency matrix as $\mathbf{M} = \mathbf{M}^{P} \land \mathbf{M}^{C} \land \mathbf{M}^{G}$, where ``$\land$'' denotes the elementwise ``AND''. The relationship between the $i$-th anchor and the $j$-th anchor is then modeled as follows:
\begin{align}
\tilde{\boldsymbol{F}}_{i}^{roi} & \gets \mathrm{ReLU}\left( FC_{o2o}^{roi}\left( \boldsymbol{F}_{i}^{roi} \right) \right), \\
\boldsymbol{F}_{ij}^{edge} & \gets FC_{in}\left( \tilde{\boldsymbol{F}}_{i}^{roi} \right) - FC_{out}\left( \tilde{\boldsymbol{F}}_{j}^{roi} \right) + FC_{b}\left( \varDelta \boldsymbol{x}_{ij}^{b} \right), \\
\boldsymbol{D}_{ij}^{edge} & \gets MLP_{edge}\left( \boldsymbol{F}_{ij}^{edge} \right).
\label{edge_layer}
\end{align}
Here, $\varDelta \boldsymbol{x}_{ij}^{b}$ denotes the difference in the x-axis coordinates of the sampled points between the $i$-th anchor and the $j$-th anchor, and $\boldsymbol{D}_{ij}^{edge}\in\mathbb{R}^d$ denotes the implicit semantic distance features from the $i$-th anchor to the $j$-th anchor. Given the semantic distance features for each pair of anchors, we employ a max pooling layer to aggregate the adjacent node features and update the node attributes, ultimately yielding the final non-redundant scores $\left\{ \tilde{s}_i\right\}$:
\begin{align}
\boldsymbol{D}_{i}^{node}&\gets \underset{j\in \left\{ j|M_{ij}=1 \right\}}{\max}\boldsymbol{D}_{ij}^{edge},
\\
@ -322,39 +316,41 @@ where $\boldsymbol{D}_{ij}^{edge}\in\mathbb{R}^d$ denoted the implicit semantic
\label{node_layer}
\end{align}
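The adjacency construction and the edge/node updates above can be summarized by the following sketch; \texttt{fc\_node}, the final projection to a scalar score, and the handling of isolated nodes are assumptions, and the linear layers are stand-ins for $FC_{in}$, $FC_{out}$, $FC_{b}$ and $MLP_{edge}$.
\begin{verbatim}
import torch

def build_adjacency(s, theta, r_glob, tau_s, tau_theta, tau_r):
    # M = M^P AND M^C AND M^G for K anchors; s are the O2M confidences.
    K = s.numel()
    idx = torch.arange(K)
    M_P = (s[:, None] >= tau_s) & (s[None, :] >= tau_s)
    M_C = (s[:, None] < s[None, :]) | \
          ((s[:, None] == s[None, :]) & (idx[:, None] < idx[None, :]))
    M_G = ((theta[:, None] - theta[None, :]).abs() < tau_theta) & \
          ((r_glob[:, None] - r_glob[None, :]).abs() < tau_r)
    return M_P & M_C & M_G

def o2o_scores(F_roi, dx, M, fc_in, fc_out, fc_b, mlp_edge, fc_node):
    # F_roi is assumed to have already passed through FC_o2o^roi + ReLU.
    edge = fc_in(F_roi)[:, None, :] - fc_out(F_roi)[None, :, :] + fc_b(dx)
    D_edge = mlp_edge(edge)                              # [K, K, d]
    D_edge = D_edge.masked_fill(~M[..., None], float('-inf'))
    D_node = D_edge.max(dim=1).values                    # max over adjacent j
    D_node = torch.nan_to_num(D_node, neginf=0.0)        # nodes with no neighbour
    return torch.sigmoid(fc_node(D_node)).squeeze(-1)    # non-redundant scores
\end{verbatim}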
\textbf{Label Assignment and Cost Function.} As in previous work, we use the dual assignment strategy for label assignment of the triplet head. The cost function between the $i$-th prediction and the $j$-th ground truth is given as follows:
\begin{align}
\mathcal{C} _{ij}=s_i\times \left( GIoU_{lane} \right) ^{\beta}.
\end{align}
This cost function is more compact than those in previous works \cite{clrnet}\cite{adnet}, taking both location and confidence into account, with $\beta$ serving as the trade-off hyperparameter between location and confidence. We redefine the GLaneIoU function, denoted $GIoU_{lane}$, which differs slightly from previous work \cite{clrernet}. More details about $GIoU_{lane}$ can be found in Appendix \ref{}.
We use SimOTA \cite{yolox} with dynamic $k=4$ (one-to-many assignment) for the O2M classification head and the O2M regression head, while the Hungarian algorithm \cite{detr} (one-to-one assignment) is employed for the O2O classification head.
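To make the one-to-one assignment concrete, the following sketch builds the cost matrix as defined above and matches predictions to ground truths with the Hungarian algorithm; the GLaneIoU values are assumed to be precomputed, and the SimOTA branch is omitted.
\begin{verbatim}
import numpy as np
from scipy.optimize import linear_sum_assignment

def o2o_assign(scores, glane_iou, beta=1.0):
    # scores: [N_pred]; glane_iou: [N_pred, N_gt] (assumed precomputed).
    cost = scores[:, None] * np.power(np.clip(glane_iou, 0.0, 1.0), beta)
    # C_ij measures matching quality, so negate it for the minimizer.
    pred_idx, gt_idx = linear_sum_assignment(-cost)
    return list(zip(pred_idx, gt_idx))   # at most one prediction per ground truth
\end{verbatim}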
\textbf{Loss function.}
We utilize focal loss \cite{focal} for both the O2O classification head and the O2M classification head, denoted as $\mathcal{L}^{o2o}_{cls}$ and $\mathcal{L}^{o2m}_{cls}$, respectively. The set of candidate samples involved in the computation of $\mathcal{L}^{o2o}_{cls}$, denoted as $\varOmega ^{pos}_{o2o}$ and $\varOmega ^{neg}_{o2o}$ for the positive and negative target sets, is confined to the positive sample set of the O2M classification head:
\begin{align}
\varOmega ^{pos}_{o2o}\cup \varOmega ^{neg}_{o2o}=\left\{ i\mid s_i>\tau _s \right\}.
\end{align}
In essence, certain samples with lower O2M scores are excluded from the computation of $\mathcal{L}^{o2o}_{cls}$. Furthermore, we harness the rank loss $\mathcal{L} _{\,\,rank}$ from \cite{pss} to amplify the disparity between the positive and negative confidences of the O2O classification head.
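A minimal sketch of this restriction is given below, assuming a standard sigmoid focal loss and normalization by the number of candidates; both choices are illustrative.
\begin{verbatim}
import torch

def o2o_cls_loss(s_o2m, logits_o2o, targets, tau_s, alpha=0.25, gamma=2.0):
    cand = s_o2m > tau_s                  # candidates: positives of the O2M head
    p = torch.sigmoid(logits_o2o[cand])
    t = targets[cand].float()             # 1 for the matched anchor, else 0
    pt = p * t + (1 - p) * (1 - t)
    w = alpha * t + (1 - alpha) * (1 - t)
    loss = -w * (1 - pt).pow(gamma) * torch.log(pt.clamp(min=1e-6))
    return loss.sum() / cand.sum().clamp(min=1)
\end{verbatim}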
\begin{figure}[t]
\centering
\includegraphics[width=\linewidth]{thesis_figure/auxloss.png} %
\caption{Auxiliary loss for segment parameter regression. The ground truth of a lane curve is partitioned into several segments, with the parameters of each segment denoted as $\left( \hat{\theta}_{i,\cdot}^{seg},\hat{r}_{i,\cdot}^{seg} \right)$. The model outputs the parameter offsets $\left( \varDelta \theta _{j,\cdot},\varDelta r_{j,\cdot}^{g} \right)$ to regress from the original anchor to each target line segment.}
\label{auxloss}
\end{figure}
We directly apply the redefined GLaneIoU loss (refer to Appendix \ref{}), $\mathcal{L}_{GIoU}$, to regress the offset of x-axis coordinates of sampled points and the Smooth-L1 loss for the regression of end points of lanes, denoted as $\mathcal{L} _{end}$. To facilitate the learning of global features, we propose the auxiliary loss $\mathcal{L} _{\mathrm{aux}}$ depicted in Fig. \ref{auxloss}. The anchors and ground truth are segmented into several divisions. Each anchor segment is regressed to the primary components of the corresponding segment of the designated ground truth. This approach aids the anchors in acquiring a deeper comprehension of the global geometric form.
The final loss functions for GPM are given as follows:
\begin{align}
\mathcal{L} _{cls}^{gpm}&=w_{o2m}^{cls}\mathcal{L} _{o2m}^{\mathrm{cls}}+w_{o2o}^{cls}\mathcal{L} _{\mathrm{o}2\mathrm{o}}^{\mathrm{cls}}+w_{rank}\mathcal{L} _{\mathrm{rank}},
\\
\mathcal{L} _{reg}^{gpm}&=w_{GIoU}\mathcal{L} _{G\mathrm{IoU}}+w_{end}\mathcal{L} _{\mathrm{end}}+w_{aux}\mathcal{L} _{\mathrm{aux}}.
\end{align}
% \begin{align}
% \mathcal{L}_{aux} &= \frac{1}{\left| \varOmega^{pos}_{o2m} \right| N_{seg}} \sum_{i \in \varOmega_{pos}^{o2o}} \sum_{m=j}^k \Bigg[ l \left( \theta_i - \hat{\theta}_{i}^{seg,m} \right) \\
% &\quad + l \left( r_{i}^{global} - \hat{r}_{i}^{seg,m} \right) \Bigg].
% \end{align}
\subsection{The Overall Loss Function.} The entire training process is orchestrated in an end-to-end manner, wherein both the LPM and the GPM are trained concurrently. The overall loss function is delineated as follows:
\begin{align}
\mathcal{L} =\mathcal{L} _{cls}^{lpm}+\mathcal{L} _{reg}^{lpm}+\mathcal{L} _{cls}^{gpm}+\mathcal{L} _{reg}^{gpm}.
\end{align}
@ -371,11 +367,11 @@ We conducted experiments on four widely used lane detection benchmarks and one r
We use the F1-score to evaluate our model on the CULane, LLAMAS, DL-Rail, and Curvelanes datasets, maintaining consistency with previous works. The F1-score is defined as follows:
\begin{align}
Pre\,\,&=\,\,\frac{TP}{TP+FP},
\\
Rec\,\,&=\,\,\frac{TP}{TP+FN},
\\
F1&=\frac{2\times Pre\times Rec}{Pre\,\,+\,\,Rec}.
\end{align}
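As a quick sanity check, the metric can be evaluated directly from the raw counts; the numbers below are made up for illustration.
\begin{verbatim}
def f1_score(tp, fp, fn):
    pre = tp / (tp + fp)
    rec = tp / (tp + fn)
    return 2 * pre * rec / (pre + rec)

# e.g. tp=880, fp=120, fn=100 -> precision 0.88, recall ~0.898, F1 ~0.889
\end{verbatim}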
In our experiment, we use different IoU thresholds to calculate the F1-score for different datasets: F1@50 and F1@75 for CULane \cite{clrnet}, F1@50 for LLAMAS \cite{clrnet} and Curvelanes \cite{CondLaneNet}, and F1@50, F1@75, and mF1 for DL-Rail \cite{dalnet}. The mF1 is defined as:
\begin{align}
@ -570,12 +566,12 @@ We also compare the number of anchors and processing speed with other methods. F
\label{anchor_num_method}
\end{figure}
\subsection{Ablation Study}
To validate and analyze the effectiveness and influence of the different components of Polar R-CNN, we conduct several ablation experiments on the CULane and CurveLanes datasets.
\textbf{Ablation study on polar coordinate system and anchor number.} To assess the importance of local polar coordinates of anchors, we examine the contribution of each component (i.e., angle and radius) to model performance. As shown in Table \ref{aba_lph}, both angle and radius contribute to performance to varying degrees. Additionally, we conduct experiments with auxiliary loss using fixed anchors and Polar R-CNN. Fixed anchors refer to using anchor settings trained by CLRNet, as illustrated in Fig. \ref{anchor setting}(b). Model performance improves by 0.48\% and 0.3\% under the fixed anchor paradigm and the proposal anchor paradigm, respectively.
We also explore the effect of different local polar map sizes on our model, as illustrated in Fig. \ref{anchor_num_testing}. The overall F1 measure improves as the local polar map size increases and tends to stabilize once the size is sufficiently large. Specifically, precision improves, while recall decreases. A larger polar map size includes more background anchors in the second stage (since we choose $k=4$ for SimOTA, with no more than four positive samples for each ground truth). Consequently, the model learns more negative samples, enhancing precision but reducing recall. Regarding the number of anchors chosen during the evaluation stage, recall and F1 measure show a significant increase in the early stages of anchor number expansion but stabilize in later stages. This suggests that eliminating some anchors does not significantly affect performance. Fig. \ref{cam} displays the heat map and the top-$K_{a}$ selected anchor distribution in sparse scenarios. Brighter colors indicate a higher likelihood of anchors being foreground anchors. It is evident that most of the proposed anchors are clustered around the lane ground truth.
\begin{figure}[t]
@ -669,7 +665,7 @@ We also explore the stop-gradient strategy for the O2O classification head. As s
\begin{adjustbox}{width=\linewidth}
\begin{tabular}{cccc|ccc}
\toprule
\textbf{GNN}&$\mathbf{M}^{C}$&$\mathbf{M}^{G}$&\textbf{Rank Loss}&\textbf{F1@50 (\%)}&\textbf{Precision (\%)} & \textbf{Recall (\%)} \\
\midrule
& & & &16.19&69.05&9.17\\
\checkmark&\checkmark& & &79.42&88.46&72.06\\
@ -952,7 +948,7 @@ We then define the cost function between $i$-th prediction and $j$-th ground tru
\mathcal{C} _{ij}=\left(s_i\right)^{\beta_c}\times \left( GLaneIoU_{ij, g=0} \right) ^{\beta_r}.
\end{align}
This cost function is more compact than those in previous works\cite{clrnet}\cite{adnet} and takes both location and confidence into account. For label assignment, SimOTA (with $k=4$) \cite{yolox} is used for the two O2M heads with one-to-many assignment, while the Hungarian \cite{detr} algorithm is employed for the O2O classification head for one-to-one assignment.
\textbf{Loss function.} We use focal loss \cite{focal} for O2O classification head and O2M classification head:
@ -968,10 +964,11 @@ where the set of the one-to-one sample, $\varOmega _{pos}^{o2o}$ and $\varOmega
\end{align}
Only samples with confidence larger than $C_{o2m}$ are chosen as candidate samples for the O2O classification head. According to \cite{pss}, to maintain feature quality during the training stage, the gradients of the O2O classification head are stopped from propagating back to the rest of the network (stopped at the RoI feature of the anchor, $\boldsymbol{F}_{i}^{roi}$). Additionally, we use the rank loss to increase the gap between the positive and negative confidences of the O2O classification head:
\begin{align}
&\mathcal{L} _{\,\,rank}=\frac{1}{N_{rank}}\sum_{i\in \varOmega ^{pos}_{o2o}}{\sum_{j\in \varOmega ^{neg}_{o2o}}{\max \left( 0, \tau _{rank}-\tilde{s}_i+\tilde{s}_j \right)}},\\
&N_{rank}=\left| \varOmega ^{pos}_{o2o} \right|\left| \varOmega ^{neg}_{o2o} \right|.
\end{align}
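A compact sketch of the rank loss and the stop-gradient described above follows; the index sets are assumed to be given as index tensors, and the value of $\tau_{rank}$ here is illustrative.
\begin{verbatim}
import torch

def rank_loss(s_tilde, pos_idx, neg_idx, tau_rank=0.5):
    s_pos = s_tilde[pos_idx][:, None]              # [|pos|, 1]
    s_neg = s_tilde[neg_idx][None, :]              # [1, |neg|]
    margins = torch.clamp(tau_rank - s_pos + s_neg, min=0.0)
    return margins.mean()                          # mean = sum / (|pos| * |neg|)

# Stop-gradient: feed detached RoI features to the O2O head,
# e.g. o2o_logits = o2o_head(F_roi.detach()).
\end{verbatim}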
We directly use the GLaneIoU loss, $\mathcal{L}_{GLaneIoU}$, to regress the offsets of the x-axis coordinates of the sampled points (with $g=1$) and the Smooth-L1 loss for the regression of the end points (namely, the y-axis coordinates of the start point and the end point), denoted as $\mathcal{L} _{end}$. To help the model learn global features, we propose the auxiliary loss illustrated in Fig. \ref{auxloss}:
\begin{align}
