The first matrix is the confidence comparison matrix $\boldsymbol{A}^{C}$, whose elements are defined as follows:
\begin{align}
A_{ij}^{C}=\begin{cases}
	1, & s_i>s_j\,\,\mathrm{or}\,\,\left( s_i=s_j\,\,\mathrm{and}\,\,i>j \right),\\
	0, & \mathrm{others}.
\end{cases}
\label{confidential matrix}
\end{align}
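
For illustration, a minimal NumPy sketch of this construction follows; the helper name and array layout are our assumptions, not part of the released implementation:
\begin{verbatim}
import numpy as np

def confidence_comparison_matrix(s):
    # A^C[i, j] = 1 iff anchor i outranks anchor j: a strictly higher
    # confidence wins, and exact ties are broken by the anchor index.
    s = np.asarray(s, dtype=np.float64)
    idx = np.arange(len(s))
    higher = s[:, None] > s[None, :]
    tie = (s[:, None] == s[None, :]) & (idx[:, None] > idx[None, :])
    return (higher | tie).astype(np.float32)
\end{verbatim}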
The second component is the geometric prior matrix, denoted by $\boldsymbol{A}^{G}$, whose elements are defined as follows:
\begin{align}
A_{ij}^{G}=\begin{cases}
	1, & \left| \theta _i-\theta _j \right|<\lambda ^{\theta}\,\,\mathrm{and}\,\,\left| r_{i}^{g}-r_{j}^{g} \right|<\lambda ^r,\\
	0, & \mathrm{others}.
\end{cases}
\label{geometric prior matrix}
\end{align}
This matrix indicates that an edge (\textit{i.e.}, a relationship between two nodes) is considered to exist \textit{only if} the two corresponding anchors are sufficiently close to each other, where the distance between anchors is measured by their global polar parameters.
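
Analogously, a minimal NumPy sketch of the geometric prior (again with illustrative names; the thresholds $\lambda^{\theta}$ and $\lambda^{r}$ are passed in as arguments):
\begin{verbatim}
import numpy as np

def geometric_prior_matrix(theta, r, lam_theta, lam_r):
    # A^G[i, j] = 1 iff the two anchors are close in both global polar
    # parameters: the angle gap is below lam_theta and the radius gap
    # is below lam_r.
    theta, r = np.asarray(theta), np.asarray(r)
    close_theta = np.abs(theta[:, None] - theta[None, :]) < lam_theta
    close_r = np.abs(r[:, None] - r[None, :]) < lam_r
    return (close_theta & close_r).astype(np.float32)
\end{verbatim}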

With the aforementioned two matrices, the overall adjacency matrix is formulated as $\boldsymbol{A} = \boldsymbol{A}^{C} \odot \boldsymbol{A}^{G}$, where ``$\odot$'' denotes element-wise multiplication. This indicates that an edge exists only if the corresponding conditions of both matrices are satisfied. Subsequently, the relationship between the $i$-th anchor and the $j$-th anchor can be modeled as follows:
\begin{align}
\tilde{\boldsymbol{F}}_{i}^{roi}&\gets \mathrm{ReLU}\left( \boldsymbol{W}_{roi}\boldsymbol{F}_{i}^{roi}+\boldsymbol{b}_{roi} \right) ,\label{edge_layer_1}\\
\boldsymbol{F}_{ij}^{edge}&\gets \boldsymbol{W}_{in}\tilde{\boldsymbol{F}}_{j}^{roi}-\boldsymbol{W}_{out}\tilde{\boldsymbol{F}}_{i}^{roi},\label{edge_layer_2}\\
\tilde{\boldsymbol{F}}_{ij}^{edge}&\gets \boldsymbol{F}_{ij}^{edge}+\boldsymbol{W}_s\left( \boldsymbol{x}_{j}^{s}-\boldsymbol{x}_{i}^{s} \right) +\boldsymbol{b}_s,\label{edge_layer_3}\\
\boldsymbol{D}_{ij}^{edge}&\gets \mathrm{MLP}_{edge}\left( \tilde{\boldsymbol{F}}_{ij}^{edge} \right) .\label{edge_layer_4}
\end{align}
Eqs. (\ref{edge_layer_1})-(\ref{edge_layer_4}) establish the directed relationship from the $i$-th node to the $j$-th node. Here, the tensor $\boldsymbol{D}_{ij}^{edge}$ denotes the semantic features of the directed edge $E_{ij}$. Given the directed edge features of all linked node pairs, we employ an element-wise max pooling layer that aggregates the features of all \textit{incoming edges} of a node to refine its node features:
\begin{align}
\boldsymbol{D}_{i}^{node}&\gets \underset{k\in \left\{ k|A_{ki}=1 \right\}}{\max}\boldsymbol{D}_{ki}^{edge}.
\end{align}
Here, inspired by \cite{o3d}\cite{pointnet}, the max pooling extracts the most distinctive features along the columns of the adjacency matrix (\textit{i.e.}, over the incoming edges). With the refined node features $\boldsymbol{D}_{i}^{node}$, the final confidence scores $\tilde{s}_{i}^{g}$ are generated by the subsequent layers:
\begin{align}
\boldsymbol{F}_{i}^{node}&\gets \mathrm{MLP}_{node}\left( \boldsymbol{D}_{i}^{node} \right) ,
\\
\tilde{s}_{i}^{g}&\gets \sigma \left( \boldsymbol{W}_{node}\boldsymbol{F}_{i}^{node}+\boldsymbol{b}_{node} \right) .
\label{node_layer}
\end{align}
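
A compact PyTorch sketch of Eqs. (\ref{edge_layer_1})-(\ref{node_layer}) follows. The feature dimensions, the class name, and the handling of nodes without incoming edges are our assumptions for illustration, not the actual configuration:
\begin{verbatim}
import torch
import torch.nn as nn

class O2OGraphHead(nn.Module):
    # Hypothetical feature sizes; the paper does not fix them here.
    def __init__(self, d_roi=64, d_edge=64, d_pt=2):
        super().__init__()
        self.roi = nn.Linear(d_roi, d_edge)                 # W_roi, b_roi
        self.w_in = nn.Linear(d_edge, d_edge, bias=False)   # W_in
        self.w_out = nn.Linear(d_edge, d_edge, bias=False)  # W_out
        self.w_s = nn.Linear(d_pt, d_edge)                  # W_s, b_s
        self.mlp_edge = nn.Sequential(
            nn.Linear(d_edge, d_edge), nn.ReLU(), nn.Linear(d_edge, d_edge))
        self.mlp_node = nn.Sequential(nn.Linear(d_edge, d_edge), nn.ReLU())
        self.out = nn.Linear(d_edge, 1)                     # W_node, b_node

    def forward(self, f_roi, x_s, A):
        # f_roi: (N, d_roi) RoI features; x_s: (N, d_pt) sampled points;
        # A: (N, N) adjacency with A[k, i] = 1 for a directed edge k -> i.
        f = torch.relu(self.roi(f_roi))                     # edge_layer_1
        edge = self.w_in(f)[None] - self.w_out(f)[:, None]  # edge_layer_2
        edge = edge + self.w_s(x_s[None] - x_s[:, None])    # edge_layer_3
        d_edge = self.mlp_edge(edge)                        # edge_layer_4
        # Max-pool over incoming edges, i.e. down each column of A;
        # nodes with no incoming edge fall back to zero features.
        masked = d_edge.masked_fill((A == 0)[..., None], float("-inf"))
        d_node = masked.max(dim=0).values.nan_to_num(neginf=0.0)
        return torch.sigmoid(self.out(self.mlp_node(d_node))).squeeze(-1)
\end{verbatim}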

\textbf{Dual Confidence Selection.} We employ dual confidence thresholds, denoted as $\lambda_{o2m}^s$ and $\lambda_{o2o}^s$, to select the positive (\textit{i.e.}, foreground) predictions. In the conventional NMS paradigm, the predictions output by the O2M classification head whose confidences $\left\{ s_{i}^{g} \right\} $ surpass $\lambda_{o2m}^s$ are designated as positive predictions; these are subsequently fed into the NMS post-processing stage to remove redundant predictions. In the NMS-free paradigm of our work, the final non-redundant predictions are selected by the following criterion:
\begin{align}
\varOmega _{o2o}^{pos}\equiv \left\{ i|\tilde{s}_{i}^{g}>\lambda _{o2o}^{s} \right\} \cap \left\{ i|s_{i}^{g}>\lambda _{o2m}^{s} \right\} ,
\end{align}
where $\varOmega _{o2o}^{pos}$ denotes the final set of non-redundant predictions, in which both types of confidence satisfy the above conditions with respect to the dual confidence thresholds. This principle for selecting non-redundant predictions is termed \textit{dual confidence selection}.
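
In sketch form (NumPy, with illustrative names), the selection is simply the intersection of two threshold tests:
\begin{verbatim}
import numpy as np

def dual_confidence_selection(s, s_tilde, lam_o2m, lam_o2o):
    # Keep index i only if both confidence tests pass.
    s, s_tilde = np.asarray(s), np.asarray(s_tilde)
    return np.where((s > lam_o2m) & (s_tilde > lam_o2o))[0]
\end{verbatim}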

\textbf{Label Assignment and Cost Function for GPM.} Following previous works \cite{o3d}\cite{pss}, we use the dual assignment strategy for the label assignment of the triplet head. The cost function between the $i$-th prediction and the $j$-th ground truth is given as follows:
\begin{align}
\mathcal{C} _{ij}^{o2m}&=s_i^g\times \left( GIoU_{lane, \,ij} \right) ^{\beta},\\
\mathcal{C} _{ij}^{o2o}&=\tilde{s}_i^g\times \left( GIoU_{lane, \,ij} \right) ^{\beta},
\end{align}
where $\mathcal{C} _{ij}^{o2m}$ is the cost function for the O2M classification head and the O2M regression head, and $\mathcal{C} _{ij}^{o2o}$ is that for the O2O classification head.

Given the cost matrix, we use SimOTA \cite{yolox} (one-to-many assignment) for the O2M classification head and the O2M regression head, while the Hungarian \cite{detr} algorithm (one-to-one assignment) is used for the O2O classification head.
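
The following sketch illustrates the cost computation and the one-to-one assignment, assuming a precomputed non-negative pairwise $GIoU_{lane}$ matrix; SciPy's Hungarian solver stands in for the actual matcher:
\begin{verbatim}
import numpy as np
from scipy.optimize import linear_sum_assignment

def assignment_cost(s, giou_lane, beta):
    # C[i, j] = s_i * GIoU_lane[i, j] ** beta; the same form serves the
    # O2M and O2O branches with s taken from the respective head.
    return np.asarray(s)[:, None] * np.asarray(giou_lane) ** beta

def o2o_assign(cost):
    # One-to-one (Hungarian) assignment; the cost measures quality,
    # so we maximize it.
    pred_idx, gt_idx = linear_sum_assignment(cost, maximize=True)
    return pred_idx, gt_idx
\end{verbatim}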

\textbf{Loss function for GPM.}
Focal loss \cite{focal} is utilized for both the O2M classification head and the O2O classification head, denoted as $\mathcal{L}^{o2m}_{cls}$ and $\mathcal{L}^{o2o}_{cls}$, respectively. The set of candidate samples involved in the computation of $\mathcal{L}^{o2o}_{cls}$, denoted as $\varOmega_{o2o}$, is confined to the positive sample set of the O2M classification head:
\begin{align}
\varOmega _{o2o}=\left\{ i\mid s_i^g>\lambda_{o2m}^s \right\}.
\end{align}
In essence, samples with lower O2M confidences $\left\{ s_{i}^{g} \right\} $ are excluded from the computation of $\mathcal{L}^{o2o}_{cls}$. Furthermore, we employ the rank loss $\mathcal{L} _{rank}$ from \cite{pss} to enlarge the gap between the positive and negative confidences of the O2O classification head. Since the label assignments of the O2O classification head and the O2M classification head differ, the gradient is stopped from flowing from the O2O classification head to the RoI pooling head during training, in order to preserve the quality of RoI feature learning. This technique is also proposed in \cite{pss}.
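
A minimal PyTorch sketch of these two tricks (all names hypothetical): the O2O loss only sees O2M positives, and the O2O branch receives detached RoI features so that its gradients never reach the RoI pooling head:
\begin{verbatim}
import torch

def o2o_inputs(f_roi, s_o2m, lam_o2m):
    # Restrict to Omega_o2o (the O2M positives) and cut the gradient
    # path back into the RoI features.
    mask = s_o2m > lam_o2m
    return f_roi[mask].detach(), mask
\end{verbatim}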

\begin{figure}[t]
	\centering
	\includegraphics[width=\linewidth]{thesis_figure/auxloss.png} %
	\label{auxloss}
\end{figure}

We directly apply the redefined GIoU loss (refer to Appendix \ref{giou_appendix}), $\mathcal{L}_{GIoU}$, to regress the offsets of the x-axis coordinates of the sampled points, and the $Smooth_{L1}$ loss, denoted as $\mathcal{L}_{end}$, for the regression of the end points of lanes. To facilitate the learning of global features, we propose the auxiliary loss $\mathcal{L}_{aux}$ depicted in Fig. \ref{auxloss}. The anchors and the ground truth are divided into several segments, and each anchor segment is regressed to the primary components of the corresponding segment of its assigned ground truth. This helps the anchors acquire a deeper understanding of the global geometric form.

The final loss functions for GPM are given as follows:
\begin{align}
\mathcal{L} _{cls}^{g}&=w_{o2m}^{cls}\mathcal{L}^{o2m}_{cls}+w_{o2o}^{cls}\mathcal{L}^{o2o}_{cls}+w_{rank}\mathcal{L}_{rank},
\\
\mathcal{L} _{reg}^{g}&=w_{GIoU}\mathcal{L}_{GIoU}+w_{end}\mathcal{L}_{end}+w_{aux}\mathcal{L} _{aux}.
\end{align}
% \begin{align}
% \mathcal{L}_{aux} &= \frac{1}{\left| \varOmega^{pos}_{o2m} \right| N_{seg}} \sum_{i \in \varOmega_{pos}^{o2o}} \sum_{m=j}^k \Bigg[ l \left( \theta_i - \hat{\theta}_{i}^{seg,m} \right) \\
where $f_{cls}^{plain}$ denotes a classification head with the plain structure.

We choose Fast NMS \cite{yolact} as the inspiration for the design of the O2O classification head. Fast NMS is an iteration-free post-processing algorithm for removing redundant predictions. Additionally, we add a sort-free paradigm and geometric priors to Fast NMS; the details are shown in Algorithm \ref{Graph Fast NMS}.

\begin{algorithm}[t]
\caption{Fast NMS with Geometric Prior.}
\begin{algorithmic}[1] % the [1] makes every line numbered
\REQUIRE ~~\\ % input parameters of the algorithm
The index of positive predictions, $1, 2, ..., i, ..., N_{pos}$;\\
The corresponding confidence scores $\left\{ s_{i}^{g} \right\}$, the global polar parameters $\left\{ \left( \theta _i, r_{i}^{g} \right) \right\}$, and the pairwise lane distances.
\ENSURE ~~\\ % output of the algorithm
The final selection set $\varOmega_{nms}^{pos}$.\\
\STATE Calculate the confidence comparison matrix $\boldsymbol{A}^{C}\in\mathbb{R}^{K\times K}$, which is defined as follows:
\begin{align}
A_{ij}^{C}=\begin{cases}
	1, & \left( s_i>s_j \right) \lor \left( \left( s_i=s_j \right) \land \left( i>j \right) \right)\\
	0, & \mathrm{others}.
\end{cases}
\label{confidential matrix}
\end{align}
where ``$\lor$'' and ``$\land$'' denote the (element-wise) logical ``OR'' and ``AND'' operations between two Boolean values/tensors.
\STATE Calculate the geometric prior matrix $\boldsymbol{A}^{G}\in\mathbb{R}^{K\times K}$, which is defined as follows:
\begin{align}
A_{ij}^{G}=\begin{cases}
	1, & \left| \theta _i-\theta _j \right|<\lambda ^{\theta}\,\,\mathrm{and}\,\,\left| r_{i}^{g}-r_{j}^{g} \right|<\lambda ^r\\
	0, & \mathrm{others}.
\end{cases}
\label{geometric prior matrix}
\end{align}
\STATE Calculate the inverse distance matrix $\boldsymbol{D} \in \mathbb{R} ^{K \times K}$, whose element $D_{ij}$ is defined as follows:
\begin{align}
D_{ij} = 1-d\left( \mathrm{lane}_i, \mathrm{lane}_j\right),
\label{al_1-3}
\end{align}
where $d\left(\cdot, \cdot \right)$ is a predefined function that quantifies the distance between two lane predictions (\textit{e.g.}, an IoU-based distance).
\STATE Define the adjacency matrix $\boldsymbol{A} = \boldsymbol{A}^{C} \odot \boldsymbol{A}^{G}$; the final confidence $\tilde{s}_i^g$ is then calculated as follows:
\begin{align}
\tilde{s}_{i}^{g}=\begin{cases}
	1, & \mathrm{if}\,\,\underset{k\in \{k\mid A_{ki}=1\}}{\max}D_{ki}<\lambda _{d}^{s},\\
	0, & \mathrm{otherwise}.
\end{cases}
\label{al_1-4}
\end{align}
\STATE Get the final selection set:
\begin{align}
\varOmega_{nms}^{pos}=\left\{ i|s_{i}^{g}>\lambda _{o2m}^{s}\,\,\mathrm{and}\,\,\tilde{s}_{i}^{g}=1 \right\} ,
\end{align}
where a prediction is retained only if it satisfies both of the above conditions.
\RETURN The final result $\varOmega_{nms}^{pos}$.
\end{algorithmic}
\label{Graph Fast NMS}
\end{algorithm}
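
For reference, a NumPy sketch of the full procedure follows, with the two prior matrices inlined; \texttt{dist} is a precomputed $K\times K$ matrix of $d\left( \mathrm{lane}_i, \mathrm{lane}_j \right)$ values, and all names are illustrative:
\begin{verbatim}
import numpy as np

def fast_nms_geometric(s, theta, r, dist,
                       lam_theta, lam_r, lam_d, lam_o2m):
    s, theta, r = map(np.asarray, (s, theta, r))
    k = np.arange(len(s))
    # Confidence comparison matrix A^C (ties broken by index).
    a_c = (s[:, None] > s[None, :]) | (
        (s[:, None] == s[None, :]) & (k[:, None] > k[None, :]))
    # Geometric prior matrix A^G.
    a_g = (np.abs(theta[:, None] - theta[None, :]) < lam_theta) & (
        np.abs(r[:, None] - r[None, :]) < lam_r)
    a = a_c & a_g                    # adjacency matrix A
    d_inv = 1.0 - np.asarray(dist)   # inverse distance matrix D
    # s_tilde[i] = 1 iff no adjacent higher-ranked lane is too close.
    masked = np.where(a, d_inv, -np.inf)
    s_tilde = masked.max(axis=0) < lam_d
    # Dual condition: O2M threshold and s_tilde = 1 must both hold.
    return np.where((s > lam_o2m) & s_tilde)[0]
\end{verbatim}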

The fundamental shortcoming of NMS lies in its definition of distance, which relies on explicit geometric properties only.

To help the model learn a distance that captures both explicit geometric information and implicit semantic information, the block replacing Eq. \ref{al_1-3} is expressed as:
\begin{align}
\tilde{\boldsymbol{F}}_{i}^{roi}&\gets \mathrm{ReLU}\left( \boldsymbol{W}_{roi}\boldsymbol{F}_{i}^{roi}+\boldsymbol{b}_{roi} \right) ,\label{edge_layer_1_appendix}\\
\boldsymbol{F}_{ij}^{edge}&\gets \boldsymbol{W}_{in}\tilde{\boldsymbol{F}}_{j}^{roi}-\boldsymbol{W}_{out}\tilde{\boldsymbol{F}}_{i}^{roi},\label{edge_layer_2_appendix}\\
\tilde{\boldsymbol{F}}_{ij}^{edge}&\gets \boldsymbol{F}_{ij}^{edge}+\boldsymbol{W}_s\left( \boldsymbol{x}_{j}^{s}-\boldsymbol{x}_{i}^{s} \right) +\boldsymbol{b}_s,\label{edge_layer_3_appendix}\\
\boldsymbol{D}_{ij}^{edge}&\gets \mathrm{MLP}_{edge}\left( \tilde{\boldsymbol{F}}_{ij}^{edge} \right) .\label{edge_layer_4_appendix}
\end{align}
where the inverse distance $\boldsymbol{D}_{ij}^{edge}$ is no longer a scalar but a semantic tensor with dimension $d_{dis}$. The replacement of Eq. \ref{al_1-4} is constructed as follows:
\begin{align}
\boldsymbol{D}_{i}^{node}&\gets \underset{k\in \left\{ k|A_{ki}=1 \right\}}{\max}\boldsymbol{D}_{ki}^{edge}.
\end{align}
\begin{align}
\boldsymbol{F}_{i}^{node}&\gets \mathrm{MLP}_{node}\left( \boldsymbol{D}_{i}^{node} \right),
\\
\tilde{s}_{i}^{g}&\gets \sigma \left( \boldsymbol{W}_{node}\boldsymbol{F}_{i}^{node} + \boldsymbol{b}_{node} \right) .
\label{node_layer_appendix}
\end{align}