Compare commits
No commits in common. "888326d770d56d07d594a8aea9c2c09301b5bdaf" and "051b36c4cc443c225c2abad019cf9ce2207f60fb" have entirely different histories.
888326d770
...
051b36c4cc
5196
Anaconda3-2024.06-1-Linux-s390x.sh
Normal file
@ -68,7 +68,7 @@ It is assumed that the reader has a basic working knowledge of \LaTeX. Those who
|
|||||||
\noindent At the beginning of your \LaTeX\ file you will need to establish what type of publication style you intend to use. The following list shows appropriate documentclass options for each of the types covered by IEEEtran.
|
\noindent At the beginning of your \LaTeX\ file you will need to establish what type of publication style you intend to use. The following list shows appropriate documentclass options for each of the types covered by IEEEtran.
|
||||||
|
|
||||||
\begin{list}{}{}
|
\begin{list}{}{}
|
||||||
\item{regular Journal Article}
|
\item{Regular Journal Article}
|
||||||
\item{{\tt{$\backslash$documentclass[journal]{IEEEtran}}}}\\
|
\item{{\tt{$\backslash$documentclass[journal]{IEEEtran}}}}\\
|
||||||
\item{{Conference Paper}}
|
\item{{Conference Paper}}
|
||||||
\item{{\tt{$\backslash$documentclass[conference]{IEEEtran}}}}\\
|
\item{{\tt{$\backslash$documentclass[conference]{IEEEtran}}}}\\
|
||||||
|
BIN
QQ_9.9.12_240715_x64_01.exe
Normal file
627
main.tex
@ -17,7 +17,6 @@
|
|||||||
\usepackage{amssymb}
|
\usepackage{amssymb}
|
||||||
\usepackage{booktabs}
|
\usepackage{booktabs}
|
||||||
\usepackage{tikz}
|
\usepackage{tikz}
|
||||||
\usepackage{tabularx}
|
|
||||||
\usepackage[table,xcdraw]{xcolor}
|
\usepackage[table,xcdraw]{xcolor}
|
||||||
\usepackage[colorlinks,bookmarksopen,bookmarksnumbered, linkcolor=red]{hyperref}
|
\usepackage[colorlinks,bookmarksopen,bookmarksnumbered, linkcolor=red]{hyperref}
|
||||||
|
|
||||||
@ -47,14 +46,15 @@
|
|||||||
\maketitle
|
\maketitle
|
||||||
|
|
||||||
\begin{abstract}
|
\begin{abstract}
|
||||||
Lane detection is a critical and challenging task in autonomous driving, particularly in real-world scenarios where traffic lanes are often slender, lengthy, and partially obscured by other vehicles, complicating detection efforts. Existing anchor-based methods typically rely on prior straight line anchors to extract features and refine lane location and shape. Although these methods achieve high performance, manually setting prior anchors is cumbersome, and ensuring adequate coverage across diverse datasets often requires a large number of dense anchors. Additionally, Non-Maximum Suppression (NMS) is used to suppress redundant predictions, which complicates real-world deployment and may fail in dense scenarios. In this study, we introduce PolarRCNN, an NMS-free anchor-based method for lane detection. By incorporating both local and global polar coordinate systems, PolarRCNN enables flexible anchor proposals and significantly reduces the number of anchors required without compromising performance. Additionally, we introduce a heuristic GNN-based NMS-free head that supports an end-to-end paradigm, making the model more deployment-friendly and enhancing performance in dense scenarios. Our method achieves competitive results on five popular lane detection benchmarks (TuSimple, CULane, LLAMAS, Curvelanes, and DL-Rail) while maintaining a lightweight design and straightforward structure. Our source code is available at \href{https://github.com/ShqWW/PolarRCNN}{\textit{https://github.com/ShqWW/PolarRCNN}}.
|
Lane detection is a critical and challenging task in autonomous driving, particularly in real-world scenarios where traffic lanes are often slender, lengthy, and partially obscured by other vehicles, complicating detection efforts. Existing anchor-based methods typically rely on prior straight line anchors to extract features and refine lane location and shape. Although these methods achieve high performance, manually setting prior anchors is cumbersome, and ensuring sufficient anchor coverage across diverse datasets requires a large number of dense anchors. Furthermore, NMS postprocessing must be applied to suppress redundant predictions. In this study, we introduce PolarRCNN, a two-stage NMS-free anchor-based method for lane detection. By introducing a local polar head, anchors are proposed dynamically, and the number of anchors is greatly reduced without sacrificing performance. Moreover, a GNN-based NMS-free head is proposed so that the model works end to end, which is deployment-friendly. Our model yields competitive results on five popular lane detection benchmarks (TuSimple, CULane, LLAMAS, Curvelanes, and DL-Rail) while maintaining a lightweight size and a simple structure.
|
||||||
|
Our source code is available at \href{https://github.com/ShqWW/PolarRCNN}{\textit{https://github.com/ShqWW/PolarRCNN}}.
|
||||||
\end{abstract}
|
\end{abstract}
|
||||||
\begin{IEEEkeywords}
|
\begin{IEEEkeywords}
|
||||||
Lane detection, NMS-free, Graph neural network, Polar coordinate system.
|
Lane detection.
|
||||||
\end{IEEEkeywords}
|
\end{IEEEkeywords}
|
||||||
|
|
||||||
\section{Introduction}
|
\section{Introduction}
|
||||||
\IEEEPARstart{L}{ane} detection is a significant problem in computer vision and autonomous driving, forming the basis for accurately perceiving the driving environment in intelligent driving systems. While extensive research has been conducted in ideal environments, it remains a challenging task in adverse scenarios such as night driving, glare, crowded scenes, and rainy conditions, where lanes may be occluded or damaged. Moreover, the slender shapes, complex topologies, and global nature of lanes add to the difficulty of detection. An effective lane detection method should take into account both global high-level semantic features and local low-level features to address these varied conditions and ensure robust performance in real-time applications such as autonomous driving.
|
\IEEEPARstart{L}{ane} detection is a significant problem in computer vision and autonomous driving, forming the basis for accurately perceiving the driving environment in intelligent driving systems. While extensive research has been conducted in ideal environments, it remains a challenging task in adverse scenarios such as night driving, glare, crowded scenes, and rainy conditions, where lanes may be occluded or damaged. Moreover, the slender shapes, complex topologies, and global nature of lanes add to the difficulty of detection. An effective lane detection method should take into account both global high-level semantic features and local low-level features to address these varied conditions and ensure robust performance in real-time applications such as autonomous driving.
|
||||||
|
|
||||||
Traditional methods predominantly concentrate on handcrafted local feature extraction and lane shape modeling. Techniques such as the Canny edge detector\cite{canny1986computational}, Hough transform\cite{houghtransform}, and deformable templates for lane fitting\cite{kluge1995deformable} have been extensively utilized. Nevertheless, these approaches often encounter limitations in practical settings, particularly when low-level and local features lack clarity or distinctiveness.
|
Traditional methods predominantly concentrate on handcrafted local feature extraction and lane shape modeling. Techniques such as the Canny edge detector\cite{canny1986computational}, Hough transform\cite{houghtransform}, and deformable templates for lane fitting\cite{kluge1995deformable} have been extensively utilized. Nevertheless, these approaches often encounter limitations in practical settings, particularly when low-level and local features lack clarity or distinctiveness.
|
||||||
|
|
||||||
@ -120,92 +120,61 @@ In recent years, fueled by advancements in deep learning and the availability of
|
|||||||
|
|
||||||
|
|
||||||
|
|
||||||
Drawing inspiration from object detection methods such as YOLO \cite{} and Faster RCNN \cite{}, several anchor-based approaches have been introduced for lane detection, with representative works including LaneATT \cite{} and CLRNet \cite{}. These methods have demonstrated superior performance by leveraging anchor priors and enabling larger receptive fields for feature extraction. However, anchor-based lane detection methods share the following drawbacks with anchor-based general object detection methods:
|
Drawing inspiration from object detection methods such as YOLO and Faster RCNN, several anchor-based approaches have been introduced for lane detection, with representative works including LaneATT and CLRNet. These methods have demonstrated superior performance by leveraging anchor priors and enabling larger receptive fields for feature extraction. However, anchor-based lane detection methods share the following drawbacks with anchor-based general object detection methods:
|
||||||
|
|
||||||
(1) A large number of lane anchors are placed across the image even in sparse scenarios.
|
\begin{itemize} \item[(1)] A large number of dense anchors must be configured to ensure detection recall, since lane distributions (i.e., direction and location) are complex in real scenarios, as shown in Fig. \ref{anchor setting}(a).
|
||||||
|
\end{itemize}
|
||||||
|
|
||||||
(2) Non-maximum suppression (NMS) postprocessing is necessary to remove redundant predictions but may fail in dense scenarios.
|
\begin{itemize} \item[(2)] Because of the large number of anchors, redundant predictions must be removed by postprocessing such as NMS \cite{} and FastNMS \cite{}, which complicates deployment and requires the NMS threshold to be set manually.
|
||||||
|
\end{itemize}
|
||||||
|
|
||||||
Regarding the first issue, \cite{} introduced learned anchors, where the anchor parameters are optimized during training to adapt to the lane distributions in real datasets (see Fig. \ref{anchor setting} (b)). Additionally, they employ cascade cross-layer anchor refinement to bring the anchors closer to the ground truth. However, numerous anchors are still required to cover the potential distributions of lanes. Moving further, \cite{} proposes flexible anchors for each image by generating start points, rather than using a fixed set of anchors for all images. Nevertheless, the start points of lanes are subjective and lack clear visual evidence due to the global nature of lanes, which affects its performance. \cite{} uses a local angle map to propose sketch anchors according to the direction of the ground truth. This approach only considers the direction and neglects the accurate positioning of anchors, resulting in suboptimal performance without cascade anchor refinement. Overall, numerous anchors are unnecessary in sparse scenarios (where lane ground truths are sparse). The trend in newly proposed methods is to reduce the number of anchors and offer more flexible anchor configurations.
|
To solve the first problem, CLRNet uses learned anchors whose locations are optimized during training to adapt to the lane distributions in real scenarios (see Fig. \ref{anchor setting} (b)), and uses cascade cross-layer anchor refinement to bring the anchors closer to the ground truth. However, CLRNet still needs numerous anchors to cover the potential distributions of lanes. To address this, ADNet \cite{} uses a start point generation unit to propose flexible anchors for each image rather than using the same set of anchors for all images. However, the start points of lanes are subjective and lack clear visual evidence because of the global nature of lanes, so the performance of ADNet is not ideal. SRLane uses a local angle map to propose sketch anchors according to the direction of the ground truth. This method considers only the direction and ignores the accurate location of anchors, leading to worse performance without cascade anchor refinement. Moreover, none of the methods mentioned above avoids the redundant predictions described in the second problem.
|
||||||
|
|
||||||
|
To address the issues mentioned above more effectively than previous work, we analyze their causes and propose a new lane detection method called PolarRCNN, a two-stage NMS-free anchor-based model. PolarRCNN uses local and global polar coordinates to describe the anchors, and the number of proposed anchors is much smaller than in previous work, as shown in Fig. \ref{anchor setting} (c). Moreover, a heuristic graph neural network block is proposed to make the model NMS-free. The model architecture is simple and avoids the complex mechanisms used in previous work (e.g., attention and cascade refinement), which makes deployment easier and inference faster. In addition, the simple architecture helps us inspect the key factors that determine the performance of anchor-based lane detection methods.
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
Regarding the second issue, nearly all anchor-based methods (including those mentioned above) require direct or indirect Non-Maximum Suppression (NMS) post-processing to eliminate redundant predictions. Although it is necessary to eliminate redundant predictions, NMS remains a suboptimal solution. On the one hand, NMS is not deployment-friendly because it involves defining and calculating distances (e.g., Intersection over Union) between lane pairs. This is more challenging than for bounding boxes in general object detection due to the complexity of lane geometry. On the other hand, NMS fails in some dense scenarios where the lane ground truths are closer together than in sparse scenarios. A larger distance threshold may result in false negatives, as some true positive predictions might be eliminated by mistake (as shown in Fig. \ref{nms setting} (a) and (b)). Conversely, a smaller distance threshold may not eliminate redundant predictions effectively and can leave false positives (as shown in Fig. \ref{nms setting} (c) and (d)). Achieving an optimal trade-off in all scenarios by manually setting the distance threshold is challenging. The root cause of this problem is that the distance definition in NMS considers only geometric parameters while ignoring the semantic context in the image. Thus, when two predictions are ``close'' to each other, it is nearly impossible to determine whether one of them is redundant.
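The threshold trade-off described above can be illustrated with a minimal sketch of greedy NMS over lane predictions. This is not the NMS used in any of the cited methods: the function names and the distance (mean absolute gap between x-coordinates at shared rows, standing in for a lane IoU) are hypothetical choices made for illustration only.

```python
def lane_distance(a, b):
    """Hypothetical lane distance: mean |x| gap at shared image rows."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def greedy_nms(lanes, scores, dist_thresh):
    """Keep lanes in descending score order, dropping any lane that lies
    closer than dist_thresh to a lane that has already been kept."""
    order = sorted(range(len(lanes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(lane_distance(lanes[i], lanes[k]) >= dist_thresh for k in keep):
            keep.append(i)
    return keep


# Lanes 0 and 2 are distinct ground-truth lanes 8 px apart (a dense scene);
# lane 1 is a redundant prediction 3 px away from lane 0.
lanes = [[100, 100, 100], [103, 103, 103], [108, 108, 108]]
scores = [0.9, 0.8, 0.7]

print(greedy_nms(lanes, scores, 10))  # [0]: lane 2 is wrongly suppressed
print(greedy_nms(lanes, scores, 2))   # [0, 1, 2]: the duplicate survives
print(greedy_nms(lanes, scores, 5))   # [0, 2]: correct for this scene only
```

A threshold of 5 px happens to work here, but only because the duplicate is closer than the neighbouring lane; the distance uses geometry alone, so no single threshold separates the two cases in every scene.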
|
We conducted experiments on five mainstream benchmarks: TuSimple \cite{}, CULane \cite{}, LLAMAS\cite{}, Curvelanes\cite{}, and DL-Rail\cite{}. Our proposed method achieves competitive performance compared with state-of-the-art methods.
|
||||||
|
|
||||||
To address the two issues outlined above, we propose PolarRCNN, a novel anchor-based method for lane detection. For the first issue, we introduce local and global heads based on the polar coordinate system to create anchors with more accurate locations and reduce the number of proposed anchors in sparse scenarios, as illustrated in Fig. \ref{anchor setting} (c). Compared to state-of-the-art previous work \cite{} which uses 192 anchors, PolarRCNN employs only 20 anchors to cover potential lane ground truths. For the second issue, we have revised FastNMS to Graph-based FastNMS and introduced a new heuristic graph neural network block (Polar GNN block) integrated into the non-maximum suppression (NMS) head. The Polar GNN block offers a more interpretable structure compared to traditional NMS, achieving nearly equivalent performance in sparse scenarios and superior performance in dense scenarios. We conducted experiments on five major benchmarks: TuSimple \cite{}, CULane \cite{}, LLAMAS \cite{}, Curvelanes \cite{}, and DL-Rail \cite{}. Our proposed method demonstrates competitive performance compared to state-of-the-art methods.
|
Our main contributions are summarized as follows:
|
||||||
|
|
||||||
Our main contributions are summarized as follows:
|
|
||||||
|
|
||||||
\begin{itemize}
|
\begin{itemize}
|
||||||
\item We simplified the anchor parameters using local and global polar coordinate systems and applied them to two-stage lane detection frameworks. Compared to other anchor-based methods, the number of proposed anchors is greatly reduced while achieving better performance.
|
\item We simplified the anchor parameters with local and global polar coordinate systems and applied them to two-stage lane detection frameworks. Compared with other sparse two-stage methods, the number of proposed anchors is greatly reduced while performance improves.
|
||||||
\item We introduced a novel heuristic Polar GNN block to implement an NMS-free paradigm. The GNN architecture is designed with reference to Graph-based FastNMS, providing interpretability. Our model supports end-to-end training and testing, but traditional NMS postprocessing can still be used as an option for an NMS version of our model.
|
\item We proposed a novel heuristic graph neural network (GNN) head to implement an NMS-free paradigm. The GNN architecture is designed according to Fast NMS, which makes it interpretable. The whole training and testing process of our model is end-to-end.
|
||||||
\item Our method utilizes two-stage architectures and achieves competitive performance compared to state-of-the-art methods across five datasets. The high performance with fewer anchors and an NMS-free paradigm demonstrates the effectiveness of our approach. Additionally, our model is designed with a straightforward structure (without cascade refinement or attention strategies), which simplifies deployment.
|
\item Our proposed method uses simple model architectures and achieves competitive performance with other state-of-the-art methods on five datasets. The high performance with fewer anchors and an NMS-free paradigm demonstrates the effectiveness of our method.
|
||||||
\end{itemize}
|
\end{itemize}
|
||||||
|
|
||||||
\section{Related Works}
|
\section{Related Works}
|
||||||
Lane detection aims to detect lane instances in an image. In this section, we only introduce deep-learning-based methods for lane detection. These methods can be categorized into segmentation-based, parameter-based, and anchor-based methods.
|
Lane detection aims to detect lane instances in an image. In this section, we only introduce deep-learning-based methods for lane detection. These methods can be categorized into segmentation-based, parameter-based, and anchor-based methods.
|
||||||
|
|
||||||
\textbf{Segmentation-based Methods.} Segmentation-based methods focus on pixel-wise prediction. They predefined each pixel into different categories according to different lane instances and background\cite{} and predicted information pixel by pixel. However, these methods overly focus on low-level and local features, neglecting global semantic information and real-time detection. SCNN uses a larger receptive field to overcome this problem. Some methods such as UFLDv1 and v2\cite{}\cite{} and CondLaneNet\cite{} utilize row-wise or column-wise classification instead of pixel classification to improve detection speed. Another issue with these methods is that the lane instance prior is learned by the model itself, leading to a lack of prior knowledge. Lanenet uses post-clustering to distinguish each lane instance. UFLD divides lane instances by angles and locations and can only detect a fixed number of lanes. CondLaneNet utilizes different conditional dynamic kernels to predict different lane instances. Some methods such as FOLOLane\cite{} and GANet\cite{} use bottom-up strategies to detect a few key points and model their global relations to form lane instances.
|
\textbf{Segmentation-based Methods.} Segmentation-based methods focus on pixel-wise prediction. They predefined each pixel into different categories according to different lane instances and background\cite{} and predicted information pixel by pixel. However, these methods overly focus on low-level and local features, neglecting global semantic information and real-time detection. SCNN uses a larger receptive field to overcome this problem. Some methods such as UFLDv1 and v2\cite{}\cite{} and CondLaneNet\cite{} utilize row-wise or column-wise classification instead of pixel classification to improve detection speed. Another issue with these methods is that the lane instance prior is learned by the model itself, leading to a lack of prior knowledge. Lanenet uses post-clustering to distinguish each lane instance. UFLD divides lane instances by angles and locations and can only detect a fixed number of lanes. CondLaneNet utilizes different conditional dynamic kernels to predict different lane instances. Some methods such as FOLOLane\cite{} and GANet\cite{} use bottom-up strategies to detect a few key points and model their global relations to form lane instances.
|
||||||
|
|
||||||
\textbf{Parameter-based Methods.} Instead of predicting a series of point locations or pixel classes, parameter-based methods directly generate the curve parameters of lane instances. PolyLanenet\cite{} and LSTR\cite{} consider the lane instance as a polynomial curve and output the polynomial coefficients directly. BézierLaneNet\cite{} treats the lane instance as a Bézier curve and generates the locations of control points of the curve. BSLane uses B-Spline to describe the lane, and the curve parameters focus on the local shapes of lanes. Parameter-based methods are mostly end-to-end without postprocessing, which grants them faster speed. However, since the final visual lane shapes are sensitive to the curve parameters, the robustness and generalization of parameter-based methods may be less than ideal.
|
\textbf{Parameter-based Methods.} Instead of predicting a series of point locations or pixel classes, parameter-based methods directly generate the curve parameters of lane instances. PolyLanenet\cite{} and LSTR\cite{} consider the lane instance as a polynomial curve and output the polynomial coefficients directly. BézierLaneNet\cite{} treats the lane instance as a Bézier curve and generates the locations of control points of the curve. BSLane uses B-Spline to describe the lane, and the curve parameters focus on the local shapes of lanes. Parameter-based methods are mostly end-to-end without postprocessing, which grants them faster speed. However, since the final visual lane shapes are sensitive to the curve parameters, the robustness and generalization of parameter-based methods may be less than ideal.
|
||||||
|
|
||||||
|
|
||||||
\textbf{Anchor-Based Methods.} Inspired by general object detection methods like YOLO \cite{} and DETR \cite{}, anchor-based approaches have been proposed for lane detection. Line-CNN is, to our knowledge, the earliest method that utilizes line anchors for detecting lanes. These lines are designed as rays emitted from the three edges (left, bottom, and right) of an image. However, the model's receptive field is limited to the edges, and it is slower than some other methods. LaneATT \cite{} improves upon this by employing anchor-based feature pooling to aggregate features along the entire line anchor, achieving faster speeds and better performance. Nevertheless, its grid sampling strategy and label assignment pose limitations. CLRNet \cite{} enhances anchor-based performance with cross-layer refinement strategies, SimOTA label assignment \cite{}, and LIoU loss, surpassing many previous methods. A key advantage of anchor-based methods is their adaptability, allowing the integration of strategies from anchor-based general object detection, such as label assignment, bounding box refinement, and GIoU loss. However, existing anchor-based lane detection methods also have notable drawbacks. Line anchors are often handcrafted and numerous, which can be cumbersome. Some approaches, such as ADNet \cite{}, SRLane \cite{}, and Sparse Laneformer \cite{}, attempt to reduce the number of anchors and provide proposals, but this can slightly impact performance. Additionally, methods such as \cite{} \cite{} still rely on NMS postprocessing, complicating NMS threshold settings and model deployment. Although one-to-one label assignment (during training) without NMS \cite{} (during evaluation) alleviates this issue, its performance remains less satisfactory compared to NMS-based models.
|
\textbf{Anchor-Based Methods.} Inspired by some methods in general object detection like YOLO \cite{} and DETR \cite{}, anchor-based methods have been proposed for lane detection. Line-CNN is the earliest work, to our knowledge, that utilizes line anchors to detect lanes. The lines are designed as rays emitted from the three edges (left, bottom, and right) of an image. However, the receptive field of the model is limited to the edges, and the model is slower than some other methods. LaneATT \cite{} employs anchor-based feature pooling to aggregate features along the whole line anchor, achieving faster speed with better performance. Nevertheless, the grid sampling strategy and label assignment limit its potential. CLRNet \cite{} utilizes cross-layer refinement strategies, SimOTA label assignment \cite{}, and LIoU loss to enhance anchor-based performance beyond most methods. The main advantage of anchor-based methods is that many strategies from anchor-based general object detection can be easily applied to lane detection, such as label assignment, bounding box refinement, and GIoU loss. However, the disadvantages of existing anchor-based lane detection are also evident. The line anchors need to be handcrafted, the number of anchors is large, and NMS postprocessing is needed, resulting in high computational cost.
|
||||||
|
Some works, such as ADNet\cite{}, SRLane\cite{}, and Sparse Laneformer\cite{}, attempt to reduce the number of anchors and generate anchor proposals.
|
||||||
|
|
||||||
\begin{figure*}[ht]
|
\textbf{NMS-Free Object Detections}. NMS is an important postprocessing step in most general object detection methods. DETR \cite{} uses one-to-one label assignment to avoid redundant predictions without NMS. Other NMS-free methods \cite{} have since been proposed. These methods analyze the issue from two aspects: model architecture and label assignment. \cite{}\cite{} hold the view that one-to-one assignment is the key to NMS-free prediction. Other works also consider the expressive ability of the model to produce non-redundant predictions. However, few anchor-based lane detection methods analyze the NMS-free paradigm as general object detection does, and they still rely on NMS postprocessing. In our work, we find that both the label assignment and the expressive ability of the NMS-free module (e.g., its architecture and inputs) play an important role in NMS-free lane detection for anchor-based models.
|
||||||
|
|
||||||
|
This paper aims to address the two issues mentioned above (reducing the number of anchors and removing NMS) for anchor-based lane detection methods.
|
||||||
|
|
||||||
|
|
||||||
|
\section{Method}
|
||||||
|
The overall architecture of PolarRCNN is illustrated in Fig. \ref{overall_architecture}. Our model consists of a backbone with FPN, a local polar head, and a global polar head. Only simple network layers such as convolution, MLP, and pooling operations are used in each block (rather than attention, dynamic kernels, etc.).
|
||||||
|
|
||||||
|
\begin{figure*}[t]
|
||||||
\centering
|
\centering
|
||||||
\includegraphics[width=\linewidth]{thsis_figure/ovarall_architecture.png} % replace with your image file name
|
\includegraphics[width=\linewidth]{thsis_figure/ovarall_architecture.png} % replace with your image file name
|
||||||
\caption{The overall pipeline of PolarRCNN. The architecture is simple and lightweight. The backbone (e.g., ResNet18) and FPN extract features from the image, and the local polar head proposes sparse line anchors. After the features sampled along the line anchors are pooled, the global polar head gives the final predictions. The global polar head contains triplet subheads: a one-to-one classification head (o2o cls head), a one-to-many classification head (o2m cls head), and a one-to-many regression head (o2m reg head). The o2o cls head replaces NMS postprocessing by selecting only one positive prediction for each ground truth from the redundant predictions of the o2m head.}
|
\caption{The overall pipeline of PolarRCNN. The architecture is simple and lightweight. The backbone (e.g., ResNet18) and FPN extract features from the image, and the local polar head proposes sparse line anchors. After the features sampled along the line anchors are pooled, the global polar head gives the final predictions. The global polar head contains triplet subheads: a one-to-one classification head (O2O Cls head), a one-to-many classification head (O2M Cls head), and a one-to-many regression head (O2M Reg head). The O2O Cls head replaces NMS postprocessing by selecting only one positive prediction for each ground truth from the redundant predictions of the O2M head.}
|
||||||
\label{overall_architecture}
|
\label{overall_architecture}
|
||||||
\end{figure*}
|
\end{figure*}
|
||||||
|
|
||||||
\textbf{NMS-Free Object Detections}. Non-Maximum Suppression (NMS) is an important postprocessing step in most general object detection methods. DETR \cite{} employs one-to-one label assignment to avoid redundant predictions without using NMS. Other NMS-free methods \cite{} have also been proposed, addressing this issue from two aspects: model architecture and label assignment. Studies \cite{} \cite{} suggest that one-to-one assignments are crucial for NMS-free predictions, but maintaining one-to-many assignments is still necessary to ensure effective feature learning of the model. Other works \cite{} \cite{} consider the model's expressive capacity to provide non-redundant predictions. However, few studies have analyzed the NMS-free paradigm for anchor-based lane detection methods as thoroughly as in general object detection. Most anchor-based lane detection methods still rely on NMS postprocessing. In our work, besides label assignment, we extend the analysis to the detection head's structure, focusing on achieving non-redundant (NMS-free) lane predictions.
|
|
||||||
|
|
||||||
In this work, we aim to address the two issues in anchor-based lane detection mentioned above: sparse lane anchor settings and NMS-free prediction.
|
|
||||||
|
|
||||||
\section{Method}
|
|
||||||
The overall architecture of PolarRCNN is illustrated in Fig. \ref{overall_architecture}. Our model adheres to the Faster R-CNN \cite{} framework, consisting of a backbone, FPN (Feature Pyramid Network), RPN (Region Proposal Network), and RoI (Region of Interest) pooling. To investigate the fundamental factors affecting model performance, such as anchor settings and NMS (Non-Maximum Suppression) postprocessing, and to make the model easier to deploy, PolarRCNN employs a simple and straightforward network structure. It relies on basic components including convolutional layers, MLPs (Multi-Layer Perceptrons), and pooling operations, deliberately excluding advanced elements like attention mechanisms, dynamic kernels, and cross-layer refinement used in previous works \cite{}\cite{}.
|
|
||||||
|
|
||||||
\begin{table}[h]
|
|
||||||
\centering
|
|
||||||
\caption{Notation for some important variables}
|
|
||||||
\begin{adjustbox}{width=\linewidth}
|
|
||||||
\begin{tabular}{lll}
|
|
||||||
\toprule
|
|
||||||
\textbf{Variable} & \textbf{Type} & \hspace{10em}\textbf{Definition} \\
|
|
||||||
\midrule
|
|
||||||
$\mathbf{P}_{i}$ & tensor& The $i$-th output feature map from FPN\\
|
|
||||||
$H^{L}$& scalar& The height of the local polar map\\
|
|
||||||
$W^{L}$& scalar& The width of the local polar map\\
|
|
||||||
$K_{A}$ & scalar& The number of anchors selected during evaluation\\
|
|
||||||
$\mathbf{c}^{G}$& tensor& The origin point of the global polar coordinate system\\
|
|
||||||
$\mathbf{c}^{L}$& tensor& The origin point of the local polar coordinate system\\
|
|
||||||
$r^{G}_{i}$& scalar& The radius of the $i$-th anchor in the global polar coordinate system\\
|
|
||||||
$r^{L}_{i}$& scalar& The radius of the $i$-th anchor in the local polar coordinate system\\
|
|
||||||
$\theta_{i}$& scalar& The angle of the $i$-th anchor in the global/local polar coordinate system\\
|
|
||||||
\midrule
|
|
||||||
$\textbf{X}^{pool}_{i}$& tensor& The pooling feature of the $i_{th}$ anchor\\
|
|
||||||
$N^{nbr}_{i}$& set& The set of nodes adjacent to the $i_{th}$ anchor node\\
|
|
||||||
$C_{o2m}$ & scalar& The positive threshold of one-to-many confidence\\
|
|
||||||
$C_{o2o}$ & scalar& The positive threshold of one-to-one confidence\\
|
|
||||||
\midrule
|
|
||||||
& & \\
|
|
||||||
& & \\
|
|
||||||
& & \\
|
|
||||||
& & \\
|
|
||||||
& & \\
|
|
||||||
\bottomrule
|
|
||||||
\end{tabular}
|
|
||||||
\end{adjustbox}
|
|
||||||
\end{table}
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
\subsection{Lane and Line Anchor Representation}
|
\subsection{Lane and Line Anchor Representation}
|
||||||
|
|
||||||
Lanes are characterized by their thin and elongated curved shapes. A suitable lane prior aids the model in extracting features, predicting locations, and modeling the shapes of lane curves with greater accuracy. In line with previous works \cite{}\cite{}, our lane priors (also referred to as lane anchors) consist of straight lines. We sample a sequence of 2D points along each lane anchor, denoted as $ P\doteq \left\{ \left( x_1, y_1 \right) , \left( x_2, y_2 \right) , \ldots ,\left( x_N, y_N \right) \right\} $, where $N$ is the number of sampled points. The y-coordinates of these points are uniformly sampled from the vertical axis of the image, specifically $y_i=\frac{H}{N-1}\cdot i$, where $H$ is the image height. These y-coordinates are also sampled from the ground truth lane, and the model is tasked with regressing the x-coordinate offset from the line anchor to the lane instance ground truth. The primary distinction between PolarRCNN and previous approaches lies in the description of the lane anchors (straight lines), which is detailed in the following sections.
|
Lanes are thin and long curves, and a suitable lane prior helps the model extract features, predict locations, and model the shapes of lane curves more accurately. As in previous works \cite{}\cite{}, the lane priors (also called lane anchors) in our work are straight lines, and we sample a sequence of 2D points on each line anchor, i.e. $ P\doteq \left\{ \left( x_1, y_1 \right) , \left( x_2, y_2 \right) , \ldots ,\left( x_N, y_N \right) \right\} $, where $N$ is the number of sampled points. The y-coordinates of the points are uniformly sampled from the image vertically, i.e. $y_i=\frac{H}{N-1}\cdot i$, where $H$ is the image height. The same y-coordinates are also sampled from the ground truth lane, and the model regresses the x-coordinate offset from the line anchor to the lane instance ground truth. The only difference between PolarRCNN and previous work is the description of the straight line anchors, which is introduced below.
|
||||||
|
|
||||||
\textbf{Polar Coordinate System.} Since lane anchors are typically represented as straight lines, they can be described using straight line parameters. Previous approaches have used rays to describe 2D lane anchors, with the parameters including the coordinates of the starting point and the orientation/angle, denoted as $\left\{\theta, P_{xy}\right\}$, as shown in Fig. \ref{coord} (a). \cite{}\cite{} define the start points as lying on the three image boundaries. However, \cite{} argue that this approach is problematic because the actual starting point of a lane could be located anywhere within the image. In our analysis, using a ray can lead to ambiguity in line representation because a line can have an infinite number of starting points, and the choice of the starting point for a lane is subjective. As illustrated in Fig. \ref{coord} (a), the yellow (the visual start point) and green (the point located on the image boundary) starting points with the same orientation $\theta$ describe the same line, and either could be used in different datasets \cite{}\cite{}. This ambiguity arises because a straight line has two degrees of freedom, whereas a ray has three. To resolve this issue, we propose using polar coordinates to describe a lane anchor with only two parameters: radius and angle, denoted as $\left\{\theta, r\right\}$, where $\theta \in \left[-\frac{\pi}{2}, \frac{\pi}{2}\right)$ and $r \in \left(-\infty, +\infty\right)$. This representation is illustrated in Fig. \ref{coord} (b).
|
\textbf{Polar Coordinate System.} Since the lane anchors are straight by default, they can be described by straight line parameters. Previous work uses a ray to describe a 2D line anchor, and the parameters of a ray contain the start point's coordinates and the orientation/angle, i.e., $\left\{\theta, P_{xy}\right\}$, as shown in Figure \ref{coord} (a). \cite{}\cite{} define the start points as lying on the three image boundaries, and \cite{} points out that this is not reasonable because the real start point of a lane could be at any location within an image. In our analysis, using a ray may cause ambiguity in describing a line because a line may have infinitely many start points and the start point of a lane is subjective. As illustrated in Figure \ref{coord} (a), the yellow and dark green start points with the same orientation $\theta$ describe the same line, and either of them could be chosen in different datasets. This ambiguity arises because a straight line has two degrees of freedom while a ray has three. To address this issue, as shown in Figure \ref{coord} (b), we use polar coordinate systems to describe a lane anchor with two parameters, radius and angle, $\left\{\theta, r\right\}$, where $\theta \in \left[-\frac{\pi}{2}, \frac{\pi}{2}\right)$ and $r \in \left(-\infty, +\infty\right)$.
|
||||||
|
|
||||||
\begin{figure}[t]
|
\begin{figure}[t]
|
||||||
\centering
|
\centering
|
||||||
@ -221,26 +190,26 @@ Lanes are characterized by their thin and elongated curved shapes. A suitable la
|
|||||||
\includegraphics[width=\imgwidth]{thsis_figure/coord/polar.png}
|
\includegraphics[width=\imgwidth]{thsis_figure/coord/polar.png}
|
||||||
\caption{}
|
\caption{}
|
||||||
\end{subfigure}
|
\end{subfigure}
|
||||||
\caption{Different descriptions for anchor parameters. (a) Ray: start point and orientation. (b) Polar: radius and angle.}
|
\caption{Different descriptions for anchor parameters. (a) Ray: start point and orientation. (b) Polar: radius and angle.}
|
||||||
\label{coord}
|
\label{coord}
|
||||||
\end{figure}
|
\end{figure}
|
||||||
|
|
||||||
We define two types of polar coordinate systems: the global coordinate system and the local coordinate system, with the origin points denoted as the global origin $\boldsymbol{c}^{G}$ and the local origin $\boldsymbol{c}^{L}$, respectively. For convenience, the global origin is positioned near the static vanishing point of the entire lane image dataset, while the local origins are set at lattice points within the image. As illustrated in Fig. \ref{coord} (b), only the radius parameters are affected by the choice of the origin point, while the angle/orientation parameters remain consistent.
|
We define two kinds of polar coordinate systems, called the global coordinate system and the local coordinate system, with the origin points denoted as the global origin point $P_{0}^{\text{global}}$ and the local origin point $P_{0}^{\text{local}}$, respectively. For convenience, the global origin point is set around the static vanishing point of the whole lane image dataset, while the local origin points are set on a lattice within the image. From Figure \ref{coord}, it is easy to see that only the radius parameters are influenced by the choice of the origin point, while the angle/orientation parameters remain consistent.
|
||||||
|
|
||||||
\subsection{Local Polar Head}
|
\subsection{Local Polar Head}
|
||||||
|
|
||||||
\textbf{Anchor Formulation in Local Polar Head.} Inspired by the region proposal network in Faster R-CNN \cite{}, the local polar head (LPH) aims to propose flexible, high-quality anchors around the lane ground truths within an image. As Fig. \ref{lph} and Fig. \ref{overall_architecture} show, the highest level $P_{3} \in \mathbb{R}^{C_{f} \times H_{f} \times W_{f}}$ of the FPN feature maps is selected as the input for the LPH. Following a downsampling operation, the feature map is fed into two branches: the regression branch $\phi _{reg}^{lph}\left(\cdot \right)$ and the classification branch $\phi _{cls}^{lph}\left(\cdot \right)$.
|
Inspired by the region proposal network in Faster R-CNN \cite{}, the local polar proposal module aims to propose flexible, high-quality anchors in an image, as shown in Fig. \ref{lph} and Fig. \ref{overall_architecture}. The highest level (P3) of the FPN feature maps, $F \in \mathbb{R}^{C_{f} \times H_{f} \times W_{f}}$, is chosen as the input of the Local Polar Head (LPH). After a downsampling operation, the feature map is fed into two branches, namely the regression branch and the classification branch:
|
||||||
|
|
||||||
\begin{equation}
|
\begin{equation}
|
||||||
\begin{aligned}
|
\begin{aligned}
|
||||||
&F_d\gets DS\left( P_{3} \right), \,F_d\in \mathbb{R} ^{C_f\times H^{L}\times W^{L}}\\
|
&F_d\gets downsample\left( F \right), \,F_d\in \mathbb{R} ^{C_f\times H_l\times W_l}\\
|
||||||
&F_{reg\,\,}\gets \phi _{reg}^{lph}\left( F_d \right), \,F_{reg\,\,}\in \mathbb{R} ^{2\times H^{L}\times W^{L}}\\
|
&F_{reg\,\,}\gets \phi _{reg}^{lph}\left( F_d \right), \,F_{reg\,\,}\in \mathbb{R} ^{2\times H_l\times W_l}\\
|
||||||
&F_{cls}\gets \phi _{cls}^{lph}\left( F_d \right), \,F_{cls}\in \mathbb{R} ^{H^{L}\times W^{L}}
|
&F_{cls}\gets \phi _{cls}^{lph}\left( F_d \right), \,F_{cls}\in \mathbb{R} ^{1\times H_l\times W_l}
|
||||||
\end{aligned}
|
\end{aligned}
|
||||||
\label{lph equ}
|
\label{lph equ}
|
||||||
\end{equation}
|
\end{equation}
|
||||||
|
|
||||||
The regression branch aims to propose lane anchors by predicting two parameters, $F_{reg\,\,} \equiv \left\{\theta_{j}, r^{L}_{j}\right\}_{j=1}^{H^{L}\times W^{L}}$, within the local polar coordinate system. These parameters represent the angles and the radii. The classification branch, on the other hand, predicts the heat map $F_{cls\,\,} \equiv \left\{c_{j}\right\}_{j=1}^{H^{L}\times W^{L}}$ of the local polar origin grid. By discarding local origin points with lower confidence, the module increases the likelihood of selecting potential positive foreground lane anchors while removing as many background lane anchors as possible. For simplicity, the regression branch $\phi _{reg}^{lph}\left(\cdot \right)$ consists of one $1\times1$ convolutional layer, while the classification branch $\phi _{cls}^{lph}\left(\cdot \right)$ consists of two $1\times1$ convolutional layers.
|
The regression branch aims to propose lane anchors by predicting the two parameters $F_{reg\,\,} \equiv \left[\mathbf{\Theta}^{H_{l} \times W_{l}}, \mathbf{\xi}^{H_{l}\times W_{l}}\right]$ under the local polar coordinate system, which denote the angles and the radii. The classification branch predicts the heat map of the local polar origin grid. By removing the local origin points with lower confidence, the potential positive lane anchors around the ground truth are more likely to be chosen, while the background lane anchors are removed. For simplicity, the regression branch $\phi _{reg}^{lph}\left(\cdot \right)$ and the classification branch $\phi _{cls}^{lph}\left(\cdot \right)$ consist of one and two $1\times1$ convolutional layers, respectively.
|
||||||
|
|
||||||
\begin{figure}[t]
|
\begin{figure}[t]
|
||||||
\centering
|
\centering
|
||||||
@ -249,19 +218,20 @@ We define two types of polar coordinate systems: the global coordinate system an
|
|||||||
\label{lph}
|
\label{lph}
|
||||||
\end{figure}
|
\end{figure}
|
||||||
|
|
||||||
\textbf{Loss Function.} During the training phase, as illustrated in Fig. \ref{lphlabel}, the ground truth labels for the Local Polar Head (LPH) are constructed as follows. The radius ground truth is defined as the shortest distance from a grid point (local origin point) to the ground truth lane curve. The angle ground truth is defined as the orientation of the vector from the grid point to the nearest point on the curve. A grid point is designated as a positive sample if its radius label is less than a threshold $\tau_{L}$; otherwise, it is considered a negative sample.
|
During the training stage, as shown in Fig. \ref{lphlabel}, the ground truth labels of the local polar head are constructed as follows. The radius ground truth is defined as the shortest distance from a grid point (local polar origin point) to the ground truth lane curve. The angle ground truth is defined as the orientation of the line segment from the grid point to the nearest point on the curve. Only grid points whose radius label is less than a threshold $\tau$ are set as positive samples, while the others are set as negative samples. Once the regression and classification labels are constructed, the LPH can be trained with the smooth L1 loss and the binary cross-entropy (BCE) loss. The LPH loss function is defined as follows:
|
||||||
|
|
||||||
Once the regression and classification labels are established, the LPH can be trained using the smooth L1 loss $d\left(\cdot \right)$ for regression and the binary cross-entropy loss $BCE\left( \cdot , \cdot \right)$ for classification. The LPH loss function is defined as follows:
|
|
||||||
|
|
||||||
\begin{equation}
|
\begin{equation}
|
||||||
\begin{aligned}
|
\begin{aligned}
|
||||||
\mathcal{L} _{lph}^{cls}&=BCE\left( F_{cls},F_{gt} \right) \\
|
\mathcal{L} _{lph}^{cls}&=BCE\left( F_{cls},F_{gt} \right) \\
|
||||||
\mathcal{L} _{lph}^{reg}&=\frac{1}{N_{lph}^{pos}}\sum_{j\in \left\{j|\hat{r}_j^L<\tau_{L} \right\}}{\left( d\left( \theta _j-\hat{\theta}_j \right) +d\left( r_j^L-\hat{r}_j^L \right) \right)}\\
|
\mathcal{L} _{lph}^{r\mathrm{e}g}&=\frac{1}{N_{lph}^{pos}}\sum_{i\in \left\{i|\hat{r}_i<\tau \right\}}{\left( d\left( \theta _i-\hat{\theta}_i \right) +d\left( r_i-\hat{r}_i \right) \right)}\\
|
||||||
|
% \mathcal{L} _{lph}^{r\mathrm{e}g}&=\lambda _{lph}^{cls}\mathcal{L} _{lph}^{cls}+\lambda _{lph}^{reg}\mathcal{L} _{lph}^{r\mathrm{e}g}
|
||||||
\end{aligned}
|
\end{aligned}
|
||||||
\label{loss_lph}
|
\label{loss_lpm}
|
||||||
\end{equation}
|
\end{equation}
|
||||||
|
|
||||||
\textbf{Top-$K_{A}$ Anchor Selection.} During the training stage, all $H^{L}\times W^{L}$ anchors are considered as candidate anchors and fed into the R-CNN module. This helps the R-CNN module learn from sufficient features of negative (background) anchor samples. In the evaluation stage, however, only the top-$K_{A}$ anchors with the highest confidence scores are selected and fed into the R-CNN module. This strategy filters out potential negative (background) anchors and reduces the computational complexity of the R-CNN module. It thereby maintains the adaptability and flexibility of the anchor distribution while decreasing the total number of anchors. The following experiments demonstrate the effectiveness of our top-$K_{A}$ anchor selection strategy.
|
|
||||||
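The evaluation-time selection described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `select_top_k_anchors`, the flat list layout, and the `training` flag are assumptions introduced here.

```python
def select_top_k_anchors(confidences, thetas, radii, k, training=False):
    """Pick candidate anchors from the flattened local polar grid.

    confidences, thetas, radii: flat lists of length H^L * W^L (assumed layout).
    During training every anchor is kept; during evaluation only the k
    anchors with the highest confidence are kept.
    """
    indices = list(range(len(confidences)))
    if not training:
        # Sort grid indices by descending confidence and keep the first k.
        indices = sorted(indices, key=lambda j: confidences[j], reverse=True)[:k]
    return [(thetas[j], radii[j], confidences[j]) for j in indices]


if __name__ == "__main__":
    conf = [0.1, 0.9, 0.5, 0.7]
    theta = [0.0, 0.1, 0.2, 0.3]
    rad = [5.0, 6.0, 7.0, 8.0]
    print(select_top_k_anchors(conf, theta, rad, k=2))
```

Keeping all anchors during training and truncating only at evaluation mirrors the text: the second stage still sees background anchors while learning, but inference cost scales with $K_{A}$ rather than with the grid size.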
|
where $BCE\left( \cdot , \cdot \right) $ denotes the binary cross-entropy loss and $d\left(\cdot \right)$ denotes the smooth L1 loss. To keep backbone training stable, the gradients from the confidence branch to the backbone feature map are detached.
|
||||||
|
|
||||||
|
|
||||||
\begin{figure}[t]
|
\begin{figure}[t]
|
||||||
\centering
|
\centering
|
||||||
@ -270,192 +240,43 @@ Once the regression and classification labels are established, the LPH can be tr
|
|||||||
\label{lphlabel}
|
\label{lphlabel}
|
||||||
\end{figure}
|
\end{figure}
|
||||||
|
|
||||||
\subsection{Global Polar Head}
|
|
||||||
Global polar head (GPH) is a crucial component in the second stage of PolarRCNN. It takes lane anchor pooling features as input and predicts the precise lane location and confidence. Fig. \ref{gph} illustrates the structure and pipeline of GPH. GPH comprises RoI pooling modules and three sub-heads (triplet heads), which will be introduced in detail.
|
|
||||||
|
|
||||||
\textbf{RoI Pooling Module.} RoI pooling module is designed to transform features sampled from lane anchors into a standard feature tensor. Once the local polar parameters of a lane anchor are given, they can be converted to global polar coordinates using the following equation:
|
|
||||||
|
|
||||||
|
\subsection{Global polar Head}
|
||||||
|
The global polar head serves as the second stage of PolarRCNN. It accepts the line pooling features as input and predicts the accurate lane shape and location. The global polar head consists of 3 parts.
|
||||||
|
Once the local polar parameters of a line anchor are provided, they can be transformed to the global polar coordinates with the following equation:
|
||||||
\begin{equation}
|
\begin{equation}
|
||||||
\begin{aligned}
|
\begin{aligned}
|
||||||
r^{G}_{j}=r^{L}_{j}+\left( \textbf{c}^{L}_{j}-\textbf{c}^{G} \right) \left[\cos\theta_{j}, \sin\theta_{j} \right]^{T}
|
r^{global}=r^{local}+\left( x^{local}-x^{global} \right) \cos \theta
|
||||||
|
\\+\left( y^{local}-y^{global} \right) \sin \theta
|
||||||
\end{aligned}
|
\end{aligned}
|
||||||
\end{equation}
|
\end{equation}
|
||||||
where $\textbf{c}^{L}_{j} \in \mathbb{R}^{2}$ and $\textbf{c}^{G} \in \mathbb{R}^{2}$ represent the Cartesian coordinates of the local and global origins, respectively.
|
where $\left( x^{local}, y^{local} \right)$ and $\left( x^{global}, y^{global} \right)$ are the Cartesian coordinates of the local and global origin points, respectively.
|
||||||
|
|
||||||
Next, feature points are sampled on the lane anchor. The y-coordinates of these points are uniformly sampled vertically from the image, as previously mentioned. The $x_{i}$ coordinates are computed using the global polar axis with the following equation:
|
Then the feature points can be sampled on the line anchor. The y-coordinates of the points are uniformly sampled from the image vertically as mentioned before, and $x_{i}$ is calculated in the global polar coordinate system by the following equation:
|
||||||
|
|
||||||
\begin{equation}
|
\begin{equation}
|
||||||
\begin{aligned}
|
\begin{aligned}
|
||||||
x_{i\,\,}=-y_i\tan \theta +\frac{r^{G}}{\cos \theta}
|
x_{i\,\,}=-y_i\tan \theta +\frac{r}{\cos \theta}
|
||||||
\end{aligned}
|
\end{aligned}
|
||||||
\end{equation}
|
\end{equation}
|
||||||
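The two equations above (local-to-global radius conversion and x-coordinate sampling) can be checked with a small sketch. It is only an illustration under stated assumptions: the function names are hypothetical, $\theta$ is taken as the direction of the line normal, the sampling formula is read as using coordinates measured relative to the global origin, and the index $i$ is assumed to run from $0$ to $N-1$. Angles with $\cos\theta = 0$ (horizontal lines) are not handled.

```python
import math


def local_to_global_radius(theta, r_local, c_local, c_global):
    """r^G = r^L + (c^L - c^G) . [cos(theta), sin(theta)]^T."""
    dx = c_local[0] - c_global[0]
    dy = c_local[1] - c_global[1]
    return r_local + dx * math.cos(theta) + dy * math.sin(theta)


def sample_anchor_points(theta, r_global, c_global, height, n_points):
    """Sample n_points on the anchor at uniformly spaced image rows.

    Assumes y_i = H / (N - 1) * i for i = 0..N-1 and applies
    x = -y * tan(theta) + r^G / cos(theta) in coordinates relative
    to the global origin, then shifts back to image coordinates.
    """
    points = []
    for i in range(n_points):
        y = height / (n_points - 1) * i
        y_rel = y - c_global[1]
        x_rel = -y_rel * math.tan(theta) + r_global / math.cos(theta)
        points.append((x_rel + c_global[0], y))
    return points


if __name__ == "__main__":
    theta, r_local = 0.3, 12.0
    c_local, c_global = (300.0, 250.0), (400.0, 100.0)
    r_global = local_to_global_radius(theta, r_local, c_local, c_global)
    for x, y in sample_anchor_points(theta, r_global, c_global, 320.0, 5):
        # Every sampled point should lie at signed distance r_local
        # from the local origin along the normal direction.
        print((x - c_local[0]) * math.cos(theta) + (y - c_local[1]) * math.sin(theta))
```

The check in the usage example reflects the point made earlier in the text: changing the origin changes only the radius, so the same line is recovered from either the local or the global parameters.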
|
|
||||||
|
|
||||||
\begin{figure}[t]
|
\begin{figure}[t]
|
||||||
\centering
|
\centering
|
||||||
\includegraphics[width=\linewidth]{thsis_figure/detection_head.png} % replace with your image file name
|
\includegraphics[width=\linewidth]{thsis_figure/triplet_head.png} % replace with your image file name
|
||||||
\caption{The main architecture of global polar head}
|
\caption{The main architecture of global head}
|
||||||
\label{gph}
|
\label{triplet}
|
||||||
\end{figure}
|
\end{figure}
|
||||||
|
|
||||||
Suppose $P_{0}$, $P_{1}$ and $P_{2}$ denote the last three levels from the FPN, and $\boldsymbol{F}_{L}^{s}\in \mathbb{R} ^{N_p\times d_f}$ represents the sample point features from $P_{L}$. The grid features from the three levels are extracted and fused together without the cross-layer cascade refinement used in CLRNet. To reduce the number of parameters, we employ a weighted sum strategy to combine features from different layers, similar to \cite{}, but in a more compact form:
|
|
||||||
|
|
||||||
\begin{equation}
|
|
||||||
\begin{aligned}
|
|
||||||
\boldsymbol{F}^s=\sum_{L=0}^2{\boldsymbol{F}_{L}^{s}\times \frac{e^{\boldsymbol{w}_{L}^{s}}}{\sum_{L=0}^2{e^{\boldsymbol{w}_{L}^{s}}}}}
|
|
||||||
\end{aligned}
|
|
||||||
\end{equation}
|
|
||||||
where $\boldsymbol{w}_{L}^{s}\in \mathbb{R} ^{N_p}$ is a learnable aggregation weight. Instead of concatenating the three sampling features into $\boldsymbol{F}^s\in \mathbb{R} ^{N_p\times d_f\times 3}$ directly, the adaptive summation reduces the feature dimensions to $\boldsymbol{F}^s\in \mathbb{R} ^{N_p\times d_f}$, which is one-third of the original dimension. The weighted sum tensors are then fed into fully connected layers to obtain the pooled RoI features of an anchor:
|
|
||||||
|
|
||||||
\begin{equation}
|
|
||||||
\begin{aligned}
|
|
||||||
\boldsymbol{F}^{roi}\gets FC^{pooling}\left( \boldsymbol{F}^s \right) , \boldsymbol{F}^{roi}\in \mathbb{R} ^{d_r}
|
|
||||||
\end{aligned}
|
|
||||||
\end{equation}
|
|
||||||
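The softmax-weighted fusion across the three FPN levels can be sketched in plain Python as below. This is an illustrative sketch only: the nested-list layout `features[L][p][c]` and `weights[L][p]` and the function name `fuse_levels` are assumptions, and the subsequent fully connected pooling layer is omitted.

```python
import math


def fuse_levels(features, weights):
    """Fuse per-level sample features with a softmax over levels.

    features[L][p][c]: feature of sample point p, channel c, from level L.
    weights[L][p]: learnable logit w_L^s for sample point p at level L.
    Returns fused[p][c] = sum_L softmax_L(weights[L][p]) * features[L][p][c].
    """
    n_levels = len(features)
    n_points = len(features[0])
    d_f = len(features[0][0])
    fused = []
    for p in range(n_points):
        logits = [weights[L][p] for L in range(n_levels)]
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(v - m) for v in logits]
        total = sum(exps)
        alphas = [e / total for e in exps]
        fused.append([
            sum(alphas[L] * features[L][p][c] for L in range(n_levels))
            for c in range(d_f)
        ])
    return fused


if __name__ == "__main__":
    feats = [
        [[1.0, 2.0], [3.0, 4.0]],    # level 0: two sample points, two channels
        [[4.0, 5.0], [6.0, 7.0]],    # level 1
        [[7.0, 8.0], [9.0, 10.0]],   # level 2
    ]
    zero_w = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
    print(fuse_levels(feats, zero_w))  # equal logits give the plain mean over levels
```

Because the softmax weights sum to one per sample point, the fused tensor stays at $N_p \times d_f$ regardless of how many levels are combined, which is the dimension reduction noted in the text.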
|
|
||||||
\textbf{Triplet Head.} The triplet head contains three heads: the one-to-one classification (o2o cls) head, the one-to-many classification (o2m cls) head, and the one-to-many regression (o2m Reg) head. In \cite{}\cite{}\cite{}\cite{}, the detection heads all follow the one-to-many paradigm. During training, more than one positive sample is assigned to each ground truth, so more than one detection is predicted for each instance during evaluation. Non-Maximum Suppression (NMS) is then needed to remove the redundant results and keep the one with the highest confidence. However, NMS depends on the definition of the distance between two detections, and this distance is complicated to compute for curved lanes and other irregular geometric shapes (such as instance segmentation masks). To produce detections without redundancy (NMS-free), the one-to-one paradigm is necessary during training, according to \cite{}. However, the one-to-one paradigm alone is not enough; the structure of the detection head is also essential for NMS-free detection. This issue is analyzed in detail below.
|
|
||||||
|
|
||||||
\begin{algorithm}[t]
|
|
||||||
\caption{Graph-based FastNMS}
|
|
||||||
\begin{algorithmic}[1] % the 1 means every line is numbered
|
|
||||||
\REQUIRE ~~\\ % algorithm input parameters: Input
|
|
||||||
The index of positive predictions, $1, 2, ..., i, ..., N_{pos}$;\\
|
|
||||||
The anchors corresponding to the positive predictions, $[\theta_i, r_{i}^{global}]$;\\
|
|
||||||
The x-coordinates of the sampling points from the positive anchors, $\boldsymbol{x}_{i}^{b}$;\\
|
|
||||||
The confidence scores of the positive predictions from the o2m cls head, $s_i$;\\
|
|
||||||
The regressions of the positive predictions from the o2m Reg head: the horizontal offsets $\varDelta \boldsymbol{x}_{i}^{roi}$ and the end point locations $\boldsymbol{e}_{i}$;\\
|
|
||||||
\ENSURE ~~\\ % algorithm output: Output
|
|
||||||
\STATE Calculate the confidence adjacency matrix $\boldsymbol{C} \in \mathbb{R} ^{N_{pos} \times N_{pos}} $, where the element $C_{ij}$ in $\boldsymbol{C}$ is calculated as follows:
|
|
||||||
\begin{equation}
|
|
||||||
\begin{aligned}
|
|
||||||
C_{ij}=\begin{cases}
|
|
||||||
1, s_i<s_j \lor \left( s_i=s_j \land i<j \right)\\
|
|
||||||
0, \text{otherwise}\\
|
|
||||||
\end{cases}
|
|
||||||
\end{aligned}
|
|
||||||
\label{al_1-1}
|
|
||||||
\end{equation}
|
|
||||||
\STATE Calculate the geometric prior adjacency matrix $\boldsymbol{M} \in \mathbb{R} ^{N_{pos} \times N_{pos}} $, where the element $M_{ij}$ in $\boldsymbol{M}$ is calculated as follows:
|
|
||||||
\begin{equation}
|
|
||||||
\begin{aligned}
|
|
||||||
M_{ij}=\begin{cases}
|
|
||||||
1,\left| \theta _i-\theta _j \right|<\theta _{\tau}\land \left| r_{i}^{global}-r_{j}^{global} \right|<r_{\tau}\\
|
|
||||||
0,\text{otherwise}\\
|
|
||||||
\end{cases}
|
|
||||||
\end{aligned}
|
|
||||||
\label{al_1-2}
|
|
||||||
\end{equation}
|
|
||||||
|
|
||||||
\STATE Calculate the distance matrix $\boldsymbol{D} \in \mathbb{R} ^{N_{pos} \times N_{pos}}$, where the element $D_{ij}$ in $\boldsymbol{D}$ is defined as follows:
|
|
||||||
\begin{equation}
|
|
||||||
\begin{aligned}
|
|
||||||
D_{ij} = 1-d\left( \boldsymbol{x}_{i}^{b} + \varDelta \boldsymbol{x}_{i}^{roi}, \boldsymbol{x}_{j}^{b} + \varDelta \boldsymbol{x}_{j}^{roi}, \boldsymbol{e}_{i}, \boldsymbol{e}_{j}\right)
|
|
||||||
\end{aligned}
|
|
||||||
\label{al_1-3}
|
|
||||||
\end{equation}
|
|
||||||
where $d\left(\cdot, \cdot, \cdot, \cdot \right)$ is some predefined function to quantify the distance between two lane predictions.
|
|
||||||
\STATE Define the adjacency matrix $\boldsymbol{T}=\boldsymbol{C}\land\boldsymbol{M}$; the final confidence $\tilde{s}_i$ is calculated as follows:
|
|
||||||
\begin{equation}
|
|
||||||
\begin{aligned}
|
|
||||||
\tilde{s}_i=\begin{cases}
|
|
||||||
1,\underset{j\in \left\{ j|T_{ij}=1 \right\}}{\max}D_{ij}<d_{\tau}\\
|
|
||||||
0,\text{otherwise}\\
|
|
||||||
\end{cases}
|
|
||||||
\end{aligned}
|
|
||||||
\label{al_1-4}
|
|
||||||
\end{equation}
|
|
||||||
|
|
||||||
\RETURN The final confidence $\tilde{s}_i$; % return value of the algorithm
|
|
||||||
\end{algorithmic}
|
|
||||||
\label{Graph FastNMS}
|
|
||||||
\end{algorithm}
|
|
||||||
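Algorithm \ref{Graph FastNMS} can be illustrated with a short, unoptimized sketch. Several points are assumptions made here for illustration: the lane distance $d$ is supplied as a precomputed matrix `dist` with values in $[0, 1]$ (the text leaves $d$ as a predefined function), a prediction with no neighbor in $\boldsymbol{T}$ is kept, and the function name and argument layout are hypothetical.

```python
def graph_fast_nms(scores, thetas, radii, dist, theta_tau, r_tau, d_tau):
    """Return keep flags (1 = kept, 0 = suppressed) for positive predictions.

    scores[i]: o2m confidence s_i.
    thetas[i], radii[i]: global polar anchor parameters of prediction i.
    dist[i][j]: assumed precomputed lane distance d between predictions i and j.
    """
    n = len(scores)
    keep = []
    for i in range(n):
        max_d = None
        for j in range(n):
            # C_ij: prediction j ranks above prediction i (ties broken by index).
            c_ij = scores[i] < scores[j] or (scores[i] == scores[j] and i < j)
            # M_ij: the two anchors are geometrically close.
            m_ij = (abs(thetas[i] - thetas[j]) < theta_tau
                    and abs(radii[i] - radii[j]) < r_tau)
            if c_ij and m_ij:  # T_ij = C_ij AND M_ij
                d_ij = 1.0 - dist[i][j]  # D_ij as defined in the algorithm
                max_d = d_ij if max_d is None else max(max_d, d_ij)
        # Keep i if no higher-ranked neighbor is too similar to it.
        keep.append(1 if max_d is None or max_d < d_tau else 0)
    return keep


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7]
    thetas = [0.10, 0.12, 1.00]
    radii = [10.0, 11.0, 10.0]
    dist = [[0.0, 0.1, 0.9],
            [0.1, 0.0, 0.9],
            [0.9, 0.9, 0.0]]
    print(graph_fast_nms(scores, thetas, radii, dist, 0.2, 5.0, 0.5))
```

No sorting and no sequential loop over kept detections is required: each output depends only on pairwise comparisons, which is what allows the procedure to be recast as message passing over a graph.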
|
|
||||||
\begin{figure}[t]
|
\begin{figure}[t]
|
||||||
\centering
|
\centering
|
||||||
\includegraphics[width=\linewidth]{thsis_figure/gnn.png} % replace with your image file name
|
\includegraphics[width=\linewidth]{thsis_figure/gnn.png} % replace with your image file name
|
||||||
\caption{The main architecture of the GNN.}
|
\caption{The main architecture of our model.}
|
||||||
\label{gnn}
|
\label{gnn}
|
||||||
\end{figure}
|
\end{figure}
|
||||||
|
|
||||||
\textbf{NMS vs NMS-free.} Let $\boldsymbol{F}^{roi}_{i}$ denote the RoI features extracted from the $i_{th}$ anchor; the three sub-heads take $\boldsymbol{F}^{roi}_{i}$ as input. For now, consider only the o2m cls head and the o2m Reg head, which follow the paradigm of previous work and serve as the baseline for the new one-to-one paradigm. To keep the comparison simple and rigorous, both the o2m cls head and the o2m Reg head consist of two layers with activation functions (a plain structure without complex mechanisms such as attention or deformable convolution). As mentioned before, directly replacing the one-to-many label assignment with a one-to-one assignment is not enough to remove NMS postprocessing, because the anchors overlap heavily or lie very close to each other, as shown in Fig. \ref{anchor setting} (b)(c). Let $\boldsymbol{F}^{roi}_{i}$ and $\boldsymbol{F}^{roi}_{j}$ denote the features from two overlapping (or very close) anchors, so that their values are almost the same. Let $f_{cls}^{plain}$ denote a network with the same structure as the o2m cls head but trained with one-to-one label assignment. Suppose that $\boldsymbol{F}^{roi}_{i}$ is a positive sample and $\boldsymbol{F}^{roi}_{j}$ is a negative one; the ideal outputs then differ as follows:
|
|
||||||
|
|
||||||
|
|
||||||
\begin{equation}
|
|
||||||
\begin{aligned}
|
|
||||||
&\boldsymbol{F}_{i}^{roi}\approx \boldsymbol{F}_{j}^{roi}
|
|
||||||
\\
|
|
||||||
&f_{cls}^{plain}\left( \boldsymbol{F}_{i}^{roi} \right) \rightarrow 1
|
|
||||||
\\
|
|
||||||
&f_{cls}^{plain}\left( \boldsymbol{F}_{j}^{roi} \right) \rightarrow 0
|
|
||||||
\end{aligned}
|
|
||||||
\label{sharp fun}
|
|
||||||
\end{equation}
|
|
||||||
|
|
||||||
Eq. \ref{sharp fun} implies that $f_{cls}^{plain}$ must be sharp, that is, nearly identical inputs must be mapped to very different outputs; the same issue is also mentioned in \cite{}. Learning such a sharp function with the plain structure is hard. In the most extreme case, with $\boldsymbol{F}_{i}^{roi} = \boldsymbol{F}_{j}^{roi}$, it is nearly impossible to separate the two anchors into positive and negative samples, and in practice both confidences converge to around 0.5. This issue is caused by the limitations of the input format and the network structure, which restrict the expressive power of the head. It is therefore essential to establish relations between anchors and to design a new model structure that can express these relations.
|
|
||||||
|
|
||||||
It is easy to see that the ``ideal'' one-to-one branch is equivalent to the o2m cls branch + o2m regression + NMS postprocessing. If NMS could be replaced by an equivalent but learnable function (e.g., a neural network), the o2o head could be trained to learn the one-to-one assignment. However, NMS requires sequential iteration and confidence sorting, which are hard to rewrite as a neural network. Although previous work such as an RNN-based network \cite{} has been proposed to replace NMS, it is time-consuming, and the iterative process introduces additional difficulty for model training.
|
|
||||||
|
|
||||||
The key rule of NMS postprocessing is given as follows:
|
|
||||||
Given a series of positive detections with redundancy, a detection lane A is suppressed by another detection lane B if and only if:
|
|
||||||
|
|
||||||
(1) The confidence of A is lower than that of B.
|
|
||||||
|
|
||||||
(2) The predefined distance (e.g., IoU distance or L1 distance) between A and B is smaller than a threshold.
|
|
||||||
|
|
||||||
(3) Detection lane B is not suppressed by any other detection.
|
|
||||||
|
|
||||||
FastNMS, a simplification of NMS, requires only conditions (1) and (2). It introduces more false negatives but runs faster because it needs no sequential iteration. Building on this ``iteration-free'' property, we further design a ``sort-free'' variant called Graph-based FastNMS, which is detailed in Algorithm \ref{Graph FastNMS}.
|
|
||||||
|
|
||||||
It is easy to show that when the elements of $\boldsymbol{M}$ are all set to 1 (that is, the geometric priors are ignored), Graph-based FastNMS is equivalent to FastNMS. Based on Graph-based FastNMS, we design the structure of the o2o cls head.
|
|
||||||
|
|
||||||
According to our analysis of the shortcomings of traditional NMS postprocessing, the essential issues are the definition of the distance between two predictions and the setting of the threshold $d_{\tau}$. We therefore replace the explicit distance function with an implicit graph neural network. Moreover, the x-coordinate inputs are replaced with the anchor features $\boldsymbol{F}_{i}^{roi}$. As noted in \cite{}, $\boldsymbol{F}_{i}^{roi}$ contains both location and classification information, which is sufficient for a neural network to model the distance.
|
|
||||||
The implicit distance is therefore defined as follows:
|
|
||||||
|
|
||||||
\begin{equation}
|
|
||||||
\begin{aligned}
|
|
||||||
\tilde{\boldsymbol{F}}_{i}^{roi}\gets& \mathrm{ReLU}\left( FC_{o2o,roi}\left( \boldsymbol{F}_{i}^{roi} \right) \right)
|
|
||||||
\\
|
|
||||||
\boldsymbol{F}_{ij}^{edge}\gets& FC_{in}\left( \tilde{\boldsymbol{F}}_{i}^{roi} \right) -FC_{out}\left( \tilde{\boldsymbol{F}}_{j}^{roi} \right)
|
|
||||||
\\
|
|
||||||
&+FC_{base}\left( \boldsymbol{x}_{i}^{b}-\boldsymbol{x}_{j}^{b} \right)
|
|
||||||
\\
|
|
||||||
\boldsymbol{D}_{ij}^{edge}\gets& MLP_{edge}\left( \boldsymbol{F}_{ij}^{edge} \right)
|
|
||||||
\\
|
|
||||||
\end{aligned}
|
|
||||||
\label{edge_layer}
|
|
||||||
\end{equation}
|
|
||||||
|
|
||||||
Eq. \ref{edge_layer} is the implicit replacement of Eq. \ref{al_1-3}.
|
|
||||||
|
|
||||||
|
|
||||||
\begin{equation}
|
|
||||||
\begin{aligned}
|
|
||||||
\\
|
|
||||||
&\boldsymbol{D}_{i}^{node}\gets \underset{j\in \left\{ j|T_{ij}=1 \right\}}{\max}\boldsymbol{D}_{ij}^{edge}
|
|
||||||
\\
|
|
||||||
&\boldsymbol{F}_{i}^{node}\gets MLP_{node}\left( \boldsymbol{D}_{i}^{node} \right)
|
|
||||||
\\
|
|
||||||
&\tilde{s}_i\gets \sigma \left( FC_{o2o,out}\left( \boldsymbol{F}_{i}^{node} \right) \right)
|
|
||||||
\end{aligned}
|
|
||||||
\label{node_layer}
|
|
||||||
\end{equation}
|
|
||||||
|
|
||||||
|
|
||||||
Eq. \ref{node_layer} is the implicit replacement of Eq. \ref{al_1-4}.
|
|
||||||
|
|
||||||
It should be noted that the o2o cls head depends on the predictions of the o2m cls head. From a probabilistic perspective, the confidence $s_{j}$ output by the o2m cls head denotes the probability that the $j_{th}$ detection is a positive sample. The confidence $\tilde{s}_i$ output by the o2o cls head denotes the conditional probability that the $i_{th}$ sample should not be suppressed, given that it is already a positive sample.
|
|
||||||
|
|
||||||
|
|
||||||
\begin{equation}
|
|
||||||
\begin{aligned}
|
|
||||||
&s_j|_{j=1}^{N_A}\equiv P\left( a_j\,\,is\,\,pos \right) \,\,
|
|
||||||
\\
|
|
||||||
&\tilde{s}_i|_{i=1}^{N_{pos}}\equiv P\left( a_i\,\,is\,\,saved|a_i\,is\,\,pos \right)
|
|
||||||
\end{aligned}
|
|
||||||
\label{probablity}
|
|
||||||
\end{equation}
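
As a brief worked consequence of this probabilistic reading (a sketch that assumes the two confidences can be treated as calibrated probabilities, which is an assumption rather than something established here), the chain rule gives the joint probability that an anchor is both positive and retained:

\begin{equation}
\begin{aligned}
P\left( a_i\,\,is\,\,pos,\,\,a_i\,\,is\,\,saved \right) &=P\left( a_i\,\,is\,\,saved|a_i\,is\,\,pos \right) P\left( a_i\,\,is\,\,pos \right) \\
&=\tilde{s}_i\,s_i
\end{aligned}
\end{equation}

For instance, the illustrative values $s_i=0.9$ and $\tilde{s}_i=0.5$ would give a joint probability of $0.45$.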
|
|
||||||
|
|
||||||
|
|
||||||
\textbf{Label assignment and cost function} We use a label assignment strategy (SimOTA) similar to previous work \cite{}\cite{}. However, to make the cost function more compact and consistent with general object detection \cite{}, we redefine the lane IoU. As illustrated in Fig. 9, the redefined lane IoU, which we call GLaneIoU, is given as follows:
|
|
||||||
|
|
||||||
\begin{figure}[t]
|
\begin{figure}[t]
|
||||||
\centering
|
\centering
|
||||||
\includegraphics[width=\linewidth]{thsis_figure/GLaneIoU.png} % replace with your image file name
|
\includegraphics[width=\linewidth]{thsis_figure/GLaneIoU.png} % replace with your image file name
|
||||||
@ -463,64 +284,6 @@ It should be noted that the o2o cls head depends on the predictons of o2m cls he
|
|||||||
\label{glaneiou}
|
\label{glaneiou}
|
||||||
\end{figure}
|
\end{figure}
|
||||||
|
|
||||||
\begin{equation}
|
|
||||||
\begin{aligned}
|
|
||||||
&w_{i}^{k}=\frac{\sqrt{\left( \Delta x_{i}^{k} \right) ^2+\left( \Delta y_{i}^{k} \right) ^2}}{\Delta y_{i}^{k}}w
|
|
||||||
\\
|
|
||||||
&\hat{d}_{i}^{\mathcal{O}}=\min \left( x_{i}^{p}+w_{i}^{p}, x_{i}^{q}+w_{i}^{q} \right) -\max \left( x_{i}^{p}-w_{i}^{p}, x_{i}^{q}-w_{i}^{q} \right)
|
|
||||||
\\
|
|
||||||
&\hat{d}_{i}^{\xi}=\max \left( x_{i}^{p}-w_{i}^{p}, x_{i}^{q}-w_{i}^{q} \right) -\min \left( x_{i}^{p}+w_{i}^{p}, x_{i}^{q}+w_{i}^{q} \right)
|
|
||||||
\\
|
|
||||||
&d_{i}^{\mathcal{U}}=\max \left( x_{i}^{p}+w_{i}^{p}, x_{i}^{q}+w_{i}^{q} \right) -\min \left( x_{i}^{p}-w_{i}^{p}, x_{i}^{q}-w_{i}^{q} \right)
|
|
||||||
\\
|
|
||||||
&d_{i}^{\mathcal{O}}=\max \left( \hat{d}_{i}^{\mathcal{O}},0 \right) \,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\, d_{i}^{\xi}=\max \left( \hat{d}_{i}^{\xi},0 \right)
|
|
||||||
\end{aligned}
|
|
||||||
\end{equation}
|
|
||||||
|
|
||||||
The definitions of $d_{i}^{\mathcal{O}}$ and $d_{i}^{\xi}$ are similar to, but slightly different from, those in \cite{} and \cite{}: here both values are forced to be nonnegative. This form is chosen to be consistent with the IoU definition for bounding boxes. The overall GLaneIoU is then given as follows:
|
|
||||||
|
|
||||||
\begin{equation}
|
|
||||||
\begin{aligned}
|
|
||||||
GLaneIoU\,\,=\,\,\frac{\sum\nolimits_{i=j}^k{d_{i}^{\mathcal{O}}}}{\sum\nolimits_{i=j}^k{d_{i}^{\mathcal{U}}}}-g\frac{\sum\nolimits_{i=j}^k{d_{i}^{\xi}}}{\sum\nolimits_{i=j}^k{d_{i}^{\mathcal{U}}}}
|
|
||||||
\end{aligned}
|
|
||||||
\end{equation}
|
|
||||||
where $j$ and $k$ are the indices of the valid points (the start point and the end point). It is easy to see that when $g=0$, GLaneIoU corresponds to the IoU for bounding boxes, with value range $\left[0, 1 \right]$. When $g=1$, GLaneIoU corresponds to the GIoU for bounding boxes, with value range $\left(-1, 1 \right]$. In general, when $g>0$, the value range of GLaneIoU is $\left(-g, 1 \right]$.
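
To make the definition concrete, consider a single sampled row $i$ with illustrative values (assumed for this example only, not taken from the experiments). With $x_{i}^{p}=10$, $x_{i}^{q}=14$ and $w_{i}^{p}=w_{i}^{q}=3$, the two extended lane segments overlap:

\begin{equation}
\begin{aligned}
&\hat{d}_{i}^{\mathcal{O}}=\min \left( 13, 17 \right) -\max \left( 7, 11 \right) =2, \,\,\,\, \hat{d}_{i}^{\xi}=\max \left( 7, 11 \right) -\min \left( 13, 17 \right) =-2
\\
&d_{i}^{\mathcal{U}}=\max \left( 13, 17 \right) -\min \left( 7, 11 \right) =10, \,\,\,\, d_{i}^{\mathcal{O}}=2, \,\,\,\, d_{i}^{\xi}=0
\end{aligned}
\end{equation}

so this row alone gives a GLaneIoU of $2/10=0.2$ for any $g$. If instead $x_{i}^{q}=20$, the segments are disjoint:

\begin{equation}
\begin{aligned}
&\hat{d}_{i}^{\mathcal{O}}=\min \left( 13, 23 \right) -\max \left( 7, 17 \right) =-4, \,\,\,\, \hat{d}_{i}^{\xi}=\max \left( 7, 17 \right) -\min \left( 13, 23 \right) =4
\\
&d_{i}^{\mathcal{U}}=\max \left( 13, 23 \right) -\min \left( 7, 17 \right) =16, \,\,\,\, d_{i}^{\mathcal{O}}=0, \,\,\,\, d_{i}^{\xi}=4
\end{aligned}
\end{equation}

and the row gives $0-g\cdot 4/16=-0.25g$. Because the gap $d_{i}^{\xi}$ is always strictly smaller than $d_{i}^{\mathcal{U}}$ when the widths are positive, the penalty term stays below $g$, which is consistent with the lower bound of $-g$ stated above.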
|
|
||||||
We can then define the cost function between the $i$-th prediction and the $j$-th ground truth as in \cite{}:
|
|
||||||
\begin{equation}
|
|
||||||
\begin{aligned}
|
|
||||||
\mathcal{C} _{ij}=\left(s_i\right)^{\beta_c}\times \left( GLaneIoU_{ij, g=0} \right) ^{\beta_r}
|
|
||||||
\end{aligned}
|
|
||||||
\end{equation}
|
|
||||||
This cost function is more compact than those in previous work and takes both location and confidence into account. SimOTA ($k=4$) \cite{} is used for label assignment in the two o2m heads, while the Hungarian algorithm is used for the o2o cls head.
|
|
||||||
|
|
||||||
\textbf{Loss function} We use focal loss \cite{} for both the o2o cls head and the o2m cls head:
|
|
||||||
|
|
||||||
\begin{equation}
|
|
||||||
\begin{aligned}
|
|
||||||
\mathcal{L} _{\,\,o2m,cls}&=-\sum_{i\in \varOmega _{pos}^{o2m}}{\alpha _{o2m}\left( 1-s_i \right) ^{\gamma}\log \left( s_i \right)}\\&-\sum_{i\in \varOmega _{neg}^{o2m}}{\left( 1-\alpha _{o2m} \right) \left( s_i \right) ^{\gamma}\log \left( 1-s_i \right)}
|
|
||||||
\\
|
|
||||||
\mathcal{L} _{\,\,o2o,cls}&=-\sum_{i\in \varOmega _{pos}^{o2o}}{\alpha _{o2o}\left( 1-\tilde{s}_i \right) ^{\gamma}\log \left( \tilde{s}_i \right)}\\&-\sum_{i\in \varOmega _{neg}^{o2o}}{\left( 1-\alpha _{o2o} \right) \left( \tilde{s}_i \right) ^{\gamma}\log \left( 1-\tilde{s}_i \right)}
|
|
||||||
\\
|
|
||||||
\end{aligned}
|
|
||||||
\end{equation}
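
As a small numerical illustration of how the focal term down-weights easy samples, take the standard focal-loss sign convention (a leading minus sign on each term) and the illustrative values $\alpha _{o2m}=0.25$ and $\gamma =2$, which are assumed here and not taken from the experimental settings. With the natural logarithm, a well-classified positive ($s_i=0.9$) and a poorly classified positive ($s_i=0.1$) contribute:

\begin{equation}
\begin{aligned}
&-0.25\left( 1-0.9 \right) ^2\log \left( 0.9 \right) \approx 2.6\times 10^{-4}
\\
&-0.25\left( 1-0.1 \right) ^2\log \left( 0.1 \right) \approx 0.466
\end{aligned}
\end{equation}

The hard positive therefore dominates the sum by more than three orders of magnitude in this example.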
|
|
||||||
|
|
||||||
where the one-to-one sample sets $\varOmega _{pos}^{o2o}$ and $\varOmega _{neg}^{o2o}$ are drawn from the positive samples of the o2m cls head:
|
|
||||||
\begin{equation}
|
|
||||||
\begin{aligned}
|
|
||||||
\varOmega _{pos}^{o2o}\cup \varOmega _{neg}^{o2o}=\left\{ i|s_i>C_{o2m} \right\}
|
|
||||||
\end{aligned}
|
|
||||||
\end{equation}
|
|
||||||
|
|
||||||
Only samples with confidence larger than $C_{o2m}$ are chosen as candidate samples for the o2o cls head. To preserve feature quality during training, the gradient of the o2o cls head is stopped from propagating back to the rest of the detection head (the RoI feature of the anchor, $\boldsymbol{F}_{i}^{roi}$). Additionally, we use a rank loss to increase the gap between the positive and negative confidences of the o2o cls head:
|
|
||||||
|
|
||||||
|
|
||||||
\begin{equation}
|
|
||||||
\begin{aligned}
|
|
||||||
&\mathcal{L} _{\,\,rank}=\frac{1}{N_{rank}}\sum_{i\in \varOmega _{pos}^{o2o}}{\sum_{j\in \varOmega _{neg}^{o2o}}{\max \left( 0, \tau _{rank}-\tilde{s}_i+\tilde{s}_j \right)}}\\
|
|
||||||
&N_{rank}=\left| \varOmega _{pos}^{o2o} \right|\left| \varOmega _{neg}^{o2o} \right|
|
|
||||||
\end{aligned}
|
|
||||||
\end{equation}
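
For a concrete feel of the rank loss, consider one positive sample with $\tilde{s}_1=0.8$, two negative samples with $\tilde{s}_2=0.2$ and $\tilde{s}_3=0.6$, and a margin $\tau _{rank}=0.5$ (all values are illustrative assumptions). Then $N_{rank}=1\times 2=2$ and

\begin{equation}
\begin{aligned}
\mathcal{L} _{\,\,rank}&=\frac{1}{2}\left[ \max \left( 0, 0.5-0.8+0.2 \right) +\max \left( 0, 0.5-0.8+0.6 \right) \right]
\\
&=\frac{1}{2}\left[ 0+0.3 \right] =0.15
\end{aligned}
\end{equation}

Only the negative whose confidence lies within the margin of the positive contributes, so the loss pushes that pair apart and leaves already well-separated pairs untouched.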
|
|
||||||
|
|
||||||
|
|
||||||
\begin{figure}[t]
|
\begin{figure}[t]
|
||||||
\centering
|
\centering
|
||||||
@ -529,38 +292,19 @@ only one sample with confidence larger than $C_{o2m}$ is chosed as the canditate
|
|||||||
\label{auxloss}
|
\label{auxloss}
|
||||||
\end{figure}
|
\end{figure}
|
||||||
|
|
||||||
We directly use the GLaneIoU loss $\mathcal{L} _{GLaneIoU}$ (with $g=1$) to regress the $x$-coordinate offsets, and the SmoothL1 loss $\mathcal{L} _{end}$ to regress the end points (namely the $y$-coordinates of the start point and the end point). To help the model learn global features, we propose an auxiliary loss:
|
|
||||||
\begin{equation}
|
|
||||||
\begin{aligned}
|
|
||||||
\mathcal{L} _{\,\,aux}=\frac{1}{\left| \varOmega _{pos}^{o2m} \right|N_{seg}}\sum_{i\in \varOmega _{pos}^{o2m}}{\sum_{m=j}^k{\left[ l\left( \theta _i-\hat{\theta}_{i}^{seg,m} \right) +l\left( r_{i}^{global}-\hat{r}_{i}^{seg,m} \right) \right]}}
|
|
||||||
\end{aligned}
|
|
||||||
\end{equation}
|
|
||||||
|
|
||||||
|
The RCNN Module consists of several MLP layers and predicts the confidence and the coordinate offset of $x_{i}$. During the training stage, all the $F\in \mathbb{R} ^{C_{f}\times H_{f}\times W_{f}}$ proposed anchors participate, and the SimOTA\ref{} label assignment strategy is used for the RCNN module to determine which anchors are positive anchors, irrespective of the confidence predicted by the LPM module. These strategies are employed because the negative/background anchors are also crucial for the adaptability of the RCNN module.
|
||||||
|
|
||||||
|
The loss function is as follows:
|
||||||
\subsection{Loss function}
|
|
||||||
|
|
||||||
The overall loss function of PolarRCNN is given as follows:
|
|
||||||
|
|
||||||
\begin{equation}
|
\begin{equation}
|
||||||
\begin{aligned}
|
\begin{aligned}
|
||||||
\mathcal{L} _{overall}&=\mathcal{L} _{lph}^{cls}+w_{lph}^{reg}\mathcal{L} _{lph}^{reg}\\&+w_{o2m}^{cls}\mathcal{L} _{o2m}^{cls}+w_{o2o}^{cls}\mathcal{L} _{o2o}^{cls}+w_{rank}\mathcal{L} _{rank}\\&+w_{IoU}\mathcal{L} _{GLaneIoU}+w_{end}\mathcal{L} _{end}+w_{aux}\mathcal{L} _{aux}
|
\mathcal{L} _{RCNN}=c_{cls}\mathcal{L} _{cls}+c_{loc}\mathcal{L} _{loc}
|
||||||
\end{aligned}
|
\end{aligned}
|
||||||
\end{equation}
|
\end{equation}
|
||||||
|
where $\mathcal{L} _{cls}$ is the focal loss, and $\mathcal{L} _{loc}$ is the LaneIoU loss\cite{}.
|
||||||
|
|
||||||
The training process is end-to-end.
|
In the testing stage, the anchors with the top-$k_{l}$ confidences are chosen as the proposal anchors, and these $k_{l}$ anchors are fed into the RCNN module to produce the final predictions.
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
\section{Experiment}
|
\section{Experiment}
|
||||||
|
|
||||||
@ -574,14 +318,6 @@ The training process is end-to-end.
|
|||||||
\end{figure}
|
\end{figure}
|
||||||
|
|
||||||
|
|
||||||
\begin{figure}[t]
|
|
||||||
\centering
|
|
||||||
\includegraphics[width=\linewidth]{thsis_figure/speed_method.png}
|
|
||||||
\caption{Anchor number and F1-score of different methods on CULane.}
|
|
||||||
\label{speed_method}
|
|
||||||
\end{figure}
|
|
||||||
|
|
||||||
|
|
||||||
\begin{figure*}[htbp]
|
\begin{figure*}[htbp]
|
||||||
\centering
|
\centering
|
||||||
\def\subwidth{0.325\textwidth}
|
\def\subwidth{0.325\textwidth}
|
||||||
@ -611,34 +347,28 @@ The training process is end-to-end.
|
|||||||
\toprule
|
\toprule
|
||||||
\multicolumn{2}{c|}{\textbf{Dataset}} & CULane & TUSimple & LLAMAS & DL-Rail & Curvelanes \\
|
\multicolumn{2}{c|}{\textbf{Dataset}} & CULane & TUSimple & LLAMAS & DL-Rail & Curvelanes \\
|
||||||
\midrule
|
\midrule
|
||||||
\multirow{7}*{Dataset Description}
|
\multirow{7}*{Data info}
|
||||||
& Train &88,880/$55,698^{*}$&3,268 &58,269&5,435&100,000\\
|
& Train &3,268 &88,880&58,269&5,435&100,000\\
|
||||||
& Validation &9,675 &358 &20,844&- &20,000 \\
|
& Validation &9,675 &358 &20,844&- &20,000 \\
|
||||||
& Test &34,680&2,782 &20,929&1,569&- \\
|
& Test &34,680&2,782 &20,929&1,569&- \\
|
||||||
& Resolution &1640\times590&1280\times720&1276\times717&1920\times1080&2560\times1440, etc\\
|
& Resolution &1640\times590&1280\times720&1276\times717&1920\times1080&2560\times1440, etc\\
|
||||||
& Lane &\leqslant4&\leqslant5&\leqslant4&=2&\leqslant10\\
|
& Lane &\leqslant4&\leqslant5&\leqslant4&=2&\leqslant10\\
|
||||||
& Environment &urban and highway & highway&highway&railway&urban and highway\\
|
& Environment &urban and highway & highway&highway&Railway&urban and highway\\
|
||||||
& Distribution &sparse&sparse&sparse&sparse&sparse and dense\\
|
& Property &sparse&sparse&sparse&sparse&sparse and dense\\
|
||||||
\midrule
|
\midrule
|
||||||
\multirow{1}*{Data Preprocess}
|
\multirow{1}*{Data Preprocess}
|
||||||
& Crop Height &270&160&300&560&640, etc\\
|
& Crop Height &270&160&300&560&640, etc\\
|
||||||
\midrule
|
\midrule
|
||||||
\multirow{6}*{Training Parameter}
|
\multirow{8}*{Hyper Parameter}
|
||||||
& Epoch Number &32&70&20&90&32\\
|
& Epoch Number &32&70&20&90&32\\
|
||||||
& Batch Size &40&24&32&40&40\\
|
& Batch Size &40&24&32&40&40\\
|
||||||
& Warm up iterations &800&200&800&400&800\\
|
& Warm up iterations &800&200&800&400&800\\
|
||||||
& Aux loss &0.2&0 &0.2&0.2&0.2\\
|
& Aux Loss &0.2&0 &0.2&0.2&0.2\\
|
||||||
& Rank loss &0.7&0.7&0.1&0.7&0 \\
|
& Rank Loss &0.7&0.7&0.1&0.7&0 \\
|
||||||
\midrule
|
& O2M conf thres &0.48&0.40&0.40&0.40&0.45\\
|
||||||
\multirow{4}*{Evaluation Parameter}
|
& O2O conf thres &0.46&0.46&0.46&0.46&0.44\\
|
||||||
& Polar map size &4\times10&4\times10&4\times10&4\times10&6\times13\\
|
& Eval split &Test&Test&Test&Test&Validation\\
|
||||||
& Top anchor selection &20&20&20&12&50\\
|
& Vis split &Test&Test&Validation&Test&Validation\\
|
||||||
& o2m conf thres &0.48&0.40&0.40&0.40&0.45\\
|
|
||||||
& o2o conf thres &0.46&0.46&0.46&0.46&0.44\\
|
|
||||||
\midrule
|
|
||||||
\multirow{2}*{Dataset Split}
|
|
||||||
& Evaluation &Test&Test&Test&Test&Val\\
|
|
||||||
& Visualization &Test&Test&Val&Test&Val\\
|
|
||||||
\bottomrule
|
\bottomrule
|
||||||
\end{tabular}
|
\end{tabular}
|
||||||
\end{adjustbox}
|
\end{adjustbox}
|
||||||
@ -691,11 +421,11 @@ The training process is end-to-end.
|
|||||||
\hline
|
\hline
|
||||||
\textbf{Proposed Method} \\
|
\textbf{Proposed Method} \\
|
||||||
\cline{1-1}
|
\cline{1-1}
|
||||||
PolarRCNN_{o2m} &ResNet18&80.81&63.96&94.12&79.57&76.53&83.33&55.06&90.62&79.50&1088&75.25\\
|
PolarRCNN_{O2M} &ResNet18&80.81&63.96&94.12&79.57&76.53&83.33&55.06&90.62&79.50&1088&75.25\\
|
||||||
PolarRCNN &ResNet18&80.81&63.96&94.12&79.57&76.53&83.33&55.06&90.62&79.50&1088&75.25\\
|
PolarRCNN &ResNet18&80.81&63.96&94.12&79.57&76.53&83.33&55.06&90.62&79.50&1088&75.25\\
|
||||||
PolarRCNN &ResNet34&80.92&63.97&94.24&79.76&76.70&81.93&55.40&\textbf{91.12}&79.85&1158&75.71\\
|
PolarRCNN &ResNet34&80.92&63.97&94.24&79.76&76.70&81.93&55.40&\textbf{91.12}&79.85&1158&75.71\\
|
||||||
PolarRCNN &ResNet50&81.34&64.77&94.45&\textbf{80.42}&75.82&83.61&56.62&91.10&80.05&1356&75.94\\
|
PolarRCNN &ResNet50&81.34&64.77&94.45&\textbf{80.42}&75.82&83.61&56.62&91.10&80.05&1356&75.94\\
|
||||||
PolarRCNN_{o2m} &DLA34 &\textbf{81.49}&64.96&\textbf{94.44}&80.36&\textbf{76.83}&83.68&56.53&90.85&\textbf{80.09}&1135&76.32\\
|
PolarRCNN_{O2M} &DLA34 &\textbf{81.49}&64.96&\textbf{94.44}&80.36&\textbf{76.83}&83.68&56.53&90.85&\textbf{80.09}&1135&76.32\\
|
||||||
PolarRCNN &DLA34 &\textbf{81.49}&\textbf{64.97}&\textbf{94.44}&80.36&\textbf{76.79}&83.68&\textbf{56.52}&90.85&\textbf{80.09}&1133&76.32\\
|
PolarRCNN &DLA34 &\textbf{81.49}&\textbf{64.97}&\textbf{94.44}&80.36&\textbf{76.79}&83.68&\textbf{56.52}&90.85&\textbf{80.09}&1133&76.32\\
|
||||||
\bottomrule
|
\bottomrule
|
||||||
\end{tabular}
|
\end{tabular}
|
||||||
@ -726,7 +456,7 @@ The training process is end-to-end.
|
|||||||
CondLaneNet&ResNet101 &96.54&97.24&2.01&3.50\\
|
CondLaneNet&ResNet101 &96.54&97.24&2.01&3.50\\
|
||||||
CLRNet &ResNet18 &96.84&97.89&2.28&1.92\\
|
CLRNet &ResNet18 &96.84&97.89&2.28&1.92\\
|
||||||
\midrule
|
\midrule
|
||||||
PolarRCNN_{o2m} &ResNet18&96.21&\textbf{97.98}&2.17&1.86\\
|
PolarRCNN_{O2M} &ResNet18&96.21&\textbf{97.98}&2.17&1.86\\
|
||||||
PolarRCNN &ResNet18&96.20&97.94&2.25&1.87\\
|
PolarRCNN &ResNet18&96.20&97.94&2.25&1.87\\
|
||||||
\bottomrule
|
\bottomrule
|
||||||
\end{tabular}
|
\end{tabular}
|
||||||
@ -750,9 +480,9 @@ The training process is end-to-end.
|
|||||||
CLRNet &DLA34 &96.12&- &- \\
|
CLRNet &DLA34 &96.12&- &- \\
|
||||||
\midrule
|
\midrule
|
||||||
|
|
||||||
PolarRCNN_{o2m} &ResNet18&96.05&96.80&95.32\\
|
PolarRCNN_{O2M} &ResNet18&96.05&96.80&95.32\\
|
||||||
PolarRCNN &ResNet18&96.06&96.81&95.32\\
|
PolarRCNN &ResNet18&96.06&96.81&95.32\\
|
||||||
PolarRCNN_{o2m} &DLA34&96.13&96.80&\textbf{95.47}\\
|
PolarRCNN_{O2M} &DLA34&96.13&96.80&\textbf{95.47}\\
|
||||||
PolarRCNN &DLA34&\textbf{96.14}&96.82&\textbf{95.47}\\
|
PolarRCNN &DLA34&\textbf{96.14}&96.82&\textbf{95.47}\\
|
||||||
|
|
||||||
\bottomrule
|
\bottomrule
|
||||||
@ -775,7 +505,7 @@ The training process is end-to-end.
|
|||||||
LaneATT(with RPN) &ResNet18&55.57&93.82&58.97\\
|
LaneATT(with RPN) &ResNet18&55.57&93.82&58.97\\
|
||||||
DALNet &ResNet18&59.79&96.43&65.48\\
|
DALNet &ResNet18&59.79&96.43&65.48\\
|
||||||
\midrule
|
\midrule
|
||||||
PolarRCNN_{o2m} &ResNet18&\textbf{61.53}&\textbf{97.01}&\textbf{67.86}\\
|
PolarRCNN_{O2M} &ResNet18&\textbf{61.53}&\textbf{97.01}&\textbf{67.86}\\
|
||||||
PolarRCNN &ResNet18&61.52&96.99&67.85\\
|
PolarRCNN &ResNet18&61.52&96.99&67.85\\
|
||||||
\bottomrule
|
\bottomrule
|
||||||
\end{tabular}
|
\end{tabular}
|
||||||
@ -847,7 +577,7 @@ The training process is end-to-end.
|
|||||||
\toprule
|
\toprule
|
||||||
\textbf{Paradigm} & \textbf{NMS thres(pixel)} & \textbf{F1(\%)} & \textbf{Precision(\%)} & \textbf{Recall(\%)} \\
|
\textbf{Paradigm} & \textbf{NMS thres(pixel)} & \textbf{F1(\%)} & \textbf{Precision(\%)} & \textbf{Recall(\%)} \\
|
||||||
\midrule
|
\midrule
|
||||||
\multirow{7}*{PolarRCNN_{o2m}}
|
\multirow{7}*{PolarRCNN_{O2M}}
|
||||||
& 50 (default) &85.38&\textbf{91.01}&80.40\\
|
& 50 (default) &85.38&\textbf{91.01}&80.40\\
|
||||||
& 40 &85.97&90.72&81.68\\
|
& 40 &85.97&90.72&81.68\\
|
||||||
& 30 &86.26&90.44&82.45\\
|
& 30 &86.26&90.44&82.45\\
|
||||||
@ -869,7 +599,7 @@ The training process is end-to-end.
|
|||||||
\begin{adjustbox}{width=\linewidth}
|
\begin{adjustbox}{width=\linewidth}
|
||||||
\begin{tabular}{cccc|ccc}
|
\begin{tabular}{cccc|ccc}
|
||||||
\toprule
|
\toprule
|
||||||
\textbf{GNN}&\textbf{cls Mat}& \textbf{Nbr Mat}&\textbf{Rank Loss}&\textbf{F1@50}&\textbf{Precision(\%)} & \textbf{Recall(\%)} \\
|
\textbf{GNN}&\textbf{Cls Mat}& \textbf{Nbr Mat}&\textbf{Rank Loss}&\textbf{F1@50}&\textbf{Precision(\%)} & \textbf{Recall(\%)} \\
|
||||||
\midrule
|
\midrule
|
||||||
& & & &16.19&69.05&9.17\\
|
& & & &16.19&69.05&9.17\\
|
||||||
\checkmark&\checkmark& & &79.42&88.46&72.06\\
|
\checkmark&\checkmark& & &79.42&88.46&72.06\\
|
||||||
@ -891,24 +621,24 @@ The training process is end-to-end.
|
|||||||
\multicolumn{2}{c|}{\textbf{Anchor strategy~/~assign}} & \textbf{F1@50(\%)} & \textbf{Precision(\%)} & \textbf{Recall(\%)} \\
|
\multicolumn{2}{c|}{\textbf{Anchor strategy~/~assign}} & \textbf{F1@50(\%)} & \textbf{Precision(\%)} & \textbf{Recall(\%)} \\
|
||||||
\midrule
|
\midrule
|
||||||
\multirow{6}*{Fixed}
|
\multirow{6}*{Fixed}
|
||||||
&o2m-B w/~ NMS &80.38&87.44&74.38\\
|
&O2M-B w/~ NMS &80.38&87.44&74.38\\
|
||||||
&o2m-B w/o NMS &44.03\textcolor{darkgreen}{~(36.35$\downarrow$)}&31.12\textcolor{darkgreen}{~(56.32$\downarrow$)}&75.23\textcolor{red}{~(0.85$\uparrow$)}\\
|
&O2M-B w/o NMS &44.03\textcolor{darkgreen}{~(36.35$\downarrow$)}&31.12\textcolor{darkgreen}{~(56.32$\downarrow$)}&75.23\textcolor{red}{~(0.85$\uparrow$)}\\
|
||||||
\cline{2-5}
|
\cline{2-5}
|
||||||
&o2o-B w/~ NMS &78.72&87.58&71.50\\
|
&O2O-B w/~ NMS &78.72&87.58&71.50\\
|
||||||
&o2o-B w/o NMS &78.23\textcolor{darkgreen}{~(0.49$\downarrow$)}&86.26\textcolor{darkgreen}{~(1.32$\downarrow$)}&71.57\textcolor{red}{~(0.07$\uparrow$)}\\
|
&O2O-B w/o NMS &78.23\textcolor{darkgreen}{~(0.49$\downarrow$)}&86.26\textcolor{darkgreen}{~(1.32$\downarrow$)}&71.57\textcolor{red}{~(0.07$\uparrow$)}\\
|
||||||
\cline{2-5}
|
\cline{2-5}
|
||||||
&o2o-G w/~ NMS &80.37&87.44&74.37\\
|
&O2O-G w/~ NMS &80.37&87.44&74.37\\
|
||||||
&o2o-G w/o NMS &80.27\textcolor{darkgreen}{~(0.10$\downarrow$)}&87.14\textcolor{darkgreen}{~(0.30$\downarrow$)}&74.40\textcolor{red}{~(0.03$\uparrow$)}\\
|
&O2O-G w/o NMS &80.27\textcolor{darkgreen}{~(0.10$\downarrow$)}&87.14\textcolor{darkgreen}{~(0.30$\downarrow$)}&74.40\textcolor{red}{~(0.03$\uparrow$)}\\
|
||||||
\midrule
|
\midrule
|
||||||
\multirow{6}*{Proposal}
|
\multirow{6}*{Proposal}
|
||||||
&o2m-B w/~ NMS &80.81&88.53&74.33\\
|
&O2M-B w/~ NMS &80.81&88.53&74.33\\
|
||||||
&o2m-B w/o NMS &36.46\textcolor{darkgreen}{~(44.35$\downarrow$)}&24.09\textcolor{darkgreen}{~(64.44$\downarrow$)}&74.93\textcolor{red}{~(0.6$\uparrow$)}\\
|
&O2M-B w/o NMS &36.46\textcolor{darkgreen}{~(44.35$\downarrow$)}&24.09\textcolor{darkgreen}{~(64.44$\downarrow$)}&74.93\textcolor{red}{~(0.6$\uparrow$)}\\
|
||||||
\cline{2-5}
|
\cline{2-5}
|
||||||
&o2o-B w/~ NMS &77.27&92.64&66.28\\
|
&O2O-B w/~ NMS &77.27&92.64&66.28\\
|
||||||
&o2o-B w/o NMS &47.11\textcolor{darkgreen}{~(30.16$\downarrow$)}&36.48\textcolor{darkgreen}{~(56.16$\downarrow$)}&66.48\textcolor{red}{~(0.20$\uparrow$)}\\
|
&O2O-B w/o NMS &47.11\textcolor{darkgreen}{~(30.16$\downarrow$)}&36.48\textcolor{darkgreen}{~(56.16$\downarrow$)}&66.48\textcolor{red}{~(0.20$\uparrow$)}\\
|
||||||
\cline{2-5}
|
\cline{2-5}
|
||||||
&o2o-G w/~ NMS &80.81&88.53&74.32\\
|
&O2O-G w/~ NMS &80.81&88.53&74.32\\
|
||||||
&o2o-G w/o NMS &80.81\textcolor{red}{~(0.00$\uparrow$)}&88.52\textcolor{darkgreen}{~(0.01$\downarrow$)}&74.33\textcolor{red}{~(0.01$\uparrow$)}\\
|
&O2O-G w/o NMS &80.81\textcolor{red}{~(0.00$\uparrow$)}&88.52\textcolor{darkgreen}{~(0.01$\downarrow$)}&74.33\textcolor{red}{~(0.01$\uparrow$)}\\
|
||||||
\bottomrule
|
\bottomrule
|
||||||
\end{tabular}
|
\end{tabular}
|
||||||
\end{adjustbox}
|
\end{adjustbox}
|
||||||
@ -925,12 +655,12 @@ The training process is end-to-end.
|
|||||||
\multicolumn{2}{c|}{\textbf{Paradigm}} & \textbf{F1(\%)} & \textbf{Precision(\%)} & \textbf{Recall(\%)} \\
|
\multicolumn{2}{c|}{\textbf{Paradigm}} & \textbf{F1(\%)} & \textbf{Precision(\%)} & \textbf{Recall(\%)} \\
|
||||||
\midrule
|
\midrule
|
||||||
\multirow{2}*{Baseline}
|
\multirow{2}*{Baseline}
|
||||||
&o2m-B w/~ NMS &78.83&88.99&70.75\\
|
&O2M-B w/~ NMS &78.83&88.99&70.75\\
|
||||||
&o2o-G w/o NMS &71.68\textcolor{darkgreen}{~(7.15$\downarrow$)}&72.56\textcolor{darkgreen}{~(16.43$\downarrow$)}&70.81\textcolor{red}{~(0.06$\uparrow$)}\\
|
&O2O-G w/o NMS &71.68\textcolor{darkgreen}{~(7.15$\downarrow$)}&72.56\textcolor{darkgreen}{~(16.43$\downarrow$)}&70.81\textcolor{red}{~(0.06$\uparrow$)}\\
|
||||||
\midrule
|
\midrule
|
||||||
\multirow{2}*{Stop grad}
|
\multirow{2}*{Stop grad}
|
||||||
&o2m-B w/~ NMS &80.81&88.53&74.33\\
|
&O2M-B w/~ NMS &80.81&88.53&74.33\\
|
||||||
&o2o-G w/o NMS &80.81\textcolor{red}{~(0.00$\uparrow$)}&88.52\textcolor{darkgreen}{~(0.01$\downarrow$)}&74.33\textcolor{red}{~(0.00$\uparrow$)} \\
|
&O2O-G w/o NMS &80.81\textcolor{red}{~(0.00$\uparrow$)}&88.52\textcolor{darkgreen}{~(0.01$\downarrow$)}&74.33\textcolor{red}{~(0.00$\uparrow$)} \\
|
||||||
\bottomrule
|
\bottomrule
|
||||||
\end{tabular}
|
\end{tabular}
|
||||||
\end{adjustbox}
|
\end{adjustbox}
|
||||||
@ -1040,17 +770,18 @@ The training process is end-to-end.
|
|||||||
\end{subfigure}
|
\end{subfigure}
|
||||||
\vspace{0.5em}
|
\vspace{0.5em}
|
||||||
|
|
||||||
|
|
||||||
\begin{subfigure}{\subwidth}
|
\begin{subfigure}{\subwidth}
|
||||||
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_nms/all2_gt.jpg}
|
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_nms/all2_gt.jpg}
|
||||||
\caption{GT}
|
\caption{GT}
|
||||||
\end{subfigure}
|
\end{subfigure}
|
||||||
\begin{subfigure}{\subwidth}
|
\begin{subfigure}{\subwidth}
|
||||||
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_nms/all2_pred50.jpg}
|
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_nms/all2_pred50.jpg}
|
||||||
\caption{NMS@50}
|
\caption{NMS50}
|
||||||
\end{subfigure}
|
\end{subfigure}
|
||||||
\begin{subfigure}{\subwidth}
|
\begin{subfigure}{\subwidth}
|
||||||
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_nms/all2_pred15.jpg}
|
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_nms/all2_pred15.jpg}
|
||||||
\caption{NMS@15}
|
\caption{NMS15}
|
||||||
\end{subfigure}
|
\end{subfigure}
|
||||||
\begin{subfigure}{\subwidth}
|
\begin{subfigure}{\subwidth}
|
||||||
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_nms/all2_nmsfree.jpg}
|
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_nms/all2_nmsfree.jpg}
|
||||||
@ -1058,142 +789,108 @@ The training process is end-to-end.
|
|||||||
\end{subfigure}
|
\end{subfigure}
|
||||||
\vspace{0.5em}
|
\vspace{0.5em}
|
||||||
|
|
||||||
\caption{Visualization of the detection results in sparse scenarios.}
|
\caption{hhh}
|
||||||
\end{figure*}
|
\end{figure*}
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
\begin{figure*}[htbp]
|
\begin{figure*}[htbp]
|
||||||
\centering
|
\centering
|
||||||
\def\pagewidth{0.49\textwidth}
|
\def\subwidth{0.24\textwidth}
|
||||||
\def\subwidth{0.47\linewidth}
|
|
||||||
\def\imgwidth{\linewidth}
|
\def\imgwidth{\linewidth}
|
||||||
\def\imgheight{0.5625\linewidth}
|
\def\imgheight{0.5625\linewidth}
|
||||||
\def\dashheight{0.8\linewidth}
|
\def\dashheight{0.8\linewidth}
|
||||||
|
|
||||||
\begin{subfigure}{\pagewidth}
|
\begin{subfigure}{\subwidth}
|
||||||
\rotatebox{90}{\small{GT}}
|
|
||||||
\begin{minipage}{\subwidth}
|
|
||||||
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_dataset/culane/1_gt.jpg}
|
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_dataset/culane/1_gt.jpg}
|
||||||
\end{minipage}
|
\end{subfigure}
|
||||||
\begin{minipage}{\subwidth}
|
\begin{subfigure}{\subwidth}
|
||||||
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_dataset/culane/2_gt.jpg}
|
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_dataset/culane/2_gt.jpg}
|
||||||
\end{minipage}
|
|
||||||
\end{subfigure}
|
\end{subfigure}
|
||||||
\begin{subfigure}{\pagewidth}
|
\begin{subfigure}{\subwidth}
|
||||||
\begin{minipage}{\subwidth}
|
|
||||||
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_dataset/tusimple/1_gt.jpg}
|
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_dataset/tusimple/1_gt.jpg}
|
||||||
\end{minipage}
|
\end{subfigure}
|
||||||
\begin{minipage}{\subwidth}
|
\begin{subfigure}{\subwidth}
|
||||||
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_dataset/tusimple/2_gt.jpg}
|
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_dataset/tusimple/2_gt.jpg}
|
||||||
\end{minipage}
|
|
||||||
\end{subfigure}
|
\end{subfigure}
|
||||||
\vspace{0.5em}
|
\vspace{0.5em}
|
||||||
|
|
||||||
|
\begin{subfigure}{\subwidth}
|
||||||
\begin{subfigure}{\pagewidth}
|
|
||||||
\raisebox{-1.5em}{\rotatebox{90}{\small{Anchors}}}
|
|
||||||
\begin{minipage}{\subwidth}
|
|
||||||
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_dataset/culane/1_anchor.jpg}
|
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_dataset/culane/1_anchor.jpg}
|
||||||
\end{minipage}
|
\end{subfigure}
|
||||||
\begin{minipage}{\subwidth}
|
\begin{subfigure}{\subwidth}
|
||||||
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_dataset/culane/2_anchor.jpg}
|
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_dataset/culane/2_anchor.jpg}
|
||||||
\end{minipage}
|
|
||||||
\end{subfigure}
|
\end{subfigure}
|
||||||
\begin{subfigure}{\pagewidth}
|
\begin{subfigure}{\subwidth}
|
||||||
\begin{minipage}{\subwidth}
|
|
||||||
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_dataset/tusimple/1_anchor.jpg}
|
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_dataset/tusimple/1_anchor.jpg}
|
||||||
\end{minipage}
|
\end{subfigure}
|
||||||
\begin{minipage}{\subwidth}
|
\begin{subfigure}{\subwidth}
|
||||||
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_dataset/tusimple/2_anchor.jpg}
|
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_dataset/tusimple/2_anchor.jpg}
|
||||||
\end{minipage}
|
|
||||||
\end{subfigure}
|
\end{subfigure}
|
||||||
\vspace{0.5em}
|
\vspace{0.5em}
|
||||||
|
|
||||||
\begin{subfigure}{\pagewidth}
|
\begin{subfigure}{\subwidth}
|
||||||
\raisebox{-2em}{\rotatebox{90}{\small{Predictions}}}
|
|
||||||
\begin{minipage}{\subwidth}
|
|
||||||
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_dataset/culane/1_pred.jpg}
|
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_dataset/culane/1_pred.jpg}
|
||||||
\end{minipage}
|
\end{subfigure}
|
||||||
\begin{minipage}{\subwidth}
|
\begin{subfigure}{\subwidth}
|
||||||
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_dataset/culane/2_pred.jpg}
|
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_dataset/culane/2_pred.jpg}
|
||||||
\end{minipage}
|
|
||||||
\caption{CULane}
|
|
||||||
\end{subfigure}
|
\end{subfigure}
|
||||||
\begin{subfigure}{\pagewidth}
|
\begin{subfigure}{\subwidth}
|
||||||
\begin{minipage}{\subwidth}
|
|
||||||
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_dataset/tusimple/1_pred.jpg}
|
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_dataset/tusimple/1_pred.jpg}
|
||||||
\end{minipage}
|
\end{subfigure}
|
||||||
\begin{minipage}{\subwidth}
|
\begin{subfigure}{\subwidth}
|
||||||
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_dataset/tusimple/2_pred.jpg}
|
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_dataset/tusimple/2_pred.jpg}
|
||||||
\end{minipage}
|
|
||||||
\caption{TuSimple}
|
|
||||||
\end{subfigure}
|
\end{subfigure}
|
||||||
\vspace{0.5em}
|
\vspace{0.5em}
|
||||||
|
|
||||||
% \begin{tikzpicture}
|
\begin{tikzpicture}
|
||||||
% \draw[dashed, pattern=on 8pt off 2pt, color=gray, line width=1pt] (-\textwidth/2,0) -- (\textwidth/2.,0);
|
\draw[dashed, pattern=on 8pt off 2pt, color=gray, line width=1pt] (-\textwidth/2,0) -- (\textwidth/2,0);
|
||||||
% \end{tikzpicture}
|
\end{tikzpicture}
|
||||||
% \vspace{0.05em}
|
\vspace{0.05em}
|
||||||
|
|
||||||
\begin{subfigure}{\pagewidth}
|
\begin{subfigure}{\subwidth}
|
||||||
\rotatebox{90}{\small{GT}}
|
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_dataset/llamas/1_gt.jpg}
|
||||||
\begin{minipage}{\subwidth}
|
|
||||||
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_dataset/llamas/1_gt.jpg}
|
|
||||||
\end{minipage}
|
|
||||||
\begin{minipage}{\subwidth}
|
|
||||||
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_dataset/llamas/2_gt.jpg}
|
|
||||||
\end{minipage}
|
|
||||||
\end{subfigure}
|
\end{subfigure}
|
||||||
\begin{subfigure}{\pagewidth}
|
\begin{subfigure}{\subwidth}
|
||||||
\begin{minipage}{\subwidth}
|
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_dataset/llamas/2_gt.jpg}
|
||||||
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_dataset/dlrail/1_gt.jpg}
|
\end{subfigure}
|
||||||
\end{minipage}
|
\begin{subfigure}{\subwidth}
|
||||||
\begin{minipage}{\subwidth}
|
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_dataset/dlrail/1_gt.jpg}
|
||||||
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_dataset/dlrail/2_gt.jpg}
|
\end{subfigure}
|
||||||
\end{minipage}
|
\begin{subfigure}{\subwidth}
|
||||||
|
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_dataset/dlrail/2_gt.jpg}
|
||||||
\end{subfigure}
|
\end{subfigure}
|
||||||
\vspace{0.5em}
|
\vspace{0.5em}
|
||||||
|
|
||||||
\begin{subfigure}{\pagewidth}
|
\begin{subfigure}{\subwidth}
|
||||||
\raisebox{-1.5em}{\rotatebox{90}{\small{Anchors}}}
|
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_dataset/llamas/1_anchor.jpg}
|
||||||
\begin{minipage}{\subwidth}
|
|
||||||
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_dataset/llamas/1_anchor.jpg}
|
|
||||||
\end{minipage}
|
|
||||||
\begin{minipage}{\subwidth}
|
|
||||||
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_dataset/llamas/2_anchor.jpg}
|
|
||||||
\end{minipage}
|
|
||||||
\end{subfigure}
|
\end{subfigure}
|
||||||
\begin{subfigure}{\pagewidth}
|
\begin{subfigure}{\subwidth}
|
||||||
\begin{minipage}{\subwidth}
|
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_dataset/llamas/2_anchor.jpg}
|
||||||
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_dataset/dlrail/1_anchor.jpg}
|
\end{subfigure}
|
||||||
\end{minipage}
|
\begin{subfigure}{\subwidth}
|
||||||
\begin{minipage}{\subwidth}
|
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_dataset/dlrail/1_anchor.jpg}
|
||||||
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_dataset/dlrail/2_anchor.jpg}
|
\end{subfigure}
|
||||||
\end{minipage}
|
\begin{subfigure}{\subwidth}
|
||||||
|
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_dataset/dlrail/2_anchor.jpg}
|
||||||
\end{subfigure}
|
\end{subfigure}
|
||||||
\vspace{0.5em}
|
\vspace{0.5em}
|
||||||
|
|
||||||
\begin{subfigure}{\pagewidth}
|
\begin{subfigure}{\subwidth}
|
||||||
\raisebox{-2em}{\rotatebox{90}{\small{Predictions}}}
|
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_dataset/llamas/1_pred.jpg}
|
||||||
\begin{minipage}{\subwidth}
|
|
||||||
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_dataset/llamas/1_pred.jpg}
|
|
||||||
\end{minipage}
|
|
||||||
\begin{minipage}{\subwidth}
|
|
||||||
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_dataset/llamas/2_pred.jpg}
|
|
||||||
\end{minipage}
|
|
||||||
\caption{LLAMAS}
|
|
||||||
\end{subfigure}
|
\end{subfigure}
|
||||||
\begin{subfigure}{\pagewidth}
|
\begin{subfigure}{\subwidth}
|
||||||
\begin{minipage}{\subwidth}
|
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_dataset/llamas/2_pred.jpg}
|
||||||
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_dataset/dlrail/1_pred.jpg}
|
\end{subfigure}
|
||||||
\end{minipage}
|
\begin{subfigure}{\subwidth}
|
||||||
\begin{minipage}{\subwidth}
|
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_dataset/dlrail/1_pred.jpg}
|
||||||
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_dataset/dlrail/2_pred.jpg}
|
\end{subfigure}
|
||||||
\end{minipage}
|
\begin{subfigure}{\subwidth}
|
||||||
\caption{DL-Rail}
|
\includegraphics[width=\imgwidth, height=\imgheight]{thsis_figure/view_dataset/dlrail/2_pred.jpg}
|
||||||
\end{subfigure}
|
\end{subfigure}
|
||||||
\vspace{0.5em}
|
\vspace{0.5em}
|
||||||
|
|
||||||
\caption{Visualization of the detection results in sparse scenarios.}
|
\caption{hhh}
|
||||||
\end{figure*}
|
\end{figure*}
|
||||||
|
|
||||||
\section{Conclusion}
|
\section{Conclusion}
|
||||||
|
721
main2.tex
@ -1,701 +1,38 @@
|
|||||||
\documentclass[lettersize,journal]{IEEEtran}
|
\documentclass{article}
|
||||||
\usepackage{amsmath,amsfonts}
|
|
||||||
\usepackage{algorithmic}
|
|
||||||
\usepackage{algorithm}
|
|
||||||
\usepackage{array}
|
|
||||||
% \usepackage[caption=false,font=normalsize,labelfont=sf,textfont=sf]{subfig}
|
|
||||||
\usepackage{textcomp}
|
|
||||||
\usepackage{stfloats}
|
|
||||||
\usepackage{url}
|
|
||||||
\usepackage{verbatim}
|
|
||||||
\usepackage{graphicx}
|
\usepackage{graphicx}
|
||||||
\usepackage{cite}
|
|
||||||
\usepackage{subcaption}
|
\usepackage{subcaption}
|
||||||
\usepackage{multirow}
|
|
||||||
\usepackage[T1]{fontenc}
|
|
||||||
\usepackage{adjustbox}
|
|
||||||
\usepackage{amssymb}
|
|
||||||
\usepackage{booktabs}
|
|
||||||
\usepackage[table,xcdraw]{xcolor}
|
|
||||||
\definecolor{darkgreen}{RGB}{17,159,27} % or define dark green with other RGB values
|
|
||||||
\aboverulesep=0pt
|
|
||||||
\belowrulesep=0pt
|
|
||||||
\hyphenation{op-tical net-works semi-conduc-tor IEEE-Xplore}
|
|
||||||
% updated with editorial comments 8/9/2021
|
|
||||||
|
|
||||||
\begin{document}
|
\begin{document}
|
||||||
|
|
||||||
\title{PolarRCNN:\@ End-to-End Lane Detection with Fewer Anchors}
|
\begin{figure}
|
||||||
|
\centering
|
||||||
\author{IEEE Publication Technology,~\IEEEmembership{Staff,~IEEE,}
|
\begin{subfigure}{0.45\textwidth}
|
||||||
% <-this % stops a space
|
|
||||||
\thanks{This paper was produced by the IEEE Publication Technology Group. They are in Piscataway, NJ.}% <-this % stops a space
|
|
||||||
\thanks{Manuscript received April 19, 2021; revised August 16, 2021.}}
|
|
||||||
|
|
||||||
% The paper headers
|
|
||||||
\markboth{Journal of \LaTeX\ Class Files,~Vol.~14, No.~8, August~2021}%
|
|
||||||
{Shell \MakeLowercase{\textit{et al.}}: A Sample Article Using IEEEtran.cls for IEEE Journals}
|
|
||||||
|
|
||||||
% \IEEEpubid{0000--0000/00\$00.00~\copyright~2021 IEEE}
|
|
||||||
% Remember, if you use this you must call \IEEEpubidadjcol in the second
|
|
||||||
% column for its text to clear the IEEEpubid mark.
|
|
||||||
|
|
||||||
\maketitle
|
|
||||||
|
|
||||||
\begin{abstract}
|
|
||||||
Lane detection is a critical and challenging task in autonomous driving, particularly in real-world scenarios where traffic lanes are often slender, lengthy, and partially obscured by other vehicles, complicating detection efforts. Existing anchor-based methods typically rely on prior straight line anchors to extract features and refine lane location and shape. Though achieving high performance, manually setting prior anchors is cumbersome, and ensuring sufficient anchor coverage across diverse datasets requires a large number of dense anchors. Furthermore, NMS postprocessing must be applied to suppress redundant predictions. In this study, we introduce PolarRCNN, a two-stage NMS-free anchor-based method for lane detection. By introducing a local polar head, anchors are proposed dynamically, and the number of anchors is greatly reduced without sacrificing performance. In addition, a GNN-based NMS-free head is proposed so that the model runs end-to-end, which makes it deployment-friendly. Our model yields competitive results on five popular lane detection benchmarks (Tusimple, CULane, LLAMAS, Curvelanes and DL-Rail) while maintaining a lightweight size and a simple structure.
|
|
||||||
\end{abstract}
|
|
||||||
\begin{IEEEkeywords}
|
|
||||||
Lane detection.
|
|
||||||
\end{IEEEkeywords}
|
|
||||||
|
|
||||||
\section{Introduction}
|
|
||||||
\IEEEPARstart{L}{ane} detection is a significant problem in computer vision and autonomous driving, forming the basis for accurately perceiving the driving environment in intelligent driving systems. While extensive research has been conducted in ideal environments, it remains a challenging task in adverse scenarios such as night driving, glare, crowded scenes, and rainy conditions, where lanes may be occluded or damaged. Moreover, the slender shapes and complex topologies of lanes, together with their global nature, add to the complexity of detection. An effective lane detection method should take into account both global high-level semantic features and local low-level features to address these varied conditions and ensure robust performance in real-time applications such as autonomous driving.
|
|
||||||
|
|
||||||
Traditional methods predominantly concentrate on handcrafted local feature extraction and lane shape modeling. Techniques such as the Canny edge detector\cite{canny1986computational}, Hough transform\cite{houghtransform}, and deformable templates for lane fitting\cite{kluge1995deformable} have been extensively utilized. Nevertheless, these approaches often encounter limitations in practical settings, particularly when low-level and local features lack clarity or distinctiveness.
|
|
||||||
|
|
||||||
In recent years, fueled by advancements in deep learning and the availability of large datasets, significant strides have been made in lane detection. Deep models, including convolutional neural networks (CNNs) and transformer-based architectures, have propelled progress in this domain. Previous approaches often treated lane detection as a segmentation task, which is simple but computationally expensive. Some methods relied on parameter-based models, directly outputting lane curve parameters instead of pixel locations. These models offer end-to-end solutions, but the sensitivity of the lane shape to the curve parameters compromises robustness.
|
|
||||||
|
|
||||||
\begin{figure}[t]
|
|
||||||
\centering
|
\centering
|
||||||
\begin{subfigure}{0.49\linewidth}
|
\includegraphics[width=0.8\linewidth]{lanefig/anchor_demo/anchor_fix_init.jpg}
|
||||||
\centering
|
\caption{}
|
||||||
\includegraphics[width=0.9\linewidth]{thsis_figure/anchor_demo/anchor_fix_init.jpg}
|
\label{fig:image1}
|
||||||
\caption{}
|
\end{subfigure}
|
||||||
\end{subfigure}
|
\begin{subfigure}{0.45\textwidth}
|
||||||
\begin{subfigure}{0.49\linewidth}
|
\centering
|
||||||
\centering
|
\includegraphics[width=0.8\linewidth]{lanefig/anchor_demo/anchor_fix_init.jpg}
|
||||||
\includegraphics[width=0.9\linewidth]{thsis_figure/anchor_demo/anchor_fix_learned.jpg}
|
\caption{}
|
||||||
\caption{}
|
\label{fig:image2}
|
||||||
\end{subfigure}
|
\end{subfigure}
|
||||||
%\qquad
|
\\ % line break
|
||||||
% break the figures onto a new line
|
\begin{subfigure}{0.45\textwidth}
|
||||||
|
\centering
|
||||||
\begin{subfigure}{0.49\linewidth}
|
\includegraphics[width=0.8\linewidth]{lanefig/anchor_demo/anchor_fix_init.jpg}
|
||||||
\centering
|
\caption{}
|
||||||
\includegraphics[width=0.9\linewidth]{thsis_figure/anchor_demo/anchor_proposal.jpg}
|
\label{fig:image3}
|
||||||
\caption{}
|
\end{subfigure}
|
||||||
\end{subfigure}
|
\begin{subfigure}{0.45\textwidth}
|
||||||
\begin{subfigure}{0.49\linewidth}
|
\centering
|
||||||
\centering
|
\includegraphics[width=0.8\linewidth]{lanefig/anchor_demo/anchor_fix_init.jpg}
|
||||||
\includegraphics[width=0.9\linewidth]{thsis_figure/anchor_demo/gt.jpg}
|
\caption{}
|
||||||
\caption{}
|
\label{fig:image4}
|
||||||
\end{subfigure}
|
\end{subfigure}
|
||||||
\caption{Comparison of anchor settings with other methods. (a) The initial anchor settings of CLRNet. (b) The learned anchor settings of CLRNet trained on CULane. (c) The proposed anchors of our method. (d) The ground truth.}
|
\caption{A combination of four images}
|
||||||
\label{anchor setting}
|
\label{fig:subfigures}
|
||||||
\end{figure}
|
\end{figure}
|
||||||
|
|
||||||
Drawing inspiration from object detection methods such as YOLO and Fast RCNN, several anchor-based approaches have been introduced for lane detection, with representative works including LaneATT and CLRNet. These methods have demonstrated superior performance by leveraging anchor priors and enabling larger receptive fields for feature extraction. However, anchor-based lane detection methods share the following drawbacks with anchor-based general object detection methods:
|
|
||||||
|
|
||||||
\begin{itemize} \item[(1)] A large number of dense anchors must be configured to ensure the recall of the detection results, since lane distributions (i.e., direction and location) are complex in real scenarios, as shown in Fig. \ref{anchor setting}(a).
|
|
||||||
\end{itemize}
|
|
||||||
|
|
||||||
\begin{itemize} \item[(2)] Because of the large number of anchors, redundant predictions must be removed by postprocessing such as NMS \cite{} and FastNMS \cite{}, which complicates deployment and requires the NMS threshold to be set manually.
|
|
||||||
\end{itemize}
|
|
||||||
|
|
||||||
To address the first problem, CLRNet uses learned anchors whose locations are optimized during training to adapt to the lane distributions in real scenarios (see Fig. \ref{anchor setting}(b)), and applies cascaded cross-layer anchor refinement to bring the anchors closer to the ground truth. However, CLRNet still needs numerous anchors to cover the potential distributions of lanes. ADNet \cite{} instead uses a start point generation unit to propose flexible anchors for each image rather than using the same set of anchors for all images. However, the start points of lanes are subjective and lack clear visual evidence because of the global nature of lanes, so the performance of ADNet is not ideal. SRLane uses a local angle map to propose sketch anchors according to the direction of the ground truth. This method considers only the direction and ignores the accurate location of anchors, leading to worse performance without cascaded anchor refinement. Moreover, none of the methods mentioned above avoids the redundant predictions described in the second problem.
|
|
||||||
|
|
||||||
To address the issues mentioned above more effectively than previous work, we analyze their causes and propose a new lane detection method called PolarRCNN, a two-stage NMS-free anchor-based model. PolarRCNN uses local and global polar coordinates to describe the anchors, and the number of proposed anchors is much smaller than in previous work, as shown in Fig. \ref{anchor setting}(c). Moreover, a heuristic graph neural network block is proposed to make the model NMS-free. The model architecture is simple and avoids the complex mechanisms used in previous work (e.g., attention and cascaded refinement), which makes deployment easier and inference faster. In addition, the simple architecture helps us inspect the key factors behind the performance of anchor-based lane detection methods.
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
We conducted experiments on five mainstream benchmarks: TuSimple \cite{}, CULane \cite{}, LLAMAS \cite{}, Curvelanes \cite{} and DL-Rail \cite{}. Our proposed method achieves performance competitive with state-of-the-art methods.
|
|
||||||
|
|
||||||
Our main contributions are summarized as follows:
|
|
||||||
|
|
||||||
\begin{itemize}
|
|
||||||
\item We simplify the anchor parameters with local and global polar coordinate systems and apply them to a two-stage lane detection framework. Compared with other sparse two-stage methods, the number of proposed anchors is greatly reduced with better performance.
|
|
||||||
\item We propose a novel heuristic graph neural network (GNN) head to implement an NMS-free paradigm. The architecture of the GNN is designed according to Fast NMS, which makes it interpretable. The whole training and testing process of our model is end-to-end.
|
|
||||||
\item Our method uses a simple model architecture and achieves performance competitive with other state-of-the-art methods on five datasets. The high performance with fewer anchors and an NMS-free paradigm demonstrates the effectiveness of our method.
|
|
||||||
\end{itemize}
|
|
||||||
|
|
||||||
\section{Related Works}
|
|
||||||
Lane detection aims to detect lane instances in an image. In this section, we only introduce deep-learning-based methods for lane detection. These methods can be categorized into segmentation-based, parameter-based, and anchor-based methods.
|
|
||||||
|
|
||||||
\textbf{Segmentation-based Methods.} Segmentation-based methods focus on pixel-wise prediction. They assign each pixel to a category according to the lane instances and the background\cite{} and predict this information pixel by pixel. However, these methods focus too heavily on low-level and local features, neglecting global semantic information and real-time detection. SCNN uses a larger receptive field to overcome this problem. Some methods such as UFLDv1 and v2\cite{}\cite{} and CondLaneNet\cite{} utilize row-wise or column-wise classification instead of pixel classification to improve detection speed. Another issue with these methods is that the lane instance prior is learned by the model itself, leading to a lack of prior knowledge. LaneNet uses post-clustering to distinguish each lane instance. UFLD divides lane instances by angles and locations and can only detect a fixed number of lanes. CondLaneNet utilizes different conditional dynamic kernels to predict different lane instances. Some methods such as FOLOLane\cite{} and GANet\cite{} use bottom-up strategies to detect a few key points and model their global relations to form lane instances.
|
|
||||||
|
|
||||||
\textbf{Parameter-based Methods.} Instead of predicting a series of point locations or pixel classes, parameter-based methods directly generate the curve parameters of lane instances. PolyLaneNet\cite{} and LSTR\cite{} consider the lane instance as a polynomial curve and output the polynomial coefficients directly. BézierLaneNet\cite{} treats the lane instance as a Bézier curve and generates the locations of the control points of the curve. BSLane uses B-splines to describe the lane, and the curve parameters focus on the local shapes of lanes. Parameter-based methods are mostly end-to-end without postprocessing, which grants them faster speed. However, since the final lane shapes are sensitive to the curve parameters, the robustness and generalization of parameter-based methods may be less than ideal.
|
|
||||||
|
|
||||||
|
|
||||||
\textbf{Anchor-Based Methods.} Inspired by general object detection methods such as YOLO \cite{} and DETR \cite{}, anchor-based methods have been proposed for lane detection. Line-CNN is, to our knowledge, the earliest work that utilizes line anchors to detect lanes. The lines are designed as rays emitted from the three edges (left, bottom, and right) of an image. However, the receptive field of the model only focuses on the edges, and the model is slower than some other methods. LaneATT \cite{} employs anchor-based feature pooling to aggregate features along the whole line anchor, achieving faster speed with better performance. Nevertheless, its grid sampling strategy and label assignment limit its potential. CLRNet \cite{} utilizes cross-layer refinement strategies, SimOTA label assignment \cite{}, and the LIoU loss to push anchor-based performance beyond most methods. The main advantage of anchor-based methods is that many strategies from anchor-based general object detection can be easily applied to lane detection, such as label assignment, bounding box refinement, and the GIoU loss. However, the disadvantages of existing anchor-based lane detection methods are also evident: the line anchors need to be handcrafted, the number of anchors is large, and NMS postprocessing is needed, resulting in high computational cost.
|
|
||||||
Some works such as ADNet\cite{}, SRLane\cite{} and Sparse Laneformer\cite{} attempt to reduce the number of anchors by generating anchor proposals.
|
|
||||||
|
|
||||||
\textbf{NMS-Free Object Detection.} NMS is an important postprocessing step in most general object detection methods. DETR \cite{} uses one-to-one label assignment to avoid redundant predictions without NMS. Other NMS-free methods \cite{} have since been proposed. These methods analyze the issue from two aspects: the model architecture and the label assignment. \cite{}\cite{} hold the view that one-to-one assignment is the key to NMS-free prediction. Other works also consider the expressive ability of the model to provide non-redundant predictions. However, few anchor-based lane detection methods analyze the NMS-free paradigm as general object detection does, and they still rely on NMS postprocessing. In our work, we find that both the label assignment and the expressive ability of the NMS-free module (e.g., its architecture and inputs) play an important role in NMS-free lane detection for anchor-based models.
|
|
||||||
|
|
||||||
This paper aims to address the two issues mentioned above (reducing the number of anchors and removing NMS) for anchor-based lane detection methods.
|
|
||||||
|
|
||||||
|
|
||||||
\section{Method}
|
|
||||||
The overall architecture of PolarRCNN is illustrated in Fig. \ref{overall_architecture}. Our model consists of a backbone with FPN, a local polar head, and a global polar head. Only simple network layers such as convolution, MLP, and pooling operations are used in each block (rather than attention, dynamic kernels, etc.).
|
|
||||||
|
|
||||||
\begin{figure*}[ht]
|
|
||||||
\centering
|
|
||||||
\includegraphics[width=0.9\textwidth]{thsis_figure/ovarall_architecture.png} % replace with your image file name
|
|
||||||
\caption{The overall pipeline of PolarRCNN. The architecture is simple and lightweight. The backbone (e.g., ResNet18) and FPN extract features from the image, and the local polar head proposes sparse line anchors. After features are pooled along the line anchors, the global polar head gives the final predictions. Triplet subheads are set in the global polar head: a one-to-one classification head (o2o cls head), a one-to-many classification head (o2m cls head), and a one-to-many regression head (o2m reg head). The o2o cls head aims to replace NMS postprocessing by selecting only one positive prediction for each ground truth from the redundant predictions of the o2m heads.}
|
|
||||||
\label{overall_architecture}
|
|
||||||
\end{figure*}
|
|
||||||
|
|
||||||
\subsection{Lane and Line Anchor Representation}
|
|
||||||
|
|
||||||
Lanes are thin and long curves, so a suitable lane prior helps the model extract features, predict locations, and model the shapes of lane curves more accurately. In line with previous works\cite{}\cite{}, the lane priors (also called lane anchors) in our work are straight lines, and we sample a sequence of 2D points on each line anchor, i.e., $ P\doteq \left\{ \left( x_1, y_1 \right) , \left( x_2, y_2 \right) , \ldots ,\left( x_N, y_N \right) \right\} $, where $N$ is the number of sampled points. The y coordinates of the points are sampled uniformly along the vertical direction of the image, i.e., $y_i=\frac{H}{N-1}\cdot i$, where $H$ is the image height. Points with the same y coordinates are also sampled from the ground truth lane, and the model regresses the x coordinate offsets from the line anchor to the ground truth lane instance. The only difference between PolarRCNN and previous work is the description of the straight line anchors, which is introduced below.
|
|
||||||
|
|
||||||
\textbf{Polar Coordinate System.} Since the lane anchors are straight by default, each can be described by the parameters of a straight line. Previous work uses a ray to describe a 2D line anchor, and the parameters of a ray contain the start point's coordinates and the orientation/angle, i.e., $\left\{\theta, P_{xy}\right\}$, as shown in Figure \ref{coord} (a). \cite{}\cite{} define the start points as lying on three image boundaries, while \cite{} points out that this is not reasonable because the real start point of a lane could be at any location within an image. In our analysis, using a ray may cause ambiguity in describing a line because a line has infinitely many candidate start points and the start point of the lane is subjective. As illustrated in Figure \ref{coord} (a), the yellow and dark green start points with the same orientation $\theta$ describe the same line, and either of them could be chosen in different datasets. This ambiguity arises because a straight line has two degrees of freedom while a ray has three degrees of freedom. To address this issue, as shown in Figure \ref{coord} (b), we use polar coordinate systems to describe a lane anchor with two parameters for radius and angle $\left\{\theta, r\right\}$, where $\theta \in \left[-\frac{\pi}{2}, \frac{\pi}{2}\right)$ and $r \in \left(-\infty, +\infty\right)$.
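
As a brief sketch of the two-parameter description (the sign convention here is an assumption on our part, chosen so that it agrees with the sampling equation used later in the global polar head), a line anchor with parameters $\left\{\theta, r\right\}$ can be read as the set of points $\left( x, y \right)$, expressed relative to the chosen origin, that satisfy
\[
x\cos \theta +y\sin \theta =r .
\]
Under this reading, $\theta$ is the direction of the normal from the origin to the line and $\left| r \right|$ is the distance from the origin to the line, so each straight line corresponds to exactly one pair $\left\{\theta, r\right\}$ within the stated ranges and no start point is needed.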
|
|
||||||
|
|
||||||
|
|
||||||
\begin{figure}[t]
|
|
||||||
\centering
|
|
||||||
\begin{subfigure}{0.49\linewidth}
|
|
||||||
\centering
|
|
||||||
\includegraphics[width=1\linewidth]{thsis_figure/coord/ray.png}
|
|
||||||
\caption{}
|
|
||||||
\end{subfigure}
|
|
||||||
\hfill
|
|
||||||
\begin{subfigure}{0.49\linewidth}
|
|
||||||
\centering
|
|
||||||
\includegraphics[width=1\linewidth]{thsis_figure/coord/polar.png}
|
|
||||||
\caption{}
|
|
||||||
\end{subfigure}
|
|
||||||
\caption{Different descriptions for anchor parameters. (a) Ray: start point and orientation. (b) Polar: radius and angle.}
|
|
||||||
\label{coord}
|
|
||||||
\end{figure}
|
|
||||||
|
|
||||||
We define two kinds of polar coordinate systems, called the global coordinate system and the local coordinate system, with the origin points denoted as the global origin point $P_{0}^{\text{global}}$ and the local origin point $P_{0}^{\text{local}}$, respectively. For convenience, the global origin point is set around the static vanishing point of the whole lane image dataset, while the local origin points are set on a lattice within the image. From Figure \ref{coord}, it is easy to see that only the radius parameter is influenced by the choice of the origin point, while the angle/orientation parameter stays the same.
|
|
||||||
|
|
||||||
\subsection{Local Polar Head}
|
|
||||||
|
|
||||||
Inspired by the region proposal network in Faster RCNN \cite{}, the local polar proposal module aims to propose flexible, high-quality anchors for each image, as shown in Fig. \ref{lph} and Fig. \ref{overall_architecture}. The highest-level (P3) FPN feature map $F \in \mathbb{R}^{C_{f} \times H_{f} \times W_{f}}$ is chosen as the input of the Local Polar Head (LPH). After a downsampling operation, the feature map is fed into two branches, namely the regression branch and the classification branch:
|
|
||||||
|
|
||||||
\begin{equation}
|
|
||||||
\begin{aligned}
|
|
||||||
&F_d\gets downsample\left( F \right), \,F_d\in \mathbb{R} ^{C_f\times H_l\times W_l}\\
|
|
||||||
&F_{reg\,\,}\gets \phi _{reg}^{lph}\left( F_d \right), \,F_{reg\,\,}\in \mathbb{R} ^{2\times H_l\times W_l}\\
|
|
||||||
&F_{cls}\gets \phi _{cls}^{lph}\left( F_d \right), \,F_{cls}\in \mathbb{R} ^{1\times H_l\times W_l}
|
|
||||||
\end{aligned}
|
|
||||||
\label{lph equ}
|
|
||||||
\end{equation}
|
|
||||||
|
|
||||||
The regression branch aims to propose lane anchors by predicting two parameters, $F_{reg\,\,} \equiv \left[\mathbf{\Theta}^{H_{l} \times W_{l}}, \mathbf{\xi}^{H_{l}\times W_{l}}\right]$, under the local polar coordinate system, which denote the angles and the radii. The classification branch predicts the heat map of the local polar origin grid. By removing the local origin points with lower confidence, the potential positive lane anchors around the ground truth are more likely to be chosen while the background lane anchors are removed. To keep the design simple, the regression branch $\phi _{reg}^{lph}\left(\cdot \right)$ and the classification branch $\phi _{cls}^{lph}\left(\cdot \right)$ consist of one and two $1\times 1$ convolution layers, respectively.
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
\begin{figure}[t]
|
|
||||||
\centering
|
|
||||||
\includegraphics[width=0.45\textwidth]{thsis_figure/local_polar_head.png} % replace with your image file name
|
|
||||||
\caption{The main architecture of our model.}
|
|
||||||
\label{lph}
|
|
||||||
\end{figure}
|
|
||||||
|
|
||||||
During the training stage, as shown in Fig. \ref{lphlabel}, the ground truth labels of the local polar head are constructed as follows. The radius ground truth is defined as the shortest distance from a grid point (local polar origin point) to the ground truth lane curve. The angle ground truth is defined as the orientation of the segment linking the grid point to its nearest point on the curve. Only grid points whose radius label is less than a threshold $\tau$ are set as positive samples, while the others are set as negative samples. Once the regression and classification labels are constructed, the LPH can be trained with the smooth-L1 loss and the binary cross entropy (BCE) loss. The LPH loss function is defined as follows:
|
|
||||||
|
|
||||||
\begin{equation}
|
|
||||||
\begin{aligned}
|
|
||||||
\mathcal{L} _{lph}^{cls}&=BCE\left( F_{cls},F_{gt} \right) \\
|
|
||||||
\mathcal{L} _{lph}^{reg}&=\frac{1}{N_{lph}^{pos}}\sum_{i\in \left\{i|\hat{r}_i<\tau \right\}}{\left( d\left( \theta _i-\hat{\theta}_i \right) +d\left( r_i-\hat{r}_i \right) \right)}\\
|
|
||||||
% \mathcal{L} _{lph}^{r\mathrm{e}g}&=\lambda _{lph}^{cls}\mathcal{L} _{lph}^{cls}+\lambda _{lph}^{reg}\mathcal{L} _{lph}^{r\mathrm{e}g}
|
|
||||||
\end{aligned}
|
|
||||||
\label{loss_lpm}
|
|
||||||
\end{equation}
|
|
||||||
|
|
||||||
|
|
||||||
where $BCE\left( \cdot , \cdot \right) $ denotes the binary cross entropy loss and $d\left(\cdot \right)$ denotes the smooth-L1 loss. To keep the training of the backbone stable, the gradients from the classification (confidence) branch to the backbone feature map are detached.
|
|
||||||
|
|
||||||
|
|
||||||
\begin{figure}[t]
|
|
||||||
\centering
|
|
||||||
\includegraphics[width=0.48\textwidth]{thsis_figure/coord/localpolar.png}
|
|
||||||
\caption{Label construction for local polar proposal module.}
|
|
||||||
\label{lphlabel}
|
|
||||||
\end{figure}
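
As a small worked example of the label construction in Fig. \ref{lphlabel} (the numbers are hypothetical, and the lane is treated locally as a straight line for simplicity), take a grid point at $\left( 0, 0 \right)$ and a ground truth lane that locally follows $x+y=2$. The nearest point on the lane is $\left( 1, 1 \right)$, so the labels for this grid point would be
\[
\hat{r}=\sqrt{1^2+1^2}=\sqrt{2}, \qquad \hat{\theta}=\arctan \frac{1}{1}=\frac{\pi}{4} .
\]
With a threshold of $\tau =2$ this grid point would be a positive sample because $\sqrt{2}<2$, whereas with $\tau =1$ it would be a negative sample and would not contribute to the regression loss.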
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
\subsection{Global Polar Head}
|
|
||||||
The global polar head serves as the second stage of PolarRCNN; it accepts the line-pooled features as input and predicts the accurate lane shape and location. The global polar head consists of three parts.
|
|
||||||
Once the local polar parameters of a line anchor are provided, they can be transformed to the global polar coordinates with the following equation:
|
|
||||||
\begin{equation}
|
|
||||||
\begin{aligned}
|
|
||||||
r^{global}=r^{local}+\left( x^{local}-x^{global} \right) \cos \theta
|
|
||||||
\\+\left( y^{local}-y^{global} \right) \sin \theta
|
|
||||||
\end{aligned}
|
|
||||||
\end{equation}
|
|
||||||
where $\left( x^{local}, y^{local} \right)$ and $\left( x^{global}, y^{global} \right)$ are the Cartesian coordinates of local and global origin points correspondingly.
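
This relation can be checked in one step, assuming the line convention $x\cos \theta +y\sin \theta =r$ with coordinates taken relative to the respective origin (our reading of the definitions above, not a statement from the original derivation). For any point $\left( x, y \right)$ on the anchor, given in Cartesian image coordinates,
\[
\begin{aligned}
r^{local}&=\left( x-x^{local} \right) \cos \theta +\left( y-y^{local} \right) \sin \theta ,\\
r^{global}&=\left( x-x^{global} \right) \cos \theta +\left( y-y^{global} \right) \sin \theta .
\end{aligned}
\]
Subtracting the first line from the second cancels $x$ and $y$ and leaves $r^{global}-r^{local}=\left( x^{local}-x^{global} \right) \cos \theta +\left( y^{local}-y^{global} \right) \sin \theta$, while $\theta$ is unchanged by the change of origin.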
|
|
||||||
|
|
||||||
Then the feature points can be sampled on the line anchor. The y coordinates of the points are sampled uniformly along the vertical direction of the FPN feature maps as mentioned before, and $x_{i}$ is calculated in the global polar coordinate system by the following equation:
|
|
||||||
|
|
||||||
\begin{equation}
|
|
||||||
\begin{aligned}
|
|
||||||
x_{i\,\,}=-y_i\tan \theta +\frac{r}{\cos \theta}
|
|
||||||
\end{aligned}
|
|
||||||
\end{equation}
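
As a quick check of this sampling rule, again assuming the convention $x\cos \theta +y\sin \theta =r$ with coordinates relative to the global origin, solving for $x$ at $y=y_i$ gives
\[
x_i=\frac{r-y_i\sin \theta}{\cos \theta}=-y_i\tan \theta +\frac{r}{\cos \theta},
\]
which is defined whenever $\cos \theta \ne 0$, i.e., when the anchor is not horizontal. For the illustrative (hypothetical) values $\theta =\frac{\pi}{4}$ and $r=\sqrt{2}$, the point sampled at $y_i=1$ has $x_i=-1+2=1$, and indeed $1\cdot \cos \frac{\pi}{4}+1\cdot \sin \frac{\pi}{4}=\sqrt{2}=r$.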
|
|
||||||
|
|
||||||
|
|
||||||
\begin{figure}[t]
|
|
||||||
\centering
|
|
||||||
\includegraphics[width=0.49\textwidth]{thsis_figure/triplet_head.png} % replace with your image file name
|
|
||||||
\caption{The main architecture of global head}
|
|
||||||
\label{triplet}
|
|
||||||
\end{figure}
|
|
||||||
|
|
||||||
\begin{figure}[t]
|
|
||||||
\centering
|
|
||||||
\includegraphics[width=0.4\textwidth]{thsis_figure/gnn.png} % replace with your image file name
|
|
||||||
\caption{The main architecture of our model.}
|
|
||||||
\label{gnn}
|
|
||||||
\end{figure}
|
|
||||||
|
|
||||||
\begin{figure}[t]
|
|
||||||
\centering
|
|
||||||
\includegraphics[width=0.49\textwidth]{thsis_figure/GLaneIoU.png} % replace with your image file name
|
|
||||||
\caption{Illustrations of GLaneIoU re-defined in our work.}
|
|
||||||
\label{glaneiou}
|
|
||||||
\end{figure}
|
|
||||||
|
|
||||||
Suppose $\left\{L_{0}, L_{1}, L_{2}\right\}$ denote the different levels of the FPN. We sample the grid features from the three levels once rather than using cross-layer refinement as in CLRNet. To reduce the number of parameters, we use a weighted-sum strategy to add the features from different layers, similar to \cite{}:
|
|
||||||
|
|
||||||
|
|
||||||
\begin{equation}
|
|
||||||
\begin{aligned}
|
|
||||||
\boldsymbol{F}^s=\sum_{i=0}^2{\boldsymbol{F}_{i}^{s}\times \frac{e^{\boldsymbol{w}_{i}^{s}}}{\sum_{j=0}^2{e^{\boldsymbol{w}_{j}^{s}}}}}
|
|
||||||
\end{aligned}
|
|
||||||
\end{equation}
|
|
||||||
where $\boldsymbol{F}_{i}^{s}\in \mathbb{R} ^{N_p\times d_f}$ denotes the grid features sampled from $L_{i}$ and $\boldsymbol{w}_{i}^{s}\in \mathbb{R} ^{N_p}$ is the aggregation weight, which is a learned parameter of the model. Instead of directly concatenating the three sampled features into $\boldsymbol{F}^s\in \mathbb{R} ^{N_p\times d_f\times 3}$, the adaptive sum reduces the feature dimension to $\boldsymbol{F}^s\in \mathbb{R} ^{N_p\times d_f}$, which is one third of the former. The weighted-sum tensors are then fed into a fully connected layer:
|
|
||||||
\begin{equation}
|
|
||||||
\begin{aligned}
|
|
||||||
\boldsymbol{F}^{roi}\gets fc\left( \boldsymbol{F}^s \right) , \boldsymbol{F}^{roi}\in \mathbb{R} ^{d_r}
|
|
||||||
\end{aligned}
|
|
||||||
\end{equation}
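
To illustrate the softmax-weighted fusion of the three FPN levels with hypothetical numbers, consider a single sampled point. If the learned weights are equal, $w_{0}^{s}=w_{1}^{s}=w_{2}^{s}$, each level receives a normalized weight of $\frac{1}{3}$ and the fused feature is the mean of the three level features. If instead $w_{0}^{s}=\ln 2$ and $w_{1}^{s}=w_{2}^{s}=0$, then
\[
\frac{e^{w_{0}^{s}}}{\sum_{j=0}^2{e^{w_{j}^{s}}}}=\frac{2}{4}, \qquad \frac{e^{w_{1}^{s}}}{\sum_{j=0}^2{e^{w_{j}^{s}}}}=\frac{e^{w_{2}^{s}}}{\sum_{j=0}^2{e^{w_{j}^{s}}}}=\frac{1}{4},
\]
so the fused feature at that point is $\frac{1}{2}\boldsymbol{F}_{0}^{s}+\frac{1}{4}\boldsymbol{F}_{1}^{s}+\frac{1}{4}\boldsymbol{F}_{2}^{s}$. In both cases the normalized weights sum to one, so the fused feature stays on the same scale as the per-level features.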
|
|
||||||
|
|
||||||
Note that each proposal anchor produces one RoI feature, denoted $\boldsymbol{F}^{roi}_{i}$ ($i=0,1,2,\ldots ,N_{anc}$), which is fed into the triplet head to obtain the one-to-one (o2o) confidence, the one-to-many (o2m) confidence, and the one-to-many (o2m) regression. For the o2m confidence, more than one detection result is likely to be produced around each ground truth, so NMS postprocessing is necessary to remove the redundant positive detections, as in previous works\cite{}. Unlike one-to-many assignment, one-to-one assignment assigns only one positive anchor to each ground truth, so only one detection result is likely to be produced per ground truth, which makes the pipeline NMS-free.
|
|
||||||
|
|
||||||
However, we find that a plain branch structure is unable to learn the one-to-one assignment, because the anchors overlap heavily, as Fig. \ref{anchor setting}(b)(c) shows. Directly applying one-to-one assignment to a branch with the same structure as the o2m branch greatly reduces the performance of the model. To address this issue, we design the structure of the one-to-one branch heuristically.
|
|
||||||
|
|
||||||
It is easy to notice that the "ideal" one-to-one branch is equivalent to the o2m cls branch + o2m regression branch + NMS postprocessing. To make the latter explicit, the process is written as the following equations:
|
|
||||||
|
|
||||||
|
|
||||||
\begin{equation}
|
|
||||||
\begin{aligned}
|
|
||||||
s_i\gets f_{o2m}^{cls}\left( \boldsymbol{F}_{i}^{roi} \right)
|
|
||||||
\\
|
|
||||||
\varDelta \boldsymbol{x}_{i}^{roi}\gets f_{o2m}^{reg}\left( \boldsymbol{F}_{i}^{roi} \right) ,\,\, \varDelta \boldsymbol{x}_{i}^{roi}\in \mathbb{R} ^{N_r}
|
|
||||||
\\
|
|
||||||
\tilde{s}_i|_{i=1}^{N_{anc}}\gets NMS\left( s_i|_{i=1}^{N_{anc}}, \varDelta \boldsymbol{x}_{i}^{roi}+\boldsymbol{x}_{i}^{b}|_{i=1}^{N_{anc}} \right)
|
|
||||||
\\
|
|
||||||
\end{aligned}
|
|
||||||
\end{equation}
|
|
||||||
|
|
||||||
That is, the o2o confidence can be predicted by a function that takes the features, scores, and locations of all anchors as input, rather than those of a single anchor.
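
To illustrate, this dependence can be sketched as follows, where $g$ is a placeholder symbol introduced here only for exposition (it is an assumption of this sketch and does not denote a specific module):
\begin{equation}
\begin{aligned}
\tilde{s}_i\gets g\left( i;\ \left\{ \boldsymbol{F}_{j}^{roi},\ s_j,\ \varDelta \boldsymbol{x}_{j}^{roi}+\boldsymbol{x}_{j}^{b} \right\} _{j=1}^{N_{anc}} \right)
\end{aligned}
\end{equation}
Under this view, NMS is one hand-crafted choice of $g$: roughly, it keeps $s_i$ when no retained higher-scoring anchor lies within the NMS distance threshold of anchor $i$, and suppresses it otherwise.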
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
The loss function is as follows:
|
|
||||||
|
|
||||||
\begin{equation}
|
|
||||||
\begin{aligned}
|
|
||||||
\mathcal{L} _{RCNN}=c_{cls}\mathcal{L} _{cls}+c_{loc}\mathcal{L} _{loc}
|
|
||||||
\end{aligned}
|
|
||||||
\end{equation}
|
|
||||||
where $\mathcal{L} _{cls}$ is the focal loss and $\mathcal{L} _{loc}$ is the LaneIoU loss\cite{}.
|
|
||||||
|
|
||||||
In the testing stage, the anchors with the top-$k_{l}$ confidence are chosen as the proposal anchors, and these $k_{l}$ anchors are fed into the RCNN module to obtain the final predictions.
|
|
||||||
|
|
||||||
\section{Experiment}
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
\begin{figure}[t]
|
|
||||||
\centering
|
|
||||||
\includegraphics[width=0.48\textwidth]{thsis_figure/anchor_num_f1.png} % replace with your image file name
|
|
||||||
\caption{Anchor number and F1-score of different methods on CULane.}
|
|
||||||
\label{glaneiou}
|
|
||||||
\end{figure}
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
\begin{table*}[htbp]
|
|
||||||
\centering
|
|
||||||
\caption{Dataset \& preprocess}
|
|
||||||
\begin{adjustbox}{width=\linewidth}
|
|
||||||
\begin{tabular}{l|l|ccccc}
|
|
||||||
\toprule
|
|
||||||
\multicolumn{2}{c|}{\textbf{Dataset}} & CULane & TUSimple & LLAMAS & DL-Rail & Curvelanes \\
|
|
||||||
\midrule
|
|
||||||
\multirow{7}*{Data info}
|
|
||||||
& Train &88,880&3,268 &58,269&5,435&100,000\\
|
|
||||||
& Validation &9,675 &358 &20,844&- &20,000 \\
|
|
||||||
& Test &34,680&2,782 &20,929&1,569&- \\
|
|
||||||
& Resolution &$1640\times590$&$1280\times720$&$1276\times717$&$1920\times1080$&$2560\times1440$, etc.\\
|
|
||||||
& Lane &$\leqslant4$&$\leqslant5$&$\leqslant4$&$=2$&$\leqslant10$\\
|
|
||||||
& Environment &urban and highway & highway&highway&railway&urban and highway\\
|
|
||||||
& Property &sparse&sparse&sparse&sparse&sparse and dense\\
|
|
||||||
\midrule
|
|
||||||
\multirow{1}*{Data Preprocess}
|
|
||||||
& Crop Height &270&160&300&560&640, etc\\
|
|
||||||
\midrule
|
|
||||||
\multirow{5}*{Training Hyper Parameter}
|
|
||||||
& Epoch Number &32&70&20&90&32\\
|
|
||||||
& Batch Size &40&24&32&40&40\\
|
|
||||||
& Warm up iterations &800&200&800&400&800\\
|
|
||||||
& Aux Loss &0.2&0 &0.2&0.2&0.2\\
|
|
||||||
& Rank Loss &0.7&0.7&0.1&0.7&0 \\
|
|
||||||
\midrule
|
|
||||||
\multirow{4}*{Model Hyper Parameter}
|
|
||||||
& polar map size &$4\times10$&$4\times10$&$4\times10$&$4\times10$&$6\times13$\\
|
|
||||||
& testing anchor number (top-k) &20&20&20&20&50\\
|
|
||||||
& o2m conf thres &0.48&0.40&0.40&0.40&0.45\\
|
|
||||||
& o2o conf thres &0.46&0.46&0.46&0.46&0.44\\
|
|
||||||
\midrule
|
|
||||||
\multirow{2}*{Evaluation Choice}
|
|
||||||
& Eval split &Test&Test&Test&Test&Validation\\
|
|
||||||
& Vis split &Test&Test&Validation&Test&Validation\\
|
|
||||||
\bottomrule
|
|
||||||
\end{tabular}
|
|
||||||
\end{adjustbox}
|
|
||||||
\label{dataset_info}
|
|
||||||
\end{table*}
|
|
||||||
|
|
||||||
|
|
||||||
\begin{table*}[htbp]
|
|
||||||
\centering
|
|
||||||
\caption{CULane results compared with other methods}
|
|
||||||
\normalsize
|
|
||||||
\begin{adjustbox}{width=\linewidth}
|
|
||||||
\begin{tabular}{lrlllllllllll}
|
|
||||||
\toprule
|
|
||||||
\textbf{Method}& \textbf{Backbone}&\textbf{F1@50}$\uparrow$& \textbf{F1@75}$\uparrow$& \textbf{Normal}$\uparrow$&\textbf{Crowded}$\uparrow$&\textbf{Dazzle}$\uparrow$&\textbf{Shadow}$\uparrow$&\textbf{No line}$\uparrow$& \textbf{Arrow}$\uparrow$& \textbf{Curve}$\uparrow$& \textbf{Cross}$\downarrow$ & \textbf{Night}$\uparrow$ \\
|
|
||||||
\hline
|
|
||||||
\textbf{Seg \& Grid} \\
|
|
||||||
\cline{1-1}
|
|
||||||
SCNN &VGG-16 &71.60&39.84&90.60&69.70&58.50&66.90&43.40&84.10&64.40&1900&66.10\\
|
|
||||||
RESA &ResNet50 &75.30&53.39&92.10&73.10&69.20&72.80&47.70&83.30&70.30&1503&69.90\\
|
|
||||||
LaneAF &DLA34 &77.41&- &91.80&75.61&71.78&79.12&51.38&86.88&72.70&1360&73.03\\
|
|
||||||
UFLDv2 &ResNet34 &76.0 &- &92.5 &74.8 &65.5 &75.5 &49.2 &88.8 &70.1 &1910&70.8 \\
|
|
||||||
CondLaneNet &ResNet101&79.48&61.23&93.47&77.44&70.93&80.91&54.13&90.16&75.21&1201&74.80\\
|
|
||||||
\cline{1-1}
|
|
||||||
\textbf{Parameter} \\
|
|
||||||
\cline{1-1}
|
|
||||||
BézierLaneNet &ResNet18&73.67&-&90.22&71.55&62.49&70.91&45.30&84.09&58.98&\textbf{996} &68.70\\
|
|
||||||
BSNet &DLA34 &80.28&-&93.87&78.92&75.02&82.52&54.84&90.73&74.71&1485&75.59\\
|
|
||||||
Eigenlanes &ResNet50&77.20&-&91.7 &76.0 &69.8 &74.1 &52.2 &87.7 &62.9 &1509&71.8 \\
|
|
||||||
\cline{1-1}
|
|
||||||
\textbf{Keypoint} \\
|
|
||||||
\cline{1-1}
|
|
||||||
CurveLanes-NAS-L &- &74.80&-&90.70&72.30&67.70&70.10&49.40&85.80&68.40&1746&68.90\\
|
|
||||||
FOLOLane &ResNet18 &78.80&-&92.70&77.80&75.20&79.30&52.10&89.00&69.40&1569&74.50\\
|
|
||||||
GANet-L &ResNet101&79.63&-&93.67&78.66&71.82&78.32&53.38&89.86&77.37&1352&73.85\\
|
|
||||||
\cline{1-1}
|
|
||||||
\textbf{Dense Anchor} \\
|
|
||||||
\cline{1-1}
|
|
||||||
LaneATT &ResNet18 &75.13&51.29&91.17&72.71&65.82&68.03&49.13&87.82&63.75&1020&68.58\\
|
|
||||||
LaneATT &ResNet122&77.02&57.50&91.74&76.16&69.47&76.31&50.46&86.29&64.05&1264&70.81\\
|
|
||||||
CLRNet &ResNet18 &79.58&62.21&93.30&78.33&73.71&79.66&53.14&90.25&71.56&1321&75.11\\
|
|
||||||
CLRNet &DLA34 &80.47&62.78&93.73&79.59&75.30&82.51&54.58&90.62&74.13&1155&75.37\\
|
|
||||||
CLRerNet &DLA34 &81.12&64.07&94.02&80.20&74.41&\textbf{83.71}&56.27&90.39&74.67&1161&\textbf{76.53}\\
|
|
||||||
\cline{1-1}
|
|
||||||
\textbf{Sparse Anchor} \\
|
|
||||||
\cline{1-1}
|
|
||||||
ADNet &ResNet34&78.94&-&92.90&77.45&71.71&79.11&52.89&89.90&70.64&1499&74.78\\
|
|
||||||
SRLane &ResNet18&79.73&-&93.52&78.58&74.13&81.90&55.65&89.50&75.27&1412&74.58\\
|
|
||||||
Sparse Laneformer &ResNet50&77.83&-&- &- &- &- &- &- &- &- &- \\
|
|
||||||
\hline
|
|
||||||
\textbf{Proposed Method} \\
|
|
||||||
\cline{1-1}
|
|
||||||
PolarRCNN$_{o2m}$ &ResNet18&80.81&63.97&94.11&79.57&76.53&83.33&55.10&90.70&79.47&1089&75.25\\
|
|
||||||
PolarRCNN &ResNet18&80.80&63.97&94.12&79.57&76.53&83.33&55.09&90.62&79.47&1089&75.25\\
|
|
||||||
PolarRCNN &ResNet34&80.91&63.96&94.24&79.75&76.67&81.97&55.40&\textbf{91.12}&79.85&1158&75.70\\
|
|
||||||
PolarRCNN &ResNet50&81.34&64.77&94.45&\textbf{80.42}&75.88&83.61&56.63&91.10&80.00&1356&75.94\\
|
|
||||||
PolarRCNN$_{o2m}$ &DLA34 &\textbf{81.49}&64.96&\textbf{94.44}&80.36&\textbf{76.83}&83.68&56.53&90.85&\textbf{80.09}&1135&76.32\\
|
|
||||||
PolarRCNN &DLA34 &\textbf{81.49}&\textbf{64.97}&\textbf{94.44}&80.36&\textbf{76.83}&83.68&\textbf{56.56}&90.81&79.80&1135&76.33\\
|
|
||||||
\bottomrule
|
|
||||||
\end{tabular}
|
|
||||||
\end{adjustbox}
|
|
||||||
\label{culane result}
|
|
||||||
\end{table*}
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
\begin{table}[h]
|
|
||||||
\centering
|
|
||||||
\caption{TuSimple results compared with other methods}
|
|
||||||
\begin{adjustbox}{width=\linewidth}
|
|
||||||
\begin{tabular}{lrcccc}
|
|
||||||
\toprule
|
|
||||||
\textbf{Method}& \textbf{Backbone}& \textbf{Acc(\%)}&\textbf{F1(\%)}&\textbf{FP(\%)}&\textbf{FN(\%)} \\
|
|
||||||
\midrule
|
|
||||||
SCNN &VGG16 &96.53&95.97&6.17&\textbf{1.80}\\
|
|
||||||
PolyLaneNet&EfficientNetB0&93.36&90.62&9.42&9.33\\
|
|
||||||
UFLDv2 &ResNet34 &88.08&95.73&18.84&3.70\\
|
|
||||||
LaneATT &ResNet34 &95.63&96.77&3.53&2.92\\
|
|
||||||
FOLOLane &ERFNet &\textbf{96.92}&96.59&4.47&2.28\\
|
|
||||||
CondLaneNet&ResNet101 &96.54&97.24&2.01&3.50\\
|
|
||||||
CLRNet &ResNet18 &96.84&97.89&2.28&1.92\\
|
|
||||||
\midrule
|
|
||||||
PolarRCNN$_{o2m}$ &ResNet18&96.20&\textbf{97.98}&2.16&1.86\\
|
|
||||||
PolarRCNN &ResNet18&96.21&97.93&2.26&1.87\\
|
|
||||||
\bottomrule
|
|
||||||
\end{tabular}
|
|
||||||
\end{adjustbox}
|
|
||||||
\end{table}
|
|
||||||
|
|
||||||
|
|
||||||
\begin{table}[h]
|
|
||||||
\centering
|
|
||||||
\caption{LLAMAS test results compared with other methods}
|
|
||||||
\begin{adjustbox}{width=\linewidth}
|
|
||||||
\begin{tabular}{lrccc}
|
|
||||||
\toprule
|
|
||||||
\textbf{Method}& \textbf{Backbone}&\textbf{F1@50(\%)}&\textbf{Precision(\%)}&\textbf{Recall(\%)} \\
|
|
||||||
\midrule
|
|
||||||
SCNN &ResNet34&94.25&94.11&94.39\\
|
|
||||||
BézierLaneNet &ResNet34&95.17&95.89&94.46\\
|
|
||||||
LaneATT &ResNet34&93.74&96.79&90.88\\
|
|
||||||
LaneAF &DLA34 &96.07&96.91&95.26\\
|
|
||||||
DALNet &ResNet34&96.12&\textbf{96.83}&95.42\\
|
|
||||||
CLRNet &DLA34 &96.12&- &- \\
|
|
||||||
\midrule
|
|
||||||
PolarRCNN &ResNet18&96.06&96.81&95.32\\
|
|
||||||
PolarRCNN &DLA34 &\textbf{96.14}&96.82&\textbf{95.47}\\
|
|
||||||
\bottomrule
|
|
||||||
\end{tabular}
|
|
||||||
\end{adjustbox}
|
|
||||||
\end{table}
|
|
||||||
|
|
||||||
\begin{table}[h]
|
|
||||||
\centering
|
|
||||||
\caption{DL-Rail test results compared with other methods}
|
|
||||||
\begin{adjustbox}{width=\linewidth}
|
|
||||||
\begin{tabular}{lrccc}
|
|
||||||
\toprule
|
|
||||||
\textbf{Method}& \textbf{Backbone}&\textbf{mF1(\%)}&\textbf{F1@50(\%)}&\textbf{F1@75(\%)} \\
|
|
||||||
\midrule
|
|
||||||
BézierLaneNet &ResNet18&42.81&85.13&38.62\\
|
|
||||||
GANet-S &ResNet18&57.64&95.68&62.01\\
|
|
||||||
CondLaneNet &ResNet18&52.37&95.10&53.10\\
|
|
||||||
UFLDv1 &ResNet34&53.76&94.78&57.15\\
|
|
||||||
LaneATT(with RPN) &ResNet18&55.57&93.82&58.97\\
|
|
||||||
DALNet &ResNet18&59.79&96.43&65.48\\
|
|
||||||
\midrule
|
|
||||||
PolarRCNN$_{o2m}$ &ResNet18&\textbf{61.54}&\textbf{97.01}&\textbf{67.92}\\
|
|
||||||
PolarRCNN &ResNet18&61.53&96.99&67.91\\
|
|
||||||
\bottomrule
|
|
||||||
\end{tabular}
|
|
||||||
\end{adjustbox}
|
|
||||||
\end{table}
|
|
||||||
|
|
||||||
|
|
||||||
\begin{table}[h]
|
|
||||||
\centering
|
|
||||||
\caption{Curvelanes validation results compared with other methods}
|
|
||||||
\begin{adjustbox}{width=\linewidth}
|
|
||||||
\begin{tabular}{lrccc}
|
|
||||||
\toprule
|
|
||||||
\textbf{Method}& \textbf{Backbone}&\textbf{F1(\%)}&\textbf{Precision(\%)}&\textbf{Recall(\%)} \\
|
|
||||||
\midrule
|
|
||||||
SCNN &VGG16 &65.02&76.13&56.74\\
|
|
||||||
ENet-SAD &- &50.31&63.60&41.60\\
|
|
||||||
PointLaneNet &ResNet101&78.47&86.33&72.91\\
|
|
||||||
CurveLane-S &- &81.12&93.58&71.59\\
|
|
||||||
CurveLane-M &- &81.80&93.49&72.71\\
|
|
||||||
CurveLane-L &- &82.29&91.11&75.03\\
|
|
||||||
UFLDv2 &ResNet34 &81.34&81.93&80.76\\
|
|
||||||
CondLaneNet-M &ResNet34 &85.92&88.29&83.68\\
|
|
||||||
CondLaneNet-L &ResNet101&86.10&88.98&83.41\\
|
|
||||||
CLRNet &DLA34 &86.10&91.40&81.39\\
|
|
||||||
CLRerNet &DLA34 &86.47&91.66&81.83\\
|
|
||||||
\hline
|
|
||||||
PolarRCNN &DLA34&\textbf{87.29}&90.50&\textbf{84.31}\\
|
|
||||||
\hline
|
|
||||||
\end{tabular}
|
|
||||||
\end{adjustbox}
|
|
||||||
\end{table}
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
\begin{table}[h]
|
|
||||||
\centering
|
|
||||||
\caption{Comparison between different anchor strategies}
|
|
||||||
\begin{adjustbox}{width=\linewidth}
|
|
||||||
\begin{tabular}{c|ccc|cc}
|
|
||||||
\toprule
|
|
||||||
\textbf{Anchor strategy}&\textbf{Local R}& \textbf{Local Angle}&\textbf{Auxloss}&\textbf{F1@50}&\textbf{F1@75}\\
|
|
||||||
\midrule
|
|
||||||
\multirow{2}*{Fixed}
|
|
||||||
&- &- & &79.90 &60.98\\
|
|
||||||
&- &- &\checkmark&80.38 &62.35\\
|
|
||||||
\midrule
|
|
||||||
\multirow{5}*{Proposal}
|
|
||||||
& & & &75.85 &58.97\\
|
|
||||||
&\checkmark& & &78.46 &60.32\\
|
|
||||||
& &\checkmark& &80.31 &62.13\\
|
|
||||||
&\checkmark&\checkmark& &80.51 &63.38\\
|
|
||||||
&\checkmark&\checkmark&\checkmark&\textbf{80.81}&\textbf{63.97}\\
|
|
||||||
\bottomrule
|
|
||||||
\end{tabular}
|
|
||||||
\end{adjustbox}
|
|
||||||
\end{table}
|
|
||||||
|
|
||||||
\begin{table}[h]
|
|
||||||
\centering
|
|
||||||
\caption{NMS vs NMS-free on Curvelanes}
|
|
||||||
\begin{adjustbox}{width=\linewidth}
|
|
||||||
\begin{tabular}{l|l|ccc}
|
|
||||||
\toprule
|
|
||||||
\textbf{Paradigm} & \textbf{NMS thres(pixel)} & \textbf{F1(\%)} & \textbf{Precision(\%)} & \textbf{Recall(\%)} \\
|
|
||||||
\midrule
|
|
||||||
\multirow{7}*{PolarRCNN$_{o2m}$}
|
|
||||||
& 50 (default) &85.38&\textbf{91.01}&80.40\\
|
|
||||||
& 40 &85.97&90.72&81.68\\
|
|
||||||
& 30 &86.26&90.44&82.45\\
|
|
||||||
& 25 &86.38&90.27&82.83\\
|
|
||||||
& 20 &86.57&90.05&83.37\\
|
|
||||||
& 15 (optimal) &86.81&89.64&84.16\\
|
|
||||||
& 10 &86.58&88.62&\textbf{84.64}\\
|
|
||||||
\midrule
|
|
||||||
PolarRCNN (NMS-free) & - &\textbf{87.29}&90.50&84.31\\
|
|
||||||
\bottomrule
|
|
||||||
\end{tabular}
|
|
||||||
\end{adjustbox}
|
|
||||||
\end{table}
|
|
||||||
|
|
||||||
|
|
||||||
\begin{table}[h]
|
|
||||||
\centering
|
|
||||||
\caption{Ablation study on the NMS-free block}
|
|
||||||
\begin{adjustbox}{width=\linewidth}
|
|
||||||
\begin{tabular}{cccc|ccc}
|
|
||||||
\toprule
|
|
||||||
\textbf{GNN}&\textbf{cls Mat}& \textbf{Nbr Mat}&\textbf{Rank Loss}&\textbf{F1@50}&\textbf{Precision(\%)} & \textbf{Recall(\%)} \\
|
|
||||||
\midrule
|
|
||||||
& & & &16.19&69.05&9.17\\
|
|
||||||
\checkmark&\checkmark& & &79.42&88.46&72.06\\
|
|
||||||
\checkmark& &\checkmark& &71.97&73.13&70.84\\
|
|
||||||
\checkmark&\checkmark&\checkmark& &80.74&88.49&74.23\\
|
|
||||||
\checkmark&\checkmark&\checkmark&\checkmark&\textbf{80.78}&\textbf{88.49}&\textbf{74.30}\\
|
|
||||||
\bottomrule
|
|
||||||
\end{tabular}
|
|
||||||
\end{adjustbox}
|
|
||||||
\end{table}
|
|
||||||
|
|
||||||
|
|
||||||
\begin{table}[h]
|
|
||||||
\centering
|
|
||||||
\caption{The ablation study for structure on CULane test set}
|
|
||||||
\begin{adjustbox}{width=\linewidth}
|
|
||||||
\begin{tabular}{c|l|lll}
|
|
||||||
\toprule
|
|
||||||
\multicolumn{2}{c|}{\textbf{Anchor strategy~/~assign}} & \textbf{F1@50(\%)} & \textbf{Precision(\%)} & \textbf{Recall(\%)} \\
|
|
||||||
\midrule
|
|
||||||
\multirow{6}*{Fixed}
|
|
||||||
&o2m-B w/~ NMS &80.38&87.44&74.38\\
|
|
||||||
&o2m-B w/o NMS &44.03\textcolor{darkgreen}{~(36.35$\downarrow$)}&31.12\textcolor{darkgreen}{~(56.32$\downarrow$)}&75.23\textcolor{red}{~(0.85$\uparrow$)}\\
|
|
||||||
\cline{2-5}
|
|
||||||
&o2o-B w/~ NMS &78.72&87.58&71.50\\
|
|
||||||
&o2o-B w/o NMS &78.23\textcolor{darkgreen}{~(0.49$\downarrow$)}&86.26\textcolor{darkgreen}{~(1.32$\downarrow$)}&71.57\textcolor{red}{~(0.07$\uparrow$)}\\
|
|
||||||
\cline{2-5}
|
|
||||||
&o2o-G w/~ NMS &80.37&87.44&74.37\\
|
|
||||||
&o2o-G w/o NMS &80.27\textcolor{darkgreen}{~(0.10$\downarrow$)}&87.14\textcolor{darkgreen}{~(0.30$\downarrow$)}&74.40\textcolor{red}{~(0.03$\uparrow$)}\\
|
|
||||||
\midrule
|
|
||||||
\multirow{6}*{Proposal}
|
|
||||||
&o2m-B w/~ NMS &80.81&88.53&74.33\\
|
|
||||||
&o2m-B w/o NMS &36.46\textcolor{darkgreen}{~(44.35$\downarrow$)}&24.09\textcolor{darkgreen}{~(64.44$\downarrow$)}&74.93\textcolor{red}{~(0.60$\uparrow$)}\\
|
|
||||||
\cline{2-5}
|
|
||||||
&o2o-B w/~ NMS &77.27&92.64&66.28\\
|
|
||||||
&o2o-B w/o NMS &47.11\textcolor{darkgreen}{~(30.16$\downarrow$)}&36.48\textcolor{darkgreen}{~(56.16$\downarrow$)}&66.48\textcolor{red}{~(0.20$\uparrow$)}\\
|
|
||||||
\cline{2-5}
|
|
||||||
&o2o-G w/~ NMS &80.81&88.53&74.32\\
|
|
||||||
&o2o-G w/o NMS &80.80\textcolor{darkgreen}{~(0.01$\downarrow$)}&88.51\textcolor{darkgreen}{~(0.02$\downarrow$)}&74.33\textcolor{red}{~(0.01$\uparrow$)}\\
|
|
||||||
\bottomrule
|
|
||||||
\end{tabular}
|
|
||||||
\end{adjustbox}
|
|
||||||
\end{table}
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
\begin{table}[h]
|
|
||||||
\centering
|
|
||||||
\caption{The ablation study for stop grad on CULane test set}
|
|
||||||
\begin{adjustbox}{width=\linewidth}
|
|
||||||
\begin{tabular}{c|c|lll}
|
|
||||||
\toprule
|
|
||||||
\multicolumn{2}{c|}{\textbf{Paradigm}} & \textbf{F1(\%)} & \textbf{Precision(\%)} & \textbf{Recall(\%)} \\
|
|
||||||
\midrule
|
|
||||||
\multirow{2}*{Baseline}
|
|
||||||
&o2m-B w/~ NMS &78.83&88.99&70.75\\
|
|
||||||
&o2o-G w/o NMS &71.68\textcolor{darkgreen}{~(7.15$\downarrow$)}&72.56\textcolor{darkgreen}{~(16.43$\downarrow$)}&70.81\textcolor{red}{~(0.06$\uparrow$)}\\
|
|
||||||
\midrule
|
|
||||||
\multirow{2}*{Stop grad}
|
|
||||||
&o2m-B w/~ NMS &80.81&88.53&74.33\\
|
|
||||||
&o2o-G w/o NMS &80.80\textcolor{darkgreen}{~(0.01$\downarrow$)}&88.51\textcolor{darkgreen}{~(0.02$\downarrow$)}&74.33\textcolor{red}{~(0.00$\uparrow$)} \\
|
|
||||||
\bottomrule
|
|
||||||
\end{tabular}
|
|
||||||
\end{adjustbox}
|
|
||||||
\end{table}
|
|
||||||
|
|
||||||
|
|
||||||
\section{Conclusion}
|
|
||||||
The conclusion goes here.
|
|
||||||
|
|
||||||
|
|
||||||
\section*{Acknowledgments}
|
|
||||||
This should be a simple paragraph before the References to thank those individuals and institutions who have supported your work on this article.
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
%{\appendices
|
|
||||||
%\section*{Proof of the First Zonklar Equation}
|
|
||||||
%Appendix one text goes here.
|
|
||||||
% You can choose not to have a title for an appendix if you want by leaving the argument blank
|
|
||||||
%\section*{Proof of the Second Zonklar Equation}
|
|
||||||
%Appendix two text goes here.}
|
|
||||||
|
|
||||||
|
|
||||||
\bibliographystyle{IEEEtran}
|
|
||||||
\bibliography{ref}
|
|
||||||
|
|
||||||
|
|
||||||
\newpage
|
|
||||||
|
|
||||||
\section{Biography Section}
|
|
||||||
If you have an EPS/PDF photo (graphicx package needed), extra braces are
|
|
||||||
needed around the contents of the optional argument to biography to prevent
|
|
||||||
the LaTeX parser from getting confused when it sees the complicated
|
|
||||||
$\backslash${\tt{includegraphics}} command within an optional argument. (You can create
|
|
||||||
your own custom macro containing the $\backslash${\tt{includegraphics}} command to make things
|
|
||||||
simpler here.)
|
|
||||||
|
|
||||||
\vspace{11pt}
|
|
||||||
|
|
||||||
% \bf{If you include a photo:}\vspace{-33pt}
|
|
||||||
% \begin{IEEEbiography}[{\includegraphics[width=1in,height=1.25in,clip,keepaspectratio]{fig1}}]{Michael Shell}
|
|
||||||
% Use $\backslash${\tt{begin\{IEEEbiography\}}} and then for the 1st argument use $\backslash${\tt{includegraphics}} to declare and link the author photo.
|
|
||||||
% Use the author name as the 3rd argument followed by the biography text.
|
|
||||||
% \end{IEEEbiography}
|
|
||||||
|
|
||||||
\vspace{11pt}
|
|
||||||
|
|
||||||
\bf{If you will not include a photo:}\vspace{-33pt}
|
|
||||||
\begin{IEEEbiographynophoto}{John Doe}
|
|
||||||
Use $\backslash${\tt{begin\{IEEEbiographynophoto\}}} and the author name as the argument followed by the biography text.
|
|
||||||
\end{IEEEbiographynophoto}
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
\vfill
|
|
||||||
|
|
||||||
\end{document}
|
\end{document}
|
||||||
|
|
||||||
|
|
||||||
|
@ -200,11 +200,11 @@ where $BCE\left( \cdot , \cdot \right) $ denotes the binary cross entropy loss a
|
|||||||
During the test stage, once the local polar parameters of a line anchor are provided, they can be transformed to the global polar coordinates with the following equation:
|
During the test stage, once the local polar parameters of a line anchor are provided, they can be transformed to the global polar coordinates with the following equation:
|
||||||
\begin{equation}
|
\begin{equation}
|
||||||
\begin{aligned}
|
\begin{aligned}
|
||||||
r^{G}=r^{L}+\left( x^{L}-x^{G} \right) \cos \theta
|
r^{global}=r^{local}+\left( x^{local}-x^{global} \right) \cos \theta
|
||||||
\\+\left( y^{L}-y^{G} \right) \sin \theta
|
\\+\left( y^{local}-y^{global} \right) \sin \theta
|
||||||
\end{aligned}
|
\end{aligned}
|
||||||
\end{equation}
|
\end{equation}
|
||||||
where $\left( x^{L}, y^{L} \right)$ and $\left( x^{G}, y^{G} \right)$ are the Cartesian coordinates of the local and global origin points, respectively.
|
where $\left( x^{local}, y^{local} \right)$ and $\left( x^{global}, y^{global} \right)$ are the Cartesian coordinates of the local and global origin points, respectively.
|
||||||
|
|
||||||
|
|
||||||
\subsection{RCNN Module}
|
\subsection{RCNN Module}
|
||||||
|
2
make.sh
@ -1,6 +1,6 @@
|
|||||||
# latexmk -c
|
# latexmk -c
|
||||||
# latexmk -pvc -xelatex -interaction=nonstopmode main.tex
|
# latexmk -pvc -xelatex -interaction=nonstopmode main.tex
|
||||||
latexmk -quiet -interaction=nonstopmode --pvc --pdf main.tex
|
latexmk -quiet -interaction=nonstopmode --pvc -pdf main.tex
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
@ -1,54 +0,0 @@
|
|||||||
import matplotlib.pyplot as plt
import matplotlib as mpl

# Set the global font to Times New Roman
mpl.rcParams['font.family'] = 'Times New Roman'
mpl.rcParams['font.serif'] = ['Times New Roman']
mpl.rcParams['axes.titlesize'] = 14
mpl.rcParams['axes.labelsize'] = 12
mpl.rcParams['xtick.labelsize'] = 12
mpl.rcParams['ytick.labelsize'] = 12
mpl.rcParams['legend.fontsize'] = 12
mark_size = 8

# Define the data: x is latency (ms), y is F1-score (%)
data = {
    'LaneATT (2021)': {'x': [3.23, 5.01, 23.67], 'y': [75.09, 76.68, 77.02], 'color': 'magenta', 'marker': 'H'},
    'CLRNet (2022)': {'x': [7.37, 8.81, 9.31, 14.36], 'y': [79.58, 79.73, 80.47, 80.13], 'color': 'orange', 'marker': 'p'},
    'CLRerNet (2023)': {'x': [8.81, 9.31, 14.36], 'y': [80.76, 81.12, 80.91], 'color': 'orangered', 'marker': 'p'},
    'ADNet (2023)': {'x': [8.4, 10.67], 'y': [77.56, 78.94], 'color': 'green', 'marker': 'v'},
    'SRLane (2024)': {'x': [3.12], 'y': [79.73], 'color': 'red', 'marker': '*'},
    'UFLDv2 (2022)': {'x': [2.7, 4.6], 'y': [75, 76], 'color': 'purple', 'marker': '^'},
    'PolarRCNN-NMS (ours)': {'x': [3.71, 4.97, 5.47, 6.14], 'y': [80.81, 80.92, 81.49, 81.34], 'color': 'blue', 'marker': 'o'},
    'PolarRCNN (ours)': {'x': [4.77, 6.10, 6.54, 7.13], 'y': [80.81, 80.92, 81.49, 81.34], 'color': 'cyan', 'marker': 'o'},
}

plt.xlim(0, 30)

# Plot the data points, one line per method
for label, props in data.items():
    plt.plot(
        props['x'], props['y'],
        alpha=0.8,
        c=props['color'],
        marker=props['marker'],
        # edgecolors='w',
        markersize=mark_size,
        linewidth=1.2,
        label=label
    )

# Set the grid and axis labels
plt.grid(True, linestyle='-', alpha=0.5)
plt.xlabel('Latency (ms) on NVIDIA A100')
plt.ylabel('F1-score (%)')

# Add the legend and adjust the marker size in the legend
legend = plt.legend(loc="upper right")
for handle in legend.legend_handles:
    handle._sizes = [20]
plt.savefig('speed_method.png', dpi=300)
plt.show()
|
|
BIN
thsis_figure/triplet_head.png
Normal file