update
This commit is contained in:
parent
6c9fec214c
commit
e460769732
@@ -1,867 +0,0 @@
\documentclass[lettersize,journal]{IEEEtran}
\usepackage{amsmath,amsfonts}
\usepackage{algorithmic}
\usepackage{array}
\usepackage[caption=false,font=normalsize,labelfont=sf,textfont=sf]{subfig}
\usepackage{textcomp}
\usepackage{stfloats}
\usepackage{url}
\usepackage{verbatim}
\usepackage{graphicx}
\hyphenation{op-tical net-works semi-conduc-tor IEEE-Xplore}
\def\BibTeX{{\rm B\kern-.05em{\sc i\kern-.025em b}\kern-.08em
T\kern-.1667em\lower.7ex\hbox{E}\kern-.125emX}}
\usepackage{balance}
\begin{document}
\title{How to Use the IEEEtran \LaTeX\ Templates}
\author{IEEE Publication Technology Department
\thanks{Manuscript created October, 2020; This work was developed by the IEEE Publication Technology Department. This work is distributed under the \LaTeX\ Project Public License (LPPL) (http://www.latex-project.org/) version 1.3. A copy of the LPPL, version 1.3, is included in the base \LaTeX\ documentation of all distributions of \LaTeX\ released 2003/12/01 or later. The opinions expressed here are entirely those of the author. No warranty is expressed or implied. User assumes all risk.}}

\markboth{Journal of \LaTeX\ Class Files,~Vol.~18, No.~9, September~2020}%
{How to Use the IEEEtran \LaTeX\ Templates}

\maketitle

\begin{abstract}
This document describes the most common article elements and how to use the IEEEtran class with \LaTeX\ to produce files that are suitable for submission to the Institute of Electrical and Electronics Engineers (IEEE). IEEEtran can produce conference, journal, and technical note (correspondence) papers with a suitable choice of class options.
\end{abstract}

\begin{IEEEkeywords}
Class, IEEEtran, \LaTeX, paper, style, template, typesetting.
\end{IEEEkeywords}

\section{Introduction}
\IEEEPARstart{W}{elcome} to the updated and simplified documentation for using the IEEEtran \LaTeX\ class file. The IEEE has examined hundreds of author submissions using this package to help formulate this easy-to-follow guide. We will cover the most commonly used elements of a journal article. For less common elements we will refer back to the ``IEEEtran\_HOWTO.pdf''.

This document applies to version 1.8b of IEEEtran.

The IEEEtran template package contains the following example files:
\begin{list}{}{}
\item{bare\_jrnl.tex}
\item{bare\_conf.tex}
\item{bare\_jrnl\_compsoc.tex}
\item{bare\_conf\_compsoc.tex}
\item{bare\_jrnl\_comsoc.tex}
\end{list}
These are ``bare bones'' templates to quickly understand the document structure.

It is assumed that the reader has a basic working knowledge of \LaTeX. Those who are new to \LaTeX\ are encouraged to read Tobias Oetiker's ``The Not So Short Introduction to \LaTeX'', available at \url{http://tug.ctan.org/info/lshort/english/lshort.pdf}, which provides an overview of working with \LaTeX.

\section{The Design, Intent and \\ Limitations of the Templates}
\noindent The templates are intended to {\bf{approximate the final look and page length of the articles/papers}}. Therefore, {\bf{they are NOT intended to be the final produced work that is displayed in print or on IEEEXplore\textsuperscript{\textregistered}}}. They will help to give the authors an approximation of the number of pages that will be in the final version. The structure of the \LaTeX\ files, as designed, enables easy conversion to XML for the composition systems used by the IEEE's outsource vendors. The XML files are used to produce the final print/IEEEXplore\textsuperscript{\textregistered} pdf and are then converted to HTML for IEEEXplore\textsuperscript{\textregistered}. Have you looked at your article/paper in the HTML version?

\section{\LaTeX\ Distributions: Where to Get Them}
\noindent IEEE recommends using the distribution from the \TeX\ User Group at \url{http://www.tug.org}. You can join TUG and obtain a DVD distribution or download it for free from the links provided on their website: \url{http://www.tug.org/texlive/}. The DVD includes distributions for Windows, Mac OS X and Linux operating systems.

\section{Where to Get the IEEEtran Templates}
\noindent The {\bf{IEEE Template Selector}} will always have the most up-to-date versions of the \LaTeX\ and MSWord templates. Please see \url{https://template-selector.ieee.org/} and follow the steps to find the correct template for your intended publication. Many publications use the IEEEtran \LaTeX\ templates; however, some publications have their own special templates. Many of these are based on IEEEtran, but may have special instructions that vary slightly from those in this document.

\section{Where to Get \LaTeX\ Help -- User Groups}
\noindent The following on-line groups are very helpful to beginning and experienced \LaTeX\ users. A search through their archives can provide many answers to common questions.
\begin{list}{}{}
\item{\url{http://www.latex-community.org/}}
\item{\url{https://tex.stackexchange.com/}}
\end{list}

\section{Document Class Options in IEEEtran}
\noindent At the beginning of your \LaTeX\ file you will need to establish what type of publication style you intend to use. The following list shows the appropriate documentclass options for each of the types covered by IEEEtran.

\begin{list}{}{}
\item{Regular Journal Article}
\item{{\tt{$\backslash$documentclass[journal]{IEEEtran}}}}\\
\item{Conference Paper}
\item{{\tt{$\backslash$documentclass[conference]{IEEEtran}}}}\\
\item{Computer Society Journal Article}
\item{{\tt{$\backslash$documentclass[10pt,journal,compsoc]{IEEEtran}}}}\\
\item{Computer Society Conference Paper}
\item{{\tt{$\backslash$documentclass[conference,compsoc]{IEEEtran}}}}\\
\item{Communications Society Journal Article}
\item{{\tt{$\backslash$documentclass[journal,comsoc]{IEEEtran}}}}\\
\item{Brief, Correspondence or Technote}
\item{{\tt{$\backslash$documentclass[9pt,technote]{IEEEtran}}}}
\end{list}

There are other options available for each of these when submitting for peer review or other special requirements. IEEE recommends composing your article in the base 2-column format to make sure all your equations, tables and graphics will fit the final 2-column format. Please refer to the document ``IEEEtran\_HOWTO.pdf'' for more information on settings for peer review submission if required by your EIC.
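
For example, a peer-review draft might be set up as follows (a minimal sketch; IEEEtran's {\tt{peerreview}} option typesets a single-column review version, but confirm the exact option set required by your publication):
\begin{verbatim}
\documentclass[journal,peerreview]{IEEEtran}
\end{verbatim}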

\section{How to Create Common Front Matter}
\noindent The following sections describe general coding for these common elements. Computer Society publications and conferences may have their own special variations, which will be noted below.
\subsection{Paper Title}
\noindent The title of your paper is coded as:

\begin{verbatim}
\title{The Title of Your Paper}
\end{verbatim}

\noindent Please try to avoid the use of math or chemical formulas in your title if possible.

\subsection{Author Names and Affiliations}
\noindent The author section should be coded as follows:
\begin{verbatim}
\author{Masahito Hayashi
\IEEEmembership{Fellow, IEEE}, Masaki Owari
\thanks{M. Hayashi is with Graduate School
of Mathematics, Nagoya University, Nagoya,
Japan}
\thanks{M. Owari is with the Faculty of
Informatics, Shizuoka University,
Hamamatsu, Shizuoka, Japan.}
}
\end{verbatim}
Be sure to use the $\backslash$IEEEmembership command to identify IEEE membership status.
Please see the ``IEEEtran\_HOWTO.pdf'' for specific information on coding authors for conferences and Computer Society publications. Note that the closing curly brace for the author group comes at the end of the thanks group. This will prevent you from creating a blank first page.

\subsection{Running Heads}
\noindent The running heads are declared by using the $\backslash${\tt{markboth}} command. There are two arguments to this command: the first contains the journal name information and the second contains the author names and paper title.
\begin{verbatim}
\markboth{Journal of Quantum Electronics,
Vol. 1, No. 1, January 2021}
{Author1, Author2,
\MakeLowercase{\textit{(et al.)}:
Paper Title}}
\end{verbatim}

\subsection{Copyright Line}
\noindent For Transactions and Journals papers, this is not necessary to use at the submission stage of your paper. The IEEE production process will add the appropriate copyright line. If you are writing a conference paper, please see the ``IEEEtran\_HOWTO.pdf'' for specific information on how to code ``Publication ID Marks''.
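
As a sketch of what such a mark looks like (the actual string is supplied by the publication; the value below is only a placeholder), IEEEtran provides the $\backslash${\tt{IEEEpubid}} command; see the ``IEEEtran\_HOWTO.pdf'' for its exact placement and the companion $\backslash${\tt{IEEEpubidadjcol}} command:
\begin{verbatim}
\IEEEpubid{0000--0000/00\$00.00~
\copyright~2021 IEEE}
\end{verbatim}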

\subsection{Abstracts}
\noindent The abstract is the first element of a paper after the $\backslash${\tt{maketitle}} macro is invoked. The coding is simply:
\begin{verbatim}
\begin{abstract}
Text of your abstract.
\end{abstract}
\end{verbatim}
Please try to avoid mathematical and chemical formulas in the abstract.

\subsection{Index Terms}
\noindent The index terms are used to help other researchers discover your paper. Each society may have its own keyword set. Contact the EIC of your intended publication for this list.
\begin{verbatim}
\begin{IEEEkeywords}
Broad band networks, quality of service
\end{IEEEkeywords}
\end{verbatim}
\section{How to Create Common Body Elements}
\noindent The following sections describe common body text elements and how to code them.

\subsection{Initial Drop Cap Letter}
\noindent The first text paragraph uses a ``drop cap'' followed by the first word in ALL CAPS. This is accomplished by using the $\backslash${\tt{IEEEPARstart}} command as follows:
\begin{verbatim}
\IEEEPARstart{T}{his} is the first paragraph
of your paper. . .
\end{verbatim}

\subsection{Sections and Subsections}
\noindent Section headings use standard \LaTeX\ commands: $\backslash${\tt{section}}, $\backslash${\tt{subsection}} and $\backslash${\tt{subsubsection}}. Numbering is handled automatically for you and varies according to the type of publication. It is common not to indent the first paragraph following a section head, by using $\backslash${\tt{noindent}} as follows:
\begin{verbatim}
\section{Section Head}
\noindent The text of your paragraph . . .
\end{verbatim}

\subsection{Citations to the Bibliography}
\noindent Citations are coded with the \LaTeX\ $\backslash${\tt{cite}} command. This will produce individual bracketed reference numbers in the IEEE style. At the top of your \LaTeX\ file you should include:
\begin{verbatim}
\usepackage{cite}
\end{verbatim}
For a single citation, code as follows:
\begin{verbatim}
see \cite{ams}
\end{verbatim}
This will display as: see \cite{ams}\\

For multiple citations, code as follows:
\begin{verbatim}
\cite{ams,oxford,lacomp}
\end{verbatim}

This will display as \cite{ams,oxford,lacomp}

\subsection{Figures}
\noindent Figures are coded with the standard \LaTeX\ commands as follows:
\begin{verbatim}
\begin{figure}[!t]
\centering
\includegraphics[width=2.5in]{fig1}
\caption{This is the caption for one fig.}
\label{fig1}
\end{figure}
\end{verbatim}
The [!t] argument floats the figure to the top of the page, following IEEE style. Make sure you include:
\begin{verbatim}
\usepackage{graphicx}
\end{verbatim}

\noindent at the top of your \LaTeX\ file with the other package declarations.

To cross-reference your figures in the text, use the following code example:
\begin{verbatim}
See figure \ref{fig1} ...
\end{verbatim}
This will produce:\\
See figure \ref{fig1} . . .

\begin{figure}[!t]
\centering
\includegraphics[width=2.5in]{fig1}
\caption{This is the caption for one fig.}
\label{fig1}
\end{figure}

\subsection{Tables}
\noindent Tables should be coded with the standard \LaTeX\ coding. The following example shows a simple table.

\begin{verbatim}
\begin{table}
\begin{center}
\caption{Filter design equations ...}
\label{tab1}
\begin{tabular}{| c | c | c |}
\hline
Order & Arbitrary coefficients &
coefficients\\
of filter & $e_m$ & $b_{ij}$ \\
\hline
1& $b_{ij}=\hat{e}.\hat{\beta_{ij}}$,
& $b_{00}=0$\\
\hline
2&$\beta_{22}=(~1,-1,-1,~~1,~~1,~~1)$ &\\
\hline
3& $b_{ij}=\hat{e}.\hat{\beta_{ij}}$,
& $b_{00}=0$,\\
\hline
\end{tabular}
\end{center}
\end{table}
\end{verbatim}
To reference the table in the text, code as follows:
\begin{verbatim}Table~\ref{tab1} lists the closed-form...\end{verbatim}
to produce:

Table~\ref{tab1} lists the closed-form . . .

%moved here for pagination purposes
\begin{table}
\begin{center}
\caption{A Simple Table Example.}
\label{tab1}
\begin{tabular}{| c | c | c |}
\hline
Order & Arbitrary coefficients & coefficients\\
of filter & $e_m$ & $b_{ij}$ \\
\hline
1& $b_{ij}=\hat{e}.\hat{\beta_{ij}}$, & $b_{00}=0$\\
\hline
2&$\beta_{22}=(~1,-1,-1,~~1,~~1,~~1)$ &\\
\hline
3& $b_{ij}=\hat{e}.\hat{\beta_{ij}}$, & $b_{00}=0$,\\
\hline
\end{tabular}
\end{center}
\end{table}

\subsection{Lists}
\noindent In this section, we will consider three types of lists: simple unnumbered, numbered and bulleted. There have been numerous options added to IEEEtran to enhance the creation of lists. If your lists are more complex than those shown below, please refer to the ``IEEEtran\_HOWTO.pdf'' for additional options.\\

\noindent{\bf A plain unnumbered list}

\begin{list}{}{}
\item{bare\_jrnl.tex}
\item{bare\_conf.tex}
\item{bare\_jrnl\_compsoc.tex}
\item{bare\_conf\_compsoc.tex}
\item{bare\_jrnl\_comsoc.tex}
\end{list}

\noindent coded as:
\begin{verbatim}
\begin{list}{}{}
\item{bare\_jrnl.tex}
\item{bare\_conf.tex}
\item{bare\_jrnl\_compsoc.tex}
\item{bare\_conf\_compsoc.tex}
\item{bare\_jrnl\_comsoc.tex}
\end{list}
\end{verbatim}
\noindent{\bf A simple numbered list}

\begin{enumerate}
\item{bare\_jrnl.tex}
\item{bare\_conf.tex}
\item{bare\_jrnl\_compsoc.tex}
\item{bare\_conf\_compsoc.tex}
\item{bare\_jrnl\_comsoc.tex}
\end{enumerate}
\noindent coded as:
\begin{verbatim}
\begin{enumerate}
\item{bare\_jrnl.tex}
\item{bare\_conf.tex}
\item{bare\_jrnl\_compsoc.tex}
\item{bare\_conf\_compsoc.tex}
\item{bare\_jrnl\_comsoc.tex}
\end{enumerate}
\end{verbatim}

\noindent{\bf A simple bulleted list}

\begin{itemize}
\item{bare\_jrnl.tex}
\item{bare\_conf.tex}
\item{bare\_jrnl\_compsoc.tex}
\item{bare\_conf\_compsoc.tex}
\item{bare\_jrnl\_comsoc.tex}
\end{itemize}

\noindent coded as:

\begin{verbatim}
\begin{itemize}
\item{bare\_jrnl.tex}
\item{bare\_conf.tex}
\item{bare\_jrnl\_compsoc.tex}
\item{bare\_conf\_compsoc.tex}
\item{bare\_jrnl\_comsoc.tex}
\end{itemize}
\end{verbatim}

\subsection{Other Elements}
\noindent For other less common elements such as Algorithms, Theorems and Proofs, and Floating Structures such as page-wide tables, figures or equations, please refer to the ``IEEEtran\_HOWTO.pdf'' section on ``Double Column Floats.''

\section{How to Create Common Back Matter Elements}
\noindent The following sections demonstrate common back matter elements such as Acknowledgments, Bibliographies, Appendices and Author Biographies.

\subsection{Acknowledgments}
\noindent This should be a simple paragraph before the bibliography to thank those individuals and institutions who have supported your work on this article.

\begin{verbatim}
\section{Acknowledgments}
\noindent Text describing those who
supported your paper.
\end{verbatim}

\subsection{Bibliographies}
\noindent {\bf{References Simplified:}} A simple way of composing references is to use the $\backslash${\tt{bibitem}} macro to define the beginning of a reference, as in the following examples:\\

\noindent [6] H. Sira-Ramirez. ``On the sliding mode control of nonlinear systems,'' \textit{Systems \& Control Letters}, vol. 19, pp. 303--312, 1992.

\noindent coded as:
\begin{verbatim}
\bibitem{Sira3}
H. Sira-Ramirez. ``On the sliding mode
control of nonlinear systems,''
\textit{Systems \& Control Letters},
vol. 19, pp. 303--312, 1992.
\end{verbatim}

\noindent [7] A. Levant. ``Exact differentiation of signals with unbounded higher derivatives,'' in \textit{Proceedings of the 45th IEEE Conference on Decision and Control}, San Diego, California, USA, pp. 5585--5590, 2006.

\noindent coded as:
\begin{verbatim}\bibitem{Levant}
A. Levant. ``Exact differentiation of
signals with unbounded higher
derivatives,'' in \textit{Proceedings
of the 45th IEEE Conference on
Decision and Control}, San Diego,
California, USA, pp. 5585--5590, 2006.
\end{verbatim}

\noindent [8] M. Fliess, C. Join, and H. Sira-Ramirez. ``Non-linear estimation is easy,'' \textit{International Journal of Modelling, Identification and Control}, vol. 4, no. 1, pp. 12--27, 2008.

\noindent coded as:
\begin{verbatim}
\bibitem{Cedric}
M. Fliess, C. Join, and H. Sira-Ramirez.
``Non-linear estimation is easy,''
\textit{International Journal of Modelling,
Identification and Control}, vol. 4,
no. 1, pp. 12--27, 2008.
\end{verbatim}

\noindent [9] R. Ortega, A. Astolfi, G. Bastin, and H. Rodriguez. ``Stabilization of food-chain systems using a port-controlled Hamiltonian description,'' in \textit{Proceedings of the American Control Conference}, Chicago, Illinois, USA, pp. 2245--2249, 2000.

\noindent coded as:
\begin{verbatim}
\bibitem{Ortega}
R. Ortega, A. Astolfi, G. Bastin, and H.
Rodriguez. ``Stabilization of food-chain
systems using a port-controlled Hamiltonian
description,'' in \textit{Proceedings of the
American Control Conference}, Chicago,
Illinois, USA, pp. 2245--2249, 2000.
\end{verbatim}

\subsection{Accented Characters in References}
\noindent When using accented characters in references, please use the standard \LaTeX\ coding for accents. {\bf{Do not use math coding for character accents}}. For example:
\begin{verbatim}
\'e, \"o, \`a, \~e
\end{verbatim}
will produce: \'e, \"o, \`a, \~e

\subsection{Use of BibTeX}
\noindent If you wish to use BibTeX, please see the documentation that accompanies the IEEEtran Bibliography package.
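
A minimal sketch, assuming your references live in a file named refs.bib (a hypothetical file name), is to replace the hand-coded {\tt{thebibliography}} environment with:
\begin{verbatim}
\bibliographystyle{IEEEtran}
\bibliography{refs}
\end{verbatim}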

\subsection{Biographies and Author Photos}
\noindent Authors may choose whether or not to include a photo. Photos should be a bit-map graphic (.tif or .jpg) and sized to fit in the space allowed. Please see the coding samples below:
\begin{verbatim}
\begin{IEEEbiographynophoto}{Jane Doe}
Biography text here without a photo.
\end{IEEEbiographynophoto}
\end{verbatim}
or a biography with a photo

\begin{verbatim}
\begin{IEEEbiography}[{\includegraphics
[width=1in,height=1.25in,clip,
keepaspectratio]{fig1.png}}]
{IEEE Publications Technology Team}
In this paragraph you can place
your educational, professional background
and research and other interests.
\end{IEEEbiography}
\end{verbatim}

Please see the end of this document to see the output of these coding examples.

\section{Mathematical Typography \\ and Why It Matters}

\noindent Typographical conventions for mathematical formulas have been developed to {\bf provide uniformity and clarity of presentation across mathematical texts}. This enables the readers of those texts to both understand the author's ideas and to grasp new concepts quickly. While software such as \LaTeX\ and MathType\textsuperscript{\textregistered} can produce aesthetically pleasing math when used properly, it is also very easy to misuse the software, potentially resulting in incorrect math display.

IEEE aims to provide authors with the proper guidance on mathematical typesetting style and assist them in writing the best possible article.

As such, IEEE has assembled a set of examples of good and bad mathematical typesetting. You will see how various issues are dealt with. The following publications have been referenced in preparing this material:

\begin{list}{}{}
\item{\emph{Mathematics into Type}, published by the American Mathematical Society}
\item{\emph{The Printing of Mathematics}, published by Oxford University Press}
\item{\emph{The \LaTeX\ Companion}, by F. Mittelbach and M. Goossens}
\item{\emph{More Math into \LaTeX}, by G. Gr\"atzer}
\item{AMS-StyleGuide-online.pdf, published by the American Mathematical Society}
\end{list}

Further examples can be seen at \url{http://journals.ieeeauthorcenter.ieee.org/wp-content/uploads/sites/7/IEEE-Math-Typesetting-Guide.pdf}

\subsection{Display Equations}
\noindent A simple display equation example shown below uses the ``equation'' environment. To number the equations, use the $\backslash${\tt{label}} macro to create an identifier for the equation. \LaTeX\ will automatically number the equation for you.
\begin{equation}
\label{deqn_ex1}
x = \sum_{i=0}^{n} 2^{i} Q.
\end{equation}

\noindent is coded as follows:
\begin{verbatim}
\begin{equation}
\label{deqn_ex1}
x = \sum_{i=0}^{n} 2^{i} Q.
\end{equation}
\end{verbatim}

To reference this equation in the text, use the $\backslash${\tt{ref}} macro.
Please see (\ref{deqn_ex1})\\
\noindent is coded as follows:
\begin{verbatim}
Please see (\ref{deqn_ex1})\end{verbatim}
\subsection{Equation Numbering}
\noindent {\bf{Consecutive Numbering:}} Equations within an article are numbered consecutively from the beginning of the article to the end, i.e., (1), (2), (3), (4), (5), etc. Do not use roman numerals or section numbers for equation numbering.\\

\noindent {\bf{Appendix Equations:}} Continuing the consecutive numbering into the Appendix is best, but numbering the appendix equations as (A1), (A2), etc., is permissible.\\

\noindent {\bf{Hyphens and Periods}}: Hyphens and periods should not be used in equation numbers, i.e., use (1a) rather than (1-a) and (2a) rather than (2.a) for sub-equations. This should be consistent throughout the article.

\subsection{Multi-line Equations and Alignment}
\noindent Here we show several examples of multi-line equations and proper alignments.

\noindent {\bf{A single equation that must break over multiple lines due to length with no specific alignment.}}
\begin{multline}
\text{The first line of this example}\\
\text{The second line of this example}\\
\text{The third line of this example}
\end{multline}

\noindent is coded as:
\begin{verbatim}
\begin{multline}
\text{The first line of this example}\\
\text{The second line of this example}\\
\text{The third line of this example}
\end{multline}
\end{verbatim}

\noindent {\bf{A single equation with multiple lines aligned at the = signs}}
\begin{align}
a &= c+d \\
b &= e+f
\end{align}
\noindent is coded as:
\begin{verbatim}
\begin{align}
a &= c+d \\
b &= e+f
\end{align}
\end{verbatim}

The {\tt{align}} environment can align on multiple points as shown in the following example:
\begin{align}
x &= y & X & =Y & a &=bc\\
x' &= y' & X' &=Y' &a' &=bz
\end{align}
\noindent is coded as:
\begin{verbatim}
\begin{align}
x &= y & X & =Y & a &=bc\\
x' &= y' & X' &=Y' &a' &=bz
\end{align}
\end{verbatim}

\subsection{Subnumbering}
\noindent The amsmath package provides a {\tt{subequations}} environment to facilitate subnumbering. An example:

\begin{subequations}\label{eq:2}
\begin{align}
f&=g \label{eq:2A}\\
f' &=g' \label{eq:2B}\\
\mathcal{L}f &= \mathcal{L}g \label{eq:2C}
\end{align}
\end{subequations}

\noindent is coded as:
\begin{verbatim}
\begin{subequations}\label{eq:2}
\begin{align}
f&=g \label{eq:2A}\\
f' &=g' \label{eq:2B}\\
\mathcal{L}f &= \mathcal{L}g \label{eq:2C}
\end{align}
\end{subequations}
\end{verbatim}

\subsection{Matrices}
\noindent There are several useful matrix environments that can save you some keystrokes. See the example coding below and the output.

\noindent {\bf{A simple matrix:}}
\begin{equation}
\begin{matrix} 0 & 1 \\
1 & 0 \end{matrix}
\end{equation}
is coded as:
\begin{verbatim}
\begin{equation}
\begin{matrix} 0 & 1 \\
1 & 0 \end{matrix}
\end{equation}
\end{verbatim}

\noindent {\bf{A matrix with parentheses}}
\begin{equation}
\begin{pmatrix} 0 & -i \\
i & 0 \end{pmatrix}
\end{equation}
is coded as:
\begin{verbatim}
\begin{equation}
\begin{pmatrix} 0 & -i \\
i & 0 \end{pmatrix}
\end{equation}
\end{verbatim}

\noindent {\bf{A matrix with square brackets}}
\begin{equation}
\begin{bmatrix} 0 & -1 \\
1 & 0 \end{bmatrix}
\end{equation}
is coded as:
\begin{verbatim}
\begin{equation}
\begin{bmatrix} 0 & -1 \\
1 & 0 \end{bmatrix}
\end{equation}
\end{verbatim}

\noindent {\bf{A matrix with curly braces}}
\begin{equation}
\begin{Bmatrix} 1 & 0 \\
0 & -1 \end{Bmatrix}
\end{equation}
is coded as:
\begin{verbatim}
\begin{equation}
\begin{Bmatrix} 1 & 0 \\
0 & -1 \end{Bmatrix}
\end{equation}\end{verbatim}

\noindent {\bf{A matrix with single verticals}}
\begin{equation}
\begin{vmatrix} a & b \\
c & d \end{vmatrix}
\end{equation}
is coded as:
\begin{verbatim}
\begin{equation}
\begin{vmatrix} a & b \\
c & d \end{vmatrix}
\end{equation}\end{verbatim}

\noindent {\bf{A matrix with double verticals}}
\begin{equation}
\begin{Vmatrix} i & 0 \\
0 & -i \end{Vmatrix}
\end{equation}
is coded as:
\begin{verbatim}
\begin{equation}
\begin{Vmatrix} i & 0 \\
0 & -i \end{Vmatrix}
\end{equation}\end{verbatim}

\subsection{Arrays}
\noindent The {\tt{array}} environment allows you some options for matrix-like equations. You will have to manually key the fences, but you'll have options for alignment of the columns and for setting horizontal and vertical rules. The argument to {\tt{array}} controls alignment and placement of vertical rules.

A simple array
\begin{equation}
\left(
\begin{array}{cccc}
a+b+c & uv & x-y & 27\\
a+b & u+v & z & 134
\end{array}\right)
\end{equation}
is coded as:
\begin{verbatim}
\begin{equation}
\left(
\begin{array}{cccc}
a+b+c & uv & x-y & 27\\
a+b & u+v & z & 134
\end{array} \right)
\end{equation}
\end{verbatim}

A slight variation on this to better align the numbers in the last column
\begin{equation}
\left(
\begin{array}{cccr}
a+b+c & uv & x-y & 27\\
a+b & u+v & z & 134
\end{array}\right)
\end{equation}
is coded as:
\begin{verbatim}
\begin{equation}
\left(
\begin{array}{cccr}
a+b+c & uv & x-y & 27\\
a+b & u+v & z & 134
\end{array} \right)
\end{equation}
\end{verbatim}

An array with vertical and horizontal rules
\begin{equation}
\left( \begin{array}{c|c|c|r}
a+b+c & uv & x-y & 27\\ \hline
a+b & u+v & z & 134
\end{array}\right)
\end{equation}
is coded as:
\begin{verbatim}
\begin{equation}
\left(
\begin{array}{c|c|c|r}
a+b+c & uv & x-y & 27\\ \hline
a+b & u+v & z & 134
\end{array} \right)
\end{equation}
\end{verbatim}
Note the argument now has the pipe ``$\vert$'' included to indicate the placement of the vertical rules.

\subsection{Cases Structures}
\noindent Many times we find cases coded using the wrong environment, i.e., {\tt{array}}. Using the {\tt{cases}} environment will save keystrokes (from not having to type the $\backslash${\tt{left}}$\backslash${\tt{lbrace}}) and automatically provide the correct column alignment.
\begin{equation*}
{z_m(t)} = \begin{cases}
1,&{\text{if}}\ {\beta }_m(t),\\
{0,}&{\text{otherwise.}}
\end{cases}
\end{equation*}
\noindent is coded as follows:
\begin{verbatim}
\begin{equation*}
{z_m(t)} =
\begin{cases}
1,&{\text{if}}\ {\beta }_m(t),\\
{0,}&{\text{otherwise.}}
\end{cases}
\end{equation*}
\end{verbatim}
\noindent Note that the ``\&'' is used to mark the tabular alignment. This is important to get proper column alignment. Do not use $\backslash${\tt{quad}} or other fixed spaces to try and align the columns. Also, note the use of the $\backslash${\tt{text}} macro for text elements such as ``if'' and ``otherwise''.

\subsection{Function Formatting in Equations}
\noindent In many cases there is an easy way to properly format most common functions. Use of the $\backslash$ in front of the function name will, in most cases, provide the correct formatting. When this does not work, the following example provides a solution using the $\backslash${\tt{text}} macro.

\begin{equation*}
d_{R}^{KM} = \underset {d_{l}^{KM}} {\text{arg min}} \{ d_{1}^{KM},\ldots,d_{6}^{KM}\}.
\end{equation*}

\noindent is coded as follows:
\begin{verbatim}
\begin{equation*}
d_{R}^{KM} = \underset {d_{l}^{KM}}
{\text{arg min}} \{ d_{1}^{KM},
\ldots,d_{6}^{KM}\}.
\end{equation*}
\end{verbatim}
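
As an aside, many common functions already have predefined \LaTeX\ macros, so no $\backslash${\tt{text}} workaround is needed for them; a minimal sketch:
\begin{verbatim}
\begin{equation*}
y = \sin(x) + \log(n) + \max(a,b)
\end{equation*}
\end{verbatim}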

\subsection{Text Acronyms Inside Equations}
\noindent This example shows where the acronym ``MSE'' is coded using $\backslash${\tt{text\{\}}} to match how it appears in the text.

\begin{equation*}
\text{MSE} = \frac {1}{n}\sum _{i=1}^{n}(Y_{i} - \hat {Y_{i}})^{2}
\end{equation*}

\noindent is coded as follows:
\begin{verbatim}
\begin{equation*}
\text{MSE} = \frac {1}{n}\sum _{i=1}^{n}
(Y_{i} - \hat {Y_{i}})^{2}
\end{equation*}
\end{verbatim}

\subsection{Obsolete Coding}
\noindent Avoid the use of outdated environments, such as {\tt{eqnarray}} and \$\$ math delimiters, for display equations. The \$\$ display math delimiters are left over from PlainTeX and should not be used in \LaTeX, ever. Poor vertical spacing will result.
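
For example, a display equation coded with the obsolete delimiters
\begin{verbatim}
$$ E = mc^2 $$
\end{verbatim}
\noindent should instead be coded as
\begin{verbatim}
\[ E = mc^2 \]
\end{verbatim}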

\subsection{Use Appropriate Delimiters for Display Equations}
\noindent Some improper mathematical coding advice has been given in various YouTube\textsuperscript{TM} videos on how to write scholarly articles, so please follow these good examples:\\

For {\bf{single-line unnumbered display equations}}, please use the following delimiters:
\begin{verbatim}\[ . . . \]\end{verbatim}
or
\begin{verbatim}\begin{equation*} . . . \end{equation*}\end{verbatim}
Note that the * in the environment name turns off equation numbering.\\

For {\bf{multiline unnumbered display equations}} that have alignment requirements, please use the following delimiters:
\begin{verbatim}
\begin{align*} . . . \end{align*}
\end{verbatim}

For {\bf{single-line numbered display equations}}, please use the following delimiters:
\begin{verbatim}
\begin{equation} . . . \end{equation}
\end{verbatim}

For {\bf{multiline numbered display equations}}, please use the following delimiters:
\begin{verbatim}
\begin{align} . . . \end{align}
\end{verbatim}

\section{\LaTeX\ Package Suggestions}
\noindent Immediately after your documentclass declaration at the top of your \LaTeX\ file is the place where you should declare any packages that are being used. The following packages were used in the production of this document.
\begin{verbatim}
\usepackage{amsmath,amsfonts}
\usepackage{algorithmic}
\usepackage{array}
\usepackage[caption=false,font=normalsize,
labelfont=sf,textfont=sf]{subfig}
\usepackage{textcomp}
\usepackage{stfloats}
\usepackage{url}
\usepackage{verbatim}
\usepackage{graphicx}
\usepackage{balance}
\end{verbatim}

\section{Additional Advice}

Please use ``soft'' (e.g., \verb|\eqref{Eq}|) or \verb|(\ref{Eq})|
cross references instead of ``hard'' references (e.g., \verb|(1)|).
That will make it possible to combine sections, add equations, or
change the order of figures or citations without having to go through
the file line by line.
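
For example, with the {\tt{amsmath}} package loaded, the equation labeled earlier in this document can be referenced as:
\begin{verbatim}
As shown in \eqref{deqn_ex1} ...
\end{verbatim}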

Please note that the \verb|{subequations}| environment in {\LaTeX}
will increment the main equation counter even when there are no
equation numbers displayed. If you forget that, you might write an
article in which the equation numbers skip from (17) to (20), causing
the copy editors to wonder if you've discovered a new method of
counting.

{\BibTeX} does not work by magic. It doesn't get the bibliographic
data from thin air but from .bib files. If you use {\BibTeX} to produce a
bibliography you must send the .bib files.
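
A minimal sketch of a .bib entry (the key and all field values here are hypothetical):
\begin{verbatim}
@article{doe2021example,
  author  = {J. Doe},
  title   = {An Example Article},
  journal = {IEEE Transactions on
             Image Processing},
  volume  = {30},
  pages   = {1--8},
  year    = {2021}
}
\end{verbatim}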

{\LaTeX} can't read your mind. If you assign the same label to a
subsubsection and a table, you might find that Table I has been cross
referenced as Table IV-B3.
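
A common convention that avoids such clashes is to prefix labels by element type, for example:
\begin{verbatim}
\label{sec:results}  % for a section
\label{tab:results}  % for a table
\end{verbatim}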

{\LaTeX} does not have precognitive abilities. If you put a
\verb|\label| command before the command that updates the counter it's
supposed to be using, the label will pick up the last counter to be
cross referenced instead. In particular, a \verb|\label| command
should not go before the caption of a figure or a table.
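
That is, code the caption first and the label immediately after it (fig2 here is a hypothetical graphic):
\begin{verbatim}
\caption{The caption comes first.}
\label{fig2}
\end{verbatim}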

Please do not use \verb|\nonumber| or \verb|\notag| inside the
\verb|{array}| environment. It will not stop equation numbers inside
\verb|{array}| (there won't be any anyway) and it might stop a wanted
equation number in the surrounding equation.

\balance

\section{A Final Checklist}
\begin{enumerate}
\item{Make sure that your equations are numbered sequentially and there are no equation numbers missing or duplicated. Avoid hyphens and periods in your equation numbering. Stay with IEEE style, i.e., (1), (2), (3), or for sub-equations (1a), (1b), and for equations in the appendix (A1), (A2), etc.}
\item{Are your equations properly formatted? Text, functions, alignment points in cases and arrays, etc.}
\item{Make sure all graphics are included.}
\item{Make sure your references are included either in your main \LaTeX\ file or in a separate .bib file if calling an external file.}
\end{enumerate}

\begin{thebibliography}{1}

\bibitem{ams}
{\it{Mathematics into Type}}, American Mathematical Society, available online.

\bibitem{oxford}
T.W. Chaundy, P.R. Barrett and C. Batey, {\it{The Printing of Mathematics}}, Oxford University Press, London, 1954.

\bibitem{lacomp}
F. Mittelbach and M. Goossens, {\it{The \LaTeX\ Companion}}.

\bibitem{mmt}
G. Gr\"atzer, {\it{More Math into \LaTeX}}.

\bibitem{amstyle}
{\it{AMS-StyleGuide-online.pdf}}, American Mathematical Society.

\bibitem{Sira3}
H. Sira-Ramirez. ``On the sliding mode control of nonlinear systems,'' \textit{Systems \& Control Letters}, vol. 19, pp. 303--312, 1992.

\bibitem{Levant}
A. Levant. ``Exact differentiation of signals with unbounded higher derivatives,'' in \textit{Proceedings of the 45th IEEE Conference on Decision and Control}, San Diego, California, USA, pp. 5585--5590, 2006.

\bibitem{Cedric}
M. Fliess, C. Join, and H. Sira-Ramirez. ``Non-linear estimation is easy,'' \textit{International Journal of Modelling, Identification and Control}, vol. 4, no. 1, pp. 12--27, 2008.

\bibitem{Ortega}
R. Ortega, A. Astolfi, G. Bastin, and H. Rodriguez. ``Stabilization of food-chain systems using a port-controlled Hamiltonian description,'' in \textit{Proceedings of the American Control Conference}, Chicago, Illinois, USA, pp. 2245--2249, 2000.

\end{thebibliography}

\begin{IEEEbiographynophoto}{Jane Doe}
Biography text here without a photo.
\end{IEEEbiographynophoto}

\begin{IEEEbiography}[{\includegraphics[width=1in,height=1.25in,clip,keepaspectratio]{fig1.png}}]{IEEE Publications Technology Team}
In this paragraph you can place your educational, professional background and research and other interests.\end{IEEEbiography}

\end{document}

main.bbl
@@ -1,5 +1,5 @@
% Generated by IEEEtran.bst, version: 1.14 (2015/08/26)
\begin{thebibliography}{1}
\begin{thebibliography}{10}
\providecommand{\url}[1]{#1}
\csname url@samestyle\endcsname
\providecommand{\newblock}{\relax}
@@ -21,7 +21,7 @@
\providecommand{\BIBdecl}{\relax}
\BIBdecl

\bibitem{canny1986computational}
\bibitem{cannyedge}
J.~Canny, ``A computational approach to edge detection,'' \emph{IEEE
Transactions on pattern analysis and machine intelligence}, no.~6, pp.
679--698, 1986.
@@ -36,4 +36,240 @@ K.~Kluge and S.~Lakshmanan, ``A deformable-template approach to lane
detection,'' in \emph{Proceedings of the Intelligent Vehicles' 95.
Symposium}.\hskip 1em plus 0.5em minus 0.4em\relax IEEE, 1995, pp. 54--59.

\bibitem{yolov10}
A.~Wang, H.~Chen, L.~Liu, K.~Chen, Z.~Lin, J.~Han, and G.~Ding, ``Yolov10:
Real-time end-to-end object detection,'' \emph{arXiv preprint
arXiv:2405.14458}, 2024.

\bibitem{fasterrcnn}
S.~Ren, K.~He, R.~Girshick, and J.~Sun, ``Faster r-cnn: Towards real-time
object detection with region proposal networks,'' \emph{IEEE transactions on
pattern analysis and machine intelligence}, vol.~39, no.~6, pp. 1137--1149,
2016.

\bibitem{laneatt}
L.~Tabelini, R.~Berriel, T.~M. Paixao, C.~Badue, A.~F. De~Souza, and
T.~Oliveira-Santos, ``Keep your eyes on the lane: Real-time attention-guided
lane detection,'' in \emph{Proceedings of the IEEE/CVF conference on computer
vision and pattern recognition}, 2021, pp. 294--302.

\bibitem{clrnet}
T.~Zheng, Y.~Huang, Y.~Liu, W.~Tang, Z.~Yang, D.~Cai, and X.~He, ``Clrnet:
Cross layer refinement network for lane detection,'' in \emph{Proceedings of
the IEEE/CVF conference on computer vision and pattern recognition}, 2022,
pp. 898--907.

\bibitem{adnet}
L.~Xiao, X.~Li, S.~Yang, and W.~Yang, ``Adnet: Lane shape prediction via anchor
decomposition,'' in \emph{Proceedings of the IEEE/CVF International
Conference on Computer Vision}, 2023, pp. 6404--6413.

\bibitem{srlane}
C.~Chen, J.~Liu, C.~Zhou, J.~Tang, and G.~Wu, ``Sketch and refine: Towards fast
and accurate lane detection,'' in \emph{Proceedings of the AAAI Conference on
Artificial Intelligence}, vol.~38, no.~2, 2024, pp. 1001--1009.

\bibitem{tusimple}
\BIBentryALTinterwordspacing
{TuSimple}, ``Tusimple benchmark,'' 2020, accessed: September 2020. [Online].
Available: \url{https://github.com/TuSimple/tusimple-benchmark/}
\BIBentrySTDinterwordspacing

\bibitem{scnn}
X.~Pan, J.~Shi, P.~Luo, X.~Wang, and X.~Tang, ``Spatial as deep: Spatial cnn
for traffic scene understanding,'' in \emph{Proceedings of the AAAI
conference on artificial intelligence}, vol.~32, no.~1, 2018.

\bibitem{llamas}
K.~Behrendt and R.~Soussan, ``Unsupervised labeled lane markers using maps,''
in \emph{Proceedings of the IEEE/CVF international conference on computer
vision workshops}, 2019, pp. 0--0.

\bibitem{curvelanes}
H.~Xu, S.~Wang, X.~Cai, W.~Zhang, X.~Liang, and Z.~Li, ``Curvelane-nas:
Unifying lane-sensitive architecture search and adaptive point blending,'' in
\emph{Computer Vision--ECCV 2020: 16th European Conference, Glasgow, UK,
August 23--28, 2020, Proceedings, Part XV 16}.\hskip 1em plus 0.5em minus
0.4em\relax Springer, 2020, pp. 689--704.

\bibitem{dalnet}
Z.~Yu, Q.~Liu, W.~Wang, L.~Zhang, and X.~Zhao, ``Dalnet: A rail detection
network based on dynamic anchor line,'' \emph{IEEE Transactions on
Instrumentation and Measurement}, 2024.

\bibitem{lanenet}
Z.~Wang, W.~Ren, and Q.~Qiu, ``Lanenet: Real-time lane detection networks for
autonomous driving,'' \emph{arXiv preprint arXiv:1807.01726}, 2018.

\bibitem{ufld}
Z.~Qin, H.~Wang, and X.~Li, ``Ultra fast structure-aware deep lane detection,''
in \emph{Computer Vision--ECCV 2020: 16th European Conference, Glasgow, UK,
August 23--28, 2020, Proceedings, Part XXIV 16}.\hskip 1em plus 0.5em minus
0.4em\relax Springer, 2020, pp. 276--291.

\bibitem{ufldv2}
Z.~Qin, P.~Zhang, and X.~Li, ``Ultra fast deep lane detection with hybrid
anchor driven ordinal classification,'' \emph{IEEE transactions on pattern
analysis and machine intelligence}, vol.~46, no.~5, pp. 2555--2568, 2022.

\bibitem{condlanenet}
L.~Liu, X.~Chen, S.~Zhu, and P.~Tan, ``Condlanenet: a top-to-down lane
detection framework based on conditional convolution,'' in \emph{Proceedings
of the IEEE/CVF international conference on computer vision}, 2021, pp.
3773--3782.

\bibitem{fololane}
Z.~Qu, H.~Jin, Y.~Zhou, Z.~Yang, and W.~Zhang, ``Focus on local: Detecting
lane marker from bottom up via key point,'' in \emph{Proceedings of the
IEEE/CVF conference on computer vision and pattern recognition}, 2021, pp.
14\,122--14\,130.

\bibitem{ganet}
J.~Wang, Y.~Ma, S.~Huang, T.~Hui, F.~Wang, C.~Qian, and T.~Zhang, ``A
keypoint-based global association network for lane detection,'' in
\emph{Proceedings of the IEEE/CVF conference on computer vision and pattern
recognition}, 2022, pp. 1392--1401.

\bibitem{polylanenet}
L.~Tabelini, R.~Berriel, T.~M. Paixao, C.~Badue, A.~F. De~Souza, and
T.~Oliveira-Santos, ``Polylanenet: Lane estimation via deep polynomial
regression,'' in \emph{2020 25th International Conference on Pattern
Recognition (ICPR)}.\hskip 1em plus 0.5em minus 0.4em\relax IEEE, 2021, pp.
6150--6156.

\bibitem{lstr}
R.~Liu, Z.~Yuan, T.~Liu, and Z.~Xiong, ``End-to-end lane shape prediction with
transformers,'' in \emph{Proceedings of the IEEE/CVF winter conference on
applications of computer vision}, 2021, pp. 3694--3702.

\bibitem{bezierlanenet}
Z.~Feng, S.~Guo, X.~Tan, K.~Xu, M.~Wang, and L.~Ma, ``Rethinking efficient lane
detection via curve modeling,'' in \emph{Proceedings of the IEEE/CVF
Conference on Computer Vision and Pattern Recognition}, 2022, pp.
17\,062--17\,070.

\bibitem{yolox}
Z.~Ge, S.~Liu, F.~Wang, Z.~Li, and J.~Sun, ``Yolox: Exceeding yolo series in
2021,'' \emph{arXiv preprint arXiv:2107.08430}, 2021.

\bibitem{sparse}
J.~Liu, Z.~Zhang, M.~Lu, H.~Wei, D.~Li, Y.~Xie, J.~Peng, L.~Tian, A.~Sirasao,
and E.~Barsoum, ``Sparse laneformer,'' \emph{arXiv preprint
arXiv:2404.07821}, 2024.

\bibitem{clrernet}
H.~Honda and Y.~Uchida, ``Clrernet: improving confidence of lane detection with
laneiou,'' in \emph{Proceedings of the IEEE/CVF Winter Conference on
Applications of Computer Vision}, 2024, pp. 1176--1185.

\bibitem{detr}
N.~Carion, F.~Massa, G.~Synnaeve, N.~Usunier, A.~Kirillov, and S.~Zagoruyko,
``End-to-end object detection with transformers,'' in \emph{European
conference on computer vision}.\hskip 1em plus 0.5em minus 0.4em\relax
Springer, 2020, pp. 213--229.

\bibitem{o2o}
P.~Sun, Y.~Jiang, E.~Xie, W.~Shao, Z.~Yuan, C.~Wang, and P.~Luo, ``What makes
for end-to-end object detection?'' in \emph{International Conference on
Machine Learning}.\hskip 1em plus 0.5em minus 0.4em\relax PMLR, 2021, pp.
9934--9944.

\bibitem{learnnms}
J.~Hosang, R.~Benenson, and B.~Schiele, ``Learning non-maximum suppression,''
in \emph{Proceedings of the IEEE conference on computer vision and pattern
recognition}, 2017, pp. 4507--4515.

\bibitem{date}
Y.~Chen, Q.~Chen, Q.~Hu, and J.~Cheng, ``Date: Dual assignment for end-to-end
fully convolutional object detection,'' \emph{arXiv preprint
arXiv:2211.13859}, 2022.

\bibitem{o3d}
J.~Wang, L.~Song, Z.~Li, H.~Sun, J.~Sun, and N.~Zheng, ``End-to-end object
detection with fully convolutional network,'' in \emph{Proceedings of the
IEEE/CVF conference on computer vision and pattern recognition}, 2021, pp.
15\,849--15\,858.

\bibitem{relationnet}
H.~Hu, J.~Gu, Z.~Zhang, J.~Dai, and Y.~Wei, ``Relation networks for object
detection,'' in \emph{Proceedings of the IEEE conference on computer vision
and pattern recognition}, 2018, pp. 3588--3597.

\bibitem{linecnn}
X.~Li, J.~Li, X.~Hu, and J.~Yang, ``Line-cnn: End-to-end traffic line detection
with line proposal unit,'' \emph{IEEE Transactions on Intelligent
Transportation Systems}, vol.~21, no.~1, pp. 248--258, 2019.

\bibitem{vil100}
Y.~Zhang, L.~Zhu, W.~Feng, H.~Fu, M.~Wang, Q.~Li, C.~Li, and S.~Wang,
``Vil-100: A new dataset and a baseline model for video instance lane
detection,'' in \emph{Proceedings of the IEEE/CVF international conference on
computer vision}, 2021, pp. 15\,681--15\,690.

\bibitem{xu2022overview}
Z.-Q.~J. Xu, Y.~Zhang, and T.~Luo, ``Overview frequency principle/spectral bias
in deep learning,'' \emph{arXiv preprint arXiv:2201.07395}, 2022.

\bibitem{stewart2016end}
R.~Stewart, M.~Andriluka, and A.~Y. Ng, ``End-to-end people detection in
crowded scenes,'' in \emph{Proceedings of the IEEE conference on computer
vision and pattern recognition}, 2016, pp. 2325--2333.

\bibitem{yolact}
D.~Bolya, C.~Zhou, F.~Xiao, and Y.~J. Lee, ``Yolact: Real-time instance
segmentation,'' in \emph{Proceedings of the IEEE/CVF international conference
on computer vision}, 2019, pp. 9157--9166.

\bibitem{alemi2016deep}
A.~A. Alemi, I.~Fischer, J.~V. Dillon, and K.~Murphy, ``Deep variational
information bottleneck,'' \emph{arXiv preprint arXiv:1612.00410}, 2016.

\bibitem{focal}
T.-Y. Lin, P.~Goyal, R.~Girshick, K.~He, and P.~Doll{\'a}r, ``Focal loss for
dense object detection,'' in \emph{Proceedings of the IEEE international
conference on computer vision}, 2017, pp. 2980--2988.

\bibitem{pss}
Q.~Zhou and C.~Yu, ``Object detection made simpler by eliminating heuristic
nms,'' \emph{IEEE Transactions on Multimedia}, vol.~25, pp. 9254--9262, 2023.

\bibitem{adam}
D.~P. Kingma and J.~Ba, ``Adam: A method for stochastic optimization,''
\emph{arXiv preprint arXiv:1412.6980}, 2014.

\bibitem{resnet}
K.~He, X.~Zhang, S.~Ren, and J.~Sun, ``Deep residual learning for image
recognition,'' in \emph{Proceedings of the IEEE conference on computer vision
and pattern recognition}, 2016, pp. 770--778.

\bibitem{dla}
F.~Yu, D.~Wang, E.~Shelhamer, and T.~Darrell, ``Deep layer aggregation,'' in
\emph{Proceedings of the IEEE conference on computer vision and pattern
recognition}, 2018, pp. 2403--2412.

\bibitem{resa}
T.~Zheng, H.~Fang, Y.~Zhang, W.~Tang, Z.~Yang, H.~Liu, and D.~Cai, ``Resa:
Recurrent feature-shift aggregator for lane detection,'' in \emph{Proceedings
of the AAAI conference on artificial intelligence}, vol.~35, no.~4, 2021, pp.
3547--3554.

\bibitem{laneaf}
H.~Abualsaud, S.~Liu, D.~B. Lu, K.~Situ, A.~Rangesh, and M.~M. Trivedi,
``Laneaf: Robust multi-lane detection with affinity fields,'' \emph{IEEE
Robotics and Automation Letters}, vol.~6, no.~4, pp. 7477--7484, 2021.

\bibitem{bsnet}
H.~Chen, M.~Wang, and Y.~Liu, ``Bsnet: Lane detection via draw b-spline curves
nearby,'' \emph{arXiv preprint arXiv:2301.06910}, 2023.

\bibitem{enetsad}
Y.~Hou, Z.~Ma, C.~Liu, and C.~C. Loy, ``Learning lightweight lane detection
cnns by self attention distillation,'' in \emph{Proceedings of the IEEE/CVF
international conference on computer vision}, 2019, pp. 1013--1021.

\bibitem{pointlanenet}
Z.~Chen, Q.~Liu, and C.~Lian, ``Pointlanenet: Efficient end-to-end cnns for
accurate real-time lane detection,'' in \emph{2019 IEEE intelligent vehicles
symposium (IV)}.\hskip 1em plus 0.5em minus 0.4em\relax IEEE, 2019, pp.
2563--2568.

\end{thebibliography}

main.tex
@@ -18,8 +18,8 @@
\usepackage{booktabs}
\usepackage{tikz}
\usepackage{tabularx}
\usepackage[table,xcdraw]{xcolor}
\usepackage[colorlinks,bookmarksopen,bookmarksnumbered, linkcolor=red]{hyperref}
% \usepackage[table,xcdraw]{xcolor}

\definecolor{darkgreen}{RGB}{17,159,27} % or define the dark green with other RGB values
\aboverulesep=0pt
@@ -56,7 +56,7 @@ Lane detection, NMS-free, Graph neural network, Polar coordinate system.
\section{Introduction}
\IEEEPARstart{L}{ane} detection is a significant problem in computer vision and autonomous driving, forming the basis for accurately perceiving the driving environment in intelligent driving systems. While extensive research has been conducted in ideal environments, it remains a challenging task in adverse scenarios such as night driving, glare, crowded traffic, and rain, where lanes may be occluded or damaged. Moreover, the slender shapes, complex topologies, and global extent of lanes add to the difficulty of detection. An effective lane detection method should take into account both global high-level semantic features and local low-level features to address these varied conditions and ensure robust performance in real-time applications such as autonomous driving.

Traditional methods predominantly concentrate on handcrafted local feature extraction and lane shape modeling. Techniques such as the Canny edge detector\cite{canny1986computational}, Hough transform\cite{houghtransform}, and deformable templates for lane fitting\cite{kluge1995deformable} have been extensively utilized. Nevertheless, these approaches often encounter limitations in practical settings, particularly when low-level and local features lack clarity or distinctiveness.
Traditional methods predominantly concentrate on handcrafted local feature extraction and lane shape modeling. Techniques such as the Canny edge detector\cite{cannyedge}, Hough transform\cite{houghtransform}, and deformable templates for lane fitting\cite{kluge1995deformable} have been extensively utilized. Nevertheless, these approaches often encounter limitations in practical settings, particularly when low-level and local features lack clarity or distinctiveness.

In recent years, fueled by advancements in deep learning and the availability of large datasets, significant strides have been made in lane detection. Deep models, including convolutional neural networks (CNNs) and transformer-based architectures, have propelled progress in this domain. Previous approaches often treated lane detection as a segmentation task, which, though simple, incurs time-intensive computation. Some methods relied on parameter-based models, directly outputting lane curve parameters instead of pixel locations. These models offer end-to-end solutions, but the sensitivity of the curve parameters to lane shape compromises robustness.

@ -120,19 +120,19 @@ In recent years, fueled by advancements in deep learning and the availability of
|
||||
|
||||
|
||||
|
||||
Drawing inspiration from object detection methods such as Yolos \cite{} and Faster RCNN \cite{}, several anchor-based approaches have been introduced for lane detection, the representative work including LanesATT \cite{} and CLRNet \cite{}. These methods have demonstrated superior performance by leveraging anchor priors and enabling larger receptive fields for feature extraction. However, anchor-based methods encounter similar drawbacks as anchor-based general object detection method as follows:
|
||||
Drawing inspiration from object detection methods such as Yolos \cite{yolov10} and Faster RCNN \cite{fasterrcnn}, several anchor-based approaches have been introduced for lane detection, the representative work including LaneATT \cite{laneatt} and CLRNet \cite{clrnet}. These methods have demonstrated superior performance by leveraging anchor priors and enabling larger receptive fields for feature extraction. However, anchor-based methods encounter similar drawbacks as anchor-based general object detection method as follows:
|
||||
|
||||
(1) A large amount of lane anchors are set among the image even in sparse scenarios.
|
||||
|
||||
(2) Non-maximum suppression (NMS) postprocessing is necessary for the remove of redundant prediction but may fail in dense scenarios.

Regarding the first issue, \cite{clrnet} introduced learned anchors, whose parameters are optimized during training to adapt to the lane distributions in real datasets (see Fig. \ref{anchor setting} (b)). Additionally, they employ cascade cross-layer anchor refinement to bring the anchors closer to the ground truth. However, numerous anchors are still required to cover the potential distributions of lanes. Moving further, \cite{adnet} proposes flexible anchors for each image by generating start points, rather than using a fixed set of anchors for all images. Nevertheless, the start points of lanes are subjective and lack clear visual evidence due to the global nature of lanes, which affects performance. \cite{srlane} uses a local angle map to propose sketch anchors according to the direction of the ground truth. This approach considers only the direction and neglects the accurate positioning of anchors, resulting in suboptimal performance without cascade anchor refinement. Overall, numerous anchors are unnecessary in sparse scenarios (where lane ground truths are sparse). The trend in newly proposed methods is to reduce the number of anchors and offer more flexible anchor configurations.

Regarding the second issue, nearly all anchor-based methods (including those mentioned above) require direct or indirect Non-Maximum Suppression (NMS) post-processing to eliminate redundant predictions. Although it is necessary to eliminate redundant predictions, NMS remains a suboptimal solution. On the one hand, NMS is not deployment-friendly because it involves defining and calculating distances (e.g., Intersection over Union) between lane pairs. This is more challenging than bounding boxes in general object detection due to the complexity of lane geometry. On the other hand, NMS fails in some dense scenarios where the lane ground truths are closer together compared to sparse scenarios. A larger distance threshold may result in false negatives, as some true positive predictions might be eliminated (as shown in Fig. \ref{nms setting} (a) and (b)) by mistake. Conversely, a smaller distance threshold may not eliminate redundant predictions effectively and can leave false positives (as shown in Fig. \ref{nms setting} (c) and (d)). Achieving an optimal trade-off in all scenarios by manually setting the distance threshold is challenging. The root cause of this problem is that the distance definition in NMS considers only geometric parameters while ignoring the semantic context in the image. Thus, when two predictions are “close” to each other, it is nearly impossible to determine whether one of them is redundant.

To address the two issues outlined above, we propose PolarRCNN, a novel anchor-based method for lane detection. For the first issue, we introduce local and global heads based on the polar coordinate system to create anchors with more accurate locations and reduce the number of proposed anchors in sparse scenarios, as illustrated in Fig. \ref{anchor setting} (c). Compared to state-of-the-art previous work \cite{clrnet} which uses 192 anchors, PolarRCNN employs only 20 anchors to cover potential lane ground truths. For the second issue, we have revised FastNMS to Graph-based FastNMS and introduced a new heuristic graph neural network block (Polar GNN block) integrated into the non-maximum suppression (NMS) head. The Polar GNN block offers a more interpretable structure compared to traditional NMS, achieving nearly equivalent performance in sparse scenarios and superior performance in dense scenarios. We conducted experiments on five major benchmarks: TuSimple \cite{tusimple}, CULane \cite{scnn}, LLAMAS \cite{llamas}, CurveLanes \cite{curvelanes}, and DL-Rail \cite{dalnet}. Our proposed method demonstrates competitive performance compared to state-of-the-art methods.

Our main contributions are summarized as follows:
\section{Related Works}
Lane detection aims to detect lane instances in an image. In this section, we introduce only deep-learning-based lane detection methods, which can be categorized into segmentation-based, parameter-based, and anchor-based methods.

\textbf{Segmentation-based Methods.} Segmentation-based methods focus on pixel-wise prediction. They classify each pixel into different categories according to lane instances and background \cite{lanenet}, predicting information pixel by pixel. However, these methods overly focus on low-level and local features, neglecting global semantic information and real-time performance. SCNN uses a larger receptive field to overcome this problem. Some methods, such as UFLDv1 and v2 \cite{ufld}\cite{ufldv2} and CondLaneNet \cite{condlanenet}, utilize row-wise or column-wise classification instead of pixel classification to improve detection speed. Another issue with these methods is that the lane instance prior is learned by the model itself, leading to a lack of prior knowledge. LaneNet uses post-clustering to distinguish each lane instance. UFLD divides lane instances by angles and locations and can only detect a fixed number of lanes. CondLaneNet utilizes different conditional dynamic kernels to predict different lane instances. Some methods, such as FOLOLane \cite{fololane} and GANet \cite{ganet}, use bottom-up strategies to detect a few key points and model their global relations to form lane instances.

\textbf{Parameter-based Methods.} Instead of predicting a series of point locations or pixel classes, parameter-based methods directly generate the curve parameters of lane instances. PolyLaneNet \cite{polylanenet} and LSTR \cite{lstr} consider the lane instance as a polynomial curve and output the polynomial coefficients directly. BézierLaneNet \cite{bezierlanenet} treats the lane instance as a Bézier curve and generates the locations of the curve's control points. BSLane uses B-splines to describe the lane, with curve parameters that focus on the local shapes of lanes. Parameter-based methods are mostly end-to-end without postprocessing, which grants them faster speed. However, since the final visual lane shapes are sensitive to the curve parameters, the robustness and generalization of parameter-based methods may be less than ideal.

\textbf{Anchor-based Methods.} Inspired by general object detection methods like YOLO \cite{yolov10} and Faster R-CNN \cite{fasterrcnn}, anchor-based approaches have been proposed for lane detection. Line-CNN is, to our knowledge, the earliest method that utilizes line anchors for detecting lanes. These lines are designed as rays emitted from three edges (left, bottom, and right) of an image. However, the model's receptive field is limited to the edges, making it slower compared to some other methods. LaneATT \cite{laneatt} improves upon this by employing anchor-based feature pooling to aggregate features along the entire line anchor, achieving faster speeds and better performance. Nevertheless, its grid sampling strategy and label assignment pose limitations. CLRNet \cite{clrnet} enhances anchor-based performance with cross-layer refinement strategies, SimOTA label assignment \cite{yolox}, and a LIoU loss, surpassing many previous methods. A key advantage of anchor-based methods is their adaptability, allowing the integration of strategies from anchor-based general object detection, such as label assignment, bounding box refinement, and GIoU loss. However, existing anchor-based lane detection methods also have notable drawbacks. Line anchors are often handcrafted and numerous, which can be cumbersome. Some approaches, such as ADNet \cite{adnet}, SRLane \cite{srlane}, and Sparse Laneformer \cite{sparse}, attempt to reduce the number of anchors and provide proposals, but this can slightly impact performance. Additionally, methods such as \cite{clrernet}\cite{adnet} still rely on NMS postprocessing, complicating NMS threshold settings and model deployment. Although one-to-one label assignment during training, without NMS during evaluation \cite{detr}\cite{o2o}, alleviates this issue, its performance remains less satisfactory compared to NMS-based models.
\begin{figure*}[ht]
\centering
\label{overall_architecture}
\end{figure*}

\textbf{NMS-free Object Detection.} Non-Maximum Suppression (NMS) is an important postprocessing step in most general object detection methods. DETR \cite{detr} employs one-to-one label assignment to avoid redundant predictions without using NMS. Other NMS-free methods \cite{learnnms} have also been proposed, addressing this issue from two aspects: model architecture and label assignment. Studies \cite{date}\cite{yolov10} suggest that one-to-one assignment is crucial for NMS-free prediction, but maintaining one-to-many assignment is still necessary to ensure effective feature learning. Other works \cite{o3d}\cite{relationnet} consider the model's expressive capacity to provide non-redundant predictions. However, few studies have analyzed the NMS-free paradigm for anchor-based lane detection as thoroughly as in general object detection, and most anchor-based lane detection methods still rely on NMS postprocessing. In our work, besides label assignment, we extend the analysis to the detection head's structure, focusing on achieving non-redundant (NMS-free) lane predictions.

In this work, we aim to address the two issues in anchor-based lane detection mentioned above: the sparse lane anchor setting and NMS-free prediction.
\section{Method}
The overall architecture of PolarRCNN is illustrated in Fig. \ref{overall_architecture}. Our model adheres to the Faster R-CNN \cite{fasterrcnn} framework, consisting of a backbone, an FPN (Feature Pyramid Network), an RPN (Region Proposal Network), and RoI (Region of Interest) pooling. To investigate the fundamental factors affecting model performance, such as anchor settings and NMS postprocessing, and to make the model easier to deploy, PolarRCNN employs a simple and straightforward network structure. It relies on basic components, including convolutional layers, MLPs (Multi-Layer Perceptrons), and pooling operations, deliberately excluding advanced elements such as attention mechanisms, dynamic kernels, and cross-layer refinement used in previous works \cite{clrnet}\cite{clrernet}.
\begin{table}[h]
\centering
\toprule
\textbf{Variable} & \textbf{Type} & \hspace{10em}\textbf{Definition} \\
\midrule
$\mathbf{P}_{i}$ & tensor& The $i_{th}$ output feature map from FPN\\
$H^{L}$& scalar& The height of the local polar map\\
$W^{L}$& scalar& The width of the local polar map\\
$K_{A}$ & scalar& The number of anchors selected during evaluation\\
$\mathbf{c}^{G}$& tensor& The origin point of the global polar coordinate\\
$\mathbf{c}^{L}$& tensor& The origin point of the local polar coordinate\\
$r^{G}_{i}$& scalar& The $i_{th}$ anchor radius under the global polar coordinate\\
$r^{L}_{i}$& scalar& The $i_{th}$ anchor radius under the local polar coordinate\\
$\theta_{i}$& scalar& The $i_{th}$ anchor angle under the global/local polar coordinate\\
\midrule
$\mathbf{X}^{pool}_{i}$& tensor& The pooling feature of the $i_{th}$ anchor\\
$N^{nbr}_{i}$& set& The adjacent node set of the $i_{th}$ anchor node\\
$C_{o2m}$ & scalar& The positive threshold of the one-to-many confidence\\
$C_{o2o}$ & scalar& The positive threshold of the one-to-one confidence\\
\midrule
\subsection{Lane and Line Anchor Representation}
Lanes are characterized by their thin, elongated, curved shapes. A suitable lane prior aids the model in extracting features, predicting locations, and modeling the shapes of lane curves with greater accuracy. In line with previous works \cite{linecnn}\cite{laneatt}, our lane priors (also referred to as lane anchors) consist of straight lines. We sample a sequence of 2D points along each lane anchor, denoted as $P \doteq \left\{ \left( x_1, y_1 \right), \left( x_2, y_2 \right), \dots, \left( x_N, y_N \right) \right\}$, where $N$ is the number of sampled points. The y-coordinates of these points are uniformly sampled along the vertical axis of the image, specifically $y_i=\frac{H}{N-1} \cdot i$, where $H$ is the image height. The same y-coordinates are used to sample the ground truth lane, and the model is tasked with regressing the x-coordinate offset from the line anchor to the lane instance ground truth. The primary distinction between PolarRCNN and previous approaches lies in the description of the (straight-line) lane anchors, which is detailed in the following sections.
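
As a concrete illustration (a minimal sketch of the description above, not the release code; the helper names are hypothetical), the fixed y-grid and the x-offset regression targets can be computed as:

\begin{verbatim}
import numpy as np

def anchor_y_grid(H, N):
    # y_i = H / (N - 1) * i: N rows uniformly
    # spaced over the image height H
    return np.arange(N) * H / (N - 1)

def x_offset_targets(anchor_xs, gt_xs):
    # regression target: x-offset from the line
    # anchor to the ground truth, both sampled
    # on the shared y-grid
    return gt_xs - anchor_xs
\end{verbatim}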

\textbf{Polar Coordinate System.} Since lane anchors are typically straight lines, they can be described by straight-line parameters. Previous approaches have used rays to describe 2D lane anchors, parameterized by the coordinates of a starting point and an orientation angle, denoted as $\left\{\theta, P_{xy}\right\}$, as shown in Fig. \ref{coord} (a). \cite{linecnn}\cite{laneatt} define the start points as lying on the three image boundaries, whereas \cite{adnet} argues that this is problematic because the actual starting point of a lane could be located anywhere within the image. In our analysis, using a ray leads to ambiguity in the line representation: a line contains infinitely many candidate starting points, and the choice of starting point for a lane is subjective. As illustrated in Fig. \ref{coord} (a), the yellow point (the visual start point) and the green point (the point located on the image boundary), with the same orientation $\theta$, describe the same line, and either could be used depending on the dataset \cite{scnn}\cite{vil100}. This ambiguity arises because a straight line has two degrees of freedom, whereas a ray has three. To resolve this issue, we propose using polar coordinates to describe a lane anchor with only two parameters, radius and angle, denoted as $\left\{\theta, r\right\}$, where $\theta \in \left[-\frac{\pi}{2}, \frac{\pi}{2}\right)$ and $r \in \left(-\infty, +\infty\right)$. This representation is illustrated in Fig. \ref{coord} (b).
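
To make the degree-of-freedom argument concrete, the following sketch (ours; the angle convention is an assumption) collapses a ray, i.e., a start point plus an orientation $\alpha$, into the unique polar pair $\left\{\theta, r\right\}$; any start point on the same line yields the same pair:

\begin{verbatim}
import numpy as np

def ray_to_polar(px, py, alpha):
    # theta: normal angle of the line through
    # (px, py) with orientation alpha
    theta = alpha - np.pi / 2
    r = px * np.cos(theta) + py * np.sin(theta)
    if not -np.pi / 2 <= theta < np.pi / 2:
        theta -= np.pi * np.sign(theta)
        r = -r  # same line, wrapped angle
    return theta, r

def polar_line_xs(theta, r, ys):
    # x at each sampled y (theta != -pi/2,
    # i.e., non-horizontal lines)
    return (r - ys * np.sin(theta)) / np.cos(theta)
\end{verbatim}

For instance, the yellow and green start points in Fig. \ref{coord} (a) map to the same $\left\{\theta, r\right\}$ under this conversion.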
\begin{figure}[t]
\centering
\subsection{Local Polar Head}
\textbf{Anchor Formulation in the Local Polar Head.} Inspired by the region proposal network in Faster R-CNN \cite{fasterrcnn}, the local polar head (LPH) aims to propose flexible, high-quality anchors around the lane ground truths within an image. As Fig. \ref{lph} and Fig. \ref{overall_architecture} demonstrate, the highest level $P_{3} \in \mathbb{R}^{C_{f} \times H_{f} \times W_{f}}$ of the FPN feature maps is selected as the input to the LPH. Following a downsampling operation, the feature map is fed into two branches: the regression branch $\phi _{reg}^{lph}\left(\cdot \right)$ and the classification branch $\phi _{cls}^{lph}\left(\cdot \right)$.
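
A minimal sketch of such a two-branch head is given below (PyTorch-style; the pooling choice and channel widths are illustrative assumptions, not the exact configuration):

\begin{verbatim}
import torch.nn as nn

class LocalPolarHead(nn.Module):
    def __init__(self, c_in=64, h_l=4, w_l=10):
        super().__init__()
        # downsample P3 to the H^L x W^L map
        self.down = nn.AdaptiveAvgPool2d((h_l, w_l))
        self.reg = nn.Conv2d(c_in, 2, 1)  # theta, r
        self.cls = nn.Conv2d(c_in, 1, 1)  # confidence

    def forward(self, p3):
        x = self.down(p3)
        return self.reg(x), self.cls(x).sigmoid()
\end{verbatim}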
\begin{equation}
\begin{aligned}
\label{gph}
\end{figure}

Let $P_{0}$, $P_{1}$, and $P_{2}$ denote the last three levels of the FPN, and let $\boldsymbol{F}_{L}^{s}\in \mathbb{R} ^{N_p\times d_f}$ represent the features of the sample points taken from level $P_{L}$. The grid features from the three levels are extracted and fused together without the cross-layer cascade refinement used in CLRNet. To reduce the number of parameters, we employ a weighted-sum strategy to combine features from different layers, similar to \cite{detr} but in a more compact form:
\begin{equation}
\begin{aligned}
\end{aligned}
\end{equation}
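
The weighted-sum fusion above admits a direct sketch (the softmax normalization over levels is our assumption; shapes follow the notation in the text):

\begin{verbatim}
import torch

def fuse_levels(feats, weights):
    # feats: list of 3 tensors, each (N_p, d_f)
    # weights: learnable tensor of shape (3, N_p)
    w = torch.softmax(weights, dim=0)
    stacked = torch.stack(feats, dim=0)
    return (w.unsqueeze(-1) * stacked).sum(dim=0)
\end{verbatim}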

\textbf{Triplet Head.} The triplet head comprises three distinct heads: the one-to-one classification (O2O cls) head, the one-to-many classification (O2M cls) head, and the one-to-many regression (O2M Reg) head. In various studies \cite{laneatt}\cite{clrnet}\cite{adnet}\cite{srlane}, the detection head predominantly follows the one-to-many paradigm. During the training phase, multiple positive samples are assigned to a single ground truth. Consequently, during the evaluation stage, redundant detection results are often predicted for each instance. These redundancies are typically addressed using Non-Maximum Suppression (NMS), which eliminates duplicate results and retains the highest confidence detection. However, NMS relies on the definition of distance between detection results, and this calculation can be complex for curved lanes and other irregular geometric shapes. To achieve non-redundant detection results (NMS-free), the one-to-one paradigm becomes crucial during training, as highlighted in \cite{o2o}. Nevertheless, merely adopting the one-to-one paradigm is insufficient; the structure of the detection head also plays a pivotal role in achieving NMS-free detection. This aspect will be further analyzed in the following sections.
\begin{algorithm}[t]
\caption{The Algorithm of Graph-based FastNMS}
where $d\left(\cdot, \cdot, \cdot, \cdot \right)$ is some predefined function to quantify the distance between two lane predictions.
\STATE Define the adjacency matrix $\boldsymbol{T}=\boldsymbol{C}\land\boldsymbol{M}$; the final confidence $\tilde{s}_i$ is calculated as follows:
\begin{equation}
\begin{aligned}
\tilde{s}_i = \begin{cases}
1, & \text{if } \underset{j \in \{ j \mid T_{ij} = 1 \}}{\max} D_{ij} < \delta_{\tau} \\
0, & \text{otherwise}
\end{cases}
\end{aligned}
\label{al_1-4}
\end{equation}
\RETURN The final confidence $\tilde{s}_i$; % the return value of the algorithm
\end{algorithmic}
\label{Graph FastNMS}
\end{equation}

Equation \ref{sharp fun} suggests that $f_{cls}^{plain}$ needs to be ``sharp'' enough to differentiate between two similar features. That is, if the output of $f_{cls}^{plain}$ must change rapidly over short distances, then $f_{cls}^{plain}$ needs to capture higher-frequency information. This issue is also discussed in \cite{o3d}. Capturing high frequencies with a plain structure is challenging because a naive MLP tends to capture lower-frequency information \cite{xu2022overview}. In the most extreme case, where $\boldsymbol{F}_{i}^{roi} = \boldsymbol{F}_{j}^{roi}$, it becomes impossible to separate the two anchors into positive and negative samples; in practice, both confidences converge to around 0.5. This problem arises from the limitations of the input format and the structure of the naive MLP, which restrict its expressive capability for higher-frequency information. Therefore, it is crucial to establish relationships between anchors and to design a new model structure that can effectively represent ``sharp'' information.

It is easy to see that the ``ideal'' one-to-one branch is equivalent to the O2M cls branch combined with O2M regression and NMS postprocessing. If the NMS could be replaced by an equivalent but learnable function (e.g., a neural network with a specific structure), the O2O head could be trained to handle the one-to-one assignment. However, NMS involves sequential iteration and confidence sorting, which are challenging to reproduce with a neural network. Previous works, such as RNN-based approaches \cite{stewart2016end}, have attempted this, but they are time-consuming and introduce additional complexity into training due to their iterative nature. To eliminate the iterative process, we propose an equivalent form of FastNMS \cite{yolact}.

The key rule of NMS postprocessing is as follows: given a series of positive detections with redundancy, a detected lane A is suppressed by another detected lane B if and only if:
It is straightforward to demonstrate that, when all elements of $\boldsymbol{M}$ are set to 1 (i.e., geometric priors are ignored), Graph-based FastNMS is equivalent to FastNMS. Building upon the newly proposed Graph-based FastNMS, we can design the structure of the one-to-one classification head so that it mirrors the principles of Graph-based FastNMS.
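
A compact sketch of Graph-based FastNMS follows (our rendering of Alg. \ref{Graph FastNMS}; $D$ plays the role of a lane-IoU-like similarity and $M$ is the geometric-prior mask, both assumed precomputed):

\begin{verbatim}
import numpy as np

def graph_fast_nms(scores, D, M, delta_tau):
    order = np.argsort(-scores)
    rank = np.empty_like(order)
    rank[order] = np.arange(len(scores))
    # C[i, j] = 1 iff j has higher confidence
    C = rank[:, None] > rank[None, :]
    T = C & M.astype(bool)  # adjacency T = C ^ M
    worst = np.where(T, D, -np.inf).max(axis=1)
    return (worst < delta_tau).astype(float)
\end{verbatim}

Setting \texttt{M} to all ones reduces this to plain FastNMS, matching the equivalence noted above.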

According to the analysis of the shortcomings of traditional NMS postprocessing shown in Fig. \ref{nms setting}, the fundamental issue arises from the definition of the distance between predictions. Traditional NMS relies on geometric properties to define these distances, which neglects the contextual semantics in the image. For example, in some scenarios, two predicted lanes with a small geometric distance should not suppress each other, such as in the case of double lines or forked lines. Although setting a threshold $\delta_{\tau}$ can mitigate this problem, it is challenging to strike a balance between precision and recall.

To address this, we replace the explicit definition of the distance function with an implicit graph neural network. Additionally, the anchor coordinates are replaced with the anchor features $\boldsymbol{F}_{i}^{roi}$. According to information bottleneck theory \cite{alemi2016deep}, $\boldsymbol{F}_{i}^{roi}$, which contains the location and classification information, is sufficient for a neural network to model the explicit geometric distance. Beyond the geometric information, $\boldsymbol{F}_{i}^{roi}$ also contains the contextual information of an anchor, which provides additional clues for establishing implicit distances between two anchors. The implicit distance is expressed as follows:
\begin{equation}
\begin{aligned}
\end{equation}
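
A sketch of this implicit pairwise distance is shown below (the layer sizes and the simple concatenation of the two RoI features are illustrative assumptions):

\begin{verbatim}
import torch
import torch.nn as nn

class ImplicitDistance(nn.Module):
    def __init__(self, d_f=64, d_h=64):
        super().__init__()
        self.edge = nn.Sequential(
            nn.Linear(2 * d_f, d_h),
            nn.ReLU(),
            nn.Linear(d_h, 1))

    def forward(self, f_roi):  # (K, d_f)
        K = f_roi.size(0)
        a = f_roi.unsqueeze(1).expand(K, K, -1)
        b = f_roi.unsqueeze(0).expand(K, K, -1)
        pair = torch.cat([a, b], dim=-1)
        return self.edge(pair).squeeze(-1)  # D_ij
\end{verbatim}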

\textbf{Label Assignment and Cost Function.} We use a label assignment (SimOTA) similar to previous works \cite{clrnet}\cite{clrernet}. However, to make the cost function more compact and consistent with general object detection works \cite{ref3}, we redefine the lane IoU. As illustrated in Fig. \ref{glaneiou}, the newly defined lane IoU, which we refer to as GLaneIoU, is defined as follows:
\begin{figure}[t]
\centering
\end{aligned}
\end{equation}

The definitions of $d_{i}^{\mathcal{O}}$ and $d_{i}^{\xi}$ are similar to, but slightly different from, those in \cite{clrnet} and \cite{adnet}, with adjustments made to ensure the values are non-negative. This format is intended to maintain consistency with the IoU definitions used for bounding boxes. The overall GLaneIoU is then given as follows:
\begin{equation}
\begin{aligned}
GLaneIoU\,\,=\,\,\frac{\sum\nolimits_{i=j}^k{d_{i}^{\mathcal{O}}}}{\sum\nolimits_{i=j}^k{d_{i}^{\mathcal{U}}}}-g\frac{\sum\nolimits_{i=j}^k{d_{i}^{\xi}}}{\sum\nolimits_{i=j}^k{d_{i}^{\mathcal{U}}}}
\end{aligned}
\end{equation}
where $j$ and $k$ are the indices of the valid points (the start point and the end point). It is straightforward to observe that when $g=0$, GLaneIoU corresponds to the IoU for bounding boxes, with a value range of $\left[0, 1 \right]$; when $g=1$, it corresponds to the GIoU for bounding boxes, with a value range of $\left(-1, 1 \right]$. In general, when $g>0$, the value range of GLaneIoU is $\left(-g, 1 \right]$.
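
The sketch below computes GLaneIoU under our reading of these definitions (each lane is widened to half-width $w$ at every sampled row, and $d^{\mathcal{U}}$ is taken as the enclosing span, which reproduces the stated value range $\left(-g, 1\right]$):

\begin{verbatim}
import numpy as np

def glane_iou(xs_a, xs_b, w=7.5, g=1.0):
    la, ra = xs_a - w, xs_a + w
    lb, rb = xs_b - w, xs_b + w
    inner = np.minimum(ra, rb) - np.maximum(la, lb)
    d_o = np.clip(inner, 0.0, None)    # overlap
    d_xi = np.clip(-inner, 0.0, None)  # gap
    d_u = np.maximum(ra, rb) - np.minimum(la, lb)
    return (d_o.sum() - g * d_xi.sum()) / d_u.sum()
\end{verbatim}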

We then define the cost function between the $i_{th}$ prediction and the $j_{th}$ ground truth, similar to \cite{detr}, as follows:
\begin{equation}
\begin{aligned}
\mathcal{C} _{ij}=\left(s_i\right)^{\beta_c}\times \left( GLaneIoU_{ij, g=0} \right) ^{\beta_r}
\end{aligned}
\end{equation}
This cost function is more compact than those in previous works and takes both location and confidence into account. For label assignment, SimOTA (with $k=4$) \cite{yolox} is used for the two O2M heads under the one-to-many assignment, while the Hungarian algorithm \cite{detr} is employed for the O2O classification head under the one-to-one assignment.
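
For illustration, the one-to-one assignment step can be sketched as follows (the $\beta_c$ and $\beta_r$ values are placeholders, and \texttt{iou} denotes the $GLaneIoU_{g=0}$ matrix between predictions and ground truths):

\begin{verbatim}
import numpy as np
from scipy.optimize import linear_sum_assignment

def o2o_assign(scores, iou, b_c=1.0, b_r=1.0):
    # cost C_ij is a matching gain (higher is
    # better), so Hungarian minimizes -C
    cost = (scores[:, None] ** b_c) * (iou ** b_r)
    pred_idx, gt_idx = linear_sum_assignment(-cost)
    return pred_idx, gt_idx
\end{verbatim}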

\textbf{Loss Function.} We use focal loss \cite{focal} for the O2O cls head and the O2M cls head:
\begin{equation}
\begin{aligned}
\mathcal{L} _{o2m,cls}&=-\sum_{i\in \varOmega _{pos}^{o2m}}{\alpha _{o2m}\left( 1-s_i \right) ^{\gamma}\log \left( s_i \right)}\\&-\sum_{i\in \varOmega _{neg}^{o2m}}{\left( 1-\alpha _{o2m} \right) \left( s_i \right) ^{\gamma}\log \left( 1-s_i \right)}
\varOmega _{pos}^{o2o}\cup \varOmega _{neg}^{o2o}=\left\{ i|s_i>C_{o2m} \right\}
\end{aligned}
\end{equation}
only samples with confidence larger than $C_{o2m}$ are chosen as candidate samples for the O2O cls head. Following \cite{pss}, to maintain feature quality during training, the gradients of the O2O cls head are stopped from propagating back to the rest of the network (i.e., to the anchor RoI features $\boldsymbol{F}_{i}^{roi}$). Additionally, we use a rank loss to increase the gap between the positive and negative confidences of the O2O cls head:
\begin{equation}
\begin{aligned}
&\mathcal{L} _{\,\,rank}=\frac{1}{N_{rank}}\sum_{i\in \varOmega _{pos}^{o2o}}{\sum_{j\in \varOmega _{neg}^{o2o}}{\max \left( 0, \tau _{rank}-\tilde{s}_i+\tilde{s}_j \right)}}\\
\end{equation}
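
The rank loss admits a direct vectorized sketch (the normalizer $N_{rank}$ is taken here as the number of positive-negative pairs, an assumption on our part):

\begin{verbatim}
import torch

def rank_loss(pos, neg, tau_rank=0.5):
    # pos: (P,) positive O2O confidences
    # neg: (N,) negative O2O confidences
    margin = tau_rank - pos[:, None] + neg[None, :]
    n_rank = max(pos.numel() * neg.numel(), 1)
    return torch.clamp(margin, min=0).sum() / n_rank
\end{verbatim}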

We directly use the GLaneIoU loss $\mathcal{L}_{GLaneIoU}$ (with $g=1$) to regress the x-coordinate offsets, and a Smooth-L1 loss, denoted $\mathcal{L}_{end}$, for the regression of the end points (namely, the y-coordinates of the start point and the end point). To encourage the model to learn global features, we propose the auxiliary loss illustrated in Fig. \ref{auxloss}:
\begin{align}
\mathcal{L}_{aux} &= \frac{1}{\left| \varOmega_{pos}^{o2m} \right| N_{seg}} \sum_{i \in \varOmega_{pos}^{o2o}} \sum_{m=j}^k \Bigg[ l \left( \theta_i - \hat{\theta}_{i}^{seg,m} \right) \\
&\quad + l \left( r_{i}^{global} - \hat{r}_{i}^{seg,m} \right) \Bigg]
\end{align}
The anchors and ground truths are divided into several segments, and each anchor segment is regressed toward the main components of the corresponding segment of its assigned ground truth. This helps the anchors capture the global geometric shape of lanes.
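
A sketch of this auxiliary loss is given below (our rendering; $l$ is taken as a Smooth-L1 penalty, and the per-segment ground-truth parameters are assumed precomputed):

\begin{verbatim}
import torch
import torch.nn.functional as F

def aux_loss(theta, r_g, seg_theta, seg_r):
    # theta, r_g: (P,) polar params of positives
    # seg_theta, seg_r: (P, N_seg) segment GTs
    lt = F.smooth_l1_loss(
        theta[:, None].expand_as(seg_theta),
        seg_theta)
    lr = F.smooth_l1_loss(
        r_g[:, None].expand_as(seg_r), seg_r)
    return lt + lr
\end{verbatim}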
& Train &88,880/$55,698^{*}$&3,268 &58,269&5,435&100,000\\
& Validation &9,675 &358 &20,844&- &20,000 \\
& Test &34,680&2,782 &20,929&1,569&- \\
& Resolution &$1640\times590$&$1280\times720$&$1276\times717$&$1920\times1080$&$2560\times1440$, etc.\\
& Lane &$\leqslant4$&$\leqslant5$&$\leqslant4$&$=2$&$\leqslant10$\\
& Environment &urban and highway & highway&highway&railway&urban and highway\\
& Distribution &sparse&sparse&sparse&sparse&sparse and dense\\
\midrule
& Rank loss &0.7&0.7&0.1&0.7&0 \\
\midrule
\multirow{4}*{Evaluation Parameter}
& Polar map size &$4\times10$&$4\times10$&$4\times10$&$4\times10$&$6\times13$\\
& Top anchor selection &20&20&20&12&50\\
& o2m conf thres &0.48&0.40&0.40&0.40&0.45\\
& o2o conf thres &0.46&0.46&0.46&0.46&0.44\\
\subsection{Datasets and Evaluation Metrics}
We conducted experiments on four widely used lane detection benchmarks and one rail detection dataset: CULane \cite{scnn}, TuSimple \cite{tusimple}, LLAMAS \cite{llamas}, CurveLanes \cite{curvelanes}, and DL-Rail \cite{dalnet}. Among these, CULane and CurveLanes are particularly challenging: the CULane dataset covers various scenarios but has sparse lane distributions, whereas CurveLanes includes a large number of curved and dense lane types, such as forked and double lanes. The DL-Rail dataset, focused on rail detection across different scenarios, was chosen to evaluate our model's performance beyond traditional lane detection. Details of the five datasets are shown in Tab. \ref{dataset_info}.

We use the F1-score to evaluate our model on the CULane, LLAMAS, DL-Rail, and CurveLanes datasets, maintaining consistency with previous works. The F1-score is defined as follows:
\begin{equation}
\begin{aligned}
F1\,\,=\,\,\frac{2\times Precision\times Recall}{Precision+Recall}\\
Precision\,\,=\,\,\frac{TP}{TP+FP}\\
Recall\,\,=\,\,\frac{TP}{TP+FN}
\end{aligned}
\end{equation}
In our experiments, we use different IoU thresholds to compute the F1-score on different datasets: F1@50 and F1@75 for CULane \cite{clrnet}, F1@50 for LLAMAS \cite{clrnet} and CurveLanes \cite{condlanenet}, and F1@50, F1@75, and mF1 for DL-Rail \cite{dalnet}. The mF1 is defined as:
\begin{equation}
\begin{aligned}
mF1=\left( F1@50+F1@55+\cdots+F1@95 \right) /10
\end{aligned}
\end{equation}
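
Both metrics reduce to a few lines once predictions have been matched to ground truths (the TP/FP/FN matching itself is assumed done beforehand):

\begin{verbatim}
def f1_score(tp, fp, fn):
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

def mf1(f1_list):
    # f1_list: [F1@50, F1@55, ..., F1@95]
    return sum(f1_list) / len(f1_list)
\end{verbatim}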

For TuSimple, the evaluation is formulated as follows:
\begin{equation}
\begin{aligned}
Accuracy=\frac{\sum{C_{clip}}}{\sum{S_{clip}}}
\end{aligned}
\end{equation}
where $C_{clip}$ and $S_{clip}$ represent the number of correctly predicted points (within 20 pixels of the ground truth) and the number of ground truth points, respectively. A prediction is considered correct if its accuracy exceeds 85\%. TuSimple also reports the false positive rate ($FP=1-Precision$) and the false negative rate ($FN=1-Recall$).
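
A per-lane sketch of the point accuracy reads as follows (our illustration; NaN marks rows without a ground-truth point):

\begin{verbatim}
import numpy as np

def tusimple_point_acc(pred_xs, gt_xs, thr=20.0):
    valid = ~np.isnan(gt_xs)  # rows with GT
    hits = np.abs(pred_xs[valid]
                  - gt_xs[valid]) < thr
    return hits.sum() / max(valid.sum(), 1)
\end{verbatim}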

\subsection{Implementation Details}
All input images are cropped and resized to $800\times320$. Similar to \cite{clrnet}, we apply random affine transformations and random horizontal flips. For optimization, we use the AdamW \cite{adam} optimizer with a learning-rate warm-up and a cosine decay strategy. The initial learning rate is set to 0.006. The numbers of sampled points and regression points for each lane anchor are set to 36 and 72, respectively. Other parameters, such as the batch size and loss weights for each dataset, are detailed in Tab. \ref{dataset_info}. Since some test/validation sets of the five datasets are not accessible, the test/validation sets used are also listed in Tab. \ref{dataset_info}. All experiments are conducted on a single NVIDIA A100-40G GPU. To keep our model simple, we use only CNN-based backbones, namely ResNet \cite{resnet} and DLA34 \cite{dla}.
\begin{table*}[htbp]
\hline
\textbf{Seg \& Grid} \\
\cline{1-1}
SCNN\cite{scnn} &VGG-16 &71.60&39.84&90.60&69.70&58.50&66.90&43.40&84.10&64.40&1900&66.10\\
RESA\cite{resa} &ResNet50 &75.30&53.39&92.10&73.10&69.20&72.80&47.70&83.30&70.30&1503&69.90\\
LaneAF\cite{laneaf} &DLA34 &77.41&- &91.80&75.61&71.78&79.12&51.38&86.88&72.70&1360&73.03\\
UFLDv2\cite{ufldv2} &ResNet34 &76.0 &- &92.5 &74.8 &65.5 &75.5 &49.2 &88.8 &70.1 &1910&70.8 \\
CondLaneNet\cite{condlanenet} &ResNet101&79.48&61.23&93.47&77.44&70.93&80.91&54.13&90.16&75.21&1201&74.80\\
\cline{1-1}
\textbf{Parameter} \\
\cline{1-1}
BézierLaneNet\cite{bezierlanenet} &ResNet18&73.67&-&90.22&71.55&62.49&70.91&45.30&84.09&58.98&\textbf{996} &68.70\\
BSNet\cite{bsnet} &DLA34 &80.28&-&93.87&78.92&75.02&82.52&54.84&90.73&74.71&1485&75.59\\
Eigenlanes\cite{enginlanes} &ResNet50&77.20&-&91.7 &76.0 &69.8 &74.1 &52.2 &87.7 &62.9 &1509&71.8 \\
\cline{1-1}
\textbf{Keypoint} \\
\cline{1-1}
CurveLanes-NAS-L\cite{curvelanes} &- &74.80&-&90.70&72.30&67.70&70.10&49.40&85.80&68.40&1746&68.90\\
FOLOLane\cite{fololane} &ResNet18 &78.80&-&92.70&77.80&75.20&79.30&52.10&89.00&69.40&1569&74.50\\
GANet-L\cite{ganet} &ResNet101&79.63&-&93.67&78.66&71.82&78.32&53.38&89.86&77.37&1352&73.85\\
\cline{1-1}
\textbf{Dense Anchor} \\
\cline{1-1}
LaneATT\cite{laneatt} &ResNet18 &75.13&51.29&91.17&72.71&65.82&68.03&49.13&87.82&63.75&1020&68.58\\
LaneATT\cite{laneatt} &ResNet122&77.02&57.50&91.74&76.16&69.47&76.31&50.46&86.29&64.05&1264&70.81\\
CLRNet\cite{clrnet} &ResNet18 &79.58&62.21&93.30&78.33&73.71&79.66&53.14&90.25&71.56&1321&75.11\\
CLRNet\cite{clrnet} &DLA34 &80.47&62.78&93.73&79.59&75.30&82.51&54.58&90.62&74.13&1155&75.37\\
CLRerNet\cite{clrernet} &DLA34 &81.12&64.07&94.02&80.20&74.41&\textbf{83.71}&56.27&90.39&74.67&1161&\textbf{76.53}\\
|
||||
\cline{1-1}
\textbf{Sparse Anchor} \\
\cline{1-1}
ADNet\cite{adnet} &ResNet34&78.94&-&92.90&77.45&71.71&79.11&52.89&89.90&70.64&1499&74.78\\
SRLane\cite{srlane} &ResNet18&79.73&-&93.52&78.58&74.13&81.90&55.65&89.50&75.27&1412&74.58\\
Sparse Laneformer\cite{sparse} &ResNet50&77.83&-&- &- &- &- &- &- &- &- &- \\
\hline
\textbf{Proposed Method} \\
\cline{1-1}
PolarRCNN-NMS &ResNet18&80.81&63.96&94.12&79.57&76.53&83.33&55.06&90.62&79.50&1088&75.25\\
PolarRCNN &ResNet18&80.81&63.96&94.12&79.57&76.53&83.33&55.06&90.62&79.50&1088&75.25\\
PolarRCNN &ResNet34&80.92&63.97&94.24&79.76&76.70&81.93&55.40&\textbf{91.12}&79.85&1158&75.71\\
PolarRCNN &ResNet50&81.34&64.77&94.45&\textbf{80.42}&75.82&83.61&56.62&91.10&80.05&1356&75.94\\
PolarRCNN-NMS &DLA34 &\textbf{81.49}&64.96&\textbf{94.44}&80.36&\textbf{76.83}&83.68&56.53&90.85&\textbf{80.09}&1135&76.32\\
PolarRCNN &DLA34 &\textbf{81.49}&\textbf{64.97}&\textbf{94.44}&80.36&\textbf{76.79}&83.68&\textbf{56.52}&90.85&\textbf{80.09}&1133&76.32\\
\bottomrule
\end{tabular}
\end{adjustbox}
\label{culane result}
\end{table*}
\begin{table}[h]
\centering
\caption{TuSimple test results compared with other methods}
\begin{adjustbox}{width=\linewidth}
\begin{tabular}{lrcccc}
\toprule
\textbf{Method}& \textbf{Backbone}& \textbf{Acc(\%)}&\textbf{F1(\%)}&\textbf{FP(\%)}&\textbf{FN(\%)} \\
\midrule
SCNN\cite{scnn} &VGG16 &96.53&95.97&6.17&\textbf{1.80}\\
PolyLaneNet\cite{polylanenet}&EfficientNetB0&93.36&90.62&9.42&9.33\\
UFLDv2\cite{ufldv2} &ResNet34 &88.08&95.73&18.84&3.70\\
LaneATT\cite{laneatt} &ResNet34 &95.63&96.77&3.53&2.92\\
FOLOLane\cite{fololane} &ERFNet &\textbf{96.92}&96.59&4.47&2.28\\
CondLaneNet\cite{condlanenet}&ResNet101 &96.54&97.24&2.01&3.50\\
CLRNet\cite{clrnet} &ResNet18 &96.84&97.89&2.28&1.92\\
\midrule
PolarRCNN-NMS &ResNet18&96.21&\textbf{97.98}&2.17&1.86\\
PolarRCNN &ResNet18&96.20&97.94&2.25&1.87\\
\bottomrule
\end{tabular}
\end{adjustbox}
\label{tusimple result}
\end{table}
\begin{table}[h]
\centering
\caption{LLAMAS test results compared with other methods}
\begin{adjustbox}{width=\linewidth}
\begin{tabular}{lrccc}
\toprule
\textbf{Method}& \textbf{Backbone}&\textbf{F1@50(\%)}&\textbf{Precision(\%)}&\textbf{Recall(\%)} \\
\midrule
SCNN\cite{scnn} &ResNet34&94.25&94.11&94.39\\
BézierLaneNet\cite{bezierlanenet} &ResNet34&95.17&95.89&94.46\\
LaneATT\cite{laneatt} &ResNet34&93.74&96.79&90.88\\
LaneAF\cite{laneaf} &DLA34 &96.07&96.91&95.26\\
DALNet\cite{dalnet} &ResNet34&96.12&\textbf{96.83}&95.42\\
CLRNet\cite{clrnet} &DLA34 &96.12&- &- \\
\midrule
PolarRCNN-NMS &ResNet18&96.05&96.80&95.32\\
PolarRCNN &ResNet18&96.06&96.81&95.32\\
PolarRCNN-NMS &DLA34&96.13&96.80&\textbf{95.47}\\
PolarRCNN &DLA34&\textbf{96.14}&96.82&\textbf{95.47}\\
\bottomrule
\end{tabular}
\end{adjustbox}
\label{llamas result}
\end{table}

\begin{table}[h]
\centering
\caption{DL-Rail test results compared with other methods}
\begin{adjustbox}{width=\linewidth}
\begin{tabular}{lrccc}
\toprule
\textbf{Method}& \textbf{Backbone}&\textbf{mF1(\%)}&\textbf{F1@50(\%)}&\textbf{F1@75(\%)} \\
\midrule
BézierLaneNet\cite{bezierlanenet} &ResNet18&42.81&85.13&38.62\\
GANet-S\cite{ganet} &ResNet18&57.64&95.68&62.01\\
CondLaneNet\cite{condlanenet} &ResNet18&52.37&95.10&53.10\\
UFLDv1\cite{ufld} &ResNet34&53.76&94.78&57.15\\
LaneATT (with RPN)\cite{dalnet} &ResNet18&55.57&93.82&58.97\\
DALNet\cite{dalnet} &ResNet18&59.79&96.43&65.48\\
\midrule
PolarRCNN-NMS &ResNet18&\textbf{61.53}&\textbf{97.01}&\textbf{67.86}\\
PolarRCNN &ResNet18&61.52&96.99&67.85\\
\bottomrule
\end{tabular}
\end{adjustbox}
\label{dlrail result}
\end{table}
\begin{table}[h]
\centering
\caption{CurveLanes validation results compared with other methods}
\begin{adjustbox}{width=\linewidth}
\begin{tabular}{lrccc}
\toprule
\textbf{Method}& \textbf{Backbone}&\textbf{F1(\%)}&\textbf{Precision(\%)}&\textbf{Recall(\%)} \\
\midrule
SCNN\cite{scnn} &VGG16 &65.02&76.13&56.74\\
Enet-SAD\cite{enetsad} &- &50.31&63.60&41.60\\
PointLaneNet\cite{pointlanenet} &ResNet101&78.47&86.33&72.91\\
CurveLane-S\cite{curvelanes} &- &81.12&93.58&71.59\\
CurveLane-M\cite{curvelanes} &- &81.80&93.49&72.71\\
CurveLane-L\cite{curvelanes} &- &82.29&91.11&75.03\\
UFLDv2\cite{ufldv2} &ResNet34 &81.34&81.93&80.76\\
CondLaneNet-M\cite{condlanenet} &ResNet34 &85.92&88.29&83.68\\
CondLaneNet-L\cite{condlanenet} &ResNet101&86.10&88.98&83.41\\
CLRNet\cite{clrnet} &DLA34 &86.10&91.40&81.39\\
CLRerNet\cite{clrernet} &DLA34 &86.47&91.66&81.83\\
\midrule
PolarRCNN &DLA34&\textbf{87.29}&90.50&\textbf{84.31}\\
\bottomrule
\end{tabular}
\end{adjustbox}
\label{curvelanes result}
\end{table}

\subsection{Comparison with State-of-the-Art Results}
The comparison results of our proposed model with other methods are shown in Tables \ref{culane result}, \ref{tusimple result}, \ref{llamas result}, \ref{dlrail result}, and \ref{curvelanes result}. We present results for two versions of our model: the NMS-based version, denoted as PolarRCNN-NMS, and the NMS-free version, denoted as PolarRCNN. The NMS-based version uses predictions obtained from the O2M head followed by NMS post-processing, while the NMS-free version derives predictions directly from the O2O classification head without NMS.
To ensure a fair comparison, we also include results for CLRerNet \cite{clrernet} on the CULane and CurveLanes datasets, as we use a similar training strategy and data split. As the comparison results illustrate, our model demonstrates competitive performance across all five datasets. Specifically, on the CULane, TuSimple, LLAMAS, and DL-Rail datasets (sparse scenarios), our model outperforms other anchor-based methods. Moreover, the performance of the NMS-free version is nearly identical to that of the NMS-based version, highlighting the effectiveness of the O2O head in eliminating redundant predictions. On the CurveLanes dataset, the NMS-free version achieves superior F1-measure and recall compared to both NMS-based and segment-\&-grid-based methods.

We also compare the number of anchors and the processing speed with other methods. Figure \ref{anchor_num_method} illustrates the number of anchors used by several anchor-based methods on CULane. Our proposed model uses the fewest anchors (20) while achieving the highest F1-score on CULane, remaining competitive with state-of-the-art methods such as CLRerNet, which uses 192 anchors and a cross-layer refinement strategy. Conversely, Sparse Laneformer, which also uses 20 anchors, does not achieve optimal performance. It is important to note that our model has a simpler structure without additional refinement, indicating that the design of flexible anchors is crucial for performance in sparse scenarios. Furthermore, owing to its simple structure and fewer anchors, our model exhibits lower latency than most methods, as shown in Figure \ref{speed_method}. The combination of fast processing and a straightforward architecture makes our model easy to deploy.

\begin{table}[h]
\centering
\caption{NMS vs.\ NMS-free on CurveLanes}
\begin{adjustbox}{width=\linewidth}
\begin{tabular}{l|l|ccc}
\toprule
\textbf{Paradigm} & \textbf{NMS threshold (pixel)} & \textbf{F1(\%)} & \textbf{Precision(\%)} & \textbf{Recall(\%)} \\
\midrule
\multirow{7}*{PolarRCNN-NMS}
& 50 (default) &85.38&\textbf{91.01}&80.40\\
& 40 &85.97&90.72&81.68\\
& 30 &86.26&90.44&82.45\\
& 25 &86.38&90.27&82.83\\
& 20 &86.57&90.05&83.37\\
& 15 (optimal) &86.81&89.64&84.16\\
& 10 &86.58&88.62&\textbf{84.64}\\
\midrule
PolarRCNN (NMS-free) & - &\textbf{87.29}&90.50&84.31\\
\bottomrule
\end{tabular}
\end{adjustbox}
\end{table}

\bibliographystyle{IEEEtran}
\bibliography{reference}

\newpage

\section{Biography Section}
% If you have an EPS/PDF photo (graphicx package needed), extra braces are
% needed around the contents of the optional argument to biography to prevent
% the LaTeX parser from getting confused when it sees the complicated
% $\backslash${\tt{includegraphics}} command within an optional argument. (You can create
% your own custom macro containing the $\backslash${\tt{includegraphics}} command to make things
% simpler here.)

% \vspace{11pt}

% \bf{If you include a photo:}\vspace{-33pt}
% \begin{IEEEbiography}[{\includegraphics[width=1in,height=1.25in,clip,keepaspectratio]{fig1}}]{Michael Shell}
% Use $\backslash${\tt{begin\{IEEEbiography\}}} and then for the 1st argument use $\backslash${\tt{includegraphics}} to declare and link the author photo.
% Use the author name as the 3rd argument followed by the biography text.
% \end{IEEEbiography}

% \vspace{11pt}

\bf{If you will not include a photo:}\vspace{-33pt}
\begin{IEEEbiographynophoto}{John Doe}
Use $\backslash${\tt{begin\{IEEEbiographynophoto\}}} and the author name as the argument followed by the biography text.
\end{IEEEbiographynophoto}

\vfill

\end{document}
main2.tex
\documentclass[lettersize,journal]{IEEEtran}
\usepackage{amsmath,amsfonts}
\usepackage{algorithmic}
\usepackage{algorithm}
\usepackage{array}
% \usepackage[caption=false,font=normalsize,labelfont=sf,textfont=sf]{subfig}
\usepackage{textcomp}
\usepackage{stfloats}
\usepackage{url}
\usepackage{verbatim}
\usepackage{graphicx}
\usepackage{cite}
\usepackage{subcaption}
\usepackage{multirow}
\usepackage[T1]{fontenc}
\usepackage{adjustbox}
\usepackage{amssymb}
\usepackage{booktabs}
\usepackage[table,xcdraw]{xcolor}
\definecolor{darkgreen}{RGB}{17,159,27} % or define the dark green with other RGB values
\aboverulesep=0pt
\belowrulesep=0pt
\hyphenation{op-tical net-works semi-conduc-tor IEEE-Xplore}
% updated with editorial comments 8/9/2021

\begin{document}

\title{PolarRCNN: End-to-End Lane Detection with Fewer Anchors}

\author{IEEE Publication Technology,~\IEEEmembership{Staff,~IEEE,}
% <-this % stops a space
\thanks{This paper was produced by the IEEE Publication Technology Group. They are in Piscataway, NJ.}% <-this % stops a space
\thanks{Manuscript received April 19, 2021; revised August 16, 2021.}}

% The paper headers
\markboth{Journal of \LaTeX\ Class Files,~Vol.~14, No.~8, August~2021}%
{Shell \MakeLowercase{\textit{et al.}}: A Sample Article Using IEEEtran.cls for IEEE Journals}

% \IEEEpubid{0000--0000/00\$00.00~\copyright~2021 IEEE}
% Remember, if you use this you must call \IEEEpubidadjcol in the second
% column for its text to clear the IEEEpubid mark.

\maketitle

\begin{abstract}
Lane detection is a critical and challenging task in autonomous driving, particularly in real-world scenarios where traffic lanes are often slender, lengthy, and partially obscured by other vehicles. Existing anchor-based methods typically rely on prior straight-line anchors to extract features and to refine lane location and shape. Although they achieve high performance, manually setting prior anchors is cumbersome, and ensuring sufficient anchor coverage across diverse datasets requires a large number of dense anchors. Furthermore, NMS post-processing must be applied to suppress redundant predictions. In this study, we introduce PolarRCNN, a two-stage NMS-free anchor-based method for lane detection. By introducing a local polar head, anchor proposals become dynamic, and the number of anchors is greatly reduced without sacrificing performance. Moreover, a GNN-based NMS-free head enables the model to run in an end-to-end manner, which is deployment-friendly. Our model achieves competitive results on five popular lane detection benchmarks (TuSimple, CULane, LLAMAS, CurveLanes, and DL-Rail) while maintaining a lightweight size and a simple structure.
\end{abstract}

\begin{IEEEkeywords}
Lane detection.
\end{IEEEkeywords}

\section{Introduction}
\IEEEPARstart{L}{ane} detection is a significant problem in computer vision and autonomous driving, forming the basis for accurately perceiving the driving environment in intelligent driving systems. While extensive research has been conducted under ideal conditions, lane detection remains challenging in adverse scenarios such as night driving, glare, crowded roads, and rainy conditions, where lanes may be occluded or damaged. Moreover, the slender shapes, complex topologies, and global spatial extent of lanes add to the complexity of the detection task. An effective lane detection method should therefore take into account both global high-level semantic features and local low-level features in order to address these varied conditions and to ensure robust performance in real-time applications such as autonomous driving.

Traditional methods predominantly concentrate on handcrafted local feature extraction and lane shape modeling. Techniques such as the Canny edge detector \cite{canny1986computational}, the Hough transform \cite{houghtransform}, and deformable templates for lane fitting \cite{kluge1995deformable} have been extensively utilized. Nevertheless, these approaches often encounter limitations in practical settings, particularly when low-level and local features lack clarity or distinctiveness.

In recent years, fueled by advances in deep learning and the availability of large datasets, significant strides have been made in lane detection. Deep models, including convolutional neural networks (CNNs) and transformer-based architectures, have propelled progress in this domain. Earlier approaches often treated lane detection as a segmentation task, whose simplicity comes at the cost of time-intensive computation. Some methods instead rely on parameter-based models, directly outputting lane curve parameters rather than pixel locations. These models offer end-to-end solutions, but the sensitivity of the curve parameters to the lane shape compromises robustness.

\begin{figure}[t]
\centering
\begin{subfigure}{0.49\linewidth}
\centering
\includegraphics[width=0.9\linewidth]{thsis_figure/anchor_demo/anchor_fix_init.jpg}
\caption{}
\end{subfigure}
\begin{subfigure}{0.49\linewidth}
\centering
\includegraphics[width=0.9\linewidth]{thsis_figure/anchor_demo/anchor_fix_learned.jpg}
\caption{}
\end{subfigure}
%\qquad
% force the second row of subfigures onto a new line

\begin{subfigure}{0.49\linewidth}
\centering
\includegraphics[width=0.9\linewidth]{thsis_figure/anchor_demo/anchor_proposal.jpg}
\caption{}
\end{subfigure}
\begin{subfigure}{0.49\linewidth}
\centering
\includegraphics[width=0.9\linewidth]{thsis_figure/anchor_demo/gt.jpg}
\caption{}
\end{subfigure}
\caption{Comparison of anchor settings across methods. (a) The initial anchor setting of CLRNet. (b) The anchors learned by CLRNet on CULane. (c) The anchors proposed by our method. (d) The ground truth.}
\label{anchor setting}
\end{figure}

Drawing inspiration from object detection methods such as YOLO and Faster R-CNN, several anchor-based approaches have been introduced for lane detection, with representative works including LaneATT and CLRNet. These methods have demonstrated superior performance by leveraging anchor priors and enabling larger receptive fields for feature extraction. However, anchor-based lane detection methods inherit the following drawbacks of anchor-based general object detection:

\begin{itemize}
\item A large number of dense anchors must be configured to ensure the recall of the detection results, since lane distributions (i.e., directions and locations) are complex in real scenarios, as Fig.~\ref{anchor setting}(a) shows.
\item Because of the large anchor set, redundant predictions must be removed by post-processing such as NMS \cite{} and Fast NMS \cite{}, which complicates deployment; moreover, the NMS threshold has to be set manually.
\end{itemize}

To address the first problem, CLRNet uses learned anchors whose locations are optimized during training to adapt to the lane distributions of real scenarios (see Fig.~\ref{anchor setting}(b)), and applies cascaded cross-layer anchor refinement to bring the anchors closer to the ground truth. However, the anchors in CLRNet are still numerous, in order to cover the potential lane distributions. To alleviate this, ADNet \cite{} uses a start-point generation unit to propose flexible anchors for each image rather than using the same set of anchors for all images. However, the start points of lanes are subjective and lack clear visual evidence due to the global nature of lanes, so the performance of ADNet is not ideal. SRLane uses a local angle map to propose sketch anchors according to the direction of the ground truth. This method only considers direction and ignores the precise location of anchors, leading to worse performance in the absence of cascaded anchor refinement. Moreover, all of the methods mentioned above fail to avoid the redundant predictions of the second problem.

To address the issues mentioned above better than previous work, we analyze their causes and propose a new lane detection method called PolarRCNN, a two-stage NMS-free anchor-based model. PolarRCNN uses local and global polar coordinates to describe the anchors, and the number of proposed anchors is much smaller than in previous work, as shown in Fig.~\ref{anchor setting}(c). Moreover, a heuristic graph neural network block is proposed to make the model NMS-free. The model architecture is simple, without the complex mechanisms used in previous work (e.g., attention, cascaded refinement), which makes deployment easier and inference faster. In addition, the simple architecture helps us to identify the key factors behind the performance of anchor-based lane detection methods.

We conducted experiments on five mainstream benchmarks: TuSimple \cite{}, CULane \cite{}, LLAMAS \cite{}, CurveLanes \cite{}, and DL-Rail \cite{}. Our proposed method achieves performance competitive with state-of-the-art methods.

Our main contributions are summarized as follows:

\begin{itemize}
\item We simplify the anchor parameters with local and global polar coordinate systems and apply them to a two-stage lane detection framework. Compared with other sparse two-stage methods, the number of proposed anchors is greatly reduced while achieving better performance.
\item We propose a novel heuristic graph neural network (GNN) head to implement an NMS-free paradigm. The architecture of the GNN is designed according to Fast NMS and is therefore interpretable. The whole training and testing process of our model is end-to-end.
\item Our method uses a simple model architecture and achieves performance competitive with state-of-the-art methods on five datasets. The high performance with fewer anchors and an NMS-free paradigm demonstrates the effectiveness of our method.
\end{itemize}

\section{Related Works}
Lane detection aims to detect lane instances in an image. In this section, we only introduce deep-learning-based lane detection methods, which can be categorized into segmentation-based, parameter-based, and anchor-based methods.

\textbf{Segmentation-based Methods.} Segmentation-based methods focus on pixel-wise prediction. They assign each pixel to a lane instance or the background \cite{} and predict information pixel by pixel. However, these methods overly focus on low-level local features, neglecting global semantic information and real-time performance. SCNN uses a larger receptive field to overcome this problem. Methods such as UFLDv1 and UFLDv2 \cite{}\cite{} and CondLaneNet \cite{} utilize row-wise or column-wise classification instead of per-pixel classification to improve detection speed. Another issue with these methods is that the lane instance prior must be learned by the model itself. LaneNet uses post-clustering to distinguish lane instances. UFLD divides lane instances by angle and location and can only detect a fixed number of lanes. CondLaneNet utilizes conditional dynamic kernels to predict different lane instances. Methods such as FOLOLane \cite{} and GANet \cite{} use a bottom-up strategy that detects a few key points and models their global relations to form lane instances.

\textbf{Parameter-based Methods.} Instead of predicting a series of point locations or pixel classes, parameter-based methods directly generate the curve parameters of lane instances. PolyLaneNet \cite{} and LSTR \cite{} treat a lane instance as a polynomial curve and output the polynomial coefficients directly. BézierLaneNet \cite{} treats a lane instance as a Bézier curve and generates the locations of the control points of the curve. BSNet uses B-splines to describe a lane, whose curve parameters focus on the local shapes of lanes. Parameter-based methods are mostly end-to-end without post-processing, which grants them fast inference. However, since the final rendered lane is sensitive to the predicted curve parameters, the robustness and generalization of parameter-based methods may be less than ideal.

\textbf{Anchor-Based Methods.} Inspired by methods in general object detection such as YOLO \cite{} and DETR \cite{}, anchor-based methods have been proposed for lane detection. Line-CNN is, to our knowledge, the earliest work that utilizes line anchors to detect lanes; the lines are designed as rays emitted from the three edges (left, bottom, and right) of an image. However, the receptive field of the model only focuses on the image edges, and the method is slower than its successors. LaneATT \cite{} employs anchor-based feature pooling to aggregate features along the whole line anchor, achieving faster speed and better performance. Nevertheless, its grid sampling strategy and label assignment limit its potential. CLRNet \cite{} utilizes a cross-layer refinement strategy, SimOTA label assignment \cite{}, and a LIoU loss to push anchor-based performance beyond most methods. The main advantage of anchor-based methods is that many strategies from anchor-based general object detection, such as label assignment, bounding box refinement, and GIoU loss, can be easily transferred to lane detection. However, the disadvantages of existing anchor-based lane detectors are also evident: the line anchors need to be handcrafted, the number of anchors is large, and NMS post-processing is required, resulting in high computational cost. Some works, such as ADNet \cite{}, SRLane \cite{}, and Sparse Laneformer \cite{}, attempt to reduce the number of anchors via proposals.

\textbf{NMS-Free Object Detection.} NMS is an important post-processing step in most general object detection methods. DETR \cite{} uses one-to-one label assignment to avoid redundant predictions without NMS, and other NMS-free methods \cite{} followed. These methods analyze the issue from two aspects: the model architecture and the label assignment. \cite{}\cite{} hold the view that one-to-one assignment is the key to NMS-free prediction, while other works also consider the expressive capacity a model needs in order to provide non-redundant predictions. However, few anchor-based lane detection methods analyze the NMS-free paradigm in the way general object detection does, and they still rely on NMS post-processing. In our work, we find that both the label assignment and the expressive ability of the NMS-free module (e.g., the architecture and the inputs of the module) play an important role in NMS-free lane detection with anchor-based models.

This paper aims to address the two issues mentioned above (reducing the number of anchors and removing NMS) for anchor-based lane detection methods.

\section{Method}
The overall architecture of PolarRCNN is illustrated in Fig.~\ref{overall_architecture}. Our model consists of a backbone with FPN, a local polar head, and a global polar head. Only simple network layers such as convolutions, MLPs, and pooling are used in each block (rather than attention, dynamic kernels, etc.).

\begin{figure*}[ht]
\centering
\includegraphics[width=0.9\textwidth]{thsis_figure/ovarall_architecture.png} % replace with your figure file name
\caption{The overall pipeline of PolarRCNN. The architecture is simple and lightweight. The backbone (e.g., ResNet18) and the FPN extract features from the image, and the local polar head proposes sparse line anchors. After pooling the features sampled along these line anchors, the global polar head gives the final predictions. The global polar head contains triplet subheads: a one-to-one classification head (o2o cls head), a one-to-many classification head (o2m cls head), and a one-to-many regression head (o2m reg head). The one-to-one classification head replaces NMS post-processing by selecting a single positive prediction for each ground truth from the redundant predictions of the o2m head.}
\label{overall_architecture}
\end{figure*}

\subsection{Lane and Line Anchor Representation}

Lanes are thin and long curves; a suitable lane prior helps the model extract features, predict locations, and model the shapes of lane curves more accurately. As in previous works \cite{}\cite{}, the lane priors (also called lane anchors) in our work are straight lines, and we sample a sequence of 2D points on each line anchor, i.e., $P \doteq \left\{ \left( x_1, y_1 \right), \left( x_2, y_2 \right), \ldots, \left( x_N, y_N \right) \right\}$, where $N$ is the number of sampled points. The y-coordinates of the points are sampled uniformly along the vertical direction of the image, i.e., $y_i=\frac{H}{N-1}\cdot i$, where $H$ is the image height. The same y-coordinates are also sampled from the ground-truth lane, and the model regresses the x-coordinate offsets from the line anchor to the lane ground truth. The only difference between PolarRCNN and previous works is the description of the straight-line anchors, introduced below.

\textbf{Polar Coordinate System.} Since lane anchors are straight by default, each can be described by straight-line parameters. Previous works use a ray to describe a 2D line anchor, parameterized by a start point and an orientation/angle, i.e., $\left\{\theta, P_{xy}\right\}$, as shown in Fig.~\ref{coord}(a). \cite{}\cite{} define the start points to lie on the three image boundaries, and \cite{} points out that this is not reasonable because the real start point of a lane could be anywhere within an image. In our analysis, using a ray causes ambiguity in describing a line, because a line has infinitely many possible start points and the start point of a lane is subjective. As illustrated in Fig.~\ref{coord}(a), the yellow and dark green start points with the same orientation $\theta$ describe the same line, and either of them could be chosen in different datasets. This ambiguity arises because a straight line has two degrees of freedom while a ray has three. To address this issue, as shown in Fig.~\ref{coord}(b), we use polar coordinates to describe a lane anchor with two parameters, radius and angle, $\left\{\theta, r\right\}$, where $\theta \in \left[-\frac{\pi}{2}, \frac{\pi}{2}\right)$ and $r \in \left(-\infty, +\infty\right)$.

\begin{figure}[t]
\centering
\begin{subfigure}{0.49\linewidth}
\centering
\includegraphics[width=1\linewidth]{thsis_figure/coord/ray.png}
\caption{}
\end{subfigure}
\hfill
\begin{subfigure}{0.49\linewidth}
\centering
\includegraphics[width=1\linewidth]{thsis_figure/coord/polar.png}
\caption{}
\end{subfigure}
\caption{Different descriptions of anchor parameters. (a) Ray: start point and orientation. (b) Polar: radius and angle.}
\label{coord}
\end{figure}

We define two polar coordinate systems, a global system and a local system, whose origins are denoted as the global origin $P_{0}^{\text{global}}$ and the local origins $P_{0}^{\text{local}}$, respectively. For convenience, the global origin is placed near the static vanishing point of the whole lane image dataset, while the local origins form a lattice within the image. As Fig.~\ref{coord} shows, only the radius parameter depends on the choice of origin; the angle/orientation parameter remains consistent.

\subsection{Local Polar Head}

Inspired by the region proposal network in Faster R-CNN \cite{}, the local polar head (LPH) aims to propose flexible, high-quality anchors for each image. As shown in Fig.~\ref{lph} and Fig.~\ref{overall_architecture}, the highest level (P3) of the FPN feature maps, $F \in \mathbb{R}^{C_{f} \times H_{f} \times W_{f}}$, is chosen as the input of the LPH. After a downsampling operation, the feature map is fed into two branches, namely a regression branch and a classification branch:

\begin{equation}
\begin{aligned}
&F_d\gets \mathrm{downsample}\left( F \right), \, F_d\in \mathbb{R} ^{C_f\times H_l\times W_l}\\
&F_{reg}\gets \phi _{reg}^{lph}\left( F_d \right), \, F_{reg}\in \mathbb{R} ^{2\times H_l\times W_l}\\
&F_{cls}\gets \phi _{cls}^{lph}\left( F_d \right), \, F_{cls}\in \mathbb{R} ^{1\times H_l\times W_l}
\end{aligned}
\label{lph equ}
\end{equation}

The regression branch proposes lane anchors by predicting the two local polar parameters $F_{reg} \equiv \left[\mathbf{\Theta}^{H_{l} \times W_{l}}, \mathbf{\xi}^{H_{l}\times W_{l}}\right]$, which denote the angles and the radii. The classification branch predicts a heat map over the local polar origin grid. By discarding local origins with low confidence, potential positive lane anchors around the ground truth are more likely to be retained while background anchors are removed. To keep the design simple, the regression branch $\phi _{reg}^{lph}\left(\cdot \right)$ and the classification branch $\phi _{cls}^{lph}\left(\cdot \right)$ consist of one and two $1\times 1$ convolutional layers, respectively.
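
A minimal PyTorch-style sketch of these two branches is given below. The channel width and the ReLU between the two classification convolutions are assumptions for illustration; they are not specified above.
\begin{verbatim}
import torch.nn as nn

class LocalPolarHead(nn.Module):
    # Regression branch: one 1x1 conv predicting (angle, radius).
    # Classification branch: two 1x1 convs predicting a 1-channel
    # heat map over the local polar origin grid.
    def __init__(self, c_f=64):   # c_f is an assumed channel width
        super().__init__()
        self.reg = nn.Conv2d(c_f, 2, kernel_size=1)
        self.cls = nn.Sequential(
            nn.Conv2d(c_f, c_f, kernel_size=1),
            nn.ReLU(inplace=True),   # assumed nonlinearity
            nn.Conv2d(c_f, 1, kernel_size=1),
        )

    def forward(self, f_d):         # f_d: (B, c_f, H_l, W_l)
        return self.reg(f_d), self.cls(f_d)
\end{verbatim}
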
\begin{figure}[t]
\centering
\includegraphics[width=0.45\textwidth]{thsis_figure/local_polar_head.png} % replace with your figure file name
\caption{The architecture of the local polar head.}
\label{lph}
\end{figure}

During the training stage, as shown in Fig.~\ref{lphlabel}, the ground-truth labels of the local polar head are constructed as follows. The radius label is defined as the shortest distance from a grid point (a local polar origin) to the ground-truth lane curve, and the angle label as the orientation of the vector from the grid point to the nearest point on the curve. A grid point is labeled positive only if its radius label is below a threshold $\tau$; all other grid points are negative. Once the regression and classification labels are constructed, the LPH is trained with a smooth-L1 loss and a binary cross-entropy (BCE) loss. The LPH loss functions are defined as follows:

\begin{equation}
\begin{aligned}
\mathcal{L} _{lph}^{cls}&=\mathrm{BCE}\left( F_{cls},F_{gt} \right) \\
\mathcal{L} _{lph}^{reg}&=\frac{1}{N_{lph}^{pos}}\sum_{i\in \left\{ i|\hat{r}_i<\tau \right\}}{\left( d\left( \theta _i-\hat{\theta}_i \right) +d\left( r_i-\hat{r}_i \right) \right)}
\end{aligned}
\label{loss_lpm}
\end{equation}

where $\mathrm{BCE}\left( \cdot , \cdot \right)$ denotes the binary cross-entropy loss and $d\left(\cdot \right)$ denotes the smooth-L1 loss. To keep backbone training stable, the gradients from the confidence branch to the backbone feature map are detached.
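
The label construction just described can be sketched as follows. The dense resampling of the ground-truth curve and the function name are illustrative assumptions only.
\begin{verbatim}
import numpy as np

def lph_labels(grid_xy, lane_xy, tau):
    """grid_xy: (G, 2) local polar origins; lane_xy: (M, 2) points
    densely sampled on one ground-truth lane curve."""
    d = np.linalg.norm(grid_xy[:, None, :] - lane_xy[None, :, :],
                       axis=-1)               # (G, M) distances
    nearest = d.argmin(axis=1)                # closest curve point
    r = d[np.arange(len(grid_xy)), nearest]   # radius label
    delta = lane_xy[nearest] - grid_xy        # vector to nearest point
    theta = np.arctan2(delta[:, 1], delta[:, 0])  # angle label
    positive = r < tau                        # positives: r below tau
    return r, theta, positive
\end{verbatim}
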
\begin{figure}[t]
\centering
\includegraphics[width=0.48\textwidth]{thsis_figure/coord/localpolar.png}
\caption{Label construction for the local polar head.}
\label{lphlabel}
\end{figure}

\subsection{Global Polar Head}
The global polar head (GPH) serves as the second stage of PolarRCNN: it takes the pooled line features as input and predicts accurate lane shapes and locations. The GPH consists of three parts.
Once the local polar parameters of a line anchor are given, they can be transformed to the global polar system with the following equation:
\begin{equation}
\begin{aligned}
r^{global}=r^{local}&+\left( x^{local}-x^{global} \right) \cos \theta \\
&+\left( y^{local}-y^{global} \right) \sin \theta
\end{aligned}
\end{equation}
where $\left( x^{local}, y^{local} \right)$ and $\left( x^{global}, y^{global} \right)$ are the Cartesian coordinates of the local and global origin points, respectively.
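
A direct transcription of this conversion, with function and argument names of our own choosing for illustration:
\begin{verbatim}
import math

def local_to_global_r(r_local, theta, origin_local, origin_global):
    """Convert a local polar radius to the global system.
    origin_*: Cartesian (x, y) of the respective origin point."""
    xl, yl = origin_local
    xg, yg = origin_global
    return (r_local
            + (xl - xg) * math.cos(theta)
            + (yl - yg) * math.sin(theta))
\end{verbatim}
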
The feature points can then be sampled along the line anchor. The y-coordinates are sampled uniformly along the vertical direction of the FPN feature maps as described before, and each $x_{i}$ is calculated from the global polar parameters by the following equation:

\begin{equation}
\begin{aligned}
x_{i}=-y_i\tan \theta +\frac{r}{\cos \theta}
\end{aligned}
\end{equation}
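
For concreteness, a hedged sketch of this sampling step (the default point count follows the experiment settings; the function name is ours):
\begin{verbatim}
import numpy as np

def sample_anchor_points(theta, r, H, n_points=36):
    """Sample (x_i, y_i) on a line anchor given its global polar
    parameters (theta, r); H is the sampling range height."""
    y = np.linspace(0.0, H, n_points)          # uniform vertical grid
    x = -y * np.tan(theta) + r / np.cos(theta) # x from the equation above
    return x, y
\end{verbatim}
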
\begin{figure}[t]
\centering
\includegraphics[width=0.49\textwidth]{thsis_figure/triplet_head.png} % replace with your figure file name
\caption{The architecture of the global polar head.}
\label{triplet}
\end{figure}

\begin{figure}[t]
\centering
\includegraphics[width=0.4\textwidth]{thsis_figure/gnn.png} % replace with your figure file name
\caption{The architecture of the NMS-free GNN block.}
\label{gnn}
\end{figure}

\begin{figure}[t]
\centering
\includegraphics[width=0.49\textwidth]{thsis_figure/GLaneIoU.png} % replace with your figure file name
\caption{Illustration of the GLaneIoU redefined in our work.}
\label{glaneiou}
\end{figure}

Suppose $\left\{ L_{0}, L_{1}, L_{2} \right\}$ denotes the different levels of the FPN. We sample the grid features from the three levels once, rather than using cross-layer refinement as in CLRNet. To reduce the number of parameters, we use a weighted-sum strategy to fuse the features from different layers, similar to \cite{}:

\begin{equation}
\begin{aligned}
\boldsymbol{F}^s=\sum_{i=0}^2{\boldsymbol{F}_{i}^{s}\times \frac{e^{\boldsymbol{w}_{i}^{s}}}{\sum_{j=0}^2{e^{\boldsymbol{w}_{j}^{s}}}}}
\end{aligned}
\end{equation}
where $\boldsymbol{F}_{i}^{s}\in \mathbb{R} ^{N_p\times d_f}$ denotes the grid features sampled from $L_{i}$ and $\boldsymbol{w}_{i}^{s}\in \mathbb{R} ^{N_p}$ is a learned aggregation weight. Instead of directly concatenating the three sampled feature sets into $\boldsymbol{F}^s\in \mathbb{R} ^{N_p\times d_f\times 3}$, this adaptive summation reduces the feature dimension to $\boldsymbol{F}^s\in \mathbb{R} ^{N_p\times d_f}$, one third of the former. The weighted-sum tensor is then fed into a fully connected layer:
\begin{equation}
\begin{aligned}
\boldsymbol{F}^{roi}\gets \mathrm{fc}\left( \boldsymbol{F}^s \right) , \, \boldsymbol{F}^{roi}\in \mathbb{R} ^{d_r}
\end{aligned}
\end{equation}
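
The softmax-weighted aggregation can be sketched as follows; tensor shapes follow the notation above, and the function name is assumed.
\begin{verbatim}
import torch

def aggregate_levels(feats, w):
    """feats: list of 3 tensors of shape (N_p, d_f), sampled from
    L0, L1, L2; w: learnable weights of shape (3, N_p)."""
    f = torch.stack(feats, dim=0)              # (3, N_p, d_f)
    a = torch.softmax(w, dim=0).unsqueeze(-1)  # (3, N_p, 1), sums to 1
    return (f * a).sum(dim=0)                  # (N_p, d_f)
\end{verbatim}
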
Each proposed anchor thus yields an RoI feature $\boldsymbol{F}^{roi}_{i}$ ($i=1,2,\ldots,N_{anc}$), which is fed into the triplet head to obtain a one-to-one (o2o) confidence, a one-to-many (o2m) confidence, and a one-to-many (o2m) regression. With the o2m confidence, more than one detection is produced around each ground truth with high probability, so NMS post-processing is necessary to remove the redundant positive detections, just as in previous works \cite{}. Unlike one-to-many assignment, one-to-one assignment assigns only one positive anchor to each ground truth, so only one detection is likely to be produced per ground truth, which makes the pipeline NMS-free.

However, we find that a plain branch structure is unable to learn the one-to-one assignment, because the anchors coincide heavily, as Fig.~\ref{anchor setting}(b)(c) shows. Directly applying the one-to-one assignment to a branch with the same structure as the o2m branch greatly reduces the performance of the model. To address this issue, we design the structure of the one-to-one branch in a heuristic way.

It is easy to see that the ``ideal'' one-to-one branch is equivalent to the o2m classification branch plus the o2m regression, followed by NMS post-processing. To make this explicit, the process can be written as the following equations:

\begin{equation}
\begin{aligned}
s_i&\gets f_{o2m}^{cls}\left( \boldsymbol{F}_{i}^{roi} \right)
\\
\varDelta \boldsymbol{x}_{i}^{roi}&\gets f_{o2m}^{reg}\left( \boldsymbol{F}_{i}^{roi} \right) , \,\, \varDelta \boldsymbol{x}_{i}^{roi}\in \mathbb{R} ^{N_r}
\\
\tilde{s}_i|_{i=1}^{N_{anc}}&\gets \mathrm{NMS}\left( s_i|_{i=1}^{N_{anc}}, \, \varDelta \boldsymbol{x}_{i}^{roi}+\boldsymbol{x}_{i}^{b}|_{i=1}^{N_{anc}} \right)
\end{aligned}
\end{equation}

That is to say, the o2o confidence can be predicted by some function that takes the features, scores, and locations of all anchors, rather than of a single anchor in isolation.
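
To make the computation being emulated concrete, the following is a hedged sketch of Fast NMS written as a single function over all anchors. The pairwise lane-IoU matrix is a placeholder for the GLaneIoU of Fig.~\ref{glaneiou}; this is not our GNN head itself, only the procedure it is designed to imitate.
\begin{verbatim}
import torch

def fast_nms_scores(scores, pairwise_iou, iou_thr=0.5):
    """Anchor i is suppressed if any higher-scored anchor j
    overlaps it with IoU >= iou_thr. pairwise_iou: (N, N)."""
    order = scores.argsort(descending=True)
    iou = pairwise_iou[order][:, order].triu(diagonal=1)
    keep = iou.max(dim=0).values < iou_thr  # overlap with better anchors
    out = torch.zeros_like(scores, dtype=torch.bool)
    out[order] = keep
    return out                               # True for surviving anchors
\end{verbatim}
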
The loss function is as follows:

\begin{equation}
\begin{aligned}
\mathcal{L} _{RCNN}=c_{cls}\mathcal{L} _{cls}+c_{loc}\mathcal{L} _{loc}
\end{aligned}
\end{equation}
where $\mathcal{L} _{cls}$ is the focal loss and $\mathcal{L} _{loc}$ is the LaneIoU loss \cite{}.

In the testing stage, the anchors with the top-$k_{l}$ confidences are chosen as the proposal anchors, and these $k_{l}$ anchors are fed into the RCNN module to obtain the final predictions.
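
This selection step is a plain top-$k$; a one-line sketch (names assumed):
\begin{verbatim}
import torch

def select_topk_anchors(conf, theta, r, k=20):
    """Keep the k anchors with the highest LPH confidence.
    conf: flattened heat-map scores; theta, r: matching params."""
    idx = conf.topk(k).indices
    return theta[idx], r[idx]
\end{verbatim}
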
\section{Experiments}

\begin{figure}[t]
\centering
\includegraphics[width=0.48\textwidth]{thsis_figure/anchor_num_f1.png} % replace with your figure file name
\caption{Anchor number and F1-score of different methods on CULane.}
\label{anchor_num_method}
\end{figure}

\begin{table*}[htbp]
\centering
\caption{Datasets, preprocessing, and training settings}
\begin{adjustbox}{width=\linewidth}
\begin{tabular}{l|l|ccccc}
\toprule
\multicolumn{2}{c|}{\textbf{Dataset}} & CULane & TuSimple & LLAMAS & DL-Rail & CurveLanes \\
\midrule
\multirow{7}*{Data info}
& Train &88,880&3,268 &58,269&5,435&100,000\\
& Validation &9,675 &358 &20,844&- &20,000 \\
& Test &34,680&2,782 &20,929&1,569&- \\
& Resolution &$1640\times590$&$1280\times720$&$1276\times717$&$1920\times1080$&$2560\times1440$, etc.\\
& Lanes &$\leqslant 4$&$\leqslant 5$&$\leqslant 4$&$=2$&$\leqslant 10$\\
& Environment &urban and highway & highway&highway&railway&urban and highway\\
& Property &sparse&sparse&sparse&sparse&sparse and dense\\
\midrule
\multirow{1}*{Data Preprocessing}
& Crop Height &270&160&300&560&640, etc.\\
\midrule
\multirow{5}*{Training Hyperparameters}
& Epoch Number &32&70&20&90&32\\
& Batch Size &40&24&32&40&40\\
& Warm-up Iterations &800&200&800&400&800\\
& Aux Loss &0.2&0 &0.2&0.2&0.2\\
& Rank Loss &0.7&0.7&0.1&0.7&0 \\
\midrule
\multirow{4}*{Model Hyperparameters}
& Polar Map Size &$4\times10$&$4\times10$&$4\times10$&$4\times10$&$6\times13$\\
& Testing Anchor Number (top-$k$) &20&20&20&20&50\\
& o2m Confidence Threshold &0.48&0.40&0.40&0.40&0.45\\
& o2o Confidence Threshold &0.46&0.46&0.46&0.46&0.44\\
\midrule
\multirow{2}*{Evaluation Choice}
& Eval Split &Test&Test&Test&Test&Validation\\
& Vis Split &Test&Test&Validation&Test&Validation\\
\bottomrule
\end{tabular}
\end{adjustbox}
\label{dataset_info}
\end{table*}

\begin{table*}[htbp]
\centering
\caption{CULane test results compared with other methods}
\normalsize
\begin{adjustbox}{width=\linewidth}
\begin{tabular}{lrlllllllllll}
\toprule
\textbf{Method}& \textbf{Backbone}&\textbf{F1@50}$\uparrow$& \textbf{F1@75}$\uparrow$& \textbf{Normal}$\uparrow$&\textbf{Crowded}$\uparrow$&\textbf{Dazzle}$\uparrow$&\textbf{Shadow}$\uparrow$&\textbf{No line}$\uparrow$& \textbf{Arrow}$\uparrow$& \textbf{Curve}$\uparrow$& \textbf{Cross}$\downarrow$ & \textbf{Night}$\uparrow$ \\
\hline
\textbf{Seg \& Grid} \\
\cline{1-1}
SCNN &VGG-16 &71.60&39.84&90.60&69.70&58.50&66.90&43.40&84.10&64.40&1900&66.10\\
RESA &ResNet50 &75.30&53.39&92.10&73.10&69.20&72.80&47.70&83.30&70.30&1503&69.90\\
LaneAF &DLA34 &77.41&- &91.80&75.61&71.78&79.12&51.38&86.88&72.70&1360&73.03\\
UFLDv2 &ResNet34 &76.0 &- &92.5 &74.8 &65.5 &75.5 &49.2 &88.8 &70.1 &1910&70.8 \\
CondLaneNet &ResNet101&79.48&61.23&93.47&77.44&70.93&80.91&54.13&90.16&75.21&1201&74.80\\
\cline{1-1}
\textbf{Parameter} \\
\cline{1-1}
BézierLaneNet &ResNet18&73.67&-&90.22&71.55&62.49&70.91&45.30&84.09&58.98&\textbf{996} &68.70\\
BSNet &DLA34 &80.28&-&93.87&78.92&75.02&82.52&54.84&90.73&74.71&1485&75.59\\
Eigenlanes &ResNet50&77.20&-&91.7 &76.0 &69.8 &74.1 &52.2 &87.7 &62.9 &1509&71.8 \\
\cline{1-1}
\textbf{Keypoint} \\
\cline{1-1}
CurveLanes-NAS-L &- &74.80&-&90.70&72.30&67.70&70.10&49.40&85.80&68.40&1746&68.90\\
FOLOLane &ResNet18 &78.80&-&92.70&77.80&75.20&79.30&52.10&89.00&69.40&1569&74.50\\
GANet-L &ResNet101&79.63&-&93.67&78.66&71.82&78.32&53.38&89.86&77.37&1352&73.85\\
\cline{1-1}
\textbf{Dense Anchor} \\
\cline{1-1}
LaneATT &ResNet18 &75.13&51.29&91.17&72.71&65.82&68.03&49.13&87.82&63.75&1020&68.58\\
LaneATT &ResNet122&77.02&57.50&91.74&76.16&69.47&76.31&50.46&86.29&64.05&1264&70.81\\
CLRNet &ResNet18 &79.58&62.21&93.30&78.33&73.71&79.66&53.14&90.25&71.56&1321&75.11\\
CLRNet &DLA34 &80.47&62.78&93.73&79.59&75.30&82.51&54.58&90.62&74.13&1155&75.37\\
CLRerNet &DLA34 &81.12&64.07&94.02&80.20&74.41&\textbf{83.71}&56.27&90.39&74.67&1161&\textbf{76.53}\\
\cline{1-1}
\textbf{Sparse Anchor} \\
\cline{1-1}
ADNet &ResNet34&78.94&-&92.90&77.45&71.71&79.11&52.89&89.90&70.64&1499&74.78\\
SRLane &ResNet18&79.73&-&93.52&78.58&74.13&81.90&55.65&89.50&75.27&1412&74.58\\
Sparse Laneformer &ResNet50&77.83&-&- &- &- &- &- &- &- &- &- \\
\hline
\textbf{Proposed Method} \\
\cline{1-1}
PolarRCNN$_{o2m}$ &ResNet18&80.81&63.97&94.11&79.57&76.53&83.33&55.10&90.70&79.47&1089&75.25\\
PolarRCNN &ResNet18&80.80&63.97&94.12&79.57&76.53&83.33&55.09&90.62&79.47&1089&75.25\\
PolarRCNN &ResNet34&80.91&63.96&94.24&79.75&76.67&81.97&55.40&\textbf{91.12}&79.85&1158&75.70\\
PolarRCNN &ResNet50&81.34&64.77&94.45&\textbf{80.42}&75.88&83.61&56.63&91.10&80.00&1356&75.94\\
PolarRCNN$_{o2m}$ &DLA34 &\textbf{81.49}&64.96&\textbf{94.44}&80.36&\textbf{76.83}&83.68&56.53&90.85&\textbf{80.09}&1135&76.32\\
PolarRCNN &DLA34 &\textbf{81.49}&\textbf{64.97}&\textbf{94.44}&80.36&\textbf{76.83}&83.68&\textbf{56.56}&90.81&79.80&1135&76.33\\
\bottomrule
\end{tabular}
\end{adjustbox}
\label{culane result}
\end{table*}

\begin{table}[h]
\centering
\caption{TuSimple test results compared with other methods}
\begin{adjustbox}{width=\linewidth}
\begin{tabular}{lrcccc}
\toprule
\textbf{Method}& \textbf{Backbone}& \textbf{Acc(\%)}&\textbf{F1(\%)}&\textbf{FP(\%)}&\textbf{FN(\%)} \\
\midrule
SCNN &VGG16 &96.53&95.97&6.17&\textbf{1.80}\\
PolyLaneNet&EfficientNetB0&93.36&90.62&9.42&9.33\\
UFLDv2 &ResNet34 &88.08&95.73&18.84&3.70\\
LaneATT &ResNet34 &95.63&96.77&3.53&2.92\\
FOLOLane &ERFNet &\textbf{96.92}&96.59&4.47&2.28\\
CondLaneNet&ResNet101 &96.54&97.24&2.01&3.50\\
CLRNet &ResNet18 &96.84&97.89&2.28&1.92\\
\midrule
PolarRCNN$_{o2m}$ &ResNet18&96.20&\textbf{97.98}&2.16&1.86\\
PolarRCNN &ResNet18&96.21&97.93&2.26&1.87\\
\bottomrule
\end{tabular}
\end{adjustbox}
\end{table}

\begin{table}[h]
\centering
\caption{LLAMAS test results compared with other methods}
\begin{adjustbox}{width=\linewidth}
\begin{tabular}{lrccc}
\toprule
\textbf{Method}& \textbf{Backbone}&\textbf{F1@50(\%)}&\textbf{Precision(\%)}&\textbf{Recall(\%)} \\
\midrule
SCNN &ResNet34&94.25&94.11&94.39\\
BézierLaneNet &ResNet34&95.17&95.89&94.46\\
LaneATT &ResNet34&93.74&96.79&90.88\\
LaneAF &DLA34 &96.07&96.91&95.26\\
DALNet &ResNet34&96.12&\textbf{96.83}&95.42\\
CLRNet &DLA34 &96.12&- &- \\
\midrule
PolarRCNN &ResNet18&96.06&96.81&95.32\\
PolarRCNN &DLA34 &\textbf{96.14}&96.82&\textbf{95.47}\\
\bottomrule
\end{tabular}
\end{adjustbox}
\end{table}

\begin{table}[h]
\centering
\caption{DL-Rail test results compared with other methods}
\begin{adjustbox}{width=\linewidth}
\begin{tabular}{lrccc}
\toprule
\textbf{Method}& \textbf{Backbone}&\textbf{mF1(\%)}&\textbf{F1@50(\%)}&\textbf{F1@75(\%)} \\
\midrule
BézierLaneNet &ResNet18&42.81&85.13&38.62\\
GANet-S &ResNet18&57.64&95.68&62.01\\
CondLaneNet &ResNet18&52.37&95.10&53.10\\
UFLDv1 &ResNet34&53.76&94.78&57.15\\
LaneATT (with RPN) &ResNet18&55.57&93.82&58.97\\
DALNet &ResNet18&59.79&96.43&65.48\\
\midrule
PolarRCNN$_{o2m}$ &ResNet18&\textbf{61.54}&\textbf{97.01}&\textbf{67.92}\\
PolarRCNN &ResNet18&61.53&96.99&67.91\\
\bottomrule
\end{tabular}
\end{adjustbox}
\end{table}

\begin{table}[h]
\centering
\caption{CurveLanes validation results compared with other methods}
\begin{adjustbox}{width=\linewidth}
\begin{tabular}{lrccc}
\toprule
\textbf{Method}& \textbf{Backbone}&\textbf{F1(\%)}&\textbf{Precision(\%)}&\textbf{Recall(\%)} \\
\midrule
SCNN &VGG16 &65.02&76.13&56.74\\
Enet-SAD &- &50.31&63.60&41.60\\
PointLaneNet &ResNet101&78.47&86.33&72.91\\
CurveLane-S &- &81.12&93.58&71.59\\
CurveLane-M &- &81.80&93.49&72.71\\
CurveLane-L &- &82.29&91.11&75.03\\
UFLDv2 &ResNet34 &81.34&81.93&80.76\\
CondLaneNet-M &ResNet34 &85.92&88.29&83.68\\
CondLaneNet-L &ResNet101&86.10&88.98&83.41\\
CLRNet &DLA34 &86.10&91.40&81.39\\
CLRerNet &DLA34 &86.47&91.66&81.83\\
\midrule
PolarRCNN &DLA34&\textbf{87.29}&90.50&\textbf{84.31}\\
\bottomrule
\end{tabular}
\end{adjustbox}
\end{table}

\begin{table}[h]
\centering
\caption{Comparison between different anchor strategies}
\begin{adjustbox}{width=\linewidth}
\begin{tabular}{c|ccc|cc}
\toprule
\textbf{Anchor strategy}&\textbf{Local R}& \textbf{Local Angle}&\textbf{Aux Loss}&\textbf{F1@50}&\textbf{F1@75}\\
\midrule
\multirow{2}*{Fixed}
&- &- & &79.90 &60.98\\
&- &- &\checkmark&80.38 &62.35\\
\midrule
\multirow{5}*{Proposal}
& & & &75.85 &58.97\\
&\checkmark& & &78.46 &60.32\\
& &\checkmark& &80.31 &62.13\\
&\checkmark&\checkmark& &80.51 &63.38\\
&\checkmark&\checkmark&\checkmark&\textbf{80.81}&\textbf{63.97}\\
\bottomrule
\end{tabular}
\end{adjustbox}
\end{table}

\begin{table}[h]
\centering
\caption{NMS vs.\ NMS-free on CurveLanes}
\begin{adjustbox}{width=\linewidth}
\begin{tabular}{l|l|ccc}
\toprule
\textbf{Paradigm} & \textbf{NMS threshold (pixel)} & \textbf{F1(\%)} & \textbf{Precision(\%)} & \textbf{Recall(\%)} \\
\midrule
\multirow{7}*{PolarRCNN$_{o2m}$}
& 50 (default) &85.38&\textbf{91.01}&80.40\\
& 40 &85.97&90.72&81.68\\
& 30 &86.26&90.44&82.45\\
& 25 &86.38&90.27&82.83\\
& 20 &86.57&90.05&83.37\\
& 15 (optimal) &86.81&89.64&84.16\\
& 10 &86.58&88.62&\textbf{84.64}\\
\midrule
PolarRCNN (NMS-free) & - &\textbf{87.29}&90.50&84.31\\
\bottomrule
\end{tabular}
\end{adjustbox}
\end{table}

\begin{table}[h]
\centering
\caption{Ablation study on the NMS-free block}
\begin{adjustbox}{width=\linewidth}
\begin{tabular}{cccc|ccc}
\toprule
\textbf{GNN}&\textbf{Cls Mat}& \textbf{Nbr Mat}&\textbf{Rank Loss}&\textbf{F1@50}&\textbf{Precision(\%)} & \textbf{Recall(\%)} \\
\midrule
 & & & &16.19&69.05&9.17\\
\checkmark&\checkmark& & &79.42&88.46&72.06\\
\checkmark& &\checkmark& &71.97&73.13&70.84\\
\checkmark&\checkmark&\checkmark& &80.74&88.49&74.23\\
\checkmark&\checkmark&\checkmark&\checkmark&\textbf{80.78}&\textbf{88.49}&\textbf{74.30}\\
\bottomrule
\end{tabular}
\end{adjustbox}
\end{table}

\begin{table}[h]
|
||||
\centering
|
||||
\caption{The ablation study for structure on CULane test set}
|
||||
\begin{adjustbox}{width=\linewidth}
|
||||
\begin{tabular}{c|l|lll}
|
||||
\toprule
|
||||
\multicolumn{2}{c|}{\textbf{Anchor strategy~/~assign}} & \textbf{F1@50(\%)} & \textbf{Precision(\%)} & \textbf{Recall(\%)} \\
|
||||
\midrule
|
||||
\multirow{6}*{Fixed}
|
||||
&o2m-B w/~ NMS &80.38&87.44&74.38\\
|
||||
&o2m-B w/o NMS &44.03\textcolor{darkgreen}{~(36.35$\downarrow$)}&31.12\textcolor{darkgreen}{~(56.32$\downarrow$)}&75.23\textcolor{red}{~(0.85$\uparrow$)}\\
|
||||
\cline{2-5}
|
||||
&o2o-B w/~ NMS &78.72&87.58&71.50\\
|
||||
&o2o-B w/o NMS &78.23\textcolor{darkgreen}{~(0.49$\downarrow$)}&86.26\textcolor{darkgreen}{~(1.32$\downarrow$)}&71.57\textcolor{red}{~(0.07$\uparrow$)}\\
|
||||
\cline{2-5}
|
||||
&o2o-G w/~ NMS &80.37&87.44&74.37\\
|
||||
&o2o-G w/o NMS &80.27\textcolor{darkgreen}{~(0.10$\downarrow$)}&87.14\textcolor{darkgreen}{~(0.30$\downarrow$)}&74.40\textcolor{red}{~(0.03$\uparrow$)}\\
|
||||
\midrule
|
||||
\multirow{6}*{Proposal}
|
||||
&o2m-B w/~ NMS &80.81&88.53&74.33\\
|
||||
&o2m-B w/o NMS &36.46\textcolor{darkgreen}{~(44.35$\downarrow$)}&24.09\textcolor{darkgreen}{~(64.44$\downarrow$)}&74.93\textcolor{red}{~(0.6$\uparrow$)}\\
|
||||
\cline{2-5}
|
||||
&o2o-B w/~ NMS &77.27&92.64&66.28\\
|
||||
&o2o-B w/o NMS &47.11\textcolor{darkgreen}{~(30.16$\downarrow$)}&36.48\textcolor{darkgreen}{~(56.16$\downarrow$)}&66.48\textcolor{red}{~(0.20$\uparrow$)}\\
|
||||
\cline{2-5}
|
||||
&o2o-G w/~ NMS &80.81&88.53&74.32\\
|
||||
&o2o-G w/o NMS &80.80\textcolor{darkgreen}{~(0.01$\downarrow$)}&88.51\textcolor{darkgreen}{~(0.02$\downarrow$)}&74.33\textcolor{red}{~(0.01$\uparrow$)}\\
|
||||
\bottomrule
|
||||
\end{tabular}
|
||||
\end{adjustbox}
|
||||
\end{table}

\begin{table}[h]
\centering
\caption{Ablation study on the stop-gradient strategy on the CULane test set}
\begin{adjustbox}{width=\linewidth}
\begin{tabular}{c|c|lll}
\toprule
\multicolumn{2}{c|}{\textbf{Paradigm}} & \textbf{F1(\%)} & \textbf{Precision(\%)} & \textbf{Recall(\%)} \\
\midrule
\multirow{2}*{Baseline}
&o2m-B w/~ NMS &78.83&88.99&70.75\\
&o2o-G w/o NMS &71.68\textcolor{darkgreen}{~(7.15$\downarrow$)}&72.56\textcolor{darkgreen}{~(16.43$\downarrow$)}&70.81\textcolor{red}{~(0.06$\uparrow$)}\\
\midrule
\multirow{2}*{Stop gradient}
&o2m-B w/~ NMS &80.81&88.53&74.33\\
&o2o-G w/o NMS &80.80\textcolor{darkgreen}{~(0.01$\downarrow$)}&88.51\textcolor{darkgreen}{~(0.02$\downarrow$)}&74.33\textcolor{red}{~(0.00$\uparrow$)} \\
\bottomrule
\end{tabular}
\end{adjustbox}
\end{table}


\section{Conclusion}
The conclusion goes here.


\section*{Acknowledgments}
This should be a simple paragraph before the References to thank those individuals and institutions who have supported your work on this article.


%{\appendices
%\section*{Proof of the First Zonklar Equation}
%Appendix one text goes here.
% You can choose not to have a title for an appendix if you want by leaving the argument blank
%\section*{Proof of the Second Zonklar Equation}
%Appendix two text goes here.}


\bibliographystyle{IEEEtran}
\bibliography{ref}

\newpage

\section{Biography Section}
If you have an EPS/PDF photo (graphicx package needed), extra braces are
needed around the contents of the optional argument to biography to prevent
the LaTeX parser from getting confused when it sees the complicated
$\backslash${\tt{includegraphics}} command within an optional argument. (You can create
your own custom macro containing the $\backslash${\tt{includegraphics}} command to make things
simpler here.)

\vspace{11pt}

% \bf{If you include a photo:}\vspace{-33pt}
% \begin{IEEEbiography}[{\includegraphics[width=1in,height=1.25in,clip,keepaspectratio]{fig1}}]{Michael Shell}
% Use $\backslash${\tt{begin\{IEEEbiography\}}} and then for the 1st argument use $\backslash${\tt{includegraphics}} to declare and link the author photo.
% Use the author name as the 3rd argument followed by the biography text.
% \end{IEEEbiography}

\vspace{11pt}

\bf{If you will not include a photo:}\vspace{-33pt}
\begin{IEEEbiographynophoto}{John Doe}
Use $\backslash${\tt{begin\{IEEEbiographynophoto\}}} and the author name as the argument followed by the biography text.
\end{IEEEbiographynophoto}

\vfill

\end{document}

main333.tex
@ -1,451 +0,0 @@
\documentclass[lettersize,journal]{IEEEtran}
\usepackage{amsmath,amsfonts}
\usepackage{algorithmic}
\usepackage{algorithm}
\usepackage{array}
% \usepackage[caption=false,font=normalsize,labelfont=sf,textfont=sf]{subfig}
\usepackage{textcomp}
\usepackage{stfloats}
\usepackage{url}
\usepackage{verbatim}
\usepackage{graphicx}
\usepackage{cite}
\usepackage{subcaption}
% \usepackage{subfigure}
\usepackage[T1]{fontenc}

\hyphenation{op-tical net-works semi-conduc-tor IEEE-Xplore}
% updated with editorial comments 8/9/2021

\begin{document}

\title{PolarRCNN: Fewer Anchors for Lane Detection}

\author{IEEE Publication Technology,~\IEEEmembership{Staff,~IEEE,}
% <-this % stops a space
\thanks{This paper was produced by the IEEE Publication Technology Group. They are in Piscataway, NJ.}% <-this % stops a space
\thanks{Manuscript received April 19, 2021; revised August 16, 2021.}}

% The paper headers
\markboth{Journal of \LaTeX\ Class Files,~Vol.~14, No.~8, August~2021}%
{Shell \MakeLowercase{\textit{et al.}}: A Sample Article Using IEEEtran.cls for IEEE Journals}

% \IEEEpubid{0000--0000/00\$00.00~\copyright~2021 IEEE}
% Remember, if you use this you must call \IEEEpubidadjcol in the second
% column for its text to clear the IEEEpubid mark.

\maketitle

\begin{abstract}
Lane detection is a critical and challenging task in autonomous driving, particularly in real-world scenarios where traffic lanes are often slender, lengthy, and partially obscured by other vehicles. Existing anchor-based methods typically rely on prior line anchors or grid anchors to extract features and to regress lane location and shape. However, manually setting these prior anchors according to the lane distribution is cumbersome, and ensuring sufficient anchor coverage across diverse datasets requires a large number of anchors. In this study, we introduce PolarRCNN, a two-stage anchor-based method for lane detection. Our approach effectively reduces the number of lane anchors without sacrificing performance, yielding competitive results on three prominent 2D lane detection benchmarks (TuSimple, CULane, and LLAMAS) while maintaining a lightweight model size.
\end{abstract}

\begin{IEEEkeywords}
Lane detection
\end{IEEEkeywords}

\section{Introduction}
\IEEEPARstart{L}{ane} detection is a significant problem in computer vision and autonomous driving, forming the basis for accurately perceiving the driving environment in intelligent driving systems. While extensive research has been conducted in ideal environments, lane detection remains challenging in adverse scenarios such as night driving, glare, crowded roads, and rainy conditions, where lanes may be occluded or damaged. Moreover, the slender shapes and complex topologies of lanes add to the difficulty of detection. An effective lane detection method should take into account both high-level semantic features and low-level local features to address these varied conditions, and should run fast enough for real-time applications such as autonomous driving.

Traditional methods predominantly concentrate on handcrafted local feature extraction and lane shape modeling. Techniques such as the Canny edge detector\cite{canny1986computational}, Hough transform\cite{houghtransform}, and deformable templates for lane fitting\cite{kluge1995deformable} have been extensively utilized. Nevertheless, these approaches often encounter limitations in practical settings, particularly when low-level and local features lack clarity or distinctiveness.

In recent years, fueled by advancements in deep learning and the availability of large datasets, significant strides have been made in lane detection. Deep models, including convolutional neural networks (CNNs) and transformer-based architectures, have propelled progress in this domain. Previous approaches often treated lane detection as a segmentation task, which is conceptually simple but computationally intensive. Other methods relied on parameter-based models that directly output lane curve parameters instead of pixel locations. These models offer end-to-end solutions, but the sensitivity of the predicted lane shape to the curve parameters compromises their robustness.

\begin{figure}[t]
\centering
\begin{subfigure}{0.49\linewidth}
\centering
\includegraphics[width=0.9\linewidth]{lanefig/anchor_demo/anchor_fix_init.jpg}
\caption{}
\end{subfigure}
\begin{subfigure}{0.49\linewidth}
\centering
\includegraphics[width=0.9\linewidth]{lanefig/anchor_demo/anchor_fix_learned.jpg}
\caption{}
\end{subfigure}
%\qquad
% force a line break between the two rows of subfigures

\begin{subfigure}{0.49\linewidth}
\centering
\includegraphics[width=0.9\linewidth]{lanefig/anchor_demo/anchor_proposal.jpg}
\caption{}
\end{subfigure}
\begin{subfigure}{0.49\linewidth}
\centering
\includegraphics[width=0.9\linewidth]{lanefig/anchor_demo/gt.jpg}
\caption{}
\end{subfigure}
\caption{Comparison of anchor settings across methods. (a) The initial anchors of CLRNet. (b) The anchors of CLRNet learned on CULane. (c) The anchors proposed by our method. (d) The ground truth.}
\label{anchor setting}
\end{figure}

Drawing inspiration from object detection methods, several anchor-based approaches have been introduced for lane detection, such as row anchors and straight lane anchors. These methods have demonstrated superior performance by leveraging anchor priors and enabling larger receptive fields for feature extraction. However, anchor-based methods encounter challenges, chiefly in the configuration of anchor priors, i.e., in determining the optimal number and locations of anchors. The number of anchors must be sufficiently large to cover all potential lane locations, yet a large number of anchors increases model complexity and introduces numerous background (negative) anchors. Some studies utilize multiple row and column anchors, though label assignment remains a manually crafted process based on angle considerations. Alternatively, other approaches employ region proposal techniques, generating flexible, high-quality anchors for each image through theta maps and start-point maps rather than fixed anchors. Though this strategy offers adaptability, its performance may lag behind fixed-anchor (one-stage) approaches because feature training suffers under multitask loss functions. This phenomenon also appears in our method, and we will discuss it later.

In this paper, to address these problems of anchor-based methods, we propose PolarRCNN, a two-stage model based on local and global polar coordinate systems. As shown in Figure \ref{anchor setting}, different from previous works\cite{} that rely on a large number of predefined anchors, our method proposes fewer anchors of higher quality than other fixed-anchor methods. The proposal module is position aware, and most of the proposal anchors lie around the ground truth, providing a strong basis for the second stage to regress the lanes more accurately.

% We also introduce polar refinement to refine the anchor shape by segmentation. The architecture of our baseline is simple, only using CNN and MLP layers, without any complicated block such as self attention.
Our main contributions are summarized as follows:

\begin{itemize}
\item We simplify the anchor parameters with local and global polar coordinate systems.
\item We propose a local polar module to generate a set of high-quality, more flexible anchors for each image.
\item Our proposed method achieves competitive performance compared with other advanced methods, reaching an F1-score of more than 80\% on CULane with a lightweight ResNet-18 backbone.
\end{itemize}


\section{Related Works}
Lane detection aims to detect lane instances in an image. In this section, we only introduce deep-learning-based methods, which can be categorized into segmentation-based, parameter-based, and anchor-based methods.

\textbf{Segmentation-based Methods.} Segmentation-based methods focus on pixel-wise prediction. They classify each pixel into different categories according to different lane instances and background\cite{}, and predict information pixel by pixel. However, these methods overly focus on low-level and local features, neglecting global semantic information and real-time detection. SCNN uses a larger receptive field to overcome this problem. Some methods such as UFLDv1 and v2\cite{}\cite{} and CondLaneNet\cite{} utilize row-wise or column-wise classification instead of pixel classification to improve detection speed. Another issue with these methods is that the lane instance prior is learned by the model itself, leading to a lack of prior knowledge. LaneNet uses post-clustering to distinguish each lane instance. UFLD divides lane instances by angles and locations and can only detect a fixed number of lanes. CondLaneNet utilizes different conditional dynamic kernels to predict different lane instances. Some methods such as FOLOLane\cite{} and GANet\cite{} use bottom-up strategies to detect a few key points and model their global relations to form lane instances.

\textbf{Parameter-based Methods.} Instead of predicting a series of point locations or pixel classes, parameter-based methods directly generate the curve parameters of lane instances. PolyLaneNet\cite{} and LSTR\cite{} consider the lane instance as a polynomial curve and output the polynomial coefficients directly. BézierLaneNet\cite{} treats the lane instance as a Bézier curve and generates the locations of the control points of the curve. BSLane uses B-splines to describe the lane, with curve parameters that focus on the local shapes of lanes. Parameter-based methods are mostly end-to-end without postprocessing, which grants them faster speed. However, since the final visual lane shapes are sensitive to small errors in the predicted parameters, the robustness and generalization of parameter-based methods may be less than ideal.

\textbf{Anchor-based Methods.} Inspired by methods in general object detection like YOLO \cite{} and DETR \cite{}, anchor-based methods have been proposed for lane detection. Line-CNN is, to our knowledge, the earliest work that utilizes line anchors to detect lanes. The lines are designed as rays emitted from three edges (left, bottom, and right) of an image. However, the receptive field of the model only focuses on edges, and the model is slower than some other methods. LaneATT \cite{} employs anchor-based feature pooling to aggregate features along the whole line anchor, achieving faster speed with better performance. Nevertheless, its grid sampling strategy and label assignment limit its potential. CLRNet \cite{} utilizes cross-layer refinement strategies, SimOTA label assignment \cite{}, and a LIoU loss to push anchor-based performance beyond most methods. The main advantage of anchor-based methods is that many strategies from anchor-based general object detection, such as label assignment, bounding box refinement, and GIoU loss, can be easily applied to lane detection. However, the disadvantages of existing anchor-based lane detection are also evident: the line anchors need to be handcrafted, and the number of anchors is large, resulting in high computational cost. Motivated by this, ADNet \cite{} uses a theta map and a start point map to propose more flexible anchors, but its performance is lower than that of CLRNet, which employs a set of handcrafted predefined anchors.

To address the issues present in anchor-based methods, we have developed a novel anchor proposal module designed to achieve higher performance with fewer anchors.


\section{Method}
\subsection{Overall Architecture}
To reduce the number of anchors, we design a two-stage network for lane detection similar to Faster R-CNN \cite{}. Figure \ref{} illustrates the overall pipeline of our model. The backbone extracts the image features, the local polar proposal module serves as the first stage to propose line anchors, and the RCNN block serves as the second stage to aggregate the line features along the line anchors and predict the lane instances. We introduce these blocks in detail in the following subsections.

\subsection{Lane and Line Anchor Representation}

Lanes are thin and long curves, and a suitable lane prior helps the model extract features, predict locations, and model the shapes of lane curves more accurately. As in previous works\cite{}\cite{}, the lane priors in our work are straight lines, and we sample a sequence of 2D points on each line anchor, i.e., $P \doteq \left\{ \left( x_1, y_1 \right), \left( x_2, y_2 \right), \ldots, \left( x_N, y_N \right) \right\}$, where $N$ is the number of sampled points. The y-coordinates of the points are sampled uniformly over the image height, i.e., $y_i = \frac{H}{N-1}\left( i-1 \right)$, where $H$ is the image height. Points at the same y-coordinates are also sampled from the ground-truth lane, and the model regresses the x-coordinate offsets from the line anchor to the lane instance ground truth.
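
To make the regression target explicit (the notation below is ours; the definition is implied by the text above), the model predicts, for each sampled height $y_i$, the horizontal offset between the anchor and the ground truth:
\begin{equation}
\Delta x_i = x_{i}^{gt} - x_{i}^{anchor}, \qquad i = 1, \ldots, N,
\end{equation}
so that the final lane points are recovered as $x_{i}^{anchor} + \Delta x_i$ at the fixed heights $y_i$.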

\textbf{Polar Coordinate System.} Since the line anchors are always straight in our method, we use straight-line parameters to describe a line anchor. Previous work uses a ray to describe a line anchor, where the parameters of a ray comprise the start point's coordinates and its orientation/angle, i.e., $\left\{\theta, P_{xy}\right\}$, as shown in Figure \ref{coord} (a). Using a ray may cause ambiguity in describing a line, because a line may have infinitely many start points. As illustrated in Figure \ref{coord} (a), the yellow and dark green start points with the same orientation $\theta$ describe the same line. This ambiguity arises because a straight line has two degrees of freedom while a ray has three. Motivated by this, as shown in Figure \ref{coord} (b), we use polar coordinates to describe a line anchor with two parameters, radius and angle $\left\{\theta, r\right\}$, where $\theta \in \left[-\frac{\pi}{2}, \frac{\pi}{2}\right]$ and $r \in \left(-\infty, +\infty\right)$.
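
Concretely, a line anchor with parameters $\left\{\theta, r\right\}$ can be written in the Hesse normal form (a standard identity, stated here for completeness): it is the set of points satisfying
\begin{equation}
x\cos \theta + y\sin \theta = r,
\end{equation}
where the pole of the polar system is taken as the origin. This makes the two degrees of freedom explicit: every pair $\left( \theta, r \right)$ identifies exactly one line, with no ambiguity.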

% \begin{figure}[t]
% \centering
% \begin{subfigure}{1\linewidth}
% \centering
% \includegraphics[width=0.2\linewidth]{lanefig/coord/ray.png}
% \caption{}
% \end{subfigure}
% \begin{subfigure}{1\linewidth}
% \centering
% \includegraphics[width=0.2\linewidth]{lanefig/coord/plor.png}
% \caption{}
% \end{subfigure}
% \caption{Different descriptions of anchor parameters. (a) Ray: start point and orientation. (b) Polar: radius and angle.}
% \label{coord}
% \end{figure}

% \begin{figure}[t]
% \centering
% \begin{subfigure}{0.49\linewidth}
% \centering
% \includegraphics[width=1\linewidth]{lanefig/coord/ray.png}
% \caption{}
% \end{subfigure}
% \hfill
% \begin{subfigure}{0.49\linewidth}
% \centering
% \includegraphics[width=1\linewidth]{lanefig/coord/plor.png}
% \caption{}
% \end{subfigure}
% \caption{Different descriptions of anchor parameters. (a) Ray: start point and orientation. (b) Polar: radius and angle.}
% \label{coord}
% \end{figure}

The polar coordinate system requires an origin point. We define two kinds of polar coordinate systems, called the global coordinate system and the local coordinate system, with origin points denoted as the global origin $P_{0}^{\text{global}}$ and the local origin $P_{0}^{\text{local}}$, respectively. For convenience, the global origin is set near the static vanishing point of the lane image dataset, while the local origins can be set anywhere within the image. From Figure \ref{coord}, it is easy to see that both the global and local coordinate systems share the same angle parameter $\theta$ for the same line anchor; only the radius differs.

\subsection{Local Polar Proposal Module}

Just like the region proposal network in Faster R-CNN \cite{}, the local polar proposal module aims to propose flexible, high-quality anchors for an image. The backbone receives an image $I \in \mathbb{R}^{3 \times H \times W}$ and outputs the feature map $F \in \mathbb{R}^{C_{f} \times H_{f} \times W_{f}}$. We treat each of the $H_{f} \times W_{f}$ map grids as the local origin of a different local polar system. The output of the local polar proposal module consists of two branches. The first branch, the polar coordinate branch, predicts the anchor parameters under the corresponding local polar coordinates, i.e., $\left[\mathbf{\Theta}^{H_{f} \times W_{f}}, \mathbf{\xi}^{H_{f}\times W_{f}}\right]$, which denote the angle and radius maps, respectively. The second branch, the polar confidence branch, predicts the confidence of each proposed anchor in the first stage, i.e., $\delta^{H_{f} \times W_{f}}$. To keep the model lightweight, the local polar proposal module (LPM) is composed of several convolutional layers.
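
The following is a minimal PyTorch sketch of such a module, given only to make the tensor shapes concrete; the layer count, channel widths, and the sigmoid on the confidence map are our assumptions rather than the exact configuration used in this paper.

\begin{verbatim}
import torch
import torch.nn as nn

class LocalPolarProposal(nn.Module):
    """Per-grid polar parameters (theta, r) plus a confidence map."""
    def __init__(self, in_channels=64, hidden=64):
        super().__init__()
        self.shared = nn.Sequential(
            nn.Conv2d(in_channels, hidden, 3, padding=1),
            nn.ReLU(inplace=True),
        )
        # Polar coordinate branch: 2 channels = (angle, radius).
        self.coord_head = nn.Conv2d(hidden, 2, 1)
        # Polar confidence branch: 1 channel per grid cell; its input
        # is detached so its gradients never reach the backbone.
        self.conf_head = nn.Conv2d(hidden, 1, 1)

    def forward(self, feat):
        x = self.shared(feat)
        theta_r = self.coord_head(x)              # (B, 2, Hf, Wf)
        conf = torch.sigmoid(self.conf_head(x.detach()))
        return theta_r, conf

# Shape check on a dummy feature map (Hf x Wf = 10 x 25 assumed).
theta_r, conf = LocalPolarProposal()(torch.randn(1, 64, 10, 25))
print(theta_r.shape, conf.shape)
\end{verbatim}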

During the training stage, the ground truth of the proposal parameters is constructed as follows. The absolute value of the radius ground truth is defined as the shortest distance from a grid point (a local polar origin) to the lane curve. The ground truth of the angle is defined as the orientation of the segment from the grid point to the closest point on the curve. Only the grids with a radius smaller than a threshold $\tau$ are set as positive samples, while the others are set as negative samples. Figure \ref{LPM} illustrates the label construction process for the LPM. The LPM training loss function is therefore:

\begin{equation}
\begin{aligned}
\mathcal{L}_{LPM} &= BCE\left( \delta, \delta_{gt} \right) \\
&+ \sum_{i=1}^{N_{pos}}{\left( d\left( \theta_{pos,i} - \theta_{gt,i} \right) + d\left( r_{pos,i} - r_{gt,i} \right) \right)}
\end{aligned}
\label{loss_lpm}
\end{equation}

where $BCE\left( \cdot, \cdot \right)$ denotes the binary cross-entropy loss and $d\left( \cdot \right)$ denotes the smooth-L1 loss. To keep backbone training stable, the gradients from the confidence branch to the backbone feature map are detached.

% \begin{figure}[t]
% \centering
% \includegraphics[width=0.7\linewidth]{lanefig/coord/localplor.png}
% \caption{Label construction for the local polar proposal module.}
% \label{LPM}
% \end{figure}

During the test stage, once the local polar parameters of a line anchor are provided, they can be transformed to global polar coordinates with the following equation:
\begin{equation}
\begin{aligned}
r^{G}=r^{L}+\left( x^{L}-x^{G} \right) \cos \theta
\\+\left( y^{L}-y^{G} \right) \sin \theta
\end{aligned}
\end{equation}
where $\left( x^{L}, y^{L} \right)$ and $\left( x^{G}, y^{G} \right)$ are the Cartesian coordinates of the local and global origins, respectively. This identity follows directly from writing the same line in Hesse normal form with respect to the two origins: the angle $\theta$ is shared, and only the radius depends on the choice of origin.


\subsection{RCNN Module}
The second stage is the RCNN module, which accepts the pooled line features as input and predicts the accurate lane shape and location. Once the global polar parameters $\left\{ \theta, r \right\}$ of a proposal anchor are provided, feature points can be sampled along the line anchor. The y-coordinates of the points are sampled uniformly over the image height as mentioned before, and each $x_{i}$ is calculated by solving the polar line equation for $x$:

\begin{equation}
\begin{aligned}
x_{i} = -y_i\tan \theta + \frac{r}{\cos \theta}
\end{aligned}
\end{equation}
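
A plausible way to pool features at these sampled points (an implementation assumption on our part; the text does not spell out the sampling operator) is bilinear interpolation on the feature map, e.g., with PyTorch's \texttt{grid\_sample}:

\begin{verbatim}
import torch
import torch.nn.functional as F_nn

def pool_line_features(feat, theta, r, n_points=36):
    """Sample features along line anchors given polar params.

    feat: (B, C, Hf, Wf) feature map; theta, r: (B, K) anchor
    params in feature-map pixel units. Returns (B, K, n, C).
    """
    B, C, Hf, Wf = feat.shape
    y = torch.linspace(0, Hf - 1, n_points, device=feat.device)
    y = y.view(1, 1, -1)                               # (1,1,n)
    x = (-y * torch.tan(theta)[..., None]
         + (r / torch.cos(theta))[..., None])          # (B,K,n)
    # Normalize coordinates to [-1, 1] as grid_sample expects.
    gx = 2.0 * x / (Wf - 1) - 1.0
    gy = 2.0 * y / (Hf - 1) - 1.0
    grid = torch.stack([gx, gy.expand_as(gx)], dim=-1)  # (B,K,n,2)
    out = F_nn.grid_sample(feat, grid, mode='bilinear',
                           align_corners=True)          # (B,C,K,n)
    return out.permute(0, 2, 3, 1)
\end{verbatim}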

The RCNN module consists of several MLP layers and predicts the confidence and the coordinate offsets of the $x_{i}$. During the training stage, all $H_{f}\times W_{f}$ proposed anchors participate, and the SimOTA label assignment strategy \cite{} determines which anchors are positive, irrespective of the confidence predicted by the LPM. These strategies are employed because the negative/background anchors are also crucial for the adaptability of the RCNN module.

The loss function is as follows:

\begin{equation}
\begin{aligned}
\mathcal{L}_{RCNN} = c_{cls}\mathcal{L}_{cls} + c_{loc}\mathcal{L}_{loc}
\end{aligned}
\end{equation}

where $\mathcal{L}_{cls}$ is the focal loss and $\mathcal{L}_{loc}$ is the LaneIoU loss\cite{}.

In the testing stage, the anchors with the top-$k_{l}$ confidence scores are chosen as the proposal anchors, and these $k_{l}$ anchors are fed into the RCNN module to obtain the final predictions.
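
As a minimal sketch of this selection step (function and variable names are ours), the top-$k_{l}$ filtering amounts to:

\begin{verbatim}
import torch

def select_proposals(theta, r, conf, k=20):
    """Keep the k anchors with the highest LPM confidence.

    theta, r, conf: (B, Hf*Wf) flattened proposal maps.
    Returns (B, k) tensors of the selected anchor parameters.
    """
    topk_conf, idx = torch.topk(conf, k, dim=1)
    return torch.gather(theta, 1, idx), torch.gather(r, 1, idx), topk_conf
\end{verbatim}

Here $k$ plays the role of $k_{l}$; the ablation below suggests that values around 20 work well for a $4\times10$ local polar map.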


\section{Experiment}

\begin{table*}[h]
\centering
\caption{Comparison with other methods on CULane}
\begin{tabular}{cccccccccccc}
\hline
\textbf{Method}& \textbf{Backbone}& \textbf{F1@50}$\uparrow$ & \textbf{Normal}$\uparrow$&\textbf{Crowded}$\uparrow$&\textbf{Dazzle}$\uparrow$&\textbf{Shadow}$\uparrow$&\textbf{No line}$\uparrow$& \textbf{Arrow}$\uparrow$& \textbf{Curve}$\uparrow$& \textbf{Cross}$\downarrow$ & \textbf{Night}$\uparrow$ \\
\hline
\textbf{Segmentation Based} \\
\cline{1-1}
SCNN &VGG-16&71.60&90.60&69.70&58.50&66.90&43.40&84.10&64.40&1900&66.10 \\
RESA &ResNet50&75.30&92.10&73.10&69.20&72.80&47.70&83.30&70.30&1503&69.90 \\
LaneAF &DLA34&77.41&91.80&75.61&71.78&79.12&51.38&86.88&72.70&1360&73.03 \\
\cline{1-1}
\textbf{Parameter Based} \\
\cline{1-1}
% LSTR &ResNet18&64.00&&&&&&& \\
BézierLaneNet &ResNet18&73.67&90.22&71.55&62.49&70.91&45.30&84.09&58.98&996&68.70\\
BSNet &ResNet34&79.89&93.75&78.01&76.65&79.55&54.69&90.72&73.99&1455&75.28\\
% Eigenlanes &ResNet50&77.20 \\
% Laneformer &ResNet50&77.06 \\
\cline{1-1}
\textbf{Anchor Based} \\
\cline{1-1}
LaneATT &ResNet122&77.02&91.74&76.16&69.47&76.31&50.46&86.29&64.05&1264&70.81 \\
ADNet &ResNet34&78.94&92.90&77.45&71.71&79.11&52.89&89.90&70.64&1499&74.78 \\
% CLRNet &ResNet34&79.73 \\
CLRNet &ResNet101&80.13&93.85&78.78&72.49&82.33&54.50&89.79&75.57&1262&75.51 \\
CLRNet &DLA34&80.47&93.73&79.59&75.30&82.51&54.58&90.62&74.13&1155&75.37 \\
\hline
PolarRCNN (ours) &ResNet18&80.81&94.11&79.62&75.65&82.43&54.41&90.49&77.02&975&75.59\\
PolarRCNN-NMS-free (ours) &ResNet18&80.37&93.81&79.07&74.73&81.40&53.73&89.91&75.64&\textbf{941}&75.41\\
PolarRCNN (ours) &ResNet34&80.95&\textbf{94.33}&79.68&\textbf{75.87}&82.89&55.69&\textbf{90.90}&78.40&1182&75.84\\
PolarRCNN (ours) &ResNet50&\textbf{81.23}&94.32&\textbf{80.22}&75.04&\textbf{83.40}&\textbf{56.25}&90.41&\textbf{78.94}&1271&\textbf{76.16}\\
\hline
\end{tabular}
\label{tab:my_table}
\end{table*}

\begin{table}[h]
\centering
\caption{Ablation study on anchor parameterization and auxiliary loss on CULane}
\begin{tabular}{cccccc}
\hline
\textbf{Fixed anchor}& \textbf{Polar angle}& \textbf{Polar radius}&\textbf{Aux.\ loss}&\textbf{F1@50}&\textbf{F1@75} \\
\hline
\checkmark&&&\checkmark&80.29&62.05\\
&&\checkmark&\checkmark&54.48&25.28\\
&\checkmark&&\checkmark&80.30&62.67\\
&\checkmark&\checkmark&&80.51&63.09\\
&\checkmark&\checkmark&\checkmark&\textbf{80.81}&\textbf{63.64}\\
\hline
\end{tabular}
\end{table}

\begin{table}[h]
\centering
\caption{F1@50 on CULane for different local polar map sizes and top-$k$ anchor selections}
\begin{tabular}{ccccc}
\hline
\textbf{Local polar map size}& \textbf{Top-60}& \textbf{Top-40}&\textbf{Top-20}&\textbf{Top-10} \\
\hline
$2\times10$&/&/&80.54&80.50\\
$4\times10$&/&80.81&80.81&80.39\\
$5\times12$&80.86&80.86&80.82&79.68\\
\hline
\end{tabular}
\end{table}

\begin{table}[h]
\centering
\caption{Comparison with other methods on TuSimple}
\begin{tabular}{cccccc}
\hline
\textbf{Method}& \textbf{Backbone}& \textbf{F1(\%)}&\textbf{Acc(\%)}&\textbf{FP(\%)}&\textbf{FN(\%)} \\
\hline
SCNN&VGG16&95.97&96.53&6.17&1.80\\
PolyLaneNet&EfficientNetB0&90.62&93.36&9.42&9.33\\
UFLD&ResNet18&87.87&95.82&19.05&3.92\\
UFLD&ResNet34&88.02&95.86&18.91&3.75\\
LaneATT&ResNet34&96.77&95.63&3.53&2.92\\
LaneATT&ResNet122&96.06&96.10&5.64&2.17\\
FOLOLane&ERFNet&96.59&96.92&4.47&2.28\\
CondLaneNet&ResNet101&97.24&96.54&2.01&3.50\\
CLRNet&ResNet18&97.89&96.84&2.28&1.92\\
\hline
PolarRCNN (ours)&ResNet18&\textbf{98.00}&96.00&\textbf{1.75}&2.25\\
PolarRCNN-NMS-free (ours)&ResNet18&97.65&96.02&2.52&2.15\\
\hline
\end{tabular}
\end{table}

\begin{table}[h]
\centering
\caption{Ablation study on attention and embedding components (F1@50)}
\begin{tabular}{cccc}
\hline
\textbf{Attention}& \textbf{Id embeddings}& \textbf{Polar embeddings}&\textbf{F1@50}\\
\hline
&&&69.12\\
\checkmark&&&75.55\\
\checkmark&\checkmark&&78.30\\
\checkmark&&\checkmark&76.14\\
\checkmark&\checkmark&\checkmark&80.37\\
\hline
\end{tabular}
\end{table}


\bibliographystyle{IEEEtran}
\bibliography{ref}

\newpage

\section{Biography Section}
If you have an EPS/PDF photo (graphicx package needed), extra braces are
needed around the contents of the optional argument to biography to prevent
the LaTeX parser from getting confused when it sees the complicated
$\backslash${\tt{includegraphics}} command within an optional argument. (You can create
your own custom macro containing the $\backslash${\tt{includegraphics}} command to make things
simpler here.)

\vspace{11pt}

\bf{If you include a photo:}\vspace{-33pt}
\begin{IEEEbiography}[{\includegraphics[width=1in,height=1.25in,clip,keepaspectratio]{fig1}}]{Michael Shell}
Use $\backslash${\tt{begin\{IEEEbiography\}}} and then for the 1st argument use $\backslash${\tt{includegraphics}} to declare and link the author photo.
Use the author name as the 3rd argument followed by the biography text.
\end{IEEEbiography}

\vspace{11pt}

\bf{If you will not include a photo:}\vspace{-33pt}
\begin{IEEEbiographynophoto}{John Doe}
Use $\backslash${\tt{begin\{IEEEbiographynophoto\}}} and the author name as the argument followed by the biography text.
\end{IEEEbiographynophoto}

\vfill

\end{document}

main_copy.tex
@ -1,657 +0,0 @@
\documentclass[lettersize,journal]{IEEEtran}
\usepackage{amsmath,amsfonts}
\usepackage{algorithmic}
\usepackage{algorithm}
\usepackage{array}
\usepackage[caption=false,font=normalsize,labelfont=sf,textfont=sf]{subfig}
\usepackage{textcomp}
\usepackage{stfloats}
\usepackage{url}
\usepackage{verbatim}
\usepackage{graphicx}
\usepackage{cite}
\hyphenation{op-tical net-works semi-conduc-tor IEEE-Xplore}
% updated with editorial comments 8/9/2021

\begin{document}

\title{Rethinking the Anchor of Lane Detection}

\author{IEEE Publication Technology,~\IEEEmembership{Staff,~IEEE,}
% <-this % stops a space
\thanks{This paper was produced by the IEEE Publication Technology Group. They are in Piscataway, NJ.}% <-this % stops a space
\thanks{Manuscript received April 19, 2021; revised August 16, 2021.}}

% The paper headers
\markboth{Journal of \LaTeX\ Class Files,~Vol.~14, No.~8, August~2021}%
{Shell \MakeLowercase{\textit{et al.}}: A Sample Article Using IEEEtran.cls for IEEE Journals}

\IEEEpubid{0000--0000/00\$00.00~\copyright~2021 IEEE}
% Remember, if you use this you must call \IEEEpubidadjcol in the second
% column for its text to clear the IEEEpubid mark.

\maketitle

\begin{abstract}
This document describes the most common article elements and how to use the IEEEtran class with \LaTeX \ to produce files that are suitable for submission to the IEEE. IEEEtran can produce conference, journal, and technical note (correspondence) papers with a suitable choice of class options.
\end{abstract}

\begin{IEEEkeywords}
Article submission, IEEE, IEEEtran, journal, \LaTeX, paper, template, typesetting.
\end{IEEEkeywords}

\section{Introduction}
\IEEEPARstart{T}{his} file is intended to serve as a ``sample article file''
for IEEE journal papers produced under \LaTeX\ using
IEEEtran.cls version 1.8b and later. The most common elements are covered in the simplified and updated instructions in ``New\_IEEEtran\_how-to.pdf''. For less common elements you can refer back to the original ``IEEEtran\_HOWTO.pdf''. It is assumed that the reader has a basic working knowledge of \LaTeX. Those who are new to \LaTeX \ are encouraged to read Tobias Oetiker's ``The Not So Short Introduction to \LaTeX ,'' available at: \url{http://tug.ctan.org/info/lshort/english/lshort.pdf} which provides an overview of working with \LaTeX.

\section{The Design, Intent, and \\ Limitations of the Templates}
The templates are intended to {\bf{approximate the final look and page length of the articles/papers}}. {\bf{They are NOT intended to be the final produced work that is displayed in print or on IEEEXplore\textsuperscript{\textregistered}}}. They will help to give the authors an approximation of the number of pages that will be in the final version. The structure of the \LaTeX\ files, as designed, enables easy conversion to XML for the composition systems used by the IEEE. The XML files are used to produce the final print/IEEEXplore pdf and then converted to HTML for IEEEXplore.

\section{Where to Get \LaTeX \ Help --- User Groups}
The following online groups are helpful to beginning and experienced \LaTeX\ users. A search through their archives can provide many answers to common questions.
\begin{list}{}{}
\item{\url{http://www.latex-community.org/}}
\item{\url{https://tex.stackexchange.com/} }
\end{list}

\section{Other Resources}
See \cite{ref1,ref2,ref3,ref4,ref5} for resources on formatting math into text and additional help in working with \LaTeX .

\section{Text}
For some of the remainder of this sample we will use dummy text to fill out paragraphs rather than use live text that may violate a copyright.

Itam, que ipiti sum dem velit la sum et dionet quatibus apitet voloritet audam, qui aliciant voloreicid quaspe volorem ut maximusandit faccum conemporerum aut ellatur, nobis arcimus.
Fugit odi ut pliquia incitium latum que cusapere perit molupta eaquaeria quod ut optatem poreiur? Quiaerr ovitior suntiant litio bearciur?

Onseque sequaes rectur autate minullore nusae nestiberum, sum voluptatio. Et ratem sequiam quaspername nos rem repudandae volum consequis nos eium aut as molupta tectum ulparumquam ut maximillesti consequas quas inctia cum volectinusa porrum unt eius cusaest exeritatur? Nias es enist fugit pa vollum reium essusam nist et pa aceaqui quo elibusdandis deligendus que nullaci lloreri bla que sa coreriam explacc atiumquos simolorpore, non prehendunt lam que occum\cite{ref6} si aut aut maximus eliaeruntia dia sequiamenime natem sendae ipidemp orehend uciisi omnienetus most verum, ommolendi omnimus, est, veni aut ipsa volendelist mo conserum volores estisciis recessi nveles ut poressitatur sitiis ex endi diti volum dolupta aut aut odi as eatquo cullabo remquis toreptum et des accus dolende pores sequas dolores tinust quas expel moditae ne sum quiatis nis endipie nihilis etum fugiae audi dia quiasit quibus.
\IEEEpubidadjcol
Ibus el et quatemo luptatque doluptaest et pe volent rem ipidusa eribus utem venimolorae dera qui acea quam etur aceruptat.
Gias anis doluptaspic tem et aliquis alique inctiuntiur?

Sedigent, si aligend elibuscid ut et ium volo tem eictore pellore ritatus ut ut ullatus in con con pere nos ab ium di tem aliqui od magnit repta volectur suntio. Nam isquiante doluptis essit, ut eos suntionsecto debitiur sum ea ipitiis adipit, oditiore, a dolorerempos aut harum ius, atquat.

Rum rem ditinti sciendunti volupiciendi sequiae nonsect oreniatur, volores sition ressimil inus solut ea volum harumqui to see\eqref{deqn_ex1a} mint aut quat eos explis ad quodi debis deliqui aspel earcius.

\begin{equation}
\label{deqn_ex1a}
x = \sum_{i=0}^{n} 2{i} Q.
\end{equation}

Alis nime volorempera perferi sitio denim repudae pre ducilit atatet volecte ssimillorae dolore, ut pel ipsa nonsequiam in re nus maiost et que dolor sunt eturita tibusanis eatent a aut et dio blaudit reptibu scipitem liquia consequodi od unto ipsae. Et enitia vel et experferum quiat harum sa net faccae dolut voloria nem. Bus ut labo. Ita eum repraer rovitia samendit aut et volupta tecupti busant omni quiae porro que nossimodic temquis anto blacita conse nis am, que ereperum eumquam quaescil imenisci quae magnimos recus ilibeaque cum etum iliate prae parumquatemo blaceaquiam quundia dit apienditem rerit re eici quaes eos sinvers pelecabo. Namendignis as exerupit aut magnim ium illabor roratecte plic tem res apiscipsam et vernat untur a deliquaest que non cus eat ea dolupiducim fugiam volum hil ius dolo eaquis sitis aut landesto quo corerest et auditaquas ditae voloribus, qui optaspis exero cusa am, ut plibus.


\section{Some Common Elements}
\subsection{Sections and Subsections}
Enumeration of section headings is desirable, but not required. When numbered, please be consistent throughout the article, that is, all headings and all levels of section headings in the article should be enumerated. Primary headings are designated with Roman numerals, secondary with capital letters, tertiary with Arabic numbers; and quaternary with lowercase letters. Reference and Acknowledgment headings are unlike all other section headings in text. They are never enumerated. They are simply primary headings without labels, regardless of whether the other headings in the article are enumerated.

\subsection{Citations to the Bibliography}
The coding for the citations is made with the \LaTeX\ $\backslash${\tt{cite}} command.
This will display as: see \cite{ref1}.

For multiple citations code as follows: {\tt{$\backslash$cite\{ref1,ref2,ref3\}}}
which will produce \cite{ref1,ref2,ref3}. For reference ranges that are not consecutive code as {\tt{$\backslash$cite\{ref1,ref2,ref3,ref9\}}} which will produce \cite{ref1,ref2,ref3,ref9}

\subsection{Lists}
In this section, we will consider three types of lists: simple unnumbered, numbered, and bulleted. There have been many options added to IEEEtran to enhance the creation of lists. If your lists are more complex than those shown below, please refer to the original ``IEEEtran\_HOWTO.pdf'' for additional options.\\

\subsubsection*{\bf A plain unnumbered list}
\begin{list}{}{}
\item{bare\_jrnl.tex}
\item{bare\_conf.tex}
\item{bare\_jrnl\_compsoc.tex}
\item{bare\_conf\_compsoc.tex}
\item{bare\_jrnl\_comsoc.tex}
\end{list}

\subsubsection*{\bf A simple numbered list}
\begin{enumerate}
\item{bare\_jrnl.tex}
\item{bare\_conf.tex}
\item{bare\_jrnl\_compsoc.tex}
\item{bare\_conf\_compsoc.tex}
\item{bare\_jrnl\_comsoc.tex}
\end{enumerate}

\subsubsection*{\bf A simple bulleted list}
\begin{itemize}
\item{bare\_jrnl.tex}
\item{bare\_conf.tex}
\item{bare\_jrnl\_compsoc.tex}
\item{bare\_conf\_compsoc.tex}
\item{bare\_jrnl\_comsoc.tex}
\end{itemize}

\subsection{Figures}
Fig. 1 is an example of a floating figure using the graphicx package.
Note that $\backslash${\tt{label}} must occur AFTER (or within) $\backslash${\tt{caption}}.
For figures, $\backslash${\tt{caption}} should occur after the $\backslash${\tt{includegraphics}}.

\begin{figure}[!t]
\centering
\includegraphics[width=2.5in]{fig1}
\caption{Simulation results for the network.}
\label{fig_1}
\end{figure}

Fig. 2(a) and 2(b) show an example of a double column floating figure using two subfigures.
(The subfig.sty package must be loaded for this to work.)
The subfigure $\backslash${\tt{label}} commands are set within each subfloat command,
and the $\backslash${\tt{label}} for the overall figure must come after $\backslash${\tt{caption}}.
$\backslash${\tt{hfil}} is used as a separator to get equal spacing.
The combined width of all the parts of the figure should not exceed the text width or a line break will occur.
%
\begin{figure*}[!t]
\centering
\subfloat[]{\includegraphics[width=2.5in]{fig1}%
\label{fig_first_case}}
\hfil
\subfloat[]{\includegraphics[width=2.5in]{fig1}%
\label{fig_second_case}}
\caption{Dae. Ad quatur autat ut porepel itemoles dolor autem fuga. Bus quia con nessunti as remo di quatus non perum que nimus. (a) Case I. (b) Case II.}
\label{fig_sim}
\end{figure*}

Note that often IEEE papers with multi-part figures do not place the labels within the image itself (using the optional argument to $\backslash${\tt{subfloat}}[]), but instead will
reference/describe all of them (a), (b), etc., within the main caption.
Be aware that for subfig.sty to generate the (a), (b), etc., subfigure
labels, the optional argument to $\backslash${\tt{subfloat}} must be present. If a
subcaption is not desired, leave its contents blank,
e.g., $\backslash${\tt{subfloat}}[].

\section{Tables}
Note that, for IEEE-style tables, the
$\backslash${\tt{caption}} command should come BEFORE the table. Table captions use title case. Articles (a, an, the), coordinating conjunctions (and, but, for, or, nor), and most short prepositions are lowercase unless they are the first or last word. Table text will default to $\backslash${\tt{footnotesize}} as
the IEEE normally uses this smaller font for tables.
The $\backslash${\tt{label}} must come after $\backslash${\tt{caption}} as always.

\begin{table}[!t]
\caption{An Example of a Table\label{tab:table1}}
\centering
\begin{tabular}{|c||c|}
\hline
One & Two\\
\hline
Three & Four\\
\hline
\end{tabular}
\end{table}

\begin{table*}[h]
\centering
\caption{Comparison with other methods on CULane}
\begin{tabular}{cccccccccccc}
\hline
\textbf{Method}& \textbf{Backbone}& \textbf{F1@50}$\uparrow$ & \textbf{Normal}$\uparrow$&\textbf{Crowded}$\uparrow$&\textbf{Dazzle}$\uparrow$&\textbf{Shadow}$\uparrow$&\textbf{No line}$\uparrow$& \textbf{Arrow}$\uparrow$& \textbf{Curve}$\uparrow$& \textbf{Cross}$\downarrow$ & \textbf{Night}$\uparrow$ \\
\hline
\textbf{Segmentation Based} \\
\cline{1-1}
SCNN &VGG-16&71.60&90.60&69.70&58.50&66.90&43.40&84.10&64.40&1900&66.10 \\
RESA &ResNet50&75.30&92.10&73.10&69.20&72.80&47.70&83.30&70.30&1503&69.90 \\
LaneAF &DLA34&77.41&91.80&75.61&71.78&79.12&51.38&86.88&72.70&1360&73.03 \\
\cline{1-1}
\textbf{Parameter Based} \\
\cline{1-1}
% LSTR &ResNet18&64.00&&&&&&& \\
BézierLaneNet &ResNet18&73.67&90.22&71.55&62.49&70.91&45.30&84.09&58.98&996&68.70\\
BSNet &ResNet34&79.89&93.75&78.01&76.65&79.55&54.69&90.72&73.99&1455&75.28\\
% Eigenlanes &ResNet50&77.20 \\
% Laneformer &ResNet50&77.06 \\
\cline{1-1}
\textbf{Anchor Based} \\
\cline{1-1}
LaneATT &ResNet122&77.02&91.74&76.16&69.47&76.31&50.46&86.29&64.05&1264&70.81 \\
ADNet &ResNet34&78.94&92.90&77.45&71.71&79.11&52.89&89.90&70.64&1499&74.78 \\
% CLRNet &ResNet34&79.73 \\
CLRNet &ResNet101&80.13&93.85&78.78&72.49&82.33&54.50&89.79&75.57&1262&75.51 \\
CLRNet &DLA34&80.47&93.73&79.59&75.30&82.51&54.58&90.62&74.13&1155&75.37 \\
\hline
PolarRCNN (ours) &ResNet18&80.81&94.11&79.62&75.65&82.43&54.41&90.49&77.02&975&75.59\\
PolarRCNN-NMS-free (ours) &ResNet18&80.37&93.81&79.07&74.73&81.40&53.73&89.91&75.64&\textbf{941}&75.41\\
PolarRCNN (ours) &ResNet34&80.95&\textbf{94.33}&79.68&\textbf{75.87}&82.89&55.69&\textbf{90.90}&78.40&1182&75.84\\
PolarRCNN (ours) &ResNet50&\textbf{81.23}&94.32&\textbf{80.22}&75.04&\textbf{83.40}&\textbf{56.25}&90.41&\textbf{78.94}&1271&\textbf{76.16}\\
\hline
\end{tabular}
\label{tab:my_table}
\end{table*}

\section{Algorithms}
Algorithms should be numbered and include a short title. They are set off from the text with rules above and below the title and after the last line.

\begin{algorithm}[H]
\caption{Weighted Tanimoto ELM.}\label{alg:alg1}
\begin{algorithmic}
\STATE
\STATE {\textsc{TRAIN}}$(\mathbf{X} \mathbf{T})$
\STATE \hspace{0.5cm}$ \textbf{select randomly } W \subset \mathbf{X} $
\STATE \hspace{0.5cm}$ N_\mathbf{t} \gets | \{ i : \mathbf{t}_i = \mathbf{t} \} | $ \textbf{ for } $ \mathbf{t}= -1,+1 $
\STATE \hspace{0.5cm}$ B_i \gets \sqrt{ \textsc{max}(N_{-1},N_{+1}) / N_{\mathbf{t}_i} } $ \textbf{ for } $ i = 1,...,N $
\STATE \hspace{0.5cm}$ \hat{\mathbf{H}} \gets B \cdot (\mathbf{X}^T\textbf{W})/( \mathbb{1}\mathbf{X} + \mathbb{1}\textbf{W} - \mathbf{X}^T\textbf{W} ) $
\STATE \hspace{0.5cm}$ \beta \gets \left ( I/C + \hat{\mathbf{H}}^T\hat{\mathbf{H}} \right )^{-1}(\hat{\mathbf{H}}^T B\cdot \mathbf{T}) $
\STATE \hspace{0.5cm}\textbf{return} $\textbf{W}, \beta $
\STATE
\STATE {\textsc{PREDICT}}$(\mathbf{X} )$
\STATE \hspace{0.5cm}$ \mathbf{H} \gets (\mathbf{X}^T\textbf{W} )/( \mathbb{1}\mathbf{X} + \mathbb{1}\textbf{W}- \mathbf{X}^T\textbf{W} ) $
\STATE \hspace{0.5cm}\textbf{return} $\textsc{sign}( \mathbf{H} \beta )$
\end{algorithmic}
\label{alg1}
\end{algorithm}

Que sunt eum lam eos si dic to estist, culluptium quid qui nestrum nobis reiumquiatur minimus minctem. Ro moluptat fuga. Itatquiam ut laborpo rersped exceres vollandi repudaerem. Ulparci sunt, qui doluptaquis sumquia ndestiu sapient iorepella sunti veribus. Ro moluptat fuga. Itatquiam ut laborpo rersped exceres vollandi repudaerem.

\section{Mathematical Typography \\ and Why It Matters}

Typographical conventions for mathematical formulas have been developed to {\bf provide uniformity and clarity of presentation across mathematical texts}. This enables the readers of those texts to both understand the author's ideas and to grasp new concepts quickly. While software such as \LaTeX \ and MathType\textsuperscript{\textregistered} can produce aesthetically pleasing math when used properly, it is also very easy to misuse the software, potentially resulting in incorrect math display.

IEEE aims to provide authors with the proper guidance on mathematical typesetting style and assist them in writing the best possible article. As such, IEEE has assembled a set of examples of good and bad mathematical typesetting \cite{ref1,ref2,ref3,ref4,ref5}.

Further examples can be found at \url{http://journals.ieeeauthorcenter.ieee.org/wp-content/uploads/sites/7/IEEE-Math-Typesetting-Guide-for-LaTeX-Users.pdf}

\subsection{Display Equations}
The simple display equation example shown below uses the ``equation'' environment. To number the equations, use the $\backslash${\tt{label}} macro to create an identifier for the equation. LaTeX will automatically number the equation for you.
\begin{equation}
\label{deqn_ex1}
x = \sum_{i=0}^{n} 2{i} Q.
\end{equation}

\noindent is coded as follows:
\begin{verbatim}
\begin{equation}
\label{deqn_ex1}
x = \sum_{i=0}^{n} 2{i} Q.
\end{equation}
\end{verbatim}

To reference this equation in the text use the $\backslash${\tt{ref}} macro.
Please see (\ref{deqn_ex1})\\
\noindent is coded as follows:
\begin{verbatim}
Please see (\ref{deqn_ex1})\end{verbatim}

\subsection{Equation Numbering}
{\bf{Consecutive Numbering:}} Equations within an article are numbered consecutively from the beginning of the
article to the end, i.e., (1), (2), (3), (4), (5), etc. Do not use roman numerals or section numbers for equation numbering.

\noindent {\bf{Appendix Equations:}} The continuation of consecutively numbered equations is best in the Appendix, but numbering
as (A1), (A2), etc., is permissible.\\

\noindent {\bf{Hyphens and Periods}}: Hyphens and periods should not be used in equation numbers, i.e., use (1a) rather than
(1-a) and (2a) rather than (2.a) for subequations. This should be consistent throughout the article.

\subsection{Multi-Line Equations and Alignment}
Here we show several examples of multi-line equations and proper alignments.

\noindent {\bf{A single equation that must break over multiple lines due to length with no specific alignment.}}
\begin{multline}
\text{The first line of this example}\\
\text{The second line of this example}\\
\text{The third line of this example}
\end{multline}

\noindent is coded as:
\begin{verbatim}
\begin{multline}
\text{The first line of this example}\\
\text{The second line of this example}\\
\text{The third line of this example}
\end{multline}
\end{verbatim}

\noindent {\bf{A single equation with multiple lines aligned at the = signs}}
\begin{align}
a &= c+d \\
b &= e+f
\end{align}
\noindent is coded as:
\begin{verbatim}
\begin{align}
a &= c+d \\
b &= e+f
\end{align}
\end{verbatim}

The {\tt{align}} environment can align on multiple points as shown in the following example:
\begin{align}
x &= y & X & =Y & a &=bc\\
x' &= y' & X' &=Y' &a' &=bz
\end{align}
\noindent is coded as:
\begin{verbatim}
\begin{align}
x &= y & X & =Y & a &=bc\\
x' &= y' & X' &=Y' &a' &=bz
\end{align}
\end{verbatim}

\subsection{Subnumbering}
The amsmath package provides a {\tt{subequations}} environment to facilitate subnumbering. An example:

\begin{subequations}\label{eq:2}
\begin{align}
f&=g \label{eq:2A}\\
f' &=g' \label{eq:2B}\\
\mathcal{L}f &= \mathcal{L}g \label{eq:2c}
\end{align}
\end{subequations}

\noindent is coded as:
\begin{verbatim}
\begin{subequations}\label{eq:2}
\begin{align}
f&=g \label{eq:2A}\\
f' &=g' \label{eq:2B}\\
\mathcal{L}f &= \mathcal{L}g \label{eq:2c}
\end{align}
\end{subequations}
\end{verbatim}
|
||||
|
||||
\subsection{Matrices}
|
||||
There are several useful matrix environments that can save you some keystrokes. See the example coding below and the output.
|
||||
|
||||
\noindent {\bf{A simple matrix:}}
|
||||
\begin{equation}
|
||||
\begin{matrix} 0 & 1 \\
|
||||
1 & 0 \end{matrix}
|
||||
\end{equation}
|
||||
is coded as:
|
||||
\begin{verbatim}
|
||||
\begin{equation}
|
||||
\begin{matrix} 0 & 1 \\
|
||||
1 & 0 \end{matrix}
|
||||
\end{equation}
|
||||
\end{verbatim}
|
||||
|
||||
\noindent {\bf{A matrix with parenthesis}}
|
||||
\begin{equation}
|
||||
\begin{pmatrix} 0 & -i \\
|
||||
i & 0 \end{pmatrix}
|
||||
\end{equation}
|
||||
is coded as:
|
||||
\begin{verbatim}
|
||||
\begin{equation}
|
||||
\begin{pmatrix} 0 & -i \\
|
||||
i & 0 \end{pmatrix}
|
||||
\end{equation}
|
||||
\end{verbatim}
|
||||
|
||||
\noindent {\bf{A matrix with square brackets}}
|
||||
\begin{equation}
|
||||
\begin{bmatrix} 0 & -1 \\
|
||||
1 & 0 \end{bmatrix}
|
||||
\end{equation}
|
||||
is coded as:
|
||||
\begin{verbatim}
|
||||
\begin{equation}
|
||||
\begin{bmatrix} 0 & -1 \\
|
||||
1 & 0 \end{bmatrix}
|
||||
\end{equation}
|
||||
\end{verbatim}
|
||||
|
||||
\noindent {\bf{A matrix with curly braces}}
|
||||
\begin{equation}
|
||||
\begin{Bmatrix} 1 & 0 \\
|
||||
0 & -1 \end{Bmatrix}
|
||||
\end{equation}
|
||||
is coded as:
|
||||
\begin{verbatim}
|
||||
\begin{equation}
|
||||
\begin{Bmatrix} 1 & 0 \\
|
||||
0 & -1 \end{Bmatrix}
|
||||
\end{equation}\end{verbatim}
|
||||
|
||||
\noindent {\bf{A matrix with single verticals:}}
\begin{equation}
\begin{vmatrix} a & b \\
c & d \end{vmatrix}
\end{equation}
is coded as:
\begin{verbatim}
\begin{equation}
\begin{vmatrix} a & b \\
c & d \end{vmatrix}
\end{equation}
\end{verbatim}

\noindent {\bf{A matrix with double verticals:}}
\begin{equation}
\begin{Vmatrix} i & 0 \\
0 & -i \end{Vmatrix}
\end{equation}
is coded as:
\begin{verbatim}
\begin{equation}
\begin{Vmatrix} i & 0 \\
0 & -i \end{Vmatrix}
\end{equation}
\end{verbatim}

\subsection{Arrays}
The {\tt{array}} environment gives you more control over matrix-like equations. You must key the fences manually, but in return the argument to {\tt{array}} lets you set the alignment of each column and the placement of vertical rules; for example, {\tt{\{cccc\}}} produces four centered columns. Horizontal rules are added with $\backslash${\tt{hline}}.

A simple array:
\begin{equation}
\left(
\begin{array}{cccc}
a+b+c & uv & x-y & 27\\
a+b & u+v & z & 134
\end{array}\right)
\end{equation}
is coded as:
\begin{verbatim}
\begin{equation}
\left(
\begin{array}{cccc}
a+b+c & uv & x-y & 27\\
a+b & u+v & z & 134
\end{array}\right)
\end{equation}
\end{verbatim}

A slight variation, using an {\tt{r}} column descriptor to right-align the numbers in the last column:
\begin{equation}
\left(
\begin{array}{cccr}
a+b+c & uv & x-y & 27\\
a+b & u+v & z & 134
\end{array}\right)
\end{equation}
is coded as:
\begin{verbatim}
\begin{equation}
\left(
\begin{array}{cccr}
a+b+c & uv & x-y & 27\\
a+b & u+v & z & 134
\end{array}\right)
\end{equation}
\end{verbatim}

An array with vertical and horizontal rules:
\begin{equation}
\left( \begin{array}{c|c|c|r}
a+b+c & uv & x-y & 27\\ \hline
a+b & u+v & z & 134
\end{array}\right)
\end{equation}
is coded as:
\begin{verbatim}
\begin{equation}
\left(
\begin{array}{c|c|c|r}
a+b+c & uv & x-y & 27\\ \hline
a+b & u+v & z & 134
\end{array}\right)
\end{equation}
\end{verbatim}
Note that the argument now includes the pipe ``$\vert$'' to indicate the placement of the vertical rules, and that $\backslash${\tt{hline}} after the first row produces the horizontal rule.

\subsection{Cases Structures}
Cases are often miscoded using the wrong environment, i.e., {\tt{array}}. Using the {\tt{cases}} environment saves keystrokes (you do not have to type the $\backslash${\tt{left}}$\backslash${\tt{lbrace}}) and automatically provides the correct column alignment.
\begin{equation*}
{z_m(t)} = \begin{cases}
1,&{\text{if}}\ {\beta }_m(t),\\
{0,}&{\text{otherwise.}}
\end{cases}
\end{equation*}
\noindent is coded as follows:
\begin{verbatim}
\begin{equation*}
{z_m(t)} =
\begin{cases}
1,&{\text{if}}\ {\beta }_m(t),\\
{0,}&{\text{otherwise.}}
\end{cases}
\end{equation*}
\end{verbatim}
\noindent Note that the ``\&'' is used to mark the tabular alignment. This is important for proper column alignment. Do not use $\backslash${\tt{quad}} or other fixed spaces to try to align the columns. Also note the use of the $\backslash${\tt{text}} macro for text elements such as ``if'' and ``otherwise.''

\subsection{Function Formatting in Equations}
Most common functions can be formatted correctly with little effort: placing a $\backslash$ in front of the function name will, in most cases, produce the proper upright type. When this does not work, the following example provides a solution using the $\backslash${\tt{text}} macro:

\begin{equation*}
d_{R}^{KM} = \underset {d_{l}^{KM}} {\text{arg min}} \{ d_{1}^{KM},\ldots,d_{6}^{KM}\}.
\end{equation*}

\noindent is coded as follows:
\begin{verbatim}
\begin{equation*}
d_{R}^{KM} = \underset {d_{l}^{KM}}
{\text{arg min}} \{ d_{1}^{KM},
\ldots,d_{6}^{KM}\}.
\end{equation*}
\end{verbatim}

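\noindent For the common case, predefined operator names such as $\backslash${\tt{cos}} and $\backslash${\tt{max}} set the function name in upright type automatically; a minimal sketch (the symbols are arbitrary placeholders):
\begin{verbatim}
\begin{equation*}
y(t) = \max_{k} \cos(\omega_{k} t)
\end{equation*}
\end{verbatim}
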
\subsection{Text Acronyms Inside Equations}
This example shows the acronym ``MSE'' coded using $\backslash${\tt{text\{\}}} to match how it appears in the text.

\begin{equation*}
\text{MSE} = \frac {1}{n}\sum _{i=1}^{n}(Y_{i} - \hat {Y_{i}})^{2}
\end{equation*}

\noindent is coded as follows:
\begin{verbatim}
\begin{equation*}
\text{MSE} = \frac {1}{n}\sum _{i=1}^{n}
(Y_{i} - \hat {Y_{i}})^{2}
\end{equation*}
\end{verbatim}

\section{Conclusion}
The conclusion goes here.

\section*{Acknowledgments}
This should be a simple paragraph before the References to thank those individuals and institutions who have supported your work on this article.

{\appendix[Proof of the Zonklar Equations]
Use $\backslash${\tt{appendix}} if you have a single appendix.
After $\backslash${\tt{appendix}}, do not use $\backslash${\tt{section}}; use only $\backslash${\tt{section*}}.
If you have multiple appendixes, use $\backslash${\tt{appendices}} and then use $\backslash${\tt{section}} to start each appendix.
You must declare a $\backslash${\tt{section}} before using any $\backslash${\tt{subsection}} or $\backslash${\tt{label}} ($\backslash${\tt{appendices}} by itself starts a section numbered zero).}

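\noindent A minimal sketch of the multiple-appendix coding described above (the titles are placeholders):
\begin{verbatim}
\appendices
\section{Proof of the First
Zonklar Equation}
Appendix one text goes here.
\section{Proof of the Second
Zonklar Equation}
Appendix two text goes here.
\end{verbatim}
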
%{\appendices
%\section*{Proof of the First Zonklar Equation}
%Appendix one text goes here.
% You can choose not to have a title for an appendix if you want by leaving the argument blank
%\section*{Proof of the Second Zonklar Equation}
%Appendix two text goes here.}

\section{References Section}
You can use a bibliography generated by \BibTeX\ as a {\tt{.bbl}} file.
\BibTeX\ documentation can be easily obtained at:
\url{http://mirror.ctan.org/biblio/bibtex/contrib/doc/}
The IEEEtran \BibTeX\ style support page is:
\url{http://www.michaelshell.org/tex/ieeetran/bibtex/}

% argument is your BibTeX string definitions and bibliography database(s)
%\bibliography{IEEEabrv,../bib/paper}
%
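\noindent A minimal sketch of this approach (the database names repeat those from the commented example above):
\begin{verbatim}
\bibliographystyle{IEEEtran}
\bibliography{IEEEabrv,../bib/paper}
\end{verbatim}
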
\section{Simple References}
You can manually copy in the resultant {\tt{.bbl}} file and set the second argument of $\backslash${\tt{begin\{thebibliography\}}} to the number of references
(used to reserve space for the reference number labels box). A minimal sketch of the coding is shown below, followed by a full example.

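\noindent The entry shown here is an arbitrary placeholder:
\begin{verbatim}
\begin{thebibliography}{1}

\bibitem{ref1}
A. Author, ``Title of the paper,''
\textit{Journal Name}, vol. 1,
no. 1, pp. 1--10, 2020.

\end{thebibliography}
\end{verbatim}
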
\begin{thebibliography}{1}
\bibliographystyle{IEEEtran}

\bibitem{ref1}
\textit{Mathematics Into Type}. American Mathematical Society. [Online]. Available: https://www.ams.org/arc/styleguide/mit-2.pdf

\bibitem{ref2}
T. W. Chaundy, P. R. Barrett and C. Batey, \textit{The Printing of Mathematics}. London, U.K.: Oxford Univ. Press, 1954.

\bibitem{ref3}
F. Mittelbach and M. Goossens, \textit{The \LaTeX\ Companion}, 2nd ed. Boston, MA, USA: Pearson, 2004.

\bibitem{ref4}
G. Gr\"atzer, \textit{More Math Into LaTeX}. New York, NY, USA: Springer, 2007.

\bibitem{ref5}
M. Letourneau and J. W. Sharp, \textit{AMS-StyleGuide-online.pdf}, American Mathematical Society, Providence, RI, USA. [Online]. Available: http://www.ams.org/arc/styleguide/index.html

\bibitem{ref6}
H. Sira-Ramirez, ``On the sliding mode control of nonlinear systems,'' \textit{Syst. Control Lett.}, vol. 19, pp. 303--312, 1992.

\bibitem{ref7}
A. Levant, ``Exact differentiation of signals with unbounded higher derivatives,'' in \textit{Proc. 45th IEEE Conf. Decis. Control}, San Diego, CA, USA, 2006, pp. 5585--5590. DOI: 10.1109/CDC.2006.377165.

\bibitem{ref8}
M. Fliess, C. Join, and H. Sira-Ramirez, ``Non-linear estimation is easy,'' \textit{Int. J. Model., Ident. Control}, vol. 4, no. 1, pp. 12--27, 2008.

\bibitem{ref9}
R. Ortega, A. Astolfi, G. Bastin, and H. Rodriguez, ``Stabilization of food-chain systems using a port-controlled Hamiltonian description,'' in \textit{Proc. Amer. Control Conf.}, Chicago, IL, USA, 2000, pp. 2245--2249.

\end{thebibliography}

\newpage

\section{Biography Section}
If you have an EPS/PDF photo (graphicx package needed), extra braces are needed around the contents of the optional argument to biography to prevent the \LaTeX\ parser from getting confused when it sees the complicated $\backslash${\tt{includegraphics}} command within an optional argument. (You can create your own custom macro containing the $\backslash${\tt{includegraphics}} command to make things simpler here.)

\vspace{11pt}

{\bf{If you include a photo:}}\vspace{-33pt}
\begin{IEEEbiography}[{\includegraphics[width=1in,height=1.25in,clip,keepaspectratio]{fig1}}]{Michael Shell}
Use $\backslash${\tt{begin\{IEEEbiography\}}} and then for the optional argument use $\backslash${\tt{includegraphics}} to declare and link the author photo.
Use the author name as the final argument, followed by the biography text.
\end{IEEEbiography}

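\noindent The biography above is coded as follows (biography text abbreviated):
\begin{verbatim}
\begin{IEEEbiography}[{\includegraphics
[width=1in,height=1.25in,clip,
keepaspectratio]{fig1}}]{Michael Shell}
Biography text here.
\end{IEEEbiography}
\end{verbatim}
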
\vspace{11pt}

{\bf{If you will not include a photo:}}\vspace{-33pt}
\begin{IEEEbiographynophoto}{John Doe}
Use $\backslash${\tt{begin\{IEEEbiographynophoto\}}} with the author name as the argument, followed by the biography text.
\end{IEEEbiographynophoto}

\vfill

\end{document}