* cppinternals.texi: Update.

From-SVN: r39144
Neil Booth 2001-01-19 22:25:53 +00:00 committed by Neil Booth
parent 55cf7bb972
commit 111e0469ce
2 changed files with 91 additions and 10 deletions

@@ -1,3 +1,7 @@
2001-01-19 Neil Booth <neil@daikokuya.demon.co.uk>
* cppinternals.texi: Update.
2001-01-19 Richard Earnshaw <rearnsha@arm.com>
* arm.c (arm_init_builtins): Re-enable builtins.

@@ -91,11 +91,15 @@ Identifiers, macro expansion, hash nodes, lexing.
* Conventions:: Conventions used in the code.
* Lexer:: The combined C, C++ and Objective C Lexer.
* Whitespace:: Input and output newlines and whitespace.
* Hash Nodes:: All identifiers are hashed.
* Macro Expansion:: Macro expansion algorithm.
* Files:: File handling.
* Concept Index:: Index of concepts and terms.
* Index:: Index.
@end menu
@node Conventions, Lexer, Top, Top
@unnumbered Conventions
cpplib has two interfaces: one is exposed only internally, and the
other is for both internal and external use.
@@ -113,6 +117,7 @@ are perhaps relying on some kind of undocumented implementation-specific
behaviour.
@node Lexer, Whitespace, Conventions, Top
@unnumbered The Lexer
The lexer is contained in the file @samp{cpplex.c}. We want to have a
lexer that is single-pass, for efficiency reasons. We would also like
@@ -194,7 +199,8 @@ a trigraph, but the command line option @samp{-trigraphs} is not in
force but @samp{-Wtrigraphs} is, we need to warn about it but then
buffer it and continue to treat it as 3 separate characters.
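The trigraph policy just described can be sketched in C as follows. This is a minimal illustration, not the code in @samp{cpplex.c}: the names @samp{trigraph_map}, @samp{lex_trigraph} and @samp{struct trigraph_opts} are hypothetical, and the two booleans merely stand in for @samp{-trigraphs} and @samp{-Wtrigraphs}.

```c
#include <stdbool.h>
#include <stdio.h>

/* Return the character that the trigraph "??C" stands for, or 0 if
   C does not complete a trigraph.  */
static char trigraph_map(char c)
{
    switch (c) {
    case '=':  return '#';
    case '/':  return '\\';
    case '\'': return '^';
    case '(':  return '[';
    case ')':  return ']';
    case '!':  return '|';
    case '<':  return '{';
    case '>':  return '}';
    case '-':  return '~';
    default:   return 0;
    }
}

/* Hypothetical stand-ins for the -trigraphs and -Wtrigraphs options.  */
struct trigraph_opts {
    bool convert;   /* -trigraphs: replace trigraphs */
    bool warn;      /* -Wtrigraphs: diagnose trigraphs */
};

/* P points at two question marks followed by one more character.
   Returns the replacement character if the trigraph is converted, or 0
   if the three characters must be kept as they are.  *WARNED records
   whether a warning was issued.  */
static char lex_trigraph(const char *p, struct trigraph_opts opts,
                         bool *warned)
{
    char repl = trigraph_map(p[2]);

    *warned = false;
    if (repl == 0)
        return 0;                 /* not a trigraph at all */
    if (opts.convert)
        return repl;              /* -trigraphs: replace it */
    if (opts.warn) {              /* only -Wtrigraphs: warn, keep 3 chars */
        fprintf(stderr, "warning: trigraph ??%c ignored\n", p[2]);
        *warned = true;
    }
    return 0;
}
```

With neither option in force the sequence is silently kept as three characters; with only @samp{-Wtrigraphs}, the same happens but a warning is printed first.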
-@node Whitespace, Concept Index, Lexer, Top
@node Whitespace, Hash Nodes, Lexer, Top
@unnumbered Whitespace
The lexer has been written to treat each of @samp{\r}, @samp{\n},
@samp{\r\n} and @samp{\n\r} as a single newline indicator. This allows
@@ -202,18 +208,89 @@ it to transparently preprocess MS-DOS, Macintosh and Unix files without
their needing to pass through a special filter beforehand.
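A minimal sketch of recognising the four newline forms, assuming the input is held in a single in-memory buffer. The functions @samp{newline_length} and @samp{count_newlines} are hypothetical helpers for illustration, not part of cpplib.

```c
#include <stddef.h>

/* Return how many characters at P form one logical newline: 2 for
   "\r\n" or "\n\r", 1 for a lone '\r' or '\n', and 0 if P does not
   start a newline.  END bounds the buffer.  */
static size_t newline_length(const char *p, const char *end)
{
    if (p >= end || (*p != '\n' && *p != '\r'))
        return 0;
    /* A '\r' followed by '\n', or a '\n' followed by '\r', is a single
       newline indicator; two identical characters are two newlines.  */
    if (p + 1 < end && (p[1] == '\n' || p[1] == '\r') && p[1] != p[0])
        return 2;
    return 1;
}

/* Count the logical newlines in BUF[0..LEN), whichever convention
   (Unix, MS-DOS or Macintosh) the file uses.  */
static unsigned count_newlines(const char *buf, size_t len)
{
    const char *p = buf, *end = buf + len;
    unsigned lines = 0;

    while (p < end) {
        size_t n = newline_length(p, end);
        if (n) {
            lines++;
            p += n;
        } else {
            p++;
        }
    }
    return lines;
}
```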
We also decided to treat a backslash, either @samp{\} or the trigraph
-@samp{??/}, separated from one of the above newline forms by whitespace
-only (one or more space, tab, form-feed, vertical tab or NUL characters),
-as an intended escaped newline. The library issues a diagnostic in this
-case.
@samp{??/}, separated from one of the above newline indicators by
non-comment whitespace only, as intending to escape the newline. Such
whitespace is usually a typing mistake, and the construct cannot
reasonably be mistaken for anything else in any of the C-family
grammars. Since handling it this way does not strictly conform to the
ISO standard, the library issues a warning whenever it encounters it.
-Handling newlines in this way is made simpler by doing it in one place
Handling newlines like this is made simpler by doing it in one place
only. The function @samp{handle_newline} takes care of all newline
-characters, and @samp{skip_escaped_newlines} takes care of all escaping
-of newlines, deferring to @samp{handle_newline} to handle the newlines
-themselves.
characters, and @samp{skip_escaped_newlines} takes care of arbitrarily
long sequences of escaped newlines, deferring to @samp{handle_newline}
to handle the newlines themselves.
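The division of labour between the two functions might look like the following sketch. The @samp{struct lexbuf} type, @samp{is_hspace}, and the exact bodies of @samp{handle_newline} and @samp{skip_escaped_newlines} are assumptions made for illustration. The whitespace set is assumed to be space, tab, form feed, vertical tab and NUL, and this version accepts @samp{??/} unconditionally, whereas the real lexer must also honour the trigraph options.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical lexer state; not cpplib's real structure.  */
struct lexbuf {
    const char *cur, *end;
    unsigned line;        /* current line number */
    unsigned warnings;    /* diagnostics issued so far */
};

/* Consume exactly one newline indicator (\n, \r, \r\n or \n\r) at
   BUF->cur, which must point at '\n' or '\r', and count the line.  */
static void handle_newline(struct lexbuf *buf)
{
    char c = *buf->cur++;
    if (buf->cur < buf->end && (*buf->cur == '\n' || *buf->cur == '\r')
        && *buf->cur != c)
        buf->cur++;
    buf->line++;
}

static bool is_hspace(char c)
{
    return c == ' ' || c == '\t' || c == '\f' || c == '\v' || c == '\0';
}

/* Skip a run of escaped newlines at BUF->cur.  A backslash, written as
   '\\' or as the trigraph '?' '?' '/', escapes a following newline even
   if whitespace intervenes, but then a warning is issued.  Returns the
   number of escaped newlines skipped.  */
static unsigned skip_escaped_newlines(struct lexbuf *buf)
{
    unsigned skipped = 0;

    for (;;) {
        const char *p = buf->cur;
        if (p < buf->end && *p == '\\')
            p += 1;
        else if (buf->end - p >= 3 && p[0] == '?' && p[1] == '?'
                 && p[2] == '/')
            p += 3;
        else
            break;

        const char *ws = p;
        while (p < buf->end && is_hspace(*p))
            p++;
        if (p == buf->end || (*p != '\n' && *p != '\r'))
            break;   /* not an escaped newline; BUF->cur is untouched */

        if (p != ws) {
            fprintf(stderr, "%u: warning: backslash and newline "
                    "separated by space\n", buf->line);
            buf->warnings++;
        }
        buf->cur = p;
        handle_newline(buf);   /* newlines are handled in one place */
        skipped++;
    }
    return skipped;
}
```

Because @samp{skip_escaped_newlines} loops until no further escape follows, an arbitrarily long run of escaped newlines is consumed in a single call, while every newline still passes through @samp{handle_newline}.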
-@node Concept Index, Index, Whitespace, Top
@node Hash Nodes, Macro Expansion, Whitespace, Top
@unnumbered Hash Nodes
When cpplib encounters an ``identifier'', it generates a hash code for
it and stores it in the hash table. By ``identifier'' we mean tokens
with type @samp{CPP_NAME}; this includes identifiers in the usual C
sense, as well as keywords, directive names, macro names and so on. For
example, all of @samp{pragma}, @samp{int}, @samp{foo} and
@samp{__GNUC__} are identifiers and are hashed when lexed.
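The key property, that one spelling maps to exactly one node, can be illustrated with a small chained hash table. This is a sketch only: the FNV-1a hash, the fixed table size and the names @samp{lookup}, @samp{hash_name} and @samp{struct hashnode} are assumptions, not cpplib's actual hash function or data structures.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE 256   /* hypothetical; must be a power of two */

struct hashnode {
    const char *name;        /* private copy of the spelling */
    size_t len;
    struct hashnode *next;   /* next node in the same bucket */
};

static struct hashnode *table[TABLE_SIZE];

/* 32-bit FNV-1a, chosen only for illustration.  */
static uint32_t hash_name(const char *s, size_t len)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char) s[i];
        h *= 16777619u;
    }
    return h;
}

/* Return the unique node for the identifier S[0..LEN), creating it the
   first time it is seen.  Later lookups of the same spelling return the
   same pointer, so a token can simply carry its node around.  */
static struct hashnode *lookup(const char *s, size_t len)
{
    uint32_t b = hash_name(s, len) & (TABLE_SIZE - 1);

    for (struct hashnode *n = table[b]; n; n = n->next)
        if (n->len == len && memcmp(n->name, s, len) == 0)
            return n;

    struct hashnode *n = malloc(sizeof *n);
    char *copy = malloc(len + 1);
    if (!n || !copy) {
        free(n);
        free(copy);
        return NULL;
    }
    memcpy(copy, s, len);
    copy[len] = '\0';
    n->name = copy;
    n->len = len;
    n->next = table[b];
    table[b] = n;
    return n;
}
```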
Each node in the hash table contains various information about the
identifier it represents, such as its length and type. At any one
time, each identifier falls into exactly one of three categories:
@itemize @bullet
@item Macros
These have been declared to be macros, either on the command line or
with @samp{#define}. A few, such as @samp{__TIME__}, are builtins
entered in the hash table during initialisation. The hash node for a
normal macro points to a structure with more information about the
macro, such as whether it is function-like, how many arguments it takes,
and its expansion. Builtin macros are flagged as special, and their
hash nodes instead contain an enum indicating which builtin macro each
one is.
@item Assertions
Assertions are in a separate namespace from macros. To enforce this,
cpp prepends a @samp{#} character to the assertion's name before
hashing it and entering it in the hash table. An assertion's node
points to a chain of answers to that assertion.
@item Void
Everything else falls into this category: an identifier that is not
currently a macro, or a macro that has since been undefined with
@samp{#undef}.
When preprocessing C++, this category also includes the named operators,
such as @samp{xor}. In expressions these behave like the operators they
represent, but in contexts where the spelling of a token matters they
are spelt differently. This spelling distinction is relevant when they
are operands of the stringizing and pasting macro operators @samp{#} and
@samp{##}. Named operator hash nodes are flagged, both to catch the
spelling distinction and to prevent them from being defined as macros.
@end itemize
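The three categories and the flags described above could be modelled roughly as follows. The enums, flag names, union layout and the helpers @samp{can_define_as_macro}, @samp{count_answers} and @samp{assertion_key} are hypothetical and simplified; they only show how one node can hold a macro definition, a builtin code, or an assertion's answer chain, and how a @samp{#} prefix keeps assertion names apart from macro names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

enum node_type { NT_VOID, NT_MACRO, NT_ASSERTION };
enum builtin_kind { BT_TIME, BT_DATE, BT_FILE, BT_LINE };

#define NODE_BUILTIN  1u     /* macro is a builtin such as __TIME__ */
#define NODE_OPERATOR 2u     /* C++ named operator such as xor */

struct macro_def {           /* details of an ordinary #define */
    bool fun_like;
    unsigned paramc;
    const char *expansion;
};

struct answer {              /* one answer in an assertion's chain */
    const char *text;
    struct answer *next;
};

struct node {
    const char *name;
    enum node_type type;
    unsigned flags;
    union {
        struct macro_def *macro;    /* NT_MACRO without NODE_BUILTIN */
        enum builtin_kind builtin;  /* NT_MACRO with NODE_BUILTIN */
        struct answer *answers;     /* NT_ASSERTION */
    } value;
};

/* Named operators may never be defined as macros.  */
static bool can_define_as_macro(const struct node *n)
{
    return !(n->flags & NODE_OPERATOR);
}

/* Length of an assertion's answer chain; 0 for other categories.  */
static unsigned count_answers(const struct node *n)
{
    unsigned count = 0;
    if (n->type != NT_ASSERTION)
        return 0;
    for (const struct answer *a = n->value.answers; a; a = a->next)
        count++;
    return count;
}

/* Build the hash key for assertion NAME by prepending '#', so that it
   can never collide with a macro of the same name.  */
static void assertion_key(char *out, size_t size, const char *name)
{
    snprintf(out, size, "#%s", name);
}
```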
All occurrences of the same identifier share the same hash node. Since
each identifier token, after lexing, contains a pointer to its hash
node, this pointer provides rapid lookup of various information. For
example, when parsing a @samp{#define} directive, CPP flags each
argument's identifier hash node with the index of that argument. This
makes checking for duplicate arguments an O(1) operation for each
argument. Similarly, for
each identifier in the macro's expansion, lookup to see if it is an
argument, and which argument it is, is also an O(1) operation. Further,
each directive name, such as @samp{endif}, has an associated directive
enum stored in its hash node, so that directive lookup is also O(1).
Later, CPP may also store C front-end information in its identifier hash
table, such as a @samp{tree} pointer.
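The O(1) parameter bookkeeping can be sketched as follows, assuming a node with a single hypothetical @samp{arg_index} field, where zero means ``not a parameter''. The functions @samp{mark_params}, @samp{param_of} and @samp{clear_params} are illustrative names, not cpplib functions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical minimal node: only the field this sketch needs.  */
struct node {
    const char *name;
    unsigned arg_index;   /* 0 = not a parameter, else 1-based index */
};

/* Record the parameters of a function-like macro.  Because every
   occurrence of an identifier shares one node, detecting a duplicate
   parameter is a single field test rather than a search.  Returns
   false, with all marks undone, on a duplicate.  */
static bool mark_params(struct node **params, unsigned n)
{
    for (unsigned i = 0; i < n; i++) {
        if (params[i]->arg_index != 0) {
            for (unsigned j = 0; j < i; j++)
                params[j]->arg_index = 0;
            return false;
        }
        params[i]->arg_index = i + 1;
    }
    return true;
}

/* For an identifier in the expansion: which parameter is it, if any?  */
static unsigned param_of(const struct node *n)
{
    return n->arg_index;
}

/* Once the definition is complete, clear the marks so the nodes are
   ready for the next #define.  */
static void clear_params(struct node **params, unsigned n)
{
    for (unsigned i = 0; i < n; i++)
        params[i]->arg_index = 0;
}
```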
@node Macro Expansion, Files, Hash Nodes, Top
@unnumbered Macro Expansion Algorithm
@printindex cp
@node Files, Concept Index, Macro Expansion, Top
@unnumbered File Handling
@printindex cp
@node Concept Index, Index, Files, Top
@unnumbered Concept Index
@printindex cp