* cppinternals.texi: Update for file handling.
From-SVN: r40432
parent a84efb51f7
commit 1198142ba6
@@ -1,3 +1,7 @@
2001-03-12 Neil Booth <neil@daikokuya.demon.co.uk>

* cppinternals.texi: Update for file handling.

2001-03-12 Jeffrey Oldham <oldham@codesourcery.com>

* emit-rtl.c (remove_unnecessary_notes): Reverse Richard Kenner's
@@ -184,7 +184,7 @@ problem.

Another place where state flags are used to change behaviour is whilst
parsing header names. Normally, a @samp{<} would be lexed as a single
token. After a @samp{#include} directive, though, it should be lexed
token. After a @code{#include} directive, though, it should be lexed
as a single token as far as the nearest @samp{>} character. Note that
we don't allow the terminators of header names to be escaped; the first
@samp{"} or @samp{>} terminates the header name.
@@ -311,7 +311,7 @@ time, each identifier falls into exactly one of three categories:
@item Macros

These have been declared to be macros, either on the command line or
with @samp{#define}. A few, such as @samp{__TIME__} are builtins
with @code{#define}. A few, such as @samp{__TIME__} are builtins
entered in the hash table during initialisation. The hash node for a
normal macro points to a structure with more information about the
macro, such as whether it is function-like, how many arguments it takes,
@@ -321,7 +321,7 @@ contain an enum indicating which of the various builtin macros it is.
@item Assertions

Assertions are in a separate namespace to macros. To enforce this, cpp
actually prepends a @samp{#} character before hashing and entering it in
actually prepends a @code{#} character before hashing and entering it in
the hash table. An assertion's node points to a chain of answers to
that assertion.

@@ -329,21 +329,21 @@ that assertion.

Everything else falls into this category - an identifier that is not
currently a macro, or a macro that has since been undefined with
@samp{#undef}.
@code{#undef}.

When preprocessing C++, this category also includes the named operators,
such as @samp{xor}. In expressions these behave like the operators they
represent, but in contexts where the spelling of a token matters they
are spelt differently. This spelling distinction is relevant when they
are operands of the stringizing and pasting macro operators @samp{#} and
@samp{##}. Named operator hash nodes are flagged, both to catch the
are operands of the stringizing and pasting macro operators @code{#} and
@code{##}. Named operator hash nodes are flagged, both to catch the
spelling distinction and to prevent them from being defined as macros.
@end itemize

The same identifiers share the same hash node. Since each identifier
token, after lexing, contains a pointer to its hash node, this is used
to provide rapid lookup of various information. For example, when
parsing a @samp{#define} statement, CPP flags each argument's identifier
parsing a @code{#define} statement, CPP flags each argument's identifier
hash node with the index of that argument. This makes duplicated
argument checking an O(1) operation for each argument. Similarly, for
each identifier in the macro's expansion, lookup to see if it is an
@@ -353,11 +353,74 @@ enum stored in its hash node, so that directive lookup is also O(1).

@node Macro Expansion, Files, Hash Nodes, Top
@unnumbered Macro Expansion Algorithm
@printindex cp

@node Files, Index, Macro Expansion, Top
@unnumbered File Handling
@printindex cp
@cindex files

Fairly obviously, the file handling code of cpplib resides in the file
@samp{cppfiles.c}. It takes care of the details of file searching,
opening, reading and caching, for both the main source file and all the
headers it recursively includes.

The basic strategy is to minimize the number of system calls. On many
systems, the basic @code{open ()} and @code{fstat ()} system calls can
be quite expensive. For every @code{#include}-d file, we need to try
all the directories in the search path until we find a match. Some
projects, such as glibc, pass twenty or thirty include paths on the
command line, so this can rapidly become time consuming.

For a header file we have not encountered before we have little choice
but to do this. However, it is often the case that the same headers are
repeatedly included, and in these cases we try to avoid repeating the
filesystem queries whilst searching for the correct file.

For each file we try to open, we store the constructed path in a splay
tree. This path first undergoes simplification by the function
@code{_cpp_simplify_pathname}. For example,
@samp{/usr/include/bits/../foo.h} is simplified to
@samp{/usr/include/foo.h} before we enter it in the splay tree and try
to @code{open ()} the file. CPP will then find subsequent uses of
@samp{foo.h}, even as @samp{/usr/include/foo.h}, in the splay tree and
save system calls.
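
The simplification step can be illustrated with a small sketch. This is not the code of @code{_cpp_simplify_pathname}; the function name @code{simplify_path} and its exact rules are assumptions made for illustration. It collapses empty, @samp{.} and @samp{..} components purely textually and in place, so that different spellings of one path produce the same lookup key.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not cpplib's _cpp_simplify_pathname).
   Rewrites PATH in place, dropping "" and "." components and
   cancelling "dir/.." pairs.  The result is never longer than
   the input, so no extra storage is needed.  */
static void
simplify_path(char *path)
{
    int absolute;
    char *base, *out;
    const char *in;

    assert(path != NULL);
    if (path[0] == '\0')
        return;

    absolute = (path[0] == '/');
    base = path + absolute;   /* output never backs up past here */
    out = base;               /* write cursor */
    in = base;                /* read cursor */

    while (*in != '\0') {
        const char *end = in;
        size_t len;

        while (*end != '\0' && *end != '/')
            end++;
        len = (size_t)(end - in);

        if (len == 2 && in[0] == '.' && in[1] == '.') {
            /* Find the start of the last component already kept.  */
            char *prev = out;
            while (prev > base && prev[-1] != '/')
                prev--;
            if (out > base
                && !(out - prev == 2 && prev[0] == '.' && prev[1] == '.')) {
                /* Cancel the previous component and its separator.  */
                out = (prev > base) ? prev - 1 : base;
            } else if (!absolute) {
                /* A leading ".." in a relative path must be kept.  */
                if (out > base)
                    *out++ = '/';
                *out++ = '.';
                *out++ = '.';
            }
            /* For an absolute path, "/.." is the root: drop it.  */
        } else if (len > 0 && !(len == 1 && in[0] == '.')) {
            if (out > base)
                *out++ = '/';
            memmove(out, in, len);
            out += len;
        }
        in = (*end == '/') ? end + 1 : end;
    }

    if (out == path)          /* relative path that collapsed to nothing */
        *out++ = '.';
    *out = '\0';
}
```

Note that cancelling @samp{..} textually is only equivalent to what the filesystem does when the cancelled directory is not a symbolic link; the sketch ignores that case.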

Further, it is likely the file contents have also been cached, saving a
@code{read ()} system call. We don't bother caching the contents of
header files that are re-inclusion protected, and whose re-inclusion
macro is defined when we leave the header file for the first time. If
the host supports it, we try to map suitably large files into memory,
rather than reading them in directly.

The include paths are internally stored on a null-terminated
singly-linked list, starting with the @code{"header.h"} directory search
chain, which then links into the @code{<header.h>} directory chain.

Files included with the @code{<foo.h>} syntax start the lookup directly
in the second half of this chain. However, files included with the
@code{"foo.h"} syntax start at the beginning of the chain, but with one
extra directory prepended. This is the directory of the current file;
the one containing the @code{#include} directive. Prepending this
directory on a per-file basis is handled by the function
@code{search_from}.
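
The chain layout described above can be sketched as follows. The names @code{struct search_dir} and @code{lookup_order} are hypothetical and do not come from @samp{cppfiles.c}; the sketch only shows the order in which directories would be tried for the two include syntaxes, assuming the quote chain's last element links into the first element of the bracket chain.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node of the null-terminated, singly-linked search path.  */
struct search_dir {
    const char *name;
    const struct search_dir *next;
};

/* Store in OUT, in order, the directories that would be tried for an
   #include.  QUOTE_CHAIN is the head of the whole list; BRACKET_CHAIN
   points part-way into the same list.  Returns the number of entries
   written, at most MAX.  */
static size_t
lookup_order(const struct search_dir *quote_chain,
             const struct search_dir *bracket_chain,
             const char *current_dir, int angled,
             const char **out, size_t max)
{
    size_t n = 0;
    const struct search_dir *d;

    assert(out != NULL);

    /* "foo.h": the including file's own directory is tried first.  */
    if (!angled && n < max)
        out[n++] = current_dir;

    /* <foo.h> starts in the second half; "foo.h" walks the whole list.  */
    for (d = angled ? bracket_chain : quote_chain;
         d != NULL && n < max; d = d->next)
        out[n++] = d->name;

    return n;
}
```

Because the two chains share their tail, nothing needs to be copied per include; only the starting pointer and the one prepended directory differ.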

Note that a header included with a directory component, such as
@code{#include "mydir/foo.h"} and opened as
@samp{/usr/local/include/mydir/foo.h}, will have the complete path minus
the basename @samp{foo.h} as the current directory.
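
As a minimal sketch of that rule, the current directory is simply the opened path with its last component removed. The helper name @code{current_dir_length} is an assumption for illustration and is not taken from cpplib.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of the directory part of PATH, that is,
   everything before the final '/'.  Returns 0 when PATH has no
   directory component.  (A file directly under "/" also yields 0 here;
   a real implementation would need to treat the root specially.)  */
static size_t
current_dir_length(const char *path)
{
    const char *slash;

    assert(path != NULL);
    slash = strrchr(path, '/');
    return slash != NULL ? (size_t)(slash - path) : 0;
}
```

For @samp{/usr/local/include/mydir/foo.h} this selects @samp{/usr/local/include/mydir}, not @samp{/usr/local/include}.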

Enough information is stored in the splay tree that CPP can immediately
tell whether it can skip the header file because of the multiple include
optimisation, whether the file didn't exist or couldn't be opened for
some reason, or whether the header was flagged not to be re-used, as it
is with the obsolete @code{#import} directive.
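
One way to picture the decision is the sketch below. The structure and field names are hypothetical and are not cpplib's; they only model the three kinds of cached knowledge listed above, and a missing entry stands for a path that has never been tried.

```c
#include <assert.h>
#include <stddef.h>

/* What to do with an #include once the cache has been consulted.  */
enum include_action {
    INC_OPEN,   /* go to the filesystem (or reuse cached contents) */
    INC_SKIP,   /* nothing to do: the header would expand to nothing */
    INC_FAIL    /* known bad path: do not repeat the failed open () */
};

/* Hypothetical per-path record kept in the lookup tree.  */
struct cache_entry {
    int open_failed;   /* file didn't exist or couldn't be opened */
    int no_reuse;      /* flagged not to be re-used, as with #import */
    int has_guard;     /* header is re-inclusion protected */
};

/* ENTRY is NULL for a path not seen before.  GUARD_DEFINED says
   whether the header's re-inclusion macro is currently defined.  */
static enum include_action
decide_include(const struct cache_entry *entry, int guard_defined)
{
    if (entry == NULL)
        return INC_OPEN;
    if (entry->open_failed)
        return INC_FAIL;
    if (entry->no_reuse)
        return INC_SKIP;
    if (entry->has_guard && guard_defined)
        return INC_SKIP;
    return INC_OPEN;
}
```

In every branch except the first, the answer is reached without any system call, which is the point of keeping this state alongside the path.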

For the benefit of MS-DOS filesystems with an 8.3 filename limitation,
CPP offers the ability to treat various include file names as aliases
for the real header files with shorter names. The map from one to the
other is found in a special file called @samp{header.gcc}, stored in the
command line (or system) include directories to which the mapping
applies. This may be higher up the directory tree than the full path to
the file minus the base name.

@node Index,, Files, Top
@unnumbered Index