Prior to this patch, program_state::detect_leaks worked by finding all
live svalues in the old state and in the new state, and calling
on_svalue_leak for each svalue that has changed from being live to
not being live.
PR analyzer/99042 and PR analyzer/99774 both describe false leak
diagnostics from -fanalyzer (a false FILE * leak in git, and a false
malloc leak in qemu, respectively).
In both cases the root cause of the false leak diagnostic is that
svalues are no longer explicitly bound in the store because regions
were conservatively clobbered: by a call to an unknown function in
the first case, and by a write through a pointer that could alias
the region in the second.
We have a transition from an svalue being explicitly live to not
being explicitly live - but only because the store is being
conservative, clobbering the binding. The leak detection is looking
for transitions from "definitely live" to "not definitely live",
when it should be looking for transitions from "definitely live"
to "definitely not live".
This patch introduces a new class to temporarily capture information
about svalues that were explicitly live, but for which a region bound
to them got clobbered for conservative reasons. This new
"uncertainty_t" class is passed around to capture the data long enough
for use in program_state::detect_leaks, where it is used to complain
only about svalues that were definitely live and are now neither
definitely live nor possibly live, i.e. definitely not live.
The class also records the svalues for which we can no longer
meaningfully track sm-state, so that they can be reset to the "start"
state.
Together, these changes fix the false leak reports.
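
The change of rule can be pictured with a toy model. The sketch below
is only an illustration in C, not the analyzer's code: it assumes each
svalue is one bit in a mask, and the names svalue_set, leaks_old_rule
and leaks_new_rule are hypothetical. The "maybe_live" set plays the
role of the information gathered by uncertainty_t.

```c
#include <stdint.h>

/* Toy model: each bit stands for one svalue.  This encoding is an
   assumption made for the sketch, not the analyzer's representation.  */
typedef uint32_t svalue_set;

/* Old rule: anything that stopped being definitely live is reported.  */
svalue_set
leaks_old_rule (svalue_set old_live, svalue_set new_live)
{
  return old_live & ~new_live;
}

/* New rule: svalues whose bindings were conservatively clobbered are
   "maybe live" and are excluded from the report.  */
svalue_set
leaks_new_rule (svalue_set old_live, svalue_set new_live,
                svalue_set maybe_live)
{
  return old_live & ~(new_live | maybe_live);
}
```

With three svalues live before (mask 0x7), one still bound afterwards
(0x1) and one clobbered conservatively (0x2), the old rule reports two
leaks and the new rule reports only the one that is definitely gone.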
gcc/analyzer/ChangeLog:
PR analyzer/99042
PR analyzer/99774
* engine.cc
(impl_region_model_context::impl_region_model_context): Add
uncertainty param and use it to initialize m_uncertainty.
(impl_region_model_context::get_uncertainty): New.
(impl_sm_context::get_fndecl_for_call): Add NULL for new
uncertainty param when constructing impl_region_model_context.
(impl_sm_context::get_state): Likewise.
(impl_sm_context::set_next_state): Likewise.
(impl_sm_context::warn): Likewise.
(exploded_node::on_stmt): Add uncertainty param
and use it when constructing impl_region_model_context.
(exploded_node::on_edge): Add uncertainty param and pass
to on_edge call.
(exploded_node::detect_leaks): Create uncertainty_t and pass to
impl_region_model_context.
(exploded_graph::get_or_create_node): Create uncertainty_t and
pass to prune_for_point.
(maybe_process_run_of_before_supernode_enodes): Create
uncertainty_t and pass to impl_region_model_context.
(exploded_graph::process_node): Create uncertainty_t instances and
pass around as needed.
* exploded-graph.h
(impl_region_model_context::impl_region_model_context): Add
uncertainty param.
(impl_region_model_context::get_uncertainty): New decl.
(impl_region_model_context::m_uncertainty): New field.
(exploded_node::on_stmt): Add uncertainty param.
(exploded_node::on_edge): Likewise.
* program-state.cc (sm_state_map::on_liveness_change): Get
uncertainty from context and use it to unset sm-state from
svalues as appropriate.
(program_state::on_edge): Add uncertainty param and use it when
constructing impl_region_model_context. Fix indentation.
(program_state::prune_for_point): Add uncertainty param and use it
when constructing impl_region_model_context.
(program_state::detect_leaks): Get any uncertainty from ctxt and
use it to get maybe-live svalues for dest_state, rather than
definitely-live ones; use this when determining which svalues
have leaked.
(selftest::test_program_state_merging): Create uncertainty_t and
pass to impl_region_model_context.
* program-state.h (program_state::on_edge): Add uncertainty param.
(program_state::prune_for_point): Likewise.
* region-model-impl-calls.cc (call_details::get_uncertainty): New.
(region_model::impl_call_memcpy): Pass uncertainty to
mark_region_as_unknown call.
(region_model::impl_call_memset): Likewise.
(region_model::impl_call_strcpy): Likewise.
* region-model-reachability.cc (reachable_regions::handle_sval):
Also add sval to m_mutable_svals.
* region-model.cc (region_model::on_assignment): Pass any
uncertainty from ctxt to the store::set_value call.
(region_model::handle_unrecognized_call): Get any uncertainty from
ctxt and use it to record mutable svalues at the unknown call.
(region_model::get_reachable_svalues): Add uncertainty param and
use it to mark any maybe-bound svalues as being reachable.
(region_model::set_value): Pass any uncertainty from ctxt to the
store::set_value call.
(region_model::mark_region_as_unknown): Add uncertainty param and
pass it on to the store::mark_region_as_unknown call.
(region_model::update_for_call_summary): Add uncertainty param and
pass it on to the region_model::mark_region_as_unknown call.
* region-model.h (call_details::get_uncertainty): New decl.
(region_model::get_reachable_svalues): Add uncertainty param.
(region_model::mark_region_as_unknown): Add uncertainty param.
(region_model_context::get_uncertainty): New vfunc.
(noop_region_model_context::get_uncertainty): New vfunc
implementation.
* store.cc (dump_svalue_set): New.
(uncertainty_t::dump_to_pp): New.
(uncertainty_t::dump): New.
(binding_cluster::clobber_region): Pass NULL for uncertainty to
remove_overlapping_bindings.
(binding_cluster::mark_region_as_unknown): Add uncertainty param
and pass it to remove_overlapping_bindings.
(binding_cluster::remove_overlapping_bindings): Add uncertainty param.
Use it to record any svalues that were in clobbered bindings.
(store::set_value): Add uncertainty param. Pass it to
binding_cluster::mark_region_as_unknown when handling symbolic
regions.
(store::mark_region_as_unknown): Add uncertainty param and pass it
to binding_cluster::mark_region_as_unknown.
(store::remove_overlapping_bindings): Add uncertainty param and
pass it to binding_cluster::remove_overlapping_bindings.
* store.h (binding_cluster::mark_region_as_unknown): Add
uncertainty param.
(binding_cluster::remove_overlapping_bindings): Likewise.
(store::set_value): Likewise.
(store::mark_region_as_unknown): Likewise.
gcc/testsuite/ChangeLog:
PR analyzer/99042
PR analyzer/99774
* gcc.dg/analyzer/pr99042.c: New test.
* gcc.dg/analyzer/pr99774-1.c: New test.
* gcc.dg/analyzer/pr99774-2.c: New test.
Copyright (C) 2000-2021 Free Software Foundation, Inc.
This file is intended to contain a few notes about writing C code
within GCC so that it compiles without error on the full range of
compilers GCC needs to be able to compile on.
The problem is that many ISO-standard constructs are not accepted by
either old or buggy compilers, and we keep getting bitten by them.
Until now this knowledge has been scattered, so I thought I'd collect
it in one useful place. Please add to it, and correct any problems,
as you come across them.
I'm going to start from a base of the ISO C90 standard, since that is
probably what most people code to naturally. Obviously using
constructs introduced after that is not a good idea.
For the complete coding style conventions used in GCC, please read
http://gcc.gnu.org/codingconventions.html
String literals
---------------
Some compilers like MSVC++ have fairly low limits on the maximum
length of a string literal; 509 is the lowest we've come across. You
may need to break up a long printf statement into many smaller ones.
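
As a minimal sketch of that advice, a long help message can be emitted
as several short literals. The function name print_usage and the help
text are made up for the example. Note that adjacent literals are
concatenated before the limit applies, so splitting one literal across
source lines does not help; separate calls do.

```c
#include <stdio.h>

/* Hypothetical help text, written as several short literals instead of
   one long one.  Returns 0 on success, EOF on a write error.  */
int
print_usage (FILE *out)
{
  if (fputs ("Usage: frob [options] file...\n", out) == EOF)
    return EOF;
  if (fputs ("  -o FILE   write output to FILE\n", out) == EOF)
    return EOF;
  if (fputs ("  -v        print version and exit\n", out) == EOF)
    return EOF;
  return 0;
}
```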
Empty macro arguments
---------------------
ISO C (6.8.3 in the 1990 standard) specifies the following:
    If (before argument substitution) any argument consists of no
    preprocessing tokens, the behavior is undefined.
This was relaxed by ISO C99, but some older compilers emit an error,
so code like
    #define foo(x, y) x y
    foo (bar, )
needs to be coded in some other way.
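
One commonly used alternative, shown here only as a sketch, is to pass
a macro that expands to nothing. The argument then consists of one
preprocessing token before substitution, so the rule quoted above is
not triggered. EMPTY, DECLARE and the counter names are hypothetical,
and we have not checked this against every old compiler.

```c
/* EMPTY is a real preprocessing token that expands to nothing, so no
   macro argument is ever literally empty.  */
#define EMPTY
#define DECLARE(storage, type, name) storage type name

DECLARE (EMPTY, int, plain_counter) = 3;	/* int plain_counter = 3; */
DECLARE (static, int, static_counter) = 4;	/* static int static_counter = 4; */

/* Use both objects so the effect of the declarations is visible.  */
int
counters_sum (void)
{
  return plain_counter + static_counter;
}
```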
Avoid unnecessary test before free
----------------------------------
Since SunOS 4 stopped being a reasonable portability target (which
happened around 2007), there has been no need to guard against
"free (NULL)". Thus, any guard like the following constitutes a
redundant test:
    if (P)
      free (P);
It is better to avoid the test.[*]
Instead, simply free P, regardless of whether it is NULL.
[*] However, if your profiling exposes a test like this in a
performance-critical loop, say where P is nearly always NULL, and
the cost of calling free on a NULL pointer would be prohibitively
high, consider using __builtin_expect, e.g., like this:
    if (__builtin_expect (ptr != NULL, 0))
      free (ptr);
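
For the ordinary, unguarded case, a cleanup routine might look like the
sketch below. struct buffer and buffer_release are hypothetical names
used only for illustration.

```c
#include <stdlib.h>

struct buffer
{
  char *data;	/* may be NULL when nothing was allocated */
  size_t len;
};

/* Release a buffer.  No NULL guard is needed, because free (NULL) is
   a no-op in ISO C.  */
void
buffer_release (struct buffer *buf)
{
  free (buf->data);
  buf->data = NULL;
  buf->len = 0;
}
```

Resetting the pointer afterwards also makes a second call harmless.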
Trigraphs
---------
You weren't going to use them anyway, but some otherwise ISO C
compliant compilers do not accept trigraphs.
Suffixes on Integer Constants
-----------------------------
You should never use a 'l' suffix on integer constants ('L' is fine),
since it can easily be confused with the number '1'.
Common Coding Pitfalls
======================
errno
-----
errno might be declared as a macro, so never declare it yourself
(e.g. with 'extern int errno;'); include <errno.h> instead.
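
A minimal sketch of errno use that works whether errno is a macro or a
plain variable: include <errno.h> and only assign to and read from it.
parse_long is a hypothetical helper written for this example.

```c
#include <errno.h>
#include <stdlib.h>

/* Parse a decimal long.  Returns 0 on success, -1 on overflow or if
   no digits were consumed.  */
int
parse_long (const char *text, long *result)
{
  char *end;
  long value;

  errno = 0;	/* plain assignment; no declaration of our own */
  value = strtol (text, &end, 10);
  if (end == text || errno == ERANGE)
    return -1;
  *result = value;
  return 0;
}
```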
Implicit int
------------
In C, the 'int' keyword can often be omitted from type declarations.
For instance, you can write
    unsigned variable;
as shorthand for
    unsigned int variable;
There are several places where this can cause trouble. First, suppose
'variable' is a long; then you might think
    (unsigned) variable
would convert it to unsigned long. It does not. It converts to
unsigned int. This mostly causes problems on 64-bit platforms, where
long and int are not the same size.
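
The difference can be seen in a small sketch. narrow_cast and
full_cast are hypothetical names; the only point is the spelling of
the cast.

```c
/* (unsigned) means (unsigned int): the high bits are lost wherever
   long is wider than int.  */
unsigned long
narrow_cast (long value)
{
  return (unsigned) value;
}

/* Spelling out the full type keeps every bit.  */
unsigned long
full_cast (long value)
{
  return (unsigned long) value;
}
```

On a host with 32-bit int and 64-bit long, narrow_cast (-1L) yields
0xffffffff while full_cast (-1L) yields 0xffffffffffffffff.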
Second, if you write a function definition with no return type at
all:
    operate (int a, int b)
    {
      ...
    }
that function is expected to return int, *not* void. GCC will warn
about this.
Implicit function declarations always have return type int. So if you
correct the above definition to
    void
    operate (int a, int b)
    ...
but operate() is called above its definition, you will get an error
about a "type mismatch with previous implicit declaration". The cure
is to prototype all functions at the top of the file, or in an
appropriate header.
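
A minimal sketch of the cure, with the prototype placed ahead of the
first call. The global last_sum and the caller run are hypothetical
and exist only to make the example complete.

```c
/* Prototype first, so every later call sees the real return type.  */
void operate (int a, int b);

int last_sum;	/* only here to make the effect of the call visible */

/* The call precedes the definition, but the prototype is in scope.  */
void
run (void)
{
  operate (2, 3);
}

void
operate (int a, int b)
{
  last_sum = a + b;
}
```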
Char vs unsigned char vs int
----------------------------
In C, unqualified 'char' may be either signed or unsigned; it is the
implementation's choice. When you are processing 7-bit ASCII, it does
not matter. But when your program must handle arbitrary binary data,
or fully 8-bit character sets, you have a problem. The most obvious
issue is if you have a look-up table indexed by characters.
For instance, the character '\341' in ISO Latin 1 is SMALL LETTER A
WITH ACUTE ACCENT. In the proper locale, isalpha('\341') will be
true. But if you read '\341' from a file and store it in a plain
char, isalpha(c) may look up character 225, or it may look up
character -31. And the ctype table has no entry at offset -31, so
your program will crash. (If you're lucky.)
It is wise to use unsigned char everywhere you possibly can. This
avoids all these problems. Unfortunately, the routines in <string.h>
take plain char arguments, so you have to remember to cast them back
and forth - or avoid the use of strxxx() functions, which is probably
a good idea anyway.
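
When plain char cannot be avoided, the usual remedy is to cast to
unsigned char at the point of the <ctype.h> call. A minimal sketch,
where count_alpha is a hypothetical helper:

```c
#include <ctype.h>
#include <stddef.h>

/* Count the alphabetic characters in S.  */
size_t
count_alpha (const char *s)
{
  size_t n = 0;

  for (; *s != '\0'; s++)
    if (isalpha ((unsigned char) *s))	/* never a negative argument */
      n++;
  return n;
}
```

Whether a byte such as '\341' counts as alphabetic still depends on the
locale, but the lookup itself is now always in range.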
Another common mistake is to use either char or unsigned char to
receive the result of getc() or related stdio functions. They return
either a character converted to unsigned char, or EOF, a negative
value distinct from all of those. If you use char, some legal
character value may be confused with EOF, such as '\377' (SMALL
LETTER Y WITH UMLAUT, in Latin-1). If you use unsigned char, a
comparison against EOF normally never succeeds, so end of file goes
undetected. The correct choice is int.
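
A minimal sketch of the correct pattern, with count_bytes as a
hypothetical helper:

```c
#include <stdio.h>

/* Count the bytes remaining in STREAM.  */
long
count_bytes (FILE *stream)
{
  long n = 0;
  int c;	/* int, so EOF stays distinct from the byte 0xFF */

  while ((c = getc (stream)) != EOF)
    n++;
  return n;
}
```

A stream containing the byte '\377' is counted in full, because that
byte arrives as 255 rather than as EOF.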
A more subtle version of the same mistake might look like this:
    unsigned char pushback[NPUSHBACK];
    int pbidx;
    #define unget(c) (assert(pbidx < NPUSHBACK), pushback[pbidx++] = (c))
    #define get() (pbidx ? pushback[--pbidx] : getchar())
    ...
    unget(EOF);
which will mysteriously turn a pushed-back EOF into a SMALL LETTER Y
WITH UMLAUT.
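
One way to repair it, sketched below, is to make the pushback buffer an
array of int so that EOF survives the round trip. The function names
unget_ch and get_ch and the buffer size are choices made for this
example only.

```c
#include <assert.h>
#include <stdio.h>

#define NPUSHBACK 8	/* arbitrary size chosen for this sketch */

static int pushback[NPUSHBACK];	/* int, so EOF is stored unchanged */
static int pbidx;

/* Push C back; it is returned by the next get_ch call.  */
void
unget_ch (int c)
{
  assert (pbidx < NPUSHBACK);
  pushback[pbidx++] = c;
}

/* Return the most recently pushed-back value, else read from stdin.  */
int
get_ch (void)
{
  return pbidx ? pushback[--pbidx] : getchar ();
}
```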
Other common pitfalls
---------------------
o Expecting plain 'char' to sign-extend, or expecting it to
zero-extend, when widened (either may happen).
o Shifting an item by a negative amount or by greater than or equal to
the number of bits in a type (expecting shifts by 32 to be sensible
has caused quite a number of bugs at least in the early days).
o Expecting ints shifted right to be sign extended.
o Modifying the same object twice between two sequence points.
o Host vs. target floating point representation, including emitting NaNs
and Infinities in a form that the assembler handles.
o qsort being an unstable sort function (unstable in the sense that
multiple items that sort the same may be sorted in different orders
by different qsort functions).
o Passing incorrect types to fprintf and friends.
o Adding a declaration for a function defined in another file to
a .c file instead of to a .h file.
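
For the qsort item, a common remedy is to make the comparison function
a total order, for instance by breaking ties on the original position.
The sketch below assumes each element carries a unique 'seq' field;
struct item, item_cmp and sort_items are hypothetical names.

```c
#include <stdlib.h>

struct item
{
  int key;	/* primary sort key */
  int seq;	/* original position, unique per element */
};

/* Compare by key, then by original position, so that no two elements
   compare equal and every qsort produces the same order.  */
static int
item_cmp (const void *pa, const void *pb)
{
  const struct item *a = (const struct item *) pa;
  const struct item *b = (const struct item *) pb;

  if (a->key != b->key)
    return a->key < b->key ? -1 : 1;
  return (a->seq > b->seq) - (a->seq < b->seq);
}

void
sort_items (struct item *items, size_t n)
{
  qsort (items, n, sizeof items[0], item_cmp);
}
```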