In the PR, fwprop was changing a call instruction and tripped
an assert when trying to update a list of call clobbers.
There are two ways we could handle this: remove the call clobber
and then add it back, or assume that the clobber will stay in its
current place.
At the moment we don't have enough information to safely move
calls around, so the second approach seems simpler and more
efficient.
gcc/
PR rtl-optimization/98403
* rtl-ssa/changes.cc (function_info::finalize_new_accesses): Explain
why we don't remove call clobbers.
(function_info::apply_changes_to_insn): Don't attempt to add
call clobbers here.
gcc/testsuite/
PR rtl-optimization/98403
* g++.dg/opt/pr98403.C: New test.
On AArch64, the vectoriser tries various ways of vectorising with both
SVE and Advanced SIMD and picks the best one. All other things being
equal, it prefers earlier attempts over later attempts.
The way this works currently is that, once it has a successful
vectorisation attempt A, it analyses all other attempts as epilogue
loops of A:
/* When pick_lowest_cost_p is true, we should in principle iterate
over all the loop_vec_infos that LOOP_VINFO could replace and
try to vectorize LOOP_VINFO under the same conditions.
E.g. when trying to replace an epilogue loop, we should vectorize
LOOP_VINFO as an epilogue loop with the same VF limit. When trying
to replace the main loop, we should vectorize LOOP_VINFO as a main
loop too.
However, autovectorize_vector_modes is usually sorted as follows:
- Modes that naturally produce lower VFs usually follow modes that
naturally produce higher VFs.
- When modes naturally produce the same VF, maskable modes
usually follow unmaskable ones, so that the maskable mode
can be used to vectorize the epilogue of the unmaskable mode.
This order is preferred because it leads to the maximum
epilogue vectorization opportunities. Targets should only use
a different order if they want to make wide modes available while
disparaging them relative to earlier, smaller modes. The assumption
in that case is that the wider modes are more expensive in some
way that isn't reflected directly in the costs.
There should therefore be few interesting cases in which
LOOP_VINFO fails when treated as an epilogue loop, succeeds when
treated as a standalone loop, and ends up being genuinely cheaper
than FIRST_LOOP_VINFO. */
However, the vectoriser can normally elide alias checks for epilogue
loops, on the basis that the main loop should do them instead.
Converting an epilogue loop to a main loop can therefore cause the alias
checks to be skipped. (It probably also unfairly penalises the original
loop in the cost comparison, given that one loop will have alias checks
and the other won't.)
As the comment says, we should in principle analyse each vector mode
twice: once as a main loop and once as an epilogue. However, doing
that up-front would be quite expensive. This patch instead goes for a
compromise: if an epilogue loop for mode M2 seems better than a main
loop for mode M1, re-analyse with M2 as the main loop.
The patch fixes dg.torture.exp=pr69719.c when testing with
-msve-vector-bits=128.
gcc/
PR tree-optimization/98371
* tree-vect-loop.c (vect_reanalyze_as_main_loop): New function.
(vect_analyze_loop): If an epilogue loop appears to be cheaper
than the main loop, re-analyze it as a main loop before adopting
it as a main loop.
With the introduction of C++20 modules and libcody, cc1plus and
cc1objplus gained a dependency on the socket functions. Before those
were merged into libc in Solaris 11.4, one needed to link with -lsocket -lnsl
on Solaris, so that merge broke the Solaris 11.3 build.
While we already have 4 different checks for those libraries in the
tree, I decided to import autoconf-archive's AX_LIB_SOCKET_NSL macro
instead. At the same time, the patch only links libcody and the
networking libs where needed (cc1plus, cc1objplus).
Bootstrapped without regressions on i386-pc-solaris2.11 (Solaris 11.3
and 11.4), sparc-sun-solaris2.11, and x86_64-pc-linux-gnu.
2020-12-16 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>
c++tools:
PR c++/98316
* configure.ac: Include ../config/ax_lib_socket_nsl.m4.
(NETLIBS): Determine using AX_LIB_SOCKET_NSL.
* configure: Regenerate.
* Makefile.in (NETLIBS): Define.
(g++-mapper-server$(exeext)): Add $(NETLIBS).
gcc/objcp:
PR c++/98316
* Make-lang.in (cc1objplus$(exeext)): Add $(CODYLIB), $(NETLIBS).
gcc/cp:
PR c++/98316
* Make-lang.in (cc1plus$(exeext)): Add $(CODYLIB), $(NETLIBS).
gcc:
PR c++/98316
* configure.ac (NETLIBS): Determine using AX_LIB_SOCKET_NSL.
* aclocal.m4, configure: Regenerate.
* Makefile.in (NETLIBS): Define.
(BACKEND): Remove $(CODYLIB).
config:
PR c++/98316
* ax_lib_socket_nsl.m4: Import from autoconf-archive.
We don't try to optimize for signed x, y (int) (x - 1U) * y + y
into x * y, we can't do that with signed x * y, because the former
is well defined for INT_MIN and -1, while the latter is not.
We could perhaps optimize it during isel or some very late optimization
where we'd turn magically flag_wrapv, but we don't do that yet.
This patch optimizes it in simplify-rtx.c, such that we can optimize
it during combine.
2021-01-05 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/98334
* simplify-rtx.c (simplify_context::simplify_binary_operation_1):
Optimize (X - 1) * Y + Y to X * Y or (X + 1) * Y - Y to X * Y.
* gcc.target/i386/pr98334.c: New test.
This is just a precautionary fix.
2021-01-05 Bernd Edlinger <bernd.edlinger@hotmail.de>
* tree-inline.c (expand_call_inline): Restore input_location.
Return result from recursive call.
The constexpr iteration dereferenced an array element past the end of
the array.
for gcc/testsuite/ChangeLog
* g++.dg/cpp1y/constexpr-66093.C: Fix bounds issue.
This option will be used by the go command to implement go:embed directives,
which are new with the upcoming Go 1.16 release.
* lang.opt (fgo-embedcfg): New option.
* go-c.h (struct go_create_gogo_args): Add embedcfg field.
* go-lang.c (go_embedcfg): New static variable.
(go_langhook_init): Set go_create_gogo_args embedcfg field.
(go_langhook_handle_option): Handle OPT_fgo_embedcfg_.
* gccgo.texi (Invoking gccgo): Document -fgo-embedcfg.
-fsanitize=undefined with calls to nonnull functions
creates struct __ubsan_nonnull_arg_data instances
with CONSTRUCTORs for RECORD_TYPEs with NULL index values.
The analyzer was mistakenly using INTEGER_CST for these
fields, leading to ICEs.
Fix the issue by iterating through the fields in the type
for such cases, imitating similar logic in varasm.c's
output_constructor.
gcc/analyzer/ChangeLog:
PR analyzer/98293
* store.cc (binding_map::apply_ctor_to_region): When "index" is
NULL, iterate through the fields for RECORD_TYPEs, rather than
creating an INTEGER_CST index.
gcc/testsuite/ChangeLog:
PR analyzer/98293
* gcc.dg/analyzer/pr98293.c: New test.
The IFN_MASK* functions take two leading arguments: a load or
store pointer and a “cookie”. The type of the cookie is the
type of the access for TBAA purposes (like for MEM_REFs)
while the value of the cookie is the alignment of the access.
This PR was caused by a disagreement about whether the alignment
is measured in bits or bytes.
It looks like this goes back to PR68786, which made the
vectoriser create its own cookie argument rather than reusing
the one created by ifcvt. The alignment value of the new cookie
was measured in bytes (as needed by set_ptr_info_alignment)
while the existing code expected it to be measured in bits.
The folds I added for IFN_MASK_LOAD and STORE then made
things worse.
gcc/
PR tree-optimization/95401
* config/aarch64/aarch64-sve-builtins.cc
(gimple_folder::load_store_cookie): Use bits rather than bytes
for the alignment argument to IFN_MASK_LOAD and IFN_MASK_STORE.
* gimple-fold.c (gimple_fold_mask_load_store_mem_ref): Likewise.
* tree-vect-stmts.c (vectorizable_store): Likewise.
(vectorizable_load): Likewise.
gcc/testsuite/
PR tree-optimization/95401
* g++.dg/vect/pr95401.cc: New test.
* g++.dg/vect/pr95401a.cc: Likewise.
This makes sure to set the vector type on an invariant mask argument
for a masked load and SLP.
2021-01-04 Richard Biener <rguenther@suse.de>
PR tree-optimization/98308
* tree-vect-stmts.c (vectorizable_load): Set invariant mask
SLP vectype.
* gcc.dg/vect/pr98308.c: New testcase.
As the testcase shows, we punt unnecessarily on popcount loop idioms if
the type is smaller than int or larger than long long.
Smaller type than int can be handled by zero-extending the argument to
unsigned int, and types twice as long as long long by doing
__builtin_popcountll on both halves of the __int128.
2020-01-04 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/95771
* tree-ssa-loop-niter.c (number_of_iterations_popcount): Handle types
with precision smaller than int's precision and types with precision
twice as large as long long. Formatting fixes.
* gcc.target/i386/pr95771.c: New test.
This does VN replacement in loop nb_iterations consistent with
the rest of the IL by using availability at the definition site
of uses.
2021-01-04 Richard Biener <rguenther@suse.de>
PR tree-optimization/98464
* tree-ssa-sccvn.c (vn_valueize_for_srt): Rename from ...
(vn_valueize_wrapper): ... this. Temporarily adjust vn_context_bb.
(process_bb): Adjust.
* g++.dg/opt/pr98464.C: New testcase.
The original documentation added to mention the clash between
-fsanitize=address and -fsanitize=hwaddress used confusing wording trying
to say that -fsanitize=hwaddress is only available on AArch64.
It read as if -fsanitize=address were only supported on AArch64.
This patch fixes that wording by being more explicit.
gcc/ChangeLog:
PR other/98437
* doc/invoke.texi (-fsanitize=address): Fix wording describing
clash with -fsanitize=hwaddress.
This avoids running into memory reference code in compute_avail by
properly classifying unfolded reference trees on constants.
2021-01-04 Richard Biener <rguenther@suse.de>
PR tree-optimization/98282
* tree-ssa-sccvn.c (vn_get_stmt_kind): Classify tcc_reference on
invariants as VN_NARY.
* g++.dg/opt/pr98282.C: New testcase.
This patch fixes a codegen regression in the handling of things like:
__temp.val[0] \
= vcombine_##funcsuffix (__b.val[0], \
vcreate_##funcsuffix (__AARCH64_UINT64_C (0))); \
in the 64-bit vst[234] functions. The zero was forced into a
register at expand time, and we relied on combine to fuse the
zero and combine back together into a single combinez pattern.
The problem is that the zero could be hoisted before combine
gets a chance to do its thing.
gcc/
PR target/89057
* config/aarch64/aarch64-simd.md (aarch64_combine<mode>): Accept
aarch64_simd_reg_or_zero for operand 2. Use the combinez patterns
to handle zero operands.
gcc/testsuite/
PR target/89057
* gcc.target/aarch64/pr89057.c: New test.
The expansions of the svprf[bhwd] instructions weren't taking
advantage of the immediate addressing mode.
gcc/
* config/aarch64/aarch64.c (offset_6bit_signed_scaled_p): New function.
(offset_6bit_unsigned_scaled_p): Fix typo in comment.
(aarch64_sve_prefetch_operand_p): Accept MUL VLs in the range
[-32, 31].
gcc/testsuite/
* gcc.target/aarch64/sve/acle/asm/prfb.c: Test for a MUL VL range of
[-32, 31].
* gcc.target/aarch64/sve/acle/asm/prfh.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/prfw.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/prfd.c: Likewise.
This zeroes matches when failing SLP discovery because of the
work limit.
2021-01-04 Richard Biener <rguenther@suse.de>
PR tree-optimization/98393
* tree-vect-slp.c (vect_build_slp_tree): Properly zero matches
when hitting the limit.
When the VF is one a SLP reduction is in-order and thus we can
vectorize even when the reduction op is not associative.
2021-01-04 Richard Biener <rguenther@suse.de>
PR tree-optimization/98291
* tree-vect-loop.c (vectorizable_reduction): Bypass
associativity check for SLP reductions with VF 1.
* gcc.dg/vect/slp-reduc-11.c: New testcase.
* gcc.dg/vect/vect-reduc-in-order-4.c: Adjust.
x is never equal to ~x, so we can fold such comparisons to constants.
2021-01-04 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/96782
* match.pd (x == ~x -> false, x != ~x -> true): New simplifications.
* gcc.dg/tree-ssa/pr96782.c: New test.
When linking with -flto and -save-temps, various
temporary files are created in /tmp.
The same happens when invoking the driver with @file
parameter, and using -L or -I options.
gcc:
2021-01-04 Bernd Edlinger <bernd.edlinger@hotmail.de>
* collect-utils.c (collect_execute): Check dumppfx.
* collect2.c (maybe_run_lto_and_relink, do_link): Pass atsuffix
to collect_execute.
(do_link): Add new parameter atsuffix.
(main): Handle -dumpdir option. Skip one argument for
-o, -isystem and -B options.
* gcc.c (make_at_file): New helper function.
(close_at_file): Use it.
gcc/testsuite:
2021-01-04 Bernd Edlinger <bernd.edlinger@hotmail.de>
* gcc.misc-tests/outputs.exp: Adjust testcase.
We use this in the sim tree currently. Rather than require people to
have pkg-config installed, include it in the config/ dir.
config/ChangeLog:
* pkg.m4: New file from pkg-config-0.29.2.
Ideally, the linker will be queried for its version and that will be
used to determine capabilities that cannot be discovered from
reasonable configuration testing.
When building cross tools, this might not be possible, and we have
strategies for providing useful defaults. These are adjusted here to
refect current choices.
gcc/ChangeLog:
* config/darwin.h (MIN_LD64_NO_COAL_SECTS): Adjust.
Amend handling for LD64_VERSION fallback defaults.
The darwinN.h headers (with the sole exception of darwin7.h,
which contains a target macro definition) now only contain
values that set fall-backs for cross-compilations, these can
be provided from the config.gcc script which means we no longer
need the darwinN.h - so delete them.
gcc/ChangeLog:
* config.gcc: Compute default version information
from the configured target. Likewise defaults for
ld64.
* config/darwin10.h: Removed.
* config/darwin12.h: Removed.
* config/darwin9.h: Removed.
* config/rs6000/darwin8.h: Removed.
Darwin defines ASM_OUTPUT_ALIGNED_DECL_COMMON which is used in
preference to ASM_OUTPUT_ALIGNED_COMMON, which makes the latter
definition dead code. Remove this.
gcc/ChangeLog:
* config/darwin9.h (ASM_OUTPUT_ALIGNED_COMMON): Delete.
We now need a modern (C++11) toolchain to bootstrap GCC, so there's no
need to skip the stack protect for Darwin < 9.
gcc/ChangeLog:
* config/darwin9.h (STACK_CHECK_STATIC_BUILTIN): Move from here..
* config/darwin.h (STACK_CHECK_STATIC_BUILTIN): .. to here.
There is no need to make the LINK_GCC_C_SEQUENCE_SPEC conditional on
configuration parameters, it is adequately conditionalized on the
macosx-version-min.
gcc/ChangeLog:
* config/darwin10.h (LINK_GCC_C_SEQUENCE_SPEC): Move from
here...
* config/darwin.h (LINK_GCC_C_SEQUENCE_SPEC): ... to here.
The darwinN.h headers were (presumably) introduced to allow specs to be
adjusted when there was no mmacosx-version-min handling, or that was
considered unreliable.
We have version-specific specs for the values that have configuration
data, and the version is set in the driver (so may be considered
reliably present).
Some of the 'darwinN.h' content has become dead code, and the reminder
is either conditionalised on version information (or is setting values
used as fall-backs in cross-compilations).
With the changes needed for Darwin20 / macOS 11 the 'darwnN.h' headers
are now too unwieldy to be useful - so this series moves the relevant
specs definitons to the common 'darwin.h' header and then finally uses
the config.gcc script to supply the fall-back defaults for cross-
compilations.
We can then delete all but the main header, since the darwinN.h are
unused.
This change moves a spec from darwin10.h to the main darwin.h
target header.
gcc/ChangeLog:
* config/darwin10.h (LINK_GCC_C_SEQUENCE_SPEC): Move the spec
for the Darwin10 unwinder stub from here ...
* config/darwin.h (LINK_COMMAND_SPEC_A): ... to here.
The toolchain now requires a C++11 compiler to bootstrap and
none of the older Darwin toolchains which were based on stabs
debugging are suitable. We can simplify the debug setup now.
gcc/ChangeLog:
* config/darwin.h (DSYMUTIL_SPEC): Default to DWARF
(ASM_DEBUG_SPEC):Only define if the assembler supports
stabs.
(PREFERRED_DEBUGGING_TYPE): Default to DWARF.
(DARWIN_PREFER_DWARF): Define.
* config/darwin9.h (PREFERRED_DEBUGGING_TYPE): Remove.
(DARWIN_PREFER_DWARF): Likewise
(DSYMUTIL_SPEC): Likewise.
(COLLECT_RUN_DSYMUTIL): Likewise.
(ASM_DEBUG_SPEC): Likewise.
(ASM_DEBUG_OPTION_SPEC): Likewise.