Bugzilla inspection turned up a bunch of old(er) PRs that have been
fixed. Let's add tests for them so that we don't regress in the future.
gcc/testsuite/ChangeLog:
PR c++/87530
PR c++/58156
PR c++/68828
PR c++/86002
PR c++/91525
PR c++/96223
PR c++/87032
PR c++/35098
* g++.dg/cpp0x/move-return4.C: New test.
* g++.dg/cpp0x/vt-58156.C: New test.
* g++.dg/cpp2a/concepts-pr68828.C: New test.
* g++.dg/cpp2a/concepts-pr86002.C: New test.
* g++.dg/cpp2a/concepts-pr91525.C: New test.
* g++.dg/cpp2a/constexpr-indeterminate1.C: New test.
* g++.dg/cpp2a/desig17.C: New test.
* g++.dg/ext/attrib62.C: New test.
ptrace is actually declared as a variadic function. On ppc64le
the ABI requires the caller to allocate space for the parameters
and allows the callee to modify them.
On ppc64le, depending on how and what version of GCC is used,
it will save to the parameter save area. This happened to clobber
a saved LR, and caused syscall.TestExecPtrace to fail with a timeout
when the tracee segfaults, and waits for the parent process to inspect.
Wrap this function to avoid directly calling glibc's ptrace from go.
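A minimal sketch of such a wrapper, assuming a Linux libc that declares
ptrace in <sys/ptrace.h>. The name ptrace_fixed_args is hypothetical and
is not taken from the actual change; the point is only that callers see
a fixed, non-variadic signature:

```c
#include <sys/ptrace.h>
#include <sys/types.h>

/* Hypothetical wrapper name.  Callers that cannot follow the C variadic
   calling convention call this fixed-arity function; the C compiler then
   emits the variadic call to libc's ptrace itself.  */
long
ptrace_fixed_args (int request, pid_t pid, void *addr, void *data)
{
  return ptrace (request, pid, addr, data);
}
```

The wrapper adds no behavior of its own: the result and errno are
whatever ptrace reports.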
Fixes golang/go#36698
Fixes go/92567
Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/254755
Currently the -Wmisleading-indentation warning doesn't do any analysis
when the guarded statement or the statement after it is produced by a
macro. This means we warn for:
  if (flag)
    foo ();
    bar ();
but not for:
  #define BAR bar
  if (flag)
    foo ();
    BAR ();
This patch extends the -Wmisleading-indentation implementation to
support analyzing such statements and their tokens. This is done in the
"natural" way by resolving the location of each of the three tokens to
the token's macro expansion point. (Additionally, if the tokens all
resolve to the same macro expansion point then we instead use their
locations within the macro definition.) When these resolved locations
are all different, then we can proceed with applying the warning
heuristics to them as if no macros were involved.
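The resolution rule described above can be sketched roughly as follows.
The struct and function names are hypothetical and the locations are
plain integers; libcpp's real location_t and line maps are far richer.
"All different" is read here as pairwise distinct, which is an
assumption:

```c
#include <stdbool.h>

/* Hypothetical, simplified token location.  For a token that does not
   come from a macro, both fields hold the token's own position.  */
struct token_loc
{
  int expansion_point; /* Where the enclosing macro was expanded.  */
  int spelling_point;  /* Position within the macro definition.  */
};

/* Resolve the three token locations.  Returns true when the resolved
   locations are pairwise different, i.e. when the usual heuristics
   could be applied to them.  */
bool
resolve_token_locs (const struct token_loc tok[3], int resolved[3])
{
  bool same_expansion
    = (tok[0].expansion_point == tok[1].expansion_point
       && tok[1].expansion_point == tok[2].expansion_point);

  for (int i = 0; i < 3; i++)
    resolved[i] = (same_expansion
		   ? tok[i].spelling_point
		   : tok[i].expansion_point);

  return (resolved[0] != resolved[1]
	  && resolved[1] != resolved[2]
	  && resolved[0] != resolved[2]);
}
```

When only two of the three tokens share an expansion point, their
resolved locations coincide and the sketch reports that the heuristics
cannot be applied.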
gcc/c-family/ChangeLog:
PR c/80076
* c-indentation.c (should_warn_for_misleading_indentation): Move
declarations of local variables closer to their first use.
Handle virtual token locations by resolving them to their
respective macro expansion points. If all three tokens are
produced from the same macro expansion, then instead use their
loci within the macro definition.
gcc/objc/ChangeLog:
PR c/80076
* objc-gnu-runtime-abi-01.c
(gnu_runtime_abi_01_get_class_super_ref): Reduce indentation of
misleadingly indented return statements.
* objc-next-runtime-abi-01.c
(next_runtime_abi_01_get_class_super_ref): Likewise.
gcc/ChangeLog:
PR c/80076
* gensupport.c (alter_attrs_for_subst_insn) <case SET_ATTR>:
Reduce indentation of misleadingly indented code fragment.
* lra-constraints.c (multi_block_pseudo_p): Likewise.
* sel-sched-ir.c (merge_fences): Likewise.
libcpp/ChangeLog:
PR c/80076
* include/line-map.h (first_map_in_common): Declare.
* line-map.c (first_map_in_common): Remove static.
gcc/testsuite/ChangeLog:
PR c/80076
* c-c++-common/Wmisleading-indentation-5.c: New test.
Some DWARF tests scan the assembly output looking for constant values.
When using DWARF5 those constants might use DW_FORM_implicit_const,
which are output (in the comments) after the attribute instead of
before. To make sure these tests work introduce a -gdwarf-5 variant
of these tests and explicitly use -gdwarf-2 for the original.
gcc/testsuite/ChangeLog:
* gcc.dg/debug/dwarf2/inline2.c: Add -gdwarf-2.
* g++.dg/debug/dwarf2/inline-var-1.C: Likewise.
* gcc.dg/debug/dwarf2/pr41445-5.c: Likewise.
* gcc.dg/debug/dwarf2/pr41445-6.c: Likewise.
* gcc.dg/debug/dwarf2/inline6.c: New variant with -gdwarf-5.
* g++.dg/debug/dwarf2/inline-var-3.C: Likewise.
* gcc.dg/debug/dwarf2/pr41445-7.c: Likewise.
* gcc.dg/debug/dwarf2/pr41445-8.c: Likewise.
Recent Technology Levels of AIX 7.2 have made sys/socket.h more C++-aware,
which causes the fix to be applied in too many locations. This patch adds
more context for the selection to apply the fix more narrowly.
fixincludes/ChangeLog:
2020-09-17 David Edelsohn <dje.gcc@gmail.com>
* inclhack.def (aix_externcpp1): Add more context to select.
(aix_externcpp2): Same.
* fixincl.x: Regenerate.
* tests/base/sys/socket.h: Update expected results.
This patch makes tsubst_requires_expr avoid substituting into a
requires-expression when partially instantiating a generic lambda.
This is necessary in general to ensure that we always check requirements
in lexical order (as in the first testcase below). A mechanism similar
to PACK_EXPANSION_EXTRA_ARGS is added to remember template arguments and
defer substitution of requires-expressions.
Incidentally, this change also fixes the two mentioned PRs -- the
problem there is that tsubst_requires_expr was performing semantic
checks on template trees, and some of the checks are not prepared to
handle such trees. With this patch, tsubst_requires_expr no longer
does any semantic checking at all when processing_template_decl.
gcc/cp/ChangeLog:
PR c++/96409
PR c++/96410
* constraint.cc (tsubst_requires_expr): Use REQUIRES_EXPR_PARMS
and REQUIRES_EXPR_REQS. Use REQUIRES_EXPR_EXTRA_ARGS,
add_extra_args and build_extra_args to defer substitution until
we have all the template arguments.
(finish_requires_expr): Adjust the call to build_min so that
REQUIRES_EXPR_EXTRA_ARGS gets set to NULL_TREE.
* cp-tree.def (REQUIRES_EXPR): Give it a third operand.
* cp-tree.h (REQUIRES_EXPR_PARMS, REQUIRES_EXPR_REQS,
REQUIRES_EXPR_EXTRA_ARGS): Define.
(add_extra_args, build_extra_args): Declare.
gcc/testsuite/ChangeLog:
PR c++/96409
PR c++/96410
* g++.dg/cpp2a/concepts-lambda13.C: New test.
* g++.dg/cpp2a/concepts-lambda14.C: New test.
This patch makes the *_internal functions 'static inline' to avoid these warnings during the build:
/libgcc/config/arm/fp16.c:169:1: warning: no previous prototype for '__gnu_h2f_internal' [-Wmissing-prototypes]
/libgcc/config/arm/fp16.c:194:1: warning: no previous prototype for '__gnu_f2h_ieee' [-Wmissing-prototypes]
/libgcc/config/arm/fp16.c:200:1: warning: no previous prototype for '__gnu_h2f_ieee' [-Wmissing-prototypes]
/libgcc/config/arm/fp16.c:206:1: warning: no previous prototype for '__gnu_f2h_alternative' [-Wmissing-prototypes]
/libgcc/config/arm/fp16.c:212:1: warning: no previous prototype for '__gnu_h2f_alternative' [-Wmissing-prototypes]
/libgcc/config/arm/fp16.c:218:1: warning: no previous prototype for '__gnu_d2h_ieee' [-Wmissing-prototypes]
/libgcc/config/arm/fp16.c:224:1: warning: no previous prototype for '__gnu_d2h_alternative' [-Wmissing-prototypes]
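The two ways of silencing -Wmissing-prototypes can be illustrated with a
small sketch. The function names below are hypothetical and are not the
fp16.c routines: a file-local helper becomes 'static inline', while a
function that must stay externally visible gets a prototype ahead of its
definition:

```c
/* Externally visible function: declare a prototype before defining it.  */
unsigned short float_bits_to_half_sign (unsigned int bits);

/* File-local helper: 'static inline' gives it internal linkage, so the
   warning does not ask for a previous prototype.  */
static inline unsigned int
sign_bit (unsigned int bits)
{
  return bits >> 31;
}

unsigned short
float_bits_to_half_sign (unsigned int bits)
{
  /* Move the sign from bit 31 to bit 15.  */
  return (unsigned short) (sign_bit (bits) << 15);
}
```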
2020-09-11 Torbjörn SVENSSON <torbjorn.svensson@st.com>
Christophe Lyon <christophe.lyon@linaro.org>
libgcc/
* config/arm/fp16.c (__gnu_h2f_internal): Add 'static inline'
qualifier.
(__gnu_f2h_ieee, __gnu_h2f_ieee, __gnu_f2h_alternative)
(__gnu_h2f_alternative, __gnu_d2h_ieee, __gnu_d2h_alternative): Add
missing prototypes.
This adds the capability to look for available negated multiplications
and divisions, replacing them with cheaper negates.
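As a source-level illustration of the identity involved (not a claim
about which types or flags the pass handles), once x * -3.0 has been
computed, x * 3.0 is simply its negation. The struct and function names
are made up for this sketch:

```c
struct products
{
  double negated;
  double plain;
};

struct products
compute_products (double x)
{
  struct products p;
  p.negated = x * -3.0; /* Computed first, so this value is available.  */
  p.plain = x * 3.0;    /* Same value as -p.negated.  */
  return p;
}
```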
2020-09-17 Richard Biener <rguenther@suse.de>
* tree-ssa-sccvn.c (visit_nary_op): Value-number multiplications
and divisions to negates of available negated forms.
* gcc.dg/tree-ssa/ssa-fre-88.c: New testcase.
gcc/ChangeLog:
PR middle-end/97078
* function.c (use_register_for_decl): Test cfun->tail_call_marked
for a parameter here instead of...
(assign_parm_setup_reg): ...here.
gcc/testsuite/ChangeLog:
* gcc.dg/pr97078.c: New test.
This fixes an ICE when trying to copy a legacy value_range containing
a symbolic to a multi-range:
min = make_ssa_name (type);
max = build_int_cst (type, 55);
value_range vv (min, max);
int_range<2> vr = vv;
gcc/ChangeLog:
* range-op.cc (multi_precision_range_tests): Add test.
* value-range.cc (irange::copy_legacy_range): Normalize symbolics when
copying to a multi-range.
pz_tmp_base and pz_tmp_dot are always set, but used only when
_PC_NAME_MAX is defined.
This patch moves their declaration and definition under #ifdef
_PC_NAME_MAX to avoid this warning.
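The pattern is roughly the following. This is a sketch with a
hypothetical function, not the fixfixes.c code: a variable that is only
needed when _PC_NAME_MAX exists is declared inside the same #ifdef, so
builds without it see no set-but-unused variable. The fallback value is
an assumption made for the example:

```c
#include <unistd.h>

/* Hypothetical helper: longest file name allowed in DIR.  */
long
name_max_for (const char *dir)
{
#ifdef _PC_NAME_MAX
  /* Declared only where it is used.  */
  long limit = pathconf (dir, _PC_NAME_MAX);
  if (limit > 0)
    return limit;
#else
  (void) dir;
#endif
  return 255; /* Assumed fallback for this sketch.  */
}
```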
2020-09-11 Torbjörn SVENSSON <torbjorn.svensson@st.com>
Christophe Lyon <christophe.lyon@linaro.org>
fixincludes/
* fixfixes.c (pz_tmp_base, pz_tmp_dot): Define only with
_PC_NAME_MAX.
Before the change 'man gcc' rendered "SOURCE_DATE_EPOCH" section as:
... the output of @command{date +%s} on GNU/Linux ...
After the change it renders as:
... the output of "date +%s" on GNU/Linux ...
gcc/ChangeLog:
* doc/cppenv.texi: Use @code{} instead of @samp{@command{}}
around 'date +%s'.
Currently, -mno-avx implies -mno-xsave, which is wrong.
gcc/ChangeLog
* common/config/i386/i386-common.c
(OPTION_MASK_ISA_AVX_UNSET): Remove OPTION_MASK_ISA_XSAVE_UNSET.
(OPTION_MASK_ISA_XSAVE_UNSET): Add OPTION_MASK_ISA_AVX_UNSET.
gcc/testsuite/ChangeLog
* gcc.target/i386/xsave-avx-1.c: New test.
Debugging the state explosion of the very large switch statement in
gcc.dg/analyzer/pr96653.c showed that the worklist was failing to
order the exploded nodes correctly; the in-edges at the join point
after the switch were not getting processed together, but were instead
being processed in smaller batches, bloating the exploded graph until the
per-point limit was reached.
The root cause turned out to be a bug in creating the strongly-connected
components for the supergraph: the code was considering interprocedural
edges as well as intraprocedural edges, leading to unpredictable
misorderings of the SCC and worklist, leading to bloating of the
exploded graph.
This patch fixes the SCC creation so it only considers intraprocedural
edges within the supergraph. It also tweaks worklist::key_t::cmp to
give higher precedence to call_string over differences within a
supernode, since enodes with different call_strings can't be merged.
In practice, none of my test cases were affected by this latter change,
though it seems to be the right thing to do.
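To make the SCC part concrete, here is a small sketch of Tarjan's
algorithm that skips interprocedural edges. The graph representation,
the names and the fixed array sizes are hypothetical simplifications;
the analyzer's supergraph and its strongly_connected_components class
are not shown here:

```c
#include <stdbool.h>

#define MAX_NODES 16

/* Hypothetical, simplified edge: a flag marks call/return edges.  */
struct edge
{
  int src, dest;
  bool interprocedural;
};

struct scc_state
{
  const struct edge *edges;
  int num_edges, num_nodes;
  int index[MAX_NODES];   /* Discovery order, -1 if unvisited.  */
  int lowlink[MAX_NODES];
  bool on_stack[MAX_NODES];
  int stack[MAX_NODES], stack_size;
  int next_index;
  int scc_id[MAX_NODES];  /* Result: SCC number per node.  */
  int num_sccs;
};

static void
strong_connect (struct scc_state *s, int v)
{
  s->index[v] = s->lowlink[v] = s->next_index++;
  s->stack[s->stack_size++] = v;
  s->on_stack[v] = true;

  for (int i = 0; i < s->num_edges; i++)
    {
      const struct edge *e = &s->edges[i];
      /* Only follow intraprocedural out-edges of V.  */
      if (e->src != v || e->interprocedural)
	continue;
      int w = e->dest;
      if (s->index[w] < 0)
	{
	  strong_connect (s, w);
	  if (s->lowlink[w] < s->lowlink[v])
	    s->lowlink[v] = s->lowlink[w];
	}
      else if (s->on_stack[w] && s->index[w] < s->lowlink[v])
	s->lowlink[v] = s->index[w];
    }

  /* V is the root of an SCC: pop its members off the stack.  */
  if (s->lowlink[v] == s->index[v])
    {
      int w;
      do
	{
	  w = s->stack[--s->stack_size];
	  s->on_stack[w] = false;
	  s->scc_id[w] = s->num_sccs;
	}
      while (w != v);
      s->num_sccs++;
    }
}

void
compute_sccs (struct scc_state *s, const struct edge *edges,
	      int num_edges, int num_nodes)
{
  s->edges = edges;
  s->num_edges = num_edges;
  s->num_nodes = num_nodes;
  s->stack_size = s->next_index = s->num_sccs = 0;
  for (int v = 0; v < num_nodes; v++)
    {
      s->index[v] = s->scc_id[v] = -1;
      s->on_stack[v] = false;
    }
  for (int v = 0; v < num_nodes; v++)
    if (s->index[v] < 0)
      strong_connect (s, v);
}
```

With a loop 0 <-> 1, a call edge 1 -> 2, an ordinary edge 2 -> 3 and a
return edge 3 -> 1, following every edge would put nodes 1, 2 and 3 on a
common cycle. Skipping the call and return edges leaves the loop {0, 1}
as one component and 2 and 3 as singletons.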
With this patch, the very large switch statement in
gcc.dg/analyzer/pr96653.c is handled in a single call to
exploded_graph::maybe_process_run_of_before_supernode_enodes:
merged 358 in-enodes into 2 out-enode(s) at SN: 402
and that testcase no longer hits the per-program-point limits.
gcc/analyzer/ChangeLog:
* engine.cc (strongly_connected_components::strong_connect): Only
consider intraprocedural edges when creating SCCs.
(worklist::key_t::cmp): Add comment. Treat call_string
differences as more important than differences of program_point
within a supernode.
gcc/testsuite/ChangeLog:
PR analyzer/96653
* gcc.dg/analyzer/loop-0-up-to-n-by-1-with-iter-obj.c: Update
expected number of exploded nodes.
* gcc.dg/analyzer/malloc-vs-local-1a.c: Update expected number
of exploded nodes.
* gcc.dg/analyzer/pr96653.c: Remove -Wno-analyzer-too-complex.
gcc/analyzer/ChangeLog:
* engine.cc (supernode_cluster::dump_dot): Show the SCC id
in the per-supernode clusters in FILENAME.eg.dot output.
(exploded_graph_annotator::add_node_annotations):
Show the SCC of the supernode in FILENAME.supernode.eg.dot output.
* exploded-graph.h (worklist::scc_id): New.
(exploded_graph::get_scc_id): New.
Prior to this patch the analyzer worklist considered only one node or
two nodes at a time, processing and/or merging state individually or
pairwise.
This could lead to explosions of merger nodes at CFG join points,
especially after switch statements, which could have large numbers
of in-edges, and thus large numbers of merger exploded_nodes could
be created, exceeding the per-point limit and thus stopping analysis
with -Wanalyzer-too-complex.
This patch special-cases the handling for runs of consecutive
nodes in the worklist at a CFG join point, processing and merging
them all together.
The patch fixes a state explosion in bzip2.c seen when attempting
to reproduce PR analyzer/95188, in a switch statement in a loop for
argument parsing. With this patch, the analyzer successfully
consolidates the state after the argument parsing to a single exploded
node.
In gcc.dg/analyzer/pr96653.c there is a switch statement with over 300
cases which leads to hitting the per-point limit. With this patch
the consolidation code doesn't manage to merge all of them due to other
worklist-ordering bugs, and it still hits the per-point limits, but it
does manage some very long consolidations:
merged 2 in-enodes into 2 out-enode(s) at SN: 403
merged 2 in-enodes into 2 out-enode(s) at SN: 403
merged 2 in-enodes into 1 out-enode(s) at SN: 11
merged 29 in-enodes into 1 out-enode(s) at SN: 35
merged 6 in-enodes into 1 out-enode(s) at SN: 41
merged 31 in-enodes into 1 out-enode(s) at SN: 35
and with a followup patch to fix an SCC issue it manages:
merged 358 in-enodes into 2 out-enode(s) at SN: 402
The patch appears to fix the failure on non-x86_64 of:
gcc.dg/analyzer/pr93032-mztools.c (test for excess errors)
which is PR analyzer/96616.
Unfortunately, the patch introduces a memory leak false positive in
gcc.dg/analyzer/pr94851-1.c, but this appears to be a pre-existing bug
that was hidden by state-merging failures.
gcc/analyzer/ChangeLog:
* engine.cc (exploded_node::dump_dot): Show STATUS_BULK_MERGED.
(exploded_graph::process_worklist): Call
maybe_process_run_of_before_supernode_enodes.
(exploded_graph::maybe_process_run_of_before_supernode_enodes):
New.
(exploded_graph_annotator::print_enode): Show STATUS_BULK_MERGED.
* exploded-graph.h (enum exploded_node::status): Add
STATUS_BULK_MERGED.
gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/bzip2-arg-parse-1.c: New test.
* gcc.dg/analyzer/loop-n-down-to-1-by-1.c: Remove xfail.
* gcc.dg/analyzer/pr94851-1.c: Add xfail.
Avoid some future copy-and-paste by introducing a function.
gcc/analyzer/ChangeLog:
* engine.cc
(exploded_graph::process_node) <case PK_BEFORE_SUPERNODE>:
Simplify by using program_point::get_next.
* program-point.cc (program_point::get_next): New.
* program-point.h (program_point::get_next): New decl.
I found this useful when debugging.
gcc/analyzer/ChangeLog:
* engine.cc (exploded_graph::get_or_create_node): Show the
program point when issuing -Wanalyzer-too-complex due to hitting
the per-program-point limit.
Seen whilst debugging another issue, where the analyzer was assuming
conservatively that a call to getchar could clobber a global.
This is handled for most of the other stdio functions by the list
in sm-file.cc.
gcc/analyzer/ChangeLog:
* region-model.cc (region_model::on_call_pre): Treat getchar as
having no side-effects.
gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/getchar-1.c: New test.
g++ 4.8.5 rejected cases with SFmode and DFmode, presumably due to
some bug in the constexpr implementation.
for gcc/ChangeLog
* config/rs6000/rs6000.c (have_compare_and_set_mask): Use
E_*mode in cases.
Most uses of rs6000_pcrel_p are called for the current function.
A specialized version for cfun is more efficient for these uses.
2020-09-16 Bill Schmidt <wschmidt@linux.ibm.com>
gcc/
* config/rs6000/predicates.md (current_file_function_operand):
Remove argument from rs6000_pcrel_p call.
* config/rs6000/rs6000-logue.c (rs6000_decl_ok_for_sibcall):
Likewise.
(rs6000_global_entry_point_prologue_needed_p): Likewise.
(rs6000_output_function_prologue): Likewise.
* config/rs6000/rs6000-protos.h (rs6000_function_pcrel_p): New
prototype.
(rs6000_pcrel_p): Remove argument.
* config/rs6000/rs6000.c (rs6000_legitimize_tls_address): Remove
argument from rs6000_pcrel_p call.
(rs6000_call_template_1): Likewise.
(rs6000_indirect_call_template_1): Likewise.
(rs6000_longcall_ref): Likewise.
(rs6000_call_aix): Likewise.
(rs6000_sibcall_aix): Likewise.
(rs6000_function_pcrel_p): Rename from rs6000_pcrel_p.
(rs6000_pcrel_p): Rewrite.
* config/rs6000/rs6000.md (*pltseq_plt_pcrel<mode>): Remove
argument from rs6000_pcrel_p call.
(*call_local<mode>): Likewise.
(*call_value_local<mode>): Likewise.
(*call_nonlocal_aix<mode>): Likewise.
(*call_value_nonlocal_aix<mode>): Likewise.
(*call_indirect_pcrel<mode>): Likewise.
(*call_value_indirect_pcrel<mode>): Likewise.
Here we ICE in char_span::subspan because the offset it gets is -1.
It's -1 because get_substring_ranges_for_loc gets a location whose
column was 0. That only happens in testcases like the attached where
we're dealing with extremely long lines (at least 4065 chars it seems).
This does happen in practice, though, so it's not just a theoretical
problem (e.g. when building the SU2 suite).
Fixed by checking that the column get_substring_ranges_for_loc gets is
sane, akin to other checks in that function.
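The kind of check involved looks roughly like this. The helper below is
a hypothetical stand-in rather than the input.c code; it only shows why
a column of 0 has to be rejected before it is turned into a 0-based
offset, where it would otherwise become -1:

```c
#include <stddef.h>

/* Hypothetical helper: convert a 1-based COLUMN into a 0-based offset
   into a line of LINE_LENGTH characters.  Returns NULL on success, or a
   message describing why the column cannot be used.  *OFFSET is only
   written on success.  */
const char *
column_to_offset (int column, size_t line_length, size_t *offset)
{
  if (column < 1)
    return "column is less than 1";
  if ((size_t) column > line_length)
    return "column is beyond the end of the line";
  *offset = (size_t) column - 1;
  return NULL;
}
```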
gcc/ChangeLog:
PR preprocessor/96935
* input.c (get_substring_ranges_for_loc): Return if start.column
is less than 1.
gcc/testsuite/ChangeLog:
PR preprocessor/96935
* gcc.dg/format/pr96935.c: New test.
Resolves:
PR middle-end/96295 - -Wmaybe-uninitialized warning for range operator with
reference to an empty struct
gcc/ChangeLog:
PR middle-end/96295
* tree-ssa-uninit.c (maybe_warn_operand): Work harder to avoid
warning for objects of empty structs.
gcc/testsuite/ChangeLog:
PR middle-end/96295
* g++.dg/warn/Wuninitialized-11.C: New test.
This corrects the earlier problems with removing the template header
from local omp reductions. And it uncovered a latent bug. When we
tsubst such a decl, we immediately tsubst its body.
cp_check_omp_declare_reduction gets a success return value to gate
that instantiation.
udr-2.C got a further error, as the omp checking machinery doesn't
appear to turn the reduction into an error mark when failing. I
didn't dig into that further. udr-3.C appears to have been invalid
and accidentally worked.
gcc/cp/
* cp-tree.h (cp_check_omp_declare_reduction): Return bool.
* semantics.c (cp_check_omp_declare_reduction): Return true on
success.
* pt.c (push_template_decl_real): OMP reductions do not get a
template header.
(tsubst_function_decl): Remove special casing for local decl omp
reductions.
(tsubst_expr): Call instantiate_body for a local omp reduction.
(instantiate_body): Add nested_p parm, and deal with such
instantiations.
(instantiate_decl): Reject FUNCTION_SCOPE entities, adjust
instantiate_body call.
gcc/testsuite/
* g++.dg/gomp/udr-2.C: Add additional expected error.
libgomp/
* testsuite/libgomp.c++/udr-3.C: Add missing ctor.
This adds the PPC architecture variants for Mach-O libbacktrace.
With this (as for X86 and Arm) when dsymutil is run on the binary
we get a basic usable backtrace.
Testsuite results on powerpc-apple-darwin9 are the same as for X86:
* btest fails (TBC why)
* dwarf5 tests fail because dsymutil does not handle that so far.
libbacktrace/ChangeLog:
* macho.c (MACH_O_CPU_TYPE_PPC): New.
(MACH_O_CPU_TYPE_PPC64): New.
Add compile-tests for powerpc to the Mach-O variants.
This restores the post-order traversal done by cleanup_all_empty_eh in
order to eliminate empty landing pads and also contains a small tweak
to the line debug info to avoid a problematic inheritance for coverage
measurement.
gcc/ChangeLog:
* tree-eh.c (lower_try_finally_dup_block): Backward propagate slocs
to stack restore builtin calls.
(cleanup_all_empty_eh): Do again a post-order traversal of the EH
region tree.
gcc/testsuite/ChangeLog:
* gnat.dg/concat4.adb: New test.
instantiate_body has a local var called 'nested', which indicates that
this instantiation was caused during the body of some function -- not
necessarily its containing scope. That's confusing, let's just use
'current_function_decl' directly. Then we can also simplify the
push_to_top_level logic, which /does/ indicate whether this is an
actual nested function. (C++ does not have nested functions, but OMP
UDRs fall into that category. A follow up patch will use that more
usual meaning of 'nested' wrt functions.)
gcc/cp/
* pt.c (instantiate_body): Remove 'nested' var, simplify
push_to_top logic.
This refactors instantiate_decl, breaking out the actual instantiation
work to instantiate_body. That'll allow me to address the OMP UDR
issue, but it also means we have slightly neater code in
instantiate_decl anyway.
gcc/cp/
* pt.c (instantiate_body): New, broken out of...
(instantiate_decl): ... here. Call it.
gcc/ChangeLog
2020-09-09 Andrea Corallo <andrea.corallo@arm.com>
* tree-vect-loop.c (vect_need_peeling_or_partial_vectors_p): New
function.
(vect_analyze_loop_2): Make use of it not to select partial
vectors if no peel is required.
(determine_peel_for_niter): Move out some logic into
'vect_need_peeling_or_partial_vectors_p'.
gcc/testsuite/ChangeLog
2020-09-09 Andrea Corallo <andrea.corallo@arm.com>
* gcc.target/aarch64/sve/cost_model_10.c: New test.
* gcc.target/aarch64/sve/clastb_8.c: Update test for new
vectorization strategy.
* gcc.target/aarch64/sve/cost_model_5.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_14.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_15.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_16.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_17.c: Likewise.
Add sp_is_clobbered_by_asm to rtl_data to inform backends that the stack
pointer is clobbered by asm statement.
gcc/
PR target/97032
* cfgexpand.c (asm_clobber_reg_kind): Set sp_is_clobbered_by_asm
to true if the stack pointer is clobbered by asm statement.
* emit-rtl.h (rtl_data): Add sp_is_clobbered_by_asm.
* config/i386/i386.c (ix86_get_drap_rtx): Set need_drap to true
if the stack pointer is clobbered by asm statement.
gcc/testsuite/
PR target/97032
* gcc.target/i386/pr97032.c: New test.
The following testcases are simplified by the new rule
(T)(A) +- (T)(B) -> (T)(A +- B), so they no longer keep the code pattern
that the tests check for. Adjust the test code to suppress the simplification.
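For unsigned narrowing conversions the two forms of the rule agree
because arithmetic wraps modulo 2^32. The sketch below only illustrates
that equivalence; the function names are made up, and the conditions
under which GCC applies the rule are not covered here:

```c
#include <stdint.h>

/* (T)(A) + (T)(B): convert each operand first, then add.  */
uint32_t
narrow_then_add (uint64_t a, uint64_t b)
{
  return (uint32_t) a + (uint32_t) b;
}

/* (T)(A + B): add in the wider type, then convert once.  */
uint32_t
add_then_narrow (uint64_t a, uint64_t b)
{
  return (uint32_t) (a + b);
}
```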
2020-09-16 Feng Xue <fxue@os.amperecomputing.com>
gcc/testsuite/
PR testsuite/97066
* gcc.dg/ifcvt-3.c: Modified to suppress simplification.
* gcc.dg/tree-ssa/20030807-10.c: Likewise.
Certain alternatives of *vec_tf_to_v1tf use the "v" constraint for the
TFmode source operand. The operand is therefore assigned to the VEC_REGS
class, and when it is reloaded using *movtf_64, whose relevant alternatives
need FP_REGS, LRA loops and an ICE happens. The reason is that the register
class mismatch causes LRA to emit another reload, which triggers this
issue again.
Fix by using the "f" constraint, which is more appropriate for FP register
pairs anyway.
gcc/ChangeLog:
2020-09-02 Ilya Leoshkevich <iii@linux.ibm.com>
* config/s390/vector.md (*vec_tf_to_v1tf): Use "f" instead of "v"
for the source operand.
This removes STMT_VINFO_NUM_SLP_USES by pushing the setting of
the shared stmt_vec_info vector type to where we actually need it
which is alignment analysis and vectorizable_* analysis (where
we could eventually elide it for non-load/store operations).
In particular "uses" in the cache and in disqualified SLP
subgraphs should no longer provide conflicting vector types
this way.
2020-09-16 Richard Biener <rguenther@suse.de>
* tree-vectorizer.h (_stmt_vec_info::num_slp_uses): Remove.
(STMT_VINFO_NUM_SLP_USES): Likewise.
(vect_free_slp_instance): Adjust.
(vect_update_shared_vectype): Declare.
* tree-vectorizer.c (vec_info::~vec_info): Adjust.
* tree-vect-loop.c (vect_analyze_loop_2): Likewise.
(vectorizable_live_operation): Use vector type from
SLP_TREE_REPRESENTATIVE.
(vect_transform_loop): Adjust.
* tree-vect-data-refs.c (vect_slp_analyze_node_alignment):
Set the shared vector type.
* tree-vect-slp.c (vect_free_slp_tree): Remove final_p
parameter, remove STMT_VINFO_NUM_SLP_USES updating.
(vect_free_slp_instance): Adjust.
(vect_create_new_slp_node): Remove STMT_VINFO_NUM_SLP_USES
updating.
(vect_update_shared_vectype): Always compare with the
present vector type, update if NULL.
(vect_build_slp_tree_1): Do not update the shared vector
type here.
(vect_build_slp_tree_2): Adjust.
(slp_copy_subtree): Likewise.
(vect_attempt_slp_rearrange_stmts): Likewise.
(vect_analyze_slp_instance): Likewise.
(vect_analyze_slp): Likewise.
(vect_slp_analyze_node_operations_1): Update the shared
vector type.
(vect_slp_analyze_operations): Adjust.
(vect_slp_analyze_bb_1): Likewise.
2020-09-16 Jakub Jelinek <jakub@redhat.com>
* config/arm/arm.c (arm_option_restore): Comment out opts argument
name to avoid unused parameter warnings.
When working on the previous patch, I've noticed that all cl_optimization
fields apart from strings are streamed with bp_pack_value (..., 64); so we
waste quite a lot of space, given that many of the options are just booleans
or char options and there are 450-ish of them.
Fixed by streaming the number of bits the corresponding fields have.
While for char fields we have also range information, except for 3
it is either -128, 127 or 0, 255, so it didn't seem worth it to bother
with using range-ish packing.
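To give a feel for the savings, the sketch below counts the bits used by
one possible variable-length scheme: 4-bit groups holding 3 payload bits
plus a continuation bit. That layout is an assumption made for this
example, not a description of the streamer, and var_len_bits is a
hypothetical name:

```c
#include <stdint.h>

/* Bits needed to encode VALUE in 4-bit groups, each carrying 3 payload
   bits and 1 continuation bit (assumed layout, for illustration).  */
unsigned
var_len_bits (uint64_t value)
{
  unsigned bits = 0;
  do
    {
      value >>= 3;
      bits += 4;
    }
  while (value != 0);
  return bits;
}
```

Under that assumption a boolean option costs 4 bits and a value of 255
costs 12 bits, compared with a fixed 64 bits each.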
2020-09-16 Jakub Jelinek <jakub@redhat.com>
* optc-save-gen.awk: In cl_optimization_stream_out use
bp_pack_var_len_{int,unsigned} instead of bp_pack_value. In
cl_optimization_stream_in use bp_unpack_var_len_{int,unsigned}
instead of bp_unpack_value. Formatting fix.
As the testcases show, if we have something like:
MEM <char[12]> [&b + 8B] = {};
MEM[(short *) &b] = 5;
_5 = *x_4(D);
MEM <long long unsigned int> [&b + 2B] = _5;
MEM[(char *)&b + 16B] = 88;
MEM[(int *)&b + 20B] = 1;
then in sort_by_bitpos order the stores are almost in the given order,
except that the first store comes after the = _5; store.
We can't coalesce the = 5; store with = _5;, because the latter is MEM_REF,
while the former INTEGER_CST, and we can't coalesce the = _5 store with
the = {} store because the former is MEM_REF, the latter INTEGER_CST.
But we happily coalesce the remaining 3 stores, which is wrong, because the
= _5; store overlaps those and is in between them in the program order.
We already have code to deal with similar cases in check_no_overlap, but we
deal only with the following stores in sort_by_bitpos order, not the earlier
ones.
The following patch also checks the earlier ones. In coalesce_immediate_stores
it computes the first one that needs to be checked (all the ones whose
bitpos + bitsize is smaller than or equal to merged_store->start don't need
to be checked, now or in any following attempts, because of the
sort_by_bitpos sorting) and the end of that range (that is, the first store
in the merged_store).
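The extra check can be sketched as follows. The store record and the
function are hypothetical simplifications (positions in bits, program
order as a plain counter), and "in between" is taken to be strict, which
is an assumption:

```c
#include <stdbool.h>

/* Hypothetical, simplified store record.  */
struct store
{
  unsigned long bitpos, bitsize;
  unsigned order; /* Position in program order.  */
};

/* STORES is sorted by bitpos.  Return false if any store in
   [FIRST_EARLIER, LAST_EARLIER) lies strictly between FIRST_ORDER and
   LAST_ORDER in program order and overlaps the bit range [START, END)
   of the store that is about to be merged.  */
bool
no_overlap_with_earlier (const struct store *stores,
			 unsigned first_earlier, unsigned last_earlier,
			 unsigned first_order, unsigned last_order,
			 unsigned long start, unsigned long end)
{
  for (unsigned i = first_earlier; i < last_earlier; i++)
    {
      const struct store *s = &stores[i];
      if (s->order > first_order && s->order < last_order
	  && s->bitpos < end && s->bitpos + s->bitsize > start)
	return false;
    }
  return true;
}
```

For the stores listed above, merging the = {}, = 88 and = 1 stores
covers bits 64 to 192. The = _5 store at bits 16 to 80 sorts earlier,
falls between them in program order and overlaps that range, so the
check fails and the merge has to be rejected.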
2020-09-16 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/97053
* gimple-ssa-store-merging.c (check_no_overlap): Add FIRST_ORDER,
START, FIRST_EARLIER and LAST_EARLIER arguments. Return false if
any store between FIRST_EARLIER inclusive and LAST_EARLIER exclusive
has order in between FIRST_ORDER and LAST_ORDER and overlaps the to
be merged store.
(imm_store_chain_info::try_coalesce_bswap): Add FIRST_EARLIER argument.
Adjust check_no_overlap caller.
(imm_store_chain_info::coalesce_immediate_stores): Add first_earlier
and last_earlier variables, adjust them during iterations. Adjust
check_no_overlap callers, call check_no_overlap even when extending
overlapping stores by extra INTEGER_CST stores.
* gcc.dg/store_merging_31.c: New test.
* gcc.dg/store_merging_32.c: New test.