TWO_OPERANDS allows any order or number of combinations of + and - operations
but the pattern matcher only supports pairs of operations.
This patch has the pattern matcher for complex numbers reject SLP trees where
the lanes are not a multiple of 2.
gcc/ChangeLog:
PR tree-optimization/99825
* tree-vect-slp-patterns.c (vect_check_evenodd_blend):
Reject non-mult 2 lanes.
gcc/testsuite/ChangeLog:
PR tree-optimization/99825
* gfortran.dg/vect/pr99825.f90: New test.
When compiling with -mfloat-abi=hard -march=armv8.1-m.main+mve, we
want to emit Tag_ABI_VFP_args even though we are not emitting
floating-point instructions (we need "+mve.fp" for that), because we
use MVE registers to pass FP arguments.
This patch removes the condition on (! TARGET_SOFT_FLOAT) because this
is a case where TARGET_SOFT_FLOAT is true, and TARGET_HARD_FLOAT_ABI
is true too.
2021-03-30 Richard Earnshaw <rearnsha@arm.com>
gcc/
PR target/99773
* config/arm/arm.c (arm_file_start): Fix emission of
Tag_ABI_VFP_args attribute.
VN sometimes builds new integer types to handle accesses where precision
of the access type does not match the access size. The way
ao_ref_init_from_vn_reference is computing the access size ignores
the access type in case the ref operands have an outermost
COMPONENT_REF which, in case it is an array for example, can be
way larger than the access size. This can cause us to try
building an integer type with precision larger than WIDE_INT_MAX_PRECISION
eventually leading to memory corruption.
The following adjusts ao_ref_init_from_vn_reference to only lower
access sizes via the outermost COMPONENT_REF but otherwise honor
the access size as specified by the access type.
It also places an assert in integer type building that we remain
in the limits of WIDE_INT_MAX_PRECISION. I chose the shared code
where we set TYPE_MIN/MAX_VALUE because that will immediately
cross the wide_ints capacity otherwise.
2021-03-30 Richard Biener <rguenther@suse.de>
PR tree-optimization/99824
* stor-layout.c (set_min_and_max_values_for_integral_type):
Assert the precision is within the bounds of
WIDE_INT_MAX_PRECISION.
* tree-ssa-sccvn.c (ao_ref_init_from_vn_reference): Use
the outermost component ref only to lower the access size
and initialize that from the access type.
* gcc.dg/torture/pr99824.c: New testcase.
This PR is a regression caused by r8-5967, where we replaced
a call to aarch64_internal_mov_immediate in aarch64_add_offset
with a call to aarch64_force_temporary, which in turn uses the
normal emit_move_insn{,_1} routines.
The problem is that aarch64_add_offset can be called while
outputting a thunk, where we require all instructions to be
valid without splitting. However, the move expanders were
not splitting CONST_INT moves themselves.
I think the right fix is to make the move expanders work
even in this scenario, rather than require callers to handle
it as a special case.
gcc/
PR target/98136
* config/aarch64/aarch64.md (mov<mode>): Pass multi-instruction
CONST_INTs to aarch64_expand_mov_immediate when called after RA.
gcc/testsuite/
PR target/98136
* g++.dg/pr98136.C: New test.
Currently, SF->SI and DF->DI conversions on AArch64 with the "nosimd"
flag provided sometimes emit the vector variant of the
fcvtz[su] instruction (e.g. fcvtzu s0, s0).
This modifies the corresponding pattern to only select the vector
variant of the instruction when generating code with SIMD enabled.
gcc/ChangeLog:
* config/aarch64/aarch64.md
(<optab>_trunc<fcvt_target><GPI:mode>2): Set the "arch"
attribute to disambiguate between SIMD and FP variants of the
instruction.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/fcvt_nosimd.c: New test.
This is a regression present on the mainline: the compiler (front-end) fails
to assign an aggregate to a full-access component (i.e. Atomic or VFA) as a
whole if the type of the component is not full access itself.
gcc/ada/
PR ada/99802
* freeze.adb (Is_Full_Access_Aggregate): Call Is_Full_Access_Object
on the name of an N_Assignment_Statement to spot full access.
In the patch that I applied on March 2nd, I had code to provide support for
Decimal/_Float128 conversions if the user did not use at least GLIBC 2.32. It
did this by using __ibm128 as an intermediate type. The trouble is __ibm128
cannot represent all of the numbers that _Float128 can, and you lose if you do
this conversion.
This patch removes this support.  The dfp-bit.c functions now call the
__sprintfieee128 and __strtoieee128 functions to do the conversion. If the
user does not have GLIBC, they will get a linker error that these functions do
not exist.
The float128 support functions are only built into the static libgcc, so there
isn't an issue with having references to __strtoieee128 and __sprintfieee128
with older GLIBC libraries.
As an added bonus, this patch eliminates the __sprintfkf function which
included stdio.h to get a definition for the sprintf library function. This
allows for building cross compilers without having to have a target stdio.h
available.
libgcc/
2021-03-29 Michael Meissner <meissner@linux.ibm.com>
* config/rs6000/t-float128 (fp128_decstr_funcs): Delete.
(fp128_ppc_funcs): Do not add $(fp128_decstr_funcs).
(fp128_decstr_objs): Delete.
* dfp-bit.h: Call __sprintfieee128 to do conversions from
_Float128 to a Decimal type. Call __strtoieee128 to do
conversions from a Decimal type to _Float128.
* config/rs6000/_sprintfkf.c: Delete file.
* config/rs6000/_sprintfkf.h: Delete file.
* config/rs6000/_strtokf.c: Delete file.
* config/rs6000/_strtokf.h: Delete file.
aarch64 currently doesn't support declare simd where the return value and arguments
have different sizes and warns about that case. This change adds a dg-warning
for that case like various other tests have already.
2021-03-29 Jakub Jelinek <jakub@redhat.com>
PR fortran/93660
* gfortran.dg/gomp/declare-simd-coarray-lib.f90: Expect a mixed size
declare simd warning on aarch64.
The LLVM project renamed their default branch to 'main'.
libstdc++-v3/ChangeLog:
* doc/xml/manual/status_cxx2017.xml: Adjust link for PSTL.
* doc/html/manual/status.html: Regenerate.
As discussed in the PR, we currently have two different numbering
schemes for SVE builtins: one for C, and one for C++. This is
problematic for LTO, where we end up getting confused about which
intrinsic we're talking about. This patch inserts placeholders into the
registered_functions vector to ensure that there is a consistent
numbering scheme for both C and C++.
We use integer_zero_node as a placeholder node instead of building a
function decl. This is safe because the node is only returned by the
TARGET_BUILTIN_DECL hook, which (on AArch64) is only used for validation
when builtin decls are streamed into lto1.
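
The effect of the placeholders on numbering can be illustrated with a toy
model.  This is only a sketch and not the code in
aarch64-sve-builtins.cc: the candidate/entry structures, the helper
functions and the names "f", "f_s8", "f_s16" and "g" are all made up for
illustration.  It assumes each front end wants a different subset of the
candidate functions and that a function's code is its index in the vector.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical candidate: which front ends want a real decl for it.
struct candidate {
  std::string name;
  bool wanted_in_c;
  bool wanted_in_cxx;
};

// Hypothetical registered entry; its code is its index in the vector.
struct entry {
  std::string name;
  bool placeholder;
};

// Walk the candidates in a fixed order.  A candidate the current front
// end does not want either takes a placeholder slot or is skipped.
inline std::vector<entry> build_registry(const std::vector<candidate> &cands,
                                         bool cxx, bool use_placeholders) {
  std::vector<entry> registered;
  for (const candidate &c : cands) {
    const bool wanted = cxx ? c.wanted_in_cxx : c.wanted_in_c;
    if (wanted)
      registered.push_back({c.name, false});
    else if (use_placeholders)
      registered.push_back({c.name, true});
  }
  return registered;
}

// Return the code (index) of the real entry called NAME.
inline std::size_t code_of(const std::vector<entry> &registered,
                           const std::string &name) {
  for (std::size_t i = 0; i < registered.size(); ++i)
    if (!registered[i].placeholder && registered[i].name == name)
      return i;
  assert(false && "function not registered");
  return registered.size();
}

// Made-up data: one entry only C wants, two only C++ wants, one shared.
inline std::vector<candidate> sample_candidates() {
  return {{"f", true, false},
          {"f_s8", false, true},
          {"f_s16", false, true},
          {"g", true, true}};
}
```

Without placeholders the shared function "g" gets code 1 in the C
registry and code 2 in the C++ registry; with placeholders it gets code 3
in both, which is the property LTO streaming relies on.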
gcc/ChangeLog:
PR target/99216
* config/aarch64/aarch64-sve-builtins.cc
(function_builder::add_function): Add placeholder_p argument, use
placeholder decls if this is set.
(function_builder::add_unique_function): Instead of conditionally adding
direct overloads, unconditionally add either a direct overload or a
placeholder.
(function_builder::add_overloaded_function): Set placeholder_p if we're
using C++ overloads. Use the obstack for string storage instead
of relying on the tree nodes.
(function_builder::add_overloaded_functions): Don't return early for
m_direct_overloads: we need to add placeholders.
* config/aarch64/aarch64-sve-builtins.h
(function_builder::add_function): Add placeholder_p argument.
gcc/testsuite/ChangeLog:
PR target/99216
* g++.target/aarch64/sve/pr99216.C: New test.
This avoids asserting anything on the SLP_TREE_REPRESENTATIVE of
an SLP permute node (which shouldn't be there).
2021-03-29 Richard Biener <rguenther@suse.de>
PR tree-optimization/99807
* tree-vect-slp.c (vect_slp_analyze_node_operations_1): Move
assert below VEC_PERM handling.
* gfortran.dg/vect/pr99807.f90: New testcase.
This patch fixes the RTL representation of the move_lo_quad patterns to
use aarch64_simd_or_scalar_imm_zero for the zero part rather than a
vec_duplicate of zero or a const_int 0.  The expander that generates
them is also adjusted so that we use and match the correct const_vector
forms throughout.
Co-Authored-By: Jakub Jelinek <jakub@redhat.com>
gcc/ChangeLog:
PR target/99037
* config/aarch64/aarch64-simd.md (move_lo_quad_internal_<mode>): Use
aarch64_simd_or_scalar_imm_zero to match zeroes. Remove pattern
matching const_int 0.
(move_lo_quad_internal_be_<mode>): Likewise.
(move_lo_quad_<mode>): Update for the above.
* config/aarch64/iterators.md (VQ_2E): Delete.
gcc/testsuite/ChangeLog:
PR target/99808
* gcc.target/aarch64/pr99808.c: New test.
extract_muldiv{,_1} is apparently only prepared to handle scalar integer
operations, the callers ensure it by only calling it if the divisor or
one of the multiplicands is INTEGER_CST and because neither multiplication
nor division nor modulo are really supported e.g. for pointer types, nullptr
type etc. But the CASE_CONVERT handling doesn't really check if it isn't
a cast from some other type kind, so on the testcase we end up trying to
build MULT_EXPR in POINTER_TYPE which ICEs.  A few years ago Marek
added ANY_INTEGRAL_TYPE_P checks to two spots, but the code uses
TYPE_PRECISION which means something completely different for vector types,
etc.
So IMNSHO we should just punt on conversions from non-integrals or
non-scalar integrals.
2021-03-29 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/99777
* fold-const.c (extract_muldiv_1): For conversions, punt on casts from
types other than scalar integral types.
* g++.dg/torture/pr99777.C: New test.
GCC currently emits TLS relocation decorations on symbols in DWARF sections.
Recent changes to the AIX linker cause it to reject such symbols.
This patch removes the decorations (@ie, @le, @m) and emits only the
qualified symbol name.
gcc/ChangeLog:
* config/rs6000/rs6000.c (rs6000_output_dwarf_dtprel): Do not add
XCOFF TLS reloc decorations.
I'm seeing random scan-assembler-times failures in pr96770.c when LTO is used.
I suspect this is because the \\+4 string matches the LTO sections, sometimes.
This small patch avoids the issue, by matching arr\\+4 instead of \\+4.
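
The difference between the two patterns can be sketched as follows.  This
is an illustration only: DejaGnu uses Tcl regular expressions while the
sketch uses std::regex, the assembly text is made up, and the .ascii line
merely stands in for arbitrary bytes in an LTO section.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Count non-overlapping matches of PATTERN in TEXT.
inline std::size_t count_matches(const std::string &text,
                                 const std::string &pattern) {
  assert(!pattern.empty());
  const std::regex re(pattern);
  return static_cast<std::size_t>(
      std::distance(std::sregex_iterator(text.begin(), text.end(), re),
                    std::sregex_iterator()));
}

// Made-up compiler output: one real reference to arr+4 and one unrelated
// "+4" hidden in a data string.
inline std::string sample_output() {
  return "\tldr\tr3, .L3\n"
         ".L3:\n"
         "\t.word\tarr+4\n"
         "\t.ascii\t\"x+4y\"\n";
}
```

On this sample the loose pattern \+4 matches twice while arr\+4 matches
once, which is why anchoring the pattern on the symbol name makes the
count stable.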
2021-03-28 Christophe Lyon <christophe.lyon@linaro.org>
gcc/testsuite/
PR target/96770
* gcc.target/arm/pure-code/pr96770.c: Improve scan-assembler-times.
2021-03-28 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran/ChangeLog
PR fortran/99602
* trans-expr.c (gfc_conv_procedure_call): Use the _data attrs
for class expressions and detect proc pointer evaluations by
the non-null actual argument list.
gcc/testsuite/ChangeLog
PR fortran/99602
* gfortran.dg/pr99602.f90: New test.
* gfortran.dg/pr99602a.f90: New test.
* gfortran.dg/pr99602b.f90: New test.
* gfortran.dg/pr99602c.f90: New test.
* gfortran.dg/pr99602d.f90: New test.
Instead, tests are copied from the source tree (i.e: $srcdir/compilable)
into the test base directory ($base_dir/compilable). A dejagnu test
file with all translated test directives is created in a path that
follows DejaGnu naming conventions ($base_dir/gdc.test/compilable),
which is then passed to `dg-test'.
Before invoking the compiler, the gdc.test prefix is trimmed from the
test program in `gdc-dg-test' so that all copied test files are picked
up with the correct path names.
gcc/testsuite/ChangeLog:
* lib/gdc-utils.exp (gdc-copy-extra): Rename to...
(gdc-copy-file): ... this. Use file copy instead of open/close.
(gdc-convert-test): Save translated dejagnu test to gdc.test
directory, only write dejagnu directives to the test file.
(gdc-do-test): Don't create gdc.test symlink.
The underlying base type for enumerals is always present in TREE_TYPE.
gcc/d/ChangeLog:
* d-lang.cc (d_enum_underlying_base_type): New function.
(LANG_HOOKS_ENUM_UNDERLYING_BASE_TYPE): Set as
d_enum_underlying_base_type.
This means the correct config headers are included when building the
D front-end in a Canadian cross configuration.
gcc/d/ChangeLog:
* Make-lang.in (DMDGEN_COMPILE): Remove.
(d/%.dmdgen.o): Use COMPILER_FOR_BUILD and BUILD_COMPILERFLAGS to
build all D generator programs.
(D_SYSTEM_H): New macro.
(d/idgen.dmdgen.o): Add dependencies to build.
(d/impcnvgen.dmdgen.o): Likewise.
* d-system.h: Include bconfig.h if GENERATOR_FILE is defined.
The static constructor/destructor list only ever has one function to
call in it, so mark the gdc.dso_ctor and gdc.dso_dtor functions as
static ctor/dtor directly instead.
gcc/d/ChangeLog:
* config-lang.in (gtfiles): Remove modules.cc.
* modules.cc (struct module_info): Remove GTY marker.
(static_ctor_list): Remove variable.
(static_dtor_list): Remove variable.
(register_moduleinfo): Directly set DECL_STATIC_CONSTRUCTOR on
dso_ctor, and DECL_STATIC_DESTRUCTOR on dso_dtor.
(d_finish_compilation): Remove static ctor/dtor handling.
gcc/testsuite/ChangeLog:
* gdc.dg/gdc270a.d: Removed.
* gdc.dg/gdc270b.d: Removed.
The AIX power alignment rules apply the natural alignment of the
"first member" if it is of a floating-point data type (or is an aggregate
whose recursively "first" member or element is such a type). The alignment
associated with these types for subsequent members use an alignment value
where the floating-point data type is considered to have 4-byte alignment.
GCC had been stripping array type but had not recursively looked
within structs and unions. This also applies to classes and
subclasses and, therefore, becomes more prominent with C++.
For example,
struct A {
double x[2];
int y;
};
struct B {
int i;
struct A a;
};
struct A has double-word alignment for the bare type, but
word alignment and offset within struct B despite the alignment of
struct A. If struct A were the first member of struct B, struct B
would have double-word alignment. One must search for the innermost
first member to increase the alignment if double and then search for
the innermost first member to reduce the alignment if the TYPE had
double-word alignment solely because the innermost first member was
double.
This patch recursively looks through the first member to apply the
double-word alignment to the struct / union as a whole and to apply
the word alignment to the struct or union as a member within a struct
or union.
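
The two rules can be modelled roughly as below.  This is a simplified
sketch of the rules as described above, not the rs6000.c implementation:
the type structure and the helpers member_align, type_align and
innermost_first_is_double are invented for the illustration, alignments
are in bytes, and sizes, packing and user-specified alignment are ignored.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical, minimal type model.
struct type {
  enum kind_t { scalar, array, record } kind;
  unsigned natural_align;     // bytes; scalars only
  bool is_double;             // scalars only
  std::vector<type> members;  // array: the element type; record: the fields
};

inline type make_scalar(unsigned align, bool is_double = false) {
  return type{type::scalar, align, is_double, {}};
}
inline type make_array(const type &elem) {
  return type{type::array, 0, false, {elem}};
}
inline type make_record(std::vector<type> fields) {
  return type{type::record, 0, false, std::move(fields)};
}

// Look through arrays and first members for a double.
inline bool innermost_first_is_double(const type &t) {
  switch (t.kind) {
  case type::scalar:
    return t.is_double;
  case type::array:
    assert(!t.members.empty());
    return innermost_first_is_double(t.members.front());
  case type::record:
    return !t.members.empty()
           && innermost_first_is_double(t.members.front());
  }
  return false;
}

// Alignment of T when used as a member: double counts as 4 bytes.
inline unsigned member_align(const type &t) {
  switch (t.kind) {
  case type::scalar:
    return t.is_double ? 4u : t.natural_align;
  case type::array:
    assert(!t.members.empty());
    return member_align(t.members.front());
  case type::record: {
    unsigned a = 1;
    for (const type &m : t.members)
      a = std::max(a, member_align(m));
    return a;
  }
  }
  return 1;
}

// Alignment of T as a whole: raised to 8 if the innermost first member
// is a double.
inline unsigned type_align(const type &t) {
  const unsigned a = member_align(t);
  return innermost_first_is_double(t) ? std::max(a, 8u) : a;
}
```

With struct A and struct B from the example above, this model gives
struct A an alignment of 8 on its own and 4 as a member of struct B,
gives struct B an alignment of 4, and gives a variant of struct B with
struct A as its first member an alignment of 8.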
This is an ABI change for GCC on AIX, but GCC on AIX had not correctly
implemented the AIX ABI and had not been compatible with the IBM XL
compiler.
Bootstrapped on powerpc-ibm-aix7.2.3.0.
gcc/ChangeLog:
* config/rs6000/aix.h (ADJUST_FIELD_ALIGN): Call function.
* config/rs6000/rs6000-protos.h (rs6000_special_adjust_field_align):
Declare.
* config/rs6000/rs6000.c (rs6000_special_adjust_field_align): New.
(rs6000_special_round_type_align): Recursively check innermost first
field.
gcc/testsuite/ChangeLog:
* gcc.target/powerpc/pr99557.c: New.
On the testcase in the PR with
-fno-tree-sink -O3 -fPIC -fomit-frame-pointer -fno-strict-aliasing -mstackrealign
we have prologue:
0000000000000000 <_func_with_dwarf_issue_>:
0: 4c 8d 54 24 08 lea 0x8(%rsp),%r10
5: 48 83 e4 f0 and $0xfffffffffffffff0,%rsp
9: 41 ff 72 f8 pushq -0x8(%r10)
d: 55 push %rbp
e: 48 89 e5 mov %rsp,%rbp
11: 41 57 push %r15
13: 41 56 push %r14
15: 41 55 push %r13
17: 41 54 push %r12
19: 41 52 push %r10
1b: 53 push %rbx
1c: 48 83 ec 20 sub $0x20,%rsp
and emit
00000000 0000000000000014 00000000 CIE
Version: 1
Augmentation: "zR"
Code alignment factor: 1
Data alignment factor: -8
Return address column: 16
Augmentation data: 1b
DW_CFA_def_cfa: r7 (rsp) ofs 8
DW_CFA_offset: r16 (rip) at cfa-8
DW_CFA_nop
DW_CFA_nop
00000018 0000000000000044 0000001c FDE cie=00000000 pc=0000000000000000..00000000000001d5
DW_CFA_advance_loc: 5 to 0000000000000005
DW_CFA_def_cfa: r10 (r10) ofs 0
DW_CFA_advance_loc: 9 to 000000000000000e
DW_CFA_expression: r6 (rbp) (DW_OP_breg6 (rbp): 0)
DW_CFA_advance_loc: 13 to 000000000000001b
DW_CFA_def_cfa_expression (DW_OP_breg6 (rbp): -40; DW_OP_deref)
DW_CFA_expression: r15 (r15) (DW_OP_breg6 (rbp): -8)
DW_CFA_expression: r14 (r14) (DW_OP_breg6 (rbp): -16)
DW_CFA_expression: r13 (r13) (DW_OP_breg6 (rbp): -24)
DW_CFA_expression: r12 (r12) (DW_OP_breg6 (rbp): -32)
...
unwind info for that. The problem is when async signal
(or stepping through in the debugger) stops after the pushq %rbp
instruction and before movq %rsp, %rbp, the unwind info says that
caller's %rbp is saved there at *%rbp, but that is not true, caller's
%rbp is either still available in the %rbp register, or in *%rsp,
only after executing the next instruction - movq %rsp, %rbp - the
location for %rbp is correct. So, either we'd need to temporarily
say:
DW_CFA_advance_loc: 9 to 000000000000000e
DW_CFA_expression: r6 (rbp) (DW_OP_breg7 (rsp): 0)
DW_CFA_advance_loc: 3 to 0000000000000011
DW_CFA_expression: r6 (rbp) (DW_OP_breg6 (rbp): 0)
DW_CFA_advance_loc: 10 to 000000000000001b
or to me it seems more compact to just say:
DW_CFA_advance_loc: 12 to 0000000000000011
DW_CFA_expression: r6 (rbp) (DW_OP_breg6 (rbp): 0)
DW_CFA_advance_loc: 10 to 000000000000001b
I've tried instead to deal with it through REG_FRAME_RELATED_EXPR
from the backend, but that failed miserably as explained in the PR,
dwarf2cfi.c has some rules (Rule 16 to Rule 19) that are specific to the
dynamic stack realignment using drap register that only the i386 backend
does right now, and by using REG_FRAME_RELATED_EXPR or REG_CFA* notes we
can't emulate those rules. The following patch instead does the deferring
of the hard frame pointer save rule in dwarf2cfi.c Rule 18 handling and
emits it on the (set hfp sp) assignment that must appear shortly after it
and adds assertion that it is the case.
The difference before/after the patch on the assembly is:
--- pr99334.s~ 2021-03-26 15:42:40.881749380 +0100
+++ pr99334.s 2021-03-26 17:38:05.729161910 +0100
@@ -11,8 +11,8 @@ _func_with_dwarf_issue_:
andq $-16, %rsp
pushq -8(%r10)
pushq %rbp
- .cfi_escape 0x10,0x6,0x2,0x76,0
movq %rsp, %rbp
+ .cfi_escape 0x10,0x6,0x2,0x76,0
pushq %r15
pushq %r14
pushq %r13
i.e. does just what we IMHO need, after pushq %rbp %rbp
still contains parent's frame value and so the save rule doesn't
need to be overridden there, ditto at the start of the next insn
before the side-effect took effect, and we override it only after
it when %rbp already has the right value.
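
For reference, the bytes in the .cfi_escape directive above can be decoded
by hand.  The sketch below is a deliberately narrow decoder written for
this illustration (the function name is made up): it only handles a
DW_CFA_expression whose block is a single DW_OP_breg<N> with a one-byte
SLEB128 offset, and it relies on the DWARF encodings
DW_CFA_expression = 0x10 and DW_OP_breg0 = 0x70.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Decode "DW_CFA_expression reg, len, DW_OP_breg<N> offset" when every
// field fits in a single byte.  Anything else is reported as unsupported.
inline std::string decode_cfa_expression(const std::vector<std::uint8_t> &b) {
  const std::uint8_t dw_cfa_expression = 0x10;
  const std::uint8_t dw_op_breg0 = 0x70;

  if (b.size() != 5 || b[0] != dw_cfa_expression)
    return "unsupported";
  if ((b[1] & 0x80) || b[2] != 2 || (b[4] & 0x80))
    return "unsupported";  // multi-byte LEB128 or unexpected block length
  if (b[3] < dw_op_breg0 || b[3] > dw_op_breg0 + 31)
    return "unsupported";

  const unsigned reg = b[1];                 // one-byte ULEB128
  const unsigned base = b[3] - dw_op_breg0;  // register used by DW_OP_breg
  assert(base <= 31);

  int offset = b[4] & 0x7f;                  // one-byte SLEB128
  if (b[4] & 0x40)
    offset -= 128;                           // sign-extend from bit 6

  return "DW_CFA_expression: r" + std::to_string(reg) + " (DW_OP_breg"
         + std::to_string(base) + ": " + std::to_string(offset) + ")";
}
```

Feeding it 0x10,0x6,0x2,0x76,0 yields the rule for r6 (%rbp) expressed
relative to DW_OP_breg6 with offset 0, i.e. the same rule shown in the
unwind info dumps above.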
If some other target adds dynamic stack realignment in the future and
the offset 0 case wouldn't be true there, the code can be adjusted so that
it works on all the drap architectures; I'm pretty sure the code would
need other adjustments too.
For the rule 18 and for the (set hfp sp) after it we already have asserts
for the drap cases that check whether the code looks the way i?86/x86_64
emit it currently.
2021-03-26 Jakub Jelinek <jakub@redhat.com>
PR debug/99334
* dwarf2out.h (struct dw_fde_node): Add rule18 member.
* dwarf2cfi.c (dwarf2out_frame_debug_expr): When handling (set hfp sp)
assignment with drap_reg active, queue reg save for hfp with offset 0
and flush queued reg saves. When handling a push with rule18,
defer queueing reg save for hfp and just assert the offset is 0.
(scan_trace): Assert that fde->rule18 is false.
NSDMIs are a C++11 thing, and here we ICE with them on the non-C++11
path. Fortunately all we need is a small tweak to my recent r11-7835
patch.
gcc/cp/ChangeLog:
PR c++/98352
* method.c (implicitly_declare_fn): Pass &raises to
synthesized_method_walk.
gcc/testsuite/ChangeLog:
PR c++/98352
* g++.dg/cpp0x/inh-ctor37.C: Remove dg-error.
* g++.dg/cpp0x/nsdmi17.C: New test.
This makes std::random_device usable on VxWorks when running on older
x86 hardware. Since the r10-728 fix for PR libstdc++/85494 the library
will use the new code unconditionally on x86, but the cpuid checks for
RDSEED and RDRAND can fail at runtime, depending on the hardware where
the code is executing. If the OS does not provide /dev/urandom then this
means the std::random_device constructor always fails. In previous
releases, if /dev/urandom was unavailable then std::mt19937 was used
unconditionally.
This patch adds a fallback for the case where the runtime cpuid checks
for x86 hardware instructions fail, and no /dev/urandom is available.
When this happens a std::linear_congruential_engine object will be used,
with a seed based on hashing the engine's address and the current time.
Distinct std::random_device objects will use different seeds, unless an
object is created and destroyed and a new object created at the same
memory location within the clock tick. This is not great, but is better
than always throwing from the constructor, and better than always using
std::mt19937 with the same seed (as GCC 9 and earlier do).
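
The seeding idea can be sketched as follows.  This is not the code in
random.cc: the class lcg_fallback, the helper mix_seed, the mixing
constant and the choice of std::steady_clock are all assumptions made for
the illustration.  std::minstd_rand is used because it is a
std::linear_congruential_engine specialization.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <random>

// Hypothetical mixing function.  Multiplying by an odd constant is a
// bijection modulo 2^64, so two different addresses give two different
// results for the same tick count.
inline std::uint64_t mix_seed(std::uintptr_t addr, std::uint64_t ticks) {
  return (static_cast<std::uint64_t>(addr) * 0x9E3779B97F4A7C15ull) ^ ticks;
}

// Pseudo-random fallback: an LCG seeded from its own address and the
// current time.  Not suitable for anything security related.
class lcg_fallback {
public:
  lcg_fallback() {
    const auto now =
        std::chrono::steady_clock::now().time_since_epoch().count();
    const std::uint64_t seed =
        mix_seed(reinterpret_cast<std::uintptr_t>(&engine_),
                 static_cast<std::uint64_t>(now));
    // The engine reduces the seed modulo its modulus, so distinct 64-bit
    // seeds are not guaranteed to give distinct engine states.
    engine_.seed(static_cast<std::minstd_rand::result_type>(seed));
  }

  unsigned int operator()() {
    const auto v = engine_();
    assert(v >= std::minstd_rand::min() && v <= std::minstd_rand::max());
    return static_cast<unsigned int>(v);
  }

private:
  std::minstd_rand engine_;
};
```

Two objects alive at the same time have different addresses and so mix to
different seeds; an object destroyed and recreated at the same address
within one clock tick gets the same seed, which is the limitation noted
above.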
libstdc++-v3/ChangeLog:
* src/c++11/random.cc (USE_LCG): Define when a pseudo-random
fallback is needed.
[USE_LCG] (bad_seed, construct_lcg_at, destroy_lcg_at, __lcg):
New helper functions and callback.
(random_device::_M_init): Add 'prng' and 'all' enumerators.
Replace switch with fallthrough with a series of 'if' statements.
[USE_LCG]: Construct an lcg_type engine and use __lcg when cpuid
checks fail.
(random_device::_M_init_pretr1) [USE_MT19937]: Accept "prng"
token.
(random_device::_M_getval): Check for callback unconditionally
and always pass _M_file pointer.
* testsuite/26_numerics/random/random_device/85494.cc: Remove
effective-target check. Use new random_device_available helper.
* testsuite/26_numerics/random/random_device/94087.cc: Likewise.
* testsuite/26_numerics/random/random_device/cons/default-cow.cc:
Remove effective-target check.
* testsuite/26_numerics/random/random_device/cons/default.cc:
Likewise.
* testsuite/26_numerics/random/random_device/cons/token.cc: Use
new random_device_available helper. Test "prng" token.
* testsuite/util/testsuite_random.h (random_device_available):
New helper function.
During development of modules, I had difficulty deciding whether the
module flags of a template should live on the decl_template_result,
the template_decl, or both. I chose the latter, and require them to
be consistent. This and a few other defects show how hard that
consistency is. Hence this patch move to holding the flags on the
template-decl-result decl. That's the entity various bits of the
parser have at the appropriate time.  One needs STRIP_TEMPLATE in a
bunch of places, which this patch adds. Also a check that we never
give a TEMPLATE_DECL to the module flag accessors.
This left a problem with how I was handling template aliases. These
were in two parts -- separating the TEMPLATE_DECL from the TYPE_DECL.
That seemed somewhat funky, but development showed it necessary. Of
course, that causes problems if the TEMPLATE_DECL cannot contain 'am
imported' information. Investigating now shows that we do not need to
treat them separately. By reverting a bit of template instantiation
machinery that caused the problem, we're back on course. I think what
has happened is that between then and now, other typedef fixes have
corrected the underlying problem this separation was working around.
It allows a bunch of cleanup in the decl streamer, as we no longer
have to handle a null TEMPLATE_DECL_RESULT.
PR c++/99283
gcc/cp/
* cp-tree.h (DECL_MODULE_CHECK): Ban TEMPLATE_DECL.
(SET_TYPE_TEMPLATE_INFO): Restore Alias template setting.
* decl.c (duplicate_decls): Remove template_decl module flag
propagation.
* module.cc (merge_kind_name): Add alias tmpl spec as a thing.
(dumper::impl::nested_name): Adjust for template-decl module flag
change.
(trees_in::assert_definition): Likewise.
(trees_in::install_entity): Likewise.
(trees_out::decl_value): Likewise. Remove alias template
separation of template and type_decl.
(trees_in::decl_value): Likewise.
(trees_out::key_mergeable): Likewise.
(trees_in::key_mergeable): Likewise.
(trees_out::decl_node): Adjust for template-decl module flag
change.
(depset::hash::make_dependency): Likewise.
(get_originating_module, module_may_redeclare): Likewise.
(set_instantiating_module, set_defining_module): Likewise.
* name-lookup.c (name_lookup::search_adl): Likewise.
(do_pushdecl): Likewise.
* pt.c (build_template_decl): Likewise.
(lookup_template_class_1): Remove special alias_template handling
of DECL_TI_TEMPLATE.
(tsubst_template_decl): Likewise.
gcc/testsuite/
* g++.dg/modules/pr99283-2_a.H: New.
* g++.dg/modules/pr99283-2_b.H: New.
* g++.dg/modules/pr99283-2_c.H: New.
* g++.dg/modules/pr99283-3_a.H: New.
* g++.dg/modules/pr99283-3_b.H: New.
* g++.dg/modules/pr99283-4.H: New.
* g++.dg/modules/tpl-alias-1_a.H: Adjust scans.
* g++.dg/modules/tpl-alias-1_b.C: Adjust scans.
Relaxed memory should be considered more like memory than special memory.
gcc/ChangeLog:
PR target/99766
* ira-costs.c (record_reg_classes): Put case with
CT_RELAXED_MEMORY adjacent to one with CT_MEMORY.
* ira.c (ira_setup_alts): Ditto.
* lra-constraints.c (process_alt_operands): Ditto.
* recog.c (asm_operand_ok): Ditto.
* reload.c (find_reloads): Ditto.
gcc/testsuite/ChangeLog:
PR target/99766
* g++.target/aarch64/sve/pr99766.C: New.
Most postincrements are cheap on Neoverse V1, but it's
generally better to avoid them on LD[34] and ST[34] instructions.
This patch adds separate address costs fields for these cases.
Other CPUs continue to use the same costs for all postincrements.
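
The selection can be pictured with the sketch below.  The field names
post_modify, post_modify_ld3_st3 and post_modify_ld4_st4 follow the
ChangeLog entry; everything else (the trimmed-down table, the access_mode
enum standing in for CImode and XImode, and the cost values) is
illustrative only and not taken from aarch64.c.

```cpp
#include <cassert>

// Trimmed-down, hypothetical mirror of the address cost table.
struct addrcost_table {
  int post_modify;
  int post_modify_ld3_st3;
  int post_modify_ld4_st4;
};

// Stand-ins for the machine modes: ci ~ CImode (three-vector tuples),
// xi ~ XImode (four-vector tuples), other ~ everything else.
enum class access_mode { other, ci, xi };

// Pick the post-increment cost that applies to MODE.
inline int post_modify_cost(const addrcost_table &t, access_mode mode) {
  // Costs are assumed to be non-negative in this sketch.
  assert(t.post_modify >= 0 && t.post_modify_ld3_st3 >= 0
         && t.post_modify_ld4_st4 >= 0);
  switch (mode) {
  case access_mode::ci:
    return t.post_modify_ld3_st3;
  case access_mode::xi:
    return t.post_modify_ld4_st4;
  case access_mode::other:
    break;
  }
  return t.post_modify;
}

// Illustrative values only: a CPU with uniform costs and a CPU that
// penalizes post-increment on LD[34]/ST[34].
constexpr addrcost_table uniform_table = {0, 0, 0};
constexpr addrcost_table penalizing_table = {0, 3, 3};
```

A table that repeats its post_modify cost in the two new fields behaves
exactly as before, which is how the other CPUs keep their existing
costs.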
gcc/
* config/aarch64/aarch64-protos.h
(cpu_addrcost_table::post_modify_ld3_st3): New member variable.
(cpu_addrcost_table::post_modify_ld4_st4): Likewise.
* config/aarch64/aarch64.c (generic_addrcost_table): Update
accordingly, using the same costs as for post_modify.
(exynosm1_addrcost_table, xgene1_addrcost_table): Likewise.
(thunderx2t99_addrcost_table, thunderx3t110_addrcost_table):
(tsv110_addrcost_table, qdf24xx_addrcost_table): Likewise.
(a64fx_addrcost_table): Likewise.
(neoversev1_addrcost_table): New.
(neoversev1_tunings): Use neoversev1_addrcost_table.
(aarch64_address_cost): Use the new post_modify costs for CImode
and XImode.
When SVE is enabled, GCC needs to do a three-way comparison
between scalar, Advanced SIMD and SVE code. The normal costs
tend to be latency-based, which is well-suited to SLP. However,
comparing sums of latency costs means that we effectively treat
the code as executing sequentially. This can hide the effect of
pipeline bubbles or resource contention that in practice are quite
important for loop vectorisation. This is particularly true for
loops that involve reductions.
This patch therefore tries to estimate how quickly each piece
of code could issue, using a very (very) simplistic model.
It then uses this to adjust the loop vector costs up or down as
appropriate. Part of the Advanced SIMD vs. SVE adjustment is
opt-in and is not enabled by default even for use_new_vector_costs.
Like with the previous patches, this one only becomes active if
a CPU selects use_new_vector_costs. It should therefore have
a very low impact on other CPUs. The code also mostly ignores
CPUs that have no issue information, even if use_new_vector_costs
is enabled for some reason.
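
As a rough picture of what an issue-rate estimate looks like, consider the
sketch below.  It is not the model implemented by the patch: the
structures, the three operation classes and the comparison helper are
assumptions chosen to keep the example small.  The idea is only that the
minimum number of cycles per iteration is bounded by the most contended
issue resource, rather than by a sum of latencies.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical per-iteration operation counts.
struct op_count {
  double loads;
  double stores;
  double general_ops;
};

// Hypothetical issue limits, in operations per cycle.
struct issue_info {
  double loads_per_cycle;
  double stores_per_cycle;
  double general_ops_per_cycle;
};

// Lower bound on cycles per iteration: the slowest resource wins.
inline double min_cycles_per_iter(const op_count &ops,
                                  const issue_info &info) {
  assert(info.loads_per_cycle > 0 && info.stores_per_cycle > 0
         && info.general_ops_per_cycle > 0);
  return std::max({ops.loads / info.loads_per_cycle,
                   ops.stores / info.stores_per_cycle,
                   ops.general_ops / info.general_ops_per_cycle});
}

// Compare a vector loop body against the scalar one.  VF is the number
// of scalar iterations covered by one vector iteration.
inline bool vector_issues_faster(const op_count &scalar_ops,
                                 const issue_info &scalar_info,
                                 const op_count &vector_ops,
                                 const issue_info &vector_info, double vf) {
  assert(vf > 0);
  return min_cycles_per_iter(vector_ops, vector_info) / vf
         < min_cycles_per_iter(scalar_ops, scalar_info);
}
```

In this toy model a vector body that needs 3 cycles per iteration beats a
scalar body that needs 1 cycle per iteration when it covers 4 scalar
iterations, but not when it covers only 2.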
gcc/
* config/aarch64/aarch64.opt
(-param=aarch64-loop-vect-issue-rate-niters=): New parameter.
* doc/invoke.texi: Document it.
* config/aarch64/aarch64-protos.h (aarch64_base_vec_issue_info)
(aarch64_scalar_vec_issue_info, aarch64_simd_vec_issue_info)
(aarch64_advsimd_vec_issue_info, aarch64_sve_vec_issue_info)
(aarch64_vec_issue_info): New structures.
(cpu_vector_cost): Write comments above the variables rather
than to the side.
(cpu_vector_cost::issue_info): New member variable.
* config/aarch64/aarch64.c: Include gimple-pretty-print.h
and tree-ssa-loop-niter.h.
(generic_vector_cost, a64fx_vector_cost, qdf24xx_vector_cost)
(thunderx_vector_cost, tsv110_vector_cost, cortexa57_vector_cost)
(exynosm1_vector_cost, xgene1_vector_cost, thunderx2t99_vector_cost)
(thunderx3t110_vector_cost): Initialize issue_info to null.
(neoversev1_scalar_issue_info, neoversev1_advsimd_issue_info)
(neoversev1_sve_issue_info, neoversev1_vec_issue_info): New structures.
(neoversev1_vector_cost): Use them.
(aarch64_vec_op_count, aarch64_sve_op_count): New structures.
(aarch64_vector_costs::saw_sve_only_op): New member variable.
(aarch64_vector_costs::num_vector_iterations): Likewise.
(aarch64_vector_costs::scalar_ops): Likewise.
(aarch64_vector_costs::advsimd_ops): Likewise.
(aarch64_vector_costs::sve_ops): Likewise.
(aarch64_vector_costs::seen_loads): Likewise.
(aarch64_simd_vec_costs_for_flags): New function.
(aarch64_analyze_loop_vinfo): Initialize num_vector_iterations.
Count the number of predicate operations required by SVE WHILE
instructions.
(aarch64_comparison_type, aarch64_multiply_add_p): New functions.
(aarch64_sve_only_stmt_p, aarch64_in_loop_reduction_latency): Likewise.
(aarch64_count_ops): Likewise.
(aarch64_add_stmt_cost): Record whether we see an SVE operation
that cannot currently be implemented using Advanced SIMD.
Record issue information about the scalar, Advanced SIMD
and (where relevant) SVE versions of a loop.
(aarch64_vec_op_count::dump): New function.
(aarch64_sve_op_count::dump): Likewise.
(aarch64_estimate_min_cycles_per_iter): Likewise.
(aarch64_adjust_body_cost): If issue information is available,
try to compare the issue rates of the various loop implementations
and increase or decrease the vector body cost accordingly.
In practice it seems to be better not to cost a vector induction.
The scalar code generally needs the same induction but doesn't
cost it, making an apples-for-apples comparison harder. Most
inductions also have a low latency and their cost usually gets
hidden by other operations.
Like with the previous patches, this one only becomes active if
a CPU selects use_new_vector_costs. It should therefore have
a very low impact on other CPUs.
gcc/
* config/aarch64/aarch64.c (aarch64_detect_vector_stmt_subtype):
Assume a zero cost for induction phis.
So far the costing of COND_EXPRs hasn't distinguished between
cases in which the condition is calculated separately or is
built into the COND_EXPR itself. This patch adds the cost
of any embedded comparison.
Like with the previous patches, this one only becomes active if
a CPU selects use_new_vector_costs. It should therefore have
a very low impact on other CPUs.
gcc/
* config/aarch64/aarch64.c (aarch64_embedded_comparison_type): New
function.
(aarch64_adjust_stmt_cost): Add the costs of embedded scalar and
vector comparisons.