8sa1-gcc

Author	SHA1	Message	Date
Iain Sandoe	02f305440f	Darwin : Fix build failure for powerpc-darwin8 [PR99661]. A hunk had been missed from r11-6417, fixed thus: gcc/ChangeLog: PR target/99661 * config.gcc (powerpc-*-darwin8): Delete the reference to the now removed darwin8.h.	2021-03-19 16:40:11 +00:00
Olivier Hainque	eadb118e36	target/99660 - missing VX_CPU_PREFIX for vxworksae This fixes an oversight which causes make all-gcc to fail for --target=vxworksae or vxworksmils, a regression introduced by the recent VxWorks7 related updates. Both AE and MILS variants resort to a common config/vxworksae.h, which misses a definition of VX_CPU_PREFIX expected by port specific headers. The change just provides the missing definition. 2021-03-19 Olivier Hainque <hainque@adacore.com> gcc/ PR target/99660 config/vxworksae.h (VX_CPU_PREFIX): Define.	2021-03-19 16:16:39 +00:00
John David Anglin	22d1a90a15	Use memcpy instead of strncpy to avoid error with -Werror=stringop-truncation. gcc/ChangeLog: * config/pa/pa.c (import_milli): Use memcpy instead of strncpy.	2021-03-19 15:57:06 +00:00
Tamar Christina	c3a2bc6daa	slp: remove unneeded permute calculation (PR99656) The attached testcase ICEs because as you showed on the PR we have one child which is an internal with a PERM of EVENEVEN and one with TOP. The problem is while we can conceptually merge the permute itself into EVENEVEN, merging the lanes doesn't really make sense. That said, we no longer even require the merged lanes as we create the permutes based on the KIND directly. This patch just removes all of that code. Unfortunately it still won't vectorize with the cost model enabled due to the blend that's created combining the load and the external note: node 0x51f2ce8 (max_nunits=1, refcnt=1) note: op: VEC_PERM_EXPR note: { } note: lane permutation { 0[0] 1[1] } note: children 0x51f23e0 0x51f2578 note: node 0x51f23e0 (max_nunits=2, refcnt=1) note: op template: _16 = REALPART_EXPR <t1_9(D)>; note: stmt 0 _16 = REALPART_EXPR <t1_9(D)>; note: stmt 1 _16 = REALPART_EXPR <t1_9(D)>; note: load permutation { 0 0 } note: node (external) 0x51f2578 (max_nunits=1, refcnt=1) note: { _18, _18 } which costs the cost for the load-and-split and the cost of the external splat, and the one for blending them while in reality it's just a scalar load and insert. The compiler (with the cost model disabled) generates ldr q1, [x19] dup v1.2d, v1.d[0] ldr d0, [x19, 8] fneg d0, d0 ins v1.d[1], v0.d[0] while really it should be ldp d1, d0, [x19] fneg d0, d0 ins v1.d[1], v0.d[0] but that's for another time. gcc/ChangeLog: PR tree-optimization/99656 tree-vect-slp-patterns.c (linear_loads_p, complex_add_pattern::matches, is_eq_or_top, vect_validate_multiplication, complex_mul_pattern::matches, complex_fms_pattern::matches): Remove complex_perm_kinds_t. * tree-vectorizer.h: (complex_load_perm_t): Removed. (slp_tree_to_load_perm_map_t): Use complex_perm_kinds_t instead of complex_load_perm_t. gcc/testsuite/ChangeLog: PR tree-optimization/99656 * gfortran.dg/vect/pr99656.f90: New test.	2021-03-19 14:29:36 +00:00
H.J. Lu	5e2eabe1ee	x86: Issue error for return/argument only with function body If we never generate function body, we shouldn't issue errors for return nor argument. Add silent_p to i386 machine_function to avoid issuing errors for return and argument without function body. gcc/ PR target/99652 * config/i386/i386-options.c (ix86_init_machine_status): Set silent_p to true. * config/i386/i386.c (init_cumulative_args): Set silent_p to false. (construct_container): Return early for return and argument errors if silent_p is true. * config/i386/i386.h (machine_function): Add silent_p. gcc/testsuite/ PR target/99652 * gcc.dg/torture/pr99652-1.c: New test. * gcc.dg/torture/pr99652-2.c: Likewise. * gcc.target/i386/pr57655.c: Adjusted. * gcc.target/i386/pr59794-6.c: Likewise. * gcc.target/i386/pr70738-1.c: Likewise. * gcc.target/i386/pr96744-1.c: Likewise.	2021-03-19 06:39:51 -07:00
David Malcolm	21d09cb732	analyzer: mark epath_finder with DISABLE_COPY_AND_ASSIGN [PR99614] cppcheck warns that class epath_finder does dynamic memory allocation, but is missing a copy constructor and operator=. This class isn't meant to be copied or assigned, so mark it with DISABLE_COPY_AND_ASSIGN. gcc/analyzer/ChangeLog: PR analyzer/99614 * diagnostic-manager.cc (class epath_finder): Add DISABLE_COPY_AND_ASSIGN.	2021-03-19 09:01:57 -04:00
Jakub Jelinek	009528d61c	arm: Fix mve_vshlq* [PR99593] As mentioned in the PR, before the r11-6708-gbfab355012ca0f5219da8beb04f2fdaf757d34b7 change v[al]shr<mode>3 expanders were expanding the shifts by register to gen_ashl<mode>3_{,un}signed which don't support immediate CONST_VECTOR shift amounts, but now expand to mve_vshlq_<supf><mode> which does. The testcase ICEs, because the constraint doesn't match the predicate and because LRA works solely with the constraints, so it can e.g. from REG_EQUAL propagate there a CONST_VECTOR which matches the constraint but fails the predicate and only later on other passes will notice the predicate fails and ICE. Fixed by adding a constraint that matches the immediate part of the predicate. PR target/99593 * config/arm/constraints.md (Ds): New constraint. * config/arm/vec-common.md (mve_vshlq_<supf><mode>): Use w,Ds constraint instead of w,Dm. * g++.target/arm/pr99593.C: New test.	2021-03-19 13:48:44 +01:00
Andrew Stubbs	5cded5aff7	amdgcn: Typo fix gcc/ChangeLog: * config/gcn/gcn.c (gcn_parse_amdgpu_hsa_kernel_attribute): Fix quotes in error message.	2021-03-19 10:51:43 +00:00
Matthias Klose	3b0155305e	substitute @tie{} with a space for the man pages contrib/ 2021-03-19 Matthias Klose <doko@ubuntu.com> * texi2pod.pl: Substitute @tie{} with a space for the man pages.	2021-03-19 10:03:02 +00:00
Eric Botcazou	af73a8b202	Require linker plugin for another LTO test If it is not present, fat LTO is generated with an additional warning. gcc/testsuite/ * g++.dg/lto/pr89335_0.C: Require the linker plugin.	2021-03-19 09:25:23 +01:00
Eric Botcazou	b980edba50	Fix segfault during encoding of CONSTRUCTORs The segfault occurs in native_encode_initializer when it is encoding the CONSTRUCTOR for an array whose lower bound is negative (it's OK in Ada). The computation of the current position is done in HOST_WIDE_INT and this does not work for arrays whose original range has a negative lower bound and a positive upper bound; the computation must be done in sizetype instead so that it may wrap around. gcc/ PR middle-end/99641 * fold-const.c (native_encode_initializer) <CONSTRUCTOR>: For an array type, do the computation of the current position in sizetype.	2021-03-19 09:25:23 +01:00
GCC Administrator	287e3e8466	Daily bump.	2021-03-19 00:16:26 +00:00
Marek Polacek	bd9b262fa9	c++: Fix error-recovery with requires expression [PR99500] This fixes an ICE on invalid code where one of the parameters was error_mark_node and thus resetting its DECL_CONTEXT crashed. gcc/cp/ChangeLog: PR c++/99500 * parser.c (cp_parser_requirement_parameter_list): Handle error_mark_node. gcc/testsuite/ChangeLog: PR c++/99500 * g++.dg/cpp2a/concepts-err3.C: New test.	2021-03-18 20:09:44 -04:00
Marek Polacek	96ccb32543	c++: Remove FLOAT_EXPR assert in tsubst. This assert triggered when pr85013.C was compiled with -fchecking=2 which the usual testing doesn't exercise. Let's remove it for now and revisit in GCC 12. gcc/cp/ChangeLog: * pt.c (tsubst_copy_and_build) <case FLOAT_EXPR>: Remove.	2021-03-18 17:20:32 -04:00
Vladimir N. Makarov	a4670f58eb	[PR99422] LRA: Use lookup_constraint only for a single constraint in process_address_1. This is an additional patch for PR99422. In process_address_1 we look only at the first constraint in the 1st alternative and ignore all other possibilities. As we don't know what alternative and constraint will be used at this stage, we can be sure only for a single constraint with one alternative and should use unknown constraint for all other cases. gcc/ChangeLog: PR target/99422 * lra-constraints.c (process_address_1): Use lookup_constraint only for a single constraint.	2021-03-18 15:59:15 -04:00
Martin Sebor	30b10dacd0	PR middle-end/99502 - missing -Warray-bounds on partial out of bounds gcc/ChangeLog: PR middle-end/99502 * gimple-array-bounds.cc (inbounds_vbase_memaccess_p): Rename... (inbounds_memaccess_p): ...to this. Check the ending offset of the accessed member. gcc/testsuite/ChangeLog: PR middle-end/99502 * g++.dg/warn/Warray-bounds-22.C: New test. * g++.dg/warn/Warray-bounds-23.C: New test. * g++.dg/warn/Warray-bounds-24.C: New test.	2021-03-18 13:38:00 -06:00
Marek Polacek	c5e55673b4	c++: Add assert to tsubst. As discussed in the r11-7709 patch, we can now make sure that tsubst never sees a FLOAT_EXPR, much like its counterpart FIX_TRUNC_EXPR. gcc/cp/ChangeLog: * pt.c (tsubst_copy_and_build): Add assert.	2021-03-18 14:20:00 -04:00
Andrew Stubbs	55308fc263	amdgcn: Silence warnings in gcn.c This fixes a few cases of "unquoted identifier or keyword", one "spurious trailing punctuation sequence", and a "may be used uninitialized". gcc/ChangeLog: * config/gcn/gcn.c (gcn_parse_amdgpu_hsa_kernel_attribute): Add %< and %> quote markers to error messages. (gcn_goacc_validate_dims): Likewise. (gcn_conditional_register_usage): Remove exclamation mark from error message. (gcn_vectorize_vec_perm_const): Ensure perm is fully initialized.	2021-03-18 17:38:51 +00:00
Jan Hubicka	ab03c0d575	Fix idiv latencies for znver3 Update costs of integer divides to match actual latencies (the scheduler model already does the right thing). It is essentially a no-op, since we end up expanding idiv for all sensible constants, so this may only end up disabling vectorization in some cases, but I did not find any such examples. However in general it is better to have actual latencies than random numbers. gcc/ChangeLog: 2021-03-18 Jan Hubicka <hubicka@ucw.cz> * config/i386/x86-tune-costs.h (struct processor_costs): Fix costs of integer divides.	2021-03-18 17:15:34 +01:00
Sinan Lin	d9f0ade001	PR target/99314: Fix integer signedness issue for cpymem pattern expansion. The third operand of the cpymem pattern is an unsigned HOST_WIDE_INT, but we were interpreting it as a signed HOST_WIDE_INT. That is not a problem in most cases, but when the value is larger than the maximum signed HOST_WIDE_INT, things can go wrong, since we use that value to calculate the buffer size. 2021-03-05 Sinan Lin <sinan@isrc.iscas.ac.cn> Kito Cheng <kito.cheng@sifive.com> gcc/ChangeLog: * config/riscv/riscv.c (riscv_block_move_straight): Change type to unsigned HOST_WIDE_INT for parameter and local variable with HOST_WIDE_INT type. (riscv_adjust_block_mem): Ditto. (riscv_block_move_loop): Ditto. (riscv_expand_block_move): Ditto.	2021-03-19 00:04:32 +08:00
Jakub Jelinek	89d44a9f3b	testsuite: Fix up strlenopt-80.c on powerpc [PR99636] Similar issue as in strlenopt-73.c, various spots in this test rely on MOVE_MAX >= 8, this time it uses a target selector to pick up a couple of targets, and all of them but powerpc 32-bit satisfy it, but powerpc 32-bit has MOVE_MAX just 4. 2021-03-18 Jakub Jelinek <jakub@redhat.com> PR testsuite/99636 * gcc.dg/strlenopt-80.c: For powerpc*-*-*, only enable for lp64.	2021-03-18 16:14:47 +01:00
Jakub Jelinek	fff9faa790	testsuite: Fix up strlenopt-73.c on powerpc [PR99626] As mentioned in the testcase as well as in the PR, this testcase relies on MOVE_MAX being sufficiently large that the memcpy call is folded early into load + store. Some popular targets define MOVE_MAX to 8 or even 16 (e.g. x86_64 or some options on s390x), but many other targets define it to just 4 (e.g. powerpc 32-bit), or even 2. The testcase has already one test routine guarded on one particular target with MOVE_MAX 16 (but does it incorrectly, __i386__ is only defined on 32-bit x86 and __SIZEOF_INT128__ is only defined on 64-bit targets), this patch fixes that, and guards another test that relies on memcpy (, , 8) being folded that way (which therefore needs MOVE_MAX >= 8) on a couple of common targets that are known to have such MOVE_MAX. 2021-03-18 Jakub Jelinek <jakub@redhat.com> PR testsuite/99626 * gcc.dg/strlenopt-73.c: Ifdef out test_copy_cond_unequal_length_i64 on targets other than x86, aarch64, s390 and 64-bit powerpc. Use test_copy_cond_unequal_length_i128 for __x86_64__ with int128 support rather than __i386__.	2021-03-18 16:11:46 +01:00
Jeff Law	d186c677e4	Update email address for primary entry / * MAINTAINERS: Update primary entry.	2021-03-18 08:33:20 -06:00
Christophe Lyon	0211fbb610	testsuite: Skip c-c++-common/zero-scratch-regs-10.c on arm As discussed in PR 97680, -fzero-call-used-regs is not supported on arm. Skip this test to avoid failure reports. 2021-03-18 Christophe Lyon <christophe.lyon@linaro.org> gcc/testsuite/ PR testsuite/97680 * c-c++-common/zero-scratch-regs-10.c: Skip on arm	2021-03-18 14:26:34 +00:00
Nick Clifton	073595ef13	Fix building the V850 port using recent versions of gcc. gcc/ * config/v850/v850.c (construct_restore_jr): Increase static buffer size. (construct_save_jarl): Likewise. * config/v850/v850.h (DWARF2_DEBUGGING_INFO): Define.	2021-03-18 12:57:25 +00:00
Iain Sandoe	0cc218d42c	Objective-C++ : Fix handling of unnamed message parms [PR49070]. When we are parsing an Objective-C++ message, a colon is a valid terminator for an assignment-expression. That is: [receiver meth:x:x:x:x]; is a valid, if somewhat unreadable, construction, corresponding to a method declaration like: - (id) meth:(id)arg0 :(id)arg1 :(id)arg2 :(id)arg3; where three of the message params have no selector name. In fact, although it might be unintentional, Objective-C/C++ can accept message selectors with all the parms unnamed (this applies to the clang implementation too, which is taken as the reference for the language). For regular C++, the pattern x:x is not valid in that position and an error is emitted with a fixit for the expected scope token. If we simply made that error conditional on !c_dialect_objc() that would regress Objective-C++ diagnostics for cases outside a message selector, so we add a state flag for this. gcc/cp/ChangeLog: PR objc++/49070 * parser.c (cp_debug_parser): Add Objective-C++ message state flag. (cp_parser_nested_name_specifier_opt): Allow colon to terminate an assignment-expression when parsing Objective-C++ messages. (cp_parser_objc_message_expression): Set and clear message parsing state on entry and exit. * parser.h (struct cp_parser): Add a context flag for Objective-C++ message state. gcc/testsuite/ChangeLog: PR objc++/49070 * obj-c++.dg/pr49070.mm: New test. * objc.dg/unnamed-parms.m: New test.	2021-03-18 11:47:27 +00:00
Kyrylo Tkachov	8f0c9d53ef	aarch64: Improve generic SVE tuning defaults This patch adds the recently-added tweak to split some SVE VL-based scalar operations [1] to the generic tuning used for SVE, as enabled by adding +sve to the -march flag, for example -march=armv8.2-a+sve. The recommendation for best performance on a particular CPU remains unchanged: use the -mcpu option for that CPU, where possible. -mcpu=native makes this straightforward for native compilation. The tweak to split out SVE VL-based scalar operations is a consistent win for the Neoverse V1 CPU and should be neutral for the Fujitsu A64FX. A run of SPEC2017 on A64FX with this tweak on didn't show any non-noise differences. It is also expected to be neutral on SVE2 implementations. Therefore, the patch enables the tweak for generic +sve tuning e.g. -march=armv8.2-a+sve. No SVE2 CPUs are expected to benefit from it, therefore the tweak is disabled for generic tuning when +sve2 is in -march e.g. -march=armv8.2-a+sve2. The implementation of this approach requires a bit of custom logic in aarch64_override_options_internal to handle these kinds of architecture-dependent decisions, but we do believe the user-facing principle here is important to implement. In general, for the generic target we're using a decision framework that looks like: * If all cores that are known to benefit from an optimization are of architecture X, and all other cores that implement X or above are not impacted, or have a very slight impact, we will consider it for generic tuning for architecture X. * We will not enable that optimisation for generic tuning for architecture X+1 if no known cores of architecture X+1 or above will benefit. This framework allows us to improve generic tuning for CPUs of generation X while avoiding accumulating tweaks for future CPUs of generation X+1, X+2... that do not need them, and thus avoid even the slight negative effects of these optimisations if the user is willing to tell us the desired architecture accurately. X above can mean either annual architecture updates (Armv8.2-a, Armv8.3-a etc) or optional architecture extensions (like SVE, SVE2). [1] http://gcc.gnu.org/g:a65b9ad863c5fc0aea12db58557f4d286a1974d7 gcc/ChangeLog: * config/aarch64/aarch64.c (aarch64_adjust_generic_arch_tuning): Define. (aarch64_override_options_internal): Use it. (generic_tunings): Add AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS to tune_flags. gcc/testsuite/ChangeLog: * g++.target/aarch64/sve/aarch64-sve.exp: Add -moverride=tune=none to sve_flags. * g++.target/aarch64/sve/acle/aarch64-sve-acle-asm.exp: Likewise. * g++.target/aarch64/sve/acle/aarch64-sve-acle.exp: Likewise. * gcc.target/aarch64/sve/aarch64-sve.exp: Likewise. * gcc.target/aarch64/sve/acle/aarch64-sve-acle-asm.exp: Likewise. * gcc.target/aarch64/sve/acle/aarch64-sve-acle.exp: Likewise.	2021-03-18 09:56:47 +00:00
Martin Liska	3bcf19215d	coroutines: init struct members to NULL gcc/cp/ChangeLog: PR c++/99617 * coroutines.cc (struct var_nest_node): Init then_cl and else_cl to NULL.	2021-03-18 10:42:44 +01:00
Jakub Jelinek	57e274408c	testsuite: Fix up pr98099.c testcase for big endian [PR98099] The testcase fails on big-endian without int128 support, because due to -fsso-struct=big-endian no swapping is needed for big endian. This patch restricts the testcase to big or little endian (but not pdp) and uses -fsso-struct=little-endian for big endian, so that it is swapping everywhere. 2021-03-18 Jakub Jelinek <jakub@redhat.com> PR middle-end/98099 * gcc.dg/pr98099.c: Don't compile the test on pdp endian. For big endian use -fsso-struct=little-endian dg-options.	2021-03-18 09:53:24 +01:00
GCC Administrator	19ac7c94b2	Daily bump.	2021-03-18 00:16:24 +00:00
Marek Polacek	40465293cd	c++: ICE with real-to-int conversion in template [PR97973] In this test we are building a call in a template, but since neither the function nor any of its arguments are dependent, we go down the normal path in finish_call_expr. convert_arguments sees that we're binding a reference to int to double and therein convert_to_integer creates a FIX_TRUNC_EXPR. Later, we call check_function_arguments which folds the arguments, and, in a template, fold_for_warn calls fold_non_dependent_expr. But tsubst_copy_and_build should not see a FIX_TRUNC_EXPR (see the patch discussed in <https://gcc.gnu.org/pipermail/gcc-patches/2018-March/496183.html>) or we crash. So let's not create a FIX_TRUNC_EXPR in a template in the first place and instead use IMPLICIT_CONV_EXPR. gcc/cp/ChangeLog: PR c++/97973 * call.c (conv_unsafe_in_template_p): New. (convert_like): Use it. gcc/testsuite/ChangeLog: PR c++/97973 * g++.dg/conversion/real-to-int1.C: New test.	2021-03-17 19:26:25 -04:00
Anthony Sharp	be246ac2d2	c++: Private parent access check for using decls [PR19377] This bug was already mostly fixed by the patch for PR17314. This patch continues that by ensuring that where a using decl is used, causing an access failure to a child class because the using decl is private, the compiler correctly points to the using decl as the source of the problem. gcc/cp/ChangeLog: 2021-03-10 Anthony Sharp <anthonysharp15@gmail.com> * semantics.c (get_class_access_diagnostic_decl): New function that examines special cases when a parent class causes a private access failure. (enforce_access): Slightly modified to call function above. gcc/testsuite/ChangeLog: 2021-03-10 Anthony Sharp <anthonysharp15@gmail.com> * g++.dg/cpp1z/using9.C: New using decl test. Co-authored-by: Jason Merrill <jason@redhat.com>	2021-03-17 19:11:02 -04:00
Sandra Loosemore	5074c6fa38	nios2: Fix format complaints and similar diagnostics. The nios2 back end has not been building with newer versions of host GCC due to several complaints about diagnostic formatting, along with a couple other warnings. This patch fixes the errors seen when building with a host compiler from current mainline head. I also made a pass through all the error messages in this file to make them use more consistent formatting, even where the host compiler was not specifically complaining. gcc/ * config/nios2/nios2.c (nios2_custom_check_insns): Clean up error message format issues. (nios2_option_override): Likewise. (nios2_expand_fpu_builtin): Likewise. (nios2_init_custom_builtins): Adjust to avoid bogus strncpy truncation warning. (nios2_expand_custom_builtin): More error message format fixes. (nios2_expand_rdwrctl_builtin): Likewise. (nios2_expand_rdprs_builtin): Likewise. (nios2_expand_eni_builtin): Likewise. (nios2_expand_builtin): Likewise. (nios2_register_custom_code): Likewise. (nios2_valid_target_attribute_rec): Likewise. (nios2_add_insn_asm): Fix uninitialized variable warning.	2021-03-17 14:41:31 -07:00
Jan Hubicka	bd364aaee3	Enable gather on zen3 hardware. For TSVC it gets used by 5 benchmarks with the following runtime improvements: s4114: 1.424 -> 1.209 (84.9017%) s4115: 2.021 -> 1.065 (52.6967%) s4116: 1.549 -> 0.854 (55.1323%) s4117: 1.386 -> 1.193 (86.075%) vag: 2.741 -> 1.940 (70.7771%) There is a regression in s4112: 1.115 -> 1.184 (106.188%) The internal loop is: for (int i = 0; i < LEN_1D; i++) { a[i] += b[ip[i]] * s; } (so a standard accumulate and add with indirect addressing) 40a400: c5 fe 6f 24 03 vmovdqu (%rbx,%rax,1),%ymm4 40a405: c5 fc 28 da vmovaps %ymm2,%ymm3 40a409: 48 83 c0 20 add $0x20,%rax 40a40d: c4 e2 65 92 04 a5 00 vgatherdps %ymm3,0x594100(,%ymm4,4),%ymm0 40a414: 41 59 00 40a417: c4 e2 75 a8 80 e0 34 vfmadd213ps 0x5b34e0(%rax),%ymm1,%ymm0 40a41e: 5b 00 40a420: c5 fc 29 80 e0 34 5b vmovaps %ymm0,0x5b34e0(%rax) 40a427: 00 40a428: 48 3d 00 f4 01 00 cmp $0x1f400,%rax 40a42e: 75 d0 jne 40a400 <s4112+0x60> compared to: 40a280: 49 63 14 04 movslq (%r12,%rax,1),%rdx 40a284: 48 83 c0 04 add $0x4,%rax 40a288: c5 fa 10 04 95 00 41 vmovss 0x594100(,%rdx,4),%xmm0 40a28f: 59 00 40a291: c4 e2 71 a9 80 fc 34 vfmadd213ss 0x5b34fc(%rax),%xmm1,%xmm0 40a298: 5b 00 40a29a: c5 fa 11 80 fc 34 5b vmovss %xmm0,0x5b34fc(%rax) 40a2a1: 00 40a2a2: 48 3d 00 f4 01 00 cmp $0x1f400,%rax 40a2a8: 75 d6 jne 40a280 <s4112+0x40> Looking at instruction latencies - fmadd is 4 cycles - vgatherdps is 39 So vgather itself is 4.8 cycles per iteration and probably the CPU is able to execute the rest out of order, getting close to 4 cycles per iteration (it can do 2 loads in parallel, one store and the rest fits easily into execution resources). That would explain the 20% slowdown. gimple internal loop is: _2 = a[i_38]; _3 = (long unsigned int) i_38; _4 = _3 * 4; _5 = ip_18 + _4; _6 = *_5; _7 = b[_6]; _8 = _7 * s_19; _9 = _2 + _8; a[i_38] = _9; i_28 = i_38 + 1; ivtmp_52 = ivtmp_53 - 1; if (ivtmp_52 != 0) goto <bb 8>; [98.99%] else goto <bb 4>; [1.01%] 0x25bac30 a[i_38] 1 times scalar_load costs 12 in body 0x25bac30 *_5 1 times scalar_load costs 12 in body 0x25bac30 b[_6] 1 times scalar_load costs 12 in body 0x25bac30 _7 * s_19 1 times scalar_stmt costs 12 in body 0x25bac30 _2 + _8 1 times scalar_stmt costs 12 in body 0x25bac30 _9 1 times scalar_store costs 16 in body so 19 cycles estimate of scalar load 0x2668630 a[i_38] 1 times vector_load costs 12 in body 0x2668630 *_5 1 times unaligned_load (misalign -1) costs 12 in body 0x2668630 b[_6] 8 times scalar_load costs 96 in body 0x2668630 _7 * s_19 1 times scalar_to_vec costs 4 in prologue 0x2668630 _7 * s_19 1 times vector_stmt costs 12 in body 0x2668630 _2 + _8 1 times vector_stmt costs 12 in body 0x2668630 _9 1 times vector_store costs 16 in body so 40 cycles per 8x vectorized body tsvc.c:3450:27: note: operating only on full vectors. tsvc.c:3450:27: note: Cost model analysis: Vector inside of loop cost: 160 Vector prologue cost: 4 Vector epilogue cost: 0 Scalar iteration cost: 76 Scalar outside cost: 0 Vector outside cost: 4 prologue iterations: 0 epilogue iterations: 0 Calculated minimum iters for profitability: 1 I think this generally suffers from the GIGO principle. One problem seems to be that we do not know about fmadd yet and compute it as two instructions (6 cycles instead of 4). A more important problem is that we do not account for the parallelism at all. I do not see how to disable the vectorization here without bumping gather costs noticeably off reality and thus we probably can try to experiment with this if more similar problems are found. Icc is also using gather in s1115 and s128. For s1115 the vectorization does not seem to help and s128 gets slower. Clang and aocc do not use gathers. * config/i386/x86-tune-costs.h (struct processor_costs): Update costs of gather to match reality. * config/i386/x86-tune.def (X86_TUNE_USE_GATHER): Enable for znver3.	2021-03-17 22:37:11 +01:00
Ian Lance Taylor	f3e9c98a9f	compiler: copy receiver argument for go/defer of method call Test case is https://golang.org/cl/302371. Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/302270	2021-03-17 12:17:51 -07:00
Iain Sandoe	c86c5195c8	testsuite, Darwin : Fix the asan/strncpy-overflow-1 test. 1. To be more compatible with Linux, Darwin testcases that include string.h should set _FORTIFY_SOURCE=0 since, otherwise, it will be defaulted on and the _chk versions of the string builtins will be used. This testcase fails otherwise because there's no convenient way to disable the _chk builtins. 2. The system tool that handles symbolization (atos) is not reliable with GCC's DWARF-2 output but, fortunately, all the platform versions that support the current sanitizers are able to handle dwarf-3 for this testcase. gcc/testsuite/ChangeLog: * c-c++-common/asan/strncpy-overflow-1.c: Add _FORTIFY_SOURCE=0 and -gdwarf-3 to the command line options. Adjust the expected line numbers for the revised options header.	2021-03-17 19:12:25 +00:00
Iain Sandoe	9c4d77fc1c	testsuite, Darwin : Fix match output for asan/memcmp-1.c. The Darwin part of libasan produces different output for memcmp cases from other ports. The GCC implementation does produce the same output for this test as the clang one (modulo the two points below). 1. To be more compatible with Linux, Darwin testcases that include string.h should set _FORTIFY_SOURCE=0 since, otherwise, it will be defaulted on and the _chk versions of the string builtins will be used. 2. The system tool that handles symbolization (atos) is not reliable with GCC's DWARF-2 output but, fortunately, all the platform versions that support the current sanitizers are able to handle dwarf-3 for this testcase. gcc/testsuite/ChangeLog: * c-c++-common/asan/memcmp-1.c: Add _FORTIFY_SOURCE=0 and -gdwarf-3 to the command line options. Provide Darwin-specific match lines for the expected output.	2021-03-17 19:12:03 +00:00
Kyrylo Tkachov	f7581eb38e	aarch64: Fix status return logic in RNG intrinsics There is a bug with the RNG intrinsics in their return code. The definition says: "Stores a 64-bit random number into the object pointed to by the argument and returns zero. If the implementation could not generate a random number within a reasonable period of time the object pointed to by the input is set to zero and a non-zero value is returned." This means we should be testing whether to return non-zero with: CSET W0, EQ rather than NE. This patch fixes that. gcc/ChangeLog: * config/aarch64/aarch64-builtins.c (aarch64_expand_rng_builtin): Use EQ to compare against CC_REG rather than NE. gcc/testsuite/ChangeLog: * gcc.target/aarch64/acle/rng_2.c: New test.	2021-03-17 18:21:05 +00:00
H.J. Lu	adf14bdbc1	x86: Update 'P' operand modifier for -fno-plt Update 'P' operand modifier for -fno-plt to support inline assembly statements. In 64-bit, we can always load function address with @GOTPCREL. In 32-bit, we load function address with @GOT only for non-PIC since PIC register may not be available at call site. gcc/ PR target/99504 * config/i386/i386.c (ix86_force_load_from_GOT_p): Support inline assembly statements. (ix86_print_operand): Update 'P' handling for -fno-plt. gcc/testsuite/ PR target/99504 * gcc.target/i386/pr99530-1.c: New test. * gcc.target/i386/pr99530-2.c: Likewise. * gcc.target/i386/pr99530-3.c: Likewise. * gcc.target/i386/pr99530-4.c: Likewise. * gcc.target/i386/pr99530-5.c: Likewise. * gcc.target/i386/pr99530-6.c: Likewise.	2021-03-17 07:06:10 -07:00
Tamar Christina	39916ceab4	AArch64: Fix -Werror issue in aarch64_simd_clone_compute_vecsize_and_simdlen g:fcefc59befd396267b824c170b6a37acaf10874e introduced a new variable named arg_type which shadows the function scoped one. The function scoped one is now unused and so causes bootstrap to fail due to -Werror. This patch removes the unused variable. gcc/ChangeLog: PR target/99542 * config/aarch64/aarch64.c (aarch64_simd_clone_compute_vecsize_and_simdlen): Remove unused var.	2021-03-17 11:12:25 +00:00
GCC Administrator	bc2127767a	Daily bump.	2021-03-17 00:16:25 +00:00
Christophe Lyon	a2a6e9214e	aarch64: Fix up aarch64_simd_clone_compute_vecsize_and_simdlen [PR99542] The gcc.dg/declare-simd.c test does not emit a warning with -mabi=ilp32. 2021-03-16 Christophe Lyon <christophe.lyon@linaro.org> PR target/99542 gcc/testsuite/ * gcc.dg/declare-simd.c (fn2): Expect a warning only under lp64.	2021-03-16 21:51:39 +00:00
Jason Merrill	a4101e5aaf	c++: Fix NaN as C++20 template argument C++20 allows floating-point types for non-type template parameters; floating-point values are considered to be equivalent template arguments if they are "identical", which conveniently seems to map onto an existing GCC predicate. gcc/cp/ChangeLog: * tree.c (cp_tree_equal): Use real_identical. gcc/testsuite/ChangeLog: * g++.dg/cpp2a/nontype-float1.C: New test.	2021-03-16 17:47:22 -04:00
Jakub Jelinek	0251051db6	c++: Ensure correct destruction order of local statics [PR99613] As mentioned in the PR, if the ends of two constructions of local statics are strongly ordered, their destructors should be run in the reverse order. As we run __cxa_guard_release before calling __cxa_atexit, it is possible that we have two threads that access two local statics in the same order for the first time, one thread wins the __cxa_guard_acquire on the first one but is rescheduled in between the __cxa_guard_release and __cxa_atexit calls, then the other thread is scheduled and wins __cxa_guard_acquire on the second one and calls __cxa_guard_release and __cxa_atexit and only afterwards the first thread calls its __cxa_atexit. This means a variable whose completion of the constructor strongly happened after the completion of the other one will be destructed after the other variable is destructed. The following patch fixes that by swapping the __cxa_guard_release and __cxa_atexit calls. 2021-03-16 Jakub Jelinek <jakub@redhat.com> PR c++/99613 * decl.c (expand_static_init): For thread guards, call __cxa_atexit before calling __cxa_guard_release rather than after it. Formatting fixes.	2021-03-16 21:17:44 +01:00
Segher Boessenkool	a0b5843a9b	rs6000: Workaround for PR98092 The bcdinvalid_<mode> RTL instruction uses the "unordered" comparison, which cannot be used if we have -ffinite-math-only. We really need CCMODEs that describe what bits in a CR field are set by other insns than just comparisons, but that is a lot more surgery, and it is stage 4 now. This patch does a simple workaround. 2021-03-16 Segher Boessenkool <segher@kernel.crashing.org> PR target/98092 * config/rs6000/predicates.md (branch_comparison_operator): Allow ordered and unordered for CCFPmode, if flag_finite_math_only. gcc/testsuite/ PR target/98092 * gcc.target/powerpc/pr98092.c: New.	2021-03-16 19:21:34 +00:00
Jakub Jelinek	d55ce33a34	i386: Avoid mutual recursion between two peephole2s [PR99600] As the testcase shows, the compiler hangs and eats all memory when compiling it. This is because in r11-7274-gdecd8fb0128870d0d768ba53dae626913d6d9c54 I have changed the ix86_avoid_lea_for_addr splitting from a splitter into a peephole2 (because during splitting passes we don't have guaranteed df, while during peephole2 we do). The problem is we have another peephole2 that works in the opposite way, when seeing split lea (in particular ASHIFT followed by PLUS) it attempts to turn it back into a lea. In the past, they were fighting against each other, but as they were in different passes, simply the last one won. So, split after reload split the lea into shift left and plus, peephole2 reverted that (but, note not perfectly, the peephole2 doesn't understand that something can be placed into lea disp; to be fixed for GCC12) and then another split pass split the lea apart again. But my changes and the way peephole2 works means that we endlessly iterate over those two, the first peephole2 splits the lea, the second one reverts it, the first peephole2 splits the new lea back into new 2 insns and so forth forever. So, we need to break the cycle somehow. This patch does that by not emitting an ASHIFT insn from ix86_split_lea_for_addr but emitting a corresponding MULT by constant instead, and splitting that later back into ASHIFT. 2021-03-16 Jakub Jelinek <jakub@redhat.com> PR target/99600 * config/i386/i386-expand.c (ix86_split_lea_for_addr): Emit a MULT rather than ASHIFT. * config/i386/i386.md (mult by 1248 into ashift): New splitter. * gcc.target/i386/pr99600.c: New test.	2021-03-16 18:46:20 +01:00
Martin Liska	1c7bec8bfb	c++: support target attr for DECL_LOCAL_DECL_P fns [PR99108] We crash when target attribute get_function_versions_dispatcher is called for a function that is not registered in call graph. This was happening because we were calling it for the function-local decls that aren't in the symbol table, instead of the corresponding namespace-scope decls that are. gcc/cp/ChangeLog: PR c++/99108 * call.c (get_function_version_dispatcher): Handle DECL_LOCAL_DECL_P. * decl.c (maybe_version_functions): Likewise. (maybe_mark_function_versioned): New. * name-lookup.c (push_local_extern_decl_alias): No longer static. * name-lookup.h (push_local_extern_decl_alias): Adjust. gcc/testsuite/ChangeLog: PR c++/99108 * g++.target/i386/pr99108.C: New test. Co-authored-by: Jason Merrill <jason@redhat.com>	2021-03-16 10:54:23 -04:00
Nick Clifton	f6e9c1c919	Fix potentially undefined behaviour when computing a sha1 value. libiberty/ * sha1.c (sha1_process_bytes): Use memmove in place of memcpy.	2021-03-16 14:43:17 +00:00
Martin Liska	408d137027	options: ignore flag_ipa_ra in cl_optimization_compare gcc/ChangeLog: PR target/99592 * optc-save-gen.awk: Add flag_ipa_ra to exceptions for cl_optimization_compare function. gcc/testsuite/ChangeLog: PR target/99592 * gcc.target/arm/pr99592.c: New test.	2021-03-16 14:44:26 +01:00
Ilya Leoshkevich	4073a09e23	IBM Z: Fix "+fvm" constraint with long doubles When a long double is passed to an asm statement with a "+fvm" constraint, a LRA loop occurs. This happens, because LRA chooses the widest register class in this case (VEC_REGS), but the code generated by s390_md_asm_adjust() always wants FP_REGS. Mismatching register classes cause infinite reloading. Fix by treating "fv" constraints as "v" in s390_md_asm_adjust(). gcc/ChangeLog: * config/s390/s390.c (f_constraint_p): Treat "fv" constraints as "v". gcc/testsuite/ChangeLog: * gcc.target/s390/vector/long-double-asm-fprvrmem.c: New test.	2021-03-16 13:57:34 +01:00
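
Note on f6e9c1c919 (libiberty sha1.c, memmove in place of memcpy): the log entry only says the change avoids "potentially undefined behaviour". A minimal sketch of the general issue follows, assuming the concern is a copy whose source and destination may overlap within one buffer. This is not the libiberty code; compact_buffer and its parameters are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from libiberty): discard the first `consumed`
   bytes of `buf` and slide the following `left_over` bytes to the front. */
void compact_buffer(unsigned char *buf, size_t consumed, size_t left_over)
{
    assert(buf != NULL);
    /* When left_over > consumed the source and destination ranges overlap.
       memcpy has undefined behaviour for overlapping objects, while memmove
       is specified to copy as if through a temporary buffer. */
    memmove(buf, buf + consumed, left_over);
}
```

For example, calling compact_buffer(buf, 2, 6) on an 8-byte buffer moves bytes 2..7 to positions 0..5, a case where the two ranges overlap and memcpy would not be safe.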


183991 Commits