With SELECT RANK, name mangling results in long internal symbols that
overflow internal buffers. Fix that.
gcc/fortran/
PR fortran/95828
* match.c (select_rank_set_tmp): Enlarge internal buffer used in
generating a mangled name.
* resolve.c (resolve_select_rank): Likewise.
With PDTs (parameterized derived types), name mangling results in variably
long internal symbols. Use a dynamic buffer instead of a fixed-size one.
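
The fixed-versus-dynamic buffer idea can be sketched roughly as follows. This is a minimal C++ illustration, not the gfortran code: the helper name format_name, the starting size of 16 and the format strings are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper (not a gfortran function): format a name into a
// buffer that is reallocated whenever the initial size turns out too small.
std::string format_name(const char *fmt, ...)
{
    assert(fmt != nullptr);
    std::vector<char> buf(16);  // deliberately small starting size
    for (;;) {
        va_list ap;
        va_start(ap, fmt);
        int n = std::vsnprintf(buf.data(), buf.size(), fmt, ap);
        va_end(ap);
        if (n < 0)
            return std::string();  // formatting error
        if (static_cast<std::size_t>(n) < buf.size())
            return std::string(buf.data(), static_cast<std::size_t>(n));
        // vsnprintf reported the length it needs: grow and retry.
        buf.resize(static_cast<std::size_t>(n) + 1);
    }
}
```

A call such as format_name("%s_%d", long_type_name, kind) then succeeds regardless of how long the mangled name becomes, instead of silently truncating or overflowing.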
gcc/fortran/
PR fortran/95826
* decl.c (gfc_match_decl_type_spec): Replace a fixed size
buffer by a pointer and reallocate if necessary.
This has been questionable behaviour since it was added, and though it
has no effect on wider discussions around what should be the correct
semantics of pragma(inline) within D modules, doing this tree-level
optimization has mostly zero benefit as cross-module inlining doesn't
happen anyway.
gcc/d/ChangeLog:
* decl.cc (get_symbol_decl): Do not implicitly set
DECL_DECLARED_INLINE_P on member functions.
These two functions are not tied to the language-specific part of the
front-end in any way.
gcc/d/ChangeLog:
* d-lang.cc (d_gimplify_expr_p): Make static.
(d_parse_file): Likewise.
(d_signed_or_unsigned_type): Move to types.cc.
(d_unsigned_type): Likewise.
(d_signed_type): Likewise.
* d-tree.h (d_unsigned_type): Change the location in file.
(d_signed_type): Likewise.
* types.cc (d_signed_or_unsigned_type): Moved from d-lang.cc.
(d_unsigned_type): Likewise.
(d_signed_type): Likewise.
Fixes a regression caused by an incomplete backport of converting the
Expression semantic pass to a Visitor.
Reviewed-on: https://github.com/dlang/dmd/pull/11314
gcc/d/ChangeLog:
PR d/95250
* dmd/MERGE: Merge upstream dmd 90450f3ef.
gcc/testsuite/ChangeLog:
PR d/95250
* gdc.dg/pr95250.d: New test.
As the DMD front-end never frees allocated memory, the glue layer
between the DMD front-end and GCC should generally avoid using DMD types
and interfaces if the purpose is internal only.
gcc/d/ChangeLog:
* d-lang.cc (d_parse_file): Replace OutBuffer with obstack.
Backports the OutBuffer interface from upstream dmd master, removing
another difference between this and the self-hosted D branch that is
purely refactoring, and doesn't introduce any mechanical changes.
Reviewed-on: https://github.com/dlang/dmd/pull/11302
gcc/d/ChangeLog:
* dmd/MERGE: Merge upstream dmd 5fc1806cd.
* d-lang.cc (d_parse_file): Use peekChars to get string representation
of OutBuffer data.
The target attribute table is not guaranteed to be set in all backends.
gcc/d/ChangeLog:
PR d/95173
* d-attribs.cc (uda_attribute_p): Don't search target attribute table
if NULL.
gcc/testsuite/ChangeLog:
PR d/95173
* gdc.dg/pr95173.d: New test.
Declarations initialized with `= void` were being default initialized.
That is not really the intent, and misses the small optimization that
should have been gained from using void initializations.
gcc/d/ChangeLog:
* decl.cc (DeclVisitor::visit (VarDeclaration *)): Don't set
DECL_INITIAL if initializer is 'void'.
gcc/testsuite/ChangeLog:
* gdc.dg/init1.d: New test.
This is the default in the upstream reference compiler, and can reduce
some confusion when comparing warning/error messages of gdc and dmd side
by side.
Merges libphobos with upstream druntime d05ebaad and phobos 021ae0df7.
Reviewed-on: https://github.com/dlang/druntime/pull/3127
             https://github.com/dlang/phobos/pull/7521
gcc/d/ChangeLog:
* d-lang.cc (d_init_options): Turn on deprecation warnings by default.
libphobos/ChangeLog:
* libdruntime/MERGE: Merge upstream druntime d05ebaad.
* src/MERGE: Merge upstream phobos 021ae0df7.
* testsuite/libphobos.typeinfo/struct-align.d: Remove empty statement.
gcc/testsuite/ChangeLog:
* gdc.dg/asm1.d: Don't use deprecated asm syntax.
* gdc.dg/compilable.d: Add public to selective import.
* gdc.dg/lto/ltotests_0.d: Explicitly catch Throwable.
* gdc.dg/runnable.d: Remove empty statement.
Testing showed that it is always set and that its value always
matches ts->kind (if available) or otherwise, if it is a variable,
sym->ts.kind.
gcc/fortran/ChangeLog:
PR fortran/95837
* resolve.c (gfc_resolve_substring_charlen): Remove
bogus ts.kind setting for the expression.
gcc/testsuite/ChangeLog:
PR fortran/95837
* gfortran.dg/char4-subscript.f90: New test.
This removes a premature check for enough datarefs in a basic-block
before we consider vectorizing it which leaves basic-blocks with
just vectorizable vector constructors unvectorized. The check
is effectively done by the following check for store groups
which then also include constructors.
2020-06-25 Richard Biener <rguenther@suse.de>
PR tree-optimization/95839
* tree-vect-slp.c (vect_slp_analyze_bb_1): Remove premature
check on the number of datarefs.
* gcc.dg/vect/bb-slp-pr95839.c: New testcase.
Darwin has signed chars and the fields in the insn_data
struct are const char, which leads to the failure.
gcc/ChangeLog:
* config/rs6000/rs6000-call.c (mma_init_builtins): Cast
the insn_data n_operands value to unsigned.
The unmodified 'if' clause should be applied to all the sub-constructs that
accept an 'if' clause in a combined OpenMP construct, and not just to the
'parallel' sub-construct.
2020-06-25 Kwok Cheung Yeung <kcy@codesourcery.com>
gcc/fortran/
* trans-openmp.c (gfc_split_omp_clauses): Add if clause
to target and simd sub-constructs.
gcc/testsuite/
* gfortran.dg/gomp/combined-if.f90: New.
Reviewed-by: Jakub Jelinek <jakub@redhat.com>
With the last vectorizable_shift patch we can now always use the
SLP vector defs to determine the vectorized stmt insertion place,
paving the way for a "verifier" for pending restructuring and
BB vectorization of reductions and other live stmts.
2020-06-25 Richard Biener <rguenther@suse.de>
* tree-vect-slp.c (vect_schedule_slp_instance): Always use
vector defs to determine insertion place.
CLWB isn't supported on Ice Lake client. But Ice Lake server and Tiger
Lake support it. Move PTA_CLWB to PTA_ICELAKE_SERVER and PTA_TIGERLAKE.
PR target/95874
* config/i386/i386.h (PTA_ICELAKE_CLIENT): Remove PTA_CLWB.
(PTA_ICELAKE_SERVER): Add PTA_CLWB.
(PTA_TIGERLAKE): Add PTA_CLWB.
This avoids using the original scalar SSA operand when vectorizing
a shift with a vectorized shift operand where we know all vector
components have the same value and thus we can use a vector by
scalar shift. Using the scalar SSA operand causes a possibly
long chain of scalar computation to be retained so it's better
to simply extract lane zero from the available vectorized shift
operand.
2020-06-25 Richard Biener <rguenther@suse.de>
PR tree-optimization/95866
* tree-vect-stmts.c (vectorizable_shift): Reject incompatible
vectorized shift operands. For scalar shifts use lane zero
of a vectorized shift operand.
* gcc.dg/vect/bb-slp-pr95866.c: New testcase.
libgcc/ChangeLog:
* libgcov-driver.c (merge_summary): Remove function as its name
is misleading and it does something different.
(dump_one_gcov): Add ATTRIBUTE_UNUSED for 2 args. Take read summary
in gcov-tool.
* libgcov-util.c (curr_object_summary): Remove.
(read_gcda_file): Remove unused curr_object_summary.
(gcov_merge): Merge summaries.
* libgcov.h: Add summary argument for gcov_info struct.
gcc/ChangeLog:
PR tree-optimization/95745
PR middle-end/95830
* gimple-isel.cc (gimple_expand_vec_cond_exprs): Delete dead
SSA_NAMEs used as the first argument of a VEC_COND_EXPR. Always
return 0.
* tree-vect-generic.c (expand_vector_condition): Remove dead
SSA_NAMEs used as the first argument of a VEC_COND_EXPR.
Fix codegen for builtin vec_pack_to_short_fp32. This includes adding
a define_insn for xvcvsphp, and adding a new define_expand for
convert_4f32_8f16.
2020-06-24 Will Schmidt <will_schmidt@vnet.ibm.com>
PR target/94954
gcc
* config/rs6000/altivec.h (vec_pack_to_short_fp32): Update.
* config/rs6000/altivec.md (UNSPEC_CONVERT_4F32_8F16): New unspec.
(convert_4f32_8f16): New define_expand.
* config/rs6000/rs6000-builtin.def (convert_4f32_8f16): New builtin define
and overload.
* config/rs6000/rs6000-call.c (P9V_BUILTIN_VEC_CONVERT_4F32_8F16): New
overloaded builtin entry.
* config/rs6000/vsx.md (UNSPEC_VSX_XVCVSPHP): New unspec.
(vsx_xvcvsphp): New define_insn.
gcc/testsuite
* gcc.target/powerpc/builtins-1-p9-runnable.c: Update.
This patch introduces support for conditionals (and expr) expansions
to file lists in proc outest in outputs.exp.
The conditionals machinery is now used to guard files that are only
created by the LTO plugin, or when not using the LTO plugin.
It is also used to avoid special-casing .dwo files: the condition of
when they're expected is now encoded in the list.
Furthermore, the -g flag, that used to be specified along with
$gsplit_dwarf, is now moved into $gsplit_dwarf, so that we don't
compile with -g if -gsplit-dwarf is not needed. This avoids having to
deal with .dSYM directories.
Further removing special cases, $aout is now dealt with in a more
general way, using expr to perform variable/string expansion.
for gcc/testsuite/ChangeLog
PR testsuite/95416
PR testsuite/95577
* gcc.misc-tests/outputs.exp (gsplit_dwarf): Move -g into it.
(outest): Introduce conditionals and string/variable/expr
expansion. Drop special-casing of $aout and .dwo.
(gspd): New conditional. Guard all .dwo files with it.
(ltop): New conditional. Guard files created by the LTO
plugin with it. Guard files created by fat LTO compilation
with its negation. Add a few -fno-use-linker-plugin tests
guarded by it.
This fixes PR95672 by adding the missing TYPE_PACK_EXPANSION case in
cxx_incomplete_type_diagnostic in order to avoid ICEs on diagnosing
incomplete template pack expansion cases.
Tested on powerpc64le-unknown-linux-gnu.
gcc/cp/ChangeLog:
PR c++/95672
* typeck2.c (cxx_incomplete_type_diagnostic): Add missing
TYPE_PACK_EXPANSION case for diagnosing incomplete types.
gcc/testsuite/ChangeLog:
PR c++/95672
* g++.dg/template/pr95672.C: New test.
Signed-off-by: Nicholas Krause <xerofoify@gmail.com>
We had omitted the copying of function attributes. We now copy
the used, alignment and section values from the original decl,
along with the complete set of function attributes. It is likely that
some function attributes don't really make sense for coroutines,
but that can be diagnosed separately. Also mark the outlined
functions as artificial, since they are; some diagnostic
processing tests this.
gcc/cp/ChangeLog:
PR c++/95518
PR c++/95813
* coroutines.cc (act_des_fn): Copy function
attributes onto the outlined coroutine helpers.
gcc/testsuite/ChangeLog:
PR c++/95518
PR c++/95813
* g++.dg/coroutines/pr95518.C: New test.
* g++.dg/coroutines/pr95813.C: New test.
We updated the handling of the errors for cases when the
ramp return cannot be constructed from the user's provided
get-return-object method. This updates the testcases to
cover this.
gcc/testsuite/ChangeLog:
* g++.dg/coroutines/void-gro-non-class-coro.C: Moved to...
* g++.dg/coroutines/coro-bad-gro-01-void-gro-non-class-coro.C: ...here.
* g++.dg/coroutines/coro-bad-gro-00-class-gro-scalar-return.C: New test.
It occurred to me that if we're looking up the defining base within the
conversion_path binfo, we could use the result for the conversion as well
instead of doing two separate conversions.
gcc/cp/ChangeLog:
* call.c (build_over_call): Only call build_base_path once.
conversion_path points to the base where we found the using-declaration, not
where the function is actually a member; look up the actual base. And then
maybe look back to the derived class if the base is primary.
gcc/cp/ChangeLog:
PR c++/95719
* call.c (build_over_call): Look up the overrider in base_binfo.
* class.c (lookup_vfn_in_binfo): Look through BINFO_PRIMARY_P.
gcc/testsuite/ChangeLog:
PR c++/95719
* g++.dg/tree-ssa/final4.C: New test.
2020-06-24 Roger Sayle <roger@nextmovesoftware.com>
* simplify-rtx.c (simplify_unary_operation_1): Simplify
(parity (parity x)) as (parity x), i.e. PARITY is idempotent.
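
The identity behind this simplification can be checked with a small C++ sketch. The function parity32 below is a hand-written stand-in for the RTL PARITY operation, not code from simplify-rtx.c. Since its result is always 0 or 1, and the parity of 0 is 0 and of 1 is 1, applying it twice gives the same value as applying it once.

```cpp
#include <cassert>
#include <cstdint>

// Parity of a 32-bit value: 1 if the number of set bits is odd, else 0.
constexpr unsigned parity32(std::uint32_t x)
{
    x ^= x >> 16;  // fold the upper half into the lower half
    x ^= x >> 8;
    x ^= x >> 4;
    x ^= x >> 2;
    x ^= x >> 1;
    return x & 1u;
}

// Idempotence: parity of a parity is the parity itself.
constexpr bool parity_is_idempotent_on(std::uint32_t x)
{
    return parity32(parity32(x)) == parity32(x);
}

inline void check_parity_examples()
{
    assert(parity32(0x0000000Bu) == 1u);  // three bits set
    assert(parity32(0xFFFFFFFFu) == 0u);  // thirty-two bits set
    assert(parity_is_idempotent_on(0x0000000Bu));
}
```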
With submodules and coarrays, name mangling results in long internal
symbols. Enlarge internal buffer.
gcc/fortran/
PR fortran/95827
* iresolve.c (gfc_get_string): Enlarge internal buffer used in
generating the mangled name.
This avoids vectorizing SLP subgraphs that just compute uniform
operations on all-same operands. That fixes the less interesting
(but most embarrassing) part of the testcase in the PR. On the
way it also fixed a missing matches[0] reset in the last
refactoring touching that place.
2020-06-24 Richard Biener <rguenther@suse.de>
PR tree-optimization/95866
* tree-vect-slp.c (vect_slp_tree_uniform_p): New.
(vect_build_slp_tree_2): Properly reset matches[0],
ignore uniform constants.
* gcc.target/i386/pr95866-1.c: New testcase.
Brand ID was a feature that briefly existed in some Pentium III and
Pentium 4 CPUs. The CPUs that had a non-zero brand ID still had a
valid family/model. Brand ID just gives a marketing name for the CPU.
Remove the extra code for brand ID check.
gcc/
PR target/95660
* common/config/i386/cpuinfo.h (get_intel_cpu): Remove brand_id.
(cpu_indicator_init): Likewise.
* config/i386/driver-i386.c (host_detect_local_cpu): Updated.
gcc/testsuite/
PR target/95660
* gcc.target/i386/builtin_target.c (check_detailed): Updated.
All Sky Lake family processors have the same CPUID model number, 0x55.
The differences are Cascade Lake has AVX512VNNI and Cooper Lake has
AVX512VNNI + AVX512BF16. Check AVX512BF16 for Cooper Lake.
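
The feature-based disambiguation described above can be sketched as below. This is an illustrative C++ fragment under the assumptions stated in the text; the enum and function names are hypothetical and do not appear in cpuinfo.h. AVX512BF16 is tested first because Cooper Lake also reports AVX512VNNI.

```cpp
#include <cassert>

// Hypothetical names for the three processors sharing CPUID model 0x55.
enum class model_0x55_kind { skylake_avx512, cascade_lake, cooper_lake };

// Classify a model-0x55 processor by its ISA feature bits.
constexpr model_0x55_kind classify_model_0x55(bool has_avx512vnni,
                                              bool has_avx512bf16)
{
    if (has_avx512bf16)
        return model_0x55_kind::cooper_lake;   // VNNI + BF16
    if (has_avx512vnni)
        return model_0x55_kind::cascade_lake;  // VNNI only
    return model_0x55_kind::skylake_avx512;    // neither
}

inline void check_classification_examples()
{
    assert(classify_model_0x55(true, true) == model_0x55_kind::cooper_lake);
    assert(classify_model_0x55(true, false) == model_0x55_kind::cascade_lake);
}
```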
PR target/95774
* common/config/i386/cpuinfo.h (get_intel_cpu): Add Cooper Lake
detection with AVX512BF16.
Both driver-i386.c and libgcc use CPUID to detect the processor name
as well as available ISAs. To detect the same processor or ISAs, the
same detection logic is duplicated in 2 places. Sometimes only one place
was up to date or got it right. Sometimes both places got it wrong.
1. Add common/config/i386/i386-isas.h to define _isa_names_table.
2. Use isa_names_table to auto-generate ISA command-line options.
3. Use isa_names_table to auto-generate __builtin_cpu_supports tests.
4. Use common/config/i386/cpuinfo.h to check available ISAs and detect
newer Intel processors in driver-i386.c and builtin_target.c.
5. Detection of AMD processors and older processors in driver-i386.c is
unchanged.
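
The idea of writing the ISA table once and expanding it differently in each consumer can be sketched in C++ as follows. Everything here is a simplified assumption: the SKETCH_* macro names, the three entries and the field layout are invented for the example and do not reproduce i386-isas.h.

```cpp
#include <cassert>
#include <cstring>
#include <string>

enum sketch_feature { FEATURE_SSE2, FEATURE_AVX2, FEATURE_POPCNT };

// The table is written once; START/ENTRY/END are defined by each user.
#define SKETCH_ISA_TABLE                                  \
    SKETCH_START                                          \
    SKETCH_ENTRY("sse2", FEATURE_SSE2, "-msse2")          \
    SKETCH_ENTRY("avx2", FEATURE_AVX2, "-mavx2")          \
    SKETCH_ENTRY("popcnt", FEATURE_POPCNT, "-mpopcnt")    \
    SKETCH_END

// Consumer 1: expand the table into an array of rows.
struct isa_row { const char *name; sketch_feature feature; const char *option; };
#define SKETCH_START static const isa_row isa_rows[] = {
#define SKETCH_ENTRY(name, feature, option) { name, feature, option },
#define SKETCH_END };
SKETCH_ISA_TABLE
#undef SKETCH_START
#undef SKETCH_ENTRY
#undef SKETCH_END

// Look up the option for an ISA name using the generated array.
const char *option_by_name(const char *name)
{
    for (const isa_row &row : isa_rows)
        if (std::strcmp(row.name, name) == 0)
            return row.option;
    return nullptr;
}

// Consumer 2: expand the same table into a sequence of feature tests.
std::string options_for(unsigned mask)
{
    std::string out;
#define SKETCH_START
#define SKETCH_ENTRY(name, feature, option)               \
    if (mask & (1u << feature)) {                         \
        if (!out.empty()) out += ' ';                     \
        out += option;                                    \
    }
#define SKETCH_END
    SKETCH_ISA_TABLE
#undef SKETCH_START
#undef SKETCH_ENTRY
#undef SKETCH_END
    return out;
}
```

Because both consumers expand the same list, adding an ISA means touching one place only, which is the property the change relies on to keep the driver and the tests in step.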
gcc/
PR target/95843
* common/config/i386/i386-isas.h: New file. Extracted from
gcc/config/i386/i386-builtins.c.
(_isa_names_table): Add option.
(ISA_NAMES_TABLE_START): New.
(ISA_NAMES_TABLE_END): Likewise.
(ISA_NAMES_TABLE_ENTRY): Likewise.
(isa_names_table): Defined with ISA_NAMES_TABLE_START,
ISA_NAMES_TABLE_END and ISA_NAMES_TABLE_ENTRY. Add more ISAs
from enum processor_features.
* config/i386/driver-i386.c: Include
"common/config/i386/cpuinfo.h" and
"common/config/i386/i386-isas.h".
(has_feature): New macro.
(host_detect_local_cpu): Call cpu_indicator_init to get CPU
features. Use has_feature to detect processor features. Call
get_intel_cpu to get the newer Intel CPU name. Use
isa_names_table to generate command-line options.
* config/i386/i386-builtins.c: Include
"common/config/i386/i386-isas.h".
(_arch_names_table): Removed.
(isa_names_table): Likewise.
gcc/testsuite/
PR target/95843
* gcc.target/i386/builtin_target.c: Include <stdlib.h>,
../../../common/config/i386/i386-cpuinfo.h and
../../../common/config/i386/cpuinfo.h.
(check_amd_cpu_model): Removed.
(check_intel_cpu_model): Likewise.
(CHECK___builtin_cpu_is): New.
(gcc_assert): New. Defined as assert.
(gcc_unreachable): New. Defined as abort.
(inline): New. Defined as empty.
(ISA_NAMES_TABLE_START): Likewise.
(ISA_NAMES_TABLE_END): Likewise.
(ISA_NAMES_TABLE_ENTRY): New.
(check_features): Include
"../../../common/config/i386/i386-isas.h".
(check_detailed): Call cpu_indicator_init. Always call
check_features. Call get_amd_cpu instead of check_amd_cpu_model.
Call get_intel_cpu instead of check_intel_cpu_model.
Both x86 backend and libgcc define enum processor_features. libgcc sets
enum processor_features and x86 backend checks enum processor_features.
They can easily get out of sync, and this has happened multiple times in
the past.
1. Move cpuinfo.h from libgcc to common/config/i386 so that we can share
the same enum processor_features in x86 backend and libgcc.
2. Change __cpu_features2 to an array to support more processor features.
3. Add more processor features to enum processor_features.
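
Why an array is needed once the feature count exceeds one word can be sketched as follows. This C++ fragment is a simplification under stated assumptions: it uses a single uniform word array and made-up feature names, whereas the real code keeps its own layout for __cpu_features2 and has its own has_cpu_feature and set_cpu_feature.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical feature numbers; only their positions matter here.
enum sketch_feature {
    SK_FEATURE_FIRST = 0,
    SK_FEATURE_WORD0_LAST = 31,
    SK_FEATURE_WORD1_FIRST = 32,
    SK_FEATURE_HIGH = 70,
    SK_FEATURE_MAX
};

// Number of 32-bit words needed to hold one bit per feature.
constexpr unsigned sketch_words = (SK_FEATURE_MAX + 31) / 32;

struct feature_bits {
    std::uint32_t word[sketch_words] = {};
};

// Set the bit for feature f in the word that holds it.
inline void set_feature(feature_bits &fb, sketch_feature f)
{
    fb.word[f / 32] |= std::uint32_t{1} << (f % 32);
}

// Test the bit for feature f.
inline bool has_feature(const feature_bits &fb, sketch_feature f)
{
    return ((fb.word[f / 32] >> (f % 32)) & 1u) != 0;
}

inline void check_feature_bits_examples()
{
    feature_bits fb;
    set_feature(fb, SK_FEATURE_WORD1_FIRST);
    assert(has_feature(fb, SK_FEATURE_WORD1_FIRST));
    assert(!has_feature(fb, SK_FEATURE_WORD0_LAST));
}
```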
gcc/
PR target/95259
* common/config/i386/cpuinfo.h: New file.
(__processor_model): Moved from libgcc/config/i386/cpuinfo.h.
(__processor_model2): New.
(CHECK___builtin_cpu_is): New. Defined as empty if not defined.
(has_cpu_feature): New function.
(set_cpu_feature): Likewise.
(get_amd_cpu): Moved from libgcc/config/i386/cpuinfo.c. Use
CHECK___builtin_cpu_is. Return AMD CPU name.
(get_intel_cpu): Moved from libgcc/config/i386/cpuinfo.c. Use
CHECK___builtin_cpu_is. Return Intel CPU name.
(get_available_features): Moved from libgcc/config/i386/cpuinfo.c.
Also check FEATURE_3DNOW, FEATURE_3DNOWP, FEATURE_ADX,
FEATURE_ABM, FEATURE_CLDEMOTE, FEATURE_CLFLUSHOPT, FEATURE_CLWB,
FEATURE_CLZERO, FEATURE_CMPXCHG16B, FEATURE_CMPXCHG8B,
FEATURE_ENQCMD, FEATURE_F16C, FEATURE_FSGSBASE, FEATURE_FXSAVE,
FEATURE_HLE, FEATURE_IBT, FEATURE_LAHF_LM, FEATURE_LM,
FEATURE_LWP, FEATURE_LZCNT, FEATURE_MOVBE, FEATURE_MOVDIR64B,
FEATURE_MOVDIRI, FEATURE_MWAITX, FEATURE_OSXSAVE,
FEATURE_PCONFIG, FEATURE_PKU, FEATURE_PREFETCHWT1, FEATURE_PRFCHW,
FEATURE_PTWRITE, FEATURE_RDPID, FEATURE_RDRND, FEATURE_RDSEED,
FEATURE_RTM, FEATURE_SERIALIZE, FEATURE_SGX, FEATURE_SHA,
FEATURE_SHSTK, FEATURE_TBM, FEATURE_TSXLDTRK, FEATURE_VAES,
FEATURE_WAITPKG, FEATURE_WBNOINVD, FEATURE_XSAVE, FEATURE_XSAVEC,
FEATURE_XSAVEOPT and FEATURE_XSAVES.
(cpu_indicator_init): Moved from libgcc/config/i386/cpuinfo.c.
Also update cpu_model2.
* common/config/i386/i386-cpuinfo.h (processor_vendor): Add
VENDOR_CENTAUR, VENDOR_CYRIX and VENDOR_NSC.
(processor_features): Moved from gcc/config/i386/i386-builtins.c.
Renamed F_XXX to FEATURE_XXX. Add FEATURE_3DNOW, FEATURE_3DNOWP,
FEATURE_ADX, FEATURE_ABM, FEATURE_CLDEMOTE, FEATURE_CLFLUSHOPT,
FEATURE_CLWB, FEATURE_CLZERO, FEATURE_CMPXCHG16B,
FEATURE_CMPXCHG8B, FEATURE_ENQCMD, FEATURE_F16C,
FEATURE_FSGSBASE, FEATURE_FXSAVE, FEATURE_HLE, FEATURE_IBT,
FEATURE_LAHF_LM, FEATURE_LM, FEATURE_LWP, FEATURE_LZCNT,
FEATURE_MOVBE, FEATURE_MOVDIR64B, FEATURE_MOVDIRI,
FEATURE_MWAITX, FEATURE_OSXSAVE, FEATURE_PCONFIG,
FEATURE_PKU, FEATURE_PREFETCHWT1, FEATURE_PRFCHW,
FEATURE_PTWRITE, FEATURE_RDPID, FEATURE_RDRND, FEATURE_RDSEED,
FEATURE_RTM, FEATURE_SERIALIZE, FEATURE_SGX, FEATURE_SHA,
FEATURE_SHSTK, FEATURE_TBM, FEATURE_TSXLDTRK, FEATURE_VAES,
FEATURE_WAITPKG, FEATURE_WBNOINVD, FEATURE_XSAVE, FEATURE_XSAVEC,
FEATURE_XSAVEOPT, FEATURE_XSAVES and CPU_FEATURE_MAX.
(SIZE_OF_CPU_FEATURES): New.
* config/i386/i386-builtins.c (processor_features): Removed.
(isa_names_table): Replace F_XXX with FEATURE_XXX.
(fold_builtin_cpu): Change __cpu_features2 to an array.
libgcc/
PR target/95259
* config/i386/cpuinfo.c: Don't include "cpuinfo.h". Include
"common/config/i386/i386-cpuinfo.h" and
"common/config/i386/cpuinfo.h".
(__cpu_features2): Changed to array.
(get_amd_cpu): Removed.
(get_intel_cpu): Likewise.
(get_available_features): Likewise.
(__cpu_indicator_init): Call cpu_indicator_init.
* config/i386/cpuinfo.h: Removed.
The parser for binary numbers returned an error if the entire string
contained more digits than the result type has bits. Leading zeros
should be ignored.
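
The intended behaviour can be sketched with a small stand-alone parser. This C++17 fragment is not the libstdc++ implementation: parse_binary_u8 is a hypothetical function for an 8-bit result, and it simply returns an empty optional on error instead of reporting std::errc values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical parser: base-2 digits into an 8-bit unsigned value.
// Leading zeros do not count towards the width of the result type.
std::optional<std::uint8_t> parse_binary_u8(std::string_view s)
{
    std::size_t i = 0;
    while (i < s.size() && s[i] == '0')
        ++i;                          // skip leading zeros
    const std::size_t first = i;      // first significant digit
    unsigned value = 0;
    while (i < s.size() && (s[i] == '0' || s[i] == '1')) {
        if (i - first >= 8)
            return std::nullopt;      // more than 8 significant digits
        value = value * 2 + static_cast<unsigned>(s[i] - '0');
        ++i;
    }
    if (i == 0)
        return std::nullopt;          // no digits at all
    return static_cast<std::uint8_t>(value);
}

inline void check_binary_parser_examples()
{
    // Sixteen characters, but only one significant digit.
    assert(parse_binary_u8("0000000000000001").value() == 1);
    // Nine significant digits do not fit in 8 bits.
    assert(!parse_binary_u8("100000000").has_value());
}
```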
libstdc++-v3/ChangeLog:
* include/std/charconv (__from_chars_binary): Ignore leading zeros.
* testsuite/20_util/from_chars/1.cc: Check "0x1" for all bases,
not just 10 and 16.
* testsuite/20_util/from_chars/3.cc: New test.
The __detail::__to_chars_2 function assumes it won't be called with zero
values. However, when the output buffer is empty the caller doesn't
handle zero values correctly, and calls __to_chars_2 with a zero value,
resulting in an overflow of the empty buffer.
The __detail::__to_chars_i function should just return immediately for
an empty buffer, and otherwise ensure zero values are handled properly.
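
The ordering of the two checks can be sketched as follows. This is a hypothetical decimal-only C++ formatter written for illustration; the names to_chars_sketch and to_chars_sketch_result are invented and the body is not taken from libstdc++.

```cpp
#include <cassert>
#include <limits>

struct to_chars_sketch_result {
    char *ptr;  // one past the last character written, or last on failure
    bool ok;
};

// Hypothetical formatter for unsigned values in base 10.
to_chars_sketch_result to_chars_sketch(char *first, char *last, unsigned value)
{
    if (first == last)
        return { last, false };       // empty buffer: checked before anything else
    if (value == 0) {
        *first = '0';                 // zero needs exactly one character
        return { first + 1, true };
    }
    char tmp[std::numeric_limits<unsigned>::digits10 + 1];
    int n = 0;
    while (value != 0) {              // digits are produced in reverse order
        tmp[n++] = static_cast<char>('0' + value % 10);
        value /= 10;
    }
    if (last - first < n)
        return { last, false };       // not enough room
    while (n > 0)
        *first++ = tmp[--n];
    return { first, true };
}

inline void check_to_chars_examples()
{
    char buf[4];
    // Zero with an empty range must fail without writing anything.
    assert(!to_chars_sketch(buf, buf, 0u).ok);
}
```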
libstdc++-v3/ChangeLog:
PR libstdc++/95851
* include/std/charconv (__to_chars_i): Check for zero-sized
buffer unconditionally.
* testsuite/20_util/to_chars/95851.cc: New test.
In i386-builtins.c, arch_names_table is used to map architecture name
string to internal model. A switch statement is used to map internal
processor name to architecture name string and internal priority.
model and priority are added to processor_alias_table so that a single
entry contains architecture name string, internal processor name,
internal model and internal priority. 6 entries are appended for
i386-builtins.c, which have special architecture name strings: amd,
amdfam10h, amdfam15h, amdfam17h, shanghai and istanbul, and pta_size is
adjusted to exclude them. Entries which are not used by i386-builtins.c
have internal model 0. P_PROC_DYNAMIC is added to internal priority to
mark entries with dynamic architecture name string or priority.
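
The single-table layout described above can be sketched roughly as below. All names and numbers in this C++ fragment are made up for illustration: the entries, model values and priorities are not those of processor_alias_table, and only two special entries are appended instead of six.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// One row carries the name, the internal processor, the model and the priority.
struct alias_entry {
    const char *name;
    int processor;
    int model;     // 0 means: not used by the builtin lookup
    int priority;
};

// Invented values; the special names are appended at the end.
static const alias_entry alias_table[] = {
    { "generic",  0, 0,  0 },
    { "haswell",  1, 11, 5 },
    { "znver1",   2, 12, 6 },
    { "amd",      3, 20, 0 },  // special: builtin lookup only
    { "shanghai", 3, 21, 0 },  // special: builtin lookup only
};

constexpr std::size_t num_arch_names_sketch =
    sizeof(alias_table) / sizeof(alias_table[0]);
constexpr std::size_t num_special_sketch = 2;
// The option-facing size excludes the appended special entries.
constexpr std::size_t pta_size_sketch = num_arch_names_sketch - num_special_sketch;

// Search the first `limit` rows for a name.
const alias_entry *find_entry(const char *name, std::size_t limit)
{
    for (std::size_t i = 0; i < limit; ++i)
        if (std::strcmp(alias_table[i].name, name) == 0)
            return &alias_table[i];
    return nullptr;
}

// Names accepted as an architecture option: special entries are excluded.
bool valid_arch_option(const char *name)
{
    return find_entry(name, pta_size_sketch) != nullptr;
}

// Model for the builtin lookup: the whole table is searched, model 0 is skipped.
int builtin_model(const char *name)
{
    const alias_entry *e = find_entry(name, num_arch_names_sketch);
    return (e != nullptr && e->model != 0) ? e->model : -1;
}
```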
PR target/95842
* common/config/i386/i386-common.c (processor_alias_table): Add
processor model and priority to each entry.
(pta_size): Updated with -6.
(num_arch_names): New.
* common/config/i386/i386-cpuinfo.h: New file.
* config/i386/i386-builtins.c (feature_priority): Removed.
(processor_model): Likewise.
(_arch_names_table): Likewise.
(arch_names_table): Likewise.
(_isa_names_table): Replace P_ZERO with P_NONE.
(get_builtin_code_for_version): Replace P_ZERO with P_NONE. Use
processor_alias_table.
(fold_builtin_cpu): Replace arch_names_table with
processor_alias_table.
* config/i386/i386.h: Include "common/config/i386/i386-cpuinfo.h".
(pta): Add model and priority.
(num_arch_names): New.
This makes sure to emit SLP vectorized loads where the first scalar
load is. This makes SLP dependence checking more powerful because
hoisting loads can use TBAA and it increases the freedom for
vector placement when there are constraints from live lanes.
Vectorized shifts block inserting vectorized stmts always after
vectorized defs because it ends up using the original scalar
operand even when the SLP graph indicates the shift operand
is vectorized (and we actually emit and cost those stmts).
vect_slp_analyze_and_verify_node_alignment shows we need alignment
for too many places. This is a temporary solution, and my plan
is to have a single meta-info for a dataref group instead
(also getting rid of DR_GROUP_FIRST/NEXT_ELEMENT).
2020-06-24 Richard Biener <rguenther@suse.de>
* tree-vectorizer.h (vect_find_first_scalar_stmt_in_slp):
Declare.
* tree-vect-data-refs.c (vect_preserves_scalar_order_p):
Simplify for new position of vectorized SLP loads.
(vect_slp_analyze_node_dependences): Adjust for it.
(vect_slp_analyze_and_verify_node_alignment): Compute alignment
for the first stmts dataref.
* tree-vect-slp.c (vect_find_first_scalar_stmt_in_slp): New.
(vect_schedule_slp_instance): Emit loads before the
first scalar stmt.
* tree-vect-stmts.c (vectorizable_load): Do what the comment
says and use vect_find_first_scalar_stmt_in_slp.
The following adjusts vect_stmt_dominates_stmt_p to honor out-of-region
stmts we run into which have UID -1u.
2020-06-24 Richard Biener <rguenther@suse.de>
PR tree-optimization/95856
* tree-vectorizer.c (vect_stmt_dominates_stmt_p): Honor
region marker -1u.
* gcc.dg/vect/pr95856.c: New testcase.
We folded A <= 0 ? A : -A into -ABS (A), which is incorrect for signed
integral types: on INT_MIN it can invoke UB twice, once on ABS and once
on its negation.
The following patch fixes it by instead folding it to (type)-ABSU (A).
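
The difference between the two foldings can be sketched in C++ as follows. The function names are hypothetical, and the unsigned arithmetic stands in for ABSU. The final conversion back to int is modular in C++20 and implementation-defined before that (GCC wraps), which is an assumption of this sketch.

```cpp
#include <cassert>
#include <climits>

// The source-level expression being folded; well defined for every int,
// because INT_MIN takes the first branch and is never negated.
constexpr int original_form(int a)
{
    return a <= 0 ? a : -a;
}

// Sketch of (type)-ABSU (A): take the absolute value as unsigned, negate in
// unsigned arithmetic (which wraps instead of overflowing), then convert back.
constexpr int neg_absu(int a)
{
    const unsigned absu = a < 0 ? 0u - static_cast<unsigned>(a)
                                : static_cast<unsigned>(a);
    return static_cast<int>(0u - absu);
}

inline void check_neg_absu_examples()
{
    assert(neg_absu(5) == -5);
    assert(neg_absu(-5) == -5);
    // INT_MIN maps to itself without any signed overflow along the way.
    assert(neg_absu(INT_MIN) == INT_MIN);
}
```

By contrast, a signed -abs(a) would overflow for INT_MIN both when computing the absolute value and when negating it.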
2020-06-24 Jakub Jelinek <jakub@redhat.com>
PR middle-end/95810
* fold-const.c (fold_cond_expr_with_comparison): Optimize
A <= 0 ? A : -A into (type)-absu(A) rather than -abs(A).
* gcc.dg/ubsan/pr95810.c: New test.