When SVE is enabled, GCC needs to do a three-way comparison between scalar, Advanced SIMD and SVE code. The normal costs tend to be latency-based, which is well-suited to SLP. However, comparing sums of latency costs means that we effectively treat the code as executing sequentially. This can hide the effect of pipeline bubbles or resource contention that in practice are quite important for loop vectorisation. This is particularly true for loops that involve reductions.

This patch therefore tries to estimate how quickly each piece of code could issue, using a very (very) simplistic model. It then uses this to adjust the loop vector costs up or down as appropriate. Part of the Advanced SIMD vs. SVE adjustment is opt-in and is not enabled by default even for use_new_vector_costs.

As with the previous patches, this one only becomes active if a CPU selects use_new_vector_costs. It should therefore have a very low impact on other CPUs. The code also mostly ignores CPUs that have no issue information, even if use_new_vector_costs is enabled for some reason.

gcc/
	* config/aarch64/aarch64.opt
	(-param=aarch64-loop-vect-issue-rate-niters=): New parameter.
	* doc/invoke.texi: Document it.
	* config/aarch64/aarch64-protos.h (aarch64_base_vec_issue_info)
	(aarch64_scalar_vec_issue_info, aarch64_simd_vec_issue_info)
	(aarch64_advsimd_vec_issue_info, aarch64_sve_vec_issue_info)
	(aarch64_vec_issue_info): New structures.
	(cpu_vector_cost): Write comments above the variables rather
	than to the side.
	(cpu_vector_cost::issue_info): New member variable.
	* config/aarch64/aarch64.c: Include gimple-pretty-print.h
	and tree-ssa-loop-niter.h.
	(generic_vector_cost, a64fx_vector_cost, qdf24xx_vector_cost)
	(thunderx_vector_cost, tsv110_vector_cost, cortexa57_vector_cost)
	(exynosm1_vector_cost, xgene1_vector_cost, thunderx2t99_vector_cost)
	(thunderx3t110_vector_cost): Initialize issue_info to null.
	(neoversev1_scalar_issue_info, neoversev1_advsimd_issue_info)
	(neoversev1_sve_issue_info, neoversev1_vec_issue_info): New structures.
	(neoversev1_vector_cost): Use them.
	(aarch64_vec_op_count, aarch64_sve_op_count): New structures.
	(aarch64_vector_costs::saw_sve_only_op): New member variable.
	(aarch64_vector_costs::num_vector_iterations): Likewise.
	(aarch64_vector_costs::scalar_ops): Likewise.
	(aarch64_vector_costs::advsimd_ops): Likewise.
	(aarch64_vector_costs::sve_ops): Likewise.
	(aarch64_vector_costs::seen_loads): Likewise.
	(aarch64_simd_vec_costs_for_flags): New function.
	(aarch64_analyze_loop_vinfo): Initialize num_vector_iterations.
	Count the number of predicate operations required by SVE WHILE
	instructions.
	(aarch64_comparison_type, aarch64_multiply_add_p): New functions.
	(aarch64_sve_only_stmt_p, aarch64_in_loop_reduction_latency): Likewise.
	(aarch64_count_ops): Likewise.
	(aarch64_add_stmt_cost): Record whether we see an SVE operation
	that cannot currently be implemented using Advanced SIMD.
	Record issue information about the scalar, Advanced SIMD
	and (where relevant) SVE versions of a loop.
	(aarch64_vec_op_count::dump): New function.
	(aarch64_sve_op_count::dump): Likewise.
	(aarch64_estimate_min_cycles_per_iter): Likewise.
	(aarch64_adjust_body_cost): If issue information is available,
	try to compare the issue rates of the various loop implementations
	and increase or decrease the vector body cost accordingly.

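
The issue-rate idea described in the commit message can be illustrated with a small standalone sketch. This is not the code from the patch: the types `OpCounts` and `IssueInfo`, the functions `min_cycles_per_iter` and `adjust_body_cost`, and the two-resource model are all hypothetical simplifications chosen for illustration. The sketch assumes that an iteration cannot complete faster than its most contended resource allows, nor faster than its loop-carried reduction latency, and that the vector body cost is only scaled up when the vector loop issues no faster than the equivalent number of scalar iterations.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical per-iteration operation counts (illustrative names only).
struct OpCounts {
  unsigned loads_stores;     // memory operations per iteration
  unsigned general_ops;      // all other operations per iteration
  double reduction_latency;  // cycles of loop-carried dependency, 0 if none
};

// Hypothetical per-cycle issue limits for a core.
struct IssueInfo {
  unsigned loads_stores_per_cycle;
  unsigned general_ops_per_cycle;
};

// Lower bound on cycles per iteration: the slowest of the two issue
// resources and the loop-carried reduction chain.
double min_cycles_per_iter(const OpCounts &ops, const IssueInfo &issue) {
  double mem = static_cast<double>(ops.loads_stores)
               / issue.loads_stores_per_cycle;
  double gen = static_cast<double>(ops.general_ops)
               / issue.general_ops_per_cycle;
  return std::max({mem, gen, ops.reduction_latency});
}

// Scale the vector body cost up if one vector iteration takes longer to
// issue than the `vf` scalar iterations it replaces; otherwise leave it.
unsigned adjust_body_cost(unsigned body_cost, double scalar_cycles_per_iter,
                          double vector_cycles_per_iter, unsigned vf) {
  double scalar_equiv = scalar_cycles_per_iter * vf;
  if (vector_cycles_per_iter <= scalar_equiv)
    return body_cost;
  double ratio = vector_cycles_per_iter / scalar_equiv;
  return static_cast<unsigned>(std::ceil(body_cost * ratio));
}
```

In this sketch a reduction loop whose vector form is bounded by its reduction latency keeps its cost when it still beats the scalar loop, and has its cost multiplied by the cycle ratio when it does not. The real patch works from per-CPU issue information and distinguishes scalar, Advanced SIMD and SVE code, which this two-way comparison does not attempt to model.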
||
|---|---|---|
| c++tools | ||
| config | ||
| contrib | ||
| fixincludes | ||
| gcc | ||
| gnattools | ||
| gotools | ||
| include | ||
| INSTALL | ||
| intl | ||
| libada | ||
| libatomic | ||
| libbacktrace | ||
| libcc1 | ||
| libcody | ||
| libcpp | ||
| libdecnumber | ||
| libffi | ||
| libgcc | ||
| libgfortran | ||
| libgo | ||
| libgomp | ||
| libhsail-rt | ||
| libiberty | ||
| libitm | ||
| libobjc | ||
| liboffloadmic | ||
| libphobos | ||
| libquadmath | ||
| libsanitizer | ||
| libssp | ||
| libstdc++-v3 | ||
| libvtv | ||
| lto-plugin | ||
| maintainer-scripts | ||
| zlib | ||
| .dir-locals.el | ||
| .gitattributes | ||
| .gitignore | ||
| ABOUT-NLS | ||
| ar-lib | ||
| ChangeLog | ||
| ChangeLog.jit | ||
| ChangeLog.tree-ssa | ||
| compile | ||
| config-ml.in | ||
| config.guess | ||
| config.rpath | ||
| config.sub | ||
| configure | ||
| configure.ac | ||
| COPYING | ||
| COPYING3 | ||
| COPYING3.LIB | ||
| COPYING.LIB | ||
| COPYING.RUNTIME | ||
| depcomp | ||
| install-sh | ||
| libtool-ldflags | ||
| libtool.m4 | ||
| lt~obsolete.m4 | ||
| ltgcc.m4 | ||
| ltmain.sh | ||
| ltoptions.m4 | ||
| ltsugar.m4 | ||
| ltversion.m4 | ||
| MAINTAINERS | ||
| Makefile.def | ||
| Makefile.in | ||
| Makefile.tpl | ||
| missing | ||
| mkdep | ||
| mkinstalldirs | ||
| move-if-change | ||
| multilib.am | ||
| README | ||
| symlink-tree | ||
| test-driver | ||
| ylwrap | ||
This directory contains the GNU Compiler Collection (GCC).

The GNU Compiler Collection is free software. See the files whose names start with COPYING for copying permission. The manuals, and some of the runtime libraries, are under different terms; see the individual source files for details.

The directory INSTALL contains copies of the installation information as HTML and plain text. The source of this information is gcc/doc/install.texi. The installation information includes details of what is included in the GCC sources and what files GCC installs.

See the file gcc/doc/gcc.texi (together with other files that it includes) for usage and porting information. An online readable version of the manual is in the files gcc/doc/gcc.info*.

See http://gcc.gnu.org/bugs/ for how to report bugs usefully.

Copyright years on GCC source files may be listed using range notation, e.g., 1987-2012, indicating that every year in the range, inclusive, is a copyrightable year that could otherwise be listed individually.