[v3,0/5] Add support for approximate instructions and optimize divf/sqrtf/rsqrtf operations.

Message ID 20231206070453.3252-1-xujiahao@loongson.cn

Message

Jiahao Xu Dec. 6, 2023, 7:04 a.m. UTC
  LoongArch V1.1 adds approximate instructions. Combined with additional Newton-Raphson
steps, they are used to implement single-precision floating-point division, square root and
reciprocal square root operations with better throughput.
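
As a rough illustration of the refinement idea (not the code the patches generate), the sketch below applies one Newton-Raphson step to a low-precision estimate. All function names here are hypothetical, and the estimates x0 and y0 are passed in as plain parameters standing in for whatever an approximate instruction would return.

```c
/* One Newton-Raphson step for 1/b, given an estimate x0 ~= 1/b:
   x1 = x0 * (2 - b * x0).  The relative error is roughly squared. */
static float recip_step(float b, float x0)
{
    return x0 * (2.0f - b * x0);
}

/* One Newton-Raphson step for 1/sqrt(a), given an estimate y0 ~= 1/sqrt(a):
   y1 = y0 * (1.5 - 0.5 * a * y0 * y0). */
static float rsqrt_step(float a, float y0)
{
    return y0 * (1.5f - 0.5f * a * y0 * y0);
}

/* a / b computed as a * (refined 1/b).  x0 is the caller-supplied estimate
   (hypothetical stand-in for a hardware reciprocal estimate). */
static float div_nr(float a, float b, float x0)
{
    return a * recip_step(b, x0);
}

/* sqrt(a) computed as a * (refined 1/sqrt(a)).  Only meaningful for finite
   a > 0: for a == 0 the reciprocal square root is infinite and 0 * inf is NaN,
   so a real implementation needs to handle that case separately. */
static float sqrt_nr(float a, float y0)
{
    return a * rsqrt_step(a, y0);
}
```

For example, div_nr(1.0f, 3.0f, 0.3333f) starts from an estimate with a relative error of about 1e-4 and returns a quotient that is accurate to roughly single precision. Whether one step is enough in practice depends on the accuracy of the initial estimate, which is not stated here.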

This series builds on the following patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/639243.html

Jiahao Xu (5):
  LoongArch: Add support for LoongArch V1.1 approximate instructions.
  LoongArch: Use standard pattern name for xvfrsqrt/vfrsqrt
    instructions.
  LoongArch: Redefine pattern for xvfrecip/vfrecip instructions.
  LoongArch: New options -mrecip and -mrecip= with ffast-math.
  LoongArch: Vectorized loop unrolling is disable for divf/sqrtf/rsqrtf
    when -mrecip is enabled.

 gcc/config/loongarch/genopts/isa-evolution.in |   1 +
 gcc/config/loongarch/genopts/loongarch.opt.in |  11 +
 gcc/config/loongarch/larchintrin.h            |  38 +++
 gcc/config/loongarch/lasx.md                  |  89 ++++++-
 gcc/config/loongarch/lasxintrin.h             |  34 +++
 gcc/config/loongarch/loongarch-builtins.cc    |  66 +++++
 gcc/config/loongarch/loongarch-c.cc           |   3 +
 gcc/config/loongarch/loongarch-cpucfg-map.h   |   1 +
 gcc/config/loongarch/loongarch-def.cc         |   3 +-
 gcc/config/loongarch/loongarch-protos.h       |   2 +
 gcc/config/loongarch/loongarch-str.h          |   1 +
 gcc/config/loongarch/loongarch.cc             | 252 +++++++++++++++++-
 gcc/config/loongarch/loongarch.h              |  18 ++
 gcc/config/loongarch/loongarch.md             | 104 ++++++--
 gcc/config/loongarch/loongarch.opt            |  15 ++
 gcc/config/loongarch/lsx.md                   |  89 ++++++-
 gcc/config/loongarch/lsxintrin.h              |  34 +++
 gcc/config/loongarch/predicates.md            |   8 +
 gcc/doc/extend.texi                           |  35 +++
 gcc/doc/invoke.texi                           |  54 ++++
 gcc/testsuite/gcc.target/loongarch/divf.c     |  10 +
 .../loongarch/larch-frecipe-builtin.c         |  28 ++
 .../gcc.target/loongarch/recip-divf.c         |   9 +
 .../gcc.target/loongarch/recip-sqrtf.c        |  23 ++
 gcc/testsuite/gcc.target/loongarch/sqrtf.c    |  24 ++
 .../loongarch/vector/lasx/lasx-divf.c         |  13 +
 .../vector/lasx/lasx-frecipe-builtin.c        |  30 +++
 .../loongarch/vector/lasx/lasx-recip-divf.c   |  12 +
 .../loongarch/vector/lasx/lasx-recip-sqrtf.c  |  28 ++
 .../loongarch/vector/lasx/lasx-recip.c        |  24 ++
 .../loongarch/vector/lasx/lasx-rsqrt.c        |  26 ++
 .../loongarch/vector/lasx/lasx-sqrtf.c        |  29 ++
 .../loongarch/vector/lsx/lsx-divf.c           |  13 +
 .../vector/lsx/lsx-frecipe-builtin.c          |  30 +++
 .../loongarch/vector/lsx/lsx-recip-divf.c     |  12 +
 .../loongarch/vector/lsx/lsx-recip-sqrtf.c    |  28 ++
 .../loongarch/vector/lsx/lsx-recip.c          |  24 ++
 .../loongarch/vector/lsx/lsx-rsqrt.c          |  26 ++
 .../loongarch/vector/lsx/lsx-sqrtf.c          |  29 ++
 39 files changed, 1234 insertions(+), 42 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/divf.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/larch-frecipe-builtin.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/recip-divf.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/recip-sqrtf.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/sqrtf.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-divf.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-frecipe-builtin.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-recip-divf.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-recip-sqrtf.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-recip.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-rsqrt.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-sqrtf.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-divf.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-frecipe-builtin.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-recip-divf.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-recip-sqrtf.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-recip.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-rsqrt.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-sqrtf.c
  

Comments

Jiahao Xu Dec. 6, 2023, 8:25 a.m. UTC | #1
On 2023/12/6 3:04 PM, Jiahao Xu wrote:
> LoongArch V1.1 adds approximate instructions. Combined with additional Newton-Raphson
> steps, they are used to implement single-precision floating-point division, square root and
> reciprocal square root operations with better throughput.
>
> This series builds on the following patch:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-December/639243.html

The changes in version 3 compared to the previous version include:

* Enable -mfrecipe when using -march=la664.
* Implement builtin functions for the frecipe and frsqrte instructions and
  introduce a new builtin macro "__loongarch_frecipe".
* Add corresponding test cases for the implemented builtin functions.
* Document the usage of the new intrinsic functions and builtin functions
  in extend.texi.
* Add negative tests for scenarios where the -mrecip option is not enabled.
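
A minimal sketch of how user code might key off the new macro is shown below. The macro name "__loongarch_frecipe" comes from the list above. The header <larchintrin.h> appears in the diffstat, but the intrinsic name __frecipe_s is an assumption here and should be checked against extend.texi. The wrapper recip_estimate is hypothetical.

```c
#ifdef __loongarch_frecipe
#include <larchintrin.h>
#endif

/* Hypothetical wrapper: returns an estimate of 1/x.
   With the macro defined, it uses the approximate instruction through an
   intrinsic (name assumed, see the note above); otherwise it falls back
   to an ordinary division. */
static inline float recip_estimate(float x)
{
#ifdef __loongarch_frecipe
    return __frecipe_s(x);      /* assumed intrinsic name */
#else
    return 1.0f / x;            /* portable fallback */
#endif
}
```

On targets where the macro is not defined, the fallback path is compiled and the result is the ordinary quotient.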
  
chenglulu Dec. 8, 2023, 8:31 a.m. UTC | #2
Pushed to r14-6311...r14-6315.

On 2023/12/6 3:04 PM, Jiahao Xu wrote:
> LoongArch V1.1 adds approximate instructions. Combined with additional Newton-Raphson
> steps, they are used to implement single-precision floating-point division, square root and
> reciprocal square root operations with better throughput.
>
> This series builds on the following patch:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-December/639243.html