Message ID | 20230713140904.3274306-1-manolis.tsamis@vrull.eu |
---|---|
Headers |
Return-Path: <gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:a6b2:0:b0:3e4:2afc:c1 with SMTP id c18csp1851261vqm; Thu, 13 Jul 2023 07:10:10 -0700 (PDT) X-Google-Smtp-Source: APBJJlHFqfeKyTrC7SBwemtdcO0575LhjBbvvAIZKAE//WlV9pRAuRCpC3YoWG1Eerlx2ky/q/S0 X-Received: by 2002:a17:906:ce3a:b0:988:9d0f:db52 with SMTP id sd26-20020a170906ce3a00b009889d0fdb52mr1574640ejb.35.1689257409836; Thu, 13 Jul 2023 07:10:09 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1689257409; cv=none; d=google.com; s=arc-20160816; b=yjTnZjYVonul3clLVX5YjhgJl0ojDlXq3AehMoGAs/4PrwI7bmBxleY2SdOkW5+lnA nLlDXj2aaH/mMNjp8VCsJ7V801ImtTxUX+H0tT4ymNsaB+ghH4IdNm0MKdBm/YT9hdne O59Ua4+5pZLFUbN5GRUqeIimFihC/D+f4N1dxvIT5144pVrEBzauksRA3l8csr64hcZ1 jcZJt8+7ETBfWka6y46jlp1RIOhg2jGDmRy4glDjlOOF9fLEWt5V+RcEP8N0LLCR+nhh HG+trx9IIRInhVaPR24AxVleMpuyqb4uCZVCW/ktYkvbM8sWPhqJQxWqljsN/Ugv6Y4I wiiw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:content-transfer-encoding :mime-version:message-id:date:subject:cc:to:from:dkim-signature :dmarc-filter:delivered-to; bh=jDFDoTfa3dTCWYI3+YSIxZM9s2cTPJcQWeBZPVLunUs=; fh=D6Yxz1I1DGJ9hxMEwgeUclSSttrsgrGLtgCEiAZlEzc=; b=VR80b2ZqDJUSfwQbOCpSxBZQ0ZxwPlReoR+QTsWU5YAhkKB6jCshLD3iXjM7VLTuDv pYZDprqyJ1r7/eJqSvs/ZrVbfu28YvLvKgxffTZpfwDigbqtjPqtSxfvNTRZnEpXliIy d7eO+i3Hh3IkMZG2rrjHIsQzbriPaO4Ja71qsbu0oc1NLUoLMacMijb9n8R5y2mb9vj5 QNcD7pofPXDwadWHk+f7yT2Z+3sygF2YCLz5eUzOpmKrOw0QWxBZpYGejvHYD5PoFK+b 5di+pFjPV27IcT8Ve1K15GBH4zLUmKlG+YlzHrJ7fNrBfWYk9ORdVXTjrmfRHWgi6ogS V/IQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@vrull.eu header.s=google header.b=oSfM9FJn; spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 2620:52:3:1:0:246e:9693:128c as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org" Received: from server2.sourceware.org (server2.sourceware.org. [2620:52:3:1:0:246e:9693:128c]) by mx.google.com with ESMTPS id g16-20020a17090613d000b00987e40fd473si113830ejc.1019.2023.07.13.07.10.09 for <ouuuleilei@gmail.com> (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 13 Jul 2023 07:10:09 -0700 (PDT) Received-SPF: pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 2620:52:3:1:0:246e:9693:128c as permitted sender) client-ip=2620:52:3:1:0:246e:9693:128c; Authentication-Results: mx.google.com; dkim=pass header.i=@vrull.eu header.s=google header.b=oSfM9FJn; spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 2620:52:3:1:0:246e:9693:128c as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org" Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id D1FF63857345 for <ouuuleilei@gmail.com>; Thu, 13 Jul 2023 14:09:46 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mail-lf1-x132.google.com (mail-lf1-x132.google.com [IPv6:2a00:1450:4864:20::132]) by sourceware.org (Postfix) with ESMTPS id 538D43858D32 for <gcc-patches@gcc.gnu.org>; Thu, 13 Jul 2023 14:09:18 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 538D43858D32 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=vrull.eu Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=vrull.eu Received: by mail-lf1-x132.google.com with SMTP id 2adb3069b0e04-4fbf09a9139so1297682e87.2 for <gcc-patches@gcc.gnu.org>; Thu, 13 Jul 2023 07:09:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=vrull.eu; s=google; t=1689257356; x=1691849356; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:from:to:cc:subject:date:message-id:reply-to; bh=jDFDoTfa3dTCWYI3+YSIxZM9s2cTPJcQWeBZPVLunUs=; b=oSfM9FJnH/LKN38XU8/N7kfoXPOaUJ++JtKMl6DdzQ6T4Nnt7ODYFFZfT3ai9BcNFZ Jl7tSFZ7GinXKlioxQsqdcYVXba4zdcGkP/f7X53GHykc/qU7io58q8d+pBuB5lr+hkG ZY5pAsDdJcWUdXQFCIQQCC2bsJtcTIlTHPYBav/6OYoHDDrAQ0xMxPc4LtfWKd5qmYUp Xa/VMENrKxvLTg9FD5EiTglL8EOiQ1XZCLq/WJEbFlscgr6kVlR4580xyKxKk7iVDaeO v1Xx+W8Py1eb+pRqvKZ/NqKtM7jua3g5qdjoN/GKpAVxLM9GltBy7dQHZoTTOE78fMMV 3QUQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1689257356; x=1691849356; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=jDFDoTfa3dTCWYI3+YSIxZM9s2cTPJcQWeBZPVLunUs=; b=jgcT6rVi0jVvUtvvXxwtiOQHoBk+Sq4eU2oge6ZwX9NxUyWxK050TxTKapmYx/QKBq G252ermyGPWCFDl3tDrAv/HTh3fvKVHq4aFD1sUE7dmH8DgWssOtqBwZLszEXIoGI9rB Bl1IBDO40kVV1gQ2w+G5/vyXEFw3I4/IiO9Aj5tfanly1WDcYiReIb/+b3RvkHa8WcDV KJF+7PoaXrsrnVz/GpIxXF5kTxWOc8MqKAmWjOGoh+4MMbYkUx75QiWgl57iwOghVyAf VOUTgPqsdzofxIZ9NY04AUBl4q+izeyxvlLWdWS3cO83t+7woP64HguXnrZYsMimxK4H 50EQ== X-Gm-Message-State: ABy/qLbloEj3qQBRfM7zSzsQSsIxXzczLMQRHAE84Fb647FATUMhHrws mtGzCTk1fVSEszcSH01ynszE9oAdMOSeGaMBk/A= X-Received: by 2002:a2e:9992:0:b0:2b6:d77b:92b8 with SMTP id w18-20020a2e9992000000b002b6d77b92b8mr1613235lji.16.1689257355847; Thu, 13 Jul 2023 07:09:15 -0700 (PDT) Received: from helsinki-03.engr ([2a01:4f9:6b:2a47::2]) by smtp.gmail.com with ESMTPSA id d9-20020a2e8909000000b002b6e3337fd5sm1528181lji.7.2023.07.13.07.09.14 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 13 Jul 2023 07:09:15 -0700 (PDT) From: Manolis Tsamis <manolis.tsamis@vrull.eu> To: gcc-patches@gcc.gnu.org Cc: Philipp Tomsich <philipp.tomsich@vrull.eu>, Jakub Jelinek <jakub@redhat.com>, Andrew Pinski <apinski@marvell.com>, Robin Dapp <rdapp@linux.ibm.com>, Manolis Tsamis <manolis.tsamis@vrull.eu> Subject: [PATCH v2 0/2] ifcvt: Allow if conversion of arithmetic in basic blocks with multiple sets Date: Thu, 13 Jul 2023 16:09:02 +0200 Message-Id: <20230713140904.3274306-1-manolis.tsamis@vrull.eu> X-Mailer: git-send-email 2.34.1 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-5.2 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, JMQ_SPF_NEUTRAL, KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org> List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>, <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe> List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/> List-Post: <mailto:gcc-patches@gcc.gnu.org> List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help> List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>, <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe> Errors-To: gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org Sender: "Gcc-patches" <gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org> X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1771314777550546195 X-GMAIL-MSGID: 1771314777550546195 |
Series |
ifcvt: Allow if conversion of arithmetic in basic blocks with multiple sets
|
|
Message
Manolis Tsamis
July 13, 2023, 2:09 p.m. UTC
noce_convert_multiple_sets has been introduced and extended over time to handle if conversion for blocks with multiple sets. Currently this is focused on register moves and rejects any sort of arithmetic operations. This series is an extension to allow more sequences to take part in if conversion. The first patch is a required change to emit correct code and the second patch whitelists a larger number of operations through bb_ok_for_noce_convert_multiple_sets. For targets that have a rich selection of conditional instructions, like aarch64, I have seen an ~5x increase of profitable if conversions for multiple set blocks in SPEC benchmarks. Also tested with a wide variety of benchmarks and I have not seen performance regressions on either x64 / aarch64. Some samples that previously resulted in a branch but now better use these instructions can be seen in the provided test case. Tested on aarch64 and x64; On x64 some tests that use __builtin_rint are failing with an ICE but I believe that it's not an issue of this change. force_operand crashes when (and:DF (not:DF (reg:DF 88)) (reg/v:DF 83 [ x ])) is provided through emit_conditional_move. Changes in v2: - Change "conditional moves" to "conditional instructions" in bb_ok_for_noce_convert_multiple_sets's comment. Manolis Tsamis (2): ifcvt: handle sequences that clobber flags in noce_convert_multiple_sets ifcvt: Allow more operations in multiple set if conversion gcc/ifcvt.cc | 109 ++++++++++-------- .../aarch64/ifcvt_multiple_sets_arithm.c | 67 +++++++++++ 2 files changed, 127 insertions(+), 49 deletions(-) create mode 100644 gcc/testsuite/gcc.target/aarch64/ifcvt_multiple_sets_arithm.c
Comments
Manolis Tsamis <manolis.tsamis@vrull.eu> writes: > noce_convert_multiple_sets has been introduced and extended over time to handle > if conversion for blocks with multiple sets. Currently this is focused on > register moves and rejects any sort of arithmetic operations. > > This series is an extension to allow more sequences to take part in if > conversion. The first patch is a required change to emit correct code and the > second patch whitelists a larger number of operations through > bb_ok_for_noce_convert_multiple_sets. > > For targets that have a rich selection of conditional instructions, > like aarch64, I have seen an ~5x increase of profitable if conversions for > multiple set blocks in SPEC benchmarks. Also tested with a wide variety of > benchmarks and I have not seen performance regressions on either x64 / aarch64. Interesting results. Are you free to say which target you used for aarch64? If I've understood the cost heuristics correctly, we'll allow a "predictable" branch to be replaced by up to 5 simple conditional instructions and an "unpredictable" branch to be replaced by up to 10 simple conditional instructions. That seems pretty high. And I'm not sure how well we guess predictability in the absence of real profile information. So my gut instinct was that the limitations of the current code might be saving us from overly generous limits. It sounds from your results like that might not be the case though. Still, if it does turn out to be the case in future, I agree we should fix the costs rather than hamstring the code. > Some samples that previously resulted in a branch but now better use these > instructions can be seen in the provided test case. > > Tested on aarch64 and x64; On x64 some tests that use __builtin_rint are > failing with an ICE but I believe that it's not an issue of this change. > force_operand crashes when (and:DF (not:DF (reg:DF 88)) (reg/v:DF 83 [ x ])) > is provided through emit_conditional_move. I guess that needs to be fixed first though. (Thanks for checking both targets.) My main comments on the series are: (1) It isn't obvious which operations should be included in the list in patch 2 and which shouldn't. Also, the patch only checks the outermost operation, and so it allows the inner rtxes to be arbitrarily complex. Because of that, it might be better to remove the condition altogether and just rely on the other routines to do costing and correctness checks. (2) Don't you also need to update the "rewiring" mechanism, to cope with cases where the then block has something like: if (a == 0) { a = b op c; -> a' = a == 0 ? b op c : a; d = a op b; -> d = a == 0 ? a' op b : d; } a = a' At the moment the code only handles regs and subregs, whereas but IIUC it should now iterate over all the regs in the SET_SRC. And I suppose that creates the need for multiple possible rewirings in the same insn, so that it isn't a simple insn -> index mapping any more. Thanks, Richard > > > Changes in v2: > - Change "conditional moves" to "conditional instructions" > in bb_ok_for_noce_convert_multiple_sets's comment. > > Manolis Tsamis (2): > ifcvt: handle sequences that clobber flags in > noce_convert_multiple_sets > ifcvt: Allow more operations in multiple set if conversion > > gcc/ifcvt.cc | 109 ++++++++++-------- > .../aarch64/ifcvt_multiple_sets_arithm.c | 67 +++++++++++ > 2 files changed, 127 insertions(+), 49 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/aarch64/ifcvt_multiple_sets_arithm.c
Hi Richard, Thanks for your insightful reply. On Tue, Jul 18, 2023 at 1:12 AM Richard Sandiford <richard.sandiford@arm.com> wrote: > > Manolis Tsamis <manolis.tsamis@vrull.eu> writes: > > noce_convert_multiple_sets has been introduced and extended over time to handle > > if conversion for blocks with multiple sets. Currently this is focused on > > register moves and rejects any sort of arithmetic operations. > > > > This series is an extension to allow more sequences to take part in if > > conversion. The first patch is a required change to emit correct code and the > > second patch whitelists a larger number of operations through > > bb_ok_for_noce_convert_multiple_sets. > > > > For targets that have a rich selection of conditional instructions, > > like aarch64, I have seen an ~5x increase of profitable if conversions for > > multiple set blocks in SPEC benchmarks. Also tested with a wide variety of > > benchmarks and I have not seen performance regressions on either x64 / aarch64. > > Interesting results. Are you free to say which target you used for aarch64? > > If I've understood the cost heuristics correctly, we'll allow a "predictable" > branch to be replaced by up to 5 simple conditional instructions and an > "unpredictable" branch to be replaced by up to 10 simple conditional > instructions. That seems pretty high. And I'm not sure how well we > guess predictability in the absence of real profile information. > > So my gut instinct was that the limitations of the current code might > be saving us from overly generous limits. It sounds from your results > like that might not be the case though. > > Still, if it does turn out to be the case in future, I agree we should > fix the costs rather than hamstring the code. > My writing may have been confusing, but with "~5x increase of profitable if conversions" I just meant that ifcvt considers these profitable, not that they actually are when executed in particular hardware. But at the same time I haven't yet seen any obvious performance regressions in some benchmarks that I have ran. In any case it could be interesting to microbenchmark branches vs conditional instructions and see how sane these numbers are. > > Some samples that previously resulted in a branch but now better use these > > instructions can be seen in the provided test case. > > > > Tested on aarch64 and x64; On x64 some tests that use __builtin_rint are > > failing with an ICE but I believe that it's not an issue of this change. > > force_operand crashes when (and:DF (not:DF (reg:DF 88)) (reg/v:DF 83 [ x ])) > > is provided through emit_conditional_move. > > I guess that needs to be fixed first though. (Thanks for checking both > targets.) > I get a feeling this may be fixed if I properly take care of your points 1 & 2 below. I will report on that. > My main comments on the series are: > > (1) It isn't obvious which operations should be included in the list > in patch 2 and which shouldn't. Also, the patch only checks the > outermost operation, and so it allows the inner rtxes to be > arbitrarily complex. > > Because of that, it might be better to remove the condition > altogether and just rely on the other routines to do costing and > correctness checks. > That is true; I wanted to somehow only allow "normal arithmetic operations" and avoid generating sequences with stranger codes. I will try and see what happens if I remove the condition altogether. I also totally missed the fact that I was allowing arbitrarily complex inner rtxes so thanks for pointing that out. > (2) Don't you also need to update the "rewiring" mechanism, to cope > with cases where the then block has something like: > > if (a == 0) { > a = b op c; -> a' = a == 0 ? b op c : a; > d = a op b; -> d = a == 0 ? a' op b : d; > } a = a' > > At the moment the code only handles regs and subregs, whereas but IIUC > it should now iterate over all the regs in the SET_SRC. And I suppose > that creates the need for multiple possible rewirings in the same insn, > so that it isn't a simple insn -> index mapping any more. > Indeed, I believe this current patch cannot properly handle these. I will create testcases for this and see what changes need to be done in the next iteration so that correct code is generated. Thanks, Manolis > Thanks, > Richard > > > > > > > Changes in v2: > > - Change "conditional moves" to "conditional instructions" > > in bb_ok_for_noce_convert_multiple_sets's comment. > > > > Manolis Tsamis (2): > > ifcvt: handle sequences that clobber flags in > > noce_convert_multiple_sets > > ifcvt: Allow more operations in multiple set if conversion > > > > gcc/ifcvt.cc | 109 ++++++++++-------- > > .../aarch64/ifcvt_multiple_sets_arithm.c | 67 +++++++++++ > > 2 files changed, 127 insertions(+), 49 deletions(-) > > create mode 100644 gcc/testsuite/gcc.target/aarch64/ifcvt_multiple_sets_arithm.c
Manolis Tsamis <manolis.tsamis@vrull.eu> writes: > On Tue, Jul 18, 2023 at 1:12 AM Richard Sandiford > <richard.sandiford@arm.com> wrote: >> >> Manolis Tsamis <manolis.tsamis@vrull.eu> writes: >> > noce_convert_multiple_sets has been introduced and extended over time to handle >> > if conversion for blocks with multiple sets. Currently this is focused on >> > register moves and rejects any sort of arithmetic operations. >> > >> > This series is an extension to allow more sequences to take part in if >> > conversion. The first patch is a required change to emit correct code and the >> > second patch whitelists a larger number of operations through >> > bb_ok_for_noce_convert_multiple_sets. >> > >> > For targets that have a rich selection of conditional instructions, >> > like aarch64, I have seen an ~5x increase of profitable if conversions for >> > multiple set blocks in SPEC benchmarks. Also tested with a wide variety of >> > benchmarks and I have not seen performance regressions on either x64 / aarch64. >> >> Interesting results. Are you free to say which target you used for aarch64? >> >> If I've understood the cost heuristics correctly, we'll allow a "predictable" >> branch to be replaced by up to 5 simple conditional instructions and an >> "unpredictable" branch to be replaced by up to 10 simple conditional >> instructions. That seems pretty high. And I'm not sure how well we >> guess predictability in the absence of real profile information. >> >> So my gut instinct was that the limitations of the current code might >> be saving us from overly generous limits. It sounds from your results >> like that might not be the case though. >> >> Still, if it does turn out to be the case in future, I agree we should >> fix the costs rather than hamstring the code. >> > > My writing may have been confusing, but with "~5x increase of > profitable if conversions" I just meant that ifcvt considers these > profitable, not that they actually are when executed in particular > hardware. Yeah, sorry, I'd read that part as measuring the number of if-converisons. But... > But at the same time I haven't yet seen any obvious performance > regressions in some benchmarks that I have ran. ...it was a pleasant surprise that doing so much more if-conversion didn't make things noticeably worse. :) > In any case it could be interesting to microbenchmark branches vs > conditional instructions and see how sane these numbers are. I think for this we really do need the real workload, since it's hard to measure realistic branch mispredict penalties with a microbenchmark. > [...] >> (2) Don't you also need to update the "rewiring" mechanism, to cope >> with cases where the then block has something like: >> >> if (a == 0) { >> a = b op c; -> a' = a == 0 ? b op c : a; >> d = a op b; -> d = a == 0 ? a' op b : d; >> } a = a' >> >> At the moment the code only handles regs and subregs, whereas but IIUC >> it should now iterate over all the regs in the SET_SRC. And I suppose >> that creates the need for multiple possible rewirings in the same insn, >> so that it isn't a simple insn -> index mapping any more. >> > > Indeed, I believe this current patch cannot properly handle these. I > will create testcases for this and see what changes need to be done in > the next iteration so that correct code is generated. Perhaps we should change the way that the rewiring is done. At the moment, need_cmov_or_rewire detects the renumbering ahead of time. But it might be easier to: - have noce_convert_multiple_sets_1 keep track of which SET_DESTs it has replaced with temporaries. - for each subsequent instruction, go through that list in order and use insn_propagation (from recog.h) to apply each replacement. That might be simpler, and should also be more robust, since the insn_propagation routines return false on failure. Thanks, Richard
On Tue, Jul 18, 2023 at 9:38 PM Richard Sandiford <richard.sandiford@arm.com> wrote: > > Manolis Tsamis <manolis.tsamis@vrull.eu> writes: > > On Tue, Jul 18, 2023 at 1:12 AM Richard Sandiford > > <richard.sandiford@arm.com> wrote: > >> > >> Manolis Tsamis <manolis.tsamis@vrull.eu> writes: > >> > noce_convert_multiple_sets has been introduced and extended over time to handle > >> > if conversion for blocks with multiple sets. Currently this is focused on > >> > register moves and rejects any sort of arithmetic operations. > >> > > >> > This series is an extension to allow more sequences to take part in if > >> > conversion. The first patch is a required change to emit correct code and the > >> > second patch whitelists a larger number of operations through > >> > bb_ok_for_noce_convert_multiple_sets. > >> > > >> > For targets that have a rich selection of conditional instructions, > >> > like aarch64, I have seen an ~5x increase of profitable if conversions for > >> > multiple set blocks in SPEC benchmarks. Also tested with a wide variety of > >> > benchmarks and I have not seen performance regressions on either x64 / aarch64. > >> > >> Interesting results. Are you free to say which target you used for aarch64? > >> > >> If I've understood the cost heuristics correctly, we'll allow a "predictable" > >> branch to be replaced by up to 5 simple conditional instructions and an > >> "unpredictable" branch to be replaced by up to 10 simple conditional > >> instructions. That seems pretty high. And I'm not sure how well we > >> guess predictability in the absence of real profile information. > >> > >> So my gut instinct was that the limitations of the current code might > >> be saving us from overly generous limits. It sounds from your results > >> like that might not be the case though. > >> > >> Still, if it does turn out to be the case in future, I agree we should > >> fix the costs rather than hamstring the code. > >> > > > > My writing may have been confusing, but with "~5x increase of > > profitable if conversions" I just meant that ifcvt considers these > > profitable, not that they actually are when executed in particular > > hardware. > > Yeah, sorry, I'd read that part as measuring the number of if-converisons. > But... > > > But at the same time I haven't yet seen any obvious performance > > regressions in some benchmarks that I have ran. > > ...it was a pleasant surprise that doing so much more if-conversion > didn't make things noticeably worse. :) > > > In any case it could be interesting to microbenchmark branches vs > > conditional instructions and see how sane these numbers are. > > I think for this we really do need the real workload, since it's > hard to measure realistic branch mispredict penalties with a > microbenchmark. > Yes indeed. I'm still trying to get properly analyze the effects of this change. I'll share when I have something interesting on the benchmarks side. > > [...] > >> (2) Don't you also need to update the "rewiring" mechanism, to cope > >> with cases where the then block has something like: > >> > >> if (a == 0) { > >> a = b op c; -> a' = a == 0 ? b op c : a; > >> d = a op b; -> d = a == 0 ? a' op b : d; > >> } a = a' > >> > >> At the moment the code only handles regs and subregs, whereas but IIUC > >> it should now iterate over all the regs in the SET_SRC. And I suppose > >> that creates the need for multiple possible rewirings in the same insn, > >> so that it isn't a simple insn -> index mapping any more. > >> > > > > Indeed, I believe this current patch cannot properly handle these. I > > will create testcases for this and see what changes need to be done in > > the next iteration so that correct code is generated. > > Perhaps we should change the way that the rewiring is done. > At the moment, need_cmov_or_rewire detects the renumbering > ahead of time. But it might be easier to: > > - have noce_convert_multiple_sets_1 keep track of which > SET_DESTs it has replaced with temporaries. > > - for each subsequent instruction, go through that list in order > and use insn_propagation (from recog.h) to apply each replacement. > > That might be simpler, and should also be more robust, since the > insn_propagation routines return false on failure. > Thanks, I've tried various designs with these ideas, including moving the rewiring code to noce_convert_multiple_sets_1 but there was always some issue. For example we cannot remove the need_cmov_or_rewire function because the part that calculates need_no_cmov needs to iterate once before the conversion, otherwise it wouldn't work. Also noce_convert_multiple_sets_1 is ran twice each time so doing the new rewiring logic which is somewhat expensive two times felt like a regression without making things much easier. In the end I opted to keep need_cmov_or_rewire (albait renamed) and introduce a new struct noce_multiple_sets_info, which I thing made things much nicer. That includes taking care of the very long signatures of the helper functions and improving efficiency by removing a lot of vectors/hash set/hash maps. I've also added a SCALAR_INT_MODE_P check that takes care of the force_reg x86-64 issue. These changes can be found in the two last commits of the new series. Any feedback on the new implementation would be welcome: https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628820.html > Thanks, > Richard > Thanks, Manolis