Message ID | 3f6d302a2068d9e357efda2d92c8da99a0f2d0b2.1669278892.git.christophe.leroy@csgroup.eu |
---|---|
State | New |
Headers |
From: Christophe Leroy <christophe.leroy@csgroup.eu>
To: Michael Ellerman <mpe@ellerman.id.au>, Nicholas Piggin <npiggin@gmail.com>, "Naveen N. Rao" <naveen.n.rao@linux.ibm.com>
Cc: linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, bpf@vger.kernel.org, Alexei Starovoitov <ast@kernel.org>, Daniel Borkmann <daniel@iogearbox.net>, Andrii Nakryiko <andrii@kernel.org>, Martin KaFai Lau <martin.lau@linux.dev>, Song Liu <song@kernel.org>, Yonghong Song <yhs@fb.com>, John Fastabend <john.fastabend@gmail.com>, KP Singh <kpsingh@kernel.org>, Stanislav Fomichev <sdf@google.com>, Hao Luo <haoluo@google.com>, Jiri Olsa <jolsa@kernel.org>
Subject: [PATCH] powerpc/bpf: Only update ldimm64 during extra pass when it is an address
Date: Thu, 24 Nov 2022 09:39:38 +0100
Message-Id: <3f6d302a2068d9e357efda2d92c8da99a0f2d0b2.1669278892.git.christophe.leroy@csgroup.eu> |
Series | powerpc/bpf: Only update ldimm64 during extra pass when it is an address |
Commit Message
Christophe Leroy
Nov. 24, 2022, 8:39 a.m. UTC
ldimm64 is not only used for loading function addresses, and
the NOPs added for padding impact performance, so avoid
them when they are not necessary.
On QEMU mac99, with the patch:
test_bpf: #829 ALU64_MOV_K: all immediate value magnitudes jited:1 167436810 PASS
test_bpf: #831 ALU64_OR_K: all immediate value magnitudes jited:1 170702940 PASS
Without the patch:
test_bpf: #829 ALU64_MOV_K: all immediate value magnitudes jited:1 173012360 PASS
test_bpf: #831 ALU64_OR_K: all immediate value magnitudes jited:1 176424090 PASS
That's a 3.5% performance improvement.
Fixes: f9320c49993c ("powerpc/bpf: Update ldimm64 instructions during extra pass")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
arch/powerpc/net/bpf_jit_comp.c | 3 ++-
arch/powerpc/net/bpf_jit_comp32.c | 5 +++--
arch/powerpc/net/bpf_jit_comp64.c | 5 +++--
3 files changed, 8 insertions(+), 5 deletions(-)
Comments
Christophe Leroy wrote:
> ldimm64 is not only used for loading function addresses, and

That's probably true today, but I worry that that can change upstream
and we may not notice at all.

> the NOPs added for padding are impacting performance, so avoid
> them when not necessary.
>
> On QEMU mac99, with the patch:
>
> test_bpf: #829 ALU64_MOV_K: all immediate value magnitudes jited:1 167436810 PASS
> test_bpf: #831 ALU64_OR_K: all immediate value magnitudes jited:1 170702940 PASS
>
> Without the patch:
>
> test_bpf: #829 ALU64_MOV_K: all immediate value magnitudes jited:1 173012360 PASS
> test_bpf: #831 ALU64_OR_K: all immediate value magnitudes jited:1 176424090 PASS
>
> That's a 3.5% performance improvement.

A better approach would be to do a full JIT during the extra pass.
That's what most other architectures do today. And, as long as we can
ensure that the JIT'ed program size can never increase during the
extra pass, we should be ok to do a single extra pass.

- Naveen
On 24/11/2022 at 11:13, Naveen N. Rao wrote:
> Christophe Leroy wrote:
>> ldimm64 is not only used for loading function addresses, and
>
> That's probably true today, but I worry that that can change upstream
> and we may not notice at all.

Not sure what you mean.

Today POWERPC considers that ldimm64 is _always_ loading a function
address, whereas upstream BPF considers that ldimm64 is a function only
when it is flagged BPF_PSEUDO_FUNC.

In what direction could that change in the future?

For me, if they change that, it becomes an API change.

Christophe

>> the NOPs added for padding are impacting performance, so avoid
>> them when not necessary.
>>
>> On QEMU mac99, with the patch:
>>
>> test_bpf: #829 ALU64_MOV_K: all immediate value magnitudes jited:1 167436810 PASS
>> test_bpf: #831 ALU64_OR_K: all immediate value magnitudes jited:1 170702940 PASS
>>
>> Without the patch:
>>
>> test_bpf: #829 ALU64_MOV_K: all immediate value magnitudes jited:1 173012360 PASS
>> test_bpf: #831 ALU64_OR_K: all immediate value magnitudes jited:1 176424090 PASS
>>
>> That's a 3.5% performance improvement.
>
> A better approach would be to do a full JIT during the extra pass.
> That's what most other architectures do today. And, as long as we can
> ensure that the JIT'ed program size can never increase during the
> extra pass, we should be ok to do a single extra pass.
>
> - Naveen
Christophe Leroy wrote:
>
> On 24/11/2022 at 11:13, Naveen N. Rao wrote:
>> Christophe Leroy wrote:
>>> ldimm64 is not only used for loading function addresses, and
>>
>> That's probably true today, but I worry that that can change upstream
>> and we may not notice at all.
>
> Not sure what you mean.
>
> Today POWERPC considers that ldimm64 is _always_ loading a function
> address whereas upstream BPF considers that ldimm64 is a function only
> when it is flagged BPF_PSEUDO_FUNC.

Not sure why you think we consider ldimm64 to always be loading a
function address. Perhaps it is due to the poorly chosen variable name
func_addr in bpf_jit_fixup_addresses(), or due to the fact that we
always update the JIT code for ldimm64. In any case, we simply overwrite
imm64 load instructions to ensure we are using the updated address.

>
> In what direction could that change in the future?
>
> For me, if they change that, it becomes an API change.

More of an extension, which is exactly what we had when BPF_PSEUDO_FUNC
was introduced. Took us nearly a year before we noticed.

Because we do not do a full JIT during the extra pass today like other
architectures, we are the exception - there is always the risk of bpf
core changes breaking our JIT. So, I still think it is better if we do
a full JIT during extra pass.

- Naveen
On 24/11/2022 at 14:49, Naveen N. Rao wrote:
> Christophe Leroy wrote:
>>
>> On 24/11/2022 at 11:13, Naveen N. Rao wrote:
>>> Christophe Leroy wrote:
>>>> ldimm64 is not only used for loading function addresses, and
>>>
>>> That's probably true today, but I worry that that can change upstream
>>> and we may not notice at all.
>>
>> Not sure what you mean.
>>
>> Today POWERPC considers that ldimm64 is _always_ loading a function
>> address whereas upstream BPF considers that ldimm64 is a function only
>> when it is flagged BPF_PSEUDO_FUNC.
>
> Not sure why you think we consider ldimm64 to always be loading a
> function address. Perhaps it is due to the poorly chosen variable name
> func_addr in bpf_jit_fixup_addresses(), or due to the fact that we
> always update the JIT code for ldimm64. In any case, we simply overwrite
> imm64 load instructions to ensure we are using the updated address.

Well, it is the padding that made me think that. When ldimm64 is used
with an immediate value, it won't change from one pass to the other. We
need the padding only because it may contain addresses that change from
one pass to another.

>
>> In what direction could that change in the future?
>>
>> For me, if they change that, it becomes an API change.
>
> More of an extension, which is exactly what we had when BPF_PSEUDO_FUNC
> was introduced. Took us nearly a year before we noticed.
>
> Because we do not do a full JIT during the extra pass today like other
> architectures, we are the exception - there is always the risk of bpf
> core changes breaking our JIT. So, I still think it is better if we do
> a full JIT during extra pass.
>

I like the idea of a full JIT during extra passes and will start
looking at it.

Will it also allow us to revert your commit fab07611fb2e
("powerpc32/bpf: Fix codegen for bpf-to-bpf calls")?

Thanks
Christophe
Christophe Leroy wrote:
>
> On 24/11/2022 at 14:49, Naveen N. Rao wrote:
>> Christophe Leroy wrote:
>>>
>>> On 24/11/2022 at 11:13, Naveen N. Rao wrote:
>>>> Christophe Leroy wrote:
>>>
>>> In what direction could that change in the future?
>>>
>>> For me, if they change that, it becomes an API change.
>>
>> More of an extension, which is exactly what we had when BPF_PSEUDO_FUNC
>> was introduced. Took us nearly a year before we noticed.
>>
>> Because we do not do a full JIT during the extra pass today like other
>> architectures, we are the exception - there is always the risk of bpf
>> core changes breaking our JIT. So, I still think it is better if we do
>> a full JIT during extra pass.
>>
>
> I like the idea of a full JIT during extra passes and will start
> looking at it.
>
> Will it also allow us to revert your commit fab07611fb2e
> ("powerpc32/bpf: Fix codegen for bpf-to-bpf calls")?

Not entirely. We still need those extra nops during the initial JIT so
that we can estimate the maximum prog size. During the extra pass, we
can emit only the necessary instructions and skip the extra nops. We
may need to do two passes during extra_pass to adjust the branch
targets, though.

Thanks,
Naveen
On 25/11/2022 at 06:38, Naveen N. Rao wrote:
> Christophe Leroy wrote:
>>
>> On 24/11/2022 at 14:49, Naveen N. Rao wrote:
>>> Christophe Leroy wrote:
>>>>
>>>> On 24/11/2022 at 11:13, Naveen N. Rao wrote:
>>>>> Christophe Leroy wrote:
>>>>
>>>> In what direction could that change in the future?
>>>>
>>>> For me, if they change that, it becomes an API change.
>>>
>>> More of an extension, which is exactly what we had when
>>> BPF_PSEUDO_FUNC was introduced. Took us nearly a year before we noticed.
>>>
>>> Because we do not do a full JIT during the extra pass today like
>>> other architectures, we are the exception - there is always the risk
>>> of bpf core changes breaking our JIT. So, I still think it is better
>>> if we do a full JIT during extra pass.
>>>
>>
>> I like the idea of a full JIT during extra passes and will start
>> looking at it.
>>
>> Will it also allow us to revert your commit fab07611fb2e
>> ("powerpc32/bpf: Fix codegen for bpf-to-bpf calls")?
>
> Not entirely. We still need those extra nops during the initial JIT so
> that we can estimate the maximum prog size. During the extra pass, we
> can emit only the necessary instructions and skip the extra nops. We
> may need to do two passes during extra_pass to adjust the branch
> targets, though.
>

Before your change, the code was:

	if (image && rel < 0x2000000 && rel >= -0x2000000) {
		PPC_BL(func);
	} else {
		/* Load function address into r0 */
		EMIT(PPC_RAW_LIS(_R0, IMM_H(func)));
		EMIT(PPC_RAW_ORI(_R0, _R0, IMM_L(func)));
		EMIT(PPC_RAW_MTCTR(_R0));
		EMIT(PPC_RAW_BCTRL());
	}

During the initial pass, image is NULL, so the else branch is taken.

Christophe
diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index 43e634126514..206b698723a3 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -68,7 +68,8 @@ static int bpf_jit_fixup_addresses(struct bpf_prog *fp, u32 *image,
 			 * of the JITed sequence remains unchanged.
 			 */
 			ctx->idx = tmp_idx;
-		} else if (insn[i].code == (BPF_LD | BPF_IMM | BPF_DW)) {
+		} else if (insn[i].code == (BPF_LD | BPF_IMM | BPF_DW) &&
+			   insn[i].src_reg == BPF_PSEUDO_FUNC) {
 			tmp_idx = ctx->idx;
 			ctx->idx = addrs[i] / 4;
 #ifdef CONFIG_PPC32
diff --git a/arch/powerpc/net/bpf_jit_comp32.c b/arch/powerpc/net/bpf_jit_comp32.c
index a379b0ce19ff..878f8a88d83e 100644
--- a/arch/powerpc/net/bpf_jit_comp32.c
+++ b/arch/powerpc/net/bpf_jit_comp32.c
@@ -960,8 +960,9 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
 			PPC_LI32(dst_reg_h, (u32)insn[i + 1].imm);
 			PPC_LI32(dst_reg, (u32)insn[i].imm);
 			/* padding to allow full 4 instructions for later patching */
-			for (j = ctx->idx - tmp_idx; j < 4; j++)
-				EMIT(PPC_RAW_NOP());
+			if (insn[i].src_reg == BPF_PSEUDO_FUNC)
+				for (j = ctx->idx - tmp_idx; j < 4; j++)
+					EMIT(PPC_RAW_NOP());
 			/* Adjust for two bpf instructions */
 			addrs[++i] = ctx->idx * 4;
 			break;
diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
index 29ee306d6302..af8bdb5553cd 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -938,8 +938,9 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
 			tmp_idx = ctx->idx;
 			PPC_LI64(dst_reg, imm64);
 			/* padding to allow full 5 instructions for later patching */
-			for (j = ctx->idx - tmp_idx; j < 5; j++)
-				EMIT(PPC_RAW_NOP());
+			if (insn[i].src_reg == BPF_PSEUDO_FUNC)
+				for (j = ctx->idx - tmp_idx; j < 5; j++)
+					EMIT(PPC_RAW_NOP());
 			/* Adjust for two bpf instructions */
 			addrs[++i] = ctx->idx * 4;
 			break;