powerpc/bpf: Only update ldimm64 during extra pass when it is an address

Message ID 3f6d302a2068d9e357efda2d92c8da99a0f2d0b2.1669278892.git.christophe.leroy@csgroup.eu
State New
Series powerpc/bpf: Only update ldimm64 during extra pass when it is an address

Commit Message

Christophe Leroy Nov. 24, 2022, 8:39 a.m. UTC
  ldimm64 is not only used for loading function addresses, and
the NOPs added for padding are impacting performance, so avoid
them when not necessary.

On QEMU mac99, with the patch:

test_bpf: #829 ALU64_MOV_K: all immediate value magnitudes jited:1 167436810 PASS
test_bpf: #831 ALU64_OR_K: all immediate value magnitudes jited:1 170702940 PASS

Without the patch:

test_bpf: #829 ALU64_MOV_K: all immediate value magnitudes jited:1 173012360 PASS
test_bpf: #831 ALU64_OR_K: all immediate value magnitudes jited:1 176424090 PASS

That's a 3.5% performance improvement.
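
For instance (a sketch, register numbers illustrative), a ldimm64 of
the constant 1 on ppc32 needs only two instructions but is padded to
four today:

	li	r5, 0	/* high word of the immediate */
	li	r4, 1	/* low word of the immediate */
	nop		/* padding for later patching, only useful */
	nop		/* when the immediate is an address */

With the patch, the nops are only emitted when the instruction carries
the BPF_PSEUDO_FUNC flag, i.e. when it actually loads an address that
may change between passes.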

Fixes: f9320c49993c ("powerpc/bpf: Update ldimm64 instructions during extra pass")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
 arch/powerpc/net/bpf_jit_comp.c   | 3 ++-
 arch/powerpc/net/bpf_jit_comp32.c | 5 +++--
 arch/powerpc/net/bpf_jit_comp64.c | 5 +++--
 3 files changed, 8 insertions(+), 5 deletions(-)
  

Comments

Naveen N. Rao Nov. 24, 2022, 10:13 a.m. UTC | #1
Christophe Leroy wrote:
> ldimm64 is not only used for loading function addresses, and

That's probably true today, but I worry that that can change upstream 
and we may not notice at all.

> the NOPs added for padding are impacting performance, so avoid
> them when not necessary.
> 
> On QEMU mac99, with the patch:
> 
> test_bpf: #829 ALU64_MOV_K: all immediate value magnitudes jited:1 167436810 PASS
> test_bpf: #831 ALU64_OR_K: all immediate value magnitudes jited:1 170702940 PASS
> 
> Without the patch:
> 
> test_bpf: #829 ALU64_MOV_K: all immediate value magnitudes jited:1 173012360 PASS
> test_bpf: #831 ALU64_OR_K: all immediate value magnitudes jited:1 176424090 PASS
> 
> That's a 3.5% performance improvement.

A better approach would be to do a full JIT during the extra pass. 
That's what most other architectures do today. And, as long as we can 
ensure that the JIT'ed program size can never increase during the 
extra pass, we should be ok to do a single extra pass.
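
For concreteness, a sketch of the difference, reusing the existing
function names (flow only, argument lists approximate):

	/* today: the extra pass only patches selected instructions */
	bpf_jit_fixup_addresses(fp, image, ctx, addrs);

	/* proposed: re-emit the whole body against the final image,
	 * relying on its size never increasing after the initial pass
	 */
	bpf_jit_build_body(fp, image, ctx, addrs, true);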


- Naveen
  
Christophe Leroy Nov. 24, 2022, 12:24 p.m. UTC | #2
On 24/11/2022 at 11:13, Naveen N. Rao wrote:
> Christophe Leroy wrote:
>> ldimm64 is not only used for loading function addresses, and
> 
> That's probably true today, but I worry that that can change upstream 
> and we may not notice at all.

Not sure what you mean.

Today POWERPC considers that ldimm64 is _always_ loading a function 
address whereas upstream BPF considers that ldimm64 is a function only 
when it is flagged BPF_PSEUDO_FUNC.

In what direction could that change in the future ?

For me if they change that it becomes an API change.
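
For reference, the distinction as it stands today (a sketch;
BPF_PSEUDO_FUNC comes from include/uapi/linux/bpf.h):

	if (insn->code == (BPF_LD | BPF_IMM | BPF_DW) &&
	    insn->src_reg == BPF_PSEUDO_FUNC) {
		/* imm64 is a bpf function address, relocated by core */
	} else {
		/* imm64 is a plain constant, identical in every pass */
	}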

Christophe


> 
>> the NOPs added for padding are impacting performance, so avoid
>> them when not necessary.
>>
>> On QEMU mac99, with the patch:
>>
>> test_bpf: #829 ALU64_MOV_K: all immediate value magnitudes jited:1 
>> 167436810 PASS
>> test_bpf: #831 ALU64_OR_K: all immediate value magnitudes jited:1 
>> 170702940 PASS
>>
>> Without the patch:
>>
>> test_bpf: #829 ALU64_MOV_K: all immediate value magnitudes jited:1 
>> 173012360 PASS
>> test_bpf: #831 ALU64_OR_K: all immediate value magnitudes jited:1 
>> 176424090 PASS
>>
>> That's a 3.5% performance improvement.
> 
> A better approach would be to do a full JIT during the extra pass. 
> That's what most other architectures do today. And, as long as we can 
> ensure that the JIT'ed program size can never increase during the extra 
> pass, we should be ok to do a single extra pass.
> 
> 
> - Naveen
  
Naveen N. Rao Nov. 24, 2022, 1:49 p.m. UTC | #3
Christophe Leroy wrote:
> 
> 
> On 24/11/2022 at 11:13, Naveen N. Rao wrote:
>> Christophe Leroy wrote:
>>> ldimm64 is not only used for loading function addresses, and
>> 
>> That's probably true today, but I worry that that can change upstream 
>> and we may not notice at all.
> 
> Not sure what you mean.
> 
> Today POWERPC considers that ldimm64 is _always_ loading a function 
> address whereas upstream BPF considers that ldimm64 is a function only 
> when it is flagged BPF_PSEUDO_FUNC.

Not sure why you think we consider ldimm64 to always be loading a 
function address. Perhaps it is due to the poorly chosen variable name 
func_addr in bpf_jit_fixup_addresses(), or due to the fact that we 
always update the JIT code for ldimm64. In any case, we simply overwrite 
imm64 load instructions to ensure we are using the updated address.
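
i.e. roughly this, simplified from bpf_jit_fixup_addresses() (from
memory, register mapping elided):

	tmp_idx = ctx->idx;
	ctx->idx = addrs[i] / 4;	/* rewind to the ldimm64 slot */
	PPC_LI64(dst_reg, imm64);	/* overwrite with the updated value */
	ctx->idx = tmp_idx;		/* restore the emit position */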

> 
> In what direction could that change in the future ?
> 
> For me if they change that it becomes an API change.

More of an extension, which is exactly what we had when BPF_PSEUDO_FUNC 
was introduced. Took us nearly a year before we noticed.

Because we do not do a full JIT during the extra pass today like other 
architectures, we are the exception - there is always the risk of bpf 
core changes breaking our JIT. So, I still think it is better if we do a 
full JIT during extra pass.


- Naveen
  
Christophe Leroy Nov. 24, 2022, 7:46 p.m. UTC | #4
On 24/11/2022 at 14:49, Naveen N. Rao wrote:
> Christophe Leroy wrote:
>>
>>
>> On 24/11/2022 at 11:13, Naveen N. Rao wrote:
>>> Christophe Leroy wrote:
>>>> ldimm64 is not only used for loading function addresses, and
>>>
>>> That's probably true today, but I worry that that can change upstream 
>>> and we may not notice at all.
>>
>> Not sure what you mean.
>>
>> Today POWERPC considers that ldimm64 is _always_ loading a function 
>> address whereas upstream BPF considers that ldimm64 is a function only 
>> when it is flagged BPF_PSEUDO_FUNC.
> 
> Not sure why you think we consider ldimm64 to always be loading a 
> function address. Perhaps it is due to the poorly chosen variable name 
> func_addr in bpf_jit_fixup_addresses(), or due to the fact that we 
> always update the JIT code for ldimm64. In any case, we simply overwrite 
> imm64 load instructions to ensure we are using the updated address.

Well, it's the padding which makes me think that. When ldimm64 is used 
with an immediate value, it won't change from one pass to the other. We 
only need the padding because the load may contain an address that 
changes from one pass to another.
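
Concretely (a ppc64 sketch; PPC_LI64() emits the shortest sequence it
can for the value at hand, register illustrative):

	PPC_LI64(r4, 1);			/* 1 insn:  li */
	PPC_LI64(r4, 0xc000000012345678);	/* 5 insns: lis/ori/sldi/oris/ori */

A plain constant has the same length in every pass, but an address may
change, and with it the length of the sequence, hence the padding to
5 instructions.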

> 
>>
>> In what direction could that change in the future ?
>>
>> For me if they change that it becomes an API change.
> 
> More of an extension, which is exactly what we had when BPF_PSEUDO_FUNC 
> was introduced. Took us nearly a year before we noticed.
> 
> Because we do not do a full JIT during the extra pass today like other 
> architectures, we are the exception - there is always the risk of bpf 
> core changes breaking our JIT. So, I still think it is better if we do a 
> full JIT during extra pass.
> 

I like the idea of a full JIT during extra passes and will start looking 
at it.

Will it also allow us to revert your commit fab07611fb2e 
("powerpc32/bpf: Fix codegen for bpf-to-bpf calls") ?

Thanks
Christophe
  
Naveen N. Rao Nov. 25, 2022, 5:38 a.m. UTC | #5
Christophe Leroy wrote:
> 
> 
> On 24/11/2022 at 14:49, Naveen N. Rao wrote:
>> Christophe Leroy wrote:
>>>
>>>
>>> On 24/11/2022 at 11:13, Naveen N. Rao wrote:
>>>> Christophe Leroy wrote:
>>>
>>> In what direction could that change in the future ?
>>>
>>> For me if they change that it becomes an API change.
>> 
>> More of an extension, which is exactly what we had when BPF_PSEUDO_FUNC 
>> was introduced. Took us nearly a year before we noticed.
>> 
>> Because we do not do a full JIT during the extra pass today like other 
>> architectures, we are the exception - there is always the risk of bpf 
>> core changes breaking our JIT. So, I still think it is better if we do a 
>> full JIT during extra pass.
>> 
> 
> I like the idea of a full JIT during extra passes and will start looking 
> at it.
> 
> Will it also allow us to revert your commit fab07611fb2e 
> ("powerpc32/bpf: Fix codegen for bpf-to-bpf calls") ?

Not entirely. We still need those extra nops during the initial JIT so 
that we can estimate the maximum prog size. During extra pass, we can 
only emit the necessary instructions and skip extra nops. We may need to 
do two passes during extra_pass to adjust the branch targets though.
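
Something like this (a sketch of the flow, not actual code):

	/* sweep 1: emit without the padding nops, recording the new
	 *	    addrs[i] as every later instruction shifts down
	 * sweep 2: re-emit so that branches resolve against the
	 *	    updated addrs[]
	 */
	for (pass = 0; pass < 2; pass++)
		bpf_jit_build_body(fp, image, ctx, addrs, true);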

Thanks,
Naveen
  
Christophe Leroy Nov. 25, 2022, 5:59 a.m. UTC | #6
On 25/11/2022 at 06:38, Naveen N. Rao wrote:
> Christophe Leroy wrote:
>>
>>
>> On 24/11/2022 at 14:49, Naveen N. Rao wrote:
>>> Christophe Leroy wrote:
>>>>
>>>>
>>>> On 24/11/2022 at 11:13, Naveen N. Rao wrote:
>>>>> Christophe Leroy wrote:
>>>>
>>>> In what direction could that change in the future ?
>>>>
>>>> For me if they change that it becomes an API change.
>>>
>>> More of an extension, which is exactly what we had when 
>>> BPF_PSEUDO_FUNC was introduced. Took us nearly a year before we noticed.
>>>
>>> Because we do not do a full JIT during the extra pass today like 
>>> other architectures, we are the exception - there is always the risk 
>>> of bpf core changes breaking our JIT. So, I still think it is better 
>>> if we do a full JIT during extra pass.
>>>
>>
>> I like the idea of a full JIT during extra passes and will start 
>> looking at it.
>>
>> Will it also allow us to revert your commit fab07611fb2e 
>> ("powerpc32/bpf: Fix codegen for bpf-to-bpf calls") ?
> 
> Not entirely. We still need those extra nops during the initial JIT so 
> that we can estimate the maximum prog size. During extra pass, we can 
> only emit the necessary instructions and skip extra nops. We may need to 
> do two passes during extra_pass to adjust the branch targets though.
> 

Before your change, the code was:

	if (image && rel < 0x2000000 && rel >= -0x2000000) {
		PPC_BL(func);
	} else {
		/* Load function address into r0 */
		EMIT(PPC_RAW_LIS(_R0, IMM_H(func)));
		EMIT(PPC_RAW_ORI(_R0, _R0, IMM_L(func)));
		EMIT(PPC_RAW_MTCTR(_R0));
		EMIT(PPC_RAW_BCTRL());
	}

During the initial pass, image is NULL so the else branch is taken.
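
Spelling out the size implication (instruction counts from the
sequence above):

	/* initial pass (image == NULL):      4 insns (lis/ori/mtctr/bctrl)
	 * extra pass, target within range:   1 insn  (bl)
	 * without fixed-size emission the body shrinks between passes
	 * and every later offset in addrs[] goes stale
	 */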

Christophe
  

Patch

diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index 43e634126514..206b698723a3 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -68,7 +68,8 @@ static int bpf_jit_fixup_addresses(struct bpf_prog *fp, u32 *image,
 			 * of the JITed sequence remains unchanged.
 			 */
 			ctx->idx = tmp_idx;
-		} else if (insn[i].code == (BPF_LD | BPF_IMM | BPF_DW)) {
+		} else if (insn[i].code == (BPF_LD | BPF_IMM | BPF_DW) &&
+			   insn[i].src_reg == BPF_PSEUDO_FUNC) {
 			tmp_idx = ctx->idx;
 			ctx->idx = addrs[i] / 4;
 #ifdef CONFIG_PPC32
diff --git a/arch/powerpc/net/bpf_jit_comp32.c b/arch/powerpc/net/bpf_jit_comp32.c
index a379b0ce19ff..878f8a88d83e 100644
--- a/arch/powerpc/net/bpf_jit_comp32.c
+++ b/arch/powerpc/net/bpf_jit_comp32.c
@@ -960,8 +960,9 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
 			PPC_LI32(dst_reg_h, (u32)insn[i + 1].imm);
 			PPC_LI32(dst_reg, (u32)insn[i].imm);
 			/* padding to allow full 4 instructions for later patching */
-			for (j = ctx->idx - tmp_idx; j < 4; j++)
-				EMIT(PPC_RAW_NOP());
+			if (insn[i].src_reg == BPF_PSEUDO_FUNC)
+				for (j = ctx->idx - tmp_idx; j < 4; j++)
+					EMIT(PPC_RAW_NOP());
 			/* Adjust for two bpf instructions */
 			addrs[++i] = ctx->idx * 4;
 			break;
diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
index 29ee306d6302..af8bdb5553cd 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -938,8 +938,9 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
 			tmp_idx = ctx->idx;
 			PPC_LI64(dst_reg, imm64);
 			/* padding to allow full 5 instructions for later patching */
-			for (j = ctx->idx - tmp_idx; j < 5; j++)
-				EMIT(PPC_RAW_NOP());
+			if (insn[i].src_reg == BPF_PSEUDO_FUNC)
+				for (j = ctx->idx - tmp_idx; j < 5; j++)
+					EMIT(PPC_RAW_NOP());
 			/* Adjust for two bpf instructions */
 			addrs[++i] = ctx->idx * 4;
 			break;