Message ID | 20231122221814.139916-3-deller@kernel.org |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:6358:6f03:b0:164:83eb:24d7 with SMTP id r3csp1537113rwn; Wed, 22 Nov 2023 14:18:42 -0800 (PST) X-Google-Smtp-Source: AGHT+IG3IK185RNMdbkZYeJTYQGlHH3ipTiuVPtccnurZUkznN1QIyW2pJSPjDKLF4YCvYQa04SV X-Received: by 2002:a05:6a00:1c8c:b0:6be:5a1a:3b93 with SMTP id y12-20020a056a001c8c00b006be5a1a3b93mr3602956pfw.4.1700691521969; Wed, 22 Nov 2023 14:18:41 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1700691521; cv=none; d=google.com; s=arc-20160816; b=bnE1dGRe+GCF5ibCR/fPQffB7zaEgyXOzKnTAPmdcvag/SM/v2pAp7fdVyyTYMlDCv CDo/43JyTVg9GxYHLf2ZqrWZLlugRgnTwdlPEZ49etd5o1iELy7sQ4An8qHA//c/b/h3 Ul9oq/oR3yG1OaFEIQR7vxVsqcdTGO6svr/PEP0U4ReA3fyraxxtYAx8p/wU4qmwx3ge AyYiMlVDkWWJ9lqMvaCg3n9YvMNsnriYMM+gyq2xP2i7QZxru8S9k+y0BKiuEcwk0+cA qJST0fEWKH/ZrIFWeOoAehIYJpSAZvBTDB9J8ClJe26b2eoZ43vfMQQfqkxHHyDvJXLt hxRA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:to:from :dkim-signature; bh=//rM/jtPk29ZexVoO3S4krAhI8Q91c//MPYQhR89e/E=; fh=p1Qf4H5TYe6x87oeu/ZwD97uWE3f/aSmdhLxEgeh+tk=; b=hAMssxuCWxkxUIe+QkW2ptSrHXqydIxbP1dL+k3g56opWSAuBpsA+SY4SGj2Cuosvn wqpux9hj2tt7UqZEjGAnu/WF03qdl25i0EawaihR5AbMPUcHeCgbNWfIr+vG75I6qxXM pGJgTPdxPetVJFS1Cvkzh9kGmi0dhRArrCnMd40bk8oBl3YeYCPIX/uTFbdOk9eCwRUl PU/NK91pQSrVxBbGKEfNkZHnvPa3oE+QpwsgjbXctDh2erPbFyyf3Oo3JfJuLLFDPrw8 NFSgWkyQxtJsaKi6mRbj9DpSPxxXsTcOEAH2CsqvxnTiXo/BUGiRxHYOIp8k4YzSZeJq eA9A== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=DDxAjeN0; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.37 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from snail.vger.email (snail.vger.email. [23.128.96.37]) by mx.google.com with ESMTPS id l2-20020a056a00140200b0068a65b26fc6si426415pfu.43.2023.11.22.14.18.41 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 22 Nov 2023 14:18:41 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.37 as permitted sender) client-ip=23.128.96.37; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=DDxAjeN0; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.37 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by snail.vger.email (Postfix) with ESMTP id A75BE825A6C0; Wed, 22 Nov 2023 14:18:38 -0800 (PST) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.11 at snail.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1344685AbjKVWSb (ORCPT <rfc822;ouuuleilei@gmail.com> + 99 others); Wed, 22 Nov 2023 17:18:31 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33814 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1344661AbjKVWS0 (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Wed, 22 Nov 2023 17:18:26 -0500 Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D83F0D42 for <linux-kernel@vger.kernel.org>; Wed, 22 Nov 2023 14:18:22 -0800 (PST) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 0AC02C433CC; Wed, 22 Nov 2023 22:18:20 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1700691502; bh=naPsrd6dgZmrcqFRnBTJLF85zK+mznbfmwCxwvHg1bc=; h=From:To:Subject:Date:In-Reply-To:References:From; b=DDxAjeN0PF56HLuHq+owxFyRf1SXkUc0Z3SIxrGBECInqJ9bI5WTebiKUjB+kdXgc 4GAAJFWkwGKOeLi4wqgSc5qKmWrSDg2dC/VcMaTpnvQB/ctWee/LrWjppfZMvUUEZT +Xt4fRitwdPMjB5F1t2+G/fVNeDMtj/EVVXtakgp2AUggs4s5bljxWU2MTDNSB4ElI e5d9Yb4xkQhock5YbwJnwZ+TbIKMpPf7s/cT+0vH51VrCbGGPqjMw2BFSh1//YguKf ZXpMGYJ71+KG4ilzh78AGFUUbGJIONjGDFGHncsmOLm0RinIWC8kHlcjJce+uJ0aNU YM+gqd4jArNIg== From: deller@kernel.org To: linux-kernel@vger.kernel.org, Masahiro Yamada <masahiroy@kernel.org>, Arnd Bergmann <arnd@arndb.de>, linux-modules@vger.kernel.org, linux-arch@vger.kernel.org, Luis Chamberlain <mcgrof@kernel.org> Subject: [PATCH 2/4] modules: Ensure 64-bit alignment on __ksymtab_* sections Date: Wed, 22 Nov 2023 23:18:12 +0100 Message-ID: <20231122221814.139916-3-deller@kernel.org> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20231122221814.139916-1-deller@kernel.org> References: <20231122221814.139916-1-deller@kernel.org> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-2.2 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF, RCVD_IN_DNSWL_BLOCKED,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (snail.vger.email [0.0.0.0]); Wed, 22 Nov 2023 14:18:38 -0800 (PST) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1783304313326062221 X-GMAIL-MSGID: 1783304313326062221 |
Series |
[1/4] linux/export: Fix alignment for 64-bit ksymtab entries
|
|
Commit Message
Helge Deller
Nov. 22, 2023, 10:18 p.m. UTC
From: Helge Deller <deller@gmx.de> On 64-bit architectures without CONFIG_HAVE_ARCH_PREL32_RELOCATIONS (e.g. ppc64, ppc64le, parisc, s390x,...) the __KSYM_REF() macro stores 64-bit pointers into the __ksymtab* sections. Make sure that those sections will be correctly aligned at module link time, otherwise unaligned memory accesses may happen at runtime. The __kcrctab* sections store 32-bit entities, so use ALIGN(4) for those. Signed-off-by: Helge Deller <deller@gmx.de> Cc: <stable@vger.kernel.org> # v6.0+ --- scripts/module.lds.S | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
Comments
On Wed, Nov 22, 2023 at 11:18:12PM +0100, deller@kernel.org wrote: > From: Helge Deller <deller@gmx.de> > > On 64-bit architectures without CONFIG_HAVE_ARCH_PREL32_RELOCATIONS > (e.g. ppc64, ppc64le, parisc, s390x,...) the __KSYM_REF() macro stores > 64-bit pointers into the __ksymtab* sections. > Make sure that those sections will be correctly aligned at module link time, > otherwise unaligned memory accesses may happen at runtime. The ramifications are not explained there. You keep sending me patches with this and we keep doing a nose dive on this. It means I have to do more work. So as I had suggested with your patch which I merged in commit 87c482bdfa79 ("modules: Ensure natural alignment for .altinstructions and __bug_table sections") please clarify the impact of not merging this patch. Also last time you noticed the misalignment due to a faulty exception handler, please mention how you found this out now. And since this is not your first patch on the exact related subject I'd appreciate if you can show me perf stat results differences between having and not having this patch merged. Why? Because we talk about a performance penalthy, but we are not saying how much, and since this is an ongoing thing, we might as well have a tool belt with ways to measure such performance impact to bring clarity and value to this and future related patches. > The __kcrctab* sections store 32-bit entities, so use ALIGN(4) for those. I've given some thought about how to test this. Sadly perf kallsysms just opens the /proc/kallsysms file, but that's fine, we need our own test. I think a 3 new simple modules selftest would do it and running perf stat on it. One module, let us call it module A which constructs its own name space prefix for its exported symbols and has tons of silly symbols for arbitrary data, whatever. We then have module B which refers to a few arbitrary symbols from module A, hopefully spread out linearly, so if module A had 10,000 symbols, we'd have module A refer to a symbol ever 1,000 symbols. Finally we want a symbol C which has say, 50,000 symbols all of which will not be used at all by the first two modules, but the selftest will load module C first, prior to calling modprobe B. We'll stress test this way two calls which use find_symbol(): 1) Upon load of B it will trigger simplify_symbols() to look for the symbol it uses from the module A with tons of symbols. That's an indirect way for us to call resolve_symbol_wait() from module A without having to use symbol_get() which want to remove as exported as it is just a hack which should go away. Our goal is for us to test resolve_symbol() which will call find_symbol() and that will eventually look for the symbol on module A with: find_exported_symbol_in_section() That uses bsearch() so a binary search for the symbol and we'd end up hitting the misalignments here. Binary search will at worst be O(log(n)) and so the only way to aggreviate the search will be to add tons of symbols to A, and have B use a few of them. 2) When you load B, userspace will at first load A as depmod will inform userspace A goes before B. Upon B's load towards the end right before we call module B's init routine we get complete_formation() called on the module. That will first check for duplicate symbols with the call to verify_exported_symbols(). That is when we'll force iteration on module C's insane symbol list. The selftests just runs perf stat -e pick-your-poison-for-misalignments tools/testing/selftests/kmod/ksymtab.sh Where ksymtab.sh is your new script which calls: modprobe C modprobe B I say pick-your-poison-for-misalignments because I am not sure what is best here. Thoughts? > Signed-off-by: Helge Deller <deller@gmx.de> > Cc: <stable@vger.kernel.org> # v6.0+ That's a stretch without any data, don't you think? Luis
Hi Luis, On 12/22/23 06:59, Luis Chamberlain wrote: > On Wed, Nov 22, 2023 at 11:18:12PM +0100, deller@kernel.org wrote: >> From: Helge Deller <deller@gmx.de> >> >> On 64-bit architectures without CONFIG_HAVE_ARCH_PREL32_RELOCATIONS >> (e.g. ppc64, ppc64le, parisc, s390x,...) the __KSYM_REF() macro stores >> 64-bit pointers into the __ksymtab* sections. >> Make sure that those sections will be correctly aligned at module link time, >> otherwise unaligned memory accesses may happen at runtime. > > The ramifications are not explained there. What do you mean exactly by this? > You keep sending me patches with this and we keep doing a nose dive > on this. It means I have to do more work. Sorry about that. Since you are mentioned as maintainer for modules I thought you are the right contact. Furthermore, this patch is pretty small and trivial. And I had the impression that people understand that having unaligned structures is generally a bad idea as it usually always impacts performance (although the performance penalty in this case isn't critical at all, as we are not on hot paths). > So as I had suggested with your patch which I merged in > commit 87c482bdfa79 ("modules: Ensure natural alignment for > .altinstructions and __bug_table sections") please clarify the > impact of not merging this patch. Also last time you noticed the > misalignment due to a faulty exception handler, please mention how > you found this out now. Sure. > And since this is not your first patch on the exact related subject > I'd appreciate if you can show me perf stat results differences between > having and not having this patch merged. Why? Because we talk about > a performance penalthy, but we are not saying how much, and since this > is an ongoing thing, we might as well have a tool belt with ways to > measure such performance impact to bring clarity and value to this > and future related patches. > >> The __kcrctab* sections store 32-bit entities, so use ALIGN(4) for those. > > I've given some thought about how to test this. Sadly perf kallsysms > just opens the /proc/kallsysms file, but that's fine, we need our own > test. > > I think a 3 new simple modules selftest would do it and running perf > stat on it. One module, let us call it module A which constructs its own > name space prefix for its exported symbols and has tons of silly symbols > for arbitrary data, whatever. We then have module B which refers to a > few arbitrary symbols from module A, hopefully spread out linearly, so > if module A had 10,000 symbols, we'd have module A refer to a symbol > ever 1,000 symbols. Finally we want a symbol C which has say, 50,000 > symbols all of which will not be used at all by the first two modules, > but the selftest will load module C first, prior to calling modprobe B. > > We'll stress test this way two calls which use find_symbol(): > > 1) Upon load of B it will trigger simplify_symbols() to look for the > symbol it uses from the module A with tons of symbols. That's an > indirect way for us to call resolve_symbol_wait() from module A without > having to use symbol_get() which want to remove as exported as it is > just a hack which should go away. Our goal is for us to test > resolve_symbol() which will call find_symbol() and that will eventually > look for the symbol on module A with: > > find_exported_symbol_in_section() > > That uses bsearch() so a binary search for the symbol and we'd end up > hitting the misalignments here. Binary search will at worst be O(log(n)) > and so the only way to aggreviate the search will be to add tons of > symbols to A, and have B use a few of them. > > 2) When you load B, userspace will at first load A as depmod will inform > userspace A goes before B. Upon B's load towards the end right before > we call module B's init routine we get complete_formation() called on > the module. That will first check for duplicate symbols with the call > to verify_exported_symbols(). That is when we'll force iteration on > module C's insane symbol list. > > The selftests just runs > > perf stat -e pick-your-poison-for-misalignments tools/testing/selftests/kmod/ksymtab.sh > > Where ksymtab.sh is your new script which calls: > > modprobe C > modprobe B > > I say pick-your-poison-for-misalignments because I am not sure what is > best here. > > Thoughts? I really appreciate your thoughts here! But given the triviality of this patch, I personally think you are proposing a huge academic investment in time and resources with no real benefit. On which architecture would you suggest to test this? What would be the effective (really new) outcome? If the performance penalty is zero or close to zero, will that imply that such a patch isn't useful? Please also keep in mind that not everyone gets paid to work on the kernel, so I have neither the time nor the various architectures to test on. Then, as Masahiro already mentioned, the real solution is probably to add ".balign" to the various inline assembler parts. With this the alignment gets recorded in the sh_addralign field in the object files, and the linker will correctly lay out the executable even without this patch here. And, this is exactly what you would get if C-initializers would have been used instead of inline assembly. But independend of the correct way to fix it, I think the linker script ideally should mention all sections it expects with the right alignments. Just for completeness. I'll leave it up to you if you will apply this patch. Currently it's not absolutely needed any longer, but let's think about the pros/cons of this patch: Pro: - It can prevent unnecessary unaligned memory accesses on all arches. - It provides the linker with correct alignment for the various sections. - It mimics what you would get if the structs were coded with C-initializers. - A correct section alignment can address upcoming inline assembly patches which may have missed to add the .balign Cons: - For this specific patch the worst case scenario is that we add a total of up to 7 bytes to kernel image (__ksymtab gets aligned to 8 bytes and the following sections are then already aligned). So, honestly I don't see a real reason why it shouldn't be applied... >> Signed-off-by: Helge Deller <deller@gmx.de> >> Cc: <stable@vger.kernel.org> # v6.0+ > > That's a stretch without any data, don't you think? Yes. No need to push such a patch to stable series. Helge
On Fri, Dec 22, 2023 at 01:13:26PM +0100, Helge Deller wrote: > Hi Luis, > > On 12/22/23 06:59, Luis Chamberlain wrote: > > On Wed, Nov 22, 2023 at 11:18:12PM +0100, deller@kernel.org wrote: > > > From: Helge Deller <deller@gmx.de> > > > > > > On 64-bit architectures without CONFIG_HAVE_ARCH_PREL32_RELOCATIONS > > > (e.g. ppc64, ppc64le, parisc, s390x,...) the __KSYM_REF() macro stores > > > 64-bit pointers into the __ksymtab* sections. > > > Make sure that those sections will be correctly aligned at module link time, > > > otherwise unaligned memory accesses may happen at runtime. > > > > The ramifications are not explained there. > > What do you mean exactly by this? You don't explain the impact of not applying this patch. > > You keep sending me patches with this and we keep doing a nose dive > > on this. It means I have to do more work. > Sorry about that. Since you are mentioned as maintainer for modules I > thought you are the right contact. I am certaintly the right person to send this patch to, however, I am saying that given our previous dialog I would like the commit log to describe a bit more detail of a few things: * how you found this * what are the impact of not applying it Specially if you are going to be sending patches right before the holidays with a Cc stable fix annotation. This gets me on hight alert and I have to put things down to see how really critical this is so to decide to fast track this to Linus or not. > Furthermore, this patch is pretty small and trivial. Many patches can be but some can break things and some can be small but also improve things critically, for example if we are aware of broken exception handlers. > And I had the impression that people understand that having unaligned > structures is generally a bad idea as it usually always impacts performance > (although the performance penalty in this case isn't critical at all, > as we are not on hot paths). There are two aspects to this, one is the critical nature which is implied by your patch which pegs it as a stable fix, given you prior patches about this it leaves me wondering if it is fixing some crashes on some systems given faulty exception handlers. The performance thing is also subjective, but it could not be subjective by doing some very trivial tests as I suggested. Such a test would also help as we lack specific selftsts for this case and we can grow it later with other things. I figured I'd ask you to try it, since *you* keep sending patches about misalignments on module sections so I figured you must be seeing something others are not, and so you must care. > > Thoughts? > > I really appreciate your thoughts here! > > But given the triviality of this patch, I personally think you are > proposing a huge academic investment in time and resources with no real benefit. > On which architecture would you suggest to test this? > What would be the effective (really new) outcome? > If the performance penalty is zero or close to zero, will that imply > that such a patch isn't useful? > Please also keep in mind that not everyone gets paid to work on the kernel, > so I have neither the time nor the various architectures to test on. I think you make this seem so difficult, but I understand it could seem that way. I've attached at the end of this email a patch that does just this then to help. > So, honestly I don't see a real reason why it shouldn't be applied... Like I said, you Cc'd stable as a fix, as a maintainer it is my job to verify how critical this is and ask for more details about how you found it and evaluate the real impact. Even if it was not a stable fix I tend to ask this for patches, even if they are trivial. > > > Signed-off-by: Helge Deller <deller@gmx.de> > > > Cc: <stable@vger.kernel.org> # v6.0+ > > > > That's a stretch without any data, don't you think? > > Yes. No need to push such a patch to stable series. OK, can you extend the patch below with something like: perf stat --repeat 100 --pre 'modprobe -r b a b c' -- ./tools/testing/selftests/module/find_symbol.sh And test before and after? I ran a simple test as-is and the data I get is within noise, and so I think we need the --repeat 100 thing. ----------------------------------------------------------------------------------- before: sudo ./tools/testing/selftests/module/find_symbol.sh Performance counter stats for '/sbin/modprobe test_kallsyms_b': 81,956,206 ns duration_time 81,883,000 ns system_time 210 page-faults 0.081956206 seconds time elapsed 0.000000000 seconds user 0.081883000 seconds sys Performance counter stats for '/sbin/modprobe test_kallsyms_b': 85,960,863 ns duration_time 84,679,000 ns system_time 212 page-faults 0.085960863 seconds time elapsed 0.000000000 seconds user 0.084679000 seconds sys Performance counter stats for '/sbin/modprobe test_kallsyms_b': 86,484,868 ns duration_time 86,541,000 ns system_time 213 page-faults 0.086484868 seconds time elapsed 0.000000000 seconds user 0.086541000 seconds sys ----------------------------------------------------------------------------------- After your modules alignement fix: sudo ./tools/testing/selftests/module/find_symbol.sh Performance counter stats for '/sbin/modprobe test_kallsyms_b': 83,579,980 ns duration_time 83,530,000 ns system_time 212 page-faults 0.083579980 seconds time elapsed 0.000000000 seconds user 0.083530000 seconds sys Performance counter stats for '/sbin/modprobe test_kallsyms_b': 70,721,786 ns duration_time 69,289,000 ns system_time 211 page-faults 0.070721786 seconds time elapsed 0.000000000 seconds user 0.069289000 seconds sys Performance counter stats for '/sbin/modprobe test_kallsyms_b': 76,513,219 ns duration_time 76,381,000 ns system_time 214 page-faults 0.076513219 seconds time elapsed 0.000000000 seconds user 0.076381000 seconds sys After your modules alignement fix: sudo ./tools/testing/selftests/module/find_symbol.sh Performance counter stats for '/sbin/modprobe test_kallsyms_b': 83,579,980 ns duration_time 83,530,000 ns system_time 212 page-faults 0.083579980 seconds time elapsed 0.000000000 seconds user 0.083530000 seconds sys Performance counter stats for '/sbin/modprobe test_kallsyms_b': 70,721,786 ns duration_time 69,289,000 ns system_time 211 page-faults 0.070721786 seconds time elapsed 0.000000000 seconds user 0.069289000 seconds sys Performance counter stats for '/sbin/modprobe test_kallsyms_b': 76,513,219 ns duration_time 76,381,000 ns system_time 214 page-faults 0.076513219 seconds time elapsed 0.000000000 seconds user 0.076381000 seconds sys ----------------------------------------------------------------------------------- From efe903f7a1e5917405f9555e9c00b1da2ac4b830 Mon Sep 17 00:00:00 2001 From: Luis Chamberlain <mcgrof@kernel.org> Date: Fri, 22 Dec 2023 11:22:08 -0800 Subject: [PATCH] selftests: add new kallsyms selftests Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> --- lib/Kconfig.debug | 95 +++++++++++++ lib/Makefile | 1 + lib/tests/Makefile | 1 + lib/tests/module/.gitignore | 4 + lib/tests/module/Makefile | 15 ++ lib/tests/module/gen_test_kallsyms.sh | 128 ++++++++++++++++++ tools/testing/selftests/module/Makefile | 12 ++ tools/testing/selftests/module/config | 7 + tools/testing/selftests/module/find_symbol.sh | 65 +++++++++ 9 files changed, 328 insertions(+) create mode 100644 lib/tests/Makefile create mode 100644 lib/tests/module/.gitignore create mode 100644 lib/tests/module/Makefile create mode 100755 lib/tests/module/gen_test_kallsyms.sh create mode 100644 tools/testing/selftests/module/Makefile create mode 100644 tools/testing/selftests/module/config create mode 100755 tools/testing/selftests/module/find_symbol.sh diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index 97ce28f4d154..4f5fbaceaf88 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -2835,6 +2835,101 @@ config TEST_KMOD If unsure, say N. +config TEST_RUNTIME + bool + +config TEST_RUNTIME_MODULE + bool + +config TEST_KALLSYMS + tristate "module kallsyms find_symbol() test" + depends on m + select TEST_RUNTIME + select TEST_RUNTIME_MODULE + select TEST_KALLSYMS_A + select TEST_KALLSYMS_B + select TEST_KALLSYMS_C + select TEST_KALLSYMS_D + help + This allows us to stress test find_symbol() through the kallsyms + used to place symbols on the kernel ELF kallsyms and modules kallsyms + where we place kernel symbols such as exported symbols. + + We have four test modules: + + A: has KALLSYSMS_NUMSYMS exported symbols + B: uses one of A's symbols + C: adds KALLSYMS_SCALE_FACTOR * KALLSYSMS_NUMSYMS exported + D: adds 2 * the symbols than C + + We stress test find_symbol() through two means: + + 1) Upon load of B it will trigger simplify_symbols() to look for the + one symbol it uses from the module A with tons of symbols. This is an + indirect way for us to have B call resolve_symbol_wait() upon module + load. This will eventually call find_symbol() which will eventually + try to find the symbols used with find_exported_symbol_in_section(). + find_exported_symbol_in_section() uses bsearch() so a binary search + for each symbol. Binary search will at worst be O(log(n)) so the + larger TEST_MODULE_KALLSYSMS the worse the search. + + 2) The selftests should load C first, before B. Upon B's load towards + the end right before we call module B's init routine we get + complete_formation() called on the module. That will first check + for duplicate symbols with the call to verify_exported_symbols(). + That is when we'll force iteration on module C's insane symbol list. + Since it has 10 * KALLSYMS_NUMSYMS it means we can first test + just loading B without C. The amount of time it takes to load C Vs + B can give us an idea of the impact growth of the symbol space and + give us projection. Module A only uses one symbol from B so to allow + this scaling in module C to be proportional, if it used more symbols + then the first test would be doing more and increasing just the + search space would be slightly different. The last module, module D + will just increase the search space by twice the number of symbols in + C so to allow for full projects. + + tools/testing/selftests/module/find_symbol.sh + + If unsure, say N. + +if TEST_KALLSYMS + +config TEST_KALLSYMS_A + tristate + depends on m + +config TEST_KALLSYMS_B + tristate + depends on m + +config TEST_KALLSYMS_C + tristate + depends on m + +config TEST_KALLSYMS_D + tristate + depends on m + +config TEST_KALLSYMS_NUMSYMS + int "test kallsyms number of symbols" + default 10000 + help + The number of symbols to create on TEST_KALLSYMS_A, only one of which + module TEST_KALLSYMS_B will use. This also will be used + for how many symbols TEST_KALLSYMS_C will have, scaled up by + TEST_KALLSYMS_SCALE_FACTOR. + +config TEST_KALLSYMS_SCALE_FACTOR + int "test kallsyms scale factor" + default 4 + help + How many more unusued symbols will TEST_KALLSYSMS_C have than + TEST_KALLSYMS_A. If 4, then module C will have 4 * syms + than module A. Then TEST_KALLSYMS_D will have double the amount + of symbols than C so to allow projections. + +endif # TEST_KALLSYMS + config TEST_DEBUG_VIRTUAL tristate "Test CONFIG_DEBUG_VIRTUAL feature" depends on DEBUG_VIRTUAL diff --git a/lib/Makefile b/lib/Makefile index 6b09731d8e61..48e69a61cd85 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -95,6 +95,7 @@ obj-$(CONFIG_TEST_XARRAY) += test_xarray.o obj-$(CONFIG_TEST_MAPLE_TREE) += test_maple_tree.o obj-$(CONFIG_TEST_PARMAN) += test_parman.o obj-$(CONFIG_TEST_KMOD) += test_kmod.o +obj-$(CONFIG_TEST_RUNTIME) += tests/ obj-$(CONFIG_TEST_DEBUG_VIRTUAL) += test_debug_virtual.o obj-$(CONFIG_TEST_MEMCAT_P) += test_memcat_p.o obj-$(CONFIG_TEST_OBJAGG) += test_objagg.o diff --git a/lib/tests/Makefile b/lib/tests/Makefile new file mode 100644 index 000000000000..8e4f42cb9c54 --- /dev/null +++ b/lib/tests/Makefile @@ -0,0 +1 @@ +obj-$(CONFIG_TEST_RUNTIME_MODULE) += module/ diff --git a/lib/tests/module/.gitignore b/lib/tests/module/.gitignore new file mode 100644 index 000000000000..8be7891b250f --- /dev/null +++ b/lib/tests/module/.gitignore @@ -0,0 +1,4 @@ +test_kallsyms_a.c +test_kallsyms_b.c +test_kallsyms_c.c +test_kallsyms_d.c diff --git a/lib/tests/module/Makefile b/lib/tests/module/Makefile new file mode 100644 index 000000000000..af5c27b996cb --- /dev/null +++ b/lib/tests/module/Makefile @@ -0,0 +1,15 @@ +obj-$(CONFIG_TEST_KALLSYMS_A) += test_kallsyms_a.o +obj-$(CONFIG_TEST_KALLSYMS_B) += test_kallsyms_b.o +obj-$(CONFIG_TEST_KALLSYMS_C) += test_kallsyms_c.o +obj-$(CONFIG_TEST_KALLSYMS_D) += test_kallsyms_d.o + +$(obj)/%.c: FORCE + @$(kecho) " GEN $@" + $(Q)$(srctree)/lib/tests/module/gen_test_kallsyms.sh $@\ + $(CONFIG_TEST_KALLSYMS_NUMSYMS) \ + $(CONFIG_TEST_KALLSYMS_SCALE_FACTOR) + +clean-files += test_kallsyms_a.c +clean-files += test_kallsyms_b.c +clean-files += test_kallsyms_c.c +clean-files += test_kallsyms_d.c diff --git a/lib/tests/module/gen_test_kallsyms.sh b/lib/tests/module/gen_test_kallsyms.sh new file mode 100755 index 000000000000..e85f10dc11bd --- /dev/null +++ b/lib/tests/module/gen_test_kallsyms.sh @@ -0,0 +1,128 @@ +#!/bin/bash + +TARGET=$(basename $1) +DIR=lib/tests/module +TARGET="$DIR/$TARGET" +NUM_SYMS=$2 +SCALE_FACTOR=$3 +TEST_TYPE=$(echo $TARGET | sed -e 's|lib/tests/module/test_kallsyms_||g') +TEST_TYPE=$(echo $TEST_TYPE | sed -e 's|.c||g') + +gen_template_module_header() +{ + cat <<____END_MODULE +// SPDX-License-Identifier: GPL-2.0-or-later OR copyleft-next-0.3.1 +/* + * Copyright (C) 2023 Luis Chamberlain <mcgrof@kernel.org> + * + * Automatically generated code for testing, do not edit manually. + */ + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include <linux/init.h> +#include <linux/module.h> +#include <linux/printk.h> + +____END_MODULE +} + +gen_num_syms() +{ + PREFIX=$1 + NUM=$2 + for i in $(seq 1 $NUM); do + printf "int auto_test_%s_%010d = 0xff;\n" $PREFIX $i + printf "EXPORT_SYMBOL_GPL(auto_test_%s_%010d);\n" $PREFIX $i + done + echo +} + +gen_template_module_data_a() +{ + gen_num_syms a $1 + cat <<____END_MODULE +static int auto_runtime_test(void) +{ + return 0; +} + +____END_MODULE +} + +gen_template_module_data_b() +{ + printf "\nextern int auto_test_a_%010d;\n\n" 28 + echo "static int auto_runtime_test(void)" + echo "{" + printf "\nreturn auto_test_a_%010d;\n" 28 + echo "}" +} + +gen_template_module_data_c() +{ + gen_num_syms c $1 + cat <<____END_MODULE +static int auto_runtime_test(void) +{ + return 0; +} + +____END_MODULE +} + +gen_template_module_data_d() +{ + gen_num_syms d $1 + cat <<____END_MODULE +static int auto_runtime_test(void) +{ + return 0; +} + +____END_MODULE +} + +gen_template_module_exit() +{ + cat <<____END_MODULE +static int __init auto_test_module_init(void) +{ + return auto_runtime_test(); +} +module_init(auto_test_module_init); + +static void __exit auto_test_module_exit(void) +{ +} +module_exit(auto_test_module_exit); + +MODULE_AUTHOR("Luis Chamberlain <mcgrof@kernel.org>"); +MODULE_LICENSE("GPL"); +____END_MODULE +} + +case $TEST_TYPE in + a) + gen_template_module_header > $TARGET + gen_template_module_data_a $NUM_SYMS >> $TARGET + gen_template_module_exit >> $TARGET + ;; + b) + gen_template_module_header > $TARGET + gen_template_module_data_b >> $TARGET + gen_template_module_exit >> $TARGET + ;; + c) + gen_template_module_header > $TARGET + gen_template_module_data_c $((NUM_SYMS * SCALE_FACTOR)) >> $TARGET + gen_template_module_exit >> $TARGET + ;; + d) + gen_template_module_header > $TARGET + gen_template_module_data_d $((NUM_SYMS * SCALE_FACTOR * 2)) >> $TARGET + gen_template_module_exit >> $TARGET + ;; + *) + ;; +esac diff --git a/tools/testing/selftests/module/Makefile b/tools/testing/selftests/module/Makefile new file mode 100644 index 000000000000..6132d7ddb08b --- /dev/null +++ b/tools/testing/selftests/module/Makefile @@ -0,0 +1,12 @@ +# SPDX-License-Identifier: GPL-2.0-only +# Makefile for module loading selftests + +# No binaries, but make sure arg-less "make" doesn't trigger "run_tests" +all: + +TEST_PROGS := find_symbol.sh + +include ../lib.mk + +# Nothing to clean up. +clean: diff --git a/tools/testing/selftests/module/config b/tools/testing/selftests/module/config new file mode 100644 index 000000000000..259f4fd6b5e2 --- /dev/null +++ b/tools/testing/selftests/module/config @@ -0,0 +1,7 @@ +CONFIG_TEST_KMOD=m +CONFIG_TEST_LKM=m +CONFIG_XFS_FS=m + +# For the module parameter force_init_test is used +CONFIG_TUN=m +CONFIG_BTRFS_FS=m diff --git a/tools/testing/selftests/module/find_symbol.sh b/tools/testing/selftests/module/find_symbol.sh new file mode 100755 index 000000000000..c206b42525f1 --- /dev/null +++ b/tools/testing/selftests/module/find_symbol.sh @@ -0,0 +1,65 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0-or-later OR copyleft-next-0.3.1 +# Copyright (C) 2023 Luis Chamberlain <mcgrof@kernel.org> +# +# This is a stress test script for kallsyms through find_symbol() + +set -e + +# Kselftest framework requirement - SKIP code is 4. +ksft_skip=4 + +test_reqs() +{ + if ! which modprobe 2> /dev/null > /dev/null; then + echo "$0: You need modprobe installed" >&2 + exit $ksft_skip + fi + + if ! which kmod 2> /dev/null > /dev/null; then + echo "$0: You need kmod installed" >&2 + exit $ksft_skip + fi + + if ! which perf 2> /dev/null > /dev/null; then + echo "$0: You need perf installed" >&2 + exit $ksft_skip + fi + + uid=$(id -u) + if [ $uid -ne 0 ]; then + echo $msg must be run as root >&2 + exit $ksft_skip + fi +} + +remove_all() +{ + $MODPROBE -r test_kallsyms_b + for i in a b c d; do + $MODPROBE -r test_kallsyms_$i + done +} +test_reqs + +MODPROBE=$(</proc/sys/kernel/modprobe) +STATS="-e duration_time" +STATS="$STATS -e user_time" +STATS="$STATS -e system_time" +STATS="$STATS -e page-faults" + +remove_all +perf stat $STATS $MODPROBE test_kallsyms_b +remove_all + +# Now pollute the namespace +$MODPROBE test_kallsyms_c +perf stat $STATS $MODPROBE test_kallsyms_b + +# Now pollute the namespace with twice the number of symbols than the last time +remove_all +$MODPROBE test_kallsyms_c +$MODPROBE test_kallsyms_d +perf stat $STATS $MODPROBE test_kallsyms_b + +exit 0
Hi Luis, On 12/22/23 21:10, Luis Chamberlain wrote: > On Fri, Dec 22, 2023 at 01:13:26PM +0100, Helge Deller wrote: >> On 12/22/23 06:59, Luis Chamberlain wrote: >>> On Wed, Nov 22, 2023 at 11:18:12PM +0100, deller@kernel.org wrote: >>>> On 64-bit architectures without CONFIG_HAVE_ARCH_PREL32_RELOCATIONS >>>> (e.g. ppc64, ppc64le, parisc, s390x,...) the __KSYM_REF() macro stores >>>> 64-bit pointers into the __ksymtab* sections. >>>> Make sure that those sections will be correctly aligned at module link time, >>>> otherwise unaligned memory accesses may happen at runtime. >>> ... ... >> So, honestly I don't see a real reason why it shouldn't be applied... > > Like I said, you Cc'd stable as a fix, I added "Cc: stable@vger.kernel.org" on the patch itself, so *if* the patch would have been applied by you, it would later end up in stable kernel series too. But I did not CC'ed the stable mailing list directly, so my patch was never sent to that mailing list. > as a maintainer it is my job to > verify how critical this is and ask for more details about how you found > it and evaluate the real impact. Even if it was not a stable fix I tend > to ask this for patches, even if they are trivial. > ... > OK, can you extend the patch below with something like: > > perf stat --repeat 100 --pre 'modprobe -r b a b c' -- ./tools/testing/selftests/module/find_symbol.sh > > And test before and after? > > I ran a simple test as-is and the data I get is within noise, and so > I think we need the --repeat 100 thing. Your selftest code is based on perf. AFAICS we don't have perf on parisc/hppa, so I can't test your selftest code on that architecture. I assume you tested on x86, where the CPU will transparently take care of unaligned accesses. This is probably why the results are within the noise. But on some platforms the CPU raises an exception on unaligned accesses and jumps into special exception handler assembler code inside the kernel. This is much more expensive than on x86, which is why we track on parisc in /proc/cpuinfo counters on how often this exception handler is called: IRQ: CPU0 CPU1 3: 1332 0 SuperIO ttyS0 7: 1270013 0 SuperIO pata_ns87415 64: 320023012 320021431 CPU timer 65: 17080507 20624423 CPU IPI UAH: 10948640 58104 Unaligned access handler traps This "UAH" field could theoretically be used to extend your selftest. But is it really worth it? The outcome is very much architecture and CPU specific, maybe it's just within the noise as you measured. IMHO we should always try to natively align structures, and if we see we got it wrong in kernel code, we should fix it. My patches just fix those memory sections where we use inline assembly (instead of C) and thus missed to provide the correct alignments. Helge > ----------------------------------------------------------------------------------- > before: > sudo ./tools/testing/selftests/module/find_symbol.sh > > Performance counter stats for '/sbin/modprobe test_kallsyms_b': > > 81,956,206 ns duration_time > 81,883,000 ns system_time > 210 page-faults > > 0.081956206 seconds time elapsed > > 0.000000000 seconds user > 0.081883000 seconds sys > > > > Performance counter stats for '/sbin/modprobe test_kallsyms_b': > > 85,960,863 ns duration_time > 84,679,000 ns system_time > 212 page-faults > > 0.085960863 seconds time elapsed > > 0.000000000 seconds user > 0.084679000 seconds sys > > > > Performance counter stats for '/sbin/modprobe test_kallsyms_b': > > 86,484,868 ns duration_time > 86,541,000 ns system_time > 213 page-faults > > 0.086484868 seconds time elapsed > > 0.000000000 seconds user > 0.086541000 seconds sys > > ----------------------------------------------------------------------------------- > After your modules alignement fix: > sudo ./tools/testing/selftests/module/find_symbol.sh > Performance counter stats for '/sbin/modprobe test_kallsyms_b': > > 83,579,980 ns duration_time > 83,530,000 ns system_time > 212 page-faults > > 0.083579980 seconds time elapsed > > 0.000000000 seconds user > 0.083530000 seconds sys > > > > Performance counter stats for '/sbin/modprobe test_kallsyms_b': > > 70,721,786 ns duration_time > 69,289,000 ns system_time > 211 page-faults > > 0.070721786 seconds time elapsed > > 0.000000000 seconds user > 0.069289000 seconds sys > > > > Performance counter stats for '/sbin/modprobe test_kallsyms_b': > > 76,513,219 ns duration_time > 76,381,000 ns system_time > 214 page-faults > > 0.076513219 seconds time elapsed > > 0.000000000 seconds user > 0.076381000 seconds sys > > After your modules alignement fix: > sudo ./tools/testing/selftests/module/find_symbol.sh > Performance counter stats for '/sbin/modprobe test_kallsyms_b': > > 83,579,980 ns duration_time > 83,530,000 ns system_time > 212 page-faults > > 0.083579980 seconds time elapsed > > 0.000000000 seconds user > 0.083530000 seconds sys > > > > Performance counter stats for '/sbin/modprobe test_kallsyms_b': > > 70,721,786 ns duration_time > 69,289,000 ns system_time > 211 page-faults > > 0.070721786 seconds time elapsed > > 0.000000000 seconds user > 0.069289000 seconds sys > > > > Performance counter stats for '/sbin/modprobe test_kallsyms_b': > > 76,513,219 ns duration_time > 76,381,000 ns system_time > 214 page-faults > > 0.076513219 seconds time elapsed > > 0.000000000 seconds user > 0.076381000 seconds sys > ----------------------------------------------------------------------------------- > > [perf-based selftest patch from Luis stripped]
On Sat, Dec 30, 2023 at 08:33:24AM +0100, Helge Deller wrote: > Your selftest code is based on perf. > AFAICS we don't have perf on parisc/hppa, I see! > so I can't test your selftest code > on that architecture. > I assume you tested on x86, where the CPU will transparently take care of > unaligned accesses. This is probably why the results are within > the noise. > But on some platforms the CPU raises an exception on unaligned accesses > and jumps into special exception handler assembler code inside the kernel. > This is much more expensive than on x86, which is why we track on parisc > in /proc/cpuinfo counters on how often this exception handler is called: > IRQ: CPU0 CPU1 > 3: 1332 0 SuperIO ttyS0 > 7: 1270013 0 SuperIO pata_ns87415 > 64: 320023012 320021431 CPU timer > 65: 17080507 20624423 CPU IPI > UAH: 10948640 58104 Unaligned access handler traps > > This "UAH" field could theoretically be used to extend your selftest. Nice! > But is it really worth it? The outcome is very much architecture and CPU > specific, maybe it's just within the noise as you measured. It's within the noise for x86_64, but given what you suggest for parisc where it is much more expensive, we should see a non-noise delta. Even just time on loading the module should likely result in a considerable delta than on x86_64. You may just need to play a bit with the default values at build time. > IMHO we should always try to natively align structures, and if we see > we got it wrong in kernel code, we should fix it. This was all motivated by the first review criteria of these patches as if they were stable worthy or not. Even if we don't consider them stable material, given the test is now written and easily extended to test on parisc with just timing information and UAH I think it would be nice to have this data for a few larger default factor values so we can compare against x86_64 while we're at it. If you don't feel like doing that test that's fine too, we can just ignore that. I'll still apply the patches but, I figured I'd ask to collect information while the test was already written and it should now be easy to compare / contrast differences. Luis
On 1/22/24 17:10, Luis Chamberlain wrote: > On Sat, Dec 30, 2023 at 08:33:24AM +0100, Helge Deller wrote: >> Your selftest code is based on perf. >> AFAICS we don't have perf on parisc/hppa, > > I see! > >> so I can't test your selftest code >> on that architecture. >> I assume you tested on x86, where the CPU will transparently take care of >> unaligned accesses. This is probably why the results are within >> the noise. >> But on some platforms the CPU raises an exception on unaligned accesses >> and jumps into special exception handler assembler code inside the kernel. >> This is much more expensive than on x86, which is why we track on parisc >> in /proc/cpuinfo counters on how often this exception handler is called: >> IRQ: CPU0 CPU1 >> 3: 1332 0 SuperIO ttyS0 >> 7: 1270013 0 SuperIO pata_ns87415 >> 64: 320023012 320021431 CPU timer >> 65: 17080507 20624423 CPU IPI >> UAH: 10948640 58104 Unaligned access handler traps >> >> This "UAH" field could theoretically be used to extend your selftest. > > Nice! > >> But is it really worth it? The outcome is very much architecture and CPU >> specific, maybe it's just within the noise as you measured. > > It's within the noise for x86_64, but given what you suggest > for parisc where it is much more expensive, we should see a non-noise > delta. Even just time on loading the module should likely result in > a considerable delta than on x86_64. You may just need to play a bit > with the default values at build time. I don't know if it will be a "considerable" amount of time. >> IMHO we should always try to natively align structures, and if we see >> we got it wrong in kernel code, we should fix it. > > This was all motivated by the first review criteria of these patches > as if they were stable worthy or not. Even if we don't consider them > stable material, given the test is now written and easily extended to > test on parisc with just timing information and UAH I think it would > be nice to have this data for a few larger default factor values so we > can compare against x86_64 while we're at it. > > If you don't feel like doing that test that's fine too, we can just > ignore that. I can do that test, but I won't have time for that in the next few weeks... > I'll still apply the patches Yes, please do! Even if I don't test now, I (or others) will test at a later point. > but, I figured I'd ask to collect information while the test was already > written and it should now be easy to compare / contrast differences. Ok. Helge
On Mon, Jan 22, 2024 at 05:47:49PM +0100, Helge Deller wrote: > On 1/22/24 17:10, Luis Chamberlain wrote: > > > > It's within the noise for x86_64, but given what you suggest > > for parisc where it is much more expensive, we should see a non-noise > > delta. Even just time on loading the module should likely result in > > a considerable delta than on x86_64. You may just need to play a bit > > with the default values at build time. > > I don't know if it will be a "considerable" amount of time. There are variables which you can tune, so given what you suggest it should be possible to get to that spot rather easily with a few changes to the default values I think. > > If you don't feel like doing that test that's fine too, we can just > > ignore that. > > I can do that test, but I won't have time for that in the next few weeks... I would appreciate it! Luis
diff --git a/scripts/module.lds.S b/scripts/module.lds.S index bf5bcf2836d8..b00415a9ff27 100644 --- a/scripts/module.lds.S +++ b/scripts/module.lds.S @@ -15,10 +15,10 @@ SECTIONS { *(.discard.*) } - __ksymtab 0 : { *(SORT(___ksymtab+*)) } - __ksymtab_gpl 0 : { *(SORT(___ksymtab_gpl+*)) } - __kcrctab 0 : { *(SORT(___kcrctab+*)) } - __kcrctab_gpl 0 : { *(SORT(___kcrctab_gpl+*)) } + __ksymtab 0 : ALIGN(8) { *(SORT(___ksymtab+*)) } + __ksymtab_gpl 0 : ALIGN(8) { *(SORT(___ksymtab_gpl+*)) } + __kcrctab 0 : ALIGN(4) { *(SORT(___kcrctab+*)) } + __kcrctab_gpl 0 : ALIGN(4) { *(SORT(___kcrctab_gpl+*)) } .ctors 0 : ALIGN(8) { *(SORT(.ctors.*)) *(.ctors) } .init_array 0 : ALIGN(8) { *(SORT(.init_array.*)) *(.init_array) }