Message ID | 20221020032024.1804535-1-yury.norov@gmail.com |
---|---|
Headers |
From: Yury Norov <yury.norov@gmail.com> To: "Russell King (Oracle)" <linux@armlinux.org.uk>, Catalin Marinas <catalin.marinas@arm.com>, Mark Rutland <mark.rutland@arm.com>, Will Deacon <will@kernel.org>, linux-arm-kernel@lists.infradead.org Cc: Yury Norov <yury.norov@gmail.com>, Linux Kernel Mailing List <linux-kernel@vger.kernel.org>, Alexey Klimov <klimov.linux@gmail.com>, Andy Shevchenko <andy.shevchenko@gmail.com>, Andy Whitcroft <apw@canonical.com>, Dennis Zhou <dennis@kernel.org>, Geert Uytterhoeven <geert@linux-m68k.org>, Guenter Roeck <linux@roeck-us.net>, Kees Cook <keescook@chromium.org>, Linus Torvalds <torvalds@linux-foundation.org>, Rasmus Villemoes <linux@rasmusvillemoes.dk> Subject: [RFC PATCH 0/2] Switch ARM to generic find_bit() API Date: Wed, 19 Oct 2022 20:20:22 -0700 Message-Id: <20221020032024.1804535-1-yury.norov@gmail.com> X-Mailer: git-send-email 2.34.1 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org |
Series |
Switch ARM to generic find_bit() API
|
|
Message
Yury Norov
Oct. 20, 2022, 3:20 a.m. UTC
Hi Russell, all,

I'd like to respin a patch that switches ARM to generic find_bit()
functions.

Generic code works on par with arch or better, according to my
testing [1], and with recent improvements merged in v6.1, it should
be even faster.

ARM already uses many generic find_bit() functions - those that it
doesn't implement. So we are talking about migrating a subset of the
API; most of the find_bit() family has only a generic implementation
on ARM.

The only concern about this migration is that ARM code supports
byte-aligned bitmap addresses, while generic code is optimized for
word-aligned bitmaps.

In my practice, I've never seen unaligned bitmaps. But to check that on
ARM, I added a run-time check for bitmap alignment. I gave it a run on
several architectures and found nothing.

Can you please check that on your hardware and compare the performance
of generic vs arch code? If everything is OK, I suggest switching
ARM to generic find_bit() completely.

Thanks,
Yury

[1] https://lore.kernel.org/all/YuWk3titnOiQACzC@yury-laptop/

Yury Norov (2):
  bitmap: add sanity check function for find_bit()
  arm: drop arch implementation for find_bit() functions

 arch/arm/include/asm/bitops.h |  63 -----------
 arch/arm/kernel/armksyms.c    |  11 --
 arch/arm/lib/Makefile         |   2 +-
 arch/arm/lib/findbit.S        | 193 ----------------------------------
 include/linux/find.h          |  35 ++++++
 lib/Kconfig.debug             |   7 ++
 6 files changed, 43 insertions(+), 268 deletions(-)
 delete mode 100644 arch/arm/lib/findbit.S
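To make the alignment concern concrete, here is a minimal userspace sketch, not the code from the patch. The helper names bitmap_is_word_aligned() and sketch_find_first_bit() are hypothetical, and __builtin_ctzl() assumes GCC or Clang. It shows why a word-at-a-time search wants an unsigned long-aligned bitmap, and what a run-time alignment check could look like.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical check: is the bitmap address aligned to unsigned long? */
static bool bitmap_is_word_aligned(const void *addr)
{
	return ((uintptr_t)addr % sizeof(unsigned long)) == 0;
}

/*
 * Word-at-a-time search in the spirit of generic code (a sketch only).
 * Returns the index of the first set bit, or nbits if none is set.
 * Each iteration loads a whole unsigned long, which is why the bitmap
 * is expected to be word-aligned.
 */
static size_t sketch_find_first_bit(const unsigned long *addr, size_t nbits)
{
	for (size_t idx = 0; idx * BITS_PER_LONG < nbits; idx++) {
		unsigned long word = addr[idx];

		if (word) {
			/* Count trailing zeros to locate the lowest set bit. */
			size_t bit = idx * BITS_PER_LONG +
				     (size_t)__builtin_ctzl(word);

			/* Bits at or beyond nbits are outside the bitmap. */
			return bit < nbits ? bit : nbits;
		}
	}
	return nbits;
}
```

A caller could assert bitmap_is_word_aligned(map) before searching. A byte-aligned address such as (char *)map + 1 fails the check, which is the case the ARM assembly tolerates and the generic code does not target.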
Comments
On Wed, Oct 19, 2022 at 08:20:22PM -0700, Yury Norov wrote:
> Hi Russell, all,
>
> I'd like to respin a patch that switches ARM to generic find_bit()
> functions.
>
> Generic code works on par with arch or better, according to my
> testing [1], and with recent improvements merged in v6.1, it should
> be even faster.
>
> ARM already uses many generic find_bit() functions - those that it
> doesn't implement. So we are talking about migrating a subset of the
> API; most of the find_bit() family has only a generic implementation
> on ARM.
>
> The only concern about this migration is that ARM code supports
> byte-aligned bitmap addresses, while generic code is optimized for
> word-aligned bitmaps.
>
> In my practice, I've never seen unaligned bitmaps. But to check that on
> ARM, I added a run-time check for bitmap alignment. I gave it a run on
> several architectures and found nothing.
>
> Can you please check that on your hardware and compare the performance
> of generic vs arch code? If everything is OK, I suggest switching
> ARM to generic find_bit() completely.
>
> Thanks,
> Yury
>
> [1] https://lore.kernel.org/all/YuWk3titnOiQACzC@yury-laptop/

I _really_ don't want to play around with this stuff right now... 6.0
appears to have a regression on arm32 early on during boot:

[    1.410115] EXT4-fs error (device sda1): htree_dirblock_to_tree:1093: inode #256: block 8797: comm systemd: bad entry in directory: rec_len % 4 != 0 - offset=0, inode=33188, rec_len=35097, size=4096 fake=0

Booting 5.19 with the same filesystem works without issue and without
even a fsck, but booting 6.0 always results in some problem that
prevents it booting.

Debugging this is not easy, because there also seems to be something
up with the bloody serial console - sometimes I get nothing, other
times I get nothing more than:

[    2.929502] EXT4-fs error (de

and then the output stops. Is the console no longer synchronous? If it
isn't, that's a huge mistake which can be seen right here with the
partial message output... so I also need to work out how to make the
console output synchronous again.
On Thu, Oct 20, 2022 at 05:51:34PM +0100, Russell King (Oracle) wrote:
> I _really_ don't want to play around with this stuff right now... 6.0
> appears to have a regression on arm32 early on during boot:
>
> [    1.410115] EXT4-fs error (device sda1): htree_dirblock_to_tree:1093: inode #256: block 8797: comm systemd: bad entry in directory: rec_len % 4 != 0 - offset=0, inode=33188, rec_len=35097, size=4096 fake=0
>
> Booting 5.19 with the same filesystem works without issue and without
> even a fsck, but booting 6.0 always results in some problem that
> prevents it booting.
>
> Debugging this is not easy, because there also seems to be something
> up with the bloody serial console - sometimes I get nothing, other
> times I get nothing more than:
>
> [    2.929502] EXT4-fs error (de
>
> and then the output stops. Is the console no longer synchronous? If it
> isn't, that's a huge mistake which can be seen right here with the
> partial message output... so I also need to work out how to make the
> console output synchronous again.

Got it.

If you think that the EXT4 problems are due to unaligned bitmaps, you
can take the 1st patch from this series to check.

Thanks,
Yury
On Thu, Oct 20, 2022 at 01:02:07PM -0700, Yury Norov wrote:
> Got it.
>
> If you think that the EXT4 problems are due to unaligned bitmaps, you
> can take the 1st patch from this series to check.

Got to the bottom of it, it wasn't the bit array functions, it was DMA
API issues.

Okay, I've now tested the generic ops vs my updated optimised ops, and
my version still comes out faster (based on three runs). The
random-filled bitmaps show less difference, but the sparse bitmaps show
a much better win for my optimised code over the generic code where
they exist:

arm: [  694.614773] find_next_bit:            40078 ns,    656 iterations
gen: [   88.611973] find_next_bit:            69943 ns,    655 iterations

arm: [  694.625436] find_next_zero_bit:     3939309 ns, 327025 iterations
gen: [   88.624227] find_next_zero_bit:     5529553 ns, 327026 iterations

arm: [  694.646236] find_first_bit:         7301363 ns,    656 iterations
gen: [   88.645313] find_first_bit:         7589120 ns,    655 iterations

These figures appear to be pretty consistent across the three runs.
For completeness, here are the random-filled results:

arm: [  694.109190] find_next_bit:          2242618 ns, 163949 iterations
gen: [   88.167340] find_next_bit:          2632859 ns, 163743 iterations

arm: [  694.117968] find_next_zero_bit:     2049129 ns, 163732 iterations
gen: [   88.176912] find_next_zero_bit:     2844221 ns, 163938 iterations

arm: [  694.151421] find_first_bit:        17778911 ns,  16448 iterations
gen: [   88.211167] find_first_bit:        18596622 ns,  16401 iterations

So, I don't see much reason to switch to the generic ops for these, not
when we have such a significant speedup on the find_next_* functions
for sparse-filled results.
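As a reading aid for the figures above, the following sketch computes the generic-to-arm time ratio for each benchmark. The struct and function names are hypothetical; only the nanosecond totals come from the thread. The iteration counts differ slightly between runs (for example 656 vs 655), so the ratio of totals is an approximation. For sparse find_next_bit it comes out at roughly 1.7x in favour of the arm code.

```c
#include <stddef.h>

/* One benchmark line pair from the thread: total time in ns per variant. */
struct bench_pair {
	const char *name;
	double arm_ns;	/* arch-optimised implementation */
	double gen_ns;	/* generic implementation */
};

/* Sparse bitmap results, copied from the figures above. */
static const struct bench_pair sparse_results[] = {
	{ "find_next_bit",        40078.0,   69943.0 },
	{ "find_next_zero_bit", 3939309.0, 5529553.0 },
	{ "find_first_bit",     7301363.0, 7589120.0 },
};

/* Random-filled bitmap results, copied from the figures above. */
static const struct bench_pair random_results[] = {
	{ "find_next_bit",       2242618.0,  2632859.0 },
	{ "find_next_zero_bit",  2049129.0,  2844221.0 },
	{ "find_first_bit",     17778911.0, 18596622.0 },
};

/* Ratio above 1.0 means the generic code took longer than the arm code. */
static double gen_over_arm(const struct bench_pair *p)
{
	return p->gen_ns / p->arm_ns;
}
```

Calling gen_over_arm() on each entry gives a quick per-function comparison; every ratio in both tables is above 1.0, with the largest gaps on the find_next_* functions.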