Message ID | BYAPR06MB557347406F22FBA1E400A5BFD8319@BYAPR06MB5573.namprd06.prod.outlook.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:6687:0:0:0:0:0 with SMTP id l7csp750852wru; Mon, 24 Oct 2022 18:51:09 -0700 (PDT) X-Google-Smtp-Source: AMsMyM5oUcLxPbTq31dI2mbnPNZnFMsXxGmLzNTJf6iaucQYpxPMMOzYNa7PbhQnOMzZLeqeURz+ X-Received: by 2002:a05:6a00:15c2:b0:565:bc96:1c75 with SMTP id o2-20020a056a0015c200b00565bc961c75mr35957744pfu.23.1666662669669; Mon, 24 Oct 2022 18:51:09 -0700 (PDT) ARC-Seal: i=2; a=rsa-sha256; t=1666662669; cv=pass; d=google.com; s=arc-20160816; b=ksJTrqoN7XiCOTEM5qzFzMVONEyytKezPnsnDyxbYRlmPrbGnFE4zKz3WbHHyqKrXT 3y0obCYZcE5vG28RNhpvgdrWbRUdKEFFXS++4xQ5d8OvoKqOJklEDn+sf6OUqkH0jNIi YIqc1BSd/UeEI+11K7J6OfFnWSNk7DP3OhO8V8AXv0QY22MM1SRTqbX1NTXc2dmldGYN odnRFwMIMzxMEpKPq3f0tV67VGqqdnqt+QtlC8jdFAiJQNi9Dh3icrTTDFU0rAlv3U+j eo4lWWxtkIJ6CHKfms6wCSrqynLH1xpOd8pfyqN1MKkOJc17IVTuDZfsLVetDYHzWZqx aNBQ== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:mime-version:content-transfer-encoding :content-language:accept-language:message-id:date:thread-index :thread-topic:subject:to:from:dkim-signature; bh=pG4T3DJJr/eG5bTs9YDAwGXrStNCvUKCRGYS1gq9lYU=; b=e/0bV0oxezYXzKFf7Ak1uIiVcDTV5Yf+8qS4I5tGD3MpyVRO7K4xwX8a9KvqUaAS1t A/+c+neHOPg4xwL9Mgffu9pXM8tDLKo90Rqi17ltN9DoYyjL51NGBeDSVFPp7AxFpJwB 4Q0cwikuvNy29IriZ+Bsx0bc+LOyPt1O5KOYekuAuooo/r77TJWL9SMzRXO5COa9PRxD m15V1E1125XxuZJ6ri1bjHnu9tEeWM9VihUtuconioZn+J/LWnAsa8Qe3HI5rAJAdZZq sMv81vDGcPI6oSKpm+/ERPG0o5DarnNASTmtIbF11+XEBn0CPstNsE9HJHV5zO+2p7eR qd8w== ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@nathanm.onmicrosoft.com header.s=selector2-nathanm-onmicrosoft-com header.b=EcTmZndM; arc=pass (i=1 spf=pass spfdomain=nathanm.com dkim=pass dkdomain=nathanm.com dmarc=pass fromdomain=nathanm.com); spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id o6-20020a655206000000b004606d9040b8si1299710pgp.387.2022.10.24.18.50.57; Mon, 24 Oct 2022 18:51:09 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@nathanm.onmicrosoft.com header.s=selector2-nathanm-onmicrosoft-com header.b=EcTmZndM; arc=pass (i=1 spf=pass spfdomain=nathanm.com dkim=pass dkdomain=nathanm.com dmarc=pass fromdomain=nathanm.com); spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231668AbiJYBuW (ORCPT <rfc822;pwkd43@gmail.com> + 99 others); Mon, 24 Oct 2022 21:50:22 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59722 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231435AbiJYBuD (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Mon, 24 Oct 2022 21:50:03 -0400 Received: from NAM10-DM6-obe.outbound.protection.outlook.com (mail-dm6nam10on2116.outbound.protection.outlook.com [40.107.93.116]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B51731372BA for <linux-kernel@vger.kernel.org>; Mon, 24 Oct 2022 18:46:34 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=crU395b4E9aLHKhoG5fD+LVcxzer8gMRWafduvbB2XfRKS3wYaLb4bI+MMLS4s+OWNAEdr5zVC1sv97K7qRUJvBoW2a52lAeYWIoRVkDDqCfY+ct8A+VXRHAHB6DiVMlNHcy19d977DSQAsJ8se6tDumeVRC7uoOdEE7o2sMdf921elxeaF5VTtZp4hm3GUavMue5CJZQsQJSGMTHIc0foD6qIv+HLG9G2Vdh5FyhA/gkdp23DDMiKih3p3v4Cuiba/agu8tdO9K8DRTGOG0w87PI5aJLOyYSvUJwx+fmNDvZHrZ0Be+3BkhaiSDIwBgeaT9gkdoQb4taVwWmDSGIQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=pG4T3DJJr/eG5bTs9YDAwGXrStNCvUKCRGYS1gq9lYU=; b=ENmtPMSpxWoHE0f6nuuTNWdTvqLSW5wC1UswGRF9E0A2HMMixxsaa/RR9cdrsw9HE2vq673KB38v1eTwVH2ozcgFg+CXPGYL6/jWIffifWGEWtc/gQccYDQSQouGRXV8ffS9kaDrSGd+2teXToajxH87zyecMGihGvmOsZenpA6vZYgCu7XS+Arzt78YRFcW3Ee8hgxgVzHYdnMKzrM/UEm3nBz5Pygq4fJOQoML5p39N2ePKsOimNxxC1sRZ17TNZLhAecb/5mWbKSHJScWBRWMkLRI8P4qWj34whWXctub2KSr2S7g3BBC+5xt4Zs9rbKqqdP9trctwDYjS4RPIw== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=nathanm.com; dmarc=pass action=none header.from=nathanm.com; dkim=pass header.d=nathanm.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nathanm.onmicrosoft.com; s=selector2-nathanm-onmicrosoft-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=pG4T3DJJr/eG5bTs9YDAwGXrStNCvUKCRGYS1gq9lYU=; b=EcTmZndMC3nmnlIL109yCQE43bXI0ql/LR5kDjcDQQh6edw2uUFImuLaJgRiXwJxf3SuYzrT4jDBw4a79Ny8XJgtS2cHtOQ4vnTICmppwlN9MZqS4hrBYh9/l0eMALWtjJaN2ZxsvNTZOrEMVQKH6VTvfRYyRUzu/Oo46V6Tp9k= Received: from BYAPR06MB5573.namprd06.prod.outlook.com (2603:10b6:a03:a8::18) by MWHPR06MB2336.namprd06.prod.outlook.com (2603:10b6:300:65::17) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5746.21; Tue, 25 Oct 2022 01:46:32 +0000 Received: from BYAPR06MB5573.namprd06.prod.outlook.com ([fe80::8680:24cb:c1f9:6497]) by BYAPR06MB5573.namprd06.prod.outlook.com ([fe80::8680:24cb:c1f9:6497%5]) with mapi id 15.20.5746.021; Tue, 25 Oct 2022 01:46:32 +0000 From: Nathan Moinvaziri <nathan@nathanm.com> To: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>, Andy Shevchenko <andy@kernel.org> Subject: [PATCH] lib/string.c: Improve strcasecmp speed by not lowering if chars match Thread-Topic: [PATCH] lib/string.c: Improve strcasecmp speed by not lowering if chars match Thread-Index: AdjoEtBynXV7SUtxSVy5+tznV2O0qA== Date: Tue, 25 Oct 2022 01:46:32 +0000 Message-ID: <BYAPR06MB557347406F22FBA1E400A5BFD8319@BYAPR06MB5573.namprd06.prod.outlook.com> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=nathanm.com; x-ms-publictraffictype: Email x-ms-traffictypediagnostic: BYAPR06MB5573:EE_|MWHPR06MB2336:EE_ x-ms-office365-filtering-correlation-id: c9923f8f-0dea-44b6-9a7d-08dab62abe4f x-ms-exchange-senderadcheck: 1 x-ms-exchange-antispam-relay: 0 x-microsoft-antispam: BCL:0; x-microsoft-antispam-message-info: em191S5BSzv/eUJSVV3W3WiqVmDEW/5YG48nW1UFoS8lcJ9v5ucnMgLG7JKgDKipqB0FDv0AOnYUNPTkU6mwmX1U/7KV577yuYRyPxlbEL7Ry62aMvEaN5W4h9ykq72KkBqk27sQh4EhONCziiQDme4dogMis2/PlW3wiNO1a3Zm9JBi73bp4Q9LpYJFpT4Q8J6xjQe+ki3plqziBa4qRJGo+yHVOG37Yf3458Z6u964Xb4IXwjtDhkEJQmApZgImbu1dx+4uzAObH47ZAgKBnyrDimWjdIMbMg9T5rynQJUtDGiV9tEjlis1r1FIIvz/sI52Zrwx5Z7b5rd3LYBWcycuqYOmoC0tQEwNI6UqoUyzEBUT7Q5jwfaIZJk9yFsbBTYg5XLH0uB3iPxgjxquhxKMfsyuoVXVtIVcpbhdBMSb893oIvoj8GJUVsKsh6Sc2aggu1c/oOMaFGMiWnQrmkIN1o4XlUl4oJQHNawm3RI5b0F40gpT9eCPGLDGjRoTs14LfFDysrA0QDIKxS/YBKMCFubrETlJ3H8TI5I7Lk+FawFFeRBHTV5qVMAu65ICBWTirxU6PKLX06QX8CsID2oG2zsUGKhOcw5InPEIDlrDr43QIlThUeYT7Nu4MURQvs17eFNuWPTQ/pzufSsCdDmeGB+P1q+8g74uE9Djn2IbIOOVUUEGy1HtxlQukRWGqNPTilgvIIWW9qRS1Pb+gctdb29YoSHOum32qMR73wvaWnUoJqF3Y2etwWLfaj2p8DZi4LHvntJIfR0kMdvXg== x-forefront-antispam-report: CIP:255.255.255.255;CTRY:;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:BYAPR06MB5573.namprd06.prod.outlook.com;PTR:;CAT:NONE;SFS:(13230022)(366004)(346002)(396003)(136003)(39830400003)(376002)(451199015)(83380400001)(38070700005)(186003)(2906002)(122000001)(38100700002)(9686003)(5660300002)(41300700001)(8936002)(55016003)(52536014)(478600001)(6506007)(26005)(76116006)(64756008)(53546011)(66556008)(7696005)(66446008)(66946007)(66476007)(8676002)(110136005)(71200400001)(86362001)(316002)(33656002);DIR:OUT;SFP:1102; x-ms-exchange-antispam-messagedata-chunkcount: 1 x-ms-exchange-antispam-messagedata-0: BjwmGzvsYUt9zU0zaT45vFdQgDaDCY8DkeIZo6p2ZUwp8OAQKiXYtNcDTDrj168RZViv9RDXjVkXBhyV3Ns3qme/CCFBStmLQdOdJQgcjiQQRY43O3Ejq7UKpxgtP5+daednCqV5TIK9+HXQbC57IyO2VAkDXAL5/X7sQxi1KZezltpdkJBixzYfRTxo/x3hW1VOm9/Lgs9dST9iyMQFzM6WikIhiTRoGE1SqiNu2DyCRLd5hjLybGBZ0llkrD5d7iApLUku1c2TRGLKZHMnMCs9Q/Zb3cSBfLZsbUV0Yb7yrZ98m/CHMyyN9r8JM3W0IWQd0fQhHnMRTM4Z8IBcjeUYURro/wwRjXFZ9lEPkBflpls1ifp18DkPjALA69mooHnbToem3jx8KSEj4j4uBSNCBH/TVdgljOQm5+oFAM7BHH4f7NQjcMPuFZFKS7YKU00CwI/i9yJ/prkuFI6ll8JrJzj6wmZa88VRcMMe8i0ldHicHhqC9BWmi8/juGVR83KI9T1msFdBmCKwDrQMKaNm5w7gDggp/qktp3qRbv81U+m/+lQQ3t7Mu7ljKrfqB03ue2B7GLepsjq4ju58cAIbq/CwOEdzv/rusHdh7Kx11hEKsaLRDfaT1vZ3DcUxxs08L0SukKWr2IvbmEKafVywdaiiG09GqsD5N+1a2ZETAMtMxQ6Pp6134eBbtKbHi27jRRo53+n0PDtQbep4ICKJG4pWOS8vO/codOggZZuqWtMOB+RzHq4rUmpZEwlcpzlDhCozCy6wkwL9I6QNZlCU3VHJmXJyBynA87CMS8uqfRELQxffyovzE/QdnrwbKxRvgdWhecxwfYXMSd9funk7/IkSFOyohvFmrtgzkheCcP1wZ1RarbTEpOK+9r7Sn3fOGq3605syDlo9NKAjfsgyuSwxyIwWEzt5sLOfT5oziWH4bigXsKo9RsMxPWEMb2EXZm05KTkDKSuhGBj+zwooQuYIX7vKx1o005K4zTVGiZAhs0OHeM7/tuQHw7IgXnYXRNYlBq7iUFVxVDFeSIY5tG4G/2STngYTd4YgHSiDHKHq+HhZ8CTBjT+GhFnszMECtSgxaiW/hKFlO+Jowkw0GCcn9td16byxE5cVMtQ13sPEjlF0ZK8fMEe5I3eTHFubWPaPDo/JkT1QbExgqGnRWrg6esL5NUXaLcXLTu7PYDK+0eHeVYwbOKlHBuEKEDl2vFElguozGjAP21jR54Ima7Q6PX2QXJXZh3ADYP7ctxP1eFNaylPmGHnglSHFSmQiyTas2nwyNlpZWAj8r/RNUlMrTUldzRH1YYtQNZ94a6vxfHDHFyzMFDiRbiouy+qrr7Yj2WtkITsl2fIE1YuI4uWU3i4V0zrWvyZVcKnOhvPDir+fs5xHxkngCd2SSFoqLZTK+UPBOA9GJNyAyrQQ2bv3Q1nXMebTvSl/sRB3qtSTuBZ6UbuCL/ccSB3rt5WJJQlJLpIjoruhngJ9P9c4pmckFuk5feIWJJCZmKfD581jQQYw5hodAlDrpSpUGe6JjZpJCJnkPDSaOevCmMJITYn+7CAoys8UeH5V4gPKcCH/Y/lYbYjM6tUGAeJM Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-OriginatorOrg: nathanm.com X-MS-Exchange-CrossTenant-AuthAs: Internal X-MS-Exchange-CrossTenant-AuthSource: BYAPR06MB5573.namprd06.prod.outlook.com X-MS-Exchange-CrossTenant-Network-Message-Id: c9923f8f-0dea-44b6-9a7d-08dab62abe4f X-MS-Exchange-CrossTenant-originalarrivaltime: 25 Oct 2022 01:46:32.3647 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: 3e74266a-92ff-414a-9879-2149aecc5932 X-MS-Exchange-CrossTenant-mailboxtype: HOSTED X-MS-Exchange-CrossTenant-userprincipalname: f3bvU4kS1im+b2/+jsjPKulK8okiBCjuW3U0RHu5wNYyVOY6oneHIJ+PFfd2C9hzcPVnr2T+63AxMXlxNq53gA== X-MS-Exchange-Transport-CrossTenantHeadersStamped: MWHPR06MB2336 X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H2,SPF_HELO_PASS,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1747622475175701646?= X-GMAIL-MSGID: =?utf-8?q?1747622475175701646?= |
Series |
lib/string.c: Improve strcasecmp speed by not lowering if chars match
|
|
Commit Message
Nathan Moinvaziri
Oct. 25, 2022, 1:46 a.m. UTC
From fcb0159ee74908f92adc34143657d8ca56e9a811 Mon Sep 17 00:00:00 2001 From: Nathan Moinvaziri <nathan@nathanm.com> Date: Mon, 24 Oct 2022 16:37:59 -0700 Subject: [PATCH] lib/string.c: Improve strcasecmp speed by not lowering if chars match. With strings where many characters match exactly each character is needlessly converted to lowercase before comparing. This patch improves the comparison by only converting to lowercase after checking that the characters don't match. The more characters that match exactly the better performance we expect versus the old function. When running tests using Quick Benchmark with two matching 256 character strings these changes result in anywhere between ~6-9x speed improvement. * We use unsigned char instead of int similar to strncasecmp. * We only subtract c1 - c2 when they are not equal. Reviewed-by: Sergey Markelov <sergio_nsk@yahoo.de> Reviewed-by: Steve Tucker <steven.r.tucker@gmail.com> --- lib/string.c | 17 ++++++++++++----- 1 file changed, 12 insertions(+), 5 deletions(-)
Comments
On Tue, Oct 25, 2022 at 4:46 AM Nathan Moinvaziri <nathan@nathanm.com> wrote: > > From fcb0159ee74908f92adc34143657d8ca56e9a811 Mon Sep 17 00:00:00 2001 > From: Nathan Moinvaziri <nathan@nathanm.com> > Date: Mon, 24 Oct 2022 16:37:59 -0700 > Subject: [PATCH] lib/string.c: Improve strcasecmp speed by not lowering if > chars match. Why is the above in the commit message? > With strings where many characters match exactly each character is needlessly > converted to lowercase before comparing. This patch improves the comparison > by only converting to lowercase after checking that the characters don't match. > > The more characters that match exactly the better performance we expect versus > the old function. > > When running tests using Quick Benchmark with two matching 256 character > strings these changes result in anywhere between ~6-9x speed improvement. > > * We use unsigned char instead of int similar to strncasecmp. > * We only subtract c1 - c2 when they are not equal. Nobody can take it. Please, read Submitting Patches documentation and fix your contribution accordingly. ... > do { > - c1 = tolower(*s1++); > - c2 = tolower(*s2++); > - } while (c1 == c2 && c1 != 0); > - return c1 - c2; > + c1 = *s1++; > + c2 = *s2++; > + if (c1 != c2) { > + c1 = tolower(c1); > + c2 = tolower(c2); > + if (c1 != c2) > + return (int)c1 - (int)c2; > + } > + } while (c1 != 0); You tell us that this is more preformant, but have not provided the numbers. Can we see those, please? Note, that you basically trash CPU cache lines when characters are not equal, and before doing that you have a branching. I'm unsure that your way is more performant than the original one.
On Tue, Oct 25, 2022 at 11:00:36AM +0300, Andy Shevchenko wrote: > On Tue, Oct 25, 2022 at 4:46 AM Nathan Moinvaziri <nathan@nathanm.com> wrote: ... > > When running tests using Quick Benchmark with two matching 256 character > > strings these changes result in anywhere between ~6-9x speed improvement. > > > > * We use unsigned char instead of int similar to strncasecmp. > > * We only subtract c1 - c2 when they are not equal. ... > You tell us that this is more preformant, but have not provided the > numbers. Can we see those, please? So, I have read carefully and see the reference to some QuickBenchmark I have no idea about. What I meant here is to have numbers provided by an (open source) tool (maybe even in-kernel test case) that anybody can test on their machines. You also missed details about how you run, what the data set has been used, etc. > Note, that you basically trash CPU cache lines when characters are not > equal, and before doing that you have a branching. I'm unsure that > your way is more performant than the original one.
Hi Andy, I appreciate your quick feedback! I have done as you suggested and published my results this time using Google benchmark: https://github.com/nmoinvaz/strcasecmp After you review it, and if you still think the patch is worthwhile then I can fix the other problems you mentioned for the original patch. If you think it is not worth it, then I understand. Thanks again, Nathan -----Original Message----- From: Andy Shevchenko <andy@kernel.org> Sent: Tuesday, October 25, 2022 2:04 AM To: Nathan Moinvaziri <nathan@nathanm.com> Cc: linux-kernel@vger.kernel.org Subject: Re: [PATCH] lib/string.c: Improve strcasecmp speed by not lowering if chars match On Tue, Oct 25, 2022 at 11:00:36AM +0300, Andy Shevchenko wrote: > On Tue, Oct 25, 2022 at 4:46 AM Nathan Moinvaziri <nathan@nathanm.com> wrote: ... > > When running tests using Quick Benchmark with two matching 256 > > character strings these changes result in anywhere between ~6-9x speed improvement. > > > > * We use unsigned char instead of int similar to strncasecmp. > > * We only subtract c1 - c2 when they are not equal. ... > You tell us that this is more preformant, but have not provided the > numbers. Can we see those, please? So, I have read carefully and see the reference to some QuickBenchmark I have no idea about. What I meant here is to have numbers provided by an (open source) tool (maybe even in-kernel test case) that anybody can test on their machines. You also missed details about how you run, what the data set has been used, etc. > Note, that you basically trash CPU cache lines when characters are not > equal, and before doing that you have a branching. I'm unsure that > your way is more performant than the original one. -- With Best Regards, Andy Shevchenko
On Tue, Oct 25, 2022 at 8:53 PM Nathan Moinvaziri <nathan@nathanm.com> wrote: > > Hi Andy, > > I appreciate your quick feedback! > > I have done as you suggested and published my results this time using Google benchmark: > https://github.com/nmoinvaz/strcasecmp Thank you for sharing! Looks promising, but may I suggest a few things: 1) have you considered the word-at-a-time use (like strscpy() does)? 2) instead of using tolower() on both sides, have you considered (with the above in mind) to use XOR over words and if they are not 0, check if the result is one of possible combinations of 0x20 and then by excluding the non-letters from the range you may find the difference? So, I think it's a good exercise for the twiddling of bits. > After you review it, and if you still think the patch is worthwhile then I can fix the other problems you mentioned for the original patch. If you think it is not worth it, then I understand. P.S. Avoid top-posting in the Linux kernel mailing lists! > -----Original Message----- > From: Andy Shevchenko <andy@kernel.org> > Sent: Tuesday, October 25, 2022 2:04 AM > To: Nathan Moinvaziri <nathan@nathanm.com> > Cc: linux-kernel@vger.kernel.org > Subject: Re: [PATCH] lib/string.c: Improve strcasecmp speed by not lowering if chars match > > On Tue, Oct 25, 2022 at 11:00:36AM +0300, Andy Shevchenko wrote: > > On Tue, Oct 25, 2022 at 4:46 AM Nathan Moinvaziri <nathan@nathanm.com> wrote: > > ... > > > > When running tests using Quick Benchmark with two matching 256 > > > character strings these changes result in anywhere between ~6-9x speed improvement. > > > > > > * We use unsigned char instead of int similar to strncasecmp. > > > * We only subtract c1 - c2 when they are not equal. > > ... > > > You tell us that this is more preformant, but have not provided the > > numbers. Can we see those, please? > > So, I have read carefully and see the reference to some QuickBenchmark I have no idea about. What I meant here is to have numbers provided by an (open > source) tool (maybe even in-kernel test case) that anybody can test on their machines. You also missed details about how you run, what the data set has been used, etc. > > > Note, that you basically trash CPU cache lines when characters are not > > equal, and before doing that you have a branching. I'm unsure that > > your way is more performant than the original one.
Le 25/10/2022 à 19:53, Nathan Moinvaziri a écrit : > Hi Andy, > > I appreciate your quick feedback! > > I have done as you suggested and published my results this time using Google benchmark: > https://github.com/nmoinvaz/strcasecmp Hi, the algorithm on github is not the same as the one posted here. IIUC, the one on github is wrong. If you compare 2 strings that are the same, they will have the same length, and "if (c1 == c2) continue;" will go one past the end of the strings. And the result will be <0 or 0 or >0 depending the the char *after* the trailing \0. On the other side, the results of the benchmark on github are likely not accurate with the algorithm posted here, because there is one more test in each loop ("while (c1 != 0)") as long as the 2 strings are the same. On github this test is skipped because you will go through the "continue" CJ > > After you review it, and if you still think the patch is worthwhile then I can fix the other problems you mentioned for the original patch. If you think it is not worth it, then I understand. > > Thanks again, > Nathan > > -----Original Message----- > From: Andy Shevchenko <andy@kernel.org> > Sent: Tuesday, October 25, 2022 2:04 AM > To: Nathan Moinvaziri <nathan@nathanm.com> > Cc: linux-kernel@vger.kernel.org > Subject: Re: [PATCH] lib/string.c: Improve strcasecmp speed by not lowering if chars match > > On Tue, Oct 25, 2022 at 11:00:36AM +0300, Andy Shevchenko wrote: >> On Tue, Oct 25, 2022 at 4:46 AM Nathan Moinvaziri <nathan@nathanm.com> wrote: > > ... > >>> When running tests using Quick Benchmark with two matching 256 >>> character strings these changes result in anywhere between ~6-9x speed improvement. >>> >>> * We use unsigned char instead of int similar to strncasecmp. >>> * We only subtract c1 - c2 when they are not equal. > > ... > >> You tell us that this is more preformant, but have not provided the >> numbers. Can we see those, please? > > So, I have read carefully and see the reference to some QuickBenchmark I have no idea about. What I meant here is to have numbers provided by an (open > source) tool (maybe even in-kernel test case) that anybody can test on their machines. You also missed details about how you run, what the data set has been used, etc. > >> Note, that you basically trash CPU cache lines when characters are not >> equal, and before doing that you have a branching. I'm unsure that >> your way is more performant than the original one. > > -- > With Best Regards, > Andy Shevchenko > > >
On 25/10/2022 10.00, Andy Shevchenko wrote: > On Tue, Oct 25, 2022 at 4:46 AM Nathan Moinvaziri <nathan@nathanm.com> wrote: >> >> From fcb0159ee74908f92adc34143657d8ca56e9a811 Mon Sep 17 00:00:00 2001 >> From: Nathan Moinvaziri <nathan@nathanm.com> >> Date: Mon, 24 Oct 2022 16:37:59 -0700 >> Subject: [PATCH] lib/string.c: Improve strcasecmp speed by not lowering if >> chars match. > > Why is the above in the commit message? > >> With strings where many characters match exactly each character is needlessly >> converted to lowercase before comparing. This patch improves the comparison >> by only converting to lowercase after checking that the characters don't match. >> >> The more characters that match exactly the better performance we expect versus >> the old function. > You tell us that this is more preformant, but have not provided the > numbers. Can we see those, please? > > Note, that you basically trash CPU cache lines when characters are not > equal, and before doing that you have a branching. I'm unsure that > your way is more performant than the original one. > Are there any code paths in the kernel where strcasecmp performance matters? strcmp, sure, but strcasecmp or strncasecmp? I don't think so. If anything, we should nuke the complication in strncasecmp(), and then make strcasecmp() simply do strncasecmp(a, b, SIZE_MAX). Rasmus
On 10/25/2022 12:55 PM, Rasmus Villemoes wrote: > Are there any code paths in the kernel where strcasecmp performance > matters? strcmp, sure, but strcasecmp or strncasecmp? I don't think so. > If anything, we should nuke the complication in strncasecmp(), and then > make strcasecmp() simply do strncasecmp(a, b, SIZE_MAX). It looks like several of the string functions could be collapsed in that way. Nathan
On 10/25/2022 12:32 PM, Christophe JAILLET wrote: > Hi, > the algorithm on github is not the same as the one posted here. > > IIUC, the one on github is wrong. If you compare 2 strings that are > the same, they will have the same length, and "if (c1 == c2) > continue;" will go one past the end of the strings. And the result > will be <0 or 0 or >0 depending the the char *after* the trailing \0. > > On the other side, the results of the benchmark on github are likely > not accurate with the algorithm posted here, because there is one more > test in each loop ("while (c1 != 0)") as long as the 2 strings are the > same. > On github this test is skipped because you will go through the "continue" > > CJ Hi CJ, Thanks for catching that, I had changed it at the last second. I have updated the code and the benchmarks to what I initially proposed in the patch. Results are about +/-1% from previously. Nathan
On 10/25/2022 12:19 PM, Andy Shevchenko wrote: > Looks promising, but may I suggest a few things: > 1) have you considered the word-at-a-time use (like strscpy() does)? Only briefly at the beginning of the function to check for an identical comparison and the added check hurt performance for strings that were not identical. On 10/25/2022 12:19 PM, Andy Shevchenko wrote: > 2) instead of using tolower() on both sides, have you considered > (with the above in mind) to use XOR over words and if they are not 0, > check if the result is one of possible combinations of 0x20 and then > by excluding the non-letters from the range you may find the > difference? I'm not sure what you mean about the possible combinations of the space character. I have not investigated this method. ... According to my previous findings the check for c1 != c2 does perform better for strings that are at least 25% or more the same. I was able to get even more performance out of it by changing tolower() to use a different hash table than the one used for the is*() functions. By using a pre-generated hash table for both islower() and isupper() it is possible to remove the branch where ever those functions are used, including in strcasecmp. This method I've seen employed in the Android code base and also in cURL. Using it would add additional 2x256 bytes to the code size for the tables. I've put together a Quick Benchmark that shows the comparison between the different methods: https://quick-bench.com/q/l5DkYQO-CcMxQUu5MjZiqZ8M-Y0 Nathan
On Thu, Oct 27, 2022 at 6:29 AM Nathan Moinvaziri <nathan@nathanm.com> wrote: > On 10/25/2022 12:19 PM, Andy Shevchenko wrote: > > Looks promising, but may I suggest a few things: > > 1) have you considered the word-at-a-time use (like strscpy() does)? > > Only briefly at the beginning of the function to check for an identical > comparison and the added check hurt performance for strings that were > not identical. > > On 10/25/2022 12:19 PM, Andy Shevchenko wrote: > > > 2) instead of using tolower() on both sides, have you considered > > (with the above in mind) to use XOR over words and if they are not 0, > > check if the result is one of possible combinations of 0x20 and then > > by excluding the non-letters from the range you may find the > > difference? > > I'm not sure what you mean about the possible combinations of the space > character. I have not investigated this method. 'a' xor 'A' == 0x20 (same for all the letters. That's why we have a specific _tolower() in vsprintf.c. > According to my previous findings the check for c1 != c2 does perform > better for strings that are at least 25% or more the same. I was able to > get even more performance out of it by changing tolower() to use a > different hash table than the one used for the is*() functions. By using > a pre-generated hash table for both islower() and isupper() it is > possible to remove the branch where ever those functions are used, > including in strcasecmp. This method I've seen employed in the Android > code base and also in cURL. Using it would add additional 2x256 bytes to > the code size for the tables. Rasmus raised a good question, where do we actually need the performant strcasecmp()?
diff --git a/lib/string.c b/lib/string.c index 3371d26a0e39..51ad56db1b5d 100644 --- a/lib/string.c +++ b/lib/string.c @@ -64,13 +64,20 @@ EXPORT_SYMBOL(strncasecmp); #ifndef __HAVE_ARCH_STRCASECMP int strcasecmp(const char *s1, const char *s2) { - int c1, c2; + /* Yes, Virginia, it had better be unsigned */ + unsigned char c1, c2; do { - c1 = tolower(*s1++); - c2 = tolower(*s2++); - } while (c1 == c2 && c1 != 0); - return c1 - c2; + c1 = *s1++; + c2 = *s2++; + if (c1 != c2) { + c1 = tolower(c1); + c2 = tolower(c2); + if (c1 != c2) + return (int)c1 - (int)c2; + } + } while (c1 != 0); + return 0; } EXPORT_SYMBOL(strcasecmp); #endif