From patchwork Tue Jun 20 07:40:00 2023
From: Hugh Dickins
Date: Tue, 20 Jun 2023 00:40:00 -0700 (PDT)
To: Andrew Morton
Cc: Gerald Schaefer, Vasily Gorbik, Mike Kravetz, Mike Rapoport, "Kirill A. Shutemov", Matthew Wilcox, David Hildenbrand, Suren Baghdasaryan, Qi Zheng, Yang Shi, Mel Gorman, Peter Xu, Peter Zijlstra, Will Deacon, Yu Zhao, Alistair Popple, Ralph Campbell, Ira Weiny, Steven Price, SeongJae Park, Lorenzo Stoakes, Huang Ying, Naoya Horiguchi, Christophe Leroy, Zack Rusin, Jason Gunthorpe, Axel Rasmussen, Anshuman Khandual, Pasha Tatashin, Miaohe Lin, Minchan Kim, Christoph Hellwig, Song Liu, Thomas Hellstrom, Russell King, "David S. Miller", Michael Ellerman, "Aneesh Kumar K.V", Heiko Carstens, Christian Borntraeger, Claudio Imbrenda, Alexander Gordeev, Jann Horn, Vishal Moola, Vlastimil Babka, linux-arm-kernel@lists.infradead.org, sparclinux@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH v2 01/12] mm/pgtable: add rcu_read_lock() and rcu_read_unlock()s
Message-ID: <53514a65-9053-1e8a-c76a-c158f8965@google.com>
In-Reply-To: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com>
References: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com>

Before putting them to use (several commits later), add rcu_read_lock()
to pte_offset_map(), and rcu_read_unlock() to pte_unmap().  Make this a
separate commit, since it risks exposing imbalances: prior commits have
fixed all the known imbalances, but we may find some have been missed.
Signed-off-by: Hugh Dickins
---
 include/linux/pgtable.h | 4 ++--
 mm/pgtable-generic.c    | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index a1326e61d7ee..8b0fc7fdc46f 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -99,7 +99,7 @@ static inline pte_t *pte_offset_kernel(pmd_t *pmd, unsigned long address)
 	((pte_t *)kmap_local_page(pmd_page(*(pmd))) + pte_index((address)))
 #define pte_unmap(pte)	do {				\
 	kunmap_local((pte));				\
-	/* rcu_read_unlock() to be added later */	\
+	rcu_read_unlock();				\
 } while (0)
 #else
 static inline pte_t *__pte_map(pmd_t *pmd, unsigned long address)
@@ -108,7 +108,7 @@ static inline pte_t *__pte_map(pmd_t *pmd, unsigned long address)
 }
 static inline void pte_unmap(pte_t *pte)
 {
-	/* rcu_read_unlock() to be added later */
+	rcu_read_unlock();
 }
 #endif

diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index c7ab18a5fb77..674671835631 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -236,7 +236,7 @@ pte_t *__pte_offset_map(pmd_t *pmd, unsigned long addr, pmd_t *pmdvalp)
 {
 	pmd_t pmdval;

-	/* rcu_read_lock() to be added later */
+	rcu_read_lock();
 	pmdval = pmdp_get_lockless(pmd);
 	if (pmdvalp)
 		*pmdvalp = pmdval;
@@ -250,7 +250,7 @@ pte_t *__pte_offset_map(pmd_t *pmd, unsigned long addr, pmd_t *pmdvalp)
 	}
 	return __pte_map(&pmdval, addr);
 nomap:
-	/* rcu_read_unlock() to be added later */
+	rcu_read_unlock();
 	return NULL;
 }

From patchwork Tue Jun 20 07:42:13 2023
From: Hugh Dickins
Date: Tue, 20 Jun 2023 00:42:13 -0700 (PDT)
To: Andrew Morton
Shutemov" , Matthew Wilcox , David Hildenbrand , Suren Baghdasaryan , Qi Zheng , Yang Shi , Mel Gorman , Peter Xu , Peter Zijlstra , Will Deacon , Yu Zhao , Alistair Popple , Ralph Campbell , Ira Weiny , Steven Price , SeongJae Park , Lorenzo Stoakes , Huang Ying , Naoya Horiguchi , Christophe Leroy , Zack Rusin , Jason Gunthorpe , Axel Rasmussen , Anshuman Khandual , Pasha Tatashin , Miaohe Lin , Minchan Kim , Christoph Hellwig , Song Liu , Thomas Hellstrom , Russell King , "David Sc. Miller" , Michael Ellerman , "Aneesh Kumar K.V" , Heiko Carstens , Christian Borntraeger , Claudio Imbrenda , Alexander Gordeev , Jann Horn , Vishal Moola , Vlastimil Babka , linux-arm-kernel@lists.infradead.org, sparclinux@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH v2 02/12] mm/pgtable: add PAE safety to __pte_offset_map() In-Reply-To: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com> Message-ID: References: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com> MIME-Version: 1.0 X-Spam-Status: No, score=-17.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF, ENV_AND_HDR_SPF_MATCH,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS, T_SCC_BODY_TEXT_LINE,USER_IN_DEF_DKIM_WL,USER_IN_DEF_SPF_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1769209352061123946?= X-GMAIL-MSGID: =?utf-8?q?1769209352061123946?= There is a faint risk that __pte_offset_map(), on a 32-bit architecture with a 64-bit pmd_t e.g. x86-32 with CONFIG_X86_PAE=y, would succeed on a pmdval assembled from a pmd_low and a pmd_high which never belonged together: their combination not pointing to a page table at all, perhaps not even a valid pfn. pmdp_get_lockless() is not enough to prevent that. Guard against that (on such configs) by local_irq_save() blocking TLB flush between present updates, as linux/pgtable.h suggests. It's only needed around the pmdp_get_lockless() in __pte_offset_map(): a race when __pte_offset_map_lock() repeats the pmdp_get_lockless() after getting the lock, would just send it back to __pte_offset_map() again. Complement this pmdp_get_lockless_start() and pmdp_get_lockless_end(), used only locally in __pte_offset_map(), with a pmdp_get_lockless_sync() synonym for tlb_remove_table_sync_one(): to send the necessary interrupt at the right moment on those configs which do not already send it. CONFIG_GUP_GET_PXX_LOW_HIGH is enabled when required by mips, sh and x86. It is not enabled by arm-32 CONFIG_ARM_LPAE: my understanding is that Will Deacon's 2020 enhancements to READ_ONCE() are sufficient for arm. It is not enabled by arc, but its pmd_t is 32-bit even when pte_t 64-bit. Limit the IRQ disablement to CONFIG_HIGHPTE? Perhaps, but would need a little more work, to retry if pmd_low good for page table, but pmd_high non-zero from THP (and that might be making x86-specific assumptions). 
Signed-off-by: Hugh Dickins
---
 include/linux/pgtable.h |  4 ++++
 mm/pgtable-generic.c    | 29 +++++++++++++++++++++++++++++
 2 files changed, 33 insertions(+)

diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 8b0fc7fdc46f..525f1782b466 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -390,6 +390,7 @@ static inline pmd_t pmdp_get_lockless(pmd_t *pmdp)
 	return pmd;
 }
 #define pmdp_get_lockless pmdp_get_lockless
+#define pmdp_get_lockless_sync() tlb_remove_table_sync_one()
 #endif /* CONFIG_PGTABLE_LEVELS > 2 */
 #endif /* CONFIG_GUP_GET_PXX_LOW_HIGH */

@@ -408,6 +409,9 @@ static inline pmd_t pmdp_get_lockless(pmd_t *pmdp)
 {
 	return pmdp_get(pmdp);
 }
+static inline void pmdp_get_lockless_sync(void)
+{
+}
 #endif

 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index 674671835631..5e85a625ab30 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -232,12 +232,41 @@ pmd_t pmdp_collapse_flush(struct vm_area_struct *vma, unsigned long address,
 #endif
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */

+#if defined(CONFIG_GUP_GET_PXX_LOW_HIGH) && \
+	(defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RCU))
+/*
+ * See the comment above ptep_get_lockless() in include/linux/pgtable.h:
+ * the barriers in pmdp_get_lockless() cannot guarantee that the value in
+ * pmd_high actually belongs with the value in pmd_low; but holding interrupts
+ * off blocks the TLB flush between present updates, which guarantees that a
+ * successful __pte_offset_map() points to a page from matched halves.
+ */
+static unsigned long pmdp_get_lockless_start(void)
+{
+	unsigned long irqflags;
+
+	local_irq_save(irqflags);
+	return irqflags;
+}
+static void pmdp_get_lockless_end(unsigned long irqflags)
+{
+	local_irq_restore(irqflags);
+}
+#else
+static unsigned long pmdp_get_lockless_start(void) { return 0; }
+static void pmdp_get_lockless_end(unsigned long irqflags) { }
+#endif
+
 pte_t *__pte_offset_map(pmd_t *pmd, unsigned long addr, pmd_t *pmdvalp)
 {
+	unsigned long irqflags;
 	pmd_t pmdval;

 	rcu_read_lock();
+	irqflags = pmdp_get_lockless_start();
 	pmdval = pmdp_get_lockless(pmd);
+	pmdp_get_lockless_end(irqflags);
+
 	if (pmdvalp)
 		*pmdvalp = pmdval;
 	if (unlikely(pmd_none(pmdval) || is_pmd_migration_entry(pmdval)))

From patchwork Tue Jun 20 07:43:50 2023
From: Hugh Dickins
Date: Tue, 20 Jun 2023 00:43:50 -0700 (PDT)
To: Andrew Morton
Subject: [PATCH v2 03/12] arm: adjust_pte() use pte_offset_map_nolock()
In-Reply-To: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com>
References: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com>

Instead of pte_lockptr(), use the recently added pte_offset_map_nolock()
in adjust_pte(): because it gives the not-locked ptl for precisely that
pte, which the caller can then safely lock; whereas pte_lockptr() is not
so tightly coupled, because it dereferences the pmd pointer again.
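The fragility being removed can be seen in the old sequence (a sketch
with the race annotated, not an observed crash):

	pte = pte_offset_map(pmd, address);	/* reads *pmd once ... */
	if (!pte)
		return 0;
	ptl = pte_lockptr(vma->vm_mm, pmd);	/* ... then dereferences pmd
						 * again: if *pmd changed in
						 * between, this ptl belongs
						 * to a different page table
						 * than the pte above */

pte_offset_map_nolock() instead derives the ptl from the single pmd
value it read, so the pte and ptl it returns are guaranteed to match.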
Signed-off-by: Hugh Dickins
---
 arch/arm/mm/fault-armv.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/arm/mm/fault-armv.c b/arch/arm/mm/fault-armv.c
index ca5302b0b7ee..7cb125497976 100644
--- a/arch/arm/mm/fault-armv.c
+++ b/arch/arm/mm/fault-armv.c
@@ -117,11 +117,10 @@ static int adjust_pte(struct vm_area_struct *vma, unsigned long address,
 	 * must use the nested version.  This also means we need to
 	 * open-code the spin-locking.
 	 */
-	pte = pte_offset_map(pmd, address);
+	pte = pte_offset_map_nolock(vma->vm_mm, pmd, address, &ptl);
 	if (!pte)
 		return 0;

-	ptl = pte_lockptr(vma->vm_mm, pmd);
 	do_pte_lock(ptl);

 	ret = do_adjust_pte(vma, address, pfn, pte);

From patchwork Tue Jun 20 07:45:26 2023
From: Hugh Dickins
Date: Tue, 20 Jun 2023 00:45:26 -0700 (PDT)
To: Andrew Morton
Shutemov" , Matthew Wilcox , David Hildenbrand , Suren Baghdasaryan , Qi Zheng , Yang Shi , Mel Gorman , Peter Xu , Peter Zijlstra , Will Deacon , Yu Zhao , Alistair Popple , Ralph Campbell , Ira Weiny , Steven Price , SeongJae Park , Lorenzo Stoakes , Huang Ying , Naoya Horiguchi , Christophe Leroy , Zack Rusin , Jason Gunthorpe , Axel Rasmussen , Anshuman Khandual , Pasha Tatashin , Miaohe Lin , Minchan Kim , Christoph Hellwig , Song Liu , Thomas Hellstrom , Russell King , "David S. Miller" , Michael Ellerman , "Aneesh Kumar K.V" , Heiko Carstens , Christian Borntraeger , Claudio Imbrenda , Alexander Gordeev , Jann Horn , Vishal Moola , Vlastimil Babka , linux-arm-kernel@lists.infradead.org, sparclinux@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH v2 04/12] powerpc: assert_pte_locked() use pte_offset_map_nolock() In-Reply-To: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com> Message-ID: <7ae6836b-b612-23f1-63e0-babda6e96e2c@google.com> References: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com> MIME-Version: 1.0 X-Spam-Status: No, score=-17.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF, ENV_AND_HDR_SPF_MATCH,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS, T_SCC_BODY_TEXT_LINE,USER_IN_DEF_DKIM_WL,USER_IN_DEF_SPF_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1769207230351306011?= X-GMAIL-MSGID: =?utf-8?q?1769207230351306011?= Instead of pte_lockptr(), use the recently added pte_offset_map_nolock() in assert_pte_locked(). BUG if pte_offset_map_nolock() fails: this is stricter than the previous implementation, which skipped when pmd_none() (with a comment on khugepaged collapse transitions): but wouldn't we want to know, if an assert_pte_locked() caller can be racing such transitions? This mod might cause new crashes: which either expose my ignorance, or indicate issues to be fixed, or limit the usage of assert_pte_locked(). Signed-off-by: Hugh Dickins --- arch/powerpc/mm/pgtable.c | 16 ++++++---------- 1 file changed, 6 insertions(+), 10 deletions(-) diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c index cb2dcdb18f8e..16b061af86d7 100644 --- a/arch/powerpc/mm/pgtable.c +++ b/arch/powerpc/mm/pgtable.c @@ -311,6 +311,8 @@ void assert_pte_locked(struct mm_struct *mm, unsigned long addr) p4d_t *p4d; pud_t *pud; pmd_t *pmd; + pte_t *pte; + spinlock_t *ptl; if (mm == &init_mm) return; @@ -321,16 +323,10 @@ void assert_pte_locked(struct mm_struct *mm, unsigned long addr) pud = pud_offset(p4d, addr); BUG_ON(pud_none(*pud)); pmd = pmd_offset(pud, addr); - /* - * khugepaged to collapse normal pages to hugepage, first set - * pmd to none to force page fault/gup to take mmap_lock. After - * pmd is set to none, we do a pte_clear which does this assertion - * so if we find pmd none, return. 
Signed-off-by: Hugh Dickins
---
 arch/powerpc/mm/pgtable.c | 16 ++++++----------
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index cb2dcdb18f8e..16b061af86d7 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -311,6 +311,8 @@ void assert_pte_locked(struct mm_struct *mm, unsigned long addr)
 	p4d_t *p4d;
 	pud_t *pud;
 	pmd_t *pmd;
+	pte_t *pte;
+	spinlock_t *ptl;

 	if (mm == &init_mm)
 		return;
@@ -321,16 +323,10 @@ void assert_pte_locked(struct mm_struct *mm, unsigned long addr)
 	pud = pud_offset(p4d, addr);
 	BUG_ON(pud_none(*pud));
 	pmd = pmd_offset(pud, addr);
-	/*
-	 * khugepaged to collapse normal pages to hugepage, first set
-	 * pmd to none to force page fault/gup to take mmap_lock. After
-	 * pmd is set to none, we do a pte_clear which does this assertion
-	 * so if we find pmd none, return.
-	 */
-	if (pmd_none(*pmd))
-		return;
-	BUG_ON(!pmd_present(*pmd));
-	assert_spin_locked(pte_lockptr(mm, pmd));
+	pte = pte_offset_map_nolock(mm, pmd, addr, &ptl);
+	BUG_ON(!pte);
+	assert_spin_locked(ptl);
+	pte_unmap(pte);
 }
 #endif /* CONFIG_DEBUG_VM */

From patchwork Tue Jun 20 07:47:54 2023
From: Hugh Dickins
Date: Tue, 20 Jun 2023 00:47:54 -0700 (PDT)
To: Andrew Morton
Shutemov" , Matthew Wilcox , David Hildenbrand , Suren Baghdasaryan , Qi Zheng , Yang Shi , Mel Gorman , Peter Xu , Peter Zijlstra , Will Deacon , Yu Zhao , Alistair Popple , Ralph Campbell , Ira Weiny , Steven Price , SeongJae Park , Lorenzo Stoakes , Huang Ying , Naoya Horiguchi , Christophe Leroy , Zack Rusin , Jason Gunthorpe , Axel Rasmussen , Anshuman Khandual , Pasha Tatashin , Miaohe Lin , Minchan Kim , Christoph Hellwig , Song Liu , Thomas Hellstrom , Russell King , "David Sc. Miller" , Michael Ellerman , "Aneesh Kumar K.V" , Heiko Carstens , Christian Borntraeger , Claudio Imbrenda , Alexander Gordeev , Jann Horn , Vishal Moola , Vlastimil Babka , linux-arm-kernel@lists.infradead.org, sparclinux@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH v2 05/12] powerpc: add pte_free_defer() for pgtables sharing page In-Reply-To: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com> Message-ID: <5cd9f442-61da-4c3d-eca-b7f44d22aa5f@google.com> References: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com> MIME-Version: 1.0 X-Spam-Status: No, score=-17.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF, ENV_AND_HDR_SPF_MATCH,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS, T_SCC_BODY_TEXT_LINE,USER_IN_DEF_DKIM_WL,USER_IN_DEF_SPF_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1769207397234963323?= X-GMAIL-MSGID: =?utf-8?q?1769207397234963323?= Add powerpc-specific pte_free_defer(), to call pte_free() via call_rcu(). pte_free_defer() will be called inside khugepaged's retract_page_tables() loop, where allocating extra memory cannot be relied upon. This precedes the generic version to avoid build breakage from incompatible pgtable_t. This is awkward because the struct page contains only one rcu_head, but that page may be shared between PTE_FRAG_NR pagetables, each wanting to use the rcu_head at the same time: account concurrent deferrals with a heightened refcount, only the first making use of the rcu_head, but re-deferring if more deferrals arrived during its grace period. 
Signed-off-by: Hugh Dickins
---
 arch/powerpc/include/asm/pgalloc.h |  4 +++
 arch/powerpc/mm/pgtable-frag.c     | 51 ++++++++++++++++++++++++++++++
 2 files changed, 55 insertions(+)

diff --git a/arch/powerpc/include/asm/pgalloc.h b/arch/powerpc/include/asm/pgalloc.h
index 3360cad78ace..3a971e2a8c73 100644
--- a/arch/powerpc/include/asm/pgalloc.h
+++ b/arch/powerpc/include/asm/pgalloc.h
@@ -45,6 +45,10 @@ static inline void pte_free(struct mm_struct *mm, pgtable_t ptepage)
 	pte_fragment_free((unsigned long *)ptepage, 0);
 }

+/* arch use pte_free_defer() implementation in arch/powerpc/mm/pgtable-frag.c */
+#define pte_free_defer pte_free_defer
+void pte_free_defer(struct mm_struct *mm, pgtable_t pgtable);
+
 /*
  * Functions that deal with pagetables that could be at any level of
  * the table need to be passed an "index_size" so they know how to
diff --git a/arch/powerpc/mm/pgtable-frag.c b/arch/powerpc/mm/pgtable-frag.c
index 20652daa1d7e..e4f58c5fc2ac 100644
--- a/arch/powerpc/mm/pgtable-frag.c
+++ b/arch/powerpc/mm/pgtable-frag.c
@@ -120,3 +120,54 @@ void pte_fragment_free(unsigned long *table, int kernel)
 		__free_page(page);
 	}
 }
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+#define PTE_FREE_DEFERRED 0x10000 /* beyond any PTE_FRAG_NR */
+
+static void pte_free_now(struct rcu_head *head)
+{
+	struct page *page;
+	int refcount;
+
+	page = container_of(head, struct page, rcu_head);
+	refcount = atomic_sub_return(PTE_FREE_DEFERRED - 1,
+				     &page->pt_frag_refcount);
+	if (refcount < PTE_FREE_DEFERRED) {
+		pte_fragment_free((unsigned long *)page_address(page), 0);
+		return;
+	}
+	/*
+	 * One page may be shared between PTE_FRAG_NR pagetables.
+	 * At least one more call to pte_free_defer() came in while we
+	 * were already deferring, so the free must be deferred again;
+	 * but just for one grace period, however many calls came in.
+	 */
+	while (refcount >= PTE_FREE_DEFERRED + PTE_FREE_DEFERRED) {
+		refcount = atomic_sub_return(PTE_FREE_DEFERRED,
+					     &page->pt_frag_refcount);
+	}
+	/* Remove that refcount of 1 left for fragment freeing above */
+	atomic_dec(&page->pt_frag_refcount);
+	call_rcu(&page->rcu_head, pte_free_now);
+}
+
+void pte_free_defer(struct mm_struct *mm, pgtable_t pgtable)
+{
+	struct page *page;
+
+	page = virt_to_page(pgtable);
+	/*
+	 * One page may be shared between PTE_FRAG_NR pagetables: only queue
+	 * it once for freeing, but note whenever the free must be deferred.
+	 *
+	 * (This would be much simpler if the struct page had an rcu_head for
+	 * each fragment, or if we could allocate a separate array for that.)
+	 *
+	 * Convert our refcount of 1 to a refcount of PTE_FREE_DEFERRED, and
+	 * proceed to call_rcu() only when the rcu_head is not already in use.
+	 */
+	if (atomic_add_return(PTE_FREE_DEFERRED - 1, &page->pt_frag_refcount) <
+			      PTE_FREE_DEFERRED + PTE_FREE_DEFERRED)
+		call_rcu(&page->rcu_head, pte_free_now);
+}
+#endif /* CONFIG_TRANSPARENT_HUGEPAGE */

From patchwork Tue Jun 20 07:49:34 2023
From: Hugh Dickins
Date: Tue, 20 Jun 2023 00:49:34 -0700 (PDT)
To: Andrew Morton
Shutemov" , Matthew Wilcox , David Hildenbrand , Suren Baghdasaryan , Qi Zheng , Yang Shi , Mel Gorman , Peter Xu , Peter Zijlstra , Will Deacon , Yu Zhao , Alistair Popple , Ralph Campbell , Ira Weiny , Steven Price , SeongJae Park , Lorenzo Stoakes , Huang Ying , Naoya Horiguchi , Christophe Leroy , Zack Rusin , Jason Gunthorpe , Axel Rasmussen , Anshuman Khandual , Pasha Tatashin , Miaohe Lin , Minchan Kim , Christoph Hellwig , Song Liu , Thomas Hellstrom , Russell King , "David S. Miller" , Michael Ellerman , "Aneesh Kumar K.V" , Heiko Carstens , Christian Borntraeger , Claudio Imbrenda , Alexander Gordeev , Jann Horn , Vishal Moola , Vlastimil Babka , linux-arm-kernel@lists.infradead.org, sparclinux@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH v2 06/12] sparc: add pte_free_defer() for pte_t *pgtable_t In-Reply-To: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com> Message-ID: References: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com> MIME-Version: 1.0 X-Spam-Status: No, score=-17.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF, ENV_AND_HDR_SPF_MATCH,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS, T_SCC_BODY_TEXT_LINE,USER_IN_DEF_DKIM_WL,USER_IN_DEF_SPF_WL autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1769208182339220500?= X-GMAIL-MSGID: =?utf-8?q?1769208182339220500?= Add sparc-specific pte_free_defer(), to call pte_free() via call_rcu(). pte_free_defer() will be called inside khugepaged's retract_page_tables() loop, where allocating extra memory cannot be relied upon. This precedes the generic version to avoid build breakage from incompatible pgtable_t. sparc32 supports pagetables sharing a page, but does not support THP; sparc64 supports THP, but does not support pagetables sharing a page. So the sparc-specific pte_free_defer() is as simple as the generic one, except for converting between pte_t *pgtable_t and struct page *. 
Signed-off-by: Hugh Dickins --- arch/sparc/include/asm/pgalloc_64.h | 4 ++++ arch/sparc/mm/init_64.c | 16 ++++++++++++++++ 2 files changed, 20 insertions(+) diff --git a/arch/sparc/include/asm/pgalloc_64.h b/arch/sparc/include/asm/pgalloc_64.h index 7b5561d17ab1..caa7632be4c2 100644 --- a/arch/sparc/include/asm/pgalloc_64.h +++ b/arch/sparc/include/asm/pgalloc_64.h @@ -65,6 +65,10 @@ pgtable_t pte_alloc_one(struct mm_struct *mm); void pte_free_kernel(struct mm_struct *mm, pte_t *pte); void pte_free(struct mm_struct *mm, pgtable_t ptepage); +/* arch use pte_free_defer() implementation in arch/sparc/mm/init_64.c */ +#define pte_free_defer pte_free_defer +void pte_free_defer(struct mm_struct *mm, pgtable_t pgtable); + #define pmd_populate_kernel(MM, PMD, PTE) pmd_set(MM, PMD, PTE) #define pmd_populate(MM, PMD, PTE) pmd_set(MM, PMD, PTE) diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c index 04f9db0c3111..0d7fd793924c 100644 --- a/arch/sparc/mm/init_64.c +++ b/arch/sparc/mm/init_64.c @@ -2930,6 +2930,22 @@ void pgtable_free(void *table, bool is_page) } #ifdef CONFIG_TRANSPARENT_HUGEPAGE +static void pte_free_now(struct rcu_head *head) +{ + struct page *page; + + page = container_of(head, struct page, rcu_head); + __pte_free((pgtable_t)page_address(page)); +} + +void pte_free_defer(struct mm_struct *mm, pgtable_t pgtable) +{ + struct page *page; + + page = virt_to_page(pgtable); + call_rcu(&page->rcu_head, pte_free_now); +} + void update_mmu_cache_pmd(struct vm_area_struct *vma, unsigned long addr, pmd_t *pmd) { From patchwork Tue Jun 20 07:51:19 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Hugh Dickins X-Patchwork-Id: 110329 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp3504065vqr; Tue, 20 Jun 2023 01:11:51 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ4JrTH+0vIPInDQVSVgQxlLkn3EXe1fEkCvm3g0QAhlY89rAXeI164cQ9xN7DC3iJ/1C9ex X-Received: by 2002:a05:6358:f55:b0:130:e6ce:d5a7 with SMTP id c21-20020a0563580f5500b00130e6ced5a7mr5159765rwj.6.1687248711187; Tue, 20 Jun 2023 01:11:51 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1687248711; cv=none; d=google.com; s=arc-20160816; b=tVv8vl/QJPSItcDE5DNJJ/LCjjQQP5JEfmi2ZFjO6NF6Y8bLbSiNvk8R7KEah4kNRy 5ZCuPxFLZjI2D+kNvgYqIRuEpldMKgq6QtxrHT2TiucJWRF3vbAPOVy0ezBijV68du7Q MHwvZHF8Hkfd4DWzq3SXj9k7PqwtjhfH+fUNGvPVNnKNDI3yIk8zTfk/sieunfm3oFDo C5H1cpO1fcQLASU52WMFnbzn+9XLzgm/I/TubMYSQTeiHcLNNYgst4vpG2dJ5UuFD/xs pIitCDprhjXTquSBPhmVmIWiZejfALMghTy5J+YMT5N6PfJK3LB1pBefLEiq/4+Hr2Vt Wohg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:mime-version:references:message-id:in-reply-to :subject:cc:to:from:date:dkim-signature; bh=+jUseASXhwGaNM4F6CkFzwvn2Dk1xhlzNLQjSeyjV8Y=; b=dHUmFdlaJVbr3zUl3uqA+8KYGUl0SsjZavUly0X4Q3zT5kyVfmSEiN0bizrKhe8SMr bDv3pb6oIlZ88g5+pSCVrmOGEFk2dTF/X7rqD8dysoE2WsDXJSyLNnzUQ3yRzmDEobKs dRpc9tPK/JqPoPz2t40H9RqGBSY21DUWPp0RCUB/7moCrTmHhTQX0cyPR61B9qroybKx EoYymPkh9u4fFfRyCyUc26OtEJE1NFfUDVd2fpUbe0kewNrUcM8hFNFuhsRslplmhiQN h/RfPuKb82/14TKgHzFKTTaZwqztYow2Dl7GOHVSgx6SwV7zYQ9F7uOkZ4J+MBLxzuWG 3Q+g== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20221208 header.b=yMZZZ71U; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com 
From: Hugh Dickins
Date: Tue, 20 Jun 2023 00:51:19 -0700 (PDT)
To: Andrew Morton
Subject: [PATCH v2 07/12] s390: add pte_free_defer() for pgtables sharing page
In-Reply-To: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com>
References: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com>

Add s390-specific pte_free_defer(), to call pte_free() via call_rcu().
pte_free_defer() will be called inside khugepaged's retract_page_tables()
loop, where allocating extra memory cannot be relied upon.  This precedes
the generic version to avoid build breakage from incompatible pgtable_t.

This version is more complicated than others: because s390 fits two 2K
page tables into one 4K page (so page->rcu_head must be shared between
both halves), and already uses page->lru (which page->rcu_head overlays)
to list any free halves, with clever management by page->_refcount bits.

Build upon the existing management, adjusted to follow a new rule: that
a page is not linked to mm_context_t::pgtable_list while either half is
pending free, by either tlb_remove_table() or pte_free_defer(); but is
afterwards either relinked to the list (if the other half is allocated),
or freed (if the other half is free), by __tlb_remove_table() in both
cases.  This rule ensures that page->lru is no longer in use while
page->rcu_head may be needed by pte_free_defer().  A fortuitous byproduct
of following this rule is that page_table_free() no longer needs its
curious two-step manipulation of _refcount: see commit c2c224932fd0
("s390/mm: fix 2KB pgtable release race") for the race that required it.

But the rule does not solve the problem that the two halves may need the
rcu_head at the same time.  For that, add HH (rcu_head) bits between
s390's AA (allocated) and PP (pending) bits in the upper byte of
page->_refcount: then the second pte_free_defer() can see that the
rcu_head is already in use, and the RCU callee pte_free_half() can see
that it needs to make a further call_rcu() for that other half.

page_table_alloc() now sets the page->pt_mm field, so __tlb_remove_table()
knows where to link the freed half while its other half is allocated.
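The upper-byte encoding that paragraph describes is easiest to see as a
set of masks.  A rough sketch follows; these constant names and the
helper are invented purely for exposition, while the patch itself works
with literal masks and atomic_xor_bits(), as the diff below shows:

	/*
	 * Illustrative only: the PPHHAA layout of the upper byte
	 * (bits 24-31) of page->_refcount.  Names invented for this sketch.
	 */
	#define PT_LOWER_ALLOCATED	0x01U	/* AA: lower 2K half allocated */
	#define PT_UPPER_ALLOCATED	0x02U
	#define PT_LOWER_ON_RCU_HEAD	0x04U	/* HH: lower half queued on rcu_head */
	#define PT_UPPER_ON_RCU_HEAD	0x08U
	#define PT_LOWER_PENDING_FREE	0x10U	/* PP: lower half pending free by any method */
	#define PT_UPPER_PENDING_FREE	0x20U

	/* Per the new rule: a page may sit on mm_context_t::pgtable_list
	 * only when exactly one half is allocated and neither half is
	 * pending free by any method (no HH and no PP bit set). */
	static inline bool pt_may_be_listed(unsigned int mask)
	{
		return (mask & 0x03U) && (mask & 0x03U) != 0x03U &&
		       !(mask & 0x3CU);
	}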
But linking to the list needs mm->context.lock: and although the AA bit
being set guarantees that pt_mm is still valid, it does not guarantee
that the mm itself is still valid an instant later: so acquiring
mm->context.lock would not be safe.  For now, use a static global
mm_pgtable_list_lock instead; a soon-to-follow commit will split it
per-mm as before (probably by using a SLAB_TYPESAFE_BY_RCU structure for
the list head and its lock), and update the commentary on the
pgtable_list.

Signed-off-by: Hugh Dickins
Signed-off-by: Gerald Schaefer
Reviewed-by: Gerald Schaefer
---
 arch/s390/include/asm/pgalloc.h |   4 +
 arch/s390/mm/pgalloc.c          | 205 +++++++++++++++++++++++---------
 include/linux/mm_types.h        |   2 +-
 3 files changed, 154 insertions(+), 57 deletions(-)

diff --git a/arch/s390/include/asm/pgalloc.h b/arch/s390/include/asm/pgalloc.h
index 17eb618f1348..89a9d5ef94f8 100644
--- a/arch/s390/include/asm/pgalloc.h
+++ b/arch/s390/include/asm/pgalloc.h
@@ -143,6 +143,10 @@ static inline void pmd_populate(struct mm_struct *mm,
 #define pte_free_kernel(mm, pte) page_table_free(mm, (unsigned long *) pte)
 #define pte_free(mm, pte) page_table_free(mm, (unsigned long *) pte)
 
+/* arch use pte_free_defer() implementation in arch/s390/mm/pgalloc.c */
+#define pte_free_defer pte_free_defer
+void pte_free_defer(struct mm_struct *mm, pgtable_t pgtable);
+
 void vmem_map_init(void);
 void *vmem_crst_alloc(unsigned long val);
 pte_t *vmem_pte_alloc(void);

diff --git a/arch/s390/mm/pgalloc.c b/arch/s390/mm/pgalloc.c
index 66ab68db9842..11983a3ff95a 100644
--- a/arch/s390/mm/pgalloc.c
+++ b/arch/s390/mm/pgalloc.c
@@ -159,6 +159,11 @@ void page_table_free_pgste(struct page *page)
 
 #endif /* CONFIG_PGSTE */
 
+/*
+ * Temporarily use a global spinlock instead of mm->context.lock.
+ * This will be replaced by a per-mm spinlock in a followup commit.
+ */
+static DEFINE_SPINLOCK(mm_pgtable_list_lock);
 /*
  * A 2KB-pgtable is either upper or lower half of a normal page.
  * The second half of the page may be unused or used as another
@@ -172,7 +177,7 @@ void page_table_free_pgste(struct page *page)
  * When a parent page gets fully allocated it contains 2KB-pgtables in both
  * upper and lower halves and is removed from mm_context_t::pgtable_list.
  *
- * When 2KB-pgtable is freed from to fully allocated parent page that
+ * When 2KB-pgtable is freed from the fully allocated parent page that
  * page turns partially allocated and added to mm_context_t::pgtable_list.
 *
 * If 2KB-pgtable is freed from the partially allocated parent page that
@@ -182,16 +187,24 @@ void page_table_free_pgste(struct page *page)
 * As follows from the above, no unallocated or fully allocated parent
 * pages are contained in mm_context_t::pgtable_list.
 *
+ * NOTE NOTE NOTE: The commentary above and below has not yet been updated:
+ * the new rule is that a page is not linked to mm_context_t::pgtable_list
+ * while either half is pending free, by any method; but afterwards is
+ * either relinked to it, or freed, by __tlb_remove_table().  This allows
+ * pte_free_defer() to use the page->rcu_head (which overlays page->lru).
+ *
 * The upper byte (bits 24-31) of the parent page _refcount is used
 * for tracking contained 2KB-pgtables and has the following format:
 *
- *   PP  AA
- * 01234567    upper byte (bits 24-31) of struct page::_refcount
- *   ||  ||
- *   ||  |+--- upper 2KB-pgtable is allocated
- *   ||  +---- lower 2KB-pgtable is allocated
- *   |+------- upper 2KB-pgtable is pending for removal
- *   +-------- lower 2KB-pgtable is pending for removal
+ *   PPHHAA
+ * 76543210    upper byte (bits 24-31) of struct page::_refcount
+ *   ||||||
+ *   |||||+--- lower 2KB-pgtable is allocated
+ *   ||||+---- upper 2KB-pgtable is allocated
+ *   |||+----- lower 2KB-pgtable is pending free by page->rcu_head
+ *   ||+------ upper 2KB-pgtable is pending free by page->rcu_head
+ *   |+------- lower 2KB-pgtable is pending free by any method
+ *   +-------- upper 2KB-pgtable is pending free by any method
 *
 * (See commit 620b4e903179 ("s390: use _refcount for pgtables") on why
 * using _refcount is possible).
@@ -200,7 +213,7 @@ void page_table_free_pgste(struct page *page)
 * The parent page is either:
 *   - added to mm_context_t::pgtable_list in case the second half of the
 *     parent page is still unallocated;
- *   - removed from mm_context_t::pgtable_list in case both hales of the
+ *   - removed from mm_context_t::pgtable_list in case both halves of the
 *     parent page are allocated;
 * These operations are protected with mm_context_t::lock.
 *
@@ -239,32 +252,22 @@ unsigned long *page_table_alloc(struct mm_struct *mm)
 	/* Try to get a fragment of a 4K page as a 2K page table */
 	if (!mm_alloc_pgste(mm)) {
 		table = NULL;
-		spin_lock_bh(&mm->context.lock);
+		spin_lock_bh(&mm_pgtable_list_lock);
 		if (!list_empty(&mm->context.pgtable_list)) {
 			page = list_first_entry(&mm->context.pgtable_list,
 						struct page, lru);
 			mask = atomic_read(&page->_refcount) >> 24;
-			/*
-			 * The pending removal bits must also be checked.
-			 * Failure to do so might lead to an impossible
-			 * value of (i.e 0x13 or 0x23) written to _refcount.
-			 * Such values violate the assumption that pending and
-			 * allocation bits are mutually exclusive, and the rest
-			 * of the code unrails as result. That could lead to
-			 * a whole bunch of races and corruptions.
-			 */
-			mask = (mask | (mask >> 4)) & 0x03U;
-			if (mask != 0x03U) {
-				table = (unsigned long *) page_to_virt(page);
-				bit = mask & 1;		/* =1 -> second 2K */
-				if (bit)
-					table += PTRS_PER_PTE;
-				atomic_xor_bits(&page->_refcount,
-							0x01U << (bit + 24));
-				list_del(&page->lru);
-			}
+			/* Cannot be on this list if either half pending free */
+			WARN_ON_ONCE(mask & ~0x03U);
+			/* One or other half must be available, but not both */
+			WARN_ON_ONCE(mask == 0x00U || mask == 0x03U);
+			table = (unsigned long *)page_to_virt(page);
+			bit = mask & 0x01U;	/* =1 -> second 2K available */
+			table += bit * PTRS_PER_PTE;
+			atomic_xor_bits(&page->_refcount, 0x01U << (bit + 24));
+			list_del(&page->lru);
 		}
-		spin_unlock_bh(&mm->context.lock);
+		spin_unlock_bh(&mm_pgtable_list_lock);
 		if (table)
 			return table;
 	}
@@ -278,6 +281,7 @@ unsigned long *page_table_alloc(struct mm_struct *mm)
 	}
 	arch_set_page_dat(page, 0);
 	/* Initialize page table */
+	page->pt_mm = mm;
 	table = (unsigned long *) page_to_virt(page);
 	if (mm_alloc_pgste(mm)) {
 		/* Return 4K page table with PGSTEs */
@@ -288,14 +292,14 @@ unsigned long *page_table_alloc(struct mm_struct *mm)
 		/* Return the first 2K fragment of the page */
 		atomic_xor_bits(&page->_refcount, 0x01U << 24);
 		memset64((u64 *)table, _PAGE_INVALID, 2 * PTRS_PER_PTE);
-		spin_lock_bh(&mm->context.lock);
+		spin_lock_bh(&mm_pgtable_list_lock);
 		list_add(&page->lru, &mm->context.pgtable_list);
-		spin_unlock_bh(&mm->context.lock);
+		spin_unlock_bh(&mm_pgtable_list_lock);
 	}
 	return table;
 }
 
-static void page_table_release_check(struct page *page, void *table,
+static void page_table_release_check(struct page *page, unsigned long *table,
 				     unsigned int half, unsigned int mask)
 {
 	char msg[128];
@@ -317,21 +321,18 @@ void page_table_free(struct mm_struct *mm, unsigned long *table)
 	if (!mm_alloc_pgste(mm)) {
 		/* Free 2K page table fragment of a 4K page */
 		bit = ((unsigned long) table & ~PAGE_MASK)/(PTRS_PER_PTE*sizeof(pte_t));
-		spin_lock_bh(&mm->context.lock);
+		spin_lock_bh(&mm_pgtable_list_lock);
 		/*
-		 * Mark the page for delayed release. The actual release
-		 * will happen outside of the critical section from this
-		 * function or from __tlb_remove_table()
+		 * Mark the page for release. The actual release will happen
+		 * below from this function, or later from __tlb_remove_table().
 		 */
-		mask = atomic_xor_bits(&page->_refcount, 0x11U << (bit + 24));
+		mask = atomic_xor_bits(&page->_refcount, 0x01U << (bit + 24));
 		mask >>= 24;
-		if (mask & 0x03U)
+		if (mask & 0x03U)	/* other half is allocated */
 			list_add(&page->lru, &mm->context.pgtable_list);
-		else
+		else if (!(mask & 0x30U))	/* other half not pending */
 			list_del(&page->lru);
-		spin_unlock_bh(&mm->context.lock);
-		mask = atomic_xor_bits(&page->_refcount, 0x10U << (bit + 24));
-		mask >>= 24;
+		spin_unlock_bh(&mm_pgtable_list_lock);
 		if (mask != 0x00U)
 			return;
 		half = 0x01U << bit;
@@ -362,19 +363,17 @@ void page_table_free_rcu(struct mmu_gather *tlb, unsigned long *table,
 		return;
 	}
 	bit = ((unsigned long) table & ~PAGE_MASK) / (PTRS_PER_PTE*sizeof(pte_t));
-	spin_lock_bh(&mm->context.lock);
+	spin_lock_bh(&mm_pgtable_list_lock);
 	/*
-	 * Mark the page for delayed release. The actual release will happen
-	 * outside of the critical section from __tlb_remove_table() or from
-	 * page_table_free()
+	 * Mark the page for delayed release.
+	 * The actual release will happen later, from __tlb_remove_table().
	 */
 	mask = atomic_xor_bits(&page->_refcount, 0x11U << (bit + 24));
 	mask >>= 24;
-	if (mask & 0x03U)
-		list_add_tail(&page->lru, &mm->context.pgtable_list);
-	else
+	/* Other half not allocated? Other half not already pending free? */
+	if ((mask & 0x03U) == 0x00U && (mask & 0x30U) != 0x30U)
 		list_del(&page->lru);
-	spin_unlock_bh(&mm->context.lock);
+	spin_unlock_bh(&mm_pgtable_list_lock);
 	table = (unsigned long *) ((unsigned long) table | (0x01U << bit));
 	tlb_remove_table(tlb, table);
 }
@@ -382,17 +381,40 @@ void page_table_free_rcu(struct mmu_gather *tlb, unsigned long *table,
 void __tlb_remove_table(void *_table)
 {
 	unsigned int mask = (unsigned long) _table & 0x03U, half = mask;
-	void *table = (void *)((unsigned long) _table ^ mask);
+	unsigned long *table = (unsigned long *)((unsigned long) _table ^ mask);
 	struct page *page = virt_to_page(table);
 
 	switch (half) {
 	case 0x00U:	/* pmd, pud, or p4d */
-		free_pages((unsigned long)table, CRST_ALLOC_ORDER);
+		__free_pages(page, CRST_ALLOC_ORDER);
 		return;
 	case 0x01U:	/* lower 2K of a 4K page table */
-	case 0x02U:	/* higher 2K of a 4K page table */
-		mask = atomic_xor_bits(&page->_refcount, mask << (4 + 24));
-		mask >>= 24;
+	case 0x02U:	/* upper 2K of a 4K page table */
+		/*
+		 * If the other half is marked as allocated, page->pt_mm must
+		 * still be valid, page->rcu_head no longer in use so page->lru
+		 * good for use, so now make the freed half available for reuse.
+		 * But be wary of races with that other half being freed.
+		 */
+		if (atomic_read(&page->_refcount) & (0x03U << 24)) {
+			struct mm_struct *mm = page->pt_mm;
+			/*
+			 * It is safe to use page->pt_mm when the other half
+			 * is seen allocated while holding pgtable_list lock;
+			 * but how will it be safe to acquire that spinlock?
+			 * Global mm_pgtable_list_lock is safe and easy for
+			 * now, then a followup commit will split it per-mm.
+			 */
+			spin_lock_bh(&mm_pgtable_list_lock);
+			mask = atomic_xor_bits(&page->_refcount, mask << 28);
+			mask >>= 24;
+			if (mask & 0x03U)
+				list_add(&page->lru, &mm->context.pgtable_list);
+			spin_unlock_bh(&mm_pgtable_list_lock);
+		} else {
+			mask = atomic_xor_bits(&page->_refcount, mask << 28);
+			mask >>= 24;
+		}
 		if (mask != 0x00U)
 			return;
 		break;
@@ -407,6 +429,77 @@ void __tlb_remove_table(void *_table)
 	__free_page(page);
 }
 
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+static void pte_free_now0(struct rcu_head *head);
+static void pte_free_now1(struct rcu_head *head);
+
+static void pte_free_pgste(struct rcu_head *head)
+{
+	unsigned long *table;
+	struct page *page;
+
+	page = container_of(head, struct page, rcu_head);
+	table = (unsigned long *)page_to_virt(page);
+	table = (unsigned long *)((unsigned long)table | 0x03U);
+	__tlb_remove_table(table);
+}
+
+static void pte_free_half(struct rcu_head *head, unsigned int bit)
+{
+	unsigned long *table;
+	struct page *page;
+	unsigned int mask;
+
+	page = container_of(head, struct page, rcu_head);
+	mask = atomic_xor_bits(&page->_refcount, 0x04U << (bit + 24));
+
+	table = (unsigned long *)page_to_virt(page);
+	table += bit * PTRS_PER_PTE;
+	table = (unsigned long *)((unsigned long)table | (0x01U << bit));
+	__tlb_remove_table(table);
+
+	/* If pte_free_defer() of the other half came in, queue it now */
+	if (mask & 0x0CU)
+		call_rcu(&page->rcu_head, bit ?
+				pte_free_now0 : pte_free_now1);
+}
+
+static void pte_free_now0(struct rcu_head *head)
+{
+	pte_free_half(head, 0);
+}
+
+static void pte_free_now1(struct rcu_head *head)
+{
+	pte_free_half(head, 1);
+}
+
+void pte_free_defer(struct mm_struct *mm, pgtable_t pgtable)
+{
+	unsigned int bit, mask;
+	struct page *page;
+
+	page = virt_to_page(pgtable);
+	if (mm_alloc_pgste(mm)) {
+		call_rcu(&page->rcu_head, pte_free_pgste);
+		return;
+	}
+	bit = ((unsigned long)pgtable & ~PAGE_MASK) /
+			(PTRS_PER_PTE * sizeof(pte_t));
+
+	spin_lock_bh(&mm_pgtable_list_lock);
+	mask = atomic_xor_bits(&page->_refcount, 0x15U << (bit + 24));
+	mask >>= 24;
+	/* Other half not allocated? Other half not already pending free? */
+	if ((mask & 0x03U) == 0x00U && (mask & 0x30U) != 0x30U)
+		list_del(&page->lru);
+	spin_unlock_bh(&mm_pgtable_list_lock);
+
+	/* Do not relink on rcu_head if other half already linked on rcu_head */
+	if ((mask & 0x0CU) != 0x0CU)
+		call_rcu(&page->rcu_head, bit ? pte_free_now1 : pte_free_now0);
+}
+#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
+
 /*
  * Base infrastructure required to generate basic asces, region, segment,
  * and page tables that do not make use of enhanced features like EDAT1.

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 306a3d1a0fa6..1667a1bdb8a8 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -146,7 +146,7 @@ struct page {
 			pgtable_t pmd_huge_pte; /* protected by page->ptl */
 			unsigned long _pt_pad_2;	/* mapping */
 			union {
-				struct mm_struct *pt_mm; /* x86 pgds only */
+				struct mm_struct *pt_mm; /* x86 pgd, s390 */
 				atomic_t pt_frag_refcount; /* powerpc */
 			};
 #if ALLOC_SPLIT_PTLOCKS

From patchwork Tue Jun 20 07:53:03 2023
From: Hugh Dickins
Date: Tue, 20 Jun 2023 00:53:03 -0700 (PDT)
To: Andrew Morton
Subject: [PATCH v2 08/12] mm/pgtable: add pte_free_defer() for pgtable as page
Message-ID: <3e5961a2-26e5-d1ab-5c4c-527e273e3cc5@google.com>
In-Reply-To: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com>
References: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com>

Add the generic pte_free_defer(), to call pte_free() via call_rcu().
pte_free_defer() will be called inside khugepaged's retract_page_tables()
loop, where allocating extra memory cannot be relied upon.  This version
suits all those architectures which use an unfragmented page for one page
table (none of whose pte_free()s use the mm arg which was passed to it).
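The contract this creates for callers is worth spelling out: a lockless
walker must stay inside an RCU read-side critical section for as long as
it reads entries, and the freer must go through pte_free_defer() rather
than pte_free().  A minimal sketch of the reader side, assuming only that
the emptied table was handed to pte_free_defer() (not actual khugepaged
code):

	#include <linux/mm.h>
	#include <linux/rcupdate.h>

	/* Reader side: the page table page cannot be freed while this runs,
	 * because pte_free_defer() waits out the RCU grace period. */
	static pte_t read_one_pte(pte_t *ptep)
	{
		pte_t pte;

		rcu_read_lock();
		pte = ptep_get(ptep);	/* backing page guaranteed alive here */
		rcu_read_unlock();
		return pte;
	}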
Signed-off-by: Hugh Dickins
---
 include/linux/mm_types.h |  4 ++++
 include/linux/pgtable.h  |  2 ++
 mm/pgtable-generic.c     | 20 ++++++++++++++++++++
 3 files changed, 26 insertions(+)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 1667a1bdb8a8..09335fa28c41 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -144,6 +144,10 @@ struct page {
 		struct {	/* Page table pages */
 			unsigned long _pt_pad_1;	/* compound_head */
 			pgtable_t pmd_huge_pte; /* protected by page->ptl */
+			/*
+			 * A PTE page table page might be freed by use of
+			 * rcu_head: which overlays those two fields above.
+			 */
 			unsigned long _pt_pad_2;	/* mapping */
 			union {
 				struct mm_struct *pt_mm; /* x86 pgd, s390 */

diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 525f1782b466..d18d3e963967 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -112,6 +112,8 @@ static inline void pte_unmap(pte_t *pte)
 }
 #endif
 
+void pte_free_defer(struct mm_struct *mm, pgtable_t pgtable);
+
 /* Find an entry in the second-level page table.. */
 #ifndef pmd_offset
 static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)

diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index 5e85a625ab30..ab3741064bb8 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -13,6 +13,7 @@
 #include <linux/pagemap.h>
 #include <linux/hugetlb.h>
 #include <linux/pgtable.h>
+#include <linux/rcupdate.h>
 #include <asm/tlb.h>
 
 /*
@@ -230,6 +231,25 @@ pmd_t pmdp_collapse_flush(struct vm_area_struct *vma, unsigned long address,
 	return pmd;
 }
 #endif
+
+/* arch define pte_free_defer in asm/pgalloc.h for its own implementation */
+#ifndef pte_free_defer
+static void pte_free_now(struct rcu_head *head)
+{
+	struct page *page;
+
+	page = container_of(head, struct page, rcu_head);
+	pte_free(NULL /* mm not passed and not used */, (pgtable_t)page);
+}
+
+void pte_free_defer(struct mm_struct *mm, pgtable_t pgtable)
+{
+	struct page *page;
+
+	page = pgtable;
+	call_rcu(&page->rcu_head, pte_free_now);
+}
+#endif /* pte_free_defer */
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
 #if defined(CONFIG_GUP_GET_PXX_LOW_HIGH) && \
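One detail worth noting from the generic fallback just added: an
architecture opts out of it by defining pte_free_defer as a macro of
itself in its asm/pgalloc.h, exactly as the sparc and s390 patches above
did.  A sketch of the override pattern (the arch name is invented):

	/* In a hypothetical arch/foo/include/asm/pgalloc.h: */
	#define pte_free_defer pte_free_defer	/* suppress the generic version */
	void pte_free_defer(struct mm_struct *mm, pgtable_t pgtable);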
From patchwork Tue Jun 20 07:54:56 2023
From: Hugh Dickins
Date: Tue, 20 Jun 2023 00:54:56 -0700 (PDT)
To: Andrew Morton
Subject: [PATCH v2 09/12] mm/khugepaged: retract_page_tables() without mmap or vma lock
In-Reply-To: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com>
References: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com>

Simplify shmem and file THP collapse's retract_page_tables(), and relax
its locking: to improve its success rate and to lessen impact on others.

Instead of its MADV_COLLAPSE case doing set_huge_pmd() at target_addr of
target_mm, leave that part of the work to madvise_collapse() calling
collapse_pte_mapped_thp() afterwards: just adjust collapse_file()'s
result code to arrange for that.  That spares retract_page_tables() four
arguments; and since it will be successful in retracting all of the page
tables expected of it, no need to track and return a result code itself.

It needs i_mmap_lock_read(mapping) for traversing the vma interval tree,
but it does not need i_mmap_lock_write() for that: page_vma_mapped_walk()
allows for pte_offset_map_lock() etc to fail, and uses pmd_lock() for
THPs.  retract_page_tables() just needs to use those same spinlocks to
exclude it briefly, while transitioning pmd from page table to none: so
restore its use of pmd_lock() inside of which pte lock is nested.

Users of pte_offset_map_lock() etc all now allow for them to fail: so
retract_page_tables() now has no use for mmap_write_trylock() or
vma_try_start_write().  In common with rmap and page_vma_mapped_walk(),
it does not even need the mmap_read_lock().

But those users do expect the page table to remain a good page table,
until they unlock and rcu_read_unlock(): so the page table cannot be
freed immediately, but rather by the recently added pte_free_defer().

Use the (usually a no-op) pmdp_get_lockless_sync() to send an interrupt
on PAE, when pmdp_collapse_flush() did not already do so: to make sure
that the start,pmdp_get_lockless(),end sequence in __pte_offset_map()
cannot pick up a pmd entry with mismatched pmd_low and pmd_high.
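The resulting locking pattern is compact enough to sketch in outline.
This is a condensed illustration of the flow described above, with the
anon_vma/uffd rechecks and error paths elided; the diff below has the
real code:

	/* Condensed outline only: not the actual retract_page_tables() */
	static void retract_outline(struct mm_struct *mm,
				    struct vm_area_struct *vma,
				    unsigned long addr, pmd_t *pmd)
	{
		struct mmu_notifier_range range;
		spinlock_t *pml, *ptl;
		pmd_t pgt_pmd;

		mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm,
					addr, addr + HPAGE_PMD_SIZE);
		mmu_notifier_invalidate_range_start(&range);

		pml = pmd_lock(mm, pmd);	/* excludes rmap's pmd_lock() */
		ptl = pte_lockptr(mm, pmd);	/* may be the very same lock */
		if (ptl != pml)			/* excludes pte_offset_map_lock() */
			spin_lock_nested(ptl, SINGLE_DEPTH_NESTING);

		pgt_pmd = pmdp_collapse_flush(vma, addr, pmd);
		pmdp_get_lockless_sync();	/* interrupt, if PAE needs one */

		if (ptl != pml)
			spin_unlock(ptl);
		spin_unlock(pml);
		mmu_notifier_invalidate_range_end(&range);

		mm_dec_nr_ptes(mm);
		pte_free_defer(mm, pmd_pgtable(pgt_pmd));  /* free after RCU grace */
	}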
retract_page_tables() can be enhanced to replace_page_tables(), which
inserts the final huge pmd without mmap lock: going through an invalid
state instead of pmd_none() followed by fault.  But that enhancement
does raise some more questions: leave it until a later release.

Signed-off-by: Hugh Dickins
---
 mm/khugepaged.c | 184 ++++++++++++++++++++----------------------
 1 file changed, 75 insertions(+), 109 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 1083f0e38a07..f7a0f7673127 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1617,9 +1617,8 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
 		break;
 	case SCAN_PMD_NONE:
 		/*
-		 * In MADV_COLLAPSE path, possible race with khugepaged where
-		 * all pte entries have been removed and pmd cleared.  If so,
-		 * skip all the pte checks and just update the pmd mapping.
+		 * All pte entries have been removed and pmd cleared.
+		 * Skip all the pte checks and just update the pmd mapping.
 		 */
 		goto maybe_install_pmd;
 	default:
@@ -1748,123 +1747,88 @@ static void khugepaged_collapse_pte_mapped_thps(struct khugepaged_mm_slot *mm_slot)
 	mmap_write_unlock(mm);
 }
 
-static int retract_page_tables(struct address_space *mapping, pgoff_t pgoff,
-			       struct mm_struct *target_mm,
-			       unsigned long target_addr, struct page *hpage,
-			       struct collapse_control *cc)
+static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
 {
 	struct vm_area_struct *vma;
-	int target_result = SCAN_FAIL;
 
-	i_mmap_lock_write(mapping);
+	i_mmap_lock_read(mapping);
 	vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff) {
-		int result = SCAN_FAIL;
-		struct mm_struct *mm = NULL;
-		unsigned long addr = 0;
-		pmd_t *pmd;
-		bool is_target = false;
+		struct mmu_notifier_range range;
+		struct mm_struct *mm;
+		unsigned long addr;
+		pmd_t *pmd, pgt_pmd;
+		spinlock_t *pml;
+		spinlock_t *ptl;
+		bool skipped_uffd = false;
 
 		/*
 		 * Check vma->anon_vma to exclude MAP_PRIVATE mappings that
-		 * got written to. These VMAs are likely not worth investing
-		 * mmap_write_lock(mm) as PMD-mapping is likely to be split
-		 * later.
-		 *
-		 * Note that vma->anon_vma check is racy: it can be set up after
-		 * the check but before we took mmap_lock by the fault path.
-		 * But page lock would prevent establishing any new ptes of the
-		 * page, so we are safe.
-		 *
-		 * An alternative would be drop the check, but check that page
-		 * table is clear before calling pmdp_collapse_flush() under
-		 * ptl. It has higher chance to recover THP for the VMA, but
-		 * has higher cost too. It would also probably require locking
-		 * the anon_vma.
+		 * got written to. These VMAs are likely not worth removing
+		 * page tables from, as PMD-mapping is likely to be split later.
 		 */
-		if (READ_ONCE(vma->anon_vma)) {
-			result = SCAN_PAGE_ANON;
-			goto next;
-		}
+		if (READ_ONCE(vma->anon_vma))
+			continue;
+
 		addr = vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
 		if (addr & ~HPAGE_PMD_MASK ||
-		    vma->vm_end < addr + HPAGE_PMD_SIZE) {
-			result = SCAN_VMA_CHECK;
-			goto next;
-		}
-		mm = vma->vm_mm;
-		is_target = mm == target_mm && addr == target_addr;
-		result = find_pmd_or_thp_or_none(mm, addr, &pmd);
-		if (result != SCAN_SUCCEED)
-			goto next;
-		/*
-		 * We need exclusive mmap_lock to retract page table.
-		 *
-		 * We use trylock due to lock inversion: we need to acquire
-		 * mmap_lock while holding page lock. Fault path does it in
-		 * reverse order. Trylock is a way to avoid deadlock.
-		 *
-		 * Also, it's not MADV_COLLAPSE's job to collapse other
-		 * mappings - let khugepaged take care of them later.
-		 */
-		result = SCAN_PTE_MAPPED_HUGEPAGE;
-		if ((cc->is_khugepaged || is_target) &&
-		    mmap_write_trylock(mm)) {
-			/* trylock for the same lock inversion as above */
-			if (!vma_try_start_write(vma))
-				goto unlock_next;
-
-			/*
-			 * Re-check whether we have an ->anon_vma, because
-			 * collapse_and_free_pmd() requires that either no
-			 * ->anon_vma exists or the anon_vma is locked.
-			 * We already checked ->anon_vma above, but that check
-			 * is racy because ->anon_vma can be populated under the
-			 * mmap lock in read mode.
-			 */
-			if (vma->anon_vma) {
-				result = SCAN_PAGE_ANON;
-				goto unlock_next;
-			}
-			/*
-			 * When a vma is registered with uffd-wp, we can't
-			 * recycle the pmd pgtable because there can be pte
-			 * markers installed.  Skip it only, so the rest mm/vma
-			 * can still have the same file mapped hugely, however
-			 * it'll always mapped in small page size for uffd-wp
-			 * registered ranges.
-			 */
-			if (hpage_collapse_test_exit(mm)) {
-				result = SCAN_ANY_PROCESS;
-				goto unlock_next;
-			}
-			if (userfaultfd_wp(vma)) {
-				result = SCAN_PTE_UFFD_WP;
-				goto unlock_next;
-			}
-			collapse_and_free_pmd(mm, vma, addr, pmd);
-			if (!cc->is_khugepaged && is_target)
-				result = set_huge_pmd(vma, addr, pmd, hpage);
-			else
-				result = SCAN_SUCCEED;
-
-unlock_next:
-			mmap_write_unlock(mm);
-			goto next;
-		}
-		/*
-		 * Calling context will handle target mm/addr. Otherwise, let
-		 * khugepaged try again later.
-		 */
-		if (!is_target) {
-			khugepaged_add_pte_mapped_thp(mm, addr);
+		    vma->vm_end < addr + HPAGE_PMD_SIZE)
 			continue;
+
+		mm = vma->vm_mm;
+		if (find_pmd_or_thp_or_none(mm, addr, &pmd) != SCAN_SUCCEED)
+			continue;
+
+		if (hpage_collapse_test_exit(mm))
+			continue;
+		/*
+		 * When a vma is registered with uffd-wp, we cannot recycle
+		 * the page table because there may be pte markers installed.
+		 * Other vmas can still have the same file mapped hugely, but
+		 * skip this one: it will always be mapped in small page size
+		 * for uffd-wp registered ranges.
+		 */
+		if (userfaultfd_wp(vma))
+			continue;
+
+		/* PTEs were notified when unmapped; but now for the PMD? */
+		mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm,
+					addr, addr + HPAGE_PMD_SIZE);
+		mmu_notifier_invalidate_range_start(&range);
+
+		pml = pmd_lock(mm, pmd);
+		ptl = pte_lockptr(mm, pmd);
+		if (ptl != pml)
+			spin_lock_nested(ptl, SINGLE_DEPTH_NESTING);
+
+		/*
+		 * Huge page lock is still held, so normally the page table
+		 * must remain empty; and we have already skipped anon_vma
+		 * and userfaultfd_wp() vmas.  But since the mmap_lock is not
+		 * held, it is still possible for a racing userfaultfd_ioctl()
+		 * to have inserted ptes or markers.  Now that we hold ptlock,
+		 * repeating the anon_vma check protects from one category,
+		 * and repeating the userfaultfd_wp() check from another.
+		 */
+		if (unlikely(vma->anon_vma || userfaultfd_wp(vma))) {
+			skipped_uffd = true;
+		} else {
+			pgt_pmd = pmdp_collapse_flush(vma, addr, pmd);
+			pmdp_get_lockless_sync();
+		}
+
+		if (ptl != pml)
+			spin_unlock(ptl);
+		spin_unlock(pml);
+
+		mmu_notifier_invalidate_range_end(&range);
+
+		if (!skipped_uffd) {
+			mm_dec_nr_ptes(mm);
+			page_table_check_pte_clear_range(mm, addr, pgt_pmd);
+			pte_free_defer(mm, pmd_pgtable(pgt_pmd));
 		}
-next:
-		if (is_target)
-			target_result = result;
 	}
-	i_mmap_unlock_write(mapping);
-	return target_result;
+	i_mmap_unlock_read(mapping);
 }
 
 /**
@@ -2261,9 +2225,11 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 
 	/*
 	 * Remove pte page tables, so we can re-fault the page as huge.
+	 * If MADV_COLLAPSE, adjust result to call collapse_pte_mapped_thp().
 	 */
-	result = retract_page_tables(mapping, start, mm, addr, hpage,
-				     cc);
+	retract_page_tables(mapping, start);
+	if (cc && !cc->is_khugepaged)
+		result = SCAN_PTE_MAPPED_HUGEPAGE;
 	unlock_page(hpage);
 
 	/*

From patchwork Tue Jun 20 07:56:31 2023
From: Hugh Dickins
Date: Tue, 20 Jun 2023 00:56:31 -0700 (PDT)
To: Andrew Morton
Subject: [PATCH v2 10/12] mm/khugepaged: collapse_pte_mapped_thp() with mmap_read_lock()
In-Reply-To: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com>
References: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com>

Bring collapse_and_free_pmd() back into collapse_pte_mapped_thp().  It
does need mmap_read_lock(), but it does not need mmap_write_lock(), nor
vma_start_write() nor i_mmap lock nor anon_vma lock.  All racing paths
are relying on pte_offset_map_lock() and pmd_lock(), so use those.

Follow the pattern in retract_page_tables(); and using pte_free_defer()
removes most of the need for tlb_remove_table_sync_one() here; but call
pmdp_get_lockless_sync() to use it in the PAE case.

First check the VMA, in case page tables are being torn down: from JannH.
Confirm the preliminary find_pmd_or_thp_or_none() once page lock has been
acquired and the page looks suitable: from then on its state is stable.

However, collapse_pte_mapped_thp() was doing something others don't:
freeing a page table still containing "valid" entries.  i_mmap lock did
stop a racing truncate from double-freeing those pages, but we prefer
collapse_pte_mapped_thp() to clear the entries as usual.  Their TLB flush
can wait until the pmdp_collapse_flush() which follows, but the
mmu_notifier_invalidate_range_start() has to be done earlier.

Do the "step 1" checking loop without mmu_notifier: it wouldn't be good
for khugepaged to keep on repeatedly invalidating a range which is then
found unsuitable e.g. contains COWs.  "step 2", which does the clearing,
must then be more careful (after dropping ptl to do mmu_notifier), with
abort prepared to correct the accounting like "step 3".  But with those
entries now cleared, "step 4" (after dropping ptl to do pmd_lock) is kept
safe by the huge page lock, which stops new PTEs from being faulted in.
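The abort bookkeeping mentioned above is worth a small sketch.  This pulls
it out as a helper purely for illustration: the diff below open-codes the
same thing at its abort: label, where nr_ptes is a local counter of PTEs
already cleared:

	/* Illustrative helper only, not code from the patch. */
	static void abort_partial_clear(struct mm_struct *mm,
					struct page *hpage, int nr_ptes)
	{
		if (!nr_ptes)
			return;
		/*
		 * PTEs were cleared and their rmap removed, but the collapse
		 * is being abandoned: the TLB flush normally left to
		 * pmdp_collapse_flush() must happen now, and the refcount
		 * and mm counters must drop just as "step 3" would drop them.
		 */
		flush_tlb_mm(mm);
		page_ref_sub(hpage, nr_ptes);
		add_mm_counter(mm, mm_counter_file(hpage), -nr_ptes);
	}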
Signed-off-by: Hugh Dickins
---
 mm/khugepaged.c | 172 ++++++++++++++++++++++--------------------
 1 file changed, 77 insertions(+), 95 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index f7a0f7673127..060ac8789a1e 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1485,7 +1485,7 @@ static bool khugepaged_add_pte_mapped_thp(struct mm_struct *mm,
 	return ret;
 }
 
-/* hpage must be locked, and mmap_lock must be held in write */
+/* hpage must be locked, and mmap_lock must be held */
 static int set_huge_pmd(struct vm_area_struct *vma, unsigned long addr,
 			pmd_t *pmdp, struct page *hpage)
 {
@@ -1497,7 +1497,7 @@ static int set_huge_pmd(struct vm_area_struct *vma, unsigned long addr,
 	};
 
 	VM_BUG_ON(!PageTransHuge(hpage));
-	mmap_assert_write_locked(vma->vm_mm);
+	mmap_assert_locked(vma->vm_mm);
 
 	if (do_set_pmd(&vmf, hpage))
 		return SCAN_FAIL;
@@ -1506,48 +1506,6 @@ static int set_huge_pmd(struct vm_area_struct *vma, unsigned long addr,
 	return SCAN_SUCCEED;
 }
 
-/*
- * A note about locking:
- * Trying to take the page table spinlocks would be useless here because those
- * are only used to synchronize:
- *
- *  - modifying terminal entries (ones that point to a data page, not to
- *    another page table)
- *  - installing *new* non-terminal entries
- *
- * Instead, we need roughly the same kind of protection as free_pgtables() or
- * mm_take_all_locks() (but only for a single VMA):
- * The mmap lock together with this VMA's rmap locks covers all paths towards
- * the page table entries we're messing with here, except for hardware page
- * table walks and lockless_pages_from_mm().
- */
-static void collapse_and_free_pmd(struct mm_struct *mm, struct vm_area_struct *vma,
-				  unsigned long addr, pmd_t *pmdp)
-{
-	pmd_t pmd;
-	struct mmu_notifier_range range;
-
-	mmap_assert_write_locked(mm);
-	if (vma->vm_file)
-		lockdep_assert_held_write(&vma->vm_file->f_mapping->i_mmap_rwsem);
-	/*
-	 * All anon_vmas attached to the VMA have the same root and are
-	 * therefore locked by the same lock.
-	 */
-	if (vma->anon_vma)
-		lockdep_assert_held_write(&vma->anon_vma->root->rwsem);
-
-	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm, addr,
-				addr + HPAGE_PMD_SIZE);
-	mmu_notifier_invalidate_range_start(&range);
-	pmd = pmdp_collapse_flush(vma, addr, pmdp);
-	tlb_remove_table_sync_one();
-	mmu_notifier_invalidate_range_end(&range);
-	mm_dec_nr_ptes(mm);
-	page_table_check_pte_clear_range(mm, addr, pmd);
-	pte_free(mm, pmd_pgtable(pmd));
-}
-
 /**
  * collapse_pte_mapped_thp - Try to collapse a pte-mapped THP for mm at
  * address haddr.
@@ -1563,26 +1521,29 @@ static void collapse_and_free_pmd(struct mm_struct *mm, struct vm_area_struct *v
 int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
 			    bool install_pmd)
 {
+	struct mmu_notifier_range range;
+	bool notified = false;
 	unsigned long haddr = addr & HPAGE_PMD_MASK;
 	struct vm_area_struct *vma = vma_lookup(mm, haddr);
 	struct page *hpage;
 	pte_t *start_pte, *pte;
-	pmd_t *pmd;
-	spinlock_t *ptl;
-	int count = 0, result = SCAN_FAIL;
+	pmd_t *pmd, pgt_pmd;
+	spinlock_t *pml, *ptl;
+	int nr_ptes = 0, result = SCAN_FAIL;
 	int i;
 
-	mmap_assert_write_locked(mm);
+	mmap_assert_locked(mm);
+
+	/* First check VMA found, in case page tables are being torn down */
+	if (!vma || !vma->vm_file ||
+	    !range_in_vma(vma, haddr, haddr + HPAGE_PMD_SIZE))
+		return SCAN_VMA_CHECK;
 
 	/* Fast check before locking page if already PMD-mapped */
 	result = find_pmd_or_thp_or_none(mm, haddr, &pmd);
 	if (result == SCAN_PMD_MAPPED)
 		return result;
 
-	if (!vma || !vma->vm_file ||
-	    !range_in_vma(vma, haddr, haddr + HPAGE_PMD_SIZE))
-		return SCAN_VMA_CHECK;
-
 	/*
 	 * If we are here, we've succeeded in replacing all the native pages
 	 * in the page cache with a single hugepage. If a mm were to fault-in
@@ -1612,6 +1573,7 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
 		goto drop_hpage;
 	}
 
+	result = find_pmd_or_thp_or_none(mm, haddr, &pmd);
 	switch (result) {
 	case SCAN_SUCCEED:
 		break;
@@ -1625,27 +1587,10 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
 		goto drop_hpage;
 	}
 
-	/* Lock the vma before taking i_mmap and page table locks */
-	vma_start_write(vma);
-
-	/*
-	 * We need to lock the mapping so that from here on, only GUP-fast and
-	 * hardware page walks can access the parts of the page tables that
-	 * we're operating on.
-	 * See collapse_and_free_pmd().
-	 */
-	i_mmap_lock_write(vma->vm_file->f_mapping);
-
-	/*
-	 * This spinlock should be unnecessary: Nobody else should be accessing
-	 * the page tables under spinlock protection here, only
-	 * lockless_pages_from_mm() and the hardware page walker can access page
-	 * tables while all the high-level locks are held in write mode.
-	 */
 	result = SCAN_FAIL;
 	start_pte = pte_offset_map_lock(mm, pmd, haddr, &ptl);
-	if (!start_pte)
-		goto drop_immap;
+	if (!start_pte)		/* mmap_lock + page lock should prevent this */
+		goto drop_hpage;
 
 	/* step 1: check all mapped PTEs are to the right huge page */
 	for (i = 0, addr = haddr, pte = start_pte;
@@ -1671,57 +1616,94 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
 		 */
 		if (hpage + i != page)
 			goto abort;
-		count++;
 	}
 
-	/* step 2: adjust rmap */
+	pte_unmap_unlock(start_pte, ptl);
+	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm,
+				haddr, haddr + HPAGE_PMD_SIZE);
+	mmu_notifier_invalidate_range_start(&range);
+	notified = true;
+	start_pte = pte_offset_map_lock(mm, pmd, haddr, &ptl);
+	if (!start_pte)		/* mmap_lock + page lock should prevent this */
+		goto abort;
+
+	/* step 2: clear page table and adjust rmap */
 	for (i = 0, addr = haddr, pte = start_pte;
 	     i < HPAGE_PMD_NR; i++, addr += PAGE_SIZE, pte++) {
 		struct page *page;
 
 		if (pte_none(*pte))
 			continue;
-		page = vm_normal_page(vma, addr, *pte);
-		if (WARN_ON_ONCE(page && is_zone_device_page(page)))
+		/*
+		 * We dropped ptl after the first scan, to do the mmu_notifier:
+		 * page lock stops more PTEs of the hpage being faulted in, but
+		 * does not stop write faults COWing anon copies from existing
+		 * PTEs; and does not stop those being swapped out or migrated.
+		 */
+		if (!pte_present(*pte)) {
+			result = SCAN_PTE_NON_PRESENT;
 			goto abort;
+		}
+		page = vm_normal_page(vma, addr, *pte);
+		if (hpage + i != page)
+			goto abort;
+
+		/*
+		 * Must clear entry, or a racing truncate may re-remove it.
+		 * TLB flush can be left until pmdp_collapse_flush() does it.
+		 * PTE dirty? Shmem page is already dirty; file is read-only.
+		 */
+		pte_clear(mm, addr, pte);
 		page_remove_rmap(page, vma, false);
+		nr_ptes++;
 	}
 
 	pte_unmap_unlock(start_pte, ptl);
 
 	/* step 3: set proper refcount and mm_counters. */
-	if (count) {
-		page_ref_sub(hpage, count);
-		add_mm_counter(vma->vm_mm, mm_counter_file(hpage), -count);
+	if (nr_ptes) {
+		page_ref_sub(hpage, nr_ptes);
+		add_mm_counter(mm, mm_counter_file(hpage), -nr_ptes);
 	}
 
-	/* step 4: remove pte entries */
-	/* we make no change to anon, but protect concurrent anon page lookup */
-	if (vma->anon_vma)
-		anon_vma_lock_write(vma->anon_vma);
+	/* step 4: remove page table */
 
-	collapse_and_free_pmd(mm, vma, haddr, pmd);
+	/* Huge page lock is still held, so page table must remain empty */
+	pml = pmd_lock(mm, pmd);
+	if (ptl != pml)
+		spin_lock_nested(ptl, SINGLE_DEPTH_NESTING);
+	pgt_pmd = pmdp_collapse_flush(vma, haddr, pmd);
+	pmdp_get_lockless_sync();
+	if (ptl != pml)
+		spin_unlock(ptl);
+	spin_unlock(pml);
 
-	if (vma->anon_vma)
-		anon_vma_unlock_write(vma->anon_vma);
-	i_mmap_unlock_write(vma->vm_file->f_mapping);
+	mmu_notifier_invalidate_range_end(&range);
+
+	mm_dec_nr_ptes(mm);
+	page_table_check_pte_clear_range(mm, haddr, pgt_pmd);
+	pte_free_defer(mm, pmd_pgtable(pgt_pmd));
 
 maybe_install_pmd:
 	/* step 5: install pmd entry */
 	result = install_pmd
 			? set_huge_pmd(vma, haddr, pmd, hpage)
 			: SCAN_SUCCEED;
-
+	goto drop_hpage;
+abort:
+	if (nr_ptes) {
+		flush_tlb_mm(mm);
+		page_ref_sub(hpage, nr_ptes);
+		add_mm_counter(mm, mm_counter_file(hpage), -nr_ptes);
+	}
+	if (start_pte)
+		pte_unmap_unlock(start_pte, ptl);
+	if (notified)
+		mmu_notifier_invalidate_range_end(&range);
 drop_hpage:
 	unlock_page(hpage);
 	put_page(hpage);
 	return result;
-
-abort:
-	pte_unmap_unlock(start_pte, ptl);
-drop_immap:
-	i_mmap_unlock_write(vma->vm_file->f_mapping);
-	goto drop_hpage;
 }
 
 static void khugepaged_collapse_pte_mapped_thps(struct khugepaged_mm_slot *mm_slot)
@@ -2857,9 +2839,9 @@ int madvise_collapse(struct vm_area_struct *vma, struct vm_area_struct **prev,
 		case SCAN_PTE_MAPPED_HUGEPAGE:
 			BUG_ON(mmap_locked);
 			BUG_ON(*prev);
-			mmap_write_lock(mm);
+			mmap_read_lock(mm);
 			result = collapse_pte_mapped_thp(mm, addr, true);
-			mmap_write_unlock(mm);
+			mmap_locked = true;
 			goto handle_result;
 		/* Whitelisted set of results where continuing OK */
 		case SCAN_PMD_NULL:
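With the write lock gone, the caller-side pattern is now just the
following (a hypothetical minimal call site; madvise_collapse() above
instead keeps the read lock and releases it later on its handle_result
path):

	mmap_read_lock(mm);
	result = collapse_pte_mapped_thp(mm, addr, /* install_pmd */ true);
	mmap_read_unlock(mm);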
From patchwork Tue Jun 20 07:58:07 2023
Date: Tue, 20 Jun 2023 00:58:07 -0700 (PDT)
From: Hugh Dickins
To: Andrew Morton
Subject: [PATCH v2 11/12] mm/khugepaged: delete
 khugepaged_collapse_pte_mapped_thps()
In-Reply-To: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com>
Message-ID: <90cd6860-eb92-db66-9a8-5fa7b494a10@google.com>
References: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com>

Now that retract_page_tables() can retract page tables reliably,
without depending on trylocks, delete all the apparatus for khugepaged
to try again later: khugepaged_collapse_pte_mapped_thps() etc.; and
free up the per-mm memory which was set aside for that in the
khugepaged_mm_slot.

But one part of that is worth keeping: when hpage_collapse_scan_file()
found SCAN_PTE_MAPPED_HUGEPAGE, that address was noted in the mm_slot
to be tried for retraction later - catching, for example, page tables
where a reversible mprotect() of a portion had required splitting the
pmd, but now it can be recollapsed.  Call collapse_pte_mapped_thp()
directly in this case (why was it deferred before?  I assume an issue
with needing mmap_lock for write, but now it's only needed for read).
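The per-mm saving is easiest to see in the struct itself (condensed
from the hunk below):

	/* before: up to eight deferred addresses kept per mm */
	struct khugepaged_mm_slot {
		struct mm_slot slot;
		int nr_pte_mapped_thp;
		unsigned long pte_mapped_thp[MAX_PTE_MAPPED_THP];
	};

	/* after: only the hash-lookup slot remains */
	struct khugepaged_mm_slot {
		struct mm_slot slot;
	};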
Signed-off-by: Hugh Dickins
---
 mm/khugepaged.c | 125 +++++++-----------------------------------------
 1 file changed, 16 insertions(+), 109 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 060ac8789a1e..06c659e6a89e 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -92,8 +92,6 @@ static __read_mostly DEFINE_HASHTABLE(mm_slots_hash, MM_SLOTS_HASH_BITS);
 
 static struct kmem_cache *mm_slot_cache __read_mostly;
 
-#define MAX_PTE_MAPPED_THP 8
-
 struct collapse_control {
 	bool is_khugepaged;
 
@@ -107,15 +105,9 @@ struct collapse_control {
 /**
  * struct khugepaged_mm_slot - khugepaged information per mm that is being scanned
  * @slot: hash lookup from mm to mm_slot
- * @nr_pte_mapped_thp: number of pte mapped THP
- * @pte_mapped_thp: address array corresponding pte mapped THP
  */
 struct khugepaged_mm_slot {
 	struct mm_slot slot;
-
-	/* pte-mapped THP in this mm */
-	int nr_pte_mapped_thp;
-	unsigned long pte_mapped_thp[MAX_PTE_MAPPED_THP];
 };
 
 /**
@@ -1441,50 +1433,6 @@ static void collect_mm_slot(struct khugepaged_mm_slot *mm_slot)
 }
 
 #ifdef CONFIG_SHMEM
-/*
- * Notify khugepaged that given addr of the mm is pte-mapped THP. Then
- * khugepaged should try to collapse the page table.
- *
- * Note that following race exists:
- * (1) khugepaged calls khugepaged_collapse_pte_mapped_thps() for mm_struct A,
- *     emptying the A's ->pte_mapped_thp[] array.
- * (2) MADV_COLLAPSE collapses some file extent with target mm_struct B, and
- *     retract_page_tables() finds a VMA in mm_struct A mapping the same extent
- *     (at virtual address X) and adds an entry (for X) into mm_struct A's
- *     ->pte-mapped_thp[] array.
- * (3) khugepaged calls khugepaged_collapse_scan_file() for mm_struct A at X,
- *     sees a pte-mapped THP (SCAN_PTE_MAPPED_HUGEPAGE) and adds an entry
- *     (for X) into mm_struct A's ->pte-mapped_thp[] array.
- * Thus, it's possible the same address is added multiple times for the same
- * mm_struct.  Should this happen, we'll simply attempt
- * collapse_pte_mapped_thp() multiple times for the same address, under the same
- * exclusive mmap_lock, and assuming the first call is successful, subsequent
- * attempts will return quickly (without grabbing any additional locks) when
- * a huge pmd is found in find_pmd_or_thp_or_none().  Since this is a cheap
- * check, and since this is a rare occurrence, the cost of preventing this
- * "multiple-add" is thought to be more expensive than just handling it, should
- * it occur.
- */
-static bool khugepaged_add_pte_mapped_thp(struct mm_struct *mm,
-					  unsigned long addr)
-{
-	struct khugepaged_mm_slot *mm_slot;
-	struct mm_slot *slot;
-	bool ret = false;
-
-	VM_BUG_ON(addr & ~HPAGE_PMD_MASK);
-
-	spin_lock(&khugepaged_mm_lock);
-	slot = mm_slot_lookup(mm_slots_hash, mm);
-	mm_slot = mm_slot_entry(slot, struct khugepaged_mm_slot, slot);
-	if (likely(mm_slot && mm_slot->nr_pte_mapped_thp < MAX_PTE_MAPPED_THP)) {
-		mm_slot->pte_mapped_thp[mm_slot->nr_pte_mapped_thp++] = addr;
-		ret = true;
-	}
-	spin_unlock(&khugepaged_mm_lock);
-	return ret;
-}
-
 /* hpage must be locked, and mmap_lock must be held */
 static int set_huge_pmd(struct vm_area_struct *vma, unsigned long addr,
 			pmd_t *pmdp, struct page *hpage)
@@ -1706,29 +1654,6 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
 	return result;
 }
 
-static void khugepaged_collapse_pte_mapped_thps(struct khugepaged_mm_slot *mm_slot)
-{
-	struct mm_slot *slot = &mm_slot->slot;
-	struct mm_struct *mm = slot->mm;
-	int i;
-
-	if (likely(mm_slot->nr_pte_mapped_thp == 0))
-		return;
-
-	if (!mmap_write_trylock(mm))
-		return;
-
-	if (unlikely(hpage_collapse_test_exit(mm)))
-		goto out;
-
-	for (i = 0; i < mm_slot->nr_pte_mapped_thp; i++)
-		collapse_pte_mapped_thp(mm, mm_slot->pte_mapped_thp[i], false);
-
-out:
-	mm_slot->nr_pte_mapped_thp = 0;
-	mmap_write_unlock(mm);
-}
-
 static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
 {
 	struct vm_area_struct *vma;
@@ -2372,16 +2297,6 @@ static int hpage_collapse_scan_file(struct mm_struct *mm, unsigned long addr,
 {
 	BUILD_BUG();
 }
-
-static void khugepaged_collapse_pte_mapped_thps(struct khugepaged_mm_slot *mm_slot)
-{
-}
-
-static bool khugepaged_add_pte_mapped_thp(struct mm_struct *mm,
-					  unsigned long addr)
-{
-	return false;
-}
 #endif
 
 static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
@@ -2411,7 +2326,6 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
 		khugepaged_scan.mm_slot = mm_slot;
 	}
 	spin_unlock(&khugepaged_mm_lock);
-	khugepaged_collapse_pte_mapped_thps(mm_slot);
 
 	mm = slot->mm;
 	/*
@@ -2464,36 +2378,29 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
 						khugepaged_scan.address);
 
 				mmap_read_unlock(mm);
-				*result = hpage_collapse_scan_file(mm,
-								   khugepaged_scan.address,
-								   file, pgoff, cc);
 				mmap_locked = false;
+				*result = hpage_collapse_scan_file(mm,
+					khugepaged_scan.address, file, pgoff, cc);
+				if (*result == SCAN_PTE_MAPPED_HUGEPAGE) {
+					mmap_read_lock(mm);
+					mmap_locked = true;
+					if (hpage_collapse_test_exit(mm)) {
+						fput(file);
+						goto breakouterloop;
+					}
+					*result = collapse_pte_mapped_thp(mm,
+						khugepaged_scan.address, false);
+					if (*result == SCAN_PMD_MAPPED)
+						*result = SCAN_SUCCEED;
+				}
 				fput(file);
 			} else {
 				*result = hpage_collapse_scan_pmd(mm, vma,
-								  khugepaged_scan.address,
-								  &mmap_locked,
-								  cc);
+					khugepaged_scan.address, &mmap_locked, cc);
 			}
-			switch (*result) {
-			case SCAN_PTE_MAPPED_HUGEPAGE: {
-				pmd_t *pmd;
-
-				*result = find_pmd_or_thp_or_none(mm,
-								  khugepaged_scan.address,
-								  &pmd);
-				if (*result != SCAN_SUCCEED)
-					break;
-				if (!khugepaged_add_pte_mapped_thp(mm,
-						khugepaged_scan.address))
-					break;
-			}
-			fallthrough;
-			case SCAN_SUCCEED:
+
+			if (*result == SCAN_SUCCEED)
 				++khugepaged_pages_collapsed;
-				break;
-			default:
-				break;
-			}
 
 			/* move to next address */
 			khugepaged_scan.address += HPAGE_PMD_SIZE;
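In short, the scan-file path now handles SCAN_PTE_MAPPED_HUGEPAGE
inline instead of deferring it (a condensed restatement of the hunk
above, with the exit and fput() handling trimmed):

	*result = hpage_collapse_scan_file(mm, addr, file, pgoff, cc);
	if (*result == SCAN_PTE_MAPPED_HUGEPAGE) {
		mmap_read_lock(mm);	/* read lock now suffices */
		mmap_locked = true;
		if (!hpage_collapse_test_exit(mm))
			*result = collapse_pte_mapped_thp(mm, addr, false);
		if (*result == SCAN_PMD_MAPPED)	/* raced, already done */
			*result = SCAN_SUCCEED;
	}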
From patchwork Tue Jun 20 07:59:48 2023
Date: Tue, 20 Jun 2023 00:59:48 -0700 (PDT)
From: Hugh Dickins
To: Andrew Morton
Shutemov" , Matthew Wilcox , David Hildenbrand , Suren Baghdasaryan , Qi Zheng , Yang Shi , Mel Gorman , Peter Xu , Peter Zijlstra , Will Deacon , Yu Zhao , Alistair Popple , Ralph Campbell , Ira Weiny , Steven Price , SeongJae Park , Lorenzo Stoakes , Huang Ying , Naoya Horiguchi , Christophe Leroy , Zack Rusin , Jason Gunthorpe , Axel Rasmussen , Anshuman Khandual , Pasha Tatashin , Miaohe Lin , Minchan Kim , Christoph Hellwig , Song Liu , Thomas Hellstrom , Russell King , "David S. Miller" , Michael Ellerman , "Aneesh Kumar K.V" , Heiko Carstens , Christian Borntraeger , Claudio Imbrenda , Alexander Gordeev , Jann Horn , Vishal Moola , Vlastimil Babka , linux-arm-kernel@lists.infradead.org, sparclinux@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH v2 12/12] mm: delete mmap_write_trylock() and vma_try_start_write() In-Reply-To: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com> Message-ID: <27505a8-e717-61ce-ab70-5f79d9bf646b@google.com> References: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com> MIME-Version: 1.0 X-Spam-Status: No, score=-17.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF, ENV_AND_HDR_SPF_MATCH,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS, T_SCC_BODY_TEXT_LINE,USER_IN_DEF_DKIM_WL,USER_IN_DEF_SPF_WL autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1769210845479469556?= X-GMAIL-MSGID: =?utf-8?q?1769210845479469556?= mmap_write_trylock() and vma_try_start_write() were added just for khugepaged, but now it has no use for them: delete. 
Signed-off-by: Hugh Dickins
---
 include/linux/mm.h        | 17 -----------------
 include/linux/mmap_lock.h | 10 ----------
 2 files changed, 27 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 3c2e56980853..9b24f8fbf899 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -690,21 +690,6 @@ static inline void vma_start_write(struct vm_area_struct *vma)
 	up_write(&vma->vm_lock->lock);
 }
 
-static inline bool vma_try_start_write(struct vm_area_struct *vma)
-{
-	int mm_lock_seq;
-
-	if (__is_vma_write_locked(vma, &mm_lock_seq))
-		return true;
-
-	if (!down_write_trylock(&vma->vm_lock->lock))
-		return false;
-
-	vma->vm_lock_seq = mm_lock_seq;
-	up_write(&vma->vm_lock->lock);
-	return true;
-}
-
 static inline void vma_assert_write_locked(struct vm_area_struct *vma)
 {
 	int mm_lock_seq;
@@ -730,8 +715,6 @@ static inline bool vma_start_read(struct vm_area_struct *vma)
 		{ return false; }
 static inline void vma_end_read(struct vm_area_struct *vma) {}
 static inline void vma_start_write(struct vm_area_struct *vma) {}
-static inline bool vma_try_start_write(struct vm_area_struct *vma)
-		{ return true; }
 static inline void vma_assert_write_locked(struct vm_area_struct *vma) {}
 static inline void vma_mark_detached(struct vm_area_struct *vma,
 				     bool detached) {}
diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
index aab8f1b28d26..d1191f02c7fa 100644
--- a/include/linux/mmap_lock.h
+++ b/include/linux/mmap_lock.h
@@ -112,16 +112,6 @@ static inline int mmap_write_lock_killable(struct mm_struct *mm)
 	return ret;
 }
 
-static inline bool mmap_write_trylock(struct mm_struct *mm)
-{
-	bool ret;
-
-	__mmap_lock_trace_start_locking(mm, true);
-	ret = down_write_trylock(&mm->mmap_lock) != 0;
-	__mmap_lock_trace_acquire_returned(mm, true, ret);
-	return ret;
-}
-
 static inline void mmap_write_unlock(struct mm_struct *mm)
 {
 	__mmap_lock_trace_released(mm, true);