From patchwork Wed Sep 20 02:16:08 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Rik van Riel
X-Patchwork-Id: 14241
From: riel@surriel.com
To: linux-kernel@vger.kernel.org
Cc: kernel-team@meta.com, linux-mm@kvack.org, akpm@linux-foundation.org,
 muchun.song@linux.dev, mike.kravetz@oracle.com, leit@meta.com
Subject: [PATCH 0/2] hugetlbfs: close race between MADV_DONTNEED and page fault
Date: Tue, 19 Sep 2023 22:16:08 -0400
Message-ID: <20230920021811.3095089-1-riel@surriel.com>

Malloc libraries, like jemalloc and tcmalloc, decide when to call
madvise independently of the code in the main application.

This sometimes results in the application page faulting on an address
right after the malloc library has shot down the backing memory with
MADV_DONTNEED.

Usually this is harmless, because we always have some 4kB pages
sitting around to satisfy a page fault. However, with hugetlbfs,
systems often allocate only the exact number of huge pages that the
application wants.

Due to TLB batching, hugetlbfs MADV_DONTNEED will free pages outside of
any lock taken on the page fault path, which can open up the following
race condition:

       CPU 1                            CPU 2

       MADV_DONTNEED
       unmap page
       shoot down TLB entry
                                       page fault
                                       fail to allocate a huge page
                                       killed with SIGBUS
       free page

Fix that race by extending the hugetlb_vma_lock locking scheme to also
cover private hugetlb mappings (with resv_map), and pulling the locking
from __unmap_hugepage_final_range into helper functions called from
zap_page_range_single. This ensures page faults stay locked out of
the MADV_DONTNEED VMA until the huge pages have actually been freed.
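
For illustration only, the interleaving above can be replayed in a small
single-threaded userspace model. This is a sketch, not kernel code: the
struct, the enum and the functions page_fault() and replay_race() are
hypothetical names invented for this example, the VMA lock is reduced to a
boolean, and the pool is assumed to hold exactly the one huge page the
application uses.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model; all names here are hypothetical, not kernel symbols. */
struct model {
    int  free_huge_pages;  /* huge pages available to the fault path */
    bool mapped;           /* is the page currently mapped? */
    bool vma_locked;       /* stands in for the hugetlb VMA lock */
};

enum fault_result { FAULT_OK, FAULT_BLOCKED, FAULT_SIGBUS };

/* CPU 2: a page fault needs a free huge page unless one is mapped. */
static enum fault_result page_fault(struct model *m)
{
    if (m->vma_locked)
        return FAULT_BLOCKED;      /* locked out by MADV_DONTNEED */
    if (m->mapped)
        return FAULT_OK;
    if (m->free_huge_pages == 0)
        return FAULT_SIGBUS;       /* pool is empty, nothing to allocate */
    m->free_huge_pages--;
    m->mapped = true;
    return FAULT_OK;
}

/*
 * Replay the interleaving from the diagram. With hold_lock_until_free
 * false, the lock is dropped before the page is freed; with true, the
 * lock also covers the free.
 */
static enum fault_result replay_race(bool hold_lock_until_free)
{
    /* The only huge page in the pool is already in use. */
    struct model m = { .free_huge_pages = 0, .mapped = true,
                       .vma_locked = false };
    enum fault_result r;

    /* CPU 1: MADV_DONTNEED */
    m.vma_locked = true;
    m.mapped = false;              /* unmap page, shoot down TLB entry */
    if (!hold_lock_until_free)
        m.vma_locked = false;      /* lock dropped before the free */

    /* CPU 2: the fault arrives before the page has been freed. */
    r = page_fault(&m);

    /* CPU 1: free page, then release the lock if it is still held. */
    m.free_huge_pages++;
    assert(m.free_huge_pages == 1);
    m.vma_locked = false;

    /* A fault that was locked out retries once the lock is released. */
    if (r == FAULT_BLOCKED)
        r = page_fault(&m);
    return r;
}
```

In this model replay_race(false) ends in FAULT_SIGBUS, matching the diagram,
while replay_race(true) ends in FAULT_OK because the fault only proceeds
after the page is back in the pool.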