From patchwork Tue Feb 27 02:35:46 2024
X-Patchwork-Submitter: Changbin Du
X-Patchwork-Id: 207009
From: Changbin Du
To: Andrew Morton, Luis Chamberlain
Cc: Changbin Du, Xiaoyi Su, Eric Chanudet, Luis Chamberlain
Subject: [PATCH v4] modules: wait do_free_init correctly
Date: Tue, 27 Feb 2024 10:35:46 +0800
Message-ID: <20240227023546.2490667-1-changbin.du@huawei.com>
X-Mailer: git-send-email 2.25.1

The synchronization here is to ensure that the freeing of a module's
init sections is ordered before W+X checking. It is worth noting that
the freeing itself was happening; the problem is that our sanity
checkers raced against the permission checkers, which assume init
memory is already gone.

Commit 1a7b7d922081 ("modules: Use vmalloc special flag") moved calling
do_free_init() into a global workqueue instead of relying on it being
called through call_rcu(..., do_free_init), which used to allow us to
call do_free_init() asynchronously after the end of a subsequent grace
period. The move to a global workqueue broke the guarantees for code
which needed to be sure that do_free_init() had completed by calling
rcu_barrier(). To fix this, callers which used to rely on rcu_barrier()
must now use flush_work(&init_free_wq) instead.

Without this fix, we could still encounter false positive reports from
the W+X check, since rcu_barrier() can no longer ensure the ordering.

Even worse, rcu_barrier() can introduce significant delay. Eric
Chanudet reported that it adds ~0.1s of delay on a PREEMPT_RT kernel:

[    0.291444] Freeing unused kernel memory: 5568K
[    0.402442] Run /sbin/init as init process

With this fix, the above delay is eliminated.

Fixes: 1a7b7d922081 ("modules: Use vmalloc special flag")
Signed-off-by: Changbin Du
Cc: Xiaoyi Su
Cc: Eric Chanudet
Cc: Luis Chamberlain
Tested-by: Eric Chanudet
Acked-by: Luis Chamberlain
---
v4:
 - polish commit msg. (Luis Chamberlain)
v3:
 - amend comment in do_init_module() and update commit msg.
v2:
 - fix compilation issue for no CONFIG_MODULES found by 0-DAY.
---
 include/linux/moduleloader.h | 8 ++++++++
 init/main.c                  | 5 +++--
 kernel/module/main.c         | 9 +++++++--
 3 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/include/linux/moduleloader.h b/include/linux/moduleloader.h
index 001b2ce83832..89b1e0ed9811 100644
--- a/include/linux/moduleloader.h
+++ b/include/linux/moduleloader.h
@@ -115,6 +115,14 @@ int module_finalize(const Elf_Ehdr *hdr,
 		    const Elf_Shdr *sechdrs,
 		    struct module *mod);
 
+#ifdef CONFIG_MODULES
+void flush_module_init_free_work(void);
+#else
+static inline void flush_module_init_free_work(void)
+{
+}
+#endif
+
 /* Any cleanup needed when module leaves. */
 void module_arch_cleanup(struct module *mod);
 
diff --git a/init/main.c b/init/main.c
index e24b0780fdff..f0b7e21ac67f 100644
--- a/init/main.c
+++ b/init/main.c
@@ -99,6 +99,7 @@
 #include <...>
 #include <...>
 #include <...>
+#include <linux/moduleloader.h>
 #include <...>
 #include <...>
 
@@ -1402,11 +1403,11 @@ static void mark_readonly(void)
 	if (rodata_enabled) {
 		/*
 		 * load_module() results in W+X mappings, which are cleaned
-		 * up with call_rcu(). Let's make sure that queued work is
+		 * up with init_free_wq. Let's make sure that queued work is
 		 * flushed so that we don't hit false positives looking for
 		 * insecure pages which are W+X.
 		 */
-		rcu_barrier();
+		flush_module_init_free_work();
 		mark_rodata_ro();
 		rodata_test();
 	} else
diff --git a/kernel/module/main.c b/kernel/module/main.c
index 36681911c05a..b0b99348e1a8 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -2489,6 +2489,11 @@ static void do_free_init(struct work_struct *w)
 	}
 }
 
+void flush_module_init_free_work(void)
+{
+	flush_work(&init_free_wq);
+}
+
 #undef MODULE_PARAM_PREFIX
 #define MODULE_PARAM_PREFIX "module."
 /* Default value for module->async_probe_requested */
@@ -2593,8 +2598,8 @@ static noinline int do_init_module(struct module *mod)
 	 * Note that module_alloc() on most architectures creates W+X page
 	 * mappings which won't be cleaned up until do_free_init() runs. Any
 	 * code such as mark_rodata_ro() which depends on those mappings to
-	 * be cleaned up needs to sync with the queued work - ie
-	 * rcu_barrier()
+	 * be cleaned up needs to sync with the queued work by invoking
+	 * flush_module_init_free_work().
 	 */
 	if (llist_add(&freeinit->node, &init_free_list))
 		schedule_work(&init_free_wq);
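
As a side note for reviewers, the ordering problem fixed above is easy
to see in miniature. Below is a minimal, self-contained sketch (not
part of this patch; the demo_* names are hypothetical) of why
flush_work() is the correct barrier for work queued with
schedule_work(), while rcu_barrier() only waits for pending call_rcu()
callbacks:

#include <linux/init.h>
#include <linux/module.h>
#include <linux/workqueue.h>
#include <linux/rcupdate.h>

static void demo_free_fn(struct work_struct *work)
{
	/* Stands in for do_free_init(): frees deferred init memory. */
	pr_info("demo: deferred free ran\n");
}

/* Stands in for init_free_wq, run on the global workqueue. */
static DECLARE_WORK(demo_free_work, demo_free_fn);

static int __init demo_init(void)
{
	schedule_work(&demo_free_work);

	/*
	 * Insufficient: rcu_barrier() waits only for call_rcu()
	 * callbacks, not for demo_free_work, so a W+X-style check
	 * placed here could still race with the queued work.
	 */
	rcu_barrier();

	/*
	 * Sufficient: flush_work() returns only once demo_free_fn()
	 * has finished executing, which is the ordering that
	 * mark_readonly() needs.
	 */
	flush_work(&demo_free_work);
	return 0;
}
module_init(demo_init);

static void __exit demo_exit(void)
{
}
module_exit(demo_exit);

MODULE_DESCRIPTION("flush_work() vs rcu_barrier() ordering sketch");
MODULE_LICENSE("GPL");

The same reasoning applies to init_free_wq: mark_readonly() needs the
work item itself to have finished, not merely an RCU grace period to
have elapsed, and that is exactly what flush_module_init_free_work()
provides.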