From patchwork Wed Nov 9 11:30:44 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mel Gorman
X-Patchwork-Id: 17500
Date: Wed, 9 Nov 2022 11:30:44 +0000
From: Mel Gorman
To: Thomas Gleixner
Cc: "Chang S. Bae", Borislav Petkov, Mike Galbraith, LKML, Linux-RT
Subject: [RFC PATCH] x86: Drop fpregs lock before inheriting FPU permissions during clone
Message-ID: <20221109113044.7ncdw6263o3msycl@techsingularity.net>
X-Mailing-List: linux-kernel@vger.kernel.org

Mike Galbraith reported the following off-list against an old fork of
preempt-rt, but the same issue likely also applies to current preempt-rt:

  BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46
  in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1, name: systemd
  preempt_count: 1, expected: 0
  RCU nest depth: 0, expected: 0
  Preemption disabled at:
  fpu_clone+0xfa/0x480
  CPU: 6 PID: 1 Comm: systemd Tainted: G E (unreleased)
  Call Trace:
   dump_stack_lvl+0x45/0x5b
   ? fpu_clone+0xfa/0x480
   __might_resched+0x165/0x200
   rt_spin_lock+0x2d/0x70
   fpu_clone+0x32a/0x480
   ? copy_thread+0xef/0x270
   ? copy_process+0xd2c/0x1c00
   ? shmem_alloc_inode+0x16/0x30
   ? kmem_cache_alloc+0x120/0x2a0
   ? kernel_clone+0x9b/0x460
   ? __do_sys_clone+0x72/0xa0
   ? do_syscall_64+0x58/0x80
   ? __x64_sys_rt_sigprocmask+0x93/0xd0
   ? syscall_exit_to_user_mode+0x18/0x40
   ? do_syscall_64+0x67/0x80
   ? syscall_exit_to_user_mode+0x18/0x40
   ? do_syscall_64+0x67/0x80
   ? syscall_exit_to_user_mode+0x18/0x40
   ? do_syscall_64+0x67/0x80
   ? exc_page_fault+0x6a/0x190
   ? entry_SYSCALL_64_after_hwframe+0x61/0xcb

  The splat comes from fpu_inherit_perms() being called under
  fpregs_lock(), and us reaching the spin_lock_irq() therein due to
  fpu_state_size_dynamic() returning true despite static key
  __fpu_state_size_dynamic having never been enabled.

Mike's assessment looks correct. On PREEMPT_RT, fpregs_lock() only
disables preemption, so the spin_lock_irq() in fpu_inherit_perms() is
unsafe, and converting siglock to a raw spinlock would be an unwelcome
change. This problem has existed since commit 9e798e9aa14c ("x86/fpu:
Prepare fpu_clone() for dynamically enabled features").

While the bug triggering is probably a mistake for the affected machine
and due to a bug that is not in mainline, spin_lock_irq() within a
preempt_disable() section on PREEMPT_RT is problematic.

In this specific context, it may not be necessary to hold fpregs_lock at
all. The lock is necessary when editing the FPU registers or a task's
fpstate, but in this case the only write of any FP state in
fpu_inherit_perms() is for the new child, which is not running yet, so it
cannot context switch or be borrowed by a kernel thread yet. Hence,
fpregs_lock is not protecting anything in the new child until clone()
completes. The siglock still needs to be acquired by fpu_inherit_perms()
as the read of the parent's permissions has to be serialised.

This is not tested as I did not have access to a machine with Intel's
eXtended Feature Disable (XFD) feature that enables the relevant path in
fpu_inherit_perms() and the bug is against a non-mainline kernel.
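
For illustration only, the ordering problem can be modelled in userspace
C. This is a sketch, not kernel code: every model_* name below is
hypothetical, preemption is reduced to a counter, and siglock is reduced
to a check that counts each time a sleeping lock is taken while
preemption is disabled (the point at which PREEMPT_RT would print the
splat above).

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only; these are NOT the kernel's implementations. */
static int model_preempt_count;    /* stands in for preempt_count() */
static int model_sleep_violations; /* sleeping lock taken while atomic */

/* Assumption from the text: on PREEMPT_RT, fpregs_lock() only disables preemption. */
static void model_fpregs_lock(void)
{
	model_preempt_count++;
}

static void model_fpregs_unlock(void)
{
	assert(model_preempt_count > 0);
	model_preempt_count--;
}

/* On PREEMPT_RT, spin_lock_irq() on siglock becomes a sleeping rt_spin_lock(). */
static void model_siglock_lock(void)
{
	if (model_preempt_count != 0)
		model_sleep_violations++; /* the kernel would warn here */
}

static void model_siglock_unlock(void)
{
}

static void model_fpu_inherit_perms(void)
{
	model_siglock_lock();
	/* The parent's permissions would be copied to the child here. */
	model_siglock_unlock();
}

/* Runs one modelled clone and returns the number of violations seen. */
static int model_fpu_clone(bool unlock_before_inherit)
{
	model_preempt_count = 0;
	model_sleep_violations = 0;

	model_fpregs_lock();
	/* save_fpregs_to_fpstate(dst_fpu) would run here. */
	if (unlock_before_inherit) {
		/* Ordering after this patch. */
		model_fpregs_unlock();
		model_fpu_inherit_perms();
	} else {
		/* Ordering before this patch. */
		model_fpu_inherit_perms();
		model_fpregs_unlock();
	}
	return model_sleep_violations;
}
```

Calling model_fpu_clone(false) models the current ordering and reports
one violation; model_fpu_clone(true) models the ordering in this patch
and reports none.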

Reported-by: Mike Galbraith
Signed-off-by: Mel Gorman
Reviewed-by: Thomas Gleixner
---
 arch/x86/kernel/fpu/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 3b28c5b25e12..d00db56a8868 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -605,9 +605,9 @@ int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal)
 	if (test_thread_flag(TIF_NEED_FPU_LOAD))
 		fpregs_restore_userregs();
 	save_fpregs_to_fpstate(dst_fpu);
+	fpregs_unlock();
 	if (!(clone_flags & CLONE_THREAD))
 		fpu_inherit_perms(dst_fpu);
-	fpregs_unlock();
 
 	/*
 	 * Children never inherit PASID state.