From patchwork Wed Oct 18 20:46:22 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Sean Christopherson
X-Patchwork-Id: 155144
Reply-To: Sean Christopherson
Date: Wed, 18 Oct 2023 13:46:22 -0700
In-Reply-To: <20231018204624.1905300-1-seanjc@google.com>
References: <20231018204624.1905300-1-seanjc@google.com>
X-Mailer: git-send-email 2.42.0.655.g421f12c284-goog
Message-ID: <20231018204624.1905300-2-seanjc@google.com>
Subject: [PATCH 1/3] KVM: Set file_operations.owner appropriately for all such structures
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Al Viro, David Matlack

Set .owner for all KVM-owned file types so that the KVM module is pinned
until any files with callbacks back into KVM are completely freed.

Using "struct kvm" as a proxy for the module, i.e. keeping KVM-the-module
alive while there are active VMs, doesn't provide full protection.

Userspace can invoke delete_module() the instant the last reference to KVM
is put.  If KVM itself puts the last reference, e.g. via kvm_destroy_vm(),
then it's possible for KVM to be preempted and deleted/unloaded before KVM
fully exits, e.g. when the task running kvm_destroy_vm() is scheduled back
in, it will jump to a code page that is no longer mapped.

Note, file types that can call into sub-module code, e.g. kvm-intel.ko or
kvm-amd.ko on x86, must use the module pointer passed to kvm_init(), not
THIS_MODULE (which points at kvm.ko).  KVM assumes that if /dev/kvm is
reachable, e.g. VMs are active, then the vendor module is loaded.

To reduce the probability of forgetting to set .owner entirely, use
THIS_MODULE for stats files where KVM does not call back into vendor code.

This reverts commit 70375c2d8fa3fb9b0b59207a9c5df1e2e1205c10, and fixes
several other file types that have been buggy since their introduction.
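
Editorial illustration (not part of the patch): the pinning argument above
can be sketched as a small userspace model.  Everything named toy_* is
hypothetical and only mimics the idea that the file-handling core takes a
reference on .owner when a file is created and drops it after ->release()
has run; it is not the kernel's actual implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for kernel objects; every name here is hypothetical. */
struct toy_module {
	int refcnt;	/* number of references pinning the module */
};

struct toy_file;

struct toy_fops {
	struct toy_module *owner;		/* NULL models a missing .owner */
	void (*release)(struct toy_file *file);	/* callback into module code */
};

struct toy_file {
	const struct toy_fops *fops;
};

/* Unloading is permitted only while nothing pins the module. */
int toy_can_unload(const struct toy_module *mod)
{
	return mod->refcnt == 0;
}

/* Models file creation: take a module reference if .owner is set. */
void toy_open(struct toy_file *file, const struct toy_fops *fops)
{
	file->fops = fops;
	if (fops->owner)
		fops->owner->refcnt++;
}

/* Models the final fput(): run ->release() first, then drop the pin. */
void toy_final_fput(struct toy_file *file)
{
	const struct toy_fops *fops = file->fops;

	if (fops->release)
		fops->release(file);
	if (fops->owner) {
		assert(fops->owner->refcnt > 0);
		fops->owner->refcnt--;
	}
}

/* Returns 1 if the module could be unloaded while a file is still open. */
int toy_unloadable_while_open(int set_owner)
{
	struct toy_module mod = { .refcnt = 0 };
	struct toy_fops fops = { .owner = set_owner ? &mod : NULL };
	struct toy_file file;
	int unloadable;

	toy_open(&file, &fops);
	unloadable = toy_can_unload(&mod);
	toy_final_fput(&file);
	return unloadable;
}
```

In this model, calling toy_unloadable_while_open(0) corresponds to a
file_operations structure without .owner, where nothing stops the module
from going away under an open file, and toy_unloadable_while_open(1)
corresponds to the structures fixed by this patch.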

Fixes: 70375c2d8fa3 ("Revert "KVM: set owner of cpu and vm file operations"")
Fixes: 3bcd0662d66f ("KVM: X86: Introduce mmu_rmaps_stat per-vm debugfs file")
Reported-by: Al Viro
Link: https://lore.kernel.org/all/20231010003746.GN800259@ZenIV
Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/debugfs.c |  1 +
 virt/kvm/kvm_main.c    | 11 ++++++++---
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/debugfs.c b/arch/x86/kvm/debugfs.c
index ee8c4c3496ed..eea6ea7f14af 100644
--- a/arch/x86/kvm/debugfs.c
+++ b/arch/x86/kvm/debugfs.c
@@ -182,6 +182,7 @@ static int kvm_mmu_rmaps_stat_release(struct inode *inode, struct file *file)
 }
 
 static const struct file_operations mmu_rmaps_stat_fops = {
+	.owner = THIS_MODULE,
 	.open = kvm_mmu_rmaps_stat_open,
 	.read = seq_read,
 	.llseek = seq_lseek,
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 486800a7024b..1e65a506985f 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3887,7 +3887,7 @@ static int kvm_vcpu_release(struct inode *inode, struct file *filp)
 	return 0;
 }
 
-static const struct file_operations kvm_vcpu_fops = {
+static struct file_operations kvm_vcpu_fops = {
 	.release = kvm_vcpu_release,
 	.unlocked_ioctl = kvm_vcpu_ioctl,
 	.mmap = kvm_vcpu_mmap,
@@ -4081,6 +4081,7 @@ static int kvm_vcpu_stats_release(struct inode *inode, struct file *file)
 }
 
 static const struct file_operations kvm_vcpu_stats_fops = {
+	.owner = THIS_MODULE,
 	.read = kvm_vcpu_stats_read,
 	.release = kvm_vcpu_stats_release,
 	.llseek = noop_llseek,
@@ -4431,7 +4432,7 @@ static int kvm_device_release(struct inode *inode, struct file *filp)
 	return 0;
 }
 
-static const struct file_operations kvm_device_fops = {
+static struct file_operations kvm_device_fops = {
 	.unlocked_ioctl = kvm_device_ioctl,
 	.release = kvm_device_release,
 	KVM_COMPAT(kvm_device_ioctl),
@@ -4759,6 +4760,7 @@ static int kvm_vm_stats_release(struct inode *inode, struct file *file)
 }
 
 static const struct file_operations kvm_vm_stats_fops = {
+	.owner = THIS_MODULE,
 	.read = kvm_vm_stats_read,
 	.release = kvm_vm_stats_release,
 	.llseek = noop_llseek,
@@ -5060,7 +5062,7 @@ static long kvm_vm_compat_ioctl(struct file *filp,
 }
 #endif
 
-static const struct file_operations kvm_vm_fops = {
+static struct file_operations kvm_vm_fops = {
 	.release = kvm_vm_release,
 	.unlocked_ioctl = kvm_vm_ioctl,
 	.llseek = noop_llseek,
@@ -6095,6 +6097,9 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module)
 		goto err_async_pf;
 
 	kvm_chardev_ops.owner = module;
+	kvm_vm_fops.owner = module;
+	kvm_vcpu_fops.owner = module;
+	kvm_device_fops.owner = module;
 
 	kvm_preempt_ops.sched_in = kvm_sched_in;
 	kvm_preempt_ops.sched_out = kvm_sched_out;

From patchwork Wed Oct 18 20:46:23 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Sean Christopherson
X-Patchwork-Id: 155145
Reply-To: Sean Christopherson
Date: Wed, 18 Oct 2023 13:46:23 -0700
In-Reply-To: <20231018204624.1905300-1-seanjc@google.com>
References: <20231018204624.1905300-1-seanjc@google.com>
X-Mailer: git-send-email 2.42.0.655.g421f12c284-goog
Message-ID: <20231018204624.1905300-3-seanjc@google.com>
Subject: [PATCH 2/3] KVM: Always flush async #PF workqueue when vCPU is being destroyed
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Al Viro, David Matlack

Always flush the per-vCPU async #PF workqueue when a vCPU is clearing its
completion queue, i.e. when a VM and all its vCPUs are being destroyed.
KVM must ensure that none of its workqueue callbacks is running when the
last reference to the KVM _module_ is put.  Gifting a reference to the
associated VM prevents the workqueue callback from dereferencing freed
vCPU/VM memory, but does not prevent the KVM module from being unloaded
before the callback completes.

Drop the misguided VM refcount gifting, as calling kvm_put_kvm() from
async_pf_execute() if kvm_put_kvm() flushes the async #PF workqueue will
result in deadlock.  async_pf_execute() can't return until kvm_put_kvm()
finishes, and kvm_put_kvm() can't return until async_pf_execute() finishes:

 WARNING: CPU: 8 PID: 251 at virt/kvm/kvm_main.c:1435 kvm_put_kvm+0x2d/0x320 [kvm]
 Modules linked in: vhost_net vhost vhost_iotlb tap kvm_intel kvm irqbypass
 CPU: 8 PID: 251 Comm: kworker/8:1 Tainted: G W 6.6.0-rc1-e7af8d17224a-x86/gmem-vm #119
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
 Workqueue: events async_pf_execute [kvm]
 RIP: 0010:kvm_put_kvm+0x2d/0x320 [kvm]
 Call Trace:
  async_pf_execute+0x198/0x260 [kvm]
  process_one_work+0x145/0x2d0
  worker_thread+0x27e/0x3a0
  kthread+0xba/0xe0
  ret_from_fork+0x2d/0x50
  ret_from_fork_asm+0x11/0x20
 ---[ end trace 0000000000000000 ]---
 INFO: task kworker/8:1:251 blocked for more than 120 seconds.
       Tainted: G W 6.6.0-rc1-e7af8d17224a-x86/gmem-vm #119
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 task:kworker/8:1 state:D stack:0 pid:251 ppid:2 flags:0x00004000
 Workqueue: events async_pf_execute [kvm]
 Call Trace:
  __schedule+0x33f/0xa40
  schedule+0x53/0xc0
  schedule_timeout+0x12a/0x140
  __wait_for_common+0x8d/0x1d0
  __flush_work.isra.0+0x19f/0x2c0
  kvm_clear_async_pf_completion_queue+0x129/0x190 [kvm]
  kvm_arch_destroy_vm+0x78/0x1b0 [kvm]
  kvm_put_kvm+0x1c1/0x320 [kvm]
  async_pf_execute+0x198/0x260 [kvm]
  process_one_work+0x145/0x2d0
  worker_thread+0x27e/0x3a0
  kthread+0xba/0xe0
  ret_from_fork+0x2d/0x50
  ret_from_fork_asm+0x11/0x20

If kvm_clear_async_pf_completion_queue() actually flushes the workqueue,
then there's no need to gift async_pf_execute() a reference because all
invocations of async_pf_execute() will be forced to complete before the
vCPU and its VM are destroyed/freed.
And that in turn fixes the module unloading bug as __fput() won't do
module_put() on the last vCPU reference until the vCPU has been freed,
e.g. if closing the vCPU file also puts the last reference to the KVM
module.

Note, commit 5f6de5cbebee ("KVM: Prevent module exit until all VMs are
freed") *tried* to fix the module refcounting issue by having VMs grab a
reference to the module, but that only made the bug slightly harder to hit
as it gave async_pf_execute() a bit more time to complete before the KVM
module could be unloaded.

Fixes: af585b921e5d ("KVM: Halt vcpu if page it tries to access is swapped out")
Cc: stable@vger.kernel.org
Cc: David Matlack
Signed-off-by: Sean Christopherson
Reviewed-by: David Matlack
---
 virt/kvm/async_pf.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
index e033c79d528e..7aeb9d1f43b1 100644
--- a/virt/kvm/async_pf.c
+++ b/virt/kvm/async_pf.c
@@ -87,7 +87,6 @@ static void async_pf_execute(struct work_struct *work)
 	__kvm_vcpu_wake_up(vcpu);
 
 	mmput(mm);
-	kvm_put_kvm(vcpu->kvm);
 }
 
 void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu)
@@ -114,7 +113,6 @@ void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu)
 #else
 		if (cancel_work_sync(&work->work)) {
 			mmput(work->mm);
-			kvm_put_kvm(vcpu->kvm); /* == work->vcpu->kvm */
 			kmem_cache_free(async_pf_cache, work);
 		}
 #endif
@@ -126,7 +124,19 @@ void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu)
 			list_first_entry(&vcpu->async_pf.done,
 					 typeof(*work), link);
 		list_del(&work->link);
+
+		spin_unlock(&vcpu->async_pf.lock);
+
+		/*
+		 * The async #PF is "done", but KVM must wait for the work item
+		 * itself, i.e. async_pf_execute(), to run to completion.  If
+		 * KVM is a module, KVM must ensure *no* code owned by the KVM
+		 * (the module) can be run after the last call to module_put(),
+		 * i.e. after the last reference to the last vCPU's file is put.
+		 */
+		flush_work(&work->work);
 		kmem_cache_free(async_pf_cache, work);
+		spin_lock(&vcpu->async_pf.lock);
 	}
 	spin_unlock(&vcpu->async_pf.lock);
 
@@ -186,7 +196,6 @@ bool kvm_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 	work->arch = *arch;
 	work->mm = current->mm;
 	mmget(work->mm);
-	kvm_get_kvm(work->vcpu->kvm);
 
 	INIT_WORK(&work->work, async_pf_execute);
 

From patchwork Wed Oct 18 20:46:24 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Sean Christopherson
X-Patchwork-Id: 155146
Reply-To: Sean Christopherson
Date: Wed, 18 Oct 2023 13:46:24 -0700
In-Reply-To: <20231018204624.1905300-1-seanjc@google.com>
References: <20231018204624.1905300-1-seanjc@google.com>
X-Mailer: git-send-email 2.42.0.655.g421f12c284-goog
Message-ID: <20231018204624.1905300-4-seanjc@google.com>
Subject: [PATCH 3/3] Revert "KVM: Prevent module exit until all VMs are freed"
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Al Viro, David Matlack

Revert KVM's misguided attempt to "fix" a use-after-module-unload bug that
was actually due to failure to flush a workqueue, not a lack of module
refcounting.  Pinning the KVM module until kvm_destroy_vm() doesn't
prevent use-after-free due to the module being unloaded, as userspace can
invoke delete_module() the instant the last reference to KVM is put, i.e.
can cause all KVM code to be unmapped while KVM is actively executing said
code.

Generally speaking, the many instances of module_put(THIS_MODULE)
notwithstanding, outside of a few special paths, a module can never safely
put the last reference to itself without creating deadlock, i.e. something
external to the module *must* put the last reference.
In other words, having VMs grab a reference to the KVM module is futile,
pointless, and as evidenced by the now-reverted commit 70375c2d8fa3
("Revert "KVM: set owner of cpu and vm file operations""), actively
dangerous.

This reverts commit 405294f29faee5de8c10cb9d4a90e229c2835279 and commit
5f6de5cbebee925a612856fce6f9182bb3eee0db.

Fixes: 405294f29fae ("KVM: Unconditionally get a ref to /dev/kvm module when creating a VM")
Fixes: 5f6de5cbebee ("KVM: Prevent module exit until all VMs are freed")
Signed-off-by: Sean Christopherson
---
 virt/kvm/kvm_main.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 1e65a506985f..3b1b9e8dd70c 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -115,8 +115,6 @@ EXPORT_SYMBOL_GPL(kvm_debugfs_dir);
 
 static const struct file_operations stat_fops_per_vm;
 
-static struct file_operations kvm_chardev_ops;
-
 static long kvm_vcpu_ioctl(struct file *file, unsigned int ioctl,
 			   unsigned long arg);
 #ifdef CONFIG_KVM_COMPAT
@@ -1157,9 +1155,6 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
 	if (!kvm)
 		return ERR_PTR(-ENOMEM);
 
-	/* KVM is pinned via open("/dev/kvm"), the fd passed to this ioctl(). */
-	__module_get(kvm_chardev_ops.owner);
-
 	KVM_MMU_LOCK_INIT(kvm);
 	mmgrab(current->mm);
 	kvm->mm = current->mm;
@@ -1279,7 +1274,6 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
 out_err_no_srcu:
 	kvm_arch_free_vm(kvm);
 	mmdrop(current->mm);
-	module_put(kvm_chardev_ops.owner);
 	return ERR_PTR(r);
 }
 
@@ -1348,7 +1342,6 @@ static void kvm_destroy_vm(struct kvm *kvm)
 	preempt_notifier_dec();
 	hardware_disable_all();
 	mmdrop(mm);
-	module_put(kvm_chardev_ops.owner);
 }
 
 void kvm_get_kvm(struct kvm *kvm)
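
Editorial illustration (not part of the series): the changelog of this
revert argues that a module cannot safely put the last reference to
itself, and that something external must do it.  The userspace sketch
below models only that ordering argument.  All sim_* names are
hypothetical, and the "executing" flag is an assumption standing in for
"a task is still running code that lives in the module".

```c
#include <assert.h>

/* Hypothetical model of a module; not a kernel structure. */
struct sim_module {
	int refcnt;	/* references pinning the module */
	int executing;	/* 1 while a task runs code inside the module */
};

/* Models delete_module(): allowed as soon as the refcount hits zero. */
int sim_can_unload(const struct sim_module *mod)
{
	return mod->refcnt == 0;
}

/* Unloading while module code is still executing is the bug. */
int sim_unload_would_crash(const struct sim_module *mod)
{
	return sim_can_unload(mod) && mod->executing;
}

void sim_module_put(struct sim_module *mod)
{
	assert(mod->refcnt > 0);
	mod->refcnt--;
}

/*
 * Module code drops its own last reference and then still has to return.
 * Returns 1 if there is a window in which unloading would crash.
 */
int sim_self_put_window(void)
{
	struct sim_module mod = { .refcnt = 1, .executing = 1 };
	int window;

	sim_module_put(&mod);			/* still inside module code */
	window = sim_unload_would_crash(&mod);	/* preemption here is fatal */
	mod.executing = 0;			/* the function finally returns */
	return window;
}

/*
 * Module code returns first, then an external caller drops the reference.
 * Returns 1 if there is a window in which unloading would crash.
 */
int sim_external_put_window(void)
{
	struct sim_module mod = { .refcnt = 1, .executing = 1 };
	int window;

	mod.executing = 0;			/* module callback has returned */
	window = sim_unload_would_crash(&mod);	/* module is still pinned here */
	sim_module_put(&mod);			/* put from outside the module */
	return window || sim_unload_would_crash(&mod);
}
```

In this model sim_self_put_window() corresponds to the reverted
module_put() in kvm_destroy_vm(), and sim_external_put_window() to a
reference that is dropped by the caller only after the module's callback
has returned.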