From patchwork Tue Nov 8 08:44:16 2022
X-Patchwork-Submitter: Yan Zhao
X-Patchwork-Id: 16958
From: Yan Zhao <yan.y.zhao@intel.com>
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: pbonzini@redhat.com, seanjc@google.com, Yan Zhao <yan.y.zhao@intel.com>
Subject: [PATCH] KVM: move memslot invalidation later than possible failures
Date: Tue, 8 Nov 2022 16:44:16 +0800
Message-Id: <20221108084416.11447-1-yan.y.zhao@intel.com>
X-Mailer: git-send-email 2.17.1

For memslot delete and move, kvm_invalidate_memslot() is required before
the real changes are committed. Besides swapping in an invalid slot,
kvm_invalidate_memslot() calls kvm_arch_flush_shadow_memslot(), which on
x86 in turn calls kvm_page_track_flush_slot().
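For reference, a minimal sketch of how an external page-track user reacts
to that flush. The kvm_page_track_notifier_node structure and its
.track_flush_slot hook are the real KVM page-track API; the gvt_* helpers
and the write-protected list are hypothetical stand-ins for a kvmgt-like
user's own bookkeeping:

#include <linux/kvm_host.h>
#include <asm/kvm_page_track.h>

/* Hypothetical kvmgt-like page-track user (a sketch, not kvmgt itself). */
static void gvt_track_flush_slot(struct kvm *kvm,
				 struct kvm_memory_slot *slot,
				 struct kvm_page_track_notifier_node *node)
{
	gfn_t gfn;

	/*
	 * Drop write protection for every tracked gfn in the slot and
	 * forget it.  Nothing re-adds these entries if KVM later reverts
	 * the invalidation, which is the problem described below.
	 */
	for (gfn = slot->base_gfn; gfn < slot->base_gfn + slot->npages; gfn++)
		if (gvt_gfn_is_write_protected(gfn))		/* hypothetical */
			gvt_remove_write_protected_gfn(gfn);	/* hypothetical */
}

static struct kvm_page_track_notifier_node gvt_track_node = {
	.track_flush_slot = gvt_track_flush_slot,
};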
According to the definition of kvm_page_track_notifier_node, users may
drop write protection for the pages in the memory slot on receiving
.track_flush_slot. However, if kvm_prepare_memory_region() fails, the
subsequent kvm_activate_memslot() only swaps back the original slot,
leaving the previously dropped write protection unrestored. This may not
be a problem for KVM itself as a page-track user, but it does break other
page-track users, e.g. kvmgt, whose write-protected pages are removed
from its write-protected list and never added back.

So call kvm_prepare_memory_region() first, for metadata preparation,
before the slot invalidation, so that a failure there no longer requires
reverting the invalidation.

Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
 virt/kvm/kvm_main.c | 40 +++++++++++++++-------------------------
 1 file changed, 15 insertions(+), 25 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 25d7872b29c1..5f29011f432d 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1827,45 +1827,35 @@ static int kvm_set_memslot(struct kvm *kvm,
 	 */
 	mutex_lock(&kvm->slots_arch_lock);
 
-	/*
-	 * Invalidate the old slot if it's being deleted or moved.  This is
-	 * done prior to actually deleting/moving the memslot to allow vCPUs to
-	 * continue running by ensuring there are no mappings or shadow pages
-	 * for the memslot when it is deleted/moved.  Without pre-invalidation
-	 * (and without a lock), a window would exist between effecting the
-	 * delete/move and committing the changes in arch code where KVM or a
-	 * guest could access a non-existent memslot.
-	 *
-	 * Modifications are done on a temporary, unreachable slot.  The old
-	 * slot needs to be preserved in case a later step fails and the
-	 * invalidation needs to be reverted.
-	 */
 	if (change == KVM_MR_DELETE || change == KVM_MR_MOVE) {
 		invalid_slot = kzalloc(sizeof(*invalid_slot), GFP_KERNEL_ACCOUNT);
 		if (!invalid_slot) {
 			mutex_unlock(&kvm->slots_arch_lock);
 			return -ENOMEM;
 		}
-		kvm_invalidate_memslot(kvm, old, invalid_slot);
 	}
 
 	r = kvm_prepare_memory_region(kvm, old, new, change);
 	if (r) {
-		/*
-		 * For DELETE/MOVE, revert the above INVALID change.  No
-		 * modifications required since the original slot was preserved
-		 * in the inactive slots.  Changing the active memslots also
-		 * release slots_arch_lock.
-		 */
-		if (change == KVM_MR_DELETE || change == KVM_MR_MOVE) {
-			kvm_activate_memslot(kvm, invalid_slot, old);
+		if (change == KVM_MR_DELETE || change == KVM_MR_MOVE)
 			kfree(invalid_slot);
-		} else {
-			mutex_unlock(&kvm->slots_arch_lock);
-		}
+
+		mutex_unlock(&kvm->slots_arch_lock);
 		return r;
 	}
 
+	/*
+	 * Invalidate the old slot if it's being deleted or moved.  This is
+	 * done prior to actually deleting/moving the memslot to allow vCPUs to
+	 * continue running by ensuring there are no mappings or shadow pages
+	 * for the memslot when it is deleted/moved.  Without pre-invalidation
+	 * (and without a lock), a window would exist between effecting the
+	 * delete/move and committing the changes in arch code where KVM or a
+	 * guest could access a non-existent memslot.
+	 */
+	if (change == KVM_MR_DELETE || change == KVM_MR_MOVE)
+		kvm_invalidate_memslot(kvm, old, invalid_slot);
+
 	/*
 	 * For DELETE and MOVE, the working slot is now active as the INVALID
 	 * version of the old slot.  MOVE is particularly special as it reuses
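Taken together, the resulting DELETE/MOVE ordering can be summarized as
follows (a distilled sketch only, not the exact upstream function: the
wrapper name is hypothetical, and the commit phase, which also releases
slots_arch_lock, is elided):

#include <linux/kvm_host.h>

/* Sketch of the DELETE/MOVE ordering in kvm_set_memslot() after this patch. */
static int set_memslot_ordering_sketch(struct kvm *kvm,
				       struct kvm_memory_slot *old,
				       struct kvm_memory_slot *new,
				       enum kvm_mr_change change)
{
	struct kvm_memory_slot *invalid_slot = NULL;
	int r;

	mutex_lock(&kvm->slots_arch_lock);

	/* Step 1: plain allocation, trivially revertible on failure. */
	if (change == KVM_MR_DELETE || change == KVM_MR_MOVE) {
		invalid_slot = kzalloc(sizeof(*invalid_slot), GFP_KERNEL_ACCOUNT);
		if (!invalid_slot) {
			mutex_unlock(&kvm->slots_arch_lock);
			return -ENOMEM;
		}
	}

	/* Step 2: the only remaining failure point, before any side effects. */
	r = kvm_prepare_memory_region(kvm, old, new, change);
	if (r) {
		kfree(invalid_slot);	/* no invalidation to revert */
		mutex_unlock(&kvm->slots_arch_lock);
		return r;
	}

	/*
	 * Step 3: invalidate only once nothing can fail anymore, so the
	 * page-track flush never has to be undone.
	 */
	if (change == KVM_MR_DELETE || change == KVM_MR_MOVE)
		kvm_invalidate_memslot(kvm, old, invalid_slot);

	/* ... commit the new slot and release slots_arch_lock (elided) ... */
	return 0;
}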