From patchwork Thu Feb 29 02:57:52 2024
X-Patchwork-Submitter: David Stevens
X-Patchwork-Id: 208180
From: David Stevens
To: Sean Christopherson, Paolo Bonzini
Cc: Yu Zhang, Isaku Yamahata, Zhi Wang, Maxim Levitsky,
    kvmarm@lists.linux.dev, linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Subject: [PATCH v11 1/8] KVM: Assert that a page's refcount is elevated when marking accessed/dirty
Date: Thu, 29 Feb 2024 11:57:52 +0900
Message-ID: <20240229025759.1187910-2-stevensd@google.com>
In-Reply-To: <20240229025759.1187910-1-stevensd@google.com>
References: <20240229025759.1187910-1-stevensd@google.com>

From: Sean Christopherson

Assert that a page's refcount is elevated, i.e. that _something_ holds a
reference to the page, when KVM marks a page as accessed and/or dirty.
KVM typically doesn't hold a reference to pages that are mapped into the
guest, e.g. to allow page migration, compaction, swap, etc., and instead
relies on mmu_notifiers to react to changes in the primary MMU.

Incorrect handling of mmu_notifier events (or similar mechanisms) can
result in KVM keeping a mapping beyond the lifetime of the backing page,
i.e. can (and often does) result in use-after-free.  Yelling if KVM marks
a freed page as accessed/dirty doesn't prevent badness, as KVM usually
only does A/D updates when unmapping memory from the guest, i.e. the
assertion fires well after an underlying bug has occurred.  But yelling
does help detect, triage, and debug use-after-free bugs.

Note, the assertion must use page_count(), NOT page_ref_count()!  For
hugepages, the returned struct page may be a tail page and thus not have
its own refcount.
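The page_count() vs. page_ref_count() distinction is easy to model outside
the kernel.  The following self-contained userspace sketch (hypothetical
stand-ins for the real helpers, not kernel code) shows why asserting on a
tail page's own _refcount would misfire, while page_count(), which resolves
through compound_head(), sees the pin held on the head page:

#include <assert.h>
#include <stdio.h>

/* Toy model of struct page; only the fields needed for the example. */
struct page {
        int _refcount;          /* meaningful only on the head page */
        struct page *head;      /* points to self for head/order-0 pages */
};

static int page_ref_count(struct page *p)        { return p->_refcount; }
static struct page *compound_head(struct page *p) { return p->head; }
static int page_count(struct page *p) { return page_ref_count(compound_head(p)); }

int main(void)
{
        struct page huge[2];    /* head page + one tail page */
        huge[0] = (struct page){ ._refcount = 1, .head = &huge[0] };
        huge[1] = (struct page){ ._refcount = 0, .head = &huge[0] };

        /* The tail page looks "free" to page_ref_count()... */
        assert(page_ref_count(&huge[1]) == 0);
        /* ...but page_count() resolves to the head and sees the reference. */
        assert(page_count(&huge[1]) == 1);
        printf("tail: ref=%d count=%d\n",
               page_ref_count(&huge[1]), page_count(&huge[1]));
        return 0;
}

With page_ref_count(), the new assertion would fire spuriously on every
live hugepage tail page; with page_count() it only fires for genuinely
freed pages.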
Signed-off-by: Sean Christopherson
---
 virt/kvm/kvm_main.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 10bfc88a69f7..c5e4bf7c48f9 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3204,6 +3204,19 @@ EXPORT_SYMBOL_GPL(kvm_vcpu_unmap);
 
 static bool kvm_is_ad_tracked_page(struct page *page)
 {
+        /*
+         * Assert that KVM isn't attempting to mark a freed page as Accessed or
+         * Dirty, i.e. that KVM's MMU doesn't have a use-after-free bug.  KVM
+         * (typically) doesn't pin pages that are mapped in KVM's MMU, and
+         * instead relies on mmu_notifiers to know when a mapping needs to be
+         * zapped/invalidated.  Unmapping from KVM's MMU must happen _before_
+         * KVM returns from its mmu_notifier, i.e. the page should have an
+         * elevated refcount at this point even though KVM doesn't hold a
+         * reference of its own.
+         */
+        if (WARN_ON_ONCE(!page_count(page)))
+                return false;
+
         /*
          * Per page-flags.h, pages tagged PG_reserved "should in general not be
          * touched (e.g. set dirty) except by its owner".

From patchwork Thu Feb 29 02:57:53 2024
X-Patchwork-Submitter: David Stevens
X-Patchwork-Id: 208181
From: David Stevens
To: Sean Christopherson, Paolo Bonzini
Cc: Yu Zhang, Isaku Yamahata, Zhi Wang, Maxim Levitsky,
    kvmarm@lists.linux.dev, linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
    David Stevens
Subject: [PATCH v11 2/8] KVM: Relax BUG_ON argument validation
Date: Thu, 29 Feb 2024 11:57:53 +0900
Message-ID: <20240229025759.1187910-3-stevensd@google.com>
In-Reply-To: <20240229025759.1187910-1-stevensd@google.com>
References: <20240229025759.1187910-1-stevensd@google.com>

From: David Stevens

hva_to_pfn() includes a check that KVM isn't trying to do an async page
fault in a situation where it can't sleep.  Downgrade this check from a
BUG_ON() to a WARN_ON_ONCE(), since DoS'ing the guest (at worst) is
better than bringing down the host.
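For context, the severity difference the patch leans on: BUG_ON() panics
the kernel, while WARN_ON_ONCE() prints a single warning and evaluates to
the condition so the caller can fall back to an error path.  A rough
userspace approximation (illustrative only; the real macros live in
include/asm-generic/bug.h and do considerably more, and this sketch uses a
GCC/Clang statement expression):

#include <stdio.h>
#include <stdlib.h>

#define BUG_ON(cond) do { if (cond) abort(); } while (0)   /* kills the "host" */

#define WARN_ON_ONCE(cond) ({                              \
        static int __warned;                               \
        int __c = !!(cond);                                \
        if (__c && !__warned) {                            \
                __warned = 1;                              \
                fprintf(stderr, "WARNING: %s\n", #cond);   \
        }                                                  \
        __c;                                               \
})

int main(void)
{
        int atomic = 1, async = 1;      /* the "impossible" combination */

        /* Warns on the first hit only, then keeps running... */
        if (WARN_ON_ONCE(atomic && async))
                puts("degrade: fail this page fault, guest survives");
        if (WARN_ON_ONCE(atomic && async))
                puts("second hit: no new warning, still running");

        /* ...whereas the old check would have taken down everything: */
        /* BUG_ON(atomic && async); */
        return 0;
}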
Suggested-by: Sean Christopherson
Signed-off-by: David Stevens
---
 virt/kvm/kvm_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index c5e4bf7c48f9..6f37d56fb2fc 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2979,7 +2979,7 @@ kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool interruptible,
         int npages, r;
 
         /* we can do it either atomically or asynchronously, not both */
-        BUG_ON(atomic && async);
+        WARN_ON_ONCE(atomic && async);
 
         if (hva_to_pfn_fast(addr, write_fault, writable, &pfn))
                 return pfn;

From patchwork Thu Feb 29 02:57:54 2024
X-Patchwork-Submitter: David Stevens
X-Patchwork-Id: 208188
From: David Stevens
To: Sean Christopherson, Paolo Bonzini
Cc: Yu Zhang, Isaku Yamahata, Zhi Wang, Maxim Levitsky,
    kvmarm@lists.linux.dev, linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
    David Stevens
Subject: [PATCH v11 3/8] KVM: mmu: Introduce kvm_follow_pfn()
Date: Thu, 29 Feb 2024 11:57:54 +0900
Message-ID: <20240229025759.1187910-4-stevensd@google.com>
In-Reply-To: <20240229025759.1187910-1-stevensd@google.com>
References: <20240229025759.1187910-1-stevensd@google.com>

From: David Stevens

Introduce kvm_follow_pfn(), which will replace __gfn_to_pfn_memslot().
This initial implementation is just a refactor of the existing API that
uses a single structure for passing the arguments.  The arguments are
further refactored as follows:

 - The write_fault and interruptible boolean flags, and the in-parameter
   part of async, are replaced by setting FOLL_WRITE, FOLL_INTERRUPTIBLE,
   and FOLL_NOWAIT respectively in a new flags argument.
 - The out-parameter portion of the async parameter is now a return
   value.
 - The writable in/out parameter is split into a separate
   try_map_writable in-parameter and writable out-parameter.
 - All other parameters are the same.

Upcoming changes will add the ability to get a pfn without needing to
take a ref to the underlying page.
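To make the shape of the refactor concrete, here is a compilable toy
sketch contrasting the old positional-boolean call with the new argument
struct.  All types, flag values, and the kvm_follow_pfn() body below are
stubs invented for illustration; only the field names mirror the patch:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t kvm_pfn_t;
typedef uint64_t gfn_t;
typedef uint64_t hva_t;
struct kvm_memory_slot { int id; };

#define FOLL_WRITE              0x1u    /* placeholder values */
#define FOLL_NOWAIT             0x2u
#define FOLL_INTERRUPTIBLE      0x4u

struct kvm_follow_pfn {
        const struct kvm_memory_slot *slot;     /* in */
        gfn_t gfn;                              /* in */
        unsigned int flags;                     /* in: FOLL_* lookup behavior */
        bool atomic;                            /* in: caller cannot sleep */
        bool try_map_writable;                  /* in: upgrade read faults */
        hva_t hva;                              /* out */
        bool writable;                          /* out */
};

/* Toy stand-in for the real kvm_follow_pfn(). */
static kvm_pfn_t kvm_follow_pfn(struct kvm_follow_pfn *kfp)
{
        kfp->hva = 0x7f0000000000ull + (kfp->gfn << 12);
        kfp->writable = kfp->flags & FOLL_WRITE;
        return kfp->gfn;        /* pretend identity gfn->pfn mapping */
}

int main(void)
{
        struct kvm_memory_slot slot = { .id = 0 };

        /*
         * Old shape (eight positional arguments, several of them bools):
         *   __gfn_to_pfn_memslot(slot, gfn, false, false, NULL, true,
         *                        &writable, NULL);
         * New shape: named fields make each knob self-documenting.
         */
        struct kvm_follow_pfn kfp = {
                .slot = &slot,
                .gfn = 42,
                .flags = FOLL_WRITE,
                .try_map_writable = true,
        };
        kvm_pfn_t pfn = kvm_follow_pfn(&kfp);

        printf("pfn=%llu hva=0x%llx writable=%d\n",
               (unsigned long long)pfn, (unsigned long long)kfp.hva,
               kfp.writable);
        return 0;
}

Packing the in/out state into one struct also lets later patches add
fields (as patch 4/8 does) without touching every call site again.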
Signed-off-by: David Stevens
Reviewed-by: Maxim Levitsky
---
 include/linux/kvm_host.h |  18 ++++
 virt/kvm/kvm_main.c      | 191 +++++++++++++++++++++------------------
 virt/kvm/kvm_mm.h        |   3 +-
 virt/kvm/pfncache.c      |  10 +-
 4 files changed, 131 insertions(+), 91 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 7e7fd25b09b3..290db5133c36 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -97,6 +97,7 @@
 #define KVM_PFN_ERR_HWPOISON    (KVM_PFN_ERR_MASK + 1)
 #define KVM_PFN_ERR_RO_FAULT    (KVM_PFN_ERR_MASK + 2)
 #define KVM_PFN_ERR_SIGPENDING  (KVM_PFN_ERR_MASK + 3)
+#define KVM_PFN_ERR_NEEDS_IO    (KVM_PFN_ERR_MASK + 4)
 
 /*
  * error pfns indicate that the gfn is in slot but faild to
@@ -1209,6 +1210,23 @@ unsigned long gfn_to_hva_memslot_prot(struct kvm_memory_slot *slot, gfn_t gfn,
 void kvm_release_page_clean(struct page *page);
 void kvm_release_page_dirty(struct page *page);
 
+struct kvm_follow_pfn {
+        const struct kvm_memory_slot *slot;
+        gfn_t gfn;
+        /* FOLL_* flags modifying lookup behavior. */
+        unsigned int flags;
+        /* Whether this function can sleep. */
+        bool atomic;
+        /* Try to create a writable mapping even for a read fault. */
+        bool try_map_writable;
+
+        /* Outputs of kvm_follow_pfn */
+        hva_t hva;
+        bool writable;
+};
+
+kvm_pfn_t kvm_follow_pfn(struct kvm_follow_pfn *kfp);
+
 kvm_pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn);
 kvm_pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault,
                           bool *writable);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 6f37d56fb2fc..575756c9c5b0 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2791,8 +2791,7 @@ static inline int check_user_page_hwpoison(unsigned long addr)
  * true indicates success, otherwise false is returned. It's also the
  * only part that runs if we can in atomic context.
  */
-static bool hva_to_pfn_fast(unsigned long addr, bool write_fault,
-                            bool *writable, kvm_pfn_t *pfn)
+static bool hva_to_pfn_fast(struct kvm_follow_pfn *kfp, kvm_pfn_t *pfn)
 {
         struct page *page[1];
 
@@ -2801,14 +2800,12 @@ static bool hva_to_pfn_fast(unsigned long addr, bool write_fault,
          * or the caller allows to map a writable pfn for a read fault
          * request.
          */
-        if (!(write_fault || writable))
+        if (!((kfp->flags & FOLL_WRITE) || kfp->try_map_writable))
                 return false;
 
-        if (get_user_page_fast_only(addr, FOLL_WRITE, page)) {
+        if (get_user_page_fast_only(kfp->hva, FOLL_WRITE, page)) {
                 *pfn = page_to_pfn(page[0]);
-
-                if (writable)
-                        *writable = true;
+                kfp->writable = true;
                 return true;
         }
 
@@ -2819,8 +2816,7 @@ static bool hva_to_pfn_fast(unsigned long addr, bool write_fault,
  * The slow path to get the pfn of the specified host virtual address,
  * 1 indicates success, -errno is returned if error is detected.
  */
-static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault,
-                           bool interruptible, bool *writable, kvm_pfn_t *pfn)
+static int hva_to_pfn_slow(struct kvm_follow_pfn *kfp, kvm_pfn_t *pfn)
 {
         /*
          * When a VCPU accesses a page that is not mapped into the secondary
@@ -2833,32 +2829,24 @@ static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault,
          * Note that get_user_page_fast_only() and FOLL_WRITE for now
          * implicitly honor NUMA hinting faults and don't need this flag.
          */
-        unsigned int flags = FOLL_HWPOISON | FOLL_HONOR_NUMA_FAULT;
+        unsigned int flags = FOLL_HWPOISON | FOLL_HONOR_NUMA_FAULT | kfp->flags;
         struct page *page;
         int npages;
 
         might_sleep();
 
-        if (writable)
-                *writable = write_fault;
-
-        if (write_fault)
-                flags |= FOLL_WRITE;
-        if (async)
-                flags |= FOLL_NOWAIT;
-        if (interruptible)
-                flags |= FOLL_INTERRUPTIBLE;
-
-        npages = get_user_pages_unlocked(addr, 1, &page, flags);
+        npages = get_user_pages_unlocked(kfp->hva, 1, &page, flags);
         if (npages != 1)
                 return npages;
 
-        /* map read fault as writable if possible */
-        if (unlikely(!write_fault) && writable) {
+        if (kfp->flags & FOLL_WRITE) {
+                kfp->writable = true;
+        } else if (kfp->try_map_writable) {
                 struct page *wpage;
 
-                if (get_user_page_fast_only(addr, FOLL_WRITE, &wpage)) {
-                        *writable = true;
+                /* map read fault as writable if possible */
+                if (get_user_page_fast_only(kfp->hva, FOLL_WRITE, &wpage)) {
+                        kfp->writable = true;
                         put_page(page);
                         page = wpage;
                 }
@@ -2889,23 +2877,23 @@ static int kvm_try_get_pfn(kvm_pfn_t pfn)
 }
 
 static int hva_to_pfn_remapped(struct vm_area_struct *vma,
-                               unsigned long addr, bool write_fault,
-                               bool *writable, kvm_pfn_t *p_pfn)
+                               struct kvm_follow_pfn *kfp, kvm_pfn_t *p_pfn)
 {
         kvm_pfn_t pfn;
         pte_t *ptep;
         pte_t pte;
         spinlock_t *ptl;
+        bool write_fault = kfp->flags & FOLL_WRITE;
         int r;
 
-        r = follow_pte(vma->vm_mm, addr, &ptep, &ptl);
+        r = follow_pte(vma->vm_mm, kfp->hva, &ptep, &ptl);
         if (r) {
                 /*
                  * get_user_pages fails for VM_IO and VM_PFNMAP vmas and does
                  * not call the fault handler, so do it here.
                  */
                 bool unlocked = false;
-                r = fixup_user_fault(current->mm, addr,
+                r = fixup_user_fault(current->mm, kfp->hva,
                                      (write_fault ? FAULT_FLAG_WRITE : 0),
                                      &unlocked);
                 if (unlocked)
@@ -2913,7 +2901,7 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
                 if (r)
                         return r;
 
-                r = follow_pte(vma->vm_mm, addr, &ptep, &ptl);
+                r = follow_pte(vma->vm_mm, kfp->hva, &ptep, &ptl);
                 if (r)
                         return r;
         }
@@ -2925,8 +2913,7 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
                 goto out;
         }
 
-        if (writable)
-                *writable = pte_write(pte);
+        kfp->writable = pte_write(pte);
         pfn = pte_pfn(pte);
 
         /*
@@ -2957,38 +2944,28 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
 }
 
 /*
- * Pin guest page in memory and return its pfn.
- * @addr: host virtual address which maps memory to the guest
- * @atomic: whether this function can sleep
- * @interruptible: whether the process can be interrupted by non-fatal signals
- * @async: whether this function need to wait IO complete if the
- *         host page is not in the memory
- * @write_fault: whether we should get a writable host page
- * @writable: whether it allows to map a writable host page for !@write_fault
- *
- * The function will map a writable host page for these two cases:
- * 1): @write_fault = true
- * 2): @write_fault = false && @writable, @writable will tell the caller
- *     whether the mapping is writable.
+ * Convert a hva to a pfn.
+ * @kfp: args struct for the conversion
  */
-kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool interruptible,
-                     bool *async, bool write_fault, bool *writable)
+kvm_pfn_t hva_to_pfn(struct kvm_follow_pfn *kfp)
 {
         struct vm_area_struct *vma;
         kvm_pfn_t pfn;
         int npages, r;
 
-        /* we can do it either atomically or asynchronously, not both */
-        WARN_ON_ONCE(atomic && async);
+        /*
+         * FOLL_NOWAIT is used for async page faults, which don't make sense
+         * in an atomic context where the caller can't do async resolution.
+         */
+        WARN_ON_ONCE(kfp->atomic && (kfp->flags & FOLL_NOWAIT));
 
-        if (hva_to_pfn_fast(addr, write_fault, writable, &pfn))
+        if (hva_to_pfn_fast(kfp, &pfn))
                 return pfn;
 
-        if (atomic)
+        if (kfp->atomic)
                 return KVM_PFN_ERR_FAULT;
 
-        npages = hva_to_pfn_slow(addr, async, write_fault, interruptible,
-                                 writable, &pfn);
+        npages = hva_to_pfn_slow(kfp, &pfn);
         if (npages == 1)
                 return pfn;
         if (npages == -EINTR)
@@ -2996,83 +2973,123 @@ kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool interruptible,
 
         mmap_read_lock(current->mm);
         if (npages == -EHWPOISON ||
-            (!async && check_user_page_hwpoison(addr))) {
+            (!(kfp->flags & FOLL_NOWAIT) && check_user_page_hwpoison(kfp->hva))) {
                 pfn = KVM_PFN_ERR_HWPOISON;
                 goto exit;
         }
 
 retry:
-        vma = vma_lookup(current->mm, addr);
+        vma = vma_lookup(current->mm, kfp->hva);
 
         if (vma == NULL)
                 pfn = KVM_PFN_ERR_FAULT;
         else if (vma->vm_flags & (VM_IO | VM_PFNMAP)) {
-                r = hva_to_pfn_remapped(vma, addr, write_fault, writable, &pfn);
+                r = hva_to_pfn_remapped(vma, kfp, &pfn);
                 if (r == -EAGAIN)
                         goto retry;
                 if (r < 0)
                         pfn = KVM_PFN_ERR_FAULT;
         } else {
-                if (async && vma_is_valid(vma, write_fault))
-                        *async = true;
-                pfn = KVM_PFN_ERR_FAULT;
+                if ((kfp->flags & FOLL_NOWAIT) &&
+                    vma_is_valid(vma, kfp->flags & FOLL_WRITE))
+                        pfn = KVM_PFN_ERR_NEEDS_IO;
+                else
+                        pfn = KVM_PFN_ERR_FAULT;
         }
 exit:
         mmap_read_unlock(current->mm);
         return pfn;
 }
 
-kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn,
-                               bool atomic, bool interruptible, bool *async,
-                               bool write_fault, bool *writable, hva_t *hva)
+kvm_pfn_t kvm_follow_pfn(struct kvm_follow_pfn *kfp)
 {
-        unsigned long addr = __gfn_to_hva_many(slot, gfn, NULL, write_fault);
+        kfp->writable = false;
+        kfp->hva = __gfn_to_hva_many(kfp->slot, kfp->gfn, NULL,
+                                     kfp->flags & FOLL_WRITE);
 
-        if (hva)
-                *hva = addr;
-
-        if (addr == KVM_HVA_ERR_RO_BAD) {
-                if (writable)
-                        *writable = false;
+        if (kfp->hva == KVM_HVA_ERR_RO_BAD)
                 return KVM_PFN_ERR_RO_FAULT;
-        }
 
-        if (kvm_is_error_hva(addr)) {
-                if (writable)
-                        *writable = false;
+        if (kvm_is_error_hva(kfp->hva))
                 return KVM_PFN_NOSLOT;
-        }
 
-        /* Do not map writable pfn in the readonly memslot. */
-        if (writable && memslot_is_readonly(slot)) {
-                *writable = false;
-                writable = NULL;
-        }
+        if (memslot_is_readonly(kfp->slot))
+                kfp->try_map_writable = false;
+
+        return hva_to_pfn(kfp);
+}
+EXPORT_SYMBOL_GPL(kvm_follow_pfn);
+
+kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn,
+                               bool atomic, bool interruptible, bool *async,
+                               bool write_fault, bool *writable, hva_t *hva)
+{
+        kvm_pfn_t pfn;
+        struct kvm_follow_pfn kfp = {
+                .slot = slot,
+                .gfn = gfn,
+                .flags = 0,
+                .atomic = atomic,
+                .try_map_writable = !!writable,
+        };
+
+        if (write_fault)
+                kfp.flags |= FOLL_WRITE;
+        if (async)
+                kfp.flags |= FOLL_NOWAIT;
+        if (interruptible)
+                kfp.flags |= FOLL_INTERRUPTIBLE;
 
-        return hva_to_pfn(addr, atomic, interruptible, async, write_fault,
-                          writable);
+        pfn = kvm_follow_pfn(&kfp);
+        if (pfn == KVM_PFN_ERR_NEEDS_IO) {
+                *async = true;
+                pfn = KVM_PFN_ERR_FAULT;
+        }
+        if (hva)
+                *hva = kfp.hva;
+        if (writable)
+                *writable = kfp.writable;
+        return pfn;
 }
 EXPORT_SYMBOL_GPL(__gfn_to_pfn_memslot);
 
 kvm_pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault,
                           bool *writable)
 {
-        return __gfn_to_pfn_memslot(gfn_to_memslot(kvm, gfn), gfn, false, false,
-                                    NULL, write_fault, writable, NULL);
+        kvm_pfn_t pfn;
+        struct kvm_follow_pfn kfp = {
+                .slot = gfn_to_memslot(kvm, gfn),
+                .gfn = gfn,
+                .flags = write_fault ? FOLL_WRITE : 0,
+                .try_map_writable = !!writable,
+        };
+        pfn = kvm_follow_pfn(&kfp);
+        if (writable)
+                *writable = kfp.writable;
+        return pfn;
 }
 EXPORT_SYMBOL_GPL(gfn_to_pfn_prot);
 
 kvm_pfn_t gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn)
 {
-        return __gfn_to_pfn_memslot(slot, gfn, false, false, NULL, true,
-                                    NULL, NULL);
+        struct kvm_follow_pfn kfp = {
+                .slot = slot,
+                .gfn = gfn,
+                .flags = FOLL_WRITE,
+        };
+        return kvm_follow_pfn(&kfp);
 }
 EXPORT_SYMBOL_GPL(gfn_to_pfn_memslot);
 
 kvm_pfn_t gfn_to_pfn_memslot_atomic(const struct kvm_memory_slot *slot, gfn_t gfn)
 {
-        return __gfn_to_pfn_memslot(slot, gfn, true, false, NULL, true,
-                                    NULL, NULL);
+        struct kvm_follow_pfn kfp = {
+                .slot = slot,
+                .gfn = gfn,
+                .flags = FOLL_WRITE,
+                .atomic = true,
+        };
+        return kvm_follow_pfn(&kfp);
 }
 EXPORT_SYMBOL_GPL(gfn_to_pfn_memslot_atomic);
 
diff --git a/virt/kvm/kvm_mm.h b/virt/kvm/kvm_mm.h
index ecefc7ec51af..9ba61fbb727c 100644
--- a/virt/kvm/kvm_mm.h
+++ b/virt/kvm/kvm_mm.h
@@ -20,8 +20,7 @@
 #define KVM_MMU_UNLOCK(kvm)     spin_unlock(&(kvm)->mmu_lock)
 #endif /* KVM_HAVE_MMU_RWLOCK */
 
-kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool interruptible,
-                     bool *async, bool write_fault, bool *writable);
+kvm_pfn_t hva_to_pfn(struct kvm_follow_pfn *foll);
 
 #ifdef CONFIG_HAVE_KVM_PFNCACHE
 void gfn_to_pfn_cache_invalidate_start(struct kvm *kvm,
diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index 2d6aba677830..1fb21c2ced5d 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -144,6 +144,12 @@ static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
         kvm_pfn_t new_pfn = KVM_PFN_ERR_FAULT;
         void *new_khva = NULL;
         unsigned long mmu_seq;
+        struct kvm_follow_pfn kfp = {
+                .slot = gpc->memslot,
+                .gfn = gpa_to_gfn(gpc->gpa),
+                .flags = FOLL_WRITE,
+                .hva = gpc->uhva,
+        };
 
         lockdep_assert_held(&gpc->refresh_lock);
 
@@ -182,8 +188,8 @@ static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
                 cond_resched();
         }
 
-        /* We always request a writeable mapping */
-        new_pfn = hva_to_pfn(gpc->uhva, false, false, NULL, true, NULL);
+        /* We always request a writable mapping */
+        new_pfn = hva_to_pfn(&kfp);
         if (is_error_noslot_pfn(new_pfn))
                 goto out_error;

From patchwork Thu Feb 29 02:57:55 2024
X-Patchwork-Submitter: David Stevens
X-Patchwork-Id: 208189
From: David Stevens
To: Sean Christopherson, Paolo Bonzini
Cc: Yu Zhang, Isaku Yamahata, Zhi Wang, Maxim Levitsky,
    kvmarm@lists.linux.dev, linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
    David Stevens
Subject: [PATCH v11 4/8] KVM: mmu: Improve handling of non-refcounted pfns
Date: Thu, 29 Feb 2024 11:57:55 +0900
Message-ID: <20240229025759.1187910-5-stevensd@google.com>
In-Reply-To: <20240229025759.1187910-1-stevensd@google.com>
References: <20240229025759.1187910-1-stevensd@google.com>

From: David Stevens

KVM's handling of non-refcounted pfns has two problems:

 - pfns without struct pages can be accessed without the protection of a
   mmu notifier.  This is unsafe because KVM cannot monitor or control
   the lifespan of such pfns, so it may continue to access the pfns
   after they are freed.
 - struct pages without refcounting (e.g. tail pages of non-compound
   higher order pages) cannot be used at all, as gfn_to_pfn does not
   provide enough information for callers to be able to avoid
   underflowing the refcount.

This patch extends the kvm_follow_pfn() API to properly handle these
cases:

 - First, it adds FOLL_GET to the list of supported flags, to indicate
   whether or not the caller actually wants to take a refcount.
 - Second, it adds a guarded_by_mmu_notifier parameter that is used to
   avoid returning non-refcounted pages when the caller cannot safely
   use them.
 - Third, it adds an is_refcounted_page output parameter so that callers
   can tell whether or not a pfn has a struct page that needs to be
   passed to put_page.

Since callers need to be updated on a case-by-case basis to pay
attention to is_refcounted_page, the new behavior of returning
non-refcounted pages is opt-in via the
allow_non_refcounted_struct_page parameter.  Once all callers have been
updated, this parameter should be removed.

The fact that non-refcounted pfns can no longer be accessed without mmu
notifier protection by default is a breaking change.  This patch
provides a module parameter that system admins can use to re-enable the
previous unsafe behavior when userspace is trusted not to
migrate/free/etc non-refcounted pfns that are mapped into the guest.
There is no timeline for updating everything in KVM to use mmu
notifiers to alleviate the need for this module parameter.

Signed-off-by: David Stevens
---
 include/linux/kvm_host.h |  29 +++++++++++
 virt/kvm/kvm_main.c      | 104 +++++++++++++++++++++++++--------------
 virt/kvm/pfncache.c      |   3 +-
 3 files changed, 99 insertions(+), 37 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 290db5133c36..66516088bb0a 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1219,10 +1219,39 @@ struct kvm_follow_pfn {
         bool atomic;
         /* Try to create a writable mapping even for a read fault. */
         bool try_map_writable;
+        /*
+         * Usage of the returned pfn will be guarded by a mmu notifier.  If
+         * FOLL_GET is not set, this must be true.
+         */
+        bool guarded_by_mmu_notifier;
+        /*
+         * When false, do not return pfns for non-refcounted struct pages.
+         *
+         * This allows callers to continue to rely on the legacy behavior
+         * where pfns returned by gfn_to_pfn can be safely passed to
+         * kvm_release_pfn without worrying about corrupting the refcount of
+         * non-refcounted pages.
+         *
+         * Callers that opt into non-refcount struct pages need to track
+         * whether or not the returned pages are refcounted and avoid touching
+         * them when they are not.  Some architectures may not have enough
+         * free space in PTEs to do this.
+         */
+        bool allow_non_refcounted_struct_page;
 
         /* Outputs of kvm_follow_pfn */
         hva_t hva;
         bool writable;
+        /*
+         * Non-NULL if the returned pfn is for a page with a valid refcount,
+         * NULL if the returned pfn has no struct page or if the struct page is
+         * not being refcounted (e.g. tail pages of non-compound higher order
+         * allocations from IO/PFNMAP mappings).
+         *
+         * NOTE: This will still be set if FOLL_GET is not specified, but the
+         * returned page will not have an elevated refcount.
+         */
+        struct page *refcounted_page;
 };
 
 kvm_pfn_t kvm_follow_pfn(struct kvm_follow_pfn *kfp);
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 575756c9c5b0..984bcf8511e7 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -96,6 +96,13 @@ unsigned int halt_poll_ns_shrink;
 module_param(halt_poll_ns_shrink, uint, 0644);
 EXPORT_SYMBOL_GPL(halt_poll_ns_shrink);
 
+/*
+ * Allow non-refcounted struct pages and non-struct page memory to
+ * be mapped without MMU notifier protection.
+ */
+static bool allow_unsafe_mappings;
+module_param(allow_unsafe_mappings, bool, 0444);
+
 /*
  * Ordering of locks:
  *
@@ -2786,6 +2793,24 @@ static inline int check_user_page_hwpoison(unsigned long addr)
         return rc == -EHWPOISON;
 }
 
+static kvm_pfn_t kvm_follow_refcounted_pfn(struct kvm_follow_pfn *kfp,
+                                           struct page *page)
+{
+        kvm_pfn_t pfn = page_to_pfn(page);
+
+        /*
+         * FIXME: Ideally, KVM wouldn't pass FOLL_GET to gup() when the caller
+         * doesn't want to grab a reference, but gup() doesn't support getting
+         * just the pfn, i.e. FOLL_GET is effectively mandatory.  If that ever
+         * changes, drop this and simply don't pass FOLL_GET to gup().
+         */
+        if (!(kfp->flags & FOLL_GET))
+                put_page(page);
+
+        kfp->refcounted_page = page;
+        return pfn;
+}
+
 /*
  * The fast path to get the writable pfn which will be stored in @pfn,
  * true indicates success, otherwise false is returned. It's also the
@@ -2804,7 +2829,7 @@ static bool hva_to_pfn_fast(struct kvm_follow_pfn *kfp, kvm_pfn_t *pfn)
                 return false;
 
         if (get_user_page_fast_only(kfp->hva, FOLL_WRITE, page)) {
-                *pfn = page_to_pfn(page[0]);
+                *pfn = kvm_follow_refcounted_pfn(kfp, page[0]);
                 kfp->writable = true;
                 return true;
         }
@@ -2851,7 +2876,7 @@ static int hva_to_pfn_slow(struct kvm_follow_pfn *kfp, kvm_pfn_t *pfn)
                         page = wpage;
                 }
         }
-        *pfn = page_to_pfn(page);
+        *pfn = kvm_follow_refcounted_pfn(kfp, page);
         return npages;
 }
 
@@ -2866,16 +2891,6 @@ static bool vma_is_valid(struct vm_area_struct *vma, bool write_fault)
         return true;
 }
 
-static int kvm_try_get_pfn(kvm_pfn_t pfn)
-{
-        struct page *page = kvm_pfn_to_refcounted_page(pfn);
-
-        if (!page)
-                return 1;
-
-        return get_page_unless_zero(page);
-}
-
 static int hva_to_pfn_remapped(struct vm_area_struct *vma,
                                struct kvm_follow_pfn *kfp, kvm_pfn_t *p_pfn)
 {
@@ -2884,6 +2899,7 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
         pte_t pte;
         spinlock_t *ptl;
         bool write_fault = kfp->flags & FOLL_WRITE;
+        struct page *page;
         int r;
 
         r = follow_pte(vma->vm_mm, kfp->hva, &ptep, &ptl);
@@ -2908,37 +2924,40 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
 
         pte = ptep_get(ptep);
 
+        kfp->writable = pte_write(pte);
+        pfn = pte_pfn(pte);
+
+        page = kvm_pfn_to_refcounted_page(pfn);
+
         if (write_fault && !pte_write(pte)) {
                 pfn = KVM_PFN_ERR_RO_FAULT;
                 goto out;
         }
 
-        kfp->writable = pte_write(pte);
-        pfn = pte_pfn(pte);
+        if (!page)
+                goto out;
 
         /*
-         * Get a reference here because callers of *hva_to_pfn* and
-         * *gfn_to_pfn* ultimately call kvm_release_pfn_clean on the
-         * returned pfn. This is only needed if the VMA has VM_MIXEDMAP
-         * set, but the kvm_try_get_pfn/kvm_release_pfn_clean pair will
-         * simply do nothing for reserved pfns.
-         *
-         * Whoever called remap_pfn_range is also going to call e.g.
-         * unmap_mapping_range before the underlying pages are freed,
-         * causing a call to our MMU notifier.
-         *
-         * Certain IO or PFNMAP mappings can be backed with valid
-         * struct pages, but be allocated without refcounting e.g.,
-         * tail pages of non-compound higher order allocations, which
-         * would then underflow the refcount when the caller does the
-         * required put_page. Don't allow those pages here.
+         * IO or PFNMAP mappings can be backed with valid struct pages but be
+         * allocated without refcounting.  We need to detect that to make sure
+         * we only pass refcounted pages to kvm_follow_refcounted_pfn.
          */
-        if (!kvm_try_get_pfn(pfn))
-                r = -EFAULT;
+        if (get_page_unless_zero(page))
+                WARN_ON_ONCE(kvm_follow_refcounted_pfn(kfp, page) != pfn);
 
 out:
         pte_unmap_unlock(ptep, ptl);
-        *p_pfn = pfn;
+
+        if (page && !kfp->refcounted_page &&
+            !kfp->allow_non_refcounted_struct_page) {
+                r = -EFAULT;
+        } else if (!kfp->refcounted_page &&
+                   !kfp->guarded_by_mmu_notifier &&
+                   !allow_unsafe_mappings) {
+                r = -EFAULT;
+        } else {
+                *p_pfn = pfn;
+        }
 
         return r;
 }
@@ -3004,6 +3023,11 @@ kvm_pfn_t hva_to_pfn(struct kvm_follow_pfn *kfp)
 kvm_pfn_t kvm_follow_pfn(struct kvm_follow_pfn *kfp)
 {
         kfp->writable = false;
+        kfp->refcounted_page = NULL;
+
+        if (WARN_ON_ONCE(!(kfp->flags & FOLL_GET) && !kfp->guarded_by_mmu_notifier))
+                return KVM_PFN_ERR_FAULT;
+
         kfp->hva = __gfn_to_hva_many(kfp->slot, kfp->gfn, NULL,
                                      kfp->flags & FOLL_WRITE);
 
@@ -3028,9 +3052,10 @@ kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn,
         struct kvm_follow_pfn kfp = {
                 .slot = slot,
                 .gfn = gfn,
-                .flags = 0,
+                .flags = FOLL_GET,
                 .atomic = atomic,
                 .try_map_writable = !!writable,
+                .allow_non_refcounted_struct_page = false,
         };
 
         if (write_fault)
@@ -3060,8 +3085,9 @@ kvm_pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault,
         struct kvm_follow_pfn kfp = {
                 .slot = gfn_to_memslot(kvm, gfn),
                 .gfn = gfn,
-                .flags = write_fault ? FOLL_WRITE : 0,
+                .flags = FOLL_GET | (write_fault ? FOLL_WRITE : 0),
                 .try_map_writable = !!writable,
+                .allow_non_refcounted_struct_page = false,
         };
         pfn = kvm_follow_pfn(&kfp);
         if (writable)
@@ -3075,7 +3101,8 @@ kvm_pfn_t gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn)
         struct kvm_follow_pfn kfp = {
                 .slot = slot,
                 .gfn = gfn,
-                .flags = FOLL_WRITE,
+                .flags = FOLL_GET | FOLL_WRITE,
+                .allow_non_refcounted_struct_page = false,
         };
         return kvm_follow_pfn(&kfp);
 }
@@ -3086,8 +3113,13 @@ kvm_pfn_t gfn_to_pfn_memslot_atomic(const struct kvm_memory_slot *slot, gfn_t gfn)
         struct kvm_follow_pfn kfp = {
                 .slot = slot,
                 .gfn = gfn,
-                .flags = FOLL_WRITE,
+                .flags = FOLL_GET | FOLL_WRITE,
                 .atomic = true,
+                /*
+                 * Setting atomic means __kvm_follow_pfn will never make it
+                 * to hva_to_pfn_remapped, so this is vacuously true.
+		 */
+		.allow_non_refcounted_struct_page = true,
 	};
 	return kvm_follow_pfn(&kfp);
 }
diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index 1fb21c2ced5d..6e82062ea203 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -147,8 +147,9 @@ static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
 	struct kvm_follow_pfn kfp = {
 		.slot = gpc->memslot,
 		.gfn = gpa_to_gfn(gpc->gpa),
-		.flags = FOLL_WRITE,
+		.flags = FOLL_GET | FOLL_WRITE,
 		.hva = gpc->uhva,
+		.allow_non_refcounted_struct_page = false,
 	};
 
 	lockdep_assert_held(&gpc->refresh_lock);
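
As an aside on the API this patch creates: a caller that genuinely does not
want a page reference is expected to say so explicitly via
guarded_by_mmu_notifier, since kvm_follow_pfn() now WARNs and fails when
neither FOLL_GET nor that guard is set. A minimal sketch of such a caller
(hypothetical code, not part of the series; example_resolve_gfn is invented,
the struct fields are the ones added above):

	static kvm_pfn_t example_resolve_gfn(struct kvm_memory_slot *slot, gfn_t gfn)
	{
		struct kvm_follow_pfn kfp = {
			.slot = slot,
			.gfn = gfn,
			.flags = FOLL_WRITE,	/* no FOLL_GET: no reference is taken */
			.guarded_by_mmu_notifier = true,
			.allow_non_refcounted_struct_page = false,
		};
		kvm_pfn_t pfn = kvm_follow_pfn(&kfp);

		/*
		 * On success, kfp.refcounted_page is non-NULL iff the pfn is
		 * backed by a refcounted struct page; with FOLL_GET clear no
		 * reference is held on it, so the MMU notifier must keep the
		 * mapping from going away underneath us.
		 */
		return pfn;
	}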

From patchwork Thu Feb 29 02:57:56 2024
X-Patchwork-Submitter: David Stevens
X-Patchwork-Id: 208182
From: David Stevens
To: Sean Christopherson, Paolo Bonzini
Cc: Yu Zhang, Isaku Yamahata, Zhi Wang, Maxim Levitsky,
 kvmarm@lists.linux.dev, linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
 David Stevens
Subject: [PATCH v11 5/8] KVM: Migrate kvm_vcpu_map() to kvm_follow_pfn()
Date: Thu, 29 Feb 2024 11:57:56 +0900
Message-ID: <20240229025759.1187910-6-stevensd@google.com>
In-Reply-To: <20240229025759.1187910-1-stevensd@google.com>
References: <20240229025759.1187910-1-stevensd@google.com>

From: David Stevens

Migrate kvm_vcpu_map() to kvm_follow_pfn(). Track is_refcounted_page so
that kvm_vcpu_unmap() knows whether or not it needs to release the page.
Signed-off-by: David Stevens
---
 include/linux/kvm_host.h |  2 +-
 virt/kvm/kvm_main.c      | 24 ++++++++++++++----------
 2 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 66516088bb0a..59dc9fbafc08 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -295,6 +295,7 @@ struct kvm_host_map {
 	void *hva;
 	kvm_pfn_t pfn;
 	kvm_pfn_t gfn;
+	bool is_refcounted_page;
 };
 
 /*
@@ -1270,7 +1271,6 @@ void kvm_release_pfn_dirty(kvm_pfn_t pfn);
 void kvm_set_pfn_dirty(kvm_pfn_t pfn);
 void kvm_set_pfn_accessed(kvm_pfn_t pfn);
-void kvm_release_pfn(kvm_pfn_t pfn, bool dirty);
 
 int kvm_read_guest_page(struct kvm *kvm, gfn_t gfn, void *data, int offset,
			int len);
 int kvm_read_guest(struct kvm *kvm, gpa_t gpa, void *data, unsigned long len);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 984bcf8511e7..17bf9fd6774e 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3184,24 +3184,22 @@ struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn)
 }
 EXPORT_SYMBOL_GPL(gfn_to_page);
 
-void kvm_release_pfn(kvm_pfn_t pfn, bool dirty)
-{
-	if (dirty)
-		kvm_release_pfn_dirty(pfn);
-	else
-		kvm_release_pfn_clean(pfn);
-}
-
 int kvm_vcpu_map(struct kvm_vcpu *vcpu, gfn_t gfn, struct kvm_host_map *map)
 {
 	kvm_pfn_t pfn;
 	void *hva = NULL;
 	struct page *page = KVM_UNMAPPED_PAGE;
+	struct kvm_follow_pfn kfp = {
+		.slot = gfn_to_memslot(vcpu->kvm, gfn),
+		.gfn = gfn,
+		.flags = FOLL_GET | FOLL_WRITE,
+		.allow_non_refcounted_struct_page = true,
+	};
 
 	if (!map)
 		return -EINVAL;
 
-	pfn = gfn_to_pfn(vcpu->kvm, gfn);
+	pfn = kvm_follow_pfn(&kfp);
 	if (is_error_noslot_pfn(pfn))
 		return -EINVAL;
 
@@ -3221,6 +3219,7 @@ int kvm_vcpu_map(struct kvm_vcpu *vcpu, gfn_t gfn, struct kvm_host_map *map)
 	map->hva = hva;
 	map->pfn = pfn;
 	map->gfn = gfn;
+	map->is_refcounted_page = !!kfp.refcounted_page;
 
 	return 0;
 }
@@ -3244,7 +3243,12 @@ void kvm_vcpu_unmap(struct kvm_vcpu *vcpu, struct kvm_host_map *map, bool dirty)
 	if (dirty)
 		kvm_vcpu_mark_page_dirty(vcpu, map->gfn);
 
-	kvm_release_pfn(map->pfn, dirty);
+	if (map->is_refcounted_page) {
+		if (dirty)
+			kvm_release_page_dirty(map->page);
+		else
+			kvm_release_page_clean(map->page);
+	}
 
 	map->hva = NULL;
 	map->page = NULL;
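
For readers unfamiliar with the map/unmap pair, a sketch of the usual calling
pattern under the new scheme (illustrative only; example_touch_guest_page and
the memset payload are invented):

	static int example_touch_guest_page(struct kvm_vcpu *vcpu, gfn_t gfn)
	{
		struct kvm_host_map map;
		int r;

		r = kvm_vcpu_map(vcpu, gfn, &map);
		if (r)
			return r;

		/* ... read or write through map.hva ... */
		memset(map.hva, 0, 8);

		/*
		 * 'true' marks the page dirty; the reference is dropped only
		 * if map.is_refcounted_page was set by kvm_vcpu_map().
		 */
		kvm_vcpu_unmap(vcpu, &map, true);
		return 0;
	}

The point of the change is that the unmap side no longer re-derives "is this
a refcounted page" from the pfn; it trusts the answer recorded at map time.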

From patchwork Thu Feb 29 02:57:57 2024
X-Patchwork-Submitter: David Stevens
X-Patchwork-Id: 208183
From: David Stevens
To: Sean Christopherson, Paolo Bonzini
Cc: Yu Zhang, Isaku Yamahata, Zhi Wang, Maxim Levitsky,
 kvmarm@lists.linux.dev, linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
 David Stevens
Subject: [PATCH v11 6/8] KVM: x86: Migrate to kvm_follow_pfn()
Date: Thu, 29 Feb 2024 11:57:57 +0900
Message-ID: <20240229025759.1187910-7-stevensd@google.com>
In-Reply-To: <20240229025759.1187910-1-stevensd@google.com>
References: <20240229025759.1187910-1-stevensd@google.com>

From: David Stevens

Migrate functions which need to be able to map non-refcounted struct pages
to kvm_follow_pfn(). These functions are kvm_faultin_pfn() and
reexecute_instruction(). The former requires replacing the async in/out
parameter with the FOLL_NOWAIT flag and the KVM_PFN_ERR_NEEDS_IO return
value (actually handling non-refcounted pages is complicated, so it will be
done in a followup). The latter is a straightforward refactor.

APIC-related callers do not need to be migrated because KVM controls the
memslot, so it will always be regular memory. Prefetch-related callers do
not need to be migrated because atomic gfn_to_pfn() calls can never make it
to hva_to_pfn_remapped().

Signed-off-by: David Stevens
Reviewed-by: Maxim Levitsky
---
 arch/x86/kvm/mmu/mmu.c | 43 ++++++++++++++++++++++++++++++++----------
 arch/x86/kvm/x86.c     | 11 +++++++++--
 virt/kvm/kvm_main.c    | 11 ++++-------
 3 files changed, 46 insertions(+), 19 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 2d6cdeab1f8a..bbeb0f6783d7 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4331,7 +4331,14 @@ static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
 static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 {
 	struct kvm_memory_slot *slot = fault->slot;
-	bool async;
+	struct kvm_follow_pfn kfp = {
+		.slot = slot,
+		.gfn = fault->gfn,
+		.flags = FOLL_GET | (fault->write ? FOLL_WRITE : 0),
+		.try_map_writable = true,
+		.guarded_by_mmu_notifier = true,
+		.allow_non_refcounted_struct_page = false,
+	};
 
 	/*
	 * Retry the page fault if the gfn hit a memslot that is being deleted
@@ -4368,12 +4375,20 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 	if (fault->is_private)
 		return kvm_faultin_pfn_private(vcpu, fault);
 
-	async = false;
-	fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, false, &async,
-					  fault->write, &fault->map_writable,
-					  &fault->hva);
-	if (!async)
-		return RET_PF_CONTINUE; /* *pfn has correct page already */
+	kfp.flags |= FOLL_NOWAIT;
+	fault->pfn = kvm_follow_pfn(&kfp);
+
+	if (!is_error_noslot_pfn(fault->pfn))
+		goto success;
+
+	/*
+	 * If kvm_follow_pfn() failed because I/O is needed to fault in the
+	 * page, then either set up an asynchronous #PF to do the I/O, or if
+	 * doing an async #PF isn't possible, retry kvm_follow_pfn() with
+	 * I/O allowed. All other failures are fatal, i.e. retrying won't help.
+	 */
+	if (fault->pfn != KVM_PFN_ERR_NEEDS_IO)
+		return RET_PF_CONTINUE;
 
 	if (!fault->prefetch && kvm_can_do_async_pf(vcpu)) {
 		trace_kvm_try_async_get_page(fault->addr, fault->gfn);
@@ -4391,9 +4406,17 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
	 * to wait for IO. Note, gup always bails if it is unable to quickly
	 * get a page and a fatal signal, i.e. SIGKILL, is pending.
	 */
-	fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, true, NULL,
-					  fault->write, &fault->map_writable,
-					  &fault->hva);
+	kfp.flags |= FOLL_INTERRUPTIBLE;
+	kfp.flags &= ~FOLL_NOWAIT;
+	fault->pfn = kvm_follow_pfn(&kfp);
+
+	if (!is_error_noslot_pfn(fault->pfn))
+		goto success;
+
+	return RET_PF_CONTINUE;
+success:
+	fault->hva = kfp.hva;
+	fault->map_writable = kfp.writable;
 	return RET_PF_CONTINUE;
 }
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 363b1c080205..f4a20e9bc7a6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8747,6 +8747,7 @@ static bool reexecute_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 {
 	gpa_t gpa = cr2_or_gpa;
 	kvm_pfn_t pfn;
+	struct kvm_follow_pfn kfp;
 
 	if (!(emulation_type & EMULTYPE_ALLOW_RETRY_PF))
 		return false;
@@ -8776,7 +8777,13 @@ static bool reexecute_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
	 * retry instruction -> write #PF -> emulation fail -> retry
	 * instruction -> ...
	 */
-	pfn = gfn_to_pfn(vcpu->kvm, gpa_to_gfn(gpa));
+	kfp = (struct kvm_follow_pfn) {
+		.slot = gfn_to_memslot(vcpu->kvm, gpa_to_gfn(gpa)),
+		.gfn = gpa_to_gfn(gpa),
+		.flags = FOLL_GET | FOLL_WRITE,
+		.allow_non_refcounted_struct_page = true,
+	};
+	pfn = kvm_follow_pfn(&kfp);
 
 	/*
	 * If the instruction failed on the error pfn, it can not be fixed,
@@ -8785,7 +8792,7 @@ static bool reexecute_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 	if (is_error_noslot_pfn(pfn))
 		return false;
 
-	kvm_release_pfn_clean(pfn);
+	kvm_release_page_clean(kfp.refcounted_page);
 
 	/* The instructions are well-emulated on direct mmu. */
 	if (vcpu->arch.mmu->root_role.direct) {
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 17bf9fd6774e..24e2269339cb 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3293,6 +3293,9 @@ void kvm_release_page_clean(struct page *page)
 {
 	WARN_ON(is_error_page(page));
 
+	if (!page)
+		return;
+
 	kvm_set_page_accessed(page);
 	put_page(page);
 }
@@ -3300,16 +3303,10 @@ EXPORT_SYMBOL_GPL(kvm_release_page_clean);
 
 void kvm_release_pfn_clean(kvm_pfn_t pfn)
 {
-	struct page *page;
-
 	if (is_error_noslot_pfn(pfn))
 		return;
 
-	page = kvm_pfn_to_refcounted_page(pfn);
-	if (!page)
-		return;
-
-	kvm_release_page_clean(page);
+	kvm_release_page_clean(kvm_pfn_to_refcounted_page(pfn));
 }
 EXPORT_SYMBOL_GPL(kvm_release_pfn_clean);
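
The FOLL_NOWAIT / KVM_PFN_ERR_NEEDS_IO dance above can be condensed into the
following sketch (a restatement under simplifying assumptions, not the actual
kernel code: example_faultin and can_async are invented, and the real fault
path returns RET_PF_* values and sets up the async #PF itself):

	static kvm_pfn_t example_faultin(struct kvm_follow_pfn *kfp, bool can_async)
	{
		kvm_pfn_t pfn;

		/* First attempt: don't sleep waiting for I/O. */
		kfp->flags |= FOLL_NOWAIT;
		pfn = kvm_follow_pfn(kfp);
		if (pfn != KVM_PFN_ERR_NEEDS_IO)
			return pfn;	/* success, or a fatal error */

		if (can_async)
			return KVM_PFN_ERR_FAULT;	/* caller queues an async #PF */

		/* Retry synchronously, allowing gup() to sleep for the I/O. */
		kfp->flags &= ~FOLL_NOWAIT;
		kfp->flags |= FOLL_INTERRUPTIBLE;
		return kvm_follow_pfn(kfp);
	}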

From patchwork Thu Feb 29 02:57:58 2024
X-Patchwork-Submitter: David Stevens
X-Patchwork-Id: 208184
From: David Stevens
To: Sean Christopherson, Paolo Bonzini
Cc: Yu Zhang, Isaku Yamahata, Zhi Wang, Maxim Levitsky,
 kvmarm@lists.linux.dev, linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
 David Stevens
Subject: [PATCH v11 7/8] KVM: x86/mmu: Track if sptes refer to refcounted
 pages
Date: Thu, 29 Feb 2024 11:57:58 +0900
Message-ID: <20240229025759.1187910-8-stevensd@google.com>
In-Reply-To: <20240229025759.1187910-1-stevensd@google.com>
References: <20240229025759.1187910-1-stevensd@google.com>

From: David Stevens

Use one of the unused bits in EPT sptes to track whether or not an spte
refers to a struct page that has a valid refcount, in preparation for
adding support for mapping non-refcounted pages into the guest. The new
bit is used to avoid triggering a page_count() == 0 warning and to avoid
touching A/D bits of unknown usage.

Non-EPT sptes don't have any free bits to use, so this tracking is not
possible when TDP is disabled or on 32-bit x86.

Signed-off-by: David Stevens
---
 arch/x86/kvm/mmu/mmu.c         | 47 ++++++++++++++++++++--------------
 arch/x86/kvm/mmu/paging_tmpl.h |  5 ++--
 arch/x86/kvm/mmu/spte.c        |  5 +++-
 arch/x86/kvm/mmu/spte.h        | 16 +++++++++++-
 arch/x86/kvm/mmu/tdp_mmu.c     | 21 ++++++++-------
 include/linux/kvm_host.h       |  3 +++
 virt/kvm/kvm_main.c            |  6 +++--
 7 files changed, 69 insertions(+), 34 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index bbeb0f6783d7..4936a8c5829b 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -541,12 +541,14 @@ static bool mmu_spte_update(u64 *sptep, u64 new_spte)
 
 	if (is_accessed_spte(old_spte) && !is_accessed_spte(new_spte)) {
 		flush = true;
-		kvm_set_pfn_accessed(spte_to_pfn(old_spte));
+		if (is_refcounted_page_spte(old_spte))
+			kvm_set_page_accessed(pfn_to_page(spte_to_pfn(old_spte)));
 	}
 
 	if (is_dirty_spte(old_spte) && !is_dirty_spte(new_spte)) {
 		flush = true;
-		kvm_set_pfn_dirty(spte_to_pfn(old_spte));
+		if (is_refcounted_page_spte(old_spte))
+			kvm_set_page_dirty(pfn_to_page(spte_to_pfn(old_spte)));
 	}
 
 	return flush;
@@ -578,20 +580,23 @@ static u64 mmu_spte_clear_track_bits(struct kvm *kvm, u64 *sptep)
 
 	pfn = spte_to_pfn(old_spte);
 
-	/*
-	 * KVM doesn't hold a reference to any pages mapped into the guest, and
-	 * instead uses the mmu_notifier to ensure that KVM unmaps any pages
-	 * before they are reclaimed. Sanity check that, if the pfn is backed
-	 * by a refcounted page, the refcount is elevated.
-	 */
-	page = kvm_pfn_to_refcounted_page(pfn);
-	WARN_ON_ONCE(page && !page_count(page));
+	if (is_refcounted_page_spte(old_spte)) {
+		/*
+		 * KVM doesn't hold a reference to any pages mapped into the
+		 * guest, and instead uses the mmu_notifier to ensure that KVM
+		 * unmaps any pages before they are reclaimed. Sanity check
+		 * that, if the pfn is backed by a refcounted page, the
+		 * refcount is elevated.
+		 */
+		page = kvm_pfn_to_refcounted_page(pfn);
+		WARN_ON_ONCE(!page || !page_count(page));
 
-	if (is_accessed_spte(old_spte))
-		kvm_set_pfn_accessed(pfn);
+		if (is_accessed_spte(old_spte))
+			kvm_set_page_accessed(pfn_to_page(pfn));
 
-	if (is_dirty_spte(old_spte))
-		kvm_set_pfn_dirty(pfn);
+		if (is_dirty_spte(old_spte))
+			kvm_set_page_dirty(pfn_to_page(pfn));
+	}
 
 	return old_spte;
 }
@@ -627,8 +632,8 @@ static bool mmu_spte_age(u64 *sptep)
	 * Capture the dirty status of the page, so that it doesn't get
	 * lost when the SPTE is marked for access tracking.
	 */
-	if (is_writable_pte(spte))
-		kvm_set_pfn_dirty(spte_to_pfn(spte));
+	if (is_writable_pte(spte) && is_refcounted_page_spte(spte))
+		kvm_set_page_dirty(pfn_to_page(spte_to_pfn(spte)));
 
 	spte = mark_spte_for_access_track(spte);
 	mmu_spte_update_no_track(sptep, spte);
@@ -1267,8 +1272,8 @@ static bool spte_wrprot_for_clear_dirty(u64 *sptep)
 {
 	bool was_writable = test_and_clear_bit(PT_WRITABLE_SHIFT,
					       (unsigned long *)sptep);
-	if (was_writable && !spte_ad_enabled(*sptep))
-		kvm_set_pfn_dirty(spte_to_pfn(*sptep));
+	if (was_writable && !spte_ad_enabled(*sptep) && is_refcounted_page_spte(*sptep))
+		kvm_set_page_dirty(pfn_to_page(spte_to_pfn(*sptep)));
 
 	return was_writable;
 }
@@ -2946,7 +2951,7 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, struct kvm_memory_slot *slot,
 	}
 
 	wrprot = make_spte(vcpu, sp, slot, pte_access, gfn, pfn, *sptep, prefetch,
-			   true, host_writable, &spte);
+			   true, host_writable, true, &spte);
 
 	if (*sptep == spte) {
 		ret = RET_PF_SPURIOUS;
@@ -5999,6 +6004,10 @@ void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
 
 #ifdef CONFIG_X86_64
 	tdp_mmu_enabled = tdp_mmu_allowed && tdp_enabled;
+
+	/* The SPTE_MMU_PAGE_REFCOUNTED bit is only available with EPT. */
+	if (enable_tdp)
+		shadow_refcounted_mask = SPTE_MMU_PAGE_REFCOUNTED;
 #endif
 	/*
	 * max_huge_page_level reflects KVM's MMU capabilities irrespective
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 4d4e98fe4f35..c965f77ac4d5 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -902,7 +902,7 @@ static gpa_t FNAME(gva_to_gpa)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
  */
 static int FNAME(sync_spte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, int i)
 {
-	bool host_writable;
+	bool host_writable, is_refcounted;
 	gpa_t first_pte_gpa;
 	u64 *sptep, spte;
 	struct kvm_memory_slot *slot;
@@ -959,10 +959,11 @@ static int FNAME(sync_spte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, int i)
 	sptep = &sp->spt[i];
 	spte = *sptep;
 	host_writable = spte & shadow_host_writable_mask;
+	is_refcounted = is_refcounted_page_spte(spte);
 	slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
 	make_spte(vcpu, sp, slot, pte_access, gfn,
		  spte_to_pfn(spte), spte, true, false,
-		  host_writable, &spte);
+		  host_writable, is_refcounted, &spte);
 
 	return mmu_spte_update(sptep, spte);
 }
diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index 4a599130e9c9..e4a458b7e185 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -39,6 +39,7 @@ u64 __read_mostly shadow_memtype_mask;
 u64 __read_mostly shadow_me_value;
 u64 __read_mostly shadow_me_mask;
 u64 __read_mostly shadow_acc_track_mask;
+u64 __read_mostly shadow_refcounted_mask;
 
 u64 __read_mostly shadow_nonpresent_or_rsvd_mask;
 u64 __read_mostly shadow_nonpresent_or_rsvd_lower_gfn_mask;
@@ -138,7 +139,7 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
	       const struct kvm_memory_slot *slot,
	       unsigned int pte_access, gfn_t gfn, kvm_pfn_t pfn,
	       u64 old_spte, bool prefetch, bool can_unsync,
-	       bool host_writable, u64 *new_spte)
+	       bool host_writable, bool is_refcounted, u64 *new_spte)
 {
 	int level = sp->role.level;
 	u64 spte = SPTE_MMU_PRESENT_MASK;
@@ -188,6 +189,8 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 
 	if (level > PG_LEVEL_4K)
 		spte |= PT_PAGE_SIZE_MASK;
+	if (is_refcounted)
+		spte |= shadow_refcounted_mask;
 
 	if (shadow_memtype_mask)
 		spte |= static_call(kvm_x86_get_mt_mask)(vcpu, gfn,
diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
index a129951c9a88..6bf0069d8db6 100644
--- a/arch/x86/kvm/mmu/spte.h
+++ b/arch/x86/kvm/mmu/spte.h
@@ -96,6 +96,13 @@ static_assert(!(EPT_SPTE_MMU_WRITABLE & SHADOW_ACC_TRACK_SAVED_MASK));
 /* Defined only to keep the above static asserts readable. */
 #undef SHADOW_ACC_TRACK_SAVED_MASK
 
+/*
+ * Indicates that the SPTE refers to a page with a valid refcount. Only
+ * available for TDP SPTEs, since bits 62:52 are reserved for PAE paging,
+ * including NPT PAE.
+ */
+#define SPTE_MMU_PAGE_REFCOUNTED BIT_ULL(59)
+
 /*
  * Due to limited space in PTEs, the MMIO generation is a 19 bit subset of
  * the memslots generation and is derived as follows:
@@ -345,6 +352,13 @@ static inline bool is_dirty_spte(u64 spte)
 	return dirty_mask ? spte & dirty_mask : spte & PT_WRITABLE_MASK;
 }
 
+extern u64 __read_mostly shadow_refcounted_mask;
+
+static inline bool is_refcounted_page_spte(u64 spte)
+{
+	return !shadow_refcounted_mask || (spte & shadow_refcounted_mask);
+}
+
 static inline u64 get_rsvd_bits(struct rsvd_bits_validate *rsvd_check, u64 pte,
				int level)
 {
@@ -475,7 +489,7 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
	       const struct kvm_memory_slot *slot,
	       unsigned int pte_access, gfn_t gfn, kvm_pfn_t pfn,
	       u64 old_spte, bool prefetch, bool can_unsync,
-	       bool host_writable, u64 *new_spte);
+	       bool host_writable, bool is_refcounted, u64 *new_spte);
 u64 make_huge_page_split_spte(struct kvm *kvm, u64 huge_spte,
			      union kvm_mmu_page_role role, int index);
 u64 make_nonleaf_spte(u64 *child_pt, bool ad_disabled);
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 6ae19b4ee5b1..ee497fb78d90 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -414,6 +414,7 @@ static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
 	bool was_leaf = was_present && is_last_spte(old_spte, level);
 	bool is_leaf = is_present && is_last_spte(new_spte, level);
 	bool pfn_changed = spte_to_pfn(old_spte) != spte_to_pfn(new_spte);
+	bool is_refcounted = is_refcounted_page_spte(old_spte);
 
 	WARN_ON_ONCE(level > PT64_ROOT_MAX_LEVEL);
 	WARN_ON_ONCE(level < PG_LEVEL_4K);
@@ -478,9 +479,9 @@ static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
 	if (is_leaf != was_leaf)
 		kvm_update_page_stats(kvm, level, is_leaf ? 1 : -1);
 
-	if (was_leaf && is_dirty_spte(old_spte) &&
+	if (was_leaf && is_dirty_spte(old_spte) && is_refcounted &&
	    (!is_present || !is_dirty_spte(new_spte) || pfn_changed))
-		kvm_set_pfn_dirty(spte_to_pfn(old_spte));
+		kvm_set_page_dirty(pfn_to_page(spte_to_pfn(old_spte)));
 
 	/*
	 * Recursively handle child PTs if the change removed a subtree from
@@ -492,9 +493,9 @@ static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
	    (is_leaf || !is_present || WARN_ON_ONCE(pfn_changed)))
 		handle_removed_pt(kvm, spte_to_child_pt(old_spte, level), shared);
 
-	if (was_leaf && is_accessed_spte(old_spte) &&
+	if (was_leaf && is_accessed_spte(old_spte) && is_refcounted &&
	    (!is_present || !is_accessed_spte(new_spte) || pfn_changed))
-		kvm_set_pfn_accessed(spte_to_pfn(old_spte));
+		kvm_set_page_accessed(pfn_to_page(spte_to_pfn(old_spte)));
 }
 
 /*
@@ -956,8 +957,8 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu,
 		new_spte = make_mmio_spte(vcpu, iter->gfn, ACC_ALL);
 	else
 		wrprot = make_spte(vcpu, sp, fault->slot, ACC_ALL, iter->gfn,
-				   fault->pfn, iter->old_spte, fault->prefetch, true,
-				   fault->map_writable, &new_spte);
+				   fault->pfn, iter->old_spte, fault->prefetch, true,
+				   fault->map_writable, true, &new_spte);
 
 	if (new_spte == iter->old_spte)
 		ret = RET_PF_SPURIOUS;
@@ -1178,8 +1179,9 @@ static bool age_gfn_range(struct kvm *kvm, struct tdp_iter *iter,
		 * Capture the dirty status of the page, so that it doesn't get
		 * lost when the SPTE is marked for access tracking.
		 */
-		if (is_writable_pte(iter->old_spte))
-			kvm_set_pfn_dirty(spte_to_pfn(iter->old_spte));
+		if (is_writable_pte(iter->old_spte) &&
+		    is_refcounted_page_spte(iter->old_spte))
+			kvm_set_page_dirty(pfn_to_page(spte_to_pfn(iter->old_spte)));
 
 		new_spte = mark_spte_for_access_track(iter->old_spte);
 		iter->old_spte = kvm_tdp_mmu_write_spte(iter->sptep,
@@ -1602,7 +1604,8 @@ static void clear_dirty_pt_masked(struct kvm *kvm, struct kvm_mmu_page *root,
 		trace_kvm_tdp_mmu_spte_changed(iter.as_id, iter.gfn, iter.level,
					       iter.old_spte,
					       iter.old_spte & ~dbit);
-		kvm_set_pfn_dirty(spte_to_pfn(iter.old_spte));
+		if (is_refcounted_page_spte(iter.old_spte))
+			kvm_set_page_dirty(pfn_to_page(spte_to_pfn(iter.old_spte)));
 	}
 
 	rcu_read_unlock();
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 59dc9fbafc08..d19a418df04b 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1211,6 +1211,9 @@ unsigned long gfn_to_hva_memslot_prot(struct kvm_memory_slot *slot, gfn_t gfn,
 void kvm_release_page_clean(struct page *page);
 void kvm_release_page_dirty(struct page *page);
 
+void kvm_set_page_accessed(struct page *page);
+void kvm_set_page_dirty(struct page *page);
+
 struct kvm_follow_pfn {
 	const struct kvm_memory_slot *slot;
 	gfn_t gfn;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 24e2269339cb..235c92830cdc 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3277,17 +3277,19 @@ static bool kvm_is_ad_tracked_page(struct page *page)
 	return !PageReserved(page);
 }
 
-static void kvm_set_page_dirty(struct page *page)
+void kvm_set_page_dirty(struct page *page)
 {
 	if (kvm_is_ad_tracked_page(page))
 		SetPageDirty(page);
 }
+EXPORT_SYMBOL_GPL(kvm_set_page_dirty);
 
-static void kvm_set_page_accessed(struct page *page)
+void kvm_set_page_accessed(struct page *page)
 {
 	if (kvm_is_ad_tracked_page(page))
 		mark_page_accessed(page);
 }
+EXPORT_SYMBOL_GPL(kvm_set_page_accessed);
 
 void kvm_release_page_clean(struct page *page)
 {
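
The recurring pattern in this patch - only propagate A/D state to the struct
page when the SPTE is marked refcounted - could be factored as below
(hypothetical helper, not in the patch; note that with
shadow_refcounted_mask == 0, i.e. without EPT, is_refcounted_page_spte()
conservatively returns true for every SPTE, preserving the old behavior):

	static void example_propagate_ad_bits(u64 old_spte)
	{
		/* Non-refcounted pages have no A/D state to update. */
		if (!is_refcounted_page_spte(old_spte))
			return;

		if (is_accessed_spte(old_spte))
			kvm_set_page_accessed(pfn_to_page(spte_to_pfn(old_spte)));
		if (is_dirty_spte(old_spte))
			kvm_set_page_dirty(pfn_to_page(spte_to_pfn(old_spte)));
	}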

From patchwork Thu Feb 29 02:57:59 2024
X-Patchwork-Submitter: David Stevens
X-Patchwork-Id: 208185
From: David Stevens
To: Sean Christopherson, Paolo Bonzini
Cc: Yu Zhang, Isaku Yamahata, Zhi Wang, Maxim Levitsky,
 kvmarm@lists.linux.dev, linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
 David Stevens
Subject: [PATCH v11 8/8] KVM: x86/mmu: Handle non-refcounted pages
Date: Thu, 29 Feb 2024 11:57:59 +0900
Message-ID: <20240229025759.1187910-9-stevensd@google.com>
In-Reply-To: <20240229025759.1187910-1-stevensd@google.com>
References: <20240229025759.1187910-1-stevensd@google.com>

From: David Stevens

Handle non-refcounted pages in __kvm_faultin_pfn(). This allows the host
to map memory into the guest that is backed by non-refcounted struct
pages - for example, the tail pages of higher order non-compound pages
allocated by the amdgpu driver via ttm_pool_alloc_page.

Signed-off-by: David Stevens
---
 arch/x86/kvm/mmu/mmu.c          | 24 +++++++++++++++++-------
 arch/x86/kvm/mmu/mmu_internal.h |  2 ++
 arch/x86/kvm/mmu/paging_tmpl.h  |  2 +-
 arch/x86/kvm/mmu/tdp_mmu.c      |  3 ++-
 include/linux/kvm_host.h        |  6 ++++--
 virt/kvm/guest_memfd.c          |  8 ++++----
 virt/kvm/kvm_main.c             | 10 ++++++++--
 7 files changed, 38 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 4936a8c5829b..f9046912bb43 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2924,6 +2924,11 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, struct kvm_memory_slot *slot,
 	bool host_writable = !fault || fault->map_writable;
 	bool prefetch = !fault || fault->prefetch;
 	bool write_fault = fault && fault->write;
+	/*
+	 * Prefetching uses gfn_to_page_many_atomic, which never gets
+	 * non-refcounted pages.
+	 */
+	bool is_refcounted = !fault || !!fault->accessed_page;
 
 	if (unlikely(is_noslot_pfn(pfn))) {
 		vcpu->stat.pf_mmio_spte_created++;
@@ -2951,7 +2956,7 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, struct kvm_memory_slot *slot,
 	}
 
 	wrprot = make_spte(vcpu, sp, slot, pte_access, gfn, pfn, *sptep, prefetch,
-			   true, host_writable, true, &spte);
+			   true, host_writable, is_refcounted, &spte);
 
 	if (*sptep == spte) {
 		ret = RET_PF_SPURIOUS;
@@ -4319,8 +4324,8 @@ static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
 		return -EFAULT;
 	}
 
-	r = kvm_gmem_get_pfn(vcpu->kvm, fault->slot, fault->gfn, &fault->pfn,
-			     &max_order);
+	r = kvm_gmem_get_pfn(vcpu->kvm, fault->slot, fault->gfn,
+			     &fault->pfn, &fault->accessed_page, &max_order);
 	if (r) {
 		kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
 		return r;
@@ -4330,6 +4335,9 @@ static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
					 fault->max_level);
 	fault->map_writable = !(fault->slot->flags & KVM_MEM_READONLY);
 
+	/* kvm_gmem_get_pfn takes a refcount, but accessed_page doesn't need it. */
+	put_page(fault->accessed_page);
+
 	return RET_PF_CONTINUE;
 }
 
@@ -4339,10 +4347,10 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 	struct kvm_follow_pfn kfp = {
 		.slot = slot,
 		.gfn = fault->gfn,
-		.flags = FOLL_GET | (fault->write ? FOLL_WRITE : 0),
+		.flags = fault->write ? FOLL_WRITE : 0,
 		.try_map_writable = true,
 		.guarded_by_mmu_notifier = true,
-		.allow_non_refcounted_struct_page = false,
+		.allow_non_refcounted_struct_page = shadow_refcounted_mask,
 	};
 
 	/*
@@ -4359,6 +4367,7 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 		fault->slot = NULL;
 		fault->pfn = KVM_PFN_NOSLOT;
 		fault->map_writable = false;
+		fault->accessed_page = NULL;
 		return RET_PF_CONTINUE;
 	}
 	/*
@@ -4422,6 +4431,7 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 success:
 	fault->hva = kfp.hva;
 	fault->map_writable = kfp.writable;
+	fault->accessed_page = kfp.refcounted_page;
 	return RET_PF_CONTINUE;
 }
 
@@ -4510,8 +4520,8 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 	r = direct_map(vcpu, fault);
 
 out_unlock:
+	kvm_set_page_accessed(fault->accessed_page);
 	write_unlock(&vcpu->kvm->mmu_lock);
-	kvm_release_pfn_clean(fault->pfn);
 	return r;
 }
 
@@ -4586,8 +4596,8 @@ static int kvm_tdp_mmu_page_fault(struct kvm_vcpu *vcpu,
 	r = kvm_tdp_mmu_map(vcpu, fault);
 
 out_unlock:
+	kvm_set_page_accessed(fault->accessed_page);
 	read_unlock(&vcpu->kvm->mmu_lock);
-	kvm_release_pfn_clean(fault->pfn);
 	return r;
 }
 #endif
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 0669a8a668ca..0b05183600af 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -240,6 +240,8 @@ struct kvm_page_fault {
 	kvm_pfn_t pfn;
 	hva_t hva;
 	bool map_writable;
+	/* Does NOT have an elevated refcount */
+	struct page *accessed_page;
 
 	/*
	 * Indicates the guest is trying to write a gfn that contains one or
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index c965f77ac4d5..b39dce802394 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -847,8 +847,8 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 	r = FNAME(fetch)(vcpu, fault, &walker);
 
 out_unlock:
+	kvm_set_page_accessed(fault->accessed_page);
 	write_unlock(&vcpu->kvm->mmu_lock);
-	kvm_release_pfn_clean(fault->pfn);
 	return r;
 }
 
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index ee497fb78d90..0524be7c0796 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -958,7 +958,8 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu,
 	else
 		wrprot = make_spte(vcpu, sp, fault->slot, ACC_ALL, iter->gfn,
				   fault->pfn, iter->old_spte, fault->prefetch, true,
-				   fault->map_writable, true, &new_spte);
+				   fault->map_writable, !!fault->accessed_page,
+				   &new_spte);
 
 	if (new_spte == iter->old_spte)
 		ret = RET_PF_SPURIOUS;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index d19a418df04b..ea34eae6cfa4 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2426,11 +2426,13 @@ static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
 
 #ifdef CONFIG_KVM_PRIVATE_MEM
 int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
-		     gfn_t gfn, kvm_pfn_t *pfn, int *max_order);
+		     gfn_t gfn, kvm_pfn_t *pfn, struct page **page,
+		     int *max_order);
 #else
 static inline int kvm_gmem_get_pfn(struct kvm *kvm,
				   struct kvm_memory_slot *slot, gfn_t gfn,
-				   kvm_pfn_t *pfn, int *max_order)
+				   kvm_pfn_t *pfn, struct page **page,
+				   int *max_order)
 {
 	KVM_BUG_ON(1, kvm);
 	return -EIO;
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 0f4e0cf4f158..dabcca2ecc37 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -483,12 +483,12 @@ void kvm_gmem_unbind(struct kvm_memory_slot *slot)
 }
 
 int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
-		     gfn_t gfn, kvm_pfn_t *pfn, int *max_order)
+		     gfn_t gfn, kvm_pfn_t *pfn, struct page **page,
+		     int *max_order)
 {
 	pgoff_t index = gfn - slot->base_gfn + slot->gmem.pgoff;
 	struct kvm_gmem *gmem;
 	struct folio *folio;
-	struct page *page;
 	struct file *file;
 	int r;
 
@@ -514,9 +514,9 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
 		goto out_unlock;
 	}
 
-	page = folio_file_page(folio, index);
+	*page = folio_file_page(folio, index);
 
-	*pfn = page_to_pfn(page);
+	*pfn = page_to_pfn(*page);
 	if (max_order)
 		*max_order = 0;
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 235c92830cdc..1f5d2a1e63a9 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3284,11 +3284,17 @@ void kvm_set_page_dirty(struct page *page)
 }
 EXPORT_SYMBOL_GPL(kvm_set_page_dirty);
 
-void kvm_set_page_accessed(struct page *page)
+static void __kvm_set_page_accessed(struct page *page)
 {
 	if (kvm_is_ad_tracked_page(page))
 		mark_page_accessed(page);
 }
+
+void kvm_set_page_accessed(struct page *page)
+{
+	if (page)
+		__kvm_set_page_accessed(page);
+}
 EXPORT_SYMBOL_GPL(kvm_set_page_accessed);
 
 void kvm_release_page_clean(struct page *page)
@@ -3298,7 +3304,7 @@ void kvm_release_page_clean(struct page *page)
 	if (!page)
 		return;
 
-	kvm_set_page_accessed(page);
+	__kvm_set_page_accessed(page);
 	put_page(page);
 }
 EXPORT_SYMBOL_GPL(kvm_release_page_clean);
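
Putting the series together, a fault handler consumes the result roughly as
follows (a sketch assuming the TDP MMU path; example_finish_fault is
invented). fault->accessed_page deliberately carries no reference of its own,
so the only post-map use is the accessed-bit update, done while still under
mmu_lock and therefore under mmu_notifier protection:

	static int example_finish_fault(struct kvm_vcpu *vcpu,
					struct kvm_page_fault *fault)
	{
		int r;

		/* mmu_lock is already held for read on this path. */
		r = kvm_tdp_mmu_map(vcpu, fault);

		/* NULL-safe: a no-op for pfns without a refcounted struct page. */
		kvm_set_page_accessed(fault->accessed_page);
		read_unlock(&vcpu->kvm->mmu_lock);
		return r;
	}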