Message ID | 20230504225351.10765-3-decui@microsoft.com |
---|---|
State | New |
Series | Support TDX guests on Hyper-V |
Commit Message
Dexuan Cui
May 4, 2023, 10:53 p.m. UTC
When a TDX guest runs on Hyper-V, the hv_netvsc driver's netvsc_init_buf()
allocates buffers using vzalloc(), and needs to share the buffers with the
host OS by calling set_memory_decrypted(), which does not work for vmalloc()
yet. Add the support by handling the pages one by one.

Co-developed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
---
Changes in v2:
  Changed tdx_enc_status_changed() in place.

  Hi Dave, I checked the huge vmalloc mapping code, but still don't know how
  to get the underlying huge page info (if a huge page is in use) and try to
  use PG_LEVEL_2M/1G in try_accept_page() for vmalloc: I checked
  is_vm_area_hugepages() and __vfree() -> __vunmap(), and I think the
  underlying page allocation info is internal to the mm code; there is no mm
  API for me to get that info in tdx_enc_status_changed().

Changes in v3:
  No change since v2.

Changes in v4:
  Added Kirill's Co-developed-by since Kirill helped to improve the code by
  adding tdx_enc_status_changed_phys().

  Thanks Kirill for the clarification on load_unaligned_zeropad()!

  The vzalloc() usage in drivers/net/hyperv/netvsc.c: netvsc_init_buf()
  remains the same. It may not be worth it to "allocate a vmalloc region,
  allocate pages manually", because we have to consider the worst case where
  the system is suffering from severe memory fragmentation and we can only
  allocate multiple single pages. We may not want to complicate the code in
  netvsc_init_buf().

  We'll support NIC SR-IOV for TDX VMs on Hyper-V, so the netvsc send/recv
  buffers won't be used when the VF NIC is up.

Changes in v5:
  Added Kirill's Signed-off-by.
  Added Michael's Reviewed-by.

Changes in v6:
  None.

 arch/x86/coco/tdx/tdx.c | 76 ++++++++++++++++++++++++++++-------------
 1 file changed, 52 insertions(+), 24 deletions(-)
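For context, the calling pattern this patch enables looks roughly like the
following. This is a minimal sketch of a netvsc-style caller, not the actual
netvsc_init_buf() code; the helper names and error handling are illustrative
only, and only existing kernel APIs (vzalloc, set_memory_decrypted/encrypted)
are used.

    #include <linux/vmalloc.h>
    #include <linux/set_memory.h>
    #include <linux/mm.h>

    static void *alloc_shared_buf(size_t size)
    {
        void *buf = vzalloc(size);      /* virtually contiguous, page-backed */
        int numpages = PFN_UP(size);

        if (!buf)
            return NULL;

        /*
         * Share the buffer with the host: for a TDX guest this ends up in
         * tdx_enc_status_changed(), which previously assumed a physically
         * contiguous (direct-map) range and therefore failed for vmalloc().
         */
        if (set_memory_decrypted((unsigned long)buf, numpages)) {
            vfree(buf);
            return NULL;
        }
        return buf;
    }

    static void free_shared_buf(void *buf, size_t size)
    {
        /* Convert back to private before returning the pages to the allocator. */
        set_memory_encrypted((unsigned long)buf, PFN_UP(size));
        vfree(buf);
    }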
Comments
On 5/4/23 15:53, Dexuan Cui wrote:
> When a TDX guest runs on Hyper-V, the hv_netvsc driver's netvsc_init_buf()
> allocates buffers using vzalloc(), and needs to share the buffers with the
> host OS by calling set_memory_decrypted(), which is not working for
> vmalloc() yet. Add the support by handling the pages one by one.

I think this sets a bad precedent.

There are consequences for converting pages between shared and private.
Doing it on a vmalloc() mapping is guaranteed to fracture the underlying
EPT/SEPT mappings.

How does this work with load_unaligned_zeropad()?  Couldn't it be running
around poking at one of these vmalloc()'d pages via the direct map during a
shared->private conversion before the page has been accepted?
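As background for the load_unaligned_zeropad() concern: the real helper lives
in arch/x86/include/asm/word-at-a-time.h and relies on an exception fixup.
The fragment below is only a simplified model of the access pattern being
worried about, not the kernel's implementation.

    /*
     * Simplified model of the hazard: a word-at-a-time read that starts in
     * the last few bytes of one page also touches the first bytes of the
     * *next* page via the direct map, even though the caller never intended
     * to access that page.  The real load_unaligned_zeropad() recovers with
     * an exception fixup when that next page is unmapped; the question above
     * is what happens when the next page is mid-way through a
     * shared->private conversion and not yet accepted.
     */
    static unsigned long word_read_may_touch_next_page(const char *s)
    {
        /* If 's' is within sizeof(long)-1 bytes of a page end, this load
         * crosses into the following page. */
        return *(const unsigned long *)s;
    }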
On Tue, May 23, 2023, Dave Hansen wrote:
> On 5/4/23 15:53, Dexuan Cui wrote:
> > When a TDX guest runs on Hyper-V, the hv_netvsc driver's netvsc_init_buf()
> > allocates buffers using vzalloc(), and needs to share the buffers with the
> > host OS by calling set_memory_decrypted(), which is not working for
> > vmalloc() yet. Add the support by handling the pages one by one.
>
> I think this sets a bad precedent.

+1

> There are consequences for converting pages between shared and private.
> Doing it on a vmalloc() mapping is guaranteed to fracture the underlying
> EPT/SEPT mappings.
>
> How does this work with load_unaligned_zeropad()?  Couldn't it be
> running around poking at one of these vmalloc()'d pages via the direct
> map during a shared->private conversion before the page has been accepted?

Would it be feasible and sensible to add a GFP_SHARED or whatever, to
communicate to the core allocators that the page is destined to be converted
to a shared page?  I assume that would provide a common place (or two) for
initiating conversions, and would hopefully allow for future optimizations,
e.g. to keep shared allocation in the same pool or whatever.  Sharing memory
without any intelligence as to what memory is converted is going to make both
the guest and host sad.
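Purely for concreteness: neither GFP_SHARED nor the helper below exists in
the kernel. This is a hypothetical sketch of the "common place for initiating
conversions" idea, built only from existing allocator and set_memory APIs.

    #include <linux/gfp.h>
    #include <linux/mm.h>
    #include <linux/set_memory.h>

    /* HYPOTHETICAL sketch -- no GFP_SHARED flag exists; illustration only. */
    static void *alloc_shared_pages_sketch(gfp_t gfp, unsigned int order)
    {
        struct page *page = alloc_pages(gfp, order);

        if (!page)
            return NULL;

        /* One choke point where the private->shared conversion happens. */
        if (set_memory_decrypted((unsigned long)page_address(page), 1 << order)) {
            /*
             * Real code might have to leak the pages here instead, since
             * their conversion state is now unknown.
             */
            __free_pages(page, order);
            return NULL;
        }

        return page_address(page);
    }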
On 5/23/23 14:25, Sean Christopherson wrote:
>> There are consequences for converting pages between shared and private.
>> Doing it on a vmalloc() mapping is guaranteed to fracture the underlying
>> EPT/SEPT mappings.
>>
>> How does this work with load_unaligned_zeropad()?  Couldn't it be
>> running around poking at one of these vmalloc()'d pages via the direct
>> map during a shared->private conversion before the page has been accepted?
> Would it be feasible and sensible to add a GFP_SHARED or whatever, to
> communicate to the core allocators that the page is destined to be converted
> to a shared page?  I assume that would provide a common place (or two) for
> initiating conversions, and would hopefully allow for future optimizations,
> e.g. to keep shared allocation in the same pool or whatever.  Sharing memory
> without any intelligence as to what memory is converted is going to make
> both the guest and host sad.

I don't think we want a GFP flag.  This is still way too specialized to
warrant one of those.

It sounds like a similar problem to what folks want for modules or BPF.
There are a bunch of allocations that are related and can have some of their
setup/teardown costs amortized if they can be clumped together.

For BPF, the costs are from doing RW=>RO in the kernel direct map, and
fracturing it in the process.

Here, the costs are from the private->shared conversions and fracturing both
the direct map and the EPT/SEPT.

I just don't know if there's anything that we can reuse from the BPF effort.
On Tue, May 23, 2023 at 01:39:11PM -0700, Dave Hansen wrote:
> On 5/4/23 15:53, Dexuan Cui wrote:
> > When a TDX guest runs on Hyper-V, the hv_netvsc driver's netvsc_init_buf()
> > allocates buffers using vzalloc(), and needs to share the buffers with the
> > host OS by calling set_memory_decrypted(), which is not working for
> > vmalloc() yet. Add the support by handling the pages one by one.
>
> I think this sets a bad precedent.
>
> There are consequences for converting pages between shared and private.
> Doing it on a vmalloc() mapping is guaranteed to fracture the underlying
> EPT/SEPT mappings.
>
> How does this work with load_unaligned_zeropad()?  Couldn't it be
> running around poking at one of these vmalloc()'d pages via the direct
> map during a shared->private conversion before the page has been accepted?

Alias processing in __change_page_attr_set_clr() will change direct mapping
if you call it on vmalloc()ed memory.  I think we are safe wrt
load_unaligned_zeropad() here.
On 5/23/23 15:37, kirill.shutemov@linux.intel.com wrote:
>> How does this work with load_unaligned_zeropad()?  Couldn't it be
>> running around poking at one of these vmalloc()'d pages via the direct
>> map during a shared->private conversion before the page has been accepted?
> Alias processing in __change_page_attr_set_clr() will change direct
> mapping if you call it on vmalloc()ed memory. I think we are safe wrt
> load_unaligned_zeropad() here.

We're *eventually* OK:

>	/* Notify hypervisor that we are about to set/clr encryption attribute. */
>	x86_platform.guest.enc_status_change_prepare(addr, numpages, enc);
>
>	ret = __change_page_attr_set_clr(&cpa, 1);

But what about in the middle between enc_status_change_prepare() and
__change_page_attr_set_clr()?  Don't the direct map and the shared/private
status of the page diverge in there?
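For readers following the thread, the code being quoted above is the
conversion path in arch/x86/mm/pat/set_memory.c. The following is an
abbreviated sketch of __set_memory_enc_pgtable() (setup, flushing and most
error handling elided), annotated to show the ordering in question; it is a
simplification, not the exact kernel code.

    static int __set_memory_enc_pgtable(unsigned long addr, int numpages, bool enc)
    {
        struct cpa_data cpa;
        int ret;

        /* ... cpa setup, alias unmapping and cache flushing elided ... */

        /* 1. Notify the hypervisor that the range is about to change state. */
        x86_platform.guest.enc_status_change_prepare(addr, numpages, enc);

        /*
         * 2. The window being asked about: the hypervisor has been notified,
         *    but the encryption attribute in the guest page tables
         *    (including the direct-map alias) has not been updated yet.
         */

        /* 3. Flip the shared/private bit in the page tables and all aliases. */
        ret = __change_page_attr_set_clr(&cpa, 1);

        /* ... TLB flush elided ... */

        /* 4. Notify the hypervisor that the conversion is complete. */
        if (!ret && !x86_platform.guest.enc_status_change_finish(addr, numpages, enc))
            ret = -EIO;

        return ret;
    }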
On Tue, 2023-05-23 at 14:33 -0700, Dave Hansen wrote:
> On 5/23/23 14:25, Sean Christopherson wrote:
> > > There are consequences for converting pages between shared and private.
> > > Doing it on a vmalloc() mapping is guaranteed to fracture the underlying
> > > EPT/SEPT mappings.
> > >
> > > How does this work with load_unaligned_zeropad()?  Couldn't it be
> > > running around poking at one of these vmalloc()'d pages via the direct
> > > map during a shared->private conversion before the page has been accepted?
> > Would it be feasible and sensible to add a GFP_SHARED or whatever, to
> > communicate to the core allocators that the page is destined to be
> > converted to a shared page?  I assume that would provide a common place
> > (or two) for initiating conversions, and would hopefully allow for future
> > optimizations, e.g. to keep shared allocation in the same pool or
> > whatever.  Sharing memory without any intelligence as to what memory is
> > converted is going to make both the guest and host sad.
>
> I don't think we want a GFP flag.  This is still way too specialized to
> warrant one of those.
>
> It sounds like a similar problem to what folks want for modules or BPF.
> There are a bunch of allocations that are related and can have some of
> their setup/teardown costs amortized if they can be clumped together.
>
> For BPF, the costs are from doing RW=>RO in the kernel direct map, and
> fracturing it in the process.
>
> Here, the costs are from the private->shared conversions and fracturing
> both the direct map and the EPT/SEPT.
>
> I just don't know if there's anything that we can reuse from the BPF effort.

Last I saw the BPF allocator was focused on module space memory and packing
multiple allocations into pages.  So that would probably have to be
generalized to support this.

But also, a lot of the efforts around improving direct map modification
efficiency have sort of assumed all of the types of special permissions could
be grouped together on the direct map and just mapped elsewhere in whatever
permission they needed.  This does seem like a special case where it really
needs to be grouped together only with similar permissions.

If a GFP flag is too precious, what about something like PKS tables[0] tried.
It kind of had a similar problem with respect to preferring physically
contiguous special permissioned memory on the direct map.  It did this with a
cache of converted pages outside the page allocator and a separate alloc()
function to get at it.  This one could be vmalloc_shared() or something.
Don't know, just tossing it out there.

[0] https://lore.kernel.org/lkml/20210830235927.6443-7-rick.p.edgecombe@intel.com/
On Tue, May 23, 2023 at 03:43:15PM -0700, Dave Hansen wrote:
> On 5/23/23 15:37, kirill.shutemov@linux.intel.com wrote:
> >> How does this work with load_unaligned_zeropad()?  Couldn't it be
> >> running around poking at one of these vmalloc()'d pages via the direct
> >> map during a shared->private conversion before the page has been accepted?
> > Alias processing in __change_page_attr_set_clr() will change direct
> > mapping if you call it on vmalloc()ed memory. I think we are safe wrt
> > load_unaligned_zeropad() here.
>
> We're *eventually* OK:
>
> >	/* Notify hypervisor that we are about to set/clr encryption attribute. */
> >	x86_platform.guest.enc_status_change_prepare(addr, numpages, enc);
> >
> >	ret = __change_page_attr_set_clr(&cpa, 1);
>
> But what about in the middle between enc_status_change_prepare() and
> __change_page_attr_set_clr()?  Don't the direct map and the
> shared/private status of the page diverge in there?

Hmm.  Maybe we would need to go through making the range in direct mapping
non-present before notifying VMM about the change.

I need to look at this again in the morning.
On Wed, May 24, 2023 at 02:28:51AM +0300, kirill.shutemov@linux.intel.com wrote:
> On Tue, May 23, 2023 at 03:43:15PM -0700, Dave Hansen wrote:
> > On 5/23/23 15:37, kirill.shutemov@linux.intel.com wrote:
> > >> How does this work with load_unaligned_zeropad()?  Couldn't it be
> > >> running around poking at one of these vmalloc()'d pages via the direct
> > >> map during a shared->private conversion before the page has been accepted?
> > > Alias processing in __change_page_attr_set_clr() will change direct
> > > mapping if you call it on vmalloc()ed memory. I think we are safe wrt
> > > load_unaligned_zeropad() here.
> >
> > We're *eventually* OK:
> >
> > >	/* Notify hypervisor that we are about to set/clr encryption attribute. */
> > >	x86_platform.guest.enc_status_change_prepare(addr, numpages, enc);
> > >
> > >	ret = __change_page_attr_set_clr(&cpa, 1);
> >
> > But what about in the middle between enc_status_change_prepare() and
> > __change_page_attr_set_clr()?  Don't the direct map and the
> > shared/private status of the page diverge in there?
>
> Hmm. Maybe we would need to go through making the range in direct mapping
> non-present before notifying VMM about the change.
>
> I need to look at this again in the morning.

Okay, I've got around to it finally.

Private->Shared conversion is safe: we first set the shared bit in the direct
mapping and all aliases and then call MapGPA in enc_status_change_finish().
So we don't have privately mapped pages that we converted to shared with
MapGPA.

Shared->Private is not safe. As with Private->Shared, we adjust the direct
mapping before notifying the VMM and accepting the memory, so there's a short
window when privately mapped memory is neither mapped into the SEPT nor
accepted. It is a problem as it can race with load_unaligned_zeropad().

Shared->Private conversion is rare. I only see one call total during the boot
in my setup. Worth fixing anyway.

The patch below fixes the issue by hooking up enc_status_change_prepare() and
doing the conversion from Shared to Private there. enc_status_change_finish()
only covers Private to Shared conversion.

The patch is on top of the unaccepted memory patchset. It is a more
convenient base. I will rebase to Linus' tree if the approach looks sane to
you.

Any comments?

diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index 32501277ef84..b73ec2449c64 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -713,16 +713,32 @@ static bool tdx_cache_flush_required(void)
 	return true;
 }
 
+static bool tdx_enc_status_change_prepare(unsigned long vaddr, int numpages,
+					  bool enc)
+{
+	phys_addr_t start = __pa(vaddr);
+	phys_addr_t end = __pa(vaddr + numpages * PAGE_SIZE);
+
+	if (!enc)
+		return true;
+
+	return tdx_enc_status_changed_phys(start, end, enc);
+}
+
 /*
  * Inform the VMM of the guest's intent for this physical page: shared with
  * the VMM or private to the guest. The VMM is expected to change its mapping
  * of the page in response.
  */
-static bool tdx_enc_status_changed(unsigned long vaddr, int numpages, bool enc)
+static bool tdx_enc_status_change_finish(unsigned long vaddr, int numpages,
+					 bool enc)
 {
 	phys_addr_t start = __pa(vaddr);
 	phys_addr_t end = __pa(vaddr + numpages * PAGE_SIZE);
 
+	if (enc)
+		return true;
+
 	return tdx_enc_status_changed_phys(start, end, enc);
 }
 
@@ -753,9 +769,10 @@ void __init tdx_early_init(void)
 	 */
 	physical_mask &= cc_mask - 1;
 
-	x86_platform.guest.enc_cache_flush_required = tdx_cache_flush_required;
-	x86_platform.guest.enc_tlb_flush_required = tdx_tlb_flush_required;
-	x86_platform.guest.enc_status_change_finish = tdx_enc_status_changed;
+	x86_platform.guest.enc_cache_flush_required = tdx_cache_flush_required;
+	x86_platform.guest.enc_tlb_flush_required = tdx_tlb_flush_required;
+	x86_platform.guest.enc_status_change_prepare = tdx_enc_status_change_prepare;
+	x86_platform.guest.enc_status_change_finish = tdx_enc_status_change_finish;
 
 	pr_info("Guest detected\n");
 }
diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
index 88085f369ff6..1ca9701917c5 100644
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -150,7 +150,7 @@ struct x86_init_acpi {
  * @enc_cache_flush_required	Returns true if a cache flush is needed before changing page encryption status
  */
 struct x86_guest {
-	void (*enc_status_change_prepare)(unsigned long vaddr, int npages, bool enc);
+	bool (*enc_status_change_prepare)(unsigned long vaddr, int npages, bool enc);
 	bool (*enc_status_change_finish)(unsigned long vaddr, int npages, bool enc);
 	bool (*enc_tlb_flush_required)(bool enc);
 	bool (*enc_cache_flush_required)(void);
diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c
index d82f4fa2f1bf..64664311ac2b 100644
--- a/arch/x86/kernel/x86_init.c
+++ b/arch/x86/kernel/x86_init.c
@@ -130,8 +130,8 @@ struct x86_cpuinit_ops x86_cpuinit = {
 
 static void default_nmi_init(void) { };
 
-static void enc_status_change_prepare_noop(unsigned long vaddr, int npages, bool enc) { }
-static bool enc_status_change_finish_noop(unsigned long vaddr, int npages, bool enc) { return false; }
+static bool enc_status_change_prepare_noop(unsigned long vaddr, int npages, bool enc) { return true; }
+static bool enc_status_change_finish_noop(unsigned long vaddr, int npages, bool enc) { return true; }
 static bool enc_tlb_flush_required_noop(bool enc) { return false; }
 static bool enc_cache_flush_required_noop(void) { return false; }
 static bool is_private_mmio_noop(u64 addr) {return false; }
diff --git a/arch/x86/mm/mem_encrypt_amd.c b/arch/x86/mm/mem_encrypt_amd.c
index e0b51c09109f..4f95c449a406 100644
--- a/arch/x86/mm/mem_encrypt_amd.c
+++ b/arch/x86/mm/mem_encrypt_amd.c
@@ -319,7 +319,7 @@ static void enc_dec_hypercall(unsigned long vaddr, int npages, bool enc)
 #endif
 }
 
-static void amd_enc_status_change_prepare(unsigned long vaddr, int npages, bool enc)
+static bool amd_enc_status_change_prepare(unsigned long vaddr, int npages, bool enc)
 {
 	/*
 	 * To maintain the security guarantees of SEV-SNP guests, make sure
@@ -327,6 +327,8 @@ static void amd_enc_status_change_prepare(unsigned long vaddr, int npages, bool
 	 */
 	if (cc_platform_has(CC_ATTR_GUEST_SEV_SNP) && !enc)
 		snp_set_memory_shared(vaddr, npages);
+
+	return true;
 }
 
 /* Return true unconditionally: return value doesn't matter for the SEV side */
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index 7159cf787613..b8f48ebe753c 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -2151,7 +2151,8 @@ static int __set_memory_enc_pgtable(unsigned long addr, int numpages, bool enc)
 		cpa_flush(&cpa, x86_platform.guest.enc_cache_flush_required());
 
 	/* Notify hypervisor that we are about to set/clr encryption attribute. */
-	x86_platform.guest.enc_status_change_prepare(addr, numpages, enc);
+	if (!x86_platform.guest.enc_status_change_prepare(addr, numpages, enc))
+		return -EIO;
 
 	ret = __change_page_attr_set_clr(&cpa, 1);
On 5/25/23 12:08, Kirill A. Shutemov wrote:
> Shared->Private conversion is rare. I only see one call total during the
> boot in my setup. Worth fixing anyway.
...
> Any comments?

So the rules are:

 * Shared mapping of a private page: BAD
 * Private mapping of a shared page: OK

?

The patch seems OK, other than having zero comments in it.
diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index 5574c91541a2..731be50b3d09 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -7,6 +7,7 @@
 #include <linux/cpufeature.h>
 #include <linux/export.h>
 #include <linux/io.h>
+#include <linux/mm.h>
 #include <asm/coco.h>
 #include <asm/tdx.h>
 #include <asm/vmx.h>
@@ -789,6 +790,34 @@ static bool try_accept_one(phys_addr_t *start, unsigned long len,
 	return true;
 }
 
+static bool try_accept_page(phys_addr_t start, phys_addr_t end)
+{
+	/*
+	 * For shared->private conversion, accept the page using
+	 * TDX_ACCEPT_PAGE TDX module call.
+	 */
+	while (start < end) {
+		unsigned long len = end - start;
+
+		/*
+		 * Try larger accepts first. It gives chance to VMM to keep
+		 * 1G/2M SEPT entries where possible and speeds up process by
+		 * cutting number of hypercalls (if successful).
+		 */
+
+		if (try_accept_one(&start, len, PG_LEVEL_1G))
+			continue;
+
+		if (try_accept_one(&start, len, PG_LEVEL_2M))
+			continue;
+
+		if (!try_accept_one(&start, len, PG_LEVEL_4K))
+			return false;
+	}
+
+	return true;
+}
+
 /*
  * Notify the VMM about page mapping conversion. More info about ABI
  * can be found in TDX Guest-Host-Communication Interface (GHCI),
@@ -838,6 +867,19 @@ static bool tdx_map_gpa(phys_addr_t start, phys_addr_t end, bool enc)
 	return !ret;
 }
 
+static bool tdx_enc_status_changed_phys(phys_addr_t start, phys_addr_t end,
+					bool enc)
+{
+	if (!tdx_map_gpa(start, end, enc))
+		return false;
+
+	/* private->shared conversion requires only MapGPA call */
+	if (!enc)
+		return true;
+
+	return try_accept_page(start, end);
+}
+
 /*
  * Inform the VMM of the guest's intent for this physical page: shared with
  * the VMM or private to the guest. The VMM is expected to change its mapping
@@ -845,37 +887,23 @@ static bool tdx_map_gpa(phys_addr_t start, phys_addr_t end, bool enc)
  */
 static bool tdx_enc_status_changed(unsigned long vaddr, int numpages, bool enc)
 {
-	phys_addr_t start = __pa(vaddr);
-	phys_addr_t end = __pa(vaddr + numpages * PAGE_SIZE);
+	unsigned long start = vaddr;
+	unsigned long end = start + numpages * PAGE_SIZE;
 
-	if (!tdx_map_gpa(start, end, enc))
+	if (offset_in_page(start) != 0)
 		return false;
 
-	/* private->shared conversion requires only MapGPA call */
-	if (!enc)
-		return true;
+	if (!is_vmalloc_addr((void *)start))
+		return tdx_enc_status_changed_phys(__pa(start), __pa(end), enc);
 
-	/*
-	 * For shared->private conversion, accept the page using
-	 * TDX_ACCEPT_PAGE TDX module call.
-	 */
 	while (start < end) {
-		unsigned long len = end - start;
+		phys_addr_t start_pa = slow_virt_to_phys((void *)start);
+		phys_addr_t end_pa = start_pa + PAGE_SIZE;
 
-		/*
-		 * Try larger accepts first. It gives chance to VMM to keep
-		 * 1G/2M SEPT entries where possible and speeds up process by
-		 * cutting number of hypercalls (if successful).
-		 */
-
-		if (try_accept_one(&start, len, PG_LEVEL_1G))
-			continue;
-
-		if (try_accept_one(&start, len, PG_LEVEL_2M))
-			continue;
-
-		if (!try_accept_one(&start, len, PG_LEVEL_4K))
+		if (!tdx_enc_status_changed_phys(start_pa, end_pa, enc))
 			return false;
+
+		start += PAGE_SIZE;
 	}
 
 	return true;
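To see why the patch switches to a page-by-page walk for vmalloc addresses,
here is a minimal sketch (the function name is made up for illustration, and
it assumes an x86 kernel context where slow_virt_to_phys() is available):
virtually contiguous vmalloc pages are generally not physically contiguous,
so a single __pa(start)..__pa(end) range only makes sense for direct-map
addresses.

    #include <linux/vmalloc.h>
    #include <linux/mm.h>
    #include <linux/printk.h>
    #include <asm/pgtable_types.h>	/* slow_virt_to_phys() */

    static void vmalloc_phys_walk_demo(void)
    {
        void *buf = vzalloc(2 * PAGE_SIZE);
        phys_addr_t pa0, pa1;

        if (!buf)
            return;

        /*
         * slow_virt_to_phys() resolves each virtual page through the page
         * tables, which is what tdx_enc_status_changed() now does for every
         * PAGE_SIZE chunk of a vmalloc range.
         */
        pa0 = slow_virt_to_phys(buf);
        pa1 = slow_virt_to_phys(buf + PAGE_SIZE);

        /* The two backing physical pages are usually not adjacent. */
        pr_info("page0 pa=%pa, page1 pa=%pa\n", &pa0, &pa1);

        vfree(buf);
    }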