From patchwork Mon Jul 10 22:33:01 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mina Almasry X-Patchwork-Id: 118134 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:a6b2:0:b0:3e4:2afc:c1 with SMTP id c18csp118330vqm; Mon, 10 Jul 2023 15:51:14 -0700 (PDT) X-Google-Smtp-Source: APBJJlEJpZf4D0VNgF69XOLXOX6QbLua+r9hzqyMsZrmWNWWFcdBul4YpvuIBwc+MIO8Belfrf0x X-Received: by 2002:aa7:d043:0:b0:51e:cc7:534d with SMTP id n3-20020aa7d043000000b0051e0cc7534dmr12550687edo.24.1689029473777; Mon, 10 Jul 2023 15:51:13 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1689029473; cv=none; d=google.com; s=arc-20160816; b=udDbC6K8bYURbeiNB2Ovh/f6wcwuLtCYh0iXhA+ZrEtHG10Vyzy4nTUV6jWy+bVOPE dg8UjCgTBGyRpB73T4rE/Ybu1gLz+iLgQXOiOhBjjaQWPcCHKIlNPSlIS2F4wAba7tvf rzgGGPTQaajygMNIlIx/APqdlsWw6eND8MJmoWHJqEHo/WHEQkSs9pN/jEQIY0BSwb3O xZg2M3AesEPAP9LPfD1+kJAe7/Mb6GHTCibklGdmq5sE18DNA+SjqfePsOlNK7Rp8KuY +4lujUnAo0JwBiGn1/M4KYuKpKWCeOs35UOK1EJxEVjlJSIznubIvs9PluniHMvxVaRN lgEg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:dkim-signature; bh=oQJYRa4cYbFlxJ1mDmha+0HxPANtp6LWuy6ed2Yjbu4=; fh=55VxVtf/wwatkgb/EP05SgBT89BC1zibP0SY8oI5qug=; b=QJ5bWPvSudrOIVhTTOeb8/ICpZdvM25hPdHUQmJJPHj7B+/ARcrzaiDJPdfdqzcecG 0GjG/xz1yU2RPR/1F5rEe2On4/hRuflpkso3Yx0a6JQXI8drr6M2pJ6AAbdcdf79jLd6 BMaHONTuK1fpM5Rf2lPf/72uMnBRtH0XIyPTEorJOqWKaOhn3XQwBRjIaMELbZA4hzpg 4EoVd7ECWNPWzhyVfzuCKTTIH/4PYlB2/kh5UYKeRjtqAeGHdn/W9phPzvB6vYn0VmAJ itf1XUCNlktFo9N9iLlLkMSzUMA/r8P/TgtKSBkGiKNnyDizLDqISifOAiaAiW08ih23 kZ1A== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20221208 header.b=CetzuFar; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id n25-20020a056402515900b0051df8f154d4si660774edd.6.2023.07.10.15.50.51; Mon, 10 Jul 2023 15:51:13 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20221208 header.b=CetzuFar; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231320AbjGJWfx (ORCPT + 99 others); Mon, 10 Jul 2023 18:35:53 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60172 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231186AbjGJWfe (ORCPT ); Mon, 10 Jul 2023 18:35:34 -0400 Received: from mail-yb1-xb49.google.com (mail-yb1-xb49.google.com [IPv6:2607:f8b0:4864:20::b49]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 44B3DE6F for ; Mon, 10 Jul 2023 15:34:58 -0700 (PDT) Received: by mail-yb1-xb49.google.com with SMTP id 3f1490d57ef6-c83284edf0eso1866419276.3 for ; Mon, 10 Jul 2023 15:34:58 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1689028463; x=1691620463; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=oQJYRa4cYbFlxJ1mDmha+0HxPANtp6LWuy6ed2Yjbu4=; b=CetzuFarHJgPUiGBwDAWLpdzt0w3V9hQbqMPhuZfIenfXirykHD2HLGYdWQSkINvJh xJG2uW65yyAAb+o06RvuUp3WpLKZAzNYnV7EddNjt5dfXFCFq8AZmqCWdcr58enYZIQE n5JK8gg0yI5kVZJS/Tnc0gRJy7edUPwC0tcHLkojiMtzkNIt/J3L+ITk1kszc47h7Vz7 7HfhvqbXuGyD3D189Y0m4jp9NOnOMRg/24QxNfglicfLYXQlHIKScA6XdF716DYeVEDF C2/NnZaF4OaiJaWHPvY80aoJTwZFVHVTEJFSi6QbKnAMO9h++YLyvxmo53PBx0z1bgkV UbJg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1689028463; x=1691620463; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=oQJYRa4cYbFlxJ1mDmha+0HxPANtp6LWuy6ed2Yjbu4=; b=ZoNBD/DbzBJO2R/c4R1o2FA6hgkHy+zfcpDfPAvZ2st4NTg1FcDxP1dOBTIzr1BwKX yjP4AgouMbjHvOjDati0bN7HVMweTlejlKBralv+RCgrOnvg/JhW0ftCBK4hiVzcQC2A 7nkv1SjQDK80g7/oBdsTQmNQCE2pJgTRXLDnm/MHwHv5HY06JxibQPUUhtSzREF7Rmx+ Y1gw83nFOyalZ6/OZ0dmgpbfIGHUnZ6N3ABawk+wjdG5kzMAK21bpx+HKeR0KMdQTn45 jjNHpkVAp5ijPiUxw2LpuAMS3tm4aLGEDbQZLZCcFEgyYpbPCetfafVKThbkqHzYz0uo NrPA== X-Gm-Message-State: ABy/qLarWce170/CrxKts4LxwKXPPaoxuulgO65vVKDXnzySGC7TtIEy WCDY5tkJwVwoSOcGyk97vSG92DE0lPNB4hDr63ZfBy2uVWOo/7+pvmpDgYZooGh4RvaGp9GPrLc 3r3MiFtNja1IP832T0esYcf+/Q3Iekzks3dHOZsYWqsgdUbJc4r3MjVB3QLgQ33hTwzLEs1f80W nLWS5QZ2U= X-Received: from almasrymina.svl.corp.google.com ([2620:15c:2c4:200:4c0f:bfb6:9942:8c53]) (user=almasrymina job=sendgmr) by 2002:a25:4252:0:b0:c6f:6ffe:f904 with SMTP id p79-20020a254252000000b00c6f6ffef904mr50579yba.9.1689028463039; Mon, 10 Jul 2023 15:34:23 -0700 (PDT) Date: Mon, 10 Jul 2023 15:33:01 -0700 In-Reply-To: <20230710223304.1174642-1-almasrymina@google.com> Mime-Version: 1.0 References: <20230710223304.1174642-1-almasrymina@google.com> X-Mailer: git-send-email 2.41.0.390.g38632f3daf-goog Message-ID: <20230710223304.1174642-11-almasrymina@google.com> Subject: [RFC PATCH 10/10] memory-provider: add dmabuf devmem provider From: Mina Almasry To: linux-kernel@vger.kernel.org, linux-media@vger.kernel.org, dri-devel@lists.freedesktop.org, linaro-mm-sig@lists.linaro.org, netdev@vger.kernel.org, linux-arch@vger.kernel.org, linux-kselftest@vger.kernel.org Cc: Mina Almasry , Sumit Semwal , " =?utf-8?q?Christian_K=C3=B6nig?= " , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Jesper Dangaard Brouer , Ilias Apalodimas , Arnd Bergmann , David Ahern , Willem de Bruijn , Shuah Khan , jgg@ziepe.ca X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF, RCVD_IN_DNSWL_BLOCKED,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE, USER_IN_DEF_DKIM_WL autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1771075769106035602 X-GMAIL-MSGID: 1771075769106035602 Use Jakub's memory provider PoC API: https://github.com/kuba-moo/linux/tree/pp-providers To implement a dmabuf devmem memory provider. The provider allocates NET_RX dmabuf pages to the page pool. This abstracts any custom memory allocation or freeing changes for devmem TCP from drivers using the page pool. The memory provider allocates NET_RX pages from the dmabuf pages provided by the driver. These pages are ZONE_DEVICE pages with the sg dma_addrs stored in the zone_device_data entry in the page. The page pool entries in struct page are in a union with the ZONE_DEVICE entries, and - without special handling - the page pool would accidentally overwrite the data in the ZONE_DEVICE fields. To solve this, the memory provider converts the page from a ZONE_DEVICE page to a ZONE_NORMAL page upon giving it to the page pool, and converts it back to ZONE_DEVICE page upon getting it back from the page pool. This is safe to do because the NET_RX pages are dmabuf pages created to hold the dma_addr in the dma_buf_map_attachement sg_table entries, and are only used with code that handles them specifically. However, since dmabuf pages can now also be page pool page, we need to update 2 places to detect this correctly: 1. is_dma_buf_page() needs to be updated to correctly detect dmabuf pages after they've been inserted into the pool. 2. dma_buf_page_to_dma_addr() needs to be updated. For page pool pages, the dma_addr exists in page->dma_addr. For non page pool pages, the dma_addr exists in page->zone_device_data. Signed-off-by: Mina Almasry --- include/linux/dma-buf.h | 29 ++++++++++- include/net/page_pool.h | 20 ++++++++ net/core/page_pool.c | 104 ++++++++++++++++++++++++++++++++++++---- 3 files changed, 143 insertions(+), 10 deletions(-) diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h index 93228a2fec47..896359fa998d 100644 --- a/include/linux/dma-buf.h +++ b/include/linux/dma-buf.h @@ -692,15 +692,26 @@ static inline bool is_dma_buf_pages_file(struct file *file) struct page *dma_buf_pages_net_rx_alloc(struct dma_buf_pages *priv); +static inline bool is_dma_buf_page_net_rx(struct page *page) +{ + struct dma_buf_pages *priv; + + return (is_page_pool_page(page) && (priv = page->pp->mp_priv) && + priv->pgmap.ops == &dma_buf_pgmap_ops); +} + static inline bool is_dma_buf_page(struct page *page) { return (is_zone_device_page(page) && page->pgmap && - page->pgmap->ops == &dma_buf_pgmap_ops); + page->pgmap->ops == &dma_buf_pgmap_ops) || + is_dma_buf_page_net_rx(page); } static inline dma_addr_t dma_buf_page_to_dma_addr(struct page *page) { - return (dma_addr_t)page->zone_device_data; + return is_dma_buf_page_net_rx(page) ? + (dma_addr_t)page->dma_addr : + (dma_addr_t)page->zone_device_data; } static inline int dma_buf_map_sg(struct device *dev, struct scatterlist *sg, @@ -718,6 +729,16 @@ static inline int dma_buf_map_sg(struct device *dev, struct scatterlist *sg, return nents; } + +static inline bool is_dma_buf_pages_priv(void *ptr) +{ + struct dma_buf_pages *priv = (struct dma_buf_pages *)ptr; + + if (!priv || priv->pgmap.ops != &dma_buf_pgmap_ops) + return false; + + return true; +} #else static inline bool is_dma_buf_page(struct page *page) { @@ -745,6 +766,10 @@ static inline struct page *dma_buf_pages_net_rx_alloc(struct dma_buf_pages *priv return NULL; } +static inline bool is_dma_buf_pages_priv(void *ptr) +{ + return false; +} #endif diff --git a/include/net/page_pool.h b/include/net/page_pool.h index 7b6668479baf..a57757a13cc8 100644 --- a/include/net/page_pool.h +++ b/include/net/page_pool.h @@ -157,6 +157,7 @@ enum pp_memory_provider_type { PP_MP_HUGE_SPLIT, /* 2MB, online page alloc */ PP_MP_HUGE, /* 2MB, all memory pre-allocated */ PP_MP_HUGE_1G, /* 1G pages, MEP, pre-allocated */ + PP_MP_DMABUF_DEVMEM, /* dmabuf devmem provider */ }; struct pp_memory_provider_ops { @@ -170,6 +171,7 @@ extern const struct pp_memory_provider_ops basic_ops; extern const struct pp_memory_provider_ops hugesp_ops; extern const struct pp_memory_provider_ops huge_ops; extern const struct pp_memory_provider_ops huge_1g_ops; +extern const struct pp_memory_provider_ops dmabuf_devmem_ops; struct page_pool { struct page_pool_params p; @@ -420,4 +422,22 @@ static inline void page_pool_nid_changed(struct page_pool *pool, int new_nid) page_pool_update_nid(pool, new_nid); } +static inline bool is_page_pool_page(struct page *page) +{ + /* page->pp_magic is OR'ed with PP_SIGNATURE after the allocation + * in order to preserve any existing bits, such as bit 0 for the + * head page of compound page and bit 1 for pfmemalloc page, so + * mask those bits for freeing side when doing below checking, + * and page_is_pfmemalloc() is checked in __page_pool_put_page() + * to avoid recycling the pfmemalloc page. + */ + if (unlikely((page->pp_magic & ~0x3UL) != PP_SIGNATURE)) + return false; + + if (!page->pp) + return false; + + return true; +} + #endif /* _NET_PAGE_POOL_H */ diff --git a/net/core/page_pool.c b/net/core/page_pool.c index df3f431fcff3..e626d4e309c1 100644 --- a/net/core/page_pool.c +++ b/net/core/page_pool.c @@ -236,6 +236,9 @@ static int page_pool_init(struct page_pool *pool, case PP_MP_HUGE_1G: pool->mp_ops = &huge_1g_ops; break; + case PP_MP_DMABUF_DEVMEM: + pool->mp_ops = &dmabuf_devmem_ops; + break; default: err = -EINVAL; goto free_ptr_ring; @@ -975,14 +978,7 @@ bool page_pool_return_skb_page(struct page *page, bool napi_safe) page = compound_head(page); - /* page->pp_magic is OR'ed with PP_SIGNATURE after the allocation - * in order to preserve any existing bits, such as bit 0 for the - * head page of compound page and bit 1 for pfmemalloc page, so - * mask those bits for freeing side when doing below checking, - * and page_is_pfmemalloc() is checked in __page_pool_put_page() - * to avoid recycling the pfmemalloc page. - */ - if (unlikely((page->pp_magic & ~0x3UL) != PP_SIGNATURE)) + if (!is_page_pool_page(page)) return false; pp = page->pp; @@ -1538,3 +1534,95 @@ const struct pp_memory_provider_ops huge_1g_ops = { .alloc_pages = mp_huge_1g_alloc_pages, .release_page = mp_huge_1g_release, }; + +/*** "Dmabuf devmem page" ***/ + +/* Dmabuf devmem memory provider allocates DMA_BUF_PAGES_NET_RX pages which are + * backing the dma_buf_map_attachment() from the NIC to the device memory. + * + * These pages are wrappers around the dma_addr of the sg entries in the + * sg_table returned from dma_buf_map_attachment(). They can be passed to the + * networking stack, which will generate devmem skbs from them and process them + * correctly. + */ +static int mp_dmabuf_devmem_init(struct page_pool *pool) +{ + struct dma_buf_pages *priv; + + priv = pool->mp_priv; + if (!is_dma_buf_pages_priv(priv)) + return -EINVAL; + + return 0; +} + +static void mp_dmabuf_devmem_destroy(struct page_pool *pool) +{ +} + +static struct page *mp_dmabuf_devmem_alloc_pages(struct page_pool *pool, + gfp_t gfp) +{ + struct dma_buf_pages *priv = pool->mp_priv; + dma_addr_t dma_addr; + struct page *page; + + page = dma_buf_pages_net_rx_alloc(priv); + if (!page) + return page; + + /* It shouldn't be possible for the allocation to give us a page not + * belonging to this page_pool's pgmap. + */ + BUG_ON(page->pgmap != &priv->pgmap); + + /* netdev_rxq_alloc_dma_buf_page() allocates a ZONE_DEVICE page. + * Prepare to convert it into a page_pool page. We need to hold pgmap + * and zone_device_data (which holds the dma_addr). + * + * DMA_BUF_PAGES_NET_RX are dmabuf pages created specifically to wrap + * the dma_addr of the sg_table into a struct page. These pages are + * used by code specifically equipped to handle them, so this + * conversation from ZONE_DEVICE page to page pool page should be safe. + */ + dma_addr = (dma_addr_t)page->zone_device_data; + + set_page_zone(page, ZONE_NORMAL); + page->pp_magic = 0; + page_pool_set_pp_info(pool, page); + + page->dma_addr = dma_addr; + + return page; +} + +static bool mp_dmabuf_devmem_release_page(struct page_pool *pool, + struct page *page) +{ + struct dma_buf_pages *priv = pool->mp_priv; + unsigned long dma_addr = page->dma_addr; + + page_pool_clear_pp_info(page); + + /* As the page pool releases the page, restore it back to a ZONE_DEVICE + * page so it gets freed according to the + * page->pgmap->ops->page_free(). + */ + set_page_zone(page, ZONE_DEVICE); + page->zone_device_data = (void*)dma_addr; + page->pgmap = &priv->pgmap; + put_page(page); + + /* Return false here as we don't want the page pool touching the page + * after it's released to us. + */ + return false; +} + +const struct pp_memory_provider_ops dmabuf_devmem_ops = { + .init = mp_dmabuf_devmem_init, + .destroy = mp_dmabuf_devmem_destroy, + .alloc_pages = mp_dmabuf_devmem_alloc_pages, + .release_page = mp_dmabuf_devmem_release_page, +}; +EXPORT_SYMBOL(dmabuf_devmem_ops);