From patchwork Mon Jul 10 22:32:52 2023
X-Patchwork-Submitter: Mina Almasry <almasrymina@google.com>
X-Patchwork-Id: 118132
Date: Mon, 10 Jul 2023 15:32:52 -0700
In-Reply-To: <20230710223304.1174642-1-almasrymina@google.com>
References: <20230710223304.1174642-1-almasrymina@google.com>
Message-ID: <20230710223304.1174642-2-almasrymina@google.com>
Subject: [RFC PATCH 01/10] dma-buf: add support for paged attachment mappings
From: Mina Almasry <almasrymina@google.com>
To: linux-kernel@vger.kernel.org, linux-media@vger.kernel.org, dri-devel@lists.freedesktop.org, linaro-mm-sig@lists.linaro.org, netdev@vger.kernel.org, linux-arch@vger.kernel.org, linux-kselftest@vger.kernel.org
Cc: Mina Almasry, Sumit Semwal, Christian König, "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Jesper Dangaard Brouer, Ilias Apalodimas, Arnd Bergmann, David Ahern, Willem de Bruijn, Shuah Khan, jgg@ziepe.ca

Currently dmabuf p2p memory doesn't present itself in the form of struct
pages, so it can't easily be used with code that expects memory in that
form.

Add support for paged attachment mappings. We use the existing dmabuf
APIs to create a mapped attachment (dma_buf_attach() &
dma_buf_map_attachment()), and we create struct pages for this mapping.
We write the dma_addrs from the sg_table into the created pages. These
pages can then be passed into code that expects struct pages and can
largely operate on them with minimal modifications:

1. The pages need not be dma mapped. The dma addr can be queried from
   page->zone_device_data and used directly.
2. The pages are not kmappable.

Add a new ioctl that enables the user to create a struct-page-backed
dmabuf attachment mapping. This ioctl returns a new file descriptor
which represents the dmabuf pages. The pages are freed when (a) the user
closes the file, and (b) the struct pages backing the dmabuf are no
longer in use. Once the pages are no longer in use, the mapped
attachment is removed.

The support added in this patch should be generic - the pages are
created by the base code, but the user specifies the type of page to
create using the dmabuf_create_pages_info->type flag. The base code
hands off any use-case-specific handling to the ops of that page type.
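For illustration, a minimal userspace sketch of driving the new ioctl
(assumptions: dmabuf_fd comes from some exporter and the BDF values are
placeholders; only the struct and ioctl number come from this patch, and
with this patch alone every type is still rejected with -EINVAL until a
page type is added by a later patch):

    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <linux/dma-buf.h>

    int create_dmabuf_pages(int dmabuf_fd, uint8_t bus, uint8_t slot,
                            uint8_t fn, uint32_t type, uint64_t flags)
    {
            struct dma_buf_create_pages_info info = {
                    .pci_bdf = { bus, slot, fn }, /* device to attach/map for */
                    .dma_buf_fd = dmabuf_fd,
                    .type = type,             /* page type, e.g. a NET_* type */
                    .create_flags = flags,    /* interpreted by the page type */
            };

            /* On success: a new fd representing the dmabuf pages. Closing
             * it (and dropping all page refs) tears the mapping down. */
            return ioctl(dmabuf_fd, DMA_BUF_CREATE_PAGES, &info);
    }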
Signed-off-by: Mina Almasry <almasrymina@google.com>
---
 drivers/dma-buf/dma-buf.c    | 223 +++++++++++++++++++++++++++++++++++
 include/linux/dma-buf.h      |  90 ++++++++++++++
 include/uapi/linux/dma-buf.h |   9 ++
 3 files changed, 322 insertions(+)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index aa4ea8530cb3..50b1d813cf5c 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -22,6 +22,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -442,12 +443,16 @@ static long dma_buf_import_sync_file(struct dma_buf *dmabuf,
 }
 #endif

+static long dma_buf_create_pages(struct file *file,
+                                 struct dma_buf_create_pages_info *create_info);
+
 static long dma_buf_ioctl(struct file *file,
                           unsigned int cmd, unsigned long arg)
 {
         struct dma_buf *dmabuf;
         struct dma_buf_sync sync;
         enum dma_data_direction direction;
+        struct dma_buf_create_pages_info create_info;
         int ret;

         dmabuf = file->private_data;
@@ -484,6 +489,12 @@ static long dma_buf_ioctl(struct file *file,
         case DMA_BUF_SET_NAME_A:
         case DMA_BUF_SET_NAME_B:
                 return dma_buf_set_name(dmabuf, (const char __user *)arg);
+        case DMA_BUF_CREATE_PAGES:
+                if (copy_from_user(&create_info, (void __user *)arg,
+                                   sizeof(create_info)))
+                        return -EFAULT;
+
+                return dma_buf_create_pages(file, &create_info);

 #if IS_ENABLED(CONFIG_SYNC_FILE)
         case DMA_BUF_IOCTL_EXPORT_SYNC_FILE:
@@ -1613,6 +1624,218 @@ void dma_buf_vunmap_unlocked(struct dma_buf *dmabuf, struct iosys_map *map)
 }
 EXPORT_SYMBOL_NS_GPL(dma_buf_vunmap_unlocked, DMA_BUF);

+static int dma_buf_pages_release(struct inode *inode, struct file *file)
+{
+        struct dma_buf_pages *priv = file->private_data;
+
+        if (priv->type_ops->dma_buf_pages_release)
+                priv->type_ops->dma_buf_pages_release(priv, file);
+
+        percpu_ref_kill(&priv->pgmap.ref);
+        /* Drop initial ref after percpu_ref_kill(). */
+        percpu_ref_put(&priv->pgmap.ref);
+
+        return 0;
+}
+
+static void dma_buf_page_free(struct page *page)
+{
+        struct dma_buf_pages *priv;
+        struct dev_pagemap *pgmap;
+
+        pgmap = page->pgmap;
+        priv = container_of(pgmap, struct dma_buf_pages, pgmap);
+
+        if (priv->type_ops->dma_buf_page_free)
+                priv->type_ops->dma_buf_page_free(priv, page);
+}
+
+const struct dev_pagemap_ops dma_buf_pgmap_ops = {
+        .page_free = dma_buf_page_free,
+};
+EXPORT_SYMBOL_GPL(dma_buf_pgmap_ops);
+
+const struct file_operations dma_buf_pages_fops = {
+        .release = dma_buf_pages_release,
+};
+EXPORT_SYMBOL_GPL(dma_buf_pages_fops);
+
+#ifdef CONFIG_ZONE_DEVICE
+static void dma_buf_pages_destroy(struct percpu_ref *ref)
+{
+        struct dma_buf_pages *priv;
+        struct dev_pagemap *pgmap;
+
+        pgmap = container_of(ref, struct dev_pagemap, ref);
+        priv = container_of(pgmap, struct dma_buf_pages, pgmap);
+
+        if (priv->type_ops->dma_buf_pages_destroy)
+                priv->type_ops->dma_buf_pages_destroy(priv);
+
+        dma_buf_unmap_attachment(priv->attachment, priv->sgt, priv->direction);
+        dma_buf_detach(priv->dmabuf, priv->attachment);
+        dma_buf_put(priv->dmabuf);
+        pci_dev_put(priv->pci_dev);
+
+        /* Free priv last: the teardown calls above still dereference it. */
+        kvfree(priv->pages);
+        kfree(priv);
+}
+
+static long dma_buf_create_pages(struct file *file,
+                                 struct dma_buf_create_pages_info *create_info)
+{
+        int err, fd, i, pg_idx;
+        struct scatterlist *sg;
+        struct dma_buf_pages *priv;
+        struct file *new_file;
+
+        fd = get_unused_fd_flags(O_RDWR | O_CLOEXEC);
+        if (fd < 0) {
+                err = fd;
+                goto out_err;
+        }
+
+        priv = kzalloc(sizeof(*priv), GFP_KERNEL);
+        if (!priv) {
+                err = -ENOMEM;
+                goto out_put_fd;
+        }
+
+        priv->pgmap.type = MEMORY_DEVICE_PRIVATE;
+        priv->pgmap.ops = &dma_buf_pgmap_ops;
+        init_completion(&priv->pgmap.done);
+
+        /* This refcount is incremented every time a page in priv->pages is
+         * allocated, and decremented every time a page is freed. When
+         * it drops to 0, the dma_buf_pages can be destroyed. An initial ref
+         * is held and the dma_buf_pages is not destroyed until that is
+         * dropped.
+         */
+        err = percpu_ref_init(&priv->pgmap.ref, dma_buf_pages_destroy, 0,
+                              GFP_KERNEL);
+        if (err)
+                goto out_free_priv;
+
+        /* Initial ref to be dropped after percpu_ref_kill(). */
+        percpu_ref_get(&priv->pgmap.ref);
+
+        priv->pci_dev = pci_get_domain_bus_and_slot(
+                0, create_info->pci_bdf[0],
+                PCI_DEVFN(create_info->pci_bdf[1], create_info->pci_bdf[2]));
+        if (!priv->pci_dev) {
+                err = -ENODEV;
+                goto out_exit_percpu_ref;
+        }
+
+        priv->dmabuf = dma_buf_get(create_info->dma_buf_fd);
+        if (IS_ERR(priv->dmabuf)) {
+                err = PTR_ERR(priv->dmabuf);
+                goto out_put_pci_dev;
+        }
+
+        if (priv->dmabuf->size % PAGE_SIZE != 0) {
+                err = -EINVAL;
+                goto out_put_dma_buf;
+        }
+
+        priv->attachment = dma_buf_attach(priv->dmabuf, &priv->pci_dev->dev);
+        if (IS_ERR(priv->attachment)) {
+                err = PTR_ERR(priv->attachment);
+                goto out_put_dma_buf;
+        }
+
+        priv->num_pages = priv->dmabuf->size / PAGE_SIZE;
+        priv->pages = kvmalloc_array(priv->num_pages, sizeof(struct page),
+                                     GFP_KERNEL);
+        if (!priv->pages) {
+                err = -ENOMEM;
+                goto out_detach_dma_buf;
+        }
+
+        for (i = 0; i < priv->num_pages; i++) {
+                struct page *page = &priv->pages[i];
+
+                mm_zero_struct_page(page);
+                set_page_zone(page, ZONE_DEVICE);
+                set_page_count(page, 1);
+                page->pgmap = &priv->pgmap;
+        }
+
+        priv->direction = DMA_BIDIRECTIONAL;
+        priv->sgt = dma_buf_map_attachment(priv->attachment, priv->direction);
+        if (IS_ERR(priv->sgt)) {
+                err = PTR_ERR(priv->sgt);
+                goto out_free_pages;
+        }
+
+        /* write each dma address from the sgt to each page */
+        pg_idx = 0;
+        for_each_sgtable_dma_sg(priv->sgt, sg, i) {
+                size_t len = sg_dma_len(sg);
+                dma_addr_t dma_addr = sg_dma_address(sg);
+
+                BUG_ON(!PAGE_ALIGNED(len));
+                while (len > 0) {
+                        priv->pages[pg_idx].zone_device_data = (void *)dma_addr;
+                        pg_idx++;
+                        dma_addr += PAGE_SIZE;
+                        len -= PAGE_SIZE;
+                }
+        }
+
+        new_file = anon_inode_getfile("[dma_buf_pages]", &dma_buf_pages_fops,
+                                      (void *)priv, O_RDWR | O_CLOEXEC);
+        if (IS_ERR(new_file)) {
+                err = PTR_ERR(new_file);
+                goto out_unmap_dma_buf;
+        }
+
+        priv->type = create_info->type;
+        priv->create_flags = create_info->create_flags;
+
+        switch (priv->type) {
+        default:
+                err = -EINVAL;
+                goto out_put_new_file;
+        }
+
+        if (priv->type_ops->dma_buf_pages_init) {
+                err = priv->type_ops->dma_buf_pages_init(priv, new_file);
+                if (err)
+                        goto out_put_new_file;
+        }
+
+        fd_install(fd, new_file);
+        return fd;
+
+out_put_new_file:
+        fput(new_file);
+out_unmap_dma_buf:
+        dma_buf_unmap_attachment(priv->attachment, priv->sgt, priv->direction);
+out_free_pages:
+        kvfree(priv->pages);
+out_detach_dma_buf:
+        dma_buf_detach(priv->dmabuf, priv->attachment);
+out_put_dma_buf:
+        dma_buf_put(priv->dmabuf);
+out_put_pci_dev:
+        pci_dev_put(priv->pci_dev);
+out_exit_percpu_ref:
+        percpu_ref_exit(&priv->pgmap.ref);
+out_free_priv:
+        kfree(priv);
+out_put_fd:
+        put_unused_fd(fd);
+out_err:
+        return err;
+}
+#else
+static long dma_buf_create_pages(struct file *file,
+                                 struct dma_buf_create_pages_info *create_info)
+{
+        return -ENOTSUPP;
+}
+#endif
+
 #ifdef CONFIG_DEBUG_FS
 static int dma_buf_debug_show(struct seq_file *s, void *unused)
 {
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index 3f31baa3293f..5789006180ea 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -540,6 +540,36 @@ struct dma_buf_export_info {
         void *priv;
 };

+struct dma_buf_pages;
+
+struct dma_buf_pages_type_ops {
+        int (*dma_buf_pages_init)(struct dma_buf_pages *priv,
+                                  struct file *file);
+        void (*dma_buf_pages_release)(struct dma_buf_pages *priv,
+                                      struct file *file);
+        void (*dma_buf_pages_destroy)(struct dma_buf_pages *priv);
+        void (*dma_buf_page_free)(struct dma_buf_pages *priv,
+                                  struct page *page);
+};
+
+struct dma_buf_pages {
+        /* fields for dmabuf */
+        struct dma_buf *dmabuf;
+        struct dma_buf_attachment *attachment;
+        struct sg_table *sgt;
+        struct pci_dev *pci_dev;
+        enum dma_data_direction direction;
+
+        /* fields for dma-buf pages */
+        size_t num_pages;
+        struct page *pages;
+        struct dev_pagemap pgmap;
+
+        unsigned int type;
+        const struct dma_buf_pages_type_ops *type_ops;
+        __u64 create_flags;
+};
+
 /**
  * DEFINE_DMA_BUF_EXPORT_INFO - helper macro for exporters
  * @name: export-info name
@@ -631,4 +661,64 @@ int dma_buf_vmap(struct dma_buf *dmabuf, struct iosys_map *map);
 void dma_buf_vunmap(struct dma_buf *dmabuf, struct iosys_map *map);
 int dma_buf_vmap_unlocked(struct dma_buf *dmabuf, struct iosys_map *map);
 void dma_buf_vunmap_unlocked(struct dma_buf *dmabuf, struct iosys_map *map);
+
+#ifdef CONFIG_DMA_SHARED_BUFFER
+extern const struct file_operations dma_buf_pages_fops;
+extern const struct dev_pagemap_ops dma_buf_pgmap_ops;
+
+static inline bool is_dma_buf_pages_file(struct file *file)
+{
+        return file->f_op == &dma_buf_pages_fops;
+}
+
+static inline bool is_dma_buf_page(struct page *page)
+{
+        return (is_zone_device_page(page) && page->pgmap &&
+                page->pgmap->ops == &dma_buf_pgmap_ops);
+}
+
+static inline dma_addr_t dma_buf_page_to_dma_addr(struct page *page)
+{
+        return (dma_addr_t)page->zone_device_data;
+}
+
+static inline int dma_buf_map_sg(struct device *dev, struct scatterlist *sg,
+                                 int nents, enum dma_data_direction dir)
+{
+        struct scatterlist *s;
+        int i;
+
+        for_each_sg(sg, s, nents, i) {
+                struct page *pg = sg_page(s);
+
+                s->dma_address = dma_buf_page_to_dma_addr(pg);
+                sg_dma_len(s) = s->length;
+        }
+
+        return nents;
+}
+#else
+static inline bool is_dma_buf_page(struct page *page)
+{
+        return false;
+}
+
+static inline bool is_dma_buf_pages_file(struct file *file)
+{
+        return false;
+}
+
+static inline dma_addr_t dma_buf_page_to_dma_addr(struct page *page)
+{
+        return 0;
+}
+
+static inline int dma_buf_map_sg(struct device *dev, struct scatterlist *sg,
+                                 int nents, enum dma_data_direction dir)
+{
+        return 0;
+}
+#endif
+
 #endif /* __DMA_BUF_H__ */
diff --git a/include/uapi/linux/dma-buf.h b/include/uapi/linux/dma-buf.h
index 5a6fda66d9ad..d0f63a2ab7e4 100644
--- a/include/uapi/linux/dma-buf.h
+++ b/include/uapi/linux/dma-buf.h
@@ -179,4 +179,13 @@ struct dma_buf_import_sync_file {
 #define DMA_BUF_IOCTL_EXPORT_SYNC_FILE  _IOWR(DMA_BUF_BASE, 2, struct dma_buf_export_sync_file)
 #define DMA_BUF_IOCTL_IMPORT_SYNC_FILE  _IOW(DMA_BUF_BASE, 3, struct dma_buf_import_sync_file)

+struct dma_buf_create_pages_info {
+        __u8 pci_bdf[3];
+        __s32 dma_buf_fd;
+        __u32 type;
+        __u64 create_flags;
+};
+
+#define DMA_BUF_CREATE_PAGES    _IOW(DMA_BUF_BASE, 4, struct dma_buf_create_pages_info)
+
 #endif
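To make the import side concrete, a hedged sketch of how consuming kernel
code might use the helpers above (example_page_to_hw_addr() and its
fallback path are hypothetical; is_dma_buf_page() and
dma_buf_page_to_dma_addr() are from this patch):

    /* Derive a device address for one page: dmabuf pages carry their
     * DMA address in page->zone_device_data, so no dma_map_page() call
     * is made for them. */
    static dma_addr_t example_page_to_hw_addr(struct device *dev,
                                              struct page *page)
    {
            if (is_dma_buf_page(page))
                    return dma_buf_page_to_dma_addr(page);

            /* ordinary host page: regular DMA API */
            return dma_map_page(dev, page, 0, PAGE_SIZE, DMA_BIDIRECTIONAL);
    }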
From patchwork Mon Jul 10 22:32:53 2023
X-Patchwork-Submitter: Mina Almasry <almasrymina@google.com>
X-Patchwork-Id: 118126
Date: Mon, 10 Jul 2023 15:32:53 -0700
In-Reply-To: <20230710223304.1174642-1-almasrymina@google.com>
References: <20230710223304.1174642-1-almasrymina@google.com>
Message-ID: <20230710223304.1174642-3-almasrymina@google.com>
Subject: [RFC PATCH 02/10] dma-buf: add support for NET_RX pages
From: Mina Almasry <almasrymina@google.com>
To: linux-kernel@vger.kernel.org, linux-media@vger.kernel.org, dri-devel@lists.freedesktop.org, linaro-mm-sig@lists.linaro.org, netdev@vger.kernel.org, linux-arch@vger.kernel.org, linux-kselftest@vger.kernel.org
Cc: Mina Almasry, Sumit Semwal, Christian König, "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Jesper Dangaard Brouer, Ilias Apalodimas, Arnd Bergmann, David Ahern, Willem de Bruijn, Shuah Khan, jgg@ziepe.ca

Use the paged attachment mappings support to create NET_RX pages.
NET_RX pages are pages that can be used in the networking receive path:
bind the pages to the driver's rx queues specified by the create_flags
param, and create a gen_pool to hold the free pages available for the
driver to allocate.
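Reading dma_buf_pages_net_rx_init() below, create_flags is interpreted
as a bitmask of rx queue indices (bit N binds rx queue N). A hedged
sketch of the corresponding userspace call (the fd and BDF values are
placeholders; the struct, type, and ioctl come from this series):

    struct dma_buf_create_pages_info info = {
            .pci_bdf = { 0x3b, 0x00, 0x0 },            /* placeholder NIC */
            .dma_buf_fd = dmabuf_fd,                   /* placeholder fd */
            .type = DMA_BUF_PAGES_NET_RX,
            .create_flags = (1ULL << 0) | (1ULL << 3), /* rx queues 0 and 3 */
    };
    int pages_fd = ioctl(dmabuf_fd, DMA_BUF_CREATE_PAGES, &info);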
Signed-off-by: Mina Almasry <almasrymina@google.com>
---
 drivers/dma-buf/dma-buf.c    | 174 +++++++++++++++++++++++++++++++++++
 include/linux/dma-buf.h      |  20 ++++
 include/linux/netdevice.h    |   1 +
 include/uapi/linux/dma-buf.h |   2 +
 4 files changed, 197 insertions(+)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 50b1d813cf5c..acb86bf406f4 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -27,6 +27,7 @@
 #include
 #include
 #include
+#include
 #include
 #include

@@ -1681,6 +1682,8 @@ static void dma_buf_pages_destroy(struct percpu_ref *ref)
         kfree(priv);
 }

+const struct dma_buf_pages_type_ops net_rx_ops;
+
 static long dma_buf_create_pages(struct file *file,
                                  struct dma_buf_create_pages_info *create_info)
 {
@@ -1793,6 +1796,9 @@ static long dma_buf_create_pages(struct file *file,
         priv->create_flags = create_info->create_flags;

         switch (priv->type) {
+        case DMA_BUF_PAGES_NET_RX:
+                priv->type_ops = &net_rx_ops;
+                break;
         default:
                 err = -EINVAL;
                 goto out_put_new_file;
@@ -1966,3 +1972,171 @@ static void __exit dma_buf_deinit(void)
         dma_buf_uninit_sysfs_statistics();
 }
 __exitcall(dma_buf_deinit);
+
+/********************************
+ *     dma_buf_pages_net_rx     *
+ ********************************/
+
+void dma_buf_pages_net_rx_release(struct dma_buf_pages *priv, struct file *file)
+{
+        struct netdev_rx_queue *rxq;
+        unsigned long xa_idx;
+
+        xa_for_each(&priv->net_rx.bound_rxq_list, xa_idx, rxq)
+                if (rxq->dmabuf_pages == file)
+                        rxq->dmabuf_pages = NULL;
+}
+
+static int dev_is_class(struct device *dev, void *class)
+{
+        if (dev->class != NULL && !strcmp(dev->class->name, class))
+                return 1;
+
+        return 0;
+}
+
+int dma_buf_pages_net_rx_init(struct dma_buf_pages *priv, struct file *file)
+{
+        struct netdev_rx_queue *rxq;
+        struct net_device *netdev;
+        int xa_id, err, rxq_idx;
+        struct device *device;
+
+        priv->net_rx.page_pool =
+                gen_pool_create(PAGE_SHIFT, dev_to_node(&priv->pci_dev->dev));
+        if (!priv->net_rx.page_pool)
+                return -ENOMEM;
+
+        /*
+         * We start with PAGE_SIZE instead of 0 since gen_pool_alloc_*()
+         * returns NULL on error
+         */
+        err = gen_pool_add_virt(priv->net_rx.page_pool, PAGE_SIZE, 0,
+                                PAGE_SIZE * priv->num_pages,
+                                dev_to_node(&priv->pci_dev->dev));
+        if (err)
+                goto out_destroy_pool;
+
+        xa_init_flags(&priv->net_rx.bound_rxq_list, XA_FLAGS_ALLOC);
+
+        device = device_find_child(&priv->pci_dev->dev, "net", dev_is_class);
+        if (!device) {
+                err = -ENODEV;
+                goto out_destroy_xarray;
+        }
+
+        netdev = to_net_dev(device);
+        if (!netdev) {
+                err = -ENODEV;
+                goto out_put_dev;
+        }
+
+        for (rxq_idx = 0; rxq_idx < (sizeof(priv->create_flags) * 8);
+             rxq_idx++) {
+                if (!(priv->create_flags & (1ULL << rxq_idx)))
+                        continue;
+
+                if (rxq_idx >= netdev->num_rx_queues) {
+                        err = -ERANGE;
+                        goto out_release_rx;
+                }
+
+                rxq = __netif_get_rx_queue(netdev, rxq_idx);
+
+                err = xa_alloc(&priv->net_rx.bound_rxq_list, &xa_id, rxq,
+                               xa_limit_32b, GFP_KERNEL);
+                if (err)
+                        goto out_release_rx;
+
+                /* We previously have done a dma_buf_attach(), which validates
+                 * that the net_device we're trying to attach to can reach the
+                 * dmabuf, so we don't need to check here as well. */
+                rxq->dmabuf_pages = file;
+        }
+        put_device(device);
+        return 0;
+
+out_release_rx:
+        dma_buf_pages_net_rx_release(priv, file);
+out_put_dev:
+        put_device(device);
+out_destroy_xarray:
+        xa_destroy(&priv->net_rx.bound_rxq_list);
+out_destroy_pool:
+        gen_pool_destroy(priv->net_rx.page_pool);
+        return err;
+}
+
+void dma_buf_pages_net_rx_free(struct dma_buf_pages *priv)
+{
+        xa_destroy(&priv->net_rx.bound_rxq_list);
+        gen_pool_destroy(priv->net_rx.page_pool);
+}
+
+static unsigned long dma_buf_page_to_gen_pool_addr(struct page *page)
+{
+        struct dma_buf_pages *priv;
+        struct dev_pagemap *pgmap;
+        unsigned long offset;
+
+        pgmap = page->pgmap;
+        priv = container_of(pgmap, struct dma_buf_pages, pgmap);
+        offset = page - priv->pages;
+        /* Offset + 1 is due to the fact that we want to avoid 0 virt address
+         * returned from the gen_pool. The gen_pool returns 0 on error, and
+         * virt address 0 is indistinguishable from an error.
+         */
+        return (offset + 1) << PAGE_SHIFT;
+}
+
+static struct page *
+dma_buf_gen_pool_addr_to_page(unsigned long addr, struct dma_buf_pages *priv)
+{
+        /* - 1 is due to the fact that we want to avoid 0 virt address
+         * returned from the gen_pool. See comment in
+         * dma_buf_page_to_gen_pool_addr() for details.
+         */
+        unsigned long offset = (addr >> PAGE_SHIFT) - 1;
+
+        return &priv->pages[offset];
+}
+
+void dma_buf_page_free_net_rx(struct dma_buf_pages *priv, struct page *page)
+{
+        unsigned long addr = dma_buf_page_to_gen_pool_addr(page);
+
+        if (gen_pool_has_addr(priv->net_rx.page_pool, addr, PAGE_SIZE))
+                gen_pool_free(priv->net_rx.page_pool, addr, PAGE_SIZE);
+}
+
+const struct dma_buf_pages_type_ops net_rx_ops = {
+        .dma_buf_pages_init = dma_buf_pages_net_rx_init,
+        .dma_buf_pages_release = dma_buf_pages_net_rx_release,
+        .dma_buf_pages_destroy = dma_buf_pages_net_rx_free,
+        .dma_buf_page_free = dma_buf_page_free_net_rx,
+};
+
+struct page *dma_buf_pages_net_rx_alloc(struct dma_buf_pages *priv)
+{
+        unsigned long gen_pool_addr;
+        struct page *pg;
+
+        if (!(priv->type & DMA_BUF_PAGES_NET_RX))
+                return NULL;
+
+        gen_pool_addr = gen_pool_alloc(priv->net_rx.page_pool, PAGE_SIZE);
+        if (!gen_pool_addr)
+                return NULL;
+
+        if (!PAGE_ALIGNED(gen_pool_addr)) {
+                net_err_ratelimited("dmabuf page pool allocation not aligned");
+                gen_pool_free(priv->net_rx.page_pool, gen_pool_addr,
+                              PAGE_SIZE);
+                return NULL;
+        }
+
+        pg = dma_buf_gen_pool_addr_to_page(gen_pool_addr, priv);
+
+        percpu_ref_get(&priv->pgmap.ref);
+        return pg;
+}
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index 5789006180ea..e8e66d6407d0 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -22,6 +22,9 @@
 #include
 #include
 #include
+#include
+#include
+#include

 struct device;
 struct dma_buf;
@@ -552,6 +555,11 @@ struct dma_buf_pages_type_ops {
                                   struct page *page);
 };

+struct dma_buf_pages_net_rx {
+        struct gen_pool *page_pool;
+        struct xarray bound_rxq_list;
+};
+
 struct dma_buf_pages {
         /* fields for dmabuf */
         struct dma_buf *dmabuf;
@@ -568,6 +576,10 @@ struct dma_buf_pages {
         unsigned int type;
         const struct dma_buf_pages_type_ops *type_ops;
         __u64 create_flags;
+
+        union {
+                struct dma_buf_pages_net_rx net_rx;
+        };
 };

 /**
@@ -671,6 +683,8 @@ static inline bool is_dma_buf_pages_file(struct file *file)
         return file->f_op == &dma_buf_pages_fops;
 }

+struct page *dma_buf_pages_net_rx_alloc(struct dma_buf_pages *priv);
+
 static inline bool is_dma_buf_page(struct page *page)
 {
         return (is_zone_device_page(page) && page->pgmap &&
@@ -718,6 +732,12 @@ static inline int dma_buf_map_sg(struct device *dev, struct scatterlist *sg,
 {
         return 0;
 }
+
+static inline struct page *dma_buf_pages_net_rx_alloc(struct dma_buf_pages *priv)
+{
+        return NULL;
+}
+
 #endif
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index c2f0c6002a84..7a087ffa9baa 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -796,6 +796,7 @@ struct netdev_rx_queue {
 #ifdef CONFIG_XDP_SOCKETS
         struct xsk_buff_pool            *pool;
 #endif
+        struct file __rcu               *dmabuf_pages;
 } ____cacheline_aligned_in_smp;

 /*
diff --git a/include/uapi/linux/dma-buf.h b/include/uapi/linux/dma-buf.h
index d0f63a2ab7e4..b392cef9d3c6 100644
--- a/include/uapi/linux/dma-buf.h
+++ b/include/uapi/linux/dma-buf.h
@@ -186,6 +186,8 @@ struct dma_buf_create_pages_info {
         __u64 create_flags;
 };

+#define DMA_BUF_PAGES_NET_RX (1 << 0)
+
 #define DMA_BUF_CREATE_PAGES    _IOW(DMA_BUF_BASE, 4, struct dma_buf_create_pages_info)

 #endif
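To show where the allocator slots in, a hedged sketch of a driver rx
refill path (struct example_ring and the function are hypothetical;
dma_buf_pages_net_rx_alloc() and dma_buf_page_to_dma_addr() are from
this series):

    struct example_ring {           /* hypothetical rx descriptor ring */
            dma_addr_t *slot_addr;
            struct page **slot_page;
            unsigned int head, size;
    };

    /* Pull a free dmabuf page from the gen_pool and post its device
     * address to the ring. The DMA address was precomputed when the
     * pages were created, so there is no dma_map_page() here. */
    static int example_refill_rx_slot(struct example_ring *ring,
                                      struct dma_buf_pages *priv)
    {
            struct page *page = dma_buf_pages_net_rx_alloc(priv);

            if (!page)
                    return -ENOMEM; /* pool exhausted */

            ring->slot_addr[ring->head] = dma_buf_page_to_dma_addr(page);
            ring->slot_page[ring->head] = page;
            ring->head = (ring->head + 1) % ring->size;
            return 0;
    }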
From patchwork Mon Jul 10 22:32:54 2023
X-Patchwork-Submitter: Mina Almasry <almasrymina@google.com>
X-Patchwork-Id: 118133
Date: Mon, 10 Jul 2023 15:32:54 -0700
In-Reply-To: <20230710223304.1174642-1-almasrymina@google.com>
References: <20230710223304.1174642-1-almasrymina@google.com>
Message-ID: <20230710223304.1174642-4-almasrymina@google.com>
Subject: [RFC PATCH 03/10] dma-buf: add support for NET_TX pages
From: Mina Almasry <almasrymina@google.com>
To: linux-kernel@vger.kernel.org, linux-media@vger.kernel.org, dri-devel@lists.freedesktop.org, linaro-mm-sig@lists.linaro.org, netdev@vger.kernel.org, linux-arch@vger.kernel.org, linux-kselftest@vger.kernel.org
Cc: Mina Almasry, Sumit Semwal, Christian König, "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Jesper Dangaard Brouer, Ilias Apalodimas, Arnd Bergmann, David Ahern, Willem de Bruijn, Shuah Khan, jgg@ziepe.ca

Use the paged attachment mappings support to create NET_TX pages.
NET_TX pages can be used in the networking transmit path:

1. Create an iov_iter and bio_vec entries representing this dmabuf.
2. Initialize the bio_vec with the backing dmabuf pages.

Signed-off-by: Mina Almasry <almasrymina@google.com>
---
 drivers/dma-buf/dma-buf.c    | 47 ++++++++++++++++++++++++++++++++++
 include/linux/dma-buf.h      |  7 ++++++
 include/uapi/linux/dma-buf.h |  1 +
 3 files changed, 55 insertions(+)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index acb86bf406f4..3ca71297b9b4 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -1683,6 +1683,7 @@ static void dma_buf_pages_destroy(struct percpu_ref *ref)
 }

 const struct dma_buf_pages_type_ops net_rx_ops;
+const struct dma_buf_pages_type_ops net_tx_ops;

 static long dma_buf_create_pages(struct file *file,
                                  struct dma_buf_create_pages_info *create_info)
@@ -1799,6 +1800,9 @@ static long dma_buf_create_pages(struct file *file,
         case DMA_BUF_PAGES_NET_RX:
                 priv->type_ops = &net_rx_ops;
                 break;
+        case DMA_BUF_PAGES_NET_TX:
+                priv->type_ops = &net_tx_ops;
+                break;
         default:
                 err = -EINVAL;
                 goto out_put_new_file;
@@ -2140,3 +2144,46 @@ struct page *dma_buf_pages_net_rx_alloc(struct dma_buf_pages *priv)
         percpu_ref_get(&priv->pgmap.ref);
         return pg;
 }
+
+/********************************
+ *     dma_buf_pages_net_tx     *
+ ********************************/
+
+static void dma_buf_pages_net_tx_release(struct dma_buf_pages *priv,
+                                         struct file *file)
+{
+        int i;
+
+        for (i = 0; i < priv->num_pages; i++)
+                put_page(&priv->pages[i]);
+}
+
+static int dma_buf_pages_net_tx_init(struct dma_buf_pages *priv,
+                                     struct file *file)
+{
+        int i;
+
+        priv->net_tx.tx_bv = kvmalloc_array(priv->num_pages,
+                                            sizeof(struct bio_vec),
+                                            GFP_KERNEL);
+        if (!priv->net_tx.tx_bv)
+                return -ENOMEM;
+
+        for (i = 0; i < priv->num_pages; i++) {
+                priv->net_tx.tx_bv[i].bv_page = &priv->pages[i];
+                priv->net_tx.tx_bv[i].bv_offset = 0;
+                priv->net_tx.tx_bv[i].bv_len = PAGE_SIZE;
+        }
+        percpu_ref_get_many(&priv->pgmap.ref, priv->num_pages);
+        iov_iter_bvec(&priv->net_tx.iter, WRITE, priv->net_tx.tx_bv,
+                      priv->num_pages, priv->dmabuf->size);
+        return 0;
+}
+
+static void dma_buf_pages_net_tx_free(struct dma_buf_pages *priv)
+{
+        kvfree(priv->net_tx.tx_bv);
+}
+
+const struct dma_buf_pages_type_ops net_tx_ops = {
+        .dma_buf_pages_init = dma_buf_pages_net_tx_init,
+        .dma_buf_pages_release = dma_buf_pages_net_tx_release,
+        .dma_buf_pages_destroy = dma_buf_pages_net_tx_free,
+};
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index e8e66d6407d0..93228a2fec47 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -22,6 +22,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -555,6 +556,11 @@ struct dma_buf_pages_type_ops {
                                   struct page *page);
 };

+struct dma_buf_pages_net_tx {
+        struct iov_iter iter;
+        struct bio_vec *tx_bv;
+};
+
 struct dma_buf_pages_net_rx {
         struct gen_pool *page_pool;
         struct xarray bound_rxq_list;
@@ -579,6 +585,7 @@ struct dma_buf_pages {

         union {
                 struct dma_buf_pages_net_rx net_rx;
+                struct dma_buf_pages_net_tx net_tx;
         };
 };

diff --git a/include/uapi/linux/dma-buf.h b/include/uapi/linux/dma-buf.h
index b392cef9d3c6..546f211a7556 100644
--- a/include/uapi/linux/dma-buf.h
+++ b/include/uapi/linux/dma-buf.h
@@ -187,6 +187,7 @@ struct dma_buf_create_pages_info {
 };

 #define DMA_BUF_PAGES_NET_RX (1 << 0)
+#define DMA_BUF_PAGES_NET_TX (2 << 0)

 #define DMA_BUF_CREATE_PAGES    _IOW(DMA_BUF_BASE, 4, struct dma_buf_create_pages_info)

 #endif
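A hedged sketch of how a transmit path holding the NET_TX pages file
might consume the prepared iterator (the function is hypothetical;
net_tx.iter and net_tx.tx_bv are from this patch, and the iov_iter is
copied by value so the cached one stays positioned at offset 0):

    static ssize_t example_send_dmabuf_range(struct socket *sock,
                                             struct dma_buf_pages *priv,
                                             size_t offset, size_t len)
    {
            struct msghdr msg = { .msg_flags = MSG_DONTWAIT };
            struct iov_iter iter = priv->net_tx.iter; /* copy, don't mutate */

            iov_iter_advance(&iter, offset);
            iov_iter_truncate(&iter, len);
            msg.msg_iter = iter;

            /* each bvec segment is a dmabuf page; its device address
             * remains recoverable via dma_buf_page_to_dma_addr() */
            return sock_sendmsg(sock, &msg);
    }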
From patchwork Mon Jul 10 22:32:55 2023
X-Patchwork-Submitter: Mina Almasry <almasrymina@google.com>
X-Patchwork-Id: 118129
Date: Mon, 10 Jul 2023 15:32:55 -0700
In-Reply-To: <20230710223304.1174642-1-almasrymina@google.com>
References: <20230710223304.1174642-1-almasrymina@google.com>
Message-ID: <20230710223304.1174642-5-almasrymina@google.com>
Subject: [RFC PATCH 04/10] net: add support for skbs with unreadable frags
From: Mina Almasry <almasrymina@google.com>
To: linux-kernel@vger.kernel.org, linux-media@vger.kernel.org, dri-devel@lists.freedesktop.org, linaro-mm-sig@lists.linaro.org, netdev@vger.kernel.org, linux-arch@vger.kernel.org, linux-kselftest@vger.kernel.org
Cc: Mina Almasry, Sumit Semwal, Christian König, "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Jesper Dangaard Brouer, Ilias Apalodimas, Arnd Bergmann, David Ahern, Willem de Bruijn, Shuah Khan, jgg@ziepe.ca

For device memory TCP, we expect the skb headers to be available in
host memory for access, and we expect the skb frags to be in device
memory and inaccessible to the host. We expect there to be no mixing
and matching of device memory frags (inaccessible) with host memory
frags (accessible) in the same skb.

Add a skb->devmem flag which indicates whether the frags in this skb
are device memory frags or not.

__skb_fill_page_desc() & skb_fill_page_desc_noacc() now check frags
added to skbs for dmabuf pages, and mark the skb as skb->devmem if the
page is a device memory page.

Add checks through the network stack to avoid accessing the frags of
devmem skbs and to avoid coalescing devmem skbs with non-devmem skbs.

Signed-off-by: Mina Almasry <almasrymina@google.com>
---
 include/linux/skbuff.h | 15 +++++++++
 include/net/tcp.h      |  6 ++--
 net/core/skbuff.c      | 73 ++++++++++++++++++++++++++++++++++--------
 net/ipv4/tcp.c         |  3 ++
 net/ipv4/tcp_input.c   | 13 ++++++--
 net/ipv4/tcp_output.c  |  5 ++-
 net/packet/af_packet.c |  4 +--
 7 files changed, 97 insertions(+), 22 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 0b40417457cd..f5e03aa84160 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -38,6 +38,7 @@
 #endif
 #include
 #include
+#include

 /**
  * DOC: skb checksums
@@ -805,6 +806,8 @@ typedef unsigned char *sk_buff_data_t;
  *      @csum_level: indicates the number of consecutive checksums found in
  *              the packet minus one that have been verified as
  *              CHECKSUM_UNNECESSARY (max 3)
+ *      @devmem: indicates that all the fragments in this skb are backed by
+ *              device memory.
  *      @dst_pending_confirm: need to confirm neighbour
  *      @decrypted: Decrypted SKB
  *      @slow_gro: state present at GRO time, slower prepare step required
@@ -992,6 +995,7 @@ struct sk_buff {
         __u8                    csum_not_inet:1;
 #endif
+        __u8                    devmem:1;

 #ifdef CONFIG_NET_SCHED
         __u16                   tc_index;       /* traffic control index */
 #endif
@@ -1766,6 +1770,12 @@ static inline void skb_zcopy_downgrade_managed(struct sk_buff *skb)
                 __skb_zcopy_downgrade_managed(skb);
 }

+/* Return true if frags in this skb are not readable by the host. */
+static inline bool skb_frags_not_readable(const struct sk_buff *skb)
+{
+        return skb->devmem;
+}
+
 static inline void skb_mark_not_on_list(struct sk_buff *skb)
 {
         skb->next = NULL;
@@ -2469,6 +2479,8 @@ static inline void __skb_fill_page_desc(struct sk_buff *skb, int i,
         page = compound_head(page);
         if (page_is_pfmemalloc(page))
                 skb->pfmemalloc = true;
+        if (is_dma_buf_page(page))
+                skb->devmem = true;
 }

 /**
@@ -2511,6 +2523,9 @@ static inline void skb_fill_page_desc_noacc(struct sk_buff *skb, int i,

         __skb_fill_page_desc_noacc(shinfo, i, page, off, size);
         shinfo->nr_frags = i + 1;
+
+        if (is_dma_buf_page(page))
+                skb->devmem = true;
 }

 void skb_add_rx_frag(struct sk_buff *skb, int i, struct page *page, int off,
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 5066e4586cf0..6d86ed3736ad 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -986,7 +986,7 @@ static inline int tcp_skb_mss(const struct sk_buff *skb)

 static inline bool tcp_skb_can_collapse_to(const struct sk_buff *skb)
 {
-        return likely(!TCP_SKB_CB(skb)->eor);
+        return likely(!TCP_SKB_CB(skb)->eor && !skb_frags_not_readable(skb));
 }

 static inline bool tcp_skb_can_collapse(const struct sk_buff *to,
@@ -994,7 +994,9 @@ static inline bool tcp_skb_can_collapse(const struct sk_buff *to,
 {
         return likely(tcp_skb_can_collapse_to(to) &&
                       mptcp_skb_can_collapse(to, from) &&
-                      skb_pure_zcopy_same(to, from));
+                      skb_pure_zcopy_same(to, from) &&
+                      skb_frags_not_readable(to) ==
+                      skb_frags_not_readable(from));
 }

 /* Events passed to congestion control interface */
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index cea28d30abb5..9b83da794641 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1191,11 +1191,16 @@ void skb_dump(const char *level, const struct sk_buff *skb, bool full_pkt)
                                       skb_frag_size(frag), p, p_off, p_len,
                                       copied) {
                         seg_len = min_t(int, p_len, len);
-                        vaddr = kmap_atomic(p);
-                        print_hex_dump(level, "skb frag: ",
-                                       DUMP_PREFIX_OFFSET,
-                                       16, 1, vaddr + p_off, seg_len, false);
-                        kunmap_atomic(vaddr);
+                        if (!is_dma_buf_page(p)) {
+                                vaddr = kmap_atomic(p);
+                                print_hex_dump(level, "skb frag: ",
+                                               DUMP_PREFIX_OFFSET, 16, 1,
+                                               vaddr + p_off, seg_len, false);
+                                kunmap_atomic(vaddr);
+                        } else {
+                                printk("%sskb frag: devmem", level);
+                        }
+
                         len -= seg_len;
                         if (!len)
                                 break;
@@ -1764,6 +1769,9 @@ int skb_copy_ubufs(struct sk_buff *skb, gfp_t gfp_mask)
         if (skb_shared(skb) || skb_unclone(skb, gfp_mask))
                 return -EINVAL;

+        if (skb_frags_not_readable(skb))
+                return -EFAULT;
+
         if (!num_frags)
                 goto release;

@@ -1934,8 +1942,10 @@ struct sk_buff *skb_copy(const struct sk_buff *skb, gfp_t gfp_mask)
 {
         int headerlen = skb_headroom(skb);
         unsigned int size = skb_end_offset(skb) + skb->data_len;
-        struct sk_buff *n = __alloc_skb(size, gfp_mask,
-                                        skb_alloc_rx_flag(skb), NUMA_NO_NODE);
+        struct sk_buff *n = skb_frags_not_readable(skb) ? NULL :
+                            __alloc_skb(size, gfp_mask,
+                                        skb_alloc_rx_flag(skb),
+                                        NUMA_NO_NODE);

         if (!n)
                 return NULL;
@@ -2266,9 +2276,10 @@ struct sk_buff *skb_copy_expand(const struct sk_buff *skb,
         /*
          *      Allocate the copy buffer
          */
-        struct sk_buff *n = __alloc_skb(newheadroom + skb->len + newtailroom,
-                                        gfp_mask, skb_alloc_rx_flag(skb),
-                                        NUMA_NO_NODE);
+        struct sk_buff *n = skb_frags_not_readable(skb) ? NULL :
+                            __alloc_skb(newheadroom + skb->len + newtailroom,
+                                        gfp_mask, skb_alloc_rx_flag(skb),
+                                        NUMA_NO_NODE);
         int oldheadroom = skb_headroom(skb);
         int head_copy_len, head_copy_off;
@@ -2609,6 +2620,9 @@ void *__pskb_pull_tail(struct sk_buff *skb, int delta)
          */
         int i, k, eat = (skb->tail + delta) - skb->end;

+        if (skb_frags_not_readable(skb))
+                return NULL;
+
         if (eat > 0 || skb_cloned(skb)) {
                 if (pskb_expand_head(skb, 0, eat > 0 ? eat + 128 : 0,
                                      GFP_ATOMIC))
@@ -2762,6 +2776,9 @@ int skb_copy_bits(const struct sk_buff *skb, int offset, void *to, int len)
                 to     += copy;
         }

+        if (skb_frags_not_readable(skb))
+                goto fault;
+
         for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
                 int end;
                 skb_frag_t *f = &skb_shinfo(skb)->frags[i];
@@ -2835,7 +2852,7 @@ static struct page *linear_to_page(struct page *page, unsigned int *len,
 {
         struct page_frag *pfrag = sk_page_frag(sk);

-        if (!sk_page_frag_refill(sk, pfrag))
+        if (!sk_page_frag_refill(sk, pfrag) || is_dma_buf_page(pfrag->page))
                 return NULL;

         *len = min_t(unsigned int, *len, pfrag->size - pfrag->offset);
@@ -3164,6 +3181,9 @@ int skb_store_bits(struct sk_buff *skb, int offset, const void *from, int len)
                 from += copy;
         }

+        if (skb_frags_not_readable(skb))
+                goto fault;
+
         for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
                 skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
                 int end;
@@ -3243,6 +3263,9 @@ __wsum __skb_checksum(const struct sk_buff *skb, int offset, int len,
                 pos     = copy;
         }

+        if (skb_frags_not_readable(skb))
+                return 0;
+
         for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
                 int end;
                 skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
@@ -3343,6 +3366,9 @@ __wsum skb_copy_and_csum_bits(const struct sk_buff *skb, int offset,
                 pos     = copy;
         }

+        if (skb_frags_not_readable(skb))
+                return 0;
+
         for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
                 int end;

@@ -3800,7 +3826,9 @@ static inline void skb_split_inside_header(struct sk_buff *skb,
                 skb_shinfo(skb1)->frags[i] = skb_shinfo(skb)->frags[i];

         skb_shinfo(skb1)->nr_frags = skb_shinfo(skb)->nr_frags;
+        skb1->devmem               = skb->devmem;
         skb_shinfo(skb)->nr_frags  = 0;
+        skb->devmem                = 0;
         skb1->data_len             = skb->data_len;
         skb1->len                  += skb1->data_len;
         skb->data_len              = 0;
@@ -3814,11 +3842,13 @@ static inline void skb_split_no_header(struct sk_buff *skb,
 {
         int i, k = 0;
         const int nfrags = skb_shinfo(skb)->nr_frags;
+        const int devmem = skb->devmem;

         skb_shinfo(skb)->nr_frags = 0;
         skb1->len                 = skb1->data_len = skb->len - len;
         skb->len                  = len;
         skb->data_len             = len - pos;
+        skb->devmem = skb1->devmem = 0;

         for (i = 0; i < nfrags; i++) {
                 int size = skb_frag_size(&skb_shinfo(skb)->frags[i]);
@@ -3847,6 +3877,12 @@ static inline void skb_split_no_header(struct sk_buff *skb,
                 pos += size;
         }
         skb_shinfo(skb1)->nr_frags = k;
+
+        if (skb_shinfo(skb)->nr_frags)
+                skb->devmem = devmem;
+
+        if (skb_shinfo(skb1)->nr_frags)
+                skb1->devmem = devmem;
 }

 /**
@@ -4082,6 +4118,9 @@ unsigned int skb_seq_read(unsigned int consumed, const u8 **data,
                 return block_limit - abs_offset;
         }

+        if (skb_frags_not_readable(st->cur_skb))
+                return 0;
+
         if (st->frag_idx == 0 && !st->frag_data)
                 st->stepped_offset += skb_headlen(st->cur_skb);

@@ -5681,7 +5720,10 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
             (from->pp_recycle && skb_cloned(from)))
                 return false;

-        if (len <= skb_tailroom(to)) {
+        if (skb_frags_not_readable(from) != skb_frags_not_readable(to))
+                return false;
+
+        if (len <= skb_tailroom(to) && !skb_frags_not_readable(from)) {
                 if (len)
                         BUG_ON(skb_copy_bits(from, 0, skb_put(to, len), len));
                 *delta_truesize = 0;
@@ -5997,6 +6039,9 @@ int skb_ensure_writable(struct sk_buff *skb, unsigned int write_len)
skb_ensure_writable(struct sk_buff *skb, unsigned int write_len) if (!pskb_may_pull(skb, write_len)) return -ENOMEM; + if (skb_frags_not_readable(skb)) + return -EFAULT; + if (!skb_cloned(skb) || skb_clone_writable(skb, write_len)) return 0; @@ -6656,8 +6701,8 @@ EXPORT_SYMBOL(pskb_extract); void skb_condense(struct sk_buff *skb) { if (skb->data_len) { - if (skb->data_len > skb->end - skb->tail || - skb_cloned(skb)) + if (skb->data_len > skb->end - skb->tail || skb_cloned(skb) || + skb_frags_not_readable(skb)) return; /* Nice, we can free page frag(s) right now */ diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 8d20d9221238..51e8d5872670 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -4520,6 +4520,9 @@ int tcp_md5_hash_skb_data(struct tcp_md5sig_pool *hp, if (crypto_ahash_update(req)) return 1; + if (skb_frags_not_readable(skb)) + return 1; + for (i = 0; i < shi->nr_frags; ++i) { const skb_frag_t *f = &shi->frags[i]; unsigned int offset = skb_frag_off(f); diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index bf8b22218dd4..8d28d96a3c24 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -5188,6 +5188,9 @@ tcp_collapse(struct sock *sk, struct sk_buff_head *list, struct rb_root *root, for (end_of_skbs = true; skb != NULL && skb != tail; skb = n) { n = tcp_skb_next(skb, list); + if (skb_frags_not_readable(skb)) + goto skip_this; + /* No new bits? It is possible on ofo queue. */ if (!before(start, TCP_SKB_CB(skb)->end_seq)) { skb = tcp_collapse_one(sk, skb, list, root); @@ -5208,17 +5211,20 @@ tcp_collapse(struct sock *sk, struct sk_buff_head *list, struct rb_root *root, break; } - if (n && n != tail && mptcp_skb_can_collapse(skb, n) && + if (n && n != tail && !skb_frags_not_readable(n) && + mptcp_skb_can_collapse(skb, n) && TCP_SKB_CB(skb)->end_seq != TCP_SKB_CB(n)->seq) { end_of_skbs = false; break; } +skip_this: /* Decided to skip this, advance start seq. 
*/ start = TCP_SKB_CB(skb)->end_seq; } if (end_of_skbs || - (TCP_SKB_CB(skb)->tcp_flags & (TCPHDR_SYN | TCPHDR_FIN))) + (TCP_SKB_CB(skb)->tcp_flags & (TCPHDR_SYN | TCPHDR_FIN)) || + skb_frags_not_readable(skb)) return; __skb_queue_head_init(&tmp); @@ -5262,7 +5268,8 @@ tcp_collapse(struct sock *sk, struct sk_buff_head *list, struct rb_root *root, if (!skb || skb == tail || !mptcp_skb_can_collapse(nskb, skb) || - (TCP_SKB_CB(skb)->tcp_flags & (TCPHDR_SYN | TCPHDR_FIN))) + (TCP_SKB_CB(skb)->tcp_flags & (TCPHDR_SYN | TCPHDR_FIN)) || + skb_frags_not_readable(skb)) goto end; #ifdef CONFIG_TLS_DEVICE if (skb->decrypted != nskb->decrypted) diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index cfe128b81a01..eddade864c7f 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -2310,7 +2310,8 @@ static bool tcp_can_coalesce_send_queue_head(struct sock *sk, int len) if (unlikely(TCP_SKB_CB(skb)->eor) || tcp_has_tx_tstamp(skb) || - !skb_pure_zcopy_same(skb, next)) + !skb_pure_zcopy_same(skb, next) || + skb->devmem != next->devmem) return false; len -= skb->len; @@ -3087,6 +3088,8 @@ static bool tcp_can_collapse(const struct sock *sk, const struct sk_buff *skb) return false; if (skb_cloned(skb)) return false; + if (skb_frags_not_readable(skb)) + return false; /* Some heuristics for collapsing over SACK'd could be invented */ if (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED) return false; diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index a2dbeb264f26..9b31f688163c 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -2152,7 +2152,7 @@ static int packet_rcv(struct sk_buff *skb, struct net_device *dev, } } - snaplen = skb->len; + snaplen = skb_frags_not_readable(skb) ? skb_headlen(skb) : skb->len; res = run_filter(skb, sk, snaplen); if (!res) @@ -2275,7 +2275,7 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev, } } - snaplen = skb->len; + snaplen = skb_frags_not_readable(skb) ? 
skb_headlen(skb) : skb->len; res = run_filter(skb, sk, snaplen); if (!res)
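[Editor's illustration, not part of the patch] The hunks above all apply the same rule: any path that would touch frag payload with the CPU must bail out when the skb carries device-memory frags. A minimal sketch of a hypothetical consumer (copy_skb_payload() is invented for illustration; skb_frags_not_readable() and skb_copy_bits() are from the patch/kernel):

static int copy_skb_payload(const struct sk_buff *skb, void *dst, int len)
{
	/* Devmem frags point at device memory the CPU cannot kmap or
	 * memcpy; fail with -EFAULT just like the patched core paths do.
	 */
	if (skb_frags_not_readable(skb))
		return -EFAULT;

	return skb_copy_bits(skb, 0, dst, len);
}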
From patchwork Mon Jul 10 22:32:56 2023
X-Patchwork-Submitter: Mina Almasry
X-Patchwork-Id: 118131
Date: Mon, 10 Jul 2023 15:32:56 -0700
In-Reply-To: <20230710223304.1174642-1-almasrymina@google.com>
References: <20230710223304.1174642-1-almasrymina@google.com>
Message-ID: <20230710223304.1174642-6-almasrymina@google.com>
Subject: [RFC PATCH 05/10] tcp: implement recvmsg() RX path for devmem TCP
From: Mina Almasry
To: linux-kernel@vger.kernel.org, linux-media@vger.kernel.org, dri-devel@lists.freedesktop.org, linaro-mm-sig@lists.linaro.org, netdev@vger.kernel.org, linux-arch@vger.kernel.org, linux-kselftest@vger.kernel.org
Cc: Mina Almasry, Sumit Semwal, Christian König, "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Jesper Dangaard Brouer, Ilias Apalodimas, Arnd Bergmann, David Ahern, Willem de Bruijn, Shuah Khan, jgg@ziepe.ca

In tcp_recvmsg_locked(), detect if the skb being received by the user is a devmem skb. In this case - if the user provided the MSG_SOCK_DEVMEM flag - pass it to tcp_recvmsg_devmem() for custom handling.

tcp_recvmsg_devmem() copies any data in the skb header to the linear buffer, and returns a cmsg to the user indicating the number of bytes returned in the linear buffer.

tcp_recvmsg_devmem() then loops over the inaccessible devmem skb frags, and returns to the user a cmsg_devmem indicating the location of the data in the dmabuf device memory. cmsg_devmem contains this information:
1. the offset into the dmabuf where the payload starts ('frag_offset').
2. the size of the frag ('frag_size').
3. an opaque token 'frag_token' to return to the kernel when the buffer is to be released.

The pages awaiting freeing are stored in the newly added sk->sk_pagepool, and each page passed to userspace is get_page()'d. This reference is dropped once userspace indicates that it is done reading this page. All pages are released when the socket is destroyed. (An illustrative userspace sketch follows the diff below.)

Signed-off-by: Mina Almasry --- include/linux/socket.h | 1 + include/net/sock.h | 2 + include/uapi/asm-generic/socket.h | 5 + include/uapi/linux/uio.h | 6 + net/core/datagram.c | 3 + net/ipv4/tcp.c | 186 +++++++++++++++++++++++++++++- net/ipv4/tcp_ipv4.c | 8 ++ 7 files changed, 209 insertions(+), 2 deletions(-) diff --git a/include/linux/socket.h b/include/linux/socket.h index 13c3a237b9c9..12905b2f1215 100644 --- a/include/linux/socket.h +++ b/include/linux/socket.h @@ -326,6 +326,7 @@ struct ucred { * plain text and require encryption */ +#define MSG_SOCK_DEVMEM 0x2000000 /* Receive devmem skbs as cmsg */ #define MSG_ZEROCOPY 0x4000000 /* Use user data in kernel path */ #define MSG_FASTOPEN 0x20000000 /* Send data in TCP SYN */ #define MSG_CMSG_CLOEXEC 0x40000000 /* Set close_on_exec for file diff --git a/include/net/sock.h b/include/net/sock.h index 6f428a7f3567..c615666ff19a 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -353,6 +353,7 @@ struct sk_filter; * @sk_txtime_unused: unused txtime flags * @ns_tracker: tracker for netns reference * @sk_bind2_node: bind node in the bhash2 table + * @sk_pagepool: page pool associated with this socket.
*/ struct sock { /* @@ -545,6 +546,7 @@ struct sock { struct rcu_head sk_rcu; netns_tracker ns_tracker; struct hlist_node sk_bind2_node; + struct xarray sk_pagepool; }; enum sk_pacing { diff --git a/include/uapi/asm-generic/socket.h b/include/uapi/asm-generic/socket.h index 638230899e98..88f9234f78cb 100644 --- a/include/uapi/asm-generic/socket.h +++ b/include/uapi/asm-generic/socket.h @@ -132,6 +132,11 @@ #define SO_RCVMARK 75 +#define SO_DEVMEM_HEADER 98 +#define SCM_DEVMEM_HEADER SO_DEVMEM_HEADER +#define SO_DEVMEM_OFFSET 99 +#define SCM_DEVMEM_OFFSET SO_DEVMEM_OFFSET + #if !defined(__KERNEL__) #if __BITS_PER_LONG == 64 || (defined(__x86_64__) && defined(__ILP32__)) diff --git a/include/uapi/linux/uio.h b/include/uapi/linux/uio.h index 059b1a9147f4..8b0be0f50838 100644 --- a/include/uapi/linux/uio.h +++ b/include/uapi/linux/uio.h @@ -20,6 +20,12 @@ struct iovec __kernel_size_t iov_len; /* Must be size_t (1003.1g) */ }; +struct cmsg_devmem { + __u32 frag_offset; + __u32 frag_size; + __u32 frag_token; +}; + /* * UIO_MAXIOV shall be at least 16 1003.1g (5.4.1.1) */ diff --git a/net/core/datagram.c b/net/core/datagram.c index 176eb5834746..3a82598aa6ed 100644 --- a/net/core/datagram.c +++ b/net/core/datagram.c @@ -455,6 +455,9 @@ static int __skb_datagram_iter(const struct sk_buff *skb, int offset, skb_walk_frags(skb, frag_iter) { int end; + if (frag_iter->devmem) + goto short_copy; + WARN_ON(start > offset + len); end = start + frag_iter->len; diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 51e8d5872670..a894b8a9dbb0 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -279,6 +279,7 @@ #include #include #include +#include /* Track pending CMSGs. */ enum { @@ -460,6 +461,7 @@ void tcp_init_sock(struct sock *sk) set_bit(SOCK_SUPPORT_ZC, &sk->sk_socket->flags); sk_sockets_allocated_inc(sk); + xa_init_flags(&sk->sk_pagepool, XA_FLAGS_ALLOC); } EXPORT_SYMBOL(tcp_init_sock); @@ -2408,6 +2410,165 @@ static int tcp_inq_hint(struct sock *sk) return inq; } +static int tcp_recvmsg_devmem(const struct sock *sk, const struct sk_buff *skb, + unsigned int offset, struct msghdr *msg, int len) +{ + unsigned int start = skb_headlen(skb); + struct cmsg_devmem cmsg_devmem = { 0 }; + unsigned int tokens_added_idx = 0; + int i, copy = start - offset, n; + struct sk_buff *frag_iter; + u32 *tokens_added; + int err = 0; + + if (!skb->devmem) + return -ENODEV; + + tokens_added = kzalloc(sizeof(u32) * skb_shinfo(skb)->nr_frags, + GFP_KERNEL); + + if (!tokens_added) + return -ENOMEM; + + /* Copy header. */ + if (copy > 0) { + copy = min(copy, len); + + n = copy_to_iter(skb->data + offset, copy, &msg->msg_iter); + if (n != copy) { + err = -EFAULT; + goto err_release_pages; + } + + offset += copy; + len -= copy; + + /* First a cmsg_devmem for # bytes copied to user buffer */ + cmsg_devmem.frag_size = copy; + err = put_cmsg(msg, SOL_SOCKET, SO_DEVMEM_HEADER, + sizeof(cmsg_devmem), &cmsg_devmem); + if (err) + goto err_release_pages; + + if (len == 0) + goto out; + } + + /* after that, send information of devmem pages through a sequence + * of cmsg + */ + for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) { + const skb_frag_t *frag = &skb_shinfo(skb)->frags[i]; + struct page *page = skb_frag_page(frag); + struct dma_buf_pages *priv; + u32 user_token, frag_offset; + struct page *dmabuf_pages; + int end; + + /* skb->devmem should indicate that ALL the pages in this skb + * are dma buf pages. We're checking for that flag above, but + * also check individual pages here. 
If the driver is not + * setting skb->devmem correctly, we still don't want to crash + * here when accessing pgmap or priv below. + */ + if (!is_dma_buf_page(page)) { + net_err_ratelimited("Found devmem skb with non-dma_buf " "page"); + err = -ENODEV; + goto err_release_pages; + } + + end = start + skb_frag_size(frag); + copy = end - offset; + memset(&cmsg_devmem, 0, sizeof(cmsg_devmem)); + + if (copy > 0) { + copy = min(copy, len); + + priv = (struct dma_buf_pages *)page->pp->mp_priv; + + dmabuf_pages = priv->pages; + frag_offset = ((page - dmabuf_pages) << PAGE_SHIFT) + + skb_frag_off(frag) + offset - start; + cmsg_devmem.frag_offset = frag_offset; + cmsg_devmem.frag_size = copy; + err = xa_alloc((struct xarray *)&sk->sk_pagepool, + &user_token, page, xa_limit_31b, + GFP_KERNEL); + if (err) + goto err_release_pages; + + tokens_added[tokens_added_idx++] = user_token; + + get_page(page); + cmsg_devmem.frag_token = user_token; + + offset += copy; + len -= copy; + + err = put_cmsg(msg, SOL_SOCKET, SO_DEVMEM_OFFSET, + sizeof(cmsg_devmem), &cmsg_devmem); + if (err) { + put_page(page); + goto err_release_pages; + } + + if (len == 0) + goto out; + } + start = end; + } + + if (!len) + goto out; + + /* if len is not satisfied yet, we need to skb_walk_frags() to satisfy + * len + */ + skb_walk_frags(skb, frag_iter) + { + int end; + + if (!frag_iter->devmem) { + err = -EFAULT; + goto err_release_pages; + } + + WARN_ON(start > offset + len); + end = start + frag_iter->len; + copy = end - offset; + if (copy > 0) { + if (copy > len) + copy = len; + err = tcp_recvmsg_devmem(sk, frag_iter, offset - start, + msg, copy); + if (err) + goto err_release_pages; + len -= copy; + if (len == 0) + goto out; + offset += copy; + } + start = end; + } + + if (len) { + err = -EFAULT; + goto err_release_pages; + } + + goto out; + +err_release_pages: + for (i = 0; i < tokens_added_idx; i++) + put_page(xa_erase((struct xarray *)&sk->sk_pagepool, + tokens_added[i])); + +out: + kfree(tokens_added); + return err; +} + /* * This routine copies from a sock struct into the user buffer. * @@ -2428,7 +2589,7 @@ static int tcp_recvmsg_locked(struct sock *sk, struct msghdr *msg, size_t len, int err; int target; /* Read at least this many bytes */ long timeo; - struct sk_buff *skb, *last; + struct sk_buff *skb, *last, *skb_last_copied = NULL; u32 urg_hole = 0; err = -ENOTCONN; @@ -2593,7 +2754,27 @@ static int tcp_recvmsg_locked(struct sock *sk, struct msghdr *msg, size_t len, } } - if (!(flags & MSG_TRUNC)) { + if (skb_last_copied && skb_last_copied->devmem != skb->devmem) + break; + + if (skb->devmem) { + if (!(flags & MSG_SOCK_DEVMEM)) { + /* skb->devmem skbs can only be received with + * the MSG_SOCK_DEVMEM flag. + */ + + copied = -EFAULT; + break; + } + + err = tcp_recvmsg_devmem(sk, skb, offset, msg, used); + if (err) { + if (!copied) + copied = -EFAULT; + break; + } + skb_last_copied = skb; + } else if (!(flags & MSG_TRUNC)) { err = skb_copy_datagram_msg(skb, offset, msg, used); if (err) { /* Exception. Bailout!
*/ @@ -2601,6 +2782,7 @@ static int tcp_recvmsg_locked(struct sock *sk, struct msghdr *msg, size_t len, copied = -EFAULT; break; } + skb_last_copied = skb; } WRITE_ONCE(*seq, *seq + used); diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index 06d2573685ca..d7dee38e0410 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -2291,6 +2291,14 @@ void tcp_v4_destroy_sock(struct sock *sk) { struct tcp_sock *tp = tcp_sk(sk); + unsigned long index; + struct page *page; + + xa_for_each(&sk->sk_pagepool, index, page) + put_page(page); + + xa_destroy(&sk->sk_pagepool); + trace_tcp_destroy_sock(sk); tcp_clear_xmit_timers(sk);
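[Editor's illustration, not part of the patch] A minimal userspace sketch of the RX flow added above, assuming only the UAPI from this patch (MSG_SOCK_DEVMEM, SCM_DEVMEM_HEADER, SCM_DEVMEM_OFFSET, struct cmsg_devmem); fd is a connected TCP socket whose flow is steered to the devmem-bound queue, and error handling is elided:

char iobuf[4096];
char cbuf[CMSG_SPACE(sizeof(struct cmsg_devmem)) * 32];
struct iovec iov = { .iov_base = iobuf, .iov_len = sizeof(iobuf) };
struct msghdr msg = { 0 };
struct cmsghdr *cm;
ssize_t n;

msg.msg_iov = &iov;
msg.msg_iovlen = 1;
msg.msg_control = cbuf;
msg.msg_controllen = sizeof(cbuf);

/* Without MSG_SOCK_DEVMEM, receiving a devmem skb fails with -EFAULT. */
n = recvmsg(fd, &msg, MSG_SOCK_DEVMEM);

for (cm = CMSG_FIRSTHDR(&msg); cm; cm = CMSG_NXTHDR(&msg, cm)) {
	struct cmsg_devmem *cd = (struct cmsg_devmem *)CMSG_DATA(cm);

	if (cm->cmsg_level != SOL_SOCKET)
		continue;
	if (cm->cmsg_type == SCM_DEVMEM_HEADER) {
		/* cd->frag_size bytes of linear data landed in iobuf. */
	} else if (cm->cmsg_type == SCM_DEVMEM_OFFSET) {
		/* Payload sits at cd->frag_offset (cd->frag_size bytes)
		 * inside the bound dmabuf; cd->frag_token must later be
		 * handed back via SO_DEVMEM_DONTNEED (next patch).
		 */
	}
}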
From patchwork Mon Jul 10 22:32:57 2023
X-Patchwork-Submitter: Mina Almasry
X-Patchwork-Id: 118128
Date: Mon, 10 Jul 2023 15:32:57 -0700
In-Reply-To: <20230710223304.1174642-1-almasrymina@google.com>
References: <20230710223304.1174642-1-almasrymina@google.com>
Message-ID: <20230710223304.1174642-7-almasrymina@google.com>
Subject: [RFC PATCH 06/10] net: add SO_DEVMEM_DONTNEED setsockopt to release RX pages
From: Mina Almasry
To: linux-kernel@vger.kernel.org, linux-media@vger.kernel.org, dri-devel@lists.freedesktop.org, linaro-mm-sig@lists.linaro.org, netdev@vger.kernel.org, linux-arch@vger.kernel.org, linux-kselftest@vger.kernel.org
Cc: Mina Almasry, Sumit Semwal, Christian König, "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Jesper Dangaard Brouer, Ilias Apalodimas, Arnd Bergmann, David Ahern, Willem de Bruijn, Shuah Khan, jgg@ziepe.ca

Add an interface for the user to notify the kernel that it is done reading the NET_RX dmabuf pages returned as cmsg. The kernel will drop the reference on the NET_RX pages to make them available for re-use. (An illustrative userspace sketch follows the diff below.)

Signed-off-by: Mina Almasry --- include/uapi/asm-generic/socket.h | 1 + include/uapi/linux/uio.h | 4 +++ net/core/sock.c | 41 +++++++++++++++++++++++++++++++ 3 files changed, 46 insertions(+) diff --git a/include/uapi/asm-generic/socket.h b/include/uapi/asm-generic/socket.h index 88f9234f78cb..2a5a7f5da358 100644 --- a/include/uapi/asm-generic/socket.h +++ b/include/uapi/asm-generic/socket.h @@ -132,6 +132,7 @@ #define SO_RCVMARK 75 +#define SO_DEVMEM_DONTNEED 97 #define SO_DEVMEM_HEADER 98 #define SCM_DEVMEM_HEADER SO_DEVMEM_HEADER #define SO_DEVMEM_OFFSET 99 diff --git a/include/uapi/linux/uio.h b/include/uapi/linux/uio.h index 8b0be0f50838..faaa765fd5a4 100644 --- a/include/uapi/linux/uio.h +++ b/include/uapi/linux/uio.h @@ -26,6 +26,10 @@ struct cmsg_devmem { __u32 frag_token; }; +struct devmemtoken { + __u32 token_start; + __u32 token_count; +}; /* * UIO_MAXIOV shall be at least 16 1003.1g (5.4.1.1) */ diff --git a/net/core/sock.c b/net/core/sock.c index 24f2761bdb1d..f9b9d9ec7322 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -1531,7 +1531,48 @@ int sk_setsockopt(struct sock *sk, int level, int optname, /* Paired with READ_ONCE() in tcp_rtx_synack() */ WRITE_ONCE(sk->sk_txrehash, (u8)val); break; + case SO_DEVMEM_DONTNEED: { + struct devmemtoken tokens[128]; + unsigned int num_tokens, i, j; + if (sk->sk_type != SOCK_STREAM || + sk->sk_protocol != IPPROTO_TCP) { + ret = -EBADF; + break; + } + + if (optlen % sizeof(struct devmemtoken) || + optlen > sizeof(tokens)) { + ret = -EINVAL; + break; + } + + num_tokens = optlen / sizeof(struct devmemtoken); + if (copy_from_sockptr(tokens, optval, optlen)) { + ret = -EFAULT; + break; + } + + ret = 0; + + for (i = 0; i < num_tokens; i++) { + for (j = 0; j < tokens[i].token_count; j++) { + struct page *pg = xa_erase(&sk->sk_pagepool, + tokens[i].token_start + j); + + if (pg) + put_page(pg); + else + /* -EINTR here notifies the userspace + * that not all tokens passed to it have + * been freed.
+ */ + ret = -EINTR; + } + } + + break; + } default: ret = -ENOPROTOOPT; break;
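[Editor's illustration, not part of the patch] The release half of the token life cycle, as a minimal sketch using the struct devmemtoken just defined; frag_token is assumed to come from an SCM_DEVMEM_OFFSET cmsg returned on receive:

/* Return one token; token_count > 1 would release a contiguous run
 * of tokens starting at token_start.
 */
struct devmemtoken token = {
	.token_start = frag_token,
	.token_count = 1,
};

if (setsockopt(fd, SOL_SOCKET, SO_DEVMEM_DONTNEED,
	       &token, sizeof(token)))
	perror("SO_DEVMEM_DONTNEED");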
From patchwork Mon Jul 10 22:32:58 2023
X-Patchwork-Submitter: Mina Almasry
X-Patchwork-Id: 118135
Date: Mon, 10 Jul 2023 15:32:58 -0700
In-Reply-To: <20230710223304.1174642-1-almasrymina@google.com>
References: <20230710223304.1174642-1-almasrymina@google.com>
Message-ID: <20230710223304.1174642-8-almasrymina@google.com>
Subject: [RFC PATCH 07/10] tcp: implement sendmsg() TX path for devmem TCP
From: Mina Almasry
To: linux-kernel@vger.kernel.org, linux-media@vger.kernel.org, dri-devel@lists.freedesktop.org, linaro-mm-sig@lists.linaro.org, netdev@vger.kernel.org, linux-arch@vger.kernel.org, linux-kselftest@vger.kernel.org
Cc: Mina Almasry, Sumit Semwal, Christian König, "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Jesper Dangaard Brouer, Ilias Apalodimas, Arnd Bergmann, David Ahern, Willem de Bruijn, Shuah Khan, jgg@ziepe.ca

For device memory TCP, we let the user provide the kernel with a cmsg containing 2 items:
1. the dmabuf pages fd that the user would like to send data from.
2. the offset into this dmabuf that the user would like to start sending from.

In tcp_sendmsg_locked(), if this cmsg is provided, we send the data using the dmabuf NET_TX pages bio_vec.

Also provide drivers with a new skb_devmem_frag_dma_map() helper. This helper is similar to skb_frag_dma_map(), but it first checks whether the frag being mapped is backed by dmabuf NET_TX pages, and provides the correct dma_addr if so. (An illustrative userspace sketch follows the diff below.)

Signed-off-by: Mina Almasry --- include/linux/skbuff.h | 19 +++++++++-- include/net/sock.h | 2 ++ net/core/skbuff.c | 8 ++--- net/core/sock.c | 6 ++++ net/ipv4/tcp.c | 73 +++++++++++++++++++++++++++++++++++++++++- 5 files changed, 101 insertions(+), 7 deletions(-) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index f5e03aa84160..ad4e7bfcab07 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -1660,8 +1660,8 @@ static inline int skb_zerocopy_iter_dgram(struct sk_buff *skb, } int skb_zerocopy_iter_stream(struct sock *sk, struct sk_buff *skb, - struct msghdr *msg, int len, - struct ubuf_info *uarg); + struct msghdr *msg, struct iov_iter *iov_iter, + int len, struct ubuf_info *uarg); /* Internal */ #define skb_shinfo(SKB) ((struct skb_shared_info *)(skb_end_pointer(SKB))) @@ -3557,6 +3557,21 @@ static inline dma_addr_t skb_frag_dma_map(struct device *dev, skb_frag_off(frag) + offset, size, dir); } +/* Similar to skb_frag_dma_map, but handles devmem skbs correctly.
*/ +static inline dma_addr_t skb_devmem_frag_dma_map(struct device *dev, + const struct sk_buff *skb, + const skb_frag_t *frag, + size_t offset, size_t size, + enum dma_data_direction dir) +{ + if (unlikely(skb->devmem && is_dma_buf_page(skb_frag_page(frag)))) { + dma_addr_t dma_addr = + dma_buf_page_to_dma_addr(skb_frag_page(frag)); + return dma_addr + skb_frag_off(frag) + offset; + } + return skb_frag_dma_map(dev, frag, offset, size, dir); +} + static inline struct sk_buff *pskb_copy(struct sk_buff *skb, gfp_t gfp_mask) { diff --git a/include/net/sock.h b/include/net/sock.h index c615666ff19a..733865f89635 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -1890,6 +1890,8 @@ struct sockcm_cookie { u64 transmit_time; u32 mark; u32 tsflags; + u32 devmem_fd; + u32 devmem_offset; }; static inline void sockcm_init(struct sockcm_cookie *sockc, diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 9b83da794641..b1e28e7ad6a8 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -1685,8 +1685,8 @@ void msg_zerocopy_put_abort(struct ubuf_info *uarg, bool have_uref) EXPORT_SYMBOL_GPL(msg_zerocopy_put_abort); int skb_zerocopy_iter_stream(struct sock *sk, struct sk_buff *skb, - struct msghdr *msg, int len, - struct ubuf_info *uarg) + struct msghdr *msg, struct iov_iter *iov_iter, + int len, struct ubuf_info *uarg) { struct ubuf_info *orig_uarg = skb_zcopy(skb); int err, orig_len = skb->len; @@ -1697,12 +1697,12 @@ int skb_zerocopy_iter_stream(struct sock *sk, struct sk_buff *skb, if (orig_uarg && uarg != orig_uarg) return -EEXIST; - err = __zerocopy_sg_from_iter(msg, sk, skb, &msg->msg_iter, len); + err = __zerocopy_sg_from_iter(msg, sk, skb, iov_iter, len); if (err == -EFAULT || (err == -EMSGSIZE && skb->len == orig_len)) { struct sock *save_sk = skb->sk; /* Streams do not free skb on error. Reset to prev state. */ - iov_iter_revert(&msg->msg_iter, skb->len - orig_len); + iov_iter_revert(iov_iter, skb->len - orig_len); skb->sk = sk; ___pskb_trim(skb, orig_len); skb->sk = save_sk; diff --git a/net/core/sock.c b/net/core/sock.c index f9b9d9ec7322..854624bee5d0 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -2813,6 +2813,12 @@ int __sock_cmsg_send(struct sock *sk, struct cmsghdr *cmsg, return -EINVAL; sockc->transmit_time = get_unaligned((u64 *)CMSG_DATA(cmsg)); break; + case SCM_DEVMEM_OFFSET: + if (cmsg->cmsg_len != CMSG_LEN(2 * sizeof(u32))) + return -EINVAL; + sockc->devmem_fd = ((u32 *)CMSG_DATA(cmsg))[0]; + sockc->devmem_offset = ((u32 *)CMSG_DATA(cmsg))[1]; + break; /* SCM_RIGHTS and SCM_CREDENTIALS are semantically in SOL_UNIX. */ case SCM_RIGHTS: case SCM_CREDENTIALS: diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index a894b8a9dbb0..85d6cdc832ef 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -280,6 +280,7 @@ #include #include #include +#include /* Track pending CMSGs. 
*/ enum { @@ -1216,6 +1217,52 @@ int tcp_sendmsg_fastopen(struct sock *sk, struct msghdr *msg, int *copied, return err; } +static int tcp_prepare_devmem_data(struct msghdr *msg, int devmem_fd, + unsigned int devmem_offset, + struct file **devmem_file, + struct iov_iter *devmem_tx_iter, size_t size) +{ + struct dma_buf_pages *priv; + int err = 0; + + *devmem_file = fget_raw(devmem_fd); + if (!*devmem_file) { + err = -EINVAL; + goto err; + } + + if (!is_dma_buf_pages_file(*devmem_file)) { + err = -EBADF; + goto err_fput; + } + + priv = (*devmem_file)->private_data; + if (!priv) { + WARN_ONCE(!priv, "dma_buf_pages_file has no private_data"); + err = -EINTR; + goto err_fput; + } + + if (!(priv->type & DMA_BUF_PAGES_NET_TX)) { + err = -EINVAL; + goto err_fput; + } + + if (devmem_offset + size > priv->dmabuf->size) { + err = -ENOSPC; + goto err_fput; + } + + *devmem_tx_iter = priv->net_tx.iter; + iov_iter_advance(devmem_tx_iter, devmem_offset); + + return 0; + +err_fput: + fput(*devmem_file); + *devmem_file = NULL; +err: + return err; +} + int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size) { struct tcp_sock *tp = tcp_sk(sk); @@ -1227,6 +1274,8 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size) int process_backlog = 0; bool zc = false; long timeo; + struct file *devmem_file = NULL; + struct iov_iter devmem_tx_iter; flags = msg->msg_flags; @@ -1295,6 +1344,14 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size) } } + if (sockc.devmem_fd) { + err = tcp_prepare_devmem_data(msg, sockc.devmem_fd, + sockc.devmem_offset, &devmem_file, + &devmem_tx_iter, size); + if (err) + goto out_err; + } + /* This should be in poll */ sk_clear_bit(SOCKWQ_ASYNC_NOSPACE, sk); @@ -1408,7 +1465,17 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size) goto wait_for_space; } - err = skb_zerocopy_iter_stream(sk, skb, msg, copy, uarg); + if (devmem_file) { + err = skb_zerocopy_iter_stream(sk, skb, msg, + &devmem_tx_iter, + copy, uarg); + if (err > 0) + iov_iter_advance(&msg->msg_iter, err); + } else { + err = skb_zerocopy_iter_stream(sk, skb, msg, + &msg->msg_iter, + copy, uarg); + } if (err == -EMSGSIZE || err == -EEXIST) { tcp_mark_push(tp, skb); goto new_segment; @@ -1462,6 +1529,8 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size) } out_nopush: net_zcopy_put(uarg); + if (devmem_file) + fput(devmem_file); return copied + copied_syn; do_error: @@ -1470,6 +1539,8 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size) if (copied + copied_syn) goto out; out_err: + if (devmem_file) + fput(devmem_file); net_zcopy_put_abort(uarg, true); err = sk_stream_error(sk, flags, err); /* make sure we wake any epoll edge trigger waiter */
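[Editor's illustration, not part of the patch] A hedged sketch of the TX side, assuming the SCM_DEVMEM_OFFSET cmsg parsed by __sock_cmsg_send() above (two u32s: the dmabuf pages fd, then the starting offset into that dmabuf). The iovec only sizes the send; the payload bytes come from the NET_TX dmabuf iterator, and this zerocopy path is presumably combined with MSG_ZEROCOPY in this RFC. dmabuf_pages_fd, dmabuf_offset, bytes_to_send, and fd are assumed variables:

uint32_t cdata[2] = { dmabuf_pages_fd, dmabuf_offset };
char cbuf[CMSG_SPACE(sizeof(cdata))];
char placeholder[65536];	/* length placeholder; data comes from the dmabuf */
struct iovec iov = { .iov_base = placeholder, .iov_len = bytes_to_send };
struct msghdr msg = { 0 };
struct cmsghdr *cm;

msg.msg_iov = &iov;
msg.msg_iovlen = 1;
msg.msg_control = cbuf;
msg.msg_controllen = sizeof(cbuf);

cm = CMSG_FIRSTHDR(&msg);
cm->cmsg_level = SOL_SOCKET;
cm->cmsg_type = SCM_DEVMEM_OFFSET;
cm->cmsg_len = CMSG_LEN(sizeof(cdata));
memcpy(CMSG_DATA(cm), cdata, sizeof(cdata));

sendmsg(fd, &msg, MSG_ZEROCOPY);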
From patchwork Mon Jul 10 22:32:59 2023
X-Patchwork-Submitter: Mina Almasry
X-Patchwork-Id: 118130
Date: Mon, 10 Jul 2023 15:32:59 -0700
In-Reply-To: <20230710223304.1174642-1-almasrymina@google.com>
References: <20230710223304.1174642-1-almasrymina@google.com>
Message-ID: <20230710223304.1174642-9-almasrymina@google.com>
Subject: [RFC PATCH 08/10] selftests: add ncdevmem, netcat for devmem TCP
From: Mina Almasry
To: linux-kernel@vger.kernel.org, linux-media@vger.kernel.org, dri-devel@lists.freedesktop.org, linaro-mm-sig@lists.linaro.org, netdev@vger.kernel.org, linux-arch@vger.kernel.org, linux-kselftest@vger.kernel.org
Cc: Mina Almasry, Sumit Semwal, Christian König, "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Jesper Dangaard Brouer, Ilias Apalodimas, Arnd Bergmann, David Ahern, Willem de Bruijn, Shuah Khan, jgg@ziepe.ca

ncdevmem is a devmem TCP netcat. It works similarly to netcat, but it sends and receives data using the devmem TCP APIs. It uses udmabuf as the dmabuf provider. It is compatible with a regular netcat running on a peer, or a ncdevmem running on a peer.

In addition to normal netcat support, ncdevmem has a validation mode, where it sends a specific pattern and validates this pattern on the receiver side to ensure data integrity.
Signed-off-by: Mina Almasry --- tools/testing/selftests/net/.gitignore | 1 + tools/testing/selftests/net/Makefile | 1 + tools/testing/selftests/net/ncdevmem.c | 693 +++++++++++++++++++++++++ 3 files changed, 695 insertions(+) create mode 100644 tools/testing/selftests/net/ncdevmem.c diff --git a/tools/testing/selftests/net/.gitignore b/tools/testing/selftests/net/.gitignore index f27a7338b60e..1153ba177f1b 100644 --- a/tools/testing/selftests/net/.gitignore +++ b/tools/testing/selftests/net/.gitignore @@ -16,6 +16,7 @@ ipsec ipv6_flowlabel ipv6_flowlabel_mgr msg_zerocopy +ncdevmem nettest psock_fanout psock_snd diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile index c12df57d5539..ea5ccbd99d3b 100644 --- a/tools/testing/selftests/net/Makefile +++ b/tools/testing/selftests/net/Makefile @@ -84,6 +84,7 @@ TEST_GEN_FILES += ip_local_port_range TEST_GEN_FILES += bind_wildcard TEST_PROGS += test_vxlan_mdb.sh TEST_PROGS += test_bridge_neigh_suppress.sh +TEST_GEN_FILES += ncdevmem TEST_FILES := settings diff --git a/tools/testing/selftests/net/ncdevmem.c b/tools/testing/selftests/net/ncdevmem.c new file mode 100644 index 000000000000..4f62f22bf763 --- /dev/null +++ b/tools/testing/selftests/net/ncdevmem.c @@ -0,0 +1,693 @@ +// SPDX-License-Identifier: GPL-2.0 +#define _GNU_SOURCE +#define __EXPORTED_HEADERS__ + +#include +#include +#include +#include +#include +#include +#include +#define __iovec_defined +#include +#include + +#include +#include +#include +#include +#include + +#include +#include +#include +#include + +#define PAGE_SHIFT 12 +#define PAGE_SIZE 4096 +#define TEST_PREFIX "ncdevmem" +#define NUM_PAGES 16000 + +#ifndef MSG_SOCK_DEVMEM +#define MSG_SOCK_DEVMEM 0x2000000 +#endif + +/* + * tcpdevmem netcat. Works similarly to netcat but does device memory TCP + * instead of regular TCP. Uses udmabuf to mock a dmabuf provider. + * + * Usage: + * + * * Without validation: + * + * On server: + * ncdevmem -s -c -f eth1 -n 0000:06:00.0 -l \ + * -p 5201 + * + * On client: + * ncdevmem -s -c -f eth1 -n 0000:06:00.0 -p 5201 + * + * * With Validation: + * On server: + * ncdevmem -s -c -l -f eth1 -n 0000:06:00.0 \ + * -p 5202 -v 1 + * + * On client: + * ncdevmem -s -c -f eth1 -n 0000:06:00.0 -p 5202 \ + * -v 100000 + * + * Note this is compatible with regular netcat. i.e. the sender or receiver can + * be replaced with regular netcat to test the RX or TX path in isolation. 
+ */ + +static char *server_ip = "192.168.1.4"; +static char *client_ip = "192.168.1.2"; +static char *port = "5201"; +static size_t do_validation = 0; +static int queue_num = 15; +static char *ifname = "eth1"; +static char *nic_pci_addr = "0000:06:00.0"; +static unsigned int iterations = 0; + +void print_bytes(void *ptr, size_t size) +{ + unsigned char *p = ptr; + size_t i; + for (i = 0; i < size; i++) { + printf("%02hhX ", p[i]); + } + printf("\n"); +} + +void print_nonzero_bytes(void *ptr, size_t size) +{ + unsigned char *p = ptr; + unsigned int i; + for (i = 0; i < size; i++) { + if (p[i]) + printf("%c", p[i]); + } + printf("\n"); +} + +void initialize_validation(void *line, size_t size) +{ + static unsigned char seed = 1; + unsigned char *ptr = line; + for (size_t i = 0; i < size; i++) { + ptr[i] = seed; + seed++; + if (seed == 254) + seed = 0; + } +} + +void validate_buffer(void *line, size_t size) +{ + static unsigned char seed = 1; + int errors = 0; + + unsigned char *ptr = line; + for (size_t i = 0; i < size; i++) { + if (ptr[i] != seed) { + fprintf(stderr, + "Failed validation: expected=%u, " + "actual=%u, index=%lu\n", + seed, ptr[i], i); + errors++; + if (errors > 20) + exit(1); + } + seed++; + if (seed == 254) + seed = 0; + } + + fprintf(stdout, "Validated buffer\n"); +} + +/* Triggers a driver reset... + * + * The proper way to do this is probably 'ethtool --reset', but I don't have + * that supported on my current test bed. I resort to changing this + * configuration in the driver which also causes a driver reset... + */ +static void reset_flow_steering() +{ + char command[256]; + memset(command, 0, sizeof(command)); + snprintf(command, sizeof(command), "sudo ethtool -K %s ntuple off", + ifname); + system(command); + + memset(command, 0, sizeof(command)); + snprintf(command, sizeof(command), "sudo ethtool -K %s ntuple on", + ifname); + system(command); +} + +static void configure_flow_steering() +{ + char command[256]; + memset(command, 0, sizeof(command)); + snprintf(command, sizeof(command), + "sudo ethtool -N %s flow-type tcp4 src-ip %s dst-ip %s " + "src-port %s dst-port %s queue %d", + ifname, client_ip, server_ip, port, port, queue_num); + system(command); +} + +/* Triggers a device reset, which causes the dmabuf pages binding to take + * effect. A better and more generic way to do this may be ethtool --reset.
+ */ +static void trigger_device_reset() +{ + char command[256]; + memset(command, 0, sizeof(command)); + snprintf(command, sizeof(command), + "sudo ethtool --set-priv-flags %s enable-header-split off", + ifname); + system(command); + + memset(command, 0, sizeof(command)); + snprintf(command, sizeof(command), + "sudo ethtool --set-priv-flags %s enable-header-split on", + ifname); + system(command); +} + +static void create_udmabuf(int *devfd, int *memfd, int *buf, size_t dmabuf_size) +{ + struct udmabuf_create create; + int ret; + + *devfd = open("/dev/udmabuf", O_RDWR); + if (*devfd < 0) { + fprintf(stderr, + "%s: [skip,no-udmabuf: Unable to access DMA " + "buffer device file]\n", + TEST_PREFIX); + exit(70); + } + + *memfd = memfd_create("udmabuf-test", MFD_ALLOW_SEALING); + if (*memfd < 0) { + printf("%s: [skip,no-memfd]\n", TEST_PREFIX); + exit(72); + } + + ret = fcntl(*memfd, F_ADD_SEALS, F_SEAL_SHRINK); + if (ret < 0) { + printf("%s: [skip,fcntl-add-seals]\n", TEST_PREFIX); + exit(73); + } + + ret = ftruncate(*memfd, dmabuf_size); + if (ret == -1) { + printf("%s: [FAIL,memfd-truncate]\n", TEST_PREFIX); + exit(74); + } + + memset(&create, 0, sizeof(create)); + + create.memfd = *memfd; + create.offset = 0; + create.size = dmabuf_size; + *buf = ioctl(*devfd, UDMABUF_CREATE, &create); + if (*buf < 0) { + printf("%s: [FAIL, create udmabuf]\n", TEST_PREFIX); + exit(75); + } +} + + +int do_server() +{ + struct dma_buf_create_pages_info pages_create_info = { 0 }; + int devfd, memfd, buf, buf_pages, ret; + size_t dmabuf_size; + + dmabuf_size = getpagesize() * NUM_PAGES; + + create_udmabuf(&devfd, &memfd, &buf, dmabuf_size); + + pages_create_info.dma_buf_fd = buf; + pages_create_info.type = DMA_BUF_PAGES_NET_RX; + pages_create_info.create_flags = (1 << queue_num); + + ret = sscanf(nic_pci_addr, "0000:%hhx:%hhx.%hhx", + &pages_create_info.pci_bdf[0], + &pages_create_info.pci_bdf[1], + &pages_create_info.pci_bdf[2]); + + if (ret != 3) { + printf("%s: [FAIL, parse fail]\n", TEST_PREFIX); + exit(76); + } + + buf_pages = ioctl(buf, DMA_BUF_CREATE_PAGES, &pages_create_info); + if (buf_pages < 0) { + perror("create pages"); + exit(77); + } + + char *buf_mem = NULL; + buf_mem = mmap(NULL, dmabuf_size, PROT_READ | PROT_WRITE, MAP_SHARED, + buf, 0); + if (buf_mem == MAP_FAILED) { + perror("mmap()"); + exit(1); + } + + /* Need to trigger the NIC to reallocate its RX pages, otherwise the + * bind doesn't take effect. 
+ */
+	trigger_device_reset();
+
+	sleep(1);
+
+	reset_flow_steering();
+	configure_flow_steering();
+
+	struct sockaddr_in server_sin = { 0 };
+
+	server_sin.sin_family = AF_INET;
+	server_sin.sin_port = htons(atoi(port));
+
+	ret = inet_pton(server_sin.sin_family, server_ip, &server_sin.sin_addr);
+	if (ret != 1) {
+		printf("%s: [FAIL, parse server address]\n", TEST_PREFIX);
+		exit(79);
+	}
+
+	int socket_fd = socket(server_sin.sin_family, SOCK_STREAM, 0);
+	if (socket_fd < 0) {
+		printf("%s: [FAIL, create socket]\n", TEST_PREFIX);
+		exit(76);
+	}
+
+	int opt = 1;
+	ret = setsockopt(socket_fd, SOL_SOCKET,
+			 SO_REUSEADDR | SO_REUSEPORT | SO_ZEROCOPY, &opt,
+			 sizeof(opt));
+	if (ret) {
+		printf("%s: [FAIL, set sock opt]: %s\n", TEST_PREFIX,
+		       strerror(errno));
+		exit(76);
+	}
+
+	printf("binding to address %s:%d\n", server_ip,
+	       ntohs(server_sin.sin_port));
+
+	ret = bind(socket_fd, (struct sockaddr *)&server_sin,
+		   sizeof(server_sin));
+	if (ret) {
+		printf("%s: [FAIL, bind]: %s\n", TEST_PREFIX, strerror(errno));
+		exit(76);
+	}
+
+	ret = listen(socket_fd, 1);
+	if (ret) {
+		printf("%s: [FAIL, listen]: %s\n", TEST_PREFIX,
+		       strerror(errno));
+		exit(76);
+	}
+
+	struct sockaddr_in client_addr;
+	socklen_t client_addr_len = sizeof(client_addr);
+
+	char buffer[256];
+
+	inet_ntop(server_sin.sin_family, &server_sin.sin_addr, buffer,
+		  sizeof(buffer));
+	printf("Waiting for connection on %s:%d\n", buffer,
+	       ntohs(server_sin.sin_port));
+	int client_fd = accept(socket_fd, (struct sockaddr *)&client_addr,
+			       &client_addr_len);
+
+	inet_ntop(client_addr.sin_family, &client_addr.sin_addr, buffer,
+		  sizeof(buffer));
+	printf("Got connection from %s:%d\n", buffer,
+	       ntohs(client_addr.sin_port));
+
+	char iobuf[819200];
+	char ctrl_data[sizeof(int) * 20000];
+
+	size_t total_received = 0;
+	size_t i = 0;
+	size_t page_aligned_frags = 0;
+	size_t non_page_aligned_frags = 0;
+
+	while (1) {
+		bool is_devmem = false;
+
+		printf("\n\n");
+
+		struct msghdr msg = { 0 };
+		struct iovec iov = { .iov_base = iobuf,
+				     .iov_len = sizeof(iobuf) };
+		msg.msg_iov = &iov;
+		msg.msg_iovlen = 1;
+		msg.msg_control = ctrl_data;
+		msg.msg_controllen = sizeof(ctrl_data);
+		ssize_t ret = recvmsg(client_fd, &msg, MSG_SOCK_DEVMEM);
+		printf("recvmsg ret=%zd\n", ret);
+		if (ret < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
+			continue;
+		if (ret < 0) {
+			perror("recvmsg");
+			continue;
+		}
+		if (ret == 0) {
+			printf("client exited\n");
+			goto cleanup;
+		}
+
+		i++;
+		struct cmsghdr *cm = NULL;
+		struct cmsg_devmem *cmsg_devmem = NULL;
+		for (cm = CMSG_FIRSTHDR(&msg); cm; cm = CMSG_NXTHDR(&msg, cm)) {
+			if (cm->cmsg_level != SOL_SOCKET ||
+			    (cm->cmsg_type != SCM_DEVMEM_OFFSET &&
+			     cm->cmsg_type != SCM_DEVMEM_HEADER)) {
+				fprintf(stdout, "skipping non-devmem cmsg\n");
+				continue;
+			}
+
+			cmsg_devmem = (struct cmsg_devmem *)CMSG_DATA(cm);
+			is_devmem = true;
+
+			if (cm->cmsg_type == SCM_DEVMEM_HEADER) {
+				/* TODO: process data copied from skb's linear
+				 * buffer.
+				 */
+				fprintf(stdout,
+					"SCM_DEVMEM_HEADER. "
+					"cmsg_devmem->frag_size=%u\n",
+					cmsg_devmem->frag_size);
+
+				continue;
+			}
" + "cmsg_devmem->frag_size=%u\n", + cmsg_devmem->frag_size); + + continue; + } + + struct devmemtoken token = { cmsg_devmem->frag_token, + 1 }; + + total_received += cmsg_devmem->frag_size; + printf("received frag_page=%u, in_page_offset=%u," + " frag_offset=%u, frag_size=%u, token=%u" + " total_received=%lu\n", + cmsg_devmem->frag_offset >> PAGE_SHIFT, + cmsg_devmem->frag_offset % PAGE_SIZE, + cmsg_devmem->frag_offset, cmsg_devmem->frag_size, + cmsg_devmem->frag_token, total_received); + + if (cmsg_devmem->frag_size % PAGE_SIZE) + non_page_aligned_frags++; + else + page_aligned_frags++; + + struct dma_buf_sync sync = { 0 }; + sync.flags = DMA_BUF_SYNC_READ | DMA_BUF_SYNC_START; + ioctl(buf, DMA_BUF_IOCTL_SYNC, &sync); + + if (do_validation) + validate_buffer( + ((unsigned char *)buf_mem) + + cmsg_devmem->frag_offset, + cmsg_devmem->frag_size); + else + print_nonzero_bytes( + ((unsigned char *)buf_mem) + + cmsg_devmem->frag_offset, + cmsg_devmem->frag_size); + + sync.flags = DMA_BUF_SYNC_READ | DMA_BUF_SYNC_END; + ioctl(buf, DMA_BUF_IOCTL_SYNC, &sync); + + ret = setsockopt(client_fd, SOL_SOCKET, + SO_DEVMEM_DONTNEED, &token, + sizeof(token)); + if (ret) { + perror("SO_DEVMEM_DONTNEED"); + exit(1); + } + } + if (!is_devmem) + printf("flow steering error\n"); + + printf("total_received=%lu\n", total_received); + } + +cleanup: + fprintf(stdout, "%s: ok\n", TEST_PREFIX); + + fprintf(stdout, "page_aligned_frags=%lu, non_page_aligned_frags=%lu\n", + page_aligned_frags, non_page_aligned_frags); + + fprintf(stdout, "page_aligned_frags=%lu, non_page_aligned_frags=%lu\n", + page_aligned_frags, non_page_aligned_frags); + + munmap(buf_mem, dmabuf_size); + close(client_fd); + close(socket_fd); + close(buf_pages); + close(buf); + close(memfd); + close(devfd); + trigger_device_reset(); + + return 0; +} + +int do_client() +{ + printf("doing client\n"); + + struct dma_buf_create_pages_info pages_create_info = { 0 }; + int devfd, memfd, buf, buf_pages, ret; + size_t dmabuf_size; + + dmabuf_size = getpagesize() * NUM_PAGES; + + create_udmabuf(&devfd, &memfd, &buf, dmabuf_size); + + pages_create_info.dma_buf_fd = buf; + pages_create_info.type = DMA_BUF_PAGES_NET_TX; + + ret = sscanf(nic_pci_addr, "0000:%hhx:%hhx.%hhx", + &pages_create_info.pci_bdf[0], + &pages_create_info.pci_bdf[1], + &pages_create_info.pci_bdf[2]); + + if (ret != 3) { + printf("%s: [FAIL, parse fail]\n", TEST_PREFIX); + exit(76); + } + + buf_pages = ioctl(buf, DMA_BUF_CREATE_PAGES, &pages_create_info); + if (buf_pages < 0) { + perror("create pages"); + exit(77); + } + + char *buf_mem = NULL; + buf_mem = mmap(NULL, dmabuf_size, PROT_READ | PROT_WRITE, MAP_SHARED, + buf, 0); + if (buf_mem == MAP_FAILED) { + perror("mmap()"); + exit(1); + } + + struct sockaddr_in server_sin; + server_sin.sin_family = AF_INET; + server_sin.sin_port = htons(atoi(port)); + + ret = inet_pton(server_sin.sin_family, server_ip, &server_sin.sin_addr); + if (socket < 0) { + printf("%s: [FAIL, create socket]\n", TEST_PREFIX); + exit(79); + } + + int socket_fd = socket(server_sin.sin_family, SOCK_STREAM, 0); + if (socket < 0) { + printf("%s: [FAIL, create socket]\n", TEST_PREFIX); + exit(76); + } + + int opt = 1; + ret = setsockopt(socket_fd, SOL_SOCKET, + SO_REUSEADDR | SO_REUSEPORT | SO_ZEROCOPY, &opt, + sizeof(opt)); + if (ret) { + printf("%s: [FAIL, set sock opt]: %s\n", TEST_PREFIX, + strerror(errno)); + exit(76); + } + + struct sockaddr_in client_sin; + client_sin.sin_family = AF_INET; + client_sin.sin_port = htons(atoi(port)); + + ret = 
+	ret = inet_pton(client_sin.sin_family, client_ip,
+			&client_sin.sin_addr);
+	if (ret != 1) {
+		printf("%s: [FAIL, parse client address]\n", TEST_PREFIX);
+		exit(79);
+	}
+
+	ret = bind(socket_fd, (struct sockaddr *)&client_sin,
+		   sizeof(client_sin));
+	if (ret) {
+		printf("%s: [FAIL, bind]: %s\n", TEST_PREFIX, strerror(errno));
+		exit(76);
+	}
+
+	ret = setsockopt(socket_fd, SOL_SOCKET, SO_ZEROCOPY, &opt, sizeof(opt));
+	if (ret) {
+		printf("%s: [FAIL, set sock opt]: %s\n", TEST_PREFIX,
+		       strerror(errno));
+		exit(76);
+	}
+
+	ret = connect(socket_fd, (struct sockaddr *)&server_sin,
+		      sizeof(server_sin));
+	if (ret) {
+		printf("%s: [FAIL, connect]: %s\n", TEST_PREFIX,
+		       strerror(errno));
+		exit(76);
+	}
+
+	char *line = NULL;
+	ssize_t line_size = 0;
+	size_t len = 0;
+
+	size_t i = 0;
+
+	while (!iterations || i < iterations) {
+		i++;
+		free(line);
+		line = NULL;
+		if (do_validation) {
+			line = malloc(do_validation);
+			if (!line) {
+				fprintf(stderr, "Failed to allocate\n");
+				exit(1);
+			}
+			memset(line, 0, do_validation);
+			initialize_validation(line, do_validation);
+			line_size = do_validation;
+		} else {
+			line_size = getline(&line, &len, stdin);
+		}
+
+		/* getline() returns -1 on EOF or error; stop sending then. */
+		if (line_size < 0)
+			break;
+
+		fprintf(stdout, "DEBUG: read line_size=%zd\n", line_size);
+
+		struct dma_buf_sync sync = { 0 };
+
+		sync.flags = DMA_BUF_SYNC_WRITE | DMA_BUF_SYNC_START;
+		ioctl(buf, DMA_BUF_IOCTL_SYNC, &sync);
+
+		memset(buf_mem, 0, dmabuf_size);
+		memcpy(buf_mem, line, line_size);
+
+		if (do_validation)
+			validate_buffer(buf_mem, line_size);
+
+		struct iovec iov = { .iov_base = NULL, .iov_len = line_size };
+		struct msghdr msg = { 0 };
+
+		msg.msg_iov = &iov;
+		msg.msg_iovlen = 1;
+
+		char ctrl_data[CMSG_SPACE(sizeof(int) * 2)];
+
+		msg.msg_control = ctrl_data;
+		msg.msg_controllen = sizeof(ctrl_data);
+
+		struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
+
+		cmsg->cmsg_level = SOL_SOCKET;
+		cmsg->cmsg_type = SCM_DEVMEM_OFFSET;
+		cmsg->cmsg_len = CMSG_LEN(sizeof(int) * 2);
+		((int *)CMSG_DATA(cmsg))[0] = buf_pages;
+		((int *)CMSG_DATA(cmsg))[1] = 0;
+
+		ret = sendmsg(socket_fd, &msg, MSG_ZEROCOPY);
+		if (ret < 0)
+			perror("sendmsg");
+		else
+			fprintf(stdout, "DEBUG: sendmsg_ret=%d\n", ret);
+
+		sync.flags = DMA_BUF_SYNC_WRITE | DMA_BUF_SYNC_END;
+		ioctl(buf, DMA_BUF_IOCTL_SYNC, &sync);
+
+		/* Sleep for a bit before we overwrite the dmabuf that's
+		 * being sent.
+		 */
+		if (do_validation)
+			sleep(1);
+	}
+
+	fprintf(stdout, "%s: ok\n", TEST_PREFIX);
+
+	/* Stay alive so the receiver can drain the data; the cleanup below
+	 * is currently unreachable.
+	 */
+	while (1)
+		sleep(10);
+
+	munmap(buf_mem, dmabuf_size);
+
+	free(line);
+	close(socket_fd);
+	close(buf_pages);
+	close(buf);
+	close(memfd);
+	close(devfd);
+	trigger_device_reset();
+
+	return 0;
+}
+
+int main(int argc, char *argv[])
+{
+	int is_server = 0, opt;
+
+	while ((opt = getopt(argc, argv, "ls:c:p:v:q:f:n:i:")) != -1) {
+		switch (opt) {
+		case 'l':
+			is_server = 1;
+			break;
+		case 's':
+			server_ip = optarg;
+			break;
+		case 'c':
+			client_ip = optarg;
+			break;
+		case 'p':
+			port = optarg;
+			break;
+		case 'v':
+			do_validation = atoll(optarg);
+			break;
+		case 'q':
+			queue_num = atoi(optarg);
+			break;
+		case 'f':
+			ifname = optarg;
+			break;
+		case 'n':
+			nic_pci_addr = optarg;
+			break;
+		case 'i':
+			iterations = atoll(optarg);
+			break;
+		case '?':
+			printf("unknown option: %c\n", optopt);
+			break;
+		}
+	}
+
+	/* Report any extra non-option arguments. */
+	for (; optind < argc; optind++)
+		printf("extra arguments: %s\n", argv[optind]);
+
+	if (is_server)
+		return do_server();
+
+	return do_client();
+}
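[Editorial note] The receive loop above returns one token per fragment with a separate SO_DEVMEM_DONTNEED call, and the { cmsg_devmem->frag_token, 1 } initializer suggests the token structure carries a starting token plus a count. Assuming that reading is correct, and assuming tokens for consecutive frags are adjacent (neither is guaranteed by this series; the struct and field names below are illustrative, not the kernel's), a receiver could coalesce runs of tokens to cut down on setsockopt calls:

	#include <stddef.h>
	#include <sys/socket.h>

	/* Hypothetical batched release of devmem frag tokens. Assumes the
	 * kernel's devmem token is laid out as { start, count } and that
	 * SO_DEVMEM_DONTNEED accepts a merged range.
	 */
	struct devmemtoken_batch {
		unsigned int token_start;
		unsigned int token_count;
	};

	static int release_tokens(int fd, struct devmemtoken_batch *tokens,
				  size_t n)
	{
		size_t i, start = 0;

		for (i = 1; i <= n; i++) {
			/* Grow the current run while the next token abuts it. */
			if (i < n &&
			    tokens[i].token_start ==
				    tokens[start].token_start +
					    tokens[start].token_count) {
				tokens[start].token_count += tokens[i].token_count;
				continue;
			}
			/* Run ended (or input exhausted): release it. */
			if (setsockopt(fd, SOL_SOCKET, SO_DEVMEM_DONTNEED,
				       &tokens[start], sizeof(tokens[start])))
				return -1;
			start = i;
		}
		return 0;
	}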
From patchwork Mon Jul 10 22:33:00 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mina Almasry
X-Patchwork-Id: 118127
Date: Mon, 10 Jul 2023 15:33:00 -0700
In-Reply-To: <20230710223304.1174642-1-almasrymina@google.com>
References: <20230710223304.1174642-1-almasrymina@google.com>
Message-ID: <20230710223304.1174642-10-almasrymina@google.com>
Subject: [RFC PATCH 09/10] memory-provider: updates core provider API for
 devmem TCP
From: Mina Almasry
To: linux-kernel@vger.kernel.org, linux-media@vger.kernel.org,
 dri-devel@lists.freedesktop.org, linaro-mm-sig@lists.linaro.org,
 netdev@vger.kernel.org, linux-arch@vger.kernel.org,
 linux-kselftest@vger.kernel.org
Cc: Mina Almasry, Sumit Semwal, Christian König, "David S. Miller",
 Eric Dumazet, Jakub Kicinski, Paolo Abeni, Jesper Dangaard Brouer,
 Ilias Apalodimas, Arnd Bergmann, David Ahern, Willem de Bruijn,
 Shuah Khan, jgg@ziepe.ca

Implement a few updates to Jakub's RFC memory provider API to make it
suitable for device memory TCP:

1. Currently for devmem TCP the driver's netdev_rx_queue holds a
   reference to the dma_buf_pages struct and needs to pass that to the
   page_pool's memory provider somehow. For PoC purposes, create a
   pp->mp_priv field that is set by the driver. This likely needs a
   better API, probably dependent on the general memory provider API.

2. The current memory_provider API gives the memory_provider the option
   to override put_page(), but calls page_pool_clear_pp_info() after the
   memory provider has released the page. IMO if the page freeing is
   delegated to the provider then the page_pool should not modify the
   page after release_page() has been called.

Signed-off-by: Mina Almasry
---
 include/net/page_pool.h | 1 +
 net/core/page_pool.c    | 7 ++++---
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/include/net/page_pool.h b/include/net/page_pool.h
index 364fe6924258..7b6668479baf 100644
--- a/include/net/page_pool.h
+++ b/include/net/page_pool.h
@@ -78,6 +78,7 @@ struct page_pool_params {
 	struct device *dev; /* device, for DMA pre-mapping purposes */
 	struct napi_struct *napi; /* Sole consumer of pages, otherwise NULL */
 	u8 memory_provider; /* haaacks! should be user-facing */
+	void *mp_priv; /* argument to pass to the memory provider */
 	enum dma_data_direction dma_dir; /* DMA mapping direction */
 	unsigned int max_len; /* max DMA sync memory size */
 	unsigned int offset; /* DMA addr offset */
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index d50f6728e4f6..df3f431fcff3 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -241,6 +241,7 @@ static int page_pool_init(struct page_pool *pool,
 		goto free_ptr_ring;
 	}
 
+	pool->mp_priv = pool->p.mp_priv;
 	if (pool->mp_ops) {
 		err = pool->mp_ops->init(pool);
 		if (err) {
@@ -564,16 +565,16 @@ void page_pool_return_page(struct page_pool *pool, struct page *page)
 	else
 		__page_pool_release_page_dma(pool, page);
 
-	page_pool_clear_pp_info(page);
-
 	/* This may be the last page returned, releasing the pool, so
 	 * it is not safe to reference pool afterwards.
 	 */
 	count = atomic_inc_return_relaxed(&pool->pages_state_release_cnt);
 	trace_page_pool_state_release(pool, page, count);
 
-	if (put)
+	if (put) {
+		page_pool_clear_pp_info(page);
 		put_page(page);
+	}
 
 	/* An optimization would be to call __free_pages(page, pool->p.order)
 	 * knowing page is not part of page-cache (thus avoiding a
 	 * __page_cache_release() call).
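[Editorial note] To make the intended use of the new field concrete, here is a hedged sketch of the driver-side wiring: the driver stashes its dma_buf_pages binding in page_pool_params.mp_priv when creating the RX queue's pool, and the provider's init() callback later reads it back via pool->mp_priv. Everything except the mp_priv field itself is illustrative; the queue struct is hypothetical and PP_MP_DMABUF_DEVMEM only appears in the next patch of this series.

	#include <net/page_pool.h>

	/* Hypothetical per-RX-queue state; only page_pool_params.mp_priv is
	 * added by this patch.
	 */
	struct my_rx_queue {
		struct device *dev;
		struct dma_buf_pages *dma_buf_pages; /* from DMA_BUF_CREATE_PAGES */
	};

	static struct page_pool *my_create_devmem_pool(struct my_rx_queue *rxq)
	{
		struct page_pool_params pp_params = {
			.order		 = 0,
			.pool_size	 = 1024,
			.dev		 = rxq->dev,
			.dma_dir	 = DMA_FROM_DEVICE,
			/* Provider type from the next patch in this series. */
			.memory_provider = PP_MP_DMABUF_DEVMEM,
			/* Handed through untouched to the provider's init(). */
			.mp_priv	 = rxq->dma_buf_pages,
		};

		return page_pool_create(&pp_params);
	}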
From patchwork Mon Jul 10 22:33:01 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mina Almasry
X-Patchwork-Id: 118134
Date: Mon, 10 Jul 2023 15:33:01 -0700
In-Reply-To: <20230710223304.1174642-1-almasrymina@google.com>
References: <20230710223304.1174642-1-almasrymina@google.com>
Message-ID: <20230710223304.1174642-11-almasrymina@google.com>
Subject: [RFC PATCH 10/10] memory-provider: add dmabuf devmem provider
From: Mina Almasry
To: linux-kernel@vger.kernel.org, linux-media@vger.kernel.org,
 dri-devel@lists.freedesktop.org, linaro-mm-sig@lists.linaro.org,
 netdev@vger.kernel.org, linux-arch@vger.kernel.org,
 linux-kselftest@vger.kernel.org
Cc: Mina Almasry, Sumit Semwal, Christian König, "David S. Miller",
 Eric Dumazet, Jakub Kicinski, Paolo Abeni, Jesper Dangaard Brouer,
 Ilias Apalodimas, Arnd Bergmann, David Ahern, Willem de Bruijn,
 Shuah Khan, jgg@ziepe.ca

Use Jakub's memory provider PoC API:

https://github.com/kuba-moo/linux/tree/pp-providers

to implement a dmabuf devmem memory provider. The provider allocates
NET_RX dmabuf pages to the page pool. This abstracts any custom memory
allocation or freeing changes for devmem TCP away from drivers using
the page pool.

The memory provider allocates NET_RX pages from the dmabuf pages
provided by the driver. These pages are ZONE_DEVICE pages with the sg
dma_addrs stored in the zone_device_data entry of the page. The page
pool entries in struct page are in a union with the ZONE_DEVICE
entries, and, without special handling, the page pool would overwrite
the data in the ZONE_DEVICE fields.

To solve this, the memory provider converts the page from a ZONE_DEVICE
page to a ZONE_NORMAL page upon giving it to the page pool, and converts
it back to a ZONE_DEVICE page upon getting it back from the page pool.
This is safe to do because the NET_RX pages are dmabuf pages created to
hold the dma_addr in the dma_buf_map_attachment() sg_table entries, and
are only used with code that handles them specifically.

However, since dmabuf pages can now also be page pool pages, we need to
update two places to detect this correctly:

1. is_dma_buf_page() needs to be updated to correctly detect dmabuf
   pages after they've been inserted into the pool.

2. dma_buf_page_to_dma_addr() needs to be updated. For page pool pages,
   the dma_addr exists in page->dma_addr. For non-page-pool pages, the
   dma_addr exists in page->zone_device_data.
Signed-off-by: Mina Almasry
---
 include/linux/dma-buf.h |  29 ++++++++++-
 include/net/page_pool.h |  20 ++++++++
 net/core/page_pool.c    | 104 ++++++++++++++++++++++++++++++++++++----
 3 files changed, 143 insertions(+), 10 deletions(-)

diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index 93228a2fec47..896359fa998d 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -692,15 +692,26 @@ static inline bool is_dma_buf_pages_file(struct file *file)
 
 struct page *dma_buf_pages_net_rx_alloc(struct dma_buf_pages *priv);
 
+static inline bool is_dma_buf_page_net_rx(struct page *page)
+{
+	struct dma_buf_pages *priv;
+
+	return (is_page_pool_page(page) && (priv = page->pp->mp_priv) &&
+		priv->pgmap.ops == &dma_buf_pgmap_ops);
+}
+
 static inline bool is_dma_buf_page(struct page *page)
 {
 	return (is_zone_device_page(page) && page->pgmap &&
-		page->pgmap->ops == &dma_buf_pgmap_ops);
+		page->pgmap->ops == &dma_buf_pgmap_ops) ||
+	       is_dma_buf_page_net_rx(page);
 }
 
 static inline dma_addr_t dma_buf_page_to_dma_addr(struct page *page)
 {
-	return (dma_addr_t)page->zone_device_data;
+	return is_dma_buf_page_net_rx(page) ?
+		       (dma_addr_t)page->dma_addr :
+		       (dma_addr_t)page->zone_device_data;
 }
 
 static inline int dma_buf_map_sg(struct device *dev, struct scatterlist *sg,
@@ -718,6 +729,16 @@ static inline int dma_buf_map_sg(struct device *dev, struct scatterlist *sg,
 
 	return nents;
 }
+
+static inline bool is_dma_buf_pages_priv(void *ptr)
+{
+	struct dma_buf_pages *priv = (struct dma_buf_pages *)ptr;
+
+	if (!priv || priv->pgmap.ops != &dma_buf_pgmap_ops)
+		return false;
+
+	return true;
+}
 #else
 static inline bool is_dma_buf_page(struct page *page)
 {
@@ -745,6 +766,10 @@ static inline struct page *dma_buf_pages_net_rx_alloc(struct dma_buf_pages *priv
 	return NULL;
 }
 
+static inline bool is_dma_buf_pages_priv(void *ptr)
+{
+	return false;
+}
 #endif
diff --git a/include/net/page_pool.h b/include/net/page_pool.h
index 7b6668479baf..a57757a13cc8 100644
--- a/include/net/page_pool.h
+++ b/include/net/page_pool.h
@@ -157,6 +157,7 @@ enum pp_memory_provider_type {
 	PP_MP_HUGE_SPLIT, /* 2MB, online page alloc */
 	PP_MP_HUGE,	  /* 2MB, all memory pre-allocated */
 	PP_MP_HUGE_1G,	  /* 1G pages, MEP, pre-allocated */
+	PP_MP_DMABUF_DEVMEM, /* dmabuf devmem provider */
 };
 
 struct pp_memory_provider_ops {
@@ -170,6 +171,7 @@ extern const struct pp_memory_provider_ops basic_ops;
 extern const struct pp_memory_provider_ops hugesp_ops;
 extern const struct pp_memory_provider_ops huge_ops;
 extern const struct pp_memory_provider_ops huge_1g_ops;
+extern const struct pp_memory_provider_ops dmabuf_devmem_ops;
 
 struct page_pool {
 	struct page_pool_params p;
@@ -420,4 +422,22 @@ static inline void page_pool_nid_changed(struct page_pool *pool, int new_nid)
 		page_pool_update_nid(pool, new_nid);
 }
 
+static inline bool is_page_pool_page(struct page *page)
+{
+	/* page->pp_magic is OR'ed with PP_SIGNATURE after the allocation
+	 * in order to preserve any existing bits, such as bit 0 for the
+	 * head page of compound page and bit 1 for pfmemalloc page, so
+	 * mask those bits for freeing side when doing below checking,
+	 * and page_is_pfmemalloc() is checked in __page_pool_put_page()
+	 * to avoid recycling the pfmemalloc page.
+	 */
+	if (unlikely((page->pp_magic & ~0x3UL) != PP_SIGNATURE))
+		return false;
+
+	if (!page->pp)
+		return false;
+
+	return true;
+}
+
 #endif /* _NET_PAGE_POOL_H */
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index df3f431fcff3..e626d4e309c1 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -236,6 +236,9 @@ static int page_pool_init(struct page_pool *pool,
 	case PP_MP_HUGE_1G:
 		pool->mp_ops = &huge_1g_ops;
 		break;
+	case PP_MP_DMABUF_DEVMEM:
+		pool->mp_ops = &dmabuf_devmem_ops;
+		break;
 	default:
 		err = -EINVAL;
 		goto free_ptr_ring;
@@ -975,14 +978,7 @@ bool page_pool_return_skb_page(struct page *page, bool napi_safe)
 
 	page = compound_head(page);
 
-	/* page->pp_magic is OR'ed with PP_SIGNATURE after the allocation
-	 * in order to preserve any existing bits, such as bit 0 for the
-	 * head page of compound page and bit 1 for pfmemalloc page, so
-	 * mask those bits for freeing side when doing below checking,
-	 * and page_is_pfmemalloc() is checked in __page_pool_put_page()
-	 * to avoid recycling the pfmemalloc page.
-	 */
-	if (unlikely((page->pp_magic & ~0x3UL) != PP_SIGNATURE))
+	if (!is_page_pool_page(page))
 		return false;
 
 	pp = page->pp;
@@ -1538,3 +1534,95 @@ const struct pp_memory_provider_ops huge_1g_ops = {
 	.alloc_pages = mp_huge_1g_alloc_pages,
 	.release_page = mp_huge_1g_release,
 };
+
+/*** "Dmabuf devmem page" ***/
+
+/* The dmabuf devmem memory provider allocates DMA_BUF_PAGES_NET_RX pages,
+ * which back the dma_buf_map_attachment() from the NIC to the device memory.
+ *
+ * These pages are wrappers around the dma_addr of the sg entries in the
+ * sg_table returned from dma_buf_map_attachment(). They can be passed to the
+ * networking stack, which will generate devmem skbs from them and process them
+ * correctly.
+ */
+static int mp_dmabuf_devmem_init(struct page_pool *pool)
+{
+	struct dma_buf_pages *priv;
+
+	priv = pool->mp_priv;
+	if (!is_dma_buf_pages_priv(priv))
+		return -EINVAL;
+
+	return 0;
+}
+
+static void mp_dmabuf_devmem_destroy(struct page_pool *pool)
+{
+}
+
+static struct page *mp_dmabuf_devmem_alloc_pages(struct page_pool *pool,
+						 gfp_t gfp)
+{
+	struct dma_buf_pages *priv = pool->mp_priv;
+	dma_addr_t dma_addr;
+	struct page *page;
+
+	page = dma_buf_pages_net_rx_alloc(priv);
+	if (!page)
+		return page;
+
+	/* It shouldn't be possible for the allocation to give us a page not
+	 * belonging to this page_pool's pgmap.
+	 */
+	BUG_ON(page->pgmap != &priv->pgmap);
+
+	/* dma_buf_pages_net_rx_alloc() returns a ZONE_DEVICE page. Prepare to
+	 * convert it into a page_pool page. We need to hold pgmap and
+	 * zone_device_data (which holds the dma_addr).
+	 *
+	 * DMA_BUF_PAGES_NET_RX are dmabuf pages created specifically to wrap
+	 * the dma_addr of the sg_table into a struct page. These pages are
+	 * used by code specifically equipped to handle them, so this
+	 * conversion from ZONE_DEVICE page to page pool page should be safe.
+	 */
+	dma_addr = (dma_addr_t)page->zone_device_data;
+
+	set_page_zone(page, ZONE_NORMAL);
+	page->pp_magic = 0;
+	page_pool_set_pp_info(pool, page);
+
+	page->dma_addr = dma_addr;
+
+	return page;
+}
+
+static bool mp_dmabuf_devmem_release_page(struct page_pool *pool,
+					  struct page *page)
+{
+	struct dma_buf_pages *priv = pool->mp_priv;
+	unsigned long dma_addr = page->dma_addr;
+
+	page_pool_clear_pp_info(page);
+
+	/* As the page pool releases the page, restore it back to a
+	 * ZONE_DEVICE page so it gets freed according to
+	 * page->pgmap->ops->page_free().
+	 */
+	set_page_zone(page, ZONE_DEVICE);
+	page->zone_device_data = (void *)dma_addr;
+	page->pgmap = &priv->pgmap;
+	put_page(page);
+
+	/* Return false here as we don't want the page pool touching the page
+	 * after it's released to us.
+	 */
+	return false;
+}
+
+const struct pp_memory_provider_ops dmabuf_devmem_ops = {
+	.init = mp_dmabuf_devmem_init,
+	.destroy = mp_dmabuf_devmem_destroy,
+	.alloc_pages = mp_dmabuf_devmem_alloc_pages,
+	.release_page = mp_dmabuf_devmem_release_page,
+};
+EXPORT_SYMBOL(dmabuf_devmem_ops);
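[Editorial note] For completeness, a sketch of what consuming this provider could look like from a driver's RX refill path. page_pool_alloc_pages() and dma_buf_page_to_dma_addr() are the pieces from this series; the ring structure and descriptor writer are hypothetical stand-ins for device-specific code.

	#include <linux/dma-buf.h>
	#include <net/page_pool.h>

	/* Hypothetical RX ring; only the page pool and dma-buf calls below
	 * are from this patch series.
	 */
	struct my_rx_ring {
		struct page_pool *pool;
		u32 next_to_use;
	};

	static void my_write_rx_desc(struct my_rx_ring *ring, dma_addr_t dma)
	{
		/* Device-specific: program the descriptor at
		 * ring->next_to_use with 'dma'. Stubbed for illustration.
		 */
	}

	static int my_rx_refill_one(struct my_rx_ring *ring)
	{
		struct page *page;
		dma_addr_t dma;

		page = page_pool_alloc_pages(ring->pool, GFP_ATOMIC);
		if (!page)
			return -ENOMEM;

		/* For pages handed out by the dmabuf devmem provider this
		 * resolves to page->dma_addr, which
		 * mp_dmabuf_devmem_alloc_pages() filled in from the sg
		 * entry's dma_addr.
		 */
		dma = dma_buf_page_to_dma_addr(page);

		my_write_rx_desc(ring, dma);
		ring->next_to_use++;
		return 0;
	}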