From patchwork Tue Nov 15 18:28:41 2022
X-Patchwork-Submitter: Eric Dumazet
X-Patchwork-Id: 20500
Date: Tue, 15 Nov 2022 18:28:41 +0000
Message-ID: <20221115182841.2640176-1-edumazet@google.com>
Subject: [PATCH v2 -next] iommu/dma: avoid expensive indirect calls for sync operations
From: Eric Dumazet
To: Joerg Roedel, Robin Murphy, Will Deacon
Cc: linux-kernel, netdev@vger.kernel.org, Eric Dumazet, iommu@lists.linux.dev
Quite often, NIC devices do not need dma_sync operations, on x86_64
at least. Indeed, when dev_is_dma_coherent(dev) is true and
dev_use_swiotlb(dev) is false, iommu_dma_sync_single_for_cpu() and
friends do nothing.

However, indirectly calling them when CONFIG_RETPOLINE=y consumes about
10% of cycles on a cpu receiving packets from softirq at ~100Gbit rate,
as shown in [1]. Even when CONFIG_RETPOLINE is not set, there is a cost
of about 3%.

This patch adds a dev->skip_dma_sync boolean that devices can opt in
to. For instance, iommu_setup_dma_ops() sets it to true when
CONFIG_DMA_API_DEBUG is not set and dev_is_dma_coherent(dev) is true.
Later, if/when swiotlb is used for the first time, the flag is turned
off again, from swiotlb_tbl_map_single().

We might in the future inline these helpers again, like:

static inline void dma_sync_single_for_cpu(struct device *dev,
					   dma_addr_t addr, size_t size,
					   enum dma_data_direction dir)
{
	if (!dev_skip_dma_sync(dev))
		__dma_sync_single_for_cpu(dev, addr, size, dir);
}

perf profile before the patch:

	18.53%  [kernel]  [k] gq_rx_skb
	14.77%  [kernel]  [k] napi_reuse_skb
	 8.95%  [kernel]  [k] skb_release_data
	 5.42%  [kernel]  [k] dev_gro_receive
	 5.37%  [kernel]  [k] memcpy
<*>	 5.26%  [kernel]  [k] iommu_dma_sync_sg_for_cpu
	 4.78%  [kernel]  [k] tcp_gro_receive
<*>	 4.42%  [kernel]  [k] iommu_dma_sync_sg_for_device
	 4.12%  [kernel]  [k] ipv6_gro_receive
	 3.65%  [kernel]  [k] gq_pool_get
	 3.25%  [kernel]  [k] skb_gro_receive
	 2.07%  [kernel]  [k] napi_gro_frags
	 1.98%  [kernel]  [k] tcp6_gro_receive
	 1.27%  [kernel]  [k] gq_rx_prep_buffers
	 1.18%  [kernel]  [k] gq_rx_napi_handler
	 0.99%  [kernel]  [k] csum_partial
	 0.74%  [kernel]  [k] csum_ipv6_magic
	 0.72%  [kernel]  [k] free_pcp_prepare
	 0.60%  [kernel]  [k] __napi_poll
	 0.58%  [kernel]  [k] net_rx_action
	 0.56%  [kernel]  [k] read_tsc
<*>	 0.50%  [kernel]  [k] __x86_indirect_thunk_r11
	 0.45%  [kernel]  [k] memset

perf profile after the patch (the lines marked <*> no longer show up,
and overall cpu usage looks much better, ~60% instead of ~72%):

	25.56%  [kernel]  [k] gq_rx_skb
	 9.90%  [kernel]  [k] napi_reuse_skb
	 7.39%  [kernel]  [k] dev_gro_receive
	 6.78%  [kernel]  [k] memcpy
	 6.53%  [kernel]  [k] skb_release_data
	 6.39%  [kernel]  [k] tcp_gro_receive
	 5.71%  [kernel]  [k] ipv6_gro_receive
	 4.35%  [kernel]  [k] napi_gro_frags
	 4.34%  [kernel]  [k] skb_gro_receive
	 3.50%  [kernel]  [k] gq_pool_get
	 3.08%  [kernel]  [k] gq_rx_napi_handler
	 2.35%  [kernel]  [k] tcp6_gro_receive
	 2.06%  [kernel]  [k] gq_rx_prep_buffers
	 1.32%  [kernel]  [k] csum_partial
	 0.93%  [kernel]  [k] csum_ipv6_magic
	 0.65%  [kernel]  [k] net_rx_action

Many thanks to Robin Murphy for his feedback and ideas that made this
patch much better!
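For reference, here is the life cycle of the new flag condensed into
one place. This is an illustrative sketch only, assuming the patch
below is applied; setup_path(), sync_fast_path() and swiotlb_map_path()
are made-up names standing in for iommu_setup_dma_ops(), the
dma_sync_*() helpers and swiotlb_tbl_map_single():

#include <linux/device.h>
#include <linux/dma-map-ops.h>

/* 1) Opt-in: set at IOMMU DMA setup time, when syncs are known no-ops. */
static void setup_path(struct device *dev)
{
	if (!IS_ENABLED(CONFIG_DMA_API_DEBUG) && dev_is_dma_coherent(dev))
		dev->skip_dma_sync = true;
}

/* 2) Fast path: each dma_sync_*() helper bails out before reaching the
 * (possibly retpoline-protected) indirect ops->sync_* call.
 */
static void sync_fast_path(struct device *dev)
{
	if (dev_skip_dma_sync(dev))
		return;
	/* ... dma_map_direct() / ops->sync_* dispatch as before ... */
}

/* 3) Safety net: the first swiotlb bounce-buffer mapping makes syncs
 * mandatory again, so the flag is cleared for good on this device.
 */
static void swiotlb_map_path(struct device *dev)
{
	if (unlikely(dev->skip_dma_sync))
		dev->skip_dma_sync = false;
}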
Signed-off-by: Eric Dumazet
Cc: Robin Murphy
Cc: Joerg Roedel
Cc: Will Deacon
Cc: iommu@lists.linux.dev
Reviewed-by: Robin Murphy
---
 drivers/iommu/dma-iommu.c   |  2 ++
 include/linux/device.h      |  1 +
 include/linux/dma-map-ops.h |  5 +++++
 kernel/dma/mapping.c        | 20 ++++++++++++++++----
 kernel/dma/swiotlb.c        |  3 +++
 5 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 9297b741f5e80e2408e864fc3f779410d6b04d49..bd3f4d3d646cc57c7588f22d49ea32ac693e38ff 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1587,6 +1587,8 @@ void iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 dma_limit)
 		if (iommu_dma_init_domain(domain, dma_base, dma_limit, dev))
 			goto out_err;
 		dev->dma_ops = &iommu_dma_ops;
+		if (!IS_ENABLED(CONFIG_DMA_API_DEBUG) && dev_is_dma_coherent(dev))
+			dev->skip_dma_sync = true;
 	}
 
 	return;
diff --git a/include/linux/device.h b/include/linux/device.h
index 424b55df02727b5742070f72374fd65f5dd68151..2fbb2cc18e44e21eba5f43557ee16d0dc92ef2ef 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -647,6 +647,7 @@ struct device {
 	defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL)
 	bool			dma_coherent:1;
 #endif
+	bool			skip_dma_sync:1;
 #ifdef CONFIG_DMA_OPS_BYPASS
 	bool			dma_ops_bypass : 1;
 #endif
diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h
index d678afeb8a13a3a54380a959d14f79bca9c23d8e..4691081f71c51da5468cf6703570ebc7a64d40c5 100644
--- a/include/linux/dma-map-ops.h
+++ b/include/linux/dma-map-ops.h
@@ -275,6 +275,11 @@ static inline bool dev_is_dma_coherent(struct device *dev)
 }
 #endif /* CONFIG_ARCH_HAS_DMA_COHERENCE_H */
 
+static inline bool dev_skip_dma_sync(struct device *dev)
+{
+	return dev->skip_dma_sync;
+}
+
 void *arch_dma_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
 		gfp_t gfp, unsigned long attrs);
 void arch_dma_free(struct device *dev, size_t size, void *cpu_addr,
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 33437d6206445812b6d4d5b33c77235d18074dec..5d5d286ffae7fa6b7ff1aef46bdc59e7e31a8038 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -328,9 +328,12 @@ EXPORT_SYMBOL(dma_unmap_resource);
 void dma_sync_single_for_cpu(struct device *dev, dma_addr_t addr, size_t size,
 		enum dma_data_direction dir)
 {
-	const struct dma_map_ops *ops = get_dma_ops(dev);
+	const struct dma_map_ops *ops;
 
 	BUG_ON(!valid_dma_direction(dir));
+	if (dev_skip_dma_sync(dev))
+		return;
+	ops = get_dma_ops(dev);
 	if (dma_map_direct(dev, ops))
 		dma_direct_sync_single_for_cpu(dev, addr, size, dir);
 	else if (ops->sync_single_for_cpu)
@@ -342,9 +345,12 @@ EXPORT_SYMBOL(dma_sync_single_for_cpu);
 void dma_sync_single_for_device(struct device *dev, dma_addr_t addr,
 		size_t size, enum dma_data_direction dir)
 {
-	const struct dma_map_ops *ops = get_dma_ops(dev);
+	const struct dma_map_ops *ops;
 
 	BUG_ON(!valid_dma_direction(dir));
+	if (dev_skip_dma_sync(dev))
+		return;
+	ops = get_dma_ops(dev);
 	if (dma_map_direct(dev, ops))
 		dma_direct_sync_single_for_device(dev, addr, size, dir);
 	else if (ops->sync_single_for_device)
@@ -356,9 +362,12 @@ EXPORT_SYMBOL(dma_sync_single_for_device);
 void dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg,
 		    int nelems, enum dma_data_direction dir)
 {
-	const struct dma_map_ops *ops = get_dma_ops(dev);
+	const struct dma_map_ops *ops;
 
 	BUG_ON(!valid_dma_direction(dir));
+	if (dev_skip_dma_sync(dev))
+		return;
+	ops = get_dma_ops(dev);
 	if (dma_map_direct(dev, ops))
 		dma_direct_sync_sg_for_cpu(dev, sg, nelems, dir);
 	else if (ops->sync_sg_for_cpu)
@@ -370,9 +379,12 @@ EXPORT_SYMBOL(dma_sync_sg_for_cpu);
 void dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
 		       int nelems, enum dma_data_direction dir)
 {
-	const struct dma_map_ops *ops = get_dma_ops(dev);
+	const struct dma_map_ops *ops;
 
 	BUG_ON(!valid_dma_direction(dir));
+	if (dev_skip_dma_sync(dev))
+		return;
+	ops = get_dma_ops(dev);
 	if (dma_map_direct(dev, ops))
 		dma_direct_sync_sg_for_device(dev, sg, nelems, dir);
 	else if (ops->sync_sg_for_device)
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 339a990554e7fed98dd337efe4fb759a98161cdb..03ebd9803db1a457600f1fac8a18fb3dde724a6f 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -734,6 +734,9 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
 	int index;
 	phys_addr_t tlb_addr;
 
+	if (unlikely(dev->skip_dma_sync))
+		dev->skip_dma_sync = false;
+
 	if (!mem || !mem->nslabs) {
 		dev_warn_ratelimited(dev,
 			"Can not allocate SWIOTLB buffer earlier and can't now provide you with the DMA bounce buffer");
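For context, the sort of driver fast path that benefits is a NIC RX
loop issuing two syncs per packet. The sketch below is hypothetical
driver code, not part of this patch; with the patch applied, both calls
return immediately on coherent, non-swiotlb devices instead of taking
the indirect ops->sync_* path:

#include <linux/device.h>
#include <linux/dma-mapping.h>

static void rx_poll_example(struct device *dev, dma_addr_t buf, size_t len)
{
	/* Hand the buffer to the CPU before reading the packet data. */
	dma_sync_single_for_cpu(dev, buf, len, DMA_FROM_DEVICE);

	/* ... build an skb and pass the packet up the stack ... */

	/* Give the buffer back to the device for the next packet. */
	dma_sync_single_for_device(dev, buf, len, DMA_FROM_DEVICE);
}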