From patchwork Wed Mar 29 14:13:07 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 76602 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp449600vqo; Wed, 29 Mar 2023 07:19:44 -0700 (PDT) X-Google-Smtp-Source: AKy350bbUnyD4tXHCp2w8wE6KiaGu+As09pco0qMPtt05OTjMXYQfjhAmmE7kQSIgdN94r+scch7 X-Received: by 2002:a17:906:3288:b0:8b1:fc58:a4ad with SMTP id 8-20020a170906328800b008b1fc58a4admr21442649ejw.11.1680099584635; Wed, 29 Mar 2023 07:19:44 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680099584; cv=none; d=google.com; s=arc-20160816; b=h5uPNJdtR300jYbBbAZFLaT+0Un0pp/dEdNRaSTgAiIbbCf095HcRpd6akQFPrF7MM fTXn1JsNyqq6k40RAtml0xQsgCiBAhqItWrA3P8fYLn62tiS2nS6VrvxxW5OuLJlIoHX Wy4AvIg4fSdMca+EKPr8o7Pi7fL1B+Q0ZIKKvF4ZKNrT0Mq/ETzjn14HNMjWDaD/9V+P WeRFBTlLheSfLzZhK0z1kUiPYwtnjr1kUBd3Z9m+asi0QH1jwN7RHi09NcXe0RI/3qCm 7pxhZvXg5w4/UwwRt+IDZeddLwAG7z3JdaByDOINuPtKoXTkLp5xoMXJxkX2kQ8xEIG4 auTQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=fsOLHSRSLMInNcD449MTPXsGwp1kWYvONeYqTpf/Ez4=; b=DSLABCzXKX19WYxvdwLHOmJOfPN+uYxiQBMqKiPM8fj7dnz+Bywfsz/6x3HAnFPwlT sBLeKYg4uNefRA3Webjeh+08ZANxK1Z25BNm8tKd4fgWaHU1vtQ6EqathhSjAwByerL7 oLjd0y0dy/WEPJ3QJpKGpYEDDECwYmbUVFdrlmAWBKzpo248jJnJinVh7gRJeidMgBai cFjXAuBCB7teBqoEL9X/Jg729QDCkWh5sB1Gz948z4YPSLm/CNKhvg9PqdXPwyqTbxZQ 4O/Mby0qsVL3XbBMcoqZYu7gBcbW0sVCh02xbkRZlgfY1xyUzDbmSsW+mh7yx4brqVNM osXQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=beBy+dD0; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id n18-20020a1709065db200b009335a2e3cfcsi29860605ejv.919.2023.03.29.07.19.19; Wed, 29 Mar 2023 07:19:44 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=beBy+dD0; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230371AbjC2OQu (ORCPT + 99 others); Wed, 29 Mar 2023 10:16:50 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53052 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230204AbjC2OQc (ORCPT ); Wed, 29 Mar 2023 10:16:32 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8657D55AA for ; Wed, 29 Mar 2023 07:15:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1680099246; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=fsOLHSRSLMInNcD449MTPXsGwp1kWYvONeYqTpf/Ez4=; b=beBy+dD022NdWtdaVKpwN4DF5Zv3960mDrhNFTVvEAOGATKOBtVvvg4/ScRqkNKm5PnO1y /Mt1spWdGcjxaOd2khxraHm1drcUpTrtrNvnVpwc32UatUTTollwfO/4wOAgPlLhezjkXK 1c85G+Vadt0vdvVC6gb68f/GpPMfyQc= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-516-Xdb9sur-PoeNk1trWuucgw-1; Wed, 29 Mar 2023 10:14:03 -0400 X-MC-Unique: Xdb9sur-PoeNk1trWuucgw-1 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.rdu2.redhat.com [10.11.54.5]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 080F01C06EE7; Wed, 29 Mar 2023 14:14:02 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.18]) by smtp.corp.redhat.com (Postfix) with ESMTP id 94AC618EC6; Wed, 29 Mar 2023 14:13:59 +0000 (UTC) From: David Howells To: Matthew Wilcox , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni Cc: David Howells , Al Viro , Christoph Hellwig , Jens Axboe , Jeff Layton , Christian Brauner , Chuck Lever III , Linus Torvalds , netdev@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Steve French , Shyam Prasad N , Rohith Surabattula , linux-cachefs@redhat.com, linux-cifs@vger.kernel.org Subject: [RFC PATCH v2 01/48] netfs: Fix netfs_extract_iter_to_sg() for ITER_UBUF/IOVEC Date: Wed, 29 Mar 2023 15:13:07 +0100 Message-Id: <20230329141354.516864-2-dhowells@redhat.com> In-Reply-To: <20230329141354.516864-1-dhowells@redhat.com> References: <20230329141354.516864-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.5 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1761712102156631533?= X-GMAIL-MSGID: =?utf-8?q?1761712102156631533?= Fix netfs_extract_iter_to_sg() for ITER_UBUF and ITER_IOVEC to set the size of the page to the part of the page extracted, not the remaining amount of data in the extracted page array at that point. This doesn't yet affect anything as cifs, the only current user, only passes in non-user-backed iterators. Fixes: 018584697533 ("netfs: Add a function to extract an iterator into a scatterlist") Signed-off-by: David Howells cc: Jeff Layton cc: Steve French cc: Shyam Prasad N cc: Rohith Surabattula cc: linux-cachefs@redhat.com cc: linux-cifs@vger.kernel.org cc: linux-fsdevel@vger.kernel.org --- fs/netfs/iterator.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/netfs/iterator.c b/fs/netfs/iterator.c index e9a45dea748a..8a4c86687429 100644 --- a/fs/netfs/iterator.c +++ b/fs/netfs/iterator.c @@ -139,7 +139,7 @@ static ssize_t netfs_extract_user_to_sg(struct iov_iter *iter, size_t seg = min_t(size_t, PAGE_SIZE - off, len); *pages++ = NULL; - sg_set_page(sg, page, len, off); + sg_set_page(sg, page, seg, off); sgtable->nents++; sg++; len -= seg; From patchwork Wed Mar 29 14:13:08 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 76614 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp450796vqo; Wed, 29 Mar 2023 07:21:22 -0700 (PDT) X-Google-Smtp-Source: AKy350aWgVbMJJM+f+h8xHVIHQQsGfOgdupAvl79MS3MLARnuX1DPPNPJitmxttfPbnXr1zVkqkI X-Received: by 2002:a17:906:74b:b0:8b1:3a23:8ec7 with SMTP id z11-20020a170906074b00b008b13a238ec7mr20047140ejb.43.1680099682175; Wed, 29 Mar 2023 07:21:22 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680099682; cv=none; d=google.com; s=arc-20160816; b=A55U0UdZYkvAvekqJJuFk/L8QY7916+T44TiipnBqIF9Gy1yueM4D+xLnN65yL88qe zNwd2lKwzhHpYHQoYTx8vJdeeLXQ0spSZXyE4ZpArh7RMoryp9vynywyokUDCZi5ZJCB UgAe64eZ+1C4WEdwk5qLSGibkd01wViblQG0qTwByOBAYoHx8A/nfcqh+viOnTFXuaBc 7vNvnJJh2RtZzyg48g4FUMEe4em1b6iE0+X0LCTzb8/By53bAZrjoI82zI8LWvpZunoJ aVHVD7dwDlcfjScnsaq3qfTYNrniVcXt8Ks11UbBkEfUQThgHkQyLjdvdtcFsAKC/fP5 mzVA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=s7FY6IkB7NRnaaunO8kQ3wM8dgoy+C5dVBJnz6ApgXk=; b=R8jgXPusrZOkvVSafBdwXDk3KrmrXiTeDuHdATrKoX7oeh5M7Oa55ZppcmS8okAliQ uleh7Tyu1BbvcwPjZyNxAiUSn0ibuEZcCUrE7ltu7j7jIjILSM5Z5ippClYPyg3+IPf1 5Df6NU42quhO7tMgutFcaAeXmqFcLblMB5K/+mllAUO27+iTstOcnq95b5meTox1qx7Y P6jo+4BK0EedGhzwVSJKC9j8elQtIiggRE6KsEH+psutfO+MNBMwYfcGC4LhNVNNh5sj JUYxqlm0tdjTpC8357bmzzYeYK5F8B11r5OFghnUgFWqmpEY5xyc9J832UhSoGay1WKb QcFg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=WWaY4cuM; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id gl7-20020a170906e0c700b0093cda757bd8si16418385ejb.132.2023.03.29.07.20.56; Wed, 29 Mar 2023 07:21:22 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=WWaY4cuM; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229907AbjC2OTW (ORCPT + 99 others); Wed, 29 Mar 2023 10:19:22 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55582 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230308AbjC2ORh (ORCPT ); Wed, 29 Mar 2023 10:17:37 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id F17EE5B91 for ; Wed, 29 Mar 2023 07:15:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1680099249; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=s7FY6IkB7NRnaaunO8kQ3wM8dgoy+C5dVBJnz6ApgXk=; b=WWaY4cuMxsc0r7tkwhHJ0sq+xwcZx0RgXCtDqpmD/R90GyMteblACjdXRdwPh3OmJglPka bOYDIgs1FENV/p69reaS/948xjusq1tHe4Bgc4BlwJfAlKfQrJNeNLW8Y9HXZY4jAGFlxe TaoaWmZ6FNOd/p+jwR77h/AK0K1zK00= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-440-h-W_J7FgPEG0aunda_tCKQ-1; Wed, 29 Mar 2023 10:14:05 -0400 X-MC-Unique: h-W_J7FgPEG0aunda_tCKQ-1 Received: from smtp.corp.redhat.com (int-mx10.intmail.prod.int.rdu2.redhat.com [10.11.54.10]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id AF299855315; Wed, 29 Mar 2023 14:14:04 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.18]) by smtp.corp.redhat.com (Postfix) with ESMTP id B6A6F492B01; Wed, 29 Mar 2023 14:14:02 +0000 (UTC) From: David Howells To: Matthew Wilcox , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni Cc: David Howells , Al Viro , Christoph Hellwig , Jens Axboe , Jeff Layton , Christian Brauner , Chuck Lever III , Linus Torvalds , netdev@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-nfs@vger.kernel.org Subject: [RFC PATCH v2 02/48] iov_iter: Remove last_offset member Date: Wed, 29 Mar 2023 15:13:08 +0100 Message-Id: <20230329141354.516864-3-dhowells@redhat.com> In-Reply-To: <20230329141354.516864-1-dhowells@redhat.com> References: <20230329141354.516864-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.10 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1761712204170302581?= X-GMAIL-MSGID: =?utf-8?q?1761712204170302581?= With the removal of ITER_PIPE, the last_offset member of struct iov_iter is no longer used, so remove it and un-unionise the remaining member. Signed-off-by: David Howells cc: Jens Axboe cc: Matthew Wilcox cc: Alexander Viro cc: Jeff Layton cc: linux-nfs@vger.kernel.org cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org cc: netdev@vger.kernel.org --- include/linux/uio.h | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/include/linux/uio.h b/include/linux/uio.h index 74598426edb4..2d8a70cb9b26 100644 --- a/include/linux/uio.h +++ b/include/linux/uio.h @@ -43,10 +43,7 @@ struct iov_iter { bool nofault; bool data_source; bool user_backed; - union { - size_t iov_offset; - int last_offset; - }; + size_t iov_offset; size_t count; union { const struct iovec *iov; From patchwork Wed Mar 29 14:13:09 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 76601 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp449538vqo; Wed, 29 Mar 2023 07:19:39 -0700 (PDT) X-Google-Smtp-Source: AKy350bYq6mjhy1sBtlqd5qyQCPkkHQmNEbPsKhzICdriRmRgO1u1J0e6wmIpIDVqUhaWQ6sVq+T X-Received: by 2002:aa7:d452:0:b0:4fa:733f:8abb with SMTP id q18-20020aa7d452000000b004fa733f8abbmr18430495edr.32.1680099579333; Wed, 29 Mar 2023 07:19:39 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680099579; cv=none; d=google.com; s=arc-20160816; b=pAJAXqPzKRutkvH47iLJ3DCDHATmI2Z1ACA1AVH/+nJHtb2DZQXtncCK2u3ScEHVn/ zEm6ATcilE+6hP2310ksi4HcsTJS9PSecV3pU4uJ6yte9vLIY10MP737c08XEDHxX2Bo sjMKcFsvQK3Fx+RrXFNFByyvaeUgLDEbANyB3tO+EQ5E422Urnvi/OG9SY8Ox6yi56SN yGmwT5Sjz247WxPuwI/u1AD9iNre3yoeMG/rXrLrAPwtq163ifCaqCZD367gdkuSUawy ZcZ8fjKoyYxx5C5BRmgx6ElWxhajUotIAvwfP3BF3GEKkwc8hOxcGM5BTCpXy03i2l3z 9kiA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=YKmPIBLTzwmxTywoQlbWhVqEAglt5Diim843HzqnLOY=; b=Y3Gtt/9K5W1iC5EW4w0Fd3J/MY+1H0nVWA+sHFwV/Lq0ldz0nWxaXA8t/yAp0sDVYM rcYjZ8ORSe00yzloEgqFPfmV5LIAXVwRk88A5e8IpEJBpLXbHe44Ve7a9OtdFdY7uDp/ OiWARWFgGlOjuP1nZCbP9Q22tRGrKVvLWWasOP+HczS1mgPSDogLMtLMWKt5ksl4SN7O uRLsuZpw9x+sOvRrYbSW08fa3awrt4EOmsoVcaxqgmzySbVK12z/dgAHfoTmGzW0kokY 0YWiTh9ScMJnJoRoPQkEFn1l6AysJgoi9LTdzKc6cjpsv0z4W+7S5tJzEJu/OY9U/4AN tnHg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=hBQpR6Bt; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id v24-20020aa7d658000000b00501dd8e9f9dsi23789140edr.370.2023.03.29.07.19.10; Wed, 29 Mar 2023 07:19:39 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=hBQpR6Bt; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230443AbjC2OQn (ORCPT + 99 others); Wed, 29 Mar 2023 10:16:43 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53018 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230032AbjC2OQb (ORCPT ); Wed, 29 Mar 2023 10:16:31 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8EA4959C3 for ; Wed, 29 Mar 2023 07:15:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1680099254; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=YKmPIBLTzwmxTywoQlbWhVqEAglt5Diim843HzqnLOY=; b=hBQpR6Bt84wr3e9vA9QALa3Zoga0PWs22MbfIserQehZSmB/4N2nEOF6HzC5TYmIbuTM+h /bYQpE6hSybDfrQ8emd9wuGndvitE4cuxS4IUDX9Y3+KC39yMgkZvgqYUjIKh0wJs9s3rt SA5EYhtU2+ytMSnvoVsRtkP6vDvD0EU= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-132-y_B6QV2XMFK1PdJabznHmQ-1; Wed, 29 Mar 2023 10:14:09 -0400 X-MC-Unique: y_B6QV2XMFK1PdJabznHmQ-1 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.rdu2.redhat.com [10.11.54.5]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 93F51381459A; Wed, 29 Mar 2023 14:14:07 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.18]) by smtp.corp.redhat.com (Postfix) with ESMTP id 4B2C718EC2; Wed, 29 Mar 2023 14:14:05 +0000 (UTC) From: David Howells To: Matthew Wilcox , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni Cc: David Howells , Al Viro , Christoph Hellwig , Jens Axboe , Jeff Layton , Christian Brauner , Chuck Lever III , Linus Torvalds , netdev@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Trond Myklebust , Anna Schumaker , linux-nfs@vger.kernel.org Subject: [RFC PATCH v2 03/48] iov_iter: Add an iterator-of-iterators Date: Wed, 29 Mar 2023 15:13:09 +0100 Message-Id: <20230329141354.516864-4-dhowells@redhat.com> In-Reply-To: <20230329141354.516864-1-dhowells@redhat.com> References: <20230329141354.516864-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.5 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1761712096256012482?= X-GMAIL-MSGID: =?utf-8?q?1761712096256012482?= Add a new I/O iterator type, ITER_ITERLIST, that allows iteration over a series of I/O iterators, provided the iterators are all the same direction (all ITER_SOURCE or all ITER_DEST) and none of them are themselves ITER_ITERLIST (this function is recursive). To make reversion possible, I've added an 'orig_count' member into the iov_iter struct so that reversion of an ITER_ITERLIST can know when to go move backwards through the iter list. It might make more sense to make the iterator list element, say: struct itervec { struct iov_iter iter; size_t orig_count; }; rather than expanding struct iov_iter itself and have iov_iter_iterlist() set vec[i].orig_count from vec[i].iter->count. Also, for the moment, I've only permitted its use with source iterators (eg. sendmsg). To use this, you allocate an array of iterators and point the list iterator at it, e.g.: struct iov_iter iters[3]; struct msghdr msg; iov_iter_bvec(&iters[0], ITER_SOURCE, &head_bv, 1, sizeof(marker) + head->iov_len); iov_iter_xarray(&iters[1], ITER_SOURCE, xdr->pages, xdr->page_fpos, xdr->page_len); iov_iter_kvec(&iters[2], ITER_SOURCE, &tail_kv, 1, tail->iov_len); iov_iter_iterlist(&msg.msg_iter, ITER_SOURCE, iters, 3, size); This can be used by network filesystem protocols, such as sunrpc, to glue a header and a trailer on to some data to form a message and then dump the entire message onto the socket in a single go. [!] Note: I'm not entirely sure that this is a good idea: the problem is that it's reasonably common practice to copy an iterator by direct assignment - and that works for the existing iterators... but not this one. With the iterator-of-iterators, the list of iterators has to be modified if we recurse. It's probably fine just for calling sendmsg() from network filesystems, but I'm not 100% sure of that. Suggested-by: Trond Myklebust Signed-off-by: David Howells cc: Trond Myklebust cc: Anna Schumaker cc: Chuck Lever cc: Jens Axboe cc: Matthew Wilcox cc: Alexander Viro cc: Jeff Layton cc: "David S. Miller" cc: Eric Dumazet cc: Jakub Kicinski cc: Paolo Abeni cc: linux-nfs@vger.kernel.org cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org cc: netdev@vger.kernel.org --- include/linux/uio.h | 13 ++- lib/iov_iter.c | 254 ++++++++++++++++++++++++++++++++++++++++++-- 2 files changed, 260 insertions(+), 7 deletions(-) diff --git a/include/linux/uio.h b/include/linux/uio.h index 2d8a70cb9b26..6c75c94566b8 100644 --- a/include/linux/uio.h +++ b/include/linux/uio.h @@ -27,6 +27,7 @@ enum iter_type { ITER_XARRAY, ITER_DISCARD, ITER_UBUF, + ITER_ITERLIST, }; #define ITER_SOURCE 1 // == WRITE @@ -45,12 +46,14 @@ struct iov_iter { bool user_backed; size_t iov_offset; size_t count; + size_t orig_count; union { const struct iovec *iov; const struct kvec *kvec; const struct bio_vec *bvec; struct xarray *xarray; void __user *ubuf; + struct iov_iter *iterlist; }; union { unsigned long nr_segs; @@ -101,6 +104,11 @@ static inline bool iov_iter_is_xarray(const struct iov_iter *i) return iov_iter_type(i) == ITER_XARRAY; } +static inline bool iov_iter_is_iterlist(const struct iov_iter *i) +{ + return iov_iter_type(i) == ITER_ITERLIST; +} + static inline unsigned char iov_iter_rw(const struct iov_iter *i) { return i->data_source ? WRITE : READ; @@ -235,6 +243,8 @@ void iov_iter_bvec(struct iov_iter *i, unsigned int direction, const struct bio_ void iov_iter_discard(struct iov_iter *i, unsigned int direction, size_t count); void iov_iter_xarray(struct iov_iter *i, unsigned int direction, struct xarray *xarray, loff_t start, size_t count); +void iov_iter_iterlist(struct iov_iter *i, unsigned int direction, struct iov_iter *iterlist, + unsigned long nr_segs, size_t count); ssize_t iov_iter_get_pages(struct iov_iter *i, struct page **pages, size_t maxsize, unsigned maxpages, size_t *start, iov_iter_extraction_t extraction_flags); @@ -342,7 +352,8 @@ static inline void iov_iter_ubuf(struct iov_iter *i, unsigned int direction, .user_backed = true, .data_source = direction, .ubuf = buf, - .count = count + .count = count, + .orig_count = count, }; } /* Flags for iov_iter_get/extract_pages*() */ diff --git a/lib/iov_iter.c b/lib/iov_iter.c index fad95e4cf372..8a9ae4af45fc 100644 --- a/lib/iov_iter.c +++ b/lib/iov_iter.c @@ -282,7 +282,8 @@ void iov_iter_init(struct iov_iter *i, unsigned int direction, .iov = iov, .nr_segs = nr_segs, .iov_offset = 0, - .count = count + .count = count, + .orig_count = count, }; } EXPORT_SYMBOL(iov_iter_init); @@ -364,6 +365,26 @@ size_t _copy_from_iter(void *addr, size_t bytes, struct iov_iter *i) if (WARN_ON_ONCE(!i->data_source)) return 0; + if (unlikely(iov_iter_is_iterlist(i))) { + size_t copied = 0; + + while (bytes && i->count) { + size_t part = min(bytes, i->iterlist->count), n; + + if (part > 0) + n = _copy_from_iter(addr, part, i->iterlist); + addr += n; + copied += n; + bytes -= n; + i->count -= n; + if (n < part || !bytes) + break; + i->iterlist++; + i->nr_segs--; + } + return copied; + } + if (user_backed_iter(i)) might_fault(); iterate_and_advance(i, bytes, base, len, off, @@ -380,6 +401,27 @@ size_t _copy_from_iter_nocache(void *addr, size_t bytes, struct iov_iter *i) if (WARN_ON_ONCE(!i->data_source)) return 0; + if (unlikely(iov_iter_is_iterlist(i))) { + size_t copied = 0; + + while (bytes && i->count) { + size_t part = min(bytes, i->iterlist->count), n; + + if (part > 0) + n = _copy_from_iter_nocache(addr, part, + i->iterlist); + addr += n; + copied += n; + bytes -= n; + i->count -= n; + if (n < part || !bytes) + break; + i->iterlist++; + i->nr_segs--; + } + return copied; + } + iterate_and_advance(i, bytes, base, len, off, __copy_from_user_inatomic_nocache(addr + off, base, len), memcpy(addr + off, base, len) @@ -411,6 +453,27 @@ size_t _copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i) if (WARN_ON_ONCE(!i->data_source)) return 0; + if (unlikely(iov_iter_is_iterlist(i))) { + size_t copied = 0; + + while (bytes && i->count) { + size_t part = min(bytes, i->iterlist->count), n; + + if (part > 0) + n = _copy_from_iter_flushcache(addr, part, + i->iterlist); + addr += n; + copied += n; + bytes -= n; + i->count -= n; + if (n < part || !bytes) + break; + i->iterlist++; + i->nr_segs--; + } + return copied; + } + iterate_and_advance(i, bytes, base, len, off, __copy_from_user_flushcache(addr + off, base, len), memcpy_flushcache(addr + off, base, len) @@ -514,7 +577,31 @@ EXPORT_SYMBOL(iov_iter_zero); size_t copy_page_from_iter_atomic(struct page *page, unsigned offset, size_t bytes, struct iov_iter *i) { - char *kaddr = kmap_atomic(page), *p = kaddr + offset; + char *kaddr, *p; + + if (unlikely(iov_iter_is_iterlist(i))) { + size_t copied = 0; + + while (bytes && i->count) { + size_t part = min(bytes, i->iterlist->count), n; + + if (part > 0) + n = copy_page_from_iter_atomic(page, offset, part, + i->iterlist); + offset += n; + copied += n; + bytes -= n; + i->count -= n; + if (n < part || !bytes) + break; + i->iterlist++; + i->nr_segs--; + } + return copied; + } + + kaddr = kmap_atomic(page); + p = kaddr + offset; if (!page_copy_sane(page, offset, bytes)) { kunmap_atomic(kaddr); return 0; @@ -585,19 +672,49 @@ void iov_iter_advance(struct iov_iter *i, size_t size) iov_iter_bvec_advance(i, size); } else if (iov_iter_is_discard(i)) { i->count -= size; + }else if (iov_iter_is_iterlist(i)) { + i->count -= size; + for (;;) { + size_t part = min(size, i->iterlist->count); + + if (part > 0) + iov_iter_advance(i->iterlist, part); + size -= part; + if (!size) + break; + i->iterlist++; + i->nr_segs--; + } } } EXPORT_SYMBOL(iov_iter_advance); +static void iov_iter_revert_iterlist(struct iov_iter *i, size_t unroll) +{ + for (;;) { + size_t part = min(unroll, i->iterlist->orig_count - i->iterlist->count); + + if (part > 0) + iov_iter_revert(i->iterlist, part); + unroll -= part; + if (!unroll) + break; + i->iterlist--; + i->nr_segs++; + } +} + void iov_iter_revert(struct iov_iter *i, size_t unroll) { if (!unroll) return; - if (WARN_ON(unroll > MAX_RW_COUNT)) + if (WARN_ON(unroll > i->orig_count - i->count)) return; i->count += unroll; if (unlikely(iov_iter_is_discard(i))) return; + if (unlikely(iov_iter_is_iterlist(i))) + return iov_iter_revert_iterlist(i, unroll); if (unroll <= i->iov_offset) { i->iov_offset -= unroll; return; @@ -641,6 +758,8 @@ EXPORT_SYMBOL(iov_iter_revert); */ size_t iov_iter_single_seg_count(const struct iov_iter *i) { + if (iov_iter_is_iterlist(i)) + i = i->iterlist; if (i->nr_segs > 1) { if (likely(iter_is_iovec(i) || iov_iter_is_kvec(i))) return min(i->count, i->iov->iov_len - i->iov_offset); @@ -662,7 +781,8 @@ void iov_iter_kvec(struct iov_iter *i, unsigned int direction, .kvec = kvec, .nr_segs = nr_segs, .iov_offset = 0, - .count = count + .count = count, + .orig_count = count, }; } EXPORT_SYMBOL(iov_iter_kvec); @@ -678,7 +798,8 @@ void iov_iter_bvec(struct iov_iter *i, unsigned int direction, .bvec = bvec, .nr_segs = nr_segs, .iov_offset = 0, - .count = count + .count = count, + .orig_count = count, }; } EXPORT_SYMBOL(iov_iter_bvec); @@ -706,6 +827,7 @@ void iov_iter_xarray(struct iov_iter *i, unsigned int direction, .xarray = xarray, .xarray_start = start, .count = count, + .orig_count = count, .iov_offset = 0 }; } @@ -727,11 +849,47 @@ void iov_iter_discard(struct iov_iter *i, unsigned int direction, size_t count) .iter_type = ITER_DISCARD, .data_source = false, .count = count, + .orig_count = count, .iov_offset = 0 }; } EXPORT_SYMBOL(iov_iter_discard); +/** + * iov_iter_iterlist - Initialise an I/O iterator that is a list of iterators + * @iter: The iterator to initialise. + * @direction: The direction of the transfer. + * @iterlist: The list of iterators + * @nr_segs: The number of elements in the list + * @count: The size of the I/O buffer in bytes. + * + * Set up an I/O iterator that just discards everything that's written to it. + * It's only available as a source iterator (for WRITE), all the iterators in + * the list must be the same and none of them can be ITER_ITERLIST type. + */ +void iov_iter_iterlist(struct iov_iter *iter, unsigned int direction, + struct iov_iter *iterlist, unsigned long nr_segs, + size_t count) +{ + unsigned long i; + + BUG_ON(direction != WRITE); + for (i = 0; i < nr_segs; i++) { + BUG_ON(iterlist[i].iter_type == ITER_ITERLIST); + BUG_ON(!iterlist[i].data_source); + } + + *iter = (struct iov_iter){ + .iter_type = ITER_ITERLIST, + .data_source = true, + .count = count, + .orig_count = count, + .iterlist = iterlist, + .nr_segs = nr_segs, + }; +} +EXPORT_SYMBOL(iov_iter_iterlist); + static bool iov_iter_aligned_iovec(const struct iov_iter *i, unsigned addr_mask, unsigned len_mask) { @@ -879,6 +1037,15 @@ unsigned long iov_iter_alignment(const struct iov_iter *i) if (iov_iter_is_xarray(i)) return (i->xarray_start + i->iov_offset) | i->count; + if (iov_iter_is_iterlist(i)) { + unsigned long align = 0; + unsigned int j; + + for (j = 0; j < i->nr_segs; j++) + align |= iov_iter_alignment(&i->iterlist[j]); + return align; + } + return 0; } EXPORT_SYMBOL(iov_iter_alignment); @@ -1078,6 +1245,18 @@ static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i, } if (iov_iter_is_xarray(i)) return iter_xarray_get_pages(i, pages, maxsize, maxpages, start); + if (iov_iter_is_iterlist(i)) { + ssize_t size; + + while (!i->iterlist->count) { + i->iterlist++; + i->nr_segs--; + } + size = __iov_iter_get_pages_alloc(i->iterlist, pages, maxsize, maxpages, + start, extraction_flags); + i->count -= size; + return size; + } return -EFAULT; } @@ -1126,6 +1305,31 @@ ssize_t iov_iter_get_pages_alloc2(struct iov_iter *i, } EXPORT_SYMBOL(iov_iter_get_pages_alloc2); +static size_t csum_and_copy_from_iterlist(void *addr, size_t bytes, __wsum *csum, + struct iov_iter *i) +{ + size_t copied = 0, n; + + while (i->count && i->nr_segs) { + struct iov_iter *j = i->iterlist; + + if (j->count == 0) { + i->iterlist++; + i->nr_segs--; + continue; + } + + n = csum_and_copy_from_iter(addr, bytes - copied, csum, j); + addr += n; + copied += n; + i->count -= n; + if (n == 0) + break; + } + + return copied; +} + size_t csum_and_copy_from_iter(void *addr, size_t bytes, __wsum *csum, struct iov_iter *i) { @@ -1133,6 +1337,8 @@ size_t csum_and_copy_from_iter(void *addr, size_t bytes, __wsum *csum, sum = *csum; if (WARN_ON_ONCE(!i->data_source)) return 0; + if (iov_iter_is_iterlist(i)) + return csum_and_copy_from_iterlist(addr, bytes, csum, i); iterate_and_advance(i, bytes, base, len, off, ({ next = csum_and_copy_from_user(base, addr + off, len); @@ -1236,6 +1442,21 @@ static int bvec_npages(const struct iov_iter *i, int maxpages) return npages; } +static int iterlist_npages(const struct iov_iter *i, int maxpages) +{ + ssize_t size = i->count; + const struct iov_iter *p; + int npages = 0; + + for (p = i->iterlist; size; p++) { + size -= p->count; + npages += iov_iter_npages(p, maxpages - npages); + if (unlikely(npages >= maxpages)) + return maxpages; + } + return npages; +} + int iov_iter_npages(const struct iov_iter *i, int maxpages) { if (unlikely(!i->count)) @@ -1255,6 +1476,8 @@ int iov_iter_npages(const struct iov_iter *i, int maxpages) int npages = DIV_ROUND_UP(offset + i->count, PAGE_SIZE); return min(npages, maxpages); } + if (iov_iter_is_iterlist(i)) + return iterlist_npages(i, maxpages); return 0; } EXPORT_SYMBOL(iov_iter_npages); @@ -1266,11 +1489,14 @@ const void *dup_iter(struct iov_iter *new, struct iov_iter *old, gfp_t flags) return new->bvec = kmemdup(new->bvec, new->nr_segs * sizeof(struct bio_vec), flags); - else if (iov_iter_is_kvec(new) || iter_is_iovec(new)) + if (iov_iter_is_kvec(new) || iter_is_iovec(new)) /* iovec and kvec have identical layout */ return new->iov = kmemdup(new->iov, new->nr_segs * sizeof(struct iovec), flags); + if (WARN_ON_ONCE(iov_iter_is_iterlist(old))) + /* Don't allow dup'ing of iterlist as the cleanup is complicated */ + return NULL; return NULL; } EXPORT_SYMBOL(dup_iter); @@ -1759,6 +1985,22 @@ ssize_t iov_iter_extract_pages(struct iov_iter *i, return iov_iter_extract_xarray_pages(i, pages, maxsize, maxpages, extraction_flags, offset0); + if (iov_iter_is_iterlist(i)) { + ssize_t size; + + while (i->nr_segs && !i->iterlist->count) { + i->iterlist++; + i->nr_segs--; + } + if (!i->nr_segs) { + WARN_ON_ONCE(i->count); + return 0; + } + size = iov_iter_extract_pages(i->iterlist, pages, maxsize, maxpages, + extraction_flags, offset0); + i->count -= size; + return size; + } return -EFAULT; } EXPORT_SYMBOL_GPL(iov_iter_extract_pages); From patchwork Wed Mar 29 14:13:10 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 76606 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp449784vqo; Wed, 29 Mar 2023 07:19:59 -0700 (PDT) X-Google-Smtp-Source: AKy350Z5QSHH7uWHt2pPHrjfm80JlvZae+oxHIb3OKm4WR+BvKJ0P9iXg1U7nGi/XrDI6qA8H5Xz X-Received: by 2002:a05:6402:4d5:b0:4ac:d2b4:ec2c with SMTP id n21-20020a05640204d500b004acd2b4ec2cmr19920788edw.29.1680099599622; Wed, 29 Mar 2023 07:19:59 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680099599; cv=none; d=google.com; s=arc-20160816; b=ovTWOYkUK48ZKHGOnvpccoLMa6A8f0NTRriI8pZp/LVrPatCXNV3Y5lgyI5L2kWLRA hTf8SJl4Miuy/UNu0lTtzTt+uEAJSGUy3MECBhEXwoyv3trJF0z05Xj/oTNRoHnttmhP vMTztqBL2gjGSDy8+Q/cI0kbc+Ad06Tqqn91+57iF+BIaywlcLWrg9xMSU4g7sfyzJ2W x4jTOjy2XBj9lVEjyxLMI3X88LCYPIobAhn0DCTDlgXS8HnrqkTHHf7t13k5l4YoFBwq Ex6AbOatE0r56cAmIIaDDmORDRbrjsymlpgYpzUN7ia06mpewO0zWoSrJOSHswCiVT0E GYvg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=Jo7xD6cW+5TTIp8oqR2hYvnkWHwWe/N8CbruF8cE9GA=; b=vFwsUmVvdCT92iNM5h6RvsEieovVgFOuAyqNHnfY7Foe/vFO+zQFioin00PMYGZmjS C1o5OMhEw+IAA1Npo3I2q/jXEOaU93oesQtth/iHioh3reSLkQpOthYcwlTcGjb7cf7I PvnVjxfPpySr5WrCw8jI51R32pyNsxwwgWfxf7UWomUORulNWOarrku6cO4cfSN465Pe 2YWnR8K9TGaFR8v1RcpF7Acot/jCG/IEBAn2tCaAVUB/F4UfBRT7gUZbHj6cmkl5p41L d+XKri+pzK/Z+e2SjZZcpx+d1Fs2RE+C6sLcVNDF2dGaUA1OVFueIxHKLitmApi1ASnw hdGw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=XSorInbJ; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id x23-20020aa7d6d7000000b004af60cc2a8csi34657979edr.486.2023.03.29.07.19.36; Wed, 29 Mar 2023 07:19:59 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=XSorInbJ; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231127AbjC2ORi (ORCPT + 99 others); Wed, 29 Mar 2023 10:17:38 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53570 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230451AbjC2OQo (ORCPT ); Wed, 29 Mar 2023 10:16:44 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BA9014C3E for ; Wed, 29 Mar 2023 07:15:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1680099254; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Jo7xD6cW+5TTIp8oqR2hYvnkWHwWe/N8CbruF8cE9GA=; b=XSorInbJHYv7h0BCJsb/2BlcJsTxSyyglGtCgq/izDeFljKyhqfM0I8+mZxo/kg4Y6YZgR YP7gqXE8Ghnsea46ocW0FpYKxg/MWCGwZPuhsQgP3Z3MZpzNo5P8R16sR+5r13ZxxUsUnq DqFqOEzAQdBHRBR7zJaY8dmQ7pl4khw= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-228-hXJAadh0MrOcoI5-BRrEXg-1; Wed, 29 Mar 2023 10:14:11 -0400 X-MC-Unique: hXJAadh0MrOcoI5-BRrEXg-1 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 42E2B185A78B; Wed, 29 Mar 2023 14:14:10 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.18]) by smtp.corp.redhat.com (Postfix) with ESMTP id 4BD8D2166B33; Wed, 29 Mar 2023 14:14:08 +0000 (UTC) From: David Howells To: Matthew Wilcox , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni Cc: David Howells , Al Viro , Christoph Hellwig , Jens Axboe , Jeff Layton , Christian Brauner , Chuck Lever III , Linus Torvalds , netdev@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Willem de Bruijn Subject: [RFC PATCH v2 04/48] net: Declare MSG_SPLICE_PAGES internal sendmsg() flag Date: Wed, 29 Mar 2023 15:13:10 +0100 Message-Id: <20230329141354.516864-5-dhowells@redhat.com> In-Reply-To: <20230329141354.516864-1-dhowells@redhat.com> References: <20230329141354.516864-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.6 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1761712117502837814?= X-GMAIL-MSGID: =?utf-8?q?1761712117502837814?= Declare MSG_SPLICE_PAGES, an internal sendmsg() flag, that hints to a network protocol that it should splice pages from the source iterator rather than copying the data if it can. This flag is added to a list that is cleared by sendmsg and recvmsg syscalls on entry. This is intended as a replacement for the ->sendpage() op, allowing a way to splice in several multipage folios in one go. Signed-off-by: David Howells cc: Willem de Bruijn cc: "David S. Miller" cc: Eric Dumazet cc: Jakub Kicinski cc: Paolo Abeni cc: Jens Axboe cc: Matthew Wilcox cc: netdev@vger.kernel.org --- include/linux/socket.h | 3 +++ net/socket.c | 7 +++++++ 2 files changed, 10 insertions(+) diff --git a/include/linux/socket.h b/include/linux/socket.h index 13c3a237b9c9..c2fa0f800999 100644 --- a/include/linux/socket.h +++ b/include/linux/socket.h @@ -327,6 +327,7 @@ struct ucred { */ #define MSG_ZEROCOPY 0x4000000 /* Use user data in kernel path */ +#define MSG_SPLICE_PAGES 0x8000000 /* Splice the pages from the iterator in sendmsg() */ #define MSG_FASTOPEN 0x20000000 /* Send data in TCP SYN */ #define MSG_CMSG_CLOEXEC 0x40000000 /* Set close_on_exec for file descriptor received through @@ -337,6 +338,8 @@ struct ucred { #define MSG_CMSG_COMPAT 0 /* We never have 32 bit fixups */ #endif +/* Flags to be cleared on entry by sendmsg, recvmsg, sendmmsg and recvmmsg syscalls */ +#define MSG_INTERNAL_FLAGS (MSG_SPLICE_PAGES) /* Setsockoptions(2) level. Thanks to BSD these must match IPPROTO_xxx */ #define SOL_IP 0 diff --git a/net/socket.c b/net/socket.c index 6bae8ce7059e..dfb912bbed62 100644 --- a/net/socket.c +++ b/net/socket.c @@ -2139,6 +2139,7 @@ int __sys_sendto(int fd, void __user *buff, size_t len, unsigned int flags, msg.msg_name = (struct sockaddr *)&address; msg.msg_namelen = addr_len; } + flags &= ~MSG_INTERNAL_FLAGS; if (sock->file->f_flags & O_NONBLOCK) flags |= MSG_DONTWAIT; msg.msg_flags = flags; @@ -2192,6 +2193,7 @@ int __sys_recvfrom(int fd, void __user *ubuf, size_t size, unsigned int flags, if (!sock) goto out; + flags &= ~MSG_INTERNAL_FLAGS; if (sock->file->f_flags & O_NONBLOCK) flags |= MSG_DONTWAIT; err = sock_recvmsg(sock, &msg, flags); @@ -2579,6 +2581,7 @@ long __sys_sendmsg(int fd, struct user_msghdr __user *msg, unsigned int flags, if (forbid_cmsg_compat && (flags & MSG_CMSG_COMPAT)) return -EINVAL; + flags &= ~MSG_INTERNAL_FLAGS; sock = sockfd_lookup_light(fd, &err, &fput_needed); if (!sock) @@ -2627,6 +2630,7 @@ int __sys_sendmmsg(int fd, struct mmsghdr __user *mmsg, unsigned int vlen, entry = mmsg; compat_entry = (struct compat_mmsghdr __user *)mmsg; err = 0; + flags &= ~MSG_INTERNAL_FLAGS; flags |= MSG_BATCH; while (datagrams < vlen) { @@ -2775,6 +2779,7 @@ long __sys_recvmsg_sock(struct socket *sock, struct msghdr *msg, struct user_msghdr __user *umsg, struct sockaddr __user *uaddr, unsigned int flags) { + flags &= ~MSG_INTERNAL_FLAGS; return ____sys_recvmsg(sock, msg, umsg, uaddr, flags, 0); } @@ -2787,6 +2792,7 @@ long __sys_recvmsg(int fd, struct user_msghdr __user *msg, unsigned int flags, if (forbid_cmsg_compat && (flags & MSG_CMSG_COMPAT)) return -EINVAL; + flags &= ~MSG_INTERNAL_FLAGS; sock = sockfd_lookup_light(fd, &err, &fput_needed); if (!sock) @@ -2839,6 +2845,7 @@ static int do_recvmmsg(int fd, struct mmsghdr __user *mmsg, goto out_put; } } + flags &= ~MSG_INTERNAL_FLAGS; entry = mmsg; compat_entry = (struct compat_mmsghdr __user *)mmsg; From patchwork Wed Mar 29 14:13:11 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 76604 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp449701vqo; Wed, 29 Mar 2023 07:19:54 -0700 (PDT) X-Google-Smtp-Source: AKy350bWnSgiL1f2hNB7LO6dyaaxu3LNvi5LFi7n+JAo9pyMAIx5tY0smDr/Ugr8oh4/IqdPbNxh X-Received: by 2002:a05:6402:4403:b0:4af:6e08:319 with SMTP id y3-20020a056402440300b004af6e080319mr2346575eda.15.1680099594257; Wed, 29 Mar 2023 07:19:54 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680099594; cv=none; d=google.com; s=arc-20160816; b=gNouG/N1+ynihi6wGfQ5fH/eaocx/2zn9KdmowdsCVg7Zk6Jn1UA0GkYhChymAogA/ v2ESeIwP+M/Uv6mEdnrKBaf3Hs7GpJQEYGIm3L6fh7Sk2wfdb8eapttzlNj75b2wa3/v fpOz0t+NH9IYx0aj2lcAoYqYk0EbJ3KFQaXox3LRTBYVL3f2ubOE8ruZPEALJL2ltOVN 1OP/KS5VI20X0j+Uc+Tmm3ic+UgJKBSEfCO6mMLO5hUcS0BCi0JVVcVbb1HHdfSwE59E PgnThdKTBvdY8aaTTUx9L3fXqk/ABe7sVhh6QfXu+ZUYxl53hm17Ye7qMFji3f9rhHAi LYbQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=zHarMfhYqdBKP5ReoH3//xqNxFuro7P2+Q66shvU368=; b=pM5hkvgNbEAl4mkJaQhBAXDL+FIonU5TyH4FsFuB45aKGQaVuM9SL0s20uujjF20/2 B7GCmZVtBdrSlBrMiF6Hkxw0od30WSb0RjYvwvjHXiKiq1trAxC+i/oRGODssSyXEbVG PAKKcw5WYCgiPpw9Iu43yZkRPoa3b2JSfC0j+pP97uQRg+9GcVG9nfBbdRcIyu5QLnQA /kHLDk2aLEiI7rmJ+EN4IYgNekH/HeI19nrXAxzWRrN9shH8z602edOreKsS29MGoFdc sQnFlSH3ICm3va2Es78elVuOcwx1ao748pGge4wfDXvzSn6+gfCkeB6Xy6o493hQKN/L ciPA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=bIeLdMxZ; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id lt27-20020a170906fa9b00b008cbb481fc84si32519210ejb.20.2023.03.29.07.19.30; Wed, 29 Mar 2023 07:19:54 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=bIeLdMxZ; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230290AbjC2OQr (ORCPT + 99 others); Wed, 29 Mar 2023 10:16:47 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53316 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230161AbjC2OQc (ORCPT ); Wed, 29 Mar 2023 10:16:32 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4194F59F7 for ; Wed, 29 Mar 2023 07:15:11 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1680099259; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=zHarMfhYqdBKP5ReoH3//xqNxFuro7P2+Q66shvU368=; b=bIeLdMxZeHb3x7vpj0raOhpIaYEsdK4dsdBhIO8ytmza8RGSSq+6nbikxb4vp5Xop1P7nv 5NMOpl+/EPCDP2UqpfMiEVIG0oJxsfU1EkqmeSK9FYMjGqkL1NX6DYdIuksYBfBaUBKkPQ f6GW4TOirv2G46XZcwJaXGsZT2BMQsk= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-338-kG3Hk5VCOq-MrLBrXf1dJQ-1; Wed, 29 Mar 2023 10:14:14 -0400 X-MC-Unique: kG3Hk5VCOq-MrLBrXf1dJQ-1 Received: from smtp.corp.redhat.com (int-mx10.intmail.prod.int.rdu2.redhat.com [10.11.54.10]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 2DA6A3814592; Wed, 29 Mar 2023 14:14:13 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.18]) by smtp.corp.redhat.com (Postfix) with ESMTP id D3728492C3E; Wed, 29 Mar 2023 14:14:10 +0000 (UTC) From: David Howells To: Matthew Wilcox , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni Cc: David Howells , Al Viro , Christoph Hellwig , Jens Axboe , Jeff Layton , Christian Brauner , Chuck Lever III , Linus Torvalds , netdev@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Bernard Metzler , Tom Talpey , linux-rdma@vger.kernel.org Subject: [RFC PATCH v2 05/48] mm: Move the page fragment allocator from page_alloc.c into its own file Date: Wed, 29 Mar 2023 15:13:11 +0100 Message-Id: <20230329141354.516864-6-dhowells@redhat.com> In-Reply-To: <20230329141354.516864-1-dhowells@redhat.com> References: <20230329141354.516864-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.10 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1761712112543813505?= X-GMAIL-MSGID: =?utf-8?q?1761712112543813505?= Move the page fragment allocator from page_alloc.c into its own file preparatory to changing it. Signed-off-by: David Howells cc: Bernard Metzler cc: Tom Talpey cc: "David S. Miller" cc: Eric Dumazet cc: Jakub Kicinski cc: Paolo Abeni cc: Jens Axboe cc: Matthew Wilcox cc: linux-rdma@vger.kernel.org cc: netdev@vger.kernel.org --- mm/Makefile | 2 +- mm/page_alloc.c | 126 ----------------------------------------- mm/page_frag_alloc.c | 131 +++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 132 insertions(+), 127 deletions(-) create mode 100644 mm/page_frag_alloc.c diff --git a/mm/Makefile b/mm/Makefile index 8e105e5b3e29..4e6dc12b4cbd 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -52,7 +52,7 @@ obj-y := filemap.o mempool.o oom_kill.o fadvise.o \ readahead.o swap.o truncate.o vmscan.o shmem.o \ util.o mmzone.o vmstat.o backing-dev.o \ mm_init.o percpu.o slab_common.o \ - compaction.o \ + compaction.o page_frag_alloc.o \ interval_tree.o list_lru.o workingset.o \ debug.o gup.o mmap_lock.o $(mmu-y) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index ac1fc986af44..c08847308907 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -5694,132 +5694,6 @@ void free_pages(unsigned long addr, unsigned int order) EXPORT_SYMBOL(free_pages); -/* - * Page Fragment: - * An arbitrary-length arbitrary-offset area of memory which resides - * within a 0 or higher order page. Multiple fragments within that page - * are individually refcounted, in the page's reference counter. - * - * The page_frag functions below provide a simple allocation framework for - * page fragments. This is used by the network stack and network device - * drivers to provide a backing region of memory for use as either an - * sk_buff->head, or to be used in the "frags" portion of skb_shared_info. - */ -static struct page *__page_frag_cache_refill(struct page_frag_cache *nc, - gfp_t gfp_mask) -{ - struct page *page = NULL; - gfp_t gfp = gfp_mask; - -#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE) - gfp_mask |= __GFP_COMP | __GFP_NOWARN | __GFP_NORETRY | - __GFP_NOMEMALLOC; - page = alloc_pages_node(NUMA_NO_NODE, gfp_mask, - PAGE_FRAG_CACHE_MAX_ORDER); - nc->size = page ? PAGE_FRAG_CACHE_MAX_SIZE : PAGE_SIZE; -#endif - if (unlikely(!page)) - page = alloc_pages_node(NUMA_NO_NODE, gfp, 0); - - nc->va = page ? page_address(page) : NULL; - - return page; -} - -void __page_frag_cache_drain(struct page *page, unsigned int count) -{ - VM_BUG_ON_PAGE(page_ref_count(page) == 0, page); - - if (page_ref_sub_and_test(page, count)) - free_the_page(page, compound_order(page)); -} -EXPORT_SYMBOL(__page_frag_cache_drain); - -void *page_frag_alloc_align(struct page_frag_cache *nc, - unsigned int fragsz, gfp_t gfp_mask, - unsigned int align_mask) -{ - unsigned int size = PAGE_SIZE; - struct page *page; - int offset; - - if (unlikely(!nc->va)) { -refill: - page = __page_frag_cache_refill(nc, gfp_mask); - if (!page) - return NULL; - -#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE) - /* if size can vary use size else just use PAGE_SIZE */ - size = nc->size; -#endif - /* Even if we own the page, we do not use atomic_set(). - * This would break get_page_unless_zero() users. - */ - page_ref_add(page, PAGE_FRAG_CACHE_MAX_SIZE); - - /* reset page count bias and offset to start of new frag */ - nc->pfmemalloc = page_is_pfmemalloc(page); - nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1; - nc->offset = size; - } - - offset = nc->offset - fragsz; - if (unlikely(offset < 0)) { - page = virt_to_page(nc->va); - - if (!page_ref_sub_and_test(page, nc->pagecnt_bias)) - goto refill; - - if (unlikely(nc->pfmemalloc)) { - free_the_page(page, compound_order(page)); - goto refill; - } - -#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE) - /* if size can vary use size else just use PAGE_SIZE */ - size = nc->size; -#endif - /* OK, page count is 0, we can safely set it */ - set_page_count(page, PAGE_FRAG_CACHE_MAX_SIZE + 1); - - /* reset page count bias and offset to start of new frag */ - nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1; - offset = size - fragsz; - if (unlikely(offset < 0)) { - /* - * The caller is trying to allocate a fragment - * with fragsz > PAGE_SIZE but the cache isn't big - * enough to satisfy the request, this may - * happen in low memory conditions. - * We don't release the cache page because - * it could make memory pressure worse - * so we simply return NULL here. - */ - return NULL; - } - } - - nc->pagecnt_bias--; - offset &= align_mask; - nc->offset = offset; - - return nc->va + offset; -} -EXPORT_SYMBOL(page_frag_alloc_align); - -/* - * Frees a page fragment allocated out of either a compound or order 0 page. - */ -void page_frag_free(void *addr) -{ - struct page *page = virt_to_head_page(addr); - - if (unlikely(put_page_testzero(page))) - free_the_page(page, compound_order(page)); -} -EXPORT_SYMBOL(page_frag_free); - static void *make_alloc_exact(unsigned long addr, unsigned int order, size_t size) { diff --git a/mm/page_frag_alloc.c b/mm/page_frag_alloc.c new file mode 100644 index 000000000000..bee95824ef8f --- /dev/null +++ b/mm/page_frag_alloc.c @@ -0,0 +1,131 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* Page fragment allocator + * + * Page Fragment: + * An arbitrary-length arbitrary-offset area of memory which resides within a + * 0 or higher order page. Multiple fragments within that page are + * individually refcounted, in the page's reference counter. + * + * The page_frag functions provide a simple allocation framework for page + * fragments. This is used by the network stack and network device drivers to + * provide a backing region of memory for use as either an sk_buff->head, or to + * be used in the "frags" portion of skb_shared_info. + */ + +#include +#include +#include + +static struct page *__page_frag_cache_refill(struct page_frag_cache *nc, + gfp_t gfp_mask) +{ + struct page *page = NULL; + gfp_t gfp = gfp_mask; + +#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE) + gfp_mask |= __GFP_COMP | __GFP_NOWARN | __GFP_NORETRY | + __GFP_NOMEMALLOC; + page = alloc_pages_node(NUMA_NO_NODE, gfp_mask, + PAGE_FRAG_CACHE_MAX_ORDER); + nc->size = page ? PAGE_FRAG_CACHE_MAX_SIZE : PAGE_SIZE; +#endif + if (unlikely(!page)) + page = alloc_pages_node(NUMA_NO_NODE, gfp, 0); + + nc->va = page ? page_address(page) : NULL; + + return page; +} + +void __page_frag_cache_drain(struct page *page, unsigned int count) +{ + VM_BUG_ON_PAGE(page_ref_count(page) == 0, page); + + if (page_ref_sub_and_test(page, count - 1)) + __free_pages(page, compound_order(page)); +} +EXPORT_SYMBOL(__page_frag_cache_drain); + +void *page_frag_alloc_align(struct page_frag_cache *nc, + unsigned int fragsz, gfp_t gfp_mask, + unsigned int align_mask) +{ + unsigned int size = PAGE_SIZE; + struct page *page; + int offset; + + if (unlikely(!nc->va)) { +refill: + page = __page_frag_cache_refill(nc, gfp_mask); + if (!page) + return NULL; + +#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE) + /* if size can vary use size else just use PAGE_SIZE */ + size = nc->size; +#endif + /* Even if we own the page, we do not use atomic_set(). + * This would break get_page_unless_zero() users. + */ + page_ref_add(page, PAGE_FRAG_CACHE_MAX_SIZE); + + /* reset page count bias and offset to start of new frag */ + nc->pfmemalloc = page_is_pfmemalloc(page); + nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1; + nc->offset = size; + } + + offset = nc->offset - fragsz; + if (unlikely(offset < 0)) { + page = virt_to_page(nc->va); + + if (page_ref_count(page) != nc->pagecnt_bias) + goto refill; + if (unlikely(nc->pfmemalloc)) { + page_ref_sub(page, nc->pagecnt_bias - 1); + __free_pages(page, compound_order(page)); + goto refill; + } + +#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE) + /* if size can vary use size else just use PAGE_SIZE */ + size = nc->size; +#endif + /* OK, page count is 0, we can safely set it */ + set_page_count(page, PAGE_FRAG_CACHE_MAX_SIZE + 1); + + /* reset page count bias and offset to start of new frag */ + nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1; + offset = size - fragsz; + if (unlikely(offset < 0)) { + /* + * The caller is trying to allocate a fragment + * with fragsz > PAGE_SIZE but the cache isn't big + * enough to satisfy the request, this may + * happen in low memory conditions. + * We don't release the cache page because + * it could make memory pressure worse + * so we simply return NULL here. + */ + return NULL; + } + } + + nc->pagecnt_bias--; + offset &= align_mask; + nc->offset = offset; + + return nc->va + offset; +} +EXPORT_SYMBOL(page_frag_alloc_align); + +/* + * Frees a page fragment allocated out of either a compound or order 0 page. + */ +void page_frag_free(void *addr) +{ + struct page *page = virt_to_head_page(addr); + + __free_pages(page, compound_order(page)); +} +EXPORT_SYMBOL(page_frag_free); From patchwork Wed Mar 29 14:13:12 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 76651 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp480591vqo; Wed, 29 Mar 2023 08:05:59 -0700 (PDT) X-Google-Smtp-Source: AKy350aXu5eOP+RiNz9bNQ5OO1v5kTkrRw2a6X676NZ46X53y0WIK2AGBZ5cSh1hzGOBtBhLKcC8 X-Received: by 2002:a17:906:174b:b0:929:7d80:3a37 with SMTP id d11-20020a170906174b00b009297d803a37mr19882601eje.37.1680102359676; Wed, 29 Mar 2023 08:05:59 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680102359; cv=none; d=google.com; s=arc-20160816; b=wjDyliBlOqW704Oungyv8avNgY3f90tRW5QbyWr5to+1SMsRXJE+1geQuxJ2tazS7G 8mFs1csAKxYJU731IK57bx19T+5CCzcTefSV/rNiX2pTH1kyTmDwsya6ayfoh4/1wFpy 3V5u3tcQKg72GAXbXNmcVHSVlR/cysIG9bLLUQXTS64RsGla9bU054ThVeARcP89StCz yVG/6zeoM6EE+qxP8gxcfbn1sDZl1D7e+O3V4kCRQMvZhLe6QDF2+EFDenYPKBFqsFcw zzdiw8AUeZGWzDV6r3we1JQ7wtTRm9I0IVuW4KDe1sArYP3btelqKRTUZIqBfnsPOmbE SKwA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=w9vYmidIh5WhninpbPh1DpHK4Q4l9z+4/ehUwmRVCvg=; b=Y5j7Q/E73xePUusF08vipvboei2H+h/g4GkBw20RTI9QZ2Njaeg0qWrmrrdbhxxYJ4 gC+EZTPII5JgFK+dxL1fJRGTLvhOeq5fbaGXdaxZFltR3gKYmc8oHG3JDPbMBuCrf7ya QHisfomaQeDVea2e51VJMEqD4+KgEUO5BcpTTImNbmyF+DrffRXbf/kw8/4OoIrWDV3P XZjLY9WTsUOBcKU3eQRryiK0MOpylOLAddAIMwq5DOesOK0Qt9tI7LY/MdO7LhRDZ5H0 UTDx1ElYx7oxzfPwUNCu1RRAro6k7ynBbHemld2MQYmMJbYRvj8oE9TdRccqFEi+puNl YFxQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=UoAbi3Tg; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id sc38-20020a1709078a2600b0092fbf61b95csi32432400ejc.50.2023.03.29.08.05.35; Wed, 29 Mar 2023 08:05:59 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=UoAbi3Tg; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230141AbjC2OoJ (ORCPT + 99 others); Wed, 29 Mar 2023 10:44:09 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42674 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229786AbjC2Onb (ORCPT ); Wed, 29 Mar 2023 10:43:31 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 87F256EAD for ; Wed, 29 Mar 2023 07:40:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1680100809; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=w9vYmidIh5WhninpbPh1DpHK4Q4l9z+4/ehUwmRVCvg=; b=UoAbi3Tg+8ceaxCmiccxIID7bsUvffZKBt/gp6esMmIXJBsQFZHavD+JGv/nckmCm1xsuY eaecNgCYHt2YHSPKdavINBy0Ulrham2FVOeIn+yyj6k5CNO6l4KVo+IT44bMtS5ths6hUW l/I4wJbx9dnbHB7CYwuPkZlF+VFNjJM= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-187-vEQ2qQN_Nv6Frz5PzfcuyA-1; Wed, 29 Mar 2023 10:14:16 -0400 X-MC-Unique: vEQ2qQN_Nv6Frz5PzfcuyA-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.rdu2.redhat.com [10.11.54.2]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id C7D96801779; Wed, 29 Mar 2023 14:14:15 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.18]) by smtp.corp.redhat.com (Postfix) with ESMTP id E3D6E4020C83; Wed, 29 Mar 2023 14:14:13 +0000 (UTC) From: David Howells To: Matthew Wilcox , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni Cc: David Howells , Al Viro , Christoph Hellwig , Jens Axboe , Jeff Layton , Christian Brauner , Chuck Lever III , Linus Torvalds , netdev@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [RFC PATCH v2 06/48] mm: Make the page_frag_cache allocator use multipage folios Date: Wed, 29 Mar 2023 15:13:12 +0100 Message-Id: <20230329141354.516864-7-dhowells@redhat.com> In-Reply-To: <20230329141354.516864-1-dhowells@redhat.com> References: <20230329141354.516864-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.2 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1761715012207576393?= X-GMAIL-MSGID: =?utf-8?q?1761715012207576393?= Change the page_frag_cache allocator to use multipage folios rather than groups of pages. This reduces page_frag_free to just a folio_put() or put_page(). Signed-off-by: David Howells cc: "David S. Miller" cc: Eric Dumazet cc: Jakub Kicinski cc: Paolo Abeni cc: Jens Axboe cc: Matthew Wilcox cc: netdev@vger.kernel.org --- include/linux/mm_types.h | 13 ++---- mm/page_frag_alloc.c | 88 +++++++++++++++++++--------------------- 2 files changed, 45 insertions(+), 56 deletions(-) diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 0722859c3647..49a70b3f44a9 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -420,18 +420,13 @@ static inline void *folio_get_private(struct folio *folio) } struct page_frag_cache { - void * va; -#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE) - __u16 offset; - __u16 size; -#else - __u32 offset; -#endif + struct folio *folio; + unsigned int offset; /* we maintain a pagecount bias, so that we dont dirty cache line * containing page->_refcount every time we allocate a fragment. */ - unsigned int pagecnt_bias; - bool pfmemalloc; + unsigned int pagecnt_bias; + bool pfmemalloc; }; typedef unsigned long vm_flags_t; diff --git a/mm/page_frag_alloc.c b/mm/page_frag_alloc.c index bee95824ef8f..c3792b68ce32 100644 --- a/mm/page_frag_alloc.c +++ b/mm/page_frag_alloc.c @@ -16,33 +16,34 @@ #include #include -static struct page *__page_frag_cache_refill(struct page_frag_cache *nc, - gfp_t gfp_mask) +/* + * Allocate a new folio for the frag cache. + */ +static struct folio *page_frag_cache_refill(struct page_frag_cache *nc, + gfp_t gfp_mask) { - struct page *page = NULL; + struct folio *folio = NULL; gfp_t gfp = gfp_mask; #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE) - gfp_mask |= __GFP_COMP | __GFP_NOWARN | __GFP_NORETRY | - __GFP_NOMEMALLOC; - page = alloc_pages_node(NUMA_NO_NODE, gfp_mask, - PAGE_FRAG_CACHE_MAX_ORDER); - nc->size = page ? PAGE_FRAG_CACHE_MAX_SIZE : PAGE_SIZE; + gfp_mask |= __GFP_NOWARN | __GFP_NORETRY | __GFP_NOMEMALLOC; + folio = folio_alloc(gfp_mask, PAGE_FRAG_CACHE_MAX_ORDER); #endif - if (unlikely(!page)) - page = alloc_pages_node(NUMA_NO_NODE, gfp, 0); - - nc->va = page ? page_address(page) : NULL; + if (unlikely(!folio)) + folio = folio_alloc(gfp, 0); - return page; + if (folio) + nc->folio = folio; + return folio; } void __page_frag_cache_drain(struct page *page, unsigned int count) { - VM_BUG_ON_PAGE(page_ref_count(page) == 0, page); + struct folio *folio = page_folio(page); - if (page_ref_sub_and_test(page, count - 1)) - __free_pages(page, compound_order(page)); + VM_BUG_ON_FOLIO(folio_ref_count(folio) == 0, folio); + + folio_put_refs(folio, count); } EXPORT_SYMBOL(__page_frag_cache_drain); @@ -50,54 +51,47 @@ void *page_frag_alloc_align(struct page_frag_cache *nc, unsigned int fragsz, gfp_t gfp_mask, unsigned int align_mask) { - unsigned int size = PAGE_SIZE; - struct page *page; - int offset; + struct folio *folio = nc->folio; + size_t offset; - if (unlikely(!nc->va)) { + if (unlikely(!folio)) { refill: - page = __page_frag_cache_refill(nc, gfp_mask); - if (!page) + folio = page_frag_cache_refill(nc, gfp_mask); + if (!folio) return NULL; -#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE) - /* if size can vary use size else just use PAGE_SIZE */ - size = nc->size; -#endif /* Even if we own the page, we do not use atomic_set(). * This would break get_page_unless_zero() users. */ - page_ref_add(page, PAGE_FRAG_CACHE_MAX_SIZE); + folio_ref_add(folio, PAGE_FRAG_CACHE_MAX_SIZE); /* reset page count bias and offset to start of new frag */ - nc->pfmemalloc = page_is_pfmemalloc(page); + nc->pfmemalloc = folio_is_pfmemalloc(folio); nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1; - nc->offset = size; + nc->offset = folio_size(folio); } - offset = nc->offset - fragsz; - if (unlikely(offset < 0)) { - page = virt_to_page(nc->va); - - if (page_ref_count(page) != nc->pagecnt_bias) + offset = nc->offset; + if (unlikely(fragsz > offset)) { + /* Reuse the folio if everyone we gave it to has finished with it. */ + if (!folio_ref_sub_and_test(folio, nc->pagecnt_bias)) { + nc->folio = NULL; goto refill; + } + if (unlikely(nc->pfmemalloc)) { - page_ref_sub(page, nc->pagecnt_bias - 1); - __free_pages(page, compound_order(page)); + __folio_put(folio); + nc->folio = NULL; goto refill; } -#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE) - /* if size can vary use size else just use PAGE_SIZE */ - size = nc->size; -#endif /* OK, page count is 0, we can safely set it */ - set_page_count(page, PAGE_FRAG_CACHE_MAX_SIZE + 1); + folio_set_count(folio, PAGE_FRAG_CACHE_MAX_SIZE + 1); /* reset page count bias and offset to start of new frag */ nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1; - offset = size - fragsz; - if (unlikely(offset < 0)) { + offset = folio_size(folio); + if (unlikely(fragsz > offset)) { /* * The caller is trying to allocate a fragment * with fragsz > PAGE_SIZE but the cache isn't big @@ -107,15 +101,17 @@ void *page_frag_alloc_align(struct page_frag_cache *nc, * it could make memory pressure worse * so we simply return NULL here. */ + nc->offset = offset; return NULL; } } nc->pagecnt_bias--; + offset -= fragsz; offset &= align_mask; nc->offset = offset; - return nc->va + offset; + return folio_address(folio) + offset; } EXPORT_SYMBOL(page_frag_alloc_align); @@ -124,8 +120,6 @@ EXPORT_SYMBOL(page_frag_alloc_align); */ void page_frag_free(void *addr) { - struct page *page = virt_to_head_page(addr); - - __free_pages(page, compound_order(page)); + folio_put(virt_to_folio(addr)); } EXPORT_SYMBOL(page_frag_free); From patchwork Wed Mar 29 14:13:13 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 76642 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp463876vqo; Wed, 29 Mar 2023 07:41:02 -0700 (PDT) X-Google-Smtp-Source: AKy350ZyitJtPqiBpZ1dlvRh1vodmHkQVagXhXXGg67HzS38u/wNjk40SjmE8b8JP2UEOizN6o7g X-Received: by 2002:a17:906:f2c9:b0:93b:62f:82a3 with SMTP id gz9-20020a170906f2c900b0093b062f82a3mr22119282ejb.6.1680100861855; Wed, 29 Mar 2023 07:41:01 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680100861; cv=none; d=google.com; s=arc-20160816; b=yZ170inY3OqnC1rrViQkq2TxVidzckw5++n5wiT33ELxGHAEyWuOctKA48RVm8up1H 9O1EwDU16S/HOv115m+7D3FeumZqFkeMNg/IOdMYkh0Fea/Y1PC20YgUloJhXaGwmpRX 76kbEGDAU7fIS53wo5myEU7wQ+14QzAHDxXmXWiStrD4na/EZ2ui6v/WLm1OAsryupwc b5tBLiFMbEdVQlVg+1Sw/2DYDg+AqOGsQkV8QjX4Zo4B4WQCgm2OkiO9iH3il3bFyuuP Q73xk0UMnXEH6b9MVk02mKhKjM8Q7e/YfRPaOJ/HzBIAXfPgSVIepxYvjjJf7zFCpzk4 5eXw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=/nU0M+T1DCbw3wZvEUGAhwFOIfw7rmUPAJ5895bBp2k=; b=TpD9UBjURMMrgnAiiuOkPUuoMoXGh5C15I9QLNErv+NcCQC46MmmoxEGTeYQJFfO06 gNnsq0k6rO1XUbSNnfRI0bx9O3OG8eOJDK5ERmStSZcDyIq4fg/+nbJns90CxCpRXUEA iMHtnApL6Xuggs0yoVZD8DlsTC1k3nfad6wGqdzCf7h/4DgjczOSF+KsqW6Qa++2k9P9 xWRMMYkjnBB/x4nFRoCBloytnrNOAzEsJ2abOIl/Hc4DnAdd4gwBR11hGjzhILOi/QCB zrNLPWG74e5uxT3naRkUFIj+kgW5DMv40vCJ9lQ9Egs5E3bAQjk20ZfeJUp36slw6wsN mtfQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=DH8KTfbM; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id oz38-20020a1709077da600b00939a0e68aa6si19219521ejc.418.2023.03.29.07.40.37; Wed, 29 Mar 2023 07:41:01 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=DH8KTfbM; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229889AbjC2OaR (ORCPT + 99 others); Wed, 29 Mar 2023 10:30:17 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47532 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230514AbjC2O3j (ORCPT ); Wed, 29 Mar 2023 10:29:39 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BDB15BDCA for ; Wed, 29 Mar 2023 07:22:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1680099660; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=/nU0M+T1DCbw3wZvEUGAhwFOIfw7rmUPAJ5895bBp2k=; b=DH8KTfbMCqLta2AC+5VAXGpCTtopLnYWwglm9PHTVWTd10/YxBKX7sxm63JDIadk7xEQUK UtCp/FOasZW637DyMY+W3yF7njA2dn5kXPyt730fBd9hehXPfYv4AvH80FpQNINIeD7EVX giBppBzyWvtixhWGLB0WfEtCUt9fV5g= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-98-MxenIVgtPHWZWtbUZgH7HA-1; Wed, 29 Mar 2023 10:14:22 -0400 X-MC-Unique: MxenIVgtPHWZWtbUZgH7HA-1 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.rdu2.redhat.com [10.11.54.3]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id C8DAB381459C; Wed, 29 Mar 2023 14:14:19 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.18]) by smtp.corp.redhat.com (Postfix) with ESMTP id 62C781121330; Wed, 29 Mar 2023 14:14:16 +0000 (UTC) From: David Howells To: Matthew Wilcox , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni Cc: David Howells , Al Viro , Christoph Hellwig , Jens Axboe , Jeff Layton , Christian Brauner , Chuck Lever III , Linus Torvalds , netdev@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Lorenzo Bianconi , Felix Fietkau , John Crispin , Sean Wang , Mark Lee , Keith Busch , Christoph Hellwig , Sagi Grimberg , Chaitanya Kulkarni , linux-nvme@lists.infradead.org, linux-mediatek@lists.infradead.org Subject: [RFC PATCH v2 07/48] mm: Make the page_frag_cache allocator use per-cpu Date: Wed, 29 Mar 2023 15:13:13 +0100 Message-Id: <20230329141354.516864-8-dhowells@redhat.com> In-Reply-To: <20230329141354.516864-1-dhowells@redhat.com> References: <20230329141354.516864-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.3 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1761713441153465620?= X-GMAIL-MSGID: =?utf-8?q?1761713441153465620?= Make the page_frag_cache allocator have a separate allocation bucket for each cpu to avoid racing. This means that no lock is required, other than preempt disablement, to allocate from it, though if a softirq wants to access it, then softirq disablement will need to be added. Make the NVMe and mediatek drivers pass in NULL to page_frag_cache() and use the default allocation buckets rather than defining their own. Signed-off-by: David Howells cc: "David S. Miller" cc: Eric Dumazet cc: Jakub Kicinski cc: Paolo Abeni cc: Lorenzo Bianconi cc: Felix Fietkau cc: John Crispin cc: Sean Wang cc: Mark Lee cc: Keith Busch cc: Christoph Hellwig cc: Sagi Grimberg cc: Chaitanya Kulkarni cc: Jens Axboe cc: Matthew Wilcox cc: netdev@vger.kernel.org cc: linux-nvme@lists.infradead.org cc: linux-mediatek@lists.infradead.org --- drivers/net/ethernet/mediatek/mtk_wed_wo.c | 19 +-- drivers/net/ethernet/mediatek/mtk_wed_wo.h | 2 - drivers/nvme/host/tcp.c | 19 +-- drivers/nvme/target/tcp.c | 22 +-- include/linux/gfp.h | 17 +- mm/page_frag_alloc.c | 182 +++++++++++++++------ net/core/skbuff.c | 32 ++-- 7 files changed, 164 insertions(+), 129 deletions(-) diff --git a/drivers/net/ethernet/mediatek/mtk_wed_wo.c b/drivers/net/ethernet/mediatek/mtk_wed_wo.c index 69fba29055e9..859f34447f2f 100644 --- a/drivers/net/ethernet/mediatek/mtk_wed_wo.c +++ b/drivers/net/ethernet/mediatek/mtk_wed_wo.c @@ -143,7 +143,7 @@ mtk_wed_wo_queue_refill(struct mtk_wed_wo *wo, struct mtk_wed_wo_queue *q, dma_addr_t addr; void *buf; - buf = page_frag_alloc(&q->cache, q->buf_size, GFP_ATOMIC); + buf = page_frag_alloc(NULL, q->buf_size, GFP_ATOMIC); if (!buf) break; @@ -286,7 +286,6 @@ mtk_wed_wo_queue_free(struct mtk_wed_wo *wo, struct mtk_wed_wo_queue *q) static void mtk_wed_wo_queue_tx_clean(struct mtk_wed_wo *wo, struct mtk_wed_wo_queue *q) { - struct page *page; int i; for (i = 0; i < q->n_desc; i++) { @@ -297,20 +296,11 @@ mtk_wed_wo_queue_tx_clean(struct mtk_wed_wo *wo, struct mtk_wed_wo_queue *q) skb_free_frag(entry->buf); entry->buf = NULL; } - - if (!q->cache.va) - return; - - page = virt_to_page(q->cache.va); - __page_frag_cache_drain(page, q->cache.pagecnt_bias); - memset(&q->cache, 0, sizeof(q->cache)); } static void mtk_wed_wo_queue_rx_clean(struct mtk_wed_wo *wo, struct mtk_wed_wo_queue *q) { - struct page *page; - for (;;) { void *buf = mtk_wed_wo_dequeue(wo, q, NULL, true); @@ -319,13 +309,6 @@ mtk_wed_wo_queue_rx_clean(struct mtk_wed_wo *wo, struct mtk_wed_wo_queue *q) skb_free_frag(buf); } - - if (!q->cache.va) - return; - - page = virt_to_page(q->cache.va); - __page_frag_cache_drain(page, q->cache.pagecnt_bias); - memset(&q->cache, 0, sizeof(q->cache)); } static void diff --git a/drivers/net/ethernet/mediatek/mtk_wed_wo.h b/drivers/net/ethernet/mediatek/mtk_wed_wo.h index dbcf42ce9173..6f940db67fb8 100644 --- a/drivers/net/ethernet/mediatek/mtk_wed_wo.h +++ b/drivers/net/ethernet/mediatek/mtk_wed_wo.h @@ -210,8 +210,6 @@ struct mtk_wed_wo_queue_entry { struct mtk_wed_wo_queue { struct mtk_wed_wo_queue_regs regs; - struct page_frag_cache cache; - struct mtk_wed_wo_queue_desc *desc; dma_addr_t desc_dma; diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c index 7723a4989524..fa32969b532f 100644 --- a/drivers/nvme/host/tcp.c +++ b/drivers/nvme/host/tcp.c @@ -147,8 +147,6 @@ struct nvme_tcp_queue { __le32 exp_ddgst; __le32 recv_ddgst; - struct page_frag_cache pf_cache; - void (*state_change)(struct sock *); void (*data_ready)(struct sock *); void (*write_space)(struct sock *); @@ -470,9 +468,8 @@ static int nvme_tcp_init_request(struct blk_mq_tag_set *set, struct nvme_tcp_queue *queue = &ctrl->queues[queue_idx]; u8 hdgst = nvme_tcp_hdgst_len(queue); - req->pdu = page_frag_alloc(&queue->pf_cache, - sizeof(struct nvme_tcp_cmd_pdu) + hdgst, - GFP_KERNEL | __GFP_ZERO); + req->pdu = page_frag_alloc(NULL, sizeof(struct nvme_tcp_cmd_pdu) + hdgst, + GFP_KERNEL | __GFP_ZERO); if (!req->pdu) return -ENOMEM; @@ -1288,9 +1285,8 @@ static int nvme_tcp_alloc_async_req(struct nvme_tcp_ctrl *ctrl) struct nvme_tcp_request *async = &ctrl->async_req; u8 hdgst = nvme_tcp_hdgst_len(queue); - async->pdu = page_frag_alloc(&queue->pf_cache, - sizeof(struct nvme_tcp_cmd_pdu) + hdgst, - GFP_KERNEL | __GFP_ZERO); + async->pdu = page_frag_alloc(NULL, sizeof(struct nvme_tcp_cmd_pdu) + hdgst, + GFP_KERNEL | __GFP_ZERO); if (!async->pdu) return -ENOMEM; @@ -1300,7 +1296,6 @@ static int nvme_tcp_alloc_async_req(struct nvme_tcp_ctrl *ctrl) static void nvme_tcp_free_queue(struct nvme_ctrl *nctrl, int qid) { - struct page *page; struct nvme_tcp_ctrl *ctrl = to_tcp_ctrl(nctrl); struct nvme_tcp_queue *queue = &ctrl->queues[qid]; unsigned int noreclaim_flag; @@ -1311,12 +1306,6 @@ static void nvme_tcp_free_queue(struct nvme_ctrl *nctrl, int qid) if (queue->hdr_digest || queue->data_digest) nvme_tcp_free_crypto(queue); - if (queue->pf_cache.va) { - page = virt_to_head_page(queue->pf_cache.va); - __page_frag_cache_drain(page, queue->pf_cache.pagecnt_bias); - queue->pf_cache.va = NULL; - } - noreclaim_flag = memalloc_noreclaim_save(); sock_release(queue->sock); memalloc_noreclaim_restore(noreclaim_flag); diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c index 66e8f9fd0ca7..d6cc557cc539 100644 --- a/drivers/nvme/target/tcp.c +++ b/drivers/nvme/target/tcp.c @@ -143,8 +143,6 @@ struct nvmet_tcp_queue { struct nvmet_tcp_cmd connect; - struct page_frag_cache pf_cache; - void (*data_ready)(struct sock *); void (*state_change)(struct sock *); void (*write_space)(struct sock *); @@ -1312,25 +1310,25 @@ static int nvmet_tcp_alloc_cmd(struct nvmet_tcp_queue *queue, c->queue = queue; c->req.port = queue->port->nport; - c->cmd_pdu = page_frag_alloc(&queue->pf_cache, - sizeof(*c->cmd_pdu) + hdgst, GFP_KERNEL | __GFP_ZERO); + c->cmd_pdu = page_frag_alloc(NULL, sizeof(*c->cmd_pdu) + hdgst, + GFP_KERNEL | __GFP_ZERO); if (!c->cmd_pdu) return -ENOMEM; c->req.cmd = &c->cmd_pdu->cmd; - c->rsp_pdu = page_frag_alloc(&queue->pf_cache, - sizeof(*c->rsp_pdu) + hdgst, GFP_KERNEL | __GFP_ZERO); + c->rsp_pdu = page_frag_alloc(NULL, sizeof(*c->rsp_pdu) + hdgst, + GFP_KERNEL | __GFP_ZERO); if (!c->rsp_pdu) goto out_free_cmd; c->req.cqe = &c->rsp_pdu->cqe; - c->data_pdu = page_frag_alloc(&queue->pf_cache, - sizeof(*c->data_pdu) + hdgst, GFP_KERNEL | __GFP_ZERO); + c->data_pdu = page_frag_alloc(NULL, sizeof(*c->data_pdu) + hdgst, + GFP_KERNEL | __GFP_ZERO); if (!c->data_pdu) goto out_free_rsp; - c->r2t_pdu = page_frag_alloc(&queue->pf_cache, - sizeof(*c->r2t_pdu) + hdgst, GFP_KERNEL | __GFP_ZERO); + c->r2t_pdu = page_frag_alloc(NULL, sizeof(*c->r2t_pdu) + hdgst, + GFP_KERNEL | __GFP_ZERO); if (!c->r2t_pdu) goto out_free_data; @@ -1438,7 +1436,6 @@ static void nvmet_tcp_free_cmd_data_in_buffers(struct nvmet_tcp_queue *queue) static void nvmet_tcp_release_queue_work(struct work_struct *w) { - struct page *page; struct nvmet_tcp_queue *queue = container_of(w, struct nvmet_tcp_queue, release_work); @@ -1460,9 +1457,6 @@ static void nvmet_tcp_release_queue_work(struct work_struct *w) if (queue->hdr_digest || queue->data_digest) nvmet_tcp_free_crypto(queue); ida_free(&nvmet_tcp_queue_ida, queue->idx); - - page = virt_to_head_page(queue->pf_cache.va); - __page_frag_cache_drain(page, queue->pf_cache.pagecnt_bias); kfree(queue); } diff --git a/include/linux/gfp.h b/include/linux/gfp.h index 65a78773dcca..b208ca315882 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -304,14 +304,17 @@ extern void free_pages(unsigned long addr, unsigned int order); struct page_frag_cache; extern void __page_frag_cache_drain(struct page *page, unsigned int count); -extern void *page_frag_alloc_align(struct page_frag_cache *nc, - unsigned int fragsz, gfp_t gfp_mask, - unsigned int align_mask); - -static inline void *page_frag_alloc(struct page_frag_cache *nc, - unsigned int fragsz, gfp_t gfp_mask) +extern void *page_frag_alloc_align(struct page_frag_cache __percpu *frag_cache, + size_t fragsz, gfp_t gfp, + unsigned long align_mask); +extern void *page_frag_memdup(struct page_frag_cache __percpu *frag_cache, + const void *p, size_t fragsz, gfp_t gfp, + unsigned long align_mask); + +static inline void *page_frag_alloc(struct page_frag_cache __percpu *frag_cache, + size_t fragsz, gfp_t gfp) { - return page_frag_alloc_align(nc, fragsz, gfp_mask, ~0u); + return page_frag_alloc_align(frag_cache, fragsz, gfp, ULONG_MAX); } extern void page_frag_free(void *addr); diff --git a/mm/page_frag_alloc.c b/mm/page_frag_alloc.c index c3792b68ce32..7844398afe26 100644 --- a/mm/page_frag_alloc.c +++ b/mm/page_frag_alloc.c @@ -16,25 +16,23 @@ #include #include +static DEFINE_PER_CPU(struct page_frag_cache, page_frag_default_allocator); + /* * Allocate a new folio for the frag cache. */ -static struct folio *page_frag_cache_refill(struct page_frag_cache *nc, - gfp_t gfp_mask) +static struct folio *page_frag_cache_refill(gfp_t gfp) { - struct folio *folio = NULL; - gfp_t gfp = gfp_mask; + struct folio *folio; #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE) - gfp_mask |= __GFP_NOWARN | __GFP_NORETRY | __GFP_NOMEMALLOC; - folio = folio_alloc(gfp_mask, PAGE_FRAG_CACHE_MAX_ORDER); + folio = folio_alloc(gfp | __GFP_NOWARN | __GFP_NORETRY | __GFP_NOMEMALLOC, + PAGE_FRAG_CACHE_MAX_ORDER); + if (folio) + return folio; #endif - if (unlikely(!folio)) - folio = folio_alloc(gfp, 0); - if (folio) - nc->folio = folio; - return folio; + return folio_alloc(gfp, 0); } void __page_frag_cache_drain(struct page *page, unsigned int count) @@ -47,41 +45,68 @@ void __page_frag_cache_drain(struct page *page, unsigned int count) } EXPORT_SYMBOL(__page_frag_cache_drain); -void *page_frag_alloc_align(struct page_frag_cache *nc, - unsigned int fragsz, gfp_t gfp_mask, - unsigned int align_mask) +/** + * page_frag_alloc_align - Allocate some memory for use in zerocopy + * @frag_cache: The frag cache to use (or NULL for the default) + * @fragsz: The size of the fragment desired + * @gfp: Allocation flags under which to make an allocation + * @align_mask: The required alignment + * + * Allocate some memory for use with zerocopy where protocol bits have to be + * mixed in with spliced/zerocopied data. Unlike memory allocated from the + * slab, this memory's lifetime is purely dependent on the folio's refcount. + * + * The way it works is that a folio is allocated and fragments are broken off + * sequentially and returned to the caller with a ref until the folio no longer + * has enough spare space - at which point the allocator's ref is dropped and a + * new folio is allocated. The folio remains in existence until the last ref + * held by, say, an sk_buff is discarded and then the page is returned to the + * page allocator. + * + * Returns a pointer to the memory on success and -ENOMEM on allocation + * failure. + * + * The allocated memory should be disposed of with folio_put(). + */ +void *page_frag_alloc_align(struct page_frag_cache __percpu *frag_cache, + size_t fragsz, gfp_t gfp, unsigned long align_mask) { - struct folio *folio = nc->folio; + struct page_frag_cache *nc; + struct folio *folio, *spare = NULL; size_t offset; + void *p; - if (unlikely(!folio)) { -refill: - folio = page_frag_cache_refill(nc, gfp_mask); - if (!folio) - return NULL; - - /* Even if we own the page, we do not use atomic_set(). - * This would break get_page_unless_zero() users. - */ - folio_ref_add(folio, PAGE_FRAG_CACHE_MAX_SIZE); + if (!frag_cache) + frag_cache = &page_frag_default_allocator; + if (WARN_ON_ONCE(fragsz == 0)) + fragsz = 1; + align_mask &= ~3UL; - /* reset page count bias and offset to start of new frag */ - nc->pfmemalloc = folio_is_pfmemalloc(folio); - nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1; - nc->offset = folio_size(folio); + nc = get_cpu_ptr(frag_cache); +reload: + folio = nc->folio; + offset = nc->offset; +try_again: + + /* Make the allocation if there's sufficient space. */ + if (fragsz <= offset) { + nc->pagecnt_bias--; + offset = (offset - fragsz) & align_mask; + nc->offset = offset; + p = folio_address(folio) + offset; + put_cpu_ptr(frag_cache); + if (spare) + folio_put(spare); + return p; } - offset = nc->offset; - if (unlikely(fragsz > offset)) { - /* Reuse the folio if everyone we gave it to has finished with it. */ - if (!folio_ref_sub_and_test(folio, nc->pagecnt_bias)) { - nc->folio = NULL; + /* Insufficient space - see if we can refurbish the current folio. */ + if (folio) { + if (!folio_ref_sub_and_test(folio, nc->pagecnt_bias)) goto refill; - } if (unlikely(nc->pfmemalloc)) { __folio_put(folio); - nc->folio = NULL; goto refill; } @@ -91,27 +116,56 @@ void *page_frag_alloc_align(struct page_frag_cache *nc, /* reset page count bias and offset to start of new frag */ nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1; offset = folio_size(folio); - if (unlikely(fragsz > offset)) { - /* - * The caller is trying to allocate a fragment - * with fragsz > PAGE_SIZE but the cache isn't big - * enough to satisfy the request, this may - * happen in low memory conditions. - * We don't release the cache page because - * it could make memory pressure worse - * so we simply return NULL here. - */ - nc->offset = offset; + if (unlikely(fragsz > offset)) + goto frag_too_big; + goto try_again; + } + +refill: + if (!spare) { + nc->folio = NULL; + put_cpu_ptr(frag_cache); + + spare = page_frag_cache_refill(gfp); + if (!spare) return NULL; - } + + nc = get_cpu_ptr(frag_cache); + /* We may now be on a different cpu and/or someone else may + * have refilled it + */ + nc->pfmemalloc = folio_is_pfmemalloc(spare); + if (nc->folio) + goto reload; } - nc->pagecnt_bias--; - offset -= fragsz; - offset &= align_mask; + nc->folio = spare; + folio = spare; + spare = NULL; + + /* Even if we own the page, we do not use atomic_set(). This would + * break get_page_unless_zero() users. + */ + folio_ref_add(folio, PAGE_FRAG_CACHE_MAX_SIZE); + + /* Reset page count bias and offset to start of new frag */ + nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1; + offset = folio_size(folio); + goto try_again; + +frag_too_big: + /* + * The caller is trying to allocate a fragment with fragsz > PAGE_SIZE + * but the cache isn't big enough to satisfy the request, this may + * happen in low memory conditions. We don't release the cache page + * because it could make memory pressure worse so we simply return NULL + * here. + */ nc->offset = offset; - - return folio_address(folio) + offset; + put_cpu_ptr(frag_cache); + if (spare) + folio_put(spare); + return NULL; } EXPORT_SYMBOL(page_frag_alloc_align); @@ -123,3 +177,25 @@ void page_frag_free(void *addr) folio_put(virt_to_folio(addr)); } EXPORT_SYMBOL(page_frag_free); + +/** + * page_frag_memdup - Allocate a page fragment and duplicate some data into it + * @frag_cache: The frag cache to use (or NULL for the default) + * @fragsz: The amount of memory to copy (maximum 1/2 page). + * @p: The source data to copy + * @gfp: Allocation flags under which to make an allocation + * @align_mask: The required alignment + */ +void *page_frag_memdup(struct page_frag_cache __percpu *frag_cache, + const void *p, size_t fragsz, gfp_t gfp, + unsigned long align_mask) +{ + void *q; + + q = page_frag_alloc_align(frag_cache, fragsz, gfp, align_mask); + if (!q) + return q; + + return memcpy(q, p, fragsz); +} +EXPORT_SYMBOL(page_frag_memdup); diff --git a/net/core/skbuff.c b/net/core/skbuff.c index eb7d33b41e71..0506e4cf1ed9 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -222,13 +222,13 @@ static void *page_frag_alloc_1k(struct page_frag_1k *nc, gfp_t gfp_mask) #endif struct napi_alloc_cache { - struct page_frag_cache page; struct page_frag_1k page_small; unsigned int skb_count; void *skb_cache[NAPI_SKB_CACHE_SIZE]; }; static DEFINE_PER_CPU(struct page_frag_cache, netdev_alloc_cache); +static DEFINE_PER_CPU(struct page_frag_cache, napi_frag_cache); static DEFINE_PER_CPU(struct napi_alloc_cache, napi_alloc_cache); /* Double check that napi_get_frags() allocates skbs with @@ -250,11 +250,9 @@ void napi_get_frags_check(struct napi_struct *napi) void *__napi_alloc_frag_align(unsigned int fragsz, unsigned int align_mask) { - struct napi_alloc_cache *nc = this_cpu_ptr(&napi_alloc_cache); - fragsz = SKB_DATA_ALIGN(fragsz); - return page_frag_alloc_align(&nc->page, fragsz, GFP_ATOMIC, align_mask); + return page_frag_alloc_align(&napi_frag_cache, fragsz, GFP_ATOMIC, align_mask); } EXPORT_SYMBOL(__napi_alloc_frag_align); @@ -264,15 +262,12 @@ void *__netdev_alloc_frag_align(unsigned int fragsz, unsigned int align_mask) fragsz = SKB_DATA_ALIGN(fragsz); if (in_hardirq() || irqs_disabled()) { - struct page_frag_cache *nc = this_cpu_ptr(&netdev_alloc_cache); - - data = page_frag_alloc_align(nc, fragsz, GFP_ATOMIC, align_mask); + data = page_frag_alloc_align(&netdev_alloc_cache, + fragsz, GFP_ATOMIC, align_mask); } else { - struct napi_alloc_cache *nc; - local_bh_disable(); - nc = this_cpu_ptr(&napi_alloc_cache); - data = page_frag_alloc_align(&nc->page, fragsz, GFP_ATOMIC, align_mask); + data = page_frag_alloc_align(&napi_frag_cache, + fragsz, GFP_ATOMIC, align_mask); local_bh_enable(); } return data; @@ -656,7 +651,6 @@ EXPORT_SYMBOL(__alloc_skb); struct sk_buff *__netdev_alloc_skb(struct net_device *dev, unsigned int len, gfp_t gfp_mask) { - struct page_frag_cache *nc; struct sk_buff *skb; bool pfmemalloc; void *data; @@ -681,14 +675,12 @@ struct sk_buff *__netdev_alloc_skb(struct net_device *dev, unsigned int len, gfp_mask |= __GFP_MEMALLOC; if (in_hardirq() || irqs_disabled()) { - nc = this_cpu_ptr(&netdev_alloc_cache); - data = page_frag_alloc(nc, len, gfp_mask); - pfmemalloc = nc->pfmemalloc; + data = page_frag_alloc(&netdev_alloc_cache, len, gfp_mask); + pfmemalloc = folio_is_pfmemalloc(virt_to_folio(data)); } else { local_bh_disable(); - nc = this_cpu_ptr(&napi_alloc_cache.page); - data = page_frag_alloc(nc, len, gfp_mask); - pfmemalloc = nc->pfmemalloc; + data = page_frag_alloc(&napi_frag_cache, len, gfp_mask); + pfmemalloc = folio_is_pfmemalloc(virt_to_folio(data)); local_bh_enable(); } @@ -776,8 +768,8 @@ struct sk_buff *__napi_alloc_skb(struct napi_struct *napi, unsigned int len, } else { len = SKB_HEAD_ALIGN(len); - data = page_frag_alloc(&nc->page, len, gfp_mask); - pfmemalloc = nc->page.pfmemalloc; + data = page_frag_alloc(&napi_frag_cache, len, gfp_mask); + pfmemalloc = folio_is_pfmemalloc(virt_to_folio(data)); } if (unlikely(!data)) From patchwork Wed Mar 29 14:13:14 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 76623 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp460902vqo; Wed, 29 Mar 2023 07:36:12 -0700 (PDT) X-Google-Smtp-Source: AKy350Yu3muJuJ2oODZ9B1+ZrjF3Y4azmDwYx7A/vYXIeB2+XgQ62i0mjo7HCB0cSwJsR3SfmU0g X-Received: by 2002:a17:906:ad9:b0:930:8714:6739 with SMTP id z25-20020a1709060ad900b0093087146739mr18823962ejf.30.1680100572216; Wed, 29 Mar 2023 07:36:12 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680100572; cv=none; d=google.com; s=arc-20160816; b=oAPsj6u353mNqYFzeyj0KfbCSca40P3+r7sA8qQAXqzFQX4+o3R4wW3CRnUDFWYaJB BBE3MUcHq+lwXfmpS1jfN8HQbLFn/9+/GlFL/6K4JIUXh/ZWWqYAe/7Hh6igq8yMr0Ne Q5eYroUNAPrvPyUaeBoI5ft3cZpBup/9Z0sC4LSi2CnidIkdWeYoIhYVMIlxjXzjWI1k 96McOZ2K02sqXYBYjFcFFOWfWvkPGljgRwliUjWve45/ugfUrC17myEYNlKBeWB9I6H/ hkEb7thY5IsweLWqRkgPnbJmHy/zr5DPXFxYDyXAf5O9wOg4cNeCJvEDHbsn+Rx3C0X0 BNwg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=62BPou+xERgiMp9fd3+/fNOBk5TLUi4yt2oNbndQPtM=; b=Qi5Lvm4fp+qHmwpPpEYM9zsJ69tmcThkSE1uqOVD19qDbmhO8nW/qzEDCP0pho75fy vBl0tj9oPSM6U6cLgncXUbJR0uj0icA+BqkXtObdhEX7cw3yLcmTU1FqBo9Ne9f4F6gK P4n57OaQjEPB7cmV+M6gzOBU+QYA2Aktr2vM7JwAVOfQHiI3VaqGsrkydvCQSrzDxB7U WpWyQcoBAVpUCQTV2XlTv34YoN7PyTho/URpdPu0L2kWo47MkNte8umWqZZwBTIDiGlG fN683VHZwgQ5G/AwyKesY0LQd33OvIuMRsxKg1Ys9mrMtnKDAjNDASXSjmXiDKFR/GBB FI7A== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=W113B46R; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id ne36-20020a1709077ba400b0093d4fb59d99si18629708ejc.422.2023.03.29.07.35.48; Wed, 29 Mar 2023 07:36:12 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=W113B46R; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230391AbjC2OQ5 (ORCPT + 99 others); Wed, 29 Mar 2023 10:16:57 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53416 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230420AbjC2OQm (ORCPT ); Wed, 29 Mar 2023 10:16:42 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 07A0240F8 for ; Wed, 29 Mar 2023 07:15:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1680099267; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=62BPou+xERgiMp9fd3+/fNOBk5TLUi4yt2oNbndQPtM=; b=W113B46RL2gmRT3NPODmTOwxmb+L04YRFWj1FKLHTxQFPpx/QECuZZE9eso3kJD5WRlECo WJBgpiVtmDV+Poa8Hoc/Tvbit2X7p1CjuZWq9xN8lobLm/+Kys7QgjKvAtjKkO2k2Axvzr 8Qy2eCrfruBoMlrLbTPnn8f08HY0Kqs= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-74-xczYM2jqMimXt-WLCPIA-g-1; Wed, 29 Mar 2023 10:14:23 -0400 X-MC-Unique: xczYM2jqMimXt-WLCPIA-g-1 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.rdu2.redhat.com [10.11.54.7]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 6A6F1100DEAF; Wed, 29 Mar 2023 14:14:22 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.18]) by smtp.corp.redhat.com (Postfix) with ESMTP id 850CF14171BB; Wed, 29 Mar 2023 14:14:20 +0000 (UTC) From: David Howells To: Matthew Wilcox , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni Cc: David Howells , Al Viro , Christoph Hellwig , Jens Axboe , Jeff Layton , Christian Brauner , Chuck Lever III , Linus Torvalds , netdev@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [RFC PATCH v2 08/48] tcp: Support MSG_SPLICE_PAGES Date: Wed, 29 Mar 2023 15:13:14 +0100 Message-Id: <20230329141354.516864-9-dhowells@redhat.com> In-Reply-To: <20230329141354.516864-1-dhowells@redhat.com> References: <20230329141354.516864-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.7 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1761713137712489120?= X-GMAIL-MSGID: =?utf-8?q?1761713137712489120?= Make TCP's sendmsg() support MSG_SPLICE_PAGES. This causes pages to be spliced from the source iterator if possible (the iterator must be ITER_BVEC and the pages must be spliceable). This allows ->sendpage() to be replaced by something that can handle multiple multipage folios in a single transaction. Signed-off-by: David Howells cc: Eric Dumazet cc: "David S. Miller" cc: Jakub Kicinski cc: Paolo Abeni cc: Jens Axboe cc: Matthew Wilcox cc: netdev@vger.kernel.org --- net/ipv4/tcp.c | 67 ++++++++++++++++++++++++++++++++++++++++++++------ 1 file changed, 60 insertions(+), 7 deletions(-) diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 288693981b00..910b327c236e 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -1220,7 +1220,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size) int flags, err, copied = 0; int mss_now = 0, size_goal, copied_syn = 0; int process_backlog = 0; - bool zc = false; + int zc = 0; long timeo; flags = msg->msg_flags; @@ -1231,17 +1231,22 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size) if (msg->msg_ubuf) { uarg = msg->msg_ubuf; net_zcopy_get(uarg); - zc = sk->sk_route_caps & NETIF_F_SG; + if (sk->sk_route_caps & NETIF_F_SG) + zc = 1; } else if (sock_flag(sk, SOCK_ZEROCOPY)) { uarg = msg_zerocopy_realloc(sk, size, skb_zcopy(skb)); if (!uarg) { err = -ENOBUFS; goto out_err; } - zc = sk->sk_route_caps & NETIF_F_SG; - if (!zc) + if (sk->sk_route_caps & NETIF_F_SG) + zc = 1; + else uarg_to_msgzc(uarg)->zerocopy = 0; } + } else if (unlikely(msg->msg_flags & MSG_SPLICE_PAGES) && size) { + if (sk->sk_route_caps & NETIF_F_SG) + zc = 2; } if (unlikely(flags & MSG_FASTOPEN || inet_sk(sk)->defer_connect) && @@ -1304,7 +1309,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size) goto do_error; while (msg_data_left(msg)) { - int copy = 0; + ssize_t copy = 0; skb = tcp_write_queue_tail(sk); if (skb) @@ -1345,7 +1350,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size) if (copy > msg_data_left(msg)) copy = msg_data_left(msg); - if (!zc) { + if (zc == 0) { bool merge = true; int i = skb_shinfo(skb)->nr_frags; struct page_frag *pfrag = sk_page_frag(sk); @@ -1390,7 +1395,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size) page_ref_inc(pfrag->page); } pfrag->offset += copy; - } else { + } else if (zc == 1) { /* First append to a fragless skb builds initial * pure zerocopy skb */ @@ -1411,6 +1416,54 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size) if (err < 0) goto do_error; copy = err; + } else if (zc == 2) { + /* Splice in data. */ + struct page *page = NULL, **pages = &page; + size_t off = 0, part; + bool can_coalesce; + int i = skb_shinfo(skb)->nr_frags; + + copy = iov_iter_extract_pages(&msg->msg_iter, &pages, + copy, 1, 0, &off); + if (copy <= 0) { + err = copy ?: -EIO; + goto do_error; + } + + can_coalesce = skb_can_coalesce(skb, i, page, off); + if (!can_coalesce && i >= READ_ONCE(sysctl_max_skb_frags)) { + tcp_mark_push(tp, skb); + iov_iter_revert(&msg->msg_iter, copy); + goto new_segment; + } + if (tcp_downgrade_zcopy_pure(sk, skb)) { + iov_iter_revert(&msg->msg_iter, copy); + goto wait_for_space; + } + + part = tcp_wmem_schedule(sk, copy); + iov_iter_revert(&msg->msg_iter, copy - part); + if (!part) + goto wait_for_space; + copy = part; + + if (can_coalesce) { + skb_frag_size_add(&skb_shinfo(skb)->frags[i - 1], copy); + } else { + get_page(page); + skb_fill_page_desc_noacc(skb, i, page, off, copy); + } + page = NULL; + + if (!(flags & MSG_NO_SHARED_FRAGS)) + skb_shinfo(skb)->flags |= SKBFL_SHARED_FRAG; + + skb->len += copy; + skb->data_len += copy; + skb->truesize += copy; + sk_wmem_queued_add(sk, copy); + sk_mem_charge(sk, copy); + } if (!copied) From patchwork Wed Mar 29 14:13:15 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 76628 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp462121vqo; Wed, 29 Mar 2023 07:38:11 -0700 (PDT) X-Google-Smtp-Source: AKy350ZpxFxga0iF3EyKBROMy29Xu7HZpJLglZJfMtwgjKaWe5JpsXWdQXud12KxaQjSrB7mHua6 X-Received: by 2002:a17:906:6d09:b0:934:2fe4:4921 with SMTP id m9-20020a1709066d0900b009342fe44921mr18830789ejr.19.1680100691205; Wed, 29 Mar 2023 07:38:11 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680100691; cv=none; d=google.com; s=arc-20160816; b=DP9YYW0DqJ8IvWBxmqqfUbMrxy3yIuWY1V02zJnWo9xjBRDRIqLg54NR33XEPph3NM a68/Mgbn9+K1pjPqEZKf8P7/Bti4d0I6BWhNz41J/oUFJ7pzvBlCXrhXZVZQDcHQkzQL I4Emz1IdRstNOXpkOTyHnYxO2CBg5O5eOB/N+MROw7SQPkjGhgZTIDq6ZhpCG5LynpgP Uz8Qg2zhZSJgp8sgNfnERF01ntMcoRz+KHDfFffO2sQjNwoxO9bjiq4jj8AmaHao/dYZ D0mVLNHTbyPLgdMFbzMR522p+It6n++8BPf54+pEsWsQEhcPaEXUPwuCgNiZfry+82wA uZ2g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=4TWX2aSvGOZ9Qof5Bdvg67w1lje6zmFD7Oc0vC/yB3w=; b=IvpwptwfM7fDcx4LSgOaLkFXlrkobR4Nrx1JPyOp8bk4L1BO3hw3paMxHXu/f/M6z0 wo3fN/Ovf05+U3KtuaaqZb3vLtLxEwzGGOGfqmasoK1xYJf+W/eCX12b9pTO+mC0hlqY lYbi7rPisontOTUJRXeSa3120BmkJ7vORd9eZaERG68gnYL+0VaHFEMeJOGe5dkwVNa+ fsoqj0+hHE/E82JrsC29mFOvAQltv8OkuivSmcD5TVmfQd8CFiieUybReYtVsCWRw4Tl mgGFBjj0J4iU4PhjMF+my4eUB6E3r4yMaRCeCg5QxVkfZX99eqRc8kx8ONSRFykWTkPm NrhQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=KK7pzZ41; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id ne36-20020a1709077ba400b0093d4fb59d99si18629708ejc.422.2023.03.29.07.37.47; Wed, 29 Mar 2023 07:38:11 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=KK7pzZ41; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230487AbjC2OQx (ORCPT + 99 others); Wed, 29 Mar 2023 10:16:53 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53306 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230268AbjC2OQe (ORCPT ); Wed, 29 Mar 2023 10:16:34 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 62A8926A5 for ; Wed, 29 Mar 2023 07:15:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1680099270; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=4TWX2aSvGOZ9Qof5Bdvg67w1lje6zmFD7Oc0vC/yB3w=; b=KK7pzZ41W2ghnCQVl4KrjhZwgGpvQr0kJLgQxsXe0N/VamT6VLfrbs0I03cbOVn5oTKE91 Y2MiJfY6RDmmApIo+C4VAzbprmkbsRVb33BOe6Y7oZ3vG6Y3vx7PkZNAw3qD755UYSN0WI fxn0GvMr0+lTZ/rqckfZwTM1RjYxgNA= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-106-aG1NERqFMhu5XfuNhb_KOw-1; Wed, 29 Mar 2023 10:14:25 -0400 X-MC-Unique: aG1NERqFMhu5XfuNhb_KOw-1 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.rdu2.redhat.com [10.11.54.4]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id E1811811E7C; Wed, 29 Mar 2023 14:14:24 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.18]) by smtp.corp.redhat.com (Postfix) with ESMTP id 082972027040; Wed, 29 Mar 2023 14:14:22 +0000 (UTC) From: David Howells To: Matthew Wilcox , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni Cc: David Howells , Al Viro , Christoph Hellwig , Jens Axboe , Jeff Layton , Christian Brauner , Chuck Lever III , Linus Torvalds , netdev@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [RFC PATCH v2 09/48] tcp: Make sendmsg(MSG_SPLICE_PAGES) copy unspliceable data Date: Wed, 29 Mar 2023 15:13:15 +0100 Message-Id: <20230329141354.516864-10-dhowells@redhat.com> In-Reply-To: <20230329141354.516864-1-dhowells@redhat.com> References: <20230329141354.516864-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.4 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1761713262204522556?= X-GMAIL-MSGID: =?utf-8?q?1761713262204522556?= If sendmsg() with MSG_SPLICE_PAGES encounters a page that shouldn't be spliced - a slab page, for instance, or one with a zero count - make tcp_sendmsg() copy it. Signed-off-by: David Howells cc: Eric Dumazet cc: "David S. Miller" cc: Jakub Kicinski cc: Paolo Abeni cc: Jens Axboe cc: Matthew Wilcox cc: netdev@vger.kernel.org --- net/ipv4/tcp.c | 28 +++++++++++++++++++++++++--- 1 file changed, 25 insertions(+), 3 deletions(-) diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 910b327c236e..6ef0518eb706 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -1417,10 +1417,10 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size) goto do_error; copy = err; } else if (zc == 2) { - /* Splice in data. */ + /* Splice in data if we can; copy if we can't. */ struct page *page = NULL, **pages = &page; size_t off = 0, part; - bool can_coalesce; + bool can_coalesce, put = false; int i = skb_shinfo(skb)->nr_frags; copy = iov_iter_extract_pages(&msg->msg_iter, &pages, @@ -1447,12 +1447,34 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size) goto wait_for_space; copy = part; + if (!sendpage_ok(page)) { + const void *p = kmap_local_page(page); + void *q; + + q = page_frag_memdup(NULL, p + off, copy, + sk->sk_allocation, ULONG_MAX); + kunmap_local(p); + if (!q) { + iov_iter_revert(&msg->msg_iter, copy); + err = copy ?: -ENOMEM; + goto do_error; + } + page = virt_to_page(q); + off = offset_in_page(q); + put = true; + can_coalesce = false; + } + if (can_coalesce) { skb_frag_size_add(&skb_shinfo(skb)->frags[i - 1], copy); } else { - get_page(page); + if (!put) + get_page(page); + put = false; skb_fill_page_desc_noacc(skb, i, page, off, copy); } + if (put) + put_page(page); page = NULL; if (!(flags & MSG_NO_SHARED_FRAGS)) From patchwork Wed Mar 29 14:13:16 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 76603 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp449677vqo; Wed, 29 Mar 2023 07:19:52 -0700 (PDT) X-Google-Smtp-Source: AKy350YKb01sexkW24l6kWqv3eNizTs9zJZPqUHGj0wZRXYdd774nZdjGeblIt++T3TsA2zWPzTN X-Received: by 2002:a17:906:6946:b0:92f:8324:e3b7 with SMTP id c6-20020a170906694600b0092f8324e3b7mr18261932ejs.37.1680099592416; Wed, 29 Mar 2023 07:19:52 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680099592; cv=none; d=google.com; s=arc-20160816; b=rp04rDes6Zi3JDgTCAnwrbBqQ80WOWDnFYBRNK6+cD+WxcDK4p0iz1NmXZMlbdjqd7 0KTa24ntQ6HhdTPx5pTEIlivwSEp5+8TkTFlj5dVbAowrYtrubZnbgLfglLanyGMZjAL 4Hi03tOn7rWO0e+mhag/NVqWFETu0BzNgdXfNwa8nDHI3I8pHNj5q6Z3dbum7l3CHcfL HX9STr2krHnq8Rz3kzhOnODWCmUtxvJw1tXitc9YhzHtr/yA6MlmtdS/GoGSSvbqdeet SzcNMn917xJ1VqF7yNbZrvKD7ikQXAah7sH8E/devkv8ruU/Qm3b3YkCeUZms8GZ+OSg hurg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=lNWag+P+g8mnNn3gi6svi7x9UvjDZCb1uhedzWJMgb8=; b=CpbaNZptVqWj+NafPC98wjIyZJvbAHPWAGLdIrcLSaZtWQIu/vrufUdv6CJQ3TsDpJ qzLGwP/F4pP2B76DtOiEz5Jm3j7rW409CH1167lyhjlUWUdqcfG9oCj3rqSgN0zze/9r zvUidhmqzxwXTnBHBV8sdMJzdwWJW5YdgtrD3yKG5ZG2rbYsHJTfjwj1BxRUiYxpEiPZ L9JeZuBMi7lnXemj43X2Sz+SjSdnfQn8g7ZaYzfp//XyZD/b7klbm7Og3TOqm1rAUyTQ nUAJjajxMLk2+Qm2L8JrKnigNq8pUCUQvWl1dkgYZDd+dQBjaNWlJZqr2xZqrf3hCIAd GfmA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b="Oo/66DB+"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id hq13-20020a1709073f0d00b0093fcc6295f7si12410171ejc.773.2023.03.29.07.19.27; Wed, 29 Mar 2023 07:19:52 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b="Oo/66DB+"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230497AbjC2ORU (ORCPT + 99 others); Wed, 29 Mar 2023 10:17:20 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53620 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230433AbjC2OQn (ORCPT ); Wed, 29 Mar 2023 10:16:43 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C2E894ED5 for ; Wed, 29 Mar 2023 07:15:22 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1680099272; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=lNWag+P+g8mnNn3gi6svi7x9UvjDZCb1uhedzWJMgb8=; b=Oo/66DB+8joKChMy4DiNu497WzOMVGqE/8kNx8rHdtsSTz0EDL3jGtrUZjmuNQiPjhglyZ FDiPEK0dXl60OONwPC8/gWdpuR4NTIHa15M7rU/YV9/LDIxSAr0BCXiigDEMRosrG5p0Lq dXYPQUJZVIhusDF9kkwLwxqmBmTxNww= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-600-m7kGADpVPpym7xhogv7czA-1; Wed, 29 Mar 2023 10:14:28 -0400 X-MC-Unique: m7kGADpVPpym7xhogv7czA-1 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.rdu2.redhat.com [10.11.54.3]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 64174811E7C; Wed, 29 Mar 2023 14:14:27 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.18]) by smtp.corp.redhat.com (Postfix) with ESMTP id 7E0FA1121330; Wed, 29 Mar 2023 14:14:25 +0000 (UTC) From: David Howells To: Matthew Wilcox , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni Cc: David Howells , Al Viro , Christoph Hellwig , Jens Axboe , Jeff Layton , Christian Brauner , Chuck Lever III , Linus Torvalds , netdev@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [RFC PATCH v2 10/48] tcp: Convert do_tcp_sendpages() to use MSG_SPLICE_PAGES Date: Wed, 29 Mar 2023 15:13:16 +0100 Message-Id: <20230329141354.516864-11-dhowells@redhat.com> In-Reply-To: <20230329141354.516864-1-dhowells@redhat.com> References: <20230329141354.516864-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.3 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1761712110440188550?= X-GMAIL-MSGID: =?utf-8?q?1761712110440188550?= Convert do_tcp_sendpages() to use sendmsg() with MSG_SPLICE_PAGES rather than directly splicing in the pages itself. do_tcp_sendpages() can then be inlined in subsequent patches into its callers. This allows ->sendpage() to be replaced by something that can handle multiple multipage folios in a single transaction. Signed-off-by: David Howells cc: Eric Dumazet cc: "David S. Miller" cc: Jakub Kicinski cc: Paolo Abeni cc: Jens Axboe cc: Matthew Wilcox cc: netdev@vger.kernel.org --- net/ipv4/tcp.c | 161 +++---------------------------------------------- 1 file changed, 10 insertions(+), 151 deletions(-) diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 6ef0518eb706..a1c5a6d9419c 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -971,163 +971,22 @@ static int tcp_wmem_schedule(struct sock *sk, int copy) return min(copy, sk->sk_forward_alloc); } -static struct sk_buff *tcp_build_frag(struct sock *sk, int size_goal, int flags, - struct page *page, int offset, size_t *size) -{ - struct sk_buff *skb = tcp_write_queue_tail(sk); - struct tcp_sock *tp = tcp_sk(sk); - bool can_coalesce; - int copy, i; - - if (!skb || (copy = size_goal - skb->len) <= 0 || - !tcp_skb_can_collapse_to(skb)) { -new_segment: - if (!sk_stream_memory_free(sk)) - return NULL; - - skb = tcp_stream_alloc_skb(sk, 0, sk->sk_allocation, - tcp_rtx_and_write_queues_empty(sk)); - if (!skb) - return NULL; - -#ifdef CONFIG_TLS_DEVICE - skb->decrypted = !!(flags & MSG_SENDPAGE_DECRYPTED); -#endif - tcp_skb_entail(sk, skb); - copy = size_goal; - } - - if (copy > *size) - copy = *size; - - i = skb_shinfo(skb)->nr_frags; - can_coalesce = skb_can_coalesce(skb, i, page, offset); - if (!can_coalesce && i >= READ_ONCE(sysctl_max_skb_frags)) { - tcp_mark_push(tp, skb); - goto new_segment; - } - if (tcp_downgrade_zcopy_pure(sk, skb)) - return NULL; - - copy = tcp_wmem_schedule(sk, copy); - if (!copy) - return NULL; - - if (can_coalesce) { - skb_frag_size_add(&skb_shinfo(skb)->frags[i - 1], copy); - } else { - get_page(page); - skb_fill_page_desc_noacc(skb, i, page, offset, copy); - } - - if (!(flags & MSG_NO_SHARED_FRAGS)) - skb_shinfo(skb)->flags |= SKBFL_SHARED_FRAG; - - skb->len += copy; - skb->data_len += copy; - skb->truesize += copy; - sk_wmem_queued_add(sk, copy); - sk_mem_charge(sk, copy); - WRITE_ONCE(tp->write_seq, tp->write_seq + copy); - TCP_SKB_CB(skb)->end_seq += copy; - tcp_skb_pcount_set(skb, 0); - - *size = copy; - return skb; -} - ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset, size_t size, int flags) { - struct tcp_sock *tp = tcp_sk(sk); - int mss_now, size_goal; - int err; - ssize_t copied; - long timeo = sock_sndtimeo(sk, flags & MSG_DONTWAIT); - - if (IS_ENABLED(CONFIG_DEBUG_VM) && - WARN_ONCE(!sendpage_ok(page), - "page must not be a Slab one and have page_count > 0")) - return -EINVAL; - - /* Wait for a connection to finish. One exception is TCP Fast Open - * (passive side) where data is allowed to be sent before a connection - * is fully established. - */ - if (((1 << sk->sk_state) & ~(TCPF_ESTABLISHED | TCPF_CLOSE_WAIT)) && - !tcp_passive_fastopen(sk)) { - err = sk_stream_wait_connect(sk, &timeo); - if (err != 0) - goto out_err; - } - - sk_clear_bit(SOCKWQ_ASYNC_NOSPACE, sk); - - mss_now = tcp_send_mss(sk, &size_goal, flags); - copied = 0; - - err = -EPIPE; - if (sk->sk_err || (sk->sk_shutdown & SEND_SHUTDOWN)) - goto out_err; - - while (size > 0) { - struct sk_buff *skb; - size_t copy = size; - - skb = tcp_build_frag(sk, size_goal, flags, page, offset, ©); - if (!skb) - goto wait_for_space; - - if (!copied) - TCP_SKB_CB(skb)->tcp_flags &= ~TCPHDR_PSH; - - copied += copy; - offset += copy; - size -= copy; - if (!size) - goto out; - - if (skb->len < size_goal || (flags & MSG_OOB)) - continue; - - if (forced_push(tp)) { - tcp_mark_push(tp, skb); - __tcp_push_pending_frames(sk, mss_now, TCP_NAGLE_PUSH); - } else if (skb == tcp_send_head(sk)) - tcp_push_one(sk, mss_now); - continue; - -wait_for_space: - set_bit(SOCK_NOSPACE, &sk->sk_socket->flags); - tcp_push(sk, flags & ~MSG_MORE, mss_now, - TCP_NAGLE_PUSH, size_goal); - - err = sk_stream_wait_memory(sk, &timeo); - if (err != 0) - goto do_error; + struct bio_vec bvec; + struct msghdr msg = { + .msg_flags = flags, + .msg_kflags = MSG_SPLICE_PAGES, + }; - mss_now = tcp_send_mss(sk, &size_goal, flags); - } + bvec_set_page(&bvec, page, size, offset); + iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size); -out: - if (copied) { - tcp_tx_timestamp(sk, sk->sk_tsflags); - if (!(flags & MSG_SENDPAGE_NOTLAST)) - tcp_push(sk, flags, mss_now, tp->nonagle, size_goal); - } - return copied; + if (flags & MSG_SENDPAGE_NOTLAST) + msg.msg_flags |= MSG_MORE; -do_error: - tcp_remove_empty_skb(sk); - if (copied) - goto out; -out_err: - /* make sure we wake any epoll edge trigger waiter */ - if (unlikely(tcp_rtx_and_write_queues_empty(sk) && err == -EAGAIN)) { - sk->sk_write_space(sk); - tcp_chrono_stop(sk, TCP_CHRONO_SNDBUF_LIMITED); - } - return sk_stream_error(sk, flags, err); + return tcp_sendmsg_locked(sk, &msg, size); } EXPORT_SYMBOL_GPL(do_tcp_sendpages); From patchwork Wed Mar 29 14:13:17 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 76605 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp449723vqo; Wed, 29 Mar 2023 07:19:56 -0700 (PDT) X-Google-Smtp-Source: AKy350Z1SEQqnjfbRTklpfju/Unx35+xdtFNmOXegZtXuhN2MANZtsCaCHmVEnzaikMMR9ux6OsK X-Received: by 2002:a17:906:6a8d:b0:933:9918:665d with SMTP id p13-20020a1709066a8d00b009339918665dmr2563372ejr.11.1680099595896; Wed, 29 Mar 2023 07:19:55 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680099595; cv=none; d=google.com; s=arc-20160816; b=EuriewSuohcm99qUVq28vxE8v1dv3M1+jBt+ohRtGpVG3HPQKaJH2x3n5zABULN9z4 WYZfWK/TPGu5Xwo7yOG+uTMSJz9svuSsy+QXR68Oe0EFumcw7BV/jCa43mWkhh9vvTx9 ZbN8Z32CY4Aebgf/3tKH/4tXf9CFHs0e/3td0il7/ny4uk/Qi4TdjZGcOmJTntnakLr4 sxkffx1bLJ3fpc5BzOtYKQAQmrUh0uY6/jkGhHSkpU2QdQWsY4e9+la4l1RtjPYwMAse vkdQL6JtoTCSNtbwOcjzd5Sn2OjT2+3+T/7/IaeqSilH/wp9Z3evYgq06ibql4BvA7EQ 3HaQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=V4A9+ThbnxeOTamldoFRxBPwB7/WQbFmJXJUgDasL0c=; b=Kr7yqYnwTq2uDg8I4IQxvimmrPEwBiuhjG3qKUiQugLqHEtFN25R4dcnKT+i9ndaNN SWCrASjRh2g5T3LZsct7Q3mdN2nnp1hGYGRs1N7t1ZYb7Qed4GfqqW/SL2NnLAbIBxo1 m/2Usg4kHdkKU64gagK4p4AaIbeHPt3H/fD2GOu/tYbySCMPgHquhbn/yG219LciDDdF ZqHHTLR/7uHDt/UnFh/+d+4TxQ3sW+4KVXshG7w3ux3tFI6lfW3KZYL6Y5AtMpHEnXY4 y7JCeGe50gSP8gom9Bsah4JgqWv3BjsjJ0HhVn7fpwwbDu4HA+CSR/KYrL7pY/pCsmrY 0deA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b="CFqgZ/0V"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id 15-20020a170906328f00b008d02052ed2asi35559704ejw.637.2023.03.29.07.19.31; Wed, 29 Mar 2023 07:19:55 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b="CFqgZ/0V"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230095AbjC2ORd (ORCPT + 99 others); Wed, 29 Mar 2023 10:17:33 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53384 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230448AbjC2OQo (ORCPT ); Wed, 29 Mar 2023 10:16:44 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 36C5F40C1 for ; Wed, 29 Mar 2023 07:15:22 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1680099276; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=V4A9+ThbnxeOTamldoFRxBPwB7/WQbFmJXJUgDasL0c=; b=CFqgZ/0VGge0DwYJgehqSudrGcsgGA/xsdeVcKw4tpSLYqPRPd2+ZMr1hxuz4aIQN3r93a q7Mv97ifFaDK8MIElOP28zMDLigq+CWXy3UWMsqQUWMEzVS+bHv56ctO4+KqqfUfsesfXH rB/CQH6PPtsOWNPteGjdGCDgxz3Kaik= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-16-Xdfj3glLMsiVuDad3Y8AFA-1; Wed, 29 Mar 2023 10:14:32 -0400 X-MC-Unique: Xdfj3glLMsiVuDad3Y8AFA-1 Received: from smtp.corp.redhat.com (int-mx10.intmail.prod.int.rdu2.redhat.com [10.11.54.10]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 4CBFB3C0E451; Wed, 29 Mar 2023 14:14:30 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.18]) by smtp.corp.redhat.com (Postfix) with ESMTP id 243C9492B03; Wed, 29 Mar 2023 14:14:28 +0000 (UTC) From: David Howells To: Matthew Wilcox , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni Cc: David Howells , Al Viro , Christoph Hellwig , Jens Axboe , Jeff Layton , Christian Brauner , Chuck Lever III , Linus Torvalds , netdev@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, John Fastabend , Jakub Sitnicki , bpf@vger.kernel.org Subject: [RFC PATCH v2 11/48] tcp_bpf: Inline do_tcp_sendpages as it's now a wrapper around tcp_sendmsg Date: Wed, 29 Mar 2023 15:13:17 +0100 Message-Id: <20230329141354.516864-12-dhowells@redhat.com> In-Reply-To: <20230329141354.516864-1-dhowells@redhat.com> References: <20230329141354.516864-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.10 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1761712114120415146?= X-GMAIL-MSGID: =?utf-8?q?1761712114120415146?= do_tcp_sendpages() is now just a small wrapper around tcp_sendmsg_locked(), so inline it. This is part of replacing ->sendpage() with a call to sendmsg() with MSG_SPLICE_PAGES set. Signed-off-by: David Howells cc: John Fastabend cc: Jakub Sitnicki cc: "David S. Miller" cc: Eric Dumazet cc: Jakub Kicinski cc: Paolo Abeni cc: Jens Axboe cc: Matthew Wilcox cc: netdev@vger.kernel.org cc: bpf@vger.kernel.org --- net/ipv4/tcp_bpf.c | 20 ++++++++++++-------- 1 file changed, 12 insertions(+), 8 deletions(-) diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c index cf26d65ca389..7f17134637eb 100644 --- a/net/ipv4/tcp_bpf.c +++ b/net/ipv4/tcp_bpf.c @@ -72,11 +72,13 @@ static int tcp_bpf_push(struct sock *sk, struct sk_msg *msg, u32 apply_bytes, { bool apply = apply_bytes; struct scatterlist *sge; + struct msghdr msghdr = { .msg_flags = flags | MSG_SPLICE_PAGES, }; struct page *page; int size, ret = 0; u32 off; while (1) { + struct bio_vec bvec; bool has_tx_ulp; sge = sk_msg_elem(msg, msg->sg.start); @@ -88,16 +90,18 @@ static int tcp_bpf_push(struct sock *sk, struct sk_msg *msg, u32 apply_bytes, tcp_rate_check_app_limited(sk); retry: has_tx_ulp = tls_sw_has_ctx_tx(sk); - if (has_tx_ulp) { - flags |= MSG_SENDPAGE_NOPOLICY; - ret = kernel_sendpage_locked(sk, - page, off, size, flags); - } else { - ret = do_tcp_sendpages(sk, page, off, size, flags); - } + if (has_tx_ulp) + msghdr.msg_flags |= MSG_SENDPAGE_NOPOLICY; + if (flags & MSG_SENDPAGE_NOTLAST) + msghdr.msg_flags |= MSG_MORE; + + bvec_set_page(&bvec, page, size, off); + iov_iter_bvec(&msghdr.msg_iter, ITER_SOURCE, &bvec, 1, size); + ret = tcp_sendmsg_locked(sk, &msghdr, size); if (ret <= 0) return ret; + if (apply) apply_bytes -= ret; msg->sg.size -= ret; @@ -398,7 +402,7 @@ static int tcp_bpf_sendmsg(struct sock *sk, struct msghdr *msg, size_t size) long timeo; int flags; - /* Don't let internal do_tcp_sendpages() flags through */ + /* Don't let internal sendpage flags through */ flags = (msg->msg_flags & ~MSG_SENDPAGE_DECRYPTED); flags |= MSG_NO_SHARED_FRAGS; From patchwork Wed Mar 29 14:13:18 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 76610 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp450379vqo; Wed, 29 Mar 2023 07:20:49 -0700 (PDT) X-Google-Smtp-Source: AKy350Zic0ScT9kE2NNHPy/eRTpk35jfFBNyvlfXwnY+FUPNA1CPZ8EX25bSy55aYf2oPl46NTiI X-Received: by 2002:a50:fe8a:0:b0:502:4862:d453 with SMTP id d10-20020a50fe8a000000b005024862d453mr8516673edt.3.1680099649349; Wed, 29 Mar 2023 07:20:49 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680099649; cv=none; d=google.com; s=arc-20160816; b=FGp5miqSZLRp3LJk+42LZ97A9HcTK7TrVdvaYhI0zzQqVcsCWq1cf3qAwtgvRzJWdA RXpX39R+U7nzoXqJbe9z8bcMAz4IJBrCELSyJpFfOvegKSe1s5IClOOhK1Wm26vH50ap FAlDHgh19NJcG/EzwsdPtKtK4+nZ5oYaebsiUVAmXCMS6XtvHloYi1K09ct9LdoXcK/i NV/D/Z4mDwbxu9aU0jFLGlsW07AEDym4Gb6fwBABudO8AG/lLLxFfovAd+dZbtjCxkjd HvBZTf+EjQaRdWlY8BJxcTAR9IdmI1nq99fa2wjgOu74Wf80KpjmRDkEC2Q7Q2xtcePI TITQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=DjALLcCTAkJ9GFGzQm0mxszTSA2UBcCYteYf8nsJBJs=; b=cBmP6KE/DGhMOB/7qJEYjwYj8CfThMz9Lf8lDQ5Qsi7FyLoH7nDLVmMa0uDJ++SNy1 aAiaeHmdE1gKVcNtM4lnJ9gOcH9Y7fkEmt7rc6i5MpCZKiTKQOs4yutwHJfmXZgi8FAo 5Wlm/eNjcc8pNv3k/hHbaiSY0AAumuH2F0tFajNigldapGe+72OVbkaRvscuc2hhunTH 7iqM+Af8F+ckTTXQllWicoM0Q1NlbktLoauUnR09KdwAsNxqg8RpiaqosCZyxpoBY7Pq KsJ39bCI5lyVk+xDcD4IdAUUftHYzKNOkUgUfF/Hh5jEBnw+NNquisV85EmurvXOtptD 4ebA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=EkhqoFr4; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id m1-20020aa7c481000000b004fedca1007dsi36202659edq.161.2023.03.29.07.20.25; Wed, 29 Mar 2023 07:20:49 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=EkhqoFr4; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230073AbjC2OSW (ORCPT + 99 others); Wed, 29 Mar 2023 10:18:22 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55208 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230504AbjC2OR2 (ORCPT ); Wed, 29 Mar 2023 10:17:28 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BA79659F3 for ; Wed, 29 Mar 2023 07:15:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1680099277; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=DjALLcCTAkJ9GFGzQm0mxszTSA2UBcCYteYf8nsJBJs=; b=EkhqoFr4Q/AKgIncNUIt61FcbK31PU8oe5MPjt0AepOz4pL6ofZ4tAN4CwN6xFzFMDCgme 12qMSpKbZtxn0G6As8R55763879sEV9mYD96Kuj0AuYb07iw+5H8J2bf4Ll2kumdi3N8mJ BX85i9jU6Iu0IcHyXiJBQFOYLQwYfDI= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-671-FOdpR13LNSGAWxRHNrnIWQ-1; Wed, 29 Mar 2023 10:14:34 -0400 X-MC-Unique: FOdpR13LNSGAWxRHNrnIWQ-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 14F9B185A78F; Wed, 29 Mar 2023 14:14:33 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.18]) by smtp.corp.redhat.com (Postfix) with ESMTP id DB6FCC15BA0; Wed, 29 Mar 2023 14:14:30 +0000 (UTC) From: David Howells To: Matthew Wilcox , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni Cc: David Howells , Al Viro , Christoph Hellwig , Jens Axboe , Jeff Layton , Christian Brauner , Chuck Lever III , Linus Torvalds , netdev@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Steffen Klassert , Herbert Xu Subject: [RFC PATCH v2 12/48] espintcp: Inline do_tcp_sendpages() Date: Wed, 29 Mar 2023 15:13:18 +0100 Message-Id: <20230329141354.516864-13-dhowells@redhat.com> In-Reply-To: <20230329141354.516864-1-dhowells@redhat.com> References: <20230329141354.516864-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1761712170091111715?= X-GMAIL-MSGID: =?utf-8?q?1761712170091111715?= do_tcp_sendpages() is now just a small wrapper around tcp_sendmsg_locked(), so inline it, allowing do_tcp_sendpages() to be removed. This is part of replacing ->sendpage() with a call to sendmsg() with MSG_SPLICE_PAGES set. Signed-off-by: David Howells cc: Steffen Klassert cc: Herbert Xu cc: Eric Dumazet cc: "David S. Miller" cc: Jakub Kicinski cc: Paolo Abeni cc: Jens Axboe cc: Matthew Wilcox cc: netdev@vger.kernel.org --- net/xfrm/espintcp.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/net/xfrm/espintcp.c b/net/xfrm/espintcp.c index 872b80188e83..3504925babdb 100644 --- a/net/xfrm/espintcp.c +++ b/net/xfrm/espintcp.c @@ -205,14 +205,16 @@ static int espintcp_sendskb_locked(struct sock *sk, struct espintcp_msg *emsg, static int espintcp_sendskmsg_locked(struct sock *sk, struct espintcp_msg *emsg, int flags) { + struct msghdr msghdr = { .msg_flags = flags | MSG_SPLICE_PAGES, }; struct sk_msg *skmsg = &emsg->skmsg; struct scatterlist *sg; int done = 0; int ret; - flags |= MSG_SENDPAGE_NOTLAST; + msghdr.msg_flags |= MSG_SENDPAGE_NOTLAST; sg = &skmsg->sg.data[skmsg->sg.start]; do { + struct bio_vec bvec; size_t size = sg->length - emsg->offset; int offset = sg->offset + emsg->offset; struct page *p; @@ -220,11 +222,13 @@ static int espintcp_sendskmsg_locked(struct sock *sk, emsg->offset = 0; if (sg_is_last(sg)) - flags &= ~MSG_SENDPAGE_NOTLAST; + msghdr.msg_flags &= ~MSG_SENDPAGE_NOTLAST; p = sg_page(sg); retry: - ret = do_tcp_sendpages(sk, p, offset, size, flags); + bvec_set_page(&bvec, p, size, offset); + iov_iter_bvec(&msghdr.msg_iter, ITER_SOURCE, &bvec, 1, size); + ret = tcp_sendmsg_locked(sk, &msghdr, size); if (ret < 0) { emsg->offset = offset - sg->offset; skmsg->sg.start += done; From patchwork Wed Mar 29 14:13:19 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 76609 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp450273vqo; Wed, 29 Mar 2023 07:20:38 -0700 (PDT) X-Google-Smtp-Source: AK7set+uewMeO8UJodPRQg2VSfx/X2r+TFWX7Ib88TEUV8mtI/bkTegxo6cgWgncStFYiBxaaPOZ X-Received: by 2002:a05:6a20:8baf:b0:db:a8aa:307c with SMTP id m47-20020a056a208baf00b000dba8aa307cmr17229905pzh.12.1680099638545; Wed, 29 Mar 2023 07:20:38 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680099638; cv=none; d=google.com; s=arc-20160816; b=lQMndisKj35SLpor6aiF9BG9IL+Ae6o7BOr63wpxhaiYTNXoqJlUuEvV2SFtyprqQT Rlhb2Rz/SFkuA7Umm8hiGUaBdQyz4wKAT0RbUIqZdfAgelKXAmpqV7tInT+ZdXAS0ymt fJHuyzsT5ZquT0BbVBFBEpd9yF3HdNG5qc3HWhlBH1HE0fbDR1WI2q/RENz6alMlCdKP pUA1o9R8hvsgCD/n12s4JOwN2hRrGlsHPJk0fCcR2sc+AGrZXdLkd6EgBcCGo1BGe3/Z cHBeJfOtvF1grEFh/Lk4f8HGtQHey3YF37/I5WkF7St7rlO7Qj/8qGpnbbv6UADQBjkZ umkQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=v6WcNKpTWWgexmydU39qEchX1MtJJl9VMoZGlW6ePEY=; b=b7PRxiB8sZ9QfWXsNsrPlAk+xLaWGOct+ja3rfMu95TaHh471BvMy0Tohn+VTNYEGx dC5U/LLEQYiMXs/GJwZ3lGeR2Kj5LioRPFJNFwF4Jnu3oOt8INnYDbq+jbaH2DSYK0Nr y4as5OhHOwNXJeaPtgBqlLuEADuDnNI8tS9qULa17GlHpcJGESJKViHaohEM303z0NNU UD1+4/6i8xtwZo0wvHDgHz/xqXlq0kwppxLgC3+8uncpol1IKisaaMPeJjQzvE8/Z88y ZPs5MOcrv8NxpipE8FOmljA/lNZNk+jiNk5jm4QvbJqU5ma7ZZkkbhkjnYCDVgkRZFv1 GJ8Q== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=fcnlM7wm; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id s82-20020a632c55000000b0050f79e158a1si24901694pgs.737.2023.03.29.07.20.24; Wed, 29 Mar 2023 07:20:38 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=fcnlM7wm; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230506AbjC2OR3 (ORCPT + 99 others); Wed, 29 Mar 2023 10:17:29 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53364 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230440AbjC2OQn (ORCPT ); Wed, 29 Mar 2023 10:16:43 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6407A189 for ; Wed, 29 Mar 2023 07:15:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1680099280; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=v6WcNKpTWWgexmydU39qEchX1MtJJl9VMoZGlW6ePEY=; b=fcnlM7wmufZdjeVJ1N0qIER77Qvut65Q6/b1CUVkM3f+pzjILKvWhErozoWT7rqdGwWopf SXxfoZ3CAiFJ0J96gVxDQfj660NOqX28Wic0hsFUpGD/vHHvXyn/A5CV77s7clU8qhb6dr N2M1O9qL69lD5xg8umWJRHJ8YIzdEGc= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-428-r-bTm78zPGe9KMPftn10KA-1; Wed, 29 Mar 2023 10:14:36 -0400 X-MC-Unique: r-bTm78zPGe9KMPftn10KA-1 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.rdu2.redhat.com [10.11.54.3]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id D52383C0E441; Wed, 29 Mar 2023 14:14:35 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.18]) by smtp.corp.redhat.com (Postfix) with ESMTP id C31801121330; Wed, 29 Mar 2023 14:14:33 +0000 (UTC) From: David Howells To: Matthew Wilcox , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni Cc: David Howells , Al Viro , Christoph Hellwig , Jens Axboe , Jeff Layton , Christian Brauner , Chuck Lever III , Linus Torvalds , netdev@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Boris Pismenny , John Fastabend Subject: [RFC PATCH v2 13/48] tls: Inline do_tcp_sendpages() Date: Wed, 29 Mar 2023 15:13:19 +0100 Message-Id: <20230329141354.516864-14-dhowells@redhat.com> In-Reply-To: <20230329141354.516864-1-dhowells@redhat.com> References: <20230329141354.516864-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.3 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1761712158363162633?= X-GMAIL-MSGID: =?utf-8?q?1761712158363162633?= do_tcp_sendpages() is now just a small wrapper around tcp_sendmsg_locked(), so inline it, allowing do_tcp_sendpages() to be removed. This is part of replacing ->sendpage() with a call to sendmsg() with MSG_SPLICE_PAGES set. Signed-off-by: David Howells cc: Boris Pismenny cc: John Fastabend cc: Jakub Kicinski cc: "David S. Miller" cc: Eric Dumazet cc: Paolo Abeni cc: Jens Axboe cc: Matthew Wilcox cc: netdev@vger.kernel.org --- include/net/tls.h | 2 +- net/tls/tls_main.c | 25 ++++++++++++++++--------- 2 files changed, 17 insertions(+), 10 deletions(-) diff --git a/include/net/tls.h b/include/net/tls.h index 154949c7b0c8..d31521c36a84 100644 --- a/include/net/tls.h +++ b/include/net/tls.h @@ -256,7 +256,7 @@ struct tls_context { struct scatterlist *partially_sent_record; u16 partially_sent_offset; - bool in_tcp_sendpages; + bool splicing_pages; bool pending_open_record_frags; struct mutex tx_lock; /* protects partially_sent_* fields and diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c index 3735cb00905d..9a1e77de6a1f 100644 --- a/net/tls/tls_main.c +++ b/net/tls/tls_main.c @@ -124,7 +124,11 @@ int tls_push_sg(struct sock *sk, u16 first_offset, int flags) { - int sendpage_flags = flags | MSG_SENDPAGE_NOTLAST; + struct bio_vec bvec; + struct msghdr msg = { + .msg_flags = flags | MSG_SENDPAGE_NOTLAST, + .msg_kflags = MSG_SPLICE_PAGES, + }; int ret = 0; struct page *p; size_t size; @@ -133,16 +137,19 @@ int tls_push_sg(struct sock *sk, size = sg->length - offset; offset += sg->offset; - ctx->in_tcp_sendpages = true; + ctx->splicing_pages = true; while (1) { if (sg_is_last(sg)) - sendpage_flags = flags; + msg.msg_flags = flags; /* is sending application-limited? */ tcp_rate_check_app_limited(sk); p = sg_page(sg); retry: - ret = do_tcp_sendpages(sk, p, offset, size, sendpage_flags); + bvec_set_page(&bvec, p, size, offset); + iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size); + + ret = tcp_sendmsg_locked(sk, &msg, size); if (ret != size) { if (ret > 0) { @@ -154,7 +161,7 @@ int tls_push_sg(struct sock *sk, offset -= sg->offset; ctx->partially_sent_offset = offset; ctx->partially_sent_record = (void *)sg; - ctx->in_tcp_sendpages = false; + ctx->splicing_pages = false; return ret; } @@ -168,7 +175,7 @@ int tls_push_sg(struct sock *sk, size = sg->length; } - ctx->in_tcp_sendpages = false; + ctx->splicing_pages = false; return 0; } @@ -246,11 +253,11 @@ static void tls_write_space(struct sock *sk) { struct tls_context *ctx = tls_get_ctx(sk); - /* If in_tcp_sendpages call lower protocol write space handler + /* If splicing_pages call lower protocol write space handler * to ensure we wake up any waiting operations there. For example - * if do_tcp_sendpages where to call sk_wait_event. + * if splicing pages where to call sk_wait_event. */ - if (ctx->in_tcp_sendpages) { + if (ctx->splicing_pages) { ctx->sk_write_space(sk); return; } From patchwork Wed Mar 29 14:13:20 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 76607 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp449865vqo; Wed, 29 Mar 2023 07:20:05 -0700 (PDT) X-Google-Smtp-Source: AKy350ZIkArJER3A2V3anV8FCFFXy5JrBBQ1tJhJe6jBbxJQ9mud2FVrMfKcH3jQgGM5pDwdSwhn X-Received: by 2002:a17:906:c097:b0:946:e908:3af0 with SMTP id f23-20020a170906c09700b00946e9083af0mr2690884ejz.26.1680099605556; Wed, 29 Mar 2023 07:20:05 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680099605; cv=none; d=google.com; s=arc-20160816; b=nQU1fLMHJkSd75hYqA+SskiyC/pVIgDQ/HWp+nFgVrl4V8VJPhdlStTsZeFs14tUIw kyNsVmnybG6EtMlTi60uqw0+kteG4Wubu5w6Tq4Oht1mMn1dtnYP+hR9ggtm4hvg6PPv MTWbYGa9FHnKtDX0TeiiQbpYnthDQ/6mhVIJnaG8lvZgiuezCjSd0uQ2l4hc55MxKVkh oRAn7H2Dqk/6drQbWbyr3bAzzjlVI3+HPCdWn2Bv+sjWE0eRApCbH+NPRP+ogrn2t7Sy V9ET8XfDNZSM76RSnx1WvQEA3j6hulxAboWub/maxL6CjER/afi7NSkpQRsUyEE9vvQX z7wA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=pP5V7G7GRqo/82ZWWApO+0CTJP1pdqvbg0H8r7V6l6c=; b=fEP7zz0B0L4IrnQ7Br3OJv9rOVHERdzA7Yll25DakelONbqBjg1qr7WI6l50FyuxZa EsKD+t69ol55mFnQBpT+dVs9ZTblYunPT90Us/Ivc90xowMYTaxgEgMl8CGh2kVuI3go E9oNHJb5k0b83qpy2Tg7uq+0wTN1SWzW+DatFCcIEqjbqZ2MthqYeAGV89aQ2hBX9TF2 YDaef2zK5GKQ0USzkWxBCgJZzxmtsJnhL9bIZNZufRrg8JF1EkNEqgCUyvsNJ6lFrgEs YDrH3wBxbtqCFY0VB0Wc8W5uOTmAp6TWszdPsTRwlG+/Ad3e0cUj5hB3GL3U3zIzFxRe PgGg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=AZtl3wcm; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id vl5-20020a170907b60500b00933e496c8d5si13022496ejc.759.2023.03.29.07.19.40; Wed, 29 Mar 2023 07:20:05 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=AZtl3wcm; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231134AbjC2ORy (ORCPT + 99 others); Wed, 29 Mar 2023 10:17:54 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53610 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230458AbjC2OQp (ORCPT ); Wed, 29 Mar 2023 10:16:45 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 57A9A1717 for ; Wed, 29 Mar 2023 07:15:24 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1680099287; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=pP5V7G7GRqo/82ZWWApO+0CTJP1pdqvbg0H8r7V6l6c=; b=AZtl3wcmJMsQDtZuvBviqsEcQy3vFDx5K6qMGw5H6RVY2nwA0dY+yEs/mtsV8CdG03STFL yJUJMNNk05loWXVTBEpGE0Co52+PmpobGKf409anN4IsT70xRk8C5Dz2SzOAEze5Zx1vz5 XEkj9g6om7/pENRL34g7cSz5Xd8H/vY= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-387-i4eZEzSqMsuAkK3bbaUC_A-1; Wed, 29 Mar 2023 10:14:39 -0400 X-MC-Unique: i4eZEzSqMsuAkK3bbaUC_A-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 9C7CD801779; Wed, 29 Mar 2023 14:14:38 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.18]) by smtp.corp.redhat.com (Postfix) with ESMTP id 715D0C15BA0; Wed, 29 Mar 2023 14:14:36 +0000 (UTC) From: David Howells To: Matthew Wilcox , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni Cc: David Howells , Al Viro , Christoph Hellwig , Jens Axboe , Jeff Layton , Christian Brauner , Chuck Lever III , Linus Torvalds , netdev@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Bernard Metzler , Tom Talpey , linux-rdma@vger.kernel.org Subject: [RFC PATCH v2 14/48] siw: Inline do_tcp_sendpages() Date: Wed, 29 Mar 2023 15:13:20 +0100 Message-Id: <20230329141354.516864-15-dhowells@redhat.com> In-Reply-To: <20230329141354.516864-1-dhowells@redhat.com> References: <20230329141354.516864-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1761712124092202265?= X-GMAIL-MSGID: =?utf-8?q?1761712124092202265?= do_tcp_sendpages() is now just a small wrapper around tcp_sendmsg_locked(), so inline it, allowing do_tcp_sendpages() to be removed. This is part of replacing ->sendpage() with a call to sendmsg() with MSG_SPLICE_PAGES set. Signed-off-by: David Howells cc: Bernard Metzler cc: Tom Talpey cc: "David S. Miller" cc: Eric Dumazet cc: Jakub Kicinski cc: Paolo Abeni cc: Jens Axboe cc: Matthew Wilcox cc: linux-rdma@vger.kernel.org cc: netdev@vger.kernel.org --- drivers/infiniband/sw/siw/siw_qp_tx.c | 17 ++++++++++++----- 1 file changed, 12 insertions(+), 5 deletions(-) diff --git a/drivers/infiniband/sw/siw/siw_qp_tx.c b/drivers/infiniband/sw/siw/siw_qp_tx.c index 05052b49107f..fa5de40d85d5 100644 --- a/drivers/infiniband/sw/siw/siw_qp_tx.c +++ b/drivers/infiniband/sw/siw/siw_qp_tx.c @@ -313,7 +313,7 @@ static int siw_tx_ctrl(struct siw_iwarp_tx *c_tx, struct socket *s, } /* - * 0copy TCP transmit interface: Use do_tcp_sendpages. + * 0copy TCP transmit interface: Use MSG_SPLICE_PAGES. * * Using sendpage to push page by page appears to be less efficient * than using sendmsg, even if data are copied. @@ -324,20 +324,27 @@ static int siw_tx_ctrl(struct siw_iwarp_tx *c_tx, struct socket *s, static int siw_tcp_sendpages(struct socket *s, struct page **page, int offset, size_t size) { + struct bio_vec bvec; + struct msghdr msg = { + .msg_flags = (MSG_MORE | MSG_DONTWAIT | MSG_SENDPAGE_NOTLAST | + MSG_SPLICE_PAGES), + }; struct sock *sk = s->sk; - int i = 0, rv = 0, sent = 0, - flags = MSG_MORE | MSG_DONTWAIT | MSG_SENDPAGE_NOTLAST; + int i = 0, rv = 0, sent = 0; while (size) { size_t bytes = min_t(size_t, PAGE_SIZE - offset, size); if (size + offset <= PAGE_SIZE) - flags = MSG_MORE | MSG_DONTWAIT; + msg.msg_flags = MSG_MORE | MSG_DONTWAIT; tcp_rate_check_app_limited(sk); + bvec_set_page(&bvec, page[i], bytes, offset); + iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size); + try_page_again: lock_sock(sk); - rv = do_tcp_sendpages(sk, page[i], offset, bytes, flags); + rv = tcp_sendmsg_locked(sk, &msg, size); release_sock(sk); if (rv > 0) { From patchwork Wed Mar 29 14:13:21 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 76627 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp461613vqo; Wed, 29 Mar 2023 07:37:18 -0700 (PDT) X-Google-Smtp-Source: AKy350al9+UuS8+vZeb91WfVZ0x+pOSaYYncIleuxELC2rSwbRRdxslAUH1Y/J0R01Jq19QESPFc X-Received: by 2002:a05:6402:757:b0:4fd:c5e:79b8 with SMTP id p23-20020a056402075700b004fd0c5e79b8mr19878129edy.32.1680100637882; Wed, 29 Mar 2023 07:37:17 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680100637; cv=none; d=google.com; s=arc-20160816; b=XFh7GAZ8K3BUiMlOyZDWQWDj/FsMaUX7UfUQVW/pB2qrfmB0B9WX9SPIx+71U0KVK+ u+ERLq0IjZiy7B3syPv0MeCPAUpmKNHDE1qww6C2eYxKhxpR0zGmY+T5PgK0YP0xFbu+ 23vDuZVDdmpJakz3v9TFSz+ZmNWibr7pwefB61ZG0NQfQql3g+q90XBnsVLTblA87VGO krV+aeQ4dGlcSGVrfUjZVichm4hHsqUVraO9XXx/5+sMZ1hB/+aYOejboq1FdPkE74bh k2tPomEecOC28oYaOh1egr0GCS2b+KzvfStyyJf9XDW0izOmMODW+rERp0l9CYtPMNZD KNFg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=Bb5tuF13A7AUAmGHrtoNc4Ue+Am7P92yBHtRqoRTTps=; b=pAU+ot8yBeYoUrDWAvK1t8ILCNAeDnoK08cRPn52DBcLyXrnhszdhUE7jVpF/U3mTx uE+lDyxxDZw5Auk6kLad2+3UVmU7udF2ZR38suEdeZa8E3m7mFKuLLKWu2AhFyREEEHB JnFqCPxQ+fVO+C5VxL3JgqydvJqiflcoGQqS43wWqJNcfHnFmoS0gShjMji+YodNbUVh JffSYzguxT8UQ7Sbs4gag7VI6RhvOkXB0cdwZ6/aNR5sm0jsUK8dnQwuBtEboyWPQ756 nwVCnhAQrtbuDMm2KQ97kIkf9dO/DZ0uGLZOwA1EgyBFRznIx22MQOMeZBx7AWV+x3L7 TAkg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=ITtQfFbs; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id w4-20020a50fa84000000b004ad6d2941efsi34659893edr.478.2023.03.29.07.36.50; Wed, 29 Mar 2023 07:37:17 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=ITtQfFbs; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229638AbjC2OR5 (ORCPT + 99 others); Wed, 29 Mar 2023 10:17:57 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53644 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230459AbjC2OQq (ORCPT ); Wed, 29 Mar 2023 10:16:46 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 13F144C18 for ; Wed, 29 Mar 2023 07:15:25 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1680099286; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Bb5tuF13A7AUAmGHrtoNc4Ue+Am7P92yBHtRqoRTTps=; b=ITtQfFbsMu0vDog1WqpiWDXO4zirdomQ4CJNgLGfWMQSzWuhQfG5wemeYuhpQ+C9xtYjAK 19d2z0Y0kYUzkAO1j1SxzMq6pUsEHdaw4KcYgEJ9wUAewIU8o5pGTEzIiqClYqBFBYg4Xg bC3jPHsoH/ReVk/+XtvJ0TTAHKHgeJ4= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-558-1vALR20tPYeGhBQuRt7gLg-1; Wed, 29 Mar 2023 10:14:42 -0400 X-MC-Unique: 1vALR20tPYeGhBQuRt7gLg-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.rdu2.redhat.com [10.11.54.2]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 3D8F81C0758E; Wed, 29 Mar 2023 14:14:41 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.18]) by smtp.corp.redhat.com (Postfix) with ESMTP id 59D114020C82; Wed, 29 Mar 2023 14:14:39 +0000 (UTC) From: David Howells To: Matthew Wilcox , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni Cc: David Howells , Al Viro , Christoph Hellwig , Jens Axboe , Jeff Layton , Christian Brauner , Chuck Lever III , Linus Torvalds , netdev@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [RFC PATCH v2 15/48] tcp: Fold do_tcp_sendpages() into tcp_sendpage_locked() Date: Wed, 29 Mar 2023 15:13:21 +0100 Message-Id: <20230329141354.516864-16-dhowells@redhat.com> In-Reply-To: <20230329141354.516864-1-dhowells@redhat.com> References: <20230329141354.516864-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.2 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1761713206801810560?= X-GMAIL-MSGID: =?utf-8?q?1761713206801810560?= Fold do_tcp_sendpages() into its last remaining caller, tcp_sendpage_locked(). Signed-off-by: David Howells cc: Eric Dumazet cc: "David S. Miller" cc: Jakub Kicinski cc: Paolo Abeni cc: Jens Axboe cc: Matthew Wilcox cc: netdev@vger.kernel.org --- include/net/tcp.h | 2 -- net/ipv4/tcp.c | 26 ++++++++------------------ 2 files changed, 8 insertions(+), 20 deletions(-) diff --git a/include/net/tcp.h b/include/net/tcp.h index db9f828e9d1e..844bc8e6a714 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -333,8 +333,6 @@ int tcp_sendpage(struct sock *sk, struct page *page, int offset, size_t size, int flags); int tcp_sendpage_locked(struct sock *sk, struct page *page, int offset, size_t size, int flags); -ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset, - size_t size, int flags); int tcp_send_mss(struct sock *sk, int *size_goal, int flags); void tcp_push(struct sock *sk, int flags, int mss_now, int nonagle, int size_goal); diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index a1c5a6d9419c..a8f8ccaed10e 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -971,14 +971,16 @@ static int tcp_wmem_schedule(struct sock *sk, int copy) return min(copy, sk->sk_forward_alloc); } -ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset, - size_t size, int flags) +int tcp_sendpage_locked(struct sock *sk, struct page *page, int offset, + size_t size, int flags) { struct bio_vec bvec; - struct msghdr msg = { - .msg_flags = flags, - .msg_kflags = MSG_SPLICE_PAGES, - }; + struct msghdr msg = { .msg_flags = flags | MSG_SPLICE_PAGES, }; + + if (!(sk->sk_route_caps & NETIF_F_SG)) + return sock_no_sendpage_locked(sk, page, offset, size, flags); + + tcp_rate_check_app_limited(sk); /* is sending application-limited? */ bvec_set_page(&bvec, page, size, offset); iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size); @@ -988,18 +990,6 @@ ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset, return tcp_sendmsg_locked(sk, &msg, size); } -EXPORT_SYMBOL_GPL(do_tcp_sendpages); - -int tcp_sendpage_locked(struct sock *sk, struct page *page, int offset, - size_t size, int flags) -{ - if (!(sk->sk_route_caps & NETIF_F_SG)) - return sock_no_sendpage_locked(sk, page, offset, size, flags); - - tcp_rate_check_app_limited(sk); /* is sending application-limited? */ - - return do_tcp_sendpages(sk, page, offset, size, flags); -} EXPORT_SYMBOL_GPL(tcp_sendpage_locked); int tcp_sendpage(struct sock *sk, struct page *page, int offset, From patchwork Wed Mar 29 14:13:22 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 76611 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp450424vqo; Wed, 29 Mar 2023 07:20:53 -0700 (PDT) X-Google-Smtp-Source: AKy350ZHyvszmtH02bIgSiqQEF+TUePUWTJgqpwhkSOGCHGUM4I1aHoMuZe80NY06O5XkgPmBEki X-Received: by 2002:a17:906:8a41:b0:930:18f5:d016 with SMTP id gx1-20020a1709068a4100b0093018f5d016mr19846742ejc.15.1680099653141; Wed, 29 Mar 2023 07:20:53 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680099653; cv=none; d=google.com; s=arc-20160816; b=oE8z0kK5FJgnqAH1W5sQqX+BORbIBKlMCXiia7PQOgZWskfoMjc1JyG6W7B/+mIrpx Ok8H263u0tRBaQYHo4qzjl8vsbTxxdN0YQl+Z/BqNPbWjXJyRfk9fPeXbKo2XVjuIL6S aYEm/U1ZloHhepKk7C48sgE180BeeBJolCgty6YstLh16LNOAbvHZ8fSzOHmhDy4Of5U zyYugIZB71O40qmgd4EIRd/zT2YQhd2IFYtEmD48szRF8Y4iv3m0OitNL6cPEMyr0HgB SkFXVN8k9XsdHVxmwB64lvZaDm0PskCW+Hefio44lOsM5sqebIIvKAE59hELyIvEscdv Oxag== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=6eWwAqocQ917L7rRS0xkmog8pJjMU6goCC63rugY9Cw=; b=ZvVugRN2QP43Hz+RZvfZUk33k4pxyF4K4z6KPRNOr7pNHNqyay3O4qL15ud7cxN/Xv EkYOr6K69m49FFix4nrN+qVzMjCrhGn1Hv6eeglnxhItGf9C4d6tsxoGspmhK55JQOD9 LwsaZnrCveStJ4UcCROig6wXlT+mqCp8lREQA5MRDnyh3XFJRec2RlJLsGDNuu+29S1q ukjahYOzFdvKxWvjovuo6jUBlJ05at7mECmNT7BR95kFzlO5as4CfItK5Nd0GwrwBAAt Zxt0C9fgQ3lVKReWk8pgb99Wq8HcicvN4J/fj1d+32EP9Ntnu0nqiB72WAyvkpF91yZ+ 759w== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=OXUXyK7F; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id xi14-20020a170906dace00b0093348ceb372si16169606ejb.742.2023.03.29.07.20.25; Wed, 29 Mar 2023 07:20:53 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=OXUXyK7F; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230386AbjC2OSZ (ORCPT + 99 others); Wed, 29 Mar 2023 10:18:25 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54738 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230509AbjC2OR3 (ORCPT ); Wed, 29 Mar 2023 10:17:29 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EE9C75588 for ; Wed, 29 Mar 2023 07:15:47 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1680099289; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=6eWwAqocQ917L7rRS0xkmog8pJjMU6goCC63rugY9Cw=; b=OXUXyK7F4gLE0Dia+upujRjQD9E6auNbchHKn4mJeM+XXEc+MAY8Jd7UjDcwD4E/vX7r2P OapnpBgsBmMY4aj1JQMepk0BUdh7kQLcKOp8ze4oOIo7vSvsDBmBBlS3uJoUIk7EV9bsxF rLm5ESmklyCSACh9q/+JHFKGPbTVzK4= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-516-0sNPtnvnMki-Ezl96bY6_g-1; Wed, 29 Mar 2023 10:14:44 -0400 X-MC-Unique: 0sNPtnvnMki-Ezl96bY6_g-1 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.rdu2.redhat.com [10.11.54.1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id C96611C0758A; Wed, 29 Mar 2023 14:14:43 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.18]) by smtp.corp.redhat.com (Postfix) with ESMTP id CD48A40B3EDA; Wed, 29 Mar 2023 14:14:41 +0000 (UTC) From: David Howells To: Matthew Wilcox , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni Cc: David Howells , Al Viro , Christoph Hellwig , Jens Axboe , Jeff Layton , Christian Brauner , Chuck Lever III , Linus Torvalds , netdev@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Willem de Bruijn Subject: [RFC PATCH v2 16/48] ip, udp: Support MSG_SPLICE_PAGES Date: Wed, 29 Mar 2023 15:13:22 +0100 Message-Id: <20230329141354.516864-17-dhowells@redhat.com> In-Reply-To: <20230329141354.516864-1-dhowells@redhat.com> References: <20230329141354.516864-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.1 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1761712174188248907?= X-GMAIL-MSGID: =?utf-8?q?1761712174188248907?= Make IP/UDP sendmsg() support MSG_SPLICE_PAGES. This causes pages to be spliced from the source iterator if possible (the iterator must be ITER_BVEC and the pages must be spliceable). This allows ->sendpage() to be replaced by something that can handle multiple multipage folios in a single transaction. Signed-off-by: David Howells cc: Willem de Bruijn cc: "David S. Miller" cc: Eric Dumazet cc: Jakub Kicinski cc: Paolo Abeni cc: Jens Axboe cc: Matthew Wilcox cc: netdev@vger.kernel.org --- net/ipv4/ip_output.c | 85 +++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 81 insertions(+), 4 deletions(-) diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index 4e4e308c3230..07736da70eab 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -973,11 +973,11 @@ static int __ip_append_data(struct sock *sk, int hh_len; int exthdrlen; int mtu; - int copy; + ssize_t copy; int err; int offset = 0; bool zc = false; - unsigned int maxfraglen, fragheaderlen, maxnonfragsize; + unsigned int maxfraglen, fragheaderlen, maxnonfragsize, xlength; int csummode = CHECKSUM_NONE; struct rtable *rt = (struct rtable *)cork->dst; unsigned int wmem_alloc_delta = 0; @@ -1017,6 +1017,7 @@ static int __ip_append_data(struct sock *sk, (!exthdrlen || (rt->dst.dev->features & NETIF_F_HW_ESP_TX_CSUM))) csummode = CHECKSUM_PARTIAL; + xlength = length; if ((flags & MSG_ZEROCOPY) && length) { struct msghdr *msg = from; @@ -1047,6 +1048,14 @@ static int __ip_append_data(struct sock *sk, skb_zcopy_set(skb, uarg, &extra_uref); } } + } else if ((flags & MSG_SPLICE_PAGES) && length) { + struct msghdr *msg = from; + + if (inet->hdrincl) + return -EPERM; + if (!(rt->dst.dev->features & NETIF_F_SG)) + return -EOPNOTSUPP; + xlength = transhdrlen; /* We need an empty buffer to attach stuff to */ } cork->length += length; @@ -1074,6 +1083,50 @@ static int __ip_append_data(struct sock *sk, unsigned int alloclen, alloc_extra; unsigned int pagedlen; struct sk_buff *skb_prev; + + if (unlikely(flags & MSG_SPLICE_PAGES)) { + skb_prev = skb; + fraggap = skb_prev->len - maxfraglen; + + alloclen = fragheaderlen + hh_len + fraggap + 15; + skb = sock_wmalloc(sk, alloclen, 1, sk->sk_allocation); + if (unlikely(!skb)) { + err = -ENOBUFS; + goto error; + } + + /* + * Fill in the control structures + */ + skb->ip_summed = CHECKSUM_NONE; + skb->csum = 0; + skb_reserve(skb, hh_len); + + /* + * Find where to start putting bytes. + */ + skb_put(skb, fragheaderlen + fraggap); + skb_reset_network_header(skb); + skb->transport_header = (skb->network_header + + fragheaderlen); + if (fraggap) { + skb->csum = skb_copy_and_csum_bits( + skb_prev, maxfraglen, + skb_transport_header(skb), + fraggap); + skb_prev->csum = csum_sub(skb_prev->csum, + skb->csum); + pskb_trim_unique(skb_prev, maxfraglen); + } + + /* + * Put the packet on the pending queue. + */ + __skb_queue_tail(&sk->sk_write_queue, skb); + continue; + } + xlength = length; + alloc_new_skb: skb_prev = skb; if (skb_prev) @@ -1085,7 +1138,7 @@ static int __ip_append_data(struct sock *sk, * If remaining data exceeds the mtu, * we know we need more fragment(s). */ - datalen = length + fraggap; + datalen = xlength + fraggap; if (datalen > mtu - fragheaderlen) datalen = maxfraglen - fragheaderlen; fraglen = datalen + fragheaderlen; @@ -1099,7 +1152,7 @@ static int __ip_append_data(struct sock *sk, * because we have no idea what fragment will be * the last. */ - if (datalen == length + fraggap) + if (datalen == xlength + fraggap) alloc_extra += rt->dst.trailer_len; if ((flags & MSG_MORE) && @@ -1206,6 +1259,30 @@ static int __ip_append_data(struct sock *sk, err = -EFAULT; goto error; } + } else if (flags & MSG_SPLICE_PAGES) { + struct msghdr *msg = from; + struct page *page = NULL, **pages = &page; + size_t off; + + copy = iov_iter_extract_pages(&msg->msg_iter, &pages, + copy, 1, 0, &off); + if (copy <= 0) { + err = copy ?: -EIO; + goto error; + } + + err = skb_append_pagefrags(skb, page, off, copy); + if (err < 0) + goto error; + + if (skb->ip_summed == CHECKSUM_NONE) { + __wsum csum; + csum = csum_page(page, off, copy); + skb->csum = csum_block_add(skb->csum, csum, skb->len); + } + + skb_len_add(skb, copy); + refcount_add(copy, &sk->sk_wmem_alloc); } else if (!zc) { int i = skb_shinfo(skb)->nr_frags; From patchwork Wed Mar 29 14:13:23 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 76635 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp463102vqo; Wed, 29 Mar 2023 07:39:50 -0700 (PDT) X-Google-Smtp-Source: AKy350a6bFDjBS9k+vgdINtgAmbREESKRArhGROfhBW735e4RmR8n2itCRpU6BrFaJ0m5qRNyfGb X-Received: by 2002:aa7:c65a:0:b0:501:c3de:dc5c with SMTP id z26-20020aa7c65a000000b00501c3dedc5cmr19267794edr.18.1680100790407; Wed, 29 Mar 2023 07:39:50 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680100790; cv=none; d=google.com; s=arc-20160816; b=HjDNMomWotAVRRCjlneAQOFvS0n1/o+mhelIfyIkn+WwNXEwMB9uq0S+dgY4gB+lFu JNrxoGYqrnzQmwzcCGQhZYV/oHNxVlWJi7QYDGtYQuVQ8ijGlC8ayLILo2KF6X3NUZOc 2m2VjHv0MBZJRDCCR8iJZLwYeUkU5A0dIcaGXCzy8mTjfpM38kdlANRJcWXx+R+g4jJf 4dffhHVl/UAeAtsEXrT5SDvIt3zfyUxtML7WegshPAbkrCJnrusZnK1JOQM0N2yKZcyN DEuTgQh2zvL28F4+Sxokk5JRgSWB/fTYgXsUFD78UMW7AYCApEYsYnUJhM5qfCyCcmT4 VqFQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=vWPvILWC7NBdevnbDDsys9XJnr3/cfzKJoNpaM2CnAc=; b=o0J1etDqst7nSbAk1U07M/kUZFV/wO7FPS8Mc9yU84vk+rdQAWNeua1rXASVS85VKy run56cWtumb8MupOq0IuiaMkTXE82XkYp29hJsDacCVdjcgd5y+/lph8tRjbxAjsUwmK eGuqJZD3YaRHojuKf+99cotF4e8IOAA9NV1j3W3sKyzfdYubTFMK5+1ZViT+djkBygoB 5yFuHB4EEajdv+09mI98CrGAuhcJNn8u2tceFO9eYj0kEpnNA4j3lRjYwFjNTPNMQwaO zJXWxuaVCnJxyuawR2WYRUJaz29UylfcOwgL8Z3YDsuP9+TCXZuRPcgdjO+7Ffn7NoAl QBVA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=QzWP+i8W; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id m3-20020a056402050300b004c461474e86si34612948edv.152.2023.03.29.07.39.26; Wed, 29 Mar 2023 07:39:50 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=QzWP+i8W; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231176AbjC2OSO (ORCPT + 99 others); Wed, 29 Mar 2023 10:18:14 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54640 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230489AbjC2ORS (ORCPT ); Wed, 29 Mar 2023 10:17:18 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2BBFE49EE for ; Wed, 29 Mar 2023 07:15:41 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1680099293; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=vWPvILWC7NBdevnbDDsys9XJnr3/cfzKJoNpaM2CnAc=; b=QzWP+i8WMu7/PiwpZpzrToQTuupNVsjVsV1PHlG57HghEdsV7yXFEWONQcWs1euEaLvc/E YF29B/XR56XOpGkeGaDlDX66Ux+ERnSUDy/PHfejBh/wHjGwD1LNckp8/9Aap314yAbNiC aW7G4/ElLQT/9UwRe0Vc4eHmsq9GUMA= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-612-2VMedOL2PkqYySq_Hlwhxg-1; Wed, 29 Mar 2023 10:14:47 -0400 X-MC-Unique: 2VMedOL2PkqYySq_Hlwhxg-1 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 76C28185A7A9; Wed, 29 Mar 2023 14:14:46 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.18]) by smtp.corp.redhat.com (Postfix) with ESMTP id 8174D2166B33; Wed, 29 Mar 2023 14:14:44 +0000 (UTC) From: David Howells To: Matthew Wilcox , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni Cc: David Howells , Al Viro , Christoph Hellwig , Jens Axboe , Jeff Layton , Christian Brauner , Chuck Lever III , Linus Torvalds , netdev@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Willem de Bruijn Subject: [RFC PATCH v2 17/48] ip, udp: Make sendmsg(MSG_SPLICE_PAGES) copy unspliceable data Date: Wed, 29 Mar 2023 15:13:23 +0100 Message-Id: <20230329141354.516864-18-dhowells@redhat.com> In-Reply-To: <20230329141354.516864-1-dhowells@redhat.com> References: <20230329141354.516864-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.6 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1761713366629216268?= X-GMAIL-MSGID: =?utf-8?q?1761713366629216268?= If sendmsg() with MSG_SPLICE_PAGES encounters a page that shouldn't be spliced - a slab page, for instance, or one with a zero count - make __ip_append_data() copy it. Signed-off-by: David Howells cc: Willem de Bruijn cc: "David S. Miller" cc: Eric Dumazet cc: Jakub Kicinski cc: Paolo Abeni cc: Jens Axboe cc: Matthew Wilcox cc: netdev@vger.kernel.org --- net/ipv4/ip_output.c | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index 07736da70eab..e4aeaab704c8 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -1263,6 +1263,7 @@ static int __ip_append_data(struct sock *sk, struct msghdr *msg = from; struct page *page = NULL, **pages = &page; size_t off; + bool put = false; copy = iov_iter_extract_pages(&msg->msg_iter, &pages, copy, 1, 0, &off); @@ -1271,7 +1272,25 @@ static int __ip_append_data(struct sock *sk, goto error; } + if (!sendpage_ok(page)) { + const void *p = kmap_local_page(page); + void *q; + + q = page_frag_memdup(NULL, p + off, copy, + sk->sk_allocation, ULONG_MAX); + kunmap_local(p); + if (!q) { + err = copy ?: -ENOMEM; + goto error; + } + page = virt_to_page(q); + off = offset_in_page(q); + put = true; + } + err = skb_append_pagefrags(skb, page, off, copy); + if (put) + put_page(page); if (err < 0) goto error; From patchwork Wed Mar 29 14:13:24 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 76620 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp457505vqo; Wed, 29 Mar 2023 07:31:29 -0700 (PDT) X-Google-Smtp-Source: AKy350ZEwmEHhCm5VbdbtvjolAmNRhdEH5cCSEsIJUHJkW4zsSdECL1s8W7cmE7XkDCHYdnATGBz X-Received: by 2002:aa7:cd12:0:b0:4fd:20ee:225d with SMTP id b18-20020aa7cd12000000b004fd20ee225dmr17914326edw.24.1680100288992; Wed, 29 Mar 2023 07:31:28 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680100288; cv=none; d=google.com; s=arc-20160816; b=B2hbtpbYTfYAFsZohdv9sMHWj94JHV00t9LSOXBbdFHGF8RFwa2AZGpYTdm3D4yKXr UIimqbxEWZfTYZMlNDcI5Zz/Np7rMsJgi8jXW0j4VQC+s0EyvJBI7iH+P/rJccQjwQ6s l43irL5LBMh9JSNQiubeYZPrY9YPM+7EPQUWR8R7qLam+hf5mUMzIjYAVif48uHMOfGT hQO8y1g5ZGACL9iLcE3C4gUA7FOnxp+quvnm9PsNaQCv0Xn2BRsCyjEeBMHjkqVD2LeD wqlak39b60OJ+mx0t+56Q1LaC/kVK/VVihppXIB+H+O5i0sMpGGnFvulCGKRxbDFUxWA 4zag== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=5bgiBObt8qBauZcUWYsV1T0v3n3OqeoexntQtIJIREA=; b=WojPB/RL9UiC38kB25dmoxAJ+8XCR1ly12eJ2K+QUweGVjItLqi4XdvGOm2LbZOv8V L1wS6MXO5RS4VmsdERfCA6MGJVwVi/AjBgUf5n69iQRHjK+yekV4wL8oJD1HoCrElX8e 3IjjuHOawlR2qTki1/eKCVS/2JIhlVxzfyUEi14GA3dWGJMmiWFd9SWrfalgWzVgdLRG 1MqFsZIYQ+ARyHWIPK2cCBvtCdc6O2Mih7pCcC39T2GmckvpHekf7pe7yHs6cdoEUXnC 8tRhWt2z0NhCVAkX4p85L2JzBvYqqu3FMuSBSAfYiLCO/QFuq2jpSQdbt3jMvI+h2A6V 6j9w== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=dOOaBqk1; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id k3-20020aa7d8c3000000b004bc501f34d4si30505766eds.250.2023.03.29.07.30.55; Wed, 29 Mar 2023 07:31:28 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=dOOaBqk1; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231168AbjC2OSJ (ORCPT + 99 others); Wed, 29 Mar 2023 10:18:09 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54602 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230294AbjC2ORR (ORCPT ); Wed, 29 Mar 2023 10:17:17 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6A8F7527C for ; Wed, 29 Mar 2023 07:15:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1680099296; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=5bgiBObt8qBauZcUWYsV1T0v3n3OqeoexntQtIJIREA=; b=dOOaBqk1dQQdMUuqFELqeUDWabWbmNQoo1byuNYRtNG6EQmApDEhcUNijMrK3tkBexj8VD RX5zOxFaeEVw1XHgFAaH5Qxt4e1YjOP2ahqEBQfctxYzF0/OMn8RdVb4AxQjXbKp/I0XWl m5Uic6+XsXRt1+dSMB3REgEgGAPknIY= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-619-sSXFfrUjOyG56ONSjKtQ0A-1; Wed, 29 Mar 2023 10:14:50 -0400 X-MC-Unique: sSXFfrUjOyG56ONSjKtQ0A-1 Received: from smtp.corp.redhat.com (int-mx10.intmail.prod.int.rdu2.redhat.com [10.11.54.10]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 0AB0B3C0ED6A; Wed, 29 Mar 2023 14:14:49 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.18]) by smtp.corp.redhat.com (Postfix) with ESMTP id 115CE492C3E; Wed, 29 Mar 2023 14:14:46 +0000 (UTC) From: David Howells To: Matthew Wilcox , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni Cc: David Howells , Al Viro , Christoph Hellwig , Jens Axboe , Jeff Layton , Christian Brauner , Chuck Lever III , Linus Torvalds , netdev@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Willem de Bruijn Subject: [RFC PATCH v2 18/48] udp: Convert udp_sendpage() to use MSG_SPLICE_PAGES Date: Wed, 29 Mar 2023 15:13:24 +0100 Message-Id: <20230329141354.516864-19-dhowells@redhat.com> In-Reply-To: <20230329141354.516864-1-dhowells@redhat.com> References: <20230329141354.516864-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.10 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1761712840540595594?= X-GMAIL-MSGID: =?utf-8?q?1761712840540595594?= Convert udp_sendpage() to use sendmsg() with MSG_SPLICE_PAGES rather than directly splicing in the pages itself. This allows ->sendpage() to be replaced by something that can handle multiple multipage folios in a single transaction. Signed-off-by: David Howells cc: Willem de Bruijn cc: "David S. Miller" cc: Eric Dumazet cc: Jakub Kicinski cc: Paolo Abeni cc: Jens Axboe cc: Matthew Wilcox cc: netdev@vger.kernel.org --- net/ipv4/udp.c | 50 +++++++++----------------------------------------- 1 file changed, 9 insertions(+), 41 deletions(-) diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index c605d171eb2d..097feb92e215 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -1332,52 +1332,20 @@ EXPORT_SYMBOL(udp_sendmsg); int udp_sendpage(struct sock *sk, struct page *page, int offset, size_t size, int flags) { - struct inet_sock *inet = inet_sk(sk); - struct udp_sock *up = udp_sk(sk); + struct bio_vec bvec; + struct msghdr msg = { + .msg_flags = flags | MSG_SPLICE_PAGES | MSG_MORE + }; int ret; - if (flags & MSG_SENDPAGE_NOTLAST) - flags |= MSG_MORE; + bvec_set_page(&bvec, page, size, offset); + iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size); - if (!up->pending) { - struct msghdr msg = { .msg_flags = flags|MSG_MORE }; - - /* Call udp_sendmsg to specify destination address which - * sendpage interface can't pass. - * This will succeed only when the socket is connected. - */ - ret = udp_sendmsg(sk, &msg, 0); - if (ret < 0) - return ret; - } + if (flags & MSG_SENDPAGE_NOTLAST) + msg.msg_flags |= MSG_MORE; lock_sock(sk); - - if (unlikely(!up->pending)) { - release_sock(sk); - - net_dbg_ratelimited("cork failed\n"); - return -EINVAL; - } - - ret = ip_append_page(sk, &inet->cork.fl.u.ip4, - page, offset, size, flags); - if (ret == -EOPNOTSUPP) { - release_sock(sk); - return sock_no_sendpage(sk->sk_socket, page, offset, - size, flags); - } - if (ret < 0) { - udp_flush_pending_frames(sk); - goto out; - } - - up->len += size; - if (!(READ_ONCE(up->corkflag) || (flags&MSG_MORE))) - ret = udp_push_pending_frames(sk); - if (!ret) - ret = size; -out: + ret = udp_sendmsg(sk, &msg, size); release_sock(sk); return ret; } From patchwork Wed Mar 29 14:13:25 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 76608 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp450087vqo; Wed, 29 Mar 2023 07:20:24 -0700 (PDT) X-Google-Smtp-Source: AKy350asT+rK+6f2ULNeCtNgnc+Ghu9aLgzumhoAfi7/qbchBcJG7+jK9KjtH8knyiW/BrVxhTCv X-Received: by 2002:a17:907:72c3:b0:93b:2be7:3ce4 with SMTP id du3-20020a17090772c300b0093b2be73ce4mr3169143ejc.1.1680099624438; Wed, 29 Mar 2023 07:20:24 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680099624; cv=none; d=google.com; s=arc-20160816; b=EBJo2Q4tPJMxJdSaSEDcZhoKsmWHEvaSjWwDWogZV4Ym5ZKXot6rbrSyluYqK6WnMA 8diT1a73Txg8DgWwydNL42X5impS/j/9u/3ImRgWOs3/X7yQ/G0oXh3s4frjneFdtNCY uZ8pXjEPb9cEikmcIbujWFU0ngUWJ2ZEeBTWUhk2z/mH2xxGRhUEZIvRhqPv25CK9IJy OVAIqhMrd0hOGgc0au62VXTNf5AczBYO0i4pSu2ETO5yol0/T5CGkvXjTS7Kvt02ydnt PqOp/HP4LY87PL/UwZ0/OAiG2bikVReN6TIaZxkQxI5hfj8nDbrh6vMKMWP4JKGW2LDa yG7w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=NG18J76VRrzhfHMvUgJjpISKtrp6GX9XUZPxH3Nxq0I=; b=EuSV4ptF0ZkJct+/8IdMJwrImFZjRfA1aEmoL/++2G4Z8F7amJr6mCpVFY+9vzYzMq 2mHLN5p8xje1p0IZtIbzkg0Ed4/MegnOlgTk+zV3GQ0dpCGUalh3GGgY6w13WdLF5s30 uVyeKObFX7W0jmQmGCK1vyw0UkQWxWuFmopNlghJlvG7cSG6/7hUnv4IY2COrUb35omR IJzZufKsgErWKkVKppAKWcsVXUeF8nN59rRGhEPA5Z6K8UGkCHu6vai4n7IP6XNhNW2k NF4jic1h+GtmCTwiH3d9W4Vdhs2FKXPPMKX+rRuor2Iv8InUmOa9QToRGdD0LOvRKUQ8 zMjA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=YIZ7FzDz; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id j15-20020a170906830f00b008bc21b16e13si33565747ejx.871.2023.03.29.07.19.54; Wed, 29 Mar 2023 07:20:24 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=YIZ7FzDz; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231161AbjC2OSF (ORCPT + 99 others); Wed, 29 Mar 2023 10:18:05 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54780 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229666AbjC2ORN (ORCPT ); Wed, 29 Mar 2023 10:17:13 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D0C51558B for ; Wed, 29 Mar 2023 07:15:46 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1680099296; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=NG18J76VRrzhfHMvUgJjpISKtrp6GX9XUZPxH3Nxq0I=; b=YIZ7FzDz88YXGKH8ouR2Hc/ZYDNlBj8XQJCHzUy+bTAhpxz4JCwz3BeU72+qIUJ1HhtRN5 YIO9x2Uy57aEIDfCtS9uXs5dwUUpAinSECht+r3/4tU46Yr9TETDCFW/dQ1mQfhb8ZW36z Zkrq4Nh60aeCByzpDeRj2xIFapSdAkU= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-310-xuaB0LLNMnyvhGqA--dfEQ-1; Wed, 29 Mar 2023 10:14:52 -0400 X-MC-Unique: xuaB0LLNMnyvhGqA--dfEQ-1 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 9B50B1C0758A; Wed, 29 Mar 2023 14:14:51 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.18]) by smtp.corp.redhat.com (Postfix) with ESMTP id BA9AC2166B33; Wed, 29 Mar 2023 14:14:49 +0000 (UTC) From: David Howells To: Matthew Wilcox , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni Cc: David Howells , Al Viro , Christoph Hellwig , Jens Axboe , Jeff Layton , Christian Brauner , Chuck Lever III , Linus Torvalds , netdev@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [RFC PATCH v2 19/48] af_unix: Support MSG_SPLICE_PAGES Date: Wed, 29 Mar 2023 15:13:25 +0100 Message-Id: <20230329141354.516864-20-dhowells@redhat.com> In-Reply-To: <20230329141354.516864-1-dhowells@redhat.com> References: <20230329141354.516864-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.6 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1761712143329066071?= X-GMAIL-MSGID: =?utf-8?q?1761712143329066071?= Make AF_UNIX sendmsg() support MSG_SPLICE_PAGES, splicing in pages from the source iterator if given and if ITER_BVEC and copying the data in otherwise. This allows ->sendpage() to be replaced by something that can handle multiple multipage folios in a single transaction. Signed-off-by: David Howells cc: "David S. Miller" cc: Eric Dumazet cc: Jakub Kicinski cc: Paolo Abeni cc: Jens Axboe cc: Matthew Wilcox cc: netdev@vger.kernel.org --- net/unix/af_unix.c | 93 ++++++++++++++++++++++++++++++++++++++-------- 1 file changed, 77 insertions(+), 16 deletions(-) diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 347122c3575e..84a0d97f1aa4 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -2151,6 +2151,53 @@ static int queue_oob(struct socket *sock, struct msghdr *msg, struct sock *other } #endif +/* + * Extract pages from an iterator and add them to the socket buffer. + */ +static ssize_t unix_extract_bvec_to_skb(struct sk_buff *skb, + struct iov_iter *iter, ssize_t maxsize) +{ + struct page *pages[8], **ppages = pages; + unsigned int i, nr; + ssize_t ret = 0; + + while (iter->count > 0) { + size_t off, len; + + nr = min(MAX_SKB_FRAGS - skb_shinfo(skb)->nr_frags, + ARRAY_SIZE(pages)); + if (nr == 0) + break; + + len = iov_iter_extract_pages(iter, &ppages, maxsize, nr, 0, &off); + if (len <= 0) { + if (!ret) + ret = len ?: -EIO; + break; + } + + i = 0; + do { + size_t part = min(PAGE_SIZE - off, len); + + if (skb_append_pagefrags(skb, pages[i++], off, part) < 0) { + if (!ret) + ret = -EMSGSIZE; + goto out; + } + off = 0; + ret += part; + maxsize -= part; + len -= part; + } while (len > 0); + if (maxsize <= 0) + break; + } + +out: + return ret; +} + static int unix_stream_sendmsg(struct socket *sock, struct msghdr *msg, size_t len) { @@ -2194,19 +2241,25 @@ static int unix_stream_sendmsg(struct socket *sock, struct msghdr *msg, while (sent < len) { size = len - sent; - /* Keep two messages in the pipe so it schedules better */ - size = min_t(int, size, (sk->sk_sndbuf >> 1) - 64); + if (unlikely(msg->msg_flags & MSG_SPLICE_PAGES)) { + skb = sock_alloc_send_pskb(sk, 0, 0, + msg->msg_flags & MSG_DONTWAIT, + &err, 0); + } else { + /* Keep two messages in the pipe so it schedules better */ + size = min_t(int, size, (sk->sk_sndbuf >> 1) - 64); - /* allow fallback to order-0 allocations */ - size = min_t(int, size, SKB_MAX_HEAD(0) + UNIX_SKB_FRAGS_SZ); + /* allow fallback to order-0 allocations */ + size = min_t(int, size, SKB_MAX_HEAD(0) + UNIX_SKB_FRAGS_SZ); - data_len = max_t(int, 0, size - SKB_MAX_HEAD(0)); + data_len = max_t(int, 0, size - SKB_MAX_HEAD(0)); - data_len = min_t(size_t, size, PAGE_ALIGN(data_len)); + data_len = min_t(size_t, size, PAGE_ALIGN(data_len)); - skb = sock_alloc_send_pskb(sk, size - data_len, data_len, - msg->msg_flags & MSG_DONTWAIT, &err, - get_order(UNIX_SKB_FRAGS_SZ)); + skb = sock_alloc_send_pskb(sk, size - data_len, data_len, + msg->msg_flags & MSG_DONTWAIT, &err, + get_order(UNIX_SKB_FRAGS_SZ)); + } if (!skb) goto out_err; @@ -2218,13 +2271,21 @@ static int unix_stream_sendmsg(struct socket *sock, struct msghdr *msg, } fds_sent = true; - skb_put(skb, size - data_len); - skb->data_len = data_len; - skb->len = size; - err = skb_copy_datagram_from_iter(skb, 0, &msg->msg_iter, size); - if (err) { - kfree_skb(skb); - goto out_err; + if (unlikely(msg->msg_flags & MSG_SPLICE_PAGES)) { + size = unix_extract_bvec_to_skb(skb, &msg->msg_iter, size); + skb->data_len += size; + skb->len += size; + skb->truesize += size; + refcount_add(size, &sk->sk_wmem_alloc); + } else { + skb_put(skb, size - data_len); + skb->data_len = data_len; + skb->len = size; + err = skb_copy_datagram_from_iter(skb, 0, &msg->msg_iter, size); + if (err) { + kfree_skb(skb); + goto out_err; + } } unix_state_lock(other); From patchwork Wed Mar 29 14:13:26 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 76645 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp464176vqo; Wed, 29 Mar 2023 07:41:33 -0700 (PDT) X-Google-Smtp-Source: AKy350bWy/0wnRFm1+6/HFyomVdLbbhnqfvXkUB/aRl4dV/HxngZsNlTSdZRR6U3TeKvY0LBdzEx X-Received: by 2002:a62:6458:0:b0:62d:b26e:fc63 with SMTP id y85-20020a626458000000b0062db26efc63mr1970028pfb.32.1680100893345; Wed, 29 Mar 2023 07:41:33 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680100893; cv=none; d=google.com; s=arc-20160816; b=kPXr9W4B2+LDq356PBXXvrr6DXG4vBG7d3tW2Cg3ecC1JttQm3VD1P6hgksjxl0hM1 pmu3W3k02l20kQv7C5e1UYcpY2PsyDoibcXKYnrcaRuclRxN4Zokh6anvo8WxpGwUO49 k7UP06f0aDTUmdzg86uaw3lNDG/f6fts4/XhplAtLddSecgfjq7eh+BVt7HT35EQJLMt nkkqNDbbJjaGFwjDRRg6lEPKt6p4Jp+1BMk8grv15hFL3/HGlePD1nISd0TulYqEMAIZ Eu9AYNfxRNejA95TpJsoYz0ssFVCC0lEFZf9Kq/FNfZ7kG5kFovkDwqUCUYwQ8Y3f2Jm 4t0Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=DuSpRIzeQZ01nwpBuy8LkLTaj8h3sZ+W9+uqSPxJMKE=; b=WmPvNrYL8t6dfPJwmRBy3vkXLgcWQp2UNNSkrfv+sRTHVTEeZWr1ZhQ1OXbQkYBnUE 3swwP+srZ868/7VRhGDfRuf1P9fL7hoaFCJZcb0uV7Lgc7i8ORxRzItzJz7HoY+ulfvY wbHid+xO2I5LlEPR28ruestJyMnh8a4J+6KPfsGzY8fcKvUiOUQ3VkUmRbTZ+41RE8rl jHaIvQM73b8kCTzK37tPhx5tTZwVk5H/GtC5jyzDaCqUqJwyNnBNhTl25j5Dn429CF7A Y7ln/ZfqcTVDei+XGNlWgawEK65Q5Ru3xu9gBZH/kqPxqWcOLYgtSis/60hHShcrg2rS Z6dA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=Ikqa8NT+; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id p37-20020a635b25000000b0050c04831b60si236476pgb.682.2023.03.29.07.41.20; Wed, 29 Mar 2023 07:41:33 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=Ikqa8NT+; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231183AbjC2OSR (ORCPT + 99 others); Wed, 29 Mar 2023 10:18:17 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54660 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230491AbjC2ORS (ORCPT ); Wed, 29 Mar 2023 10:17:18 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1C81D4C3F for ; Wed, 29 Mar 2023 07:15:42 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1680099300; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=DuSpRIzeQZ01nwpBuy8LkLTaj8h3sZ+W9+uqSPxJMKE=; b=Ikqa8NT+S3ldFr78WGnoKdiOQKW7zATeeRMFipjl8QjhCht1bqpTsyNlBFLiWK+6HvNOCt L4d2XaSZVPRIVEdixQB3IjvgViZ1906LAhAzIQYiMCZ9MIm6HtKxZpFe+Q1jg8TEGemang taDF6yWCz2rQaRPev8at8UfadMIw8ms= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-306-jnNXOk_ROhmWbX02B6xDmA-1; Wed, 29 Mar 2023 10:14:55 -0400 X-MC-Unique: jnNXOk_ROhmWbX02B6xDmA-1 Received: from smtp.corp.redhat.com (int-mx10.intmail.prod.int.rdu2.redhat.com [10.11.54.10]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 170BE85A5B1; Wed, 29 Mar 2023 14:14:54 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.18]) by smtp.corp.redhat.com (Postfix) with ESMTP id 35746492C3E; Wed, 29 Mar 2023 14:14:52 +0000 (UTC) From: David Howells To: Matthew Wilcox , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni Cc: David Howells , Al Viro , Christoph Hellwig , Jens Axboe , Jeff Layton , Christian Brauner , Chuck Lever III , Linus Torvalds , netdev@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [RFC PATCH v2 20/48] af_unix: Make sendmsg(MSG_SPLICE_PAGES) copy unspliceable data Date: Wed, 29 Mar 2023 15:13:26 +0100 Message-Id: <20230329141354.516864-21-dhowells@redhat.com> In-Reply-To: <20230329141354.516864-1-dhowells@redhat.com> References: <20230329141354.516864-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.10 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1761713474379397333?= X-GMAIL-MSGID: =?utf-8?q?1761713474379397333?= If sendmsg() with MSG_SPLICE_PAGES encounters a page that shouldn't be spliced - a slab page, for instance, or one with a zero count - make unix_extract_bvec_to_skb() copy it. Signed-off-by: David Howells cc: "David S. Miller" cc: Eric Dumazet cc: Jakub Kicinski cc: Paolo Abeni cc: Jens Axboe cc: Matthew Wilcox cc: netdev@vger.kernel.org --- net/unix/af_unix.c | 44 +++++++++++++++++++++++++++++++++----------- 1 file changed, 33 insertions(+), 11 deletions(-) diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 84a0d97f1aa4..b4b27a652ef0 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -2154,12 +2154,12 @@ static int queue_oob(struct socket *sock, struct msghdr *msg, struct sock *other /* * Extract pages from an iterator and add them to the socket buffer. */ -static ssize_t unix_extract_bvec_to_skb(struct sk_buff *skb, - struct iov_iter *iter, ssize_t maxsize) +static ssize_t unix_extract_bvec_to_skb(struct sk_buff *skb, struct iov_iter *iter, + ssize_t maxsize, gfp_t gfp) { struct page *pages[8], **ppages = pages; unsigned int i, nr; - ssize_t ret = 0; + ssize_t spliced = 0, ret = 0; while (iter->count > 0) { size_t off, len; @@ -2171,31 +2171,52 @@ static ssize_t unix_extract_bvec_to_skb(struct sk_buff *skb, len = iov_iter_extract_pages(iter, &ppages, maxsize, nr, 0, &off); if (len <= 0) { - if (!ret) - ret = len ?: -EIO; + ret = len ?: -EIO; break; } i = 0; do { + struct page *page = pages[i++]; size_t part = min(PAGE_SIZE - off, len); + bool put = false; + + if (!sendpage_ok(page)) { + const void *p = kmap_local_page(page); + void *q; + + q = page_frag_memdup(NULL, p + off, part, gfp, + ULONG_MAX); + kunmap_local(p); + if (!q) { + iov_iter_revert(iter, len); + ret = -ENOMEM; + goto out; + } + page = virt_to_page(q); + off = offset_in_page(q); + put = true; + } - if (skb_append_pagefrags(skb, pages[i++], off, part) < 0) { - if (!ret) - ret = -EMSGSIZE; + ret = skb_append_pagefrags(skb, page, off, part); + if (put) + put_page(page); + if (ret < 0) { + iov_iter_revert(iter, len); goto out; } off = 0; - ret += part; + spliced += part; maxsize -= part; len -= part; } while (len > 0); + if (maxsize <= 0) break; } out: - return ret; + return spliced ?: ret; } static int unix_stream_sendmsg(struct socket *sock, struct msghdr *msg, @@ -2272,7 +2293,8 @@ static int unix_stream_sendmsg(struct socket *sock, struct msghdr *msg, fds_sent = true; if (unlikely(msg->msg_flags & MSG_SPLICE_PAGES)) { - size = unix_extract_bvec_to_skb(skb, &msg->msg_iter, size); + size = unix_extract_bvec_to_skb(skb, &msg->msg_iter, size, + sk->sk_allocation); skb->data_len += size; skb->len += size; skb->truesize += size; From patchwork Wed Mar 29 14:13:27 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 76613 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp450752vqo; Wed, 29 Mar 2023 07:21:18 -0700 (PDT) X-Google-Smtp-Source: AKy350be7XWFQTmKDjF+06SvvBN5pkbgUbmJdWB+3Iwi9o9+WsUIDpN8UM3c8IUo67N5i+pul8cO X-Received: by 2002:a17:907:d68f:b0:8c3:3439:24d9 with SMTP id wf15-20020a170907d68f00b008c3343924d9mr21063132ejc.24.1680099677907; Wed, 29 Mar 2023 07:21:17 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680099677; cv=none; d=google.com; s=arc-20160816; b=k7YiAbXnTq2GX1CCPYQwBtKHV6MgyRFYpmClnOnQwqvZ221vdVz8Yty6FVZx25igCU /Zf+sLA8W5JWTwFeekPEE4n6AtJXAaC9uM4KvZf+yR2jSoPOb1thI8Aj8khs1amSLawC smy75Dc53XZ+t2JB9/Jhq/ThHjMRQ7qdZtu+9nVDixFbr67x/XJ7XS6IjUdF4aFfJ7XO jgDOsM7nTdq0Jp3DtdJAB9lx8Ebajk5tZGaI6s99sqwaIyLqPDKOnpbOspf5b/qG7xVE EwpoxlUzmbAuRZvao2/1pBPBwJFpcnC7QCpAAvyxO+WYbJhf0HfhfOmj1jyq6k58SyCn ZMHA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=yzXpQhTxTxrOyN4QfDXBN8fe/pZE0zPIgWzTS/9cMoE=; b=MYHpBpiCBSlnmuiIc1/aUL5UbyfNp8c1CoZ7tv+tDZjPgmE7dz8Lmiomx+WjYn13NF ZXLDD0Ups2zrvhqNQF+Fjdywt2QH3wEIkrb5CaNcPQulyaIOhx8xhQboj4Hlg2XjpiEi BtTvXARQba6ugzZhGAHJBprl+er+iPyjms7/Z33CvqtL8Dk+pgDRxdqt+v0uJqAEtc16 PCgmxLX0bTSaPyMwtjvp+2OO60zYFqQGk8ZadyN2EkpgQH7nOaa3ldt/EwIlk0PP6w74 nyoInpbeMbCGjx1L5bPWxu+d632jpz2ACk8jqZpHsL8+ucEsHfMlKZuV09Y4knnb1zZb CjFQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=WdNEreZQ; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id mm10-20020a170906cc4a00b00930de1d9553si10864114ejb.16.2023.03.29.07.20.54; Wed, 29 Mar 2023 07:21:17 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=WdNEreZQ; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230322AbjC2OTJ (ORCPT + 99 others); Wed, 29 Mar 2023 10:19:09 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55542 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230075AbjC2ORh (ORCPT ); Wed, 29 Mar 2023 10:17:37 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 043D24229 for ; Wed, 29 Mar 2023 07:15:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1680099303; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=yzXpQhTxTxrOyN4QfDXBN8fe/pZE0zPIgWzTS/9cMoE=; b=WdNEreZQMa+BsuCt6smoaHQGo6LRD39OGGAAtpdMZXpX3F00pWImJJ7jWOO1x68hTgglOn pqjafGs8hiEpry67sE7sdldKvK8+bMj7oqy6qV6lIckFZuM/L1FJQ6t5nGKqrJ1arYC7hX pDi4k4X/Smfr3n2NHui4BVal0wDRh+A= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-445-ZVu3CVMuPAy5XfxUPzoW4A-1; Wed, 29 Mar 2023 10:14:58 -0400 X-MC-Unique: ZVu3CVMuPAy5XfxUPzoW4A-1 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.rdu2.redhat.com [10.11.54.3]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id B9DA11C07588; Wed, 29 Mar 2023 14:14:56 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.18]) by smtp.corp.redhat.com (Postfix) with ESMTP id A725B1121331; Wed, 29 Mar 2023 14:14:54 +0000 (UTC) From: David Howells To: Matthew Wilcox , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni Cc: David Howells , Al Viro , Christoph Hellwig , Jens Axboe , Jeff Layton , Christian Brauner , Chuck Lever III , Linus Torvalds , netdev@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Herbert Xu , linux-crypto@vger.kernel.org Subject: [RFC PATCH v2 21/48] crypto: af_alg: Pin pages rather than ref'ing if appropriate Date: Wed, 29 Mar 2023 15:13:27 +0100 Message-Id: <20230329141354.516864-22-dhowells@redhat.com> In-Reply-To: <20230329141354.516864-1-dhowells@redhat.com> References: <20230329141354.516864-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.3 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1761712199683838731?= X-GMAIL-MSGID: =?utf-8?q?1761712199683838731?= Convert AF_ALG to use iov_iter_extract_pages() instead of iov_iter_get_pages(). This will pin pages or leave them unaltered rather than getting a ref on them as appropriate to the iterator. The pages need to be pinned for DIO-read rather than having refs taken on them to prevent VM copy-on-write from malfunctioning during a concurrent fork() (the result of the I/O would otherwise end up only visible to the child process and not the parent). Signed-off-by: David Howells cc: Herbert Xu cc: linux-crypto@vger.kernel.org --- crypto/af_alg.c | 10 +++++++--- include/crypto/if_alg.h | 1 + 2 files changed, 8 insertions(+), 3 deletions(-) diff --git a/crypto/af_alg.c b/crypto/af_alg.c index 5f7252a5b7b4..7caff10df643 100644 --- a/crypto/af_alg.c +++ b/crypto/af_alg.c @@ -533,14 +533,17 @@ static const struct net_proto_family alg_family = { int af_alg_make_sg(struct af_alg_sgl *sgl, struct iov_iter *iter, int len) { + struct page **pages = sgl->pages; size_t off; ssize_t n; int npages, i; - n = iov_iter_get_pages2(iter, sgl->pages, len, ALG_MAX_PAGES, &off); + n = iov_iter_extract_pages(iter, &pages, len, ALG_MAX_PAGES, 0, &off); if (n < 0) return n; + sgl->need_unpin = iov_iter_extract_will_pin(iter); + npages = DIV_ROUND_UP(off + n, PAGE_SIZE); if (WARN_ON(npages == 0)) return -EINVAL; @@ -573,8 +576,9 @@ void af_alg_free_sg(struct af_alg_sgl *sgl) { int i; - for (i = 0; i < sgl->npages; i++) - put_page(sgl->pages[i]); + if (sgl->need_unpin) + for (i = 0; i < sgl->npages; i++) + unpin_user_page(sgl->pages[i]); } EXPORT_SYMBOL_GPL(af_alg_free_sg); diff --git a/include/crypto/if_alg.h b/include/crypto/if_alg.h index 7e76623f9ec3..46494b33f5bc 100644 --- a/include/crypto/if_alg.h +++ b/include/crypto/if_alg.h @@ -59,6 +59,7 @@ struct af_alg_sgl { struct scatterlist sg[ALG_MAX_PAGES + 1]; struct page *pages[ALG_MAX_PAGES]; unsigned int npages; + bool need_unpin; }; /* TX SGL entry */ From patchwork Wed Mar 29 14:13:28 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 76632 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp462695vqo; Wed, 29 Mar 2023 07:39:11 -0700 (PDT) X-Google-Smtp-Source: AKy350YuD1BqGmyowIHDxJulkSTmrFUFwf1luqgAHZ9prLOZvFHSgtfKcVPIiZpV31hQx+Z9cMhX X-Received: by 2002:a17:906:3616:b0:931:19f8:d89c with SMTP id q22-20020a170906361600b0093119f8d89cmr20266194ejb.73.1680100750972; Wed, 29 Mar 2023 07:39:10 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680100750; cv=none; d=google.com; s=arc-20160816; b=RuVA2mpYsp5TBsk6KG7T+D6u8VUbbCoF78XfwgGRd4xkfLqOf50GUnwO5jCfAdsfkj ctxdvTozn0oAXmgPE3XSkCq+YsRgBrSypLpLM3O7jRJnMgd0TMk+wN+9094Dv2pnp8Us 4+p1aCn9x8C90Z3rEoz7Q6l7aVqwz/W/yY6tWB2L9f/Gq3spoDLrnD8TIQvNrdIDINDf oeNzCHroJIwZYxOD2CQnMeuVYc13QCjQareF7Z2SU7vbe2sv9P9nZj4N8b4OgYy8BtxE MHl7pT/RhST+lXAPmUeFvk7cSZ9Q2nCfadwdQNX1FZQ/A9VJoW2++5gEYjFsyocKHFa9 BEvw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=j9S4Bu113H4v2EoDiT0rFcd+jOn0KUYsdPxs77CpfgI=; b=jBM5XSf8UNsaVRMLEqPGkkqjpPrKuH7vhpmPF9m52nP3aH/mJ81h6pu2M1WJLQ2pLM 3AqezWspbTugPNnLX6ZE1q0Cw9jna7Q8/HBNkYsRHz24AETHdKeF7zaau8ozM80sUaXR GXy5+IUveqwv7h+ZcEHSssTsNRaCD3lHqscY+eC/FpNW7TCs5HO5nOJKyyYnKmwcZ1t/ lEOKsUV8EjfXTFr08mZ7fON8+td04nvgE4Mu+i6wP5ahLAUJsOSet3NoALi1x1CCapFC SNbbEdsoXIscyL1AF+mT9JxxnN+nQujehkBjGrJ/T8vssA8wyIrN6Z8Ww8OMqV7SEAtE pFQw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=H9mRLbvF; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id f18-20020aa7d852000000b004fefeece418si34691023eds.284.2023.03.29.07.38.46; Wed, 29 Mar 2023 07:39:10 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=H9mRLbvF; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229828AbjC2OTC (ORCPT + 99 others); Wed, 29 Mar 2023 10:19:02 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55218 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229781AbjC2ORb (ORCPT ); Wed, 29 Mar 2023 10:17:31 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 16A3B59E0 for ; Wed, 29 Mar 2023 07:15:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1680099306; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=j9S4Bu113H4v2EoDiT0rFcd+jOn0KUYsdPxs77CpfgI=; b=H9mRLbvFTPkYx4zeDpwNaB2z1RXedq1dx6Jtl475v75GgmjGS7gfxoBW7CB6sBwLuzk0yT gBygYjPQ89bUjxA7JqiYgOdjkc+VXqSo0J68gbUNW1qKDxgWpw4GH2/O9ybU1N0JzKmV/i CWtakVY2Fj02oOMCs1Jby3aDoKiMIAY= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-3-Iw9_nRRqMKG9XRVw14kxqw-1; Wed, 29 Mar 2023 10:15:02 -0400 X-MC-Unique: Iw9_nRRqMKG9XRVw14kxqw-1 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 6BDD18028B2; Wed, 29 Mar 2023 14:14:59 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.18]) by smtp.corp.redhat.com (Postfix) with ESMTP id 5B9152166B34; Wed, 29 Mar 2023 14:14:57 +0000 (UTC) From: David Howells To: Matthew Wilcox , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni Cc: David Howells , Al Viro , Christoph Hellwig , Jens Axboe , Jeff Layton , Christian Brauner , Chuck Lever III , Linus Torvalds , netdev@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Herbert Xu , linux-crypto@vger.kernel.org Subject: [RFC PATCH v2 22/48] crypto: af_alg: Use netfs_extract_iter_to_sg() to create scatterlists Date: Wed, 29 Mar 2023 15:13:28 +0100 Message-Id: <20230329141354.516864-23-dhowells@redhat.com> In-Reply-To: <20230329141354.516864-1-dhowells@redhat.com> References: <20230329141354.516864-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.6 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1761713325136965818?= X-GMAIL-MSGID: =?utf-8?q?1761713325136965818?= Use netfs_extract_iter_to_sg() to decant the destination iterator into a scatterlist in af_alg_get_rsgl(). af_alg_make_sg() can then be removed. [!] Note that if this fits, netfs_extract_iter_to_sg() should move to core code. Signed-off-by: David Howells cc: Herbert Xu cc: linux-crypto@vger.kernel.org --- crypto/af_alg.c | 55 ++++++++++------------------------------- crypto/algif_aead.c | 12 ++++----- crypto/algif_hash.c | 18 ++++++++++---- crypto/algif_skcipher.c | 2 +- include/crypto/if_alg.h | 6 ++--- 5 files changed, 35 insertions(+), 58 deletions(-) diff --git a/crypto/af_alg.c b/crypto/af_alg.c index 7caff10df643..1dafd088ad45 100644 --- a/crypto/af_alg.c +++ b/crypto/af_alg.c @@ -22,6 +22,7 @@ #include #include #include +#include #include #include #include @@ -531,45 +532,11 @@ static const struct net_proto_family alg_family = { .owner = THIS_MODULE, }; -int af_alg_make_sg(struct af_alg_sgl *sgl, struct iov_iter *iter, int len) -{ - struct page **pages = sgl->pages; - size_t off; - ssize_t n; - int npages, i; - - n = iov_iter_extract_pages(iter, &pages, len, ALG_MAX_PAGES, 0, &off); - if (n < 0) - return n; - - sgl->need_unpin = iov_iter_extract_will_pin(iter); - - npages = DIV_ROUND_UP(off + n, PAGE_SIZE); - if (WARN_ON(npages == 0)) - return -EINVAL; - /* Add one extra for linking */ - sg_init_table(sgl->sg, npages + 1); - - for (i = 0, len = n; i < npages; i++) { - int plen = min_t(int, len, PAGE_SIZE - off); - - sg_set_page(sgl->sg + i, sgl->pages[i], plen, off); - - off = 0; - len -= plen; - } - sg_mark_end(sgl->sg + npages - 1); - sgl->npages = npages; - - return n; -} -EXPORT_SYMBOL_GPL(af_alg_make_sg); - static void af_alg_link_sg(struct af_alg_sgl *sgl_prev, struct af_alg_sgl *sgl_new) { - sg_unmark_end(sgl_prev->sg + sgl_prev->npages - 1); - sg_chain(sgl_prev->sg, sgl_prev->npages + 1, sgl_new->sg); + sg_unmark_end(sgl_prev->sgt.sgl + sgl_prev->sgt.nents - 1); + sg_chain(sgl_prev->sgt.sgl, sgl_prev->sgt.nents + 1, sgl_new->sgt.sgl); } void af_alg_free_sg(struct af_alg_sgl *sgl) @@ -577,8 +544,8 @@ void af_alg_free_sg(struct af_alg_sgl *sgl) int i; if (sgl->need_unpin) - for (i = 0; i < sgl->npages; i++) - unpin_user_page(sgl->pages[i]); + for (i = 0; i < sgl->sgt.nents; i++) + unpin_user_page(sg_page(&sgl->sgt.sgl[i])); } EXPORT_SYMBOL_GPL(af_alg_free_sg); @@ -1292,8 +1259,8 @@ int af_alg_get_rsgl(struct sock *sk, struct msghdr *msg, int flags, while (maxsize > len && msg_data_left(msg)) { struct af_alg_rsgl *rsgl; + ssize_t err; size_t seglen; - int err; /* limit the amount of readable buffers */ if (!af_alg_readable(sk)) @@ -1310,16 +1277,20 @@ int af_alg_get_rsgl(struct sock *sk, struct msghdr *msg, int flags, return -ENOMEM; } - rsgl->sgl.npages = 0; + rsgl->sgl.sgt.sgl = rsgl->sgl.sgl; + rsgl->sgl.sgt.nents = 0; + rsgl->sgl.sgt.orig_nents = 0; list_add_tail(&rsgl->list, &areq->rsgl_list); - /* make one iovec available as scatterlist */ - err = af_alg_make_sg(&rsgl->sgl, &msg->msg_iter, seglen); + err = netfs_extract_iter_to_sg(&msg->msg_iter, seglen, + &rsgl->sgl.sgt, ALG_MAX_PAGES, 0); if (err < 0) { rsgl->sg_num_bytes = 0; return err; } + rsgl->sgl.need_unpin = iov_iter_extract_will_pin(&msg->msg_iter); + /* chain the new scatterlist with previous one */ if (areq->last_rsgl) af_alg_link_sg(&areq->last_rsgl->sgl, &rsgl->sgl); diff --git a/crypto/algif_aead.c b/crypto/algif_aead.c index 42493b4d8ce4..f6aa3856d8d5 100644 --- a/crypto/algif_aead.c +++ b/crypto/algif_aead.c @@ -210,7 +210,7 @@ static int _aead_recvmsg(struct socket *sock, struct msghdr *msg, */ /* Use the RX SGL as source (and destination) for crypto op. */ - rsgl_src = areq->first_rsgl.sgl.sg; + rsgl_src = areq->first_rsgl.sgl.sgt.sgl; if (ctx->enc) { /* @@ -224,7 +224,7 @@ static int _aead_recvmsg(struct socket *sock, struct msghdr *msg, * RX SGL: AAD || PT || Tag */ err = crypto_aead_copy_sgl(null_tfm, tsgl_src, - areq->first_rsgl.sgl.sg, processed); + areq->first_rsgl.sgl.sgt.sgl, processed); if (err) goto free; af_alg_pull_tsgl(sk, processed, NULL, 0); @@ -242,7 +242,7 @@ static int _aead_recvmsg(struct socket *sock, struct msghdr *msg, /* Copy AAD || CT to RX SGL buffer for in-place operation. */ err = crypto_aead_copy_sgl(null_tfm, tsgl_src, - areq->first_rsgl.sgl.sg, outlen); + areq->first_rsgl.sgl.sgt.sgl, outlen); if (err) goto free; @@ -268,8 +268,8 @@ static int _aead_recvmsg(struct socket *sock, struct msghdr *msg, /* RX SGL present */ struct af_alg_sgl *sgl_prev = &areq->last_rsgl->sgl; - sg_unmark_end(sgl_prev->sg + sgl_prev->npages - 1); - sg_chain(sgl_prev->sg, sgl_prev->npages + 1, + sg_unmark_end(sgl_prev->sgt.sgl + sgl_prev->sgt.nents - 1); + sg_chain(sgl_prev->sgt.sgl, sgl_prev->sgt.nents + 1, areq->tsgl); } else /* no RX SGL present (e.g. authentication only) */ @@ -278,7 +278,7 @@ static int _aead_recvmsg(struct socket *sock, struct msghdr *msg, /* Initialize the crypto operation */ aead_request_set_crypt(&areq->cra_u.aead_req, rsgl_src, - areq->first_rsgl.sgl.sg, used, ctx->iv); + areq->first_rsgl.sgl.sgt.sgl, used, ctx->iv); aead_request_set_ad(&areq->cra_u.aead_req, ctx->aead_assoclen); aead_request_set_tfm(&areq->cra_u.aead_req, tfm); diff --git a/crypto/algif_hash.c b/crypto/algif_hash.c index 1d017ec5c63c..f051fa624bd7 100644 --- a/crypto/algif_hash.c +++ b/crypto/algif_hash.c @@ -14,6 +14,7 @@ #include #include #include +#include #include struct hash_ctx { @@ -91,13 +92,20 @@ static int hash_sendmsg(struct socket *sock, struct msghdr *msg, if (len > limit) len = limit; - len = af_alg_make_sg(&ctx->sgl, &msg->msg_iter, len); + ctx->sgl.sgt.sgl = ctx->sgl.sgl; + ctx->sgl.sgt.nents = 0; + ctx->sgl.sgt.orig_nents = 0; + + len = netfs_extract_iter_to_sg(&msg->msg_iter, len, + &ctx->sgl.sgt, ALG_MAX_PAGES, 0); if (len < 0) { err = copied ? 0 : len; goto unlock; } - ahash_request_set_crypt(&ctx->req, ctx->sgl.sg, NULL, len); + ctx->sgl.need_unpin = iov_iter_extract_will_pin(&msg->msg_iter); + + ahash_request_set_crypt(&ctx->req, ctx->sgl.sgt.sgl, NULL, len); err = crypto_wait_req(crypto_ahash_update(&ctx->req), &ctx->wait); @@ -141,8 +149,8 @@ static ssize_t hash_sendpage(struct socket *sock, struct page *page, flags |= MSG_MORE; lock_sock(sk); - sg_init_table(ctx->sgl.sg, 1); - sg_set_page(ctx->sgl.sg, page, size, offset); + sg_init_table(ctx->sgl.sgl, 1); + sg_set_page(ctx->sgl.sgl, page, size, offset); if (!(flags & MSG_MORE)) { err = hash_alloc_result(sk, ctx); @@ -151,7 +159,7 @@ static ssize_t hash_sendpage(struct socket *sock, struct page *page, } else if (!ctx->more) hash_free_result(sk, ctx); - ahash_request_set_crypt(&ctx->req, ctx->sgl.sg, ctx->result, size); + ahash_request_set_crypt(&ctx->req, ctx->sgl.sgl, ctx->result, size); if (!(flags & MSG_MORE)) { if (ctx->more) diff --git a/crypto/algif_skcipher.c b/crypto/algif_skcipher.c index ee8890ee8f33..a251cd6bd5b9 100644 --- a/crypto/algif_skcipher.c +++ b/crypto/algif_skcipher.c @@ -105,7 +105,7 @@ static int _skcipher_recvmsg(struct socket *sock, struct msghdr *msg, /* Initialize the crypto operation */ skcipher_request_set_tfm(&areq->cra_u.skcipher_req, tfm); skcipher_request_set_crypt(&areq->cra_u.skcipher_req, areq->tsgl, - areq->first_rsgl.sgl.sg, len, ctx->iv); + areq->first_rsgl.sgl.sgt.sgl, len, ctx->iv); if (msg->msg_iocb && !is_sync_kiocb(msg->msg_iocb)) { /* AIO operation */ diff --git a/include/crypto/if_alg.h b/include/crypto/if_alg.h index 46494b33f5bc..34224e77f5a2 100644 --- a/include/crypto/if_alg.h +++ b/include/crypto/if_alg.h @@ -56,9 +56,8 @@ struct af_alg_type { }; struct af_alg_sgl { - struct scatterlist sg[ALG_MAX_PAGES + 1]; - struct page *pages[ALG_MAX_PAGES]; - unsigned int npages; + struct sg_table sgt; + struct scatterlist sgl[ALG_MAX_PAGES + 1]; bool need_unpin; }; @@ -164,7 +163,6 @@ int af_alg_release(struct socket *sock); void af_alg_release_parent(struct sock *sk); int af_alg_accept(struct sock *sk, struct socket *newsock, bool kern); -int af_alg_make_sg(struct af_alg_sgl *sgl, struct iov_iter *iter, int len); void af_alg_free_sg(struct af_alg_sgl *sgl); static inline struct alg_sock *alg_sk(struct sock *sk) From patchwork Wed Mar 29 14:13:29 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 76612 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp450476vqo; Wed, 29 Mar 2023 07:20:57 -0700 (PDT) X-Google-Smtp-Source: AKy350bEgrGtaUIHBEjvT33b1D6pzNWXmQyspxdtqtCXjCfbkX7eO1DT2meKfI77PACbbTg2VeRt X-Received: by 2002:a17:907:248c:b0:931:bbcf:eb6b with SMTP id zg12-20020a170907248c00b00931bbcfeb6bmr20849838ejb.63.1680099657831; Wed, 29 Mar 2023 07:20:57 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680099657; cv=none; d=google.com; s=arc-20160816; b=rKFY31kpmqkatC69u7jJt0kESIb6SxsjLUDDWCku0x0rheTWdGSun3x8KG2fc2vQlJ vnyVQtw3wtuOrP32dY2BcZlD9Z9r/UQ2gyALEIYyhrBSKqkWTuse5P9Cm9bWCIDnfBIW Ig6cgS0enXW0/hs6gW2VMzeqO61xZU5IidgJSH8wggze+wslIpDfuEJC+/jBHjk8Rbol SFhUfsL1eVTXQZX2d6jQCQ3Z5Pnb60qCSwnX8Y/IqclOxZmIXHzX1rtK4tE4x+247EVM iGU37+3njNEPfHhxd57pfM3521QHirKF3Fz+OPXX2p23g1bAGKovhOdG/TI09/3Ujx4j rekQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=Cnsl2PBKcn/WeJcDUlvQg+kDRV/xI85q/f39rakA9vk=; b=MrhB5WuWEynT6Ga+BlV4ev4UCQk7TxG4edqYE63eqX6CbPx6FMriS2FwLe05zASvIR DqM1xijgyn5hRWh8TEwnUU7iUChLVzh4YjxPJ7IaYE+hzo9W1WSPwUqLiUX8RIjOqMZQ YzWaj4F/HYbFFxNeKGaqGHHocs0hRU+JRRsQfQULarZeqhHNCQuXWeI8vsNSeBAC+Qdl clLfNLEWhedbpgRYaY5JhsSI/HcQRYdl+8r4Pcepz1Y0SsOpi7HAqkahNLEREijs/8ia 66XYfxi9VmVFjsLgStfWdiEl3iRuPiNv2JueYjJuVj3q9DTV3Zy0D7ugXc7p2OhODVqn KlPg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=IiJWyflK; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id vk5-20020a170907cbc500b009334be3d5fcsi31259082ejc.296.2023.03.29.07.20.33; Wed, 29 Mar 2023 07:20:57 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=IiJWyflK; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229891AbjC2OS4 (ORCPT + 99 others); Wed, 29 Mar 2023 10:18:56 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55160 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230511AbjC2ORa (ORCPT ); Wed, 29 Mar 2023 10:17:30 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EEA7B59E4 for ; Wed, 29 Mar 2023 07:15:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1680099307; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Cnsl2PBKcn/WeJcDUlvQg+kDRV/xI85q/f39rakA9vk=; b=IiJWyflK47eMCUoMXyaGoXeIODB06BJ3L+Rx7P0pJWqXyNOfcXthtDaFtUeb3XosIanZgz BKGfNaZ1RNcG0iml8UOsUzhBc/Ta9GeUl1W7SyaCZka/V06twyFg2hFijsdoIAOMGdYvSl aan++Dh7yMAT/9sokCDnNJEyXR9g0Do= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-354-ZXES39PtP7W3LCkby0Z-qw-1; Wed, 29 Mar 2023 10:15:03 -0400 X-MC-Unique: ZXES39PtP7W3LCkby0Z-qw-1 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.rdu2.redhat.com [10.11.54.5]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 19257858F09; Wed, 29 Mar 2023 14:15:02 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.18]) by smtp.corp.redhat.com (Postfix) with ESMTP id 062E318EC2; Wed, 29 Mar 2023 14:14:59 +0000 (UTC) From: David Howells To: Matthew Wilcox , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni Cc: David Howells , Al Viro , Christoph Hellwig , Jens Axboe , Jeff Layton , Christian Brauner , Chuck Lever III , Linus Torvalds , netdev@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Herbert Xu , linux-crypto@vger.kernel.org Subject: [RFC PATCH v2 23/48] crypto: af_alg: Indent the loop in af_alg_sendmsg() Date: Wed, 29 Mar 2023 15:13:29 +0100 Message-Id: <20230329141354.516864-24-dhowells@redhat.com> In-Reply-To: <20230329141354.516864-1-dhowells@redhat.com> References: <20230329141354.516864-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.5 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1761712178975012853?= X-GMAIL-MSGID: =?utf-8?q?1761712178975012853?= Put the loop in af_alg_sendmsg() into an if-statement to indent it to make the next patch easier to review as that will add another branch to handle MSG_SPLICE_PAGES to the if-statement. Signed-off-by: David Howells cc: Herbert Xu cc: "David S. Miller" cc: Eric Dumazet cc: Jakub Kicinski cc: Paolo Abeni cc: Jens Axboe cc: Matthew Wilcox cc: linux-crypto@vger.kernel.org cc: netdev@vger.kernel.org --- crypto/af_alg.c | 50 +++++++++++++++++++++++++------------------------ 1 file changed, 26 insertions(+), 24 deletions(-) diff --git a/crypto/af_alg.c b/crypto/af_alg.c index 1dafd088ad45..483821e310e9 100644 --- a/crypto/af_alg.c +++ b/crypto/af_alg.c @@ -1031,35 +1031,37 @@ int af_alg_sendmsg(struct socket *sock, struct msghdr *msg, size_t size, if (sgl->cur) sg_unmark_end(sg + sgl->cur - 1); - do { - struct page *pg; - unsigned int i = sgl->cur; + if (1 /* TODO check MSG_SPLICE_PAGES */) { + do { + struct page *pg; + unsigned int i = sgl->cur; - plen = min_t(size_t, len, PAGE_SIZE); + plen = min_t(size_t, len, PAGE_SIZE); - pg = alloc_page(GFP_KERNEL); - if (!pg) { - err = -ENOMEM; - goto unlock; - } + pg = alloc_page(GFP_KERNEL); + if (!pg) { + err = -ENOMEM; + goto unlock; + } - sg_assign_page(sg + i, pg); + sg_assign_page(sg + i, pg); - err = memcpy_from_msg(page_address(sg_page(sg + i)), - msg, plen); - if (err) { - __free_page(sg_page(sg + i)); - sg_assign_page(sg + i, NULL); - goto unlock; - } + err = memcpy_from_msg(page_address(sg_page(sg + i)), + msg, plen); + if (err) { + __free_page(sg_page(sg + i)); + sg_assign_page(sg + i, NULL); + goto unlock; + } - sg[i].length = plen; - len -= plen; - ctx->used += plen; - copied += plen; - size -= plen; - sgl->cur++; - } while (len && sgl->cur < MAX_SGL_ENTS); + sg[i].length = plen; + len -= plen; + ctx->used += plen; + copied += plen; + size -= plen; + sgl->cur++; + } while (len && sgl->cur < MAX_SGL_ENTS); + } if (!size) sg_mark_end(sg + sgl->cur - 1); From patchwork Wed Mar 29 14:13:30 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 76639 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp463621vqo; Wed, 29 Mar 2023 07:40:40 -0700 (PDT) X-Google-Smtp-Source: AKy350b5RrvwAOEgJA2UvEApigka/Orded17BkUX+fa41z43cTUD1ryPCgdIKrNK7dfcoNHviHAF X-Received: by 2002:a17:903:186:b0:19d:16fa:ba48 with SMTP id z6-20020a170903018600b0019d16faba48mr21410206plg.28.1680100839956; Wed, 29 Mar 2023 07:40:39 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680100839; cv=none; d=google.com; s=arc-20160816; b=qrHwkkr+ZV7MbtT6ruR/7uoj4bXaagzdxtte5wtg/pQ3vNTA009d3vSnmZLjXxfIo5 me03+pXYMin+s+Q9VKQ+xMzBEndrvOaLkrx/eKZ4TUVcH8BAtA5/zxTpvMTvmeW2Ksr8 oAUoU6HiFGEwWCNfcDAPcyZ4/A2+3ykooI+DhcycRkDaKucHqrcrj8NY4oHaNmgWsLt/ xA4wTq0hfFbATjt65pl2dsabIEBPRGaadsRJlXfw5u9yAEpFlV1cERQJkvsokRKSCSiC 9PM8lMseU94Dk3UWEzzn4855DcupQ7LyPXJS0rs2hkFrzcy4mAr1ma125IyRXHY+PpzB ltng== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=85afYybs4YSPWlJ5qYhdk1qdo4c8pZli94KqFHLqWDY=; b=GlSWa9ekQ4yNAPXEBW6F29zW90ykvUb7KsT+AXHhhzj55MkXW5BNCAfK8WWlSr4/eW fu2yKb1C1QsTFwG9qjeliDP/MG1DMkuWz76UBlFrrAJWcNpukztt774rCZvsFX2IwxpX qxxIcy2Vwzx2MxHA3Nhc9UyDrsQK6nSJsj3Rynlv5lEz6j/uBwqFFPgsvrzK/3APSsND wh94hdJ80Ib656cEAbAHuWfIN7tX9CLkasz0BxeBmiXYGDipVysmnukrtaZWHuwJpr+4 6Ry5ZWz9UKTOgxCrtc6Yxqo6f7gHag1EcE75E8ogJnPIddYJ9wVwCTmxgyLMp4foYmaf fOhA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=e4UAlSlg; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id d17-20020a170903231100b001a1d76e7214si23891735plh.111.2023.03.29.07.40.26; Wed, 29 Mar 2023 07:40:39 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=e4UAlSlg; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230033AbjC2OTH (ORCPT + 99 others); Wed, 29 Mar 2023 10:19:07 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53052 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230525AbjC2ORf (ORCPT ); Wed, 29 Mar 2023 10:17:35 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id F186C5B92 for ; Wed, 29 Mar 2023 07:15:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1680099312; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=85afYybs4YSPWlJ5qYhdk1qdo4c8pZli94KqFHLqWDY=; b=e4UAlSlgOwuDkwd9nCNNDVDVIIXVIukng/aBH9EhEgJrVLu0MI+o6DtKICuGoQyEn3XYZm 0igEYEsaPK91DjAvUMM20LU5gzb/chRy/vcUg5YnZxjv4W/m1LWvRkT1gHuCgvEbLd9N90 rMhXZ+MBn82ljIEPJ+IDgf9FXFRt2n0= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-159-GHZurxPfN1C-QlBwmC150g-1; Wed, 29 Mar 2023 10:15:05 -0400 X-MC-Unique: GHZurxPfN1C-QlBwmC150g-1 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.rdu2.redhat.com [10.11.54.4]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id EAA273C0ED65; Wed, 29 Mar 2023 14:15:04 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.18]) by smtp.corp.redhat.com (Postfix) with ESMTP id CEF4F202701F; Wed, 29 Mar 2023 14:15:02 +0000 (UTC) From: David Howells To: Matthew Wilcox , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni Cc: David Howells , Al Viro , Christoph Hellwig , Jens Axboe , Jeff Layton , Christian Brauner , Chuck Lever III , Linus Torvalds , netdev@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Herbert Xu , linux-crypto@vger.kernel.org Subject: [RFC PATCH v2 24/48] crypto: af_alg: Support MSG_SPLICE_PAGES Date: Wed, 29 Mar 2023 15:13:30 +0100 Message-Id: <20230329141354.516864-25-dhowells@redhat.com> In-Reply-To: <20230329141354.516864-1-dhowells@redhat.com> References: <20230329141354.516864-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.4 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1761713418097146842?= X-GMAIL-MSGID: =?utf-8?q?1761713418097146842?= Make AF_ALG sendmsg() support MSG_SPLICE_PAGES. This causes pages to be spliced from the source iterator if possible (the iterator must be ITER_BVEC and the pages must be spliceable). This allows ->sendpage() to be replaced by something that can handle multiple multipage folios in a single transaction. [!] Note that this makes use of netfs_extract_iter_to_sg() from netfslib. This probably needs moving to core code somewhere. Signed-off-by: David Howells cc: Herbert Xu cc: "David S. Miller" cc: Eric Dumazet cc: Jakub Kicinski cc: Paolo Abeni cc: Jens Axboe cc: Matthew Wilcox cc: linux-crypto@vger.kernel.org cc: netdev@vger.kernel.org --- crypto/Kconfig | 1 + crypto/af_alg.c | 28 ++++++++++++++++++++++++++-- crypto/algif_aead.c | 22 +++++++++++----------- crypto/algif_skcipher.c | 8 ++++---- 4 files changed, 42 insertions(+), 17 deletions(-) diff --git a/crypto/Kconfig b/crypto/Kconfig index 9c86f7045157..8c04ecbb4395 100644 --- a/crypto/Kconfig +++ b/crypto/Kconfig @@ -1297,6 +1297,7 @@ menu "Userspace interface" config CRYPTO_USER_API tristate + select NETFS_SUPPORT # for netfs_extract_iter_to_sg() config CRYPTO_USER_API_HASH tristate "Hash algorithms" diff --git a/crypto/af_alg.c b/crypto/af_alg.c index 483821e310e9..3088ab298632 100644 --- a/crypto/af_alg.c +++ b/crypto/af_alg.c @@ -941,6 +941,10 @@ int af_alg_sendmsg(struct socket *sock, struct msghdr *msg, size_t size, bool init = false; int err = 0; + if ((msg->msg_flags & MSG_SPLICE_PAGES) && + !iov_iter_is_bvec(&msg->msg_iter)) + return -EINVAL; + if (msg->msg_controllen) { err = af_alg_cmsg_send(msg, &con); if (err) @@ -986,7 +990,7 @@ int af_alg_sendmsg(struct socket *sock, struct msghdr *msg, size_t size, while (size) { struct scatterlist *sg; size_t len = size; - size_t plen; + ssize_t plen; /* use the existing memory in an allocated page */ if (ctx->merge) { @@ -1031,7 +1035,27 @@ int af_alg_sendmsg(struct socket *sock, struct msghdr *msg, size_t size, if (sgl->cur) sg_unmark_end(sg + sgl->cur - 1); - if (1 /* TODO check MSG_SPLICE_PAGES */) { + if (msg->msg_flags & MSG_SPLICE_PAGES) { + struct sg_table sgtable = { + .sgl = sg, + .nents = sgl->cur, + .orig_nents = sgl->cur, + }; + + plen = netfs_extract_iter_to_sg(&msg->msg_iter, len, + &sgtable, MAX_SGL_ENTS, 0); + if (plen < 0) { + err = plen; + goto unlock; + } + + for (; sgl->cur < sgtable.nents; sgl->cur++) + get_page(sg_page(&sg[sgl->cur])); + len -= plen; + ctx->used += plen; + copied += plen; + size -= plen; + } else { do { struct page *pg; unsigned int i = sgl->cur; diff --git a/crypto/algif_aead.c b/crypto/algif_aead.c index f6aa3856d8d5..b16111a3025a 100644 --- a/crypto/algif_aead.c +++ b/crypto/algif_aead.c @@ -9,8 +9,8 @@ * The following concept of the memory management is used: * * The kernel maintains two SGLs, the TX SGL and the RX SGL. The TX SGL is - * filled by user space with the data submitted via sendpage/sendmsg. Filling - * up the TX SGL does not cause a crypto operation -- the data will only be + * filled by user space with the data submitted via sendpage. Filling up + * the TX SGL does not cause a crypto operation -- the data will only be * tracked by the kernel. Upon receipt of one recvmsg call, the caller must * provide a buffer which is tracked with the RX SGL. * @@ -113,19 +113,19 @@ static int _aead_recvmsg(struct socket *sock, struct msghdr *msg, } /* - * Data length provided by caller via sendmsg/sendpage that has not - * yet been processed. + * Data length provided by caller via sendmsg that has not yet been + * processed. */ used = ctx->used; /* - * Make sure sufficient data is present -- note, the same check is - * also present in sendmsg/sendpage. The checks in sendpage/sendmsg - * shall provide an information to the data sender that something is - * wrong, but they are irrelevant to maintain the kernel integrity. - * We need this check here too in case user space decides to not honor - * the error message in sendmsg/sendpage and still call recvmsg. This - * check here protects the kernel integrity. + * Make sure sufficient data is present -- note, the same check is also + * present in sendmsg. The checks in sendmsg shall provide an + * information to the data sender that something is wrong, but they are + * irrelevant to maintain the kernel integrity. We need this check + * here too in case user space decides to not honor the error message + * in sendmsg and still call recvmsg. This check here protects the + * kernel integrity. */ if (!aead_sufficient_data(sk)) return -EINVAL; diff --git a/crypto/algif_skcipher.c b/crypto/algif_skcipher.c index a251cd6bd5b9..b1f321b9f846 100644 --- a/crypto/algif_skcipher.c +++ b/crypto/algif_skcipher.c @@ -9,10 +9,10 @@ * The following concept of the memory management is used: * * The kernel maintains two SGLs, the TX SGL and the RX SGL. The TX SGL is - * filled by user space with the data submitted via sendpage/sendmsg. Filling - * up the TX SGL does not cause a crypto operation -- the data will only be - * tracked by the kernel. Upon receipt of one recvmsg call, the caller must - * provide a buffer which is tracked with the RX SGL. + * filled by user space with the data submitted via sendmsg. Filling up the TX + * SGL does not cause a crypto operation -- the data will only be tracked by + * the kernel. Upon receipt of one recvmsg call, the caller must provide a + * buffer which is tracked with the RX SGL. * * During the processing of the recvmsg operation, the cipher request is * allocated and prepared. As part of the recvmsg operation, the processed From patchwork Wed Mar 29 14:13:31 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 76633 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp462995vqo; Wed, 29 Mar 2023 07:39:41 -0700 (PDT) X-Google-Smtp-Source: AKy350a263y+HkSeHTRK2m76o1IP3AwJcVSsYzWyeVOlxw59gBY0eq//xvDbKFeghyYyiLkyF251 X-Received: by 2002:aa7:de92:0:b0:4fe:9374:30cb with SMTP id j18-20020aa7de92000000b004fe937430cbmr18392956edv.37.1680100781300; Wed, 29 Mar 2023 07:39:41 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680100781; cv=none; d=google.com; s=arc-20160816; b=e2rdcwHYnYgUy+xKI+D9W1XwPhwE0FMfWWa2aoS7340fcIVAJE3PfneVIfhzt5B/K2 FWGODMPAKn9/vG6F4Hn+AMFmMHjEc7DmmaJEDJP867DFq++0cxIvYyTtRzF4/b3mKTPb OEbQHFfQWXSSuGXoyW0BuQCTL+MFv2lry9ZEqLA5hvaTR5fO+QusPsHaESb/ChQI0XnF Qcztd0vjTgAKi6vLyWdLxzZWnUbKe5MpKQFDii40sXcEyOZg8P8kqRl6H/Fm7RJg8aer MgoyYp4PhEbnU74rwvPcB/PoJcK4WGLIGcBZXrhWasALJxiLHRFpbZth/o00El1dSROc NwcA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=mERLnHIZ7Ufhp5QAaSwmDcCxGjZ2CWUtcWDMeYUTupY=; b=dDUkX6ZYENnADbEQoJpK/NHownvg2VM5kk/f/OQA4ECYfy/rOroV8JSlHe6EG0OZct +L9jYdJdrmJo5N52VJ84QSD1FDMgTq/U7IS3WisS3LvFxLj/f4NQr1ov9344LE9+x9pS ed7VciTTAQZ+D6T4rdRFwQ0L/ilKBTseAFzffaTf3o6Cru9W2YzO0j3qW2X191UlOTNU 1Dftqpd2O5Q8D8IxpmlB3Vcn0FgxB2VJ0GTtPfQ4dsGt9tNm4Pb5TGTR2D8Jo0RvzNLK kXqgPSvB8e+5952ex1tN9P9gSfRI1G7ghUox2IJoWpPV14cthQxC+WlkhhmfrZ2EKTi4 Sn6w== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=SYBbsC2d; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id v14-20020aa7d64e000000b005021f0d5766si14637838edr.685.2023.03.29.07.39.17; Wed, 29 Mar 2023 07:39:41 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=SYBbsC2d; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230289AbjC2OTZ (ORCPT + 99 others); Wed, 29 Mar 2023 10:19:25 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55132 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231126AbjC2ORi (ORCPT ); Wed, 29 Mar 2023 10:17:38 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 050075B8E for ; Wed, 29 Mar 2023 07:15:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1680099311; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=mERLnHIZ7Ufhp5QAaSwmDcCxGjZ2CWUtcWDMeYUTupY=; b=SYBbsC2diMRlA1CwY38E95aUE7oNGTg+DlMiDa30r/ll8twPWuaGIvqJe02LQkfifwtTqk FmNYDT8HVkHoMf8/I9r+db080xlNEJz6ZU/2sJ4XqQl0ulhn+6O6LNtcuiKMUuY0f2F37z 7Fxc2qIHQjju4u6bWBfKnHdbrZVPypA= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-590-z9dsdsf9NKCR0l94rUYaTw-1; Wed, 29 Mar 2023 10:15:08 -0400 X-MC-Unique: z9dsdsf9NKCR0l94rUYaTw-1 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.rdu2.redhat.com [10.11.54.4]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 98FAE885625; Wed, 29 Mar 2023 14:15:07 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.18]) by smtp.corp.redhat.com (Postfix) with ESMTP id 85B182027040; Wed, 29 Mar 2023 14:15:05 +0000 (UTC) From: David Howells To: Matthew Wilcox , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni Cc: David Howells , Al Viro , Christoph Hellwig , Jens Axboe , Jeff Layton , Christian Brauner , Chuck Lever III , Linus Torvalds , netdev@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Herbert Xu , linux-crypto@vger.kernel.org Subject: [RFC PATCH v2 25/48] crypto: af_alg: Convert af_alg_sendpage() to use MSG_SPLICE_PAGES Date: Wed, 29 Mar 2023 15:13:31 +0100 Message-Id: <20230329141354.516864-26-dhowells@redhat.com> In-Reply-To: <20230329141354.516864-1-dhowells@redhat.com> References: <20230329141354.516864-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.4 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1761713356880642741?= X-GMAIL-MSGID: =?utf-8?q?1761713356880642741?= Convert af_alg_sendpage() to use sendmsg() with MSG_SPLICE_PAGES rather than directly splicing in the pages itself. This allows ->sendpage() to be replaced by something that can handle multiple multipage folios in a single transaction. [!] Note that this makes use of netfs_extract_iter_to_sg() from netfslib. This probably needs moving to core code somewhere. Signed-off-by: David Howells cc: Herbert Xu cc: "David S. Miller" cc: Eric Dumazet cc: Jakub Kicinski cc: Paolo Abeni cc: Jens Axboe cc: Matthew Wilcox cc: linux-crypto@vger.kernel.org cc: netdev@vger.kernel.org --- crypto/af_alg.c | 53 +++++++++---------------------------------------- 1 file changed, 9 insertions(+), 44 deletions(-) diff --git a/crypto/af_alg.c b/crypto/af_alg.c index 3088ab298632..7fe8c8db6bb5 100644 --- a/crypto/af_alg.c +++ b/crypto/af_alg.c @@ -1118,53 +1118,18 @@ EXPORT_SYMBOL_GPL(af_alg_sendmsg); ssize_t af_alg_sendpage(struct socket *sock, struct page *page, int offset, size_t size, int flags) { - struct sock *sk = sock->sk; - struct alg_sock *ask = alg_sk(sk); - struct af_alg_ctx *ctx = ask->private; - struct af_alg_tsgl *sgl; - int err = -EINVAL; - - if (flags & MSG_SENDPAGE_NOTLAST) - flags |= MSG_MORE; - - lock_sock(sk); - if (!ctx->more && ctx->used) - goto unlock; - - if (!size) - goto done; - - if (!af_alg_writable(sk)) { - err = af_alg_wait_for_wmem(sk, flags); - if (err) - goto unlock; - } - - err = af_alg_alloc_tsgl(sk); - if (err) - goto unlock; - - ctx->merge = 0; - sgl = list_entry(ctx->tsgl_list.prev, struct af_alg_tsgl, list); - - if (sgl->cur) - sg_unmark_end(sgl->sg + sgl->cur - 1); + struct bio_vec bvec; + struct msghdr msg = { + .msg_flags = flags | MSG_SPLICE_PAGES, + }; - sg_mark_end(sgl->sg + sgl->cur); + bvec_set_page(&bvec, page, size, offset); + iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size); - get_page(page); - sg_set_page(sgl->sg + sgl->cur, page, size, offset); - sgl->cur++; - ctx->used += size; - -done: - ctx->more = flags & MSG_MORE; - -unlock: - af_alg_data_wakeup(sk); - release_sock(sk); + if (flags & MSG_SENDPAGE_NOTLAST) + msg.msg_flags |= MSG_MORE; - return err ?: size; + return sock_sendmsg(sock, &msg); } EXPORT_SYMBOL_GPL(af_alg_sendpage); From patchwork Wed Mar 29 14:13:32 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 76637 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp463214vqo; Wed, 29 Mar 2023 07:40:01 -0700 (PDT) X-Google-Smtp-Source: AKy350bHgmmyCuju9PX0INzGC4Mx7zjwHJGrA07kZy0RVmJIt/z4jD0Ruesmf9a2pn2fndM3J4UI X-Received: by 2002:a17:906:fcc9:b0:92c:138e:ff1f with SMTP id qx9-20020a170906fcc900b0092c138eff1fmr18802973ejb.18.1680100801132; Wed, 29 Mar 2023 07:40:01 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680100801; cv=none; d=google.com; s=arc-20160816; b=JHaCaEoOdb/n/VPVCtgLD1DxwgLl90liw+q9hzAhWC/r+CUm/+26HYO5fkxPwqUWZZ QE4RipXFDctZFTSsFmMAhFlIzu9EieCnOFxGHP1qciRlEQ2uXjgeEDpIKpi5WvdAMgLG M6LVhsFBhrq2wU56E6+/YRNpvjjOpKs4Q6lIjnLEjy96Ur9lpF6gqng6GF8908C5fMpJ 2mop6dw+FQdLEgTFbZ2nXBeqd9cwFXLOjHyCAXjb50+hMq5bwAyKmlPwMXNkGiGR0nCm E9WFAdO72og09d3TS+WBbyQeplO0k3Tvt7bU5YYiJVv3/rppMBjAFeLmxcan3TZ3FWc5 XoJA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=M5DaLi1tovZrH11oTI95mtlHRv/XZcPSFvwL/aUxUh4=; b=z6e70utdu7RxIk3ubqfZddAzgAddT0yCdCz4+jBzlBspBtGoHgpU48DTL4awAlFTEH Ac2xVpAiIS+kgoKBT/38IhUIKWSoWKJqY3eo7QdJ9nJ7xBMU6LTDC1OCJki66LLaH5Km 9dyLcetFUFctEsz/DzFnoAejkNNnGNIfzQQDPoGUx8Z+qSyyEI/Ssbk3aUUkKZrPvcBx TUQdiPojHX3EjYK6I9+DjguGJocYq/DmwhQ590TZZNrAnvzwVTG2TMUUrSeaOW2+nX6P 0l3vOj1lCn8vmVfuhnpRYMrKP6XVWj+undc3mSYwckDEnB8vyyxibRUhi7at13IOfHp1 xpWw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b="Dk//I4lx"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id e19-20020a50ec93000000b004ad8bff8f1dsi32555085edr.204.2023.03.29.07.39.37; Wed, 29 Mar 2023 07:40:01 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b="Dk//I4lx"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230425AbjC2OT7 (ORCPT + 99 others); Wed, 29 Mar 2023 10:19:59 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53482 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231165AbjC2OSG (ORCPT ); Wed, 29 Mar 2023 10:18:06 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 54C055BA5 for ; Wed, 29 Mar 2023 07:16:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1680099315; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=M5DaLi1tovZrH11oTI95mtlHRv/XZcPSFvwL/aUxUh4=; b=Dk//I4lx7BOnYxkfIz9av1ekycqK22HPXeNxO1cvCjF8u68qCSuXVWi021S1bxx9ueIwEZ Kv55SWUKxhqolj0IODlih/ilHJqUdr8G7gqd+wnEe8ObkCiRPl1Dh76AL+LY1pK0xaIWGa 1XcSQIcbrOceMyZqQ55U01F8cDhiYWQ= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-612-sHDpKiIEOqiGBf3DQmh3WA-1; Wed, 29 Mar 2023 10:15:11 -0400 X-MC-Unique: sHDpKiIEOqiGBf3DQmh3WA-1 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.rdu2.redhat.com [10.11.54.4]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 6673388562B; Wed, 29 Mar 2023 14:15:10 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.18]) by smtp.corp.redhat.com (Postfix) with ESMTP id 548E32027041; Wed, 29 Mar 2023 14:15:08 +0000 (UTC) From: David Howells To: Matthew Wilcox , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni Cc: David Howells , Al Viro , Christoph Hellwig , Jens Axboe , Jeff Layton , Christian Brauner , Chuck Lever III , Linus Torvalds , netdev@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Herbert Xu , linux-crypto@vger.kernel.org Subject: [RFC PATCH v2 26/48] crypto: af_alg/hash: Support MSG_SPLICE_PAGES Date: Wed, 29 Mar 2023 15:13:32 +0100 Message-Id: <20230329141354.516864-27-dhowells@redhat.com> In-Reply-To: <20230329141354.516864-1-dhowells@redhat.com> References: <20230329141354.516864-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.4 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1761713377579846222?= X-GMAIL-MSGID: =?utf-8?q?1761713377579846222?= Make AF_ALG sendmsg() support MSG_SPLICE_PAGES in the hashing code. This causes pages to be spliced from the source iterator if possible. This allows ->sendpage() to be replaced by something that can handle multiple multipage folios in a single transaction. [!] Note that this makes use of netfs_extract_iter_to_sg() from netfslib. This probably needs moving to core code somewhere. Signed-off-by: David Howells cc: Herbert Xu cc: "David S. Miller" cc: Eric Dumazet cc: Jakub Kicinski cc: Paolo Abeni cc: Jens Axboe cc: Matthew Wilcox cc: linux-crypto@vger.kernel.org cc: netdev@vger.kernel.org --- crypto/af_alg.c | 11 +++-- crypto/algif_hash.c | 99 ++++++++++++++++++++++++++++----------------- 2 files changed, 70 insertions(+), 40 deletions(-) diff --git a/crypto/af_alg.c b/crypto/af_alg.c index 7fe8c8db6bb5..686610a4986f 100644 --- a/crypto/af_alg.c +++ b/crypto/af_alg.c @@ -543,9 +543,14 @@ void af_alg_free_sg(struct af_alg_sgl *sgl) { int i; - if (sgl->need_unpin) - for (i = 0; i < sgl->sgt.nents; i++) - unpin_user_page(sg_page(&sgl->sgt.sgl[i])); + if (sgl->sgt.sgl) { + if (sgl->need_unpin) + for (i = 0; i < sgl->sgt.nents; i++) + unpin_user_page(sg_page(&sgl->sgt.sgl[i])); + if (sgl->sgt.sgl != sgl->sgl) + kvfree(sgl->sgt.sgl); + sgl->sgt.sgl = NULL; + } } EXPORT_SYMBOL_GPL(af_alg_free_sg); diff --git a/crypto/algif_hash.c b/crypto/algif_hash.c index f051fa624bd7..b89c2c50cecc 100644 --- a/crypto/algif_hash.c +++ b/crypto/algif_hash.c @@ -64,77 +64,102 @@ static void hash_free_result(struct sock *sk, struct hash_ctx *ctx) static int hash_sendmsg(struct socket *sock, struct msghdr *msg, size_t ignored) { - int limit = ALG_MAX_PAGES * PAGE_SIZE; struct sock *sk = sock->sk; struct alg_sock *ask = alg_sk(sk); struct hash_ctx *ctx = ask->private; - long copied = 0; + ssize_t copied = 0; + size_t len, max_pages = ALG_MAX_PAGES, npages; + bool continuing = ctx->more, need_init = false; int err; - if (limit > sk->sk_sndbuf) - limit = sk->sk_sndbuf; + /* Don't limit to ALG_MAX_PAGES if the pages are all already pinned. */ + if (!user_backed_iter(&msg->msg_iter)) + max_pages = INT_MAX; + else + max_pages = min_t(size_t, max_pages, + DIV_ROUND_UP(sk->sk_sndbuf, PAGE_SIZE)); lock_sock(sk); - if (!ctx->more) { + if (!continuing) { if ((msg->msg_flags & MSG_MORE)) hash_free_result(sk, ctx); - - err = crypto_wait_req(crypto_ahash_init(&ctx->req), &ctx->wait); - if (err) - goto unlock; + need_init = true; } ctx->more = false; while (msg_data_left(msg)) { - int len = msg_data_left(msg); - - if (len > limit) - len = limit; - ctx->sgl.sgt.sgl = ctx->sgl.sgl; ctx->sgl.sgt.nents = 0; ctx->sgl.sgt.orig_nents = 0; - len = netfs_extract_iter_to_sg(&msg->msg_iter, len, - &ctx->sgl.sgt, ALG_MAX_PAGES, 0); - if (len < 0) { - err = copied ? 0 : len; - goto unlock; + err = -EIO; + npages = iov_iter_npages(&msg->msg_iter, max_pages); + if (npages == 0) + goto unlock_free; + + if (npages > ARRAY_SIZE(ctx->sgl.sgl)) { + err = -ENOMEM; + ctx->sgl.sgt.sgl = + kvmalloc(array_size(npages, sizeof(*ctx->sgl.sgt.sgl)), + GFP_KERNEL); + if (!ctx->sgl.sgt.sgl) + goto unlock_free; } + sg_init_table(ctx->sgl.sgl, npages); ctx->sgl.need_unpin = iov_iter_extract_will_pin(&msg->msg_iter); - ahash_request_set_crypt(&ctx->req, ctx->sgl.sgt.sgl, NULL, len); + err = netfs_extract_iter_to_sg(&msg->msg_iter, LONG_MAX, + &ctx->sgl.sgt, npages, 0); + if (err < 0) + goto unlock_free; + len = err; + sg_mark_end(ctx->sgl.sgt.sgl + ctx->sgl.sgt.nents - 1); - err = crypto_wait_req(crypto_ahash_update(&ctx->req), - &ctx->wait); - af_alg_free_sg(&ctx->sgl); - if (err) { - iov_iter_revert(&msg->msg_iter, len); - goto unlock; + if (!msg_data_left(msg)) { + err = hash_alloc_result(sk, ctx); + if (err) + goto unlock_free; } - copied += len; - } + ahash_request_set_crypt(&ctx->req, ctx->sgl.sgt.sgl, ctx->result, len); - err = 0; + if (!msg_data_left(msg) && !continuing && !(msg->msg_flags & MSG_MORE)) { + err = crypto_ahash_digest(&ctx->req); + } else { + if (need_init) { + err = crypto_wait_req(crypto_ahash_init(&ctx->req), + &ctx->wait); + if (err) + goto unlock_free; + need_init = false; + } + + if (msg_data_left(msg) || (msg->msg_flags & MSG_MORE)) + err = crypto_ahash_update(&ctx->req); + else + err = crypto_ahash_finup(&ctx->req); + continuing = true; + } - ctx->more = msg->msg_flags & MSG_MORE; - if (!ctx->more) { - err = hash_alloc_result(sk, ctx); + err = crypto_wait_req(err, &ctx->wait); if (err) - goto unlock; + goto unlock_free; - ahash_request_set_crypt(&ctx->req, NULL, ctx->result, 0); - err = crypto_wait_req(crypto_ahash_final(&ctx->req), - &ctx->wait); + copied += len; + af_alg_free_sg(&ctx->sgl); } + ctx->more = msg->msg_flags & MSG_MORE; + err = 0; unlock: release_sock(sk); + return copied ?: err; - return err ?: copied; +unlock_free: + af_alg_free_sg(&ctx->sgl); + goto unlock; } static ssize_t hash_sendpage(struct socket *sock, struct page *page, From patchwork Wed Mar 29 14:13:33 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 76638 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp463466vqo; Wed, 29 Mar 2023 07:40:26 -0700 (PDT) X-Google-Smtp-Source: AKy350Z8YTcjCAoR0lMw9hzgzt22Jp7oTR8ExqjNVaW2MuqrbFMAns0VS6t97UWGBEIxGtS6r6mJ X-Received: by 2002:a17:903:2312:b0:1a1:a6e5:764b with SMTP id d18-20020a170903231200b001a1a6e5764bmr25491262plh.60.1680100826248; Wed, 29 Mar 2023 07:40:26 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680100826; cv=none; d=google.com; s=arc-20160816; b=EWlOstsnO2VNDHNIeuU6S0v9mzR7w14UHGGItdn2pkOB18/2MRXA/paAXgmtEEYuPC 1qPRExXtwjrfaRS13GDpeMr0GFzm1orWsHwSXBWKzOzJY/hMycDmHqF/C6fnTbcLBHF/ qv4v+rKoMK8CsZ2w0OM5lE6ODtaZKnWSCvqR61Nj72lba2mj8gV2mt7M6kVumW3koc+V QhtNpj83OK47Vmzlx/GtxlE1Li/ixSOze67cBleRC4Y29+pt8pHAu82JTXCYPRspOjta laWdLDL/Mq7r59retA65ipld3ky5cZOgizjlNk0o6v7SodgMAKpwBpCKxyAxV/nFK+oc wuVA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=I/S0tdHFgU2wYUbqRS8y7qqDbwCo5raKog1zVTXoTHk=; b=oqeZNXeFmUC9FHH2rgwzfHMj0uATlvVwIPWtw79was2HyN5zMKJorL3AyAVwzEUJjq LpBXR1n6q2nrGCauc+YFJTJBkCBSyVzfIiv9yFjcGurs6b3z43gas5T/k5Mp9DBAvdL9 ePCRZQ7opnBpIeHZR8hA+TQr+T00nrIX5Qy95orMnXLzXiW/FDimQdM3Zgce+LSjTCMk S4C936OhSJtmEgf3iWMCDShpvgLQ5ZtxUy7IGDQc+9Iy67Bg8Ye7K4Q7m3SvqN+UwUXC tU3B4UTl77cWXr3M6WHjPOF+VR4tx8ZZ25A0d5q0q8hzPAXx0GESByjMVwm0gVXAfYtK 0WnA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=FSA3Hbcl; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id e26-20020a630f1a000000b0051344de997fsi8663320pgl.437.2023.03.29.07.40.13; Wed, 29 Mar 2023 07:40:26 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=FSA3Hbcl; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231179AbjC2OTr (ORCPT + 99 others); Wed, 29 Mar 2023 10:19:47 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55256 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230451AbjC2ORp (ORCPT ); Wed, 29 Mar 2023 10:17:45 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CE2F75B9B for ; Wed, 29 Mar 2023 07:16:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1680099317; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=I/S0tdHFgU2wYUbqRS8y7qqDbwCo5raKog1zVTXoTHk=; b=FSA3HbclTG/VUq/4IdV1ol7z1lsRggUH8Ffx/3rAGIpyfJ5SgzEnXNlZKIya4Ye4/d1x7r tzanLLn1mDy6UF/PdYemBcqRZ85CwaP4bw2AaK9jVa6DajEz91qejUu7/5o33IyXJIaelY gLa336/tBv2/PDdVtZl4kJzaT51WN/4= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-96-tJwSygBLMUeAppTVp53tcA-1; Wed, 29 Mar 2023 10:15:14 -0400 X-MC-Unique: tJwSygBLMUeAppTVp53tcA-1 Received: from smtp.corp.redhat.com (int-mx10.intmail.prod.int.rdu2.redhat.com [10.11.54.10]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 0E7F23C0ED4B; Wed, 29 Mar 2023 14:15:13 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.18]) by smtp.corp.redhat.com (Postfix) with ESMTP id 151BA492B01; Wed, 29 Mar 2023 14:15:10 +0000 (UTC) From: David Howells To: Matthew Wilcox , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni Cc: David Howells , Al Viro , Christoph Hellwig , Jens Axboe , Jeff Layton , Christian Brauner , Chuck Lever III , Linus Torvalds , netdev@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [RFC PATCH v2 27/48] splice, net: Use sendmsg(MSG_SPLICE_PAGES) rather than ->sendpage() Date: Wed, 29 Mar 2023 15:13:33 +0100 Message-Id: <20230329141354.516864-28-dhowells@redhat.com> In-Reply-To: <20230329141354.516864-1-dhowells@redhat.com> References: <20230329141354.516864-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.10 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1761713404236559018?= X-GMAIL-MSGID: =?utf-8?q?1761713404236559018?= Use sendmsg(MSG_SPLICE_PAGES) rather than ->sendpage() to splice data from a pipe to a socket. This paves the way for passing in multiple pages at once from a pipe and the handling of multipage folios. Signed-off-by: David Howells cc: "David S. Miller" cc: Eric Dumazet cc: Jakub Kicinski cc: Paolo Abeni cc: Jens Axboe cc: Matthew Wilcox cc: netdev@vger.kernel.org --- fs/splice.c | 42 +++++++++++++++++++++++------------------- include/linux/fs.h | 2 -- include/linux/splice.h | 2 ++ net/socket.c | 26 ++------------------------ 4 files changed, 27 insertions(+), 45 deletions(-) diff --git a/fs/splice.c b/fs/splice.c index f46dd1fb367b..23ead122d631 100644 --- a/fs/splice.c +++ b/fs/splice.c @@ -32,6 +32,7 @@ #include #include #include +#include #include #include @@ -410,29 +411,32 @@ const struct pipe_buf_operations nosteal_pipe_buf_ops = { }; EXPORT_SYMBOL(nosteal_pipe_buf_ops); +#ifdef CONFIG_NET /* * Send 'sd->len' bytes to socket from 'sd->file' at position 'sd->pos' * using sendpage(). Return the number of bytes sent. */ -static int pipe_to_sendpage(struct pipe_inode_info *pipe, - struct pipe_buffer *buf, struct splice_desc *sd) +static int pipe_to_sendmsg(struct pipe_inode_info *pipe, + struct pipe_buffer *buf, struct splice_desc *sd) { - struct file *file = sd->u.file; - loff_t pos = sd->pos; - int more; - - if (!likely(file->f_op->sendpage)) - return -EINVAL; + struct socket *sock = sock_from_file(sd->u.file); + struct bio_vec bvec; + struct msghdr msg = { + .msg_flags = MSG_SPLICE_PAGES, + }; - more = (sd->flags & SPLICE_F_MORE) ? MSG_MORE : 0; + if (sd->flags & SPLICE_F_MORE) + msg.msg_flags |= MSG_MORE; if (sd->len < sd->total_len && pipe_occupancy(pipe->head, pipe->tail) > 1) - more |= MSG_SENDPAGE_NOTLAST; + msg.msg_flags |= MSG_MORE; - return file->f_op->sendpage(file, buf->page, buf->offset, - sd->len, &pos, more); + bvec_set_page(&bvec, buf->page, sd->len, buf->offset); + iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, sd->len); + return sock_sendmsg(sock, &msg); } +#endif static void wakeup_pipe_writers(struct pipe_inode_info *pipe) { @@ -614,7 +618,7 @@ static void splice_from_pipe_end(struct pipe_inode_info *pipe, struct splice_des * Description: * This function does little more than loop over the pipe and call * @actor to do the actual moving of a single struct pipe_buffer to - * the desired destination. See pipe_to_file, pipe_to_sendpage, or + * the desired destination. See pipe_to_file, pipe_to_sendmsg, or * pipe_to_user. * */ @@ -795,8 +799,9 @@ iter_file_splice_write(struct pipe_inode_info *pipe, struct file *out, EXPORT_SYMBOL(iter_file_splice_write); +#ifdef CONFIG_NET /** - * generic_splice_sendpage - splice data from a pipe to a socket + * splice_to_socket - splice data from a pipe to a socket * @pipe: pipe to splice from * @out: socket to write to * @ppos: position in @out @@ -808,13 +813,12 @@ EXPORT_SYMBOL(iter_file_splice_write); * is involved. * */ -ssize_t generic_splice_sendpage(struct pipe_inode_info *pipe, struct file *out, - loff_t *ppos, size_t len, unsigned int flags) +ssize_t splice_to_socket(struct pipe_inode_info *pipe, struct file *out, + loff_t *ppos, size_t len, unsigned int flags) { - return splice_from_pipe(pipe, out, ppos, len, flags, pipe_to_sendpage); + return splice_from_pipe(pipe, out, ppos, len, flags, pipe_to_sendmsg); } - -EXPORT_SYMBOL(generic_splice_sendpage); +#endif static int warn_unsupported(struct file *file, const char *op) { diff --git a/include/linux/fs.h b/include/linux/fs.h index c85916e9f7db..f3ccc243851e 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2740,8 +2740,6 @@ extern ssize_t generic_file_splice_read(struct file *, loff_t *, struct pipe_inode_info *, size_t, unsigned int); extern ssize_t iter_file_splice_write(struct pipe_inode_info *, struct file *, loff_t *, size_t, unsigned int); -extern ssize_t generic_splice_sendpage(struct pipe_inode_info *pipe, - struct file *out, loff_t *, size_t len, unsigned int flags); extern long do_splice_direct(struct file *in, loff_t *ppos, struct file *out, loff_t *opos, size_t len, unsigned int flags); diff --git a/include/linux/splice.h b/include/linux/splice.h index 8f052c3dae95..e6153feda86c 100644 --- a/include/linux/splice.h +++ b/include/linux/splice.h @@ -87,6 +87,8 @@ extern long do_splice(struct file *in, loff_t *off_in, extern long do_tee(struct file *in, struct file *out, size_t len, unsigned int flags); +extern ssize_t splice_to_socket(struct pipe_inode_info *pipe, struct file *out, + loff_t *ppos, size_t len, unsigned int flags); /* * for dynamic pipe sizing diff --git a/net/socket.c b/net/socket.c index dfb912bbed62..2cd5c2bcdde8 100644 --- a/net/socket.c +++ b/net/socket.c @@ -57,6 +57,7 @@ #include #include #include +#include #include #include #include @@ -126,8 +127,6 @@ static long compat_sock_ioctl(struct file *file, unsigned int cmd, unsigned long arg); #endif static int sock_fasync(int fd, struct file *filp, int on); -static ssize_t sock_sendpage(struct file *file, struct page *page, - int offset, size_t size, loff_t *ppos, int more); static ssize_t sock_splice_read(struct file *file, loff_t *ppos, struct pipe_inode_info *pipe, size_t len, unsigned int flags); @@ -162,8 +161,7 @@ static const struct file_operations socket_file_ops = { .mmap = sock_mmap, .release = sock_close, .fasync = sock_fasync, - .sendpage = sock_sendpage, - .splice_write = generic_splice_sendpage, + .splice_write = splice_to_socket, .splice_read = sock_splice_read, .show_fdinfo = sock_show_fdinfo, }; @@ -1062,26 +1060,6 @@ int kernel_recvmsg(struct socket *sock, struct msghdr *msg, } EXPORT_SYMBOL(kernel_recvmsg); -static ssize_t sock_sendpage(struct file *file, struct page *page, - int offset, size_t size, loff_t *ppos, int more) -{ - struct socket *sock; - int flags; - int ret; - - sock = file->private_data; - - flags = (file->f_flags & O_NONBLOCK) ? MSG_DONTWAIT : 0; - /* more is a combination of MSG_MORE and MSG_SENDPAGE_NOTLAST */ - flags |= more; - - ret = kernel_sendpage(sock, page, offset, size, flags); - - if (trace_sock_send_length_enabled()) - call_trace_sock_send_length(sock->sk, ret, 0); - return ret; -} - static ssize_t sock_splice_read(struct file *file, loff_t *ppos, struct pipe_inode_info *pipe, size_t len, unsigned int flags) From patchwork Wed Mar 29 14:13:34 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 76640 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp463786vqo; Wed, 29 Mar 2023 07:40:53 -0700 (PDT) X-Google-Smtp-Source: AKy350Y0FzVBZl+X/mFIov+4KLvDWW9zzqef+HOPGPanbz4rCOtv4c8iZzpIErccrvlXpVrypIyY X-Received: by 2002:a17:90b:4a05:b0:23d:15d8:1bc3 with SMTP id kk5-20020a17090b4a0500b0023d15d81bc3mr21816925pjb.39.1680100853011; Wed, 29 Mar 2023 07:40:53 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680100852; cv=none; d=google.com; s=arc-20160816; b=Cpkh9aBCTlH+Srq6owFD59Nlp52MHGJVSmX74/JVp+96heZfoWiwiw7VOc5uQlkNy3 mFM6AAZZFfdyNxKaq+tGpyC8SJUvdA9odsqxDQVkQMQqrFAGZ0nCyIkWgq5O1u7/UugW rkdZ+DO5n9MmJtSZprDXa2h5Bq+R9lGAzooeVP239fVK/J5wfhtads7L60qJLhZU1y2p c+yMyp1GR0kc0FJqmR9vTQc3oO26lMNMdJJVbeEy78Z9J6j/LxhwtnOQIprUXnXDBj8r TSngApa5abXpG2HVXqEaj7Y8iyLArUFzGlCOFr143SWMq24GA4p9ed6iaJwwWFZBupBv ECug== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=+Zy7MVQlzWaHdY2cAkrhIxWsi1s53LxwKb8sGz4UC6E=; b=vW5E+oV2wPnN/g7WFTVzHmlDljMzPcAiZcTUX1lAJ9irjWzmlcY1+ILOkI+5k5/cay Eg9hV91TwUSX9FpgtXZumovvdXCNi5wy4gwIhSR0zNfDdPCbz7E+NA2QJTHpl1GUaDkI 49wvlWrSuY/+KAF8T1RTykGzpzQwFgor1vGypG2KjGveLdkDTTLrlNTaUdIseSCL4Kzc 4Blf8GGgSIR3UX5IBtSO0D8IQn/GNJw/D9exFzimiKbIE4V0lsnZ0JFkgBFNbib6aiaQ LdCiC2e9W7jQg51g1Ab3LHZx0OCYdjyC5eHBOd6vh+1NyXPZHoue3ARDBMPb0T+MUzeV BlZA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=JAP6ggSG; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id m1-20020a170902db0100b0019a96849d6dsi32966068plx.605.2023.03.29.07.40.40; Wed, 29 Mar 2023 07:40:52 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=JAP6ggSG; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230218AbjC2OUx (ORCPT + 99 others); Wed, 29 Mar 2023 10:20:53 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54738 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229584AbjC2OTL (ORCPT ); Wed, 29 Mar 2023 10:19:11 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B77285FFD for ; Wed, 29 Mar 2023 07:16:15 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1680099344; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=+Zy7MVQlzWaHdY2cAkrhIxWsi1s53LxwKb8sGz4UC6E=; b=JAP6ggSGHe+a4Dua/CKDo1wpHLQhekTjrLwTXQ+fQArLRgmc/Byk5i6PJR4Nq2cGPFiQ6m Qamh/0IS9Wt1Yhk/7riKyLbICsO97y6vb4O1XtD2ev8Crvd5zv8xSXoy30JMSLTZn9G5VL ZVzNwpf8EwPR3IJfmjmixb+tmMzcZUQ= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-441-MbBkqL9VMYWp7A_VXkoNTw-1; Wed, 29 Mar 2023 10:15:36 -0400 X-MC-Unique: MbBkqL9VMYWp7A_VXkoNTw-1 Received: from smtp.corp.redhat.com (int-mx10.intmail.prod.int.rdu2.redhat.com [10.11.54.10]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 9C09D299E75B; Wed, 29 Mar 2023 14:15:15 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.18]) by smtp.corp.redhat.com (Postfix) with ESMTP id B961E492C3E; Wed, 29 Mar 2023 14:15:13 +0000 (UTC) From: David Howells To: Matthew Wilcox , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni Cc: David Howells , Al Viro , Christoph Hellwig , Jens Axboe , Jeff Layton , Christian Brauner , Chuck Lever III , Linus Torvalds , netdev@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [RFC PATCH v2 28/48] splice: Reimplement splice_to_socket() to pass multiple bufs to sendmsg() Date: Wed, 29 Mar 2023 15:13:34 +0100 Message-Id: <20230329141354.516864-29-dhowells@redhat.com> In-Reply-To: <20230329141354.516864-1-dhowells@redhat.com> References: <20230329141354.516864-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.10 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1761713431860542817?= X-GMAIL-MSGID: =?utf-8?q?1761713431860542817?= Reimplement splice_to_socket() so that it can pass multiple pipe buffer pages to sendmsg() in a single go. Signed-off-by: David Howells --- fs/splice.c | 148 ++++++++++++++++++++++++++++++++++++++++++---------- 1 file changed, 120 insertions(+), 28 deletions(-) diff --git a/fs/splice.c b/fs/splice.c index 23ead122d631..d5bc28b59720 100644 --- a/fs/splice.c +++ b/fs/splice.c @@ -411,33 +411,6 @@ const struct pipe_buf_operations nosteal_pipe_buf_ops = { }; EXPORT_SYMBOL(nosteal_pipe_buf_ops); -#ifdef CONFIG_NET -/* - * Send 'sd->len' bytes to socket from 'sd->file' at position 'sd->pos' - * using sendpage(). Return the number of bytes sent. - */ -static int pipe_to_sendmsg(struct pipe_inode_info *pipe, - struct pipe_buffer *buf, struct splice_desc *sd) -{ - struct socket *sock = sock_from_file(sd->u.file); - struct bio_vec bvec; - struct msghdr msg = { - .msg_flags = MSG_SPLICE_PAGES, - }; - - if (sd->flags & SPLICE_F_MORE) - msg.msg_flags |= MSG_MORE; - - if (sd->len < sd->total_len && - pipe_occupancy(pipe->head, pipe->tail) > 1) - msg.msg_flags |= MSG_MORE; - - bvec_set_page(&bvec, buf->page, sd->len, buf->offset); - iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, sd->len); - return sock_sendmsg(sock, &msg); -} -#endif - static void wakeup_pipe_writers(struct pipe_inode_info *pipe) { smp_mb(); @@ -816,7 +789,126 @@ EXPORT_SYMBOL(iter_file_splice_write); ssize_t splice_to_socket(struct pipe_inode_info *pipe, struct file *out, loff_t *ppos, size_t len, unsigned int flags) { - return splice_from_pipe(pipe, out, ppos, len, flags, pipe_to_sendmsg); + struct socket *sock = sock_from_file(out); + struct bio_vec bvec[16]; + struct msghdr msg = {}; + ssize_t ret; + size_t spliced = 0; + bool need_wakeup = false; + + pipe_lock(pipe); + + while (len > 0) { + unsigned int head, tail, mask, bc = 0; + size_t remain = len; + + /* + * Check for signal early to make process killable when there + * are always buffers available + */ + ret = -ERESTARTSYS; + if (signal_pending(current)) + break; + + while (pipe_empty(pipe->head, pipe->tail)) { + ret = 0; + if (!pipe->writers) + goto out; + + if (spliced) + goto out; + + ret = -EAGAIN; + if (flags & SPLICE_F_NONBLOCK) + goto out; + + ret = -ERESTARTSYS; + if (signal_pending(current)) + goto out; + + if (need_wakeup) { + wakeup_pipe_writers(pipe); + need_wakeup = false; + } + + pipe_wait_readable(pipe); + } + + head = pipe->head; + tail = pipe->tail; + mask = pipe->ring_size - 1; + + while (!pipe_empty(head, tail)) { + struct pipe_buffer *buf = &pipe->bufs[tail & mask]; + size_t seg; + + if (!buf->len) { + tail++; + continue; + } + + seg = min_t(size_t, remain, buf->len); + seg = min_t(size_t, seg, PAGE_SIZE); + + ret = pipe_buf_confirm(pipe, buf); + if (unlikely(ret)) { + if (ret == -ENODATA) + ret = 0; + break; + } + + bvec_set_page(&bvec[bc++], buf->page, seg, buf->offset); + remain -= seg; + if (seg >= buf->len) + tail++; + if (bc >= ARRAY_SIZE(bvec)) + break; + } + + if (!bc) + break; + + msg.msg_flags = 0; + if (flags & SPLICE_F_MORE) + msg.msg_flags = MSG_MORE; + if (remain && pipe_occupancy(pipe->head, tail) > 0) + msg.msg_flags = MSG_MORE; + msg.msg_flags |= MSG_SPLICE_PAGES; + + iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, bvec, bc, len - remain); + ret = sock_sendmsg(sock, &msg); + if (ret <= 0) + break; + + spliced += ret; + len -= ret; + tail = pipe->tail; + while (ret > 0) { + struct pipe_buffer *buf = &pipe->bufs[tail & mask]; + size_t seg = min_t(size_t, ret, buf->len); + + buf->offset += seg; + buf->len -= seg; + ret -= seg; + + if (!buf->len) { + pipe_buf_release(pipe, buf); + tail++; + } + } + + if (tail != pipe->tail) { + pipe->tail = tail; + if (pipe->files) + need_wakeup = true; + } + } + +out: + pipe_unlock(pipe); + if (need_wakeup) + wakeup_pipe_writers(pipe); + return spliced ?: ret; } #endif From patchwork Wed Mar 29 14:13:35 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 76624 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp460913vqo; Wed, 29 Mar 2023 07:36:13 -0700 (PDT) X-Google-Smtp-Source: AKy350YNqzMjPE9xovOS7Y1RF/2xrIeeMRhHH5YhPYS7AFqQeWO4BiVQOpQGL9t8WHDRFkGprXlF X-Received: by 2002:a17:907:8c83:b0:939:a610:fc32 with SMTP id td3-20020a1709078c8300b00939a610fc32mr21795359ejc.53.1680100573207; Wed, 29 Mar 2023 07:36:13 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680100573; cv=none; d=google.com; s=arc-20160816; b=qt9khII25NBaAUN5tiV9Ttnin7JgKj4masEaAqnWB4bMOERT4xUs5Po/o1N4awxAJy N8V37aYXpQyEGPx9lOtwxNVk0rkc115JlrxB7qirmiHR3M0fx/Va+37bWC4ioYp1MANe VFGLX8Qeyh86KMzG0xTVU7eK3LVKvl6ZSW0+7whaWobycjCAHE01Senc6+W0+wh2vooI TiIzlw1BGRLYULTGuIh1+jgnXMYGFhzE1Bw1DawKPJdIy+FMesw4ajRGsRBMRORhK7ea ytU+he2LJCXxgBljtClZAivYP06vyA1qtjja9l6btfMvQPfcBqA/QuO6l3ynWhYINRJJ a3OA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=BwLwJU8AwjD+1uYy2mnm7V08kEOLBE2Z8d7v3yKcDiA=; b=yKoczOwxTj9uYIwDHFHuX0sNFaceWgBvBfLioArEiSWXGCHB2RM8TLc6iNBh9b9m41 uNE+d5xuVCvVdmgt83I2WlP0cYLPDOnT7DbtY4CJB62veLy/CEn4x5CxpUWR3iHoXFNm Oexi8bqfmTu15gyD0pouEiSmyjbd6u0ZPBlT4eRUw4WlUFrwp+1FJHK5DLzbF1qPg1iw q44cUkMev7/NZ3hoIteQgxhLsyMK9za38ojRWpIDhs2hCDEgLPWlbn8hg5lL/y5K9YpL Fxoo/JvcHeXVvcZ+9x8A9poVULytumxse913UEa1x33rskcE8wPlTZOs5ersiuimm573 22Hw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b="Knyp+6e/"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id f25-20020a17090631d900b0092c8da1e5edsi33845317ejf.611.2023.03.29.07.35.49; Wed, 29 Mar 2023 07:36:13 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b="Knyp+6e/"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229817AbjC2OUC (ORCPT + 99 others); Wed, 29 Mar 2023 10:20:02 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53590 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230491AbjC2OSU (ORCPT ); Wed, 29 Mar 2023 10:18:20 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C6A4E5FC2 for ; Wed, 29 Mar 2023 07:16:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1680099323; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=BwLwJU8AwjD+1uYy2mnm7V08kEOLBE2Z8d7v3yKcDiA=; b=Knyp+6e/CMDqJfYZx1lE88V7P5E+TnC6x22vyUwfEIUIw2E5g9zmPK+islTJu4WYUUqASW yHJfOD0thVwLM4c5yJIQw05fo2nv6+RBZK1z3Eue+DTBP1uK2WDFfEOCNgctQwhVk/BhU8 TYEwJl354NYUeiWthrPiS6zAhcPrcis= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-1-P4yHE0I5OgCg_PbyvAV0fw-1; Wed, 29 Mar 2023 10:15:19 -0400 X-MC-Unique: P4yHE0I5OgCg_PbyvAV0fw-1 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.rdu2.redhat.com [10.11.54.4]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 23AE23814951; Wed, 29 Mar 2023 14:15:18 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.18]) by smtp.corp.redhat.com (Postfix) with ESMTP id 3CE2E2027040; Wed, 29 Mar 2023 14:15:16 +0000 (UTC) From: David Howells To: Matthew Wilcox , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni Cc: David Howells , Al Viro , Christoph Hellwig , Jens Axboe , Jeff Layton , Christian Brauner , Chuck Lever III , Linus Torvalds , netdev@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [RFC PATCH v2 29/48] Remove file->f_op->sendpage Date: Wed, 29 Mar 2023 15:13:35 +0100 Message-Id: <20230329141354.516864-30-dhowells@redhat.com> In-Reply-To: <20230329141354.516864-1-dhowells@redhat.com> References: <20230329141354.516864-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.4 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1761713138676619513?= X-GMAIL-MSGID: =?utf-8?q?1761713138676619513?= Remove file->f_op->sendpage as splicing to a socket now calls sendmsg rather than sendpage. Signed-off-by: David Howells cc: "David S. Miller" cc: Eric Dumazet cc: Jakub Kicinski cc: Paolo Abeni cc: Jens Axboe cc: Matthew Wilcox cc: netdev@vger.kernel.org --- include/linux/fs.h | 1 - 1 file changed, 1 deletion(-) diff --git a/include/linux/fs.h b/include/linux/fs.h index f3ccc243851e..a9f1b2543d2c 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1773,7 +1773,6 @@ struct file_operations { int (*fsync) (struct file *, loff_t, loff_t, int datasync); int (*fasync) (int, struct file *, int); int (*lock) (struct file *, int, struct file_lock *); - ssize_t (*sendpage) (struct file *, struct page *, int, size_t, loff_t *, int); unsigned long (*get_unmapped_area)(struct file *, unsigned long, unsigned long, unsigned long, unsigned long); int (*check_flags)(int); int (*flock) (struct file *, int, struct file_lock *); From patchwork Wed Mar 29 14:13:36 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 76615 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp451795vqo; Wed, 29 Mar 2023 07:22:56 -0700 (PDT) X-Google-Smtp-Source: AKy350ZQe4jodbkmuxn5nv1fgkeWhgkFIIGXHf4KaSvdUk5LOEanV3UiIHKMOS35xWn9ByEohxor X-Received: by 2002:aa7:cd12:0:b0:4fe:9555:3a57 with SMTP id b18-20020aa7cd12000000b004fe95553a57mr17659757edw.16.1680099776479; Wed, 29 Mar 2023 07:22:56 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680099776; cv=none; d=google.com; s=arc-20160816; b=lZWCHAhr5n2IlDsfcR2Y/Fh5B0EA4eww9znh0on78ze5qtTa9FGRf1n9CjsQTpMY6h v/ZZSouhIlfaRShY/u+dIZ1+d+/VD158j/Innn9THFdnMkeoiTK4bH9trbcnE4ezsCuQ rlvDe5vtOUo8l9oBU2MGSnRbLGze4gcw02EvO8hDhQ6rF9jRlFFvpQxSL5WSo9wp4LXz UL74R2PpdZZi1V/HbuyKrn3Cf3IUh9v2R5oVZDDwKeTEOdsElhobL1DLB2Y+bP8yDzRu GzpEOABuRwQ0UFGzw6GGam1pMATATg75t2bfRR6ziLfhtSBVkVI3m0Gg82pzyJF1dB/C Ldlg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=uOfuvlKLCw3PwI6FlP4CG+iWtw5a+KyQbWE9E7aqyN8=; b=CFxzp1+9Ji65HI+siP92qyyDPOduhEJ924IwAP4s1dGJ34HGjKUmRMbhFX1vz0CZp4 0U+g9dK21994UBJmNEM6JR+1zIntQuoOg5VxeVVScEa/Mk3PIwyRKINCyCO2hA29CxQ0 uZC1GzJdkBu81kwsnDQXJ+R4k5qZcwTppnuC0SEjxVvgMDWpgTfASqtOzjI/TcHadcsI mFzDz7OXOWSBUE50g8vP1fSOsSqvBFjwv8LV4fWupJiQWWBnBab4JD1y7It6KB2xfDax Jo7VeC53t/ddjYKtiaaRiw2e123vAPEZhXw/m8LvXSw8EAaM+I1bcKmfrjN63YGX2lJO V73g== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b="eM0K/ZG3"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id f18-20020aa7d852000000b004fefeece418si34659664eds.284.2023.03.29.07.22.31; Wed, 29 Mar 2023 07:22:56 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b="eM0K/ZG3"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229765AbjC2OUO (ORCPT + 99 others); Wed, 29 Mar 2023 10:20:14 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55258 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230320AbjC2OSo (ORCPT ); Wed, 29 Mar 2023 10:18:44 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A865D5FCE for ; Wed, 29 Mar 2023 07:16:09 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1680099328; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=uOfuvlKLCw3PwI6FlP4CG+iWtw5a+KyQbWE9E7aqyN8=; b=eM0K/ZG3veR37/pj3kJKN2Sz8gRICgoLz9RV1gPySK2XAGtrzB/OKCg15J10VtIWBf1Dxn cwab88azAgyf6xf1YMcmi2oOPPb7dqvW7i1mk2gpnZYV9TUwFYQ0qciAjOfcVNi/taZlry 5u/qWah5A/FxeSZ6CtBH8nKr3Z+AhKk= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-70-Ff7TQ3BGOAynU7ztrXIohw-1; Wed, 29 Mar 2023 10:15:22 -0400 X-MC-Unique: Ff7TQ3BGOAynU7ztrXIohw-1 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.rdu2.redhat.com [10.11.54.4]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id E073A1C0758E; Wed, 29 Mar 2023 14:15:20 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.18]) by smtp.corp.redhat.com (Postfix) with ESMTP id B3D6F2027042; Wed, 29 Mar 2023 14:15:18 +0000 (UTC) From: David Howells To: Matthew Wilcox , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni Cc: David Howells , Al Viro , Christoph Hellwig , Jens Axboe , Jeff Layton , Christian Brauner , Chuck Lever III , Linus Torvalds , netdev@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Bernard Metzler , Tom Talpey , linux-rdma@vger.kernel.org Subject: [RFC PATCH v2 30/48] siw: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage to transmit Date: Wed, 29 Mar 2023 15:13:36 +0100 Message-Id: <20230329141354.516864-31-dhowells@redhat.com> In-Reply-To: <20230329141354.516864-1-dhowells@redhat.com> References: <20230329141354.516864-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.4 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1761712302941659193?= X-GMAIL-MSGID: =?utf-8?q?1761712302941659193?= When transmitting data, call down into TCP using a single sendmsg with MSG_SPLICE_PAGES to indicate that content should be spliced rather than performing several sendmsg and sendpage calls to transmit header, data pages and trailer. To make this work, the data is assembled in a bio_vec array and attached to a BVEC-type iterator. The header and trailer (if present) are copied into page fragments that can be freed with put_page(). Signed-off-by: David Howells cc: Bernard Metzler cc: Tom Talpey cc: "David S. Miller" cc: Eric Dumazet cc: Jakub Kicinski cc: Paolo Abeni cc: Jens Axboe cc: Matthew Wilcox cc: linux-rdma@vger.kernel.org cc: netdev@vger.kernel.org --- drivers/infiniband/sw/siw/siw_qp_tx.c | 234 ++++++-------------------- 1 file changed, 48 insertions(+), 186 deletions(-) diff --git a/drivers/infiniband/sw/siw/siw_qp_tx.c b/drivers/infiniband/sw/siw/siw_qp_tx.c index fa5de40d85d5..fbe80c06d0ca 100644 --- a/drivers/infiniband/sw/siw/siw_qp_tx.c +++ b/drivers/infiniband/sw/siw/siw_qp_tx.c @@ -312,114 +312,8 @@ static int siw_tx_ctrl(struct siw_iwarp_tx *c_tx, struct socket *s, return rv; } -/* - * 0copy TCP transmit interface: Use MSG_SPLICE_PAGES. - * - * Using sendpage to push page by page appears to be less efficient - * than using sendmsg, even if data are copied. - * - * A general performance limitation might be the extra four bytes - * trailer checksum segment to be pushed after user data. - */ -static int siw_tcp_sendpages(struct socket *s, struct page **page, int offset, - size_t size) -{ - struct bio_vec bvec; - struct msghdr msg = { - .msg_flags = (MSG_MORE | MSG_DONTWAIT | MSG_SENDPAGE_NOTLAST | - MSG_SPLICE_PAGES), - }; - struct sock *sk = s->sk; - int i = 0, rv = 0, sent = 0; - - while (size) { - size_t bytes = min_t(size_t, PAGE_SIZE - offset, size); - - if (size + offset <= PAGE_SIZE) - msg.msg_flags = MSG_MORE | MSG_DONTWAIT; - - tcp_rate_check_app_limited(sk); - bvec_set_page(&bvec, page[i], bytes, offset); - iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size); - -try_page_again: - lock_sock(sk); - rv = tcp_sendmsg_locked(sk, &msg, size); - release_sock(sk); - - if (rv > 0) { - size -= rv; - sent += rv; - if (rv != bytes) { - offset += rv; - bytes -= rv; - goto try_page_again; - } - offset = 0; - } else { - if (rv == -EAGAIN || rv == 0) - break; - return rv; - } - i++; - } - return sent; -} - -/* - * siw_0copy_tx() - * - * Pushes list of pages to TCP socket. If pages from multiple - * SGE's, all referenced pages of each SGE are pushed in one - * shot. - */ -static int siw_0copy_tx(struct socket *s, struct page **page, - struct siw_sge *sge, unsigned int offset, - unsigned int size) -{ - int i = 0, sent = 0, rv; - int sge_bytes = min(sge->length - offset, size); - - offset = (sge->laddr + offset) & ~PAGE_MASK; - - while (sent != size) { - rv = siw_tcp_sendpages(s, &page[i], offset, sge_bytes); - if (rv >= 0) { - sent += rv; - if (size == sent || sge_bytes > rv) - break; - - i += PAGE_ALIGN(sge_bytes + offset) >> PAGE_SHIFT; - sge++; - sge_bytes = min(sge->length, size - sent); - offset = sge->laddr & ~PAGE_MASK; - } else { - sent = rv; - break; - } - } - return sent; -} - #define MAX_TRAILER (MPA_CRC_SIZE + 4) -static void siw_unmap_pages(struct kvec *iov, unsigned long kmap_mask, int len) -{ - int i; - - /* - * Work backwards through the array to honor the kmap_local_page() - * ordering requirements. - */ - for (i = (len-1); i >= 0; i--) { - if (kmap_mask & BIT(i)) { - unsigned long addr = (unsigned long)iov[i].iov_base; - - kunmap_local((void *)(addr & PAGE_MASK)); - } - } -} - /* * siw_tx_hdt() tries to push a complete packet to TCP where all * packet fragments are referenced by the elements of one iovec. @@ -439,15 +333,14 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx, struct socket *s) { struct siw_wqe *wqe = &c_tx->wqe_active; struct siw_sge *sge = &wqe->sqe.sge[c_tx->sge_idx]; - struct kvec iov[MAX_ARRAY]; - struct page *page_array[MAX_ARRAY]; + struct bio_vec bvec[MAX_ARRAY]; struct msghdr msg = { .msg_flags = MSG_DONTWAIT | MSG_EOR }; + void *trl, *t; int seg = 0, do_crc = c_tx->do_crc, is_kva = 0, rv; unsigned int data_len = c_tx->bytes_unsent, hdr_len = 0, trl_len = 0, sge_off = c_tx->sge_off, sge_idx = c_tx->sge_idx, pbl_idx = c_tx->pbl_idx; - unsigned long kmap_mask = 0L; if (c_tx->state == SIW_SEND_HDR) { if (c_tx->use_sendpage) { @@ -457,10 +350,15 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx, struct socket *s) c_tx->state = SIW_SEND_DATA; } else { - iov[0].iov_base = - (char *)&c_tx->pkt.ctrl + c_tx->ctrl_sent; - iov[0].iov_len = hdr_len = - c_tx->ctrl_len - c_tx->ctrl_sent; + const void *hdr = &c_tx->pkt.ctrl + c_tx->ctrl_sent; + void *h; + + rv = -ENOMEM; + hdr_len = c_tx->ctrl_len - c_tx->ctrl_sent; + h = page_frag_memdup(NULL, hdr, hdr_len, GFP_NOFS, ULONG_MAX); + if (!h) + goto done; + bvec_set_virt(&bvec[0], h, hdr_len); seg = 1; } } @@ -478,28 +376,9 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx, struct socket *s) } else { is_kva = 1; } - if (is_kva && !c_tx->use_sendpage) { - /* - * tx from kernel virtual address: either inline data - * or memory region with assigned kernel buffer - */ - iov[seg].iov_base = - (void *)(uintptr_t)(sge->laddr + sge_off); - iov[seg].iov_len = sge_len; - - if (do_crc) - crypto_shash_update(c_tx->mpa_crc_hd, - iov[seg].iov_base, - sge_len); - sge_off += sge_len; - data_len -= sge_len; - seg++; - goto sge_done; - } while (sge_len) { size_t plen = min((int)PAGE_SIZE - fp_off, sge_len); - void *kaddr; if (!is_kva) { struct page *p; @@ -512,33 +391,12 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx, struct socket *s) p = siw_get_upage(mem->umem, sge->laddr + sge_off); if (unlikely(!p)) { - siw_unmap_pages(iov, kmap_mask, seg); wqe->processed -= c_tx->bytes_unsent; rv = -EFAULT; goto done_crc; } - page_array[seg] = p; - - if (!c_tx->use_sendpage) { - void *kaddr = kmap_local_page(p); - - /* Remember for later kunmap() */ - kmap_mask |= BIT(seg); - iov[seg].iov_base = kaddr + fp_off; - iov[seg].iov_len = plen; - - if (do_crc) - crypto_shash_update( - c_tx->mpa_crc_hd, - iov[seg].iov_base, - plen); - } else if (do_crc) { - kaddr = kmap_local_page(p); - crypto_shash_update(c_tx->mpa_crc_hd, - kaddr + fp_off, - plen); - kunmap_local(kaddr); - } + + bvec_set_page(&bvec[seg], p, plen, fp_off); } else { /* * Cast to an uintptr_t to preserve all 64 bits @@ -552,12 +410,15 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx, struct socket *s) * bits on a 64 bit platform and 32 bits on a * 32 bit platform. */ - page_array[seg] = virt_to_page((void *)(va & PAGE_MASK)); - if (do_crc) - crypto_shash_update( - c_tx->mpa_crc_hd, - (void *)va, - plen); + bvec_set_virt(&bvec[seg], (void *)va, plen); + } + + if (do_crc) { + void *kaddr = kmap_local_page(bvec[seg].bv_page); + crypto_shash_update(c_tx->mpa_crc_hd, + kaddr + bvec[seg].bv_offset, + bvec[seg].bv_len); + kunmap_local(kaddr); } sge_len -= plen; @@ -567,13 +428,12 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx, struct socket *s) if (++seg > (int)MAX_ARRAY) { siw_dbg_qp(tx_qp(c_tx), "to many fragments\n"); - siw_unmap_pages(iov, kmap_mask, seg-1); wqe->processed -= c_tx->bytes_unsent; rv = -EMSGSIZE; goto done_crc; } } -sge_done: + /* Update SGE variables at end of SGE */ if (sge_off == sge->length && (data_len != 0 || wqe->processed < wqe->bytes)) { @@ -582,15 +442,8 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx, struct socket *s) sge_off = 0; } } - /* trailer */ - if (likely(c_tx->state != SIW_SEND_TRAILER)) { - iov[seg].iov_base = &c_tx->trailer.pad[4 - c_tx->pad]; - iov[seg].iov_len = trl_len = MAX_TRAILER - (4 - c_tx->pad); - } else { - iov[seg].iov_base = &c_tx->trailer.pad[c_tx->ctrl_sent]; - iov[seg].iov_len = trl_len = MAX_TRAILER - c_tx->ctrl_sent; - } + /* Set the CRC in the trailer */ if (c_tx->pad) { *(u32 *)c_tx->trailer.pad = 0; if (do_crc) @@ -603,23 +456,29 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx, struct socket *s) else if (do_crc) crypto_shash_final(c_tx->mpa_crc_hd, (u8 *)&c_tx->trailer.crc); - data_len = c_tx->bytes_unsent; - - if (c_tx->use_sendpage) { - rv = siw_0copy_tx(s, page_array, &wqe->sqe.sge[c_tx->sge_idx], - c_tx->sge_off, data_len); - if (rv == data_len) { - rv = kernel_sendmsg(s, &msg, &iov[seg], 1, trl_len); - if (rv > 0) - rv += data_len; - else - rv = data_len; - } + /* Copy the trailer and add it to the output list */ + if (likely(c_tx->state != SIW_SEND_TRAILER)) { + trl = &c_tx->trailer.pad[4 - c_tx->pad]; + trl_len = MAX_TRAILER - (4 - c_tx->pad); } else { - rv = kernel_sendmsg(s, &msg, iov, seg + 1, - hdr_len + data_len + trl_len); - siw_unmap_pages(iov, kmap_mask, seg); + trl = &c_tx->trailer.pad[c_tx->ctrl_sent]; + trl_len = MAX_TRAILER - c_tx->ctrl_sent; } + + rv = -ENOMEM; + t = page_frag_memdup(NULL, trl, trl_len, GFP_NOFS, ULONG_MAX); + if (!t) + goto done_crc; + bvec_set_virt(&bvec[seg], t, trl_len); + + data_len = c_tx->bytes_unsent; + + if (c_tx->use_sendpage) + msg.msg_flags |= MSG_SPLICE_PAGES; + iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, bvec, seg + 1, + hdr_len + data_len + trl_len); + rv = sock_sendmsg(s, &msg); + if (rv < (int)hdr_len) { /* Not even complete hdr pushed or negative rv */ wqe->processed -= data_len; @@ -680,6 +539,9 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx, struct socket *s) } done_crc: c_tx->do_crc = 0; + if (c_tx->state == SIW_SEND_HDR) + folio_put(page_folio(bvec[0].bv_page)); + folio_put(page_folio(bvec[seg].bv_page)); done: return rv; } From patchwork Wed Mar 29 14:13:37 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 76643 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp463978vqo; Wed, 29 Mar 2023 07:41:13 -0700 (PDT) X-Google-Smtp-Source: AKy350ZP8C34hCtZjSo+uCE5NtokqlNOna8uHFXVI1GFx29g9mxAlrF03mvbGyyxmbfDIaCi8lh+ X-Received: by 2002:a17:906:9f15:b0:931:c7fd:10b1 with SMTP id fy21-20020a1709069f1500b00931c7fd10b1mr20023732ejc.19.1680100873463; Wed, 29 Mar 2023 07:41:13 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680100873; cv=none; d=google.com; s=arc-20160816; b=GDOOLnRdhmu5uhwe8zxE7wMBhrIBDK7dKm/03gft7cib7Ap8h6QgWtKi/Ci4d3uqOV CbohIpoRoL/agQoQ+aqrOMzeiLd0p/zgRNmhOvhgzlnFd4BEWKsyqr1c1DbCAFL/AaBM dbb1mC9bfufPwqQsXO7alI2G46qtkXEVer9l2DdqtYkro0B5/WB5V1qPAHWg/gLQU7zL zPoQqlxSlUWpiE+afAXqfs4QMY/EBDSy4lhh9r4cQP0R7kub8crM14v0RxVMnuxp7fca dk4b77m9OAfEjp25wxWGvlpeyEbW0Cqn2bucyDm2YGxZHo1UpWnmwGJRsM4F+vlQaW3V YGAg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=mcxOsbWhFQyukCMWrPOKlIpXJGoD9GGTGJfL3NnBqs4=; b=r7CQIY4D7F26DMdBkopMf1Rz3kcIghF+siTSrgy/3T7uJF2XMTFgmgHQ0BDBO1JNHw HdZt4Eq7XGk6DCP5hXmzp6PYCwACN5CBlkarSuzYas7zqEIU/AlG3oyaWvu5z/Y8TnNI Hghrmg9GAbcWmP7742NUsltTYVAMobajD83qEL28sP89WKEgxmn7PMMGP724VfOpcnPe E2Vdk4Og1n2cmDtNn9pRbiKDuhL7oox8jczWr9kXbq4DY+Jw0SZVzFllmzlgG7Sg9VhX J0jfu8k9N5C1+Bz9jHMN2BQq6vwUVj1QJIwMDley+s9qF9PjGtrfvX6dbecxdgVqYcCe hF8w== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b="a/E1p7X+"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id hr12-20020a1709073f8c00b009333f8775a1si35985471ejc.182.2023.03.29.07.40.49; Wed, 29 Mar 2023 07:41:13 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b="a/E1p7X+"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231282AbjC2OUL (ORCPT + 99 others); Wed, 29 Mar 2023 10:20:11 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54764 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229926AbjC2OSn (ORCPT ); Wed, 29 Mar 2023 10:18:43 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6E6D05FCB for ; Wed, 29 Mar 2023 07:16:11 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1680099331; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=mcxOsbWhFQyukCMWrPOKlIpXJGoD9GGTGJfL3NnBqs4=; b=a/E1p7X+VK4eYyhZoGsKLQk9MXRbyUMNb+6pJQaV/x0yMw8OHI2wqobntO9gFUiRkewHgh wPu86bid/r8Jcj40FJ2SQ4ifZSYc85IAe7NMDiAy8T1xkVYX32bgidrBLjJTBQKFMNHBQ+ lBMX3iUetpa8uBFXCTBpGlZmqYpBK2k= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-67-R6b5KJCXPSqhwAA81fk1XQ-1; Wed, 29 Mar 2023 10:15:26 -0400 X-MC-Unique: R6b5KJCXPSqhwAA81fk1XQ-1 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.rdu2.redhat.com [10.11.54.7]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 1BD933C0ED4D; Wed, 29 Mar 2023 14:15:24 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.18]) by smtp.corp.redhat.com (Postfix) with ESMTP id 9FF1D14171BB; Wed, 29 Mar 2023 14:15:21 +0000 (UTC) From: David Howells To: Matthew Wilcox , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni Cc: David Howells , Al Viro , Christoph Hellwig , Jens Axboe , Jeff Layton , Christian Brauner , Chuck Lever III , Linus Torvalds , netdev@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Ilya Dryomov , Xiubo Li , ceph-devel@vger.kernel.org Subject: [RFC PATCH v2 31/48] ceph: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage Date: Wed, 29 Mar 2023 15:13:37 +0100 Message-Id: <20230329141354.516864-32-dhowells@redhat.com> In-Reply-To: <20230329141354.516864-1-dhowells@redhat.com> References: <20230329141354.516864-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.7 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1761713453388406666?= X-GMAIL-MSGID: =?utf-8?q?1761713453388406666?= Use sendmsg() and MSG_SPLICE_PAGES rather than sendpage in ceph when transmitting data. For the moment, this can only transmit one page at a time because of the architecture of net/ceph/, but if write_partial_message_data() can be given a bvec[] at a time by the iteration code, this would allow pages to be sent in a batch. Signed-off-by: David Howells cc: Ilya Dryomov cc: Xiubo Li cc: Jeff Layton cc: "David S. Miller" cc: Eric Dumazet cc: Jakub Kicinski cc: Paolo Abeni cc: Jens Axboe cc: Matthew Wilcox cc: ceph-devel@vger.kernel.org cc: netdev@vger.kernel.org --- net/ceph/messenger_v1.c | 58 ++++++++++++++--------------------------- 1 file changed, 19 insertions(+), 39 deletions(-) diff --git a/net/ceph/messenger_v1.c b/net/ceph/messenger_v1.c index d664cb1593a7..b2d801a49122 100644 --- a/net/ceph/messenger_v1.c +++ b/net/ceph/messenger_v1.c @@ -74,37 +74,6 @@ static int ceph_tcp_sendmsg(struct socket *sock, struct kvec *iov, return r; } -/* - * @more: either or both of MSG_MORE and MSG_SENDPAGE_NOTLAST - */ -static int ceph_tcp_sendpage(struct socket *sock, struct page *page, - int offset, size_t size, int more) -{ - ssize_t (*sendpage)(struct socket *sock, struct page *page, - int offset, size_t size, int flags); - int flags = MSG_DONTWAIT | MSG_NOSIGNAL | more; - int ret; - - /* - * sendpage cannot properly handle pages with page_count == 0, - * we need to fall back to sendmsg if that's the case. - * - * Same goes for slab pages: skb_can_coalesce() allows - * coalescing neighboring slab objects into a single frag which - * triggers one of hardened usercopy checks. - */ - if (sendpage_ok(page)) - sendpage = sock->ops->sendpage; - else - sendpage = sock_no_sendpage; - - ret = sendpage(sock, page, offset, size, flags); - if (ret == -EAGAIN) - ret = 0; - - return ret; -} - static void con_out_kvec_reset(struct ceph_connection *con) { BUG_ON(con->v1.out_skip); @@ -464,7 +433,6 @@ static int write_partial_message_data(struct ceph_connection *con) struct ceph_msg *msg = con->out_msg; struct ceph_msg_data_cursor *cursor = &msg->cursor; bool do_datacrc = !ceph_test_opt(from_msgr(con->msgr), NOCRC); - int more = MSG_MORE | MSG_SENDPAGE_NOTLAST; u32 crc; dout("%s %p msg %p\n", __func__, con, msg); @@ -482,6 +450,10 @@ static int write_partial_message_data(struct ceph_connection *con) */ crc = do_datacrc ? le32_to_cpu(msg->footer.data_crc) : 0; while (cursor->total_resid) { + struct bio_vec bvec; + struct msghdr msghdr = { + .msg_flags = MSG_SPLICE_PAGES | MSG_SENDPAGE_NOTLAST, + }; struct page *page; size_t page_offset; size_t length; @@ -494,9 +466,12 @@ static int write_partial_message_data(struct ceph_connection *con) page = ceph_msg_data_next(cursor, &page_offset, &length); if (length == cursor->total_resid) - more = MSG_MORE; - ret = ceph_tcp_sendpage(con->sock, page, page_offset, length, - more); + msghdr.msg_flags |= MSG_MORE; + + bvec_set_page(&bvec, page, length, page_offset); + iov_iter_bvec(&msghdr.msg_iter, ITER_SOURCE, &bvec, 1, length); + + ret = sock_sendmsg(con->sock, &msghdr); if (ret <= 0) { if (do_datacrc) msg->footer.data_crc = cpu_to_le32(crc); @@ -526,7 +501,10 @@ static int write_partial_message_data(struct ceph_connection *con) */ static int write_partial_skip(struct ceph_connection *con) { - int more = MSG_MORE | MSG_SENDPAGE_NOTLAST; + struct bio_vec bvec; + struct msghdr msghdr = { + .msg_flags = MSG_SPLICE_PAGES | MSG_SENDPAGE_NOTLAST | MSG_MORE, + }; int ret; dout("%s %p %d left\n", __func__, con, con->v1.out_skip); @@ -534,9 +512,11 @@ static int write_partial_skip(struct ceph_connection *con) size_t size = min(con->v1.out_skip, (int)PAGE_SIZE); if (size == con->v1.out_skip) - more = MSG_MORE; - ret = ceph_tcp_sendpage(con->sock, ceph_zero_page, 0, size, - more); + msghdr.msg_flags &= ~MSG_SENDPAGE_NOTLAST; + bvec_set_page(&bvec, ZERO_PAGE(0), size, 0); + iov_iter_bvec(&msghdr.msg_iter, ITER_SOURCE, &bvec, 1, size); + + ret = sock_sendmsg(con->sock, &msghdr); if (ret <= 0) goto out; con->v1.out_skip -= ret; From patchwork Wed Mar 29 14:13:38 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 76649 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp465220vqo; Wed, 29 Mar 2023 07:43:18 -0700 (PDT) X-Google-Smtp-Source: AKy350bTDsH3hOF5f4veLjWs5cNaDeWq7L4yOu6N/3iWgApdAiWSlbJAXPquB4FAwdUIFUVti1qy X-Received: by 2002:aa7:d052:0:b0:4fb:999:e052 with SMTP id n18-20020aa7d052000000b004fb0999e052mr18427893edo.33.1680100998390; Wed, 29 Mar 2023 07:43:18 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680100998; cv=none; d=google.com; s=arc-20160816; b=b1rnHiRyq53jL1JRhz6ydzavFBkIqch2Zr9vBAlBOMOqT5OUAdSo86YNZMPvOqGhSL TNAiSMgy2epz0y3xGk6r4y7MGbh+4ERxNgVQEFjlOwKGFeBWxS/3afa8aNxCI0hbh6xz q5PkzrzKOhdnhUMMKff/eSAwTwaG9edwbONilyzwTi3sUP5Hm2ti1LkAdOEGYMor+Fov 7NU0pxGTbpoXiwau+hMyKsLCrONuEf7Nta1Zt+e93Wlu9gNlxGk5xmcUKJay9QomIV9O dLze4O8/DlRRoUvOO69nyc5Qn2PhDqPAHFJhUmSRX8aHWebdJwUyNoGd8LY0irYJtiJp S5Fw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=gZm+OH3nuU2w5lgBIDJ5zMpRPGFGHnStb+oddo8E9cs=; b=faja+yRE3LfKGt41uTkZiRjgjdSCjUQ+I9xmczkP2hIVNRzBML+SR3TpQhaDMn+dgV 0yX0Dm4yflzzAvRaUyu3KYeQbWZlf3jtYOquXQ+z9FIR9Wo6VTU2ci3agiIDCHKqVfWs gfpAgCYG3i5aXvqur2s+K8zyq6E6Lw5f2+Op76AsrRhseygl1yugZL06o5uOPoYWqaqS oaGhZQ2uI8DOaouLRBbumVJgKp+DNv7YjHXO6ugTRXM/PbAlV+Edo8+L/fuW9XM/CRuz hJw4e/XYdpAfQg4w71btWJlKejqL/30nx+FsaP54hr9RO1iHgAH9IVASlIn45u7I0m4L C5kQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=I1WHxx0o; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id f18-20020aa7d852000000b004fefeece418si34691023eds.284.2023.03.29.07.42.53; Wed, 29 Mar 2023 07:43:18 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=I1WHxx0o; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230325AbjC2Ofe (ORCPT + 99 others); Wed, 29 Mar 2023 10:35:34 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60476 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230298AbjC2OfE (ORCPT ); Wed, 29 Mar 2023 10:35:04 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id F1E19A27A for ; Wed, 29 Mar 2023 07:30:06 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1680100140; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=gZm+OH3nuU2w5lgBIDJ5zMpRPGFGHnStb+oddo8E9cs=; b=I1WHxx0oKxb3xw1E3IDISaYERyOHdrksNpeNqnPni2A7+l/7KAHfG+nHI4Ib5N6i18a+C1 az/nZenbsOwC1nFLYnaFV98sSCJ4bFzMZGDGzGCA5HQ8p3Np+ixgUHIj9JEckJB7F3AZed 1jpjY3G1a7/y0wn7MjPvx7t/a5cebbc= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-427-ZeMxTUzWPHqK1lFKrZJGyg-1; Wed, 29 Mar 2023 10:15:27 -0400 X-MC-Unique: ZeMxTUzWPHqK1lFKrZJGyg-1 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id DB5A61C0758C; Wed, 29 Mar 2023 14:15:26 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.18]) by smtp.corp.redhat.com (Postfix) with ESMTP id B2C022166B33; Wed, 29 Mar 2023 14:15:24 +0000 (UTC) From: David Howells To: Matthew Wilcox , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni Cc: David Howells , Al Viro , Christoph Hellwig , Jens Axboe , Jeff Layton , Christian Brauner , Chuck Lever III , Linus Torvalds , netdev@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, "Martin K. Petersen" , linux-scsi@vger.kernel.org, target-devel@vger.kernel.org Subject: [RFC PATCH v2 32/48] iscsi: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage Date: Wed, 29 Mar 2023 15:13:38 +0100 Message-Id: <20230329141354.516864-33-dhowells@redhat.com> In-Reply-To: <20230329141354.516864-1-dhowells@redhat.com> References: <20230329141354.516864-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.6 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1761713584592268058?= X-GMAIL-MSGID: =?utf-8?q?1761713584592268058?= Use sendmsg() with MSG_SPLICE_PAGES rather than sendpage. This allows multiple pages and multipage folios to be passed through. TODO: iscsit_fe_sendpage_sg() should perhaps set up a bio_vec array for the entire set of pages it's going to transfer plus two for the header and trailer and page fragments to hold the header and trailer - and then call sendmsg once for the entire message. Signed-off-by: David Howells cc: "Martin K. Petersen" cc: "David S. Miller" cc: Eric Dumazet cc: Jakub Kicinski cc: Paolo Abeni cc: Jens Axboe cc: Matthew Wilcox cc: linux-scsi@vger.kernel.org cc: target-devel@vger.kernel.org cc: netdev@vger.kernel.org --- drivers/scsi/iscsi_tcp.c | 31 ++++++++++++------------ drivers/scsi/iscsi_tcp.h | 2 +- drivers/target/iscsi/iscsi_target_util.c | 14 ++++++----- 3 files changed, 24 insertions(+), 23 deletions(-) diff --git a/drivers/scsi/iscsi_tcp.c b/drivers/scsi/iscsi_tcp.c index c76f82fb8b63..cf3eb55d2a76 100644 --- a/drivers/scsi/iscsi_tcp.c +++ b/drivers/scsi/iscsi_tcp.c @@ -301,35 +301,37 @@ static int iscsi_sw_tcp_xmit_segment(struct iscsi_tcp_conn *tcp_conn, while (!iscsi_tcp_segment_done(tcp_conn, segment, 0, r)) { struct scatterlist *sg; + struct msghdr msg = {}; + union { + struct kvec kv; + struct bio_vec bv; + } vec; unsigned int offset, copy; - int flags = 0; r = 0; offset = segment->copied; copy = segment->size - offset; if (segment->total_copied + segment->size < segment->total_size) - flags |= MSG_MORE | MSG_SENDPAGE_NOTLAST; + msg.msg_flags |= MSG_MORE | MSG_SENDPAGE_NOTLAST; if (tcp_sw_conn->queue_recv) - flags |= MSG_DONTWAIT; + msg.msg_flags |= MSG_DONTWAIT; - /* Use sendpage if we can; else fall back to sendmsg */ if (!segment->data) { + if (tcp_conn->iscsi_conn->datadgst_en) + msg.msg_flags |= MSG_SPLICE_PAGES; sg = segment->sg; offset += segment->sg_offset + sg->offset; - r = tcp_sw_conn->sendpage(sk, sg_page(sg), offset, - copy, flags); + bvec_set_page(&vec.bv, sg_page(sg), copy, offset); + iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &vec.bv, 1, copy); } else { - struct msghdr msg = { .msg_flags = flags }; - struct kvec iov = { - .iov_base = segment->data + offset, - .iov_len = copy - }; - - r = kernel_sendmsg(sk, &msg, &iov, 1, copy); + vec.kv.iov_base = segment->data + offset; + vec.kv.iov_len = copy; + iov_iter_kvec(&msg.msg_iter, ITER_SOURCE, &vec.kv, 1, copy); } + r = sock_sendmsg(sk, &msg); if (r < 0) { iscsi_tcp_segment_unmap(segment); return r; @@ -746,7 +748,6 @@ iscsi_sw_tcp_conn_bind(struct iscsi_cls_session *cls_session, sock_no_linger(sk); iscsi_sw_tcp_conn_set_callbacks(conn); - tcp_sw_conn->sendpage = tcp_sw_conn->sock->ops->sendpage; /* * set receive state machine into initial state */ @@ -778,8 +779,6 @@ static int iscsi_sw_tcp_conn_set_param(struct iscsi_cls_conn *cls_conn, mutex_unlock(&tcp_sw_conn->sock_lock); return -ENOTCONN; } - tcp_sw_conn->sendpage = conn->datadgst_en ? - sock_no_sendpage : tcp_sw_conn->sock->ops->sendpage; mutex_unlock(&tcp_sw_conn->sock_lock); break; case ISCSI_PARAM_MAX_R2T: diff --git a/drivers/scsi/iscsi_tcp.h b/drivers/scsi/iscsi_tcp.h index 68e14a344904..d6ec08d7eb63 100644 --- a/drivers/scsi/iscsi_tcp.h +++ b/drivers/scsi/iscsi_tcp.h @@ -48,7 +48,7 @@ struct iscsi_sw_tcp_conn { uint32_t sendpage_failures_cnt; uint32_t discontiguous_hdr_cnt; - ssize_t (*sendpage)(struct socket *, struct page *, int, size_t, int); + bool can_splice_to_tcp; }; struct iscsi_sw_tcp_host { diff --git a/drivers/target/iscsi/iscsi_target_util.c b/drivers/target/iscsi/iscsi_target_util.c index 26dc8ed3045b..c7d58e41ac3b 100644 --- a/drivers/target/iscsi/iscsi_target_util.c +++ b/drivers/target/iscsi/iscsi_target_util.c @@ -1078,6 +1078,8 @@ int iscsit_fe_sendpage_sg( struct iscsit_conn *conn) { struct scatterlist *sg = cmd->first_data_sg; + struct bio_vec bvec; + struct msghdr msghdr = { .msg_flags = MSG_SPLICE_PAGES, }; struct kvec iov; u32 tx_hdr_size, data_len; u32 offset = cmd->first_data_sg_off; @@ -1121,17 +1123,17 @@ int iscsit_fe_sendpage_sg( u32 space = (sg->length - offset); u32 sub_len = min_t(u32, data_len, space); send_pg: - tx_sent = conn->sock->ops->sendpage(conn->sock, - sg_page(sg), sg->offset + offset, sub_len, 0); + bvec_set_page(&bvec, sg_page(sg), sub_len, sg->offset + offset); + iov_iter_bvec(&msghdr.msg_iter, ITER_SOURCE, &bvec, 1, sub_len); + + tx_sent = conn->sock->ops->sendmsg(conn->sock, &msghdr, sub_len); if (tx_sent != sub_len) { if (tx_sent == -EAGAIN) { - pr_err("tcp_sendpage() returned" - " -EAGAIN\n"); + pr_err("sendmsg/splice returned -EAGAIN\n"); goto send_pg; } - pr_err("tcp_sendpage() failure: %d\n", - tx_sent); + pr_err("sendmsg/splice failure: %d\n", tx_sent); return -1; } From patchwork Wed Mar 29 14:13:39 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 76616 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp452884vqo; Wed, 29 Mar 2023 07:24:29 -0700 (PDT) X-Google-Smtp-Source: AKy350ZMp+sH1F13hsprtR/YejNmRHIGWhu/+WicO7Pjw+RL/by6JDLEye8rFHX0IycM2t/2VDTa X-Received: by 2002:a05:6402:5146:b0:502:1cae:8b11 with SMTP id n6-20020a056402514600b005021cae8b11mr22063262edd.23.1680099869236; Wed, 29 Mar 2023 07:24:29 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680099869; cv=none; d=google.com; s=arc-20160816; b=ZmzZHcslJM6cS3sm+RAVeleJHmeZbUSU02Cnmde7z+IzNMQVHJ+JFEFpy58ep43DFu NZMCdA+ytQjLuwrksv5Q72JiFRUFYkTZOACpX3CHYyUG51IhsBdlB7HMCL/lFJWbvUsb MVV9XIkDfThPmThBL9Fo8dAsAKgV5wrlXg0xkKZ9rjlcMVAJp8TYi1WPBhx0B0PgFCY1 EfdAh02WEbcRYwsFm1Tpvwlq9VWH2NNRe5hYke8bRwG7s4XNGO6lUX5LVLXiqxDEFCS/ 1YOpaMf18+clKv0Z8l5im5ii6+aGTVyp/bmcUX7InAzlaoWUd4Z+phCXuQHtBU2mxJcS KWfw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=0V0VYuNRGUsctVySLnEH8GinGQ4l5nQcc0Sx6Q5aq0A=; b=TklD62jnLogU40fANN50iOHuthQPaFnZM2Efqrx4s0+rPiaFhLYkMsSW+LRm/ffwQ9 caPRsjhsnzFDkF3YoMIs+NRWkmql8DDU5pVhGyhoGHQM9CWMmdzjEPXlBF4YoQB8fRNx IzhCnJvdccRLs8RAOjUKbMXCoNEtxd6ZYH+bQCMmsWksENbcOqfp3Otaw9GPXeGccIZ+ aKkonbHC1ujhF47IYkjfn2qHsyWdB5vqNVn3YJ6EnDYygP5qNKlB1zQcnUXyi0BdHBto QHL2niNMpt751J1AUbOKcK8Xh7hpWqNoHoMwh1yJJAI/oz9BUBwZR6jthUSFmFDdgcPw 5gpQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=c78w6wnc; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id o21-20020aa7d3d5000000b004acc6c7a631si30574811edr.179.2023.03.29.07.24.04; Wed, 29 Mar 2023 07:24:29 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=c78w6wnc; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231298AbjC2OUc (ORCPT + 99 others); Wed, 29 Mar 2023 10:20:32 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54932 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231205AbjC2OSq (ORCPT ); Wed, 29 Mar 2023 10:18:46 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 638A15FEE for ; Wed, 29 Mar 2023 07:16:13 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1680099337; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=0V0VYuNRGUsctVySLnEH8GinGQ4l5nQcc0Sx6Q5aq0A=; b=c78w6wncXK2A7qKBNp/fqp/QLdog2d7a7+Y5riDnZFpKgCuQJ2A3HhS6otgSGW1eMfIgI1 uVeWcEMz3sZgfa5fdgtpVM1MHiGRO7STJ6WiN2FnnpiqYwwV0G2ocWgMLnXfKEPV8GrZBr oW+sF5ijcJT9WBVx6yfv7g/mFiq7EpM= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-167-kCehxZq9PD6SIf6owZvSow-1; Wed, 29 Mar 2023 10:15:33 -0400 X-MC-Unique: kCehxZq9PD6SIf6owZvSow-1 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.rdu2.redhat.com [10.11.54.3]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id C7FF3185A790; Wed, 29 Mar 2023 14:15:29 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.18]) by smtp.corp.redhat.com (Postfix) with ESMTP id 9BD4C1121330; Wed, 29 Mar 2023 14:15:27 +0000 (UTC) From: David Howells To: Matthew Wilcox , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni Cc: David Howells , Al Viro , Christoph Hellwig , Jens Axboe , Jeff Layton , Christian Brauner , Chuck Lever III , Linus Torvalds , netdev@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, "Martin K. Petersen" , linux-scsi@vger.kernel.org, target-devel@vger.kernel.org Subject: [RFC PATCH v2 33/48] iscsi: Assume "sendpage" is okay in iscsi_tcp_segment_map() Date: Wed, 29 Mar 2023 15:13:39 +0100 Message-Id: <20230329141354.516864-34-dhowells@redhat.com> In-Reply-To: <20230329141354.516864-1-dhowells@redhat.com> References: <20230329141354.516864-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.3 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1761712400785599983?= X-GMAIL-MSGID: =?utf-8?q?1761712400785599983?= As iscsi is now using sendmsg() with MSG_SPLICE_PAGES rather than sendpage, assume that sendpage_ok() will return true in iscsi_tcp_segment_map() and leave it to TCP to copy the data if not. Signed-off-by: David Howells cc: "Martin K. Petersen" cc: "David S. Miller" cc: Eric Dumazet cc: Jakub Kicinski cc: Paolo Abeni cc: Jens Axboe cc: Matthew Wilcox cc: linux-scsi@vger.kernel.org cc: target-devel@vger.kernel.org cc: netdev@vger.kernel.org --- drivers/scsi/libiscsi_tcp.c | 13 +++---------- 1 file changed, 3 insertions(+), 10 deletions(-) diff --git a/drivers/scsi/libiscsi_tcp.c b/drivers/scsi/libiscsi_tcp.c index c182aa83f2c9..07ba0d864820 100644 --- a/drivers/scsi/libiscsi_tcp.c +++ b/drivers/scsi/libiscsi_tcp.c @@ -128,18 +128,11 @@ static void iscsi_tcp_segment_map(struct iscsi_segment *segment, int recv) * coalescing neighboring slab objects into a single frag which * triggers one of hardened usercopy checks. */ - if (!recv && sendpage_ok(sg_page(sg))) + if (!recv) return; - if (recv) { - segment->atomic_mapped = true; - segment->sg_mapped = kmap_atomic(sg_page(sg)); - } else { - segment->atomic_mapped = false; - /* the xmit path can sleep with the page mapped so use kmap */ - segment->sg_mapped = kmap(sg_page(sg)); - } - + segment->atomic_mapped = true; + segment->sg_mapped = kmap_atomic(sg_page(sg)); segment->data = segment->sg_mapped + sg->offset + segment->sg_offset; } From patchwork Wed Mar 29 14:13:40 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 76629 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp462210vqo; Wed, 29 Mar 2023 07:38:19 -0700 (PDT) X-Google-Smtp-Source: AKy350ZvqD9wZuGdbHgzzF43m5j96s8rEHzcBxYnF3jpuuiaW81iNVxdP5//gxCY0lm3P5JoPjwm X-Received: by 2002:aa7:cfce:0:b0:502:2386:d22 with SMTP id r14-20020aa7cfce000000b0050223860d22mr17231426edy.12.1680100699452; Wed, 29 Mar 2023 07:38:19 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680100699; cv=none; d=google.com; s=arc-20160816; b=dbn8OsL/6S0iR3qQELYX7Taxcp62J/9vq70RP9x5voDVo9T+vk2V4/bPXryfF28Xei qxCCzfKmjzPpFhjoGemIpMqinPWGUp7Qc/t9E6R7iQHkS8UHVJ7VOLl6yrquYrZpvpht WCkSXXLLuzCXXLn/84RVRSIsaYpMW4yEgpINja1K3fKahK/vC8v/PTz4+QEXS7G+70qo utdR2iUCjiLNuGafT4Yt6LaVrmNvlsEowyD82FHj1ExaVsmir3qs53k31pSw9+p3zlhg X702TDny+oHkQJoNatGJxOdxW5is1oDasCZxeUpOd9XcVUkULzLvlXx7CZYbrI3lKPYK qIng== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=cLmy3QT3PxPcXOXP8Tmrcj7ottmHeL6VPbhO66kxXIU=; b=S9VWdfXethaN/XFIi7C9U8qLT/heGCANXGVXXVkJzABxThLFflhMLq/nsCqPOb9vmp qc5V3Icv9B4OMM+gqj+mfv42wT3NkLX+WPBp3AIUsFcktjgVdl0lWi6GPEAFG/kCbEMA 0r7ra00hd8Ge8W/S9bso/FCOoaGxmMpl/hoGe3lCAl0JJ4lX+8sWVbG383DUNGXV4Yw+ vegv1UDk7aDnONU7QniMJUR+fZ0Eu7JgPkacqtAvOIamzYBuILuCeB80Fm+mjY6dJ0V+ BDQbFKlyEAk+rZ/B6LRQJLBXzyEmfyOCbN36ZEUm0iTal3ZAa/Q23rxfpTIv2ZPEd1NC aPqw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=MRNZQ6Vu; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id ba15-20020a0564021acf00b00502259b5834si13470742edb.564.2023.03.29.07.37.55; Wed, 29 Mar 2023 07:38:19 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=MRNZQ6Vu; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231251AbjC2OWD (ORCPT + 99 others); Wed, 29 Mar 2023 10:22:03 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55582 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231154AbjC2OUB (ORCPT ); Wed, 29 Mar 2023 10:20:01 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B8A0F6192 for ; Wed, 29 Mar 2023 07:16:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1680099349; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=cLmy3QT3PxPcXOXP8Tmrcj7ottmHeL6VPbhO66kxXIU=; b=MRNZQ6VuwSXy8iJSEi700uM4Qhf1ypJnP2vHxbfw5EpVAZcIM6GWJwCYWIyonsCgCGssqV 2CZVky1svM45pT7B8iBI5Z5diuY4IHAVGS/7Z8LzLcgpVBceGMnKv9Wa40g70IHh0v7l9F OFyyGkfC1wTDGI6sof/8hKyfovWfglE= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-407-avHEhrC2M-SiXN4UZyk2qA-1; Wed, 29 Mar 2023 10:15:42 -0400 X-MC-Unique: avHEhrC2M-SiXN4UZyk2qA-1 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.rdu2.redhat.com [10.11.54.4]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 9129B85530C; Wed, 29 Mar 2023 14:15:32 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.18]) by smtp.corp.redhat.com (Postfix) with ESMTP id 64BD62027040; Wed, 29 Mar 2023 14:15:30 +0000 (UTC) From: David Howells To: Matthew Wilcox , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni Cc: David Howells , Al Viro , Christoph Hellwig , Jens Axboe , Jeff Layton , Christian Brauner , Chuck Lever III , Linus Torvalds , netdev@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, John Fastabend , Jakub Sitnicki , bpf@vger.kernel.org Subject: [RFC PATCH v2 34/48] tcp_bpf: Make tcp_bpf_sendpage() go through tcp_bpf_sendmsg(MSG_SPLICE_PAGES) Date: Wed, 29 Mar 2023 15:13:40 +0100 Message-Id: <20230329141354.516864-35-dhowells@redhat.com> In-Reply-To: <20230329141354.516864-1-dhowells@redhat.com> References: <20230329141354.516864-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.4 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1761713270893901666?= X-GMAIL-MSGID: =?utf-8?q?1761713270893901666?= Translate tcp_bpf_sendpage() calls to tcp_bpf_sendmsg(MSG_SPLICE_PAGES). Signed-off-by: David Howells cc: John Fastabend cc: Jakub Sitnicki cc: "David S. Miller" cc: Eric Dumazet cc: Jakub Kicinski cc: Paolo Abeni cc: Jens Axboe cc: Matthew Wilcox cc: bpf@vger.kernel.org cc: netdev@vger.kernel.org --- net/ipv4/tcp_bpf.c | 49 +++++++++------------------------------------- 1 file changed, 9 insertions(+), 40 deletions(-) diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c index 7f17134637eb..de37a4372437 100644 --- a/net/ipv4/tcp_bpf.c +++ b/net/ipv4/tcp_bpf.c @@ -485,49 +485,18 @@ static int tcp_bpf_sendmsg(struct sock *sk, struct msghdr *msg, size_t size) static int tcp_bpf_sendpage(struct sock *sk, struct page *page, int offset, size_t size, int flags) { - struct sk_msg tmp, *msg = NULL; - int err = 0, copied = 0; - struct sk_psock *psock; - bool enospc = false; - - psock = sk_psock_get(sk); - if (unlikely(!psock)) - return tcp_sendpage(sk, page, offset, size, flags); + struct bio_vec bvec; + struct msghdr msg = { + .msg_flags = flags | MSG_SPLICE_PAGES, + }; - lock_sock(sk); - if (psock->cork) { - msg = psock->cork; - } else { - msg = &tmp; - sk_msg_init(msg); - } + bvec_set_page(&bvec, page, size, offset); + iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size); - /* Catch case where ring is full and sendpage is stalled. */ - if (unlikely(sk_msg_full(msg))) - goto out_err; - - sk_msg_page_add(msg, page, size, offset); - sk_mem_charge(sk, size); - copied = size; - if (sk_msg_full(msg)) - enospc = true; - if (psock->cork_bytes) { - if (size > psock->cork_bytes) - psock->cork_bytes = 0; - else - psock->cork_bytes -= size; - if (psock->cork_bytes && !enospc) - goto out_err; - /* All cork bytes are accounted, rerun the prog. */ - psock->eval = __SK_NONE; - psock->cork_bytes = 0; - } + if (flags & MSG_SENDPAGE_NOTLAST) + msg.msg_flags |= MSG_MORE; - err = tcp_bpf_send_verdict(sk, psock, msg, &copied, flags); -out_err: - release_sock(sk); - sk_psock_put(sk, psock); - return copied ? copied : err; + return tcp_bpf_sendmsg(sk, &msg, size); } enum { From patchwork Wed Mar 29 14:13:41 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 76646 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp464274vqo; Wed, 29 Mar 2023 07:41:42 -0700 (PDT) X-Google-Smtp-Source: AKy350ba7dI8rZsWE8ROUzizBhugOBdQSvJE6Hq1jukfqJX5ApK7AwaAbk3o4lAbParHhXWarUZ6 X-Received: by 2002:a05:6402:483:b0:502:4c82:9b4d with SMTP id k3-20020a056402048300b005024c829b4dmr8736279edv.15.1680100902736; Wed, 29 Mar 2023 07:41:42 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680100902; cv=none; d=google.com; s=arc-20160816; b=eAYyY/Om608XgocuN+DJt1NWHiX+U77WKvdxcOdp+Vrr/Ggddt2pxccGQj7B8EitYn wZfD3BHAgrbCljjp3Z1efowB3rbRFXc3t0nU9/M+sn8YBwNLq8Nx78LcazKkj54u3lGp T8r7J6SLU5fTLnX0oKHQQbLC1Ga60+D666bkzzI/BIEetuoVUd9Sz5M8So7SjsvJJH1E hGAj15eCqpfn7q9xkr1TDO5lnR3BRG5YcAQrJBbZVH5zQudjUDL1SWdEmWoJC6rvGmHb fbcXXz1zIz84CGygGhnyhLSdCFP95kHTUgiN9rJ9lN2yq/GZV0Tz6Z9N2Yf+ISJ5fmeY WSWg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=lLisHPaWAEED9gZclxyhbGIkHvgJDoUiXuyeeqOKByg=; b=yOGdSj1WttC4Dr7uZvP5rKZcZTGbqoNa7Giv5c+I66JgKstB4y/9rdizD0704fiR04 AnBDQXPa3DMSb6pQ0aN1Nx8BY8bWkU1+2YpAZ/X6epfw2z8569Nk8vS3+l8MQg7Wf6j/ nvRxGKcBxrsBmlWy3P7xm4sKKgo8QqHK0kYrZ1y1WvYv3WsplBwfCB4po59wm/Uvq/bJ GOSrQ3fURQgA/a1ZNoUw/JOWIE8Bv7M/GFmOnF/pMZfInUu0UjT0vHYsJj4JDUyjtjHn hGIe3uQf5Td366s4SpUBBD6HfBdmhy9tSW/5+6iIdu6UeC5hl9jJDNuIsk1LW9XXFlvu kw7w== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=LvRTUrX4; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id ba15-20020a0564021acf00b00502259b5834si13470742edb.564.2023.03.29.07.41.18; Wed, 29 Mar 2023 07:41:42 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=LvRTUrX4; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231343AbjC2OU7 (ORCPT + 99 others); Wed, 29 Mar 2023 10:20:59 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53276 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230476AbjC2OT3 (ORCPT ); Wed, 29 Mar 2023 10:19:29 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C54BC618C for ; Wed, 29 Mar 2023 07:16:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1680099341; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=lLisHPaWAEED9gZclxyhbGIkHvgJDoUiXuyeeqOKByg=; b=LvRTUrX4603T3Ke/GLhKecQqHMQG41JEdN/2/iR2EmYnrKM5+uw6Rcz2VvT0WvdZ084H88 PCnx4hGQqsCIfZwdyILjWhE7aBBLYin0/GLRZM55AF6OQWSTgBZvAeapsVUqO71BUOUTjR S3Mm9H0bD2jti06OA/5ny6VP/vDOQeQ= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-663-69Q30cO8PE2HT8MQERo7Mg-1; Wed, 29 Mar 2023 10:15:36 -0400 X-MC-Unique: 69Q30cO8PE2HT8MQERo7Mg-1 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.rdu2.redhat.com [10.11.54.4]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 33DA1101A552; Wed, 29 Mar 2023 14:15:35 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.18]) by smtp.corp.redhat.com (Postfix) with ESMTP id 4C5DF202701E; Wed, 29 Mar 2023 14:15:33 +0000 (UTC) From: David Howells To: Matthew Wilcox , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni Cc: David Howells , Al Viro , Christoph Hellwig , Jens Axboe , Jeff Layton , Christian Brauner , Chuck Lever III , Linus Torvalds , netdev@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [RFC PATCH v2 35/48] net: Use sendmsg(MSG_SPLICE_PAGES) not sendpage in skb_send_sock() Date: Wed, 29 Mar 2023 15:13:41 +0100 Message-Id: <20230329141354.516864-36-dhowells@redhat.com> In-Reply-To: <20230329141354.516864-1-dhowells@redhat.com> References: <20230329141354.516864-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.4 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1761713484283137100?= X-GMAIL-MSGID: =?utf-8?q?1761713484283137100?= Use sendmsg() with MSG_SPLICE_PAGES rather than sendpage in skb_send_sock(). This causes pages to be spliced from the source iterator if possible (the iterator must be ITER_BVEC and the pages must be spliceable). This allows ->sendpage() to be replaced by something that can handle multiple multipage folios in a single transaction. Note that this could perhaps be improved to fill out a bvec array with all the frags and then make a single sendmsg call, possibly sticking the header on the front also. Signed-off-by: David Howells cc: "David S. Miller" cc: Eric Dumazet cc: Jakub Kicinski cc: Paolo Abeni cc: Jens Axboe cc: Matthew Wilcox cc: netdev@vger.kernel.org --- net/core/skbuff.c | 49 ++++++++++++++++++++++++++--------------------- 1 file changed, 27 insertions(+), 22 deletions(-) diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 0506e4cf1ed9..3693b3526d33 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -2919,32 +2919,32 @@ int skb_splice_bits(struct sk_buff *skb, struct sock *sk, unsigned int offset, } EXPORT_SYMBOL_GPL(skb_splice_bits); -static int sendmsg_unlocked(struct sock *sk, struct msghdr *msg, - struct kvec *vec, size_t num, size_t size) +static int sendmsg_locked(struct sock *sk, struct msghdr *msg) { struct socket *sock = sk->sk_socket; + size_t size = msg_data_left(msg); if (!sock) return -EINVAL; - return kernel_sendmsg(sock, msg, vec, num, size); + + if (!sock->ops->sendmsg_locked) + return sock_no_sendmsg_locked(sk, msg, size); + + return sock->ops->sendmsg_locked(sk, msg, size); } -static int sendpage_unlocked(struct sock *sk, struct page *page, int offset, - size_t size, int flags) +static int sendmsg_unlocked(struct sock *sk, struct msghdr *msg) { struct socket *sock = sk->sk_socket; if (!sock) return -EINVAL; - return kernel_sendpage(sock, page, offset, size, flags); + return sock_sendmsg(sock, msg); } -typedef int (*sendmsg_func)(struct sock *sk, struct msghdr *msg, - struct kvec *vec, size_t num, size_t size); -typedef int (*sendpage_func)(struct sock *sk, struct page *page, int offset, - size_t size, int flags); +typedef int (*sendmsg_func)(struct sock *sk, struct msghdr *msg); static int __skb_send_sock(struct sock *sk, struct sk_buff *skb, int offset, - int len, sendmsg_func sendmsg, sendpage_func sendpage) + int len, sendmsg_func sendmsg) { unsigned int orig_len = len; struct sk_buff *head = skb; @@ -2964,8 +2964,9 @@ static int __skb_send_sock(struct sock *sk, struct sk_buff *skb, int offset, memset(&msg, 0, sizeof(msg)); msg.msg_flags = MSG_DONTWAIT; - ret = INDIRECT_CALL_2(sendmsg, kernel_sendmsg_locked, - sendmsg_unlocked, sk, &msg, &kv, 1, slen); + iov_iter_kvec(&msg.msg_iter, ITER_SOURCE, &kv, 1, slen); + ret = INDIRECT_CALL_2(sendmsg, sendmsg_locked, + sendmsg_unlocked, sk, &msg); if (ret <= 0) goto error; @@ -2996,11 +2997,17 @@ static int __skb_send_sock(struct sock *sk, struct sk_buff *skb, int offset, slen = min_t(size_t, len, skb_frag_size(frag) - offset); while (slen) { - ret = INDIRECT_CALL_2(sendpage, kernel_sendpage_locked, - sendpage_unlocked, sk, - skb_frag_page(frag), - skb_frag_off(frag) + offset, - slen, MSG_DONTWAIT); + struct bio_vec bvec; + struct msghdr msg = { + .msg_flags = MSG_SPLICE_PAGES | MSG_DONTWAIT, + }; + + bvec_set_page(&bvec, skb_frag_page(frag), slen, + skb_frag_off(frag) + offset); + iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, slen); + + ret = INDIRECT_CALL_2(sendmsg, sendmsg_locked, + sendmsg_unlocked, sk, &msg); if (ret <= 0) goto error; @@ -3037,16 +3044,14 @@ static int __skb_send_sock(struct sock *sk, struct sk_buff *skb, int offset, int skb_send_sock_locked(struct sock *sk, struct sk_buff *skb, int offset, int len) { - return __skb_send_sock(sk, skb, offset, len, kernel_sendmsg_locked, - kernel_sendpage_locked); + return __skb_send_sock(sk, skb, offset, len, sendmsg_locked); } EXPORT_SYMBOL_GPL(skb_send_sock_locked); /* Send skb data on a socket. Socket must be unlocked. */ int skb_send_sock(struct sock *sk, struct sk_buff *skb, int offset, int len) { - return __skb_send_sock(sk, skb, offset, len, sendmsg_unlocked, - sendpage_unlocked); + return __skb_send_sock(sk, skb, offset, len, sendmsg_unlocked); } /** From patchwork Wed Mar 29 14:13:42 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 76626 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp461440vqo; Wed, 29 Mar 2023 07:36:59 -0700 (PDT) X-Google-Smtp-Source: AKy350YdYStANn5j3Z+poFS0wsWdn84MwONewcYNasBe6Ir+9jxSSkNvCi8AyVj6nDCP4TgQHcjU X-Received: by 2002:a17:906:a04a:b0:8b1:bab0:aa3d with SMTP id bg10-20020a170906a04a00b008b1bab0aa3dmr19677205ejb.8.1680100619594; Wed, 29 Mar 2023 07:36:59 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680100619; cv=none; d=google.com; s=arc-20160816; b=B/hSMEJHoarkIWFnmfAxsHo2p7lX+nNMpVcZ2Ov6Z3a1ujvYtkUKHDpGk5RrcycKFo yf1acF59CJiqPnnU3g6bv4nxLjxJ/wjuorciK29XEiH2LyF9hJTQJK/OhvKWg2e8FMWN 5BK2AKQVZVDRxJcBsPjRUkWnjQFm9v+YBZKw+T/uH2kjjEreyHrogL9gYjb4923h9bZN i9+lClCwAnpVGK11DZSA1XUtF+IMdhoBoTj+wCUn6bFXniRmt5wqPKAg20br47PuqqBj ahovV1US9+NzoqRdwLCwFNbcX2xPYDPFD+BwjygA2Qk1yzPoIgzYtDxp2JvvKQssLqe6 p5Yw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=taClZ7tV+6NGiC2MYzyqiyjdYhpNZJPUqY1OQ7XAIPM=; b=s8fapgAFxGajjfnyGY1P4GvAkd19jRTzttImpUi+JT2uc7m5OnFxsLj8x30LruLkcV RWDlpDG3IAgsS/3m2vfrSOhrY3Ku+1Dij/JCzoh2M9ZbEqdg1scUewm/2DzNLdUSIRkr VhFIFGqhup0M+FHtaP46rm1HWP/BSDBiKHn6/KYeS7JsTS6mLoZc0ZKDv/FcJx/uITvH +WMKqYAAh9tKW8AJBKjmD669+FDZB9wDDxIxsPiC2ElPHHeYHuA+IX7zhf+fDeQLRYSY fVSc/Vbf1gQE/hcRehdVg69cLvrEYIGpAQGtHDUKSpoZ+ndYSBErVecngW4aL0uaCHSa 3RwQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=Jc8qpajj; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id j12-20020a170906474c00b0092ff0294c61si33834258ejs.710.2023.03.29.07.36.35; Wed, 29 Mar 2023 07:36:59 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=Jc8qpajj; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231267AbjC2OUG (ORCPT + 99 others); Wed, 29 Mar 2023 10:20:06 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54686 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230153AbjC2OSn (ORCPT ); Wed, 29 Mar 2023 10:18:43 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5761D5FE5 for ; Wed, 29 Mar 2023 07:16:12 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1680099343; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=taClZ7tV+6NGiC2MYzyqiyjdYhpNZJPUqY1OQ7XAIPM=; b=Jc8qpajjkpFdAbOomvM1AH/NzjbVJv13HpkqgUw0s3hSm7V+Mtgkqk0c9aWzcRabCBYNlc ETHeaetHEaZNwXcq3eM08fimOYNv/6YUO5TyCGVkbymnVwJQpOC1LZOobovU+ZuGFt/Oa2 UqkP+fziPepT3vWuR168MqsaXGHgkIY= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-572-EfC1k2O2OHKPPpyIGoOAVw-1; Wed, 29 Mar 2023 10:15:39 -0400 X-MC-Unique: EfC1k2O2OHKPPpyIGoOAVw-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.rdu2.redhat.com [10.11.54.2]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id DCF561C0758D; Wed, 29 Mar 2023 14:15:37 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.18]) by smtp.corp.redhat.com (Postfix) with ESMTP id C9F6B4020C83; Wed, 29 Mar 2023 14:15:35 +0000 (UTC) From: David Howells To: Matthew Wilcox , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni Cc: David Howells , Al Viro , Christoph Hellwig , Jens Axboe , Jeff Layton , Christian Brauner , Chuck Lever III , Linus Torvalds , netdev@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Herbert Xu , linux-crypto@vger.kernel.org Subject: [RFC PATCH v2 36/48] algif: Remove hash_sendpage*() Date: Wed, 29 Mar 2023 15:13:42 +0100 Message-Id: <20230329141354.516864-37-dhowells@redhat.com> In-Reply-To: <20230329141354.516864-1-dhowells@redhat.com> References: <20230329141354.516864-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.2 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1761713187272013165?= X-GMAIL-MSGID: =?utf-8?q?1761713187272013165?= Remove hash_sendpage*().. Signed-off-by: David Howells cc: Herbert Xu cc: "David S. Miller" cc: Eric Dumazet cc: Jakub Kicinski cc: Paolo Abeni cc: Jens Axboe cc: Matthew Wilcox cc: linux-crypto@vger.kernel.org cc: netdev@vger.kernel.org --- crypto/algif_hash.c | 66 --------------------------------------------- 1 file changed, 66 deletions(-) diff --git a/crypto/algif_hash.c b/crypto/algif_hash.c index b89c2c50cecc..dc6c45637b2d 100644 --- a/crypto/algif_hash.c +++ b/crypto/algif_hash.c @@ -162,58 +162,6 @@ static int hash_sendmsg(struct socket *sock, struct msghdr *msg, goto unlock; } -static ssize_t hash_sendpage(struct socket *sock, struct page *page, - int offset, size_t size, int flags) -{ - struct sock *sk = sock->sk; - struct alg_sock *ask = alg_sk(sk); - struct hash_ctx *ctx = ask->private; - int err; - - if (flags & MSG_SENDPAGE_NOTLAST) - flags |= MSG_MORE; - - lock_sock(sk); - sg_init_table(ctx->sgl.sgl, 1); - sg_set_page(ctx->sgl.sgl, page, size, offset); - - if (!(flags & MSG_MORE)) { - err = hash_alloc_result(sk, ctx); - if (err) - goto unlock; - } else if (!ctx->more) - hash_free_result(sk, ctx); - - ahash_request_set_crypt(&ctx->req, ctx->sgl.sgl, ctx->result, size); - - if (!(flags & MSG_MORE)) { - if (ctx->more) - err = crypto_ahash_finup(&ctx->req); - else - err = crypto_ahash_digest(&ctx->req); - } else { - if (!ctx->more) { - err = crypto_ahash_init(&ctx->req); - err = crypto_wait_req(err, &ctx->wait); - if (err) - goto unlock; - } - - err = crypto_ahash_update(&ctx->req); - } - - err = crypto_wait_req(err, &ctx->wait); - if (err) - goto unlock; - - ctx->more = flags & MSG_MORE; - -unlock: - release_sock(sk); - - return err ?: size; -} - static int hash_recvmsg(struct socket *sock, struct msghdr *msg, size_t len, int flags) { @@ -318,7 +266,6 @@ static struct proto_ops algif_hash_ops = { .release = af_alg_release, .sendmsg = hash_sendmsg, - .sendpage = hash_sendpage, .recvmsg = hash_recvmsg, .accept = hash_accept, }; @@ -370,18 +317,6 @@ static int hash_sendmsg_nokey(struct socket *sock, struct msghdr *msg, return hash_sendmsg(sock, msg, size); } -static ssize_t hash_sendpage_nokey(struct socket *sock, struct page *page, - int offset, size_t size, int flags) -{ - int err; - - err = hash_check_key(sock); - if (err) - return err; - - return hash_sendpage(sock, page, offset, size, flags); -} - static int hash_recvmsg_nokey(struct socket *sock, struct msghdr *msg, size_t ignored, int flags) { @@ -420,7 +355,6 @@ static struct proto_ops algif_hash_ops_nokey = { .release = af_alg_release, .sendmsg = hash_sendmsg_nokey, - .sendpage = hash_sendpage_nokey, .recvmsg = hash_recvmsg_nokey, .accept = hash_accept_nokey, }; From patchwork Wed Mar 29 14:13:43 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 76630 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp462284vqo; Wed, 29 Mar 2023 07:38:27 -0700 (PDT) X-Google-Smtp-Source: AKy350aWncTRnohfchy7eXt4YenaNdRtUuUC5Ue4rlTRyE3ebpT3nbtGlSuX/ovkmMdHBhfQNtfq X-Received: by 2002:aa7:d39a:0:b0:4fd:2b05:aa2 with SMTP id x26-20020aa7d39a000000b004fd2b050aa2mr20365104edq.42.1680100706820; Wed, 29 Mar 2023 07:38:26 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680100706; cv=none; d=google.com; s=arc-20160816; b=UZROzUUhcuipcbUmCjETXmmmN9SoJj4g12lm6gktdIn0SJXWWiWy6/U8ISozup8VJ5 31ZQFGG+RzKmJ63+Rm26HY7N1IbEB2mJ9zG+oK9c8pvT5oQqjt38tfT9ro5JT9jWc0+k H9xmXi0FkoUefVDR1IHttqL+NyVgZt5JA7vg3uxLdB8707aqX2KxQ00qbNia5x/TOY+c gGjC+55ago51CxUA/h1Uj+csJ6lUAwRdv0Y7lwU8KerrGkLgeP+J5eZ3BVJc7ioZ1mjA GZquNHlUR58uROb1Ny7ryX+pOCzHS5mzB9hA6q/aoNr+Rdex5zijUEHPt1QcmdVv5jW1 Supg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=14pAi6JHwU3C53h/ZQrPmmNB396J3kzB8Cmz3sg/vjY=; b=BNuDkdcDxfbFZxGRFR5fWNFx6i1Hh2mzUoDkksoRrjIo+L+sxFDsOK5ZnWZ4RVMv89 aXisTePUwTEhtZSM3lWTnSwU8tSnL3AWHIeqCGdeq3Z6wconzO2N700rGwx8DoeiCcPk cTom390U34QPt5Tk9jCt9Di6tmGcZNt0rFEdmIvAPZBNlmBJCT1wlNvDtvwQCMddxtA1 Lka05S7+5U4wzT1+i0aZxatPbnp9r2qie3QBKlYbBSkSqiB+pGEsBq5kNsdcPje9BVSj /8vJrj9mnleAlh+Yl5+U/6rH5EODth9ZZEJuMYlSATi2n4dnaFPuJUE2Jj7IE4ucem04 FzdA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=gJrdPpv5; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id m10-20020aa7c2ca000000b005002e6f0f60si30559402edp.576.2023.03.29.07.38.02; Wed, 29 Mar 2023 07:38:26 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=gJrdPpv5; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231373AbjC2OWk (ORCPT + 99 others); Wed, 29 Mar 2023 10:22:40 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54634 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231303AbjC2OUi (ORCPT ); Wed, 29 Mar 2023 10:20:38 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 51EA8469C for ; Wed, 29 Mar 2023 07:16:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1680099349; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=14pAi6JHwU3C53h/ZQrPmmNB396J3kzB8Cmz3sg/vjY=; b=gJrdPpv5S42qPFERzLXmo86tUHBqZD2Jq9IQ49EeeCl9TRfvxCFOpvv04kJ5r3jmq5mh3U I2CZCeBdgwIMXRxfaCByxQQ3HWhF01jXwU4Yf31skXhiZkMbydEZIdBMCeFnrff6IDCgfK THtMn9/QlpeNyK1UGTVFpMVjp72iMhg= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-618-VHJgX1wyNguGyTVDaKNkkw-1; Wed, 29 Mar 2023 10:15:42 -0400 X-MC-Unique: VHJgX1wyNguGyTVDaKNkkw-1 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.rdu2.redhat.com [10.11.54.7]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id C9AAA802D1A; Wed, 29 Mar 2023 14:15:40 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.18]) by smtp.corp.redhat.com (Postfix) with ESMTP id 9E5D714171BB; Wed, 29 Mar 2023 14:15:38 +0000 (UTC) From: David Howells To: Matthew Wilcox , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni Cc: David Howells , Al Viro , Christoph Hellwig , Jens Axboe , Jeff Layton , Christian Brauner , Chuck Lever III , Linus Torvalds , netdev@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Ilya Dryomov , Xiubo Li , ceph-devel@vger.kernel.org Subject: [RFC PATCH v2 37/48] ceph: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage() Date: Wed, 29 Mar 2023 15:13:43 +0100 Message-Id: <20230329141354.516864-38-dhowells@redhat.com> In-Reply-To: <20230329141354.516864-1-dhowells@redhat.com> References: <20230329141354.516864-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.7 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1761713278395796523?= X-GMAIL-MSGID: =?utf-8?q?1761713278395796523?= Use sendmsg() and MSG_SPLICE_PAGES rather than sendpage in ceph when transmitting data. For the moment, this can only transmit one page at a time because of the architecture of net/ceph/, but if write_partial_message_data() can be given a bvec[] at a time by the iteration code, this would allow pages to be sent in a batch. Signed-off-by: David Howells cc: Ilya Dryomov cc: Xiubo Li cc: Jeff Layton cc: "David S. Miller" cc: Eric Dumazet cc: Jakub Kicinski cc: Paolo Abeni cc: Jens Axboe cc: Matthew Wilcox cc: ceph-devel@vger.kernel.org cc: netdev@vger.kernel.org --- net/ceph/messenger_v2.c | 89 +++++++++-------------------------------- 1 file changed, 18 insertions(+), 71 deletions(-) diff --git a/net/ceph/messenger_v2.c b/net/ceph/messenger_v2.c index 301a991dc6a6..1637a0c21126 100644 --- a/net/ceph/messenger_v2.c +++ b/net/ceph/messenger_v2.c @@ -117,91 +117,38 @@ static int ceph_tcp_recv(struct ceph_connection *con) return ret; } -static int do_sendmsg(struct socket *sock, struct iov_iter *it) -{ - struct msghdr msg = { .msg_flags = CEPH_MSG_FLAGS }; - int ret; - - msg.msg_iter = *it; - while (iov_iter_count(it)) { - ret = sock_sendmsg(sock, &msg); - if (ret <= 0) { - if (ret == -EAGAIN) - ret = 0; - return ret; - } - - iov_iter_advance(it, ret); - } - - WARN_ON(msg_data_left(&msg)); - return 1; -} - -static int do_try_sendpage(struct socket *sock, struct iov_iter *it) -{ - struct msghdr msg = { .msg_flags = CEPH_MSG_FLAGS }; - struct bio_vec bv; - int ret; - - if (WARN_ON(!iov_iter_is_bvec(it))) - return -EINVAL; - - while (iov_iter_count(it)) { - /* iov_iter_iovec() for ITER_BVEC */ - bvec_set_page(&bv, it->bvec->bv_page, - min(iov_iter_count(it), - it->bvec->bv_len - it->iov_offset), - it->bvec->bv_offset + it->iov_offset); - - /* - * sendpage cannot properly handle pages with - * page_count == 0, we need to fall back to sendmsg if - * that's the case. - * - * Same goes for slab pages: skb_can_coalesce() allows - * coalescing neighboring slab objects into a single frag - * which triggers one of hardened usercopy checks. - */ - if (sendpage_ok(bv.bv_page)) { - ret = sock->ops->sendpage(sock, bv.bv_page, - bv.bv_offset, bv.bv_len, - CEPH_MSG_FLAGS); - } else { - iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bv, 1, bv.bv_len); - ret = sock_sendmsg(sock, &msg); - } - if (ret <= 0) { - if (ret == -EAGAIN) - ret = 0; - return ret; - } - - iov_iter_advance(it, ret); - } - - return 1; -} - /* * Write as much as possible. The socket is expected to be corked, * so we don't bother with MSG_MORE/MSG_SENDPAGE_NOTLAST here. * * Return: - * 1 - done, nothing (else) to write + * >0 - done, nothing (else) to write * 0 - socket is full, need to wait * <0 - error */ static int ceph_tcp_send(struct ceph_connection *con) { + struct msghdr msg = { + .msg_iter = con->v2.out_iter, + .msg_flags = CEPH_MSG_FLAGS, + }; int ret; + if (WARN_ON(!iov_iter_is_bvec(&con->v2.out_iter))) + return -EINVAL; + + if (con->v2.out_iter_sendpage) + msg.msg_flags |= MSG_SPLICE_PAGES; + dout("%s con %p have %zu try_sendpage %d\n", __func__, con, iov_iter_count(&con->v2.out_iter), con->v2.out_iter_sendpage); - if (con->v2.out_iter_sendpage) - ret = do_try_sendpage(con->sock, &con->v2.out_iter); - else - ret = do_sendmsg(con->sock, &con->v2.out_iter); + + ret = sock_sendmsg(con->sock, &msg); + if (ret > 0) + iov_iter_advance(&con->v2.out_iter, ret); + else if (ret == -EAGAIN) + ret = 0; + dout("%s con %p ret %d left %zu\n", __func__, con, ret, iov_iter_count(&con->v2.out_iter)); return ret; From patchwork Wed Mar 29 14:13:44 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 76641 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp463806vqo; Wed, 29 Mar 2023 07:40:55 -0700 (PDT) X-Google-Smtp-Source: AKy350Y+Cittfg6ydEQCW0jUiGMYh7iF/Z9ubu0HqyeN8PLiCRpvCDvcm3J1dOQs5dWby6irDvrc X-Received: by 2002:a17:906:4bd7:b0:930:3840:1c4d with SMTP id x23-20020a1709064bd700b0093038401c4dmr20721565ejv.32.1680100855114; Wed, 29 Mar 2023 07:40:55 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680100855; cv=none; d=google.com; s=arc-20160816; b=QDnk/tKLp/7w3CFaibPleUKJ7Nm8DH88vGMWnYodEoXtYeybue8oS9uFTGbutHORTB 9tEkrRHgb0jaws/MHljScvBa08lwsenBG3YaZRp1CSTLhbXlQcjl3ppEExhTPV+BOZXw 0ZaxMjTGnm2KdYxMID8njB+rzcH+vmQMMhNz8jbIrx+kohcbpDwCtPmjaGTZjCVYMaLF XVp85DoPg5ZZ+CHadVFOXuCZ/Grnjrh5EG/rYLCZn8QsEtS29By/FDZeuS9VmQu58hE0 Gqomq+YhbH8ZzWCgWD800tJT4ex/eI/Kw58UU0fbrdRvxl4G93TY/gDd7CKmYFlvwsg6 YAcA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=pMIF11EWm/8cej/KXQhM8DA1wUu2yCmqcuxOX1F+AiY=; b=JpT+jVr/17N+tazoi9IDaiXD+J2jFdvW/58JsYiUR8CPlpEaReOZoN/k8D6cepA4is rOPY/3PyCRz74s3CiUbnA/MBhvgXoWWcwOEVcSau+jVTj6JzvdU9eRoCCQIwdRcKr7qB mug3e0ZZM365eQ5bgcIheCJDA6VCpGMocWwzFIMP8g89ln/qErP2oW5qPHc09FCHxkfa bZ03K+IkBwVsobUtRk9PFgZFzX2mk9eQIaRZXKrgvbXarxYU7GPFYzj/E7wF3tjVLWY3 anFI23d7FPOv7fEmCDmwQpzswyKnega1onMZ9K1RHB5UzjE+7ywFQiU3Qyw6fQ5Pl5aS cf+Q== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=hSLiCeMB; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id z6-20020a170906944600b00931bc7937a0si28643921ejx.497.2023.03.29.07.40.31; Wed, 29 Mar 2023 07:40:55 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=hSLiCeMB; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231330AbjC2OU5 (ORCPT + 99 others); Wed, 29 Mar 2023 10:20:57 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53108 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230448AbjC2OT3 (ORCPT ); Wed, 29 Mar 2023 10:19:29 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 366085FFF for ; Wed, 29 Mar 2023 07:16:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1680099346; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=pMIF11EWm/8cej/KXQhM8DA1wUu2yCmqcuxOX1F+AiY=; b=hSLiCeMBgNpy0shsr3xu/FisyHSp5Ma65TO3HVKzgSaRFJRQ1/0nPv2/grxVf1UijdmOvW 0Y4s5/aeiAbnu2zZQPL66lRKMsaZuOpSNGoH293w7HXqiMX4Tcln+UH63xxc9U1gGMyIji TPpNdbN5zkzts4hqQ38aGnUT1nqHr7k= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-494-exPjG2AiPg65VJXx__yQjQ-1; Wed, 29 Mar 2023 10:15:45 -0400 X-MC-Unique: exPjG2AiPg65VJXx__yQjQ-1 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.rdu2.redhat.com [10.11.54.4]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 99A6F185A7A4; Wed, 29 Mar 2023 14:15:43 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.18]) by smtp.corp.redhat.com (Postfix) with ESMTP id 6BD31202701E; Wed, 29 Mar 2023 14:15:41 +0000 (UTC) From: David Howells To: Matthew Wilcox , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni Cc: David Howells , Al Viro , Christoph Hellwig , Jens Axboe , Jeff Layton , Christian Brauner , Chuck Lever III , Linus Torvalds , netdev@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Santosh Shilimkar , linux-rdma@vger.kernel.org, rds-devel@oss.oracle.com Subject: [RFC PATCH v2 38/48] rds: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage Date: Wed, 29 Mar 2023 15:13:44 +0100 Message-Id: <20230329141354.516864-39-dhowells@redhat.com> In-Reply-To: <20230329141354.516864-1-dhowells@redhat.com> References: <20230329141354.516864-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.4 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1761713434398522136?= X-GMAIL-MSGID: =?utf-8?q?1761713434398522136?= When transmitting data, call down into TCP using a single sendmsg with MSG_SPLICE_PAGES to indicate that content should be spliced rather than performing several sendmsg and sendpage calls to transmit header and data pages. To make this work, the data is assembled in a bio_vec array and attached to a BVEC-type iterator. The header are copied into memory acquired from zcopy_alloc() which just breaks a page up into small pieces that can be freed with put_page(). Signed-off-by: David Howells cc: Santosh Shilimkar cc: "David S. Miller" cc: Eric Dumazet cc: Jakub Kicinski cc: Paolo Abeni cc: Jens Axboe cc: Matthew Wilcox cc: linux-rdma@vger.kernel.org cc: rds-devel@oss.oracle.com cc: netdev@vger.kernel.org --- net/rds/tcp_send.c | 86 +++++++++++++++++++++------------------------- 1 file changed, 40 insertions(+), 46 deletions(-) diff --git a/net/rds/tcp_send.c b/net/rds/tcp_send.c index 8c4d1d6e9249..660d9f203d99 100644 --- a/net/rds/tcp_send.c +++ b/net/rds/tcp_send.c @@ -52,29 +52,24 @@ void rds_tcp_xmit_path_complete(struct rds_conn_path *cp) tcp_sock_set_cork(tc->t_sock->sk, false); } -/* the core send_sem serializes this with other xmit and shutdown */ -static int rds_tcp_sendmsg(struct socket *sock, void *data, unsigned int len) -{ - struct kvec vec = { - .iov_base = data, - .iov_len = len, - }; - struct msghdr msg = { - .msg_flags = MSG_DONTWAIT | MSG_NOSIGNAL, - }; - - return kernel_sendmsg(sock, &msg, &vec, 1, vec.iov_len); -} - /* the core send_sem serializes this with other xmit and shutdown */ int rds_tcp_xmit(struct rds_connection *conn, struct rds_message *rm, unsigned int hdr_off, unsigned int sg, unsigned int off) { struct rds_conn_path *cp = rm->m_inc.i_conn_path; struct rds_tcp_connection *tc = cp->cp_transport_data; + struct msghdr msg = { + .msg_flags = MSG_SPLICE_PAGES | MSG_DONTWAIT | MSG_NOSIGNAL, + }; + struct bio_vec *bvec; + unsigned int i, size = 0, ix = 0; + bool free_hdr = false; int done = 0; - int ret = 0; - int more; + int ret = -ENOMEM; + + bvec = kmalloc_array(1 + sg, sizeof(struct bio_vec), GFP_KERNEL); + if (!bvec) + goto out; if (hdr_off == 0) { /* @@ -99,43 +94,37 @@ int rds_tcp_xmit(struct rds_connection *conn, struct rds_message *rm, if (hdr_off < sizeof(struct rds_header)) { /* see rds_tcp_write_space() */ + void *p; + set_bit(SOCK_NOSPACE, &tc->t_sock->sk->sk_socket->flags); - ret = rds_tcp_sendmsg(tc->t_sock, - (void *)&rm->m_inc.i_hdr + hdr_off, - sizeof(rm->m_inc.i_hdr) - hdr_off); - if (ret < 0) - goto out; - done += ret; - if (hdr_off + done != sizeof(struct rds_header)) + ret = -ENOMEM; + p = page_frag_memdup(NULL, + (void *)&rm->m_inc.i_hdr + hdr_off, + sizeof(rm->m_inc.i_hdr) - hdr_off, + GFP_KERNEL, ULONG_MAX); + if (!p) goto out; + bvec_set_virt(&bvec[ix], p, sizeof(rm->m_inc.i_hdr) - hdr_off); + free_hdr = true; + size += bvec[ix].bv_len; + ix++; } - more = rm->data.op_nents > 1 ? (MSG_MORE | MSG_SENDPAGE_NOTLAST) : 0; - while (sg < rm->data.op_nents) { - int flags = MSG_DONTWAIT | MSG_NOSIGNAL | more; - - ret = tc->t_sock->ops->sendpage(tc->t_sock, - sg_page(&rm->data.op_sg[sg]), - rm->data.op_sg[sg].offset + off, - rm->data.op_sg[sg].length - off, - flags); - rdsdebug("tcp sendpage %p:%u:%u ret %d\n", (void *)sg_page(&rm->data.op_sg[sg]), - rm->data.op_sg[sg].offset + off, rm->data.op_sg[sg].length - off, - ret); - if (ret <= 0) - break; - - off += ret; - done += ret; - if (off == rm->data.op_sg[sg].length) { - off = 0; - sg++; - } - if (sg == rm->data.op_nents - 1) - more = 0; + for (i = sg; i < rm->data.op_nents; i++) { + bvec_set_page(&bvec[ix], + sg_page(&rm->data.op_sg[i]), + rm->data.op_sg[i].length - off, + rm->data.op_sg[i].offset + off); + off = 0; + size += bvec[ix].bv_len; + ix++; } + iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, bvec, ix, size); + ret = sock_sendmsg(tc->t_sock, &msg); + rdsdebug("tcp sendmsg-splice %u,%u ret %d\n", ix, size, ret); + out: if (ret <= 0) { /* write_space will hit after EAGAIN, all else fatal */ @@ -158,6 +147,11 @@ int rds_tcp_xmit(struct rds_connection *conn, struct rds_message *rm, } if (done == 0) done = ret; + if (bvec) { + if (free_hdr) + put_page(bvec[0].bv_page); + kfree(bvec); + } return done; } From patchwork Wed Mar 29 14:13:45 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 76619 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp456995vqo; Wed, 29 Mar 2023 07:30:51 -0700 (PDT) X-Google-Smtp-Source: AKy350bRiWNA8BPLog9UlI+uBfjKT2ms9cSKa64VJIu7NDZ+XaOKAv4C17JwmfZQSy0zU6niHjPP X-Received: by 2002:a17:906:f193:b0:92b:eca6:43fc with SMTP id gs19-20020a170906f19300b0092beca643fcmr20518747ejb.64.1680100251340; Wed, 29 Mar 2023 07:30:51 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680100251; cv=none; d=google.com; s=arc-20160816; b=fS/czhyhMCR+A+xk4HXZC7ZHGJP2q1tZHxH0jpLf0ZwdLeT4UB85A5eJmM+u2J9RmJ eFLBBIoyjwmDYBhT70Wr8KjgQB6J/GqUDn4sHzKSFmyiIpukhFbjcj73Uv+M90vV62RK UvOoRbKJfVcxF7qNzLugPmo62iwK8+OkdPnw2No200iItky6yi7o/v4AnF67QjtEFHgo 2T0dpVOLTQF+E4K1EB2mlTymZ1HarBYgo1iU/Xdurh9eoZPhhq4BJq4YtC9aKJDUW/gU QAnWH8Zzfe7jUGudDklD5DFgmmQawhPcmnkpDozYxOi+aW7kcZzmn1pTN70CUWFawZRg FTOQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=6d/Oxl1Cdu8ybDL5tOgPD1gv3cW2fyMj4HBClKMadWo=; b=J25vrmSZcTWTtgrO4+x0W4CkPFaG0m/cgqUOi/cXVYT6ucdWty2/M7HuWY3Ohcvcwo Zdh8GhWRTobzEy4XVUIge4/b/SFgAr7KRFH16bGLh/i/bH/vJ8i6YjzKpqh7cjbrqbKP B8xRz0D8C2XkS7DqzAHNxjAVz5jR2lHEItzBow6JiRxD4/4awFxIsGi/Q8bGDaX+1fa1 GS18ZOXfkp4JE7obm0a/IKeMsB+JlIUeN4VTRf7Zv5vh0EROMRqs1C+/0KEi4lZbgRms uJwPdPF8EW0o94cmHvp25srLsS09SzUqptRvdvGHYWXQA7L32Hjzkmhx9WYg6jZhoUWC L9Aw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=i5WgYOGl; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id y24-20020a170906071800b009317f50118fsi13326081ejb.621.2023.03.29.07.30.17; Wed, 29 Mar 2023 07:30:51 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=i5WgYOGl; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231253AbjC2OWg (ORCPT + 99 others); Wed, 29 Mar 2023 10:22:36 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55312 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231205AbjC2OUi (ORCPT ); Wed, 29 Mar 2023 10:20:38 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 67A4C35A7 for ; Wed, 29 Mar 2023 07:16:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1680099360; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=6d/Oxl1Cdu8ybDL5tOgPD1gv3cW2fyMj4HBClKMadWo=; b=i5WgYOGl7kGTcvpXPydHRtgDvxfHwl0mhKDToCrNgIEYrubxAScPgl3ffAqJoUXQ5IJb69 jlgY1iSMLb1qPGk2tLXqhZZLB3/gLN1qCvdmgX5YDT38wpQ201+0Ku3qff5HkAGSHeER6/ 8aJENWYddgozvDAJil1bM3r9D7zaJb4= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-619-i1SRZaD-MumdNlQEtFGyZA-1; Wed, 29 Mar 2023 10:15:49 -0400 X-MC-Unique: i1SRZaD-MumdNlQEtFGyZA-1 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id D152F299E75D; Wed, 29 Mar 2023 14:15:46 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.18]) by smtp.corp.redhat.com (Postfix) with ESMTP id 6BAE12166B33; Wed, 29 Mar 2023 14:15:44 +0000 (UTC) From: David Howells To: Matthew Wilcox , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni Cc: David Howells , Al Viro , Christoph Hellwig , Jens Axboe , Jeff Layton , Christian Brauner , Chuck Lever III , Linus Torvalds , netdev@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Christine Caulfield , David Teigland , cluster-devel@redhat.com Subject: [RFC PATCH v2 39/48] dlm: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage Date: Wed, 29 Mar 2023 15:13:45 +0100 Message-Id: <20230329141354.516864-40-dhowells@redhat.com> In-Reply-To: <20230329141354.516864-1-dhowells@redhat.com> References: <20230329141354.516864-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.6 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1761712801388709234?= X-GMAIL-MSGID: =?utf-8?q?1761712801388709234?= When transmitting data, call down a layer using a single sendmsg with MSG_SPLICE_PAGES to indicate that content should be spliced rather using sendpage. This allows ->sendpage() to be replaced by something that can handle multiple multipage folios in a single transaction. Signed-off-by: David Howells cc: Christine Caulfield cc: David Teigland cc: "David S. Miller" cc: Eric Dumazet cc: Jakub Kicinski cc: Paolo Abeni cc: Jens Axboe cc: Matthew Wilcox cc: cluster-devel@redhat.com cc: netdev@vger.kernel.org --- fs/dlm/lowcomms.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c index a9b14f81d655..9c0c691b6106 100644 --- a/fs/dlm/lowcomms.c +++ b/fs/dlm/lowcomms.c @@ -1394,8 +1394,11 @@ int dlm_lowcomms_resend_msg(struct dlm_msg *msg) /* Send a message */ static int send_to_sock(struct connection *con) { - const int msg_flags = MSG_DONTWAIT | MSG_NOSIGNAL; struct writequeue_entry *e; + struct bio_vec bvec; + struct msghdr msg = { + .msg_flags = MSG_SPLICE_PAGES | MSG_DONTWAIT | MSG_NOSIGNAL, + }; int len, offset, ret; spin_lock_bh(&con->writequeue_lock); @@ -1411,8 +1414,9 @@ static int send_to_sock(struct connection *con) WARN_ON_ONCE(len == 0 && e->users == 0); spin_unlock_bh(&con->writequeue_lock); - ret = kernel_sendpage(con->sock, e->page, offset, len, - msg_flags); + bvec_set_page(&bvec, e->page, len, offset); + iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, len); + ret = sock_sendmsg(con->sock, &msg); trace_dlm_send(con->nodeid, ret); if (ret == -EAGAIN || ret == 0) { lock_sock(con->sock->sk); From patchwork Wed Mar 29 14:13:46 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 76622 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp460181vqo; Wed, 29 Mar 2023 07:35:16 -0700 (PDT) X-Google-Smtp-Source: AKy350Zo/kWjnG3qM3VWh0m0Ma/r98V5MYZzwtoN5X18beM+uWV+SNAczdavSTnpfr/xm2LZ3k/j X-Received: by 2002:a17:906:7312:b0:91f:5f9c:5db6 with SMTP id di18-20020a170906731200b0091f5f9c5db6mr28656432ejc.52.1680100516377; Wed, 29 Mar 2023 07:35:16 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680100516; cv=none; d=google.com; s=arc-20160816; b=zIhhH/XIgrjOJN14q+0NQfo6yk8Jwpt7QQ3h3GZrSzKGl5aB01ODbEOjrb/Ff8nBWC JSTArcIQMCwmeRl+BfqD0Kh83jezJWQgg/wmGLi7pClgKSC4r5b1VIKbRaNXzD3RV3ps NJDw54OkgnZ53liuIRHMo+Ul68OXypgzMRkowrkJk8TZRI8Ld7w7x/ozl8vG9kyw2M/d +mg/4cJiKlR5aXEgvcqovL+1nSoEXsF4cN+BCweXnCc8wrwu+E0VxM/3ZJfagKe/AbXk H+45Gwz3ndAmvW20iWmoxgfA7dZQOrPuYERvcPsPVkzPeA6XU1V0sZ18BN045k+pNatB YLXA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=soF5gavmzneSvIIvHwhPEmiQq18BggR/9B/uQZFrN8w=; b=NmUiLGWSG8ApNwTx1mP46pZ+JVf4pRinJtnVmgg1ybYt86XxFZubcf+1b9PHgPW5Of /s7XfjgQoecqmBGprFKgx6Crw5p3xKvuptsDNiSC5w5P/oQn4R0kI9Rp63QYiO3dp2RH ImHISEAdELm/XIpNhEYogMQyTEjZrhKznBl8jR41a+KayiJmPqyYljGhYVbRzTB1IGbH gNPWSQyHFAQxVU9olJ0egl6L4P9yK6wvf4+8xyk3VPOGwCE7InVUPhwfY4K5jrdiqCHg T7G2KlT0mFek5AV8U92g6e6IsSrP8o/9KGb44UMqlJGz7ImeBIaVPO+8A9r6mI6Xi9xj qfCg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=ZXW1HzVs; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id x7-20020a170906298700b00930926fc5acsi11987696eje.911.2023.03.29.07.34.52; Wed, 29 Mar 2023 07:35:16 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=ZXW1HzVs; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231391AbjC2OWn (ORCPT + 99 others); Wed, 29 Mar 2023 10:22:43 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53088 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231304AbjC2OUi (ORCPT ); Wed, 29 Mar 2023 10:20:38 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EE5DD61AF for ; Wed, 29 Mar 2023 07:16:24 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1680099356; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=soF5gavmzneSvIIvHwhPEmiQq18BggR/9B/uQZFrN8w=; b=ZXW1HzVsOGyt5V4GXikPJkY5ih0M+B27i+lSSZ0oZqoJ9AyiDpVLy/cGyWq+j+g7WFJRMf Qic3wt5LyJ8XCaso4qI0+qXqIbP7hTSS/H+HsNSMGXZtfu8syaWTSZQ51rTrPNfvGNEPtI 5rjCT7O9gt66J/xrNqwkZ8UF+QsoWcQ= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-56-ruOkGzEjPIqn8iT3b-ztNw-1; Wed, 29 Mar 2023 10:15:54 -0400 X-MC-Unique: ruOkGzEjPIqn8iT3b-ztNw-1 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.rdu2.redhat.com [10.11.54.4]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 99B85855300; Wed, 29 Mar 2023 14:15:49 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.18]) by smtp.corp.redhat.com (Postfix) with ESMTP id 6CE9C2027040; Wed, 29 Mar 2023 14:15:47 +0000 (UTC) From: David Howells To: Matthew Wilcox , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni Cc: David Howells , Al Viro , Christoph Hellwig , Jens Axboe , Jeff Layton , Christian Brauner , Chuck Lever III , Linus Torvalds , netdev@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Trond Myklebust , Anna Schumaker , linux-nfs@vger.kernel.org Subject: [RFC PATCH v2 40/48] sunrpc: Use sendmsg(MSG_SPLICE_PAGES) rather then sendpage Date: Wed, 29 Mar 2023 15:13:46 +0100 Message-Id: <20230329141354.516864-41-dhowells@redhat.com> In-Reply-To: <20230329141354.516864-1-dhowells@redhat.com> References: <20230329141354.516864-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.4 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1761713079560756132?= X-GMAIL-MSGID: =?utf-8?q?1761713079560756132?= When transmitting data, call down into TCP using a single sendmsg with MSG_SPLICE_PAGES to indicate that content should be spliced rather than performing several sendmsg and sendpage calls to transmit header, data pages and trailer. To make this work, the data is assembled in a bio_vec array and attached to a BVEC-type iterator. The header and trailer are copied into page fragments so that they can be freed with put_page and attached to iterators of their own. An iterator-of-iterators is then created to bridge all three iterators (headers, data, trailer) and that is passed to sendmsg to pass the entire message in a single call. Signed-off-by: David Howells cc: Trond Myklebust cc: Anna Schumaker cc: Chuck Lever cc: Jeff Layton cc: "David S. Miller" cc: Eric Dumazet cc: Jakub Kicinski cc: Paolo Abeni cc: Jens Axboe cc: Matthew Wilcox cc: linux-nfs@vger.kernel.org cc: netdev@vger.kernel.org Signed-off-by: David Howells Signed-off-by: David Howells Signed-off-by: David Howells Acked-by: Chuck Lever Tested-by: Daire Byrne --- include/linux/sunrpc/svc.h | 11 +++-- net/sunrpc/svcsock.c | 89 +++++++++++++++----------------------- 2 files changed, 40 insertions(+), 60 deletions(-) diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index 877891536c2f..456ae554aa11 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -161,16 +161,15 @@ static inline bool svc_put_not_last(struct svc_serv *serv) extern u32 svc_max_payload(const struct svc_rqst *rqstp); /* - * RPC Requsts and replies are stored in one or more pages. + * RPC Requests and replies are stored in one or more pages. * We maintain an array of pages for each server thread. * Requests are copied into these pages as they arrive. Remaining * pages are available to write the reply into. * - * Pages are sent using ->sendpage so each server thread needs to - * allocate more to replace those used in sending. To help keep track - * of these pages we have a receive list where all pages initialy live, - * and a send list where pages are moved to when there are to be part - * of a reply. + * Pages are sent using ->sendmsg with MSG_SPLICE_PAGES so each server thread + * needs to allocate more to replace those used in sending. To help keep track + * of these pages we have a receive list where all pages initialy live, and a + * send list where pages are moved to when there are to be part of a reply. * * We use xdr_buf for holding responses as it fits well with NFS * read responses (that have a header, and some data pages, and possibly diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c index 03a4f5615086..f1cc53aad6e0 100644 --- a/net/sunrpc/svcsock.c +++ b/net/sunrpc/svcsock.c @@ -1060,16 +1060,8 @@ static int svc_tcp_recvfrom(struct svc_rqst *rqstp) return 0; /* record not complete */ } -static int svc_tcp_send_kvec(struct socket *sock, const struct kvec *vec, - int flags) -{ - return kernel_sendpage(sock, virt_to_page(vec->iov_base), - offset_in_page(vec->iov_base), - vec->iov_len, flags); -} - /* - * kernel_sendpage() is used exclusively to reduce the number of + * MSG_SPLICE_PAGES is used exclusively to reduce the number of * copy operations in this path. Therefore the caller must ensure * that the pages backing @xdr are unchanging. * @@ -1081,65 +1073,54 @@ static int svc_tcp_sendmsg(struct socket *sock, struct xdr_buf *xdr, { const struct kvec *head = xdr->head; const struct kvec *tail = xdr->tail; - struct kvec rm = { - .iov_base = &marker, - .iov_len = sizeof(marker), - }; + struct iov_iter iters[3]; + struct bio_vec head_bv, tail_bv; struct msghdr msg = { - .msg_flags = 0, + .msg_flags = MSG_SPLICE_PAGES, }; - int ret; + void *m, *t; + int ret, n = 2, size; *sentp = 0; ret = xdr_alloc_bvec(xdr, GFP_KERNEL); if (ret < 0) return ret; - ret = kernel_sendmsg(sock, &msg, &rm, 1, rm.iov_len); - if (ret < 0) - return ret; - *sentp += ret; - if (ret != rm.iov_len) - return -EAGAIN; + m = page_frag_alloc(NULL, sizeof(marker) + head->iov_len + tail->iov_len, + GFP_KERNEL); + if (!m) + return -ENOMEM; - ret = svc_tcp_send_kvec(sock, head, 0); - if (ret < 0) - return ret; - *sentp += ret; - if (ret != head->iov_len) - goto out; + memcpy(m, &marker, sizeof(marker)); + if (head->iov_len) + memcpy(m + sizeof(marker), head->iov_base, head->iov_len); + bvec_set_virt(&head_bv, m, sizeof(marker) + head->iov_len); + iov_iter_bvec(&iters[0], ITER_SOURCE, &head_bv, 1, + sizeof(marker) + head->iov_len); - if (xdr->page_len) { - unsigned int offset, len, remaining; - struct bio_vec *bvec; - - bvec = xdr->bvec + (xdr->page_base >> PAGE_SHIFT); - offset = offset_in_page(xdr->page_base); - remaining = xdr->page_len; - while (remaining > 0) { - len = min(remaining, bvec->bv_len - offset); - ret = kernel_sendpage(sock, bvec->bv_page, - bvec->bv_offset + offset, - len, 0); - if (ret < 0) - return ret; - *sentp += ret; - if (ret != len) - goto out; - remaining -= len; - offset = 0; - bvec++; - } - } + iov_iter_bvec(&iters[1], ITER_SOURCE, xdr->bvec, + xdr_buf_pagecount(xdr), xdr->page_len); if (tail->iov_len) { - ret = svc_tcp_send_kvec(sock, tail, 0); - if (ret < 0) - return ret; - *sentp += ret; + t = page_frag_alloc(NULL, tail->iov_len, GFP_KERNEL); + if (!t) + return -ENOMEM; + memcpy(t, tail->iov_base, tail->iov_len); + bvec_set_virt(&tail_bv, t, tail->iov_len); + iov_iter_bvec(&iters[2], ITER_SOURCE, &tail_bv, 1, tail->iov_len); + n++; } -out: + size = sizeof(marker) + head->iov_len + xdr->page_len + tail->iov_len; + iov_iter_iterlist(&msg.msg_iter, ITER_SOURCE, iters, n, size); + + ret = sock_sendmsg(sock, &msg); + if (ret < 0) + return ret; + if (ret > 0) + *sentp = ret; + if (ret != size) + return -EAGAIN; return 0; } From patchwork Wed Mar 29 14:13:47 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 76647 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp464484vqo; Wed, 29 Mar 2023 07:42:04 -0700 (PDT) X-Google-Smtp-Source: AKy350bqaEoYQBS92iKHaMLE04HaAKNajnJrH6m688HGvTfohfn8gkcH0wtX7xYkfsyv3Cs1sepZ X-Received: by 2002:a17:907:b689:b0:93a:6e9c:262e with SMTP id vm9-20020a170907b68900b0093a6e9c262emr22455429ejc.23.1680100924552; Wed, 29 Mar 2023 07:42:04 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680100924; cv=none; d=google.com; s=arc-20160816; b=mW3wBzO9jzs31ug6nEoew3BRqYoVedw1pTYdGMgrNjgqKZW5xbmNIUBT1t1EJqmjRp oWEyXU9pTDYxmQ740lyqMaVcuKMimXhB1bYNKo9/uN5dSftRQMemFikIUVQSVM1gTMqi SmaliacAXDeqoMFH+wXTL5NWrAf3Lu4nWF6PvBBFDk5mYk8h3USVcO2AxIXUYHDJrj0c C64vmafsva83xa/DIGIqAGMjIkS+nzPzeZuFGcz5flhXdQBf2KN4UA/43BtTOxfyDlsz JXJ7mkXXGN1e54ePjMQ2Kh+iI0+8IIe5sfjvl+g6NleWA1tpkgRHaHmHyUGQQI8KMLi+ 2FtQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=wvXdMcltJv+r5QoZnXaavWZcwnuZQd4mm223J864Zdo=; b=yXMTPm0hJkIMgMvvgrn04EYNc+qaGY70NgIBHqz2dXjfwzQg39C97YuKpSFrqRNlBD iJkg4onroR/ujrrR89NbSb2oJ7ntiS1KZLWPAEtfyqlohF9BMpFgRMYsLMpJutr+yyPl 5nIHlXxGriKScrEoT8hZd2G/fS5F1tTtIKv15R1aHli23tWDYMoyEw136jfVAusP2BJ/ sliCD+eIi5LGbg0P9diHSdaHQpfVOVfRqMCN+AJVmExWrRSyhfMQAxtqf1swC0Evs/Pq mwV6aRH1Ru4PirCpUCGSn7qQyrAig/WMJE3aACf7PuAxcP+FVO2LimAkcp1fUDQBYRu9 xjVA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=RTOAq7HH; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id z16-20020a170906945000b0093dd2150283si14760788ejx.715.2023.03.29.07.41.38; Wed, 29 Mar 2023 07:42:04 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=RTOAq7HH; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231222AbjC2OW2 (ORCPT + 99 others); Wed, 29 Mar 2023 10:22:28 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55246 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229930AbjC2OU0 (ORCPT ); Wed, 29 Mar 2023 10:20:26 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 221EB61A2 for ; Wed, 29 Mar 2023 07:16:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1680099355; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=wvXdMcltJv+r5QoZnXaavWZcwnuZQd4mm223J864Zdo=; b=RTOAq7HHjPqvr7Ihpl9Ql987kfwQspGqz1EAi+WeHi2rc46w/2/3L8hBtFedqNXaXEaXrj 4Yp9Qxnx8CU4m8mjPVHe80XAxbbpS2F9XK1tmVaqwpvSjP5h9q96Me4JU4FBsM08V61SfZ YyBxPJ+eQafQ95sHz9HX/GISxSkma8A= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-482-Ff5bQfAPN6KgS6uIgJRP1w-1; Wed, 29 Mar 2023 10:15:53 -0400 X-MC-Unique: Ff5bQfAPN6KgS6uIgJRP1w-1 Received: from smtp.corp.redhat.com (int-mx09.intmail.prod.int.rdu2.redhat.com [10.11.54.9]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 82A463814947; Wed, 29 Mar 2023 14:15:52 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.18]) by smtp.corp.redhat.com (Postfix) with ESMTP id 5750F492B02; Wed, 29 Mar 2023 14:15:50 +0000 (UTC) From: David Howells To: Matthew Wilcox , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni Cc: David Howells , Al Viro , Christoph Hellwig , Jens Axboe , Jeff Layton , Christian Brauner , Chuck Lever III , Linus Torvalds , netdev@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Trond Myklebust , Anna Schumaker , linux-nfs@vger.kernel.org Subject: [RFC PATCH v2 41/48] sunrpc: Rely on TCP sendmsg + MSG_SPLICE_PAGES to copy unspliceable data Date: Wed, 29 Mar 2023 15:13:47 +0100 Message-Id: <20230329141354.516864-42-dhowells@redhat.com> In-Reply-To: <20230329141354.516864-1-dhowells@redhat.com> References: <20230329141354.516864-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.9 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1761713507396876794?= X-GMAIL-MSGID: =?utf-8?q?1761713507396876794?= Rather than copying data in svc_tcp_sendmsg() into page fragments, just hand in ITER_KVEC iterators as part of the ITER_ITERLIST and rely on TCP to copy them if the pages they're residing on are belong to the slab or have a zero refcount. Signed-off-by: David Howells cc: Trond Myklebust cc: Anna Schumaker cc: Chuck Lever cc: Jeff Layton cc: "David S. Miller" cc: Eric Dumazet cc: Jakub Kicinski cc: Paolo Abeni cc: Jens Axboe cc: Matthew Wilcox cc: linux-nfs@vger.kernel.org cc: netdev@vger.kernel.org --- net/sunrpc/svcsock.c | 44 ++++++++++++-------------------------------- 1 file changed, 12 insertions(+), 32 deletions(-) diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c index f1cc53aad6e0..c1421f6fe57a 100644 --- a/net/sunrpc/svcsock.c +++ b/net/sunrpc/svcsock.c @@ -1071,47 +1071,27 @@ static int svc_tcp_recvfrom(struct svc_rqst *rqstp) static int svc_tcp_sendmsg(struct socket *sock, struct xdr_buf *xdr, rpc_fraghdr marker, unsigned int *sentp) { - const struct kvec *head = xdr->head; - const struct kvec *tail = xdr->tail; - struct iov_iter iters[3]; - struct bio_vec head_bv, tail_bv; - struct msghdr msg = { - .msg_flags = MSG_SPLICE_PAGES, - }; - void *m, *t; - int ret, n = 2, size; + struct iov_iter iters[4]; + struct kvec marker_kv; + struct msghdr msg = { .msg_flags = MSG_SPLICE_PAGES, }; + int ret, n = 0, size; *sentp = 0; ret = xdr_alloc_bvec(xdr, GFP_KERNEL); if (ret < 0) return ret; - m = page_frag_alloc(NULL, sizeof(marker) + head->iov_len + tail->iov_len, - GFP_KERNEL); - if (!m) - return -ENOMEM; - - memcpy(m, &marker, sizeof(marker)); - if (head->iov_len) - memcpy(m + sizeof(marker), head->iov_base, head->iov_len); - bvec_set_virt(&head_bv, m, sizeof(marker) + head->iov_len); - iov_iter_bvec(&iters[0], ITER_SOURCE, &head_bv, 1, - sizeof(marker) + head->iov_len); - - iov_iter_bvec(&iters[1], ITER_SOURCE, xdr->bvec, + marker_kv.iov_base = ▮ + marker_kv.iov_len = sizeof(marker); + iov_iter_kvec(&iters[n++], ITER_SOURCE, &marker_kv, 1, sizeof(marker)); + iov_iter_kvec(&iters[n++], ITER_SOURCE, xdr->head, 1, xdr->head->iov_len); + iov_iter_bvec(&iters[n++], ITER_SOURCE, xdr->bvec, xdr_buf_pagecount(xdr), xdr->page_len); - if (tail->iov_len) { - t = page_frag_alloc(NULL, tail->iov_len, GFP_KERNEL); - if (!t) - return -ENOMEM; - memcpy(t, tail->iov_base, tail->iov_len); - bvec_set_virt(&tail_bv, t, tail->iov_len); - iov_iter_bvec(&iters[2], ITER_SOURCE, &tail_bv, 1, tail->iov_len); - n++; - } + if (xdr->tail->iov_len) + iov_iter_kvec(&iters[n++], ITER_SOURCE, xdr->tail, 1, xdr->tail->iov_len); - size = sizeof(marker) + head->iov_len + xdr->page_len + tail->iov_len; + size = sizeof(marker) + xdr->head->iov_len + xdr->page_len + xdr->tail->iov_len; iov_iter_iterlist(&msg.msg_iter, ITER_SOURCE, iters, n, size); ret = sock_sendmsg(sock, &msg); From patchwork Wed Mar 29 14:13:48 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 76617 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp452945vqo; Wed, 29 Mar 2023 07:24:36 -0700 (PDT) X-Google-Smtp-Source: AKy350bBmwpO9pD4/axorOozEua4q5RqPVskmQsp8l4smTxqjxlTs9KsAhWnBF5a0gJk2cQqs67X X-Received: by 2002:a17:906:c7cc:b0:878:955e:b4a4 with SMTP id dc12-20020a170906c7cc00b00878955eb4a4mr17592341ejb.33.1680099876151; Wed, 29 Mar 2023 07:24:36 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680099876; cv=none; d=google.com; s=arc-20160816; b=q1ga55VmVrjLTZsBhuAHFcp1V2wj/raDXnaIR/9iMvme4Vi6GjPH79bMalkjbW0BmE tuvK9lPhehYO0VMuVlBSQtYSxEGpJA/xVjqyAS2KEnYlH5rLCmNX2vwinhjp9eZe82A0 /qQAEBmOhHfyYD54Gzj7EjZtrfjcBm2q5Dy/UUoX/vQU2Iz5eJqWipm3aO0Osp+rcqYs TE395XYKd+PQDWgA0MXbnGxztXcUli6IHef6U65jkYQIJr/8Xf4RQQZ1ZO4SF65cr7zT Wjz1HUWpOtfdYHtRjLhGMZMiND7kMzZqH4d4E3Qkku7mC2fYikaIb9QS2dno++gm9Tqy g3Ug== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=XvqldHWl7xh/L1V+KGWEm9YizW/K8SDxYbPNnLnjdW8=; b=mEMKYYtSihF+yQkcmwiRMNtqDpymUU7n9qKIzr3d2N8WI5A5iKjTm6Fy/VtV0R7aXe CxY+zY3yBrdKVkI4r8I89txcwSO5YVUfVIDOhppocYahFjJ/5biksrO8tnpk/0PM8IiT BeXyyENlJv8mf/0qblV7s6IZahlYoUMQ+0sdPO0rm7Mhxo9uh3JdDENfj9NyhS5UYNeK D5b2T7PWXDEIZHPQaxWuYYxgqvVO1+73BPpwIciv55pAzg/Q8viwVS7ju13yqS9m8IL2 KxN8lqfQzPGDH/ZzsnAKMSyoKEFqgF/uZhHMu/1ELDW0PLl7lt+clrSG0aW0xGOSQVAN DEPg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b="T+j/Wqt9"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id a2-20020aa7d742000000b004aef1baf38asi11909933eds.96.2023.03.29.07.24.12; Wed, 29 Mar 2023 07:24:36 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b="T+j/Wqt9"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231407AbjC2OW4 (ORCPT + 99 others); Wed, 29 Mar 2023 10:22:56 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54660 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231326AbjC2OUx (ORCPT ); Wed, 29 Mar 2023 10:20:53 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C7BA51A1 for ; Wed, 29 Mar 2023 07:16:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1680099361; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=XvqldHWl7xh/L1V+KGWEm9YizW/K8SDxYbPNnLnjdW8=; b=T+j/Wqt93PAX28FYIzV6pG83Hb8YiPWCEMUEaYbQDzarZQbvDecsV2+cLlNmadwNpth8xq VCrf7WhnQhhPAyhjDY4S5euUzbBnIb4UVaYjA7k0mktZxDHRTV85b9wRDjeuB7GIcjkkcv j8QRufEg6cN2ZEd0r9shUW+KhvU+q0o= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-588-COzuqKOVOGS_OcuoCYtLdA-1; Wed, 29 Mar 2023 10:15:59 -0400 X-MC-Unique: COzuqKOVOGS_OcuoCYtLdA-1 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.rdu2.redhat.com [10.11.54.3]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 9E5AC811E7C; Wed, 29 Mar 2023 14:15:55 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.18]) by smtp.corp.redhat.com (Postfix) with ESMTP id 263DD1121330; Wed, 29 Mar 2023 14:15:53 +0000 (UTC) From: David Howells To: Matthew Wilcox , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni Cc: David Howells , Al Viro , Christoph Hellwig , Jens Axboe , Jeff Layton , Christian Brauner , Chuck Lever III , Linus Torvalds , netdev@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Keith Busch , Jens Axboe , Christoph Hellwig , Sagi Grimberg , Chaitanya Kulkarni , linux-nvme@lists.infradead.org Subject: [RFC PATCH v2 42/48] nvme: Use sendmsg(MSG_SPLICE_PAGES) rather then sendpage Date: Wed, 29 Mar 2023 15:13:48 +0100 Message-Id: <20230329141354.516864-43-dhowells@redhat.com> In-Reply-To: <20230329141354.516864-1-dhowells@redhat.com> References: <20230329141354.516864-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.3 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1761712407899973078?= X-GMAIL-MSGID: =?utf-8?q?1761712407899973078?= When transmitting data, call down into TCP using a single sendmsg with MSG_SPLICE_PAGES to indicate that content should be spliced rather than performing several sendmsg and sendpage calls to transmit header, data pages and trailer. Signed-off-by: David Howells cc: Keith Busch cc: Jens Axboe cc: Christoph Hellwig cc: Sagi Grimberg cc: Chaitanya Kulkarni cc: "David S. Miller" cc: Eric Dumazet cc: Jakub Kicinski cc: Paolo Abeni cc: Jens Axboe cc: Matthew Wilcox cc: linux-nvme@lists.infradead.org cc: netdev@vger.kernel.org --- drivers/nvme/host/tcp.c | 44 ++++++++++++++++++------------------ drivers/nvme/target/tcp.c | 47 +++++++++++++++++++++++++-------------- 2 files changed, 52 insertions(+), 39 deletions(-) diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c index fa32969b532f..cc617692702d 100644 --- a/drivers/nvme/host/tcp.c +++ b/drivers/nvme/host/tcp.c @@ -979,25 +979,23 @@ static int nvme_tcp_try_send_data(struct nvme_tcp_request *req) u32 h2cdata_left = req->h2cdata_left; while (true) { + struct bio_vec bvec; + struct msghdr msg = { .msg_flags = MSG_DONTWAIT | MSG_SPLICE_PAGES, }; struct page *page = nvme_tcp_req_cur_page(req); size_t offset = nvme_tcp_req_cur_offset(req); size_t len = nvme_tcp_req_cur_length(req); bool last = nvme_tcp_pdu_last_send(req, len); int req_data_sent = req->data_sent; - int ret, flags = MSG_DONTWAIT; + int ret; if (last && !queue->data_digest && !nvme_tcp_queue_more(queue)) - flags |= MSG_EOR; + msg.msg_flags |= MSG_EOR; else - flags |= MSG_MORE | MSG_SENDPAGE_NOTLAST; + msg.msg_flags |= MSG_MORE | MSG_SENDPAGE_NOTLAST; - if (sendpage_ok(page)) { - ret = kernel_sendpage(queue->sock, page, offset, len, - flags); - } else { - ret = sock_no_sendpage(queue->sock, page, offset, len, - flags); - } + bvec_set_page(&bvec, page, len, offset); + iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, len); + ret = sock_sendmsg(queue->sock, &msg); if (ret <= 0) return ret; @@ -1036,22 +1034,24 @@ static int nvme_tcp_try_send_cmd_pdu(struct nvme_tcp_request *req) { struct nvme_tcp_queue *queue = req->queue; struct nvme_tcp_cmd_pdu *pdu = req->pdu; + struct bio_vec bvec; + struct msghdr msg = { .msg_flags = MSG_DONTWAIT | MSG_SPLICE_PAGES, }; bool inline_data = nvme_tcp_has_inline_data(req); u8 hdgst = nvme_tcp_hdgst_len(queue); int len = sizeof(*pdu) + hdgst - req->offset; - int flags = MSG_DONTWAIT; int ret; if (inline_data || nvme_tcp_queue_more(queue)) - flags |= MSG_MORE | MSG_SENDPAGE_NOTLAST; + msg.msg_flags |= MSG_MORE | MSG_SENDPAGE_NOTLAST; else - flags |= MSG_EOR; + msg.msg_flags |= MSG_EOR; if (queue->hdr_digest && !req->offset) nvme_tcp_hdgst(queue->snd_hash, pdu, sizeof(*pdu)); - ret = kernel_sendpage(queue->sock, virt_to_page(pdu), - offset_in_page(pdu) + req->offset, len, flags); + bvec_set_virt(&bvec, (void *)pdu + req->offset, len); + iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, len); + ret = sock_sendmsg(queue->sock, &msg); if (unlikely(ret <= 0)) return ret; @@ -1075,6 +1075,8 @@ static int nvme_tcp_try_send_data_pdu(struct nvme_tcp_request *req) { struct nvme_tcp_queue *queue = req->queue; struct nvme_tcp_data_pdu *pdu = req->pdu; + struct bio_vec bvec; + struct msghdr msg = { .msg_flags = MSG_DONTWAIT | MSG_MORE, }; u8 hdgst = nvme_tcp_hdgst_len(queue); int len = sizeof(*pdu) - req->offset + hdgst; int ret; @@ -1083,13 +1085,11 @@ static int nvme_tcp_try_send_data_pdu(struct nvme_tcp_request *req) nvme_tcp_hdgst(queue->snd_hash, pdu, sizeof(*pdu)); if (!req->h2cdata_left) - ret = kernel_sendpage(queue->sock, virt_to_page(pdu), - offset_in_page(pdu) + req->offset, len, - MSG_DONTWAIT | MSG_MORE | MSG_SENDPAGE_NOTLAST); - else - ret = sock_no_sendpage(queue->sock, virt_to_page(pdu), - offset_in_page(pdu) + req->offset, len, - MSG_DONTWAIT | MSG_MORE); + msg.msg_flags |= MSG_SPLICE_PAGES | MSG_SENDPAGE_NOTLAST; + + bvec_set_virt(&bvec, (void *)pdu + req->offset, len); + iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, len); + ret = sock_sendmsg(queue->sock, &msg); if (unlikely(ret <= 0)) return ret; diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c index d6cc557cc539..00b491abf50f 100644 --- a/drivers/nvme/target/tcp.c +++ b/drivers/nvme/target/tcp.c @@ -548,13 +548,18 @@ static void nvmet_tcp_execute_request(struct nvmet_tcp_cmd *cmd) static int nvmet_try_send_data_pdu(struct nvmet_tcp_cmd *cmd) { + struct bio_vec bvec; + struct msghdr msg = { + .msg_flags = (MSG_DONTWAIT | MSG_MORE | MSG_SENDPAGE_NOTLAST | + MSG_SPLICE_PAGES), + }; u8 hdgst = nvmet_tcp_hdgst_len(cmd->queue); int left = sizeof(*cmd->data_pdu) - cmd->offset + hdgst; int ret; - ret = kernel_sendpage(cmd->queue->sock, virt_to_page(cmd->data_pdu), - offset_in_page(cmd->data_pdu) + cmd->offset, - left, MSG_DONTWAIT | MSG_MORE | MSG_SENDPAGE_NOTLAST); + bvec_set_virt(&bvec, (void *)cmd->data_pdu + cmd->offset, left); + iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, left); + ret = sock_sendmsg(cmd->queue->sock, &msg); if (ret <= 0) return ret; @@ -575,17 +580,21 @@ static int nvmet_try_send_data(struct nvmet_tcp_cmd *cmd, bool last_in_batch) int ret; while (cmd->cur_sg) { + struct bio_vec bvec; + struct msghdr msg = { + .msg_flags = MSG_DONTWAIT | MSG_SPLICE_PAGES, + }; struct page *page = sg_page(cmd->cur_sg); u32 left = cmd->cur_sg->length - cmd->offset; - int flags = MSG_DONTWAIT; if ((!last_in_batch && cmd->queue->send_list_len) || cmd->wbytes_done + left < cmd->req.transfer_len || queue->data_digest || !queue->nvme_sq.sqhd_disabled) - flags |= MSG_MORE | MSG_SENDPAGE_NOTLAST; + msg.msg_flags |= MSG_MORE | MSG_SENDPAGE_NOTLAST; - ret = kernel_sendpage(cmd->queue->sock, page, cmd->offset, - left, flags); + bvec_set_page(&bvec, page, left, cmd->offset); + iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, left); + ret = sock_sendmsg(cmd->queue->sock, &msg); if (ret <= 0) return ret; @@ -621,18 +630,20 @@ static int nvmet_try_send_data(struct nvmet_tcp_cmd *cmd, bool last_in_batch) static int nvmet_try_send_response(struct nvmet_tcp_cmd *cmd, bool last_in_batch) { + struct bio_vec bvec; + struct msghdr msg = { .msg_flags = MSG_DONTWAIT | MSG_SPLICE_PAGES, }; u8 hdgst = nvmet_tcp_hdgst_len(cmd->queue); int left = sizeof(*cmd->rsp_pdu) - cmd->offset + hdgst; - int flags = MSG_DONTWAIT; int ret; if (!last_in_batch && cmd->queue->send_list_len) - flags |= MSG_MORE | MSG_SENDPAGE_NOTLAST; + msg.msg_flags |= MSG_MORE | MSG_SENDPAGE_NOTLAST; else - flags |= MSG_EOR; + msg.msg_flags |= MSG_EOR; - ret = kernel_sendpage(cmd->queue->sock, virt_to_page(cmd->rsp_pdu), - offset_in_page(cmd->rsp_pdu) + cmd->offset, left, flags); + bvec_set_virt(&bvec, (void *)cmd->rsp_pdu + cmd->offset, left); + iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, left); + ret = sock_sendmsg(cmd->queue->sock, &msg); if (ret <= 0) return ret; cmd->offset += ret; @@ -649,18 +660,20 @@ static int nvmet_try_send_response(struct nvmet_tcp_cmd *cmd, static int nvmet_try_send_r2t(struct nvmet_tcp_cmd *cmd, bool last_in_batch) { + struct bio_vec bvec; + struct msghdr msg = { .msg_flags = MSG_DONTWAIT | MSG_SPLICE_PAGES, }; u8 hdgst = nvmet_tcp_hdgst_len(cmd->queue); int left = sizeof(*cmd->r2t_pdu) - cmd->offset + hdgst; - int flags = MSG_DONTWAIT; int ret; if (!last_in_batch && cmd->queue->send_list_len) - flags |= MSG_MORE | MSG_SENDPAGE_NOTLAST; + msg.msg_flags |= MSG_MORE | MSG_SENDPAGE_NOTLAST; else - flags |= MSG_EOR; + msg.msg_flags |= MSG_EOR; - ret = kernel_sendpage(cmd->queue->sock, virt_to_page(cmd->r2t_pdu), - offset_in_page(cmd->r2t_pdu) + cmd->offset, left, flags); + bvec_set_virt(&bvec, (void *)cmd->r2t_pdu + cmd->offset, left); + iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, left); + ret = sock_sendmsg(cmd->queue->sock, &msg); if (ret <= 0) return ret; cmd->offset += ret; From patchwork Wed Mar 29 14:13:49 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 76625 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp460921vqo; Wed, 29 Mar 2023 07:36:13 -0700 (PDT) X-Google-Smtp-Source: AKy350YMGQ6kXAXm/lbheXm6sAj61DZkFJR6nGk1TsG5mjjLuB/GIwdlKZAtbD0mGsqu/RdLZs9X X-Received: by 2002:a17:907:1c21:b0:8b1:7ead:7d43 with SMTP id nc33-20020a1709071c2100b008b17ead7d43mr27128626ejc.50.1680100573402; Wed, 29 Mar 2023 07:36:13 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680100573; cv=none; d=google.com; s=arc-20160816; b=C/DetKO7UbpnsUxyJfwWsO0MTsDfLGMt2m9QENGfdGB4WHYD+8d9+0ElqxGigzD081 +7QA7Kb+qpPbuesbT3Hwn9MoIWO0rBUbQUj7oixywCSA0bbIK6OK9HTNJEEhnwiIXJD9 0xDZ9PbAW787n5hZG/1R0eSCDXtfOxhkxtjyHLtmB1wmqyLjEyRVR66e7NRa9BjMs5Mu D/FWTLjp97d5dvP4nC5eIzpyFjKXHUSl9DC2KCKPnDp8EJBPIlfpLfum8Md4MzU22kzD UEnLaPzmuOIpxdzE/aY0xFAcEhXzlU+HJxb8ang8et5ea3EDjRuTzTNwpgKKZo5SlPOh o4fA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=T9xMfNdsrDCo/ep+SBwQk+++P3r8+NfDYHGjOqRd99M=; b=eIYwaE6QzUU1dPzkZM/iIk/4Nq+YLaHuQl+F5VdAFmnclwqkeAJaF8HGnqrxMuDhku ibjPPbv8DqGYnIrG0qZpd2TCV1IS2XhZ+xiIxrdMP0YvBSG/ftIsiXPuT5fUt0M4BZsz okCHyG/Olc1zCyN0Pz3yPTg7sGiArB1INek56MXm9hjQutt9/fxBHvGb6DATXn2Preei G1eWRK7VthNxi6sykYvZdjdqjnfYWHd6JK2KiGELhAmC6Ivg8rCJA/yI/7wx73kP22Gi De8oyAOTMqOdMu1ezcB9QatWCrs5jw97StB0DDVmgHAGXf/5oOYwbB7YIRbX9yx10mKx xpYg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=B6oK97eq; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id l3-20020aa7d943000000b004acda4c9666si31326874eds.225.2023.03.29.07.35.48; Wed, 29 Mar 2023 07:36:13 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=B6oK97eq; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229451AbjC2OXN (ORCPT + 99 others); Wed, 29 Mar 2023 10:23:13 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54678 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230014AbjC2OVS (ORCPT ); Wed, 29 Mar 2023 10:21:18 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D4F6461B9 for ; Wed, 29 Mar 2023 07:16:30 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1680099366; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=T9xMfNdsrDCo/ep+SBwQk+++P3r8+NfDYHGjOqRd99M=; b=B6oK97eqLlCXZV5xUMbWjlIdfzklIE/wwgvTFCzK0eTHp1LzfXkkCIP31Wx2qjbPIqk9D3 TX4JnhfRx6Blz6TrZ6StZ5U5mMLA0HmuXYod5vbYYUHMowBAkzs9rBdnxCAjeJpNiXb1UQ rNvpIfjyCpOS5avtGpqxRdQEGAc6jNk= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-610-Ojh5V2k9OAOrBsNjIDnCsA-1; Wed, 29 Mar 2023 10:16:01 -0400 X-MC-Unique: Ojh5V2k9OAOrBsNjIDnCsA-1 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 7C34E299E77B; Wed, 29 Mar 2023 14:15:58 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.18]) by smtp.corp.redhat.com (Postfix) with ESMTP id 615572166B33; Wed, 29 Mar 2023 14:15:56 +0000 (UTC) From: David Howells To: Matthew Wilcox , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni Cc: David Howells , Al Viro , Christoph Hellwig , Jens Axboe , Jeff Layton , Christian Brauner , Chuck Lever III , Linus Torvalds , netdev@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Tom Herbert , Tom Herbert Subject: [RFC PATCH v2 43/48] kcm: Use sendmsg(MSG_SPLICE_PAGES) rather then sendpage Date: Wed, 29 Mar 2023 15:13:49 +0100 Message-Id: <20230329141354.516864-44-dhowells@redhat.com> In-Reply-To: <20230329141354.516864-1-dhowells@redhat.com> References: <20230329141354.516864-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.6 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1761713139123379043?= X-GMAIL-MSGID: =?utf-8?q?1761713139123379043?= When transmitting data, call down into TCP using a single sendmsg with MSG_SPLICE_PAGES to indicate that content should be spliced rather than performing several sendmsg and sendpage calls to transmit header, data pages and trailer. Signed-off-by: David Howells cc: Tom Herbert cc: Tom Herbert cc: "David S. Miller" cc: Eric Dumazet cc: Jakub Kicinski cc: Paolo Abeni cc: Jens Axboe cc: Matthew Wilcox cc: netdev@vger.kernel.org --- net/kcm/kcmsock.c | 15 ++++++++++----- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/net/kcm/kcmsock.c b/net/kcm/kcmsock.c index cfe828bd7fc6..819259149952 100644 --- a/net/kcm/kcmsock.c +++ b/net/kcm/kcmsock.c @@ -641,6 +641,10 @@ static int kcm_write_msgs(struct kcm_sock *kcm) for (fragidx = 0; fragidx < skb_shinfo(skb)->nr_frags; fragidx++) { + struct bio_vec bvec; + struct msghdr msg = { + .msg_flags = MSG_DONTWAIT | MSG_SPLICE_PAGES, + }; skb_frag_t *frag; frag_offset = 0; @@ -651,11 +655,12 @@ static int kcm_write_msgs(struct kcm_sock *kcm) goto out; } - ret = kernel_sendpage(psock->sk->sk_socket, - skb_frag_page(frag), - skb_frag_off(frag) + frag_offset, - skb_frag_size(frag) - frag_offset, - MSG_DONTWAIT); + bvec_set_page(&bvec, + skb_frag_page(frag), + skb_frag_size(frag) - frag_offset, + skb_frag_off(frag) + frag_offset); + iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, bvec.bv_len); + ret = sock_sendmsg(psock->sk->sk_socket, &msg); if (ret <= 0) { if (ret == -EAGAIN) { /* Save state to try again when there's From patchwork Wed Mar 29 14:13:50 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 76621 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp459661vqo; Wed, 29 Mar 2023 07:34:29 -0700 (PDT) X-Google-Smtp-Source: AKy350aqS4C6vi9As4jT45ycggUgbGe2Uiqmq3NmrpZ+EhU/gfMz71PJuve/VCocUGbhboDZca0W X-Received: by 2002:a17:907:8a8e:b0:944:49ee:aea2 with SMTP id sf14-20020a1709078a8e00b0094449eeaea2mr14062030ejc.71.1680100468937; Wed, 29 Mar 2023 07:34:28 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680100468; cv=none; d=google.com; s=arc-20160816; b=jwpBw75YshY3TbczaCFzvhKyGZ45U4rLw5n5G8Of6y5ykOsm0kT2LAdGBQvyyB6q49 vx/cC8fibpTmsSSe+gMpixTfxsU12XLBl8OgMYCSRV1L5AQS/HfGGOvQJOZ7ZVD8g7bw 7G+bEI7E5QbVCEehnKDHjXwIk2JN/RQC0oVgIXzaU2i/E+r/9r55WtouCjnAZsg1sxp5 4SbLLiYdDi8itKNh42Agc6LhujKNuzDasRMFv7sE3o3KyfaS6aMTq+5xuOde3JW9Y8RD hxKDSGI6HOr88mPeO7Hy3T2xpqcJ4GhNX9bYUc9O0yJk/UD5PNhOPybO5TrxogmezkiS PWNw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=7fCBjYGLxHqikc/tNao5Lfq+HewWC5ZgTjyFTAyeW0Q=; b=AV2gS/WEgFwIWAD1CI2OkksAHkRU82WAmEfZcd/L7L7EJvgzhh/yAmq19rGLMqBECt ez978BYTqtOqGe9i+vB0O69+LoUaODEVePBY1HEI3f2KpfC6huA0UK3Dk7kTKPuqUIiu I9CjeJP+urjWpSLkmLD7Db1CQfoOlXkXocTNnIvHps6EiqcJCybk54ln4eGw8UIBnwNU Df29nFTZpRoXZOhlTYV9jQOVBGHeHk9Zd3zXScl3HNAhiNlYxywEGjJOKlQhVtwZDgJb Zsl0ZyaO5y+QKa35iXJ1FxMF79bahy3CpLJYCAnceRD9FjakjAatZ0/W4LI3BGNIQ5PE bsfQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=ccrb4F9k; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id ne26-20020a1709077b9a00b00939ad2676ffsi28484216ejc.762.2023.03.29.07.34.04; Wed, 29 Mar 2023 07:34:28 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=ccrb4F9k; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231440AbjC2OXQ (ORCPT + 99 others); Wed, 29 Mar 2023 10:23:16 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55152 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230514AbjC2OVS (ORCPT ); Wed, 29 Mar 2023 10:21:18 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B654155B4 for ; Wed, 29 Mar 2023 07:16:30 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1680099366; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=7fCBjYGLxHqikc/tNao5Lfq+HewWC5ZgTjyFTAyeW0Q=; b=ccrb4F9kQU5F4JRMK1q+wVJB2/67CR/0xNjY1G0nv6YKHi/Jpwyvp6if5GfisJqQ9HJ0+I 47M6ho3DuZaoi9CpEsBgT6YkMLvzrmP+z+82+nfZPT2/KVvEyo0kfngXNijl+svdAmCdmv dVp5VlyeDWruqI3fgRujGnuCBmXlwlc= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-114-46eAsXOAN6OrZ73_2qvGIA-1; Wed, 29 Mar 2023 10:16:03 -0400 X-MC-Unique: 46eAsXOAN6OrZ73_2qvGIA-1 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.rdu2.redhat.com [10.11.54.4]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 5E15C8030D4; Wed, 29 Mar 2023 14:16:01 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.18]) by smtp.corp.redhat.com (Postfix) with ESMTP id 1928E202701F; Wed, 29 Mar 2023 14:15:59 +0000 (UTC) From: David Howells To: Matthew Wilcox , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni Cc: David Howells , Al Viro , Christoph Hellwig , Jens Axboe , Jeff Layton , Christian Brauner , Chuck Lever III , Linus Torvalds , netdev@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Karsten Graul , Wenjia Zhang , Jan Karcher , linux-s390@vger.kernel.org Subject: [RFC PATCH v2 44/48] smc: Drop smc_sendpage() in favour of smc_sendmsg() + MSG_SPLICE_PAGES Date: Wed, 29 Mar 2023 15:13:50 +0100 Message-Id: <20230329141354.516864-45-dhowells@redhat.com> In-Reply-To: <20230329141354.516864-1-dhowells@redhat.com> References: <20230329141354.516864-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.4 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1761713029469914948?= X-GMAIL-MSGID: =?utf-8?q?1761713029469914948?= Drop the smc_sendpage() code as smc_sendmsg() just passes the call down to the underlying TCP socket and smc_tx_sendpage() is just a wrapper around its sendmsg implementation. Signed-off-by: David Howells cc: Karsten Graul cc: Wenjia Zhang cc: Jan Karcher cc: "David S. Miller" cc: Eric Dumazet cc: Jakub Kicinski cc: Paolo Abeni cc: Jens Axboe cc: Matthew Wilcox cc: linux-s390@vger.kernel.org cc: netdev@vger.kernel.org --- net/smc/af_smc.c | 29 ----------------------------- net/smc/smc_stats.c | 2 +- net/smc/smc_stats.h | 1 - net/smc/smc_tx.c | 16 ---------------- net/smc/smc_tx.h | 2 -- 5 files changed, 1 insertion(+), 49 deletions(-) diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c index a4cccdfdc00a..d4113c8a7cda 100644 --- a/net/smc/af_smc.c +++ b/net/smc/af_smc.c @@ -3125,34 +3125,6 @@ static int smc_ioctl(struct socket *sock, unsigned int cmd, return put_user(answ, (int __user *)arg); } -static ssize_t smc_sendpage(struct socket *sock, struct page *page, - int offset, size_t size, int flags) -{ - struct sock *sk = sock->sk; - struct smc_sock *smc; - int rc = -EPIPE; - - smc = smc_sk(sk); - lock_sock(sk); - if (sk->sk_state != SMC_ACTIVE) { - release_sock(sk); - goto out; - } - release_sock(sk); - if (smc->use_fallback) { - rc = kernel_sendpage(smc->clcsock, page, offset, - size, flags); - } else { - lock_sock(sk); - rc = smc_tx_sendpage(smc, page, offset, size, flags); - release_sock(sk); - SMC_STAT_INC(smc, sendpage_cnt); - } - -out: - return rc; -} - /* Map the affected portions of the rmbe into an spd, note the number of bytes * to splice in conn->splice_pending, and press 'go'. Delays consumer cursor * updates till whenever a respective page has been fully processed. @@ -3224,7 +3196,6 @@ static const struct proto_ops smc_sock_ops = { .sendmsg = smc_sendmsg, .recvmsg = smc_recvmsg, .mmap = sock_no_mmap, - .sendpage = smc_sendpage, .splice_read = smc_splice_read, }; diff --git a/net/smc/smc_stats.c b/net/smc/smc_stats.c index e80e34f7ac15..ca14c0f3a07d 100644 --- a/net/smc/smc_stats.c +++ b/net/smc/smc_stats.c @@ -227,7 +227,7 @@ static int smc_nl_fill_stats_tech_data(struct sk_buff *skb, SMC_NLA_STATS_PAD)) goto errattr; if (nla_put_u64_64bit(skb, SMC_NLA_STATS_T_SENDPAGE_CNT, - smc_tech->sendpage_cnt, + 0, SMC_NLA_STATS_PAD)) goto errattr; if (nla_put_u64_64bit(skb, SMC_NLA_STATS_T_CORK_CNT, diff --git a/net/smc/smc_stats.h b/net/smc/smc_stats.h index 84b7ecd8c05c..b60fe1eb37ab 100644 --- a/net/smc/smc_stats.h +++ b/net/smc/smc_stats.h @@ -71,7 +71,6 @@ struct smc_stats_tech { u64 clnt_v2_succ_cnt; u64 srv_v1_succ_cnt; u64 srv_v2_succ_cnt; - u64 sendpage_cnt; u64 urg_data_cnt; u64 splice_cnt; u64 cork_cnt; diff --git a/net/smc/smc_tx.c b/net/smc/smc_tx.c index f4b6a71ac488..d31ce8209fa2 100644 --- a/net/smc/smc_tx.c +++ b/net/smc/smc_tx.c @@ -298,22 +298,6 @@ int smc_tx_sendmsg(struct smc_sock *smc, struct msghdr *msg, size_t len) return rc; } -int smc_tx_sendpage(struct smc_sock *smc, struct page *page, int offset, - size_t size, int flags) -{ - struct msghdr msg = {.msg_flags = flags}; - char *kaddr = kmap(page); - struct kvec iov; - int rc; - - iov.iov_base = kaddr + offset; - iov.iov_len = size; - iov_iter_kvec(&msg.msg_iter, ITER_SOURCE, &iov, 1, size); - rc = smc_tx_sendmsg(smc, &msg, size); - kunmap(page); - return rc; -} - /***************************** sndbuf consumer *******************************/ /* sndbuf consumer: actual data transfer of one target chunk with ISM write */ diff --git a/net/smc/smc_tx.h b/net/smc/smc_tx.h index 34b578498b1f..a59f370b8b43 100644 --- a/net/smc/smc_tx.h +++ b/net/smc/smc_tx.h @@ -31,8 +31,6 @@ void smc_tx_pending(struct smc_connection *conn); void smc_tx_work(struct work_struct *work); void smc_tx_init(struct smc_sock *smc); int smc_tx_sendmsg(struct smc_sock *smc, struct msghdr *msg, size_t len); -int smc_tx_sendpage(struct smc_sock *smc, struct page *page, int offset, - size_t size, int flags); int smc_tx_sndbuf_nonempty(struct smc_connection *conn); void smc_tx_sndbuf_nonfull(struct smc_sock *smc); void smc_tx_consumer_update(struct smc_connection *conn, bool force); From patchwork Wed Mar 29 14:13:51 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 76631 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp462309vqo; Wed, 29 Mar 2023 07:38:29 -0700 (PDT) X-Google-Smtp-Source: AKy350bKrSheAFtCObtA9czJRiQDaF9qZhTCdFBmU5j6vadWxCZWgLHepS0HcxqcmMVOKQXDGi0w X-Received: by 2002:a17:906:6449:b0:8eb:d3a5:b9f0 with SMTP id l9-20020a170906644900b008ebd3a5b9f0mr19953656ejn.67.1680100709023; Wed, 29 Mar 2023 07:38:29 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680100709; cv=none; d=google.com; s=arc-20160816; b=YsfIXy5GWtbTnh5Nc7MNm0aQALpkVtcFK5ULXpqdIEFv69kL02PwqXp9NgvVk3nYv9 etauoT/mpnVVg/hcYllwoaVxWeAp7x29FkStMEUuqLoiZXd6KDSlM9CXQVKmwzs4+Ujn wgjU8FwN19wMpD/JnGYi9kd0OHCbooj3ByvGY6C7nArNpuH+TI1HvHwx87FxZfparBQq QQcQ9AgW/8Ip29Kv2Dsol0+BocBR9txi8ZKHtBdyNzyzTtoNWVO8V9L9TeocNef3AKZs MsiXRc0SeWcqt3HvluRkaqljzYxcSgU+2lbQgAlqruJTNvjD9mHq1TdwC4pT5iHbFHyb zBEg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=UieBw+fd7YOFzWwOOxTgfiaOtpF+6tMbkd08W9f/ez8=; b=BGo1krICADqULnO+Xi1SnCPPrqWGVlRbh7St+TTfalb9Z4jW54taSQCRa5Ha7uZhgv iA35x8xxs3GLNmm80rFmprr9F/moYYQHvn3MFHGH0nlCsjfbJpQYCwkp0XqbpNdZsiSW tychoCb/LILXwExUfdzhBFAfxYDyDXrwm9gJqukLVYrSXrM/9D5mZZ6sbc8dHOpzFgRu P3avODHZK4axDQ2TvTSGYvpeRVAKcr2XSZdLYsJPyTy130bpRuAdEX5xR+GaPyn5kGIO AGteuslAPq7HcSvH949UcxkNGvnq7f7xr/uRg7a5TtdrNfHNHJSz+tjlVZnGVVkA2Ik1 Wz/g== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=bymhAKGc; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id c18-20020aa7d612000000b004fd298d4b3esi12934088edr.373.2023.03.29.07.38.05; Wed, 29 Mar 2023 07:38:29 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=bymhAKGc; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230313AbjC2OXl (ORCPT + 99 others); Wed, 29 Mar 2023 10:23:41 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35902 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231186AbjC2OWJ (ORCPT ); Wed, 29 Mar 2023 10:22:09 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3201E6593 for ; Wed, 29 Mar 2023 07:16:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1680099382; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=UieBw+fd7YOFzWwOOxTgfiaOtpF+6tMbkd08W9f/ez8=; b=bymhAKGcXM30P46CJzLFZPViVRpr6/qeB+a0xE0/eYtjo+7hbQd6u0gLcsi2wC64xE8pjV SrGEMkgkHJ3tqPtqdQOMP/y7w+/CwMDd4r6RRlnBo9eSSc/RaIPa83CVyAJjDrgtw6tFM6 f9fwZHYdgwtosEalA1ZegPGU8J3RmyY= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-446-OM0xdWQoMNypMjdol7M5aQ-1; Wed, 29 Mar 2023 10:16:17 -0400 X-MC-Unique: OM0xdWQoMNypMjdol7M5aQ-1 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.rdu2.redhat.com [10.11.54.1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 0E9A9299E75F; Wed, 29 Mar 2023 14:16:05 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.18]) by smtp.corp.redhat.com (Postfix) with ESMTP id 1FF3340B3ED9; Wed, 29 Mar 2023 14:16:02 +0000 (UTC) From: David Howells To: Matthew Wilcox , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni Cc: David Howells , Al Viro , Christoph Hellwig , Jens Axboe , Jeff Layton , Christian Brauner , Chuck Lever III , Linus Torvalds , netdev@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Mark Fasheh , Joel Becker , Joseph Qi , ocfs2-devel@oss.oracle.com Subject: [RFC PATCH v2 45/48] ocfs2: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage() Date: Wed, 29 Mar 2023 15:13:51 +0100 Message-Id: <20230329141354.516864-46-dhowells@redhat.com> In-Reply-To: <20230329141354.516864-1-dhowells@redhat.com> References: <20230329141354.516864-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.1 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1761713281146052971?= X-GMAIL-MSGID: =?utf-8?q?1761713281146052971?= Fix ocfs2 to use the page fragment allocator rather than kzalloc in order to allocate the buffers for the handshake message and keepalive request and reply messages. Slab pages should not be given to sendpage, but fragments can be. Switch from using sendpage() to using sendmsg() + MSG_SPLICE_PAGES so that sendpage can be phased out. Signed-off-by: David Howells cc: Mark Fasheh cc: Joel Becker cc: Joseph Qi cc: "David S. Miller" cc: Eric Dumazet cc: Jakub Kicinski cc: Paolo Abeni cc: ocfs2-devel@oss.oracle.com cc: netdev@vger.kernel.org --- fs/ocfs2/cluster/tcp.c | 107 ++++++++++++++++++++++------------------- 1 file changed, 58 insertions(+), 49 deletions(-) diff --git a/fs/ocfs2/cluster/tcp.c b/fs/ocfs2/cluster/tcp.c index aecbd712a00c..e568ad2f34bf 100644 --- a/fs/ocfs2/cluster/tcp.c +++ b/fs/ocfs2/cluster/tcp.c @@ -110,9 +110,6 @@ static struct work_struct o2net_listen_work; static struct o2hb_callback_func o2net_hb_up, o2net_hb_down; #define O2NET_HB_PRI 0x1 -static struct o2net_handshake *o2net_hand; -static struct o2net_msg *o2net_keep_req, *o2net_keep_resp; - static int o2net_sys_err_translations[O2NET_ERR_MAX] = {[O2NET_ERR_NONE] = 0, [O2NET_ERR_NO_HNDLR] = -ENOPROTOOPT, @@ -930,19 +927,22 @@ static int o2net_send_tcp_msg(struct socket *sock, struct kvec *vec, } static void o2net_sendpage(struct o2net_sock_container *sc, - void *kmalloced_virt, - size_t size) + void *virt, size_t size) { struct o2net_node *nn = o2net_nn_from_num(sc->sc_node->nd_num); + struct msghdr msg = {}; + struct bio_vec bv; ssize_t ret; + bvec_set_virt(&bv, virt, size); + iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bv, 1, size); + while (1) { + msg.msg_flags = MSG_DONTWAIT | MSG_SPLICE_PAGES; mutex_lock(&sc->sc_send_lock); - ret = sc->sc_sock->ops->sendpage(sc->sc_sock, - virt_to_page(kmalloced_virt), - offset_in_page(kmalloced_virt), - size, MSG_DONTWAIT); + ret = sock_sendmsg(sc->sc_sock, &msg); mutex_unlock(&sc->sc_send_lock); + if (ret == size) break; if (ret == (ssize_t)-EAGAIN) { @@ -1168,6 +1168,7 @@ static int o2net_process_message(struct o2net_sock_container *sc, struct o2net_msg *hdr) { struct o2net_node *nn = o2net_nn_from_num(sc->sc_node->nd_num); + struct o2net_msg *keep_resp; int ret = 0, handler_status; enum o2net_system_error syserr; struct o2net_msg_handler *nmh = NULL; @@ -1186,8 +1187,16 @@ static int o2net_process_message(struct o2net_sock_container *sc, be32_to_cpu(hdr->status)); goto out; case O2NET_MSG_KEEP_REQ_MAGIC: - o2net_sendpage(sc, o2net_keep_resp, - sizeof(*o2net_keep_resp)); + keep_resp = page_frag_alloc(NULL, sizeof(*keep_resp), + GFP_KERNEL); + if (!keep_resp) { + ret = -ENOMEM; + goto out; + } + memset(keep_resp, 0, sizeof(*keep_resp)); + keep_resp->magic = cpu_to_be16(O2NET_MSG_KEEP_RESP_MAGIC); + o2net_sendpage(sc, keep_resp, sizeof(*keep_resp)); + folio_put(virt_to_folio(keep_resp)); goto out; case O2NET_MSG_KEEP_RESP_MAGIC: goto out; @@ -1439,15 +1448,22 @@ static void o2net_rx_until_empty(struct work_struct *work) sc_put(sc); } -static void o2net_initialize_handshake(void) +static struct o2net_handshake *o2net_initialize_handshake(void) { - o2net_hand->o2hb_heartbeat_timeout_ms = cpu_to_be32( - O2HB_MAX_WRITE_TIMEOUT_MS); - o2net_hand->o2net_idle_timeout_ms = cpu_to_be32(o2net_idle_timeout()); - o2net_hand->o2net_keepalive_delay_ms = cpu_to_be32( - o2net_keepalive_delay()); - o2net_hand->o2net_reconnect_delay_ms = cpu_to_be32( - o2net_reconnect_delay()); + struct o2net_handshake *hand; + + hand = page_frag_alloc(NULL, sizeof(*hand), GFP_KERNEL); + if (!hand) + return NULL; + + memset(hand, 0, sizeof(*hand)); + hand->protocol_version = cpu_to_be64(O2NET_PROTOCOL_VERSION); + hand->connector_id = cpu_to_be64(1); + hand->o2hb_heartbeat_timeout_ms = cpu_to_be32(O2HB_MAX_WRITE_TIMEOUT_MS); + hand->o2net_idle_timeout_ms = cpu_to_be32(o2net_idle_timeout()); + hand->o2net_keepalive_delay_ms = cpu_to_be32(o2net_keepalive_delay()); + hand->o2net_reconnect_delay_ms = cpu_to_be32(o2net_reconnect_delay()); + return hand; } /* ------------------------------------------------------------ */ @@ -1456,16 +1472,22 @@ static void o2net_initialize_handshake(void) * rx path will see the response and mark the sc valid */ static void o2net_sc_connect_completed(struct work_struct *work) { + struct o2net_handshake *hand; struct o2net_sock_container *sc = container_of(work, struct o2net_sock_container, sc_connect_work); + hand = o2net_initialize_handshake(); + if (!hand) + goto out; + mlog(ML_MSG, "sc sending handshake with ver %llu id %llx\n", (unsigned long long)O2NET_PROTOCOL_VERSION, - (unsigned long long)be64_to_cpu(o2net_hand->connector_id)); + (unsigned long long)be64_to_cpu(hand->connector_id)); - o2net_initialize_handshake(); - o2net_sendpage(sc, o2net_hand, sizeof(*o2net_hand)); + o2net_sendpage(sc, hand, sizeof(*hand)); + folio_put(virt_to_folio(hand)); +out: sc_put(sc); } @@ -1475,8 +1497,15 @@ static void o2net_sc_send_keep_req(struct work_struct *work) struct o2net_sock_container *sc = container_of(work, struct o2net_sock_container, sc_keepalive_work.work); + struct o2net_msg *keep_req; - o2net_sendpage(sc, o2net_keep_req, sizeof(*o2net_keep_req)); + keep_req = page_frag_alloc(NULL, sizeof(*keep_req), GFP_KERNEL); + if (keep_req) { + memset(keep_req, 0, sizeof(*keep_req)); + keep_req->magic = cpu_to_be16(O2NET_MSG_KEEP_REQ_MAGIC); + o2net_sendpage(sc, keep_req, sizeof(*keep_req)); + folio_put(virt_to_folio(keep_req)); + } sc_put(sc); } @@ -1780,6 +1809,7 @@ static int o2net_accept_one(struct socket *sock, int *more) struct socket *new_sock = NULL; struct o2nm_node *node = NULL; struct o2nm_node *local_node = NULL; + struct o2net_handshake *hand; struct o2net_sock_container *sc = NULL; struct o2net_node *nn; unsigned int nofs_flag; @@ -1882,8 +1912,11 @@ static int o2net_accept_one(struct socket *sock, int *more) o2net_register_callbacks(sc->sc_sock->sk, sc); o2net_sc_queue_work(sc, &sc->sc_rx_work); - o2net_initialize_handshake(); - o2net_sendpage(sc, o2net_hand, sizeof(*o2net_hand)); + hand = o2net_initialize_handshake(); + if (hand) { + o2net_sendpage(sc, hand, sizeof(*hand)); + folio_put(virt_to_folio(hand)); + } out: if (new_sock) @@ -2090,21 +2123,8 @@ int o2net_init(void) unsigned long i; o2quo_init(); - o2net_debugfs_init(); - o2net_hand = kzalloc(sizeof(struct o2net_handshake), GFP_KERNEL); - o2net_keep_req = kzalloc(sizeof(struct o2net_msg), GFP_KERNEL); - o2net_keep_resp = kzalloc(sizeof(struct o2net_msg), GFP_KERNEL); - if (!o2net_hand || !o2net_keep_req || !o2net_keep_resp) - goto out; - - o2net_hand->protocol_version = cpu_to_be64(O2NET_PROTOCOL_VERSION); - o2net_hand->connector_id = cpu_to_be64(1); - - o2net_keep_req->magic = cpu_to_be16(O2NET_MSG_KEEP_REQ_MAGIC); - o2net_keep_resp->magic = cpu_to_be16(O2NET_MSG_KEEP_RESP_MAGIC); - for (i = 0; i < ARRAY_SIZE(o2net_nodes); i++) { struct o2net_node *nn = o2net_nn_from_num(i); @@ -2122,21 +2142,10 @@ int o2net_init(void) } return 0; - -out: - kfree(o2net_hand); - kfree(o2net_keep_req); - kfree(o2net_keep_resp); - o2net_debugfs_exit(); - o2quo_exit(); - return -ENOMEM; } void o2net_exit(void) { o2quo_exit(); - kfree(o2net_hand); - kfree(o2net_keep_req); - kfree(o2net_keep_resp); o2net_debugfs_exit(); } From patchwork Wed Mar 29 14:13:52 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 76618 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp455408vqo; Wed, 29 Mar 2023 07:28:28 -0700 (PDT) X-Google-Smtp-Source: AKy350azMd5+uIf1PavyntaSxurPRYXzZfZxqR+HWZqHJKYpU+JlDoqxqJUG9j0PwxmBsGlN9n+n X-Received: by 2002:a17:906:4b51:b0:92f:3e2b:fbb7 with SMTP id j17-20020a1709064b5100b0092f3e2bfbb7mr19315538ejv.14.1680100108585; Wed, 29 Mar 2023 07:28:28 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680100108; cv=none; d=google.com; s=arc-20160816; b=ifIdPz5YgTZdGkLj4Lqf6p75JptJXaOuPvuCnm4mgiPEdm3+AdUrggpRYWfQgJSOPl NUifxWaRcG9hyJyJwH0kOG0TtluRJ/YeTr6rU29dyBsJPIRZGIuAbFP99Sx4U6ypEbKO W6oni74q0drCE0CQKzKv9ZAxUCZ07zw2dS2hBvfrdTNBfJ+LZ1GxuwW2sXreiJBnWr1l 3/ptgea24b0N27PonSU5i6sO9E6rYlItnipRYnoMw+L0TVoqz+wJJoA738NPR1x+jDGL R0jc99fWogbAO6Gkmfr9bs3mZO5AYTDeY3SFoXs6kJ8dmyiCHv7MJ3A4PDuB4vN41FT5 uqmg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=ZJvfcCj7beM2l3aL2q8ZqMjrzC72DifYH9bqCAjpsUU=; b=awJNsNK3WysEKTLG0X7k7KmZS4PwUSqjbddgPrFbE9W2nD14pC75O4ZriY5SRr+1N9 HzuaXVwcIysl2iBIj6cRw1BEzYYVvPpfBWAUlXQFcnuqSExDuhHpeLs4XIcz8f4TExg/ B+D2dg+D5n9E3ga3hzhocIjp35vGFQUDzJkWuL4ObJJoIs7Fd2U8sXb/nm8F9jjEozfE 8mmpX+EPQ7k0AY8ZAF/MOs8IcINOIDTQqAtpPjK17InrgSjnpqweSfRElReabf/eccNd nAUEHpHE8skhiyGpIkTPuyu9hGv8YC5DkBJ7WqDSaHt7OILGuAiSeM3VH3TtzA0Li9X1 08uw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=RYyVFfzJ; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id o21-20020aa7d3d5000000b004acc6c7a631si30574811edr.179.2023.03.29.07.28.04; Wed, 29 Mar 2023 07:28:28 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=RYyVFfzJ; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231247AbjC2OXf (ORCPT + 99 others); Wed, 29 Mar 2023 10:23:35 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55216 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231243AbjC2OVq (ORCPT ); Wed, 29 Mar 2023 10:21:46 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DF62C8E for ; Wed, 29 Mar 2023 07:16:33 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1680099381; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=ZJvfcCj7beM2l3aL2q8ZqMjrzC72DifYH9bqCAjpsUU=; b=RYyVFfzJCAUQ+JS2lsDClZWDzmOl8wnGfISjdNNA9QMqZZsVBp69tGp6rS2z9x6QBt6tNp c3pXX7zIgKIHU+5lfLTZjx/NI3YLp48TPvxK43Y3hyjjTERhTl4PfidPAoZqGDmfHKQeb8 IT6+fxaqWivawkmhZYVKoFzbYL6Sv0s= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-478-i-sNnR3eM--eeVz7bh3LKA-1; Wed, 29 Mar 2023 10:16:17 -0400 X-MC-Unique: i-sNnR3eM--eeVz7bh3LKA-1 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.rdu2.redhat.com [10.11.54.7]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 241CE185A790; Wed, 29 Mar 2023 14:16:08 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.18]) by smtp.corp.redhat.com (Postfix) with ESMTP id A585114171BC; Wed, 29 Mar 2023 14:16:05 +0000 (UTC) From: David Howells To: Matthew Wilcox , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni Cc: David Howells , Al Viro , Christoph Hellwig , Jens Axboe , Jeff Layton , Christian Brauner , Chuck Lever III , Linus Torvalds , netdev@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Philipp Reisner , Lars Ellenberg , =?utf-8?q?Christoph_B=C3=B6hmwa?= =?utf-8?q?lder?= , drbd-dev@lists.linbit.com, linux-block@vger.kernel.org Subject: [RFC PATCH v2 46/48] drbd: Use sendmsg(MSG_SPLICE_PAGES) rather than sendmsg() Date: Wed, 29 Mar 2023 15:13:52 +0100 Message-Id: <20230329141354.516864-47-dhowells@redhat.com> In-Reply-To: <20230329141354.516864-1-dhowells@redhat.com> References: <20230329141354.516864-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.7 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1761712651462231698?= X-GMAIL-MSGID: =?utf-8?q?1761712651462231698?= Use sendmsg() conditionally with MSG_SPLICE_PAGES in _drbd_send_page() rather than calling sendpage() or _drbd_no_send_page(). Signed-off-by: David Howells cc: Philipp Reisner cc: Lars Ellenberg cc: "Christoph Böhmwalder" cc: Jens Axboe cc: "David S. Miller" cc: Eric Dumazet cc: Jakub Kicinski cc: Paolo Abeni cc: drbd-dev@lists.linbit.com cc: linux-block@vger.kernel.org cc: netdev@vger.kernel.org --- drivers/block/drbd/drbd_main.c | 23 ++++++++++++----------- 1 file changed, 12 insertions(+), 11 deletions(-) diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c index 2c764f7ee4a7..e5f90abd29b6 100644 --- a/drivers/block/drbd/drbd_main.c +++ b/drivers/block/drbd/drbd_main.c @@ -1532,7 +1532,8 @@ static int _drbd_send_page(struct drbd_peer_device *peer_device, struct page *pa int offset, size_t size, unsigned msg_flags) { struct socket *socket = peer_device->connection->data.socket; - int len = size; + struct bio_vec bvec; + struct msghdr msg = { .msg_flags = msg_flags, }; int err = -EIO; /* e.g. XFS meta- & log-data is in slab pages, which have a @@ -1541,33 +1542,33 @@ static int _drbd_send_page(struct drbd_peer_device *peer_device, struct page *pa * put_page(); and would cause either a VM_BUG directly, or * __page_cache_release a page that would actually still be referenced * by someone, leading to some obscure delayed Oops somewhere else. */ - if (drbd_disable_sendpage || !sendpage_ok(page)) - return _drbd_no_send_page(peer_device, page, offset, size, msg_flags); + if (!drbd_disable_sendpage && sendpage_ok(page)) + msg.msg_flags |= MSG_NOSIGNAL | MSG_SPLICE_PAGES; + + bvec_set_page(&bvec, page, offset, size); + iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size); - msg_flags |= MSG_NOSIGNAL; drbd_update_congested(peer_device->connection); do { int sent; - sent = socket->ops->sendpage(socket, page, offset, len, msg_flags); + sent = sock_sendmsg(socket, &msg); if (sent <= 0) { if (sent == -EAGAIN) { if (we_should_drop_the_connection(peer_device->connection, socket)) break; continue; } - drbd_warn(peer_device->device, "%s: size=%d len=%d sent=%d\n", - __func__, (int)size, len, sent); + drbd_warn(peer_device->device, "%s: size=%d len=%zu sent=%d\n", + __func__, (int)size, msg_data_left(&msg), sent); if (sent < 0) err = sent; break; } - len -= sent; - offset += sent; - } while (len > 0 /* THINK && device->cstate >= C_CONNECTED*/); + } while (msg_data_left(&msg) /* THINK && device->cstate >= C_CONNECTED*/); clear_bit(NET_CONGESTED, &peer_device->connection->flags); - if (len == 0) { + if (!msg_data_left(&msg)) { err = 0; peer_device->device->send_cnt += size >> 9; } From patchwork Wed Mar 29 14:13:53 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 76634 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp463050vqo; Wed, 29 Mar 2023 07:39:45 -0700 (PDT) X-Google-Smtp-Source: AKy350ZP9BIQGrlcz6i3bHAjrAvpZxtaESQWwwAYXWI1aSLT8MuBOfVw0IfAqRJPLI4FAxqbDx1q X-Received: by 2002:a17:906:da88:b0:945:c4cf:34b7 with SMTP id xh8-20020a170906da8800b00945c4cf34b7mr10110527ejb.68.1680100785587; Wed, 29 Mar 2023 07:39:45 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680100785; cv=none; d=google.com; s=arc-20160816; b=LAVN1yxHDnQAhxDS+ZiYgR8Nx//rV9WKWKT0JRvbXGAimpn35HnIme/RLI7pAST8os RGafRi2vQvoYsGb9dt3GuNYpuj36Ff4QaBapUde++r7yVNh96U57z4fJBXk6lLEblQ3+ dqMszJxslz05M1BboJIygRpidcwQ6tznrVUOqkgnKTW2/OJXaymHsqFRSDDt8nB3Xv5A eyWB5CgHAxM3zEQbl2X1zqVW0DqLcDz0JuoqNKSzmexGI49bJhtA2sLw7J9HX/aN8s9c wg6bdqrBfSx0Su5mV8glJmkyN7e1+f5axF2dpN7ipu5SA/ubs7RTXDaIsOwYdOKZbl8I tZqQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=Q/kAjj+4zLWTXkqKqP2B/MWL9umh9MFmha+7vx/dBng=; b=UPHsttIe4iHuCRMCy2iIfAMuj7NXlHES68LMXSiX5dESLSdKc8r0wxCGJYC3Ltmm3K luiIsMDnQMXA5t21R7aBEgCiRlRu3/12RmzkTXFcMGGG9F8SamjINnVVc3lG6gf1nRif TIoKn4Abnu3P3M5vPOhadVlKDEgRb9YDRFFK2Cg7d9w8BYQGowcHiNydZjSg1Hc/Armk uoqLBRzyienIGr4y4QkUg3Hdww9OR27siEwa6d63GYXeNu+w81Gh+idqQUe4+dEEu5zw YcsgnenPAJSRssLH8Q9YxgiC+0cgoVCuWvyBnXtcWUP+Gqfvom+4bWaHST7AK2J+Ulsx QnMA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=TkRM1Lle; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id q23-20020a056402033700b004c10381668csi33888015edw.286.2023.03.29.07.39.17; Wed, 29 Mar 2023 07:39:45 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=TkRM1Lle; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231215AbjC2OXq (ORCPT + 99 others); Wed, 29 Mar 2023 10:23:46 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36356 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231214AbjC2OWU (ORCPT ); Wed, 29 Mar 2023 10:22:20 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5E870526D for ; Wed, 29 Mar 2023 07:16:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1680099384; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Q/kAjj+4zLWTXkqKqP2B/MWL9umh9MFmha+7vx/dBng=; b=TkRM1Llexh4aq/1IDCS0Gjz30Rp7oJPAVTT3UlVqNRUSn+dnloUC3F3UlYdW73rEGPrcxQ GTRuaDYshpL522A6mQmvLK7zTW1fZTzDv+SCAqLqZcOzRh5CcJw0bzjcEe1/Ooa03Lg5So Fmr+m2260HKE9RjY1t+Rd0yJYbm11kQ= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-270-JSybz5aLMb2BtboAW1enxw-1; Wed, 29 Mar 2023 10:16:18 -0400 X-MC-Unique: JSybz5aLMb2BtboAW1enxw-1 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.rdu2.redhat.com [10.11.54.4]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 4406F3C0ED5A; Wed, 29 Mar 2023 14:16:11 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.18]) by smtp.corp.redhat.com (Postfix) with ESMTP id DC138202701E; Wed, 29 Mar 2023 14:16:08 +0000 (UTC) From: David Howells To: Matthew Wilcox , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni Cc: David Howells , Al Viro , Christoph Hellwig , Jens Axboe , Jeff Layton , Christian Brauner , Chuck Lever III , Linus Torvalds , netdev@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Philipp Reisner , Lars Ellenberg , =?utf-8?q?Christoph_B=C3=B6hmwa?= =?utf-8?q?lder?= , drbd-dev@lists.linbit.com, linux-block@vger.kernel.org Subject: [RFC PATCH v2 47/48] drdb: Send an entire bio in a single sendmsg Date: Wed, 29 Mar 2023 15:13:53 +0100 Message-Id: <20230329141354.516864-48-dhowells@redhat.com> In-Reply-To: <20230329141354.516864-1-dhowells@redhat.com> References: <20230329141354.516864-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.4 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1761713361032348461?= X-GMAIL-MSGID: =?utf-8?q?1761713361032348461?= Since _drdb_sendpage() is now using sendmsg to send the pages rather sendpage, pass the entire bio in one go using a bvec iterator instead of doing it piecemeal. Signed-off-by: David Howells cc: Philipp Reisner cc: Lars Ellenberg cc: "Christoph Böhmwalder" cc: Jens Axboe cc: "David S. Miller" cc: Eric Dumazet cc: Jakub Kicinski cc: Paolo Abeni cc: drbd-dev@lists.linbit.com cc: linux-block@vger.kernel.org cc: netdev@vger.kernel.org --- drivers/block/drbd/drbd_main.c | 77 +++++++++++----------------------- 1 file changed, 25 insertions(+), 52 deletions(-) diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c index e5f90abd29b6..ab63d6138407 100644 --- a/drivers/block/drbd/drbd_main.c +++ b/drivers/block/drbd/drbd_main.c @@ -1512,28 +1512,15 @@ static void drbd_update_congested(struct drbd_connection *connection) * As a workaround, we disable sendpage on pages * with page_count == 0 or PageSlab. */ -static int _drbd_no_send_page(struct drbd_peer_device *peer_device, struct page *page, - int offset, size_t size, unsigned msg_flags) -{ - struct socket *socket; - void *addr; - int err; - - socket = peer_device->connection->data.socket; - addr = kmap(page) + offset; - err = drbd_send_all(peer_device->connection, socket, addr, size, msg_flags); - kunmap(page); - if (!err) - peer_device->device->send_cnt += size >> 9; - return err; -} - -static int _drbd_send_page(struct drbd_peer_device *peer_device, struct page *page, - int offset, size_t size, unsigned msg_flags) +static int _drbd_send_pages(struct drbd_peer_device *peer_device, + struct iov_iter *iter, unsigned msg_flags) { struct socket *socket = peer_device->connection->data.socket; - struct bio_vec bvec; - struct msghdr msg = { .msg_flags = msg_flags, }; + struct msghdr msg = { + .msg_flags = msg_flags | MSG_NOSIGNAL, + .msg_iter = *iter, + }; + size_t size = iov_iter_count(iter); int err = -EIO; /* e.g. XFS meta- & log-data is in slab pages, which have a @@ -1542,11 +1529,8 @@ static int _drbd_send_page(struct drbd_peer_device *peer_device, struct page *pa * put_page(); and would cause either a VM_BUG directly, or * __page_cache_release a page that would actually still be referenced * by someone, leading to some obscure delayed Oops somewhere else. */ - if (!drbd_disable_sendpage && sendpage_ok(page)) - msg.msg_flags |= MSG_NOSIGNAL | MSG_SPLICE_PAGES; - - bvec_set_page(&bvec, page, offset, size); - iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size); + if (drbd_disable_sendpage) + msg.msg_flags &= ~(MSG_NOSIGNAL | MSG_SPLICE_PAGES); drbd_update_congested(peer_device->connection); do { @@ -1577,39 +1561,22 @@ static int _drbd_send_page(struct drbd_peer_device *peer_device, struct page *pa static int _drbd_send_bio(struct drbd_peer_device *peer_device, struct bio *bio) { - struct bio_vec bvec; - struct bvec_iter iter; + struct iov_iter iter; - /* hint all but last page with MSG_MORE */ - bio_for_each_segment(bvec, bio, iter) { - int err; + iov_iter_bvec(&iter, ITER_SOURCE, bio->bi_io_vec, bio->bi_vcnt, + bio->bi_iter.bi_size); - err = _drbd_no_send_page(peer_device, bvec.bv_page, - bvec.bv_offset, bvec.bv_len, - bio_iter_last(bvec, iter) - ? 0 : MSG_MORE); - if (err) - return err; - } - return 0; + return _drbd_send_pages(peer_device, &iter, 0); } static int _drbd_send_zc_bio(struct drbd_peer_device *peer_device, struct bio *bio) { - struct bio_vec bvec; - struct bvec_iter iter; + struct iov_iter iter; - /* hint all but last page with MSG_MORE */ - bio_for_each_segment(bvec, bio, iter) { - int err; + iov_iter_bvec(&iter, ITER_SOURCE, bio->bi_io_vec, bio->bi_vcnt, + bio->bi_iter.bi_size); - err = _drbd_send_page(peer_device, bvec.bv_page, - bvec.bv_offset, bvec.bv_len, - bio_iter_last(bvec, iter) ? 0 : MSG_MORE); - if (err) - return err; - } - return 0; + return _drbd_send_pages(peer_device, &iter, MSG_SPLICE_PAGES); } static int _drbd_send_zc_ee(struct drbd_peer_device *peer_device, @@ -1621,10 +1588,16 @@ static int _drbd_send_zc_ee(struct drbd_peer_device *peer_device, /* hint all but last page with MSG_MORE */ page_chain_for_each(page) { + struct iov_iter iter; + struct bio_vec bvec; unsigned l = min_t(unsigned, len, PAGE_SIZE); - err = _drbd_send_page(peer_device, page, 0, l, - page_chain_next(page) ? MSG_MORE : 0); + bvec_set_page(&bvec, page, 0, l); + iov_iter_bvec(&iter, ITER_SOURCE, &bvec, 1, l); + + err = _drbd_send_pages(peer_device, &iter, + MSG_SPLICE_PAGES | + (page_chain_next(page) ? MSG_MORE : 0)); if (err) return err; len -= l; From patchwork Wed Mar 29 14:13:54 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 76644 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp464064vqo; Wed, 29 Mar 2023 07:41:22 -0700 (PDT) X-Google-Smtp-Source: AKy350ZvSgojpiAlqLRwKbPmEsQv7qNFbWBeHeiui6emWThhljV4+ZPh4uoCGXVy/P99qGJxnU47 X-Received: by 2002:a17:903:d4:b0:19a:b427:230a with SMTP id x20-20020a17090300d400b0019ab427230amr15830804plc.63.1680100881799; Wed, 29 Mar 2023 07:41:21 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680100881; cv=none; d=google.com; s=arc-20160816; b=xLI+rUNRZ9DLgWG7sLDjKlgXOVSeU11ECUZFtC8LrJJAJQd+nQCp4EyRNk+eoBloW5 Fn70lcwHGoT0O6jJA+p7bJrQ+XII59Ms7JHIuv2wbIycYbejOVtUgFv7AbgiNXPro9uK uIqtm48egWIVZ2fqexUV+gYp4p5u3KAKRa4BS5cduZxlNhUyPypctDI/d6u1WTpv34IS dgkgIX8S1z8lExeF/vmL2TSxF+Qpz6BG+iuHu7j7rfvIjV0BOGU28Rydm8vcOyoMPwf1 92noQQDfbwqasvVGicPYHc0vWMbYgnLvNQnKXLiuP9SkO0k1RMmf0JfRtgDeKC273mYs yr+A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=znvkQBj8L4O9jm+rqEq2ewYackHhkB1pu8mjwp5gnoI=; b=W/XFCToSfz2ID0i8x6dYagL3t4gqBWHOdtiCOBJXrOj0EWcZx0AM1Do25h98ujQOSW Il63Ju2JM9tGCbDF8+9duhSexNioFA/qGnaPmjca1Xlqy3K8t+dloNDAUhIKX86eKde8 4RVwTZNWlLiRV5av6TacbgQIbHk0C31ipBlqoLJpWUd+ZeuD/kJ8aykX0WbsbLP4dE9P X3CBh4LHiG+bbri/gSL5ze1NYJkG/de2st7blEcPqfxpSUArar2EAREw0AEQZ2+B4pyN Fyf4DKLVKRNQbfmltRe4wWyG+PeojY8Fcr5X5mAoyN1Xt4hvxYR0ugFjlmQkutzVgtbF X5Zw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=VZXvt1NN; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id s1-20020a63dc01000000b0050b51e62c20si32103081pgg.793.2023.03.29.07.41.06; Wed, 29 Mar 2023 07:41:21 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=VZXvt1NN; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229849AbjC2OYW (ORCPT + 99 others); Wed, 29 Mar 2023 10:24:22 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37334 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231311AbjC2OWp (ORCPT ); Wed, 29 Mar 2023 10:22:45 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A58BF659E for ; Wed, 29 Mar 2023 07:16:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1680099385; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=znvkQBj8L4O9jm+rqEq2ewYackHhkB1pu8mjwp5gnoI=; b=VZXvt1NNgJUmT/ejQ17v1esqODYespoS1bDHpKJUlWSeFC6Rk7vHhH3pn3WA9igelGBilh K6+0608eVJnZt2/Iv7mFw1ADHtCDVAnzDT4ssPwYBJRxEGipbsJeHNHKfYZCQYe4cZUNlZ gbudUNLXmAXxMT9nlidCuuvKiKyGN00= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-532-86kb5WFCOSSahj9UFLjFJg-1; Wed, 29 Mar 2023 10:16:21 -0400 X-MC-Unique: 86kb5WFCOSSahj9UFLjFJg-1 Received: from smtp.corp.redhat.com (int-mx10.intmail.prod.int.rdu2.redhat.com [10.11.54.10]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id BA54D85530C; Wed, 29 Mar 2023 14:16:15 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.33.36.18]) by smtp.corp.redhat.com (Postfix) with ESMTP id DCD84492C3E; Wed, 29 Mar 2023 14:16:11 +0000 (UTC) From: David Howells To: Matthew Wilcox , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni Cc: David Howells , Al Viro , Christoph Hellwig , Jens Axboe , Jeff Layton , Christian Brauner , Chuck Lever III , Linus Torvalds , netdev@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Marc Kleine-Budde , bpf@vger.kernel.org, dccp@vger.kernel.org, linux-afs@lists.infradead.org, linux-arm-msm@vger.kernel.org, linux-can@vger.kernel.org, linux-crypto@vger.kernel.org, linux-doc@vger.kernel.org, linux-hams@vger.kernel.org, linux-rdma@vger.kernel.org, linux-sctp@vger.kernel.org, linux-wpan@vger.kernel.org, linux-x25@vger.kernel.org, mptcp@lists.linux.dev, rds-devel@oss.oracle.com, tipc-discussion@lists.sourceforge.net, virtualization@lists.linux-foundation.org Subject: [RFC PATCH v2 48/48] sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) Date: Wed, 29 Mar 2023 15:13:54 +0100 Message-Id: <20230329141354.516864-49-dhowells@redhat.com> In-Reply-To: <20230329141354.516864-1-dhowells@redhat.com> References: <20230329141354.516864-1-dhowells@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.10 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1761713462215682583?= X-GMAIL-MSGID: =?utf-8?q?1761713462215682583?= [!] Note: This is a work in progress. At the moment, some things won't build if this patch is applied. nvme, kcm, smc, tls. Remove ->sendpage() and ->sendpage_locked(). sendmsg() with MSG_SPLICE_PAGES should be used instead. This allows multiple pages and multipage folios to be passed through. Signed-off-by: David Howells Acked-by: Marc Kleine-Budde # for net/can cc: "David S. Miller" cc: Eric Dumazet cc: Jakub Kicinski cc: Paolo Abeni cc: Jens Axboe cc: Matthew Wilcox cc: bpf@vger.kernel.org cc: dccp@vger.kernel.org cc: linux-afs@lists.infradead.org cc: linux-arm-msm@vger.kernel.org cc: linux-can@vger.kernel.org cc: linux-crypto@vger.kernel.org cc: linux-doc@vger.kernel.org cc: linux-hams@vger.kernel.org cc: linux-kernel@vger.kernel.org cc: linux-rdma@vger.kernel.org cc: linux-sctp@vger.kernel.org cc: linux-wpan@vger.kernel.org cc: linux-x25@vger.kernel.org cc: mptcp@lists.linux.dev cc: netdev@vger.kernel.org cc: rds-devel@oss.oracle.com cc: tipc-discussion@lists.sourceforge.net cc: virtualization@lists.linux-foundation.org --- Documentation/networking/scaling.rst | 4 +- crypto/af_alg.c | 29 ------ crypto/algif_aead.c | 22 +---- crypto/algif_rng.c | 2 - crypto/algif_skcipher.c | 14 --- include/linux/net.h | 8 -- include/net/inet_common.h | 2 - include/net/sock.h | 6 -- net/appletalk/ddp.c | 1 - net/atm/pvc.c | 1 - net/atm/svc.c | 1 - net/ax25/af_ax25.c | 1 - net/caif/caif_socket.c | 2 - net/can/bcm.c | 1 - net/can/isotp.c | 1 - net/can/j1939/socket.c | 1 - net/can/raw.c | 1 - net/core/sock.c | 35 +------ net/dccp/ipv4.c | 1 - net/dccp/ipv6.c | 1 - net/ieee802154/socket.c | 2 - net/ipv4/af_inet.c | 21 ---- net/ipv4/tcp.c | 34 ------- net/ipv4/tcp_bpf.c | 21 +--- net/ipv4/tcp_ipv4.c | 1 - net/ipv4/udp.c | 22 ----- net/ipv4/udp_impl.h | 2 - net/ipv4/udplite.c | 1 - net/ipv6/af_inet6.c | 3 - net/ipv6/raw.c | 1 - net/ipv6/tcp_ipv6.c | 1 - net/key/af_key.c | 1 - net/l2tp/l2tp_ip.c | 1 - net/l2tp/l2tp_ip6.c | 1 - net/llc/af_llc.c | 1 - net/mctp/af_mctp.c | 1 - net/mptcp/protocol.c | 2 - net/netlink/af_netlink.c | 1 - net/netrom/af_netrom.c | 1 - net/packet/af_packet.c | 2 - net/phonet/socket.c | 2 - net/qrtr/af_qrtr.c | 1 - net/rds/af_rds.c | 1 - net/rose/af_rose.c | 1 - net/rxrpc/af_rxrpc.c | 1 - net/sctp/protocol.c | 1 - net/socket.c | 48 --------- net/tipc/socket.c | 3 - net/unix/af_unix.c | 139 --------------------------- net/vmw_vsock/af_vsock.c | 3 - net/x25/af_x25.c | 1 - net/xdp/xsk.c | 1 - 52 files changed, 9 insertions(+), 447 deletions(-) diff --git a/Documentation/networking/scaling.rst b/Documentation/networking/scaling.rst index 3d435caa3ef2..92c9fb46d6a2 100644 --- a/Documentation/networking/scaling.rst +++ b/Documentation/networking/scaling.rst @@ -269,8 +269,8 @@ a single application thread handles flows with many different flow hashes. rps_sock_flow_table is a global flow table that contains the *desired* CPU for flows: the CPU that is currently processing the flow in userspace. Each table value is a CPU index that is updated during calls to recvmsg -and sendmsg (specifically, inet_recvmsg(), inet_sendmsg(), inet_sendpage() -and tcp_splice_read()). +and sendmsg (specifically, inet_recvmsg(), inet_sendmsg() and +tcp_splice_read()). When the scheduler moves a thread to a new CPU while it has outstanding receive packets on the old CPU, packets may arrive out of order. To diff --git a/crypto/af_alg.c b/crypto/af_alg.c index 686610a4986f..9f84816dcabf 100644 --- a/crypto/af_alg.c +++ b/crypto/af_alg.c @@ -483,7 +483,6 @@ static const struct proto_ops alg_proto_ops = { .listen = sock_no_listen, .shutdown = sock_no_shutdown, .mmap = sock_no_mmap, - .sendpage = sock_no_sendpage, .sendmsg = sock_no_sendmsg, .recvmsg = sock_no_recvmsg, @@ -1110,34 +1109,6 @@ int af_alg_sendmsg(struct socket *sock, struct msghdr *msg, size_t size, } EXPORT_SYMBOL_GPL(af_alg_sendmsg); -/** - * af_alg_sendpage - sendpage system call handler - * @sock: socket of connection to user space to write to - * @page: data to send - * @offset: offset into page to begin sending - * @size: length of data - * @flags: message send/receive flags - * - * This is a generic implementation of sendpage to fill ctx->tsgl_list. - */ -ssize_t af_alg_sendpage(struct socket *sock, struct page *page, - int offset, size_t size, int flags) -{ - struct bio_vec bvec; - struct msghdr msg = { - .msg_flags = flags | MSG_SPLICE_PAGES, - }; - - bvec_set_page(&bvec, page, size, offset); - iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size); - - if (flags & MSG_SENDPAGE_NOTLAST) - msg.msg_flags |= MSG_MORE; - - return sock_sendmsg(sock, &msg); -} -EXPORT_SYMBOL_GPL(af_alg_sendpage); - /** * af_alg_free_resources - release resources required for crypto request * @areq: Request holding the TX and RX SGL diff --git a/crypto/algif_aead.c b/crypto/algif_aead.c index b16111a3025a..37b08e5f9114 100644 --- a/crypto/algif_aead.c +++ b/crypto/algif_aead.c @@ -9,10 +9,10 @@ * The following concept of the memory management is used: * * The kernel maintains two SGLs, the TX SGL and the RX SGL. The TX SGL is - * filled by user space with the data submitted via sendpage. Filling up - * the TX SGL does not cause a crypto operation -- the data will only be - * tracked by the kernel. Upon receipt of one recvmsg call, the caller must - * provide a buffer which is tracked with the RX SGL. + * filled by user space with the data submitted via sendmsg (maybe with with + * MSG_SPLICE_PAGES). Filling up the TX SGL does not cause a crypto operation + * -- the data will only be tracked by the kernel. Upon receipt of one recvmsg + * call, the caller must provide a buffer which is tracked with the RX SGL. * * During the processing of the recvmsg operation, the cipher request is * allocated and prepared. As part of the recvmsg operation, the processed @@ -368,7 +368,6 @@ static struct proto_ops algif_aead_ops = { .release = af_alg_release, .sendmsg = aead_sendmsg, - .sendpage = af_alg_sendpage, .recvmsg = aead_recvmsg, .poll = af_alg_poll, }; @@ -420,18 +419,6 @@ static int aead_sendmsg_nokey(struct socket *sock, struct msghdr *msg, return aead_sendmsg(sock, msg, size); } -static ssize_t aead_sendpage_nokey(struct socket *sock, struct page *page, - int offset, size_t size, int flags) -{ - int err; - - err = aead_check_key(sock); - if (err) - return err; - - return af_alg_sendpage(sock, page, offset, size, flags); -} - static int aead_recvmsg_nokey(struct socket *sock, struct msghdr *msg, size_t ignored, int flags) { @@ -459,7 +446,6 @@ static struct proto_ops algif_aead_ops_nokey = { .release = af_alg_release, .sendmsg = aead_sendmsg_nokey, - .sendpage = aead_sendpage_nokey, .recvmsg = aead_recvmsg_nokey, .poll = af_alg_poll, }; diff --git a/crypto/algif_rng.c b/crypto/algif_rng.c index 407408c43730..10c41adac3b1 100644 --- a/crypto/algif_rng.c +++ b/crypto/algif_rng.c @@ -174,7 +174,6 @@ static struct proto_ops algif_rng_ops = { .bind = sock_no_bind, .accept = sock_no_accept, .sendmsg = sock_no_sendmsg, - .sendpage = sock_no_sendpage, .release = af_alg_release, .recvmsg = rng_recvmsg, @@ -192,7 +191,6 @@ static struct proto_ops __maybe_unused algif_rng_test_ops = { .mmap = sock_no_mmap, .bind = sock_no_bind, .accept = sock_no_accept, - .sendpage = sock_no_sendpage, .release = af_alg_release, .recvmsg = rng_test_recvmsg, diff --git a/crypto/algif_skcipher.c b/crypto/algif_skcipher.c index b1f321b9f846..9ada9b741af8 100644 --- a/crypto/algif_skcipher.c +++ b/crypto/algif_skcipher.c @@ -194,7 +194,6 @@ static struct proto_ops algif_skcipher_ops = { .release = af_alg_release, .sendmsg = skcipher_sendmsg, - .sendpage = af_alg_sendpage, .recvmsg = skcipher_recvmsg, .poll = af_alg_poll, }; @@ -246,18 +245,6 @@ static int skcipher_sendmsg_nokey(struct socket *sock, struct msghdr *msg, return skcipher_sendmsg(sock, msg, size); } -static ssize_t skcipher_sendpage_nokey(struct socket *sock, struct page *page, - int offset, size_t size, int flags) -{ - int err; - - err = skcipher_check_key(sock); - if (err) - return err; - - return af_alg_sendpage(sock, page, offset, size, flags); -} - static int skcipher_recvmsg_nokey(struct socket *sock, struct msghdr *msg, size_t ignored, int flags) { @@ -285,7 +272,6 @@ static struct proto_ops algif_skcipher_ops_nokey = { .release = af_alg_release, .sendmsg = skcipher_sendmsg_nokey, - .sendpage = skcipher_sendpage_nokey, .recvmsg = skcipher_recvmsg_nokey, .poll = af_alg_poll, }; diff --git a/include/linux/net.h b/include/linux/net.h index b73ad8e3c212..e5794968ac9f 100644 --- a/include/linux/net.h +++ b/include/linux/net.h @@ -206,8 +206,6 @@ struct proto_ops { size_t total_len, int flags); int (*mmap) (struct file *file, struct socket *sock, struct vm_area_struct * vma); - ssize_t (*sendpage) (struct socket *sock, struct page *page, - int offset, size_t size, int flags); ssize_t (*splice_read)(struct socket *sock, loff_t *ppos, struct pipe_inode_info *pipe, size_t len, unsigned int flags); int (*set_peek_off)(struct sock *sk, int val); @@ -220,8 +218,6 @@ struct proto_ops { sk_read_actor_t recv_actor); /* This is different from read_sock(), it reads an entire skb at a time. */ int (*read_skb)(struct sock *sk, skb_read_actor_t recv_actor); - int (*sendpage_locked)(struct sock *sk, struct page *page, - int offset, size_t size, int flags); int (*sendmsg_locked)(struct sock *sk, struct msghdr *msg, size_t size); int (*set_rcvlowat)(struct sock *sk, int val); @@ -339,10 +335,6 @@ int kernel_connect(struct socket *sock, struct sockaddr *addr, int addrlen, int flags); int kernel_getsockname(struct socket *sock, struct sockaddr *addr); int kernel_getpeername(struct socket *sock, struct sockaddr *addr); -int kernel_sendpage(struct socket *sock, struct page *page, int offset, - size_t size, int flags); -int kernel_sendpage_locked(struct sock *sk, struct page *page, int offset, - size_t size, int flags); int kernel_sock_shutdown(struct socket *sock, enum sock_shutdown_cmd how); /* Routine returns the IP overhead imposed by a (caller-protected) socket. */ diff --git a/include/net/inet_common.h b/include/net/inet_common.h index cec453c18f1d..054c3388fa51 100644 --- a/include/net/inet_common.h +++ b/include/net/inet_common.h @@ -33,8 +33,6 @@ int inet_accept(struct socket *sock, struct socket *newsock, int flags, bool kern); int inet_send_prepare(struct sock *sk); int inet_sendmsg(struct socket *sock, struct msghdr *msg, size_t size); -ssize_t inet_sendpage(struct socket *sock, struct page *page, int offset, - size_t size, int flags); int inet_recvmsg(struct socket *sock, struct msghdr *msg, size_t size, int flags); int inet_shutdown(struct socket *sock, int how); diff --git a/include/net/sock.h b/include/net/sock.h index 573f2bf7e0de..4618cd21e16b 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -1265,8 +1265,6 @@ struct proto { size_t len); int (*recvmsg)(struct sock *sk, struct msghdr *msg, size_t len, int flags, int *addr_len); - int (*sendpage)(struct sock *sk, struct page *page, - int offset, size_t size, int flags); int (*bind)(struct sock *sk, struct sockaddr *addr, int addr_len); int (*bind_add)(struct sock *sk, @@ -1906,10 +1904,6 @@ int sock_no_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t len); int sock_no_recvmsg(struct socket *, struct msghdr *, size_t, int); int sock_no_mmap(struct file *file, struct socket *sock, struct vm_area_struct *vma); -ssize_t sock_no_sendpage(struct socket *sock, struct page *page, int offset, - size_t size, int flags); -ssize_t sock_no_sendpage_locked(struct sock *sk, struct page *page, - int offset, size_t size, int flags); /* * Functions to fill in entries in struct proto_ops when a protocol diff --git a/net/appletalk/ddp.c b/net/appletalk/ddp.c index a06f4d4a6f47..8978fb6212ff 100644 --- a/net/appletalk/ddp.c +++ b/net/appletalk/ddp.c @@ -1929,7 +1929,6 @@ static const struct proto_ops atalk_dgram_ops = { .sendmsg = atalk_sendmsg, .recvmsg = atalk_recvmsg, .mmap = sock_no_mmap, - .sendpage = sock_no_sendpage, }; static struct notifier_block ddp_notifier = { diff --git a/net/atm/pvc.c b/net/atm/pvc.c index 53e7d3f39e26..66d9a9bd5896 100644 --- a/net/atm/pvc.c +++ b/net/atm/pvc.c @@ -126,7 +126,6 @@ static const struct proto_ops pvc_proto_ops = { .sendmsg = vcc_sendmsg, .recvmsg = vcc_recvmsg, .mmap = sock_no_mmap, - .sendpage = sock_no_sendpage, }; diff --git a/net/atm/svc.c b/net/atm/svc.c index 4a02bcaad279..289240fe234e 100644 --- a/net/atm/svc.c +++ b/net/atm/svc.c @@ -649,7 +649,6 @@ static const struct proto_ops svc_proto_ops = { .sendmsg = vcc_sendmsg, .recvmsg = vcc_recvmsg, .mmap = sock_no_mmap, - .sendpage = sock_no_sendpage, }; diff --git a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c index d8da400cb4de..5db805d5f74d 100644 --- a/net/ax25/af_ax25.c +++ b/net/ax25/af_ax25.c @@ -2022,7 +2022,6 @@ static const struct proto_ops ax25_proto_ops = { .sendmsg = ax25_sendmsg, .recvmsg = ax25_recvmsg, .mmap = sock_no_mmap, - .sendpage = sock_no_sendpage, }; /* diff --git a/net/caif/caif_socket.c b/net/caif/caif_socket.c index 4eebcc66c19a..9c82698da4f5 100644 --- a/net/caif/caif_socket.c +++ b/net/caif/caif_socket.c @@ -976,7 +976,6 @@ static const struct proto_ops caif_seqpacket_ops = { .sendmsg = caif_seqpkt_sendmsg, .recvmsg = caif_seqpkt_recvmsg, .mmap = sock_no_mmap, - .sendpage = sock_no_sendpage, }; static const struct proto_ops caif_stream_ops = { @@ -996,7 +995,6 @@ static const struct proto_ops caif_stream_ops = { .sendmsg = caif_stream_sendmsg, .recvmsg = caif_stream_recvmsg, .mmap = sock_no_mmap, - .sendpage = sock_no_sendpage, }; /* This function is called when a socket is finally destroyed. */ diff --git a/net/can/bcm.c b/net/can/bcm.c index 27706f6ace34..65a946a36d92 100644 --- a/net/can/bcm.c +++ b/net/can/bcm.c @@ -1699,7 +1699,6 @@ static const struct proto_ops bcm_ops = { .sendmsg = bcm_sendmsg, .recvmsg = bcm_recvmsg, .mmap = sock_no_mmap, - .sendpage = sock_no_sendpage, }; static struct proto bcm_proto __read_mostly = { diff --git a/net/can/isotp.c b/net/can/isotp.c index 9bc344851704..0c3d11c29a2b 100644 --- a/net/can/isotp.c +++ b/net/can/isotp.c @@ -1633,7 +1633,6 @@ static const struct proto_ops isotp_ops = { .sendmsg = isotp_sendmsg, .recvmsg = isotp_recvmsg, .mmap = sock_no_mmap, - .sendpage = sock_no_sendpage, }; static struct proto isotp_proto __read_mostly = { diff --git a/net/can/j1939/socket.c b/net/can/j1939/socket.c index 7e90f9e61d9b..2bfe4f79bb67 100644 --- a/net/can/j1939/socket.c +++ b/net/can/j1939/socket.c @@ -1301,7 +1301,6 @@ static const struct proto_ops j1939_ops = { .sendmsg = j1939_sk_sendmsg, .recvmsg = j1939_sk_recvmsg, .mmap = sock_no_mmap, - .sendpage = sock_no_sendpage, }; static struct proto j1939_proto __read_mostly = { diff --git a/net/can/raw.c b/net/can/raw.c index f64469b98260..15c79b079184 100644 --- a/net/can/raw.c +++ b/net/can/raw.c @@ -962,7 +962,6 @@ static const struct proto_ops raw_ops = { .sendmsg = raw_sendmsg, .recvmsg = raw_recvmsg, .mmap = sock_no_mmap, - .sendpage = sock_no_sendpage, }; static struct proto raw_proto __read_mostly = { diff --git a/net/core/sock.c b/net/core/sock.c index 341c565dbc26..c2ae77bb2075 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -3223,36 +3223,6 @@ void __receive_sock(struct file *file) } } -ssize_t sock_no_sendpage(struct socket *sock, struct page *page, int offset, size_t size, int flags) -{ - ssize_t res; - struct msghdr msg = {.msg_flags = flags}; - struct kvec iov; - char *kaddr = kmap(page); - iov.iov_base = kaddr + offset; - iov.iov_len = size; - res = kernel_sendmsg(sock, &msg, &iov, 1, size); - kunmap(page); - return res; -} -EXPORT_SYMBOL(sock_no_sendpage); - -ssize_t sock_no_sendpage_locked(struct sock *sk, struct page *page, - int offset, size_t size, int flags) -{ - ssize_t res; - struct msghdr msg = {.msg_flags = flags}; - struct kvec iov; - char *kaddr = kmap(page); - - iov.iov_base = kaddr + offset; - iov.iov_len = size; - res = kernel_sendmsg_locked(sk, &msg, &iov, 1, size); - kunmap(page); - return res; -} -EXPORT_SYMBOL(sock_no_sendpage_locked); - /* * Default Socket Callbacks */ @@ -4008,7 +3978,7 @@ static void proto_seq_printf(struct seq_file *seq, struct proto *proto) { seq_printf(seq, "%-9s %4u %6d %6ld %-3s %6u %-3s %-10s " - "%2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c\n", + "%2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c\n", proto->name, proto->obj_size, sock_prot_inuse_get(seq_file_net(seq), proto), @@ -4029,7 +3999,6 @@ static void proto_seq_printf(struct seq_file *seq, struct proto *proto) proto_method_implemented(proto->getsockopt), proto_method_implemented(proto->sendmsg), proto_method_implemented(proto->recvmsg), - proto_method_implemented(proto->sendpage), proto_method_implemented(proto->bind), proto_method_implemented(proto->backlog_rcv), proto_method_implemented(proto->hash), @@ -4050,7 +4019,7 @@ static int proto_seq_show(struct seq_file *seq, void *v) "maxhdr", "slab", "module", - "cl co di ac io in de sh ss gs se re sp bi br ha uh gp em\n"); + "cl co di ac io in de sh ss gs se re bi br ha uh gp em\n"); else proto_seq_printf(seq, list_entry(v, struct proto, node)); return 0; diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c index b780827f5e0a..ea808de374ea 100644 --- a/net/dccp/ipv4.c +++ b/net/dccp/ipv4.c @@ -1008,7 +1008,6 @@ static const struct proto_ops inet_dccp_ops = { .sendmsg = inet_sendmsg, .recvmsg = sock_common_recvmsg, .mmap = sock_no_mmap, - .sendpage = sock_no_sendpage, }; static struct inet_protosw dccp_v4_protosw = { diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c index b9d7c3dd1cb3..23eb8159e3cd 100644 --- a/net/dccp/ipv6.c +++ b/net/dccp/ipv6.c @@ -1085,7 +1085,6 @@ static const struct proto_ops inet6_dccp_ops = { .sendmsg = inet_sendmsg, .recvmsg = sock_common_recvmsg, .mmap = sock_no_mmap, - .sendpage = sock_no_sendpage, #ifdef CONFIG_COMPAT .compat_ioctl = inet6_compat_ioctl, #endif diff --git a/net/ieee802154/socket.c b/net/ieee802154/socket.c index 1fa2fe041ec0..1238f036117f 100644 --- a/net/ieee802154/socket.c +++ b/net/ieee802154/socket.c @@ -426,7 +426,6 @@ static const struct proto_ops ieee802154_raw_ops = { .sendmsg = ieee802154_sock_sendmsg, .recvmsg = sock_common_recvmsg, .mmap = sock_no_mmap, - .sendpage = sock_no_sendpage, }; /* DGRAM Sockets (802.15.4 dataframes) */ @@ -990,7 +989,6 @@ static const struct proto_ops ieee802154_dgram_ops = { .sendmsg = ieee802154_sock_sendmsg, .recvmsg = sock_common_recvmsg, .mmap = sock_no_mmap, - .sendpage = sock_no_sendpage, }; static void ieee802154_sock_destruct(struct sock *sk) diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c index 8db6747f892f..869b49933f15 100644 --- a/net/ipv4/af_inet.c +++ b/net/ipv4/af_inet.c @@ -827,23 +827,6 @@ int inet_sendmsg(struct socket *sock, struct msghdr *msg, size_t size) } EXPORT_SYMBOL(inet_sendmsg); -ssize_t inet_sendpage(struct socket *sock, struct page *page, int offset, - size_t size, int flags) -{ - struct sock *sk = sock->sk; - const struct proto *prot; - - if (unlikely(inet_send_prepare(sk))) - return -EAGAIN; - - /* IPV6_ADDRFORM can change sk->sk_prot under us. */ - prot = READ_ONCE(sk->sk_prot); - if (prot->sendpage) - return prot->sendpage(sk, page, offset, size, flags); - return sock_no_sendpage(sock, page, offset, size, flags); -} -EXPORT_SYMBOL(inet_sendpage); - INDIRECT_CALLABLE_DECLARE(int udp_recvmsg(struct sock *, struct msghdr *, size_t, int, int *)); int inet_recvmsg(struct socket *sock, struct msghdr *msg, size_t size, @@ -1046,12 +1029,10 @@ const struct proto_ops inet_stream_ops = { #ifdef CONFIG_MMU .mmap = tcp_mmap, #endif - .sendpage = inet_sendpage, .splice_read = tcp_splice_read, .read_sock = tcp_read_sock, .read_skb = tcp_read_skb, .sendmsg_locked = tcp_sendmsg_locked, - .sendpage_locked = tcp_sendpage_locked, .peek_len = tcp_peek_len, #ifdef CONFIG_COMPAT .compat_ioctl = inet_compat_ioctl, @@ -1080,7 +1061,6 @@ const struct proto_ops inet_dgram_ops = { .read_skb = udp_read_skb, .recvmsg = inet_recvmsg, .mmap = sock_no_mmap, - .sendpage = inet_sendpage, .set_peek_off = sk_set_peek_off, #ifdef CONFIG_COMPAT .compat_ioctl = inet_compat_ioctl, @@ -1111,7 +1091,6 @@ static const struct proto_ops inet_sockraw_ops = { .sendmsg = inet_sendmsg, .recvmsg = inet_recvmsg, .mmap = sock_no_mmap, - .sendpage = inet_sendpage, #ifdef CONFIG_COMPAT .compat_ioctl = inet_compat_ioctl, #endif diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index a8f8ccaed10e..bd01a1b23c7b 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -971,40 +971,6 @@ static int tcp_wmem_schedule(struct sock *sk, int copy) return min(copy, sk->sk_forward_alloc); } -int tcp_sendpage_locked(struct sock *sk, struct page *page, int offset, - size_t size, int flags) -{ - struct bio_vec bvec; - struct msghdr msg = { .msg_flags = flags | MSG_SPLICE_PAGES, }; - - if (!(sk->sk_route_caps & NETIF_F_SG)) - return sock_no_sendpage_locked(sk, page, offset, size, flags); - - tcp_rate_check_app_limited(sk); /* is sending application-limited? */ - - bvec_set_page(&bvec, page, size, offset); - iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size); - - if (flags & MSG_SENDPAGE_NOTLAST) - msg.msg_flags |= MSG_MORE; - - return tcp_sendmsg_locked(sk, &msg, size); -} -EXPORT_SYMBOL_GPL(tcp_sendpage_locked); - -int tcp_sendpage(struct sock *sk, struct page *page, int offset, - size_t size, int flags) -{ - int ret; - - lock_sock(sk); - ret = tcp_sendpage_locked(sk, page, offset, size, flags); - release_sock(sk); - - return ret; -} -EXPORT_SYMBOL(tcp_sendpage); - void tcp_free_fastopen_req(struct tcp_sock *tp) { if (tp->fastopen_req) { diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c index de37a4372437..ab83cfb9de22 100644 --- a/net/ipv4/tcp_bpf.c +++ b/net/ipv4/tcp_bpf.c @@ -482,23 +482,6 @@ static int tcp_bpf_sendmsg(struct sock *sk, struct msghdr *msg, size_t size) return copied ? copied : err; } -static int tcp_bpf_sendpage(struct sock *sk, struct page *page, int offset, - size_t size, int flags) -{ - struct bio_vec bvec; - struct msghdr msg = { - .msg_flags = flags | MSG_SPLICE_PAGES, - }; - - bvec_set_page(&bvec, page, size, offset); - iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size); - - if (flags & MSG_SENDPAGE_NOTLAST) - msg.msg_flags |= MSG_MORE; - - return tcp_bpf_sendmsg(sk, &msg, size); -} - enum { TCP_BPF_IPV4, TCP_BPF_IPV6, @@ -528,7 +511,6 @@ static void tcp_bpf_rebuild_protos(struct proto prot[TCP_BPF_NUM_CFGS], prot[TCP_BPF_TX] = prot[TCP_BPF_BASE]; prot[TCP_BPF_TX].sendmsg = tcp_bpf_sendmsg; - prot[TCP_BPF_TX].sendpage = tcp_bpf_sendpage; prot[TCP_BPF_RX] = prot[TCP_BPF_BASE]; prot[TCP_BPF_RX].recvmsg = tcp_bpf_recvmsg_parser; @@ -563,8 +545,7 @@ static int tcp_bpf_assert_proto_ops(struct proto *ops) * indeed valid assumptions. */ return ops->recvmsg == tcp_recvmsg && - ops->sendmsg == tcp_sendmsg && - ops->sendpage == tcp_sendpage ? 0 : -ENOTSUPP; + ops->sendmsg == tcp_sendmsg ? 0 : -ENOTSUPP; } int tcp_bpf_update_proto(struct sock *sk, struct sk_psock *psock, bool restore) diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index ea370afa70ed..5c2e1c1ca329 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -3112,7 +3112,6 @@ struct proto tcp_prot = { .keepalive = tcp_set_keepalive, .recvmsg = tcp_recvmsg, .sendmsg = tcp_sendmsg, - .sendpage = tcp_sendpage, .backlog_rcv = tcp_v4_do_rcv, .release_cb = tcp_release_cb, .hash = inet_hash, diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index 097feb92e215..85bd5960f7ef 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -1329,27 +1329,6 @@ int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) } EXPORT_SYMBOL(udp_sendmsg); -int udp_sendpage(struct sock *sk, struct page *page, int offset, - size_t size, int flags) -{ - struct bio_vec bvec; - struct msghdr msg = { - .msg_flags = flags | MSG_SPLICE_PAGES | MSG_MORE - }; - int ret; - - bvec_set_page(&bvec, page, size, offset); - iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size); - - if (flags & MSG_SENDPAGE_NOTLAST) - msg.msg_flags |= MSG_MORE; - - lock_sock(sk); - ret = udp_sendmsg(sk, &msg, size); - release_sock(sk); - return ret; -} - #define UDP_SKB_IS_STATELESS 0x80000000 /* all head states (dst, sk, nf conntrack) except skb extensions are @@ -2926,7 +2905,6 @@ struct proto udp_prot = { .getsockopt = udp_getsockopt, .sendmsg = udp_sendmsg, .recvmsg = udp_recvmsg, - .sendpage = udp_sendpage, .release_cb = ip4_datagram_release_cb, .hash = udp_lib_hash, .unhash = udp_lib_unhash, diff --git a/net/ipv4/udp_impl.h b/net/ipv4/udp_impl.h index 4ba7a88a1b1d..e1ff3a375996 100644 --- a/net/ipv4/udp_impl.h +++ b/net/ipv4/udp_impl.h @@ -19,8 +19,6 @@ int udp_getsockopt(struct sock *sk, int level, int optname, int udp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int flags, int *addr_len); -int udp_sendpage(struct sock *sk, struct page *page, int offset, size_t size, - int flags); void udp_destroy_sock(struct sock *sk); #ifdef CONFIG_PROC_FS diff --git a/net/ipv4/udplite.c b/net/ipv4/udplite.c index e0c9cc39b81e..69870f0afc6c 100644 --- a/net/ipv4/udplite.c +++ b/net/ipv4/udplite.c @@ -54,7 +54,6 @@ struct proto udplite_prot = { .getsockopt = udp_getsockopt, .sendmsg = udp_sendmsg, .recvmsg = udp_recvmsg, - .sendpage = udp_sendpage, .hash = udp_lib_hash, .unhash = udp_lib_unhash, .rehash = udp_v4_rehash, diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c index 38689bedfce7..769c76d59053 100644 --- a/net/ipv6/af_inet6.c +++ b/net/ipv6/af_inet6.c @@ -695,9 +695,7 @@ const struct proto_ops inet6_stream_ops = { #ifdef CONFIG_MMU .mmap = tcp_mmap, #endif - .sendpage = inet_sendpage, .sendmsg_locked = tcp_sendmsg_locked, - .sendpage_locked = tcp_sendpage_locked, .splice_read = tcp_splice_read, .read_sock = tcp_read_sock, .read_skb = tcp_read_skb, @@ -728,7 +726,6 @@ const struct proto_ops inet6_dgram_ops = { .recvmsg = inet6_recvmsg, /* retpoline's sake */ .read_skb = udp_read_skb, .mmap = sock_no_mmap, - .sendpage = sock_no_sendpage, .set_peek_off = sk_set_peek_off, #ifdef CONFIG_COMPAT .compat_ioctl = inet6_compat_ioctl, diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c index bac9ba747bde..c6c062678c0e 100644 --- a/net/ipv6/raw.c +++ b/net/ipv6/raw.c @@ -1298,7 +1298,6 @@ const struct proto_ops inet6_sockraw_ops = { .sendmsg = inet_sendmsg, /* ok */ .recvmsg = sock_common_recvmsg, /* ok */ .mmap = sock_no_mmap, - .sendpage = sock_no_sendpage, #ifdef CONFIG_COMPAT .compat_ioctl = inet6_compat_ioctl, #endif diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c index 1bf93b61aa06..03ba1e389901 100644 --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -2151,7 +2151,6 @@ struct proto tcpv6_prot = { .keepalive = tcp_set_keepalive, .recvmsg = tcp_recvmsg, .sendmsg = tcp_sendmsg, - .sendpage = tcp_sendpage, .backlog_rcv = tcp_v6_do_rcv, .release_cb = tcp_release_cb, .hash = inet6_hash, diff --git a/net/key/af_key.c b/net/key/af_key.c index a815f5ab4c49..bf59d42dc697 100644 --- a/net/key/af_key.c +++ b/net/key/af_key.c @@ -3757,7 +3757,6 @@ static const struct proto_ops pfkey_ops = { .listen = sock_no_listen, .shutdown = sock_no_shutdown, .mmap = sock_no_mmap, - .sendpage = sock_no_sendpage, /* Now the operations that really occur. */ .release = pfkey_release, diff --git a/net/l2tp/l2tp_ip.c b/net/l2tp/l2tp_ip.c index 4db5a554bdbd..d0dcbe3a4cd7 100644 --- a/net/l2tp/l2tp_ip.c +++ b/net/l2tp/l2tp_ip.c @@ -625,7 +625,6 @@ static const struct proto_ops l2tp_ip_ops = { .sendmsg = inet_sendmsg, .recvmsg = sock_common_recvmsg, .mmap = sock_no_mmap, - .sendpage = sock_no_sendpage, }; static struct inet_protosw l2tp_ip_protosw = { diff --git a/net/l2tp/l2tp_ip6.c b/net/l2tp/l2tp_ip6.c index 2478aa60145f..49296ce14a90 100644 --- a/net/l2tp/l2tp_ip6.c +++ b/net/l2tp/l2tp_ip6.c @@ -751,7 +751,6 @@ static const struct proto_ops l2tp_ip6_ops = { .sendmsg = inet_sendmsg, .recvmsg = sock_common_recvmsg, .mmap = sock_no_mmap, - .sendpage = sock_no_sendpage, #ifdef CONFIG_COMPAT .compat_ioctl = inet6_compat_ioctl, #endif diff --git a/net/llc/af_llc.c b/net/llc/af_llc.c index da7fe94bea2e..addd94da2a81 100644 --- a/net/llc/af_llc.c +++ b/net/llc/af_llc.c @@ -1230,7 +1230,6 @@ static const struct proto_ops llc_ui_ops = { .sendmsg = llc_ui_sendmsg, .recvmsg = llc_ui_recvmsg, .mmap = sock_no_mmap, - .sendpage = sock_no_sendpage, }; static const char llc_proc_err_msg[] __initconst = diff --git a/net/mctp/af_mctp.c b/net/mctp/af_mctp.c index 3150f3f0c872..c6fe2e6b85dd 100644 --- a/net/mctp/af_mctp.c +++ b/net/mctp/af_mctp.c @@ -485,7 +485,6 @@ static const struct proto_ops mctp_dgram_ops = { .sendmsg = mctp_sendmsg, .recvmsg = mctp_recvmsg, .mmap = sock_no_mmap, - .sendpage = sock_no_sendpage, #ifdef CONFIG_COMPAT .compat_ioctl = mctp_compat_ioctl, #endif diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 3ad9c46202fc..ade89b8d0082 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -3816,7 +3816,6 @@ static const struct proto_ops mptcp_stream_ops = { .sendmsg = inet_sendmsg, .recvmsg = inet_recvmsg, .mmap = sock_no_mmap, - .sendpage = inet_sendpage, }; static struct inet_protosw mptcp_protosw = { @@ -3911,7 +3910,6 @@ static const struct proto_ops mptcp_v6_stream_ops = { .sendmsg = inet6_sendmsg, .recvmsg = inet6_recvmsg, .mmap = sock_no_mmap, - .sendpage = inet_sendpage, #ifdef CONFIG_COMPAT .compat_ioctl = inet6_compat_ioctl, #endif diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c index c64277659753..f70073a3bb49 100644 --- a/net/netlink/af_netlink.c +++ b/net/netlink/af_netlink.c @@ -2841,7 +2841,6 @@ static const struct proto_ops netlink_ops = { .sendmsg = netlink_sendmsg, .recvmsg = netlink_recvmsg, .mmap = sock_no_mmap, - .sendpage = sock_no_sendpage, }; static const struct net_proto_family netlink_family_ops = { diff --git a/net/netrom/af_netrom.c b/net/netrom/af_netrom.c index 5a4cb796150f..eb8ccbd58df7 100644 --- a/net/netrom/af_netrom.c +++ b/net/netrom/af_netrom.c @@ -1364,7 +1364,6 @@ static const struct proto_ops nr_proto_ops = { .sendmsg = nr_sendmsg, .recvmsg = nr_recvmsg, .mmap = sock_no_mmap, - .sendpage = sock_no_sendpage, }; static struct notifier_block nr_dev_notifier = { diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index d4e76e2ae153..385bd4982b80 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -4604,7 +4604,6 @@ static const struct proto_ops packet_ops_spkt = { .sendmsg = packet_sendmsg_spkt, .recvmsg = packet_recvmsg, .mmap = sock_no_mmap, - .sendpage = sock_no_sendpage, }; static const struct proto_ops packet_ops = { @@ -4626,7 +4625,6 @@ static const struct proto_ops packet_ops = { .sendmsg = packet_sendmsg, .recvmsg = packet_recvmsg, .mmap = packet_mmap, - .sendpage = sock_no_sendpage, }; static const struct net_proto_family packet_family_ops = { diff --git a/net/phonet/socket.c b/net/phonet/socket.c index 71e2caf6ab85..a246f7d0a817 100644 --- a/net/phonet/socket.c +++ b/net/phonet/socket.c @@ -441,7 +441,6 @@ const struct proto_ops phonet_dgram_ops = { .sendmsg = pn_socket_sendmsg, .recvmsg = sock_common_recvmsg, .mmap = sock_no_mmap, - .sendpage = sock_no_sendpage, }; const struct proto_ops phonet_stream_ops = { @@ -462,7 +461,6 @@ const struct proto_ops phonet_stream_ops = { .sendmsg = pn_socket_sendmsg, .recvmsg = sock_common_recvmsg, .mmap = sock_no_mmap, - .sendpage = sock_no_sendpage, }; EXPORT_SYMBOL(phonet_stream_ops); diff --git a/net/qrtr/af_qrtr.c b/net/qrtr/af_qrtr.c index 5c2fb992803b..5bb7d680bd5f 100644 --- a/net/qrtr/af_qrtr.c +++ b/net/qrtr/af_qrtr.c @@ -1240,7 +1240,6 @@ static const struct proto_ops qrtr_proto_ops = { .shutdown = sock_no_shutdown, .release = qrtr_release, .mmap = sock_no_mmap, - .sendpage = sock_no_sendpage, }; static struct proto qrtr_proto = { diff --git a/net/rds/af_rds.c b/net/rds/af_rds.c index 3ff6995244e5..01c4cdfef45d 100644 --- a/net/rds/af_rds.c +++ b/net/rds/af_rds.c @@ -653,7 +653,6 @@ static const struct proto_ops rds_proto_ops = { .sendmsg = rds_sendmsg, .recvmsg = rds_recvmsg, .mmap = sock_no_mmap, - .sendpage = sock_no_sendpage, }; static void rds_sock_destruct(struct sock *sk) diff --git a/net/rose/af_rose.c b/net/rose/af_rose.c index ca2b17f32670..49dafe9ac72f 100644 --- a/net/rose/af_rose.c +++ b/net/rose/af_rose.c @@ -1496,7 +1496,6 @@ static const struct proto_ops rose_proto_ops = { .sendmsg = rose_sendmsg, .recvmsg = rose_recvmsg, .mmap = sock_no_mmap, - .sendpage = sock_no_sendpage, }; static struct notifier_block rose_dev_notifier = { diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c index 102f5cbff91a..182495804f8f 100644 --- a/net/rxrpc/af_rxrpc.c +++ b/net/rxrpc/af_rxrpc.c @@ -938,7 +938,6 @@ static const struct proto_ops rxrpc_rpc_ops = { .sendmsg = rxrpc_sendmsg, .recvmsg = rxrpc_recvmsg, .mmap = sock_no_mmap, - .sendpage = sock_no_sendpage, }; static struct proto rxrpc_proto = { diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c index c365df24ad33..acb2d2a69268 100644 --- a/net/sctp/protocol.c +++ b/net/sctp/protocol.c @@ -1135,7 +1135,6 @@ static const struct proto_ops inet_seqpacket_ops = { .sendmsg = inet_sendmsg, .recvmsg = inet_recvmsg, .mmap = sock_no_mmap, - .sendpage = sock_no_sendpage, }; /* Registration with AF_INET family. */ diff --git a/net/socket.c b/net/socket.c index 2cd5c2bcdde8..382775cbb51c 100644 --- a/net/socket.c +++ b/net/socket.c @@ -3548,54 +3548,6 @@ int kernel_getpeername(struct socket *sock, struct sockaddr *addr) } EXPORT_SYMBOL(kernel_getpeername); -/** - * kernel_sendpage - send a &page through a socket (kernel space) - * @sock: socket - * @page: page - * @offset: page offset - * @size: total size in bytes - * @flags: flags (MSG_DONTWAIT, ...) - * - * Returns the total amount sent in bytes or an error. - */ - -int kernel_sendpage(struct socket *sock, struct page *page, int offset, - size_t size, int flags) -{ - if (sock->ops->sendpage) { - /* Warn in case the improper page to zero-copy send */ - WARN_ONCE(!sendpage_ok(page), "improper page for zero-copy send"); - return sock->ops->sendpage(sock, page, offset, size, flags); - } - return sock_no_sendpage(sock, page, offset, size, flags); -} -EXPORT_SYMBOL(kernel_sendpage); - -/** - * kernel_sendpage_locked - send a &page through the locked sock (kernel space) - * @sk: sock - * @page: page - * @offset: page offset - * @size: total size in bytes - * @flags: flags (MSG_DONTWAIT, ...) - * - * Returns the total amount sent in bytes or an error. - * Caller must hold @sk. - */ - -int kernel_sendpage_locked(struct sock *sk, struct page *page, int offset, - size_t size, int flags) -{ - struct socket *sock = sk->sk_socket; - - if (sock->ops->sendpage_locked) - return sock->ops->sendpage_locked(sk, page, offset, size, - flags); - - return sock_no_sendpage_locked(sk, page, offset, size, flags); -} -EXPORT_SYMBOL(kernel_sendpage_locked); - /** * kernel_sock_shutdown - shut down part of a full-duplex connection (kernel space) * @sock: socket diff --git a/net/tipc/socket.c b/net/tipc/socket.c index 37edfe10f8c6..d2072fbf3272 100644 --- a/net/tipc/socket.c +++ b/net/tipc/socket.c @@ -3375,7 +3375,6 @@ static const struct proto_ops msg_ops = { .sendmsg = tipc_sendmsg, .recvmsg = tipc_recvmsg, .mmap = sock_no_mmap, - .sendpage = sock_no_sendpage }; static const struct proto_ops packet_ops = { @@ -3396,7 +3395,6 @@ static const struct proto_ops packet_ops = { .sendmsg = tipc_send_packet, .recvmsg = tipc_recvmsg, .mmap = sock_no_mmap, - .sendpage = sock_no_sendpage }; static const struct proto_ops stream_ops = { @@ -3417,7 +3415,6 @@ static const struct proto_ops stream_ops = { .sendmsg = tipc_sendstream, .recvmsg = tipc_recvstream, .mmap = sock_no_mmap, - .sendpage = sock_no_sendpage }; static const struct net_proto_family tipc_family_ops = { diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index b4b27a652ef0..1223d7e46d4a 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -758,8 +758,6 @@ static int unix_compat_ioctl(struct socket *sock, unsigned int cmd, unsigned lon static int unix_shutdown(struct socket *, int); static int unix_stream_sendmsg(struct socket *, struct msghdr *, size_t); static int unix_stream_recvmsg(struct socket *, struct msghdr *, size_t, int); -static ssize_t unix_stream_sendpage(struct socket *, struct page *, int offset, - size_t size, int flags); static ssize_t unix_stream_splice_read(struct socket *, loff_t *ppos, struct pipe_inode_info *, size_t size, unsigned int flags); @@ -852,7 +850,6 @@ static const struct proto_ops unix_stream_ops = { .recvmsg = unix_stream_recvmsg, .read_skb = unix_stream_read_skb, .mmap = sock_no_mmap, - .sendpage = unix_stream_sendpage, .splice_read = unix_stream_splice_read, .set_peek_off = unix_set_peek_off, .show_fdinfo = unix_show_fdinfo, @@ -878,7 +875,6 @@ static const struct proto_ops unix_dgram_ops = { .read_skb = unix_read_skb, .recvmsg = unix_dgram_recvmsg, .mmap = sock_no_mmap, - .sendpage = sock_no_sendpage, .set_peek_off = unix_set_peek_off, .show_fdinfo = unix_show_fdinfo, }; @@ -902,7 +898,6 @@ static const struct proto_ops unix_seqpacket_ops = { .sendmsg = unix_seqpacket_sendmsg, .recvmsg = unix_seqpacket_recvmsg, .mmap = sock_no_mmap, - .sendpage = sock_no_sendpage, .set_peek_off = unix_set_peek_off, .show_fdinfo = unix_show_fdinfo, }; @@ -1839,24 +1834,6 @@ static void maybe_add_creds(struct sk_buff *skb, const struct socket *sock, } } -static int maybe_init_creds(struct scm_cookie *scm, - struct socket *socket, - const struct sock *other) -{ - int err; - struct msghdr msg = { .msg_controllen = 0 }; - - err = scm_send(socket, &msg, scm, false); - if (err) - return err; - - if (unix_passcred_enabled(socket, other)) { - scm->pid = get_pid(task_tgid(current)); - current_uid_gid(&scm->creds.uid, &scm->creds.gid); - } - return err; -} - static bool unix_skb_scm_eq(struct sk_buff *skb, struct scm_cookie *scm) { @@ -2349,122 +2326,6 @@ static int unix_stream_sendmsg(struct socket *sock, struct msghdr *msg, return sent ? : err; } -static ssize_t unix_stream_sendpage(struct socket *socket, struct page *page, - int offset, size_t size, int flags) -{ - int err; - bool send_sigpipe = false; - bool init_scm = true; - struct scm_cookie scm; - struct sock *other, *sk = socket->sk; - struct sk_buff *skb, *newskb = NULL, *tail = NULL; - - if (flags & MSG_OOB) - return -EOPNOTSUPP; - - other = unix_peer(sk); - if (!other || sk->sk_state != TCP_ESTABLISHED) - return -ENOTCONN; - - if (false) { -alloc_skb: - unix_state_unlock(other); - mutex_unlock(&unix_sk(other)->iolock); - newskb = sock_alloc_send_pskb(sk, 0, 0, flags & MSG_DONTWAIT, - &err, 0); - if (!newskb) - goto err; - } - - /* we must acquire iolock as we modify already present - * skbs in the sk_receive_queue and mess with skb->len - */ - err = mutex_lock_interruptible(&unix_sk(other)->iolock); - if (err) { - err = flags & MSG_DONTWAIT ? -EAGAIN : -ERESTARTSYS; - goto err; - } - - if (sk->sk_shutdown & SEND_SHUTDOWN) { - err = -EPIPE; - send_sigpipe = true; - goto err_unlock; - } - - unix_state_lock(other); - - if (sock_flag(other, SOCK_DEAD) || - other->sk_shutdown & RCV_SHUTDOWN) { - err = -EPIPE; - send_sigpipe = true; - goto err_state_unlock; - } - - if (init_scm) { - err = maybe_init_creds(&scm, socket, other); - if (err) - goto err_state_unlock; - init_scm = false; - } - - skb = skb_peek_tail(&other->sk_receive_queue); - if (tail && tail == skb) { - skb = newskb; - } else if (!skb || !unix_skb_scm_eq(skb, &scm)) { - if (newskb) { - skb = newskb; - } else { - tail = skb; - goto alloc_skb; - } - } else if (newskb) { - /* this is fast path, we don't necessarily need to - * call to kfree_skb even though with newskb == NULL - * this - does no harm - */ - consume_skb(newskb); - newskb = NULL; - } - - if (skb_append_pagefrags(skb, page, offset, size)) { - tail = skb; - goto alloc_skb; - } - - skb->len += size; - skb->data_len += size; - skb->truesize += size; - refcount_add(size, &sk->sk_wmem_alloc); - - if (newskb) { - err = unix_scm_to_skb(&scm, skb, false); - if (err) - goto err_state_unlock; - spin_lock(&other->sk_receive_queue.lock); - __skb_queue_tail(&other->sk_receive_queue, newskb); - spin_unlock(&other->sk_receive_queue.lock); - } - - unix_state_unlock(other); - mutex_unlock(&unix_sk(other)->iolock); - - other->sk_data_ready(other); - scm_destroy(&scm); - return size; - -err_state_unlock: - unix_state_unlock(other); -err_unlock: - mutex_unlock(&unix_sk(other)->iolock); -err: - kfree_skb(newskb); - if (send_sigpipe && !(flags & MSG_NOSIGNAL)) - send_sig(SIGPIPE, current, 0); - if (!init_scm) - scm_destroy(&scm); - return err; -} - static int unix_seqpacket_sendmsg(struct socket *sock, struct msghdr *msg, size_t len) { diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c index 19aea7cba26e..d0e476755cdc 100644 --- a/net/vmw_vsock/af_vsock.c +++ b/net/vmw_vsock/af_vsock.c @@ -1271,7 +1271,6 @@ static const struct proto_ops vsock_dgram_ops = { .sendmsg = vsock_dgram_sendmsg, .recvmsg = vsock_dgram_recvmsg, .mmap = sock_no_mmap, - .sendpage = sock_no_sendpage, }; static int vsock_transport_cancel_pkt(struct vsock_sock *vsk) @@ -2186,7 +2185,6 @@ static const struct proto_ops vsock_stream_ops = { .sendmsg = vsock_connectible_sendmsg, .recvmsg = vsock_connectible_recvmsg, .mmap = sock_no_mmap, - .sendpage = sock_no_sendpage, .set_rcvlowat = vsock_set_rcvlowat, }; @@ -2208,7 +2206,6 @@ static const struct proto_ops vsock_seqpacket_ops = { .sendmsg = vsock_connectible_sendmsg, .recvmsg = vsock_connectible_recvmsg, .mmap = sock_no_mmap, - .sendpage = sock_no_sendpage, }; static int vsock_create(struct net *net, struct socket *sock, diff --git a/net/x25/af_x25.c b/net/x25/af_x25.c index 5c7ad301d742..0fb5143bec7a 100644 --- a/net/x25/af_x25.c +++ b/net/x25/af_x25.c @@ -1757,7 +1757,6 @@ static const struct proto_ops x25_proto_ops = { .sendmsg = x25_sendmsg, .recvmsg = x25_recvmsg, .mmap = sock_no_mmap, - .sendpage = sock_no_sendpage, }; static struct packet_type x25_packet_type __read_mostly = { diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c index 2ac58b282b5e..eff1f0aaa4b5 100644 --- a/net/xdp/xsk.c +++ b/net/xdp/xsk.c @@ -1386,7 +1386,6 @@ static const struct proto_ops xsk_proto_ops = { .sendmsg = xsk_sendmsg, .recvmsg = xsk_recvmsg, .mmap = xsk_mmap, - .sendpage = sock_no_sendpage, }; static void xsk_destruct(struct sock *sk)