From patchwork Wed Mar 15 19:29:58 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Jannik Glückert
X-Patchwork-Id: 70398
Date: Wed, 15 Mar 2023 20:29:58 +0100
Subject: [PATCH v2 1/2] libstdc++: use copy_file_range, improve sendfile in filesystem::copy_file
From: Jannik Glückert
To: libstdc++@gcc.gnu.org
Cc: gcc-patches@gcc.gnu.org

This iteration improves error handling for copy_file_range,
particularly around undocumented error codes in earlier kernel
versions.

Additionally, this fixes the userspace copy fallback to handle
zero-length files such as in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108178.

Lastly, the case "src gets resized during the copy loop" is now
considered and will return true once the loop hits EOF (this is the
only situation, aside from a zero-length src, where sendfile and
copy_file_range return 0).

Best
Jannik

From b55eb8dccaa44f07d8acbe6294326a46c920b04f Mon Sep 17 00:00:00 2001
From: Jannik Glückert
Date: Mon, 6 Mar 2023 20:52:08 +0100
Subject: [PATCH 1/2] libstdc++: also use sendfile for big files
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

We were previously only using sendfile for files smaller than 2GB, as
sendfile needs to be called repeatedly for files bigger than that.

Some quick numbers, copying a 16GB file, average of 10 repetitions:
    old:
        real: 13.4s
        user: 0.14s
        sys : 7.43s
    new:
        real: 8.90s
        user: 0.00s
        sys : 3.68s

Additionally, this fixes
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108178

libstdc++-v3/ChangeLog:

	* acinclude.m4 (_GLIBCXX_HAVE_LSEEK): define
	* config.h.in: Regenerate.
	* configure: Regenerate.
	* src/filesystem/ops-common.h: enable sendfile for files
	>2GB in std::filesystem::copy_file, skip zero-length files

Signed-off-by: Jannik Glückert
---
 libstdc++-v3/acinclude.m4                |  51 +++++----
 libstdc++-v3/config.h.in                 |   3 +
 libstdc++-v3/configure                   | 127 ++++++++++++++++-------
 libstdc++-v3/src/filesystem/ops-common.h |  86 ++++++++-------
 4 files changed, 175 insertions(+), 92 deletions(-)

diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
index 5136c0571e8..85a09a5a869 100644
--- a/libstdc++-v3/acinclude.m4
+++ b/libstdc++-v3/acinclude.m4
@@ -4583,6 +4583,7 @@ dnl  _GLIBCXX_USE_FCHMOD
 dnl  _GLIBCXX_USE_FCHMODAT
 dnl  _GLIBCXX_USE_SENDFILE
 dnl  HAVE_LINK
+dnl  HAVE_LSEEK
 dnl  HAVE_READLINK
 dnl  HAVE_SYMLINK
 dnl
@@ -4718,25 +4719,6 @@ dnl
   if test $glibcxx_cv_fchmodat = yes; then
     AC_DEFINE(_GLIBCXX_USE_FCHMODAT, 1, [Define if fchmodat is available in <sys/stat.h>.])
   fi
-dnl
-  AC_CACHE_CHECK([for sendfile that can copy files],
-    glibcxx_cv_sendfile, [dnl
-    case "${target_os}" in
-      gnu* | linux* | solaris* | uclinux*)
-        GCC_TRY_COMPILE_OR_LINK(
-          [#include <sys/sendfile.h>],
-          [sendfile(1, 2, (off_t*)0, sizeof 1);],
-          [glibcxx_cv_sendfile=yes],
-          [glibcxx_cv_sendfile=no])
-        ;;
-      *)
-        glibcxx_cv_sendfile=no
-        ;;
-    esac
-  ])
-  if test $glibcxx_cv_sendfile = yes; then
-    AC_DEFINE(_GLIBCXX_USE_SENDFILE, 1, [Define if sendfile is available in <sys/sendfile.h>.])
-  fi
 dnl
   AC_CACHE_CHECK([for link],
     glibcxx_cv_link, [dnl
@@ -4749,6 +4731,18 @@ dnl
   if test $glibcxx_cv_link = yes; then
     AC_DEFINE(HAVE_LINK, 1, [Define if link is available in <unistd.h>.])
   fi
+dnl
+  AC_CACHE_CHECK([for lseek],
+    glibcxx_cv_lseek, [dnl
+    GCC_TRY_COMPILE_OR_LINK(
+      [#include <unistd.h>],
+      [lseek(1, 0, SEEK_SET);],
+      [glibcxx_cv_lseek=yes],
+      [glibcxx_cv_lseek=no])
+  ])
+  if test $glibcxx_cv_lseek = yes; then
+    AC_DEFINE(HAVE_LSEEK, 1, [Define if lseek is available in <unistd.h>.])
+  fi
 dnl
   AC_CACHE_CHECK([for readlink],
     glibcxx_cv_readlink, [dnl
@@ -4785,6 +4779,25 @@ dnl
   if test $glibcxx_cv_truncate = yes; then
     AC_DEFINE(HAVE_TRUNCATE, 1, [Define if truncate is available in <unistd.h>.])
   fi
+dnl
+  AC_CACHE_CHECK([for sendfile that can copy files],
+    glibcxx_cv_sendfile, [dnl
+    case "${target_os}" in
+      gnu* | linux* | solaris* | uclinux*)
+        GCC_TRY_COMPILE_OR_LINK(
+          [#include <sys/sendfile.h>],
+          [sendfile(1, 2, (off_t*)0, sizeof 1);],
+          [glibcxx_cv_sendfile=yes],
+          [glibcxx_cv_sendfile=no])
+        ;;
+      *)
+        glibcxx_cv_sendfile=no
+        ;;
+    esac
+  ])
+  if test $glibcxx_cv_sendfile = yes && test $glibcxx_cv_lseek = yes; then
+    AC_DEFINE(_GLIBCXX_USE_SENDFILE, 1, [Define if sendfile is available in <sys/sendfile.h>.])
+  fi
 dnl
   AC_CACHE_CHECK([for fdopendir],
     glibcxx_cv_fdopendir, [dnl
diff --git a/libstdc++-v3/src/filesystem/ops-common.h b/libstdc++-v3/src/filesystem/ops-common.h
index abbfca43e5c..9e1b1d41dc5 100644
--- a/libstdc++-v3/src/filesystem/ops-common.h
+++ b/libstdc++-v3/src/filesystem/ops-common.h
@@ -51,6 +51,7 @@
 # include <ext/stdio_filebuf.h>
 # ifdef _GLIBCXX_USE_SENDFILE
 #  include <sys/sendfile.h> // sendfile
+#  include <unistd.h> // lseek
 # endif
 #endif
 
@@ -358,6 +359,32 @@ _GLIBCXX_BEGIN_NAMESPACE_FILESYSTEM
   }
 
 #ifdef NEED_DO_COPY_FILE
+#if defined _GLIBCXX_USE_SENDFILE && ! defined _GLIBCXX_FILESYSTEM_IS_WINDOWS
+  bool
+  copy_file_sendfile(int fd_in, int fd_out, size_t length) noexcept
+  {
+    // a zero-length file is either empty, or not copyable by this syscall
+    // return early to avoid the syscall cost
+    if (length == 0)
+      {
+        errno = EINVAL;
+        return false;
+      }
+    size_t bytes_left = length;
+    off_t offset = 0;
+    ssize_t bytes_copied;
+    do {
+      bytes_copied = ::sendfile(fd_out, fd_in, &offset, bytes_left);
+      bytes_left -= bytes_copied;
+    } while (bytes_left > 0 && bytes_copied > 0);
+    if (bytes_copied < 0)
+      {
+        ::lseek(fd_out, 0, SEEK_SET);
+        return false;
+      }
+    return true;
+  }
+#endif
   bool
   do_copy_file(const char_type* from, const char_type* to,
                std::filesystem::copy_options_existing_file options,
@@ -498,28 +525,30 @@ _GLIBCXX_BEGIN_NAMESPACE_FILESYSTEM
         return false;
       }
 
-    size_t count = from_st->st_size;
+    bool has_copied = false;
+
 #if defined _GLIBCXX_USE_SENDFILE && ! defined _GLIBCXX_FILESYSTEM_IS_WINDOWS
-    off_t offset = 0;
-    ssize_t n = ::sendfile(out.fd, in.fd, &offset, count);
-    if (n < 0 && errno != ENOSYS && errno != EINVAL)
+    if (!has_copied)
+      has_copied = copy_file_sendfile(in.fd, out.fd, from_st->st_size);
+    if (!has_copied)
       {
-        ec.assign(errno, std::generic_category());
-        return false;
+        if (errno != ENOSYS && errno != EINVAL)
+          {
+            ec.assign(errno, std::generic_category());
+            return false;
+          }
       }
-    if ((size_t)n == count)
+#endif
+
+    if (has_copied)
       {
-        if (!out.close() || !in.close())
-          {
-            ec.assign(errno, std::generic_category());
-            return false;
-          }
-        ec.clear();
-        return true;
+        if (!out.close() || !in.close())
+          {
+            ec.assign(errno, std::generic_category());
+            return false;
+          }
+        return true;
       }
-    else if (n > 0)
-      count -= n;
-#endif // _GLIBCXX_USE_SENDFILE
 
     using std::ios;
     __gnu_cxx::stdio_filebuf<char> sbin(in.fd, ios::in|ios::binary);
@@ -530,29 +559,12 @@ _GLIBCXX_BEGIN_NAMESPACE_FILESYSTEM
     if (sbout.is_open())
       out.fd = -1;
 
-#ifdef _GLIBCXX_USE_SENDFILE
-    if (n != 0)
+    if (!(std::ostream(&sbout) << &sbin))
       {
-        if (n < 0)
-          n = 0;
-
-        const auto p1 = sbin.pubseekoff(n, ios::beg, ios::in);
-        const auto p2 = sbout.pubseekoff(n, ios::beg, ios::out);
-
-        const std::streampos errpos(std::streamoff(-1));
-        if (p1 == errpos || p2 == errpos)
-          {
-            ec = std::make_error_code(std::errc::io_error);
-            return false;
-          }
+        ec = std::make_error_code(std::errc::io_error);
+        return false;
       }
-#endif
 
-    if (count && !(std::ostream(&sbout) << &sbin))
-      {
-        ec = std::make_error_code(std::errc::io_error);
-        return false;
-      }
     if (!sbout.close() || !sbin.close())
       {
         ec.assign(errno, std::generic_category());
-- 
2.39.2