Message ID | 20220905225046.193799-1-cf.natali@gmail.com |
---|---|
State | New, archived |
Headers |
Return-Path: <gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:5044:0:0:0:0:0 with SMTP id h4csp377313wrt; Mon, 5 Sep 2022 15:51:56 -0700 (PDT) X-Google-Smtp-Source: AA6agR6SI+Jhm/UYzxH4PVbtgJlH6AT6f1IEzCe9iuCOA/ceIB12V2Me1eUBJF/2+tkuyJbvRUOq X-Received: by 2002:a17:907:2cea:b0:741:6251:3a22 with SMTP id hz10-20020a1709072cea00b0074162513a22mr28461845ejc.6.1662418316522; Mon, 05 Sep 2022 15:51:56 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1662418316; cv=none; d=google.com; s=arc-20160816; b=oJsuRS75V4NknPHM0OCsJs1fG/MJ/My/QXKKQ2YEvAtUFsBpbAgK7DnW1bH/p7QLFa kHYaFt0XDJ06Rbqyk6Xfc2oxZpCHXnWR12lEi291BYGZPQN4ITOdI3sYX+ywKS7ATnPu rf93kXnC+V07aRMDrUItsc077YzQwD2GztwPvp7yuPzc8eWDyB+k7L0TMY6kA4kxsYSt GEDeXDU8/VW0vbinQJrFDmr1f1V4x+kdccDiYfhnchVd2GXzRDRIUUvrMD9jhTmMm+Ss +PknYqqlFYhPAEopjXLTofGTMvnetAbDU5R7J5sz660JXEeGyQswQ5UzLw31/iqk42uj QySA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:cc:reply-to:from:list-subscribe:list-help :list-post:list-archive:list-unsubscribe:list-id:precedence :content-transfer-encoding:mime-version:message-id:date:subject:to :dmarc-filter:delivered-to:dkim-signature:dkim-filter; bh=hhD0j7IYiEscCRyUyQT2fJgSckaVsbOxlmUcZkpTtss=; b=UkFJmQ/4j867mNF9S6vtRfKbpch+PbJ/wI4e1cF0Vw7rhGnRbKLix4+tV20WknupxI bP3XoTzMhctrrBdiVcULJwHGYfGHiuabfsk/vRfhThq4KN1T/lU80Kp0iixOjjLrgDhn XsbA9D3eR/i/3bnqVnzhN4Nbq00AZmBWLI8H7VwCUoeKU4KP2v864j5dI+jSUyPj7l8g mPUqGuapuumIYBMorTSfW8HD/4nJ4l/JuYLhhV1tXNBULdt7FzyEkVCbIi7trHH36NI2 /Sv4Hx7iJV3x3kLU8U8cFw9fjdWhla6iGBo6rdGdQV3HvQXxu8JyP2AX2mQo6p1ULd4x J7GA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gcc.gnu.org header.s=default header.b=GkH61fIS; spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=gnu.org Received: from sourceware.org (ip-8-43-85-97.sourceware.org. [8.43.85.97]) by mx.google.com with ESMTPS id i15-20020a056402054f00b0044ec0845892si868733edx.121.2022.09.05.15.51.56 for <ouuuleilei@gmail.com> (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 05 Sep 2022 15:51:56 -0700 (PDT) Received-SPF: pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) client-ip=8.43.85.97; Authentication-Results: mx.google.com; dkim=pass header.i=@gcc.gnu.org header.s=default header.b=GkH61fIS; spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=gnu.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 3AB37385085A for <ouuuleilei@gmail.com>; Mon, 5 Sep 2022 22:51:55 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 3AB37385085A DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1662418315; bh=hhD0j7IYiEscCRyUyQT2fJgSckaVsbOxlmUcZkpTtss=; h=To:Subject:Date:List-Id:List-Unsubscribe:List-Archive:List-Post: List-Help:List-Subscribe:From:Reply-To:Cc:From; b=GkH61fISYIA4yz0Pr29JuO/hJqmj3o0v3NLiD1GIgH8li5jsJOIUXkMdCFA3J1srW c8q5ptXHMlYyeKyCCyceoyxAtIBE7UzI1B9WkK9/voul/8i6zX69o/tg4w9kk8PaNI C8ebh9tDAnBAndG6j+P5zBKXsYMb35Zawf/lp2sY= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mail-wm1-x32d.google.com (mail-wm1-x32d.google.com [IPv6:2a00:1450:4864:20::32d]) by sourceware.org (Postfix) with ESMTPS id C98033858D28; Mon, 5 Sep 2022 22:51:10 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org C98033858D28 Received: by mail-wm1-x32d.google.com with SMTP id d12-20020a05600c34cc00b003a83d20812fso6423519wmq.1; Mon, 05 Sep 2022 15:51:10 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:x-gm-message-state:from:to:cc:subject:date; bh=hhD0j7IYiEscCRyUyQT2fJgSckaVsbOxlmUcZkpTtss=; b=xcieLmr0ZE/1zpcn9Mv92cM0+Lwen+OaiaZ362WSXSNlCjaNItAOHvqPhoxzNJN4BQ FTNypNU4ULZfpVggQcmeVjgL8uDFpbfD5RNZYxFFmHhR3SO/zDFVGRmuV/uWDKLSaLow HGqAHCDy89Vej+FkDVx+ND7rfyQsJb3Roywmn02tYpeOEhJ2QuXFk6buHn3zkiFHYtvj 4ju047e7IEcfaYOrx3CujwtbTn+p+MP+l/DlO/lZwlasQA0rvSF2dCg5quMjUyspwred LJH0MpfD/AHxUe+7Jk93Th5Pw4H5qStQvjZpVHTfCM2QbV8wbWEp+6l1D30UuTovDvLP sOhw== X-Gm-Message-State: ACgBeo1jteiIAizWByCEKpwLWM/zr433R/j1QEqWABXEQ6+ik57shy4j zYZTUcEHxZ0t+t/UvYzNlgkOvcAdyDE= X-Received: by 2002:a05:600c:2059:b0:3a5:92cc:19c5 with SMTP id p25-20020a05600c205900b003a592cc19c5mr11434528wmg.101.1662418269425; Mon, 05 Sep 2022 15:51:09 -0700 (PDT) Received: from localhost.localdomain ([90.254.118.85]) by smtp.gmail.com with ESMTPSA id x16-20020adff0d0000000b0021d221daccfsm9883731wro.78.2022.09.05.15.51.08 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 05 Sep 2022 15:51:09 -0700 (PDT) To: libstdc++@gcc.gnu.org, gcc-patches@gcc.gnu.org Subject: [PATCH] libstdc++: basic_filebuf: don't flush more often than necessary. Date: Mon, 5 Sep 2022 23:50:46 +0100 Message-Id: <20220905225046.193799-1-cf.natali@gmail.com> X-Mailer: git-send-email 2.30.2 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-12.7 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0, KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org> List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>, <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe> List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/> List-Post: <mailto:gcc-patches@gcc.gnu.org> List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help> List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>, <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe> From: Charles-Francois Natali via Gcc-patches <gcc-patches@gcc.gnu.org> Reply-To: Charles-Francois Natali <cf.natali@gmail.com> Cc: Charles-Francois Natali <cf.natali@gmail.com> Errors-To: gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org Sender: "Gcc-patches" <gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org> X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1743171948458692248?= X-GMAIL-MSGID: =?utf-8?q?1743171948458692248?= |
Series |
libstdc++: basic_filebuf: don't flush more often than necessary.
|
|
Commit Message
Charles-François Natali
Sept. 5, 2022, 10:50 p.m. UTC
`basic_filebuf::xsputn` would bypass the buffer when passed a chunk of size 1024 and above, seemingly as an optimisation. This can have a significant performance impact if the overhead of a `write` syscall is non-negligible, e.g. on a slow disk, on network filesystems, or simply during IO contention because instead of flushing every `BUFSIZ` (by default), we can flush every 1024 char. The impact is even greater with custom larger buffers, e.g. for network filesystems, because the code could issue `write` for example 1000X more often than necessary with respect to the buffer size. It also introduces a significant discontinuity in performance when writing chunks of size 1024 and above. See this reproducer which writes down a fixed number of chunks to a file open with `O_SYNC` - to replicate high-latency `write` - for varying size of chunks: ``` $ cat test_fstream_flush.cpp int main(int argc, char* argv[]) { assert(argc == 3); const auto* path = argv[1]; const auto chunk_size = std::stoul(argv[2]); const auto fd = open(path, O_CREAT | O_TRUNC | O_WRONLY | O_SYNC | O_CLOEXEC, 0666); assert(fd >= 0); auto filebuf = __gnu_cxx::stdio_filebuf<char>(fd, std::ios_base::out); auto stream = std::ostream(&filebuf); const auto chunk = std::vector<char>(chunk_size); for (auto i = 0; i < 1'000; ++i) { stream.write(chunk.data(), chunk.size()); } return 0; } ``` ``` $ g++ -o /tmp/test_fstream_flush test_fstream_flush.cpp -std=c++17 $ for i in $(seq 1021 1025); do echo -e "\n$i"; time /tmp/test_fstream_flush /tmp/foo $i; done 1021 real 0m0.997s user 0m0.000s sys 0m0.038s 1022 real 0m0.939s user 0m0.005s sys 0m0.032s 1023 real 0m0.954s user 0m0.005s sys 0m0.034s 1024 real 0m7.102s user 0m0.040s sys 0m0.192s 1025 real 0m7.204s user 0m0.025s sys 0m0.209s ``` See the huge drop in performance at the 1024-boundary. An `strace` confirms that from size 1024 we effectively defeat buffering: 1023-sized writes ``` $ strace -P /tmp/foo -e openat,write,writev /tmp/test_fstream_flush /tmp/foo 1023 2>&1 | head -n5 openat(AT_FDCWD, "/tmp/foo", O_WRONLY|O_CREAT|O_TRUNC|O_SYNC|O_CLOEXEC, 0666) = 3 writev(3, [{iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=8184}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=1023}], 2) = 9207 writev(3, [{iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=8184}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=1023}], 2) = 9207 writev(3, [{iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=8184}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=1023}], 2) = 9207 writev(3, [{iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=8184}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=1023}], 2) = 9207 ``` vs 1024-sized writes ``` $ strace -P /tmp/foo -e openat,write,writev /tmp/test_fstream_flush /tmp/foo 1024 2>&1 | head -n5 openat(AT_FDCWD, "/tmp/foo", O_WRONLY|O_CREAT|O_TRUNC|O_SYNC|O_CLOEXEC, 0666) = 3 writev(3, [{iov_base=NULL, iov_len=0}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=1024}], 2) = 1024 writev(3, [{iov_base="", iov_len=0}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=1024}], 2) = 1024 writev(3, [{iov_base="", iov_len=0}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=1024}], 2) = 1024 writev(3, [{iov_base="", iov_len=0}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=1024}], 2) = 1024 ``` Instead, it makes sense to only bypass the buffer if the amount of data to be written is larger than the buffer capacity. Closes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63746 --- libstdc++-v3/include/bits/fstream.tcc | 9 +-- .../27_io/basic_filebuf/sputn/char/63746.cc | 55 +++++++++++++++++++ 2 files changed, 58 insertions(+), 6 deletions(-) create mode 100644 libstdc++-v3/testsuite/27_io/basic_filebuf/sputn/char/63746.cc
Comments
On Mon, Sep 5, 2022, 23:51 Charles-Francois Natali <cf.natali@gmail.com> wrote: > `basic_filebuf::xsputn` would bypass the buffer when passed a chunk of > size 1024 and above, seemingly as an optimisation. > > This can have a significant performance impact if the overhead of a > `write` syscall is non-negligible, e.g. on a slow disk, on network > filesystems, or simply during IO contention because instead of flushing > every `BUFSIZ` (by default), we can flush every 1024 char. > The impact is even greater with custom larger buffers, e.g. for network > filesystems, because the code could issue `write` for example 1000X more > often than necessary with respect to the buffer size. > It also introduces a significant discontinuity in performance when > writing chunks of size 1024 and above. > > See this reproducer which writes down a fixed number of chunks to a file > open with `O_SYNC` - to replicate high-latency `write` - for varying > size of chunks: > > ``` > $ cat test_fstream_flush.cpp > > int > main(int argc, char* argv[]) > { > assert(argc == 3); > > const auto* path = argv[1]; > const auto chunk_size = std::stoul(argv[2]); > > const auto fd = > open(path, O_CREAT | O_TRUNC | O_WRONLY | O_SYNC | O_CLOEXEC, 0666); > assert(fd >= 0); > > auto filebuf = __gnu_cxx::stdio_filebuf<char>(fd, std::ios_base::out); > auto stream = std::ostream(&filebuf); > > const auto chunk = std::vector<char>(chunk_size); > > for (auto i = 0; i < 1'000; ++i) { > stream.write(chunk.data(), chunk.size()); > } > > return 0; > } > ``` > > ``` > $ g++ -o /tmp/test_fstream_flush test_fstream_flush.cpp -std=c++17 > $ for i in $(seq 1021 1025); do echo -e "\n$i"; time > /tmp/test_fstream_flush /tmp/foo $i; done > > 1021 > > real 0m0.997s > user 0m0.000s > sys 0m0.038s > > 1022 > > real 0m0.939s > user 0m0.005s > sys 0m0.032s > > 1023 > > real 0m0.954s > user 0m0.005s > sys 0m0.034s > > 1024 > > real 0m7.102s > user 0m0.040s > sys 0m0.192s > > 1025 > > real 0m7.204s > user 0m0.025s > sys 0m0.209s > ``` > > See the huge drop in performance at the 1024-boundary. > > An `strace` confirms that from size 1024 we effectively defeat > buffering: > 1023-sized writes > ``` > $ strace -P /tmp/foo -e openat,write,writev /tmp/test_fstream_flush > /tmp/foo 1023 2>&1 | head -n5 > openat(AT_FDCWD, "/tmp/foo", O_WRONLY|O_CREAT|O_TRUNC|O_SYNC|O_CLOEXEC, > 0666) = 3 > writev(3, > [{iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., > iov_len=8184}, > {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., > iov_len=1023}], 2) = 9207 > writev(3, > [{iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., > iov_len=8184}, > {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., > iov_len=1023}], 2) = 9207 > writev(3, > [{iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., > iov_len=8184}, > {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., > iov_len=1023}], 2) = 9207 > writev(3, > [{iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., > iov_len=8184}, > {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., > iov_len=1023}], 2) = 9207 > ``` > > vs 1024-sized writes > ``` > $ strace -P /tmp/foo -e openat,write,writev /tmp/test_fstream_flush > /tmp/foo 1024 2>&1 | head -n5 > openat(AT_FDCWD, "/tmp/foo", O_WRONLY|O_CREAT|O_TRUNC|O_SYNC|O_CLOEXEC, > 0666) = 3 > writev(3, [{iov_base=NULL, iov_len=0}, > {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., > iov_len=1024}], 2) = 1024 > writev(3, [{iov_base="", iov_len=0}, > {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., > iov_len=1024}], 2) = 1024 > writev(3, [{iov_base="", iov_len=0}, > {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., > iov_len=1024}], 2) = 1024 > writev(3, [{iov_base="", iov_len=0}, > {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., > iov_len=1024}], 2) = 1024 > ``` > > Instead, it makes sense to only bypass the buffer if the amount of data > to be written is larger than the buffer capacity. > > Closes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63746 > --- > libstdc++-v3/include/bits/fstream.tcc | 9 +-- > .../27_io/basic_filebuf/sputn/char/63746.cc | 55 +++++++++++++++++++ > 2 files changed, 58 insertions(+), 6 deletions(-) > create mode 100644 > libstdc++-v3/testsuite/27_io/basic_filebuf/sputn/char/63746.cc > > diff --git a/libstdc++-v3/include/bits/fstream.tcc > b/libstdc++-v3/include/bits/fstream.tcc > index 7ccc887b8..2e9369628 100644 > --- a/libstdc++-v3/include/bits/fstream.tcc > +++ b/libstdc++-v3/include/bits/fstream.tcc > @@ -757,23 +757,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > { > streamsize __ret = 0; > // Optimization in the always_noconv() case, to be generalized in > the > - // future: when __n is sufficiently large we write directly instead > of > - // using the buffer. > + // future: when __n is larger than the available capacity we write > + // directly instead of using the buffer. > const bool __testout = (_M_mode & ios_base::out > || _M_mode & ios_base::app); > if (__check_facet(_M_codecvt).always_noconv() > && __testout && !_M_reading) > { > - // Measurement would reveal the best choice. > - const streamsize __chunk = 1ul << 10; > streamsize __bufavail = this->epptr() - this->pptr(); > > // Don't mistake 'uncommitted' mode buffered with unbuffered. > if (!_M_writing && _M_buf_size > 1) > __bufavail = _M_buf_size - 1; > > - const streamsize __limit = std::min(__chunk, __bufavail); > - if (__n >= __limit) > + if (__n >= __bufavail) > { > const streamsize __buffill = this->pptr() - this->pbase(); > const char* __buf = reinterpret_cast<const > char*>(this->pbase()); > diff --git > a/libstdc++-v3/testsuite/27_io/basic_filebuf/sputn/char/63746.cc > b/libstdc++-v3/testsuite/27_io/basic_filebuf/sputn/char/63746.cc > new file mode 100644 > index 000000000..36448e049 > --- /dev/null > +++ b/libstdc++-v3/testsuite/27_io/basic_filebuf/sputn/char/63746.cc > @@ -0,0 +1,55 @@ > +// Copyright (C) 2013-2022 Free Software Foundation, Inc. > +// > +// This file is part of the GNU ISO C++ Library. This library is free > +// software; you can redistribute it and/or modify it under the > +// terms of the GNU General Public License as published by the > +// Free Software Foundation; either version 3, or (at your option) > +// any later version. > + > +// This library is distributed in the hope that it will be useful, > +// but WITHOUT ANY WARRANTY; without even the implied warranty of > +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > +// GNU General Public License for more details. > + > +// You should have received a copy of the GNU General Public License along > +// with this library; see the file COPYING3. If not see > +// <http://www.gnu.org/licenses/>. > + > +// { dg-require-fileio "" } > + > +#include <fstream> > +#include <testsuite_hooks.h> > + > +class testbuf : public std::filebuf { > +public: > + char_type* pub_pprt() const > + { > + return this->pptr(); > + } > + > + char_type* pub_pbase() const > + { > + return this->pbase(); > + } > +}; > + > +void test01() > +{ > + using namespace std; > + > + // Leave capacity to avoid flush. > + const streamsize chunk_size = BUFSIZ - 1 - 1; > + const char data[chunk_size] = {}; > + > + testbuf a_f; > + VERIFY( a_f.open("tmp_63746_sputn", ios_base::out) ); > + VERIFY( chunk_size == a_f.sputn(data, chunk_size) ); > + VERIFY( (a_f.pub_pprt() - a_f.pub_pbase()) == chunk_size ); > + VERIFY( a_f.close() ); > +} > + > +int main() > +{ > + test01(); > + return 0; > +} > -- > 2.30.2 > >
On Thu, Sep 22, 2022, 17:51 Charles-François Natali <cf.natali@gmail.com> wrote: > > On Mon, Sep 5, 2022, 23:51 Charles-Francois Natali <cf.natali@gmail.com> > wrote: > >> `basic_filebuf::xsputn` would bypass the buffer when passed a chunk of >> size 1024 and above, seemingly as an optimisation. >> >> This can have a significant performance impact if the overhead of a >> `write` syscall is non-negligible, e.g. on a slow disk, on network >> filesystems, or simply during IO contention because instead of flushing >> every `BUFSIZ` (by default), we can flush every 1024 char. >> The impact is even greater with custom larger buffers, e.g. for network >> filesystems, because the code could issue `write` for example 1000X more >> often than necessary with respect to the buffer size. >> It also introduces a significant discontinuity in performance when >> writing chunks of size 1024 and above. >> >> See this reproducer which writes down a fixed number of chunks to a file >> open with `O_SYNC` - to replicate high-latency `write` - for varying >> size of chunks: >> >> ``` >> $ cat test_fstream_flush.cpp >> >> int >> main(int argc, char* argv[]) >> { >> assert(argc == 3); >> >> const auto* path = argv[1]; >> const auto chunk_size = std::stoul(argv[2]); >> >> const auto fd = >> open(path, O_CREAT | O_TRUNC | O_WRONLY | O_SYNC | O_CLOEXEC, 0666); >> assert(fd >= 0); >> >> auto filebuf = __gnu_cxx::stdio_filebuf<char>(fd, std::ios_base::out); >> auto stream = std::ostream(&filebuf); >> >> const auto chunk = std::vector<char>(chunk_size); >> >> for (auto i = 0; i < 1'000; ++i) { >> stream.write(chunk.data(), chunk.size()); >> } >> >> return 0; >> } >> ``` >> >> ``` >> $ g++ -o /tmp/test_fstream_flush test_fstream_flush.cpp -std=c++17 >> $ for i in $(seq 1021 1025); do echo -e "\n$i"; time >> /tmp/test_fstream_flush /tmp/foo $i; done >> >> 1021 >> >> real 0m0.997s >> user 0m0.000s >> sys 0m0.038s >> >> 1022 >> >> real 0m0.939s >> user 0m0.005s >> sys 0m0.032s >> >> 1023 >> >> real 0m0.954s >> user 0m0.005s >> sys 0m0.034s >> >> 1024 >> >> real 0m7.102s >> user 0m0.040s >> sys 0m0.192s >> >> 1025 >> >> real 0m7.204s >> user 0m0.025s >> sys 0m0.209s >> ``` >> >> See the huge drop in performance at the 1024-boundary. >> >> An `strace` confirms that from size 1024 we effectively defeat >> buffering: >> 1023-sized writes >> ``` >> $ strace -P /tmp/foo -e openat,write,writev /tmp/test_fstream_flush >> /tmp/foo 1023 2>&1 | head -n5 >> openat(AT_FDCWD, "/tmp/foo", O_WRONLY|O_CREAT|O_TRUNC|O_SYNC|O_CLOEXEC, >> 0666) = 3 >> writev(3, >> [{iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., >> iov_len=8184}, >> {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., >> iov_len=1023}], 2) = 9207 >> writev(3, >> [{iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., >> iov_len=8184}, >> {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., >> iov_len=1023}], 2) = 9207 >> writev(3, >> [{iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., >> iov_len=8184}, >> {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., >> iov_len=1023}], 2) = 9207 >> writev(3, >> [{iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., >> iov_len=8184}, >> {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., >> iov_len=1023}], 2) = 9207 >> ``` >> >> vs 1024-sized writes >> ``` >> $ strace -P /tmp/foo -e openat,write,writev /tmp/test_fstream_flush >> /tmp/foo 1024 2>&1 | head -n5 >> openat(AT_FDCWD, "/tmp/foo", O_WRONLY|O_CREAT|O_TRUNC|O_SYNC|O_CLOEXEC, >> 0666) = 3 >> writev(3, [{iov_base=NULL, iov_len=0}, >> {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., >> iov_len=1024}], 2) = 1024 >> writev(3, [{iov_base="", iov_len=0}, >> {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., >> iov_len=1024}], 2) = 1024 >> writev(3, [{iov_base="", iov_len=0}, >> {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., >> iov_len=1024}], 2) = 1024 >> writev(3, [{iov_base="", iov_len=0}, >> {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., >> iov_len=1024}], 2) = 1024 >> ``` >> >> Instead, it makes sense to only bypass the buffer if the amount of data >> to be written is larger than the buffer capacity. >> >> Closes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63746 >> --- >> libstdc++-v3/include/bits/fstream.tcc | 9 +-- >> .../27_io/basic_filebuf/sputn/char/63746.cc | 55 +++++++++++++++++++ >> 2 files changed, 58 insertions(+), 6 deletions(-) >> create mode 100644 >> libstdc++-v3/testsuite/27_io/basic_filebuf/sputn/char/63746.cc >> >> diff --git a/libstdc++-v3/include/bits/fstream.tcc >> b/libstdc++-v3/include/bits/fstream.tcc >> index 7ccc887b8..2e9369628 100644 >> --- a/libstdc++-v3/include/bits/fstream.tcc >> +++ b/libstdc++-v3/include/bits/fstream.tcc >> @@ -757,23 +757,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION >> { >> streamsize __ret = 0; >> // Optimization in the always_noconv() case, to be generalized in >> the >> - // future: when __n is sufficiently large we write directly >> instead of >> - // using the buffer. >> + // future: when __n is larger than the available capacity we write >> + // directly instead of using the buffer. >> const bool __testout = (_M_mode & ios_base::out >> || _M_mode & ios_base::app); >> if (__check_facet(_M_codecvt).always_noconv() >> && __testout && !_M_reading) >> { >> - // Measurement would reveal the best choice. >> - const streamsize __chunk = 1ul << 10; >> streamsize __bufavail = this->epptr() - this->pptr(); >> >> // Don't mistake 'uncommitted' mode buffered with unbuffered. >> if (!_M_writing && _M_buf_size > 1) >> __bufavail = _M_buf_size - 1; >> >> - const streamsize __limit = std::min(__chunk, __bufavail); >> - if (__n >= __limit) >> + if (__n >= __bufavail) >> { >> const streamsize __buffill = this->pptr() - this->pbase(); >> const char* __buf = reinterpret_cast<const >> char*>(this->pbase()); >> diff --git >> a/libstdc++-v3/testsuite/27_io/basic_filebuf/sputn/char/63746.cc >> b/libstdc++-v3/testsuite/27_io/basic_filebuf/sputn/char/63746.cc >> new file mode 100644 >> index 000000000..36448e049 >> --- /dev/null >> +++ b/libstdc++-v3/testsuite/27_io/basic_filebuf/sputn/char/63746.cc >> @@ -0,0 +1,55 @@ >> +// Copyright (C) 2013-2022 Free Software Foundation, Inc. >> +// >> +// This file is part of the GNU ISO C++ Library. This library is free >> +// software; you can redistribute it and/or modify it under the >> +// terms of the GNU General Public License as published by the >> +// Free Software Foundation; either version 3, or (at your option) >> +// any later version. >> + >> +// This library is distributed in the hope that it will be useful, >> +// but WITHOUT ANY WARRANTY; without even the implied warranty of >> +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the >> +// GNU General Public License for more details. >> + >> +// You should have received a copy of the GNU General Public License >> along >> +// with this library; see the file COPYING3. If not see >> +// <http://www.gnu.org/licenses/>. >> + >> +// { dg-require-fileio "" } >> + >> +#include <fstream> >> +#include <testsuite_hooks.h> >> + >> +class testbuf : public std::filebuf { >> +public: >> + char_type* pub_pprt() const >> + { >> + return this->pptr(); >> + } >> + >> + char_type* pub_pbase() const >> + { >> + return this->pbase(); >> + } >> +}; >> + >> +void test01() >> +{ >> + using namespace std; >> + >> + // Leave capacity to avoid flush. >> + const streamsize chunk_size = BUFSIZ - 1 - 1; >> + const char data[chunk_size] = {}; >> + >> + testbuf a_f; >> + VERIFY( a_f.open("tmp_63746_sputn", ios_base::out) ); >> + VERIFY( chunk_size == a_f.sputn(data, chunk_size) ); >> + VERIFY( (a_f.pub_pprt() - a_f.pub_pbase()) == chunk_size ); >> + VERIFY( a_f.close() ); >> +} >> + >> +int main() >> +{ >> + test01(); >> + return 0; >> +} >> -- >> 2.30.2 >> >>
Sorry for the lack of review. I've been trying to remember (and find) some previous discussions related to this topic, but haven't managed to find it yet. The patch does look sensible (and is the same as the one attached to PR 63746) so I'll make sure to review it in time for the GCC 13 cut-off. I noticed that you contributed your test case with a FSF copyright assignment header. Do you actually have a copyright assignment for GCC contributions? If not, you would either need to complete that paperwork with the FSF, or alternatively just contribute under the DCO terms instead: https://gcc.gnu.org/dco.html On Thu, 6 Oct 2022 at 14:24, Charles-François Natali via Libstdc++ <libstdc++@gcc.gnu.org> wrote: > > On Thu, Sep 22, 2022, 17:51 Charles-François Natali <cf.natali@gmail.com> > wrote: > > > > > On Mon, Sep 5, 2022, 23:51 Charles-Francois Natali <cf.natali@gmail.com> > > wrote: > > > >> `basic_filebuf::xsputn` would bypass the buffer when passed a chunk of > >> size 1024 and above, seemingly as an optimisation. > >> > >> This can have a significant performance impact if the overhead of a > >> `write` syscall is non-negligible, e.g. on a slow disk, on network > >> filesystems, or simply during IO contention because instead of flushing > >> every `BUFSIZ` (by default), we can flush every 1024 char. > >> The impact is even greater with custom larger buffers, e.g. for network > >> filesystems, because the code could issue `write` for example 1000X more > >> often than necessary with respect to the buffer size. > >> It also introduces a significant discontinuity in performance when > >> writing chunks of size 1024 and above. > >> > >> See this reproducer which writes down a fixed number of chunks to a file > >> open with `O_SYNC` - to replicate high-latency `write` - for varying > >> size of chunks: > >> > >> ``` > >> $ cat test_fstream_flush.cpp > >> > >> int > >> main(int argc, char* argv[]) > >> { > >> assert(argc == 3); > >> > >> const auto* path = argv[1]; > >> const auto chunk_size = std::stoul(argv[2]); > >> > >> const auto fd = > >> open(path, O_CREAT | O_TRUNC | O_WRONLY | O_SYNC | O_CLOEXEC, 0666); > >> assert(fd >= 0); > >> > >> auto filebuf = __gnu_cxx::stdio_filebuf<char>(fd, std::ios_base::out); > >> auto stream = std::ostream(&filebuf); > >> > >> const auto chunk = std::vector<char>(chunk_size); > >> > >> for (auto i = 0; i < 1'000; ++i) { > >> stream.write(chunk.data(), chunk.size()); > >> } > >> > >> return 0; > >> } > >> ``` > >> > >> ``` > >> $ g++ -o /tmp/test_fstream_flush test_fstream_flush.cpp -std=c++17 > >> $ for i in $(seq 1021 1025); do echo -e "\n$i"; time > >> /tmp/test_fstream_flush /tmp/foo $i; done > >> > >> 1021 > >> > >> real 0m0.997s > >> user 0m0.000s > >> sys 0m0.038s > >> > >> 1022 > >> > >> real 0m0.939s > >> user 0m0.005s > >> sys 0m0.032s > >> > >> 1023 > >> > >> real 0m0.954s > >> user 0m0.005s > >> sys 0m0.034s > >> > >> 1024 > >> > >> real 0m7.102s > >> user 0m0.040s > >> sys 0m0.192s > >> > >> 1025 > >> > >> real 0m7.204s > >> user 0m0.025s > >> sys 0m0.209s > >> ``` > >> > >> See the huge drop in performance at the 1024-boundary. > >> > >> An `strace` confirms that from size 1024 we effectively defeat > >> buffering: > >> 1023-sized writes > >> ``` > >> $ strace -P /tmp/foo -e openat,write,writev /tmp/test_fstream_flush > >> /tmp/foo 1023 2>&1 | head -n5 > >> openat(AT_FDCWD, "/tmp/foo", O_WRONLY|O_CREAT|O_TRUNC|O_SYNC|O_CLOEXEC, > >> 0666) = 3 > >> writev(3, > >> [{iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., > >> iov_len=8184}, > >> {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., > >> iov_len=1023}], 2) = 9207 > >> writev(3, > >> [{iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., > >> iov_len=8184}, > >> {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., > >> iov_len=1023}], 2) = 9207 > >> writev(3, > >> [{iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., > >> iov_len=8184}, > >> {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., > >> iov_len=1023}], 2) = 9207 > >> writev(3, > >> [{iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., > >> iov_len=8184}, > >> {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., > >> iov_len=1023}], 2) = 9207 > >> ``` > >> > >> vs 1024-sized writes > >> ``` > >> $ strace -P /tmp/foo -e openat,write,writev /tmp/test_fstream_flush > >> /tmp/foo 1024 2>&1 | head -n5 > >> openat(AT_FDCWD, "/tmp/foo", O_WRONLY|O_CREAT|O_TRUNC|O_SYNC|O_CLOEXEC, > >> 0666) = 3 > >> writev(3, [{iov_base=NULL, iov_len=0}, > >> {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., > >> iov_len=1024}], 2) = 1024 > >> writev(3, [{iov_base="", iov_len=0}, > >> {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., > >> iov_len=1024}], 2) = 1024 > >> writev(3, [{iov_base="", iov_len=0}, > >> {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., > >> iov_len=1024}], 2) = 1024 > >> writev(3, [{iov_base="", iov_len=0}, > >> {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., > >> iov_len=1024}], 2) = 1024 > >> ``` > >> > >> Instead, it makes sense to only bypass the buffer if the amount of data > >> to be written is larger than the buffer capacity. > >> > >> Closes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63746 > >> --- > >> libstdc++-v3/include/bits/fstream.tcc | 9 +-- > >> .../27_io/basic_filebuf/sputn/char/63746.cc | 55 +++++++++++++++++++ > >> 2 files changed, 58 insertions(+), 6 deletions(-) > >> create mode 100644 > >> libstdc++-v3/testsuite/27_io/basic_filebuf/sputn/char/63746.cc > >> > >> diff --git a/libstdc++-v3/include/bits/fstream.tcc > >> b/libstdc++-v3/include/bits/fstream.tcc > >> index 7ccc887b8..2e9369628 100644 > >> --- a/libstdc++-v3/include/bits/fstream.tcc > >> +++ b/libstdc++-v3/include/bits/fstream.tcc > >> @@ -757,23 +757,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > >> { > >> streamsize __ret = 0; > >> // Optimization in the always_noconv() case, to be generalized in > >> the > >> - // future: when __n is sufficiently large we write directly > >> instead of > >> - // using the buffer. > >> + // future: when __n is larger than the available capacity we write > >> + // directly instead of using the buffer. > >> const bool __testout = (_M_mode & ios_base::out > >> || _M_mode & ios_base::app); > >> if (__check_facet(_M_codecvt).always_noconv() > >> && __testout && !_M_reading) > >> { > >> - // Measurement would reveal the best choice. > >> - const streamsize __chunk = 1ul << 10; > >> streamsize __bufavail = this->epptr() - this->pptr(); > >> > >> // Don't mistake 'uncommitted' mode buffered with unbuffered. > >> if (!_M_writing && _M_buf_size > 1) > >> __bufavail = _M_buf_size - 1; > >> > >> - const streamsize __limit = std::min(__chunk, __bufavail); > >> - if (__n >= __limit) > >> + if (__n >= __bufavail) > >> { > >> const streamsize __buffill = this->pptr() - this->pbase(); > >> const char* __buf = reinterpret_cast<const > >> char*>(this->pbase()); > >> diff --git > >> a/libstdc++-v3/testsuite/27_io/basic_filebuf/sputn/char/63746.cc > >> b/libstdc++-v3/testsuite/27_io/basic_filebuf/sputn/char/63746.cc > >> new file mode 100644 > >> index 000000000..36448e049 > >> --- /dev/null > >> +++ b/libstdc++-v3/testsuite/27_io/basic_filebuf/sputn/char/63746.cc > >> @@ -0,0 +1,55 @@ > >> +// Copyright (C) 2013-2022 Free Software Foundation, Inc. > >> +// > >> +// This file is part of the GNU ISO C++ Library. This library is free > >> +// software; you can redistribute it and/or modify it under the > >> +// terms of the GNU General Public License as published by the > >> +// Free Software Foundation; either version 3, or (at your option) > >> +// any later version. > >> + > >> +// This library is distributed in the hope that it will be useful, > >> +// but WITHOUT ANY WARRANTY; without even the implied warranty of > >> +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > >> +// GNU General Public License for more details. > >> + > >> +// You should have received a copy of the GNU General Public License > >> along > >> +// with this library; see the file COPYING3. If not see > >> +// <http://www.gnu.org/licenses/>. > >> + > >> +// { dg-require-fileio "" } > >> + > >> +#include <fstream> > >> +#include <testsuite_hooks.h> > >> + > >> +class testbuf : public std::filebuf { > >> +public: > >> + char_type* pub_pprt() const > >> + { > >> + return this->pptr(); > >> + } > >> + > >> + char_type* pub_pbase() const > >> + { > >> + return this->pbase(); > >> + } > >> +}; > >> + > >> +void test01() > >> +{ > >> + using namespace std; > >> + > >> + // Leave capacity to avoid flush. > >> + const streamsize chunk_size = BUFSIZ - 1 - 1; > >> + const char data[chunk_size] = {}; > >> + > >> + testbuf a_f; > >> + VERIFY( a_f.open("tmp_63746_sputn", ios_base::out) ); > >> + VERIFY( chunk_size == a_f.sputn(data, chunk_size) ); > >> + VERIFY( (a_f.pub_pprt() - a_f.pub_pbase()) == chunk_size ); > >> + VERIFY( a_f.close() ); > >> +} > >> + > >> +int main() > >> +{ > >> + test01(); > >> + return 0; > >> +} > >> -- > >> 2.30.2 > >> > >> >
On Thu, Oct 6, 2022, 14:29 Jonathan Wakely <jwakely@redhat.com> wrote: > Sorry for the lack of review. I've been trying to remember (and find) > some previous discussions related to this topic, but haven't managed > to find it yet. > No worries! > The patch does look sensible (and is the same as the one attached to > PR 63746) so I'll make sure to review it in time for the GCC 13 > cut-off. > > I noticed that you contributed your test case with a FSF copyright > assignment header. Do you actually have a copyright assignment for GCC > contributions? If not, you would either need to complete that > paperwork with the FSF, or alternatively just contribute under the DCO > terms instead: https://gcc.gnu.org/dco.html I actually just copy-pasted the header from another test, would it be simpler if i just removed it? Cheers, Charles > > > On Thu, 6 Oct 2022 at 14:24, Charles-François Natali via Libstdc++ > <libstdc++@gcc.gnu.org> wrote: > > > > On Thu, Sep 22, 2022, 17:51 Charles-François Natali <cf.natali@gmail.com > > > > wrote: > > > > > > > > On Mon, Sep 5, 2022, 23:51 Charles-Francois Natali < > cf.natali@gmail.com> > > > wrote: > > > > > >> `basic_filebuf::xsputn` would bypass the buffer when passed a chunk of > > >> size 1024 and above, seemingly as an optimisation. > > >> > > >> This can have a significant performance impact if the overhead of a > > >> `write` syscall is non-negligible, e.g. on a slow disk, on network > > >> filesystems, or simply during IO contention because instead of > flushing > > >> every `BUFSIZ` (by default), we can flush every 1024 char. > > >> The impact is even greater with custom larger buffers, e.g. for > network > > >> filesystems, because the code could issue `write` for example 1000X > more > > >> often than necessary with respect to the buffer size. > > >> It also introduces a significant discontinuity in performance when > > >> writing chunks of size 1024 and above. > > >> > > >> See this reproducer which writes down a fixed number of chunks to a > file > > >> open with `O_SYNC` - to replicate high-latency `write` - for varying > > >> size of chunks: > > >> > > >> ``` > > >> $ cat test_fstream_flush.cpp > > >> > > >> int > > >> main(int argc, char* argv[]) > > >> { > > >> assert(argc == 3); > > >> > > >> const auto* path = argv[1]; > > >> const auto chunk_size = std::stoul(argv[2]); > > >> > > >> const auto fd = > > >> open(path, O_CREAT | O_TRUNC | O_WRONLY | O_SYNC | O_CLOEXEC, > 0666); > > >> assert(fd >= 0); > > >> > > >> auto filebuf = __gnu_cxx::stdio_filebuf<char>(fd, > std::ios_base::out); > > >> auto stream = std::ostream(&filebuf); > > >> > > >> const auto chunk = std::vector<char>(chunk_size); > > >> > > >> for (auto i = 0; i < 1'000; ++i) { > > >> stream.write(chunk.data(), chunk.size()); > > >> } > > >> > > >> return 0; > > >> } > > >> ``` > > >> > > >> ``` > > >> $ g++ -o /tmp/test_fstream_flush test_fstream_flush.cpp -std=c++17 > > >> $ for i in $(seq 1021 1025); do echo -e "\n$i"; time > > >> /tmp/test_fstream_flush /tmp/foo $i; done > > >> > > >> 1021 > > >> > > >> real 0m0.997s > > >> user 0m0.000s > > >> sys 0m0.038s > > >> > > >> 1022 > > >> > > >> real 0m0.939s > > >> user 0m0.005s > > >> sys 0m0.032s > > >> > > >> 1023 > > >> > > >> real 0m0.954s > > >> user 0m0.005s > > >> sys 0m0.034s > > >> > > >> 1024 > > >> > > >> real 0m7.102s > > >> user 0m0.040s > > >> sys 0m0.192s > > >> > > >> 1025 > > >> > > >> real 0m7.204s > > >> user 0m0.025s > > >> sys 0m0.209s > > >> ``` > > >> > > >> See the huge drop in performance at the 1024-boundary. > > >> > > >> An `strace` confirms that from size 1024 we effectively defeat > > >> buffering: > > >> 1023-sized writes > > >> ``` > > >> $ strace -P /tmp/foo -e openat,write,writev /tmp/test_fstream_flush > > >> /tmp/foo 1023 2>&1 | head -n5 > > >> openat(AT_FDCWD, "/tmp/foo", > O_WRONLY|O_CREAT|O_TRUNC|O_SYNC|O_CLOEXEC, > > >> 0666) = 3 > > >> writev(3, > > >> > [{iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., > > >> iov_len=8184}, > > >> > {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., > > >> iov_len=1023}], 2) = 9207 > > >> writev(3, > > >> > [{iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., > > >> iov_len=8184}, > > >> > {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., > > >> iov_len=1023}], 2) = 9207 > > >> writev(3, > > >> > [{iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., > > >> iov_len=8184}, > > >> > {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., > > >> iov_len=1023}], 2) = 9207 > > >> writev(3, > > >> > [{iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., > > >> iov_len=8184}, > > >> > {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., > > >> iov_len=1023}], 2) = 9207 > > >> ``` > > >> > > >> vs 1024-sized writes > > >> ``` > > >> $ strace -P /tmp/foo -e openat,write,writev /tmp/test_fstream_flush > > >> /tmp/foo 1024 2>&1 | head -n5 > > >> openat(AT_FDCWD, "/tmp/foo", > O_WRONLY|O_CREAT|O_TRUNC|O_SYNC|O_CLOEXEC, > > >> 0666) = 3 > > >> writev(3, [{iov_base=NULL, iov_len=0}, > > >> > {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., > > >> iov_len=1024}], 2) = 1024 > > >> writev(3, [{iov_base="", iov_len=0}, > > >> > {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., > > >> iov_len=1024}], 2) = 1024 > > >> writev(3, [{iov_base="", iov_len=0}, > > >> > {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., > > >> iov_len=1024}], 2) = 1024 > > >> writev(3, [{iov_base="", iov_len=0}, > > >> > {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., > > >> iov_len=1024}], 2) = 1024 > > >> ``` > > >> > > >> Instead, it makes sense to only bypass the buffer if the amount of > data > > >> to be written is larger than the buffer capacity. > > >> > > >> Closes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63746 > > >> --- > > >> libstdc++-v3/include/bits/fstream.tcc | 9 +-- > > >> .../27_io/basic_filebuf/sputn/char/63746.cc | 55 > +++++++++++++++++++ > > >> 2 files changed, 58 insertions(+), 6 deletions(-) > > >> create mode 100644 > > >> libstdc++-v3/testsuite/27_io/basic_filebuf/sputn/char/63746.cc > > >> > > >> diff --git a/libstdc++-v3/include/bits/fstream.tcc > > >> b/libstdc++-v3/include/bits/fstream.tcc > > >> index 7ccc887b8..2e9369628 100644 > > >> --- a/libstdc++-v3/include/bits/fstream.tcc > > >> +++ b/libstdc++-v3/include/bits/fstream.tcc > > >> @@ -757,23 +757,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > > >> { > > >> streamsize __ret = 0; > > >> // Optimization in the always_noconv() case, to be generalized > in > > >> the > > >> - // future: when __n is sufficiently large we write directly > > >> instead of > > >> - // using the buffer. > > >> + // future: when __n is larger than the available capacity we > write > > >> + // directly instead of using the buffer. > > >> const bool __testout = (_M_mode & ios_base::out > > >> || _M_mode & ios_base::app); > > >> if (__check_facet(_M_codecvt).always_noconv() > > >> && __testout && !_M_reading) > > >> { > > >> - // Measurement would reveal the best choice. > > >> - const streamsize __chunk = 1ul << 10; > > >> streamsize __bufavail = this->epptr() - this->pptr(); > > >> > > >> // Don't mistake 'uncommitted' mode buffered with > unbuffered. > > >> if (!_M_writing && _M_buf_size > 1) > > >> __bufavail = _M_buf_size - 1; > > >> > > >> - const streamsize __limit = std::min(__chunk, __bufavail); > > >> - if (__n >= __limit) > > >> + if (__n >= __bufavail) > > >> { > > >> const streamsize __buffill = this->pptr() - > this->pbase(); > > >> const char* __buf = reinterpret_cast<const > > >> char*>(this->pbase()); > > >> diff --git > > >> a/libstdc++-v3/testsuite/27_io/basic_filebuf/sputn/char/63746.cc > > >> b/libstdc++-v3/testsuite/27_io/basic_filebuf/sputn/char/63746.cc > > >> new file mode 100644 > > >> index 000000000..36448e049 > > >> --- /dev/null > > >> +++ b/libstdc++-v3/testsuite/27_io/basic_filebuf/sputn/char/63746.cc > > >> @@ -0,0 +1,55 @@ > > >> +// Copyright (C) 2013-2022 Free Software Foundation, Inc. > > >> +// > > >> +// This file is part of the GNU ISO C++ Library. This library is > free > > >> +// software; you can redistribute it and/or modify it under the > > >> +// terms of the GNU General Public License as published by the > > >> +// Free Software Foundation; either version 3, or (at your option) > > >> +// any later version. > > >> + > > >> +// This library is distributed in the hope that it will be useful, > > >> +// but WITHOUT ANY WARRANTY; without even the implied warranty of > > >> +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > > >> +// GNU General Public License for more details. > > >> + > > >> +// You should have received a copy of the GNU General Public License > > >> along > > >> +// with this library; see the file COPYING3. If not see > > >> +// <http://www.gnu.org/licenses/>. > > >> + > > >> +// { dg-require-fileio "" } > > >> + > > >> +#include <fstream> > > >> +#include <testsuite_hooks.h> > > >> + > > >> +class testbuf : public std::filebuf { > > >> +public: > > >> + char_type* pub_pprt() const > > >> + { > > >> + return this->pptr(); > > >> + } > > >> + > > >> + char_type* pub_pbase() const > > >> + { > > >> + return this->pbase(); > > >> + } > > >> +}; > > >> + > > >> +void test01() > > >> +{ > > >> + using namespace std; > > >> + > > >> + // Leave capacity to avoid flush. > > >> + const streamsize chunk_size = BUFSIZ - 1 - 1; > > >> + const char data[chunk_size] = {}; > > >> + > > >> + testbuf a_f; > > >> + VERIFY( a_f.open("tmp_63746_sputn", ios_base::out) ); > > >> + VERIFY( chunk_size == a_f.sputn(data, chunk_size) ); > > >> + VERIFY( (a_f.pub_pprt() - a_f.pub_pbase()) == chunk_size ); > > >> + VERIFY( a_f.close() ); > > >> +} > > >> + > > >> +int main() > > >> +{ > > >> + test01(); > > >> + return 0; > > >> +} > > >> -- > > >> 2.30.2 > > >> > > >> > > > >
On Thu, 6 Oct 2022 at 17:33, Charles-François Natali <cf.natali@gmail.com> wrote: > > On Thu, Oct 6, 2022, 14:29 Jonathan Wakely <jwakely@redhat.com> wrote: >> >> Sorry for the lack of review. I've been trying to remember (and find) >> some previous discussions related to this topic, but haven't managed >> to find it yet. > > > No worries! > > >> >> The patch does look sensible (and is the same as the one attached to >> PR 63746) so I'll make sure to review it in time for the GCC 13 >> cut-off. >> >> I noticed that you contributed your test case with a FSF copyright >> assignment header. Do you actually have a copyright assignment for GCC >> contributions? If not, you would either need to complete that >> paperwork with the FSF, or alternatively just contribute under the DCO >> terms instead: https://gcc.gnu.org/dco.html > > > > I actually just copy-pasted the header from another test, would it be simpler if i just removed it? Yes, that's probably the simplest solution, and then add a Signed-off-by: tag in your patch email, to state you're contributing it under the DCO terms (assuming of course that you are willing and able to certify those terms).
On Thu, Oct 6, 2022, 17:56 Jonathan Wakely <jwakely@redhat.com> wrote: > > I actually just copy-pasted the header from another test, would it be > simpler if i just removed it? > > > Yes, that's probably the simplest solution, and then add a > Signed-off-by: tag in your patch email, to state you're contributing > it under the DCO terms (assuming of course that you are willing and > able to certify those terms). > I submitted another version of the patch without the header and with the Signed-Off tag, see other thread. Cheers, Charles
diff --git a/libstdc++-v3/include/bits/fstream.tcc b/libstdc++-v3/include/bits/fstream.tcc index 7ccc887b8..2e9369628 100644 --- a/libstdc++-v3/include/bits/fstream.tcc +++ b/libstdc++-v3/include/bits/fstream.tcc @@ -757,23 +757,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION { streamsize __ret = 0; // Optimization in the always_noconv() case, to be generalized in the - // future: when __n is sufficiently large we write directly instead of - // using the buffer. + // future: when __n is larger than the available capacity we write + // directly instead of using the buffer. const bool __testout = (_M_mode & ios_base::out || _M_mode & ios_base::app); if (__check_facet(_M_codecvt).always_noconv() && __testout && !_M_reading) { - // Measurement would reveal the best choice. - const streamsize __chunk = 1ul << 10; streamsize __bufavail = this->epptr() - this->pptr(); // Don't mistake 'uncommitted' mode buffered with unbuffered. if (!_M_writing && _M_buf_size > 1) __bufavail = _M_buf_size - 1; - const streamsize __limit = std::min(__chunk, __bufavail); - if (__n >= __limit) + if (__n >= __bufavail) { const streamsize __buffill = this->pptr() - this->pbase(); const char* __buf = reinterpret_cast<const char*>(this->pbase()); diff --git a/libstdc++-v3/testsuite/27_io/basic_filebuf/sputn/char/63746.cc b/libstdc++-v3/testsuite/27_io/basic_filebuf/sputn/char/63746.cc new file mode 100644 index 000000000..36448e049 --- /dev/null +++ b/libstdc++-v3/testsuite/27_io/basic_filebuf/sputn/char/63746.cc @@ -0,0 +1,55 @@ +// Copyright (C) 2013-2022 Free Software Foundation, Inc. +// +// This file is part of the GNU ISO C++ Library. This library is free +// software; you can redistribute it and/or modify it under the +// terms of the GNU General Public License as published by the +// Free Software Foundation; either version 3, or (at your option) +// any later version. + +// This library is distributed in the hope that it will be useful, +// but WITHOUT ANY WARRANTY; without even the implied warranty of +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +// GNU General Public License for more details. + +// You should have received a copy of the GNU General Public License along +// with this library; see the file COPYING3. If not see +// <http://www.gnu.org/licenses/>. + +// { dg-require-fileio "" } + +#include <fstream> +#include <testsuite_hooks.h> + +class testbuf : public std::filebuf { +public: + char_type* pub_pprt() const + { + return this->pptr(); + } + + char_type* pub_pbase() const + { + return this->pbase(); + } +}; + +void test01() +{ + using namespace std; + + // Leave capacity to avoid flush. + const streamsize chunk_size = BUFSIZ - 1 - 1; + const char data[chunk_size] = {}; + + testbuf a_f; + VERIFY( a_f.open("tmp_63746_sputn", ios_base::out) ); + VERIFY( chunk_size == a_f.sputn(data, chunk_size) ); + VERIFY( (a_f.pub_pprt() - a_f.pub_pbase()) == chunk_size ); + VERIFY( a_f.close() ); +} + +int main() +{ + test01(); + return 0; +}