Message ID: 20230503090708.2524310-1-nmi@metaspace.dk
From: Andreas Hindborg <nmi@metaspace.dk>
To: Jens Axboe <axboe@kernel.dk>, Christoph Hellwig <hch@lst.de>, Keith Busch <kbusch@kernel.org>, Damien Le Moal <Damien.LeMoal@wdc.com>, Hannes Reinecke <hare@suse.de>, lsf-pc@lists.linux-foundation.org, rust-for-linux@vger.kernel.org, linux-block@vger.kernel.org
Cc: Andreas Hindborg <a.hindborg@samsung.com>, Matthew Wilcox <willy@infradead.org>, Miguel Ojeda <ojeda@kernel.org>, Alex Gaynor <alex.gaynor@gmail.com>, Wedson Almeida Filho <wedsonaf@gmail.com>, Boqun Feng <boqun.feng@gmail.com>, Gary Guo <gary@garyguo.net>, Björn Roy Baron <bjorn3_gh@protonmail.com>, Benno Lossin <benno.lossin@proton.me>, Andreas Hindborg <nmi@metaspace.dk>, linux-kernel@vger.kernel.org (open list), gost.dev@samsung.com
Subject: [RFC PATCH 00/11] Rust null block driver
Date: Wed, 3 May 2023 11:06:57 +0200
Series: Rust null block driver
Message
Andreas Hindborg
May 3, 2023, 9:06 a.m. UTC
From: Andreas Hindborg <a.hindborg@samsung.com>
Hi All!
This is an early preview of a null block driver written in Rust. It is a
follow-up to my LSF topic proposal [1]. I am sending this series with the
intention of collecting feedback and comments at LSF/MM/BPF.
The patches apply on top of the `rust-next` tree (rust-6.4
ea76e08f4d901a450619831a255e9e0a4c0ed162) [4].
A null block driver is a good opportunity to evaluate Rust bindings for the
block layer. It is a small and simple driver and thus should be simple to reason
about. Further, the null block driver is not usually deployed in production
environments. Thus, it should be fairly straightforward to review, and any
potential issues are not going to bring down any production workloads.
Being small and simple, the null block driver is a good place to introduce the
Linux kernel storage community to Rust. This will help prepare the community for
future Rust projects and facilitate a better maintenance process for these
projects.
The statistics presented in my previous message [1] show that the C null block
driver has had a significant amount of memory safety related problems in the
past. 41% of fixes merged for the C null block driver are fixes for memory
safety issues. This makes the null block driver a good candidate for rewriting
in Rust.
Patches in this series prefixed with a capitalized RUST contain code taken from
the R4L `rust` tree [2]. This code is not yet in mainline Linux and has not yet
been submitted as patches for inclusion. I am not the original author of these
patches, apart from the small changes required to include them in this series.
The driver performs similarly to the C driver for the implemented features; see
the table below [5]. In this table each cell shows the relative performance of
the Rust driver to the C driver, with the throughput of the C driver in
parentheses: `rel_read rel_write (read_miops write_miops)`. For example, the
Rust driver performs 4.74 percent better than the C driver for fio randread
with 2 jobs at 16 KiB block size.
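For readers who want the arithmetic spelled out, the small program below is
illustrative only (it is not part of the series and simply assumes the usual
definition of relative performance); it reproduces the 4.74 percent example
from the C driver throughput given in the table:

fn relative_percent(rust_miops: f64, c_miops: f64) -> f64 {
    // Positive values mean the Rust driver completed more IOPS than the C driver.
    100.0 * (rust_miops - c_miops) / c_miops
}

fn main() {
    // 2 jobs, 16 KiB randread: the C driver did 0.64 MIOPS, so +4.74 percent
    // corresponds to the Rust driver doing roughly 0.64 * 1.0474 ≈ 0.67 MIOPS.
    let rust_miops = 0.64 * 1.0474;
    println!("{:.2} %", relative_percent(rust_miops, 0.64)); // prints "4.74 %"
}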
Over the 432 benchmark configurations, the relative performance of the Rust
driver to the C driver (in terms of IOPS) ranges from -11.8 to 6.8 percent for
reads, with the Rust driver 0.2 percent better on average. For writes the range
is -4.5 to 16.8 percent, with the Rust driver 0.9 percent better on average.
For each measurement the drivers are loaded and a drive is configured with
memory backing and a size of 4 GiB. C null_blk is configured to match the
implemented modes of the Rust driver: `blocksize` is set to 4 KiB,
`completion_nsec` to 0, `irqmode` to 0 (IRQ_NONE), `queue_mode` to 2 (MQ),
`hw_queue_depth` to 256 and `memory_backed` to 1. For both drivers, the queue
scheduler is set to `none`. The measurements are made using 30-second runs of
`fio` with the `PSYNC` IO engine, with workers pinned to separate CPU cores,
inside a virtual machine (qemu/kvm) on an Intel Alder Lake workstation
(i5-12600).
Given the relatively large spread in these numbers, it is possible that there is
quite a bit of noise in the results. If anybody is able to dedicate some CPU
time to run benchmarks in a more controlled and standardized environment, I
would appreciate it.
The feature list of the driver is relatively short for now:
- `blk-mq` support
- Direct completion of IO
- Read and write requests
- Optional memory backing
Features available in the C null_blk driver that are currently not implemented
in this work:
- Bio-based submission
- NUMA support
- Block size configuration
- Multiple devices
- Dynamic device creation/destruction
- Soft-IRQ and timer mode
- Queue depth configuration
- Queue count configuration
- Discard operation support
- Cache emulation
- Bandwidth throttling
- Per node hctx
- IO scheduler configuration
- Blocking submission mode
- Shared tags configuration (for >1 device)
- Zoned storage support
- Bad block simulation
- Poll queues
The driver is implemented entirely in safe Rust, with all unsafe code fully
contained in wrappers for C APIs.
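To illustrate what "unsafe code contained in wrappers" means in practice, here
is a minimal, self-contained userspace sketch of the pattern (it is not code
from this series and uses no kernel APIs): the raw allocation and pointer
handling are hidden behind a small type with a safe interface, so the code
built on top of it stays in safe Rust.

use std::alloc::{alloc_zeroed, dealloc, Layout};

struct Buffer {
    ptr: *mut u8,
    layout: Layout,
}

impl Buffer {
    fn new(size: usize) -> Option<Self> {
        let layout = Layout::array::<u8>(size.max(1)).ok()?;
        // SAFETY: `layout` has non-zero size; a null return signals allocation
        // failure, which is surfaced as `None` instead of a dangling pointer.
        let ptr = unsafe { alloc_zeroed(layout) };
        if ptr.is_null() { None } else { Some(Self { ptr, layout }) }
    }

    fn as_mut_slice(&mut self) -> &mut [u8] {
        // SAFETY: `ptr` is valid for `layout.size()` bytes and uniquely owned
        // by `self`, so a mutable slice tied to `&mut self` is sound.
        unsafe { std::slice::from_raw_parts_mut(self.ptr, self.layout.size()) }
    }
}

impl Drop for Buffer {
    fn drop(&mut self) {
        // SAFETY: `ptr` was allocated in `new` with this exact `layout`.
        unsafe { dealloc(self.ptr, self.layout) }
    }
}

fn main() {
    let mut buf = Buffer::new(512).expect("allocation failed");
    buf.as_mut_slice()[0] = 0xaa; // callers never touch `unsafe` themselves
}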
Patches 07/11 and 08/11, which enable the equivalent of the C
`spin_lock_irqsave()`, did appear on the Rust list [6,3] but have not yet been
applied to the `rust-next` tree [4]. I am not the author of these patches, but I
include them here for convenience. Please defer any discussion of these patches
to their original posts.
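For readers unfamiliar with the guard-based locking pattern these patches build
on, here is a small userspace analogue using plain `std::sync::Mutex` (not the
kernel API): the lock is released automatically when the guard goes out of
scope. The patches above add an analogous `lock_irqsave()` entry point that
additionally saves and restores the interrupt state, mirroring the C
`spin_lock_irqsave()`/`spin_unlock_irqrestore()` pair; see the original posts
for the actual API.

use std::sync::Mutex;

fn increment(counter: &Mutex<u64>) {
    // Locking returns a guard; the mutex is unlocked when `guard` is dropped,
    // so there is no explicit unlock call to forget on an error path.
    let mut guard = counter.lock().expect("poisoned lock");
    *guard += 1;
}

fn main() {
    let counter = Mutex::new(0);
    increment(&counter);
    println!("{}", counter.lock().unwrap()); // prints "1"
}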
The null block driver is provided in patch 10/11. Patches 03/11 and 04/11
introduce Rust abstractions for the block C APIs required by the driver.
[1] https://lore.kernel.org/all/87y1ofj5tt.fsf@metaspace.dk/
[2] https://github.com/Rust-for-Linux/linux/tree/rust
[3] https://lore.kernel.org/all/20230411054543.21278-7-wedsonaf@gmail.com/
[4] https://github.com/Rust-for-Linux/linux/tree/rust-next
[6] https://lore.kernel.org/all/20230411054543.21278-6-wedsonaf@gmail.com/
[5]:
+---------+-----------+-------------------------------+-----------------------------+-------------------------------+
| jobs/bs | workload | 1 | 2 | 3 |
+---------+-----------+-------------------------------+-----------------------------+-------------------------------+
| 4k | randread | 1.54 0.00 (0.7,0.0) | 0.35 0.00 (1.3,0.0) | -0.52 0.00 (1.3,0.0) |
| 8k | randread | 3.30 0.00 (0.57,0.0) | 6.79 0.00 (0.92,0.0) | 0.24 0.00 (0.97,0.0) |
| 16k | randread | 0.89 0.00 (0.42,0.0) | 4.74 0.00 (0.64,0.0) | 3.98 0.00 (0.65,0.0) |
| 32k | randread | 2.04 0.00 (0.27,0.0) | 3.26 0.00 (0.37,0.0) | 1.41 0.00 (0.37,0.0) |
| 64k | randread | 1.35 0.00 (0.16,0.0) | 0.71 0.00 (0.21,0.0) | 0.66 0.00 (0.22,0.0) |
| 128k | randread | 0.59 0.00 (0.09,0.0) | 0.13 0.00 (0.1,0.0) | -6.47 0.00 (0.11,0.0) |
| 256k | randread | 0.13 0.00 (0.048,0.0) | 1.00 0.00 (0.055,0.0) | 0.03 0.00 (0.059,0.0) |
| 512k | randread | 1.16 0.00 (0.023,0.0) | -1.82 0.00 (0.027,0.0) | -0.90 0.00 (0.028,0.0) |
| 1024k | randread | 0.14 0.00 (0.011,0.0) | 0.58 0.00 (0.012,0.0) | -1.14 0.00 (0.013,0.0) |
| 2048k | randread | 2.02 0.00 (0.005,0.0) | 0.13 0.00 (0.0062,0.0) | -0.37 0.00 (0.0062,0.0) |
| 4096k | randread | 1.21 0.00 (0.0026,0.0) | -5.68 0.00 (0.0031,0.0) | -2.39 0.00 (0.0029,0.0) |
| 8192k | randread | -1.02 0.00 (0.0013,0.0) | 0.66 0.00 (0.0013,0.0) | -1.72 0.00 (0.0013,0.0) |
| 4k | randrw | 2.82 2.82 (0.25,0.25) | 2.94 2.94 (0.3,0.3) | 3.19 3.20 (0.34,0.34) |
| 8k | randrw | 2.80 2.82 (0.21,0.21) | 2.32 2.32 (0.25,0.25) | 1.95 1.96 (0.28,0.28) |
| 16k | randrw | 4.54 4.52 (0.16,0.16) | 2.29 2.29 (0.2,0.2) | 2.27 2.30 (0.21,0.21) |
| 32k | randrw | 3.51 3.51 (0.11,0.11) | 2.43 2.40 (0.13,0.13) | 1.38 1.38 (0.14,0.14) |
| 64k | randrw | 2.40 2.42 (0.069,0.069) | 1.98 1.99 (0.08,0.08) | 0.51 0.51 (0.082,0.082) |
| 128k | randrw | 1.50 1.49 (0.039,0.039) | 1.28 1.27 (0.042,0.042) | 0.83 0.84 (0.043,0.043) |
| 256k | randrw | 2.02 2.06 (0.021,0.021) | 0.92 0.91 (0.022,0.022) | 0.70 0.75 (0.022,0.022) |
| 512k | randrw | 0.82 0.82 (0.011,0.011) | 0.69 0.70 (0.011,0.011) | 0.97 0.98 (0.011,0.011) |
| 1024k | randrw | 3.78 3.84 (0.0046,0.0046) | 0.98 0.98 (0.0053,0.0053) | 1.30 1.29 (0.0054,0.0054) |
| 2048k | randrw | 1.60 1.54 (0.0023,0.0023) | 1.76 1.69 (0.0026,0.0026) | 1.23 0.92 (0.0026,0.0026) |
| 4096k | randrw | 1.18 1.10 (0.0011,0.0012) | 1.32 1.20 (0.0014,0.0014) | 1.49 1.41 (0.0013,0.0013) |
| 8192k | randrw | -2.01 -1.89 (0.00057,0.00057) | 1.20 1.14 (0.00061,0.00061) | -0.74 -0.61 (0.0006,0.00062) |
| 4k | randwrite | 0.00 2.48 (0.0,0.41) | 0.00 3.61 (0.0,0.39) | 0.00 3.93 (0.0,0.44) |
| 8k | randwrite | 0.00 3.46 (0.0,0.35) | 0.00 1.90 (0.0,0.34) | 0.00 2.20 (0.0,0.37) |
| 16k | randwrite | 0.00 3.01 (0.0,0.28) | 0.00 2.86 (0.0,0.27) | 0.00 2.25 (0.0,0.29) |
| 32k | randwrite | 0.00 3.02 (0.0,0.19) | 0.00 2.68 (0.0,0.19) | 0.00 2.20 (0.0,0.21) |
| 64k | randwrite | 0.00 3.39 (0.0,0.12) | 0.00 3.75 (0.0,0.12) | 0.00 -0.79 (0.0,0.14) |
| 128k | randwrite | 0.00 3.81 (0.0,0.069) | 0.00 1.10 (0.0,0.072) | 0.00 5.48 (0.0,0.078) |
| 256k | randwrite | 0.00 4.28 (0.0,0.038) | 0.00 1.27 (0.0,0.037) | 0.00 -0.10 (0.0,0.041) |
| 512k | randwrite | 0.00 2.71 (0.0,0.019) | 0.00 0.46 (0.0,0.017) | 0.00 2.96 (0.0,0.019) |
| 1024k | randwrite | 0.00 2.67 (0.0,0.0082) | 0.00 2.54 (0.0,0.0087) | 0.00 0.20 (0.0,0.0093) |
| 2048k | randwrite | 0.00 2.40 (0.0,0.0041) | 0.00 1.90 (0.0,0.0041) | 0.00 3.44 (0.0,0.0043) |
| 4096k | randwrite | 0.00 3.38 (0.0,0.002) | 0.00 1.69 (0.0,0.0025) | 0.00 6.00 (0.0,0.0025) |
| 8192k | randwrite | 0.00 3.09 (0.0,0.00098) | 0.00 1.10 (0.0,0.0012) | 0.00 0.45 (0.0,0.0012) |
| 4k | read | 1.26 0.00 (1.0,0.0) | 1.65 0.00 (2.0,0.0) | -9.94 0.00 (2.8,0.0) |
| 8k | read | 1.86 0.00 (0.78,0.0) | 4.60 0.00 (1.5,0.0) | 0.47 0.00 (1.7,0.0) |
| 16k | read | 1.13 0.00 (0.54,0.0) | 4.00 0.00 (0.95,0.0) | -10.28 0.00 (0.97,0.0) |
| 32k | read | 0.64 0.00 (0.32,0.0) | 5.54 0.00 (0.46,0.0) | -0.67 0.00 (0.47,0.0) |
| 64k | read | 0.29 0.00 (0.18,0.0) | -0.08 0.00 (0.25,0.0) | -0.82 0.00 (0.25,0.0) |
| 128k | read | 0.11 0.00 (0.095,0.0) | 0.12 0.00 (0.11,0.0) | -4.99 0.00 (0.12,0.0) |
| 256k | read | 1.66 0.00 (0.05,0.0) | -0.41 0.00 (0.058,0.0) | -0.30 0.00 (0.061,0.0) |
| 512k | read | 1.06 0.00 (0.024,0.0) | -1.77 0.00 (0.029,0.0) | 0.81 0.00 (0.029,0.0) |
| 1024k | read | 0.41 0.00 (0.011,0.0) | -2.28 0.00 (0.013,0.0) | -0.33 0.00 (0.013,0.0) |
| 2048k | read | -1.18 0.00 (0.0053,0.0) | 1.41 0.00 (0.0063,0.0) | -0.86 0.00 (0.0063,0.0) |
| 4096k | read | -0.54 0.00 (0.0027,0.0) | -11.77 0.00 (0.0031,0.0) | 0.02 0.00 (0.003,0.0) |
| 8192k | read | -1.60 0.00 (0.0013,0.0) | -4.25 0.00 (0.0013,0.0) | -1.64 0.00 (0.0013,0.0) |
| 4k | readwrite | 1.01 1.01 (0.34,0.34) | 1.00 1.00 (0.42,0.42) | 1.30 1.29 (0.45,0.44) |
| 8k | readwrite | 1.54 1.54 (0.28,0.28) | 1.65 1.66 (0.32,0.32) | 1.18 1.18 (0.36,0.36) |
| 16k | readwrite | 1.07 1.08 (0.2,0.2) | 2.82 2.81 (0.24,0.24) | 0.40 0.39 (0.26,0.26) |
| 32k | readwrite | 0.56 0.56 (0.13,0.13) | -1.31 -1.30 (0.16,0.16) | 0.09 0.09 (0.16,0.16) |
| 64k | readwrite | 1.82 1.83 (0.077,0.077) | -0.96 -0.96 (0.091,0.091) | -0.99 -0.97 (0.092,0.092) |
| 128k | readwrite | 2.09 2.10 (0.041,0.041) | 1.07 1.06 (0.045,0.045) | 0.28 0.29 (0.045,0.045) |
| 256k | readwrite | 2.06 2.05 (0.022,0.022) | 1.84 1.85 (0.023,0.023) | 0.66 0.59 (0.023,0.023) |
| 512k | readwrite | 5.90 5.84 (0.01,0.01) | 1.20 1.17 (0.011,0.011) | 1.18 1.21 (0.011,0.011) |
| 1024k | readwrite | 1.94 1.89 (0.0047,0.0047) | 2.43 2.48 (0.0053,0.0053) | 2.18 2.21 (0.0054,0.0054) |
| 2048k | readwrite | -0.47 -0.47 (0.0023,0.0023) | 2.45 2.38 (0.0026,0.0026) | 1.76 1.78 (0.0026,0.0026) |
| 4096k | readwrite | 1.88 1.77 (0.0011,0.0012) | 1.37 1.17 (0.0014,0.0014) | 1.80 1.90 (0.0013,0.0013) |
| 8192k | readwrite | 1.27 1.42 (0.00057,0.00057) | 0.86 0.73 (0.00061,0.00062) | -0.99 -1.28 (0.00061,0.00062) |
| 4k | write | 0.00 1.53 (0.0,0.51) | 0.00 0.35 (0.0,0.51) | 0.00 0.92 (0.0,0.52) |
| 8k | write | 0.00 1.04 (0.0,0.42) | 0.00 1.01 (0.0,0.4) | 0.00 2.90 (0.0,0.44) |
| 16k | write | 0.00 0.86 (0.0,0.32) | 0.00 1.89 (0.0,0.35) | 0.00 -1.38 (0.0,0.36) |
| 32k | write | 0.00 0.47 (0.0,0.21) | 0.00 -0.52 (0.0,0.26) | 0.00 -1.40 (0.0,0.24) |
| 64k | write | 0.00 0.30 (0.0,0.13) | 0.00 13.34 (0.0,0.14) | 0.00 2.01 (0.0,0.15) |
| 128k | write | 0.00 2.24 (0.0,0.073) | 0.00 16.77 (0.0,0.076) | 0.00 -0.36 (0.0,0.085) |
| 256k | write | 0.00 3.49 (0.0,0.039) | 0.00 9.76 (0.0,0.039) | 0.00 -4.52 (0.0,0.043) |
| 512k | write | 0.00 4.08 (0.0,0.02) | 0.00 1.03 (0.0,0.017) | 0.00 1.05 (0.0,0.019) |
| 1024k | write | 0.00 2.72 (0.0,0.0083) | 0.00 2.89 (0.0,0.0086) | 0.00 2.98 (0.0,0.0095) |
| 2048k | write | 0.00 2.14 (0.0,0.0041) | 0.00 3.12 (0.0,0.0041) | 0.00 1.84 (0.0,0.0044) |
| 4096k | write | 0.00 1.76 (0.0,0.002) | 0.00 1.80 (0.0,0.0026) | 0.00 6.01 (0.0,0.0025) |
| 8192k | write | 0.00 1.41 (0.0,0.00099) | 0.00 0.98 (0.0,0.0012) | 0.00 0.13 (0.0,0.0012) |
+---------+-----------+-------------------------------+-----------------------------+-------------------------------+
+---------+-----------+-------------------------------+------------------------------+-------------------------------+
| jobs/bs | workload | 4 | 5 | 6 |
+---------+-----------+-------------------------------+------------------------------+-------------------------------+
| 4k | randread | 1.66 0.00 (1.3,0.0) | -0.81 0.00 (1.3,0.0) | -0.50 0.00 (1.2,0.0) |
| 8k | randread | 2.52 0.00 (0.98,0.0) | 2.54 0.00 (0.95,0.0) | 2.52 0.00 (0.92,0.0) |
| 16k | randread | 0.68 0.00 (0.65,0.0) | 3.43 0.00 (0.64,0.0) | 5.84 0.00 (0.63,0.0) |
| 32k | randread | 1.88 0.00 (0.38,0.0) | 0.46 0.00 (0.38,0.0) | 2.33 0.00 (0.37,0.0) |
| 64k | randread | 1.28 0.00 (0.21,0.0) | 0.64 0.00 (0.21,0.0) | -0.26 0.00 (0.21,0.0) |
| 128k | randread | -0.05 0.00 (0.11,0.0) | 0.10 0.00 (0.11,0.0) | 0.09 0.00 (0.11,0.0) |
| 256k | randread | -0.01 0.00 (0.059,0.0) | -1.01 0.00 (0.059,0.0) | -0.39 0.00 (0.059,0.0) |
| 512k | randread | -1.17 0.00 (0.029,0.0) | 0.39 0.00 (0.028,0.0) | -0.33 0.00 (0.028,0.0) |
| 1024k | randread | -0.20 0.00 (0.013,0.0) | -0.85 0.00 (0.013,0.0) | -0.57 0.00 (0.013,0.0) |
| 2048k | randread | -0.07 0.00 (0.0062,0.0) | -0.26 0.00 (0.0061,0.0) | -0.15 0.00 (0.006,0.0) |
| 4096k | randread | -0.25 0.00 (0.0027,0.0) | -2.70 0.00 (0.0027,0.0) | -2.35 0.00 (0.0026,0.0) |
| 8192k | randread | -2.94 0.00 (0.0013,0.0) | -3.01 0.00 (0.0012,0.0) | -2.09 0.00 (0.0012,0.0) |
| 4k | randrw | 2.83 2.84 (0.36,0.36) | 2.24 2.23 (0.37,0.37) | 3.49 3.50 (0.38,0.38) |
| 8k | randrw | 1.31 1.32 (0.29,0.29) | 2.01 2.01 (0.3,0.3) | 2.09 2.09 (0.3,0.3) |
| 16k | randrw | 1.96 1.95 (0.22,0.22) | 1.61 1.61 (0.22,0.22) | 1.99 2.00 (0.22,0.22) |
| 32k | randrw | 1.01 1.03 (0.14,0.14) | 0.58 0.59 (0.14,0.14) | -0.04 -0.05 (0.14,0.14) |
| 64k | randrw | -0.19 -0.18 (0.081,0.081) | -0.72 -0.72 (0.081,0.081) | -0.50 -0.50 (0.081,0.081) |
| 128k | randrw | 0.42 0.42 (0.043,0.043) | -0.33 -0.36 (0.043,0.043) | -0.62 -0.59 (0.043,0.043) |
| 256k | randrw | 0.38 0.41 (0.022,0.022) | -0.35 -0.35 (0.022,0.022) | 0.37 0.31 (0.022,0.022) |
| 512k | randrw | 0.40 0.41 (0.011,0.011) | -0.29 -0.26 (0.011,0.011) | 2.76 2.48 (0.01,0.01) |
| 1024k | randrw | 1.16 1.01 (0.0052,0.0052) | 0.74 0.72 (0.0051,0.0051) | 1.72 1.77 (0.005,0.005) |
| 2048k | randrw | 1.09 0.96 (0.0026,0.0026) | 1.19 1.57 (0.0026,0.0026) | 1.49 1.21 (0.0025,0.0025) |
| 4096k | randrw | -0.13 0.06 (0.0013,0.0013) | 0.69 0.77 (0.0013,0.0013) | -0.20 -0.13 (0.0012,0.0012) |
| 8192k | randrw | -0.71 -1.27 (0.00059,0.00061) | -1.78 -2.02 (0.00059,0.0006) | -1.60 -1.25 (0.00058,0.00059) |
| 4k | randwrite | 0.00 2.17 (0.0,0.44) | 0.00 4.08 (0.0,0.45) | 0.00 5.47 (0.0,0.46) |
| 8k | randwrite | 0.00 0.98 (0.0,0.38) | 0.00 2.39 (0.0,0.39) | 0.00 2.30 (0.0,0.4) |
| 16k | randwrite | 0.00 2.02 (0.0,0.3) | 0.00 2.63 (0.0,0.3) | 0.00 3.17 (0.0,0.3) |
| 32k | randwrite | 0.00 1.82 (0.0,0.21) | 0.00 1.12 (0.0,0.22) | 0.00 1.40 (0.0,0.21) |
| 64k | randwrite | 0.00 1.41 (0.0,0.14) | 0.00 1.92 (0.0,0.13) | 0.00 1.76 (0.0,0.13) |
| 128k | randwrite | 0.00 2.32 (0.0,0.075) | 0.00 4.61 (0.0,0.074) | 0.00 2.31 (0.0,0.074) |
| 256k | randwrite | 0.00 -0.77 (0.0,0.038) | 0.00 2.61 (0.0,0.036) | 0.00 2.48 (0.0,0.036) |
| 512k | randwrite | 0.00 1.83 (0.0,0.019) | 0.00 1.68 (0.0,0.018) | 0.00 2.80 (0.0,0.018) |
| 1024k | randwrite | 0.00 3.32 (0.0,0.0086) | 0.00 1.20 (0.0,0.0087) | 0.00 0.92 (0.0,0.0081) |
| 2048k | randwrite | 0.00 0.92 (0.0,0.0043) | 0.00 2.95 (0.0,0.0042) | 0.00 3.15 (0.0,0.0042) |
| 4096k | randwrite | 0.00 3.22 (0.0,0.0025) | 0.00 0.50 (0.0,0.0024) | 0.00 0.40 (0.0,0.0024) |
| 8192k | randwrite | 0.00 -0.57 (0.0,0.0012) | 0.00 -0.41 (0.0,0.0011) | 0.00 -0.36 (0.0,0.0011) |
| 4k | read | 4.34 0.00 (2.5,0.0) | -0.40 0.00 (2.6,0.0) | -8.45 0.00 (2.6,0.0) |
| 8k | read | -1.43 0.00 (1.7,0.0) | 0.01 0.00 (1.7,0.0) | -3.89 0.00 (1.7,0.0) |
| 16k | read | 1.68 0.00 (0.96,0.0) | -0.38 0.00 (0.94,0.0) | 0.47 0.00 (0.91,0.0) |
| 32k | read | -0.98 0.00 (0.48,0.0) | -0.09 0.00 (0.47,0.0) | -1.24 0.00 (0.47,0.0) |
| 64k | read | -0.42 0.00 (0.25,0.0) | -0.26 0.00 (0.25,0.0) | -0.94 0.00 (0.25,0.0) |
| 128k | read | -0.49 0.00 (0.12,0.0) | 0.06 0.00 (0.12,0.0) | -0.59 0.00 (0.12,0.0) |
| 256k | read | -0.89 0.00 (0.062,0.0) | -0.11 0.00 (0.062,0.0) | -0.77 0.00 (0.062,0.0) |
| 512k | read | -0.64 0.00 (0.03,0.0) | -0.11 0.00 (0.03,0.0) | 0.08 0.00 (0.03,0.0) |
| 1024k | read | 0.00 0.00 (0.013,0.0) | -0.05 0.00 (0.013,0.0) | 0.42 0.00 (0.013,0.0) |
| 2048k | read | -0.69 0.00 (0.0063,0.0) | -0.36 0.00 (0.0063,0.0) | 1.12 0.00 (0.006,0.0) |
| 4096k | read | -2.69 0.00 (0.0029,0.0) | -1.85 0.00 (0.0027,0.0) | -1.13 0.00 (0.0026,0.0) |
| 8192k | read | -2.15 0.00 (0.0013,0.0) | -1.66 0.00 (0.0013,0.0) | -2.90 0.00 (0.0012,0.0) |
| 4k | readwrite | 0.32 0.32 (0.46,0.46) | 0.68 0.69 (0.47,0.47) | -0.26 -0.26 (0.5,0.5) |
| 8k | readwrite | 0.18 0.18 (0.37,0.37) | 0.77 0.76 (0.38,0.38) | -0.47 -0.48 (0.39,0.39) |
| 16k | readwrite | 0.24 0.25 (0.27,0.27) | -0.17 -0.15 (0.27,0.27) | -0.51 -0.52 (0.27,0.27) |
| 32k | readwrite | -0.91 -0.91 (0.17,0.17) | -1.40 -1.37 (0.17,0.17) | -1.67 -1.70 (0.17,0.17) |
| 64k | readwrite | -1.22 -1.21 (0.09,0.09) | -1.33 -1.30 (0.09,0.09) | -2.49 -2.50 (0.091,0.091) |
| 128k | readwrite | -0.67 -0.69 (0.045,0.045) | -1.23 -1.24 (0.045,0.045) | -1.01 -0.96 (0.045,0.045) |
| 256k | readwrite | -0.31 -0.35 (0.023,0.023) | -0.28 -0.28 (0.023,0.023) | 0.25 0.08 (0.022,0.023) |
| 512k | readwrite | 0.77 0.72 (0.011,0.011) | 0.96 0.96 (0.011,0.011) | 0.62 1.16 (0.011,0.011) |
| 1024k | readwrite | 0.93 0.91 (0.0053,0.0053) | 0.72 0.78 (0.0051,0.0051) | -0.48 -0.10 (0.0051,0.0051) |
| 2048k | readwrite | 1.84 1.77 (0.0026,0.0026) | 0.86 0.71 (0.0026,0.0026) | 0.31 1.37 (0.0026,0.0026) |
| 4096k | readwrite | 2.71 2.43 (0.0013,0.0013) | 1.54 1.73 (0.0013,0.0013) | -1.18 -0.79 (0.0012,0.0013) |
| 8192k | readwrite | -2.61 -1.52 (0.0006,0.00061) | -1.83 -1.27 (0.00059,0.0006) | -2.25 -1.53 (0.00059,0.00059) |
| 4k | write | 0.00 -2.94 (0.0,0.53) | 0.00 -2.23 (0.0,0.57) | 0.00 0.93 (0.0,0.56) |
| 8k | write | 0.00 -1.78 (0.0,0.45) | 0.00 -1.14 (0.0,0.47) | 0.00 0.47 (0.0,0.47) |
| 16k | write | 0.00 -1.46 (0.0,0.35) | 0.00 -1.46 (0.0,0.36) | 0.00 0.25 (0.0,0.36) |
| 32k | write | 0.00 -1.05 (0.0,0.24) | 0.00 -1.53 (0.0,0.24) | 0.00 -0.74 (0.0,0.24) |
| 64k | write | 0.00 -2.78 (0.0,0.15) | 0.00 1.39 (0.0,0.15) | 0.00 -2.42 (0.0,0.15) |
| 128k | write | 0.00 -0.03 (0.0,0.082) | 0.00 0.32 (0.0,0.081) | 0.00 2.76 (0.0,0.079) |
| 256k | write | 0.00 -0.33 (0.0,0.039) | 0.00 -0.32 (0.0,0.038) | 0.00 -2.37 (0.0,0.038) |
| 512k | write | 0.00 4.00 (0.0,0.019) | 0.00 6.45 (0.0,0.02) | 0.00 2.94 (0.0,0.019) |
| 1024k | write | 0.00 1.22 (0.0,0.0088) | 0.00 1.57 (0.0,0.0088) | 0.00 -3.98 (0.0,0.0087) |
| 2048k | write | 0.00 0.05 (0.0,0.0043) | 0.00 3.33 (0.0,0.0044) | 0.00 8.69 (0.0,0.0044) |
| 4096k | write | 0.00 2.36 (0.0,0.0025) | 0.00 1.52 (0.0,0.0024) | 0.00 0.44 (0.0,0.0024) |
| 8192k | write | 0.00 -0.35 (0.0,0.0012) | 0.00 -0.45 (0.0,0.0011) | 0.00 -0.73 (0.0,0.0011) |
+---------+-----------+-------------------------------+------------------------------+-------------------------------+
Andreas Hindborg (9):
rust: add radix tree abstraction
rust: add `pages` module for handling page allocation
rust: block: introduce `kernel::block::mq` module
rust: block: introduce `kernel::block::bio` module
RUST: add `module_params` macro
rust: apply cache line padding for `SpinLock`
RUST: implement `ForeignOwnable` for `Pin`
rust: add null block driver
rust: inline a number of short functions
Wedson Almeida Filho (2):
rust: lock: add support for `Lock::lock_irqsave`
rust: lock: implement `IrqSaveBackend` for `SpinLock`
drivers/block/Kconfig | 4 +
drivers/block/Makefile | 4 +
drivers/block/rnull-helpers.c | 60 ++++
drivers/block/rnull.rs | 177 ++++++++++
rust/bindings/bindings_helper.h | 4 +
rust/bindings/lib.rs | 1 +
rust/helpers.c | 91 ++++++
rust/kernel/block.rs | 6 +
rust/kernel/block/bio.rs | 93 ++++++
rust/kernel/block/bio/vec.rs | 181 +++++++++++
rust/kernel/block/mq.rs | 15 +
rust/kernel/block/mq/gen_disk.rs | 133 ++++++++
rust/kernel/block/mq/operations.rs | 260 +++++++++++++++
rust/kernel/block/mq/raw_writer.rs | 30 ++
rust/kernel/block/mq/request.rs | 87 +++++
rust/kernel/block/mq/tag_set.rs | 92 ++++++
rust/kernel/cache_padded.rs | 33 ++
rust/kernel/error.rs | 4 +
rust/kernel/lib.rs | 13 +
rust/kernel/module_param.rs | 501 +++++++++++++++++++++++++++++
rust/kernel/pages.rs | 284 ++++++++++++++++
rust/kernel/radix_tree.rs | 156 +++++++++
rust/kernel/sync/lock.rs | 49 ++-
rust/kernel/sync/lock/mutex.rs | 2 +
rust/kernel/sync/lock/spinlock.rs | 50 ++-
rust/kernel/types.rs | 30 ++
rust/macros/helpers.rs | 46 ++-
rust/macros/module.rs | 402 +++++++++++++++++++++--
scripts/Makefile.build | 2 +-
29 files changed, 2766 insertions(+), 44 deletions(-)
create mode 100644 drivers/block/rnull-helpers.c
create mode 100644 drivers/block/rnull.rs
create mode 100644 rust/kernel/block.rs
create mode 100644 rust/kernel/block/bio.rs
create mode 100644 rust/kernel/block/bio/vec.rs
create mode 100644 rust/kernel/block/mq.rs
create mode 100644 rust/kernel/block/mq/gen_disk.rs
create mode 100644 rust/kernel/block/mq/operations.rs
create mode 100644 rust/kernel/block/mq/raw_writer.rs
create mode 100644 rust/kernel/block/mq/request.rs
create mode 100644 rust/kernel/block/mq/tag_set.rs
create mode 100644 rust/kernel/cache_padded.rs
create mode 100644 rust/kernel/module_param.rs
create mode 100644 rust/kernel/pages.rs
create mode 100644 rust/kernel/radix_tree.rs
base-commit: ea76e08f4d901a450619831a255e9e0a4c0ed162
Comments
Hi Niklas,

Niklas Cassel <Niklas.Cassel@wdc.com> writes:

> On Wed, May 03, 2023 at 11:06:57AM +0200, Andreas Hindborg wrote:
>> From: Andreas Hindborg <a.hindborg@samsung.com>
>>
>
> (cut)
>
>> For each measurement the drivers are loaded and a drive is configured with
>> memory backing and a size of 4 GiB. C null_blk is configured to match the
>> implemented modes of the Rust driver: `blocksize` is set to 4 KiB,
>> `completion_nsec` to 0, `irqmode` to 0 (IRQ_NONE), `queue_mode` to 2 (MQ),
>> `hw_queue_depth` to 256 and `memory_backed` to 1. For both drivers, the queue
>> scheduler is set to `none`. The measurements are made using 30-second runs of
>> `fio` with the `PSYNC` IO engine, with workers pinned to separate CPU cores,
>> inside a virtual machine (qemu/kvm) on an Intel Alder Lake workstation
>> (i5-12600).
>
> Hello Andreas,
>
> I'm curious why you used psync ioengine for the benchmarks.
>
> As psync is a sync ioengine, it means queue depth == 1.
>
> Wouldn't it have been more interesting to see an async ioengine,
> together with different queue depths?

That would also be interesting. I was a bit constrained on CPU cycles so I had
to choose. I intend to produce the numbers you ask for. For now here are two
runs of random read using io_uring with queue depth 128 (same table style):

For iodepth_batch_submit=1, iodepth_batch_complete=1:

+---------+----------+---------------------+---------------------+
| jobs/bs | workload |          1          |          6          |
+---------+----------+---------------------+---------------------+
| 4k      | randread | 2.97 0.00 (0.9,0.0) | 4.06 0.00 (1.8,0.0) |
+---------+----------+---------------------+---------------------+

For iodepth_batch_submit=16, iodepth_batch_complete=16:

+---------+----------+---------------------+---------------------+
| jobs/bs | workload |          1          |          6          |
+---------+----------+---------------------+---------------------+
| 4k      | randread | 4.40 0.00 (1.1,0.0) | 4.87 0.00 (1.8,0.0) |
+---------+----------+---------------------+---------------------+

Above numbers are 60 second runs on bare metal, so not entirely comparable with
the ones in the cover letter.

> You might want to explain your table a bit more.

I understand that the table can be difficult to read. It is not easy to convey
all this information in ASCII email. The numbers in parentheses in the cells
_are_ IOPS x 10e6 (read,write). Referring to the second table above, for 1 job
at 4k bs the Rust driver performed 4.8 percent more IOPS than the C driver. The
C driver did 1.1M IOPS. I hope this clarifies the table, otherwise let me know!

> It might be nice to see IOPS and average latencies.

I did collect latency info as well, including completion latency percentiles.
It's just difficult to fit all that data in an email. I have the fio json
output, let me know if you want it and I will find a way to get it to you.

I am considering setting up some kind of CI that will publish the performance
results online automatically so that it will be a link instead of an inline
table.

> As an example of a table that I find easier to interpret,
> see e.g. the table on page 29 in the SPDK performance report:
> https://ci.spdk.io/download/performance-reports/SPDK_nvme_bdev_perf_report_2301.pdf

Thanks for the input, I will be sure to reference that next time. Just for
clarity, as you mentioned there is only one queue depth in play for the numbers
in the cover letter.

Best regards
Andreas
On Wed, May 03, 2023 at 02:29:08PM +0200, Andreas Hindborg wrote:
> (cut)
>
> For iodepth_batch_submit=1, iodepth_batch_complete=1:
>
> +---------+----------+---------------------+---------------------+
> | jobs/bs | workload |          1          |          6          |
> +---------+----------+---------------------+---------------------+
> | 4k      | randread | 2.97 0.00 (0.9,0.0) | 4.06 0.00 (1.8,0.0) |
> +---------+----------+---------------------+---------------------+
>
> For iodepth_batch_submit=16, iodepth_batch_complete=16:
>
> +---------+----------+---------------------+---------------------+
> | jobs/bs | workload |          1          |          6          |
> +---------+----------+---------------------+---------------------+
> | 4k      | randread | 4.40 0.00 (1.1,0.0) | 4.87 0.00 (1.8,0.0) |
> +---------+----------+---------------------+---------------------+
>
> Above numbers are 60 second runs on bare metal, so not entirely
> comparable with the ones in the cover letter.
>
> > You might want to explain your table a bit more.
>
> I understand that the table can be difficult to read. It is not easy to
> convey all this information in ASCII email. The numbers in parentheses
> in the cells _are_ IOPS x 10e6 (read,write). Referring to the second
> table above, for 1 job at 4k bs the Rust driver performed 4.8 percent
> more IOPS than the C driver. The C driver did 1.1M IOPS. I hope this
> clarifies the table, otherwise let me know!

Yes, I can fully understand that it is hard to convey all the information in a
small text table.

I do think it is better to have more small tables. It will be a lot of text,
but perhaps one table per "nr jobs" would have been clearer. Perhaps you don't
need such fine granularity of "nr jobs" either, perhaps skip some. Same thing
can probably be said about block size increments, you can probably skip some.

It seems like the numbers are very jittery. I assume that this is because the
drive utilization is so low. The IOPS are also very low. You probably need to
increase the QD for these numbers to be consistent.

Perhaps move this text to be just before the table itself:

"In this table each cell shows the relative performance of the Rust driver to
the C driver, with the throughput of the C driver in parentheses:
`rel_read rel_write (read_miops write_miops)`. For example, the Rust driver
performs 4.74 percent better than the C driver for fio randread with 2 jobs at
16 KiB block size."

I think it would be good to print the IOPS for both C and Rust, as you now only
seem to print the C driver IOPS. That way it is also easier to verify the
relative differences. (Not that I think that you are making up numbers, but you
usually see IOPS for both implementations, followed by percentage diffs.)

See e.g. how Jens did it:
https://lore.kernel.org/linux-block/20190116175003.17880-1-axboe@kernel.dk/#r

Kind regards,
Niklas
On 5/3/23 02:06, Andreas Hindborg wrote:
> This is an early preview of a null block driver written in Rust.
It is not clear to me why this effort was started? As far as I know the
null_blk driver is only used by kernel developers for testing kernel
changes so end users are not affected by bugs in this driver.
Additionally, performance of this driver is critical since this driver
is used to measure block layer performance. Does this mean that C is a
better choice than Rust for this driver?
Thanks,
Bart.
Hi Bart,

Bart Van Assche <bvanassche@acm.org> writes:

> On 5/3/23 02:06, Andreas Hindborg wrote:
>> This is an early preview of a null block driver written in Rust.
>
> It is not clear to me why this effort was started? As far as I know the null_blk
> driver is only used by kernel developers for testing kernel changes so end users
> are not affected by bugs in this driver. Additionally, performance of this
> driver is critical since this driver is used to measure block layer performance.
> Does this mean that C is a better choice than Rust for this driver?

I take it you did not read the rest of the cover letter. Let me quote some of
it here:

> A null block driver is a good opportunity to evaluate Rust bindings for the
> block layer. It is a small and simple driver and thus should be simple to reason
> about. Further, the null block driver is not usually deployed in production
> environments. Thus, it should be fairly straightforward to review, and any
> potential issues are not going to bring down any production workloads.
>
> Being small and simple, the null block driver is a good place to introduce the
> Linux kernel storage community to Rust. This will help prepare the community for
> future Rust projects and facilitate a better maintenance process for these
> projects.
>
> The statistics presented in my previous message [1] show that the C null block
> driver has had a significant amount of memory safety related problems in the
> past. 41% of fixes merged for the C null block driver are fixes for memory
> safety issues. This makes the null block driver a good candidate for rewriting
> in Rust.

In relation to performance, it turns out that there is not much of a
difference.

For memory safety bugs - I think we are better off without them, no matter if
we are user facing or not.

If it is still unclear to you why this effort was started, please do let me
know and I shall try to clarify further :)

Best regards,
Andreas
On 5/4/23 11:15, Andreas Hindborg wrote:
> If it is still unclear to you why this effort was started, please do let
> me know and I shall try to clarify further :)

It seems like I was too polite in my previous email. What I meant is that
rewriting code is useful if it provides a clear advantage to the users of a
driver. For null_blk, the users are kernel developers. The code that has been
posted is the start of a rewrite of the null_blk driver. The benefits of this
rewrite (making low-level memory errors less likely) do not outweigh the risks
that this effort will introduce functional or performance regressions.

Bart.
Bart Van Assche <bvanassche@acm.org> writes:

> On 5/4/23 11:15, Andreas Hindborg wrote:
>> If it is still unclear to you why this effort was started, please do let
>> me know and I shall try to clarify further :)
>
> It seems like I was too polite in my previous email. What I meant is that
> rewriting code is useful if it provides a clear advantage to the users of
> a driver. For null_blk, the users are kernel developers. The code that has
> been posted is the start of a rewrite of the null_blk driver. The benefits
> of this rewrite (making low-level memory errors less likely) do not outweigh
> the risks that this effort will introduce functional or performance regressions.

If this turns into a full rewrite instead of just a demonstrator, we will be in
the lucky situation that we have the existing C version to verify performance
and functionality against. Unnoticed regressions are unlikely in this sense.

If we want to have Rust abstractions for the block layer in the kernel (some
people do), then having a simple driver in Rust to regression test these
abstractions with is good value.

Best regards,
Andreas
On Thu, May 04, 2023 at 11:36:01AM -0700, Bart Van Assche wrote:
> On 5/4/23 11:15, Andreas Hindborg wrote:
> > If it is still unclear to you why this effort was started, please do let
> > me know and I shall try to clarify further :)
>
> It seems like I was too polite in my previous email. What I meant is that
> rewriting code is useful if it provides a clear advantage to the users of
> a driver. For null_blk, the users are kernel developers. The code that has
> been posted is the start of a rewrite of the null_blk driver. The benefits
> of this rewrite (making low-level memory errors less likely) do not outweigh
> the risks that this effort will introduce functional or performance regressions.

Instead of replacing, would co-existing be okay? Of course as long as there's
no requirement to maintain feature parity between the two. Actually, just call
it "rust_blk" and declare it has no relationship to null_blk, despite their
functional similarities: it's a developer reference implementation for a rust
block driver.
On 5/4/23 12:52?PM, Keith Busch wrote: > On Thu, May 04, 2023 at 11:36:01AM -0700, Bart Van Assche wrote: >> On 5/4/23 11:15, Andreas Hindborg wrote: >>> If it is still unclear to you why this effort was started, please do let >>> me know and I shall try to clarify further :) >> >> It seems like I was too polite in my previous email. What I meant is that >> rewriting code is useful if it provides a clear advantage to the users of >> a driver. For null_blk, the users are kernel developers. The code that has >> been posted is the start of a rewrite of the null_blk driver. The benefits >> of this rewrite (making low-level memory errors less likely) do not outweigh >> the risks that this effort will introduce functional or performance regressions. > > Instead of replacing, would co-existing be okay? Of course as long as > there's no requirement to maintain feature parity between the two. > Actually, just call it "rust_blk" and declare it has no relationship to > null_blk, despite their functional similarities: it's a developer > reference implementation for a rust block driver. To me, the big discussion point isn't really whether we're doing null_blk or not, it's more if we want to go down this path of maintaining rust bindings for the block code in general. If the answer to that is yes, then doing null_blk seems like a great choice as it's not a critical piece of infrastructure. It might even be a good idea to be able to run both, for performance purposes, as the bindings or core changes. But back to the real question... This is obviously extra burden on maintainers, and that needs to be sorted out first. Block drivers in general are not super security sensitive, as it's mostly privileged code and there's not a whole lot of user visibile API. And the stuff we do have is reasonably basic. So what's the long term win of having rust bindings? This is a legitimate question. I can see a lot of other more user exposed subsystems being of higher interest here.
Jens Axboe <axboe@kernel.dk> writes: > On 5/4/23 12:52?PM, Keith Busch wrote: >> On Thu, May 04, 2023 at 11:36:01AM -0700, Bart Van Assche wrote: >>> On 5/4/23 11:15, Andreas Hindborg wrote: >>>> If it is still unclear to you why this effort was started, please do let >>>> me know and I shall try to clarify further :) >>> >>> It seems like I was too polite in my previous email. What I meant is that >>> rewriting code is useful if it provides a clear advantage to the users of >>> a driver. For null_blk, the users are kernel developers. The code that has >>> been posted is the start of a rewrite of the null_blk driver. The benefits >>> of this rewrite (making low-level memory errors less likely) do not outweigh >>> the risks that this effort will introduce functional or performance regressions. >> >> Instead of replacing, would co-existing be okay? Of course as long as >> there's no requirement to maintain feature parity between the two. >> Actually, just call it "rust_blk" and declare it has no relationship to >> null_blk, despite their functional similarities: it's a developer >> reference implementation for a rust block driver. > > To me, the big discussion point isn't really whether we're doing > null_blk or not, it's more if we want to go down this path of > maintaining rust bindings for the block code in general. If the answer > to that is yes, then doing null_blk seems like a great choice as it's > not a critical piece of infrastructure. It might even be a good idea to > be able to run both, for performance purposes, as the bindings or core > changes. > > But back to the real question... This is obviously extra burden on > maintainers, and that needs to be sorted out first. Block drivers in > general are not super security sensitive, as it's mostly privileged code > and there's not a whole lot of user visibile API. And the stuff we do > have is reasonably basic. So what's the long term win of having rust > bindings? This is a legitimate question. I can see a lot of other more > user exposed subsystems being of higher interest here. Even though the block layer is not usually exposed in the same way that something like the USB stack is, absence of memory safety bugs is a very useful property. If this is attainable without sacrificing performance, it seems like a nice option to offer future block device driver developers. Some would argue that it is worth offering even in the face of performance regression. While memory safety is the primary feature that Rust brings to the table, it does come with other nice features as well. The type system, language support stackless coroutines and error handling language support are all very useful. Regarding maintenance of the bindings, it _is_ an amount extra work. But there is more than one way to structure that work. If Rust is accepted into the block layer at some point, maintenance could be structured in such a way that it does not get in the way of existing C maintenance work. A "rust keeps up or it breaks" model. That could work for a while. Best regards Andreas
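(Editorial side note for readers less familiar with Rust: the following is a
minimal userspace illustration of the error-handling point made above, not code
from this series. The `?` operator propagates errors early and destructors run
on every exit path, which replaces the goto-based cleanup ladders common in C
drivers.)

use std::fs::File;
use std::io::{self, Read};

fn read_all(path: &str) -> io::Result<String> {
    let mut file = File::open(path)?; // on error, return it to the caller
    let mut contents = String::new();
    file.read_to_string(&mut contents)?; // `file` is closed on every exit path
    Ok(contents)
}

fn main() {
    match read_all("/proc/version") {
        Ok(s) => println!("{}", s.trim()),
        Err(e) => eprintln!("read failed: {e}"),
    }
}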
On Thu, May 4, 2023 at 9:02 PM Jens Axboe <axboe@kernel.dk> wrote: > > But back to the real question... This is obviously extra burden on > maintainers, and that needs to be sorted out first. Block drivers in Regarding maintenance, something we have suggested in similar cases to other subsystems is that the author gets involved as a maintainer of, at least, the Rust abstractions/driver (possibly with a different `MAINTAINERS` entry). Of course, that is still work for the existing maintainer(s), i.e. you, since coordination takes time. However, it can also be a nice way to learn Rust on the side, meanwhile things are getting upstreamed and discussed (I think Daniel, in Cc, is taking that approach). And it may also be a way for you to get an extra maintainer/reviewer/... later on for the C parts, too, even if Rust does not succeed. > general are not super security sensitive, as it's mostly privileged code > and there's not a whole lot of user visibile API. And the stuff we do > have is reasonably basic. So what's the long term win of having rust > bindings? This is a legitimate question. I can see a lot of other more > user exposed subsystems being of higher interest here. From the experience of other kernel maintainers/developers that are making the move, the advantages seem to be well worth it, even disregarding the security aspect, i.e. on the language side alone. Cheers, Miguel
On 5/4/23 2:11 PM, Miguel Ojeda wrote:
> On Thu, May 4, 2023 at 9:02 PM Jens Axboe <axboe@kernel.dk> wrote:
>>
>> But back to the real question... This is obviously extra burden on
>> maintainers, and that needs to be sorted out first. Block drivers in
>
> Regarding maintenance, something we have suggested in similar cases to
> other subsystems is that the author gets involved as a maintainer of,
> at least, the Rust abstractions/driver (possibly with a different
> `MAINTAINERS` entry).

Right, but that doesn't really solve the problem when the rust bindings get in
the way of changes that you are currently making. Or if you break them
inadvertently. I do see benefits to that approach, but it's no panacea.

> Of course, that is still work for the existing maintainer(s), i.e.
> you, since coordination takes time. However, it can also be a nice way
> to learn Rust on the side, meanwhile things are getting upstreamed and
> discussed (I think Daniel, in Cc, is taking that approach).

This seems to assume that time is plentiful and we can just add more to our
plate, which isn't always true. While I'd love to do more rust and get more
familiar with it, the time still has to be there for that. I'm actually typing
this on a laptop with a rust gpu driver :-)

And this isn't just on me, there are other regular contributors and reviewers
that would need to be onboard with this.

> And it may also be a way for you to get an extra
> maintainer/reviewer/... later on for the C parts, too, even if Rust
> does not succeed.

That is certainly a win!

>> general are not super security sensitive, as it's mostly privileged code
>> and there's not a whole lot of user visible API. And the stuff we do
>> have is reasonably basic. So what's the long term win of having rust
>> bindings? This is a legitimate question. I can see a lot of other more
>> user exposed subsystems being of higher interest here.
>
> From the experience of other kernel maintainers/developers that are
> making the move, the advantages seem to be well worth it, even
> disregarding the security aspect, i.e. on the language side alone.

Each case is different though, different people and different schedules and
priorities. So while the above is promising, it's also just anecdotal and
doesn't necessarily apply to our case.
On 5/4/23 1:59?PM, Andreas Hindborg wrote: > > Jens Axboe <axboe@kernel.dk> writes: > >> On 5/4/23 12:52?PM, Keith Busch wrote: >>> On Thu, May 04, 2023 at 11:36:01AM -0700, Bart Van Assche wrote: >>>> On 5/4/23 11:15, Andreas Hindborg wrote: >>>>> If it is still unclear to you why this effort was started, please do let >>>>> me know and I shall try to clarify further :) >>>> >>>> It seems like I was too polite in my previous email. What I meant is that >>>> rewriting code is useful if it provides a clear advantage to the users of >>>> a driver. For null_blk, the users are kernel developers. The code that has >>>> been posted is the start of a rewrite of the null_blk driver. The benefits >>>> of this rewrite (making low-level memory errors less likely) do not outweigh >>>> the risks that this effort will introduce functional or performance regressions. >>> >>> Instead of replacing, would co-existing be okay? Of course as long as >>> there's no requirement to maintain feature parity between the two. >>> Actually, just call it "rust_blk" and declare it has no relationship to >>> null_blk, despite their functional similarities: it's a developer >>> reference implementation for a rust block driver. >> >> To me, the big discussion point isn't really whether we're doing >> null_blk or not, it's more if we want to go down this path of >> maintaining rust bindings for the block code in general. If the answer >> to that is yes, then doing null_blk seems like a great choice as it's >> not a critical piece of infrastructure. It might even be a good idea to >> be able to run both, for performance purposes, as the bindings or core >> changes. >> >> But back to the real question... This is obviously extra burden on >> maintainers, and that needs to be sorted out first. Block drivers in >> general are not super security sensitive, as it's mostly privileged code >> and there's not a whole lot of user visibile API. And the stuff we do >> have is reasonably basic. So what's the long term win of having rust >> bindings? This is a legitimate question. I can see a lot of other more >> user exposed subsystems being of higher interest here. > > Even though the block layer is not usually exposed in the same way > that something like the USB stack is, absence of memory safety bugs is > a very useful property. If this is attainable without sacrificing > performance, it seems like a nice option to offer future block device > driver developers. Some would argue that it is worth offering even in > the face of performance regression. > > While memory safety is the primary feature that Rust brings to the > table, it does come with other nice features as well. The type system, > language support stackless coroutines and error handling language > support are all very useful. We're in violent agreement on this part, I don't think anyone sane would argue that memory safety with the same performance [1] isn't something you'd want. And the error handling with rust is so much better than the C stuff drivers do now that I can't see anyone disagreeing on that being a great thing as well. The discussion point here is the price being paid in terms of people time. > Regarding maintenance of the bindings, it _is_ an amount extra work. But > there is more than one way to structure that work. If Rust is accepted > into the block layer at some point, maintenance could be structured in > such a way that it does not get in the way of existing C maintenance > work. A "rust keeps up or it breaks" model. That could work for a while. 
That potentially works for null_blk, but it would not work for anything that
people actually depend on. In other words, anything that isn't null_blk. And I
don't believe we'd be actively discussing these bindings if just doing null_blk
is the end goal, because that isn't useful by itself, and at that point we'd
all just be wasting our time. In the real world, once we have just one actual
driver using it, then we'd be looking at "this driver regressed because of
change X/Y/Z and that needs to get sorted before the next release". And THAT is
the real issue for me. So a "rust keeps up or it breaks" model is a bit naive
in my opinion, it's just not a viable approach. In fact, even for null_blk,
this doesn't really fly as we rely on blktests to continually vet the sanity of
the IO stack, and null_blk is an integral part of that.

So I really don't think there's much to debate between "rust people vs jens"
here, as we agree on the benefits, but my end of the table has to stomach the
cons. And like I mentioned in an earlier email, that's not just on me, there
are other regular contributors and reviewers that are relevant to this
discussion. This is something we need to discuss.

[1] We obviously need to do real numbers here, the ones posted I don't consider
stable enough to be useful in saying "yeah it's fully on par". If you have an
updated rust nvme driver that uses these bindings I'd be happy to run some
testing that will definitively tell us if there's a performance win, loss, or
parity, and how much.
On Thu, May 04, 2023 at 01:02:19PM -0600, Jens Axboe wrote:
> null_blk or not, it's more if we want to go down this path of
> maintaining rust bindings for the block code in general. If the answer
> to that is yes, then doing null_blk seems like a great choice as it's
> not a critical piece of infrastructure. It might even be a good idea to
> be able to run both, for performance purposes, as the bindings or core
> changes.

Yes. And I'm not in favor of it, especially right now. There is so much work we
need to do that requires changes all over (e.g. sorting out the request_queue
vs gendisk, and making the bio_vec folio or even better physical address
based), and the last thing I need is a binding to another language, one that
happens to have nice features but that also is really ugly.
Jens Axboe <axboe@kernel.dk> writes: > On 5/4/23 1:59?PM, Andreas Hindborg wrote: >> >> Jens Axboe <axboe@kernel.dk> writes: >> >>> On 5/4/23 12:52?PM, Keith Busch wrote: >>>> On Thu, May 04, 2023 at 11:36:01AM -0700, Bart Van Assche wrote: >>>>> On 5/4/23 11:15, Andreas Hindborg wrote: >>>>>> If it is still unclear to you why this effort was started, please do let >>>>>> me know and I shall try to clarify further :) >>>>> >>>>> It seems like I was too polite in my previous email. What I meant is that >>>>> rewriting code is useful if it provides a clear advantage to the users of >>>>> a driver. For null_blk, the users are kernel developers. The code that has >>>>> been posted is the start of a rewrite of the null_blk driver. The benefits >>>>> of this rewrite (making low-level memory errors less likely) do not outweigh >>>>> the risks that this effort will introduce functional or performance regressions. >>>> >>>> Instead of replacing, would co-existing be okay? Of course as long as >>>> there's no requirement to maintain feature parity between the two. >>>> Actually, just call it "rust_blk" and declare it has no relationship to >>>> null_blk, despite their functional similarities: it's a developer >>>> reference implementation for a rust block driver. >>> >>> To me, the big discussion point isn't really whether we're doing >>> null_blk or not, it's more if we want to go down this path of >>> maintaining rust bindings for the block code in general. If the answer >>> to that is yes, then doing null_blk seems like a great choice as it's >>> not a critical piece of infrastructure. It might even be a good idea to >>> be able to run both, for performance purposes, as the bindings or core >>> changes. >>> >>> But back to the real question... This is obviously extra burden on >>> maintainers, and that needs to be sorted out first. Block drivers in >>> general are not super security sensitive, as it's mostly privileged code >>> and there's not a whole lot of user visibile API. And the stuff we do >>> have is reasonably basic. So what's the long term win of having rust >>> bindings? This is a legitimate question. I can see a lot of other more >>> user exposed subsystems being of higher interest here. >> >> Even though the block layer is not usually exposed in the same way >> that something like the USB stack is, absence of memory safety bugs is >> a very useful property. If this is attainable without sacrificing >> performance, it seems like a nice option to offer future block device >> driver developers. Some would argue that it is worth offering even in >> the face of performance regression. >> >> While memory safety is the primary feature that Rust brings to the >> table, it does come with other nice features as well. The type system, >> language support stackless coroutines and error handling language >> support are all very useful. > > We're in violent agreement on this part, I don't think anyone sane would > argue that memory safety with the same performance [1] isn't something > you'd want. And the error handling with rust is so much better than the > C stuff drivers do now that I can't see anyone disagreeing on that being > a great thing as well. > > The discussion point here is the price being paid in terms of people > time. > >> Regarding maintenance of the bindings, it _is_ an amount extra work. But >> there is more than one way to structure that work. 
If Rust is accepted >> into the block layer at some point, maintenance could be structured in >> such a way that it does not get in the way of existing C maintenance >> work. A "rust keeps up or it breaks" model. That could work for a while. > > That potentially works for null_blk, but it would not work for anything > that people actually depend on. In other words, anything that isn't > null_blk. And I don't believe we'd be actively discussing these bindings > if just doing null_blk is the end goal, because that isn't useful by > itself, and at that point we'd all just be wasting our time. In the real > world, once we have just one actual driver using it, then we'd be > looking at "this driver regressed because of change X/Y/Z and that needs > to get sorted before the next release". And THAT is the real issue for > me. So a "rust keeps up or it breaks" model is a bit naive in my > opinion, it's just not a viable approach. In fact, even for null_blk, > this doesn't really fly as we rely on blktests to continually vet the > sanity of the IO stack, and null_blk is an integral part of that. Sure, once there are actual users, this model would not work. But during an introduction period it might be a useful model. Having Rust around without having to take care of it might give maintainers, reviewers, and contributors a no-strings-attached opportunity to dabble with the language in a domain they are familiar with. > > So I really don't think there's much to debate between "rust people vs > jens" here, as we agree on the benefits, but my end of the table has to > stomach the cons. And like I mentioned in an earlier email, that's not > just on me, there are other regular contributors and reviewers that are > relevant to this discussion. This is something we need to discuss. > > [1] We obviously need to do real numbers here, the ones posted I don't > consider stable enough to be useful in saying "yeah it's fully on par". > If you have an updated rust nvme driver that uses these bindings I'd > be happy to run some testing that will definitively tell us if there's a > performance win, loss, or parity, and how much. I do plan to rebase the NVMe driver sometime in the next few months. I'll let you know when that work is done. Best regards Andreas
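As a concrete illustration of the error-handling style both sides call out above, here is a minimal, self-contained sketch. The `TagSet` and `Disk` types and the function names are invented for the example and are not the API from the RFC patches; the point is only that each fallible step returns a `Result`, `?` propagates the error to the caller, and destructors clean up whatever was already constructed, replacing C-style `goto out_free_*` ladders.

```rust
#[derive(Debug)]
struct Error(&'static str);

struct TagSet;                // stands in for a blk-mq tag set
struct Disk { _tags: TagSet } // stands in for a gendisk

fn alloc_tag_set(nr_hw_queues: u32) -> Result<TagSet, Error> {
    if nr_hw_queues == 0 {
        return Err(Error("need at least one hardware queue"));
    }
    Ok(TagSet)
}

fn add_disk(tags: TagSet, name: &str) -> Result<Disk, Error> {
    if name.is_empty() {
        return Err(Error("disk needs a name"));
    }
    Ok(Disk { _tags: tags })
}

fn probe(nr_hw_queues: u32, name: &str) -> Result<Disk, Error> {
    // `?` returns early with the error; anything already constructed is
    // cleaned up by its destructor instead of hand-written unwind code.
    let tags = alloc_tag_set(nr_hw_queues)?;
    let disk = add_disk(tags, name)?;
    Ok(disk)
}

fn main() {
    match probe(1, "rnullb0") {
        Ok(_) => println!("probe succeeded"),
        Err(e) => println!("probe failed: {}", e.0),
    }
}
```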
On 5/4/23 20:36, Bart Van Assche wrote: > On 5/4/23 11:15, Andreas Hindborg wrote: >> If it is still unclear to you why this effort was started, please do let >> me know and I shall try to clarify further :) > > It seems like I was too polite in my previous email. What I meant is that > rewriting code is useful if it provides a clear advantage to the users of > a driver. For null_blk, the users are kernel developers. The code that has > been posted is the start of a rewrite of the null_blk driver. The benefits > of this rewrite (making low-level memory errors less likely) do not > outweigh > the risks that this effort will introduce functional or performance > regressions. > I have to disagree here. While the null_blk driver in itself is certainly not _that_ useful, it does provide a good sounding board for whether all the design principles of the linux block layer can be adequately expressed in Rust. And by posting this driver you just proved that, and we all have a better understanding of what would be needed to convert old or create new drivers. But I guess we'll have a longer discussion at LSF :-) Cheers, Hannes
On Thu, May 4, 2023 at 10:22 PM Jens Axboe <axboe@kernel.dk> wrote: > > Right, but that doesn't really solve the problem when the rust bindings > get in the way of changes that you are currently making. Or if you break > them inadvertently. I do see benefits to that approach, but it's no > panacea. > > This seems to assume that time is plentiful and we can just add more to > our plate, which isn't always true. While I'd love to do more rust and > get more familiar with it, the time still has to be there for that. I'm > actually typing this on a laptop with a rust gpu driver :-) > > And this isn't just on me, there are other regular contributors and > reviewers that would need to be onboard with this. Indeed -- I didn't mean to imply it wouldn't be time consuming, only that it might be an alternative approach compared to having existing maintainers do it. Of course, it depends on the dynamics of the subsystem, how busy the subsystem is, whether there is good rapport, etc. > Each case is different though, different people and different schedules > and priorities. So while the above is promising, it's also just > annecdotal and doesn't necessarily apply to our case. Definitely, in the end subsystems know best if there is enough time available (from everybody) to pull it off. I only meant to say that the security angle is not the only benefit. For instance, like you said, the error handling, plus a bunch more that people usually enjoy: stricter typing, more information on signatures, sum types, pattern matching, privacy, closures, generics, etc. Cheers, Miguel
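To make the "sum types and pattern matching" point above concrete, here is a small self-contained sketch modelling null_blk's `irqmode` parameter (0 = none, 1 = softirq, 2 = timer with a `completion_nsec` delay in the C driver) as a Rust enum. The names are invented for illustration; this is not code from the RFC patches.

```rust
#[derive(Clone, Copy, Debug)]
enum IrqMode {
    None,
    SoftIrq,
    Timer { delay_ns: u64 },
}

fn parse_irqmode(raw: u32, completion_nsec: u64) -> Result<IrqMode, &'static str> {
    match raw {
        0 => Ok(IrqMode::None),
        1 => Ok(IrqMode::SoftIrq),
        2 => Ok(IrqMode::Timer { delay_ns: completion_nsec }),
        _ => Err("invalid irqmode"), // the compiler requires a catch-all arm here
    }
}

fn main() {
    // Exhaustive match: forgetting a variant is a compile error, not a
    // silently ignored case.
    match parse_irqmode(2, 10_000).unwrap() {
        IrqMode::None => println!("complete inline"),
        IrqMode::SoftIrq => println!("complete from softirq"),
        IrqMode::Timer { delay_ns } => println!("complete after {delay_ns} ns"),
    }
}
```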
On Thu, May 4, 2023 at 10:55 PM Jens Axboe <axboe@kernel.dk> wrote: > > That potentially works for null_blk, but it would not work for anything > that people actually depend on. In other words, anything that isn't > null_blk. And I don't believe we'd be actively discussing these bindings > if just doing null_blk is the end goal, because that isn't useful by > itself, and at that point we'd all just be wasting our time. In the real > world, once we have just one actual driver using it, then we'd be > looking at "this driver regressed because of change X/Y/Z and that needs > to get sorted before the next release". And THAT is the real issue for > me. So a "rust keeps up or it breaks" model is a bit naive in my > opinion, it's just not a viable approach. In fact, even for null_blk, > this doesn't really fly as we rely on blktests to continually vet the > sanity of the IO stack, and null_blk is an integral part of that. But `null_blk` shouldn't be affected, no? The Rust abstractions can be behind an explicit "experimental" / "broken" / "compile-test only" gate (or similar) in the beginning, as a way to test how much maintenance it actually requires. In such a setup, Andreas could be the one responsible to keep them up to date in the beginning. That is, in the worst case, a kernel release could happen with the Rust side broken -- that way `null_blk` is not impacted. That is why a separate `MAINTAINERS` entry may be interesting if you want to keep e.g. the `S:` level separate (though Andreas, I think, may be able to do "Supported" at this time). When the first real driver comes, a similar approach could be repeated, to buy some more time too. Cheers, Miguel
On Fri, May 05, 2023 at 12:53:41PM +0200, Miguel Ojeda wrote: > On Thu, May 4, 2023 at 10:22 PM Jens Axboe <axboe@kernel.dk> wrote: > > > > Right, but that doesn't really solve the problem when the rust bindings > > get in the way of changes that you are currently making. Or if you break > > them inadvertently. I do see benefits to that approach, but it's no > > panacea. One thing I want to point out is: not having a block layer Rust API doesn't keep the block layer away from Rust ;-) Rust will get in the way as long as the block layer is used, directly or indirectly, in any Rust code in the kernel. Take the M1 GPU driver for example, it can totally be done without a drm Rust API: Lina will have to directly call C functions in her GPU driver, but it's possible; or she can have her own drm Rust binding which is not blessed by the drm maintainers. But as long as drm is used in a Rust driver, a refactoring/improvement of drm will need to take the usage on the Rust side into consideration. Unless of course, someone is willing to write a C driver for the M1 GPU. The Rust bindings are actually the way of communication between subsystem maintainers and Rust driver writers, and can help reduce the amount of work: You can have the abstraction the way you like. Of course, there is always "don't do it until there are actually users", and I totally agree with that. But what is a better way to design the Rust binding for a subsystem? * Sit down and use the wisdom of maintainers and active developers, and really spend time on it right now? Or * Let one future user drag the API/binding design into insanity? I'd prefer the first approach. Time spent is time saved. Personally, my biggest fear is: RCU stalls/lockdep warnings in the Rust code (or they don't happen because of incorrect bindings), and who is going to fix them ;-) So I have to spend my time on making sure these bindings are in good shape, which is not always a pleasant experience: the more you use something, the more you hate it ;-) But I think it's worth it. Of course, by no means do I want to force anyone to learn Rust, I totally understand people who want to see zero Rust. Just want to say the maintenance burden may exist anyway, and the Rust binding is actually the thing to help here. Regards, Boqun > > > > This seems to assume that time is plentiful and we can just add more to > > our plate, which isn't always true. While I'd love to do more rust and > > get more familiar with it, the time still has to be there for that. I'm > > actually typing this on a laptop with a rust gpu driver :-) > > > > And this isn't just on me, there are other regular contributors and > > reviewers that would need to be onboard with this. > > Indeed -- I didn't mean to imply it wouldn't be time consuming, only > that it might be an alternative approach compared to having existing > maintainers do it. Of course, it depends on the dynamics of the > subsystem, how busy the subsystem is, whether there is good rapport, > etc. > > > Each case is different though, different people and different schedules > > and priorities. So while the above is promising, it's also just > > annecdotal and doesn't necessarily apply to our case. > > Definitely, in the end subsystems know best if there is enough time > available (from everybody) to pull it off. I only meant to say that > the security angle is not the only benefit.
> > For instance, like you said, the error handling, plus a bunch more > that people usually enjoy: stricter typing, more information on > signatures, sum types, pattern matching, privacy, closures, generics, > etc. > > Cheers, > Miguel
On Fri, May 05, 2023 at 05:24:56AM -0700, Boqun Feng wrote: > On Fri, May 05, 2023 at 12:53:41PM +0200, Miguel Ojeda wrote: > > On Thu, May 4, 2023 at 10:22 PM Jens Axboe <axboe@kernel.dk> wrote: > > > > > > Right, but that doesn't really solve the problem when the rust bindings > > > get in the way of changes that you are currently making. Or if you break > > > them inadvertently. I do see benefits to that approach, but it's no > > > panacea. > > One thing I want to point out is: not having a block layer Rust API > doesn't keep the block layer away from Rust ;-) Rust will get in the way > as long as block layer is used, directly or indirectly, in any Rust code > in kernel. > > Take the M1 GPU driver for example, it can totally be done without a drm > Rust API: Lina will have to directly call C funciton in her GPU driver, > but it's possible or she can have her own drm Rust binding which is not > blessed by the drm maintainers. But as long as drm is used in a Rust > driver, a refactoring/improvement of drm will need to take the usage of > Rust side into consideration. Unless of course, some one is willing to > write a C driver for M1 GPU. > > The Rust bindings are actually the way of communication between > subsystem mantainers and Rust driver writers, and can help reduce the > amount of work: You can have the abstraction the way you like. > > Of course, there is always "don't do it until there are actually users", > and I totally agree with that. But what is a better way to design the > Rust binding for a subsystem? > > * Sit down and use the wisdom of maintainers and active > developers, and really spend time on it right now? Or > > * Let one future user drag the API/binding design to insaneness? > Ah, of course, I should add: this is not the usual case, most of the time, users (e.g. a real driver) can help the design, I was just trying to say: without the wisdom of maintainers and active developers, a Rust binding solely designed by one user could have some design issues. In other words, the experience of maintaining C side API is very valuable to design Rust bindings. Regards, Boqun > I'd rather prefer the first approach. Time spent is time saved. > > Personally, my biggest fear is: RCU stalls/lockdep warnings in the Rust > code (or they don't happen because incorrect bindings), and who is going > to fix them ;-) So I have to spend my time on making sure these bindings > in good shapes, which is not always a pleasant experience: the more you > use something, the more you hate it ;-) But I think it's worth. > > Of course, by no means I want to force anyone to learn Rust, I totally > understand people who want to see zero Rust. Just want to say the > maintain burden may exist any way, and the Rust binding is actually the > thing to help here. > > Regards, > Boqun > > > > > > > This seems to assume that time is plentiful and we can just add more to > > > our plate, which isn't always true. While I'd love to do more rust and > > > get more familiar with it, the time still has to be there for that. I'm > > > actually typing this on a laptop with a rust gpu driver :-) > > > > > > And this isn't just on me, there are other regular contributors and > > > reviewers that would need to be onboard with this. > > > > Indeed -- I didn't mean to imply it wouldn't be time consuming, only > > that it might be an alternative approach compared to having existing > > maintainers do it. 
Of course, it depends on the dynamics of the > > subsystem, how busy the subsystem is, whether there is good rapport, > > etc. > > > > > Each case is different though, different people and different schedules > > > and priorities. So while the above is promising, it's also just > > > annecdotal and doesn't necessarily apply to our case. > > > > Definitely, in the end subsystems know best if there is enough time > > available (from everybody) to pull it off. I only meant to say that > > the security angle is not the only benefit. > > > > For instance, like you said, the error handling, plus a bunch more > > that people usually enjoy: stricter typing, more information on > > signatures, sum types, pattern matching, privacy, closures, generics, > > etc. > > > > Cheers, > > Miguel
On 5/5/23 03:53, Miguel Ojeda wrote: > Definitely, in the end subsystems know best if there is enough time > available (from everybody) to pull it off. I only meant to say that > the security angle is not the only benefit. > > For instance, like you said, the error handling, plus a bunch more > that people usually enjoy: stricter typing, more information on > signatures, sum types, pattern matching, privacy, closures, generics, > etc. These are all great advantages of Rust. One potential cause of memory corruption from block drivers is misprogramming the DMA engine of the storage controller. This is something no borrow checker can protect against. Only an IOMMU can protect against the storage controller accessing memory that it shouldn't access. This is not a criticism of Rust - I'm bringing this up because I think this is something that is important to realize. Bart.
On Fri, May 05, 2023 at 05:24:56AM -0700, Boqun Feng wrote: > > The Rust bindings are actually the way of communication between > subsystem mantainers and Rust driver writers, and can help reduce the > amount of work: You can have the abstraction the way you like. We don't have stable APIs or structures here, so let's be clear-eyed about the maintenance burden these bindings create for linux-block contributors. Not a hard "no" from me, but this isn't something to handwave over.
On Fri, May 05, 2023 at 01:42:42PM -0600, Keith Busch wrote: > On Fri, May 05, 2023 at 05:24:56AM -0700, Boqun Feng wrote: > > > > The Rust bindings are actually the way of communication between > > subsystem mantainers and Rust driver writers, and can help reduce the > > amount of work: You can have the abstraction the way you like. > > We don't have stable APIs or structures here, so let's be clear-eyed Of course, but every API change needs to cover all in-tree users, right? > about the maintenance burden these bindings create for linux-block > contributors. Not a hard "no" from me, but this isn't something to > handwave over. Not trying to handwave over anything ;-) The fact IIUC is simply that Rust drivers can call C functions, so say an in-tree Rust driver does something as follows:

struct Foo {
    ptr: *mut bindings::bio, // A pointer to bio.
    ...
}

impl Foo {
    pub fn bar(&self) {
        unsafe {
            // Calling a C function "do_sth_to_bio":
            //     void do_sth_to_bio(struct bio *bio);
            bindings::do_sth_to_bio(self.ptr);
        }
    }
}

That's a user of the block layer, and that user could exist even without a Bio abstraction. And whenever a linux-block developer wants to refactor "do_sth_to_bio" with slightly different semantics, that user will need to be taken into consideration, meaning reading the Rust code of Foo to understand the usage. Now with a Bio abstraction, the immediate effect would be that no Rust code is allowed to directly call block layer functions without using the Bio abstraction. And hopefully the Bio abstraction along with the other bindings is a good enough place for reasoning about semantics changes or refactoring, so there is no need to read the code of Foo to understand the usage. Of course, some C side changes may result in changes in the Rust bindings as well, but that doesn't make things worse. (One would need to understand Foo in that case if there were no Rust bindings.) Of course, this is just my 2 cents. I could be completely wrong. (Putting the Rust-for-Linux hat on) Needless to say, with or without the Rust bindings for the block layer, we are (at least I am) happy to help with any Rust related questions/bugs/issues for linux-block ;-) Regards, Boqun
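To illustrate the alternative described above, here is a hypothetical sketch of how a Bio abstraction could confine the unsafe FFI call to one place so that drivers only see a safe method. The `bio` type, `do_sth_to_bio` and the `Bio` wrapper mirror the example in the message above and are not the actual kernel crate API; the snippet only illustrates structure and would need the real C symbol at link time.

```rust
mod bindings {
    // Opaque stand-in for the C `struct bio`, plus an FFI declaration like
    // the one a driver would otherwise call directly.
    #[repr(C)]
    pub struct bio {
        _private: [u8; 0],
    }
    extern "C" {
        pub fn do_sth_to_bio(bio: *mut bio);
    }
}

/// Safe wrapper owned by the (hypothetical) block abstraction layer.
pub struct Bio {
    ptr: *mut bindings::bio,
}

impl Bio {
    /// # Safety
    /// `ptr` must point to a live `struct bio` for the lifetime of the wrapper.
    pub unsafe fn from_raw(ptr: *mut bindings::bio) -> Self {
        Bio { ptr }
    }

    /// Drivers call this safe method; only the abstraction contains the
    /// `unsafe` FFI call, so a semantic change to `do_sth_to_bio` is
    /// reasoned about here rather than in every driver.
    pub fn do_sth(&self) {
        unsafe { bindings::do_sth_to_bio(self.ptr) }
    }
}
```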
On Wed, May 03, 2023 at 11:06:57AM +0200, Andreas Hindborg wrote: > The statistics presented in my previous message [1] show that the C null block > driver has had a significant amount of memory safety related problems in the > past. 41% of fixes merged for the C null block driver are fixes for memory > safety issues. This makes the null block driver a good candidate for rewriting > in Rust. Curious, how long does it take to do an analysis like this? Are there efforts to automate this a bit more? We have efforts to use machine learning to evaluate stable candidate patches; we probably should be able to qualify commits as fixing "memory safety", I figure. Because what I'd love to see is if we could easily obtain similar statistics for arbitrary parts of the kernel. The easiest way to break this down might be by kconfig symbol for instance, and then based on that gather more information about subsystems. Then the rationale for considering adopting rust bindings for certain areas of the kernel becomes a bit clearer. I figured some of this work has already been done, but I just haven't seen it yet. Luis
Luis Chamberlain <mcgrof@kernel.org> writes: > On Wed, May 03, 2023 at 11:06:57AM +0200, Andreas Hindborg wrote: >> The statistics presented in my previous message [1] show that the C null block >> driver has had a significant amount of memory safety related problems in the >> past. 41% of fixes merged for the C null block driver are fixes for memory >> safety issues. This makes the null block driver a good candidate for rewriting >> in Rust. > > Curious, how long does it take to do an analysis like this? Are there efforts > to automate this a bit more? We have efforts to use machine learning to > evaluate stable candidate patches; we probably should be able to qualify > commits as fixing "memory safety", I figure. > > Because what I'd love to see is if we could easily obtain similar > statistics for arbitrary parts of the kernel. The easiest way to break > this down might be by kconfig symbol for instance, and then based on > that gather more information about subsystems. > I spent around 4 hours with a spreadsheet and git. It would be cool if that work could be automated. It's not always clear from the commit heading that a commit is a fix. When it is clear that it is a fix, it might not be clear what is fixed. I had to look at the diff of quite a few commits. There is some work mentioning the ratio of memory safety issues fixed in the kernel, but none of it goes into detail for specific subsystems as far as I know. 20% of bugs fixed in stable Linux Kernel branches for drivers are memory safety issues [1]. 65% of recent Linux kernel vulnerabilities are memory safety issues [2]. > Then the rationale for considering adopting rust bindings for certain areas > of the kernel becomes a bit clearer. As mentioned elsewhere in this thread, there are other benefits to deploying Rust than the provable absence of memory safety issues. Best regards Andreas [1] http://dx.doi.org/10.15514/ISPRAS-2018-30(6)-8 [2] https://lssna19.sched.com/event/RHaT/writing-linux-kernel-modules-in-safe-rust-geoffrey-thomas-two-sigma-investments-alex-gaynor-alloy
Hi All, I apologize for the lengthy email, but I have a lot of things to cover. As some of you know, a goal of mine is to make it possible to write blk-mq device drivers in Rust. The RFC patches I have sent to this list are the first steps toward making that goal a reality. They are a sample of the work I am doing. My current plan of action is to provide a Rust API that allows implementation of blk-mq device drivers, along with a Rust implementation of null_blk to serve as a reference implementation. This reference implementation will demonstrate how to use the API. I attended LSF in Vancouver a few weeks back where I led a discussion on the topic. My goal for that session was to obtain input from the community on how to upstream the work as it becomes more mature. I received a lot of feedback during the session, in the hallway, and on the mailing list. Ultimately, we did not achieve consensus on a path forward. I will try to condense the key points raised by the community here. If anyone feels their point is not contained below, please chime in. Please note that I am paraphrasing the points below; they are not citations. 1) "Block layer community does not speak Rust and thus cannot review Rust patches" This work hinges on one of two things happening. Either block layer reviewers and maintainers eventually become fluent in Rust, or they accept code in their tree that is maintained by the "rust people". I very much would prefer the first option. I would suggest using this work to facilitate gradual adoption of Rust. I understand that this will be a multi-year effort. By giving the community access to Rust bindings specifically designed for the block layer, the block layer community will have a helpful reference to consult when investigating Rust. While the block community is getting up to speed in Rust, the Rust for Linux community is ready to conduct review of patches targeting the block layer. Until such a time when Rust code can be reviewed by block layer experts, the work could be gated behind an "EXPERIMENTAL" flag. Selection of the null_blk driver for a reference implementation to drive the Rust block API was not random. The null_blk driver is relatively simple and thus makes for a good platform to demonstrate the Rust API without having to deal with actual hardware. The null_blk driver is a piece of testing infrastructure that is not usually deployed in production environments, so people who are worried about Rust in general will not have to worry about their production environments being infested with Rust. Finally, there have been suggestions to either replace or complement the existing C null_blk driver with the Rust version. I would suggest (eventually, not _now_) complementing the existing driver, since it can be very useful to benchmark and test the two drivers side by side. 2) "Having Rust bindings for the block layer in-tree is a burden for the maintainers" I believe we can integrate the bindings in such a way that any potential breakage in the Rust API does not impact current maintenance work. Maintainers and reviewers that do not wish to bother with Rust should be able to opt out. All Rust parts should be gated behind a default N kconfig option. With this scheme there should be very little inconvenience for current maintainers. I will take the necessary steps to make sure the block layer Rust bindings are always up to date with changes to the kernel C API.
I would run CI against - for-next of https://git.kernel.dk/linux.git - master of https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git - mainline releases including RCs - stable and longterm kernels with queues applied - stable and longterm releases including RCs Samsung will provide resources to support this CI effort. Through this effort I will aim to minimize any inconvenience for maintainers. 3) "How will you detect breakage in the Rust API caused by changes to C code?" The way we call C code from Rust in the kernel guarantees that most changes to C APIs that are called by Rust code will cause a compile failure when building the kernel with Rust enabled. This includes changing C function argument names or types, and struct field names or types. Thus, we do not need to rely on symvers CRC calculation as suggested by James Bottomley at LSF. However, if the semantics of a kernel C function is changed without changing its name or signature, potential breakage will not be detected by the build system. To detect breakage resulting from this kind of change, we have to rely _on the same mechanics_ that maintainers of kernel C code are relying on today: - kunit tests - blktests - fstests - staying in the loop wrt changes in general We also have Rust support in Intel 0-day CI, although only compile tests for now. 4) "How will you prevent breakage in C code resulting from changes to Rust code" The way the Rust API is designed, existing C code is not going to be reliant on Rust code. If anything breaks just disable Rust and no Rust code will be built. Or disable block layer Rust code if you want to keep general Rust support. If Rust is disabled by default, nothing in the kernel should break because of Rust, if not explicitly enabled. 5) "Block drivers in general are not security sensitive because they are mostly privileged code and have limited user visible API" There are probably easier ways to exploit a Linux system than to target the block layer, although people are plugging in potentially malicious block devices all the time in the form of USB Mass Storage devices or CF cards. While memory safety is very relevant for preventing exploitable security vulnerabilities, it is also incredibly useful in preventing memory safety bugs in general. Fewer bugs means less risk of bugs leading to data corruption. It means less time spent on tracking down and fixing bugs, and less time spent reviewing bug fixes. It also means less time required to review patches in general, because reviewers do not have to review for memory safety issues. So while Rust has high merit in exposed and historically exploited subsystems, this does not mean that it has no merit in other subsystems. 6) "Other subsystems may benefit more from adopting Rust" While this might be true, it does not prevent the block subsystem from benefiting from adopting Rust (see 5). 7) "Do not waste time re-implementing null_blk, it is test infrastructure so memory safety does not matter. Why don't you do loop instead?" I strongly believe that memory safety is also relevant in test infrastructure. We waste time and energy fixing memory safety issues in our code, no matter if the code is test infrastructure or not. I refer to the statistics I posted to the list at an earlier date [3]. Further, I think it is a benefit to all if the storage community can become fluent in Rust before any critical infrastructure is deployed using Rust. This is one reason that I switched my efforts to null_block and that I am not pushing Rust NVMe. 
8) "Why don't you wait with this work until you have a driver for a new storage standard" Let's be proactive. I think it is important to iron out the details of the Rust API before we implement any potential new driver. When we eventually need to implement a driver for a future storage standard, the choice to do so in Rust should be easy. By making the API available ahead of time, we will be able to provide future developers with a stable implementation to choose from. 9) "You are a new face in our community. How do we know you will not disappear?" I recognize this consideration and I acknowledge that the community is trust based. Trust takes time to build. I can do little more than state that I intend to stay with my team at Samsung to take care of this project for many years to come. Samsung is behind this particular effort. In general Google and Microsoft are actively contributing to the wider Rust for Linux project. Perhaps that can be an indication that the project in general is not going away. 10) "How can I learn how to build the kernel with Rust enabled?" We have a guide in `Documentation/rust/quick-start.rst`. If that guide does not get you started, please reach out to us [1] and we will help you get started (and fix the documentation since it must not be good enough then). 11) "What if something catches fire and you are out of office?" If I am for some reason not responding to pings during a merge, please contact the Rust subsystem maintainer and the Rust for Linux list [2]. There are quite a few people capable of firefighting if it should ever become necessary. 12) "These patches are not ready yet, we should not accept them" They most definitely are _not_ ready, and I would not ask for them to be included at all in their current state. The RFC is meant to give a sample of the work that I am doing and to start this conversation. I would rather have this conversation preemptively. I did not intend to give the impression that the patches are in a finalized state at all. With all this in mind I would suggest that we treat the Rust block layer API and associated null block driver as an experiment. I would suggest that we merge it in when it is ready, and we gate it behind an experimental kconfig option. If it turns out that all your worst nightmares come true and it becomes an unbearable load for maintainers, reviewers and contributors, it will be low effort remove it again. I very much doubt this will be the case though. Jens, Kieth, Christoph, Ming, I would kindly ask you to comment on my suggestion for next steps, or perhaps suggest an alternate path. In general I would appreciate any constructive feedback from the community. [1] https://rust-for-linux.com/contact [2] rust-for-linux@vger.kernel.org [3] https://lore.kernel.org/all/87y1ofj5tt.fsf@metaspace.dk/ Best regards, Andreas Hindborg
On Tue, Jun 6, 2023 at 3:40 PM Andreas Hindborg (Samsung) <nmi@metaspace.dk> wrote: > > Samsung will provide resources to support this CI effort. Through this effort > I will aim to minimize any inconvenience for maintainers. This is great. > We also have Rust support in Intel 0-day CI, although only compile tests for > now. In addition, we also have initial Rust support in KernelCI (including `linux-next` [1], `rust-next` and the old `rust` branch), which could be expanded upon, especially with more resources (Cc'ing Guillaume). Moreover, our plan is to replicate the simple CI we originally had for the `rust` branch (which included QEMU boot tests under a matrix of several archs/configs/compilers) to our `rust-next`, `rust-fixes` and `rust-dev` branches, in order to complement the other CIs and to get some early smoke testing (especially for `rust-dev`). [1] https://github.com/kernelci/kernelci-core/blob/65b0900438e0ed20e7efe0ada681ab212dc8c774/config/core/build-configs.yaml#L1152-L1197 Cheers, Miguel
Hi, Yexuan Yang <1182282462@bupt.edu.cn> writes: >> Over the 432 benchmark configurations, the relative performance of the Rust >> driver to the C driver (in terms of IOPS) is between 6.8 and -11.8 percent with >> an average of 0.2 percent better for reads. For writes the span is 16.8 to -4.5 >> percent with an average of 0.9 percent better. >> >> For each measurement the drivers are loaded, a drive is configured with memory >> backing and a size of 4 GiB. C null_blk is configured to match the implemented >> modes of the Rust driver: `blocksize` is set to 4 KiB, `completion_nsec` to 0, >> `irqmode` to 0 (IRQ_NONE), `queue_mode` to 2 (MQ), `hw_queue_depth` to 256 and >> `memory_backed` to 1. For both the drivers, the queue scheduler is set to >> `none`. These measurements are made using 30 second runs of `fio` with the >> `PSYNC` IO engine with workers pinned to separate CPU cores. The measurements >> are done inside a virtual machine (qemu/kvm) on an Intel Alder Lake workstation >> (i5-12600). > > Hi All! > I have some problems about your benchmark test. > In Ubuntu 22.02, I compiled an RFL kernel with both C null_blk driver and Rust > null_blk_driver as modules. For the C null_blk driver, I used the `modprobe` > command to set relevant parameters, while for the Rust null_blk_driver, I simply > started it. I used the following two commands to start the drivers: > > ```bash > sudo modprobe null_blk \ > queue_mode=2 irqmode=0 hw_queue_depth=256 \ > memory_backed=1 bs=4096 completion_nsec=0 \ > no_sched=1 gb=4; > sudo modprobe rnull_mod > ``` > > After that, I tested their performance in `randread` with the fio command, specifying the first parameter as 4 and the second parameter as 1: > > ```bash > fio --filename=/dev/nullb0 --iodepth=64 --ioengine=psync --direct=1 --rw=randread --bs=$1k --numjobs=$2 --runtime=30 --group_reporting --name=test-rand-read --output=test_c_randread.log > fio --filename=/dev/rnullb0 --iodepth=64 --ioengine=psync --direct=1 --rw=randread --bs=$1k --numjobs=$2 --runtime=30 --group_reporting --name=test-rand-read --output=test_rust_randread.log > ``` > > The test results showed a significant performance difference between the two > drivers, which was drastically different from the data you tested in the > community. Specifically, for `randread`, the C driver had a bandwidth of > 487MiB/s and IOPS of 124.7k, while the Rust driver had a bandwidth of 264MiB/s > and IOPS of 67.7k. However, for other I/O types, the performance of the C and > Rust drivers was more similar. Could you please provide more information about > the actual bandwidth and IOPS data from your tests, rather than just the > performance difference between the C and Rust drivers? Additionally, if you > could offer possible reasons for this abnormality, I would greatly appreciate > it! Thanks for trying out the code! I am not sure why you get these numbers. I am currently out of office, but I will rerun the benchmarks next week when I get back in. Maybe I can provide you with some scripts and my kernel configuration. Hopefully we can figure out the difference in our setups. Best regards, Andreas
Hi, "Andreas Hindborg (Samsung)" <nmi@metaspace.dk> writes: > Hi, > > Yexuan Yang <1182282462@bupt.edu.cn> writes: > >>> Over the 432 benchmark configurations, the relative performance of the Rust >>> driver to the C driver (in terms of IOPS) is between 6.8 and -11.8 percent with >>> an average of 0.2 percent better for reads. For writes the span is 16.8 to -4.5 >>> percent with an average of 0.9 percent better. >>> >>> For each measurement the drivers are loaded, a drive is configured with memory >>> backing and a size of 4 GiB. C null_blk is configured to match the implemented >>> modes of the Rust driver: `blocksize` is set to 4 KiB, `completion_nsec` to 0, >>> `irqmode` to 0 (IRQ_NONE), `queue_mode` to 2 (MQ), `hw_queue_depth` to 256 and >>> `memory_backed` to 1. For both the drivers, the queue scheduler is set to >>> `none`. These measurements are made using 30 second runs of `fio` with the >>> `PSYNC` IO engine with workers pinned to separate CPU cores. The measurements >>> are done inside a virtual machine (qemu/kvm) on an Intel Alder Lake workstation >>> (i5-12600). >> >> Hi All! >> I have some problems about your benchmark test. >> In Ubuntu 22.02, I compiled an RFL kernel with both C null_blk driver and Rust >> null_blk_driver as modules. For the C null_blk driver, I used the `modprobe` >> command to set relevant parameters, while for the Rust null_blk_driver, I simply >> started it. I used the following two commands to start the drivers: >> >> ```bash >> sudo modprobe null_blk \ >> queue_mode=2 irqmode=0 hw_queue_depth=256 \ >> memory_backed=1 bs=4096 completion_nsec=0 \ >> no_sched=1 gb=4; >> sudo modprobe rnull_mod >> ``` >> >> After that, I tested their performance in `randread` with the fio command, specifying the first parameter as 4 and the second parameter as 1: >> >> ```bash >> fio --filename=/dev/nullb0 --iodepth=64 --ioengine=psync --direct=1 --rw=randread --bs=$1k --numjobs=$2 --runtime=30 --group_reporting --name=test-rand-read --output=test_c_randread.log >> fio --filename=/dev/rnullb0 --iodepth=64 --ioengine=psync --direct=1 --rw=randread --bs=$1k --numjobs=$2 --runtime=30 --group_reporting --name=test-rand-read --output=test_rust_randread.log >> ``` >> >> The test results showed a significant performance difference between the two >> drivers, which was drastically different from the data you tested in the >> community. Specifically, for `randread`, the C driver had a bandwidth of >> 487MiB/s and IOPS of 124.7k, while the Rust driver had a bandwidth of 264MiB/s >> and IOPS of 67.7k. However, for other I/O types, the performance of the C and >> Rust drivers was more similar. Could you please provide more information about >> the actual bandwidth and IOPS data from your tests, rather than just the >> performance difference between the C and Rust drivers? Additionally, if you >> could offer possible reasons for this abnormality, I would greatly appreciate >> it! > > Thanks for trying out the code! I am not sure why you get these numbers. > I am currently out of office, but I will rerun the benchmarks next week > when I get back in. Maybe I can provide you with some scripts and my > kernel configuration. Hopefully we can figure out the difference in our > setups. > I just ran the benchmarks for that configuration again. My setup is an Intel Alder Lake CPU (i5-12600) with Debian Bullseye user space running a 6.2.12 host kernel with a kernel config based on the stock Debian Bullseye (debian config + make olddefconfig). 32 GB of DDR5 4800MHz memory. 
The benchmarks (and patched kernel) run inside QEMU KVM (from the Debian repos, version 1:5.2+dfsg-11+deb11u2). I use the following qemu command to boot (edited a bit for clarity): "qemu-system-x86_64" "-nographic" "-enable-kvm" "-m" "16G" "-cpu" "host" "-M" "q35" "-smp" "6" "-kernel" "vmlinux" "-append" "console=ttyS0 nokaslr rdinit=/sbin/init root=/dev/vda1 null_blk.nr_devices=0" I use a Debian Bullseye user space inside the virtual machine. I think I used the stock Bullseye kernel for the host when I did the numbers for the cover letter, so that is different for this run. Here are the results:

+---------+----------+------+---------------------+
| jobs/bs | workload | prep | 4                   |
+---------+----------+------+---------------------+
| 4       | randread | 1    | 0.28 0.00 (1.7,0.0) |
| 4       | randread | 0    | 5.75 0.00 (6.8,0.0) |
+---------+----------+------+---------------------+

I used the following parameters for both tests:

Block size: 4k
Run time: 120s
Workload: randread
Jobs: 4

This is the `fio` command I used: ['fio', '--group_reporting', '--name=default', '--filename=/dev/<snip>', '--time_based=1', '--runtime=120', '--rw=randread', '--output=<snip>', '--output-format=json', '--numjobs=4', '--cpus_allowed=0-3', '--cpus_allowed_policy=split', '--ioengine=psync', '--bs=4k', '--direct=1', '--norandommap', '--random_generator=lfsr'] For the line in the table where `prep` is 1, I filled up the target device with data before running the benchmark. I use the following `fio` job to do that: ['fio', '--name=prep', '--rw=write', '--filename=/dev/None', '--bs=4k', '--direct=1'] Did you prepare the volumes with data in your experiment? How to read the data in the table: For the benchmark where the drive is prepared, Rust null_blk performs 0.28 percent better than C null_blk. For the case where the drive is empty, Rust null_blk performs 5.75 percent better than C null_blk. I calculated the relative performance as "(r_iops - c_iops) / c_iops * 100". The IOPS info you request is present in the tables from the cover letter. As mentioned in the cover letter, the numbers in parentheses in each cell are throughput in 1,000,000 IOPS for read and write for the C null_blk driver. For the table above, C null_blk did 1,700,000 read IOPS in the case where the drive is prepared and 6,800,000 IOPS in the case where the drive is not prepared. You can obtain bandwidth by multiplying IOPS by block size. I can also provide the raw json output of `fio` if you want to have a look. One last note is that I unload the module between each run and I do not have both modules loaded at the same time. If you are exhausting your memory pool, this could maybe impact performance. So things to check:

- Do you prepare the drive?
- Do you release drive memory after the test? (You can unload the module.)
- Use the lfsr random number generator.
- Do not use the randommap.

I am not sure what hardware you are running on, but the throughput numbers you obtain seem _really_ low. Best regards Andreas Hindborg
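As a worked example of that calculation, using the prepared-drive row in the table above: C null_blk did about 1,700,000 read IOPS and the relative figure is 0.28, so the Rust driver did roughly 1,700,000 * 1.0028, which is about 1,705,000 IOPS, i.e. around 5,000 IOPS more. At a 4 KiB block size, either driver moves roughly 6.5 GiB/s of read bandwidth.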