From patchwork Fri Mar 24 13:58:06 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 74562 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp649083vqo; Fri, 24 Mar 2023 07:03:58 -0700 (PDT) X-Google-Smtp-Source: AKy350Y8d/RGbBgq56GPVbrR2n7in08S6AHHIWFnVdgXmi26GB6WmD+V7wIhYG47/dIp2231KrX7 X-Received: by 2002:aa7:c6c8:0:b0:4f9:deb4:b986 with SMTP id b8-20020aa7c6c8000000b004f9deb4b986mr2638923eds.7.1679666637760; Fri, 24 Mar 2023 07:03:57 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1679666637; cv=none; d=google.com; s=arc-20160816; b=Laniie2d/4bDi0m3LxSuGhKfxA+gbA+N3SzS56j5YvqF67nCOdc9B0m1UK/LETmopg pWJEn9q0Ckty4mDXOQjioae5UjHq3pXf2n1jyouWjAzyw9gbbIOlg8n5MBiGWk8l6KJr IWb9WM/9ZMNpXA8ot6khJtqXyWidaQ0bi3dUUyieJu8y/NULTjU5D4/zCYs1M5H56HyV Ht6H6Mhe7XANa01uBDTx6nmpPBJqlZgNzql2Uk/8F/RJN5OuAJRa8rEcQv/tQLjQ3jJm 5QxFLSwA5nslGvOkVyDnq3+On2AtynUfiyNE/l87VX8QbGGLXbQE1S7OZDo2oLonw2BV 7EZA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=UtTkIiWab0XMlZOmjKpa8MKYPjT9l5NSYtOz0X+FN4U=; b=GDVaxaZQRVskIruTc5vVWsLOJp07vS1gwTQX7Wru7RXxbYR4Whz7cx4omXX3sv765W IQbF5VwMT8mRIbSC9K5aTaOZ3reJbgr/RFaLqLFaz1OGFfO27FidtuFN8LyRLfgmZzs+ UxitmypQetZl4PlAWsLx9kQmwc+5PAYae8zpR5hCadQSmko+8TN2XxmRDFRvSH53ETsL WiMNNPvKgl7LHsRRzI6LdvXsLYIdXrBtEKRk0zI9MwJDSPN3y1H57TX4PSOY1/2YS0r5 LXcdCnfDXY4jHN9k6GH3DsOmYYt0aXSFPrbT56LpUJLobdGSgs0F/lVuql0UG1g9pB0E jg6g== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=NAFVWh8k; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id dy20-20020a05640231f400b004fa760de417si22394040edb.122.2023.03.24.07.03.19; Fri, 24 Mar 2023 07:03:57 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=NAFVWh8k; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232182AbjCXOBB (ORCPT + 99 others); Fri, 24 Mar 2023 10:01:01 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52666 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231960AbjCXOAZ (ORCPT ); Fri, 24 Mar 2023 10:00:25 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 146A71C595 for ; Fri, 24 Mar 2023 06:59:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1679666357; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=UtTkIiWab0XMlZOmjKpa8MKYPjT9l5NSYtOz0X+FN4U=; b=NAFVWh8k4dTBQ5hU08zFBPDIQU3B9pLyT0aCaDU1Q2t2Nc5EKxp1JLyaSF1qlHc4an8Z32 eVuB4bPCdkpDiDrHvkqfIMhcP+6MZ1gCn+o9mizbcnwwwKkqWsy/6T9HhY3nDNzPpO9JYV oHBnVDvcJU76Lk5GHDMGgWDwkGkrYEs= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-99-pmqtjrANN9-T7WRDlRT_Lg-1; Fri, 24 Mar 2023 09:59:14 -0400 X-MC-Unique: pmqtjrANN9-T7WRDlRT_Lg-1 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.rdu2.redhat.com [10.11.54.1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 8EE9A385556D; Fri, 24 Mar 2023 13:59:13 +0000 (UTC) Received: from localhost (ovpn-8-20.pek2.redhat.com [10.72.8.20]) by smtp.corp.redhat.com (Postfix) with ESMTP id A326C4042AC9; Fri, 24 Mar 2023 13:59:12 +0000 (UTC) From: Ming Lei To: Jens Axboe , io-uring@vger.kernel.org, linux-block@vger.kernel.org Cc: linux-kernel@vger.kernel.org, Miklos Szeredi , ZiyangZhang , Xiaoguang Wang , Bernd Schubert , Pavel Begunkov , Stefan Hajnoczi , Ming Lei Subject: [PATCH V4 15/17] block: ublk_drv: add read()/write() support for ublk char device Date: Fri, 24 Mar 2023 21:58:06 +0800 Message-Id: <20230324135808.855245-16-ming.lei@redhat.com> In-Reply-To: <20230324135808.855245-1-ming.lei@redhat.com> References: <20230324135808.855245-1-ming.lei@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.1 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1761258124416689884?= X-GMAIL-MSGID: =?utf-8?q?1761258124416689884?= We are going to support zero copy by fused uring command, the userspace can't read from or write to the io buffer any more, it becomes not flexible for applications: 1) some targets need to zero buffer explicitly, such as when reading unmapped qcow2 cluster 2) some targets need to support passthrough command, such as zoned report zones, and still need to read/write the io buffer Support pread()/pwrite() on ublk char device for reading/writing request io buffer, so ublk server can handle the above cases easily. This also can help to make zero copy becoming the primary option, and non-zero-copy will become legacy code path since the added read()/write() can cover non-zero-copy feature. Signed-off-by: Ming Lei --- drivers/block/ublk_drv.c | 131 ++++++++++++++++++++++++++++++++++ include/uapi/linux/ublk_cmd.h | 31 +++++++- 2 files changed, 161 insertions(+), 1 deletion(-) diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c index 26a14c54da1d..e6b528750b3c 100644 --- a/drivers/block/ublk_drv.c +++ b/drivers/block/ublk_drv.c @@ -1317,6 +1317,36 @@ static void ublk_handle_need_get_data(struct ublk_device *ub, int q_id, ublk_queue_cmd(ubq, req); } +static inline struct request *__ublk_check_and_get_req(struct ublk_device *ub, + struct ublk_queue *ubq, int tag, size_t offset) +{ + struct request *req; + + if (!ublk_support_zc(ubq)) + return NULL; + + req = blk_mq_tag_to_rq(ub->tag_set.tags[ubq->q_id], tag); + if (!req) + return NULL; + + if (!ublk_get_req_ref(ubq, req)) + return NULL; + + if (unlikely(!blk_mq_request_started(req) || req->tag != tag)) + goto fail_put; + + if (!ublk_rq_has_data(req)) + goto fail_put; + + if (offset > blk_rq_bytes(req)) + goto fail_put; + + return req; +fail_put: + ublk_put_req_ref(ubq, req); + return NULL; +} + static int ublk_ch_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags) { struct ublksrv_io_cmd *ub_cmd = (struct ublksrv_io_cmd *)cmd->cmd; @@ -1418,11 +1448,112 @@ static int ublk_ch_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags) return -EIOCBQUEUED; } +static inline bool ublk_check_ubuf_dir(const struct request *req, + int ubuf_dir) +{ + /* copy ubuf to request pages */ + if (req_op(req) == REQ_OP_READ && ubuf_dir == ITER_SOURCE) + return true; + + /* copy request pages to ubuf */ + if (req_op(req) == REQ_OP_WRITE && ubuf_dir == ITER_DEST) + return true; + + return false; +} + +static struct request *ublk_check_and_get_req(struct kiocb *iocb, + struct iov_iter *iter, size_t *off, int dir) +{ + struct ublk_device *ub = iocb->ki_filp->private_data; + struct ublk_queue *ubq; + struct request *req; + size_t buf_off; + u16 tag, q_id; + + if (!ub) + return ERR_PTR(-EACCES); + + if (!user_backed_iter(iter)) + return ERR_PTR(-EACCES); + + if (ub->dev_info.state == UBLK_S_DEV_DEAD) + return ERR_PTR(-EACCES); + + tag = ublk_pos_to_tag(iocb->ki_pos); + q_id = ublk_pos_to_hwq(iocb->ki_pos); + buf_off = ublk_pos_to_buf_offset(iocb->ki_pos); + + if (q_id >= ub->dev_info.nr_hw_queues) + return ERR_PTR(-EINVAL); + + ubq = ublk_get_queue(ub, q_id); + if (!ubq) + return ERR_PTR(-EINVAL); + + if (tag >= ubq->q_depth) + return ERR_PTR(-EINVAL); + + req = __ublk_check_and_get_req(ub, ubq, tag, buf_off); + if (!req) + return ERR_PTR(-EINVAL); + + if (!req->mq_hctx || !req->mq_hctx->driver_data) + goto fail; + + if (!ublk_check_ubuf_dir(req, dir)) + goto fail; + + *off = buf_off; + return req; +fail: + ublk_put_req_ref(ubq, req); + return ERR_PTR(-EACCES); +} + +static ssize_t ublk_ch_read_iter(struct kiocb *iocb, struct iov_iter *to) +{ + struct ublk_queue *ubq; + struct request *req; + size_t buf_off; + size_t ret; + + req = ublk_check_and_get_req(iocb, to, &buf_off, ITER_DEST); + if (unlikely(IS_ERR(req))) + return PTR_ERR(req); + + ret = ublk_copy_user_pages(req, buf_off, to, ITER_DEST); + ubq = req->mq_hctx->driver_data; + ublk_put_req_ref(ubq, req); + + return ret; +} + +static ssize_t ublk_ch_write_iter(struct kiocb *iocb, struct iov_iter *from) +{ + struct ublk_queue *ubq; + struct request *req; + size_t buf_off; + size_t ret; + + req = ublk_check_and_get_req(iocb, from, &buf_off, ITER_SOURCE); + if (unlikely(IS_ERR(req))) + return PTR_ERR(req); + + ret = ublk_copy_user_pages(req, buf_off, from, ITER_SOURCE); + ubq = req->mq_hctx->driver_data; + ublk_put_req_ref(ubq, req); + + return ret; +} + static const struct file_operations ublk_ch_fops = { .owner = THIS_MODULE, .open = ublk_ch_open, .release = ublk_ch_release, .llseek = no_llseek, + .read_iter = ublk_ch_read_iter, + .write_iter = ublk_ch_write_iter, .uring_cmd = ublk_ch_uring_cmd, .mmap = ublk_ch_mmap, }; diff --git a/include/uapi/linux/ublk_cmd.h b/include/uapi/linux/ublk_cmd.h index f6238ccc7800..d1a6b3dc0327 100644 --- a/include/uapi/linux/ublk_cmd.h +++ b/include/uapi/linux/ublk_cmd.h @@ -54,7 +54,36 @@ #define UBLKSRV_IO_BUF_OFFSET 0x80000000 /* tag bit is 12bit, so at most 4096 IOs for each queue */ -#define UBLK_MAX_QUEUE_DEPTH 4096 +#define UBLK_TAG_BITS 12 +#define UBLK_MAX_QUEUE_DEPTH (1U << UBLK_TAG_BITS) + +/* used for locating each io buffer for pread()/pwrite() on char device */ +#define UBLK_BUFS_SIZE_BITS 42 +#define UBLK_BUFS_SIZE_MASK ((1ULL << UBLK_BUFS_SIZE_BITS) - 1) +#define UBLK_BUF_SIZE_BITS (UBLK_BUFS_SIZE_BITS - UBLK_TAG_BITS) +#define UBLK_BUF_MAX_SIZE (1ULL << UBLK_BUF_SIZE_BITS) + +static inline __u16 ublk_pos_to_hwq(__u64 pos) +{ + return pos >> UBLK_BUFS_SIZE_BITS; +} + +static inline __u32 ublk_pos_to_buf_offset(__u64 pos) +{ + return (pos & UBLK_BUFS_SIZE_MASK) & (UBLK_BUF_MAX_SIZE - 1); +} + +static inline __u16 ublk_pos_to_tag(__u64 pos) +{ + return (pos & UBLK_BUFS_SIZE_MASK) >> UBLK_BUF_SIZE_BITS; +} + +/* offset of single buffer, which has to be < UBLK_BUX_MAX_SIZE */ +static inline __u64 ublk_pos(__u16 q_id, __u16 tag, __u32 offset) +{ + return (((__u64)q_id) << UBLK_BUFS_SIZE_BITS) | + ((((__u64)tag) << UBLK_BUF_SIZE_BITS) + offset); +} /* * zero copy requires 4k block size, and can remap ublk driver's io