From patchwork Fri Mar 24 13:57:52 2023
X-Patchwork-Submitter: Ming Lei
X-Patchwork-Id: 74552
From: Ming Lei
To: Jens Axboe, io-uring@vger.kernel.org, linux-block@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, Miklos Szeredi, ZiyangZhang,
 Xiaoguang Wang, Bernd Schubert, Pavel Begunkov, Stefan Hajnoczi, Ming Lei
Subject: [PATCH V4 01/17] io_uring: increase io_kiocb->flags into 64bit
Date: Fri, 24 Mar 2023 21:57:52 +0800
Message-Id: <20230324135808.855245-2-ming.lei@redhat.com>
In-Reply-To: <20230324135808.855245-1-ming.lei@redhat.com>
References: <20230324135808.855245-1-ming.lei@redhat.com>

The 32bit io_kiocb->flags has been used up, so extend it to 64bit.
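
As a rough illustration of why the flag definitions move from BIT() to
BIT_ULL(), the userspace sketch below shows that a flag at bit index 32 is
lost when stored in a 32-bit word but survives in a 64-bit one. DEMO_BIT_ULL,
the DEMO_* bit indexes and both helper functions are hypothetical names made
up for this example; they are not kernel definitions, and the sketch only
assumes a C11 compiler.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's BIT_ULL(): always a 64-bit mask. */
#define DEMO_BIT_ULL(nr) (1ULL << (nr))

/* Hypothetical bit indexes: 31 is the last bit a 32-bit word can hold. */
enum {
	DEMO_LAST_32BIT_FLAG_BIT = 31,
	DEMO_FIRST_EXTRA_FLAG_BIT = 32,
	DEMO_LAST_BIT,
};

/* Same idea as the BUILD_BUG_ON() check: every bit index must fit in u64. */
_Static_assert(DEMO_LAST_BIT <= 8 * sizeof(uint64_t), "flag bits overflow u64");

/* Returns nonzero if @flag survives being stored in a 32-bit flags word. */
static int demo_fits_in_u32(uint64_t flag)
{
	return (uint64_t)(uint32_t)flag == flag;
}

/* Sets @flag in a 64-bit flags word and reports whether it is then set. */
static int demo_set_and_test(uint64_t flags, uint64_t flag)
{
	flags |= flag;
	return (flags & flag) != 0;
}
```

With these helpers, demo_fits_in_u32(DEMO_BIT_ULL(32)) is false while
demo_set_and_test(0, DEMO_BIT_ULL(32)) is true, which is the situation a
33rd request flag would run into.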
Signed-off-by: Ming Lei
---
 include/linux/io_uring_types.h | 65 +++++++++++++++++-----------------
 io_uring/io_uring.c            |  2 +-
 2 files changed, 34 insertions(+), 33 deletions(-)

diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 3d152bdcd30a..aab657cd2b77 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -409,68 +409,68 @@ enum {
 
 enum {
 	/* ctx owns file */
-	REQ_F_FIXED_FILE	= BIT(REQ_F_FIXED_FILE_BIT),
+	REQ_F_FIXED_FILE	= BIT_ULL(REQ_F_FIXED_FILE_BIT),
 	/* drain existing IO first */
-	REQ_F_IO_DRAIN		= BIT(REQ_F_IO_DRAIN_BIT),
+	REQ_F_IO_DRAIN		= BIT_ULL(REQ_F_IO_DRAIN_BIT),
 	/* linked sqes */
-	REQ_F_LINK		= BIT(REQ_F_LINK_BIT),
+	REQ_F_LINK		= BIT_ULL(REQ_F_LINK_BIT),
 	/* doesn't sever on completion < 0 */
-	REQ_F_HARDLINK		= BIT(REQ_F_HARDLINK_BIT),
+	REQ_F_HARDLINK		= BIT_ULL(REQ_F_HARDLINK_BIT),
 	/* IOSQE_ASYNC */
-	REQ_F_FORCE_ASYNC	= BIT(REQ_F_FORCE_ASYNC_BIT),
+	REQ_F_FORCE_ASYNC	= BIT_ULL(REQ_F_FORCE_ASYNC_BIT),
 	/* IOSQE_BUFFER_SELECT */
-	REQ_F_BUFFER_SELECT	= BIT(REQ_F_BUFFER_SELECT_BIT),
+	REQ_F_BUFFER_SELECT	= BIT_ULL(REQ_F_BUFFER_SELECT_BIT),
 	/* IOSQE_CQE_SKIP_SUCCESS */
-	REQ_F_CQE_SKIP		= BIT(REQ_F_CQE_SKIP_BIT),
+	REQ_F_CQE_SKIP		= BIT_ULL(REQ_F_CQE_SKIP_BIT),
 
 	/* fail rest of links */
-	REQ_F_FAIL		= BIT(REQ_F_FAIL_BIT),
+	REQ_F_FAIL		= BIT_ULL(REQ_F_FAIL_BIT),
 	/* on inflight list, should be cancelled and waited on exit reliably */
-	REQ_F_INFLIGHT		= BIT(REQ_F_INFLIGHT_BIT),
+	REQ_F_INFLIGHT		= BIT_ULL(REQ_F_INFLIGHT_BIT),
 	/* read/write uses file position */
-	REQ_F_CUR_POS		= BIT(REQ_F_CUR_POS_BIT),
+	REQ_F_CUR_POS		= BIT_ULL(REQ_F_CUR_POS_BIT),
 	/* must not punt to workers */
-	REQ_F_NOWAIT		= BIT(REQ_F_NOWAIT_BIT),
+	REQ_F_NOWAIT		= BIT_ULL(REQ_F_NOWAIT_BIT),
 	/* has or had linked timeout */
-	REQ_F_LINK_TIMEOUT	= BIT(REQ_F_LINK_TIMEOUT_BIT),
+	REQ_F_LINK_TIMEOUT	= BIT_ULL(REQ_F_LINK_TIMEOUT_BIT),
 	/* needs cleanup */
-	REQ_F_NEED_CLEANUP	= BIT(REQ_F_NEED_CLEANUP_BIT),
+	REQ_F_NEED_CLEANUP	= BIT_ULL(REQ_F_NEED_CLEANUP_BIT),
 	/* already went through poll handler */
-	REQ_F_POLLED		= BIT(REQ_F_POLLED_BIT),
+	REQ_F_POLLED		= BIT_ULL(REQ_F_POLLED_BIT),
 	/* buffer already selected */
-	REQ_F_BUFFER_SELECTED	= BIT(REQ_F_BUFFER_SELECTED_BIT),
+	REQ_F_BUFFER_SELECTED	= BIT_ULL(REQ_F_BUFFER_SELECTED_BIT),
 	/* buffer selected from ring, needs commit */
-	REQ_F_BUFFER_RING	= BIT(REQ_F_BUFFER_RING_BIT),
+	REQ_F_BUFFER_RING	= BIT_ULL(REQ_F_BUFFER_RING_BIT),
 	/* caller should reissue async */
-	REQ_F_REISSUE		= BIT(REQ_F_REISSUE_BIT),
+	REQ_F_REISSUE		= BIT_ULL(REQ_F_REISSUE_BIT),
 	/* supports async reads/writes */
-	REQ_F_SUPPORT_NOWAIT	= BIT(REQ_F_SUPPORT_NOWAIT_BIT),
+	REQ_F_SUPPORT_NOWAIT	= BIT_ULL(REQ_F_SUPPORT_NOWAIT_BIT),
 	/* regular file */
-	REQ_F_ISREG		= BIT(REQ_F_ISREG_BIT),
+	REQ_F_ISREG		= BIT_ULL(REQ_F_ISREG_BIT),
 	/* has creds assigned */
-	REQ_F_CREDS		= BIT(REQ_F_CREDS_BIT),
+	REQ_F_CREDS		= BIT_ULL(REQ_F_CREDS_BIT),
 	/* skip refcounting if not set */
-	REQ_F_REFCOUNT		= BIT(REQ_F_REFCOUNT_BIT),
+	REQ_F_REFCOUNT		= BIT_ULL(REQ_F_REFCOUNT_BIT),
 	/* there is a linked timeout that has to be armed */
-	REQ_F_ARM_LTIMEOUT	= BIT(REQ_F_ARM_LTIMEOUT_BIT),
+	REQ_F_ARM_LTIMEOUT	= BIT_ULL(REQ_F_ARM_LTIMEOUT_BIT),
 	/* ->async_data allocated */
-	REQ_F_ASYNC_DATA	= BIT(REQ_F_ASYNC_DATA_BIT),
+	REQ_F_ASYNC_DATA	= BIT_ULL(REQ_F_ASYNC_DATA_BIT),
 	/* don't post CQEs while failing linked requests */
-	REQ_F_SKIP_LINK_CQES	= BIT(REQ_F_SKIP_LINK_CQES_BIT),
+	REQ_F_SKIP_LINK_CQES	= BIT_ULL(REQ_F_SKIP_LINK_CQES_BIT),
 	/* single poll may be active */
-	REQ_F_SINGLE_POLL	= BIT(REQ_F_SINGLE_POLL_BIT),
+	REQ_F_SINGLE_POLL	= BIT_ULL(REQ_F_SINGLE_POLL_BIT),
 	/* double poll may active */
-	REQ_F_DOUBLE_POLL	= BIT(REQ_F_DOUBLE_POLL_BIT),
+	REQ_F_DOUBLE_POLL	= BIT_ULL(REQ_F_DOUBLE_POLL_BIT),
 	/* request has already done partial IO */
-	REQ_F_PARTIAL_IO	= BIT(REQ_F_PARTIAL_IO_BIT),
+	REQ_F_PARTIAL_IO	= BIT_ULL(REQ_F_PARTIAL_IO_BIT),
 	/* fast poll multishot mode */
-	REQ_F_APOLL_MULTISHOT	= BIT(REQ_F_APOLL_MULTISHOT_BIT),
+	REQ_F_APOLL_MULTISHOT	= BIT_ULL(REQ_F_APOLL_MULTISHOT_BIT),
 	/* ->extra1 and ->extra2 are initialised */
-	REQ_F_CQE32_INIT	= BIT(REQ_F_CQE32_INIT_BIT),
+	REQ_F_CQE32_INIT	= BIT_ULL(REQ_F_CQE32_INIT_BIT),
 	/* recvmsg special flag, clear EPOLLIN */
-	REQ_F_CLEAR_POLLIN	= BIT(REQ_F_CLEAR_POLLIN_BIT),
+	REQ_F_CLEAR_POLLIN	= BIT_ULL(REQ_F_CLEAR_POLLIN_BIT),
 	/* hashed into ->cancel_hash_locked, protected by ->uring_lock */
-	REQ_F_HASH_LOCKED	= BIT(REQ_F_HASH_LOCKED_BIT),
+	REQ_F_HASH_LOCKED	= BIT_ULL(REQ_F_HASH_LOCKED_BIT),
 };
 
 typedef void (*io_req_tw_func_t)(struct io_kiocb *req, bool *locked);
@@ -531,7 +531,8 @@ struct io_kiocb {
 	 * and after selection it points to the buffer ID itself.
 	 */
 	u16				buf_index;
-	unsigned int			flags;
+	u32				__pad;
+	u64				flags;
 
 	struct io_cqe			cqe;
 
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 24be4992821b..449508912a9c 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -4486,7 +4486,7 @@ static int __init io_uring_init(void)
 	BUILD_BUG_ON(SQE_COMMON_FLAGS >= (1 << 8));
 	BUILD_BUG_ON((SQE_VALID_FLAGS | SQE_COMMON_FLAGS) != SQE_VALID_FLAGS);
 
-	BUILD_BUG_ON(__REQ_F_LAST_BIT > 8 * sizeof(int));
+	BUILD_BUG_ON(__REQ_F_LAST_BIT > 8 * sizeof(u64));
 
 	BUILD_BUG_ON(sizeof(atomic_t) != sizeof(u32));
 

From patchwork Fri Mar 24 13:57:53 2023
X-Patchwork-Submitter: Ming Lei
X-Patchwork-Id: 74569
From: Ming Lei
To: Jens Axboe, io-uring@vger.kernel.org, linux-block@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, Miklos Szeredi, ZiyangZhang,
 Xiaoguang Wang, Bernd Schubert, Pavel Begunkov, Stefan Hajnoczi, Ming Lei
Subject: [PATCH V4 02/17] io_uring: add IORING_OP_FUSED_CMD
Date: Fri, 24 Mar 2023 21:57:53 +0800
Message-Id: <20230324135808.855245-3-ming.lei@redhat.com>
In-Reply-To: <20230324135808.855245-1-ming.lei@redhat.com>
References: <20230324135808.855245-1-ming.lei@redhat.com>

Add IORING_OP_FUSED_CMD, a special URING_CMD that requires SQE128. The
first 64-byte SQE (master) is a URING_CMD, and the second 64-byte SQE
(slave) is another normal 64-byte OP. Any OP that is to be usable as a
slave OP has to set io_issue_defs[op].fused_slave to 1, and its ->issue()
needs to retrieve the buffer from the master request's fused_cmd_kbuf.

The key points of the design/implementation are:

1) The master uring command produces and provides an immutable command
buffer (struct io_uring_bvec_buf) to the slave request, and the slave OP
can retrieve any part of this buffer via sqe->addr and sqe->len.

2) The master command is always completed after the slave request is
completed, so the slave request can be thought of as serving the master
command.

- The slave request borrows the master command's buffer
  (io_uring_bvec_buf); after the slave request is completed, the buffer
  is returned to the master request.

- This also guarantees correct SQE order, since the master request uses
  the slave request's LINK flag.

3) Master request completion is always notified to the driver, so the
driver knows when the slave request is done with the buffer. This is
important because io_uring_bvec_buf represents a reference to the device
io command buffer, and we have to guarantee that the reference cannot
outlive the referent buffer, which so far is represented by bvec.

4) The kernel API io_fused_cmd_start_slave_req is called by the driver
to hand over the io_uring_bvec_buf buffer and start submitting the slave
request with the provided buffer.

The motivation is to support zero copy for fuse/ublk, in which the
device holds the IO request buffer, and IO handling is often a normal IO
OP (fs, net, ..). With IORING_OP_FUSED_CMD, this kind of zero copy can be
implemented easily and reliably.
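
Point 1 says the slave may address any sub-range of the master's buffer
via sqe->addr (used as an offset) and sqe->len. The userspace sketch below
mirrors the range validation that io_import_buf_for_slave() performs in this
patch. struct demo_kbuf and demo_import_range() are hypothetical,
simplified stand-ins written for this example only; the real code works on
struct io_uring_bvec_buf and sets up an iov_iter instead of returning an
offset.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical, simplified stand-in for struct io_uring_bvec_buf. */
struct demo_kbuf {
	unsigned long len;	/* total bytes the master lends out */
	unsigned int offset;	/* start offset inside the first segment */
};

/*
 * Validate a slave sub-range (buf_off, len) against the master buffer and,
 * on success, store the absolute start offset in *start.
 * Returns 0 on success or -EFAULT if the range falls outside the buffer.
 */
static int demo_import_range(const struct demo_kbuf *kbuf,
			     unsigned long buf_off, unsigned int len,
			     unsigned long *start)
{
	if (buf_off > kbuf->len)
		return -EFAULT;
	/* Written as a subtraction so that buf_off + len cannot overflow. */
	if (len > kbuf->len - buf_off)
		return -EFAULT;

	*start = kbuf->offset + buf_off;
	return 0;
}
```

For a 4096-byte buffer, a request for 1024 bytes at offset 1024 is accepted,
while 200 bytes at offset 4000 is rejected because only 96 bytes remain.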
Signed-off-by: Ming Lei
---
 include/linux/io_uring.h       |  49 ++++++-
 include/linux/io_uring_types.h |  15 +++
 include/uapi/linux/io_uring.h  |   1 +
 io_uring/Makefile              |   2 +-
 io_uring/fused_cmd.c           | 233 +++++++++++++++++++++++++++++++++
 io_uring/fused_cmd.h           |  11 ++
 io_uring/io_uring.c            |  26 +++-
 io_uring/io_uring.h            |   3 +
 io_uring/opdef.c               |  12 ++
 io_uring/opdef.h               |   7 +
 10 files changed, 353 insertions(+), 6 deletions(-)
 create mode 100644 io_uring/fused_cmd.c
 create mode 100644 io_uring/fused_cmd.h

diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h
index 934e5dd4ccc0..45253a5b9fc2 100644
--- a/include/linux/io_uring.h
+++ b/include/linux/io_uring.h
@@ -4,6 +4,7 @@
 
 #include
 #include
+#include
 #include
 
 enum io_uring_cmd_flags {
@@ -20,6 +21,26 @@ enum io_uring_cmd_flags {
 	IO_URING_F_SQE128		= (1 << 8),
 	IO_URING_F_CQE32		= (1 << 9),
 	IO_URING_F_IOPOLL		= (1 << 10),
+
+	/* for FUSED_CMD only */
+	IO_URING_F_FUSED_BUF_DEST	= (1 << 11),	/* slave writes to buffer */
+	IO_URING_F_FUSED_BUF_SRC	= (1 << 12),	/* slave reads from buffer */
+	/* driver incapable of FUSED_CMD should fail cmd when seeing F_FUSED */
+	IO_URING_F_FUSED		= IO_URING_F_FUSED_BUF_DEST |
+					  IO_URING_F_FUSED_BUF_SRC,
+};
+
+union io_uring_fused_cmd_data {
+	/*
+	 * In case of slave request IOSQE_CQE_SKIP_SUCCESS, return slave
+	 * result via master command; otherwise we simply return success
+	 * if buffer is provided, and slave request will return its result
+	 * via its CQE
+	 */
+	s32 slave_res;
+
+	/* fused cmd private, driver do not touch it */
+	struct io_kiocb *__slave;
 };
 
 struct io_uring_cmd {
@@ -33,10 +54,31 @@ struct io_uring_cmd {
 	};
 	u32		cmd_op;
 	u32		flags;
-	u8		pdu[32]; /* available inline for free use */
+
+	/* for fused command, the available pdu is a bit less */
+	union {
+		struct {
+			union io_uring_fused_cmd_data data;
+			u8	pdu[24]; /* available inline for free use */
+		} fused;
+		u8		pdu[32]; /* available inline for free use */
+	};
+};
+
+struct io_uring_bvec_buf {
+	unsigned long	len;
+	unsigned int	nr_bvecs;
+
+	/* offset in the 1st bvec */
+	unsigned int	offset;
+	const struct bio_vec	*bvec;
+	struct bio_vec		__bvec[];
 };
 
 #if defined(CONFIG_IO_URING)
+void io_fused_cmd_start_slave_req(struct io_uring_cmd *ioucmd, bool locked,
+		const struct io_uring_bvec_buf *imu,
+		void (*complete_tw_cb)(struct io_uring_cmd *));
 int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
 			      struct iov_iter *iter, void *ioucmd);
 void io_uring_cmd_done(struct io_uring_cmd *cmd, ssize_t ret, ssize_t res2);
@@ -66,6 +108,11 @@ static inline void io_uring_free(struct task_struct *tsk)
 		__io_uring_free(tsk);
 }
 #else
+static inline void io_fused_cmd_start_slave_req(struct io_uring_cmd *ioucmd,
+		bool locked, const struct io_uring_bvec_buf *fused_cmd_kbuf,
+		void (*complete_tw_cb)(struct io_uring_cmd *))
+{
+}
 static inline int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
 			      struct iov_iter *iter, void *ioucmd)
 {
diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index aab657cd2b77..22920d7b12a5 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -402,6 +402,7 @@ enum {
 	/* keep async read/write and isreg together and in order */
 	REQ_F_SUPPORT_NOWAIT_BIT,
 	REQ_F_ISREG_BIT,
+	REQ_F_FUSED_SLAVE_BIT,
 
 	/* not a real bit, just to check we're not overflowing the space */
 	__REQ_F_LAST_BIT,
@@ -471,6 +472,8 @@ enum {
 	REQ_F_CLEAR_POLLIN	= BIT_ULL(REQ_F_CLEAR_POLLIN_BIT),
 	/* hashed into ->cancel_hash_locked, protected by ->uring_lock */
 	REQ_F_HASH_LOCKED	= BIT_ULL(REQ_F_HASH_LOCKED_BIT),
+	/* slave request in fused cmd, won't be one uring cmd */
+	REQ_F_FUSED_SLAVE	= BIT_ULL(REQ_F_FUSED_SLAVE_BIT),
 };
 
 typedef void (*io_req_tw_func_t)(struct io_kiocb *req, bool *locked);
@@ -553,6 +556,18 @@ struct io_kiocb {
 		 * REQ_F_BUFFER_RING is set.
 		 */
 		struct io_buffer_list	*buf_list;
+
+		/*
+		 * store kernel (sub)buffer of fused master request which OP
+		 * is IORING_OP_FUSED_CMD
+		 */
+		const struct io_uring_bvec_buf *fused_cmd_kbuf;
+
+		/*
+		 * store fused command master request for fused slave request,
+		 * which uses fused master's io buffer for handling slave OP
+		 */
+		struct io_kiocb *fused_master_req;
 	};
 
 	union {
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 1d59c816a5b8..9762a2989747 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -223,6 +223,7 @@ enum io_uring_op {
 	IORING_OP_URING_CMD,
 	IORING_OP_SEND_ZC,
 	IORING_OP_SENDMSG_ZC,
+	IORING_OP_FUSED_CMD,
 
 	/* this goes last, obviously */
 	IORING_OP_LAST,
diff --git a/io_uring/Makefile b/io_uring/Makefile
index 8cc8e5387a75..5301077e61c5 100644
--- a/io_uring/Makefile
+++ b/io_uring/Makefile
@@ -7,5 +7,5 @@ obj-$(CONFIG_IO_URING)		+= io_uring.o xattr.o nop.o fs.o splice.o \
 					openclose.o uring_cmd.o epoll.o \
 					statx.o net.o msg_ring.o timeout.o \
 					sqpoll.o fdinfo.o tctx.o poll.o \
-					cancel.o kbuf.o rsrc.o rw.o opdef.o notif.o
+					cancel.o kbuf.o rsrc.o rw.o opdef.o notif.o fused_cmd.o
 obj-$(CONFIG_IO_WQ)		+= io-wq.o
diff --git a/io_uring/fused_cmd.c b/io_uring/fused_cmd.c
new file mode 100644
index 000000000000..ff3921f6a5df
--- /dev/null
+++ b/io_uring/fused_cmd.c
@@ -0,0 +1,233 @@
+// SPDX-License-Identifier: GPL-2.0
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include
+
+#include "io_uring.h"
+#include "opdef.h"
+#include "rsrc.h"
+#include "uring_cmd.h"
+#include "fused_cmd.h"
+
+static bool io_fused_slave_valid(const struct io_uring_sqe *sqe, u8 op)
+{
+	unsigned int sqe_flags = READ_ONCE(sqe->flags);
+
+	if (op == IORING_OP_FUSED_CMD || op == IORING_OP_URING_CMD)
+		return false;
+
+	if (sqe_flags & IOSQE_BUFFER_SELECT)
+		return false;
+
+	if (!io_issue_defs[op].fused_slave)
+		return false;
+
+	return true;
+}
+
+static inline void io_fused_cmd_update_link_flags(struct io_kiocb *req,
+		const struct io_kiocb *slave)
+{
+	/*
+	 * We have to keep slave SQE in order, so update master link flags
+	 * with slave request's given master command isn't completed until
+	 * the slave request is done
+	 */
+	if (slave->flags & (REQ_F_LINK | REQ_F_HARDLINK))
+		req->flags |= REQ_F_LINK;
+}
+
+int io_fused_cmd_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
+	__must_hold(&req->ctx->uring_lock)
+{
+	struct io_uring_cmd *ioucmd = io_kiocb_to_cmd(req, struct io_uring_cmd);
+	const struct io_uring_sqe *slave_sqe = sqe + 1;
+	struct io_ring_ctx *ctx = req->ctx;
+	struct io_kiocb *slave;
+	u8 slave_op;
+	int ret;
+
+	if (unlikely(!(ctx->flags & IORING_SETUP_SQE128)))
+		return -EINVAL;
+
+	if (unlikely(sqe->__pad1))
+		return -EINVAL;
+
+	ioucmd->flags = READ_ONCE(sqe->uring_cmd_flags);
+	if (unlikely(ioucmd->flags))
+		return -EINVAL;
+
+	slave_op = READ_ONCE(slave_sqe->opcode);
+	if (unlikely(!io_fused_slave_valid(slave_sqe, slave_op)))
+		return -EINVAL;
+
+	ioucmd->cmd = sqe->cmd;
+	ioucmd->cmd_op = READ_ONCE(sqe->cmd_op);
+	req->fused_cmd_kbuf = NULL;
+
+	/* take one extra reference for the slave request */
+	io_get_task_refs(1);
+
+	ret = -ENOMEM;
+	if (unlikely(!io_alloc_req(ctx, &slave)))
+		goto fail;
+
+	ret = io_init_slave_req(ctx, slave, slave_sqe);
+	if (unlikely(ret))
+		goto fail_free_req;
+
+	/*
+	 * The slave request won't be linked to io_uring submission link list,
+	 * so it can't be handled by IORING_OP_LINK_TIMEOUT, however, we can do
+	 * that on master command directly
+	 */
+	io_fused_cmd_update_link_flags(req, slave);
+
+	ioucmd->fused.data.__slave = slave;
+
+	return 0;
+
+fail_free_req:
+	io_free_req(slave);
+fail:
+	current->io_uring->cached_refs += 1;
+	return ret;
+}
+
+int io_fused_cmd(struct io_kiocb *req, unsigned int issue_flags)
+{
+	struct io_uring_cmd *ioucmd = io_kiocb_to_cmd(req, struct io_uring_cmd);
+	const struct io_kiocb *slave = ioucmd->fused.data.__slave;
+	int ret = -EINVAL;
+
+	/*
+	 * Pass buffer direction for driver to validate if the requested buffer
+	 * direction is legal
+	 */
+	if (io_issue_defs[slave->opcode].buf_dir)
+		issue_flags |= IO_URING_F_FUSED_BUF_DEST;
+	else
+		issue_flags |= IO_URING_F_FUSED_BUF_SRC;
+
+	ret = io_uring_cmd(req, issue_flags);
+	if (ret != IOU_ISSUE_SKIP_COMPLETE)
+		io_free_req(ioucmd->fused.data.__slave);
+
+	return ret;
+}
+
+int io_import_buf_for_slave(unsigned long buf_off, unsigned int len, int dir,
+		struct iov_iter *iter, struct io_kiocb *slave)
+{
+	struct io_kiocb *req = slave->fused_master_req;
+	const struct io_uring_bvec_buf *kbuf;
+	unsigned long offset;
+
+	if (unlikely(!(slave->flags & REQ_F_FUSED_SLAVE) || !req))
+		return -EINVAL;
+
+	if (unlikely(!req->fused_cmd_kbuf))
+		return -EINVAL;
+
+	/* req->fused_cmd_kbuf is immutable */
+	kbuf = req->fused_cmd_kbuf;
+	offset = kbuf->offset;
+
+	if (!kbuf->bvec)
+		return -EINVAL;
+
+	if (unlikely(buf_off > kbuf->len))
+		return -EFAULT;
+
+	if (unlikely(len > kbuf->len - buf_off))
+		return -EFAULT;
+
+	/* don't use io_import_fixed which doesn't support multipage bvec */
+	offset += buf_off;
+	iov_iter_bvec(iter, dir, kbuf->bvec, kbuf->nr_bvecs, offset + len);
+
+	if (offset)
+		iov_iter_advance(iter, offset);
+
+	return 0;
+}
+
+/*
+ * Called after slave request is completed,
+ *
+ * Return back master's fused_cmd kbuf, and notify master request by
+ * the saved callback.
+ */
+void io_fused_cmd_return_buf(struct io_kiocb *slave)
+{
+	struct io_kiocb *req = slave->fused_master_req;
+	struct io_uring_cmd *ioucmd;
+
+	if (unlikely(!req || !(slave->flags & REQ_F_FUSED_SLAVE)))
+		return;
+
+	/* return back the buffer */
+	slave->fused_master_req = NULL;
+	ioucmd = io_kiocb_to_cmd(req, struct io_uring_cmd);
+	ioucmd->fused.data.__slave = NULL;
+
+	/*
+	 * If slave OP skips CQE, return the result via master command; or
+	 * if slave request is failed, REQ_F_CQE_SKIP will be cleared, return
+	 * result too
+	 */
+	if ((slave->flags & REQ_F_CQE_SKIP) || slave->cqe.res < 0)
+		ioucmd->fused.data.slave_res = slave->cqe.res;
+	else
+		ioucmd->fused.data.slave_res = 0;
+	io_uring_cmd_complete_in_task(ioucmd, ioucmd->task_work_cb);
+}
+
+/*
+ * Called for starting slave request after master command prepared io buffer.
+ *
+ * The io buffer is represented by @fused_cmd_kbuf, which is read only for
+ * slave request, however slave request can retrieve any sub-buffer by its
+ * sqe->addr(offset) & sqe->len. For slave request, io buffer is imported
+ * by io_import_buf_for_slave().
+ *
+ * Slave request borrows master's io buffer for handling the slave operation,
+ * and the buffer is returned back via io_fused_cmd_return_buf after the slave
+ * request is completed. Meantime the master command is completed from
+ * io_fused_cmd_return_buf(). And driver gets completion notification by
+ * the passed callback of @complete_tw_cb.
+ */
+void io_fused_cmd_start_slave_req(struct io_uring_cmd *ioucmd, bool locked,
+		const struct io_uring_bvec_buf *fused_cmd_kbuf,
+		void (*complete_tw_cb)(struct io_uring_cmd *))
+{
+	struct io_kiocb *req = cmd_to_io_kiocb(ioucmd);
+	struct io_kiocb *slave = ioucmd->fused.data.__slave;
+
+	if (WARN_ON_ONCE(unlikely(!slave ||
+				!(slave->flags & REQ_F_FUSED_SLAVE))))
+		return;
+
+	/*
+	 * Once the fused slave request is completed and the buffer is no
+	 * longer used, the driver is notified via the complete_tw_cb callback
+	 */
+	ioucmd->task_work_cb = complete_tw_cb;
+
+	/* now we get the buffer */
+	req->fused_cmd_kbuf = fused_cmd_kbuf;
+	slave->fused_master_req = req;
+
+	trace_io_uring_submit_sqe(slave, true);
+	if (locked)
+		io_req_task_submit(slave, &locked);
+	else
+		io_req_task_queue(slave);
+}
+EXPORT_SYMBOL_GPL(io_fused_cmd_start_slave_req);
diff --git a/io_uring/fused_cmd.h b/io_uring/fused_cmd.h
new file mode 100644
index 000000000000..0a3fc8c69870
--- /dev/null
+++ b/io_uring/fused_cmd.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef IOU_FUSED_CMD_H
+#define IOU_FUSED_CMD_H
+
+int io_fused_cmd_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe);
+int io_fused_cmd(struct io_kiocb *req, unsigned int issue_flags);
+void io_fused_cmd_return_buf(struct io_kiocb *slave);
+int io_import_buf_for_slave(unsigned long buf, unsigned int len, int dir,
+		struct iov_iter *iter, struct io_kiocb *slave);
+
+#endif
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 449508912a9c..e5e43637d313 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -92,6 +92,7 @@
 #include "cancel.h"
 #include "net.h"
 #include "notif.h"
+#include "fused_cmd.h"
 
 #include "timeout.h"
 #include "poll.h"
@@ -111,7 +112,7 @@
 
 #define IO_REQ_CLEAN_FLAGS (REQ_F_BUFFER_SELECTED | REQ_F_NEED_CLEANUP | \
 				REQ_F_POLLED | REQ_F_INFLIGHT | REQ_F_CREDS | \
-				REQ_F_ASYNC_DATA)
+				REQ_F_ASYNC_DATA | REQ_F_FUSED_SLAVE)
 
 #define IO_REQ_CLEAN_SLOW_FLAGS (REQ_F_REFCOUNT | REQ_F_LINK | REQ_F_HARDLINK |\
 				 IO_REQ_CLEAN_FLAGS)
@@ -971,6 +972,9 @@ static void __io_req_complete_post(struct io_kiocb *req)
 {
 	struct io_ring_ctx *ctx = req->ctx;
 
+	if (req->flags & REQ_F_FUSED_SLAVE)
+		io_fused_cmd_return_buf(req);
+
 	io_cq_lock(ctx);
 	if (!(req->flags & REQ_F_CQE_SKIP))
 		io_fill_cqe_req(ctx, req);
@@ -1855,6 +1859,8 @@ static void io_clean_op(struct io_kiocb *req)
 		spin_lock(&req->ctx->completion_lock);
 		io_put_kbuf_comp(req);
 		spin_unlock(&req->ctx->completion_lock);
+	} else if (req->flags & REQ_F_FUSED_SLAVE) {
+		io_fused_cmd_return_buf(req);
 	}
 
 	if (req->flags & REQ_F_NEED_CLEANUP) {
@@ -2163,8 +2169,8 @@ static void io_init_req_drain(struct io_kiocb *req)
 	}
 }
 
-static int io_init_req(struct io_ring_ctx *ctx, struct io_kiocb *req,
-		       const struct io_uring_sqe *sqe)
+static inline int io_init_req(struct io_ring_ctx *ctx, struct io_kiocb *req,
+		       const struct io_uring_sqe *sqe, bool slave)
 	__must_hold(&ctx->uring_lock)
 {
 	const struct io_issue_def *def;
@@ -2217,6 +2223,12 @@ static int io_init_req(struct io_ring_ctx *ctx, struct io_kiocb *req,
 		}
 	}
 
+	if (slave) {
+		if (!def->fused_slave)
+			return -EINVAL;
+		req->flags |= REQ_F_FUSED_SLAVE;
+	}
+
 	if (!def->ioprio && sqe->ioprio)
 		return -EINVAL;
 	if (!def->iopoll && (ctx->flags & IORING_SETUP_IOPOLL))
@@ -2257,6 +2269,12 @@ static int io_init_req(struct io_ring_ctx *ctx, struct io_kiocb *req,
 	return def->prep(req, sqe);
 }
 
+int io_init_slave_req(struct io_ring_ctx *ctx, struct io_kiocb *req,
+		const struct io_uring_sqe *sqe)
+{
+	return io_init_req(ctx, req, sqe, true);
+}
+
 static __cold int io_submit_fail_init(const struct io_uring_sqe *sqe,
 				      struct io_kiocb *req, int ret)
 {
@@ -2301,7 +2319,7 @@ static inline int io_submit_sqe(struct io_ring_ctx *ctx, struct io_kiocb *req,
 	struct io_submit_link *link = &ctx->submit_state.link;
 	int ret;
 
-	ret = io_init_req(ctx, req, sqe);
+	ret = io_init_req(ctx, req, sqe, false);
 	if (unlikely(ret))
 		return io_submit_fail_init(sqe, req, ret);
 
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index 2711865f1e19..637e12e4fb9f 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -78,6 +78,9 @@ bool __io_alloc_req_refill(struct io_ring_ctx *ctx);
 bool io_match_task_safe(struct io_kiocb *head, struct task_struct *task,
 			bool cancel_all);
 
+int io_init_slave_req(struct io_ring_ctx *ctx, struct io_kiocb *req,
+		const struct io_uring_sqe *sqe);
+
 #define io_lockdep_assert_cq_locked(ctx)				\
 	do {								\
 		if (ctx->flags & IORING_SETUP_IOPOLL) {			\
diff --git a/io_uring/opdef.c b/io_uring/opdef.c
index cca7c5b55208..63b90e8e65f8 100644
--- a/io_uring/opdef.c
+++ b/io_uring/opdef.c
@@ -33,6 +33,7 @@
 #include "poll.h"
 #include "cancel.h"
 #include "rw.h"
+#include "fused_cmd.h"
 
 static int io_no_issue(struct io_kiocb *req, unsigned int issue_flags)
 {
@@ -428,6 +429,12 @@ const struct io_issue_def io_issue_defs[] = {
 		.prep			= io_eopnotsupp_prep,
 #endif
 	},
+	[IORING_OP_FUSED_CMD] = {
+		.needs_file		= 1,
+		.plug			= 1,
+		.prep			= io_fused_cmd_prep,
+		.issue			= io_fused_cmd,
+	},
 };
 
 
@@ -648,6 +655,11 @@ const struct io_cold_def io_cold_defs[] = {
 		.fail			= io_sendrecv_fail,
 #endif
 	},
+	[IORING_OP_FUSED_CMD] = {
+		.name			= "FUSED_CMD",
+		.async_size		= uring_cmd_pdu_size(1),
+		.prep_async		= io_uring_cmd_prep_async,
+	},
 };
 
 const char *io_uring_get_opcode(u8 opcode)
diff --git a/io_uring/opdef.h b/io_uring/opdef.h
index c22c8696e749..ab81c54e9cad 100644
--- a/io_uring/opdef.h
+++ b/io_uring/opdef.h
@@ -29,6 +29,13 @@ struct io_issue_def {
 	unsigned		iopoll_queue : 1;
 	/* opcode specific path will handle ->async_data allocation if needed */
 	unsigned		manual_alloc : 1;
+	/* can be slave op of fused command */
+	unsigned		fused_slave : 1;
+	/*
+	 * buffer direction, 0 : read from buffer, 1: write to buffer, used
+	 * for fused_slave only
+	 */
+	unsigned		buf_dir : 1;
 
 	int (*issue)(struct io_kiocb *, unsigned int);
 	int (*prep)(struct io_kiocb *, const struct io_uring_sqe *);

From patchwork Fri Mar 24 13:57:54 2023
charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 74554 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp647219vqo; Fri, 24 Mar 2023 07:02:12 -0700 (PDT) X-Google-Smtp-Source: AKy350Z+msLsj1dNsEPqKzMyJpB+mFGWKKG/59nDVvm9XGbyCLLEK2loAtlI1o5NZp6yKFabDoad X-Received: by 2002:a17:906:4f0b:b0:931:59f:d42 with SMTP id t11-20020a1709064f0b00b00931059f0d42mr2945916eju.29.1679666531901; Fri, 24 Mar 2023 07:02:11 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1679666531; cv=none; d=google.com; s=arc-20160816; b=UrMS/kNNP4Steh7jd+iOADXLx5cEliZ8DtAe2T5JAHFLLj2kQp8U9ThppoRAYkE+Jr Ly0jur4QYar6NwuLTAzrmt7+Rcr5kmbj4VfZIA5Jq4yq6DcXyHNP95Dgjhbt63GWVRxL dWd22DDHalaT2mNOQjXyFj0q+aqICMDS1xYuO3any1GyV4/QsYkYR4f3MdUsZCrZBFKb BTljciljSYS7VJIofZlzOXv8/YwgzztJ7PrvVFJLyB6lihQrnBSI7cge2BIA2zjnjrOT MnBSFr0sRUFNv+IgUzNzXoWc/ijosINA+b7MOUIiCBSEf5Fi5WfjRE599BAvuePYXrkp +gEQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=eDi2RpnV8F28qQx81hb2VxE6lg4dSMNF9Efl/eRPHaQ=; b=XVXf6qBiW2n3GFEAGSRk3WqNUgJqo3M43yQiTSlad21tZj8WrcOlgND+0FVZHcFTaZ Eyz/WI46rlbfDCIS1Cp90U7C0OBazOGwXNmPjddwtSqAKAJ0SN7+D73TcrKnkOCcLLM3 nQRUiCM8zcTRBmtU/aWOqNEtlo90DCw2XHbWPUVvezHFPfmqPj6Rh8WDpU3YmXXJrbrb zlCGOfWnm68K9vlaBsfkppVxSQY/NXDkkHGyYUEENW7z7dzJA+aLkPNegvCkrnt2TNZ3 A1gsnhRyYcdKqMOiOzDR54gLIh5ByqgwtmYSbW6ToA8tvanCaUcHJyhWICJ3p6Ecr9GY lgkw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=LWPreA8O; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email 
(out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id p2-20020a170906140200b009311151f1c3si586766ejc.426.2023.03.24.07.01.38; Fri, 24 Mar 2023 07:02:11 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=LWPreA8O; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231925AbjCXN71 (ORCPT + 99 others); Fri, 24 Mar 2023 09:59:27 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50834 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231939AbjCXN7Y (ORCPT ); Fri, 24 Mar 2023 09:59:24 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D5AD21555A for ; Fri, 24 Mar 2023 06:58:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1679666319; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=eDi2RpnV8F28qQx81hb2VxE6lg4dSMNF9Efl/eRPHaQ=; b=LWPreA8OBwIucHpahw+kmV9u6LakN/Quxlwjx67BsA2lJ1hggfKPsc9S8HppnculkPg4Mb 1VnoTzVXIxUiiCnwAGi2VKAxe8pbuYpDyEqPSmK1PqIfGgcWDMaPo2jTqKerJ8m7RoV1y6 sv28dQXJ5e8Q5t6vUMBv5qQi5743EkI= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 
us-mta-584-jQyb_u7AMQqJgEO9sOrk6A-1; Fri, 24 Mar 2023 09:58:35 -0400 X-MC-Unique: jQyb_u7AMQqJgEO9sOrk6A-1 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.rdu2.redhat.com [10.11.54.1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id EC0E91C17427; Fri, 24 Mar 2023 13:58:31 +0000 (UTC) Received: from localhost (ovpn-8-20.pek2.redhat.com [10.72.8.20]) by smtp.corp.redhat.com (Postfix) with ESMTP id 160CB4042AD0; Fri, 24 Mar 2023 13:58:30 +0000 (UTC) From: Ming Lei To: Jens Axboe , io-uring@vger.kernel.org, linux-block@vger.kernel.org Cc: linux-kernel@vger.kernel.org, Miklos Szeredi , ZiyangZhang , Xiaoguang Wang , Bernd Schubert , Pavel Begunkov , Stefan Hajnoczi , Ming Lei Subject: [PATCH V4 03/17] io_uring: support normal SQE for fused command Date: Fri, 24 Mar 2023 21:57:54 +0800 Message-Id: <20230324135808.855245-4-ming.lei@redhat.com> In-Reply-To: <20230324135808.855245-1-ming.lei@redhat.com> References: <20230324135808.855245-1-ming.lei@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.1 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1761258013783438636?= X-GMAIL-MSGID: =?utf-8?q?1761258013783438636?= So far, the slave sqe is saved in the 2nd 64 byte of master sqe, which requires that SQE128 has to be enabled. Relax this limit by allowing to fetch slave SQE from SQ directly. IORING_URING_CMD_FUSED_SPLIT_SQE has to be set for this usage, and userspace has to put slave SQE following the master sqe. 
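The two ways of locating the slave SQE described above can be sketched as a small userspace model in plain C. This is only an illustration under stated assumptions, not the kernel code: `demo_sqe`, `demo_ring`, `demo_get_sqe()`, `demo_get_slave_sqe()` and `DEMO_SETUP_SQE128` are hypothetical names, and the ring bookkeeping is reduced to a head and tail index without the SQ index array or memory barriers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the demo; these are not the kernel definitions. */
#define DEMO_SETUP_SQE128	(1U << 0)	/* ring uses 128-byte SQ slots */

struct demo_sqe {			/* one 64-byte SQE-sized unit */
	uint8_t opcode;
	uint8_t pad[63];
};

struct demo_ring {
	unsigned int flags;
	unsigned int sq_entries;	/* number of SQ slots, power of two */
	unsigned int cached_sq_head;	/* next slot to consume */
	unsigned int sq_tail;		/* one past the last slot userspace filled */
	const struct demo_sqe *sqes;	/* slot i starts at sqes[i] or sqes[2 * i] */
};

/* Consume the next SQ slot, if userspace has published one. */
bool demo_get_sqe(struct demo_ring *ring, const struct demo_sqe **sqe)
{
	unsigned int idx;

	assert((ring->sq_entries & (ring->sq_entries - 1)) == 0);
	if (ring->cached_sq_head == ring->sq_tail)
		return false;

	idx = ring->cached_sq_head++ & (ring->sq_entries - 1);
	if (ring->flags & DEMO_SETUP_SQE128)
		idx <<= 1;	/* each slot spans two 64-byte units */
	*sqe = &ring->sqes[idx];
	return true;
}

/* Locate the slave SQE for a master SQE that was already consumed. */
const struct demo_sqe *demo_get_slave_sqe(struct demo_ring *ring,
					  const struct demo_sqe *master,
					  bool split_sqe)
{
	const struct demo_sqe *sqe;

	if (split_sqe)	/* slave occupies the next SQ slot */
		return demo_get_sqe(ring, &sqe) ? sqe : NULL;

	if (!(ring->flags & DEMO_SETUP_SQE128))
		return NULL;	/* no room for an inline slave */

	return master + 1;	/* second 64 bytes of the same slot */
}
```

In this model the split case consumes one extra SQ slot per fused command, while the inline case leaves the head index untouched, which is presumably why the submission loop has to account for slots consumed outside of its own iteration.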
However, it is not clear whether this is useful, given that a fused command
needs at least two SQEs for running IO in the fast path, and SQE128 matches
this use case perfectly.

Signed-off-by: Ming Lei
---
 include/uapi/linux/io_uring.h |  8 ++++++-
 io_uring/fused_cmd.c          | 42 ++++++++++++++++++++++++++++-------
 io_uring/io_uring.c           | 22 ++++++++++++------
 io_uring/io_uring.h           |  1 +
 4 files changed, 57 insertions(+), 16 deletions(-)

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 9762a2989747..6f25ca85639f 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -233,9 +233,15 @@ enum io_uring_op {
  * sqe->uring_cmd_flags
  * IORING_URING_CMD_FIXED	use registered buffer; pass this flag
  *				along with setting sqe->buf_index.
+ *
+ * IORING_URING_CMD_FUSED_SPLIT_SQE fused command only, slave sqe is
+ *				provided from another new sqe; without
+ *				setting the flag, slave sqe is from
+ *				2nd 64byte of this sqe, so SQE128 has
+ *				to be enabled
  */
 #define IORING_URING_CMD_FIXED	(1U << 0)
-
+#define IORING_URING_CMD_FUSED_SPLIT_SQE	(1U << 1)
 
 /*
  * sqe->fsync_flags
diff --git a/io_uring/fused_cmd.c b/io_uring/fused_cmd.c
index ff3921f6a5df..4cfe02e316f9 100644
--- a/io_uring/fused_cmd.c
+++ b/io_uring/fused_cmd.c
@@ -43,24 +43,45 @@ static inline void io_fused_cmd_update_link_flags(struct io_kiocb *req,
 		req->flags |= REQ_F_LINK;
 }
 
+static const struct io_uring_sqe *fused_cmd_get_slave_sqe(
+		struct io_ring_ctx *ctx, const struct io_uring_sqe *master,
+		bool split_sqe)
+{
+	if (unlikely(!(ctx->flags & IORING_SETUP_SQE128) && !split_sqe))
+		return NULL;
+
+	if (split_sqe) {
+		const struct io_uring_sqe *sqe;
+
+		if (unlikely(!io_get_slave_sqe(ctx, &sqe)))
+			return NULL;
+		return sqe;
+	}
+
+	return master + 1;
+}
+
 int io_fused_cmd_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	__must_hold(&req->ctx->uring_lock)
 {
 	struct io_uring_cmd *ioucmd = io_kiocb_to_cmd(req, struct io_uring_cmd);
-	const struct io_uring_sqe *slave_sqe = sqe + 1;
+	const struct io_uring_sqe *slave_sqe;
 	struct io_ring_ctx *ctx = req->ctx;
 	struct io_kiocb *slave;
 	u8 slave_op;
 	int ret;
-
-	if (unlikely(!(ctx->flags & IORING_SETUP_SQE128)))
-		return -EINVAL;
+	bool split_sqe;
 
 	if (unlikely(sqe->__pad1))
 		return -EINVAL;
 
 	ioucmd->flags = READ_ONCE(sqe->uring_cmd_flags);
-	if (unlikely(ioucmd->flags))
+	if (unlikely(ioucmd->flags & ~IORING_URING_CMD_FUSED_SPLIT_SQE))
+		return -EINVAL;
+
+	split_sqe = ioucmd->flags & IORING_URING_CMD_FUSED_SPLIT_SQE;
+	slave_sqe = fused_cmd_get_slave_sqe(ctx, sqe, split_sqe);
+	if (unlikely(!slave_sqe))
 		return -EINVAL;
 
 	slave_op = READ_ONCE(slave_sqe->opcode);
@@ -71,8 +92,12 @@ int io_fused_cmd_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	ioucmd->cmd_op = READ_ONCE(sqe->cmd_op);
 	req->fused_cmd_kbuf = NULL;
 
-	/* take one extra reference for the slave request */
-	io_get_task_refs(1);
+	/*
+	 * Take one extra reference for the slave request built from
+	 * builtin SQE since io_uring core code doesn't grab it for us
+	 */
+	if (!split_sqe)
+		io_get_task_refs(1);
 
 	ret = -ENOMEM;
 	if (unlikely(!io_alloc_req(ctx, &slave)))
@@ -96,7 +121,8 @@ int io_fused_cmd_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 fail_free_req:
 	io_free_req(slave);
 fail:
-	current->io_uring->cached_refs += 1;
+	if (!split_sqe)
+		current->io_uring->cached_refs += 1;
 	return ret;
 }
 
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index e5e43637d313..b0008d380686 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2414,7 +2414,8 @@ static void io_commit_sqring(struct io_ring_ctx *ctx)
  * used, it's important that those reads are done through READ_ONCE() to
  * prevent a re-load down the line.
  */
-static bool io_get_sqe(struct io_ring_ctx *ctx, const struct io_uring_sqe **sqe)
+static inline bool io_get_sqe(struct io_ring_ctx *ctx,
+		const struct io_uring_sqe **sqe)
 {
 	unsigned head, mask = ctx->sq_entries - 1;
 	unsigned sq_idx = ctx->cached_sq_head++ & mask;
@@ -2443,19 +2444,25 @@ static bool io_get_sqe(struct io_ring_ctx *ctx, const struct io_uring_sqe **sqe)
 	return false;
 }
 
+bool io_get_slave_sqe(struct io_ring_ctx *ctx, const struct io_uring_sqe **sqe)
+{
+	return io_get_sqe(ctx, sqe);
+}
+
 int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
 	__must_hold(&ctx->uring_lock)
 {
 	unsigned int entries = io_sqring_entries(ctx);
-	unsigned int left;
+	unsigned old_head = ctx->cached_sq_head;
+	unsigned int left = 0;
 	int ret;
 
 	if (unlikely(!entries))
 		return 0;
 	/* make sure SQ entry isn't read before tail */
-	ret = left = min3(nr, ctx->sq_entries, entries);
-	io_get_task_refs(left);
-	io_submit_state_start(&ctx->submit_state, left);
+	ret = min3(nr, ctx->sq_entries, entries);
+	io_get_task_refs(ret);
+	io_submit_state_start(&ctx->submit_state, ret);
 
 	do {
 		const struct io_uring_sqe *sqe;
@@ -2474,11 +2481,12 @@ int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
 		 */
 		if (unlikely(io_submit_sqe(ctx, req, sqe)) &&
 		    !(ctx->flags & IORING_SETUP_SUBMIT_ALL)) {
-			left--;
+			left = 1;
 			break;
 		}
-	} while (--left);
+	} while ((ctx->cached_sq_head - old_head) < ret);
+	left = ret - (ctx->cached_sq_head - old_head) - left;
 
 	if (unlikely(left)) {
 		ret -= left;
 		/* try again if it submitted nothing and can't allocate a req */
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index 637e12e4fb9f..ee22e65c4aef 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -78,6 +78,7 @@ bool __io_alloc_req_refill(struct io_ring_ctx *ctx);
 bool io_match_task_safe(struct io_kiocb *head, struct task_struct *task,
 			bool cancel_all);
 
+bool io_get_slave_sqe(struct io_ring_ctx *ctx, const struct io_uring_sqe **sqe);
 int io_init_slave_req(struct io_ring_ctx *ctx, struct io_kiocb *req,
 		const struct io_uring_sqe *sqe);
 

From patchwork Fri Mar 24 13:57:55 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ming Lei
X-Patchwork-Id: 74573
From: Ming Lei
To: Jens Axboe, io-uring@vger.kernel.org, linux-block@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, Miklos Szeredi, ZiyangZhang, Xiaoguang Wang, Bernd Schubert, Pavel Begunkov, Stefan Hajnoczi, Ming Lei
Subject: [PATCH V4 04/17] io_uring: support OP_READ/OP_WRITE for fused slave request
Date: Fri, 24 Mar 2023 21:57:55 +0800
Message-Id: <20230324135808.855245-5-ming.lei@redhat.com>
In-Reply-To: <20230324135808.855245-1-ming.lei@redhat.com>
References: <20230324135808.855245-1-ming.lei@redhat.com>

Allow OP_READ/OP_WRITE to be used as fused slave requests; the buffer is
retrieved from the master request.
Once the slave request is completed, the master buffer is returned.

Signed-off-by: Ming Lei
---
 io_uring/opdef.c |  4 ++++
 io_uring/rw.c    | 20 ++++++++++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/io_uring/opdef.c b/io_uring/opdef.c
index 63b90e8e65f8..9b376df91abd 100644
--- a/io_uring/opdef.c
+++ b/io_uring/opdef.c
@@ -235,6 +235,8 @@ const struct io_issue_def io_issue_defs[] = {
 		.ioprio			= 1,
 		.iopoll			= 1,
 		.iopoll_queue		= 1,
+		.fused_slave		= 1,
+		.buf_dir		= WRITE,
 		.prep			= io_prep_rw,
 		.issue			= io_read,
 	},
@@ -248,6 +250,8 @@ const struct io_issue_def io_issue_defs[] = {
 		.ioprio			= 1,
 		.iopoll			= 1,
 		.iopoll_queue		= 1,
+		.fused_slave		= 1,
+		.buf_dir		= READ,
 		.prep			= io_prep_rw,
 		.issue			= io_write,
 	},
diff --git a/io_uring/rw.c b/io_uring/rw.c
index 4c233910e200..0c292ef9a40f 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -19,6 +19,7 @@
 #include "kbuf.h"
 #include "rsrc.h"
 #include "rw.h"
+#include "fused_cmd.h"
 
 struct io_rw {
 	/* NOTE: kiocb has the file as the first member, so don't do it here */
@@ -371,6 +372,17 @@ static struct iovec *__io_import_iovec(int ddir, struct io_kiocb *req,
 	size_t sqe_len;
 	ssize_t ret;
 
+	/*
+	 * SLAVE OP passes buffer offset from sqe->addr actually, since
+	 * the fused cmd kbuf's mapped start address is zero.
+	 */
+	if (req->flags & REQ_F_FUSED_SLAVE) {
+		ret = io_import_buf_for_slave(rw->addr, rw->len, ddir, iter, req);
+		if (ret)
+			return ERR_PTR(ret);
+		return NULL;
+	}
+
 	if (opcode == IORING_OP_READ_FIXED || opcode == IORING_OP_WRITE_FIXED) {
 		ret = io_import_fixed(ddir, iter, req->imu, rw->addr, rw->len);
 		if (ret)
@@ -428,11 +440,19 @@ static inline loff_t *io_kiocb_ppos(struct kiocb *kiocb)
  */
 static ssize_t loop_rw_iter(int ddir, struct io_rw *rw, struct iov_iter *iter)
 {
+	struct io_kiocb *req = cmd_to_io_kiocb(rw);
 	struct kiocb *kiocb = &rw->kiocb;
 	struct file *file = kiocb->ki_filp;
 	ssize_t ret = 0;
 	loff_t *ppos;
 
+	/*
+	 * Fused slave req hasn't user buffer, so ->read/->write can't
+	 * be supported
+	 */
+	if (req->flags & REQ_F_FUSED_SLAVE)
+		return -EOPNOTSUPP;
+
 	/*
 	 * Don't support polled IO through this interface, and we can't
 	 * support non-blocking either. For the latter, this just causes

From patchwork Fri Mar 24 13:57:56 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ming Lei
X-Patchwork-Id: 74555
From: Ming Lei
To: Jens Axboe, io-uring@vger.kernel.org, linux-block@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, Miklos Szeredi, ZiyangZhang, Xiaoguang Wang, Bernd Schubert, Pavel Begunkov, Stefan Hajnoczi, Ming Lei
Subject: [PATCH V4 05/17] io_uring: support OP_SEND_ZC/OP_RECV for fused slave request
Date: Fri, 24 Mar 2023 21:57:56 +0800
Message-Id: <20230324135808.855245-6-ming.lei@redhat.com>
In-Reply-To: <20230324135808.855245-1-ming.lei@redhat.com>
References: <20230324135808.855245-1-ming.lei@redhat.com>

Allow OP_SEND_ZC/OP_RECV to be used as fused slave requests; the buffer is
retrieved from the master request.

Once the slave request is completed, the master buffer is returned.

Signed-off-by: Ming Lei
---
 io_uring/net.c   | 30 ++++++++++++++++++++++++++++--
 io_uring/opdef.c |  6 ++++++
 2 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/io_uring/net.c b/io_uring/net.c
index b7f190ca528e..bc33f89f61b3 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -16,6 +16,7 @@
 #include "net.h"
 #include "notif.h"
 #include "rsrc.h"
+#include "fused_cmd.h"
 
 #if defined(CONFIG_NET)
 struct io_shutdown {
@@ -68,6 +69,13 @@ struct io_sr_msg {
 	struct io_kiocb 		*notif;
 };
 
+#define user_ptr_to_u64(x) (		\
+{					\
+	typecheck(void __user *, (x));	\
+	(u64)(unsigned long)(x);	\
+}					\
+)
+
 static inline bool io_check_multishot(struct io_kiocb *req,
 				      unsigned int issue_flags)
 {
@@ -378,7 +386,11 @@ int io_send(struct io_kiocb *req, unsigned int issue_flags)
 	if (unlikely(!sock))
 		return -ENOTSOCK;
 
-	ret = import_ubuf(ITER_SOURCE, sr->buf, sr->len, &msg.msg_iter);
+	if (!(req->flags & REQ_F_FUSED_SLAVE))
+		ret = import_ubuf(ITER_SOURCE, sr->buf, sr->len, &msg.msg_iter);
+	else
+		ret = io_import_buf_for_slave(user_ptr_to_u64(sr->buf),
+				sr->len, ITER_SOURCE, &msg.msg_iter, req);
 	if (unlikely(ret))
 		return ret;
 
@@ -869,7 +881,11 @@ int io_recv(struct io_kiocb *req, unsigned int issue_flags)
 		sr->buf = buf;
 	}
 
-	ret = import_ubuf(ITER_DEST, sr->buf, len, &msg.msg_iter);
+	if (!(req->flags & REQ_F_FUSED_SLAVE))
+		ret = import_ubuf(ITER_DEST, sr->buf, len, &msg.msg_iter);
+	else
+		ret = io_import_buf_for_slave(user_ptr_to_u64(sr->buf),
+				sr->len, ITER_DEST, &msg.msg_iter, req);
 	if (unlikely(ret))
 		goto out_free;
 
@@ -983,6 +999,9 @@ int io_send_zc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	if (zc->flags & IORING_RECVSEND_FIXED_BUF) {
 		unsigned idx = READ_ONCE(sqe->buf_index);
 
+		if (req->flags & REQ_F_FUSED_SLAVE)
+			return -EINVAL;
+
 		if (unlikely(idx >= ctx->nr_user_bufs))
 			return -EFAULT;
 		idx = array_index_nospec(idx, ctx->nr_user_bufs);
@@ -1119,8 +1138,15 @@ int io_send_zc(struct io_kiocb *req, unsigned int issue_flags)
 		if (unlikely(ret))
 			return ret;
 		msg.sg_from_iter = io_sg_from_iter;
+	} else if (req->flags & REQ_F_FUSED_SLAVE) {
+		ret = io_import_buf_for_slave(user_ptr_to_u64(zc->buf),
+				zc->len, ITER_SOURCE, &msg.msg_iter, req);
+		if (unlikely(ret))
+			return ret;
+		msg.sg_from_iter = io_sg_from_iter;
 	} else {
 		io_notif_set_extended(zc->notif);
+
 		ret = import_ubuf(ITER_SOURCE, zc->buf, zc->len, &msg.msg_iter);
 		if (unlikely(ret))
 			return ret;
diff --git a/io_uring/opdef.c b/io_uring/opdef.c
index 9b376df91abd..e7d75bf69c0f 100644
--- a/io_uring/opdef.c
+++ b/io_uring/opdef.c
@@ -273,6 +273,8 @@ const struct io_issue_def io_issue_defs[] = {
 		.audit_skip		= 1,
 		.ioprio			= 1,
 		.manual_alloc		= 1,
+		.fused_slave		= 1,
+		.buf_dir		= READ,
 #if defined(CONFIG_NET)
 		.prep			= io_sendmsg_prep,
 		.issue			= io_send,
@@ -287,6 +289,8 @@ const struct io_issue_def io_issue_defs[] = {
 		.buffer_select		= 1,
 		.audit_skip		= 1,
 		.ioprio			= 1,
+		.fused_slave		= 1,
+		.buf_dir		= WRITE,
 #if defined(CONFIG_NET)
 		.prep			= io_recvmsg_prep,
 		.issue			= io_recv,
@@ -413,6 +417,8 @@ const struct io_issue_def io_issue_defs[] = {
 		.audit_skip		= 1,
 		.ioprio			= 1,
 		.manual_alloc		= 1,
+		.fused_slave		= 1,
+		.buf_dir		= READ,
 #if defined(CONFIG_NET)
 		.prep = io_send_zc_prep,
 		.issue = io_send_zc,

From patchwork Fri Mar 24 13:57:57 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ming Lei
X-Patchwork-Id: 74570
From: Ming Lei
To: Jens Axboe, io-uring@vger.kernel.org, linux-block@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, Miklos Szeredi, ZiyangZhang, Xiaoguang Wang, Bernd Schubert, Pavel Begunkov, Stefan Hajnoczi, Ming Lei
Subject: [PATCH V4 06/17] block: ublk_drv: mark device as LIVE before adding disk
Date: Fri, 24 Mar 2023 21:57:57 +0800
Message-Id: <20230324135808.855245-7-ming.lei@redhat.com>
In-Reply-To: <20230324135808.855245-1-ming.lei@redhat.com>
References: <20230324135808.855245-1-ming.lei@redhat.com>

IO can be started before add_disk() returns, for example when reading the
partition table, in which case the monitor work must be able to make forward
progress.

So mark the device as LIVE before adding the disk, and change it to DEAD if
add_disk() fails.

Reviewed-by: Ziyang Zhang
Signed-off-by: Ming Lei
---
 drivers/block/ublk_drv.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index d1d1c8d606c8..fb5a557afde8 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -1602,17 +1602,18 @@ static int ublk_ctrl_start_dev(struct ublk_device *ub, struct io_uring_cmd *cmd)
 		set_bit(GD_SUPPRESS_PART_SCAN, &disk->state);
 
 	get_device(&ub->cdev_dev);
+	ub->dev_info.state = UBLK_S_DEV_LIVE;
 	ret = add_disk(disk);
 	if (ret) {
 		/*
 		 * Has to drop the reference since ->free_disk won't be
 		 * called in case of add_disk failure.
 		 */
+		ub->dev_info.state = UBLK_S_DEV_DEAD;
 		ublk_put_device(ub);
 		goto out_put_disk;
 	}
 	set_bit(UB_STATE_USED, &ub->state);
-	ub->dev_info.state = UBLK_S_DEV_LIVE;
 out_put_disk:
 	if (ret)
 		put_disk(disk);

From patchwork Fri Mar 24 13:57:58 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ming Lei
X-Patchwork-Id: 74556
From: Ming Lei
To: Jens Axboe, io-uring@vger.kernel.org, linux-block@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, Miklos Szeredi, ZiyangZhang, Xiaoguang Wang, Bernd Schubert, Pavel Begunkov, Stefan Hajnoczi, Ming Lei
Subject: [PATCH V4 07/17] block: ublk_drv: add common exit handling
Date: Fri, 24 Mar 2023 21:57:58 +0800
Message-Id: <20230324135808.855245-8-ming.lei@redhat.com>
In-Reply-To: <20230324135808.855245-1-ming.lei@redhat.com>
References: <20230324135808.855245-1-ming.lei@redhat.com>

Simplify exit handling a bit, and prepare for supporting fused command.

Reviewed-by: Ziyang Zhang
Signed-off-by: Ming Lei
---
 drivers/block/ublk_drv.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index fb5a557afde8..b8998ed87a0f 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -655,14 +655,15 @@ static void ublk_complete_rq(struct request *req)
 	struct ublk_queue *ubq = req->mq_hctx->driver_data;
 	struct ublk_io *io = &ubq->ios[req->tag];
 	unsigned int unmapped_bytes;
+	blk_status_t res = BLK_STS_OK;
 
 	/* failed read IO if nothing is read */
 	if (!io->res && req_op(req) == REQ_OP_READ)
 		io->res = -EIO;
 
 	if (io->res < 0) {
-		blk_mq_end_request(req, errno_to_blk_status(io->res));
-		return;
+		res = errno_to_blk_status(io->res);
+		goto exit;
 	}
 
 	/*
@@ -671,10 +672,8 @@ static void ublk_complete_rq(struct request *req)
 	 *
 	 * Both the two needn't unmap.
 	 */
-	if (req_op(req) != REQ_OP_READ && req_op(req) != REQ_OP_WRITE) {
-		blk_mq_end_request(req, BLK_STS_OK);
-		return;
-	}
+	if (req_op(req) != REQ_OP_READ && req_op(req) != REQ_OP_WRITE)
+		goto exit;
 
 	/* for READ request, writing data in iod->addr to rq buffers */
 	unmapped_bytes = ublk_unmap_io(ubq, req, io);
@@ -691,6 +690,10 @@ static void ublk_complete_rq(struct request *req)
 		blk_mq_requeue_request(req, true);
 	else
 		__blk_mq_end_request(req, BLK_STS_OK);
+
+	return;
+exit:
+	blk_mq_end_request(req, res);
 }
 
 /*

From patchwork Fri Mar 24 13:57:59 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ming Lei
X-Patchwork-Id: 74558
From: Ming Lei
To: Jens Axboe, io-uring@vger.kernel.org, linux-block@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, Miklos Szeredi, ZiyangZhang, Xiaoguang Wang, Bernd Schubert, Pavel Begunkov, Stefan Hajnoczi, Ming Lei
Subject: [PATCH V4 08/17] block: ublk_drv: don't consider flush request in map/unmap io
Date: Fri, 24 Mar 2023 21:57:59 +0800
Message-Id: <20230324135808.855245-9-ming.lei@redhat.com>
In-Reply-To: <20230324135808.855245-1-ming.lei@redhat.com>
References: <20230324135808.855245-1-ming.lei@redhat.com>

A REQ_OP_FLUSH request never carries data, so don't consider it in either
ublk_map_io() or ublk_unmap_io().

Reviewed-by: Ziyang Zhang
Signed-off-by: Ming Lei
---
 drivers/block/ublk_drv.c | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index b8998ed87a0f..4fe324da97a0 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -529,15 +529,13 @@ static int ublk_map_io(const struct ublk_queue *ubq, const struct request *req,
 		struct ublk_io *io)
 {
 	const unsigned int rq_bytes = blk_rq_bytes(req);
+
 	/*
 	 * no zero copy, we delay copy WRITE request data into ublksrv
 	 * context and the big benefit is that pinning pages in current
 	 * context is pretty fast, see ublk_pin_user_pages
 	 */
-	if (req_op(req) != REQ_OP_WRITE && req_op(req) != REQ_OP_FLUSH)
-		return rq_bytes;
-
-	if (ublk_rq_has_data(req)) {
+	if (ublk_rq_has_data(req) && req_op(req) == REQ_OP_WRITE) {
 		struct ublk_map_data data = {
 			.ubq = ubq,
 			.rq = req,
@@ -772,9 +770,7 @@ static inline void __ublk_rq_task_work(struct request *req)
 		return;
 	}
 
-	if (ublk_need_get_data(ubq) &&
-			(req_op(req) == REQ_OP_WRITE ||
-			req_op(req) == REQ_OP_FLUSH)) {
+	if (ublk_need_get_data(ubq) && (req_op(req) == REQ_OP_WRITE)) {
 		/*
 		 * We have not handled UBLK_IO_NEED_GET_DATA command yet,
 		 * so immepdately pass UBLK_IO_RES_NEED_GET_DATA to ublksrv

From patchwork Fri Mar 24 13:58:00 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ming Lei
X-Patchwork-Id: 74571
From: Ming Lei
To: Jens Axboe, io-uring@vger.kernel.org, linux-block@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, Miklos Szeredi, ZiyangZhang, Xiaoguang Wang, Bernd Schubert, Pavel Begunkov, Stefan Hajnoczi, Ming Lei
Subject: [PATCH V4 09/17] block: ublk_drv: add two helpers to clean up map/unmap request
Date: Fri, 24 Mar 2023 21:58:00 +0800
Message-Id: <20230324135808.855245-10-ming.lei@redhat.com>
In-Reply-To: <20230324135808.855245-1-ming.lei@redhat.com>
References: <20230324135808.855245-1-ming.lei@redhat.com>

Add two helpers for checking whether map/unmap is needed, since we may have
passthrough requests that need map or unmap in the future, for example to
support report zones.

Meanwhile, don't mark ublk_copy_user_pages() as inline, since this function
is a bit fat now.

Reviewed-by: Ziyang Zhang
Signed-off-by: Ming Lei
---
 drivers/block/ublk_drv.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index 4fe324da97a0..43c0b1247924 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -488,8 +488,7 @@ static inline unsigned ublk_copy_io_pages(struct ublk_io_iter *data,
 	return done;
 }
 
-static inline int ublk_copy_user_pages(struct ublk_map_data *data,
-		bool to_vm)
+static int ublk_copy_user_pages(struct ublk_map_data *data, bool to_vm)
 {
 	const unsigned int gup_flags = to_vm ? FOLL_WRITE : 0;
 	const unsigned long start_vm = data->io->addr;
@@ -525,6 +524,16 @@ static inline int ublk_copy_user_pages(struct ublk_map_data *data,
 	return done;
 }
 
+static inline bool ublk_need_map_req(const struct request *req)
+{
+	return ublk_rq_has_data(req) && req_op(req) == REQ_OP_WRITE;
+}
+
+static inline bool ublk_need_unmap_req(const struct request *req)
+{
+	return ublk_rq_has_data(req) && req_op(req) == REQ_OP_READ;
+}
+
 static int ublk_map_io(const struct ublk_queue *ubq, const struct request *req,
 		struct ublk_io *io)
 {
@@ -535,7 +544,7 @@ static int ublk_map_io(const struct ublk_queue *ubq, const struct request *req,
 	 * context and the big benefit is that pinning pages in current
 	 * context is pretty fast, see ublk_pin_user_pages
 	 */
-	if (ublk_rq_has_data(req) && req_op(req) == REQ_OP_WRITE) {
+	if (ublk_need_map_req(req)) {
 		struct ublk_map_data data = {
 			.ubq = ubq,
 			.rq = req,
@@ -556,7 +565,7 @@ static int ublk_unmap_io(const struct ublk_queue *ubq,
 {
 	const unsigned int rq_bytes = blk_rq_bytes(req);
 
-	if (req_op(req) == REQ_OP_READ && ublk_rq_has_data(req)) {
+	if (ublk_need_unmap_req(req)) {
 		struct ublk_map_data data = {
 			.ubq = ubq,
 			.rq = req,
@@ -770,7 +779,7 @@ static inline void __ublk_rq_task_work(struct request *req)
 		return;
 	}
 
-	if (ublk_need_get_data(ubq) && (req_op(req) == REQ_OP_WRITE)) {
+	if (ublk_need_get_data(ubq) && ublk_need_map_req(req)) {
 		/*
 		 * We have not handled UBLK_IO_NEED_GET_DATA command yet,
 		 * so immepdately pass UBLK_IO_RES_NEED_GET_DATA to ublksrv

From patchwork Fri Mar 24 13:58:01 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ming Lei
X-Patchwork-Id: 74567
From: Ming Lei
To: Jens Axboe, io-uring@vger.kernel.org, linux-block@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, Miklos Szeredi, ZiyangZhang, Xiaoguang Wang, Bernd Schubert, Pavel Begunkov, Stefan Hajnoczi, Ming Lei
Subject: [PATCH V4 10/17] block: ublk_drv: clean up several helpers
Date: Fri, 24 Mar 2023 21:58:01 +0800
Message-Id: <20230324135808.855245-11-ming.lei@redhat.com>
In-Reply-To: <20230324135808.855245-1-ming.lei@redhat.com>
References: <20230324135808.855245-1-ming.lei@redhat.com>

Convert the following pattern in several helpers

	if (Z)
		return true
	return false

into:

	return Z;

Reviewed-by: Ziyang Zhang
Signed-off-by: Ming Lei
---
 drivers/block/ublk_drv.c | 18 +++++-------------
 1 file changed, 5 insertions(+), 13 deletions(-)

diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index 43c0b1247924..c35922f9a066 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -298,9 +298,7 @@ static inline bool ublk_can_use_task_work(const struct ublk_queue *ubq)
 
 static inline bool ublk_need_get_data(const struct ublk_queue *ubq)
 {
-	if (ubq->flags & UBLK_F_NEED_GET_DATA)
-		return true;
-	return false;
+	return ubq->flags & UBLK_F_NEED_GET_DATA;
 }
 
 static struct ublk_device *ublk_get_device(struct ublk_device *ub)
@@ -349,25 +347,19 @@ static inline int ublk_queue_cmd_buf_size(struct ublk_device *ub, int q_id)
 static inline bool ublk_queue_can_use_recovery_reissue(
 		struct ublk_queue *ubq)
 {
-	if ((ubq->flags & UBLK_F_USER_RECOVERY) &&
-			(ubq->flags & UBLK_F_USER_RECOVERY_REISSUE))
-		return true;
-	return false;
+	return (ubq->flags & UBLK_F_USER_RECOVERY) &&
+		(ubq->flags & UBLK_F_USER_RECOVERY_REISSUE);
 }
 
 static inline bool ublk_queue_can_use_recovery(
 		struct ublk_queue *ubq)
 {
-	if (ubq->flags & UBLK_F_USER_RECOVERY)
-		return true;
-	return false;
+	return ubq->flags & UBLK_F_USER_RECOVERY;
 }
 
 static inline bool ublk_can_use_recovery(struct ublk_device *ub)
 {
-	if (ub->dev_info.flags & UBLK_F_USER_RECOVERY)
-		return true;
-	return false;
+	return ub->dev_info.flags & UBLK_F_USER_RECOVERY;
 }
 
 static void ublk_free_disk(struct gendisk *disk)

From patchwork Fri Mar 24 13:58:02 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ming Lei
X-Patchwork-Id: 74559
[2620:137:e000::1:20]) by mx.google.com with ESMTP id qu12-20020a170907110c00b0091f418851absi21853279ejb.74.2023.03.24.07.02.29; Fri, 24 Mar 2023 07:03:12 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=XEsJnVbR; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232067AbjCXOAh (ORCPT + 99 others); Fri, 24 Mar 2023 10:00:37 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51932 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231984AbjCXOAJ (ORCPT ); Fri, 24 Mar 2023 10:00:09 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 492901B55B for ; Fri, 24 Mar 2023 06:59:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1679666344; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=PEVYCdfoGw57Z/H2hNeVQlTVQpqEIM+vCXbTPVRf6bs=; b=XEsJnVbRlkxlL5Sn0Nl3I5myEkAwt/5k+m/aWL5GF88C0vdP82uW1F4el7krNIn4Ju8oG5 rAd8plGnsiNJAXm3OYKE26cVn/itn721psIo6h3SbtapRYUWZCVK/vieqrBPWMvPWMCAK2 w2P03ApUnJ+p8lFgDk6EgZFQ8pRXLVE= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 
us-mta-527-s9xC28ysP9GcSiECsOudnQ-1; Fri, 24 Mar 2023 09:59:00 -0400 X-MC-Unique: s9xC28ysP9GcSiECsOudnQ-1 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.rdu2.redhat.com [10.11.54.5]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 9904A855304; Fri, 24 Mar 2023 13:58:59 +0000 (UTC) Received: from localhost (ovpn-8-20.pek2.redhat.com [10.72.8.20]) by smtp.corp.redhat.com (Postfix) with ESMTP id D19792A68; Fri, 24 Mar 2023 13:58:58 +0000 (UTC) From: Ming Lei To: Jens Axboe , io-uring@vger.kernel.org, linux-block@vger.kernel.org Cc: linux-kernel@vger.kernel.org, Miklos Szeredi , ZiyangZhang , Xiaoguang Wang , Bernd Schubert , Pavel Begunkov , Stefan Hajnoczi , Ming Lei Subject: [PATCH V4 11/17] block: ublk_drv: cleanup 'struct ublk_map_data' Date: Fri, 24 Mar 2023 21:58:02 +0800 Message-Id: <20230324135808.855245-12-ming.lei@redhat.com> In-Reply-To: <20230324135808.855245-1-ming.lei@redhat.com> References: <20230324135808.855245-1-ming.lei@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.5 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1761258076319825737?= X-GMAIL-MSGID: =?utf-8?q?1761258076319825737?= 'struct ublk_map_data' is passed to ublk_copy_user_pages() for copying data between userspace buffer and request pages. Here what matters is userspace buffer address/len and 'struct request', so replace ->io field with user buffer address, and rename max_bytes as len. 
Meantime remove 'ubq' field from ublk_map_data, since it isn't used any more. Then code becomes more readable. Reviewed-by: Ziyang Zhang Signed-off-by: Ming Lei --- drivers/block/ublk_drv.c | 27 ++++++++++++--------------- 1 file changed, 12 insertions(+), 15 deletions(-) diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c index c35922f9a066..0e7533858c1f 100644 --- a/drivers/block/ublk_drv.c +++ b/drivers/block/ublk_drv.c @@ -420,10 +420,9 @@ static const struct block_device_operations ub_fops = { #define UBLK_MAX_PIN_PAGES 32 struct ublk_map_data { - const struct ublk_queue *ubq; const struct request *rq; - const struct ublk_io *io; - unsigned max_bytes; + unsigned long ubuf; + unsigned int len; }; struct ublk_io_iter { @@ -483,14 +482,14 @@ static inline unsigned ublk_copy_io_pages(struct ublk_io_iter *data, static int ublk_copy_user_pages(struct ublk_map_data *data, bool to_vm) { const unsigned int gup_flags = to_vm ? FOLL_WRITE : 0; - const unsigned long start_vm = data->io->addr; + const unsigned long start_vm = data->ubuf; unsigned int done = 0; struct ublk_io_iter iter = { .pg_off = start_vm & (PAGE_SIZE - 1), .bio = data->rq->bio, .iter = data->rq->bio->bi_iter, }; - const unsigned int nr_pages = round_up(data->max_bytes + + const unsigned int nr_pages = round_up(data->len + (start_vm & (PAGE_SIZE - 1)), PAGE_SIZE) >> PAGE_SHIFT; while (done < nr_pages) { @@ -503,13 +502,13 @@ static int ublk_copy_user_pages(struct ublk_map_data *data, bool to_vm) iter.pages); if (iter.nr_pages <= 0) return done == 0 ? 
iter.nr_pages : done; - len = ublk_copy_io_pages(&iter, data->max_bytes, to_vm); + len = ublk_copy_io_pages(&iter, data->len, to_vm); for (i = 0; i < iter.nr_pages; i++) { if (to_vm) set_page_dirty(iter.pages[i]); put_page(iter.pages[i]); } - data->max_bytes -= len; + data->len -= len; done += iter.nr_pages; } @@ -538,15 +537,14 @@ static int ublk_map_io(const struct ublk_queue *ubq, const struct request *req, */ if (ublk_need_map_req(req)) { struct ublk_map_data data = { - .ubq = ubq, .rq = req, - .io = io, - .max_bytes = rq_bytes, + .ubuf = io->addr, + .len = rq_bytes, }; ublk_copy_user_pages(&data, true); - return rq_bytes - data.max_bytes; + return rq_bytes - data.len; } return rq_bytes; } @@ -559,17 +557,16 @@ static int ublk_unmap_io(const struct ublk_queue *ubq, if (ublk_need_unmap_req(req)) { struct ublk_map_data data = { - .ubq = ubq, .rq = req, - .io = io, - .max_bytes = io->res, + .ubuf = io->addr, + .len = io->res, }; WARN_ON_ONCE(io->res > rq_bytes); ublk_copy_user_pages(&data, false); - return io->res - data.max_bytes; + return io->res - data.len; } return rq_bytes; } From patchwork Fri Mar 24 13:58:03 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 74568 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp658125vqo; Fri, 24 Mar 2023 07:14:25 -0700 (PDT) X-Google-Smtp-Source: AKy350bnnOnTc9R5mFHRbwTt+PYqxQSJpgJZ4n2H/tgJF+rS8F8txH12/g8rgbF8H5bE1rYljps+ X-Received: by 2002:a17:906:5849:b0:931:4b0b:73e3 with SMTP id h9-20020a170906584900b009314b0b73e3mr2765385ejs.65.1679667265446; Fri, 24 Mar 2023 07:14:25 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1679667265; cv=none; d=google.com; s=arc-20160816; b=MtwlFEik4/SajObp3TTrdBFvUwdAbTA9b7/vL3kdtCt/egndiSW4gw6I1eeGLwVWTF 75T4jigNo1Z4oAUJQ4DwViNkS6ieEaO9cr/whhtq3WVZ2TBejIPaxOLFm2mW3MekJfzk 
tzrxafLQqxpDAr+W5suA8qZ+hgbnwTyAV1HX6huGdHAfPXM2M9B34SPJKE1gqOLxE2q+ XVPSGwTRM0lT4AUsRXzgv92k+PvcQ0Aoj6n/wETt1vHrCeNNCEsVT4t+/jpOVRdIMOB4 rt0+HMwTY/Lk2lgZSGDoYQb8Uz8PuQJgFwBodK3NAciPE/krrJZWQ2ERjC8TPnjHbnKW mOxQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=NN28dKc6XxSu1a4tYzmGw6THkgXaxqwwsT/5pOpys2I=; b=wSWLgw/Btl27lkYcqvRIKz77ygdUkKmQHPJR5DKrLNEZmBWdqt0qqKnQ2eXYwtzTJd qV0Hd6iVyCMx4erjeB6VNjHb8QHCqvw3uDJsYtQUOAWl+iKRJgD6bwjaHuZn6g272p68 dU7FnIbe8smIuqiY6J8dB5i9WLO68jOvw0XouuQrMh+fYToRYq1dAntgPdr4pkHwx+8z 1aDjlcEvDotpUZ+kD2oOgfk70kY9EjhongmZfy9TWW2GND5rLXfvyBpdRp8qr8U2DDDV 77nMc4hLCbf7ISPWOEHtoowmJJpcsaR+HksNYsFZU803iK4Ha+iBcW2Y9uu6ciIeVydb KM2g== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=WuDHx3oj; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. 
[2620:137:e000::1:20]) by mx.google.com with ESMTP id gt39-20020a1709072da700b0093337d30dadsi23873675ejc.522.2023.03.24.07.14.01; Fri, 24 Mar 2023 07:14:25 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=WuDHx3oj; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231953AbjCXOA6 (ORCPT + 99 others); Fri, 24 Mar 2023 10:00:58 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50844 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231778AbjCXOAT (ORCPT ); Fri, 24 Mar 2023 10:00:19 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2FA231BACE for ; Fri, 24 Mar 2023 06:59:08 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1679666347; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=NN28dKc6XxSu1a4tYzmGw6THkgXaxqwwsT/5pOpys2I=; b=WuDHx3ojHLVAGaN/Ee6L7xKcUVTcB6ZbXisE6cWXPBOi/y8mMQyXdILgNXdeMSggedqJoh FouWmTz9IYCdoqsQfFQg2ewsIPNpdjBhbODdUh6gcpOKFuePH/h6ok9GvhI5ul246aeTni PGjBAILsOAsDJsibXo83GsDMC7uXGEo= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 
us-mta-643-orZJI0ePNO2w2QP9P70RMg-1; Fri, 24 Mar 2023 09:59:04 -0400 X-MC-Unique: orZJI0ePNO2w2QP9P70RMg-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.rdu2.redhat.com [10.11.54.2]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 5500F2817233; Fri, 24 Mar 2023 13:59:03 +0000 (UTC) Received: from localhost (ovpn-8-20.pek2.redhat.com [10.72.8.20]) by smtp.corp.redhat.com (Postfix) with ESMTP id 789294020C81; Fri, 24 Mar 2023 13:59:02 +0000 (UTC) From: Ming Lei To: Jens Axboe , io-uring@vger.kernel.org, linux-block@vger.kernel.org Cc: linux-kernel@vger.kernel.org, Miklos Szeredi , ZiyangZhang , Xiaoguang Wang , Bernd Schubert , Pavel Begunkov , Stefan Hajnoczi , Ming Lei Subject: [PATCH V4 12/17] block: ublk_drv: cleanup ublk_copy_user_pages Date: Fri, 24 Mar 2023 21:58:03 +0800 Message-Id: <20230324135808.855245-13-ming.lei@redhat.com> In-Reply-To: <20230324135808.855245-1-ming.lei@redhat.com> References: <20230324135808.855245-1-ming.lei@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.2 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1761258782467456257?= X-GMAIL-MSGID: =?utf-8?q?1761258782467456257?= Clean up ublk_copy_user_pages() by using iov iter, and code gets simplified a lot and becomes much more readable than before. 
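
One piece of arithmetic in this cleanup may be worth spelling out. After a chunk of `len` bytes has been pinned, starting `off` bytes into its first page, the patch releases DIV_ROUND_UP(len + off, PAGE_SIZE) pages. The userspace sketch below reproduces only that calculation. `DEMO_PAGE_SIZE` (assumed to be 4096), `DEMO_DIV_ROUND_UP` and `demo_pages_spanned()` are hypothetical names for illustration and are not kernel symbols.

```c
#include <stddef.h>

/* Assumed page size for the illustration; the kernel uses PAGE_SIZE. */
#define DEMO_PAGE_SIZE 4096UL

/* Same rounding-up integer division that the kernel's DIV_ROUND_UP performs. */
#define DEMO_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Number of pages touched by a buffer of 'len' bytes that begins
 * 'off' bytes into its first page (off < DEMO_PAGE_SIZE).
 */
static inline size_t demo_pages_spanned(size_t len, size_t off)
{
	return DEMO_DIV_ROUND_UP(len + off, DEMO_PAGE_SIZE);
}
```

For example, a page-sized buffer that starts one byte into a page spans two pages, which is why the offset has to be added before rounding up.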
Signed-off-by: Ming Lei --- drivers/block/ublk_drv.c | 112 +++++++++++++++++---------------------- 1 file changed, 49 insertions(+), 63 deletions(-) diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c index 0e7533858c1f..85ceb8c09d0e 100644 --- a/drivers/block/ublk_drv.c +++ b/drivers/block/ublk_drv.c @@ -419,49 +419,39 @@ static const struct block_device_operations ub_fops = { #define UBLK_MAX_PIN_PAGES 32 -struct ublk_map_data { - const struct request *rq; - unsigned long ubuf; - unsigned int len; -}; - struct ublk_io_iter { struct page *pages[UBLK_MAX_PIN_PAGES]; - unsigned pg_off; /* offset in the 1st page in pages */ - int nr_pages; /* how many page pointers in pages */ struct bio *bio; struct bvec_iter iter; }; -static inline unsigned ublk_copy_io_pages(struct ublk_io_iter *data, - unsigned max_bytes, bool to_vm) +/* return how many pages are copied */ +static void ublk_copy_io_pages(struct ublk_io_iter *data, + size_t total, size_t pg_off, int dir) { - const unsigned total = min_t(unsigned, max_bytes, - PAGE_SIZE - data->pg_off + - ((data->nr_pages - 1) << PAGE_SHIFT)); unsigned done = 0; unsigned pg_idx = 0; while (done < total) { struct bio_vec bv = bio_iter_iovec(data->bio, data->iter); - const unsigned int bytes = min3(bv.bv_len, total - done, - (unsigned)(PAGE_SIZE - data->pg_off)); + unsigned int bytes = min3(bv.bv_len, (unsigned)total - done, + (unsigned)(PAGE_SIZE - pg_off)); void *bv_buf = bvec_kmap_local(&bv); void *pg_buf = kmap_local_page(data->pages[pg_idx]); - if (to_vm) - memcpy(pg_buf + data->pg_off, bv_buf, bytes); + if (dir == ITER_DEST) + memcpy(pg_buf + pg_off, bv_buf, bytes); else - memcpy(bv_buf, pg_buf + data->pg_off, bytes); + memcpy(bv_buf, pg_buf + pg_off, bytes); kunmap_local(pg_buf); kunmap_local(bv_buf); /* advance page array */ - data->pg_off += bytes; - if (data->pg_off == PAGE_SIZE) { + pg_off += bytes; + if (pg_off == PAGE_SIZE) { pg_idx += 1; - data->pg_off = 0; + pg_off = 0; } done += bytes; @@ -475,41 +465,40 
@@ static inline unsigned ublk_copy_io_pages(struct ublk_io_iter *data, data->iter = data->bio->bi_iter; } } - - return done; } -static int ublk_copy_user_pages(struct ublk_map_data *data, bool to_vm) +/* + * Copy data between request pages and io_iter, and 'offset' + * is the start point of linear offset of request. + */ +static size_t ublk_copy_user_pages(const struct request *req, + struct iov_iter *uiter, int dir) { - const unsigned int gup_flags = to_vm ? FOLL_WRITE : 0; - const unsigned long start_vm = data->ubuf; - unsigned int done = 0; struct ublk_io_iter iter = { - .pg_off = start_vm & (PAGE_SIZE - 1), - .bio = data->rq->bio, - .iter = data->rq->bio->bi_iter, + .bio = req->bio, + .iter = req->bio->bi_iter, }; - const unsigned int nr_pages = round_up(data->len + - (start_vm & (PAGE_SIZE - 1)), PAGE_SIZE) >> PAGE_SHIFT; - - while (done < nr_pages) { - const unsigned to_pin = min_t(unsigned, UBLK_MAX_PIN_PAGES, - nr_pages - done); - unsigned i, len; - - iter.nr_pages = get_user_pages_fast(start_vm + - (done << PAGE_SHIFT), to_pin, gup_flags, - iter.pages); - if (iter.nr_pages <= 0) - return done == 0 ? 
iter.nr_pages : done; - len = ublk_copy_io_pages(&iter, data->len, to_vm); - for (i = 0; i < iter.nr_pages; i++) { - if (to_vm) + size_t done = 0; + + while (iov_iter_count(uiter) && iter.bio) { + unsigned nr_pages; + size_t len, off; + int i; + + len = iov_iter_get_pages2(uiter, iter.pages, + iov_iter_count(uiter), + UBLK_MAX_PIN_PAGES, &off); + if (len <= 0) + return done; + + ublk_copy_io_pages(&iter, len, off, dir); + nr_pages = DIV_ROUND_UP(len + off, PAGE_SIZE); + for (i = 0; i < nr_pages; i++) { + if (dir == ITER_DEST) set_page_dirty(iter.pages[i]); put_page(iter.pages[i]); } - data->len -= len; - done += iter.nr_pages; + done += len; } return done; @@ -536,15 +525,14 @@ static int ublk_map_io(const struct ublk_queue *ubq, const struct request *req, * context is pretty fast, see ublk_pin_user_pages */ if (ublk_need_map_req(req)) { - struct ublk_map_data data = { - .rq = req, - .ubuf = io->addr, - .len = rq_bytes, - }; + struct iov_iter iter; + struct iovec iov; + const int dir = ITER_DEST; - ublk_copy_user_pages(&data, true); + import_single_range(dir, u64_to_user_ptr(io->addr), rq_bytes, + &iov, &iter); - return rq_bytes - data.len; + return ublk_copy_user_pages(req, &iter, dir); } return rq_bytes; } @@ -556,17 +544,15 @@ static int ublk_unmap_io(const struct ublk_queue *ubq, const unsigned int rq_bytes = blk_rq_bytes(req); if (ublk_need_unmap_req(req)) { - struct ublk_map_data data = { - .rq = req, - .ubuf = io->addr, - .len = io->res, - }; + struct iov_iter iter; + struct iovec iov; + const int dir = ITER_SOURCE; WARN_ON_ONCE(io->res > rq_bytes); - ublk_copy_user_pages(&data, false); - - return io->res - data.len; + import_single_range(dir, u64_to_user_ptr(io->addr), io->res, + &iov, &iter); + return ublk_copy_user_pages(req, &iter, dir); } return rq_bytes; } From patchwork Fri Mar 24 13:58:04 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 74565 Return-Path: 
Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp651167vqo; Fri, 24 Mar 2023 07:06:08 -0700 (PDT) X-Google-Smtp-Source: AKy350Zt51BwqXWKtldQzDv12KMh9pQWZ4baHG8YTRqPj8nzbI3wsM3jn3GfETTaitUjGJihQYsf X-Received: by 2002:aa7:c950:0:b0:500:2c4f:3f5 with SMTP id h16-20020aa7c950000000b005002c4f03f5mr2582034edt.12.1679666767852; Fri, 24 Mar 2023 07:06:07 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1679666767; cv=none; d=google.com; s=arc-20160816; b=AsWCjiVePbCmn1PPSiDQ7I7dKK3rsxZHzYRs6OC0alXFkfByhTqoSZPnlPCfGwcPIk FPRdXDf7nF+X4dTQ0XXwZnKwn8Xb8Vi1k+6nqLb7MjMKtxX9BUYO4yd2VpIZ1OA3z9fG xaWfOWN+O0CaOSKTb+n7EmPnGVNu/mAppWSPks0X5oe00yPjQc6BPSrDvekmIEo/7ai6 xoxKyEH/2bPgNSEOsN7WqRZ/hOmOqSceBZaM9FF+Pz3pq1J02csX3MM+g+VQo2CMfQZE gtHyduib0zqdMqYcIxha2wTSc514q1WG0XxzoaNbaFFGibBLD63RAcygu7AdvhurR8Nb /wqw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=hvyvZWm/csvLwmrm6KZCbJ14Wg5t5i+o8LQkCJJRLEg=; b=fRaD8UgHq5E+Xf7xXmq59VZ1lp4CjeAYgaKYvFgVH9R/xGk2aOzHdpPQjUS0LY9E7B zpkQdBg7MUSByzWI7YsTfcr9qUDD6COdp6XZrqeLEwLJmUa7I+DmVzv7XzXxbagy7Qvo dsatcRGUgPgdVOJ49KfBMbRUDsCPtNGjtVAJ51chCxPod6+Wi4bSJB2SGxXAIqPUiyp4 44qAhIMTURBlhEAnKip5UWcorkdkC/h83CwRZBBMC6GKRMy3YChNA4NPMatE3s0/T2ES kcjucL81I153QqhHwNqQ6ap3DsPXmEOm7mw6byxnKPu3C6MjDMjveqN2U3Wn7UzCU7LD k8Sw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=RbO4awdU; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. 
[2620:137:e000::1:20]) by mx.google.com with ESMTP id o12-20020aa7c50c000000b004fd23c911f7si695049edq.544.2023.03.24.07.05.25; Fri, 24 Mar 2023 07:06:07 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=RbO4awdU; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231358AbjCXOBM (ORCPT + 99 others); Fri, 24 Mar 2023 10:01:12 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50836 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232114AbjCXOA1 (ORCPT ); Fri, 24 Mar 2023 10:00:27 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7A0C71C32E for ; Fri, 24 Mar 2023 06:59:11 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1679666350; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=hvyvZWm/csvLwmrm6KZCbJ14Wg5t5i+o8LQkCJJRLEg=; b=RbO4awdUhqR0n4yMUrcGfDwR71YWgsNcD9TEPHumh7UO9tmN5ABTe2MLdciLpY6LKW9/+9 nIkhZv/yBOxQ1AAy3fS3bGwFQSTUJMP0njhw2KyyyEG8K4HPtzPC0oWvub3mDSDZnnXzov HPfdmuTx/Pt1GoSBG/WU2nDmMp/D/1o= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 
us-mta-28-D9mKF-kFPQ6VIpICSV0kEQ-1; Fri, 24 Mar 2023 09:59:07 -0400 X-MC-Unique: D9mKF-kFPQ6VIpICSV0kEQ-1 Received: from smtp.corp.redhat.com (int-mx10.intmail.prod.int.rdu2.redhat.com [10.11.54.10]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id BBBDF385556D; Fri, 24 Mar 2023 13:59:06 +0000 (UTC) Received: from localhost (ovpn-8-20.pek2.redhat.com [10.72.8.20]) by smtp.corp.redhat.com (Postfix) with ESMTP id D4412492B03; Fri, 24 Mar 2023 13:59:05 +0000 (UTC) From: Ming Lei To: Jens Axboe , io-uring@vger.kernel.org, linux-block@vger.kernel.org Cc: linux-kernel@vger.kernel.org, Miklos Szeredi , ZiyangZhang , Xiaoguang Wang , Bernd Schubert , Pavel Begunkov , Stefan Hajnoczi , Ming Lei Subject: [PATCH V4 13/17] block: ublk_drv: grab request reference when the request is handled by userspace Date: Fri, 24 Mar 2023 21:58:04 +0800 Message-Id: <20230324135808.855245-14-ming.lei@redhat.com> In-Reply-To: <20230324135808.855245-1-ming.lei@redhat.com> References: <20230324135808.855245-1-ming.lei@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.10 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1761258260515640955?= X-GMAIL-MSGID: =?utf-8?q?1761258260515640955?= Add one reference counter into request pdu data, and hold this reference in the request's lifetime. This way is always safe. In theory, the ublk request won't be completed until fused commands are done. However, it is userspace, and application can submit fused command at will. 
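
The reference counting idea can be sketched in userspace as follows. This is an analogue built on C11 atomics, not the kernel's kref implementation, and every name in it (`demo_req`, `demo_ref_init()`, `demo_ref_get_unless_zero()`, `demo_ref_put()`) is hypothetical. It assumes the counter starts at one when the request is handed to userspace, that an extra reference can be taken only while the counter is nonzero, and that completion runs when the last reference is dropped.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-request private data. */
struct demo_req {
	atomic_int ref;   /* reference counter */
	int completed;    /* how many times the completion path ran */
};

/* Called when the request is handed to userspace: one initial reference. */
static inline void demo_ref_init(struct demo_req *r)
{
	atomic_init(&r->ref, 1);
}

/* Take an extra reference, but only if the counter has not reached zero. */
static inline bool demo_ref_get_unless_zero(struct demo_req *r)
{
	int old = atomic_load(&r->ref);

	while (old != 0) {
		/* On failure 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&r->ref, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop one reference; whoever drops the last one completes the request. */
static inline void demo_ref_put(struct demo_req *r)
{
	if (atomic_fetch_sub(&r->ref, 1) == 1)
		r->completed++;
}
```

In this sketch a late caller cannot revive a request whose counter has already dropped to zero, and completion runs exactly once no matter which holder drops the final reference.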
Prepare for supporting zero copy, which needs to retrieve request buffer by fused command, so we have to guarantee: - the fused command can't succeed unless the request isn't queued - when any fused command is successful, this request can't be freed until all fused commands on this request are done. Signed-off-by: Ming Lei --- drivers/block/ublk_drv.c | 67 ++++++++++++++++++++++++++++++++++++++-- 1 file changed, 64 insertions(+), 3 deletions(-) diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c index 85ceb8c09d0e..88d5a657834d 100644 --- a/drivers/block/ublk_drv.c +++ b/drivers/block/ublk_drv.c @@ -43,6 +43,7 @@ #include #include #include +#include #include #define UBLK_MINORS (1U << MINORBITS) @@ -62,6 +63,17 @@ struct ublk_rq_data { struct llist_node node; struct callback_head work; + + /* + * Only for applying fused command to support zero copy: + * + * - if there is any fused command aiming at this request, not complete + * request until all fused commands are done + * + * - fused command has to fail unless this reference is grabbed + * successfully + */ + struct kref ref; }; struct ublk_uring_cmd_pdu { @@ -180,6 +192,9 @@ struct ublk_params_header { __u32 types; }; +static inline void __ublk_complete_rq(struct request *req); +static void ublk_complete_rq(struct kref *ref); + static dev_t ublk_chr_devt; static struct class *ublk_chr_class; @@ -288,6 +303,35 @@ static int ublk_apply_params(struct ublk_device *ub) return 0; } +static inline bool ublk_support_zc(const struct ublk_queue *ubq) +{ + return ubq->flags & UBLK_F_SUPPORT_ZERO_COPY; +} + +static inline bool ublk_get_req_ref(const struct ublk_queue *ubq, + struct request *req) +{ + if (ublk_support_zc(ubq)) { + struct ublk_rq_data *data = blk_mq_rq_to_pdu(req); + + return kref_get_unless_zero(&data->ref); + } + + return true; +} + +static inline void ublk_put_req_ref(const struct ublk_queue *ubq, + struct request *req) +{ + if (ublk_support_zc(ubq)) { + struct ublk_rq_data *data = 
blk_mq_rq_to_pdu(req); + + kref_put(&data->ref, ublk_complete_rq); + } else { + __ublk_complete_rq(req); + } +} + static inline bool ublk_can_use_task_work(const struct ublk_queue *ubq) { if (IS_BUILTIN(CONFIG_BLK_DEV_UBLK) && @@ -632,13 +676,19 @@ static inline bool ubq_daemon_is_dying(struct ublk_queue *ubq) } /* todo: handle partial completion */ -static void ublk_complete_rq(struct request *req) +static inline void __ublk_complete_rq(struct request *req) { struct ublk_queue *ubq = req->mq_hctx->driver_data; struct ublk_io *io = &ubq->ios[req->tag]; unsigned int unmapped_bytes; blk_status_t res = BLK_STS_OK; + /* called from ublk_abort_queue() code path */ + if (io->flags & UBLK_IO_FLAG_ABORTED) { + res = BLK_STS_IOERR; + goto exit; + } + /* failed read IO if nothing is read */ if (!io->res && req_op(req) == REQ_OP_READ) io->res = -EIO; @@ -678,6 +728,15 @@ static void ublk_complete_rq(struct request *req) blk_mq_end_request(req, res); } +static void ublk_complete_rq(struct kref *ref) +{ + struct ublk_rq_data *data = container_of(ref, struct ublk_rq_data, + ref); + struct request *req = blk_mq_rq_from_pdu(data); + + __ublk_complete_rq(req); +} + /* * Since __ublk_rq_task_work always fails requests immediately during * exiting, __ublk_fail_req() is only called from abort context during @@ -696,7 +755,7 @@ static void __ublk_fail_req(struct ublk_queue *ubq, struct ublk_io *io, if (ublk_queue_can_use_recovery_reissue(ubq)) blk_mq_requeue_request(req, false); else - blk_mq_end_request(req, BLK_STS_IOERR); + ublk_put_req_ref(ubq, req); } } @@ -732,6 +791,7 @@ static inline void __ublk_abort_rq(struct ublk_queue *ubq, static inline void __ublk_rq_task_work(struct request *req) { struct ublk_queue *ubq = req->mq_hctx->driver_data; + struct ublk_rq_data *data = blk_mq_rq_to_pdu(req); int tag = req->tag; struct ublk_io *io = &ubq->ios[tag]; unsigned int mapped_bytes; @@ -803,6 +863,7 @@ static inline void __ublk_rq_task_work(struct request *req) mapped_bytes >> 9; } + 
kref_init(&data->ref); ubq_complete_io_cmd(io, UBLK_IO_RES_OK); } @@ -1013,7 +1074,7 @@ static void ublk_commit_completion(struct ublk_device *ub, req = blk_mq_tag_to_rq(ub->tag_set.tags[qid], tag); if (req && likely(!blk_should_fake_timeout(req->q))) - ublk_complete_rq(req); + ublk_put_req_ref(ubq, req); } /* From patchwork Fri Mar 24 13:58:05 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 74564 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp650932vqo; Fri, 24 Mar 2023 07:05:53 -0700 (PDT) X-Google-Smtp-Source: AKy350a3SrZL/p2lPkRnQRxsSw+jYwLfm1WDGlKMUzoA9DxTVbubbFZqG4ytZwoH1sjm2BOw6Drj X-Received: by 2002:aa7:d8cf:0:b0:4fc:8642:ce56 with SMTP id k15-20020aa7d8cf000000b004fc8642ce56mr2887933eds.25.1679666753068; Fri, 24 Mar 2023 07:05:53 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1679666753; cv=none; d=google.com; s=arc-20160816; b=zNdkzZTYnEVIKcGkPooCapEmAaf70g9ffmEeeUHs1qllkt5hugjYRWRl+49NmkvLmt 9jymG/r0lHfNqWX0B6LhbY4OEQ/XC9rrkTRKa2c7Yxlqwiv38wh0WfrjcLk5OSQx2gE4 Ax7CE5um+d+KTj0jX+w47f4bZety3b5MCwmCyeLOWYqUhwV/KygMOmY/xdjXimwHYMFy 4qlKJGI2LpwIICWNPAPqfCdtWiOKkB2MOCA2xJHcUKWmplkosyhTL/XLAwxqt3+SYXpq VZ10ry5cdEQFYJ+T7ORdcsgXDkj3xkAoSVTYdRucNEzRQUkRXZfYxnS89m3y/cK5tKhZ fafw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=uuQC+m3nWXpEvAHBrpKpd923ngOUWFuBZCOxaTw0mf8=; b=t/YqsdLoM+fDWbsBlNq9lU5jXidCcBzGDdr3yQjMPytoi8Pk4p+F+GC+Lh+X6JwyNs c1LthX3vgSP1L089SI2ELUf9z9CJPLQ2gotzB6u1tVI644DM2aKxRFQBa5y7pESqh4su v0jvonxSptqSEPG9hdOiL/H5T9rzBWH0X3+GBLD1ow+HBBHkcCzROyR4gfgvkzTeq+sA +O5cdcAvzQqrP2xRboYFU8XXmiEySv0A1w1S+sjn/5dvnnXTJzWWTHou2bllUReMv6zy nJRRFJIAGVVIRhFMHdsyXSHpswswQk1NPU/gril6VWXQGM1q4JpCh39qZ0j4W1ckKvzM 5fJg== 
b=WgsG0oRey4sA3/94UJ8aXmGLTyZVUjoS1ExwvfE568rZA5btH6hYGHmcpL8oVoM41cmOqf w3l5D6gSQknPYp3TbLIDlbF+vwLQTqURtD/r1vUAJmPZ6Xss+8Db0xie9kU1NeOYB2kjrw b792fz2BB2QNnJAWrO+s/B2q0kP3UR8= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-267-OzWGU5o8N2yR7DVTUK2FWw-1; Fri, 24 Mar 2023 09:59:10 -0400 X-MC-Unique: OzWGU5o8N2yR7DVTUK2FWw-1 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.rdu2.redhat.com [10.11.54.1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 49D26185A7AC; Fri, 24 Mar 2023 13:59:10 +0000 (UTC) Received: from localhost (ovpn-8-20.pek2.redhat.com [10.72.8.20]) by smtp.corp.redhat.com (Postfix) with ESMTP id 811204042AC9; Fri, 24 Mar 2023 13:59:09 +0000 (UTC) From: Ming Lei To: Jens Axboe , io-uring@vger.kernel.org, linux-block@vger.kernel.org Cc: linux-kernel@vger.kernel.org, Miklos Szeredi , ZiyangZhang , Xiaoguang Wang , Bernd Schubert , Pavel Begunkov , Stefan Hajnoczi , Ming Lei Subject: [PATCH V4 14/17] block: ublk_drv: support to copy any part of request pages Date: Fri, 24 Mar 2023 21:58:05 +0800 Message-Id: <20230324135808.855245-15-ming.lei@redhat.com> In-Reply-To: <20230324135808.855245-1-ming.lei@redhat.com> References: <20230324135808.855245-1-ming.lei@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.1 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: 
=?utf-8?q?1761258245305878128?= X-GMAIL-MSGID: =?utf-8?q?1761258245305878128?= Add 'offset' to 'struct ublk_map_data', so that ublk_copy_user_pages() can be used to copy any sub-buffer(linear mapped) of the request. Signed-off-by: Ming Lei --- drivers/block/ublk_drv.c | 31 ++++++++++++++++++++++++------- 1 file changed, 24 insertions(+), 7 deletions(-) diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c index 88d5a657834d..26a14c54da1d 100644 --- a/drivers/block/ublk_drv.c +++ b/drivers/block/ublk_drv.c @@ -511,19 +511,36 @@ static void ublk_copy_io_pages(struct ublk_io_iter *data, } } +static bool ublk_advance_io_iter(const struct request *req, + struct ublk_io_iter *iter, unsigned int offset) +{ + struct bio *bio = req->bio; + + for_each_bio(bio) { + if (bio->bi_iter.bi_size > offset) { + iter->bio = bio; + iter->iter = bio->bi_iter; + bio_advance_iter(iter->bio, &iter->iter, offset); + return true; + } + offset -= bio->bi_iter.bi_size; + } + return false; +} + /* * Copy data between request pages and io_iter, and 'offset' * is the start point of linear offset of request. 
*/ static size_t ublk_copy_user_pages(const struct request *req, - struct iov_iter *uiter, int dir) + unsigned offset, struct iov_iter *uiter, int dir) { - struct ublk_io_iter iter = { - .bio = req->bio, - .iter = req->bio->bi_iter, - }; + struct ublk_io_iter iter; size_t done = 0; + if (!ublk_advance_io_iter(req, &iter, offset)) + return 0; + while (iov_iter_count(uiter) && iter.bio) { unsigned nr_pages; size_t len, off; @@ -576,7 +593,7 @@ static int ublk_map_io(const struct ublk_queue *ubq, const struct request *req, import_single_range(dir, u64_to_user_ptr(io->addr), rq_bytes, &iov, &iter); - return ublk_copy_user_pages(req, &iter, dir); + return ublk_copy_user_pages(req, 0, &iter, dir); } return rq_bytes; } @@ -596,7 +613,7 @@ static int ublk_unmap_io(const struct ublk_queue *ubq, import_single_range(dir, u64_to_user_ptr(io->addr), io->res, &iov, &iter); - return ublk_copy_user_pages(req, &iter, dir); + return ublk_copy_user_pages(req, 0, &iter, dir); } return rq_bytes; } From patchwork Fri Mar 24 13:58:06 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 74562 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp649083vqo; Fri, 24 Mar 2023 07:03:58 -0700 (PDT) X-Google-Smtp-Source: AKy350Y8d/RGbBgq56GPVbrR2n7in08S6AHHIWFnVdgXmi26GB6WmD+V7wIhYG47/dIp2231KrX7 X-Received: by 2002:aa7:c6c8:0:b0:4f9:deb4:b986 with SMTP id b8-20020aa7c6c8000000b004f9deb4b986mr2638923eds.7.1679666637760; Fri, 24 Mar 2023 07:03:57 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1679666637; cv=none; d=google.com; s=arc-20160816; b=Laniie2d/4bDi0m3LxSuGhKfxA+gbA+N3SzS56j5YvqF67nCOdc9B0m1UK/LETmopg pWJEn9q0Ckty4mDXOQjioae5UjHq3pXf2n1jyouWjAzyw9gbbIOlg8n5MBiGWk8l6KJr IWb9WM/9ZMNpXA8ot6khJtqXyWidaQ0bi3dUUyieJu8y/NULTjU5D4/zCYs1M5H56HyV Ht6H6Mhe7XANa01uBDTx6nmpPBJqlZgNzql2Uk/8F/RJN5OuAJRa8rEcQv/tQLjQ3jJm 
From: Ming Lei
To: Jens Axboe, io-uring@vger.kernel.org, linux-block@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, Miklos Szeredi, ZiyangZhang, Xiaoguang Wang, Bernd Schubert, Pavel Begunkov, Stefan Hajnoczi, Ming Lei
Subject: [PATCH V4 15/17] block: ublk_drv: add read()/write() support for ublk char device
Date: Fri, 24 Mar 2023 21:58:06 +0800
Message-Id: <20230324135808.855245-16-ming.lei@redhat.com>
In-Reply-To: <20230324135808.855245-1-ming.lei@redhat.com>
References: <20230324135808.855245-1-ming.lei@redhat.com>

We are going to support zero copy via fused uring commands. With zero
copy, userspace can no longer read from or write to the io buffer
directly, which makes things less flexible for applications:

1) some targets need to zero buffer explicitly, such as when reading
unmapped qcow2 cluster

2) some targets need to support passthrough
command, such as zoned report zones, and still need to read/write the io buffer Support pread()/pwrite() on ublk char device for reading/writing request io buffer, so ublk server can handle the above cases easily. This also can help to make zero copy becoming the primary option, and non-zero-copy will become legacy code path since the added read()/write() can cover non-zero-copy feature. Signed-off-by: Ming Lei --- drivers/block/ublk_drv.c | 131 ++++++++++++++++++++++++++++++++++ include/uapi/linux/ublk_cmd.h | 31 +++++++- 2 files changed, 161 insertions(+), 1 deletion(-) diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c index 26a14c54da1d..e6b528750b3c 100644 --- a/drivers/block/ublk_drv.c +++ b/drivers/block/ublk_drv.c @@ -1317,6 +1317,36 @@ static void ublk_handle_need_get_data(struct ublk_device *ub, int q_id, ublk_queue_cmd(ubq, req); } +static inline struct request *__ublk_check_and_get_req(struct ublk_device *ub, + struct ublk_queue *ubq, int tag, size_t offset) +{ + struct request *req; + + if (!ublk_support_zc(ubq)) + return NULL; + + req = blk_mq_tag_to_rq(ub->tag_set.tags[ubq->q_id], tag); + if (!req) + return NULL; + + if (!ublk_get_req_ref(ubq, req)) + return NULL; + + if (unlikely(!blk_mq_request_started(req) || req->tag != tag)) + goto fail_put; + + if (!ublk_rq_has_data(req)) + goto fail_put; + + if (offset > blk_rq_bytes(req)) + goto fail_put; + + return req; +fail_put: + ublk_put_req_ref(ubq, req); + return NULL; +} + static int ublk_ch_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags) { struct ublksrv_io_cmd *ub_cmd = (struct ublksrv_io_cmd *)cmd->cmd; @@ -1418,11 +1448,112 @@ static int ublk_ch_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags) return -EIOCBQUEUED; } +static inline bool ublk_check_ubuf_dir(const struct request *req, + int ubuf_dir) +{ + /* copy ubuf to request pages */ + if (req_op(req) == REQ_OP_READ && ubuf_dir == ITER_SOURCE) + return true; + + /* copy request pages to ubuf */ + if 
(req_op(req) == REQ_OP_WRITE && ubuf_dir == ITER_DEST) + return true; + + return false; +} + +static struct request *ublk_check_and_get_req(struct kiocb *iocb, + struct iov_iter *iter, size_t *off, int dir) +{ + struct ublk_device *ub = iocb->ki_filp->private_data; + struct ublk_queue *ubq; + struct request *req; + size_t buf_off; + u16 tag, q_id; + + if (!ub) + return ERR_PTR(-EACCES); + + if (!user_backed_iter(iter)) + return ERR_PTR(-EACCES); + + if (ub->dev_info.state == UBLK_S_DEV_DEAD) + return ERR_PTR(-EACCES); + + tag = ublk_pos_to_tag(iocb->ki_pos); + q_id = ublk_pos_to_hwq(iocb->ki_pos); + buf_off = ublk_pos_to_buf_offset(iocb->ki_pos); + + if (q_id >= ub->dev_info.nr_hw_queues) + return ERR_PTR(-EINVAL); + + ubq = ublk_get_queue(ub, q_id); + if (!ubq) + return ERR_PTR(-EINVAL); + + if (tag >= ubq->q_depth) + return ERR_PTR(-EINVAL); + + req = __ublk_check_and_get_req(ub, ubq, tag, buf_off); + if (!req) + return ERR_PTR(-EINVAL); + + if (!req->mq_hctx || !req->mq_hctx->driver_data) + goto fail; + + if (!ublk_check_ubuf_dir(req, dir)) + goto fail; + + *off = buf_off; + return req; +fail: + ublk_put_req_ref(ubq, req); + return ERR_PTR(-EACCES); +} + +static ssize_t ublk_ch_read_iter(struct kiocb *iocb, struct iov_iter *to) +{ + struct ublk_queue *ubq; + struct request *req; + size_t buf_off; + size_t ret; + + req = ublk_check_and_get_req(iocb, to, &buf_off, ITER_DEST); + if (unlikely(IS_ERR(req))) + return PTR_ERR(req); + + ret = ublk_copy_user_pages(req, buf_off, to, ITER_DEST); + ubq = req->mq_hctx->driver_data; + ublk_put_req_ref(ubq, req); + + return ret; +} + +static ssize_t ublk_ch_write_iter(struct kiocb *iocb, struct iov_iter *from) +{ + struct ublk_queue *ubq; + struct request *req; + size_t buf_off; + size_t ret; + + req = ublk_check_and_get_req(iocb, from, &buf_off, ITER_SOURCE); + if (unlikely(IS_ERR(req))) + return PTR_ERR(req); + + ret = ublk_copy_user_pages(req, buf_off, from, ITER_SOURCE); + ubq = req->mq_hctx->driver_data; + 
ublk_put_req_ref(ubq, req); + + return ret; +} + static const struct file_operations ublk_ch_fops = { .owner = THIS_MODULE, .open = ublk_ch_open, .release = ublk_ch_release, .llseek = no_llseek, + .read_iter = ublk_ch_read_iter, + .write_iter = ublk_ch_write_iter, .uring_cmd = ublk_ch_uring_cmd, .mmap = ublk_ch_mmap, }; diff --git a/include/uapi/linux/ublk_cmd.h b/include/uapi/linux/ublk_cmd.h index f6238ccc7800..d1a6b3dc0327 100644 --- a/include/uapi/linux/ublk_cmd.h +++ b/include/uapi/linux/ublk_cmd.h @@ -54,7 +54,36 @@ #define UBLKSRV_IO_BUF_OFFSET 0x80000000 /* tag bit is 12bit, so at most 4096 IOs for each queue */ -#define UBLK_MAX_QUEUE_DEPTH 4096 +#define UBLK_TAG_BITS 12 +#define UBLK_MAX_QUEUE_DEPTH (1U << UBLK_TAG_BITS) + +/* used for locating each io buffer for pread()/pwrite() on char device */ +#define UBLK_BUFS_SIZE_BITS 42 +#define UBLK_BUFS_SIZE_MASK ((1ULL << UBLK_BUFS_SIZE_BITS) - 1) +#define UBLK_BUF_SIZE_BITS (UBLK_BUFS_SIZE_BITS - UBLK_TAG_BITS) +#define UBLK_BUF_MAX_SIZE (1ULL << UBLK_BUF_SIZE_BITS) + +static inline __u16 ublk_pos_to_hwq(__u64 pos) +{ + return pos >> UBLK_BUFS_SIZE_BITS; +} + +static inline __u32 ublk_pos_to_buf_offset(__u64 pos) +{ + return (pos & UBLK_BUFS_SIZE_MASK) & (UBLK_BUF_MAX_SIZE - 1); +} + +static inline __u16 ublk_pos_to_tag(__u64 pos) +{ + return (pos & UBLK_BUFS_SIZE_MASK) >> UBLK_BUF_SIZE_BITS; +} + +/* offset of single buffer, which has to be < UBLK_BUX_MAX_SIZE */ +static inline __u64 ublk_pos(__u16 q_id, __u16 tag, __u32 offset) +{ + return (((__u64)q_id) << UBLK_BUFS_SIZE_BITS) | + ((((__u64)tag) << UBLK_BUF_SIZE_BITS) + offset); +} /* * zero copy requires 4k block size, and can remap ublk driver's io From patchwork Fri Mar 24 13:58:07 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 74563 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp649445vqo; 
From: Ming Lei
To: Jens Axboe, io-uring@vger.kernel.org, linux-block@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, Miklos Szeredi, ZiyangZhang, Xiaoguang Wang, Bernd Schubert, Pavel Begunkov, Stefan Hajnoczi, Ming Lei
Subject: [PATCH V4 16/17] block: ublk_drv: don't check buffer in case of zero copy
Date: Fri, 24 Mar 2023 21:58:07 +0800
Message-Id: <20230324135808.855245-17-ming.lei@redhat.com>
In-Reply-To: <20230324135808.855245-1-ming.lei@redhat.com>
References: <20230324135808.855245-1-ming.lei@redhat.com>

In case of zero copy, the ublk server no longer needs to pre-allocate an
IO buffer and provide it to the driver. Accordingly, don't set the buffer
any more in case of zero copy; userspace can use pread()/pwrite() to read
from/write to the io request buffer, which is easier and simpler from the
userspace viewpoint.

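The pread()/pwrite() access mentioned above can be sketched from the server
side as follows. This is only an illustration, not code from the series: the
constants and the ublk_pos*() helpers mirror the include/uapi/linux/ublk_cmd.h
hunk of patch 15/17 (with <stdint.h> types instead of __u16/__u32/__u64), while
ublk_read_io_buf() is a hypothetical wrapper name and cdev_fd is assumed to be
an open ublk char device.

```c
#ifndef _FILE_OFFSET_BITS
#define _FILE_OFFSET_BITS 64	/* encoded positions need more than 32 bits */
#endif
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Mirrored from the ublk_cmd.h hunk in patch 15/17. */
#define UBLK_TAG_BITS		12
#define UBLK_BUFS_SIZE_BITS	42
#define UBLK_BUFS_SIZE_MASK	((1ULL << UBLK_BUFS_SIZE_BITS) - 1)
#define UBLK_BUF_SIZE_BITS	(UBLK_BUFS_SIZE_BITS - UBLK_TAG_BITS)
#define UBLK_BUF_MAX_SIZE	(1ULL << UBLK_BUF_SIZE_BITS)

/* Encode (queue id, tag, offset in the io buffer) into a file position. */
static inline uint64_t ublk_pos(uint16_t q_id, uint16_t tag, uint32_t offset)
{
	return ((uint64_t)q_id << UBLK_BUFS_SIZE_BITS) |
		(((uint64_t)tag << UBLK_BUF_SIZE_BITS) + offset);
}

/* Decoders, matching what the driver applies to iocb->ki_pos. */
static inline uint16_t ublk_pos_to_hwq(uint64_t pos)
{
	return pos >> UBLK_BUFS_SIZE_BITS;
}

static inline uint16_t ublk_pos_to_tag(uint64_t pos)
{
	return (pos & UBLK_BUFS_SIZE_MASK) >> UBLK_BUF_SIZE_BITS;
}

static inline uint32_t ublk_pos_to_buf_offset(uint64_t pos)
{
	return (pos & UBLK_BUFS_SIZE_MASK) & (UBLK_BUF_MAX_SIZE - 1);
}

/*
 * Hypothetical helper (not part of the series): copy 'len' bytes of the
 * request identified by (q_id, tag), starting at 'offset', into 'buf'.
 * Returns what pread() returns: bytes copied, or -1 with errno set.
 */
static ssize_t ublk_read_io_buf(int cdev_fd, uint16_t q_id, uint16_t tag,
				uint32_t offset, void *buf, size_t len)
{
	return pread(cdev_fd, buf, len, (off_t)ublk_pos(q_id, tag, offset));
}
```

Going by ublk_check_ubuf_dir() in patch 15/17, pread() is only accepted for
REQ_OP_WRITE requests (request pages are copied to the user buffer) and
pwrite() only for REQ_OP_READ requests; 'offset' has to stay below
UBLK_BUF_MAX_SIZE so that it does not spill into the tag bits.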
Signed-off-by: Ming Lei --- drivers/block/ublk_drv.c | 23 ++++++++++++++--------- 1 file changed, 14 insertions(+), 9 deletions(-) diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c index e6b528750b3c..979444647831 100644 --- a/drivers/block/ublk_drv.c +++ b/drivers/block/ublk_drv.c @@ -1405,25 +1405,30 @@ static int ublk_ch_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags) if (io->flags & UBLK_IO_FLAG_OWNED_BY_SRV) goto out; /* FETCH_RQ has to provide IO buffer if NEED GET DATA is not enabled */ - if (!ub_cmd->addr && !ublk_need_get_data(ubq)) - goto out; + if (!ublk_support_zc(ubq)) { + if (!ub_cmd->addr && !ublk_need_get_data(ubq)) + goto out; + io->addr = ub_cmd->addr; + } io->cmd = cmd; io->flags |= UBLK_IO_FLAG_ACTIVE; - io->addr = ub_cmd->addr; - ublk_mark_io_ready(ub, ubq); break; case UBLK_IO_COMMIT_AND_FETCH_REQ: req = blk_mq_tag_to_rq(ub->tag_set.tags[ub_cmd->q_id], tag); + + if (!(io->flags & UBLK_IO_FLAG_OWNED_BY_SRV)) + goto out; /* * COMMIT_AND_FETCH_REQ has to provide IO buffer if NEED GET DATA is * not enabled or it is Read IO. 
*/ - if (!ub_cmd->addr && (!ublk_need_get_data(ubq) || req_op(req) == REQ_OP_READ)) - goto out; - if (!(io->flags & UBLK_IO_FLAG_OWNED_BY_SRV)) - goto out; - io->addr = ub_cmd->addr; + if (!ublk_support_zc(ubq)) { + if (!ub_cmd->addr && (!ublk_need_get_data(ubq) || + req_op(req) == REQ_OP_READ)) + goto out; + io->addr = ub_cmd->addr; + } io->flags |= UBLK_IO_FLAG_ACTIVE; io->cmd = cmd; ublk_commit_completion(ub, ub_cmd); From patchwork Fri Mar 24 13:58:08 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 74561 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp648414vqo; Fri, 24 Mar 2023 07:03:22 -0700 (PDT) X-Google-Smtp-Source: AK7set/wdmm0136bZ/Y9OBHwdjB99YO744UgBrmEkDct3frQBmjaaETdLNRSNWk5UFsqRHOCG+BS X-Received: by 2002:a17:906:24cb:b0:930:8590:95ef with SMTP id f11-20020a17090624cb00b00930859095efmr9702793ejb.18.1679666602753; Fri, 24 Mar 2023 07:03:22 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1679666602; cv=none; d=google.com; s=arc-20160816; b=0CozvG1XE5v59asQOo0HWuyW+BDzUaTN3GcnTiwhZ+nVnU6shx/G2T2OSXBzdUmbrB ZExr8qXlX7LCreIoOlSBLmX0b+jJTHsQqpsZrxkfXBQs9IsGEN1jE+tCVFFTbPQGQY/P vJ7c+HEzIiiIQQPpWOG6qpFsmtyadC9ApldXz+DU2d435eUbBIZHjkDmFj0OGIYVKT6A OCMud9umqTug1oUxaWvmv36AkCepWOyMsJ8w43JGKHPeRDORuKH6WWdX8ukrvwAF2QYl VjD4U+1YoltznyuJKXd7uxTg0D7/YTFdtGblvNm54kSOgvC3QMMbC3KWMM7hY0bwgeir dDeQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=smNOzrmtxEHaBz/HHvXcG8qhls2CG5fdhXftGx1RU8s=; b=OniCX5znZUhOWK9NjMfq7WQvYZQzRbw1IyRrG0DzbTxOEUblP4bGTysXUyYu/oCQfm l+ghqk/mTYC+kvf7R3haQj1HCRYuoiOcPKoaylRlm5hQZmoJ4yDL6QDqvTwelTtF8vcc g1CpOypkG0mer9GPJcODo/e/0DBiMzB4/ocanjOUMTVPxpOsURHJjMrfQsgdemDUWjq8 
From: Ming Lei
To: Jens Axboe, io-uring@vger.kernel.org, linux-block@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, Miklos Szeredi, ZiyangZhang, Xiaoguang Wang, Bernd Schubert, Pavel Begunkov, Stefan Hajnoczi, Ming Lei
Subject: [PATCH V4 17/17] block: ublk_drv: apply io_uring FUSED_CMD for supporting zero copy
Date: Fri, 24 Mar 2023 21:58:08 +0800
Message-Id: <20230324135808.855245-18-ming.lei@redhat.com>
In-Reply-To: <20230324135808.855245-1-ming.lei@redhat.com>
References: <20230324135808.855245-1-ming.lei@redhat.com>

Apply io_uring fused command for supporting zero copy:

1) init the fused cmd buffer(io_mapped_buf) in ublk_map_io(), and deinit it
in ublk_unmap_io(); this buffer is immutable, so it is fine to retrieve
it from concurrent fused commands.

2) add sub-command opcode of UBLK_IO_FUSED_SUBMIT_IO for retrieving this
fused cmd(zero copy) buffer

3) call io_fused_cmd_start_slave_req() to provide the buffer to the slave
request and submit the slave request; this API also sets up the complete
callback, which is called once the slave request is completed, for freeing
the buffer and completing the fused command

Also, a request reference is held during the fused command lifetime, which
guarantees that the request buffer won't be freed until all inflight fused
commands are completed.

userspace(only implement sqe128 fused command):

	https://github.com/ming1/ubdsrv/tree/fused-cmd-zc-v2

liburing test(only implement normal sqe fused command: two 64byte SQEs)

	https://github.com/ming1/liburing/commits/fused_cmd_miniublk

Signed-off-by: Ming Lei
---
 Documentation/block/ublk.rst  | 126 ++++++++++++++++++++--
 drivers/block/ublk_drv.c      | 191 ++++++++++++++++++++++++++++++++--
 include/uapi/linux/ublk_cmd.h |   6 +-
 3 files changed, 302 insertions(+), 21 deletions(-)

diff --git a/Documentation/block/ublk.rst b/Documentation/block/ublk.rst
index 1713b2890abb..d6b46455ca4d 100644
--- a/Documentation/block/ublk.rst
+++ b/Documentation/block/ublk.rst
@@ -297,18 +297,126 @@ with specified IO tag in the command data:
   ``UBLK_IO_COMMIT_AND_FETCH_REQ`` to the server, ublkdrv needs to copy
   the server buffer (pages) read to the IO request pages.
 
-Future development
-==================
+- ``UBLK_IO_FUSED_SUBMIT_IO``
+
+  Used for implementing zero copy feature.
+ + It has to be the master command of io_uring fused command. This command + submits the generic slave IO request with io buffer provided by our master + command, and won't be completed until the slave request is done. + + The provided buffer is represented as ``io_uring_bvec_buf``, which is + actually ublk request buffer's reference, and the reference is shared & + read-only, so the generic slave request can retrieve any part of the buffer + by passing buffer offset & length. Zero copy ---------- +========= + +What is zero copy? +------------------ + +When an application submits IO to ``/dev/ublkb*``, userspace buffer(direct io) +or page cache buffer(buffered io) or kernel buffer(meta io often) is used +for submitting data to ublk driver, and all kinds of these buffers are +represented by bio/bvecs(ublk request buffer) finally. Before supporting +zero copy, data in these buffers has to be copied to ublk server userspace +buffer before handling WRITE IO, or after handling READ IO, so that ublk +server can handle IO for ``/dev/ublkb*`` with the copied data. + +The extra copy between ublk request buffer and ublk server userspace buffer +not only increases CPU utilization(such as pinning pages, copying data), but +also consumes memory bandwidth, and the cost could be very big when IO size +is big. It is observed that ublk-null IOPS may be increased to ~5X if the +extra copy can be avoided. + +So zero copy is very important for supporting high performance block device +in userspace. + +Technical requirements +---------------------- + +- ublk request buffer use + +ublk request buffer is represented by bio/bvec, which is immutable, so do +not try to change bvec via buffer reference; data can be read from or +written to the buffer according to buffer direction, but bvec can't be +changed + +- buffer lifetime + +The ublk server borrows the ublk request buffer for handling ublk IO via a +buffer reference. A reference can't outlive the referent buffer. 
That +means all request buffer references have to be released by ublk server +before ublk driver completes this request, when request buffer ownership +is transferred to upper layer(FS, application, ...). + +Also after ublk request is completed, any page belonging to this ublk +request cannot be written or read any more from ublk server since it is +one block device from kernel viewpoint. + +- buffer direction + +For ublk WRITE request, ublk request buffer should only be accessed as data +source, and the buffer can't be written by ublk server + +For ublk READ request, ublk request buffer should only be accessed as data +destination, and the buffer can't be read by ublk server, otherwise kernel +data is leaked to ublk server, which can be an unprivileged application. + +- arbitrary size sub-buffer needs to be retrieved by ublk server + +ublk is one generic framework for implementing block device in userspace, +and typical requirements include logical volume manager(mirror, striped, ...), +distributed network storage, compressed target, ... + +ublk server needs to retrieve arbitrary size sub-buffer of ublk request, and +ublk server needs to submit IOs with these sub-buffer(s). That also means +arbitrary size sub-buffer(s) can be used to submit IO multiple times. + +Any sub-buffer is actually one reference of ublk request buffer, whose +ownership can't be transferred to upper layer if any reference is held +by ublk server. + +Why splice isn't good for ublk zero copy +---------------------------------------- + +- spliced page from ->splice_read() can't be written + +ublk READ request can't be handled because spliced page can't be written to, and +extending splice for ublk zero copy isn't one good solution [#splice_extend]_ + +- it is very hard to meet above requirements wrt. request buffer lifetime + +splice/pipe focuses on page reference lifetime, but ublk zero copy pays more +attention to ublk request buffer lifetime. 
It is very inefficient to respect +request buffer lifetime by using all pipe buffer's ->release() which requires +all pipe buffers and pipe to be kept when ublk server handles IO. That means +one single dedicated ``pipe_inode_info`` has to be allocated at runtime for each +provided buffer, and the pipe needs to be populated with pages in ublk request +buffer. + + +io_uring fused command based zero copy +-------------------------------------- + +io_uring fused command includes one master command(uring command) and one +generic slave request. The master command is responsible for submitting +slave request with provided buffer from ublk request, and master command +won't be completed until the slave request is completed. + +Typical ublk IO handling includes network and FS IO, so it is usually enough +for io_uring net & fs to support IO with provided buffer from master command. -Zero copy is a generic requirement for nbd, fuse or similar drivers. A -problem [#xiaoguang]_ Xiaoguang mentioned is that pages mapped to userspace -can't be remapped any more in kernel with existing mm interfaces. This can -occurs when destining direct IO to ``/dev/ublkb*``. Also, he reported that -big requests (IO size >= 256 KB) may benefit a lot from zero copy. +Once master command is submitted successfully, ublk driver guarantees that +the ublk request buffer won't go away since slave request actually +grabs the buffer's reference. This way also guarantees that multiple +concurrent fused commands associated with same request buffer work fine, +as the provided buffer reference is shared & read-only. +Also buffer usage direction flag is passed to master command from userspace, +so ublk driver can validate if it is legal to use buffer with requested +direction. References ========== @@ -323,4 +431,4 @@ References .. [#stefan] https://lore.kernel.org/linux-block/YoOr6jBfgVm8GvWg@stefanha-x1.localdomain/ -.. 
[#xiaoguang] https://lore.kernel.org/linux-block/YoOr6jBfgVm8GvWg@stefanha-x1.localdomain/ +.. [#splice_extend] https://lore.kernel.org/linux-block/CAHk-=wgJsi7t7YYpuo6ewXGnHz2nmj67iWR6KPGoz5TBu34mWQ@mail.gmail.com/ diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c index 979444647831..9b6e11ef1fc3 100644 --- a/drivers/block/ublk_drv.c +++ b/drivers/block/ublk_drv.c @@ -74,10 +74,15 @@ struct ublk_rq_data { * successfully */ struct kref ref; + bool allocated_bvec; + struct io_uring_bvec_buf buf[]; }; struct ublk_uring_cmd_pdu { - struct ublk_queue *ubq; + union { + struct ublk_queue *ubq; + struct request *req; + }; }; /* @@ -565,6 +570,69 @@ static size_t ublk_copy_user_pages(const struct request *req, return done; } +/* + * The built command buffer is immutable, so it is fine to feed it to + * concurrent io_uring fused commands + */ +static int ublk_init_zero_copy_buffer(struct request *rq) +{ + struct ublk_rq_data *data = blk_mq_rq_to_pdu(rq); + struct io_uring_bvec_buf *imu = data->buf; + struct req_iterator rq_iter; + unsigned int nr_bvecs = 0; + struct bio_vec *bvec; + unsigned int offset; + struct bio_vec bv; + + if (!ublk_rq_has_data(rq)) + goto exit; + + rq_for_each_bvec(bv, rq, rq_iter) + nr_bvecs++; + + if (!nr_bvecs) + goto exit; + + if (rq->bio != rq->biotail) { + int idx = 0; + + bvec = kvmalloc_array(nr_bvecs, sizeof(struct bio_vec), + GFP_NOIO); + if (!bvec) + return -ENOMEM; + + offset = 0; + rq_for_each_bvec(bv, rq, rq_iter) + bvec[idx++] = bv; + data->allocated_bvec = true; + } else { + struct bio *bio = rq->bio; + + offset = bio->bi_iter.bi_bvec_done; + bvec = __bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter); + } + imu->bvec = bvec; + imu->nr_bvecs = nr_bvecs; + imu->offset = offset; + imu->len = blk_rq_bytes(rq); + + return 0; +exit: + imu->bvec = NULL; + return 0; +} + +static void ublk_deinit_zero_copy_buffer(struct request *rq) +{ + struct ublk_rq_data *data = blk_mq_rq_to_pdu(rq); + struct io_uring_bvec_buf *imu = data->buf; + 
+ if (data->allocated_bvec) { + kvfree(imu->bvec); + data->allocated_bvec = false; + } +} + static inline bool ublk_need_map_req(const struct request *req) { return ublk_rq_has_data(req) && req_op(req) == REQ_OP_WRITE; @@ -575,11 +643,23 @@ static inline bool ublk_need_unmap_req(const struct request *req) return ublk_rq_has_data(req) && req_op(req) == REQ_OP_READ; } -static int ublk_map_io(const struct ublk_queue *ubq, const struct request *req, +static int ublk_map_io(const struct ublk_queue *ubq, struct request *req, struct ublk_io *io) { const unsigned int rq_bytes = blk_rq_bytes(req); + if (ublk_support_zc(ubq)) { + int ret = ublk_init_zero_copy_buffer(req); + + /* + * The only failure is -ENOMEM for allocating fused cmd + * buffer, return zero so that we can requeue this req. + */ + if (unlikely(ret)) + return 0; + return rq_bytes; + } + /* * no zero copy, we delay copy WRITE request data into ublksrv * context and the big benefit is that pinning pages in current @@ -599,11 +679,17 @@ static int ublk_map_io(const struct ublk_queue *ubq, const struct request *req, } static int ublk_unmap_io(const struct ublk_queue *ubq, - const struct request *req, + struct request *req, struct ublk_io *io) { const unsigned int rq_bytes = blk_rq_bytes(req); + if (ublk_support_zc(ubq)) { + ublk_deinit_zero_copy_buffer(req); + + return rq_bytes; + } + if (ublk_need_unmap_req(req)) { struct iov_iter iter; struct iovec iov; @@ -687,6 +773,12 @@ static inline struct ublk_uring_cmd_pdu *ublk_get_uring_cmd_pdu( return (struct ublk_uring_cmd_pdu *)&ioucmd->pdu; } +static inline struct ublk_uring_cmd_pdu *ublk_get_uring_fused_cmd_pdu( + struct io_uring_cmd *ioucmd) +{ + return (struct ublk_uring_cmd_pdu *)&ioucmd->fused.pdu; +} + static inline bool ubq_daemon_is_dying(struct ublk_queue *ubq) { return ubq->ubq_daemon->flags & PF_EXITING; @@ -742,6 +834,7 @@ static inline void __ublk_complete_rq(struct request *req) return; exit: + ublk_deinit_zero_copy_buffer(req); 
blk_mq_end_request(req, res); } @@ -1347,6 +1440,67 @@ static inline struct request *__ublk_check_and_get_req(struct ublk_device *ub, return NULL; } +static void ublk_fused_cmd_done_cb(struct io_uring_cmd *cmd) +{ + struct ublk_uring_cmd_pdu *pdu = ublk_get_uring_fused_cmd_pdu(cmd); + struct request *req = pdu->req; + struct ublk_queue *ubq = req->mq_hctx->driver_data; + + ublk_put_req_ref(ubq, req); + io_uring_cmd_done(cmd, cmd->fused.data.slave_res, 0); +} + +static inline bool ublk_check_fused_buf_dir(const struct request *req, + unsigned int flags) +{ + flags &= IO_URING_F_FUSED; + + if (req_op(req) == REQ_OP_READ && flags == IO_URING_F_FUSED_BUF_DEST) + return true; + + if (req_op(req) == REQ_OP_WRITE && flags == IO_URING_F_FUSED_BUF_SRC) + return true; + + return false; +} + +static int ublk_handle_fused_cmd(struct io_uring_cmd *cmd, + struct ublk_queue *ubq, int tag, unsigned int issue_flags) +{ + struct ublk_uring_cmd_pdu *pdu = ublk_get_uring_fused_cmd_pdu(cmd); + struct ublk_device *ub = cmd->file->private_data; + struct ublk_rq_data *data; + struct request *req; + + if (!ub) + return -EPERM; + + if (!(issue_flags & IO_URING_F_FUSED)) + goto exit; + + req = __ublk_check_and_get_req(ub, ubq, tag, 0); + if (!req) + goto exit; + + pr_devel("%s: qid %d tag %u request bytes %u, issue flags %x\n", + __func__, ubq->q_id, tag, blk_rq_bytes(req), + issue_flags); + + if (!ublk_check_fused_buf_dir(req, issue_flags)) + goto exit_put_ref; + + pdu->req = req; + data = blk_mq_rq_to_pdu(req); + io_fused_cmd_start_slave_req(cmd, !(issue_flags & IO_URING_F_UNLOCKED), + data->buf, ublk_fused_cmd_done_cb); + return -EIOCBQUEUED; + +exit_put_ref: + ublk_put_req_ref(ubq, req); +exit: + return -EINVAL; +} + static int ublk_ch_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags) { struct ublksrv_io_cmd *ub_cmd = (struct ublksrv_io_cmd *)cmd->cmd; @@ -1362,6 +1516,10 @@ static int ublk_ch_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags) __func__, 
cmd->cmd_op, ub_cmd->q_id, tag, ub_cmd->result); + if ((issue_flags & IO_URING_F_FUSED) && + cmd_op != UBLK_IO_FUSED_SUBMIT_IO) + return -EOPNOTSUPP; + if (ub_cmd->q_id >= ub->dev_info.nr_hw_queues) goto out; @@ -1369,7 +1527,12 @@ static int ublk_ch_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags) if (!ubq || ub_cmd->q_id != ubq->q_id) goto out; - if (ubq->ubq_daemon && ubq->ubq_daemon != current) + /* + * The fused command reads the io buffer data structure only, so it + * is fine to be issued from other context. + */ + if ((ubq->ubq_daemon && ubq->ubq_daemon != current) && + (cmd_op != UBLK_IO_FUSED_SUBMIT_IO)) goto out; if (tag >= ubq->q_depth) @@ -1392,6 +1555,9 @@ static int ublk_ch_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags) goto out; switch (cmd_op) { + case UBLK_IO_FUSED_SUBMIT_IO: + return ublk_handle_fused_cmd(cmd, ubq, tag, issue_flags); + case UBLK_IO_FETCH_REQ: /* UBLK_IO_FETCH_REQ is only allowed before queue is setup */ if (ublk_queue_ready(ubq)) { @@ -1721,11 +1887,14 @@ static void ublk_align_max_io_size(struct ublk_device *ub) static int ublk_add_tag_set(struct ublk_device *ub) { + int zc = !!(ub->dev_info.flags & UBLK_F_SUPPORT_ZERO_COPY); + struct ublk_rq_data *data; + ub->tag_set.ops = &ublk_mq_ops; ub->tag_set.nr_hw_queues = ub->dev_info.nr_hw_queues; ub->tag_set.queue_depth = ub->dev_info.queue_depth; ub->tag_set.numa_node = NUMA_NO_NODE; - ub->tag_set.cmd_size = sizeof(struct ublk_rq_data); + ub->tag_set.cmd_size = struct_size(data, buf, zc); ub->tag_set.flags = BLK_MQ_F_SHOULD_MERGE; ub->tag_set.driver_data = ub; return blk_mq_alloc_tag_set(&ub->tag_set); @@ -1941,12 +2110,18 @@ static int ublk_ctrl_add_dev(struct io_uring_cmd *cmd) */ ub->dev_info.flags &= UBLK_F_ALL; + /* + * NEED_GET_DATA doesn't make sense any more in case that + * ZERO_COPY is requested. Another reason is that userspace + * can read/write io request buffer by pread()/pwrite() with + * each io buffer's position. 
+ */ + if (ub->dev_info.flags & UBLK_F_SUPPORT_ZERO_COPY) + ub->dev_info.flags &= ~UBLK_F_NEED_GET_DATA; + if (!IS_BUILTIN(CONFIG_BLK_DEV_UBLK)) ub->dev_info.flags |= UBLK_F_URING_CMD_COMP_IN_TASK; - /* We are not ready to support zero copy */ - ub->dev_info.flags &= ~UBLK_F_SUPPORT_ZERO_COPY; - ub->dev_info.nr_hw_queues = min_t(unsigned int, ub->dev_info.nr_hw_queues, nr_cpu_ids); ublk_align_max_io_size(ub); diff --git a/include/uapi/linux/ublk_cmd.h b/include/uapi/linux/ublk_cmd.h index d1a6b3dc0327..c4f3465399cf 100644 --- a/include/uapi/linux/ublk_cmd.h +++ b/include/uapi/linux/ublk_cmd.h @@ -44,6 +44,7 @@ #define UBLK_IO_FETCH_REQ 0x20 #define UBLK_IO_COMMIT_AND_FETCH_REQ 0x21 #define UBLK_IO_NEED_GET_DATA 0x22 +#define UBLK_IO_FUSED_SUBMIT_IO 0x23 /* only ABORT means that no re-fetch */ #define UBLK_IO_RES_OK 0 @@ -85,10 +86,7 @@ static inline __u64 ublk_pos(__u16 q_id, __u16 tag, __u32 offset) ((((__u64)tag) << UBLK_BUF_SIZE_BITS) + offset); } -/* - * zero copy requires 4k block size, and can remap ublk driver's io - * request into ublksrv's vm space - */ +/* io_uring fused command based zero copy */ #define UBLK_F_SUPPORT_ZERO_COPY (1ULL << 0) /*