From patchwork Tue Mar 28 15:09:43 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 76118 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp2299174vqo; Tue, 28 Mar 2023 08:18:38 -0700 (PDT) X-Google-Smtp-Source: AKy350YxjYxvLOmcSxbKcgRa02skfH6JvGQnsRcT39gk8hBdIZR8vEU6gEZXGQ/DsI4BXZ+8lSsH X-Received: by 2002:a05:6402:151:b0:4fc:5d56:f91d with SMTP id s17-20020a056402015100b004fc5d56f91dmr15633399edu.18.1680016718157; Tue, 28 Mar 2023 08:18:38 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680016718; cv=none; d=google.com; s=arc-20160816; b=AF9QQcjiH+YTtT69f8wGcH9NnRXEOQYoWrwJE1HKInh0+C5A4OwVOKHMHEBNLl4MRN q/1J3kGw4EdS+rM6++lxeNH+NYuSZLsibJHDO6fY34Ias4WVSD/sGSXHn0jueqb2FYUu FQHOc2oLB9GGwzg1KkO7MGTwrfynFvVbjZwRNEDRNyTSpYOwag+Q56NDiv7Fps5iH4+a LFsDTyFMJVIn9kehbwSF27/5DPuL1lS3SInmUypoP+iVoYDpvrJ6v+sjuPMsEAzt9JvP r97XIAJhHrq/B+IIgpHRRnOGbltP26OGjyobNpjnKB3wEEgfHnIXhzweqq0wszzfyBUi RlHA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=EO1Gc+65gs0ahiLFrwFbt1R8FOKbz4SZ+dA6YGYHpZ8=; b=GrrBbz4hHhAZ7AbmVKyT86vUoLknytA+X3y2Coxx+Z1jJwmKjCXG1ewIwImyDwF712 DGt2Bv1prPRku24ZrfVeeIgEvGyd9EINT/hHw06+pX3RSG0UchsKB1K+nIyJ7MScmVf9 +w+0Hf/3bUf0Na6dk8sNEohjwZA2Ogc9L1ny72425Wm8HedTFukqGZ4RlCOmFEVd1DT4 AgGtPMGbUWdhuDolcu4Uum2jY9pf5wLoDxtGTtyO/wTJ1zIFYFINMhTl4WLrbq1L6OcT hhj5fqrleLK+sognv/8ne/hp/ToLCKZKlngoucd9kI64LHhd8gpKMGaG9I989nzYVNdb WDRw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=dVrUb62l; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE 
dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id o12-20020aa7c50c000000b004fd23c911f7si10080116edq.544.2023.03.28.08.18.12; Tue, 28 Mar 2023 08:18:38 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=dVrUb62l; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233901AbjC1PNw (ORCPT + 99 others); Tue, 28 Mar 2023 11:13:52 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53540 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233919AbjC1PNb (ORCPT ); Tue, 28 Mar 2023 11:13:31 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A264EFF23 for ; Tue, 28 Mar 2023 08:12:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1680016257; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=EO1Gc+65gs0ahiLFrwFbt1R8FOKbz4SZ+dA6YGYHpZ8=; b=dVrUb62lIcpq1uINB9EhgYQZuoi/BhNW+PYPDJokiystEFy7dx+oS8TeVN/83HH78gqm1D PQaRjOBVCQWU99gtQtoUpbeTMQJFua1WNwxmLabiMqpQtsQXKPAO68rt3E3cAyyonupk7h odg0MpLFjMTLxQVgZqCg9myI5PFgavM= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, 
cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-37-r3840IcWO6iGys5QDdW3vA-1; Tue, 28 Mar 2023 11:10:53 -0400 X-MC-Unique: r3840IcWO6iGys5QDdW3vA-1 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.rdu2.redhat.com [10.11.54.7]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id B37AC38149B1; Tue, 28 Mar 2023 15:10:50 +0000 (UTC) Received: from localhost (ovpn-8-20.pek2.redhat.com [10.72.8.20]) by smtp.corp.redhat.com (Postfix) with ESMTP id B1FC7140EBF4; Tue, 28 Mar 2023 15:10:49 +0000 (UTC) From: Ming Lei To: Jens Axboe , io-uring@vger.kernel.org, linux-block@vger.kernel.org Cc: linux-kernel@vger.kernel.org, Miklos Szeredi , ZiyangZhang , Xiaoguang Wang , Bernd Schubert , Pavel Begunkov , Stefan Hajnoczi , Dan Williams , Ming Lei Subject: [PATCH V5 01/16] io_uring: increase io_kiocb->flags into 64bit Date: Tue, 28 Mar 2023 23:09:43 +0800 Message-Id: <20230328150958.1253547-2-ming.lei@redhat.com> In-Reply-To: <20230328150958.1253547-1-ming.lei@redhat.com> References: <20230328150958.1253547-1-ming.lei@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.7 X-Spam-Status: No, score=0.6 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE,UPPERCASE_50_75 autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1761625210107074572?= X-GMAIL-MSGID: =?utf-8?q?1761625210107074572?= The 32bit io_kiocb->flags has been used up, so extend it to 64bit. 
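A minimal userspace sketch of why both the field and the flag macros have to be widened. It is an illustration only, not part of the patch: DEMO_BIT_ULL(), DEMO_FLAG_BIT, demo_store32() and demo_store64() are hypothetical names, and bit 32 is picked as the first bit that no longer fits into the old 32bit field.

```c
#include <stdint.h>

/* Userspace stand-in for the kernel's BIT_ULL(); hypothetical name. */
#define DEMO_BIT_ULL(nr)	(1ULL << (nr))

/* Hypothetical flag number, the first one past the old 32bit limit. */
#define DEMO_FLAG_BIT		32

/* Keep a flag in a 32bit field, like the old 'unsigned int flags'. */
static inline uint32_t demo_store32(uint64_t flag)
{
	return (uint32_t)flag;	/* bits 32..63 are silently dropped */
}

/* Keep a flag in a 64bit field, like the new 'u64 flags'. */
static inline uint64_t demo_store64(uint64_t flag)
{
	return flag;
}
```

With these helpers, demo_store32(DEMO_BIT_ULL(DEMO_FLAG_BIT)) is 0 while demo_store64() keeps the bit. On 32bit architectures the kernel's BIT() is built on unsigned long, so it cannot even express such a bit, which is presumably why every REQ_F_* definition moves to BIT_ULL().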
Signed-off-by: Ming Lei --- include/linux/io_uring_types.h | 65 +++++++++++++++++----------------- io_uring/io_uring.c | 2 +- 2 files changed, 34 insertions(+), 33 deletions(-) diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h index 561fa421c453..dd8ef886730b 100644 --- a/include/linux/io_uring_types.h +++ b/include/linux/io_uring_types.h @@ -414,68 +414,68 @@ enum { enum { /* ctx owns file */ - REQ_F_FIXED_FILE = BIT(REQ_F_FIXED_FILE_BIT), + REQ_F_FIXED_FILE = BIT_ULL(REQ_F_FIXED_FILE_BIT), /* drain existing IO first */ - REQ_F_IO_DRAIN = BIT(REQ_F_IO_DRAIN_BIT), + REQ_F_IO_DRAIN = BIT_ULL(REQ_F_IO_DRAIN_BIT), /* linked sqes */ - REQ_F_LINK = BIT(REQ_F_LINK_BIT), + REQ_F_LINK = BIT_ULL(REQ_F_LINK_BIT), /* doesn't sever on completion < 0 */ - REQ_F_HARDLINK = BIT(REQ_F_HARDLINK_BIT), + REQ_F_HARDLINK = BIT_ULL(REQ_F_HARDLINK_BIT), /* IOSQE_ASYNC */ - REQ_F_FORCE_ASYNC = BIT(REQ_F_FORCE_ASYNC_BIT), + REQ_F_FORCE_ASYNC = BIT_ULL(REQ_F_FORCE_ASYNC_BIT), /* IOSQE_BUFFER_SELECT */ - REQ_F_BUFFER_SELECT = BIT(REQ_F_BUFFER_SELECT_BIT), + REQ_F_BUFFER_SELECT = BIT_ULL(REQ_F_BUFFER_SELECT_BIT), /* IOSQE_CQE_SKIP_SUCCESS */ - REQ_F_CQE_SKIP = BIT(REQ_F_CQE_SKIP_BIT), + REQ_F_CQE_SKIP = BIT_ULL(REQ_F_CQE_SKIP_BIT), /* fail rest of links */ - REQ_F_FAIL = BIT(REQ_F_FAIL_BIT), + REQ_F_FAIL = BIT_ULL(REQ_F_FAIL_BIT), /* on inflight list, should be cancelled and waited on exit reliably */ - REQ_F_INFLIGHT = BIT(REQ_F_INFLIGHT_BIT), + REQ_F_INFLIGHT = BIT_ULL(REQ_F_INFLIGHT_BIT), /* read/write uses file position */ - REQ_F_CUR_POS = BIT(REQ_F_CUR_POS_BIT), + REQ_F_CUR_POS = BIT_ULL(REQ_F_CUR_POS_BIT), /* must not punt to workers */ - REQ_F_NOWAIT = BIT(REQ_F_NOWAIT_BIT), + REQ_F_NOWAIT = BIT_ULL(REQ_F_NOWAIT_BIT), /* has or had linked timeout */ - REQ_F_LINK_TIMEOUT = BIT(REQ_F_LINK_TIMEOUT_BIT), + REQ_F_LINK_TIMEOUT = BIT_ULL(REQ_F_LINK_TIMEOUT_BIT), /* needs cleanup */ - REQ_F_NEED_CLEANUP = BIT(REQ_F_NEED_CLEANUP_BIT), + REQ_F_NEED_CLEANUP = 
BIT_ULL(REQ_F_NEED_CLEANUP_BIT), /* already went through poll handler */ - REQ_F_POLLED = BIT(REQ_F_POLLED_BIT), + REQ_F_POLLED = BIT_ULL(REQ_F_POLLED_BIT), /* buffer already selected */ - REQ_F_BUFFER_SELECTED = BIT(REQ_F_BUFFER_SELECTED_BIT), + REQ_F_BUFFER_SELECTED = BIT_ULL(REQ_F_BUFFER_SELECTED_BIT), /* buffer selected from ring, needs commit */ - REQ_F_BUFFER_RING = BIT(REQ_F_BUFFER_RING_BIT), + REQ_F_BUFFER_RING = BIT_ULL(REQ_F_BUFFER_RING_BIT), /* caller should reissue async */ - REQ_F_REISSUE = BIT(REQ_F_REISSUE_BIT), + REQ_F_REISSUE = BIT_ULL(REQ_F_REISSUE_BIT), /* supports async reads/writes */ - REQ_F_SUPPORT_NOWAIT = BIT(REQ_F_SUPPORT_NOWAIT_BIT), + REQ_F_SUPPORT_NOWAIT = BIT_ULL(REQ_F_SUPPORT_NOWAIT_BIT), /* regular file */ - REQ_F_ISREG = BIT(REQ_F_ISREG_BIT), + REQ_F_ISREG = BIT_ULL(REQ_F_ISREG_BIT), /* has creds assigned */ - REQ_F_CREDS = BIT(REQ_F_CREDS_BIT), + REQ_F_CREDS = BIT_ULL(REQ_F_CREDS_BIT), /* skip refcounting if not set */ - REQ_F_REFCOUNT = BIT(REQ_F_REFCOUNT_BIT), + REQ_F_REFCOUNT = BIT_ULL(REQ_F_REFCOUNT_BIT), /* there is a linked timeout that has to be armed */ - REQ_F_ARM_LTIMEOUT = BIT(REQ_F_ARM_LTIMEOUT_BIT), + REQ_F_ARM_LTIMEOUT = BIT_ULL(REQ_F_ARM_LTIMEOUT_BIT), /* ->async_data allocated */ - REQ_F_ASYNC_DATA = BIT(REQ_F_ASYNC_DATA_BIT), + REQ_F_ASYNC_DATA = BIT_ULL(REQ_F_ASYNC_DATA_BIT), /* don't post CQEs while failing linked requests */ - REQ_F_SKIP_LINK_CQES = BIT(REQ_F_SKIP_LINK_CQES_BIT), + REQ_F_SKIP_LINK_CQES = BIT_ULL(REQ_F_SKIP_LINK_CQES_BIT), /* single poll may be active */ - REQ_F_SINGLE_POLL = BIT(REQ_F_SINGLE_POLL_BIT), + REQ_F_SINGLE_POLL = BIT_ULL(REQ_F_SINGLE_POLL_BIT), /* double poll may active */ - REQ_F_DOUBLE_POLL = BIT(REQ_F_DOUBLE_POLL_BIT), + REQ_F_DOUBLE_POLL = BIT_ULL(REQ_F_DOUBLE_POLL_BIT), /* request has already done partial IO */ - REQ_F_PARTIAL_IO = BIT(REQ_F_PARTIAL_IO_BIT), + REQ_F_PARTIAL_IO = BIT_ULL(REQ_F_PARTIAL_IO_BIT), /* fast poll multishot mode */ - REQ_F_APOLL_MULTISHOT = 
BIT(REQ_F_APOLL_MULTISHOT_BIT), + REQ_F_APOLL_MULTISHOT = BIT_ULL(REQ_F_APOLL_MULTISHOT_BIT), /* ->extra1 and ->extra2 are initialised */ - REQ_F_CQE32_INIT = BIT(REQ_F_CQE32_INIT_BIT), + REQ_F_CQE32_INIT = BIT_ULL(REQ_F_CQE32_INIT_BIT), /* recvmsg special flag, clear EPOLLIN */ - REQ_F_CLEAR_POLLIN = BIT(REQ_F_CLEAR_POLLIN_BIT), + REQ_F_CLEAR_POLLIN = BIT_ULL(REQ_F_CLEAR_POLLIN_BIT), /* hashed into ->cancel_hash_locked, protected by ->uring_lock */ - REQ_F_HASH_LOCKED = BIT(REQ_F_HASH_LOCKED_BIT), + REQ_F_HASH_LOCKED = BIT_ULL(REQ_F_HASH_LOCKED_BIT), }; typedef void (*io_req_tw_func_t)(struct io_kiocb *req, struct io_tw_state *ts); @@ -536,7 +536,8 @@ struct io_kiocb { * and after selection it points to the buffer ID itself. */ u16 buf_index; - unsigned int flags; + u32 __pad; + u64 flags; struct io_cqe cqe; diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index 536940675c67..693558c4b10b 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -4486,7 +4486,7 @@ static int __init io_uring_init(void) BUILD_BUG_ON(SQE_COMMON_FLAGS >= (1 << 8)); BUILD_BUG_ON((SQE_VALID_FLAGS | SQE_COMMON_FLAGS) != SQE_VALID_FLAGS); - BUILD_BUG_ON(__REQ_F_LAST_BIT > 8 * sizeof(int)); + BUILD_BUG_ON(__REQ_F_LAST_BIT > 8 * sizeof(u64)); BUILD_BUG_ON(sizeof(atomic_t) != sizeof(u32)); From patchwork Tue Mar 28 15:09:44 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 76117 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp2298760vqo; Tue, 28 Mar 2023 08:18:06 -0700 (PDT) X-Google-Smtp-Source: AKy350azB1vrJKdPhWGYNBFGxLzFMt0ixVgJi5mYsRtrjZYOJR85m9aMQ7PKQ2MHLqDMBYakuWE+ X-Received: by 2002:aa7:c786:0:b0:4fb:eda4:c093 with SMTP id n6-20020aa7c786000000b004fbeda4c093mr15893147eds.13.1680016686010; Tue, 28 Mar 2023 08:18:06 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680016685; cv=none; d=google.com; s=arc-20160816; 
b=A5Kqr6VR/cUjZUzLDhHCeC6vaSeZKjcx20ODk6j3JFyW2AHL5Uqllx8cTtvEMsGb9W Si/i/qgQ3hA2QtaEWk19iZ0+DkBpDQP1hnHQc9uS2TJLa9q7bi5bWvkEvVr9NUe3bWIg 6S07V8bsEpiakNmCzkbhtS9DSkLxlex7Io9qv2VwPRLutGztvR+xlylDKNyHYEv5wtHW xQ/GF1sdunDgEzgomIsJQVbBrUTd5QcN61PXudAnasx/MC6JawD04NhF+0ZMPQT4s7gu Tjd8S/vjh92NuvLTR4+v54SJnN+NAb+tGvdK5YldM8611vSG/JQLT0KbxME6A+0wx8fn 8iRA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=lK02yM2bzLjmtDa73L1C009IahLiG4RzhCW59S2y7Ds=; b=ErYNsqsbxYea6fpUjxxAYiNU2jEzecTiMQc3Ax15IOrjnGEja2AYyYbdoRMvfKc/CO iEWU8RaXgwrBuqkjom+TUZpc8xvaV/1UYmKxujcelgFZG994+rnETmPb9J6wsX41WQQT /4vMMY4i80mUXbxEMw2CZ0BCIUn4ejNnQI6Y8721W29BaK/2UgT+sfyhByu7xJGpIJA+ HV3agPh4PltSnkMSHB9tA9YyssDbJBiGqvXPsWBJvPzd+I3OKRKWTVblD/d5N46iSZtP Iy2tUAVhQH5nqPPPpQrz/pSuq1mwLFnt/GE058WFhCh62SO1JarUKJYKqXH0UO4OrKMF oEjg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=LBshVTyM; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. 
[2620:137:e000::1:20]) by mx.google.com with ESMTP id bf26-20020a0564021a5a00b004ab250bcee5si25622167edb.647.2023.03.28.08.17.41; Tue, 28 Mar 2023 08:18:05 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=LBshVTyM; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232986AbjC1PNt (ORCPT + 99 others); Tue, 28 Mar 2023 11:13:49 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55104 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233918AbjC1PNb (ORCPT ); Tue, 28 Mar 2023 11:13:31 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9069F10415 for ; Tue, 28 Mar 2023 08:12:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1680016260; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=lK02yM2bzLjmtDa73L1C009IahLiG4RzhCW59S2y7Ds=; b=LBshVTyMLSQIvU6FrXr7WqKeoKNyUAQ8w+rljLStAf33qKTyJChix51cYNIwuCYiz/oh0h qVq4ynVLJBZSeCYxmMO8Y3nfHMyJOLm0oq4f3OJhgTilKkF0S8yNWu06Jv/IG5HI3G0ViZ nNzATlG4kIBrQIWWW8q49TS72RbqpHQ= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 
us-mta-671-SO5CHnfeNDid4SxIcIymSw-1; Tue, 28 Mar 2023 11:10:55 -0400 X-MC-Unique: SO5CHnfeNDid4SxIcIymSw-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.rdu2.redhat.com [10.11.54.2]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 41C4A887413; Tue, 28 Mar 2023 15:10:55 +0000 (UTC) Received: from localhost (ovpn-8-20.pek2.redhat.com [10.72.8.20]) by smtp.corp.redhat.com (Postfix) with ESMTP id BF6584020C82; Tue, 28 Mar 2023 15:10:53 +0000 (UTC) From: Ming Lei To: Jens Axboe , io-uring@vger.kernel.org, linux-block@vger.kernel.org Cc: linux-kernel@vger.kernel.org, Miklos Szeredi , ZiyangZhang , Xiaoguang Wang , Bernd Schubert , Pavel Begunkov , Stefan Hajnoczi , Dan Williams , Ming Lei Subject: [PATCH V5 02/16] io_uring: add IORING_OP_FUSED_CMD Date: Tue, 28 Mar 2023 23:09:44 +0800 Message-Id: <20230328150958.1253547-3-ming.lei@redhat.com> In-Reply-To: <20230328150958.1253547-1-ming.lei@redhat.com> References: <20230328150958.1253547-1-ming.lei@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.2 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1761625176419718703?= X-GMAIL-MSGID: =?utf-8?q?1761625176419718703?= Add IORING_OP_FUSED_CMD, it is one special URING_CMD, which has to be SQE128. The 1st SQE(primary) is one 64byte URING_CMD, and the 2nd 64byte SQE(secondary) is another normal 64byte OP. 
For any OP which needs to support being a secondary OP,
io_issue_defs[op].fused_secondary has to be set to 1, and its ->issue()
needs to retrieve the buffer from the primary request's fused_cmd_kbuf.

The key points of the design/implementation follow:

1) The primary uring command produces and provides an immutable command
buffer (struct io_uring_bvec_buf) to the secondary request, and the
secondary OP can retrieve any part of this buffer via sqe->addr and
sqe->len.

2) The primary command is always completed after the secondary request is
completed, so the secondary request can be thought of as serving the
primary command.

- The secondary request borrows the primary command's buffer
(io_uring_bvec_buf); after the secondary request is completed, the buffer
is returned to the primary request.

- This also guarantees correct SQE order, since the primary request
inherits the secondary request's LINK flag.

3) Completion of the primary request is always notified to the driver, so
that the driver knows when the secondary request is done with the buffer.
This is important since io_uring_bvec_buf is a reference to the device IO
command buffer, and we have to guarantee that the reference cannot
outlive the buffer it refers to, which so far is represented by a bvec.

4) The driver calls the kernel API io_fused_cmd_start_secondary_req() to
provide the io_uring_bvec_buf buffer and to start submitting the
secondary request with that buffer.

The motivation is to support zero copy for fuse/ublk, in which the device
holds the IO request buffer and IO handling is often a normal IO OP (fs,
net, ..). With IORING_OP_FUSED_CMD, this kind of zero copy can be
implemented easily and reliably.
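The sub-buffer selection by sqe->addr and sqe->len can be sketched in userspace as follows. This is a minimal sketch modeled on the range checks in io_import_buf_for_secondary(), not the kernel code itself: struct demo_kbuf and demo_sub_range() are hypothetical names, and only the length and first-segment offset of struct io_uring_bvec_buf are kept.

```c
#include <errno.h>

/* Hypothetical, trimmed-down stand-in for struct io_uring_bvec_buf. */
struct demo_kbuf {
	unsigned long len;	/* total bytes provided by the primary command */
	unsigned int offset;	/* offset into the 1st segment */
};

/*
 * Validate the (off, len) window requested by a secondary request and
 * report where it starts relative to the 1st segment.
 *
 * Returns 0 on success, -EFAULT if the window leaves the buffer.
 */
static int demo_sub_range(const struct demo_kbuf *kbuf, unsigned long off,
			  unsigned int len, unsigned long *start)
{
	if (off > kbuf->len)
		return -EFAULT;

	/* written as a subtraction so that off + len can't overflow */
	if (len > kbuf->len - off)
		return -EFAULT;

	*start = kbuf->offset + off;
	return 0;
}
```

For a 4096 byte buffer whose data starts 512 bytes into the 1st segment, a secondary request with off 1024 and len 1024 is accepted and starts at 1536, while any window that ends past byte 4096 is rejected with -EFAULT.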
Signed-off-by: Ming Lei --- include/linux/io_uring.h | 50 ++++++- include/linux/io_uring_types.h | 12 ++ include/uapi/linux/io_uring.h | 3 + io_uring/Makefile | 2 +- io_uring/fused_cmd.c | 241 +++++++++++++++++++++++++++++++++ io_uring/fused_cmd.h | 11 ++ io_uring/io_uring.c | 26 +++- io_uring/io_uring.h | 3 + io_uring/opdef.c | 12 ++ io_uring/opdef.h | 7 + 10 files changed, 361 insertions(+), 6 deletions(-) create mode 100644 io_uring/fused_cmd.c create mode 100644 io_uring/fused_cmd.h diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h index 35b9328ca335..fdb48fff8313 100644 --- a/include/linux/io_uring.h +++ b/include/linux/io_uring.h @@ -4,6 +4,7 @@ #include #include +#include #include enum io_uring_cmd_flags { @@ -20,6 +21,26 @@ enum io_uring_cmd_flags { IO_URING_F_SQE128 = (1 << 8), IO_URING_F_CQE32 = (1 << 9), IO_URING_F_IOPOLL = (1 << 10), + + /* for FUSED_CMD only */ + IO_URING_F_FUSED_BUF_DEST = (1 << 11), /* secondary write to buffer */ + IO_URING_F_FUSED_BUF_SRC = (1 << 12), /* secondary read from buffer */ + /* driver incapable of FUSED_CMD should fail cmd when seeing F_FUSED */ + IO_URING_F_FUSED = IO_URING_F_FUSED_BUF_DEST | + IO_URING_F_FUSED_BUF_SRC, +}; + +union io_uring_fused_cmd_data { + /* + * In case of secondary request IOSQE_CQE_SKIP_SUCCESS, return the + * result via primary command; otherwise we simply return success + * if buffer is provided, and secondary request will return its result + * via its CQE + */ + s32 secondary_res; + + /* fused cmd private, driver do not touch it */ + struct io_kiocb *__secondary; }; struct io_uring_cmd { @@ -33,10 +54,31 @@ struct io_uring_cmd { }; u32 cmd_op; u32 flags; - u8 pdu[32]; /* available inline for free use */ + + /* for fused command, the available pdu is a bit less */ + union { + struct { + union io_uring_fused_cmd_data data; + u8 pdu[24]; /* available inline for free use */ + } fused; + u8 pdu[32]; /* available inline for free use */ + }; +}; + +struct io_uring_bvec_buf { + unsigned 
long len; + unsigned int nr_bvecs; + + /* offset in the 1st bvec */ + unsigned int offset; + const struct bio_vec *bvec; + struct bio_vec __bvec[]; }; #if defined(CONFIG_IO_URING) +void io_fused_cmd_start_secondary_req(struct io_uring_cmd *, unsigned, + const struct io_uring_bvec_buf *, + void (*complete_tw_cb)(struct io_uring_cmd *, unsigned)); int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw, struct iov_iter *iter, void *ioucmd); void io_uring_cmd_done(struct io_uring_cmd *cmd, ssize_t ret, ssize_t res2, @@ -67,6 +109,12 @@ static inline void io_uring_free(struct task_struct *tsk) __io_uring_free(tsk); } #else +static inline void io_fused_cmd_start_secondary_req(struct io_uring_cmd *ioucmd, + unsigned issue_flags, + const struct io_uring_bvec_buf *fused_cmd_kbuf, + void (*complete_tw_cb)(struct io_uring_cmd *, unsigned)) +{ +} static inline int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw, struct iov_iter *iter, void *ioucmd) { diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h index dd8ef886730b..9c427f1e00e6 100644 --- a/include/linux/io_uring_types.h +++ b/include/linux/io_uring_types.h @@ -407,6 +407,7 @@ enum { /* keep async read/write and isreg together and in order */ REQ_F_SUPPORT_NOWAIT_BIT, REQ_F_ISREG_BIT, + REQ_F_FUSED_SECONDARY_BIT, /* not a real bit, just to check we're not overflowing the space */ __REQ_F_LAST_BIT, @@ -476,6 +477,8 @@ enum { REQ_F_CLEAR_POLLIN = BIT_ULL(REQ_F_CLEAR_POLLIN_BIT), /* hashed into ->cancel_hash_locked, protected by ->uring_lock */ REQ_F_HASH_LOCKED = BIT_ULL(REQ_F_HASH_LOCKED_BIT), + /* secondary request in fused cmd, won't be one uring cmd */ + REQ_F_FUSED_SECONDARY = BIT_ULL(REQ_F_FUSED_SECONDARY_BIT), }; typedef void (*io_req_tw_func_t)(struct io_kiocb *req, struct io_tw_state *ts); @@ -558,6 +561,15 @@ struct io_kiocb { * REQ_F_BUFFER_RING is set. 
*/ struct io_buffer_list *buf_list; + + /* + * store kernel (sub)buffer of fused primary request which OP + * is IORING_OP_FUSED_CMD + */ + const struct io_uring_bvec_buf *fused_cmd_kbuf; + + /* store fused command's primary request for the secondary */ + struct io_kiocb *fused_primary_req; }; union { diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h index f8d14d1c58d3..98b7f21623f9 100644 --- a/include/uapi/linux/io_uring.h +++ b/include/uapi/linux/io_uring.h @@ -73,6 +73,8 @@ struct io_uring_sqe { __u16 buf_index; /* for grouped buffer selection */ __u16 buf_group; + /* how many secondary normal SQEs following this fused SQE */ + __u16 nr_secondary; } __attribute__((packed)); /* personality to use, if used */ __u16 personality; @@ -223,6 +225,7 @@ enum io_uring_op { IORING_OP_URING_CMD, IORING_OP_SEND_ZC, IORING_OP_SENDMSG_ZC, + IORING_OP_FUSED_CMD, /* this goes last, obviously */ IORING_OP_LAST, diff --git a/io_uring/Makefile b/io_uring/Makefile index 8cc8e5387a75..5301077e61c5 100644 --- a/io_uring/Makefile +++ b/io_uring/Makefile @@ -7,5 +7,5 @@ obj-$(CONFIG_IO_URING) += io_uring.o xattr.o nop.o fs.o splice.o \ openclose.o uring_cmd.o epoll.o \ statx.o net.o msg_ring.o timeout.o \ sqpoll.o fdinfo.o tctx.o poll.o \ - cancel.o kbuf.o rsrc.o rw.o opdef.o notif.o + cancel.o kbuf.o rsrc.o rw.o opdef.o notif.o fused_cmd.o obj-$(CONFIG_IO_WQ) += io-wq.o diff --git a/io_uring/fused_cmd.c b/io_uring/fused_cmd.c new file mode 100644 index 000000000000..7af3ddb182c1 --- /dev/null +++ b/io_uring/fused_cmd.c @@ -0,0 +1,241 @@ +// SPDX-License-Identifier: GPL-2.0 +#include +#include +#include +#include +#include +#include +#include + +#include + +#include "io_uring.h" +#include "opdef.h" +#include "rsrc.h" +#include "uring_cmd.h" +#include "fused_cmd.h" + +static bool io_fused_secondary_valid(const struct io_uring_sqe *sqe, u8 op) +{ + unsigned int sqe_flags = READ_ONCE(sqe->flags); + + if (op == IORING_OP_FUSED_CMD || op == IORING_OP_URING_CMD) + 
return false; + + if (sqe_flags & REQ_F_BUFFER_SELECT) + return false; + + if (!io_issue_defs[op].fused_secondary) + return false; + + return true; +} + +static inline void io_fused_cmd_update_link_flags(struct io_kiocb *req, + const struct io_kiocb *secondary) +{ + /* + * We have to keep secondary SQE in order, so update primary link flags + * with secondary request's given primary command isn't completed until + * the secondary request is done + */ + if (secondary->flags & (REQ_F_LINK | REQ_F_HARDLINK)) + req->flags |= REQ_F_LINK; +} + +int io_fused_cmd_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) + __must_hold(&req->ctx->uring_lock) +{ + struct io_uring_cmd *ioucmd = io_kiocb_to_cmd(req, struct io_uring_cmd); + const struct io_uring_sqe *secondary_sqe = sqe + 1; + struct io_ring_ctx *ctx = req->ctx; + struct io_kiocb *secondary; + u8 secondary_op; + int ret; + + if (unlikely(!(ctx->flags & IORING_SETUP_SQE128))) + return -EINVAL; + + if (unlikely(sqe->__pad1)) + return -EINVAL; + + /* + * So far, only support single secondary request, in future we may + * extend to support multiple secondary requests + */ + if (unlikely(sqe->nr_secondary != 1)) + return -EINVAL; + + ioucmd->flags = READ_ONCE(sqe->uring_cmd_flags); + if (unlikely(ioucmd->flags)) + return -EINVAL; + + secondary_op = READ_ONCE(secondary_sqe->opcode); + if (unlikely(!io_fused_secondary_valid(secondary_sqe, secondary_op))) + return -EINVAL; + + ioucmd->cmd = sqe->cmd; + ioucmd->cmd_op = READ_ONCE(sqe->cmd_op); + req->fused_cmd_kbuf = NULL; + + /* take one extra reference for the secondary request */ + io_get_task_refs(1); + + ret = -ENOMEM; + if (unlikely(!io_alloc_req(ctx, &secondary))) + goto fail; + + ret = io_init_secondary_req(ctx, secondary, secondary_sqe); + if (unlikely(ret)) + goto fail_free_req; + + /* + * The secondary request won't be linked to io_uring submission link list, + * so it can't be handled by IORING_OP_LINK_TIMEOUT, however, we can do + * that on primary command 
directly + */ + io_fused_cmd_update_link_flags(req, secondary); + + ioucmd->fused.data.__secondary = secondary; + + return 0; + +fail_free_req: + io_free_req(secondary); +fail: + current->io_uring->cached_refs += 1; + return ret; +} + +int io_fused_cmd(struct io_kiocb *req, unsigned int issue_flags) +{ + struct io_uring_cmd *ioucmd = io_kiocb_to_cmd(req, struct io_uring_cmd); + const struct io_kiocb *secondary = ioucmd->fused.data.__secondary; + int ret = -EINVAL; + + /* + * Pass buffer direction for driver to validate if the requested buffer + * direction is legal + */ + if (io_issue_defs[secondary->opcode].buf_dir) + issue_flags |= IO_URING_F_FUSED_BUF_DEST; + else + issue_flags |= IO_URING_F_FUSED_BUF_SRC; + + ret = io_uring_cmd(req, issue_flags); + if (ret != IOU_ISSUE_SKIP_COMPLETE) + io_free_req(ioucmd->fused.data.__secondary); + + return ret; +} + +int io_import_buf_for_secondary(unsigned long buf_off, unsigned int len, + int dir, struct iov_iter *iter, struct io_kiocb *secondary) +{ + struct io_kiocb *req = secondary->fused_primary_req; + const struct io_uring_bvec_buf *kbuf; + unsigned long offset; + + if (unlikely(!(secondary->flags & REQ_F_FUSED_SECONDARY) || !req)) + return -EINVAL; + + if (unlikely(!req->fused_cmd_kbuf)) + return -EINVAL; + + /* req->fused_cmd_kbuf is immutable */ + kbuf = req->fused_cmd_kbuf; + offset = kbuf->offset; + + if (!kbuf->bvec) + return -EINVAL; + + if (unlikely(buf_off > kbuf->len)) + return -EFAULT; + + if (unlikely(len > kbuf->len - buf_off)) + return -EFAULT; + + /* don't use io_import_fixed which doesn't support multipage bvec */ + offset += buf_off; + iov_iter_bvec(iter, dir, kbuf->bvec, kbuf->nr_bvecs, offset + len); + + if (offset) + iov_iter_advance(iter, offset); + + return 0; +} + +/* + * Called after secondary request is completed, + * + * Return back primary's fused_cmd kbuf, and notify primary request by + * the saved callback. 
+ */ +void io_fused_cmd_return_buf(struct io_kiocb *secondary) +{ + struct io_kiocb *req = secondary->fused_primary_req; + struct io_uring_cmd *ioucmd; + + if (unlikely(!req || !(secondary->flags & REQ_F_FUSED_SECONDARY))) + return; + + /* return back the buffer */ + secondary->fused_primary_req = NULL; + ioucmd = io_kiocb_to_cmd(req, struct io_uring_cmd); + ioucmd->fused.data.__secondary = NULL; + + /* + * If secondary OP skips CQE, return the result via primary command; or + * if secondary request is failed, REQ_F_CQE_SKIP will be cleared, return + * result too + */ + if ((secondary->flags & REQ_F_CQE_SKIP) || secondary->cqe.res < 0) + ioucmd->fused.data.secondary_res = secondary->cqe.res; + else + ioucmd->fused.data.secondary_res = 0; + io_uring_cmd_complete_in_task(ioucmd, ioucmd->task_work_cb); +} + +/* + * Called for starting secondary request after primary command prepared io buffer. + * + * The io buffer is represented by @fused_cmd_kbuf, which is read only for + * secondary request, however secondary request can retrieve any sub-buffer by its + * sqe->addr(offset) & sqe->len. For secondary request, io buffer is imported + * by io_import_buf_for_secondary(). + * + * Secondary request borrows primary's io buffer for handling the secondary operation, + * and the buffer is returned back via io_fused_cmd_return_buf after the secondary + * request is completed. Meanwhile the primary command is completed from + * io_fused_cmd_return_buf(). And driver gets completion notification by + * the passed callback of @complete_tw_cb. 
+ */ +void io_fused_cmd_start_secondary_req(struct io_uring_cmd *ioucmd, + unsigned issue_flags, + const struct io_uring_bvec_buf *fused_cmd_kbuf, + void (*complete_tw_cb)(struct io_uring_cmd *, unsigned)) +{ + struct io_kiocb *req = cmd_to_io_kiocb(ioucmd); + struct io_kiocb *secondary = ioucmd->fused.data.__secondary; + struct io_tw_state ts = { + .locked = !(issue_flags & IO_URING_F_UNLOCKED), + }; + + if (WARN_ON_ONCE(unlikely(!secondary || !(secondary->flags & + REQ_F_FUSED_SECONDARY)))) + return; + + /* + * Once the fused secondary request is completed and the buffer is no + * longer used, the driver will be notified by callback of complete_tw_cb + */ + ioucmd->task_work_cb = complete_tw_cb; + + /* now we get the buffer */ + req->fused_cmd_kbuf = fused_cmd_kbuf; + secondary->fused_primary_req = req; + + trace_io_uring_submit_sqe(secondary, true); + io_req_task_submit(secondary, &ts); +} +EXPORT_SYMBOL_GPL(io_fused_cmd_start_secondary_req); diff --git a/io_uring/fused_cmd.h b/io_uring/fused_cmd.h new file mode 100644 index 000000000000..c75e5d8c5763 --- /dev/null +++ b/io_uring/fused_cmd.h @@ -0,0 +1,11 @@ +// SPDX-License-Identifier: GPL-2.0 +#ifndef IOU_FUSED_CMD_H +#define IOU_FUSED_CMD_H + +int io_fused_cmd_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe); +int io_fused_cmd(struct io_kiocb *req, unsigned int issue_flags); +void io_fused_cmd_return_buf(struct io_kiocb *secondary); +int io_import_buf_for_secondary(unsigned long buf, unsigned int len, int dir, + struct iov_iter *iter, struct io_kiocb *secondary); + +#endif diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index 693558c4b10b..ddbc9b9e51d3 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -92,6 +92,7 @@ #include "cancel.h" #include "net.h" #include "notif.h" +#include "fused_cmd.h" #include "timeout.h" #include "poll.h" @@ -111,7 +112,7 @@ #define IO_REQ_CLEAN_FLAGS (REQ_F_BUFFER_SELECTED | REQ_F_NEED_CLEANUP | \ REQ_F_POLLED | REQ_F_INFLIGHT | REQ_F_CREDS | \ - 
REQ_F_ASYNC_DATA) + REQ_F_ASYNC_DATA | REQ_F_FUSED_SECONDARY) #define IO_REQ_CLEAN_SLOW_FLAGS (REQ_F_REFCOUNT | REQ_F_LINK | REQ_F_HARDLINK |\ IO_REQ_CLEAN_FLAGS) @@ -971,6 +972,9 @@ static void __io_req_complete_post(struct io_kiocb *req) { struct io_ring_ctx *ctx = req->ctx; + if (req->flags & REQ_F_FUSED_SECONDARY) + io_fused_cmd_return_buf(req); + io_cq_lock(ctx); if (!(req->flags & REQ_F_CQE_SKIP)) io_fill_cqe_req(ctx, req); @@ -1855,6 +1859,8 @@ static void io_clean_op(struct io_kiocb *req) spin_lock(&req->ctx->completion_lock); io_put_kbuf_comp(req); spin_unlock(&req->ctx->completion_lock); + } else if (req->flags & REQ_F_FUSED_SECONDARY) { + io_fused_cmd_return_buf(req); } if (req->flags & REQ_F_NEED_CLEANUP) { @@ -2163,8 +2169,8 @@ static void io_init_req_drain(struct io_kiocb *req) } } -static int io_init_req(struct io_ring_ctx *ctx, struct io_kiocb *req, - const struct io_uring_sqe *sqe) +static inline int io_init_req(struct io_ring_ctx *ctx, struct io_kiocb *req, + const struct io_uring_sqe *sqe, bool secondary) __must_hold(&ctx->uring_lock) { const struct io_issue_def *def; @@ -2217,6 +2223,12 @@ static int io_init_req(struct io_ring_ctx *ctx, struct io_kiocb *req, } } + if (secondary) { + if (!def->fused_secondary) + return -EINVAL; + req->flags |= REQ_F_FUSED_SECONDARY; + } + if (!def->ioprio && sqe->ioprio) return -EINVAL; if (!def->iopoll && (ctx->flags & IORING_SETUP_IOPOLL)) @@ -2257,6 +2269,12 @@ static int io_init_req(struct io_ring_ctx *ctx, struct io_kiocb *req, return def->prep(req, sqe); } +int io_init_secondary_req(struct io_ring_ctx *ctx, struct io_kiocb *req, + const struct io_uring_sqe *sqe) +{ + return io_init_req(ctx, req, sqe, true); +} + static __cold int io_submit_fail_init(const struct io_uring_sqe *sqe, struct io_kiocb *req, int ret) { @@ -2301,7 +2319,7 @@ static inline int io_submit_sqe(struct io_ring_ctx *ctx, struct io_kiocb *req, struct io_submit_link *link = &ctx->submit_state.link; int ret; - ret = io_init_req(ctx, req, 
sqe); + ret = io_init_req(ctx, req, sqe, false); if (unlikely(ret)) return io_submit_fail_init(sqe, req, ret); diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h index c33f719731ac..dd193c612348 100644 --- a/io_uring/io_uring.h +++ b/io_uring/io_uring.h @@ -78,6 +78,9 @@ bool __io_alloc_req_refill(struct io_ring_ctx *ctx); bool io_match_task_safe(struct io_kiocb *head, struct task_struct *task, bool cancel_all); +int io_init_secondary_req(struct io_ring_ctx *ctx, struct io_kiocb *req, + const struct io_uring_sqe *sqe); + #define io_lockdep_assert_cq_locked(ctx) \ do { \ if (ctx->flags & IORING_SETUP_IOPOLL) { \ diff --git a/io_uring/opdef.c b/io_uring/opdef.c index cca7c5b55208..63b90e8e65f8 100644 --- a/io_uring/opdef.c +++ b/io_uring/opdef.c @@ -33,6 +33,7 @@ #include "poll.h" #include "cancel.h" #include "rw.h" +#include "fused_cmd.h" static int io_no_issue(struct io_kiocb *req, unsigned int issue_flags) { @@ -428,6 +429,12 @@ const struct io_issue_def io_issue_defs[] = { .prep = io_eopnotsupp_prep, #endif }, + [IORING_OP_FUSED_CMD] = { + .needs_file = 1, + .plug = 1, + .prep = io_fused_cmd_prep, + .issue = io_fused_cmd, + }, }; @@ -648,6 +655,11 @@ const struct io_cold_def io_cold_defs[] = { .fail = io_sendrecv_fail, #endif }, + [IORING_OP_FUSED_CMD] = { + .name = "FUSED_CMD", + .async_size = uring_cmd_pdu_size(1), + .prep_async = io_uring_cmd_prep_async, + }, }; const char *io_uring_get_opcode(u8 opcode) diff --git a/io_uring/opdef.h b/io_uring/opdef.h index c22c8696e749..bded61ebcbfc 100644 --- a/io_uring/opdef.h +++ b/io_uring/opdef.h @@ -29,6 +29,13 @@ struct io_issue_def { unsigned iopoll_queue : 1; /* opcode specific path will handle ->async_data allocation if needed */ unsigned manual_alloc : 1; + /* can be secondary op of fused command */ + unsigned fused_secondary : 1; + /* + * buffer direction, 0 : read from buffer, 1: write to buffer, used + * for fused_secondary only + */ + unsigned buf_dir : 1; int (*issue)(struct io_kiocb *, unsigned int); int 
(*prep)(struct io_kiocb *, const struct io_uring_sqe *); From patchwork Tue Mar 28 15:09:45 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 76119 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp2299291vqo; Tue, 28 Mar 2023 08:18:46 -0700 (PDT) X-Google-Smtp-Source: AKy350agV3F+WafW9Mp4McCuwHGPtOT4PjsYQAhS83Qu4+s1P4W7J+/nPxU5FtFJKqvgp8uardYK X-Received: by 2002:a17:906:4d0f:b0:8b2:8876:6a3c with SMTP id r15-20020a1709064d0f00b008b288766a3cmr15695623eju.29.1680016726462; Tue, 28 Mar 2023 08:18:46 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680016726; cv=none; d=google.com; s=arc-20160816; b=ZPmI3EswMWd3CXP60fDl0X5TmY3BcXEkk2THv/aWMBgXgUz+KVW9neGSgTj/jdEqfE XJyi1psApmX95BNKYyPFgxvz4Aad5KMujs9W3nXaXJf1srXDRade6rArvX5RyD9cAlrs xJMYTjAcaQeLiWzEcHpjJEHec8/sHg52BmCtaTzXxOBH+I7rGgpHsZtybMvJCqpheUix 4dchjFuUf8pEhJv849hhM8aWHnEkAzNkiT4XhDVwsEbFfhB/UavdPD3Y34RVwVNbmnoo iyMwJtQx7GnbmryvkDV/9zYs0kFTXvZiocRWNfzQsQr4BHcnHfqwRNsHWbUxI1SJV7c9 cB9A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=siKKh0x5p/UqrBQe9QYa+cxMAYrLZPJZiVZtx1DzXec=; b=UdQZwOmiaaUD3gRxAPn18B7UXRJ6bB1784VfN21FEP2yaUW6lALXBAluQfWnvYyd1u cE/g6Qnr26ZLG+gdJRE2znJiJuXBcCijuU8iaQ738dcyyp5yphyNDqYqBPFH9eVtqJmN xoBle3NXXPcUPy9PFoG2wmHhuS3WQBOdaY3O56bwbM1lATTlfCJkJQzoK1B4XM7ONlh/ 08iFXE6mpd7iH8gjRu1I++JMuizOLzCzEgM2mh8MsANyhtkS3/3IxXn8FLEFriLND7Fv 4+pQXKXSqnY1tGdjDfSCVNuTCLWVM3avr8krki4eaXinBF0T3WvBqGpnd9Ncx3RpCkFE 4vrQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=Yl5FdRpJ; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) 
smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id ne36-20020a1709077ba400b0093d4fb59d99si15690045ejc.422.2023.03.28.08.18.21; Tue, 28 Mar 2023 08:18:46 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=Yl5FdRpJ; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233905AbjC1POM (ORCPT + 99 others); Tue, 28 Mar 2023 11:14:12 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53506 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233931AbjC1PNk (ORCPT ); Tue, 28 Mar 2023 11:13:40 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6634D10252 for ; Tue, 28 Mar 2023 08:12:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1680016265; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=siKKh0x5p/UqrBQe9QYa+cxMAYrLZPJZiVZtx1DzXec=; b=Yl5FdRpJp7RFWDKyfDFsAPK0F5bnCA7FwbVoN1lxGqUm8CpdNfhfoAY1x7uEyFTPEKdhQH F2DiweAWSTtWmru4/AWrrQIgDg1kJql0vYBrhf3kQ4R0y7lF/nPAf4/p6JoJl1Y8UQv982 MKAAU+B7MLoBUv65rWYxuEv38fuKb0s= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com 
[66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-381-TbRalWqROt-cqgLg3YGivw-1; Tue, 28 Mar 2023 11:11:00 -0400 X-MC-Unique: TbRalWqROt-cqgLg3YGivw-1 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.rdu2.redhat.com [10.11.54.3]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id E82993C17124; Tue, 28 Mar 2023 15:10:58 +0000 (UTC) Received: from localhost (ovpn-8-20.pek2.redhat.com [10.72.8.20]) by smtp.corp.redhat.com (Postfix) with ESMTP id 0416C1121330; Tue, 28 Mar 2023 15:10:57 +0000 (UTC) From: Ming Lei To: Jens Axboe , io-uring@vger.kernel.org, linux-block@vger.kernel.org Cc: linux-kernel@vger.kernel.org, Miklos Szeredi , ZiyangZhang , Xiaoguang Wang , Bernd Schubert , Pavel Begunkov , Stefan Hajnoczi , Dan Williams , Ming Lei Subject: [PATCH V5 03/16] io_uring: support normal SQE for fused command Date: Tue, 28 Mar 2023 23:09:45 +0800 Message-Id: <20230328150958.1253547-4-ming.lei@redhat.com> In-Reply-To: <20230328150958.1253547-1-ming.lei@redhat.com> References: <20230328150958.1253547-1-ming.lei@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.3 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1761625219005104960?= X-GMAIL-MSGID: =?utf-8?q?1761625219005104960?= So far, the secondary sqe is saved in the 2nd 64 byte of primary sqe, which requires that SQE128 has to be enabled. 
Relax this limit by allowing the secondary SQE to be fetched from the SQ
directly. IORING_URING_CMD_FUSED_SPLIT_SQE has to be set for this usage,
and userspace has to place the secondary SQE right after the primary SQE.

Signed-off-by: Ming Lei
---
 include/uapi/linux/io_uring.h |  8 ++++++-
 io_uring/fused_cmd.c          | 42 ++++++++++++++++++++++++++++-------
 io_uring/io_uring.c           | 23 +++++++++++++------
 io_uring/io_uring.h           |  2 ++
 4 files changed, 59 insertions(+), 16 deletions(-)

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 98b7f21623f9..b379677dff9d 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -235,9 +235,15 @@ enum io_uring_op {
  * sqe->uring_cmd_flags
  * IORING_URING_CMD_FIXED	use registered buffer; pass this flag
  *				along with setting sqe->buf_index.
+ *
+ * IORING_URING_CMD_FUSED_SPLIT_SQE fused command only, secondary sqe is
+ *				provided from another new sqe; without
+ *				setting the flag, secondary sqe is from
+ *				2nd 64byte of this sqe, so SQE128 has
+ *				to be enabled
  */
 #define IORING_URING_CMD_FIXED	(1U << 0)
-
+#define IORING_URING_CMD_FUSED_SPLIT_SQE (1U << 1)

 /*
  * sqe->fsync_flags
diff --git a/io_uring/fused_cmd.c b/io_uring/fused_cmd.c
index 7af3ddb182c1..25577cbb0e9c 100644
--- a/io_uring/fused_cmd.c
+++ b/io_uring/fused_cmd.c
@@ -43,18 +43,34 @@ static inline void io_fused_cmd_update_link_flags(struct io_kiocb *req,
 	req->flags |= REQ_F_LINK;
 }

+static const struct io_uring_sqe *fused_cmd_get_secondary_sqe(
+		struct io_ring_ctx *ctx, const struct io_uring_sqe *primary,
+		bool split_sqe)
+{
+	if (unlikely(!(ctx->flags & IORING_SETUP_SQE128) && !split_sqe))
+		return NULL;
+
+	if (split_sqe) {
+		const struct io_uring_sqe *sqe;
+
+		if (unlikely(!io_get_secondary_sqe(ctx, &sqe)))
+			return NULL;
+		return sqe;
+	}
+
+	return primary + 1;
+}
+
 int io_fused_cmd_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	__must_hold(&req->ctx->uring_lock)
 {
 	struct io_uring_cmd *ioucmd = io_kiocb_to_cmd(req, struct io_uring_cmd);
-	const struct io_uring_sqe *secondary_sqe = sqe + 1;
+	const struct io_uring_sqe *secondary_sqe;
 	struct io_ring_ctx *ctx = req->ctx;
 	struct io_kiocb *secondary;
 	u8 secondary_op;
 	int ret;
-
-	if (unlikely(!(ctx->flags & IORING_SETUP_SQE128)))
-		return -EINVAL;
+	bool split_sqe;

 	if (unlikely(sqe->__pad1))
 		return -EINVAL;
@@ -67,7 +83,12 @@ int io_fused_cmd_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 		return -EINVAL;

 	ioucmd->flags = READ_ONCE(sqe->uring_cmd_flags);
-	if (unlikely(ioucmd->flags))
+	if (unlikely(ioucmd->flags & ~IORING_URING_CMD_FUSED_SPLIT_SQE))
+		return -EINVAL;
+
+	split_sqe = ioucmd->flags & IORING_URING_CMD_FUSED_SPLIT_SQE;
+	secondary_sqe = fused_cmd_get_secondary_sqe(ctx, sqe, split_sqe);
+	if (unlikely(!secondary_sqe))
 		return -EINVAL;

 	secondary_op = READ_ONCE(secondary_sqe->opcode);
@@ -78,8 +99,12 @@ int io_fused_cmd_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	ioucmd->cmd_op = READ_ONCE(sqe->cmd_op);
 	req->fused_cmd_kbuf = NULL;

-	/* take one extra reference for the secondary request */
-	io_get_task_refs(1);
+	/*
+	 * Take one extra reference for the secondary request built from
+	 * builtin SQE since io_uring core code doesn't grab it for us
+	 */
+	if (!split_sqe)
+		io_get_task_refs(1);

 	ret = -ENOMEM;
 	if (unlikely(!io_alloc_req(ctx, &secondary)))
@@ -103,7 +128,8 @@ int io_fused_cmd_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 fail_free_req:
 	io_free_req(secondary);
 fail:
-	current->io_uring->cached_refs += 1;
+	if (!split_sqe)
+		current->io_uring->cached_refs += 1;
 	return ret;
 }

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index ddbc9b9e51d3..9d9bc5b06ca2 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2414,7 +2414,8 @@ static void io_commit_sqring(struct io_ring_ctx *ctx)
  * used, it's important that those reads are done through READ_ONCE() to
  * prevent a re-load down the line.
  */
-static bool io_get_sqe(struct io_ring_ctx *ctx, const struct io_uring_sqe **sqe)
+static inline bool io_get_sqe(struct io_ring_ctx *ctx,
+		const struct io_uring_sqe **sqe)
 {
 	unsigned head, mask = ctx->sq_entries - 1;
 	unsigned sq_idx = ctx->cached_sq_head++ & mask;
@@ -2443,19 +2444,26 @@ static bool io_get_sqe(struct io_ring_ctx *ctx, const struct io_uring_sqe **sqe)
 	return false;
 }

+bool io_get_secondary_sqe(struct io_ring_ctx *ctx,
+		const struct io_uring_sqe **sqe)
+{
+	return io_get_sqe(ctx, sqe);
+}
+
 int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
 	__must_hold(&ctx->uring_lock)
 {
 	unsigned int entries = io_sqring_entries(ctx);
-	unsigned int left;
+	unsigned old_head = ctx->cached_sq_head;
+	unsigned int left = 0;
 	int ret;

 	if (unlikely(!entries))
 		return 0;
 	/* make sure SQ entry isn't read before tail */
-	ret = left = min3(nr, ctx->sq_entries, entries);
-	io_get_task_refs(left);
-	io_submit_state_start(&ctx->submit_state, left);
+	ret = min3(nr, ctx->sq_entries, entries);
+	io_get_task_refs(ret);
+	io_submit_state_start(&ctx->submit_state, ret);

 	do {
 		const struct io_uring_sqe *sqe;
@@ -2474,11 +2482,12 @@ int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
 		 */
 		if (unlikely(io_submit_sqe(ctx, req, sqe)) &&
 		    !(ctx->flags & IORING_SETUP_SUBMIT_ALL)) {
-			left--;
+			left = 1;
 			break;
 		}
-	} while (--left);
+	} while ((ctx->cached_sq_head - old_head) < ret);

+	left = ret - (ctx->cached_sq_head - old_head) - left;
 	if (unlikely(left)) {
 		ret -= left;
 		/* try again if it submitted nothing and can't allocate a req */
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index dd193c612348..8ede804b3caf 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -78,6 +78,8 @@ bool __io_alloc_req_refill(struct io_ring_ctx *ctx);
 bool io_match_task_safe(struct io_kiocb *head, struct task_struct *task,
 			bool cancel_all);

+bool io_get_secondary_sqe(struct io_ring_ctx *ctx,
+		const struct io_uring_sqe **sqe);
 int io_init_secondary_req(struct io_ring_ctx
*ctx, struct io_kiocb *req, const struct io_uring_sqe *sqe); From patchwork Tue Mar 28 15:09:46 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 76143 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp2314040vqo; Tue, 28 Mar 2023 08:41:00 -0700 (PDT) X-Google-Smtp-Source: AKy350Y03Rt2Dmy0Zyg7vTRei86uoviWGHF4sQOy5Vo2Je3g/z2+206h3xL4Zj2PM51CKJYxtVSD X-Received: by 2002:aa7:d50e:0:b0:502:1cd3:d0fb with SMTP id y14-20020aa7d50e000000b005021cd3d0fbmr15976058edq.20.1680018059870; Tue, 28 Mar 2023 08:40:59 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680018059; cv=none; d=google.com; s=arc-20160816; b=nG6GnbAeKVZU7mE7QqV4qp+8SpaIYMgLGcs0KGnKajfzdUKGzTZY6i5PAsp/L0c8V2 /vFQZwX3Z6LCMopFTFhWo0yPvSWrxeb2lJC1fyHGDTEMI71H6C0GItpHqkKMnKl0RFLt UimvCpsVS61hNLh7JPiigPaNJimhVUrnHSQ2zk9GdOQ1QqVpMZXUD0mq05NnP5Eo3Pdk ZLntbo2NMqy9dVnAaxzlOmwoiuilbUUl2MF3DcLKlvm1orc6uHQHofYH4S0Q5OTAPBiz SoVfz9DWxm02pl7MYX8rTz2DXJ1wJzTb7Nzc7lu1SzBABD+y94hNZovsxn5mxLG9Mmvr kc2g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=lzFlB5kGsJuuszceCewof5xYnczQcl6NgLt+NqBNBds=; b=GHG0c/xJ3GUl5XIISwAsc7NI6WGvH7BATXPPuVBTyLVrrQB8cUMyDwgjRgHBBZvpvx RVQhJigpLSylFPT+kRbbr9gQiUphv70/LRGKVgYcRGIGQRsvlsWKdiO64tPyczW/9iBo qWo71GRAvMkXsieXqYbYocfR8mS+0quyu103eOHsz0MM8QrLt6C9rGZFj0HPRr7MeYxC bzAuOQeKx17P6qyxdGCRvqFdBU/aHVd91a6VBlM+MaRE6Dau3+7+yiSUvjYYhy6NBc4A Uw3iVegm7Jn5KYr82dJu3WjwHhjXuBe2JYC/6nhwbyw4je91ERAkoR4KG+H7otwBzGE/ +e1Q== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=YQZV5Ywr; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) 
smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id y8-20020a50e608000000b004ad4c17aa75si30466916edm.425.2023.03.28.08.40.36; Tue, 28 Mar 2023 08:40:59 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=YQZV5Ywr; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233067AbjC1P2w (ORCPT + 99 others); Tue, 28 Mar 2023 11:28:52 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49456 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233023AbjC1P2X (ORCPT ); Tue, 28 Mar 2023 11:28:23 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 90F08FF13 for ; Tue, 28 Mar 2023 08:26:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1680017161; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=lzFlB5kGsJuuszceCewof5xYnczQcl6NgLt+NqBNBds=; b=YQZV5YwrP3jnWgVjoG9LxDrarpwf0Im1ll8kmCbiG874bdo3ZnvCnWuvqfcSISvmcwrj41 LTPYlqqy34tvaPzo4JF/z1zcW28vnlrTeRGLPMvrFI7D2Z96haEULEBTTucUhWp3WKfjwi 2lHXaPZOE9oEXAgyiyk5Usng4Q8RWsE= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com 
[66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-18-QzwG27TSP_qmvEwynYE1dQ-1; Tue, 28 Mar 2023 11:11:12 -0400 X-MC-Unique: QzwG27TSP_qmvEwynYE1dQ-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.rdu2.redhat.com [10.11.54.2]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 0BF0E801206; Tue, 28 Mar 2023 15:11:09 +0000 (UTC) Received: from localhost (ovpn-8-20.pek2.redhat.com [10.72.8.20]) by smtp.corp.redhat.com (Postfix) with ESMTP id 190334020C82; Tue, 28 Mar 2023 15:11:01 +0000 (UTC) From: Ming Lei To: Jens Axboe , io-uring@vger.kernel.org, linux-block@vger.kernel.org Cc: linux-kernel@vger.kernel.org, Miklos Szeredi , ZiyangZhang , Xiaoguang Wang , Bernd Schubert , Pavel Begunkov , Stefan Hajnoczi , Dan Williams , Ming Lei Subject: [PATCH V5 04/16] io_uring: support OP_READ/OP_WRITE for fused secondary request Date: Tue, 28 Mar 2023 23:09:46 +0800 Message-Id: <20230328150958.1253547-5-ming.lei@redhat.com> In-Reply-To: <20230328150958.1253547-1-ming.lei@redhat.com> References: <20230328150958.1253547-1-ming.lei@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.2 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1761626617381504594?= X-GMAIL-MSGID: =?utf-8?q?1761626617381504594?= Start to allow fused secondary request to support OP_READ/OP_WRITE, and the buffer can be retrieved from the primary request. 
Once the secondary request completes, the primary request's buffer is
returned.

Signed-off-by: Ming Lei
---
 io_uring/opdef.c |  4 ++++
 io_uring/rw.c    | 21 +++++++++++++++++++++
 2 files changed, 25 insertions(+)

diff --git a/io_uring/opdef.c b/io_uring/opdef.c
index 63b90e8e65f8..d81c9afd65ed 100644
--- a/io_uring/opdef.c
+++ b/io_uring/opdef.c
@@ -235,6 +235,8 @@ const struct io_issue_def io_issue_defs[] = {
 		.ioprio			= 1,
 		.iopoll			= 1,
 		.iopoll_queue		= 1,
+		.fused_secondary	= 1,
+		.buf_dir		= WRITE,
 		.prep			= io_prep_rw,
 		.issue			= io_read,
 	},
@@ -248,6 +250,8 @@ const struct io_issue_def io_issue_defs[] = {
 		.ioprio			= 1,
 		.iopoll			= 1,
 		.iopoll_queue		= 1,
+		.fused_secondary	= 1,
+		.buf_dir		= READ,
 		.prep			= io_prep_rw,
 		.issue			= io_write,
 	},
diff --git a/io_uring/rw.c b/io_uring/rw.c
index 5431caf1e331..d25eeee67c65 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -19,6 +19,7 @@
 #include "kbuf.h"
 #include "rsrc.h"
 #include "rw.h"
+#include "fused_cmd.h"

 struct io_rw {
 	/* NOTE: kiocb has the file as the first member, so don't do it here */
@@ -371,6 +372,18 @@ static struct iovec *__io_import_iovec(int ddir, struct io_kiocb *req,
 	size_t sqe_len;
 	ssize_t ret;

+	/*
+	 * fused_secondary OP passes buffer offset from sqe->addr actually, since
+	 * the fused cmd buf's mapped start address is zero.
+	 */
+	if (req->flags & REQ_F_FUSED_SECONDARY) {
+		ret = io_import_buf_for_secondary(rw->addr, rw->len, ddir,
+				iter, req);
+		if (ret)
+			return ERR_PTR(ret);
+		return NULL;
+	}
+
 	if (opcode == IORING_OP_READ_FIXED || opcode == IORING_OP_WRITE_FIXED) {
 		ret = io_import_fixed(ddir, iter, req->imu, rw->addr, rw->len);
 		if (ret)
@@ -443,11 +456,19 @@ static inline loff_t *io_kiocb_ppos(struct kiocb *kiocb)
  */
 static ssize_t loop_rw_iter(int ddir, struct io_rw *rw, struct iov_iter *iter)
 {
+	struct io_kiocb *req = cmd_to_io_kiocb(rw);
 	struct kiocb *kiocb = &rw->kiocb;
 	struct file *file = kiocb->ki_filp;
 	ssize_t ret = 0;
 	loff_t *ppos;

+	/*
+	 * A fused secondary req has no user buffer, so ->read/->write can't
+	 * be supported
+	 */
+	if (req->flags & REQ_F_FUSED_SECONDARY)
+		return -EOPNOTSUPP;
+
 	/*
 	 * Don't support polled IO through this interface, and we can't
 	 * support non-blocking either. For the latter, this just causes
From patchwork Tue Mar 28 15:09:47 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 76126 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp2302207vqo; Tue, 28 Mar 2023 08:22:54 -0700 (PDT) X-Google-Smtp-Source: AKy350aUjI3z78dWAc2Cv9vMnnhBfuqkX0iVNhXH+q5G5s9iKGa/E9JUtunur1+G/r0jcfCXrsXv X-Received: by 2002:a05:6402:204c:b0:500:2cc6:36da with SMTP id bc12-20020a056402204c00b005002cc636damr16919235edb.19.1680016973997; Tue, 28 Mar 2023 08:22:53 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680016973; cv=none; d=google.com; s=arc-20160816; b=xiS7BvTmwX6aJY1xLdbZIbMhq5LXCxcW8kPPygRK/Jq1nDIaSIIG066i0fGqjmDpPQ wbrZ63fsdZOOq0eurra0Zh8yuKm+KMw4wFqTcFaSzRp2uo/wawYNW1wTI9Rrr/swXsBx TiwTwpv4QMPGFLJGa8ntyItEnaCt8VyrLhU9AXxjzStuEsNi62L81Uw7XNDkJIAXRNMq EdeKo3uJzw064yxjejpgm9vktJNJzjavPHdqGqvz5z+895FxlIWlqaenrFSIMu/ct4cf szhG0KPHqB79IYq3oG0GOosxKSqzrxnqGkOqWw+Z2yhhjgwocSjHH4Ue+xle51WTcCmG EopQ==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=B4h73f5q+9OJB7ocdpIjrrTEZsKx6E5zs0QXKOnNU3Q=; b=eHQlodC9l3aNKN4Z8ICrjt8ACUbzbe0KIipocKC83WAa+9D7bY2FegFoarCYwKvpxv w8YmGRfMInDHbNxRjF0q1BiM4Rhi6DGVwoUWHmw+bLuO75qZdthWuSDm2X4Dc7p432fb zsIUXBfZBHz/8fNTG5VElgvrIkl+WW8MpkqG9k2lkuQtltxf0WB2875cWslfmP5s54Zr gRWTcP28Ih7RKs7al4yh3+0eqM/69cglNXJ5lzI1uGohejbqGiBV3NQTVya0v5Lt1OmZ pjVc8kaGevuGNTNAqBiJxlc1hcZFOV2h4YNm7iIgsrFFWmvwBk+VLPHLisLjN3xAXbYN D68A== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=CyL3lS4l; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. 
[2620:137:e000::1:20]) by mx.google.com with ESMTP id h8-20020a50ed88000000b004fd56e106dfsi28437957edr.547.2023.03.28.08.22.30; Tue, 28 Mar 2023 08:22:53 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=CyL3lS4l; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233943AbjC1PO0 (ORCPT + 99 others); Tue, 28 Mar 2023 11:14:26 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54796 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233937AbjC1PNu (ORCPT ); Tue, 28 Mar 2023 11:13:50 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 62A1710A91 for ; Tue, 28 Mar 2023 08:12:43 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1680016285; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=B4h73f5q+9OJB7ocdpIjrrTEZsKx6E5zs0QXKOnNU3Q=; b=CyL3lS4lhsNqVZ4rBz+yBJiIAyaeUbwPsTgqiGQqCtlMY6jMD3PEilXOmhug6nu8dx95a3 3YDfl5/UAdPntcpyYvIF/BikeIh1eab4VpKH5db7740fz/zkaG4GGBQxtp32e69OR9+ATH h+QAT1l6pg0qhk8RT/fDw0p2giK4pWU= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 
us-mta-299-IE_ZslxBPuyv2EpFbWcMkQ-1; Tue, 28 Mar 2023 11:11:21 -0400 X-MC-Unique: IE_ZslxBPuyv2EpFbWcMkQ-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 2C6981875045; Tue, 28 Mar 2023 15:11:13 +0000 (UTC) Received: from localhost (ovpn-8-20.pek2.redhat.com [10.72.8.20]) by smtp.corp.redhat.com (Postfix) with ESMTP id 33C89C15BB8; Tue, 28 Mar 2023 15:11:11 +0000 (UTC) From: Ming Lei To: Jens Axboe , io-uring@vger.kernel.org, linux-block@vger.kernel.org Cc: linux-kernel@vger.kernel.org, Miklos Szeredi , ZiyangZhang , Xiaoguang Wang , Bernd Schubert , Pavel Begunkov , Stefan Hajnoczi , Dan Williams , Ming Lei Subject: [PATCH V5 05/16] io_uring: support OP_SEND_ZC/OP_RECV for fused secondary request Date: Tue, 28 Mar 2023 23:09:47 +0800 Message-Id: <20230328150958.1253547-6-ming.lei@redhat.com> In-Reply-To: <20230328150958.1253547-1-ming.lei@redhat.com> References: <20230328150958.1253547-1-ming.lei@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1761625478850496154?= X-GMAIL-MSGID: =?utf-8?q?1761625478850496154?= Start to allow fused secondary request to support OP_SEND_ZC/OP_RECV, and the buffer can be retrieved from primary request. Once the secondary request is completed, the primary buffer will be returned back. 
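The buffer handling described above can be pictured with a small userspace model. This is only a sketch under stated assumptions: it takes the secondary request's address to be an offset into the buffer lent by the primary request (the rw.c comment in this series says the fused cmd buffer's mapped start address is zero), and it ignores the buffer direction check. The names lent_buf, window and import_for_secondary are hypothetical and are not part of this series; the real work is done by io_import_buf_for_secondary(), which this patch calls but does not define. The -EFAULT return value is likewise an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of the buffer lent by the primary request. */
struct lent_buf {
	unsigned char *base;	/* start of the lent buffer */
	size_t len;		/* total length in bytes */
};

/* Hypothetical view handed to the secondary request. */
struct window {
	unsigned char *ptr;
	size_t len;
};

/*
 * Treat 'offset' the way a fused secondary request treats its address
 * field: as an offset into the primary's buffer, not a user pointer.
 * Returns 0 and fills *out on success, or -EFAULT (an assumed error
 * code) when the requested range does not fit inside the lent buffer.
 */
static int import_for_secondary(const struct lent_buf *buf, size_t offset,
				size_t len, struct window *out)
{
	assert(buf != NULL && out != NULL);

	/* Written to avoid overflow in offset + len. */
	if (offset > buf->len || len > buf->len - offset)
		return -EFAULT;

	out->ptr = buf->base + offset;
	out->len = len;
	return 0;
}
```

With a 64-byte lent buffer, a request for 32 bytes at offset 16 would succeed and yield a window starting at base + 16, while a request for 32 bytes at offset 48 would be rejected because it runs past the end of the buffer.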
Signed-off-by: Ming Lei
---
 io_uring/net.c   | 30 ++++++++++++++++++++++++++++--
 io_uring/opdef.c |  6 ++++++
 2 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/io_uring/net.c b/io_uring/net.c
index 4040cf093318..e1c807a6e503 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -16,6 +16,7 @@
 #include "net.h"
 #include "notif.h"
 #include "rsrc.h"
+#include "fused_cmd.h"

 #if defined(CONFIG_NET)
 struct io_shutdown {
@@ -69,6 +70,13 @@ struct io_sr_msg {
 	struct io_kiocb			*notif;
 };

+#define user_ptr_to_u64(x) (		\
+{					\
+	typecheck(void __user *, (x));	\
+	(u64)(unsigned long)(x);	\
+}					\
+)
+
 static inline bool io_check_multishot(struct io_kiocb *req,
 				      unsigned int issue_flags)
 {
@@ -379,7 +387,11 @@ int io_send(struct io_kiocb *req, unsigned int issue_flags)
 	if (unlikely(!sock))
 		return -ENOTSOCK;

-	ret = import_ubuf(ITER_SOURCE, sr->buf, sr->len, &msg.msg_iter);
+	if (!(req->flags & REQ_F_FUSED_SECONDARY))
+		ret = import_ubuf(ITER_SOURCE, sr->buf, sr->len, &msg.msg_iter);
+	else
+		ret = io_import_buf_for_secondary(user_ptr_to_u64(sr->buf),
+				sr->len, ITER_SOURCE, &msg.msg_iter, req);
 	if (unlikely(ret))
 		return ret;

@@ -870,7 +882,11 @@ int io_recv(struct io_kiocb *req, unsigned int issue_flags)
 		sr->buf = buf;
 	}

-	ret = import_ubuf(ITER_DEST, sr->buf, len, &msg.msg_iter);
+	if (!(req->flags & REQ_F_FUSED_SECONDARY))
+		ret = import_ubuf(ITER_DEST, sr->buf, len, &msg.msg_iter);
+	else
+		ret = io_import_buf_for_secondary(user_ptr_to_u64(sr->buf),
+				sr->len, ITER_DEST, &msg.msg_iter, req);
 	if (unlikely(ret))
 		goto out_free;

@@ -984,6 +1000,9 @@ int io_send_zc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	if (zc->flags & IORING_RECVSEND_FIXED_BUF) {
 		unsigned idx = READ_ONCE(sqe->buf_index);

+		if (req->flags & REQ_F_FUSED_SECONDARY)
+			return -EINVAL;
+
 		if (unlikely(idx >= ctx->nr_user_bufs))
 			return -EFAULT;
 		idx = array_index_nospec(idx, ctx->nr_user_bufs);
@@ -1120,8 +1139,15 @@ int io_send_zc(struct io_kiocb *req, unsigned int issue_flags)
 		if (unlikely(ret))
 			return ret;
 		msg.sg_from_iter = io_sg_from_iter;
+	} else if (req->flags & REQ_F_FUSED_SECONDARY) {
+		ret = io_import_buf_for_secondary(user_ptr_to_u64(zc->buf),
+				zc->len, ITER_SOURCE, &msg.msg_iter, req);
+		if (unlikely(ret))
+			return ret;
+		msg.sg_from_iter = io_sg_from_iter;
 	} else {
 		io_notif_set_extended(zc->notif);
+
 		ret = import_ubuf(ITER_SOURCE, zc->buf, zc->len, &msg.msg_iter);
 		if (unlikely(ret))
 			return ret;
diff --git a/io_uring/opdef.c b/io_uring/opdef.c
index d81c9afd65ed..c31badf4fe45 100644
--- a/io_uring/opdef.c
+++ b/io_uring/opdef.c
@@ -273,6 +273,8 @@ const struct io_issue_def io_issue_defs[] = {
 		.audit_skip		= 1,
 		.ioprio			= 1,
 		.manual_alloc		= 1,
+		.fused_secondary	= 1,
+		.buf_dir		= READ,
 #if defined(CONFIG_NET)
 		.prep			= io_sendmsg_prep,
 		.issue			= io_send,
@@ -287,6 +289,8 @@ const struct io_issue_def io_issue_defs[] = {
 		.buffer_select		= 1,
 		.audit_skip		= 1,
 		.ioprio			= 1,
+		.fused_secondary	= 1,
+		.buf_dir		= WRITE,
 #if defined(CONFIG_NET)
 		.prep			= io_recvmsg_prep,
 		.issue			= io_recv,
@@ -413,6 +417,8 @@ const struct io_issue_def io_issue_defs[] = {
 		.audit_skip		= 1,
 		.ioprio			= 1,
 		.manual_alloc		= 1,
+		.fused_secondary	= 1,
+		.buf_dir		= READ,
 #if defined(CONFIG_NET)
 		.prep			= io_send_zc_prep,
 		.issue			= io_send_zc,
From patchwork Tue Mar 28 15:09:48 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 76129 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp2303635vqo; Tue, 28 Mar 2023 08:25:10 -0700 (PDT) X-Google-Smtp-Source: AKy350ZCh1MQ1K+r5bmD4rX5uMtRLYM/9Kmo6BEsMyazvG+7myqqgdVMR4nilUqWm4+5lO8eghDa X-Received: by 2002:a17:907:8687:b0:93b:46f7:a716 with SMTP id qa7-20020a170907868700b0093b46f7a716mr21230248ejc.50.1680017110055; Tue, 28 Mar 2023 08:25:10 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680017110; cv=none; d=google.com; s=arc-20160816;
b=NUZCLbY7yRjtk1jAbYvJ2F9rVlC//c8MxG1+40PlUcqwa9VoIJc6JEz9480t72POXn IENVnljgRv1zUtzs6jKETetzFR4dQE+6cfFZmd7i2b7qRYUnb03NkpL1NlqWX1gRV0/F 9JwjII4sTo28GZ6/byfB0vWz3Lxt4GJIOFWh1Rp7mEdO3JMirRF4c9Uz6rwsjBV/S45S FSW1utEi6nqHoBaclmoTaks8QcnL5X8W9z45/1TmlsbohUE8OODTfe7jZ6wSi2I10R9L 6dXDnkRqbSaEQlGy99ox/JS0oHIBBK9FYSH5B03LapB7fT+YcqHPEqWpXjDQtYLvEzX5 3KsQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=2HaaAMVXBOU9npIKRCm63G2Jnf2K0Y8PRLMtIL+XBWM=; b=UmnfDkTw7ovYJXB+g6IpB9SV6YU2Mth9Crg/LUE0D4LBPytTt3RbCCz6o0rMk1gqzE ijZKRgtfKfhQZm4npRmgCTwpjuY9uVP/1lPq+/mtPVYXkio38O88RTAkD5T48DoY1Tv2 tiHrBZ4JxsId8i7PuHyF8dLSB5gbOWoR7PwwiYSVrrDidhhKAPyy3wWCWGsZT9zcQdaR pV19h1hGbUO9NzaVo8Ist/B3WDj7cQjUlCylViQeAAQlkXoYe3jqI+oplwv4hj+2QJ0k WjG5SUfuN5gyPH83mWMz0Entu3m64WOGEotui8hk7/l7EA1hQaRUEbrzuNpMsUEGvnpK QC6Q== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=a013+mhG; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. 
[2620:137:e000::1:20]) by mx.google.com with ESMTP id l12-20020a1709066b8c00b00921d40751bcsi29260378ejr.618.2023.03.28.08.24.45; Tue, 28 Mar 2023 08:25:10 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=a013+mhG; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233928AbjC1PSD (ORCPT + 99 others); Tue, 28 Mar 2023 11:18:03 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53884 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233010AbjC1PRc (ORCPT ); Tue, 28 Mar 2023 11:17:32 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CDFDF10423 for ; Tue, 28 Mar 2023 08:15:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1680016500; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=2HaaAMVXBOU9npIKRCm63G2Jnf2K0Y8PRLMtIL+XBWM=; b=a013+mhGXq/5KosL/zRRFwMQ7pByfNEnJbMlP3imWYCzL130N1Sfz3dOUg0ibEIsgbAwDF 72FhrjRoxywQc/5y1b9KGakppX5KMPfzupJg+ZDC7Ga4FhzA9wAbiBp9A+VCHH4y5u9t60 vDqn1sQ4chfYt7FDrv0SbtmS6oV+Ars= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 
us-mta-380-EKSnFKTlNW2-26zE5_Z4xw-1; Tue, 28 Mar 2023 11:11:23 -0400 X-MC-Unique: EKSnFKTlNW2-26zE5_Z4xw-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.rdu2.redhat.com [10.11.54.2]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id D375E38149CC; Tue, 28 Mar 2023 15:11:21 +0000 (UTC) Received: from localhost (ovpn-8-20.pek2.redhat.com [10.72.8.20]) by smtp.corp.redhat.com (Postfix) with ESMTP id E684D4020C83; Tue, 28 Mar 2023 15:11:15 +0000 (UTC) From: Ming Lei To: Jens Axboe , io-uring@vger.kernel.org, linux-block@vger.kernel.org Cc: linux-kernel@vger.kernel.org, Miklos Szeredi , ZiyangZhang , Xiaoguang Wang , Bernd Schubert , Pavel Begunkov , Stefan Hajnoczi , Dan Williams , Ming Lei Subject: [PATCH V5 06/16] block: ublk_drv: add common exit handling Date: Tue, 28 Mar 2023 23:09:48 +0800 Message-Id: <20230328150958.1253547-7-ming.lei@redhat.com> In-Reply-To: <20230328150958.1253547-1-ming.lei@redhat.com> References: <20230328150958.1253547-1-ming.lei@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.2 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1761625621303249160?= X-GMAIL-MSGID: =?utf-8?q?1761625621303249160?= Simplify exit handling a bit, and prepare for supporting fused command. 
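For readers following the control flow, the single-exit shape introduced here can be sketched in plain userspace C. This is only a rough model under stated assumptions: every model_* name below is hypothetical and stands in for the kernel types, and the unmap/requeue tail of ublk_complete_rq() is reduced to a placeholder.

```c
#include <assert.h>	/* used by the checks shown in the note below */
#include <errno.h>

/* Hypothetical stand-ins for the kernel types; not real block-layer definitions. */
enum model_op { MODEL_OP_READ, MODEL_OP_WRITE, MODEL_OP_FLUSH };
enum model_status { MODEL_STS_OK = 0, MODEL_STS_IOERR = 1 };
enum model_outcome { MODEL_ENDED_EARLY, MODEL_ENDED_AFTER_UNMAP };

struct model_result {
	enum model_outcome outcome;
	enum model_status status;
};

/*
 * Model of the reworked completion path: both early-out cases jump to
 * one label instead of each ending the request on its own.
 */
static struct model_result model_complete_rq(enum model_op op, int io_res)
{
	struct model_result r = { MODEL_ENDED_AFTER_UNMAP, MODEL_STS_OK };
	enum model_status res = MODEL_STS_OK;

	/* A read that transferred nothing is treated as failed. */
	if (!io_res && op == MODEL_OP_READ)
		io_res = -EIO;

	if (io_res < 0) {
		/* Assumption: any negative errno maps to a generic I/O error. */
		res = MODEL_STS_IOERR;
		goto exit;
	}

	/* Neither READ nor WRITE: nothing to unmap, end with OK. */
	if (op != MODEL_OP_READ && op != MODEL_OP_WRITE)
		goto exit;

	/* Placeholder: unmap and partial-completion handling would go here. */
	return r;
exit:
	r.outcome = MODEL_ENDED_EARLY;
	r.status = res;
	return r;
}
```

In this model, model_complete_rq(MODEL_OP_READ, 0) takes the exit label with MODEL_STS_IOERR, while model_complete_rq(MODEL_OP_WRITE, 4096) falls through to the unmap placeholder. Keeping one exit label means a later patch only has to touch one place to add work before the request is ended.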
Reviewed-by: Ziyang Zhang Signed-off-by: Ming Lei --- drivers/block/ublk_drv.c | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c index c73cc57ec547..bc46616710d4 100644 --- a/drivers/block/ublk_drv.c +++ b/drivers/block/ublk_drv.c @@ -655,14 +655,15 @@ static void ublk_complete_rq(struct request *req) struct ublk_queue *ubq = req->mq_hctx->driver_data; struct ublk_io *io = &ubq->ios[req->tag]; unsigned int unmapped_bytes; + blk_status_t res = BLK_STS_OK; /* failed read IO if nothing is read */ if (!io->res && req_op(req) == REQ_OP_READ) io->res = -EIO; if (io->res < 0) { - blk_mq_end_request(req, errno_to_blk_status(io->res)); - return; + res = errno_to_blk_status(io->res); + goto exit; } /* @@ -671,10 +672,8 @@ static void ublk_complete_rq(struct request *req) * * Both the two needn't unmap. */ - if (req_op(req) != REQ_OP_READ && req_op(req) != REQ_OP_WRITE) { - blk_mq_end_request(req, BLK_STS_OK); - return; - } + if (req_op(req) != REQ_OP_READ && req_op(req) != REQ_OP_WRITE) + goto exit; /* for READ request, writing data in iod->addr to rq buffers */ unmapped_bytes = ublk_unmap_io(ubq, req, io); @@ -691,6 +690,10 @@ static void ublk_complete_rq(struct request *req) blk_mq_requeue_request(req, true); else __blk_mq_end_request(req, BLK_STS_OK); + + return; +exit: + blk_mq_end_request(req, res); } /* From patchwork Tue Mar 28 15:09:49 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 76131 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp2304215vqo; Tue, 28 Mar 2023 08:25:59 -0700 (PDT) X-Google-Smtp-Source: AK7set824OhxZTKiFjiqJWOq9DYR0yM+i+hkc3YDJ3b3vvUUx9uRoEbxA+e5uPGMGzz33XA5jA2m X-Received: by 2002:a17:906:6d19:b0:93b:2be7:3ce4 with SMTP id m25-20020a1709066d1900b0093b2be73ce4mr23496497ejr.1.1680017159523; Tue, 28 Mar 2023 
08:25:59 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680017159; cv=none; d=google.com; s=arc-20160816; b=r9lC/eej91tpZ2RGIJgPegSb8RYqKyCwOUNxpiocvwUGNfRIornxB5DwwGiRfJr/xZ hphZjwo7oSeLhr7US/dqTuSkfaFHjyw1gco8Ds/56XzR39a9vluzmMuc+nXNAB79zd/H xWLs8RwfPFE2FaV6qV4fX+BRuMSvI/ZM9Oslw1b47kGUMg683+9jZLkaZb8WE81+lNCT wHf7lwFBwFeBfpSZMlSitUxKZ6qUVdkkSLVOrbZlc8erB2q9SITqZVrORpyMwkz4n+5b 22YOwvYyDJ3qIVodav40svwW5t7NOcXefKHSmcDXFZzowsfoNRm7+F3/XOeXcO/qMPsB NQHw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=s89n5MVHIm/yrA8YFK9Xcfzas4VQ/6rU7DsLdSVa7Sk=; b=JXzt+R6Fzk3JNBz4Iw71JA34ADgaoUK00HayoI0wa4i3v1NDK7vrZFvY/oh7y3u2nB FizdzGH9YXOTiII7g270G2oBanskuiQjDsvqvHoJGiQzFwJamqHrejc5clF6uayloFRy r7jwdsvgSwD9wi7eZg63xmxe5ugkHClhDRDAx4gjo3EaNmtFDzEcqbjW0q0P0oIw1a+1 QNHpCO7GUzMV+DBjZPu+YUSDVspNytkm6TLEfAedNKVe5AFfKP5B3oM0Tcg6b4WhQlEz MV01UZ6hwZJ84/+t94eOSJtMPux/PshTBa5hlHLy7SOoWPZfHylFWpntxXQNebDIU3Tn I9oQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=R+hLP4oO; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. 
[2620:137:e000::1:20]) by mx.google.com with ESMTP id nc12-20020a1709071c0c00b009333ffa651csi16244965ejc.258.2023.03.28.08.25.35; Tue, 28 Mar 2023 08:25:59 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=R+hLP4oO; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233807AbjC1PSL (ORCPT + 99 others); Tue, 28 Mar 2023 11:18:11 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53504 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233978AbjC1PRk (ORCPT ); Tue, 28 Mar 2023 11:17:40 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6E14B11666 for ; Tue, 28 Mar 2023 08:16:11 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1680016494; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=s89n5MVHIm/yrA8YFK9Xcfzas4VQ/6rU7DsLdSVa7Sk=; b=R+hLP4oOwHMa5sH3Wn8jctL0MltPpLSZ+/bSp9qD+qcEeSp6Lh0k/KWNEYURnPDm4xugs+ KnmHIlyN+zyGILqsXEG+1wJof4k4d3EV7KISRiDmEg8meKgBmuk6U2Kv+iM3OO8cIggBXp ARVxDG4QSe/1xFptzeqtbts/5XKgRnM= Received: from mimecast-mx02.redhat.com (66.187.233.88 [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-637-iqk6bV3lOzq66C24ZkXuIg-1; 
Tue, 28 Mar 2023 11:11:31 -0400 X-MC-Unique: iqk6bV3lOzq66C24ZkXuIg-1 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.rdu2.redhat.com [10.11.54.7]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id C0D2D100DEAD; Tue, 28 Mar 2023 15:11:25 +0000 (UTC) Received: from localhost (ovpn-8-20.pek2.redhat.com [10.72.8.20]) by smtp.corp.redhat.com (Postfix) with ESMTP id D553C140EBF4; Tue, 28 Mar 2023 15:11:24 +0000 (UTC) From: Ming Lei To: Jens Axboe , io-uring@vger.kernel.org, linux-block@vger.kernel.org Cc: linux-kernel@vger.kernel.org, Miklos Szeredi , ZiyangZhang , Xiaoguang Wang , Bernd Schubert , Pavel Begunkov , Stefan Hajnoczi , Dan Williams , Ming Lei Subject: [PATCH V5 07/16] block: ublk_drv: don't consider flush request in map/unmap io Date: Tue, 28 Mar 2023 23:09:49 +0800 Message-Id: <20230328150958.1253547-8-ming.lei@redhat.com> In-Reply-To: <20230328150958.1253547-1-ming.lei@redhat.com> References: <20230328150958.1253547-1-ming.lei@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.7 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1761625672755596024?= X-GMAIL-MSGID: =?utf-8?q?1761625672755596024?= A REQ_OP_FLUSH request never carries data, so don't consider it in either ublk_map_io() or ublk_unmap_io().
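As a sanity check on why dropping the REQ_OP_FLUSH test is safe, the old and new copy conditions in ublk_map_io() can be compared in a small userspace model. This is a sketch only: the model_* names and both helper functions are hypothetical, and the comparison relies on the assumption stated above that a flush request has no data.

```c
#include <assert.h>	/* used by the checks shown in the note below */
#include <stdbool.h>

/* Hypothetical stand-in for the request operation; not the kernel enum. */
enum model_op { MODEL_OP_READ, MODEL_OP_WRITE, MODEL_OP_FLUSH };

/* Condition before this patch: op is WRITE or FLUSH, and the request has data. */
static bool old_needs_copy(enum model_op op, bool has_data)
{
	if (op != MODEL_OP_WRITE && op != MODEL_OP_FLUSH)
		return false;
	return has_data;
}

/* Condition after this patch: the request has data and op is WRITE. */
static bool new_needs_copy(enum model_op op, bool has_data)
{
	return has_data && op == MODEL_OP_WRITE;
}

/*
 * Walk every (op, has_data) pair except "flush with data", which is
 * excluded by assumption, and report whether old and new always agree.
 */
static bool conditions_agree(void)
{
	for (int op = MODEL_OP_READ; op <= MODEL_OP_FLUSH; op++) {
		for (int d = 0; d <= 1; d++) {
			bool has_data = (d != 0);

			if (op == MODEL_OP_FLUSH && has_data)
				continue;
			if (old_needs_copy((enum model_op)op, has_data) !=
			    new_needs_copy((enum model_op)op, has_data))
				return false;
		}
	}
	return true;
}
```

In this model conditions_agree() returns true, and the only input on which the two predicates differ is the excluded one, a flush that claims to have data.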
Reviewed-by: Ziyang Zhang Signed-off-by: Ming Lei --- drivers/block/ublk_drv.c | 10 +++------- 1 file changed, 3 insertions(+), 7 deletions(-) diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c index bc46616710d4..c73b2dba25ce 100644 --- a/drivers/block/ublk_drv.c +++ b/drivers/block/ublk_drv.c @@ -529,15 +529,13 @@ static int ublk_map_io(const struct ublk_queue *ubq, const struct request *req, struct ublk_io *io) { const unsigned int rq_bytes = blk_rq_bytes(req); + /* * no zero copy, we delay copy WRITE request data into ublksrv * context and the big benefit is that pinning pages in current * context is pretty fast, see ublk_pin_user_pages */ - if (req_op(req) != REQ_OP_WRITE && req_op(req) != REQ_OP_FLUSH) - return rq_bytes; - - if (ublk_rq_has_data(req)) { + if (ublk_rq_has_data(req) && req_op(req) == REQ_OP_WRITE) { struct ublk_map_data data = { .ubq = ubq, .rq = req, @@ -774,9 +772,7 @@ static inline void __ublk_rq_task_work(struct request *req, return; } - if (ublk_need_get_data(ubq) && - (req_op(req) == REQ_OP_WRITE || - req_op(req) == REQ_OP_FLUSH)) { + if (ublk_need_get_data(ubq) && (req_op(req) == REQ_OP_WRITE)) { /* * We have not handled UBLK_IO_NEED_GET_DATA command yet, * so immepdately pass UBLK_IO_RES_NEED_GET_DATA to ublksrv From patchwork Tue Mar 28 15:09:50 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 76134 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp2307756vqo; Tue, 28 Mar 2023 08:31:20 -0700 (PDT) X-Google-Smtp-Source: AKy350aD3PlPjzSnQ1RnYmhcWKwgFU3mH9ygbquGMnJeKo6xxGP6yMAw57taEhyBScq1h03ck051 X-Received: by 2002:a05:6402:45:b0:4ea:a9b0:a511 with SMTP id f5-20020a056402004500b004eaa9b0a511mr15889250edu.37.1680017480434; Tue, 28 Mar 2023 08:31:20 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680017480; cv=none; d=google.com; s=arc-20160816; 
b=NFxSW9xHqn0rk9mBWNcGbRigYJ7k+Tl00YYG1jHoT9H20vG0UboB6WCsSZcmG6Nnyp 2PKnNhQhk3+t/n49S7MTxmGMlAVn4A1gnh1H6L1QFVaNPpSQ5bU6MDbC2HIjwC6ioWRD IkNz7/UF7e3VzEGuBtZpCWyfVoXZlXo+JfvWMR++8NvFGfY9oRfeKIfcoddMLbEudoNZ eMIn+LN7QWwfrrdzQ9CcVdZy8+BecNHySrXTkPNbRYMKXWS1JkKJeZrH+7o7GfMSNYVi 4gFqyYq0yZv9Q4C5tvQLZ8HT8Gkg+nGcqJ+WpIu3XLCNU4haATPJLMmRk3a06JttQ9z1 Ozmg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=litTAfSrcuMc4HupIpacq5wCRTNFJAvtr6mPltpyGGw=; b=swFDgvWmZoA6tQgKc2dLAfRHML7OiDcioXC8/x54spGdSGI6Kfj2HcZsAIWdr1IBhS U1Fz1PFqEuLJ5uUUfumBjDjMD881WYTP0WTigNsI0DYL32eW8sKlpQKNnZtCcD947omM WQM4HY27FT7SgH0eRz1nmHKWrhNXRqYpo9Y9uL4cFySAxwEpreOWIoWphnV43G1jfN6D V+4s3X86+NfumO8Le3BF3jkaC9lbzbtrUW5on8Llw5VSBavG1pKeUwk8tPMeRgEYXI2U ZnypJ9YXgS3vmkzCZM5IeGCivIFTzFCU5Y2+9OLNCdt02jFSADIxPI/o3qUoSGAgPN3j hLEw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=UcP1YHm0; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. 
[2620:137:e000::1:20]) by mx.google.com with ESMTP id r8-20020aa7d148000000b004c12b7f5a48si28452892edo.177.2023.03.28.08.30.55; Tue, 28 Mar 2023 08:31:20 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=UcP1YHm0; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233245AbjC1PVb (ORCPT + 99 others); Tue, 28 Mar 2023 11:21:31 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38354 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233344AbjC1PVP (ORCPT ); Tue, 28 Mar 2023 11:21:15 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2929311678 for ; Tue, 28 Mar 2023 08:18:38 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1680016681; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=litTAfSrcuMc4HupIpacq5wCRTNFJAvtr6mPltpyGGw=; b=UcP1YHm0UEPIZRO1Fpb1sDRNUJxNuC0Y6ZMHr5Y+b0k3Lw8Q7ktfM5p81sEJlJLgmbB79c igEJw1kZee9mYQ0+OQgoQbIdcogcR/LMiK7Hc1Z3rsxSOEU9rxaBHAT/RvH/E9EkFRYX4W oEhoeJx6dDeuB8clcctl5sHGHLsXTcM= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 
us-mta-641-akZKb0QuMtm2nodnVwiDcw-1; Tue, 28 Mar 2023 11:11:30 -0400 X-MC-Unique: akZKb0QuMtm2nodnVwiDcw-1 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.rdu2.redhat.com [10.11.54.1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 5C7BB185A7A9; Tue, 28 Mar 2023 15:11:29 +0000 (UTC) Received: from localhost (ovpn-8-20.pek2.redhat.com [10.72.8.20]) by smtp.corp.redhat.com (Postfix) with ESMTP id 7679E40B3EDA; Tue, 28 Mar 2023 15:11:28 +0000 (UTC) From: Ming Lei To: Jens Axboe , io-uring@vger.kernel.org, linux-block@vger.kernel.org Cc: linux-kernel@vger.kernel.org, Miklos Szeredi , ZiyangZhang , Xiaoguang Wang , Bernd Schubert , Pavel Begunkov , Stefan Hajnoczi , Dan Williams , Ming Lei Subject: [PATCH V5 08/16] block: ublk_drv: add two helpers to clean up map/unmap request Date: Tue, 28 Mar 2023 23:09:50 +0800 Message-Id: <20230328150958.1253547-9-ming.lei@redhat.com> In-Reply-To: <20230328150958.1253547-1-ming.lei@redhat.com> References: <20230328150958.1253547-1-ming.lei@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.1 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1761626009591261052?= X-GMAIL-MSGID: =?utf-8?q?1761626009591261052?= Add two helpers for checking whether map/unmap is needed, since passthrough requests that need map or unmap may be added in the future, for example to support report zones. Meanwhile, don't mark ublk_copy_user_pages() as inline, since the function is fairly large now.
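To illustrate what the two helpers decide, here is a minimal userspace sketch of the same predicates. It is not driver code: struct model_request and the model_* functions are hypothetical, and "has data" is modeled as a non-zero byte count, which is an assumption rather than what ublk_rq_has_data() actually checks.

```c
#include <assert.h>	/* used by the checks shown in the note below */
#include <stdbool.h>

/* Hypothetical stand-ins; the real helpers take a struct request. */
enum model_op { MODEL_OP_READ, MODEL_OP_WRITE, MODEL_OP_FLUSH };

struct model_request {
	enum model_op op;
	unsigned int bytes;	/* assumption: non-zero means the request has data */
};

static inline bool model_rq_has_data(const struct model_request *rq)
{
	return rq->bytes != 0;
}

/* WRITE data has to be copied to the server buffer before it is handled. */
static inline bool model_need_map_req(const struct model_request *rq)
{
	return model_rq_has_data(rq) && rq->op == MODEL_OP_WRITE;
}

/* READ data has to be copied back into the request pages on completion. */
static inline bool model_need_unmap_req(const struct model_request *rq)
{
	return model_rq_has_data(rq) && rq->op == MODEL_OP_READ;
}
```

With a request of { MODEL_OP_WRITE, 4096 } only the map predicate is true, with { MODEL_OP_READ, 4096 } only the unmap predicate is true, and a zero-length flush satisfies neither. Because callers go through the predicates, a future request type that needs a copy would presumably only have to extend these two functions.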
Reviewed-by: Ziyang Zhang Signed-off-by: Ming Lei --- drivers/block/ublk_drv.c | 19 ++++++++++++++----- 1 file changed, 14 insertions(+), 5 deletions(-) diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c index c73b2dba25ce..f87597a7d679 100644 --- a/drivers/block/ublk_drv.c +++ b/drivers/block/ublk_drv.c @@ -488,8 +488,7 @@ static inline unsigned ublk_copy_io_pages(struct ublk_io_iter *data, return done; } -static inline int ublk_copy_user_pages(struct ublk_map_data *data, - bool to_vm) +static int ublk_copy_user_pages(struct ublk_map_data *data, bool to_vm) { const unsigned int gup_flags = to_vm ? FOLL_WRITE : 0; const unsigned long start_vm = data->io->addr; @@ -525,6 +524,16 @@ static inline int ublk_copy_user_pages(struct ublk_map_data *data, return done; } +static inline bool ublk_need_map_req(const struct request *req) +{ + return ublk_rq_has_data(req) && req_op(req) == REQ_OP_WRITE; +} + +static inline bool ublk_need_unmap_req(const struct request *req) +{ + return ublk_rq_has_data(req) && req_op(req) == REQ_OP_READ; +} + static int ublk_map_io(const struct ublk_queue *ubq, const struct request *req, struct ublk_io *io) { @@ -535,7 +544,7 @@ static int ublk_map_io(const struct ublk_queue *ubq, const struct request *req, * context and the big benefit is that pinning pages in current * context is pretty fast, see ublk_pin_user_pages */ - if (ublk_rq_has_data(req) && req_op(req) == REQ_OP_WRITE) { + if (ublk_need_map_req(req)) { struct ublk_map_data data = { .ubq = ubq, .rq = req, @@ -556,7 +565,7 @@ static int ublk_unmap_io(const struct ublk_queue *ubq, { const unsigned int rq_bytes = blk_rq_bytes(req); - if (req_op(req) == REQ_OP_READ && ublk_rq_has_data(req)) { + if (ublk_need_unmap_req(req)) { struct ublk_map_data data = { .ubq = ubq, .rq = req, @@ -772,7 +781,7 @@ static inline void __ublk_rq_task_work(struct request *req, return; } - if (ublk_need_get_data(ubq) && (req_op(req) == REQ_OP_WRITE)) { + if (ublk_need_get_data(ubq) && 
ublk_need_map_req(req)) { /* * We have not handled UBLK_IO_NEED_GET_DATA command yet, * so immepdately pass UBLK_IO_RES_NEED_GET_DATA to ublksrv From patchwork Tue Mar 28 15:09:51 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 76145 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp2314578vqo; Tue, 28 Mar 2023 08:41:54 -0700 (PDT) X-Google-Smtp-Source: AKy350aPGK6Q5UMELSQNL9AIFTFYdJhRktYXCHDWGxcJQtSjAg26pG/Uwq85SMc57lJCzezAxpvW X-Received: by 2002:a17:906:dc7:b0:92b:7e6a:bca0 with SMTP id p7-20020a1709060dc700b0092b7e6abca0mr15774390eji.14.1680018113935; Tue, 28 Mar 2023 08:41:53 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680018113; cv=none; d=google.com; s=arc-20160816; b=JuctzO9girejU5/tqAGD19vNwIEf7w/lg0sRlMotw9lzOJf6NcZgCULw0c4lpF3MMB e6VVsKkEwIzAX/SSWevTVsuHgIf4h/Rt/WdFnlwGFDRHa9d0CicuIOXUAzYPEnn1YbCQ c/8SyzR77x091HQ7t1QOuKyztZVfY+XkEv/j6b3/zP6Io9SYFo8iI2BMAvvcAsvLv/0/ gjIiaapGMtOBX/6MzOPvoDeFMV0qqe6RbqJBQegNgo75zRKX5oBtgEcJDb5HtgcHihBB jeoR8nLAsGRc9ddI50mIc6RX3Jw79OKQooUSLmre0aApzlTOLMlzMk4vvRkh+QdCXHJl /vbg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=Plah9oxK2vEYgPKSe8zMQk868IGL1eXPxQld6RUGB/s=; b=Te26Th/hEXihHzrQooKjCsgMVRtZpteKAB6xiXcmHmu659TKxwouyOd7iNcSGwbQBh nf+YrnMn+4CiqssaGZgt4Z5ZxfTnF320k9fdzlVK0wtLODxztgZww3c16OuB75fo4kJ0 yafX7gMcMBR7AF/Xghv+6v+lC/ySnDi6YsotERMeBmv/6RmqV/VOf5OfuGGmQ7bpTIyi yu/TLGZSi291HrLxUeI+pKIeF9v7fh+2zCWVR//Qqd2ZY8PDiPxsHtVR4SkWiZhiMJwV g6ksB1YUHrWpAC9Xne3C6sGPoVNsypeK+mXJCHrEz3UNvKYR38GUvx3AGdsW/otFcY3U tiZg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=aFe5grvN; spf=pass (google.com: domain of 
linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id m9-20020a17090607c900b008da063a965fsi29999901ejc.937.2023.03.28.08.41.29; Tue, 28 Mar 2023 08:41:53 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=aFe5grvN; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232156AbjC1PR7 (ORCPT + 99 others); Tue, 28 Mar 2023 11:17:59 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55098 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233944AbjC1PRi (ORCPT ); Tue, 28 Mar 2023 11:17:38 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8759310AA2 for ; Tue, 28 Mar 2023 08:16:06 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1680016500; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Plah9oxK2vEYgPKSe8zMQk868IGL1eXPxQld6RUGB/s=; b=aFe5grvNnlHnopoQloN5mHTBKZryNG+4I15mAezHN8C6K/NVcRNsdjzBCR9kESQ3D3k6RT 7yiH/Ipj2tF71S51z5PpDqMDdt2GegafLv/90InrRkMCArI/B1VTuTZuXhkbjZqDqbOrTG 
gma6Y2NcxZn2pg1MxfTbp50eTD9umEo= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-552-eIOBpHoaMdCVHdDeJnE0HA-1; Tue, 28 Mar 2023 11:11:34 -0400 X-MC-Unique: eIOBpHoaMdCVHdDeJnE0HA-1 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.rdu2.redhat.com [10.11.54.7]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 4E2A3100DEBF; Tue, 28 Mar 2023 15:11:33 +0000 (UTC) Received: from localhost (ovpn-8-20.pek2.redhat.com [10.72.8.20]) by smtp.corp.redhat.com (Postfix) with ESMTP id 7072214171C3; Tue, 28 Mar 2023 15:11:32 +0000 (UTC) From: Ming Lei To: Jens Axboe , io-uring@vger.kernel.org, linux-block@vger.kernel.org Cc: linux-kernel@vger.kernel.org, Miklos Szeredi , ZiyangZhang , Xiaoguang Wang , Bernd Schubert , Pavel Begunkov , Stefan Hajnoczi , Dan Williams , Ming Lei Subject: [PATCH V5 09/16] block: ublk_drv: clean up several helpers Date: Tue, 28 Mar 2023 23:09:51 +0800 Message-Id: <20230328150958.1253547-10-ming.lei@redhat.com> In-Reply-To: <20230328150958.1253547-1-ming.lei@redhat.com> References: <20230328150958.1253547-1-ming.lei@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.7 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1761626673827083186?= X-GMAIL-MSGID: =?utf-8?q?1761626673827083186?= Convert the following pattern in several helpers if (Z) return true return false 
into: return Z; Reviewed-by: Ziyang Zhang Signed-off-by: Ming Lei --- drivers/block/ublk_drv.c | 18 +++++------------- 1 file changed, 5 insertions(+), 13 deletions(-) diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c index f87597a7d679..1c057003a40a 100644 --- a/drivers/block/ublk_drv.c +++ b/drivers/block/ublk_drv.c @@ -298,9 +298,7 @@ static inline bool ublk_can_use_task_work(const struct ublk_queue *ubq) static inline bool ublk_need_get_data(const struct ublk_queue *ubq) { - if (ubq->flags & UBLK_F_NEED_GET_DATA) - return true; - return false; + return ubq->flags & UBLK_F_NEED_GET_DATA; } static struct ublk_device *ublk_get_device(struct ublk_device *ub) @@ -349,25 +347,19 @@ static inline int ublk_queue_cmd_buf_size(struct ublk_device *ub, int q_id) static inline bool ublk_queue_can_use_recovery_reissue( struct ublk_queue *ubq) { - if ((ubq->flags & UBLK_F_USER_RECOVERY) && - (ubq->flags & UBLK_F_USER_RECOVERY_REISSUE)) - return true; - return false; + return (ubq->flags & UBLK_F_USER_RECOVERY) && + (ubq->flags & UBLK_F_USER_RECOVERY_REISSUE); } static inline bool ublk_queue_can_use_recovery( struct ublk_queue *ubq) { - if (ubq->flags & UBLK_F_USER_RECOVERY) - return true; - return false; + return ubq->flags & UBLK_F_USER_RECOVERY; } static inline bool ublk_can_use_recovery(struct ublk_device *ub) { - if (ub->dev_info.flags & UBLK_F_USER_RECOVERY) - return true; - return false; + return ub->dev_info.flags & UBLK_F_USER_RECOVERY; } static void ublk_free_disk(struct gendisk *disk) From patchwork Tue Mar 28 15:09:52 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 76138 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp2312148vqo; Tue, 28 Mar 2023 08:37:52 -0700 (PDT) X-Google-Smtp-Source: AKy350b3i2k5MHcrDHVO3+zkm94QTBxc0HEMZtihTv5EPXPb3XHIj9Colh+GtbUg5vv9TSAvBa87 X-Received: by 
2002:aa7:9728:0:b0:627:ef23:1f95 with SMTP id k8-20020aa79728000000b00627ef231f95mr15317942pfg.31.1680017872331; Tue, 28 Mar 2023 08:37:52 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680017872; cv=none; d=google.com; s=arc-20160816; b=oLFGCgHtDGKeor66ZIL1KuDGg7yV/bZ6j6VpYMDqJvYc6qe5J8BM2MBTwd1eCRipmM 3VtMXtuhWQEOln6+oYGZ2AUo+fuHqRGuhgBvI1GgIQb87YRY/GZkmla93usI18rxh4Eg DtBQyr9vmbtp1fqZuzeWaj7lPCc+rAiUQq0F72mzvdmEFo1XHtSTSDEH1z88u6aU4T3r GeD6Bp0gdklv26sFgyltdNZGTSnwl+lRMNWk1LnExx9Jzfs1YqeL28ngLTss0ice0ua5 +eWfl7TxQ31d4kVAwZYHnCexuYprMFyTPsGlnM/TNHbRniJdcCPm9GG2mntHQEzZIn9Q 1ZDw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=aRW9a0ZHd+MGeZHqXvtwA5Cp/cxaEC4LxuKjGisTMi4=; b=feiq8NawCGNBIH1ZHWm8Z8UKML+62dpCQIKufW0vc6ajy0Gs+3BgjzfpoDWlSnW23x RKwV1OxVL7LQzQaICCQ4L8WpCFv3Bh/lYZM9RnGfgud0BMumRMeXF1877arYHD+xy9uo oDyiz5bcBG6Omy746Y9B6+dqPYa++7VDg3oQZb9S/eXjSDz1j0HPwuOTnWtUSjEbTqIW TWW4Mr4LYCr/j20xwzKxTJPk6pBEstpV9LXVWqY95jerpcVNDHmJI9BcO3I9RO342FYq j5m2/JCNhccrzrEg77DnECTwRcxETPSOqRSn1DVHmbQNSLTBS8dXX3UMV1OxAC9L5ljZ jnzQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=X1kCH05n; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. 
[2620:137:e000::1:20]) by mx.google.com with ESMTP id d3-20020a056a00244300b005a8b856ad47si30000899pfj.7.2023.03.28.08.37.39; Tue, 28 Mar 2023 08:37:52 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=X1kCH05n; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233366AbjC1PVe (ORCPT + 99 others); Tue, 28 Mar 2023 11:21:34 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38352 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233293AbjC1PVN (ORCPT ); Tue, 28 Mar 2023 11:21:13 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9F6761166A for ; Tue, 28 Mar 2023 08:18:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1680016681; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=aRW9a0ZHd+MGeZHqXvtwA5Cp/cxaEC4LxuKjGisTMi4=; b=X1kCH05nST6X0m/t4j/0gMwAYi7A41WQWlEFgRHQfuUxBC5cG8+Qvg0RRGDpJoUMGFIQl3 XncI6gxozjvf231AXoHK59MgosrXmYRa7Rvfs07zBDy7/G+449x7m4A3KR5Zsyemaemfxt OuOW6wcPfJPvY0KHG930XCNs5NN08yg= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 
From: Ming Lei
To: Jens Axboe, io-uring@vger.kernel.org, linux-block@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, Miklos Szeredi, ZiyangZhang, Xiaoguang Wang, Bernd Schubert, Pavel Begunkov, Stefan Hajnoczi, Dan Williams, Ming Lei
Subject: [PATCH V5 10/16] block: ublk_drv: cleanup 'struct ublk_map_data'
Date: Tue, 28 Mar 2023 23:09:52 +0800
Message-Id: <20230328150958.1253547-11-ming.lei@redhat.com>
In-Reply-To: <20230328150958.1253547-1-ming.lei@redhat.com>
References: <20230328150958.1253547-1-ming.lei@redhat.com>

'struct ublk_map_data' is passed to ublk_copy_user_pages() for copying
data between the userspace buffer and the request pages.

What matters here is the userspace buffer address/len and 'struct request',
so replace the ->io field with the user buffer address, and rename
max_bytes to len.

Meanwhile, remove the 'ubq' field from ublk_map_data, since it isn't used
any more.

The code then becomes more readable.

Reviewed-by: Ziyang Zhang
Signed-off-by: Ming Lei
---
 drivers/block/ublk_drv.c | 27 ++++++++++++---------------
 1 file changed, 12 insertions(+), 15 deletions(-)

diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index 1c057003a40a..fdccbf5fdaa1 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -420,10 +420,9 @@ static const struct block_device_operations ub_fops = {
 #define UBLK_MAX_PIN_PAGES 32

 struct ublk_map_data {
-	const struct ublk_queue *ubq;
 	const struct request *rq;
-	const struct ublk_io *io;
-	unsigned max_bytes;
+	unsigned long ubuf;
+	unsigned int len;
 };

 struct ublk_io_iter {
@@ -483,14 +482,14 @@ static inline unsigned ublk_copy_io_pages(struct ublk_io_iter *data,
 static int ublk_copy_user_pages(struct ublk_map_data *data, bool to_vm)
 {
 	const unsigned int gup_flags = to_vm ? FOLL_WRITE : 0;
-	const unsigned long start_vm = data->io->addr;
+	const unsigned long start_vm = data->ubuf;
 	unsigned int done = 0;
 	struct ublk_io_iter iter = {
 		.pg_off = start_vm & (PAGE_SIZE - 1),
 		.bio = data->rq->bio,
 		.iter = data->rq->bio->bi_iter,
 	};
-	const unsigned int nr_pages = round_up(data->max_bytes +
+	const unsigned int nr_pages = round_up(data->len +
 			(start_vm & (PAGE_SIZE - 1)), PAGE_SIZE) >> PAGE_SHIFT;

 	while (done < nr_pages) {
@@ -503,13 +502,13 @@ static int ublk_copy_user_pages(struct ublk_map_data *data, bool to_vm)
 				iter.pages);
 		if (iter.nr_pages <= 0)
 			return done == 0 ? iter.nr_pages : done;
-		len = ublk_copy_io_pages(&iter, data->max_bytes, to_vm);
+		len = ublk_copy_io_pages(&iter, data->len, to_vm);
 		for (i = 0; i < iter.nr_pages; i++) {
 			if (to_vm)
 				set_page_dirty(iter.pages[i]);
 			put_page(iter.pages[i]);
 		}
-		data->max_bytes -= len;
+		data->len -= len;
 		done += iter.nr_pages;
 	}

@@ -538,15 +537,14 @@ static int ublk_map_io(const struct ublk_queue *ubq, const struct request *req,
 	 */
 	if (ublk_need_map_req(req)) {
 		struct ublk_map_data data = {
-			.ubq = ubq,
 			.rq = req,
-			.io = io,
-			.max_bytes = rq_bytes,
+			.ubuf = io->addr,
+			.len = rq_bytes,
 		};

 		ublk_copy_user_pages(&data, true);

-		return rq_bytes - data.max_bytes;
+		return rq_bytes - data.len;
 	}
 	return rq_bytes;
 }
@@ -559,17 +557,16 @@ static int ublk_unmap_io(const struct ublk_queue *ubq,

 	if (ublk_need_unmap_req(req)) {
 		struct ublk_map_data data = {
-			.ubq = ubq,
 			.rq = req,
-			.io = io,
-			.max_bytes = io->res,
+			.ubuf = io->addr,
+			.len = io->res,
 		};

 		WARN_ON_ONCE(io->res > rq_bytes);

 		ublk_copy_user_pages(&data, false);

-		return io->res - data.max_bytes;
+		return io->res - data.len;
 	}
 	return rq_bytes;
 }

From patchwork Tue Mar 28 15:09:53 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ming Lei
X-Patchwork-Id: 76168
From: Ming Lei
To: Jens Axboe, io-uring@vger.kernel.org, linux-block@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, Miklos Szeredi, ZiyangZhang, Xiaoguang Wang, Bernd Schubert, Pavel Begunkov, Stefan Hajnoczi, Dan Williams, Ming Lei
Subject: [PATCH V5 11/16] block: ublk_drv: cleanup ublk_copy_user_pages
Date: Tue, 28 Mar 2023 23:09:53 +0800
Message-Id: <20230328150958.1253547-12-ming.lei@redhat.com>
In-Reply-To: <20230328150958.1253547-1-ming.lei@redhat.com>
References: <20230328150958.1253547-1-ming.lei@redhat.com>

Clean up ublk_copy_user_pages() by using iov_iter; the code gets
simplified a lot and becomes much more readable than before.

Signed-off-by: Ming Lei
---
 drivers/block/ublk_drv.c | 112 +++++++++++++++++----------------------
 1 file changed, 49 insertions(+), 63 deletions(-)

diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index fdccbf5fdaa1..cca0e95a89d8 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -419,49 +419,39 @@ static const struct block_device_operations ub_fops = {

 #define UBLK_MAX_PIN_PAGES 32

-struct ublk_map_data {
-	const struct request *rq;
-	unsigned long ubuf;
-	unsigned int len;
-};
-
 struct ublk_io_iter {
 	struct page *pages[UBLK_MAX_PIN_PAGES];
-	unsigned pg_off; /* offset in the 1st page in pages */
-	int nr_pages; /* how many page pointers in pages */
 	struct bio *bio;
 	struct bvec_iter iter;
 };

-static inline unsigned ublk_copy_io_pages(struct ublk_io_iter *data,
-		unsigned max_bytes, bool to_vm)
+/* return how many pages are copied */
+static void ublk_copy_io_pages(struct ublk_io_iter *data,
+		size_t total, size_t pg_off, int dir)
 {
-	const unsigned total = min_t(unsigned, max_bytes,
-			PAGE_SIZE - data->pg_off +
-			((data->nr_pages - 1) << PAGE_SHIFT));
 	unsigned done = 0;
 	unsigned pg_idx = 0;

 	while (done < total) {
 		struct bio_vec bv = bio_iter_iovec(data->bio, data->iter);
-		const unsigned int bytes = min3(bv.bv_len, total - done,
-				(unsigned)(PAGE_SIZE - data->pg_off));
+		unsigned int bytes = min3(bv.bv_len, (unsigned)total - done,
+				(unsigned)(PAGE_SIZE - pg_off));
 		void *bv_buf = bvec_kmap_local(&bv);
 		void *pg_buf = kmap_local_page(data->pages[pg_idx]);

-		if (to_vm)
-			memcpy(pg_buf + data->pg_off, bv_buf, bytes);
+		if (dir == ITER_DEST)
+			memcpy(pg_buf + pg_off, bv_buf, bytes);
 		else
-			memcpy(bv_buf, pg_buf + data->pg_off, bytes);
+			memcpy(bv_buf, pg_buf + pg_off, bytes);

 		kunmap_local(pg_buf);
 		kunmap_local(bv_buf);

 		/* advance page array */
-		data->pg_off += bytes;
-		if (data->pg_off == PAGE_SIZE) {
+		pg_off += bytes;
+		if (pg_off == PAGE_SIZE) {
 			pg_idx += 1;
-			data->pg_off = 0;
+			pg_off = 0;
 		}

 		done += bytes;
@@ -475,41 +465,40 @@ static inline unsigned ublk_copy_io_pages(struct ublk_io_iter *data,
 			data->iter = data->bio->bi_iter;
 		}
 	}
-
-	return done;
 }

-static int ublk_copy_user_pages(struct ublk_map_data *data, bool to_vm)
+/*
+ * Copy data between request pages and io_iter, and 'offset'
+ * is the start point of linear offset of request.
+ */
+static size_t ublk_copy_user_pages(const struct request *req,
+		struct iov_iter *uiter, int dir)
 {
-	const unsigned int gup_flags = to_vm ? FOLL_WRITE : 0;
-	const unsigned long start_vm = data->ubuf;
-	unsigned int done = 0;
 	struct ublk_io_iter iter = {
-		.pg_off = start_vm & (PAGE_SIZE - 1),
-		.bio = data->rq->bio,
-		.iter = data->rq->bio->bi_iter,
+		.bio = req->bio,
+		.iter = req->bio->bi_iter,
 	};
-	const unsigned int nr_pages = round_up(data->len +
-			(start_vm & (PAGE_SIZE - 1)), PAGE_SIZE) >> PAGE_SHIFT;
-
-	while (done < nr_pages) {
-		const unsigned to_pin = min_t(unsigned, UBLK_MAX_PIN_PAGES,
-				nr_pages - done);
-		unsigned i, len;
-
-		iter.nr_pages = get_user_pages_fast(start_vm +
-				(done << PAGE_SHIFT), to_pin, gup_flags,
-				iter.pages);
-		if (iter.nr_pages <= 0)
-			return done == 0 ? iter.nr_pages : done;
-		len = ublk_copy_io_pages(&iter, data->len, to_vm);
-		for (i = 0; i < iter.nr_pages; i++) {
-			if (to_vm)
+	size_t done = 0;
+
+	while (iov_iter_count(uiter) && iter.bio) {
+		unsigned nr_pages;
+		size_t len, off;
+		int i;
+
+		len = iov_iter_get_pages2(uiter, iter.pages,
+				iov_iter_count(uiter),
+				UBLK_MAX_PIN_PAGES, &off);
+		if (len <= 0)
+			return done;
+
+		ublk_copy_io_pages(&iter, len, off, dir);
+		nr_pages = DIV_ROUND_UP(len + off, PAGE_SIZE);
+		for (i = 0; i < nr_pages; i++) {
+			if (dir == ITER_DEST)
 				set_page_dirty(iter.pages[i]);
 			put_page(iter.pages[i]);
 		}
-		data->len -= len;
-		done += iter.nr_pages;
+		done += len;
 	}

 	return done;
@@ -536,15 +525,14 @@ static int ublk_map_io(const struct ublk_queue *ubq, const struct request *req,
 	 * context is pretty fast, see ublk_pin_user_pages
 	 */
 	if (ublk_need_map_req(req)) {
-		struct ublk_map_data data = {
-			.rq = req,
-			.ubuf = io->addr,
-			.len = rq_bytes,
-		};
+		struct iov_iter iter;
+		struct iovec iov;
+		const int dir = ITER_DEST;

-		ublk_copy_user_pages(&data, true);
+		import_single_range(dir, u64_to_user_ptr(io->addr), rq_bytes,
+				&iov, &iter);

-		return rq_bytes - data.len;
+		return ublk_copy_user_pages(req, &iter, dir);
 	}
 	return rq_bytes;
 }
@@ -556,17 +544,15 @@ static int ublk_unmap_io(const struct ublk_queue *ubq,
 	const unsigned int rq_bytes = blk_rq_bytes(req);

 	if (ublk_need_unmap_req(req)) {
-		struct ublk_map_data data = {
-			.rq = req,
-			.ubuf = io->addr,
-			.len = io->res,
-		};
+		struct iov_iter iter;
+		struct iovec iov;
+		const int dir = ITER_SOURCE;

 		WARN_ON_ONCE(io->res > rq_bytes);

-		ublk_copy_user_pages(&data, false);
-
-		return io->res - data.len;
+		import_single_range(dir, u64_to_user_ptr(io->addr), io->res,
+				&iov, &iter);
+		return ublk_copy_user_pages(req, &iter, dir);
 	}
 	return rq_bytes;
 }

From patchwork Tue Mar 28 15:09:54 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ming Lei
X-Patchwork-Id: 76140
From: Ming Lei
To: Jens Axboe, io-uring@vger.kernel.org, linux-block@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, Miklos Szeredi, ZiyangZhang, Xiaoguang Wang, Bernd Schubert, Pavel Begunkov, Stefan Hajnoczi, Dan Williams, Ming Lei
Subject: [PATCH V5 12/16] block: ublk_drv: grab request reference when the request is handled by userspace
Date: Tue, 28 Mar 2023 23:09:54 +0800
Message-Id: <20230328150958.1253547-13-ming.lei@redhat.com>
In-Reply-To: <20230328150958.1253547-1-ming.lei@redhat.com>
References: <20230328150958.1253547-1-ming.lei@redhat.com>

Add a reference counter to the request pdu data, and hold this reference
for the request's lifetime. This way is always safe. In theory, the ublk
request won't be completed until fused commands are done. However, this is
userspace, and an application can submit fused commands at will.

Prepare for supporting zero copy, which needs to retrieve the request
buffer by fused command, so we have to guarantee:

- the fused command can't succeed if the request isn't queued

- when any fused command is successful, this request can't be freed
until all fused commands on this request are done.

Signed-off-by: Ming Lei
---
 drivers/block/ublk_drv.c | 67 ++++++++++++++++++++++++++++++++++++++--
 1 file changed, 64 insertions(+), 3 deletions(-)

diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index cca0e95a89d8..0dc8eb04b9a5 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -43,6 +43,7 @@
 #include
 #include
 #include
+#include
 #include

 #define UBLK_MINORS (1U << MINORBITS)
@@ -62,6 +63,17 @@
 struct ublk_rq_data {
 	struct llist_node node;
 	struct callback_head work;
+
+	/*
+	 * Only for applying fused command to support zero copy:
+	 *
+	 * - if there is any fused command aiming at this request, not complete
+	 *   request until all fused commands are done
+	 *
+	 * - fused command has to fail unless this reference is grabbed
+	 *   successfully
+	 */
+	struct kref ref;
 };

 struct ublk_uring_cmd_pdu {
@@ -180,6 +192,9 @@ struct ublk_params_header {
 	__u32 types;
 };

+static inline void __ublk_complete_rq(struct request *req);
+static void ublk_complete_rq(struct kref *ref);
+
 static dev_t ublk_chr_devt;
 static struct class *ublk_chr_class;

@@ -288,6 +303,35 @@ static int ublk_apply_params(struct ublk_device *ub)
 	return 0;
 }

+static inline bool ublk_support_zc(const struct ublk_queue *ubq)
+{
+	return ubq->flags & UBLK_F_SUPPORT_ZERO_COPY;
+}
+
+static inline bool ublk_get_req_ref(const struct ublk_queue *ubq,
+		struct request *req)
+{
+	if (ublk_support_zc(ubq)) {
+		struct ublk_rq_data *data = blk_mq_rq_to_pdu(req);
+
+		return kref_get_unless_zero(&data->ref);
+	}
+
+	return true;
+}
+
+static inline void ublk_put_req_ref(const struct ublk_queue *ubq,
+		struct request *req)
+{
+	if (ublk_support_zc(ubq)) {
+		struct ublk_rq_data *data = blk_mq_rq_to_pdu(req);
+
+		kref_put(&data->ref, ublk_complete_rq);
+	} else {
+		__ublk_complete_rq(req);
+	}
+}
+
 static inline bool ublk_can_use_task_work(const struct ublk_queue *ubq)
 {
 	if (IS_BUILTIN(CONFIG_BLK_DEV_UBLK) &&
@@ -632,13 +676,19 @@ static inline bool ubq_daemon_is_dying(struct ublk_queue *ubq)
 }

 /* todo: handle partial completion */
-static void ublk_complete_rq(struct request *req)
+static inline void __ublk_complete_rq(struct request *req)
 {
 	struct ublk_queue *ubq = req->mq_hctx->driver_data;
 	struct ublk_io *io = &ubq->ios[req->tag];
 	unsigned int unmapped_bytes;
 	blk_status_t res = BLK_STS_OK;

+	/* called from ublk_abort_queue() code path */
+	if (io->flags & UBLK_IO_FLAG_ABORTED) {
+		res = BLK_STS_IOERR;
+		goto exit;
+	}
+
 	/* failed read IO if nothing is read */
 	if (!io->res && req_op(req) == REQ_OP_READ)
 		io->res = -EIO;
@@ -678,6 +728,15 @@ static void ublk_complete_rq(struct request *req)
 	blk_mq_end_request(req, res);
 }

+static void ublk_complete_rq(struct kref *ref)
+{
+	struct ublk_rq_data *data = container_of(ref, struct ublk_rq_data,
+			ref);
+	struct request *req = blk_mq_rq_from_pdu(data);
+
+	__ublk_complete_rq(req);
+}
+
 /*
  * Since __ublk_rq_task_work always fails requests immediately during
  * exiting, __ublk_fail_req() is only called from abort context during
@@ -696,7 +755,7 @@ static void __ublk_fail_req(struct ublk_queue *ubq, struct ublk_io *io,
 		if (ublk_queue_can_use_recovery_reissue(ubq))
 			blk_mq_requeue_request(req, false);
 		else
-			blk_mq_end_request(req, BLK_STS_IOERR);
+			ublk_put_req_ref(ubq, req);
 	}
 }

@@ -734,6 +793,7 @@ static inline void __ublk_rq_task_work(struct request *req,
 		unsigned issue_flags)
 {
 	struct ublk_queue *ubq = req->mq_hctx->driver_data;
+	struct ublk_rq_data *data = blk_mq_rq_to_pdu(req);
 	int tag = req->tag;
 	struct ublk_io *io = &ubq->ios[tag];
 	unsigned int mapped_bytes;
@@ -805,6 +865,7 @@ static inline void __ublk_rq_task_work(struct request *req,
 			mapped_bytes >> 9;
 	}

+	kref_init(&data->ref);
 	ubq_complete_io_cmd(io, UBLK_IO_RES_OK, issue_flags);
 }

@@ -1017,7 +1078,7 @@ static void ublk_commit_completion(struct ublk_device *ub,
 	req = blk_mq_tag_to_rq(ub->tag_set.tags[qid], tag);

 	if (req && likely(!blk_should_fake_timeout(req->q)))
-		ublk_complete_rq(req);
+		ublk_put_req_ref(ubq, req);
 }

 /*

From patchwork Tue Mar 28 15:09:55 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ming Lei
X-Patchwork-Id: 76142
From: Ming Lei
To: Jens Axboe, io-uring@vger.kernel.org, linux-block@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, Miklos Szeredi, ZiyangZhang, Xiaoguang Wang, Bernd Schubert, Pavel Begunkov, Stefan Hajnoczi, Dan Williams, Ming Lei
Subject: [PATCH V5 13/16] block: ublk_drv: support to copy any part of request pages
Date: Tue, 28 Mar 2023 23:09:55 +0800
Message-Id: <20230328150958.1253547-14-ming.lei@redhat.com>
In-Reply-To: <20230328150958.1253547-1-ming.lei@redhat.com>
References: <20230328150958.1253547-1-ming.lei@redhat.com>

Add an 'offset' parameter to ublk_copy_user_pages(), so that it can be
used to copy any (linearly mapped) sub-buffer of the request.

Signed-off-by: Ming Lei
---
 drivers/block/ublk_drv.c | 31 ++++++++++++++++++++++++-------
 1 file changed, 24 insertions(+), 7 deletions(-)

diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index 0dc8eb04b9a5..32304942ab87 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -511,19 +511,36 @@ static void ublk_copy_io_pages(struct ublk_io_iter *data,
 	}
 }

+static bool ublk_advance_io_iter(const struct request *req,
+		struct ublk_io_iter *iter, unsigned int offset)
+{
+	struct bio *bio = req->bio;
+
+	for_each_bio(bio) {
+		if (bio->bi_iter.bi_size > offset) {
+			iter->bio = bio;
+			iter->iter = bio->bi_iter;
+			bio_advance_iter(iter->bio, &iter->iter, offset);
+			return true;
+		}
+		offset -= bio->bi_iter.bi_size;
+	}
+	return false;
+}
+
 /*
  * Copy data between request pages and io_iter, and 'offset'
  * is the start point of linear offset of request.
  */
 static size_t ublk_copy_user_pages(const struct request *req,
-		struct iov_iter *uiter, int dir)
+		unsigned offset, struct iov_iter *uiter, int dir)
 {
-	struct ublk_io_iter iter = {
-		.bio = req->bio,
-		.iter = req->bio->bi_iter,
-	};
+	struct ublk_io_iter iter;
 	size_t done = 0;

+	if (!ublk_advance_io_iter(req, &iter, offset))
+		return 0;
+
 	while (iov_iter_count(uiter) && iter.bio) {
 		unsigned nr_pages;
 		size_t len, off;
@@ -576,7 +593,7 @@ static int ublk_map_io(const struct ublk_queue *ubq, const struct request *req,
 		import_single_range(dir, u64_to_user_ptr(io->addr), rq_bytes,
 				&iov, &iter);

-		return ublk_copy_user_pages(req, &iter, dir);
+		return ublk_copy_user_pages(req, 0, &iter, dir);
 	}
 	return rq_bytes;
 }
@@ -596,7 +613,7 @@ static int ublk_unmap_io(const struct ublk_queue *ubq,

 		import_single_range(dir, u64_to_user_ptr(io->addr), io->res,
 				&iov, &iter);
-		return ublk_copy_user_pages(req, &iter, dir);
+		return ublk_copy_user_pages(req, 0, &iter, dir);
 	}
 	return rq_bytes;
 }

From patchwork Tue Mar 28 15:09:56 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ming Lei
X-Patchwork-Id: 76172
6dfcoR5IWLzhhBb78VQpUMumB3+3gL4hnUNpkq+hnaVZfaQeD902A4sHRkqi51MonFRF O5xw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=epgMYPHiL/drcoyXP4r134WY88eox9+1YtvoGLQRluU=; b=0iS2Ec8D60+MO0EaRn2B26l9QNhDNaLusHLcuOeRHjS1DayT1Oi9a8mJ2Bg0UUccUW WoGtIeszJx5jzFgq0PT2I6wwMiozcZnmIYAop7JKo23loYl/HQfYDBpCtbcQr63n8N1s iJvVk0ZhzMnFryApZGSaXINzkbMiVg5poERR2RFLHL24tkTcCtArjG2Ci8hoyZWc0DLf UlTUsoqVpGSiA+D+a5CkBRghpKS3cFkb2NxZ/PB6jYvSaZTHuv1qHxiM6sVtJBQ++OyK 9oFHqi2rqFY3i8F+F4t4x0LjHoo8+ei+Ber62QgpWxoVYmDVZeYa7rcvASUisW4n+WdF 4R3Q== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=NHRo+SqK; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. 
[2620:137:e000::1:20]) by mx.google.com with ESMTP id cw20-20020a170906c79400b0092cf025c703si30219081ejb.928.2023.03.28.09.06.00; Tue, 28 Mar 2023 09:06:26 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=NHRo+SqK; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231779AbjC1PeO (ORCPT + 99 others); Tue, 28 Mar 2023 11:34:14 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34760 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229861AbjC1PeM (ORCPT ); Tue, 28 Mar 2023 11:34:12 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D080B1FF5 for ; Tue, 28 Mar 2023 08:33:25 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1680017605; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=epgMYPHiL/drcoyXP4r134WY88eox9+1YtvoGLQRluU=; b=NHRo+SqKYs7tcQB61y0LncFLLval0cSTfgj87QfQWXwo5plf9LO0pT13/xyWC1FJKgb616 8So44BrUrTpwnu0hYf19b9b/CqsnVSznvxG+ob+mrIgFsnIlIYRfmgmc0qh6pc+bxljw5r 0LTucTImXcsnhIiYFoPeZgHI/ZfAxRU= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 
us-mta-35-NUuhNtO5O_SR8Ng9fEVPbQ-1; Tue, 28 Mar 2023 11:11:54 -0400 X-MC-Unique: NUuhNtO5O_SR8Ng9fEVPbQ-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.rdu2.redhat.com [10.11.54.2]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 6211B887403; Tue, 28 Mar 2023 15:11:53 +0000 (UTC) Received: from localhost (ovpn-8-20.pek2.redhat.com [10.72.8.20]) by smtp.corp.redhat.com (Postfix) with ESMTP id 57D7E4020C82; Tue, 28 Mar 2023 15:11:51 +0000 (UTC) From: Ming Lei To: Jens Axboe , io-uring@vger.kernel.org, linux-block@vger.kernel.org Cc: linux-kernel@vger.kernel.org, Miklos Szeredi , ZiyangZhang , Xiaoguang Wang , Bernd Schubert , Pavel Begunkov , Stefan Hajnoczi , Dan Williams , Ming Lei Subject: [PATCH V5 14/16] block: ublk_drv: add read()/write() support for ublk char device Date: Tue, 28 Mar 2023 23:09:56 +0800 Message-Id: <20230328150958.1253547-15-ming.lei@redhat.com> In-Reply-To: <20230328150958.1253547-1-ming.lei@redhat.com> References: <20230328150958.1253547-1-ming.lei@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.2 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1761628217979489877?= X-GMAIL-MSGID: =?utf-8?q?1761628217979489877?= We are going to support zero copy by fused uring command, the userspace can't read from or write to the io buffer any more, it becomes not flexible for applications: 1) some targets need to zero buffer explicitly, such as when reading unmapped qcow2 cluster 2) some targets need to support 
passthrough command, such as zoned report zones, and still need to read/write the io buffer Support pread()/pwrite() on ublk char device for reading/writing request io buffer, so ublk server can handle the above cases easily. This also can help to make zero copy becoming the primary option, and non-zero-copy will become legacy code path since the added read()/write() can cover non-zero-copy feature. Signed-off-by: Ming Lei --- drivers/block/ublk_drv.c | 131 ++++++++++++++++++++++++++++++++++ include/uapi/linux/ublk_cmd.h | 31 +++++++- 2 files changed, 161 insertions(+), 1 deletion(-) diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c index 32304942ab87..03ad33686808 100644 --- a/drivers/block/ublk_drv.c +++ b/drivers/block/ublk_drv.c @@ -1322,6 +1322,36 @@ static void ublk_handle_need_get_data(struct ublk_device *ub, int q_id, ublk_queue_cmd(ubq, req); } +static inline struct request *__ublk_check_and_get_req(struct ublk_device *ub, + struct ublk_queue *ubq, int tag, size_t offset) +{ + struct request *req; + + if (!ublk_support_zc(ubq)) + return NULL; + + req = blk_mq_tag_to_rq(ub->tag_set.tags[ubq->q_id], tag); + if (!req) + return NULL; + + if (!ublk_get_req_ref(ubq, req)) + return NULL; + + if (unlikely(!blk_mq_request_started(req) || req->tag != tag)) + goto fail_put; + + if (!ublk_rq_has_data(req)) + goto fail_put; + + if (offset > blk_rq_bytes(req)) + goto fail_put; + + return req; +fail_put: + ublk_put_req_ref(ubq, req); + return NULL; +} + static int ublk_ch_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags) { struct ublksrv_io_cmd *ub_cmd = (struct ublksrv_io_cmd *)cmd->cmd; @@ -1423,11 +1453,112 @@ static int ublk_ch_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags) return -EIOCBQUEUED; } +static inline bool ublk_check_ubuf_dir(const struct request *req, + int ubuf_dir) +{ + /* copy ubuf to request pages */ + if (req_op(req) == REQ_OP_READ && ubuf_dir == ITER_SOURCE) + return true; + + /* copy request pages to ubuf */ 
+ if (req_op(req) == REQ_OP_WRITE && ubuf_dir == ITER_DEST) + return true; + + return false; +} + +static struct request *ublk_check_and_get_req(struct kiocb *iocb, + struct iov_iter *iter, size_t *off, int dir) +{ + struct ublk_device *ub = iocb->ki_filp->private_data; + struct ublk_queue *ubq; + struct request *req; + size_t buf_off; + u16 tag, q_id; + + if (!ub) + return ERR_PTR(-EACCES); + + if (!user_backed_iter(iter)) + return ERR_PTR(-EACCES); + + if (ub->dev_info.state == UBLK_S_DEV_DEAD) + return ERR_PTR(-EACCES); + + tag = ublk_pos_to_tag(iocb->ki_pos); + q_id = ublk_pos_to_hwq(iocb->ki_pos); + buf_off = ublk_pos_to_buf_offset(iocb->ki_pos); + + if (q_id >= ub->dev_info.nr_hw_queues) + return ERR_PTR(-EINVAL); + + ubq = ublk_get_queue(ub, q_id); + if (!ubq) + return ERR_PTR(-EINVAL); + + if (tag >= ubq->q_depth) + return ERR_PTR(-EINVAL); + + req = __ublk_check_and_get_req(ub, ubq, tag, buf_off); + if (!req) + return ERR_PTR(-EINVAL); + + if (!req->mq_hctx || !req->mq_hctx->driver_data) + goto fail; + + if (!ublk_check_ubuf_dir(req, dir)) + goto fail; + + *off = buf_off; + return req; +fail: + ublk_put_req_ref(ubq, req); + return ERR_PTR(-EACCES); +} + +static ssize_t ublk_ch_read_iter(struct kiocb *iocb, struct iov_iter *to) +{ + struct ublk_queue *ubq; + struct request *req; + size_t buf_off; + size_t ret; + + req = ublk_check_and_get_req(iocb, to, &buf_off, ITER_DEST); + if (unlikely(IS_ERR(req))) + return PTR_ERR(req); + + ret = ublk_copy_user_pages(req, buf_off, to, ITER_DEST); + ubq = req->mq_hctx->driver_data; + ublk_put_req_ref(ubq, req); + + return ret; +} + +static ssize_t ublk_ch_write_iter(struct kiocb *iocb, struct iov_iter *from) +{ + struct ublk_queue *ubq; + struct request *req; + size_t buf_off; + size_t ret; + + req = ublk_check_and_get_req(iocb, from, &buf_off, ITER_SOURCE); + if (unlikely(IS_ERR(req))) + return PTR_ERR(req); + + ret = ublk_copy_user_pages(req, buf_off, from, ITER_SOURCE); + ubq = req->mq_hctx->driver_data; + 
ublk_put_req_ref(ubq, req); + + return ret; +} + static const struct file_operations ublk_ch_fops = { .owner = THIS_MODULE, .open = ublk_ch_open, .release = ublk_ch_release, .llseek = no_llseek, + .read_iter = ublk_ch_read_iter, + .write_iter = ublk_ch_write_iter, .uring_cmd = ublk_ch_uring_cmd, .mmap = ublk_ch_mmap, }; diff --git a/include/uapi/linux/ublk_cmd.h b/include/uapi/linux/ublk_cmd.h index f6238ccc7800..d1a6b3dc0327 100644 --- a/include/uapi/linux/ublk_cmd.h +++ b/include/uapi/linux/ublk_cmd.h @@ -54,7 +54,36 @@ #define UBLKSRV_IO_BUF_OFFSET 0x80000000 /* tag bit is 12bit, so at most 4096 IOs for each queue */ -#define UBLK_MAX_QUEUE_DEPTH 4096 +#define UBLK_TAG_BITS 12 +#define UBLK_MAX_QUEUE_DEPTH (1U << UBLK_TAG_BITS) + +/* used for locating each io buffer for pread()/pwrite() on char device */ +#define UBLK_BUFS_SIZE_BITS 42 +#define UBLK_BUFS_SIZE_MASK ((1ULL << UBLK_BUFS_SIZE_BITS) - 1) +#define UBLK_BUF_SIZE_BITS (UBLK_BUFS_SIZE_BITS - UBLK_TAG_BITS) +#define UBLK_BUF_MAX_SIZE (1ULL << UBLK_BUF_SIZE_BITS) + +static inline __u16 ublk_pos_to_hwq(__u64 pos) +{ + return pos >> UBLK_BUFS_SIZE_BITS; +} + +static inline __u32 ublk_pos_to_buf_offset(__u64 pos) +{ + return (pos & UBLK_BUFS_SIZE_MASK) & (UBLK_BUF_MAX_SIZE - 1); +} + +static inline __u16 ublk_pos_to_tag(__u64 pos) +{ + return (pos & UBLK_BUFS_SIZE_MASK) >> UBLK_BUF_SIZE_BITS; +} + +/* offset of single buffer, which has to be < UBLK_BUF_MAX_SIZE */ +static inline __u64 ublk_pos(__u16 q_id, __u16 tag, __u32 offset) +{ + return (((__u64)q_id) << UBLK_BUFS_SIZE_BITS) | + ((((__u64)tag) << UBLK_BUF_SIZE_BITS) + offset); +} /* * zero copy requires 4k block size, and can remap ublk driver's io From patchwork Tue Mar 28 15:09:57 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 76154 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp2315675vqo;
Tue, 28 Mar 2023 08:43:37 -0700 (PDT) X-Google-Smtp-Source: AKy350YIGTE8nYEslhWLpjKnZTex7Q56UY/UTk+mORZb/bZ4iJznKWwEKCmqdd+xMBVFoA/O0nbZ X-Received: by 2002:a17:906:a84c:b0:92e:3b80:9841 with SMTP id dx12-20020a170906a84c00b0092e3b809841mr14600332ejb.42.1680018217651; Tue, 28 Mar 2023 08:43:37 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680018217; cv=none; d=google.com; s=arc-20160816; b=X49hbNZ0PCPvY+0J5XDCJNIcZsQR/Ut3uWfrHzQ8nJlo7hNi1CKes/PWp5Fc/jUBpB IyAKeu0nVX/tQNw5F/eIO0vfuQDqS6U+30/Gns3heW9WGuovPRrrrm9LpS3n9dv6Y9NO punsvJ2ZavAK7FEU8pBGTENmqY6Z5hqmvtKJCDMCiABI8sL8AvSwLy33UWc5kTNNviMa twsUfx6Zr8C0rvgEzHZOTX4LXIJZSfRHNotx/aGHLmn123u+J9KFx8TLqTk7JjW8HO8p nRPiJz4OydYKW77CfFV8op22JPzIRjGWNEszqIqU0ZUeVD6H1EoRs9kNbXjr/PW9Hpzi spxQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=a+LP/eGObxESjSZtnTg8CVLgYRu7HZvCpWjJZDO48e4=; b=Z/2PWDSZNrfTa7RaQacrqS1KALECda9WvXptZXa6a3g3CeOfl1KECC0i5wllr6MKZE g2mt8TOtKqSqwPAuPzKbq8GzkDDMJVV505oMObmIRt4NGv7X1SjVeqPowgK8NbN0GfhU yPD9cNWsS5PjWGoFaoKOX9Va49VBJ7bcvOwiPKKcE70CTSWYmnM9b8Iopz1ODXVoakOX m2iO3/kG4hX4XrKFU6D2mB3ShNJ/XTd+aMXGWwqNLxDkMSjymdDnFMcBQCUN/cK7kUbB gzGsYXt72tEP/jMv3FzxbVYcDDYjsEo6PpYJ+QxLeXMOvrfRlKPYAPPPTOHd2gFfXZDB CIiw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=OfQcZ+jX; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. 
[2620:137:e000::1:20]) by mx.google.com with ESMTP id gt18-20020a170906f21200b00926b9cdf363si29618565ejb.552.2023.03.28.08.43.12; Tue, 28 Mar 2023 08:43:37 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=OfQcZ+jX; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233504AbjC1PjD (ORCPT + 99 others); Tue, 28 Mar 2023 11:39:03 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45764 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234070AbjC1PiT (ORCPT ); Tue, 28 Mar 2023 11:38:19 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 86D4740CA for ; Tue, 28 Mar 2023 08:36:40 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1680017787; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=a+LP/eGObxESjSZtnTg8CVLgYRu7HZvCpWjJZDO48e4=; b=OfQcZ+jXTrBUWoX9WqQe+qCsLP40HO1WUNzMBzUzOJ9YRGpaM9TGDzpZsqiQGGlk7nUkYp +skjykt9OwsqbWMntTp4ZQt0nIuP/KzTInfO5c9QyJtyX98dWnTFlxm0fslUXcfuzgqE/+ 8JkIyWD0k7VvuvdpEyek/HpmEdpQ32M= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 
us-mta-495-MWpvJGt3PZ2YlU4lp9woSA-1; Tue, 28 Mar 2023 11:11:58 -0400 X-MC-Unique: MWpvJGt3PZ2YlU4lp9woSA-1 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.rdu2.redhat.com [10.11.54.7]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 6D7218028B2; Tue, 28 Mar 2023 15:11:57 +0000 (UTC) Received: from localhost (ovpn-8-20.pek2.redhat.com [10.72.8.20]) by smtp.corp.redhat.com (Postfix) with ESMTP id 6355314171BD; Tue, 28 Mar 2023 15:11:56 +0000 (UTC) From: Ming Lei To: Jens Axboe , io-uring@vger.kernel.org, linux-block@vger.kernel.org Cc: linux-kernel@vger.kernel.org, Miklos Szeredi , ZiyangZhang , Xiaoguang Wang , Bernd Schubert , Pavel Begunkov , Stefan Hajnoczi , Dan Williams , Ming Lei Subject: [PATCH V5 15/16] block: ublk_drv: don't check buffer in case of zero copy Date: Tue, 28 Mar 2023 23:09:57 +0800 Message-Id: <20230328150958.1253547-16-ming.lei@redhat.com> In-Reply-To: <20230328150958.1253547-1-ming.lei@redhat.com> References: <20230328150958.1253547-1-ming.lei@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.7 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1761626782582907867?= X-GMAIL-MSGID: =?utf-8?q?1761626782582907867?= In case of zero copy, ublk server needn't to pre-allocate IO buffer and provide it to driver more. 
Meantime not set the buffer in case of zero copy any more, and the userspace can use pread()/pwrite() to read from/write to the io request buffer, which is easier & simpler from userspace viewpoint. Signed-off-by: Ming Lei --- drivers/block/ublk_drv.c | 23 ++++++++++++++--------- 1 file changed, 14 insertions(+), 9 deletions(-) diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c index 03ad33686808..a49b4de5ae1e 100644 --- a/drivers/block/ublk_drv.c +++ b/drivers/block/ublk_drv.c @@ -1410,25 +1410,30 @@ static int ublk_ch_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags) if (io->flags & UBLK_IO_FLAG_OWNED_BY_SRV) goto out; /* FETCH_RQ has to provide IO buffer if NEED GET DATA is not enabled */ - if (!ub_cmd->addr && !ublk_need_get_data(ubq)) - goto out; + if (!ublk_support_zc(ubq)) { + if (!ub_cmd->addr && !ublk_need_get_data(ubq)) + goto out; + io->addr = ub_cmd->addr; + } io->cmd = cmd; io->flags |= UBLK_IO_FLAG_ACTIVE; - io->addr = ub_cmd->addr; - ublk_mark_io_ready(ub, ubq); break; case UBLK_IO_COMMIT_AND_FETCH_REQ: req = blk_mq_tag_to_rq(ub->tag_set.tags[ub_cmd->q_id], tag); + + if (!(io->flags & UBLK_IO_FLAG_OWNED_BY_SRV)) + goto out; /* * COMMIT_AND_FETCH_REQ has to provide IO buffer if NEED GET DATA is * not enabled or it is Read IO. 
*/ - if (!ub_cmd->addr && (!ublk_need_get_data(ubq) || req_op(req) == REQ_OP_READ)) - goto out; - if (!(io->flags & UBLK_IO_FLAG_OWNED_BY_SRV)) - goto out; - io->addr = ub_cmd->addr; + if (!ublk_support_zc(ubq)) { + if (!ub_cmd->addr && (!ublk_need_get_data(ubq) || + req_op(req) == REQ_OP_READ)) + goto out; + io->addr = ub_cmd->addr; + } io->flags |= UBLK_IO_FLAG_ACTIVE; io->cmd = cmd; ublk_commit_completion(ub, ub_cmd); From patchwork Tue Mar 28 15:09:58 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 76135 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp2310688vqo; Tue, 28 Mar 2023 08:35:32 -0700 (PDT) X-Google-Smtp-Source: AKy350ZrSZHAVHlogaLVfzi2et2wJOvB8YdIbRrk1dgw5o81GgQq44Y8o9TbTwFN+1MSOnJr2+Bn X-Received: by 2002:a05:6a20:b71f:b0:e0:316a:d62c with SMTP id fg31-20020a056a20b71f00b000e0316ad62cmr6194480pzb.60.1680017731779; Tue, 28 Mar 2023 08:35:31 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680017731; cv=none; d=google.com; s=arc-20160816; b=iucurqmSifZ32gAXPzc7TQmliKo1d7Iilih2IxLFbWbXQEo92Fkoai1r8x5djEyins v8N8GMQ6jQXkZI+6f0iNtU6j9FX8HpKClX8mArhqnSFr8hf9uTelJsSjXKTZBySstRZq 3yHOSrlj8vbZ/qqGENafAI6vJgNO7TZfarm/eJWpBE0MZgK1AswF8ZTdXH+HCdOwIYj6 ohf1k/2PlPn4spelizDu8VCim74U8sIy+c86zeXoICFxaziETC0lqTLiH0kMsAZAqCmW UPaeF4OTuPpRH0aMotFpgNk844q+etYRkQAUaC/qqNbSVONu9WebHVF8xqcx4uJXdTK2 BywA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=bkrQPrOTJ7X3k65OqTy8qHOS1yB1zs3W4YKYKVIPFZA=; b=faUxYOoF5wKSczSrytXrn5icd0HHPsMUVOI5JyrXIDnG88jbbcLMMzmuGFc6Ka5uUd kZbZBa1pselHxPO5zYUkDdvuIAvbUFxWWymTMfr8D4CI7a59wuCtX4HWNX1jLY4HJR7Q SNPVMPvIXhKhyKhzCqA8wKGc6IRPzK4EYX5uk5yk+W8WlKB8wNrc9T0+u/6Cjc1GYuH1 
6nINROgL891i/TpiQcH7k2avwqs1vW3VPfdWBh7A+e8YkJW8kuYjE8G3PBs8eBHMLpU/ rXM1tIeK+balcRwgak0XdbifzJO/EzDrizPsCU/jys6CON3w/OtlUhR5ZFcH6Oftm5EC 9qYA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=Bh798vqD; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id a21-20020a63e855000000b004fb165e159bsi30206848pgk.794.2023.03.28.08.35.18; Tue, 28 Mar 2023 08:35:31 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=Bh798vqD; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231820AbjC1P2e (ORCPT + 99 others); Tue, 28 Mar 2023 11:28:34 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48312 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233344AbjC1P2O (ORCPT ); Tue, 28 Mar 2023 11:28:14 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BC56711E87 for ; Tue, 28 Mar 2023 08:26:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1680017127; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: 
content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=bkrQPrOTJ7X3k65OqTy8qHOS1yB1zs3W4YKYKVIPFZA=; b=Bh798vqDG9/iIUzTNCKKtyi8VvKtdr1zdNmOBMeOdNiit6FcQvMwWieozvTdo6gRiIHDom I6tKp4vcS5h+D/IHU7g+QXO9iaD/ngkSWoKLyIOO2lcF5LeIZUE5sG36OFJDNDSZnnxpOF iTOdfV+SBTTcYak2UMM695KSXtOUCzg= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-381-RElhPhaVNO-sLFZ5mAKV0g-1; Tue, 28 Mar 2023 11:12:02 -0400 X-MC-Unique: RElhPhaVNO-sLFZ5mAKV0g-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 99A1D100DEA9; Tue, 28 Mar 2023 15:12:01 +0000 (UTC) Received: from localhost (ovpn-8-20.pek2.redhat.com [10.72.8.20]) by smtp.corp.redhat.com (Postfix) with ESMTP id 2E829C15BA0; Tue, 28 Mar 2023 15:11:59 +0000 (UTC) From: Ming Lei To: Jens Axboe , io-uring@vger.kernel.org, linux-block@vger.kernel.org Cc: linux-kernel@vger.kernel.org, Miklos Szeredi , ZiyangZhang , Xiaoguang Wang , Bernd Schubert , Pavel Begunkov , Stefan Hajnoczi , Dan Williams , Ming Lei Subject: [PATCH V5 16/16] block: ublk_drv: apply io_uring FUSED_CMD for supporting zero copy Date: Tue, 28 Mar 2023 23:09:58 +0800 Message-Id: <20230328150958.1253547-17-ming.lei@redhat.com> In-Reply-To: <20230328150958.1253547-1-ming.lei@redhat.com> References: <20230328150958.1253547-1-ming.lei@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on 
lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1761626273206769567?= X-GMAIL-MSGID: =?utf-8?q?1761626273206769567?= Apply io_uring fused command for supporting zero copy: 1) init the fused cmd buffer(io_mapped_buf) in ublk_map_io(), and deinit it in ublk_unmap_io(), and this buffer is immutable, so it is just fine to retrieve it from concurrent fused command. 1) add sub-command opcode of UBLK_IO_FUSED_SUBMIT_IO for retrieving this fused cmd(zero copy) buffer 2) call io_fused_cmd_start_secondary_req() to provide buffer to secondary request and submit secondary request; meantime setup complete callback via this API, once secondary request is completed, the complete callback is called for freeing the buffer and completing the fused command Also request reference is held during fused command lifetime, and this way guarantees that request buffer won't be freed until all inflight fused commands are completed. userspace(only implement sqe128 fused command): https://github.com/ming1/ubdsrv/tree/fused-cmd-zc-for-v5 liburing test(only implement normal sqe fused command: two 64byte SQEs) https://github.com/ming1/liburing/tree/fused_cmd_miniublk_for_v5 Signed-off-by: Ming Lei --- Documentation/block/ublk.rst | 126 ++++++++++++++++++++-- drivers/block/ublk_drv.c | 192 ++++++++++++++++++++++++++++++++-- include/uapi/linux/ublk_cmd.h | 6 +- 3 files changed, 303 insertions(+), 21 deletions(-) diff --git a/Documentation/block/ublk.rst b/Documentation/block/ublk.rst index 1713b2890abb..7b7aa24e9729 100644 --- a/Documentation/block/ublk.rst +++ b/Documentation/block/ublk.rst @@ -297,18 +297,126 @@ with specified IO tag in the command data: ``UBLK_IO_COMMIT_AND_FETCH_REQ`` to the server, ublkdrv needs to copy the server buffer (pages) read to the IO request pages. 
-Future development -================== +- ``UBLK_IO_FUSED_SUBMIT_IO`` + + Used for implementing zero copy feature. + + It has to be the primary command of io_uring fused command. This command + submits the generic secondary IO request with io buffer provided by our primary + command, and won't be completed until the secondary request is done. + + The provided buffer is represented as ``io_uring_bvec_buf``, which is + actually ublk request buffer's reference, and the reference is shared & + read-only, so the generic secondary request can retrieve any part of the buffer + by passing buffer offset & length. Zero copy ---------- +========= + +What is zero copy? +------------------ + +When application submits IO to ``/dev/ublkb*``, userspace buffer(direct io) +or page cache buffer(buffered io) or kernel buffer(meta io often) is used +for submitting data to ublk driver, and all kinds of these buffers are +represented by bio/bvecs(ublk request buffer) finally. Before supporting +zero copy, data in these buffers has to be copied to ublk server userspace +buffer before handling WRITE IO, or after handling READ IO, so that ublk +server can handle IO for ``/dev/ublkb*`` with the copied data. + +The extra copy between ublk request buffer and ublk server userspace buffer +not only increases CPU utilization(such as pinning pages, copying data), but +also consumes memory bandwidth, and the cost could be very big when IO size +is big. It is observed that ublk-null IOPS may be increased to ~5X if the +extra copy can be avoided. + +So zero copy is very important for supporting high performance block device +in userspace.
+ +Technical requirements +---------------------- + +- ublk request buffer use + +ublk request buffer is represented by bio/bvec, which is immutable, so do +not try to change bvec via buffer reference; data can be read from or +written to the buffer according to buffer direction, but bvec can't be +changed. + +- buffer lifetime + +Ublk server borrows ublk request buffer for handling ublk IO, ublk request +buffer reference is used. Reference can't outlive the referent buffer. That +means all request buffer references have to be released by ublk server +before ublk driver completes this request, when request buffer ownership +is transferred to upper layer(FS, application, ...). + +Also after ublk request is completed, any page belonging to this ublk +request can not be written or read any more from ublk server since it is +one block device from kernel viewpoint. + +- buffer direction + +For ublk WRITE request, ublk request buffer should only be accessed as data +source, and the buffer can't be written by ublk server. + +For ublk READ request, ublk request buffer should only be accessed as data +destination, and the buffer can't be read by ublk server, otherwise kernel +data is leaked to ublk server, which can be an unprivileged application. + +- arbitrary size sub-buffer needs to be retrieved from ublk server + +ublk is one generic framework for implementing block device in userspace, +and typical requirements include logical volume manager(mirror, striped, ...), +distributed network storage, compressed target, ... + +ublk server needs to retrieve arbitrary size sub-buffer of ublk request, and +ublk server needs to submit IOs with these sub-buffer(s). That also means +arbitrary size sub-buffer(s) can be used to submit IO multiple times. + +Any sub-buffer is actually one reference of ublk request buffer, whose +ownership can't be transferred to upper layer if any reference is held +by ublk server.
+ +Why splice isn't good for ublk zero copy +---------------------------------------- + +- spliced page from ->splice_read() can't be written + +ublk READ request can't be handled because spliced page can't be written to, and +extending splice for ublk zero copy isn't one good solution [#splice_extend]_ + +- it is very hard to meet above requirements wrt. request buffer lifetime + +splice/pipe focuses on page reference lifetime, but ublk zero copy pays more +attention to ublk request buffer lifetime. It is very inefficient to respect +request buffer lifetime by using all pipe buffer's ->release() which requires +all pipe buffers and pipe to be kept when ublk server handles IO. That means +one single dedicated ``pipe_inode_info`` has to be allocated at runtime for each +provided buffer, and the pipe needs to be populated with pages in ublk request +buffer. + + +io_uring fused command based zero copy +-------------------------------------- + +io_uring fused command includes one primary command(uring command) and one +generic secondary request. The primary command is responsible for submitting +secondary request with provided buffer from ublk request, and primary command +won't be completed until the secondary request is completed. + +Typical ublk IO handling includes network and FS IO, so it is usual enough +for io_uring net & fs to support IO with provided buffer from primary command. -Zero copy is a generic requirement for nbd, fuse or similar drivers. A -problem [#xiaoguang]_ Xiaoguang mentioned is that pages mapped to userspace -can't be remapped any more in kernel with existing mm interfaces. This can -occurs when destining direct IO to ``/dev/ublkb*``. Also, he reported that -big requests (IO size >= 256 KB) may benefit a lot from zero copy. +Once primary command is submitted successfully, ublk driver guarantees that +the ublk request buffer won't go away since secondary request actually +grabs the buffer's reference.
This way also guarantees that multiple +concurrent fused commands associated with same request buffer work fine, +as the provided buffer reference is shared & read-only. +Also buffer usage direction flag is passed to primary command from userspace, +so ublk driver can validate if it is legal to use buffer with requested +direction. References ========== @@ -323,4 +431,4 @@ References .. [#stefan] https://lore.kernel.org/linux-block/YoOr6jBfgVm8GvWg@stefanha-x1.localdomain/ -.. [#xiaoguang] https://lore.kernel.org/linux-block/YoOr6jBfgVm8GvWg@stefanha-x1.localdomain/ +.. [#splice_extend] https://lore.kernel.org/linux-block/CAHk-=wgJsi7t7YYpuo6ewXGnHz2nmj67iWR6KPGoz5TBu34mWQ@mail.gmail.com/ diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c index a49b4de5ae1e..52b0a6e2be6e 100644 --- a/drivers/block/ublk_drv.c +++ b/drivers/block/ublk_drv.c @@ -74,10 +74,15 @@ struct ublk_rq_data { * successfully */ struct kref ref; + bool allocated_bvec; + struct io_uring_bvec_buf buf[0]; }; struct ublk_uring_cmd_pdu { - struct ublk_queue *ubq; + union { + struct ublk_queue *ubq; + struct request *req; + }; }; /* @@ -565,6 +570,69 @@ static size_t ublk_copy_user_pages(const struct request *req, return done; } +/* + * The built command buffer is immutable, so it is fine to feed it to + * concurrent io_uring fused commands + */ +static int ublk_init_zero_copy_buffer(struct request *rq) +{ + struct ublk_rq_data *data = blk_mq_rq_to_pdu(rq); + struct io_uring_bvec_buf *imu = data->buf; + struct req_iterator rq_iter; + unsigned int nr_bvecs = 0; + struct bio_vec *bvec; + unsigned int offset; + struct bio_vec bv; + + if (!ublk_rq_has_data(rq)) + goto exit; + + rq_for_each_bvec(bv, rq, rq_iter) + nr_bvecs++; + + if (!nr_bvecs) + goto exit; + + if (rq->bio != rq->biotail) { + int idx = 0; + + bvec = kvmalloc_array(nr_bvecs, sizeof(struct bio_vec), + GFP_NOIO); + if (!bvec) + return -ENOMEM; + + offset = 0; + rq_for_each_bvec(bv, rq, rq_iter) + bvec[idx++] = bv; +
data->allocated_bvec = true; + } else { + struct bio *bio = rq->bio; + + offset = bio->bi_iter.bi_bvec_done; + bvec = __bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter); + } + imu->bvec = bvec; + imu->nr_bvecs = nr_bvecs; + imu->offset = offset; + imu->len = blk_rq_bytes(rq); + + return 0; +exit: + imu->bvec = NULL; + return 0; +} + +static void ublk_deinit_zero_copy_buffer(struct request *rq) +{ + struct ublk_rq_data *data = blk_mq_rq_to_pdu(rq); + struct io_uring_bvec_buf *imu = data->buf; + + if (data->allocated_bvec) { + kvfree(imu->bvec); + data->allocated_bvec = false; + } +} + static inline bool ublk_need_map_req(const struct request *req) { return ublk_rq_has_data(req) && req_op(req) == REQ_OP_WRITE; @@ -575,11 +643,23 @@ static inline bool ublk_need_unmap_req(const struct request *req) return ublk_rq_has_data(req) && req_op(req) == REQ_OP_READ; } -static int ublk_map_io(const struct ublk_queue *ubq, const struct request *req, +static int ublk_map_io(const struct ublk_queue *ubq, struct request *req, struct ublk_io *io) { const unsigned int rq_bytes = blk_rq_bytes(req); + if (ublk_support_zc(ubq)) { + int ret = ublk_init_zero_copy_buffer(req); + + /* + * The only failure is -ENOMEM for allocating fused cmd + * buffer, return zero so that we can requeue this req. 
+ */ + if (unlikely(ret)) + return 0; + return rq_bytes; + } + /* * no zero copy, we delay copy WRITE request data into ublksrv * context and the big benefit is that pinning pages in current @@ -599,11 +679,17 @@ static int ublk_map_io(const struct ublk_queue *ubq, const struct request *req, } static int ublk_unmap_io(const struct ublk_queue *ubq, - const struct request *req, + struct request *req, struct ublk_io *io) { const unsigned int rq_bytes = blk_rq_bytes(req); + if (ublk_support_zc(ubq)) { + ublk_deinit_zero_copy_buffer(req); + + return rq_bytes; + } + if (ublk_need_unmap_req(req)) { struct iov_iter iter; struct iovec iov; @@ -687,6 +773,12 @@ static inline struct ublk_uring_cmd_pdu *ublk_get_uring_cmd_pdu( return (struct ublk_uring_cmd_pdu *)&ioucmd->pdu; } +static inline struct ublk_uring_cmd_pdu *ublk_get_uring_fused_cmd_pdu( + struct io_uring_cmd *ioucmd) +{ + return (struct ublk_uring_cmd_pdu *)&ioucmd->fused.pdu; +} + static inline bool ubq_daemon_is_dying(struct ublk_queue *ubq) { return ubq->ubq_daemon->flags & PF_EXITING; @@ -742,6 +834,7 @@ static inline void __ublk_complete_rq(struct request *req) return; exit: + ublk_deinit_zero_copy_buffer(req); blk_mq_end_request(req, res); } @@ -1352,6 +1445,68 @@ static inline struct request *__ublk_check_and_get_req(struct ublk_device *ub, return NULL; } +static void ublk_fused_cmd_done_cb(struct io_uring_cmd *cmd, + unsigned issue_flags) +{ + struct ublk_uring_cmd_pdu *pdu = ublk_get_uring_fused_cmd_pdu(cmd); + struct request *req = pdu->req; + struct ublk_queue *ubq = req->mq_hctx->driver_data; + + ublk_put_req_ref(ubq, req); + io_uring_cmd_done(cmd, cmd->fused.data.secondary_res, 0, issue_flags); +} + +static inline bool ublk_check_fused_buf_dir(const struct request *req, + unsigned int flags) +{ + flags &= IO_URING_F_FUSED; + + if (req_op(req) == REQ_OP_READ && flags == IO_URING_F_FUSED_BUF_DEST) + return true; + + if (req_op(req) == REQ_OP_WRITE && flags == IO_URING_F_FUSED_BUF_SRC) + return true; + + 
return false; +} + +static int ublk_handle_fused_cmd(struct io_uring_cmd *cmd, + struct ublk_queue *ubq, int tag, unsigned int issue_flags) +{ + struct ublk_uring_cmd_pdu *pdu = ublk_get_uring_fused_cmd_pdu(cmd); + struct ublk_device *ub = cmd->file->private_data; + struct ublk_rq_data *data; + struct request *req; + + if (!ub) + return -EPERM; + + if (!(issue_flags & IO_URING_F_FUSED)) + goto exit; + + req = __ublk_check_and_get_req(ub, ubq, tag, 0); + if (!req) + goto exit; + + pr_devel("%s: qid %d tag %u request bytes %u, issue flags %x\n", + __func__, ubq->q_id, tag, blk_rq_bytes(req), + issue_flags); + + if (!ublk_check_fused_buf_dir(req, issue_flags)) + goto exit_put_ref; + + pdu->req = req; + data = blk_mq_rq_to_pdu(req); + io_fused_cmd_start_secondary_req(cmd, !(issue_flags & IO_URING_F_UNLOCKED), + data->buf, ublk_fused_cmd_done_cb); + return -EIOCBQUEUED; + +exit_put_ref: + ublk_put_req_ref(ubq, req); +exit: + return -EINVAL; +} + static int ublk_ch_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags) { struct ublksrv_io_cmd *ub_cmd = (struct ublksrv_io_cmd *)cmd->cmd; @@ -1367,6 +1522,10 @@ static int ublk_ch_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags) __func__, cmd->cmd_op, ub_cmd->q_id, tag, ub_cmd->result); + if ((issue_flags & IO_URING_F_FUSED) && + cmd_op != UBLK_IO_FUSED_SUBMIT_IO) + return -EOPNOTSUPP; + if (ub_cmd->q_id >= ub->dev_info.nr_hw_queues) goto out; @@ -1374,7 +1533,12 @@ static int ublk_ch_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags) if (!ubq || ub_cmd->q_id != ubq->q_id) goto out; - if (ubq->ubq_daemon && ubq->ubq_daemon != current) + /* + * The fused command reads the io buffer data structure only, so it + * is fine to be issued from other context. 
+ */ + if ((ubq->ubq_daemon && ubq->ubq_daemon != current) && + (cmd_op != UBLK_IO_FUSED_SUBMIT_IO)) goto out; if (tag >= ubq->q_depth) @@ -1397,6 +1561,9 @@ static int ublk_ch_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags) goto out; switch (cmd_op) { + case UBLK_IO_FUSED_SUBMIT_IO: + return ublk_handle_fused_cmd(cmd, ubq, tag, issue_flags); + case UBLK_IO_FETCH_REQ: /* UBLK_IO_FETCH_REQ is only allowed before queue is setup */ if (ublk_queue_ready(ubq)) { @@ -1726,11 +1893,14 @@ static void ublk_align_max_io_size(struct ublk_device *ub) static int ublk_add_tag_set(struct ublk_device *ub) { + int zc = !!(ub->dev_info.flags & UBLK_F_SUPPORT_ZERO_COPY); + struct ublk_rq_data *data; + ub->tag_set.ops = &ublk_mq_ops; ub->tag_set.nr_hw_queues = ub->dev_info.nr_hw_queues; ub->tag_set.queue_depth = ub->dev_info.queue_depth; ub->tag_set.numa_node = NUMA_NO_NODE; - ub->tag_set.cmd_size = sizeof(struct ublk_rq_data); + ub->tag_set.cmd_size = struct_size(data, buf, zc); ub->tag_set.flags = BLK_MQ_F_SHOULD_MERGE; ub->tag_set.driver_data = ub; return blk_mq_alloc_tag_set(&ub->tag_set); @@ -1946,12 +2116,18 @@ static int ublk_ctrl_add_dev(struct io_uring_cmd *cmd) */ ub->dev_info.flags &= UBLK_F_ALL; + /* + * NEED_GET_DATA doesn't make sense any more in case that + * ZERO_COPY is requested. Another reason is that userspace + * can read/write io request buffer by pread()/pwrite() with + * each io buffer's position. 
+ */ + if (ub->dev_info.flags & UBLK_F_SUPPORT_ZERO_COPY) + ub->dev_info.flags &= ~UBLK_F_NEED_GET_DATA; + if (!IS_BUILTIN(CONFIG_BLK_DEV_UBLK)) ub->dev_info.flags |= UBLK_F_URING_CMD_COMP_IN_TASK; - /* We are not ready to support zero copy */ - ub->dev_info.flags &= ~UBLK_F_SUPPORT_ZERO_COPY; - ub->dev_info.nr_hw_queues = min_t(unsigned int, ub->dev_info.nr_hw_queues, nr_cpu_ids); ublk_align_max_io_size(ub); diff --git a/include/uapi/linux/ublk_cmd.h b/include/uapi/linux/ublk_cmd.h index d1a6b3dc0327..c4f3465399cf 100644 --- a/include/uapi/linux/ublk_cmd.h +++ b/include/uapi/linux/ublk_cmd.h @@ -44,6 +44,7 @@ #define UBLK_IO_FETCH_REQ 0x20 #define UBLK_IO_COMMIT_AND_FETCH_REQ 0x21 #define UBLK_IO_NEED_GET_DATA 0x22 +#define UBLK_IO_FUSED_SUBMIT_IO 0x23 /* only ABORT means that no re-fetch */ #define UBLK_IO_RES_OK 0 @@ -85,10 +86,7 @@ static inline __u64 ublk_pos(__u16 q_id, __u16 tag, __u32 offset) ((((__u64)tag) << UBLK_BUF_SIZE_BITS) + offset); } -/* - * zero copy requires 4k block size, and can remap ublk driver's io - * request into ublksrv's vm space - */ +/* io_uring fused command based zero copy */ #define UBLK_F_SUPPORT_ZERO_COPY (1ULL << 0) /*