From patchwork Thu Nov 9 05:44:46 2023
X-Patchwork-Submitter: "Daisuke Matsuda (Fujitsu)"
X-Patchwork-Id: 163228
From: Daisuke Matsuda
To: linux-rdma@vger.kernel.org, leon@kernel.org, jgg@ziepe.ca, zyjzyj2000@gmail.com
Cc: linux-kernel@vger.kernel.org, rpearsonhpe@gmail.com, yangx.jy@fujitsu.com, lizhijian@fujitsu.com, y-goto@fujitsu.com, Daisuke Matsuda
Subject: [PATCH for-next v7 1/7] RDMA/rxe: Always defer tasks on responder and completer to workqueue
Date: Thu, 9 Nov 2023 14:44:46 +0900

Both responder and completer need to sleep to handle page faults when used
with ODP.
It can happen when they are going to access user MRs, so tasks must be
executed in process context in such cases. Additionally, the current
implementation seldom defers tasks to the workqueue; instead, it runs them
in a softirq context via do_task(), which is called from
rxe_resp_queue_pkt() and rxe_comp_queue_pkt() in SOFTIRQ_NET_RX context
and can loop for up to RXE_MAX_ITERATIONS (=1024) iterations. The problem
is that the task execution appears as anonymous load in the system and
that the loop can throttle other softirqs on the same CPU.

This patch makes the responder and completer code always run in process
context, which is required for ODP and also addresses the problem
described above.

Signed-off-by: Daisuke Matsuda
---
 drivers/infiniband/sw/rxe/rxe_comp.c        | 12 +-----------
 drivers/infiniband/sw/rxe/rxe_hw_counters.c |  1 -
 drivers/infiniband/sw/rxe/rxe_hw_counters.h |  1 -
 drivers/infiniband/sw/rxe/rxe_resp.c        | 13 +------------
 4 files changed, 2 insertions(+), 25 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c
index d0bdc2d8adc8..bb016a43330d 100644
--- a/drivers/infiniband/sw/rxe/rxe_comp.c
+++ b/drivers/infiniband/sw/rxe/rxe_comp.c
@@ -129,18 +129,8 @@ void retransmit_timer(struct timer_list *t)
 
 void rxe_comp_queue_pkt(struct rxe_qp *qp, struct sk_buff *skb)
 {
-	int must_sched;
-
 	skb_queue_tail(&qp->resp_pkts, skb);
-
-	must_sched = skb_queue_len(&qp->resp_pkts) > 1;
-	if (must_sched != 0)
-		rxe_counter_inc(SKB_TO_PKT(skb)->rxe, RXE_CNT_COMPLETER_SCHED);
-
-	if (must_sched)
-		rxe_sched_task(&qp->comp.task);
-	else
-		rxe_run_task(&qp->comp.task);
+	rxe_sched_task(&qp->comp.task);
 }
 
 static inline enum comp_state get_wqe(struct rxe_qp *qp,
diff --git a/drivers/infiniband/sw/rxe/rxe_hw_counters.c b/drivers/infiniband/sw/rxe/rxe_hw_counters.c
index a012522b577a..dc23cf3a6967 100644
--- a/drivers/infiniband/sw/rxe/rxe_hw_counters.c
+++ b/drivers/infiniband/sw/rxe/rxe_hw_counters.c
@@ -14,7 +14,6 @@ static const struct rdma_stat_desc rxe_counter_descs[] = {
 	[RXE_CNT_RCV_RNR].name            = "rcvd_rnr_err",
 	[RXE_CNT_SND_RNR].name            = "send_rnr_err",
 	[RXE_CNT_RCV_SEQ_ERR].name        = "rcvd_seq_err",
-	[RXE_CNT_COMPLETER_SCHED].name    = "ack_deferred",
 	[RXE_CNT_RETRY_EXCEEDED].name     = "retry_exceeded_err",
 	[RXE_CNT_RNR_RETRY_EXCEEDED].name = "retry_rnr_exceeded_err",
 	[RXE_CNT_COMP_RETRY].name         = "completer_retry_err",
diff --git a/drivers/infiniband/sw/rxe/rxe_hw_counters.h b/drivers/infiniband/sw/rxe/rxe_hw_counters.h
index 71f4d4fa9dc8..303da0e3134a 100644
--- a/drivers/infiniband/sw/rxe/rxe_hw_counters.h
+++ b/drivers/infiniband/sw/rxe/rxe_hw_counters.h
@@ -18,7 +18,6 @@ enum rxe_counters {
 	RXE_CNT_RCV_RNR,
 	RXE_CNT_SND_RNR,
 	RXE_CNT_RCV_SEQ_ERR,
-	RXE_CNT_COMPLETER_SCHED,
 	RXE_CNT_RETRY_EXCEEDED,
 	RXE_CNT_RNR_RETRY_EXCEEDED,
 	RXE_CNT_COMP_RETRY,
diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
index da470a925efc..969e057bbfd1 100644
--- a/drivers/infiniband/sw/rxe/rxe_resp.c
+++ b/drivers/infiniband/sw/rxe/rxe_resp.c
@@ -46,21 +46,10 @@ static char *resp_state_name[] = {
 	[RESPST_EXIT]			= "EXIT",
 };
 
-/* rxe_recv calls here to add a request packet to the input queue */
 void rxe_resp_queue_pkt(struct rxe_qp *qp, struct sk_buff *skb)
 {
-	int must_sched;
-	struct rxe_pkt_info *pkt = SKB_TO_PKT(skb);
-
 	skb_queue_tail(&qp->req_pkts, skb);
-
-	must_sched = (pkt->opcode == IB_OPCODE_RC_RDMA_READ_REQUEST) ||
-			(skb_queue_len(&qp->req_pkts) > 1);
-
-	if (must_sched)
-		rxe_sched_task(&qp->resp.task);
-	else
-		rxe_run_task(&qp->resp.task);
+	rxe_sched_task(&qp->resp.task);
 }
 
 static inline enum resp_states get_req(struct rxe_qp *qp,
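For context on the scheduling change in this patch: the removed must_sched
heuristic deferred work to the workqueue only when the packet queue was
already backed up or, on the responder, when an RDMA Read request arrived;
everything else ran inline in softirq context. The standalone C sketch below
contrasts that heuristic with the new always-defer policy; the names in it
(fake_qp, old_must_sched, new_must_sched) are illustrative stand-ins, not
rxe driver code.

/*
 * Minimal userspace sketch of the scheduling decision touched by this
 * patch. Types and opcode values here are made up for illustration.
 */
#include <stdbool.h>
#include <stdio.h>

enum opcode { OP_SEND = 0, OP_RDMA_READ_REQUEST = 12 };

struct fake_qp {
	int req_pkts_len;	/* length of the request packet queue */
};

/* Old responder policy: run inline unless the queue is backed up or the
 * packet is an RDMA Read request. */
static bool old_must_sched(const struct fake_qp *qp, enum opcode op)
{
	return op == OP_RDMA_READ_REQUEST || qp->req_pkts_len > 1;
}

/* New policy: always defer to the workqueue so the handler runs in
 * process context and is allowed to sleep (e.g. for ODP page faults). */
static bool new_must_sched(const struct fake_qp *qp, enum opcode op)
{
	(void)qp;
	(void)op;
	return true;
}

int main(void)
{
	struct fake_qp qp = { .req_pkts_len = 1 };

	printf("old: send on short queue -> %s\n",
	       old_must_sched(&qp, OP_SEND) ? "workqueue" : "inline softirq");
	printf("new: send on short queue -> %s\n",
	       new_must_sched(&qp, OP_SEND) ? "workqueue" : "inline softirq");
	return 0;
}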
From patchwork Thu Nov 9 05:44:47 2023
X-Patchwork-Submitter: "Daisuke Matsuda (Fujitsu)"
X-Patchwork-Id: 163232
From: Daisuke Matsuda
To: linux-rdma@vger.kernel.org, leon@kernel.org, jgg@ziepe.ca, zyjzyj2000@gmail.com
Cc: linux-kernel@vger.kernel.org, rpearsonhpe@gmail.com, yangx.jy@fujitsu.com, lizhijian@fujitsu.com, y-goto@fujitsu.com, Daisuke Matsuda
Subject: [PATCH for-next v7 2/7] RDMA/rxe: Make MR functions accessible from other rxe source code
Date: Thu, 9 Nov 2023 14:44:47 +0900
Message-Id: <945befecbcb4d7c0a3a02d03cfa444aafc7c27b8.1699503619.git.matsuda-daisuke@fujitsu.com>
X-GMAIL-MSGID: 1782064214654429481 Some functions in rxe_mr.c are going to be used in rxe_odp.c, which is to be created in the subsequent patch. List the declarations of the functions in rxe_loc.h. Signed-off-by: Daisuke Matsuda --- drivers/infiniband/sw/rxe/rxe_loc.h | 8 ++++++++ drivers/infiniband/sw/rxe/rxe_mr.c | 11 +++-------- 2 files changed, 11 insertions(+), 8 deletions(-) diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h index 4d2a8ef52c85..eb867f7d0d36 100644 --- a/drivers/infiniband/sw/rxe/rxe_loc.h +++ b/drivers/infiniband/sw/rxe/rxe_loc.h @@ -58,6 +58,7 @@ int rxe_mmap(struct ib_ucontext *context, struct vm_area_struct *vma); /* rxe_mr.c */ u8 rxe_get_next_key(u32 last_key); +void rxe_mr_init(int access, struct rxe_mr *mr); void rxe_mr_init_dma(int access, struct rxe_mr *mr); int rxe_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova, int access, struct rxe_mr *mr); @@ -69,6 +70,8 @@ int copy_data(struct rxe_pd *pd, int access, struct rxe_dma_info *dma, void *addr, int length, enum rxe_mr_copy_dir dir); int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, int sg_nents, unsigned int *sg_offset); +int rxe_mr_copy_xarray(struct rxe_mr *mr, u64 iova, void *addr, + unsigned int length, enum rxe_mr_copy_dir dir); int rxe_mr_do_atomic_op(struct rxe_mr *mr, u64 iova, int opcode, u64 compare, u64 swap_add, u64 *orig_val); int rxe_mr_do_atomic_write(struct rxe_mr *mr, u64 iova, u64 value); @@ -80,6 +83,11 @@ int rxe_invalidate_mr(struct rxe_qp *qp, u32 key); int rxe_reg_fast_mr(struct rxe_qp *qp, struct rxe_send_wqe *wqe); void rxe_mr_cleanup(struct rxe_pool_elem *elem); +static inline unsigned long rxe_mr_iova_to_index(struct rxe_mr *mr, u64 iova) +{ + return (iova >> mr->page_shift) - (mr->ibmr.iova >> mr->page_shift); +} + /* rxe_mw.c */ int rxe_alloc_mw(struct ib_mw *ibmw, struct ib_udata *udata); int rxe_dealloc_mw(struct ib_mw *ibmw); diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c index f54042e9aeb2..86b1908d304b 100644 --- a/drivers/infiniband/sw/rxe/rxe_mr.c +++ b/drivers/infiniband/sw/rxe/rxe_mr.c @@ -45,7 +45,7 @@ int mr_check_range(struct rxe_mr *mr, u64 iova, size_t length) } } -static void rxe_mr_init(int access, struct rxe_mr *mr) +void rxe_mr_init(int access, struct rxe_mr *mr) { u32 key = mr->elem.index << 8 | rxe_get_next_key(-1); @@ -72,11 +72,6 @@ void rxe_mr_init_dma(int access, struct rxe_mr *mr) mr->ibmr.type = IB_MR_TYPE_DMA; } -static unsigned long rxe_mr_iova_to_index(struct rxe_mr *mr, u64 iova) -{ - return (iova >> mr->page_shift) - (mr->ibmr.iova >> mr->page_shift); -} - static unsigned long rxe_mr_iova_to_page_offset(struct rxe_mr *mr, u64 iova) { return iova & (mr_page_size(mr) - 1); @@ -242,8 +237,8 @@ int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sgl, return ib_sg_to_pages(ibmr, sgl, sg_nents, sg_offset, rxe_set_page); } -static int rxe_mr_copy_xarray(struct rxe_mr *mr, u64 iova, void *addr, - unsigned int length, enum rxe_mr_copy_dir dir) +int rxe_mr_copy_xarray(struct rxe_mr *mr, u64 iova, void *addr, + unsigned int length, enum rxe_mr_copy_dir dir) { unsigned int page_offset = rxe_mr_iova_to_page_offset(mr, iova); unsigned long index = rxe_mr_iova_to_index(mr, iova); From patchwork Thu Nov 9 05:44:49 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Daisuke Matsuda (Fujitsu)" X-Patchwork-Id: 163229 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 
From: Daisuke Matsuda
To: linux-rdma@vger.kernel.org, leon@kernel.org, jgg@ziepe.ca, zyjzyj2000@gmail.com
Cc: linux-kernel@vger.kernel.org, rpearsonhpe@gmail.com, yangx.jy@fujitsu.com, lizhijian@fujitsu.com, y-goto@fujitsu.com, Daisuke Matsuda
Subject: [PATCH for-next v7 4/7] RDMA/rxe: Add page invalidation support
Date: Thu, 9 Nov 2023 14:44:49 +0900
Message-Id: <94f8aa4a634936295ae5772d6df6978c3b5de268.1699503619.git.matsuda-daisuke@fujitsu.com>

On page invalidation, an MMU notifier callback is invoked to unmap DMA
addresses and update the driver page table (umem_odp->dma_list). It also
sets the corresponding entries in the MR xarray to NULL to prevent any
access. The callback is registered when an ODP-enabled MR is created.

Signed-off-by: Daisuke Matsuda
---
 drivers/infiniband/sw/rxe/Makefile  |  2 +
 drivers/infiniband/sw/rxe/rxe_odp.c | 57 +++++++++++++++++++++++++++++
 2 files changed, 59 insertions(+)
 create mode 100644 drivers/infiniband/sw/rxe/rxe_odp.c

diff --git a/drivers/infiniband/sw/rxe/Makefile b/drivers/infiniband/sw/rxe/Makefile
index 5395a581f4bb..93134f1d1d0c 100644
--- a/drivers/infiniband/sw/rxe/Makefile
+++ b/drivers/infiniband/sw/rxe/Makefile
@@ -23,3 +23,5 @@ rdma_rxe-y := \
 	rxe_task.o \
 	rxe_net.o \
 	rxe_hw_counters.o
+
+rdma_rxe-$(CONFIG_INFINIBAND_ON_DEMAND_PAGING) += rxe_odp.o
diff --git a/drivers/infiniband/sw/rxe/rxe_odp.c b/drivers/infiniband/sw/rxe/rxe_odp.c
new file mode 100644
index 000000000000..ea55b79be0c6
--- /dev/null
+++ b/drivers/infiniband/sw/rxe/rxe_odp.c
@@ -0,0 +1,57 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/*
+ * Copyright (c) 2022-2023 Fujitsu Ltd. All rights reserved.
+ */
+
+#include 
+
+#include 
+
+#include "rxe.h"
+
+static void rxe_mr_unset_xarray(struct rxe_mr *mr, unsigned long start,
+				unsigned long end)
+{
+	unsigned long upper = rxe_mr_iova_to_index(mr, end - 1);
+	unsigned long lower = rxe_mr_iova_to_index(mr, start);
+	void *entry;
+
+	XA_STATE(xas, &mr->page_list, lower);
+
+	/* make elements in xarray NULL */
+	xas_lock(&xas);
+	xas_for_each(&xas, entry, upper)
+		xas_store(&xas, NULL);
+	xas_unlock(&xas);
+}
+
+static bool rxe_ib_invalidate_range(struct mmu_interval_notifier *mni,
+				    const struct mmu_notifier_range *range,
+				    unsigned long cur_seq)
+{
+	struct ib_umem_odp *umem_odp =
+		container_of(mni, struct ib_umem_odp, notifier);
+	struct rxe_mr *mr = umem_odp->private;
+	unsigned long start, end;
+
+	if (!mmu_notifier_range_blockable(range))
+		return false;
+
+	mutex_lock(&umem_odp->umem_mutex);
+	mmu_interval_set_seq(mni, cur_seq);
+
+	start = max_t(u64, ib_umem_start(umem_odp), range->start);
+	end = min_t(u64, ib_umem_end(umem_odp), range->end);
+
+	rxe_mr_unset_xarray(mr, start, end);
+
+	/* update umem_odp->dma_list */
+	ib_umem_odp_unmap_dma_pages(umem_odp, start, end);
+
+	mutex_unlock(&umem_odp->umem_mutex);
+	return true;
+}
+
+const struct mmu_interval_notifier_ops rxe_mn_ops = {
+	.invalidate = rxe_ib_invalidate_range,
+};
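The invalidation callback above relies on the rxe_mr_iova_to_index() helper
(made visible to rxe_odp.c in patch 2/7) to translate an iova range into
xarray indices, after clamping the notifier range to the umem bounds with
max_t()/min_t(). The standalone C program below sketches that arithmetic
with made-up values; fake_mr and iova_to_index() are hypothetical stand-ins,
not driver code.

/*
 * Sketch of iova -> xarray-index translation and range clamping.
 * Userspace illustration only, with invented addresses.
 */
#include <stdint.h>
#include <stdio.h>

struct fake_mr {
	uint64_t iova;		 /* start iova of the MR */
	unsigned int page_shift; /* e.g. 12 for 4 KiB pages */
};

/* Same arithmetic as rxe_mr_iova_to_index(): page number of the iova
 * relative to the page number of the MR's starting iova. */
static unsigned long iova_to_index(const struct fake_mr *mr, uint64_t iova)
{
	return (iova >> mr->page_shift) - (mr->iova >> mr->page_shift);
}

static uint64_t max_u64(uint64_t a, uint64_t b) { return a > b ? a : b; }
static uint64_t min_u64(uint64_t a, uint64_t b) { return a < b ? a : b; }

int main(void)
{
	struct fake_mr mr = { .iova = 0x7f0000001000ULL, .page_shift = 12 };
	uint64_t mr_start = mr.iova, mr_end = mr.iova + 16 * 4096;

	/* An mmu notifier range that overlaps only part of the MR. */
	uint64_t range_start = 0x7f0000003000ULL;
	uint64_t range_end   = 0x7f0000200000ULL;

	uint64_t start = max_u64(mr_start, range_start);
	uint64_t end = min_u64(mr_end, range_end);

	printf("invalidate xarray indices %lu..%lu\n",
	       iova_to_index(&mr, start), iova_to_index(&mr, end - 1));
	return 0;
}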
From patchwork Thu Nov 9 05:44:50 2023
X-Patchwork-Submitter: "Daisuke Matsuda (Fujitsu)"
X-Patchwork-Id: 163233
From: Daisuke Matsuda
To: linux-rdma@vger.kernel.org, leon@kernel.org, jgg@ziepe.ca, zyjzyj2000@gmail.com
Cc: linux-kernel@vger.kernel.org, rpearsonhpe@gmail.com, yangx.jy@fujitsu.com, lizhijian@fujitsu.com, y-goto@fujitsu.com, Daisuke Matsuda
Subject: [PATCH for-next v7 5/7] RDMA/rxe: Allow registering MRs for On-Demand Paging
Date: Thu, 9 Nov 2023 14:44:50 +0900
Message-Id: <5d46bd682aa8e3d5cabc38ca1cd67d2976f2731d.1699503619.git.matsuda-daisuke@fujitsu.com>

Allow userspace to register an ODP-enabled MR, in which case the flag
IB_ACCESS_ON_DEMAND is passed
to rxe_reg_user_mr(). However, there is no RDMA operation enabled right now. They will be supported later in the subsequent two patches. rxe_odp_do_pagefault() is called to initialize an ODP-enabled MR. It syncs process address space from the CPU page table to the driver page table (dma_list/pfn_list in umem_odp) when called with RXE_PAGEFAULT_SNAPSHOT flag. Additionally, It can be used to trigger page fault when pages being accessed are not present or do not have proper read/write permissions, and possibly to prefetch pages in the future. Signed-off-by: Daisuke Matsuda --- drivers/infiniband/sw/rxe/rxe.c | 7 ++ drivers/infiniband/sw/rxe/rxe_loc.h | 14 +++ drivers/infiniband/sw/rxe/rxe_mr.c | 9 +- drivers/infiniband/sw/rxe/rxe_odp.c | 122 ++++++++++++++++++++++++++ drivers/infiniband/sw/rxe/rxe_resp.c | 15 +++- drivers/infiniband/sw/rxe/rxe_verbs.c | 5 +- 6 files changed, 166 insertions(+), 6 deletions(-) diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c index 54c723a6edda..f2284d27229b 100644 --- a/drivers/infiniband/sw/rxe/rxe.c +++ b/drivers/infiniband/sw/rxe/rxe.c @@ -73,6 +73,13 @@ static void rxe_init_device_param(struct rxe_dev *rxe) rxe->ndev->dev_addr); rxe->max_ucontext = RXE_MAX_UCONTEXT; + + if (IS_ENABLED(CONFIG_INFINIBAND_ON_DEMAND_PAGING)) { + rxe->attr.kernel_cap_flags |= IBK_ON_DEMAND_PAGING; + + /* IB_ODP_SUPPORT_IMPLICIT is not supported right now. */ + rxe->attr.odp_caps.general_caps |= IB_ODP_SUPPORT; + } } /* initialize port attributes */ diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h index eb867f7d0d36..4bda154a0248 100644 --- a/drivers/infiniband/sw/rxe/rxe_loc.h +++ b/drivers/infiniband/sw/rxe/rxe_loc.h @@ -188,4 +188,18 @@ static inline unsigned int wr_opcode_mask(int opcode, struct rxe_qp *qp) return rxe_wr_opcode_info[opcode].mask[qp->ibqp.qp_type]; } +/* rxe_odp.c */ +#ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING +int rxe_odp_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, + u64 iova, int access_flags, struct rxe_mr *mr); +#else /* CONFIG_INFINIBAND_ON_DEMAND_PAGING */ +static inline int +rxe_odp_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova, + int access_flags, struct rxe_mr *mr) +{ + return -EOPNOTSUPP; +} + +#endif /* CONFIG_INFINIBAND_ON_DEMAND_PAGING */ + #endif /* RXE_LOC_H */ diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c index 86b1908d304b..384cb4ba1f2d 100644 --- a/drivers/infiniband/sw/rxe/rxe_mr.c +++ b/drivers/infiniband/sw/rxe/rxe_mr.c @@ -318,7 +318,10 @@ int rxe_mr_copy(struct rxe_mr *mr, u64 iova, void *addr, return err; } - return rxe_mr_copy_xarray(mr, iova, addr, length, dir); + if (mr->umem->is_odp) + return -EOPNOTSUPP; + else + return rxe_mr_copy_xarray(mr, iova, addr, length, dir); } /* copy data in or out of a wqe, i.e. sg list @@ -527,6 +530,10 @@ int rxe_mr_do_atomic_write(struct rxe_mr *mr, u64 iova, u64 value) struct page *page; u64 *va; + /* ODP is not supported right now. WIP. 
*/ + if (mr->umem->is_odp) + return RESPST_ERR_UNSUPPORTED_OPCODE; + /* See IBA oA19-28 */ if (unlikely(mr->state != RXE_MR_STATE_VALID)) { rxe_dbg_mr(mr, "mr not in valid state"); diff --git a/drivers/infiniband/sw/rxe/rxe_odp.c b/drivers/infiniband/sw/rxe/rxe_odp.c index ea55b79be0c6..c5e24901c141 100644 --- a/drivers/infiniband/sw/rxe/rxe_odp.c +++ b/drivers/infiniband/sw/rxe/rxe_odp.c @@ -9,6 +9,8 @@ #include "rxe.h" +#define RXE_ODP_WRITABLE_BIT 1UL + static void rxe_mr_unset_xarray(struct rxe_mr *mr, unsigned long start, unsigned long end) { @@ -25,6 +27,29 @@ static void rxe_mr_unset_xarray(struct rxe_mr *mr, unsigned long start, xas_unlock(&xas); } +static void rxe_mr_set_xarray(struct rxe_mr *mr, unsigned long start, + unsigned long end, unsigned long *pfn_list) +{ + unsigned long upper = rxe_mr_iova_to_index(mr, end - 1); + unsigned long lower = rxe_mr_iova_to_index(mr, start); + void *page, *entry; + + XA_STATE(xas, &mr->page_list, lower); + + xas_lock(&xas); + while (xas.xa_index <= upper) { + if (pfn_list[xas.xa_index] & HMM_PFN_WRITE) { + page = xa_tag_pointer(hmm_pfn_to_page(pfn_list[xas.xa_index]), + RXE_ODP_WRITABLE_BIT); + } else + page = hmm_pfn_to_page(pfn_list[xas.xa_index]); + + xas_store(&xas, page); + entry = xas_next(&xas); + } + xas_unlock(&xas); +} + static bool rxe_ib_invalidate_range(struct mmu_interval_notifier *mni, const struct mmu_notifier_range *range, unsigned long cur_seq) @@ -55,3 +80,100 @@ static bool rxe_ib_invalidate_range(struct mmu_interval_notifier *mni, const struct mmu_interval_notifier_ops rxe_mn_ops = { .invalidate = rxe_ib_invalidate_range, }; + +#define RXE_PAGEFAULT_RDONLY BIT(1) +#define RXE_PAGEFAULT_SNAPSHOT BIT(2) +static int rxe_odp_do_pagefault_and_lock(struct rxe_mr *mr, u64 user_va, int bcnt, u32 flags) +{ + struct ib_umem_odp *umem_odp = to_ib_umem_odp(mr->umem); + bool fault = !(flags & RXE_PAGEFAULT_SNAPSHOT); + u64 access_mask; + int np; + + access_mask = ODP_READ_ALLOWED_BIT; + if (umem_odp->umem.writable && !(flags & RXE_PAGEFAULT_RDONLY)) + access_mask |= ODP_WRITE_ALLOWED_BIT; + + /* + * ib_umem_odp_map_dma_and_lock() locks umem_mutex on success. + * Callers must release the lock later to let invalidation handler + * do its work again. + */ + np = ib_umem_odp_map_dma_and_lock(umem_odp, user_va, bcnt, + access_mask, fault); + if (np < 0) + return np; + + /* + * umem_mutex is still locked here, so we can use hmm_pfn_to_page() + * safely to fetch pages in the range. + */ + rxe_mr_set_xarray(mr, user_va, user_va + bcnt, umem_odp->pfn_list); + + return np; +} + +static int rxe_odp_init_pages(struct rxe_mr *mr) +{ + struct ib_umem_odp *umem_odp = to_ib_umem_odp(mr->umem); + int ret; + + ret = rxe_odp_do_pagefault_and_lock(mr, mr->umem->address, + mr->umem->length, + RXE_PAGEFAULT_SNAPSHOT); + + if (ret >= 0) + mutex_unlock(&umem_odp->umem_mutex); + + return ret >= 0 ? 0 : ret; +} + +int rxe_odp_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, + u64 iova, int access_flags, struct rxe_mr *mr) +{ + struct ib_umem_odp *umem_odp; + int err; + + if (!IS_ENABLED(CONFIG_INFINIBAND_ON_DEMAND_PAGING)) + return -EOPNOTSUPP; + + rxe_mr_init(access_flags, mr); + + xa_init(&mr->page_list); + + if (!start && length == U64_MAX) { + if (iova != 0) + return -EINVAL; + if (!(rxe->attr.odp_caps.general_caps & IB_ODP_SUPPORT_IMPLICIT)) + return -EINVAL; + + /* Never reach here, for implicit ODP is not implemented. 
*/ + } + + umem_odp = ib_umem_odp_get(&rxe->ib_dev, start, length, access_flags, + &rxe_mn_ops); + if (IS_ERR(umem_odp)) { + rxe_dbg_mr(mr, "Unable to create umem_odp err = %d\n", + (int)PTR_ERR(umem_odp)); + return PTR_ERR(umem_odp); + } + + umem_odp->private = mr; + + mr->umem = &umem_odp->umem; + mr->access = access_flags; + mr->ibmr.length = length; + mr->ibmr.iova = iova; + mr->page_offset = ib_umem_offset(&umem_odp->umem); + + err = rxe_odp_init_pages(mr); + if (err) { + ib_umem_odp_release(umem_odp); + return err; + } + + mr->state = RXE_MR_STATE_VALID; + mr->ibmr.type = IB_MR_TYPE_USER; + + return err; +} diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c index 969e057bbfd1..9159f1bdfc6f 100644 --- a/drivers/infiniband/sw/rxe/rxe_resp.c +++ b/drivers/infiniband/sw/rxe/rxe_resp.c @@ -635,6 +635,10 @@ static enum resp_states process_flush(struct rxe_qp *qp, struct rxe_mr *mr = qp->resp.mr; struct resp_res *res = qp->resp.res; + /* ODP is not supported right now. WIP. */ + if (mr->umem->is_odp) + return RESPST_ERR_UNSUPPORTED_OPCODE; + /* oA19-14, oA19-15 */ if (res && res->replay) return RESPST_ACKNOWLEDGE; @@ -688,10 +692,13 @@ static enum resp_states atomic_reply(struct rxe_qp *qp, if (!res->replay) { u64 iova = qp->resp.va + qp->resp.offset; - err = rxe_mr_do_atomic_op(mr, iova, pkt->opcode, - atmeth_comp(pkt), - atmeth_swap_add(pkt), - &res->atomic.orig_val); + if (mr->umem->is_odp) + err = RESPST_ERR_UNSUPPORTED_OPCODE; + else + err = rxe_mr_do_atomic_op(mr, iova, pkt->opcode, + atmeth_comp(pkt), + atmeth_swap_add(pkt), + &res->atomic.orig_val); if (err) return err; diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c index 48f86839d36a..192ad835c712 100644 --- a/drivers/infiniband/sw/rxe/rxe_verbs.c +++ b/drivers/infiniband/sw/rxe/rxe_verbs.c @@ -1278,7 +1278,10 @@ static struct ib_mr *rxe_reg_user_mr(struct ib_pd *ibpd, u64 start, mr->ibmr.pd = ibpd; mr->ibmr.device = ibpd->device; - err = rxe_mr_init_user(rxe, start, length, iova, access, mr); + if (access & IB_ACCESS_ON_DEMAND) + err = rxe_odp_mr_init_user(rxe, start, length, iova, access, mr); + else + err = rxe_mr_init_user(rxe, start, length, iova, access, mr); if (err) { rxe_dbg_mr(mr, "reg_user_mr failed, err = %d", err); goto err_cleanup; From patchwork Thu Nov 9 05:44:51 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Daisuke Matsuda (Fujitsu)" X-Patchwork-Id: 163230 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b129:0:b0:403:3b70:6f57 with SMTP id q9csp235552vqs; Wed, 8 Nov 2023 21:47:46 -0800 (PST) X-Google-Smtp-Source: AGHT+IFc5X5Ksq8Pkfq0+k1u2SH1HnB/F3WBxz5QBnXo+F8aBbSKTJIV9+2G131oPb16wUEOvsoV X-Received: by 2002:a17:903:22d2:b0:1cc:3a60:bd69 with SMTP id y18-20020a17090322d200b001cc3a60bd69mr4816167plg.25.1699508865712; Wed, 08 Nov 2023 21:47:45 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1699508865; cv=none; d=google.com; s=arc-20160816; b=j23VBT7TWcdlP+0We9r7Gfgl2PcRgIT5/QNxkEnhKWbRPBAcAXLd3pkT6RYVK3ciAl geLGxVXyuAG85drAhW+JGUix4JPmNPPqNQn1FjVgGIwZZDt6g7//7T8y/azneRNuH3qh M6OHoq+PmoKufz1Hh5aYX17lGIMnmJwdK7G5rXa+8yf9I5bS5AwywABwvumo1epJktFI HWGDmk1oOiUOJDmbmKdv713FbtQEuMMcvI9gEm4ZSoBflqnlNPsqy8OIuXqcTCgSA+g6 2CA524d/XOuIFdMaX9ghND78heE4IqHQilLX09VK7Uewf3LpaIfZm6gjML2DUQljxqJ1 U3gw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version 
From: Daisuke Matsuda
To: linux-rdma@vger.kernel.org, leon@kernel.org, jgg@ziepe.ca, zyjzyj2000@gmail.com
Cc: linux-kernel@vger.kernel.org, rpearsonhpe@gmail.com, yangx.jy@fujitsu.com, lizhijian@fujitsu.com, y-goto@fujitsu.com, Daisuke Matsuda
Subject: [PATCH for-next v7 6/7] RDMA/rxe: Add support for Send/Recv/Write/Read with ODP
Date: Thu, 9 Nov 2023 14:44:51 +0900
rxe_mr_copy() is used widely to copy data to/from a user MR. The requester
uses it to load payloads of requesting packets; the responder uses it to
process Send, Write, and Read operations; the completer uses it to copy
data from response packets of Read and Atomic operations to a user MR.

Allow these operations to be used with ODP by adding a subordinate function
rxe_odp_mr_copy(). It is comprised of the following steps:
 1. Check page presence and R/W permission.
 2. If OK, just execute data copy to/from the pages and exit.
 3. Otherwise, trigger page fault to map the pages.
 4. Update the MR xarray using PFNs in umem_odp->pfn_list.
 5. Execute data copy to/from the pages.

Signed-off-by: Daisuke Matsuda
---
 drivers/infiniband/sw/rxe/rxe.c     | 10 ++++
 drivers/infiniband/sw/rxe/rxe_loc.h |  8 +++
 drivers/infiniband/sw/rxe/rxe_mr.c  |  9 +++-
 drivers/infiniband/sw/rxe/rxe_odp.c | 77 +++++++++++++++++++++++++++++
 4 files changed, 102 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
index f2284d27229b..207a022156f0 100644
--- a/drivers/infiniband/sw/rxe/rxe.c
+++ b/drivers/infiniband/sw/rxe/rxe.c
@@ -79,6 +79,16 @@ static void rxe_init_device_param(struct rxe_dev *rxe)
 
 		/* IB_ODP_SUPPORT_IMPLICIT is not supported right now. */
 		rxe->attr.odp_caps.general_caps |= IB_ODP_SUPPORT;
+
+		rxe->attr.odp_caps.per_transport_caps.ud_odp_caps |= IB_ODP_SUPPORT_SEND;
+		rxe->attr.odp_caps.per_transport_caps.ud_odp_caps |= IB_ODP_SUPPORT_RECV;
+		rxe->attr.odp_caps.per_transport_caps.ud_odp_caps |= IB_ODP_SUPPORT_SRQ_RECV;
+
+		rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_SEND;
+		rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_RECV;
+		rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_WRITE;
+		rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_READ;
+		rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_SRQ_RECV;
 	}
 }
 
diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
index 4bda154a0248..eeaeff8a1398 100644
--- a/drivers/infiniband/sw/rxe/rxe_loc.h
+++ b/drivers/infiniband/sw/rxe/rxe_loc.h
@@ -192,6 +192,8 @@ static inline unsigned int wr_opcode_mask(int opcode, struct rxe_qp *qp)
 #ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
 int rxe_odp_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length,
 			 u64 iova, int access_flags, struct rxe_mr *mr);
+int rxe_odp_mr_copy(struct rxe_mr *mr, u64 iova, void *addr, int length,
+		    enum rxe_mr_copy_dir dir);
 #else /* CONFIG_INFINIBAND_ON_DEMAND_PAGING */
 static inline int
 rxe_odp_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova,
@@ -199,6 +201,12 @@ rxe_odp_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova,
 {
 	return -EOPNOTSUPP;
 }
+static inline int
+rxe_odp_mr_copy(struct rxe_mr *mr, u64 iova, void *addr,
+		int length, enum rxe_mr_copy_dir dir)
+{
+	return -EOPNOTSUPP;
+}
 
 #endif /* CONFIG_INFINIBAND_ON_DEMAND_PAGING */
 
diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
index 384cb4ba1f2d..f0ce87c0fc7d 100644
--- a/drivers/infiniband/sw/rxe/rxe_mr.c
+++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -247,7 +247,12 @@ int rxe_mr_copy_xarray(struct rxe_mr *mr, u64 iova, void *addr, void *va; while (length) { - page = xa_load(&mr->page_list, index); + if (mr->umem->is_odp) + page = xa_untag_pointer(xa_load(&mr->page_list, + index)); + else + page = xa_load(&mr->page_list, index); + if (!page) return -EFAULT; @@ -319,7 +324,7 @@ int rxe_mr_copy(struct rxe_mr *mr, u64 iova, void *addr, } if (mr->umem->is_odp) - return -EOPNOTSUPP; + return rxe_odp_mr_copy(mr, iova, addr, length, dir); else return rxe_mr_copy_xarray(mr, iova, addr, length, dir); } diff --git a/drivers/infiniband/sw/rxe/rxe_odp.c b/drivers/infiniband/sw/rxe/rxe_odp.c index c5e24901c141..5aa09b9c1095 100644 --- a/drivers/infiniband/sw/rxe/rxe_odp.c +++ b/drivers/infiniband/sw/rxe/rxe_odp.c @@ -177,3 +177,80 @@ int rxe_odp_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, return err; } + +/* Take xarray spinlock before entry */ +static inline bool rxe_odp_check_pages(struct rxe_mr *mr, u64 iova, + int length, u32 flags) +{ + unsigned long upper = rxe_mr_iova_to_index(mr, iova + length - 1); + unsigned long lower = rxe_mr_iova_to_index(mr, iova); + bool need_fault = false; + void *page, *entry; + size_t perm = 0; + + + if (!(flags & RXE_PAGEFAULT_RDONLY)) + perm = RXE_ODP_WRITABLE_BIT; + + XA_STATE(xas, &mr->page_list, lower); + + while (xas.xa_index <= upper) { + page = xas_load(&xas); + + /* Check page presence and write permission */ + if (!page || (perm && !(xa_pointer_tag(page) & perm))) { + need_fault = true; + break; + } + entry = xas_next(&xas); + } + + return need_fault; +} + +int rxe_odp_mr_copy(struct rxe_mr *mr, u64 iova, void *addr, int length, + enum rxe_mr_copy_dir dir) +{ + struct ib_umem_odp *umem_odp = to_ib_umem_odp(mr->umem); + u32 flags = 0; + int err; + + if (unlikely(!mr->umem->is_odp)) + return -EOPNOTSUPP; + + switch (dir) { + case RXE_TO_MR_OBJ: + break; + + case RXE_FROM_MR_OBJ: + flags = RXE_PAGEFAULT_RDONLY; + break; + + default: + return -EINVAL; + } + + spin_lock(&mr->page_list.xa_lock); + + if (rxe_odp_check_pages(mr, iova, length, flags)) { + spin_unlock(&mr->page_list.xa_lock); + + /* umem_mutex is locked on success */ + err = rxe_odp_do_pagefault_and_lock(mr, iova, length, flags); + if (err < 0) + return err; + + /* + * The spinlock is always locked under mutex_lock except + * for MR initialization. No worry about deadlock. 
+		 */
+		spin_lock(&mr->page_list.xa_lock);
+		mutex_unlock(&umem_odp->umem_mutex);
+	}
+
+	err = rxe_mr_copy_xarray(mr, iova, addr, length, dir);
+
+	spin_unlock(&mr->page_list.xa_lock);
+
+	return err;
+}
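The copy path in this patch distinguishes read and write access through
tagged xarray entries: rxe_mr_set_xarray() stores the page pointer with
RXE_ODP_WRITABLE_BIT set when the PFN carries HMM_PFN_WRITE, and
rxe_odp_check_pages() tests that tag before allowing a write, while
rxe_mr_copy_xarray() strips it with xa_untag_pointer(). The userspace sketch
below shows the underlying tagged-pointer idea; tag_pointer()/untag_pointer()
are hypothetical helpers mirroring the xarray API, not the kernel functions
themselves.

/*
 * Tagged-pointer illustration: because page pointers are aligned, the
 * low bit can carry a "writable" flag. Userspace sketch only.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define WRITABLE_BIT 1UL

static void *tag_pointer(void *p, unsigned long tag)
{
	return (void *)((uintptr_t)p | tag);
}

static void *untag_pointer(void *p)
{
	return (void *)((uintptr_t)p & ~WRITABLE_BIT);
}

static unsigned long pointer_tag(void *p)
{
	return (uintptr_t)p & WRITABLE_BIT;
}

int main(void)
{
	/* Stand-in for a struct page pointer; aligned, so bit 0 is free. */
	void *page = aligned_alloc(64, 64);
	void *entry = tag_pointer(page, WRITABLE_BIT);

	/* The copy path checks the tag before allowing a write... */
	bool writable = pointer_tag(entry) != 0;
	/* ...and strips it before using the page pointer. */
	void *real = untag_pointer(entry);

	printf("writable=%d, pointer restored: %s\n",
	       writable, real == page ? "yes" : "no");
	free(page);
	return 0;
}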
From patchwork Thu Nov 9 05:44:52 2023
X-Patchwork-Submitter: "Daisuke Matsuda (Fujitsu)"
X-Patchwork-Id: 163234
From: Daisuke Matsuda
To: linux-rdma@vger.kernel.org, leon@kernel.org, jgg@ziepe.ca, zyjzyj2000@gmail.com
Cc: linux-kernel@vger.kernel.org, rpearsonhpe@gmail.com, yangx.jy@fujitsu.com, lizhijian@fujitsu.com, y-goto@fujitsu.com, Daisuke Matsuda
Subject: [PATCH for-next v7 7/7] RDMA/rxe: Add support for the traditional Atomic operations with ODP
Date: Thu, 9 Nov 2023 14:44:52 +0900

Enable 'fetch and add' and 'compare and swap'
operations to be used with ODP. This is comprised of the following steps: 1. Verify that the page is present with write permission. 2. If OK, execute the operation and exit. 3. If not, then trigger page fault to map the page. 4. Update the entry in the MR xarray. 5. Execute the operation. Signed-off-by: Daisuke Matsuda --- drivers/infiniband/sw/rxe/rxe.c | 1 + drivers/infiniband/sw/rxe/rxe_loc.h | 9 ++++++++ drivers/infiniband/sw/rxe/rxe_mr.c | 7 +++++- drivers/infiniband/sw/rxe/rxe_odp.c | 33 ++++++++++++++++++++++++++++ drivers/infiniband/sw/rxe/rxe_resp.c | 5 ++++- 5 files changed, 53 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c index 207a022156f0..abd3267c2873 100644 --- a/drivers/infiniband/sw/rxe/rxe.c +++ b/drivers/infiniband/sw/rxe/rxe.c @@ -88,6 +88,7 @@ static void rxe_init_device_param(struct rxe_dev *rxe) rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_RECV; rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_WRITE; rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_READ; + rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_ATOMIC; rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_SRQ_RECV; } } diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h index eeaeff8a1398..0bae9044f362 100644 --- a/drivers/infiniband/sw/rxe/rxe_loc.h +++ b/drivers/infiniband/sw/rxe/rxe_loc.h @@ -194,6 +194,9 @@ int rxe_odp_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova, int access_flags, struct rxe_mr *mr); int rxe_odp_mr_copy(struct rxe_mr *mr, u64 iova, void *addr, int length, enum rxe_mr_copy_dir dir); +int rxe_odp_mr_atomic_op(struct rxe_mr *mr, u64 iova, int opcode, + u64 compare, u64 swap_add, u64 *orig_val); + #else /* CONFIG_INFINIBAND_ON_DEMAND_PAGING */ static inline int rxe_odp_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova, @@ -207,6 +210,12 @@ rxe_odp_mr_copy(struct rxe_mr *mr, u64 iova, void *addr, { return -EOPNOTSUPP; } +static inline int +rxe_odp_mr_atomic_op(struct rxe_mr *mr, u64 iova, int opcode, + u64 compare, u64 swap_add, u64 *orig_val) +{ + return RESPST_ERR_UNSUPPORTED_OPCODE; +} #endif /* CONFIG_INFINIBAND_ON_DEMAND_PAGING */ diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c index f0ce87c0fc7d..0dc452ab772b 100644 --- a/drivers/infiniband/sw/rxe/rxe_mr.c +++ b/drivers/infiniband/sw/rxe/rxe_mr.c @@ -498,7 +498,12 @@ int rxe_mr_do_atomic_op(struct rxe_mr *mr, u64 iova, int opcode, } page_offset = rxe_mr_iova_to_page_offset(mr, iova); index = rxe_mr_iova_to_index(mr, iova); - page = xa_load(&mr->page_list, index); + + if (mr->umem->is_odp) + page = xa_untag_pointer(xa_load(&mr->page_list, index)); + else + page = xa_load(&mr->page_list, index); + if (!page) return RESPST_ERR_RKEY_VIOLATION; } diff --git a/drivers/infiniband/sw/rxe/rxe_odp.c b/drivers/infiniband/sw/rxe/rxe_odp.c index 5aa09b9c1095..45b54ba15210 100644 --- a/drivers/infiniband/sw/rxe/rxe_odp.c +++ b/drivers/infiniband/sw/rxe/rxe_odp.c @@ -254,3 +254,36 @@ int rxe_odp_mr_copy(struct rxe_mr *mr, u64 iova, void *addr, int length, return err; } + +int rxe_odp_mr_atomic_op(struct rxe_mr *mr, u64 iova, int opcode, + u64 compare, u64 swap_add, u64 *orig_val) +{ + struct ib_umem_odp *umem_odp = to_ib_umem_odp(mr->umem); + int err; + + spin_lock(&mr->page_list.xa_lock); + + /* Atomic operations manipulate a single char. 
*/ + if (rxe_odp_check_pages(mr, iova, sizeof(char), 0)) { + spin_unlock(&mr->page_list.xa_lock); + + /* umem_mutex is locked on success */ + err = rxe_odp_do_pagefault_and_lock(mr, iova, sizeof(char), 0); + if (err < 0) + return err; + + /* + * The spinlock is always locked under mutex_lock except + * for MR initialization. No worry about deadlock. + */ + spin_lock(&mr->page_list.xa_lock); + mutex_unlock(&umem_odp->umem_mutex); + } + + err = rxe_mr_do_atomic_op(mr, iova, opcode, compare, + swap_add, orig_val); + + spin_unlock(&mr->page_list.xa_lock); + + return err; +} diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c index 9159f1bdfc6f..af3e669679a0 100644 --- a/drivers/infiniband/sw/rxe/rxe_resp.c +++ b/drivers/infiniband/sw/rxe/rxe_resp.c @@ -693,7 +693,10 @@ static enum resp_states atomic_reply(struct rxe_qp *qp, u64 iova = qp->resp.va + qp->resp.offset; if (mr->umem->is_odp) - err = RESPST_ERR_UNSUPPORTED_OPCODE; + err = rxe_odp_mr_atomic_op(mr, iova, pkt->opcode, + atmeth_comp(pkt), + atmeth_swap_add(pkt), + &res->atomic.orig_val); else err = rxe_mr_do_atomic_op(mr, iova, pkt->opcode, atmeth_comp(pkt),
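The responder-side atomic path enabled here boils down to the two
traditional InfiniBand atomics operating on one aligned 64-bit word of the
faulted-in page, with the original value returned to the requester. The
standalone C sketch below illustrates just those semantics;
atomic_cmp_swap()/atomic_fetch_add() are illustrative helpers, not the
driver's rxe_mr_do_atomic_op().

/*
 * Semantics of the two IB atomics: both return the original value,
 * as reported back in the ATOMIC ACKNOWLEDGE payload.
 * Plain C illustration, not responder code.
 */
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

static uint64_t atomic_cmp_swap(uint64_t *va, uint64_t compare, uint64_t swap)
{
	uint64_t orig = *va;

	/* Swap in the new value only if the compare operand matches. */
	if (orig == compare)
		*va = swap;
	return orig;
}

static uint64_t atomic_fetch_add(uint64_t *va, uint64_t add)
{
	uint64_t orig = *va;

	*va = orig + add;
	return orig;
}

int main(void)
{
	uint64_t val = 100;
	uint64_t orig;

	orig = atomic_fetch_add(&val, 7);
	printf("fetch&add(7): orig=%" PRIu64 ", now=%" PRIu64 "\n", orig, val);

	orig = atomic_cmp_swap(&val, 107, 5);
	printf("cmp&swap(107->5): orig=%" PRIu64 ", now=%" PRIu64 "\n", orig, val);
	return 0;
}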