From patchwork Fri Sep 8 06:26:42 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Daisuke Matsuda (Fujitsu)" X-Patchwork-Id: 137779 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:ab0a:0:b0:3f2:4152:657d with SMTP id m10csp715281vqo; Fri, 8 Sep 2023 11:21:50 -0700 (PDT) X-Google-Smtp-Source: AGHT+IHJ4UXGMYokJSXPW6jq4D64AhLZ23M8pFiakYUXb6G65BSZmIDgfwxSpHPXkpuhRbBBjfE6 X-Received: by 2002:ac2:53ad:0:b0:500:7de4:300e with SMTP id j13-20020ac253ad000000b005007de4300emr2228956lfh.58.1694197309910; Fri, 08 Sep 2023 11:21:49 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1694197309; cv=none; d=google.com; s=arc-20160816; b=ZmZKx+oVSjM/3deKISF2FtID5TQ5N1BX83vK6LPmfCdPUEkzY7PlhRwbCG36beqjTp JcVoGly9/gtdHhW5eZpvrqk2GyQnWj/D0ii/ynnVjPRL25NBt5qsRJGmI47I1CAsnVOC IRVGa72Mvr7OWai0rOtBA+aPOA/FOD+Q93WbJktsTFFMei+aQcrp0kvYcNKUO4TULl/q iSuC2KyVwjYKb09XpkIRrbNjKIN/3wXWoLZpVY9N/1NeoVhY0ChB/y2I31aBpGrrcuf7 kgSJXMx+M9bVh/zyP7Mz5htfY4OSIrbYxIbKI0ayMMFjFdx5a6yRYXqGNLzSlYDSCU6v QJQA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from; bh=Pkaty/FHFVi3m8oRN+5cPb6VyaU2c53FHXdIIxR8H3I=; fh=Kza5GnIpHUYQ1cZy2y92CPl6XrzN03vBDGjf8wFwAS8=; b=YHXDovu7QLQiQd13HfScjw2sr4r2V7Hx7Y42n6Ha3MyEQOeQlOD2ENA6IUi3BaQUD2 FLE9qGtCIjVEB4YnkjltwOGFJnIRmPJP3M65kGbpCO0Opo6pOQJa2K0SbKN+WtGh2nC2 L98pHYMrFOhlpFkP7hkQ4DGfaDTHN7KEwN1ZGSj9R9Na7XemWxc3mjdwIa2toI1ZhQD8 q9iErR0TU3BsVBs8F5ef91LMX11VWAWtiTlbbG13lXE4Fbm+iDK/cAvPwQFwdoG8XkwQ 5xHXrsAqQPtPfhfG6ZIJGDScR5ZVATo6Y1xbcqw4yjh5WZnSbxXznez6doFFRNqMVk0s KLHg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) 
header.from=fujitsu.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id fy12-20020a170906b7cc00b009934b1eb56dsi1809128ejb.11.2023.09.08.11.21.23; Fri, 08 Sep 2023 11:21:49 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=fujitsu.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232784AbjIHG3f (ORCPT + 99 others); Fri, 8 Sep 2023 02:29:35 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51768 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229822AbjIHG3e (ORCPT ); Fri, 8 Sep 2023 02:29:34 -0400 X-Greylist: delayed 101 seconds by postgrey-1.37 at lindbergh.monkeyblade.net; Thu, 07 Sep 2023 23:28:46 PDT Received: from esa2.hc1455-7.c3s2.iphmx.com (esa2.hc1455-7.c3s2.iphmx.com [207.54.90.48]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 07EB01FDF; Thu, 7 Sep 2023 23:28:45 -0700 (PDT) X-IronPort-AV: E=McAfee;i="6600,9927,10826"; a="131278177" X-IronPort-AV: E=Sophos;i="6.02,236,1688396400"; d="scan'208";a="131278177" Received: from unknown (HELO yto-r1.gw.nic.fujitsu.com) ([218.44.52.217]) by esa2.hc1455-7.c3s2.iphmx.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Sep 2023 15:27:01 +0900 Received: from yto-m2.gw.nic.fujitsu.com (yto-nat-yto-m2.gw.nic.fujitsu.com [192.168.83.65]) by yto-r1.gw.nic.fujitsu.com (Postfix) with ESMTP id 8A39CDB3C7; Fri, 8 Sep 2023 15:26:58 +0900 (JST) Received: from m3002.s.css.fujitsu.com (msm3.b.css.fujitsu.com [10.128.233.104]) by yto-m2.gw.nic.fujitsu.com (Postfix) with ESMTP id CE8A9D67AC; Fri, 8 Sep 2023 
15:26:57 +0900 (JST)
Received: from localhost.localdomain (unknown [10.118.237.107]) by m3002.s.css.fujitsu.com (Postfix) with ESMTP id 9B9B82005B08; Fri, 8 Sep 2023 15:26:57 +0900 (JST)
From: Daisuke Matsuda
To: linux-rdma@vger.kernel.org, leon@kernel.org, jgg@ziepe.ca, zyjzyj2000@gmail.com
Cc: linux-kernel@vger.kernel.org, rpearsonhpe@gmail.com, yangx.jy@fujitsu.com, lizhijian@fujitsu.com, y-goto@fujitsu.com, Daisuke Matsuda
Subject: [PATCH for-next v6 1/7] RDMA/rxe: Always defer tasks on responder and completer to workqueue
Date: Fri, 8 Sep 2023 15:26:42 +0900
Message-Id: <7699a90bc4af10c33c0a46ef6330ed4bb7e7ace6.1694153251.git.matsuda-daisuke@fujitsu.com>
X-Mailer: git-send-email 2.39.1
In-Reply-To:
References:
MIME-Version: 1.0
X-TM-AS-GCONF: 00
X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_LOW, SPF_HELO_PASS,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
X-getmail-retrieved-from-mailbox: INBOX
X-GMAIL-THRID: 1776494638037064085
X-GMAIL-MSGID: 1776494638037064085

Both the responder and the completer need to sleep to handle page faults
when used with ODP. A fault can occur whenever they access user MRs, so in
such cases the tasks must be executed in process context.

Additionally, the current implementation seldom defers tasks to the
workqueue; instead it usually runs do_task() directly in softirq context.
do_task() is called from rxe_resp_queue_pkt() and rxe_comp_queue_pkt() in
SOFTIRQ_NET_RX context and can keep running for up to RXE_MAX_ITERATIONS
(=1024) loop iterations. The problem is that this task execution shows up as
anonymous load in the system, and that the loop can throttle other softirqs
on the same CPU.

This patch makes the responder and completer code run in process context,
both for ODP and to address the problem described above.

Reviewed-by: Bob Pearson
Signed-off-by: Daisuke Matsuda
---
 drivers/infiniband/sw/rxe/rxe_comp.c        | 12 +-----------
 drivers/infiniband/sw/rxe/rxe_hw_counters.c |  1 -
 drivers/infiniband/sw/rxe/rxe_hw_counters.h |  1 -
 drivers/infiniband/sw/rxe/rxe_resp.c        | 13 +------------
 4 files changed, 2 insertions(+), 25 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c
index d0bdc2d8adc8..bb016a43330d 100644
--- a/drivers/infiniband/sw/rxe/rxe_comp.c
+++ b/drivers/infiniband/sw/rxe/rxe_comp.c
@@ -129,18 +129,8 @@ void retransmit_timer(struct timer_list *t)
 
 void rxe_comp_queue_pkt(struct rxe_qp *qp, struct sk_buff *skb)
 {
-	int must_sched;
-
 	skb_queue_tail(&qp->resp_pkts, skb);
-
-	must_sched = skb_queue_len(&qp->resp_pkts) > 1;
-	if (must_sched != 0)
-		rxe_counter_inc(SKB_TO_PKT(skb)->rxe, RXE_CNT_COMPLETER_SCHED);
-
-	if (must_sched)
-		rxe_sched_task(&qp->comp.task);
-	else
-		rxe_run_task(&qp->comp.task);
+	rxe_sched_task(&qp->comp.task);
 }
 
 static inline enum comp_state get_wqe(struct rxe_qp *qp,
diff --git a/drivers/infiniband/sw/rxe/rxe_hw_counters.c b/drivers/infiniband/sw/rxe/rxe_hw_counters.c
index a012522b577a..dc23cf3a6967 100644
--- a/drivers/infiniband/sw/rxe/rxe_hw_counters.c
+++ b/drivers/infiniband/sw/rxe/rxe_hw_counters.c
@@ -14,7 +14,6 @@ static const struct rdma_stat_desc rxe_counter_descs[] = {
 	[RXE_CNT_RCV_RNR].name = "rcvd_rnr_err",
 	[RXE_CNT_SND_RNR].name = "send_rnr_err",
 	[RXE_CNT_RCV_SEQ_ERR].name = "rcvd_seq_err",
-	[RXE_CNT_COMPLETER_SCHED].name = "ack_deferred",
 	[RXE_CNT_RETRY_EXCEEDED].name = "retry_exceeded_err",
 	[RXE_CNT_RNR_RETRY_EXCEEDED].name = "retry_rnr_exceeded_err",
 	[RXE_CNT_COMP_RETRY].name = "completer_retry_err",
diff --git a/drivers/infiniband/sw/rxe/rxe_hw_counters.h b/drivers/infiniband/sw/rxe/rxe_hw_counters.h
index 71f4d4fa9dc8..303da0e3134a 100644
--- a/drivers/infiniband/sw/rxe/rxe_hw_counters.h
+++ b/drivers/infiniband/sw/rxe/rxe_hw_counters.h
@@ -18,7 +18,6 @@ enum rxe_counters {
 	RXE_CNT_RCV_RNR,
 	RXE_CNT_SND_RNR,
 	RXE_CNT_RCV_SEQ_ERR,
-	RXE_CNT_COMPLETER_SCHED,
 	RXE_CNT_RETRY_EXCEEDED,
 	RXE_CNT_RNR_RETRY_EXCEEDED,
 	RXE_CNT_COMP_RETRY,
diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
index da470a925efc..969e057bbfd1 100644
--- a/drivers/infiniband/sw/rxe/rxe_resp.c
+++ b/drivers/infiniband/sw/rxe/rxe_resp.c
@@ -46,21 +46,10 @@ static char *resp_state_name[] = {
 	[RESPST_EXIT] = "EXIT",
 };
 
-/* rxe_recv calls here to add a request packet to the input queue */
 void rxe_resp_queue_pkt(struct rxe_qp *qp, struct sk_buff *skb)
 {
-	int must_sched;
-	struct rxe_pkt_info *pkt = SKB_TO_PKT(skb);
-
 	skb_queue_tail(&qp->req_pkts, skb);
-
-	must_sched = (pkt->opcode == IB_OPCODE_RC_RDMA_READ_REQUEST) ||
-			(skb_queue_len(&qp->req_pkts) > 1);
-
-	if (must_sched)
-		rxe_sched_task(&qp->resp.task);
-	else
-		rxe_run_task(&qp->resp.task);
+	rxe_sched_task(&qp->resp.task);
 }
 
 static inline enum resp_states get_req(struct rxe_qp *qp,

From patchwork Fri Sep 8 06:26:43 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Daisuke Matsuda (Fujitsu)"
X-Patchwork-Id: 137701
From: Daisuke Matsuda
To: linux-rdma@vger.kernel.org, leon@kernel.org, jgg@ziepe.ca, zyjzyj2000@gmail.com
Cc: linux-kernel@vger.kernel.org, rpearsonhpe@gmail.com, yangx.jy@fujitsu.com, lizhijian@fujitsu.com, y-goto@fujitsu.com, Daisuke Matsuda
Subject: [PATCH for-next v6 2/7] RDMA/rxe: Make MR functions accessible from other rxe source code
Date: Fri, 8 Sep 2023 15:26:43 +0900
Message-Id: <78a170cbd55fce11f455968016cd3a161822ccd0.1694153251.git.matsuda-daisuke@fujitsu.com>
X-Mailer: git-send-email 2.39.1
In-Reply-To:
References:
MIME-Version: 1.0
X-TM-AS-GCONF: 00
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_BLOCKED,SPF_HELO_PASS,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
X-getmail-retrieved-from-mailbox: INBOX
X-GMAIL-THRID: 1776451962631956576
X-GMAIL-MSGID: 1776451962631956576

Some functions in rxe_mr.c are going to be used in rxe_odp.c, which is to be
created in the subsequent patch. List the declarations of the functions in
rxe_loc.h.

Signed-off-by: Daisuke Matsuda
---
 drivers/infiniband/sw/rxe/rxe_loc.h |  8 ++++++++
 drivers/infiniband/sw/rxe/rxe_mr.c  | 11 +++--------
 2 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
index 4d2a8ef52c85..eb867f7d0d36 100644
--- a/drivers/infiniband/sw/rxe/rxe_loc.h
+++ b/drivers/infiniband/sw/rxe/rxe_loc.h
@@ -58,6 +58,7 @@ int rxe_mmap(struct ib_ucontext *context, struct vm_area_struct *vma);
 
 /* rxe_mr.c */
 u8 rxe_get_next_key(u32 last_key);
+void rxe_mr_init(int access, struct rxe_mr *mr);
 void rxe_mr_init_dma(int access, struct rxe_mr *mr);
 int rxe_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova,
 		     int access, struct rxe_mr *mr);
@@ -69,6 +70,8 @@ int copy_data(struct rxe_pd *pd, int access, struct rxe_dma_info *dma,
 	      void *addr, int length, enum rxe_mr_copy_dir dir);
 int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg,
 		  int sg_nents, unsigned int *sg_offset);
+int rxe_mr_copy_xarray(struct rxe_mr *mr, u64 iova, void *addr,
+		       unsigned int length, enum rxe_mr_copy_dir dir);
 int rxe_mr_do_atomic_op(struct rxe_mr *mr, u64 iova, int opcode,
 			u64 compare, u64 swap_add, u64 *orig_val);
 int rxe_mr_do_atomic_write(struct rxe_mr *mr, u64 iova, u64 value);
@@ -80,6 +83,11 @@ int rxe_invalidate_mr(struct rxe_qp *qp, u32 key);
 int rxe_reg_fast_mr(struct rxe_qp *qp, struct rxe_send_wqe *wqe);
 void rxe_mr_cleanup(struct rxe_pool_elem *elem);
 
+static inline unsigned long rxe_mr_iova_to_index(struct rxe_mr *mr, u64 iova)
+{
+	return (iova >> mr->page_shift) - (mr->ibmr.iova >> mr->page_shift);
+}
+
 /* rxe_mw.c */
 int rxe_alloc_mw(struct ib_mw *ibmw, struct ib_udata *udata);
 int rxe_dealloc_mw(struct ib_mw *ibmw);
diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
index f54042e9aeb2..86b1908d304b 100644
--- a/drivers/infiniband/sw/rxe/rxe_mr.c
+++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -45,7 +45,7 @@ int mr_check_range(struct rxe_mr *mr, u64 iova, size_t length)
 	}
 }
 
-static void rxe_mr_init(int access, struct rxe_mr *mr)
+void rxe_mr_init(int access, struct rxe_mr *mr)
 {
 	u32 key = mr->elem.index << 8 | rxe_get_next_key(-1);
 
@@ -72,11 +72,6 @@ void rxe_mr_init_dma(int access, struct rxe_mr *mr)
 	mr->ibmr.type = IB_MR_TYPE_DMA;
 }
 
-static unsigned long rxe_mr_iova_to_index(struct rxe_mr *mr, u64 iova)
-{
-	return (iova >> mr->page_shift) - (mr->ibmr.iova >> mr->page_shift);
-}
-
 static unsigned long rxe_mr_iova_to_page_offset(struct rxe_mr *mr, u64 iova)
 {
 	return iova & (mr_page_size(mr) - 1);
@@ -242,8 +237,8 @@ int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sgl,
 	return ib_sg_to_pages(ibmr, sgl, sg_nents, sg_offset, rxe_set_page);
 }
 
-static int rxe_mr_copy_xarray(struct rxe_mr *mr, u64 iova, void *addr,
-			      unsigned int length, enum rxe_mr_copy_dir dir)
+int rxe_mr_copy_xarray(struct rxe_mr *mr, u64 iova, void *addr,
+		       unsigned int length, enum rxe_mr_copy_dir dir)
 {
 	unsigned int page_offset = rxe_mr_iova_to_page_offset(mr, iova);
 	unsigned long index = rxe_mr_iova_to_index(mr, iova);

From patchwork Fri Sep 8 06:26:44 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Daisuke Matsuda (Fujitsu)"
X-Patchwork-Id: 137702
From: Daisuke Matsuda
To: linux-rdma@vger.kernel.org, leon@kernel.org, jgg@ziepe.ca, zyjzyj2000@gmail.com
Cc: linux-kernel@vger.kernel.org, rpearsonhpe@gmail.com, yangx.jy@fujitsu.com, lizhijian@fujitsu.com, y-goto@fujitsu.com, Daisuke Matsuda
Subject: [PATCH for-next v6 3/7] RDMA/rxe: Move resp_states definition to rxe_verbs.h
Date: Fri, 8 Sep 2023 15:26:44 +0900
Message-Id: <609cbbed75f10539578383c5ffab9ef208be82c6.1694153251.git.matsuda-daisuke@fujitsu.com>
X-Mailer: git-send-email 2.39.1
In-Reply-To:
References:
MIME-Version: 1.0
X-TM-AS-GCONF: 00
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_BLOCKED,SPF_HELO_PASS,SPF_NONE,UPPERCASE_50_75 autolearn=no autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
X-getmail-retrieved-from-mailbox: INBOX
X-GMAIL-THRID: 1776453867550611981
X-GMAIL-MSGID: 1776453867550611981

To use the resp_states values in rxe_loc.h, it is necessary to move the
definition to rxe_verbs.h, where other internal states of this driver are
defined.

Reviewed-by: Bob Pearson
Signed-off-by: Daisuke Matsuda
---
 drivers/infiniband/sw/rxe/rxe.h       | 37 ---------------------------
 drivers/infiniband/sw/rxe/rxe_verbs.h | 37 +++++++++++++++++++++++++++
 2 files changed, 37 insertions(+), 37 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe.h b/drivers/infiniband/sw/rxe/rxe.h
index d33dd6cf83d3..9b4d044a1264 100644
--- a/drivers/infiniband/sw/rxe/rxe.h
+++ b/drivers/infiniband/sw/rxe/rxe.h
@@ -100,43 +100,6 @@
 #define rxe_info_mw(mw, fmt, ...) ibdev_info_ratelimited((mw)->ibmw.device, \
 		"mw#%d %s: " fmt, (mw)->elem.index, __func__, ##__VA_ARGS__)
 
-/* responder states */
-enum resp_states {
-	RESPST_NONE,
-	RESPST_GET_REQ,
-	RESPST_CHK_PSN,
-	RESPST_CHK_OP_SEQ,
-	RESPST_CHK_OP_VALID,
-	RESPST_CHK_RESOURCE,
-	RESPST_CHK_LENGTH,
-	RESPST_CHK_RKEY,
-	RESPST_EXECUTE,
-	RESPST_READ_REPLY,
-	RESPST_ATOMIC_REPLY,
-	RESPST_ATOMIC_WRITE_REPLY,
-	RESPST_PROCESS_FLUSH,
-	RESPST_COMPLETE,
-	RESPST_ACKNOWLEDGE,
-	RESPST_CLEANUP,
-	RESPST_DUPLICATE_REQUEST,
-	RESPST_ERR_MALFORMED_WQE,
-	RESPST_ERR_UNSUPPORTED_OPCODE,
-	RESPST_ERR_MISALIGNED_ATOMIC,
-	RESPST_ERR_PSN_OUT_OF_SEQ,
-	RESPST_ERR_MISSING_OPCODE_FIRST,
-	RESPST_ERR_MISSING_OPCODE_LAST_C,
-	RESPST_ERR_MISSING_OPCODE_LAST_D1E,
-	RESPST_ERR_TOO_MANY_RDMA_ATM_REQ,
-	RESPST_ERR_RNR,
-	RESPST_ERR_RKEY_VIOLATION,
-	RESPST_ERR_INVALIDATE_RKEY,
-	RESPST_ERR_LENGTH,
-	RESPST_ERR_CQ_OVERFLOW,
-	RESPST_ERROR,
-	RESPST_DONE,
-	RESPST_EXIT,
-};
-
 void rxe_set_mtu(struct rxe_dev *rxe, unsigned int dev_mtu);
 
 int rxe_add(struct rxe_dev *rxe, unsigned int mtu, const char *ibdev_name);
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index ccb9d19ffe8a..1058b5de8920 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -127,6 +127,43 @@ struct rxe_comp_info {
 	struct rxe_task task;
 };
 
+/* responder states */
+enum resp_states {
+	RESPST_NONE,
+	RESPST_GET_REQ,
+	RESPST_CHK_PSN,
+	RESPST_CHK_OP_SEQ,
+	RESPST_CHK_OP_VALID,
+	RESPST_CHK_RESOURCE,
+	RESPST_CHK_LENGTH,
+	RESPST_CHK_RKEY,
+	RESPST_EXECUTE,
+	RESPST_READ_REPLY,
+	RESPST_ATOMIC_REPLY,
+	RESPST_ATOMIC_WRITE_REPLY,
+	RESPST_PROCESS_FLUSH,
+	RESPST_COMPLETE,
+	RESPST_ACKNOWLEDGE,
+	RESPST_CLEANUP,
+	RESPST_DUPLICATE_REQUEST,
+	RESPST_ERR_MALFORMED_WQE,
+	RESPST_ERR_UNSUPPORTED_OPCODE,
+	RESPST_ERR_MISALIGNED_ATOMIC,
+	RESPST_ERR_PSN_OUT_OF_SEQ,
+	RESPST_ERR_MISSING_OPCODE_FIRST,
+	RESPST_ERR_MISSING_OPCODE_LAST_C,
+	RESPST_ERR_MISSING_OPCODE_LAST_D1E,
+	RESPST_ERR_TOO_MANY_RDMA_ATM_REQ,
+	RESPST_ERR_RNR,
+	RESPST_ERR_RKEY_VIOLATION,
+	RESPST_ERR_INVALIDATE_RKEY,
+	RESPST_ERR_LENGTH,
+	RESPST_ERR_CQ_OVERFLOW,
+	RESPST_ERROR,
+	RESPST_DONE,
+	RESPST_EXIT,
+};
+
 enum rdatm_res_state {
 	rdatm_res_state_next,
 	rdatm_res_state_new,

From patchwork Fri Sep 8 06:26:45 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Daisuke Matsuda (Fujitsu)"
X-Patchwork-Id: 137712
From: Daisuke Matsuda
To: linux-rdma@vger.kernel.org, leon@kernel.org, jgg@ziepe.ca, zyjzyj2000@gmail.com
Cc: linux-kernel@vger.kernel.org, rpearsonhpe@gmail.com, yangx.jy@fujitsu.com, lizhijian@fujitsu.com, y-goto@fujitsu.com, Daisuke Matsuda
Subject: [PATCH for-next v6 4/7] RDMA/rxe: Add page invalidation support
Date: Fri, 8 Sep 2023 15:26:45 +0900
Message-Id: <1566fd3c63e4dac66717731e2c7a80039244e3af.1694153251.git.matsuda-daisuke@fujitsu.com>
X-Mailer: git-send-email 2.39.1
In-Reply-To:
References:
MIME-Version: 1.0
X-TM-AS-GCONF: 00
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_BLOCKED,SPF_HELO_PASS,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
X-getmail-retrieved-from-mailbox: INBOX
X-GMAIL-THRID: 1776463609468400871
X-GMAIL-MSGID: 1776463609468400871

On page invalidation, an MMU notifier callback is invoked to unmap DMA
addresses and update the driver page table (umem_odp->dma_list). It also
sets the corresponding entries in the MR xarray to NULL to prevent any
access. The callback is registered when an ODP-enabled MR is created.

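The invalidation path described above can be modelled in user space. The
following is only a minimal sketch under stated assumptions, not the driver
code: struct model_mr, model_iova_to_index() and model_invalidate() are
hypothetical names, a fixed 4 KiB page size is assumed, and a plain array
stands in for the MR xarray. It shows the two steps performed on the MR side:
clamping the notified range to the MR, then clearing every page slot that the
clamped range touches. Locking and DMA unmapping are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12u	/* assumption: 4 KiB pages */
#define MODEL_MAX_PAGES  16u	/* assumption: small fixed-size MR */

/* Hypothetical stand-in for the few rxe_mr fields used by this sketch. */
struct model_mr {
	uint64_t iova;				/* first address covered by the MR */
	uint64_t length;			/* size of the MR in bytes */
	void *page_list[MODEL_MAX_PAGES];	/* array standing in for the xarray */
};

/* Same arithmetic as rxe_mr_iova_to_index(): page index relative to the MR. */
static size_t model_iova_to_index(const struct model_mr *mr, uint64_t iova)
{
	return (size_t)((iova >> MODEL_PAGE_SHIFT) -
			(mr->iova >> MODEL_PAGE_SHIFT));
}

/*
 * Clamp [start, end) to the MR and set every overlapping slot to NULL.
 * Returns the number of slots cleared (0 if the range misses the MR).
 */
static size_t model_invalidate(struct model_mr *mr, uint64_t start,
			       uint64_t end)
{
	uint64_t mr_end = mr->iova + mr->length;
	size_t lower, upper, i;

	if (start < mr->iova)
		start = mr->iova;
	if (end > mr_end)
		end = mr_end;
	if (start >= end)
		return 0;

	lower = model_iova_to_index(mr, start);
	upper = model_iova_to_index(mr, end - 1);	/* end is exclusive */
	assert(upper < MODEL_MAX_PAGES);

	for (i = lower; i <= upper; i++)
		mr->page_list[i] = NULL;

	return upper - lower + 1;
}
```

For an MR at iova 0x10000 covering four pages, model_invalidate(&mr, 0x11800,
0x13000) clears slots 1 and 2 and leaves slots 0 and 3 untouched, because the
end of the range is exclusive.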
Signed-off-by: Daisuke Matsuda
---
 drivers/infiniband/sw/rxe/Makefile  |  2 +
 drivers/infiniband/sw/rxe/rxe_odp.c | 64 +++++++++++++++++++++++++++++
 2 files changed, 66 insertions(+)
 create mode 100644 drivers/infiniband/sw/rxe/rxe_odp.c

diff --git a/drivers/infiniband/sw/rxe/Makefile b/drivers/infiniband/sw/rxe/Makefile
index 5395a581f4bb..93134f1d1d0c 100644
--- a/drivers/infiniband/sw/rxe/Makefile
+++ b/drivers/infiniband/sw/rxe/Makefile
@@ -23,3 +23,5 @@ rdma_rxe-y := \
 	rxe_task.o \
 	rxe_net.o \
 	rxe_hw_counters.o
+
+rdma_rxe-$(CONFIG_INFINIBAND_ON_DEMAND_PAGING) += rxe_odp.o
diff --git a/drivers/infiniband/sw/rxe/rxe_odp.c b/drivers/infiniband/sw/rxe/rxe_odp.c
new file mode 100644
index 000000000000..834fb1a84800
--- /dev/null
+++ b/drivers/infiniband/sw/rxe/rxe_odp.c
@@ -0,0 +1,64 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/*
+ * Copyright (c) 2022-2023 Fujitsu Ltd. All rights reserved.
+ */
+
+#include
+
+#include
+
+#include "rxe.h"
+
+static void rxe_mr_unset_xarray(struct rxe_mr *mr, unsigned long start,
+				unsigned long end)
+{
+	unsigned long lower = rxe_mr_iova_to_index(mr, start);
+	unsigned long upper = rxe_mr_iova_to_index(mr, end - 1);
+	void *entry;
+
+	XA_STATE(xas, &mr->page_list, lower);
+
+	/* make elements in xarray NULL */
+	xas_lock(&xas);
+	while (true) {
+		xas_store(&xas, NULL);
+
+		entry = xas_next(&xas);
+		if (xas_retry(&xas, entry) || (xas.xa_index <= upper))
+			continue;
+
+		break;
+	}
+	xas_unlock(&xas);
+}
+
+static bool rxe_ib_invalidate_range(struct mmu_interval_notifier *mni,
+				    const struct mmu_notifier_range *range,
+				    unsigned long cur_seq)
+{
+	struct ib_umem_odp *umem_odp =
+		container_of(mni, struct ib_umem_odp, notifier);
+	struct rxe_mr *mr = umem_odp->private;
+	unsigned long start, end;
+
+	if (!mmu_notifier_range_blockable(range))
+		return false;
+
+	mutex_lock(&umem_odp->umem_mutex);
+	mmu_interval_set_seq(mni, cur_seq);
+
+	start = max_t(u64, ib_umem_start(umem_odp), range->start);
+	end = min_t(u64, ib_umem_end(umem_odp), range->end);
+
+	rxe_mr_unset_xarray(mr, start, end);
+
+	/* update umem_odp->dma_list */
+	ib_umem_odp_unmap_dma_pages(umem_odp, start, end);
+
+	mutex_unlock(&umem_odp->umem_mutex);
+	return true;
+}
+
+const struct mmu_interval_notifier_ops rxe_mn_ops = {
+	.invalidate = rxe_ib_invalidate_range,
+};

From patchwork Fri Sep 8 06:26:46 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Daisuke Matsuda (Fujitsu)"
X-Patchwork-Id: 137713
Return-Path:
Delivered-To: ouuuleilei@gmail.com
Received: by 2002:a59:ab0a:0:b0:3f2:4152:657d with SMTP id m10csp445420vqo; Fri, 8 Sep 2023 03:25:21 -0700 (PDT)
X-Google-Smtp-Source: AGHT+IFN9sOOtpsadfUjnFBnMXCKJGP4Sn2QhPn2RZHIiVL59sRMoOftTy87IWrMVHtZnpLf0hUS
X-Received: by 2002:a05:6a20:8e06:b0:140:61f8:53f3 with SMTP id y6-20020a056a208e0600b0014061f853f3mr2891979pzj.21.1694168721350; Fri, 08 Sep 2023 03:25:21 -0700 (PDT)
ARC-Seal: i=1; a=rsa-sha256; t=1694168721; cv=none; d=google.com; s=arc-20160816; b=CeLNQvVNZykZ+jPoa9MLzwHTBJCVhnNqVYKCWtgggYW/KJC/HE4y5OOvE/TKDowMSA Ilt9A9g/o4Hx90G11Bi6LqP6LiINRVJXpu4mFFsp65emQ+TEotfUOi0pQfdaRFJoWAkd mAVXk2GCuVBAVOZPSHcHl/ihFSTIPPIAaKnXSNBXty9EJbg4NYi4S/qxtoBe+HeRcsyO NHyLX+6NMaOXadkevAXp61RzHV6lOQ/LtiSxome9Xf4DSqDU27Xybm1Zd7FmqXS474aI 8C59FtJaIMIjOGHbtVzHowGQyNS6O0rtmnK/JfWdQDW1oQTzh/lghrhEeJRe7i71kogF pVBA==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from; bh=ra4A6GysFWRUX5pRVESUfX85aSgNzbCo1/TrV913h2U=; fh=Kza5GnIpHUYQ1cZy2y92CPl6XrzN03vBDGjf8wFwAS8=; b=Xjuf/8+wRxakbH0M0r1YHyv4XcgC69uMCoVQI6FVg1HPEWrxMPLByFx+kTYswRNK45 kdxNm7qo8z2QVeXLx2RjEixRW4yY5e4dcProvPaqDC5qJBR3cuNVkl2dlry4wb5J8xiC o4/MXpdkrdLiDiTgNwdCVIQDOvmTfOdTWL25UN7LQFeZjKuvox1OTBP7ZXHMvFCG9V8c elGBDoAHNA477TLMZiX1FpDrrhDOt8aXQBsMCQUJidsuGrqJO6y7jK5aMTUL2KUk5eZ4
1sRt6cTVDVwinNe+J0BNP9GUoNxqig/gZMWLzMlPoXzuoSrgrAGjRh0qGabQMRF+PW7w m3MA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=fujitsu.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id bt15-20020a056a00438f00b0068e48139828si1325574pfb.349.2023.09.08.03.25.15; Fri, 08 Sep 2023 03:25:21 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=fujitsu.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236275AbjIHG3h (ORCPT + 42 others); Fri, 8 Sep 2023 02:29:37 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51794 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234659AbjIHG3g (ORCPT ); Fri, 8 Sep 2023 02:29:36 -0400 Received: from esa11.hc1455-7.c3s2.iphmx.com (esa11.hc1455-7.c3s2.iphmx.com [207.54.90.137]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0553A2116; Thu, 7 Sep 2023 23:28:47 -0700 (PDT) X-IronPort-AV: E=McAfee;i="6600,9927,10826"; a="110627120" X-IronPort-AV: E=Sophos;i="6.02,236,1688396400"; d="scan'208";a="110627120" Received: from unknown (HELO oym-r4.gw.nic.fujitsu.com) ([210.162.30.92]) by esa11.hc1455-7.c3s2.iphmx.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Sep 2023 15:27:09 +0900 Received: from oym-m4.gw.nic.fujitsu.com (oym-nat-oym-m4.gw.nic.fujitsu.com [192.168.87.61]) by oym-r4.gw.nic.fujitsu.com (Postfix) with 
ESMTP id EE274DDC6A; Fri, 8 Sep 2023 15:27:05 +0900 (JST) Received: from m3002.s.css.fujitsu.com (msm3.b.css.fujitsu.com [10.128.233.104]) by oym-m4.gw.nic.fujitsu.com (Postfix) with ESMTP id 20294D621A; Fri, 8 Sep 2023 15:27:05 +0900 (JST) Received: from localhost.localdomain (unknown [10.118.237.107]) by m3002.s.css.fujitsu.com (Postfix) with ESMTP id D0AB2200537C; Fri, 8 Sep 2023 15:27:04 +0900 (JST) From: Daisuke Matsuda To: linux-rdma@vger.kernel.org, leon@kernel.org, jgg@ziepe.ca, zyjzyj2000@gmail.com Cc: linux-kernel@vger.kernel.org, rpearsonhpe@gmail.com, yangx.jy@fujitsu.com, lizhijian@fujitsu.com, y-goto@fujitsu.com, Daisuke Matsuda Subject: [PATCH for-next v6 5/7] RDMA/rxe: Allow registering MRs for On-Demand Paging Date: Fri, 8 Sep 2023 15:26:46 +0900 Message-Id: <3fb02f58aa660d2d4a01bb187ce683eee23a138f.1694153251.git.matsuda-daisuke@fujitsu.com> X-Mailer: git-send-email 2.39.1 In-Reply-To: References: MIME-Version: 1.0 X-TM-AS-GCONF: 00 X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_BLOCKED,SPF_HELO_PASS,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1776464660936936759 X-GMAIL-MSGID: 1776464660936936759 Allow userspace to register an ODP-enabled MR, in which case the flag IB_ACCESS_ON_DEMAND is passed to rxe_reg_user_mr(). However, there is no RDMA operation enabled right now. They will be supported later in the subsequent two patches. rxe_odp_do_pagefault() is called to initialize an ODP-enabled MR. It syncs process address space from the CPU page table to the driver page table (dma_list/pfn_list in umem_odp) when called with RXE_PAGEFAULT_SNAPSHOT flag. 
Additionally, it can be used to trigger a page fault when the pages being accessed are not present or do not have proper read/write permissions, and possibly to prefetch pages in the future.

Signed-off-by: Daisuke Matsuda
---
 drivers/infiniband/sw/rxe/rxe.c       |   7 ++
 drivers/infiniband/sw/rxe/rxe_loc.h   |  14 +++
 drivers/infiniband/sw/rxe/rxe_mr.c    |   9 +-
 drivers/infiniband/sw/rxe/rxe_odp.c   | 122 ++++++++++++++++++++++++++
 drivers/infiniband/sw/rxe/rxe_resp.c  |  15 +++-
 drivers/infiniband/sw/rxe/rxe_verbs.c |   5 +-
 drivers/infiniband/sw/rxe/rxe_verbs.h |   1 +
 7 files changed, 167 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
index 54c723a6edda..f2284d27229b 100644
--- a/drivers/infiniband/sw/rxe/rxe.c
+++ b/drivers/infiniband/sw/rxe/rxe.c
@@ -73,6 +73,13 @@ static void rxe_init_device_param(struct rxe_dev *rxe)
 			rxe->ndev->dev_addr);
 
 	rxe->max_ucontext = RXE_MAX_UCONTEXT;
+
+	if (IS_ENABLED(CONFIG_INFINIBAND_ON_DEMAND_PAGING)) {
+		rxe->attr.kernel_cap_flags |= IBK_ON_DEMAND_PAGING;
+
+		/* IB_ODP_SUPPORT_IMPLICIT is not supported right now.
*/ + rxe->attr.odp_caps.general_caps |= IB_ODP_SUPPORT; + } } /* initialize port attributes */ diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h index eb867f7d0d36..4bda154a0248 100644 --- a/drivers/infiniband/sw/rxe/rxe_loc.h +++ b/drivers/infiniband/sw/rxe/rxe_loc.h @@ -188,4 +188,18 @@ static inline unsigned int wr_opcode_mask(int opcode, struct rxe_qp *qp) return rxe_wr_opcode_info[opcode].mask[qp->ibqp.qp_type]; } +/* rxe_odp.c */ +#ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING +int rxe_odp_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, + u64 iova, int access_flags, struct rxe_mr *mr); +#else /* CONFIG_INFINIBAND_ON_DEMAND_PAGING */ +static inline int +rxe_odp_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova, + int access_flags, struct rxe_mr *mr) +{ + return -EOPNOTSUPP; +} + +#endif /* CONFIG_INFINIBAND_ON_DEMAND_PAGING */ + #endif /* RXE_LOC_H */ diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c index 86b1908d304b..384cb4ba1f2d 100644 --- a/drivers/infiniband/sw/rxe/rxe_mr.c +++ b/drivers/infiniband/sw/rxe/rxe_mr.c @@ -318,7 +318,10 @@ int rxe_mr_copy(struct rxe_mr *mr, u64 iova, void *addr, return err; } - return rxe_mr_copy_xarray(mr, iova, addr, length, dir); + if (mr->umem->is_odp) + return -EOPNOTSUPP; + else + return rxe_mr_copy_xarray(mr, iova, addr, length, dir); } /* copy data in or out of a wqe, i.e. sg list @@ -527,6 +530,10 @@ int rxe_mr_do_atomic_write(struct rxe_mr *mr, u64 iova, u64 value) struct page *page; u64 *va; + /* ODP is not supported right now. WIP. 
*/ + if (mr->umem->is_odp) + return RESPST_ERR_UNSUPPORTED_OPCODE; + /* See IBA oA19-28 */ if (unlikely(mr->state != RXE_MR_STATE_VALID)) { rxe_dbg_mr(mr, "mr not in valid state"); diff --git a/drivers/infiniband/sw/rxe/rxe_odp.c b/drivers/infiniband/sw/rxe/rxe_odp.c index 834fb1a84800..713bef9161e3 100644 --- a/drivers/infiniband/sw/rxe/rxe_odp.c +++ b/drivers/infiniband/sw/rxe/rxe_odp.c @@ -32,6 +32,31 @@ static void rxe_mr_unset_xarray(struct rxe_mr *mr, unsigned long start, xas_unlock(&xas); } +static void rxe_mr_set_xarray(struct rxe_mr *mr, unsigned long start, + unsigned long end, unsigned long *pfn_list) +{ + unsigned long lower = rxe_mr_iova_to_index(mr, start); + unsigned long upper = rxe_mr_iova_to_index(mr, end - 1); + struct page *page; + void *entry; + + XA_STATE(xas, &mr->page_list, lower); + + /* ib_umem_odp_unmap_dma_pages() ensures pages are HMM_PFN_VALID */ + xas_lock(&xas); + while (true) { + page = hmm_pfn_to_page(pfn_list[xas.xa_index]); + xas_store(&xas, page); + + entry = xas_next(&xas); + if (xas_retry(&xas, entry) || (xas.xa_index <= upper)) + continue; + + break; + } + xas_unlock(&xas); +} + static bool rxe_ib_invalidate_range(struct mmu_interval_notifier *mni, const struct mmu_notifier_range *range, unsigned long cur_seq) @@ -62,3 +87,100 @@ static bool rxe_ib_invalidate_range(struct mmu_interval_notifier *mni, const struct mmu_interval_notifier_ops rxe_mn_ops = { .invalidate = rxe_ib_invalidate_range, }; + +#define RXE_PAGEFAULT_RDONLY BIT(1) +#define RXE_PAGEFAULT_SNAPSHOT BIT(2) +static int rxe_odp_do_pagefault_and_lock(struct rxe_mr *mr, u64 user_va, int bcnt, u32 flags) +{ + int np; + u64 access_mask; + bool fault = !(flags & RXE_PAGEFAULT_SNAPSHOT); + struct ib_umem_odp *umem_odp = to_ib_umem_odp(mr->umem); + + access_mask = ODP_READ_ALLOWED_BIT; + if (umem_odp->umem.writable && !(flags & RXE_PAGEFAULT_RDONLY)) + access_mask |= ODP_WRITE_ALLOWED_BIT; + + /* + * ib_umem_odp_map_dma_and_lock() locks umem_mutex on success. 
+ * Callers must release the lock later to let invalidation handler + * do its work again. + */ + np = ib_umem_odp_map_dma_and_lock(umem_odp, user_va, bcnt, + access_mask, fault); + if (np < 0) + return np; + + /* + * umem_mutex is still locked here, so we can use hmm_pfn_to_page() + * safely to fetch pages in the range. + */ + rxe_mr_set_xarray(mr, user_va, user_va + bcnt, umem_odp->pfn_list); + + return np; +} + +static int rxe_odp_init_pages(struct rxe_mr *mr) +{ + int ret; + struct ib_umem_odp *umem_odp = to_ib_umem_odp(mr->umem); + + ret = rxe_odp_do_pagefault_and_lock(mr, mr->umem->address, + mr->umem->length, + RXE_PAGEFAULT_SNAPSHOT); + + if (ret >= 0) + mutex_unlock(&umem_odp->umem_mutex); + + return ret >= 0 ? 0 : ret; +} + +int rxe_odp_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, + u64 iova, int access_flags, struct rxe_mr *mr) +{ + int err; + struct ib_umem_odp *umem_odp; + + if (!IS_ENABLED(CONFIG_INFINIBAND_ON_DEMAND_PAGING)) + return -EOPNOTSUPP; + + rxe_mr_init(access_flags, mr); + + xa_init(&mr->page_list); + + if (!start && length == U64_MAX) { + if (iova != 0) + return -EINVAL; + if (!(rxe->attr.odp_caps.general_caps & IB_ODP_SUPPORT_IMPLICIT)) + return -EINVAL; + + /* Never reach here, for implicit ODP is not implemented. 
*/ + } + + umem_odp = ib_umem_odp_get(&rxe->ib_dev, start, length, access_flags, + &rxe_mn_ops); + if (IS_ERR(umem_odp)) { + rxe_dbg_mr(mr, "Unable to create umem_odp err = %d\n", + (int)PTR_ERR(umem_odp)); + return PTR_ERR(umem_odp); + } + + umem_odp->private = mr; + + mr->umem = &umem_odp->umem; + mr->access = access_flags; + mr->ibmr.length = length; + mr->ibmr.iova = iova; + mr->page_offset = ib_umem_offset(&umem_odp->umem); + + err = rxe_odp_init_pages(mr); + if (err) { + ib_umem_odp_release(umem_odp); + return err; + } + + mr->state = RXE_MR_STATE_VALID; + mr->ibmr.type = IB_MR_TYPE_USER; + + return err; +} diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c index 969e057bbfd1..9159f1bdfc6f 100644 --- a/drivers/infiniband/sw/rxe/rxe_resp.c +++ b/drivers/infiniband/sw/rxe/rxe_resp.c @@ -635,6 +635,10 @@ static enum resp_states process_flush(struct rxe_qp *qp, struct rxe_mr *mr = qp->resp.mr; struct resp_res *res = qp->resp.res; + /* ODP is not supported right now. WIP. 
*/ + if (mr->umem->is_odp) + return RESPST_ERR_UNSUPPORTED_OPCODE; + /* oA19-14, oA19-15 */ if (res && res->replay) return RESPST_ACKNOWLEDGE; @@ -688,10 +692,13 @@ static enum resp_states atomic_reply(struct rxe_qp *qp, if (!res->replay) { u64 iova = qp->resp.va + qp->resp.offset; - err = rxe_mr_do_atomic_op(mr, iova, pkt->opcode, - atmeth_comp(pkt), - atmeth_swap_add(pkt), - &res->atomic.orig_val); + if (mr->umem->is_odp) + err = RESPST_ERR_UNSUPPORTED_OPCODE; + else + err = rxe_mr_do_atomic_op(mr, iova, pkt->opcode, + atmeth_comp(pkt), + atmeth_swap_add(pkt), + &res->atomic.orig_val); if (err) return err; diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c index 48f86839d36a..192ad835c712 100644 --- a/drivers/infiniband/sw/rxe/rxe_verbs.c +++ b/drivers/infiniband/sw/rxe/rxe_verbs.c @@ -1278,7 +1278,10 @@ static struct ib_mr *rxe_reg_user_mr(struct ib_pd *ibpd, u64 start, mr->ibmr.pd = ibpd; mr->ibmr.device = ibpd->device; - err = rxe_mr_init_user(rxe, start, length, iova, access, mr); + if (access & IB_ACCESS_ON_DEMAND) + err = rxe_odp_mr_init_user(rxe, start, length, iova, access, mr); + else + err = rxe_mr_init_user(rxe, start, length, iova, access, mr); if (err) { rxe_dbg_mr(mr, "reg_user_mr failed, err = %d", err); goto err_cleanup; diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h index 1058b5de8920..24dd747586e0 100644 --- a/drivers/infiniband/sw/rxe/rxe_verbs.h +++ b/drivers/infiniband/sw/rxe/rxe_verbs.h @@ -298,6 +298,7 @@ enum { | IB_ACCESS_LOCAL_WRITE | IB_ACCESS_MW_BIND | IB_ACCESS_ON_DEMAND + | IB_ACCESS_HUGETLB | IB_ACCESS_FLUSH_GLOBAL | IB_ACCESS_FLUSH_PERSISTENT | IB_ACCESS_OPTIONAL, From patchwork Fri Sep 8 06:26:47 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Daisuke Matsuda (Fujitsu)" X-Patchwork-Id: 137705 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 
2002:a59:ab0a:0:b0:3f2:4152:657d with SMTP id m10csp394943vqo; Fri, 8 Sep 2023 01:21:24 -0700 (PDT) X-Google-Smtp-Source: AGHT+IEogdPTMvRLmxhruTM1TwpHFJl7QEKNIlGEJ5c351xFLZHY0DURFTaCylgha7hZyLdYK5kG X-Received: by 2002:a19:4302:0:b0:500:cb2b:866f with SMTP id q2-20020a194302000000b00500cb2b866fmr1086898lfa.30.1694161283920; Fri, 08 Sep 2023 01:21:23 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1694161283; cv=none; d=google.com; s=arc-20160816; b=07CViGJv20xIOBZEIfHL7bUaeJujfxCa/owtyozV/rQEnwz99dUop9QNgJE6swkdhm lXGyt3t/yhLuNBSZXemJA/wXENqnnhPC+MHDrZXC+5adxiDO+d6xmAMe9soASU9J2zIi 2AbJciL/lc+HZN2d1ylLv+/y6KJzSa/LRQHQwBUgkeWpkI1oIFMdlpQGZP3PHIQbKQz/ PuMg1Z+HaH0ACARzqC75zZrlM7vtjHqBXpB1DFDBa1jACn0/x+FJ3x3Csc7HBwnxAEhe ZIrgXiqNxvIdKrRs/4tfbjNtsbjSyt3wFVGmiujG0da/vmVSrtN7/LfAgQvJTiojHD5O QU9A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from; bh=AvzK0/FpO0nioFKwgg4yp1Jx6otYXzjDL+qOSMRHGvw=; fh=Kza5GnIpHUYQ1cZy2y92CPl6XrzN03vBDGjf8wFwAS8=; b=tAc4fl7qRd+cNEk7Lq9zFkiZ6YJYKEQWsuHpMfcFaEmMxxkjMRtZjGOOsCA6RCIZhe uAKx0XzGUg1Y90zxY1GEhIszD0uP15lNYP/pvh26D7cwsnXnQ4/CJHItOm8RUwUHvO5K /uGYSRpPc9pdxMGIkuXjDrOmAgLzKZ1Z2pMAUo1BbKmi3P7cT3o4fxGTDDRRf9opdrVP r1hcWwutpSj0EX5cYV7iRiomGyoLC+BwtrLWnyKlVLdwQnWyKfDzhzPtawqVkqgT0E34 kJ5q3pcZPjC7foiuzPFKWlXzS9p/yUQ3/L1C6UOCiZPTiXyOG4MvbjAaJmHgqoQ0Vk43 RZ0A== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=fujitsu.com Received: from out1.vger.email (out1.vger.email. 
[2620:137:e000::1:20]) by mx.google.com with ESMTP id m23-20020aa7c497000000b00529fa0e28e3si995513edq.586.2023.09.08.01.21.20; Fri, 08 Sep 2023 01:21:23 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=fujitsu.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S241963AbjIHG1U (ORCPT + 7 others); Fri, 8 Sep 2023 02:27:20 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38556 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S241911AbjIHG1Q (ORCPT ); Fri, 8 Sep 2023 02:27:16 -0400 Received: from esa1.hc1455-7.c3s2.iphmx.com (esa1.hc1455-7.c3s2.iphmx.com [207.54.90.47]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 039C11FC4; Thu, 7 Sep 2023 23:27:10 -0700 (PDT) X-IronPort-AV: E=McAfee;i="6600,9927,10826"; a="131102632" X-IronPort-AV: E=Sophos;i="6.02,236,1688396400"; d="scan'208";a="131102632" Received: from unknown (HELO oym-r2.gw.nic.fujitsu.com) ([210.162.30.90]) by esa1.hc1455-7.c3s2.iphmx.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Sep 2023 15:27:10 +0900 Received: from oym-m4.gw.nic.fujitsu.com (oym-nat-oym-m4.gw.nic.fujitsu.com [192.168.87.61]) by oym-r2.gw.nic.fujitsu.com (Postfix) with ESMTP id 5D061CD7E3; Fri, 8 Sep 2023 15:27:07 +0900 (JST) Received: from m3002.s.css.fujitsu.com (msm3.b.css.fujitsu.com [10.128.233.104]) by oym-m4.gw.nic.fujitsu.com (Postfix) with ESMTP id 85377D621A; Fri, 8 Sep 2023 15:27:06 +0900 (JST) Received: from localhost.localdomain (unknown [10.118.237.107]) by m3002.s.css.fujitsu.com (Postfix) with ESMTP id 4202F20059A3; Fri, 8 Sep 2023 15:27:06 +0900 (JST) 
From: Daisuke Matsuda
To: linux-rdma@vger.kernel.org, leon@kernel.org, jgg@ziepe.ca, zyjzyj2000@gmail.com
Cc: linux-kernel@vger.kernel.org, rpearsonhpe@gmail.com, yangx.jy@fujitsu.com, lizhijian@fujitsu.com, y-goto@fujitsu.com, Daisuke Matsuda
Subject: [PATCH for-next v6 6/7] RDMA/rxe: Add support for Send/Recv/Write/Read with ODP
Date: Fri, 8 Sep 2023 15:26:47 +0900
Message-Id:
X-Mailer: git-send-email 2.39.1
In-Reply-To:
References:
MIME-Version: 1.0
X-TM-AS-GCONF: 00
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_BLOCKED,SPF_HELO_PASS,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
X-getmail-retrieved-from-mailbox: INBOX
X-GMAIL-THRID: 1776456862134420892
X-GMAIL-MSGID: 1776456862134420892

rxe_mr_copy() is used widely to copy data to/from a user MR. The requester uses it to load payloads of request packets; the responder uses it to process Send, Write, and Read operations; the completer uses it to copy data from response packets of Read and Atomic operations to a user MR.

Allow these operations to be used with ODP by adding a subordinate function rxe_odp_mr_copy(). It consists of the following steps:
 1. Check page presence and R/W permission.
 2. If OK, just execute data copy to/from the pages and exit.
 3. Otherwise, trigger page fault to map the pages.
 4. Update the MR xarray using PFNs in umem_odp->pfn_list.
 5. Execute data copy to/from the pages.

umem_mutex is used to ensure that mapped pages are not invalidated before data copy completes. It also protects the lists in umem_odp and the MR xarray.
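As a rough illustration of the five steps (not the driver code), the check, fault, retry and copy flow can be modelled in userspace C. Everything named toy_* is hypothetical, the 16-byte "pages" and the boolean standing in for umem_mutex are simplifications, and the simulated fault always succeeds, which the real ib_umem_odp_map_dma_and_lock() does not guarantee.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 16  /* assumed tiny page size, for readability */
#define TOY_NPAGES 4
#define TOY_MAX_FAULTS 3  /* mirrors the "max 3 tries" limit */

/* Hypothetical stand-in for an ODP MR; none of these names exist in rxe. */
struct toy_mr {
	char backing[TOY_NPAGES * TOY_PAGE_SIZE]; /* the "user memory" */
	bool present[TOY_NPAGES];  /* is the page currently mapped? */
	bool writable[TOY_NPAGES]; /* is it mapped with write permission? */
	int faults;                /* number of simulated page faults */
	bool locked;               /* models umem_mutex being held */
};

/* Step 1: true when some page in [off, off + len) still needs a fault. */
static bool toy_check_pages(const struct toy_mr *mr, size_t off, size_t len,
			    bool need_write)
{
	size_t lower = off / TOY_PAGE_SIZE;
	size_t upper = (off + len - 1) / TOY_PAGE_SIZE;
	size_t idx;

	for (idx = lower; idx <= upper; idx++)
		if (!mr->present[idx] || (need_write && !mr->writable[idx]))
			return true;
	return false;
}

/* Steps 3 and 4: map the pages, record them, return with the lock held. */
static void toy_fault_and_lock(struct toy_mr *mr, size_t off, size_t len,
			       bool need_write)
{
	size_t lower = off / TOY_PAGE_SIZE;
	size_t upper = (off + len - 1) / TOY_PAGE_SIZE;
	size_t idx;

	mr->locked = true;
	for (idx = lower; idx <= upper; idx++) {
		mr->present[idx] = true;
		if (need_write)
			mr->writable[idx] = true;
	}
	mr->faults++;
}

/* to_mr == true copies buf into the MR and therefore needs write permission. */
static int toy_mr_copy(struct toy_mr *mr, size_t off, void *buf, size_t len,
		       bool to_mr)
{
	int tries = 0;

	if (len == 0 || off + len > sizeof(mr->backing))
		return -EINVAL;

	mr->locked = true;
	while (toy_check_pages(mr, off, len, to_mr)) {
		mr->locked = false; /* drop the lock so invalidation can run */
		if (tries++ >= TOY_MAX_FAULTS)
			return -EFAULT;
		toy_fault_and_lock(mr, off, len, to_mr);
	}

	/* Steps 2 and 5: the pages are present and the lock is held. */
	if (to_mr)
		memcpy(mr->backing + off, buf, len);
	else
		memcpy(buf, mr->backing + off, len);

	mr->locked = false;
	return 0;
}
```

The pages are re-checked after each fault because an invalidation may run in the window where the lock is not held; the retry limit keeps that loop bounded.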
Signed-off-by: Daisuke Matsuda --- drivers/infiniband/sw/rxe/rxe.c | 10 ++++ drivers/infiniband/sw/rxe/rxe_loc.h | 8 +++ drivers/infiniband/sw/rxe/rxe_mr.c | 2 +- drivers/infiniband/sw/rxe/rxe_odp.c | 84 +++++++++++++++++++++++++++++ 4 files changed, 103 insertions(+), 1 deletion(-) diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c index f2284d27229b..207a022156f0 100644 --- a/drivers/infiniband/sw/rxe/rxe.c +++ b/drivers/infiniband/sw/rxe/rxe.c @@ -79,6 +79,16 @@ static void rxe_init_device_param(struct rxe_dev *rxe) /* IB_ODP_SUPPORT_IMPLICIT is not supported right now. */ rxe->attr.odp_caps.general_caps |= IB_ODP_SUPPORT; + + rxe->attr.odp_caps.per_transport_caps.ud_odp_caps |= IB_ODP_SUPPORT_SEND; + rxe->attr.odp_caps.per_transport_caps.ud_odp_caps |= IB_ODP_SUPPORT_RECV; + rxe->attr.odp_caps.per_transport_caps.ud_odp_caps |= IB_ODP_SUPPORT_SRQ_RECV; + + rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_SEND; + rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_RECV; + rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_WRITE; + rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_READ; + rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_SRQ_RECV; } } diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h index 4bda154a0248..eeaeff8a1398 100644 --- a/drivers/infiniband/sw/rxe/rxe_loc.h +++ b/drivers/infiniband/sw/rxe/rxe_loc.h @@ -192,6 +192,8 @@ static inline unsigned int wr_opcode_mask(int opcode, struct rxe_qp *qp) #ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING int rxe_odp_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova, int access_flags, struct rxe_mr *mr); +int rxe_odp_mr_copy(struct rxe_mr *mr, u64 iova, void *addr, int length, + enum rxe_mr_copy_dir dir); #else /* CONFIG_INFINIBAND_ON_DEMAND_PAGING */ static inline int rxe_odp_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova, @@ 
-199,6 +201,12 @@ rxe_odp_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova, { return -EOPNOTSUPP; } +static inline int +rxe_odp_mr_copy(struct rxe_mr *mr, u64 iova, void *addr, + int length, enum rxe_mr_copy_dir dir) +{ + return -EOPNOTSUPP; +} #endif /* CONFIG_INFINIBAND_ON_DEMAND_PAGING */ diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c index 384cb4ba1f2d..1641bf1a42a0 100644 --- a/drivers/infiniband/sw/rxe/rxe_mr.c +++ b/drivers/infiniband/sw/rxe/rxe_mr.c @@ -319,7 +319,7 @@ int rxe_mr_copy(struct rxe_mr *mr, u64 iova, void *addr, } if (mr->umem->is_odp) - return -EOPNOTSUPP; + return rxe_odp_mr_copy(mr, iova, addr, length, dir); else return rxe_mr_copy_xarray(mr, iova, addr, length, dir); } diff --git a/drivers/infiniband/sw/rxe/rxe_odp.c b/drivers/infiniband/sw/rxe/rxe_odp.c index 713bef9161e3..da1c0753db93 100644 --- a/drivers/infiniband/sw/rxe/rxe_odp.c +++ b/drivers/infiniband/sw/rxe/rxe_odp.c @@ -184,3 +184,87 @@ int rxe_odp_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, return err; } + +static inline bool rxe_odp_check_pages(struct rxe_mr *mr, u64 iova, + int length, u32 flags) +{ + unsigned long lower, upper, idx; + unsigned long hmm_flags = HMM_PFN_VALID; + struct ib_umem_odp *umem_odp = to_ib_umem_odp(mr->umem); + struct page *page; + bool need_fault = false; + + lower = rxe_mr_iova_to_index(mr, iova); + upper = rxe_mr_iova_to_index(mr, iova + length - 1); + + if (!(flags & RXE_PAGEFAULT_RDONLY)) + hmm_flags |= HMM_PFN_WRITE; + + /* xarray is protected by umem_mutex */ + for (idx = lower; idx <= upper; idx++) { + page = xa_load(&mr->page_list, idx); + + if (!page || !(umem_odp->pfn_list[idx] & hmm_flags)) { + need_fault = true; + break; + } + } + + return need_fault; +} + +int rxe_odp_mr_copy(struct rxe_mr *mr, u64 iova, void *addr, int length, + enum rxe_mr_copy_dir dir) +{ + struct ib_umem_odp *umem_odp = to_ib_umem_odp(mr->umem); + u32 flags = 0; + int retry = 0; + int err; + + if 
(unlikely(!mr->umem->is_odp)) + return -EOPNOTSUPP; + + switch (dir) { + case RXE_TO_MR_OBJ: + break; + + case RXE_FROM_MR_OBJ: + flags = RXE_PAGEFAULT_RDONLY; + break; + + default: + return -EINVAL; + } + + mutex_lock(&umem_odp->umem_mutex); + + if (rxe_odp_check_pages(mr, iova, length, flags)) + goto need_fault; + + err = rxe_mr_copy_xarray(mr, iova, addr, length, dir); + + mutex_unlock(&umem_odp->umem_mutex); + + return err; + +need_fault: + /* allow max 3 tries for pagefault */ + do { + mutex_unlock(&umem_odp->umem_mutex); + + if (retry > 2) + return -EFAULT; + + /* umem_mutex is locked on success */ + err = rxe_odp_do_pagefault_and_lock(mr, iova, length, flags); + if (err < 0) + return err; + retry++; + } while (rxe_odp_check_pages(mr, iova, length, flags)); + + err = rxe_mr_copy_xarray(mr, iova, addr, length, dir); + + mutex_unlock(&umem_odp->umem_mutex); + + return err; +} From patchwork Fri Sep 8 06:26:48 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Daisuke Matsuda (Fujitsu)" X-Patchwork-Id: 137714 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:ab0a:0:b0:3f2:4152:657d with SMTP id m10csp445778vqo; Fri, 8 Sep 2023 03:25:51 -0700 (PDT) X-Google-Smtp-Source: AGHT+IGmTFPNguwTp4ZskT7SpVeOubH0ZWU7Su6a+sVTjlNN/vC6HwrSASb6OefJ6GWXJnlApT0N X-Received: by 2002:a05:6a21:184:b0:153:353e:5e39 with SMTP id le4-20020a056a21018400b00153353e5e39mr2734467pzb.51.1694168751487; Fri, 08 Sep 2023 03:25:51 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1694168751; cv=none; d=google.com; s=arc-20160816; b=SjJtm+T9icHhMOQThEY32T0BzyQnuntpjn+5S17VcVDi4NmzmLUQ2kqDbnjEPmgI4v W4Ukg6upXo1tPhSFU0QaJKwNOZZLQ0AaIgUd2YzyE7ntKMPfayK5w32pRwPnYxMr/w29 SkFOYD4oN+/g6kbm70Da/yYG8AclVVYiblEmU+xIHIymOY1aAj/CwaPDfKS2YnvO/4Oe aRHw8MVALvMz56tyXKKclXM7CZJ81htHp4SpirtVaTCOaivBAM/P9Ch968udSbrzYs/o gfnJPimBFe/oC4zwtfebFIZcJNtl6cuqB/IdA+aAu0LpoV3yUDEs5i6aua9kSOWyu9JV iWaQ== ARC-Message-Signature: i=1; 
a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from; bh=kIZo3EQnrHvayfS2ZWWtvNN9OqP2pf0F86dWl9ayasQ=; fh=Kza5GnIpHUYQ1cZy2y92CPl6XrzN03vBDGjf8wFwAS8=; b=f4azjm2coYaSZmair3orE7/PxxyOsqmihHGqbG9csmFkNrQUBd6yGwkNtUcPZQ9b7m OSPs3Mk3HUVshiJydrQIvuWn5Oinzs216BBs99liB/JKcZmDDRLiJvXW2zemMOHhSyrs Ke4T1VZReQUMZgGUrf33gtV9CKAgF78L3GtpS225rynm5xG+OFmiy1jLnOCpu1IkcASJ ut7/tEXIOfyjYg1JdKkAusjwPgpn8VzTHMv9/r61HP3Zldaehm9189qbpIGWS7aGmF0D 5B3Lyrcpw0tjRTkw4+20kRG/FJ5534Z0k93YICmE9iX9uBx0B6LKwPEQBPlYnWQEbf2X gkQA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=fujitsu.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id t7-20020a63dd07000000b00573fb2def2asi1214965pgg.539.2023.09.08.03.25.44; Fri, 08 Sep 2023 03:25:51 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=fujitsu.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S242057AbjIHG10 (ORCPT + 42 others); Fri, 8 Sep 2023 02:27:26 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38640 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S242009AbjIHG1U (ORCPT ); Fri, 8 Sep 2023 02:27:20 -0400 Received: from esa8.hc1455-7.c3s2.iphmx.com (esa8.hc1455-7.c3s2.iphmx.com [139.138.61.253]) 
by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6F36C1BD8; Thu, 7 Sep 2023 23:27:13 -0700 (PDT) X-IronPort-AV: E=McAfee;i="6600,9927,10826"; a="119319382" X-IronPort-AV: E=Sophos;i="6.02,236,1688396400"; d="scan'208";a="119319382" Received: from unknown (HELO oym-r1.gw.nic.fujitsu.com) ([210.162.30.89]) by esa8.hc1455-7.c3s2.iphmx.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Sep 2023 15:27:10 +0900 Received: from oym-m3.gw.nic.fujitsu.com (oym-nat-oym-m3.gw.nic.fujitsu.com [192.168.87.60]) by oym-r1.gw.nic.fujitsu.com (Postfix) with ESMTP id A26FFD29E1; Fri, 8 Sep 2023 15:27:08 +0900 (JST) Received: from m3002.s.css.fujitsu.com (msm3.b.css.fujitsu.com [10.128.233.104]) by oym-m3.gw.nic.fujitsu.com (Postfix) with ESMTP id DFAEBD9A75; Fri, 8 Sep 2023 15:27:07 +0900 (JST) Received: from localhost.localdomain (unknown [10.118.237.107]) by m3002.s.css.fujitsu.com (Postfix) with ESMTP id 9B53E200537C; Fri, 8 Sep 2023 15:27:07 +0900 (JST) From: Daisuke Matsuda To: linux-rdma@vger.kernel.org, leon@kernel.org, jgg@ziepe.ca, zyjzyj2000@gmail.com Cc: linux-kernel@vger.kernel.org, rpearsonhpe@gmail.com, yangx.jy@fujitsu.com, lizhijian@fujitsu.com, y-goto@fujitsu.com, Daisuke Matsuda Subject: [PATCH for-next v6 7/7] RDMA/rxe: Add support for the traditional Atomic operations with ODP Date: Fri, 8 Sep 2023 15:26:48 +0900 Message-Id: <908514dfa6bbeae72d36481d893674b254ee416d.1694153251.git.matsuda-daisuke@fujitsu.com> X-Mailer: git-send-email 2.39.1 In-Reply-To: References: MIME-Version: 1.0 X-TM-AS-GCONF: 00 X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_BLOCKED,SPF_HELO_PASS,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1776464692374774919 X-GMAIL-MSGID: 1776464692374774919 Enable 'fetch and add' and 'compare and swap' 
operations to be used with ODP. This consists of the following steps:
 1. Verify that the page is present with write permission.
 2. If OK, execute the operation and exit.
 3. If not, then trigger page fault to map the page.
 4. Update the entry in the MR xarray.
 5. Execute the operation.

umem_mutex is used to ensure that the target page is not invalidated before data access completes. It also protects the lists in umem_odp and the MR xarray.

Signed-off-by: Daisuke Matsuda
---
 drivers/infiniband/sw/rxe/rxe.c      |  1 +
 drivers/infiniband/sw/rxe/rxe_loc.h  |  9 ++++++
 drivers/infiniband/sw/rxe/rxe_odp.c  | 43 ++++++++++++++++++++++++++++
 drivers/infiniband/sw/rxe/rxe_resp.c |  5 +++-
 4 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
index 207a022156f0..abd3267c2873 100644
--- a/drivers/infiniband/sw/rxe/rxe.c
+++ b/drivers/infiniband/sw/rxe/rxe.c
@@ -88,6 +88,7 @@ static void rxe_init_device_param(struct rxe_dev *rxe)
 		rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_RECV;
 		rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_WRITE;
 		rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_READ;
+		rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_ATOMIC;
 		rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_SRQ_RECV;
 	}
 }
diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
index eeaeff8a1398..0bae9044f362 100644
--- a/drivers/infiniband/sw/rxe/rxe_loc.h
+++ b/drivers/infiniband/sw/rxe/rxe_loc.h
@@ -194,6 +194,9 @@ int rxe_odp_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length,
 			 u64 iova, int access_flags, struct rxe_mr *mr);
 int rxe_odp_mr_copy(struct rxe_mr *mr, u64 iova, void *addr, int length,
 		    enum rxe_mr_copy_dir dir);
+int rxe_odp_mr_atomic_op(struct rxe_mr *mr, u64 iova, int opcode,
+			 u64 compare, u64 swap_add, u64 *orig_val);
+
 #else /* CONFIG_INFINIBAND_ON_DEMAND_PAGING */
 static
inline int rxe_odp_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova, @@ -207,6 +210,12 @@ rxe_odp_mr_copy(struct rxe_mr *mr, u64 iova, void *addr, { return -EOPNOTSUPP; } +static inline int +rxe_odp_mr_atomic_op(struct rxe_mr *mr, u64 iova, int opcode, + u64 compare, u64 swap_add, u64 *orig_val) +{ + return RESPST_ERR_UNSUPPORTED_OPCODE; +} #endif /* CONFIG_INFINIBAND_ON_DEMAND_PAGING */ diff --git a/drivers/infiniband/sw/rxe/rxe_odp.c b/drivers/infiniband/sw/rxe/rxe_odp.c index da1c0753db93..289c60cbda12 100644 --- a/drivers/infiniband/sw/rxe/rxe_odp.c +++ b/drivers/infiniband/sw/rxe/rxe_odp.c @@ -268,3 +268,46 @@ int rxe_odp_mr_copy(struct rxe_mr *mr, u64 iova, void *addr, int length, return err; } + +int rxe_odp_mr_atomic_op(struct rxe_mr *mr, u64 iova, int opcode, + u64 compare, u64 swap_add, u64 *orig_val) +{ + int err; + int retry = 0; + struct ib_umem_odp *umem_odp = to_ib_umem_odp(mr->umem); + + mutex_lock(&umem_odp->umem_mutex); + + /* Atomic operations manipulate a single char. 
*/ + if (rxe_odp_check_pages(mr, iova, sizeof(char), 0)) + goto need_fault; + + err = rxe_mr_do_atomic_op(mr, iova, opcode, compare, + swap_add, orig_val); + + mutex_unlock(&umem_odp->umem_mutex); + + return err; + +need_fault: + /* allow max 3 tries for pagefault */ + do { + mutex_unlock(&umem_odp->umem_mutex); + + if (retry > 2) + return -EFAULT; + + /* umem_mutex is locked on success */ + err = rxe_odp_do_pagefault_and_lock(mr, iova, sizeof(char), 0); + if (err < 0) + return err; + retry++; + } while (rxe_odp_check_pages(mr, iova, sizeof(char), 0)); + + err = rxe_mr_do_atomic_op(mr, iova, opcode, compare, + swap_add, orig_val); + + mutex_unlock(&umem_odp->umem_mutex); + + return err; +} diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c index 9159f1bdfc6f..af3e669679a0 100644 --- a/drivers/infiniband/sw/rxe/rxe_resp.c +++ b/drivers/infiniband/sw/rxe/rxe_resp.c @@ -693,7 +693,10 @@ static enum resp_states atomic_reply(struct rxe_qp *qp, u64 iova = qp->resp.va + qp->resp.offset; if (mr->umem->is_odp) - err = RESPST_ERR_UNSUPPORTED_OPCODE; + err = rxe_odp_mr_atomic_op(mr, iova, pkt->opcode, + atmeth_comp(pkt), + atmeth_swap_add(pkt), + &res->atomic.orig_val); else err = rxe_mr_do_atomic_op(mr, iova, pkt->opcode, atmeth_comp(pkt),