From patchwork Fri Dec 23 06:51:52 2022
X-Patchwork-Submitter: "Daisuke Matsuda (Fujitsu)"
X-Patchwork-Id: 36144
From: Daisuke Matsuda
To: linux-rdma@vger.kernel.org, leonro@nvidia.com, jgg@nvidia.com, zyjzyj2000@gmail.com
Cc: nvdimm@lists.linux.dev, linux-kernel@vger.kernel.org, rpearsonhpe@gmail.com, yangx.jy@fujitsu.com, lizhijian@fujitsu.com, y-goto@fujitsu.com, Daisuke Matsuda
Subject: [PATCH for-next v3 1/7] RDMA/rxe: Convert triple tasklets to use workqueue
Date: Fri, 23 Dec 2022 15:51:52 +0900

In order to implement On-Demand Paging (ODP) on the rxe driver, the three tasklets (requester, responder, and completer) must be allowed to sleep so that they can trigger page faults when the pages being accessed are not present. This patch replaces the tasklets with a workqueue, but still allows direct calls of the work functions from softirq context when it is obvious that MRs are not going to be accessed and no work is being processed in the workqueue.
Because workqueues lack counterparts to tasklet_disable() and tasklet_enable(), an atomic counter is introduced to prevent work items from being scheduled while a qp reset is in progress. The workqueue initialization/destruction code is taken from the implementation by Ian Ziemba and Bob Pearson at HPE.

Link: https://lore.kernel.org/all/20221018043345.4033-1-rpearsonhpe@gmail.com/
Signed-off-by: Daisuke Matsuda
---
 drivers/infiniband/sw/rxe/rxe.c | 9 ++++-
 drivers/infiniband/sw/rxe/rxe_comp.c | 2 +-
 drivers/infiniband/sw/rxe/rxe_param.h | 2 +-
 drivers/infiniband/sw/rxe/rxe_qp.c | 2 +-
 drivers/infiniband/sw/rxe/rxe_req.c | 2 +-
 drivers/infiniband/sw/rxe/rxe_resp.c | 2 +-
 drivers/infiniband/sw/rxe/rxe_task.c | 52 ++++++++++++++++++++-------
 drivers/infiniband/sw/rxe/rxe_task.h | 15 ++++++--
 8 files changed, 65 insertions(+), 21 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c index 136c2efe3466..3c7e42e5b0c7 100644 --- a/drivers/infiniband/sw/rxe/rxe.c +++ b/drivers/infiniband/sw/rxe/rxe.c
@@ -210,10 +210,16 @@ static int __init rxe_module_init(void) { int err; - err = rxe_net_init(); + err = rxe_alloc_wq(); if (err) return err; + err = rxe_net_init(); + if (err) { + rxe_destroy_wq(); + return err; + } + rdma_link_register(&rxe_link_ops); pr_info("loaded\n"); return 0;
@@ -224,6 +230,7 @@ static void __exit rxe_module_exit(void) rdma_link_unregister(&rxe_link_ops); ib_unregister_driver(RDMA_DRIVER_RXE); rxe_net_exit(); + rxe_destroy_wq(); pr_info("unloaded\n"); }
diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c index 20737fec392b..046bbacce37c 100644 --- a/drivers/infiniband/sw/rxe/rxe_comp.c +++ b/drivers/infiniband/sw/rxe/rxe_comp.c
@@ -773,7 +773,7 @@ int rxe_completer(void *arg) } /* A non-zero return value will cause rxe_do_task to - * exit its loop and end the tasklet. A zero return + * exit its loop and end the work item. A zero return * will continue looping and return to rxe_completer */ done:
diff --git a/drivers/infiniband/sw/rxe/rxe_param.h b/drivers/infiniband/sw/rxe/rxe_param.h index a754fc902e3d..bd8050e99d6b 100644 --- a/drivers/infiniband/sw/rxe/rxe_param.h +++ b/drivers/infiniband/sw/rxe/rxe_param.h
@@ -112,7 +112,7 @@ enum rxe_device_param { RXE_INFLIGHT_SKBS_PER_QP_HIGH = 64, RXE_INFLIGHT_SKBS_PER_QP_LOW = 16, - /* Max number of interations of each tasklet + /* Max number of interations of each work item * before yielding the cpu to let other * work make progress */
diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c index ab72db68b58f..e033b2449dfe 100644 --- a/drivers/infiniband/sw/rxe/rxe_qp.c +++ b/drivers/infiniband/sw/rxe/rxe_qp.c
@@ -471,7 +471,7 @@ int rxe_qp_chk_attr(struct rxe_dev *rxe, struct rxe_qp *qp, /* move the qp to the reset state */ static void rxe_qp_reset(struct rxe_qp *qp) { - /* stop tasks from running */ + /* flush workqueue and stop tasks from running */ rxe_disable_task(&qp->resp.task); /* stop request/comp */
diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c index 899c8779f800..2bcd287a2c3b 100644 --- a/drivers/infiniband/sw/rxe/rxe_req.c +++ b/drivers/infiniband/sw/rxe/rxe_req.c
@@ -830,7 +830,7 @@ int rxe_requester(void *arg) update_state(qp, &pkt); /* A non-zero return value will cause rxe_do_task to - * exit its loop and end the tasklet. A zero return + * exit its loop and end the work item.
A zero return * will continue looping and return to rxe_requester */ done: diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c index c74972244f08..d9134a00a529 100644 --- a/drivers/infiniband/sw/rxe/rxe_resp.c +++ b/drivers/infiniband/sw/rxe/rxe_resp.c @@ -1691,7 +1691,7 @@ int rxe_responder(void *arg) } /* A non-zero return value will cause rxe_do_task to - * exit its loop and end the tasklet. A zero return + * exit its loop and end the work item. A zero return * will continue looping and return to rxe_responder */ done: diff --git a/drivers/infiniband/sw/rxe/rxe_task.c b/drivers/infiniband/sw/rxe/rxe_task.c index 60b90e33a884..b96f72aa9005 100644 --- a/drivers/infiniband/sw/rxe/rxe_task.c +++ b/drivers/infiniband/sw/rxe/rxe_task.c @@ -6,6 +6,22 @@ #include "rxe.h" +static struct workqueue_struct *rxe_wq; + +int rxe_alloc_wq(void) +{ + rxe_wq = alloc_workqueue("rxe_wq", WQ_CPU_INTENSIVE, WQ_MAX_ACTIVE); + if (!rxe_wq) + return -ENOMEM; + + return 0; +} + +void rxe_destroy_wq(void) +{ + destroy_workqueue(rxe_wq); +} + int __rxe_do_task(struct rxe_task *task) { @@ -24,11 +40,11 @@ int __rxe_do_task(struct rxe_task *task) * a second caller finds the task already running * but looks just after the last call to func */ -static void do_task(struct tasklet_struct *t) +static void do_task(struct work_struct *w) { int cont; int ret; - struct rxe_task *task = from_tasklet(task, t, tasklet); + struct rxe_task *task = container_of(w, typeof(*task), work); struct rxe_qp *qp = (struct rxe_qp *)task->arg; unsigned int iterations = RXE_MAX_ITERATIONS; @@ -64,10 +80,10 @@ static void do_task(struct tasklet_struct *t) } else if (iterations--) { cont = 1; } else { - /* reschedule the tasklet and exit + /* reschedule the work item and exit * the loop to give up the cpu */ - tasklet_schedule(&task->tasklet); + queue_work(task->workq, &task->work); task->state = TASK_STATE_START; } break; @@ -97,7 +113,8 @@ int rxe_init_task(struct rxe_task *task, void *arg, int (*func)(void *)) task->func = func; task->destroyed = false; - tasklet_setup(&task->tasklet, do_task); + INIT_WORK(&task->work, do_task); + task->workq = rxe_wq; task->state = TASK_STATE_START; spin_lock_init(&task->lock); @@ -111,17 +128,16 @@ void rxe_cleanup_task(struct rxe_task *task) /* * Mark the task, then wait for it to finish. It might be - * running in a non-tasklet (direct call) context. + * running in a non-workqueue (direct call) context. */ task->destroyed = true; + flush_workqueue(task->workq); do { spin_lock_bh(&task->lock); idle = (task->state == TASK_STATE_START); spin_unlock_bh(&task->lock); } while (!idle); - - tasklet_kill(&task->tasklet); } void rxe_run_task(struct rxe_task *task) @@ -129,7 +145,7 @@ void rxe_run_task(struct rxe_task *task) if (task->destroyed) return; - do_task(&task->tasklet); + do_task(&task->work); } void rxe_sched_task(struct rxe_task *task) @@ -137,15 +153,27 @@ void rxe_sched_task(struct rxe_task *task) if (task->destroyed) return; - tasklet_schedule(&task->tasklet); + /* + * busy-loop while qp reset is in progress. + * This may be called from softirq context and thus cannot sleep. 
+ */ + while (atomic_read(&task->suspended)) + cpu_relax(); + + queue_work(task->workq, &task->work); } void rxe_disable_task(struct rxe_task *task) { - tasklet_disable(&task->tasklet); + /* Alternative to tasklet_disable() */ + atomic_inc(&task->suspended); + smp_mb__after_atomic(); + flush_workqueue(task->workq); } void rxe_enable_task(struct rxe_task *task) { - tasklet_enable(&task->tasklet); + /* Alternative to tasklet_enable() */ + smp_mb__before_atomic(); + atomic_dec(&task->suspended); }
diff --git a/drivers/infiniband/sw/rxe/rxe_task.h b/drivers/infiniband/sw/rxe/rxe_task.h index 7b88129702ac..9aa3f236e886 100644 --- a/drivers/infiniband/sw/rxe/rxe_task.h +++ b/drivers/infiniband/sw/rxe/rxe_task.h
@@ -19,15 +19,22 @@ enum { * called again. */ struct rxe_task { - struct tasklet_struct tasklet; + struct workqueue_struct *workq; + struct work_struct work; int state; spinlock_t lock; void *arg; int (*func)(void *arg); int ret; bool destroyed; + /* used to {dis, en}able per-qp work items */ + atomic_t suspended; }; +int rxe_alloc_wq(void); + +void rxe_destroy_wq(void); + /* * init rxe_task structure * arg => parameter to pass to fcn */
@@ -40,18 +47,20 @@ void rxe_cleanup_task(struct rxe_task *task); /* * raw call to func in loop without any checking - * can call when tasklets are disabled + * can call when tasks are suspended */ int __rxe_do_task(struct rxe_task *task); +/* run a task without scheduling */ void rxe_run_task(struct rxe_task *task); +/* schedule a task into workqueue */ void rxe_sched_task(struct rxe_task *task); /* keep a task from scheduling */ void rxe_disable_task(struct rxe_task *task); -/* allow task to run */ +/* allow a task to run again */ void rxe_enable_task(struct rxe_task *task); #endif /* RXE_TASK_H */
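The suspend counter above is the core of the patch: it stands in for the missing tasklet_disable()/tasklet_enable() pair. Below is a minimal user-space analogue (an illustrative sketch, not the kernel code) with simplified names, C11 atomics in place of the kernel's atomic_t, and sched_yield() in place of cpu_relax():

#include <stdatomic.h>
#include <sched.h>

struct task {
	atomic_int suspended;	/* > 0 while a qp reset is in progress */
};

/* analogue of rxe_disable_task(): block new work, then drain what runs */
static void task_disable(struct task *t)
{
	atomic_fetch_add(&t->suspended, 1);
	/* the kernel version also calls flush_workqueue(task->workq) here */
}

/* analogue of rxe_enable_task(): allow scheduling again */
static void task_enable(struct task *t)
{
	atomic_fetch_sub(&t->suspended, 1);
}

/* analogue of rxe_sched_task(): may run in softirq context in the
 * kernel, so it spins rather than sleeping until the reset ends */
static void task_sched(struct task *t)
{
	while (atomic_load(&t->suspended))
		sched_yield();
	/* queue_work(task->workq, &task->work) would follow here */
}

Pairing task_disable()/task_enable() around a reset guarantees that no work item is queued while the qp is being torn down, at the cost of a busy-wait in the scheduling path.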
From patchwork Fri Dec 23 06:51:53 2022
X-Patchwork-Submitter: "Daisuke Matsuda (Fujitsu)"
X-Patchwork-Id: 36148

From: Daisuke Matsuda
To: linux-rdma@vger.kernel.org, leonro@nvidia.com, jgg@nvidia.com, zyjzyj2000@gmail.com
Cc: nvdimm@lists.linux.dev, linux-kernel@vger.kernel.org, rpearsonhpe@gmail.com, yangx.jy@fujitsu.com, lizhijian@fujitsu.com, y-goto@fujitsu.com, Daisuke Matsuda
Subject: [PATCH for-next v3 2/7] RDMA/rxe: Always schedule works before accessing user MRs
Date: Fri, 23 Dec 2022 15:51:53 +0900
Message-Id: <53e045f842a5519eb905c447eeff9441de609fdb.1671772917.git.matsuda-daisuke@fujitsu.com>

Both the responder and the completer can sleep to handle page faults when used with ODP, and the page-fault handler can be invoked when they are about to access user MRs, so work items must be scheduled in such cases.
Signed-off-by: Daisuke Matsuda --- drivers/infiniband/sw/rxe/rxe_comp.c | 20 ++++++++++++++++++-- drivers/infiniband/sw/rxe/rxe_loc.h | 4 ++-- drivers/infiniband/sw/rxe/rxe_recv.c | 4 ++-- drivers/infiniband/sw/rxe/rxe_resp.c | 15 ++++++++++----- 4 files changed, 32 insertions(+), 11 deletions(-) diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c index 046bbacce37c..421f4ffe51a3 100644 --- a/drivers/infiniband/sw/rxe/rxe_comp.c +++ b/drivers/infiniband/sw/rxe/rxe_comp.c @@ -124,13 +124,29 @@ void retransmit_timer(struct timer_list *t) } } -void rxe_comp_queue_pkt(struct rxe_qp *qp, struct sk_buff *skb) +void rxe_comp_queue_pkt(struct rxe_pkt_info *pkt, struct sk_buff *skb) { + struct rxe_qp *qp = pkt->qp; int must_sched; skb_queue_tail(&qp->resp_pkts, skb); - must_sched = skb_queue_len(&qp->resp_pkts) > 1; + /* Schedule a work item if processing READ or ATOMIC acks. + * In these cases, completer may sleep to access ODP-enabled MRs. + */ + switch (pkt->opcode) { + case IB_OPCODE_RC_RDMA_READ_RESPONSE_ONLY: + case IB_OPCODE_RC_RDMA_READ_RESPONSE_FIRST: + case IB_OPCODE_RC_RDMA_READ_RESPONSE_MIDDLE: + case IB_OPCODE_RC_RDMA_READ_RESPONSE_LAST: + case IB_OPCODE_RC_ATOMIC_ACKNOWLEDGE: + must_sched = 1; + break; + + default: + must_sched = skb_queue_len(&qp->resp_pkts) > 1; + } + if (must_sched != 0) rxe_counter_inc(SKB_TO_PKT(skb)->rxe, RXE_CNT_COMPLETER_SCHED); diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h index 948ce4902b10..d567aa65b5e0 100644 --- a/drivers/infiniband/sw/rxe/rxe_loc.h +++ b/drivers/infiniband/sw/rxe/rxe_loc.h @@ -176,9 +176,9 @@ int rxe_icrc_init(struct rxe_dev *rxe); int rxe_icrc_check(struct sk_buff *skb, struct rxe_pkt_info *pkt); void rxe_icrc_generate(struct sk_buff *skb, struct rxe_pkt_info *pkt); -void rxe_resp_queue_pkt(struct rxe_qp *qp, struct sk_buff *skb); +void rxe_resp_queue_pkt(struct rxe_pkt_info *pkt, struct sk_buff *skb); -void rxe_comp_queue_pkt(struct rxe_qp *qp, struct sk_buff *skb); +void rxe_comp_queue_pkt(struct rxe_pkt_info *pkt, struct sk_buff *skb); static inline unsigned int wr_opcode_mask(int opcode, struct rxe_qp *qp) { diff --git a/drivers/infiniband/sw/rxe/rxe_recv.c b/drivers/infiniband/sw/rxe/rxe_recv.c index 434a693cd4a5..01d07572a56f 100644 --- a/drivers/infiniband/sw/rxe/rxe_recv.c +++ b/drivers/infiniband/sw/rxe/rxe_recv.c @@ -174,9 +174,9 @@ static int hdr_check(struct rxe_pkt_info *pkt) static inline void rxe_rcv_pkt(struct rxe_pkt_info *pkt, struct sk_buff *skb) { if (pkt->mask & RXE_REQ_MASK) - rxe_resp_queue_pkt(pkt->qp, skb); + rxe_resp_queue_pkt(pkt, skb); else - rxe_comp_queue_pkt(pkt->qp, skb); + rxe_comp_queue_pkt(pkt, skb); } static void rxe_rcv_mcast_pkt(struct rxe_dev *rxe, struct sk_buff *skb) diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c index d9134a00a529..991550baef8c 100644 --- a/drivers/infiniband/sw/rxe/rxe_resp.c +++ b/drivers/infiniband/sw/rxe/rxe_resp.c @@ -85,15 +85,20 @@ static char *resp_state_name[] = { }; /* rxe_recv calls here to add a request packet to the input queue */ -void rxe_resp_queue_pkt(struct rxe_qp *qp, struct sk_buff *skb) +void rxe_resp_queue_pkt(struct rxe_pkt_info *pkt, struct sk_buff *skb) { - int must_sched; - struct rxe_pkt_info *pkt = SKB_TO_PKT(skb); + int must_sched = 1; + struct rxe_qp *qp = pkt->qp; skb_queue_tail(&qp->req_pkts, skb); - must_sched = (pkt->opcode == IB_OPCODE_RC_RDMA_READ_REQUEST) || - (skb_queue_len(&qp->req_pkts) > 1); + /* responder can sleep to 
access an ODP-enabled MR. Always schedule + work items for non-zero-byte operations, RDMA READ, and ATOMIC + operations. */ + if ((skb_queue_len(&qp->req_pkts) == 1) && (payload_size(pkt) == 0) + && !(pkt->mask & RXE_READ_OR_ATOMIC_MASK)) + must_sched = 0; if (must_sched) rxe_sched_task(&qp->resp.task);
From patchwork Fri Dec 23 06:51:54 2022
X-Patchwork-Submitter: "Daisuke Matsuda (Fujitsu)"
X-Patchwork-Id: 36141

From: Daisuke Matsuda
To: linux-rdma@vger.kernel.org, leonro@nvidia.com, jgg@nvidia.com, zyjzyj2000@gmail.com
Cc: nvdimm@lists.linux.dev, linux-kernel@vger.kernel.org, rpearsonhpe@gmail.com, yangx.jy@fujitsu.com, lizhijian@fujitsu.com, y-goto@fujitsu.com, Daisuke Matsuda
Subject: [PATCH for-next v3 3/7] RDMA/rxe: Cleanup code for responder Atomic operations
Date: Fri, 23 Dec 2022 15:51:54 +0900

Currently, rxe_responder() directly calls the function that executes Atomic operations. This needs to be modified to insert some conditional branches for the ODP feature. Additionally, rxe_resp.h is newly added so that it can be used by rxe_odp.c in the near future.
Signed-off-by: Daisuke Matsuda --- drivers/infiniband/sw/rxe/rxe_resp.c | 100 +++++++++++++++++---------- drivers/infiniband/sw/rxe/rxe_resp.h | 9 +++ 2 files changed, 71 insertions(+), 38 deletions(-) create mode 100644 drivers/infiniband/sw/rxe/rxe_resp.h diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c index 991550baef8c..e18bca076337 100644 --- a/drivers/infiniband/sw/rxe/rxe_resp.c +++ b/drivers/infiniband/sw/rxe/rxe_resp.c @@ -9,6 +9,7 @@ #include "rxe.h" #include "rxe_loc.h" #include "rxe_queue.h" +#include "rxe_resp.h" enum resp_states { RESPST_NONE, @@ -733,60 +734,83 @@ static enum resp_states process_flush(struct rxe_qp *qp, /* Guarantee atomicity of atomic operations at the machine level. */ static DEFINE_SPINLOCK(atomic_ops_lock); -static enum resp_states atomic_reply(struct rxe_qp *qp, - struct rxe_pkt_info *pkt) +enum resp_states rxe_process_atomic(struct rxe_qp *qp, + struct rxe_pkt_info *pkt, u64 *vaddr) { - u64 *vaddr; enum resp_states ret; - struct rxe_mr *mr = qp->resp.mr; struct resp_res *res = qp->resp.res; u64 value; - if (!res) { - res = rxe_prepare_res(qp, pkt, RXE_ATOMIC_MASK); - qp->resp.res = res; + /* check vaddr is 8 bytes aligned. */ + if (!vaddr || (uintptr_t)vaddr & 7) { + ret = RESPST_ERR_MISALIGNED_ATOMIC; + goto out; } - if (!res->replay) { - if (mr->state != RXE_MR_STATE_VALID) { - ret = RESPST_ERR_RKEY_VIOLATION; - goto out; - } + spin_lock(&atomic_ops_lock); + res->atomic.orig_val = value = *vaddr; - vaddr = iova_to_vaddr(mr, qp->resp.va + qp->resp.offset, - sizeof(u64)); + if (pkt->opcode == IB_OPCODE_RC_COMPARE_SWAP) { + if (value == atmeth_comp(pkt)) + value = atmeth_swap_add(pkt); + } else { + value += atmeth_swap_add(pkt); + } - /* check vaddr is 8 bytes aligned. */ - if (!vaddr || (uintptr_t)vaddr & 7) { - ret = RESPST_ERR_MISALIGNED_ATOMIC; - goto out; - } + *vaddr = value; + spin_unlock(&atomic_ops_lock); - spin_lock_bh(&atomic_ops_lock); - res->atomic.orig_val = value = *vaddr; + qp->resp.msn++; - if (pkt->opcode == IB_OPCODE_RC_COMPARE_SWAP) { - if (value == atmeth_comp(pkt)) - value = atmeth_swap_add(pkt); - } else { - value += atmeth_swap_add(pkt); - } + /* next expected psn, read handles this separately */ + qp->resp.psn = (pkt->psn + 1) & BTH_PSN_MASK; + qp->resp.ack_psn = qp->resp.psn; - *vaddr = value; - spin_unlock_bh(&atomic_ops_lock); + qp->resp.opcode = pkt->opcode; + qp->resp.status = IB_WC_SUCCESS; - qp->resp.msn++; + ret = RESPST_ACKNOWLEDGE; +out: + return ret; +} + +static enum resp_states rxe_atomic_ops(struct rxe_qp *qp, + struct rxe_pkt_info *pkt, + struct rxe_mr *mr) +{ + u64 *vaddr; + int ret; + + vaddr = iova_to_vaddr(mr, qp->resp.va + qp->resp.offset, + sizeof(u64)); + + if (pkt->mask & RXE_ATOMIC_MASK) + ret = rxe_process_atomic(qp, pkt, vaddr); + else + ret = RESPST_ERR_UNSUPPORTED_OPCODE; + + return ret; +} - /* next expected psn, read handles this separately */ - qp->resp.psn = (pkt->psn + 1) & BTH_PSN_MASK; - qp->resp.ack_psn = qp->resp.psn; +static enum resp_states rxe_atomic_reply(struct rxe_qp *qp, + struct rxe_pkt_info *pkt) +{ + struct rxe_mr *mr = qp->resp.mr; + struct resp_res *res = qp->resp.res; + int ret; - qp->resp.opcode = pkt->opcode; - qp->resp.status = IB_WC_SUCCESS; + if (!res) { + res = rxe_prepare_res(qp, pkt, RXE_ATOMIC_MASK); + qp->resp.res = res; } - ret = RESPST_ACKNOWLEDGE; -out: + if (!res->replay) { + if (mr->state != RXE_MR_STATE_VALID) + return RESPST_ERR_RKEY_VIOLATION; + ret = rxe_atomic_ops(qp, pkt, mr); + } else + ret = RESPST_ACKNOWLEDGE; + 
return ret; }
@@ -1556,7 +1580,7 @@ int rxe_responder(void *arg) state = read_reply(qp, pkt); break; case RESPST_ATOMIC_REPLY: - state = atomic_reply(qp, pkt); + state = rxe_atomic_reply(qp, pkt); break; case RESPST_ATOMIC_WRITE_REPLY: state = atomic_write_reply(qp, pkt);
diff --git a/drivers/infiniband/sw/rxe/rxe_resp.h b/drivers/infiniband/sw/rxe/rxe_resp.h new file mode 100644 index 000000000000..94a4869fdab6 --- /dev/null +++ b/drivers/infiniband/sw/rxe/rxe_resp.h
@@ -0,0 +1,9 @@ +/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */ + +#ifndef RXE_RESP_H +#define RXE_RESP_H + +enum resp_states rxe_process_atomic(struct rxe_qp *qp, + struct rxe_pkt_info *pkt, u64 *vaddr); + +#endif /* RXE_RESP_H */
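Before moving on, it may help to spell out the semantics that rxe_process_atomic() factors out: IB Compare-Swap conditionally replaces the 8-byte target value, Fetch-Add unconditionally adds to it, and both return the original value to the requester. A user-space sketch of the same logic (illustrative only; a pthread mutex stands in for atomic_ops_lock):

#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

static pthread_mutex_t atomic_ops_lock = PTHREAD_MUTEX_INITIALIZER;

/* Returns the original value at *vaddr, as an IB atomic ack would. */
static uint64_t ib_atomic_op(uint64_t *vaddr, bool is_cmp_swp,
			     uint64_t cmp, uint64_t swap_add)
{
	uint64_t orig;

	pthread_mutex_lock(&atomic_ops_lock);
	orig = *vaddr;
	if (is_cmp_swp) {
		if (orig == cmp)
			*vaddr = swap_add;	/* compare-swap */
	} else {
		*vaddr = orig + swap_add;	/* fetch-add */
	}
	pthread_mutex_unlock(&atomic_ops_lock);

	return orig;
}

The single global lock mirrors the driver's approach: one spinlock serializes all Atomic operations so that they appear atomic to every remote requester.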
From patchwork Fri Dec 23 06:51:55 2022
X-Patchwork-Submitter: "Daisuke Matsuda (Fujitsu)"
X-Patchwork-Id: 36142

From: Daisuke Matsuda
To: linux-rdma@vger.kernel.org, leonro@nvidia.com, jgg@nvidia.com, zyjzyj2000@gmail.com
Cc: nvdimm@lists.linux.dev, linux-kernel@vger.kernel.org, rpearsonhpe@gmail.com, yangx.jy@fujitsu.com, lizhijian@fujitsu.com, y-goto@fujitsu.com, Daisuke Matsuda
Subject: [PATCH for-next v3 4/7] RDMA/rxe: Add page invalidation support
Date: Fri, 23 Dec 2022 15:51:55 +0900
Message-Id: <97b0362e256dd3eb022d81c30941edd8fb0caba9.1671772917.git.matsuda-daisuke@fujitsu.com>

On page invalidation, an MMU notifier callback is invoked to unmap DMA addresses and to update the driver page table (umem_odp->dma_list). The callback is registered when an ODP-enabled MR is created.
Signed-off-by: Daisuke Matsuda
---
 drivers/infiniband/sw/rxe/Makefile | 2 ++
 drivers/infiniband/sw/rxe/rxe_odp.c | 34 +++++++++++++++++++++++++++++
 2 files changed, 36 insertions(+)
 create mode 100644 drivers/infiniband/sw/rxe/rxe_odp.c

diff --git a/drivers/infiniband/sw/rxe/Makefile b/drivers/infiniband/sw/rxe/Makefile index 5395a581f4bb..93134f1d1d0c 100644 --- a/drivers/infiniband/sw/rxe/Makefile +++ b/drivers/infiniband/sw/rxe/Makefile
@@ -23,3 +23,5 @@ rdma_rxe-y := \ rxe_task.o \ rxe_net.o \ rxe_hw_counters.o + +rdma_rxe-$(CONFIG_INFINIBAND_ON_DEMAND_PAGING) += rxe_odp.o
diff --git a/drivers/infiniband/sw/rxe/rxe_odp.c b/drivers/infiniband/sw/rxe/rxe_odp.c new file mode 100644 index 000000000000..0787a9b19646 --- /dev/null +++ b/drivers/infiniband/sw/rxe/rxe_odp.c
@@ -0,0 +1,34 @@ +// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB +/* + * Copyright (c) 2022 Fujitsu Ltd. All rights reserved. + */ + +#include + +static bool rxe_ib_invalidate_range(struct mmu_interval_notifier *mni, + const struct mmu_notifier_range *range, + unsigned long cur_seq) +{ + struct ib_umem_odp *umem_odp = + container_of(mni, struct ib_umem_odp, notifier); + unsigned long start; + unsigned long end; + + if (!mmu_notifier_range_blockable(range)) + return false; + + mutex_lock(&umem_odp->umem_mutex); + mmu_interval_set_seq(mni, cur_seq); + + start = max_t(u64, ib_umem_start(umem_odp), range->start); + end = min_t(u64, ib_umem_end(umem_odp), range->end); + + ib_umem_odp_unmap_dma_pages(umem_odp, start, end); + + mutex_unlock(&umem_odp->umem_mutex); + return true; +} + +const struct mmu_interval_notifier_ops rxe_mn_ops = { + .invalidate = rxe_ib_invalidate_range, +};
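For orientation: rxe_mn_ops is not yet hooked up by this patch. The registration happens in patch 5/7, where ODP MR creation passes the ops table to the core (condensed from that patch):

	umem_odp = ib_umem_odp_get(pd->device, start, length, access_flags,
				   &rxe_mn_ops);

From that point on, the core MM invokes rxe_ib_invalidate_range() whenever pages in the registered range are about to be unmapped: mmu_interval_set_seq() forces any concurrent ib_umem_odp_map_dma_and_lock() to retry, and ib_umem_odp_unmap_dma_pages() drops the stale DMA mappings, all under umem_odp->umem_mutex.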
From patchwork Fri Dec 23 06:51:56 2022
X-Patchwork-Submitter: "Daisuke Matsuda (Fujitsu)"
X-Patchwork-Id: 36146

From: Daisuke Matsuda
To: linux-rdma@vger.kernel.org, leonro@nvidia.com, jgg@nvidia.com, zyjzyj2000@gmail.com
Cc: nvdimm@lists.linux.dev, linux-kernel@vger.kernel.org, rpearsonhpe@gmail.com, yangx.jy@fujitsu.com, lizhijian@fujitsu.com, y-goto@fujitsu.com, Daisuke Matsuda
Subject: [PATCH for-next v3 5/7] RDMA/rxe: Allow registering MRs for On-Demand Paging
Date: Fri, 23 Dec 2022 15:51:56 +0900
Message-Id: <412a759639776180a187c36632397ee8796d0255.1671772917.git.matsuda-daisuke@fujitsu.com>

Allow applications to register an ODP-enabled MR, in which case the flag IB_ACCESS_ON_DEMAND is passed to rxe_reg_user_mr(). However, no RDMA operation is supported yet; they will be enabled by the two subsequent patches.
rxe_odp_do_pagefault() is called to initialize an ODP-enabled MR. It syncs the process address space from the CPU page table to the driver page table (dma_list/pfn_list in umem_odp) when called with the RXE_PAGEFAULT_SNAPSHOT flag. Additionally, it can be used to trigger a page fault when pages being accessed are not present or do not have proper read/write permissions, and possibly to prefetch pages in the future.

Signed-off-by: Daisuke Matsuda
---
 drivers/infiniband/sw/rxe/rxe.c | 7 +++
 drivers/infiniband/sw/rxe/rxe_loc.h | 16 ++++++
 drivers/infiniband/sw/rxe/rxe_mr.c | 7 ++-
 drivers/infiniband/sw/rxe/rxe_odp.c | 83 +++++++++++++++++++++++++++
 drivers/infiniband/sw/rxe/rxe_resp.c | 33 +++++++++--
 drivers/infiniband/sw/rxe/rxe_verbs.c | 7 ++-
 drivers/infiniband/sw/rxe/rxe_verbs.h | 2 +
 7 files changed, 147 insertions(+), 8 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c index 3c7e42e5b0c7..7a8a09487f96 100644 --- a/drivers/infiniband/sw/rxe/rxe.c +++ b/drivers/infiniband/sw/rxe/rxe.c
@@ -73,6 +73,13 @@ static void rxe_init_device_param(struct rxe_dev *rxe) rxe->ndev->dev_addr); rxe->max_ucontext = RXE_MAX_UCONTEXT; + + if (IS_ENABLED(CONFIG_INFINIBAND_ON_DEMAND_PAGING)) { + rxe->attr.kernel_cap_flags |= IBK_ON_DEMAND_PAGING; + + /* IB_ODP_SUPPORT_IMPLICIT is not supported right now. */ + rxe->attr.odp_caps.general_caps |= IB_ODP_SUPPORT; + } } /* initialize port attributes */
diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h index d567aa65b5e0..ab66a07e5700 100644 --- a/drivers/infiniband/sw/rxe/rxe_loc.h +++ b/drivers/infiniband/sw/rxe/rxe_loc.h
@@ -60,6 +60,7 @@ int rxe_mmap(struct ib_ucontext *context, struct vm_area_struct *vma); /* rxe_mr.c */ u8 rxe_get_next_key(u32 last_key); +void rxe_mr_init(int access, struct rxe_mr *mr); void rxe_mr_init_dma(int access, struct rxe_mr *mr); int rxe_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova, int access, struct rxe_mr *mr);
@@ -185,4 +186,19 @@ static inline unsigned int wr_opcode_mask(int opcode, struct rxe_qp *qp) return rxe_wr_opcode_info[opcode].mask[qp->ibqp.qp_type]; } +#ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING +/* rxe_odp.c */ +int rxe_create_user_odp_mr(struct ib_pd *pd, u64 start, u64 length, u64 iova, + int access_flags, struct rxe_mr *mr); + +#else /* CONFIG_INFINIBAND_ON_DEMAND_PAGING */ +static inline int +rxe_create_user_odp_mr(struct ib_pd *pd, u64 start, u64 length, u64 iova, + int access_flags, struct rxe_mr *mr) +{ + return -EOPNOTSUPP; +} + +#endif /* CONFIG_INFINIBAND_ON_DEMAND_PAGING */ + #endif /* RXE_LOC_H */
diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c index 072eac4b65d2..261bb462341b 100644 --- a/drivers/infiniband/sw/rxe/rxe_mr.c +++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -49,7 +49,7 @@ int mr_check_range(struct rxe_mr *mr, u64 iova, size_t length) | IB_ACCESS_REMOTE_WRITE \ | IB_ACCESS_REMOTE_ATOMIC) -static void rxe_mr_init(int access, struct rxe_mr *mr) +void rxe_mr_init(int access, struct rxe_mr *mr) { u32 lkey = mr->elem.index << 8 | rxe_get_next_key(-1); u32 rkey = (access & IB_ACCESS_REMOTE) ?
lkey : 0; @@ -478,7 +478,10 @@ int copy_data( if (bytes > 0) { iova = sge->addr + offset; - err = rxe_mr_copy(mr, iova, addr, bytes, dir); + if (mr->odp_enabled) + err = -EOPNOTSUPP; + else + err = rxe_mr_copy(mr, iova, addr, bytes, dir); if (err) goto err2; diff --git a/drivers/infiniband/sw/rxe/rxe_odp.c b/drivers/infiniband/sw/rxe/rxe_odp.c index 0787a9b19646..8b01ceaeee36 100644 --- a/drivers/infiniband/sw/rxe/rxe_odp.c +++ b/drivers/infiniband/sw/rxe/rxe_odp.c @@ -5,6 +5,8 @@ #include +#include "rxe.h" + static bool rxe_ib_invalidate_range(struct mmu_interval_notifier *mni, const struct mmu_notifier_range *range, unsigned long cur_seq) @@ -32,3 +34,84 @@ static bool rxe_ib_invalidate_range(struct mmu_interval_notifier *mni, const struct mmu_interval_notifier_ops rxe_mn_ops = { .invalidate = rxe_ib_invalidate_range, }; + +#define RXE_PAGEFAULT_RDONLY BIT(1) +#define RXE_PAGEFAULT_SNAPSHOT BIT(2) +static int rxe_odp_do_pagefault(struct rxe_mr *mr, u64 user_va, int bcnt, u32 flags) +{ + int np; + u64 access_mask; + bool fault = !(flags & RXE_PAGEFAULT_SNAPSHOT); + struct ib_umem_odp *umem_odp = to_ib_umem_odp(mr->umem); + + access_mask = ODP_READ_ALLOWED_BIT; + if (umem_odp->umem.writable && !(flags & RXE_PAGEFAULT_RDONLY)) + access_mask |= ODP_WRITE_ALLOWED_BIT; + + /* + * ib_umem_odp_map_dma_and_lock() locks umem_mutex on success. + * Callers must release the lock later to let invalidation handler + * do its work again. + */ + np = ib_umem_odp_map_dma_and_lock(umem_odp, user_va, bcnt, + access_mask, fault); + + return np; +} + +static int rxe_init_odp_mr(struct rxe_mr *mr) +{ + int ret; + struct ib_umem_odp *umem_odp = to_ib_umem_odp(mr->umem); + + ret = rxe_odp_do_pagefault(mr, mr->umem->address, mr->umem->length, + RXE_PAGEFAULT_SNAPSHOT); + + if (ret >= 0) + mutex_unlock(&umem_odp->umem_mutex); + + return ret >= 0 ? 0 : ret; +} + +int rxe_create_user_odp_mr(struct ib_pd *pd, u64 start, u64 length, u64 iova, + int access_flags, struct rxe_mr *mr) +{ + int err; + struct ib_umem_odp *umem_odp; + struct rxe_dev *dev = container_of(pd->device, struct rxe_dev, ib_dev); + + if (!IS_ENABLED(CONFIG_INFINIBAND_ON_DEMAND_PAGING)) + return -EOPNOTSUPP; + + rxe_mr_init(access_flags, mr); + + if (!start && length == U64_MAX) { + if (iova != 0) + return -EINVAL; + if (!(dev->attr.odp_caps.general_caps & IB_ODP_SUPPORT_IMPLICIT)) + return -EINVAL; + + /* Never reach here, for implicit ODP is not implemented. 
*/ + } + + umem_odp = ib_umem_odp_get(pd->device, start, length, access_flags, + &rxe_mn_ops); + if (IS_ERR(umem_odp)) + return PTR_ERR(umem_odp); + + umem_odp->private = mr; + + mr->odp_enabled = true; + mr->ibmr.pd = pd; + mr->umem = &umem_odp->umem; + mr->access = access_flags; + mr->ibmr.length = length; + mr->ibmr.iova = iova; + mr->offset = ib_umem_offset(&umem_odp->umem); + mr->state = RXE_MR_STATE_VALID; + mr->ibmr.type = IB_MR_TYPE_USER; + + err = rxe_init_odp_mr(mr); + + return err; +} diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c index e18bca076337..f2c803036888 100644 --- a/drivers/infiniband/sw/rxe/rxe_resp.c +++ b/drivers/infiniband/sw/rxe/rxe_resp.c @@ -627,8 +627,16 @@ static enum resp_states write_data_in(struct rxe_qp *qp, int err; int data_len = payload_size(pkt); - err = rxe_mr_copy(qp->resp.mr, qp->resp.va + qp->resp.offset, - payload_addr(pkt), data_len, RXE_TO_MR_OBJ); + /* resp.mr is not set in check_rkey() for zero byte operations */ + if (data_len == 0) + goto out; + + if (qp->resp.mr->odp_enabled) + err = RESPST_ERR_UNSUPPORTED_OPCODE; + else + err = rxe_mr_copy(qp->resp.mr, qp->resp.va + qp->resp.offset, + payload_addr(pkt), data_len, RXE_TO_MR_OBJ); + if (err) { rc = RESPST_ERR_RKEY_VIOLATION; goto out; @@ -693,6 +701,9 @@ static enum resp_states process_flush(struct rxe_qp *qp, struct rxe_mr *mr = qp->resp.mr; struct resp_res *res = qp->resp.res; + if (mr->odp_enabled) + return RESPST_ERR_UNSUPPORTED_OPCODE; + /* oA19-14, oA19-15 */ if (res && res->replay) return RESPST_ACKNOWLEDGE; @@ -807,7 +818,11 @@ static enum resp_states rxe_atomic_reply(struct rxe_qp *qp, if (!res->replay) { if (mr->state != RXE_MR_STATE_VALID) return RESPST_ERR_RKEY_VIOLATION; - ret = rxe_atomic_ops(qp, pkt, mr); + + if (mr->odp_enabled) + ret = RESPST_ERR_UNSUPPORTED_OPCODE; + else + ret = rxe_atomic_ops(qp, pkt, mr); } else ret = RESPST_ACKNOWLEDGE; @@ -822,6 +837,9 @@ static enum resp_states do_atomic_write(struct rxe_qp *qp, int payload = payload_size(pkt); u64 src, *dst; + if (mr->odp_enabled) + return RESPST_ERR_UNSUPPORTED_OPCODE; + if (mr->state != RXE_MR_STATE_VALID) return RESPST_ERR_RKEY_VIOLATION; @@ -1031,8 +1049,13 @@ static enum resp_states read_reply(struct rxe_qp *qp, return RESPST_ERR_RNR; } - err = rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt), - payload, RXE_FROM_MR_OBJ); + /* mr is NULL for a zero byte operation. 
*/ + if ((res->read.resid != 0) && mr->odp_enabled) + err = RESPST_ERR_UNSUPPORTED_OPCODE; + else + err = rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt), + payload, RXE_FROM_MR_OBJ); + if (mr) rxe_put(mr); if (err) {
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c index 025b35bf014e..54cbb7a86019 100644 --- a/drivers/infiniband/sw/rxe/rxe_verbs.c +++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
@@ -903,7 +903,12 @@ static struct ib_mr *rxe_reg_user_mr(struct ib_pd *ibpd, mr->ibmr.pd = ibpd; mr->ibmr.device = ibpd->device; - err = rxe_mr_init_user(rxe, start, length, iova, access, mr); + if (access & IB_ACCESS_ON_DEMAND) + err = rxe_create_user_odp_mr(&pd->ibpd, start, length, iova, + access, mr); + else + err = rxe_mr_init_user(rxe, start, length, iova, access, mr); + if (err) goto err1;
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h index 19ddfa890480..f90116576e7b 100644 --- a/drivers/infiniband/sw/rxe/rxe_verbs.h +++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -327,6 +327,8 @@ struct rxe_mr { atomic_t num_mw; struct rxe_map **map; + + bool odp_enabled; }; enum rxe_mw_state {
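To see what this patch enables end to end, here is a minimal user-space sketch (not part of the series; it assumes an already-open ibv_context *ctx and ibv_pd *pd on an rxe device) that checks the newly advertised ODP capability and registers an ODP-enabled MR, which the kernel routes to rxe_create_user_odp_mr():

#include <stdio.h>
#include <infiniband/verbs.h>

static struct ibv_mr *reg_odp_mr(struct ibv_context *ctx,
				 struct ibv_pd *pd, void *buf, size_t len)
{
	struct ibv_device_attr_ex attr = {};

	/* rxe now sets IB_ODP_SUPPORT in odp_caps.general_caps */
	if (ibv_query_device_ex(ctx, NULL, &attr) ||
	    !(attr.odp_caps.general_caps & IBV_ODP_SUPPORT)) {
		fprintf(stderr, "ODP not supported on this device\n");
		return NULL;
	}

	/* IBV_ACCESS_ON_DEMAND reaches the driver as IB_ACCESS_ON_DEMAND */
	return ibv_reg_mr(pd, buf, len,
			  IBV_ACCESS_ON_DEMAND | IBV_ACCESS_LOCAL_WRITE |
			  IBV_ACCESS_REMOTE_READ | IBV_ACCESS_REMOTE_WRITE);
}

Until the remaining patches land, such an MR registers successfully, but data-path accesses to it fail (for example, copy_data() returns -EOPNOTSUPP), matching the note above that no RDMA operation is supported yet.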
From patchwork Fri Dec 23 06:51:57 2022
X-Patchwork-Submitter: "Daisuke Matsuda (Fujitsu)"
X-Patchwork-Id: 36145

From: Daisuke Matsuda
To: linux-rdma@vger.kernel.org, leonro@nvidia.com, jgg@nvidia.com, zyjzyj2000@gmail.com
Cc: nvdimm@lists.linux.dev, linux-kernel@vger.kernel.org, rpearsonhpe@gmail.com, yangx.jy@fujitsu.com, lizhijian@fujitsu.com, y-goto@fujitsu.com, Daisuke Matsuda
Subject: [PATCH for-next v3 6/7] RDMA/rxe: Add support for Send/Recv/Write/Read operations with ODP
Date: Fri, 23 Dec 2022 15:51:57 +0900
Message-Id: <9fc493c713731e03d31cfbb0e0b76aa9524dc228.1671772917.git.matsuda-daisuke@fujitsu.com>

rxe_mr_copy() is used widely to copy data to/from a user MR: the requester uses it to load the payloads of request packets; the responder uses it to process Send, Write, and Read operations; and the completer uses it to copy data from the response packets of Read and Atomic operations to a user MR. Allow these operations to be used with ODP by adding a counterpart function, rxe_odp_mr_copy(). It comprises the following steps: 1.
 drivers/infiniband/sw/rxe/rxe.c      |  10 ++
 drivers/infiniband/sw/rxe/rxe_loc.h  |   5 +
 drivers/infiniband/sw/rxe/rxe_mr.c   |   2 +-
 drivers/infiniband/sw/rxe/rxe_odp.c  | 176 +++++++++++++++++++++++++++
 drivers/infiniband/sw/rxe/rxe_resp.c |  43 +------
 drivers/infiniband/sw/rxe/rxe_resp.h |  37 ++++++
 6 files changed, 233 insertions(+), 40 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
index 7a8a09487f96..2c9f0cf96671 100644
--- a/drivers/infiniband/sw/rxe/rxe.c
+++ b/drivers/infiniband/sw/rxe/rxe.c
@@ -79,6 +79,16 @@ static void rxe_init_device_param(struct rxe_dev *rxe)
 
 		/* IB_ODP_SUPPORT_IMPLICIT is not supported right now. */
 		rxe->attr.odp_caps.general_caps |= IB_ODP_SUPPORT;
+
+		rxe->attr.odp_caps.per_transport_caps.ud_odp_caps |= IB_ODP_SUPPORT_SEND;
+		rxe->attr.odp_caps.per_transport_caps.ud_odp_caps |= IB_ODP_SUPPORT_RECV;
+		rxe->attr.odp_caps.per_transport_caps.ud_odp_caps |= IB_ODP_SUPPORT_SRQ_RECV;
+
+		rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_SEND;
+		rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_RECV;
+		rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_WRITE;
+		rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_READ;
+		rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_SRQ_RECV;
 	}
 }
diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
index ab66a07e5700..fb468999e81e 100644
--- a/drivers/infiniband/sw/rxe/rxe_loc.h
+++ b/drivers/infiniband/sw/rxe/rxe_loc.h
@@ -190,6 +190,8 @@ static inline unsigned int wr_opcode_mask(int opcode, struct rxe_qp *qp)
 /* rxe_odp.c */
 int rxe_create_user_odp_mr(struct ib_pd *pd, u64 start, u64 length, u64 iova,
 			   int access_flags, struct rxe_mr *mr);
+int rxe_odp_mr_copy(struct rxe_mr *mr, u64 iova, void *addr, int length,
+		    enum rxe_mr_copy_dir dir);
 
 #else /* CONFIG_INFINIBAND_ON_DEMAND_PAGING */
 static inline int
@@ -198,6 +200,9 @@ rxe_create_user_odp_mr(struct ib_pd *pd, u64 start, u64 length, u64 iova,
 {
 	return -EOPNOTSUPP;
 }
+static inline int
+rxe_odp_mr_copy(struct rxe_mr *mr, u64 iova, void *addr,
+		int length, enum rxe_mr_copy_dir dir) { return 0; }
 
 #endif /* CONFIG_INFINIBAND_ON_DEMAND_PAGING */
diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
index 261bb462341b..09a9a8ea3411 100644
--- a/drivers/infiniband/sw/rxe/rxe_mr.c
+++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -479,7 +479,7 @@ int copy_data(
 			iova = sge->addr + offset;
 
 			if (mr->odp_enabled)
-				err = -EOPNOTSUPP;
+				err = rxe_odp_mr_copy(mr, iova, addr, bytes, dir);
 			else
 				err = rxe_mr_copy(mr, iova, addr, bytes, dir);
 			if (err)
diff --git a/drivers/infiniband/sw/rxe/rxe_odp.c b/drivers/infiniband/sw/rxe/rxe_odp.c
index 8b01ceaeee36..c55512417d11 100644
--- a/drivers/infiniband/sw/rxe/rxe_odp.c
+++ b/drivers/infiniband/sw/rxe/rxe_odp.c
@@ -3,9 +3,12 @@
  * Copyright (c) 2022 Fujitsu Ltd. All rights reserved.
  */
 
+#include <linux/hmm.h>
+
 #include <rdma/ib_umem_odp.h>
 
 #include "rxe.h"
+#include "rxe_resp.h"
 
 static bool rxe_ib_invalidate_range(struct mmu_interval_notifier *mni,
 				    const struct mmu_notifier_range *range,
@@ -115,3 +118,176 @@ int rxe_create_user_odp_mr(struct ib_pd *pd, u64 start, u64 length, u64 iova,
 
 	return err;
 }
+
+static inline bool rxe_is_pagefault_necessary(struct ib_umem_odp *umem_odp,
+					      u64 iova, int length, u32 perm)
+{
+	int idx;
+	u64 addr;
+	bool need_fault = false;
+
+	addr = iova & (~(BIT(umem_odp->page_shift) - 1));
+
+	/* Skim through all pages that are to be accessed. */
+	while (addr < iova + length) {
+		idx = (addr - ib_umem_start(umem_odp)) >> umem_odp->page_shift;
+
+		if (!(umem_odp->dma_list[idx] & perm)) {
+			need_fault = true;
+			break;
+		}
+
+		addr += BIT(umem_odp->page_shift);
+	}
+	return need_fault;
+}
+
+/* umem mutex must be locked before entering this function. */
+static int rxe_odp_map_range(struct rxe_mr *mr, u64 iova, int length, u32 flags)
+{
+	struct ib_umem_odp *umem_odp = to_ib_umem_odp(mr->umem);
+	const int max_tries = 3;
+	int cnt = 0;
+
+	int err;
+	u64 perm;
+	bool need_fault;
+
+	if (unlikely(length < 1)) {
+		mutex_unlock(&umem_odp->umem_mutex);
+		return -EINVAL;
+	}
+
+	perm = ODP_READ_ALLOWED_BIT;
+	if (!(flags & RXE_PAGEFAULT_RDONLY))
+		perm |= ODP_WRITE_ALLOWED_BIT;
+
+	/*
+	 * A successful return from rxe_odp_do_pagefault() does not guarantee
+	 * that all pages in the range became present. Recheck the DMA address
+	 * array, allowing at most 3 tries for pagefault.
+	 */
+	while ((need_fault = rxe_is_pagefault_necessary(umem_odp,
+							iova, length, perm))) {
+		if (cnt >= max_tries)
+			break;
+
+		mutex_unlock(&umem_odp->umem_mutex);
+
+		/* umem_mutex is locked on success. */
+		err = rxe_odp_do_pagefault(mr, iova, length, flags);
+		if (err < 0)
+			return err;
+
+		cnt++;
+	}
+
+	if (need_fault)
+		return -EFAULT;
+
+	return 0;
+}
+
+static inline void *rxe_odp_get_virt(struct ib_umem_odp *umem_odp, int umem_idx,
+				     size_t offset)
+{
+	struct page *page;
+	void *virt;
+
+	/*
+	 * Step 1. Get page struct from the pfn array.
+	 * Step 2. Convert page struct to kernel logical address.
+	 * Step 3. Add offset in the page to the address.
+	 */
+	page = hmm_pfn_to_page(umem_odp->pfn_list[umem_idx]);
+	virt = page_address(page);
+
+	if (!virt)
+		return NULL;
+
+	virt += offset;
+
+	return virt;
+}
+
+static int __rxe_odp_mr_copy(struct rxe_mr *mr, u64 iova, void *addr,
+			     int length, enum rxe_mr_copy_dir dir)
+{
+	struct ib_umem_odp *umem_odp = to_ib_umem_odp(mr->umem);
+
+	int idx, bytes;
+	u8 *user_va;
+	size_t offset;
+
+	idx = (iova - ib_umem_start(umem_odp)) >> umem_odp->page_shift;
+	offset = iova & (BIT(umem_odp->page_shift) - 1);
+
+	while (length > 0) {
+		u8 *src, *dest;
+
+		user_va = (u8 *)rxe_odp_get_virt(umem_odp, idx, offset);
+		if (!user_va)
+			return -EFAULT;
+
+		src = (dir == RXE_TO_MR_OBJ) ? addr : user_va;
+		dest = (dir == RXE_TO_MR_OBJ) ? user_va : addr;
+
+		bytes = BIT(umem_odp->page_shift) - offset;
+
+		if (bytes > length)
+			bytes = length;
+
+		memcpy(dest, src, bytes);
+
+		length -= bytes;
+		idx++;
+		offset = 0;
+	}
+
+	return 0;
+}
+
+int rxe_odp_mr_copy(struct rxe_mr *mr, u64 iova, void *addr, int length,
+		    enum rxe_mr_copy_dir dir)
+{
+	struct ib_umem_odp *umem_odp = to_ib_umem_odp(mr->umem);
+	u32 flags = 0;
+
+	int err;
+
+	if (length == 0)
+		return 0;
+
+	if (unlikely(!mr->odp_enabled))
+		return -EOPNOTSUPP;
+
+	switch (dir) {
+	case RXE_TO_MR_OBJ:
+		break;
+
+	case RXE_FROM_MR_OBJ:
+		flags = RXE_PAGEFAULT_RDONLY;
+		break;
+
+	default:
+		return -EINVAL;
+	}
+
+	/* If pagefault is not required, umem mutex will be held until data
+	 * copy to the MR completes. Otherwise, it is released and locked
+	 * again in rxe_odp_map_range() to let the invalidation handler do
+	 * its work meanwhile.
+	 */
+	mutex_lock(&umem_odp->umem_mutex);
+
+	err = rxe_odp_map_range(mr, iova, length, flags);
+	if (err)
+		return err;
+
+	err = __rxe_odp_mr_copy(mr, iova, addr, length, dir);
+
+	mutex_unlock(&umem_odp->umem_mutex);
+
+	return err;
+}
diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
index f2c803036888..7ef492e50e20 100644
--- a/drivers/infiniband/sw/rxe/rxe_resp.c
+++ b/drivers/infiniband/sw/rxe/rxe_resp.c
@@ -11,43 +11,6 @@
 #include "rxe_queue.h"
 #include "rxe_resp.h"
 
-enum resp_states {
-	RESPST_NONE,
-	RESPST_GET_REQ,
-	RESPST_CHK_PSN,
-	RESPST_CHK_OP_SEQ,
-	RESPST_CHK_OP_VALID,
-	RESPST_CHK_RESOURCE,
-	RESPST_CHK_LENGTH,
-	RESPST_CHK_RKEY,
-	RESPST_EXECUTE,
-	RESPST_READ_REPLY,
-	RESPST_ATOMIC_REPLY,
-	RESPST_ATOMIC_WRITE_REPLY,
-	RESPST_PROCESS_FLUSH,
-	RESPST_COMPLETE,
-	RESPST_ACKNOWLEDGE,
-	RESPST_CLEANUP,
-	RESPST_DUPLICATE_REQUEST,
-	RESPST_ERR_MALFORMED_WQE,
-	RESPST_ERR_UNSUPPORTED_OPCODE,
-	RESPST_ERR_MISALIGNED_ATOMIC,
-	RESPST_ERR_PSN_OUT_OF_SEQ,
-	RESPST_ERR_MISSING_OPCODE_FIRST,
-	RESPST_ERR_MISSING_OPCODE_LAST_C,
-	RESPST_ERR_MISSING_OPCODE_LAST_D1E,
-	RESPST_ERR_TOO_MANY_RDMA_ATM_REQ,
-	RESPST_ERR_RNR,
-	RESPST_ERR_RKEY_VIOLATION,
-	RESPST_ERR_INVALIDATE_RKEY,
-	RESPST_ERR_LENGTH,
-	RESPST_ERR_CQ_OVERFLOW,
-	RESPST_ERROR,
-	RESPST_RESET,
-	RESPST_DONE,
-	RESPST_EXIT,
-};
-
 static char *resp_state_name[] = {
 	[RESPST_NONE]			= "NONE",
 	[RESPST_GET_REQ]		= "GET_REQ",
@@ -632,7 +595,8 @@ static enum resp_states write_data_in(struct rxe_qp *qp,
 		goto out;
 
 	if (qp->resp.mr->odp_enabled)
-		err = RESPST_ERR_UNSUPPORTED_OPCODE;
+		err = rxe_odp_mr_copy(qp->resp.mr, qp->resp.va + qp->resp.offset,
+				      payload_addr(pkt), data_len, RXE_TO_MR_OBJ);
 	else
 		err = rxe_mr_copy(qp->resp.mr, qp->resp.va + qp->resp.offset,
 				  payload_addr(pkt), data_len, RXE_TO_MR_OBJ);
@@ -1051,7 +1015,8 @@ static enum resp_states read_reply(struct rxe_qp *qp,
 
 	/* mr is NULL for a zero byte operation.
 	 */
 	if ((res->read.resid != 0) && mr->odp_enabled)
-		err = RESPST_ERR_UNSUPPORTED_OPCODE;
+		err = rxe_odp_mr_copy(mr, res->read.va, payload_addr(&ack_pkt),
+				      payload, RXE_FROM_MR_OBJ);
 	else
 		err = rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt),
 				  payload, RXE_FROM_MR_OBJ);
diff --git a/drivers/infiniband/sw/rxe/rxe_resp.h b/drivers/infiniband/sw/rxe/rxe_resp.h
index 94a4869fdab6..242506f57fd3 100644
--- a/drivers/infiniband/sw/rxe/rxe_resp.h
+++ b/drivers/infiniband/sw/rxe/rxe_resp.h
@@ -3,6 +3,43 @@
 #ifndef RXE_RESP_H
 #define RXE_RESP_H
 
+enum resp_states {
+	RESPST_NONE,
+	RESPST_GET_REQ,
+	RESPST_CHK_PSN,
+	RESPST_CHK_OP_SEQ,
+	RESPST_CHK_OP_VALID,
+	RESPST_CHK_RESOURCE,
+	RESPST_CHK_LENGTH,
+	RESPST_CHK_RKEY,
+	RESPST_EXECUTE,
+	RESPST_READ_REPLY,
+	RESPST_ATOMIC_REPLY,
+	RESPST_ATOMIC_WRITE_REPLY,
+	RESPST_PROCESS_FLUSH,
+	RESPST_COMPLETE,
+	RESPST_ACKNOWLEDGE,
+	RESPST_CLEANUP,
+	RESPST_DUPLICATE_REQUEST,
+	RESPST_ERR_MALFORMED_WQE,
+	RESPST_ERR_UNSUPPORTED_OPCODE,
+	RESPST_ERR_MISALIGNED_ATOMIC,
+	RESPST_ERR_PSN_OUT_OF_SEQ,
+	RESPST_ERR_MISSING_OPCODE_FIRST,
+	RESPST_ERR_MISSING_OPCODE_LAST_C,
+	RESPST_ERR_MISSING_OPCODE_LAST_D1E,
+	RESPST_ERR_TOO_MANY_RDMA_ATM_REQ,
+	RESPST_ERR_RNR,
+	RESPST_ERR_RKEY_VIOLATION,
+	RESPST_ERR_INVALIDATE_RKEY,
+	RESPST_ERR_LENGTH,
+	RESPST_ERR_CQ_OVERFLOW,
+	RESPST_ERROR,
+	RESPST_RESET,
+	RESPST_DONE,
+	RESPST_EXIT,
+};
+
 enum resp_states rxe_process_atomic(struct rxe_qp *qp,
 				    struct rxe_pkt_info *pkt, u64 *vaddr);

From patchwork Fri Dec 23 06:51:58 2022
X-Patchwork-Submitter: "Daisuke Matsuda (Fujitsu)"
X-Patchwork-Id: 36143
From: Daisuke Matsuda
To: linux-rdma@vger.kernel.org, leonro@nvidia.com, jgg@nvidia.com,
	zyjzyj2000@gmail.com
Cc: nvdimm@lists.linux.dev, linux-kernel@vger.kernel.org,
	rpearsonhpe@gmail.com, yangx.jy@fujitsu.com, lizhijian@fujitsu.com,
	y-goto@fujitsu.com, Daisuke Matsuda
Subject: [PATCH for-next v3 7/7] RDMA/rxe: Add support for the traditional Atomic operations with ODP
Date: Fri, 23 Dec 2022 15:51:58 +0900
Message-Id: <30553db1a0333a714ec60b560d54efdfbf07f24d.1671772917.git.matsuda-daisuke@fujitsu.com>

Enable 'fetch and add' and 'compare and swap' operations to manipulate
data in an ODP-enabled MR. This consists of the following steps:

 1. Check the driver page table (umem_odp->dma_list) to see if the
    target page is both readable and writable.
 2. If not, then trigger page fault to map the page.
 3. Convert its user space address to a kernel logical address using
    PFNs in the driver page table (umem_odp->pfn_list).
 4. Execute the operation.

umem_mutex is used to ensure that dma_list (an array of addresses of an
MR) is not changed while it is checked and that the target page is not
invalidated before data access completes.

Signed-off-by: Daisuke Matsuda
---
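Note (illustration only; not for merging): once the target page is mapped and its kernel address obtained, the update itself is the compare-and-swap / fetch-and-add semantics that rxe_process_atomic() already implements for pinned MRs. A stand-alone sketch of those semantics; the function names are stand-ins for what rxe_process_atomic() does on the mapped vaddr:

	#include <stdint.h>

	/* Sketch of IB 'compare and swap': if *vaddr equals cmp, store swp;
	 * either way the original value is returned to the requester in the
	 * atomic acknowledgement.
	 */
	static uint64_t atomic_cmp_swp(uint64_t *vaddr, uint64_t cmp, uint64_t swp)
	{
		uint64_t orig = *vaddr;

		if (orig == cmp)
			*vaddr = swp;
		return orig;
	}

	/* Sketch of IB 'fetch and add': add the value, return the original. */
	static uint64_t atomic_fetch_add(uint64_t *vaddr, uint64_t add)
	{
		uint64_t orig = *vaddr;

		*vaddr = orig + add;
		return orig;
	}

In the driver the read-modify-write runs with umem_mutex held, so the page stays mapped and is not invalidated mid-operation.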
 drivers/infiniband/sw/rxe/rxe.c      |  1 +
 drivers/infiniband/sw/rxe/rxe_loc.h  | 11 +++++++
 drivers/infiniband/sw/rxe/rxe_odp.c  | 46 ++++++++++++++++++++++++++++
 drivers/infiniband/sw/rxe/rxe_resp.c |  2 +-
 4 files changed, 59 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
index 2c9f0cf96671..30daf14ee0e8 100644
--- a/drivers/infiniband/sw/rxe/rxe.c
+++ b/drivers/infiniband/sw/rxe/rxe.c
@@ -88,6 +88,7 @@ static void rxe_init_device_param(struct rxe_dev *rxe)
 		rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_RECV;
 		rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_WRITE;
 		rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_READ;
+		rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_ATOMIC;
 		rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_SRQ_RECV;
 	}
 }
diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
index fb468999e81e..24b0b7069688 100644
--- a/drivers/infiniband/sw/rxe/rxe_loc.h
+++ b/drivers/infiniband/sw/rxe/rxe_loc.h
@@ -7,6 +7,8 @@
 #ifndef RXE_LOC_H
 #define RXE_LOC_H
 
+#include "rxe_resp.h"
+
 /* rxe_av.c */
 void rxe_init_av(struct rdma_ah_attr *attr, struct rxe_av *av);
 int rxe_av_chk_attr(struct rxe_qp *qp, struct rdma_ah_attr *attr);
@@ -192,6 +194,8 @@ int rxe_create_user_odp_mr(struct ib_pd *pd, u64 start, u64 length, u64 iova,
 			   int access_flags, struct rxe_mr *mr);
 int rxe_odp_mr_copy(struct rxe_mr *mr, u64 iova, void *addr, int length,
 		    enum rxe_mr_copy_dir dir);
+enum resp_states rxe_odp_atomic_ops(struct rxe_qp *qp, struct rxe_pkt_info *pkt,
+				    struct rxe_mr *mr);
 
 #else /* CONFIG_INFINIBAND_ON_DEMAND_PAGING */
 static inline int
@@ -204,6 +208,13 @@ static inline int
 rxe_odp_mr_copy(struct rxe_mr *mr, u64 iova, void *addr,
 		int length, enum rxe_mr_copy_dir dir) { return 0; }
 
+static inline enum resp_states
+rxe_odp_atomic_ops(struct rxe_qp *qp, struct rxe_pkt_info *pkt,
+		   struct rxe_mr *mr)
+{
+	return RESPST_ERR_UNSUPPORTED_OPCODE;
+}
+
 #endif /* CONFIG_INFINIBAND_ON_DEMAND_PAGING */
 
 #endif /* RXE_LOC_H */
diff --git a/drivers/infiniband/sw/rxe/rxe_odp.c b/drivers/infiniband/sw/rxe/rxe_odp.c
index c55512417d11..6e0b6a872ddc 100644
--- a/drivers/infiniband/sw/rxe/rxe_odp.c
+++ b/drivers/infiniband/sw/rxe/rxe_odp.c
@@ -291,3 +291,49 @@ int rxe_odp_mr_copy(struct rxe_mr *mr, u64 iova, void *addr, int length,
 
 	return err;
 }
+
+static inline void *rxe_odp_get_virt_atomic(struct rxe_qp *qp, struct rxe_mr *mr)
+{
+	struct ib_umem_odp *umem_odp = to_ib_umem_odp(mr->umem);
+	u64 iova = qp->resp.va + qp->resp.offset;
+	int idx;
+	size_t offset;
+
+	if (rxe_odp_map_range(mr, iova, sizeof(char), 0))
+		return NULL;
+
+	idx = (iova - ib_umem_start(umem_odp)) >> umem_odp->page_shift;
+	offset = iova & (BIT(umem_odp->page_shift) - 1);
+
+	return rxe_odp_get_virt(umem_odp, idx, offset);
+}
+
+enum resp_states rxe_odp_atomic_ops(struct rxe_qp *qp, struct rxe_pkt_info *pkt,
+				    struct rxe_mr *mr)
+{
+	struct ib_umem_odp *umem_odp = to_ib_umem_odp(mr->umem);
+	u64 *vaddr;
+	int ret;
+
+	if (unlikely(!mr->odp_enabled))
+		return RESPST_ERR_RKEY_VIOLATION;
+
+	/* If pagefault is not required, umem mutex will be held until the
+	 * atomic operation completes. Otherwise, it is released and locked
+	 * again in rxe_odp_map_range() to let the invalidation handler do
+	 * its work meanwhile.
+	 */
+	mutex_lock(&umem_odp->umem_mutex);
+
+	vaddr = (u64 *)rxe_odp_get_virt_atomic(qp, mr);
+	if (!vaddr)
+		return RESPST_ERR_RKEY_VIOLATION;
+
+	if (pkt->mask & RXE_ATOMIC_MASK)
+		ret = rxe_process_atomic(qp, pkt, vaddr);
+	else
+		ret = RESPST_ERR_UNSUPPORTED_OPCODE;
+
+	mutex_unlock(&umem_odp->umem_mutex);
+	return ret;
+}
diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
index 7ef492e50e20..669d3e1a6ee4 100644
--- a/drivers/infiniband/sw/rxe/rxe_resp.c
+++ b/drivers/infiniband/sw/rxe/rxe_resp.c
@@ -784,7 +784,7 @@ static enum resp_states rxe_atomic_reply(struct rxe_qp *qp,
 			return RESPST_ERR_RKEY_VIOLATION;
 
 		if (mr->odp_enabled)
-			ret = RESPST_ERR_UNSUPPORTED_OPCODE;
+			ret = rxe_odp_atomic_ops(qp, pkt, mr);
 		else
 			ret = rxe_atomic_ops(qp, pkt, mr);
 	} else