From patchwork Sat Dec  2 08:01:26 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Shifeng Li
X-Patchwork-Id: 172749
From: Shifeng Li
To: saeedm@nvidia.com, leon@kernel.org, davem@davemloft.net,
 edumazet@google.com, kuba@kernel.org, pabeni@redhat.com,
 eranbe@mellanox.com, moshe@mellanox.com
Cc: netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
 linux-kernel@vger.kernel.org, dinghui@sangfor.com.cn,
 lishifeng1992@126.com, Shifeng Li, Moshe Shemesh
Subject: [PATCH net v4] net/mlx5e: Fix a race in command alloc flow
Date: Sat, 2 Dec 2023 00:01:26 -0800
Message-Id: <20231202080126.1167237-1-lishifeng@sangfor.com.cn>
X-Mailer: git-send-email 2.25.1
X-Mailing-List: linux-kernel@vger.kernel.org

Fix a cmd->ent use after free due to a race on command entry.
Such race occurs when one of the commands releases its last refcount and
frees its index and entry while another process running command flush
flow takes refcount to this command entry. The process which handles
commands flush may see this command as needed to be flushed if the other
process allocated a ent->idx but didn't set ent to cmd->ent_arr in
cmd_work_handler(). Fix it by moving the assignment of cmd->ent_arr into
the spin lock.

[70013.081955] BUG: KASAN: use-after-free in mlx5_cmd_trigger_completions+0x1e2/0x4c0 [mlx5_core]
[70013.081967] Write of size 4 at addr ffff88880b1510b4 by task kworker/26:1/1433361
[70013.081968]
[70013.082028] Workqueue: events aer_isr
[70013.082053] Call Trace:
[70013.082067]  dump_stack+0x8b/0xbb
[70013.082086]  print_address_description+0x6a/0x270
[70013.082102]  kasan_report+0x179/0x2c0
[70013.082173]  mlx5_cmd_trigger_completions+0x1e2/0x4c0 [mlx5_core]
[70013.082267]  mlx5_cmd_flush+0x80/0x180 [mlx5_core]
[70013.082304]  mlx5_enter_error_state+0x106/0x1d0 [mlx5_core]
[70013.082338]  mlx5_try_fast_unload+0x2ea/0x4d0 [mlx5_core]
[70013.082377]  remove_one+0x200/0x2b0 [mlx5_core]
[70013.082409]  pci_device_remove+0xf3/0x280
[70013.082439]  device_release_driver_internal+0x1c3/0x470
[70013.082453]  pci_stop_bus_device+0x109/0x160
[70013.082468]  pci_stop_and_remove_bus_device+0xe/0x20
[70013.082485]  pcie_do_fatal_recovery+0x167/0x550
[70013.082493]  aer_isr+0x7d2/0x960
[70013.082543]  process_one_work+0x65f/0x12d0
[70013.082556]  worker_thread+0x87/0xb50
[70013.082571]  kthread+0x2e9/0x3a0
[70013.082592]  ret_from_fork+0x1f/0x40

The logical relationship of this error is as follows:

             aer_recover_work              |          ent->work
-------------------------------------------+------------------------------
aer_recover_work_func                      |
|- pcie_do_recovery                        |
  |- report_error_detected                 |
    |- mlx5_pci_err_detected               |cmd_work_handler
      |- mlx5_enter_error_state            |  |- cmd_alloc_index
        |- enter_error_state               |    |- lock cmd->alloc_lock
          |- mlx5_cmd_flush                |    |- clear_bit
            |- mlx5_cmd_trigger_completions|    |- unlock cmd->alloc_lock
              |- lock cmd->alloc_lock      |
              |- vector = ~dev->cmd.vars.bitmask
              |- for_each_set_bit          |
                |- cmd_ent_get(cmd->ent_arr[i]) (UAF)
              |- unlock cmd->alloc_lock    |  |- cmd->ent_arr[ent->idx]=ent

The cmd->ent_arr[ent->idx] assignment and the bit clearing are not
protected by the cmd->alloc_lock in cmd_work_handler().

Fixes: 50b2412b7e78 ("net/mlx5: Avoid possible free of command entry while timeout comp handler")
Reviewed-by: Moshe Shemesh
Signed-off-by: Shifeng Li
---
 drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

---
v1->v2: fix code conflicts.
v2->v3: modify Fixes line and massage git log.
v3->v4: add target tree name in the subject and add the logical diagram.

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
index f8f0a712c943..a7b1f9686c09 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
@@ -156,15 +156,18 @@ static u8 alloc_token(struct mlx5_cmd *cmd)
 	return token;
 }
 
-static int cmd_alloc_index(struct mlx5_cmd *cmd)
+static int cmd_alloc_index(struct mlx5_cmd *cmd, struct mlx5_cmd_work_ent *ent)
 {
 	unsigned long flags;
 	int ret;
 
 	spin_lock_irqsave(&cmd->alloc_lock, flags);
 	ret = find_first_bit(&cmd->vars.bitmask, cmd->vars.max_reg_cmds);
-	if (ret < cmd->vars.max_reg_cmds)
+	if (ret < cmd->vars.max_reg_cmds) {
 		clear_bit(ret, &cmd->vars.bitmask);
+		ent->idx = ret;
+		cmd->ent_arr[ent->idx] = ent;
+	}
 	spin_unlock_irqrestore(&cmd->alloc_lock, flags);
 
 	return ret < cmd->vars.max_reg_cmds ? ret : -ENOMEM;
@@ -979,7 +982,7 @@ static void cmd_work_handler(struct work_struct *work)
 	sem = ent->page_queue ? &cmd->vars.pages_sem : &cmd->vars.sem;
 	down(sem);
 	if (!ent->page_queue) {
-		alloc_ret = cmd_alloc_index(cmd);
+		alloc_ret = cmd_alloc_index(cmd, ent);
 		if (alloc_ret < 0) {
 			mlx5_core_err_rl(dev, "failed to allocate command entry\n");
 			if (ent->callback) {
@@ -994,15 +997,14 @@ static void cmd_work_handler(struct work_struct *work)
 			up(sem);
 			return;
 		}
-		ent->idx = alloc_ret;
 	} else {
 		ent->idx = cmd->vars.max_reg_cmds;
 		spin_lock_irqsave(&cmd->alloc_lock, flags);
 		clear_bit(ent->idx, &cmd->vars.bitmask);
+		cmd->ent_arr[ent->idx] = ent;
 		spin_unlock_irqrestore(&cmd->alloc_lock, flags);
 	}
 
-	cmd->ent_arr[ent->idx] = ent;
 	lay = get_inst(cmd, ent->idx);
 	ent->lay = lay;
 	memset(lay, 0, sizeof(*lay));
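
Illustration only, not part of the patch: the invariant this change
restores is that a cleared bit in the bitmask always has a matching,
current pointer in the entry array, because both are updated in one
critical section. Below is a minimal userspace sketch of that pattern.
It assumes a pthread mutex in place of cmd->alloc_lock, and every name
in it (slot_table, slot_alloc, slot_free, slot_flush, MAX_SLOTS) is
hypothetical rather than taken from mlx5.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_SLOTS 8	/* hypothetical table size, not an mlx5 value */

struct entry {
	int idx;	/* slot index, written while the lock is held */
	int refcount;	/* references taken by the flush-side scan */
};

struct slot_table {
	pthread_mutex_t lock;		/* stands in for cmd->alloc_lock */
	unsigned long free_mask;	/* bit set => slot is free */
	struct entry *slots[MAX_SLOTS];	/* stands in for cmd->ent_arr */
};

static void table_init(struct slot_table *t)
{
	pthread_mutex_init(&t->lock, NULL);
	t->free_mask = (1UL << MAX_SLOTS) - 1;
	for (int i = 0; i < MAX_SLOTS; i++)
		t->slots[i] = NULL;
}

/*
 * Claim the lowest free slot and publish ent in the same critical
 * section, so no scanner can see a claimed bit without its entry.
 * Returns the slot index, or -1 if the table is full.
 */
static int slot_alloc(struct slot_table *t, struct entry *ent)
{
	int ret = -1;

	pthread_mutex_lock(&t->lock);
	for (int i = 0; i < MAX_SLOTS; i++) {
		if (t->free_mask & (1UL << i)) {
			t->free_mask &= ~(1UL << i);
			ent->idx = i;
			t->slots[i] = ent;
			ret = i;
			break;
		}
	}
	pthread_mutex_unlock(&t->lock);
	return ret;
}

/* Unpublish the entry and mark its slot free, again under the lock. */
static void slot_free(struct slot_table *t, struct entry *ent)
{
	pthread_mutex_lock(&t->lock);
	t->slots[ent->idx] = NULL;
	t->free_mask |= 1UL << ent->idx;
	pthread_mutex_unlock(&t->lock);
}

/*
 * Flush-side scan: take a reference on every claimed slot.
 * Returns the number of entries visited.
 */
static int slot_flush(struct slot_table *t)
{
	unsigned long busy;
	int visited = 0;

	pthread_mutex_lock(&t->lock);
	busy = ~t->free_mask & ((1UL << MAX_SLOTS) - 1);
	for (int i = 0; i < MAX_SLOTS; i++) {
		if (busy & (1UL << i)) {
			/* Holds because alloc publishes under the lock. */
			assert(t->slots[i] != NULL && t->slots[i]->idx == i);
			t->slots[i]->refcount++;
			visited++;
		}
	}
	pthread_mutex_unlock(&t->lock);
	return visited;
}
```

If slot_alloc() stored the pointer after dropping the lock, as the
pre-patch cmd_work_handler() did, a concurrent slot_flush() could find
a claimed bit whose slot still holds a stale pointer from an earlier,
already freed entry. That is the window shown in the diagram above.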