From patchwork Sat Feb 3 15:43:31 2024
X-Patchwork-Submitter: Waiman Long
X-Patchwork-Id: 196331
From: Waiman Long
To: Tejun Heo, Lai Jiangshan
Cc: linux-kernel@vger.kernel.org, Juri Lelli, Cestmir Kalina, Alex Gladkov,
 Phil Auld, Costa Shulyupin, Waiman Long
Subject: [PATCH-wq v2 2/5] workqueue: Enable unbound cpumask update on
 ordered workqueues
Date: Sat, 3 Feb 2024 10:43:31 -0500
Message-Id: <20240203154334.791910-3-longman@redhat.com>
In-Reply-To: <20240203154334.791910-1-longman@redhat.com>
References: <20240203154334.791910-1-longman@redhat.com>
MIME-Version: 1.0

Ordered workqueues do not currently follow changes made to the global
unbound cpumask because per-pool workqueue changes may break the
ordering guarantee. IOW, a work function in an ordered workqueue may
run on an isolated CPU.
This patch enables ordered workqueues to follow changes made to the
global unbound cpumask by temporarily freezing the newly allocated
pool_workqueue: a new frozen flag holds off execution of newly queued
work items until the old pwq has been properly flushed. This enables
ordered workqueues to follow the unbound cpumask changes like other
unbound workqueues, at the expense of some delay in the execution of
work functions during the transition period.

Signed-off-by: Waiman Long
---
 kernel/workqueue.c | 93 +++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 80 insertions(+), 13 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 7ef393f4012e..f089e532758a 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -242,6 +242,7 @@ struct pool_workqueue {
 	int			refcnt;		/* L: reference count */
 	int			nr_in_flight[WORK_NR_COLORS];
 						/* L: nr of in_flight works */
+	int			frozen;		/* L: temporarily frozen */
 
 	/*
 	 * nr_active management and WORK_STRUCT_INACTIVE:
@@ -1667,6 +1668,9 @@ static bool pwq_tryinc_nr_active(struct pool_workqueue *pwq, bool fill)
 
 	lockdep_assert_held(&pool->lock);
 
+	if (pwq->frozen)
+		return false;
+
 	if (!nna) {
 		/* per-cpu workqueue, pwq->nr_active is sufficient */
 		obtained = pwq->nr_active < READ_ONCE(wq->max_active);
@@ -1747,6 +1751,21 @@ static bool pwq_activate_first_inactive(struct pool_workqueue *pwq, bool fill)
 	}
 }
 
+/**
+ * thaw_pwq - thaw a frozen pool_workqueue
+ * @pwq: pool_workqueue to be thawed
+ */
+static void thaw_pwq(struct pool_workqueue *pwq)
+{
+	unsigned long flags;
+
+	raw_spin_lock_irqsave(&pwq->pool->lock, flags);
+	pwq->frozen = false;
+	if (pwq_activate_first_inactive(pwq, true))
+		kick_pool(pwq->pool);
+	raw_spin_unlock_irqrestore(&pwq->pool->lock, flags);
+}
+
 /**
  * node_activate_pending_pwq - Activate a pending pwq on a wq_node_nr_active
  * @nna: wq_node_nr_active to activate a pending pwq for
@@ -4595,6 +4614,14 @@ static void pwq_release_workfn(struct kthread_work *work)
 		mutex_lock(&wq->mutex);
 		list_del_rcu(&pwq->pwqs_node);
 		is_last = list_empty(&wq->pwqs);
+
+		/*
+		 * For ordered workqueue with a frozen dfl_pwq, thaw it now.
+		 */
+		if (!is_last && (wq->flags & __WQ_ORDERED_EXPLICIT) &&
+		    wq->dfl_pwq->frozen)
+			thaw_pwq(wq->dfl_pwq);
+
 		mutex_unlock(&wq->mutex);
 	}
 
@@ -4758,10 +4785,30 @@ static void apply_wqattrs_cleanup(struct apply_wqattrs_ctx *ctx)
 {
 	if (ctx) {
 		int cpu;
+		bool refcheck = false;
 
 		for_each_possible_cpu(cpu)
 			put_pwq_unlocked(ctx->pwq_tbl[cpu]);
+
+		/*
+		 * For ordered workqueue with a frozen dfl_pwq and a reference
+		 * count of 1 in ctx->dfl_pwq, it is highly likely that the
+		 * refcnt will become 0 after the final put_pwq(). Acquire
+		 * wq->mutex to ensure that the pwq won't be freed by
+		 * pwq_release_workfn() when we check pwq later.
+		 */
+		if ((ctx->wq->flags & __WQ_ORDERED_EXPLICIT) &&
+		    ctx->wq->dfl_pwq->frozen &&
+		    (ctx->dfl_pwq->refcnt == 1)) {
+			mutex_lock(&ctx->wq->mutex);
+			refcheck = true;
+		}
 		put_pwq_unlocked(ctx->dfl_pwq);
+		if (refcheck) {
+			if (!ctx->dfl_pwq->refcnt)
+				thaw_pwq(ctx->wq->dfl_pwq);
+			mutex_unlock(&ctx->wq->mutex);
+		}
 
 		free_workqueue_attrs(ctx->attrs);
@@ -4821,6 +4868,15 @@ apply_wqattrs_prepare(struct workqueue_struct *wq,
 	cpumask_copy(new_attrs->__pod_cpumask, new_attrs->cpumask);
 	ctx->attrs = new_attrs;
 
+	/*
+	 * For initialized ordered workqueues, there is only one pwq (dfl_pwq).
+	 * Temporarily set the frozen flag of ctx->dfl_pwq to freeze the
+	 * execution of newly queued work items until execution of older work
+	 * items in the old pwq has completed.
+	 */
+	if (!list_empty(&wq->pwqs) && (wq->flags & __WQ_ORDERED_EXPLICIT))
+		ctx->dfl_pwq->frozen = true;
+
 	ctx->wq = wq;
 	return ctx;
 
@@ -4861,13 +4917,8 @@ static int apply_workqueue_attrs_locked(struct workqueue_struct *wq,
 	if (WARN_ON(!(wq->flags & WQ_UNBOUND)))
 		return -EINVAL;
 
-	/* creating multiple pwqs breaks ordering guarantee */
-	if (!list_empty(&wq->pwqs)) {
-		if (WARN_ON(wq->flags & __WQ_ORDERED_EXPLICIT))
-			return -EINVAL;
-
+	if (!list_empty(&wq->pwqs) && !(wq->flags & __WQ_ORDERED_EXPLICIT))
 		wq->flags &= ~__WQ_ORDERED;
-	}
 
 	ctx = apply_wqattrs_prepare(wq, attrs, wq_unbound_cpumask);
 	if (IS_ERR(ctx))
@@ -6316,11 +6367,28 @@ static int workqueue_apply_unbound_cpumask(const cpumask_var_t unbound_cpumask)
 		if (!(wq->flags & WQ_UNBOUND) || (wq->flags & __WQ_DESTROYING))
 			continue;
 
-		/* creating multiple pwqs breaks ordering guarantee */
+		/*
+		 * We do not support changing the cpumask of an ordered
+		 * workqueue again before the previous cpumask change is
+		 * completed. Sleep up to 100ms in 10ms intervals to allow
+		 * the previous operation to complete and skip it if not
+		 * done by then.
+		 */
 		if (!list_empty(&wq->pwqs)) {
-			if (wq->flags & __WQ_ORDERED_EXPLICIT)
-				continue;
-			wq->flags &= ~__WQ_ORDERED;
+			struct pool_workqueue *pwq = wq->dfl_pwq;
+
+			if (!(wq->flags & __WQ_ORDERED_EXPLICIT)) {
+				wq->flags &= ~__WQ_ORDERED;
+			} else if (pwq && pwq->frozen) {
+				int i;
+
+				for (i = 0; i < 10; i++) {
+					msleep(10);
+					if (!pwq->frozen)
+						break;
+				}
+				if (WARN_ON_ONCE(pwq->frozen))
+					continue;
+			}
 		}
 
 		ctx = apply_wqattrs_prepare(wq, wq->unbound_attrs, unbound_cpumask);
@@ -6836,9 +6904,8 @@ int workqueue_sysfs_register(struct workqueue_struct *wq)
 	int ret;
 
 	/*
-	 * Adjusting max_active or creating new pwqs by applying
-	 * attributes breaks ordering guarantee. Disallow exposing ordered
-	 * workqueues.
+	 * Adjusting max_active breaks ordering guarantee. Disallow exposing
+	 * ordered workqueues.
 	 */
 	if (WARN_ON(wq->flags & __WQ_ORDERED_EXPLICIT))
 		return -EINVAL;