Message ID | 20230527132747.83196-3-frank@oltmanns.dev |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp345280vqr; Sat, 27 May 2023 06:40:26 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ7WXGNp+vO6yklZ552l0wuZFyn3oJDmGwBUs4qginf5i669CIh+/smrrGV/y5N+sVk/8wE3 X-Received: by 2002:a17:903:2312:b0:1b0:155b:bff2 with SMTP id d18-20020a170903231200b001b0155bbff2mr6138835plh.59.1685194826069; Sat, 27 May 2023 06:40:26 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1685194826; cv=none; d=google.com; s=arc-20160816; b=cdAcoyjUhYKN8rJaM3GMjCEan4N56QdliVLbNsxYZz2Dh/7fEZhIjP66valz0fn78W ks816OXa8M0j355s3u8EsQtacMoKQaLwcgnPEZiKM581o/JWCMc0XknCbtilWYMNrPU3 iwCSnpmg5uyKKdIgi0VI2tFp6+SuyMhsUyryczBQPDTV6Sw5Ekgwi3iaKBtXqTVkui3c uXWO4UrCysFhaSISWiZivDSKilDP/eddFx3Ly0qtutVyy1Dzecws3xQctIWyCjM3QsaW UIoCTg9OSIXCYDCqCLPt9J3oZ8riwbacAbagZYYHNgbtdE8YaSFEb4yHKriznnAOJYit jrwg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=4E0//LKBNUllQfAtkiLIrhWN7vXYAzX6w6tTzQQg8xk=; b=O7lxRCToY3V1U4mTMG2th9QZi9FAQhI9gV+amUZ2irNYLdgebOJB6pdWHIfDv5pVqx Meow9QBpKa8dAmgde8fRBsN0ctNqKJX766XB8uX+eYc9Qd9/4uL/xNjHJa6WM/64uw/6 0Iypx3mE2BHGSkNtn+MINw4mKuHTRxKCpZ4A0nf8fqClwbnFT/u/hLl1jHsbWQqRnTpE tXuYfn+eDlr82uFiWUv3qrOHI0SBVJ2KfjbnIYi3qRwZdUOqBOKLZfLr9+c+WUbYZzfx xQ+m9iiS2DUKWjExMlqHJbsW3OUP/UBBG4fwmAJDwQ5DVWTqxZL0QTKF9zKpY6TfT7py JzNQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@oltmanns.dev header.s=MBO0001 header.b=OvGVuYsK; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=oltmanns.dev Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id g18-20020a170902869200b001ae4b4527b3si5918107plo.290.2023.05.27.06.40.13; Sat, 27 May 2023 06:40:26 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@oltmanns.dev header.s=MBO0001 header.b=OvGVuYsK; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=oltmanns.dev Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230137AbjE0N2a (ORCPT <rfc822;zhanglyra.2023@gmail.com> + 99 others); Sat, 27 May 2023 09:28:30 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57712 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232239AbjE0N21 (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Sat, 27 May 2023 09:28:27 -0400 Received: from mout-p-202.mailbox.org (mout-p-202.mailbox.org [80.241.56.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D8C87B1; Sat, 27 May 2023 06:28:25 -0700 (PDT) Received: from smtp2.mailbox.org (smtp2.mailbox.org [10.196.197.2]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange ECDHE (P-384) server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by mout-p-202.mailbox.org (Postfix) with ESMTPS id 4QT2fp11vSz9scm; Sat, 27 May 2023 15:28:18 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oltmanns.dev; s=MBO0001; t=1685194098; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=4E0//LKBNUllQfAtkiLIrhWN7vXYAzX6w6tTzQQg8xk=; b=OvGVuYsKY7RuRPhxEsjloD+qLuaq7awbeJKTP8VfsQmuXdzIo5jrdutG53X3Lo2nnc+rpG 76ZGsRemAnDeuwaChPtGROHGTOXlVweQHvkn03uIuBEguEACkXmNr7MzE+KmkYbrdnkJeY y30NivBAuv6lziTLewyWXYZprgOfcd0gwsXNTPf2LquA/K8NtM3YKv1luQ1cjJo2u8z/+y gNwhnCk70+EoQKHm853UIogoYSwytYF9+j7EUAO7qFab5WvI4t3CcG4G0c+SfXI88DycWo JtQeA69v9mbTY7igiAfe8A/hFN42x+8awJX66stHWJKYVDJ4ZIaFkQPjYlCviQ== From: Frank Oltmanns <frank@oltmanns.dev> To: linux-arm-kernel@lists.infradead.org, linux-clk@vger.kernel.org, linux-kernel@vger.kernel.org, linux-sunxi@lists.linux.dev Cc: Frank Oltmanns <frank@oltmanns.dev>, Andre Przywara <andre.przywara@arm.com>, Chen-Yu Tsai <wens@csie.org>, Icenowy Zheng <icenowy@aosc.io>, Jernej Skrabec <jernej.skrabec@gmail.com>, Maxime Ripard <mripard@kernel.org>, Michael Turquette <mturquette@baylibre.com>, Rob Herring <robh@kernel.org>, Samuel Holland <samuel@sholland.org>, Stephen Boyd <sboyd@kernel.org> Subject: [RFC PATCH 2/3] clk: sunxi-ng: Implement precalculated NKM rate selection Date: Sat, 27 May 2023 15:27:46 +0200 Message-Id: <20230527132747.83196-3-frank@oltmanns.dev> In-Reply-To: <20230527132747.83196-1-frank@oltmanns.dev> References: <20230527132747.83196-1-frank@oltmanns.dev> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-2.8 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_LOW,SPF_HELO_NONE, SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1767054849729255525?= X-GMAIL-MSGID: =?utf-8?q?1767054849729255525?= |
Series |
clk: sunxi-ng: Optimize rate selection for NKM clocks
|
|
Commit Message
Frank Oltmanns
May 27, 2023, 1:27 p.m. UTC
Add a new precalculation method for NKM clock rate selection in the
sunxi-ng clock driver. Introduce ccu_nkm_find_best_precalc which uses a
precalculated table of valid NKM combinations (struct clk_nkm_table and
struct clk_nkm_combo) to find the best rate. This approach provides
faster rate selection by searching a table of valid combinations rather
than calculating for all possible combinations.
The table of NKM combinations needs to be initialized with meaningful
combinations only, i.e. removing redundant combinations that result in
the same rate.
Keep the existing ccu_nkm_find_best function in place and use it as a
fallback if no precalculated table is provided.
Signed-off-by: Frank Oltmanns <frank@oltmanns.dev>
---
drivers/clk/sunxi-ng/ccu_nkm.c | 84 +++++++++++++++++++++++++++-------
drivers/clk/sunxi-ng/ccu_nkm.h | 26 +++++++++++
2 files changed, 94 insertions(+), 16 deletions(-)
Comments
Hi Frank, On Sat, May 27, 2023 at 11:37 PM Frank Oltmanns <frank@oltmanns.dev> wrote: > > Add a new precalculation method for NKM clock rate selection in the > sunxi-ng clock driver. Introduce ccu_nkm_find_best_precalc which uses a > precalculated table of valid NKM combinations (struct clk_nkm_table and > struct clk_nkm_combo) to find the best rate. This approach provides > faster rate selection by searching a table of valid combinations rather > than calculating for all possible combinations. > > The table of NKM combinations needs to be initialized with meaningful > combinations only, i.e. removing redundant combinations that result in > the same rate. > > Keep the existing ccu_nkm_find_best function in place and use it as a > fallback if no precalculated table is provided. > > Signed-off-by: Frank Oltmanns <frank@oltmanns.dev> > --- > drivers/clk/sunxi-ng/ccu_nkm.c | 84 +++++++++++++++++++++++++++------- > drivers/clk/sunxi-ng/ccu_nkm.h | 26 +++++++++++ > 2 files changed, 94 insertions(+), 16 deletions(-) > > diff --git a/drivers/clk/sunxi-ng/ccu_nkm.c b/drivers/clk/sunxi-ng/ccu_nkm.c > index 94d2a83992b2..9652f6df17bd 100644 > --- a/drivers/clk/sunxi-ng/ccu_nkm.c > +++ b/drivers/clk/sunxi-ng/ccu_nkm.c > @@ -54,6 +54,49 @@ static unsigned long ccu_nkm_find_best(unsigned long parent, unsigned long rate, > return best_rate; > } > > +static unsigned long ccu_nkm_find_best_precalc(unsigned long parent, > + unsigned long rate, > + struct _ccu_nkm *nkm, > + struct clk_nkm_table *table) > +{ > + unsigned long best_rate = 0, best_diff = ULONG_MAX; > + unsigned long best_n = 0, best_k = 0, best_m = 0; > + int start = 0, end = table->num - 1, mid; > + > + while (start <= end) { > + unsigned long tmp_rate; > + unsigned long tmp_diff; > + > + mid = (start + end) / 2; > + > + tmp_rate = parent * table->combos[mid].n * table->combos[mid].k / > + table->combos[mid].m; > + > + tmp_diff = abs(rate - tmp_rate); > + > + if (tmp_diff < best_diff) { > + best_rate = tmp_rate; > + best_diff = tmp_diff; > + best_n = table->combos[mid].n; > + best_k = table->combos[mid].k; > + best_m = table->combos[mid].m; > + if (best_diff == 0) > + goto out; > + } If the table was sorted by n * k / m, this could just be a process of searching through until we either: - find that the first rate in the table is too high - find an exact rate - go above the requested rate, then there's only two to compare: our current rate and the previous one This should massively simplify this function and would still work with a binary search. > + if (rate < tmp_rate) > + end = mid - 1; > + else > + start = mid + 1; > + } > + > +out: > + nkm->n = best_n; > + nkm->k = best_k; > + nkm->m = best_m; > + > + return best_rate; > +} > + > static void ccu_nkm_disable(struct clk_hw *hw) > { > struct ccu_nkm *nkm = hw_to_ccu_nkm(hw); > diff --git a/drivers/clk/sunxi-ng/ccu_nkm.h b/drivers/clk/sunxi-ng/ccu_nkm.h > index 6601defb3f38..fa5551724921 100644 > --- a/drivers/clk/sunxi-ng/ccu_nkm.h > +++ b/drivers/clk/sunxi-ng/ccu_nkm.h > @@ -12,6 +12,30 @@ > #include "ccu_div.h" > #include "ccu_mult.h" > > +struct clk_nkm_combo { > + u8 n; > + u8 k; > + u8 m; > +}; > + > +/** > + * struct clk_nkm_table - Table of all meaningful combinations for n, k, and m > + * > + * @num: Number of entries in the table > + * @combos: Array of combos (of size num) that are supported by this clock. > + * > + * This table shall contain all meaningful combinations of n, k, and m. That > + * means that combinations that result in the same clock rate shall only be > + * listed once. For example, if both > + * { .n = 1, .k = 2, .m = 2} and { .n = 2, .k = 2, .m = 4} > + * are valid values for n, k, and m, only one of them would be allowed because > + * both result in a factor of 1.0. > + */ > +struct clk_nkm_table { > + size_t num; > + struct clk_nkm_combo *combos; Should this be a "flex" array, i.e. struct clk_nkm_combo combos[] > +}; > + > /* > * struct ccu_nkm - Definition of an N-K-M clock > * Thanks,
Hi Julian, On 2023-05-28 at 09:19:36 +1000, Julian Calaby <julian.calaby@gmail.com> wrote: > Hi Frank, > > On Sat, May 27, 2023 at 11:37 PM Frank Oltmanns <frank@oltmanns.dev> wrote: >> >> Add a new precalculation method for NKM clock rate selection in the >> sunxi-ng clock driver. Introduce ccu_nkm_find_best_precalc which uses a >> precalculated table of valid NKM combinations (struct clk_nkm_table and >> struct clk_nkm_combo) to find the best rate. This approach provides >> faster rate selection by searching a table of valid combinations rather >> than calculating for all possible combinations. >> >> The table of NKM combinations needs to be initialized with meaningful >> combinations only, i.e. removing redundant combinations that result in >> the same rate. >> >> Keep the existing ccu_nkm_find_best function in place and use it as a >> fallback if no precalculated table is provided. >> >> Signed-off-by: Frank Oltmanns <frank@oltmanns.dev> >> --- >> drivers/clk/sunxi-ng/ccu_nkm.c | 84 +++++++++++++++++++++++++++------- >> drivers/clk/sunxi-ng/ccu_nkm.h | 26 +++++++++++ >> 2 files changed, 94 insertions(+), 16 deletions(-) >> >> diff --git a/drivers/clk/sunxi-ng/ccu_nkm.c b/drivers/clk/sunxi-ng/ccu_nkm.c >> index 94d2a83992b2..9652f6df17bd 100644 >> --- a/drivers/clk/sunxi-ng/ccu_nkm.c >> +++ b/drivers/clk/sunxi-ng/ccu_nkm.c >> @@ -54,6 +54,49 @@ static unsigned long ccu_nkm_find_best(unsigned long parent, unsigned long rate, >> return best_rate; >> } >> >> +static unsigned long ccu_nkm_find_best_precalc(unsigned long parent, >> + unsigned long rate, >> + struct _ccu_nkm *nkm, >> + struct clk_nkm_table *table) >> +{ >> + unsigned long best_rate = 0, best_diff = ULONG_MAX; >> + unsigned long best_n = 0, best_k = 0, best_m = 0; >> + int start = 0, end = table->num - 1, mid; >> + >> + while (start <= end) { >> + unsigned long tmp_rate; >> + unsigned long tmp_diff; >> + >> + mid = (start + end) / 2; >> + >> + tmp_rate = parent * table->combos[mid].n * table->combos[mid].k / >> + table->combos[mid].m; >> + >> + tmp_diff = abs(rate - tmp_rate); >> + >> + if (tmp_diff < best_diff) { >> + best_rate = tmp_rate; >> + best_diff = tmp_diff; >> + best_n = table->combos[mid].n; >> + best_k = table->combos[mid].k; >> + best_m = table->combos[mid].m; >> + if (best_diff == 0) >> + goto out; >> + } > Thank you for your feedback! In my proposal, the code performs a binary search by 1. taking the element in the middle (mid) 2. calculating the rate of the element (tmp_rate) 3. calculating the difference to the requested rate (tmp_diff) 4. if the diff is better than the best_diff making it the new best n-k-m-combo (the if block) > If the table was sorted by n * k / m, this could just be a process of Please note, the table already has to be sorted for the function to work, as is the nature of a binary search. I should definitely add comments. I'm sorry, the code was intended more as a basis to discuss the general idea that I described in the cover letter. I should have made that clearer. > searching through until we either: > - find that the first rate in the table is too high I could see that I could add two steps in the beginning, before the loop: - Take the first element and see if its rate is greater than the requested rate, if so immediatly return it - Take the last element and see if its rate is less than the requested rate, if so immediatly return it Is that what you mean? I'd have to run some simulations to see, if this is a real improvement, because we would need two additional rate calculations. Worst case would therefore be 2+log(n) calculations instead of log(n) and the code would be slightly more complicated in my opinion. But if we run this function with all possible parents rate (as suggested in the end of my cover letter) these two special cases could very well be often applicable. Thanks! > - find an exact rate What do you mean by "exact rate"? Do you mean a rate that matches the requested rate exactly. This is what the code is already trying to do. But, as this is not always possible, in cases where it does not find an exact match, it takes the closest match instead. > - go above the requested rate, then there's only two to compare: our > current rate and the previous one Sorry, you've lost me here. How would I go above the requested rate? You would have to do the binary search to find that rate, but then why not search the closest rate directly (as the code does) instead of searching the closest rate above the requested (as you proposed). I feel like either one of us is missing something. :) > This should massively simplify this function and would still work with > a binary search. Sidenote, we could store best_index instead of best_n, best_k, best_m, but for the first iteration I tried to keep it as close as possible to the original ccu_nkm_find_best() function. > >> + if (rate < tmp_rate) >> + end = mid - 1; >> + else >> + start = mid + 1; >> + } >> + >> +out: >> + nkm->n = best_n; >> + nkm->k = best_k; >> + nkm->m = best_m; >> + >> + return best_rate; >> +} >> + >> static void ccu_nkm_disable(struct clk_hw *hw) >> { >> struct ccu_nkm *nkm = hw_to_ccu_nkm(hw); >> diff --git a/drivers/clk/sunxi-ng/ccu_nkm.h b/drivers/clk/sunxi-ng/ccu_nkm.h >> index 6601defb3f38..fa5551724921 100644 >> --- a/drivers/clk/sunxi-ng/ccu_nkm.h >> +++ b/drivers/clk/sunxi-ng/ccu_nkm.h >> @@ -12,6 +12,30 @@ >> #include "ccu_div.h" >> #include "ccu_mult.h" >> >> +struct clk_nkm_combo { >> + u8 n; >> + u8 k; >> + u8 m; >> +}; >> + >> +/** >> + * struct clk_nkm_table - Table of all meaningful combinations for n, k, and m >> + * >> + * @num: Number of entries in the table >> + * @combos: Array of combos (of size num) that are supported by this clock. >> + * >> + * This table shall contain all meaningful combinations of n, k, and m. That >> + * means that combinations that result in the same clock rate shall only be >> + * listed once. For example, if both >> + * { .n = 1, .k = 2, .m = 2} and { .n = 2, .k = 2, .m = 4} >> + * are valid values for n, k, and m, only one of them would be allowed because >> + * both result in a factor of 1.0. >> + */ >> +struct clk_nkm_table { >> + size_t num; >> + struct clk_nkm_combo *combos; > > Should this be a "flex" array, i.e. > > struct clk_nkm_combo combos[] Thanks, noted. I'll look into that once we have an agreement on the general concept. I think it depends on the fact if we want to use values that have been calculated off-line (i.e. prior to compilation) or if we want to create the table at run-time (i.e. when needed) using kmalloc. See my alternate proposals in the cover letter for details. I'll need to check if the run-time approach works with "flex" arrays. Thanks, Frank > >> +}; >> + >> /* >> * struct ccu_nkm - Definition of an N-K-M clock >> * > > Thanks,
On 2023-05-27 at 15:27:46 +0200, Frank Oltmanns <frank@oltmanns.dev> wrote: [...] > diff --git a/drivers/clk/sunxi-ng/ccu_nkm.c b/drivers/clk/sunxi-ng/ccu_nkm.c > index 94d2a83992b2..9652f6df17bd 100644 > --- a/drivers/clk/sunxi-ng/ccu_nkm.c > +++ b/drivers/clk/sunxi-ng/ccu_nkm.c [...] > @@ -157,14 +205,18 @@ static int ccu_nkm_set_rate(struct clk_hw *hw, unsigned long rate, > if (nkm->common.features & CCU_FEATURE_FIXED_POSTDIV) > rate *= nkm->fixed_post_div; > > - _nkm.min_n = nkm->n.min ?: 1; > - _nkm.max_n = nkm->n.max ?: 1 << nkm->n.width; > - _nkm.min_k = nkm->k.min ?: 1; > - _nkm.max_k = nkm->k.max ?: 1 << nkm->k.width; > - _nkm.min_m = 1; > - _nkm.max_m = nkm->m.max ?: 1 << nkm->m.width; > - > - ccu_nkm_find_best(parent_rate, rate, &_nkm); > + if (nkm->table.num) > + rate = ccu_nkm_find_best_precalc(*parent_rate, rate, &_nkm, Ugh! s/*parent_rate/parent_rate/ Sorry! Thanks to intel kernel test robot for pointing it out: https://lore.kernel.org/oe-kbuild-all/202305280405.bUAMrEtn-lkp@intel.com/ Cheers, Frank > + &nkm->table); > + else { > + _nkm.min_n = nkm->n.min ?: 1; > + _nkm.max_n = nkm->n.max ?: 1 << nkm->n.width; > + _nkm.min_k = nkm->k.min ?: 1; > + _nkm.max_k = nkm->k.max ?: 1 << nkm->k.width; > + _nkm.min_m = 1; > + _nkm.max_m = nkm->m.max ?: 1 << nkm->m.width; > + ccu_nkm_find_best(parent_rate, rate, &_nkm); > + } > > spin_lock_irqsave(nkm->common.lock, flags); > > diff --git a/drivers/clk/sunxi-ng/ccu_nkm.h b/drivers/clk/sunxi-ng/ccu_nkm.h > index 6601defb3f38..fa5551724921 100644 > --- a/drivers/clk/sunxi-ng/ccu_nkm.h > +++ b/drivers/clk/sunxi-ng/ccu_nkm.h > @@ -12,6 +12,30 @@ > #include "ccu_div.h" > #include "ccu_mult.h" > > +struct clk_nkm_combo { > + u8 n; > + u8 k; > + u8 m; > +}; > + > +/** > + * struct clk_nkm_table - Table of all meaningful combinations for n, k, and m > + * > + * @num: Number of entries in the table > + * @combos: Array of combos (of size num) that are supported by this clock. > + * > + * This table shall contain all meaningful combinations of n, k, and m. That > + * means that combinations that result in the same clock rate shall only be > + * listed once. For example, if both > + * { .n = 1, .k = 2, .m = 2} and { .n = 2, .k = 2, .m = 4} > + * are valid values for n, k, and m, only one of them would be allowed because > + * both result in a factor of 1.0. > + */ > +struct clk_nkm_table { > + size_t num; > + struct clk_nkm_combo *combos; > +}; > + > /* > * struct ccu_nkm - Definition of an N-K-M clock > * > @@ -29,6 +53,8 @@ struct ccu_nkm { > unsigned int fixed_post_div; > > struct ccu_common common; > + > + struct clk_nkm_table table; > }; > > #define SUNXI_CCU_NKM_WITH_MUX_GATE_LOCK(_struct, _name, _parents, _reg, \
Hi Frank, On Sun, May 28, 2023 at 8:10 PM Frank Oltmanns <frank@oltmanns.dev> wrote: > > Hi Julian, > > On 2023-05-28 at 09:19:36 +1000, Julian Calaby <julian.calaby@gmail.com> wrote: > > Hi Frank, > > > > On Sat, May 27, 2023 at 11:37 PM Frank Oltmanns <frank@oltmanns.dev> wrote: > >> > >> Add a new precalculation method for NKM clock rate selection in the > >> sunxi-ng clock driver. Introduce ccu_nkm_find_best_precalc which uses a > >> precalculated table of valid NKM combinations (struct clk_nkm_table and > >> struct clk_nkm_combo) to find the best rate. This approach provides > >> faster rate selection by searching a table of valid combinations rather > >> than calculating for all possible combinations. > >> > >> The table of NKM combinations needs to be initialized with meaningful > >> combinations only, i.e. removing redundant combinations that result in > >> the same rate. > >> > >> Keep the existing ccu_nkm_find_best function in place and use it as a > >> fallback if no precalculated table is provided. > >> > >> Signed-off-by: Frank Oltmanns <frank@oltmanns.dev> > >> --- > >> drivers/clk/sunxi-ng/ccu_nkm.c | 84 +++++++++++++++++++++++++++------- > >> drivers/clk/sunxi-ng/ccu_nkm.h | 26 +++++++++++ > >> 2 files changed, 94 insertions(+), 16 deletions(-) > >> > >> diff --git a/drivers/clk/sunxi-ng/ccu_nkm.c b/drivers/clk/sunxi-ng/ccu_nkm.c > >> index 94d2a83992b2..9652f6df17bd 100644 > >> --- a/drivers/clk/sunxi-ng/ccu_nkm.c > >> +++ b/drivers/clk/sunxi-ng/ccu_nkm.c > >> @@ -54,6 +54,49 @@ static unsigned long ccu_nkm_find_best(unsigned long parent, unsigned long rate, > >> return best_rate; > >> } > >> > >> +static unsigned long ccu_nkm_find_best_precalc(unsigned long parent, > >> + unsigned long rate, > >> + struct _ccu_nkm *nkm, > >> + struct clk_nkm_table *table) > >> +{ > >> + unsigned long best_rate = 0, best_diff = ULONG_MAX; > >> + unsigned long best_n = 0, best_k = 0, best_m = 0; > >> + int start = 0, end = table->num - 1, mid; > >> + > >> + while (start <= end) { > >> + unsigned long tmp_rate; > >> + unsigned long tmp_diff; > >> + > >> + mid = (start + end) / 2; > >> + > >> + tmp_rate = parent * table->combos[mid].n * table->combos[mid].k / > >> + table->combos[mid].m; > >> + > >> + tmp_diff = abs(rate - tmp_rate); > >> + > >> + if (tmp_diff < best_diff) { > >> + best_rate = tmp_rate; > >> + best_diff = tmp_diff; > >> + best_n = table->combos[mid].n; > >> + best_k = table->combos[mid].k; > >> + best_m = table->combos[mid].m; > >> + if (best_diff == 0) > >> + goto out; > >> + } > > > > Thank you for your feedback! > > In my proposal, the code performs a binary search by > 1. taking the element in the middle (mid) > 2. calculating the rate of the element (tmp_rate) > 3. calculating the difference to the requested rate (tmp_diff) > 4. if the diff is better than the best_diff making it the new best > n-k-m-combo (the if block) I'm so sorry, I thought that this was still doing a linear search as it's so close to the original code. > > > If the table was sorted by n * k / m, this could just be a process of > > Please note, the table already has to be sorted for the function to > work, as is the nature of a binary search. I should definitely add > comments. I'm sorry, the code was intended more as a basis to discuss > the general idea that I described in the cover letter. I should have > made that clearer. > > > searching through until we either: > > - find that the first rate in the table is too high > > I could see that I could add two steps in the beginning, before the loop: > - Take the first element and see if its rate is greater than the > requested rate, if so immediatly return it > - Take the last element and see if its rate is less than the requested > rate, if so immediatly return it > > Is that what you mean? I'd have to run some simulations to see, if this > is a real improvement, because we would need two additional rate > calculations. Worst case would therefore be 2+log(n) calculations > instead of log(n) and the code would be slightly more complicated in my > opinion. But if we run this function with all possible parents rate (as > suggested in the end of my cover letter) these two special cases could > very well be often applicable. Thanks! > > > - find an exact rate > > What do you mean by "exact rate"? Do you mean a rate that matches the > requested rate exactly. This is what the code is already trying to do. > But, as this is not always possible, in cases where it does not find an > exact match, it takes the closest match instead. > > > - go above the requested rate, then there's only two to compare: our > > current rate and the previous one > > Sorry, you've lost me here. How would I go above the requested rate? You > would have to do the binary search to find that rate, but then why not > search the closest rate directly (as the code does) instead of searching > the closest rate above the requested (as you proposed). I feel like > either one of us is missing something. :) What we're missing is that I'm not explaining this well. Let's take a very simple table: (value = parent * n * k / m) 0. 100 1. 200 2. 300 3. 400 If we search for 50, our closest is the first rate, so index 0: this is the "find that the first rate in the table is too high" case. If we search for 300, we'll converge on index 2: this is the "exact rate" situation. If we search for 275, then we'll converge on either 200 or 300: this is the "two to compare" situation: if we converge until we get to the lowest rate above our target, we only need to check the rate immediately before it in the table and the one we converged on to find the closest. So in pseudo-code, we'd end up with something like this: -------- start = 0; cur_rate = parent * table[start].n * table[start].k / table[start].m; if (cur_rate >= target) return table[start]; while (start <= end) { mid = (start + end) / 2; cur_rate = parent * table[mid].n * table[mid].k / table[mid].m; if (cur_rate == target) return table[mid]; if (target < cur_rate) end = mid - 1; else start = mid + 1; } prev_rate = parent * table[mid - 1].n * table[mid - 1].k / table[mid - 1].m; if (abs(target - prev_rate) < abs(target - cur_rate)) return table[mid - 1]; return table[mid]; -------- Which seems simpler to my eye and moves all the difference calculations out of the loop so they only have to be done once, effectively trading a difference calculation on each checked rate for a rate calculation, and dropping some variables in the process. Thanks,
Hi Julian, On 2023-05-29 at 01:32:02 +1000, Julian Calaby <julian.calaby@gmail.com> wrote: > Hi Frank, > > On Sun, May 28, 2023 at 8:10 PM Frank Oltmanns <frank@oltmanns.dev> wrote: >> >> Hi Julian, >> >> On 2023-05-28 at 09:19:36 +1000, Julian Calaby <julian.calaby@gmail.com> wrote: >> > Hi Frank, >> > >> > On Sat, May 27, 2023 at 11:37 PM Frank Oltmanns <frank@oltmanns.dev> wrote: >> >> >> >> Add a new precalculation method for NKM clock rate selection in the >> >> sunxi-ng clock driver. Introduce ccu_nkm_find_best_precalc which uses a >> >> precalculated table of valid NKM combinations (struct clk_nkm_table and >> >> struct clk_nkm_combo) to find the best rate. This approach provides >> >> faster rate selection by searching a table of valid combinations rather >> >> than calculating for all possible combinations. >> >> >> >> The table of NKM combinations needs to be initialized with meaningful >> >> combinations only, i.e. removing redundant combinations that result in >> >> the same rate. >> >> >> >> Keep the existing ccu_nkm_find_best function in place and use it as a >> >> fallback if no precalculated table is provided. >> >> >> >> Signed-off-by: Frank Oltmanns <frank@oltmanns.dev> >> >> --- >> >> drivers/clk/sunxi-ng/ccu_nkm.c | 84 +++++++++++++++++++++++++++------- >> >> drivers/clk/sunxi-ng/ccu_nkm.h | 26 +++++++++++ >> >> 2 files changed, 94 insertions(+), 16 deletions(-) >> >> >> >> diff --git a/drivers/clk/sunxi-ng/ccu_nkm.c b/drivers/clk/sunxi-ng/ccu_nkm.c >> >> index 94d2a83992b2..9652f6df17bd 100644 >> >> --- a/drivers/clk/sunxi-ng/ccu_nkm.c >> >> +++ b/drivers/clk/sunxi-ng/ccu_nkm.c >> >> @@ -54,6 +54,49 @@ static unsigned long ccu_nkm_find_best(unsigned long parent, unsigned long rate, >> >> return best_rate; >> >> } >> >> >> >> +static unsigned long ccu_nkm_find_best_precalc(unsigned long parent, >> >> + unsigned long rate, >> >> + struct _ccu_nkm *nkm, >> >> + struct clk_nkm_table *table) >> >> +{ >> >> + unsigned long best_rate = 0, best_diff = ULONG_MAX; >> >> + unsigned long best_n = 0, best_k = 0, best_m = 0; >> >> + int start = 0, end = table->num - 1, mid; >> >> + >> >> + while (start <= end) { >> >> + unsigned long tmp_rate; >> >> + unsigned long tmp_diff; >> >> + >> >> + mid = (start + end) / 2; >> >> + >> >> + tmp_rate = parent * table->combos[mid].n * table->combos[mid].k / >> >> + table->combos[mid].m; >> >> + >> >> + tmp_diff = abs(rate - tmp_rate); >> >> + >> >> + if (tmp_diff < best_diff) { >> >> + best_rate = tmp_rate; >> >> + best_diff = tmp_diff; >> >> + best_n = table->combos[mid].n; >> >> + best_k = table->combos[mid].k; >> >> + best_m = table->combos[mid].m; >> >> + if (best_diff == 0) >> >> + goto out; >> >> + } >> > >> >> Thank you for your feedback! >> >> In my proposal, the code performs a binary search by >> 1. taking the element in the middle (mid) >> 2. calculating the rate of the element (tmp_rate) >> 3. calculating the difference to the requested rate (tmp_diff) >> 4. if the diff is better than the best_diff making it the new best >> n-k-m-combo (the if block) > > I'm so sorry, I thought that this was still doing a linear search as > it's so close to the original code. > >> >> > If the table was sorted by n * k / m, this could just be a process of >> >> Please note, the table already has to be sorted for the function to >> work, as is the nature of a binary search. I should definitely add >> comments. I'm sorry, the code was intended more as a basis to discuss >> the general idea that I described in the cover letter. I should have >> made that clearer. >> >> > searching through until we either: >> > - find that the first rate in the table is too high >> >> I could see that I could add two steps in the beginning, before the loop: >> - Take the first element and see if its rate is greater than the >> requested rate, if so immediatly return it >> - Take the last element and see if its rate is less than the requested >> rate, if so immediatly return it >> >> Is that what you mean? I'd have to run some simulations to see, if this >> is a real improvement, because we would need two additional rate >> calculations. Worst case would therefore be 2+log(n) calculations >> instead of log(n) and the code would be slightly more complicated in my >> opinion. But if we run this function with all possible parents rate (as >> suggested in the end of my cover letter) these two special cases could >> very well be often applicable. Thanks! >> >> > - find an exact rate >> >> What do you mean by "exact rate"? Do you mean a rate that matches the >> requested rate exactly. This is what the code is already trying to do. >> But, as this is not always possible, in cases where it does not find an >> exact match, it takes the closest match instead. >> >> > - go above the requested rate, then there's only two to compare: our >> > current rate and the previous one >> >> Sorry, you've lost me here. How would I go above the requested rate? You >> would have to do the binary search to find that rate, but then why not >> search the closest rate directly (as the code does) instead of searching >> the closest rate above the requested (as you proposed). I feel like >> either one of us is missing something. :) > > What we're missing is that I'm not explaining this well. > > Let's take a very simple table: (value = parent * n * k / m) > > 0. 100 > 1. 200 > 2. 300 > 3. 400 > > If we search for 50, our closest is the first rate, so index 0: this > is the "find that the first rate in the table is too high" case. > > If we search for 300, we'll converge on index 2: this is the "exact > rate" situation. > > If we search for 275, then we'll converge on either 200 or 300: this > is the "two to compare" situation: if we converge until we get to the > lowest rate above our target, we only need to check the rate > immediately before it in the table and the one we converged on to find > the closest. > > So in pseudo-code, we'd end up with something like this: > > -------- > > start = 0; > > cur_rate = parent * table[start].n * table[start].k / table[start].m; > > if (cur_rate >= target) > return table[start]; > > while (start <= end) { > mid = (start + end) / 2; Thanks for the thorough explanation! This needs to be (start + end + 1) / 2 Otherwise, if we extend your hypothetical list above with another item, let's say 500 and look for 199, this would result in the loop finishing with mid = 0, if I'm not mistaken, and hence an access to table[-1] when calculating prev_rate below. Not good. But I *think*, with (start + end + 1) / 2 it works in all cases. > > cur_rate = parent * table[mid].n * table[mid].k / table[mid].m; > > if (cur_rate == target) > return table[mid]; > > if (target < cur_rate) > end = mid - 1; > else > start = mid + 1; > } > > prev_rate = parent * table[mid - 1].n * table[mid - 1].k / table[mid - 1].m; > > if (abs(target - prev_rate) < abs(target - cur_rate)) > return table[mid - 1]; > > return table[mid]; > > -------- > > Which seems simpler to my eye and moves all the difference > calculations out of the loop so they only have to be done once, > effectively trading a difference calculation on each checked rate for > a rate calculation, and dropping some variables in the process. At least it's shorter. I'm not sure it's simpler (after all it contained a mistake, I think ;-)). Still, it looks neat, so I might still use your (revised) algorithm. Thanks, Frank > > Thanks,
diff --git a/drivers/clk/sunxi-ng/ccu_nkm.c b/drivers/clk/sunxi-ng/ccu_nkm.c index 94d2a83992b2..9652f6df17bd 100644 --- a/drivers/clk/sunxi-ng/ccu_nkm.c +++ b/drivers/clk/sunxi-ng/ccu_nkm.c @@ -54,6 +54,49 @@ static unsigned long ccu_nkm_find_best(unsigned long parent, unsigned long rate, return best_rate; } +static unsigned long ccu_nkm_find_best_precalc(unsigned long parent, + unsigned long rate, + struct _ccu_nkm *nkm, + struct clk_nkm_table *table) +{ + unsigned long best_rate = 0, best_diff = ULONG_MAX; + unsigned long best_n = 0, best_k = 0, best_m = 0; + int start = 0, end = table->num - 1, mid; + + while (start <= end) { + unsigned long tmp_rate; + unsigned long tmp_diff; + + mid = (start + end) / 2; + + tmp_rate = parent * table->combos[mid].n * table->combos[mid].k / + table->combos[mid].m; + + tmp_diff = abs(rate - tmp_rate); + + if (tmp_diff < best_diff) { + best_rate = tmp_rate; + best_diff = tmp_diff; + best_n = table->combos[mid].n; + best_k = table->combos[mid].k; + best_m = table->combos[mid].m; + if (best_diff == 0) + goto out; + } + if (rate < tmp_rate) + end = mid - 1; + else + start = mid + 1; + } + +out: + nkm->n = best_n; + nkm->k = best_k; + nkm->m = best_m; + + return best_rate; +} + static void ccu_nkm_disable(struct clk_hw *hw) { struct ccu_nkm *nkm = hw_to_ccu_nkm(hw); @@ -119,17 +162,22 @@ static unsigned long ccu_nkm_round_rate(struct ccu_mux_internal *mux, struct ccu_nkm *nkm = data; struct _ccu_nkm _nkm; - _nkm.min_n = nkm->n.min ?: 1; - _nkm.max_n = nkm->n.max ?: 1 << nkm->n.width; - _nkm.min_k = nkm->k.min ?: 1; - _nkm.max_k = nkm->k.max ?: 1 << nkm->k.width; - _nkm.min_m = 1; - _nkm.max_m = nkm->m.max ?: 1 << nkm->m.width; - if (nkm->common.features & CCU_FEATURE_FIXED_POSTDIV) rate *= nkm->fixed_post_div; - rate = ccu_nkm_find_best(*parent_rate, rate, &_nkm); + if (nkm->table.num) + rate = ccu_nkm_find_best_precalc(*parent_rate, rate, &_nkm, + &nkm->table); + else { + _nkm.min_n = nkm->n.min ?: 1; + _nkm.max_n = nkm->n.max ?: 1 << nkm->n.width; + _nkm.min_k = nkm->k.min ?: 1; + _nkm.max_k = nkm->k.max ?: 1 << nkm->k.width; + _nkm.min_m = 1; + _nkm.max_m = nkm->m.max ?: 1 << nkm->m.width; + + rate = ccu_nkm_find_best(*parent_rate, rate, &_nkm); + } if (nkm->common.features & CCU_FEATURE_FIXED_POSTDIV) rate /= nkm->fixed_post_div; @@ -157,14 +205,18 @@ static int ccu_nkm_set_rate(struct clk_hw *hw, unsigned long rate, if (nkm->common.features & CCU_FEATURE_FIXED_POSTDIV) rate *= nkm->fixed_post_div; - _nkm.min_n = nkm->n.min ?: 1; - _nkm.max_n = nkm->n.max ?: 1 << nkm->n.width; - _nkm.min_k = nkm->k.min ?: 1; - _nkm.max_k = nkm->k.max ?: 1 << nkm->k.width; - _nkm.min_m = 1; - _nkm.max_m = nkm->m.max ?: 1 << nkm->m.width; - - ccu_nkm_find_best(parent_rate, rate, &_nkm); + if (nkm->table.num) + rate = ccu_nkm_find_best_precalc(*parent_rate, rate, &_nkm, + &nkm->table); + else { + _nkm.min_n = nkm->n.min ?: 1; + _nkm.max_n = nkm->n.max ?: 1 << nkm->n.width; + _nkm.min_k = nkm->k.min ?: 1; + _nkm.max_k = nkm->k.max ?: 1 << nkm->k.width; + _nkm.min_m = 1; + _nkm.max_m = nkm->m.max ?: 1 << nkm->m.width; + ccu_nkm_find_best(parent_rate, rate, &_nkm); + } spin_lock_irqsave(nkm->common.lock, flags); diff --git a/drivers/clk/sunxi-ng/ccu_nkm.h b/drivers/clk/sunxi-ng/ccu_nkm.h index 6601defb3f38..fa5551724921 100644 --- a/drivers/clk/sunxi-ng/ccu_nkm.h +++ b/drivers/clk/sunxi-ng/ccu_nkm.h @@ -12,6 +12,30 @@ #include "ccu_div.h" #include "ccu_mult.h" +struct clk_nkm_combo { + u8 n; + u8 k; + u8 m; +}; + +/** + * struct clk_nkm_table - Table of all meaningful combinations for n, k, and m + * + * @num: Number of entries in the table + * @combos: Array of combos (of size num) that are supported by this clock. + * + * This table shall contain all meaningful combinations of n, k, and m. That + * means that combinations that result in the same clock rate shall only be + * listed once. For example, if both + * { .n = 1, .k = 2, .m = 2} and { .n = 2, .k = 2, .m = 4} + * are valid values for n, k, and m, only one of them would be allowed because + * both result in a factor of 1.0. + */ +struct clk_nkm_table { + size_t num; + struct clk_nkm_combo *combos; +}; + /* * struct ccu_nkm - Definition of an N-K-M clock * @@ -29,6 +53,8 @@ struct ccu_nkm { unsigned int fixed_post_div; struct ccu_common common; + + struct clk_nkm_table table; }; #define SUNXI_CCU_NKM_WITH_MUX_GATE_LOCK(_struct, _name, _parents, _reg, \