[20/20] interconnect: qcom: Divide clk rate by src node bus width

Message ID 20230526-topic-smd_icc-v1-20-1bf8e6663c4e@linaro.org
State: New
Series: Restructure RPM SMD ICC

Commit Message

Konrad Dybcio May 30, 2023, 10:20 a.m. UTC
  Ever since the introduction of SMD RPM ICC, we've been dividing the
clock rate by the wrong bus width. This has resulted in:

- setting wrong (mostly too low) rates, affecting performance
  - most often /2 or /4
  - things like DDR never hit their full potential
  - the rates were only correct if src bus width == dst bus width
    for all src, dst pairs on a given bus

- Qualcomm using the same wrong logic in their BSP driver in msm-5.x
  that ships in production devices today

- me losing my sanity trying to find this

Resolve it by using dst_qn, if it exists.

Fixes: 5e4e6c4d3ae0 ("interconnect: qcom: Add QCS404 interconnect provider driver")
Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
---
 drivers/interconnect/qcom/icc-rpm.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)
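
A hypothetical worked example of the failure mode (widths and numbers
made up for illustration, not taken from a real topology): take a vote
that crosses from a 16-byte-wide gateway node (src) into a 4-byte-wide
endpoint (dst), requesting a peak of 800000 kB/s.

  correct rate:      800000 kB/s / 4  (dst buswidth) = 200000 kHz
  rate before fix:   800000 kB/s / 16 (src buswidth) =  50000 kHz

i.e. the bus clock ends up 4x too low, matching the "/2 or /4" pattern
above whenever the two widths differ by that factor.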
  

Comments

Konrad Dybcio May 30, 2023, 12:16 p.m. UTC | #1
Note: the commit title is wrong (src -> dst obviously).
Thanks Stephan for spotting this.

Konrad

On 30.05.2023 12:20, Konrad Dybcio wrote:
> Ever since the introduction of SMD RPM ICC, we've been dividing the
> clock rate by the wrong bus width. This has resulted in:
> 
> - setting wrong (mostly too low) rates, affecting performance
>   - most often /2 or /4
>   - things like DDR never hit their full potential
>   - the rates were only correct if src bus width == dst bus width
>     for all src, dst pairs on a given bus
> 
> - Qualcomm using the same wrong logic in their BSP driver in msm-5.x
>   that ships in production devices today
> 
> - me losing my sanity trying to find this
> 
> Resolve it by using dst_qn, if it exists.
> 
> Fixes: 5e4e6c4d3ae0 ("interconnect: qcom: Add QCS404 interconnect provider driver")
> Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
> ---
>  drivers/interconnect/qcom/icc-rpm.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/interconnect/qcom/icc-rpm.c b/drivers/interconnect/qcom/icc-rpm.c
> index 59be704364bb..58e2a8b1b7c3 100644
> --- a/drivers/interconnect/qcom/icc-rpm.c
> +++ b/drivers/interconnect/qcom/icc-rpm.c
> @@ -340,7 +340,7 @@ static void qcom_icc_bus_aggregate(struct icc_provider *provider,
>  static int qcom_icc_set(struct icc_node *src, struct icc_node *dst)
>  {
>  	struct qcom_icc_provider *qp;
> -	struct qcom_icc_node *src_qn = NULL, *dst_qn = NULL;
> +	struct qcom_icc_node *src_qn = NULL, *dst_qn = NULL, *qn = NULL;
>  	struct icc_provider *provider;
>  	u64 active_rate, sleep_rate;
>  	u64 agg_avg[QCOM_SMD_RPM_STATE_NUM], agg_peak[QCOM_SMD_RPM_STATE_NUM];
> @@ -353,6 +353,8 @@ static int qcom_icc_set(struct icc_node *src, struct icc_node *dst)
>  	provider = src->provider;
>  	qp = to_qcom_provider(provider);
>  
> +	qn = dst_qn ? dst_qn : src_qn;
> +
>  	qcom_icc_bus_aggregate(provider, agg_avg, agg_peak, &max_agg_avg);
>  
>  	ret = qcom_icc_rpm_set(src_qn, agg_avg);
> @@ -372,11 +374,11 @@ static int qcom_icc_set(struct icc_node *src, struct icc_node *dst)
>  	/* Intentionally keep the rates in kHz as that's what RPM accepts */
>  	active_rate = max(agg_avg[QCOM_SMD_RPM_ACTIVE_STATE],
>  			  agg_peak[QCOM_SMD_RPM_ACTIVE_STATE]);
> -	do_div(active_rate, src_qn->buswidth);
> +	do_div(active_rate, qn->buswidth);
>  
>  	sleep_rate = max(agg_avg[QCOM_SMD_RPM_SLEEP_STATE],
>  			 agg_peak[QCOM_SMD_RPM_SLEEP_STATE]);
> -	do_div(sleep_rate, src_qn->buswidth);
> +	do_div(sleep_rate, qn->buswidth);
>  
>  	/*
>  	 * Downstream checks whether the requested rate is zero, but it makes little sense
>
  
Konrad Dybcio May 30, 2023, 4:32 p.m. UTC | #2
On 30.05.2023 12:20, Konrad Dybcio wrote:
> Ever since the introduction of SMD RPM ICC, we've been dividing the
> clock rate by the wrong bus width. This has resulted in:
> 
> - setting wrong (mostly too low) rates, affecting performance
>   - most often /2 or /4
>   - things like DDR never hit their full potential
>   - the rates were only correct if src bus width == dst bus width
>     for all src, dst pairs on a given bus
> 
> - Qualcomm using the same wrong logic in their BSP driver in msm-5.x
>   that ships in production devices today
> 
> - me losing my sanity trying to find this
> 
> Resolve it by using dst_qn, if it exists.
> 
> Fixes: 5e4e6c4d3ae0 ("interconnect: qcom: Add QCS404 interconnect provider driver")
> Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
> ---
The problem is deeper.

Chatting with Stephan (+CC), we tackled a few issues (that I will send
fixes for in v2):

1. qcom_icc_rpm_set() should take per-node (src_qn->sum_avg, dst_qn->sum_avg)
   and NOT aggregated bw (unless you want ALL of your nodes on a given provider
   to "go very fast")

2. the aggregate bw/clk rate calculation should use the node-specific bus widths
   and not only the bus width of the src/dst node, otherwise the average bw
   values will be utterly meaningless

3. thanks to (1) and (2) qcom_icc_bus_aggregate() can be remodeled to instead
   calculate the clock rates for the two rpm contexts, which we can then max()
   and pass on to the ratesetting call
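
   As a rough sketch of what (3) could look like -- my thinking for v2,
   not final code, assuming the existing sum_avg/max_peak arrays on
   struct qcom_icc_node stay as they are:

static void qcom_icc_bus_aggregate(struct icc_provider *provider,
				   u64 *agg_clk_rate)
{
	struct qcom_icc_node *qn;
	struct icc_node *node;
	u64 rate;
	int i;

	/*
	 * Aggregate the per-node requests into one clock rate per RPM
	 * context, dividing each node's bandwidth by its *own* buswidth
	 * rather than by the src/dst node's.
	 */
	list_for_each_entry(node, &provider->nodes, node_list) {
		qn = node->data;
		for (i = 0; i < QCOM_SMD_RPM_STATE_NUM; i++) {
			rate = max(qn->sum_avg[i], qn->max_peak[i]);
			do_div(rate, qn->buswidth);
			agg_clk_rate[i] = max(agg_clk_rate[i], rate);
		}
	}
}

   qcom_icc_set() would then max() the two context rates for the
   ratesetting call, and per (1) pass the per-node sum_avg values to
   qcom_icc_rpm_set() instead of the aggregated ones.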


----8<---- Cutting off Stephan's seal of approval, this is my thinking ----

4. I *think* Qualcomm really made a mistake in their msm-5.4 driver where they
   took most of the logic from the current -next state and should have been
   setting the rate based on the *DST* provider, or at least that's my
   understanding trying to read the "known good" msm-4.19 driver
   (which remembers msm-3.0 lol).. Or maybe we should keep src but ensure there's
   also a final (dst, dst) vote cast:

provider->inter_set = false // current state upstream

setting apps_proc<->slv_bimc_snoc
setting mas_bimc_snoc<->slv_snoc_cnoc
setting mas_snoc_cnoc<->qhs_sdc2


provider->inter_set = true // I don't think there's effectively a difference?

setting apps_proc<->slv_bimc_snoc
setting slv_bimc_snoc<->mas_bimc_snoc
setting mas_bimc_snoc<->slv_snoc_cnoc
setting slv_snoc_cnoc<->mas_snoc_cnoc
setting mas_snoc_cnoc<->qhs_sdc2

all the (mas|slv)_bus1_bus2 are very wide whereas the target nodes are usually
4-, 8- or 16-wide, which without this patch or something equivalent decimates
(or actually 2^n-ates) the calculated rates..

Konrad


>  drivers/interconnect/qcom/icc-rpm.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/interconnect/qcom/icc-rpm.c b/drivers/interconnect/qcom/icc-rpm.c
> index 59be704364bb..58e2a8b1b7c3 100644
> --- a/drivers/interconnect/qcom/icc-rpm.c
> +++ b/drivers/interconnect/qcom/icc-rpm.c
> @@ -340,7 +340,7 @@ static void qcom_icc_bus_aggregate(struct icc_provider *provider,
>  static int qcom_icc_set(struct icc_node *src, struct icc_node *dst)
>  {
>  	struct qcom_icc_provider *qp;
> -	struct qcom_icc_node *src_qn = NULL, *dst_qn = NULL;
> +	struct qcom_icc_node *src_qn = NULL, *dst_qn = NULL, *qn = NULL;
>  	struct icc_provider *provider;
>  	u64 active_rate, sleep_rate;
>  	u64 agg_avg[QCOM_SMD_RPM_STATE_NUM], agg_peak[QCOM_SMD_RPM_STATE_NUM];
> @@ -353,6 +353,8 @@ static int qcom_icc_set(struct icc_node *src, struct icc_node *dst)
>  	provider = src->provider;
>  	qp = to_qcom_provider(provider);
>  
> +	qn = dst_qn ? dst_qn : src_qn;
> +
>  	qcom_icc_bus_aggregate(provider, agg_avg, agg_peak, &max_agg_avg);
>  
>  	ret = qcom_icc_rpm_set(src_qn, agg_avg);
> @@ -372,11 +374,11 @@ static int qcom_icc_set(struct icc_node *src, struct icc_node *dst)
>  	/* Intentionally keep the rates in kHz as that's what RPM accepts */
>  	active_rate = max(agg_avg[QCOM_SMD_RPM_ACTIVE_STATE],
>  			  agg_peak[QCOM_SMD_RPM_ACTIVE_STATE]);
> -	do_div(active_rate, src_qn->buswidth);
> +	do_div(active_rate, qn->buswidth);
>  
>  	sleep_rate = max(agg_avg[QCOM_SMD_RPM_SLEEP_STATE],
>  			 agg_peak[QCOM_SMD_RPM_SLEEP_STATE]);
> -	do_div(sleep_rate, src_qn->buswidth);
> +	do_div(sleep_rate, qn->buswidth);
>  
>  	/*
>  	 * Downstream checks whether the requested rate is zero, but it makes little sense
>
  
Stephan Gerhold May 30, 2023, 7:02 p.m. UTC | #3
On Tue, May 30, 2023 at 06:32:04PM +0200, Konrad Dybcio wrote:
> On 30.05.2023 12:20, Konrad Dybcio wrote:
> > Ever since the introduction of SMD RPM ICC, we've been dividing the
> > clock rate by the wrong bus width. This has resulted in:
> > 
> > - setting wrong (mostly too low) rates, affecting performance
> >   - most often /2 or /4
> >   - things like DDR never hit their full potential
> >   - the rates were only correct if src bus width == dst bus width
> >     for all src, dst pairs on a given bus
> > 
> > - Qualcomm using the same wrong logic in their BSP driver in msm-5.x
> >   that ships in production devices today
> > 
> > - me losing my sanity trying to find this
> > 
> > Resolve it by using dst_qn, if it exists.
> > 
> > Fixes: 5e4e6c4d3ae0 ("interconnect: qcom: Add QCS404 interconnect provider driver")
> > Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
> > ---
> The problem is deeper.
> 
> Chatting with Stephan (+CC), we tackled a few issues (that I will send
> fixes for in v2):
> 
> 1. qcom_icc_rpm_set() should take per-node (src_qn->sum_avg, dst_qn->sum_avg)
>    and NOT aggregated bw (unless you want ALL of your nodes on a given provider
>    to "go very fast")
> 
> 2. the aggregate bw/clk rate calculation should use the node-specific bus widths
>    and not only the bus width of the src/dst node, otherwise the average bw
>    values will be utterly meaningless
> 

The peak bandwidth / clock rate is wrong as well if you have two paths
with different buswidths on the same bus/NoC. (If someone is interested
in details I can post my specific example I had in the chat, it shows
this more clearly.)
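
A made-up example to illustrate: two paths sharing one NoC, path A
peaking at 400000 kB/s through an 8-byte-wide node, path B at
300000 kB/s through a 4-byte-wide node.

  per-node:        A: 400000 / 8 = 50000 kHz   B: 300000 / 4 = 75000 kHz
  correct rate:    max(50000, 75000)           = 75000 kHz
  aggregate-first: max(400000, 300000) / 8     = 50000 kHz

Dividing the aggregated peak by a single node's buswidth starves path B
whenever the widths differ.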

> 3. thanks to (1) and (2) qcom_icc_bus_aggregate() can be remodeled to instead
>    calculate the clock rates for the two rpm contexts, which we can then max()
>    and pass on to the ratesetting call
> 

Sounds good.

> 
> ----8<---- Cutting off Stephan's seal of approval, this is my thinking ----
> 
> 4. I *think* Qualcomm really made a mistake in their msm-5.4 driver where they
>    took most of the logic from the current -next state and should have been
>    setting the rate based on the *DST* provider, or at least that's my
>    understanding trying to read the "known good" msm-4.19 driver
>    (which remembers msm-3.0 lol).. Or maybe we should keep src but ensure there's
>    also a final (dst, dst) vote cast:
> 
> provider->inter_set = false // current state upstream
> 
> setting apps_proc<->slv_bimc_snoc
> setting mas_bimc_snoc<->slv_snoc_cnoc
> setting mas_snoc_cnoc<->qhs_sdc2
> 
> 
> provider->inter_set = true // I don't think there's effectively a difference?
> 
> setting apps_proc<->slv_bimc_snoc
> setting slv_bimc_snoc<->mas_bimc_snoc
> setting mas_bimc_snoc<->slv_snoc_cnoc
> setting slv_snoc_cnoc<->mas_snoc_cnoc
> setting mas_snoc_cnoc<->qhs_sdc2
> 

I think with our proposed changes above it no longer matters whether a
node is passed as "src" or "dst". This means in your example above you
just waste additional time setting the bandwidth twice for
slv_bimc_snoc, mas_bimc_snoc, slv_snoc_cnoc and mas_snoc_cnoc.
The final outcome is the same with or without "inter_set".

Thanks,
Stephan
  
Konrad Dybcio June 1, 2023, 12:43 p.m. UTC | #4
On 30.05.2023 21:02, Stephan Gerhold wrote:
> On Tue, May 30, 2023 at 06:32:04PM +0200, Konrad Dybcio wrote:
>> On 30.05.2023 12:20, Konrad Dybcio wrote:
>>> Ever since the introduction of SMD RPM ICC, we've been dividing the
>>> clock rate by the wrong bus width. This has resulted in:
>>>
>>> - setting wrong (mostly too low) rates, affecting performance
>>>   - most often /2 or /4
>>>   - things like DDR never hit their full potential
>>>   - the rates were only correct if src bus width == dst bus width
>>>     for all src, dst pairs on a given bus
>>>
>>> - Qualcomm using the same wrong logic in their BSP driver in msm-5.x
>>>   that ships in production devices today
>>>
>>> - me losing my sanity trying to find this
>>>
>>> Resolve it by using dst_qn, if it exists.
>>>
>>> Fixes: 5e4e6c4d3ae0 ("interconnect: qcom: Add QCS404 interconnect provider driver")
>>> Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
>>> ---
>> The problem is deeper.
>>
>> Chatting with Stephan (+CC), we tackled a few issues (that I will send
>> fixes for in v2):
>>
>> 1. qcom_icc_rpm_set() should take per-node (src_qn->sum_avg, dst_qn->sum_avg)
>>    and NOT aggregated bw (unless you want ALL of your nodes on a given provider
>>    to "go very fast")
>>
>> 2. the aggregate bw/clk rate calculation should use the node-specific bus widths
>>    and not only the bus width of the src/dst node, otherwise the average bw
>>    values will be utterly meaningless
>>
> 
> The peak bandwidth / clock rate is wrong as well if you have two paths
> with different buswidths on the same bus/NoC. (If someone is interested
> in details I can post my specific example I had in the chat, it shows
> this more clearly.)
agg_peak takes care of that, I believe..


> 
>> 3. thanks to (1) and (2) qcom_icc_bus_aggregate() can be remodeled to instead
>>    calculate the clock rates for the two rpm contexts, which we can then max()
>>    and pass on to the ratesetting call
>>
> 
> Sounds good.
> 
>>
>> ----8<---- Cutting off Stephan's seal of approval, this is my thinking ----
>>
>> 4. I *think* Qualcomm really made a mistake in their msm-5.4 driver where they
>>    took most of the logic from the current -next state and should have been
>>    setting the rate based on the *DST* provider, or at least that's my
>>    understanding trying to read the "known good" msm-4.19 driver
>>    (which remembers msm-3.0 lol).. Or maybe we should keep src but ensure there's
>>    also a final (dst, dst) vote cast:
>>
>> provider->inter_set = false // current state upstream
>>
>> setting apps_proc<->slv_bimc_snoc
>> setting mas_bimc_snoc<->slv_snoc_cnoc
>> setting mas_snoc_cnoc<->qhs_sdc2
>>
>>
>> provider->inter_set = true // I don't think there's effectively a difference?
>>
>> setting apps_proc<->slv_bimc_snoc
>> setting slv_bimc_snoc<->mas_bimc_snoc
>> setting mas_bimc_snoc<->slv_snoc_cnoc
>> setting slv_snoc_cnoc<->mas_snoc_cnoc
>> setting mas_snoc_cnoc<->qhs_sdc2
>>
> 
> I think with our proposed changes above it does no longer matter if a
> node is passed as "src" or "dst". This means in your example above you
> just waste additional time setting the bandwidth twice for
> slv_bimc_snoc, mas_bimc_snoc, slv_snoc_cnoc and mas_snoc_cnoc.
> The final outcome is the same with or without "inter_set".
Yeah I guess due to the fact that two "real" nodes are always
connected by a set of "gateway" nodes, the rate will be applied..

I am however not sure if we're supposed to set the bandwidth
(via qcom_icc_rpm_set()) on all of them..

Konrad
> 
> Thanks,
> Stephan
  
Stephan Gerhold June 1, 2023, 1:23 p.m. UTC | #5
On Thu, Jun 01, 2023 at 02:43:50PM +0200, Konrad Dybcio wrote:
> On 30.05.2023 21:02, Stephan Gerhold wrote:
> > On Tue, May 30, 2023 at 06:32:04PM +0200, Konrad Dybcio wrote:
> >> On 30.05.2023 12:20, Konrad Dybcio wrote:
> >>> Ever since the introduction of SMD RPM ICC, we've been dividing the
> >>> clock rate by the wrong bus width. This has resulted in:
> >>>
> >>> - setting wrong (mostly too low) rates, affecting performance
> >>>   - most often /2 or /4
> >>>   - things like DDR never hit their full potential
> >>>   - the rates were only correct if src bus width == dst bus width
> >>>     for all src, dst pairs on a given bus
> >>>
> >>> - Qualcomm using the same wrong logic in their BSP driver in msm-5.x
> >>>   that ships in production devices today
> >>>
> >>> - me losing my sanity trying to find this
> >>>
> >>> Resolve it by using dst_qn, if it exists.
> >>>
> >>> Fixes: 5e4e6c4d3ae0 ("interconnect: qcom: Add QCS404 interconnect provider driver")
> >>> Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
> >>> ---
> >> The problem is deeper.
> >>
> >> Chatting with Stephan (+CC), we tackled a few issues (that I will send
> >> fixes for in v2):
> >>
> >> 1. qcom_icc_rpm_set() should take per-node (src_qn->sum_avg, dst_qn->sum_avg)
> >>    and NOT aggregated bw (unless you want ALL of your nodes on a given provider
> >>    to "go very fast")
> >>
> >> 2. the aggregate bw/clk rate calculation should use the node-specific bus widths
> >>    and not only the bus width of the src/dst node, otherwise the average bw
> >>    values will be utterly meaningless
> >>
> > 
> > The peak bandwidth / clock rate is wrong as well if you have two paths
> > with different buswidths on the same bus/NoC. (If someone is interested
> > in details I can post my specific example I had in the chat, it shows
> > this more clearly.)
> agg_peak takes care of that, I believe..
> 

I was just nitpicking on your description here, I think the solution
you/we had in mind was already correct. :)

> 
> > 
> >> 3. thanks to (1) and (2) qcom_icc_bus_aggregate() can be remodeled to instead
> >>    calculate the clock rates for the two rpm contexts, which we can then max()
> >>    and pass on to the ratesetting call
> >>
> > 
> > Sounds good.
> > 
> >>
> >> ----8<---- Cutting off Stephan's seal of approval, this is my thinking ----
> >>
> >> 4. I *think* Qualcomm really made a mistake in their msm-5.4 driver where they
> >>    took most of the logic from the current -next state and should have been
> >>    setting the rate based on the *DST* provider, or at least that's my
> >>    understanding trying to read the "known good" msm-4.19 driver
> >>    (which remembers msm-3.0 lol).. Or maybe we should keep src but ensure there's
> >>    also a final (dst, dst) vote cast:
> >>
> >> provider->inter_set = false // current state upstream
> >>
> >> setting apps_proc<->slv_bimc_snoc
> >> setting mas_bimc_snoc<->slv_snoc_cnoc
> >> setting mas_snoc_cnoc<->qhs_sdc2
> >>
> >>
> >> provider->inter_set = true // I don't think there's effectively a difference?
> >>
> >> setting apps_proc<->slv_bimc_snoc
> >> setting slv_bimc_snoc<->mas_bimc_snoc
> >> setting mas_bimc_snoc<->slv_snoc_cnoc
> >> setting slv_snoc_cnoc<->mas_snoc_cnoc
> >> setting mas_snoc_cnoc<->qhs_sdc2
> >>
> > 
> I think with our proposed changes above it no longer matters whether a
> > node is passed as "src" or "dst". This means in your example above you
> > just waste additional time setting the bandwidth twice for
> > slv_bimc_snoc, mas_bimc_snoc, slv_snoc_cnoc and mas_snoc_cnoc.
> > The final outcome is the same with or without "inter_set".
> Yeah I guess due to the fact that two "real" nodes are always
> connected by a set of "gateway" nodes, the rate will be applied..
> 
> I am however not sure if we're supposed to set the bandwidth
> (via qcom_icc_rpm_set()) on all of them..
> 

I think so? The nodes RPM doesn't care about shouldn't have
a slv/mas_rpm_id.
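
For reference, the gating in the current qcom_icc_rpm_set() looks
roughly like this (a trimmed paraphrase, not the verbatim code):

	if (qn->mas_rpm_id != -1)
		ret = qcom_icc_rpm_smd_send(ctx, RPM_BUS_MASTER_REQ,
					    qn->mas_rpm_id, bw_bps);
	if (qn->slv_rpm_id != -1)
		ret = qcom_icc_rpm_smd_send(ctx, RPM_BUS_SLAVE_REQ,
					    qn->slv_rpm_id, bw_bps);

so a vote on a node with both ids set to -1 is a no-op as far as RPM
is concerned.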
  
Konrad Dybcio June 1, 2023, 1:29 p.m. UTC | #6
On 1.06.2023 15:23, Stephan Gerhold wrote:
> On Thu, Jun 01, 2023 at 02:43:50PM +0200, Konrad Dybcio wrote:
>> On 30.05.2023 21:02, Stephan Gerhold wrote:
>>> On Tue, May 30, 2023 at 06:32:04PM +0200, Konrad Dybcio wrote:
>>>> On 30.05.2023 12:20, Konrad Dybcio wrote:
>>>>> Ever since the introduction of SMD RPM ICC, we've been dividing the
>>>>> clock rate by the wrong bus width. This has resulted in:
>>>>>
>>>>> - setting wrong (mostly too low) rates, affecting performance
>>>>>   - most often /2 or /4
>>>>>   - things like DDR never hit their full potential
>>>>>   - the rates were only correct if src bus width == dst bus width
>>>>>     for all src, dst pairs on a given bus
>>>>>
>>>>> - Qualcomm using the same wrong logic in their BSP driver in msm-5.x
>>>>>   that ships in production devices today
>>>>>
>>>>> - me losing my sanity trying to find this
>>>>>
>>>>> Resolve it by using dst_qn, if it exists.
>>>>>
>>>>> Fixes: 5e4e6c4d3ae0 ("interconnect: qcom: Add QCS404 interconnect provider driver")
>>>>> Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
>>>>> ---
>>>> The problem is deeper.
>>>>
>>>> Chatting with Stephan (+CC), we tackled a few issues (that I will send
>>>> fixes for in v2):
>>>>
>>>> 1. qcom_icc_rpm_set() should take per-node (src_qn->sum_avg, dst_qn->sum_avg)
>>>>    and NOT aggregated bw (unless you want ALL of your nodes on a given provider
>>>>    to "go very fast")
>>>>
>>>> 2. the aggregate bw/clk rate calculation should use the node-specific bus widths
>>>>    and not only the bus width of the src/dst node, otherwise the average bw
>>>>    values will be utterly meaningless
>>>>
>>>
>>> The peak bandwidth / clock rate is wrong as well if you have two paths
>>> with different buswidths on the same bus/NoC. (If someone is interested
>>> in details I can post my specific example I had in the chat, it shows
>>> this more clearly.)
>> agg_peak takes care of that, I believe..
>>
> 
> I was just nitpicking on your description here, I think the solution
> you/we had in mind was already correct. :)
> 
>>
>>>
>>>> 3. thanks to (1) and (2) qcom_icc_bus_aggregate() can be remodeled to instead
>>>>    calculate the clock rates for the two rpm contexts, which we can then max()
>>>>    and pass on to the ratesetting call
>>>>
>>>
>>> Sounds good.
>>>
>>>>
>>>> ----8<---- Cutting off Stephan's seal of approval, this is my thinking ----
>>>>
>>>> 4. I *think* Qualcomm really made a mistake in their msm-5.4 driver where they
>>>>    took most of the logic from the current -next state and should have been
>>>>    setting the rate based on the *DST* provider, or at least that's my
>>>>    understanding trying to read the "known good" msm-4.19 driver
>>>>    (which remembers msm-3.0 lol).. Or maybe we should keep src but ensure there's
>>>>    also a final (dst, dst) vote cast:
>>>>
>>>> provider->inter_set = false // current state upstream
>>>>
>>>> setting apps_proc<->slv_bimc_snoc
>>>> setting mas_bimc_snoc<->slv_snoc_cnoc
>>>> setting mas_snoc_cnoc<->qhs_sdc2
>>>>
>>>>
>>>> provider->inter_set = true // I don't think there's effectively a difference?
>>>>
>>>> setting apps_proc<->slv_bimc_snoc
>>>> setting slv_bimc_snoc<->mas_bimc_snoc
>>>> setting mas_bimc_snoc<->slv_snoc_cnoc
>>>> setting slv_snoc_cnoc<->mas_snoc_cnoc
>>>> setting mas_snoc_cnoc<->qhs_sdc2
>>>>
>>>
>>> I think with our proposed changes above it no longer matters whether a
>>> node is passed as "src" or "dst". This means in your example above you
>>> just waste additional time setting the bandwidth twice for
>>> slv_bimc_snoc, mas_bimc_snoc, slv_snoc_cnoc and mas_snoc_cnoc.
>>> The final outcome is the same with or without "inter_set".
>> Yeah I guess due to the fact that two "real" nodes are always
>> connected by a set of "gateway" nodes, the rate will be applied..
>>
>> I am however not sure if we're supposed to set the bandwidth
>> (via qcom_icc_rpm_set()) on all of them..
>>
> 
> I think so? The nodes RPM doesn't care about shouldn't have
> a slv/mas_rpm_id.
Hm I guess the inter_set doesn't make a difference anyway, as you
pointed out.. Thankfully one thing less to fix :D

Konrad
  

Patch

diff --git a/drivers/interconnect/qcom/icc-rpm.c b/drivers/interconnect/qcom/icc-rpm.c
index 59be704364bb..58e2a8b1b7c3 100644
--- a/drivers/interconnect/qcom/icc-rpm.c
+++ b/drivers/interconnect/qcom/icc-rpm.c
@@ -340,7 +340,7 @@ static void qcom_icc_bus_aggregate(struct icc_provider *provider,
 static int qcom_icc_set(struct icc_node *src, struct icc_node *dst)
 {
 	struct qcom_icc_provider *qp;
-	struct qcom_icc_node *src_qn = NULL, *dst_qn = NULL;
+	struct qcom_icc_node *src_qn = NULL, *dst_qn = NULL, *qn = NULL;
 	struct icc_provider *provider;
 	u64 active_rate, sleep_rate;
 	u64 agg_avg[QCOM_SMD_RPM_STATE_NUM], agg_peak[QCOM_SMD_RPM_STATE_NUM];
@@ -353,6 +353,8 @@ static int qcom_icc_set(struct icc_node *src, struct icc_node *dst)
 	provider = src->provider;
 	qp = to_qcom_provider(provider);
 
+	qn = dst_qn ? dst_qn : src_qn;
+
 	qcom_icc_bus_aggregate(provider, agg_avg, agg_peak, &max_agg_avg);
 
 	ret = qcom_icc_rpm_set(src_qn, agg_avg);
@@ -372,11 +374,11 @@ static int qcom_icc_set(struct icc_node *src, struct icc_node *dst)
 	/* Intentionally keep the rates in kHz as that's what RPM accepts */
 	active_rate = max(agg_avg[QCOM_SMD_RPM_ACTIVE_STATE],
 			  agg_peak[QCOM_SMD_RPM_ACTIVE_STATE]);
-	do_div(active_rate, src_qn->buswidth);
+	do_div(active_rate, qn->buswidth);
 
 	sleep_rate = max(agg_avg[QCOM_SMD_RPM_SLEEP_STATE],
 			 agg_peak[QCOM_SMD_RPM_SLEEP_STATE]);
-	do_div(sleep_rate, src_qn->buswidth);
+	do_div(sleep_rate, qn->buswidth);
 
 	/*
 	 * Downstream checks whether the requested rate is zero, but it makes little sense