[rcu/dev,3/3] net: Use call_rcu_flush() for dst_destroy_rcu

Message ID 20221117031551.1142289-3-joel@joelfernandes.org
State New
Headers
Series [rcu/dev,1/3] net: Use call_rcu_flush() for qdisc_free_cb |

Commit Message

Joel Fernandes Nov. 17, 2022, 3:15 a.m. UTC
  In a networking test on ChromeOS, we find that using the new CONFIG_RCU_LAZY
causes a networking test to fail in the teardown phase.

The failure happens during: ip netns del <name>

Using ftrace, I found the callbacks it was queuing which this series fixes. Use
call_rcu_flush() to revert to the old behavior. With that, the test passes.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 net/core/dst.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
  

Comments

Eric Dumazet Nov. 17, 2022, 3:44 a.m. UTC | #1
On Wed, Nov 16, 2022 at 7:16 PM Joel Fernandes (Google)
<joel@joelfernandes.org> wrote:
>
> In a networking test on ChromeOS, we find that using the new CONFIG_RCU_LAZY
> causes a networking test to fail in the teardown phase.
>
> The failure happens during: ip netns del <name>

And ? What happens then next ?

>
> Using ftrace, I found the callbacks it was queuing which this series fixes. Use
> call_rcu_flush() to revert to the old behavior. With that, the test passes.

What is this test about ? What barrier was used to make it not flaky ?

Was it depending on some undocumented RCU behavior ?

Maybe adding a sysctl to force the flush would be better for functional tests ?

I would rather change the test(s), than adding call_rcu_flush(),
adding merge conflicts to future backports.
  
Joel Fernandes Nov. 17, 2022, 3:58 p.m. UTC | #2
Hello Eric,

On Wed, Nov 16, 2022 at 07:44:41PM -0800, Eric Dumazet wrote:
> On Wed, Nov 16, 2022 at 7:16 PM Joel Fernandes (Google)
> <joel@joelfernandes.org> wrote:
> >
> > In a networking test on ChromeOS, we find that using the new CONFIG_RCU_LAZY
> > causes a networking test to fail in the teardown phase.
> >
> > The failure happens during: ip netns del <name>
> 
> And ? What happens then next ?

The test is doing the 'ip netns del <name>' and then polling for the
disappearance of a network interface name for upto 5 seconds. I believe it is
using netlink to get a table of interfaces. That polling is timing out.

Here is some more details from the test's owner (copy pasting from another
bug report):
In the cleanup, we remove the netns, and thus will cause the veth pair being
removed automatically, so we use a poll to check that if the veth in the root
netns still exists to know whether the cleanup is done.

Here is a public link to the code that is failing (its in golang):
https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/platform/tast-tests/src/chromiumos/tast/local/network/virtualnet/env/env.go;drc=6c2841d6cc3eadd23e07912ec331943ee33d7de8;l=161

Here is a public link to the line of code in the actual test leading up to the above
path (this is the test that is run:
network.RoutingFallthrough.ipv4_only_primary) :
https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/platform/tast-tests/src/chromiumos/tast/local/bundles/cros/network/routing_fallthrough.go;drc=8fbf2c53960bc8917a6a01fda5405cad7c17201e;l=52

> > Using ftrace, I found the callbacks it was queuing which this series fixes. Use
> > call_rcu_flush() to revert to the old behavior. With that, the test passes.
> 
> What is this test about ? What barrier was used to make it not flaky ?

I provided the links above, let me know if you have any questions.

> Was it depending on some undocumented RCU behavior ?

This is a new RCU feature posted here for significant power-savings on
battery-powered devices:
https://lore.kernel.org/rcu/20221017140726.GG5600@paulmck-ThinkPad-P17-Gen-1/T/#m7a54809b8903b41538850194d67eb34f203c752a

There is also an LPC presentation about the same, I can dig the link if you
are interested.

> Maybe adding a sysctl to force the flush would be better for functional tests ?
> 
> I would rather change the test(s), than adding call_rcu_flush(),
> adding merge conflicts to future backports.

I am not too sure about that, I think a user might expect the network
interface to disappear from the networking tables quickly enough without
dealing with barriers or kernel iternals. However, I added the authors of the
test to this email in the hopes he can provide is point of views as well.

The general approach we are taking with this sort of thing is to use
call_rcu_flush() which is basically the same as call_rcu() for systems with
CALL_RCU_LAZY=n. You can see some examples of that in the patch series link
above. Just to note, CALL_RCU_LAZY depends on CONFIG_RCU_NOCB_CPU so its only
Android and ChromeOS that are using it. I am adding Jie to share any input,
he is from the networking team and knows this test well.

thanks,

 - Joel
  
Eric Dumazet Nov. 17, 2022, 5:17 p.m. UTC | #3
On Thu, Nov 17, 2022 at 7:58 AM Joel Fernandes <joel@joelfernandes.org> wrote:
>
> Hello Eric,
>
> On Wed, Nov 16, 2022 at 07:44:41PM -0800, Eric Dumazet wrote:
> > On Wed, Nov 16, 2022 at 7:16 PM Joel Fernandes (Google)
> > <joel@joelfernandes.org> wrote:
> > >
> > > In a networking test on ChromeOS, we find that using the new CONFIG_RCU_LAZY
> > > causes a networking test to fail in the teardown phase.
> > >
> > > The failure happens during: ip netns del <name>
> >
> > And ? What happens then next ?
>
> The test is doing the 'ip netns del <name>' and then polling for the
> disappearance of a network interface name for upto 5 seconds. I believe it is
> using netlink to get a table of interfaces. That polling is timing out.
>
> Here is some more details from the test's owner (copy pasting from another
> bug report):
> In the cleanup, we remove the netns, and thus will cause the veth pair being
> removed automatically, so we use a poll to check that if the veth in the root
> netns still exists to know whether the cleanup is done.
>
> Here is a public link to the code that is failing (its in golang):
> https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/platform/tast-tests/src/chromiumos/tast/local/network/virtualnet/env/env.go;drc=6c2841d6cc3eadd23e07912ec331943ee33d7de8;l=161
>
> Here is a public link to the line of code in the actual test leading up to the above
> path (this is the test that is run:
> network.RoutingFallthrough.ipv4_only_primary) :
> https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/platform/tast-tests/src/chromiumos/tast/local/bundles/cros/network/routing_fallthrough.go;drc=8fbf2c53960bc8917a6a01fda5405cad7c17201e;l=52
>
> > > Using ftrace, I found the callbacks it was queuing which this series fixes. Use
> > > call_rcu_flush() to revert to the old behavior. With that, the test passes.
> >
> > What is this test about ? What barrier was used to make it not flaky ?
>
> I provided the links above, let me know if you have any questions.
>
> > Was it depending on some undocumented RCU behavior ?
>
> This is a new RCU feature posted here for significant power-savings on
> battery-powered devices:
> https://lore.kernel.org/rcu/20221017140726.GG5600@paulmck-ThinkPad-P17-Gen-1/T/#m7a54809b8903b41538850194d67eb34f203c752a
>
> There is also an LPC presentation about the same, I can dig the link if you
> are interested.
>
> > Maybe adding a sysctl to force the flush would be better for functional tests ?
> >
> > I would rather change the test(s), than adding call_rcu_flush(),
> > adding merge conflicts to future backports.
>
> I am not too sure about that, I think a user might expect the network
> interface to disappear from the networking tables quickly enough without
> dealing with barriers or kernel iternals. However, I added the authors of the
> test to this email in the hopes he can provide is point of views as well.
>
> The general approach we are taking with this sort of thing is to use
> call_rcu_flush() which is basically the same as call_rcu() for systems with
> CALL_RCU_LAZY=n. You can see some examples of that in the patch series link
> above. Just to note, CALL_RCU_LAZY depends on CONFIG_RCU_NOCB_CPU so its only
> Android and ChromeOS that are using it. I am adding Jie to share any input,
> he is from the networking team and knows this test well.
>
>

I do not know what is this RCU_LAZY thing, but IMO this should be opt-in

For instance, only kfree_rcu() should use it.

We can not review hundreds of call_rcu() call sites and decide if
adding arbitrary delays cou hurt .
  
Eric Dumazet Nov. 17, 2022, 5:33 p.m. UTC | #4
On Thu, Nov 17, 2022 at 9:17 AM Eric Dumazet <edumazet@google.com> wrote:
>
> On Thu, Nov 17, 2022 at 7:58 AM Joel Fernandes <joel@joelfernandes.org> wrote:
> >
> > Hello Eric,
> >
> > On Wed, Nov 16, 2022 at 07:44:41PM -0800, Eric Dumazet wrote:
> > > On Wed, Nov 16, 2022 at 7:16 PM Joel Fernandes (Google)
> > > <joel@joelfernandes.org> wrote:
> > > >
> > > > In a networking test on ChromeOS, we find that using the new CONFIG_RCU_LAZY
> > > > causes a networking test to fail in the teardown phase.
> > > >
> > > > The failure happens during: ip netns del <name>
> > >
> > > And ? What happens then next ?
> >
> > The test is doing the 'ip netns del <name>' and then polling for the
> > disappearance of a network interface name for upto 5 seconds. I believe it is
> > using netlink to get a table of interfaces. That polling is timing out.
> >
> > Here is some more details from the test's owner (copy pasting from another
> > bug report):
> > In the cleanup, we remove the netns, and thus will cause the veth pair being
> > removed automatically, so we use a poll to check that if the veth in the root
> > netns still exists to know whether the cleanup is done.
> >
> > Here is a public link to the code that is failing (its in golang):
> > https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/platform/tast-tests/src/chromiumos/tast/local/network/virtualnet/env/env.go;drc=6c2841d6cc3eadd23e07912ec331943ee33d7de8;l=161
> >
> > Here is a public link to the line of code in the actual test leading up to the above
> > path (this is the test that is run:
> > network.RoutingFallthrough.ipv4_only_primary) :
> > https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/platform/tast-tests/src/chromiumos/tast/local/bundles/cros/network/routing_fallthrough.go;drc=8fbf2c53960bc8917a6a01fda5405cad7c17201e;l=52
> >
> > > > Using ftrace, I found the callbacks it was queuing which this series fixes. Use
> > > > call_rcu_flush() to revert to the old behavior. With that, the test passes.
> > >
> > > What is this test about ? What barrier was used to make it not flaky ?
> >
> > I provided the links above, let me know if you have any questions.
> >
> > > Was it depending on some undocumented RCU behavior ?
> >
> > This is a new RCU feature posted here for significant power-savings on
> > battery-powered devices:
> > https://lore.kernel.org/rcu/20221017140726.GG5600@paulmck-ThinkPad-P17-Gen-1/T/#m7a54809b8903b41538850194d67eb34f203c752a
> >
> > There is also an LPC presentation about the same, I can dig the link if you
> > are interested.
> >
> > > Maybe adding a sysctl to force the flush would be better for functional tests ?
> > >
> > > I would rather change the test(s), than adding call_rcu_flush(),
> > > adding merge conflicts to future backports.
> >
> > I am not too sure about that, I think a user might expect the network
> > interface to disappear from the networking tables quickly enough without
> > dealing with barriers or kernel iternals. However, I added the authors of the
> > test to this email in the hopes he can provide is point of views as well.
> >
> > The general approach we are taking with this sort of thing is to use
> > call_rcu_flush() which is basically the same as call_rcu() for systems with
> > CALL_RCU_LAZY=n. You can see some examples of that in the patch series link
> > above. Just to note, CALL_RCU_LAZY depends on CONFIG_RCU_NOCB_CPU so its only
> > Android and ChromeOS that are using it. I am adding Jie to share any input,
> > he is from the networking team and knows this test well.
> >
> >
>
> I do not know what is this RCU_LAZY thing, but IMO this should be opt-in
>
> For instance, only kfree_rcu() should use it.
>
> We can not review hundreds of call_rcu() call sites and decide if
> adding arbitrary delays cou hurt .

At a very minimum, things like rcu_barrier() should make sure that all
'lazy' callbacks are processed in a reasonable amount of time.
  
Joel Fernandes Nov. 17, 2022, 5:38 p.m. UTC | #5
On Thu, Nov 17, 2022 at 5:17 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Thu, Nov 17, 2022 at 7:58 AM Joel Fernandes <joel@joelfernandes.org> wrote:
> >
> > Hello Eric,
> >
> > On Wed, Nov 16, 2022 at 07:44:41PM -0800, Eric Dumazet wrote:
> > > On Wed, Nov 16, 2022 at 7:16 PM Joel Fernandes (Google)
> > > <joel@joelfernandes.org> wrote:
> > > >
> > > > In a networking test on ChromeOS, we find that using the new CONFIG_RCU_LAZY
> > > > causes a networking test to fail in the teardown phase.
> > > >
> > > > The failure happens during: ip netns del <name>
> > >
> > > And ? What happens then next ?
> >
> > The test is doing the 'ip netns del <name>' and then polling for the
> > disappearance of a network interface name for upto 5 seconds. I believe it is
> > using netlink to get a table of interfaces. That polling is timing out.
> >
> > Here is some more details from the test's owner (copy pasting from another
> > bug report):
> > In the cleanup, we remove the netns, and thus will cause the veth pair being
> > removed automatically, so we use a poll to check that if the veth in the root
> > netns still exists to know whether the cleanup is done.
> >
> > Here is a public link to the code that is failing (its in golang):
> > https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/platform/tast-tests/src/chromiumos/tast/local/network/virtualnet/env/env.go;drc=6c2841d6cc3eadd23e07912ec331943ee33d7de8;l=161
> >
> > Here is a public link to the line of code in the actual test leading up to the above
> > path (this is the test that is run:
> > network.RoutingFallthrough.ipv4_only_primary) :
> > https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/platform/tast-tests/src/chromiumos/tast/local/bundles/cros/network/routing_fallthrough.go;drc=8fbf2c53960bc8917a6a01fda5405cad7c17201e;l=52
> >
> > > > Using ftrace, I found the callbacks it was queuing which this series fixes. Use
> > > > call_rcu_flush() to revert to the old behavior. With that, the test passes.
> > >
> > > What is this test about ? What barrier was used to make it not flaky ?
> >
> > I provided the links above, let me know if you have any questions.
> >
> > > Was it depending on some undocumented RCU behavior ?
> >
> > This is a new RCU feature posted here for significant power-savings on
> > battery-powered devices:
> > https://lore.kernel.org/rcu/20221017140726.GG5600@paulmck-ThinkPad-P17-Gen-1/T/#m7a54809b8903b41538850194d67eb34f203c752a
> >
> > There is also an LPC presentation about the same, I can dig the link if you
> > are interested.
> >
> > > Maybe adding a sysctl to force the flush would be better for functional tests ?
> > >
> > > I would rather change the test(s), than adding call_rcu_flush(),
> > > adding merge conflicts to future backports.
> >
> > I am not too sure about that, I think a user might expect the network
> > interface to disappear from the networking tables quickly enough without
> > dealing with barriers or kernel iternals. However, I added the authors of the
> > test to this email in the hopes he can provide is point of views as well.
> >
> > The general approach we are taking with this sort of thing is to use
> > call_rcu_flush() which is basically the same as call_rcu() for systems with
> > CALL_RCU_LAZY=n. You can see some examples of that in the patch series link
> > above. Just to note, CALL_RCU_LAZY depends on CONFIG_RCU_NOCB_CPU so its only
> > Android and ChromeOS that are using it. I am adding Jie to share any input,
> > he is from the networking team and knows this test well.
> >
> >
>
> I do not know what is this RCU_LAZY thing, but IMO this should be opt-in

You should read the links I sent you. We did already try opt-in,
Thomas Gleixner made a point at LPC that we should not add new APIs
for this purpose and confuse kernel developers.

> For instance, only kfree_rcu() should use it.

No. Most of the call_rcu() usages are for freeing memory, so the
consensus is we should apply this as opt out and fix issues along the
way. We already did a lot of research/diligence on seeing which users
need conversion.

> We can not review hundreds of call_rcu() call sites and decide if
> adding arbitrary delays cou hurt .

That work has already been done as much as possible, please read the
links I sent.

Thanks.
  
Eric Dumazet Nov. 17, 2022, 5:39 p.m. UTC | #6
On Thu, Nov 17, 2022 at 9:38 AM Joel Fernandes <joel@joelfernandes.org> wrote:
>
> On Thu, Nov 17, 2022 at 5:17 PM Eric Dumazet <edumazet@google.com> wrote:
> >
> > On Thu, Nov 17, 2022 at 7:58 AM Joel Fernandes <joel@joelfernandes.org> wrote:
> > >
> > > Hello Eric,
> > >
> > > On Wed, Nov 16, 2022 at 07:44:41PM -0800, Eric Dumazet wrote:
> > > > On Wed, Nov 16, 2022 at 7:16 PM Joel Fernandes (Google)
> > > > <joel@joelfernandes.org> wrote:
> > > > >
> > > > > In a networking test on ChromeOS, we find that using the new CONFIG_RCU_LAZY
> > > > > causes a networking test to fail in the teardown phase.
> > > > >
> > > > > The failure happens during: ip netns del <name>
> > > >
> > > > And ? What happens then next ?
> > >
> > > The test is doing the 'ip netns del <name>' and then polling for the
> > > disappearance of a network interface name for upto 5 seconds. I believe it is
> > > using netlink to get a table of interfaces. That polling is timing out.
> > >
> > > Here is some more details from the test's owner (copy pasting from another
> > > bug report):
> > > In the cleanup, we remove the netns, and thus will cause the veth pair being
> > > removed automatically, so we use a poll to check that if the veth in the root
> > > netns still exists to know whether the cleanup is done.
> > >
> > > Here is a public link to the code that is failing (its in golang):
> > > https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/platform/tast-tests/src/chromiumos/tast/local/network/virtualnet/env/env.go;drc=6c2841d6cc3eadd23e07912ec331943ee33d7de8;l=161
> > >
> > > Here is a public link to the line of code in the actual test leading up to the above
> > > path (this is the test that is run:
> > > network.RoutingFallthrough.ipv4_only_primary) :
> > > https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/platform/tast-tests/src/chromiumos/tast/local/bundles/cros/network/routing_fallthrough.go;drc=8fbf2c53960bc8917a6a01fda5405cad7c17201e;l=52
> > >
> > > > > Using ftrace, I found the callbacks it was queuing which this series fixes. Use
> > > > > call_rcu_flush() to revert to the old behavior. With that, the test passes.
> > > >
> > > > What is this test about ? What barrier was used to make it not flaky ?
> > >
> > > I provided the links above, let me know if you have any questions.
> > >
> > > > Was it depending on some undocumented RCU behavior ?
> > >
> > > This is a new RCU feature posted here for significant power-savings on
> > > battery-powered devices:
> > > https://lore.kernel.org/rcu/20221017140726.GG5600@paulmck-ThinkPad-P17-Gen-1/T/#m7a54809b8903b41538850194d67eb34f203c752a
> > >
> > > There is also an LPC presentation about the same, I can dig the link if you
> > > are interested.
> > >
> > > > Maybe adding a sysctl to force the flush would be better for functional tests ?
> > > >
> > > > I would rather change the test(s), than adding call_rcu_flush(),
> > > > adding merge conflicts to future backports.
> > >
> > > I am not too sure about that, I think a user might expect the network
> > > interface to disappear from the networking tables quickly enough without
> > > dealing with barriers or kernel iternals. However, I added the authors of the
> > > test to this email in the hopes he can provide is point of views as well.
> > >
> > > The general approach we are taking with this sort of thing is to use
> > > call_rcu_flush() which is basically the same as call_rcu() for systems with
> > > CALL_RCU_LAZY=n. You can see some examples of that in the patch series link
> > > above. Just to note, CALL_RCU_LAZY depends on CONFIG_RCU_NOCB_CPU so its only
> > > Android and ChromeOS that are using it. I am adding Jie to share any input,
> > > he is from the networking team and knows this test well.
> > >
> > >
> >
> > I do not know what is this RCU_LAZY thing, but IMO this should be opt-in
>
> You should read the links I sent you. We did already try opt-in,
> Thomas Gleixner made a point at LPC that we should not add new APIs
> for this purpose and confuse kernel developers.
>
> > For instance, only kfree_rcu() should use it.
>
> No. Most of the call_rcu() usages are for freeing memory, so the
> consensus is we should apply this as opt out and fix issues along the
> way. We already did a lot of research/diligence on seeing which users
> need conversion.
>
> > We can not review hundreds of call_rcu() call sites and decide if
> > adding arbitrary delays cou hurt .
>
> That work has already been done as much as possible, please read the
> links I sent.

Oh well. No.

I will leave it to other folks dealing with this crazy thing.
  
Joel Fernandes Nov. 17, 2022, 5:40 p.m. UTC | #7
On Thu, Nov 17, 2022 at 5:38 PM Joel Fernandes <joel@joelfernandes.org> wrote:
>
> On Thu, Nov 17, 2022 at 5:17 PM Eric Dumazet <edumazet@google.com> wrote:
> >
> > On Thu, Nov 17, 2022 at 7:58 AM Joel Fernandes <joel@joelfernandes.org> wrote:
> > >
> > > Hello Eric,
> > >
> > > On Wed, Nov 16, 2022 at 07:44:41PM -0800, Eric Dumazet wrote:
> > > > On Wed, Nov 16, 2022 at 7:16 PM Joel Fernandes (Google)
> > > > <joel@joelfernandes.org> wrote:
> > > > >
> > > > > In a networking test on ChromeOS, we find that using the new CONFIG_RCU_LAZY
> > > > > causes a networking test to fail in the teardown phase.
> > > > >
> > > > > The failure happens during: ip netns del <name>
> > > >
> > > > And ? What happens then next ?
> > >
> > > The test is doing the 'ip netns del <name>' and then polling for the
> > > disappearance of a network interface name for upto 5 seconds. I believe it is
> > > using netlink to get a table of interfaces. That polling is timing out.
> > >
> > > Here is some more details from the test's owner (copy pasting from another
> > > bug report):
> > > In the cleanup, we remove the netns, and thus will cause the veth pair being
> > > removed automatically, so we use a poll to check that if the veth in the root
> > > netns still exists to know whether the cleanup is done.
> > >
> > > Here is a public link to the code that is failing (its in golang):
> > > https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/platform/tast-tests/src/chromiumos/tast/local/network/virtualnet/env/env.go;drc=6c2841d6cc3eadd23e07912ec331943ee33d7de8;l=161
> > >
> > > Here is a public link to the line of code in the actual test leading up to the above
> > > path (this is the test that is run:
> > > network.RoutingFallthrough.ipv4_only_primary) :
> > > https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/platform/tast-tests/src/chromiumos/tast/local/bundles/cros/network/routing_fallthrough.go;drc=8fbf2c53960bc8917a6a01fda5405cad7c17201e;l=52
> > >
> > > > > Using ftrace, I found the callbacks it was queuing which this series fixes. Use
> > > > > call_rcu_flush() to revert to the old behavior. With that, the test passes.
> > > >
> > > > What is this test about ? What barrier was used to make it not flaky ?
> > >
> > > I provided the links above, let me know if you have any questions.
> > >
> > > > Was it depending on some undocumented RCU behavior ?
> > >
> > > This is a new RCU feature posted here for significant power-savings on
> > > battery-powered devices:
> > > https://lore.kernel.org/rcu/20221017140726.GG5600@paulmck-ThinkPad-P17-Gen-1/T/#m7a54809b8903b41538850194d67eb34f203c752a
> > >
> > > There is also an LPC presentation about the same, I can dig the link if you
> > > are interested.
> > >
> > > > Maybe adding a sysctl to force the flush would be better for functional tests ?
> > > >
> > > > I would rather change the test(s), than adding call_rcu_flush(),
> > > > adding merge conflicts to future backports.
> > >
> > > I am not too sure about that, I think a user might expect the network
> > > interface to disappear from the networking tables quickly enough without
> > > dealing with barriers or kernel iternals. However, I added the authors of the
> > > test to this email in the hopes he can provide is point of views as well.
> > >
> > > The general approach we are taking with this sort of thing is to use
> > > call_rcu_flush() which is basically the same as call_rcu() for systems with
> > > CALL_RCU_LAZY=n. You can see some examples of that in the patch series link
> > > above. Just to note, CALL_RCU_LAZY depends on CONFIG_RCU_NOCB_CPU so its only
> > > Android and ChromeOS that are using it. I am adding Jie to share any input,
> > > he is from the networking team and knows this test well.
> > >
> > >
> >
> > I do not know what is this RCU_LAZY thing, but IMO this should be opt-in
>
> You should read the links I sent you. We did already try opt-in,
> Thomas Gleixner made a point at LPC that we should not add new APIs
> for this purpose and confuse kernel developers.
>
> > For instance, only kfree_rcu() should use it.
>
> No. Most of the call_rcu() usages are for freeing memory, so the
> consensus is we should apply this as opt out and fix issues along the
> way. We already did a lot of research/diligence on seeing which users
> need conversion.
>
> > We can not review hundreds of call_rcu() call sites and decide if
> > adding arbitrary delays cou hurt .
>
> That work has already been done as much as possible, please read the
> links I sent.

Also just to add, this test is a bit weird / corner case, as in anyone
expecting a quick response from call_rcu() is broken by design.
However, for these callbacks, it does not matter much which API they
use as they are quite infrequent for power savings.

Thanks.
  
Joel Fernandes Nov. 17, 2022, 5:42 p.m. UTC | #8
On Thu, Nov 17, 2022 at 5:40 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Thu, Nov 17, 2022 at 9:38 AM Joel Fernandes <joel@joelfernandes.org> wrote:
> >
> > On Thu, Nov 17, 2022 at 5:17 PM Eric Dumazet <edumazet@google.com> wrote:
> > >
> > > On Thu, Nov 17, 2022 at 7:58 AM Joel Fernandes <joel@joelfernandes.org> wrote:
> > > >
> > > > Hello Eric,
> > > >
> > > > On Wed, Nov 16, 2022 at 07:44:41PM -0800, Eric Dumazet wrote:
> > > > > On Wed, Nov 16, 2022 at 7:16 PM Joel Fernandes (Google)
> > > > > <joel@joelfernandes.org> wrote:
> > > > > >
> > > > > > In a networking test on ChromeOS, we find that using the new CONFIG_RCU_LAZY
> > > > > > causes a networking test to fail in the teardown phase.
> > > > > >
> > > > > > The failure happens during: ip netns del <name>
> > > > >
> > > > > And ? What happens then next ?
> > > >
> > > > The test is doing the 'ip netns del <name>' and then polling for the
> > > > disappearance of a network interface name for upto 5 seconds. I believe it is
> > > > using netlink to get a table of interfaces. That polling is timing out.
> > > >
> > > > Here is some more details from the test's owner (copy pasting from another
> > > > bug report):
> > > > In the cleanup, we remove the netns, and thus will cause the veth pair being
> > > > removed automatically, so we use a poll to check that if the veth in the root
> > > > netns still exists to know whether the cleanup is done.
> > > >
> > > > Here is a public link to the code that is failing (its in golang):
> > > > https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/platform/tast-tests/src/chromiumos/tast/local/network/virtualnet/env/env.go;drc=6c2841d6cc3eadd23e07912ec331943ee33d7de8;l=161
> > > >
> > > > Here is a public link to the line of code in the actual test leading up to the above
> > > > path (this is the test that is run:
> > > > network.RoutingFallthrough.ipv4_only_primary) :
> > > > https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/platform/tast-tests/src/chromiumos/tast/local/bundles/cros/network/routing_fallthrough.go;drc=8fbf2c53960bc8917a6a01fda5405cad7c17201e;l=52
> > > >
> > > > > > Using ftrace, I found the callbacks it was queuing which this series fixes. Use
> > > > > > call_rcu_flush() to revert to the old behavior. With that, the test passes.
> > > > >
> > > > > What is this test about ? What barrier was used to make it not flaky ?
> > > >
> > > > I provided the links above, let me know if you have any questions.
> > > >
> > > > > Was it depending on some undocumented RCU behavior ?
> > > >
> > > > This is a new RCU feature posted here for significant power-savings on
> > > > battery-powered devices:
> > > > https://lore.kernel.org/rcu/20221017140726.GG5600@paulmck-ThinkPad-P17-Gen-1/T/#m7a54809b8903b41538850194d67eb34f203c752a
> > > >
> > > > There is also an LPC presentation about the same, I can dig the link if you
> > > > are interested.
> > > >
> > > > > Maybe adding a sysctl to force the flush would be better for functional tests ?
> > > > >
> > > > > I would rather change the test(s), than adding call_rcu_flush(),
> > > > > adding merge conflicts to future backports.
> > > >
> > > > I am not too sure about that, I think a user might expect the network
> > > > interface to disappear from the networking tables quickly enough without
> > > > dealing with barriers or kernel iternals. However, I added the authors of the
> > > > test to this email in the hopes he can provide is point of views as well.
> > > >
> > > > The general approach we are taking with this sort of thing is to use
> > > > call_rcu_flush() which is basically the same as call_rcu() for systems with
> > > > CALL_RCU_LAZY=n. You can see some examples of that in the patch series link
> > > > above. Just to note, CALL_RCU_LAZY depends on CONFIG_RCU_NOCB_CPU so its only
> > > > Android and ChromeOS that are using it. I am adding Jie to share any input,
> > > > he is from the networking team and knows this test well.
> > > >
> > > >
> > >
> > > I do not know what is this RCU_LAZY thing, but IMO this should be opt-in
> >
> > You should read the links I sent you. We did already try opt-in,
> > Thomas Gleixner made a point at LPC that we should not add new APIs
> > for this purpose and confuse kernel developers.
> >
> > > For instance, only kfree_rcu() should use it.
> >
> > No. Most of the call_rcu() usages are for freeing memory, so the
> > consensus is we should apply this as opt out and fix issues along the
> > way. We already did a lot of research/diligence on seeing which users
> > need conversion.
> >
> > > We can not review hundreds of call_rcu() call sites and decide if
> > > adding arbitrary delays cou hurt .
> >
> > That work has already been done as much as possible, please read the
> > links I sent.
>
> Oh well. No.
>
> I will leave it to other folks dealing with this crazy thing.

Yes, I agree. Your comments here have not been useful (or respectful)
so I am Ok with that.

 - Joel
  
Eric Dumazet Nov. 17, 2022, 5:49 p.m. UTC | #9
On Thu, Nov 17, 2022 at 9:42 AM Joel Fernandes <joel@joelfernandes.org> wrote:
>

>
> Yes, I agree. Your comments here have not been useful (or respectful)
> so I am Ok with that.
>
>  - Joel

Well, I have discovered that some changes went in networking tree
without network maintainers being involved nor CCed.

What can I say ?

It seems I have no say, right ?


commit f32846476afbe1f296c41d036219178b3dfb6a9d
Author: Joel Fernandes (Google) <joel@joelfernandes.org>
Date:   Sun Oct 16 16:23:04 2022 +0000

    rxrpc: Use call_rcu_flush() instead of call_rcu()

    call_rcu() changes to save power may cause slowness. Use the
    call_rcu_flush() API instead which reverts to the old behavior.

    We find this via inspection that the RCU callback does a wakeup of a
    thread. This usually indicates that something is waiting on it. To be
    safe, let us use call_rcu_flush() here instead.

    Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

diff --git a/net/rxrpc/conn_object.c b/net/rxrpc/conn_object.c
index 22089e37e97f0628f780855f9e219e5c33d4afa1..fdcfb509cc4434b0781b76623532aff9c854ce60
100644
--- a/net/rxrpc/conn_object.c
+++ b/net/rxrpc/conn_object.c
@@ -253,7 +253,7 @@ void rxrpc_kill_connection(struct rxrpc_connection *conn)
         * must carry a ref on the connection to prevent us getting here whilst
         * it is queued or running.
         */
-       call_rcu(&conn->rcu, rxrpc_destroy_connection);
+       call_rcu_flush(&conn->rcu, rxrpc_destroy_connection);
 }

 /*
  
Joel Fernandes Nov. 17, 2022, 6:18 p.m. UTC | #10
On Thu, Nov 17, 2022 at 5:49 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Thu, Nov 17, 2022 at 9:42 AM Joel Fernandes <joel@joelfernandes.org> wrote:
> >
>
> >
> > Yes, I agree. Your comments here have not been useful (or respectful)
> > so I am Ok with that.
> >
> >  - Joel
>
> Well, I have discovered that some changes went in networking tree
> without network maintainers being involved nor CCed.
>
> What can I say ?
>
> It seems I have no say, right ?

Sorry, I take responsibility for that. FWIW, the rxrpc change is not
yet in Linus's tree.

Also FWIW, the rxrpc case came up because we detected that it does a
scheduler wakeup from the callback. We did both static and dynamic
testing to identify callbacks that do wakeups throughout the kernel
(kernel patch available on request), as the pattern observed is things
doing wakeups typically are for use cases that are not freeing memory
but something blocking, similar to synchronize_rcu(). So it was a
"trivial/obvious" change to make for rxrpc which I might have assumed
did not need much supervision because it just reverts that API to the
old behavior -- still probably no excuse.

Again, we can talk this out no problem. But I would strongly recommend
not calling it "crazy thing", as we did all due diligence for almost a
year (talking about it at LPC, working through various code paths and
bugs, 4 different patch redesigns on the idea (including the opt-in
that you are bringing up), including a late night debugging session to
figure this out etc).

Just to clarify, I know you review/maintain a lot of the networking
code and I really appreciate that (not praising just for the sake).
And I care about the kernel too, just like you.

thanks,

 - Joel


> commit f32846476afbe1f296c41d036219178b3dfb6a9d
> Author: Joel Fernandes (Google) <joel@joelfernandes.org>
> Date:   Sun Oct 16 16:23:04 2022 +0000
>
>     rxrpc: Use call_rcu_flush() instead of call_rcu()
>
>     call_rcu() changes to save power may cause slowness. Use the
>     call_rcu_flush() API instead which reverts to the old behavior.
>
>     We find this via inspection that the RCU callback does a wakeup of a
>     thread. This usually indicates that something is waiting on it. To be
>     safe, let us use call_rcu_flush() here instead.
>
>     Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
>     Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
>
> diff --git a/net/rxrpc/conn_object.c b/net/rxrpc/conn_object.c
> index 22089e37e97f0628f780855f9e219e5c33d4afa1..fdcfb509cc4434b0781b76623532aff9c854ce60
> 100644
> --- a/net/rxrpc/conn_object.c
> +++ b/net/rxrpc/conn_object.c
> @@ -253,7 +253,7 @@ void rxrpc_kill_connection(struct rxrpc_connection *conn)
>          * must carry a ref on the connection to prevent us getting here whilst
>          * it is queued or running.
>          */
> -       call_rcu(&conn->rcu, rxrpc_destroy_connection);
> +       call_rcu_flush(&conn->rcu, rxrpc_destroy_connection);
>  }
>
>  /*
  
Eric Dumazet Nov. 17, 2022, 6:22 p.m. UTC | #11
On Thu, Nov 17, 2022 at 10:18 AM Joel Fernandes <joel@joelfernandes.org> wrote:
>
> On Thu, Nov 17, 2022 at 5:49 PM Eric Dumazet <edumazet@google.com> wrote:
> >
> > On Thu, Nov 17, 2022 at 9:42 AM Joel Fernandes <joel@joelfernandes.org> wrote:
> > >
> >
> > >
> > > Yes, I agree. Your comments here have not been useful (or respectful)
> > > so I am Ok with that.
> > >
> > >  - Joel
> >
> > Well, I have discovered that some changes went in networking tree
> > without network maintainers being involved nor CCed.
> >
> > What can I say ?
> >
> > It seems I have no say, right ?
>
> Sorry, I take responsibility for that. FWIW, the rxrpc change is not
> yet in Linus's tree.
>
> Also FWIW, the rxrpc case came up because we detected that it does a
> scheduler wakeup from the callback. We did both static and dynamic
> testing to identify callbacks that do wakeups throughout the kernel
> (kernel patch available on request), as the pattern observed is things
> doing wakeups typically are for use cases that are not freeing memory
> but something blocking, similar to synchronize_rcu(). So it was a
> "trivial/obvious" change to make for rxrpc which I might have assumed
> did not need much supervision because it just reverts that API to the
> old behavior -- still probably no excuse.
>
> Again, we can talk this out no problem. But I would strongly recommend
> not calling it "crazy thing", as we did all due diligence for almost a
> year (talking about it at LPC, working through various code paths and
> bugs, 4 different patch redesigns on the idea (including the opt-in
> that you are bringing up), including a late night debugging session to
> figure this out etc).

Apologies.

For me "crazy" does not have the same meaning, apparently.

I will try to use more neutral words in the future.

>
> Just to clarify, I know you review/maintain a lot of the networking
> code and I really appreciate that (not praising just for the sake).
> And I care about the kernel too, just like you.

I had no doubts about that, really.
  
Paul E. McKenney Nov. 17, 2022, 7:29 p.m. UTC | #12
On Thu, Nov 17, 2022 at 05:40:40PM +0000, Joel Fernandes wrote:
> On Thu, Nov 17, 2022 at 5:38 PM Joel Fernandes <joel@joelfernandes.org> wrote:
> >
> > On Thu, Nov 17, 2022 at 5:17 PM Eric Dumazet <edumazet@google.com> wrote:
> > >
> > > On Thu, Nov 17, 2022 at 7:58 AM Joel Fernandes <joel@joelfernandes.org> wrote:
> > > >
> > > > Hello Eric,
> > > >
> > > > On Wed, Nov 16, 2022 at 07:44:41PM -0800, Eric Dumazet wrote:
> > > > > On Wed, Nov 16, 2022 at 7:16 PM Joel Fernandes (Google)
> > > > > <joel@joelfernandes.org> wrote:
> > > > > >
> > > > > > In a networking test on ChromeOS, we find that using the new CONFIG_RCU_LAZY
> > > > > > causes a networking test to fail in the teardown phase.
> > > > > >
> > > > > > The failure happens during: ip netns del <name>
> > > > >
> > > > > And ? What happens then next ?
> > > >
> > > > The test is doing the 'ip netns del <name>' and then polling for the
> > > > disappearance of a network interface name for upto 5 seconds. I believe it is
> > > > using netlink to get a table of interfaces. That polling is timing out.
> > > >
> > > > Here is some more details from the test's owner (copy pasting from another
> > > > bug report):
> > > > In the cleanup, we remove the netns, and thus will cause the veth pair being
> > > > removed automatically, so we use a poll to check that if the veth in the root
> > > > netns still exists to know whether the cleanup is done.
> > > >
> > > > Here is a public link to the code that is failing (its in golang):
> > > > https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/platform/tast-tests/src/chromiumos/tast/local/network/virtualnet/env/env.go;drc=6c2841d6cc3eadd23e07912ec331943ee33d7de8;l=161
> > > >
> > > > Here is a public link to the line of code in the actual test leading up to the above
> > > > path (this is the test that is run:
> > > > network.RoutingFallthrough.ipv4_only_primary) :
> > > > https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/platform/tast-tests/src/chromiumos/tast/local/bundles/cros/network/routing_fallthrough.go;drc=8fbf2c53960bc8917a6a01fda5405cad7c17201e;l=52
> > > >
> > > > > > Using ftrace, I found the callbacks it was queuing which this series fixes. Use
> > > > > > call_rcu_flush() to revert to the old behavior. With that, the test passes.
> > > > >
> > > > > What is this test about ? What barrier was used to make it not flaky ?
> > > >
> > > > I provided the links above, let me know if you have any questions.
> > > >
> > > > > Was it depending on some undocumented RCU behavior ?
> > > >
> > > > This is a new RCU feature posted here for significant power-savings on
> > > > battery-powered devices:
> > > > https://lore.kernel.org/rcu/20221017140726.GG5600@paulmck-ThinkPad-P17-Gen-1/T/#m7a54809b8903b41538850194d67eb34f203c752a
> > > >
> > > > There is also an LPC presentation about the same, I can dig the link if you
> > > > are interested.
> > > >
> > > > > Maybe adding a sysctl to force the flush would be better for functional tests ?
> > > > >
> > > > > I would rather change the test(s), than adding call_rcu_flush(),
> > > > > adding merge conflicts to future backports.
> > > >
> > > > I am not too sure about that, I think a user might expect the network
> > > > interface to disappear from the networking tables quickly enough without
> > > > dealing with barriers or kernel iternals. However, I added the authors of the
> > > > test to this email in the hopes he can provide is point of views as well.
> > > >
> > > > The general approach we are taking with this sort of thing is to use
> > > > call_rcu_flush() which is basically the same as call_rcu() for systems with
> > > > CALL_RCU_LAZY=n. You can see some examples of that in the patch series link
> > > > above. Just to note, CALL_RCU_LAZY depends on CONFIG_RCU_NOCB_CPU so its only
> > > > Android and ChromeOS that are using it. I am adding Jie to share any input,
> > > > he is from the networking team and knows this test well.
> > > >
> > > >
> > >
> > > I do not know what is this RCU_LAZY thing, but IMO this should be opt-in
> >
> > You should read the links I sent you. We did already try opt-in,
> > Thomas Gleixner made a point at LPC that we should not add new APIs
> > for this purpose and confuse kernel developers.
> >
> > > For instance, only kfree_rcu() should use it.
> >
> > No. Most of the call_rcu() usages are for freeing memory, so the
> > consensus is we should apply this as opt out and fix issues along the
> > way. We already did a lot of research/diligence on seeing which users
> > need conversion.
> >
> > > We can not review hundreds of call_rcu() call sites and decide if
> > > adding arbitrary delays cou hurt .
> >
> > That work has already been done as much as possible, please read the
> > links I sent.
> 
> Also just to add, this test is a bit weird / corner case, as in anyone
> expecting a quick response from call_rcu() is broken by design.
> However, for these callbacks, it does not matter much which API they
> use as they are quite infrequent for power savings.

The "broken by design" is a bit strong.  Some of those call_rcu()
invocations have been around for the better part of 20 years, after all.

That aside, I do hope that we can arrive at something that will enhance
battery lifetime while avoiding unnecessary disruption.  But we are
unlikely to be able to completely avoid disruption.  As this email
thread illustrates.  ;-)

							Thanx, Paul
  
Joel Fernandes Nov. 17, 2022, 9:16 p.m. UTC | #13
> On Nov 17, 2022, at 2:29 PM, Paul E. McKenney <paulmck@kernel.org> wrote:
> 
> On Thu, Nov 17, 2022 at 05:40:40PM +0000, Joel Fernandes wrote:
>>> On Thu, Nov 17, 2022 at 5:38 PM Joel Fernandes <joel@joelfernandes.org> wrote:
>>> 
>>> On Thu, Nov 17, 2022 at 5:17 PM Eric Dumazet <edumazet@google.com> wrote:
>>>> 
>>>> On Thu, Nov 17, 2022 at 7:58 AM Joel Fernandes <joel@joelfernandes.org> wrote:
>>>>> 
>>>>> Hello Eric,
>>>>> 
>>>>> On Wed, Nov 16, 2022 at 07:44:41PM -0800, Eric Dumazet wrote:
>>>>>> On Wed, Nov 16, 2022 at 7:16 PM Joel Fernandes (Google)
>>>>>> <joel@joelfernandes.org> wrote:
>>>>>>> 
>>>>>>> In a networking test on ChromeOS, we find that using the new CONFIG_RCU_LAZY
>>>>>>> causes a networking test to fail in the teardown phase.
>>>>>>> 
>>>>>>> The failure happens during: ip netns del <name>
>>>>>> 
>>>>>> And ? What happens then next ?
>>>>> 
>>>>> The test is doing the 'ip netns del <name>' and then polling for the
>>>>> disappearance of a network interface name for upto 5 seconds. I believe it is
>>>>> using netlink to get a table of interfaces. That polling is timing out.
>>>>> 
>>>>> Here is some more details from the test's owner (copy pasting from another
>>>>> bug report):
>>>>> In the cleanup, we remove the netns, and thus will cause the veth pair being
>>>>> removed automatically, so we use a poll to check that if the veth in the root
>>>>> netns still exists to know whether the cleanup is done.
>>>>> 
>>>>> Here is a public link to the code that is failing (its in golang):
>>>>> https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/platform/tast-tests/src/chromiumos/tast/local/network/virtualnet/env/env.go;drc=6c2841d6cc3eadd23e07912ec331943ee33d7de8;l=161
>>>>> 
>>>>> Here is a public link to the line of code in the actual test leading up to the above
>>>>> path (this is the test that is run:
>>>>> network.RoutingFallthrough.ipv4_only_primary) :
>>>>> https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/platform/tast-tests/src/chromiumos/tast/local/bundles/cros/network/routing_fallthrough.go;drc=8fbf2c53960bc8917a6a01fda5405cad7c17201e;l=52
>>>>> 
>>>>>>> Using ftrace, I found the callbacks it was queuing which this series fixes. Use
>>>>>>> call_rcu_flush() to revert to the old behavior. With that, the test passes.
>>>>>> 
>>>>>> What is this test about ? What barrier was used to make it not flaky ?
>>>>> 
>>>>> I provided the links above, let me know if you have any questions.
>>>>> 
>>>>>> Was it depending on some undocumented RCU behavior ?
>>>>> 
>>>>> This is a new RCU feature posted here for significant power-savings on
>>>>> battery-powered devices:
>>>>> https://lore.kernel.org/rcu/20221017140726.GG5600@paulmck-ThinkPad-P17-Gen-1/T/#m7a54809b8903b41538850194d67eb34f203c752a
>>>>> 
>>>>> There is also an LPC presentation about the same, I can dig the link if you
>>>>> are interested.
>>>>> 
>>>>>> Maybe adding a sysctl to force the flush would be better for functional tests ?
>>>>>> 
>>>>>> I would rather change the test(s), than adding call_rcu_flush(),
>>>>>> adding merge conflicts to future backports.
>>>>> 
>>>>> I am not too sure about that, I think a user might expect the network
>>>>> interface to disappear from the networking tables quickly enough without
>>>>> dealing with barriers or kernel iternals. However, I added the authors of the
>>>>> test to this email in the hopes he can provide is point of views as well.
>>>>> 
>>>>> The general approach we are taking with this sort of thing is to use
>>>>> call_rcu_flush() which is basically the same as call_rcu() for systems with
>>>>> CALL_RCU_LAZY=n. You can see some examples of that in the patch series link
>>>>> above. Just to note, CALL_RCU_LAZY depends on CONFIG_RCU_NOCB_CPU so its only
>>>>> Android and ChromeOS that are using it. I am adding Jie to share any input,
>>>>> he is from the networking team and knows this test well.
>>>>> 
>>>>> 
>>>> 
>>>> I do not know what is this RCU_LAZY thing, but IMO this should be opt-in
>>> 
>>> You should read the links I sent you. We did already try opt-in,
>>> Thomas Gleixner made a point at LPC that we should not add new APIs
>>> for this purpose and confuse kernel developers.
>>> 
>>>> For instance, only kfree_rcu() should use it.
>>> 
>>> No. Most of the call_rcu() usages are for freeing memory, so the
>>> consensus is we should apply this as opt out and fix issues along the
>>> way. We already did a lot of research/diligence on seeing which users
>>> need conversion.
>>> 
>>>> We can not review hundreds of call_rcu() call sites and decide if
>>>> adding arbitrary delays cou hurt .
>>> 
>>> That work has already been done as much as possible, please read the
>>> links I sent.
>> 
>> Also just to add, this test is a bit weird / corner case, as in anyone
>> expecting a quick response from call_rcu() is broken by design.
>> However, for these callbacks, it does not matter much which API they
>> use as they are quite infrequent for power savings.
> 
> The "broken by design" is a bit strong.  Some of those call_rcu()
> invocations have been around for the better part of 20 years, after all.
> 
> That aside, I do hope that we can arrive at something that will enhance
> battery lifetime while avoiding unnecessary disruption.  But we are
> unlikely to be able to completely avoid disruption.  As this email
> thread illustrates.  ;-)

Another approach, with these 3 patches could be to keep the call_rcu() but add an rcu_barrier() after them. I think people running ip del netns should not have to wait for their RCU cb to take too long to run and remove user visible state. But I would need suggestions from networking experts which CBs of these 3, to do this for. Or for all of them.

Alternatively, we can also patch just the test with a new knob that does rcu_barrier. But I dislike that as it does not fix it for all users. Probably the ip utilities will also need a patch then.

Thanks,

- Joel 


> 
>                            Thanx, Paul
  
Eric Dumazet Nov. 17, 2022, 9:29 p.m. UTC | #14
On Thu, Nov 17, 2022 at 1:16 PM Joel Fernandes <joel@joelfernandes.org> wrote:
>
>
>
> > On Nov 17, 2022, at 2:29 PM, Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > On Thu, Nov 17, 2022 at 05:40:40PM +0000, Joel Fernandes wrote:
> >>> On Thu, Nov 17, 2022 at 5:38 PM Joel Fernandes <joel@joelfernandes.org> wrote:
> >>>
> >>> On Thu, Nov 17, 2022 at 5:17 PM Eric Dumazet <edumazet@google.com> wrote:
> >>>>
> >>>> On Thu, Nov 17, 2022 at 7:58 AM Joel Fernandes <joel@joelfernandes.org> wrote:
> >>>>>
> >>>>> Hello Eric,
> >>>>>
> >>>>> On Wed, Nov 16, 2022 at 07:44:41PM -0800, Eric Dumazet wrote:
> >>>>>> On Wed, Nov 16, 2022 at 7:16 PM Joel Fernandes (Google)
> >>>>>> <joel@joelfernandes.org> wrote:
> >>>>>>>
> >>>>>>> In a networking test on ChromeOS, we find that using the new CONFIG_RCU_LAZY
> >>>>>>> causes a networking test to fail in the teardown phase.
> >>>>>>>
> >>>>>>> The failure happens during: ip netns del <name>
> >>>>>>
> >>>>>> And ? What happens then next ?
> >>>>>
> >>>>> The test is doing the 'ip netns del <name>' and then polling for the
> >>>>> disappearance of a network interface name for upto 5 seconds. I believe it is
> >>>>> using netlink to get a table of interfaces. That polling is timing out.
> >>>>>
> >>>>> Here is some more details from the test's owner (copy pasting from another
> >>>>> bug report):
> >>>>> In the cleanup, we remove the netns, and thus will cause the veth pair being
> >>>>> removed automatically, so we use a poll to check that if the veth in the root
> >>>>> netns still exists to know whether the cleanup is done.
> >>>>>
> >>>>> Here is a public link to the code that is failing (its in golang):
> >>>>> https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/platform/tast-tests/src/chromiumos/tast/local/network/virtualnet/env/env.go;drc=6c2841d6cc3eadd23e07912ec331943ee33d7de8;l=161
> >>>>>
> >>>>> Here is a public link to the line of code in the actual test leading up to the above
> >>>>> path (this is the test that is run:
> >>>>> network.RoutingFallthrough.ipv4_only_primary) :
> >>>>> https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/platform/tast-tests/src/chromiumos/tast/local/bundles/cros/network/routing_fallthrough.go;drc=8fbf2c53960bc8917a6a01fda5405cad7c17201e;l=52
> >>>>>
> >>>>>>> Using ftrace, I found the callbacks it was queuing which this series fixes. Use
> >>>>>>> call_rcu_flush() to revert to the old behavior. With that, the test passes.
> >>>>>>
> >>>>>> What is this test about ? What barrier was used to make it not flaky ?
> >>>>>
> >>>>> I provided the links above, let me know if you have any questions.
> >>>>>
> >>>>>> Was it depending on some undocumented RCU behavior ?
> >>>>>
> >>>>> This is a new RCU feature posted here for significant power-savings on
> >>>>> battery-powered devices:
> >>>>> https://lore.kernel.org/rcu/20221017140726.GG5600@paulmck-ThinkPad-P17-Gen-1/T/#m7a54809b8903b41538850194d67eb34f203c752a
> >>>>>
> >>>>> There is also an LPC presentation about the same, I can dig the link if you
> >>>>> are interested.
> >>>>>
> >>>>>> Maybe adding a sysctl to force the flush would be better for functional tests ?
> >>>>>>
> >>>>>> I would rather change the test(s), than adding call_rcu_flush(),
> >>>>>> adding merge conflicts to future backports.
> >>>>>
> >>>>> I am not too sure about that, I think a user might expect the network
> >>>>> interface to disappear from the networking tables quickly enough without
> >>>>> dealing with barriers or kernel iternals. However, I added the authors of the
> >>>>> test to this email in the hopes he can provide is point of views as well.
> >>>>>
> >>>>> The general approach we are taking with this sort of thing is to use
> >>>>> call_rcu_flush() which is basically the same as call_rcu() for systems with
> >>>>> CALL_RCU_LAZY=n. You can see some examples of that in the patch series link
> >>>>> above. Just to note, CALL_RCU_LAZY depends on CONFIG_RCU_NOCB_CPU so its only
> >>>>> Android and ChromeOS that are using it. I am adding Jie to share any input,
> >>>>> he is from the networking team and knows this test well.
> >>>>>
> >>>>>
> >>>>
> >>>> I do not know what is this RCU_LAZY thing, but IMO this should be opt-in
> >>>
> >>> You should read the links I sent you. We did already try opt-in,
> >>> Thomas Gleixner made a point at LPC that we should not add new APIs
> >>> for this purpose and confuse kernel developers.
> >>>
> >>>> For instance, only kfree_rcu() should use it.
> >>>
> >>> No. Most of the call_rcu() usages are for freeing memory, so the
> >>> consensus is we should apply this as opt out and fix issues along the
> >>> way. We already did a lot of research/diligence on seeing which users
> >>> need conversion.
> >>>
> >>>> We can not review hundreds of call_rcu() call sites and decide if
> >>>> adding arbitrary delays cou hurt .
> >>>
> >>> That work has already been done as much as possible, please read the
> >>> links I sent.
> >>
> >> Also just to add, this test is a bit weird / corner case, as in anyone
> >> expecting a quick response from call_rcu() is broken by design.
> >> However, for these callbacks, it does not matter much which API they
> >> use as they are quite infrequent for power savings.
> >
> > The "broken by design" is a bit strong.  Some of those call_rcu()
> > invocations have been around for the better part of 20 years, after all.
> >
> > That aside, I do hope that we can arrive at something that will enhance
> > battery lifetime while avoiding unnecessary disruption.  But we are
> > unlikely to be able to completely avoid disruption.  As this email
> > thread illustrates.  ;-)
>
> Another approach, with these 3 patches could be to keep the call_rcu() but add an rcu_barrier() after them. I think people running ip del netns should not have to wait for their RCU cb to take too long to run and remove user visible state. But I would need suggestions from networking experts which CBs of these 3, to do this for. Or for all of them.
>
> Alternatively, we can also patch just the test with a new knob that does rcu_barrier. But I dislike that as it does not fix it for all users. Probably the ip utilities will also need a patch then.
>

Normally we have an rcu_barrier() in netns dismantle path already at a
strategic location ( in cleanup_net() )

Maybe the issue here is that some particular layers need another one.
Or we need to release a blocking reference before the call_rcu().
Some call_rcu() usages might not be optimal in this respect.

We should not add an rcu_barrier() after a call_rcu(), we prefer
factoring these expensive operations.
  
Joel Fernandes Nov. 18, 2022, 1:05 a.m. UTC | #15
On Thu, Nov 17, 2022 at 9:29 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Thu, Nov 17, 2022 at 1:16 PM Joel Fernandes <joel@joelfernandes.org> wrote:
> >
> >
> >
> > > On Nov 17, 2022, at 2:29 PM, Paul E. McKenney <paulmck@kernel.org> wrote:
> > >
> > > On Thu, Nov 17, 2022 at 05:40:40PM +0000, Joel Fernandes wrote:
> > >>> On Thu, Nov 17, 2022 at 5:38 PM Joel Fernandes <joel@joelfernandes.org> wrote:
> > >>>
> > >>> On Thu, Nov 17, 2022 at 5:17 PM Eric Dumazet <edumazet@google.com> wrote:
> > >>>>
> > >>>> On Thu, Nov 17, 2022 at 7:58 AM Joel Fernandes <joel@joelfernandes.org> wrote:
> > >>>>>
> > >>>>> Hello Eric,
> > >>>>>
> > >>>>> On Wed, Nov 16, 2022 at 07:44:41PM -0800, Eric Dumazet wrote:
> > >>>>>> On Wed, Nov 16, 2022 at 7:16 PM Joel Fernandes (Google)
> > >>>>>> <joel@joelfernandes.org> wrote:
> > >>>>>>>
> > >>>>>>> In a networking test on ChromeOS, we find that using the new CONFIG_RCU_LAZY
> > >>>>>>> causes a networking test to fail in the teardown phase.
> > >>>>>>>
> > >>>>>>> The failure happens during: ip netns del <name>
> > >>>>>>
> > >>>>>> And ? What happens then next ?
> > >>>>>
> > >>>>> The test is doing the 'ip netns del <name>' and then polling for the
> > >>>>> disappearance of a network interface name for upto 5 seconds. I believe it is
> > >>>>> using netlink to get a table of interfaces. That polling is timing out.
> > >>>>>
> > >>>>> Here is some more details from the test's owner (copy pasting from another
> > >>>>> bug report):
> > >>>>> In the cleanup, we remove the netns, and thus will cause the veth pair being
> > >>>>> removed automatically, so we use a poll to check that if the veth in the root
> > >>>>> netns still exists to know whether the cleanup is done.
> > >>>>>
> > >>>>> Here is a public link to the code that is failing (its in golang):
> > >>>>> https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/platform/tast-tests/src/chromiumos/tast/local/network/virtualnet/env/env.go;drc=6c2841d6cc3eadd23e07912ec331943ee33d7de8;l=161
> > >>>>>
> > >>>>> Here is a public link to the line of code in the actual test leading up to the above
> > >>>>> path (this is the test that is run:
> > >>>>> network.RoutingFallthrough.ipv4_only_primary) :
> > >>>>> https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/platform/tast-tests/src/chromiumos/tast/local/bundles/cros/network/routing_fallthrough.go;drc=8fbf2c53960bc8917a6a01fda5405cad7c17201e;l=52
> > >>>>>
> > >>>>>>> Using ftrace, I found the callbacks it was queuing which this series fixes. Use
> > >>>>>>> call_rcu_flush() to revert to the old behavior. With that, the test passes.
> > >>>>>>
> > >>>>>> What is this test about ? What barrier was used to make it not flaky ?
> > >>>>>
> > >>>>> I provided the links above, let me know if you have any questions.
> > >>>>>
> > >>>>>> Was it depending on some undocumented RCU behavior ?
> > >>>>>
> > >>>>> This is a new RCU feature posted here for significant power-savings on
> > >>>>> battery-powered devices:
> > >>>>> https://lore.kernel.org/rcu/20221017140726.GG5600@paulmck-ThinkPad-P17-Gen-1/T/#m7a54809b8903b41538850194d67eb34f203c752a
> > >>>>>
> > >>>>> There is also an LPC presentation about the same, I can dig the link if you
> > >>>>> are interested.
> > >>>>>
> > >>>>>> Maybe adding a sysctl to force the flush would be better for functional tests ?
> > >>>>>>
> > >>>>>> I would rather change the test(s), than adding call_rcu_flush(),
> > >>>>>> adding merge conflicts to future backports.
> > >>>>>
> > >>>>> I am not too sure about that, I think a user might expect the network
> > >>>>> interface to disappear from the networking tables quickly enough without
> > >>>>> dealing with barriers or kernel iternals. However, I added the authors of the
> > >>>>> test to this email in the hopes he can provide is point of views as well.
> > >>>>>
> > >>>>> The general approach we are taking with this sort of thing is to use
> > >>>>> call_rcu_flush() which is basically the same as call_rcu() for systems with
> > >>>>> CALL_RCU_LAZY=n. You can see some examples of that in the patch series link
> > >>>>> above. Just to note, CALL_RCU_LAZY depends on CONFIG_RCU_NOCB_CPU so its only
> > >>>>> Android and ChromeOS that are using it. I am adding Jie to share any input,
> > >>>>> he is from the networking team and knows this test well.
> > >>>>>
> > >>>>>
> > >>>>
> > >>>> I do not know what is this RCU_LAZY thing, but IMO this should be opt-in
> > >>>
> > >>> You should read the links I sent you. We did already try opt-in,
> > >>> Thomas Gleixner made a point at LPC that we should not add new APIs
> > >>> for this purpose and confuse kernel developers.
> > >>>
> > >>>> For instance, only kfree_rcu() should use it.
> > >>>
> > >>> No. Most of the call_rcu() usages are for freeing memory, so the
> > >>> consensus is we should apply this as opt out and fix issues along the
> > >>> way. We already did a lot of research/diligence on seeing which users
> > >>> need conversion.
> > >>>
> > >>>> We can not review hundreds of call_rcu() call sites and decide if
> > >>>> adding arbitrary delays cou hurt .
> > >>>
> > >>> That work has already been done as much as possible, please read the
> > >>> links I sent.
> > >>
> > >> Also just to add, this test is a bit weird / corner case, as in anyone
> > >> expecting a quick response from call_rcu() is broken by design.
> > >> However, for these callbacks, it does not matter much which API they
> > >> use as they are quite infrequent for power savings.
> > >
> > > The "broken by design" is a bit strong.  Some of those call_rcu()
> > > invocations have been around for the better part of 20 years, after all.
> > >
> > > That aside, I do hope that we can arrive at something that will enhance
> > > battery lifetime while avoiding unnecessary disruption.  But we are
> > > unlikely to be able to completely avoid disruption.  As this email
> > > thread illustrates.  ;-)
> >
> > Another approach, with these 3 patches could be to keep the call_rcu() but add an rcu_barrier() after them. I think people running ip del netns should not have to wait for their RCU cb to take too long to run and remove user visible state. But I would need suggestions from networking experts which CBs of these 3, to do this for. Or for all of them.
> >
> > Alternatively, we can also patch just the test with a new knob that does rcu_barrier. But I dislike that as it does not fix it for all users. Probably the ip utilities will also need a patch then.
> >
>
> Normally we have an rcu_barrier() in netns dismantle path already at a
> strategic location ( in cleanup_net() )
>
> Maybe the issue here is that some particular layers need another one.
> Or we need to release a blocking reference before the call_rcu().
> Some call_rcu() usages might not be optimal in this respect.
>
> We should not add an rcu_barrier() after a call_rcu(), we prefer
> factoring these expensive operations.

Sounds good! The dst_destroy_rcu() function appears complex to break
down (similar to how you suggested in 2/3). The dst->ops->destroy()
can decrement refcounts and so forth. We could audit all such dst->ops
users, but I wonder if it is safer to use call_rcu_flush() for this
patch.

Thanks,

 - Joel
  

Patch

diff --git a/net/core/dst.c b/net/core/dst.c
index bc9c9be4e080..15b16322703f 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -174,7 +174,7 @@  void dst_release(struct dst_entry *dst)
 			net_warn_ratelimited("%s: dst:%p refcnt:%d\n",
 					     __func__, dst, newrefcnt);
 		if (!newrefcnt)
-			call_rcu(&dst->rcu_head, dst_destroy_rcu);
+			call_rcu_flush(&dst->rcu_head, dst_destroy_rcu);
 	}
 }
 EXPORT_SYMBOL(dst_release);