[v2,0/2] nfsd: sanely handle inabilty to fetch pre/post attributes

Message ID 20230720-bz2223560-v2-0-070aaf2660b7@kernel.org
Headers
Series nfsd: sanely handle inabilty to fetch pre/post attributes |

Message

Jeff Layton July 20, 2023, 6:23 p.m. UTC
  Boyang reported tripping the BUG_ON in set_change_info. While we
couldn't confirm it, one way this could happen would be for nfsd_lookup
to succeed and then for fh_fill_both_attrs to fail.

This patchset attempts to (sanely) fix this, usually by aborting the
operation if fetching the pre attributes fails. Post-op attribute fetch
handling is more difficult to deal with however since we've already done
the operation, so this has it just fudge the change_info4 if that
occurs.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
Changes in v2:
- make fh_fill_*_attrs return an error and have the callers handle it
- rework of set_change_info, to better handle missing pre/post attrs

---
Jeff Layton (2):
      nfsd: handle failure to collect pre/post-op attrs more sanely
      nfsd: remove unsafe BUG_ON from set_change_info

 fs/nfsd/nfs3proc.c |  4 +++-
 fs/nfsd/nfs4proc.c | 45 +++++++++++++++++++++++++++++++++------
 fs/nfsd/nfsfh.c    | 26 ++++++++++++++---------
 fs/nfsd/nfsfh.h    |  6 +++---
 fs/nfsd/vfs.c      | 62 ++++++++++++++++++++++++++++++++++--------------------
 fs/nfsd/xdr4.h     | 11 ----------
 6 files changed, 100 insertions(+), 54 deletions(-)
---
base-commit: 070f391ca4d48e1750ee6108eb44f751a9e9448e
change-id: 20230720-bz2223560-9c4690a8217b

Best regards,
  

Comments

NeilBrown July 20, 2023, 9:42 p.m. UTC | #1
On Fri, 21 Jul 2023, Jeff Layton wrote:
> Boyang reported tripping the BUG_ON in set_change_info. While we
> couldn't confirm it, one way this could happen would be for nfsd_lookup
> to succeed and then for fh_fill_both_attrs to fail.
> 
> This patchset attempts to (sanely) fix this, usually by aborting the
> operation if fetching the pre attributes fails. Post-op attribute fetch
> handling is more difficult to deal with however since we've already done
> the operation, so this has it just fudge the change_info4 if that
> occurs.

I think both v3 and v4 allow a reply that says "the operation was a
success but there are no post-op attrs".  With v4 you can say "there is
no change-attr, but here are some other attrs".  I think.

Our xdr-encoding doesn't make that easy, but it is just a "simple matter
of coding".  If you think it is worth it.

NeilBrown


> 
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
> Changes in v2:
> - make fh_fill_*_attrs return an error and have the callers handle it
> - rework of set_change_info, to better handle missing pre/post attrs
> 
> ---
> Jeff Layton (2):
>       nfsd: handle failure to collect pre/post-op attrs more sanely
>       nfsd: remove unsafe BUG_ON from set_change_info
> 
>  fs/nfsd/nfs3proc.c |  4 +++-
>  fs/nfsd/nfs4proc.c | 45 +++++++++++++++++++++++++++++++++------
>  fs/nfsd/nfsfh.c    | 26 ++++++++++++++---------
>  fs/nfsd/nfsfh.h    |  6 +++---
>  fs/nfsd/vfs.c      | 62 ++++++++++++++++++++++++++++++++++--------------------
>  fs/nfsd/xdr4.h     | 11 ----------
>  6 files changed, 100 insertions(+), 54 deletions(-)
> ---
> base-commit: 070f391ca4d48e1750ee6108eb44f751a9e9448e
> change-id: 20230720-bz2223560-9c4690a8217b
> 
> Best regards,
> -- 
> Jeff Layton <jlayton@kernel.org>
> 
>
  
Chuck Lever July 20, 2023, 11:15 p.m. UTC | #2
On Fri, Jul 21, 2023 at 07:42:47AM +1000, NeilBrown wrote:
> On Fri, 21 Jul 2023, Jeff Layton wrote:
> > Boyang reported tripping the BUG_ON in set_change_info. While we
> > couldn't confirm it, one way this could happen would be for nfsd_lookup
> > to succeed and then for fh_fill_both_attrs to fail.
> > 
> > This patchset attempts to (sanely) fix this, usually by aborting the
> > operation if fetching the pre attributes fails. Post-op attribute fetch
> > handling is more difficult to deal with however since we've already done
> > the operation, so this has it just fudge the change_info4 if that
> > occurs.
> 
> I think both v3 and v4 allow a reply that says "the operation was a
> success but there are no post-op attrs".  With v4 you can say "there is
> no change-attr, but here are some other attrs".  I think.

If the protocols enable NFSD to avoid returning made-up values, I'm
all for it.


> Our xdr-encoding doesn't make that easy, but it is just a "simple matter
> of coding".  If you think it is worth it.
> 
> NeilBrown
> 
> 
> > 
> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > ---
> > Changes in v2:
> > - make fh_fill_*_attrs return an error and have the callers handle it
> > - rework of set_change_info, to better handle missing pre/post attrs
> > 
> > ---
> > Jeff Layton (2):
> >       nfsd: handle failure to collect pre/post-op attrs more sanely
> >       nfsd: remove unsafe BUG_ON from set_change_info
> > 
> >  fs/nfsd/nfs3proc.c |  4 +++-
> >  fs/nfsd/nfs4proc.c | 45 +++++++++++++++++++++++++++++++++------
> >  fs/nfsd/nfsfh.c    | 26 ++++++++++++++---------
> >  fs/nfsd/nfsfh.h    |  6 +++---
> >  fs/nfsd/vfs.c      | 62 ++++++++++++++++++++++++++++++++++--------------------
> >  fs/nfsd/xdr4.h     | 11 ----------
> >  6 files changed, 100 insertions(+), 54 deletions(-)
> > ---
> > base-commit: 070f391ca4d48e1750ee6108eb44f751a9e9448e
> > change-id: 20230720-bz2223560-9c4690a8217b
> > 
> > Best regards,
> > -- 
> > Jeff Layton <jlayton@kernel.org>
> > 
> > 
>
  
Jeff Layton July 21, 2023, 12:48 p.m. UTC | #3
On Fri, 2023-07-21 at 07:42 +1000, NeilBrown wrote:
> On Fri, 21 Jul 2023, Jeff Layton wrote:
> > Boyang reported tripping the BUG_ON in set_change_info. While we
> > couldn't confirm it, one way this could happen would be for nfsd_lookup
> > to succeed and then for fh_fill_both_attrs to fail.
> > 
> > This patchset attempts to (sanely) fix this, usually by aborting the
> > operation if fetching the pre attributes fails. Post-op attribute fetch
> > handling is more difficult to deal with however since we've already done
> > the operation, so this has it just fudge the change_info4 if that
> > occurs.
> 
> I think both v3 and v4 allow a reply that says "the operation was a
> success but there are no post-op attrs".  With v4 you can say "there is
> no change-attr, but here are some other attrs".  I think.
> 

v3 has this ability:

      union pre_op_attr switch (bool attributes_follow) {
      case TRUE:
           wcc_attr  attributes;
      case FALSE:
           void;
      };

...we can just set the attributes_follow flag to false there in that
case.

That's not possible with v4, AFAICT. Several of the *4resok structures
contain a change_info4, which just looks like this:

struct change_info4 {
        bool            atomic;
        changeid4       before;
        changeid4       after;
};

We can set "atomic" to false (and this patch does that in this
situation), but I don't believe there is any alternative to the change
attribute. If the underlying fs doesn't support native change attrs, the
server is expected to fake one up somehow (usually from the ctime).

We could (in principle) allow the operation to proceed on v3 even if
fh_fill_pre_attrs fails, but I don't think we can do the same thing with
v4. That said, if getattr is failing then it's somewhat likely that
other operations will fail too, so aborting the operation in this
situation doesn't seem too onerous.
  
NeilBrown July 22, 2023, 12:34 a.m. UTC | #4
On Fri, 21 Jul 2023, Jeff Layton wrote:
> On Fri, 2023-07-21 at 07:42 +1000, NeilBrown wrote:
> > 
> > I think both v3 and v4 allow a reply that says "the operation was a
> > success but there are no post-op attrs".  With v4 you can say "there is
> > no change-attr, but here are some other attrs".  I think.
> > 
> 
> v3 has this ability:
> 
>       union pre_op_attr switch (bool attributes_follow) {
>       case TRUE:
>            wcc_attr  attributes;
>       case FALSE:
>            void;
>       };
> 
> ...we can just set the attributes_follow flag to false there in that
> case.
> 
> That's not possible with v4, AFAICT. Several of the *4resok structures
> contain a change_info4, which just looks like this:
> 
> struct change_info4 {
>         bool            atomic;
>         changeid4       before;
>         changeid4       after;
> };

Yes...  I was thinking of GETATTR which reports a bitmap of all the
attributes that it can return.  Though I'm not sure if the server is
"allowed" to not return something that it has said is "supported".  And
I think changeid has to be "supported".  I'm not sure.

But anyway, that doesn't help change_info4 which comes with
directory-modifying operation.

> 
> We can set "atomic" to false (and this patch does that in this
> situation), but I don't believe there is any alternative to the change
> attribute. If the underlying fs doesn't support native change attrs, the
> server is expected to fake one up somehow (usually from the ctime).

I had a look again at the current code and your patch, and I think that
if the "post' vfs_getattr() fails, then the operation succeeds, the
change_info is marked non-atomic (as you say) and the "after" changeid is
set to an uninitialised value.  Is that right?  Did I miss something?
Maybe we should set it to the pre value plus 1.

It probably doesn't matter at all in practice, but if I'm right and it
is using an uninitialized value, we should at least fix that.

Thanks - your v3 patch looks good in general.  I like the must_check and
the goto structure.

Thanks,
NeilBrown


> 
> We could (in principle) allow the operation to proceed on v3 even if
> fh_fill_pre_attrs fails, but I don't think we can do the same thing with
> v4. That said, if getattr is failing then it's somewhat likely that
> other operations will fail too, so aborting the operation in this
> situation doesn't seem too onerous.
> 
> -- 
> Jeff Layton <jlayton@kernel.org>
>
  
Jeff Layton July 24, 2023, 10:36 a.m. UTC | #5
On Sat, 2023-07-22 at 10:34 +1000, NeilBrown wrote:
> On Fri, 21 Jul 2023, Jeff Layton wrote:
> > On Fri, 2023-07-21 at 07:42 +1000, NeilBrown wrote:
> > > 
> > > I think both v3 and v4 allow a reply that says "the operation was a
> > > success but there are no post-op attrs".  With v4 you can say "there is
> > > no change-attr, but here are some other attrs".  I think.
> > > 
> > 
> > v3 has this ability:
> > 
> >       union pre_op_attr switch (bool attributes_follow) {
> >       case TRUE:
> >            wcc_attr  attributes;
> >       case FALSE:
> >            void;
> >       };
> > 
> > ...we can just set the attributes_follow flag to false there in that
> > case.
> > 
> > That's not possible with v4, AFAICT. Several of the *4resok structures
> > contain a change_info4, which just looks like this:
> > 
> > struct change_info4 {
> >         bool            atomic;
> >         changeid4       before;
> >         changeid4       after;
> > };
> 
> Yes...  I was thinking of GETATTR which reports a bitmap of all the
> attributes that it can return.  Though I'm not sure if the server is
> "allowed" to not return something that it has said is "supported".  And
> I think changeid has to be "supported".  I'm not sure.
>
> But anyway, that doesn't help change_info4 which comes with
> directory-modifying operation.
> 
> > 
> > We can set "atomic" to false (and this patch does that in this
> > situation), but I don't believe there is any alternative to the change
> > attribute. If the underlying fs doesn't support native change attrs, the
> > server is expected to fake one up somehow (usually from the ctime).
> 
> I had a look again at the current code and your patch, and I think that
> if the "post' vfs_getattr() fails, then the operation succeeds, the
> change_info is marked non-atomic (as you say) and the "after" changeid is
> set to an uninitialised value.  Is that right?  Did I miss something?
> Maybe we should set it to the pre value plus 1.
> 
> It probably doesn't matter at all in practice, but if I'm right and it
> is using an uninitialized value, we should at least fix that.
> 
> Thanks - your v3 patch looks good in general.  I like the must_check and
> the goto structure.
> 
> Thanks,
> NeilBrown
> 
> 


The current patch sets the missing pre/post values to 0. I'm happy to
change that to pre-value+1 though if you think that'd be more correct.
The client already fudges the changeid like that in the CB_GETATTR case,
so I doubt that would break anything.