Message ID | 20230405171449.4064321-1-stefanb@linux.ibm.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp469677vqo; Wed, 5 Apr 2023 10:19:24 -0700 (PDT) X-Google-Smtp-Source: AKy350Z2oi2grjwE9yQ9+1/Oddcn8jQzuezAATdQUsy1lTOBnwHe7WewAzEQqxSERFI6xcWP/tXR X-Received: by 2002:a17:90b:4c92:b0:23f:635e:51e9 with SMTP id my18-20020a17090b4c9200b0023f635e51e9mr7684633pjb.8.1680715163841; Wed, 05 Apr 2023 10:19:23 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680715163; cv=none; d=google.com; s=arc-20160816; b=XMnf+2tlDPXMCe6AkIAa02AHJ4DiNz5ixSlPGREiC9TY3PYOnUj9rZOtGMKtjb8gKC 5vHHdNhUdROMFNAVqWjG9MCWPtsyEhYIe8QFeusosj9pq+XWKPKFn6O0ZX1FLPFm7UJ8 233+Aau8AtTJOwJyx6tVfUAKcqSyJ2y/UhiqDWZvM++XYGz4SbR9D2bjHMmTAwu60WpI mPEQZrV796K0IHreBV7azF8LyTX3Ystg7eUfrPfSu+RjZLeoFX6k1tE/6LGR9nD7VfM8 vF/TIASZOKgD78r/ZP5HWVWKoX2FuFCCYjrh09MwYyS3TJA3H9PuAGBhzZkI+3pXcjDu p6xA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :message-id:date:subject:cc:to:from:dkim-signature; bh=xdPANpjtlbwea8smzHbacqN4g1Ruy2zYgHhQtcKL4jA=; b=BsxJHFaRiigIXNqM7lY0AuI6nGr7gmcarhnczjOs33ZxvqJHcUCGWynFuvFyp0EvV0 oouP8n/0fX7EDPrhboUFVmqrKcYPCOkjMy5ZWsu+/w2dAth29UHg2doGG8K/gQmEjRcc Ps/Uj/9LenSTOGU0B4MHvNiD4gJ0r77Zh30EsI8uqNW8jeJcUhVSMlD6NSJlzKTg7rDj JALxo5LRw08AGnXQgBdqDIwLZtGTCzJ9EW5DMGydmJVR4nDJxoAsroNx2nnHRYmCUPaX yr0iU8l4B2byei1ZRwoUIvTTohwRz5PW8Xp8ep46P8LSuth046RPCZ0fTN+G7i+NaigJ 6ZQg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@ibm.com header.s=pp1 header.b=Jviavfdg; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=NONE dis=NONE) header.from=ibm.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id x13-20020a17090a1f8d00b0023fb878c669si1723349pja.6.2023.04.05.10.19.06; Wed, 05 Apr 2023 10:19:23 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@ibm.com header.s=pp1 header.b=Jviavfdg; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=NONE dis=NONE) header.from=ibm.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231574AbjDERPK (ORCPT <rfc822;lkml4gm@gmail.com> + 99 others); Wed, 5 Apr 2023 13:15:10 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34224 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229748AbjDERPG (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Wed, 5 Apr 2023 13:15:06 -0400 Received: from mx0a-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C08DA1BEF; Wed, 5 Apr 2023 10:15:05 -0700 (PDT) Received: from pps.filterd (m0098416.ppops.net [127.0.0.1]) by mx0b-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 335GG6GM013112; Wed, 5 Apr 2023 17:15:04 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=from : to : cc : subject : date : message-id : mime-version : content-transfer-encoding; s=pp1; bh=xdPANpjtlbwea8smzHbacqN4g1Ruy2zYgHhQtcKL4jA=; b=JviavfdghGpxFybYoZ5PlrAS6QH2tK6y0r0h+eXId+HQSVyuZwFsAE51ZdMf1ff8IoSY P++/zzbiTHnTDBH9nzhlqIg3PrXzCqG3Ag4Ui7vYjJ+SYDACo3jAydx7qX4BAmris178 pSHQRQqmageLgFAvtA+6VddGcD5/HFH4snG+444FBWx9UBBRgbMxO4wf0+V11IoVldbJ fUBcS1Y+CJcWyp+IS+74gxvt0hoNYLjeLOYYde0GmAJIDbwNDfVjwLU3s2pSjWN+I7DV 8BrNIpuZsqc1J5HzEP9DCeb9eWi6kAyGm4xsmky9G580xQpWuJGp80zwxn/Zy53Yj8r8 GA== Received: from pps.reinject (localhost [127.0.0.1]) by mx0b-001b2d01.pphosted.com (PPS) with ESMTPS id 3ps8w7yu2n-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Wed, 05 Apr 2023 17:15:04 +0000 Received: from m0098416.ppops.net (m0098416.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 335Gk6am002751; Wed, 5 Apr 2023 17:15:03 GMT Received: from ppma02dal.us.ibm.com (a.bd.3ea9.ip4.static.sl-reverse.com [169.62.189.10]) by mx0b-001b2d01.pphosted.com (PPS) with ESMTPS id 3ps8w7yu29-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Wed, 05 Apr 2023 17:15:03 +0000 Received: from pps.filterd (ppma02dal.us.ibm.com [127.0.0.1]) by ppma02dal.us.ibm.com (8.17.1.19/8.17.1.19) with ESMTP id 335H1GMO031516; Wed, 5 Apr 2023 17:15:03 GMT Received: from smtprelay02.dal12v.mail.ibm.com ([9.208.130.97]) by ppma02dal.us.ibm.com (PPS) with ESMTPS id 3ppc88d3y0-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Wed, 05 Apr 2023 17:15:02 +0000 Received: from smtpav02.wdc07v.mail.ibm.com (smtpav02.wdc07v.mail.ibm.com [10.39.53.229]) by smtprelay02.dal12v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 335HF1hV36307508 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 5 Apr 2023 17:15:01 GMT Received: from smtpav02.wdc07v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 17F085805C; Wed, 5 Apr 2023 17:15:01 +0000 (GMT) Received: from smtpav02.wdc07v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 2D01B58059; Wed, 5 Apr 2023 17:15:00 +0000 (GMT) Received: from sbct-3.pok.ibm.com (unknown [9.47.158.153]) by smtpav02.wdc07v.mail.ibm.com (Postfix) with ESMTP; Wed, 5 Apr 2023 17:15:00 +0000 (GMT) From: Stefan Berger <stefanb@linux.ibm.com> To: zohar@linux.ibm.com, linux-integrity@vger.kernel.org, miklos@szeredi.hu Cc: linux-kernel@vger.kernel.org, linux-security-module@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-unionfs@vger.kernel.org, amir73il@gmail.com, Stefan Berger <stefanb@linux.ibm.com> Subject: [PATCH] overlayfs: Trigger file re-evaluation by IMA / EVM after writes Date: Wed, 5 Apr 2023 13:14:49 -0400 Message-Id: <20230405171449.4064321-1-stefanb@linux.ibm.com> X-Mailer: git-send-email 2.39.1 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-TM-AS-GCONF: 00 X-Proofpoint-GUID: I-u9B3_Iu8NUUSka9TVqB0TEA3acKn1L X-Proofpoint-ORIG-GUID: usiMNPTQycmGVGH7XZb5S6smuMFo-bqs X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.254,Aquarius:18.0.942,Hydra:6.0.573,FMLib:17.11.170.22 definitions=2023-04-05_11,2023-04-05_01,2023-02-09_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 mlxscore=0 impostorscore=0 lowpriorityscore=0 clxscore=1011 phishscore=0 spamscore=0 bulkscore=0 adultscore=0 malwarescore=0 suspectscore=0 mlxlogscore=999 priorityscore=1501 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2303200000 definitions=main-2304050154 X-Spam-Status: No, score=-0.1 required=5.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_EF,RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1762357583805308143?= X-GMAIL-MSGID: =?utf-8?q?1762357583805308143?= |
Series |
overlayfs: Trigger file re-evaluation by IMA / EVM after writes
|
|
Commit Message
Stefan Berger
April 5, 2023, 5:14 p.m. UTC
Overlayfs fails to notify IMA / EVM about file content modifications
and therefore IMA-appraised files may execute even though their file
signature does not validate against the changed hash of the file
anymore. To resolve this issue, add a call to integrity_notify_change()
to the ovl_release() function to notify the integrity subsystem about
file changes. The set flag triggers the re-evaluation of the file by
IMA / EVM once the file is accessed again.
Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
---
fs/overlayfs/file.c | 4 ++++
include/linux/integrity.h | 6 ++++++
security/integrity/iint.c | 13 +++++++++++++
3 files changed, 23 insertions(+)
Comments
On Wed, Apr 05, 2023 at 01:14:49PM -0400, Stefan Berger wrote: > Overlayfs fails to notify IMA / EVM about file content modifications > and therefore IMA-appraised files may execute even though their file > signature does not validate against the changed hash of the file > anymore. To resolve this issue, add a call to integrity_notify_change() > to the ovl_release() function to notify the integrity subsystem about > file changes. The set flag triggers the re-evaluation of the file by > IMA / EVM once the file is accessed again. > > Signed-off-by: Stefan Berger <stefanb@linux.ibm.com> > --- > fs/overlayfs/file.c | 4 ++++ > include/linux/integrity.h | 6 ++++++ > security/integrity/iint.c | 13 +++++++++++++ > 3 files changed, 23 insertions(+) > > diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c > index 6011f955436b..19b8f4bcc18c 100644 > --- a/fs/overlayfs/file.c > +++ b/fs/overlayfs/file.c > @@ -13,6 +13,7 @@ > #include <linux/security.h> > #include <linux/mm.h> > #include <linux/fs.h> > +#include <linux/integrity.h> > #include "overlayfs.h" > > struct ovl_aio_req { > @@ -169,6 +170,9 @@ static int ovl_open(struct inode *inode, struct file *file) > > static int ovl_release(struct inode *inode, struct file *file) > { > + if (file->f_flags & O_ACCMODE) > + integrity_notify_change(inode); > + > fput(file->private_data); > > return 0; > diff --git a/include/linux/integrity.h b/include/linux/integrity.h > index 2ea0f2f65ab6..cefdeccc1619 100644 > --- a/include/linux/integrity.h > +++ b/include/linux/integrity.h > @@ -23,6 +23,7 @@ enum integrity_status { > #ifdef CONFIG_INTEGRITY > extern struct integrity_iint_cache *integrity_inode_get(struct inode *inode); > extern void integrity_inode_free(struct inode *inode); > +extern void integrity_notify_change(struct inode *inode); I thought we concluded that ima is going to move into the security hook infrastructure so it seems this should be a proper LSM hook?
On Thu, Apr 6, 2023 at 6:26 AM Christian Brauner <brauner@kernel.org> wrote: > On Wed, Apr 05, 2023 at 01:14:49PM -0400, Stefan Berger wrote: > > Overlayfs fails to notify IMA / EVM about file content modifications > > and therefore IMA-appraised files may execute even though their file > > signature does not validate against the changed hash of the file > > anymore. To resolve this issue, add a call to integrity_notify_change() > > to the ovl_release() function to notify the integrity subsystem about > > file changes. The set flag triggers the re-evaluation of the file by > > IMA / EVM once the file is accessed again. > > > > Signed-off-by: Stefan Berger <stefanb@linux.ibm.com> > > --- > > fs/overlayfs/file.c | 4 ++++ > > include/linux/integrity.h | 6 ++++++ > > security/integrity/iint.c | 13 +++++++++++++ > > 3 files changed, 23 insertions(+) > > > > diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c > > index 6011f955436b..19b8f4bcc18c 100644 > > --- a/fs/overlayfs/file.c > > +++ b/fs/overlayfs/file.c > > @@ -13,6 +13,7 @@ > > #include <linux/security.h> > > #include <linux/mm.h> > > #include <linux/fs.h> > > +#include <linux/integrity.h> > > #include "overlayfs.h" > > > > struct ovl_aio_req { > > @@ -169,6 +170,9 @@ static int ovl_open(struct inode *inode, struct file *file) > > > > static int ovl_release(struct inode *inode, struct file *file) > > { > > + if (file->f_flags & O_ACCMODE) > > + integrity_notify_change(inode); > > + > > fput(file->private_data); > > > > return 0; > > diff --git a/include/linux/integrity.h b/include/linux/integrity.h > > index 2ea0f2f65ab6..cefdeccc1619 100644 > > --- a/include/linux/integrity.h > > +++ b/include/linux/integrity.h > > @@ -23,6 +23,7 @@ enum integrity_status { > > #ifdef CONFIG_INTEGRITY > > extern struct integrity_iint_cache *integrity_inode_get(struct inode *inode); > > extern void integrity_inode_free(struct inode *inode); > > +extern void integrity_notify_change(struct inode *inode); > > I thought we concluded that ima is going to move into the security hook > infrastructure so it seems this should be a proper LSM hook? We are working towards migrating IMA/EVM to the LSM layer, but there are a few things we need to fix/update/remove first; if anyone is curious, you can join the LSM list as we've been discussing some of these changes this week. Bug fixes like this should probably remain as IMA/EVM calls for the time being, with the understanding that they will migrate over with the rest of IMA/EVM. That said, we should give Mimi a chance to review this patch as it is possible there is a different/better approach. A bit of patience may be required as I know Mimi is very busy at the moment.
On 4/6/23 10:05, Paul Moore wrote: > On Thu, Apr 6, 2023 at 6:26 AM Christian Brauner <brauner@kernel.org> wrote: >> On Wed, Apr 05, 2023 at 01:14:49PM -0400, Stefan Berger wrote: >>> Overlayfs fails to notify IMA / EVM about file content modifications >>> and therefore IMA-appraised files may execute even though their file >>> signature does not validate against the changed hash of the file >>> anymore. To resolve this issue, add a call to integrity_notify_change() >>> to the ovl_release() function to notify the integrity subsystem about >>> file changes. The set flag triggers the re-evaluation of the file by >>> IMA / EVM once the file is accessed again. >>> >>> Signed-off-by: Stefan Berger <stefanb@linux.ibm.com> >>> --- >>> fs/overlayfs/file.c | 4 ++++ >>> include/linux/integrity.h | 6 ++++++ >>> security/integrity/iint.c | 13 +++++++++++++ >>> 3 files changed, 23 insertions(+) >>> >>> diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c >>> index 6011f955436b..19b8f4bcc18c 100644 >>> --- a/fs/overlayfs/file.c >>> +++ b/fs/overlayfs/file.c >>> @@ -13,6 +13,7 @@ >>> #include <linux/security.h> >>> #include <linux/mm.h> >>> #include <linux/fs.h> >>> +#include <linux/integrity.h> >>> #include "overlayfs.h" >>> >>> struct ovl_aio_req { >>> @@ -169,6 +170,9 @@ static int ovl_open(struct inode *inode, struct file *file) >>> >>> static int ovl_release(struct inode *inode, struct file *file) >>> { >>> + if (file->f_flags & O_ACCMODE) >>> + integrity_notify_change(inode); >>> + >>> fput(file->private_data); >>> >>> return 0; >>> diff --git a/include/linux/integrity.h b/include/linux/integrity.h >>> index 2ea0f2f65ab6..cefdeccc1619 100644 >>> --- a/include/linux/integrity.h >>> +++ b/include/linux/integrity.h >>> @@ -23,6 +23,7 @@ enum integrity_status { >>> #ifdef CONFIG_INTEGRITY >>> extern struct integrity_iint_cache *integrity_inode_get(struct inode *inode); >>> extern void integrity_inode_free(struct inode *inode); >>> +extern void integrity_notify_change(struct inode *inode); >> >> I thought we concluded that ima is going to move into the security hook >> infrastructure so it seems this should be a proper LSM hook? > > We are working towards migrating IMA/EVM to the LSM layer, but there > are a few things we need to fix/update/remove first; if anyone is > curious, you can join the LSM list as we've been discussing some of > these changes this week. Bug fixes like this should probably remain > as IMA/EVM calls for the time being, with the understanding that they > will migrate over with the rest of IMA/EVM. > > That said, we should give Mimi a chance to review this patch as it is > possible there is a different/better approach. A bit of patience may > be required as I know Mimi is very busy at the moment. > There may be a better approach actually by increasing the inode's i_version, which then should trigger the appropriate path in ima_check_last_writer().
On Thu, Apr 6, 2023 at 10:20 AM Stefan Berger <stefanb@linux.ibm.com> wrote: > On 4/6/23 10:05, Paul Moore wrote: > > On Thu, Apr 6, 2023 at 6:26 AM Christian Brauner <brauner@kernel.org> wrote: > >> On Wed, Apr 05, 2023 at 01:14:49PM -0400, Stefan Berger wrote: > >>> Overlayfs fails to notify IMA / EVM about file content modifications > >>> and therefore IMA-appraised files may execute even though their file > >>> signature does not validate against the changed hash of the file > >>> anymore. To resolve this issue, add a call to integrity_notify_change() > >>> to the ovl_release() function to notify the integrity subsystem about > >>> file changes. The set flag triggers the re-evaluation of the file by > >>> IMA / EVM once the file is accessed again. > >>> > >>> Signed-off-by: Stefan Berger <stefanb@linux.ibm.com> > >>> --- > >>> fs/overlayfs/file.c | 4 ++++ > >>> include/linux/integrity.h | 6 ++++++ > >>> security/integrity/iint.c | 13 +++++++++++++ > >>> 3 files changed, 23 insertions(+) > >>> > >>> diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c > >>> index 6011f955436b..19b8f4bcc18c 100644 > >>> --- a/fs/overlayfs/file.c > >>> +++ b/fs/overlayfs/file.c > >>> @@ -13,6 +13,7 @@ > >>> #include <linux/security.h> > >>> #include <linux/mm.h> > >>> #include <linux/fs.h> > >>> +#include <linux/integrity.h> > >>> #include "overlayfs.h" > >>> > >>> struct ovl_aio_req { > >>> @@ -169,6 +170,9 @@ static int ovl_open(struct inode *inode, struct file *file) > >>> > >>> static int ovl_release(struct inode *inode, struct file *file) > >>> { > >>> + if (file->f_flags & O_ACCMODE) > >>> + integrity_notify_change(inode); > >>> + > >>> fput(file->private_data); > >>> > >>> return 0; > >>> diff --git a/include/linux/integrity.h b/include/linux/integrity.h > >>> index 2ea0f2f65ab6..cefdeccc1619 100644 > >>> --- a/include/linux/integrity.h > >>> +++ b/include/linux/integrity.h > >>> @@ -23,6 +23,7 @@ enum integrity_status { > >>> #ifdef CONFIG_INTEGRITY > >>> extern struct integrity_iint_cache *integrity_inode_get(struct inode *inode); > >>> extern void integrity_inode_free(struct inode *inode); > >>> +extern void integrity_notify_change(struct inode *inode); > >> > >> I thought we concluded that ima is going to move into the security hook > >> infrastructure so it seems this should be a proper LSM hook? > > > > We are working towards migrating IMA/EVM to the LSM layer, but there > > are a few things we need to fix/update/remove first; if anyone is > > curious, you can join the LSM list as we've been discussing some of > > these changes this week. Bug fixes like this should probably remain > > as IMA/EVM calls for the time being, with the understanding that they > > will migrate over with the rest of IMA/EVM. > > > > That said, we should give Mimi a chance to review this patch as it is > > possible there is a different/better approach. A bit of patience may > > be required as I know Mimi is very busy at the moment. > > There may be a better approach actually by increasing the inode's i_version, > which then should trigger the appropriate path in ima_check_last_writer(). I'm not the VFS/inode expert here, but I thought the inode's i_version field was only supposed to be bumped when the inode metadata changed, not necessarily the file contents, right? That said, overlayfs is a bit different so maybe that's okay, but I think we would need to hear from the VFS folks if this is acceptable.
On Thu, Apr 06, 2023 at 10:36:41AM -0400, Paul Moore wrote: > On Thu, Apr 6, 2023 at 10:20 AM Stefan Berger <stefanb@linux.ibm.com> wrote: > > On 4/6/23 10:05, Paul Moore wrote: > > > On Thu, Apr 6, 2023 at 6:26 AM Christian Brauner <brauner@kernel.org> wrote: > > >> On Wed, Apr 05, 2023 at 01:14:49PM -0400, Stefan Berger wrote: > > >>> Overlayfs fails to notify IMA / EVM about file content modifications > > >>> and therefore IMA-appraised files may execute even though their file > > >>> signature does not validate against the changed hash of the file > > >>> anymore. To resolve this issue, add a call to integrity_notify_change() > > >>> to the ovl_release() function to notify the integrity subsystem about > > >>> file changes. The set flag triggers the re-evaluation of the file by > > >>> IMA / EVM once the file is accessed again. > > >>> > > >>> Signed-off-by: Stefan Berger <stefanb@linux.ibm.com> > > >>> --- > > >>> fs/overlayfs/file.c | 4 ++++ > > >>> include/linux/integrity.h | 6 ++++++ > > >>> security/integrity/iint.c | 13 +++++++++++++ > > >>> 3 files changed, 23 insertions(+) > > >>> > > >>> diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c > > >>> index 6011f955436b..19b8f4bcc18c 100644 > > >>> --- a/fs/overlayfs/file.c > > >>> +++ b/fs/overlayfs/file.c > > >>> @@ -13,6 +13,7 @@ > > >>> #include <linux/security.h> > > >>> #include <linux/mm.h> > > >>> #include <linux/fs.h> > > >>> +#include <linux/integrity.h> > > >>> #include "overlayfs.h" > > >>> > > >>> struct ovl_aio_req { > > >>> @@ -169,6 +170,9 @@ static int ovl_open(struct inode *inode, struct file *file) > > >>> > > >>> static int ovl_release(struct inode *inode, struct file *file) > > >>> { > > >>> + if (file->f_flags & O_ACCMODE) > > >>> + integrity_notify_change(inode); > > >>> + > > >>> fput(file->private_data); > > >>> > > >>> return 0; > > >>> diff --git a/include/linux/integrity.h b/include/linux/integrity.h > > >>> index 2ea0f2f65ab6..cefdeccc1619 100644 > > >>> --- a/include/linux/integrity.h > > >>> +++ b/include/linux/integrity.h > > >>> @@ -23,6 +23,7 @@ enum integrity_status { > > >>> #ifdef CONFIG_INTEGRITY > > >>> extern struct integrity_iint_cache *integrity_inode_get(struct inode *inode); > > >>> extern void integrity_inode_free(struct inode *inode); > > >>> +extern void integrity_notify_change(struct inode *inode); > > >> > > >> I thought we concluded that ima is going to move into the security hook > > >> infrastructure so it seems this should be a proper LSM hook? > > > > > > We are working towards migrating IMA/EVM to the LSM layer, but there > > > are a few things we need to fix/update/remove first; if anyone is > > > curious, you can join the LSM list as we've been discussing some of > > > these changes this week. Bug fixes like this should probably remain > > > as IMA/EVM calls for the time being, with the understanding that they > > > will migrate over with the rest of IMA/EVM. > > > > > > That said, we should give Mimi a chance to review this patch as it is > > > possible there is a different/better approach. A bit of patience may > > > be required as I know Mimi is very busy at the moment. > > > > There may be a better approach actually by increasing the inode's i_version, > > which then should trigger the appropriate path in ima_check_last_writer(). > > I'm not the VFS/inode expert here, but I thought the inode's i_version > field was only supposed to be bumped when the inode metadata changed, > not necessarily the file contents, right? > > That said, overlayfs is a bit different so maybe that's okay, but I > think we would need to hear from the VFS folks if this is acceptable. Ccing Jeff for awareness since he did the i_version rework a short time ago. The documentation in include/linux/iversion.h states: * [...] The i_version must * appear larger to observers if there was an explicit change to the inode's * data or metadata since it was last queried. what I'm less sure in all of this is why this is called in ovl_release() and whether it's correct to increment the overlayfs inode's i_version. The change is done to the inode of the copied up/modified file's inode in the upper layer. So the i_version should already be incremented when we call into the upper layer usually via vfs_*() methods.
On 4/6/23 10:36, Paul Moore wrote: > On Thu, Apr 6, 2023 at 10:20 AM Stefan Berger <stefanb@linux.ibm.com> wrote: >> On 4/6/23 10:05, Paul Moore wrote: >>> On Thu, Apr 6, 2023 at 6:26 AM Christian Brauner <brauner@kernel.org> wrote: >>>> On Wed, Apr 05, 2023 at 01:14:49PM -0400, Stefan Berger wrote: >>>>> Overlayfs fails to notify IMA / EVM about file content modifications >>>>> and therefore IMA-appraised files may execute even though their file >>>>> signature does not validate against the changed hash of the file >>>>> anymore. To resolve this issue, add a call to integrity_notify_change() >>>>> to the ovl_release() function to notify the integrity subsystem about >>>>> file changes. The set flag triggers the re-evaluation of the file by >>>>> IMA / EVM once the file is accessed again. >>>>> >>>>> Signed-off-by: Stefan Berger <stefanb@linux.ibm.com> >>>>> --- >>>>> fs/overlayfs/file.c | 4 ++++ >>>>> include/linux/integrity.h | 6 ++++++ >>>>> security/integrity/iint.c | 13 +++++++++++++ >>>>> 3 files changed, 23 insertions(+) >>>>> >>>>> diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c >>>>> index 6011f955436b..19b8f4bcc18c 100644 >>>>> --- a/fs/overlayfs/file.c >>>>> +++ b/fs/overlayfs/file.c >>>>> @@ -13,6 +13,7 @@ >>>>> #include <linux/security.h> >>>>> #include <linux/mm.h> >>>>> #include <linux/fs.h> >>>>> +#include <linux/integrity.h> >>>>> #include "overlayfs.h" >>>>> >>>>> struct ovl_aio_req { >>>>> @@ -169,6 +170,9 @@ static int ovl_open(struct inode *inode, struct file *file) >>>>> >>>>> static int ovl_release(struct inode *inode, struct file *file) >>>>> { >>>>> + if (file->f_flags & O_ACCMODE) >>>>> + integrity_notify_change(inode); >>>>> + >>>>> fput(file->private_data); >>>>> >>>>> return 0; >>>>> diff --git a/include/linux/integrity.h b/include/linux/integrity.h >>>>> index 2ea0f2f65ab6..cefdeccc1619 100644 >>>>> --- a/include/linux/integrity.h >>>>> +++ b/include/linux/integrity.h >>>>> @@ -23,6 +23,7 @@ enum integrity_status { >>>>> #ifdef CONFIG_INTEGRITY >>>>> extern struct integrity_iint_cache *integrity_inode_get(struct inode *inode); >>>>> extern void integrity_inode_free(struct inode *inode); >>>>> +extern void integrity_notify_change(struct inode *inode); >>>> >>>> I thought we concluded that ima is going to move into the security hook >>>> infrastructure so it seems this should be a proper LSM hook? >>> >>> We are working towards migrating IMA/EVM to the LSM layer, but there >>> are a few things we need to fix/update/remove first; if anyone is >>> curious, you can join the LSM list as we've been discussing some of >>> these changes this week. Bug fixes like this should probably remain >>> as IMA/EVM calls for the time being, with the understanding that they >>> will migrate over with the rest of IMA/EVM. >>> >>> That said, we should give Mimi a chance to review this patch as it is >>> possible there is a different/better approach. A bit of patience may >>> be required as I know Mimi is very busy at the moment. >> >> There may be a better approach actually by increasing the inode's i_version, >> which then should trigger the appropriate path in ima_check_last_writer(). > > I'm not the VFS/inode expert here, but I thought the inode's i_version > field was only supposed to be bumped when the inode metadata changed, > not necessarily the file contents, right? > > That said, overlayfs is a bit different so maybe that's okay, but I > think we would need to hear from the VFS folks if this is acceptable. > Exactly. In ima_check_last_writer() I want to trigger the code path with iint->flags &= ... if (atomic_read(&inode->i_writecount) == 1) { update = test_and_clear_bit(IMA_UPDATE_XATTR, &iint->atomic_flags); if (!IS_I_VERSION(inode) || !inode_eq_iversion(inode, iint->version) || (iint->flags & IMA_NEW_FILE)) { iint->flags &= ~(IMA_DONE_MASK | IMA_NEW_FILE); iint->measured_pcrs = 0; if (update) ima_update_xattr(iint, file); } } This patch here resolves it for my use case and triggers the expected code paths when ima_file_free() -> ima_check_last_writer() is called because then the i_version is seen as having been modified. diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c index 6011f955436b..1dfe5e7bfe1c 100644 --- a/fs/overlayfs/file.c +++ b/fs/overlayfs/file.c @@ -13,6 +13,7 @@ #include <linux/security.h> #include <linux/mm.h> #include <linux/fs.h> +#include <linux/iversion.h> #include "overlayfs.h" struct ovl_aio_req { @@ -408,6 +409,8 @@ static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter) if (ret != -EIOCBQUEUED) ovl_aio_cleanup_handler(aio_req); } + if (ret > 0) + inode_maybe_inc_iversion(inode, false); out: revert_creds(old_cred); out_fdput: I have been testing this in a OpenBMC/Yocto environment where overlayfs is used as root filesystem with the lower filesystem being a squashfs. Regards, Stefan
On Thu, 2023-04-06 at 17:01 +0200, Christian Brauner wrote: > On Thu, Apr 06, 2023 at 10:36:41AM -0400, Paul Moore wrote: > > On Thu, Apr 6, 2023 at 10:20 AM Stefan Berger <stefanb@linux.ibm.com> wrote: > > > On 4/6/23 10:05, Paul Moore wrote: > > > > On Thu, Apr 6, 2023 at 6:26 AM Christian Brauner <brauner@kernel.org> wrote: > > > > > On Wed, Apr 05, 2023 at 01:14:49PM -0400, Stefan Berger wrote: > > > > > > Overlayfs fails to notify IMA / EVM about file content modifications > > > > > > and therefore IMA-appraised files may execute even though their file > > > > > > signature does not validate against the changed hash of the file > > > > > > anymore. To resolve this issue, add a call to integrity_notify_change() > > > > > > to the ovl_release() function to notify the integrity subsystem about > > > > > > file changes. The set flag triggers the re-evaluation of the file by > > > > > > IMA / EVM once the file is accessed again. > > > > > > > > > > > > Signed-off-by: Stefan Berger <stefanb@linux.ibm.com> > > > > > > --- > > > > > > fs/overlayfs/file.c | 4 ++++ > > > > > > include/linux/integrity.h | 6 ++++++ > > > > > > security/integrity/iint.c | 13 +++++++++++++ > > > > > > 3 files changed, 23 insertions(+) > > > > > > > > > > > > diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c > > > > > > index 6011f955436b..19b8f4bcc18c 100644 > > > > > > --- a/fs/overlayfs/file.c > > > > > > +++ b/fs/overlayfs/file.c > > > > > > @@ -13,6 +13,7 @@ > > > > > > #include <linux/security.h> > > > > > > #include <linux/mm.h> > > > > > > #include <linux/fs.h> > > > > > > +#include <linux/integrity.h> > > > > > > #include "overlayfs.h" > > > > > > > > > > > > struct ovl_aio_req { > > > > > > @@ -169,6 +170,9 @@ static int ovl_open(struct inode *inode, struct file *file) > > > > > > > > > > > > static int ovl_release(struct inode *inode, struct file *file) > > > > > > { > > > > > > + if (file->f_flags & O_ACCMODE) > > > > > > + integrity_notify_change(inode); > > > > > > + > > > > > > fput(file->private_data); > > > > > > > > > > > > return 0; > > > > > > diff --git a/include/linux/integrity.h b/include/linux/integrity.h > > > > > > index 2ea0f2f65ab6..cefdeccc1619 100644 > > > > > > --- a/include/linux/integrity.h > > > > > > +++ b/include/linux/integrity.h > > > > > > @@ -23,6 +23,7 @@ enum integrity_status { > > > > > > #ifdef CONFIG_INTEGRITY > > > > > > extern struct integrity_iint_cache *integrity_inode_get(struct inode *inode); > > > > > > extern void integrity_inode_free(struct inode *inode); > > > > > > +extern void integrity_notify_change(struct inode *inode); > > > > > > > > > > I thought we concluded that ima is going to move into the security hook > > > > > infrastructure so it seems this should be a proper LSM hook? > > > > > > > > We are working towards migrating IMA/EVM to the LSM layer, but there > > > > are a few things we need to fix/update/remove first; if anyone is > > > > curious, you can join the LSM list as we've been discussing some of > > > > these changes this week. Bug fixes like this should probably remain > > > > as IMA/EVM calls for the time being, with the understanding that they > > > > will migrate over with the rest of IMA/EVM. > > > > > > > > That said, we should give Mimi a chance to review this patch as it is > > > > possible there is a different/better approach. A bit of patience may > > > > be required as I know Mimi is very busy at the moment. > > > > > > There may be a better approach actually by increasing the inode's i_version, > > > which then should trigger the appropriate path in ima_check_last_writer(). > > > > I'm not the VFS/inode expert here, but I thought the inode's i_version > > field was only supposed to be bumped when the inode metadata changed, > > not necessarily the file contents, right? > > No. The i_version should change any time there is a "deliberate change" to the file. That can be to the data or metadata, but it has to be in response to some sort of deliberate, observable change -- something that would cause an mtime or ctime change. In practice, the i_version changes whenever the ctime changes, as changing the mtime also changes the ctime. > > That said, overlayfs is a bit different so maybe that's okay, but I > > think we would need to hear from the VFS folks if this is acceptable. > > Ccing Jeff for awareness since he did the i_version rework a short time ago. > > The documentation in include/linux/iversion.h states: > > * [...] The i_version must > * appear larger to observers if there was an explicit change to the inode's > * data or metadata since it was last queried. > > what I'm less sure in all of this is why this is called in ovl_release() and > whether it's correct to increment the overlayfs inode's i_version. > Yeah, not sure what's special about doing this on close(). Seems like someone could just hold the file open to prevent triggering the remeasurement? > The change is done to the inode of the copied up/modified file's inode in the > upper layer. So the i_version should already be incremented when we call into > the upper layer usually via vfs_*() methods. Correct. As long as IMA is also measuring the upper inode then it seems like you shouldn't need to do anything special here. What sort of fs are you using for the upper layer?
On 4/6/23 14:46, Jeff Layton wrote: > On Thu, 2023-04-06 at 17:01 +0200, Christian Brauner wrote: >> On Thu, Apr 06, 2023 at 10:36:41AM -0400, Paul Moore wrote: > > Correct. As long as IMA is also measuring the upper inode then it seems > like you shouldn't need to do anything special here. Unfortunately IMA does not notice the changes. With the patch provided in the other email IMA works as expected. > > What sort of fs are you using for the upper layer? jffs2: /dev/mtdblock4 on /run/initramfs/ro type squashfs (ro,relatime,errors=continue) /dev/mtdblock5 on /run/initramfs/rw type jffs2 (rw,relatime) cow on / type overlay (rw,relatime,lowerdir=run/initramfs/ro,upperdir=run/initramfs/rw/cow,workdir=run/initramfs/rw/work) Regards, Stefan
On Thu, 2023-04-06 at 15:11 -0400, Stefan Berger wrote: > > On 4/6/23 14:46, Jeff Layton wrote: > > On Thu, 2023-04-06 at 17:01 +0200, Christian Brauner wrote: > > > On Thu, Apr 06, 2023 at 10:36:41AM -0400, Paul Moore wrote: > > > > > Correct. As long as IMA is also measuring the upper inode then it seems > > like you shouldn't need to do anything special here. > > Unfortunately IMA does not notice the changes. With the patch provided in the other email IMA works as expected. > It looks like remeasurement is usually done in ima_check_last_writer. That gets called from __fput which is called when we're releasing the last reference to the struct file. You've hooked into the ->release op, which gets called whenever filp_close is called, which happens when we're disassociating the file from the file descriptor table. So...I don't get it. Is ima_file_free not getting called on your file for some reason when you go to close it? It seems like that should be handling this. In any case, I think this could use a bit more root-cause analysis. > > > > What sort of fs are you using for the upper layer? > > jffs2: > > /dev/mtdblock4 on /run/initramfs/ro type squashfs (ro,relatime,errors=continue) > /dev/mtdblock5 on /run/initramfs/rw type jffs2 (rw,relatime) > cow on / type overlay (rw,relatime,lowerdir=run/initramfs/ro,upperdir=run/initramfs/rw/cow,workdir=run/initramfs/rw/work) > jffs2 does not have a proper i_version counter, I'm afraid. But, IMA should handle that OK (by assuming that it always needs to remeasure when there is no i_version counter).
On 4/6/23 15:37, Jeff Layton wrote: > On Thu, 2023-04-06 at 15:11 -0400, Stefan Berger wrote: >> >> On 4/6/23 14:46, Jeff Layton wrote: >>> On Thu, 2023-04-06 at 17:01 +0200, Christian Brauner wrote: >>>> On Thu, Apr 06, 2023 at 10:36:41AM -0400, Paul Moore wrote: >> >>> >>> Correct. As long as IMA is also measuring the upper inode then it seems >>> like you shouldn't need to do anything special here. >> >> Unfortunately IMA does not notice the changes. With the patch provided in the other email IMA works as expected. >> > > > It looks like remeasurement is usually done in ima_check_last_writer. > That gets called from __fput which is called when we're releasing the > last reference to the struct file. > > You've hooked into the ->release op, which gets called whenever > filp_close is called, which happens when we're disassociating the file > from the file descriptor table. > > So...I don't get it. Is ima_file_free not getting called on your file > for some reason when you go to close it? It seems like that should be > handling this. I would ditch the original proposal in favor of this 2-line patch shown here: https://lore.kernel.org/linux-integrity/a95f62ed-8b8a-38e5-e468-ecbde3b221af@linux.ibm.com/T/#m3bd047c6e5c8200df1d273c0ad551c645dd43232 The new proposed i_version increase occurs on the inode that IMA sees later on for the file that's being executed and on which it must do a re-evaluation. Upon file changes ima_inode_free() seems to see two ima_file_free() calls, one for what seems to be the upper layer (used for vfs_* functions below) and once for the lower one. The important thing is that IMA will see the lower one when the file gets executed later on and this is the one that I instrumented now to have its i_version increased, which in turn triggers the re-evaluation of the file post modification. static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter) [...] struct fd real; [...] ret = ovl_real_fdget(file, &real); if (ret) goto out_unlock; [...] if (is_sync_kiocb(iocb)) { file_start_write(real.file); --> ret = vfs_iter_write(real.file, iter, &iocb->ki_pos, ovl_iocb_to_rwf(ifl)); file_end_write(real.file); /* Update size */ ovl_copyattr(inode); } else { struct ovl_aio_req *aio_req; ret = -ENOMEM; aio_req = kmem_cache_zalloc(ovl_aio_request_cachep, GFP_KERNEL); if (!aio_req) goto out; file_start_write(real.file); /* Pacify lockdep, same trick as done in aio_write() */ __sb_writers_release(file_inode(real.file)->i_sb, SB_FREEZE_WRITE); aio_req->fd = real; real.flags = 0; aio_req->orig_iocb = iocb; kiocb_clone(&aio_req->iocb, iocb, real.file); aio_req->iocb.ki_flags = ifl; aio_req->iocb.ki_complete = ovl_aio_rw_complete; refcount_set(&aio_req->ref, 2); --> ret = vfs_iocb_iter_write(real.file, &aio_req->iocb, iter); ovl_aio_put(aio_req); if (ret != -EIOCBQUEUED) ovl_aio_cleanup_handler(aio_req); } if (ret > 0) <--- this get it to work inode_maybe_inc_iversion(inode, false); <--- since this inode is known to IMA out: revert_creds(old_cred); out_fdput: fdput(real); out_unlock: inode_unlock(inode); Stefan > > In any case, I think this could use a bit more root-cause analysis.
On Thu, 2023-04-06 at 16:22 -0400, Stefan Berger wrote: > > On 4/6/23 15:37, Jeff Layton wrote: > > On Thu, 2023-04-06 at 15:11 -0400, Stefan Berger wrote: > > > > > > On 4/6/23 14:46, Jeff Layton wrote: > > > > On Thu, 2023-04-06 at 17:01 +0200, Christian Brauner wrote: > > > > > On Thu, Apr 06, 2023 at 10:36:41AM -0400, Paul Moore wrote: > > > > > > > > > > > Correct. As long as IMA is also measuring the upper inode then it seems > > > > like you shouldn't need to do anything special here. > > > > > > Unfortunately IMA does not notice the changes. With the patch provided in the other email IMA works as expected. > > > > > > > > > It looks like remeasurement is usually done in ima_check_last_writer. > > That gets called from __fput which is called when we're releasing the > > last reference to the struct file. > > > > You've hooked into the ->release op, which gets called whenever > > filp_close is called, which happens when we're disassociating the file > > from the file descriptor table. > > > > So...I don't get it. Is ima_file_free not getting called on your file > > for some reason when you go to close it? It seems like that should be > > handling this. > > I would ditch the original proposal in favor of this 2-line patch shown here: > > https://lore.kernel.org/linux-integrity/a95f62ed-8b8a-38e5-e468-ecbde3b221af@linux.ibm.com/T/#m3bd047c6e5c8200df1d273c0ad551c645dd43232 > > Ok, I think I get it. IMA is trying to use the i_version from the overlayfs inode. I suspect that the real problem here is that IMA is just doing a bare inode_query_iversion. Really, we ought to make IMA call vfs_getattr_nosec (or something like it) to query the getattr routine in the upper layer. Then overlayfs could just propagate the results from the upper layer in its response. That sort of design may also eventually help IMA work properly with more exotic filesystems, like NFS or Ceph. > The new proposed i_version increase occurs on the inode that IMA sees later on for > the file that's being executed and on which it must do a re-evaluation. > > Upon file changes ima_inode_free() seems to see two ima_file_free() calls, > one for what seems to be the upper layer (used for vfs_* functions below) > and once for the lower one. > The important thing is that IMA will see the lower one when the file gets > executed later on and this is the one that I instrumented now to have its > i_version increased, which in turn triggers the re-evaluation of the file post > modification. > > static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter) > [...] > struct fd real; > [...] > ret = ovl_real_fdget(file, &real); > if (ret) > goto out_unlock; > > [...] > if (is_sync_kiocb(iocb)) { > file_start_write(real.file); > --> ret = vfs_iter_write(real.file, iter, &iocb->ki_pos, > ovl_iocb_to_rwf(ifl)); > file_end_write(real.file); > /* Update size */ > ovl_copyattr(inode); > } else { > struct ovl_aio_req *aio_req; > > ret = -ENOMEM; > aio_req = kmem_cache_zalloc(ovl_aio_request_cachep, GFP_KERNEL); > if (!aio_req) > goto out; > > file_start_write(real.file); > /* Pacify lockdep, same trick as done in aio_write() */ > __sb_writers_release(file_inode(real.file)->i_sb, > SB_FREEZE_WRITE); > aio_req->fd = real; > real.flags = 0; > aio_req->orig_iocb = iocb; > kiocb_clone(&aio_req->iocb, iocb, real.file); > aio_req->iocb.ki_flags = ifl; > aio_req->iocb.ki_complete = ovl_aio_rw_complete; > refcount_set(&aio_req->ref, 2); > --> ret = vfs_iocb_iter_write(real.file, &aio_req->iocb, iter); > ovl_aio_put(aio_req); > if (ret != -EIOCBQUEUED) > ovl_aio_cleanup_handler(aio_req); > } > if (ret > 0) <--- this get it to work > inode_maybe_inc_iversion(inode, false); <--- since this inode is known to IMA > out: > revert_creds(old_cred); > out_fdput: > fdput(real); > > out_unlock: > inode_unlock(inode); > > > > > Stefan > > > > > In any case, I think this could use a bit more root-cause analysis. >
On 4/6/23 17:24, Jeff Layton wrote: > On Thu, 2023-04-06 at 16:22 -0400, Stefan Berger wrote: >> >> On 4/6/23 15:37, Jeff Layton wrote: >>> On Thu, 2023-04-06 at 15:11 -0400, Stefan Berger wrote: >>>> >>>> On 4/6/23 14:46, Jeff Layton wrote: >>>>> On Thu, 2023-04-06 at 17:01 +0200, Christian Brauner wrote: >>>>>> On Thu, Apr 06, 2023 at 10:36:41AM -0400, Paul Moore wrote: >>>> >>>>> >>>>> Correct. As long as IMA is also measuring the upper inode then it seems >>>>> like you shouldn't need to do anything special here. >>>> >>>> Unfortunately IMA does not notice the changes. With the patch provided in the other email IMA works as expected. >>>> >>> >>> >>> It looks like remeasurement is usually done in ima_check_last_writer. >>> That gets called from __fput which is called when we're releasing the >>> last reference to the struct file. >>> >>> You've hooked into the ->release op, which gets called whenever >>> filp_close is called, which happens when we're disassociating the file >>> from the file descriptor table. >>> >>> So...I don't get it. Is ima_file_free not getting called on your file >>> for some reason when you go to close it? It seems like that should be >>> handling this. >> >> I would ditch the original proposal in favor of this 2-line patch shown here: >> >> https://lore.kernel.org/linux-integrity/a95f62ed-8b8a-38e5-e468-ecbde3b221af@linux.ibm.com/T/#m3bd047c6e5c8200df1d273c0ad551c645dd43232 >> >> > > Ok, I think I get it. IMA is trying to use the i_version from the > overlayfs inode. > > I suspect that the real problem here is that IMA is just doing a bare > inode_query_iversion. Really, we ought to make IMA call > vfs_getattr_nosec (or something like it) to query the getattr routine in > the upper layer. Then overlayfs could just propagate the results from > the upper layer in its response. You mean compare known stat against current ? It seems more expensive to stat the file rather than using the simple i_version-has-changed indicator. > > That sort of design may also eventually help IMA work properly with more > exotic filesystems, like NFS or Ceph. And these don't support i_version at all? Stefan
On Thu, 2023-04-06 at 17:24 -0400, Jeff Layton wrote: > On Thu, 2023-04-06 at 16:22 -0400, Stefan Berger wrote: > > > > On 4/6/23 15:37, Jeff Layton wrote: > > > On Thu, 2023-04-06 at 15:11 -0400, Stefan Berger wrote: > > > > > > > > On 4/6/23 14:46, Jeff Layton wrote: > > > > > On Thu, 2023-04-06 at 17:01 +0200, Christian Brauner wrote: > > > > > > On Thu, Apr 06, 2023 at 10:36:41AM -0400, Paul Moore wrote: > > > > > > > > > > > > > > Correct. As long as IMA is also measuring the upper inode then it seems > > > > > like you shouldn't need to do anything special here. > > > > > > > > Unfortunately IMA does not notice the changes. With the patch provided in the other email IMA works as expected. > > > > > > > > > > > > > It looks like remeasurement is usually done in ima_check_last_writer. > > > That gets called from __fput which is called when we're releasing the > > > last reference to the struct file. > > > > > > You've hooked into the ->release op, which gets called whenever > > > filp_close is called, which happens when we're disassociating the file > > > from the file descriptor table. > > > > > > So...I don't get it. Is ima_file_free not getting called on your file > > > for some reason when you go to close it? It seems like that should be > > > handling this. > > > > I would ditch the original proposal in favor of this 2-line patch shown here: > > > > https://lore.kernel.org/linux-integrity/a95f62ed-8b8a-38e5-e468-ecbde3b221af@linux.ibm.com/T/#m3bd047c6e5c8200df1d273c0ad551c645dd43232 > > > > > > Ok, I think I get it. IMA is trying to use the i_version from the > overlayfs inode. > > I suspect that the real problem here is that IMA is just doing a bare > inode_query_iversion. Really, we ought to make IMA call > vfs_getattr_nosec (or something like it) to query the getattr routine in > the upper layer. Then overlayfs could just propagate the results from > the upper layer in its response. > > That sort of design may also eventually help IMA work properly with more > exotic filesystems, like NFS or Ceph. > > > Maybe something like this? It builds for me but I haven't tested it. It looks like overlayfs already should report the upper layer's i_version in getattr, though I haven't tested that either: -----------------------8<--------------------------- [PATCH] IMA: use vfs_getattr_nosec to get the i_version IMA currently accesses the i_version out of the inode directly when it does a measurement. This is fine for most simple filesystems, but can be problematic with more complex setups (e.g. overlayfs). Make IMA instead call vfs_getattr_nosec to get this info. This allows the filesystem to determine whether and how to report the i_version, and should allow IMA to work properly with a broader class of filesystems in the future. Reported-by: Stefan Berger <stefanb@linux.ibm.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> --- security/integrity/ima/ima_api.c | 9 ++++++--- security/integrity/ima/ima_main.c | 12 ++++++++---- 2 files changed, 14 insertions(+), 7 deletions(-) diff --git a/security/integrity/ima/ima_api.c b/security/integrity/ima/ima_api.c index d3662f4acadc..c45902e72044 100644 --- a/security/integrity/ima/ima_api.c +++ b/security/integrity/ima/ima_api.c @@ -13,7 +13,6 @@ #include <linux/fs.h> #include <linux/xattr.h> #include <linux/evm.h> -#include <linux/iversion.h> #include <linux/fsverity.h> #include "ima.h" @@ -246,10 +245,11 @@ int ima_collect_measurement(struct integrity_iint_cache *iint, struct inode *inode = file_inode(file); const char *filename = file->f_path.dentry->d_name.name; struct ima_max_digest_data hash; + struct kstat stat; int result = 0; int length; void *tmpbuf; - u64 i_version; + u64 i_version = 0; /* * Always collect the modsig, because IMA might have already collected @@ -268,7 +268,10 @@ int ima_collect_measurement(struct integrity_iint_cache *iint, * to an initial measurement/appraisal/audit, but was modified to * assume the file changed. */ - i_version = inode_query_iversion(inode); + result = vfs_getattr_nosec(&file->f_path, &stat, STATX_CHANGE_COOKIE, + AT_STATX_SYNC_AS_STAT); + if (!result && (stat.result_mask & STATX_CHANGE_COOKIE)) + i_version = stat.change_cookie; hash.hdr.algo = algo; hash.hdr.length = hash_digest_size[algo]; diff --git a/security/integrity/ima/ima_main.c b/security/integrity/ima/ima_main.c index d66a0a36415e..365db0e43d7c 100644 --- a/security/integrity/ima/ima_main.c +++ b/security/integrity/ima/ima_main.c @@ -24,7 +24,6 @@ #include <linux/slab.h> #include <linux/xattr.h> #include <linux/ima.h> -#include <linux/iversion.h> #include <linux/fs.h> #include "ima.h" @@ -164,11 +163,16 @@ static void ima_check_last_writer(struct integrity_iint_cache *iint, mutex_lock(&iint->mutex); if (atomic_read(&inode->i_writecount) == 1) { + struct kstat stat; + update = test_and_clear_bit(IMA_UPDATE_XATTR, &iint->atomic_flags); - if (!IS_I_VERSION(inode) || - !inode_eq_iversion(inode, iint->version) || - (iint->flags & IMA_NEW_FILE)) { + if ((iint->flags & IMA_NEW_FILE) || + vfs_getattr_nosec(&file->f_path, &stat, + STATX_CHANGE_COOKIE, + AT_STATX_SYNC_AS_STAT) || + !(stat.result_mask & STATX_CHANGE_COOKIE) || + stat.change_cookie != iint->version) { iint->flags &= ~(IMA_DONE_MASK | IMA_NEW_FILE); iint->measured_pcrs = 0; if (update)
On Thu, 2023-04-06 at 17:58 -0400, Stefan Berger wrote: > > On 4/6/23 17:24, Jeff Layton wrote: > > On Thu, 2023-04-06 at 16:22 -0400, Stefan Berger wrote: > > > > > > On 4/6/23 15:37, Jeff Layton wrote: > > > > On Thu, 2023-04-06 at 15:11 -0400, Stefan Berger wrote: > > > > > > > > > > On 4/6/23 14:46, Jeff Layton wrote: > > > > > > On Thu, 2023-04-06 at 17:01 +0200, Christian Brauner wrote: > > > > > > > On Thu, Apr 06, 2023 at 10:36:41AM -0400, Paul Moore wrote: > > > > > > > > > > > > > > > > > Correct. As long as IMA is also measuring the upper inode then it seems > > > > > > like you shouldn't need to do anything special here. > > > > > > > > > > Unfortunately IMA does not notice the changes. With the patch provided in the other email IMA works as expected. > > > > > > > > > > > > > > > > > It looks like remeasurement is usually done in ima_check_last_writer. > > > > That gets called from __fput which is called when we're releasing the > > > > last reference to the struct file. > > > > > > > > You've hooked into the ->release op, which gets called whenever > > > > filp_close is called, which happens when we're disassociating the file > > > > from the file descriptor table. > > > > > > > > So...I don't get it. Is ima_file_free not getting called on your file > > > > for some reason when you go to close it? It seems like that should be > > > > handling this. > > > > > > I would ditch the original proposal in favor of this 2-line patch shown here: > > > > > > https://lore.kernel.org/linux-integrity/a95f62ed-8b8a-38e5-e468-ecbde3b221af@linux.ibm.com/T/#m3bd047c6e5c8200df1d273c0ad551c645dd43232 > > > > > > > > > > Ok, I think I get it. IMA is trying to use the i_version from the > > overlayfs inode. > > > > I suspect that the real problem here is that IMA is just doing a bare > > inode_query_iversion. Really, we ought to make IMA call > > vfs_getattr_nosec (or something like it) to query the getattr routine in > > the upper layer. Then overlayfs could just propagate the results from > > the upper layer in its response. > > You mean compare known stat against current ? It seems more expensive to stat the file > rather than using the simple i_version-has-changed indicator. > getattr is fairly cheap on a local filesystem. It's more expensive with something networked, but that's the price of correctness. > > That sort of design may also eventually help IMA work properly with more > > exotic filesystems, like NFS or Ceph. > > And these don't support i_version at all? They absolutely do. Their change attributes are mediated by the server, so they can't use the kernel's mechanism for IS_I_VERSION inodes. They can report that field in their ->getattr routines however.
On 4/6/23 18:04, Jeff Layton wrote: > On Thu, 2023-04-06 at 17:24 -0400, Jeff Layton wrote: >> On Thu, 2023-04-06 at 16:22 -0400, Stefan Berger wrote: >>> >>> On 4/6/23 15:37, Jeff Layton wrote: >>>> On Thu, 2023-04-06 at 15:11 -0400, Stefan Berger wrote: >>>>> >>>>> On 4/6/23 14:46, Jeff Layton wrote: >>>>>> On Thu, 2023-04-06 at 17:01 +0200, Christian Brauner wrote: >>>>>>> On Thu, Apr 06, 2023 at 10:36:41AM -0400, Paul Moore wrote: >>>>> >>>>>> >>>>>> Correct. As long as IMA is also measuring the upper inode then it seems >>>>>> like you shouldn't need to do anything special here. >>>>> >>>>> Unfortunately IMA does not notice the changes. With the patch provided in the other email IMA works as expected. >>>>> >>>> >>>> >>>> It looks like remeasurement is usually done in ima_check_last_writer. >>>> That gets called from __fput which is called when we're releasing the >>>> last reference to the struct file. >>>> >>>> You've hooked into the ->release op, which gets called whenever >>>> filp_close is called, which happens when we're disassociating the file >>>> from the file descriptor table. >>>> >>>> So...I don't get it. Is ima_file_free not getting called on your file >>>> for some reason when you go to close it? It seems like that should be >>>> handling this. >>> >>> I would ditch the original proposal in favor of this 2-line patch shown here: >>> >>> https://lore.kernel.org/linux-integrity/a95f62ed-8b8a-38e5-e468-ecbde3b221af@linux.ibm.com/T/#m3bd047c6e5c8200df1d273c0ad551c645dd43232 >>> >>> >> >> Ok, I think I get it. IMA is trying to use the i_version from the >> overlayfs inode. >> >> I suspect that the real problem here is that IMA is just doing a bare >> inode_query_iversion. Really, we ought to make IMA call >> vfs_getattr_nosec (or something like it) to query the getattr routine in >> the upper layer. Then overlayfs could just propagate the results from >> the upper layer in its response. >> >> That sort of design may also eventually help IMA work properly with more >> exotic filesystems, like NFS or Ceph. >> >> >> > > Maybe something like this? It builds for me but I haven't tested it. It > looks like overlayfs already should report the upper layer's i_version > in getattr, though I haven't tested that either: Thank you! I will give it a try once I am back. Stefan
On Thu, Apr 6, 2023 at 11:23 PM Stefan Berger <stefanb@linux.ibm.com> wrote: > > > > On 4/6/23 15:37, Jeff Layton wrote: > > On Thu, 2023-04-06 at 15:11 -0400, Stefan Berger wrote: > >> > >> On 4/6/23 14:46, Jeff Layton wrote: > >>> On Thu, 2023-04-06 at 17:01 +0200, Christian Brauner wrote: > >>>> On Thu, Apr 06, 2023 at 10:36:41AM -0400, Paul Moore wrote: > >> > >>> > >>> Correct. As long as IMA is also measuring the upper inode then it seems > >>> like you shouldn't need to do anything special here. > >> > >> Unfortunately IMA does not notice the changes. With the patch provided in the other email IMA works as expected. > >> > > > > > > It looks like remeasurement is usually done in ima_check_last_writer. > > That gets called from __fput which is called when we're releasing the > > last reference to the struct file. > > > > You've hooked into the ->release op, which gets called whenever > > filp_close is called, which happens when we're disassociating the file > > from the file descriptor table. > > > > So...I don't get it. Is ima_file_free not getting called on your file > > for some reason when you go to close it? It seems like that should be > > handling this. > > I would ditch the original proposal in favor of this 2-line patch shown here: > > https://lore.kernel.org/linux-integrity/a95f62ed-8b8a-38e5-e468-ecbde3b221af@linux.ibm.com/T/#m3bd047c6e5c8200df1d273c0ad551c645dd43232 > > > The new proposed i_version increase occurs on the inode that IMA sees later on for > the file that's being executed and on which it must do a re-evaluation. > > Upon file changes ima_inode_free() seems to see two ima_file_free() calls, > one for what seems to be the upper layer (used for vfs_* functions below) > and once for the lower one. > The important thing is that IMA will see the lower one when the file gets > executed later on and this is the one that I instrumented now to have its > i_version increased, which in turn triggers the re-evaluation of the file post > modification. > > static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter) > [...] > struct fd real; > [...] > ret = ovl_real_fdget(file, &real); > if (ret) > goto out_unlock; > > [...] > if (is_sync_kiocb(iocb)) { > file_start_write(real.file); > --> ret = vfs_iter_write(real.file, iter, &iocb->ki_pos, > ovl_iocb_to_rwf(ifl)); > file_end_write(real.file); > /* Update size */ > ovl_copyattr(inode); > } else { > struct ovl_aio_req *aio_req; > > ret = -ENOMEM; > aio_req = kmem_cache_zalloc(ovl_aio_request_cachep, GFP_KERNEL); > if (!aio_req) > goto out; > > file_start_write(real.file); > /* Pacify lockdep, same trick as done in aio_write() */ > __sb_writers_release(file_inode(real.file)->i_sb, > SB_FREEZE_WRITE); > aio_req->fd = real; > real.flags = 0; > aio_req->orig_iocb = iocb; > kiocb_clone(&aio_req->iocb, iocb, real.file); > aio_req->iocb.ki_flags = ifl; > aio_req->iocb.ki_complete = ovl_aio_rw_complete; > refcount_set(&aio_req->ref, 2); > --> ret = vfs_iocb_iter_write(real.file, &aio_req->iocb, iter); > ovl_aio_put(aio_req); > if (ret != -EIOCBQUEUED) > ovl_aio_cleanup_handler(aio_req); > } > if (ret > 0) <--- this get it to work > inode_maybe_inc_iversion(inode, false); <--- since this inode is known to IMA If the aio is queued, then I think increasing i_version here may be premature. Note that in this code flow, the ovl ctime is updated in ovl_aio_cleanup_handler() => ovl_copyattr() after file_end_write(), similar to the is_sync_kiocb() code patch. It probably makes most sense to include i_version in ovl_copyattr(). Note that this could cause ovl i_version to go backwards on copy up (i.e. after first open for write) when switching from the lower inode i_version to the upper inode i_version. Jeff's proposal to use vfs_getattr_nosec() in IMA code is fine too. It will result in the same i_version jump. IMO it wouldn't hurt to have a valid i_version value in the ovl inode as well. If the ovl inode values would not matter, we would not have needed ovl_copyattr() at all, but it's not good to keep vfs in the dark... Thanks, Amir.
On Fri, Apr 07, 2023 at 09:42:43AM +0300, Amir Goldstein wrote: > On Thu, Apr 6, 2023 at 11:23 PM Stefan Berger <stefanb@linux.ibm.com> wrote: > > > > > > > > On 4/6/23 15:37, Jeff Layton wrote: > > > On Thu, 2023-04-06 at 15:11 -0400, Stefan Berger wrote: > > >> > > >> On 4/6/23 14:46, Jeff Layton wrote: > > >>> On Thu, 2023-04-06 at 17:01 +0200, Christian Brauner wrote: > > >>>> On Thu, Apr 06, 2023 at 10:36:41AM -0400, Paul Moore wrote: > > >> > > >>> > > >>> Correct. As long as IMA is also measuring the upper inode then it seems > > >>> like you shouldn't need to do anything special here. > > >> > > >> Unfortunately IMA does not notice the changes. With the patch provided in the other email IMA works as expected. > > >> > > > > > > > > > It looks like remeasurement is usually done in ima_check_last_writer. > > > That gets called from __fput which is called when we're releasing the > > > last reference to the struct file. > > > > > > You've hooked into the ->release op, which gets called whenever > > > filp_close is called, which happens when we're disassociating the file > > > from the file descriptor table. > > > > > > So...I don't get it. Is ima_file_free not getting called on your file > > > for some reason when you go to close it? It seems like that should be > > > handling this. > > > > I would ditch the original proposal in favor of this 2-line patch shown here: > > > > https://lore.kernel.org/linux-integrity/a95f62ed-8b8a-38e5-e468-ecbde3b221af@linux.ibm.com/T/#m3bd047c6e5c8200df1d273c0ad551c645dd43232 > > > > > > The new proposed i_version increase occurs on the inode that IMA sees later on for > > the file that's being executed and on which it must do a re-evaluation. > > > > Upon file changes ima_inode_free() seems to see two ima_file_free() calls, > > one for what seems to be the upper layer (used for vfs_* functions below) > > and once for the lower one. > > The important thing is that IMA will see the lower one when the file gets > > executed later on and this is the one that I instrumented now to have its > > i_version increased, which in turn triggers the re-evaluation of the file post > > modification. > > > > static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter) > > [...] > > struct fd real; > > [...] > > ret = ovl_real_fdget(file, &real); > > if (ret) > > goto out_unlock; > > > > [...] > > if (is_sync_kiocb(iocb)) { > > file_start_write(real.file); > > --> ret = vfs_iter_write(real.file, iter, &iocb->ki_pos, > > ovl_iocb_to_rwf(ifl)); > > file_end_write(real.file); > > /* Update size */ > > ovl_copyattr(inode); > > } else { > > struct ovl_aio_req *aio_req; > > > > ret = -ENOMEM; > > aio_req = kmem_cache_zalloc(ovl_aio_request_cachep, GFP_KERNEL); > > if (!aio_req) > > goto out; > > > > file_start_write(real.file); > > /* Pacify lockdep, same trick as done in aio_write() */ > > __sb_writers_release(file_inode(real.file)->i_sb, > > SB_FREEZE_WRITE); > > aio_req->fd = real; > > real.flags = 0; > > aio_req->orig_iocb = iocb; > > kiocb_clone(&aio_req->iocb, iocb, real.file); > > aio_req->iocb.ki_flags = ifl; > > aio_req->iocb.ki_complete = ovl_aio_rw_complete; > > refcount_set(&aio_req->ref, 2); > > --> ret = vfs_iocb_iter_write(real.file, &aio_req->iocb, iter); > > ovl_aio_put(aio_req); > > if (ret != -EIOCBQUEUED) > > ovl_aio_cleanup_handler(aio_req); > > } > > if (ret > 0) <--- this get it to work > > inode_maybe_inc_iversion(inode, false); <--- since this inode is known to IMA > > If the aio is queued, then I think increasing i_version here may be premature. > > Note that in this code flow, the ovl ctime is updated in > ovl_aio_cleanup_handler() => ovl_copyattr() > after file_end_write(), similar to the is_sync_kiocb() code patch. > > It probably makes most sense to include i_version in ovl_copyattr(). > Note that this could cause ovl i_version to go backwards on copy up > (i.e. after first open for write) when switching from the lower inode > i_version to the upper inode i_version. > > Jeff's proposal to use vfs_getattr_nosec() in IMA code is fine too. > It will result in the same i_version jump. > > IMO it wouldn't hurt to have a valid i_version value in the ovl inode > as well. If the ovl inode values would not matter, we would not have > needed ovl_copyattr() at all, but it's not good to keep vfs in the dark... > > Thanks, > Amir. On Thu, Apr 06, 2023 at 05:24:11PM -0400, Jeff Layton wrote: > On Thu, 2023-04-06 at 16:22 -0400, Stefan Berger wrote: > > > > On 4/6/23 15:37, Jeff Layton wrote: > > > On Thu, 2023-04-06 at 15:11 -0400, Stefan Berger wrote: > > > > > > > > On 4/6/23 14:46, Jeff Layton wrote: > > > > > On Thu, 2023-04-06 at 17:01 +0200, Christian Brauner wrote: > > > > > > On Thu, Apr 06, 2023 at 10:36:41AM -0400, Paul Moore wrote: > > > > > > > > > > > > > > Correct. As long as IMA is also measuring the upper inode then it seems > > > > > like you shouldn't need to do anything special here. > > > > > > > > Unfortunately IMA does not notice the changes. With the patch provided in the other email IMA works as expected. > > > > > > > > > > > > > It looks like remeasurement is usually done in ima_check_last_writer. > > > That gets called from __fput which is called when we're releasing the > > > last reference to the struct file. > > > > > > You've hooked into the ->release op, which gets called whenever > > > filp_close is called, which happens when we're disassociating the file > > > from the file descriptor table. > > > > > > So...I don't get it. Is ima_file_free not getting called on your file > > > for some reason when you go to close it? It seems like that should be > > > handling this. > > > > I would ditch the original proposal in favor of this 2-line patch shown here: > > > > https://lore.kernel.org/linux-integrity/a95f62ed-8b8a-38e5-e468-ecbde3b221af@linux.ibm.com/T/#m3bd047c6e5c8200df1d273c0ad551c645dd43232 > > > > > > Ok, I think I get it. IMA is trying to use the i_version from the > overlayfs inode. Which is likely to give wrong results and I agree with you that it should rely on vfs_getattr_nosec(). > > I suspect that the real problem here is that IMA is just doing a bare > inode_query_iversion. Really, we ought to make IMA call > vfs_getattr_nosec (or something like it) to query the getattr routine in > the upper layer. Then overlayfs could just propagate the results from > the upper layer in its response. > > That sort of design may also eventually help IMA work properly with more > exotic filesystems, like NFS or Ceph. > > > The new proposed i_version increase occurs on the inode that IMA sees later on for > > the file that's being executed and on which it must do a re-evaluation. > > > > Upon file changes ima_inode_free() seems to see two ima_file_free() calls, > > one for what seems to be the upper layer (used for vfs_* functions below) > > and once for the lower one. > > The important thing is that IMA will see the lower one when the file gets > > executed later on and this is the one that I instrumented now to have its > > i_version increased, which in turn triggers the re-evaluation of the file post > > modification. > > > > static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter) > > [...] > > struct fd real; > > [...] > > ret = ovl_real_fdget(file, &real); > > if (ret) > > goto out_unlock; > > > > [...] > > if (is_sync_kiocb(iocb)) { > > file_start_write(real.file); > > --> ret = vfs_iter_write(real.file, iter, &iocb->ki_pos, > > ovl_iocb_to_rwf(ifl)); > > file_end_write(real.file); > > /* Update size */ > > ovl_copyattr(inode); > > } else { > > struct ovl_aio_req *aio_req; > > > > ret = -ENOMEM; > > aio_req = kmem_cache_zalloc(ovl_aio_request_cachep, GFP_KERNEL); > > if (!aio_req) > > goto out; > > > > file_start_write(real.file); > > /* Pacify lockdep, same trick as done in aio_write() */ > > __sb_writers_release(file_inode(real.file)->i_sb, > > SB_FREEZE_WRITE); > > aio_req->fd = real; > > real.flags = 0; > > aio_req->orig_iocb = iocb; > > kiocb_clone(&aio_req->iocb, iocb, real.file); > > aio_req->iocb.ki_flags = ifl; > > aio_req->iocb.ki_complete = ovl_aio_rw_complete; > > refcount_set(&aio_req->ref, 2); > > --> ret = vfs_iocb_iter_write(real.file, &aio_req->iocb, iter); > > ovl_aio_put(aio_req); > > if (ret != -EIOCBQUEUED) > > ovl_aio_cleanup_handler(aio_req); > > } > > if (ret > 0) <--- this get it to work > > inode_maybe_inc_iversion(inode, false); <--- since this inode is known to IMA > > out: > > revert_creds(old_cred); > > out_fdput: > > fdput(real); > > > > out_unlock: > > inode_unlock(inode); > > > > > > > > > > Stefan > > > > > > > > In any case, I think this could use a bit more root-cause analysis. > > > > -- > Jeff Layton <jlayton@kernel.org> On Thu, Apr 06, 2023 at 06:04:36PM -0400, Jeff Layton wrote: > On Thu, 2023-04-06 at 17:24 -0400, Jeff Layton wrote: > > On Thu, 2023-04-06 at 16:22 -0400, Stefan Berger wrote: > > > > > > On 4/6/23 15:37, Jeff Layton wrote: > > > > On Thu, 2023-04-06 at 15:11 -0400, Stefan Berger wrote: > > > > > > > > > > On 4/6/23 14:46, Jeff Layton wrote: > > > > > > On Thu, 2023-04-06 at 17:01 +0200, Christian Brauner wrote: > > > > > > > On Thu, Apr 06, 2023 at 10:36:41AM -0400, Paul Moore wrote: > > > > > > > > > > > > > > > > > Correct. As long as IMA is also measuring the upper inode then it seems > > > > > > like you shouldn't need to do anything special here. > > > > > > > > > > Unfortunately IMA does not notice the changes. With the patch provided in the other email IMA works as expected. > > > > > > > > > > > > > > > > > It looks like remeasurement is usually done in ima_check_last_writer. > > > > That gets called from __fput which is called when we're releasing the > > > > last reference to the struct file. > > > > > > > > You've hooked into the ->release op, which gets called whenever > > > > filp_close is called, which happens when we're disassociating the file > > > > from the file descriptor table. > > > > > > > > So...I don't get it. Is ima_file_free not getting called on your file > > > > for some reason when you go to close it? It seems like that should be > > > > handling this. > > > > > > I would ditch the original proposal in favor of this 2-line patch shown here: > > > > > > https://lore.kernel.org/linux-integrity/a95f62ed-8b8a-38e5-e468-ecbde3b221af@linux.ibm.com/T/#m3bd047c6e5c8200df1d273c0ad551c645dd43232 We should cool it with the quick hacks to fix things. :) > > > > > > > > > > Ok, I think I get it. IMA is trying to use the i_version from the > > overlayfs inode. > > > > I suspect that the real problem here is that IMA is just doing a bare > > inode_query_iversion. Really, we ought to make IMA call > > vfs_getattr_nosec (or something like it) to query the getattr routine in > > the upper layer. Then overlayfs could just propagate the results from > > the upper layer in its response. > > > > That sort of design may also eventually help IMA work properly with more > > exotic filesystems, like NFS or Ceph. > > > > > > > > Maybe something like this? It builds for me but I haven't tested it. It > looks like overlayfs already should report the upper layer's i_version > in getattr, though I haven't tested that either: > > -----------------------8<--------------------------- > > [PATCH] IMA: use vfs_getattr_nosec to get the i_version > > IMA currently accesses the i_version out of the inode directly when it > does a measurement. This is fine for most simple filesystems, but can be > problematic with more complex setups (e.g. overlayfs). > > Make IMA instead call vfs_getattr_nosec to get this info. This allows > the filesystem to determine whether and how to report the i_version, and > should allow IMA to work properly with a broader class of filesystems in > the future. > > Reported-by: Stefan Berger <stefanb@linux.ibm.com> > Signed-off-by: Jeff Layton <jlayton@kernel.org> > --- So, I think we want both; we want the ovl_copyattr() and the vfs_getattr_nosec() change: (1) overlayfs should copy up the inode version in ovl_copyattr(). That is in line what we do with all other inode attributes. IOW, the overlayfs inode's i_version counter should aim to mirror the relevant layer's i_version counter. I wouldn't know why that shouldn't be the case. Asking the other way around there doesn't seem to be any use for overlayfs inodes to have an i_version that isn't just mirroring the relevant layer's i_version. (2) Jeff's changes for ima to make it rely on vfs_getattr_nosec(). Currently, ima assumes that it will get the correct i_version from an inode but that just doesn't hold for stacking filesystem. While (1) would likely just fix the immediate bug (2) is correct and _robust_. If we change how attributes are handled vfs_*() helpers will get updated and ima with it. Poking at raw inodes without using appropriate helpers is much more likely to get ima into trouble. Christian
> > > > > > > > I would ditch the original proposal in favor of this 2-line patch shown here: > > > > > > > > https://lore.kernel.org/linux-integrity/a95f62ed-8b8a-38e5-e468-ecbde3b221af@linux.ibm.com/T/#m3bd047c6e5c8200df1d273c0ad551c645dd43232 > > We should cool it with the quick hacks to fix things. :) > Yeah. It might fix this specific testcase, but I think the way it uses the i_version is "gameable" in other situations. Then again, I don't know a lot about IMA in this regard. When is it expected to remeasure? If it's only expected to remeasure on a close(), then that's one thing. That would be a weird design though. > > > > > > > > > > > > > > Ok, I think I get it. IMA is trying to use the i_version from the > > > overlayfs inode. > > > > > > I suspect that the real problem here is that IMA is just doing a bare > > > inode_query_iversion. Really, we ought to make IMA call > > > vfs_getattr_nosec (or something like it) to query the getattr routine in > > > the upper layer. Then overlayfs could just propagate the results from > > > the upper layer in its response. > > > > > > That sort of design may also eventually help IMA work properly with more > > > exotic filesystems, like NFS or Ceph. > > > > > > > > > > > > > Maybe something like this? It builds for me but I haven't tested it. It > > looks like overlayfs already should report the upper layer's i_version > > in getattr, though I haven't tested that either: > > > > -----------------------8<--------------------------- > > > > [PATCH] IMA: use vfs_getattr_nosec to get the i_version > > > > IMA currently accesses the i_version out of the inode directly when it > > does a measurement. This is fine for most simple filesystems, but can be > > problematic with more complex setups (e.g. overlayfs). > > > > Make IMA instead call vfs_getattr_nosec to get this info. This allows > > the filesystem to determine whether and how to report the i_version, and > > should allow IMA to work properly with a broader class of filesystems in > > the future. > > > > Reported-by: Stefan Berger <stefanb@linux.ibm.com> > > Signed-off-by: Jeff Layton <jlayton@kernel.org> > > --- > > So, I think we want both; we want the ovl_copyattr() and the > vfs_getattr_nosec() change: > > (1) overlayfs should copy up the inode version in ovl_copyattr(). That > is in line what we do with all other inode attributes. IOW, the > overlayfs inode's i_version counter should aim to mirror the > relevant layer's i_version counter. I wouldn't know why that > shouldn't be the case. Asking the other way around there doesn't > seem to be any use for overlayfs inodes to have an i_version that > isn't just mirroring the relevant layer's i_version. It's less than ideal to do this IMO, particularly with an IS_I_VERSION inode. You can't just copy up the value from the upper. You'll need to call inode_query_iversion(upper_inode), which will flag the upper inode for a logged i_version update on the next write. IOW, this could create some (probably minor) metadata write amplification in the upper layer inode with IS_I_VERSION inodes. > (2) Jeff's changes for ima to make it rely on vfs_getattr_nosec(). > Currently, ima assumes that it will get the correct i_version from > an inode but that just doesn't hold for stacking filesystem. > > While (1) would likely just fix the immediate bug (2) is correct and > _robust_. If we change how attributes are handled vfs_*() helpers will > get updated and ima with it. Poking at raw inodes without using > appropriate helpers is much more likely to get ima into trouble. This will fix it the right way, I think (assuming it actually works), and should open the door for IMA to work properly with networked filesystems that support i_version as well. Note that there Stephen is correct that calling getattr is probably going to be less efficient here since we're going to end up calling generic_fillattr unnecessarily, but I still think it's the right thing to do. If it turns out to cause measurable performance regressions though, maybe we can look at adding a something that still calls ->getattr if it exists but only returns the change_cookie value.
On Fri, Apr 07, 2023 at 09:29:29AM -0400, Jeff Layton wrote: > > > > > > > > > > I would ditch the original proposal in favor of this 2-line patch shown here: > > > > > > > > > > https://lore.kernel.org/linux-integrity/a95f62ed-8b8a-38e5-e468-ecbde3b221af@linux.ibm.com/T/#m3bd047c6e5c8200df1d273c0ad551c645dd43232 > > > > We should cool it with the quick hacks to fix things. :) > > > > Yeah. It might fix this specific testcase, but I think the way it uses > the i_version is "gameable" in other situations. Then again, I don't > know a lot about IMA in this regard. > > When is it expected to remeasure? If it's only expected to remeasure on > a close(), then that's one thing. That would be a weird design though. > > > > > > > > > > > > > > > > > > > Ok, I think I get it. IMA is trying to use the i_version from the > > > > overlayfs inode. > > > > > > > > I suspect that the real problem here is that IMA is just doing a bare > > > > inode_query_iversion. Really, we ought to make IMA call > > > > vfs_getattr_nosec (or something like it) to query the getattr routine in > > > > the upper layer. Then overlayfs could just propagate the results from > > > > the upper layer in its response. > > > > > > > > That sort of design may also eventually help IMA work properly with more > > > > exotic filesystems, like NFS or Ceph. > > > > > > > > > > > > > > > > > > Maybe something like this? It builds for me but I haven't tested it. It > > > looks like overlayfs already should report the upper layer's i_version > > > in getattr, though I haven't tested that either: > > > > > > -----------------------8<--------------------------- > > > > > > [PATCH] IMA: use vfs_getattr_nosec to get the i_version > > > > > > IMA currently accesses the i_version out of the inode directly when it > > > does a measurement. This is fine for most simple filesystems, but can be > > > problematic with more complex setups (e.g. overlayfs). > > > > > > Make IMA instead call vfs_getattr_nosec to get this info. This allows > > > the filesystem to determine whether and how to report the i_version, and > > > should allow IMA to work properly with a broader class of filesystems in > > > the future. > > > > > > Reported-by: Stefan Berger <stefanb@linux.ibm.com> > > > Signed-off-by: Jeff Layton <jlayton@kernel.org> > > > --- > > > > So, I think we want both; we want the ovl_copyattr() and the > > vfs_getattr_nosec() change: > > > > (1) overlayfs should copy up the inode version in ovl_copyattr(). That > > is in line what we do with all other inode attributes. IOW, the > > overlayfs inode's i_version counter should aim to mirror the > > relevant layer's i_version counter. I wouldn't know why that > > shouldn't be the case. Asking the other way around there doesn't > > seem to be any use for overlayfs inodes to have an i_version that > > isn't just mirroring the relevant layer's i_version. > > It's less than ideal to do this IMO, particularly with an IS_I_VERSION > inode. > > You can't just copy up the value from the upper. You'll need to call > inode_query_iversion(upper_inode), which will flag the upper inode for a > logged i_version update on the next write. IOW, this could create some > (probably minor) metadata write amplification in the upper layer inode > with IS_I_VERSION inodes. I'm likely just missing context and am curious about this so bear with me. Why do we need to flag the upper inode for a logged i_version update? Any required i_version interactions should've already happened when overlayfs called into the upper layer. So all that's left to do is for overlayfs' to mirror the i_version value after the upper operation has returned. ovl_copyattr() - which copies the inode attributes - is always called after the operation on the upper inode has finished. So the additional query seems odd at first glance. But there might well be a good reason for it. In my naive approach I would've thought that sm along the lines of: diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c index 923d66d131c1..8b089035b9b3 100644 --- a/fs/overlayfs/util.c +++ b/fs/overlayfs/util.c @@ -1119,4 +1119,5 @@ void ovl_copyattr(struct inode *inode) inode->i_mtime = realinode->i_mtime; inode->i_ctime = realinode->i_ctime; i_size_write(inode, i_size_read(realinode)); + inode_set_iversion_raw(inode, inode_peek_iversion_raw(realinode)); } would've been sufficient. Since overlayfs' does explicitly disallow changes to the upper and lower trees while overlayfs is mounted it seems intuitive that it should just mirror the relevant layer's i_version. If we don't do this, then we should probably document that i_version doesn't have a meaning yet for the inodes of stacking filesystems. > > > > (2) Jeff's changes for ima to make it rely on vfs_getattr_nosec(). > > Currently, ima assumes that it will get the correct i_version from > > an inode but that just doesn't hold for stacking filesystem. > > > > While (1) would likely just fix the immediate bug (2) is correct and > > _robust_. If we change how attributes are handled vfs_*() helpers will > > get updated and ima with it. Poking at raw inodes without using > > appropriate helpers is much more likely to get ima into trouble. > > This will fix it the right way, I think (assuming it actually works), > and should open the door for IMA to work properly with networked > filesystems that support i_version as well. > > Note that there Stephen is correct that calling getattr is probably > going to be less efficient here since we're going to end up calling > generic_fillattr unnecessarily, but I still think it's the right thing > to do. > > If it turns out to cause measurable performance regressions though, > maybe we can look at adding a something that still calls ->getattr if it > exists but only returns the change_cookie value. Sounds good to me.
On Sun, 2023-04-09 at 17:22 +0200, Christian Brauner wrote: > On Fri, Apr 07, 2023 at 09:29:29AM -0400, Jeff Layton wrote: > > > > > > > > > > > > I would ditch the original proposal in favor of this 2-line patch shown here: > > > > > > > > > > > > https://lore.kernel.org/linux-integrity/a95f62ed-8b8a-38e5-e468-ecbde3b221af@linux.ibm.com/T/#m3bd047c6e5c8200df1d273c0ad551c645dd43232 > > > > > > We should cool it with the quick hacks to fix things. :) > > > > > > > Yeah. It might fix this specific testcase, but I think the way it uses > > the i_version is "gameable" in other situations. Then again, I don't > > know a lot about IMA in this regard. > > > > When is it expected to remeasure? If it's only expected to remeasure on > > a close(), then that's one thing. That would be a weird design though. > > > > > > > > > > > > > > > > > > > > > > > > Ok, I think I get it. IMA is trying to use the i_version from the > > > > > overlayfs inode. > > > > > > > > > > I suspect that the real problem here is that IMA is just doing a bare > > > > > inode_query_iversion. Really, we ought to make IMA call > > > > > vfs_getattr_nosec (or something like it) to query the getattr routine in > > > > > the upper layer. Then overlayfs could just propagate the results from > > > > > the upper layer in its response. > > > > > > > > > > That sort of design may also eventually help IMA work properly with more > > > > > exotic filesystems, like NFS or Ceph. > > > > > > > > > > > > > > > > > > > > > > > Maybe something like this? It builds for me but I haven't tested it. It > > > > looks like overlayfs already should report the upper layer's i_version > > > > in getattr, though I haven't tested that either: > > > > > > > > -----------------------8<--------------------------- > > > > > > > > [PATCH] IMA: use vfs_getattr_nosec to get the i_version > > > > > > > > IMA currently accesses the i_version out of the inode directly when it > > > > does a measurement. This is fine for most simple filesystems, but can be > > > > problematic with more complex setups (e.g. overlayfs). > > > > > > > > Make IMA instead call vfs_getattr_nosec to get this info. This allows > > > > the filesystem to determine whether and how to report the i_version, and > > > > should allow IMA to work properly with a broader class of filesystems in > > > > the future. > > > > > > > > Reported-by: Stefan Berger <stefanb@linux.ibm.com> > > > > Signed-off-by: Jeff Layton <jlayton@kernel.org> > > > > --- > > > > > > So, I think we want both; we want the ovl_copyattr() and the > > > vfs_getattr_nosec() change: > > > > > > (1) overlayfs should copy up the inode version in ovl_copyattr(). That > > > is in line what we do with all other inode attributes. IOW, the > > > overlayfs inode's i_version counter should aim to mirror the > > > relevant layer's i_version counter. I wouldn't know why that > > > shouldn't be the case. Asking the other way around there doesn't > > > seem to be any use for overlayfs inodes to have an i_version that > > > isn't just mirroring the relevant layer's i_version. > > > > It's less than ideal to do this IMO, particularly with an IS_I_VERSION > > inode. > > > > You can't just copy up the value from the upper. You'll need to call > > inode_query_iversion(upper_inode), which will flag the upper inode for a > > logged i_version update on the next write. IOW, this could create some > > (probably minor) metadata write amplification in the upper layer inode > > with IS_I_VERSION inodes. > > I'm likely just missing context and am curious about this so bear with me. Why > do we need to flag the upper inode for a logged i_version update? Any required > i_version interactions should've already happened when overlayfs called into > the upper layer. So all that's left to do is for overlayfs' to mirror the > i_version value after the upper operation has returned. > ovl_copyattr() - which copies the inode attributes - is always called after the > operation on the upper inode has finished. So the additional query seems odd at > first glance. But there might well be a good reason for it. In my naive > approach I would've thought that sm along the lines of: > > diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c > index 923d66d131c1..8b089035b9b3 100644 > --- a/fs/overlayfs/util.c > +++ b/fs/overlayfs/util.c > @@ -1119,4 +1119,5 @@ void ovl_copyattr(struct inode *inode) > inode->i_mtime = realinode->i_mtime; > inode->i_ctime = realinode->i_ctime; > i_size_write(inode, i_size_read(realinode)); > + inode_set_iversion_raw(inode, inode_peek_iversion_raw(realinode)); > } > > would've been sufficient. > Nope, because then you wouldn't get any updates to i_version after that point. Note that with an IS_I_VERSION inode we only update the i_version when there has been a query since the last update. What you're doing above is circumventing that mechanism. You'll get the i_version at the time of of the ovl_copyattr, but there won't be any updates of it after that point because the QUERIED bit won't end up being set on realinode. > Since overlayfs' does explicitly disallow changes to the upper and lower trees > while overlayfs is mounted it seems intuitive that it should just mirror the > relevant layer's i_version. > > > If we don't do this, then we should probably document that i_version doesn't > have a meaning yet for the inodes of stacking filesystems. > Trying to cache the i_version is counterproductive, IMO, at least with an IS_I_VERSION inode. The problem is that a query against the i_version has a side-effect. It has to (atomically) mark the inode for an update on the next change. If you try to cache that value, you'll likely end up doing more queries than you really need to (because you'll need to keep the cache up to date) and you'll have an i_version that will necessarily lag the one in the upper layer inode. The whole point of the change attribute is to get the value as it is at this very moment so we can check whether there have been changes. A laggy value is not terribly useful. Overlayfs should just always call the upper layer's ->getattr to get the version. I wouldn't even bother copying it up in the first place. Doing so is just encouraging someone to try use the value in the overlayfs inode, when they really need to go through ->getattr and get the one from the upper layer. >
On Sun, Apr 09, 2023 at 06:12:09PM -0400, Jeff Layton wrote: > On Sun, 2023-04-09 at 17:22 +0200, Christian Brauner wrote: > > On Fri, Apr 07, 2023 at 09:29:29AM -0400, Jeff Layton wrote: > > > > > > > > > > > > > > I would ditch the original proposal in favor of this 2-line patch shown here: > > > > > > > > > > > > > > https://lore.kernel.org/linux-integrity/a95f62ed-8b8a-38e5-e468-ecbde3b221af@linux.ibm.com/T/#m3bd047c6e5c8200df1d273c0ad551c645dd43232 > > > > > > > > We should cool it with the quick hacks to fix things. :) > > > > > > > > > > Yeah. It might fix this specific testcase, but I think the way it uses > > > the i_version is "gameable" in other situations. Then again, I don't > > > know a lot about IMA in this regard. > > > > > > When is it expected to remeasure? If it's only expected to remeasure on > > > a close(), then that's one thing. That would be a weird design though. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Ok, I think I get it. IMA is trying to use the i_version from the > > > > > > overlayfs inode. > > > > > > > > > > > > I suspect that the real problem here is that IMA is just doing a bare > > > > > > inode_query_iversion. Really, we ought to make IMA call > > > > > > vfs_getattr_nosec (or something like it) to query the getattr routine in > > > > > > the upper layer. Then overlayfs could just propagate the results from > > > > > > the upper layer in its response. > > > > > > > > > > > > That sort of design may also eventually help IMA work properly with more > > > > > > exotic filesystems, like NFS or Ceph. > > > > > > > > > > > > > > > > > > > > > > > > > > > > Maybe something like this? It builds for me but I haven't tested it. It > > > > > looks like overlayfs already should report the upper layer's i_version > > > > > in getattr, though I haven't tested that either: > > > > > > > > > > -----------------------8<--------------------------- > > > > > > > > > > [PATCH] IMA: use vfs_getattr_nosec to get the i_version > > > > > > > > > > IMA currently accesses the i_version out of the inode directly when it > > > > > does a measurement. This is fine for most simple filesystems, but can be > > > > > problematic with more complex setups (e.g. overlayfs). > > > > > > > > > > Make IMA instead call vfs_getattr_nosec to get this info. This allows > > > > > the filesystem to determine whether and how to report the i_version, and > > > > > should allow IMA to work properly with a broader class of filesystems in > > > > > the future. > > > > > > > > > > Reported-by: Stefan Berger <stefanb@linux.ibm.com> > > > > > Signed-off-by: Jeff Layton <jlayton@kernel.org> > > > > > --- > > > > > > > > So, I think we want both; we want the ovl_copyattr() and the > > > > vfs_getattr_nosec() change: > > > > > > > > (1) overlayfs should copy up the inode version in ovl_copyattr(). That > > > > is in line what we do with all other inode attributes. IOW, the > > > > overlayfs inode's i_version counter should aim to mirror the > > > > relevant layer's i_version counter. I wouldn't know why that > > > > shouldn't be the case. Asking the other way around there doesn't > > > > seem to be any use for overlayfs inodes to have an i_version that > > > > isn't just mirroring the relevant layer's i_version. > > > > > > It's less than ideal to do this IMO, particularly with an IS_I_VERSION > > > inode. > > > > > > You can't just copy up the value from the upper. You'll need to call > > > inode_query_iversion(upper_inode), which will flag the upper inode for a > > > logged i_version update on the next write. IOW, this could create some > > > (probably minor) metadata write amplification in the upper layer inode > > > with IS_I_VERSION inodes. > > > > I'm likely just missing context and am curious about this so bear with me. Why > > do we need to flag the upper inode for a logged i_version update? Any required > > i_version interactions should've already happened when overlayfs called into > > the upper layer. So all that's left to do is for overlayfs' to mirror the > > i_version value after the upper operation has returned. > > > ovl_copyattr() - which copies the inode attributes - is always called after the > > operation on the upper inode has finished. So the additional query seems odd at > > first glance. But there might well be a good reason for it. In my naive > > approach I would've thought that sm along the lines of: > > > > diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c > > index 923d66d131c1..8b089035b9b3 100644 > > --- a/fs/overlayfs/util.c > > +++ b/fs/overlayfs/util.c > > @@ -1119,4 +1119,5 @@ void ovl_copyattr(struct inode *inode) > > inode->i_mtime = realinode->i_mtime; > > inode->i_ctime = realinode->i_ctime; > > i_size_write(inode, i_size_read(realinode)); > > + inode_set_iversion_raw(inode, inode_peek_iversion_raw(realinode)); > > } > > > > would've been sufficient. > > > > Nope, because then you wouldn't get any updates to i_version after that > point. > > Note that with an IS_I_VERSION inode we only update the i_version when > there has been a query since the last update. What you're doing above is > circumventing that mechanism. You'll get the i_version at the time of of > the ovl_copyattr, but there won't be any updates of it after that point > because the QUERIED bit won't end up being set on realinode. I get all that. But my understanding had been that the i_version value at the time of ovl_copyattr() would be correct. Because when ovl_copyattr() is called the expected i_version change will have been done in the relevant layer includig raising the QUERIED bit. Since the layers are not allowed to be changed outside of the overlayfs mount any change to them can only originate from overlayfs which would necessarily call ovl_copyattr() again. IOW, overlayfs would by virtue of its implementation keep the i_version value in sync. Overlayfs wouldn't even raise SB_I_VERSION. It would indeed just be a cache of i_version of the relevant layer. > > > > Since overlayfs' does explicitly disallow changes to the upper and lower trees > > while overlayfs is mounted it seems intuitive that it should just mirror the > > relevant layer's i_version. > > > > > > If we don't do this, then we should probably document that i_version doesn't > > have a meaning yet for the inodes of stacking filesystems. > > > > Trying to cache the i_version is counterproductive, IMO, at least with > an IS_I_VERSION inode. > > The problem is that a query against the i_version has a side-effect. It > has to (atomically) mark the inode for an update on the next change. > > If you try to cache that value, you'll likely end up doing more queries > than you really need to (because you'll need to keep the cache up to > date) and you'll have an i_version that will necessarily lag the one in > the upper layer inode. > > The whole point of the change attribute is to get the value as it is at > this very moment so we can check whether there have been changes. A > laggy value is not terribly useful. > > Overlayfs should just always call the upper layer's ->getattr to get the > version. I wouldn't even bother copying it up in the first place. Doing > so is just encouraging someone to try use the value in the overlayfs > inode, when they really need to go through ->getattr and get the one > from the upper layer. That seems reasonable to me. I read this as an agreeing with my earlier suggestion to document that i_version doesn't have a meaning for the inodes of stacking filesystems and that we should spell out that vfs_getattr()/->getattr() needs to be used to interact with i_version. We need to explain to subsystems such as IMA somwhere what the correct way to query i_version agnostically is; independent of filesystem implementation details. Looking at IMA, it queries the i_version directly without checking whether it's an IS_I_VERSION() inode first. This might make a difference. Afaict, filesystems that persist i_version to disk automatically raise SB_I_VERSION. I would guess that it be considered a bug if a filesystem would persist i_version to disk and not raise SB_I_VERSION. If so IMA should probably be made to check for IS_I_VERSION() and it will probably get that by switching to vfs_getattr_nosec().
On Tue, 2023-04-11 at 10:38 +0200, Christian Brauner wrote: > On Sun, Apr 09, 2023 at 06:12:09PM -0400, Jeff Layton wrote: > > On Sun, 2023-04-09 at 17:22 +0200, Christian Brauner wrote: > > > On Fri, Apr 07, 2023 at 09:29:29AM -0400, Jeff Layton wrote: > > > > > > > > > > > > > > > > I would ditch the original proposal in favor of this 2-line patch shown here: > > > > > > > > > > > > > > > > https://lore.kernel.org/linux-integrity/a95f62ed-8b8a-38e5-e468-ecbde3b221af@linux.ibm.com/T/#m3bd047c6e5c8200df1d273c0ad551c645dd43232 > > > > > > > > > > We should cool it with the quick hacks to fix things. :) > > > > > > > > > > > > > Yeah. It might fix this specific testcase, but I think the way it uses > > > > the i_version is "gameable" in other situations. Then again, I don't > > > > know a lot about IMA in this regard. > > > > > > > > When is it expected to remeasure? If it's only expected to remeasure on > > > > a close(), then that's one thing. That would be a weird design though. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Ok, I think I get it. IMA is trying to use the i_version from the > > > > > > > overlayfs inode. > > > > > > > > > > > > > > I suspect that the real problem here is that IMA is just doing a bare > > > > > > > inode_query_iversion. Really, we ought to make IMA call > > > > > > > vfs_getattr_nosec (or something like it) to query the getattr routine in > > > > > > > the upper layer. Then overlayfs could just propagate the results from > > > > > > > the upper layer in its response. > > > > > > > > > > > > > > That sort of design may also eventually help IMA work properly with more > > > > > > > exotic filesystems, like NFS or Ceph. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Maybe something like this? It builds for me but I haven't tested it. It > > > > > > looks like overlayfs already should report the upper layer's i_version > > > > > > in getattr, though I haven't tested that either: > > > > > > > > > > > > -----------------------8<--------------------------- > > > > > > > > > > > > [PATCH] IMA: use vfs_getattr_nosec to get the i_version > > > > > > > > > > > > IMA currently accesses the i_version out of the inode directly when it > > > > > > does a measurement. This is fine for most simple filesystems, but can be > > > > > > problematic with more complex setups (e.g. overlayfs). > > > > > > > > > > > > Make IMA instead call vfs_getattr_nosec to get this info. This allows > > > > > > the filesystem to determine whether and how to report the i_version, and > > > > > > should allow IMA to work properly with a broader class of filesystems in > > > > > > the future. > > > > > > > > > > > > Reported-by: Stefan Berger <stefanb@linux.ibm.com> > > > > > > Signed-off-by: Jeff Layton <jlayton@kernel.org> > > > > > > --- > > > > > > > > > > So, I think we want both; we want the ovl_copyattr() and the > > > > > vfs_getattr_nosec() change: > > > > > > > > > > (1) overlayfs should copy up the inode version in ovl_copyattr(). That > > > > > is in line what we do with all other inode attributes. IOW, the > > > > > overlayfs inode's i_version counter should aim to mirror the > > > > > relevant layer's i_version counter. I wouldn't know why that > > > > > shouldn't be the case. Asking the other way around there doesn't > > > > > seem to be any use for overlayfs inodes to have an i_version that > > > > > isn't just mirroring the relevant layer's i_version. > > > > > > > > It's less than ideal to do this IMO, particularly with an IS_I_VERSION > > > > inode. > > > > > > > > You can't just copy up the value from the upper. You'll need to call > > > > inode_query_iversion(upper_inode), which will flag the upper inode for a > > > > logged i_version update on the next write. IOW, this could create some > > > > (probably minor) metadata write amplification in the upper layer inode > > > > with IS_I_VERSION inodes. > > > > > > I'm likely just missing context and am curious about this so bear with me. Why > > > do we need to flag the upper inode for a logged i_version update? Any required > > > i_version interactions should've already happened when overlayfs called into > > > the upper layer. So all that's left to do is for overlayfs' to mirror the > > > i_version value after the upper operation has returned. > > > > > ovl_copyattr() - which copies the inode attributes - is always called after the > > > operation on the upper inode has finished. So the additional query seems odd at > > > first glance. But there might well be a good reason for it. In my naive > > > approach I would've thought that sm along the lines of: > > > > > > diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c > > > index 923d66d131c1..8b089035b9b3 100644 > > > --- a/fs/overlayfs/util.c > > > +++ b/fs/overlayfs/util.c > > > @@ -1119,4 +1119,5 @@ void ovl_copyattr(struct inode *inode) > > > inode->i_mtime = realinode->i_mtime; > > > inode->i_ctime = realinode->i_ctime; > > > i_size_write(inode, i_size_read(realinode)); > > > + inode_set_iversion_raw(inode, inode_peek_iversion_raw(realinode)); > > > } > > > > > > would've been sufficient. > > > > > > > Nope, because then you wouldn't get any updates to i_version after that > > point. > > > > Note that with an IS_I_VERSION inode we only update the i_version when > > there has been a query since the last update. What you're doing above is > > circumventing that mechanism. You'll get the i_version at the time of of > > the ovl_copyattr, but there won't be any updates of it after that point > > because the QUERIED bit won't end up being set on realinode. > > I get all that. > But my understanding had been that the i_version value at the time of > ovl_copyattr() would be correct. Because when ovl_copyattr() is called > the expected i_version change will have been done in the relevant layer > includig raising the QUERIED bit. Since the layers are not allowed to be > changed outside of the overlayfs mount any change to them can only > originate from overlayfs which would necessarily call ovl_copyattr() > again. IOW, overlayfs would by virtue of its implementation keep the > i_version value in sync. > > Overlayfs wouldn't even raise SB_I_VERSION. It would indeed just be a > cache of i_version of the relevant layer. > > > > > > > > Since overlayfs' does explicitly disallow changes to the upper and lower trees > > > while overlayfs is mounted it seems intuitive that it should just mirror the > > > relevant layer's i_version. > > > > > > > > > If we don't do this, then we should probably document that i_version doesn't > > > have a meaning yet for the inodes of stacking filesystems. > > > > > > > Trying to cache the i_version is counterproductive, IMO, at least with > > an IS_I_VERSION inode. > > > > The problem is that a query against the i_version has a side-effect. It > > has to (atomically) mark the inode for an update on the next change. > > > > If you try to cache that value, you'll likely end up doing more queries > > than you really need to (because you'll need to keep the cache up to > > date) and you'll have an i_version that will necessarily lag the one in > > the upper layer inode. > > > > The whole point of the change attribute is to get the value as it is at > > this very moment so we can check whether there have been changes. A > > laggy value is not terribly useful. > > > > Overlayfs should just always call the upper layer's ->getattr to get the > > version. I wouldn't even bother copying it up in the first place. Doing > > so is just encouraging someone to try use the value in the overlayfs > > inode, when they really need to go through ->getattr and get the one > > from the upper layer. > > That seems reasonable to me. I read this as an agreeing with my earlier > suggestion to document that i_version doesn't have a meaning for the > inodes of stacking filesystems and that we should spell out that > vfs_getattr()/->getattr() needs to be used to interact with i_version. > It really has no meaning in the stacked filesystem's _inode_. The only i_version that has any meaning in a (simple) stacking setup is the upper layer inode. > We need to explain to subsystems such as IMA somwhere what the correct > way to query i_version agnostically is; independent of filesystem > implementation details. > > Looking at IMA, it queries the i_version directly without checking > whether it's an IS_I_VERSION() inode first. This might make a > difference. > IMA should just use getattr. That allows the filesystem to present the i_version to the caller as it sees fit. Fetching i_version directly without testing for IS_I_VERSION is wrong, because you don't know what that field contains, or whether the fs supports it at all. > Afaict, filesystems that persist i_version to disk automatically raise > SB_I_VERSION. I would guess that it be considered a bug if a filesystem > would persist i_version to disk and not raise SB_I_VERSION. If so IMA > should probably be made to check for IS_I_VERSION() and it will probably > get that by switching to vfs_getattr_nosec(). Not quite. SB_I_VERSION tells the vfs that the filesystem wants the kernel to manage the increment of the i_version for it. The filesystem is still responsible for persisting that value to disk (if appropriate). Switching to vfs_getattr_nosec should make it so IMA doesn't need to worry about the gory details of all of this. If STATX_CHANGE_COOKIE is set in the response, then it can trust that value. Otherwise, it's no good.
On Tue, Apr 11, 2023 at 05:32:11AM -0400, Jeff Layton wrote: > On Tue, 2023-04-11 at 10:38 +0200, Christian Brauner wrote: > > On Sun, Apr 09, 2023 at 06:12:09PM -0400, Jeff Layton wrote: > > > On Sun, 2023-04-09 at 17:22 +0200, Christian Brauner wrote: > > > > On Fri, Apr 07, 2023 at 09:29:29AM -0400, Jeff Layton wrote: > > > > > > > > > > > > > > > > > > I would ditch the original proposal in favor of this 2-line patch shown here: > > > > > > > > > > > > > > > > > > https://lore.kernel.org/linux-integrity/a95f62ed-8b8a-38e5-e468-ecbde3b221af@linux.ibm.com/T/#m3bd047c6e5c8200df1d273c0ad551c645dd43232 > > > > > > > > > > > > We should cool it with the quick hacks to fix things. :) > > > > > > > > > > > > > > > > Yeah. It might fix this specific testcase, but I think the way it uses > > > > > the i_version is "gameable" in other situations. Then again, I don't > > > > > know a lot about IMA in this regard. > > > > > > > > > > When is it expected to remeasure? If it's only expected to remeasure on > > > > > a close(), then that's one thing. That would be a weird design though. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Ok, I think I get it. IMA is trying to use the i_version from the > > > > > > > > overlayfs inode. > > > > > > > > > > > > > > > > I suspect that the real problem here is that IMA is just doing a bare > > > > > > > > inode_query_iversion. Really, we ought to make IMA call > > > > > > > > vfs_getattr_nosec (or something like it) to query the getattr routine in > > > > > > > > the upper layer. Then overlayfs could just propagate the results from > > > > > > > > the upper layer in its response. > > > > > > > > > > > > > > > > That sort of design may also eventually help IMA work properly with more > > > > > > > > exotic filesystems, like NFS or Ceph. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Maybe something like this? It builds for me but I haven't tested it. It > > > > > > > looks like overlayfs already should report the upper layer's i_version > > > > > > > in getattr, though I haven't tested that either: > > > > > > > > > > > > > > -----------------------8<--------------------------- > > > > > > > > > > > > > > [PATCH] IMA: use vfs_getattr_nosec to get the i_version > > > > > > > > > > > > > > IMA currently accesses the i_version out of the inode directly when it > > > > > > > does a measurement. This is fine for most simple filesystems, but can be > > > > > > > problematic with more complex setups (e.g. overlayfs). > > > > > > > > > > > > > > Make IMA instead call vfs_getattr_nosec to get this info. This allows > > > > > > > the filesystem to determine whether and how to report the i_version, and > > > > > > > should allow IMA to work properly with a broader class of filesystems in > > > > > > > the future. > > > > > > > > > > > > > > Reported-by: Stefan Berger <stefanb@linux.ibm.com> > > > > > > > Signed-off-by: Jeff Layton <jlayton@kernel.org> > > > > > > > --- > > > > > > > > > > > > So, I think we want both; we want the ovl_copyattr() and the > > > > > > vfs_getattr_nosec() change: > > > > > > > > > > > > (1) overlayfs should copy up the inode version in ovl_copyattr(). That > > > > > > is in line what we do with all other inode attributes. IOW, the > > > > > > overlayfs inode's i_version counter should aim to mirror the > > > > > > relevant layer's i_version counter. I wouldn't know why that > > > > > > shouldn't be the case. Asking the other way around there doesn't > > > > > > seem to be any use for overlayfs inodes to have an i_version that > > > > > > isn't just mirroring the relevant layer's i_version. > > > > > > > > > > It's less than ideal to do this IMO, particularly with an IS_I_VERSION > > > > > inode. > > > > > > > > > > You can't just copy up the value from the upper. You'll need to call > > > > > inode_query_iversion(upper_inode), which will flag the upper inode for a > > > > > logged i_version update on the next write. IOW, this could create some > > > > > (probably minor) metadata write amplification in the upper layer inode > > > > > with IS_I_VERSION inodes. > > > > > > > > I'm likely just missing context and am curious about this so bear with me. Why > > > > do we need to flag the upper inode for a logged i_version update? Any required > > > > i_version interactions should've already happened when overlayfs called into > > > > the upper layer. So all that's left to do is for overlayfs' to mirror the > > > > i_version value after the upper operation has returned. > > > > > > > ovl_copyattr() - which copies the inode attributes - is always called after the > > > > operation on the upper inode has finished. So the additional query seems odd at > > > > first glance. But there might well be a good reason for it. In my naive > > > > approach I would've thought that sm along the lines of: > > > > > > > > diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c > > > > index 923d66d131c1..8b089035b9b3 100644 > > > > --- a/fs/overlayfs/util.c > > > > +++ b/fs/overlayfs/util.c > > > > @@ -1119,4 +1119,5 @@ void ovl_copyattr(struct inode *inode) > > > > inode->i_mtime = realinode->i_mtime; > > > > inode->i_ctime = realinode->i_ctime; > > > > i_size_write(inode, i_size_read(realinode)); > > > > + inode_set_iversion_raw(inode, inode_peek_iversion_raw(realinode)); > > > > } > > > > > > > > would've been sufficient. > > > > > > > > > > Nope, because then you wouldn't get any updates to i_version after that > > > point. > > > > > > Note that with an IS_I_VERSION inode we only update the i_version when > > > there has been a query since the last update. What you're doing above is > > > circumventing that mechanism. You'll get the i_version at the time of of > > > the ovl_copyattr, but there won't be any updates of it after that point > > > because the QUERIED bit won't end up being set on realinode. > > > > I get all that. > > But my understanding had been that the i_version value at the time of > > ovl_copyattr() would be correct. Because when ovl_copyattr() is called > > the expected i_version change will have been done in the relevant layer > > includig raising the QUERIED bit. Since the layers are not allowed to be > > changed outside of the overlayfs mount any change to them can only > > originate from overlayfs which would necessarily call ovl_copyattr() > > again. IOW, overlayfs would by virtue of its implementation keep the > > i_version value in sync. > > > > Overlayfs wouldn't even raise SB_I_VERSION. It would indeed just be a > > cache of i_version of the relevant layer. > > > > > > > > > > > > Since overlayfs' does explicitly disallow changes to the upper and lower trees > > > > while overlayfs is mounted it seems intuitive that it should just mirror the > > > > relevant layer's i_version. > > > > > > > > > > > > If we don't do this, then we should probably document that i_version doesn't > > > > have a meaning yet for the inodes of stacking filesystems. > > > > > > > > > > Trying to cache the i_version is counterproductive, IMO, at least with > > > an IS_I_VERSION inode. > > > > > > The problem is that a query against the i_version has a side-effect. It > > > has to (atomically) mark the inode for an update on the next change. > > > > > > If you try to cache that value, you'll likely end up doing more queries > > > than you really need to (because you'll need to keep the cache up to > > > date) and you'll have an i_version that will necessarily lag the one in > > > the upper layer inode. > > > > > > The whole point of the change attribute is to get the value as it is at > > > this very moment so we can check whether there have been changes. A > > > laggy value is not terribly useful. > > > > > > Overlayfs should just always call the upper layer's ->getattr to get the > > > version. I wouldn't even bother copying it up in the first place. Doing > > > so is just encouraging someone to try use the value in the overlayfs > > > inode, when they really need to go through ->getattr and get the one > > > from the upper layer. > > > > That seems reasonable to me. I read this as an agreeing with my earlier > > suggestion to document that i_version doesn't have a meaning for the > > inodes of stacking filesystems and that we should spell out that > > vfs_getattr()/->getattr() needs to be used to interact with i_version. > > > > It really has no meaning in the stacked filesystem's _inode_. The only > i_version that has any meaning in a (simple) stacking setup is the upper > layer inode. Ok, we're on the same page then. > > > We need to explain to subsystems such as IMA somwhere what the correct > > way to query i_version agnostically is; independent of filesystem > > implementation details. > > > > Looking at IMA, it queries the i_version directly without checking > > whether it's an IS_I_VERSION() inode first. This might make a > > difference. > > > > IMA should just use getattr. That allows the filesystem to present the > i_version to the caller as it sees fit. Fetching i_version directly > without testing for IS_I_VERSION is wrong, because you don't know what > that field contains, or whether the fs supports it at all. Yep, same page again. > > > Afaict, filesystems that persist i_version to disk automatically raise > > SB_I_VERSION. I would guess that it be considered a bug if a filesystem > > would persist i_version to disk and not raise SB_I_VERSION. If so IMA > > should probably be made to check for IS_I_VERSION() and it will probably > > get that by switching to vfs_getattr_nosec(). > > Not quite. SB_I_VERSION tells the vfs that the filesystem wants the > kernel to manage the increment of the i_version for it. The filesystem > is still responsible for persisting that value to disk (if appropriate). Yes, sure it's the filesystems responsibility to persist it to disk or not. What I tried to ask was that when a filesystem does persist i_version to disk then would it be legal to mount it without SB_I_VERSION (because ext2/ext3 did use to have that mount option)? If it would then the filesystem would probably need to take care to leave the i_version field in struct inode uninitialized to avoid confusion or would that just work? (Mere curiosity, don't feel obligated to go into detail here. I don't want to hog your time.) > > Switching to vfs_getattr_nosec should make it so IMA doesn't need to > worry about the gory details of all of this. If STATX_CHANGE_COOKIE is > set in the response, then it can trust that value. Otherwise, it's no > good. Yep, same page again.
On Tue, 2023-04-11 at 11:49 +0200, Christian Brauner wrote: > > > > > > Afaict, filesystems that persist i_version to disk automatically raise > > > SB_I_VERSION. I would guess that it be considered a bug if a filesystem > > > would persist i_version to disk and not raise SB_I_VERSION. If so IMA > > > should probably be made to check for IS_I_VERSION() and it will probably > > > get that by switching to vfs_getattr_nosec(). > > > > Not quite. SB_I_VERSION tells the vfs that the filesystem wants the > > kernel to manage the increment of the i_version for it. The filesystem > > is still responsible for persisting that value to disk (if appropriate). > > Yes, sure it's the filesystems responsibility to persist it to disk or > not. What I tried to ask was that when a filesystem does persist > i_version to disk then would it be legal to mount it without > SB_I_VERSION (because ext2/ext3 did use to have that mount option)? If > it would then the filesystem would probably need to take care to leave > the i_version field in struct inode uninitialized to avoid confusion or > would that just work? (Mere curiosity, don't feel obligated to go into > detail here. I don't want to hog your time.) > In modern kernels, not setting SB_I_VERSION would mainly have the effect of stopping increments of i_version field on write. It would also mean that the STATX_CHANGE_COOKIE is not automatically reported via getattr. You probably wouldn't want to mount the fs without SB_I_VERSION set. The missing increments could trick an observer into believing that nothing had changed in the file across mounts when it actually had.
On Tue, Apr 11, 2023 at 06:13:15AM -0400, Jeff Layton wrote: > On Tue, 2023-04-11 at 11:49 +0200, Christian Brauner wrote: > > > > > > > > > Afaict, filesystems that persist i_version to disk automatically raise > > > > SB_I_VERSION. I would guess that it be considered a bug if a filesystem > > > > would persist i_version to disk and not raise SB_I_VERSION. If so IMA > > > > should probably be made to check for IS_I_VERSION() and it will probably > > > > get that by switching to vfs_getattr_nosec(). > > > > > > Not quite. SB_I_VERSION tells the vfs that the filesystem wants the > > > kernel to manage the increment of the i_version for it. The filesystem > > > is still responsible for persisting that value to disk (if appropriate). > > > > Yes, sure it's the filesystems responsibility to persist it to disk or > > not. What I tried to ask was that when a filesystem does persist > > i_version to disk then would it be legal to mount it without > > SB_I_VERSION (because ext2/ext3 did use to have that mount option)? If > > it would then the filesystem would probably need to take care to leave > > the i_version field in struct inode uninitialized to avoid confusion or > > would that just work? (Mere curiosity, don't feel obligated to go into > > detail here. I don't want to hog your time.) > > > > In modern kernels, not setting SB_I_VERSION would mainly have the effect > of stopping increments of i_version field on write. It would also mean > that the STATX_CHANGE_COOKIE is not automatically reported via getattr. Ah, good. > > You probably wouldn't want to mount the fs without SB_I_VERSION set. The > missing increments could trick an observer into believing that nothing > had changed in the file across mounts when it actually had. Yeah, that's what I thought and that would potentially be an attack on IMA which is why I asked. Thanks! Christian
On 4/7/23 09:29, Jeff Layton wrote: >>>>> >>>>> I would ditch the original proposal in favor of this 2-line patch shown here: >>>>> >>>>> https://lore.kernel.org/linux-integrity/a95f62ed-8b8a-38e5-e468-ecbde3b221af@linux.ibm.com/T/#m3bd047c6e5c8200df1d273c0ad551c645dd43232 >> >> We should cool it with the quick hacks to fix things. :) >> > > Yeah. It might fix this specific testcase, but I think the way it uses > the i_version is "gameable" in other situations. Then again, I don't > know a lot about IMA in this regard. > > When is it expected to remeasure? If it's only expected to remeasure on > a close(), then that's one thing. That would be a weird design though. IMA should remeasure the file when it has visibly changed for another thread or process. >>> -----------------------8<--------------------------- >>> >>> [PATCH] IMA: use vfs_getattr_nosec to get the i_version >>> >>> IMA currently accesses the i_version out of the inode directly when it >>> does a measurement. This is fine for most simple filesystems, but can be >>> problematic with more complex setups (e.g. overlayfs). >>> >>> Make IMA instead call vfs_getattr_nosec to get this info. This allows >>> the filesystem to determine whether and how to report the i_version, and >>> should allow IMA to work properly with a broader class of filesystems in >>> the future. >>> >>> Reported-by: Stefan Berger <stefanb@linux.ibm.com> >>> Signed-off-by: Jeff Layton <jlayton@kernel.org> >>> --- >> >> So, I think we want both; we want the ovl_copyattr() and the >> vfs_getattr_nosec() change: >> >> (1) overlayfs should copy up the inode version in ovl_copyattr(). That >> is in line what we do with all other inode attributes. IOW, the >> overlayfs inode's i_version counter should aim to mirror the >> relevant layer's i_version counter. I wouldn't know why that >> shouldn't be the case. Asking the other way around there doesn't >> seem to be any use for overlayfs inodes to have an i_version that >> isn't just mirroring the relevant layer's i_version. > > It's less than ideal to do this IMO, particularly with an IS_I_VERSION > inode. > > You can't just copy up the value from the upper. You'll need to call > inode_query_iversion(upper_inode), which will flag the upper inode for a > logged i_version update on the next write. IOW, this could create some > (probably minor) metadata write amplification in the upper layer inode > with IS_I_VERSION inodes. > > >> (2) Jeff's changes for ima to make it rely on vfs_getattr_nosec(). >> Currently, ima assumes that it will get the correct i_version from >> an inode but that just doesn't hold for stacking filesystem. >> >> While (1) would likely just fix the immediate bug (2) is correct and >> _robust_. If we change how attributes are handled vfs_*() helpers will >> get updated and ima with it. Poking at raw inodes without using >> appropriate helpers is much more likely to get ima into trouble. > > This will fix it the right way, I think (assuming it actually works), > and should open the door for IMA to work properly with networked > filesystems that support i_version as well. > > Note that there Stephen is correct that calling getattr is probably > going to be less efficient here since we're going to end up calling > generic_fillattr unnecessarily, but I still think it's the right thing > to do. I was wondering whether to use the existing inode_eq_iversion() for all other filesystems than overlayfs, nfs, and possibly other ones (which ones?) where we would use the vfs_getattr_nosec() via a case on inode->i_sb->s_magic? If so, would this function be generic enough to be a public function for libfs.c? I'll hopefully be able to test the proposed patch tomorrow. > > If it turns out to cause measurable performance regressions though, > maybe we can look at adding a something that still calls ->getattr if it > exists but only returns the change_cookie value.
On Sun, Apr 16, 2023 at 09:57:10PM -0400, Stefan Berger wrote: > > > On 4/7/23 09:29, Jeff Layton wrote: > > > > > > > > > > > > I would ditch the original proposal in favor of this 2-line patch shown here: > > > > > > > > > > > > https://lore.kernel.org/linux-integrity/a95f62ed-8b8a-38e5-e468-ecbde3b221af@linux.ibm.com/T/#m3bd047c6e5c8200df1d273c0ad551c645dd43232 > > > > > > We should cool it with the quick hacks to fix things. :) > > > > > > > Yeah. It might fix this specific testcase, but I think the way it uses > > the i_version is "gameable" in other situations. Then again, I don't > > know a lot about IMA in this regard. > > > > When is it expected to remeasure? If it's only expected to remeasure on > > a close(), then that's one thing. That would be a weird design though. > > IMA should remeasure the file when it has visibly changed for another thread or process. > > > > > > -----------------------8<--------------------------- > > > > > > > > [PATCH] IMA: use vfs_getattr_nosec to get the i_version > > > > > > > > IMA currently accesses the i_version out of the inode directly when it > > > > does a measurement. This is fine for most simple filesystems, but can be > > > > problematic with more complex setups (e.g. overlayfs). > > > > > > > > Make IMA instead call vfs_getattr_nosec to get this info. This allows > > > > the filesystem to determine whether and how to report the i_version, and > > > > should allow IMA to work properly with a broader class of filesystems in > > > > the future. > > > > > > > > Reported-by: Stefan Berger <stefanb@linux.ibm.com> > > > > Signed-off-by: Jeff Layton <jlayton@kernel.org> > > > > --- > > > > > > So, I think we want both; we want the ovl_copyattr() and the > > > vfs_getattr_nosec() change: > > > > > > (1) overlayfs should copy up the inode version in ovl_copyattr(). That > > > is in line what we do with all other inode attributes. IOW, the > > > overlayfs inode's i_version counter should aim to mirror the > > > relevant layer's i_version counter. I wouldn't know why that > > > shouldn't be the case. Asking the other way around there doesn't > > > seem to be any use for overlayfs inodes to have an i_version that > > > isn't just mirroring the relevant layer's i_version. > > > > It's less than ideal to do this IMO, particularly with an IS_I_VERSION > > inode. > > > > You can't just copy up the value from the upper. You'll need to call > > inode_query_iversion(upper_inode), which will flag the upper inode for a > > logged i_version update on the next write. IOW, this could create some > > (probably minor) metadata write amplification in the upper layer inode > > with IS_I_VERSION inodes. > > > > > > > (2) Jeff's changes for ima to make it rely on vfs_getattr_nosec(). > > > Currently, ima assumes that it will get the correct i_version from > > > an inode but that just doesn't hold for stacking filesystem. > > > > > > While (1) would likely just fix the immediate bug (2) is correct and > > > _robust_. If we change how attributes are handled vfs_*() helpers will > > > get updated and ima with it. Poking at raw inodes without using > > > appropriate helpers is much more likely to get ima into trouble. > > > > This will fix it the right way, I think (assuming it actually works), > > and should open the door for IMA to work properly with networked > > filesystems that support i_version as well. > > > > Note that there Stephen is correct that calling getattr is probably > > going to be less efficient here since we're going to end up calling > > generic_fillattr unnecessarily, but I still think it's the right thing > > to do. > > I was wondering whether to use the existing inode_eq_iversion() for all > other filesystems than overlayfs, nfs, and possibly other ones (which ones?) > where we would use the vfs_getattr_nosec() via a case on inode->i_sb->s_magic? > If so, would this function be generic enough to be a public function for libfs.c? That's just an invitation for bugs and maintenance headaches. Just call vfs_getattr_nosec() directly and measure the performance impact before trying to optimize this. If you see performance impact that is worth mentioning then we can explore other options such as allowing ->getattr() to only query for i_version and nothing else.
On Sun, 2023-04-16 at 21:57 -0400, Stefan Berger wrote: > > On 4/7/23 09:29, Jeff Layton wrote: > > > > > > > > > > > > I would ditch the original proposal in favor of this 2-line patch shown here: > > > > > > > > > > > > https://lore.kernel.org/linux-integrity/a95f62ed-8b8a-38e5-e468-ecbde3b221af@linux.ibm.com/T/#m3bd047c6e5c8200df1d273c0ad551c645dd43232 > > > > > > We should cool it with the quick hacks to fix things. :) > > > > > > > Yeah. It might fix this specific testcase, but I think the way it uses > > the i_version is "gameable" in other situations. Then again, I don't > > know a lot about IMA in this regard. > > > > When is it expected to remeasure? If it's only expected to remeasure on > > a close(), then that's one thing. That would be a weird design though. > > IMA should remeasure the file when it has visibly changed for another thread or process. > > > > > > -----------------------8<--------------------------- > > > > > > > > [PATCH] IMA: use vfs_getattr_nosec to get the i_version > > > > > > > > IMA currently accesses the i_version out of the inode directly when it > > > > does a measurement. This is fine for most simple filesystems, but can be > > > > problematic with more complex setups (e.g. overlayfs). > > > > > > > > Make IMA instead call vfs_getattr_nosec to get this info. This allows > > > > the filesystem to determine whether and how to report the i_version, and > > > > should allow IMA to work properly with a broader class of filesystems in > > > > the future. > > > > > > > > Reported-by: Stefan Berger <stefanb@linux.ibm.com> > > > > Signed-off-by: Jeff Layton <jlayton@kernel.org> > > > > --- > > > > > > So, I think we want both; we want the ovl_copyattr() and the > > > vfs_getattr_nosec() change: > > > > > > (1) overlayfs should copy up the inode version in ovl_copyattr(). That > > > is in line what we do with all other inode attributes. IOW, the > > > overlayfs inode's i_version counter should aim to mirror the > > > relevant layer's i_version counter. I wouldn't know why that > > > shouldn't be the case. Asking the other way around there doesn't > > > seem to be any use for overlayfs inodes to have an i_version that > > > isn't just mirroring the relevant layer's i_version. > > > > It's less than ideal to do this IMO, particularly with an IS_I_VERSION > > inode. > > > > You can't just copy up the value from the upper. You'll need to call > > inode_query_iversion(upper_inode), which will flag the upper inode for a > > logged i_version update on the next write. IOW, this could create some > > (probably minor) metadata write amplification in the upper layer inode > > with IS_I_VERSION inodes. > > > > > > > (2) Jeff's changes for ima to make it rely on vfs_getattr_nosec(). > > > Currently, ima assumes that it will get the correct i_version from > > > an inode but that just doesn't hold for stacking filesystem. > > > > > > While (1) would likely just fix the immediate bug (2) is correct and > > > _robust_. If we change how attributes are handled vfs_*() helpers will > > > get updated and ima with it. Poking at raw inodes without using > > > appropriate helpers is much more likely to get ima into trouble. > > > > This will fix it the right way, I think (assuming it actually works), > > and should open the door for IMA to work properly with networked > > filesystems that support i_version as well. > > > > Note that there Stephen is correct that calling getattr is probably > > going to be less efficient here since we're going to end up calling > > generic_fillattr unnecessarily, but I still think it's the right thing > > to do. > > I was wondering whether to use the existing inode_eq_iversion() for all > other filesystems than overlayfs, nfs, and possibly other ones (which ones?) > where we would use the vfs_getattr_nosec() via a case on inode->i_sb->s_magic? > If so, would this function be generic enough to be a public function for libfs.c? > > I'll hopefully be able to test the proposed patch tomorrow. > > No, you don't want to use inode_eq_iversion here because (as the comment over it says): * Note that we don't need to set the QUERIED flag in this case, as the value * in the inode is not being recorded for later use. The IMA code _does_ record the value for later use. Furthermore, it's not valid to use inode_eq_iversion on a non-IS_I_VERSION inode, so it's better to just use vfs_getattr_nosec which allows IMA to avoid all of those gory details. Thanks,
On 4/17/23 06:05, Jeff Layton wrote: > On Sun, 2023-04-16 at 21:57 -0400, Stefan Berger wrote: >> >> On 4/7/23 09:29, Jeff Layton wrote: >>> >>> Note that there Stephen is correct that calling getattr is probably >>> going to be less efficient here since we're going to end up calling >>> generic_fillattr unnecessarily, but I still think it's the right thing >>> to do. >> >> I was wondering whether to use the existing inode_eq_iversion() for all >> other filesystems than overlayfs, nfs, and possibly other ones (which ones?) >> where we would use the vfs_getattr_nosec() via a case on inode->i_sb->s_magic? >> If so, would this function be generic enough to be a public function for libfs.c? >> >> I'll hopefully be able to test the proposed patch tomorrow. >> >> > > No, you don't want to use inode_eq_iversion here because (as the comment > over it says): In the ima_check_last_writer() case the usage of inode_eq_iversion() was correct since at this point no record of its value was made and therefore no writer needed to change the i_value again due to IMA: update = test_and_clear_bit(IMA_UPDATE_XATTR, &iint->atomic_flags); if (!IS_I_VERSION(inode) || !inode_eq_iversion(inode, iint->version) || (iint->flags & IMA_NEW_FILE)) { iint->flags &= ~(IMA_DONE_MASK | IMA_NEW_FILE); iint->measured_pcrs = 0; if (update) ima_update_xattr(iint, file); } The record of the value is only made when the actual measurement is done in ima_collect_measurement() Compared to this the usage of vfs_getattr_nosec() is expensive since it resets the flag. if ((request_mask & STATX_CHANGE_COOKIE) && IS_I_VERSION(inode)) { stat->result_mask |= STATX_CHANGE_COOKIE; stat->change_cookie = inode_query_iversion(inode); } idmap = mnt_idmap(path->mnt); if (inode->i_op->getattr) return inode->i_op->getattr(idmap, path, stat, request_mask, query_flags); Also, many filesystems will have their getattr now called as well. I understand Christian's argument about the maintenance headache to a certain degree... Stefan > > * Note that we don't need to set the QUERIED flag in this case, as the value > * in the inode is not being recorded for later use. > > The IMA code _does_ record the value for later use. Furthermore, it's > not valid to use inode_eq_iversion on a non-IS_I_VERSION inode, so it's > better to just use vfs_getattr_nosec which allows IMA to avoid all of > those gory details. > > Thanks,
On Mon, 2023-04-17 at 08:45 -0400, Stefan Berger wrote: > > On 4/17/23 06:05, Jeff Layton wrote: > > On Sun, 2023-04-16 at 21:57 -0400, Stefan Berger wrote: > > > > > > On 4/7/23 09:29, Jeff Layton wrote: > > > > > > > > > Note that there Stephen is correct that calling getattr is probably > > > > going to be less efficient here since we're going to end up calling > > > > generic_fillattr unnecessarily, but I still think it's the right thing > > > > to do. > > > > > > I was wondering whether to use the existing inode_eq_iversion() for all > > > other filesystems than overlayfs, nfs, and possibly other ones (which ones?) > > > where we would use the vfs_getattr_nosec() via a case on inode->i_sb->s_magic? > > > If so, would this function be generic enough to be a public function for libfs.c? > > > > > > I'll hopefully be able to test the proposed patch tomorrow. > > > > > > > > > > No, you don't want to use inode_eq_iversion here because (as the comment > > over it says): > > In the ima_check_last_writer() case the usage of inode_eq_iversion() was correct since > at this point no record of its value was made and therefore no writer needed to change > the i_value again due to IMA: > > update = test_and_clear_bit(IMA_UPDATE_XATTR, > &iint->atomic_flags); > if (!IS_I_VERSION(inode) || > !inode_eq_iversion(inode, iint->version) || > (iint->flags & IMA_NEW_FILE)) { > iint->flags &= ~(IMA_DONE_MASK | IMA_NEW_FILE); > iint->measured_pcrs = 0; > if (update) > ima_update_xattr(iint, file); > } > > The record of the value is only made when the actual measurement is done in > ima_collect_measurement() > True, but we don't have a generic mechanism to do a this. What you're doing only works for IS_I_VERSION inodes. > Compared to this the usage of vfs_getattr_nosec() is expensive since it resets the flag. > > if ((request_mask & STATX_CHANGE_COOKIE) && IS_I_VERSION(inode)) { > stat->result_mask |= STATX_CHANGE_COOKIE; > stat->change_cookie = inode_query_iversion(inode); > } > > idmap = mnt_idmap(path->mnt); > if (inode->i_op->getattr) > return inode->i_op->getattr(idmap, path, stat, > request_mask, query_flags); > > Also, many filesystems will have their getattr now called as well. > ...as they should! > I understand Christian's argument about the maintenance headache to a certain degree... > IMA is not equipped to understand the subtleties of how the i_version counter is implemented on different filesystems. In the past it dealt with this by limiting its usage to IS_I_VERSION inodes, but that is already problematic today. For instance: xfs currently sets the SB_I_VERSION flag, but its i_version counter also bumps the value on atime updates. That means that IMA is doing more remeasurements on xfs than are needed. I'm trying to clean a lot of this up, but IMA's current usage isn't really helping since it's poking around in areas it shouldn't be. Doing a getattr is the canonical way to query this value since it leaves it up to the filesystem how to report this value. If this turns out to cause a performance regression we can look at adding a getattr-like routine that _only_ reports the change attribute. I wouldn't want to do that though unless the need were clear (and backed up by performance numbers).
On 4/6/23 18:04, Jeff Layton wrote: > On Thu, 2023-04-06 at 17:24 -0400, Jeff Layton wrote: >> On Thu, 2023-04-06 at 16:22 -0400, Stefan Berger wrote: >>> >>> On 4/6/23 15:37, Jeff Layton wrote: >>>> On Thu, 2023-04-06 at 15:11 -0400, Stefan Berger wrote: >>>>> >>>>> On 4/6/23 14:46, Jeff Layton wrote: >>>>>> On Thu, 2023-04-06 at 17:01 +0200, Christian Brauner wrote: >>>>>>> On Thu, Apr 06, 2023 at 10:36:41AM -0400, Paul Moore wrote: >>>>> >>>>>> >>>>>> Correct. As long as IMA is also measuring the upper inode then it seems >>>>>> like you shouldn't need to do anything special here. >>>>> >>>>> Unfortunately IMA does not notice the changes. With the patch provided in the other email IMA works as expected. >>>>> >>>> >>>> >>>> It looks like remeasurement is usually done in ima_check_last_writer. >>>> That gets called from __fput which is called when we're releasing the >>>> last reference to the struct file. >>>> >>>> You've hooked into the ->release op, which gets called whenever >>>> filp_close is called, which happens when we're disassociating the file >>>> from the file descriptor table. >>>> >>>> So...I don't get it. Is ima_file_free not getting called on your file >>>> for some reason when you go to close it? It seems like that should be >>>> handling this. >>> >>> I would ditch the original proposal in favor of this 2-line patch shown here: >>> >>> https://lore.kernel.org/linux-integrity/a95f62ed-8b8a-38e5-e468-ecbde3b221af@linux.ibm.com/T/#m3bd047c6e5c8200df1d273c0ad551c645dd43232 >>> >>> >> >> Ok, I think I get it. IMA is trying to use the i_version from the >> overlayfs inode. >> >> I suspect that the real problem here is that IMA is just doing a bare >> inode_query_iversion. Really, we ought to make IMA call >> vfs_getattr_nosec (or something like it) to query the getattr routine in >> the upper layer. Then overlayfs could just propagate the results from >> the upper layer in its response. >> >> That sort of design may also eventually help IMA work properly with more >> exotic filesystems, like NFS or Ceph. >> >> >> > > Maybe something like this? It builds for me but I haven't tested it. It > looks like overlayfs already should report the upper layer's i_version > in getattr, though I haven't tested that either: > > -----------------------8<--------------------------- > > [PATCH] IMA: use vfs_getattr_nosec to get the i_version > > IMA currently accesses the i_version out of the inode directly when it > does a measurement. This is fine for most simple filesystems, but can be > problematic with more complex setups (e.g. overlayfs). > > Make IMA instead call vfs_getattr_nosec to get this info. This allows > the filesystem to determine whether and how to report the i_version, and > should allow IMA to work properly with a broader class of filesystems in > the future. > > Reported-by: Stefan Berger <stefanb@linux.ibm.com> > Signed-off-by: Jeff Layton <jlayton@kernel.org> > --- > security/integrity/ima/ima_api.c | 9 ++++++--- > security/integrity/ima/ima_main.c | 12 ++++++++---- > 2 files changed, 14 insertions(+), 7 deletions(-) > > diff --git a/security/integrity/ima/ima_api.c b/security/integrity/ima/ima_api.c > index d3662f4acadc..c45902e72044 100644 > --- a/security/integrity/ima/ima_api.c > +++ b/security/integrity/ima/ima_api.c > @@ -13,7 +13,6 @@ > #include <linux/fs.h> > #include <linux/xattr.h> > #include <linux/evm.h> > -#include <linux/iversion.h> > #include <linux/fsverity.h> > > #include "ima.h" > @@ -246,10 +245,11 @@ int ima_collect_measurement(struct integrity_iint_cache *iint, > struct inode *inode = file_inode(file); > const char *filename = file->f_path.dentry->d_name.name; > struct ima_max_digest_data hash; > + struct kstat stat; > int result = 0; > int length; > void *tmpbuf; > - u64 i_version; > + u64 i_version = 0; > > /* > * Always collect the modsig, because IMA might have already collected > @@ -268,7 +268,10 @@ int ima_collect_measurement(struct integrity_iint_cache *iint, > * to an initial measurement/appraisal/audit, but was modified to > * assume the file changed. > */ > - i_version = inode_query_iversion(inode); > + result = vfs_getattr_nosec(&file->f_path, &stat, STATX_CHANGE_COOKIE, > + AT_STATX_SYNC_AS_STAT); > + if (!result && (stat.result_mask & STATX_CHANGE_COOKIE)) > + i_version = stat.change_cookie; > hash.hdr.algo = algo; > hash.hdr.length = hash_digest_size[algo]; > > diff --git a/security/integrity/ima/ima_main.c b/security/integrity/ima/ima_main.c > index d66a0a36415e..365db0e43d7c 100644 > --- a/security/integrity/ima/ima_main.c > +++ b/security/integrity/ima/ima_main.c > @@ -24,7 +24,6 @@ > #include <linux/slab.h> > #include <linux/xattr.h> > #include <linux/ima.h> > -#include <linux/iversion.h> > #include <linux/fs.h> > > #include "ima.h" > @@ -164,11 +163,16 @@ static void ima_check_last_writer(struct integrity_iint_cache *iint, > > mutex_lock(&iint->mutex); > if (atomic_read(&inode->i_writecount) == 1) { > + struct kstat stat; > + > update = test_and_clear_bit(IMA_UPDATE_XATTR, > &iint->atomic_flags); > - if (!IS_I_VERSION(inode) || > - !inode_eq_iversion(inode, iint->version) || > - (iint->flags & IMA_NEW_FILE)) { > + if ((iint->flags & IMA_NEW_FILE) || > + vfs_getattr_nosec(&file->f_path, &stat, > + STATX_CHANGE_COOKIE, > + AT_STATX_SYNC_AS_STAT) || > + !(stat.result_mask & STATX_CHANGE_COOKIE) || > + stat.change_cookie != iint->version) { > iint->flags &= ~(IMA_DONE_MASK | IMA_NEW_FILE); > iint->measured_pcrs = 0; > if (update) I tested this in the OpenBMC setup with overlayfs acting as rootfs. It works now as expected. Tested-by: Stefan Berger <stefanb@linux.ibm.com>
On Fri, 2023-04-07 at 09:29 -0400, Jeff Layton wrote: > > > > > > > > > > I would ditch the original proposal in favor of this 2-line patch shown here: > > > > > > > > > > https://lore.kernel.org/linux-integrity/a95f62ed-8b8a-38e5-e468-ecbde3b221af@linux.ibm.com/T/#m3bd047c6e5c8200df1d273c0ad551c645dd43232 > > > > We should cool it with the quick hacks to fix things. :) > > > > Yeah. It might fix this specific testcase, but I think the way it uses > the i_version is "gameable" in other situations. Then again, I don't > know a lot about IMA in this regard. > > When is it expected to remeasure? If it's only expected to remeasure on > a close(), then that's one thing. That would be a weird design though. Historical background: Prior to IMA being upstreamed there was a lot of discussion about how much/how frequently to measure files. Re-measuring files after each write would impact performance. Instead of re-measuring files after each write, if a file already opened for write was opened for read (open writers) or a file already opened for read was opened for write (Time of Measure/Time of Use) the IMA meausrement list was invalidated by including a violation record in the measurement list. Only the BPRM hook prevents a file from being opened for write. > > > > > > > > > > > > > > > > > > > Ok, I think I get it. IMA is trying to use the i_version from the > > > > overlayfs inode. > > > > > > > > I suspect that the real problem here is that IMA is just doing a bare > > > > inode_query_iversion. Really, we ought to make IMA call > > > > vfs_getattr_nosec (or something like it) to query the getattr routine in > > > > the upper layer. Then overlayfs could just propagate the results from > > > > the upper layer in its response. > > > > > > > > That sort of design may also eventually help IMA work properly with more > > > > exotic filesystems, like NFS or Ceph. > > > > > > > > > > > > > > > > > > Maybe something like this? It builds for me but I haven't tested it. It > > > looks like overlayfs already should report the upper layer's i_version > > > in getattr, though I haven't tested that either: > > > > > > -----------------------8<--------------------------- > > > > > > [PATCH] IMA: use vfs_getattr_nosec to get the i_version > > > > > > IMA currently accesses the i_version out of the inode directly when it > > > does a measurement. This is fine for most simple filesystems, but can be > > > problematic with more complex setups (e.g. overlayfs). > > > > > > Make IMA instead call vfs_getattr_nosec to get this info. This allows > > > the filesystem to determine whether and how to report the i_version, and > > > should allow IMA to work properly with a broader class of filesystems in > > > the future. > > > > > > Reported-by: Stefan Berger <stefanb@linux.ibm.com> > > > Signed-off-by: Jeff Layton <jlayton@kernel.org> > > > --- > > > > So, I think we want both; we want the ovl_copyattr() and the > > vfs_getattr_nosec() change: > > > > (1) overlayfs should copy up the inode version in ovl_copyattr(). That > > is in line what we do with all other inode attributes. IOW, the > > overlayfs inode's i_version counter should aim to mirror the > > relevant layer's i_version counter. I wouldn't know why that > > shouldn't be the case. Asking the other way around there doesn't > > seem to be any use for overlayfs inodes to have an i_version that > > isn't just mirroring the relevant layer's i_version. > > It's less than ideal to do this IMO, particularly with an IS_I_VERSION > inode. > > You can't just copy up the value from the upper. You'll need to call > inode_query_iversion(upper_inode), which will flag the upper inode for a > logged i_version update on the next write. IOW, this could create some > (probably minor) metadata write amplification in the upper layer inode > with IS_I_VERSION inodes. > > > > (2) Jeff's changes for ima to make it rely on vfs_getattr_nosec(). > > Currently, ima assumes that it will get the correct i_version from > > an inode but that just doesn't hold for stacking filesystem. > > > > While (1) would likely just fix the immediate bug (2) is correct and > > _robust_. If we change how attributes are handled vfs_*() helpers will > > get updated and ima with it. Poking at raw inodes without using > > appropriate helpers is much more likely to get ima into trouble. > > This will fix it the right way, I think (assuming it actually works), > and should open the door for IMA to work properly with networked > filesystems that support i_version as well. On a local filesystem, there are guarantees that the calculated file hash is that of the file being used. Reminder IMA reads a file, page size chunk at a time into a single buffer, calculating the file hash. Once the file hash is calculated, the memory is freed. There are no guarantees on a fuse filesystem, for example, that the original file read and verified is the same as the one being executed. I'm not sure that the integrity guarantees of a file on a remote filesystem will be the same as those on a local file system. > > Note that there Stephen is correct that calling getattr is probably > going to be less efficient here since we're going to end up calling > generic_fillattr unnecessarily, but I still think it's the right thing > to do. > > If it turns out to cause measurable performance regressions though, > maybe we can look at adding a something that still calls ->getattr if it > exists but only returns the change_cookie value. Sure. For now, Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
On Tue, 2023-04-11 at 10:38 +0200, Christian Brauner wrote: > On Sun, Apr 09, 2023 at 06:12:09PM -0400, Jeff Layton wrote: > > On Sun, 2023-04-09 at 17:22 +0200, Christian Brauner wrote: > > > On Fri, Apr 07, 2023 at 09:29:29AM -0400, Jeff Layton wrote: > > > > > > > > > > > > > > > > I would ditch the original proposal in favor of this 2-line patch shown here: > > > > > > > > > > > > > > > > https://lore.kernel.org/linux-integrity/a95f62ed-8b8a-38e5-e468-ecbde3b221af@linux.ibm.com/T/#m3bd047c6e5c8200df1d273c0ad551c645dd43232 > > > > > > > > > > We should cool it with the quick hacks to fix things. :) > > > > > > > > > > > > > Yeah. It might fix this specific testcase, but I think the way it uses > > > > the i_version is "gameable" in other situations. Then again, I don't > > > > know a lot about IMA in this regard. > > > > > > > > When is it expected to remeasure? If it's only expected to remeasure on > > > > a close(), then that's one thing. That would be a weird design though. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Ok, I think I get it. IMA is trying to use the i_version from the > > > > > > > overlayfs inode. > > > > > > > > > > > > > > I suspect that the real problem here is that IMA is just doing a bare > > > > > > > inode_query_iversion. Really, we ought to make IMA call > > > > > > > vfs_getattr_nosec (or something like it) to query the getattr routine in > > > > > > > the upper layer. Then overlayfs could just propagate the results from > > > > > > > the upper layer in its response. > > > > > > > > > > > > > > That sort of design may also eventually help IMA work properly with more > > > > > > > exotic filesystems, like NFS or Ceph. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Maybe something like this? It builds for me but I haven't tested it. It > > > > > > looks like overlayfs already should report the upper layer's i_version > > > > > > in getattr, though I haven't tested that either: > > > > > > > > > > > > -----------------------8<--------------------------- > > > > > > > > > > > > [PATCH] IMA: use vfs_getattr_nosec to get the i_version > > > > > > > > > > > > IMA currently accesses the i_version out of the inode directly when it > > > > > > does a measurement. This is fine for most simple filesystems, but can be > > > > > > problematic with more complex setups (e.g. overlayfs). > > > > > > > > > > > > Make IMA instead call vfs_getattr_nosec to get this info. This allows > > > > > > the filesystem to determine whether and how to report the i_version, and > > > > > > should allow IMA to work properly with a broader class of filesystems in > > > > > > the future. > > > > > > > > > > > > Reported-by: Stefan Berger <stefanb@linux.ibm.com> > > > > > > Signed-off-by: Jeff Layton <jlayton@kernel.org> > > > > > > --- > > > > > > > > > > So, I think we want both; we want the ovl_copyattr() and the > > > > > vfs_getattr_nosec() change: > > > > > > > > > > (1) overlayfs should copy up the inode version in ovl_copyattr(). That > > > > > is in line what we do with all other inode attributes. IOW, the > > > > > overlayfs inode's i_version counter should aim to mirror the > > > > > relevant layer's i_version counter. I wouldn't know why that > > > > > shouldn't be the case. Asking the other way around there doesn't > > > > > seem to be any use for overlayfs inodes to have an i_version that > > > > > isn't just mirroring the relevant layer's i_version. > > > > > > > > It's less than ideal to do this IMO, particularly with an IS_I_VERSION > > > > inode. > > > > > > > > You can't just copy up the value from the upper. You'll need to call > > > > inode_query_iversion(upper_inode), which will flag the upper inode for a > > > > logged i_version update on the next write. IOW, this could create some > > > > (probably minor) metadata write amplification in the upper layer inode > > > > with IS_I_VERSION inodes. > > > > > > I'm likely just missing context and am curious about this so bear with me. Why > > > do we need to flag the upper inode for a logged i_version update? Any required > > > i_version interactions should've already happened when overlayfs called into > > > the upper layer. So all that's left to do is for overlayfs' to mirror the > > > i_version value after the upper operation has returned. > > > > > ovl_copyattr() - which copies the inode attributes - is always called after the > > > operation on the upper inode has finished. So the additional query seems odd at > > > first glance. But there might well be a good reason for it. In my naive > > > approach I would've thought that sm along the lines of: > > > > > > diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c > > > index 923d66d131c1..8b089035b9b3 100644 > > > --- a/fs/overlayfs/util.c > > > +++ b/fs/overlayfs/util.c > > > @@ -1119,4 +1119,5 @@ void ovl_copyattr(struct inode *inode) > > > inode->i_mtime = realinode->i_mtime; > > > inode->i_ctime = realinode->i_ctime; > > > i_size_write(inode, i_size_read(realinode)); > > > + inode_set_iversion_raw(inode, inode_peek_iversion_raw(realinode)); > > > } > > > > > > would've been sufficient. > > > > > > > Nope, because then you wouldn't get any updates to i_version after that > > point. > > > > Note that with an IS_I_VERSION inode we only update the i_version when > > there has been a query since the last update. What you're doing above is > > circumventing that mechanism. You'll get the i_version at the time of of > > the ovl_copyattr, but there won't be any updates of it after that point > > because the QUERIED bit won't end up being set on realinode. > > I get all that. > But my understanding had been that the i_version value at the time of > ovl_copyattr() would be correct. Because when ovl_copyattr() is called > the expected i_version change will have been done in the relevant layer > includig raising the QUERIED bit. Since the layers are not allowed to be > changed outside of the overlayfs mount any change to them can only > originate from overlayfs which would necessarily call ovl_copyattr() > again. IOW, overlayfs would by virtue of its implementation keep the > i_version value in sync. > > Overlayfs wouldn't even raise SB_I_VERSION. It would indeed just be a > cache of i_version of the relevant layer. > > > > > > > > Since overlayfs' does explicitly disallow changes to the upper and lower trees > > > while overlayfs is mounted it seems intuitive that it should just mirror the > > > relevant layer's i_version. > > > > > > > > > If we don't do this, then we should probably document that i_version doesn't > > > have a meaning yet for the inodes of stacking filesystems. > > > > > > > Trying to cache the i_version is counterproductive, IMO, at least with > > an IS_I_VERSION inode. > > > > The problem is that a query against the i_version has a side-effect. It > > has to (atomically) mark the inode for an update on the next change. > > > > If you try to cache that value, you'll likely end up doing more queries > > than you really need to (because you'll need to keep the cache up to > > date) and you'll have an i_version that will necessarily lag the one in > > the upper layer inode. > > > > The whole point of the change attribute is to get the value as it is at > > this very moment so we can check whether there have been changes. A > > laggy value is not terribly useful. > > > > Overlayfs should just always call the upper layer's ->getattr to get the > > version. I wouldn't even bother copying it up in the first place. Doing > > so is just encouraging someone to try use the value in the overlayfs > > inode, when they really need to go through ->getattr and get the one > > from the upper layer. > > That seems reasonable to me. I read this as an agreeing with my earlier > suggestion to document that i_version doesn't have a meaning for the > inodes of stacking filesystems and that we should spell out that > vfs_getattr()/->getattr() needs to be used to interact with i_version. > > We need to explain to subsystems such as IMA somwhere what the correct > way to query i_version agnostically is; independent of filesystem > implementation details. > > Looking at IMA, it queries the i_version directly without checking > whether it's an IS_I_VERSION() inode first. This might make a > difference.h > > Afaict, filesystems that persist i_version to disk automatically raise > SB_I_VERSION. I would guess that it be considered a bug if a filesystem > would persist i_version to disk and not raise SB_I_VERSION. If so IMA > should probably be made to check for IS_I_VERSION() and it will probably > get that by switching to vfs_getattr_nosec(). When the filesystem isn't mounted with I_VERSION, i_version should be set to 0. Originally when the filesytem wasn't mounted with I_VERSION support, the file would only be measured once. With commit ac0bf025d2c0 ("ima: Use i_version only when filesystem supports it"), this changed. The "iint" flags are reset, causing the file to be re- {measure/appraised/audited} on next access.
On Fri, Apr 21, 2023 at 10:44 AM Mimi Zohar <zohar@linux.ibm.com> wrote: > On Fri, 2023-04-07 at 09:29 -0400, Jeff Layton wrote: > > > > > > > > > > > > I would ditch the original proposal in favor of this 2-line patch shown here: > > > > > > > > > > > > https://lore.kernel.org/linux-integrity/a95f62ed-8b8a-38e5-e468-ecbde3b221af@linux.ibm.com/T/#m3bd047c6e5c8200df1d273c0ad551c645dd43232 > > > > > > We should cool it with the quick hacks to fix things. :) > > > > > > > Yeah. It might fix this specific testcase, but I think the way it uses > > the i_version is "gameable" in other situations. Then again, I don't > > know a lot about IMA in this regard. > > > > When is it expected to remeasure? If it's only expected to remeasure on > > a close(), then that's one thing. That would be a weird design though. > > Historical background: > > Prior to IMA being upstreamed there was a lot of discussion about how > much/how frequently to measure files. Re-measuring files after each > write would impact performance. Instead of re-measuring files after > each write, if a file already opened for write was opened for read > (open writers) or a file already opened for read was opened for write > (Time of Measure/Time of Use) the IMA meausrement list was invalidated > by including a violation record in the measurement list. > > Only the BPRM hook prevents a file from being opened for write. > > > > > > > > > > > > > > > > > > > > > > > > > Ok, I think I get it. IMA is trying to use the i_version from the > > > > > overlayfs inode. > > > > > > > > > > I suspect that the real problem here is that IMA is just doing a bare > > > > > inode_query_iversion. Really, we ought to make IMA call > > > > > vfs_getattr_nosec (or something like it) to query the getattr routine in > > > > > the upper layer. Then overlayfs could just propagate the results from > > > > > the upper layer in its response. > > > > > > > > > > That sort of design may also eventually help IMA work properly with more > > > > > exotic filesystems, like NFS or Ceph. > > > > > > > > > > > > > > > > > > > > > > > Maybe something like this? It builds for me but I haven't tested it. It > > > > looks like overlayfs already should report the upper layer's i_version > > > > in getattr, though I haven't tested that either: > > > > > > > > -----------------------8<--------------------------- > > > > > > > > [PATCH] IMA: use vfs_getattr_nosec to get the i_version > > > > > > > > IMA currently accesses the i_version out of the inode directly when it > > > > does a measurement. This is fine for most simple filesystems, but can be > > > > problematic with more complex setups (e.g. overlayfs). > > > > > > > > Make IMA instead call vfs_getattr_nosec to get this info. This allows > > > > the filesystem to determine whether and how to report the i_version, and > > > > should allow IMA to work properly with a broader class of filesystems in > > > > the future. > > > > > > > > Reported-by: Stefan Berger <stefanb@linux.ibm.com> > > > > Signed-off-by: Jeff Layton <jlayton@kernel.org> > > > > --- > > > > > > So, I think we want both; we want the ovl_copyattr() and the > > > vfs_getattr_nosec() change: > > > > > > (1) overlayfs should copy up the inode version in ovl_copyattr(). That > > > is in line what we do with all other inode attributes. IOW, the > > > overlayfs inode's i_version counter should aim to mirror the > > > relevant layer's i_version counter. I wouldn't know why that > > > shouldn't be the case. Asking the other way around there doesn't > > > seem to be any use for overlayfs inodes to have an i_version that > > > isn't just mirroring the relevant layer's i_version. > > > > It's less than ideal to do this IMO, particularly with an IS_I_VERSION > > inode. > > > > You can't just copy up the value from the upper. You'll need to call > > inode_query_iversion(upper_inode), which will flag the upper inode for a > > logged i_version update on the next write. IOW, this could create some > > (probably minor) metadata write amplification in the upper layer inode > > with IS_I_VERSION inodes. > > > > > > > (2) Jeff's changes for ima to make it rely on vfs_getattr_nosec(). > > > Currently, ima assumes that it will get the correct i_version from > > > an inode but that just doesn't hold for stacking filesystem. > > > > > > While (1) would likely just fix the immediate bug (2) is correct and > > > _robust_. If we change how attributes are handled vfs_*() helpers will > > > get updated and ima with it. Poking at raw inodes without using > > > appropriate helpers is much more likely to get ima into trouble. > > > > This will fix it the right way, I think (assuming it actually works), > > and should open the door for IMA to work properly with networked > > filesystems that support i_version as well. > > On a local filesystem, there are guarantees that the calculated file > hash is that of the file being used. Reminder IMA reads a file, page > size chunk at a time into a single buffer, calculating the file hash. > Once the file hash is calculated, the memory is freed. > > There are no guarantees on a fuse filesystem, for example, that the > original file read and verified is the same as the one being executed. > I'm not sure that the integrity guarantees of a file on a remote > filesystem will be the same as those on a local file system. > > > > > Note that there Stephen is correct that calling getattr is probably > > going to be less efficient here since we're going to end up calling > > generic_fillattr unnecessarily, but I still think it's the right thing > > to do. > > > > If it turns out to cause measurable performance regressions though, > > maybe we can look at adding a something that still calls ->getattr if it > > exists but only returns the change_cookie value. > > Sure. For now, > > Reviewed-by: Mimi Zohar <zohar@linux.ibm.com> I'm going through my review queue to make sure I haven't missed anything and this thread popped up ... Stefan, Mimi, did you get a fix into an upstream tree somewhere? If not, is it because you are waiting on a review/merge from me into the LSM tree?
On Thu, 2023-05-18 at 16:46 -0400, Paul Moore wrote: > On Fri, Apr 21, 2023 at 10:44 AM Mimi Zohar <zohar@linux.ibm.com> wrote: > > On Fri, 2023-04-07 at 09:29 -0400, Jeff Layton wrote: > > > > > > > > > > > > > > I would ditch the original proposal in favor of this 2-line patch shown here: > > > > > > > > > > > > > > https://lore.kernel.org/linux-integrity/a95f62ed-8b8a-38e5-e468-ecbde3b221af@linux.ibm.com/T/#m3bd047c6e5c8200df1d273c0ad551c645dd43232 > > > > > > > > We should cool it with the quick hacks to fix things. :) > > > > > > > > > > Yeah. It might fix this specific testcase, but I think the way it uses > > > the i_version is "gameable" in other situations. Then again, I don't > > > know a lot about IMA in this regard. > > > > > > When is it expected to remeasure? If it's only expected to remeasure on > > > a close(), then that's one thing. That would be a weird design though. > > > > Historical background: > > > > Prior to IMA being upstreamed there was a lot of discussion about how > > much/how frequently to measure files. Re-measuring files after each > > write would impact performance. Instead of re-measuring files after > > each write, if a file already opened for write was opened for read > > (open writers) or a file already opened for read was opened for write > > (Time of Measure/Time of Use) the IMA meausrement list was invalidated > > by including a violation record in the measurement list. > > > > Only the BPRM hook prevents a file from being opened for write. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Ok, I think I get it. IMA is trying to use the i_version from the > > > > > > overlayfs inode. > > > > > > > > > > > > I suspect that the real problem here is that IMA is just doing a bare > > > > > > inode_query_iversion. Really, we ought to make IMA call > > > > > > vfs_getattr_nosec (or something like it) to query the getattr routine in > > > > > > the upper layer. Then overlayfs could just propagate the results from > > > > > > the upper layer in its response. > > > > > > > > > > > > That sort of design may also eventually help IMA work properly with more > > > > > > exotic filesystems, like NFS or Ceph. > > > > > > > > > > > > > > > > > > > > > > > > > > > > Maybe something like this? It builds for me but I haven't tested it. It > > > > > looks like overlayfs already should report the upper layer's i_version > > > > > in getattr, though I haven't tested that either: > > > > > > > > > > -----------------------8<--------------------------- > > > > > > > > > > [PATCH] IMA: use vfs_getattr_nosec to get the i_version > > > > > > > > > > IMA currently accesses the i_version out of the inode directly when it > > > > > does a measurement. This is fine for most simple filesystems, but can be > > > > > problematic with more complex setups (e.g. overlayfs). > > > > > > > > > > Make IMA instead call vfs_getattr_nosec to get this info. This allows > > > > > the filesystem to determine whether and how to report the i_version, and > > > > > should allow IMA to work properly with a broader class of filesystems in > > > > > the future. > > > > > > > > > > Reported-by: Stefan Berger <stefanb@linux.ibm.com> > > > > > Signed-off-by: Jeff Layton <jlayton@kernel.org> > > > > > --- > > > > > > > > So, I think we want both; we want the ovl_copyattr() and the > > > > vfs_getattr_nosec() change: > > > > > > > > (1) overlayfs should copy up the inode version in ovl_copyattr(). That > > > > is in line what we do with all other inode attributes. IOW, the > > > > overlayfs inode's i_version counter should aim to mirror the > > > > relevant layer's i_version counter. I wouldn't know why that > > > > shouldn't be the case. Asking the other way around there doesn't > > > > seem to be any use for overlayfs inodes to have an i_version that > > > > isn't just mirroring the relevant layer's i_version. > > > > > > It's less than ideal to do this IMO, particularly with an IS_I_VERSION > > > inode. > > > > > > You can't just copy up the value from the upper. You'll need to call > > > inode_query_iversion(upper_inode), which will flag the upper inode for a > > > logged i_version update on the next write. IOW, this could create some > > > (probably minor) metadata write amplification in the upper layer inode > > > with IS_I_VERSION inodes. > > > > > > > > > > (2) Jeff's changes for ima to make it rely on vfs_getattr_nosec(). > > > > Currently, ima assumes that it will get the correct i_version from > > > > an inode but that just doesn't hold for stacking filesystem. > > > > > > > > While (1) would likely just fix the immediate bug (2) is correct and > > > > _robust_. If we change how attributes are handled vfs_*() helpers will > > > > get updated and ima with it. Poking at raw inodes without using > > > > appropriate helpers is much more likely to get ima into trouble. > > > > > > This will fix it the right way, I think (assuming it actually works), > > > and should open the door for IMA to work properly with networked > > > filesystems that support i_version as well. > > > > On a local filesystem, there are guarantees that the calculated file > > hash is that of the file being used. Reminder IMA reads a file, page > > size chunk at a time into a single buffer, calculating the file hash. > > Once the file hash is calculated, the memory is freed. > > > > There are no guarantees on a fuse filesystem, for example, that the > > original file read and verified is the same as the one being executed. > > I'm not sure that the integrity guarantees of a file on a remote > > filesystem will be the same as those on a local file system. > > > > > > > > Note that there Stephen is correct that calling getattr is probably > > > going to be less efficient here since we're going to end up calling > > > generic_fillattr unnecessarily, but I still think it's the right thing > > > to do. > > > > > > If it turns out to cause measurable performance regressions though, > > > maybe we can look at adding a something that still calls ->getattr if it > > > exists but only returns the change_cookie value. > > > > Sure. For now, > > > > Reviewed-by: Mimi Zohar <zohar@linux.ibm.com> > > I'm going through my review queue to make sure I haven't missed > anything and this thread popped up ... Stefan, Mimi, did you get a fix > into an upstream tree somewhere? If not, is it because you are > waiting on a review/merge from me into the LSM tree? Sorry for the delay. Between vacation and LSS, I just started testing Jeff Layton's patch. Mimi
On Thu, May 18, 2023 at 4:56 PM Mimi Zohar <zohar@linux.ibm.com> wrote: > On Thu, 2023-05-18 at 16:46 -0400, Paul Moore wrote: > > On Fri, Apr 21, 2023 at 10:44 AM Mimi Zohar <zohar@linux.ibm.com> wrote: > > > On Fri, 2023-04-07 at 09:29 -0400, Jeff Layton wrote: ... > > I'm going through my review queue to make sure I haven't missed > > anything and this thread popped up ... Stefan, Mimi, did you get a fix > > into an upstream tree somewhere? If not, is it because you are > > waiting on a review/merge from me into the LSM tree? > > Sorry for the delay. Between vacation and LSS, I just started testing > Jeff Layton's patch. No worries, I'm a bit behind too, I just wanted to make sure I wasn't blocking this thread :)
On Fri, 2023-04-07 at 10:31 +0200, Christian Brauner wrote: > So, I think we want both; we want the ovl_copyattr() and the > vfs_getattr_nosec() change: > > (1) overlayfs should copy up the inode version in ovl_copyattr(). That > is in line what we do with all other inode attributes. IOW, the > overlayfs inode's i_version counter should aim to mirror the > relevant layer's i_version counter. I wouldn't know why that > shouldn't be the case. Asking the other way around there doesn't > seem to be any use for overlayfs inodes to have an i_version that > isn't just mirroring the relevant layer's i_version. > (2) Jeff's changes for ima to make it rely on vfs_getattr_nosec(). > Currently, ima assumes that it will get the correct i_version from > an inode but that just doesn't hold for stacking filesystem. > > While (1) would likely just fix the immediate bug (2) is correct and > _robust_. If we change how attributes are handled vfs_*() helpers will > get updated and ima with it. Poking at raw inodes without using > appropriate helpers is much more likely to get ima into trouble. In addition to properly setting the i_version for IMA, EVM has a similar issue with i_generation and s_uuid. Adding them to ovl_copyattr() seems to resolve it. Does that make sense? diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c index 923d66d131c1..cd0aeb828868 100644 --- a/fs/overlayfs/util.c +++ b/fs/overlayfs/util.c @@ -1118,5 +1118,8 @@ void ovl_copyattr(struct inode *inode) inode->i_atime = realinode->i_atime; inode->i_mtime = realinode->i_mtime; inode->i_ctime = realinode->i_ctime; + inode->i_generation = realinode->i_generation; + if (inode->i_sb) + uuid_copy(&inode->i_sb->s_uuid, &realinode->i_sb- >s_uuid); i_size_write(inode, i_size_read(realinode)); }
On Fri, May 19, 2023 at 10:42 PM Mimi Zohar <zohar@linux.ibm.com> wrote: > > On Fri, 2023-04-07 at 10:31 +0200, Christian Brauner wrote: > > So, I think we want both; we want the ovl_copyattr() and the > > vfs_getattr_nosec() change: > > > > (1) overlayfs should copy up the inode version in ovl_copyattr(). That > > is in line what we do with all other inode attributes. IOW, the > > overlayfs inode's i_version counter should aim to mirror the > > relevant layer's i_version counter. I wouldn't know why that > > shouldn't be the case. Asking the other way around there doesn't > > seem to be any use for overlayfs inodes to have an i_version that > > isn't just mirroring the relevant layer's i_version. > > (2) Jeff's changes for ima to make it rely on vfs_getattr_nosec(). > > Currently, ima assumes that it will get the correct i_version from > > an inode but that just doesn't hold for stacking filesystem. > > > > While (1) would likely just fix the immediate bug (2) is correct and > > _robust_. If we change how attributes are handled vfs_*() helpers will > > get updated and ima with it. Poking at raw inodes without using > > appropriate helpers is much more likely to get ima into trouble. > > In addition to properly setting the i_version for IMA, EVM has a > similar issue with i_generation and s_uuid. Adding them to > ovl_copyattr() seems to resolve it. Does that make sense? > > diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c > index 923d66d131c1..cd0aeb828868 100644 > --- a/fs/overlayfs/util.c > +++ b/fs/overlayfs/util.c > @@ -1118,5 +1118,8 @@ void ovl_copyattr(struct inode *inode) > inode->i_atime = realinode->i_atime; > inode->i_mtime = realinode->i_mtime; > inode->i_ctime = realinode->i_ctime; > + inode->i_generation = realinode->i_generation; > + if (inode->i_sb) > + uuid_copy(&inode->i_sb->s_uuid, &realinode->i_sb- > >s_uuid); That is not a possible solution Mimi. The i_gneration copy *may* be acceptable in "all layers on same fs" setup, but changing overlayfs s_uuid over and over is a non-starter. If you explain the problem, I may be able to help you find a better solution. Thanks, Amir.
On Fri, May 19, 2023 at 03:42:38PM -0400, Mimi Zohar wrote: > On Fri, 2023-04-07 at 10:31 +0200, Christian Brauner wrote: > > So, I think we want both; we want the ovl_copyattr() and the > > vfs_getattr_nosec() change: > > > > (1) overlayfs should copy up the inode version in ovl_copyattr(). That > > is in line what we do with all other inode attributes. IOW, the > > overlayfs inode's i_version counter should aim to mirror the > > relevant layer's i_version counter. I wouldn't know why that > > shouldn't be the case. Asking the other way around there doesn't > > seem to be any use for overlayfs inodes to have an i_version that > > isn't just mirroring the relevant layer's i_version. > > (2) Jeff's changes for ima to make it rely on vfs_getattr_nosec(). > > Currently, ima assumes that it will get the correct i_version from > > an inode but that just doesn't hold for stacking filesystem. > > > > While (1) would likely just fix the immediate bug (2) is correct and > > _robust_. If we change how attributes are handled vfs_*() helpers will > > get updated and ima with it. Poking at raw inodes without using > > appropriate helpers is much more likely to get ima into trouble. > > In addition to properly setting the i_version for IMA, EVM has a > similar issue with i_generation and s_uuid. Adding them to > ovl_copyattr() seems to resolve it. Does that make sense? > > diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c > index 923d66d131c1..cd0aeb828868 100644 > --- a/fs/overlayfs/util.c > +++ b/fs/overlayfs/util.c > @@ -1118,5 +1118,8 @@ void ovl_copyattr(struct inode *inode) > inode->i_atime = realinode->i_atime; > inode->i_mtime = realinode->i_mtime; > inode->i_ctime = realinode->i_ctime; > + inode->i_generation = realinode->i_generation; > + if (inode->i_sb) > + uuid_copy(&inode->i_sb->s_uuid, &realinode->i_sb- Overlayfs can consist of multiple lower layers and each of those lower layers may have a different uuid. So everytime you trigger a ovl_copyattr() on a different layer this patch would alter the uuid of the overlayfs superblock. In addition the uuid should be set when the filesystem is mounted. Unless the filesystem implements a dedicated ioctl() - like ext4 - to change the uuid.
On Sat, 2023-05-20 at 12:15 +0300, Amir Goldstein wrote: > On Fri, May 19, 2023 at 10:42 PM Mimi Zohar <zohar@linux.ibm.com> wrote: > > > > On Fri, 2023-04-07 at 10:31 +0200, Christian Brauner wrote: > > > So, I think we want both; we want the ovl_copyattr() and the > > > vfs_getattr_nosec() change: > > > > > > (1) overlayfs should copy up the inode version in ovl_copyattr(). That > > > is in line what we do with all other inode attributes. IOW, the > > > overlayfs inode's i_version counter should aim to mirror the > > > relevant layer's i_version counter. I wouldn't know why that > > > shouldn't be the case. Asking the other way around there doesn't > > > seem to be any use for overlayfs inodes to have an i_version that > > > isn't just mirroring the relevant layer's i_version. > > > (2) Jeff's changes for ima to make it rely on vfs_getattr_nosec(). > > > Currently, ima assumes that it will get the correct i_version from > > > an inode but that just doesn't hold for stacking filesystem. > > > > > > While (1) would likely just fix the immediate bug (2) is correct and > > > _robust_. If we change how attributes are handled vfs_*() helpers will > > > get updated and ima with it. Poking at raw inodes without using > > > appropriate helpers is much more likely to get ima into trouble. > > > > In addition to properly setting the i_version for IMA, EVM has a > > similar issue with i_generation and s_uuid. Adding them to > > ovl_copyattr() seems to resolve it. Does that make sense? > > > > diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c > > index 923d66d131c1..cd0aeb828868 100644 > > --- a/fs/overlayfs/util.c > > +++ b/fs/overlayfs/util.c > > @@ -1118,5 +1118,8 @@ void ovl_copyattr(struct inode *inode) > > inode->i_atime = realinode->i_atime; > > inode->i_mtime = realinode->i_mtime; > > inode->i_ctime = realinode->i_ctime; > > + inode->i_generation = realinode->i_generation; > > + if (inode->i_sb) > > + uuid_copy(&inode->i_sb->s_uuid, &realinode->i_sb- > > >s_uuid); > > That is not a possible solution Mimi. > > The i_gneration copy *may* be acceptable in "all layers on same fs" > setup, but changing overlayfs s_uuid over and over is a non-starter. > > If you explain the problem, I may be able to help you find a better solution. EVM calculates an HMAC of the file metadata (security xattrs, i_ino, i_generation, i_uid, i_gid, i_mode, s_uuid) and stores it as security.evm. Notrmally this would be used for mutable files, which cannot be signed. The i_generation and s_uuid on the lower layer and the overlay are not the same, causing the EVM HMAC verification to fail.
On Mon, May 22, 2023 at 3:18 PM Mimi Zohar <zohar@linux.ibm.com> wrote: > > On Sat, 2023-05-20 at 12:15 +0300, Amir Goldstein wrote: > > On Fri, May 19, 2023 at 10:42 PM Mimi Zohar <zohar@linux.ibm.com> wrote: > > > > > > On Fri, 2023-04-07 at 10:31 +0200, Christian Brauner wrote: > > > > So, I think we want both; we want the ovl_copyattr() and the > > > > vfs_getattr_nosec() change: > > > > > > > > (1) overlayfs should copy up the inode version in ovl_copyattr(). That > > > > is in line what we do with all other inode attributes. IOW, the > > > > overlayfs inode's i_version counter should aim to mirror the > > > > relevant layer's i_version counter. I wouldn't know why that > > > > shouldn't be the case. Asking the other way around there doesn't > > > > seem to be any use for overlayfs inodes to have an i_version that > > > > isn't just mirroring the relevant layer's i_version. > > > > (2) Jeff's changes for ima to make it rely on vfs_getattr_nosec(). > > > > Currently, ima assumes that it will get the correct i_version from > > > > an inode but that just doesn't hold for stacking filesystem. > > > > > > > > While (1) would likely just fix the immediate bug (2) is correct and > > > > _robust_. If we change how attributes are handled vfs_*() helpers will > > > > get updated and ima with it. Poking at raw inodes without using > > > > appropriate helpers is much more likely to get ima into trouble. > > > > > > In addition to properly setting the i_version for IMA, EVM has a > > > similar issue with i_generation and s_uuid. Adding them to > > > ovl_copyattr() seems to resolve it. Does that make sense? > > > > > > diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c > > > index 923d66d131c1..cd0aeb828868 100644 > > > --- a/fs/overlayfs/util.c > > > +++ b/fs/overlayfs/util.c > > > @@ -1118,5 +1118,8 @@ void ovl_copyattr(struct inode *inode) > > > inode->i_atime = realinode->i_atime; > > > inode->i_mtime = realinode->i_mtime; > > > inode->i_ctime = realinode->i_ctime; > > > + inode->i_generation = realinode->i_generation; > > > + if (inode->i_sb) > > > + uuid_copy(&inode->i_sb->s_uuid, &realinode->i_sb- > > > >s_uuid); > > > > That is not a possible solution Mimi. > > > > The i_gneration copy *may* be acceptable in "all layers on same fs" > > setup, but changing overlayfs s_uuid over and over is a non-starter. > > > > If you explain the problem, I may be able to help you find a better solution. > > EVM calculates an HMAC of the file metadata (security xattrs, i_ino, > i_generation, i_uid, i_gid, i_mode, s_uuid) and stores it as > security.evm. Notrmally this would be used for mutable files, which > cannot be signed. The i_generation and s_uuid on the lower layer and > the overlay are not the same, causing the EVM HMAC verification to > fail. > OK, so EVM expects i_ino, i_generation, i_uid, i_gid, i_mode, s_uuid and security xattr to remain stable and persistent (survive umount/mount). Correct? You cannot expect that the same EVM xattr will correctly describe both the overlayfs inode and the underlying real fs inode, because they may vary in some of the metadata, so need to decide if you only want to attest overlayfs inodes, real underlying inodes or both. If both, then the same EVM xattr cannot be used, but as it is, overlayfs inode has no "private" xattr version, it stores its xattr on the underlying real inode. i_uid, i_gid, i_mode: Should be stable and persistent for overlayfs inode and survive copy up. Should be identical to the underlying inode. security xattr: Overlayfs tries to copy up all security.* xattr and also calls the LSM hook security_inode_copy_up_xattr() to approve each copied xattr. Should be identical to the underlying inode. s_uuid: So far, overlayfs sb has a null uuid. With this patch, overlayfs will gain a persistent s_uuid, just like any other disk fs with the opt-in feature index=on: https://lore.kernel.org/linux-unionfs/20230425132223.2608226-4-amir73il@gmail.com/ Should be different from the underlying fs uuid when there is more than one underlying fs. We can consider inheriting s_uuid from underlying fs when all layers are on the same fs. i_ino: As documented in: https://github.com/torvalds/linux/blob/master/Documentation/filesystems/overlayfs.rst#inode-properties It should be persistent and survive copy up with the xino=auto feature (module param or mount option) or CONFIG_OVERLAY_FS_XINO_AUTO=y which is not the kernel default, but already set by some distros. Will be identical to the underlying inode only in some special cases such as pure upper (not copied up) inodes. Will be different from the underlying lower file inode many in other cases. i_generation: For xino=auto, we could follow the same rules as i_ino and get similar qualities - i_generation will become persistent and survive copy up, but it will not be identical to the real underlying inode i_generation in many cases. Bottom line: If you only want to attest overlayfs inodes - shouldn't be too hard If you want to attest both overlayfs inodes AND their backing "real" inodes - much more challenging. Hope that this writeup helps more than it confuses. Thanks, Amir.
On Mon, 2023-05-22 at 08:49 +1000, Dave Chinner wrote: > > In addition the uuid should be set when the filesystem is mounted. > > Unless the filesystem implements a dedicated ioctl() - like ext4 - to > > change the uuid. > > IMO, that ext4 functionality is a landmine waiting to be stepped on. > > We should not be changing the sb->s_uuid of filesysetms dynamically. > The VFS does not guarantee in any way that it is safe to change the > sb->s_uuid (i.e. no locking, no change notifications, no udev > events, etc). Various subsystems - both in the kernel and in > userspace - use the sb->s_uuid as a canonical and/or persistent > filesystem/device identifier and are unprepared to have it change > while the filesystem is mounted and active. > > I commented on this from an XFS perspective here when it was > proposed to copy this ext4 mis-feature in XFS: > > https://lore.kernel.org/linux-xfs/20230314062847.GQ360264@dread.disaster.area/ > > Further to this, I also suspect that changing uuids online will > cause issues with userspace caching of fs uuids (e.g. libblkid and > anything that uses it) and information that uses uuids to identify > the filesystem that are set up at mount time (/dev/disk/by-uuid/ > links, etc) by kernel events sent to userspace helpers... > > IMO, we shouldn't even be considering dynamic sb->s_uuid changes > without first working through the full system impacts of having > persistent userspace-visible filesystem identifiers change > dynamically... Oh! FYI, we've started using the ability to change the UUID for IMA testing. IMA policy rules can be defined in terms of the UUID without impacting the existing policy rules. Changing the UUID can be used to enable different tests without interferring with existing policy rules. Mimi
On Mon, 2023-05-22 at 17:00 +0300, Amir Goldstein wrote: > On Mon, May 22, 2023 at 3:18 PM Mimi Zohar <zohar@linux.ibm.com> wrote: > > > > On Sat, 2023-05-20 at 12:15 +0300, Amir Goldstein wrote: > > > On Fri, May 19, 2023 at 10:42 PM Mimi Zohar <zohar@linux.ibm.com> wrote: > > > > > > > > On Fri, 2023-04-07 at 10:31 +0200, Christian Brauner wrote: > > > > > So, I think we want both; we want the ovl_copyattr() and the > > > > > vfs_getattr_nosec() change: > > > > > > > > > > (1) overlayfs should copy up the inode version in ovl_copyattr(). That > > > > > is in line what we do with all other inode attributes. IOW, the > > > > > overlayfs inode's i_version counter should aim to mirror the > > > > > relevant layer's i_version counter. I wouldn't know why that > > > > > shouldn't be the case. Asking the other way around there doesn't > > > > > seem to be any use for overlayfs inodes to have an i_version that > > > > > isn't just mirroring the relevant layer's i_version. > > > > > (2) Jeff's changes for ima to make it rely on vfs_getattr_nosec(). > > > > > Currently, ima assumes that it will get the correct i_version from > > > > > an inode but that just doesn't hold for stacking filesystem. > > > > > > > > > > While (1) would likely just fix the immediate bug (2) is correct and > > > > > _robust_. If we change how attributes are handled vfs_*() helpers will > > > > > get updated and ima with it. Poking at raw inodes without using > > > > > appropriate helpers is much more likely to get ima into trouble. > > > > > > > > In addition to properly setting the i_version for IMA, EVM has a > > > > similar issue with i_generation and s_uuid. Adding them to > > > > ovl_copyattr() seems to resolve it. Does that make sense? > > > > > > > > diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c > > > > index 923d66d131c1..cd0aeb828868 100644 > > > > --- a/fs/overlayfs/util.c > > > > +++ b/fs/overlayfs/util.c > > > > @@ -1118,5 +1118,8 @@ void ovl_copyattr(struct inode *inode) > > > > inode->i_atime = realinode->i_atime; > > > > inode->i_mtime = realinode->i_mtime; > > > > inode->i_ctime = realinode->i_ctime; > > > > + inode->i_generation = realinode->i_generation; > > > > + if (inode->i_sb) > > > > + uuid_copy(&inode->i_sb->s_uuid, &realinode->i_sb- > > > > >s_uuid); > > > > > > That is not a possible solution Mimi. > > > > > > The i_gneration copy *may* be acceptable in "all layers on same fs" > > > setup, but changing overlayfs s_uuid over and over is a non-starter. > > > > > > If you explain the problem, I may be able to help you find a better solution. > > > > EVM calculates an HMAC of the file metadata (security xattrs, i_ino, > > i_generation, i_uid, i_gid, i_mode, s_uuid) and stores it as > > security.evm. Notrmally this would be used for mutable files, which > > cannot be signed. The i_generation and s_uuid on the lower layer and > > the overlay are not the same, causing the EVM HMAC verification to > > fail. > > > > OK, so EVM expects i_ino, i_generation, i_uid, i_gid, i_mode, s_uuid > and security xattr to remain stable and persistent (survive umount/mount). > Correct? Yes > > You cannot expect that the same EVM xattr will correctly describe both > the overlayfs inode and the underlying real fs inode, because they may > vary in some of the metadata, so need to decide if you only want to attest > overlayfs inodes, real underlying inodes or both. Understood. Accessing a file on the overlay filesystem then needs to be verified based on the backing file metadata. Currently that isn't being done. So either all the backing file metadata needs to be copied up or some other change(s) need to be made. > If both, then the same EVM xattr cannot be used, but as it is, overlayfs > inode has no "private" xattr version, it stores its xattr on the underlying > real inode. > > i_uid, i_gid, i_mode: > Should be stable and persistent for overlayfs inode and survive copy up. > Should be identical to the underlying inode. > > security xattr: > Overlayfs tries to copy up all security.* xattr and also calls the LSM > hook security_inode_copy_up_xattr() to approve each copied xattr. > Should be identical to the underlying inode. > s_uuid: > So far, overlayfs sb has a null uuid. > With this patch, overlayfs will gain a persistent s_uuid, just like any > other disk fs with the opt-in feature index=on: > https://lore.kernel.org/linux-unionfs/20230425132223.2608226-4-amir73il@gmail.com/ > Should be different from the underlying fs uuid when there is more > than one underlying fs. > We can consider inheriting s_uuid from underlying fs when all layers > are on the same fs. > > i_ino: > As documented in: > https://github.com/torvalds/linux/blob/master/Documentation/filesystems/overlayfs.rst#inode-properties > It should be persistent and survive copy up with the > xino=auto feature (module param or mount option) or > CONFIG_OVERLAY_FS_XINO_AUTO=y > which is not the kernel default, but already set by some distros. > Will be identical to the underlying inode only in some special cases > such as pure upper (not copied up) inodes. > Will be different from the underlying lower file inode many in other cases. > > i_generation: > For xino=auto, we could follow the same rules as i_ino and get similar > qualities - > i_generation will become persistent and survive copy up, but it will not be > identical to the real underlying inode i_generation in many cases. > > Bottom line: > If you only want to attest overlayfs inodes - shouldn't be too hard > If you want to attest both overlayfs inodes AND their backing "real" inodes - > much more challenging. > > Hope that this writeup helps more than it confuses. Thanks, Amir. It definitely helps. To summarize what I'm seeing (IMA hash and EVM HMAC): - Directly accessing overlay files, "lower" backed file, fails to verify without copying all the file metadata up. - Writing directly to the "upper" backing file properly updates the file metadata. - Writing directly to the overlay file does not write security.ima either to the overlayfs or the "upper" backing file. policy rules: appraise func=FILE_CHECK fsuuid=.... measure func=FILE_CHECK fsuuid=.... appraise func=FILE_CHECK fsname=overlay measure func=FILE_CHECK fsname=overlay Mimi
On Fri, 2023-05-19 at 10:58 -0400, Paul Moore wrote: > On Thu, May 18, 2023 at 4:56 PM Mimi Zohar <zohar@linux.ibm.com> wrote: > > On Thu, 2023-05-18 at 16:46 -0400, Paul Moore wrote: > > > On Fri, Apr 21, 2023 at 10:44 AM Mimi Zohar <zohar@linux.ibm.com> wrote: > > > > On Fri, 2023-04-07 at 09:29 -0400, Jeff Layton wrote: > > ... > > > > I'm going through my review queue to make sure I haven't missed > > > anything and this thread popped up ... Stefan, Mimi, did you get a fix > > > into an upstream tree somewhere? If not, is it because you are > > > waiting on a review/merge from me into the LSM tree? > > > > Sorry for the delay. Between vacation and LSS, I just started testing > > Jeff Layton's patch. > > No worries, I'm a bit behind too, I just wanted to make sure I wasn't > blocking this thread :) FYI, Jeff Layton's patch is now queued in next-integrity.
diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c index 6011f955436b..19b8f4bcc18c 100644 --- a/fs/overlayfs/file.c +++ b/fs/overlayfs/file.c @@ -13,6 +13,7 @@ #include <linux/security.h> #include <linux/mm.h> #include <linux/fs.h> +#include <linux/integrity.h> #include "overlayfs.h" struct ovl_aio_req { @@ -169,6 +170,9 @@ static int ovl_open(struct inode *inode, struct file *file) static int ovl_release(struct inode *inode, struct file *file) { + if (file->f_flags & O_ACCMODE) + integrity_notify_change(inode); + fput(file->private_data); return 0; diff --git a/include/linux/integrity.h b/include/linux/integrity.h index 2ea0f2f65ab6..cefdeccc1619 100644 --- a/include/linux/integrity.h +++ b/include/linux/integrity.h @@ -23,6 +23,7 @@ enum integrity_status { #ifdef CONFIG_INTEGRITY extern struct integrity_iint_cache *integrity_inode_get(struct inode *inode); extern void integrity_inode_free(struct inode *inode); +extern void integrity_notify_change(struct inode *inode); extern void __init integrity_load_keys(void); #else @@ -37,6 +38,11 @@ static inline void integrity_inode_free(struct inode *inode) return; } +static inline void integrity_notify_change(struct inode *inode) +{ + return; +} + static inline void integrity_load_keys(void) { } diff --git a/security/integrity/iint.c b/security/integrity/iint.c index 8638976f7990..70d2d716f3ae 100644 --- a/security/integrity/iint.c +++ b/security/integrity/iint.c @@ -85,6 +85,19 @@ static void iint_free(struct integrity_iint_cache *iint) kmem_cache_free(iint_cache, iint); } +void integrity_notify_change(struct inode *inode) +{ + struct integrity_iint_cache *iint; + + if (!IS_IMA(inode)) + return; + + iint = integrity_iint_find(inode); + if (iint) + set_bit(IMA_CHANGE_XATTR, &iint->atomic_flags); +} +EXPORT_SYMBOL_GPL(integrity_notify_change); + /** * integrity_inode_get - find or allocate an iint associated with an inode * @inode: pointer to the inode