[v2] kasan: fix deadlock in start_report()

Message ID 20230209031159.2337445-1-ouyangweizhao@zeku.com
State New
Headers
Series [v2] kasan: fix deadlock in start_report() |

Commit Message

Weizhao Ouyang Feb. 9, 2023, 3:11 a.m. UTC
  From: Weizhao Ouyang <o451686892@gmail.com>

From: Shuai Yuan <yuanshuai@zeku.com>

Calling start_report() again between start_report() and end_report()
will result in a race issue for the report_lock. In extreme cases this
problem arose in Kunit tests in the hardware tag-based Kasan mode.

For example, when an invalid memory release problem is found,
kasan_report_invalid_free() will print error log, but if an MTE exception
is raised during the output log, the kasan_report() is called, resulting
in a deadlock problem. The kasan_depth not protect it in hardware
tag-based Kasan mode.

Signed-off-by: Shuai Yuan <yuanshuai@zeku.com>
Reviewed-by: Weizhao Ouyang <ouyangweizhao@zeku.com>
Reviewed-by: Peng Ren <renlipeng@zeku.com>
---
Changes in v2:
-- remove redundant log

 mm/kasan/report.c | 25 ++++++++++++++++++++-----
 1 file changed, 20 insertions(+), 5 deletions(-)
  

Comments

Dmitry Vyukov Feb. 9, 2023, 8:55 a.m. UTC | #1
On Thu, 9 Feb 2023 at 04:27, Weizhao Ouyang <ouyangweizhao@zeku.com> wrote:
>
> From: Weizhao Ouyang <o451686892@gmail.com>
>
> From: Shuai Yuan <yuanshuai@zeku.com>
>
> Calling start_report() again between start_report() and end_report()
> will result in a race issue for the report_lock. In extreme cases this
> problem arose in Kunit tests in the hardware tag-based Kasan mode.
>
> For example, when an invalid memory release problem is found,
> kasan_report_invalid_free() will print error log, but if an MTE exception
> is raised during the output log, the kasan_report() is called, resulting
> in a deadlock problem. The kasan_depth not protect it in hardware
> tag-based Kasan mode.

I think checking report_suppressed() would be cleaner and simpler than
ignoring all trylock failures. If trylock fails, it does not mean that
the current thread is holding it. We of course could do a custom lock
which stores current->tid in the lock word, but it looks effectively
equivalent to checking report_suppressed().



> Signed-off-by: Shuai Yuan <yuanshuai@zeku.com>
> Reviewed-by: Weizhao Ouyang <ouyangweizhao@zeku.com>
> Reviewed-by: Peng Ren <renlipeng@zeku.com>
> ---
> Changes in v2:
> -- remove redundant log
>
>  mm/kasan/report.c | 25 ++++++++++++++++++++-----
>  1 file changed, 20 insertions(+), 5 deletions(-)
>
> diff --git a/mm/kasan/report.c b/mm/kasan/report.c
> index 22598b20c7b7..aa39aa8b1855 100644
> --- a/mm/kasan/report.c
> +++ b/mm/kasan/report.c
> @@ -166,7 +166,7 @@ static inline void fail_non_kasan_kunit_test(void) { }
>
>  static DEFINE_SPINLOCK(report_lock);
>
> -static void start_report(unsigned long *flags, bool sync)
> +static bool start_report(unsigned long *flags, bool sync)
>  {
>         fail_non_kasan_kunit_test();
>         /* Respect the /proc/sys/kernel/traceoff_on_warning interface. */
> @@ -175,8 +175,13 @@ static void start_report(unsigned long *flags, bool sync)
>         lockdep_off();
>         /* Make sure we don't end up in loop. */
>         kasan_disable_current();
> -       spin_lock_irqsave(&report_lock, *flags);
> +       if (!spin_trylock_irqsave(&report_lock, *flags)) {
> +               lockdep_on();
> +               kasan_enable_current();
> +               return false;
> +       }
>         pr_err("==================================================================\n");
> +       return true;
>  }
>
>  static void end_report(unsigned long *flags, void *addr)
> @@ -468,7 +473,10 @@ void kasan_report_invalid_free(void *ptr, unsigned long ip, enum kasan_report_ty
>         if (unlikely(!report_enabled()))
>                 return;
>
> -       start_report(&flags, true);
> +       if (!start_report(&flags, true)) {
> +               pr_err("%s: report ignore\n", __func__);
> +               return;
> +       }
>
>         memset(&info, 0, sizeof(info));
>         info.type = type;
> @@ -503,7 +511,11 @@ bool kasan_report(unsigned long addr, size_t size, bool is_write,
>                 goto out;
>         }
>
> -       start_report(&irq_flags, true);
> +       if (!start_report(&irq_flags, true)) {
> +               ret = false;
> +               pr_err("%s: report ignore\n", __func__);
> +               goto out;
> +       }
>
>         memset(&info, 0, sizeof(info));
>         info.type = KASAN_REPORT_ACCESS;
> @@ -536,7 +548,10 @@ void kasan_report_async(void)
>         if (unlikely(!report_enabled()))
>                 return;
>
> -       start_report(&flags, false);
> +       if (!start_report(&flags, false)) {
> +               pr_err("%s: report ignore\n", __func__);
> +               return;
> +       }
>         pr_err("BUG: KASAN: invalid-access\n");
>         pr_err("Asynchronous fault: no details available\n");
>         pr_err("\n");
> --
> 2.25.1
>
> --
> You received this message because you are subscribed to the Google Groups "kasan-dev" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to kasan-dev+unsubscribe@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/kasan-dev/20230209031159.2337445-1-ouyangweizhao%40zeku.com.
  
袁帅(Shuai Yuan) Feb. 9, 2023, 9:19 a.m. UTC | #2
Hi Dmitry Vyukov

Thanks, I see that your means.

Currently, report_suppressed() seem not work in Kasan-HW mode, it always return false.
Do you think should change the report_suppressed function?
I don't know why CONFIG_KASAN_HW_TAGS was blocked separately before.

-----邮件原件-----
发件人: Dmitry Vyukov <dvyukov@google.com>
发送时间: 2023年2月9日 16:56
收件人: 欧阳炜钊(Weizhao Ouyang) <ouyangweizhao@zeku.com>
抄送: Andrey Ryabinin <ryabinin.a.a@gmail.com>; Alexander Potapenko <glider@google.com>; Andrey Konovalov <andreyknvl@gmail.com>; Vincenzo Frascino <vincenzo.frascino@arm.com>; Andrew Morton <akpm@linux-foundation.org>; kasan-dev@googlegroups.com; linux-mm@kvack.org; linux-kernel@vger.kernel.org; Weizhao Ouyang <o451686892@gmail.com>; 袁帅(Shuai Yuan) <yuanshuai@zeku.com>; 任立鹏(Peng Ren) <renlipeng@zeku.com>
主题: Re: [PATCH v2] kasan: fix deadlock in start_report()

On Thu, 9 Feb 2023 at 04:27, Weizhao Ouyang <ouyangweizhao@zeku.com> wrote:
>
> From: Weizhao Ouyang <o451686892@gmail.com>
>
> From: Shuai Yuan <yuanshuai@zeku.com>
>
> Calling start_report() again between start_report() and end_report()
> will result in a race issue for the report_lock. In extreme cases this
> problem arose in Kunit tests in the hardware tag-based Kasan mode.
>
> For example, when an invalid memory release problem is found,
> kasan_report_invalid_free() will print error log, but if an MTE
> exception is raised during the output log, the kasan_report() is
> called, resulting in a deadlock problem. The kasan_depth not protect
> it in hardware tag-based Kasan mode.

I think checking report_suppressed() would be cleaner and simpler than ignoring all trylock failures. If trylock fails, it does not mean that the current thread is holding it. We of course could do a custom lock which stores current->tid in the lock word, but it looks effectively equivalent to checking report_suppressed().



> Signed-off-by: Shuai Yuan <yuanshuai@zeku.com>
> Reviewed-by: Weizhao Ouyang <ouyangweizhao@zeku.com>
> Reviewed-by: Peng Ren <renlipeng@zeku.com>
> ---
> Changes in v2:
> -- remove redundant log
>
>  mm/kasan/report.c | 25 ++++++++++++++++++++-----
>  1 file changed, 20 insertions(+), 5 deletions(-)
>
> diff --git a/mm/kasan/report.c b/mm/kasan/report.c index
> 22598b20c7b7..aa39aa8b1855 100644
> --- a/mm/kasan/report.c
> +++ b/mm/kasan/report.c
> @@ -166,7 +166,7 @@ static inline void fail_non_kasan_kunit_test(void)
> { }
>
>  static DEFINE_SPINLOCK(report_lock);
>
> -static void start_report(unsigned long *flags, bool sync)
> +static bool start_report(unsigned long *flags, bool sync)
>  {
>         fail_non_kasan_kunit_test();
>         /* Respect the /proc/sys/kernel/traceoff_on_warning interface.
> */ @@ -175,8 +175,13 @@ static void start_report(unsigned long *flags, bool sync)
>         lockdep_off();
>         /* Make sure we don't end up in loop. */
>         kasan_disable_current();
> -       spin_lock_irqsave(&report_lock, *flags);
> +       if (!spin_trylock_irqsave(&report_lock, *flags)) {
> +               lockdep_on();
> +               kasan_enable_current();
> +               return false;
> +       }
>
> pr_err("==============================================================
> ====\n");
> +       return true;
>  }
>
>  static void end_report(unsigned long *flags, void *addr) @@ -468,7
> +473,10 @@ void kasan_report_invalid_free(void *ptr, unsigned long ip, enum kasan_report_ty
>         if (unlikely(!report_enabled()))
>                 return;
>
> -       start_report(&flags, true);
> +       if (!start_report(&flags, true)) {
> +               pr_err("%s: report ignore\n", __func__);
> +               return;
> +       }
>
>         memset(&info, 0, sizeof(info));
>         info.type = type;
> @@ -503,7 +511,11 @@ bool kasan_report(unsigned long addr, size_t size, bool is_write,
>                 goto out;
>         }
>
> -       start_report(&irq_flags, true);
> +       if (!start_report(&irq_flags, true)) {
> +               ret = false;
> +               pr_err("%s: report ignore\n", __func__);
> +               goto out;
> +       }
>
>         memset(&info, 0, sizeof(info));
>         info.type = KASAN_REPORT_ACCESS; @@ -536,7 +548,10 @@ void
> kasan_report_async(void)
>         if (unlikely(!report_enabled()))
>                 return;
>
> -       start_report(&flags, false);
> +       if (!start_report(&flags, false)) {
> +               pr_err("%s: report ignore\n", __func__);
> +               return;
> +       }
>         pr_err("BUG: KASAN: invalid-access\n");
>         pr_err("Asynchronous fault: no details available\n");
>         pr_err("\n");
> --
> 2.25.1
>
> --
> You received this message because you are subscribed to the Google Groups "kasan-dev" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to kasan-dev+unsubscribe@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/kasan-dev/20230209031159.2337445-1-ouyangweizhao%40zeku.com.
ZEKU
信息安全声明:本邮件包含信息归发件人所在组织ZEKU所有。 禁止任何人在未经授权的情况下以任何形式(包括但不限于全部或部分披露、复制或传播)使用包含的信息。若您错收了本邮件,请立即电话或邮件通知发件人,并删除本邮件及附件。
Information Security Notice: The information contained in this mail is solely property of the sender's organization ZEKU. Any use of the information contained herein in any way (including, but not limited to, total or partial disclosure, reproduction, or dissemination) by persons other than the intended recipient(s) is prohibited. If you receive this email in error, please notify the sender by phone or email immediately and delete it.
  
Dmitry Vyukov Feb. 9, 2023, 10:44 a.m. UTC | #3
aOn Thu, 9 Feb 2023 at 10:19, 袁帅(Shuai Yuan) <yuanshuai@zeku.com> wrote:
>
> Hi Dmitry Vyukov
>
> Thanks, I see that your means.
>
> Currently, report_suppressed() seem not work in Kasan-HW mode, it always return false.
> Do you think should change the report_suppressed function?
> I don't know why CONFIG_KASAN_HW_TAGS was blocked separately before.

That logic was added by Andrey in:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c068664c97c7cf

Andrey, can we make report_enabled() check current->kasan_depth and
remove report_suppressed()?

Then we can also remove the comment in kasan_report_invalid_free().

It looks like kasan_disable_current() in kmemleak needs to affect
HW_TAGS mode as well:
https://elixir.bootlin.com/linux/v6.2-rc7/source/mm/kmemleak.c#L301

So overall it looks like simplifications and it will fix what Shuai reported.




> -----邮件原件-----
> 发件人: Dmitry Vyukov <dvyukov@google.com>
> 发送时间: 2023年2月9日 16:56
> 收件人: 欧阳炜钊(Weizhao Ouyang) <ouyangweizhao@zeku.com>
> 抄送: Andrey Ryabinin <ryabinin.a.a@gmail.com>; Alexander Potapenko <glider@google.com>; Andrey Konovalov <andreyknvl@gmail.com>; Vincenzo Frascino <vincenzo.frascino@arm.com>; Andrew Morton <akpm@linux-foundation.org>; kasan-dev@googlegroups.com; linux-mm@kvack.org; linux-kernel@vger.kernel.org; Weizhao Ouyang <o451686892@gmail.com>; 袁帅(Shuai Yuan) <yuanshuai@zeku.com>; 任立鹏(Peng Ren) <renlipeng@zeku.com>
> 主题: Re: [PATCH v2] kasan: fix deadlock in start_report()
>
> On Thu, 9 Feb 2023 at 04:27, Weizhao Ouyang <ouyangweizhao@zeku.com> wrote:
> >
> > From: Weizhao Ouyang <o451686892@gmail.com>
> >
> > From: Shuai Yuan <yuanshuai@zeku.com>
> >
> > Calling start_report() again between start_report() and end_report()
> > will result in a race issue for the report_lock. In extreme cases this
> > problem arose in Kunit tests in the hardware tag-based Kasan mode.
> >
> > For example, when an invalid memory release problem is found,
> > kasan_report_invalid_free() will print error log, but if an MTE
> > exception is raised during the output log, the kasan_report() is
> > called, resulting in a deadlock problem. The kasan_depth not protect
> > it in hardware tag-based Kasan mode.
>
> I think checking report_suppressed() would be cleaner and simpler than ignoring all trylock failures. If trylock fails, it does not mean that the current thread is holding it. We of course could do a custom lock which stores current->tid in the lock word, but it looks effectively equivalent to checking report_suppressed().
>
>
>
> > Signed-off-by: Shuai Yuan <yuanshuai@zeku.com>
> > Reviewed-by: Weizhao Ouyang <ouyangweizhao@zeku.com>
> > Reviewed-by: Peng Ren <renlipeng@zeku.com>
> > ---
> > Changes in v2:
> > -- remove redundant log
> >
> >  mm/kasan/report.c | 25 ++++++++++++++++++++-----
> >  1 file changed, 20 insertions(+), 5 deletions(-)
> >
> > diff --git a/mm/kasan/report.c b/mm/kasan/report.c index
> > 22598b20c7b7..aa39aa8b1855 100644
> > --- a/mm/kasan/report.c
> > +++ b/mm/kasan/report.c
> > @@ -166,7 +166,7 @@ static inline void fail_non_kasan_kunit_test(void)
> > { }
> >
> >  static DEFINE_SPINLOCK(report_lock);
> >
> > -static void start_report(unsigned long *flags, bool sync)
> > +static bool start_report(unsigned long *flags, bool sync)
> >  {
> >         fail_non_kasan_kunit_test();
> >         /* Respect the /proc/sys/kernel/traceoff_on_warning interface.
> > */ @@ -175,8 +175,13 @@ static void start_report(unsigned long *flags, bool sync)
> >         lockdep_off();
> >         /* Make sure we don't end up in loop. */
> >         kasan_disable_current();
> > -       spin_lock_irqsave(&report_lock, *flags);
> > +       if (!spin_trylock_irqsave(&report_lock, *flags)) {
> > +               lockdep_on();
> > +               kasan_enable_current();
> > +               return false;
> > +       }
> >
> > pr_err("==============================================================
> > ====\n");
> > +       return true;
> >  }
> >
> >  static void end_report(unsigned long *flags, void *addr) @@ -468,7
> > +473,10 @@ void kasan_report_invalid_free(void *ptr, unsigned long ip, enum kasan_report_ty
> >         if (unlikely(!report_enabled()))
> >                 return;
> >
> > -       start_report(&flags, true);
> > +       if (!start_report(&flags, true)) {
> > +               pr_err("%s: report ignore\n", __func__);
> > +               return;
> > +       }
> >
> >         memset(&info, 0, sizeof(info));
> >         info.type = type;
> > @@ -503,7 +511,11 @@ bool kasan_report(unsigned long addr, size_t size, bool is_write,
> >                 goto out;
> >         }
> >
> > -       start_report(&irq_flags, true);
> > +       if (!start_report(&irq_flags, true)) {
> > +               ret = false;
> > +               pr_err("%s: report ignore\n", __func__);
> > +               goto out;
> > +       }
> >
> >         memset(&info, 0, sizeof(info));
> >         info.type = KASAN_REPORT_ACCESS; @@ -536,7 +548,10 @@ void
> > kasan_report_async(void)
> >         if (unlikely(!report_enabled()))
> >                 return;
> >
> > -       start_report(&flags, false);
> > +       if (!start_report(&flags, false)) {
> > +               pr_err("%s: report ignore\n", __func__);
> > +               return;
> > +       }
> >         pr_err("BUG: KASAN: invalid-access\n");
> >         pr_err("Asynchronous fault: no details available\n");
> >         pr_err("\n");
> > --
> > 2.25.1
> >
> > --
> > You received this message because you are subscribed to the Google Groups "kasan-dev" group.
> > To unsubscribe from this group and stop receiving emails from it, send an email to kasan-dev+unsubscribe@googlegroups.com.
> > To view this discussion on the web visit https://groups.google.com/d/msgid/kasan-dev/20230209031159.2337445-1-ouyangweizhao%40zeku.com.
> ZEKU
> 信息安全声明:本邮件包含信息归发件人所在组织ZEKU所有。 禁止任何人在未经授权的情况下以任何形式(包括但不限于全部或部分披露、复制或传播)使用包含的信息。若您错收了本邮件,请立即电话或邮件通知发件人,并删除本邮件及附件。
> Information Security Notice: The information contained in this mail is solely property of the sender's organization ZEKU. Any use of the information contained herein in any way (including, but not limited to, total or partial disclosure, reproduction, or dissemination) by persons other than the intended recipient(s) is prohibited. If you receive this email in error, please notify the sender by phone or email immediately and delete it.
  
Andrey Konovalov Feb. 9, 2023, 10:54 p.m. UTC | #4
On Thu, Feb 9, 2023 at 11:44 AM Dmitry Vyukov <dvyukov@google.com> wrote:
>
>  On Thu, 9 Feb 2023 at 10:19, 袁帅(Shuai Yuan) <yuanshuai@zeku.com> wrote:
> >
> > Hi Dmitry Vyukov
> >
> > Thanks, I see that your means.
> >
> > Currently, report_suppressed() seem not work in Kasan-HW mode, it always return false.
> > Do you think should change the report_suppressed function?
> > I don't know why CONFIG_KASAN_HW_TAGS was blocked separately before.
>
> That logic was added by Andrey in:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c068664c97c7cf
>
> Andrey, can we make report_enabled() check current->kasan_depth and
> remove report_suppressed()?

I decided to not use kasan_depth for HW_TAGS, as we can always use a
match-all tag to make "invalid" memory accesses.

I think we can fix the reporting code to do exactly that so that it
doesn't cause MTE faults.

Shuai, could you clarify, at which point due kasan_report_invalid_free
an MTE exception is raised in your tests?

> Then we can also remove the comment in kasan_report_invalid_free().
>
> It looks like kasan_disable_current() in kmemleak needs to affect
> HW_TAGS mode as well:
> https://elixir.bootlin.com/linux/v6.2-rc7/source/mm/kmemleak.c#L301

It uses kasan_reset_tag, so it should work properly with HW_TAGS.
  
袁帅(Shuai Yuan) Feb. 10, 2023, 2:32 a.m. UTC | #5
On Friday, February 10, 2023 at 6:54 AM Andrey Konovalov <andreyknvl@gmail.com>
wrote:
> On Thu, Feb 9, 2023 at 11:44 AM Dmitry Vyukov <dvyukov@google.com>
> wrote:
> >
> >  On Thu, 9 Feb 2023 at 10:19, 袁帅(Shuai Yuan) <yuanshuai@zeku.com>
> wrote:
> > >
> > > Hi Dmitry Vyukov
> > >
> > > Thanks, I see that your means.
> > >
> > > Currently, report_suppressed() seem not work in Kasan-HW mode, it
> always return false.
> > > Do you think should change the report_suppressed function?
> > > I don't know why CONFIG_KASAN_HW_TAGS was blocked separately
> before.
> >
> > That logic was added by Andrey in:
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/com
> > mit/?id=c068664c97c7cf
> >
> > Andrey, can we make report_enabled() check current->kasan_depth and
> > remove report_suppressed()?
>
> I decided to not use kasan_depth for HW_TAGS, as we can always use a
> match-all tag to make "invalid" memory accesses.
>
> I think we can fix the reporting code to do exactly that so that it doesn't
> cause MTE faults.
>
> Shuai, could you clarify, at which point due kasan_report_invalid_free an
> MTE exception is raised in your tests?

Yes, I need some time to clarify this problem with a clear log by test.

> > Then we can also remove the comment in kasan_report_invalid_free().
> >
> > It looks like kasan_disable_current() in kmemleak needs to affect
> > HW_TAGS mode as well:
> > https://elixir.bootlin.com/linux/v6.2-rc7/source/mm/kmemleak.c#L301
>
> It uses kasan_reset_tag, so it should work properly with HW_TAGS.
ZEKU
信息安全声明:本邮件包含信息归发件人所在组织ZEKU所有。 禁止任何人在未经授权的情况下以任何形式(包括但不限于全部或部分披露、复制或传播)使用包含的信息。若您错收了本邮件,请立即电话或邮件通知发件人,并删除本邮件及附件。
Information Security Notice: The information contained in this mail is solely property of the sender's organization ZEKU. Any use of the information contained herein in any way (including, but not limited to, total or partial disclosure, reproduction, or dissemination) by persons other than the intended recipient(s) is prohibited. If you receive this email in error, please notify the sender by phone or email immediately and delete it.
  
袁帅(Shuai Yuan) Feb. 15, 2023, 1:22 p.m. UTC | #6
> On Friday, February 10, 2023 at 6:54 AM Andrey Konovalov
> <andreyknvl@gmail.com>
> wrote:
> > On Thu, Feb 9, 2023 at 11:44 AM Dmitry Vyukov <dvyukov@google.com>
> > wrote:
> > >
> > >  On Thu, 9 Feb 2023 at 10:19, 袁帅(Shuai Yuan) <yuanshuai@zeku.com>
> > wrote:
> > > >
> > > > Hi Dmitry Vyukov
> > > >
> > > > Thanks, I see that your means.
> > > >
> > > > Currently, report_suppressed() seem not work in Kasan-HW mode, it
> > always return false.
> > > > Do you think should change the report_suppressed function?
> > > > I don't know why CONFIG_KASAN_HW_TAGS was blocked separately
> > before.
> > >
> > > That logic was added by Andrey in:
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/c
> > > om
> > > mit/?id=c068664c97c7cf
> > >
> > > Andrey, can we make report_enabled() check current->kasan_depth and
> > > remove report_suppressed()?
> >
> > I decided to not use kasan_depth for HW_TAGS, as we can always use a
> > match-all tag to make "invalid" memory accesses.
> >
> > I think we can fix the reporting code to do exactly that so that it
> > doesn't cause MTE faults.
> >
> > Shuai, could you clarify, at which point due kasan_report_invalid_free
> > an MTE exception is raised in your tests?
>
> Yes, I need some time to clarify this problem with a clear log by test.
>

Hi Andrey and Dmitry

I have got valid information to clarify the problem and solutions. I made
a few changes to the code to do this.

a) I was testing on a device that had hardware issues with MTE,
    and the memory tag sometimes changed randomly.

b) I did this test on kernel version 5.15, but this problem should
    exist on the latest kernel version from a code perspective.

c) Run the kernel with a single core by "maxcpus=1".

d) Code modify,
    (1) Call dump_stack_lvl(KERN_ERR) when start_report() returns false,
    this is done based on the current patch v2.

    (2) Add some log in print_address_description() to show kmem_cache address
    and memory tag.
    https://elixir.bootlin.com/linux/v5.15.94/source/mm/kasan/report.c#L252
   @@ -255,24 +260,25 @@ static void print_address_description(void *addr, u8 tag)

 dump_stack_lvl(KERN_ERR);
 pr_err("\n");
  -
  +pr_err("ys:1\n");
 if (page && PageSlab(page)) {
 struct kmem_cache    *cache = page->slab_cache;
  -void *object = nearest_obj(cache, page,addr);
  +void *object = NULL;
  +pr_err("ys:cache start %llx, mtag:%x, page_address:%llx\n",
  +cache, hw_get_mem_tag(cache), page_address(page));
  +object = nearest_obj(cache, page, addr);
  +                         pr_err("ys:cache end %llx, object %llx, page_address:%llx\n",
  +                                        cache, object, page_address(page));
describe_object(cache, object, addr, tag);
 }

    (3) Add kasan_enable_tagging() to KUNIT_EXPECT_KASAN_FAIL in
    https://elixir.bootlin.com/linux/v5.15.94/source/lib/test_kasan.c#L94
    This ensures that kunit is tested on this unstable device.

e) With the above modification we can get the backtrace:
ys:1
ys:cache start f4ffff8140005380, mtag:fe, page_address:ffffff8140328000
ys:cache change f4ffff8140005380, mtag:fe, page_address:ffffff8140328000
ys: error address:f4ffff8140005398
Pointer tag: [f4], memory tag: [fe]
CPU: 0 PID: 100 Comm: kunit_try_catch Tainted:
Call trace:
dump_backtrace.cfi_jt+0x0/0x8
 show_stack+0x28/0x38
 dump_stack_lvl+0x68/0x98
 __kasan_report+0x110/0x29c
kasan_report+0x40/0x8c
 __do_kernel_fault+0xd4/0x2c4
 do_bad_area+0x40/0x100
 do_tag_check_fault+0x2c/0x40
 do_mem_abort+0x74/0x138
el1_abort+0x40/0x64
 el1h_64_sync_handler+0x60/0xa0
 el1h_64_sync+0x7c/0x80
 print_address_description+0x154/0x2e8
 __kasan_report+0x200/0x29c
kasan_report+0x40/0x8c
 __do_kernel_fault+0xd4/0x2c4
 do_bad_area+0x40/0x100
 do_tag_check_fault+0x2c/0x40
 do_mem_abort+0x74/0x138
el1_abort+0x40/0x64
 el1h_64_sync_handler+0x60/0xa0
 el1h_64_sync+0x7c/0x80
 enqueue_entity+0x23c/0x4b8
enqueue_task_fair+0x13c/0x48c
 enqueue_task.llvm.1684042887774774428+0xd0/0x250
 __do_set_cpus_allowed+0x1ac/0x304
 __set_cpus_allowed_ptr_locked+0x168/0x28c
 migrate_enable+0xf0/0x17c
kasan_strings+0x59c/0x72c
kunit_try_run_case+0x84/0x128
kunit_generic_run_threadfn_adapter+0x48/0x80
kthread+0x17c/0x1e8
ret_from_fork+0x10/0x20
ys:cache end f4ffff8140005380, object ffffff814032ca00, page_address:ffffff8140328000

f) From the above log, you can see that the system tried to call kasan_report() twice,
   because we visit tag address by kmem_cache and this tag have change..
   Normally this doesn't happen easily. So I think we can add kasan_reset_tag() to handle
   the kmem_cache address.

   For example, the following changes are used for the latest kernel version.
diff --git a/mm/kasan/report.c b/mm/kasan/report.c
--- a/mm/kasan/report.c
+++ b/mm/kasan/report.c
@@ -412,7 +412,7 @@ static void complete_report_info(struct kasan_report_info *info)
        slab = kasan_addr_to_slab(addr);
        if (slab) {
-               info->cache = slab->slab_cache;
+               info->cache = kasan_reset_tag(slab->slab_cache);
                info->object = nearest_obj(info->cache, slab, addr);

   I have tested Kernel5.15 using a similar approach and it seems to work.
   On the other hand, I think there should be other solutions and hope to get your feedback.
   Thanks a lot.

> > > Then we can also remove the comment in kasan_report_invalid_free().
> > >
> > > It looks like kasan_disable_current() in kmemleak needs to affect
> > > HW_TAGS mode as well:
> > > https://elixir.bootlin.com/linux/v6.2-rc7/source/mm/kmemleak.c#L301
> >
> > It uses kasan_reset_tag, so it should work properly with HW_TAGS.
ZEKU
信息安全声明:本邮件包含信息归发件人所在组织ZEKU所有。 禁止任何人在未经授权的情况下以任何形式(包括但不限于全部或部分披露、复制或传播)使用包含的信息。若您错收了本邮件,请立即电话或邮件通知发件人,并删除本邮件及附件。
Information Security Notice: The information contained in this mail is solely property of the sender's organization ZEKU. Any use of the information contained herein in any way (including, but not limited to, total or partial disclosure, reproduction, or dissemination) by persons other than the intended recipient(s) is prohibited. If you receive this email in error, please notify the sender by phone or email immediately and delete it.
  
Andrey Konovalov Feb. 27, 2023, 2:13 a.m. UTC | #7
On Wed, Feb 15, 2023 at 2:22 PM 袁帅(Shuai Yuan) <yuanshuai@zeku.com> wrote:
>
> I have got valid information to clarify the problem and solutions. I made
> a few changes to the code to do this.
>
> a) I was testing on a device that had hardware issues with MTE,
>     and the memory tag sometimes changed randomly.

Ah, I see. Faulty hardware explains the problem. Thank you!

> f) From the above log, you can see that the system tried to call kasan_report() twice,
>    because we visit tag address by kmem_cache and this tag have change..
>    Normally this doesn't happen easily. So I think we can add kasan_reset_tag() to handle
>    the kmem_cache address.
>
>    For example, the following changes are used for the latest kernel version.
> diff --git a/mm/kasan/report.c b/mm/kasan/report.c
> --- a/mm/kasan/report.c
> +++ b/mm/kasan/report.c
> @@ -412,7 +412,7 @@ static void complete_report_info(struct kasan_report_info *info)
>         slab = kasan_addr_to_slab(addr);
>         if (slab) {
> -               info->cache = slab->slab_cache;
> +               info->cache = kasan_reset_tag(slab->slab_cache);

This fixes the problem for accesses to slab_cache, but KASAN reporting
code also accesses stack depot memory and calls other routines that
might access (faulty) tagged memory. And the accessed addresses aren't
exposed to KASAN code, so we can't use kasan_reset_tag for those.

I wonder what would be a good solution here. I really don't want to
use kasan_depth or some other global/per-cpu flag here, as it would be
too good of a target for attackers wishing to bypass MTE. Perhaps,
disabling MTE once reporting started would be a better option: calling
the disabling routine would arguably be a harder task for an attacker
than overwriting a flag.

+Catalin, would it be acceptable to implement a routine that disables
in-kernel MTE tag checking (until the next
mte_enable_kernel_sync/async/asymm call)? In a similar way an MTE
fault does this, but without the fault itself. I.e., expose the part
of do_tag_recovery functionality without report_tag_fault?

TL;DR on the problem: Besides relying on CPU tag checks, KASAN also
does explicit tag checks to detect double-frees and similar problems,
see the calls to kasan_report_invalid_free. Thus, when e.g. a
double-free report is printed, MTE checking is still on. This results
in a deadlock in case invalid memory is accessed during KASAN
reporting.

Thanks!
  
Catalin Marinas Feb. 28, 2023, 4:09 p.m. UTC | #8
On Mon, Feb 27, 2023 at 03:13:45AM +0100, Andrey Konovalov wrote:
> +Catalin, would it be acceptable to implement a routine that disables
> in-kernel MTE tag checking (until the next
> mte_enable_kernel_sync/async/asymm call)? In a similar way an MTE
> fault does this, but without the fault itself. I.e., expose the part
> of do_tag_recovery functionality without report_tag_fault?

I don't think we ever re-enable MTE after do_tag_recovery(). The
mte_enable_kernel_*() are called at boot. We do call
kasan_enable_tagging() explicitly in the kunit tests but that's a
controlled fault environment.

IIUC, the problem is that the kernel already got an MTE fault, so at
that point the error is not really recoverable. If we want to avoid a
fault in the first place, we could do something like
__uaccess_enable_tco() (Vincenzo has some patches to generalise these
routines) but if an MTE fault already triggered and MTE is to stay
disabled after the reporting anyway, I don't think it's worth it.

So I wonder whether it's easier to just disable MTE before calling
report_tag_fault() so that it won't trigger additional faults:

diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index f4cb0f85ccf4..1449d2bc6f10 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -329,8 +329,6 @@ static void do_tag_recovery(unsigned long addr, unsigned long esr,
 			   struct pt_regs *regs)
 {
 
-	report_tag_fault(addr, esr, regs);
-
 	/*
 	 * Disable MTE Tag Checking on the local CPU for the current EL.
 	 * It will be done lazily on the other CPUs when they will hit a
@@ -339,6 +337,8 @@ static void do_tag_recovery(unsigned long addr, unsigned long esr,
 	sysreg_clear_set(sctlr_el1, SCTLR_EL1_TCF_MASK,
 			 SYS_FIELD_PREP_ENUM(SCTLR_EL1, TCF, NONE));
 	isb();
+
+	report_tag_fault(addr, esr, regs);
 }
 
 static bool is_el1_mte_sync_tag_check_fault(unsigned long esr)
  
Andrey Konovalov Feb. 28, 2023, 9:50 p.m. UTC | #9
On Tue, Feb 28, 2023 at 5:09 PM Catalin Marinas <catalin.marinas@arm.com> wrote:
>
> On Mon, Feb 27, 2023 at 03:13:45AM +0100, Andrey Konovalov wrote:
> > +Catalin, would it be acceptable to implement a routine that disables
> > in-kernel MTE tag checking (until the next
> > mte_enable_kernel_sync/async/asymm call)? In a similar way an MTE
> > fault does this, but without the fault itself. I.e., expose the part
> > of do_tag_recovery functionality without report_tag_fault?
>
> I don't think we ever re-enable MTE after do_tag_recovery(). The
> mte_enable_kernel_*() are called at boot. We do call
> kasan_enable_tagging() explicitly in the kunit tests but that's a
> controlled fault environment.

Right, but here we don't want to re-enable MTE after a fault, we want
to suppress faults when printing an error report.

> IIUC, the problem is that the kernel already got an MTE fault, so at
> that point the error is not really recoverable.

No, the problem is with the following sequence of events:

1. KASAN detects a memory corruption and starts printing a report
_without getting an MTE fault_. This happens when e.g. KASAN sees a
free of an invalid address.

2. During error reporting, an MTE fault is triggered by the error
reporting code. E.g. while collecting information about the accessed
slab object.

3. KASAN tries to print another report while printing a report and
goes into a deadlock.

If we could avoid MTE faults being triggered during error reporting,
this would solve the problem.

> If we want to avoid a
> fault in the first place, we could do something like
> __uaccess_enable_tco() (Vincenzo has some patches to generalise these
> routines)

Ah, this looks exactly like what we need. Adding
__uaccess_en/disable_tco to kasan_report_invalid_free solves the
problem.

Do you think it would be possible to expose these routines to KASAN?

> but if an MTE fault already triggered and MTE is to stay
> disabled after the reporting anyway, I don't think it's worth it.

No MTE fault is triggered yet in the described sequence of events.

> So I wonder whether it's easier to just disable MTE before calling
> report_tag_fault() so that it won't trigger additional faults:

This will only help in case the first error report is caused by an MTE
fault. However, this won't help with the discussed problem: KASAN can
detect a memory corruption and print a report without getting an MTE
fault.

Nevertheless, this change makes sense to avoid a similar scenario
involving 2 MTE faults.
  
Catalin Marinas March 1, 2023, 4:59 p.m. UTC | #10
On Tue, Feb 28, 2023 at 10:50:46PM +0100, Andrey Konovalov wrote:
> On Tue, Feb 28, 2023 at 5:09 PM Catalin Marinas <catalin.marinas@arm.com> wrote:
> > On Mon, Feb 27, 2023 at 03:13:45AM +0100, Andrey Konovalov wrote:
> > > +Catalin, would it be acceptable to implement a routine that disables
> > > in-kernel MTE tag checking (until the next
> > > mte_enable_kernel_sync/async/asymm call)? In a similar way an MTE
> > > fault does this, but without the fault itself. I.e., expose the part
> > > of do_tag_recovery functionality without report_tag_fault?
> >
> > I don't think we ever re-enable MTE after do_tag_recovery(). The
> > mte_enable_kernel_*() are called at boot. We do call
> > kasan_enable_tagging() explicitly in the kunit tests but that's a
> > controlled fault environment.
> 
> Right, but here we don't want to re-enable MTE after a fault, we want
> to suppress faults when printing an error report.
> 
> > IIUC, the problem is that the kernel already got an MTE fault, so at
> > that point the error is not really recoverable.
> 
> No, the problem is with the following sequence of events:
> 
> 1. KASAN detects a memory corruption and starts printing a report
> _without getting an MTE fault_. This happens when e.g. KASAN sees a
> free of an invalid address.
> 
> 2. During error reporting, an MTE fault is triggered by the error
> reporting code. E.g. while collecting information about the accessed
> slab object.
> 
> 3. KASAN tries to print another report while printing a report and
> goes into a deadlock.
> 
> If we could avoid MTE faults being triggered during error reporting,
> this would solve the problem.

Ah, I get it now. So we just want to avoid triggering a benign MTE
fault.

> > If we want to avoid a
> > fault in the first place, we could do something like
> > __uaccess_enable_tco() (Vincenzo has some patches to generalise these
> > routines)
> 
> Ah, this looks exactly like what we need. Adding
> __uaccess_en/disable_tco to kasan_report_invalid_free solves the
> problem.
> 
> Do you think it would be possible to expose these routines to KASAN?

Yes. I'm including Vincenzo's patch below (part of fixing some potential
strscpy() faults with its unaligned accesses eager reading; we'll get to
posting that eventually). You can add some arch_kasan_enable/disable()
macros on top and feel free to include the patch below.

Now, I wonder whether we should link those into kasan_disable_current().
These functions only deal with the depth for KASAN_SW_TAGS but it would
make sense for KASAN_HW_TAGS to enable tag-check-override so that we
don't need to bother with a match-all tags on pointer dereferencing.

----8<----------------------------
From 0dcfc84d8b984001219cc3c9eaf698c26286624c Mon Sep 17 00:00:00 2001
From: Vincenzo Frascino <vincenzo.frascino@arm.com>
Date: Thu, 13 Oct 2022 07:46:23 +0100
Subject: [PATCH] arm64: mte: Rename TCO routines

The TCO related routines are used in uaccess methods and
load_unaligned_zeropad() but are unrelated to both even if the naming
suggest otherwise.

Improve the readability of the code moving the away from uaccess.h and
pre-pending them with "mte".

Cc: Will Deacon <will@kernel.org>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---
 arch/arm64/include/asm/mte-kasan.h      | 81 +++++++++++++++++++++++++
 arch/arm64/include/asm/mte.h            | 12 ----
 arch/arm64/include/asm/uaccess.h        | 66 +++-----------------
 arch/arm64/include/asm/word-at-a-time.h |  4 +-
 4 files changed, 93 insertions(+), 70 deletions(-)

diff --git a/arch/arm64/include/asm/mte-kasan.h b/arch/arm64/include/asm/mte-kasan.h
index 9f79425fc65a..598be32ed811 100644
--- a/arch/arm64/include/asm/mte-kasan.h
+++ b/arch/arm64/include/asm/mte-kasan.h
@@ -13,8 +13,73 @@
 
 #include <linux/types.h>
 
+#ifdef CONFIG_KASAN_HW_TAGS
+
+/* Whether the MTE asynchronous mode is enabled. */
+DECLARE_STATIC_KEY_FALSE(mte_async_or_asymm_mode);
+
+static inline bool system_uses_mte_async_or_asymm_mode(void)
+{
+	return static_branch_unlikely(&mte_async_or_asymm_mode);
+}
+
+#else /* CONFIG_KASAN_HW_TAGS */
+
+static inline bool system_uses_mte_async_or_asymm_mode(void)
+{
+	return false;
+}
+
+#endif /* CONFIG_KASAN_HW_TAGS */
+
 #ifdef CONFIG_ARM64_MTE
 
+/*
+ * The Tag Check Flag (TCF) mode for MTE is per EL, hence TCF0
+ * affects EL0 and TCF affects EL1 irrespective of which TTBR is
+ * used.
+ * The kernel accesses TTBR0 usually with LDTR/STTR instructions
+ * when UAO is available, so these would act as EL0 accesses using
+ * TCF0.
+ * However futex.h code uses exclusives which would be executed as
+ * EL1, this can potentially cause a tag check fault even if the
+ * user disables TCF0.
+ *
+ * To address the problem we set the PSTATE.TCO bit in uaccess_enable()
+ * and reset it in uaccess_disable().
+ *
+ * The Tag check override (TCO) bit disables temporarily the tag checking
+ * preventing the issue.
+ */
+static inline void __mte_disable_tco(void)
+{
+	asm volatile(ALTERNATIVE("nop", SET_PSTATE_TCO(0),
+				 ARM64_MTE, CONFIG_KASAN_HW_TAGS));
+}
+
+static inline void __mte_enable_tco(void)
+{
+	asm volatile(ALTERNATIVE("nop", SET_PSTATE_TCO(1),
+				 ARM64_MTE, CONFIG_KASAN_HW_TAGS));
+}
+
+/*
+ * These functions disable tag checking only if in MTE async mode
+ * since the sync mode generates exceptions synchronously and the
+ * nofault or load_unaligned_zeropad can handle them.
+ */
+static inline void __mte_disable_tco_async(void)
+{
+	if (system_uses_mte_async_or_asymm_mode())
+		 __mte_disable_tco();
+}
+
+static inline void __mte_enable_tco_async(void)
+{
+	if (system_uses_mte_async_or_asymm_mode())
+		__mte_enable_tco();
+}
+
 /*
  * These functions are meant to be only used from KASAN runtime through
  * the arch_*() interface defined in asm/memory.h.
@@ -138,6 +203,22 @@ void mte_enable_kernel_asymm(void);
 
 #else /* CONFIG_ARM64_MTE */
 
+static inline void __mte_disable_tco(void)
+{
+}
+
+static inline void __mte_enable_tco(void)
+{
+}
+
+static inline void __mte_disable_tco_async(void)
+{
+}
+
+static inline void __mte_enable_tco_async(void)
+{
+}
+
 static inline u8 mte_get_ptr_tag(void *ptr)
 {
 	return 0xFF;
diff --git a/arch/arm64/include/asm/mte.h b/arch/arm64/include/asm/mte.h
index 20dd06d70af5..c028afb1cd0b 100644
--- a/arch/arm64/include/asm/mte.h
+++ b/arch/arm64/include/asm/mte.h
@@ -178,14 +178,6 @@ static inline void mte_disable_tco_entry(struct task_struct *task)
 }
 
 #ifdef CONFIG_KASAN_HW_TAGS
-/* Whether the MTE asynchronous mode is enabled. */
-DECLARE_STATIC_KEY_FALSE(mte_async_or_asymm_mode);
-
-static inline bool system_uses_mte_async_or_asymm_mode(void)
-{
-	return static_branch_unlikely(&mte_async_or_asymm_mode);
-}
-
 void mte_check_tfsr_el1(void);
 
 static inline void mte_check_tfsr_entry(void)
@@ -212,10 +204,6 @@ static inline void mte_check_tfsr_exit(void)
 	mte_check_tfsr_el1();
 }
 #else
-static inline bool system_uses_mte_async_or_asymm_mode(void)
-{
-	return false;
-}
 static inline void mte_check_tfsr_el1(void)
 {
 }
diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
index 5c7b2f9d5913..057ec1882326 100644
--- a/arch/arm64/include/asm/uaccess.h
+++ b/arch/arm64/include/asm/uaccess.h
@@ -136,55 +136,9 @@ static inline void __uaccess_enable_hw_pan(void)
 			CONFIG_ARM64_PAN));
 }
 
-/*
- * The Tag Check Flag (TCF) mode for MTE is per EL, hence TCF0
- * affects EL0 and TCF affects EL1 irrespective of which TTBR is
- * used.
- * The kernel accesses TTBR0 usually with LDTR/STTR instructions
- * when UAO is available, so these would act as EL0 accesses using
- * TCF0.
- * However futex.h code uses exclusives which would be executed as
- * EL1, this can potentially cause a tag check fault even if the
- * user disables TCF0.
- *
- * To address the problem we set the PSTATE.TCO bit in uaccess_enable()
- * and reset it in uaccess_disable().
- *
- * The Tag check override (TCO) bit disables temporarily the tag checking
- * preventing the issue.
- */
-static inline void __uaccess_disable_tco(void)
-{
-	asm volatile(ALTERNATIVE("nop", SET_PSTATE_TCO(0),
-				 ARM64_MTE, CONFIG_KASAN_HW_TAGS));
-}
-
-static inline void __uaccess_enable_tco(void)
-{
-	asm volatile(ALTERNATIVE("nop", SET_PSTATE_TCO(1),
-				 ARM64_MTE, CONFIG_KASAN_HW_TAGS));
-}
-
-/*
- * These functions disable tag checking only if in MTE async mode
- * since the sync mode generates exceptions synchronously and the
- * nofault or load_unaligned_zeropad can handle them.
- */
-static inline void __uaccess_disable_tco_async(void)
-{
-	if (system_uses_mte_async_or_asymm_mode())
-		 __uaccess_disable_tco();
-}
-
-static inline void __uaccess_enable_tco_async(void)
-{
-	if (system_uses_mte_async_or_asymm_mode())
-		__uaccess_enable_tco();
-}
-
 static inline void uaccess_disable_privileged(void)
 {
-	__uaccess_disable_tco();
+	__mte_disable_tco();
 
 	if (uaccess_ttbr0_disable())
 		return;
@@ -194,7 +148,7 @@ static inline void uaccess_disable_privileged(void)
 
 static inline void uaccess_enable_privileged(void)
 {
-	__uaccess_enable_tco();
+	__mte_enable_tco();
 
 	if (uaccess_ttbr0_enable())
 		return;
@@ -302,8 +256,8 @@ do {									\
 #define get_user	__get_user
 
 /*
- * We must not call into the scheduler between __uaccess_enable_tco_async() and
- * __uaccess_disable_tco_async(). As `dst` and `src` may contain blocking
+ * We must not call into the scheduler between __mte_enable_tco_async() and
+ * __mte_disable_tco_async(). As `dst` and `src` may contain blocking
  * functions, we must evaluate these outside of the critical section.
  */
 #define __get_kernel_nofault(dst, src, type, err_label)			\
@@ -312,10 +266,10 @@ do {									\
 	__typeof__(src) __gkn_src = (src);				\
 	int __gkn_err = 0;						\
 									\
-	__uaccess_enable_tco_async();					\
+	__mte_enable_tco_async();					\
 	__raw_get_mem("ldr", *((type *)(__gkn_dst)),			\
 		      (__force type *)(__gkn_src), __gkn_err, K);	\
-	__uaccess_disable_tco_async();					\
+	__mte_disable_tco_async();					\
 									\
 	if (unlikely(__gkn_err))					\
 		goto err_label;						\
@@ -388,8 +342,8 @@ do {									\
 #define put_user	__put_user
 
 /*
- * We must not call into the scheduler between __uaccess_enable_tco_async() and
- * __uaccess_disable_tco_async(). As `dst` and `src` may contain blocking
+ * We must not call into the scheduler between __mte_enable_tco_async() and
+ * __mte_disable_tco_async(). As `dst` and `src` may contain blocking
  * functions, we must evaluate these outside of the critical section.
  */
 #define __put_kernel_nofault(dst, src, type, err_label)			\
@@ -398,10 +352,10 @@ do {									\
 	__typeof__(src) __pkn_src = (src);				\
 	int __pkn_err = 0;						\
 									\
-	__uaccess_enable_tco_async();					\
+	__mte_enable_tco_async();					\
 	__raw_put_mem("str", *((type *)(__pkn_src)),			\
 		      (__force type *)(__pkn_dst), __pkn_err, K);	\
-	__uaccess_disable_tco_async();					\
+	__mte_disable_tco_async();					\
 									\
 	if (unlikely(__pkn_err))					\
 		goto err_label;						\
diff --git a/arch/arm64/include/asm/word-at-a-time.h b/arch/arm64/include/asm/word-at-a-time.h
index 1c8e4f2490bf..f3b151ed0d7a 100644
--- a/arch/arm64/include/asm/word-at-a-time.h
+++ b/arch/arm64/include/asm/word-at-a-time.h
@@ -55,7 +55,7 @@ static inline unsigned long load_unaligned_zeropad(const void *addr)
 {
 	unsigned long ret;
 
-	__uaccess_enable_tco_async();
+	__mte_enable_tco_async();
 
 	/* Load word from unaligned pointer addr */
 	asm(
@@ -65,7 +65,7 @@ static inline unsigned long load_unaligned_zeropad(const void *addr)
 	: "=&r" (ret)
 	: "r" (addr), "Q" (*(unsigned long *)addr));
 
-	__uaccess_disable_tco_async();
+	__mte_disable_tco_async();
 
 	return ret;
 }
  
袁帅(Shuai Yuan) March 3, 2023, 10:41 a.m. UTC | #11
> On Tue, Feb 28, 2023 at 10:50:46PM +0100, Andrey Konovalov wrote:
> > On Tue, Feb 28, 2023 at 5:09 PM Catalin Marinas <catalin.marinas@arm.com>
> >
> > Right, but here we don't want to re-enable MTE after a fault, we want
> > to suppress faults when printing an error report.
> >
> > > IIUC, the problem is that the kernel already got an MTE fault, so at
> > > that point the error is not really recoverable.
> >
> > No, the problem is with the following sequence of events:
> >
> > 1. KASAN detects a memory corruption and starts printing a report
> > _without getting an MTE fault_. This happens when e.g. KASAN sees a
> > free of an invalid address.
> >
> > 2. During error reporting, an MTE fault is triggered by the error
> > reporting code. E.g. while collecting information about the accessed
> > slab object.
> >
> > 3. KASAN tries to print another report while printing a report and
> > goes into a deadlock.
> >
> > If we could avoid MTE faults being triggered during error reporting,
> > this would solve the problem.
>
> Ah, I get it now. So we just want to avoid triggering a benign MTE fault.
>
> > > If we want to avoid a
> > > fault in the first place, we could do something like
> > > __uaccess_enable_tco() (Vincenzo has some patches to generalise
> > > these
> > > routines)
> >
> > Ah, this looks exactly like what we need. Adding
> > __uaccess_en/disable_tco to kasan_report_invalid_free solves the
> > problem.
> >
> > Do you think it would be possible to expose these routines to KASAN?
>
> Yes. I'm including Vincenzo's patch below (part of fixing some potential
> strscpy() faults with its unaligned accesses eager reading; we'll get to posting
> that eventually). You can add some arch_kasan_enable/disable() macros on
> top and feel free to include the patch below.

I have initially verified the following code on kernel version 5.15 and it is valid.
Although not using the latest interface, there is no fundamental difference.
I think this change should also apply to the latest kernel code.

diff --git a/mm/kasan/report.c b/mm/kasan/report.c
index 1f96a72c7edd..73b7fc532d81 100644
--- a/mm/kasan/report.c
+++ b/mm/kasan/report.c
@@ -28,7 +28,9 @@
 #include <trace/events/error_report.h>

 #include <asm/sections.h>
-
+#ifdef CONFIG_KASAN_HW_TAGS
+#include <asm/uaccess.h>
+#endif
 #include <kunit/test.h>

 #include "kasan.h"
@@ -107,6 +109,10 @@ static void start_report(unsigned long *flags)
  */
 kasan_disable_current();
 spin_lock_irqsave(&report_lock, *flags);
+#ifdef CONFIG_KASAN_HW_TAGS
+if (kasan_hw_tags_enabled())
+__uaccess_enable_tco();
+#endif
 pr_err("==================================================================\n");
 }

@@ -116,6 +122,10 @@ static void end_report(unsigned long *flags, unsigned long addr)
 trace_error_report_end(ERROR_DETECTOR_KASAN, addr);
pr_err("==================================================================\n");
 add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE);
+#ifdef CONFIG_KASAN_HW_TAGS
+if (kasan_hw_tags_enabled())
+__uaccess_disable_tco();
+#endif
spin_unlock_irqrestore(&report_lock, *flags);
 if (panic_on_warn && !test_bit(KASAN_BIT_MULTI_SHOT, &kasan_flags)) {
 /*

> Now, I wonder whether we should link those into kasan_disable_current().
> These functions only deal with the depth for KASAN_SW_TAGS but it would
> make sense for KASAN_HW_TAGS to enable tag-check-override so that we
> don't need to bother with a match-all tags on pointer dereferencing.

ZEKU
信息安全声明:本邮件包含信息归发件人所在组织ZEKU所有。 禁止任何人在未经授权的情况下以任何形式(包括但不限于全部或部分披露、复制或传播)使用包含的信息。若您错收了本邮件,请立即电话或邮件通知发件人,并删除本邮件及附件。
Information Security Notice: The information contained in this mail is solely property of the sender's organization ZEKU. Any use of the information contained herein in any way (including, but not limited to, total or partial disclosure, reproduction, or dissemination) by persons other than the intended recipient(s) is prohibited. If you receive this email in error, please notify the sender by phone or email immediately and delete it.
  
Andrey Konovalov March 10, 2023, 11:42 p.m. UTC | #12
On Wed, Mar 1, 2023 at 6:00 PM Catalin Marinas <catalin.marinas@arm.com> wrote:
>
> Yes. I'm including Vincenzo's patch below (part of fixing some potential
> strscpy() faults with its unaligned accesses eager reading; we'll get to
> posting that eventually). You can add some arch_kasan_enable/disable()
> macros on top and feel free to include the patch below.

Ah, perfect! I'll send a patchset soon. Thanks!

> Now, I wonder whether we should link those into kasan_disable_current().
> These functions only deal with the depth for KASAN_SW_TAGS but it would
> make sense for KASAN_HW_TAGS to enable tag-check-override so that we
> don't need to bother with a match-all tags on pointer dereferencing.

Using these TCO routines requires having (at least) migration disabled, right?

It's not a problem for KASAN reporting code, as it already disables
preemption anyway.

The question is with the other kasan_disable/enable_current() call
sites. But as within all of them, the code does either a single access
or a memcpy or something similar, I think we can disable preemption
for that duration.

On a related note, I recalled that we also have a bug about using
supporting no_sanitize_address for HW_TAGS KASAN. And Peter suggested
using TCO entry/exit instrumentation to resolve it [2]. However, we
will also need to disable preemption for the duration of
no_sanitize_address-annotated functions, and I'm not sure if it's a
good idea to do that via compiler instrumentation.

Any thoughts?

In the mean time, I'll send a simpler patchset without converting all
kasan_disable/enable_current().

[1] https://bugzilla.kernel.org/show_bug.cgi?id=212513
[2] https://bugzilla.kernel.org/show_bug.cgi?id=212513#c2
  
Catalin Marinas March 15, 2023, 4:14 p.m. UTC | #13
On Sat, Mar 11, 2023 at 12:42:20AM +0100, Andrey Konovalov wrote:
> On Wed, Mar 1, 2023 at 6:00 PM Catalin Marinas <catalin.marinas@arm.com> wrote:
> > Yes. I'm including Vincenzo's patch below (part of fixing some potential
> > strscpy() faults with its unaligned accesses eager reading; we'll get to
> > posting that eventually). You can add some arch_kasan_enable/disable()
> > macros on top and feel free to include the patch below.
> 
> Ah, perfect! I'll send a patchset soon. Thanks!
> 
> > Now, I wonder whether we should link those into kasan_disable_current().
> > These functions only deal with the depth for KASAN_SW_TAGS but it would
> > make sense for KASAN_HW_TAGS to enable tag-check-override so that we
> > don't need to bother with a match-all tags on pointer dereferencing.
> 
> Using these TCO routines requires having (at least) migration disabled, right?

Not necessarily. The TCO is set per CPU and disabling preemption (I
don't think migration is sufficient) would work but these routines are
also called on a uaccess fault path, so it needs to be preemptible. We
used to clear TCO on exception entry prior to commit 38ddf7dafaea
("arm64: mte: avoid clearing PSTATE.TCO on entry unless necessary") but
we restore it anyway on exception return.

I think the only problem is if between these routines, we invoke
cond_resched() directly. Not sure what the kasan code does but disabling
preemption should avoid a reschedule. Another option is for
mte_thread_switch() to context switch the TCO state.
  

Patch

diff --git a/mm/kasan/report.c b/mm/kasan/report.c
index 22598b20c7b7..aa39aa8b1855 100644
--- a/mm/kasan/report.c
+++ b/mm/kasan/report.c
@@ -166,7 +166,7 @@  static inline void fail_non_kasan_kunit_test(void) { }
 
 static DEFINE_SPINLOCK(report_lock);
 
-static void start_report(unsigned long *flags, bool sync)
+static bool start_report(unsigned long *flags, bool sync)
 {
 	fail_non_kasan_kunit_test();
 	/* Respect the /proc/sys/kernel/traceoff_on_warning interface. */
@@ -175,8 +175,13 @@  static void start_report(unsigned long *flags, bool sync)
 	lockdep_off();
 	/* Make sure we don't end up in loop. */
 	kasan_disable_current();
-	spin_lock_irqsave(&report_lock, *flags);
+	if (!spin_trylock_irqsave(&report_lock, *flags)) {
+		lockdep_on();
+		kasan_enable_current();
+		return false;
+	}
 	pr_err("==================================================================\n");
+	return true;
 }
 
 static void end_report(unsigned long *flags, void *addr)
@@ -468,7 +473,10 @@  void kasan_report_invalid_free(void *ptr, unsigned long ip, enum kasan_report_ty
 	if (unlikely(!report_enabled()))
 		return;
 
-	start_report(&flags, true);
+	if (!start_report(&flags, true)) {
+		pr_err("%s: report ignore\n", __func__);
+		return;
+	}
 
 	memset(&info, 0, sizeof(info));
 	info.type = type;
@@ -503,7 +511,11 @@  bool kasan_report(unsigned long addr, size_t size, bool is_write,
 		goto out;
 	}
 
-	start_report(&irq_flags, true);
+	if (!start_report(&irq_flags, true)) {
+		ret = false;
+		pr_err("%s: report ignore\n", __func__);
+		goto out;
+	}
 
 	memset(&info, 0, sizeof(info));
 	info.type = KASAN_REPORT_ACCESS;
@@ -536,7 +548,10 @@  void kasan_report_async(void)
 	if (unlikely(!report_enabled()))
 		return;
 
-	start_report(&flags, false);
+	if (!start_report(&flags, false)) {
+		pr_err("%s: report ignore\n", __func__);
+		return;
+	}
 	pr_err("BUG: KASAN: invalid-access\n");
 	pr_err("Asynchronous fault: no details available\n");
 	pr_err("\n");