[RFC,v2,00/31] timers: Use del_timer_shutdown() before freeing timers

Message ID	20221027150525.753064657@goodmis.org
Headers	Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Message-ID: <20221027150525.753064657@goodmis.org> User-Agent: quilt/0.66 Date: Thu, 27 Oct 2022 11:05:25 -0400 From: Steven Rostedt <rostedt@goodmis.org> To: linux-kernel@vger.kernel.org Cc: Linus Torvalds <torvalds@linux-foundation.org>, Thomas Gleixner <tglx@linutronix.de>, Stephen Boyd <sboyd@kernel.org>, Guenter Roeck <linux@roeck-us.net> Subject: [RFC][PATCH v2 00/31] timers: Use del_timer_shutdown() before freeing timers Precedence: bulk
Series	timers: Use del_timer_shutdown() before freeing timers \| [RFC,v2,00/31] timers: Use del_timer_shutdown() before freeing timers [RFC,v2,01/31] timers: Add del_timer_shutdown() to be called before freeing timers [RFC,v2,02/31] timers: s390/cmm: Use del_timer_shutdown() before freeing timer [RFC,v2,03/31] timers: sh: Use del_timer_shutdown() before freeing timer [RFC,v2,05/31] timers: ACPI: Use del_timer_shutdown() before freeing timer [RFC,v2,06/31] timers: atm: Use del_timer_shutdown() before freeing timer [RFC,v2,07/31] timers: PM: Use del_timer_shutdown() [RFC,v2,08/31] timers: Bluetooth: Use del_timer_shutdown() before freeing timer [RFC,v2,09/31] timers: hangcheck: Use del_timer_shutdown() before freeing timer [RFC,v2,10/31] timers: ipmi: Use del_timer_shutdown() before freeing timer [RFC,v2,11/31] timers: random: Use del_timer_shutdown() before freeing timer [RFC,v2,14/31] timers: HID: Use del_timer_shutdown() before freeing timer [RFC,v2,16/31] timers: mISDN: Use del_timer_shutdown() before freeing timer [RFC,v2,17/31] timers: leds: Use del_timer_shutdown() before freeing timer [RFC,v2,19/31] timers: net: Use del_timer_shutdown() before freeing timer [RFC,v2,20/31] timers: usb: Use del_timer_shutdown() before freeing timer [RFC,v2,21/31] timers: cgroup: Use del_timer_shutdown() before freeing timer [RFC,v2,22/31] timers: workqueue: Use del_timer_shutdown() before freeing timer [RFC,v2,23/31] timers: nfc: pn533: Use del_timer_shutdown() before freeing timer [RFC,v2,24/31] timers: pcmcia: Use del_timer_shutdown() before freeing timer [RFC,v2,25/31] timers: scsi: Use del_timer_shutdown() before freeing timer [RFC,v2,26/31] timers: tty: Use del_timer_shutdown() before freeing timer [RFC,v2,27/31] timers: ext4: Use del_timer_shutdown() before freeing timer [RFC,v2,28/31] timers: fs/nilfs2: Use del_timer_shutdown() before freeing timer [RFC,v2,29/31] timers: ALSA: Use del_timer_shutdown() before freeing timer [RFC,v2,30/31] timers: x86/mce: Use __init_timer() for resetting timers [RFC,v2,31/31] timers: Expand DEBUG_OBJECTS_TIMER to check if it ever was used

Message ID

20221027150525.753064657@goodmis.org

Headers

Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 client-ip=2620:137:e000::1:20;
Message-ID: <20221027150525.753064657@goodmis.org>
User-Agent: quilt/0.66
Date: Thu, 27 Oct 2022 11:05:25 -0400
From: Steven Rostedt <rostedt@goodmis.org>
To: linux-kernel@vger.kernel.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
        Thomas Gleixner <tglx@linutronix.de>,
        Stephen Boyd <sboyd@kernel.org>,
        Guenter Roeck <linux@roeck-us.net>
Subject: [RFC][PATCH v2 00/31] timers: Use del_timer_shutdown() before freeing
 timers
Precedence: bulk

Series

timers: Use del_timer_shutdown() before freeing timers |

Message

Steven Rostedt Oct. 27, 2022, 3:05 p.m. UTC

  Back in April, I posted an RFC patch set to help mitigate a common issue
where a timer gets armed just before it is freed, and when the timer
goes off, it crashes in the timer code without any evidence of who the
culprit was. I got side tracked and never finished up on that patch set.
Since this type of crash is still our #1 crash we are seeing in the field,
it has become a priority again to finish it.

This is v2 of that patch set. Thomas Gleixner posted an untested version
that makes timer->function NULL as the flag that it is shutdown. I took that
code, tested it (fixed it up), added more comments, and changed the
name to del_timer_shutdown() as Linus had asked. I also converted it to use
WARN_ON_ONCE() instead of just WARN_ON() as Linus asked for that too.

(Thomas, you never added a SoB, so I only added a link to your email
 in that commit. But as this will likely go through your tree anyway,
 I'm sure you'll have your SoB on all these).

I then created a trivial coccinelle script to find where del_timer*()
is called before being freed, and converted them all to del_timer_shutdown()
(There was a couple that still used del_timer() instead of del_timer_sync()).

I also updated DEBUG_OBJECTS_TIMERS to check from where the timer is ever
armed, to calling of del_timer_shutdown(), and it will trigger if a timer
is freed in between. The current way is to only check if the timer is armed,
but that means it only triggers if the race condition is hit, and with
experience, it's not run on enough machines to catch all of them. By triggering
it from the time the timer is armed to the time it is shutdown, it catches
all potential cases even if the race condition is not hit.

I went though the result of the cocinelle script, and updated the locations.
Some locations were caught by DEBUG_OBJECTS_TIMERS as the coccinelle script
only checked for timers being freed in the same function as the del_timer*().

V1 is found here: https://lore.kernel.org/all/20220407161745.7d6754b3@gandalf.local.home/

Here's the original text of that version:

   [
     This is an RFC patch. As we hit a few bugs were del_timer() is called
     instead of del_timer_sync() before the timer is freed, and there could
     be bugs where even del_timer_sync() is used, but the timer gets rearmed,
     I decided to introduce a "del_timer_free()" function that can be used
     instead. This will at least educate developers on what to call before they
     free a structure that holds a timer.

     In this RFC, I modified hci_qca.c as a use case, even though that change
     needs some work, because the workqueue could still rearm it (I'm looking
     to see if I can trigger the warning).

     If this approach is acceptable, then I will remove the hci_qca.c portion
     from this patch, and create a series of patches to use the
     del_timer_free() in all the locations in the kernel that remove the timer
     before freeing.
   ]

   We are hitting a common bug were a timer is being triggered after it is
   freed. This causes a corruption in the timer link list and crashes the
   kernel. Unfortunately it is not easy to know what timer it was that was
   freed. Looking at the code, it appears that there are several cases that
   del_timer() is used when del_timer_sync() should have been.

   Add a del_timer_free() that not only does a del_timer_sync() but will mark
   the timer as freed in case it gets rearmed, it will trigger a WARN_ON. The
   del_timer_free() is more likely to be used by developers that are about to
   free a timer, then using del_timer_sync() as the latter is not as obvious
   to being needed for freeing. Having the word "free" in the name of the
   function will hopefully help developers know that that function needs to
   be called before freeing.

   The added bonus is the marking of the timer as being freed such that it
   will trigger a warning if it gets rearmed. At least that way if the system
   crashes on a freed timer, at least we may see which timer it was that was
   freed.

You can pull this series down from here:

  git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git
timers


Head SHA1: 581931395e77326bb76d7648be080ce302244dd5


Steven Rostedt (Google) (31):
      timers: Add del_timer_shutdown() to be called before freeing timers
      timers: s390/cmm: Use del_timer_shutdown() before freeing timer
      timers: sh: Use del_timer_shutdown() before freeing timer
      timers: block: Use del_timer_shutdown() before freeing timer
      timers: ACPI: Use del_timer_shutdown() before freeing timer
      timers: atm: Use del_timer_shutdown() before freeing timer
      timers: PM: Use del_timer_shutdown()
      timers: Bluetooth: Use del_timer_shutdown() before freeing timer
      timers: hangcheck: Use del_timer_shutdown() before freeing timer
      timers: ipmi: Use del_timer_shutdown() before freeing timer
      timers: random: Use del_timer_shutdown() before freeing timer
      timers: dma-buf: Use del_timer_shutdown() before freeing timer
      timers: drm: Use del_timer_shutdown() before freeing timer
      timers: HID: Use del_timer_shutdown() before freeing timer
      timers: Input: Use del_timer_shutdown() before freeing timer
      timers: mISDN: Use del_timer_shutdown() before freeing timer
      timers: leds: Use del_timer_shutdown() before freeing timer
      timers: media: Use del_timer_shutdown() before freeing timer
      timers: net: Use del_timer_shutdown() before freeing timer
      timers: usb: Use del_timer_shutdown() before freeing timer
      timers: cgroup: Use del_timer_shutdown() before freeing timer
      timers: workqueue: Use del_timer_shutdown() before freeing timer
      timers: nfc: pn533: Use del_timer_shutdown() before freeing timer
      timers: pcmcia: Use del_timer_shutdown() before freeing timer
      timers: scsi: Use del_timer_shutdown() before freeing timer
      timers: tty: Use del_timer_shutdown() before freeing timer
      timers: ext4: Use del_timer_shutdown() before freeing timer
      timers: fs/nilfs2: Use del_timer_shutdown() before freeing timer
      timers: ALSA: Use del_timer_shutdown() before freeing timer
      timers: x86/mce: Use __init_timer() for resetting timers
      timers: Expand DEBUG_OBJECTS_TIMER to check if it ever was used

----
 .../RCU/Design/Requirements/Requirements.rst       |  2 +-
 Documentation/core-api/local_ops.rst               |  2 +-
 Documentation/kernel-hacking/locking.rst           |  4 ++
 arch/s390/mm/cmm.c                                 |  4 +-
 arch/sh/drivers/push-switch.c                      |  2 +-
 arch/x86/kernel/cpu/mce/core.c                     | 14 +++++-
 block/blk-iocost.c                                 |  2 +-
 block/blk-iolatency.c                              |  2 +-
 block/blk-stat.c                                   |  2 +-
 block/blk-throttle.c                               |  2 +-
 block/kyber-iosched.c                              |  2 +-
 drivers/acpi/apei/ghes.c                           |  2 +-
 drivers/atm/idt77105.c                             |  4 +-
 drivers/atm/idt77252.c                             |  4 +-
 drivers/atm/iphase.c                               |  2 +-
 drivers/base/power/wakeup.c                        |  7 +--
 drivers/block/drbd/drbd_main.c                     |  2 +-
 drivers/block/loop.c                               |  2 +-
 drivers/block/sunvdc.c                             |  2 +-
 drivers/bluetooth/hci_bcsp.c                       |  2 +-
 drivers/bluetooth/hci_h5.c                         |  2 +-
 drivers/bluetooth/hci_qca.c                        |  4 +-
 drivers/char/hangcheck-timer.c                     |  2 +-
 drivers/char/ipmi/ipmi_msghandler.c                |  2 +-
 drivers/char/ipmi/ipmi_ssif.c                      |  4 +-
 drivers/char/random.c                              |  2 +-
 drivers/dma-buf/st-dma-fence.c                     |  2 +-
 drivers/gpu/drm/gud/gud_pipe.c                     |  2 +-
 drivers/gpu/drm/i915/i915_sw_fence.c               |  2 +-
 drivers/hid/hid-wiimote-core.c                     |  2 +-
 drivers/input/keyboard/locomokbd.c                 |  2 +-
 drivers/input/keyboard/omap-keypad.c               |  2 +-
 drivers/input/mouse/alps.c                         |  2 +-
 drivers/input/serio/hil_mlc.c                      |  2 +-
 drivers/input/serio/hp_sdc.c                       |  2 +-
 drivers/isdn/hardware/mISDN/hfcmulti.c             |  2 +-
 drivers/isdn/mISDN/l1oip_core.c                    |  4 +-
 drivers/isdn/mISDN/timerdev.c                      |  4 +-
 drivers/leds/trigger/ledtrig-activity.c            |  2 +-
 drivers/leds/trigger/ledtrig-heartbeat.c           |  2 +-
 drivers/leds/trigger/ledtrig-pattern.c             |  2 +-
 drivers/leds/trigger/ledtrig-transient.c           |  2 +-
 drivers/media/pci/ivtv/ivtv-driver.c               |  2 +-
 drivers/media/usb/pvrusb2/pvrusb2-hdw.c            | 18 +++----
 drivers/media/usb/s2255/s2255drv.c                 |  4 +-
 drivers/net/ethernet/intel/i40e/i40e_main.c        |  6 +--
 drivers/net/ethernet/marvell/sky2.c                |  2 +-
 drivers/net/ethernet/sun/sunvnet.c                 |  2 +-
 drivers/net/usb/sierra_net.c                       |  2 +-
 drivers/net/wireless/intel/iwlwifi/iwl-dbg-tlv.c   |  2 +-
 drivers/net/wireless/intersil/hostap/hostap_ap.c   |  2 +-
 drivers/net/wireless/marvell/mwifiex/main.c        |  2 +-
 drivers/net/wireless/microchip/wilc1000/hif.c      |  8 +--
 drivers/nfc/pn533/pn533.c                          |  2 +-
 drivers/nfc/pn533/uart.c                           |  2 +-
 drivers/pcmcia/bcm63xx_pcmcia.c                    |  2 +-
 drivers/pcmcia/electra_cf.c                        |  2 +-
 drivers/pcmcia/omap_cf.c                           |  2 +-
 drivers/pcmcia/pd6729.c                            |  4 +-
 drivers/pcmcia/yenta_socket.c                      |  4 +-
 drivers/scsi/qla2xxx/qla_edif.c                    |  4 +-
 drivers/staging/media/atomisp/i2c/atomisp-lm3554.c |  2 +-
 drivers/tty/n_gsm.c                                |  2 +-
 drivers/tty/sysrq.c                                |  2 +-
 drivers/usb/core/hub.c                             |  3 ++
 drivers/usb/gadget/udc/m66592-udc.c                |  2 +-
 drivers/usb/serial/garmin_gps.c                    |  2 +-
 drivers/usb/serial/mos7840.c                       |  2 +-
 fs/ext4/super.c                                    |  2 +-
 fs/nilfs2/segment.c                                |  2 +-
 include/linux/timer.h                              | 47 +++++++++++++++---
 kernel/cgroup/cgroup.c                             |  2 +-
 kernel/time/timer.c                                | 57 ++++++++++++++--------
 kernel/workqueue.c                                 |  4 +-
 net/802/garp.c                                     |  2 +-
 net/802/mrp.c                                      |  2 +-
 net/bridge/br_multicast.c                          |  6 +--
 net/bridge/br_multicast_eht.c                      |  4 +-
 net/core/gen_estimator.c                           |  2 +-
 net/core/sock.c                                    |  2 +-
 net/ipv4/inet_timewait_sock.c                      |  2 +-
 net/ipv4/ipmr.c                                    |  2 +-
 net/ipv6/ip6mr.c                                   |  2 +-
 net/mac80211/mesh_pathtbl.c                        |  2 +-
 net/netfilter/ipset/ip_set_list_set.c              |  2 +-
 net/netfilter/ipvs/ip_vs_lblc.c                    |  2 +-
 net/netfilter/ipvs/ip_vs_lblcr.c                   |  2 +-
 net/netfilter/xt_LED.c                             |  2 +-
 net/rxrpc/conn_object.c                            |  2 +-
 net/sched/cls_flow.c                               |  2 +-
 net/sunrpc/svc.c                                   |  2 +-
 net/tipc/discover.c                                |  2 +-
 net/tipc/monitor.c                                 |  2 +-
 sound/i2c/other/ak4117.c                           |  2 +-
 sound/synth/emux/emux.c                            |  2 +-
 95 files changed, 213 insertions(+), 153 deletions(-)

Comments

Steven Rostedt Oct. 27, 2022, 3:19 p.m. UTC | #1

[
  quilt mail --send still can't handle unicode characters.
    Here's the patch again
]

From: "Steven Rostedt (Google)" <rostedt@goodmis.org>

Before a timer is freed, del_timer_shutdown() must be called.

Link: https://lore.kernel.org/all/20220407161745.7d6754b3@gandalf.local.home/

Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: "Christoph Böhmwalder" <christoph.boehmwalder@linbit.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: drbd-dev@lists.linbit.com
Cc: Tejun Heo <tj@kernel.org>
Cc: cgroups@vger.kernel.org
Cc: linux-block@vger.kernel.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 block/blk-iocost.c             | 2 +-
 block/blk-iolatency.c          | 2 +-
 block/blk-stat.c               | 2 +-
 block/blk-throttle.c           | 2 +-
 block/kyber-iosched.c          | 2 +-
 drivers/block/drbd/drbd_main.c | 2 +-
 drivers/block/loop.c           | 2 +-
 drivers/block/sunvdc.c         | 2 +-
 8 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index 495396425bad..e2d4bdd3d135 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -2814,7 +2814,7 @@ static void ioc_rqos_exit(struct rq_qos *rqos)
 	ioc->running = IOC_STOP;
 	spin_unlock_irq(&ioc->lock);
 
-	del_timer_sync(&ioc->timer);
+	del_timer_shutdown(&ioc->timer);
 	free_percpu(ioc->pcpu_stat);
 	kfree(ioc);
 }
diff --git a/block/blk-iolatency.c b/block/blk-iolatency.c
index 571fa95aafe9..7b61f09afedd 100644
--- a/block/blk-iolatency.c
+++ b/block/blk-iolatency.c
@@ -645,7 +645,7 @@ static void blkcg_iolatency_exit(struct rq_qos *rqos)
 {
 	struct blk_iolatency *blkiolat = BLKIOLATENCY(rqos);
 
-	del_timer_sync(&blkiolat->timer);
+	del_timer_shutdown(&blkiolat->timer);
 	flush_work(&blkiolat->enable_work);
 	blkcg_deactivate_policy(rqos->q, &blkcg_policy_iolatency);
 	kfree(blkiolat);
diff --git a/block/blk-stat.c b/block/blk-stat.c
index 2ea01b5c1aca..de51db302c44 100644
--- a/block/blk-stat.c
+++ b/block/blk-stat.c
@@ -165,7 +165,7 @@ void blk_stat_remove_callback(struct request_queue *q,
 		blk_queue_flag_clear(QUEUE_FLAG_STATS, q);
 	spin_unlock_irqrestore(&q->stats->lock, flags);
 
-	del_timer_sync(&cb->timer);
+	del_timer_shutdown(&cb->timer);
 }
 
 static void blk_stat_free_callback_rcu(struct rcu_head *head)
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 847721dc2b2b..95af99f24137 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -490,7 +490,7 @@ static void throtl_pd_free(struct blkg_policy_data *pd)
 {
 	struct throtl_grp *tg = pd_to_tg(pd);
 
-	del_timer_sync(&tg->service_queue.pending_timer);
+	del_timer_shutdown(&tg->service_queue.pending_timer);
 	blkg_rwstat_exit(&tg->stat_bytes);
 	blkg_rwstat_exit(&tg->stat_ios);
 	kfree(tg);
diff --git a/block/kyber-iosched.c b/block/kyber-iosched.c
index b05357bced99..59a444a47ba3 100644
--- a/block/kyber-iosched.c
+++ b/block/kyber-iosched.c
@@ -434,7 +434,7 @@ static void kyber_exit_sched(struct elevator_queue *e)
 	struct kyber_queue_data *kqd = e->elevator_data;
 	int i;
 
-	del_timer_sync(&kqd->timer);
+	del_timer_shutdown(&kqd->timer);
 	blk_stat_disable_accounting(kqd->q);
 
 	for (i = 0; i < KYBER_NUM_DOMAINS; i++)
diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c
index f3e4db16fd07..3f574f3769c3 100644
--- a/drivers/block/drbd/drbd_main.c
+++ b/drivers/block/drbd/drbd_main.c
@@ -2184,7 +2184,7 @@ void drbd_destroy_device(struct kref *kref)
 	struct drbd_resource *resource = device->resource;
 	struct drbd_peer_device *peer_device, *tmp_peer_device;
 
-	del_timer_sync(&device->request_timer);
+	del_timer_shutdown(&device->request_timer);
 
 	/* paranoia asserts */
 	D_ASSERT(device, device->open_cnt == 0);
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index ad92192c7d61..d134a5fd4ae7 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1755,7 +1755,7 @@ static void lo_free_disk(struct gendisk *disk)
 	if (lo->workqueue)
 		destroy_workqueue(lo->workqueue);
 	loop_free_idle_workers(lo, true);
-	del_timer_sync(&lo->timer);
+	del_timer_shutdown(&lo->timer);
 	mutex_destroy(&lo->lo_mutex);
 	kfree(lo);
 }
diff --git a/drivers/block/sunvdc.c b/drivers/block/sunvdc.c
index fb855da971ee..9868937a9602 100644
--- a/drivers/block/sunvdc.c
+++ b/drivers/block/sunvdc.c
@@ -1067,7 +1067,7 @@ static void vdc_port_remove(struct vio_dev *vdev)
 
 		flush_work(&port->ldc_reset_work);
 		cancel_delayed_work_sync(&port->ldc_reset_timer_work);
-		del_timer_sync(&port->vio.timer);
+		del_timer_shutdown(&port->vio.timer);
 
 		del_gendisk(port->disk);
 		put_disk(port->disk);

Steven Rostedt Oct. 27, 2022, 3:20 p.m. UTC | #2

[
  quilt mail --send still can't handle unicode characters.
    Here's the patch again
]

From: "Steven Rostedt (Google)" <rostedt@goodmis.org>

Before a timer is freed, del_timer_shutdown() must be called.

Link: https://lore.kernel.org/all/20220407161745.7d6754b3@gandalf.local.home/

Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: linux-media@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc: linaro-mm-sig@lists.linaro.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 drivers/dma-buf/st-dma-fence.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/dma-buf/st-dma-fence.c b/drivers/dma-buf/st-dma-fence.c
index fb6e0a6ae2c9..c67b70205b6f 100644
--- a/drivers/dma-buf/st-dma-fence.c
+++ b/drivers/dma-buf/st-dma-fence.c
@@ -412,7 +412,7 @@ static int test_wait_timeout(void *arg)
 
 	err = 0;
 err_free:
-	del_timer_sync(&wt.timer);
+	del_timer_shutdown(&wt.timer);
 	destroy_timer_on_stack(&wt.timer);
 	dma_fence_signal(wt.f);
 	dma_fence_put(wt.f);

Steven Rostedt Oct. 27, 2022, 3:20 p.m. UTC | #3

[
  quilt mail --send still can't handle unicode characters.
    Here's the patch again
]

From: "Steven Rostedt (Google)" <rostedt@goodmis.org>

Before a timer is freed, del_timer_shutdown() must be called.

Link: https://lore.kernel.org/all/20220407161745.7d6754b3@gandalf.local.home/

Cc: "Noralf Trønnes" <noralf@tronnes.org>
Cc: David Airlie <airlied@gmail.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: dri-devel@lists.freedesktop.org
Cc: intel-gfx@lists.freedesktop.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 drivers/gpu/drm/gud/gud_pipe.c       | 2 +-
 drivers/gpu/drm/i915/i915_sw_fence.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/gud/gud_pipe.c b/drivers/gpu/drm/gud/gud_pipe.c
index 7c6dc2bcd14a..5117a24ca4b5 100644
--- a/drivers/gpu/drm/gud/gud_pipe.c
+++ b/drivers/gpu/drm/gud/gud_pipe.c
@@ -272,7 +272,7 @@ static int gud_usb_bulk(struct gud_device *gdrm, size_t len)
 
 	usb_sg_wait(&ctx.sgr);
 
-	if (!del_timer_sync(&ctx.timer))
+	if (!del_timer_shutdown(&ctx.timer))
 		ret = -ETIMEDOUT;
 	else if (ctx.sgr.status < 0)
 		ret = ctx.sgr.status;
diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c b/drivers/gpu/drm/i915/i915_sw_fence.c
index 6fc0d1b89690..c762e99c8fdf 100644
--- a/drivers/gpu/drm/i915/i915_sw_fence.c
+++ b/drivers/gpu/drm/i915/i915_sw_fence.c
@@ -465,7 +465,7 @@ static void irq_i915_sw_fence_work(struct irq_work *wrk)
 	struct i915_sw_dma_fence_cb_timer *cb =
 		container_of(wrk, typeof(*cb), work);
 
-	del_timer_sync(&cb->timer);
+	del_timer_shutdown(&cb->timer);
 	dma_fence_put(cb->dma);
 
 	kfree_rcu(cb, rcu);

Steven Rostedt Oct. 27, 2022, 3:21 p.m. UTC | #4

[
  quilt mail --send still can't handle unicode characters.
    Here's the patch again
]

From: "Steven Rostedt (Google)" <rostedt@goodmis.org>

Before a timer is freed, del_timer_shutdown() must be called.

Link: https://lore.kernel.org/all/20220407161745.7d6754b3@gandalf.local.home/

Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: "Pali Rohár" <pali@kernel.org>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Tony Lindgren <tony@atomide.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: linux-input@vger.kernel.org
Cc: linux-parisc@vger.kernel.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 drivers/input/keyboard/locomokbd.c   | 2 +-
 drivers/input/keyboard/omap-keypad.c | 2 +-
 drivers/input/mouse/alps.c           | 2 +-
 drivers/input/serio/hil_mlc.c        | 2 +-
 drivers/input/serio/hp_sdc.c         | 2 +-
 5 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/input/keyboard/locomokbd.c b/drivers/input/keyboard/locomokbd.c
index dae053596572..aabd1f6a986a 100644
--- a/drivers/input/keyboard/locomokbd.c
+++ b/drivers/input/keyboard/locomokbd.c
@@ -310,7 +310,7 @@ static void locomokbd_remove(struct locomo_dev *dev)
 
 	free_irq(dev->irq[0], locomokbd);
 
-	del_timer_sync(&locomokbd->timer);
+	del_timer_shutdown(&locomokbd->timer);
 
 	input_unregister_device(locomokbd->input);
 	locomo_set_drvdata(dev, NULL);
diff --git a/drivers/input/keyboard/omap-keypad.c b/drivers/input/keyboard/omap-keypad.c
index 57447d6c9007..39974158961d 100644
--- a/drivers/input/keyboard/omap-keypad.c
+++ b/drivers/input/keyboard/omap-keypad.c
@@ -296,7 +296,7 @@ static int omap_kp_remove(struct platform_device *pdev)
 	omap_writew(1, OMAP1_MPUIO_BASE + OMAP_MPUIO_KBD_MASKIT);
 	free_irq(omap_kp->irq, omap_kp);
 
-	del_timer_sync(&omap_kp->timer);
+	del_timer_shutdown(&omap_kp->timer);
 	tasklet_kill(&kp_tasklet);
 
 	/* unregister everything */
diff --git a/drivers/input/mouse/alps.c b/drivers/input/mouse/alps.c
index 4a6b33bbe7ea..4d402e75aca8 100644
--- a/drivers/input/mouse/alps.c
+++ b/drivers/input/mouse/alps.c
@@ -2970,7 +2970,7 @@ static void alps_disconnect(struct psmouse *psmouse)
 	struct alps_data *priv = psmouse->private;
 
 	psmouse_reset(psmouse);
-	del_timer_sync(&priv->timer);
+	del_timer_shutdown(&priv->timer);
 	if (priv->dev2)
 		input_unregister_device(priv->dev2);
 	if (!IS_ERR_OR_NULL(priv->dev3))
diff --git a/drivers/input/serio/hil_mlc.c b/drivers/input/serio/hil_mlc.c
index d36e89d6fc54..33fc73da8e54 100644
--- a/drivers/input/serio/hil_mlc.c
+++ b/drivers/input/serio/hil_mlc.c
@@ -1017,7 +1017,7 @@ static int __init hil_mlc_init(void)
 
 static void __exit hil_mlc_exit(void)
 {
-	del_timer_sync(&hil_mlcs_kicker);
+	del_timer_shutdown(&hil_mlcs_kicker);
 	tasklet_kill(&hil_mlcs_tasklet);
 }
 
diff --git a/drivers/input/serio/hp_sdc.c b/drivers/input/serio/hp_sdc.c
index 13eacf6ab431..49e5884df801 100644
--- a/drivers/input/serio/hp_sdc.c
+++ b/drivers/input/serio/hp_sdc.c
@@ -980,7 +980,7 @@ static void hp_sdc_exit(void)
 	free_irq(hp_sdc.irq, &hp_sdc);
 	write_unlock_irq(&hp_sdc.lock);
 
-	del_timer_sync(&hp_sdc.kicker);
+	del_timer_shutdown(&hp_sdc.kicker);
 
 	tasklet_kill(&hp_sdc.task);

Jason A. Donenfeld Oct. 27, 2022, 3:52 p.m. UTC | #5

On Thu, Oct 27, 2022 at 11:05:25AM -0400, Steven Rostedt wrote:
>    We are hitting a common bug were a timer is being triggered after it is
>    freed. This causes a corruption in the timer link list and crashes the
>    kernel. Unfortunately it is not easy to know what timer it was that was
>    freed. Looking at the code, it appears that there are several cases that
>    del_timer() is used when del_timer_sync() should have been.
> 
>    Add a del_timer_free() that not only does a del_timer_sync() but will mark
>    the timer as freed in case it gets rearmed, it will trigger a WARN_ON. The
>    del_timer_free() is more likely to be used by developers that are about to
>    free a timer, then using del_timer_sync() as the latter is not as obvious
>    to being needed for freeing. Having the word "free" in the name of the
>    function will hopefully help developers know that that function needs to
>    be called before freeing.
> 
>    The added bonus is the marking of the timer as being freed such that it
>    will trigger a warning if it gets rearmed. At least that way if the system
>    crashes on a freed timer, at least we may see which timer it was that was
>    freed.

FYI, there's a related issue with add_timer_on(), currently without a
straight forward solution, in case you're curious, discussed with
Sebastian and Sultan a few weeks ago. Pasting from that thread, the
issue is:

 1 while (conditions) {
 2     if (!timer_pending(&stack.timer))
 3         add_timer_on(&stack.timer, some_next_cpu);
 4 }
 5 del_timer_sync(&stack.timer);

a) add_timer_on() on line 3 is called from CPU 1 and pends the timer on
   CPU 2.

b) Just before the timer callback runs, not after, timer_pending() is
   made false, so the condition on line 2 holds true again.

c) add_timer_on() on line 3 is called from CPU 1 and pends the timer on
   CPU 3.

d) The conditions on line 1 are made false, and the loop breaks.

e) del_timer_sync() on line 5 is called, and its `base->running_timer !=
   timer` check is false, because of step (c).

f) `stack.timer` gets freed / goes out of scope.

g) The callback scheduled from step (b) runs, and we have a UaF.

Here's a reproducer of this flow, which prints out:

    [    4.157610] wireguard: Stack on cpu 1 is corrupt

diff --git a/drivers/net/wireguard/main.c b/drivers/net/wireguard/main.c
index ee4da9ab8013..5c61f49918f2 100644
--- a/drivers/net/wireguard/main.c
+++ b/drivers/net/wireguard/main.c
@@ -17,10 +17,40 @@
 #include <linux/genetlink.h>
 #include <net/rtnetlink.h>

+struct state {
+	struct timer_list timer;
+	char valid[8];
+};
+
+static void fire(struct timer_list *timer)
+{
+	struct state *stack = container_of(timer, struct state, timer);
+	mdelay(1000);
+	pr_err("Stack on cpu %d is %s\n", raw_smp_processor_id(), stack->valid);
+}
+
+static void do_the_thing(struct work_struct *work)
+{
+	struct state stack = { .valid = "valid" };
+	timer_setup_on_stack(&stack.timer, fire, 0);
+	stack.timer.expires = jiffies;
+	add_timer_on(&stack.timer, 1);
+	while (timer_pending(&stack.timer))
+		cpu_relax();
+	stack.timer.expires = jiffies;
+	add_timer_on(&stack.timer, 2);
+	del_timer_sync(&stack.timer);
+	memcpy(&stack.valid, "corrupt", 8);
+}
+
+static DECLARE_DELAYED_WORK(reproducer, do_the_thing);
+
 static int __init wg_mod_init(void)
 {
 	int ret;

+	schedule_delayed_work_on(0, &reproducer, HZ * 3);
+
 	ret = wg_allowedips_slab_init();
 	if (ret < 0)
 		goto err_allowedips;

It would be interesting if your patch fixed this case too. But maybe the
above is unfixable (and rather niche anyway).

Jason

Sebastian Andrzej Siewior Oct. 27, 2022, 4:01 p.m. UTC | #6

On 2022-10-27 17:52:21 [+0200], Jason A. Donenfeld wrote:
> On Thu, Oct 27, 2022 at 11:05:25AM -0400, Steven Rostedt wrote:
…
> 
> FYI, there's a related issue with add_timer_on(), currently without a
> straight forward solution, in case you're curious, discussed with
> Sebastian and Sultan a few weeks ago. Pasting from that thread, the
> issue is:
…
…
> 
> It would be interesting if your patch fixed this case too. But maybe the
> above is unfixable (and rather niche anyway).

Haven't read the thread, yet, but the issue we talked about a few weeks
ago is unfixable. I plan to document this.
Eitherway CONFIG_DEBUG_OBJECTS_TIMERS should warn you here.

There are watchdogs (clocksource_watchdog()) which rotate CPUs and
invoke add_timer_on() from within the callback. This complicates
things.

> Jason

Sebastian

Dmitry Torokhov Oct. 27, 2022, 4:38 p.m. UTC | #7

On Thu, Oct 27, 2022 at 11:21:17AM -0400, Steven Rostedt wrote:
> [
>   quilt mail --send still can't handle unicode characters.
>     Here's the patch again
> ]
> 
> From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
> 
> Before a timer is freed, del_timer_shutdown() must be called.
> 
> Link: https://lore.kernel.org/all/20220407161745.7d6754b3@gandalf.local.home/
> 
> Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
> Cc: "Pali Rohár" <pali@kernel.org>
> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
> Cc: Helge Deller <deller@gmx.de>
> Cc: Tony Lindgren <tony@atomide.com>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: linux-input@vger.kernel.org
> Cc: linux-parisc@vger.kernel.org
> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>

Please feel free to merge with the rest of the series.

Thanks.

Steven Rostedt Oct. 27, 2022, 5:23 p.m. UTC | #8

On Thu, 27 Oct 2022 18:01:21 +0200
Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:

> Haven't read the thread, yet, but the issue we talked about a few weeks
> ago is unfixable. I plan to document this.
> Eitherway CONFIG_DEBUG_OBJECTS_TIMERS should warn you here.

Not sure if my change affects that in this patch series.


  https://lore.kernel.org/all/20221027150931.071195430@goodmis.org/

-- Steve

Guenter Roeck Oct. 27, 2022, 6:58 p.m. UTC | #9

On 10/27/22 08:05, Steven Rostedt wrote:
> Back in April, I posted an RFC patch set to help mitigate a common issue
> where a timer gets armed just before it is freed, and when the timer
> goes off, it crashes in the timer code without any evidence of who the
> culprit was. I got side tracked and never finished up on that patch set.
> Since this type of crash is still our #1 crash we are seeing in the field,
> it has become a priority again to finish it.
> 
> This is v2 of that patch set. Thomas Gleixner posted an untested version
> that makes timer->function NULL as the flag that it is shutdown. I took that
> code, tested it (fixed it up), added more comments, and changed the
> name to del_timer_shutdown() as Linus had asked. I also converted it to use
> WARN_ON_ONCE() instead of just WARN_ON() as Linus asked for that too.
> 

I rebased the series on top of v6.1-rc2 and gave it a try. Unfortunately it
blows up in my face, first with

[   16.212535] ------------[ cut here ]------------
[   16.212887] ODEBUG: free active (active state 0) object type: timer_list hint: tcp_write_timer+0x0/0x190
[   16.213725] WARNING: CPU: 0 PID: 310 at lib/debugobjects.c:502 debug_print_object+0xb8/0x100

which, of course, might be real, but after a couple of those I get

[   16.325257] Insufficient stack space to handle exception!
[   16.325326] ESR: 0x0000000096000047 -- DABT (current EL)
[   16.325355] FAR: 0xffff800008677fe0
[   16.325366] Task stack:     [0xffff800008678000..0xffff80000867c000]
[   16.325376] IRQ stack:      [0xffff800008000000..0xffff800008004000]
[   16.325387] Overflow stack: [0xffff51e19feab300..0xffff51e19feac300]
[   16.325406] CPU: 0 PID: 310 Comm: telnet Tainted: G        W        N 6.1.0-rc2-00032-g895257c4037a #1
[   16.325425] Hardware name: linux,dummy-virt (DT)
[   16.325434] pstate: 400000c5 (nZcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   16.325450] pc : mark_lock+0x4/0x620
[   16.325482] lr : __lock_acquire+0x3e8/0x1950
[   16.325493] sp : ffff800008678090
[   16.325499] x29: ffff800008678090 x28: ffff51e18651ccc0 x27: ffffc690a5583000
[   16.325530] x26: ffff51e18651d590 x25: 0000000000000001 x24: 0000000000000001
[   16.325552] x23: 0000000000000001 x22: 0000000000000028 x21: ffff51e18651d568
[   16.325571] x20: ffffc690a61b5710 x19: 0000000000000000 x18: ffffffffffffffff
[   16.325589] x17: 6e6968207473696c x16: 5f72656d6974203a x15: 6570797420746365
[   16.325608] x14: 6a626f2029302065 x13: ffffc690a55a78c0 x12: 00000000000c02e7
[   16.325626] x11: 0000000030fa562e x10: ffffc690a5a9f208 x9 : ffffc690a5a9f200
[   16.325645] x8 : ffff51e18651ccc0 x7 : 0000000000000000 x6 : 000000000002e9fa
[   16.325663] x5 : 0000000000000028 x4 : 0000000000000002 x3 : 00000000000c0003
[   16.325681] x2 : 0000000000000008 x1 : ffff51e18651d590 x0 : ffff51e18651ccc0
[   16.325902] Kernel panic - not syncing: kernel stack overflow
[   16.325992] CPU: 0 PID: 310 Comm: telnet Tainted: G        W        N 6.1.0-rc2-00032-g895257c4037a #1
[   16.326008] Hardware name: linux,dummy-virt (DT)
[   16.326026] Call trace:
[   16.326031]  dump_backtrace.part.0+0xe0/0xf0
[   16.326048]  show_stack+0x18/0x40
[   16.326058]  dump_stack_lvl+0x9c/0xd8
[   16.326073]  dump_stack+0x18/0x34
[   16.326083]  panic+0x194/0x38c
[   16.326093]  nmi_panic+0xac/0xb0
[   16.326103]  panic_bad_stack+0x104/0x124
[   16.326113]  handle_bad_stack+0x34/0xe0
[   16.326124]  __bad_stack+0x78/0x7c
[   16.326134]  mark_lock+0x4/0x620
[   16.326144]  lock_acquire.part.0+0xf0/0x26c
[   16.326155]  lock_acquire+0x68/0x84
[   16.326166]  _raw_spin_lock_irqsave+0x70/0x150
[   16.326178]  debug_object_assert_init+0xa0/0x1a4
[   16.326193]  __try_to_del_timer_sync+0x40/0xdc
[   16.326207]  __del_timer_sync+0xa4/0x100
[   16.326218]  timer_fixup_free+0x2c/0x54
[   16.326229]  debug_object_free.part.0+0x188/0x1b0
...
(many more of those)
...
[   16.329989]  timer_fixup_free+0x40/0x54
[   16.329996]  __debug_check_no_obj_freed+0x1ec/0x25c
[   16.330003]  debug_check_no_obj_freed+0x20/0x90
[   16.330009]  slab_free_freelist_hook.constprop.0+0xac/0x1b0
[   16.330019]  kmem_cache_free+0x1ac/0x500
[   16.330026]  __sk_destruct+0x140/0x2a0
[   16.330035]  sk_destruct+0x54/0x64
[   16.330042]  __sk_free+0x74/0x120
[   16.330048]  sk_free+0x64/0x8c
[   16.330055]  tcp_close+0x94/0xc0
[   16.330064]  inet_release+0x50/0xb0
[   16.330071]  __sock_release+0x44/0xbc
[   16.330076]  sock_close+0x18/0x30
[   16.330081]  __fput+0x84/0x270
[   16.330088]  ____fput+0x10/0x20
[   16.330094]  task_work_run+0x88/0xf0
[   16.330102]  do_exit+0x334/0xafc
[   16.330108]  do_group_exit+0x34/0x90
[   16.330115]  __arm64_sys_exit_group+0x18/0x20
[   16.330121]  invoke_syscall+0x48/0x114
[   16.330133]  el0_svc_common.constprop.0+0x60/0x11c
[   16.330146]  do_el0_svc+0x30/0xd0
[   16.330157]  el0_svc+0x48/0xc0
[   16.330170]  el0t_64_sync_handler+0xbc/0x13c
[   16.330179]  el0t_64_sync+0x18c/0x190
[   16.330645] Kernel Offset: 0x46909ae00000 from 0xffff800008000000
[   16.330664] PHYS_OFFSET: 0xffffae1ec0000000
[   16.330686] CPU features: 0x22000,2033c080,0000421b
[   16.330752] Memory Limit: none

This is with arm64_defconfig plus various debug options.
Running the same test with v6.0.4 produces no errors, and neither
does running it with v6.1-rc2-105-gb229b6ca5abb (current mainline).

Guenter

Steven Rostedt Oct. 27, 2022, 7:02 p.m. UTC | #10

On Thu, 27 Oct 2022 11:58:59 -0700
Guenter Roeck <linux@roeck-us.net> wrote:

> This is with arm64_defconfig plus various debug options.
> Running the same test with v6.0.4 produces no errors, and neither
> does running it with v6.1-rc2-105-gb229b6ca5abb (current mainline).

So it works on current stable and current mainline? Does that mean we need
to worry about this?

-- Steve

Guenter Roeck Oct. 27, 2022, 7:11 p.m. UTC | #11

On Thu, Oct 27, 2022 at 03:02:37PM -0400, Steven Rostedt wrote:
> On Thu, 27 Oct 2022 11:58:59 -0700
> Guenter Roeck <linux@roeck-us.net> wrote:
> 
> > This is with arm64_defconfig plus various debug options.
> > Running the same test with v6.0.4 produces no errors, and neither
> > does running it with v6.1-rc2-105-gb229b6ca5abb (current mainline).
> 
> So it works on current stable and current mainline? Does that mean we need
> to worry about this?
> 
Sorry, I wasn't clear. The test on v6.0.4 and v6.1-rc2-105-gb229b6ca5abb was
without your patch set but with all ODEBUG options enabled, to be sure
that there is no pre-existing condition.

Guenter

Linus Torvalds Oct. 27, 2022, 7:11 p.m. UTC | #12

On Thu, Oct 27, 2022 at 12:02 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> So it works on current stable and current mainline? Does that mean we need
> to worry about this?

No, I think Guenter is saying that current mainline *without* the
patches works fine.

But v6.1-rc2 (which is quite close to that current mainline) *with*
the patches blows up.

So it's almost certainly the patches that break. There are no
appreciable timer changes in those 105 commits (there's some added
irq_work_sync but that looks very unlikely to be related.

            Linus

Steven Rostedt Oct. 27, 2022, 7:16 p.m. UTC | #13

On Thu, 27 Oct 2022 12:11:43 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Thu, Oct 27, 2022 at 12:02 PM Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > So it works on current stable and current mainline? Does that mean we need
> > to worry about this?  
> 
> No, I think Guenter is saying that current mainline *without* the
> patches works fine.
> 
> But v6.1-rc2 (which is quite close to that current mainline) *with*
> the patches blows up.
> 
> So it's almost certainly the patches that break. There are no
> appreciable timer changes in those 105 commits (there's some added
> irq_work_sync but that looks very unlikely to be related.
> 

Got it. I'll need to setup an arm64 VM to see if I can reproduce it.

-- Steve

Steven Rostedt Oct. 27, 2022, 7:20 p.m. UTC | #14

On Thu, 27 Oct 2022 11:58:59 -0700
Guenter Roeck <linux@roeck-us.net> wrote:

> On 10/27/22 08:05, Steven Rostedt wrote:
> > Back in April, I posted an RFC patch set to help mitigate a common issue
> > where a timer gets armed just before it is freed, and when the timer
> > goes off, it crashes in the timer code without any evidence of who the
> > culprit was. I got side tracked and never finished up on that patch set.
> > Since this type of crash is still our #1 crash we are seeing in the field,
> > it has become a priority again to finish it.
> > 
> > This is v2 of that patch set. Thomas Gleixner posted an untested version
> > that makes timer->function NULL as the flag that it is shutdown. I took that
> > code, tested it (fixed it up), added more comments, and changed the
> > name to del_timer_shutdown() as Linus had asked. I also converted it to use
> > WARN_ON_ONCE() instead of just WARN_ON() as Linus asked for that too.
> >   
> 
> I rebased the series on top of v6.1-rc2 and gave it a try. Unfortunately it
> blows up in my face, first with
> 
> [   16.212535] ------------[ cut here ]------------
> [   16.212887] ODEBUG: free active (active state 0) object type: timer_list hint: tcp_write_timer+0x0/0x190
> [   16.213725] WARNING: CPU: 0 PID: 310 at lib/debugobjects.c:502 debug_print_object+0xb8/0x100
> 
> which, of course, might be real, but after a couple of those I get
> 
> [   16.325257] Insufficient stack space to handle exception!
> [   16.325326] ESR: 0x0000000096000047 -- DABT (current EL)
> [   16.325355] FAR: 0xffff800008677fe0
> [   16.325366] Task stack:     [0xffff800008678000..0xffff80000867c000]
> [   16.325376] IRQ stack:      [0xffff800008000000..0xffff800008004000]
> [   16.325387] Overflow stack: [0xffff51e19feab300..0xffff51e19feac300]
> [   16.325406] CPU: 0 PID: 310 Comm: telnet Tainted: G        W        N 6.1.0-rc2-00032-g895257c4037a #1
> [   16.325425] Hardware name: linux,dummy-virt (DT)
> [   16.325434] pstate: 400000c5 (nZcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [   16.325450] pc : mark_lock+0x4/0x620
> [   16.325482] lr : __lock_acquire+0x3e8/0x1950
> [   16.325493] sp : ffff800008678090
> [   16.325499] x29: ffff800008678090 x28: ffff51e18651ccc0 x27: ffffc690a5583000
> [   16.325530] x26: ffff51e18651d590 x25: 0000000000000001 x24: 0000000000000001
> [   16.325552] x23: 0000000000000001 x22: 0000000000000028 x21: ffff51e18651d568
> [   16.325571] x20: ffffc690a61b5710 x19: 0000000000000000 x18: ffffffffffffffff
> [   16.325589] x17: 6e6968207473696c x16: 5f72656d6974203a x15: 6570797420746365
> [   16.325608] x14: 6a626f2029302065 x13: ffffc690a55a78c0 x12: 00000000000c02e7
> [   16.325626] x11: 0000000030fa562e x10: ffffc690a5a9f208 x9 : ffffc690a5a9f200
> [   16.325645] x8 : ffff51e18651ccc0 x7 : 0000000000000000 x6 : 000000000002e9fa
> [   16.325663] x5 : 0000000000000028 x4 : 0000000000000002 x3 : 00000000000c0003
> [   16.325681] x2 : 0000000000000008 x1 : ffff51e18651d590 x0 : ffff51e18651ccc0
> [   16.325902] Kernel panic - not syncing: kernel stack overflow
> [   16.325992] CPU: 0 PID: 310 Comm: telnet Tainted: G        W        N 6.1.0-rc2-00032-g895257c4037a #1
> [   16.326008] Hardware name: linux,dummy-virt (DT)
> [   16.326026] Call trace:
> [   16.326031]  dump_backtrace.part.0+0xe0/0xf0
> [   16.326048]  show_stack+0x18/0x40
> [   16.326058]  dump_stack_lvl+0x9c/0xd8
> [   16.326073]  dump_stack+0x18/0x34
> [   16.326083]  panic+0x194/0x38c
> [   16.326093]  nmi_panic+0xac/0xb0
> [   16.326103]  panic_bad_stack+0x104/0x124
> [   16.326113]  handle_bad_stack+0x34/0xe0
> [   16.326124]  __bad_stack+0x78/0x7c
> [   16.326134]  mark_lock+0x4/0x620
> [   16.326144]  lock_acquire.part.0+0xf0/0x26c
> [   16.326155]  lock_acquire+0x68/0x84
> [   16.326166]  _raw_spin_lock_irqsave+0x70/0x150
> [   16.326178]  debug_object_assert_init+0xa0/0x1a4
> [   16.326193]  __try_to_del_timer_sync+0x40/0xdc
> [   16.326207]  __del_timer_sync+0xa4/0x100
> [   16.326218]  timer_fixup_free+0x2c/0x54
> [   16.326229]  debug_object_free.part.0+0x188/0x1b0
> ...
> (many more of those)
> ...
> [   16.329989]  timer_fixup_free+0x40/0x54

Ah, I see the issue here. Looks like the timer_fixup_free() is calling
itself and crashing.

Let me take a look into that. I didn't touch the fixup code, and there
could be an assumption there that it's behaving with the old approach.

-- Steve


> [   16.329996]  __debug_check_no_obj_freed+0x1ec/0x25c
> [   16.330003]  debug_check_no_obj_freed+0x20/0x90
> [   16.330009]  slab_free_freelist_hook.constprop.0+0xac/0x1b0
> [   16.330019]  kmem_cache_free+0x1ac/0x500
> [   16.330026]  __sk_destruct+0x140/0x2a0
> [   16.330035]  sk_destruct+0x54/0x64
> [   16.330042]  __sk_free+0x74/0x120
> [   16.330048]  sk_free+0x64/0x8c
> [   16.330055]  tcp_close+0x94/0xc0
> [   16.330064]  inet_release+0x50/0xb0
> [   16.330071]  __sock_release+0x44/0xbc
> [   16.330076]  sock_close+0x18/0x30
> [   16.330081]  __fput+0x84/0x270
> [   16.330088]  ____fput+0x10/0x20
> [   16.330094]  task_work_run+0x88/0xf0
> [   16.330102]  do_exit+0x334/0xafc
> [   16.330108]  do_group_exit+0x34/0x90
> [   16.330115]  __arm64_sys_exit_group+0x18/0x20
> [   16.330121]  invoke_syscall+0x48/0x114
> [   16.330133]  el0_svc_common.constprop.0+0x60/0x11c
> [   16.330146]  do_el0_svc+0x30/0xd0
> [   16.330157]  el0_svc+0x48/0xc0
> [   16.330170]  el0t_64_sync_handler+0xbc/0x13c
> [   16.330179]  el0t_64_sync+0x18c/0x190
> [   16.330645] Kernel Offset: 0x46909ae00000 from 0xffff800008000000
> [   16.330664] PHYS_OFFSET: 0xffffae1ec0000000
> [   16.330686] CPU features: 0x22000,2033c080,0000421b
> [   16.330752] Memory Limit: none
> 
> This is with arm64_defconfig plus various debug options.
> Running the same test with v6.0.4 produces no errors, and neither
> does running it with v6.1-rc2-105-gb229b6ca5abb (current mainline).
> 
> Guenter

Steven Rostedt Oct. 27, 2022, 7:27 p.m. UTC | #15

On Thu, 27 Oct 2022 15:20:58 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> > (many more of those)
> > ...
> > [   16.329989]  timer_fixup_free+0x40/0x54  
> 
> Ah, I see the issue here. Looks like the timer_fixup_free() is calling
> itself and crashing.
> 
> Let me take a look into that. I didn't touch the fixup code, and there
> could be an assumption there that it's behaving with the old approach.

Can you add this and see if it makes this issue go away?

-- Steve

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 1d17552b3ede..7305c65ad0eb 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -691,7 +691,7 @@ static bool timer_fixup_init(void *addr, enum debug_obj_state state)
 
 	switch (state) {
 	case ODEBUG_STATE_ACTIVE:
-		del_timer_sync(timer);
+		del_timer_shutdown(timer);
 		debug_object_init(timer, &timer_debug_descr);
 		return true;
 	default:
@@ -737,7 +737,7 @@ static bool timer_fixup_free(void *addr, enum debug_obj_state state)
 
 	switch (state) {
 	case ODEBUG_STATE_ACTIVE:
-		del_timer_sync(timer);
+		del_timer_shutdown(timer);
 		debug_object_free(timer, &timer_debug_descr);
 		return true;
 	default:

Guenter Roeck Oct. 27, 2022, 7:38 p.m. UTC | #16

On 10/27/22 12:27, Steven Rostedt wrote:
> On Thu, 27 Oct 2022 15:20:58 -0400
> Steven Rostedt <rostedt@goodmis.org> wrote:
> 
>>> (many more of those)
>>> ...
>>> [   16.329989]  timer_fixup_free+0x40/0x54
>>
>> Ah, I see the issue here. Looks like the timer_fixup_free() is calling
>> itself and crashing.
>>
>> Let me take a look into that. I didn't touch the fixup code, and there
>> could be an assumption there that it's behaving with the old approach.
> 
> Can you add this and see if it makes this issue go away?
> 

Yes, that fixes the crash. However, it still reports

[   12.235054] ------------[ cut here ]------------
[   12.235240] ODEBUG: free active (active state 0) object type: timer_list hint: tcp_write_timer+0x0/0x190
[   12.237331] WARNING: CPU: 0 PID: 310 at lib/debugobjects.c:502 debug_print_object+0xb8/0x100
...
[   12.255251] Call trace:
[   12.255305]  debug_print_object+0xb8/0x100
[   12.255385]  __debug_check_no_obj_freed+0x1d0/0x25c
[   12.255474]  debug_check_no_obj_freed+0x20/0x90
[   12.255555]  slab_free_freelist_hook.constprop.0+0xac/0x1b0
[   12.255650]  kmem_cache_free+0x1ac/0x500
[   12.255728]  __sk_destruct+0x140/0x2a0
[   12.255805]  sk_destruct+0x54/0x64
[   12.255877]  __sk_free+0x74/0x120
[   12.255944]  sk_free+0x64/0x8c
[   12.256009]  tcp_close+0x94/0xc0
[   12.256076]  inet_release+0x50/0xb0
[   12.256145]  __sock_release+0x44/0xbc
[   12.256219]  sock_close+0x18/0x30
[   12.256292]  __fput+0x84/0x270
[   12.256361]  ____fput+0x10/0x20
[   12.256426]  task_work_run+0x88/0xf0
[   12.256499]  do_exit+0x334/0xafc
[   12.256566]  do_group_exit+0x34/0x90
[   12.256634]  __arm64_sys_exit_group+0x18/0x20
[   12.256713]  invoke_syscall+0x48/0x114
[   12.256789]  el0_svc_common.constprop.0+0x60/0x11c
[   12.256874]  do_el0_svc+0x30/0xd0
[   12.256943]  el0_svc+0x48/0xc0
[   12.257008]  el0t_64_sync_handler+0xbc/0x13c
[   12.257086]  el0t_64_sync+0x18c/0x190

Is that a real problem or a false positive ? I didn't see that
without your patch series (which of course might be the whole point
of the series).

Thanks,
Guenter

Guenter Roeck Oct. 27, 2022, 7:44 p.m. UTC | #17

On 10/27/22 12:16, Steven Rostedt wrote:
> On Thu, 27 Oct 2022 12:11:43 -0700
> Linus Torvalds <torvalds@linux-foundation.org> wrote:
> 
>> On Thu, Oct 27, 2022 at 12:02 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>>>
>>> So it works on current stable and current mainline? Does that mean we need
>>> to worry about this?
>>
>> No, I think Guenter is saying that current mainline *without* the
>> patches works fine.
>>
>> But v6.1-rc2 (which is quite close to that current mainline) *with*
>> the patches blows up.
>>
>> So it's almost certainly the patches that break. There are no
>> appreciable timer changes in those 105 commits (there's some added
>> irq_work_sync but that looks very unlikely to be related.
>>
> 
> Got it. I'll need to setup an arm64 VM to see if I can reproduce it.
> 

Try http://server.roeck-us.net/qemu/aarch64/

Guenter

Guenter Roeck Oct. 27, 2022, 10:24 p.m. UTC | #18

On Thu, Oct 27, 2022 at 11:05:25AM -0400, Steven Rostedt wrote:
> 
> Back in April, I posted an RFC patch set to help mitigate a common issue
> where a timer gets armed just before it is freed, and when the timer
> goes off, it crashes in the timer code without any evidence of who the
> culprit was. I got side tracked and never finished up on that patch set.
> Since this type of crash is still our #1 crash we are seeing in the field,
> it has become a priority again to finish it.
> 
> This is v2 of that patch set. Thomas Gleixner posted an untested version
> that makes timer->function NULL as the flag that it is shutdown. I took that
> code, tested it (fixed it up), added more comments, and changed the
> name to del_timer_shutdown() as Linus had asked. I also converted it to use
> WARN_ON_ONCE() instead of just WARN_ON() as Linus asked for that too.
> 

Here are various warnings and crashes. Complete logs are at

https://kerneltests.org/builders

in the "testing" column of the qemu test results.

This is with the published patch set plus the fixups in 
timer_fixup_init() and timer_fixup_free().

Guenter

---
Block code:

WARNING: CPU: 0 PID: 8 at lib/debugobjects.c:502 debug_print_object+0x66/0x7a
ODEBUG: free active (active state 0) object type: timer_list hint: blk_rq_timed_out_timer+0x0/0xc

---
In tcp:

WARNING: CPU: 0 PID: 255 at lib/debugobjects.c:502 debug_print_object+0xa4/0xd8
ODEBUG: free active (active state 0) object type: timer_list hint: tcp_keepalive_timer+0x0/0x3a8

WARNING: CPU: 0 PID: 255 at lib/debugobjects.c:502 debug_print_object+0xa4/0xd8
ODEBUG: free active (active state 0) object type: timer_list hint: tcp_write_timer+0x0/0x1a8

---
Other networking:

WARNING: CPU: 0 PID: 8 at lib/debugobjects.c:502 debug_print_object+0xa4/0xd8
ODEBUG: free active (active state 0) object type: timer_list hint: neigh_timer_handler+0x0/0x468

---

WARNING: CPU: 0 PID: 280 at lib/debugobjects.c:502 debug_print_object+0xa4/0xd8
ODEBUG: init active (active state 0) object type: timer_list hint: tulip_timer+0x0/0x38

That one is often followed by:

[   29.833216] ODEBUG: init active (active state 0) object type: timer_list hint: 0x0
[   29.833723] WARNING: CPU: 0 PID: 365 at lib/debugobjects.c:502 debug_print_object+0xb8/0x100

in the same code line, suggesting that the timer may be shut down
more than once (?). Then there is another crash with

[   29.863890] Insufficient stack space to handle exception!
[   29.863939] ESR: 0x0000000096000047 -- DABT (current EL)
[   29.863952] FAR: 0xffff800008b17f80
[   29.863959] Task stack:     [0xffff800008b18000..0xffff800008b1c000]
[   29.863967] IRQ stack:      [0xffff800008000000..0xffff800008004000]
[   29.863975] Overflow stack: [0xffff30a35fe7a0f0..0xffff30a35fe7b0f0]
[   29.863987] CPU: 0 PID: 365 Comm: ip Tainted: G        W        N 6.1.0-rc2-00138-gced58c742836 #1
[   29.863999] Hardware name: linux,dummy-virt (DT)

followed by a sequence of

[   29.864572]  __try_to_del_timer_sync+0x40/0xdc
[   29.864582]  __del_timer_sync+0xa4/0x100
[   29.864590]  timer_fixup_init+0x2c/0x5c
[   29.864599]  __debug_object_init+0x248/0x53c
[   29.864606]  debug_object_init+0x24/0x30
[   29.864614]  timer_fixup_init+0x40/0x5c
[   29.864622]  __debug_object_init+0x248/0x53c
[   29.864630]  debug_object_init+0x24/0x30
[   29.864637]  timer_fixup_init+0x40/0x5c
[   29.864645]  __debug_object_init+0x248/0x53c
[   29.864658]  debug_object_init+0x24/0x30
[   29.864666]  timer_fixup_init+0x40/0x5c
...
[   29.866492]  debug_object_init+0x24/0x30
[   29.866500]  init_timer_key+0x7c/0x80
[   29.866508]  tulip_down+0x1d0/0x24c
[   29.866518]  tulip_close+0x3c/0xc0

This crash is seen with variants on several platforms/architectures.

---

WARNING: CPU: 0 PID: 8 at lib/debugobjects.c:502 debug_print_object+0xa4/0xd8
ODEBUG: free active (active state 0) object type: timer_list hint: addrconf_dad_work+0x0/0x628

---
mips64:

WARNING: CPU: 0 PID: 280 at lib/debugobjects.c:502 debug_print_object+0xa4/0xd8
ODEBUG: init active (active state 0) object type: timer_list hint: 0x0
Modules linked in:
CPU: 0 PID: 280 Comm: ip Tainted: G        W        N 6.1.0-rc2-00138-gced58c742836 #1
Stack : 0000000000000056 ffffffffffffffff 0000000000000008 00b65275abba8126
        00b65275abba8126 0000000000000000 9000000004193728 ffffffff80f29f38
        ffffffff8105c9d8 ffffffffffff8880 9000000004193668 0000000000000000
        0000000000000005 0000000000000010 ffffffff80c731a0 ffffffff81030000
        2020205720202020 ffffffff81030000 0000000000000000 ffffffff80f29f38
        0000000000000009 00000000000001f6 ffffffff8063d45c ffffffff81cb94c0
        ffffffff81d76240 0000000000000000 ffffffff807575a8 0000000000000000
        ffffffff81230000 9000000004190000 9000000004193720 9000000018002030
        ffffffff80c8f528 0000000000000000 0000000000000000 00b65275abba8126
        ffffffff8122e050 000000001000a4e1 ffffffff8010a884 00b65275abba8126
        ...
Call Trace:
[<ffffffff8010a884>] show_stack+0x3c/0x120
[<ffffffff80c8f528>] dump_stack_lvl+0x4c/0x90
[<ffffffff80130e54>] __warn+0xdc/0x1c8
[<ffffffff80c75a08>] warn_slowpath_fmt+0x98/0xc4
[<ffffffff8063d45c>] debug_print_object+0xa4/0xd8
[<ffffffff8063e590>] __debug_object_init+0x2d0/0x670
[<ffffffff801b44b8>] timer_fixup_init+0x40/0x58

irq event stamp: 1537
hardirqs last  enabled at (1549): [<ffffffff8018e67c>] __up_console_sem+0x9c/0xc8
hardirqs last disabled at (1560): [<ffffffff8018e658>] __up_console_sem+0x78/0xc8
softirqs last  enabled at (1324): [<ffffffff80a572bc>] dev_deactivate_many+0x32c/0x458
softirqs last disabled at (1322): [<ffffffff80a57288>] dev_deactivate_many+0x2f8/0x458
---[ end trace 0000000000000000 ]---

=====================================
WARNING: bad unlock balance detected!
6.1.0-rc2-00138-gced58c742836 #1 Tainted: G        W        N
-------------------------------------
�������␈␂/-1 is trying to release lock (&obj_hash[i].lock) at:
[<ffffffff801b41e0>] __try_to_del_timer_sync+0x48/0x140
but there are no more locks to release!

other info that might help us debug this:
qemu-system-mips64: terminating on signal 15 from pid 2525787 (/bin/bash)

---
Openrisc fails completely. Lots of the following, then boot stalls.

WARNING: CPU: 0 PID: 88 at lib/debugobjects.c:502 debug_print_object+0xc0/0xe8
ODEBUG: init active (active state 0) object type: timer_list hint: 0x0
Modules linked in:
CPU: 0 PID: 88 Comm: udhcpc Not tainted 6.1.0-rc2-00138-gced58c742836 #1
Call trace:
[<8eaa3f51>] dump_stack_lvl+0x44/0x80
[<4da6c5ef>] dump_stack+0x1c/0x2c
[<1b9f58b7>] __warn+0xdc/0x118
[<86d4d066>] ? debug_print_object+0xc0/0xe8
[<689beae8>] warn_slowpath_fmt+0x78/0x90
[<86d4d066>] debug_print_object+0xc0/0xe8
[<872f4074>] __debug_object_init+0x2bc/0x7f4
[<c30d6214>] ? _raw_spin_unlock_irqrestore+0x50/0x84
[<c3f3be91>] ? debug_check_no_locks_freed+0xb8/0x194
[<558675c1>] ? slob_alloc+0xe8/0x350
[<26225550>] ? lockdep_init_map_type+0x68/0x38c
[<c3f3be91>] ? debug_check_no_locks_freed+0xb8/0x194
[<2c9cf7b5>] ? inet_create+0x2e8/0x404
[<bab055df>] debug_object_init+0x30/0x40
[<6d2e468f>] init_timer_key+0xb4/0x110
[<60c0d3e7>] ? sk_init_common+0x1a0/0x1c0
[<b52686c2>] sock_init_data+0x60/0x2a4
[<1ff904c3>] ? sk_alloc+0xe8/0x138
[<ca3e6ef9>] inet_create+0x1e0/0x404
[<623d726a>] ? inet_create+0x80/0x404
[<a3256728>] ? lock_release+0x1c0/0x30c
[<960a454e>] __sock_create+0x140/0x288
[<0290cd64>] ? __sock_create+0x98/0x288
[<b62bc649>] __sys_socket+0x7c/0x128
[<fa003224>] ? do_work_pending+0x4c/0x118
[<705147e9>] sys_socket+0x14/0x24
[<9c4e015d>] ? _syscall_return+0x0/0x4
---[ end trace 0000000000000000 ]---

---
parisc crashes.

[    3.015186] ------------[ cut here ]------------
[    3.015580] ODEBUG: init active (active state 0) object type: timer_list hint: timeout_waiting_on_port+0x0/0x2c
[    3.016945] WARNING: CPU: 0 PID: 1 at lib/debugobjects.c:502 debug_print_object+0x98/0xc8
...
[    3.026117] ODEBUG: init active (active state 0) object type: timer_list hint: 0x0
[    3.026355] WARNING: CPU: 0 PID: 1 at lib/debugobjects.c:502 debug_print_object+0x98/0xc8
...
[ many more of the same ]

[    3.054542] stackcheck: swapper/0 will most likely overflow irq stack (sp:11e51800, stk bottom-top:11460004-11468004)
[    3.054592] Kernel panic - not syncing: low stack detected by irq handler - check messages

and boom.

---
usb:

[   23.993737] WARNING: CPU: 0 PID: 343 at lib/debugobjects.c:502 debug_print_object+0xac/0xc8
[   23.993953] ODEBUG: free active (active state 0) object type: timer_list hint: hub_init_func2+0x0/0xc

---
ppc:

[    6.607478][    T1] ODEBUG: init active (active state 0) object type: timer_list hint: .ibmvscsi_timeout+0x0/0x58
[    6.608536][    T1] WARNING: CPU: 0 PID: 1 at lib/debugobjects.c:502 .debug_print_object+0xbc/0xf0

followed by several of

[    6.623661][    T1] ODEBUG: init active (active state 0) object type: timer_list hint: 0x0
[    6.624155][    T1] WARNING: CPU: 0 PID: 1 at lib/debugobjects.c:502 .debug_print_object+0xbc/0xf0

and finally:

[    6.694785][    C0] BUG: Kernel NULL pointer dereference on read at 0x000000b8
[    6.694796][    C0] Faulting instruction address: 0xc0000000001ece44
[    6.694807][    C0] Thread overran stack, or stack corrupted
[    6.694967][    C0] Oops: Kernel access of bad area, sig: 7 [#1]

Steven Rostedt Oct. 27, 2022, 10:58 p.m. UTC | #19

On Thu, 27 Oct 2022 15:24:04 -0700
Guenter Roeck <linux@roeck-us.net> wrote:

> On Thu, Oct 27, 2022 at 11:05:25AM -0400, Steven Rostedt wrote:
> > 
> > Back in April, I posted an RFC patch set to help mitigate a common issue
> > where a timer gets armed just before it is freed, and when the timer
> > goes off, it crashes in the timer code without any evidence of who the
> > culprit was. I got side tracked and never finished up on that patch set.
> > Since this type of crash is still our #1 crash we are seeing in the field,
> > it has become a priority again to finish it.
> > 
> > This is v2 of that patch set. Thomas Gleixner posted an untested version
> > that makes timer->function NULL as the flag that it is shutdown. I took that
> > code, tested it (fixed it up), added more comments, and changed the
> > name to del_timer_shutdown() as Linus had asked. I also converted it to use
> > WARN_ON_ONCE() instead of just WARN_ON() as Linus asked for that too.
> >   
> 
> Here are various warnings and crashes. Complete logs are at
> 
> https://kerneltests.org/builders
> 
> in the "testing" column of the qemu test results.
> 
> This is with the published patch set plus the fixups in 
> timer_fixup_init() and timer_fixup_free().
> 
> Guenter
> 
> ---
> Block code:
> 
> WARNING: CPU: 0 PID: 8 at lib/debugobjects.c:502 debug_print_object+0x66/0x7a
> ODEBUG: free active (active state 0) object type: timer_list hint: blk_rq_timed_out_timer+0x0/0xc

This is probably just missing a switch to shutdown.

> 
> ---
> In tcp:
> 
> WARNING: CPU: 0 PID: 255 at lib/debugobjects.c:502 debug_print_object+0xa4/0xd8
> ODEBUG: free active (active state 0) object type: timer_list hint: tcp_keepalive_timer+0x0/0x3a8
> 
> WARNING: CPU: 0 PID: 255 at lib/debugobjects.c:502 debug_print_object+0xa4/0xd8
> ODEBUG: free active (active state 0) object type: timer_list hint: tcp_write_timer+0x0/0x1a8

Looking at these, I think the del_timer_try_shutdown() needs to always
shutdown (let's try that, and see if that fixes things before we rename
anything).

> 
> ---
> Other networking:
> 
> WARNING: CPU: 0 PID: 8 at lib/debugobjects.c:502 debug_print_object+0xa4/0xd8
> ODEBUG: free active (active state 0) object type: timer_list hint: neigh_timer_handler+0x0/0x468
> 
> ---
> 
> WARNING: CPU: 0 PID: 280 at lib/debugobjects.c:502 debug_print_object+0xa4/0xd8
> ODEBUG: init active (active state 0) object type: timer_list hint: tulip_timer+0x0/0x38
> 
> That one is often followed by:
> 
> [   29.833216] ODEBUG: init active (active state 0) object type: timer_list hint: 0x0
> [   29.833723] WARNING: CPU: 0 PID: 365 at lib/debugobjects.c:502 debug_print_object+0xb8/0x100
> 
> in the same code line, suggesting that the timer may be shut down
> more than once (?). Then there is another crash with
> 
> [   29.863890] Insufficient stack space to handle exception!
> [   29.863939] ESR: 0x0000000096000047 -- DABT (current EL)
> [   29.863952] FAR: 0xffff800008b17f80
> [   29.863959] Task stack:     [0xffff800008b18000..0xffff800008b1c000]
> [   29.863967] IRQ stack:      [0xffff800008000000..0xffff800008004000]
> [   29.863975] Overflow stack: [0xffff30a35fe7a0f0..0xffff30a35fe7b0f0]
> [   29.863987] CPU: 0 PID: 365 Comm: ip Tainted: G        W        N 6.1.0-rc2-00138-gced58c742836 #1
> [   29.863999] Hardware name: linux,dummy-virt (DT)
> 
> followed by a sequence of
> 
> [   29.864572]  __try_to_del_timer_sync+0x40/0xdc
> [   29.864582]  __del_timer_sync+0xa4/0x100
> [   29.864590]  timer_fixup_init+0x2c/0x5c
> [   29.864599]  __debug_object_init+0x248/0x53c
> [   29.864606]  debug_object_init+0x24/0x30
> [   29.864614]  timer_fixup_init+0x40/0x5c
> [   29.864622]  __debug_object_init+0x248/0x53c
> [   29.864630]  debug_object_init+0x24/0x30
> [   29.864637]  timer_fixup_init+0x40/0x5c
> [   29.864645]  __debug_object_init+0x248/0x53c
> [   29.864658]  debug_object_init+0x24/0x30
> [   29.864666]  timer_fixup_init+0x40/0x5c
> ...
> [   29.866492]  debug_object_init+0x24/0x30
> [   29.866500]  init_timer_key+0x7c/0x80
> [   29.866508]  tulip_down+0x1d0/0x24c
> [   29.866518]  tulip_close+0x3c/0xc0
> 
> This crash is seen with variants on several platforms/architectures.

Hmm, I'll have to take a deeper look here.

> 
> ---
> 
> WARNING: CPU: 0 PID: 8 at lib/debugobjects.c:502 debug_print_object+0xa4/0xd8
> ODEBUG: free active (active state 0) object type: timer_list hint: addrconf_dad_work+0x0/0x628

Hmm, I was afraid of this one.

Looks to be from the workqueue code for delayed work. I don't see a place
that frees besides just flushing it. And we can't determine if a flush is
permanent or will be armed again.

May need to add a helper function that resets the debugging so that it
doesn't trigger on delayed work timers. There's a DEBUG_OBJECTS_WORK that
handles bad work queue issues.


> 
> ---
> mips64:
> 
> WARNING: CPU: 0 PID: 280 at lib/debugobjects.c:502 debug_print_object+0xa4/0xd8
> ODEBUG: init active (active state 0) object type: timer_list hint: 0x0

Not a very good hint.

> Modules linked in:
> CPU: 0 PID: 280 Comm: ip Tainted: G        W        N 6.1.0-rc2-00138-gced58c742836 #1
> Stack : 0000000000000056 ffffffffffffffff 0000000000000008 00b65275abba8126
>         00b65275abba8126 0000000000000000 9000000004193728 ffffffff80f29f38
>         ffffffff8105c9d8 ffffffffffff8880 9000000004193668 0000000000000000
>         0000000000000005 0000000000000010 ffffffff80c731a0 ffffffff81030000
>         2020205720202020 ffffffff81030000 0000000000000000 ffffffff80f29f38
>         0000000000000009 00000000000001f6 ffffffff8063d45c ffffffff81cb94c0
>         ffffffff81d76240 0000000000000000 ffffffff807575a8 0000000000000000
>         ffffffff81230000 9000000004190000 9000000004193720 9000000018002030
>         ffffffff80c8f528 0000000000000000 0000000000000000 00b65275abba8126
>         ffffffff8122e050 000000001000a4e1 ffffffff8010a884 00b65275abba8126
>         ...
> Call Trace:
> [<ffffffff8010a884>] show_stack+0x3c/0x120
> [<ffffffff80c8f528>] dump_stack_lvl+0x4c/0x90
> [<ffffffff80130e54>] __warn+0xdc/0x1c8
> [<ffffffff80c75a08>] warn_slowpath_fmt+0x98/0xc4
> [<ffffffff8063d45c>] debug_print_object+0xa4/0xd8
> [<ffffffff8063e590>] __debug_object_init+0x2d0/0x670
> [<ffffffff801b44b8>] timer_fixup_init+0x40/0x58
> 
> irq event stamp: 1537
> hardirqs last  enabled at (1549): [<ffffffff8018e67c>] __up_console_sem+0x9c/0xc8
> hardirqs last disabled at (1560): [<ffffffff8018e658>] __up_console_sem+0x78/0xc8
> softirqs last  enabled at (1324): [<ffffffff80a572bc>] dev_deactivate_many+0x32c/0x458
> softirqs last disabled at (1322): [<ffffffff80a57288>] dev_deactivate_many+0x2f8/0x458
> ---[ end trace 0000000000000000 ]---
> 
> =====================================
> WARNING: bad unlock balance detected!
> 6.1.0-rc2-00138-gced58c742836 #1 Tainted: G        W        N
> -------------------------------------
> �������␈␂/-1 is trying to release lock (&obj_hash[i].lock) at:
> [<ffffffff801b41e0>] __try_to_del_timer_sync+0x48/0x140
> but there are no more locks to release!
> 
> other info that might help us debug this:
> qemu-system-mips64: terminating on signal 15 from pid 2525787 (/bin/bash)
> 
> ---
> Openrisc fails completely. Lots of the following, then boot stalls.
> 
> WARNING: CPU: 0 PID: 88 at lib/debugobjects.c:502 debug_print_object+0xc0/0xe8
> ODEBUG: init active (active state 0) object type: timer_list hint: 0x0
> Modules linked in:
> CPU: 0 PID: 88 Comm: udhcpc Not tainted 6.1.0-rc2-00138-gced58c742836 #1
> Call trace:
> [<8eaa3f51>] dump_stack_lvl+0x44/0x80
> [<4da6c5ef>] dump_stack+0x1c/0x2c
> [<1b9f58b7>] __warn+0xdc/0x118
> [<86d4d066>] ? debug_print_object+0xc0/0xe8
> [<689beae8>] warn_slowpath_fmt+0x78/0x90
> [<86d4d066>] debug_print_object+0xc0/0xe8
> [<872f4074>] __debug_object_init+0x2bc/0x7f4
> [<c30d6214>] ? _raw_spin_unlock_irqrestore+0x50/0x84
> [<c3f3be91>] ? debug_check_no_locks_freed+0xb8/0x194
> [<558675c1>] ? slob_alloc+0xe8/0x350
> [<26225550>] ? lockdep_init_map_type+0x68/0x38c
> [<c3f3be91>] ? debug_check_no_locks_freed+0xb8/0x194
> [<2c9cf7b5>] ? inet_create+0x2e8/0x404
> [<bab055df>] debug_object_init+0x30/0x40
> [<6d2e468f>] init_timer_key+0xb4/0x110
> [<60c0d3e7>] ? sk_init_common+0x1a0/0x1c0
> [<b52686c2>] sock_init_data+0x60/0x2a4
> [<1ff904c3>] ? sk_alloc+0xe8/0x138
> [<ca3e6ef9>] inet_create+0x1e0/0x404
> [<623d726a>] ? inet_create+0x80/0x404
> [<a3256728>] ? lock_release+0x1c0/0x30c
> [<960a454e>] __sock_create+0x140/0x288
> [<0290cd64>] ? __sock_create+0x98/0x288
> [<b62bc649>] __sys_socket+0x7c/0x128
> [<fa003224>] ? do_work_pending+0x4c/0x118
> [<705147e9>] sys_socket+0x14/0x24
> [<9c4e015d>] ? _syscall_return+0x0/0x4
> ---[ end trace 0000000000000000 ]---
> 
> ---
> parisc crashes.
> 
> [    3.015186] ------------[ cut here ]------------
> [    3.015580] ODEBUG: init active (active state 0) object type: timer_list hint: timeout_waiting_on_port+0x0/0x2c
> [    3.016945] WARNING: CPU: 0 PID: 1 at lib/debugobjects.c:502 debug_print_object+0x98/0xc8
> ...
> [    3.026117] ODEBUG: init active (active state 0) object type: timer_list hint: 0x0
> [    3.026355] WARNING: CPU: 0 PID: 1 at lib/debugobjects.c:502 debug_print_object+0x98/0xc8
> ...
> [ many more of the same ]
> 
> [    3.054542] stackcheck: swapper/0 will most likely overflow irq stack (sp:11e51800, stk bottom-top:11460004-11468004)
> [    3.054592] Kernel panic - not syncing: low stack detected by irq handler - check messages
> 
> and boom.
> 
> ---
> usb:
> 
> [   23.993737] WARNING: CPU: 0 PID: 343 at lib/debugobjects.c:502 debug_print_object+0xac/0xc8
> [   23.993953] ODEBUG: free active (active state 0) object type: timer_list hint: hub_init_func2+0x0/0xc

Yeah, there's going to be lots of these.

> 
> ---
> ppc:
> 
> [    6.607478][    T1] ODEBUG: init active (active state 0) object type: timer_list hint: .ibmvscsi_timeout+0x0/0x58
> [    6.608536][    T1] WARNING: CPU: 0 PID: 1 at lib/debugobjects.c:502 .debug_print_object+0xbc/0xf0
> 
> followed by several of
> 
> [    6.623661][    T1] ODEBUG: init active (active state 0) object type: timer_list hint: 0x0
> [    6.624155][    T1] WARNING: CPU: 0 PID: 1 at lib/debugobjects.c:502 .debug_print_object+0xbc/0xf0
> 
> and finally:
> 
> [    6.694785][    C0] BUG: Kernel NULL pointer dereference on read at 0x000000b8
> [    6.694796][    C0] Faulting instruction address: 0xc0000000001ece44
> [    6.694807][    C0] Thread overran stack, or stack corrupted
> [    6.694967][    C0] Oops: Kernel access of bad area, sig: 7 [#1]



Anyway, this is all an issue with the last patch that makes calling
shutdown a requirement before freeing and not just a hint.

You may want to test all but the last patch, as only the last patch is
what's going to trigger all the above, as it modifies the semantics of
DEBUG_OBJECTS_TIMER.

But for the tcp one, add this to the queue and see if the tcp one goes away?

-- Steve

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index ecf625378078..e9838ce6a6cd 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1255,7 +1255,7 @@ int __del_timer(struct timer_list *timer, bool free)
 	if (timer_pending(timer)) {
 		base = lock_timer_base(timer, &flags);
 		ret = detach_if_pending(timer, base, true);
-		if (free && ret) {
+		if (free) {
 			timer->function = NULL;
 			debug_timer_deactivate(timer);
 		}

Guenter Roeck Oct. 27, 2022, 11:24 p.m. UTC | #20

On Thu, Oct 27, 2022 at 06:58:59PM -0400, Steven Rostedt wrote:
> On Thu, 27 Oct 2022 15:24:04 -0700
> Guenter Roeck <linux@roeck-us.net> wrote:
> 
> > On Thu, Oct 27, 2022 at 11:05:25AM -0400, Steven Rostedt wrote:
> > > 
> > > Back in April, I posted an RFC patch set to help mitigate a common issue
> > > where a timer gets armed just before it is freed, and when the timer
> > > goes off, it crashes in the timer code without any evidence of who the
> > > culprit was. I got side tracked and never finished up on that patch set.
> > > Since this type of crash is still our #1 crash we are seeing in the field,
> > > it has become a priority again to finish it.
> > > 
> > > This is v2 of that patch set. Thomas Gleixner posted an untested version
> > > that makes timer->function NULL as the flag that it is shutdown. I took that
> > > code, tested it (fixed it up), added more comments, and changed the
> > > name to del_timer_shutdown() as Linus had asked. I also converted it to use
> > > WARN_ON_ONCE() instead of just WARN_ON() as Linus asked for that too.
> > >   
> > 
> > Here are various warnings and crashes. Complete logs are at
> > 
> > https://kerneltests.org/builders
> > 
> > in the "testing" column of the qemu test results.
> > 
> > This is with the published patch set plus the fixups in 
> > timer_fixup_init() and timer_fixup_free().
> > 
> > Guenter
> > 
> > ---
> > Block code:
> > 
> > WARNING: CPU: 0 PID: 8 at lib/debugobjects.c:502 debug_print_object+0x66/0x7a
> > ODEBUG: free active (active state 0) object type: timer_list hint: blk_rq_timed_out_timer+0x0/0xc
> 
> This is probably just missing a switch to shutdown.
> 
Wild shot, and I don't really know what I am doing,
but I'll try

iff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index e71b3b43927c..6c1cb4a219f0 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -767,6 +767,8 @@ static void blk_release_queue(struct kobject *kobj)

        might_sleep();

+       del_timer_shutdown(&q->timeout);
+
        percpu_ref_exit(&q->q_usage_counter);

        if (q->poll_stat)

Guenter

Steven Rostedt Oct. 27, 2022, 11:55 p.m. UTC | #21

On Thu, 27 Oct 2022 16:24:42 -0700
Guenter Roeck <linux@roeck-us.net> wrote:

> > This is probably just missing a switch to shutdown.
> >   
> Wild shot, and I don't really know what I am doing,
> but I'll try
> 
> iff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> index e71b3b43927c..6c1cb4a219f0 100644
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -767,6 +767,8 @@ static void blk_release_queue(struct kobject *kobj)
> 
>         might_sleep();
> 
> +       del_timer_shutdown(&q->timeout);
> +
>         percpu_ref_exit(&q->q_usage_counter);
> 
>         if (q->poll_stat)

Or perhaps this:

-- Steve

diff --git a/block/blk-core.c b/block/blk-core.c
index 17667159482e..69b1daa2e91a 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -227,7 +227,7 @@ const char *blk_status_to_str(blk_status_t status)
  */
 void blk_sync_queue(struct request_queue *q)
 {
-	del_timer_sync(&q->timeout);
+	del_timer_shutdown(&q->timeout);
 	cancel_work_sync(&q->timeout_work);
 }
 EXPORT_SYMBOL(blk_sync_queue);
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index e71b3b43927c..12a1e46536ed 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -769,6 +769,8 @@ static void blk_release_queue(struct kobject *kobj)
 
 	percpu_ref_exit(&q->q_usage_counter);
 
+	blk_sync_queue(q);
+
 	if (q->poll_stat)
 		blk_stat_remove_callback(q, q->poll_cb);
 	blk_stat_free_callback(q->poll_cb);

Guenter Roeck Oct. 28, 2022, 12:54 a.m. UTC | #22

On 10/27/22 16:55, Steven Rostedt wrote:
> On Thu, 27 Oct 2022 16:24:42 -0700
> Guenter Roeck <linux@roeck-us.net> wrote:
> 
>>> This is probably just missing a switch to shutdown.
>>>    
>> Wild shot, and I don't really know what I am doing,
>> but I'll try
>>
>> iff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
>> index e71b3b43927c..6c1cb4a219f0 100644
>> --- a/block/blk-sysfs.c
>> +++ b/block/blk-sysfs.c
>> @@ -767,6 +767,8 @@ static void blk_release_queue(struct kobject *kobj)
>>
>>          might_sleep();
>>
>> +       del_timer_shutdown(&q->timeout);
>> +
>>          percpu_ref_exit(&q->q_usage_counter);
>>
>>          if (q->poll_stat)
> 
> Or perhaps this:
> 

I think you are correct. Let me give it a try.

> -- Steve
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 17667159482e..69b1daa2e91a 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -227,7 +227,7 @@ const char *blk_status_to_str(blk_status_t status)
>    */
>   void blk_sync_queue(struct request_queue *q)
>   {
> -	del_timer_sync(&q->timeout);
> +	del_timer_shutdown(&q->timeout);
>   	cancel_work_sync(&q->timeout_work);
>   }
>   EXPORT_SYMBOL(blk_sync_queue);
> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> index e71b3b43927c..12a1e46536ed 100644
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -769,6 +769,8 @@ static void blk_release_queue(struct kobject *kobj)
>   
>   	percpu_ref_exit(&q->q_usage_counter);
>   
> +	blk_sync_queue(q);
> +
This might not even be needed. I'll try without it first.

Thanks,
Guenter

>   	if (q->poll_stat)
>   		blk_stat_remove_callback(q, q->poll_cb);
>   	blk_stat_free_callback(q->poll_cb);

Steven Rostedt Oct. 28, 2022, 3:17 a.m. UTC | #23

On Fri, 28 Oct 2022 10:18:15 +0800
Hillf Danton <hdanton@sina.com> wrote:

> On 27 Oct 2022 11:05:45 -0400 Steven Rostedt (Google) <rostedt@goodmis.org>
> > 
> > --- a/drivers/usb/core/hub.c
> > +++ b/drivers/usb/core/hub.c
> > @@ -1261,6 +1261,9 @@ static void hub_activate(struct usb_hub *hub, enum hub_activation_type type)
> >  
> >  		/* Don't do a long sleep inside a workqueue routine */
> >  		if (type == HUB_INIT2) {
> > +			/* Timers must be shutdown before they are re-initialized */
> > +			if (hub->init_work.work.func)
> > +				del_timer_shutdown(&hub->init_work.timer);  
> 
> This is not needed in the workqueue callback as the timer in question
> is not pending.

This was added because of the updates to DEBUG_OBJECTS_TIMERS that changed
it to require a shutdown to remove the activation of the timer. This is to
detect the possibility that a timer may become active just before freeing
(there's way too many bugs that show that code logic is not enough).

This code in particular is troubling because it re-initializes an already
initialized timer with a new function. This causes the debug-objects to
trigger an "object activated while initializing" warning.

I originally added the "shutdown" to deactivate the object before you
re-initialize it. But I have since updated the code to keep track of if it
was ever activated, and if so, not to call the init code again, so this may
not be required anymore.

I'm still trying to work out the kinks as the users of timers have become
adapted to the implementation, and may need to add some other helpers to
make this work.

-- Steve

> 
> >  			INIT_DELAYED_WORK(&hub->init_work, hub_init_func3);
> >  			queue_delayed_work(system_power_efficient_wq,
> >  					&hub->init_work,

Christoph Hellwig Oct. 28, 2022, 8:26 a.m. UTC | #24

This is just a single patch out of apparently 31, which claims that
something that doesn't even exist in mainline must be used without any
explanation.  How do you expect anyone to be able to review it?

Steven Rostedt Oct. 28, 2022, 10:24 a.m. UTC | #25

On Fri, 28 Oct 2022 01:26:03 -0700
Christoph Hellwig <hch@infradead.org> wrote:

> This is just a single patch out of apparently 31, which claims that
> something that doesn't even exist in mainline must be used without any
> explanation.  How do you expect anyone to be able to review it?

  https://lore.kernel.org/all/20221027150525.753064657@goodmis.org/

Only the first patch is relevant to you. I guess the Cc list would have
been too big to Cc everyone that was Cc'd in the series.

It not being in mainline is the reason I marked it RFC. As it's more of an
FYI than a pull it in request.

-- Steve

Jens Axboe Oct. 28, 2022, 1:56 p.m. UTC | #26

On 10/28/22 4:24 AM, Steven Rostedt wrote:
> On Fri, 28 Oct 2022 01:26:03 -0700
> Christoph Hellwig <hch@infradead.org> wrote:
> 
>> This is just a single patch out of apparently 31, which claims that
>> something that doesn't even exist in mainline must be used without any
>> explanation.  How do you expect anyone to be able to review it?
> 
>   https://lore.kernel.org/all/20221027150525.753064657@goodmis.org/
> 
> Only the first patch is relevant to you. I guess the Cc list would have
> been too big to Cc everyone that was Cc'd in the series.

No it's not, because how on earth would anyone know what the change does
if you only see the simple s/name/newname change? The patch is useless
by itself.

Steven Rostedt Oct. 28, 2022, 2:06 p.m. UTC | #27

On Fri, 28 Oct 2022 07:56:50 -0600
Jens Axboe <axboe@kernel.dk> wrote:

> On 10/28/22 4:24 AM, Steven Rostedt wrote:
> > On Fri, 28 Oct 2022 01:26:03 -0700
> > Christoph Hellwig <hch@infradead.org> wrote:
> >   
> >> This is just a single patch out of apparently 31, which claims that
> >> something that doesn't even exist in mainline must be used without any
> >> explanation.  How do you expect anyone to be able to review it?  
> > 
> >   https://lore.kernel.org/all/20221027150525.753064657@goodmis.org/
> > 
> > Only the first patch is relevant to you. I guess the Cc list would have
> > been too big to Cc everyone that was Cc'd in the series.  
> 
> No it's not, because how on earth would anyone know what the change does
> if you only see the simple s/name/newname change? The patch is useless
> by itself.
> 

I meant this as the first patch:

  https://lore.kernel.org/all/20221027150925.248421571@goodmis.org/

Which was what the link above was suppose to point to.

It's the only patch relevant to the rest of the series, as the rest is just
converting over to the shutdown API, and the last patch changes
DEBUG_OBJECTS_TIMERS to catch if this was done properly.

That is, patch 01/31 and the patch you were Cc'd on is relevant, and for
those that want to look deeper, see patch 31 as well.

But if I included the Cc list for patch 01 for all those Cc'd in the
entire series, it would be a huge Cc list, so I avoided doing so.

Also, this is still RFC as the changes may still change. That is, this
patch set is a heads up to what is to come. Ideally, I'd like to get just
the API possibly in the kernel before the merge window without anyone using
it. Then I can ask all the sub systems to pull in these individual patches
as well.

-- Steve

Jens Axboe Oct. 28, 2022, 2:11 p.m. UTC | #28

On 10/28/22 8:06 AM, Steven Rostedt wrote:
> On Fri, 28 Oct 2022 07:56:50 -0600
> Jens Axboe <axboe@kernel.dk> wrote:
> 
>> On 10/28/22 4:24 AM, Steven Rostedt wrote:
>>> On Fri, 28 Oct 2022 01:26:03 -0700
>>> Christoph Hellwig <hch@infradead.org> wrote:
>>>   
>>>> This is just a single patch out of apparently 31, which claims that
>>>> something that doesn't even exist in mainline must be used without any
>>>> explanation.  How do you expect anyone to be able to review it?  
>>>
>>>   https://lore.kernel.org/all/20221027150525.753064657@goodmis.org/
>>>
>>> Only the first patch is relevant to you. I guess the Cc list would have
>>> been too big to Cc everyone that was Cc'd in the series.  
>>
>> No it's not, because how on earth would anyone know what the change does
>> if you only see the simple s/name/newname change? The patch is useless
>> by itself.
>>
> 
> I meant this as the first patch:
> 
>   https://lore.kernel.org/all/20221027150925.248421571@goodmis.org/
> 
> Which was what the link above was suppose to point to.
> 
> It's the only patch relevant to the rest of the series, as the rest is just
> converting over to the shutdown API, and the last patch changes
> DEBUG_OBJECTS_TIMERS to catch if this was done properly.
> 
> That is, patch 01/31 and the patch you were Cc'd on is relevant, and for
> those that want to look deeper, see patch 31 as well.

So we got half of what was needed to make any kind of sense of judgement
on the patch.

> But if I included the Cc list for patch 01 for all those Cc'd in the
> entire series, it would be a huge Cc list, so I avoided doing so.

And my point is that just CC'ing the relevant list for patch 4/31 is
useless. Do we need to see the whole series? No. Does everyone need to
see patch 1/31? Yes, very much so. Without that, 4/31 means nothing.

This is pretty common for tree wide changes. The relevant lists need
to see the full context, patch 4/31 by itself is useless and may as well
not be sent at this point then.

Steven Rostedt Oct. 28, 2022, 2:30 p.m. UTC | #29

On Fri, 28 Oct 2022 08:11:27 -0600
Jens Axboe <axboe@kernel.dk> wrote:

> This is pretty common for tree wide changes. The relevant lists need
> to see the full context, patch 4/31 by itself is useless and may as well
> not be sent at this point then.

Ah, I didn't think about just including the mailing lists. The Cc lists
were auto-generated, and I didn't think about just taking out the lists.

Will do that for v2.

Thanks,

-- Steve

Guenter Roeck Oct. 28, 2022, 3:11 p.m. UTC | #30

On 10/27/22 08:05, Steven Rostedt wrote:
> From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
> 
> Before a timer is freed, del_timer_shutdown() must be called.
> 

I also had to add the following, as you had already suggested.

Just changing blk_sync_queue() was insufficient; I had to add the call from
blk_release_queue() because otherwise blk_sync_queue() was not always called.

Thanks,
Guenter

---
diff --git a/block/blk-core.c b/block/blk-core.c
index 17667159482e..69b1daa2e91a 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -227,7 +227,7 @@ const char *blk_status_to_str(blk_status_t status)
   */
  void blk_sync_queue(struct request_queue *q)
  {
-       del_timer_sync(&q->timeout);
+       del_timer_shutdown(&q->timeout);
         cancel_work_sync(&q->timeout_work);
  }
  EXPORT_SYMBOL(blk_sync_queue);
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index e71b3b43927c..12a1e46536ed 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -769,6 +769,8 @@ static void blk_release_queue(struct kobject *kobj)

         percpu_ref_exit(&q->q_usage_counter);

+       blk_sync_queue(q);
+
         if (q->poll_stat)
                 blk_stat_remove_callback(q, q->poll_cb);
         blk_stat_free_callback(q->poll_cb);

Guenter Roeck Oct. 28, 2022, 3:30 p.m. UTC | #31

On 10/27/22 15:58, Steven Rostedt wrote:
> On Thu, 27 Oct 2022 15:24:04 -0700
> Guenter Roeck <linux@roeck-us.net> wrote:
> 
>> On Thu, Oct 27, 2022 at 11:05:25AM -0400, Steven Rostedt wrote:
>>>
>>> Back in April, I posted an RFC patch set to help mitigate a common issue
>>> where a timer gets armed just before it is freed, and when the timer
>>> goes off, it crashes in the timer code without any evidence of who the
>>> culprit was. I got side tracked and never finished up on that patch set.
>>> Since this type of crash is still our #1 crash we are seeing in the field,
>>> it has become a priority again to finish it.
>>>
>>> This is v2 of that patch set. Thomas Gleixner posted an untested version
>>> that makes timer->function NULL as the flag that it is shutdown. I took that
>>> code, tested it (fixed it up), added more comments, and changed the
>>> name to del_timer_shutdown() as Linus had asked. I also converted it to use
>>> WARN_ON_ONCE() instead of just WARN_ON() as Linus asked for that too.
>>>    
>>
>> Here are various warnings and crashes. Complete logs are at
>>
>> https://kerneltests.org/builders
>>
>> in the "testing" column of the qemu test results.
>>
>> This is with the published patch set plus the fixups in
>> timer_fixup_init() and timer_fixup_free().
>>
>> Guenter
>>
>> ---
[ ... ]

>>
>> WARNING: CPU: 0 PID: 280 at lib/debugobjects.c:502 debug_print_object+0xa4/0xd8
>> ODEBUG: init active (active state 0) object type: timer_list hint: tulip_timer+0x0/0x38
>>


The problem is that the tulip code calls timer_setup() repeatedly (and
unnecessarily). Apparently either the new timer code and/or the associated
ODEBUG code doesn't like that. The patch below fixes the problem.

I think there needs to be a means to handle that situation gracefully.
The parport code has the same problem (see second patch below), and
I am sure there are others.

Thanks,
Guenter

---
tulip:

diff --git a/drivers/net/ethernet/dec/tulip/tulip_core.c b/drivers/net/ethernet/dec/tulip/tulip_core.c
index ecfad43df45a..0c86066929d3 100644
--- a/drivers/net/ethernet/dec/tulip/tulip_core.c
+++ b/drivers/net/ethernet/dec/tulip/tulip_core.c
@@ -770,8 +770,6 @@ static void tulip_down (struct net_device *dev)

         spin_unlock_irqrestore (&tp->lock, flags);

-       timer_setup(&tp->timer, tulip_tbl[tp->chip_id].media_timer, 0);
-
         dev->if_port = tp->saved_if_port;

         /* Leave the driver in snooze, not sleep, mode. */
@@ -1869,10 +1867,14 @@ static int __maybe_unused tulip_resume(struct device *dev_d)
  static void tulip_remove_one(struct pci_dev *pdev)
  {
         struct net_device *dev = pci_get_drvdata (pdev);
+       struct tulip_private *tp;

         if (!dev)
                 return;

+       tp = netdev_priv(dev);
+       del_timer_shutdown(&tp->timer);
+
         unregister_netdev(dev);
  }


---
parport:

diff --git a/drivers/parport/ieee1284.c b/drivers/parport/ieee1284.c
index 4547ac44c8d4..50dbd2ea23fc 100644
--- a/drivers/parport/ieee1284.c
+++ b/drivers/parport/ieee1284.c
@@ -73,7 +73,7 @@ int parport_wait_event (struct parport *port, signed long timeout)
         timer_setup(&port->timer, timeout_waiting_on_port, 0);
         mod_timer(&port->timer, jiffies + timeout);
         ret = down_interruptible (&port->physport->ieee1284.irq);
-       if (!del_timer_sync(&port->timer) && !ret)
+       if (!del_timer_shutdown(&port->timer) && !ret)
                 /* Timed out. */
                 ret = 1;

Guenter Roeck Oct. 28, 2022, 4:10 p.m. UTC | #32

On 10/27/22 15:58, Steven Rostedt wrote:
> On Thu, 27 Oct 2022 15:24:04 -0700
> Guenter Roeck <linux@roeck-us.net> wrote:
> 
>> On Thu, Oct 27, 2022 at 11:05:25AM -0400, Steven Rostedt wrote:
>>>
>>> Back in April, I posted an RFC patch set to help mitigate a common issue
>>> where a timer gets armed just before it is freed, and when the timer
>>> goes off, it crashes in the timer code without any evidence of who the
>>> culprit was. I got side tracked and never finished up on that patch set.
>>> Since this type of crash is still our #1 crash we are seeing in the field,
>>> it has become a priority again to finish it.
>>>
>>> This is v2 of that patch set. Thomas Gleixner posted an untested version
>>> that makes timer->function NULL as the flag that it is shutdown. I took that
>>> code, tested it (fixed it up), added more comments, and changed the
>>> name to del_timer_shutdown() as Linus had asked. I also converted it to use
>>> WARN_ON_ONCE() instead of just WARN_ON() as Linus asked for that too.
>>>    
>>
>> Here are various warnings and crashes. Complete logs are at
>>
>> https://kerneltests.org/builders
>>
>> in the "testing" column of the qemu test results.
>>
>> This is with the published patch set plus the fixups in
>> timer_fixup_init() and timer_fixup_free().
>>
>> Guenter
>>
>> ---
>> Block code:
>>
>> WARNING: CPU: 0 PID: 8 at lib/debugobjects.c:502 debug_print_object+0x66/0x7a
>> ODEBUG: free active (active state 0) object type: timer_list hint: blk_rq_timed_out_timer+0x0/0xc
> 
> This is probably just missing a switch to shutdown.
> 
>>
>> ---
>> In tcp:
>>
>> WARNING: CPU: 0 PID: 255 at lib/debugobjects.c:502 debug_print_object+0xa4/0xd8
>> ODEBUG: free active (active state 0) object type: timer_list hint: tcp_keepalive_timer+0x0/0x3a8
>>
>> WARNING: CPU: 0 PID: 255 at lib/debugobjects.c:502 debug_print_object+0xa4/0xd8
>> ODEBUG: free active (active state 0) object type: timer_list hint: tcp_write_timer+0x0/0x1a8
> 
> Looking at these, I think the del_timer_try_shutdown() needs to always
> shutdown (let's try that, and see if that fixes things before we rename
> anything).
> 
>>
>> ---
>> Other networking:
>>
>> WARNING: CPU: 0 PID: 8 at lib/debugobjects.c:502 debug_print_object+0xa4/0xd8
>> ODEBUG: free active (active state 0) object type: timer_list hint: neigh_timer_handler+0x0/0x468
>>
>> ---
>>
>> WARNING: CPU: 0 PID: 280 at lib/debugobjects.c:502 debug_print_object+0xa4/0xd8
>> ODEBUG: init active (active state 0) object type: timer_list hint: tulip_timer+0x0/0x38
>>
>> That one is often followed by:
>>
>> [   29.833216] ODEBUG: init active (active state 0) object type: timer_list hint: 0x0
>> [   29.833723] WARNING: CPU: 0 PID: 365 at lib/debugobjects.c:502 debug_print_object+0xb8/0x100
>>
>> in the same code line, suggesting that the timer may be shut down
>> more than once (?). Then there is another crash with
>>
>> [   29.863890] Insufficient stack space to handle exception!
>> [   29.863939] ESR: 0x0000000096000047 -- DABT (current EL)
>> [   29.863952] FAR: 0xffff800008b17f80
>> [   29.863959] Task stack:     [0xffff800008b18000..0xffff800008b1c000]
>> [   29.863967] IRQ stack:      [0xffff800008000000..0xffff800008004000]
>> [   29.863975] Overflow stack: [0xffff30a35fe7a0f0..0xffff30a35fe7b0f0]
>> [   29.863987] CPU: 0 PID: 365 Comm: ip Tainted: G        W        N 6.1.0-rc2-00138-gced58c742836 #1
>> [   29.863999] Hardware name: linux,dummy-virt (DT)
>>
>> followed by a sequence of
>>
>> [   29.864572]  __try_to_del_timer_sync+0x40/0xdc
>> [   29.864582]  __del_timer_sync+0xa4/0x100
>> [   29.864590]  timer_fixup_init+0x2c/0x5c
>> [   29.864599]  __debug_object_init+0x248/0x53c
>> [   29.864606]  debug_object_init+0x24/0x30
>> [   29.864614]  timer_fixup_init+0x40/0x5c
>> [   29.864622]  __debug_object_init+0x248/0x53c
>> [   29.864630]  debug_object_init+0x24/0x30
>> [   29.864637]  timer_fixup_init+0x40/0x5c
>> [   29.864645]  __debug_object_init+0x248/0x53c
>> [   29.864658]  debug_object_init+0x24/0x30
>> [   29.864666]  timer_fixup_init+0x40/0x5c
>> ...
>> [   29.866492]  debug_object_init+0x24/0x30
>> [   29.866500]  init_timer_key+0x7c/0x80
>> [   29.866508]  tulip_down+0x1d0/0x24c
>> [   29.866518]  tulip_close+0x3c/0xc0
>>
>> This crash is seen with variants on several platforms/architectures.
> 
> Hmm, I'll have to take a deeper look here.
> 
>>
>> ---
>>
>> WARNING: CPU: 0 PID: 8 at lib/debugobjects.c:502 debug_print_object+0xa4/0xd8
>> ODEBUG: free active (active state 0) object type: timer_list hint: addrconf_dad_work+0x0/0x628
> 
> Hmm, I was afraid of this one.
> 
> Looks to be from the workqueue code for delayed work. I don't see a place
> that frees besides just flushing it. And we can't determine if a flush is
> permanent or will be armed again.
> 
> May need to add a helper function that resets the debugging so that it
> doesn't trigger on delayed work timers. There's a DEBUG_OBJECTS_WORK that
> handles bad work queue issues.
> 
> 
>>
>> ---
>> mips64:
>>
>> WARNING: CPU: 0 PID: 280 at lib/debugobjects.c:502 debug_print_object+0xa4/0xd8
>> ODEBUG: init active (active state 0) object type: timer_list hint: 0x0
> 
> Not a very good hint.
> 
>> Modules linked in:
>> CPU: 0 PID: 280 Comm: ip Tainted: G        W        N 6.1.0-rc2-00138-gced58c742836 #1
>> Stack : 0000000000000056 ffffffffffffffff 0000000000000008 00b65275abba8126
>>          00b65275abba8126 0000000000000000 9000000004193728 ffffffff80f29f38
>>          ffffffff8105c9d8 ffffffffffff8880 9000000004193668 0000000000000000
>>          0000000000000005 0000000000000010 ffffffff80c731a0 ffffffff81030000
>>          2020205720202020 ffffffff81030000 0000000000000000 ffffffff80f29f38
>>          0000000000000009 00000000000001f6 ffffffff8063d45c ffffffff81cb94c0
>>          ffffffff81d76240 0000000000000000 ffffffff807575a8 0000000000000000
>>          ffffffff81230000 9000000004190000 9000000004193720 9000000018002030
>>          ffffffff80c8f528 0000000000000000 0000000000000000 00b65275abba8126
>>          ffffffff8122e050 000000001000a4e1 ffffffff8010a884 00b65275abba8126
>>          ...
>> Call Trace:
>> [<ffffffff8010a884>] show_stack+0x3c/0x120
>> [<ffffffff80c8f528>] dump_stack_lvl+0x4c/0x90
>> [<ffffffff80130e54>] __warn+0xdc/0x1c8
>> [<ffffffff80c75a08>] warn_slowpath_fmt+0x98/0xc4
>> [<ffffffff8063d45c>] debug_print_object+0xa4/0xd8
>> [<ffffffff8063e590>] __debug_object_init+0x2d0/0x670
>> [<ffffffff801b44b8>] timer_fixup_init+0x40/0x58
>>
>> irq event stamp: 1537
>> hardirqs last  enabled at (1549): [<ffffffff8018e67c>] __up_console_sem+0x9c/0xc8
>> hardirqs last disabled at (1560): [<ffffffff8018e658>] __up_console_sem+0x78/0xc8
>> softirqs last  enabled at (1324): [<ffffffff80a572bc>] dev_deactivate_many+0x32c/0x458
>> softirqs last disabled at (1322): [<ffffffff80a57288>] dev_deactivate_many+0x2f8/0x458
>> ---[ end trace 0000000000000000 ]---
>>
>> =====================================
>> WARNING: bad unlock balance detected!
>> 6.1.0-rc2-00138-gced58c742836 #1 Tainted: G        W        N
>> -------------------------------------
>> �������␈␂/-1 is trying to release lock (&obj_hash[i].lock) at:
>> [<ffffffff801b41e0>] __try_to_del_timer_sync+0x48/0x140
>> but there are no more locks to release!
>>
>> other info that might help us debug this:
>> qemu-system-mips64: terminating on signal 15 from pid 2525787 (/bin/bash)
>>
>> ---
>> Openrisc fails completely. Lots of the following, then boot stalls.
>>
>> WARNING: CPU: 0 PID: 88 at lib/debugobjects.c:502 debug_print_object+0xc0/0xe8
>> ODEBUG: init active (active state 0) object type: timer_list hint: 0x0
>> Modules linked in:
>> CPU: 0 PID: 88 Comm: udhcpc Not tainted 6.1.0-rc2-00138-gced58c742836 #1
>> Call trace:
>> [<8eaa3f51>] dump_stack_lvl+0x44/0x80
>> [<4da6c5ef>] dump_stack+0x1c/0x2c
>> [<1b9f58b7>] __warn+0xdc/0x118
>> [<86d4d066>] ? debug_print_object+0xc0/0xe8
>> [<689beae8>] warn_slowpath_fmt+0x78/0x90
>> [<86d4d066>] debug_print_object+0xc0/0xe8
>> [<872f4074>] __debug_object_init+0x2bc/0x7f4
>> [<c30d6214>] ? _raw_spin_unlock_irqrestore+0x50/0x84
>> [<c3f3be91>] ? debug_check_no_locks_freed+0xb8/0x194
>> [<558675c1>] ? slob_alloc+0xe8/0x350
>> [<26225550>] ? lockdep_init_map_type+0x68/0x38c
>> [<c3f3be91>] ? debug_check_no_locks_freed+0xb8/0x194
>> [<2c9cf7b5>] ? inet_create+0x2e8/0x404
>> [<bab055df>] debug_object_init+0x30/0x40
>> [<6d2e468f>] init_timer_key+0xb4/0x110
>> [<60c0d3e7>] ? sk_init_common+0x1a0/0x1c0
>> [<b52686c2>] sock_init_data+0x60/0x2a4
>> [<1ff904c3>] ? sk_alloc+0xe8/0x138
>> [<ca3e6ef9>] inet_create+0x1e0/0x404
>> [<623d726a>] ? inet_create+0x80/0x404
>> [<a3256728>] ? lock_release+0x1c0/0x30c
>> [<960a454e>] __sock_create+0x140/0x288
>> [<0290cd64>] ? __sock_create+0x98/0x288
>> [<b62bc649>] __sys_socket+0x7c/0x128
>> [<fa003224>] ? do_work_pending+0x4c/0x118
>> [<705147e9>] sys_socket+0x14/0x24
>> [<9c4e015d>] ? _syscall_return+0x0/0x4
>> ---[ end trace 0000000000000000 ]---
>>
>> ---
>> parisc crashes.
>>
>> [    3.015186] ------------[ cut here ]------------
>> [    3.015580] ODEBUG: init active (active state 0) object type: timer_list hint: timeout_waiting_on_port+0x0/0x2c
>> [    3.016945] WARNING: CPU: 0 PID: 1 at lib/debugobjects.c:502 debug_print_object+0x98/0xc8
>> ...
>> [    3.026117] ODEBUG: init active (active state 0) object type: timer_list hint: 0x0
>> [    3.026355] WARNING: CPU: 0 PID: 1 at lib/debugobjects.c:502 debug_print_object+0x98/0xc8
>> ...
>> [ many more of the same ]
>>
>> [    3.054542] stackcheck: swapper/0 will most likely overflow irq stack (sp:11e51800, stk bottom-top:11460004-11468004)
>> [    3.054592] Kernel panic - not syncing: low stack detected by irq handler - check messages
>>
>> and boom.
>>
>> ---
>> usb:
>>
>> [   23.993737] WARNING: CPU: 0 PID: 343 at lib/debugobjects.c:502 debug_print_object+0xac/0xc8
>> [   23.993953] ODEBUG: free active (active state 0) object type: timer_list hint: hub_init_func2+0x0/0xc
> 
> Yeah, there's going to be lots of these.
> 
>>
>> ---
>> ppc:
>>
>> [    6.607478][    T1] ODEBUG: init active (active state 0) object type: timer_list hint: .ibmvscsi_timeout+0x0/0x58
>> [    6.608536][    T1] WARNING: CPU: 0 PID: 1 at lib/debugobjects.c:502 .debug_print_object+0xbc/0xf0
>>
>> followed by several of
>>
>> [    6.623661][    T1] ODEBUG: init active (active state 0) object type: timer_list hint: 0x0
>> [    6.624155][    T1] WARNING: CPU: 0 PID: 1 at lib/debugobjects.c:502 .debug_print_object+0xbc/0xf0
>>
>> and finally:
>>
>> [    6.694785][    C0] BUG: Kernel NULL pointer dereference on read at 0x000000b8
>> [    6.694796][    C0] Faulting instruction address: 0xc0000000001ece44
>> [    6.694807][    C0] Thread overran stack, or stack corrupted
>> [    6.694967][    C0] Oops: Kernel access of bad area, sig: 7 [#1]
> 
> 
> 
> Anyway, this is all an issue with the last patch that makes calling
> shutdown a requirement before freeing and not just a hint.
> 

I can see this is a real problem. I see it all over the place. The latest is
in drivers/scsi/ibmvscsi/ibmvscsi.c where, again, timer_setup() is called
repeatedly and the timer is stopped with del_timer(), causing a crash.
Replacing those calls with del_timer_shutdown() or, where this isn't possible,
with del_timer_try_shutdown() fixes that problem.

> You may want to test all but the last patch, as only the last patch is
> what's going to trigger all the above, as it modifies the semantics of
> DEBUG_OBJECTS_TIMER.
> 

I'll do that after running a test with the change below, to see if there
are any other problems.

> But for the tcp one, add this to the queue and see if the tcp one goes away?
> 
> -- Steve
> 
> diff --git a/kernel/time/timer.c b/kernel/time/timer.c
> index ecf625378078..e9838ce6a6cd 100644
> --- a/kernel/time/timer.c
> +++ b/kernel/time/timer.c
> @@ -1255,7 +1255,7 @@ int __del_timer(struct timer_list *timer, bool free)
>   	if (timer_pending(timer)) {
>   		base = lock_timer_base(timer, &flags);
>   		ret = detach_if_pending(timer, base, true);
> -		if (free && ret) {
> +		if (free) {
>   			timer->function = NULL;
>   			debug_timer_deactivate(timer);
>   		}

Just noticed that one (too many patches floating around, sorry).
Applied, and I'll test again.

Thanks,
Guenter

Steven Rostedt Oct. 28, 2022, 6:50 p.m. UTC | #33

Trond,

I'm looking at a commit from 2005:

0f9dc2b16884b ("RPC: Clean up socket autodisconnect")

     Cancel autodisconnect requests inside xprt_transmit() in order to avoid
     races.
     Use more efficient del_singleshot_timer_sync()


I'm working on adding a "shutdown" state to timers, making it required for
freeing the timer. This is to address the numerous bugs we hit where timers
get rearmed just before freeing and then cause a crash in the timer code,
without knowing what timer it was that caused it.

Having a specific shutdown state for timers will remove this problem
because if something tries to rearm a shutdown timer, it will fail and a
WARN_ON_ONCE() is triggered. See below in the "reply" part for a
description of this effort.

The reason for this email, is because that WARN_ON_ONCE() triggered on the
mod_timer() from:

static void
xprt_schedule_autodisconnect(struct rpc_xprt *xprt)
	__must_hold(&xprt->transport_lock)
{
	xprt->last_used = jiffies;
	if (RB_EMPTY_ROOT(&xprt->recv_queue) && xprt_has_timer(xprt))
		mod_timer(&xprt->timer, xprt->last_used + xprt->idle_timeout);
}

That's because xptr->timer was shutdown due to:

int
xprt_request_enqueue_receive(struct rpc_task *task)
{
	[..]
	/* Turn off autodisconnect */
	del_singleshot_timer_sync(&xprt->timer);
	return 0;
}

Now singleshot means just that. It's a single shot and calling
del_singleshot_timer_sync() will shut it down so that it can be freed. That
also means that it can no longer be re-armed.

I'm not sure what you meant by "Use more efficient del_singleshot_timer_sync()"
but I'm guessing since that was written in 2005, it is no longer relevant,
and del_timer_sync() should now be used.

After replacing that with del_timer_sync(), the warning goes away.

I just want to confirm that's OK with you.

Thanks,

-- Steve


On Thu, 27 Oct 2022 11:05:25 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> Back in April, I posted an RFC patch set to help mitigate a common issue
> where a timer gets armed just before it is freed, and when the timer
> goes off, it crashes in the timer code without any evidence of who the
> culprit was. I got side tracked and never finished up on that patch set.
> Since this type of crash is still our #1 crash we are seeing in the field,
> it has become a priority again to finish it.
> 
> This is v2 of that patch set. Thomas Gleixner posted an untested version
> that makes timer->function NULL as the flag that it is shutdown. I took that
> code, tested it (fixed it up), added more comments, and changed the
> name to del_timer_shutdown() as Linus had asked. I also converted it to use
> WARN_ON_ONCE() instead of just WARN_ON() as Linus asked for that too.
> 
> (Thomas, you never added a SoB, so I only added a link to your email
>  in that commit. But as this will likely go through your tree anyway,
>  I'm sure you'll have your SoB on all these).
> 
> I then created a trivial coccinelle script to find where del_timer*()
> is called before being freed, and converted them all to del_timer_shutdown()
> (There was a couple that still used del_timer() instead of del_timer_sync()).
> 
> I also updated DEBUG_OBJECTS_TIMERS to check from where the timer is ever
> armed, to calling of del_timer_shutdown(), and it will trigger if a timer
> is freed in between. The current way is to only check if the timer is armed,
> but that means it only triggers if the race condition is hit, and with
> experience, it's not run on enough machines to catch all of them. By triggering
> it from the time the timer is armed to the time it is shutdown, it catches
> all potential cases even if the race condition is not hit.
> 
> I went though the result of the cocinelle script, and updated the locations.
> Some locations were caught by DEBUG_OBJECTS_TIMERS as the coccinelle script
> only checked for timers being freed in the same function as the del_timer*().
> 
> V1 is found here: https://lore.kernel.org/all/20220407161745.7d6754b3@gandalf.local.home/
> 
> Here's the original text of that version:
> 
>    [
>      This is an RFC patch. As we hit a few bugs were del_timer() is called
>      instead of del_timer_sync() before the timer is freed, and there could
>      be bugs where even del_timer_sync() is used, but the timer gets rearmed,
>      I decided to introduce a "del_timer_free()" function that can be used
>      instead. This will at least educate developers on what to call before they
>      free a structure that holds a timer.
> 
>      In this RFC, I modified hci_qca.c as a use case, even though that change
>      needs some work, because the workqueue could still rearm it (I'm looking
>      to see if I can trigger the warning).
> 
>      If this approach is acceptable, then I will remove the hci_qca.c portion
>      from this patch, and create a series of patches to use the
>      del_timer_free() in all the locations in the kernel that remove the timer
>      before freeing.
>    ]
> 
>    We are hitting a common bug were a timer is being triggered after it is
>    freed. This causes a corruption in the timer link list and crashes the
>    kernel. Unfortunately it is not easy to know what timer it was that was
>    freed. Looking at the code, it appears that there are several cases that
>    del_timer() is used when del_timer_sync() should have been.
> 
>    Add a del_timer_free() that not only does a del_timer_sync() but will mark
>    the timer as freed in case it gets rearmed, it will trigger a WARN_ON. The
>    del_timer_free() is more likely to be used by developers that are about to
>    free a timer, then using del_timer_sync() as the latter is not as obvious
>    to being needed for freeing. Having the word "free" in the name of the
>    function will hopefully help developers know that that function needs to
>    be called before freeing.
> 
>    The added bonus is the marking of the timer as being freed such that it
>    will trigger a warning if it gets rearmed. At least that way if the system
>    crashes on a freed timer, at least we may see which timer it was that was
>    freed.
>

Trond Myklebust Oct. 28, 2022, 8:12 p.m. UTC | #34

Hi Steve,

> On Oct 28, 2022, at 14:50, Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> Trond,
> 
> I'm looking at a commit from 2005:
> 
> 0f9dc2b16884b ("RPC: Clean up socket autodisconnect")
> 
>     Cancel autodisconnect requests inside xprt_transmit() in order to avoid
>     races.
>     Use more efficient del_singleshot_timer_sync()
> 
> 
> I'm working on adding a "shutdown" state to timers, making it required for
> freeing the timer. This is to address the numerous bugs we hit where timers
> get rearmed just before freeing and then cause a crash in the timer code,
> without knowing what timer it was that caused it.
> 
> Having a specific shutdown state for timers will remove this problem
> because if something tries to rearm a shutdown timer, it will fail and a
> WARN_ON_ONCE() is triggered. See below in the "reply" part for a
> description of this effort.
> 
> The reason for this email, is because that WARN_ON_ONCE() triggered on the
> mod_timer() from:
> 
> static void
> xprt_schedule_autodisconnect(struct rpc_xprt *xprt)
> __must_hold(&xprt->transport_lock)
> {
> xprt->last_used = jiffies;
> if (RB_EMPTY_ROOT(&xprt->recv_queue) && xprt_has_timer(xprt))
> mod_timer(&xprt->timer, xprt->last_used + xprt->idle_timeout);
> }
> 
> That's because xptr->timer was shutdown due to:
> 
> int
> xprt_request_enqueue_receive(struct rpc_task *task)
> {
> [..]
> /* Turn off autodisconnect */
> del_singleshot_timer_sync(&xprt->timer);
> return 0;
> }
> 
> Now singleshot means just that. It's a single shot and calling
> del_singleshot_timer_sync() will shut it down so that it can be freed. That
> also means that it can no longer be re-armed.
> 
> I'm not sure what you meant by "Use more efficient del_singleshot_timer_sync()"
> but I'm guessing since that was written in 2005, it is no longer relevant,
> and del_timer_sync() should now be used.
> 
> After replacing that with del_timer_sync(), the warning goes away.
> 
> I just want to confirm that's OK with you.

I seem to vaguely remember that at the time, del_timer_sync() would loop in order to catch re-arming timers, whereas del_singleshot_timer_sync() would not, hence the commit message. The expectation for del_singleshot_timer_sync() was simply that the caller would ensure safety against re-arming, which was indeed the case for this code.

However if that del_singleshot_timer_sync() expectation has been strengthened to mean that you guarantee never to re-arm the timer at all, then I agree that we should switch to del_timer_sync().

Thanks!
  Trond

Steven Rostedt Oct. 28, 2022, 8:49 p.m. UTC | #35

On Fri, 28 Oct 2022 20:12:30 +0000
Trond Myklebust <trondmy@hammerspace.com> wrote:

> I seem to vaguely remember that at the time, del_timer_sync() would loop
> in order to catch re-arming timers, whereas del_singleshot_timer_sync()
> would not, hence the commit message. The expectation for
> del_singleshot_timer_sync() was simply that the caller would ensure
> safety against re-arming, which was indeed the case for this code.

Well, that expectation didn't last long. Your commit was added on June 22,
2005. Then on June 23, 2005 (the next day!) this happened:


55c888d6d09a0 ("timers fixes/improvements")

Which has:

@@ -89,12 +77,12 @@ static inline void add_timer(struct timer_list * timer)
 
 #ifdef CONFIG_SMP
   extern int del_timer_sync(struct timer_list *timer);
-  extern int del_singleshot_timer_sync(struct timer_list *timer);
 #else
 # define del_timer_sync(t) del_timer(t)
-# define del_singleshot_timer_sync(t) del_timer(t)
 #endif
 
+#define del_singleshot_timer_sync(t) del_timer_sync(t)
+


So much or efficiency! :-)

I guess converting it back to del_timer_sync() is the right thing to do
regardless of this patch series. I'll send you a patch.

-- Steve

Trond Myklebust Oct. 28, 2022, 9:57 p.m. UTC | #36

> On Oct 28, 2022, at 16:49, Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> On Fri, 28 Oct 2022 20:12:30 +0000
> Trond Myklebust <trondmy@hammerspace.com> wrote:
> 
>> I seem to vaguely remember that at the time, del_timer_sync() would loop
>> in order to catch re-arming timers, whereas del_singleshot_timer_sync()
>> would not, hence the commit message. The expectation for
>> del_singleshot_timer_sync() was simply that the caller would ensure
>> safety against re-arming, which was indeed the case for this code.
> 
> Well, that expectation didn't last long. Your commit was added on June 22,
> 2005. Then on June 23, 2005 (the next day!) this happened:
> 
> 
> 55c888d6d09a0 ("timers fixes/improvements")
> 
> Which has:
> 
> @@ -89,12 +77,12 @@ static inline void add_timer(struct timer_list * timer)
> 
> #ifdef CONFIG_SMP
>   extern int del_timer_sync(struct timer_list *timer);
> -  extern int del_singleshot_timer_sync(struct timer_list *timer);
> #else
> # define del_timer_sync(t) del_timer(t)
> -# define del_singleshot_timer_sync(t) del_timer(t)
> #endif
> 
> +#define del_singleshot_timer_sync(t) del_timer_sync(t)
> +
> 
> 
> So much or efficiency! :-)
> 
> I guess converting it back to del_timer_sync() is the right thing to do
> regardless of this patch series. I'll send you a patch.
> 
> -- Steve

We’re all moving a lot slower now, 17 years later...

_________________________________
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com