[-next,8/8] block: fix null-pointer dereference in ioc_pd_init

Message ID 20221128154434.4177442-9-linan122@huawei.com
State New
Series iocost bugfix

Commit Message

Li Nan Nov. 28, 2022, 3:44 p.m. UTC
  Removing a block device while iocost is initializing may cause a
null-pointer dereference:

	CPU1				   CPU2
  ioc_qos_write
   blkcg_conf_open_bdev
    blkdev_get_no_open
     kobject_get_unless_zero
    blk_iocost_init
     rq_qos_add
  					del_gendisk
  					 rq_qos_exit
  					  q->rq_qos = rqos->next
  					   //iocost is removed from q->rq_qos
      blkcg_activate_policy
       pd_init_fn
        ioc_pd_init
  	 ioc = q_to_ioc(blkg->q)
 	  //can't find iocost, returns NULL

Fix the problem by moving rq_qos_exit() to disk_release(). ioc_qos_write()
takes a reference on bd_device.kobj in blkcg_conf_open_bdev(), so
disk_release() will not run until iocost initialization is complete.

Signed-off-by: Li Nan <linan122@huawei.com>
---
 block/genhd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
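
The interleaving in the commit message can be replayed deterministically in
userspace. The following is a minimal sketch, not kernel code: the model_*
functions, the simplified structs and the MODEL_RQ_QOS_COST id are assumptions
made for illustration, and only the list handling of rq_qos_add(),
rq_qos_exit() and q_to_ioc() is modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel types (hypothetical, simplified). */
struct rq_qos {
	int id;                 /* which policy this entry is */
	struct rq_qos *next;
};

struct request_queue {
	struct rq_qos *rq_qos;  /* head of the per-queue policy list */
};

enum { MODEL_RQ_QOS_COST = 1 };

/* Models rq_qos_add(): push a policy onto the queue's list. */
static void model_rq_qos_add(struct request_queue *q, struct rq_qos *rqos)
{
	rqos->next = q->rq_qos;
	q->rq_qos = rqos;
}

/* Models rq_qos_exit(): unlink every policy from the queue. */
static void model_rq_qos_exit(struct request_queue *q)
{
	while (q->rq_qos)
		q->rq_qos = q->rq_qos->next;
}

/* Models q_to_ioc(): find iocost on the list, NULL if it is not attached. */
static struct rq_qos *model_q_to_ioc(struct request_queue *q)
{
	for (struct rq_qos *r = q->rq_qos; r; r = r->next)
		if (r->id == MODEL_RQ_QOS_COST)
			return r;
	return NULL;
}

/* Replays the two-CPU ordering; returns 1 if the final lookup yields NULL. */
static int replay_race(void)
{
	struct request_queue q = { .rq_qos = NULL };
	struct rq_qos iocost = { .id = MODEL_RQ_QOS_COST, .next = NULL };

	model_rq_qos_add(&q, &iocost);          /* CPU1: blk_iocost_init() */
	assert(model_q_to_ioc(&q) == &iocost);  /* lookup works at this point */
	model_rq_qos_exit(&q);                  /* CPU2: del_gendisk() */
	return model_q_to_ioc(&q) == NULL;      /* CPU1: ioc_pd_init() lookup */
}
```

In the kernel, ioc_pd_init() goes on to use the pointer returned by the last
lookup, which is where the NULL dereference occurs.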
  

Comments

kernel test robot Nov. 29, 2022, 11:23 a.m. UTC | #1
Hi Li,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on next-20221128]

url:    https://github.com/intel-lab-lkp/linux/commits/Li-Nan/iocost-bugfix/20221128-232536
patch link:    https://lore.kernel.org/r/20221128154434.4177442-9-linan122%40huawei.com
patch subject: [PATCH -next 8/8] block: fix null-pointer dereference in ioc_pd_init
config: arm-randconfig-r046-20221128
compiler: arm-linux-gnueabi-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/44da423d235b6c23ecf6a38325d0429cfac20ee2
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Li-Nan/iocost-bugfix/20221128-232536
        git checkout 44da423d235b6c23ecf6a38325d0429cfac20ee2
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=arm SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   block/genhd.c: In function 'disk_release':
>> block/genhd.c:1170:21: error: 'q' undeclared (first use in this function); did you mean 'rq'?
    1170 |         rq_qos_exit(q);
         |                     ^
         |                     rq
   block/genhd.c:1170:21: note: each undeclared identifier is reported only once for each function it appears in


vim +1170 block/genhd.c

  1136	
  1137	/**
  1138	 * disk_release - releases all allocated resources of the gendisk
  1139	 * @dev: the device representing this disk
  1140	 *
  1141	 * This function releases all allocated resources of the gendisk.
  1142	 *
  1143	 * Drivers which used __device_add_disk() have a gendisk with a request_queue
  1144	 * assigned. Since the request_queue sits on top of the gendisk for these
  1145	 * drivers we also call blk_put_queue() for them, and we expect the
  1146	 * request_queue refcount to reach 0 at this point, and so the request_queue
  1147	 * will also be freed prior to the disk.
  1148	 *
  1149	 * Context: can sleep
  1150	 */
  1151	static void disk_release(struct device *dev)
  1152	{
  1153		struct gendisk *disk = dev_to_disk(dev);
  1154	
  1155		might_sleep();
  1156		WARN_ON_ONCE(disk_live(disk));
  1157	
  1158		/*
  1159		 * To undo the all initialization from blk_mq_init_allocated_queue in
  1160		 * case of a probe failure where add_disk is never called we have to
  1161		 * call blk_mq_exit_queue here. We can't do this for the more common
  1162		 * teardown case (yet) as the tagset can be gone by the time the disk
  1163		 * is released once it was added.
  1164		 */
  1165		if (queue_is_mq(disk->queue) &&
  1166		    test_bit(GD_OWNS_QUEUE, &disk->state) &&
  1167		    !test_bit(GD_ADDED, &disk->state))
  1168			blk_mq_exit_queue(disk->queue);
  1169	
> 1170		rq_qos_exit(q);
  1171		blkcg_exit_disk(disk);
  1172	
  1173		bioset_exit(&disk->bio_split);
  1174	
  1175		disk_release_events(disk);
  1176		kfree(disk->random);
  1177		disk_free_zone_bitmaps(disk);
  1178		xa_destroy(&disk->part_tbl);
  1179	
  1180		disk->queue->disk = NULL;
  1181		blk_put_queue(disk->queue);
  1182	
  1183		if (test_bit(GD_ADDED, &disk->state) && disk->fops->free_disk)
  1184			disk->fops->free_disk(disk);
  1185	
  1186		iput(disk->part0->bd_inode);	/* frees the disk */
  1187	}
  1188
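
The build error arises because disk_release() has only `disk` in scope, while
the moved call still names the local `q` from del_gendisk(). A minimal sketch
of one plausible way to resolve the compile failure is shown below, using
userspace stand-in types; the model_* names are hypothetical, and this
addresses only the undeclared identifier, not the lifetime concern raised in
review.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel types (hypothetical, simplified). */
struct rq_qos { struct rq_qos *next; };
struct request_queue { struct rq_qos *rq_qos; };
struct gendisk { struct request_queue *queue; };

/* Models rq_qos_exit(): unlink every policy from the queue. */
static void model_rq_qos_exit(struct request_queue *q)
{
	while (q->rq_qos)
		q->rq_qos = q->rq_qos->next;
}

/* Models the relevant part of disk_release(). */
static void model_disk_release(struct gendisk *disk)
{
	assert(disk && disk->queue);

	/* No local 'q' exists here, so reach the queue through the disk. */
	model_rq_qos_exit(disk->queue);
}
```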
  
Christoph Hellwig Nov. 29, 2022, 2:25 p.m. UTC | #2
On Mon, Nov 28, 2022 at 11:44:34PM +0800, Li Nan wrote:
> Fix problem by moving rq_qos_exit() to disk_release().

No, that now means it is removed too late.  You need to add proper
synchronization.
  
kernel test robot Nov. 30, 2022, 12:50 a.m. UTC | #3
Hi Li,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on next-20221128]

url:    https://github.com/intel-lab-lkp/linux/commits/Li-Nan/iocost-bugfix/20221128-232536
patch link:    https://lore.kernel.org/r/20221128154434.4177442-9-linan122%40huawei.com
patch subject: [PATCH -next 8/8] block: fix null-pointer dereference in ioc_pd_init
config: s390-randconfig-r044-20221128
compiler: clang version 16.0.0 (https://github.com/llvm/llvm-project 6e4cea55f0d1104408b26ac574566a0e4de48036)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install s390 cross compiling tool for clang build
        # apt-get install binutils-s390x-linux-gnu
        # https://github.com/intel-lab-lkp/linux/commit/44da423d235b6c23ecf6a38325d0429cfac20ee2
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Li-Nan/iocost-bugfix/20221128-232536
        git checkout 44da423d235b6c23ecf6a38325d0429cfac20ee2
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=s390 SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   In file included from block/genhd.c:28:
   In file included from block/blk-throttle.h:4:
   In file included from block/blk-cgroup-rwstat.h:9:
   In file included from block/blk-cgroup.h:20:
   In file included from include/linux/blk-mq.h:8:
   In file included from include/linux/scatterlist.h:9:
   In file included from arch/s390/include/asm/io.h:75:
   include/asm-generic/io.h:547:31: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __raw_readb(PCI_IOBASE + addr);
                             ~~~~~~~~~~ ^
   include/asm-generic/io.h:560:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __le16_to_cpu((__le16 __force)__raw_readw(PCI_IOBASE + addr));
                                                           ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/big_endian.h:37:59: note: expanded from macro '__le16_to_cpu'
   #define __le16_to_cpu(x) __swab16((__force __u16)(__le16)(x))
                                                             ^
   include/uapi/linux/swab.h:102:54: note: expanded from macro '__swab16'
   #define __swab16(x) (__u16)__builtin_bswap16((__u16)(x))
                                                        ^
   In file included from block/genhd.c:28:
   In file included from block/blk-throttle.h:4:
   In file included from block/blk-cgroup-rwstat.h:9:
   In file included from block/blk-cgroup.h:20:
   In file included from include/linux/blk-mq.h:8:
   In file included from include/linux/scatterlist.h:9:
   In file included from arch/s390/include/asm/io.h:75:
   include/asm-generic/io.h:573:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
                                                           ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/big_endian.h:35:59: note: expanded from macro '__le32_to_cpu'
   #define __le32_to_cpu(x) __swab32((__force __u32)(__le32)(x))
                                                             ^
   include/uapi/linux/swab.h:115:54: note: expanded from macro '__swab32'
   #define __swab32(x) (__u32)__builtin_bswap32((__u32)(x))
                                                        ^
   In file included from block/genhd.c:28:
   In file included from block/blk-throttle.h:4:
   In file included from block/blk-cgroup-rwstat.h:9:
   In file included from block/blk-cgroup.h:20:
   In file included from include/linux/blk-mq.h:8:
   In file included from include/linux/scatterlist.h:9:
   In file included from arch/s390/include/asm/io.h:75:
   include/asm-generic/io.h:584:33: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writeb(value, PCI_IOBASE + addr);
                               ~~~~~~~~~~ ^
   include/asm-generic/io.h:594:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writew((u16 __force)cpu_to_le16(value), PCI_IOBASE + addr);
                                                         ~~~~~~~~~~ ^
   include/asm-generic/io.h:604:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writel((u32 __force)cpu_to_le32(value), PCI_IOBASE + addr);
                                                         ~~~~~~~~~~ ^
   include/asm-generic/io.h:692:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           readsb(PCI_IOBASE + addr, buffer, count);
                  ~~~~~~~~~~ ^
   include/asm-generic/io.h:700:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           readsw(PCI_IOBASE + addr, buffer, count);
                  ~~~~~~~~~~ ^
   include/asm-generic/io.h:708:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           readsl(PCI_IOBASE + addr, buffer, count);
                  ~~~~~~~~~~ ^
   include/asm-generic/io.h:717:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           writesb(PCI_IOBASE + addr, buffer, count);
                   ~~~~~~~~~~ ^
   include/asm-generic/io.h:726:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           writesw(PCI_IOBASE + addr, buffer, count);
                   ~~~~~~~~~~ ^
   include/asm-generic/io.h:735:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           writesl(PCI_IOBASE + addr, buffer, count);
                   ~~~~~~~~~~ ^
>> block/genhd.c:1170:14: error: use of undeclared identifier 'q'
           rq_qos_exit(q);
                       ^
   12 warnings and 1 error generated.


  
Yu Kuai Nov. 30, 2022, 1:32 a.m. UTC | #4
Hi,

On 2022/11/29 22:25, Christoph Hellwig wrote:
> On Mon, Nov 28, 2022 at 11:44:34PM +0800, Li Nan wrote:
>> Fix problem by moving rq_qos_exit() to disk_release().
> 
> No, that now means it is removed too late.  You need to add proper
> synchronization.
> .
> 

Can you explain a bit more? Maybe I'm being a noob, but the disk is about
to be freed here, and I can't think of any contention.

Thanks,
Kuai
  
Christoph Hellwig Nov. 30, 2022, 3:59 p.m. UTC | #5
On Wed, Nov 30, 2022 at 09:32:58AM +0800, Yu Kuai wrote:
> > No, that now means it is removed too late.  You need to add proper
> > synchronization.
> > .
> > 
> 
> Can you explain a bit more? Maybe I'm being a noob, but the disk is about
> to be freed here, and I can't think of any contention.

Right now we need synchronization with e.g. open_mutex and a check
for a dead disk, which I suggest adding instead of creating a lifetime
imbalance.
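
The suggestion above (serialize against removal with a mutex and refuse to
configure a disk that is already dead) can be sketched in userspace as
follows. This is an illustrative model only, not the eventual kernel change:
struct model_disk, its fields, the model_* functions and the choice of
-ENODEV as the error code are all assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace model of the state shared by both paths. */
struct model_disk {
	pthread_mutex_t open_mutex; /* stands in for the disk's open_mutex */
	bool dead;                  /* stands in for the "disk is dead" check */
	bool iocost_attached;       /* stands in for iocost being on q->rq_qos */
};

/* Configuration path: attach and initialize only while the disk is alive. */
static int model_ioc_configure(struct model_disk *d)
{
	int ret = 0;

	pthread_mutex_lock(&d->open_mutex);
	if (d->dead)
		ret = -ENODEV;             /* removal already ran: bail out */
	else
		d->iocost_attached = true; /* add + policy activation as one step */
	pthread_mutex_unlock(&d->open_mutex);
	return ret;
}

/* Removal path: mark the disk dead and tear down under the same lock. */
static void model_del_disk(struct model_disk *d)
{
	pthread_mutex_lock(&d->open_mutex);
	d->dead = true;
	d->iocost_attached = false;
	pthread_mutex_unlock(&d->open_mutex);
}
```

Because both paths hold the same lock, removal can no longer slip in between
attaching iocost and initializing its per-group data; a configuration attempt
that loses the race fails cleanly instead of looking up a policy that is gone.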
  

Patch

diff --git a/block/genhd.c b/block/genhd.c
index dcf200bcbd3e..c264da49eaaa 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -656,7 +656,6 @@  void del_gendisk(struct gendisk *disk)
 		elevator_exit(q);
 		mutex_unlock(&q->sysfs_lock);
 	}
-	rq_qos_exit(q);
 	blk_mq_unquiesce_queue(q);
 
 	/*
@@ -1168,6 +1167,7 @@  static void disk_release(struct device *dev)
 	    !test_bit(GD_ADDED, &disk->state))
 		blk_mq_exit_queue(disk->queue);
 
+	rq_qos_exit(q);
 	blkcg_exit_disk(disk);
 
 	bioset_exit(&disk->bio_split);