nvmet: Reorder fields in 'struct nvmet_ns'

Message ID aea924d31f2bd2f740b1ccc6f462905bd6cab763.1682624855.git.christophe.jaillet@wanadoo.fr
State New
Series nvmet: Reorder fields in 'struct nvmet_ns'

Commit Message

Christophe JAILLET April 27, 2023, 7:47 p.m. UTC
Group some variables based on their sizes to reduce holes.
On x86_64, this shrinks the size of 'struct nvmet_ns' from 520 to 512
bytes.

When such a structure is allocated in nvmet_ns_alloc(), the slab
allocator rounds the 520-byte request up to the next power-of-two
bucket, so 1024 bytes are actually allocated.

So, on x86_64, this change saves 512 bytes per allocation.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
---
More aggressive grouping could be done to make the layout more
future-proof, but the way struct nvmet_ns is written suggests that some
fields should be kept together. So keep the grouping as-is.


Using pahole

Before:
======
struct nvmet_ns {
	struct percpu_ref          ref;                  /*     0    16 */
	struct block_device *      bdev;                 /*    16     8 */
	struct file *              file;                 /*    24     8 */
	bool                       readonly;             /*    32     1 */

	/* XXX 3 bytes hole, try to pack */

	u32                        nsid;                 /*    36     4 */
	u32                        blksize_shift;        /*    40     4 */

	/* XXX 4 bytes hole, try to pack */

	loff_t                     size;                 /*    48     8 */
	u8                         nguid[16];            /*    56    16 */
	/* --- cacheline 1 boundary (64 bytes) was 8 bytes ago --- */
	uuid_t                     uuid;                 /*    72    16 */
	u32                        anagrpid;             /*    88     4 */
	bool                       buffered_io;          /*    92     1 */
	bool                       enabled;              /*    93     1 */

	/* XXX 2 bytes hole, try to pack */

	struct nvmet_subsys *      subsys;               /*    96     8 */
	const char  *              device_path;          /*   104     8 */
	struct config_group        device_group;         /*   112   136 */
	/* --- cacheline 3 boundary (192 bytes) was 56 bytes ago --- */
	struct config_group        group;                /*   248   136 */
	/* --- cacheline 6 boundary (384 bytes) --- */
	struct completion          disable_done;         /*   384    96 */
	/* --- cacheline 7 boundary (448 bytes) was 32 bytes ago --- */
	mempool_t *                bvec_pool;            /*   480     8 */
	int                        use_p2pmem;           /*   488     4 */

	/* XXX 4 bytes hole, try to pack */

	struct pci_dev *           p2p_dev;              /*   496     8 */
	int                        pi_type;              /*   504     4 */
	int                        metadata_size;        /*   508     4 */
	/* --- cacheline 8 boundary (512 bytes) --- */
	u8                         csi;                  /*   512     1 */

	/* size: 520, cachelines: 9, members: 23 */
	/* sum members: 500, holes: 4, sum holes: 13 */
	/* padding: 7 */
	/* last cacheline: 8 bytes */
};

After:
=====
struct nvmet_ns {
	struct percpu_ref          ref;                  /*     0    16 */
	struct block_device *      bdev;                 /*    16     8 */
	struct file *              file;                 /*    24     8 */
	bool                       readonly;             /*    32     1 */

	/* XXX 3 bytes hole, try to pack */

	u32                        nsid;                 /*    36     4 */
	u32                        blksize_shift;        /*    40     4 */

	/* XXX 4 bytes hole, try to pack */

	loff_t                     size;                 /*    48     8 */
	u8                         nguid[16];            /*    56    16 */
	/* --- cacheline 1 boundary (64 bytes) was 8 bytes ago --- */
	uuid_t                     uuid;                 /*    72    16 */
	u32                        anagrpid;             /*    88     4 */
	bool                       buffered_io;          /*    92     1 */
	bool                       enabled;              /*    93     1 */

	/* XXX 2 bytes hole, try to pack */

	struct nvmet_subsys *      subsys;               /*    96     8 */
	const char  *              device_path;          /*   104     8 */
	struct config_group        device_group;         /*   112   136 */
	/* --- cacheline 3 boundary (192 bytes) was 56 bytes ago --- */
	struct config_group        group;                /*   248   136 */
	/* --- cacheline 6 boundary (384 bytes) --- */
	struct completion          disable_done;         /*   384    96 */
	/* --- cacheline 7 boundary (448 bytes) was 32 bytes ago --- */
	mempool_t *                bvec_pool;            /*   480     8 */
	struct pci_dev *           p2p_dev;              /*   488     8 */
	int                        use_p2pmem;           /*   496     4 */
	int                        pi_type;              /*   500     4 */
	int                        metadata_size;        /*   504     4 */
	u8                         csi;                  /*   508     1 */

	/* size: 512, cachelines: 8, members: 23 */
	/* sum members: 500, holes: 3, sum holes: 9 */
	/* padding: 3 */
};
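
To see the effect in isolation, here is a minimal userspace sketch (a
hypothetical pair of structs mirroring only the tail of the layouts
above, not the kernel code): on x86_64 the pointer member needs 8-byte
alignment, so placing it after a lone int forces a 4-byte hole, and
moving it in front of the ints closes that hole.

#include <stdio.h>

/* Mirrors the tail of the 'Before' layout: int, pointer, int, int, u8.
 * The pointer needs 8-byte alignment, so the compiler inserts a
 * 4-byte hole after 'use_p2pmem'. */
struct tail_before {
	int use_p2pmem;		/* offset  0, size 4 */
				/* 4-byte hole */
	void *p2p_dev;		/* offset  8, size 8 */
	int pi_type;		/* offset 16, size 4 */
	int metadata_size;	/* offset 20, size 4 */
	unsigned char csi;	/* offset 24, size 1 */
};				/* sizeof == 32, 7 bytes tail padding */

/* The patch moves the pointer up, so the three ints pack together. */
struct tail_after {
	void *p2p_dev;		/* offset  0, size 8 */
	int use_p2pmem;		/* offset  8, size 4 */
	int pi_type;		/* offset 12, size 4 */
	int metadata_size;	/* offset 16, size 4 */
	unsigned char csi;	/* offset 20, size 1 */
};				/* sizeof == 24, 3 bytes tail padding */

int main(void)
{
	printf("before: %zu bytes, after: %zu bytes\n",
	       sizeof(struct tail_before), sizeof(struct tail_after));
	return 0;
}

Note that 9 bytes of holes remain in the 'After' layout; closing them
all would only shrink the structure to 504 bytes (sum members 500,
padded to a multiple of 8), which still lands in the same 512-byte
kmalloc bucket, so the conservative grouping gives up nothing in
practice.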
---
 drivers/nvme/target/nvmet.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
  

Comments

Jens Axboe April 27, 2023, 10:51 p.m. UTC | #1
On 4/27/23 1:47 PM, Christophe JAILLET wrote:
> Group some variables based on their sizes to reduce holes.
> On x86_64, this shrinks the size of 'struct nvmet_ns' from 520 to 512
> bytes.
> 
> When such a structure is allocated in nvmet_ns_alloc(), because of the way
> memory allocation works, when 520 bytes were requested, 1024 bytes were
> allocated.
> 
> So, on x86_64, this change saves 512 bytes per allocation.
> 
> Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
> ---
> More aggressive grouping could be done to be more future proof, but the
> way the struct nvmet_ns is written suggest that some fields should be
> kept together. So keep grouping as-is.

I think you did the right thing; that move doesn't matter, and it
brings the size to pow-of-2 or less, which is really what matters. So
looks fine to me:

Reviewed-by: Jens Axboe <axboe@kernel.dk>
  
Chaitanya Kulkarni April 27, 2023, 10:59 p.m. UTC | #2
On 4/27/23 12:47, Christophe JAILLET wrote:
> Group some variables based on their sizes to reduce holes.
> On x86_64, this shrinks the size of 'struct nvmet_ns' from 520 to 512
> bytes.
>

Although this looks good, we at least need to document what
happens on the other arch(s) that are not mentioned in the
commit log. Is there a possibility that someone might come
up with contradictory data in the future for the arch(s) that
are not mentioned here?

-ck
  
Jens Axboe April 27, 2023, 11:01 p.m. UTC | #3
On 4/27/23 4:59 PM, Chaitanya Kulkarni wrote:
> On 4/27/23 12:47, Christophe JAILLET wrote:
>> Group some variables based on their sizes to reduce holes.
>> On x86_64, this shrinks the size of 'struct nvmet_ns' from 520 to 512
>> bytes.
>>
> 
> Although this looks good, we at least need to document what
> happens on other arch(s) which are not mentioned in the
> commit log ? is there a possibility that someone might come
> up with the contradictory data in future for the arch(s) which
> arch that are not mentioned here ?

The change is certainly not going to make things _worse_ for any arch,
so I think that's somewhat of a pointless exercise and an unreasonable
ask for something that makes sense on 64-bit arm/x86 and saves half the
space.
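
For context, a simplified model of why "pow-of-2 or less" is the
threshold that matters (a sketch of generic kmalloc bucket selection,
not the actual slab code, which also has 96- and 192-byte caches and
other refinements): a request is served from the smallest bucket that
fits, so any size in (512, 1024] costs a full 1024 bytes.

#include <stdio.h>

/* Simplified model: round a request up to the next power-of-two
 * generic kmalloc cache. */
static size_t kmalloc_bucket(size_t size)
{
	size_t bucket = 8;	/* smallest generic cache */

	while (bucket < size)
		bucket <<= 1;
	return bucket;
}

int main(void)
{
	/* 520 -> 1024 but 512 -> 512: trimming 8 bytes halves the
	 * per-namespace allocation. */
	printf("520 -> %zu, 512 -> %zu\n",
	       kmalloc_bucket(520), kmalloc_bucket(512));
	return 0;
}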
  
Chaitanya Kulkarni April 27, 2023, 11:07 p.m. UTC | #4
On 4/27/23 16:01, Jens Axboe wrote:
> On 4/27/23 4:59 PM, Chaitanya Kulkarni wrote:
>> On 4/27/23 12:47, Christophe JAILLET wrote:
>>> Group some variables based on their sizes to reduce holes.
>>> On x86_64, this shrinks the size of 'struct nvmet_ns' from 520 to 512
>>> bytes.
>>>
>> Although this looks good, we at least need to document what
>> happens on other arch(s) which are not mentioned in the
>> commit log ? is there a possibility that someone might come
>> up with the contradictory data in future for the arch(s) which
>> arch that are not mentioned here ?
> The change is certainly not going to make things _worse_ for any arch,
> so I think that's somewhat of a pointless exercise and an unreasonable
> ask for something that makes sense on 64-bit arm/x86 and saves half the
> space.
>

disregard my comment, looks good...

Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>

-ck
  
Chaitanya Kulkarni April 27, 2023, 11:12 p.m. UTC | #5
(+Keith, for host side)
> ---
> More aggressive grouping could be done to be more future proof, but the
> way the struct nvmet_ns is written suggest that some fields should be
> kept together. So keep grouping as-is.
>
>

you can send an RFC and I'll be happy to take a look if it's going to
have any benefit; it will take some time though...

for the host side:

while you are at it, it might be useful to take a look at the
structures that are accessed in the fast path on the host side?

unless there is some reason for not doing that ...

-ck
  
Christophe JAILLET April 28, 2023, 7:33 a.m. UTC | #6
On 28/04/2023 at 01:12, Chaitanya Kulkarni wrote:
> (+Keith, for host side)
>> ---
>> More aggressive grouping could be done to be more future proof, but the
>> way the struct nvmet_ns is written suggest that some fields should be
>> kept together. So keep grouping as-is.
>>
>>
> 
> you can send RFC and I'll be happy to take a look if it's going
> have any benefit, it will take some time though..
> 
> for host side :-
> 
> while you are at it, it might be useful to take a look at the structures
> that are accessed in the fast path on the host side ?

Ok, why not, but can you help identify these structures or the places
considered fast path?

CJ

> 
> unless there is some reason for not doing that ...
> 
> -ck
> 
>
  
Chaitanya Kulkarni April 28, 2023, 8:11 a.m. UTC | #7
On 4/28/23 00:33, Christophe JAILLET wrote:
> On 28/04/2023 at 01:12, Chaitanya Kulkarni wrote:
>> (+Keith, for host side)
>>> ---
>>> More aggressive grouping could be done to be more future proof, but the
>>> way the struct nvmet_ns is written suggest that some fields should be
>>> kept together. So keep grouping as-is.
>>>
>>>
>>
>> you can send RFC and I'll be happy to take a look if it's going
>> have any benefit, it will take some time though..
>>
>> for host side :-
>>
>> while you are at it, it might be useful to take a look at the structures
>> that are accessed in the fast path on the host side ?
>
> Ok, why not, but can you help identifying these structures or places 
> considered as fast path?
>
> CJ

you can start with nvme_ns/nvme_ctrl; as I remember, nvme_ns is used
in nvme_setup_rw_cmd() on the host and nvme_ctrl is used in
nvmet_passthru_execute_cmd().

In general, nvme_queue_rq()/nvme_rdma_queue_rq()/
nvmet_bdev_execute_rw()/nvmet_file_execute_rw()/
nvmet_passthru_execute_cmd() are the functions to start with; then you
can look at the structs with the nvme_ prefix (mainly those related to
ns and ctrl) that are worth examining for similar benefits, but be
careful what you move ...

Oh, and nvmet_req is also worth a look ...

-ck
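
If similar reordering is done on these fast-path structures, one way
to keep the savings from silently regressing would be a compile-time
size check next to the struct definition (a sketch only, not part of
the applied patch; static_assert is available in the kernel via
include/linux/build_bug.h, and the 512-byte bound is layout-specific,
so it would have to be limited to configs where it is known to hold):

/* Sketch: fail the build if struct nvmet_ns outgrows the 512-byte
 * kmalloc bucket again. Guarded because the bound is arch-specific. */
#ifdef CONFIG_X86_64
static_assert(sizeof(struct nvmet_ns) <= 512,
	      "struct nvmet_ns no longer fits in kmalloc-512");
#endif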
  
Sagi Grimberg May 1, 2023, 1:58 p.m. UTC | #8
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
  
Christophe JAILLET June 19, 2023, 6:08 p.m. UTC | #9
On 28/04/2023 at 01:07, Chaitanya Kulkarni wrote:
> On 4/27/23 16:01, Jens Axboe wrote:
>> On 4/27/23 4:59 PM, Chaitanya Kulkarni wrote:
>>> On 4/27/23 12:47, Christophe JAILLET wrote:
>>>> Group some variables based on their sizes to reduce holes.
>>>> On x86_64, this shrinks the size of 'struct nvmet_ns' from 520 to 512
>>>> bytes.
>>>>
>>> Although this looks good, we at least need to document what
>>> happens on other arch(s) which are not mentioned in the
>>> commit log ? is there a possibility that someone might come
>>> up with the contradictory data in future for the arch(s) which
>>> arch that are not mentioned here ?
>> The change is certainly not going to make things _worse_ for any arch,
>> so I think that's somewhat of a pointless exercise and an unreasonable
>> ask for something that makes sense on 64-bit arm/x86 and saves half the
>> space.
>>
> 
> disregard my comment, looks good...
> 
> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
> 
> -ck
> 
> 

Hi,


All my other nvmet patches have reached -next today, but this one seems 
to be missing.

So this is a gentle reminder, in case it got lost somewhere.

CJ
  
Chaitanya Kulkarni June 20, 2023, 5:21 a.m. UTC | #10
On 6/19/2023 11:08 AM, Christophe JAILLET wrote:
> On 28/04/2023 at 01:07, Chaitanya Kulkarni wrote:
>> On 4/27/23 16:01, Jens Axboe wrote:
>>> On 4/27/23 4:59 PM, Chaitanya Kulkarni wrote:
>>>> On 4/27/23 12:47, Christophe JAILLET wrote:
>>>>> Group some variables based on their sizes to reduce holes.
>>>>> On x86_64, this shrinks the size of 'struct nvmet_ns' from 520 to 512
>>>>> bytes.
>>>>>
>>>> Although this looks good, we at least need to document what
>>>> happens on other arch(s) which are not mentioned in the
>>>> commit log ? is there a possibility that someone might come
>>>> up with the contradictory data in future for the arch(s) which
>>>> arch that are not mentioned here ?
>>> The change is certainly not going to make things _worse_ for any arch,
>>> so I think that's somewhat of a pointless exercise and an unreasonable
>>> ask for something that makes sense on 64-bit arm/x86 and saves half the
>>> space.
>>>
>>
>> disregard my comment, looks good...
>>
>> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
>>
>> -ck
>>
>>
> 
> Hi,
> 
> 
> All my other nvmet patches have reached -next today, but this one seems 
> to be missing.
> 
> So this is a gentle reminder, in case it got lost somewhere.
> 
> CJ


I believe this patch can still be applied as-is on top of nvme-6.5.

-ck
  
Keith Busch June 20, 2023, 3:45 p.m. UTC | #11
On Mon, Jun 19, 2023 at 08:08:38PM +0200, Christophe JAILLET wrote:
> All my other nvmet patches have reached -next today, but this one seems to
> be missing.
> 
> So this is a gentle reminder, in case it got lost somewhere.

Oops, thanks for the catch. I'll queue this up for the next 6.5 pull
request.
  
Keith Busch June 21, 2023, 5:25 p.m. UTC | #12
Queued up now for nvme-6.5.
  

Patch

diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index dc60a22646f7..c50146085fb5 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -79,8 +79,8 @@ struct nvmet_ns {
 	struct completion	disable_done;
 	mempool_t		*bvec_pool;
 
-	int			use_p2pmem;
 	struct pci_dev		*p2p_dev;
+	int			use_p2pmem;
 	int			pi_type;
 	int			metadata_size;
 	u8			csi;