Message ID | 20221026194122.11761-5-mpdesouza@suse.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:6687:0:0:0:0:0 with SMTP id l7csp453358wru; Wed, 26 Oct 2022 12:44:33 -0700 (PDT) X-Google-Smtp-Source: AMsMyM6DSaH9l/P7y3I9NVcEm8FTRUNyiuYIS2TWLzU7daND2isciZrAXDnxPdIP/RVkKNruVW5s X-Received: by 2002:a17:906:9bc4:b0:7aa:9184:d95d with SMTP id de4-20020a1709069bc400b007aa9184d95dmr13075394ejc.349.1666813473235; Wed, 26 Oct 2022 12:44:33 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1666813473; cv=none; d=google.com; s=arc-20160816; b=MTZHkT+UoI6VwgdAMDEBlLiNvLoZcBivc0Xbpo8aBpCKQduVN4I8g+r8H0GCGFwIcl 4kCbaGE2oIE8vFFhjGpAQaZDz3nuCpNvSA52U5rrSseYsTSEtqITitwUnovm8pk7Bjt3 BTaE4t1ZLNC7DsJBJWRuHGF16b4SHbTC2ZOLIHLlUgzyxXusNuHeURL0BCaRup9iwUJu 1yaMmUMNTM9Q7mxbhrSH9Tapqtd3RZgi/0Rz+dH3x8Xw6md1Ng3Nbk4I2cqNa5WBKIrg 0zkJbHi6IdJNkG5XFkKCY0XksBbcegSFYsc+jBKxPpC1i+JGKpMT9jl1LEBaCkXz2cIl LKfw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=1Lk0jla7Dh3kqdUHZpu2RNb/LcgsWJikmOAgMB8CUfw=; b=UNNarmYoMp4ptSi7Ze2ul3F/PLsQ/c0R0XsH28xk7rqAk3iZhTMiEyXFa0uJb+qasC TVzevnUw1KpXdNym89cUVQl59Dtw2Tsagf7/iv52iq0ncTdPi6V6ruvE1I1BHI2EIAVF I4rJqoIaOJ5ZgsPNCEw0vrxDvp7yd/fUr/2ifr3bmdocl91Qo50ftpVasW0nb+Zy6Wpq CaoE/6LV34hvJOfBlZcfZLMuHzAoAeDSAfn5ZI0xtd5P7okwj50CiuePO77Jxn4V71Gn de7DVgWzWFn01TEDyvyVth71jGbCIZxMq9xwD09Mo913Wn2NpggjoKmPhXc/tW3o4tvw 0FLg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@suse.com header.s=susede1 header.b=TAfAc8VM; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=suse.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id hz1-20020a1709072ce100b007acbaef6d9csi5182062ejc.137.2022.10.26.12.43.52; Wed, 26 Oct 2022 12:44:33 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@suse.com header.s=susede1 header.b=TAfAc8VM; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=suse.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233165AbiJZTmG (ORCPT <rfc822;pwkd43@gmail.com> + 99 others); Wed, 26 Oct 2022 15:42:06 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39060 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234432AbiJZTlt (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Wed, 26 Oct 2022 15:41:49 -0400 Received: from smtp-out1.suse.de (smtp-out1.suse.de [IPv6:2001:67c:2178:6::1c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6DE4EBC608; Wed, 26 Oct 2022 12:41:47 -0700 (PDT) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id DDCA722029; Wed, 26 Oct 2022 19:41:45 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1666813305; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=1Lk0jla7Dh3kqdUHZpu2RNb/LcgsWJikmOAgMB8CUfw=; b=TAfAc8VM1Vaf0+ZB5lC3dMQLSXTFUfue1NPEssJ+0NN/pyv8Di3AST3eqQ8Hc5h4anZVyY tU4HQLR1AHVZ1FAI+fz0XsWV1UIhc+C8IOUSyDInyo2vS7QdAYHNUT7VoM9ilfZMNbRawa B//bg2AVCQDEe8AdXK/5rFVNEzdUSdY= Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 1B9D813A77; Wed, 26 Oct 2022 19:41:42 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id IEzmNHaNWWNlFwAAMHmgww (envelope-from <mpdesouza@suse.com>); Wed, 26 Oct 2022 19:41:42 +0000 From: Marcos Paulo de Souza <mpdesouza@suse.com> To: linux-kernel@vger.kernel.org, live-patching@vger.kernel.org Cc: jpoimboe@redhat.com, joe.lawrence@redhat.com, pmladek@suse.com, Marcos Paulo de Souza <mpdesouza@suse.com> Subject: [PATCH v2 4/4] livepatch/shadow: Add garbage collection of shadow variables Date: Wed, 26 Oct 2022 16:41:22 -0300 Message-Id: <20221026194122.11761-5-mpdesouza@suse.com> X-Mailer: git-send-email 2.37.3 In-Reply-To: <20221026194122.11761-1-mpdesouza@suse.com> References: <20221026194122.11761-1-mpdesouza@suse.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-2.8 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_LOW,SPF_HELO_NONE, SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1747780604281836150?= X-GMAIL-MSGID: =?utf-8?q?1747780604281836150?= |
Series |
livepatch: Add garbage collection for shadow variables
|
|
Commit Message
Marcos Paulo de Souza
Oct. 26, 2022, 7:41 p.m. UTC
The life of shadow variables is not completely trivial to maintain. They might be used by more livepatches and more livepatched objects. They should stay as long as there is any user. In practice, it requires to implement reference counting in callbacks of all users. They should register all the user and remove the shadow variables only when there is no user left. This patch hides the reference counting into the klp_shadow API. The counter is connected with the shadow variable @id. It requires an API to take and release the reference. The release function also calls the related dtor() when defined. An easy solution would be to add some get_ref()/put_ref() API. But it would need to get called from pre()/post_un() callbacks. It might be easy to forget a callback and make it wrong. A more safe approach is to associate the klp_shadow_type with klp_objects that use the shadow variables. The livepatch core code might then handle the reference counters on background. The shadow variable type might then be added into a new @shadow_types member of struct klp_object. They will get then automatically registered and unregistered when the object is being livepatched. The registration increments the reference count. Unregistration decreases the reference count. All shadow variables of the given type are freed when the reference count reaches zero. All klp_shadow_alloc/get/free functions also checks whether the requested type is registered. It will help to catch missing registration and might also help to catch eventual races. Signed-off-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com> --- Changes from v1: * Reordered my SoB (Josh) * Changed code comments (Petr) include/linux/livepatch.h | 21 ++++ kernel/livepatch/core.c | 39 +++++++ kernel/livepatch/core.h | 1 + kernel/livepatch/shadow.c | 124 ++++++++++++++++++++++ kernel/livepatch/transition.c | 4 +- lib/livepatch/test_klp_shadow_vars.c | 18 +++- samples/livepatch/livepatch-shadow-fix1.c | 8 +- samples/livepatch/livepatch-shadow-fix2.c | 9 +- 8 files changed, 214 insertions(+), 10 deletions(-)
Comments
On Wed, Oct 26, 2022 at 04:41:22PM -0300, Marcos Paulo de Souza wrote: > The life of shadow variables is not completely trivial to maintain. > They might be used by more livepatches and more livepatched objects. > They should stay as long as there is any user. > > In practice, it requires to implement reference counting in callbacks > of all users. They should register all the user and remove the shadow > variables only when there is no user left. > > This patch hides the reference counting into the klp_shadow API. > The counter is connected with the shadow variable @id. It requires > an API to take and release the reference. The release function also > calls the related dtor() when defined. > > An easy solution would be to add some get_ref()/put_ref() API. > But it would need to get called from pre()/post_un() callbacks. > It might be easy to forget a callback and make it wrong. > > A more safe approach is to associate the klp_shadow_type with > klp_objects that use the shadow variables. The livepatch core > code might then handle the reference counters on background. > > The shadow variable type might then be added into a new @shadow_types > member of struct klp_object. They will get then automatically registered > and unregistered when the object is being livepatched. The registration > increments the reference count. Unregistration decreases the reference > count. All shadow variables of the given type are freed when the reference > count reaches zero. > > All klp_shadow_alloc/get/free functions also checks whether the requested > type is registered. It will help to catch missing registration and might > also help to catch eventual races. Is there a reason the shadow variable lifetime is tied to klp_object rather than klp_patch? I get the feeling the latter would be easier to implement (no reference counting; also maybe can be auto-detected with THIS_MODULE?) and harder for the patch author to mess up (by accidentally omitting an object which uses it).
On Thu 2022-11-03 18:03:27, Josh Poimboeuf wrote: > On Wed, Oct 26, 2022 at 04:41:22PM -0300, Marcos Paulo de Souza wrote: > > The life of shadow variables is not completely trivial to maintain. > > They might be used by more livepatches and more livepatched objects. > > They should stay as long as there is any user. > > > > In practice, it requires to implement reference counting in callbacks > > of all users. They should register all the user and remove the shadow > > variables only when there is no user left. > > > > This patch hides the reference counting into the klp_shadow API. > > The counter is connected with the shadow variable @id. It requires > > an API to take and release the reference. The release function also > > calls the related dtor() when defined. > > > > An easy solution would be to add some get_ref()/put_ref() API. > > But it would need to get called from pre()/post_un() callbacks. > > It might be easy to forget a callback and make it wrong. > > > > A more safe approach is to associate the klp_shadow_type with > > klp_objects that use the shadow variables. The livepatch core > > code might then handle the reference counters on background. > > > > The shadow variable type might then be added into a new @shadow_types > > member of struct klp_object. They will get then automatically registered > > and unregistered when the object is being livepatched. The registration > > increments the reference count. Unregistration decreases the reference > > count. All shadow variables of the given type are freed when the reference > > count reaches zero. > > > > All klp_shadow_alloc/get/free functions also checks whether the requested > > type is registered. It will help to catch missing registration and might > > also help to catch eventual races. > > Is there a reason the shadow variable lifetime is tied to klp_object > rather than klp_patch? Good question! My thinking was that shadow variables might be tight to variables from a particular livepatched module. It would make sense to free them when the only user (livepatched module) is removed. The question is if it is really important. I did re-read the original issue and the real problem was when the livepatch was reloaded. The related variables were handled using livepatched code, then without the livepatched code, and then with the livepatched code again. The variable was modified with the original code that was not aware of the shadow variable. As a result the state of the shadow variable was not in sync when it was later used again. Nicolai, your have the practical experience. Should the reference counting be per-livepatched object or per-livepatch, please? > I get the feeling the latter would be easier to implement (no reference > counting; also maybe can be auto-detected with THIS_MODULE?) and harder > for the patch author to mess up (by accidentally omitting an object > which uses it). I am not sure how you mean it. I guess that you suggest to store the name of the livepatch module into the shadow variable. And use the variable only when the livepatch module is still loaded. This would not help. We use the reference counting because the shadow variables should survive an atomic replace of the lifepatch. In this case, the related variables are still handled using a livepatched code that is aware of the shadow variables. Also it does not solve the problem that the shadow variable might get out of sync with the related variable when the same livepatch gets reloaded. Best Regards, Petr
On Fri, Nov 04, 2022 at 11:25:38AM +0100, Petr Mladek wrote: > > I get the feeling the latter would be easier to implement (no reference > > counting; also maybe can be auto-detected with THIS_MODULE?) and harder > > for the patch author to mess up (by accidentally omitting an object > > which uses it). > > I am not sure how you mean it. I guess that you suggest to store > the name of the livepatch module into the shadow variable. > And use the variable only when the livepatch module is still loaded. Actually I was thinking the klp_patch could have references to all the shadow variables (or shadow variable types?) it owns. Instead of reference counting, the livepatch atomic replace code could just migrate any previously owned shadow variables to the new klp_patch. This of course adds the restriction that such garbage-collected shadow variables couldn't be shared across non-replace livepatches. But I wouldn't expect that to be much of a problem. Additionally, I was wondering if "which klp_patch owns which shadow variables" could be auto-detected somehow. For example: 1) add 'struct module *owner' or 'struct klp_patch *owner' to klp_shadow 2) add klp_shadow_alloc_gc() and klp_shadow_get_or_alloc_gc(), which are similar to their non-gc counterparts, with a few additional arguments: the klp module owner (THIS_MODULE for the caller); and a destructor to be used later for the garbage collection 3) When atomic replacing a patch, iterate through the klp_shadow_hash and, for each klp_shadow which previously had an owner, change it to be owned by the new patch 4) When unloading/freeing a patch, free all its associated klp_shadows (also calling destructors where applicable) I'm thinking this would be easier for the patch author, and also simpler overall. I could work up a patch.
On Mon 2022-11-07 17:32:09, Josh Poimboeuf wrote: > On Fri, Nov 04, 2022 at 11:25:38AM +0100, Petr Mladek wrote: > > > I get the feeling the latter would be easier to implement (no reference > > > counting; also maybe can be auto-detected with THIS_MODULE?) and harder > > > for the patch author to mess up (by accidentally omitting an object > > > which uses it). > > > > I am not sure how you mean it. I guess that you suggest to store > > the name of the livepatch module into the shadow variable. > > And use the variable only when the livepatch module is still loaded. > > Actually I was thinking the klp_patch could have references to all the > shadow variables (or shadow variable types?) it owns. In short, you suggest to move the array of used klp_shadow_types from struct klp_object to struct klp_patch. Do I get it correctly? > Instead of reference counting, the livepatch atomic replace code could > just migrate any previously owned shadow variables to the new klp_patch. I see. We would not need to explicitly register the shadow type using klp_shadow_register() and allocate struct klp_shadow_type_reg for the reference counter. > This of course adds the restriction that such garbage-collected shadow > variables couldn't be shared across non-replace livepatches. But I > wouldn't expect that to be much of a problem. From my POV, this is similar to "func_stack" in struct "klp_ops". The list was originally designed for non-replace livepatches because the same function might be livepatched many times. It might theoretically get "simplified" when only the atomic replace is supported. Then there would be the maximum of two livepatches and the (infinite) list would be an overhead. I say theoretically because the list is actually quite elegant solution. > Additionally, I was wondering if "which klp_patch owns which shadow > variables" could be auto-detected somehow. > > For example: > > 1) add 'struct module *owner' or 'struct klp_patch *owner' to klp_shadow > > 2) add klp_shadow_alloc_gc() and klp_shadow_get_or_alloc_gc(), which are > similar to their non-gc counterparts, with a few additional > arguments: the klp module owner (THIS_MODULE for the caller); and a > destructor to be used later for the garbage collection > > 3) When atomic replacing a patch, iterate through the klp_shadow_hash > and, for each klp_shadow which previously had an owner, change it to > be owned by the new patch This is not clear to me. The new livepatch might also use less shadow variables. It must not blindly take over all shadow variables which were owned by the previous livepatch. > 4) When unloading/freeing a patch, free all its associated klp_shadows > (also calling destructors where applicable) > > > I'm thinking this would be easier for the patch author, and also simpler > overall. I could work up a patch. From the patch author POV: If the autodetection did not work then the patch author would still need to provide the array of used shadow types. I agree that only one array in struct klp_patch might be enough. From the implementation POV: I agree that the code might be easier if we support only atomic replace. We would not need the reference counter in this case. But I am not sure if this is acceptable for users that do not use the atomic replace. They suffer from the same problem. Do we really want to make this mode a 2nd citizen? IMHO, all applicable features have been implemented for both modes so far. Best Regards, Petr
On Tue, Nov 08, 2022 at 10:14:18AM +0100, Petr Mladek wrote: > On Mon 2022-11-07 17:32:09, Josh Poimboeuf wrote: > > On Fri, Nov 04, 2022 at 11:25:38AM +0100, Petr Mladek wrote: > > > > I get the feeling the latter would be easier to implement (no reference > > > > counting; also maybe can be auto-detected with THIS_MODULE?) and harder > > > > for the patch author to mess up (by accidentally omitting an object > > > > which uses it). > > > > > > I am not sure how you mean it. I guess that you suggest to store > > > the name of the livepatch module into the shadow variable. > > > And use the variable only when the livepatch module is still loaded. > > > > Actually I was thinking the klp_patch could have references to all the > > shadow variables (or shadow variable types?) it owns. > > In short, you suggest to move the array of used klp_shadow_types from > struct klp_object to struct klp_patch. Do I get it correctly? Right. Though, thinking about it more, this isn't even needed. Each klp_shadow would have a pointer to its owning module. We already have a global hash of klp_shadows which can be iterated when the module gets unloaded or replaced. > > 1) add 'struct module *owner' or 'struct klp_patch *owner' to klp_shadow > > > > 2) add klp_shadow_alloc_gc() and klp_shadow_get_or_alloc_gc(), which are > > similar to their non-gc counterparts, with a few additional > > arguments: the klp module owner (THIS_MODULE for the caller); and a > > destructor to be used later for the garbage collection > > > > 3) When atomic replacing a patch, iterate through the klp_shadow_hash > > and, for each klp_shadow which previously had an owner, change it to > > be owned by the new patch > > This is not clear to me. The new livepatch might also use less shadow > variables. It must not blindly take over all shadow variables which > were owned by the previous livepatch. Assuming atomic replace, the new patch is almost always a superset of the old patch. We can optimize for that case. If the new patch needs to remove any old shadow variables, it can do so in its post-patch callback. > > 4) When unloading/freeing a patch, free all its associated klp_shadows > > (also calling destructors where applicable) > > > > > > I'm thinking this would be easier for the patch author, and also simpler > > overall. I could work up a patch. > > From the patch author POV: > > If the autodetection did not work then the patch author would still > need to provide the array of used shadow types. I agree that only > one array in struct klp_patch might be enough. > > > From the implementation POV: > > I agree that the code might be easier if we support only atomic > replace. We would not need the reference counter in this case. > > But I am not sure if this is acceptable for users that do not use > the atomic replace. They suffer from the same problem. Do we really > want to make this mode a 2nd citizen? IMHO, all applicable features > have been implemented for both modes so far. Non-replace patches would still be supported. Just with the restriction that garbage-collected shadow variables are by definition owned by a single patch module and thus can't be shared across patch modules.
On Tue 2022-11-08 10:44:10, Josh Poimboeuf wrote: > On Tue, Nov 08, 2022 at 10:14:18AM +0100, Petr Mladek wrote: > > On Mon 2022-11-07 17:32:09, Josh Poimboeuf wrote: > > > On Fri, Nov 04, 2022 at 11:25:38AM +0100, Petr Mladek wrote: > > > > > I get the feeling the latter would be easier to implement (no reference > > > > > counting; also maybe can be auto-detected with THIS_MODULE?) and harder > > > > > for the patch author to mess up (by accidentally omitting an object > > > > > which uses it). > > > > > > > > I am not sure how you mean it. I guess that you suggest to store > > > > the name of the livepatch module into the shadow variable. > > > > And use the variable only when the livepatch module is still loaded. > > > > > > Actually I was thinking the klp_patch could have references to all the > > > shadow variables (or shadow variable types?) it owns. > > > > In short, you suggest to move the array of used klp_shadow_types from > > struct klp_object to struct klp_patch. Do I get it correctly? > > Right. Though, thinking about it more, this isn't even needed. Each > klp_shadow would have a pointer to its owning module. We already have a > global hash of klp_shadows which can be iterated when the module gets > unloaded or replaced. > > Assuming atomic replace, the new patch is almost always a superset of > the old patch. We can optimize for that case. I see. But I do not agree with this assumption. The new livepatch might also remove code that caused problems. Also we allow downgrades. I mean that there is no versioning of livepatches. Older livepatches might replace new ones. We really want to support downgrades or upgrades that remove problematic code. Or upgrades that start fixing the problem a better way using another shadow variable. I personally think that registering the supported klp_shadow_types is worth the effort. It allows to do various changes a safe way. Also it should be easy to make sure that all klp_shadow_types are registered: 1. Grep might be used to find declarations in the source code. IMHO, it should work even with kPatch. 2. The klp_shadow_*() API will warn when a non-registered shadow variable is used. IMHO, this might be useful and catch some bugs on its own. Best Regards, Petr
Hi all, Petr Mladek <pmladek@suse.com> writes: > On Thu 2022-11-03 18:03:27, Josh Poimboeuf wrote: >> On Wed, Oct 26, 2022 at 04:41:22PM -0300, Marcos Paulo de Souza wrote: >> > The life of shadow variables is not completely trivial to maintain. >> > They might be used by more livepatches and more livepatched objects. >> > They should stay as long as there is any user. >> > >> > In practice, it requires to implement reference counting in callbacks >> > of all users. They should register all the user and remove the shadow >> > variables only when there is no user left. >> > >> > This patch hides the reference counting into the klp_shadow API. >> > The counter is connected with the shadow variable @id. It requires >> > an API to take and release the reference. The release function also >> > calls the related dtor() when defined. >> > >> > An easy solution would be to add some get_ref()/put_ref() API. >> > But it would need to get called from pre()/post_un() callbacks. >> > It might be easy to forget a callback and make it wrong. >> > >> > A more safe approach is to associate the klp_shadow_type with >> > klp_objects that use the shadow variables. The livepatch core >> > code might then handle the reference counters on background. >> > >> > The shadow variable type might then be added into a new @shadow_types >> > member of struct klp_object. They will get then automatically registered >> > and unregistered when the object is being livepatched. The registration >> > increments the reference count. Unregistration decreases the reference >> > count. All shadow variables of the given type are freed when the reference >> > count reaches zero. >> > >> > All klp_shadow_alloc/get/free functions also checks whether the requested >> > type is registered. It will help to catch missing registration and might >> > also help to catch eventual races. >> >> Is there a reason the shadow variable lifetime is tied to klp_object >> rather than klp_patch? > > Good question! > > My thinking was that shadow variables might be tight to variables > from a particular livepatched module. It would make sense to free them > when the only user (livepatched module) is removed. > > The question is if it is really important. I don't think so. For shadow variables attached to "real" data objects, I would expect those objects to be gone by the time the patched ko gets removed. For dummy shadows attached to NULL (see below), used for handing over shared state between atomic-replace livepatch modules, the memory footprint is negligible, I'd say. OTOH, callbacks are per klp_object already, so for API consistency, it might make sense to stick to this pattern for the shadows as well. But I don't have a real opinion on this. > I did re-read the original issue and the real problem was when the > livepatch was reloaded. The related variables were handled using > livepatched code, then without the livepatched code, and then with the > livepatched code again. > > The variable was modified with the original code that was not aware of > the shadow variable. As a result the state of the shadow variable was > not in sync when it was later used again. > Apologies for being late to the discussion. As I've not seen it mentioned anywhere in this thread here -- I haven't checked v1 though -- let me outline the primary limitations (as perceived by me) of the current shadow variables API here again, just to have it documented. In practice, an atomic-replace klp_patch is composed of a number of "sublivepatches" addressing individual CVEs or other issues. In general, it is usually true that a more recent version of a livepatch contains a superset of these sublivepatches and the individual sublivepatches' implementations never change in practice. However, users can downgrade an atomic-replace livepatch to an earlier version or disable a livepatch completely. From my experience, there are basically two relevant usage patterns of shadow variables. 1.) To hand over global state from one sublivepatch to its pendant in the to-be-applied livepatch module. Example: a new global mutex or alike. 2.) The "regular" intended usage, attaching schadow variables to real (data) objects. To manage lifetime for 1.), we usually implement some refcount scheme, managed from the livepatches' module_init()/_exit(): the next livepatch would subscribe to the shared state before the previous one got a chance to release it. This works in practice, but the code related to it is tedious to write and quite verbose. The second usage pattern is much more difficult to implement correctly in light of possible livepatch downgrades to a subset of sublivepatches. Usually a sublivepatch making use of a shadow variable attached to real objects would livepatch the associated object's destruction code to free up the associated shadow, if any. If the next livepatch to be applied happened to not contain this sublivepatch in question as well, the destruction code would effectively become unpatched, and any existing shadows leaked. Depending on the object type in question, this memory leakage might or might not be an actual problem, but it isn't nice either way. Often, there's a more subtle issue with the latter usecase though: the shadow continues to exist, but becomes unmaintained once the transitions has started. If said sublivepatch happens to become reactivated later on, it would potentially find stale shadows, and these could even get wrongly associated with a completely different object which happened to get allocated at the same memory address. Depending on the shadow type, this might or might not be Ok. New per-object locks or a "TLB flush needed" boolean would probably be Ok, but some kind of refcount would certainly not. There's not much which could be done from the pre-unpatch callbacks, because these aren't getting invoked for atomic-replace downgrades. We had quite some discussion internally on how to best approach these limitations, the outcome being (IIRC), that a more versatile callbacks support would perhaps quickly become too complex or error-prone to use correctly. So Petr proposed this garbage collection/refcounting mechanism posted here, which would solve the memory leakage issue as a first step (and would make shadow variable usage less verbose IMO). The consistency problem would still not be fully solved: consider a refcount-like shadow, where during the transition some acquirer had been unpatched already, while a releaser has not yet. But my hope is that we can later build on this work here and somehow resolve this as well. > Nicolai, your have the practical experience. Should the reference > counting be per-livepatched object or per-livepatch, please? See above, I think it won't matter much from a functionality POV. >> I get the feeling the latter would be easier to implement (no reference >> counting; also maybe can be auto-detected with THIS_MODULE?) and harder >> for the patch author to mess up (by accidentally omitting an object >> which uses it). > > I am not sure how you mean it. I guess that you suggest to store > the name of the livepatch module into the shadow variable. > And use the variable only when the livepatch module is still loaded. > > This would not help. No, if I'm understanding correctly, this would tie the shadow lifetime to one single livepatch module, which would make handing them over to the next atomic-replace livepatch impossible? > > We use the reference counting because the shadow variables should > survive an atomic replace of the lifepatch. In this case, the related > variables are still handled using a livepatched code that is aware > of the shadow variables. Yes. > Also it does not solve the problem that the shadow variable might > get out of sync with the related variable when the same > livepatch gets reloaded. We would still not have a complete guarantee that some shadow is consistently maintained over its full lifetime with this patchset here, see the refcount example from above. But the overall situation would be much better than before, IMO. Thanks! Nicolai
On Fri 2022-11-11 10:20:44, Nicolai Stange wrote: > Hi all, > > Petr Mladek <pmladek@suse.com> writes: > > > On Thu 2022-11-03 18:03:27, Josh Poimboeuf wrote: > >> On Wed, Oct 26, 2022 at 04:41:22PM -0300, Marcos Paulo de Souza wrote: > >> > The life of shadow variables is not completely trivial to maintain. > >> > They might be used by more livepatches and more livepatched objects. > >> > They should stay as long as there is any user. > >> > > >> > In practice, it requires to implement reference counting in callbacks > >> > of all users. They should register all the user and remove the shadow > >> > variables only when there is no user left. > >> > > >> > This patch hides the reference counting into the klp_shadow API. > >> > The counter is connected with the shadow variable @id. It requires > >> > an API to take and release the reference. The release function also > >> > calls the related dtor() when defined. > >> > > >> > An easy solution would be to add some get_ref()/put_ref() API. > >> > But it would need to get called from pre()/post_un() callbacks. > >> > It might be easy to forget a callback and make it wrong. > >> > > >> > A more safe approach is to associate the klp_shadow_type with > >> > klp_objects that use the shadow variables. The livepatch core > >> > code might then handle the reference counters on background. > >> > > >> > The shadow variable type might then be added into a new @shadow_types > >> > member of struct klp_object. They will get then automatically registered > >> > and unregistered when the object is being livepatched. The registration > >> > increments the reference count. Unregistration decreases the reference > >> > count. All shadow variables of the given type are freed when the reference > >> > count reaches zero. > >> > > >> > All klp_shadow_alloc/get/free functions also checks whether the requested > >> > type is registered. It will help to catch missing registration and might > >> > also help to catch eventual races. > >> > >> Is there a reason the shadow variable lifetime is tied to klp_object > >> rather than klp_patch? > > > > Good question! > > > > My thinking was that shadow variables might be tight to variables > > from a particular livepatched module. It would make sense to free them > > when the only user (livepatched module) is removed. > > > > The question is if it is really important. > > I don't think so. For shadow variables attached to "real" data objects, > I would expect those objects to be gone by the time the patched ko gets > removed. For dummy shadows attached to NULL (see below), used for > handing over shared state between atomic-replace livepatch modules, the > memory footprint is negligible, I'd say. > > OTOH, callbacks are per klp_object already, so for API consistency, it > might make sense to stick to this pattern for the shadows as well. But I > don't have a real opinion on this. Good point! > >From my experience, there are basically two relevant usage patterns of > shadow variables. > 1.) To hand over global state from one sublivepatch to its pendant in > the to-be-applied livepatch module. Example: a new global mutex or > alike. > 2.) The "regular" intended usage, attaching schadow variables to real > (data) objects. > > To manage lifetime for 1.), we usually implement some refcount scheme, > managed from the livepatches' module_init()/_exit(): the next livepatch > would subscribe to the shared state before the previous one got a chance > to release it. This works in practice, but the code related to it is > tedious to write and quite verbose. > > The second usage pattern is much more difficult to implement correctly > in light of possible livepatch downgrades to a subset of > sublivepatches. Usually a sublivepatch making use of a shadow variable > attached to real objects would livepatch the associated object's > destruction code to free up the associated shadow, if any. If the next > livepatch to be applied happened to not contain this sublivepatch in > question as well, the destruction code would effectively become > unpatched, and any existing shadows leaked. Depending on the object type > in question, this memory leakage might or might not be an actual > problem, but it isn't nice either way. > > Often, there's a more subtle issue with the latter usecase though: the > shadow continues to exist, but becomes unmaintained once the transitions > has started. If said sublivepatch happens to become reactivated later > on, it would potentially find stale shadows, and these could even get > wrongly associated with a completely different object which happened to > get allocated at the same memory address. Depending on the shadow type, > this might or might not be Ok. New per-object locks or a "TLB flush > needed" boolean would probably be Ok, but some kind of refcount would > certainly not. There's not much which could be done from the pre-unpatch > callbacks, because these aren't getting invoked for atomic-replace > downgrades. > > We had quite some discussion internally on how to best approach these > limitations, the outcome being (IIRC), that a more versatile callbacks > support would perhaps quickly become too complex or error-prone to use > correctly. So Petr proposed this garbage collection/refcounting > mechanism posted here, which would solve the memory leakage issue as a > first step (and would make shadow variable usage less verbose IMO). > > The consistency problem would still not be fully solved: consider a > refcount-like shadow, where during the transition some acquirer had been > unpatched already, while a releaser has not yet. But my hope is that we > can later build on this work here and somehow resolve this as well. > > > Nicolai, your have the practical experience. Should the reference > > counting be per-livepatched object or per-livepatch, please? > > See above, I think it won't matter much from a functionality POV. I would personally keep it tied together with the livepatched object just to be on the safe side. Is this acceptable for kPatch users in the end? Best Regards, Petr
On Fri, Nov 11, 2022 at 10:55:38AM +0100, Petr Mladek wrote: > > >From my experience, there are basically two relevant usage patterns of > > shadow variables. > > 1.) To hand over global state from one sublivepatch to its pendant in > > the to-be-applied livepatch module. Example: a new global mutex or > > alike. > > 2.) The "regular" intended usage, attaching schadow variables to real > > (data) objects. > > > > To manage lifetime for 1.), we usually implement some refcount scheme, > > managed from the livepatches' module_init()/_exit(): the next livepatch > > would subscribe to the shared state before the previous one got a chance > > to release it. This works in practice, but the code related to it is > > tedious to write and quite verbose. > > > > The second usage pattern is much more difficult to implement correctly > > in light of possible livepatch downgrades to a subset of > > sublivepatches. Usually a sublivepatch making use of a shadow variable > > attached to real objects would livepatch the associated object's > > destruction code to free up the associated shadow, if any. If the next > > livepatch to be applied happened to not contain this sublivepatch in > > question as well, the destruction code would effectively become > > unpatched, and any existing shadows leaked. Depending on the object type > > in question, this memory leakage might or might not be an actual > > problem, but it isn't nice either way. > > > > Often, there's a more subtle issue with the latter usecase though: the > > shadow continues to exist, but becomes unmaintained once the transitions > > has started. If said sublivepatch happens to become reactivated later > > on, it would potentially find stale shadows, and these could even get > > wrongly associated with a completely different object which happened to > > get allocated at the same memory address. Depending on the shadow type, > > this might or might not be Ok. New per-object locks or a "TLB flush > > needed" boolean would probably be Ok, but some kind of refcount would > > certainly not. There's not much which could be done from the pre-unpatch > > callbacks, because these aren't getting invoked for atomic-replace > > downgrades. > > > > We had quite some discussion internally on how to best approach these > > limitations, the outcome being (IIRC), that a more versatile callbacks > > support would perhaps quickly become too complex or error-prone to use > > correctly. So Petr proposed this garbage collection/refcounting > > mechanism posted here, which would solve the memory leakage issue as a > > first step (and would make shadow variable usage less verbose IMO). > > > > The consistency problem would still not be fully solved: consider a > > refcount-like shadow, where during the transition some acquirer had been > > unpatched already, while a releaser has not yet. But my hope is that we > > can later build on this work here and somehow resolve this as well. It would be great to have all this motivation for the new feature documented in shadow-vars.rst. > > > Nicolai, your have the practical experience. Should the reference > > > counting be per-livepatched object or per-livepatch, please? > > > > See above, I think it won't matter much from a functionality POV. > > I would personally keep it tied together with the livepatched object > just to be on the safe side. If downgrades are going to be commonplace then I agree my automatic detection idea wouldn't work so well. And ref counting does make sense. However I'm still not convinced that per-object is the way to go. Doesn't that create more room for human error? There's no way to detect-and-warn if the wrong object is used, or if the variable is used by multiple objects but only one of them is listed. Per-patch shadow variable types would be easy to detect and warn on misuse. And easy to automate in patch author tooling. Also, I'm not crazy about the new API. It's even more confusing than before.
Hi, I am sorry for answering this so late. It somehow fallen under cracks. On Sun 2022-11-13 10:51:38, Josh Poimboeuf wrote: > On Fri, Nov 11, 2022 at 10:55:38AM +0100, Petr Mladek wrote: > > > >From my experience, there are basically two relevant usage patterns of > > > shadow variables. > > > 1.) To hand over global state from one sublivepatch to its pendant in > > > the to-be-applied livepatch module. Example: a new global mutex or > > > alike. > > > 2.) The "regular" intended usage, attaching shadow variables to real > > > (data) objects. > > > > > > To manage lifetime for 1.), we usually implement some refcount scheme, > > > managed from the livepatches' module_init()/_exit(): the next livepatch > > > would subscribe to the shared state before the previous one got a chance > > > to release it. This works in practice, but the code related to it is > > > tedious to write and quite verbose. > > > > > > The second usage pattern is much more difficult to implement correctly > > > in light of possible livepatch downgrades to a subset of > > > sublivepatches. Usually a sublivepatch making use of a shadow variable > > > attached to real objects would livepatch the associated object's > > > destruction code to free up the associated shadow, if any. If the next > > > livepatch to be applied happened to not contain this sublivepatch in > > > question as well, the destruction code would effectively become > > > unpatched, and any existing shadows leaked. Depending on the object type > > > in question, this memory leakage might or might not be an actual > > > problem, but it isn't nice either way. > > > > > > Often, there's a more subtle issue with the latter usecase though: the > > > shadow continues to exist, but becomes unmaintained once the transitions > > > has started. If said sublivepatch happens to become reactivated later > > > on, it would potentially find stale shadows, and these could even get > > > wrongly associated with a completely different object which happened to > > > get allocated at the same memory address. Depending on the shadow type, > > > this might or might not be Ok. New per-object locks or a "TLB flush > > > needed" boolean would probably be Ok, but some kind of refcount would > > > certainly not. There's not much which could be done from the pre-unpatch > > > callbacks, because these aren't getting invoked for atomic-replace > > > downgrades. IMHO, this is the reason why we should make it per-object. If the shadow variable was used by a livepatched module and we remove this module then the shadow variables would get unmaintained. It would results in the problem described in this paragraph. > > > We had quite some discussion internally on how to best approach these > > > limitations, the outcome being (IIRC), that a more versatile callbacks > > > support would perhaps quickly become too complex or error-prone to use > > > correctly. So Petr proposed this garbage collection/refcounting > > > mechanism posted here, which would solve the memory leakage issue as a > > > first step (and would make shadow variable usage less verbose IMO). > > > > > > The consistency problem would still not be fully solved: consider a > > > refcount-like shadow, where during the transition some acquirer had been > > > unpatched already, while a releaser has not yet. But my hope is that we > > > can later build on this work here and somehow resolve this as well. > > It would be great to have all this motivation for the new feature > documented in shadow-vars.rst. > > > > > Nicolai, your have the practical experience. Should the reference > > > > counting be per-livepatched object or per-livepatch, please? > > > > > > See above, I think it won't matter much from a functionality POV. > > > > I would personally keep it tied together with the livepatched object > > just to be on the safe side. > > If downgrades are going to be commonplace then I agree my automatic > detection idea wouldn't work so well. And ref counting does make sense. > > However I'm still not convinced that per-object is the way to go. > Doesn't that create more room for human error? There's no way to > detect-and-warn if the wrong object is used, or if the variable is used > by multiple objects but only one of them is listed. I agree that these mistakes might happen. And I really do not see any reasonable way how to auto-detect which livepatched object uses which shadow variables. Well, the per-object registration allows to register all shadow types for the "vmlinux" object that is always loaded. It would work as you suggested. I mean that it would keep the shadow variables registered as long as the livepatch is loaded. But I would still like to keep the registration per-object. It would allow to handle re-load of livepatched modules the right way when needed. And I agree that we should document these pitfalls in the documentation. > Per-patch shadow variable types would be easy to detect and warn on > misuse. And easy to automate in patch author tooling. > > Also, I'm not crazy about the new API. It's even more confusing than > before. I do not think that it made the API much worse. It actually made some things more safe and easier. The ctor/dtor callbacks are defined by the shadow type. It removes the risk to passing a wrong one. Also there is not need to free the unused shadow variables in post_un() callbacks. It was actually pretty tricky because it required the reference counting. Best Regards, Petr
On Tue, Jan 17, 2023 at 04:01:57PM +0100, Petr Mladek wrote: > IMHO, this is the reason why we should make it per-object. > > If the shadow variable was used by a livepatched module and we remove > this module then the shadow variables would get unmaintained. It would > results in the problem described in this paragraph. Yes, that makes sense. Ok, I'm convinced. BTW, this is yet another unfortunate consequence of our decision many years ago to break the module dependency between a livepatch module and the modules it patches. We already have a lot of technical debt as a result of that decision and it continues to pile up. In that vein see also Song's and my recent patches to fix module re-patching.
On Wed 2023-01-25 15:22:48, Josh Poimboeuf wrote: > On Tue, Jan 17, 2023 at 04:01:57PM +0100, Petr Mladek wrote: > > IMHO, this is the reason why we should make it per-object. > > > > If the shadow variable was used by a livepatched module and we remove > > this module then the shadow variables would get unmaintained. It would > > results in the problem described in this paragraph. > > Yes, that makes sense. Ok, I'm convinced. Thanks! > BTW, this is yet another unfortunate consequence of our decision many > years ago to break the module dependency between a livepatch module and > the modules it patches. We already have a lot of technical debt as a > result of that decision and it continues to pile up. > > In that vein see also Song's and my recent patches to fix module > re-patching. Yeah. Just for record, I have played with splitting the livepatch module some years ago. It was quite tricky. The main problem was loading all the needed livepatch modules and synchronizing their load with the livepatched modules. Few more details were mentioned in https://lore.kernel.org/r/Ytp+u2mGPk5+7Tvf@alley Best Regards, Petr
On Wed, Oct 26, 2022 at 04:41:22PM -0300, Marcos Paulo de Souza wrote: > /** > * struct klp_object - kernel object structure for live patching > * @name: module name (or NULL for vmlinux) > * @funcs: function entries for functions to be patched in the object > * @callbacks: functions to be executed pre/post (un)patching > + * @shadow_types: shadow variable types used by the livepatch for the klp_object Please change the indentation of the other field descriptions so they match (here and elsewhere in the patch). > @@ -222,13 +226,30 @@ typedef void (*klp_shadow_dtor_t)(void *obj, void *shadow_data); > * @ctor: custom constructor to initialize the shadow data (optional) > * @dtor: custom callback that can be used to unregister the variable > * and/or free data that the shadow variable points to (optional) > + * @registered: flag indicating if the variable was successfully registered "variable" -> "type" > + * > + * All shadow variables used by the livepatch for the related klp_object "variables" -> "variable types" ? > + * must be listed here so that they are registered when the livepatch > + * and the module is loaded. Otherwise, it will not be possible to Where is "here"? Does it mean to say "must be listed in the klp_object"? Though, I don't think even that is true, since the user can manually call klp_shadow_register(). "module" -> "object" ? > + * allocate them. > */ > struct klp_shadow_type { > unsigned long id; > klp_shadow_ctor_t ctor; > klp_shadow_dtor_t dtor; > + > + /* internal */ > + bool registered; > }; > > +#define klp_for_each_shadow_type(obj, shadow_type, i) \ > + for (shadow_type = obj->shadow_types ? obj->shadow_types[0] : NULL, i = 1; \ > + shadow_type; \ > + shadow_type = obj->shadow_types[i++]) > + > +int klp_shadow_register(struct klp_shadow_type *shadow_type); > +void klp_shadow_unregister(struct klp_shadow_type *shadow_type); More cases where 'shadow_type' can be shortened to 'type'. (And the same comment elsewhere) > + > void *klp_shadow_get(void *obj, struct klp_shadow_type *shadow_type); > void *klp_shadow_alloc(void *obj, struct klp_shadow_type *shadow_type, > size_t size, gfp_t gfp_flags, void *ctor_data); I find the type-based interface to be unnecessarily clunky and confusing: - It obscures the fact that uniqueness is determined by ID, not by type. - It's weird to be passing around the type even after it has been registered: e.g., why doesn't klp remember the ctor/dtor? - It complicates the registration interface for the normal case where we normally don't have constructors/destructors. - I don't understand the exposed klp_shadow_{un}register() interfaces. What's the point of forcing their use if there's no garbage collection? - It's internally confusing in klp to have both 'type' and 'type_reg'. I don't have any real suggestions yet, I'll need to think a little more about it. One question: has anybody actually used destructors in the real world? > @@ -988,6 +1012,13 @@ static int __klp_enable_patch(struct klp_patch *patch) > if (!klp_is_object_loaded(obj)) > continue; > > + ret = klp_register_shadow_types(obj); > + if (ret) { > + pr_warn("failed to register shadow types for object '%s'\n", > + klp_is_module(obj) ? obj->name : "vmlinux"); > + goto err; > + } > + If this fails for some reason then the error path doesn't unregister. Presumably klp_register_shadow_types() needs to be self-cleaning on error like klp_patch_object() is. And BTW, it might be easier to do so if klp_shadow_type has a 'reg' pointer to its corresponding reg struct. Then the 'registered' boolean is no longer needed and klp_shadow_unregister() doesn't need to call klp_shadow_type_get_reg(). > +++ b/kernel/livepatch/shadow.c > @@ -34,6 +34,7 @@ > #include <linux/hashtable.h> > #include <linux/slab.h> > #include <linux/livepatch.h> > +#include "core.h" > > static DEFINE_HASHTABLE(klp_shadow_hash, 12); > > @@ -59,6 +60,22 @@ struct klp_shadow { > char data[]; > }; > > +/** > + * struct klp_shadow_type_reg - information about a registered shadow > + * variable type > + * @id: shadow variable type indentifier "identifier" > +static struct klp_shadow_type_reg * > +klp_shadow_type_get_reg(struct klp_shadow_type *shadow_type) > +{ > + struct klp_shadow_type_reg *shadow_type_reg; > + lockdep_assert_held(&klp_shadow_lock); > + > + list_for_each_entry(shadow_type_reg, &klp_shadow_types, list) { > + if (shadow_type_reg->id == shadow_type->id) > + return shadow_type_reg; > + } > + > + return NULL; > +} It seems like overkill to have both klp_shadow_hash and klp_shadow_types. For example 'struct klp_shadow' could have a link to its type and then klp_shadow_type_get_reg could iterate through the klp_shadow_hash. > + > +/** > + * klp_shadow_register() - register the given shadow variable type > + * @shadow_type: shadow type to be registered > + * > + * Tell the system that the given shadow type is going to used by the caller > + * (livepatch module). It allows to check and maintain lifetime of shadow > + * variables. It's probably worth mentioning here that this function typically isn't called directly by the livepatch, and is only needed if the klp_object doesn't have the type in its 'shadow_types' array. > + * > + * Return: 0 on suceess, -ENOMEM when there is not enough memory. "success" > + */ > +int klp_shadow_register(struct klp_shadow_type *shadow_type) > +{ > + struct klp_shadow_type_reg *shadow_type_reg; > + struct klp_shadow_type_reg *new_shadow_type_reg; > + > + new_shadow_type_reg = > + kzalloc(sizeof(struct klp_shadow_type_reg), GFP_KERNEL); > + if (!new_shadow_type_reg) > + return -ENOMEM; > + > + spin_lock_irq(&klp_shadow_lock); > + > + if (shadow_type->registered) { > + pr_err("Trying to register shadow variable type that is already registered: %lu", > + shadow_type->id); > + kfree(new_shadow_type_reg); > + goto out; Shouldn't this return -EINVAL or so? > + } > + > + shadow_type_reg = klp_shadow_type_get_reg(shadow_type); > + if (!shadow_type_reg) { > + shadow_type_reg = new_shadow_type_reg; > + shadow_type_reg->id = shadow_type->id; > + list_add(&shadow_type_reg->list, &klp_shadow_types); > + } else { > + kfree(new_shadow_type_reg); > + } > + > + shadow_type_reg->ref_cnt++; > + shadow_type->registered = true; > +out: > + spin_unlock_irq(&klp_shadow_lock); > + > + return 0; > +} > +EXPORT_SYMBOL_GPL(klp_shadow_register); > + > +/** > + * klp_shadow_unregister() - unregister the give shadow variable type "given" > + * @shadow_type: shadow type to be unregistered > + * > + * Tell the system that a given shadow variable ID is not longer be used by "is no longer used" > + * the caller (livepatch module). All existing shadow variables are freed > + * when it was the last registered user. > + */ > +void klp_shadow_unregister(struct klp_shadow_type *shadow_type) > +{ > + struct klp_shadow_type_reg *shadow_type_reg; > + > + spin_lock_irq(&klp_shadow_lock); > + > + if (!shadow_type->registered) { > + pr_err("Trying to unregister shadow variable type that is not registered: %lu", > + shadow_type->id); > + goto out; > + } > + > + shadow_type_reg = klp_shadow_type_get_reg(shadow_type); > + if (!shadow_type_reg) { > + pr_err("Can't find shadow variable type registration: %lu", shadow_type->id); > + goto out; > + } Since it's too late to report these errors and they might indicate a major bug, maybe they should WARN().
On Mon 2023-01-30 20:40:08, Josh Poimboeuf wrote: > On Wed, Oct 26, 2022 at 04:41:22PM -0300, Marcos Paulo de Souza wrote: > > +#define klp_for_each_shadow_type(obj, shadow_type, i) \ > > + for (shadow_type = obj->shadow_types ? obj->shadow_types[0] : NULL, i = 1; \ > > + shadow_type; \ > > + shadow_type = obj->shadow_types[i++]) > > + > > +int klp_shadow_register(struct klp_shadow_type *shadow_type); > > +void klp_shadow_unregister(struct klp_shadow_type *shadow_type); > > More cases where 'shadow_type' can be shortened to 'type'. > (And the same comment elsewhere) Works for me. > > + > > void *klp_shadow_get(void *obj, struct klp_shadow_type *shadow_type); > > void *klp_shadow_alloc(void *obj, struct klp_shadow_type *shadow_type, > > size_t size, gfp_t gfp_flags, void *ctor_data); > > I find the type-based interface to be unnecessarily clunky and > confusing: > > - It obscures the fact that uniqueness is determined by ID, not by type. > > - It's weird to be passing around the type even after it has been > registered: e.g., why doesn't klp remember the ctor/dtor? ctor/dtor must be implemented in each livepatch. klp_shadow_register() would need to remember all registered implementations. klp_shadow_alloc/get/free would need to find and choose one. It would add an extra overhead and non-determines. The overhead might be negligible. The callbacks should be compatible. But still, is it worth the potential troubles? IMHO, it is just about POV. Does it really matter if we pass id or type? Switching to type might be a slight complication for existing API users that are used to ID. But we are changing the API anyway. > - It complicates the registration interface for the normal case where > we normally don't have constructors/destructors. I am sorry but I do not understand this concern. The new API hides ctor/dtor into struct klp_shadow_type. It removes one NULL parameter from klp_alloc/get/free API. It looks like a simplification to me. > - I don't understand the exposed klp_shadow_{un}register() interfaces. > What's the point of forcing their use if there's no garbage > collection? I probably do not understand it correctly. Normal livepatch developers do not need to use klp_shadow_{un}register(). They are called transparently in __klp_enable_patch()/klp_complete_transition() and klp_module_coming()/klp_cleanup_module_patches_limited(). The main reason why they are exposed via include/linux/livepatch.h and EXPORT_SYMBOL_GPL() is the selftest lib/livepatch/test_klp_shadow_vars.c. Well, the selftest probably could be bundled into a livepatch to go around this. > - It's internally confusing in klp to have both 'type' and 'type_reg'. I do not like it much either. An idea. We could replace "bool registered" with "struct list_head register_node" in struct klp_shadow_type. Then we could use it for registration. All the registered types will be in a global list (klp_type_register). klp_shadow_unregister() would do the garbage collection when it removed the last entry with the given id from the global list. > One question: has anybody actually used destructors in the real world? > > > @@ -988,6 +1012,13 @@ static int __klp_enable_patch(struct klp_patch *patch) > > if (!klp_is_object_loaded(obj)) > > continue; > > > > + ret = klp_register_shadow_types(obj); > > + if (ret) { > > + pr_warn("failed to register shadow types for object '%s'\n", > > + klp_is_module(obj) ? obj->name : "vmlinux"); > > + goto err; > > + } > > + > > If this fails for some reason then the error path doesn't unregister. > Presumably klp_register_shadow_types() needs to be self-cleaning on > error like klp_patch_object() is. Good point! They actually are unregisterd via: + __klp_enable_patch() + klp_cancel_transition() # err: in __klp_enabled_patch() + klp_complete_transition() + klp_unregister_shadow_types() But there is a problem. It tries to unregister all types. We have to make sure that it does not complain when a type has not been registered yet. > And BTW, it might be easier to do so if klp_shadow_type has a 'reg' > pointer to its corresponding reg struct. Then the 'registered' boolean > is no longer needed and klp_shadow_unregister() doesn't need to call > klp_shadow_type_get_reg(). Yup. It would be easier also if the use list_head pointing to the global register. > > +++ b/kernel/livepatch/shadow.c > > > +static struct klp_shadow_type_reg * > > +klp_shadow_type_get_reg(struct klp_shadow_type *shadow_type) > > +{ > > + struct klp_shadow_type_reg *shadow_type_reg; > > + lockdep_assert_held(&klp_shadow_lock); > > + > > + list_for_each_entry(shadow_type_reg, &klp_shadow_types, list) { > > + if (shadow_type_reg->id == shadow_type->id) > > + return shadow_type_reg; > > + } > > + > > + return NULL; > > +} > > It seems like overkill to have both klp_shadow_hash and > klp_shadow_types. For example 'struct klp_shadow' could have a link to > its type and then klp_shadow_type_get_reg could iterate through the > klp_shadow_hash. I guess that there is a misunderstanding. We need two databases: + One for the registered shadow_types. I does the reference counting. It counts the number of livepatched objects (livepatches) that might the shadow type. It says _when_ the garbage collection must be done. + Second for existing shadow variables. It is needed for finding the shadow data. It says _what_ variables have to freed when the garbage collection is being proceed. > > + > > +/** > > + * klp_shadow_register() - register the given shadow variable type > > + * @shadow_type: shadow type to be registered > > + * > > + * Tell the system that the given shadow type is going to used by the caller > > + * (livepatch module). It allows to check and maintain lifetime of shadow > > + * variables. > > It's probably worth mentioning here that this function typically isn't > called directly by the livepatch, and is only needed if the klp_object > doesn't have the type in its 'shadow_types' array. Yup. > > + * > > + * Return: 0 on suceess, -ENOMEM when there is not enough memory. > > "success" > > > + */ > > +int klp_shadow_register(struct klp_shadow_type *shadow_type) > > +{ > > + struct klp_shadow_type_reg *shadow_type_reg; > > + struct klp_shadow_type_reg *new_shadow_type_reg; > > + > > + new_shadow_type_reg = > > + kzalloc(sizeof(struct klp_shadow_type_reg), GFP_KERNEL); > > + if (!new_shadow_type_reg) > > + return -ENOMEM; > > + > > + spin_lock_irq(&klp_shadow_lock); > > + > > + if (shadow_type->registered) { > > + pr_err("Trying to register shadow variable type that is already registered: %lu", > > + shadow_type->id); > > + kfree(new_shadow_type_reg); > > + goto out; > > Shouldn't this return -EINVAL or so? Yes, it would make more sense. > > + } > > + > > + shadow_type_reg = klp_shadow_type_get_reg(shadow_type); > > + if (!shadow_type_reg) { > > + shadow_type_reg = new_shadow_type_reg; > > + shadow_type_reg->id = shadow_type->id; > > + list_add(&shadow_type_reg->list, &klp_shadow_types); > > + } else { > > + kfree(new_shadow_type_reg); > > + } > > + > > + shadow_type_reg->ref_cnt++; > > + shadow_type->registered = true; > > +out: > > + spin_unlock_irq(&klp_shadow_lock); > > + > > + return 0; > > +} > > +EXPORT_SYMBOL_GPL(klp_shadow_register); > > + > > +void klp_shadow_unregister(struct klp_shadow_type *shadow_type) > > +{ > > + struct klp_shadow_type_reg *shadow_type_reg; > > + > > + spin_lock_irq(&klp_shadow_lock); > > + > > + if (!shadow_type->registered) { > > + pr_err("Trying to unregister shadow variable type that is not registered: %lu", > > + shadow_type->id); > > + goto out; > > + } > > + > > + shadow_type_reg = klp_shadow_type_get_reg(shadow_type); > > + if (!shadow_type_reg) { > > + pr_err("Can't find shadow variable type registration: %lu", shadow_type->id); > > + goto out; > > + } > > Since it's too late to report these errors and they might indicate a > major bug, maybe they should WARN(). This probably won't be needed after we get rid of shadow_type_reg. Also this is actually a valid scenario. klp_complete_transition() might try to unregister a not-yet-registered type. It might happen when called from __klp_enabled_patch() error path. Thanks a lot for the review. Best Regards, Petr
On Tue, Jan 31, 2023 at 03:23:58PM +0100, Petr Mladek wrote: > > > + > > > void *klp_shadow_get(void *obj, struct klp_shadow_type *shadow_type); > > > void *klp_shadow_alloc(void *obj, struct klp_shadow_type *shadow_type, > > > size_t size, gfp_t gfp_flags, void *ctor_data); > > > > I find the type-based interface to be unnecessarily clunky and > > confusing: > > > > - It obscures the fact that uniqueness is determined by ID, not by type. > > > > - It's weird to be passing around the type even after it has been > > registered: e.g., why doesn't klp remember the ctor/dtor? > > ctor/dtor must be implemented in each livepatch. klp_shadow_register() > would need to remember all registered implementations. > klp_shadow_alloc/get/free would need to find and choose one. > It would add an extra overhead and non-determines. There's really no need to even "register" the constructor. It can continue to just be passed as needed to the alloc functions. (Side note, I don't see why klp_shadow_alloc() even needs a constructor as it's typically used when its corresponding object gets allocated, so its initialization doesn't need to be atomic. klp_shadow_get_or_alloc() on the other hand, does need a constructor since it's used for attaching to already-existing objects.) The thing really complicating this whole mess of an API is the destructor. The problem is that it can change when replacing livepatches, so we can't just remember it in the registration. At least at Red Hat, I don't think we've ever used a shadow destructor. Its real world use seems somewhere between rare and non-existent. I guess a destructor would be needed if the constructor (or some other initialization) allocated additional memory associated with the shadow variable. That data would need to be freed. But in that case couldn't an additional shadow variable be used for the additional memory? Then we could just get rid of this idea of the destructor and this API could become much more straightforward. Alternatively, each livepatch could just have an optional global destructor which is called for every object which needs destructing. It could then use the ID to decide what needs done (or pass it off to a more specific destructor). > > - I don't understand the exposed klp_shadow_{un}register() interfaces. > > What's the point of forcing their use if there's no garbage > > collection? > > I probably do not understand it correctly. > > Normal livepatch developers do not need to use klp_shadow_{un}register(). > They are called transparently in __klp_enable_patch()/klp_complete_transition() > and klp_module_coming()/klp_cleanup_module_patches_limited(). > > The main reason why they are exposed via include/linux/livepatch.h > and EXPORT_SYMBOL_GPL() is the selftest lib/livepatch/test_klp_shadow_vars.c. > > Well, the selftest probably could be bundled into a livepatch to go > around this. In that case, at the very least they should be prefixed with underscores and the function comment should make it clear they shouldn't be called under normal usage. > > - It's internally confusing in klp to have both 'type' and 'type_reg'. > > I do not like it much either. > > An idea. We could replace "bool registered" with "struct list_head > register_node" in struct klp_shadow_type. Then we could use it > for registration. > > All the registered types will be in a global list (klp_type_register). > klp_shadow_unregister() would do the garbage collection when it > removed the last entry with the given id from the global list. Yeah, that does sound better (assuming we decide to keep this type thing). > > It seems like overkill to have both klp_shadow_hash and > > klp_shadow_types. For example 'struct klp_shadow' could have a link to > > its type and then klp_shadow_type_get_reg could iterate through the > > klp_shadow_hash. > > I guess that there is a misunderstanding. We need two databases: > > + One for the registered shadow_types. I does the reference > counting. It counts the number of livepatched objects > (livepatches) that might the shadow type. It says _when_ the garbage > collection must be done. > > + Second for existing shadow variables. It is needed for finding > the shadow data. It says _what_ variables have to freed when > the garbage collection is being proceed. Yup, not sure what I was thinking, please ignore...
On Wed, Oct 26, 2022 at 04:41:22PM -0300, Marcos Paulo de Souza wrote: > The shadow variable type might then be added into a new @shadow_types > member of struct klp_object. They will get then automatically registered > and unregistered when the object is being livepatched. The registration > increments the reference count. Unregistration decreases the reference > count. All shadow variables of the given type are freed when the reference > count reaches zero. How does the automatic unregistration work for replaced patches? I see klp_unpatch_replaced_patches() is called, but I don't see where it ends up calling klp_shadow_unregister() for the replaced patch(es).
On Tue 2023-01-31 16:18:17, Josh Poimboeuf wrote: > On Wed, Oct 26, 2022 at 04:41:22PM -0300, Marcos Paulo de Souza wrote: > > The shadow variable type might then be added into a new @shadow_types > > member of struct klp_object. They will get then automatically registered > > and unregistered when the object is being livepatched. The registration > > increments the reference count. Unregistration decreases the reference > > count. All shadow variables of the given type are freed when the reference > > count reaches zero. > > How does the automatic unregistration work for replaced patches? > > I see klp_unpatch_replaced_patches() is called, but I don't see where it > ends up calling klp_shadow_unregister() for the replaced patch(es). Great catch! I forgot that replaced patches are handled separately. We should do the following (on top of this patch): --- a/kernel/livepatch/transition.c +++ b/kernel/livepatch/transition.c @@ -123,8 +123,11 @@ static void klp_complete_transition(void) continue; if (klp_target_state == KLP_PATCHED) klp_post_patch_callback(obj); - else if (klp_target_state == KLP_UNPATCHED) { + else if (klp_target_state == KLP_UNPATCHED) klp_post_unpatch_callback(obj); + + if (klp_target_state == KLP_UNPATCHED || + klp_transition_patch->replace) { klp_unregister_shadow_types(obj); } } Also we should add more selftests for handling the shadow variables. Best Regards, Petr
On Tue 2023-01-31 13:17:31, Josh Poimboeuf wrote: > On Tue, Jan 31, 2023 at 03:23:58PM +0100, Petr Mladek wrote: > > > > + > > > > void *klp_shadow_get(void *obj, struct klp_shadow_type *shadow_type); > > > > void *klp_shadow_alloc(void *obj, struct klp_shadow_type *shadow_type, > > > > size_t size, gfp_t gfp_flags, void *ctor_data); > > > > > > I find the type-based interface to be unnecessarily clunky and > > > confusing: > > > > > > - It obscures the fact that uniqueness is determined by ID, not by type. > > > > > > - It's weird to be passing around the type even after it has been > > > registered: e.g., why doesn't klp remember the ctor/dtor? > > > > ctor/dtor must be implemented in each livepatch. klp_shadow_register() > > would need to remember all registered implementations. > > klp_shadow_alloc/get/free would need to find and choose one. > > It would add an extra overhead and non-determines. > > There's really no need to even "register" the constructor. It can > continue to just be passed as needed to the alloc functions. Sure. But it looks weird to handle the constructor another way then the destructor. I mean that if we registrer the destructor then we should registrer the constructor as well. I see an analogy with C++ here. klp_shadow_register_type() is similar to defining a class. And both constructor and destructor are defined in the class definition. A custom constructor in C++ might allow to pass extra parameters needed for initialization. In our case, these are obj, size, gpf_flags, ctor_data. > (Side note, I don't see why klp_shadow_alloc() even needs a constructor > as it's typically used when its corresponding object gets allocated, so > its initialization doesn't need to be atomic. klp_shadow_get_or_alloc() > on the other hand, does need a constructor since it's used for attaching > to already-existing objects.) If we agree that klp_shadow_get_or_alloc() needed a constructor then klp_shadow_alloc() should do the same. Different behavior would add more confusion from my POV. > The thing really complicating this whole mess of an API is the > destructor. The problem is that it can change when replacing > livepatches, so we can't just remember it in the registration. > > At least at Red Hat, I don't think we've ever used a shadow destructor. > Its real world use seems somewhere between rare and non-existent. I checked SUSE livepatches and we use the destructor once for mutex_destroy(). I guess that it made the livepatch more safe. Also it might have been useful for testing. > I guess a destructor would be needed if the constructor (or some other > initialization) allocated additional memory associated with the shadow > variable. That data would need to be freed. It seems that the constructor is primary used to initialize the structure members, for example, locks. The commit e91c2518a5d22a07642f35 ("livepatch: Initialize shadow variables safely by a custom callback") mentions as an example list_head. The constructor might want to link it into another list. In that case, the destructor would do the opposite. > But in that case couldn't an additional shadow variable be used for the > additional memory? Then we could just get rid of this idea of the > destructor and this API could become much more straightforward. > > Alternatively, each livepatch could just have an optional global > destructor which is called for every object which needs destructing. It > could then use the ID to decide what needs done (or pass it off to a > more specific destructor). Honestly, I do not think that it would really make things easier for people developing the livepatches. Summary: My understanding is that you do not like the following things: 1. Problem: Having both struct klp_shadow_type and struct klp_shadow_type_reg. Proposal: Get rid of struct klp_shadow_type_reg. Instead, add a list_head into struct klp_shadow_type and use it for registration and reference counting. 2. Problem: Using struct klp_shadow_type * instead of ID in the klp_shadow API. Solution: a) Find the shadow_type by ID. b) Get rid of klp_shadow_type completely 3. Problem: The API is obscure in general and this makes it even worse Solution: a) Do not pass constructor in klp_shadow_alloc() b) Do not support destructor c) Have global destructor Requirements: We agree that a constructor is needed at least for klp_shadow_get_or_alloc(). The garbage collection solves a real problem. It provides a real solution instead of custom hacks. My opinion: The shadow API is used rarely so that livepatch developer do not have much experience with it. The shadow API is usually used for some tricky things. And it is used for custom livepatch-specific modifications that do not get much review. An ideal API should be simple. But we agree that the constructor is needed in klp_shadow_get_or_alloc(). It defines the basic complexity of the API. The API should be predictable. IMHO, klp_shadow_alloc() should handle the constructor the same way as klp_shadow_get_or_alloc(). Also it would help when the API is symmetric. I would keep the destructor. And if register destructor then we should register constructor as well. I do not see big difference between passing ID or struct klp_shadow_type * in the API. IMHO, it is better than a common destructor. Summary: I would get rid of struct klp_shadow_type_reg and keep the rest as is. Proposal: If we can't find an agreenment. Then the decision should be made by people actually using the API. On the SUSE side, it is Nicolai and Marcos. IMHO, the main question is whether we need custom destructors. We could avoid struct klp_shadow_type if the destructors are not needed... > > > - I don't understand the exposed klp_shadow_{un}register() interfaces. > > > What's the point of forcing their use if there's no garbage > > > collection? > > > > I probably do not understand it correctly. > > > > Normal livepatch developers do not need to use klp_shadow_{un}register(). > > They are called transparently in __klp_enable_patch()/klp_complete_transition() > > and klp_module_coming()/klp_cleanup_module_patches_limited(). > > > > The main reason why they are exposed via include/linux/livepatch.h > > and EXPORT_SYMBOL_GPL() is the selftest lib/livepatch/test_klp_shadow_vars.c. > > > > Well, the selftest probably could be bundled into a livepatch to go > > around this. > > In that case, at the very least they should be prefixed with underscores > and the function comment should make it clear they shouldn't be called > under normal usage. Good point. Well, I think that we should start testing the klp_shadow API using livepatches. So that it won't be necessary to export the registration API. Best Regards, Petr
On Thu, Feb 02, 2023 at 11:14:15AM +0100, Petr Mladek wrote: > On Tue 2023-01-31 16:18:17, Josh Poimboeuf wrote: > > On Wed, Oct 26, 2022 at 04:41:22PM -0300, Marcos Paulo de Souza wrote: > > > The shadow variable type might then be added into a new @shadow_types > > > member of struct klp_object. They will get then automatically registered > > > and unregistered when the object is being livepatched. The registration > > > increments the reference count. Unregistration decreases the reference > > > count. All shadow variables of the given type are freed when the reference > > > count reaches zero. > > > > How does the automatic unregistration work for replaced patches? > > > > I see klp_unpatch_replaced_patches() is called, but I don't see where it > > ends up calling klp_shadow_unregister() for the replaced patch(es). > > Great catch! I forgot that replaced patches are handled separately. > We should do the following (on top of this patch): > > --- a/kernel/livepatch/transition.c > +++ b/kernel/livepatch/transition.c > @@ -123,8 +123,11 @@ static void klp_complete_transition(void) > continue; > if (klp_target_state == KLP_PATCHED) > klp_post_patch_callback(obj); > - else if (klp_target_state == KLP_UNPATCHED) { > + else if (klp_target_state == KLP_UNPATCHED) > klp_post_unpatch_callback(obj); > + > + if (klp_target_state == KLP_UNPATCHED || > + klp_transition_patch->replace) { > klp_unregister_shadow_types(obj); > } > } That calls klp_unregister_shadow_types() on the objects for klp_transition_patch, which is the replacement patch, not the replaced patch(es).
On Wed, Jan 25, 2023 at 03:22:48PM -0800, Josh Poimboeuf wrote: > On Tue, Jan 17, 2023 at 04:01:57PM +0100, Petr Mladek wrote: > > > >From my experience, there are basically two relevant usage patterns of > > > shadow variables. > > > 1.) To hand over global state from one sublivepatch to its pendant in > > > the to-be-applied livepatch module. Example: a new global mutex or > > > alike. > > > 2.) The "regular" intended usage, attaching shadow variables to real > > > (data) objects. > > > > > > To manage lifetime for 1.), we usually implement some refcount scheme, > > > managed from the livepatches' module_init()/_exit(): the next livepatch > > > would subscribe to the shared state before the previous one got a chance > > > to release it. This works in practice, but the code related to it is > > > tedious to write and quite verbose. > > > > > > The second usage pattern is much more difficult to implement correctly > > > in light of possible livepatch downgrades to a subset of > > > sublivepatches. Usually a sublivepatch making use of a shadow variable > > > attached to real objects would livepatch the associated object's > > > destruction code to free up the associated shadow, if any. If the next > > > livepatch to be applied happened to not contain this sublivepatch in > > > question as well, the destruction code would effectively become > > > unpatched, and any existing shadows leaked. Depending on the object type > > > in question, this memory leakage might or might not be an actual > > > problem, but it isn't nice either way. > > > > > > Often, there's a more subtle issue with the latter usecase though: the > > > shadow continues to exist, but becomes unmaintained once the transitions > > > has started. If said sublivepatch happens to become reactivated later > > > on, it would potentially find stale shadows, and these could even get > > > wrongly associated with a completely different object which happened to > > > get allocated at the same memory address. Depending on the shadow type, > > > this might or might not be Ok. New per-object locks or a "TLB flush > > > needed" boolean would probably be Ok, but some kind of refcount would > > > certainly not. There's not much which could be done from the pre-unpatch > > > callbacks, because these aren't getting invoked for atomic-replace > > > downgrades. > > > > IMHO, this is the reason why we should make it per-object. > > > > If the shadow variable was used by a livepatched module and we remove > > this module then the shadow variables would get unmaintained. It would > > results in the problem described in this paragraph. > > Yes, that makes sense. Ok, I'm convinced. I've been thinking about this some more, and this justification for making it per-object no longer makes sense to me. A shadow variable should follow the lifetime of its associated data object, so the only way it would leak from an unloaded patched module would be if there's a bug either in the patched module or in the livepatch itself, right? Or did I misunderstand your point?
On Wed, Nov 09, 2022 at 03:36:23PM +0100, Petr Mladek wrote: > > Right. Though, thinking about it more, this isn't even needed. Each > > klp_shadow would have a pointer to its owning module. We already have a > > global hash of klp_shadows which can be iterated when the module gets > > unloaded or replaced. > > > > Assuming atomic replace, the new patch is almost always a superset of > > the old patch. We can optimize for that case. > > I see. But I do not agree with this assumption. The new livepatch > might also remove code that caused problems. > > Also we allow downgrades. I mean that there is no versioning > of livepatches. Older livepatches might replace new ones. > > We really want to support downgrades or upgrades that remove > problematic code. Or upgrades that start fixing the problem > a better way using another shadow variable. So I've been thinking about this too... I think it's good to support downgrades. But even with this patch set, they still aren't properly supported because of the callback design. Unpatch callbacks aren't called for the replaced patch. Any cleanup it needs to do (e.g., for a downgrade) is skipped. That can cause all kinds of problems including leaks and conflicts with future patches. And actually, that seems directly related to shadow variable garbage collection. Typically you would call 'klp_shadow_free_all(id, dtor)' in the post_unpatch callback, *unless* that data is going to be used by the replacement patch. That "unless" is the problem. Without that problem, garbage collection of shadow variables would be trivial. Thing is, that "unless" exists not only for the freeing of shadow variables, but also for the allocation of some shadow variables in pre/post-patch callbacks, and many other types of initializations and cleanups done by patch/unpatch callbacks. Correct me if I'm wrong, but it seems like downgrades (and similar scenarios like upgrades with partial revert) are the primary real-world motivation for this patch set. These patches feel like a partial solution to a bigger problem. If we really want to support downgrade use cases, the callbacks will need to be improved. The main issue is that callbacks aren't called at the right times, and they're not granular enough to support different levels of upgrade/downgrade so that we can arbitrarily replace any patch with any other version of the patch without consequence. If the callbacks were split up and called only when they're needed, it would be trivial to free the shadow variables in a post_unpatch callback. To support this we could have some kind of callback versioning, where a klp_object can have an array of klp_callbacks, and each set of callbacks has a version associated with it. For example, consider a sequence of patches P1-P4, each of which is an upgrade from the last. (Note: I'm assuming only one klp_object here to keep it simple.) - P1 has [ callbacks.version=1 ] - P2 has [ callbacks.version=1 ] - P3 has [ callbacks.version=1, callbacks.version=2 ] - P4 has [ callbacks.version=1, callbacks.version=3 ] Usually a patch will have the same callbacks (with same versions) as its predecessor. P1 and P2 have the same callbacks (v1). If needed, a patch can then add an additional set of callbacks (with bumped version) like P3. Or it can remove its predecessor's callbacks. Or it can do both, like P4. When deciding which callbacks to call for a replace operation, it's basically an XOR. If a callback version exists in one patch but not the other, its callbacks should be called. For example, if P3 gets upgraded to P4, call P4's v3 patch callbacks and P3's v2 unpatch callbacks. If P4 gets downgraded to P1, call P4's v3 unpatch callbacks. BTW, "version" may not be the right name, as there's no concept of newer or older: any patch can be arbitrarily replaced by any other patch. The version is just a unique ID for a set of callbacks.
On Sat 2023-02-04 15:59:10, Josh Poimboeuf wrote: > On Wed, Nov 09, 2022 at 03:36:23PM +0100, Petr Mladek wrote: > > We really want to support downgrades or upgrades that remove > > problematic code. Or upgrades that start fixing the problem > > a better way using another shadow variable. > > So I've been thinking about this too... > > I think it's good to support downgrades. But even with this patch set, > they still aren't properly supported because of the callback design. > > Unpatch callbacks aren't called for the replaced patch. Any cleanup it > needs to do (e.g., for a downgrade) is skipped. That can cause all > kinds of problems including leaks and conflicts with future patches. > > And actually, that seems directly related to shadow variable garbage > collection. > > Typically you would call 'klp_shadow_free_all(id, dtor)' in the > post_unpatch callback, *unless* that data is going to be used by the > replacement patch. Exactly. > That "unless" is the problem. Without that problem, garbage collection > of shadow variables would be trivial. > > Thing is, that "unless" exists not only for the freeing of shadow > variables, but also for the allocation of some shadow variables in > pre/post-patch callbacks, and many other types of initializations and > cleanups done by patch/unpatch callbacks. Yes. > Correct me if I'm wrong, but it seems like downgrades (and similar > scenarios like upgrades with partial revert) are the primary real-world > motivation for this patch set. Yes. > These patches feel like a partial solution to a bigger problem. If we > really want to support downgrade use cases, the callbacks will need to > be improved. You are right. All this started when I discussed problems with writing livepatch callbacks with Nicolai. I tried to make it easier by introducing the klp_state API. But Nicolai realized that it was still too hard. The garbage collection was an idea how to easily handle the most common case, see below. > The main issue is that callbacks aren't called at the right times, and > they're not granular enough to support different levels of > upgrade/downgrade so that we can arbitrarily replace any patch with any > other version of the patch without consequence. > > If the callbacks were split up and called only when they're needed, it > would be trivial to free the shadow variables in a post_unpatch > callback. > > To support this we could have some kind of callback versioning, where a > klp_object can have an array of klp_callbacks, and each set of callbacks > has a version associated with it. > > For example, consider a sequence of patches P1-P4, each of which is an > upgrade from the last. > > (Note: I'm assuming only one klp_object here to keep it simple.) > > - P1 has [ callbacks.version=1 ] > - P2 has [ callbacks.version=1 ] > - P3 has [ callbacks.version=1, callbacks.version=2 ] > - P4 has [ callbacks.version=1, callbacks.version=3 ] > > Usually a patch will have the same callbacks (with same versions) as its > predecessor. P1 and P2 have the same callbacks (v1). > > If needed, a patch can then add an additional set of callbacks (with > bumped version) like P3. Or it can remove its predecessor's callbacks. > Or it can do both, like P4. > > When deciding which callbacks to call for a replace operation, it's > basically an XOR. If a callback version exists in one patch but not the > other, its callbacks should be called. > > For example, if P3 gets upgraded to P4, call P4's v3 patch callbacks and > P3's v2 unpatch callbacks. > > If P4 gets downgraded to P1, call P4's v3 unpatch callbacks. > > BTW, "version" may not be the right name, as there's no concept of newer > or older: any patch can be arbitrarily replaced by any other patch. The > version is just a unique ID for a set of callbacks. Yes, we discussed various possibilities and it was always getting pretty complex. The most promising direction with the callbacks was connecting the callbacks with the klp_state. They are versioned as you suggest above. And we could have callbacks: + state_prepare (instead of pre_patch) + state_enable (instead of post_patch) + state_disable (instead of pre_unpatch) + state_cleanup (instead of pre_unpatch) The above callbacks would be called only when the state is new. We might need more callbacks if the support update/downgrade to another version of the state: + state_update_prepare (instead of pre_patch) + state_update_enable (instead of post_patch) + state_downgrade_prepare (instead of pre_unpatch) + state_downgrade_cleanup (instead of pre_unpatch) Also we might some callbacks when an action is needed to take over the state from the previous livepatch: + state_take_over_prepare (instead of pre_patch and pre_unpatch?) + state_take_over_enable (instead of post_patch and post_unpatch?) IMHO, it is better than the current state. It better describes the complexity of the problem. It forces/allows livepatch authors to handle all scenarios correctly. Anyway, it is complex. The authors have to understand when exactly each callback will be called. Also some callbacks might be called from the old-to-be-replaced and some from the new-to-be-used livepatch. We discussed various use-cases and realized that shadow variables were very common use case. I have just double checked it and it seems that we used: + post_patch callback for two CVEs + klp_shadow_free() for 14 CVEs This is where the idea of the garbage collection came from. It looked like an elegant solution for the most common use case. So that livepatch authors do not need to think about the rather complex ordering of the many callbacks. They would just register/define the shadow variable type and be done. Well, the single state_cleanup() callback would usually be enough for the shadow variables. Conclusion: You are right that the shadow variables and callbacks are connected. Or more precisely, the shadow variables and klp_states have similar life-time handling. They both are introduced/transfered/removed. Maybe, we could merge klp_shadow and klp_state into a single entity that would support versioning and various callbacks. I am just not sure how the entity would be called. The name is important because it should be clear what it means and how it can be used. The unification and bad name might cause more harm then good. Note that the meaning is a bit different. klp_shadow is for storing data. klp_state rather describes a change of the behavior done by a callback. IMHO, there does not exist 1:1 relation between shadow variables and states. Using a particular shadow variable type might define a system state. But a callback might change a system state even without using shadow variables. Another small problem is that klp_state is currently in struct klp_livepatch. And it would be more safe to register klp_shadow_type in struct klp_object. Best Regards, Petr
diff --git a/include/linux/livepatch.h b/include/linux/livepatch.h index 79e7bf3b35f6..fdd82fde86e6 100644 --- a/include/linux/livepatch.h +++ b/include/linux/livepatch.h @@ -100,11 +100,14 @@ struct klp_callbacks { bool post_unpatch_enabled; }; +struct klp_shadow_type; + /** * struct klp_object - kernel object structure for live patching * @name: module name (or NULL for vmlinux) * @funcs: function entries for functions to be patched in the object * @callbacks: functions to be executed pre/post (un)patching + * @shadow_types: shadow variable types used by the livepatch for the klp_object * @kobj: kobject for sysfs resources * @func_list: dynamic list of the function entries * @node: list node for klp_patch obj_list @@ -118,6 +121,7 @@ struct klp_object { const char *name; struct klp_func *funcs; struct klp_callbacks callbacks; + struct klp_shadow_type **shadow_types; /* internal */ struct kobject kobj; @@ -222,13 +226,30 @@ typedef void (*klp_shadow_dtor_t)(void *obj, void *shadow_data); * @ctor: custom constructor to initialize the shadow data (optional) * @dtor: custom callback that can be used to unregister the variable * and/or free data that the shadow variable points to (optional) + * @registered: flag indicating if the variable was successfully registered + * + * All shadow variables used by the livepatch for the related klp_object + * must be listed here so that they are registered when the livepatch + * and the module is loaded. Otherwise, it will not be possible to + * allocate them. */ struct klp_shadow_type { unsigned long id; klp_shadow_ctor_t ctor; klp_shadow_dtor_t dtor; + + /* internal */ + bool registered; }; +#define klp_for_each_shadow_type(obj, shadow_type, i) \ + for (shadow_type = obj->shadow_types ? obj->shadow_types[0] : NULL, i = 1; \ + shadow_type; \ + shadow_type = obj->shadow_types[i++]) + +int klp_shadow_register(struct klp_shadow_type *shadow_type); +void klp_shadow_unregister(struct klp_shadow_type *shadow_type); + void *klp_shadow_get(void *obj, struct klp_shadow_type *shadow_type); void *klp_shadow_alloc(void *obj, struct klp_shadow_type *shadow_type, size_t size, gfp_t gfp_flags, void *ctor_data); diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c index 9ada0bc5247b..44c9e5ea0d2c 100644 --- a/kernel/livepatch/core.c +++ b/kernel/livepatch/core.c @@ -928,6 +928,30 @@ static int klp_init_patch(struct klp_patch *patch) return 0; } +void klp_unregister_shadow_types(struct klp_object *obj) +{ + struct klp_shadow_type *shadow_type; + int i; + + klp_for_each_shadow_type(obj, shadow_type, i) { + klp_shadow_unregister(shadow_type); + } +} + +static int klp_register_shadow_types(struct klp_object *obj) +{ + struct klp_shadow_type *shadow_type; + int i, ret; + + klp_for_each_shadow_type(obj, shadow_type, i) { + ret = klp_shadow_register(shadow_type); + if (ret) + return ret; + } + + return 0; +} + static int __klp_disable_patch(struct klp_patch *patch) { struct klp_object *obj; @@ -988,6 +1012,13 @@ static int __klp_enable_patch(struct klp_patch *patch) if (!klp_is_object_loaded(obj)) continue; + ret = klp_register_shadow_types(obj); + if (ret) { + pr_warn("failed to register shadow types for object '%s'\n", + klp_is_module(obj) ? obj->name : "vmlinux"); + goto err; + } + ret = klp_pre_patch_callback(obj); if (ret) { pr_warn("pre-patch callback failed for object '%s'\n", @@ -1172,6 +1203,7 @@ static void klp_cleanup_module_patches_limited(struct module *mod, klp_unpatch_object(obj); klp_post_unpatch_callback(obj); + klp_unregister_shadow_types(obj); klp_free_object_loaded(obj); break; @@ -1218,6 +1250,13 @@ int klp_module_coming(struct module *mod) pr_notice("applying patch '%s' to loading module '%s'\n", patch->mod->name, obj->mod->name); + ret = klp_register_shadow_types(obj); + if (ret) { + pr_warn("failed to register shadow types for object '%s'\n", + obj->name); + goto err; + } + ret = klp_pre_patch_callback(obj); if (ret) { pr_warn("pre-patch callback failed for object '%s'\n", diff --git a/kernel/livepatch/core.h b/kernel/livepatch/core.h index 38209c7361b6..0b68f2407a82 100644 --- a/kernel/livepatch/core.h +++ b/kernel/livepatch/core.h @@ -13,6 +13,7 @@ extern struct list_head klp_patches; #define klp_for_each_patch(patch) \ list_for_each_entry(patch, &klp_patches, list) +void klp_unregister_shadow_types(struct klp_object *obj); void klp_free_patch_async(struct klp_patch *patch); void klp_free_replaced_patches_async(struct klp_patch *new_patch); void klp_unpatch_replaced_patches(struct klp_patch *new_patch); diff --git a/kernel/livepatch/shadow.c b/kernel/livepatch/shadow.c index 64e83853891d..9437dc1be7b2 100644 --- a/kernel/livepatch/shadow.c +++ b/kernel/livepatch/shadow.c @@ -34,6 +34,7 @@ #include <linux/hashtable.h> #include <linux/slab.h> #include <linux/livepatch.h> +#include "core.h" static DEFINE_HASHTABLE(klp_shadow_hash, 12); @@ -59,6 +60,22 @@ struct klp_shadow { char data[]; }; +/** + * struct klp_shadow_type_reg - information about a registered shadow + * variable type + * @id: shadow variable type indentifier + * @count: reference counter + * @list: list node for list of registered shadow variable types + */ +struct klp_shadow_type_reg { + unsigned long id; + int ref_cnt; + struct list_head list; +}; + +/* List of registered shadow variable types */ +static LIST_HEAD(klp_shadow_types); + /** * klp_shadow_match() - verify a shadow variable matches given <obj, id> * @shadow: shadow variable to match @@ -84,6 +101,13 @@ void *klp_shadow_get(void *obj, struct klp_shadow_type *shadow_type) { struct klp_shadow *shadow; + /* Just the best effort. Can't take @klp_shadow_lock here. */ + if (!shadow_type->registered) { + pr_err("Trying to get shadow variable of non-registered type: %lu\n", + shadow_type->id); + return NULL; + } + rcu_read_lock(); hash_for_each_possible_rcu(klp_shadow_hash, shadow, node, @@ -310,3 +334,103 @@ void klp_shadow_free_all(struct klp_shadow_type *shadow_type) spin_unlock_irqrestore(&klp_shadow_lock, flags); } EXPORT_SYMBOL_GPL(klp_shadow_free_all); + +static struct klp_shadow_type_reg * +klp_shadow_type_get_reg(struct klp_shadow_type *shadow_type) +{ + struct klp_shadow_type_reg *shadow_type_reg; + lockdep_assert_held(&klp_shadow_lock); + + list_for_each_entry(shadow_type_reg, &klp_shadow_types, list) { + if (shadow_type_reg->id == shadow_type->id) + return shadow_type_reg; + } + + return NULL; +} + +/** + * klp_shadow_register() - register the given shadow variable type + * @shadow_type: shadow type to be registered + * + * Tell the system that the given shadow type is going to used by the caller + * (livepatch module). It allows to check and maintain lifetime of shadow + * variables. + * + * Return: 0 on suceess, -ENOMEM when there is not enough memory. + */ +int klp_shadow_register(struct klp_shadow_type *shadow_type) +{ + struct klp_shadow_type_reg *shadow_type_reg; + struct klp_shadow_type_reg *new_shadow_type_reg; + + new_shadow_type_reg = + kzalloc(sizeof(struct klp_shadow_type_reg), GFP_KERNEL); + if (!new_shadow_type_reg) + return -ENOMEM; + + spin_lock_irq(&klp_shadow_lock); + + if (shadow_type->registered) { + pr_err("Trying to register shadow variable type that is already registered: %lu", + shadow_type->id); + kfree(new_shadow_type_reg); + goto out; + } + + shadow_type_reg = klp_shadow_type_get_reg(shadow_type); + if (!shadow_type_reg) { + shadow_type_reg = new_shadow_type_reg; + shadow_type_reg->id = shadow_type->id; + list_add(&shadow_type_reg->list, &klp_shadow_types); + } else { + kfree(new_shadow_type_reg); + } + + shadow_type_reg->ref_cnt++; + shadow_type->registered = true; +out: + spin_unlock_irq(&klp_shadow_lock); + + return 0; +} +EXPORT_SYMBOL_GPL(klp_shadow_register); + +/** + * klp_shadow_unregister() - unregister the give shadow variable type + * @shadow_type: shadow type to be unregistered + * + * Tell the system that a given shadow variable ID is not longer be used by + * the caller (livepatch module). All existing shadow variables are freed + * when it was the last registered user. + */ +void klp_shadow_unregister(struct klp_shadow_type *shadow_type) +{ + struct klp_shadow_type_reg *shadow_type_reg; + + spin_lock_irq(&klp_shadow_lock); + + if (!shadow_type->registered) { + pr_err("Trying to unregister shadow variable type that is not registered: %lu", + shadow_type->id); + goto out; + } + + shadow_type_reg = klp_shadow_type_get_reg(shadow_type); + if (!shadow_type_reg) { + pr_err("Can't find shadow variable type registration: %lu", shadow_type->id); + goto out; + } + + shadow_type->registered = false; + shadow_type_reg->ref_cnt--; + + if (!shadow_type_reg->ref_cnt) { + __klp_shadow_free_all(shadow_type); + list_del(&shadow_type_reg->list); + kfree(shadow_type_reg); + } +out: + spin_unlock_irq(&klp_shadow_lock); +} +EXPORT_SYMBOL_GPL(klp_shadow_unregister); diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c index 30187b1d8275..9c57941974a7 100644 --- a/kernel/livepatch/transition.c +++ b/kernel/livepatch/transition.c @@ -123,8 +123,10 @@ static void klp_complete_transition(void) continue; if (klp_target_state == KLP_PATCHED) klp_post_patch_callback(obj); - else if (klp_target_state == KLP_UNPATCHED) + else if (klp_target_state == KLP_UNPATCHED) { klp_post_unpatch_callback(obj); + klp_unregister_shadow_types(obj); + } } pr_notice("'%s': %s complete\n", klp_transition_patch->mod->name, diff --git a/lib/livepatch/test_klp_shadow_vars.c b/lib/livepatch/test_klp_shadow_vars.c index ee47c1fae8e2..6de1f6d11bcf 100644 --- a/lib/livepatch/test_klp_shadow_vars.c +++ b/lib/livepatch/test_klp_shadow_vars.c @@ -188,6 +188,17 @@ static int test_klp_shadow_vars_init(void) int ret; int i; + /* Registered manually since we don't have a klp_object instance. */ + ret = klp_shadow_register(&shadow_type_1); + if (ret) + return ret; + + ret = klp_shadow_register(&shadow_type_2); + if (ret) { + klp_shadow_unregister(&shadow_type_1); + return ret; + } + ptr_id(NULL); /* @@ -296,12 +307,9 @@ static int test_klp_shadow_vars_init(void) pr_info(" got expected NULL result\n"); } - free_ptr_list(); - - return 0; out: - shadow_free_all(&shadow_type_1); /* 'char' pairs */ - shadow_free_all(&shadow_type_2); /* 'int' pairs */ + klp_shadow_unregister(&shadow_type_1); /* 'char' pairs */ + klp_shadow_unregister(&shadow_type_2); /* 'int' pairs */ free_ptr_list(); return ret; diff --git a/samples/livepatch/livepatch-shadow-fix1.c b/samples/livepatch/livepatch-shadow-fix1.c index 0cc7d1e4b4bc..6718df9ec14b 100644 --- a/samples/livepatch/livepatch-shadow-fix1.c +++ b/samples/livepatch/livepatch-shadow-fix1.c @@ -141,6 +141,11 @@ static struct klp_shadow_type shadow_leak_type = { .dtor = livepatch_fix1_dummy_leak_dtor, }; +struct klp_shadow_type *shadow_types[] = { + &shadow_leak_type, + NULL +}; + static struct klp_func funcs[] = { { .old_name = "dummy_alloc", @@ -156,6 +161,7 @@ static struct klp_object objs[] = { { .name = "livepatch_shadow_mod", .funcs = funcs, + .shadow_types = shadow_types, }, { } }; @@ -171,8 +177,6 @@ static int livepatch_shadow_fix1_init(void) static void livepatch_shadow_fix1_exit(void) { - /* Cleanup any existing SV_LEAK shadow variables */ - klp_shadow_free_all(&shadow_leak_type); } module_init(livepatch_shadow_fix1_init); diff --git a/samples/livepatch/livepatch-shadow-fix2.c b/samples/livepatch/livepatch-shadow-fix2.c index 840100555152..290c7e3f96b0 100644 --- a/samples/livepatch/livepatch-shadow-fix2.c +++ b/samples/livepatch/livepatch-shadow-fix2.c @@ -103,6 +103,12 @@ static struct klp_shadow_type shadow_counter_type = { .id = SV_COUNTER, }; +struct klp_shadow_type *shadow_types[] = { + &shadow_leak_type, + &shadow_counter_type, + NULL +}; + static struct klp_func funcs[] = { { .old_name = "dummy_check", @@ -118,6 +124,7 @@ static struct klp_object objs[] = { { .name = "livepatch_shadow_mod", .funcs = funcs, + .shadow_types = shadow_types, }, { } }; @@ -133,8 +140,6 @@ static int livepatch_shadow_fix2_init(void) static void livepatch_shadow_fix2_exit(void) { - /* Cleanup any existing SV_COUNTER shadow variables */ - klp_shadow_free_all(&shadow_leak_type); } module_init(livepatch_shadow_fix2_init);