[RFC,v2,00/13] integrity: Introduce a digest cache

Message ID 20230812104616.2190095-1-roberto.sassu@huaweicloud.com

Roberto Sassu Aug. 12, 2023, 10:46 a.m. UTC
  From: Roberto Sassu <roberto.sassu@huawei.com>

Introduction
============

The main goal of Integrity Measurement Architecture (IMA) is to perform a
measurement of the file content and use it for remote attestation, to
report a possibly compromised system, using the TPM as a root of trust. It
can also prevent a system compromise from happening by checking the
calculated file digest against a known-good reference value and by denying
the current operation if there is a mismatch.


Motivation
==========

This patch set aims to address two important shortcomings: predictability
of the Platform Configuration Registers (PCRs), and the provisioning of
reference values to compare the calculated file digest against.

Remote attestation, according to Trusted Computing Group (TCG)
specifications, is done by replicating the PCR extend operation in
software with the digests in the event log (in this case the IMA
measurement list), and by comparing the obtained value with the PCR value
signed by the TPM with the quote operation.

Due to how the extend operation is performed, if measurements are done in
a different order, the final PCR value will be different. That means that
if measurements are done in parallel, there is no way to predict what the
final PCR value will be, making it impossible to seal data to a PCR value.
If the PCR value were predictable, a system could for example prove its
integrity by unsealing and using its private key, without sending the full
list of measurements every time.
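
The order dependence can be reproduced in user space. The following is a
minimal sketch, not kernel or TPM code: it assumes a SHA-256 PCR bank that
starts at all zeros, and pcr_extend() and replay() are illustrative names.

```python
import hashlib


def pcr_extend(pcr: bytes, digest: bytes) -> bytes:
    """TPM-style extend: new PCR value = H(old PCR value || digest)."""
    return hashlib.sha256(pcr + digest).digest()


def replay(digests):
    """Replay a list of digests, as a verifier does with an event log."""
    pcr = bytes(32)  # assumption: the PCR starts at all zeros
    for digest in digests:
        pcr = pcr_extend(pcr, digest)
    return pcr


a = hashlib.sha256(b"file A").digest()
b = hashlib.sha256(b"file B").digest()

order_ab = replay([a, b])
order_ba = replay([b, a])
print(order_ab.hex())
print(order_ba.hex())
print("same final value:", order_ab == order_ba)
```

The two final values differ, so a verifier that does not know the order in
which the measurements were made cannot predict the PCR value.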

Provisioning reference values for file digests is also a difficult task.
The solution so far was to add file signatures to RPM packages, and
possibly to DEB packages, so that IMA can verify them. While this
undoubtedly works, it also requires Linux distribution vendors to support
the feature by rebuilding all their packages, and possibly extending their
PKI to perform the additional signatures. It could also require extra work
from developers to deal with the additional data.

On the other hand, since packages often carry the file digests themselves,
adding file signatures is not actually needed. If the kernel were able to
extract the file digests by itself, none of the tasks mentioned above would
be required of the Linux distribution vendors. All current and past Linux
distributions could then be easily retrofitted to enable IMA appraisal with
the file digests from the packages.

Narrowing down the scope of a package parser to only extract specific
information makes it small enough to accurately verify that it cannot harm
the kernel. An additional mitigation consists in verifying the signature of
the package first, before attempting to extract the file digests.


Solution
========

To avoid extending a PCR in a non-deterministic way, the proposed
solution is to replace individual file measurements with the measurement of
a file (the digest list) containing a set of file digests. If the
calculated digest of a file being measured/appraised matches one digest in
the set, its measurement is skipped. Otherwise, if there is no match, the
file digest is added to the measurement list.
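
The decision described in this paragraph can be sketched as follows. This
is only an illustration in user space: process_file(), the file paths and
the file contents are hypothetical, and the digest list is modeled as a
plain Python set rather than the kernel hash table.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def process_file(path, content, digest_list, measurement_list):
    """Return True if a new measurement entry was added for this file."""
    digest = sha256_hex(content)
    if digest in digest_list:
        # Known digest: the file is covered by the measurement of the
        # digest list itself, so no individual entry is added.
        return False
    measurement_list.append((path, digest))
    return True


# Hypothetical digest list covering two packaged files.
digest_list = {
    sha256_hex(b"content of /usr/bin/foo"),
    sha256_hex(b"content of /usr/bin/bar"),
}

# The digest list itself is measured once, before it is used.
measurement_list = [
    ("/etc/digest_lists/example", sha256_hex(b"digest list content")),
]

process_file("/usr/bin/foo", b"content of /usr/bin/foo",
             digest_list, measurement_list)
process_file("/root/unknown", b"something else",
             digest_list, measurement_list)

for entry in measurement_list:
    print(entry)
```

Only the digest list and the unknown file end up in the measurement list;
the access to the packaged file leaves no individual entry.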

The resulting measurement list, which cannot use the default IMA PCR to
avoid ambiguity with the default-style measurement, has the following
meaning: none/some/all of the files represented by the measurement of the
digest lists COULD have been accessed, without knowing IF and WHEN. Any
other measurement (other than boot_aggregate) is of a file whose digest was
not included in the digest list.

File signatures have a coarser granularity: it is per signing key and not
per package. A measurement list containing just the measurement of the
signing keys and of the files with a missing or invalid signature (those
with a valid signature would be skipped) would be even less accurate.

To ensure a rapid and smooth deployment of IMA appraisal, the kernel has
been provided with the ability to extract file digests from the RPM
package headers, and add them to the kernel memory on demand (only when a
file from a given package is accessed). This ensures that the memory
consumption for this new feature is directly proportional to the usage of
the system.


Scope
=====

The integrity digest cache enables IMA to extend a PCR (not the default
one) in a deterministic fashion, and to appraise immutable files with file
digests from the packages, when no other appraisal method is available. It
does not yet support metadata verification with Extended Verification
Module (EVM), for which a separate patch set will be provided.


Design
======

The digest cache is a hash table of file digests, attached to the inode of
the digest list from which file digests are extracted. It is accessible,
when a given file is being measured/appraised, from the new xattr
security.digest_list, containing the path of the digest list itself.

If the calculated file digest is found in the digest cache, its measurement
is avoided, or read-only access is granted if appraisal is in enforcing
mode. Read-write access is prevented to avoid updating an unverified HMAC
of file metadata.

The digest cache can be used only if the following conditions are met:

- The 'digest_cache=content' keyword is added to the desired IMA policy
  rules;
- If the rule action is 'measure', a PCR different from the default one
  is specified;
- If the rule action is 'appraise', 'digest_cache=content' and
  'appraise_type' don't appear at the same time;
- The same action for which the digest cache is used was also performed on
  the digest list;
- The digest cache is not (currently) used for measurement/appraisal of
  other digest lists.
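
The conditions that depend only on the policy rule text can be checked
with a short user-space sketch. It is not the kernel policy parser:
parse_rule() and digest_cache_usable() are hypothetical names, and the
default IMA PCR is assumed to be 10 (the usual default of
CONFIG_IMA_MEASURE_PCR_IDX). The condition about the action performed on
the digest list cannot be checked from the rule text and is left out.

```python
DEFAULT_IMA_PCR = 10  # assumption: default CONFIG_IMA_MEASURE_PCR_IDX


def parse_rule(rule):
    """Split 'action key=value ...' into the action and an option dict."""
    action, *options = rule.split()
    return action, dict(opt.split("=", 1) for opt in options)


def digest_cache_usable(rule):
    """Check the conditions that depend only on the rule text."""
    action, opts = parse_rule(rule)
    if opts.get("digest_cache") != "content":
        return False
    if opts.get("func") == "DIGEST_LIST_CHECK":
        # Not (currently) usable to measure/appraise other digest lists.
        return False
    if action == "measure":
        return "pcr" in opts and int(opts["pcr"]) != DEFAULT_IMA_PCR
    if action == "appraise":
        return "appraise_type" not in opts
    return False


policy = [
    "dont_measure fsmagic=0x01021994",
    "measure func=DIGEST_LIST_CHECK template=ima-modsig pcr=11",
    "measure func=BPRM_CHECK digest_cache=content pcr=11",
    "appraise func=MMAP_CHECK digest_cache=content",
    "appraise func=DIGEST_LIST_CHECK appraise_type=imasig|modsig",
]
for rule in policy:
    print(digest_cache_usable(rule), rule)
```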

For performance reasons, the digest cache is attached to every inode using
it, since multiple hooks can be invoked on that inode before the
measurement/appraisal result is cached. A reference count indicates how
many inodes use the digest cache, and the digest cache can be freed only
when the count reaches zero (for example when inodes are evicted from
memory).
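
The reference counting scheme can be illustrated with a small user-space
model. This is a sketch under stated assumptions, not the kernel code: the
DigestCache and Registry classes, the get()/release() methods and the
loader callback are all hypothetical, and locking is ignored.

```python
class DigestCache:
    def __init__(self, path, digests):
        self.path = path
        self.digests = set(digests)
        self.refcount = 0


class Registry:
    """Hands out one shared digest cache per digest list path."""

    def __init__(self, loader):
        self._loader = loader  # hypothetical: returns digests for a path
        self.caches = {}

    def get(self, path):
        cache = self.caches.get(path)
        if cache is None:
            # Created on demand, the first time an inode needs it.
            cache = DigestCache(path, self._loader(path))
            self.caches[path] = cache
        cache.refcount += 1
        return cache

    def release(self, cache):
        cache.refcount -= 1
        if cache.refcount == 0:
            # Last user gone (e.g. inode evicted): the cache can be freed.
            del self.caches[cache.path]


registry = Registry(lambda path: {"digest-1", "digest-2"})
first = registry.get("/etc/digest_lists/example")
second = registry.get("/etc/digest_lists/example")
print(first is second, first.refcount)

registry.release(first)
print(len(registry.caches))  # still referenced by the second user
registry.release(second)
print(len(registry.caches))  # freed once the count reaches zero
```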

Two digest cache pointers have been added to the iint to distinguish for
which purpose they should be used: dig_owner points to the digest cache
created from the same inode the iint refers to, and should be used for
measurement/appraisal of other inodes; dig_user points to the digest
cache created from a different inode, and requested for
measurement/appraisal. A single pointer would be ambiguous: for a digest
list, it would point to the digest cache created from that digest list,
and IMA would then try to use that digest cache for measurement/appraisal
of the digest list itself.

Finally, at the first digest list measurement, an iterator is executed to
sequentially read (not parse) all the digest lists in the same directory,
so that the PCR is extended in a deterministic fashion. The other
concurrent users of the digest cache have to wait until the iterator
finishes.
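
The effect of prefetching can be modeled in user space. In this sketch the
digest lists of a directory are always read in the same order before any
of them is used, so the resulting PCR value does not depend on which file
was accessed first. Sorting the file names is an assumption made here to
obtain a fixed order; the iterator in the patch set may rely on the
directory order instead. All names and file contents are hypothetical.

```python
import hashlib
import os
import tempfile


def extend(pcr: bytes, digest: bytes) -> bytes:
    return hashlib.sha256(pcr + digest).digest()


def prefetch_and_measure(directory):
    """Read every digest list in a fixed order and extend a PCR model."""
    pcr = bytes(32)
    for name in sorted(os.listdir(directory)):  # assumption: sorted order
        with open(os.path.join(directory, name), "rb") as f:
            pcr = extend(pcr, hashlib.sha256(f.read()).digest())
    return pcr


def build_and_measure(names):
    """Create the digest lists in the given order, then measure them."""
    with tempfile.TemporaryDirectory() as directory:
        for name in names:
            with open(os.path.join(directory, name), "wb") as f:
                f.write(b"digest list " + name.encode())
        return prefetch_and_measure(directory)


first = build_and_measure(["rpm-a", "rpm-b", "rpm-c"])
second = build_and_measure(["rpm-c", "rpm-a", "rpm-b"])
print("same PCR value:", first == second)
```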


API
===

digest_cache_alloc(), digest_cache_parse_digest_list() and
digest_cache_new() are internal functions used during the creation and
initialization of the digest cache.

digest_cache_get() and digest_cache_free() are called by the user of the
digest cache (e.g. IMA), to obtain and free a digest cache.

digest_cache_init_htable(), digest_cache_add() and digest_cache_lookup()
are called by the digest list parsers to populate and search in a digest
cache.


Digest List Formats
===================

tlv
~~~

The Type-Length-Value (TLV) format was chosen for its extensibility.
Additional fields can be added without breaking compatibility with old
versions of the parser.

The layout of a tlv digest list is the following:

 [header: DIGEST_LIST_FILE, num fields, total len]
 [field: DIGEST_LIST_ALGO, length, value]
 [field: DIGEST_LIST_ENTRY#1, length, value (below)]
  |- [header: DIGEST_LIST_FILE, num fields, total len]
  |- [ENTRY#1_DIGEST, length, file digest]
  |- [ENTRY#1_PATH, length, file path]
 [field: DIGEST_LIST_ENTRY#N, length, value (below)]
  |- [header: DIGEST_LIST_FILE, num fields, total len]
  |- [ENTRY#N_DIGEST, length, file digest]
  |- [ENTRY#N_PATH, length, file path]

DIGEST_LIST_ALGO is a field to specify the algorithm of the file digest.
DIGEST_LIST_ENTRY is a nested TLV structure with the following fields:
ENTRY_DIGEST contains the file digest; ENTRY_PATH contains the file path.
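
The nesting shown in the layout can be exercised with a small user-space
sketch. Everything about the binary encoding here is an assumption made
for illustration: the header and field integers are taken to be 64-bit
big-endian values, the numeric identifiers are invented, and the algorithm
is stored as an ASCII name. The real definitions are in the headers added
by the patch set.

```python
import hashlib
import struct

# Hypothetical numeric identifiers, for illustration only.
DIGEST_LIST_FILE = 0
DIGEST_LIST_ALGO, DIGEST_LIST_ENTRY = 0, 1
ENTRY_DIGEST, ENTRY_PATH = 0, 1

HDR = struct.Struct(">QQQ")   # data type, number of fields, total length
FIELD = struct.Struct(">QQ")  # field type, value length


def tlv_build(data_type, fields):
    body = b"".join(FIELD.pack(ftype, len(value)) + value
                    for ftype, value in fields)
    return HDR.pack(data_type, len(fields), len(body)) + body


def tlv_parse(buf):
    data_type, num_fields, total_len = HDR.unpack_from(buf, 0)
    offset = HDR.size
    if total_len != len(buf) - offset:
        raise ValueError("total length does not match the buffer")
    fields = []
    for _ in range(num_fields):
        ftype, length = FIELD.unpack_from(buf, offset)
        offset += FIELD.size
        if offset + length > len(buf):
            raise ValueError("truncated field")
        fields.append((ftype, buf[offset:offset + length]))
        offset += length
    return data_type, fields


digest = hashlib.sha256(b"example file content").digest()
entry = tlv_build(DIGEST_LIST_FILE,
                  [(ENTRY_DIGEST, digest), (ENTRY_PATH, b"/root/cat")])
digest_list = tlv_build(DIGEST_LIST_FILE,
                        [(DIGEST_LIST_ALGO, b"sha256"),
                         (DIGEST_LIST_ENTRY, entry)])

_, top = tlv_parse(digest_list)
for ftype, value in top:
    if ftype == DIGEST_LIST_ENTRY:
        # The value of an entry is itself a TLV structure.
        _, entry_fields = tlv_parse(value)
        parsed = dict(entry_fields)
        print(parsed[ENTRY_DIGEST].hex(), parsed[ENTRY_PATH].decode())
```

Because every field carries its own length, a parser written this way can
step over a field type it does not recognize, which is the extensibility
property mentioned above.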


rpm
~~~

The rpm digest list is basically a subset of the RPM package header.
Its format is:

 [RPM magic number]
 [RPMTAG_IMMUTABLE]

RPMTAG_IMMUTABLE is a section of the full RPM header containing the part
of the header that was signed, and whose signature is stored in the
RPMTAG_RSAHEADER section.


Appended Signature
~~~~~~~~~~~~~~~~~~

Digest lists can have a module-style appended signature, which can be used
for appraisal with IMA. The signature type can be PKCS#7, as for kernel
modules, or the new user asymmetric key signature.


History
=======

The original name of this work was IMA Digest Lists, which was considered
somewhat too invasive. The code was moved to a separate component named
DIGLIM (DIGest Lists Integrity Module), with the purpose of moving the
complexity out of IMA, and also adding the possibility of using it with
other kernel components (e.g. Integrity Policy Enforcement, or IPE).

Since it was originally proposed, in 2017, this work has grown considerably
thanks to various comments/suggestions. It has been an integral part of the
openEuler distribution since the end of 2020.

There are significant differences between this and the previous versions.
The most important one is moving from a centralized repository of file
digests to a per-package repository. This significantly reduces the memory
pressure, since digest lists are loaded into kernel memory only when they
are actually needed. Also, file digests are automatically unloaded from
kernel memory at the same time inodes are evicted from memory during
reclamation.


Performance
===========

The tests have been performed on a Fedora 38 virtual machine, with 8 cores
(AMD EPYC-Rome), 4GB of RAM, TPM passthrough. The signing key is an ECDSA
NIST P-384 key.

IMA measurement policy: no cache
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 dont_measure fsmagic=0x01021994
 measure func=BPRM_CHECK
 measure func=MMAP_CHECK


IMA measurement policy: cache
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 dont_measure fsmagic=0x01021994
 measure func=DIGEST_LIST_CHECK template=ima-modsig pcr=11
 measure func=BPRM_CHECK digest_cache=content pcr=11
 measure func=MMAP_CHECK digest_cache=content pcr=11


IMA Measurement Results
~~~~~~~~~~~~~~~~~~~~~~~

                               +-----------+-----------+-----------+
                               | # measur. | boot time |   slab    |
 +-----------------------------+-----------+-----------+-----------+
 | measure (no cache)          |    389    |  12.682s  | 231453 KB |
 +-----------------------------+-----------+-----------+-----------+
 | measure (cache, no iter)    |    175    |  12.283s  | 234224 KB |
 +-----------------------------+-----------+-----------+-----------+
 | measure (cache, iter)       |    853    |  16.430s  | 238491 KB |
 +-----------------------------+-----------+-----------+-----------+

With the iterator enabled, all 852 packages are measured. Consequently, the
boot time is longer. One possible optimization would be to exclude the
packages that don't include measured files. By disabling the iterator, it
can be seen that only 174 packages are actually used (in both cases, one
additional measurement is for boot_aggregate).


IMA appraisal policy: no cache
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 dont_appraise fsmagic=0x01021994
 appraise func=BPRM_CHECK appraise_type=imasig
 appraise func=MMAP_CHECK appraise_type=imasig


IMA appraisal policy: cache
~~~~~~~~~~~~~~~~~~~~~~~~~~~

 dont_appraise fsmagic=0x01021994
 appraise func=DIGEST_LIST_CHECK appraise_type=imasig|modsig
 appraise func=BPRM_CHECK digest_cache=content
 appraise func=MMAP_CHECK digest_cache=content


IMA Appraisal Results
~~~~~~~~~~~~~~~~~~~~~

                               +-----------+-----------+
                               | boot time |   slab    |
 +-----------------------------+-----------+-----------+
 | appraise (no cache)         |  11.995s  | 231145 KB |
 +-----------------------------+-----------+-----------+
 | appraise (cache)            |  11.879s  | 233091 KB |
 +-----------------------------+-----------+-----------+

In this test, it can be seen that the performance of the two solutions is
comparable, with the digest cache slightly ahead. The difference could be
more substantial with more files appraised.


How to Test
===========

First, it is necessary to copy the new kernel headers (tlv_parser.h,
uasym_parser.h, tlv_digest_list.h) from usr/include/linux in the kernel
source directory to /usr/include/linux.

Then, gpg must be rebuilt with the additional patches to convert the PGP
keys of the Linux distribution to the new user asymmetric key format:

 $ gpg --conv-kernel <path of PGP key> >> certs/uasym_keys.bin

This embeds the converted keys in the kernel image. Then, the following
kernel options must be enabled:

 CONFIG_INTEGRITY_DIGEST_CACHE=y
 CONFIG_UASYM_KEYS_SIGS=y
 CONFIG_UASYM_PRELOAD_PUBLIC_KEYS=y

and the kernel must be rebuilt with the patches applied. After boot, it is
necessary to build and install the digest list tool in tools/digest-lists,
and to execute (as root):

 # manage_digest_lists -o gen -d /etc/digest_lists -i rpmdb -f rpm

The new gpg must also be installed in the system, as it will be used to
convert the PGP signatures of the RPM headers to the user asymmetric key
format.

It is recommended to create an additional digest list with the following
files, by creating a file named 'list' with the content:

 /usr/bin/manage_digest_lists
 /usr/lib64/libgen-tlv-list.so
 /usr/lib64/libgen-rpm-list.so
 /usr/lib64/libparse-rpm-list.so
 /usr/lib64/libparse-tlv-list.so

Then, to create the digest list, it is sufficient to execute:

 # manage_digest_lists -i list -L -d /etc/digest_lists -o gen -f tlv

If appraisal is enabled and in enforcing mode, it is necessary to sign the
new digest list, with the sign-file tool in the scripts/ directory of the
kernel sources:

 # scripts/sign-file sha256 certs/signing_key.pem certs/signing_key.pem /etc/digest_lists/tlv-list

The final step is to add security.digest_list to each file with:

 # manage_digest_lists -i /etc/digest_lists -o add-xattr

After that, it is possible to test the integrity digest cache with the
following policy written to /etc/ima/ima-policy:

 dont_measure fsmagic=0x01021994
 measure func=DIGEST_LIST_CHECK template=ima-modsig pcr=11
 measure func=BPRM_CHECK digest_cache=content pcr=11
 measure func=MMAP_CHECK digest_cache=content pcr=11
 dont_appraise fsmagic=0x01021994
 appraise func=BPRM_CHECK digest_cache=content
 appraise func=MMAP_CHECK digest_cache=content
 appraise func=DIGEST_LIST_CHECK appraise_type=imasig|modsig

Tmpfs is excluded for now, until memfd is properly handled.

Before loading the policy, it is possible to enable dynamic debug to see
which operations are done by the integrity digest cache:

 # echo "file tlv* +p" > /sys/kernel/debug/dynamic_debug/control
 # echo "file rpm* +p" > /sys/kernel/debug/dynamic_debug/control
 # echo "file digest* +p" > /sys/kernel/debug/dynamic_debug/control

Alternatively, the same strings can be set as value of the dyndbg= option
in the kernel command line.

A preliminary test, before booting the system with the new policy, is to
supply the policy to IMA in the current system with:

 # cat /etc/ima/ima-policy > /sys/kernel/security/ima/policy

If that worked, the system can be rebooted. Systemd will take care of
loading the IMA policy at boot. The instructions have been tested on a
Fedora 38 OS.

After boot, it is possible to check the content of the measurement list:

 # cat /sys/kernel/security/ima/ascii_runtime_measurements

If only the files shipped with Fedora 38 have been executed, the
measurement list will contain only the digest lists, and not the individual
files.
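
The measurement list can also be replayed in user space to obtain the
value expected in the PCR. The sketch below assumes that the second column
of ascii_runtime_measurements is the SHA-1 template hash and that the
SHA-1 bank of the PCR starts at all zeros; replay_sha1() is an
illustrative name. The sample entry is the one shown later in this
message. Reading the real file requires root privileges, so the demo uses
the sample entry instead.

```python
import hashlib

SAMPLE = (
    "11 50b5a68bea0776a84eef6725f17ce474756e51c0 ima-ng "
    "sha256:15e1efee080fe54f5d7404af7e913de01671e745ce55215d89f3d6521d3884f0 "
    "/root/cat"
)


def replay_sha1(lines, pcr_index):
    """Replay the entries recorded for one PCR (SHA-1 bank model)."""
    pcr = bytes(20)  # assumption: the PCR starts at all zeros
    for line in lines:
        fields = line.split()
        if len(fields) < 2 or int(fields[0]) != pcr_index:
            continue
        template_hash = bytes.fromhex(fields[1])
        if template_hash == bytes(20):
            # Violation entries are listed as zeros, but IMA extends the
            # PCR with all 0xff bytes for them.
            template_hash = b"\xff" * 20
        pcr = hashlib.sha1(pcr + template_hash).digest()
    return pcr


print(replay_sha1([SAMPLE], 11).hex())
```

On a test system, the lines of
/sys/kernel/security/ima/ascii_runtime_measurements can be passed to
replay_sha1() and the result compared with the SHA-1 bank of PCR 11.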

Another test is to ensure that IMA prevents the execution of unknown files:

 # cp -a /bin/cat .
 # ./cat

That will work, but execution of the modified binary will not:

 # echo 1 >> cat
 # ./cat
 -bash: ./cat: Permission denied

Execution will be denied, and a new entry in the measurement list will
appear (it would probably be OK not to add that entry, as access to the
file was denied):

 11 50b5a68bea0776a84eef6725f17ce474756e51c0 ima-ng sha256:15e1efee080fe54f5d7404af7e913de01671e745ce55215d89f3d6521d3884f0 /root/cat

Finally, it is possible to test the shrinking of the digest cache, by
forcing the kernel to evict inodes from memory:

 # echo 3 > /proc/sys/vm/drop_caches

The kernel log should have messages like:

 [  313.032536] DIGEST CACHE: Remove digest sha256:102900208eef27b766380135906d431dba87edaa7ec6aa72e6ebd3dd67f3a97b from digest list /etc/digest_lists/rpm-libseccomp-2.5.3-4.fc38.x86_64


Patch set dependencies
======================

This patch set depends on:

https://lore.kernel.org/linux-integrity/20230720153247.3755856-2-roberto.sassu@huaweicloud.com/

which allows appraising RPM package headers with the PGP keys of Linux
distribution vendors.


Patch set content
=================

Patch 1 introduces a new hook to identify the loading of digest lists and
consequently appraise them.

Patches 2-4 implement the digest cache, and an iterator to prefetch the
digest lists to measure them in a deterministic way.

Patches 5-6 implement the currently supported digest list formats: tlv and
rpm. The tlv format relies on the TLV parser defined in the patch set
mentioned above.

Patches 7-9 enable the usage of the digest cache in IMA for measurement and
appraisal.

Patches 10-12 add a tool to manage digest lists.

Patch 13 adds the documentation of the integrity digest cache.


Changelog
=========

v1:
- Add documentation in Documentation/security/integrity-digest-cache.rst
- Pass the mask of IMA actions to digest_cache_alloc()
- Add a reference count to the digest cache
- Remove the path parameter from digest_cache_get(), and rely on the
  reference count to avoid the digest cache disappearing while being used
- Rename the dentry_to_check parameter of digest_cache_get() to dentry
- Rename digest_cache_get() to digest_cache_new() and add
  digest_cache_get() to set the digest cache in the iint of the inode for
  which the digest cache was requested
- Add dig_owner and dig_user to the iint, to distinguish the inode from
  which the digest cache was created from the inode using it; consequently,
  this makes the digest cache usable to measure/appraise other digest
  caches (support not yet enabled)
- Add dig_owner_mutex and dig_user_mutex to serialize accesses to dig_owner
  and dig_user until they are initialized
- Enforce strong synchronization and make the contenders wait until
  dig_owner and dig_user are assigned to the iint the first time
- Move checking IMA actions on the digest list earlier, and fail if no
  action was performed (digest cache not usable)
- Remove digest_cache_put(), not needed anymore with the introduction of
  the reference count
- Fail immediately in digest_cache_lookup() if the digest algorithm is
  not set in the digest cache
- Use 64 bit mask for IMA actions on the digest list instead of 8 bit
- Return NULL in the inline version of digest_cache_get()
- Use list_add_tail() instead of list_add() in the iterator
- Copy the digest list path to a separate buffer in digest_cache_iter_dir()
- Use digest list parsers verified with Frama-C
- Explicitly disable (for now) the possibility in the IMA policy to use the
  digest cache to measure/appraise other digest lists
- Replace exit(<value>) with return <value> in manage_digest_lists.c

Roberto Sassu (13):
  ima: Introduce hook DIGEST_LIST_CHECK
  integrity: Introduce a digest cache
  integrity/digest_cache: Add functions to populate and search
  integrity/digest_cache: Prefetch digest lists in a directory
  integrity/digest_cache: Parse tlv digest lists
  integrity/digest_cache: Parse rpm digest lists
  ima: Add digest_cache policy keyword
  ima: Use digest cache for measurement
  ima: Use digest cache for appraisal
  tools: Add tool to manage digest lists
  tools/digest-lists: Add tlv digest list generator and parser
  tools/digest-lists: Add rpm digest list generator and parser
  docs: Add documentation of the integrity digest cache

 Documentation/ABI/testing/ima_policy          |   6 +-
 Documentation/security/index.rst              |   1 +
 .../security/integrity-digest-cache.rst       | 484 ++++++++++++++++++
 MAINTAINERS                                   |   2 +
 include/linux/kernel_read_file.h              |   1 +
 include/uapi/linux/tlv_digest_list.h          |  59 +++
 include/uapi/linux/xattr.h                    |   3 +
 security/integrity/Kconfig                    |  12 +
 security/integrity/Makefile                   |   4 +
 security/integrity/digest_cache.c             | 454 ++++++++++++++++
 security/integrity/digest_cache.h             | 110 ++++
 security/integrity/digest_cache_iter.c        | 160 ++++++
 .../integrity/digest_list_parsers/parsers.h   |  15 +
 security/integrity/digest_list_parsers/rpm.c  | 215 ++++++++
 security/integrity/digest_list_parsers/tlv.c  | 188 +++++++
 security/integrity/iint.c                     |  12 +
 security/integrity/ima/ima.h                  |  16 +-
 security/integrity/ima/ima_api.c              |  22 +-
 security/integrity/ima/ima_appraise.c         |  16 +-
 security/integrity/ima/ima_main.c             |  40 +-
 security/integrity/ima/ima_policy.c           |  59 ++-
 security/integrity/integrity.h                |   8 +
 tools/Makefile                                |  16 +-
 tools/digest-lists/.gitignore                 |   7 +
 tools/digest-lists/Makefile                   |  72 +++
 tools/digest-lists/common.c                   | 163 ++++++
 tools/digest-lists/common.h                   |  90 ++++
 tools/digest-lists/generators/generators.h    |  18 +
 tools/digest-lists/generators/rpm.c           | 257 ++++++++++
 tools/digest-lists/generators/tlv.c           | 168 ++++++
 tools/digest-lists/manage_digest_lists.c      | 349 +++++++++++++
 tools/digest-lists/manage_digest_lists.txt    |  82 +++
 tools/digest-lists/parsers/parsers.h          |  16 +
 tools/digest-lists/parsers/rpm.c              | 169 ++++++
 tools/digest-lists/parsers/tlv.c              | 195 +++++++
 tools/digest-lists/parsers/tlv_parser.h       |  38 ++
 36 files changed, 3501 insertions(+), 26 deletions(-)
 create mode 100644 Documentation/security/integrity-digest-cache.rst
 create mode 100644 include/uapi/linux/tlv_digest_list.h
 create mode 100644 security/integrity/digest_cache.c
 create mode 100644 security/integrity/digest_cache.h
 create mode 100644 security/integrity/digest_cache_iter.c
 create mode 100644 security/integrity/digest_list_parsers/parsers.h
 create mode 100644 security/integrity/digest_list_parsers/rpm.c
 create mode 100644 security/integrity/digest_list_parsers/tlv.c
 create mode 100644 tools/digest-lists/.gitignore
 create mode 100644 tools/digest-lists/Makefile
 create mode 100644 tools/digest-lists/common.c
 create mode 100644 tools/digest-lists/common.h
 create mode 100644 tools/digest-lists/generators/generators.h
 create mode 100644 tools/digest-lists/generators/rpm.c
 create mode 100644 tools/digest-lists/generators/tlv.c
 create mode 100644 tools/digest-lists/manage_digest_lists.c
 create mode 100644 tools/digest-lists/manage_digest_lists.txt
 create mode 100644 tools/digest-lists/parsers/parsers.h
 create mode 100644 tools/digest-lists/parsers/rpm.c
 create mode 100644 tools/digest-lists/parsers/tlv.c
 create mode 100644 tools/digest-lists/parsers/tlv_parser.h
  

Comments

Roberto Sassu Sept. 5, 2023, 3:46 p.m. UTC | #1
On Sat, 2023-08-12 at 12:46 +0200, Roberto Sassu wrote:
> From: Roberto Sassu <roberto.sassu@huawei.com>
> 
> Introduction
> ============
> 
> The main goal of Integrity Measurement Architecture (IMA) is to perform a
> measurement of the file content and use it for remote attestation, to
> report a possibly compromised system, using the TPM as a root of trust. It
> can also prevent a system compromise from happening by checking the
> calculated file digest against a known-good reference value and by denying
> the current operation if there is a mismatch.

Small update: I built openSUSE Tumbleweed packages with the patches I
sent (I added an rpm plugin). After the following steps, openSUSE
Tumbleweed will:

- Provide a predictable measurement list and PCR (11)
- Enforce an IMA Appraisal policy for executable code (excluding tmpfs)
- Support the above after package installation and removal


1) Add the following repository to YaST:

https://download.opensuse.org/repositories/home:/roberto.sassu:/integrity-digest-cache/standard/


2) Install the necessary packages (in a testing VM):

# zypper in gpg2-2.3.8-316.2.x86_64 kernel-kvmsmall kernel-tools-digest-lists


3) Generate the initial set of digest lists from installed packages:

# manage_digest_lists -d /etc/digest_lists -o gen -i rpmdb -f rpm


4) Add security.digest_list xattr to every file, to point to the
respective digest list:

# manage_digest_lists -i /etc/digest_lists -o add-xattr


5) Create /etc/ima/ima-policy with the content:

dont_measure fsmagic=0x01021994
measure func=DIGEST_LIST_CHECK template=ima-modsig pcr=11
measure func=BPRM_CHECK digest_cache=content pcr=11
measure func=MMAP_CHECK digest_cache=content pcr=11
dont_appraise fsmagic=0x01021994
appraise func=BPRM_CHECK digest_cache=content
appraise func=MMAP_CHECK digest_cache=content
appraise func=DIGEST_LIST_CHECK appraise_type=imasig|modsig


6) Add the following line to /etc/apparmor.d/abstractions/base, to
   allow all processes to read /etc/digest_lists:

@{etc_ro}/digest_lists/**      r,


7) Reboot


That's it. Only openSUSE-distributed files are allowed to run (plus the
ones I built). If there is any problem, please reboot with
ima_appraise=log in the kernel command line, and delete the IMA policy.

I hope to receive more reviews, and to make the necessary changes to
have this new feature accepted in the kernel. This is a requirement for
Linux distributions to pick the feature up.

Thanks

Roberto

> Motivation
> ==========
> 
> This patch set aims to address two important shortcomings: predictability
> of the Platform Configuration Registers (PCRs), and the provisioning of
> reference values to compare the calculated file digest against.
> 
> Remote attestation, according to Trusted Computing Group (TCG)
> specifications, is done by replicating the PCR extend operation in
> software with the digests in the event log (in this case the IMA
> measurement list), and by comparing the obtained value with the PCR value
> signed by the TPM with the quote operation.
> 
> Due to how the extend operation is performed, if measurements are done in
> a different order, the final PCR value will be different. That means that
> if measurements are done in parallel, there is no way to predict what the
> final PCR value will be, making impossible to seal data to a PCR value. If
> the PCR value was predictable, a system could for example prove its
> integrity by unsealing and using its private key, without sending every
> time the full list of measurements.
> 
> Provisioning reference values for file digests is also a difficult task.
> The solution so far was to add file signatures to RPM packages, and
> possibly to DEB packages, so that IMA can verify them. While this undoubtly
> works, it also requires Linux distribution vendors to support the feature
> by rebuilding all their packages, and eventually extending their PKI to
> perform the additional signatures. It could also require developers extra
> work to deal with the additional data.
> 
> On the other hand, since often packages carry the file digests themselves,
> it won't be actually needed to add file signatures. If the kernel was able
> to extract the file digests by itself, all the tasks mentioned above for
> the Linux distribution vendors won't be needed too. All current and past
> Linux distributions can be easily retrofitted to enable IMA appraisal with
> the file digests from the packages.
> 
> Narrowing down the scope of a package parser to only extract specific
> information makes it small enough to accurately verify that it cannot harm
> the kernel. An additional mitigation consists in verifying the signature of
> the package first, before attempting to extract the file digests.
> 
> 
> Solution
> ========
> 
> To avoid a PCR is extended in a non-deterministic way, the proposed
> solution is to replace individual file measurements with the measurement of
> a file (the digest list) containing a set of file digests. If the
> calculated digest of a file being measured/appraised matches one digest in
> the set, its measurement is skipped. If otherwise there is no match, the
> file digest is added to the measurement list.
> 
> The resulting measurement list, which cannot be done on the default IMA PCR
> to avoid ambiguity with the default-style measurement, has the following
> meaning: none/some/all files represented with the measurement of the digest
> lists COULD have been accessed, without knowing IF and WHEN. Any other
> measurement (other than boot_aggregate) is of a file whose digest was not
> included in the digest list.
> 
> File signatures have a coarser granularity, it is per-signing key and not
> per-package. A measurement list containing just the measurement of the
> signing keys and the files without/invalid signature (those with valid
> signature would be skipped) would be even less accurate.
> 
> To ensure a rapid and smooth deployment of IMA appraisal, the kernel has
> been provided with the ability to extract file digests from the RPM
> package headers, and add them to the kernel memory on demand (only when a
> file from a given package is accessed). This ensures that the memory
> consumption for this new feature is directly proportional to the usage of
> the system.
> 
> 
> Scope
> =====
> 
> The integrity digest cache enables IMA to extend a PCR (not the default
> one) in a deterministic fashion, and to appraise immutable files with file
> digests from the packages, when no other appraisal method is available. It
> does not yet support metadata verification with Extended Verification
> Module (EVM), for which a separate patch set will be provided.
> 
> 
> Design
> ======
> 
> The digest cache is a hash table of file digests, attached to the inode of
> the digest list from which file digests are extracted. It is accessible,
> when a given file is being measured/appraised, from the new xattr
> security.digest_list, containing the path of the digest list itself.
> 
> If the calculated file digest is found in the digest cache, its measurement
> is avoided, or read-only access is granted if appraisal is in enforcing
> mode. Read-write access is prevented to avoid updating an unverified HMAC
> of file metadata.
> 
> The digest cache can be used only if the following conditions are met:
> 
> - The 'digest_cache=content' keyword is added to the desired IMA policy
>   rules;
> - If the rule action is 'measure', a PCR different from the default one
>   is specified;
> - If the rule action is 'appraise', 'digest_cache=content' and
>   'appraise_type' don't appear at the same time;
> - The same action for which the digest cache is used was done also on the
>   digest list;
> - The digest cache is not (currently) used for measurement/appraisal of
>   other digest lists.
> 
> For performance reasons, the digest cache is attached to every inode using
> it, since multiple hooks can be invoked on it before the
> measurement/appraisal result is cached. A reference count indicates how
> many inodes use it, and only when it reaches zero, the digest cache can be
> freed (for example when inodes are evicted from memory).
> 
> Two digest cache pointers have been added to the iint to distinguish for
> which purpose they should be used: dig_owner points to the digest cache
> created from the same inode the iint refers to, and should be used for
> measurement/appraisal of other inodes; dig_user points to the digest
> cache created from a different inode, and requested for
> measurement/appraisal. A single pointer would be ambiguous: for a digest
> list, it would point to the digest cache created from that list, and IMA
> would try to use that digest cache to measure/appraise the list itself.
> 
> Finally, at the first digest list measurement, an iterator is executed to
> sequentially read (not parse) all the digest lists in the same directory,
> so that the PCR is extended in a deterministic fashion. The other
> concurrent users of the digest cache have to wait until the iterator
> finishes.
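
To illustrate why a fixed read order matters, the sketch below uses a toy,
non-cryptographic stand-in for the PCR extend operation. It only shares the
property that the result depends on the order of the inputs. Sorting the
names is an assumption made here to obtain a fixed order; the text above only
states that the digest lists are read sequentially. All toy_* names are
hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a PCR extend: order-sensitive, NOT cryptographic. */
static uint64_t toy_extend(uint64_t pcr, const char *name)
{
	for (; *name; name++)
		pcr = pcr * 31 + (unsigned char)*name;
	return pcr;
}

static int toy_cmp_names(const void *a, const void *b)
{
	return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Extend in the order given, as on-demand first accesses would. */
static uint64_t toy_measure_in_order(const char **names, size_t n)
{
	uint64_t pcr = 0;

	for (size_t i = 0; i < n; i++)
		pcr = toy_extend(pcr, names[i]);
	return pcr;
}

/* Fix the order first (sorts the array in place), then extend. */
static uint64_t toy_measure_sorted(const char **names, size_t n)
{
	qsort(names, n, sizeof(*names), toy_cmp_names);
	return toy_measure_in_order(names, n);
}
```

With the names "a" and "b", toy_measure_in_order() gives 3105 for the order
(a, b) and 3135 for (b, a), while toy_measure_sorted() gives the same value
for both.
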
> 
> 
> API
> ===
> 
> digest_cache_alloc(), digest_cache_parse_digest_list() and
> digest_cache_new() are internal functions used during the creation and
> initialization of the digest cache.
> 
> digest_cache_get() and digest_cache_free() are called by the user of the
> digest cache (e.g. IMA), to obtain and free a digest cache.
> 
> digest_cache_init_htable(), digest_cache_add() and digest_cache_lookup()
> are called by the digest list parsers to populate and search in a digest
> cache.
> 
> 
> Digest List Formats
> ===================
> 
> tlv
> ~~~
> 
> The Type-Length-Value (TLV) format was chosen for its extensibility.
> Additional fields can be added without breaking compatibility with old
> versions of the parser.
> 
> The layout of a tlv digest list is the following:
> 
>  [header: DIGEST_LIST_FILE, num fields, total len]
>  [field: DIGEST_LIST_ALGO, length, value]
>  [field: DIGEST_LIST_ENTRY#1, length, value (below)]
>   |- [header: DIGEST_LIST_FILE, num fields, total len]
>   |- [ENTRY#1_DIGEST, length, file digest]
>   |- [ENTRY#1_PATH, length, file path]
>  [field: DIGEST_LIST_ENTRY#N, length, value (below)]
>   |- [header: DIGEST_LIST_FILE, num fields, total len]
>   |- [ENTRY#N_DIGEST, length, file digest]
>   |- [ENTRY#N_PATH, length, file path]
> 
> DIGEST_LIST_ALGO is a field to specify the algorithm of the file digest.
> DIGEST_LIST_ENTRY is a nested TLV structure with the following fields:
> ENTRY_DIGEST contains the file digest; ENTRY_PATH contains the file path.
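
The extensibility argument can be sketched with a small field walker that
skips types it does not know. The encoding used here (2-byte type and 4-byte
length, both big-endian, followed by the value, with no leading header) is an
assumption for illustration only and is not the layout defined by the patch
set; toy_tlv_find() is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_HDR_LEN 6	/* assumed: 2-byte type + 4-byte length, big-endian */

/*
 * Find the first field of the wanted type in buf. Fields of any other type
 * are skipped, which is what lets an old parser tolerate new fields.
 * Returns 0 and sets value/value_len on success, -1 if the field is not
 * found or the data is malformed.
 */
static int toy_tlv_find(const uint8_t *buf, size_t len, uint16_t wanted,
			const uint8_t **value, size_t *value_len)
{
	size_t off = 0;

	while (len - off >= TOY_HDR_LEN) {
		uint16_t type = (uint16_t)((buf[off] << 8) | buf[off + 1]);
		uint32_t flen = ((uint32_t)buf[off + 2] << 24) |
				((uint32_t)buf[off + 3] << 16) |
				((uint32_t)buf[off + 4] << 8) |
				(uint32_t)buf[off + 5];

		off += TOY_HDR_LEN;
		if (flen > len - off)
			return -1;	/* value runs past the end of the buffer */

		if (type == wanted) {
			*value = buf + off;
			*value_len = flen;
			return 0;
		}
		off += flen;		/* not the wanted field: skip its value */
	}
	return -1;
}
```

Since an entry is itself a nested TLV structure, the same function could be
called again on the returned value to look up the digest or the path inside
that entry.
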
> 
> 
> rpm
> ~~~
> 
> The rpm digest list is basically a subset of the RPM package header.
> Its format is:
> 
>  [RPM magic number]
>  [RPMTAG_IMMUTABLE]
> 
> RPMTAG_IMMUTABLE is a section of the full RPM header containing the part
> of the header that was signed, and whose signature is stored in the
> RPMTAG_RSAHEADER section.
> 
> 
> Appended Signature
> ~~~~~~~~~~~~~~~~~~
> 
> Digest lists can have a module-style appended signature, which can be used
> for appraisal with IMA. The signature type can be PKCS#7, as for kernel
> modules, or the new user asymmetric key signature.
> 
> 
> History
> =======
> 
> The original name of this work was IMA Digest Lists, which was considered
> somewhat too invasive. The code was moved to a separate component named
> DIGLIM (DIGest Lists Integrity Module), with the purpose of moving the
> complexity away from IMA, and of making it usable by other kernel
> components (e.g. Integrity Policy Enforcement, or IPE).
> 
> Since it was originally proposed, in 2017, this work has grown considerably
> thanks to various comments/suggestions. It has been an integral part of the
> openEuler distribution since the end of 2020.
> 
> There are significant differences between this and the previous versions.
> The most important one is moving from a centralized repository of file
> digests to a per-package repository. This significantly reduces the memory
> pressure, since digest lists are loaded into kernel memory only when they
> are actually needed. Also, file digests are automatically unloaded from
> kernel memory at the same time inodes are evicted from memory during
> reclamation.
> 
> 
> Performance
> ===========
> 
> The tests have been performed on a Fedora 38 virtual machine, with 8 cores
> (AMD EPYC-Rome), 4GB of RAM, TPM passthrough. The signing key is an ECDSA
> NIST P-384 key.
> 
> IMA measurement policy: no cache
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
>  dont_measure fsmagic=0x01021994
>  measure func=BPRM_CHECK
>  measure func=MMAP_CHECK
> 
> 
> IMA measurement policy: cache
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
>  dont_measure fsmagic=0x01021994
>  measure func=DIGEST_LIST_CHECK template=ima-modsig pcr=11
>  measure func=BPRM_CHECK digest_cache=content pcr=11
>  measure func=MMAP_CHECK digest_cache=content pcr=11
> 
> 
> IMA Measurement Results
> ~~~~~~~~~~~~~~~~~~~~~~~
> 
>                                +-----------+-----------+-----------+
>                                | # measur. | boot time |   slab    |
>  +-----------------------------+-----------+-----------+-----------+
>  | measure (no cache)          |    389    |  12.682s  | 231453 KB |
>  +-----------------------------+-----------+-----------+-----------+
>  | measure (cache, no iter)    |    175    |  12.283s  | 234224 KB |
>  +-----------------------------+-----------+-----------+-----------+
>  | measure (cache, iter)       |    853    |  16.430s  | 238491 KB |
>  +-----------------------------+-----------+-----------+-----------+
> 
> With the iterator enabled, all 852 packages are measured. Consequently, the
> boot time is longer. One possible optimization would be to exclude the
> packages that don't include measured files. By disabling the iterator, it
> can be seen that the packages actually used are 174 (one measurement is for
> boot_aggregate).
> 
> 
> IMA appraisal policy: no cache
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
>  dont_appraise fsmagic=0x01021994
>  appraise func=BPRM_CHECK appraise_type=imasig
>  appraise func=MMAP_CHECK appraise_type=imasig
> 
> 
> IMA appraisal policy: cache
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
>  dont_appraise fsmagic=0x01021994
>  appraise func=DIGEST_LIST_CHECK appraise_type=imasig|modsig
>  appraise func=BPRM_CHECK digest_cache=content
>  appraise func=MMAP_CHECK digest_cache=content
> 
> 
> IMA Appraisal Results
> ~~~~~~~~~~~~~~~~~~~~~
> 
>                                +-----------+-----------+
>                                | boot time |   slab    |
>  +-----------------------------+-----------+-----------+
>  | appraise (no cache)         |  11.995s  | 231145 KB |
>  +-----------------------------+-----------+-----------+
>  | appraise (cache)            |  11.879s  | 233091 KB |
>  +-----------------------------+-----------+-----------+
> 
> In this test, it can be seen that the performance of the two solutions is
> comparable, with the digest cache slightly ahead. The difference could be
> more substantial with more files appraised.
> 
> 
> How to Test
> ===========
> 
> First, it is necessary to copy the new kernel headers (tlv_parser.h,
> uasym_parser.h, tlv_digest_list.h) from usr/include/linux in the kernel
> source directory to /usr/include/linux.
> 
> Then, gpg must be rebuilt with the additional patches to convert the PGP
> keys of the Linux distribution to the new user asymmetric key format:
> 
>  $ gpg --conv-kernel <path of PGP key> >> certs/uasym_keys.bin
> 
> This embeds the converted keys in the kernel image. Then, the following
> kernel options must be enabled:
> 
>  CONFIG_INTEGRITY_DIGEST_CACHE=y
>  CONFIG_UASYM_KEYS_SIGS=y
>  CONFIG_UASYM_PRELOAD_PUBLIC_KEYS=y
> 
> and the kernel must be rebuilt with the patches applied. After boot, it is
> necessary to build and install the digest list tool in tools/digest-lists,
> and to execute (as root):
> 
>  # manage_digest_lists -o gen -d /etc/digest_lists -i rpmdb -f rpm
> 
> The new gpg must also be installed in the system, as it will be used to
> convert the PGP signatures of the RPM headers to the user asymmetric key
> format.
> 
> It is recommended to create an additional digest list with the following
> files, by creating a file named 'list' with the content:
> 
>  /usr/bin/manage_digest_lists
>  /usr/lib64/libgen-tlv-list.so
>  /usr/lib64/libgen-rpm-list.so
>  /usr/lib64/libparse-rpm-list.so
>  /usr/lib64/libparse-tlv-list.so
> 
> Then, to create the digest list, it is sufficient to execute:
> 
>  # manage_digest_lists -i list -L -d /etc/digest_lists -o gen -f tlv
> 
> If appraisal is enabled and in enforcing mode, it is necessary to sign the
> new digest list, with the sign-file tool in the scripts/ directory of the
> kernel sources:
> 
>  # scripts/sign-file sha256 certs/signing_key.pem certs/signing_key.pem /etc/digest_lists/tlv-list
> 
> The final step is to add security.digest_list to each file with:
> 
>  # manage_digest_lists -i /etc/digest_lists -o add-xattr
> 
> After that, it is possible to test the integrity digest cache with the
> following policy written to /etc/ima/ima-policy:
> 
>  dont_measure fsmagic=0x01021994
>  measure func=DIGEST_LIST_CHECK template=ima-modsig pcr=11
>  measure func=BPRM_CHECK digest_cache=content pcr=11
>  measure func=MMAP_CHECK digest_cache=content pcr=11
>  dont_appraise fsmagic=0x01021994
>  appraise func=BPRM_CHECK digest_cache=content
>  appraise func=MMAP_CHECK digest_cache=content
>  appraise func=DIGEST_LIST_CHECK appraise_type=imasig|modsig
> 
> Tmpfs is excluded for now, until memfd is properly handled.
> 
> Before loading the policy, it is possible to enable dynamic debug to see
> which operations are done by the integrity digest cache:
> 
>  # echo "file tlv* +p" > /sys/kernel/debug/dynamic_debug/control
>  # echo "file rpm* +p" > /sys/kernel/debug/dynamic_debug/control
>  # echo "file digest* +p" > /sys/kernel/debug/dynamic_debug/control
> 
> Alternatively, the same strings can be set as value of the dyndbg= option
> in the kernel command line.
> 
> A preliminary test, before booting the system with the new policy, is to
> supply the policy to IMA in the current system with:
> 
>  # cat /etc/ima/ima-policy > /sys/kernel/security/ima/policy
> 
> If that worked, the system can be rebooted. Systemd will take care of
> loading the IMA policy at boot. The instructions have been tested on a
> Fedora 38 OS.
> 
> After boot, it is possible to check the content of the measurement list:
> 
>  # cat /sys/kernel/security/ima/ascii_runtime_measurements
> 
> If only the files shipped with Fedora 38 have been executed, the
> measurement list will contain only the digest lists, and not the individual
> files.
> 
> Another test is to ensure that IMA prevents the execution of unknown files:
> 
>  # cp -a /bin/cat .
>  # ./cat
> 
> That will work, but executing the modified binary will not:
> 
>  # echo 1 >> cat
>  # ./cat
>  -bash: ./cat: Permission denied
> 
> Execution will be denied, and a new entry in the measurement list will
> appear (it would probably be fine not to add that entry, as access to the
> file was denied):
> 
>  11 50b5a68bea0776a84eef6725f17ce474756e51c0 ima-ng sha256:15e1efee080fe54f5d7404af7e913de01671e745ce55215d89f3d6521d3884f0 /root/cat
> 
> Finally, it is possible to test the shrinking of the digest cache, by
> forcing the kernel to evict inodes from memory:
> 
>  # echo 3 > /proc/sys/vm/drop_caches
> 
> The kernel log should have messages like:
> 
>  [  313.032536] DIGEST CACHE: Remove digest sha256:102900208eef27b766380135906d431dba87edaa7ec6aa72e6ebd3dd67f3a97b from digest list /etc/digest_lists/rpm-libseccomp-2.5.3-4.fc38.x86_64
> 
> 
> Patch set dependencies
> ======================
> 
> This patch set depends on:
> 
> https://lore.kernel.org/linux-integrity/20230720153247.3755856-2-roberto.sassu@huaweicloud.com/
> 
> which allows RPM package headers to be appraised with the PGP keys of Linux
> distribution vendors.
> 
> 
> Patch set content
> =================
> 
> Patch 1 introduces a new hook to identify the loading of digest lists and
> consequently appraise them.
> 
> Patches 2-4 implement the digest cache, and an iterator to prefetch the
> digest lists to measure them in a deterministic way.
> 
> Patches 5-6 implement the currently supported digest list formats: tlv and
> rpm. The tlv format relies on the TLV parser defined in the patch set
> mentioned above.
> 
> Patches 7-9 enable the usage of the digest cache in IMA for measurement and
> appraisal.
> 
> Patches 10-12 add a tool to manage digest lists.
> 
> Patch 13 adds the documentation of the integrity digest cache.
> 
> 
> Changelog
> =========
> 
> v1:
> - Add documentation in Documentation/security/integrity-digest-cache.rst
> - Pass the mask of IMA actions to digest_cache_alloc()
> - Add a reference count to the digest cache
> - Remove the path parameter from digest_cache_get(), and rely on the
>   reference count to avoid the digest cache disappearing while being used
> - Rename the dentry_to_check parameter of digest_cache_get() to dentry
> - Rename digest_cache_get() to digest_cache_new() and add
>   digest_cache_get() to set the digest cache in the iint of the inode for
>   which the digest cache was requested
> - Add dig_owner and dig_user to the iint, to distinguish the inode from
>   which the digest cache was created from the inode using it; consequently,
>   the digest cache becomes usable to measure/appraise other digest caches
>   (support not yet enabled)
> - Add dig_owner_mutex and dig_user_mutex to serialize accesses to dig_owner
>   and dig_user until they are initialized
> - Enforce strong synchronization and make the contenders wait until
>   dig_owner and dig_user are assigned to the iint the first time
> - Move checking IMA actions on the digest list earlier, and fail if no
>   action was performed (digest cache not usable)
> - Remove digest_cache_put(), not needed anymore with the introduction of
>   the reference count
> - Fail immediately in digest_cache_lookup() if the digest algorithm is
>   not set in the digest cache
> - Use 64 bit mask for IMA actions on the digest list instead of 8 bit
> - Return NULL in the inline version of digest_cache_get()
> - Use list_add_tail() instead of list_add() in the iterator
> - Copy the digest list path to a separate buffer in digest_cache_iter_dir()
> - Use digest list parsers verified with Frama-C
> - Explicitly disable (for now) the possibility in the IMA policy to use the
>   digest cache to measure/appraise other digest lists
> - Replace exit(<value>) with return <value> in manage_digest_lists.c
> 
> Roberto Sassu (13):
>   ima: Introduce hook DIGEST_LIST_CHECK
>   integrity: Introduce a digest cache
>   integrity/digest_cache: Add functions to populate and search
>   integrity/digest_cache: Prefetch digest lists in a directory
>   integrity/digest_cache: Parse tlv digest lists
>   integrity/digest_cache: Parse rpm digest lists
>   ima: Add digest_cache policy keyword
>   ima: Use digest cache for measurement
>   ima: Use digest cache for appraisal
>   tools: Add tool to manage digest lists
>   tools/digest-lists: Add tlv digest list generator and parser
>   tools/digest-lists: Add rpm digest list generator and parser
>   docs: Add documentation of the integrity digest cache
> 
>  Documentation/ABI/testing/ima_policy          |   6 +-
>  Documentation/security/index.rst              |   1 +
>  .../security/integrity-digest-cache.rst       | 484 ++++++++++++++++++
>  MAINTAINERS                                   |   2 +
>  include/linux/kernel_read_file.h              |   1 +
>  include/uapi/linux/tlv_digest_list.h          |  59 +++
>  include/uapi/linux/xattr.h                    |   3 +
>  security/integrity/Kconfig                    |  12 +
>  security/integrity/Makefile                   |   4 +
>  security/integrity/digest_cache.c             | 454 ++++++++++++++++
>  security/integrity/digest_cache.h             | 110 ++++
>  security/integrity/digest_cache_iter.c        | 160 ++++++
>  .../integrity/digest_list_parsers/parsers.h   |  15 +
>  security/integrity/digest_list_parsers/rpm.c  | 215 ++++++++
>  security/integrity/digest_list_parsers/tlv.c  | 188 +++++++
>  security/integrity/iint.c                     |  12 +
>  security/integrity/ima/ima.h                  |  16 +-
>  security/integrity/ima/ima_api.c              |  22 +-
>  security/integrity/ima/ima_appraise.c         |  16 +-
>  security/integrity/ima/ima_main.c             |  40 +-
>  security/integrity/ima/ima_policy.c           |  59 ++-
>  security/integrity/integrity.h                |   8 +
>  tools/Makefile                                |  16 +-
>  tools/digest-lists/.gitignore                 |   7 +
>  tools/digest-lists/Makefile                   |  72 +++
>  tools/digest-lists/common.c                   | 163 ++++++
>  tools/digest-lists/common.h                   |  90 ++++
>  tools/digest-lists/generators/generators.h    |  18 +
>  tools/digest-lists/generators/rpm.c           | 257 ++++++++++
>  tools/digest-lists/generators/tlv.c           | 168 ++++++
>  tools/digest-lists/manage_digest_lists.c      | 349 +++++++++++++
>  tools/digest-lists/manage_digest_lists.txt    |  82 +++
>  tools/digest-lists/parsers/parsers.h          |  16 +
>  tools/digest-lists/parsers/rpm.c              | 169 ++++++
>  tools/digest-lists/parsers/tlv.c              | 195 +++++++
>  tools/digest-lists/parsers/tlv_parser.h       |  38 ++
>  36 files changed, 3501 insertions(+), 26 deletions(-)
>  create mode 100644 Documentation/security/integrity-digest-cache.rst
>  create mode 100644 include/uapi/linux/tlv_digest_list.h
>  create mode 100644 security/integrity/digest_cache.c
>  create mode 100644 security/integrity/digest_cache.h
>  create mode 100644 security/integrity/digest_cache_iter.c
>  create mode 100644 security/integrity/digest_list_parsers/parsers.h
>  create mode 100644 security/integrity/digest_list_parsers/rpm.c
>  create mode 100644 security/integrity/digest_list_parsers/tlv.c
>  create mode 100644 tools/digest-lists/.gitignore
>  create mode 100644 tools/digest-lists/Makefile
>  create mode 100644 tools/digest-lists/common.c
>  create mode 100644 tools/digest-lists/common.h
>  create mode 100644 tools/digest-lists/generators/generators.h
>  create mode 100644 tools/digest-lists/generators/rpm.c
>  create mode 100644 tools/digest-lists/generators/tlv.c
>  create mode 100644 tools/digest-lists/manage_digest_lists.c
>  create mode 100644 tools/digest-lists/manage_digest_lists.txt
>  create mode 100644 tools/digest-lists/parsers/parsers.h
>  create mode 100644 tools/digest-lists/parsers/rpm.c
>  create mode 100644 tools/digest-lists/parsers/tlv.c
>  create mode 100644 tools/digest-lists/parsers/tlv_parser.h
>