[RFC,00/12] integrity: Introduce a digest cache

Message ID 20230721163326.4106089-1-roberto.sassu@huaweicloud.com
Headers
Series integrity: Introduce a digest cache |

Message

Roberto Sassu July 21, 2023, 4:33 p.m. UTC
  From: Roberto Sassu <roberto.sassu@huawei.com>

A bit of history first. The original name of this work was IMA Digest
Lists, which was somehow considered too invasive. I then moved the code
to a separate component named DIGLIM (DIGest Lists Integrity Module), with
the purpose of removing the complexity away of IMA, and also add the
possibility of using it with other kernel components (e.g. Integrity Policy
Enforcement, or IPE).

Since it was originally proposed, in 2017, this work grew up a lot thanks
to the feedback of my colleagues and external reviewers. It became
integrally part of the openEuler distribution since end of 2020. The
upstreaming process has been difficult, also due to the fact too many
features were included in the proposals.

So, I decided to take a step back and find the minimum possible set of
features that would make this work meaningful. The really minimum set that
would make appraisal work is to parse the RPM header (after signature
verification) containing the reference digests and compare the calculated
file digest with them. That would be similar to reading the signature from
the xattr and verifying the digest.

But, maybe that was too minimal. It does not make sense to appraise and
parse again the same RPM header for another packaged files. And should not
be necessarily tied to appraisal, as it would be useful for measurement too
(to make a PCR predictable). Please read the Benefits section here:

https://lore.kernel.org/linux-integrity/20210914163401.864635-1-roberto.sassu@huawei.com/

So, this is how the integrity digest cache is born.

The main difference with the previous attempts is that there is not a
centralized place to store file/metadata digests, but they are stored in
the digest cache, attached to the file (the digest list) the digest cache
was created from.

The link with the files being measured/appraised is the new
security.digest_list xattr containing the full path of the digest list
they refer to. At the time there is a measurement/appraisal, the digest
cache is built by reading (after appraisal) and parsing the digest list
(a TLV format and the RPM package format are currently supported).
Extracted digests are added to a per-package hash table, sized depending on
the number of elements.

Lookup should be much faster, as we are not anymore searching in a hash
table with 80000 digests, but most of the time with 100 or less. Also,
there is no need for locking after creation, the digest cache does not
change depending on writes (it would not make sense, since digest lists are
signed). Another very important feature is that the digest cache is
reclaimable, i.e. it disappears if the inode is evicted from memory. In
that case, the digest cache need to be initialized again. The digest cache
does not disappear while IMA is using it, by acquiring and releasing a
reference to the path structure of the digest list.

What about the predictability of PCRs, that some folks are trying to
address with the Unified Kernel Image? The concept of digest lists is
quite simple: measure the digest list to represent the possible access of a
group of files, instead of recording individual file accesses. If digest
lists are measured in a deterministic way, the PCR remains predictable
despite files are accessed in a different order. We currently don't support
xattrs in the initial ram disk, but if we did and the initial ram disk uses
the same measured files as the root filesystem, the PCR would still remain
predictable.

Another important point of the design was to avoid any possible
interference with existing IMA measurement and appraisal behavior. This
has been achieved in the following way.

First, the digest cache needs to be explicitly enabled in the IMA policy
through the new policy keyword 'digest_cache=content'. Second, new-style
measurements cannot be done on the default IMA PCR, a policy writer must
specify a different PCR or the policy will be rejected. Also, the use of
the digest cache is incompatible with other appraisal methods, e.g. with
xattrs or a modsig. The policy writer cannot specify any of the other
methods if it includes the digest_cache directive. Finally, the digest
cache does not bypass EVM too. Files matched with the digest caches can
only be opened read-only, to prevent updating an unverified HMAC to a valid
one.

Another incorrect policy combination is forbidden. The use of the
digest_cache directive alone does not enable the new behavior. The action
for which the digest cache should be used should have also be done on the
digest list itself. Otherwise, one cannot explain why there are no
measurements (if the digest list was not measured for example). The same
applies for appraisal.

The last part I wanted to talk about is about the digest list parsers. This
was a long debate. In the original proposal, Matthew Garrett and Christoph
Hellwig said that adding parsers in the kernel is not scalable and not a
good idea in general. While I do agree with them, I'm also thinking what
benefits we get if we relax a bit this requirement. If we merge this patch
set and the dependency (user asymmetric keys and signatures) today, we are
immediately able to have a predictable PCR for measurement, and do
appraisal at least of executable code without additional support from the
Linux distributions. We would need a small rpm plugin to write/remove RPM
headers and their signature to/from the disk as soon as packages are
installed/removed.

Over the years, I have tried many alternatives to the kernel-based parsers.
In the very first version, we supported only one digest list format in the
kernel, and injected the digest list to each package at build time. While
it works, it is still one way to do that, others don't support it and it is
building infrastructure-dependent.

I have tried to do the conversion from the RPM format to the kernel format
in user space, with the idea that the process doing it could be isolated
against an untrusted root. I had a recent discussion with the security
folks and they don't seem excited about it.

I have fully implemented DIGLIM and PGP keys and signatures support in
eBPF. The idea itself of being able to add kernel functionality in a safe
way without touching the kernel is very nice, and I would have pursued it
more. However, after I found LSM policy bypass and other bugs and after my
patches were not accepted by the maintainers, I didn't really feel I could
rely on this subsystem. So, back to the original approach for now,
maintainers seem to be more incline to accept kernel code, if there is a
need for strong isolation. In the future, we might work on better
alternative.

This patch set depends on:

https://lore.kernel.org/linux-integrity/20230720153247.3755856-2-roberto.sassu@huaweicloud.com/

which allows to appraise RPM package headers with the PGP keys of Linux
distribution vendors.


Patch 1 introduces a new hook to identify the loading of digest lists and
consequently appraise them.

Patches 2-4 implement the digest cache, and an iterator to prefetch the
digest lists to measure them in a deterministic way.

Patches 5-6 implement the currently supported digest list formats: tlv and
rpm. The tlv format relies on the TLV parser defined in the patch set
mentioned above.

Patches 7-9 enable the usage of the digest cache in IMA for measurement and
appraisal.

Patches 10-12 add a tool to manage digest lists.

If you are curious to try on an existing system, you first need to build
gpg with the patches in the previous patch set, and convert the PGP keys of
your Linux distribution to the new user asymmetric key format:

$ gpg --conv-kernel <path of PGP key> >> certs/uasym_keys.bin

This embeds the converted keys in the kernel image. Then, enable the
following kernel options:

CONFIG_INTEGRITY_DIGEST_CACHE=y
CONFIG_UASYM_KEYS_SIGS=y
CONFIG_UASYM_PRELOAD_PUBLIC_KEYS=y

and rebuild the kernel with the patches applied. After boot, build and
install the digest list tool in tools/digest-lists, and execute:

$ manage_digest_lists -o gen -d /etc/digest_lists -i rpmdb -f rpm

You also need the new gpg when you execute this tool, to convert the PGP
signatures of the RPM headers to the user asymmetric key format.

You may want to add the following additional files in a digest list by
creating a file named 'list' with the content:

/usr/bin/manage_digest_lists
/usr/lib64/libgen-tlv-list.so
/usr/lib64/libgen-rpm-list.so
/usr/lib64/libparse-rpm-list.so
/usr/lib64/libparse-tlv-list.so

and, execute:

$ manage_digest_lists -i list -L -d /etc/digest_lists -o gen -f tlv

You need to sign the created file:

scripts/sign-file sha256 certs/signing_key.pem certs/signing_key.pem /etc/digest_lists/tlv-list

The final step is to add security.digest_list to each file with:

$ manage_digest_lists -i /etc/digest_lists -o add-xattr

After that, create the following policy in /etc/ima/ima-policy:

dont_measure fsmagic=0x01021994
dont_appraise fsmagic=0x01021994
appraise func=BPRM_CHECK digest_cache=content
appraise func=MMAP_CHECK digest_cache=content
appraise func=DIGEST_LIST_CHECK appraise_type=imasig|modsig
measure func=DIGEST_LIST_CHECK template=ima-modsig pcr=11
measure func=BPRM_CHECK digest_cache=content pcr=11
measure func=MMAP_CHECK digest_cache=content pcr=11

I'm excluding tmpfs for now, we need to deal with memfd.

Before loading the policy, you could enable the dynamic debug with:

$ echo "file tlv* +p" > /sys/kernel/debug/dynamic_debug/control
$ echo "file rpm* +p" > /sys/kernel/debug/dynamic_debug/control
$ echo "file digest* +p" > /sys/kernel/debug/dynamic_debug/control

Or add the same strings with the dyndbg= option in the kernel command line.

Then, just cat the policy to IMA:

$ cat /etc/ima/ima-policy > /sys/kernel/security/ima/policy

If that worked, you can reboot the system. Systemd will take care of
loading the IMA policy at boot. Everything works for me on Fedora 38.

You can check the content of the measurement list:

$ cat /sys/kernel/security/ima/ascii_runtime_measurements

You will see only the measurement of the digest lists, not of the other
files. If you try:

$ cp -a /bin/cat .
$ ./cat

That will work. But if you do:

$ echo 1 >> cat
$ cat
-bash: ./cat: Permission denied

Execution will be denied, and you will see a new entry in the measurement
list (honestly, it should not be there, as access to the file was denied):

11 50b5a68bea0776a84eef6725f17ce474756e51c0 ima-ng sha256:15e1efee080fe54f5d7404af7e913de01671e745ce55215d89f3d6521d3884f0 /root/cat

Finally, you could try to evict inodes from memory, to free the digest
cache:

$ echo 3 > /proc/sys/vm/drop_caches

You will see in the kernel logs messages like:

[  313.032536] DIGEST CACHE: Remove digest sha256:102900208eef27b766380135906d431dba87edaa7ec6aa72e6ebd3dd67f3a97b from digest list /etc/digest_lists/rpm-libseccomp-2.5.3-4.fc38.x86_64

Roberto Sassu (12):
  ima: Introduce hook DIGEST_LIST_CHECK
  integrity: Introduce a digest cache
  integrity/digest_cache: Add functions to populate and search
  integrity/digest_cache: Iterate over digest lists in same dir
  integrity/digest_cache: Parse tlv digest lists
  integrity/digest_cache: Parse rpm digest lists
  ima: Add digest_cache policy keyword
  ima: Use digest cache for measurement
  ima: Use digest cache for appraisal
  tools: Add tool to manage digest lists
  tools/digest-lists: Add tlv digest list generator and parser
  tools/digest-lists: Add rpm digest list generator and parser

 Documentation/ABI/testing/ima_policy          |   6 +-
 MAINTAINERS                                   |   1 +
 include/linux/kernel_read_file.h              |   1 +
 include/uapi/linux/tlv_digest_list.h          |  59 +++
 include/uapi/linux/xattr.h                    |   3 +
 security/integrity/Kconfig                    |  12 +
 security/integrity/Makefile                   |   4 +
 security/integrity/digest_cache.c             | 430 ++++++++++++++++++
 security/integrity/digest_cache.h             | 113 +++++
 security/integrity/digest_cache_iter.c        | 163 +++++++
 .../integrity/digest_list_parsers/parsers.h   |  15 +
 security/integrity/digest_list_parsers/rpm.c  | 174 +++++++
 security/integrity/digest_list_parsers/tlv.c  | 188 ++++++++
 security/integrity/iint.c                     |   7 +
 security/integrity/ima/ima.h                  |  17 +-
 security/integrity/ima/ima_api.c              |  22 +-
 security/integrity/ima/ima_appraise.c         |  16 +-
 security/integrity/ima/ima_main.c             |  39 +-
 security/integrity/ima/ima_policy.c           |  54 ++-
 security/integrity/integrity.h                |   5 +
 tools/Makefile                                |  16 +-
 tools/digest-lists/.gitignore                 |   7 +
 tools/digest-lists/Makefile                   |  72 +++
 tools/digest-lists/common.c                   | 163 +++++++
 tools/digest-lists/common.h                   |  90 ++++
 tools/digest-lists/generators/generators.h    |  18 +
 tools/digest-lists/generators/rpm.c           | 257 +++++++++++
 tools/digest-lists/generators/tlv.c           | 168 +++++++
 tools/digest-lists/manage_digest_lists.c      | 349 ++++++++++++++
 tools/digest-lists/manage_digest_lists.txt    |  82 ++++
 tools/digest-lists/parsers/parsers.h          |  16 +
 tools/digest-lists/parsers/rpm.c              | 169 +++++++
 tools/digest-lists/parsers/tlv.c              | 195 ++++++++
 tools/digest-lists/parsers/tlv_parser.h       |  38 ++
 34 files changed, 2942 insertions(+), 27 deletions(-)
 create mode 100644 include/uapi/linux/tlv_digest_list.h
 create mode 100644 security/integrity/digest_cache.c
 create mode 100644 security/integrity/digest_cache.h
 create mode 100644 security/integrity/digest_cache_iter.c
 create mode 100644 security/integrity/digest_list_parsers/parsers.h
 create mode 100644 security/integrity/digest_list_parsers/rpm.c
 create mode 100644 security/integrity/digest_list_parsers/tlv.c
 create mode 100644 tools/digest-lists/.gitignore
 create mode 100644 tools/digest-lists/Makefile
 create mode 100644 tools/digest-lists/common.c
 create mode 100644 tools/digest-lists/common.h
 create mode 100644 tools/digest-lists/generators/generators.h
 create mode 100644 tools/digest-lists/generators/rpm.c
 create mode 100644 tools/digest-lists/generators/tlv.c
 create mode 100644 tools/digest-lists/manage_digest_lists.c
 create mode 100644 tools/digest-lists/manage_digest_lists.txt
 create mode 100644 tools/digest-lists/parsers/parsers.h
 create mode 100644 tools/digest-lists/parsers/rpm.c
 create mode 100644 tools/digest-lists/parsers/tlv.c
 create mode 100644 tools/digest-lists/parsers/tlv_parser.h
  

Comments

Roberto Sassu Aug. 3, 2023, 4:20 p.m. UTC | #1
On Fri, 2023-07-21 at 18:33 +0200, Roberto Sassu wrote:
> From: Roberto Sassu <roberto.sassu@huawei.com>

[...]

> The last part I wanted to talk about is about the digest list parsers. This
> was a long debate. In the original proposal, Matthew Garrett and Christoph
> Hellwig said that adding parsers in the kernel is not scalable and not a
> good idea in general. While I do agree with them, I'm also thinking what
> benefits we get if we relax a bit this requirement. If we merge this patch

I tried to mitigate the risk of adding unsafe code to the kernel by
verifying the parsers with a formal verification tool, Frama-C.

The verified code can be accessed here, and contains all the necessary
dependencies (so that the kernel is not involved):

https://github.com/robertosassu/rpm-formal

I added some assertions, to ensure that for any given input, the parser
does not try to reference memory outside the assigned memory area.

I also tried to enforce finite termination by making the number of
loops dependent on the passed data length.

The output I get is the following:

[eva:summary] ====== ANALYSIS SUMMARY ======
  ----------------------------------------------------------------------------
  13 functions analyzed (out of 13): 100% coverage.
  In these functions, 232 statements reached (out of 251): 92% coverage.
  ----------------------------------------------------------------------------
  Some errors and warnings have been raised during the analysis:
    by the Eva analyzer:      0 errors    2 warnings
    by the Frama-C kernel:    0 errors    0 warnings
  ----------------------------------------------------------------------------
  0 alarms generated by the analysis.
  ----------------------------------------------------------------------------
  Evaluation of the logical properties reached by the analysis:
    Assertions        5 valid     0 unknown     0 invalid      5 total
    Preconditions    25 valid     0 unknown     0 invalid     25 total
  100% of the logical properties reached have been proven.
  ----------------------------------------------------------------------------

The warnings are:

[eva] validate_tlv.c:353: Warning: 
  this partitioning parameter cannot be evaluated safely on all states

[eva] validate_tlv.c:381: Warning: 
  this partitioning parameter cannot be evaluated safely on all states

Not sure how I can make them go away. Anyway, the assertions are
successful.

I verified the parsers with both deterministic (random but valid) and
non-deterministic (random and possibly invalid) data. For deterministic
data, I also verified that bytes at a specific location have the
expected value.

Due to the increasing complexity, the analysis was not done on
arbitrary lengths and value ranges (it would probably require a
different type of analysis).

Thanks

Roberto