Skip to content

libtcb: skip privilege drop in unprivileged user namespaces#38

Merged
solardiz merged 3 commits into
openwall:mainfrom
Blarse:libtcb-unpriv-userns-support
May 29, 2026
Merged

libtcb: skip privilege drop in unprivileged user namespaces#38
solardiz merged 3 commits into
openwall:mainfrom
Blarse:libtcb-unpriv-userns-support

Conversation

@Blarse
Copy link
Copy Markdown
Contributor

@Blarse Blarse commented Apr 28, 2026

When an unprivileged user sets up a user namespace without newuidmap, it must first write "deny" to /proc/self/setgroups before writing gid_map, after which setgroups() becomes permanently forbidden[1]. tcb_drop_priv_r() then trips over sys_setgroups(0, NULL) and bails out with EPERM, even though euid 0 inside the namespace is not real root and there is nothing to drop in the first place.

In my case this breaks mkosi tool that builds images inside an unprivileged userns, where any tcb-aware utility aborts on the setgroups step. The patch detects the "deny" state via a small setgroups_allowed() helper and short-circuits tcb_drop_priv_r() with PRIV_MAGIC_NONROOT, the same path already taken for non-root callers. No behavioural change outside user namespaces; non-Linux builds get a stub that always returns 1.

[1] https://man7.org/linux/man-pages/man7/user_namespaces.7.html

@solardiz solardiz requested a review from ldv-alt May 2, 2026 20:17
@solardiz
Copy link
Copy Markdown
Member

solardiz commented May 2, 2026

Thank you @Blarse! Are these changes already in use in ALT's package perhaps?

I was concerned at first that the deny mode could possibly be enabled by a user in some other scenario, such as planning to attack libtcb as used by a privileged program, but reading up on it and experimenting with it a little bit I see it's very specialized to unprivileged containers.

Another approach I'd consider is only checking for deny after setgroups fails, and letting the uid/gid changes proceed anyway (and similar when restoring, so would need another magic to identify this state, or maybe mark it with number_of_groups = -1). However, this isn't obviously better - it skips the extra logic in the common case (faster?) and never skips uid/gid switching (safer against the unknown?), but adds complexity (and effort since it's different from what you already have in this PR). So just thinking out loud.

Comment thread libs/libtcb.c Outdated
@solardiz
Copy link
Copy Markdown
Member

solardiz commented May 2, 2026

Another approach I'd consider is only checking for deny after setgroups fails, and letting the uid/gid changes proceed anyway (and similar when restoring, so would need another magic to identify this state, or maybe mark it with number_of_groups = -1). However, this isn't obviously better - it skips the extra logic in the common case (faster?) and never skips uid/gid switching (safer against the unknown?)

"The unknown" includes possible future changes to the kernel, where the deny mode could possibly become usable in some other scenario. It isn't currently intended to disable any security measures that userspace tools take, so us doing essentially that is a risk and may not be future-proof.

@Blarse
Copy link
Copy Markdown
Contributor Author

Blarse commented May 4, 2026

Hi,

Thank you @Blarse! Are these changes already in use in ALT's package perhaps?

Not yet, but I'm working on porting mkosi to ALT Linux and stumbled upon a systemd-sysusers segfault caused by this. I can confirm that this patch fixes it.

Apparently, when useradd invoked libtcb and libtcb could not drop privileges after a failed setgroups() call, it left an empty shadow file, which in turn triggered another issue in ALT's downstream patch to systemd-sysusers that adds tcb support.

I was concerned at first that the deny mode could possibly be enabled by a user in some other scenario, such as planning to attack libtcb as used by a privileged program, but reading up on it and experimenting with it a little bit I see it's very specialized to unprivileged containers.

(For reference, virtiofsd hit the same setgroups + user namespace issue and applied a similar fix in !207.)

Another approach I'd consider is only checking for deny after setgroups fails, and letting the uid/gid changes proceed anyway (and similar when restoring, so would need another magic to identify this state, or maybe mark it with number_of_groups = -1). However, this isn't obviously better - it skips the extra logic in the common case (faster?) and never skips uid/gid switching (safer against the unknown?), but adds complexity (and effort since it's different from what you already have in this PR). So just thinking out loud.

@Blarse Blarse force-pushed the libtcb-unpriv-userns-support branch from 9f42bc3 to ad67dc3 Compare May 4, 2026 06:21
@solardiz
Copy link
Copy Markdown
Member

solardiz commented May 6, 2026

Thank you for revising the proc file reading/check per my request - this part passes my review now.

For reference, virtiofsd hit the same setgroups + user namespace issue and applied a similar fix in !207.

This looks like they're skipping only setgroups, whereas your changes here will also skip uid/gid switching. So the way you do it may be higher risk.

@ldv-alt Do we need to wait for your review here?

@ldv-alt
Copy link
Copy Markdown
Collaborator

ldv-alt commented May 6, 2026

I'd definitely prefer if the code would do the following:

  • attempt to drop supplementary groups using setgroups() unconditionally;
  • if setgroups() failed, check whether setgroups() is disabled;
  • if setgroups() is disabled, skipped just dropping/restoring of supplementary groups.

@Blarse
Copy link
Copy Markdown
Contributor Author

Blarse commented May 6, 2026

For reference, virtiofsd hit the same setgroups + user namespace issue and applied a similar fix in !207.

This looks like they're skipping only setgroups, whereas your changes here will also skip uid/gid switching. So the way you do it may be higher risk.

You're right. According to the man page referenced above:

•  One of the following two cases applies:

   (a)  Either the writing process has the CAP_SETUID (CAP_SETGID)
        capability in the parent user namespace.

        •  No further restrictions apply: the process can make
           mappings to arbitrary user IDs (group IDs) in the
           parent user namespace.

   (b)  Or otherwise all of the following restrictions apply:

        •  The data written to uid_map (gid_map) must consist of a
           single line that maps the writing process's effective
           user ID (group ID) in the parent user namespace to a
           user ID (group ID) in the user namespace.

        •  The writing process must have the same effective user
           ID as the process that created the user namespace.

        •  In the case of gid_map, use of the setgroups(2) system
           call must first be denied by writing "deny" to the
           /proc/pid/setgroups file (see below) before writing to
           gid_map.

The current PR effectively handles only case (b), where it makes no sense to switch uid/gid (though without the single line mapping check that the virtiofsd patch has). In case (a) the mapping can cover an arbitrary range, and /proc/pid/setgroups may be either allow or deny. So we shouldn't skip the whole drop just because setgroups=deny. With a range mapping, uid/gid switching still works and is needed.

I'll update the patch to skip only setgroups(0, NULL) when it's denied, while still calling ch_uid()/ch_gid(). tcb_gain_priv_r() needs a matching change to skip the setgroups() restoration in that case.

@Blarse
Copy link
Copy Markdown
Contributor Author

Blarse commented May 6, 2026

I'd definitely prefer if the code would do the following:

* attempt to drop supplementary groups using setgroups() unconditionally;

* if setgroups() failed, check whether setgroups() is disabled;

* if setgroups() is disabled, skipped just dropping/restoring of supplementary groups.

Would something like this work for you?

 	}
 	free(dir);
 
-	res = getgroups(p->number_of_groups, p->grplist);
-	if (res < 0 || res > p->number_of_groups)
-		return -1;
-
-	p->number_of_groups = res;
+	if (setgroups_allowed()) {
+		res = getgroups(p->number_of_groups, p->grplist);
+		if (res < 0 || res > p->number_of_groups)
+			return -1;
+		p->number_of_groups = res;
+		if (sys_setgroups(0, NULL) == -1)
+			return -1;
+	} else {
+		p->number_of_groups = -1;
+	}
 
-	if (sys_setgroups(0, NULL) == -1)
-		return -1;
 	if (!ch_gid(shadow_gid, &p->old_gid))
 		return -1;
 	if (!ch_uid(st.st_uid, &p->old_uid))

And for tcb_gain_priv_r:

 		return -1;
 	if (!ch_gid(p->old_gid, NULL))
 		return -1;
-	if (sys_setgroups(p->number_of_groups, p->grplist) == -1)
+	if (p->number_of_groups >= 0 &&
+		sys_setgroups(p->number_of_groups, p->grplist) == -1)
 		return -1;

@ldv-alt
Copy link
Copy Markdown
Collaborator

ldv-alt commented May 6, 2026

I'd definitely prefer if the code would do the following:

* attempt to drop supplementary groups using setgroups() unconditionally;

* if setgroups() failed, check whether setgroups() is disabled;

* if setgroups() is disabled, skipped just dropping/restoring of supplementary groups.

Would something like this work for you?

 	}
 	free(dir);
 
-	res = getgroups(p->number_of_groups, p->grplist);
-	if (res < 0 || res > p->number_of_groups)
-		return -1;
-
-	p->number_of_groups = res;
+	if (setgroups_allowed()) {
+		res = getgroups(p->number_of_groups, p->grplist);
+		if (res < 0 || res > p->number_of_groups)
+			return -1;
+		p->number_of_groups = res;
+		if (sys_setgroups(0, NULL) == -1)
+			return -1;
+	} else {
+		p->number_of_groups = -1;
+	}
 
-	if (sys_setgroups(0, NULL) == -1)
-		return -1;

I'd rather invoked sys_setgroups(0, NULL) unconditionally and started checking setgroups_allowed() only if that setgroups call failed.

@Blarse Blarse force-pushed the libtcb-unpriv-userns-support branch from ad67dc3 to b8805df Compare May 6, 2026 09:25
@Blarse
Copy link
Copy Markdown
Contributor Author

Blarse commented May 6, 2026

I'd rather invoked sys_setgroups(0, NULL) unconditionally and started checking setgroups_allowed() only if that setgroups call failed.

I've updated the patch according to your suggestions.

@Blarse
Copy link
Copy Markdown
Contributor Author

Blarse commented May 22, 2026

Hi, is there anything else I need to address here, or is this good to merge?

Copy link
Copy Markdown
Collaborator

@ldv-alt ldv-alt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change itself looks good, the only comments I have are stylistic.

Comment thread libs/libtcb.c Outdated
Comment thread ChangeLog
Comment thread libs/libtcb.c Outdated
@Blarse Blarse force-pushed the libtcb-unpriv-userns-support branch from b8805df to 85d2484 Compare May 26, 2026 05:50
Copy link
Copy Markdown
Collaborator

@ldv-alt ldv-alt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

libtcb.c: In function ‘tcb_gain_priv_r’:
libtcb.c:286:30: error: ‘PRIV_MAGIC_NOSETGROUP’ undeclared (first use in this function); did you mean ‘PRIV_MAGIC_NOSETGROUPS’?
  286 |         if (p->is_dropped != PRIV_MAGIC_NOSETGROUP &&
      |                              ^~~~~~~~~~~~~~~~~~~~~
      |                              PRIV_MAGIC_NOSETGROUPS

@Blarse Blarse force-pushed the libtcb-unpriv-userns-support branch from 85d2484 to 9628b98 Compare May 26, 2026 17:32
@Blarse
Copy link
Copy Markdown
Contributor Author

Blarse commented May 26, 2026

libtcb.c: In function ‘tcb_gain_priv_r’:
libtcb.c:286:30: error: ‘PRIV_MAGIC_NOSETGROUP’ undeclared (first use in this function); did you mean ‘PRIV_MAGIC_NOSETGROUPS’?
  286 |         if (p->is_dropped != PRIV_MAGIC_NOSETGROUP &&
      |                              ^~~~~~~~~~~~~~~~~~~~~
      |                              PRIV_MAGIC_NOSETGROUPS

My bad, fixed.

Copy link
Copy Markdown
Collaborator

@ldv-alt ldv-alt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

libtcb.c: In function ‘tcb_gain_priv_r’:
libtcb.c:286:27: error: comparison of integer expressions of different signedness: ‘int’ and ‘unsigned int’ [-Werror=sign-compare]
  286 |         if (p->is_dropped != PRIV_MAGIC_NOSETGROUPS &&
      |                           ^~

@Blarse
Copy link
Copy Markdown
Contributor Author

Blarse commented May 27, 2026

libtcb.c: In function ‘tcb_gain_priv_r’:
libtcb.c:286:27: error: comparison of integer expressions of different signedness: ‘int’ and ‘unsigned int’ [-Werror=sign-compare]
  286 |         if (p->is_dropped != PRIV_MAGIC_NOSETGROUPS &&
      |                           ^~

Yep, just saw this. A simple fix would be adding a cast to int:

--- b/libs/libtcb.c
+++ a/libs/libtcb.c
@@ -283,7 +283,7 @@ int tcb_gain_priv_r(struct tcb_privs *p)
                return -1;
        if (!ch_gid(p->old_gid, NULL))
                return -1;
-       if (p->is_dropped != PRIV_MAGIC_NOSETGROUPS &&
+       if (p->is_dropped != (int) PRIV_MAGIC_NOSETGROUPS &&
            sys_setgroups(p->number_of_groups, p->grplist) == -1)
                return -1;

But the problem in general is due to the value I chose for PRIV_MAGIC_NOSETGROUPS not fitting in signed int. Do you think a better solution would be picking another value or casting to int in the macro definition?

@ldv-alt
Copy link
Copy Markdown
Collaborator

ldv-alt commented May 27, 2026

But the problem in general is due to the value I chose for PRIV_MAGIC_NOSETGROUPS not fitting in signed int. Do you think a better solution would be picking another value or casting to int in the macro definition?

Both PRIV_MAGIC and PRIV_MAGIC_NONROOT aren't fitting into signed int either. I'd consider changing the type of is_dropped to unsigned int.

Blarse added 3 commits May 28, 2026 11:18
The PRIV_MAGIC_* values used to mark the privilege state do not fit
into a signed int, so store them in an unsigned field to avoid a
signedness mismatch.
Detect /proc/self/setgroups == "deny" to recognize an unprivileged
user namespace where setgroups(2) is permanently denied by the kernel.
No-op on non-Linux.
In a user namespace where /proc/self/setgroups is "deny",
setgroups(2) is permanently rejected by the kernel.  Perform the
regular privilege drop and only tolerate sys_setgroups(0, NULL)
failing with EPERM in such a namespace; in that case the kernel
guarantees no supplementary group could have been gained via the
namespace, so leaving the list in place is safe.  Record this
with a new PRIV_MAGIC_NOSETGROUPS state so that tcb_gain_priv_r()
skips the matching setgroups() call.

Fixes failures of pam_tcb, libnss_tcb, tcb_unconvert and shadow's
shadowtcb_drop_priv() when running under rootless container.
@Blarse Blarse force-pushed the libtcb-unpriv-userns-support branch from 9628b98 to 630ffd2 Compare May 28, 2026 08:19
@ldv-alt ldv-alt added the enhancement New feature or request label May 28, 2026
Copy link
Copy Markdown
Collaborator

@ldv-alt ldv-alt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@solardiz solardiz merged commit fd69597 into openwall:main May 29, 2026
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants