Fix rereading partition table with partprobe #3000

Open
schaefi wants to merge 1 commit into main from fix_partprobe_locking

Conversation

@schaefi
Collaborator

@schaefi schaefi commented May 18, 2026

partprobe does locking itself already. Wrapping it in udevadm lock results in a deadlock, breaking boot.
This fixes bsc#1263973.

@schaefi schaefi requested a review from Vogtinator May 18, 2026 08:43
@schaefi schaefi self-assigned this May 18, 2026
@Bischoff

Tested the fix on a self-built FBA image; it works fine.

The ECKD image still works too.

@Vogtinator
Collaborator

Vogtinator commented May 22, 2026

partprobe does locking itself already.

I thought I had ruled that out. Here on TW x86_64 with NVMe, at least, it does not appear to do that:

fvogt@fvogt-thinkpad:~> sudo strace -f udevadm lock --device /dev/nvme0n1 /sbin/partprobe /dev/nvme0n1 |& grep flock
flock(3, LOCK_EX|LOCK_NB)               = 0
fvogt@fvogt-thinkpad:~> sudo strace -f /sbin/partprobe /dev/nvme0n1 |& grep flock

That means on x86_64 at least this change drops the locking entirely.

So the question is: Why does partprobe lock dasd on s390x? Should it do that? If so, what's the best workaround?

@Vogtinator
Collaborator

So the question is: Why does partprobe lock dasd on s390x? Should it do that? If so, what's the best workaround?

The answer is:

https://build.opensuse.org/projects/Base:System/packages/parted/files/libparted-make-BLKRRPART-more-robust.patch?expand=1

So a downstream patch adds DASD support to partprobe, and that code calls flock(). Either the patch drops the flock() part (which may break things, since there is a sleep(1) there), or kiwi avoids udevadm lock only for DASD devices.
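A minimal sketch of that second option, written in shell because the affected code is dracut shell. Assumptions: a disk counts as DASD purely because its node name starts with dasd (kiwi's real DASD detection may differ), and reread_cmd is a hypothetical helper that prints the command instead of running it, so the decision can be inspected. On distributions without the openSUSE parted patch, this leaves DASD rereads completely unlocked.

```shell
#!/bin/bash
# Hypothetical helper: decide how to reread a disk's partition table.
# Assumption: DASD disks are recognized by a /dev/dasd* node name only.
# It prints the command line instead of executing it.
reread_cmd() {
    local disk="$1"
    case "$(basename "$disk")" in
        dasd*)
            # Patched libparted flock()s DASD devices itself, so wrapping
            # partprobe in udevadm lock here would deadlock.
            echo "partprobe $disk"
            ;;
        *)
            # All other disks: hold the udev block device lock around partprobe.
            echo "udevadm lock --device=$disk partprobe $disk"
            ;;
    esac
}

reread_cmd /dev/dasda
reread_cmd /dev/nvme0n1
```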

@Vogtinator
Collaborator

I wonder if that also needs to be done with 04c92f0...

@schaefi
Collaborator Author

schaefi commented May 24, 2026

or kiwi avoids udevadm lock only for DASD devices.

I believe that's what we need to do. Thanks a lot for the investigation!

@Bischoff

or kiwi avoids udevadm lock only for DASD devices.

I believe that's what we need to do. Thanks a lot for the investigation!

Given that those downstream patches are six years old and seem pretty stable, this looks like the most reasonable option, yes.

@Conan-Kudo
Member

Please don't merge this, as it breaks everything on Fedora.

@Conan-Kudo
Member

I also don't see any indication that the SUSE patch was ever discussed upstream in https://github.com/bcl/parted

@schaefi schaefi force-pushed the fix_partprobe_locking branch from 2028402 to 5f74267 May 30, 2026 18:40
@schaefi
Collaborator Author

schaefi commented May 30, 2026

Please don't merge this, as it breaks everything on Fedora.

Thinking about this more closely, I believe we should drop parted as a kiwi requirement. There seem to be distribution-specific patches that are not aligned with upstream, and this thread shows that it may not be easy to address this properly and generically from a builder's perspective.

I have therefore added a commit that removes the parted dependency. It changes the following places:

  • In the kiwi Python code, the activation flag is now set via sfdisk --activate instead of parted. This was the only place in the Python code that still depended on parted.
  • In the dracut code, the partition table is now reread with partx -u, which is expected to work everywhere; if it does not, kiwi is not the place to fix that. This drops the use of partprobe, which was added to the module setup for s390 only, for reasons I no longer remember.
  • In the dracut code, the partition table type on s390 is now detected from the output of sfdisk --dump instead of parted.
  • In the dracut code, the obscure, also s390-specific hack in create_dasd_partitions(), which forces reading the new disk geometry because fdasd cannot do this, now uses a dump/reload via sfdisk instead of the parted call. I believe that parted call also does not work if the disk has more than one partition.
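The parted-free replacements listed above could look roughly like this in shell. This is a sketch, not the commit's actual code: the helper names are hypothetical, it assumes util-linux sfdisk and partx, and, as noted further down in this thread, these tools turned out not to work on DASD devices. Only the parsing helper is called here, because the others modify a real disk.

```shell
#!/bin/bash
# Hypothetical helpers sketching the parted-free approach (util-linux only).

# Set the active/boot flag on partition number $2 of disk $1.
set_active_flag() {
    local disk="$1" partnum="$2"
    sfdisk --activate "$disk" "$partnum"
}

# Ask the kernel to update its view of the partition table (instead of partprobe).
reread_partitions() {
    partx -u "$1"
}

# Extract the partition table type from "sfdisk --dump" output on stdin.
# The dump starts with a header line such as "label: dos" or "label: gpt".
label_from_dump() {
    sed -n 's/^label: *//p'
}

# Dump the table and write it straight back to the disk (dump/reload).
dump_and_reload() {
    local disk="$1" dump
    dump=$(sfdisk --dump "$disk") || return 1
    printf '%s\n' "$dump" | sfdisk "$disk"
}

# Example: parse a captured dump instead of touching a real disk.
label_from_dump <<'EOF'
label: dos
label-id: 0x12345678
device: /dev/sda
unit: sectors

/dev/sda1 : start=        2048, size=     1048576, type=83, bootable
EOF
```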

As all of this goes much further than I initially expected, in-depth testing on s390 is needed. I do not expect any negative side effects on the other architectures.

I will ask @Bischoff, when he is back from vacation, to help me with testing, because he has a setup for all s390 disk types.

@schaefi
Collaborator Author

schaefi commented May 30, 2026

After testing, I found that sfdisk and partx do not work on DASD devices. I think I came to the same conclusion several years ago when I tried this before. So, for me, parted, fdasd and partprobe are the only tools that work somewhat with DASD disks. I believe there are several patches in parted that are not available in other parted builds. Locking that is handled differently by the same tool on different distributions makes this really hard.

I will keep the commits here for now, but I will most likely revert them, and then we need to find a solution that does not break other distributions...

@Conan-Kudo
Member

As far as I can tell, the openSUSE parted patches have never been submitted upstream. Could someone please do that?

@Bischoff

I will ask @Bischoff, when he is back from vacation, to help me with testing, because he has a setup for all s390 disk types.

I think I can even install a 3270 terminal emulator on my vacation laptop and test whatever you need once the code stabilizes. This is because it is seen as urgent on the Multi-Linux Manager side. Just ask.

@schaefi schaefi force-pushed the fix_partprobe_locking branch from 727180d to 7f1242b June 1, 2026 09:17
@Vogtinator
Collaborator

After testing, I found that sfdisk and partx do not work on DASD devices. I think I came to the same conclusion several years ago when I tried this before. So, for me, parted, fdasd and partprobe are the only tools that work somewhat with DASD disks.

Yeah, that's what I heard as well. DASD is a bit too unusual apparently.

I think I can even install a 3270 terminal emulator on my vacation laptop and test whatever you need once the code stabilizes. This is because it is seen as urgent on the Multi-Linux Manager side. Just ask.

I can easily test DASD-ECKD and FCP as well, and can probably get FBA tested too; no need to take a break from your vacation :-)

@schaefi
Collaborator Author

schaefi commented Jun 1, 2026

I wonder if that also needs to be done with 04c92f0...

The code is part of create_dasd_partitions, which is only called after we have detected a DASD device, so no further check is needed there.

With the latest commit I have changed the code so that parted and partprobe are only used in the context of a DASD device.

We still have the problem that we skip set_device_lock in this context, on the assumption that libparted calls flock() itself, which is only true for SUSE.

At this point I'm lost.

I still think preventing a deadlock outweighs a possible race condition. Since we apply a device lock when calling fdasd and call partprobe directly after fdasd releases that lock, the race window should be very small, but yes, the race still exists.

Thoughts, please?

@Vogtinator
Collaborator

I will keep the commits here for now, but I will most likely revert them, and then we need to find a solution that does not break other distributions...

Given that there is apparently no support for DASD in sfdisk and parted in other distributions, I don't think it ever worked anywhere else...

Comment thread dracut/modules.d/55kiwi-live/kiwi-live-lib.sh
@schaefi schaefi force-pushed the fix_partprobe_locking branch from 7f1242b to 4582c6a June 1, 2026 09:26
@Bischoff

Bischoff commented Jun 1, 2026

I think I can even install a 3270 terminal emulator on my vacation laptop and test whatever you need once the code stabilizes. This is because it is seen as urgent on the Multi-Linux Manager side. Just ask.

I can easily test DASD-ECKD and FCP as well, and can probably get FBA tested too; no need to take a break from your vacation :-)

I tried yesterday, and my connection to the mainframe works with c3270. I have a ready-made test bed for both ECKD and FBA, so it's no big deal. In fact, the most annoying part is waiting for the image builds in IBS 😛. And don't underestimate my curiosity 😄.

Now, taking a step back: I was in favour of a quick fix at the Kiwi level. But yeah, it might create problems in Fedora and other distributions, as they don't have the SUSE patches. If we are to do it really well, I would vote for:

  • submitting the SUSE patches upstream so that the other distros get them, but without the locking code (i.e. removing that code)
  • keeping the locking in Kiwi, for all arches

The reasoning is that there is no good reason why parted and partprobe should lock on s390 and not on the other arches. Either lock on all arches or on none. Locking on none is simpler, so let's move the responsibility for locking to the calling code, Kiwi or other. Also, I agree that a race condition is a less severe problem than a deadlock like the one we currently see.
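Under this proposal, with the partitioning tools doing no locking of their own and the caller always holding the lock on every arch, the kiwi side could shrink to something like the sketch below. run_locked and its DRY_RUN switch are hypothetical; udevadm lock --device is the systemd command that partprobe is wrapped in today. This only avoids the deadlock once parted no longer calls flock() on the same device itself.

```shell
#!/bin/bash
# Hypothetical wrapper: the caller always holds the udev lock for the disk,
# regardless of architecture or disk type. With DRY_RUN=1 it only prints
# the command line instead of running it.
run_locked() {
    local disk="$1"
    shift
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "udevadm lock --device=$disk $*"
    else
        udevadm lock --device="$disk" "$@"
    fi
}

# Same call for DASD and non-DASD disks alike.
DRY_RUN=1 run_locked /dev/dasda partprobe /dev/dasda
DRY_RUN=1 run_locked /dev/sda partprobe /dev/sda
```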

@Conan-Kudo
Member

@pleia2, do you have any contacts with folks who work on DASD support in Linux partitioning tools? This situation is kind of a mess...

On s390 with a DASD disk, the partition table can only be
reread properly with partprobe. This commit changes the
dracut code to require parted and partprobe only for DASD
disks on s390.

In addition, this commit skips the active device locking
for parted and partprobe when called in the context of a
DASD device. The reason for this change is a patch in
libparted which applies flock() itself in this case. For
details see:

    https://build.opensuse.org/projects/Base:System/packages/parted/files/libparted-make-BLKRRPART-more-robust.patch?expand=1

This patch, however, does not exist in the upstream version
of parted, which causes problems that are hard to solve in
kiwi. Still, a deadlock should be avoided, and I think it's
better to have this change in kiwi and convince people to
upstream the above parted fix than to live with a deadlock
caused by double locking in kiwi. This fixes bsc#1263973
@schaefi schaefi force-pushed the fix_partprobe_locking branch from 4582c6a to 25bfd16 June 1, 2026 13:27
@schaefi
Collaborator Author

schaefi commented Jun 1, 2026

I have created a Staging build at

https://build.opensuse.org/project/show/Virtualization:Appliances:Staging

to check whether the integration tests build, and I will do some tests. @Bischoff, your MLM server build should be re-triggered automatically by this; it would be great if you could test it on s390 too. Thanks!

4 participants