[Backport] Fix boot-time ordering in ACPI mode #312
Open
uestc-gr wants to merge 1 commit into
mainline inclusion
from mainline-6.19-rc8
commit 7cf28b3
category: feature
bugzilla: RVCK-Project#311

--------------------------------

The device object rescan in acpi_scan_clear_dep_fn() is scheduled on a system workqueue which is not guaranteed to be finished before entering userspace. This may cause some key devices to be missing when userspace init task tries to find them. Two issues observed on RISCV platforms:

- Kernel panic due to userspace init cannot have an opened console. The console device scanning is queued by acpi_scan_clear_dep_queue() and not finished by the time userspace init process running, thus by the time userspace init runs, no console is present.

- Entering rescue shell due to the lack of root devices (PCIe nvme in our case). Same reason as above, the PCIe host bridge scanning is queued on a system workqueue and finished after init process runs.

The reason is because both devices (console, PCIe host bridge) depend on riscv-aplic irqchip to serve their interrupts (console's wired interrupt and PCI's INTx interrupts). In order to keep the dependency, these devices are scanned and created after initializing riscv-aplic. The riscv-aplic is initialized in device_initcall() and a device scan work is queued via acpi_scan_clear_dep_queue(), which is close to the time userspace init process is run. Since system_dfl_wq is used in acpi_scan_clear_dep_queue() with no synchronization, the issues will happen if userspace init runs before these devices are ready.

The solution is to wait for the queued work to complete before entering userspace init. One possible way would be to use a dedicated workqueue instead of system_dfl_wq, and explicitly flush it somewhere in the initcall stage before entering userspace. Another way is to use async_schedule_dev_nocall() for scanning these devices. It's designed for asynchronous initialization and will work in the same way as before because it's using a dedicated unbound workqueue as well, but the kernel init code calls async_synchronize_full() right before entering userspace init which will wait for the work to complete.

Compared to a dedicated workqueue, the second approach is simpler because the async schedule framework takes care of all of the details. The ACPI code only needs to focus on its job. A dedicated workqueue for this could also be redundant because some platforms don't need acpi_scan_clear_dep_queue() for their device scanning.

Signed-off-by: Yicong Yang <yang.yicong@picoheart.com>
[ rjw: Subject adjustment, changelog edits ]
Link: https://patch.msgid.link/20260128132848.93638-1-yang.yicong@picoheart.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Gao Rui <gao.rui@zte.com.cn>
Test started. Log: https://github.com/RVCK-Project/rvck/actions/runs/28022150625
Parameter parsing results
Testing complete. Detailed results:
Kunit Test Result: [11:18:51] Testing complete. Ran 482 tests: passed: 466, skipped: 16
Kernel Build Result
Check Patch Result
LAVA Check (qemu)
result: Lava check done!
Contributor
Author
This PR is ready. Please review and merge it. Thank you.
unicornx
approved these changes
Jun 26, 2026
unicornx
left a comment
I have no ACPI-capable hardware at hand, so I only built the kernel and tested booting on a Pioneerbox. The result was normal.
Reviewed-by: Wang Chen <wangchen20@iscas.ac.cn>
issues: #311
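The device object rescan in acpi_scan_clear_dep_fn() is scheduled on a system workqueue, which is not guaranteed to finish before the kernel enters userspace. As a result, some key devices may not be ready yet when the userspace init task looks for them.
Solution: use async_schedule_dev_nocall() to scan these devices. It is designed for asynchronous initialization and works the same way as before, because it also uses a dedicated unbound workqueue. The difference is that the kernel init code calls async_synchronize_full() right before entering userspace init, which waits for the queued work to complete.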
Test method: reboot a RISC-V system in ACPI mode repeatedly and check that every boot completes normally.