
[Backport] Fix boot-time ordering in ACPI mode #312

Open
uestc-gr wants to merge 1 commit into RVCK-Project:rvck-6.6 from uestc-gr:aplic-bugfix


@uestc-gr (Contributor)

issues: #311

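The device object rescan in acpi_scan_clear_dep_fn() is scheduled on a system workqueue, which is not guaranteed to finish before the kernel enters userspace. As a result, some key devices may not be ready yet when the userspace init task looks for them.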
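The following is a minimal, single-threaded C model of the race. It is not kernel code. It uses a toy FIFO in place of the system workqueue and replays the unlucky interleaving, in which userspace init looks for the console before the worker has run the queued scan. All names (`model_dev`, `model_queue_work()`, `model_boot_racy()` and so on) are hypothetical. The `dep_unmet` counter only loosely imitates the field of the same name in `struct acpi_device`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a device whose ACPI scan is deferred. */
struct model_dev {
	const char *name;
	unsigned int dep_unmet; /* unmet dependencies, e.g. the irqchip */
	bool present;           /* set once the deferred scan has run */
};

typedef void (*model_work_fn)(struct model_dev *dev);

/* A FIFO standing in for a system workqueue: queueing returns at once and
 * the item runs whenever a worker gets to it, with no ordering guarantee
 * relative to the start of userspace init. */
#define MODEL_WQ_MAX 8
static struct {
	model_work_fn fn;
	struct model_dev *dev;
} model_wq[MODEL_WQ_MAX];
static size_t model_wq_len;

static void model_queue_work(model_work_fn fn, struct model_dev *dev)
{
	assert(model_wq_len < MODEL_WQ_MAX);
	model_wq[model_wq_len].fn = fn;
	model_wq[model_wq_len].dev = dev;
	model_wq_len++;
}

/* The "worker": runs every pending item in FIFO order. */
static void model_worker_run(void)
{
	for (size_t i = 0; i < model_wq_len; i++)
		model_wq[i].fn(model_wq[i].dev);
	model_wq_len = 0;
}

static void model_scan(struct model_dev *dev)
{
	dev->present = true;
}

/* Replays the losing interleaving: the irqchip comes up, the console scan
 * is queued, and init looks for the console before the worker has run.
 * Returns whether init found the console. */
static bool model_boot_racy(void)
{
	struct model_dev console = { "console", 1, false };

	console.dep_unmet--;                  /* irqchip initialized */
	if (console.dep_unmet == 0)
		model_queue_work(model_scan, &console);

	bool found = console.present;         /* userspace init starts here */
	model_worker_run();                   /* the scan completes too late */
	return found;
}
```

On real hardware the worker runs concurrently, so whether init wins the race can vary from boot to boot. The model pins down the losing order so that it can be checked deterministically.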

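Solution: use async_schedule_dev_nocall() to scan these devices. The function is designed for asynchronous initialization, and because it also uses a dedicated unbound workqueue, it behaves the same way as before. The difference is that the kernel init code calls async_synchronize_full() before entering userspace init, and that call waits for this work to complete.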
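The fix can be sketched in the same way. This is again a hypothetical, single-threaded model and not kernel code. Two helpers stand in for the real kernel functions:

- `sketch_schedule_nocall()` stands in for `async_schedule_dev_nocall()`: it queues the work, or returns false without running it.
- `sketch_synchronize_full()` stands in for `async_synchronize_full()`.

The `dep_unmet` check roughly mirrors the early return in `acpi_scan_clear_dep_queue()`. None of the `sketch_*` helpers are kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_dev {
	const char *name;
	unsigned int dep_unmet; /* unmet dependencies, e.g. the irqchip */
	bool present;           /* set once the deferred scan has run */
};

typedef void (*sketch_async_fn)(struct sketch_dev *dev);

#define SKETCH_ASYNC_MAX 8
static struct {
	sketch_async_fn fn;
	struct sketch_dev *dev;
} sketch_pending[SKETCH_ASYNC_MAX];
static size_t sketch_npending;

/* Stand-in for async_schedule_dev_nocall(): queue fn and return true, or
 * return false without calling fn when nothing can be queued. */
static bool sketch_schedule_nocall(sketch_async_fn fn, struct sketch_dev *dev)
{
	if (sketch_npending == SKETCH_ASYNC_MAX)
		return false;
	sketch_pending[sketch_npending].fn = fn;
	sketch_pending[sketch_npending].dev = dev;
	sketch_npending++;
	return true;
}

/* Stand-in for async_synchronize_full(): return only after every queued
 * item has run (in this model they are simply run in order). */
static void sketch_synchronize_full(void)
{
	for (size_t i = 0; i < sketch_npending; i++)
		sketch_pending[i].fn(sketch_pending[i].dev);
	sketch_npending = 0;
}

static void sketch_scan(struct sketch_dev *dev)
{
	dev->present = true;
}

/* Nothing is scheduled while the device still has unmet dependencies. */
static bool sketch_clear_dep_queue(struct sketch_dev *dev)
{
	if (dev->dep_unmet)
		return false;
	return sketch_schedule_nocall(sketch_scan, dev);
}

/* Boot with the fix: scans go into the async domain, and that domain is
 * drained before userspace init starts. Returns what init would see. */
static bool sketch_boot_with_sync(void)
{
	struct sketch_dev console = { "console", 1, false };
	struct sketch_dev pcie = { "pcie-host-bridge", 1, false };

	console.dep_unmet--;               /* riscv-aplic initialized */
	pcie.dep_unmet--;
	sketch_clear_dep_queue(&console);
	sketch_clear_dep_queue(&pcie);

	sketch_synchronize_full();         /* happens before init is started */
	return console.present && pcie.present;
}
```

In the kernel the scans still run on an unbound worker. What the model captures is only the ordering guarantee: init cannot start until every queued scan has finished.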

Testing: on a RISC-V system booted in ACPI mode, reboot repeatedly and check that every boot completes normally.

mainline inclusion
from mainline-6.19-rc8
commit 7cf28b3
category: feature
bugzilla: RVCK-Project#311

--------------------------------

The device object rescan in acpi_scan_clear_dep_fn() is scheduled on a
system workqueue which is not guaranteed to be finished before entering
userspace. This may cause some key devices to be missing when userspace
init task tries to find them. Two issues observed on RISCV platforms:

 - Kernel panic because userspace init cannot open a console.

   The console device scan is queued by acpi_scan_clear_dep_queue()
   and has not finished by the time the userspace init process runs,
   so no console is present when init starts.

 - Dropping into a rescue shell because the root device (a PCIe NVMe
   drive in our case) is missing.

   Same cause as above: the PCIe host bridge scan is queued on a system
   workqueue and finishes only after the init process has started.

This happens because both devices (console and PCIe host bridge) depend
on the riscv-aplic irqchip to serve their interrupts (the console's wired
interrupt and PCI INTx interrupts). To honor this dependency, these
devices are scanned and created only after riscv-aplic has been
initialized. riscv-aplic is initialized in device_initcall(), and a device
scan work item is then queued via acpi_scan_clear_dep_queue(), close to
the time the userspace init process is started. Since
acpi_scan_clear_dep_queue() queues the work on system_dfl_wq without any
synchronization, the issues occur whenever userspace init runs before
these devices are ready.

The solution is to wait for the queued work to complete before entering
userspace init. One possible way would be to use a dedicated workqueue
instead of system_dfl_wq, and explicitly flush it somewhere in the
initcall stage before entering userspace. Another way is to use
async_schedule_dev_nocall() for scanning these devices. It's designed
for asynchronous initialization and will work in the same way as before
because it's using a dedicated unbound workqueue as well, but the kernel
init code calls async_synchronize_full() right before entering userspace
init which will wait for the work to complete.

Compared to a dedicated workqueue, the second approach is simpler
because the async schedule framework takes care of all of the details.
The ACPI code only needs to focus on its job. A dedicated workqueue for
this could also be redundant because some platforms don't need
acpi_scan_clear_dep_queue() for their device scanning.

Signed-off-by: Yicong Yang <yang.yicong@picoheart.com>
[ rjw: Subject adjustment, changelog edits ]
Link: https://patch.msgid.link/20260128132848.93638-1-yang.yicong@picoheart.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Gao Rui <gao.rui@zte.com.cn>
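For comparison, the alternative that the commit message rejects (a dedicated workqueue plus an explicit flush) can be modeled as follows. This is a hypothetical sketch, not kernel code. `dep_wq_queue()`, `dep_wq_flush()` and `boot_dedicated_wq()` are made-up names. The `flush_before_init` flag stands for whether a flush was wired into an initcall that runs late enough.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical dedicated workqueue for dependency-cleared device scans.
 * Each queued item just marks one device as present when it runs. */
#define DEP_WQ_MAX 8
static bool *dep_wq[DEP_WQ_MAX];
static size_t dep_wq_len;

static void dep_wq_queue(bool *present)
{
	assert(dep_wq_len < DEP_WQ_MAX);
	dep_wq[dep_wq_len++] = present;
}

/* Explicit flush: with a dedicated queue, the ACPI code itself would have
 * to call this from an initcall known to run after every queueing site. */
static void dep_wq_flush(void)
{
	for (size_t i = 0; i < dep_wq_len; i++)
		*dep_wq[i] = true;
	dep_wq_len = 0;
}

/* Returns whether userspace init finds the console. */
static bool boot_dedicated_wq(bool flush_before_init)
{
	bool console_present = false;

	dep_wq_queue(&console_present);   /* queued after the irqchip is up */
	if (flush_before_init)
		dep_wq_flush();

	bool found = console_present;     /* userspace init starts here */
	dep_wq_flush();                   /* leftover work drains too late */
	return found;
}
```

With the async approach this decision disappears from the ACPI code. Kernel init calls async_synchronize_full() before entering userspace, regardless of which subsystem scheduled the work.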
@github-actions (bot) commented Jun 23, 2026

Test run started, log: https://github.com/RVCK-Project/rvck/actions/runs/28022150625

Parsed arguments
Argument             Value
repository           RVCK-Project/rvck
head ref             pull/312/head
base ref             rvck-6.6
LAVA repo            RVCK-Project/lavaci
LAVA hardware        ['qemu']
LAVA Testcase path   lava-testcases/common-test/ltp/ltp.yaml
need run job         kunit-test,kernel-build,check-patch,lava-trigger

Testing complete

Detailed results:
Check                 Result
kunit-test            success
kernel-build          success
check-patch           success
lava-trigger-qemu     success
lava-trigger-sg2042   skipped
lava-trigger-k1       skipped
lava-trigger-lpi4a    skipped

KUnit Test Result

[11:18:51] Testing complete. Ran 482 tests: passed: 466, skipped: 16

Kernel Build Result

Check Patch Result

Total Errors 0
Total Warnings 1

LAVA Check (qemu)

Argument                 Value
testcase_repo            RVCK-Project/lavaci
lava_template            lava-job-template/qemu/qemu-ltp.yaml
testcase_path            lava-testcases/common-test/ltp/ltp.yaml
kernel_download_url      http://10.30.190.110/openEuler-RISC-V/RVCK/OERV-RVCI/RVCK-Project/rvck/312_28022150625_1/Image
initramfs_download_url   http://10.30.190.110/openEuler-RISC-V/RVCK/OERV-RVCI/RVCK-Project/rvck/312_28022150625_1/initramfs.img
rootfs_download_url      https://fast-mirror.isrc.ac.cn/openeuler-sig-riscv/openEuler-RISC-V/RVCK/openEuler24.03-LTS-SP1/openeuler-rootfs.img.zst
testcase_ref             main
testitem_name            RVCK-Project_rvck_pull_request_target_312__common-test_qemu

result: Lava check done!

@uestc-gr (Contributor, Author)

This PR is complete. Please review and merge it. Thank you.

@unicornx unicornx self-assigned this Jun 25, 2026
@sterling-teng sterling-teng requested a review from unicornx June 25, 2026 07:42
@unicornx unicornx removed their assignment Jun 25, 2026

@unicornx left a comment


I don't have any ACPI-capable hardware at hand, so I only built the kernel and test-booted it on a Pioneerbox. It booted normally.

Reviewed-by: Wang Chen <wangchen20@iscas.ac.cn>


3 participants