Skip to content

feat: Support GDB index#1990

Open
lapla-cogito wants to merge 24 commits into
wild-linker:mainfrom
lapla-cogito:gdb_idx
Open

feat: Support GDB index#1990
lapla-cogito wants to merge 24 commits into
wild-linker:mainfrom
lapla-cogito:gdb_idx

Conversation

@lapla-cogito
Copy link
Copy Markdown
Member

This patch adds support for --gdb-index and --no-gdb-index. Although we support version 9 here, the test environment may produce a version 7 GDB index when using lld. This patch also adds a test that checks whether specific symbols are included in the symbol table of the .gdb_index section. Because of this requirement, the integration test includes logic to support versions earlier than version 9 as well.

Closes #811

Comment thread libwild/src/gdb_index.rs
fallback_cu: u32,
mut on_entry: impl FnMut(&'data [u8], u32),
) {
for section_name in [".debug_gnu_pubnames", ".debug_gnu_pubtypes"] {
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The symbol table construction assumes the presence of these two sections by adding the compiler flag -ggnu-pubnames (see the added integration test). Without these sections, the symbol table will not be built and only the CU list and address table will be constructed. This effectively means you won't benefit from GDB index, but since lld and mold behave this way, I've aligned the implementation.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we emit a warning to the user telling the option is leading to a GDB index result?

@lapla-cogito
Copy link
Copy Markdown
Member Author

I did git commit --amend --allow-empty because the release workflow was triggered by the previous commit (which I canceled)...
https://github.com/wild-linker/wild/actions/runs/26509302406

I have no idea what happened... 🤷

Copy link
Copy Markdown
Collaborator

@marxin marxin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the numerous findings I created - can you explain what was the genesis of the PR and how much was a LLM involved? I know we had a PR for the very same functionality that was closed right after it was opened. Is it an incarnation of the changes, or was it created completely independently?

Comment thread libwild/src/gdb_index.rs Outdated
Comment thread libwild/src/gdb_index.rs
Comment thread libwild/src/gdb_index.rs
4 + init_len as usize
};
if total == 0 || offset + total > data.len() {
break;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we rather report a warning as the emitted GDB index will be likely incomeplete/corrupted?

Comment thread libwild/src/gdb_index.rs
Comment thread libwild/src/gdb_index.rs
Comment thread libwild/src/gdb_index.rs Outdated
Comment thread libwild/src/gdb_index.rs Outdated

// Emit into the output buffer.
let total = cp_off as usize + cv_data.len() + str_data.len();
let len = buf.len().min(total);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a hard error if the allocated buffer does not match the expected written data.

Comment thread libwild/src/gdb_index.rs Outdated
Comment thread libwild/src/gdb_index.rs Outdated
Comment thread libwild/src/gdb_index.rs Outdated
@lapla-cogito
Copy link
Copy Markdown
Member Author

@marxin (I haven't seen the review details yet)

we had a PR for the very same functionality that was closed right after it was opened

I think you mean #1605. I used that as a reference when implementing this, since I had also reviewed it at the time and therefore already had some familiarity with the surrounding context and implementation details. However, I did not reuse it directly for licensing concerns (and in any case, it does not support version 9 as-is). I mainly used LLMs to help refine the implementation through review and iterative revisions.

@lapla-cogito lapla-cogito marked this pull request as draft May 27, 2026 21:30
@marxin
Copy link
Copy Markdown
Collaborator

marxin commented May 28, 2026

I think you mean #1605. I used that as a reference when implementing this, since I had also reviewed it at the time and therefore already had some familiarity with the surrounding context and implementation details.

Yes - fine, that's a good approach you chose ;)

However, I did not reuse it directly for licensing concerns (and in any case, it does not support version 9 as-is). I mainly used LLMs to help refine the implementation through review and iterative revisions.

All good! Let's iterate on the PR, given you knowledge, I guess you can address the findings pretty fast. Happy to review and help you with the feature.

Comment thread libwild/src/gdb_index.rs
let (header_size, set_end, debug_info_offset) = if init_len == 0xFFFF_FFFF {
// DWARF64: 4 + 8(len) + 2(ver) + 8(offset) + 8(size) = 30
if pos + 30 > data.len() {
break;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe most of the breaks in the function should be actually replaced with an error handling instead.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As shown by this test failure, some toolchains don't generate strictly correct .debug_info. Therefore, both lld and mold are designed to be tolerant of such structures, so the original break approach should work well I think.

Copy link
Copy Markdown
Collaborator

@marxin marxin Jun 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I am not aligned with that: apparently it seems we're decoding a compress ZLIB stream, so our parsing logic is effectively hiding a real bug:

failures:
        elf/x86_64/gdb-index/enabled

    test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 1 filtered out; finished in 0.00s

  stderr ───
    [libwild/src/gdb_index.rs:127:9] init_len = 1
    [libwild/src/gdb_index.rs:137:9] (total, offset) = (
        5,
        0,
    )
    [libwild/src/gdb_index.rs:127:9] init_len = 3053453312
    [libwild/src/gdb_index.rs:137:9] (total, offset) = (
        3053453316,
        5,
    )
...
❯ eu-readelf -zS ./wild/tests/build/elf/x86_64/gdb-index/enabled/gdb-index.c.o
...
[ 5] .debug_info          PROGBITS     0000000000000000 000000a0 0000007f  0 C      0   0  8
     [ELF ZLIB (1) 000000b6  1]
...
./wild/tests/build/elf/x86_64/gdb-index/enabled/gdb-index.c.o:  file format elf64-x86-64

.debug_info contents:
0x00000000: Compile Unit: length = 0x000000b2, format = DWARF32, version = 0x0005, unit_type = DW_UT_compile, abbr_offset = 0x0000, addr_size = 0x08 (next unit at 0x000000b6)

so the expected CU size is 0x000000b2. So please, let's start with a strict mode and we can then make an intensive testing if that's really the case. If so, then we have basically 2 options: not emitting the .gdbindex, or bail-out and provide an error message.

@lapla-cogito lapla-cogito marked this pull request as ready for review June 1, 2026 05:07
@lapla-cogito lapla-cogito requested a review from marxin June 1, 2026 05:07
Copy link
Copy Markdown
Member

@davidlattimore davidlattimore left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran a quick benchmark to see what effect adding the flag --gdb-index has on the performance of the different linkers. With lld and mold, the performance difference between --gdb-index and --no-gdb-index was negligible. With wild however, it slows down linking by about 2x. Benchmark was with 32 threads linking zed.

I expect that the main problem is that there's lots of work being done single-threaded. e.g. iterating through all the objects. Also, looking up relevant sections by name is likely to be slow - as opposed to our usual approach where SectionRules determines what we should do with each input section.

Comment thread libwild/src/gdb_index.rs
buf[so..so + HASH_SLOT_SIZE].copy_from_slice(new_slot.as_bytes());
break;
}
slot = (slot + step) & mask;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it'd be worthwhile asserting that we haven't gone back to the starting slot? In case we somehow allocated too few slots - it'd be better to crash that infinitely loop

Comment thread libwild/src/gdb_index.rs
}

/// Pre-scan all input objects to compute the `.gdb_index` section size.
pub(crate) fn compute_gdb_index_size(groups: &[GroupState<'_, Elf>]) -> Result<u64> {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could do with a timing_phase here

Comment thread libwild/src/gdb_index.rs
fn scan_objects_for_gdb_index<'data>(
objects: impl Iterator<Item = (&'data crate::elf::File<'data>, &'data [SectionSlot])>,
) -> Result<GdbIndexScanResult<'data>> {
let mut total_cus = 0usize;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could do with a timing_phase here

Comment thread libwild/src/gdb_index.rs
sorted_symbols: sorted_names,
ht_slots,
..
} = scan_objects_for_gdb_index(objects)?;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I notice that this function is called twice. Is it expensive? If so, would it make sense to store the result from the first call?

@marxin
Copy link
Copy Markdown
Collaborator

marxin commented Jun 1, 2026

After spending a little bit of time researching the area, I noticed there's a systematic DWARF extension Lookup by Name aka .debug_names (DWARF 5 6.1.1): https://dwarfstd.org/doc/DWARF5.pdf. It's pretty similar to .gdbindex, but supported by both lldb and gdb. Clang can emit the format with -g -gpubnames (can be dumped with llvm-dwarfdump --debug-names) and lld has an option that builds one global cache (--debug-names). Moreover, what's interesting, gdb itself can generate (and consume) the index format: gdb-add-index --dwarf-5. If I'm not mistaken, it's only GCC that lacks the support.

With that said, shouldn't the long-term, future-focused approach be to build out the Debug Names format rather than introducing an ad hoc, GDB-specific index file format?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add support for --gdb-index

3 participants