Limited Microsoft PDB format support#93
simark
left a comment
Just a handful of comments for now, I'm currently going through your README.md.
Yes, I think the usual way would be to set a flag in configure.tgt, in the right configuration. Maybe gdb_require_amd_dbgapi can serve as inspiration. That would just be a default though, if the user passes --disable-pdb-debug-format (or whatever it's called), that should take precedence.
…On 2026-05-05 12:13, vuzelac-amd wrote:
Enabled with `--enable-targets=all` or when a Windows target is enabled (testing for the presence of windows-tdep.o in TARGET_OBS). Maybe I should set a flag in configure.tgt?
simark
left a comment
Since this is completely new code, it would be nice to adopt C++ practices from the start. The rest of the GDB code sometimes looks like C compiled as C++, because of its legacy. But here we can be "clean" from the start.
Some examples:
- `constexpr` variables instead of macros
- avoiding the `struct` (or `enum` or `class`) keyword when possible
- `nullptr` instead of `NULL`
- I would also consider putting everything in here inside a `pdb` namespace.
There are probably more that we'll find as we go.
Macros are now constexpr, removed struct/enum/class keywords, using nullptr, and introduced the pdb namespace.
Some magic numbers are used that should either be explained where they occur or moved into named constants. Also, there are too many references to AMD. This code should be independent of any chip vendor; whatever is vendor-specific should be moved into hooks and a vendor-specific backend.
Also, I would suggest using more assertions. They are usually not compiled into a distribution build, but they are active during debugging, and they make life much easier when a bug hits.
    const bfd_arch_info *arch = gdbarch_bfd_arch_info (gdbarch);
    int dwarf_regnum;

    if (arch->arch == bfd_arch_i386 && arch->mach == bfd_mach_x86_64)
Can we use something more generic here? Maybe add a new gdb hook, or something of the sort.
I'd need to introduce a new gdbarch and implement backends. I've left it here during development so as not to touch gdbarch (yet). I would like to hear whether arch-dependent code can actually stay in the PDB reader (but moved into some target files, since this file should not know anything about the arch), as we are doing it for just two backends: x86_64 (same layout as i386) and ARM. If we skip ARM we just need to provide a single implementation and some kind of hook for a future implementation, which is what I have here. @palves @simark
    switch (enc)
      {
      case 1:
        *frame_reg = CV_AMD64_RSP;
Too AMD-specific; can we use a hook or something of the sort?
    {
      const char *fp_name;
      if (enc == 2)
        fp_name = "RBP";
I am not a fan of having binary files in the test suite, but there are probably no alternatives.
Fixed PDB files are needed because the compiler might change the PDB symbols the tests are looking for. We need a matching EXE, so the binary file needs to be fixed as well. This is for PDB tests (maint commands) that inspect PDB files. But these tests can be dropped, as a passing gdb testsuite surely is enough to tell us the symbols are read out properly.
    # PDB is primarily supported on Windows/MSVC targets.
    proc pdb_support {} {
        # Check if running on Windows
        if {[info exists ::env(WINDIR)] || [string match "mingw*" [exec uname -s]]} {
I think we may need to document how to build and test on Windows, probably in gdb.texinfo, by adding a new chapter on this.
Added compound types tests to testsuite/gdb.pdb
PDB (Program Database) is the debug information format produced by the Microsoft toolchain for PE/COFF executables. This commit adds a reader that allows GDB to load PDB files associated with Windows executables and provide source-level debugging for them.

`pdb_initialize_objfile()` is the entry point, called from `coff_symfile_read()` (in the COFF reader) before the DWARF initialization call. It calls `pdb_read_pdb_file()` to load and parse the PDB.

PDB files are searched at the following locations: the executable directory, each entry of 'debug-file-directory', the path indicated by the RSDS record of the Debug Directory section of the executable, paths from _NT_ALT_SYMBOL_PATH and _NT_SYMBOL_PATH, and the Windows registry SymbolSearchPath.

Various `maintenance info pdb-*` commands are provided for inspecting PDB file content.

What is loaded:
- The PDB Info stream: holds the PDB identity and the named-stream map used to locate streams by name.
- The /names stream: global string table holding source file paths and other strings used by the PDB.
- The DBI stream: per-module headers, section contributions, and the Symbol Record stream (provides global/public symbols).
- The TPI stream: type records are indexed up-front, then resolved on demand when a symbol referencing the type is built.
- PE section base addresses queried from BFD, used to relocate every PDB `(section, offset)` into a real PC value.
- Each module's substream: CodeView symbol records (functions, locals, parameters, scopes) and C13 line-info subsections. From these, a compunit_symtab is built with a linetable, block tree, and GDB symbols carrying resolved types and locations.
- Per-module line information: file checksums, line numbers and source file names are linked into linetables on the module's compunit.

Limitations:
- No calling convention or stack unwinding support.
- Locals only accessible in the current frame (no `up` / `down` / `frame N`).
- Virtual method dispatch does not work. The vtable pointer field (LF_VFUNCTAB) is not shown, and the vtable slot index from LF_METHODLIST is not extracted.
- Virtual base classes (LF_VBCLASS/LF_IVBCLASS) appear in `ptype` output but cannot be accessed at runtime. Computing the virtual-base offset requires Microsoft C++ ABI support.
- No template parameter decoding. Template instantiation names are shown as stored in the PDB (e.g. `vector<int,allocator<int>>`).
- No inline function support (inlined frames are not visible).
- CodeView register mapping covers AMD64 GPRs only (RAX-R15, RSP, RBP). Variables stored in other registers show as unavailable. x86-64 support only.
- No IPI stream parsing.
- No language type detection.
- No MSVC name demangling. Needed when accessing public symbols and minsyms.
- No PDB symbol server support.
- No lazy loading: all modules are expanded eagerly at load time.

PDB support is built when any Windows target is configured. It can also be forced with `--enable-gdb-pdb-support`.

See gdb/pdb/README.md for the architecture and file layout.

Change-Id: Ib0a53539e56bf5aad7a0e619e62240c4f716a3b9
Overview
PDB is a multi-stream file container where different streams provide different
debug information. Streams are composed of multiple blocks, which don't have to
be consecutive. The blocks are the actual physical parts of the file — the PDB
file itself consists of multiple fixed-size blocks (except for the header).
Following is the data from PDB we need to read on initialization.
MSF header (SuperBlock):
The MSF SuperBlock is the first block in the PDB file and contains basic
information such as the block size, number of blocks, and most importantly,
the location of the stream directory, which is used to locate all other
streams in the file. The SuperBlock is 64 bytes long.
Stream directory
The stream directory is located through the block map address stored in the
SuperBlock and specifies which block belongs to which stream. Each stream can
span multiple physical blocks that are not necessarily contiguous.
With the information from the stream directory, we are able to parse any stream.
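The block-gathering step described above can be sketched as follows. This is a minimal illustration that assumes the whole PDB file is already in memory; the name `msf_read_stream` and its parameters are hypothetical and are not taken from the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: copy one stream out of an in-memory MSF file.
// `blocks` lists the stream's block indices in stream order; they need
// not be contiguous.  Only the last block may be partially used.
std::vector<std::uint8_t>
msf_read_stream (const std::vector<std::uint8_t> &file,
                 std::uint32_t block_size,
                 const std::vector<std::uint32_t> &blocks,
                 std::uint32_t stream_size)
{
  assert (block_size != 0);
  // Number of blocks needed to hold stream_size bytes, rounded up.
  std::size_t needed
    = (static_cast<std::size_t> (stream_size) + block_size - 1) / block_size;
  assert (blocks.size () == needed);

  std::vector<std::uint8_t> out;
  out.reserve (stream_size);
  std::size_t remaining = stream_size;
  for (std::uint32_t block : blocks)
    {
      std::size_t offset = static_cast<std::size_t> (block) * block_size;
      std::size_t chunk = std::min<std::size_t> (remaining, block_size);
      assert (offset + chunk <= file.size ());
      out.insert (out.end (), file.begin () + offset,
                  file.begin () + offset + chunk);
      remaining -= chunk;
    }
  return out;
}
```

A real reader would read blocks from disk rather than from a vector, but the arithmetic is the same.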
PDB Info stream (stream 1)
Basic information stream. Its most significant part is the location of the
"/names" stream (the String Table), which contains the list of all the files
compiled into the PDB.
Names stream (String Table):
Contains info on all the files used by all modules compiled into the PDB.
The names are read out into the String Table. The table is loaded eagerly
because just about any module will need to reference it when trying to display
its files. Lazy loading assumes we access the data only when needed, i.e. only
when a particular module is referenced. In the PDB case, accessing just about
any module (break, info sources...) will quickly reference this table in order
to get the line information, so we just preload the table.
DBI stream (stream 3):
DBI stream contains the debug information (line numbers, symbols, etc.) for all
the modules (object files) linked into the program. Each module's debug info is
in a different stream and we read those streams on request. Eagerly we only load
the header which contains info on per module streams (debug info is per module).
DBI File Info substream
Substream is just a piece of data located at a given offset in a stream.
The File Info substream contains info on all the files used by all the modules
compiled into the PDB - the Names Buffer. Names Buffer actually duplicates the
String Table but it also adds the information on files that go into each module.
This is suitable for Quick Functions that check if a file is in a module;
obtaining this info from the String Table would require expanding parts
of the module stream to get the sections that reference the per-module files
(indices into String Buffer).
The duplication of the file names likely exists for compatibility.
TPI stream (stream 2)
The TPI (Type Program Information) stream contains all non-builtin type
records used by the program — pointers, modifiers, arrays, procedures, member
functions, structs, classes, unions, enums, bitfields, argument lists, etc.
A type index is a 32-bit integer that uniquely identifies a type. Indices
below 0x1000 are reserved for simple/builtin types (encoded within the index).
Indices 0x1000 and above correspond to records in the TPI stream, assigned
sequentially: the first record is 0x1000, the second 0x1001, etc. Symbol records
and other type records reference types by their type index.
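The index split described above can be illustrated with a small sketch. The namespace `pdb_sketch` and the helper names are hypothetical; the 0x1000 boundary is the one stated in the text.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

namespace pdb_sketch
{
// First type index that refers to a TPI record; lower values are
// simple/builtin types encoded in the index itself.
constexpr std::uint32_t first_record_index = 0x1000;

constexpr bool
is_simple_type (std::uint32_t type_index)
{
  return type_index < first_record_index;
}

// Position of a record-backed type in a zero-based record array.
inline std::size_t
record_slot (std::uint32_t type_index)
{
  assert (!is_simple_type (type_index));
  return type_index - first_record_index;
}
} // namespace pdb_sketch
```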
Each record in the stream has variable length and consists of a 2-byte
`RecordLen`, a 2-byte `RecordKind` (the "leaf type" identifier such as
`LF_POINTER`, `LF_MODIFIER`, `LF_ARRAY`, `LF_PROCEDURE`, `LF_ARGLIST`...), and
a payload whose layout depends on the leaf type. Fields within a record can
reference other types by their type index, forming a directed graph (e.g. an
`LF_POINTER` record contains the type index of the pointee type).
The TPI stream is parsed eagerly at load time: type records are indexed so
they can be resolved on demand when a symbol references a type index. Resolved
types are cached so each type index is converted to a GDB `struct type` at most
once.
IPI stream (stream 4). TODO
The IPI (Id Program Information) stream has the same physical layout as the TPI
stream but contains id records rather than type records. Id records reference
items like functions, strings, and build information by name rather than by
type structure. Currently the IPI stream is not parsed.
Module Streams
Module streams contain the debug information for individual modules (object
files). Various debug sections are specified using identifiers — e.g. symbols
or line information or file info. The line information is in C13 sections
(C11 sections are obsolete). C13 sections are split into subsections, most
importantly Checksums and Lines. The Checksums subsection references the
String Table to provide the source files that belong to the module, while the
Lines subsection maps addresses to source lines (analogous to `.debug_line` in
DWARF).
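To illustrate what "maps addresses to source lines" means in practice, here is a minimal lookup sketch over already-decoded line entries. The `line_entry` type and `line_for_offset` function are hypothetical; the real reader feeds the entries into GDB linetables instead of querying them like this.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <vector>

// Hypothetical decoded line entry: a code offset and its source line.
struct line_entry
{
  std::uint32_t offset;
  std::uint32_t line;
};

// Return the line of the last entry whose offset is <= `offset`, or 0
// if no entry covers it.  `entries` must be sorted by offset.
std::uint32_t
line_for_offset (const std::vector<line_entry> &entries, std::uint32_t offset)
{
  assert (std::is_sorted (entries.begin (), entries.end (),
                          [] (const line_entry &a, const line_entry &b)
                          { return a.offset < b.offset; }));
  auto it = std::upper_bound (entries.begin (), entries.end (), offset,
                              [] (std::uint32_t off, const line_entry &e)
                              { return off < e.offset; });
  if (it == entries.begin ())
    return 0;
  return std::prev (it)->line;
}
```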
Symbol Record Stream / Global Symbol Stream (GSI) / Public Symbol Stream (PSGSI)
The Symbol Record Stream (referenced by the DBI header) contains all global
symbol records — both private globals (S_GPROC32, S_GDATA32, S_PROCREF, etc.)
and public symbols (S_PUB32).
The PSGSI (Public Symbol Index) stream is PDB's equivalent of the ELF symbol
table (`.symtab`/`.dynsym`): it contains a hash table whose hash records point
into the Symbol Record Stream to locate the S_PUB32 records stored there.
After the hash table there is an address to name map that is used to build
the GDB minimal symbol table.
The GSI (Global Symbol Index) stream is a hash table for O(1) name to symbol
lookup similar to DWARF's .debug_names. It indexes cross-reference records
(S_PROCREF, S_LPROCREF, S_DATAREF) that point into module streams — each
reference carries module index and offset, telling the reader which module
contains the full symbol definition. We use this table to build the cooked index
and provide quick functions for symbol lookup on GDB's request.
Finding PDB files.
PDB files are searched at different locations. For the main executable the user
can specify the --pdb-path command-line override. Further, we search for the PDB
by the PDB name recorded in the so-called RSDS record of the Debug Directory
section in the actual executable. This PDB name is searched as is or as the
base name in the EXE directory. We also search for the PDB by simply replacing
the extension of the EXE name with .pdb.
Windows can specify the location of the PDB files in the Windows registry or in
environment variables.
TODO: For system DLLs, Windows normally uses a so-called debug symbol server
from which the PDB files can be downloaded.
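The file-based part of this search can be sketched as a candidate list. The ordering below (RSDS base name in the EXE directory, EXE path with a `.pdb` extension, full RSDS path) follows the description of pdb-path.c in this README; the function names are hypothetical, and the override, environment and registry lookups are left out.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Final path component, accepting both '/' and '\\' separators.
static std::string
base_name (const std::string &path)
{
  std::string::size_type sep = path.find_last_of ("/\\");
  return sep == std::string::npos ? path : path.substr (sep + 1);
}

// Directory part including the trailing separator, or "" if none.
static std::string
dir_name (const std::string &path)
{
  std::string::size_type sep = path.find_last_of ("/\\");
  return sep == std::string::npos ? std::string () : path.substr (0, sep + 1);
}

// Hypothetical helper: candidate PDB paths, in the order they are tried.
// `rsds_path` may be empty when the executable has no RSDS record.
std::vector<std::string>
pdb_candidates (const std::string &exe_path, const std::string &rsds_path)
{
  assert (!exe_path.empty ());
  std::vector<std::string> out;
  if (!rsds_path.empty ())
    out.push_back (dir_name (exe_path) + base_name (rsds_path));

  // Replace the EXE extension (if any) with ".pdb".
  std::string exe_base = base_name (exe_path);
  std::string::size_type dot = exe_base.rfind ('.');
  std::string stem = exe_path;
  if (dot != std::string::npos)
    stem.erase (exe_path.size () - (exe_base.size () - dot));
  out.push_back (stem + ".pdb");

  if (!rsds_path.empty ())
    out.push_back (rsds_path);
  return out;
}
```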
Path Conversion (MSYS2)
PDBs produced under MSYS2 can have Linux-style paths, which are converted into
Windows-style paths before storing them to symtab linetables, so that GDB
can load them. This either requires prepending the MSYS2 root
(e.g. /home/PATH -> C:/msys2/PATH) or converting drive information
(e.g. /c/PATH -> C:/PATH).
The MSYS2 root must be specified using the MSYS2_ROOT environment variable;
otherwise we look into common msys2/mingw64 directories.
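A minimal sketch of the two conversions, under the assumption that prepending the root keeps the whole POSIX path (the patch may strip or remap components differently). `msys2_to_windows_path` is a hypothetical name, and a single-letter top-level directory is always treated as a drive here.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper.  `msys2_root` is a Windows-style root such as
// "C:/msys64", without a trailing slash.
std::string
msys2_to_windows_path (const std::string &path, const std::string &msys2_root)
{
  assert (!msys2_root.empty () && msys2_root.back () != '/');

  // Not a POSIX-style absolute path: leave it untouched.
  if (path.empty () || path[0] != '/')
    return path;

  // "/c/PATH" (or just "/c") names a drive.
  if (path.size () >= 2
      && std::isalpha (static_cast<unsigned char> (path[1]))
      && (path.size () == 2 || path[2] == '/'))
    {
      std::string out;
      out += static_cast<char>
        (std::toupper (static_cast<unsigned char> (path[1])));
      out += ':';
      if (path.size () == 2)
        out += '/';
      else
        out += path.substr (2);
      return out;
    }

  // Anything else is assumed to live under the MSYS2 root.
  return msys2_root + path;
}
```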
Info Commands
All commands accept optional `path=<pdb-path>` and `modi=N` arguments to
select a specific PDB / module. If omitted, the default (main program) PDB and
all modules are used.
- `info pdb-loaded-files`: List paths of all currently loaded PDB files.
- `info pdb-modules`: List modules (object files) in the PDB with stream numbers and file counts.
- `info pdb-files`: List source files per module from the DBI File Info substream.
- `info pdb-files-c13`: List source files per module from C13 Checksums subsections, showing checksum type (MD5/SHA-1/SHA-256) and hash values.
- `info pdb-lines`: Dump C13 line info: section:offset ranges and line number to offset mappings.
- `info pdb-symbols`: Dump raw CodeView symbol records from module streams.
- `info pdb-sym-records`: Dump records from the global symbol record stream.
- `info pdb-gsi`: Dump GSI (Global Symbol Index) hash table: header, hash records, bitmap, bucket data.
- `info pdb-psi`: Dump PSGSI (Public Symbol Index) hash table with embedded GSI hash and address map.
- `info pdb-locations`: Dump resolved variable location batons (ranges, register/offset, gaps). Requires `modi=N`; optional `symbol=NAME` to filter.
TODO:
Source Files
pdb.h
Main header. Defines all public data structures, MSF/DBI/CodeView constants,
stream index numbers (PDB=1, TPI=2, DBI=3, IPI=4), and the public API.
Key Structs:
- `pdb_per_objfile`: Top-level context for one PDB file. Holds MSF geometry, stream directory, cached stream data, DBI data, module array, section addresses, string table, TPI context, GSI table, and the symbol record stream cache.
- `pdb_module_info`: Per-module metadata: stream number, symbol/C11/C13 byte sizes, section contribution, file lists (from File Info and from C13), expansion state and cached `compunit_symtab`.
- `pdb_tpi_context`: Parsed TPI stream: type record array, type cache.
- `pdb_tpi_type`: Single raw TPI record: leaf type, length, data pointer (into cached stream), data length.
- `pdb_gsi_hdr`: Parsed GSI hash header (signature, version, hash-record and bucket-data byte counts, data pointers, validity flag).
- `pdb_rsds_info`: RSDS record from the PE debug directory: GUID, age, PDB path.
- `pdb_loclist_baton`: Per-symbol location baton: linked list of location entries plus back-pointer to the PDB context.
- `pdb_loc_entry`: One DEFRANGE location range: start/end PC, register number, offset, flags, and inline gap array.
- `pdb_loc_gap`: Gap within a location entry (start/end addresses).
- `pdb_file_info`: Per-file checksum entry from C13: filename, checksum type and data.
- `pdb_line_block_info`: Callback data for `pdb_walk_c13_line_blocks`: filename, line section header, line array, line count.
- `CV_FileBlock`, `CV_FileChecksum`: On-disk C13 file block and checksum structures.
- `CV_LineSection`, `CV_Line`: On-disk C13 line section header and line entry.
Key Functions:
- `pdb_initialize_objfile()`: Entry point called from COFF reader; loads PDB, expands modules, registers quick functions.
- `pdb_find_pdb_file()`: PDB file search.
- `pdb_read_stream()`: Read and cache an MSF stream by index.
- `pdb_read_tpi_stream()`: Parse TPI stream header and type records.
- `pdb_tpi_resolve_type()`: Resolve a type index to a GDB `struct type`.
- `pdb_build_module()`: Expand a single module into a `compunit_symtab`.
- `pdb_parse_symbols()`: Parse CodeView symbol records from a module stream.
- `pdb_read_sym_record_stream()`: Cache the global symbol record stream.
- `pdb_load_global_syms()`: Create GDB symbols from SymRecordStream globals.
- `pdb_parse_sym_record_stream()`: Parse/dump the SymRecordStream.
- `pdb_init_gsi_table()`: Parse GSI stream into a hash table.
- `pdb_build_minsyms()`: Create minimal symbols from PSGSI.
- `pdb_read_module_stream()`: Load a module's stream data.
- `pdb_read_module_files()`: Resolve file names from the File Info substream.
- `pdb_read_module_files_c13()`: Resolve file names from C13 checksums.
- `pdb_walk_c13_line_blocks()`: Walk C13 line blocks.
- `pdb_map_section_offset_to_pc()`: Convert (section, offset) to relocated PC.
- `pdb_register_loaded_pdb()`: Register a PDB for info commands.
- `pdb_init_loclist()`: Register PDB location list implementation with GDB.
pdb.c
Core implementation. Handles MSF file I/O (reading blocks, assembling
streams), parsing the stream directory, /names stream (global string table),
DBI stream (module headers, section contributions), File Info substream,
and BFD section address mapping. Walks C13 line subsections and builds GDB
symtabs with linetables. Contains `pdb_initialize_objfile()` (the GDB entry
point), `pdb_expand_all_modules()`, and the `pdb_readnow_functions` quick
function table. Also handles MSYS2-style path conversion.
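The (section, offset) to PC conversion mentioned above amounts to a table lookup plus an addition. The sketch below assumes 1-based PE section numbers and a vector of already-relocated section base addresses; the function name and signature are hypothetical and differ from `pdb_map_section_offset_to_pc()`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper.  section_bases[i] holds the relocated start
// address of PE section i + 1 (section numbers are assumed 1-based).
// Returns false when the section number is 0 or out of range.
bool
section_offset_to_pc (const std::vector<std::uint64_t> &section_bases,
                      std::uint16_t section, std::uint32_t offset,
                      std::uint64_t *pc)
{
  assert (pc != nullptr);
  if (section == 0 || section > section_bases.size ())
    return false;
  *pc = section_bases[section - 1] + offset;
  return true;
}
```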
pdb-read-types.c
TPI stream parser. Reads the TPI stream header and builds an indexed array of
`pdb_tpi_type` records: each record stores the leaf type, length, and a
pointer directly into the cached stream data (no copy). The type record array
is allocated on the objfile obstack.
A type cache (`struct type **`, 0x10000 entries covering all possible 16-bit
type indices) is also allocated on the objfile obstack. It maps type indices to
resolved GDB `struct type` pointers so each index is resolved at most once.
Simple/builtin types (0x0000–0x0FFF) and compound types (0x1000+) share the
same cache array. The cache is 512 KB on a 64-bit system.
Resolution is on-demand: when a symbol references a type index,
`pdb_tpi_resolve_type()` checks the cache, then either decodes the Kind+Mode
encoding (simple types) or parses the leaf record (LF_MODIFIER, LF_PROCEDURE,
LF_MFUNCTION, LF_POINTER, LF_ARRAY, LF_BITFIELD). Compound type resolution
is recursive: e.g. an LF_POINTER record references an underlying type index
that is itself resolved via the cache.
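The cache-then-recurse pattern can be shown with a toy model. Everything here is hypothetical: records are reduced to "pointer to" and "array of", resolved types are plain strings rather than GDB `struct type` objects, and an `unordered_map` stands in for the fixed-size cache array. The sketch also does not guard against cyclic records.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

enum class toy_leaf { pointer, array };

struct toy_record
{
  toy_leaf leaf;
  std::uint32_t element_type;  // type index of the pointee / element
};

class toy_resolver
{
public:
  explicit toy_resolver (std::vector<toy_record> records)
    : m_records (std::move (records))
  {}

  // Resolve a type index, decoding each index at most once.
  const std::string &resolve (std::uint32_t ti)
  {
    auto it = m_cache.find (ti);
    if (it != m_cache.end ())
      return it->second;

    std::string name;
    if (ti < 0x1000)
      name = "simple#" + std::to_string (ti);
    else
      {
        assert (ti - 0x1000 < m_records.size ());
        const toy_record &rec = m_records[ti - 0x1000];
        // Recursive step: the referenced type goes through the cache too.
        name = resolve (rec.element_type)
               + (rec.leaf == toy_leaf::pointer ? "*" : "[]");
        ++m_decoded;
      }
    return m_cache.emplace (ti, std::move (name)).first->second;
  }

  // Number of leaf records decoded so far (cache misses only).
  int decoded_count () const { return m_decoded; }

private:
  std::vector<toy_record> m_records;
  std::unordered_map<std::uint32_t, std::string> m_cache;
  int m_decoded = 0;
};
```

Resolving the same index twice leaves `decoded_count()` unchanged, which is the "at most once" property described above.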
pdb-read-symbols.c
CodeView symbol record parser. Contains `pdb_parse_symbols()` (per-module
symbol parsing), `pdb_load_global_syms()` (global symbol stream), and
`pdb_parse_sym_record_stream()` (DBI symbol record stream dump). Also contains
the `create_gdb_sym()` implementations for each CodeView symbol wrapper struct.
The function `pdb_loclist_read_variable` provides symbol (and its location)
resolution to GDB by registering with GDB's symbol_computed_ops. Unlike DWARF,
the implementation here uses LOC_COMPUTED for register variables as well, and
we use a single baton class.
All GDB symbols are allocated on the objfile obstack. Location batons
(`pdb_loclist_baton`) and their location entries (`pdb_loc_entry`, which
include an inline gap array) are also obstack-allocated. The symbol wrapper
structs (`pdb_sym`) are stack-allocated during parsing: they exist only long
enough to extract fields from the raw record and call `create_gdb_sym()`.
pdb-cv-regs-amd64.h
CodeView register definitions for AMD64. Maps CodeView register IDs
(from Microsoft's `cvconst.h`) to DWARF and GDB register numbers.
pdb-path.c
PDB file discovery. Searches for the PDB file using multiple strategies in
order: `--pdb-path` command-line override, PDB basename in the EXE directory
(from the RSDS record in the PE debug directory), EXE path with `.pdb`
extension, full RSDS path, `_NT_SYMBOL_PATH` / `_NT_ALT_SYMBOL_PATH`
environment variables, and Windows registry entries.
pdb-cmd.c
GDB command registration. Implements all `info pdb-*` commands listed above.
Provides helper functions for parsing command arguments (`path=`, `modi=`)
and dispatching to the appropriate dump routines.
GDB Integration
`pdb_initialize_objfile()` is the entry point, called from `coff_symfile_read()`
(in the COFF reader) before the DWARF initialization call. It calls
`pdb_read_pdb_file()` to load and parse the PDB.
Each file registers the following with GDB:
- `pdb.c`: Builds per-module symtabs with linetables and registers quick symbol functions (`pdb_readnow_functions`).
- `pdb-read-symbols.c`: Builds the CU; adds GDB symbols, function/scope blocks, and symbol location info to the compilation unit.
- `pdb-read-types.c`: Creates GDB types out of TPI types.
- `pdb-cmd.c`: Registers `info pdb-*` commands for inspecting PDB internals.
Initialization Order
`pdb_read_pdb_file()` loads PDB data in this order:
Limitations
- LF_ENUM are not yet resolved: returns void/unsupported placeholder. Variables of these types display as `<unsupported PDB type>`.
- `pdb_loclist_read_variable()` reads variables from live registers, so only the innermost frame (frame #0) is supported. `up`/`down`/`frame N` need unwinding that is not yet supported.
- Other registers are not mapped: variables stored in those registers show as unavailable. Only x86-64 is supported.
- `info pdb-psi` and minsyms. Module-level records store undecorated names so symbols display correct names.
- pdb-path.c).
Memory Allocations
Using objfile obstack, except for:
Heap (`new`):
- `pdb_per_objfile`: registered via `registry<objfile>::key`, auto-deleted when objfile is destroyed.
- `buildsym_compunit`: builder, deleted after modules are built (`pdb_build_module()` / `pdb_expand_all_modules()`).
Scoped (`unique_ptr<gdb_byte[]>`):
- `pdb.c`: reading stream directory and stream block map. Freed automatically.
- `pdb.c` `pdb_read_stream()`: reading of the actual stream bytes. Released into `pdb->stream_data[]` (`pdb` on obstack) or freed automatically.
- `pdb-path.c`: temporary buffers for PE executable access.