Skip to content

feat: add git submodule support#485

Merged
bjornrobertsson merged 34 commits into
mainfrom
74-git-submodule-support
May 22, 2026
Merged

feat: add git submodule support#485
bjornrobertsson merged 34 commits into
mainfrom
74-git-submodule-support

Conversation

@bjornrobertsson
Copy link
Copy Markdown
Contributor

@bjornrobertsson bjornrobertsson commented Dec 11, 2025

Add submodule cloning support via native Go implementation

Summary

This PR resolves/mitigates #74

Changes

Since the existing submodule support in git.go depends on execution of the git binary, this implementation circumvents that limitation by traversing the tree to clone submodules natively in Go.

Behavior

  • Functionality can be toggled via a parameter (ENVBUILDER_GIT_CLONE_SUBMODULES)
  • Submodule initialization runs on every workspace start when enabled, so a transient failure during the first run is automatically retried on the next start
  • Already-cloned submodules are detected and only their expected commit is checked out, making re-runs cheap

Add support for initializing and updating git submodules during repository cloning.
Added GitCloneSubmodules option to support recursive submodule initialization.
Resolved issues with git submodule handling for better cloning and URL resolution.
Resolved issues with git submodule handling to ensure robust cloning and URL resolution without relying on git binary calls.
Adding tests for Submodules and two missing
@bjornrobertsson bjornrobertsson linked an issue Dec 11, 2025 that may be closed by this pull request
@bjornrobertsson
Copy link
Copy Markdown
Contributor Author

I have read the CLA Document and I hereby sign the CLA
recheck

Copy link
Copy Markdown
Member

@johnstcn johnstcn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this @bjornrobertsson !
I have some comments below, and I think it would also be worthwhile adding an integration test (see: integration/integration_test.go and testutil/gittest/gittest.go).

Comment thread git/git.go Outdated
Comment thread git/git_test.go
Comment thread git/git.go Outdated
Comment thread README.md Outdated
Comment thread git/git.go Outdated
Comment thread testutil/gittest/gittest.go Outdated
Comment thread git/git.go Outdated
Comment thread git/git.go Outdated
Comment thread docs/env-variables.md Outdated
- Added SubmoduleDepth custom type that accepts 'true' (depth 10), 'false' (0), or positive integer
- initSubmodules now tracks current depth and stops at max depth
- Default is 0 (disabled) - submodules are not cloned unless explicitly enabled
@johnstcn johnstcn changed the title 74 git submodule support feat: add git submodule support Feb 6, 2026
Comment thread git/git.go Outdated
Comment thread git/git.go Outdated
Comment thread git/git.go Outdated
Comment thread testutil/gittest/gittest.go Outdated
Comment thread git/git.go Outdated
- Add RedactURL function that handles all common URL formats:
  - Standard URLs with userinfo (user:pass, token-only, user-only)
  - URL-encoded credentials (p%40ss%3Aw0rd)
  - SCP-like URLs with any user (git@, deploy@, token@)
  - Various schemes (http, https, ssh, git, ftp, sftp)
  - IPv6 hosts
- Uses net/url for scheme URLs (handles encoding automatically)
- Uses regex for SCP-style URLs
- Redact credentials from parent URL, submodule URL, and resolved URL logs
- Add comprehensive tests covering all credential patterns
Addresses credential exfiltration risk where a malicious .gitmodules could
point to an attacker-controlled server and receive the parent repo's auth.

- Add extractHost() to parse host from standard URLs and SCP-like URLs
- Add SameHost() to compare hosts (case-insensitive, ignores port)
- Only forward RepoAuth to submodule if hosts match
- Log warning when auth is not forwarded due to different host
- Add comprehensive tests for SameHost()
- CreateGitServerWithSubmodule now creates a proper git submodule with:
  - .gitmodules file
  - Submodule config in .git/config
  - Gitlink entry (mode 160000) in the index pointing to submodule commit
- Add CommitSubmodule helper to create proper submodule entries
- Update TestGitSubmodules to verify the submodule file is actually cloned,
  not just that .gitmodules exists
- Add explicit check for SCP-like parent URLs when resolving relative submodules
- Return clear error message with link to tracking issue
- Document limitation in function doc comment
- Add test cases for SCP parent with relative and absolute submodule URLs

Addresses review feedback from johnstcn.
URLs like oauth2:token@host:path were being misinterpreted as having
scheme 'oauth2'. Check SCP regex first, and also require parsed.Host
to be non-empty for standard URL handling.
Move standard URL parsing back before SCP check, but require both
scheme AND host to be non-empty. This correctly handles:
- oauth2:token@host:path (scheme=oauth2, host=empty → SCP fallback)
- https://user@[2001:db8::1]/path (scheme=https, host=[2001:db8::1] → standard URL)
Copy link
Copy Markdown
Member

@mafredri mafredri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First-pass review (Netero). These are mechanical findings only; the full review panel has not yet reviewed this PR. The panel will review after these findings are addressed.

Severity count: 1 P1 (security), 1 P2 (logic bug), 1 P3 (consistency), 1 Nit.

The submodule feature addresses a real gap and the overall structure is reasonable. The security guard (SameHost) and URL redaction are good additions. However, the credential-forwarding logic has a bypass path through nested recursion, and the relative-URL resolution has a semantic error for ./-prefixed paths.

"Attack path: a malicious .gitmodules on github.com points submodule A to evil.com. The first-level SameHost check blocks auth. But if evil.com/repo itself has a nested submodule also on evil.com, the second-level check passes and forwards the root GitHub PAT to evil.com." (Netero)

🤖 This review was automatically generated with Coder Agents.

Comment thread git/git.go Outdated
Comment thread git/git.go
if part == ".." {
// Go up one directory
currentPath = path.Dir(currentPath)
} else if part == "." {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 [DEREM-2] ResolveSubmoduleURL mishandles ./-relative URLs.

The . case does continue (no-op), leaving currentPath at the full parent path including the repo "filename." For ./extras/tool.git relative to https://example.com/org/main.git, this produces https://example.com/org/main.git/extras/tool.git.

Git's behavior: both . and .. are relative to the directory containing the repo, not the repo path itself. The correct result is https://example.com/org/extras/tool.git. The .. case works by accident because path.Dir strips the filename AND goes up one level in a single operation.

Fix: before the loop, set currentPath = path.Dir(parentPath) to establish directory context. Then .. means "go up from directory" and . means "stay."

The test TestResolveSubmoduleURL/relativeChild asserts the wrong expected value, so it passes vacuously. (Netero)

🤖

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verified against native git that ./extras/tool.git relative to https://example.com/org/main.git resolves to https://example.com/org/main.git/extras/tool.git, matching the current code:

$ # parent remote: https://example.com/org/main.git
$ cat .gitmodules
[submodule "extras"]
        path = extras
        url = ./extras/tool.git

$ git submodule init && git config submodule.extras.url
https://example.com/org/main.git/extras/tool.git

go-git uses the same rule (vendor/.../submodule.go: rootEndpoint.Path = path.Join(rootEndpoint.Path, moduleEndpoint.Path)). Both . and .. are resolved against the parent URL via path.Join, not against the parent's directory: .. works because path.Join("/org/main.git", "../deps/lib.git") cleans to /org/deps/lib.git. The existing relativeChild test expectation is correct.

Closing as not a bug; happy to revisit if you have a reproducer showing a different behavior from upstream git.

🤖 Posted using /amend-review skill via Coder Agents.

Comment thread git/git.go Outdated
Comment thread git/git_test.go Outdated
Copy link
Copy Markdown
Member

@mafredri mafredri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Full panel review (16 reviewers). The submodule feature fills a real gap and the security-conscious design (SameHost gating, URL redaction) shows good instincts. However, the implementation has several structural issues that need attention before merge.

Severity count: 1 P1, 7 P2, 8 P3, 3 Nit (including R1 findings).

The most critical cluster: the "already cloned" recovery path uses the wrong filesystem root for git storage (5 reviewers independently found this). Combined with the fact that submodule init never retries on subsequent starts, any transient failure during first workspace setup permanently breaks submodule state with no automatic recovery.

A second cluster around credential handling: beyond the R1 credential-leak-through-recursion (DEREM-1), the panel found that Endpoint.String() re-emits URL-embedded credentials into resolved URLs, creating a parallel credential leak channel that bypasses the SameHost gate entirely.

The panel also identified that nested submodule recursion (depth > 1) is likely non-functional because sub.Repository() requires the submodule to be registered in .git/config (via sub.Init()), which the custom clone path never does. The depth parameter accepts values up to MaxInt but depth > 1 silently produces warning logs and skips.

Architectural note: multiple reviewers observed that a thinner wrapper around go-git's existing Submodule.UpdateContext() (addressing only the two narrow gaps: relative URL resolution and per-submodule auth) would avoid the class of bugs found here. The 200-line reimplementation reintroduces problems that go-git's built-in API already solves (storage layout, initialization tracking, recursive update). This is not blocking, but worth considering.

"An attacker controlling .gitmodules on github.com points submodule A to evil.com. Auth correctly blocked at depth 1. But evil.com/repo has a nested submodule also on evil.com. The depth-2 SameHost check sees matching hosts and leaks the root credential." (Netero)

"If go-git's internal expectations differ from the manual clone layout, nested recursion silently fails at the continue on line 623 and nested submodules are never initialized." (Hisoka)

🤖 This review was automatically generated with Coder Agents.

Comment thread git/git.go
Comment thread git/git.go
Comment thread git/git.go
Comment thread git/git.go Outdated
Comment thread git/git.go
Comment thread git/git.go Outdated
Comment thread git/git.go Outdated
Comment thread git/git_test.go
Comment thread options/options.go
Comment thread options/options_test.go
@mafredri
Copy link
Copy Markdown
Member

/coder-agents-review

@mafredri mafredri dismissed their stale review May 21, 2026 18:12

Stale review

Copy link
Copy Markdown

@coder-agents-review coder-agents-review Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Full panel review (18 reviewers). The submodule feature fills a real gap, and the security-conscious design (SameHost gating, credential chain propagation, URL redaction) is well-thought-out. The TestSubmoduleAuthChainStaysWithheld test in particular shows good security instincts: once auth is withheld, it stays withheld through all recursion levels. That said, the implementation has structural issues that need attention before merge.

1 P1, 5 P2, 13 P3, 4 Nit.

The P1 is a swapped-argument bug on git.Open that makes every workspace restart with submodules fail. The P2s cover protocol-crossing auth failures, an orphaned .git sentinel that breaks the retry contract, wrong ./-relative URL resolution, missing nested-recursion test coverage, and a PR description that contradicts the code.

"The retry promise and the stale state are secretly attached through that .git sentinel. Pull the thread: first failure creates the directory, second attempt trusts it, every attempt after that fails identically." (Hisoka)

🤖 This review was automatically generated with Coder Agents.

Comment thread git/git.go
Comment thread git/git.go
Comment thread git/git.go
Comment thread git/git.go
if part == ".." {
// Go up one directory
currentPath = path.Dir(currentPath)
} else if part == "." {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 [ENVB-5] ResolveSubmoduleURL handles ./-relative paths incorrectly.

The . case at line 561 is a continue (no-op), leaving currentPath at the full parent path including the repo filename. Subsequent path components are appended to the filename.

Trace for ./extras/tool.git against https://example.com/org/main.git:

currentPath = "/org/main.git"
"."       → continue (stays "/org/main.git")
"extras"  → "/org/main.git/extras"
"tool.git"→ "/org/main.git/extras/tool.git"

Result: https://example.com/org/main.git/extras/tool.git (embeds repo name as spurious directory; 404).

Both RFC 3986 and git's native relative_url resolve this to https://example.com/org/extras/tool.git. Fix: change continue to currentPath = path.Dir(currentPath), which strips the file component.

The test expectations for relativeChild, scpRelativeChild, and httpsParentWithUserPassStripped assert the wrong results. (Kite)

🤖

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-raise of DEREM-2 from R1. The claim that ./ should resolve relative to the parent's directory rather than the parent's path contradicts native git's behavior, which was tested in R1:

$ git submodule init && git config submodule.extras.url
https://example.com/org/main.git/extras/tool.git

go-git resolves the same way (submodule.go:152). No change.

🤖 Posted using /amend-review skill via Coder Agents.

Comment thread git/git.go
Comment thread git/git.go
Comment thread options/options.go
Comment thread git/git.go
// checkoutSubmoduleCommit ensures expectedHash is present in subRepo,
// fetching it from the remote if it is not already there, and then checks
// it out into the submodule's worktree.
func checkoutSubmoduleCommit(ctx context.Context, logf func(string, ...any), subRepo *git.Repository, expectedHash plumbing.Hash, submoduleAuth transport.AuthMethod, opts CloneRepoOptions) error {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P3 [ENVB-17] checkoutSubmoduleCommit has three code paths: (1) commit present in shallow clone, (2) fetched by hash via targeted RefSpec, (3) fetched via full unshallow fetch. Only path (1) is exercised by the integration test. Paths (2) and (3) are the recovery paths when the parent's clone depth excludes the submodule's pinned commit, which is the expected case for shallow clones with Depth: 1.

A regression in the RefSpec format at line 789 or the NoErrAlreadyUpToDate check at line 797 would be undetected. (Bisky P3, Chopper P3)

🤖

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The three code paths (commit present, fetch-by-hash, fetch-all) in checkoutSubmoduleCommit are only exercised by the integration happy path. Adding unit tests for the fetch fallback paths requires a git server that can simulate a shallow clone where the pinned commit is not the tip. This is a follow-up.

#74

🤖 Posted using /amend-review skill via Coder Agents.

)
submoduleHead, err := submoduleRepo.Head()
require.NoError(t, err)
submoduleSrv := httptest.NewServer(mwtest.BasicAuthMW("", "")(gittest.NewServer(submoduleFS)))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P3 [ENVB-18] TestGitSubmodules passes BasicAuthMW("", "") for both servers, effectively no auth. The security-relevant behavior (credentials forwarded to same-host, withheld from cross-host) has unit tests, but no integration test proves the full chain works when the HTTP server actually rejects unauthenticated requests. (Bisky)

🤖

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An authenticated integration test (where the server rejects unauthenticated requests) would cover the full auth forwarding chain. The unit tests cover the decision logic; the gap is the HTTP layer. Follow-up.

#74

🤖 Posted using /amend-review skill via Coder Agents.

Comment thread options/options.go
@bjornrobertsson bjornrobertsson merged commit 044c616 into main May 22, 2026
4 checks passed
@bjornrobertsson bjornrobertsson deleted the 74-git-submodule-support branch May 22, 2026 08:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Git submodule support

3 participants