feat: add git submodule support#485
Conversation
Add support for initializing and updating git submodules during repository cloning.
Added GitCloneSubmodules option to support recursive submodule initialization.
Resolved issues with git submodule handling for better cloning and URL resolution.
Resolved issues with git submodule handling to ensure robust cloning and URL resolution without relying on git binary calls.
*test.go file addendum
Adding tests for Submodules and two missing
|
I have read the CLA Document and I hereby sign the CLA |
johnstcn
left a comment
There was a problem hiding this comment.
Thanks for working on this @bjornrobertsson !
I have some comments below, and I think it would also be worthwhile adding an integration test (see: integration/integration_test.go and testutil/gittest/gittest.go).
- Added SubmoduleDepth custom type that accepts 'true' (depth 10), 'false' (0), or positive integer - initSubmodules now tracks current depth and stops at max depth - Default is 0 (disabled) - submodules are not cloned unless explicitly enabled
- Add RedactURL function that handles all common URL formats: - Standard URLs with userinfo (user:pass, token-only, user-only) - URL-encoded credentials (p%40ss%3Aw0rd) - SCP-like URLs with any user (git@, deploy@, token@) - Various schemes (http, https, ssh, git, ftp, sftp) - IPv6 hosts - Uses net/url for scheme URLs (handles encoding automatically) - Uses regex for SCP-style URLs - Redact credentials from parent URL, submodule URL, and resolved URL logs - Add comprehensive tests covering all credential patterns
Addresses credential exfiltration risk where a malicious .gitmodules could point to an attacker-controlled server and receive the parent repo's auth. - Add extractHost() to parse host from standard URLs and SCP-like URLs - Add SameHost() to compare hosts (case-insensitive, ignores port) - Only forward RepoAuth to submodule if hosts match - Log warning when auth is not forwarded due to different host - Add comprehensive tests for SameHost()
- CreateGitServerWithSubmodule now creates a proper git submodule with: - .gitmodules file - Submodule config in .git/config - Gitlink entry (mode 160000) in the index pointing to submodule commit - Add CommitSubmodule helper to create proper submodule entries - Update TestGitSubmodules to verify the submodule file is actually cloned, not just that .gitmodules exists
- Add explicit check for SCP-like parent URLs when resolving relative submodules - Return clear error message with link to tracking issue - Document limitation in function doc comment - Add test cases for SCP parent with relative and absolute submodule URLs Addresses review feedback from johnstcn.
URLs like oauth2:token@host:path were being misinterpreted as having scheme 'oauth2'. Check SCP regex first, and also require parsed.Host to be non-empty for standard URL handling.
Move standard URL parsing back before SCP check, but require both scheme AND host to be non-empty. This correctly handles: - oauth2:token@host:path (scheme=oauth2, host=empty → SCP fallback) - https://user@[2001:db8::1]/path (scheme=https, host=[2001:db8::1] → standard URL)
mafredri
left a comment
There was a problem hiding this comment.
First-pass review (Netero). These are mechanical findings only; the full review panel has not yet reviewed this PR. The panel will review after these findings are addressed.
Severity count: 1 P1 (security), 1 P2 (logic bug), 1 P3 (consistency), 1 Nit.
The submodule feature addresses a real gap and the overall structure is reasonable. The security guard (SameHost) and URL redaction are good additions. However, the credential-forwarding logic has a bypass path through nested recursion, and the relative-URL resolution has a semantic error for ./-prefixed paths.
"Attack path: a malicious .gitmodules on github.com points submodule A to evil.com. The first-level SameHost check blocks auth. But if evil.com/repo itself has a nested submodule also on evil.com, the second-level check passes and forwards the root GitHub PAT to evil.com." (Netero)
🤖 This review was automatically generated with Coder Agents.
| if part == ".." { | ||
| // Go up one directory | ||
| currentPath = path.Dir(currentPath) | ||
| } else if part == "." { |
There was a problem hiding this comment.
P2 [DEREM-2] ResolveSubmoduleURL mishandles ./-relative URLs.
The . case does continue (no-op), leaving currentPath at the full parent path including the repo "filename." For ./extras/tool.git relative to https://example.com/org/main.git, this produces https://example.com/org/main.git/extras/tool.git.
Git's behavior: both . and .. are relative to the directory containing the repo, not the repo path itself. The correct result is https://example.com/org/extras/tool.git. The .. case works by accident because path.Dir strips the filename AND goes up one level in a single operation.
Fix: before the loop, set currentPath = path.Dir(parentPath) to establish directory context. Then .. means "go up from directory" and . means "stay."
The test TestResolveSubmoduleURL/relativeChild asserts the wrong expected value, so it passes vacuously. (Netero)
🤖
There was a problem hiding this comment.
Verified against native git that ./extras/tool.git relative to https://example.com/org/main.git resolves to https://example.com/org/main.git/extras/tool.git, matching the current code:
$ # parent remote: https://example.com/org/main.git
$ cat .gitmodules
[submodule "extras"]
path = extras
url = ./extras/tool.git
$ git submodule init && git config submodule.extras.url
https://example.com/org/main.git/extras/tool.git
go-git uses the same rule (vendor/.../submodule.go: rootEndpoint.Path = path.Join(rootEndpoint.Path, moduleEndpoint.Path)). Both . and .. are resolved against the parent URL via path.Join, not against the parent's directory: .. works because path.Join("/org/main.git", "../deps/lib.git") cleans to /org/deps/lib.git. The existing relativeChild test expectation is correct.
Closing as not a bug; happy to revisit if you have a reproducer showing a different behavior from upstream git.
🤖 Posted using
/amend-reviewskill via Coder Agents.
mafredri
left a comment
There was a problem hiding this comment.
Full panel review (16 reviewers). The submodule feature fills a real gap and the security-conscious design (SameHost gating, URL redaction) shows good instincts. However, the implementation has several structural issues that need attention before merge.
Severity count: 1 P1, 7 P2, 8 P3, 3 Nit (including R1 findings).
The most critical cluster: the "already cloned" recovery path uses the wrong filesystem root for git storage (5 reviewers independently found this). Combined with the fact that submodule init never retries on subsequent starts, any transient failure during first workspace setup permanently breaks submodule state with no automatic recovery.
A second cluster around credential handling: beyond the R1 credential-leak-through-recursion (DEREM-1), the panel found that Endpoint.String() re-emits URL-embedded credentials into resolved URLs, creating a parallel credential leak channel that bypasses the SameHost gate entirely.
The panel also identified that nested submodule recursion (depth > 1) is likely non-functional because sub.Repository() requires the submodule to be registered in .git/config (via sub.Init()), which the custom clone path never does. The depth parameter accepts values up to MaxInt but depth > 1 silently produces warning logs and skips.
Architectural note: multiple reviewers observed that a thinner wrapper around go-git's existing Submodule.UpdateContext() (addressing only the two narrow gaps: relative URL resolution and per-submodule auth) would avoid the class of bugs found here. The 200-line reimplementation reintroduces problems that go-git's built-in API already solves (storage layout, initialization tracking, recursive update). This is not blocking, but worth considering.
"An attacker controlling .gitmodules on github.com points submodule A to evil.com. Auth correctly blocked at depth 1. But evil.com/repo has a nested submodule also on evil.com. The depth-2 SameHost check sees matching hosts and leaks the root credential." (Netero)
"If go-git's internal expectations differ from the manual clone layout, nested recursion silently fails at the continue on line 623 and nested submodules are never initialized." (Hisoka)
🤖 This review was automatically generated with Coder Agents.
|
/coder-agents-review |
There was a problem hiding this comment.
Full panel review (18 reviewers). The submodule feature fills a real gap, and the security-conscious design (SameHost gating, credential chain propagation, URL redaction) is well-thought-out. The TestSubmoduleAuthChainStaysWithheld test in particular shows good security instincts: once auth is withheld, it stays withheld through all recursion levels. That said, the implementation has structural issues that need attention before merge.
1 P1, 5 P2, 13 P3, 4 Nit.
The P1 is a swapped-argument bug on git.Open that makes every workspace restart with submodules fail. The P2s cover protocol-crossing auth failures, an orphaned .git sentinel that breaks the retry contract, wrong ./-relative URL resolution, missing nested-recursion test coverage, and a PR description that contradicts the code.
"The retry promise and the stale state are secretly attached through that .git sentinel. Pull the thread: first failure creates the directory, second attempt trusts it, every attempt after that fails identically." (Hisoka)
🤖 This review was automatically generated with Coder Agents.
| if part == ".." { | ||
| // Go up one directory | ||
| currentPath = path.Dir(currentPath) | ||
| } else if part == "." { |
There was a problem hiding this comment.
P2 [ENVB-5] ResolveSubmoduleURL handles ./-relative paths incorrectly.
The . case at line 561 is a continue (no-op), leaving currentPath at the full parent path including the repo filename. Subsequent path components are appended to the filename.
Trace for ./extras/tool.git against https://example.com/org/main.git:
currentPath = "/org/main.git"
"." → continue (stays "/org/main.git")
"extras" → "/org/main.git/extras"
"tool.git"→ "/org/main.git/extras/tool.git"
Result: https://example.com/org/main.git/extras/tool.git (embeds repo name as spurious directory; 404).
Both RFC 3986 and git's native relative_url resolve this to https://example.com/org/extras/tool.git. Fix: change continue to currentPath = path.Dir(currentPath), which strips the file component.
The test expectations for relativeChild, scpRelativeChild, and httpsParentWithUserPassStripped assert the wrong results. (Kite)
🤖
There was a problem hiding this comment.
Re-raise of DEREM-2 from R1. The claim that ./ should resolve relative to the parent's directory rather than the parent's path contradicts native git's behavior, which was tested in R1:
$ git submodule init && git config submodule.extras.url
https://example.com/org/main.git/extras/tool.git
go-git resolves the same way (submodule.go:152). No change.
🤖 Posted using
/amend-reviewskill via Coder Agents.
| // checkoutSubmoduleCommit ensures expectedHash is present in subRepo, | ||
| // fetching it from the remote if it is not already there, and then checks | ||
| // it out into the submodule's worktree. | ||
| func checkoutSubmoduleCommit(ctx context.Context, logf func(string, ...any), subRepo *git.Repository, expectedHash plumbing.Hash, submoduleAuth transport.AuthMethod, opts CloneRepoOptions) error { |
There was a problem hiding this comment.
P3 [ENVB-17] checkoutSubmoduleCommit has three code paths: (1) commit present in shallow clone, (2) fetched by hash via targeted RefSpec, (3) fetched via full unshallow fetch. Only path (1) is exercised by the integration test. Paths (2) and (3) are the recovery paths when the parent's clone depth excludes the submodule's pinned commit, which is the expected case for shallow clones with Depth: 1.
A regression in the RefSpec format at line 789 or the NoErrAlreadyUpToDate check at line 797 would be undetected. (Bisky P3, Chopper P3)
🤖
There was a problem hiding this comment.
The three code paths (commit present, fetch-by-hash, fetch-all) in checkoutSubmoduleCommit are only exercised by the integration happy path. Adding unit tests for the fetch fallback paths requires a git server that can simulate a shallow clone where the pinned commit is not the tip. This is a follow-up.
🤖 Posted using
/amend-reviewskill via Coder Agents.
| ) | ||
| submoduleHead, err := submoduleRepo.Head() | ||
| require.NoError(t, err) | ||
| submoduleSrv := httptest.NewServer(mwtest.BasicAuthMW("", "")(gittest.NewServer(submoduleFS))) |
There was a problem hiding this comment.
P3 [ENVB-18] TestGitSubmodules passes BasicAuthMW("", "") for both servers, effectively no auth. The security-relevant behavior (credentials forwarded to same-host, withheld from cross-host) has unit tests, but no integration test proves the full chain works when the HTTP server actually rejects unauthenticated requests. (Bisky)
🤖
There was a problem hiding this comment.
An authenticated integration test (where the server rejects unauthenticated requests) would cover the full auth forwarding chain. The unit tests cover the decision logic; the gap is the HTTP layer. Follow-up.
🤖 Posted using
/amend-reviewskill via Coder Agents.
Add submodule cloning support via native Go implementation
Summary
This PR resolves/mitigates #74
Changes
Since the existing submodule support in
git.godepends on execution of the git binary, this implementation circumvents that limitation by traversing the tree to clone submodules natively in Go.Behavior