Skip to content

Commit f173659

Browse files
feat(webapp): apply default repository policy on ECR repo creation (#3467)
## Summary Self-hosters that operate the webapp's ECR account separately from the account running the EKS workers (e.g., a shared platform account that hosts the registry plus per-team accounts that host clusters) currently hit a 403 Forbidden the first time **any** project is deployed: ``` Failed to pull image "<acct-A>.dkr.ecr.<region>.amazonaws.com/<namespace>/proj_…:…": unexpected status from HEAD request to .../v2/.../manifests/sha256:…: 403 Forbidden ``` `ensureEcrRepositoryExists` in `apps/webapp/app/v3/getDeploymentImageRef.server.ts` calls `CreateRepository` and `PutLifecyclePolicy`, but never `SetRepositoryPolicy` — so the new repo inherits the AWS default (only the registry-owner account can read/pull). Workers in the cluster account get 403 every single deploy. The only workarounds today are running a one-off post-create script or pre-creating every repo by hand. ## Proposed change Add an optional env var: ``` DEPLOY_REGISTRY_ECR_DEFAULT_REPOSITORY_POLICY (V4 mirror: V4_DEPLOY_REGISTRY_ECR_DEFAULT_REPOSITORY_POLICY) ``` Raw IAM policy JSON. When set, the webapp calls `SetRepositoryPolicy` immediately after `CreateRepository` so every new repo carries that policy from creation. Operators control the principal/actions; we don't bake in any opinions about cross-account boundaries. Example value (for the typical self-host case — grant pull to the cluster account): ```json { "Version": "2012-10-17", "Statement": [{ "Sid": "AllowClusterAccountPull", "Effect": "Allow", "Principal": {"AWS": "arn:aws:iam::<cluster-account-id>:root"}, "Action": [ "ecr:GetDownloadUrlForLayer", "ecr:BatchGetImage", "ecr:BatchCheckLayerAvailability" ] }] } ``` ## Why env var (not a chart-level field) - Mirrors the shape of the sibling vars (`DEPLOY_REGISTRY_ECR_TAGS`, `DEPLOY_REGISTRY_ECR_ASSUME_ROLE_ARN`, etc.) which are already operator-supplied via `webapp.extraEnvVars` in self-host setups. - Cloud is unaffected — the env var is optional, unset by default; existing behavior unchanged. - Existing repos are unaffected — only newly-created repos get the policy. - `RepositoryCreationTemplate` from the AWS provider isn't an alternative here: it only applies to repos created via pull-through-cache or replication, not to `ecr:CreateRepository` API calls. ## Implementation - `apps/webapp/app/env.server.ts` — declare `DEPLOY_REGISTRY_ECR_DEFAULT_REPOSITORY_POLICY` and the V4 fallback. - `apps/webapp/app/v3/registryConfig.server.ts` — propagate `ecrDefaultRepositoryPolicy` to `RegistryConfig`. - `apps/webapp/app/v3/getDeploymentImageRef.server.ts` — `createEcrRepository` accepts the policy; if set, calls `SetRepositoryPolicy` after `PutLifecyclePolicy`. - `docs/self-hosting/env/webapp.mdx` — documentation row added under **Deploy & Registry**. ## Verification Verified end-to-end against a self-hosted Trigger.dev on EKS where the ECR account is separate from the cluster account: - **Without the env var** (current `main`): the new project's first run pod stays in `ImagePullBackOff` with `403 Forbidden`. - **With the env var set** to a JSON granting `ecr:BatchGetImage`/`GetDownloadUrlForLayer`/`BatchCheckLayerAvailability` to the cluster account: a fresh `trigger.dev deploy --env prod` followed by a `hello-world` run completes in ~5s end-to-end on the first try. Manually also confirmed that existing repos are untouched (the call only fires inside `createEcrRepository`, which only runs when `DescribeRepositories` returned `RepositoryNotFoundException`). ## Out of scope - Chart values surface for this — operators already pass the existing ECR vars via `webapp.extraEnvVars`, so this follows the same pattern. Happy to add a first-class chart field in a follow-up if that's the preferred direction. - IAM-policy validation in the webapp — we forward the JSON verbatim to AWS and surface AWS's error messages on misuse, matching how `DEPLOY_REGISTRY_ECR_TAGS` is handled today. This is a draft pending CI / CodeRabbit pass — happy to iterate on direction (e.g., split into per-action env vars, or extend the chart values schema) if any of the above choices feels off. --------- Co-authored-by: nicktrn <55853254+nicktrn@users.noreply.github.com>
1 parent 8e368cc commit f173659

5 files changed

Lines changed: 95 additions & 1 deletion

File tree

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,6 @@
1+
---
2+
area: webapp
3+
type: feature
4+
---
5+
6+
Optional `DEPLOY_REGISTRY_ECR_DEFAULT_REPOSITORY_POLICY` env var to apply a default repository policy when the webapp creates new ECR repos

apps/webapp/app/env.server.ts

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -300,6 +300,7 @@ const EnvironmentSchema = z
300300
DEPLOY_REGISTRY_ECR_TAGS: z.string().optional(), // csv, for example: "key1=value1,key2=value2"
301301
DEPLOY_REGISTRY_ECR_ASSUME_ROLE_ARN: z.string().optional(),
302302
DEPLOY_REGISTRY_ECR_ASSUME_ROLE_EXTERNAL_ID: z.string().optional(),
303+
DEPLOY_REGISTRY_ECR_DEFAULT_REPOSITORY_POLICY: z.string().optional(), // raw IAM policy JSON applied to every repo created by the webapp
303304

304305
// Deployment registry (v4) - falls back to v3 registry if not specified
305306
V4_DEPLOY_REGISTRY_HOST: z
@@ -332,6 +333,10 @@ const EnvironmentSchema = z
332333
.string()
333334
.optional()
334335
.transform((v) => v ?? process.env.DEPLOY_REGISTRY_ECR_ASSUME_ROLE_EXTERNAL_ID),
336+
V4_DEPLOY_REGISTRY_ECR_DEFAULT_REPOSITORY_POLICY: z
337+
.string()
338+
.optional()
339+
.transform((v) => v ?? process.env.DEPLOY_REGISTRY_ECR_DEFAULT_REPOSITORY_POLICY),
335340

336341
// Compute gateway (template creation during deploy finalize)
337342
COMPUTE_GATEWAY_URL: z.string().optional(),

apps/webapp/app/v3/getDeploymentImageRef.server.ts

Lines changed: 80 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -8,6 +8,7 @@ import {
88
GetAuthorizationTokenCommand,
99
PutLifecyclePolicyCommand,
1010
PutImageTagMutabilityCommand,
11+
SetRepositoryPolicyCommand,
1112
} from "@aws-sdk/client-ecr";
1213
import { STSClient, AssumeRoleCommand } from "@aws-sdk/client-sts";
1314
import { tryCatch } from "@trigger.dev/core";
@@ -138,6 +139,7 @@ export async function getDeploymentImageRef({
138139
roleArn: registry.ecrAssumeRoleArn,
139140
externalId: registry.ecrAssumeRoleExternalId,
140141
},
142+
defaultRepositoryPolicy: registry.ecrDefaultRepositoryPolicy,
141143
})
142144
);
143145

@@ -219,12 +221,14 @@ async function createEcrRepository({
219221
accountId,
220222
registryTags,
221223
assumeRole,
224+
defaultRepositoryPolicy,
222225
}: {
223226
repositoryName: string;
224227
region: string;
225228
accountId?: string;
226229
registryTags?: string;
227230
assumeRole?: AssumeRoleConfig;
231+
defaultRepositoryPolicy?: string;
228232
}): Promise<Repository> {
229233
const ecr = await createEcrClient({ region, assumeRole });
230234

@@ -262,9 +266,50 @@ async function createEcrRepository({
262266
})
263267
);
264268

269+
// Apply an operator-provided IAM policy to the new repository. Useful for
270+
// self-hosters whose ECR account is separate from the account running the
271+
// EKS workers — without this the workers get 403 Forbidden when pulling the
272+
// task image (default ECR policy only grants access to the registry owner).
273+
// The existing-repo branch of `ensureEcrRepositoryExists` reconciles this
274+
// same policy on every call, so a partial-create that fails here is
275+
// self-healing on the next deploy.
276+
if (defaultRepositoryPolicy) {
277+
await applyEcrRepositoryPolicy({
278+
repositoryName: result.repository.repositoryName!,
279+
region,
280+
accountId: result.repository.registryId ?? accountId,
281+
assumeRole,
282+
defaultRepositoryPolicy,
283+
});
284+
}
285+
265286
return result.repository;
266287
}
267288

289+
async function applyEcrRepositoryPolicy({
290+
repositoryName,
291+
region,
292+
accountId,
293+
assumeRole,
294+
defaultRepositoryPolicy,
295+
}: {
296+
repositoryName: string;
297+
region: string;
298+
accountId?: string;
299+
assumeRole?: AssumeRoleConfig;
300+
defaultRepositoryPolicy: string;
301+
}): Promise<void> {
302+
const ecr = await createEcrClient({ region, assumeRole });
303+
304+
await ecr.send(
305+
new SetRepositoryPolicyCommand({
306+
repositoryName,
307+
registryId: accountId,
308+
policyText: defaultRepositoryPolicy,
309+
})
310+
);
311+
}
312+
268313
async function updateEcrRepositoryCacheSettings({
269314
repositoryName,
270315
region,
@@ -386,11 +431,13 @@ async function ensureEcrRepositoryExists({
386431
registryHost,
387432
registryTags,
388433
assumeRole,
434+
defaultRepositoryPolicy,
389435
}: {
390436
repositoryName: string;
391437
registryHost: string;
392438
registryTags?: string;
393439
assumeRole?: AssumeRoleConfig;
440+
defaultRepositoryPolicy?: string;
394441
}): Promise<{ repo: Repository; repoCreated: boolean }> {
395442
const { region, accountId } = parseEcrRegistryDomain(registryHost);
396443

@@ -421,14 +468,46 @@ async function ensureEcrRepositoryExists({
421468
}
422469
}
423470

471+
// Reconcile the default repository policy on every call. Idempotent, and
472+
// covers two recovery cases: (1) a previous create succeeded but the
473+
// SetRepositoryPolicy call failed mid-flight, leaving the repo without a
474+
// policy; (2) the operator updated DEPLOY_REGISTRY_ECR_DEFAULT_REPOSITORY_POLICY
475+
// and existing repos need to pick up the new value.
476+
if (defaultRepositoryPolicy) {
477+
const [policyError] = await tryCatch(
478+
applyEcrRepositoryPolicy({
479+
repositoryName,
480+
region,
481+
accountId,
482+
assumeRole,
483+
defaultRepositoryPolicy,
484+
})
485+
);
486+
487+
if (policyError) {
488+
logger.error("Failed to reconcile ECR repository policy on existing repo", {
489+
repositoryName,
490+
region,
491+
policyError,
492+
});
493+
}
494+
}
495+
424496
return {
425497
repo: existingRepo,
426498
repoCreated: false,
427499
};
428500
}
429501

430502
const [createRepoError, newRepo] = await tryCatch(
431-
createEcrRepository({ repositoryName, region, accountId, registryTags, assumeRole })
503+
createEcrRepository({
504+
repositoryName,
505+
region,
506+
accountId,
507+
registryTags,
508+
assumeRole,
509+
defaultRepositoryPolicy,
510+
})
432511
);
433512

434513
if (createRepoError) {

apps/webapp/app/v3/registryConfig.server.ts

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -8,6 +8,7 @@ export type RegistryConfig = {
88
ecrTags?: string;
99
ecrAssumeRoleArn?: string;
1010
ecrAssumeRoleExternalId?: string;
11+
ecrDefaultRepositoryPolicy?: string;
1112
};
1213

1314
export function getRegistryConfig(isV4Deployment: boolean): RegistryConfig {
@@ -20,6 +21,7 @@ export function getRegistryConfig(isV4Deployment: boolean): RegistryConfig {
2021
ecrTags: env.V4_DEPLOY_REGISTRY_ECR_TAGS,
2122
ecrAssumeRoleArn: env.V4_DEPLOY_REGISTRY_ECR_ASSUME_ROLE_ARN,
2223
ecrAssumeRoleExternalId: env.V4_DEPLOY_REGISTRY_ECR_ASSUME_ROLE_EXTERNAL_ID,
24+
ecrDefaultRepositoryPolicy: env.V4_DEPLOY_REGISTRY_ECR_DEFAULT_REPOSITORY_POLICY,
2325
};
2426
}
2527

@@ -31,5 +33,6 @@ export function getRegistryConfig(isV4Deployment: boolean): RegistryConfig {
3133
ecrTags: env.DEPLOY_REGISTRY_ECR_TAGS,
3234
ecrAssumeRoleArn: env.DEPLOY_REGISTRY_ECR_ASSUME_ROLE_ARN,
3335
ecrAssumeRoleExternalId: env.DEPLOY_REGISTRY_ECR_ASSUME_ROLE_EXTERNAL_ID,
36+
ecrDefaultRepositoryPolicy: env.DEPLOY_REGISTRY_ECR_DEFAULT_REPOSITORY_POLICY,
3437
};
3538
}

docs/self-hosting/env/webapp.mdx

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -76,6 +76,7 @@ mode: "wide"
7676
| `DEPLOY_REGISTRY_USERNAME` | No || Deploy registry username. |
7777
| `DEPLOY_REGISTRY_PASSWORD` | No || Deploy registry password. |
7878
| `DEPLOY_REGISTRY_NAMESPACE` | No | trigger | Deploy registry namespace. |
79+
| `DEPLOY_REGISTRY_ECR_DEFAULT_REPOSITORY_POLICY` | No || Raw IAM policy JSON applied via SetRepositoryPolicy to every ECR repo created by the webapp. Use to grant cross-account pull access to EKS workers when the ECR account is separate from the cluster account. |
7980
| `DEPLOY_IMAGE_PLATFORM` | No | linux/amd64 | Deploy image platform, same values as docker `--platform` flag. |
8081
| `DEPLOY_TIMEOUT_MS` | No | 480000 (8m) | Deploy timeout (ms). |
8182
| **Object store (S3)** | | | |

0 commit comments

Comments
 (0)