Implement assistant job model and lifecycle #30

Description

@alexeygrigorev

Status: blocked
Tags: enhancement, assistant, podcast, work-engine, backend, frontend, portal, data, infra, P0
Depends on: #29
Blocks: None

Current State

The agent-verifiable work (assistant job model, API, UI, export, SAM, and deterministic dry-run) is complete on current main. This issue remains open only for the [HUMAN] gate: a decision on, or smoke test of, real external assistant execution.

Evidence:

Remaining gate:

  • [HUMAN] Verify the real Telegram/Heru/Codex/Claude execution path with production credentials, or explicitly defer that external execution path from V1 launch acceptance.

Scope

Implement the DataOps V1 assistant job model as part of the unified workflow-first portal. Assistant runs must be durable workflow entities in work-engine, not local files under assistants/podcast/inbox/, assistants/podcast/documents/, or assistants/podcast/heru_runs/.

This issue covers the model, lifecycle rules, API, operator UI, portable export behavior, and the first safe Podcast Assistant integration boundary. The goal is that an operator can request assistant help from a task or bundle, watch the job lifecycle, review logs and failures, approve or reject outputs, retry failed work, and see generated artifacts attached back to the workflow.

Implement in the DataOps repo only. Do not modify source repos such as ../podcast-assistant, ../dtc-operations, or ../datatasks.

Product Behavior

  • Add assistant jobs as first-class runtime records linked to a task, bundle, or both.
  • Add a stable assistant lifecycle with explicit statuses:
    • draft: intake exists but is not submitted.
    • queued: submitted and waiting to run.
    • running: worker or operator-triggered runner is active.
    • waiting_approval: outputs exist and require operator review.
    • approved: operator accepted outputs.
    • rejected: operator rejected outputs with a reason.
    • retrying: retry was requested after a failure or rejected output.
    • succeeded: completed without further approval required, or completed after approval.
    • failed: terminal failure after allowed attempts or explicit failure.
    • canceled: operator canceled the job before completion.
  • Persist job lifecycle changes, retry attempts, approval decisions, errors, and output attachment events as append-only job log or audit entries. Raw assistant transcript/log text must be stored as artifact/log references, not as unbounded DynamoDB blobs.
  • Link assistant outputs to the artifact model from #29 (Define artifact model and storage policy) through artifactRefs/outputArtifactIds. If #29 changes the exact artifact fields, treat #29 as the source of truth.
  • Preserve the existing workflow-first model: tasks and bundles remain the primary operator screens; assistant jobs are contextual workflow support, not a separate disconnected app.
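
The lifecycle statuses above can be expressed as a typed transition table. The following is a minimal sketch, not the work-engine implementation: the names AssistantJobStatus, ALLOWED_TRANSITIONS, and canTransition are hypothetical, and the exact set of edges is an assumption, since this issue fixes the statuses but not every transition.

```typescript
// Hypothetical names throughout: AssistantJobStatus, ALLOWED_TRANSITIONS, canTransition.
type AssistantJobStatus =
  | "draft" | "queued" | "running" | "waiting_approval" | "approved"
  | "rejected" | "retrying" | "succeeded" | "failed" | "canceled";

// Assumed edges: the issue names the statuses but does not fix every transition.
// "approved" is treated here as a transient step before "succeeded".
const ALLOWED_TRANSITIONS: Record<AssistantJobStatus, readonly AssistantJobStatus[]> = {
  draft: ["queued", "canceled"],
  queued: ["running", "canceled"],
  running: ["waiting_approval", "succeeded", "failed", "canceled"],
  waiting_approval: ["approved", "rejected", "canceled"],
  approved: ["succeeded"],
  rejected: ["retrying", "canceled"],
  retrying: ["queued", "canceled"],
  failed: ["retrying"],
  succeeded: [],
  canceled: [],
};

// Returns true only when the edge is listed in the table above.
function canTransition(from: AssistantJobStatus, to: AssistantJobStatus): boolean {
  return ALLOWED_TRANSITIONS[from].some((next) => next === to);
}
```

Keeping the edges in one table lets the API and the UI share a single definition. A real implementation would likely also check approvalRequired before allowing running -> succeeded.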

Data Model

Add durable assistant job records in work-engine with migration-safe stable IDs. Minimum fields:

  • id / export assistant_job_id
  • assistantType / assistant_type such as podcast
  • title
  • status
  • taskId
  • bundleId
  • requestedBy
  • inputRefs: array of typed references to source messages, files, URLs, docs, tasks, or bundle links
  • outputArtifactIds: stable artifact IDs produced by the job
  • logRefs: references to stored run logs or transcript artifacts
  • approvalRequired
  • approval: { status, decidedBy, decidedAt, reason } when applicable
  • attemptCount
  • maxAttempts
  • retryOfJobId when a retry creates a new job record, or equivalent attempt history if retries stay on one job
  • lastError: sanitized error summary and code, no secrets
  • createdAt, queuedAt, startedAt, completedAt, updatedAt
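
As an illustration, the minimum fields above could map to a TypeScript record like the one below. This is a sketch under stated assumptions: InputRef, Approval, DraftInput, and newDraftJob are hypothetical helper names, and the defaults (approvalRequired true, maxAttempts 3) are placeholders rather than values taken from this issue.

```typescript
// Field names follow the list above; helper type and function names are hypothetical.
interface InputRef {
  kind: "message" | "file" | "url" | "doc" | "task" | "bundle";
  ref: string;
}

interface Approval {
  status: "approved" | "rejected"; // assumed value set
  decidedBy: string;
  decidedAt: string;
  reason?: string;
}

interface AssistantJob {
  id: string;
  assistantType: string;
  title: string;
  status: string; // one of the lifecycle statuses
  taskId?: string;
  bundleId?: string;
  requestedBy: string;
  inputRefs: InputRef[];
  outputArtifactIds: string[];
  logRefs: string[];
  approvalRequired: boolean;
  approval?: Approval;
  attemptCount: number;
  maxAttempts: number;
  retryOfJobId?: string;
  lastError?: { code: string; summary: string }; // sanitized, no secrets
  createdAt: string;
  queuedAt?: string;
  startedAt?: string;
  completedAt?: string;
  updatedAt: string;
}

interface DraftInput {
  assistantType: string;
  title: string;
  requestedBy: string;
  taskId?: string;
  bundleId?: string;
  inputRefs?: InputRef[];
  approvalRequired?: boolean;
  maxAttempts?: number;
}

// Builds a draft record; the caller supplies the stable ID and the timestamp.
function newDraftJob(id: string, input: DraftInput, now: string): AssistantJob {
  if (!input.taskId && !input.bundleId) {
    throw new Error("assistant job requires taskId, bundleId, or both");
  }
  return {
    id,
    assistantType: input.assistantType,
    title: input.title,
    status: "draft",
    taskId: input.taskId,
    bundleId: input.bundleId,
    requestedBy: input.requestedBy,
    inputRefs: input.inputRefs ?? [],
    outputArtifactIds: [],
    logRefs: [],
    approvalRequired: input.approvalRequired ?? true, // assumed default
    attemptCount: 0,
    maxAttempts: input.maxAttempts ?? 3, // assumed default
    createdAt: now,
    updatedAt: now,
  };
}
```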

Add append-only job event/log records or audit events with stable IDs and these minimum fields:

  • id / export audit_event_id or equivalent job-event ID
  • assistantJobId
  • actorId
  • action: created, queued, started, log-appended, artifact-attached, approval-requested, approved, rejected, retry-requested, failed, canceled, succeeded
  • summary
  • metadata: bounded JSON payload for attempt number, artifact IDs, error code, runner name, or status transition
  • createdAt
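
The append-only event record and its bounded metadata can be sketched as follows. The names JobEvent, MAX_METADATA_CHARS, and appendJobEvent are hypothetical, the 2048-character bound is an assumed value, and an in-memory array stands in for whatever storage work-engine actually uses.

```typescript
// Hypothetical names: JobAction, JobEvent, MAX_METADATA_CHARS, appendJobEvent.
type JobAction =
  | "created" | "queued" | "started" | "log-appended" | "artifact-attached"
  | "approval-requested" | "approved" | "rejected" | "retry-requested"
  | "failed" | "canceled" | "succeeded";

interface JobEvent {
  id: string;
  assistantJobId: string;
  actorId: string;
  action: JobAction;
  summary: string;
  metadata: Record<string, unknown>;
  createdAt: string;
}

// Assumed bound on the serialized metadata payload, measured in characters.
const MAX_METADATA_CHARS = 2048;

// Append-only: returns a new array and never mutates or reorders earlier entries.
function appendJobEvent(log: readonly JobEvent[], event: JobEvent): readonly JobEvent[] {
  if (JSON.stringify(event.metadata).length > MAX_METADATA_CHARS) {
    throw new Error("job event metadata exceeds the bounded size");
  }
  return [...log, Object.freeze({ ...event })];
}
```

Oversized payloads such as raw transcripts are rejected here. Per the rules above, they would be stored as log artifacts and referenced through logRefs instead.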

DynamoDB requirements:

  • Production tables must be declared in lambda-functions/template.full.yaml; production code must not create unmanaged tables on cold start.
  • Add environment variables for any new tables, including DATAOPS_ASSISTANT_JOBS_TABLE and, if implemented separately, DATAOPS_AUDIT_EVENTS_TABLE.
  • Use stack-scoped table names, PAY_PER_REQUEST, point-in-time recovery, retain policy, DataOps tags, and least-privilege IAM for WorkEngineFunction.
  • Local/test mode may auto-create local tables through work-engine/src/db/setup.ts.
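
The split between SAM-managed production tables and auto-created local tables could be enforced when the table name is resolved. In this sketch only DATAOPS_ASSISTANT_JOBS_TABLE comes from this issue; DATAOPS_LOCAL_MODE, the fallback table name, and resolveAssistantJobsTable are hypothetical placeholders for however the repo detects local/test mode.

```typescript
// DATAOPS_LOCAL_MODE and "local-assistant-jobs" are hypothetical; adjust to repo conventions.
interface TableConfig {
  tableName: string;
  allowAutoCreate: boolean;
}

// Takes the environment as an argument so the function is easy to test.
function resolveAssistantJobsTable(env: Record<string, string | undefined>): TableConfig {
  const isLocal = env["DATAOPS_LOCAL_MODE"] === "1";
  const tableName = env["DATAOPS_ASSISTANT_JOBS_TABLE"];
  if (tableName) {
    // Auto-create stays off in production even when the name is configured.
    return { tableName, allowAutoCreate: isLocal };
  }
  if (isLocal) {
    return { tableName: "local-assistant-jobs", allowAutoCreate: true };
  }
  // In production the table must come from the SAM template, so fail fast.
  throw new Error("DATAOPS_ASSISTANT_JOBS_TABLE is not set");
}
```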

API

Add authenticated work-engine API routes. Exact route names can follow repo conventions, but they must support:

  • Create/update draft job intake with assistantType, taskId, bundleId, inputRefs, approvalRequired, and retry policy.
  • Submit a draft job to queued.
  • List jobs with filters for status, assistantType, taskId, bundleId, and jobs needing approval.
  • Fetch one job with its related artifacts and bounded event/log timeline.
  • Append bounded job log/event entries from trusted backend code.
  • Transition queued -> running -> waiting_approval|succeeded|failed with validation.
  • Retry failed or rejected jobs without losing the original attempt history.
  • Approve or reject jobs that are in waiting_approval; rejection requires a reason.
  • Cancel jobs that are not terminal.
  • Attach output artifacts to jobs and reflect references on the linked task/bundle.

Validation requirements:

  • Reject unknown statuses and invalid state transitions.
  • Reject jobs without at least one of taskId or bundleId.
  • Reject approval/rejection on jobs not in waiting_approval.
  • Reject retry when attemptCount >= maxAttempts unless the API explicitly records a privileged override.
  • Redact or reject secrets in logs and errors where detectable.
  • Return consistent 400/401/404 responses matching existing work-engine API style.
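
As one example of the validation rules above, the approval and rejection checks could look like this. DecisionRequest, ValidationResult, and validateDecision are hypothetical names, and the error shape is an assumption; the real responses should match the existing work-engine API style.

```typescript
// Hypothetical names: DecisionRequest, ValidationResult, validateDecision.
interface DecisionRequest {
  action: "approve" | "reject";
  reason?: string;
}

type ValidationResult =
  | { ok: true }
  | { ok: false; httpStatus: 400; error: string };

// Checks the two rules from the list: the job must be in waiting_approval,
// and a rejection must carry a non-empty reason.
function validateDecision(jobStatus: string, request: DecisionRequest): ValidationResult {
  if (jobStatus !== "waiting_approval") {
    return { ok: false, httpStatus: 400, error: `job is ${jobStatus}, not waiting_approval` };
  }
  if (request.action === "reject" && !(request.reason ?? "").trim()) {
    return { ok: false, httpStatus: 400, error: "rejection requires a reason" };
  }
  return { ok: true };
}
```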

Portal UI

Integrate assistant jobs into the existing unified portal UI:

  • Add operator visibility for assistant jobs from linked task rows and bundle detail pages.
  • Add an Assistants view or panel that lists recent jobs, running jobs, failed jobs, and jobs waiting for approval.
  • Show assistant type, linked workflow/task, status, attempt count, last update time, and next available action.
  • Allow the operator to create a Podcast Assistant job from a podcast task or bundle by selecting existing task/bundle context and adding input references.
  • Allow retry, cancel, approve, and reject actions from the UI when state allows them.
  • Show a bounded timeline/log preview with links to full log artifacts where applicable.
  • Show output artifacts on the linked task/bundle after they are attached.
  • Failed assistant jobs must surface as operator-visible risk in the Assistants view, the workflow detail, or both, and should produce an automation-failure notification where appropriate.
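
The "next available action" shown in the UI can be derived from the job status and attempt counters. The mapping below is an assumption for illustration, and OperatorAction and availableActions are hypothetical names; the real rules should come from the lifecycle contract that work-engine implements.

```typescript
// Hypothetical names: OperatorAction, availableActions. The mapping is an assumption.
type OperatorAction = "submit" | "approve" | "reject" | "retry" | "cancel";

function availableActions(
  status: string,
  attemptCount: number,
  maxAttempts: number,
): OperatorAction[] {
  const attemptsLeft = attemptCount < maxAttempts;
  switch (status) {
    case "draft":
      return ["submit", "cancel"];
    case "queued":
    case "running":
    case "retrying":
      return ["cancel"];
    case "waiting_approval":
      return ["approve", "reject", "cancel"];
    case "rejected":
      return attemptsLeft ? ["retry", "cancel"] : ["cancel"];
    case "failed":
      return attemptsLeft ? ["retry"] : [];
    default:
      return []; // approved, succeeded, canceled, or an unknown status
  }
}
```

The UI would render buttons only for the returned actions, while the API still validates every request independently.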

Podcast Assistant Boundary

Use assistants/podcast as the first assistant type, but do not require production Telegram, Groq, Heru, Codex, or Claude credentials for normal tests.

  • Preserve the current safe unit-test behavior of uv run --project assistants/podcast pytest.
  • Add a dry-run or fake-runner boundary that can turn a podcast job payload into deterministic output metadata for tests.
  • Do not treat assistants/podcast/inbox/, assistants/podcast/documents/, or assistants/podcast/heru_runs/ as production storage.
  • Real Telegram/Heru processing may remain opt-in and must not be required for CI.
  • [HUMAN] Real production assistant execution with Telegram/Heru credentials, if connected in this issue, requires manual verification before closing any human-only rollout checklist.
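
A deterministic fake runner for the dry-run boundary could look like the sketch below. It is not the real Podcast Assistant runner: PodcastJobPayload, FakeRunResult, stableHash, fakePodcastRun, and the artifact ID format are all hypothetical. The output depends only on the payload, with no clock, network, randomness, or credentials involved.

```typescript
// Hypothetical names and ID format; a stand-in for tests, not the real runner.
interface PodcastJobPayload {
  jobId: string;
  inputRefs: string[];
  approvalRequired: boolean;
}

interface FakeRunResult {
  jobId: string;
  nextStatus: "waiting_approval" | "succeeded";
  outputArtifactIds: string[];
  summary: string;
}

// FNV-1a-style 32-bit hash over UTF-16 code units; stable across runs and machines.
function stableHash(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return ("00000000" + hash.toString(16)).slice(-8);
}

function fakePodcastRun(payload: PodcastJobPayload): FakeRunResult {
  // Sort a copy so the result does not depend on the order of input references.
  const refs = [...payload.inputRefs].sort();
  const digest = stableHash(JSON.stringify([payload.jobId, refs]));
  return {
    jobId: payload.jobId,
    nextStatus: payload.approvalRequired ? "waiting_approval" : "succeeded",
    outputArtifactIds: [`artifact_dryrun_${digest}`],
    summary: `dry-run output for ${refs.length} input reference(s)`,
  };
}
```

Because the artifact ID is derived from the payload, repeated test runs can assert on exact metadata without snapshots that drift.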

Export And Data Safety

Portable exports must include assistant job data safely:

  • Add assistant_jobs.jsonl to normal exports once the entity is implemented.

  • Include job IDs, relationships, statuses, input/output/log references, retry/approval metadata, sanitized errors, and timestamps.

  • Include job log/audit entities in audit_events.jsonl or a documented exported job-event entity. Prefer the existing audit_events.jsonl contract from docs/v1-execution-data-safety.md unless implementation constraints require a documented alternative.

  • Remove assistant_jobs from manifest.omitted_entities after export support is implemented.

  • Keep artifacts omitted only if #29 (Define artifact model and storage policy) is not merged into the implementation branch; if #29 is merged, relationship validation must verify output artifact IDs.

  • Validate relationship integrity for job task_id, bundle_id, requested_by, output_artifact_ids, and log/artifact references.

  • Normal exports must exclude secrets, session tokens, raw credentials, signed URLs, and unbounded raw runner transcripts.

  • Dry-run import validation must report assistant job inserts/updates without writing production data.

Acceptance Criteria

  • work-engine defines typed assistant job status, job record, input reference, approval, retry, and job event/log contracts that align with docs/v1-workflow-data-model.md and docs/v1-execution-state-schema.md.

  • Production DynamoDB/SAM configuration owns the new durable assistant-job storage and any audit/log storage; local/test auto-create remains available only for local/test mode.

  • Authenticated APIs support job intake, submit, list/filter, detail, lifecycle transitions, bounded logs/events, retry, cancel, approval, rejection, and artifact attachment.

  • API validation rejects invalid state transitions, missing task/bundle context, invalid retries, invalid approvals, unknown statuses, and malformed references.

  • Task and bundle records can show assistant job references without embedding job internals or raw logs.

  • Portal UI exposes assistant jobs from workflow context and provides an operator-visible queue for running, failed, and approval-needed jobs.

  • Podcast Assistant has a deterministic dry-run/fake-runner integration path that can create or update a podcast assistant job and output artifact metadata without external credentials.

  • Assistant failures and exhausted retries are visible to operators and produce enough context to retry, cancel, or file a follow-up issue.

  • Approval-required jobs cannot become succeeded until approved; rejected jobs preserve the rejection reason and can be retried according to retry policy.

  • Portable export writes and validates assistant_jobs.jsonl; manifest omissions are updated accurately; relationship checks cover assistant job links and exported job events/logs.

  • Normal export redacts secrets and does not include raw Telegram tokens, API keys, session tokens, or unbounded assistant transcript/log bodies.

  • Existing work-engine task, bundle, template, notification, file, recurring, export, and auth tests still pass.

  • Existing Podcast Assistant unit tests still pass without Telegram, Groq, Heru, Codex, or Claude credentials.

  • Tester captures screenshots for changed assistant/job UI surfaces and verifies they are not 404s, blank, broken, or overlapping.

  • [HUMAN] Any real Telegram/Heru/Codex/Claude production execution path is manually verified with real credentials before being treated as production-ready.

Scenario: Create and submit assistant job from workflow context

Given: an authenticated operator, an active podcast bundle, and at least one task in that bundle
When: the operator creates a Podcast Assistant job with source input references and submits it
Then: the job is stored as queued, references the bundle/task, appears in the assistant queue, and appears from the linked workflow context.

Scenario: Job runs and waits for approval

Given: a queued podcast assistant job with approvalRequired=true
When: the deterministic runner records running, attaches output artifact metadata, and finishes
Then: the job becomes waiting_approval, output artifact IDs are visible, and the linked task/bundle shows the assistant job reference.

Scenario: Approve output

Given: a job in waiting_approval with output artifacts
When: the operator approves it
Then: the job records the approver and timestamp, transitions to approved or succeeded according to the implemented lifecycle contract, and appends an immutable approval event.

Scenario: Reject and retry output

Given: a job in waiting_approval
When: the operator rejects it with a reason and then retries it
Then: the rejection reason remains in history, retry attempt count increments or a linked retry job is created, and the new attempt can progress without overwriting the original history.

Scenario: Failure and exhausted retries

Given: a job with maxAttempts=2
When: the runner fails twice with sanitized errors
Then: the job becomes failed, no further non-override retry is allowed, and the UI shows the failure with retry/cancel/follow-up context as appropriate.
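
The scenario above can be walked through in a few lines. This sketch assumes attemptCount counts attempts started (so a job on its first run has attemptCount 1) and collapses the intermediate retrying and queued steps. AttemptState, failAttempt, requestRetry, and the RUNNER_ERROR code are hypothetical names.

```typescript
// Hypothetical names; attemptCount is assumed to count attempts started.
interface AttemptState {
  status: "running" | "failed";
  attemptCount: number;
  maxAttempts: number;
  lastError?: { code: string; summary: string };
}

// Records a sanitized error and marks the current attempt as failed.
function failAttempt(job: AttemptState, code: string, summary: string): AttemptState {
  return { ...job, status: "failed", lastError: { code, summary } };
}

// Starts a new attempt unless the limit is reached and no override is given.
function requestRetry(job: AttemptState, privilegedOverride = false): AttemptState {
  if (job.status !== "failed") {
    throw new Error("only failed jobs can be retried here");
  }
  if (job.attemptCount >= job.maxAttempts && !privilegedOverride) {
    throw new Error("retry limit reached");
  }
  return { ...job, status: "running", attemptCount: job.attemptCount + 1, lastError: undefined };
}

// Scenario: maxAttempts=2 and the runner fails twice.
let job: AttemptState = { status: "running", attemptCount: 1, maxAttempts: 2 };
job = failAttempt(job, "RUNNER_ERROR", "first attempt failed");
job = requestRetry(job); // allowed: 1 < 2
job = failAttempt(job, "RUNNER_ERROR", "second attempt failed");

let retryBlocked = false;
try {
  requestRetry(job); // blocked: 2 >= 2 and no override
} catch (error) {
  retryBlocked = true;
}
```

A privileged override, if the API supports one, would pass the second argument and should also be recorded in the job event log.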

Scenario: Invalid transitions are blocked

Given: jobs in terminal and non-approval states
When: API clients try to approve a non-approval job, cancel a succeeded job, retry beyond max attempts, or mark an approval-required job as succeeded without approval
Then: the API returns 400 and persists no invalid transition event.

Scenario: Export and validate assistant job data

Given: users, tasks, bundles, assistant jobs, output artifact references, and job events/logs exist in local test data
When: export:data, validate:export, and dry-run import are run
Then: assistant_jobs.jsonl is present, manifest counts/checksums are correct, omitted entities are accurate, references validate, and secrets/raw unbounded logs are absent.
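
The "secrets are absent" part of this scenario could be enforced when records are serialized for assistant_jobs.jsonl. The sketch below drops fields by key name only; SENSITIVE_KEY, redact, and toJsonl are hypothetical names, and the key pattern is an assumed starting point, not the project's redaction policy. Secrets embedded inside string values would need separate detection.

```typescript
// Hypothetical names and key pattern; key-based redaction only.
const SENSITIVE_KEY = /(token|secret|password|credential|api[_-]?key|signed[_-]?url)/i;

// Recursively copies a value, omitting object keys that look sensitive.
function redact(value: unknown): unknown {
  if (Array.isArray(value)) {
    return value.map(redact);
  }
  if (value !== null && typeof value === "object") {
    const source = value as Record<string, unknown>;
    const out: Record<string, unknown> = {};
    for (const key of Object.keys(source)) {
      if (!SENSITIVE_KEY.test(key)) {
        out[key] = redact(source[key]);
      }
    }
    return out;
  }
  return value;
}

// One JSON document per line, each terminated by a newline.
function toJsonl(records: readonly object[]): string {
  return records.map((record) => JSON.stringify(redact(record)) + "\n").join("");
}
```

For example, a record with a nested api_key field is written without that field, while IDs, statuses, and references pass through unchanged.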

Scenario: Existing workflow behavior remains intact

Given: existing tasks, bundles, templates, recurring configs, files, notifications, and auth flows
When: the full relevant work-engine unit/typecheck/build and UI tests run
Then: existing behavior still passes and assistant job additions do not regress task completion, proof requirements, waiting tasks, or bundle detail rendering.

Out Of Scope

  • Grooming or implementing the artifact storage policy itself; that belongs to Define artifact model and storage policy #29.
  • Replacing the current vanilla work-engine frontend framework.
  • Building a general external queue service, EventBridge worker fleet, or background orchestration platform beyond what is needed for V1 assistant job lifecycle state.
  • Requiring real Telegram, Groq, Heru, Codex, Claude, Google Drive, Dropbox, or S3 credentials in normal automated tests.
  • Moving production binaries or large generated files into DynamoDB.
  • Modifying ../podcast-assistant, ../dtc-operations, ../datatasks, or other source repos.
  • Automatically pushing assistant outputs to public websites, Slack, Telegram, email, Google Docs, or other external systems.
  • Full generalized audit-event product UX beyond assistant job lifecycle history needed here.

Dependencies

  • Define artifact model and storage policy #29 should define the artifact model and storage policy used by outputArtifactIds, artifactRefs, log artifacts, storage URIs, artifact statuses, and artifact export validation.
  • The existing V1 runtime architecture in docs/v1-runtime-architecture.md remains the deployment boundary: public Python portal brokers authenticated /work/api/* calls to private WorkEngineFunction.
  • SAM/CloudFormation must remain the owner of production DynamoDB tables.
  • Existing assistant docs and code under assistants/podcast remain the local Podcast Assistant reference and test suite.

Verification Commands

Run from repo root unless noted otherwise:

npm --prefix work-engine test
npm --prefix work-engine run typecheck
npm --prefix work-engine run build
npm --prefix work-engine run test:e2e
uv run --project assistants/podcast pytest
npm --prefix work-engine run export:data -- .tmp/exports/assistant-jobs
npm --prefix work-engine run validate:export -- .tmp/exports/assistant-jobs
npm --prefix work-engine run dry-run:import -- .tmp/exports/assistant-jobs

If lambda-functions/template.full.yaml or deployment workflow files change, also run:

sam validate --template-file lambda-functions/template.full.yaml

Tester must include screenshot evidence for the Assistants queue/panel and any changed task or bundle detail assistant-job surfaces.
