becloudready/devops-launchpad

AI Cloud Engineer Bootcamp — Syllabus

The week-by-week topics, labs, and reference resources I use to run my live bootcamp.

For the current cohort dates, format, price, and registration, see the course page: → becloudready.com/programs/ai-cloud-engineer-bootcamp

Format note (updated): Earlier cohorts ran as a 5-day Mon–Fri intensive. Based on feedback that the pace was overwhelming, the program now runs across 6 weeks, every Monday, 6:00–8:00 PM EST. The total live instructional time (~12 hours) is the same, just spread out so you can actually finish the labs between sessions. You also get ongoing Slack support across all 6 weeks, plus a 1-hour interview prep session after completion.

Prerequisites: basic Linux CLI, comfort with Bash or Python, an AWS Free Tier account.


Week 0 — Before You Show Up

Get these out of the way so we don't burn live time on setup.

  • AWS Free Tier account, IAM admin user with access keys, AWS CLI installed
  • Docker Desktop installed and docker run hello-world works
  • Terraform CLI installed (terraform -v)
  • kubectl and helm installed
  • VS Code (or your editor) + Git configured
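If you want a quick way to confirm the checklist above before the first session, here is a minimal sketch that looks for each tool on your PATH. The executable names (`aws`, `docker`, `terraform`, `kubectl`, `helm`, `git`) are my assumption of how the tools are installed on your machine, and finding a binary does not prove it is configured (for example, it does not check your AWS access keys).

```python
import shutil

# Executable names assumed for the tools in the Week 0 checklist.
REQUIRED_TOOLS = ["aws", "docker", "terraform", "kubectl", "helm", "git"]


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [name for name in tools if which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All Week 0 tools found on PATH.")
```

The `which` parameter exists only so the lookup can be swapped out in a test; in normal use you call `missing_tools()` with no arguments.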

Light pre-reading if you have time:


Week 1 — Cloud Foundations

Topics

  • VPC, IAM, EC2, S3, RDS — deep dive
  • Networking, Security Groups, NACLs
  • Cost optimisation & billing
  • AWS CLI & SDK fundamentals
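To make the VPC networking topics concrete, here is a small sketch of how a VPC CIDR can be carved into equal-size subnets per tier and availability zone. The `10.0.0.0/16` range, the `/24` subnet size, the tier names, and the two-AZ layout are illustrative assumptions, not the lab's required values.

```python
import ipaddress

# AWS reserves 5 addresses in every subnet (the first four and the last).
AWS_RESERVED_PER_SUBNET = 5


def plan_subnets(vpc_cidr, new_prefix, tiers, az_count):
    """Assign consecutive equal-size subnets to each tier, one per AZ."""
    vpc = ipaddress.ip_network(vpc_cidr)
    blocks = vpc.subnets(new_prefix=new_prefix)  # generator, in address order
    return {tier: [str(next(blocks)) for _ in range(az_count)] for tier in tiers}


def usable_hosts(subnet_cidr):
    """Addresses left for instances after the AWS reservation."""
    return ipaddress.ip_network(subnet_cidr).num_addresses - AWS_RESERVED_PER_SUBNET


plan = plan_subnets("10.0.0.0/16", 24, ["public", "private"], az_count=2)
# plan["public"]  -> ["10.0.0.0/24", "10.0.1.0/24"]
# plan["private"] -> ["10.0.2.0/24", "10.0.3.0/24"]
```

Planning the ranges up front like this makes it harder to end up with overlapping subnets when you add a third tier or another availability zone later.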

Lab (this week)

Stand up a 3-tier VPC (public/private subnets, NAT, IGW), launch an EC2 instance into the private subnet, attach a bastion, query S3 via CLI from the instance, and write a least-privilege IAM policy that scopes access to a single bucket. Lab support runs in Slack all week.
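For the IAM part of the lab, a policy scoped to a single bucket could look roughly like the sketch below, which builds the policy document in Python. The bucket name `my-lab-bucket` and the choice of read-only actions are assumptions for illustration; your lab may need a different action set.

```python
import json


def single_bucket_read_policy(bucket):
    """Build a read-only policy document limited to one S3 bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is a bucket-level action, so it targets the bucket ARN.
                "Sid": "ListBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Reading is an object-level action, so it targets the objects.
                "Sid": "ReadObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


policy = single_bucket_read_policy("my-lab-bucket")  # hypothetical bucket name
policy_json = json.dumps(policy, indent=2)
```

The split into two statements matters: `s3:ListBucket` applies to the bucket itself, while `s3:GetObject` applies to the objects under it, so a single `Resource` covering only one of the two ARNs would leave one action ineffective.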


Week 2 — Containers

Topics

  • Docker — images, layers, networking, volumes, multi-stage builds
  • Kubernetes architecture — pods, deployments, services, ingress
  • EKS cluster setup on AWS
  • Helm charts and GitOps with Argo CD

Lab (this week)

Containerise a Python/FastAPI app with a multi-stage Dockerfile (<50 MB final image), push to ECR, deploy to an EKS cluster via a Helm chart, then wire Argo CD to auto-sync the deployment from a Git repo.
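Two small details of this lab are easy to get wrong: the ECR image reference and the size target. The sketch below composes the reference and checks a size in bytes against the 50 MB budget. The account ID, region, repository name, and tag are placeholders, and treating "MB" as 1,000,000 bytes is my assumption.

```python
import re


def ecr_image_uri(account_id, region, repo, tag):
    """Compose the registry reference used for pushing to and pulling from ECR."""
    if not re.fullmatch(r"\d{12}", account_id):
        raise ValueError("AWS account IDs are 12 digits")
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repo}:{tag}"


def within_size_budget(size_bytes, budget_mb=50):
    """True if the image is under the budget (1 MB taken as 1,000,000 bytes)."""
    return size_bytes < budget_mb * 1_000_000


# Placeholder values; substitute your own account, region, and repository.
uri = ecr_image_uri("123456789012", "us-east-1", "fastapi-app", "abc1234")
```

The size in bytes can be read with `docker image inspect --format '{{.Size}}' IMAGE` and passed to `within_size_budget`.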


Week 3 — Terraform

Topics

  • Terraform modules & workspaces
  • Remote state with S3 + DynamoDB locking
  • Drift detection and remediation
  • Security scanning with tfsec
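One thing that tends to confuse people when combining workspaces with remote state is where each workspace's state file actually lands in the bucket. As I understand the S3 backend's documented defaults, the `default` workspace uses the configured `key` as-is, and every other workspace is stored under a `workspace_key_prefix` that defaults to `env:`. The sketch below just reproduces that naming rule; the key `network/terraform.tfstate` is a hypothetical example.

```python
def state_object_key(workspace, key, workspace_key_prefix="env:"):
    """Return the S3 object key where a workspace's state is expected to live."""
    if workspace == "default":
        return key
    return f"{workspace_key_prefix}/{workspace}/{key}"


# Hypothetical backend key, one path per workspace.
dev_key = state_object_key("dev", "network/terraform.tfstate")
prod_key = state_object_key("prod", "network/terraform.tfstate")
```

Knowing the path is handy when you write the IAM policy for the state bucket, because dev and prod state can then be scoped to different prefixes.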

Lab (this week)

Rewrite Week 1's manually built VPC + EC2 as Terraform modules with dev and prod workspaces, store state remotely with locking, and demo terraform plan catching drift.
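One way to turn the drift demo into something a scheduled job could run is `terraform plan -detailed-exitcode`, which exits with 0 when there is nothing to change, 1 on error, and 2 when the plan contains changes. The wrapper below is a sketch under the assumption that `terraform` is on PATH and the working directory is already initialised; the function names are my own.

```python
import subprocess

# Exit codes documented for `terraform plan -detailed-exitcode`.
PLAN_EXIT_MEANING = {0: "in-sync", 1: "error", 2: "drift-or-pending-changes"}


def classify_plan_exit(code):
    """Map a plan exit code to a short status label."""
    return PLAN_EXIT_MEANING.get(code, "unknown")


def check_drift(workdir="."):
    """Run a plan in workdir and return (status, plan output)."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return classify_plan_exit(result.returncode), result.stdout
```

Note that exit code 2 does not distinguish between drift in the real infrastructure and changes you made to the configuration yourself, so the demo works best right after a clean apply.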


Week 4 — DevOps / SRE

Topics

  • GitHub Actions — build, test, deploy workflows
  • Blue-green and canary deployment strategies
  • Container registry workflows + image scanning
  • Secrets management (AWS Secrets Manager / Vault)
  • SLOs, error budgets, alerting with Grafana / Datadog
  • Log aggregation and distributed tracing (OpenTelemetry)
  • Incident response playbooks

Lab (this week)

Wire a GitHub Actions workflow that, on push to main: runs tests → builds + scans a container → pushes to ECR (via OIDC, no static keys) → triggers Argo CD to roll out a blue-green deploy to EKS. Add Prometheus/Grafana dashboards and an OpenTelemetry-instrumented service, define one SLO with an error budget, and write a one-page runbook for a simulated incident.
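The SLO part of the lab comes down to a little arithmetic. The sketch below computes the error budget for an availability SLO and the burn rate for an observed error ratio. The 99.9% target and 30-day window are example values, not the lab's required numbers.

```python
def error_budget_minutes(slo, window_days=30):
    """Minutes of allowed unavailability in the window for a given SLO."""
    return (1 - slo) * window_days * 24 * 60


def burn_rate(observed_error_ratio, slo):
    """How fast the budget is being spent; 1.0 means exactly on budget."""
    return observed_error_ratio / (1 - slo)


budget = error_budget_minutes(0.999)    # about 43.2 minutes over 30 days
rate = burn_rate(0.002, 0.999)          # about 2.0, i.e. twice the sustainable pace
```

A burn rate of 2 sustained for the whole window would exhaust the budget in half the window, which is why alerts are usually set on burn rate rather than on the raw error ratio.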


Week 5 — AI, RAG, and AgentOps

Topics

  • Why every production AI agent is a distributed system, not a notebook
  • LLM serving — hosted endpoints vs. self-hosted (vLLM, TGI) on GPU nodes
  • Vector databases — pgvector, Pinecone; embedding workers; index lifecycle
  • RAG pipeline architecture — retrieval API, reranking, evals
  • AgentOps — agent orchestration, tool registries, observability for LLM calls, cost guardrails, prompt versioning
  • SQL-safety patterns for agents that touch real data (read-only roles, query budgets, schema allow-lists)
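To illustrate the SQL-safety patterns in the last bullet, here is a sketch that uses SQLite from the standard library as a stand-in for a real database. An authorizer callback enforces the read-only rule and the schema allow-list, and a row cap acts as a crude query budget. The table names and the budget value are hypothetical, and in a real deployment the same ideas would more likely be enforced with a read-only database role and server-side limits.

```python
import sqlite3

ALLOWED_TABLES = {"orders"}  # hypothetical schema allow-list
MAX_ROWS = 100               # hypothetical per-query row budget


def make_authorizer(allowed_tables):
    """Deny everything except SELECT statements reading allow-listed tables."""
    def authorizer(action, arg1, arg2, db_name, source):
        if action == sqlite3.SQLITE_SELECT:
            return sqlite3.SQLITE_OK
        if action == sqlite3.SQLITE_READ and arg1 in allowed_tables:
            return sqlite3.SQLITE_OK  # arg1 is the table being read
        return sqlite3.SQLITE_DENY
    return authorizer


def build_demo_connection():
    """Create an in-memory database, then lock it down with the authorizer."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
    conn.execute("CREATE TABLE secrets (token TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 9.5), (2, 20.0)])
    conn.execute("INSERT INTO secrets VALUES ('hunter2')")
    conn.commit()
    conn.set_authorizer(make_authorizer(ALLOWED_TABLES))
    return conn


def run_guarded(conn, sql, max_rows=MAX_ROWS):
    """Run agent-generated SQL, rejecting denied statements and oversized results."""
    try:
        rows = conn.execute(sql).fetchmany(max_rows + 1)
    except sqlite3.DatabaseError as exc:
        raise PermissionError(f"query rejected: {exc}") from exc
    if len(rows) > max_rows:
        raise PermissionError("query exceeded the row budget")
    return rows
```

With this in place, `run_guarded(conn, "SELECT id, total FROM orders")` returns rows, while a query against `secrets` or any `INSERT`, `UPDATE`, or `DROP` raises `PermissionError`. The authorizer works on the parsed statement, so it avoids the string-matching pitfalls of checking whether the SQL text starts with `SELECT`.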

Lab (this week)

Layer a small RAG service on top of the EKS cluster from earlier weeks: pgvector as the vector store, a small embedding worker, a FastAPI retrieval service, and an OpenAI-compatible LLM endpoint (vLLM or a hosted model). Instrument the LLM calls with OpenTelemetry, set a per-request token budget, and add a Grafana panel for cost-per-request.
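Before wiring up pgvector, it can help to see what the retrieval step computes. The sketch below ranks documents by cosine similarity in plain Python. It is a toy stand-in for the vector store, with made-up two-dimensional embeddings; in the lab the same ranking would be done by the database over real embedding vectors.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, docs, k=2):
    """Return the ids of the k documents most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), doc_id) for doc_id, vec in docs.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


# Made-up embeddings for illustration only.
docs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.9, 0.1]}
hits = top_k([1.0, 0.0], docs, k=2)  # -> ["a", "c"]
```

This brute-force scan is linear in the number of documents, which is the cost a vector index is there to avoid once the corpus grows.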


Week 6 — Project: End-to-End Agentic AI Data Engineering (db-agent)

The capstone. You take everything from Weeks 1–5 and ship one working system: an agentic text-to-SQL data engineering pipeline, deployed on the cloud you've been building all along.

What you'll build

  • A working clone of the open-source db-agent pattern on your own AWS account
  • Backend: FastAPI retrieval service + LLM endpoint (Week 5) running on the EKS cluster (Week 2), provisioned by Terraform (Week 3), deployed via the GitHub Actions + Argo CD pipeline (Week 4)
  • Read-only SQL execution path with query budgets and a schema allow-list (the safety pattern that keeps agents out of production data)
  • Observability — traces, token-cost dashboards, an SLO on retrieval latency
  • A one-page architecture diagram, a runbook, and a short demo video
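The observability deliverable boils down to a few numbers per request. Here is a sketch of the calculations behind a token-cost panel and a latency SLO check. The per-1k-token prices, the budget, and the latency threshold are all made-up parameters; plug in whatever your model endpoint and SLO actually use.

```python
def request_cost_usd(prompt_tokens, completion_tokens, price_in_per_1k, price_out_per_1k):
    """Cost of one LLM call given separate input and output prices per 1k tokens."""
    return (prompt_tokens / 1000) * price_in_per_1k + (completion_tokens / 1000) * price_out_per_1k


def within_token_budget(prompt_tokens, completion_tokens, budget):
    """True if the request's total tokens stay within the per-request budget."""
    return prompt_tokens + completion_tokens <= budget


def latency_slo_met(latencies_ms, threshold_ms, target_ratio):
    """True if enough requests finished within the latency threshold."""
    if not latencies_ms:
        return True  # no traffic, nothing violated
    good = sum(1 for latency in latencies_ms if latency <= threshold_ms)
    return good / len(latencies_ms) >= target_ratio


# Hypothetical prices and measurements.
cost = request_cost_usd(1000, 500, price_in_per_1k=0.5, price_out_per_1k=1.5)
ok = latency_slo_met([100, 120, 300, 90], threshold_ms=200, target_ratio=0.75)
```

Emitting `cost` as a metric attribute on each LLM span is one way to feed the dashboard, since it keeps the cost tied to the trace that produced it.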

This is the project you point recruiters at. End-to-end, real cloud, real AI infra, your code.


After the Bootcamp

  • 1-hour interview prep session the week after Week 6 — resume review, LinkedIn audit, technical-interview question patterns for Cloud / DevOps / AI Cloud Engineer roles.
  • Lifetime access to the beCloudReady Slack community.
  • Eligible to re-attend future cohorts at no extra cost as the curriculum evolves.

— Chandan Kumar, beCloudReady

About

Free DevOps Roadmap 2026 — learn AWS, Kubernetes, Terraform, Docker, CI/CD & SRE with hands-on labs. Land a Cloud/DevOps job
