The week-by-week topics, labs, and reference resources I use to run my live bootcamp.
For the current cohort dates, format, price, and registration, see the course page: → becloudready.com/programs/ai-cloud-engineer-bootcamp
Format note (updated): Earlier cohorts ran as a 5-day Mon–Fri intensive. Based on feedback that the pace was overwhelming, the program now runs across 6 weeks, every Monday, 6:00–8:00 PM EST. The total instructional time (~12 hours live) is the same, just spread out so you can actually finish the labs between sessions. Slack support runs across all 6 weeks, and a 1-hour interview prep session follows completion.
Prerequisites: basic Linux CLI, comfort with Bash or Python, an AWS Free Tier account.
Get these out of the way so we don't burn live time on setup.
- AWS Free Tier account, IAM admin user with access keys, AWS CLI installed
- Docker Desktop installed and docker run hello-world works
- Terraform CLI installed (terraform -v)
- kubectl and helm installed
- VS Code (or your editor) + Git configured
Light pre-reading if you have time:
- LFS101 — Intro to Linux Chapters 1–5
- Git & GitHub — my YouTube playlist
- AWS Cloud Practitioner Essentials — first 2 modules
Week 1
Topics
- VPC, IAM, EC2, S3, RDS — deep dive
- Networking, Security Groups, NACLs
- Cost optimisation & billing
- AWS CLI & SDK fundamentals
Lab (this week)
Stand up a 3-tier VPC (public/private subnets, NAT, IGW), launch an EC2 instance into the private subnet, attach a bastion, query S3 via CLI from the instance, and write a least-privilege IAM policy that scopes access to a single bucket. Lab support runs in Slack all week.
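As a starting point for the least-privilege policy in this lab, here is a minimal sketch that builds a policy document scoped to one bucket. The function name and the bucket name my-lab-bucket are placeholders for illustration, not part of the course materials. The split into two statements reflects how S3 permissions work: s3:ListBucket applies to the bucket ARN, while object actions apply to the objects under it.

```python
import json


def single_bucket_policy(bucket: str, read_only: bool = True) -> dict:
    """Build an IAM policy document that only touches one S3 bucket."""
    object_actions = ["s3:GetObject"]
    if not read_only:
        object_actions += ["s3:PutObject", "s3:DeleteObject"]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is a bucket-level action, so it targets the bucket ARN.
                "Sid": "ListOnlyThisBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Object-level actions target the keys inside the bucket.
                "Sid": "ObjectAccessInThisBucket",
                "Effect": "Allow",
                "Action": object_actions,
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


if __name__ == "__main__":
    # "my-lab-bucket" is a hypothetical name; use your own bucket.
    print(json.dumps(single_bucket_policy("my-lab-bucket"), indent=2))
```

The printed JSON can be saved to a file and attached to the instance role; anything outside the named bucket stays implicitly denied.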
References
- AWS Technical Essentials
- VPC docs · IAM docs · EC2 docs · S3 docs
- Well-Architected Labs
- quick-labs — my repo for fast instance launchers
Week 2
Topics
- Docker — images, layers, networking, volumes, multi-stage builds
- Kubernetes architecture — pods, deployments, services, ingress
- EKS cluster setup on AWS
- Helm charts and GitOps with Argo CD
Lab (this week)
Containerise a Python/FastAPI app with a multi-stage Dockerfile (<50 MB final image), push to ECR, deploy to an EKS cluster via a Helm chart, then wire Argo CD to auto-sync the deployment from a Git repo.
References
- Docker — Get Started · Multi-stage builds
- Learn Kubernetes Basics
- EKS Workshop
- Helm — Quickstart
- Argo CD — Getting Started
- docker-tutorials · kubernetes-tutorials
Week 3
Topics
- Terraform modules & workspaces
- Remote state with S3 + DynamoDB locking
- Drift detection and remediation
- Security scanning with tfsec
Lab (this week)
Rewrite Week 1's manually built VPC + EC2 as Terraform modules with dev and prod workspaces, store state remotely with locking, and demo terraform plan catching drift.
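One way to script the drift demo is to lean on the -detailed-exitcode flag of terraform plan, which exits 0 when the plan is empty, 1 on error, and 2 when changes are present. The sketch below is an illustration, not the lab solution: the function names and status labels are made up here, and exit code 2 covers any pending change, so it only indicates drift when the configuration itself has not been edited.

```python
import subprocess

# Exit codes used by `terraform plan -detailed-exitcode`.
PLAN_STATUS = {0: "in-sync", 1: "error", 2: "changes-pending"}


def classify_plan(exit_code: int) -> str:
    """Map a plan exit code to a short status label (labels are illustrative)."""
    return PLAN_STATUS.get(exit_code, "unknown")


def check_drift(workdir: str) -> str:
    """Run a plan in `workdir` and report whether it found pending changes.

    Assumes the terraform CLI is on PATH and `terraform init` has already run.
    """
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return classify_plan(result.returncode)
```

For the demo, change a tag on the EC2 instance in the console, then call check_drift on the module directory; with an unchanged configuration it should report changes-pending.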
References
- Terraform — Get Started on AWS
- HashiCorp Developer — Terraform tutorials
- Terraform Registry
- EKS Terraform Workshop
- terraform-tutorials
Week 4
Topics
- GitHub Actions — build, test, deploy workflows
- Blue-green and canary deployment strategies
- Container registry workflows + image scanning
- Secrets management (AWS Secrets Manager / Vault)
- SLOs, error budgets, alerting with Grafana / Datadog
- Log aggregation and distributed tracing (OpenTelemetry)
- Incident response playbooks
Lab (this week)
Wire a GitHub Actions workflow that, on push to main: runs tests → builds + scans a container → pushes to ECR (via OIDC, no static keys) → triggers Argo CD to roll out a blue-green deploy to EKS. Add Prometheus/Grafana dashboards and an OpenTelemetry-instrumented service, define one SLO with an error budget, and write a one-page runbook for a simulated incident.
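The error budget for the SLO in this lab is plain arithmetic, so it is worth working through once by hand. The sketch below assumes an availability-style SLO expressed as a fraction (0.999 for 99.9%) and a 30-day window; both function names are illustrative.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Minutes of allowed unavailability for a time-based availability SLO."""
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be a fraction between 0 and 1")
    window_minutes = window_days * 24 * 60
    return (1 - slo_target) * window_minutes


def budget_remaining(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of a request-based error budget still unspent (negative if overspent)."""
    allowed_failures = (1 - slo_target) * total_requests
    if allowed_failures == 0:
        return 0.0
    return 1 - failed_requests / allowed_failures


if __name__ == "__main__":
    # A 99.9% SLO over 30 days allows about 43.2 minutes of downtime.
    print(round(error_budget_minutes(0.999), 1))
    # 250 failures out of 1,000,000 requests leaves about 75% of the budget.
    print(round(budget_remaining(0.999, 1_000_000, 250), 2))
```

An alert that fires when the remaining budget drops below a chosen threshold is a natural first entry for the runbook.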
References
- GitHub Actions docs · Quickstart
- Google SRE Book · Site Reliability Workbook
- Prometheus — First Steps
- Grafana Labs tutorials
- OpenTelemetry documentation
- DORA — DevOps Research & Assessment — the four key metrics
- OWASP DevSecOps Guideline
- Snyk Learn
Week 5
Topics
- Why every production AI agent is a distributed system, not a notebook
- LLM serving — hosted endpoints vs. self-hosted (vLLM, TGI) on GPU nodes
- Vector databases — pgvector, Pinecone; embedding workers; index lifecycle
- RAG pipeline architecture — retrieval API, reranking, evals
- AgentOps — agent orchestration, tool registries, observability for LLM calls, cost guardrails, prompt versioning
- SQL-safety patterns for agents that touch real data (read-only roles, query budgets, schema allow-lists)
Lab (this week)
Layer a small RAG service on top of the EKS cluster from earlier weeks: pgvector as the vector store, a small embedding worker, a FastAPI retrieval service, and an OpenAI-compatible LLM endpoint (vLLM or a hosted model). Instrument the LLM calls with OpenTelemetry, set a per-request token budget, and add a Grafana panel for cost-per-request.
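The per-request token budget and the cost-per-request metric can be sketched independently of any particular LLM client. In the example below, the class names, function names, and prices are all hypothetical; substitute the token counts your tokenizer reports and the rates your model provider actually charges.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    # Hypothetical USD prices per 1,000 tokens, for illustration only.
    input_per_1k: float = 0.0005
    output_per_1k: float = 0.0015


class TokenBudgetExceeded(Exception):
    """Raised when the prompt alone uses up the per-request budget."""


def clamp_max_tokens(prompt_tokens: int, budget: int, requested_max: int) -> int:
    """Return a completion cap that keeps prompt + completion within budget."""
    remaining = budget - prompt_tokens
    if remaining <= 0:
        raise TokenBudgetExceeded(
            f"prompt uses {prompt_tokens} tokens, budget is {budget}"
        )
    return min(requested_max, remaining)


def request_cost_usd(prompt_tokens: int, completion_tokens: int,
                     pricing: Pricing = Pricing()) -> float:
    """Cost of one request, suitable for export as a metric."""
    return ((prompt_tokens / 1000) * pricing.input_per_1k
            + (completion_tokens / 1000) * pricing.output_per_1k)
```

The value returned by clamp_max_tokens would be passed as the completion limit on the LLM call, and request_cost_usd can be recorded as a span attribute or metric to feed the Grafana panel.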
References
- LangChain docs
- LlamaIndex docs
- pgvector · Pinecone docs
- vLLM · Text Generation Inference (TGI)
- OpenTelemetry for LLMs (Traceloop / OpenLLMetry)
- db-agent — my open-source reference for the SQL-safety + agent pattern
The capstone. You take everything from Weeks 1–5 and ship one working system: an agentic text-to-SQL data engineering pipeline, deployed on the cloud you've been building all along.
What you'll build
- A working clone of the open-source db-agent pattern on your own AWS account
- Backend: FastAPI retrieval service + LLM endpoint (Week 5) running on the EKS cluster (Week 2), provisioned by Terraform (Week 3), deployed via the GitHub Actions + Argo CD pipeline (Week 4)
- Read-only SQL execution path with query budgets and a schema allow-list (the safety pattern that keeps agents out of production data)
- Observability — traces, token-cost dashboards, an SLO on retrieval latency
- A one-page architecture diagram, a runbook, and a short demo video
This is the project you point recruiters at. End-to-end, real cloud, real AI infra, your code.
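To make the read-only path, schema allow-list, and query budget concrete, here is a small self-contained sketch using SQLite's authorizer hook from the Python standard library. It is a stand-in for illustration only, not the db-agent implementation: the table names, function allow-list, and row limit are assumptions, and on a production database the same idea would normally be enforced with a read-only role plus server-side limits.

```python
import sqlite3

ALLOWED_TABLES = {"orders"}                       # hypothetical schema allow-list
ALLOWED_FUNCTIONS = {"count", "sum", "avg", "min", "max"}
MAX_ROWS = 100                                    # hypothetical per-query row budget


class QueryBudgetExceeded(Exception):
    """Raised when a query returns more rows than the budget allows."""


def make_authorizer(allowed_tables, allowed_functions):
    def authorizer(action, arg1, arg2, db_name, source):
        # Permit SELECT statements as such.
        if action == sqlite3.SQLITE_SELECT:
            return sqlite3.SQLITE_OK
        # Permit column reads only from allow-listed tables (arg1 is the table).
        if action == sqlite3.SQLITE_READ and arg1 in allowed_tables:
            return sqlite3.SQLITE_OK
        # Permit a small set of SQL functions (arg2 is the function name).
        if (action == sqlite3.SQLITE_FUNCTION and arg2 is not None
                and arg2.lower() in allowed_functions):
            return sqlite3.SQLITE_OK
        # Everything else (writes, DDL, other tables) is refused.
        return sqlite3.SQLITE_DENY
    return authorizer


def lock_down(conn, allowed_tables=ALLOWED_TABLES,
              allowed_functions=ALLOWED_FUNCTIONS):
    """Install the authorizer so later statements on `conn` are checked."""
    conn.set_authorizer(make_authorizer(allowed_tables, allowed_functions))


def run_agent_query(conn, sql, max_rows=MAX_ROWS):
    """Execute agent-generated SQL and enforce the row budget."""
    cur = conn.execute(sql)  # raises sqlite3.DatabaseError if not authorized
    rows = cur.fetchmany(max_rows + 1)
    if len(rows) > max_rows:
        raise QueryBudgetExceeded(f"query returned more than {max_rows} rows")
    return rows
```

After lock_down(conn), a SELECT on orders succeeds, while a DELETE or a read from any table outside the allow-list fails when the statement is prepared, before any data is touched. A time budget is left out of this sketch.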
References
- db-agent repo — the pattern, three deployment variants, five progressive modules
- AAAI-25 workshop materials — context on the SQL-safety design
- 1-hour interview prep session the week after Week 6 — resume review, LinkedIn audit, technical-interview question patterns for Cloud / DevOps / AI Cloud Engineer roles.
- Lifetime access to the beCloudReady Slack community.
- Eligible to re-attend future cohorts at no extra cost as the curriculum evolves.
Also useful:
- Interview prep practice repos: k8s-interview-action · interview-challenges
— Chandan Kumar, beCloudReady