Skip to content
Merged
65 changes: 50 additions & 15 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -33,14 +33,15 @@ AUP Learning Cloud is a tailored JupyterHub deployment designed to provide an in
The simplest way to deploy AUP Learning Cloud on a single machine in a development or demo environment.

### Prerequisites
- **Hardware**: AMD Ryzen AI Halo Device (e.g., AI Max+ 395, AI Max 390)
- **Hardware**: Supported **Ryzen AI 300 series and above** APUs and **Radeon 9000 series** PCIe GPUs.
- **Memory**: 32GB+ RAM (64GB recommended)
- **Storage**: 500GB+ SSD
- **OS**: Ubuntu 24.04.3 LTS
- **Docker**: Install Docker and configure for non-root access
- **TUI deps**: `python3-questionary` and `python3-prompt-toolkit` (apt) for the recommended interactive installer; conda/venv users use `pip install questionary prompt_toolkit`

```bash
# Install the OEM kernel for AMD Ryzen-series APU ROCm support (reboot required)
# Ryzen AI APU only: OEM kernel for ROCm on Ubuntu 24.04 (reboot required)
sudo apt update && sudo apt install linux-image-6.14.0-1018-oem

# Install Docker
Expand All @@ -54,32 +55,68 @@ newgrp docker

# Install Build Tools
sudo apt install build-essential

# TUI dependencies (required for the recommended interactive install)
sudo apt install python3-questionary python3-prompt-toolkit
```

> **Kernel note**: The OEM kernel package follows AMD ROCm's Ryzen APU installation guidance for Ubuntu 24.04. See the [ROCm installation guide for Ryzen APUs](https://rocm.docs.amd.com/en/7.12.0/install/rocm.html?fam=ryzen&gpu=max-pro-395&os=ubuntu&os-version=24.04&i=pkgman) for details.
> **Kernel note** (Ryzen AI APU only): The OEM kernel package follows AMD ROCm's Ryzen APU installation guidance for Ubuntu 24.04. See the [ROCm installation guide for Ryzen APUs](https://rocm.docs.amd.com/en/7.12.0/install/rocm.html?fam=ryzen&gpu=max-pro-395&os=ubuntu&os-version=24.04&i=pkgman) for details. Radeon dGPU systems typically use the stock Ubuntu kernel—check ROCm docs for your GPU.
>
> **Docker note**: See [Docker Post-installation Steps](https://docs.docker.com/engine/install/linux-postinstall/) and [Install Docker Engine on Ubuntu](https://docs.docker.com/engine/install/ubuntu/) for details.
>
> **TUI note**: **System Python (apt):** install `python3-questionary` and `python3-prompt-toolkit` as shown above. **Conda or virtualenv users:** use `pip install questionary prompt_toolkit` inside your active environment instead of the apt packages. These are required for the interactive TUI; non-interactive `./auplc-installer install` does not need them.

### Installation

**Interactive (recommended):**

```bash
git clone https://github.com/AMDResearch/aup-learning-cloud.git
cd aup-learning-cloud
sudo ./auplc-installer install
./auplc-installer # pick Install, accept defaults, set Image tag to develop
```
After installation completes, open http://localhost:30890 in your browser. No login credentials are required - you will be automatically logged in.

Common options:
**Non-interactive:**

```bash
sudo ./auplc-installer install --gpu=strix-halo # specify GPU type
sudo ./auplc-installer install --docker=0 # use containerd instead of Docker
sudo ./auplc-installer install --mirror=mirror.example.com # use registry mirror
git clone https://github.com/AMDResearch/aup-learning-cloud.git
cd aup-learning-cloud
./auplc-installer install
```

See more at [link](https://amdresearch.github.io/aup-learning-cloud/installation/single-node.html#runtime-and-mirror-configuration)
A successful install looks like this:

```text
This operation needs root privileges. Requesting sudo password...
✓ [1/8] Detecting GPU (0.2s)
✓ [2/8] Generating values overlay (initial) (0.0s)
✓ [3/8] Installing helm + k9s (0.0s)
✓ [4/8] Installing K3s (single-node) (3.8s)
✓ [5/8] Pulling custom + external images (25.0s)
✓ [6/8] Deploying ROCm GPU device plugin + node labeller (0.2s)
✓ [7/8] Refreshing values overlay from node labels (0.2s)
✓ [8/8] Deploying JupyterHub runtime (helm install + wait) (9.2s)

_ _ _ ____ _ _ ____ _ _
/ \ | | | | _ \ | | ___ __ _ _ __ _ __ (_)_ __ __ _ / ___| | ___ _ _ __| |
/ _ \ | | | | |_) | | | / _ \/ _` | '__| '_ \| | '_ \ / _` | | | | |/ _ \| | | |/ _` |
/ ___ \| |_| | __/ | |__| __/ (_| | | | | | | | | | | (_| | | |___| | (_) | |_| | (_| |
/_/ \_\___/|_| |_____\___|\__,_|_| |_| |_|_|_| |_|\__, | \____|_|\___/ \__,_|\__,_|
|___/
You have successfully installed AUP Learning Cloud!

Open in your browser: http://localhost:30890
(auto-logged-in as 'student' — no login needed)

kubectl is configured at $HOME/.kube/config; try `kubectl get nodes`
```

See the full guide at [Quick Start](https://amdresearch.github.io/aup-learning-cloud/installation/quick-start.html) and [Single-Node Deployment](https://amdresearch.github.io/aup-learning-cloud/installation/single-node.html).

### Uninstall

```bash
sudo ./auplc-installer uninstall
./auplc-installer uninstall
```

## Cluster Installation
Expand Down Expand Up @@ -145,12 +182,10 @@ Current environments are configured via `custom.resources.images` in `runtime/va

The `auplc-default`, `auplc-base`, and `Course-*` images remain notebook and course focused. Browser-based coding is provided by generic code-server images instead of per-course VS Code image variants. Resources launch code-server when their `custom.resources.metadata.<resource>.launchMode` is set to `code-server`; the default configuration uses `code-cpu` for CPU-only coding workspaces and `code-gpu` for GPU-accelerated coding workspaces.

Build the generic coding images from the repository root with:
Build the images:

```bash
make -C dockerfiles code-cpu
make -C dockerfiles code-gpu GPU_TARGET=gfx1151
make -C dockerfiles code
./auplc-installer img build base-rocm --gpu=strix
```

The code-server container starts on port `8888` with `code-server --auth none`. This is safe only when the user pod is reachable exclusively through JupyterHub and the JupyterHub proxy authentication boundary. Do not expose the code-server pod port directly through a NodePort, LoadBalancer, ingress, or other unauthenticated route.
Expand Down
124 changes: 107 additions & 17 deletions auplc_installer/cli.py
Original file line number Diff line number Diff line change
Expand Up @@ -40,6 +40,7 @@
from auplc_installer.progress import stage
from auplc_installer.rocm import deploy_rocm_gpu_device_plugin
from auplc_installer.state import InstallerState
from auplc_installer.summary import format_configuration_summary, resolve_install_image_source
from auplc_installer.util import (
InstallerError,
ensure_sudo_session,
Expand Down Expand Up @@ -73,12 +74,16 @@
optional `questionary` package is not installed.

install [--pull] Full installation (k3s + images + runtime)
Default: build images locally via Makefile
--pull: use pre-built images from GHCR (no local build needed)
Default: pull pre-built images from --image-registry
--pull: legacy alias for --image-source=pull
--image-source=build: build via dockerfiles/Makefile

install --dry-run Show Configuration summary without making changes
(also accepts --try-run). No sudo required.

pack [--local] Create offline deployment bundle (requires Docker + internet)
Default: pull pre-built images from GHCR
--local: build images locally then pack (needs build deps)
Default: pull pre-built images from registry
--local: legacy alias for building locally before pack

uninstall Remove everything (K3s + runtime)
install-tools Install helm and k9s
Expand All @@ -101,6 +106,7 @@

Options (can also be set via environment variables):
--gpu=TYPE Override auto-detected GPU type. Accepts:
auto - auto-detect (same as omitting the flag)
phx - Phoenix Point iGPU (gfx1100..gfx1103)
strix - Strix Point iGPU (gfx1150)
strix-halo - Strix Halo iGPU (gfx1151)
Expand All @@ -112,10 +118,28 @@
Auto-detection uses rocminfo or KFD topology.
Env: GPU_TYPE

--runtime=MODE K3s container runtime: docker (default) or containerd.
Env: K3S_USE_DOCKER (1 = docker, 0 = containerd)
Legacy: --docker=0|1 (still supported; --runtime wins)

--image-source=SRC Custom image acquisition for install:
pull - fetch pre-built images from --image-registry
build - build from dockerfiles/ via Makefile
Default: pull. Legacy: ghcr is an alias for pull;
install --pull is an alias for --image-source=pull

--image-registry=PREFIX
Registry prefix for custom images (default: ghcr.io/amdresearch)
Env: IMAGE_REGISTRY

--image-tag=TAG Custom-image tag prefix (default: latest). GPU suffix
appended automatically. Env: IMAGE_TAG

--docker=0|1 Use host Docker as K3s container runtime (default: 1).
1 = Docker mode: images visible to K3s immediately.
0 = containerd mode: images exported for offline use.
Env: K3S_USE_DOCKER
Prefer --runtime=docker|containerd for new scripts.

--courses=SPEC Restrict the install/pack to a subset of courses. Affects
image build/pull AND the rendered values.local.yaml so
Expand All @@ -130,6 +154,10 @@
-y, --yes Assume yes to all prompts (for scripted/CI use).
Env: AUPLC_YES=1

--dry-run, --try-run
With ``install``: print Configuration summary and exit.
Does not prompt for sudo or change the system.

-v, --verbose Stream every subprocess line live (default is a quiet
"progress-bar" mode where only stage labels show, and
captured output is dumped only when a command fails).
Expand All @@ -142,22 +170,24 @@

Examples:
./auplc-installer tui # interactive wizard
./auplc-installer install --dry-run # preview defaults
./auplc-installer install --image-source=pull --image-tag=develop
./auplc-installer install --runtime=containerd --image-source=build
./auplc-installer install --gpu=strix-halo
./auplc-installer install --gpu=phx --docker=0
./auplc-installer install --gpu=auto --dry-run
./auplc-installer install --gpu=phx --docker=0 # legacy flags
./auplc-installer install --courses=basic # cpu + gpu only
./auplc-installer install --courses=cpu,gpu,Course-CV
./auplc-installer img build base-rocm --gpu=strix
./auplc-installer install --mirror=mirror.example.com

Image Registry:
IMAGE_REGISTRY Registry prefix for custom images (default: ghcr.io/amdresearch)
Override when pulling from a fork or private registry.
IMAGE_TAG Image tag prefix (default: latest). GPU suffix appended automatically.
Use "develop" for images built from the develop branch.
Image Registry (legacy env-only aliases still work):
IMAGE_REGISTRY Same as --image-registry (default: ghcr.io/amdresearch)
IMAGE_TAG Same as --image-tag (default: latest)

Offline Deployment:
1. On a machine with internet access, create bundle:
./auplc-installer pack --gpu=strix-halo # pull from GHCR
./auplc-installer pack --gpu=strix-halo # pull from registry
./auplc-installer pack --gpu=strix-halo --local # or build locally

2. Transfer bundle to air-gapped machine, then:
Expand Down Expand Up @@ -186,12 +216,17 @@ def _build_parser() -> argparse.ArgumentParser:
)
# Global flags
p.add_argument("--gpu", dest="gpu_type", default=None)
p.add_argument("--runtime", dest="runtime", default=None)
p.add_argument("--docker", dest="use_docker", default=None)
p.add_argument("--image-source", dest="image_source", default=None)
p.add_argument("--image-registry", dest="image_registry", default=None)
p.add_argument("--image-tag", dest="image_tag", default=None)
p.add_argument("--mirror", dest="mirror_prefix", default=None)
p.add_argument("--mirror-pip", dest="mirror_pip", default=None)
p.add_argument("--mirror-npm", dest="mirror_npm", default=None)
p.add_argument("--courses", dest="courses", default=None)
p.add_argument("-y", "--yes", dest="assume_yes", action="store_true")
p.add_argument("--dry-run", "--try-run", dest="dry_run", action="store_true")
p.add_argument(
"-v",
"--verbose",
Expand All @@ -208,9 +243,23 @@ def _build_parser() -> argparse.ArgumentParser:

def _apply_global_flags(state: InstallerState, args: argparse.Namespace) -> None:
if args.gpu_type is not None:
state.gpu_type = args.gpu_type
if args.use_docker is not None:
state.gpu_type = "" if args.gpu_type.lower() == "auto" else args.gpu_type
if args.runtime is not None:
runtime = args.runtime.lower()
if runtime == "docker":
state.use_docker = True
elif runtime == "containerd":
state.use_docker = False
else:
raise InstallerError(f"Unknown --runtime={args.runtime!r} (expected docker or containerd)")
elif args.use_docker is not None:
state.use_docker = args.use_docker not in ("0", "false", "False")
if args.image_source is not None:
state.image_source = args.image_source
if args.image_registry is not None:
state.image_registry = args.image_registry
if args.image_tag is not None:
state.image_tag = args.image_tag
if args.mirror_prefix is not None:
state.mirror_prefix = args.mirror_prefix
if args.mirror_pip is not None:
Expand All @@ -227,6 +276,25 @@ def _apply_global_flags(state: InstallerState, args: argparse.Namespace) -> None
set_verbose(state.verbose)


def _install_pull_and_label(
state: InstallerState,
*,
legacy_pull: bool,
) -> tuple[bool, str]:
return resolve_install_image_source(
image_source=state.image_source,
legacy_pull=legacy_pull,
offline_mode=state.offline_mode,
bundle_dir=state.bundle_dir,
)


def cmd_install_plan(state: InstallerState, *, legacy_pull: bool = False) -> None:
"""Print the install Configuration summary without side effects."""
_, label = _install_pull_and_label(state, legacy_pull=legacy_pull)
sys.stdout.write(format_configuration_summary(state, image_source_label=label) + "\n")


# ---------------------------------------------------------------------------
# Command implementations
# ---------------------------------------------------------------------------
Expand All @@ -253,7 +321,7 @@ def cmd_install(state: InstallerState, *, pull: bool) -> None:
def _cmd_install_inner(state: InstallerState, *, pull: bool) -> None:
"""Body of ``cmd_install`` after sudo session has been primed."""
# Pre-compute the image-stage label so the user knows up-front which path
# the installer is taking (offline / GHCR pull / local build).
# the installer is taking (offline / pull / build).
if state.offline_mode:
image_stage_label = "Loading images from offline bundle"
elif pull:
Expand Down Expand Up @@ -655,6 +723,13 @@ def _resolve_source_root() -> Path:

def main(argv: Sequence[str] | None = None) -> None:
argv = list(argv) if argv is not None else sys.argv[1:]

# Custom help text (add_help=False on the parser). Intercept -h/--help
# before argparse runs so `./auplc-installer --help` works as users expect.
if any(tok in ("-h", "--help") for tok in argv):
show_help()
return

parser = _build_parser()

# Argparse refuses positional + --foo=bar interleaving in some edge
Expand All @@ -667,12 +742,16 @@ def main(argv: Sequence[str] | None = None) -> None:
for tok in argv:
if (
tok.startswith("--gpu=")
or tok.startswith("--runtime=")
or tok.startswith("--docker=")
or tok.startswith("--image-source=")
or tok.startswith("--image-registry=")
or tok.startswith("--image-tag=")
or tok.startswith("--mirror=")
or tok.startswith("--mirror-pip=")
or tok.startswith("--mirror-npm=")
or tok.startswith("--courses=")
or tok in ("-y", "--yes", "-v", "--verbose", "--version")
or tok in ("-y", "--yes", "-v", "--verbose", "--version", "--dry-run", "--try-run")
):
flags.append(tok)
else:
Expand All @@ -684,7 +763,7 @@ def main(argv: Sequence[str] | None = None) -> None:
try:
state = InstallerState.from_environment(script_dir=script_dir)
_apply_global_flags(state, args)
_dispatch(args.command, list(args.rest), state, source_root=script_dir)
_dispatch(args.command, list(args.rest), state, source_root=script_dir, dry_run=args.dry_run)
except InstallerError as exc:
log_error(str(exc))
sys.exit(1)
Expand All @@ -699,6 +778,7 @@ def _dispatch(
state: InstallerState,
*,
source_root: Path,
dry_run: bool = False,
) -> None:
# Default behaviour when invoked with no subcommand: launch the TUI in
# an interactive terminal, fall back to printing help in a piped /
Expand All @@ -720,10 +800,20 @@ def _dispatch(
return

if cmd == "install":
pull = bool(rest) and rest[0] == "--pull"
legacy_pull = "--pull" in rest
rest = [t for t in rest if t != "--pull"]
if rest:
raise InstallerError(f"Unexpected install argument(s): {' '.join(rest)}")
if dry_run:
cmd_install_plan(state, legacy_pull=legacy_pull)
return
pull, _ = _install_pull_and_label(state, legacy_pull=legacy_pull)
cmd_install(state, pull=pull)
return

if dry_run:
raise InstallerError("--dry-run is only supported with the install command")

if cmd == "pack":
local_build = bool(rest) and rest[0] == "--local"
cmd_pack(state, local_build=local_build, source_root=source_root)
Expand Down
2 changes: 1 addition & 1 deletion auplc_installer/images.py
Original file line number Diff line number Diff line change
Expand Up @@ -172,7 +172,7 @@ def pull_custom_images(

tag = f"{image_tag}-{cfg.gpu_target}"
log_section(
"Pulling pre-built custom images from GHCR...\n"
"Pulling pre-built custom images from registry...\n"
f" GPU_TARGET={cfg.gpu_target}, tag={tag}\n"
f" Courses: {courses.description()}"
)
Expand Down
2 changes: 1 addition & 1 deletion auplc_installer/pack.py
Original file line number Diff line number Diff line change
Expand Up @@ -443,7 +443,7 @@ def pack_bundle(

log("===========================================")
log("AUP Learning Cloud - Pack Offline Bundle")
log(" Image source: " + ("local build" if local_build else "pull from GHCR"))
log(" Image source: " + ("build" if local_build else "pull"))
log("===========================================")

require_command("docker", install_hint="Docker is required for pack.")
Expand Down
4 changes: 4 additions & 0 deletions auplc_installer/state.py
Original file line number Diff line number Diff line change
Expand Up @@ -45,6 +45,10 @@ class InstallerState:
image_registry: str = DEFAULT_IMAGE_REGISTRY
image_tag: str = DEFAULT_IMAGE_TAG

# Install image acquisition override (CLI --image-source=pull|build).
# None => default ``pull`` (matches TUI); legacy ``install --pull`` also selects pull.
image_source: str | None = None

# Course selection (drives image filtering + teams.mapping override)
courses: CourseSelection = field(default_factory=CourseSelection.default)

Expand Down
Loading
Loading