Voxify — AI Text-to-Speech Platform

Voxify is a production-ready AI voice platform that lets creators, businesses, and developers generate realistic speech audio at scale. It is designed to be fast, clean, and API-ready.

What the platform does

Multi-voice generation with controllable parameters (temperature, top-p, top-k, repetition penalty)
Organization-based voice management (system voices + per-org custom voices)
Low-latency audio generation and playback (streamed audio via API)
Developer-friendly APIs (Next.js Route Handlers + tRPC), plus a typed client to the inference service (OpenAPI)
Audio storage + delivery via Cloudflare R2 (S3-compatible) with signed URLs
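
As a rough illustration of organization-based voice management, the sketch below shows how a voice list might be scoped to one organization. The Voice shape, the use of a null orgId to mark system voices, and the voicesForOrg helper are assumptions made for this example and are not taken from the Voxify schema.

```typescript
// Hypothetical shape; the real Prisma model may differ.
interface Voice {
  id: string;
  name: string;
  // Assumption: null marks a system voice shared by every organization.
  orgId: string | null;
}

// Return the voices an organization may use:
// every system voice plus that organization's own custom voices.
function voicesForOrg(voices: Voice[], orgId: string): Voice[] {
  return voices.filter((v) => v.orgId === null || v.orgId === orgId);
}

const catalog: Voice[] = [
  { id: "v1", name: "Narrator", orgId: null },
  { id: "v2", name: "Brand voice", orgId: "org_a" },
  { id: "v3", name: "Support agent", orgId: "org_b" },
];

// IDs of the voices visible to org_a: the system voice and its own custom voice.
const visibleToOrgA = voicesForOrg(catalog, "org_a").map((v) => v.id);
```

In a real deployment the same rule would more likely live in a database query than in an in-memory filter.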

Stack

Next.js (App Router) + TypeScript + React
Prisma (PostgreSQL via @prisma/adapter-pg + pg)
REST API (Next.js Route Handlers) + tRPC
Python FastAPI service on Modal (GPU A10G) for TTS inference
OpenAPI typegen + typed client (openapi-typescript + openapi-fetch)
Auth & Org (Clerk)
Object Storage (Cloudflare R2 via AWS S3 SDK + signed URLs)
UI (Tailwind CSS + shadcn/ui + Radix)
Validation (Zod)
Data fetching/state (TanStack React Query)

Architecture (flow)

flowchart LR
  subgraph Client
    U[User Browser]
    UI[Dashboard UI]
    U --> UI
  end

  subgraph App[Next.js App Router]
    API[API Routes and tRPC]
    Auth[Clerk Auth and Org]
    UI --> API
    API --> Auth
  end

  subgraph Data[Data Layer]
    DB[(Postgres)]
    R2[Cloudflare R2]
  end

  subgraph Inference[Inference Layer]
    M[Modal FastAPI Service]
    Model[Chatterbox TTS Model]
    M --> Model
  end

  Auth -->|user session and orgId| API
  API -->|Prisma read write| DB

  %% Voice upload and management
  API -->|upload voice audio| R2
  API -->|save voice metadata| DB

  %% TTS generation
  API -->|POST generate| M
  M -->|read voice prompt audio| R2
  M -->|return wav bytes| API
  API -->|store generation audio| R2
  API -->|save generation record| DB

  %% Playback
  UI -->|GET api audio generationId| API
  API -->|signed url| R2
  R2 -->|wav stream| UI

Key flows

Voice upload: UI -> API -> R2, then API -> DB
Generate TTS: UI -> tRPC -> Modal, then API stores WAV to R2 and metadata to DB
Playback: UI requests /api/audio/:generationId, API returns a streamed WAV from R2
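
The "Generate TTS" flow above can be sketched with injected dependencies. This is a minimal outline, not the project's actual code: the InferenceClient, ObjectStore, and GenerationRepo interfaces, the generateSpeech function, and the generations/{orgId}/{id}.wav key layout are all hypothetical names chosen for illustration.

```typescript
// Hypothetical ports; in the real app these would wrap the Modal client,
// the S3 SDK pointed at R2, and Prisma.
interface InferenceClient {
  generate(text: string, voiceKey: string): Promise<Uint8Array>;
}
interface ObjectStore {
  put(key: string, body: Uint8Array, contentType: string): Promise<void>;
}
interface GenerationRecord {
  id: string;
  orgId: string;
  text: string;
  audioKey: string;
}
interface GenerationRepo {
  save(record: GenerationRecord): Promise<void>;
}

async function generateSpeech(
  deps: { inference: InferenceClient; store: ObjectStore; repo: GenerationRepo },
  input: { id: string; orgId: string; text: string; voiceKey: string },
): Promise<GenerationRecord> {
  // 1. Ask the inference service for WAV bytes.
  const wav = await deps.inference.generate(input.text, input.voiceKey);

  // 2. Store the audio first, so a saved record never points at a missing object.
  const audioKey = `generations/${input.orgId}/${input.id}.wav`; // assumed key layout
  await deps.store.put(audioKey, wav, "audio/wav");

  // 3. Persist the metadata.
  const record: GenerationRecord = {
    id: input.id,
    orgId: input.orgId,
    text: input.text,
    audioKey,
  };
  await deps.repo.save(record);
  return record;
}

// In-memory fakes that record the order of calls.
const calls: string[] = [];
const fakeDeps = {
  inference: {
    generate: async (_text: string, _voiceKey: string) => {
      calls.push("generate");
      return new Uint8Array([82, 73, 70, 70]); // ASCII "RIFF", the start of a WAV header
    },
  },
  store: {
    put: async (_key: string, _body: Uint8Array, _contentType: string) => {
      calls.push("put");
    },
  },
  repo: {
    save: async (_record: GenerationRecord) => {
      calls.push("save");
    },
  },
};
```

Passing the collaborators in as arguments keeps the ordering logic testable without a GPU, a bucket, or a database.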

Getting started

Prerequisites

Node.js (recommended: latest LTS)
Postgres database
Cloudflare R2 credentials + bucket
Clerk application keys
A running Chatterbox inference API (Modal) or your own compatible endpoint

Clone & install

git clone https://github.com/Ryanakml/Voxify.git
cd Voxify
npm install

Environment variables

Create a .env file in the project root. The code requires at least these variables:

# Database
DATABASE_URL=postgresql://USER:PASSWORD@HOST:PORT/DB

# Cloudflare R2 (S3-compatible)
R2_ACCOUNT_ID=...
R2_ACCESS_KEY_ID=...
R2_SECRET_ACCESS_KEY=...
R2_BUCKET_NAME=...

# Chatterbox inference API (Modal FastAPI)
CHATTERBOX_API_URL=https://your-modal-app.modal.run
CHATTERBOX_API_KEY=...

# Clerk (typical Next.js + Clerk env vars)
NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY=...
CLERK_SECRET_KEY=...
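
One way to catch a missing variable early is to check the list above at startup. The following is a sketch under that assumption; loadEnv, r2Endpoint, and REQUIRED_ENV are hypothetical helpers, not functions from the repository. The R2 endpoint format follows Cloudflare's S3-compatible URL scheme.

```typescript
// The variable names listed in the .env example above.
const REQUIRED_ENV = [
  "DATABASE_URL",
  "R2_ACCOUNT_ID",
  "R2_ACCESS_KEY_ID",
  "R2_SECRET_ACCESS_KEY",
  "R2_BUCKET_NAME",
  "CHATTERBOX_API_URL",
  "CHATTERBOX_API_KEY",
  "NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY",
  "CLERK_SECRET_KEY",
] as const;

type EnvName = (typeof REQUIRED_ENV)[number];
type AppEnv = Record<EnvName, string>;

// Hypothetical helper: fail fast with one error naming every missing or empty variable.
function loadEnv(source: Record<string, string | undefined>): AppEnv {
  const missing = REQUIRED_ENV.filter((name) => !source[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  const env = {} as AppEnv;
  for (const name of REQUIRED_ENV) {
    env[name] = source[name] as string;
  }
  return env;
}

// R2's S3-compatible endpoint is derived from the account ID.
function r2Endpoint(accountId: string): string {
  return `https://${accountId}.r2.cloudflarestorage.com`;
}
```

In a Node.js process this could be called once as loadEnv(process.env), with the result shared by the database, storage, and inference clients.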

Database migrations + seed

npx prisma migrate dev
npm run seed

Sync OpenAPI types (optional but recommended)

This fetches ${CHATTERBOX_API_URL}/openapi.json and regenerates the typed client types.

npm run sync:api
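
For reference, the schema URL described above could be built as shown below. This is only a sketch: openApiUrl is a hypothetical helper, and the trailing-slash handling is an addition for robustness, not something the sync script is known to do.

```typescript
// Build the OpenAPI schema URL from the inference API base URL.
function openApiUrl(baseUrl: string): string {
  // Strip trailing slashes so the result never contains "//openapi.json".
  return `${baseUrl.replace(/\/+$/, "")}/openapi.json`;
}

const schemaUrl = openApiUrl("https://your-modal-app.modal.run/");
```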

Run the app

npm run dev

Open http://localhost:3000

Notes: Modal inference service

The inference service lives in chatterbox_tts.py and is designed to run on Modal with an A10G GPU.

It expects an API key in the x-api-key header
It reads voice prompt audio from a Cloudflare R2 bucket mounted into Modal

If you change the inference API base URL, update CHATTERBOX_API_URL and rerun npm run sync:api.
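
A request to the inference service might be assembled as in the sketch below. Only the x-api-key header comes from the notes above; the /generate path, the payload fields (text, voiceKey), and the buildGenerateRequest helper are assumptions for illustration, so check the generated OpenAPI types for the real contract.

```typescript
interface GenerateRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Hypothetical helper: assemble the URL and fetch options for a generation call.
function buildGenerateRequest(
  baseUrl: string,
  apiKey: string,
  payload: { text: string; voiceKey: string }, // assumed field names
): GenerateRequest {
  return {
    // Assumed path; trailing slashes on the base URL are stripped first.
    url: `${baseUrl.replace(/\/+$/, "")}/generate`,
    init: {
      method: "POST",
      headers: {
        "x-api-key": apiKey, // the header the service expects
        "content-type": "application/json",
      },
      body: JSON.stringify(payload),
    },
  };
}

const exampleRequest = buildGenerateRequest(
  "https://your-modal-app.modal.run",
  "example-key",
  { text: "Hello from Voxify", voiceKey: "voices/example.wav" },
);
```

The result can be passed straight to fetch(exampleRequest.url, exampleRequest.init) from server-side code, which keeps the API key out of the browser.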
