Skip to content

[RFE] Enable local LLM provider support (Ollama) for development and testing #784

@anik120

Description

@anik120

Is your feature request related to a problem? Please describe.

Currently, Lightspeed Stack only officially supports cloud-based LLM providers (OpenAI, Azure, RHOAI, RHEL AI). This creates significant barriers for developers and users trying to:

  • Get started with the project
  • Test and develop locally
  • Control costs during development
  • Work offline or in air-gapped environments
  • Learn the system without cloud dependencies

Current Situation

From docs/providers.md:

Provider Type Supported in LCS
ollama remote
meta-reference inline
sentence-transformers inline

Developers currently must:

  1. Create an OpenAI/Azure account
  2. Add payment information
  3. Manage API keys and quotas
  4. Pay for every test query during development
  5. Wait for quota resets when limits are hit

Example: A developer following the getting started guide encounters:

RateLimitError: You exceeded your current quota

This blocks them from testing basic functionality without adding credits.

Describe the solution you'd like

Add official support for Ollama as a local inference provider.

Ollama provides:

  • Free, unlimited local inference (no API keys, no quotas)
  • Easy installation (brew install ollama / curl -fsSL https://ollama.ai/install.sh | sh)
  • Production-quality models (Llama 3, Mistral, Phi, etc.)
  • OpenAI-compatible API (minimal integration effort)
  • Active community and regular updates

Use Cases

  1. Getting Started Experience

New developers should be able to run:

#Install Ollama
brew install ollama

#Pull a model
ollama pull llama3.2

#Start Lightspeed Stack (no cloud setup needed!)
OLLAMA_MODEL=llama3.2 make run
  1. Development & Testing
  • Run unit/integration tests without API costs
  • Iterate quickly without rate limits
  • Test RAG pipelines with local embeddings
  • Develop offline
  1. CI/CD Pipelines
  • Run E2E tests in GitHub Actions without managing secrets
  • No quota concerns for parallel test runs
  • Reproducible test environment
  1. Educational & Demo Purposes
  • Workshop attendees don't need cloud accounts
  • Demo the system without internet connectivity
  • Training environments without budget concerns
  1. Privacy-Sensitive Development
  • Test with sensitive data locally
  • Develop features for air-gapped deployments
  • Comply with data sovereignty requirements

Benefits

  1. For Developers:
  • Zero cost for development
  • Faster iteration (no network latency)
  • No quota limits during testing
  • Lower barrier to contribution
  1. For the Project:
  • Increased adoption (easier onboarding)
  • Better test coverage (developers can test more)
  • Broader contributor base (global accessibility)
  • Production flexibility (hybrid cloud/local deployments)
  1. For Users:
  • Try before committing to cloud providers
  • Development/staging environments without cloud costs
  • Option for fully private deployments

Example Configuration

examples/ollama-run.yaml:

  version: '2'
  image_name: local-development-stack

  providers:
    inference:
      - provider_id: ollama
        provider_type: remote::ollama
        config:
          base_url: http://localhost:11434

  models:
    - model_id: llama3.2
      provider_id: ollama
      model_type: llm
      provider_model_id: llama3.2

    - model_id: nomic-embed-text
      provider_id: ollama
      model_type: embedding
      provider_model_id: nomic-embed-text

Describe alternatives you've considered

  1. Continue cloud-only: Creates unnecessary barriers
  2. Use sentence-transformers only - Insufficient: Only supports embeddings, not chat
  3. Support meta-reference (Llama direct) - More complex setup than Ollama
  4. Support multiple local providers - Start with Ollama first, expand later

Additional context

  • Similar projects (LangChain, LlamaIndex) prominently feature local model support as a first-class option alongside cloud providers.

  • Ollama is already listed in llama-stack's provider list, just needs dependency installation and testing.

Looks like this will be a high-impact, medium-effort improvement that significantly reduces friction for new contributors and developers. The infrastructure is already in place via llama-stack; we primarily need:

  • Dependency addition
  • Example configurations
  • Documentation updates
  • Test validation

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request
    No fields configured for Feature.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions