[RFE] Enable local LLM provider support (Ollama) for development and testing

**Is your feature request related to a problem? Please describe.**

Currently, Lightspeed Stack only officially supports cloud-based LLM providers (OpenAI, Azure, RHOAI, RHEL AI). This creates significant barriers for developers and users trying to:

  - Get started with the project
  - Test and develop locally
  - Control costs during development
  - Work offline or in air-gapped environments
  - Learn the system without cloud dependencies

  Current Situation

  From docs/providers.md:

  | Provider              | Type   | Supported in LCS |
  |-----------------------|--------|------------------|
  | ollama                | remote |        ❌         |
  | meta-reference        | inline |        ❌         |
  | sentence-transformers | inline |        ❌         |

Developers currently must:
  1. Create an OpenAI/Azure account
  2. Add payment information
  3. Manage API keys and quotas
  4. Pay for every test query during development
  5. Wait for quota resets when limits are hit

Example: A developer following the getting started guide encounters:
  ```
RateLimitError: You exceeded your current quota
```
This blocks them from testing basic functionality without adding credits.


**Describe the solution you'd like**

Add official support for Ollama as a local inference provider.

  Ollama provides:
  - Free, unlimited local inference (no API keys, no quotas)
  - Easy installation (brew install ollama / curl -fsSL https://ollama.ai/install.sh | sh)
  - Production-quality models (Llama 3, Mistral, Phi, etc.)
  - OpenAI-compatible API (minimal integration effort)
  - Active community and regular updates

#### Use Cases

1. Getting Started Experience

  New developers should be able to run:
```
#Install Ollama
brew install ollama

#Pull a model
ollama pull llama3.2

#Start Lightspeed Stack (no cloud setup needed!)
OLLAMA_MODEL=llama3.2 make run
```
2. Development & Testing

  - Run unit/integration tests without API costs
  - Iterate quickly without rate limits
  - Test RAG pipelines with local embeddings
  - Develop offline 

3. CI/CD Pipelines

  - Run E2E tests in GitHub Actions without managing secrets
  - No quota concerns for parallel test runs
  - Reproducible test environment

4. Educational & Demo Purposes

  - Workshop attendees don't need cloud accounts
  - Demo the system without internet connectivity
  - Training environments without budget concerns

5. Privacy-Sensitive Development

  - Test with sensitive data locally
  - Develop features for air-gapped deployments
  - Comply with data sovereignty requirements

#### Benefits

  1. For Developers:
  - Zero cost for development
  - Faster iteration (no network latency)
  - No quota limits during testing
  - Lower barrier to contribution

  2. For the Project:
  - Increased adoption (easier onboarding)
  - Better test coverage (developers can test more)
  - Broader contributor base (global accessibility)
  - Production flexibility (hybrid cloud/local deployments)

  3. For Users:
  - Try before committing to cloud providers
  - Development/staging environments without cloud costs
  - Option for fully private deployments

### Example Configuration

examples/ollama-run.yaml:
```
  version: '2'
  image_name: local-development-stack

  providers:
    inference:
      - provider_id: ollama
        provider_type: remote::ollama
        config:
          base_url: http://localhost:11434

  models:
    - model_id: llama3.2
      provider_id: ollama
      model_type: llm
      provider_model_id: llama3.2

    - model_id: nomic-embed-text
      provider_id: ollama
      model_type: embedding
      provider_model_id: nomic-embed-text
```

**Describe alternatives you've considered**

1. Continue cloud-only: Creates unnecessary barriers
2. Use sentence-transformers only - Insufficient: Only supports embeddings, not chat
3. Support meta-reference (Llama direct) - More complex setup than Ollama
4. Support multiple local providers - Start with Ollama first, expand later

**Additional context**

- Similar projects (LangChain, LlamaIndex) prominently feature local model support as a first-class option alongside cloud providers.

- Ollama is already listed in llama-stack's provider list, just needs dependency installation and testing.


Looks like this will be a **high-impact**, **medium-effort** improvement that significantly reduces friction for new contributors and developers. The infrastructure is already in place via llama-stack; we primarily need:
  - Dependency addition
  - Example configurations
  - Documentation updates
  - Test validation





Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[RFE] Enable local LLM provider support (Ollama) for development and testing #784

Use Cases

Benefits

Example Configuration

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Provider	Type	Supported in LCS
ollama	remote	❌
meta-reference	inline	❌
sentence-transformers	inline	❌

[RFE] Enable local LLM provider support (Ollama) for development and testing #784

Description

Use Cases

Benefits

Example Configuration

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions