Skip to content

adding docs for the llm chat interface#552

Open
mohalkh5 wants to merge 1 commit into
ResearchComputing:mainfrom
mohalkh5:llmchat
Open

adding docs for the llm chat interface#552
mohalkh5 wants to merge 1 commit into
ResearchComputing:mainfrom
mohalkh5:llmchat

Conversation

@mohalkh5

Copy link
Copy Markdown
Contributor

No description provided.

@monaghaa monaghaa left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like more than it is.... To shorten up the launch instructions into more of a "Quick start", I rearranged some materials in that section and put several of the configuration-related steps into a drop-down.

The **LLM Chat Interface** is an interactive Open OnDemand application that provides a browser-based chat experience powered by [Chainlit](https://docs.chainlit.io) and [Ollama](https://ollama.com). When you launch a session, the application starts an Ollama server on a GPU-equipped compute node and connects it to CURC-hosted large language models (LLMs). You can ask questions, draft and debug code, summarize documents, and analyze images (with vision-capable models), all from your web browser.

```{important}
The LLM Chat Interface is currently offered as a beta service. Functionality, available models, and resource allocations may change as we gather feedback and refine the service. Please report issues or suggestions through the [CURC support form](https://colorado.service-now.com/req_portal?id=ucb_sc_rc_form).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The LLM Chat Interface is currently offered as a beta service. Functionality, available models, and resource allocations may change as we gather feedback and refine the service. Please report issues or suggestions through the [CURC support form](https://colorado.service-now.com/req_portal?id=ucb_sc_rc_form).
The LLM Chat Interface is currently offered as a beta service. Functionality, available models, and resource allocations may change as we gather feedback and refine the service. Please report issues or make suggestions through the [CURC support form](https://colorado.service-now.com/req_portal?id=ucb_sc_rc_form).

2. Navigate to either the **Interactive Apps** drop-down menu or the **My Interactive Sessions** tab and select **LLM Chat Interface**.
3. Review the launch form fields:

- Ollama model path — Select which model library Ollama should load. The default **CURC LLM Models** uses CURC-hosted models. You may also provide the **absolute path** to your own Ollama model directory, if you have pulled or fine-tuned models there. See [Ollama documentation](../ai-ml/llms.md#ollama) for more details.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- Ollama model path — Select which model library Ollama should load. The default **CURC LLM Models** uses CURC-hosted models. You may also provide the **absolute path** to your own Ollama model directory, if you have pulled or fine-tuned models there. See [Ollama documentation](../ai-ml/llms.md#ollama) for more details.
- Ollama model path — Select which model library Ollama should load. The default **CURC LLM Models** uses CURC-hosted models. You may also provide the **absolute path** to your own Ollama model directory, if you have downloaded or fine-tuned models there. See [Ollama documentation](../ai-ml/llms.md#ollama) for more details.

4. If you selected **Preset configuration**, choose **10 cores, 1 GPU, 1 hour**. This submits your job to Alpine with one GPU, which is required to run the LLM backend.

```{important}
When you launch with **Preset configuration**, your job is submitted to Alpine **testing** hardware (`atesting_a100` partition, `testing` QoS). Testing resources are shared and **limited in capacity**, so sessions may queue during high demand, run for a maximum of **one hour**, and are not intended for sustained or large-scale work. For longer runtimes or heavier workloads, use **Custom configuration** and request appropriate resources.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
When you launch with **Preset configuration**, your job is submitted to Alpine **testing** hardware (`atesting_a100` partition, `testing` QoS). Testing resources are shared and **limited in capacity**, so sessions may queue during high demand, run for a maximum of **one hour**, and are not intended for sustained or large-scale work. For longer runtimes or heavier workloads, use **Custom configuration** and request appropriate resources.

@@ -0,0 +1,164 @@
# LLM Chat Interface

The **LLM Chat Interface** is an interactive Open OnDemand application that provides a browser-based chat experience powered by [Chainlit](https://docs.chainlit.io) and [Ollama](https://ollama.com). When you launch a session, the application starts an Ollama server on a GPU-equipped compute node and connects it to CURC-hosted large language models (LLMs). You can ask questions, draft and debug code, summarize documents, and analyze images (with vision-capable models), all from your web browser.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The **LLM Chat Interface** is an interactive Open OnDemand application that provides a browser-based chat experience powered by [Chainlit](https://docs.chainlit.io) and [Ollama](https://ollama.com). When you launch a session, the application starts an Ollama server on a GPU-equipped compute node and connects it to CURC-hosted large language models (LLMs). You can ask questions, draft and debug code, summarize documents, and analyze images (with vision-capable models), all from your web browser.
The **LLM Chat Interface** is an interactive Open OnDemand application that provides a browser-based chat experience to support a variety of [use cases](#use-cases). It is powered by [Chainlit](https://docs.chainlit.io) and [Ollama](https://ollama.com). When you launch a session, the application starts an Ollama server on a GPU-equipped compute node and connects it to CURC-hosted large language models (LLMs). You can ask questions, draft and debug code, summarize documents, and analyze images (with vision-capable models), all from your web browser.

- Ollama model path — Select which model library Ollama should load. The default **CURC LLM Models** uses CURC-hosted models. You may also provide the **absolute path** to your own Ollama model directory, if you have pulled or fine-tuned models there. See [Ollama documentation](../ai-ml/llms.md#ollama) for more details.
- Configuration type — Choose Preset configuration (recommended for most users) or Custom configuration for advanced resource control. For details on these options, see [Configuring Open OnDemand interactive applications](./configuring_apps.md).

4. If you selected **Preset configuration**, choose **10 cores, 1 GPU, 1 hour**. This submits your job to Alpine with one GPU, which is required to run the LLM backend.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
4. If you selected **Preset configuration**, choose **10 cores, 1 GPU, 1 hour**. This submits your job to Alpine with one GPU, which is required to run the LLM backend.
- If you selected **Preset configuration**, choose **10 cores, 1 GPU, 1 hour**. This submits your job to Alpine with one GPU, which is required to run the LLM backend.


5. If you selected **Custom configuration**, you **must** request at least one GPU in the **gres** field (for example, `gpu:1`). See the [Limitations](#limitations) section for guidance on GPU memory (VRAM) and model size.
6. Click **Launch** and wait for your session to start. When the job is ready, click **Connect to LLM Chat Interface** to open the chat in a new browser tab.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
```{important}

5. If you selected **Custom configuration**, you **must** request at least one GPU in the **gres** field (for example, `gpu:1`). See the [Limitations](#limitations) section for guidance on GPU memory (VRAM) and model size.
6. Click **Launch** and wait for your session to start. When the job is ready, click **Connect to LLM Chat Interface** to open the chat in a new browser tab.


Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
When you launch with **Preset configuration**, your job is submitted to Alpine **testing** hardware (`atesting_a100` partition, `testing` QoS). Testing resources are shared and **limited in capacity**, so sessions may queue during high demand. Preset sessions run for a maximum of **one hour**, and are not intended for sustained or large-scale work. For longer runtimes or heavier workloads, use **Custom configuration** and request appropriate resources (jobs will be subject to queue waits and may not start immediately).


4. If you selected **Preset configuration**, choose **10 cores, 1 GPU, 1 hour**. This submits your job to Alpine with one GPU, which is required to run the LLM backend.

```{important}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
```{important}


```{important}
When you launch with **Preset configuration**, your job is submitted to Alpine **testing** hardware (`atesting_a100` partition, `testing` QoS). Testing resources are shared and **limited in capacity**, so sessions may queue during high demand, run for a maximum of **one hour**, and are not intended for sustained or large-scale work. For longer runtimes or heavier workloads, use **Custom configuration** and request appropriate resources.
```

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
```

1. Log in to [Open OnDemand](https://curc.readthedocs.io/en/latest/open_ondemand/index.html) using your CURC credentials.
2. Navigate to either the **Interactive Apps** drop-down menu or the **My Interactive Sessions** tab and select **LLM Chat Interface**.
3. Review the launch form fields:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
::::{dropdown} Additional guidance for launch form fields
:icon: note

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants