feat(langchain): split provider integrations into optional extras #1989

Open
marcusds wants to merge 1 commit into NVIDIA:develop from marcusds:astd-165-split-langchain-extras

marcusds commented May 25, 2026

Summary

nvidia-nat-langchain currently declares every langchain provider integration as a required dependency:

langchain-aws, langchain-community, langchain-exa, langchain-huggingface, langchain-litellm, langchain-milvus, langchain-nvidia-ai-endpoints, langchain-oci, langchain-openai, langchain-tavily, plus openevals and wikipedia.

A consumer that only configures _type: openai + _type: react_agent (the minimum to run a basic ReAct agent) still pulls all of them. That inflates the install footprint and slows pip resolution. Most painfully, the transitive langchain-aws / boto3 / botocore constraints have been triggering ResolutionTooDeep failures downstream that require manual pinning to work around.

The good news: every provider module is already lazy-imported inside the matching register_llm_client / register_function body, so the install-time deps were only nominally required. This PR promotes that reality into the package metadata.
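
For readers unfamiliar with the pattern, here is a minimal sketch of a deferred provider import. It is illustrative only: `load_provider` and its arguments are hypothetical names, and the real register_llm_client / register_function bodies are not shown in this PR.

```python
import importlib


def load_provider(module_name: str, install_hint: str):
    """Import a provider module only when it is actually requested.

    Hypothetical helper: it stands in for the deferred import that lives
    inside a registration function body.
    """
    try:
        # Nothing is imported until this function is called, so a missing
        # provider package cannot break `import` of the enclosing module.
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f'{module_name!r} is not installed. Try: pip install "{install_hint}"'
        ) from exc
```

With this shape, a missing provider only surfaces when that provider is configured, and the error can point at the extra that supplies it.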

Dependency split

Core deps (always installed), matching what is actually imported at module load time:

  • langchain, langchain-classic, langchain-core, langgraph
  • nvidia-nat-{core,eval,opentelemetry}
  • pyopenssl (existing transitive pin)

Optional extras (provider integrations):

| Extra | Pulls in |
| --- | --- |
| [aws] | langchain-aws |
| [community] | langchain-community, wikipedia |
| [exa] | langchain-exa |
| [huggingface] | langchain-huggingface |
| [judge] | openevals (LLM-as-judge eval) |
| [litellm] | langchain-litellm |
| [milvus] | langchain-milvus |
| [nvidia] | langchain-nvidia-ai-endpoints |
| [oci] | langchain-oci |
| [openai] | langchain-openai |
| [tavily] | langchain-tavily |
| [all] | All of the above |
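
The extras listed above can be modelled as a small lookup, which may help when reasoning about what a given install request adds on top of the core deps. This is a sketch of the mapping only, not of how pip resolves extras; `EXTRAS` and `packages_for` are hypothetical names.

```python
# Extra name -> provider packages, copied from the list above.
EXTRAS = {
    "aws": {"langchain-aws"},
    "community": {"langchain-community", "wikipedia"},
    "exa": {"langchain-exa"},
    "huggingface": {"langchain-huggingface"},
    "judge": {"openevals"},
    "litellm": {"langchain-litellm"},
    "milvus": {"langchain-milvus"},
    "nvidia": {"langchain-nvidia-ai-endpoints"},
    "oci": {"langchain-oci"},
    "openai": {"langchain-openai"},
    "tavily": {"langchain-tavily"},
}


def packages_for(extras):
    """Return the provider packages added by the requested extras."""
    # "all" expands to every provider extra; unknown names raise KeyError.
    names = EXTRAS.keys() if "all" in extras else extras
    selected = set()
    for name in names:
        selected |= EXTRAS[name]
    return selected
```

For example, `packages_for(["openai"])` yields only langchain-openai, while `packages_for(["all"])` yields the full pre-split provider set.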

Backwards compatibility

Both the top-level nvidia-nat[langchain] and nvidia-nat[most] extras now resolve to nvidia-nat-langchain[all] == {version}, so anyone installing the toolkit via the umbrella distribution continues to get the same resolved tree as today. Only a bare pip install nvidia-nat-langchain installs less than before; users who depend on a specific provider need to add the corresponding extra (e.g. pip install "nvidia-nat-langchain[openai]").

uv.lock has been regenerated against the new graph.

Why this is safe today

A grep across packages/nvidia_nat_langchain/src/ confirms only langchain_core, langchain_classic, and langgraph appear in top-level from ... import ... statements. Every provider module (langchain_aws, langchain_openai, langchain_oci, langchain_huggingface, langchain_litellm, langchain_milvus, langchain_exa, langchain_tavily, langchain_community, langchain_nvidia_ai_endpoints) is imported lazily inside its registration function. openevals is already wrapped in a try/except ImportError in eval/langsmith_judge.py. So the split changes which deps pip installs, not which code paths execute.
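
A text grep can miss or over-match, so the same check could also be sketched with the standard-library `ast` module. This is an assumption about how one might verify the claim, not the command the author ran; `top_level_imports` is a hypothetical helper.

```python
import ast


def top_level_imports(source: str) -> set[str]:
    """Return root package names imported at module level in `source`."""
    roots = set()
    # Only direct children of the module are inspected, so imports inside
    # function bodies (the lazy ones) are deliberately ignored. Imports
    # nested in module-level `if` / `try` blocks are also skipped here.
    for node in ast.parse(source).body:
        if isinstance(node, ast.Import):
            roots.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            roots.add(node.module.split(".")[0])
    return roots
```

Running it over each file under packages/nvidia_nat_langchain/src/ and taking the union should, if the claim holds, show no provider packages among the third-party roots.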

Test plan

  • CI green on develop-targeted PR (existing tests should pass unchanged because [all] is installed in the test extra)
  • uv sync --extra langchain --extra most resolves the same provider set as before
  • pip install "nvidia-nat-langchain[openai]" succeeds without pulling langchain-aws / boto3
  • pip install "nvidia-nat-langchain" is enough to import nat.plugins.langchain and use _type: react_agent
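
The last two items can be spot-checked from inside the target environment. The sketch below assumes the import names listed earlier in this description; `installed_providers` is a hypothetical helper and not part of the toolkit.

```python
from importlib.util import find_spec

# Import names of the provider packages that moved behind extras.
PROVIDER_MODULES = [
    "langchain_aws",
    "langchain_community",
    "langchain_exa",
    "langchain_huggingface",
    "langchain_litellm",
    "langchain_milvus",
    "langchain_nvidia_ai_endpoints",
    "langchain_oci",
    "langchain_openai",
    "langchain_tavily",
]


def installed_providers(modules=PROVIDER_MODULES):
    """Return the modules that can be located without importing them."""
    # find_spec returns None for a top-level module that is not installed.
    return [name for name in modules if find_spec(name) is not None]
```

After a bare install the result would be expected to be empty, and after installing the [openai] extra it would be expected to contain langchain_openai but not langchain_aws.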

Summary by CodeRabbit

  • New Features

    • Provider-specific optional extras (aws, openai, community, etc.) available for selective integration installation.
    • New "all" extra installs all provider integrations together for full functionality.
  • Chores

    • Dependency layout restructured: core packages installed by default; provider integrations moved to optional extras for flexible setup.
    • Install groups updated so aggregated extras (e.g., "most" and "langchain") and test tooling now pull the "all" provider bundle.

@marcusds marcusds requested review from a team as code owners May 25, 2026 21:21

copy-pr-bot Bot commented May 25, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.


coderabbitai Bot commented May 25, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 99c315e7-7f5e-4026-b1f6-cd53c7080a85

📥 Commits

Reviewing files that changed from the base of the PR and between 2fbfe55 and 015729b.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (2)
  • packages/nvidia_nat_langchain/pyproject.toml
  • pyproject.toml

Walkthrough

This PR restructures dependency management in the NVIDIA NeMo Agent Toolkit to separate core packages from provider integrations. Provider packages are moved from base dependencies into optional extras (aws, openai, community, etc.), and the root pyproject.toml is updated to consume the new dependency structure.

Changes

Dependency Structure Refactoring

| Layer / File(s) | Summary |
| --- | --- |
| nvidia-nat-langchain dependency restructuring (packages/nvidia_nat_langchain/pyproject.toml) | Core [tool.setuptools_dynamic_dependencies] dependencies reduced to only core nvidia-nat-* and LangChain/LangGraph packages; provider packages (langchain-aws, langchain-openai, etc.) moved to individual provider-specific extras (aws, openai, community, etc.); all extra created to include all provider extras; test extra updated to use nvidia-nat-langchain[all]. |
| Root pyproject.toml extras consumption (pyproject.toml) | Root pyproject.toml langchain and most extras updated to use nvidia-nat-langchain[all] instead of base nvidia-nat-langchain, aligning aggregated installs with the new provider-extras structure. |

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'feat(langchain): split provider integrations into optional extras' is concise (65 chars), descriptive, uses imperative mood, and accurately reflects the main change of reorganizing langchain provider dependencies into optional setuptools extras. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

The nvidia-nat-langchain distribution required every supported langchain
provider integration (langchain-aws, langchain-community, langchain-oci,
langchain-openai, ...) at install time. Consumers that only configure one
provider still pulled in all of them, which bloated wheels and made pip
resolution fragile -- transitive aws/boto3 pins in particular trigger
ResolutionTooDeep failures downstream.

Every provider module is already lazy-imported inside the corresponding
register_llm_client / register_function body, so the deps were only
nominally required. Promote that reality into the package metadata:

  * Core deps now cover only the langchain pieces imported at module
    load time: langchain, langchain-classic, langchain-core, langgraph
    (plus the existing nvidia-nat-* siblings and the pyopenssl
    transitive pin).
  * Each provider integration moves behind an extra: [aws], [community]
    (langchain-community + wikipedia), [exa], [huggingface], [judge]
    (openevals), [litellm], [milvus], [nvidia], [oci], [openai],
    [tavily].
  * A new [all] extra installs every provider, matching the pre-split
    behavior. The top-level `nvidia-nat[langchain]` and `nvidia-nat[most]`
    extras now resolve to `nvidia-nat-langchain[all]` so existing
    consumers see no functional change.

Lock regenerated to reflect the new graph.
@marcusds marcusds force-pushed the astd-165-split-langchain-extras branch from 2fbfe55 to 015729b Compare May 25, 2026 21:23

@coderabbitai coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pyproject.toml (1)

1-1: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update copyright year range for consistency.

The copyright header shows only "2026", but the root pyproject.toml almost certainly existed in 2025 (the package file at packages/nvidia_nat_langchain/pyproject.toml shows "2025-2026"). Per the guideline to "confirm that copyright years are up-to date whenever a file is changed", the range should be updated to reflect the file's history.

📅 Proposed fix
-# SPDX-FileCopyrightText: Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES.
+# SPDX-FileCopyrightText: Copyright (c) 2025-2026, NVIDIA CORPORATION & AFFILIATES.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pyproject.toml` at line 1, Update the SPDX copyright header line that
currently reads "Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES." to the
full range "2025-2026" so it matches the repository's root/package metadata;
locate the SPDX header line in this file (the first line) and replace the
single-year value with the range "2025-2026" to keep copyright years consistent.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: a97f0628-5e74-4d8b-a798-0d7e61241918

📥 Commits

Reviewing files that changed from the base of the PR and between b09789d and 2fbfe55.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (2)
  • packages/nvidia_nat_langchain/pyproject.toml
  • pyproject.toml
