docs: update langchain docs #2683
base: master
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -8,17 +8,26 @@ slug: /integrations/langchain | |
|
|
||
| import ThirdPartyDisclaimer from '@site/sources/_partials/_third-party-integration.mdx'; | ||
|
|
||
| > For more information on LangChain visit its [documentation](https://docs.langchain.com/oss/python/langchain/overview). | ||
| > For more information on LangChain visit its [documentation](https://docs.langchain.com/oss/python/langchain/overview). The Apify integration lives in the [langchain-apify](https://github.com/apify/langchain-apify) repository. | ||
|
|
||
| <ThirdPartyDisclaimer /> | ||
|
|
||
| In this example, we'll use the [Website Content Crawler](https://apify.com/apify/website-content-crawler) Actor, which can deeply crawl websites such as documentation, knowledge bases, help centers, or blogs and extract text content from the web pages. | ||
| Then we feed the documents into a vector index and answer questions from it. | ||
|
|
||
| This example demonstrates how to integrate Apify with LangChain using the Python language. | ||
| If you prefer to use JavaScript, you can follow the [JavaScript LangChain documentation](https://docs.langchain.com/oss/javascript/integrations/document_loaders/web_loaders/apify_dataset). | ||
| This example demonstrates how to integrate Apify with LangChain in Python. | ||
|
|
||
| :::info Python only | ||
|
|
||
| The `langchain-apify` package is currently available for Python only. | ||
|
|
||
| ::: | ||
|
|
||
| Before we start with the integration, we need to install all dependencies: | ||
|
|
||
| `pip install langchain langchain-openai langchain-apify` | ||
| ```bash | ||
| pip install langchain langchain-openai langchain-apify | ||
| ``` | ||
|
|
||
| After successful installation of all dependencies, we can start writing code. | ||
|
|
||
|
|
@@ -39,7 +48,7 @@ Find your [Apify API token](https://console.apify.com/settings/integrations) and | |
|
|
||
| ```python | ||
| os.environ["OPENAI_API_KEY"] = "Your OpenAI API key" | ||
| os.environ["APIFY_API_TOKEN"] = "Your Apify API token" | ||
| os.environ["APIFY_TOKEN"] = "Your Apify API token" | ||
| ``` | ||
|
Contributor
Is this a verified change? The langchain readme (https://github.com/apify/langchain-apify) still uses APIFY_API_TOKEN.
Contributor
Author
Yup yup, I updated it based on @MQ37 suggestion in this review apify/langchain-apify#28 (comment)
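While the variable name is being confirmed, a script can tolerate either one. The sketch below is a workaround under stated assumptions, not part of `langchain-apify`: the `resolve_apify_token` helper is hypothetical, and it only knows the two names mentioned in this thread, mirroring the value to both.

```python
import os


def resolve_apify_token(env=os.environ):
    """Return the Apify token from either variable name and mirror it to both.

    Hypothetical helper: which name the installed package reads may depend
    on its version, so this simply keeps the two variables in sync.
    """
    token = env.get("APIFY_TOKEN") or env.get("APIFY_API_TOKEN")
    if not token:
        raise RuntimeError("Set APIFY_TOKEN or APIFY_API_TOKEN first.")
    env["APIFY_TOKEN"] = token
    env["APIFY_API_TOKEN"] = token
    return token
```

Calling it once before constructing any Apify component would make the rest of the example independent of which name ends up being the supported one.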
||
|
|
||
| Run the Actor, wait for it to finish, and fetch its results from the Apify dataset into a LangChain document loader. | ||
|
|
@@ -48,7 +57,7 @@ Note that if you already have some results in an Apify dataset, you can load the | |
|
|
||
| ```python | ||
| apify = ApifyWrapper() | ||
| llm = ChatOpenAI(model="gpt-4o-mini") | ||
| llm = ChatOpenAI(model="gpt-5-mini") | ||
|
|
||
| loader = apify.call_actor( | ||
| actor_id="apify/website-content-crawler", | ||
|
|
@@ -97,10 +106,10 @@ from langchain_openai import ChatOpenAI | |
| from langchain_openai.embeddings import OpenAIEmbeddings | ||
|
|
||
| os.environ["OPENAI_API_KEY"] = "Your OpenAI API key" | ||
| os.environ["APIFY_API_TOKEN"] = "Your Apify API token" | ||
| os.environ["APIFY_TOKEN"] = "Your Apify API token" | ||
|
|
||
| apify = ApifyWrapper() | ||
| llm = ChatOpenAI(model="gpt-4o-mini") | ||
| llm = ChatOpenAI(model="gpt-5-mini") | ||
|
|
||
| print("Call website content crawler ...") | ||
| loader = apify.call_actor( | ||
|
|
@@ -136,21 +145,184 @@ It provides modules you can use to build language model applications as well as | |
|
|
||
| You can use all of Apify’s Actors as document loaders in LangChain. | ||
| For example, to incorporate web browsing functionality, you can use the [RAG-Web-Browser Actor](https://apify.com/apify/rag-web-browser). | ||
| This allows you to either crawl and scrape top pages from Google Search results or directly scrape text content from a URL and return it as Markdown. | ||
| This allows you to either crawl and scrape top pages from Google Search results or directly scrape text content from a URL and return it as markdown. | ||
| To set this up, change the `actor_id` to `apify/rag-web-browser` and specify the `run_input`. | ||
|
|
||
| ```python | ||
| loader = apify.call_actor( | ||
| actor_id="apify/rag-web-browser", | ||
| run_input={"query": "apify langchain web browser", "maxResults": 3}, | ||
| dataset_mapping_function=lambda item: Document(page_content=item["text"] or "", metadata={"source": item["metadata"]["url"]}), | ||
| dataset_mapping_function=lambda item: Document(page_content=item["markdown"] or "", metadata={"source": item["metadata"]["url"]}), | ||
| ) | ||
| print("Documents:", loader.load()) | ||
| ``` | ||
|
|
||
| Similarly, you can use other Apify Actors to load data into LangChain and query the vector index. | ||
|
|
||
| <ThirdPartyDisclaimer /> | ||
| ## Use Actors as LangChain tools | ||
|
|
||
| The `ApifyWrapper` shown above loads Actor output into a vector index. For agent workflows, the `langchain-apify` package also ships dedicated tools that wrap specific Actors behind a simplified, LLM-friendly interface, so an agent can call them without knowing Actor IDs or input schemas. Each tool is a standard LangChain `BaseTool`, so it works anywhere LangChain tools do. | ||
|
|
||
| ### Choose the right tool set | ||
|
|
||
| The package provides 19 tools split across three lists, so you can bind a focused subset to an agent instead of importing everything: | ||
|
|
||
| | Tool set | Tools | Use case | | ||
| | --- | --- | --- | | ||
| | `APIFY_CORE_TOOLS` | 6 | Run any Actor or saved task, fetch dataset items, scrape a single URL to markdown | | ||
| | `APIFY_SEARCH_TOOLS` | 6 | Google Search, Google Maps, YouTube, multi-page website crawling, RAG web browsing, e-commerce | | ||
| | `APIFY_SOCIAL_TOOLS` | 7 | Instagram, LinkedIn, Twitter/X, TikTok, Facebook | | ||
|
|
||
| A model selects tools based on their names and descriptions. The more tools you register, the larger the decision space, which can lead to wrong tool selection, slower responses, and higher token usage. Register only the tools your agent needs. | ||
|
|
||
| Each list holds tool *classes*, so instantiate them before passing them to an agent. There are three ways to compose your tool list: | ||
|
|
||
| 1. Bind a whole tool set when your agent needs the full category: | ||
|
|
||
| ```python | ||
| from langchain_apify import APIFY_SEARCH_TOOLS | ||
|
|
||
| tools = [tool_cls() for tool_cls in APIFY_SEARCH_TOOLS] | ||
| ``` | ||
|
|
||
| 1. Import individual tools for tighter control: | ||
|
|
||
| ```python | ||
| from langchain_apify import ApifyScrapeUrlTool, ApifyGoogleSearchTool | ||
|
|
||
| tools = [ApifyScrapeUrlTool(), ApifyGoogleSearchTool()] | ||
| ``` | ||
|
|
||
| 1. Mix a tool set with individual tools: | ||
|
|
||
| ```python | ||
| from langchain_apify import APIFY_CORE_TOOLS, ApifyTwitterScraperTool | ||
|
|
||
| tools = [tool_cls() for tool_cls in APIFY_CORE_TOOLS] + [ApifyTwitterScraperTool()] | ||
| ``` | ||
|
|
||
| ### Call a tool directly | ||
|
|
||
| Each tool can run on its own, which is the quickest way to see what it returns. Every call runs a real Actor on the Apify platform, so it may take from seconds to minutes. Set your Apify API token and invoke the tool with its input: | ||
|
|
||
| ```python | ||
| import json | ||
| import os | ||
|
|
||
| from langchain_apify import ApifyGoogleSearchTool | ||
|
|
||
| os.environ["APIFY_TOKEN"] = "Your Apify API token" | ||
|
|
||
| tool = ApifyGoogleSearchTool() | ||
| result = tool.invoke({"query": "what is langchain", "max_results": 3}) | ||
|
|
||
| payload = json.loads(result) | ||
| print("Run status:", payload["run"]["status"]) | ||
| print("Items returned:", len(payload["items"])) | ||
| ``` | ||
|
|
||
| Most tools return a JSON string with two keys: `run` (run metadata such as `status` and `dataset_id`) and `items` (the Actor's dataset items). | ||
|
|
||
| :::note Some components return a different shape | ||
|
|
||
| `ApifyScrapeUrlTool` follows the same JSON envelope, with the scraped markdown in the single item's `content` field. `ApifySearchRetriever` and `ApifyCrawlLoader` are the exceptions: they return LangChain `Document` objects instead. | ||
|
|
||
| ::: | ||
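A small helper can make the JSON envelope easier to work with. This is a sketch based only on the shape described above (`run` with `status` and `dataset_id`, plus `items`); the `parse_tool_result` name and the sample payload are illustrative, and real runs may carry additional fields.

```python
import json


def parse_tool_result(result: str) -> tuple[dict, list]:
    """Split a tool's JSON string into (run metadata, dataset items)."""
    payload = json.loads(result)
    # Fall back to empty containers so a missing key does not raise.
    run = payload.get("run") or {}
    items = payload.get("items") or []
    return run, items


# Hand-written sample in the documented shape, not real Actor output.
sample = json.dumps(
    {
        "run": {"status": "SUCCEEDED", "dataset_id": "example-dataset-id"},
        "items": [{"content": "# Example page"}],
    }
)

run, items = parse_tool_result(sample)
print(run["status"], len(items))
```

Returning empty containers instead of raising keeps calling code simple when a run produces no items, at the cost of hiding a malformed payload; a stricter variant could raise on a missing key.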
|
|
||
| ### Give the tools to an agent | ||
|
|
||
| To let a model decide when to call the tools, bind a tool list to an agent. The example below uses LangGraph's prebuilt ReAct agent, so install it alongside the previous dependencies: | ||
|
|
||
| ```bash | ||
| pip install langgraph | ||
| ``` | ||
|
|
||
| ```python | ||
| import os | ||
|
|
||
| from langchain_apify import APIFY_SEARCH_TOOLS | ||
| from langchain_openai import ChatOpenAI | ||
| from langgraph.prebuilt import create_react_agent | ||
|
|
||
| os.environ["OPENAI_API_KEY"] = "Your OpenAI API key" | ||
| os.environ["APIFY_TOKEN"] = "Your Apify API token" | ||
|
|
||
| model = ChatOpenAI(model="gpt-5-mini") | ||
| tools = [tool_cls() for tool_cls in APIFY_SEARCH_TOOLS] | ||
| agent = create_react_agent(model, tools) | ||
|
|
||
| response = agent.invoke( | ||
| {"messages": [("human", "Search the web and tell me what Apify is.")]} | ||
| ) | ||
| print(response["messages"][-1].content) | ||
| ``` | ||
|
|
||
| ### Retrieve documents for RAG | ||
|
|
||
| For retrieval-augmented generation, `ApifySearchRetriever` returns LangChain `Document` objects directly, so it plugs into a RAG chain in place of the vector index built earlier: | ||
|
|
||
| ```python | ||
| import os | ||
|
|
||
| from langchain_apify import ApifySearchRetriever | ||
|
|
||
| os.environ["APIFY_TOKEN"] = "Your Apify API token" | ||
|
|
||
| retriever = ApifySearchRetriever(max_results=3) | ||
| docs = retriever.invoke("What is LangChain?") | ||
| print(docs) | ||
| ``` | ||
|
|
||
| To crawl a whole site into documents on demand, use `ApifyCrawlLoader` instead. | ||
|
|
||
| ## Tool reference | ||
|
|
||
| Every dedicated tool below is importable from `langchain_apify`. For full parameter tables, see the [LangChain Apify provider page](https://docs.langchain.com/oss/python/integrations/providers/apify). | ||
|
Contributor
The list of parameters is currently not on the langchain docs page. Will it be published sometime later?
Contributor
Author
Yes, I will open the LangChain docs PR once we merge the new code version.
||
|
|
||
| ### Core tools (`APIFY_CORE_TOOLS`) | ||
|
|
||
| Generic platform primitives that run any Actor or task and read datasets. | ||
|
|
||
| - `ApifyRunActorTool` - start any Actor by ID and return run metadata only (run ID, status, dataset ID). Pair with `ApifyGetDatasetItemsTool`. | ||
| - `ApifyRunActorAndGetDatasetTool` - run any Actor and return both run metadata and dataset items in one call. | ||
| - `ApifyGetDatasetItemsTool` - read items from an existing dataset by ID, with `limit` / `offset` pagination. | ||
| - `ApifyScrapeUrlTool` - scrape a single URL and return its markdown content in a JSON envelope (in the item's `content` field). | ||
| - `ApifyRunTaskTool` - run a saved [Actor task](/platform/actors/running/tasks) by ID and return run metadata. | ||
| - `ApifyRunTaskAndGetDatasetTool` - run a saved task and return both run metadata and dataset items. | ||
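The `limit` / `offset` pagination mentioned for `ApifyGetDatasetItemsTool` follows the usual pattern of requesting fixed-size pages until a short page comes back. The sketch below shows that loop against a stand-in `fetch_page` callable; how the real tool's input and output map onto it is an assumption left to the reader.

```python
from typing import Callable, Iterator


def iter_all_items(
    fetch_page: Callable[[int, int], list], page_size: int = 100
) -> Iterator[dict]:
    """Yield every item by walking the dataset page by page."""
    offset = 0
    while True:
        page = fetch_page(page_size, offset)
        yield from page
        # A page shorter than requested means the dataset is exhausted.
        if len(page) < page_size:
            break
        offset += page_size


# Stand-in for a dataset read; replace with a real call in practice.
fake_dataset = [{"id": i} for i in range(250)]


def fetch_page(limit: int, offset: int) -> list:
    return fake_dataset[offset : offset + limit]


items = list(iter_all_items(fetch_page, page_size=100))
print(len(items))
```

Stopping on a short page avoids needing a total count up front, though it costs one extra empty request when the dataset size is an exact multiple of the page size.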
|
|
||
| ### Search and crawling tools (`APIFY_SEARCH_TOOLS`) | ||
|
|
||
| Web search, crawling, and platform-specific search. | ||
|
|
||
| - `ApifyGoogleSearchTool` - structured Google Search results. Wraps [apify/google-search-scraper](https://apify.com/apify/google-search-scraper). | ||
| - `ApifyWebCrawlerTool` - crawl multiple pages of a site and return each as markdown. Wraps [apify/website-content-crawler](https://apify.com/apify/website-content-crawler). | ||
| - `ApifyRAGWebBrowserTool` - search the web and return the top results' content as JSON. Wraps [apify/rag-web-browser](https://apify.com/apify/rag-web-browser). | ||
| - `ApifyGoogleMapsTool` - Google Maps place results for a query. Wraps [compass/crawler-google-places](https://apify.com/compass/crawler-google-places). | ||
| - `ApifyYouTubeScraperTool` - search YouTube or scrape a video / channel URL. Wraps [streamers/youtube-scraper](https://apify.com/streamers/youtube-scraper). | ||
| - `ApifyEcommerceScraperTool` - extract product or category-listing data. Wraps [apify/e-commerce-scraping-tool](https://apify.com/apify/e-commerce-scraping-tool). | ||
|
|
||
| ### Social media tools (`APIFY_SOCIAL_TOOLS`) | ||
|
|
||
| Scrape posts and profiles from major social platforms. | ||
|
|
||
| - `ApifyInstagramScraperTool` - users, hashtags, posts, or comments. Wraps [apify/instagram-scraper](https://apify.com/apify/instagram-scraper). | ||
| - `ApifyLinkedInProfilePostsTool` - recent posts from a LinkedIn profile. Wraps [apimaestro/linkedin-profile-posts](https://apify.com/apimaestro/linkedin-profile-posts). | ||
| - `ApifyLinkedInProfileSearchTool` - search LinkedIn profiles by keyword. Wraps [harvestapi/linkedin-profile-search](https://apify.com/harvestapi/linkedin-profile-search). | ||
| - `ApifyLinkedInProfileDetailTool` - full detail for a single LinkedIn profile. Wraps [apimaestro/linkedin-profile-detail](https://apify.com/apimaestro/linkedin-profile-detail). | ||
| - `ApifyTwitterScraperTool` - tweets via search, a user's timeline, or replies. Wraps [apidojo/twitter-scraper-lite](https://apify.com/apidojo/twitter-scraper-lite). | ||
| - `ApifyTikTokScraperTool` - TikTok by search, user, hashtag, or video URL. Wraps [clockworks/tiktok-scraper](https://apify.com/clockworks/tiktok-scraper). | ||
| - `ApifyFacebookPostsScraperTool` - posts from a public Facebook page. Wraps [apify/facebook-posts-scraper](https://apify.com/apify/facebook-posts-scraper). | ||
|
|
||
| ### Run any other Actor | ||
|
|
||
| For Actors without a dedicated tool, use `ApifyActorsTool`. Construct it with the Actor ID; it builds an input schema from the Actor and accepts a `run_input` dict matching that Actor's input: | ||
|
|
||
| ```python | ||
| from langchain_apify import ApifyActorsTool | ||
|
|
||
| tool = ApifyActorsTool("apify/rag-web-browser") | ||
| result = tool.invoke({"run_input": {"query": "latest AI news", "maxResults": 3}}) | ||
| ``` | ||
|
|
||
| ## Resources | ||
|
|
||
|
|
||