diff --git a/.gitignore b/.gitignore index 5b50a45..21d8177 100644 --- a/.gitignore +++ b/.gitignore @@ -38,3 +38,11 @@ python/README_files python/README.html mall.Rproj .coverage + +**/*.quarto_ipynb +.venv-site/ +**/__pycache__/ +**/uv.lock +uv.lock +.claude/settings.local.json +python/uv.lock diff --git a/_freeze/index/execute-results/html.json b/_freeze/index/execute-results/html.json index 667ab27..0063f24 100644 --- a/_freeze/index/execute-results/html.json +++ b/_freeze/index/execute-results/html.json @@ -2,7 +2,7 @@ "hash": "4816ce24551f730854f875c468b5c495", "result": { "engine": "knitr", - "markdown": "---\nformat:\n html:\n toc: true\nexecute:\n eval: true\n freeze: true\n---\n\n\n\n\n\n\n\n\n\n[![PyPi](https://img.shields.io/pypi/v/mlverse-mall)](https://pypi.org/project/mlverse-mall/) [![Python tests](https://github.com/mlverse/mall/actions/workflows/python-tests.yaml/badge.svg)](https://github.com/mlverse/mall/actions/workflows/python-tests.yaml) \\| \"CRAN [![R check](https://github.com/mlverse/mall/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/mlverse/mall/actions/workflows/R-CMD-check.yaml) \\| [![Package coverage](https://codecov.io/gh/mlverse/mall/branch/main/graph/badge.svg)](https://app.codecov.io/gh/mlverse/mall?branch=main)\n\n\n\nUse Large Language Models (LLM) to run Natural Language Processing (NLP) \noperations against your data. It takes advantage of the LLMs general language\ntraining in order to get the predictions, thus removing the need to train a new\nNLP model. `mall` is available for R and Python.\n\nIt works by running multiple LLM predictions against your data. The predictions\nare processed row-wise over a specified column. It relies on the \"one-shot\" \nprompt technique to instruct the LLM on a particular NLP operation to perform. 
\nThe package includes prompts to perform the following specific NLP operations:\n\n- [Sentiment analysis](#sentiment)\n- [Text summarizing](#summarize)\n- [Classify text](#classify)\n- [Extract one, or several](#extract), specific pieces information from the text\n- [Translate text](#translate)\n- [Verify that something is true](#verify) about the text (binary)\n\nFor other NLP operations, `mall` offers the ability for you to [write your own prompt](#custom-prompt).\n\n\n\nIn **R** The functions inside `mall` are designed to easily work with piped \ncommands, such as `dplyr`.\n\n``` r\nreviews |>\n llm_sentiment(review)\n```\n\n\n\nIn **Python**, `mall` is a library extension to [Polars](https://pola.rs/).\n\n``` python\nreviews.llm.sentiment(\"review\")\n```\n\n## Motivation\n\nWe want to new find new ways to help data scientists use LLMs in their daily work.\nUnlike the familiar interfaces, such as chatting and code completion, this \ninterface runs your text data directly against the LLM. This package is inspired\nby the SQL AI functions now offered by vendors such as [Databricks](https://docs.databricks.com/en/large-language-models/ai-functions.html) \nand Snowflake. \n\nThe LLM's flexibility, allows for it to adapt to the subject of your data, and\nprovide surprisingly accurate predictions. This saves the data scientist the \nneed to write and tune an NLP model.\n\nIn recent times, the capabilities of LLMs that can run locally in your computer \nhave increased dramatically. This means that these sort of analysis can run in \nyour machine with good accuracy. It also makes it possible to take \nadvantage of LLMs at your institution, since the data will not leave the \ncorporate network. Additionally, LLM management and integration platforms, such\nas [Ollama](https://ollama.com/), are now very easy to setup and use. 
`mall`\nuses Ollama as to interact with local LLMs.\n\nIn its latest version, `mall` lets you **use external LLMs such as\n[OpenAI](https://openai.com/), [Gemini](https://gemini.google.com/) and\n[Anthropic](https://www.anthropic.com/)**. In R, `mall` uses the\n[`ellmer`](https://ellmer.tidyverse.org/index.html)\npackage to integrate with the external LLM, and the \n[`chatlas`](https://posit-dev.github.io/chatlas/) package to integrate in Python.\n\n## Install `mall` {#get-started}\n\nInstall the package to get started:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\nOfficial version from CRAN:\n\n``` r\ninstall.packages(\"mall\")\n```\n\nDevelopment version from GitHub *(required for remote LLM integration)*:\n\n``` r\npak::pak(\"mlverse/mall/r\")\n```\n\n## Python\n\nOfficial version from PyPi:\n\n``` python\npip install mlverse-mall\n```\n\nDevelopment version from GitHub:\n\n``` python\npip install \"mlverse-mall @ git+https://git@github.com/mlverse/mall.git#subdirectory=python\"\n```\n:::\n\n## Setup the LLM\n\nChoose one of the two following options to setup LLM connectivity:\n\n### Local LLMs, via Ollama {#local-llms}\n\n- [Download Ollama from the official website](https://ollama.com/download)\n\n- Install and start Ollama in your computer\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n- Install Ollama in your machine. The `ollamar` package's website provides this \n[Installation guide](https://hauselin.github.io/ollama-r/#installation)\n\n- Download an LLM model. For example, I have been developing this package using \nLlama 3.2 to test. To get that model you can run:\n\n ``` r\n ollamar::pull(\"llama3.2\")\n ```\n\n## Python\n\n- Install the official Ollama library\n\n ``` python\n pip install ollama\n ```\n\n- Download an LLM model. For example, I have been developing this package\nusing Llama 3.2 to test. 
To get that model you can run:\n\n ``` python\n import ollama\n ollama.pull('llama3.2')\n ```\n:::\n\n### Remote LLMs {#remote-llms}\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n`mall` uses the `ellmer` package as the integration point to the LLM. This package supports multiple providers such as OpenAI, Anthropic, Google Gemini, etc.\n\n- Install `ellmer`\n\n ``` r\n install.packages(\"ellmer\")\n ```\n\n- Refer to `ellmer`'s documentation to find out how to setup the connections with your selected provider: \n\n- Let `mall` know which `ellmer` object to use during the R session. To do this, call `llm_use()`. Here is an example of using OpenAI:\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n library(mall)\n library(ellmer)\n chat <- chat_openai()\n #> Using model = \"gpt-4.1\".\n llm_use(chat)\n #> \n #> ── mall session object \n #> Backend: ellmerLLM session: model:gpt-4.1R session:\n #> cache_folder:/var/folders/y_/f_0cx_291nl0s8h26t4jg6ch0000gp/T//RtmpmtAm72/_mall_cache14c3f6e10b6db\n ```\n :::\n\n\n**Set a default LLM for your R session**\n\nAs a convenience, `mall` is able to automatically establish a connection with the\nLLM at the beginning o R session. To do this you can use the `.mall_chat` option:\n\n```r\noptions(.mall_chat = ellmer::chat_openai(model = \"gpt-4o\"))\n```\n\nAdd this line to your *.Rprofile* file in order for that code to run every time\nyou start R. You can call `usethis::edit_r_profile()` to open your .Rprofile\nfile so you can add the option. \n\n## Python\n\n`mall` uses the `chatlas` package as the integration point to the LLM. This \npackage supports multiple providers such as OpenAI, Anthropic, Google Gemini, etc.\n\n- Install the `chatlas` library\n\n ``` python\n pip install chatlas\n ```\n\n- Refer to `chatlas`'s documentation to find out how to setup the connections\nwith your selected provider: \n\n- Let `mall` know which `chatlas` object to use during the Python session. \nTo do this, call `llm_use()`. 
Here is an example of using OpenAI:\n\n ``` python\n import mall\n from chatlas import ChatOpenAI\n\n chat = ChatOpenAI()\n\n data = mall.MallData\n reviews = data.reviews\n\n reviews.llm.use(chat)\n ```\n:::\n\n## LLM functions\n\nWe will start with loading a very small data set contained in `mall`. It has \n3 product reviews that we will use as the source of our examples.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(mall)\ndata(\"reviews\")\n\nreviews\n#> # A tibble: 3 × 1\n#> review \n#> \n#> 1 This has been the best TV I've ever used. Great screen, and sound. \n#> 2 I regret buying this laptop. It is too slow and the keyboard is too noisy \n#> 3 Not sure how to feel about my new washing machine. Great color, but hard to f…\n```\n:::\n\n\n## Python\n\n\n\n\n::: {.cell}\n\n```{.python .cell-code}\nimport mall \ndata = mall.MallData\nreviews = data.reviews\n\nreviews \n```\n\n::: {.cell-output-display}\n\n```{=html}\n
\n
review
"This has been the best TV I've ever used. Great screen, and sound."
"I regret buying this laptop. It is too slow and the keyboard is too noisy"
"Not sure how to feel about my new washing machine. Great color, but hard to figure"
\n```\n\n:::\n:::\n\n:::\n\n### Sentiment {#sentiment}\n\nAutomatically returns \"positive\", \"negative\", or \"neutral\" based on the text.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n::: {.cell}\n\n```{.r .cell-code}\nreviews |>\n llm_sentiment(review)\n#> # A tibble: 3 × 2\n#> review .sentiment\n#> \n#> 1 This has been the best TV I've ever used. Great screen, and sound. positive \n#> 2 I regret buying this laptop. It is too slow and the keyboard is to… negative \n#> 3 Not sure how to feel about my new washing machine. Great color, bu… neutral\n```\n:::\n\n\nFor more information and examples visit this function's [R reference page](reference/llm_sentiment.qmd)\n\n## Python\n\n\n::: {.cell}\n\n```{.python .cell-code}\nreviews.llm.sentiment(\"review\")\n```\n\n::: {.cell-output-display}\n\n```{=html}\n
\n
reviewsentiment
"This has been the best TV I've ever used. Great screen, and sound.""positive"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""negative"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""neutral"
\n```\n\n:::\n:::\n\n\nFor more information and examples visit this function's [Python reference page](reference/MallFrame.qmd#mall.MallFrame.sentiment)\n:::\n\n### Summarize {#summarize}\n\nThere may be a need to reduce the number of words in a given text. Typically to \nmake it easier to understand its intent. The function has an argument to control \nthe maximum number of words to output (`max_words`):\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n::: {.cell}\n\n```{.r .cell-code}\nreviews |>\n llm_summarize(review, max_words = 5)\n#> # A tibble: 3 × 2\n#> review .summary \n#> \n#> 1 This has been the best TV I've ever used. Gr… this tv is excellent quality\n#> 2 I regret buying this laptop. It is too slow … i regret my laptop purchase \n#> 3 Not sure how to feel about my new washing ma… confused about the purchase\n```\n:::\n\n\nFor more information and examples visit this function's [R reference page](reference/llm_summarize.qmd)\n\n## Python\n\n\n::: {.cell}\n\n```{.python .cell-code}\nreviews.llm.summarize(\"review\", 5)\n```\n\n::: {.cell-output-display}\n\n```{=html}\n
\n
reviewsummary
"This has been the best TV I've ever used. Great screen, and sound.""it's a great tv set"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""laptop purchase was not wise"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""mixed feelings about new appliance"
\n```\n\n:::\n:::\n\n\nFor more information and examples visit this function's [Python reference page](reference/MallFrame.qmd#mall.MallFrame.summarize)\n:::\n\n### Classify {#classify}\n\nUse the LLM to categorize the text into one of the options you provide:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n::: {.cell}\n\n```{.r .cell-code}\nreviews |>\n llm_classify(review, c(\"appliance\", \"computer\"))\n#> # A tibble: 3 × 2\n#> review .classify\n#> \n#> 1 This has been the best TV I've ever used. Gr… computer \n#> 2 I regret buying this laptop. It is too slow … computer \n#> 3 Not sure how to feel about my new washing ma… appliance\n```\n:::\n\n\nFor more information and examples visit this function's [R reference page](reference/llm_classify.qmd)\n\n## Python\n\n\n::: {.cell}\n\n```{.python .cell-code}\nreviews.llm.classify(\"review\", [\"computer\", \"appliance\"])\n```\n\n::: {.cell-output-display}\n\n```{=html}\n
\n
reviewclassify
"This has been the best TV I've ever used. Great screen, and sound.""appliance"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""computer"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""appliance"
\n```\n\n:::\n:::\n\n\nFor more information and examples visit this function's [Python reference page](reference/MallFrame.qmd#mall.MallFrame.classify)\n:::\n\n### Extract {#extract}\n\nOne of the most interesting use cases Using natural language, we can tell the \nLLM to return a specific part of the text. In the following example, we request \nthat the LLM return the product being referred to. We do this by simply saying \n\"product\". The LLM understands what we *mean* by that word, and looks for that \nin the text.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n::: {.cell}\n\n```{.r .cell-code}\nreviews |>\n llm_extract(review, \"product\")\n#> # A tibble: 3 × 2\n#> review .extract \n#> \n#> 1 This has been the best TV I've ever used. Gr… tv \n#> 2 I regret buying this laptop. It is too slow … laptop \n#> 3 Not sure how to feel about my new washing ma… washing machine\n```\n:::\n\n\nFor more information and examples visit this function's [R reference page](reference/llm_extract.qmd)\n\n## Python\n\n\n::: {.cell}\n\n```{.python .cell-code}\nreviews.llm.extract(\"review\", \"product\")\n```\n\n::: {.cell-output-display}\n\n```{=html}\n
\n
reviewextract
"This has been the best TV I've ever used. Great screen, and sound.""tv"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""laptop"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""washing machine"
\n```\n\n:::\n:::\n\n\nFor more information and examples visit this function's [Python reference page](reference/MallFrame.qmd#mall.MallFrame.extract)\n:::\n\n### Verify {#verify}\n\nThis functions allows you to check and see if a statement is true, based on the\nprovided text. By default, it will return a 1 for \"yes\", and 0 for \"no\". This \ncan be customized.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n::: {.cell}\n\n```{.r .cell-code}\nreviews |>\n llm_verify(review, \"is the customer happy with the purchase\")\n#> # A tibble: 3 × 2\n#> review .verify\n#> \n#> 1 This has been the best TV I've ever used. Great screen, and sound. 1 \n#> 2 I regret buying this laptop. It is too slow and the keyboard is too n… 0 \n#> 3 Not sure how to feel about my new washing machine. Great color, but h… 0\n```\n:::\n\n\nFor more information and examples visit this function's [R reference page](reference/llm_verify.qmd)\n\n## Python\n\n\n::: {.cell}\n\n```{.python .cell-code}\nreviews.llm.verify(\"review\", \"is the customer happy with the purchase\")\n```\n\n::: {.cell-output-display}\n\n```{=html}\n
\n
reviewverify
"This has been the best TV I've ever used. Great screen, and sound."1
"I regret buying this laptop. It is too slow and the keyboard is too noisy"0
"Not sure how to feel about my new washing machine. Great color, but hard to figure"0
\n```\n\n:::\n:::\n\n\nFor more information and examples visit this function's [Python reference page](reference/MallFrame.qmd#mall.MallFrame.verify)\n:::\n\n### Translate {#translate}\n\nAs the title implies, this function will translate the text into a specified \nlanguage. What is really nice, it is that you don't need to specify the language\nof the source text. Only the target language needs to be defined. The \ntranslation accuracy will depend on the LLM\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n::: {.cell}\n\n```{.r .cell-code}\nreviews |>\n llm_translate(review, \"spanish\")\n#> # A tibble: 3 × 2\n#> review .translation \n#> \n#> 1 This has been the best TV I've ever used. Gr… Esta ha sido la mejor televisió…\n#> 2 I regret buying this laptop. It is too slow … Lo lamento comprar este portáti…\n#> 3 Not sure how to feel about my new washing ma… No estoy seguro de cómo sentirm…\n```\n:::\n\n\nFor more information and examples visit this function's [R reference page](reference/llm_translate.qmd)\n\n## Python\n\n\n::: {.cell}\n\n```{.python .cell-code}\nreviews.llm.translate(\"review\", \"spanish\")\n```\n\n::: {.cell-output-display}\n\n```{=html}\n
\n
reviewtranslation
"This has been the best TV I've ever used. Great screen, and sound.""Esto ha sido la mejor televisión que he utilizado jamás. Pantalla y sonido excelentes."
"I regret buying this laptop. It is too slow and the keyboard is too noisy""Lamento haber comprado este portátil. Es demasiado lento y la tecla de espacio es demasiado ruidosa."
"Not sure how to feel about my new washing machine. Great color, but hard to figure""No estoy seguro de cómo sentirme con mi nueva lavadora. Me gusta mucho el color, pero no sé cómo fun…
\n```\n\n:::\n:::\n\n\nFor more information and examples visit this function's [Python reference page](reference/MallFrame.qmd#mall.MallFrame.translate)\n:::\n\n### Custom prompt {#custom-prompt}\n\nIt is possible to pass your own prompt to the LLM, and have `mall` run it \nagainst each text entry:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmy_prompt <- paste(\n \"Answer a question.\",\n \"Return only the answer, no explanation\",\n \"Acceptable answers are 'yes', 'no'\",\n \"Answer this about the following text, is this a happy customer?:\"\n)\n\nreviews |>\n llm_custom(review, my_prompt)\n#> # A tibble: 3 × 2\n#> review .pred\n#> \n#> 1 This has been the best TV I've ever used. Great screen, and sound. No \n#> 2 I regret buying this laptop. It is too slow and the keyboard is too noi… No \n#> 3 Not sure how to feel about my new washing machine. Great color, but har… No\n```\n:::\n\n\nFor more information and examples visit this function's [R reference page](reference/llm_custom.qmd)\n\n## Python\n\n\n::: {.cell}\n\n```{.python .cell-code}\nmy_prompt = (\n \"Answer a question.\"\n \"Return only the answer, no explanation\"\n \"Acceptable answers are 'yes', 'no'\"\n \"Answer this about the following text, is this a happy customer?:\"\n)\n\nreviews.llm.custom(\"review\", prompt = my_prompt)\n```\n\n::: {.cell-output-display}\n\n```{=html}\n
\n
reviewcustom
"This has been the best TV I've ever used. Great screen, and sound.""Yes"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""No"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""No"
\n```\n\n:::\n:::\n\n\nFor more information and examples visit this function's [Python reference page](reference/MallFrame.qmd#mall.MallFrame.custom)\n:::\n\n## Model selection and settings\n\n#### Local LLMs via Ollama {#settings-local}\n\nYou can set the model and its options to use when calling the LLM. In this case,\nwe refer to options as model specific things that can be set, such as seed or \ntemperature.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\nInvoking an `llm` function will automatically initialize a model selection if \nyou don't have one selected yet. If there is only one option, it will pre-select\nit for you. If there are more than one available models, then `mall` will \npresent you as menu selection so you can select which model you wish to use.\n\nCalling `llm_use()` directly will let you specify the model and backend to use.\nYou can also setup additional arguments that will be passed down to the function\nthat actually runs the prediction. In the case of Ollama, that function is [`chat()`](https://hauselin.github.io/ollama-r/reference/chat.html).\n\nThe model to use, and other options can be set for the current R session\n\n\n::: {.cell}\n\n```{.r .cell-code}\nllm_use(\"ollama\", \"llama3.2\", seed = 100, temperature = 0)\n```\n:::\n\n\n## Python\n\nThe model and options to be used will be defined at the Polars data frame object \nlevel. If not passed, the default model will be **llama3.2**.\n\n\n::: {.cell}\n\n```{.python .cell-code}\nreviews.llm.use(\"ollama\", \"llama3.2\", options = dict(seed = 100))\n```\n:::\n\n:::\n\n#### Remote LLMs\n\nThe provider and model selection will be based on the chat object you create. 
\nAny model related setting, such as temperature, seed and others, should be\nset at the time of the object creation as well.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(mall)\nlibrary(ellmer)\nchat <- chat_openai(model = \"gpt-4o\", seed = 100)\nllm_use(chat)\n```\n:::\n\n\n## Python\n\n\n::: {.cell}\n\n```{.python .cell-code}\nimport mall\nfrom chatlas import ChatOpenAI\nchat = ChatOpenAI(model = \"gpt-4o\", seed= 100)\ndata = mall.MallData\nreviews = data.reviews\nreviews.llm.use(chat)\n```\n:::\n\n:::\n\n\n## Results caching\n\nBy default `mall` caches the requests and corresponding results from a given \nLLM run. Each response is saved as individual JSON files. By default, the folder\nname is `_mall_cache`. The folder name can be customized, if needed. Also, the\ncaching can be turned off by setting the argument to empty (`\"\"`).\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n::: {.cell}\n\n```{.r .cell-code}\nllm_use(.cache = \"_my_cache\")\n```\n:::\n\n\nTo turn off:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nllm_use(.cache = \"\")\n```\n:::\n\n\n## Python\n\n\n::: {.cell}\n\n```{.python .cell-code}\nreviews.llm.use(_cache = \"my_cache\")\n```\n:::\n\n\nTo turn off:\n\n\n::: {.cell}\n\n```{.python .cell-code}\nreviews.llm.use(_cache = \"\")\n```\n:::\n\n:::\n\nFor more information see the [Caching Results](articles/caching.qmd) article.\n\n## Key considerations\n\nThe main consideration is **cost**. Either, time cost, or money cost.\n\nIf using this method with an LLM locally available, the cost will be a long \nrunning time. Unless using a very specialized LLM, a given LLM is a general \nmodel. It was fitted using a vast amount of data. So determining a response for\neach row, takes longer than if using a manually created NLP model. 
The default\nmodel used in Ollama is [Llama 3.2](https://ollama.com/library/llama3.2), which \nwas fitted using 3B parameters.\n\nIf using an external LLM service, the consideration will need to be for the \nbilling costs of using such service. Keep in mind that you will be sending a \nlot of data to be evaluated.\n\nAnother consideration is the novelty of this approach. Early tests are providing\nencouraging results. But you, as an user, will still need to keep in mind that \nthe predictions will not be infallible, so always check the output. At this time,\nI think the best use for this method, is for a quick analysis.\n\n## Vector functions\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n`mall` includes functions that expect a vector, instead of a table, to run the\npredictions. This should make it easier to test things, such as custom prompts\nor results of specific text. Each `llm_` function has a corresponding `llm_vec_`\nfunction:\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nllm_vec_sentiment(\"I am happy\")\n#> [1] \"positive\"\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nllm_vec_translate(\"Este es el mejor dia!\", \"english\")\n#> [1] \"It's the best day!\"\n```\n:::\n\n\n## Python \n\n`mall` is also able to process vectors contained in a `list` object. This allows\nus to avoid having to convert a list of texts without having to first convert\nthem into a single column data frame. 
To use, initialize a new `LLMVec` class\nobject with either an Ollama model, or a `chatlas` `Chat` object, and then\naccess the same NLP functions as the Polars extension.\n\n\n::: {.cell}\n\n```{.python .cell-code}\n# Initialize a Chat object\nfrom chatlas import ChatOllama\nchat = ChatOllama(model = \"llama3.2\")\n\n# Pass it to a new LLMVec\nfrom mall import LLMVec\nllm = LLMVec(chat) \n```\n:::\n\n\nAccess the functions via the new LLMVec object, and pass the text to be processed.\n\n\n::: {.cell}\n\n```{.python .cell-code}\nllm.sentiment([\"I am happy\", \"I am sad\"])\n#> ['positive', 'negative']\n```\n:::\n\n\n\n::: {.cell}\n\n```{.python .cell-code}\nllm.translate([\"Este es el mejor dia!\"], \"english\")\n#> ['This is the best day!']\n```\n:::\n\n\nFor more information visit the reference page: [LLMVec](reference/LLMVec.qmd)\n:::\n\n", + "markdown": "---\nformat:\n html:\n toc: true\nexecute:\n eval: true\n freeze: true\n---\n\n\n\n\n\n\n\n\n\n[![PyPi](https://img.shields.io/pypi/v/mlverse-mall)](https://pypi.org/project/mlverse-mall/) [![Python tests](https://github.com/mlverse/mall/actions/workflows/python-tests.yaml/badge.svg)](https://github.com/mlverse/mall/actions/workflows/python-tests.yaml) \\| \"CRAN [![R check](https://github.com/mlverse/mall/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/mlverse/mall/actions/workflows/R-CMD-check.yaml) \\| [![Package coverage](https://codecov.io/gh/mlverse/mall/branch/main/graph/badge.svg)](https://app.codecov.io/gh/mlverse/mall?branch=main)\n\n\n\nUse Large Language Models (LLM) to run Natural Language Processing (NLP) \noperations against your data. It takes advantage of the LLMs general language\ntraining in order to get the predictions, thus removing the need to train a new\nNLP model. `mall` is available for R and Python.\n\nIt works by running multiple LLM predictions against your data. The predictions\nare processed row-wise over a specified column. 
It relies on the \"one-shot\" \nprompt technique to instruct the LLM on a particular NLP operation to perform. \nThe package includes prompts to perform the following specific NLP operations:\n\n- [Sentiment analysis](#sentiment)\n- [Text summarizing](#summarize)\n- [Classify text](#classify)\n- [Extract one, or several](#extract), specific pieces information from the text\n- [Translate text](#translate)\n- [Verify that something is true](#verify) about the text (binary)\n\nFor other NLP operations, `mall` offers the ability for you to [write your own prompt](#custom-prompt).\n\n\n\nIn **R** The functions inside `mall` are designed to easily work with piped \ncommands, such as `dplyr`.\n\n``` r\nreviews |>\n llm_sentiment(review)\n```\n\n\n\nIn **Python**, `mall` is a library extension to [Polars](https://pola.rs/).\n\n``` python\nreviews.llm.sentiment(\"review\")\n```\n\n## Motivation\n\nWe want to new find new ways to help data scientists use LLMs in their daily work.\nUnlike the familiar interfaces, such as chatting and code completion, this \ninterface runs your text data directly against the LLM. This package is inspired\nby the SQL AI functions now offered by vendors such as [Databricks](https://docs.databricks.com/en/large-language-models/ai-functions.html) \nand Snowflake. \n\nThe LLM's flexibility, allows for it to adapt to the subject of your data, and\nprovide surprisingly accurate predictions. This saves the data scientist the \nneed to write and tune an NLP model.\n\nIn recent times, the capabilities of LLMs that can run locally in your computer \nhave increased dramatically. This means that these sort of analysis can run in \nyour machine with good accuracy. It also makes it possible to take \nadvantage of LLMs at your institution, since the data will not leave the \ncorporate network. Additionally, LLM management and integration platforms, such\nas [Ollama](https://ollama.com/), are now very easy to setup and use. 
`mall`\nuses Ollama as to interact with local LLMs.\n\nIn its latest version, `mall` lets you **use external LLMs such as\n[OpenAI](https://openai.com/), [Gemini](https://gemini.google.com/) and\n[Anthropic](https://www.anthropic.com/)**. In R, `mall` uses the\n[`ellmer`](https://ellmer.tidyverse.org/index.html)\npackage to integrate with the external LLM, and the \n[`chatlas`](https://posit-dev.github.io/chatlas/) package to integrate in Python.\n\n## Install `mall` {#get-started}\n\nInstall the package to get started:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\nOfficial version from CRAN:\n\n``` r\ninstall.packages(\"mall\")\n```\n\nDevelopment version from GitHub *(required for remote LLM integration)*:\n\n``` r\npak::pak(\"mlverse/mall/r\")\n```\n\n## Python\n\nOfficial version from PyPi:\n\n``` python\npip install mlverse-mall\n```\n\nDevelopment version from GitHub:\n\n``` python\npip install \"mlverse-mall @ git+https://git@github.com/mlverse/mall.git#subdirectory=python\"\n```\n:::\n\n## Setup the LLM\n\nChoose one of the two following options to setup LLM connectivity:\n\n### Local LLMs, via Ollama {#local-llms}\n\n- [Download Ollama from the official website](https://ollama.com/download)\n\n- Install and start Ollama in your computer\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n- Install Ollama in your machine. The `ollamar` package's website provides this \n[Installation guide](https://hauselin.github.io/ollama-r/#installation)\n\n- Download an LLM model. For example, I have been developing this package using \nLlama 3.2 to test. To get that model you can run:\n\n ``` r\n ollamar::pull(\"llama3.2\")\n ```\n\n## Python\n\n- Install the official Ollama library\n\n ``` python\n pip install ollama\n ```\n\n- Download an LLM model. For example, I have been developing this package\nusing Llama 3.2 to test. 
To get that model you can run:\n\n ``` python\n import ollama\n ollama.pull('llama3.2')\n ```\n:::\n\n### Remote LLMs {#remote-llms}\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n`mall` uses the `ellmer` package as the integration point to the LLM. This package supports multiple providers such as OpenAI, Anthropic, Google Gemini, etc.\n\n- Install `ellmer`\n\n ``` r\n install.packages(\"ellmer\")\n ```\n\n- Refer to `ellmer`'s documentation to find out how to setup the connections with your selected provider: \n\n- Let `mall` know which `ellmer` object to use during the R session. To do this, call `llm_use()`. Here is an example of using OpenAI:\n\n\n ::: {.cell}\n \n ```{.r .cell-code}\n library(mall)\n library(ellmer)\n chat <- chat_openai()\n #> Using model = \"gpt-4.1\".\n llm_use(chat)\n #> \n #> ── mall session object \n #> Backend: ellmerLLM session: model:gpt-4.1R session:\n #> cache_folder:/var/folders/y_/f_0cx_291nl0s8h26t4jg6ch0000gp/T//RtmpBMwtSu/_mall_cache41d836dbb0bd\n ```\n :::\n\n\n**Set a default LLM for your R session**\n\nAs a convenience, `mall` is able to automatically establish a connection with the\nLLM at the beginning o R session. To do this you can use the `.mall_chat` option:\n\n```r\noptions(.mall_chat = ellmer::chat_openai(model = \"gpt-4o\"))\n```\n\nAdd this line to your *.Rprofile* file in order for that code to run every time\nyou start R. You can call `usethis::edit_r_profile()` to open your .Rprofile\nfile so you can add the option. \n\n## Python\n\n`mall` uses the `chatlas` package as the integration point to the LLM. This \npackage supports multiple providers such as OpenAI, Anthropic, Google Gemini, etc.\n\n- Install the `chatlas` library\n\n ``` python\n pip install chatlas\n ```\n\n- Refer to `chatlas`'s documentation to find out how to setup the connections\nwith your selected provider: \n\n- Let `mall` know which `chatlas` object to use during the Python session. \nTo do this, call `llm_use()`. 
Here is an example of using OpenAI:\n\n ``` python\n import mall\n from chatlas import ChatOpenAI\n\n chat = ChatOpenAI()\n\n data = mall.MallData\n reviews = data.reviews\n\n reviews.llm.use(chat)\n ```\n:::\n\n## LLM functions\n\nWe will start with loading a very small data set contained in `mall`. It has \n3 product reviews that we will use as the source of our examples.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(mall)\ndata(\"reviews\")\n\nreviews\n#> # A tibble: 3 × 1\n#> review \n#> \n#> 1 This has been the best TV I've ever used. Great screen, and sound. \n#> 2 I regret buying this laptop. It is too slow and the keyboard is too noisy \n#> 3 Not sure how to feel about my new washing machine. Great color, but hard to f…\n```\n:::\n\n\n## Python\n\n\n\n\n::: {.cell}\n\n```{.python .cell-code}\nimport mall \ndata = mall.MallData\nreviews = data.reviews\n\nreviews \n```\n\n::: {.cell-output-display}\n\n```{=html}\n
\n
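<table>
<thead><tr><th>review</th></tr></thead>
<tbody>
<tr><td>"This has been the best TV I've ever used. Great screen, and sound."</td></tr>
<tr><td>"I regret buying this laptop. It is too slow and the keyboard is too noisy"</td></tr>
<tr><td>"Not sure how to feel about my new washing machine. Great color, but hard to figure"</td></tr>
</tbody>
</table>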
\n```\n\n:::\n:::\n\n:::\n\n### Sentiment {#sentiment}\n\nAutomatically returns \"positive\", \"negative\", or \"neutral\" based on the text.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n::: {.cell}\n\n```{.r .cell-code}\nreviews |>\n llm_sentiment(review)\n#> # A tibble: 3 × 2\n#> review .sentiment\n#> \n#> 1 This has been the best TV I've ever used. Great screen, and sound. positive \n#> 2 I regret buying this laptop. It is too slow and the keyboard is to… negative \n#> 3 Not sure how to feel about my new washing machine. Great color, bu… negative\n```\n:::\n\n\nFor more information and examples visit this function's [R reference page](reference/llm_sentiment.qmd)\n\n## Python\n\n\n::: {.cell}\n\n```{.python .cell-code}\nreviews.llm.sentiment(\"review\")\n```\n\n::: {.cell-output-display}\n\n```{=html}\n
\n
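<table>
<thead><tr><th>review</th><th>sentiment</th></tr></thead>
<tbody>
<tr><td>"This has been the best TV I've ever used. Great screen, and sound."</td><td>"positive"</td></tr>
<tr><td>"I regret buying this laptop. It is too slow and the keyboard is too noisy"</td><td>"negative"</td></tr>
<tr><td>"Not sure how to feel about my new washing machine. Great color, but hard to figure"</td><td>"positive"</td></tr>
</tbody>
</table>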
\n```\n\n:::\n:::\n\n\nFor more information and examples visit this function's [Python reference page](reference/MallFrame.qmd#mall.MallFrame.sentiment)\n:::\n\n### Summarize {#summarize}\n\nYou may need to reduce the number of words in a given text, typically to \nmake its intent easier to understand. The function has an argument to control \nthe maximum number of words to output (`max_words`):\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n::: {.cell}\n\n```{.r .cell-code}\nreviews |>\n llm_summarize(review, max_words = 5)\n#> # A tibble: 3 × 2\n#> review .summary \n#> \n#> 1 This has been the best TV I've ever used. Gr… the tv is excellent quality \n#> 2 I regret buying this laptop. It is too slow … i made a bad purchase \n#> 3 Not sure how to feel about my new washing ma… having mixed feelings about it\n```\n:::\n\n\nFor more information and examples visit this function's [R reference page](reference/llm_summarize.qmd)\n\n## Python\n\n\n::: {.cell}\n\n```{.python .cell-code}\nreviews.llm.summarize(\"review\", 5)\n```\n\n::: {.cell-output-display}\n\n```{=html}\n
\n
<table>
<thead><tr><th>review</th><th>summary</th></tr></thead>
<tbody>
<tr><td>"This has been the best TV I've ever used. Great screen, and sound."</td><td>"best tv i've ever had"</td></tr>
<tr><td>"I regret buying this laptop. It is too slow and the keyboard is too noisy"</td><td>"it's not the best purchase"</td></tr>
<tr><td>"Not sure how to feel about my new washing machine. Great color, but hard to figure"</td><td>"uncertain about its features"</td></tr>
</tbody>
</table>
\n```\n\n:::\n:::\n\n\nFor more information and examples visit this function's [Python reference page](reference/MallFrame.qmd#mall.MallFrame.summarize)\n:::\n\n### Classify {#classify}\n\nUse the LLM to categorize the text into one of the options you provide:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n::: {.cell}\n\n```{.r .cell-code}\nreviews |>\n llm_classify(review, c(\"appliance\", \"computer\"))\n#> # A tibble: 3 × 2\n#> review .classify\n#> \n#> 1 This has been the best TV I've ever used. Gr… computer \n#> 2 I regret buying this laptop. It is too slow … computer \n#> 3 Not sure how to feel about my new washing ma… appliance\n```\n:::\n\n\nFor more information and examples visit this function's [R reference page](reference/llm_classify.qmd)\n\n## Python\n\n\n::: {.cell}\n\n```{.python .cell-code}\nreviews.llm.classify(\"review\", [\"computer\", \"appliance\"])\n```\n\n::: {.cell-output-display}\n\n```{=html}\n
\n
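<table>
<thead><tr><th>review</th><th>classify</th></tr></thead>
<tbody>
<tr><td>"This has been the best TV I've ever used. Great screen, and sound."</td><td>"appliance"</td></tr>
<tr><td>"I regret buying this laptop. It is too slow and the keyboard is too noisy"</td><td>"appliance"</td></tr>
<tr><td>"Not sure how to feel about my new washing machine. Great color, but hard to figure"</td><td>"appliance"</td></tr>
</tbody>
</table>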
\n```\n\n:::\n:::\n\n\nFor more information and examples visit this function's [Python reference page](reference/MallFrame.qmd#mall.MallFrame.classify)\n:::\n\n### Extract {#extract}\n\nThis is one of the most interesting use cases. Using natural language, we can tell the \nLLM to return a specific part of the text. In the following example, we request \nthat the LLM return the product being referred to. We do this by simply saying \n\"product\". The LLM understands what we *mean* by that word, and looks for that \nin the text.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n::: {.cell}\n\n```{.r .cell-code}\nreviews |>\n llm_extract(review, \"product\")\n#> # A tibble: 3 × 2\n#> review .extract \n#> \n#> 1 This has been the best TV I've ever used. Gr… tv \n#> 2 I regret buying this laptop. It is too slow … laptop \n#> 3 Not sure how to feel about my new washing ma… washing machine\n```\n:::\n\n\nFor more information and examples visit this function's [R reference page](reference/llm_extract.qmd)\n\n## Python\n\n\n::: {.cell}\n\n```{.python .cell-code}\nreviews.llm.extract(\"review\", \"product\")\n```\n\n::: {.cell-output-display}\n\n```{=html}\n
\n
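<table>
<thead><tr><th>review</th><th>extract</th></tr></thead>
<tbody>
<tr><td>"This has been the best TV I've ever used. Great screen, and sound."</td><td>"tv"</td></tr>
<tr><td>"I regret buying this laptop. It is too slow and the keyboard is too noisy"</td><td>"laptop"</td></tr>
<tr><td>"Not sure how to feel about my new washing machine. Great color, but hard to figure"</td><td>"washing machine"</td></tr>
</tbody>
</table>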
\n```\n\n:::\n:::\n\n\nFor more information and examples visit this function's [Python reference page](reference/MallFrame.qmd#mall.MallFrame.extract)\n:::\n\n### Verify {#verify}\n\nThis function lets you check whether a statement is true, based on the\nprovided text. By default, it will return a 1 for \"yes\", and 0 for \"no\". This \ncan be customized.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n::: {.cell}\n\n```{.r .cell-code}\nreviews |>\n llm_verify(review, \"is the customer happy with the purchase\")\n#> # A tibble: 3 × 2\n#> review .verify\n#> \n#> 1 This has been the best TV I've ever used. Great screen, and sound. 1 \n#> 2 I regret buying this laptop. It is too slow and the keyboard is too n… 0 \n#> 3 Not sure how to feel about my new washing machine. Great color, but h… 0\n```\n:::\n\n\nFor more information and examples visit this function's [R reference page](reference/llm_verify.qmd)\n\n## Python\n\n\n::: {.cell}\n\n```{.python .cell-code}\nreviews.llm.verify(\"review\", \"is the customer happy with the purchase\")\n```\n\n::: {.cell-output-display}\n\n```{=html}\n
\n
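<table>
<thead><tr><th>review</th><th>verify</th></tr></thead>
<tbody>
<tr><td>"This has been the best TV I've ever used. Great screen, and sound."</td><td>1</td></tr>
<tr><td>"I regret buying this laptop. It is too slow and the keyboard is too noisy"</td><td>0</td></tr>
<tr><td>"Not sure how to feel about my new washing machine. Great color, but hard to figure"</td><td>0</td></tr>
</tbody>
</table>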
\n```\n\n:::\n:::\n\n\nFor more information and examples visit this function's [Python reference page](reference/MallFrame.qmd#mall.MallFrame.verify)\n:::\n\n### Translate {#translate}\n\nAs the title implies, this function will translate the text into a specified \nlanguage. What is really nice is that you don't need to specify the language\nof the source text. Only the target language needs to be defined. The \ntranslation accuracy will depend on the LLM.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n::: {.cell}\n\n```{.r .cell-code}\nreviews |>\n llm_translate(review, \"spanish\")\n#> # A tibble: 3 × 2\n#> review .translation \n#> \n#> 1 This has been the best TV I've ever used. Gr… \"Esta ha sido la mejor televisi…\n#> 2 I regret buying this laptop. It is too slow … \"Lo lamento comprar este port\\u…\n#> 3 Not sure how to feel about my new washing ma… \"No estoy seguro de c\\u00f3mo s…\n```\n:::\n\n\nFor more information and examples visit this function's [R reference page](reference/llm_translate.qmd)\n\n## Python\n\n\n::: {.cell}\n\n```{.python .cell-code}\nreviews.llm.translate(\"review\", \"spanish\")\n```\n\n::: {.cell-output-display}\n\n```{=html}\n
\n
<table>
<thead><tr><th>review</th><th>translation</th></tr></thead>
<tbody>
<tr><td>"This has been the best TV I've ever used. Great screen, and sound."</td><td>"Esta ha sido la mejor televisión que he utilizado hasta ahora. Gran pantalla y sonido."</td></tr>
<tr><td>"I regret buying this laptop. It is too slow and the keyboard is too noisy"</td><td>"Me arrepiento de comprar este portátil. Es demasiado lento y la tecla es muy ruidosa."</td></tr>
<tr><td>"Not sure how to feel about my new washing machine. Great color, but hard to figure"</td><td>"No estoy seguro de como me siento con mi nueva lavadora. Gran color, pero muy difícil de entender"</td></tr>
</tbody>
</table>
\n```\n\n:::\n:::\n\n\nFor more information and examples visit this function's [Python reference page](reference/MallFrame.qmd#mall.MallFrame.translate)\n:::\n\n### Custom prompt {#custom-prompt}\n\nIt is possible to pass your own prompt to the LLM, and have `mall` run it \nagainst each text entry:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmy_prompt <- paste(\n \"Answer a question.\",\n \"Return only the answer, no explanation\",\n \"Acceptable answers are 'yes', 'no'\",\n \"Answer this about the following text, is this a happy customer?:\"\n)\n\nreviews |>\n llm_custom(review, my_prompt)\n#> # A tibble: 3 × 2\n#> review .pred\n#> \n#> 1 This has been the best TV I've ever used. Great screen, and sound. Yes \n#> 2 I regret buying this laptop. It is too slow and the keyboard is too noi… No \n#> 3 Not sure how to feel about my new washing machine. Great color, but har… No\n```\n:::\n\n\nFor more information and examples visit this function's [R reference page](reference/llm_custom.qmd)\n\n## Python\n\n\n::: {.cell}\n\n```{.python .cell-code}\n# Adjacent string literals are joined as-is, so each part ends with a space\nmy_prompt = (\n \"Answer a question. \"\n \"Return only the answer, no explanation. \"\n \"Acceptable answers are 'yes', 'no'. \"\n \"Answer this about the following text, is this a happy customer?:\"\n)\n\nreviews.llm.custom(\"review\", prompt = my_prompt)\n```\n\n::: {.cell-output-display}\n\n```{=html}\n
\n
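<table>
<thead><tr><th>review</th><th>custom</th></tr></thead>
<tbody>
<tr><td>"This has been the best TV I've ever used. Great screen, and sound."</td><td>"Yes"</td></tr>
<tr><td>"I regret buying this laptop. It is too slow and the keyboard is too noisy"</td><td>"No"</td></tr>
<tr><td>"Not sure how to feel about my new washing machine. Great color, but hard to figure"</td><td>"No"</td></tr>
</tbody>
</table>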
\n```\n\n:::\n:::\n\n\nFor more information and examples visit this function's [Python reference page](reference/MallFrame.qmd#mall.MallFrame.custom)\n:::\n\n## Model selection and settings\n\n#### Local LLMs via Ollama {#settings-local}\n\nYou can set the model and its options to use when calling the LLM. Here,\n\"options\" means model-specific settings, such as seed or \ntemperature.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\nInvoking an `llm` function will automatically initialize a model selection if \nyou don't have one selected yet. If there is only one option, it will pre-select\nit for you. If more than one model is available, `mall` will \npresent a menu so you can select which model you wish to use.\n\nCalling `llm_use()` directly will let you specify the model and backend to use.\nYou can also set up additional arguments that will be passed down to the function\nthat actually runs the prediction. In the case of Ollama, that function is [`chat()`](https://hauselin.github.io/ollama-r/reference/chat.html).\n\nThe model to use, and other options, can be set for the current R session:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nllm_use(\"ollama\", \"llama3.2\", seed = 100, temperature = 0)\n```\n:::\n\n\n## Python\n\nThe model and options to be used will be defined at the Polars data frame object \nlevel. If not passed, the default model will be **llama3.2**.\n\n\n::: {.cell}\n\n```{.python .cell-code}\nreviews.llm.use(\"ollama\", \"llama3.2\", options = dict(seed = 100))\n```\n:::\n\n:::\n\n#### Remote LLMs\n\nThe provider and model selection will be based on the chat object you create. 
\nAny model-related setting, such as temperature or seed, should be\nset when the object is created as well.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(mall)\nlibrary(ellmer)\nchat <- chat_openai(model = \"gpt-4o\", seed = 100)\nllm_use(chat)\n```\n:::\n\n\n## Python\n\n\n::: {.cell}\n\n```{.python .cell-code}\nimport mall\nfrom chatlas import ChatOpenAI\nchat = ChatOpenAI(model = \"gpt-4o\", seed = 100)\ndata = mall.MallData\nreviews = data.reviews\nreviews.llm.use(chat)\n```\n:::\n\n:::\n\n\n## Results caching\n\nBy default `mall` caches the requests and corresponding results from a given \nLLM run. Each response is saved as an individual JSON file. By default, the folder\nname is `_mall_cache`. The folder name can be customized, if needed. Also, the\ncaching can be turned off by setting the argument to empty (`\"\"`).\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n::: {.cell}\n\n```{.r .cell-code}\nllm_use(.cache = \"_my_cache\")\n```\n:::\n\n\nTo turn off:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nllm_use(.cache = \"\")\n```\n:::\n\n\n## Python\n\n\n::: {.cell}\n\n```{.python .cell-code}\nreviews.llm.use(_cache = \"my_cache\")\n```\n:::\n\n\nTo turn off:\n\n\n::: {.cell}\n\n```{.python .cell-code}\nreviews.llm.use(_cache = \"\")\n```\n:::\n\n:::\n\nFor more information see the [Caching Results](articles/caching.qmd) article.\n\n## Key considerations\n\nThe main consideration is **cost**: either time cost or money cost.\n\nIf using this method with a locally available LLM, the cost will be a long \nrunning time. Unless it is very specialized, an LLM is a general \nmodel fitted using a vast amount of data, so determining a response for\neach row takes longer than it would with a manually created NLP model. 
The default\nmodel used with Ollama is [Llama 3.2](https://ollama.com/library/llama3.2), which \nhas 3B parameters.\n\nIf using an external LLM service, you will need to consider the \nbilling costs of that service. Keep in mind that you will be sending a \nlot of data to be evaluated.\n\nAnother consideration is the novelty of this approach. Early tests are providing\nencouraging results. But you, as a user, will still need to keep in mind that \nthe predictions will not be infallible, so always check the output. At this time,\nI think the best use for this method is quick analysis.\n\n## Vector functions\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n`mall` includes functions that expect a vector, instead of a table, to run the\npredictions. This should make it easier to test things, such as custom prompts\nor results of specific text. Each `llm_` function has a corresponding `llm_vec_`\nfunction:\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nllm_vec_sentiment(\"I am happy\")\n#> [1] \"positive\"\n```\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nllm_vec_translate(\"Este es el mejor dia!\", \"english\")\n#> [1] \"It's the best day!\"\n```\n:::\n\n\n## Python \n\n`mall` is also able to process vectors contained in a `list` object. This lets\nus process a list of texts without first having to convert\nthem into a single-column data frame. 
To use it, initialize a new `LLMVec` class\nobject with either an Ollama model, or a `chatlas` `Chat` object, and then\naccess the same NLP functions as the Polars extension.\n\n\n::: {.cell}\n\n```{.python .cell-code}\n# Initialize a Chat object\nfrom chatlas import ChatOllama\nchat = ChatOllama(model = \"llama3.2\")\n\n# Pass it to a new LLMVec\nfrom mall import LLMVec\nllm = LLMVec(chat) \n```\n:::\n\n\nAccess the functions via the new LLMVec object, and pass the text to be processed.\n\n\n::: {.cell}\n\n```{.python .cell-code}\nllm.sentiment([\"I am happy\", \"I am sad\"])\n#> ['positive', 'negative']\n```\n:::\n\n\n\n::: {.cell}\n\n```{.python .cell-code}\nllm.translate([\"Este es el mejor dia!\"], \"english\")\n#> [\"It's my favorite day!\"]\n```\n:::\n\n\nFor more information visit the reference page: [LLMVec](reference/LLMVec.qmd)\n:::\n\n", "supporting": [], "filters": [ "rmarkdown/pagebreak.lua" diff --git a/_freeze/reference/MallFrame/execute-results/html.json b/_freeze/reference/MallFrame/execute-results/html.json index 7f8842e..f3fe4de 100644 --- a/_freeze/reference/MallFrame/execute-results/html.json +++ b/_freeze/reference/MallFrame/execute-results/html.json @@ -1,15 +1,15 @@ { - "hash": "216838d4140dcc3a6fd654fe2973fc43", + "hash": "3e66ac82d73b36c324153482ec54e2b5", "result": { "engine": "jupyter", - "markdown": "---\ntitle: MallFrame\n---\n\n\n\n``` python\nMallFrame(df)\n```\n\nExtension to Polars that adds the ability to use an LLM to run batch predictions over a data frame\n\nWe will start by loading the needed libraries, and set up the data frame that will be used in the examples:\n\n\n::: {#48b6d6f6 .cell execution_count=1}\n``` {.python .cell-code}\nimport mall\nimport polars as pl\npl.Config(fmt_str_lengths=100)\npl.Config.set_tbl_hide_dataframe_shape(True)\npl.Config.set_tbl_hide_column_data_types(True)\ndata = mall.MallData\nreviews = data.reviews\nreviews.llm.use(\"ollama\", model = \"llama3.2\")\n```\n:::\n\n\n## Methods\n\n| Name | Description 
|\n|------------------------------------|------------------------------------|\n| [classify](#mall.MallFrame.classify) | Classify text into specific categories. |\n| [custom](#mall.MallFrame.custom) | Provide the full prompt that the LLM will process. |\n| [extract](#mall.MallFrame.extract) | Pull a specific label from the text. |\n| [sentiment](#mall.MallFrame.sentiment) | Use an LLM to run a sentiment analysis |\n| [summarize](#mall.MallFrame.summarize) | Summarize the text down to a specific number of words. |\n| [translate](#mall.MallFrame.translate) | Translate text into another language. |\n| [use](#mall.MallFrame.use) | Define the model, backend, and other options to use to interact with the LLM. |\n| [verify](#mall.MallFrame.verify) | Check to see if something is true about the text. |\n\n### classify {#mall.MallFrame.classify}\n\n``` python\nMallFrame.classify(col, labels='', additional='', pred_name='classify')\n```\n\nClassify text into specific categories.\n\n#### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|---------|---------|---------------------------------------------|---------|\n| col | str | The name of the text field to process | *required* |\n| labels | list | A list or a DICT object that defines the categories to classify the text as. It will return one of the provided labels. | `''` |\n| pred_name | str | A character vector with the name of the new column where the prediction will be placed | `'classify'` |\n| additional | str | Inserts this text into the prompt sent to the LLM | `''` |\n\n#### Examples {.doc-section .doc-section-examples}\n\n::: {#3e0a8e0d .cell execution_count=2}\n``` {.python .cell-code}\nreviews.llm.classify(\"review\", [\"appliance\", \"computer\"])\n```\n\n::: {.cell-output .cell-output-display execution_count=2}\n```{=html}\n
\n
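<table>
<thead><tr><th>review</th><th>classify</th></tr></thead>
<tbody>
<tr><td>"This has been the best TV I've ever used. Great screen, and sound."</td><td>"computer"</td></tr>
<tr><td>"I regret buying this laptop. It is too slow and the keyboard is too noisy"</td><td>"computer"</td></tr>
<tr><td>"Not sure how to feel about my new washing machine. Great color, but hard to figure"</td><td>"computer"</td></tr>
</tbody>
</table>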
\n```\n:::\n:::\n\n\n::: {#2bdcd379 .cell execution_count=3}\n``` {.python .cell-code}\n# Use 'pred_name' to customize the new column's name\nreviews.llm.classify(\"review\", [\"appliance\", \"computer\"], pred_name=\"prod_type\")\n```\n\n::: {.cell-output .cell-output-display execution_count=3}\n```{=html}\n
\n
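<table>
<thead><tr><th>review</th><th>prod_type</th></tr></thead>
<tbody>
<tr><td>"This has been the best TV I've ever used. Great screen, and sound."</td><td>"computer"</td></tr>
<tr><td>"I regret buying this laptop. It is too slow and the keyboard is too noisy"</td><td>"computer"</td></tr>
<tr><td>"Not sure how to feel about my new washing machine. Great color, but hard to figure"</td><td>"computer"</td></tr>
</tbody>
</table>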
\n```\n:::\n:::\n\n\n::: {#5fb178fc .cell execution_count=4}\n``` {.python .cell-code}\n#Pass a DICT to set custom values for each classification\nreviews.llm.classify(\"review\", {\"appliance\" : \"1\", \"computer\" : \"2\"})\n```\n\n::: {.cell-output .cell-output-display execution_count=4}\n```{=html}\n
\n
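<table>
<thead><tr><th>review</th><th>classify</th></tr></thead>
<tbody>
<tr><td>"This has been the best TV I've ever used. Great screen, and sound."</td><td>"1"</td></tr>
<tr><td>"I regret buying this laptop. It is too slow and the keyboard is too noisy"</td><td>"2"</td></tr>
<tr><td>"Not sure how to feel about my new washing machine. Great color, but hard to figure"</td><td>"1"</td></tr>
</tbody>
</table>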
\n```\n:::\n:::\n\n\n### custom {#mall.MallFrame.custom}\n\n``` python\nMallFrame.custom(col, prompt='', valid_resps='', pred_name='custom')\n```\n\nProvide the full prompt that the LLM will process.\n\n#### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|----------|----------|-------------------------------------------|----------|\n| col | str | The name of the text field to process | *required* |\n| prompt | str | The prompt to send to the LLM along with the `col` | `''` |\n| pred_name | str | A character vector with the name of the new column where the prediction will be placed | `'custom'` |\n\n#### Examples {.doc-section .doc-section-examples}\n\n::: {#920e69ef .cell execution_count=5}\n``` {.python .cell-code}\n# Adjacent string literals are joined as-is, so each part ends with a space\nmy_prompt = (\n \"Answer a question. \"\n \"Return only the answer, no explanation. \"\n \"Acceptable answers are 'yes', 'no'. \"\n \"Answer this about the following text, is this a happy customer?:\"\n)\n\nreviews.llm.custom(\"review\", prompt = my_prompt)\n```\n\n::: {.cell-output .cell-output-display execution_count=5}\n```{=html}\n
\n
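<table>
<thead><tr><th>review</th><th>custom</th></tr></thead>
<tbody>
<tr><td>"This has been the best TV I've ever used. Great screen, and sound."</td><td>"Yes"</td></tr>
<tr><td>"I regret buying this laptop. It is too slow and the keyboard is too noisy"</td><td>"No"</td></tr>
<tr><td>"Not sure how to feel about my new washing machine. Great color, but hard to figure"</td><td>"No"</td></tr>
</tbody>
</table>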
\n```\n:::\n:::\n\n\n### extract {#mall.MallFrame.extract}\n\n``` python\nMallFrame.extract(\n col,\n labels='',\n expand_cols=False,\n additional='',\n pred_name='extract',\n)\n```\n\nPull a specific label from the text.\n\n#### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|----------|----------|------------------------------------------|----------|\n| col | str | The name of the text field to process | *required* |\n| labels | list | A list or a DICT object that tells the LLM what to look for and return | `''` |\n| pred_name | str | A character vector with the name of the new column where the prediction will be placed | `'extract'` |\n| additional | str | Inserts this text into the prompt sent to the LLM | `''` |\n\n#### Examples {.doc-section .doc-section-examples}\n\n::: {#e3de581d .cell execution_count=6}\n``` {.python .cell-code}\n# Use 'labels' to let the function know what to extract\nreviews.llm.extract(\"review\", labels = \"product\")\n```\n\n::: {.cell-output .cell-output-display execution_count=6}\n```{=html}\n
\n
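<table>
<thead><tr><th>review</th><th>extract</th></tr></thead>
<tbody>
<tr><td>"This has been the best TV I've ever used. Great screen, and sound."</td><td>"tv"</td></tr>
<tr><td>"I regret buying this laptop. It is too slow and the keyboard is too noisy"</td><td>"laptop"</td></tr>
<tr><td>"Not sure how to feel about my new washing machine. Great color, but hard to figure"</td><td>"washing machine"</td></tr>
</tbody>
</table>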
\n```\n:::\n:::\n\n\n::: {#12389546 .cell execution_count=7}\n``` {.python .cell-code}\n# Use 'pred_name' to customize the new column's name\nreviews.llm.extract(\"review\", \"product\", pred_name = \"prod\")\n```\n\n::: {.cell-output .cell-output-display execution_count=7}\n```{=html}\n
\n
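<table>
<thead><tr><th>review</th><th>prod</th></tr></thead>
<tbody>
<tr><td>"This has been the best TV I've ever used. Great screen, and sound."</td><td>"tv"</td></tr>
<tr><td>"I regret buying this laptop. It is too slow and the keyboard is too noisy"</td><td>"laptop"</td></tr>
<tr><td>"Not sure how to feel about my new washing machine. Great color, but hard to figure"</td><td>"washing machine"</td></tr>
</tbody>
</table>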
\n```\n:::\n:::\n\n\n::: {#e3178a17 .cell execution_count=8}\n``` {.python .cell-code}\n# Pass a list to request multiple things; the results will be pipe delimited\n# in a single column\nreviews.llm.extract(\"review\", [\"product\", \"feelings\"])\n```\n\n::: {.cell-output .cell-output-display execution_count=8}\n```{=html}\n
\n
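<table>
<thead><tr><th>review</th><th>extract</th></tr></thead>
<tbody>
<tr><td>"This has been the best TV I've ever used. Great screen, and sound."</td><td>"tv | great"</td></tr>
<tr><td>"I regret buying this laptop. It is too slow and the keyboard is too noisy"</td><td>"laptop|disappointment"</td></tr>
<tr><td>"Not sure how to feel about my new washing machine. Great color, but hard to figure"</td><td>"washing machine|frustration"</td></tr>
</tbody>
</table>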
\n```\n:::\n:::\n\n\n::: {#92e3ee7b .cell execution_count=9}\n``` {.python .cell-code}\n# Set 'expand_cols' to True to split multiple labels\n# into individual columns\nreviews.llm.extract(\n col=\"review\",\n labels=[\"product\", \"feelings\"],\n expand_cols=True\n )\n```\n\n::: {.cell-output .cell-output-display execution_count=9}\n```{=html}\n
\n
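<table>
<thead><tr><th>review</th><th>product</th><th>feelings</th></tr></thead>
<tbody>
<tr><td>"This has been the best TV I've ever used. Great screen, and sound."</td><td>"tv "</td><td>" great"</td></tr>
<tr><td>"I regret buying this laptop. It is too slow and the keyboard is too noisy"</td><td>"laptop"</td><td>"disappointment"</td></tr>
<tr><td>"Not sure how to feel about my new washing machine. Great color, but hard to figure"</td><td>"washing machine"</td><td>"frustration"</td></tr>
</tbody>
</table>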
\n```\n:::\n:::\n\n\n::: {#fb5e92d1 .cell execution_count=10}\n``` {.python .cell-code}\n# Set custom names to the resulting columns\nreviews.llm.extract(\n col=\"review\",\n labels={\"prod\": \"product\", \"feels\": \"feelings\"},\n expand_cols=True\n )\n```\n\n::: {.cell-output .cell-output-display execution_count=10}\n```{=html}\n
\n
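<table>
<thead><tr><th>review</th><th>prod</th><th>feels</th></tr></thead>
<tbody>
<tr><td>"This has been the best TV I've ever used. Great screen, and sound."</td><td>"tv "</td><td>" great"</td></tr>
<tr><td>"I regret buying this laptop. It is too slow and the keyboard is too noisy"</td><td>"laptop"</td><td>"disappointment"</td></tr>
<tr><td>"Not sure how to feel about my new washing machine. Great color, but hard to figure"</td><td>"washing machine"</td><td>"frustration"</td></tr>
</tbody>
</table>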
\n```\n:::\n:::\n\n\n### sentiment {#mall.MallFrame.sentiment}\n\n``` python\nMallFrame.sentiment(\n col,\n options=['positive', 'negative', 'neutral'],\n additional='',\n pred_name='sentiment',\n)\n```\n\nUse an LLM to run a sentiment analysis\n\n#### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|-------------|-------------|----------------------------------|-------------|\n| col | str | The name of the text field to process | *required* |\n| options | list or dict | A list of the sentiment options to use, or a named DICT object | `['positive', 'negative', 'neutral']` |\n| pred_name | str | A character vector with the name of the new column where the prediction will be placed | `'sentiment'` |\n| additional | str | Inserts this text into the prompt sent to the LLM | `''` |\n\n#### Examples {.doc-section .doc-section-examples}\n\n::: {#04182b96 .cell execution_count=11}\n``` {.python .cell-code}\nreviews.llm.sentiment(\"review\")\n```\n\n::: {.cell-output .cell-output-display execution_count=11}\n```{=html}\n
\n
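<table>
<thead><tr><th>review</th><th>sentiment</th></tr></thead>
<tbody>
<tr><td>"This has been the best TV I've ever used. Great screen, and sound."</td><td>"positive"</td></tr>
<tr><td>"I regret buying this laptop. It is too slow and the keyboard is too noisy"</td><td>"negative"</td></tr>
<tr><td>"Not sure how to feel about my new washing machine. Great color, but hard to figure"</td><td>"negative"</td></tr>
</tbody>
</table>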
\n```\n:::\n:::\n\n\n::: {#528966f8 .cell execution_count=12}\n``` {.python .cell-code}\n# Use 'pred_name' to customize the new column's name\nreviews.llm.sentiment(\"review\", pred_name=\"review_sentiment\")\n```\n\n::: {.cell-output .cell-output-display execution_count=12}\n```{=html}\n
\n
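<table>
<thead><tr><th>review</th><th>review_sentiment</th></tr></thead>
<tbody>
<tr><td>"This has been the best TV I've ever used. Great screen, and sound."</td><td>"positive"</td></tr>
<tr><td>"I regret buying this laptop. It is too slow and the keyboard is too noisy"</td><td>"negative"</td></tr>
<tr><td>"Not sure how to feel about my new washing machine. Great color, but hard to figure"</td><td>"negative"</td></tr>
</tbody>
</table>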
\n```\n:::\n:::\n\n\n::: {#a44ceee5 .cell execution_count=13}\n``` {.python .cell-code}\n# Pass custom sentiment options\nreviews.llm.sentiment(\"review\", [\"positive\", \"negative\"])\n```\n\n::: {.cell-output .cell-output-display execution_count=13}\n```{=html}\n
\n
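<table>
<thead><tr><th>review</th><th>sentiment</th></tr></thead>
<tbody>
<tr><td>"This has been the best TV I've ever used. Great screen, and sound."</td><td>"positive"</td></tr>
<tr><td>"I regret buying this laptop. It is too slow and the keyboard is too noisy"</td><td>"negative"</td></tr>
<tr><td>"Not sure how to feel about my new washing machine. Great color, but hard to figure"</td><td>"negative"</td></tr>
</tbody>
</table>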
\n```\n:::\n:::\n\n\n::: {#5ea5d77d .cell execution_count=14}\n``` {.python .cell-code}\n# Use a DICT object to specify values to return per sentiment\nreviews.llm.sentiment(\"review\", {\"positive\" : 1, \"negative\" : 0})\n```\n\n::: {.cell-output .cell-output-display execution_count=14}\n```{=html}\n
\n
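<table>
<thead><tr><th>review</th><th>sentiment</th></tr></thead>
<tbody>
<tr><td>"This has been the best TV I've ever used. Great screen, and sound."</td><td>1</td></tr>
<tr><td>"I regret buying this laptop. It is too slow and the keyboard is too noisy"</td><td>0</td></tr>
<tr><td>"Not sure how to feel about my new washing machine. Great color, but hard to figure"</td><td>0</td></tr>
</tbody>
</table>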
\n```\n:::\n:::\n\n\n### summarize {#mall.MallFrame.summarize}\n\n``` python\nMallFrame.summarize(col, max_words=10, additional='', pred_name='summary')\n```\n\nSummarize the text down to a specific number of words.\n\n#### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|----------|----------|------------------------------------------|----------|\n| col | str | The name of the text field to process | *required* |\n| max_words | int | Maximum number of words to use for the summary | `10` |\n| pred_name | str | A character vector with the name of the new column where the prediction will be placed | `'summary'` |\n| additional | str | Inserts this text into the prompt sent to the LLM | `''` |\n\n#### Examples {.doc-section .doc-section-examples}\n\n::: {#69eb955e .cell execution_count=15}\n``` {.python .cell-code}\n# Use max_words to set the maximum number of words to use for the summary\nreviews.llm.summarize(\"review\", max_words = 5)\n```\n\n::: {.cell-output .cell-output-display execution_count=15}\n```{=html}\n
\n
<table>
<thead><tr><th>review</th><th>summary</th></tr></thead>
<tbody>
<tr><td>"This has been the best TV I've ever used. Great screen, and sound."</td><td>"best tv i've ever used"</td></tr>
<tr><td>"I regret buying this laptop. It is too slow and the keyboard is too noisy"</td><td>"laptop purchase was a mistake"</td></tr>
<tr><td>"Not sure how to feel about my new washing machine. Great color, but hard to figure"</td><td>"hard to adjust to new appliance"</td></tr>
</tbody>
</table>
\n```\n:::\n:::\n\n\n::: {#b137569e .cell execution_count=16}\n``` {.python .cell-code}\n# Use 'pred_name' to customize the new column's name\nreviews.llm.summarize(\"review\", 5, pred_name = \"review_summary\")\n```\n\n::: {.cell-output .cell-output-display execution_count=16}\n```{=html}\n
\n
<table>
<thead><tr><th>review</th><th>review_summary</th></tr></thead>
<tbody>
<tr><td>"This has been the best TV I've ever used. Great screen, and sound."</td><td>"best tv i've ever used"</td></tr>
<tr><td>"I regret buying this laptop. It is too slow and the keyboard is too noisy"</td><td>"laptop purchase was a mistake"</td></tr>
<tr><td>"Not sure how to feel about my new washing machine. Great color, but hard to figure"</td><td>"hard to adjust to new appliance"</td></tr>
</tbody>
</table>
\n```\n:::\n:::\n\n\n### translate {#mall.MallFrame.translate}\n\n``` python\nMallFrame.translate(col, language='', additional='', pred_name='translation')\n```\n\nTranslate text into another language.\n\n#### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|-----------|-----------|-----------------------------------------|-----------|\n| col | str | The name of the text field to process | *required* |\n| language | str | The target language to translate to. For example 'French'. | `''` |\n| pred_name | str | A character vector with the name of the new column where the prediction will be placed | `'translation'` |\n| additional | str | Inserts this text into the prompt sent to the LLM | `''` |\n\n#### Examples {.doc-section .doc-section-examples}\n\n::: {#aebc3426 .cell execution_count=17}\n``` {.python .cell-code}\nreviews.llm.translate(\"review\", \"spanish\")\n```\n\n::: {.cell-output .cell-output-display execution_count=17}\n```{=html}\n
\n
<table>
<thead><tr><th>review</th><th>translation</th></tr></thead>
<tbody>
<tr><td>"This has been the best TV I've ever used. Great screen, and sound."</td><td>"Esta ha sido la mejor televisor que he utilizado hasta ahora. Gran pantalla y sonido."</td></tr>
<tr><td>"I regret buying this laptop. It is too slow and the keyboard is too noisy"</td><td>"Me arrepiento de haber comprado este portátil. Es demasiado lento y la tecla del espacio es demasiad…</td></tr>
<tr><td>"Not sure how to feel about my new washing machine. Great color, but hard to figure"</td><td>"No estoy seguro de cómo sentirme con mi nuevo lavadora. El color es excelente, pero es difícil de en…</td></tr>
</tbody>
</table>
\n```\n:::\n:::\n\n\n::: {#c2f570fe .cell execution_count=18}\n``` {.python .cell-code}\nreviews.llm.translate(\"review\", \"french\")\n```\n\n::: {.cell-output .cell-output-display execution_count=18}\n```{=html}\n
\n
reviewtranslation
"This has been the best TV I've ever used. Great screen, and sound.""Ceci est la meilleure télévision que j'ai jamais utilisée. Écran et son excellent."
"I regret buying this laptop. It is too slow and the keyboard is too noisy""Je regrets avoir acheté ce portable. C'est trop lent et le clavier est trop bruyant."
"Not sure how to feel about my new washing machine. Great color, but hard to figure""Ne suis pas sûr de savoir comment je me sens à l'égard de mon nouveau lave-linge. Couleur superbe, m…
\n```\n:::\n:::\n\n\n### use {#mall.MallFrame.use}\n\n``` python\nMallFrame.use(backend='', model='', _cache='_mall_cache', **kwargs)\n```\n\nDefine the model, backend, and other options to use to interact with the LLM.\n\n#### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|---------|---------|---------------------------------------------|---------|\n| backend | str \\| Chat \\| Client | The name of the backend to use, or an Ollama Client object, or a `chatlas` Chat object. At the beginning of the session it defaults to \"ollama\". If passing `\"\"`, it will remain unchanged | `''` |\n| model | str | The name of the model tha the backend should use. At the beginning of the session it defaults to \"llama3.2\". If passing `\"\"`, it will remain unchanged | `''` |\n| \\_cache | str | The path of where to save the cached results. Passing `\"\"` disables the cache | `'_mall_cache'` |\n| \\*\\*kwargs | | Arguments to pass to the downstream Python call. 
In this case, the `chat` function in `ollama` | `{}` |\n\n#### Examples {.doc-section .doc-section-examples}\n\n::: {#38254047 .cell execution_count=19}\n``` {.python .cell-code}\n# Additional arguments will be passed 'as-is' to the\n# downstream R function in this example, to ollama::chat()\nreviews.llm.use(\"ollama\", \"llama3.2\", options = dict(seed = 100, temperature = 0.1))\n```\n:::\n\n\n::: {#662becb8 .cell execution_count=20}\n``` {.python .cell-code}\n# During the Python session, you can change any argument\n# individually and it will retain all of previous\n# arguments used\nreviews.llm.use(options = dict(temperature = 0.3))\n```\n:::\n\n\n::: {#26daf441 .cell execution_count=21}\n``` {.python .cell-code}\n# Use _cache to modify the target folder for caching\nreviews.llm.use(_cache = \"_my_cache\")\n```\n:::\n\n\n::: {#c3e13977 .cell execution_count=22}\n``` {.python .cell-code}\n# Leave _cache empty to turn off this functionality\nreviews.llm.use(_cache = \"\")\n```\n:::\n\n\n::: {#e39fbb22 .cell execution_count=23}\n``` {.python .cell-code}\n# Use a `chatlas` object\nfrom chatlas import ChatOpenAI\nchat = ChatOpenAI()\nreviews.llm.use(chat)\n```\n:::\n\n\n### verify {#mall.MallFrame.verify}\n\n``` python\nMallFrame.verify(col, what='', yes_no=[1, 0], additional='', pred_name='verify')\n```\n\nCheck to see if something is true about the text.\n\n#### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|--------|--------|------------------------------------------------|--------|\n| col | str | The name of the text field to process | *required* |\n| what | str | The statement or question that needs to be verified against the provided text | `''` |\n| yes_no | list | A positional list of size 2, which contains the values to return if true and false. 
The first position will be used as the 'true' value, and the second as the 'false' value | `[1, 0]` |\n| pred_name | str | A character vector with the name of the new column where the prediction will be placed | `'verify'` |\n| additional | str | Inserts this text into the prompt sent to the LLM | `''` |\n\n#### Examples {.doc-section .doc-section-examples}\n\n::: {#01994253 .cell execution_count=24}\n``` {.python .cell-code}\nreviews.llm.verify(\"review\", \"is the customer happy\")\n```\n\n::: {.cell-output .cell-output-display execution_count=19}\n```{=html}\n
\n
reviewverify
"This has been the best TV I've ever used. Great screen, and sound."1
"I regret buying this laptop. It is too slow and the keyboard is too noisy"0
"Not sure how to feel about my new washing machine. Great color, but hard to figure"0
\n```\n:::\n:::\n\n\n::: {#e8130e8d .cell execution_count=25}\n``` {.python .cell-code}\n# Use 'yes_no' to modify the 'true' and 'false' values to return\nreviews.llm.verify(\"review\", \"is the customer happy\", [\"y\", \"n\"])\n```\n\n::: {.cell-output .cell-output-display execution_count=20}\n```{=html}\n
\n
reviewverify
"This has been the best TV I've ever used. Great screen, and sound.""y"
"I regret buying this laptop. It is too slow and the keyboard is too noisy""n"
"Not sure how to feel about my new washing machine. Great color, but hard to figure""n"
\n```\n:::\n:::\n\n\n", + "markdown": "---\ntitle: MallFrame\n---\n\n\n\n```python\nMallFrame(df)\n```\n\nExtension to Polars that add ability to use\nan LLM to run batch predictions over a data frame\n\nWe will start by loading the needed libraries, and\nset up the data frame that will be used in the\nexamples:\n\n\n::: {#c9cc9ffb .cell execution_count=1}\n``` {.python .cell-code}\nimport mall\nimport polars as pl\npl.Config(fmt_str_lengths=100)\npl.Config.set_tbl_hide_dataframe_shape(True)\npl.Config.set_tbl_hide_column_data_types(True)\ndata = mall.MallData\nreviews = data.reviews\nreviews.llm.use(options = dict(seed = 100))\n```\n:::\n\n\n## Methods\n\n| Name | Description |\n| --- | --- |\n| [classify](#mall.MallFrame.classify) | Classify text into specific categories. |\n| [custom](#mall.MallFrame.custom) | Provide the full prompt that the LLM will process. |\n| [extract](#mall.MallFrame.extract) | Pull a specific label from the text. |\n| [sentiment](#mall.MallFrame.sentiment) | Use an LLM to run a sentiment analysis |\n| [summarize](#mall.MallFrame.summarize) | Summarize the text down to a specific number of words. |\n| [translate](#mall.MallFrame.translate) | Translate text into another language. |\n| [use](#mall.MallFrame.use) | Define the model, backend, and other options to use to |\n| [verify](#mall.MallFrame.verify) | Check to see if something is true about the text. 
|\n\n### classify { #mall.MallFrame.classify }\n\n```python\nMallFrame.classify(col, labels='', additional='', pred_name='classify')\n```\n\nClassify text into specific categories.\n\n#### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|------------|--------|-------------------------------------------------------------------------------------------------------------------------|--------------|\n| col | str | The name of the text field to process | _required_ |\n| labels | list | A list or a DICT object that defines the categories to classify the text as. It will return one of the provided labels. | `''` |\n| pred_name | str | A character vector with the name of the new column where the prediction will be placed | `'classify'` |\n| additional | str | Inserts this text into the prompt sent to the LLM | `''` |\n\n#### Examples {.doc-section .doc-section-examples}\n\n::: {#4314974d .cell execution_count=2}\n``` {.python .cell-code}\nreviews.llm.classify(\"review\", [\"appliance\", \"computer\"])\n```\n\n::: {.cell-output .cell-output-display execution_count=2}\n```{=html}\n
\n
reviewclassify
"This has been the best TV I've ever used. Great screen, and sound."null
"I regret buying this laptop. It is too slow and the keyboard is too noisy"null
"Not sure how to feel about my new washing machine. Great color, but hard to figure"null
\n```\n:::\n:::\n\n\n::: {#020c2d1b .cell execution_count=3}\n``` {.python .cell-code}\n# Use 'pred_name' to customize the new column's name\nreviews.llm.classify(\"review\", [\"appliance\", \"computer\"], pred_name=\"prod_type\")\n```\n\n::: {.cell-output .cell-output-display execution_count=3}\n```{=html}\n
\n
reviewprod_type
"This has been the best TV I've ever used. Great screen, and sound."null
"I regret buying this laptop. It is too slow and the keyboard is too noisy"null
"Not sure how to feel about my new washing machine. Great color, but hard to figure"null
\n```\n:::\n:::\n\n\n::: {#5e6768c8 .cell execution_count=4}\n``` {.python .cell-code}\n#Pass a DICT to set custom values for each classification\nreviews.llm.classify(\"review\", {\"appliance\" : \"1\", \"computer\" : \"2\"})\n```\n\n::: {.cell-output .cell-output-display execution_count=4}\n```{=html}\n
\n
reviewclassify
"This has been the best TV I've ever used. Great screen, and sound."null
"I regret buying this laptop. It is too slow and the keyboard is too noisy"null
"Not sure how to feel about my new washing machine. Great color, but hard to figure"null
\n```\n:::\n:::\n\n\n### custom { #mall.MallFrame.custom }\n\n```python\nMallFrame.custom(col, prompt='', valid_resps='', pred_name='custom')\n```\n\nProvide the full prompt that the LLM will process.\n\n#### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|-----------|--------|----------------------------------------------------------------------------------------|------------|\n| col | str | The name of the text field to process | _required_ |\n| prompt | str | The prompt to send to the LLM along with the `col` | `''` |\n| pred_name | str | A character vector with the name of the new column where the prediction will be placed | `'custom'` |\n\n#### Examples {.doc-section .doc-section-examples}\n\n::: {#fd23d335 .cell execution_count=5}\n``` {.python .cell-code}\nmy_prompt = (\n \"Answer a question.\"\n \"Return only the answer, no explanation\"\n \"Acceptable answers are 'yes', 'no'\"\n \"Answer this about the following text, is this a happy customer?:\"\n)\n\nreviews.llm.custom(\"review\", prompt = my_prompt)\n```\n\n::: {.cell-output .cell-output-display execution_count=5}\n```{=html}\n
\n
reviewcustom
"This has been the best TV I've ever used. Great screen, and sound."""
"I regret buying this laptop. It is too slow and the keyboard is too noisy"""
"Not sure how to feel about my new washing machine. Great color, but hard to figure"""
\n```\n:::\n:::\n\n\n### extract { #mall.MallFrame.extract }\n\n```python\nMallFrame.extract(\n col,\n labels='',\n expand_cols=False,\n additional='',\n pred_name='extract',\n)\n```\n\nPull a specific label from the text.\n\n#### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|------------|--------|----------------------------------------------------------------------------------------|-------------|\n| col | str | The name of the text field to process | _required_ |\n| labels | list | A list or a DICT object that tells the LLM what to look for and return | `''` |\n| pred_name | str | A character vector with the name of the new column where the prediction will be placed | `'extract'` |\n| additional | str | Inserts this text into the prompt sent to the LLM | `''` |\n\n#### Examples {.doc-section .doc-section-examples}\n\n::: {#a33b7608 .cell execution_count=6}\n``` {.python .cell-code}\n# Use 'labels' to let the function know what to extract\nreviews.llm.extract(\"review\", labels = \"product\")\n```\n\n::: {.cell-output .cell-output-display execution_count=6}\n```{=html}\n
\n
reviewextract
"This has been the best TV I've ever used. Great screen, and sound."""
"I regret buying this laptop. It is too slow and the keyboard is too noisy"""
"Not sure how to feel about my new washing machine. Great color, but hard to figure"""
\n```\n:::\n:::\n\n\n::: {#510ea76f .cell execution_count=7}\n``` {.python .cell-code}\n# Use 'pred_name' to customize the new column's name\nreviews.llm.extract(\"review\", \"product\", pred_name = \"prod\")\n```\n\n::: {.cell-output .cell-output-display execution_count=7}\n```{=html}\n
\n
reviewprod
"This has been the best TV I've ever used. Great screen, and sound."""
"I regret buying this laptop. It is too slow and the keyboard is too noisy"""
"Not sure how to feel about my new washing machine. Great color, but hard to figure"""
\n```\n:::\n:::\n\n\n::: {#ddfe6a0c .cell execution_count=8}\n``` {.python .cell-code}\n# Pass a vector to request multiple things, the results will be pipe delimited\n# in a single column\nreviews.llm.extract(\"review\", [\"product\", \"feelings\"])\n```\n\n::: {.cell-output .cell-output-display execution_count=8}\n```{=html}\n
\n
reviewextract
"This has been the best TV I've ever used. Great screen, and sound."""
"I regret buying this laptop. It is too slow and the keyboard is too noisy"""
"Not sure how to feel about my new washing machine. Great color, but hard to figure"""
\n```\n:::\n:::\n\n\n::: {#09986445 .cell execution_count=9}\n``` {.python .cell-code}\n# Set 'expand_cols' to True to split multiple labels\n# into individual columns\nreviews.llm.extract(\n col=\"review\",\n labels=[\"product\", \"feelings\"],\n expand_cols=True\n )\n```\n\n::: {.cell-output .cell-output-display execution_count=9}\n```{=html}\n
\n
reviewproductfeelings
"This has been the best TV I've ever used. Great screen, and sound."""null
"I regret buying this laptop. It is too slow and the keyboard is too noisy"""null
"Not sure how to feel about my new washing machine. Great color, but hard to figure"""null
\n```\n:::\n:::\n\n\n::: {#74c21011 .cell execution_count=10}\n``` {.python .cell-code}\n# Set custom names to the resulting columns\nreviews.llm.extract(\n col=\"review\",\n labels={\"prod\": \"product\", \"feels\": \"feelings\"},\n expand_cols=True\n )\n```\n\n::: {.cell-output .cell-output-display execution_count=10}\n```{=html}\n
\n
reviewprodfeels
"This has been the best TV I've ever used. Great screen, and sound."""null
"I regret buying this laptop. It is too slow and the keyboard is too noisy"""null
"Not sure how to feel about my new washing machine. Great color, but hard to figure"""null
\n```\n:::\n:::\n\n\n### sentiment { #mall.MallFrame.sentiment }\n\n```python\nMallFrame.sentiment(\n col,\n options=['positive', 'negative', 'neutral'],\n additional='',\n pred_name='sentiment',\n)\n```\n\nUse an LLM to run a sentiment analysis\n\n#### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|------------|--------------|----------------------------------------------------------------------------------------|---------------------------------------|\n| col | str | The name of the text field to process | _required_ |\n| options | list or dict | A list of the sentiment options to use, or a named DICT object | `['positive', 'negative', 'neutral']` |\n| pred_name | str | A character vector with the name of the new column where the prediction will be placed | `'sentiment'` |\n| additional | str | Inserts this text into the prompt sent to the LLM | `''` |\n\n#### Examples {.doc-section .doc-section-examples}\n\n::: {#36a58951 .cell execution_count=11}\n``` {.python .cell-code}\nreviews.llm.sentiment(\"review\")\n```\n\n::: {.cell-output .cell-output-display execution_count=11}\n```{=html}\n
\n
reviewsentiment
"This has been the best TV I've ever used. Great screen, and sound."null
"I regret buying this laptop. It is too slow and the keyboard is too noisy"null
"Not sure how to feel about my new washing machine. Great color, but hard to figure"null
\n```\n:::\n:::\n\n\n::: {#19181236 .cell execution_count=12}\n``` {.python .cell-code}\n# Use 'pred_name' to customize the new column's name\nreviews.llm.sentiment(\"review\", pred_name=\"review_sentiment\")\n```\n\n::: {.cell-output .cell-output-display execution_count=12}\n```{=html}\n
\n
reviewreview_sentiment
"This has been the best TV I've ever used. Great screen, and sound."null
"I regret buying this laptop. It is too slow and the keyboard is too noisy"null
"Not sure how to feel about my new washing machine. Great color, but hard to figure"null
\n```\n:::\n:::\n\n\n::: {#cd2cb728 .cell execution_count=13}\n``` {.python .cell-code}\n# Pass custom sentiment options\nreviews.llm.sentiment(\"review\", [\"positive\", \"negative\"])\n```\n\n::: {.cell-output .cell-output-display execution_count=13}\n```{=html}\n
\n
reviewsentiment
"This has been the best TV I've ever used. Great screen, and sound."null
"I regret buying this laptop. It is too slow and the keyboard is too noisy"null
"Not sure how to feel about my new washing machine. Great color, but hard to figure"null
\n```\n:::\n:::\n\n\n::: {#bdc5dbf7 .cell execution_count=14}\n``` {.python .cell-code}\n# Use a DICT object to specify values to return per sentiment\nreviews.llm.sentiment(\"review\", {\"positive\" : 1, \"negative\" : 0})\n```\n\n::: {.cell-output .cell-output-display execution_count=14}\n```{=html}\n
\n
reviewsentiment
"This has been the best TV I've ever used. Great screen, and sound."null
"I regret buying this laptop. It is too slow and the keyboard is too noisy"null
"Not sure how to feel about my new washing machine. Great color, but hard to figure"null
\n```\n:::\n:::\n\n\n### summarize { #mall.MallFrame.summarize }\n\n```python\nMallFrame.summarize(col, max_words=10, additional='', pred_name='summary')\n```\n\nSummarize the text down to a specific number of words.\n\n#### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|------------|--------|----------------------------------------------------------------------------------------|-------------|\n| col | str | The name of the text field to process | _required_ |\n| max_words | int | Maximum number of words to use for the summary | `10` |\n| pred_name | str | A character vector with the name of the new column where the prediction will be placed | `'summary'` |\n| additional | str | Inserts this text into the prompt sent to the LLM | `''` |\n\n#### Examples {.doc-section .doc-section-examples}\n\n::: {#21895d07 .cell execution_count=15}\n``` {.python .cell-code}\n# Use max_words to set the maximum number of words to use for the summary\nreviews.llm.summarize(\"review\", max_words = 5)\n```\n\n::: {.cell-output .cell-output-display execution_count=15}\n```{=html}\n
\n
reviewsummary
"This has been the best TV I've ever used. Great screen, and sound."""
"I regret buying this laptop. It is too slow and the keyboard is too noisy"""
"Not sure how to feel about my new washing machine. Great color, but hard to figure"""
\n```\n:::\n:::\n\n\n::: {#bf0b8bfe .cell execution_count=16}\n``` {.python .cell-code}\n# Use 'pred_name' to customize the new column's name\nreviews.llm.summarize(\"review\", 5, pred_name = \"review_summary\")\n```\n\n::: {.cell-output .cell-output-display execution_count=16}\n```{=html}\n
\n
reviewreview_summary
"This has been the best TV I've ever used. Great screen, and sound."""
"I regret buying this laptop. It is too slow and the keyboard is too noisy"""
"Not sure how to feel about my new washing machine. Great color, but hard to figure"""
\n```\n:::\n:::\n\n\n### translate { #mall.MallFrame.translate }\n\n```python\nMallFrame.translate(col, language='', additional='', pred_name='translation')\n```\n\nTranslate text into another language.\n\n#### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|------------|--------|----------------------------------------------------------------------------------------|-----------------|\n| col | str | The name of the text field to process | _required_ |\n| language | str | The target language to translate to. For example 'French'. | `''` |\n| pred_name | str | A character vector with the name of the new column where the prediction will be placed | `'translation'` |\n| additional | str | Inserts this text into the prompt sent to the LLM | `''` |\n\n#### Examples {.doc-section .doc-section-examples}\n\n::: {#1d696b92 .cell execution_count=17}\n``` {.python .cell-code}\nreviews.llm.translate(\"review\", \"spanish\")\n```\n\n::: {.cell-output .cell-output-display execution_count=17}\n```{=html}\n
\n
reviewtranslation
"This has been the best TV I've ever used. Great screen, and sound."""
"I regret buying this laptop. It is too slow and the keyboard is too noisy"""
"Not sure how to feel about my new washing machine. Great color, but hard to figure"""
\n```\n:::\n:::\n\n\n::: {#44769dbb .cell execution_count=18}\n``` {.python .cell-code}\nreviews.llm.translate(\"review\", \"french\")\n```\n\n::: {.cell-output .cell-output-display execution_count=18}\n```{=html}\n
\n
reviewtranslation
"This has been the best TV I've ever used. Great screen, and sound."""
"I regret buying this laptop. It is too slow and the keyboard is too noisy"""
"Not sure how to feel about my new washing machine. Great color, but hard to figure"""
\n```\n:::\n:::\n\n\n### use { #mall.MallFrame.use }\n\n```python\nMallFrame.use(backend='', model='', _cache='_mall_cache', **kwargs)\n```\n\nDefine the model, backend, and other options to use to\ninteract with the LLM.\n\n#### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|----------|-----------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------|\n| backend | str \\| Chat \\| Client | The name of the backend to use, or an Ollama Client object, or a `chatlas` Chat object. At the beginning of the session it defaults to \"ollama\". If passing `\"\"`, it will remain unchanged | `''` |\n| model | str | The name of the model that the backend should use. At the beginning of the session it defaults to \"llama3.2\". If passing `\"\"`, it will remain unchanged | `''` |\n| _cache | str | The path of where to save the cached results. Passing `\"\"` disables the cache | `'_mall_cache'` |\n| **kwargs | | Arguments to pass to the downstream Python call. 
In this case, the `chat` function in `ollama` | `{}` |\n\n#### Examples {.doc-section .doc-section-examples}\n\n::: {#6b7b6343 .cell execution_count=19}\n``` {.python .cell-code}\n# Additional arguments will be passed 'as-is' to the\n# downstream R function in this example, to ollama::chat()\nreviews.llm.use(\"ollama\", \"llama3.2\", options = dict(seed = 100, temperature = 0.1))\n```\n\n::: {.cell-output .cell-output-display execution_count=19}\n```\n{'backend': 'ollama',\n 'model': 'llama3.2',\n '_cache': '_mall_cache',\n 'options': {'seed': 100, 'temperature': 0.1}}\n```\n:::\n:::\n\n\n::: {#1577d96e .cell execution_count=20}\n``` {.python .cell-code}\n# During the Python session, you can change any argument\n# individually and it will retain all of previous\n# arguments used\nreviews.llm.use(options = dict(temperature = 0.3))\n```\n\n::: {.cell-output .cell-output-display execution_count=20}\n```\n{'_cache': '_mall_cache', 'options': {'temperature': 0.3}}\n```\n:::\n:::\n\n\n::: {#c6cd5bdd .cell execution_count=21}\n``` {.python .cell-code}\n# Use _cache to modify the target folder for caching\nreviews.llm.use(_cache = \"_my_cache\")\n```\n\n::: {.cell-output .cell-output-display execution_count=21}\n```\n{'_cache': '_my_cache'}\n```\n:::\n:::\n\n\n::: {#219384b9 .cell execution_count=22}\n``` {.python .cell-code}\n# Leave _cache empty to turn off this functionality\nreviews.llm.use(_cache = \"\")\n```\n\n::: {.cell-output .cell-output-display execution_count=22}\n```\n{'_cache': ''}\n```\n:::\n:::\n\n\n::: {#477a2d06 .cell execution_count=23}\n``` {.python .cell-code}\n# Use a `chatlas` object\nfrom chatlas import ChatOpenAI\nchat = ChatOpenAI()\nreviews.llm.use(chat)\n```\n\n::: {.cell-output .cell-output-display execution_count=23}\n```\n{'backend': 'chatlas',\n 'chat': ,\n '_cache': '_mall_cache'}\n```\n:::\n:::\n\n\n### verify { #mall.MallFrame.verify }\n\n```python\nMallFrame.verify(col, what='', yes_no=[1, 0], additional='', 
pred_name='verify')\n```\n\nCheck to see if something is true about the text.\n\n#### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|------------|--------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------|\n| col | str | The name of the text field to process | _required_ |\n| what | str | The statement or question that needs to be verified against the provided text | `''` |\n| yes_no | list | A positional list of size 2, which contains the values to return if true and false. The first position will be used as the 'true' value, and the second as the 'false' value | `[1, 0]` |\n| pred_name | str | A character vector with the name of the new column where the prediction will be placed | `'verify'` |\n| additional | str | Inserts this text into the prompt sent to the LLM | `''` |\n\n#### Examples {.doc-section .doc-section-examples}\n\n::: {#cbd41374 .cell execution_count=24}\n``` {.python .cell-code}\nreviews.llm.verify(\"review\", \"is the customer happy\")\n```\n\n::: {.cell-output .cell-output-display execution_count=24}\n```{=html}\n
\n
reviewverify
"This has been the best TV I've ever used. Great screen, and sound."null
"I regret buying this laptop. It is too slow and the keyboard is too noisy"null
"Not sure how to feel about my new washing machine. Great color, but hard to figure"null
\n```\n:::\n:::\n\n\n::: {#7bf0eecd .cell execution_count=25}\n``` {.python .cell-code}\n# Use 'yes_no' to modify the 'true' and 'false' values to return\nreviews.llm.verify(\"review\", \"is the customer happy\", [\"y\", \"n\"])\n```\n\n::: {.cell-output .cell-output-display execution_count=25}\n```{=html}\n
\n
reviewverify
"This has been the best TV I've ever used. Great screen, and sound."null
"I regret buying this laptop. It is too slow and the keyboard is too noisy"null
"Not sure how to feel about my new washing machine. Great color, but hard to figure"null
\n```\n:::\n:::\n\n\n", "supporting": [ "MallFrame_files" ], "filters": [], "includes": { "include-in-header": [ - "\n\n\n" + "\n\n\n" ] } } diff --git a/_freeze/reference/llm_classify/execute-results/html.json b/_freeze/reference/llm_classify/execute-results/html.json index cc6ba3b..906f8b8 100644 --- a/_freeze/reference/llm_classify/execute-results/html.json +++ b/_freeze/reference/llm_classify/execute-results/html.json @@ -1,8 +1,8 @@ { - "hash": "8636e2352d2fd05050145b8abb8c1b47", + "hash": "57e7a9b7e666e6a9885d7ba9a2d9bd9e", "result": { "engine": "knitr", - "markdown": "---\ntitle: \"Categorize data as one of options given\"\nexecute:\n eval: true\n freeze: true\n---\n\n\n\n[R/llm-classify.R](https://github.com/mlverse/mall/blob/main/r/R/llm-classify.R)\n\n## llm_classify\n\n## Description\nUse a Large Language Model (LLM) to classify the provided text as one of the options provided via the `labels` argument.\n\n\n## Usage\n```r\n\nllm_classify(\n .data,\n col,\n labels,\n pred_name = \".classify\",\n additional_prompt = \"\"\n)\n\nllm_vec_classify(x, labels, additional_prompt = \"\", preview = FALSE)\n```\n\n## Arguments\n|Arguments|Description|\n|---|---|\n| .data | A `data.frame` or `tbl` object that contains the text to be analyzed |\n| col | The name of the field to analyze, supports `tidy-eval` |\n| labels | A character vector with at least 2 labels to classify the text as |\n| pred_name | A character vector with the name of the new column where the prediction will be placed |\n| additional_prompt | Inserts this text into the prompt sent to the LLM |\n| x | A vector that contains the text to be analyzed |\n| preview | It returns the R call that would have been used to run the prediction. It only returns the first record in `x`. Defaults to `FALSE` Applies to vector function only. |\n\n\n\n## Value\n`llm_classify` returns a `data.frame` or `tbl` object. 
`llm_vec_classify` returns a vector that is the same length as `x`.\n\n\n## Examples\n\n::: {.cell}\n\n```{.r .cell-code}\n\n\nlibrary(mall)\n\ndata(\"reviews\")\n\nllm_use(\"ollama\", \"llama3.2\", seed = 100, .silent = TRUE)\n\nllm_classify(reviews, review, c(\"appliance\", \"computer\"))\n#> # A tibble: 3 × 2\n#> review .classify\n#> \n#> 1 This has been the best TV I've ever used. Gr… computer \n#> 2 I regret buying this laptop. It is too slow … computer \n#> 3 Not sure how to feel about my new washing ma… appliance\n\n# Use 'pred_name' to customize the new column's name\nllm_classify(\n reviews,\n review,\n c(\"appliance\", \"computer\"),\n pred_name = \"prod_type\"\n)\n#> # A tibble: 3 × 2\n#> review prod_type\n#> \n#> 1 This has been the best TV I've ever used. Gr… computer \n#> 2 I regret buying this laptop. It is too slow … computer \n#> 3 Not sure how to feel about my new washing ma… appliance\n\n# Pass custom values for each classification\nllm_classify(reviews, review, c(\"appliance\" ~ 1, \"computer\" ~ 2))\n#> # A tibble: 3 × 2\n#> review .classify\n#> \n#> 1 This has been the best TV I've ever used. Great screen, and sound. 1\n#> 2 I regret buying this laptop. It is too slow and the keyboard is too… 2\n#> 3 Not sure how to feel about my new washing machine. Great color, but… 1\n\n# For character vectors, instead of a data frame, use this function\nllm_vec_classify(\n c(\"this is important!\", \"just whenever\"),\n c(\"urgent\", \"not urgent\")\n)\n#> [1] \"urgent\" \"urgent\"\n\n# To preview the first call that will be made to the downstream R function\nllm_vec_classify(\n c(\"this is important!\", \"just whenever\"),\n c(\"urgent\", \"not urgent\"),\n preview = TRUE\n)\n#> [[1]]\n#> ollamar::chat(messages = list(list(role = \"user\", content = \"You are a helpful classification engine. Determine if the text refers to one of the following: urgent, not urgent. No capitalization. No explanations. 
The answer is based on the following text:\\nthis is important!\")), \n#> output = \"text\", model = \"llama3.2\", seed = 100)\n```\n:::\n\n\n\n", + "markdown": "---\ntitle: \"Categorize data as one of options given\"\nexecute:\n eval: true\n freeze: true\n---\n\n\n\n[R/llm-classify.R](https://github.com/mlverse/mall/blob/main/r/R/llm-classify.R)\n\n## llm_classify\n\n## Description\nUse a Large Language Model (LLM) to classify the provided text as one of the options provided via the `labels` argument.\n\n\n## Usage\n```r\nllm_classify(\n .data,\n col,\n labels,\n pred_name = \".classify\",\n additional_prompt = \"\"\n)\n\nllm_vec_classify(x, labels, additional_prompt = \"\", preview = FALSE)\n```\n\n## Arguments\n|Arguments|Description|\n|---|---|\n| .data | A `data.frame` or `tbl` object that contains the text to be analyzed |\n| col | The name of the field to analyze, supports `tidy-eval` |\n| labels | A character vector with at least 2 labels to classify the text as |\n| pred_name | A character vector with the name of the new column where the prediction will be placed |\n| additional_prompt | Inserts this text into the prompt sent to the LLM |\n| x | A vector that contains the text to be analyzed |\n| preview | It returns the R call that would have been used to run the prediction. It only returns the first record in `x`. Defaults to `FALSE` Applies to vector function only. |\n\n\n\n## Value\n`llm_classify` returns a `data.frame` or `tbl` object. `llm_vec_classify` returns a vector that is the same length as `x`.\n\n\n## Examples\n\n::: {.cell}\n\n```{.r .cell-code}\n\n\nlibrary(mall)\n\ndata(\"reviews\")\n\nllm_use(\"ollama\", \"llama3.2\", seed = 100, .silent = TRUE)\n\nllm_classify(reviews, review, c(\"appliance\", \"computer\"))\n#> # A tibble: 3 × 2\n#> review .classify\n#> \n#> 1 This has been the best TV I've ever used. Gr… computer \n#> 2 I regret buying this laptop. 
It is too slow … computer \n#> 3 Not sure how to feel about my new washing ma… appliance\n\n# Use 'pred_name' to customize the new column's name\nllm_classify(\n reviews,\n review,\n c(\"appliance\", \"computer\"),\n pred_name = \"prod_type\"\n)\n#> # A tibble: 3 × 2\n#> review prod_type\n#> \n#> 1 This has been the best TV I've ever used. Gr… computer \n#> 2 I regret buying this laptop. It is too slow … computer \n#> 3 Not sure how to feel about my new washing ma… appliance\n\n# Pass custom values for each classification\nllm_classify(reviews, review, c(\"appliance\" ~ 1, \"computer\" ~ 2))\n#> # A tibble: 3 × 2\n#> review .classify\n#> \n#> 1 This has been the best TV I've ever used. Great screen, and sound. 1\n#> 2 I regret buying this laptop. It is too slow and the keyboard is too… 2\n#> 3 Not sure how to feel about my new washing machine. Great color, but… 1\n\n# For character vectors, instead of a data frame, use this function\nllm_vec_classify(\n c(\"this is important!\", \"just whenever\"),\n c(\"urgent\", \"not urgent\")\n)\n#> [1] \"urgent\" \"urgent\"\n\n# To preview the first call that will be made to the downstream R function\nllm_vec_classify(\n c(\"this is important!\", \"just whenever\"),\n c(\"urgent\", \"not urgent\"),\n preview = TRUE\n)\n#> [[1]]\n#> ollamar::chat(messages = list(list(role = \"user\", content = \"You are a helpful classification engine. Determine if the text refers to one of the following: urgent, not urgent. No capitalization. No explanations. 
The answer is based on the following text: this is important!\")), \n#> output = \"text\", model = \"llama3.2\", seed = 100)\n```\n:::\n\n\n\n", "supporting": [], "filters": [ "rmarkdown/pagebreak.lua" diff --git a/_freeze/reference/llm_custom/execute-results/html.json b/_freeze/reference/llm_custom/execute-results/html.json index 64833ff..3b70ca2 100644 --- a/_freeze/reference/llm_custom/execute-results/html.json +++ b/_freeze/reference/llm_custom/execute-results/html.json @@ -1,8 +1,8 @@ { - "hash": "fe916472ec7aacb91ac94d1d8f38324b", + "hash": "66a572fa1fd3859ce2b3e9391f7628d5", "result": { "engine": "knitr", - "markdown": "---\ntitle: \"Send a custom prompt to the LLM\"\nexecute:\n eval: true\n freeze: true\n---\n\n\n\n[R/llm-custom.R](https://github.com/mlverse/mall/blob/main/r/R/llm-custom.R)\n\n## llm_custom\n\n## Description\nUse a Large Language Model (LLM) to process the provided text using the instructions from `prompt`\n\n\n## Usage\n```r\n\nllm_custom(.data, col, prompt = \"\", pred_name = \".pred\", valid_resps = \"\")\n\nllm_vec_custom(x, prompt = \"\", valid_resps = NULL)\n```\n\n## Arguments\n|Arguments|Description|\n|---|---|\n| .data | A `data.frame` or `tbl` object that contains the text to be analyzed |\n| col | The name of the field to analyze, supports `tidy-eval` |\n| prompt | The prompt to append to each record sent to the LLM |\n| pred_name | A character vector with the name of the new column where the prediction will be placed |\n| valid_resps | If the response from the LLM is not open, but deterministic, provide the options in a vector. This function will set to `NA` any response not in the options |\n| x | A vector that contains the text to be analyzed |\n\n\n\n## Value\n`llm_custom` returns a `data.frame` or `tbl` object. 
`llm_vec_custom` returns a vector that is the same length as `x`.\n\n\n## Examples\n\n::: {.cell}\n\n```{.r .cell-code}\n\n\nlibrary(mall)\n\ndata(\"reviews\")\n\nllm_use(\"ollama\", \"llama3.2\", seed = 100, .silent = TRUE)\n\nmy_prompt <- paste(\n \"Answer a question.\",\n \"Return only the answer, no explanation\",\n \"Acceptable answers are 'yes', 'no'\",\n \"Answer this about the following text, is this a happy customer?:\"\n)\n\nreviews |>\n llm_custom(review, my_prompt)\n#> # A tibble: 3 × 2\n#> review .pred\n#> \n#> 1 This has been the best TV I've ever used. Great screen, and sound. No \n#> 2 I regret buying this laptop. It is too slow and the keyboard is too noi… No \n#> 3 Not sure how to feel about my new washing machine. Great color, but har… No\n```\n:::\n\n\n\n", + "markdown": "---\ntitle: \"Send a custom prompt to the LLM\"\nexecute:\n eval: true\n freeze: true\n---\n\n\n\n[R/llm-custom.R](https://github.com/mlverse/mall/blob/main/r/R/llm-custom.R)\n\n## llm_custom\n\n## Description\nUse a Large Language Model (LLM) to process the provided text using the instructions from `prompt`\n\n\n## Usage\n```r\nllm_custom(.data, col, prompt = \"\", pred_name = \".pred\", valid_resps = \"\")\n\nllm_vec_custom(x, prompt = \"\", valid_resps = NULL, preview = FALSE)\n```\n\n## Arguments\n|Arguments|Description|\n|---|---|\n| .data | A `data.frame` or `tbl` object that contains the text to be analyzed |\n| col | The name of the field to analyze, supports `tidy-eval` |\n| prompt | The prompt to append to each record sent to the LLM |\n| pred_name | A character vector with the name of the new column where the prediction will be placed |\n| valid_resps | If the response from the LLM is not open, but deterministic, provide the options in a vector. This function will set to `NA` any response not in the options |\n| x | A vector that contains the text to be analyzed |\n| preview | It returns the R call that would have been used to run the prediction. 
It only returns the first record in `x`. Defaults to `FALSE` Applies to vector function only. |\n\n\n\n## Value\n`llm_custom` returns a `data.frame` or `tbl` object. `llm_vec_custom` returns a vector that is the same length as `x`.\n\n\n## Examples\n\n::: {.cell}\n\n```{.r .cell-code}\n\n\nlibrary(mall)\n\ndata(\"reviews\")\n\nllm_use(\"ollama\", \"llama3.2\", seed = 100, .silent = TRUE)\n\nmy_prompt <- paste(\n \"Answer a question.\",\n \"Return only the answer, no explanation\",\n \"Acceptable answers are 'yes', 'no'\",\n \"Answer this about the following text, is this a happy customer?:\"\n)\n\nreviews |>\n llm_custom(review, my_prompt)\n#> # A tibble: 3 × 2\n#> review .pred\n#> \n#> 1 This has been the best TV I've ever used. Great screen, and sound. Yes \n#> 2 I regret buying this laptop. It is too slow and the keyboard is too noi… No \n#> 3 Not sure how to feel about my new washing machine. Great color, but har… No\n\n# For character vectors, instead of a data frame, use this function\nllm_vec_custom(reviews$review, my_prompt)\n#> [1] \"Yes\" \"No\" \"No\"\n\n# To preview the first call that will be made to the downstream R function\nllm_vec_custom(reviews$review, my_prompt, preview = TRUE)\n#> [[1]]\n#> ollamar::chat(messages = list(list(role = \"user\", content = \"Answer a question. Return only the answer, no explanation Acceptable answers are 'yes', 'no' Answer this about the following text, is this a happy customer?: The answer is based on the following text: This has been the best TV I've ever used. 
Great screen, and sound.\")), \n#> output = \"text\", model = \"llama3.2\", seed = 100)\n```\n:::\n\n\n\n", "supporting": [], "filters": [ "rmarkdown/pagebreak.lua" diff --git a/_freeze/reference/llm_extract/execute-results/html.json b/_freeze/reference/llm_extract/execute-results/html.json index 9158b67..ad644ae 100644 --- a/_freeze/reference/llm_extract/execute-results/html.json +++ b/_freeze/reference/llm_extract/execute-results/html.json @@ -1,8 +1,8 @@ { - "hash": "9809786aedefcd4b48e3d7a17269c62f", + "hash": "9f8e637c03ae1511fa3c46cd17945452", "result": { "engine": "knitr", - "markdown": "---\ntitle: \"Extract entities from text\"\nexecute:\n eval: true\n freeze: true\n---\n\n\n\n[R/llm-extract.R](https://github.com/mlverse/mall/blob/main/r/R/llm-extract.R)\n\n## llm_extract\n\n## Description\nUse a Large Language Model (LLM) to extract specific entity, or entities, from the provided text\n\n\n## Usage\n```r\n\nllm_extract(\n .data,\n col,\n labels,\n expand_cols = FALSE,\n additional_prompt = \"\",\n pred_name = \".extract\"\n)\n\nllm_vec_extract(x, labels = c(), additional_prompt = \"\", preview = FALSE)\n```\n\n## Arguments\n|Arguments|Description|\n|---|---|\n| .data | A `data.frame` or `tbl` object that contains the text to be analyzed |\n| col | The name of the field to analyze, supports `tidy-eval` |\n| labels | A vector with the entities to extract from the text |\n| expand_cols | If multiple `labels` are passed, this is a flag that tells the function to create a new column per item in `labels`. If `labels` is a named vector, this function will use those names as the new column names, if not, the function will use a sanitized version of the content as the name. 
|\n| additional_prompt | Inserts this text into the prompt sent to the LLM |\n| pred_name | A character vector with the name of the new column where the prediction will be placed |\n| x | A vector that contains the text to be analyzed |\n| preview | It returns the R call that would have been used to run the prediction. It only returns the first record in `x`. Defaults to `FALSE` Applies to vector function only. |\n\n\n\n## Value\n`llm_extract` returns a `data.frame` or `tbl` object. `llm_vec_extract` returns a vector that is the same length as `x`.\n\n\n## Examples\n\n::: {.cell}\n\n```{.r .cell-code}\n\n\nlibrary(mall)\n\ndata(\"reviews\")\n\nllm_use(\"ollama\", \"llama3.2\", seed = 100, .silent = TRUE)\n\n# Use 'labels' to let the function know what to extract\nllm_extract(reviews, review, labels = \"product\")\n#> # A tibble: 3 × 2\n#> review .extract \n#> \n#> 1 This has been the best TV I've ever used. Gr… tv \n#> 2 I regret buying this laptop. It is too slow … laptop \n#> 3 Not sure how to feel about my new washing ma… washing machine\n\n# Use 'pred_name' to customize the new column's name\nllm_extract(reviews, review, \"product\", pred_name = \"prod\")\n#> # A tibble: 3 × 2\n#> review prod \n#> \n#> 1 This has been the best TV I've ever used. Gr… tv \n#> 2 I regret buying this laptop. It is too slow … laptop \n#> 3 Not sure how to feel about my new washing ma… washing machine\n\n# Pass a vector to request multiple things, the results will be pipe delimeted\n# in a single column\nllm_extract(reviews, review, c(\"product\", \"feelings\"))\n#> # A tibble: 3 × 2\n#> review .extract \n#> \n#> 1 This has been the best TV I've ever used. Gr… tv | great \n#> 2 I regret buying this laptop. 
It is too slow … laptop|frustration \n#> 3 Not sure how to feel about my new washing ma… washing machine | confusion\n\n# To get multiple columns, use 'expand_cols'\nllm_extract(reviews, review, c(\"product\", \"feelings\"), expand_cols = TRUE)\n#> # A tibble: 3 × 3\n#> review product feelings \n#> \n#> 1 This has been the best TV I've ever used. Gr… \"tv \" \" great\" \n#> 2 I regret buying this laptop. It is too slow … \"laptop\" \"frustration\"\n#> 3 Not sure how to feel about my new washing ma… \"washing machine \" \" confusion\"\n\n# Pass a named vector to set the resulting column names\nllm_extract(\n .data = reviews,\n col = review,\n labels = c(prod = \"product\", feels = \"feelings\"),\n expand_cols = TRUE\n)\n#> # A tibble: 3 × 3\n#> review prod feels \n#> \n#> 1 This has been the best TV I've ever used. Gr… \"tv \" \" great\" \n#> 2 I regret buying this laptop. It is too slow … \"laptop\" \"frustration\"\n#> 3 Not sure how to feel about my new washing ma… \"washing machine \" \" confusion\"\n\n# For character vectors, instead of a data frame, use this function\nllm_vec_extract(\"bob smith, 123 3rd street\", c(\"name\", \"address\"))\n#> [1] \"bob smith | 123 3rd street\"\n\n# To preview the first call that will be made to the downstream R function\nllm_vec_extract(\n \"bob smith, 123 3rd street\",\n c(\"name\", \"address\"),\n preview = TRUE\n)\n#> [[1]]\n#> ollamar::chat(messages = list(list(role = \"user\", content = \"You are a helpful text extraction engine. Extract the name, address being referred to in the text. I expect 2 items exactly. No capitalization. No explanations. Return the response exclusively in a pipe separated list, and no headers. 
The answer is based on the following text:\\nbob smith, 123 3rd street\")), \n#> output = \"text\", model = \"llama3.2\", seed = 100)\n```\n:::\n\n\n\n", + "markdown": "---\ntitle: \"Extract entities from text\"\nexecute:\n eval: true\n freeze: true\n---\n\n\n\n[R/llm-extract.R](https://github.com/mlverse/mall/blob/main/r/R/llm-extract.R)\n\n## llm_extract\n\n## Description\nUse a Large Language Model (LLM) to extract specific entity, or entities, from the provided text\n\n\n## Usage\n```r\nllm_extract(\n .data,\n col,\n labels,\n expand_cols = FALSE,\n additional_prompt = \"\",\n pred_name = \".extract\"\n)\n\nllm_vec_extract(x, labels = c(), additional_prompt = \"\", preview = FALSE)\n```\n\n## Arguments\n|Arguments|Description|\n|---|---|\n| .data | A `data.frame` or `tbl` object that contains the text to be analyzed |\n| col | The name of the field to analyze, supports `tidy-eval` |\n| labels | A vector with the entities to extract from the text |\n| expand_cols | If multiple `labels` are passed, this is a flag that tells the function to create a new column per item in `labels`. If `labels` is a named vector, this function will use those names as the new column names, if not, the function will use a sanitized version of the content as the name. |\n| additional_prompt | Inserts this text into the prompt sent to the LLM |\n| pred_name | A character vector with the name of the new column where the prediction will be placed |\n| x | A vector that contains the text to be analyzed |\n| preview | It returns the R call that would have been used to run the prediction. It only returns the first record in `x`. Defaults to `FALSE` Applies to vector function only. |\n\n\n\n## Value\n`llm_extract` returns a `data.frame` or `tbl` object. 
`llm_vec_extract` returns a vector that is the same length as `x`.\n\n\n## Examples\n\n::: {.cell}\n\n```{.r .cell-code}\n\n\nlibrary(mall)\n\ndata(\"reviews\")\n\nllm_use(\"ollama\", \"llama3.2\", seed = 100, .silent = TRUE)\n\n# Use 'labels' to let the function know what to extract\nllm_extract(reviews, review, labels = \"product\")\n#> # A tibble: 3 × 2\n#> review .extract \n#> \n#> 1 This has been the best TV I've ever used. Gr… tv \n#> 2 I regret buying this laptop. It is too slow … laptop \n#> 3 Not sure how to feel about my new washing ma… washing machine\n\n# Use 'pred_name' to customize the new column's name\nllm_extract(reviews, review, \"product\", pred_name = \"prod\")\n#> # A tibble: 3 × 2\n#> review prod \n#> \n#> 1 This has been the best TV I've ever used. Gr… tv \n#> 2 I regret buying this laptop. It is too slow … laptop \n#> 3 Not sure how to feel about my new washing ma… washing machine\n\n# Pass a vector to request multiple things, the results will be pipe delimeted\n# in a single column\nllm_extract(reviews, review, c(\"product\", \"feelings\"))\n#> # A tibble: 3 × 2\n#> review .extract \n#> \n#> 1 This has been the best TV I've ever used. Gr… tv | great \n#> 2 I regret buying this laptop. It is too slow … laptop|frustration \n#> 3 Not sure how to feel about my new washing ma… washing machine | confusion\n\n# To get multiple columns, use 'expand_cols'\nllm_extract(reviews, review, c(\"product\", \"feelings\"), expand_cols = TRUE)\n#> # A tibble: 3 × 3\n#> review product feelings \n#> \n#> 1 This has been the best TV I've ever used. Gr… \"tv \" \" great\" \n#> 2 I regret buying this laptop. 
It is too slow … \"laptop\" \"frustration\"\n#> 3 Not sure how to feel about my new washing ma… \"washing machine \" \" confusion\"\n\n# Pass a named vector to set the resulting column names\nllm_extract(\n .data = reviews,\n col = review,\n labels = c(prod = \"product\", feels = \"feelings\"),\n expand_cols = TRUE\n)\n#> # A tibble: 3 × 3\n#> review prod feels \n#> \n#> 1 This has been the best TV I've ever used. Gr… \"tv \" \" great\" \n#> 2 I regret buying this laptop. It is too slow … \"laptop\" \"frustration\"\n#> 3 Not sure how to feel about my new washing ma… \"washing machine \" \" confusion\"\n\n# For character vectors, instead of a data frame, use this function\nllm_vec_extract(\"bob smith, 123 3rd street\", c(\"name\", \"address\"))\n#> [1] \"bob smith | 123 3rd street\"\n\n# To preview the first call that will be made to the downstream R function\nllm_vec_extract(\n \"bob smith, 123 3rd street\",\n c(\"name\", \"address\"),\n preview = TRUE\n)\n#> [[1]]\n#> ollamar::chat(messages = list(list(role = \"user\", content = \"You are a helpful text extraction engine. Extract the name, address being referred to in the text. I expect 2 items exactly. No capitalization. No explanations. Return the response exclusively in a pipe separated list, and no headers. 
The answer is based on the following text: bob smith, 123 3rd street\")), \n#> output = \"text\", model = \"llama3.2\", seed = 100)\n```\n:::\n\n\n\n", "supporting": [], "filters": [ "rmarkdown/pagebreak.lua" diff --git a/_freeze/reference/llm_sentiment/execute-results/html.json b/_freeze/reference/llm_sentiment/execute-results/html.json index ad14c8e..0dc5d2b 100644 --- a/_freeze/reference/llm_sentiment/execute-results/html.json +++ b/_freeze/reference/llm_sentiment/execute-results/html.json @@ -1,8 +1,8 @@ { - "hash": "5db75fd9fa22e0103a8d211d4148d14e", + "hash": "db29363fdd0712a1ecf11d24ab3e903e", "result": { "engine": "knitr", - "markdown": "---\ntitle: \"Sentiment analysis\"\nexecute:\n eval: true\n freeze: true\n---\n\n\n\n[R/llm-sentiment.R](https://github.com/mlverse/mall/blob/main/r/R/llm-sentiment.R)\n\n## llm_sentiment\n\n## Description\nUse a Large Language Model (LLM) to perform sentiment analysis from the provided text\n\n\n## Usage\n```r\n\nllm_sentiment(\n .data,\n col,\n options = c(\"positive\", \"negative\", \"neutral\"),\n pred_name = \".sentiment\",\n additional_prompt = \"\"\n)\n\nllm_vec_sentiment(\n x,\n options = c(\"positive\", \"negative\", \"neutral\"),\n additional_prompt = \"\",\n preview = FALSE\n)\n```\n\n## Arguments\n|Arguments|Description|\n|---|---|\n| .data | A `data.frame` or `tbl` object that contains the text to be analyzed |\n| col | The name of the field to analyze, supports `tidy-eval` |\n| options | A vector with the options that the LLM should use to assign a sentiment to the text. Defaults to: 'positive', 'negative', 'neutral' |\n| pred_name | A character vector with the name of the new column where the prediction will be placed |\n| additional_prompt | Inserts this text into the prompt sent to the LLM |\n| x | A vector that contains the text to be analyzed |\n| preview | It returns the R call that would have been used to run the prediction. It only returns the first record in `x`. 
Defaults to `FALSE` Applies to vector function only. |\n\n\n\n## Value\n`llm_sentiment` returns a `data.frame` or `tbl` object. `llm_vec_sentiment` returns a vector that is the same length as `x`.\n\n\n## Examples\n\n::: {.cell}\n\n```{.r .cell-code}\n\n\nlibrary(mall)\n\ndata(\"reviews\")\n\nllm_use(\"ollama\", \"llama3.2\", seed = 100, .silent = TRUE)\n\nllm_sentiment(reviews, review)\n#> # A tibble: 3 × 2\n#> review .sentiment\n#> \n#> 1 This has been the best TV I've ever used. Great screen, and sound. positive \n#> 2 I regret buying this laptop. It is too slow and the keyboard is to… negative \n#> 3 Not sure how to feel about my new washing machine. Great color, bu… neutral\n\n# Use 'pred_name' to customize the new column's name\nllm_sentiment(reviews, review, pred_name = \"review_sentiment\")\n#> # A tibble: 3 × 2\n#> review review_sentiment\n#> \n#> 1 This has been the best TV I've ever used. Great screen, and … positive \n#> 2 I regret buying this laptop. It is too slow and the keyboard… negative \n#> 3 Not sure how to feel about my new washing machine. Great col… neutral\n\n# Pass custom sentiment options\nllm_sentiment(reviews, review, c(\"positive\", \"negative\"))\n#> # A tibble: 3 × 2\n#> review .sentiment\n#> \n#> 1 This has been the best TV I've ever used. Great screen, and sound. positive \n#> 2 I regret buying this laptop. It is too slow and the keyboard is to… negative \n#> 3 Not sure how to feel about my new washing machine. Great color, bu… negative\n\n# Specify values to return per sentiment\nllm_sentiment(reviews, review, c(\"positive\" ~ 1, \"negative\" ~ 0))\n#> # A tibble: 3 × 2\n#> review .sentiment\n#> \n#> 1 This has been the best TV I've ever used. Great screen, and sound. 1\n#> 2 I regret buying this laptop. It is too slow and the keyboard is to… 0\n#> 3 Not sure how to feel about my new washing machine. 
Great color, bu… 0\n\n# For character vectors, instead of a data frame, use this function\nllm_vec_sentiment(c(\"I am happy\", \"I am sad\"))\n#> [1] \"positive\" \"negative\"\n\n# To preview the first call that will be made to the downstream R function\nllm_vec_sentiment(c(\"I am happy\", \"I am sad\"), preview = TRUE)\n#> [[1]]\n#> ollamar::chat(messages = list(list(role = \"user\", content = \"You are a helpful sentiment engine. Return only one of the following answers: positive, negative, neutral. No capitalization. No explanations. The answer is based on the following text:\\nI am happy\")), \n#> output = \"text\", model = \"llama3.2\", seed = 100)\n```\n:::\n\n\n\n", + "markdown": "---\ntitle: \"Sentiment analysis\"\nexecute:\n eval: true\n freeze: true\n---\n\n\n\n[R/llm-sentiment.R](https://github.com/mlverse/mall/blob/main/r/R/llm-sentiment.R)\n\n## llm_sentiment\n\n## Description\nUse a Large Language Model (LLM) to perform sentiment analysis from the provided text\n\n\n## Usage\n```r\nllm_sentiment(\n .data,\n col,\n options = c(\"positive\", \"negative\", \"neutral\"),\n pred_name = \".sentiment\",\n additional_prompt = \"\"\n)\n\nllm_vec_sentiment(\n x,\n options = c(\"positive\", \"negative\", \"neutral\"),\n additional_prompt = \"\",\n preview = FALSE\n)\n```\n\n## Arguments\n|Arguments|Description|\n|---|---|\n| .data | A `data.frame` or `tbl` object that contains the text to be analyzed |\n| col | The name of the field to analyze, supports `tidy-eval` |\n| options | A vector with the options that the LLM should use to assign a sentiment to the text. Defaults to: 'positive', 'negative', 'neutral' |\n| pred_name | A character vector with the name of the new column where the prediction will be placed |\n| additional_prompt | Inserts this text into the prompt sent to the LLM |\n| x | A vector that contains the text to be analyzed |\n| preview | It returns the R call that would have been used to run the prediction. 
It only returns the first record in `x`. Defaults to `FALSE` Applies to vector function only. |\n\n\n\n## Value\n`llm_sentiment` returns a `data.frame` or `tbl` object. `llm_vec_sentiment` returns a vector that is the same length as `x`.\n\n\n## Examples\n\n::: {.cell}\n\n```{.r .cell-code}\n\n\nlibrary(mall)\n\ndata(\"reviews\")\n\nllm_use(\"ollama\", \"llama3.2\", seed = 100, .silent = TRUE)\n\nllm_sentiment(reviews, review)\n#> # A tibble: 3 × 2\n#> review .sentiment\n#> \n#> 1 This has been the best TV I've ever used. Great screen, and sound. positive \n#> 2 I regret buying this laptop. It is too slow and the keyboard is to… negative \n#> 3 Not sure how to feel about my new washing machine. Great color, bu… negative\n\n# Use 'pred_name' to customize the new column's name\nllm_sentiment(reviews, review, pred_name = \"review_sentiment\")\n#> # A tibble: 3 × 2\n#> review review_sentiment\n#> \n#> 1 This has been the best TV I've ever used. Great screen, and … positive \n#> 2 I regret buying this laptop. It is too slow and the keyboard… negative \n#> 3 Not sure how to feel about my new washing machine. Great col… negative\n\n# Pass custom sentiment options\nllm_sentiment(reviews, review, c(\"positive\", \"negative\"))\n#> # A tibble: 3 × 2\n#> review .sentiment\n#> \n#> 1 This has been the best TV I've ever used. Great screen, and sound. positive \n#> 2 I regret buying this laptop. It is too slow and the keyboard is to… negative \n#> 3 Not sure how to feel about my new washing machine. Great color, bu… negative\n\n# Specify values to return per sentiment\nllm_sentiment(reviews, review, c(\"positive\" ~ 1, \"negative\" ~ 0))\n#> # A tibble: 3 × 2\n#> review .sentiment\n#> \n#> 1 This has been the best TV I've ever used. Great screen, and sound. 1\n#> 2 I regret buying this laptop. It is too slow and the keyboard is to… 0\n#> 3 Not sure how to feel about my new washing machine. 
Great color, bu… 0\n\n# For character vectors, instead of a data frame, use this function\nllm_vec_sentiment(c(\"I am happy\", \"I am sad\"))\n#> [1] \"positive\" \"negative\"\n\n# To preview the first call that will be made to the downstream R function\nllm_vec_sentiment(c(\"I am happy\", \"I am sad\"), preview = TRUE)\n#> [[1]]\n#> ollamar::chat(messages = list(list(role = \"user\", content = \"You are a helpful sentiment engine. Return only one of the following answers: positive, negative, neutral. No capitalization. No explanations. The answer is based on the following text: I am happy\")), \n#> output = \"text\", model = \"llama3.2\", seed = 100)\n```\n:::\n\n\n\n", "supporting": [], "filters": [ "rmarkdown/pagebreak.lua" diff --git a/_freeze/reference/llm_summarize/execute-results/html.json b/_freeze/reference/llm_summarize/execute-results/html.json index eb00ce9..6df2120 100644 --- a/_freeze/reference/llm_summarize/execute-results/html.json +++ b/_freeze/reference/llm_summarize/execute-results/html.json @@ -1,8 +1,8 @@ { - "hash": "c23cc0bc7e4b0d22897e67cec63b6c3f", + "hash": "373812da167efebca9965ca3eeda28ba", "result": { "engine": "knitr", - "markdown": "---\ntitle: \"Summarize text\"\nexecute:\n eval: true\n freeze: true\n---\n\n\n\n[R/llm-summarize.R](https://github.com/mlverse/mall/blob/main/r/R/llm-summarize.R)\n\n## llm_summarize\n\n## Description\nUse a Large Language Model (LLM) to summarize text\n\n\n## Usage\n```r\n\nllm_summarize(\n .data,\n col,\n max_words = 10,\n pred_name = \".summary\",\n additional_prompt = \"\"\n)\n\nllm_vec_summarize(x, max_words = 10, additional_prompt = \"\", preview = FALSE)\n```\n\n## Arguments\n|Arguments|Description|\n|---|---|\n| .data | A `data.frame` or `tbl` object that contains the text to be analyzed |\n| col | The name of the field to analyze, supports `tidy-eval` |\n| max_words | The maximum number of words that the LLM should use in the summary. Defaults to 10. 
|\n| pred_name | A character vector with the name of the new column where the prediction will be placed |\n| additional_prompt | Inserts this text into the prompt sent to the LLM |\n| x | A vector that contains the text to be analyzed |\n| preview | It returns the R call that would have been used to run the prediction. It only returns the first record in `x`. Defaults to `FALSE` Applies to vector function only. |\n\n\n\n## Value\n`llm_summarize` returns a `data.frame` or `tbl` object. `llm_vec_summarize` returns a vector that is the same length as `x`.\n\n\n## Examples\n\n::: {.cell}\n\n```{.r .cell-code}\n\n\nlibrary(mall)\n\ndata(\"reviews\")\n\nllm_use(\"ollama\", \"llama3.2\", seed = 100, .silent = TRUE)\n\n# Use max_words to set the maximum number of words to use for the summary\nllm_summarize(reviews, review, max_words = 5)\n#> # A tibble: 3 × 2\n#> review .summary \n#> \n#> 1 This has been the best TV I've ever used. Gr… this tv is excellent quality\n#> 2 I regret buying this laptop. It is too slow … i regret my laptop purchase \n#> 3 Not sure how to feel about my new washing ma… confused about the purchase\n\n# Use 'pred_name' to customize the new column's name\nllm_summarize(reviews, review, 5, pred_name = \"review_summary\")\n#> # A tibble: 3 × 2\n#> review review_summary \n#> \n#> 1 This has been the best TV I've ever used. Gr… this tv is excellent quality\n#> 2 I regret buying this laptop. It is too slow … i regret my laptop purchase \n#> 3 Not sure how to feel about my new washing ma… confused about the purchase\n\n# For character vectors, instead of a data frame, use this function\nllm_vec_summarize(\n \"This has been the best TV I've ever used. Great screen, and sound.\",\n max_words = 5\n)\n#> [1] \"this tv is excellent quality\"\n\n# To preview the first call that will be made to the downstream R function\nllm_vec_summarize(\n \"This has been the best TV I've ever used. 
Great screen, and sound.\",\n max_words = 5,\n preview = TRUE\n)\n#> [[1]]\n#> ollamar::chat(messages = list(list(role = \"user\", content = \"You are a helpful summarization engine. Your answer will contain no capitalization and no explanations. Return no more than 5 words. The answer is based on the following text:\\nThis has been the best TV I've ever used. Great screen, and sound.\")), \n#> output = \"text\", model = \"llama3.2\", seed = 100)\n```\n:::\n\n\n\n", + "markdown": "---\ntitle: \"Summarize text\"\nexecute:\n eval: true\n freeze: true\n---\n\n\n\n[R/llm-summarize.R](https://github.com/mlverse/mall/blob/main/r/R/llm-summarize.R)\n\n## llm_summarize\n\n## Description\nUse a Large Language Model (LLM) to summarize text\n\n\n## Usage\n```r\nllm_summarize(\n .data,\n col,\n max_words = 10,\n pred_name = \".summary\",\n additional_prompt = \"\"\n)\n\nllm_vec_summarize(x, max_words = 10, additional_prompt = \"\", preview = FALSE)\n```\n\n## Arguments\n|Arguments|Description|\n|---|---|\n| .data | A `data.frame` or `tbl` object that contains the text to be analyzed |\n| col | The name of the field to analyze, supports `tidy-eval` |\n| max_words | The maximum number of words that the LLM should use in the summary. Defaults to 10. |\n| pred_name | A character vector with the name of the new column where the prediction will be placed |\n| additional_prompt | Inserts this text into the prompt sent to the LLM |\n| x | A vector that contains the text to be analyzed |\n| preview | It returns the R call that would have been used to run the prediction. It only returns the first record in `x`. Defaults to `FALSE` Applies to vector function only. |\n\n\n\n## Value\n`llm_summarize` returns a `data.frame` or `tbl` object. 
`llm_vec_summarize` returns a vector that is the same length as `x`.\n\n\n## Examples\n\n::: {.cell}\n\n```{.r .cell-code}\n\n\nlibrary(mall)\n\ndata(\"reviews\")\n\nllm_use(\"ollama\", \"llama3.2\", seed = 100, .silent = TRUE)\n\n# Use max_words to set the maximum number of words to use for the summary\nllm_summarize(reviews, review, max_words = 5)\n#> # A tibble: 3 × 2\n#> review .summary \n#> \n#> 1 This has been the best TV I've ever used. Gr… the tv is excellent quality \n#> 2 I regret buying this laptop. It is too slow … i made a bad purchase \n#> 3 Not sure how to feel about my new washing ma… having mixed feelings about it\n\n# Use 'pred_name' to customize the new column's name\nllm_summarize(reviews, review, 5, pred_name = \"review_summary\")\n#> # A tibble: 3 × 2\n#> review review_summary \n#> \n#> 1 This has been the best TV I've ever used. Gr… the tv is excellent quality \n#> 2 I regret buying this laptop. It is too slow … i made a bad purchase \n#> 3 Not sure how to feel about my new washing ma… having mixed feelings about it\n\n# For character vectors, instead of a data frame, use this function\nllm_vec_summarize(\n \"This has been the best TV I've ever used. Great screen, and sound.\",\n max_words = 5\n)\n#> [1] \"the tv is excellent quality\"\n\n# To preview the first call that will be made to the downstream R function\nllm_vec_summarize(\n \"This has been the best TV I've ever used. Great screen, and sound.\",\n max_words = 5,\n preview = TRUE\n)\n#> [[1]]\n#> ollamar::chat(messages = list(list(role = \"user\", content = \"You are a helpful summarization engine. Your answer will contain no capitalization and no explanations. Return no more than 5 words. The answer is based on the following text: This has been the best TV I've ever used. 
Great screen, and sound.\")), \n#> output = \"text\", model = \"llama3.2\", seed = 100)\n```\n:::\n\n\n\n", "supporting": [], "filters": [ "rmarkdown/pagebreak.lua" diff --git a/_freeze/reference/llm_translate/execute-results/html.json b/_freeze/reference/llm_translate/execute-results/html.json index c6f84f4..298f233 100644 --- a/_freeze/reference/llm_translate/execute-results/html.json +++ b/_freeze/reference/llm_translate/execute-results/html.json @@ -1,8 +1,8 @@ { - "hash": "e45041321ea1b6ba75e1fc133bcf396c", + "hash": "7932b98ea301c0a87c99712dc43a4846", "result": { "engine": "knitr", - "markdown": "---\ntitle: \"Translates text to a specific language\"\nexecute:\n eval: true\n freeze: true\n---\n\n\n\n[R/llm-translate.R](https://github.com/mlverse/mall/blob/main/r/R/llm-translate.R)\n\n## llm_translate\n\n## Description\nUse a Large Language Model (LLM) to translate a text to a specific language\n\n\n## Usage\n```r\n\nllm_translate(\n .data,\n col,\n language,\n pred_name = \".translation\",\n additional_prompt = \"\"\n)\n\nllm_vec_translate(x, language, additional_prompt = \"\", preview = FALSE)\n```\n\n## Arguments\n|Arguments|Description|\n|---|---|\n| .data | A `data.frame` or `tbl` object that contains the text to be analyzed |\n| col | The name of the field to analyze, supports `tidy-eval` |\n| language | Target language to translate the text to |\n| pred_name | A character vector with the name of the new column where the prediction will be placed |\n| additional_prompt | Inserts this text into the prompt sent to the LLM |\n| x | A vector that contains the text to be analyzed |\n| preview | It returns the R call that would have been used to run the prediction. It only returns the first record in `x`. Defaults to `FALSE` Applies to vector function only. |\n\n\n\n## Value\n`llm_translate` returns a `data.frame` or `tbl` object. 
`llm_vec_translate` returns a vector that is the same length as `x`.\n\n\n## Examples\n\n::: {.cell}\n\n```{.r .cell-code}\n\n\nlibrary(mall)\n\ndata(\"reviews\")\n\nllm_use(\"ollama\", \"llama3.2\", seed = 100, .silent = TRUE)\n\n# Pass the desired language to translate to\nllm_translate(reviews, review, \"spanish\")\n#> # A tibble: 3 × 2\n#> review .translation \n#> \n#> 1 This has been the best TV I've ever used. Gr… Esta ha sido la mejor televisió…\n#> 2 I regret buying this laptop. It is too slow … Lo lamento comprar este portáti…\n#> 3 Not sure how to feel about my new washing ma… No estoy seguro de cómo sentirm…\n```\n:::\n\n\n\n", + "markdown": "---\ntitle: \"Translates text to a specific language\"\nexecute:\n eval: true\n freeze: true\n---\n\n\n\n[R/llm-translate.R](https://github.com/mlverse/mall/blob/main/r/R/llm-translate.R)\n\n## llm_translate\n\n## Description\nUse a Large Language Model (LLM) to translate a text to a specific language\n\n\n## Usage\n```r\nllm_translate(\n .data,\n col,\n language,\n pred_name = \".translation\",\n additional_prompt = \"\"\n)\n\nllm_vec_translate(x, language, additional_prompt = \"\", preview = FALSE)\n```\n\n## Arguments\n|Arguments|Description|\n|---|---|\n| .data | A `data.frame` or `tbl` object that contains the text to be analyzed |\n| col | The name of the field to analyze, supports `tidy-eval` |\n| language | Target language to translate the text to |\n| pred_name | A character vector with the name of the new column where the prediction will be placed |\n| additional_prompt | Inserts this text into the prompt sent to the LLM |\n| x | A vector that contains the text to be analyzed |\n| preview | It returns the R call that would have been used to run the prediction. It only returns the first record in `x`. Defaults to `FALSE` Applies to vector function only. |\n\n\n\n## Value\n`llm_translate` returns a `data.frame` or `tbl` object. 
`llm_vec_translate` returns a vector that is the same length as `x`.\n\n\n## Examples\n\n::: {.cell}\n\n```{.r .cell-code}\n\n\nlibrary(mall)\n\ndata(\"reviews\")\n\nllm_use(\"ollama\", \"llama3.2\", seed = 100, .silent = TRUE)\n\n# Pass the desired language to translate to\nllm_translate(reviews, review, \"spanish\")\n#> # A tibble: 3 × 2\n#> review .translation \n#> \n#> 1 This has been the best TV I've ever used. Gr… Esta ha sido la mejor televisió…\n#> 2 I regret buying this laptop. It is too slow … Lo lamento comprar este portáti…\n#> 3 Not sure how to feel about my new washing ma… No estoy seguro de cómo sentirm…\n```\n:::\n\n\n\n", "supporting": [], "filters": [ "rmarkdown/pagebreak.lua" diff --git a/_freeze/reference/llm_use/execute-results/html.json b/_freeze/reference/llm_use/execute-results/html.json index 45cae1b..9815f87 100644 --- a/_freeze/reference/llm_use/execute-results/html.json +++ b/_freeze/reference/llm_use/execute-results/html.json @@ -1,8 +1,8 @@ { - "hash": "203b509b0afd56876dc55df1ca91094c", + "hash": "14eaf1550a5ec94884d24b8508582d48", "result": { "engine": "knitr", - "markdown": "---\ntitle: \"Specify the model to use\"\nexecute:\n eval: true\n freeze: true\n---\n\n\n\n[R/llm-use.R](https://github.com/mlverse/mall/blob/main/r/R/llm-use.R)\n\n## llm_use\n\n## Description\nAllows us to specify the back-end provider, model to use during the current R session\n\n\n## Usage\n```r\n\nllm_use(\n backend = NULL,\n model = NULL,\n ...,\n .silent = FALSE,\n .cache = NULL,\n .force = FALSE\n)\n```\n\n## Arguments\n|Arguments|Description|\n|---|---|\n| backend | \"ollama\" or an `ellmer` `Chat` object. If using \"ollama\", `mall` will use is out-of-the-box integration with that back-end. Defaults to \"ollama\". |\n| model | The name of model supported by the back-end provider |\n| ... | Additional arguments that this function will pass down to the integrating function. In the case of Ollama, it will pass those arguments to `ollamar::chat()`. 
|\n| .silent | Avoids console output |\n| .cache | The path to save model results, so they can be re-used if the same operation is ran again. To turn off, set this argument to an empty character: `\"\"`. It defaults to a temp folder. If this argument is left `NULL` when calling this function, no changes to the path will be made. |\n| .force | Flag that tell the function to reset all of the settings in the R session |\n\n\n\n## Value\nA `mall_session` object\n\n\n## Examples\n\n::: {.cell}\n\n```{.r .cell-code}\n\n\nlibrary(mall)\n\nllm_use(\"ollama\", \"llama3.2\")\n#> \n#> ── mall session object\n#> Backend: ollama\n#> LLM session: model:llama3.2\n#> R session:\n#> cache_folder:/var/folders/y_/f_0cx_291nl0s8h26t4jg6ch0000gp/T//RtmpVFYNVk/_mall_cache72b0280b70e8\n\n# Additional arguments will be passed 'as-is' to the\n# downstream R function in this example, to ollama::chat()\nllm_use(\"ollama\", \"llama3.2\", seed = 100, temperature = 0.1)\n#> \n#> ── mall session object \n#> Backend: ollamaLLM session: model:llama3.2\n#> seed:100\n#> temperature:0.1\n#> R session:\n#> cache_folder:/var/folders/y_/f_0cx_291nl0s8h26t4jg6ch0000gp/T//RtmpVFYNVk/_mall_cache72b0280b70e8\n\n# During the R session, you can change any argument\n# individually and it will retain all of previous\n# arguments used\nllm_use(temperature = 0.3)\n#> \n#> ── mall session object \n#> Backend: ollamaLLM session: model:llama3.2\n#> seed:100\n#> temperature:0.3\n#> R session:\n#> cache_folder:/var/folders/y_/f_0cx_291nl0s8h26t4jg6ch0000gp/T//RtmpVFYNVk/_mall_cache72b0280b70e8\n\n# Use .cache to modify the target folder for caching\nllm_use(.cache = \"_my_cache\")\n#> \n#> ── mall session object \n#> Backend: ollamaLLM session: model:llama3.2\n#> seed:100\n#> temperature:0.3\n#> R session: cache_folder:_my_cache\n\n# Leave .cache empty to turn off this functionality\nllm_use(.cache = \"\")\n#> \n#> ── mall session object \n#> Backend: ollamaLLM session: model:llama3.2\n#> seed:100\n#> 
temperature:0.3\n\n# Use .silent to avoid the print out\nllm_use(.silent = TRUE)\n\n# Use an `ellmer` object\nlibrary(ellmer)\nchat <- chat_openai(model = \"gpt-4o\")\nllm_use(chat)\n#> \n#> ── mall session object \n#> Backend: ellmerLLM session: model:gpt-4o\n#> seed:100\n#> temperature:0.3\n```\n:::\n\n\n\n", + "markdown": "---\ntitle: \"Specify the model to use\"\nexecute:\n eval: true\n freeze: true\n---\n\n\n\n[R/llm-use.R](https://github.com/mlverse/mall/blob/main/r/R/llm-use.R)\n\n## llm_use\n\n## Description\nAllows us to specify the back-end provider, model to use during the current R session\n\n\n## Usage\n```r\nllm_use(\n backend = NULL,\n model = NULL,\n ...,\n .silent = FALSE,\n .cache = NULL,\n .force = FALSE\n)\n```\n\n## Arguments\n|Arguments|Description|\n|---|---|\n| backend | \"ollama\" or an `ellmer` `Chat` object. If using \"ollama\", `mall` will use is out-of-the-box integration with that back-end. Defaults to \"ollama\". |\n| model | The name of model supported by the back-end provider |\n| ... | Additional arguments that this function will pass down to the integrating function. In the case of Ollama, it will pass those arguments to `ollamar::chat()`. |\n| .silent | Avoids console output |\n| .cache | The path to save model results, so they can be re-used if the same operation is ran again. To turn off, set this argument to an empty character: `\"\"`. It defaults to a temp folder. If this argument is left `NULL` when calling this function, no changes to the path will be made. 
|\n| .force | Flag that tell the function to reset all of the settings in the R session |\n\n\n\n## Value\nA `mall_session` object\n\n\n## Examples\n\n::: {.cell}\n\n```{.r .cell-code}\n\n\nlibrary(mall)\n\nllm_use(\"ollama\", \"llama3.2\")\n#> \n#> ── mall session object\n#> Backend: ollama\n#> LLM session: model:llama3.2\n#> R session:\n#> cache_folder:/var/folders/y_/f_0cx_291nl0s8h26t4jg6ch0000gp/T//RtmpfvE9Tj/_mall_cache42722d228a80\n\n# Additional arguments will be passed 'as-is' to the\n# downstream R function in this example, to ollama::chat()\nllm_use(\"ollama\", \"llama3.2\", seed = 100, temperature = 0.1)\n#> \n#> ── mall session object \n#> Backend: ollamaLLM session: model:llama3.2\n#> seed:100\n#> temperature:0.1\n#> R session:\n#> cache_folder:/var/folders/y_/f_0cx_291nl0s8h26t4jg6ch0000gp/T//RtmpfvE9Tj/_mall_cache42722d228a80\n\n# During the R session, you can change any argument\n# individually and it will retain all of previous\n# arguments used\nllm_use(temperature = 0.3)\n#> \n#> ── mall session object \n#> Backend: ollamaLLM session: model:llama3.2\n#> seed:100\n#> temperature:0.3\n#> R session:\n#> cache_folder:/var/folders/y_/f_0cx_291nl0s8h26t4jg6ch0000gp/T//RtmpfvE9Tj/_mall_cache42722d228a80\n\n# Use .cache to modify the target folder for caching\nllm_use(.cache = \"_my_cache\")\n#> \n#> ── mall session object \n#> Backend: ollamaLLM session: model:llama3.2\n#> seed:100\n#> temperature:0.3\n#> R session: cache_folder:_my_cache\n\n# Leave .cache empty to turn off this functionality\nllm_use(.cache = \"\")\n#> \n#> ── mall session object \n#> Backend: ollamaLLM session: model:llama3.2\n#> seed:100\n#> temperature:0.3\n\n# Use .silent to avoid the print out\nllm_use(.silent = TRUE)\n\n# Use an `ellmer` object\nlibrary(ellmer)\nchat <- chat_openai(model = \"gpt-4o\")\nllm_use(chat)\n#> \n#> ── mall session object \n#> Backend: ellmerLLM session: model:gpt-4o\n#> seed:100\n#> temperature:0.3\n```\n:::\n\n\n\n", "supporting": [], "filters": [ 
"rmarkdown/pagebreak.lua" diff --git a/_freeze/reference/llm_verify/execute-results/html.json b/_freeze/reference/llm_verify/execute-results/html.json index 64c6f81..b4c81b1 100644 --- a/_freeze/reference/llm_verify/execute-results/html.json +++ b/_freeze/reference/llm_verify/execute-results/html.json @@ -1,8 +1,8 @@ { - "hash": "48c5b10f3f5e1e62b1c6a52651eb1f57", + "hash": "e1b7b13fecf96cb4254133a115796da5", "result": { "engine": "knitr", - "markdown": "---\ntitle: \"Verify if a statement about the text is true or not\"\nexecute:\n eval: true\n freeze: true\n---\n\n\n\n[R/llm-verify.R](https://github.com/mlverse/mall/blob/main/r/R/llm-verify.R)\n\n## llm_verify\n\n## Description\nUse a Large Language Model (LLM) to see if something is true or not based the provided text\n\n\n## Usage\n```r\n\nllm_verify(\n .data,\n col,\n what,\n yes_no = factor(c(1, 0)),\n pred_name = \".verify\",\n additional_prompt = \"\"\n)\n\nllm_vec_verify(\n x,\n what,\n yes_no = factor(c(1, 0)),\n additional_prompt = \"\",\n preview = FALSE\n)\n```\n\n## Arguments\n|Arguments|Description|\n|---|---|\n| .data | A `data.frame` or `tbl` object that contains the text to be analyzed |\n| col | The name of the field to analyze, supports `tidy-eval` |\n| what | The statement or question that needs to be verified against the provided text |\n| yes_no | A size 2 vector that specifies the expected output. It is positional. The first item is expected to be value to return if the statement about the provided text is true, and the second if it is not. Defaults to: `factor(c(1, 0))` |\n| pred_name | A character vector with the name of the new column where the prediction will be placed |\n| additional_prompt | Inserts this text into the prompt sent to the LLM |\n| x | A vector that contains the text to be analyzed |\n| preview | It returns the R call that would have been used to run the prediction. It only returns the first record in `x`. Defaults to `FALSE` Applies to vector function only. 
|\n\n\n\n## Value\n`llm_verify` returns a `data.frame` or `tbl` object. `llm_vec_verify` returns a vector that is the same length as `x`.\n\n\n## Examples\n\n::: {.cell}\n\n```{.r .cell-code}\n\n\nlibrary(mall)\n\ndata(\"reviews\")\n\nllm_use(\"ollama\", \"llama3.2\", seed = 100, .silent = TRUE)\n\n# By default it will return 1 for 'true', and 0 for 'false',\n# the new column will be a factor type\nllm_verify(reviews, review, \"is the customer happy\")\n#> # A tibble: 3 × 2\n#> review .verify\n#> \n#> 1 This has been the best TV I've ever used. Great screen, and sound. 1 \n#> 2 I regret buying this laptop. It is too slow and the keyboard is too n… 0 \n#> 3 Not sure how to feel about my new washing machine. Great color, but h… 0\n\n# The yes_no argument can be modified to return a different response\n# than 1 or 0. First position will be 'true' and second, 'false'\nllm_verify(reviews, review, \"is the customer happy\", c(\"y\", \"n\"))\n#> # A tibble: 3 × 2\n#> review .verify\n#> \n#> 1 This has been the best TV I've ever used. Great screen, and sound. y \n#> 2 I regret buying this laptop. It is too slow and the keyboard is too n… n \n#> 3 Not sure how to feel about my new washing machine. Great color, but h… n\n\n# Number can also be used, this would be in the case that you wish to match\n# the output values of existing predictions\nllm_verify(reviews, review, \"is the customer happy\", c(2, 1))\n#> # A tibble: 3 × 2\n#> review .verify\n#> \n#> 1 This has been the best TV I've ever used. Great screen, and sound. 2\n#> 2 I regret buying this laptop. It is too slow and the keyboard is too n… 1\n#> 3 Not sure how to feel about my new washing machine. 
Great color, but h… 1\n```\n:::\n\n\n\n", + "markdown": "---\ntitle: \"Verify if a statement about the text is true or not\"\nexecute:\n eval: true\n freeze: true\n---\n\n\n\n[R/llm-verify.R](https://github.com/mlverse/mall/blob/main/r/R/llm-verify.R)\n\n## llm_verify\n\n## Description\nUse a Large Language Model (LLM) to see if something is true or not based the provided text\n\n\n## Usage\n```r\nllm_verify(\n .data,\n col,\n what,\n yes_no = factor(c(1, 0)),\n pred_name = \".verify\",\n additional_prompt = \"\"\n)\n\nllm_vec_verify(\n x,\n what,\n yes_no = factor(c(1, 0)),\n additional_prompt = \"\",\n preview = FALSE\n)\n```\n\n## Arguments\n|Arguments|Description|\n|---|---|\n| .data | A `data.frame` or `tbl` object that contains the text to be analyzed |\n| col | The name of the field to analyze, supports `tidy-eval` |\n| what | The statement or question that needs to be verified against the provided text |\n| yes_no | A size 2 vector that specifies the expected output. It is positional. The first item is expected to be value to return if the statement about the provided text is true, and the second if it is not. Defaults to: `factor(c(1, 0))` |\n| pred_name | A character vector with the name of the new column where the prediction will be placed |\n| additional_prompt | Inserts this text into the prompt sent to the LLM |\n| x | A vector that contains the text to be analyzed |\n| preview | It returns the R call that would have been used to run the prediction. It only returns the first record in `x`. Defaults to `FALSE` Applies to vector function only. |\n\n\n\n## Value\n`llm_verify` returns a `data.frame` or `tbl` object. 
`llm_vec_verify` returns a vector that is the same length as `x`.\n\n\n## Examples\n\n::: {.cell}\n\n```{.r .cell-code}\n\n\nlibrary(mall)\n\ndata(\"reviews\")\n\nllm_use(\"ollama\", \"llama3.2\", seed = 100, .silent = TRUE)\n\n# By default it will return 1 for 'true', and 0 for 'false',\n# the new column will be a factor type\nllm_verify(reviews, review, \"is the customer happy\")\n#> # A tibble: 3 × 2\n#> review .verify\n#> \n#> 1 This has been the best TV I've ever used. Great screen, and sound. 1 \n#> 2 I regret buying this laptop. It is too slow and the keyboard is too n… 0 \n#> 3 Not sure how to feel about my new washing machine. Great color, but h… 0\n\n# The yes_no argument can be modified to return a different response\n# than 1 or 0. First position will be 'true' and second, 'false'\nllm_verify(reviews, review, \"is the customer happy\", c(\"y\", \"n\"))\n#> # A tibble: 3 × 2\n#> review .verify\n#> \n#> 1 This has been the best TV I've ever used. Great screen, and sound. y \n#> 2 I regret buying this laptop. It is too slow and the keyboard is too n… n \n#> 3 Not sure how to feel about my new washing machine. Great color, but h… n\n\n# Number can also be used, this would be in the case that you wish to match\n# the output values of existing predictions\nllm_verify(reviews, review, \"is the customer happy\", c(2, 1))\n#> # A tibble: 3 × 2\n#> review .verify\n#> \n#> 1 This has been the best TV I've ever used. Great screen, and sound. 2\n#> 2 I regret buying this laptop. It is too slow and the keyboard is too n… 1\n#> 3 Not sure how to feel about my new washing machine. 
Great color, but h… 1\n```\n:::\n\n\n\n", "supporting": [], "filters": [ "rmarkdown/pagebreak.lua" diff --git a/_freeze/reference/reviews/execute-results/html.json b/_freeze/reference/reviews/execute-results/html.json index 44dceff..f7445f7 100644 --- a/_freeze/reference/reviews/execute-results/html.json +++ b/_freeze/reference/reviews/execute-results/html.json @@ -1,8 +1,8 @@ { - "hash": "af3ea3e521fa21a556d6df7ef6633f3b", + "hash": "ace805f187bbf2521f9b8f65afa17066", "result": { "engine": "knitr", - "markdown": "---\ntitle: \"Mini reviews data set\"\nexecute:\n eval: true\n freeze: true\n---\n\n\n\n[R/data-reviews.R](https://github.com/mlverse/mall/blob/main/r/R/data-reviews.R)\n\n## reviews\n\n## Description\nMini reviews data set\n\n## Format\nA data frame that contains 3 records. The records are of fictitious product reviews.\n\n## Usage\n```r\n\nreviews\n```\n\n\n\n\n\n\n## Examples\n\n::: {.cell}\n\n```{.r .cell-code}\n\nlibrary(mall)\ndata(reviews)\nreviews\n#> # A tibble: 3 × 1\n#> review \n#> \n#> 1 This has been the best TV I've ever used. Great screen, and sound. \n#> 2 I regret buying this laptop. It is too slow and the keyboard is too noisy \n#> 3 Not sure how to feel about my new washing machine. Great color, but hard to f…\n```\n:::\n\n\n\n", + "markdown": "---\ntitle: \"Mini reviews data set\"\nexecute:\n eval: true\n freeze: true\n---\n\n\n\n[R/data-reviews.R](https://github.com/mlverse/mall/blob/main/r/R/data-reviews.R)\n\n## reviews\n\n## Description\nMini reviews data set\n\n## Format\nA data frame that contains 3 records. The records are of fictitious product reviews.\n\n## Usage\n```r\nreviews\n```\n\n\n\n\n\n\n## Examples\n\n::: {.cell}\n\n```{.r .cell-code}\n\nlibrary(mall)\ndata(reviews)\nreviews\n#> # A tibble: 3 × 1\n#> review \n#> \n#> 1 This has been the best TV I've ever used. Great screen, and sound. \n#> 2 I regret buying this laptop. 
It is too slow and the keyboard is too noisy \n#> 3 Not sure how to feel about my new washing machine. Great color, but hard to f…\n```\n:::\n\n\n\n", "supporting": [], "filters": [ "rmarkdown/pagebreak.lua" diff --git a/r/DESCRIPTION b/r/DESCRIPTION index 7567855..597bef5 100644 --- a/r/DESCRIPTION +++ b/r/DESCRIPTION @@ -1,7 +1,7 @@ Package: mall Title: Run Multiple Large Language Model Predictions Against a Table, or Vectors -Version: 0.2.0.9000 +Version: 0.2.0.9001 Authors@R: c( person("Edgar", "Ruiz", , "edgar@posit.co", role = c("aut", "cre")), person(given = "Posit Software, PBC", role = c("cph", "fnd")) @@ -13,7 +13,6 @@ Description: Run multiple 'Large Language Model' predictions against a table. Th License: MIT + file LICENSE Encoding: UTF-8 Roxygen: list(markdown = TRUE) -RoxygenNote: 7.3.2 Imports: cli, dplyr, @@ -32,5 +31,4 @@ URL: https://mlverse.github.io/mall/ Depends: R (>= 4.1) LazyData: true - - +Config/roxygen2/version: 8.0.0 diff --git a/r/NEWS.md b/r/NEWS.md index c88433d..3d110ea 100644 --- a/r/NEWS.md +++ b/r/NEWS.md @@ -1,5 +1,9 @@ # mall (dev) +* Adds `preview` argument to llm_vec_custom() + +* Fixes `llm_extract()` ignoring multiple names when `pred_name` is passed (#66) + * Fix for missing content when using custom prompt with Ollama directly (#62) # mall 0.2.0 @@ -12,3 +16,5 @@ sent to Ollama (#43) # mall 0.1.0 * Initial CRAN submission. 
+ + diff --git a/r/R/llm-classify.R b/r/R/llm-classify.R index d73b88e..89d1c3f 100644 --- a/r/R/llm-classify.R +++ b/r/R/llm-classify.R @@ -53,20 +53,24 @@ #' ) #' } #' @export -llm_classify <- function(.data, - col, - labels, - pred_name = ".classify", - additional_prompt = "") { +llm_classify <- function( + .data, + col, + labels, + pred_name = ".classify", + additional_prompt = "" +) { UseMethod("llm_classify") } #' @export -llm_classify.data.frame <- function(.data, - col, - labels, - pred_name = ".classify", - additional_prompt = "") { +llm_classify.data.frame <- function( + .data, + col, + labels, + pred_name = ".classify", + additional_prompt = "" +) { mutate( .data = .data, !!pred_name := llm_vec_classify( @@ -78,11 +82,13 @@ llm_classify.data.frame <- function(.data, } #' @export -`llm_classify.tbl_Spark SQL` <- function(.data, - col, - labels, - pred_name = ".classify", - additional_prompt = "") { +`llm_classify.tbl_Spark SQL` <- function( + .data, + col, + labels, + pred_name = ".classify", + additional_prompt = "" +) { prep_labels <- paste0("'", labels, "'", collapse = ", ") mutate( .data = .data, @@ -94,10 +100,12 @@ globalVariables(c("ai_classify", "array")) #' @rdname llm_classify #' @export -llm_vec_classify <- function(x, - labels, - additional_prompt = "", - preview = FALSE) { +llm_vec_classify <- function( + x, + labels, + additional_prompt = "", + preview = FALSE +) { m_vec_prompt( x = x, prompt_label = "classify", diff --git a/r/R/llm-custom.R b/r/R/llm-custom.R index 7e60751..4537b49 100644 --- a/r/R/llm-custom.R +++ b/r/R/llm-custom.R @@ -26,25 +26,34 @@ #' #' reviews |> #' llm_custom(review, my_prompt) +#' +#' # For character vectors, instead of a data frame, use this function +#' llm_vec_custom(reviews$review, my_prompt) +#' +#' # To preview the first call that will be made to the downstream R function +#' llm_vec_custom(reviews$review, my_prompt, preview = TRUE) #' } #' @returns `llm_custom` returns a `data.frame` or `tbl` object. 
#' `llm_vec_custom` returns a vector that is the same length as `x`. #' @export llm_custom <- function( - .data, - col, - prompt = "", - pred_name = ".pred", - valid_resps = "") { + .data, + col, + prompt = "", + pred_name = ".pred", + valid_resps = "" +) { UseMethod("llm_custom") } #' @export -llm_custom.data.frame <- function(.data, - col, - prompt = "", - pred_name = ".pred", - valid_resps = NULL) { +llm_custom.data.frame <- function( + .data, + col, + prompt = "", + pred_name = ".pred", + valid_resps = NULL +) { mutate( .data = .data, !!pred_name := llm_vec_custom( @@ -57,12 +66,13 @@ llm_custom.data.frame <- function(.data, #' @rdname llm_custom #' @export -llm_vec_custom <- function(x, prompt = "", valid_resps = NULL) { +llm_vec_custom <- function(x, prompt = "", valid_resps = NULL, preview = FALSE) { m_vec_prompt( x = x, - prompt_label = "custom", + prompt_label = "custom", prompt = prompt, - custom_prompt = prompt, - valid_resps = valid_resps + custom_prompt = prompt, + valid_resps = valid_resps, + preview = preview ) } diff --git a/r/R/llm-extract.R b/r/R/llm-extract.R index a94285e..89f7e99 100644 --- a/r/R/llm-extract.R +++ b/r/R/llm-extract.R @@ -53,22 +53,26 @@ #' @returns `llm_extract` returns a `data.frame` or `tbl` object. #' `llm_vec_extract` returns a vector that is the same length as `x`. 
#' @export -llm_extract <- function(.data, - col, - labels, - expand_cols = FALSE, - additional_prompt = "", - pred_name = ".extract") { +llm_extract <- function( + .data, + col, + labels, + expand_cols = FALSE, + additional_prompt = "", + pred_name = ".extract" +) { UseMethod("llm_extract") } #' @export -llm_extract.data.frame <- function(.data, - col, - labels = c(), - expand_cols = FALSE, - additional_prompt = "", - pred_name = ".extract") { +llm_extract.data.frame <- function( + .data, + col, + labels = c(), + expand_cols = FALSE, + additional_prompt = "", + pred_name = ".extract" +) { if (expand_cols && length(labels) > 1) { text <- pull(.data, {{ col }}) resp <- llm_vec_extract( @@ -78,11 +82,13 @@ llm_extract.data.frame <- function(.data, ) resp <- map( resp, - \(x) ({ - x <- strsplit(x, "\\|")[[1]] - names(x) <- clean_names(labels) - x - }) + \(x) { + ({ + x <- strsplit(x, "\\|")[[1]] + names(x) <- clean_names(labels) + x + }) + } ) resp <- transpose(resp) var_names <- names(labels) @@ -92,6 +98,9 @@ llm_extract.data.frame <- function(.data, } else { var_names <- resp_names } + if (length(pred_name) == length(labels)) { + var_names <- pred_name + } var_names <- clean_names(var_names) for (i in seq_along(resp)) { vals <- as.character(resp[[i]]) @@ -113,10 +122,12 @@ llm_extract.data.frame <- function(.data, #' @rdname llm_extract #' @export -llm_vec_extract <- function(x, - labels = c(), - additional_prompt = "", - preview = FALSE) { +llm_vec_extract <- function( + x, + labels = c(), + additional_prompt = "", + preview = FALSE +) { m_vec_prompt( x = x, prompt_label = "extract", diff --git a/r/R/llm-sentiment.R b/r/R/llm-sentiment.R index 5961ccd..ef10c1a 100644 --- a/r/R/llm-sentiment.R +++ b/r/R/llm-sentiment.R @@ -36,20 +36,24 @@ #' llm_vec_sentiment(c("I am happy", "I am sad"), preview = TRUE) #' } #' @export -llm_sentiment <- function(.data, - col, - options = c("positive", "negative", "neutral"), - pred_name = ".sentiment", - additional_prompt = "") { 
+llm_sentiment <- function( + .data, + col, + options = c("positive", "negative", "neutral"), + pred_name = ".sentiment", + additional_prompt = "" +) { UseMethod("llm_sentiment") } #' @export -llm_sentiment.data.frame <- function(.data, - col, - options = c("positive", "negative", "neutral"), - pred_name = ".sentiment", - additional_prompt = "") { +llm_sentiment.data.frame <- function( + .data, + col, + options = c("positive", "negative", "neutral"), + pred_name = ".sentiment", + additional_prompt = "" +) { mutate( .data = .data, !!pred_name := llm_vec_sentiment( @@ -61,11 +65,13 @@ llm_sentiment.data.frame <- function(.data, } #' @export -`llm_sentiment.tbl_Spark SQL` <- function(.data, - col, - options = NULL, - pred_name = ".sentiment", - additional_prompt = NULL) { +`llm_sentiment.tbl_Spark SQL` <- function( + .data, + col, + options = NULL, + pred_name = ".sentiment", + additional_prompt = NULL +) { mutate( .data = .data, !!pred_name := ai_analyze_sentiment({{ col }}) @@ -76,10 +82,12 @@ globalVariables("ai_analyze_sentiment") #' @rdname llm_sentiment #' @export -llm_vec_sentiment <- function(x, - options = c("positive", "negative", "neutral"), - additional_prompt = "", - preview = FALSE) { +llm_vec_sentiment <- function( + x, + options = c("positive", "negative", "neutral"), + additional_prompt = "", + preview = FALSE +) { m_vec_prompt( x = x, prompt_label = "sentiment", diff --git a/r/R/llm-summarize.R b/r/R/llm-summarize.R index 91cee6a..40f8e0e 100644 --- a/r/R/llm-summarize.R +++ b/r/R/llm-summarize.R @@ -36,20 +36,24 @@ #' @returns `llm_summarize` returns a `data.frame` or `tbl` object. #' `llm_vec_summarize` returns a vector that is the same length as `x`. 
#' @export -llm_summarize <- function(.data, - col, - max_words = 10, - pred_name = ".summary", - additional_prompt = "") { +llm_summarize <- function( + .data, + col, + max_words = 10, + pred_name = ".summary", + additional_prompt = "" +) { UseMethod("llm_summarize") } #' @export -llm_summarize.data.frame <- function(.data, - col, - max_words = 10, - pred_name = ".summary", - additional_prompt = "") { +llm_summarize.data.frame <- function( + .data, + col, + max_words = 10, + pred_name = ".summary", + additional_prompt = "" +) { mutate( .data = .data, !!pred_name := llm_vec_summarize( @@ -61,11 +65,13 @@ llm_summarize.data.frame <- function(.data, } #' @export -`llm_summarize.tbl_Spark SQL` <- function(.data, - col, - max_words = 10, - pred_name = ".summary", - additional_prompt = NULL) { +`llm_summarize.tbl_Spark SQL` <- function( + .data, + col, + max_words = 10, + pred_name = ".summary", + additional_prompt = NULL +) { mutate( .data = .data, !!pred_name := ai_summarize({{ col }}, as.integer(max_words)) @@ -76,10 +82,12 @@ globalVariables("ai_summarize") #' @rdname llm_summarize #' @export -llm_vec_summarize <- function(x, - max_words = 10, - additional_prompt = "", - preview = FALSE) { +llm_vec_summarize <- function( + x, + max_words = 10, + additional_prompt = "", + preview = FALSE +) { m_vec_prompt( x = x, prompt_label = "summarize", diff --git a/r/R/llm-translate.R b/r/R/llm-translate.R index fddfc07..68fa2ec 100644 --- a/r/R/llm-translate.R +++ b/r/R/llm-translate.R @@ -20,20 +20,24 @@ #' @returns `llm_translate` returns a `data.frame` or `tbl` object. #' `llm_vec_translate` returns a vector that is the same length as `x`. 
#' @export -llm_translate <- function(.data, - col, - language, - pred_name = ".translation", - additional_prompt = "") { +llm_translate <- function( + .data, + col, + language, + pred_name = ".translation", + additional_prompt = "" +) { UseMethod("llm_translate") } #' @export -llm_translate.data.frame <- function(.data, - col, - language, - pred_name = ".translation", - additional_prompt = "") { +llm_translate.data.frame <- function( + .data, + col, + language, + pred_name = ".translation", + additional_prompt = "" +) { mutate( .data = .data, !!pred_name := llm_vec_translate( @@ -47,10 +51,11 @@ llm_translate.data.frame <- function(.data, #' @rdname llm_translate #' @export llm_vec_translate <- function( - x, - language, - additional_prompt = "", - preview = FALSE) { + x, + language, + additional_prompt = "", + preview = FALSE +) { m_vec_prompt( x = x, prompt_label = "translate", diff --git a/r/R/llm-use.R b/r/R/llm-use.R index 94f780f..8a39061 100644 --- a/r/R/llm-use.R +++ b/r/R/llm-use.R @@ -49,12 +49,13 @@ #' #' @export llm_use <- function( - backend = NULL, - model = NULL, - ..., - .silent = FALSE, - .cache = NULL, - .force = FALSE) { + backend = NULL, + model = NULL, + ..., + .silent = FALSE, + .cache = NULL, + .force = FALSE +) { ellmer_obj <- NULL models <- list() not_init <- inherits(m_defaults_get(), "list") diff --git a/r/R/llm-verify.R b/r/R/llm-verify.R index e972a73..4b30d71 100644 --- a/r/R/llm-verify.R +++ b/r/R/llm-verify.R @@ -35,22 +35,26 @@ #' } #' #' @export -llm_verify <- function(.data, - col, - what, - yes_no = factor(c(1, 0)), - pred_name = ".verify", - additional_prompt = "") { +llm_verify <- function( + .data, + col, + what, + yes_no = factor(c(1, 0)), + pred_name = ".verify", + additional_prompt = "" +) { UseMethod("llm_verify") } #' @export -llm_verify.data.frame <- function(.data, - col, - what, - yes_no = factor(c(1, 0)), - pred_name = ".verify", - additional_prompt = "") { +llm_verify.data.frame <- function( + .data, + col, + what, 
+ yes_no = factor(c(1, 0)), + pred_name = ".verify", + additional_prompt = "" +) { mutate( .data = .data, !!pred_name := llm_vec_verify( @@ -64,11 +68,13 @@ llm_verify.data.frame <- function(.data, #' @rdname llm_verify #' @export -llm_vec_verify <- function(x, - what, - yes_no = factor(c(1, 0)), - additional_prompt = "", - preview = FALSE) { +llm_vec_verify <- function( + x, + what, + yes_no = factor(c(1, 0)), + additional_prompt = "", + preview = FALSE +) { m_vec_prompt( x = x, prompt_label = "verify", diff --git a/r/R/m-backend-prompt.R b/r/R/m-backend-prompt.R index a02a98a..6bcbb74 100644 --- a/r/R/m-backend-prompt.R +++ b/r/R/m-backend-prompt.R @@ -9,14 +9,14 @@ m_backend_prompt.mall_ollama <- function(backend, additional = "") { next_method <- NextMethod() additional <- glue(paste( additional, - "The answer is based on the following text:\n{{x}}" + "The answer is based on the following text: {{x}}" )) next_method$custom <- function(custom_prompt) { glue(paste( - "{custom_prompt}", + "{custom_prompt}", "{additional}" )) - } + } next_method } @@ -30,10 +30,10 @@ m_backend_prompt.mall_ellmer <- function(backend, additional = "") { )) next_method$custom <- function(custom_prompt) { glue(paste( - "{custom_prompt}", + "{custom_prompt}", "{additional}" )) - } + } next_method } diff --git a/r/R/m-backend-submit.R b/r/R/m-backend-submit.R index d988e66..8988380 100644 --- a/r/R/m-backend-submit.R +++ b/r/R/m-backend-submit.R @@ -34,18 +34,19 @@ m_backend_submit.mall_ollama <- function(backend, x, prompt, preview = FALSE) { \(x) { .args <- c( messages = list( - map(prompt, \(i) - map(i, \(j) { - out <- glue(j, x = x) - ln <- length(unlist(strsplit(out, " "))) - if (ln > m_ollama_tokens()) { - warnings <<- c( - warnings, - list(list(row = substr(x, 1, 20), len = ln)) - ) - } - out - })) + map(prompt, \(i) { + map(i, \(j) { + out <- glue(j, x = x) + ln <- length(unlist(strsplit(out, " "))) + if (ln > m_ollama_tokens()) { + warnings <<- c( + warnings, + list(list(row = 
substr(x, 1, 20), len = ln)) + ) + } + out + }) + }) ), output = "text", m_defaults_args(backend) @@ -144,10 +145,12 @@ m_ellmer_chat <- function(...) { # ------------------------------ Simulate -------------------------------------- #' @export -m_backend_submit.mall_simulate_llm <- function(backend, - x, - prompt, - preview = FALSE) { +m_backend_submit.mall_simulate_llm <- function( + backend, + x, + prompt, + preview = FALSE +) { .args <- as.list(environment()) args <- m_defaults_args(backend) if (args$model == "pipe") { diff --git a/r/R/m-defaults.R b/r/R/m-defaults.R index 0749658..c958dae 100644 --- a/r/R/m-defaults.R +++ b/r/R/m-defaults.R @@ -22,7 +22,9 @@ m_defaults_set <- function(...) { .env_llm$session <- structure( list( name = defaults[["backend"]], - args = defaults[names(defaults) != "backend" & names(defaults) != ".cache"], + args = defaults[ + names(defaults) != "backend" & names(defaults) != ".cache" + ], session = list( cache_folder = defaults[[".cache"]] ) diff --git a/r/R/m-vec-prompt.R b/r/R/m-vec-prompt.R index 6808b2b..edc40c4 100644 --- a/r/R/m-vec-prompt.R +++ b/r/R/m-vec-prompt.R @@ -1,11 +1,13 @@ -m_vec_prompt <- function(x, - prompt_label = "", - additional_prompt = "", - valid_resps = NULL, - prompt = NULL, - convert = NULL, - preview = FALSE, - ...) { +m_vec_prompt <- function( + x, + prompt_label = "", + additional_prompt = "", + valid_resps = NULL, + prompt = NULL, + convert = NULL, + preview = FALSE, + ... +) { # Initializes session LLM backend <- llm_use(.silent = TRUE, .force = FALSE) @@ -15,10 +17,10 @@ m_vec_prompt <- function(x, additional = additional_prompt ) fn <- defaults[[prompt_label]] - if(!is.null(fn)) { - prompt <- fn(...) + if (!is.null(fn)) { + prompt <- fn(...) 
} - + # Submits final prompt to the LLM resp <- m_backend_submit( backend = backend, diff --git a/r/man/llm_custom.Rd b/r/man/llm_custom.Rd index aa698be..07dd2ce 100644 --- a/r/man/llm_custom.Rd +++ b/r/man/llm_custom.Rd @@ -7,7 +7,7 @@ \usage{ llm_custom(.data, col, prompt = "", pred_name = ".pred", valid_resps = "") -llm_vec_custom(x, prompt = "", valid_resps = NULL) +llm_vec_custom(x, prompt = "", valid_resps = NULL, preview = FALSE) } \arguments{ \item{.data}{A \code{data.frame} or \code{tbl} object that contains the text to be @@ -25,6 +25,10 @@ deterministic, provide the options in a vector. This function will set to \code{NA} any response not in the options} \item{x}{A vector that contains the text to be analyzed} + +\item{preview}{It returns the R call that would have been used to run the +prediction. It only returns the first record in \code{x}. Defaults to \code{FALSE} +Applies to vector function only.} } \value{ \code{llm_custom} returns a \code{data.frame} or \code{tbl} object. @@ -51,5 +55,11 @@ my_prompt <- paste( reviews |> llm_custom(review, my_prompt) + +# For character vectors, instead of a data frame, use this function +llm_vec_custom(reviews$review, my_prompt) + +# To preview the first call that will be made to the downstream R function +llm_vec_custom(reviews$review, my_prompt, preview = TRUE) } } diff --git a/r/tests/testthat/_snaps/llm-classify.md b/r/tests/testthat/_snaps/llm-classify.md index d32fb34..25f4ba0 100644 --- a/r/tests/testthat/_snaps/llm-classify.md +++ b/r/tests/testthat/_snaps/llm-classify.md @@ -13,7 +13,7 @@ llm_vec_classify("this is a test", c("a", "b"), preview = TRUE) Output [[1]] - ollamar::chat(messages = list(list(role = "user", content = "You are a helpful classification engine. Determine if the text refers to one of the following: a, b. No capitalization. No explanations. 
The answer is based on the following text:\nthis is a test")), + ollamar::chat(messages = list(list(role = "user", content = "You are a helpful classification engine. Determine if the text refers to one of the following: a, b. No capitalization. No explanations. The answer is based on the following text: this is a test")), output = "text", model = "llama3.2", seed = 100) diff --git a/r/tests/testthat/_snaps/llm-extract.md b/r/tests/testthat/_snaps/llm-extract.md index 68bdb5f..7dbdc99 100644 --- a/r/tests/testthat/_snaps/llm-extract.md +++ b/r/tests/testthat/_snaps/llm-extract.md @@ -24,5 +24,5 @@ Code llm_vec_extract("bob smith, 105 2nd street", c("name", "address")) Output - [1] "| bob smith | 105 2nd street |" + [1] "bob smith | 105 2nd street" diff --git a/r/tests/testthat/_snaps/llm-sentiment.md b/r/tests/testthat/_snaps/llm-sentiment.md index e0671b5..8b2940d 100644 --- a/r/tests/testthat/_snaps/llm-sentiment.md +++ b/r/tests/testthat/_snaps/llm-sentiment.md @@ -12,7 +12,7 @@ Code llm_vec_sentiment(vec_reviews) Output - [1] "positive" "negative" "neutral" + [1] "positive" "negative" "negative" --- @@ -41,7 +41,7 @@ .sentiment 1 positive 2 negative - 3 neutral + 3 negative --- @@ -55,5 +55,5 @@ new 1 positive 2 negative - 3 neutral + 3 negative diff --git a/r/tests/testthat/_snaps/llm-summarize.md b/r/tests/testthat/_snaps/llm-summarize.md index 86e44eb..14f61c4 100644 --- a/r/tests/testthat/_snaps/llm-summarize.md +++ b/r/tests/testthat/_snaps/llm-summarize.md @@ -25,8 +25,8 @@ 1 This has been the best TV I've ever used. Great screen, and sound. 2 I regret buying this laptop. It is too slow and the keyboard is too noisy 3 Not sure how to feel about my new washing machine. 
Great color, but hard to figure - .summary - 1 this tv is excellent quality - 2 i regret my laptop purchase - 3 confused about the purchase + .summary + 1 the tv is excellent quality + 2 i made a bad purchase + 3 having mixed feelings about it diff --git a/r/tests/testthat/_snaps/llm-translate.md b/r/tests/testthat/_snaps/llm-translate.md index 46bc0ff..74e48ae 100644 --- a/r/tests/testthat/_snaps/llm-translate.md +++ b/r/tests/testthat/_snaps/llm-translate.md @@ -7,8 +7,8 @@ 1 This has been the best TV I've ever used. Great screen, and sound. 2 I regret buying this laptop. It is too slow and the keyboard is too noisy 3 Not sure how to feel about my new washing machine. Great color, but hard to figure - .translation - 1 Esta ha sido la mejor televisión que he utilizado hasta ahora. Gran pantalla y sonido. - 2 Lo lamento comprar este portátil. Es demasiado lento y el teclado es demasiado ruidoso. - 3 No estoy seguro de cómo sentirme con mi nueva lavadora. Un color grande, pero difícil de entender + .translation + 1 Esta ha sido la mejor televisión que he utilizado. Gran pantalla y sonido. + 2 Lo lamento comprar este portátil. Es demasiado lento y la tecla es muy ruidosa. + 3 No estoy seguro de cómo sentirme con mi nueva lavadora. Un color excelente, pero difícil de entender. diff --git a/r/tests/testthat/_snaps/llm-verify.md b/r/tests/testthat/_snaps/llm-verify.md index 77beeac..7fb66e7 100644 --- a/r/tests/testthat/_snaps/llm-verify.md +++ b/r/tests/testthat/_snaps/llm-verify.md @@ -4,7 +4,7 @@ llm_vec_verify("this is a test", "a test", preview = TRUE) Output [[1]] - ollamar::chat(messages = list(list(role = "user", content = "You are a helpful text analysis engine. Determine if this is true 'a test'. There are only two acceptable answers, 'yes' and 'no'. No capitalization. No explanations. The answer is based on the following text:\nthis is a test")), + ollamar::chat(messages = list(list(role = "user", content = "You are a helpful text analysis engine. 
Determine if this is true 'a test'. There are only two acceptable answers, 'yes' and 'no'. No capitalization. No explanations. The answer is based on the following text: this is a test")), output = "text", model = "llama3.2", seed = 100) diff --git a/r/tests/testthat/_snaps/m-backend-prompt.md b/r/tests/testthat/_snaps/m-backend-prompt.md index 04c7dec..d6322a6 100644 --- a/r/tests/testthat/_snaps/m-backend-prompt.md +++ b/r/tests/testthat/_snaps/m-backend-prompt.md @@ -5,3 +5,10 @@ Output You are a helpful sentiment engine. Return only one of the following answers: positive. No capitalization. No explanations. The answer will be based on each individual prompt. Treat each prompt as unique when deciding the answer. +--- + + Code + ellmer_funcs$custom("Translate to Spanish:") + Output + Translate to Spanish: The answer will be based on each individual prompt. Treat each prompt as unique when deciding the answer. + diff --git a/r/tests/testthat/test-llm-classify.R b/r/tests/testthat/test-llm-classify.R index b463cab..9a8b045 100644 --- a/r/tests/testthat/test-llm-classify.R +++ b/r/tests/testthat/test-llm-classify.R @@ -1,6 +1,12 @@ test_that("Classify works", { test_text <- "this is a test" - llm_use("simulate_llm", "echo", .silent = TRUE, .force = TRUE, .cache = .mall_test$cache) + llm_use( + "simulate_llm", + "echo", + .silent = TRUE, + .force = TRUE, + .cache = .mall_test$cache + ) expect_equal( llm_vec_classify(test_text, labels = test_text), test_text diff --git a/r/tests/testthat/test-llm-custom.R b/r/tests/testthat/test-llm-custom.R index 9525660..ff747b4 100644 --- a/r/tests/testthat/test-llm-custom.R +++ b/r/tests/testthat/test-llm-custom.R @@ -1,12 +1,22 @@ test_that("Custom works", { test_text <- "this is a test" - llm_use("simulate_llm", "echo", .silent = TRUE, .force = TRUE, .cache = .mall_test$cache) + llm_use( + "simulate_llm", + "echo", + .silent = TRUE, + .force = TRUE, + .cache = .mall_test$cache + ) expect_equal( llm_vec_custom(test_text, "this is 
a test: "), test_text ) expect_message( - x <- llm_vec_custom(test_text, "this is a test: ", valid_resps = "not valid") + x <- llm_vec_custom( + test_text, + "this is a test: ", + valid_resps = "not valid" + ) ) expect_equal(x, as.character(NA)) @@ -16,7 +26,12 @@ test_that("Custom works", { ) expect_equal( - llm_custom(data.frame(x = test_text), x, "this is a test: ", pred_name = "new"), + llm_custom( + data.frame(x = test_text), + x, + "this is a test: ", + pred_name = "new" + ), data.frame(x = test_text, new = test_text) ) }) diff --git a/r/tests/testthat/test-llm-extract.R b/r/tests/testthat/test-llm-extract.R index ac49238..c441895 100644 --- a/r/tests/testthat/test-llm-extract.R +++ b/r/tests/testthat/test-llm-extract.R @@ -1,5 +1,11 @@ test_that("Extract works", { - llm_use("simulate_llm", "prompt", .silent = TRUE, .force = TRUE, .cache = .mall_test$cache) + llm_use( + "simulate_llm", + "prompt", + .silent = TRUE, + .force = TRUE, + .cache = .mall_test$cache + ) expect_snapshot( llm_vec_extract("toaster", labels = "product") @@ -33,6 +39,17 @@ test_that("Extract data frame works", { ), data.frame(x = "test1|test2", y = "test1", z = "test2") ) + + expect_equal( + llm_extract( + .data = data.frame(x = "test1|test2"), + col = x, + labels = c("product1", "product2"), + expand_cols = TRUE, + pred_name = c("item", "vibe") + ), + data.frame(x = "test1|test2", item = "test1", vibe = "test2") + ) }) test_that("Extract on Ollama works", { diff --git a/r/tests/testthat/test-llm-sentiment.R b/r/tests/testthat/test-llm-sentiment.R index 1b2c05b..4b5629d 100644 --- a/r/tests/testthat/test-llm-sentiment.R +++ b/r/tests/testthat/test-llm-sentiment.R @@ -1,5 +1,11 @@ test_that("Sentiment works", { - llm_use("simulate_llm", "pipe", .silent = TRUE, .force = TRUE, .cache = .mall_test$cache) + llm_use( + "simulate_llm", + "pipe", + .silent = TRUE, + .force = TRUE, + .cache = .mall_test$cache + ) expect_equal( llm_vec_sentiment("this is a test|positive"), "positive" diff --git 
a/r/tests/testthat/test-llm-summarize.R b/r/tests/testthat/test-llm-summarize.R index e168d05..f986bed 100644 --- a/r/tests/testthat/test-llm-summarize.R +++ b/r/tests/testthat/test-llm-summarize.R @@ -1,6 +1,12 @@ test_that("Summarize works", { test_text <- "this is a test" - llm_use("simulate_llm", "echo", .silent = TRUE, .force = TRUE, .cache = .mall_test$cache) + llm_use( + "simulate_llm", + "echo", + .silent = TRUE, + .force = TRUE, + .cache = .mall_test$cache + ) expect_equal( llm_vec_summarize(test_text), test_text diff --git a/r/tests/testthat/test-llm-translate.R b/r/tests/testthat/test-llm-translate.R index 17325ae..7c728ad 100644 --- a/r/tests/testthat/test-llm-translate.R +++ b/r/tests/testthat/test-llm-translate.R @@ -1,6 +1,12 @@ test_that("Translate works", { test_text <- "this is a test" - llm_use("simulate_llm", "echo", .silent = TRUE, .force = TRUE, .cache = .mall_test$cache) + llm_use( + "simulate_llm", + "echo", + .silent = TRUE, + .force = TRUE, + .cache = .mall_test$cache + ) expect_equal( llm_vec_translate(test_text, language = "other"), test_text diff --git a/r/tests/testthat/test-llm-verify.R b/r/tests/testthat/test-llm-verify.R index fec840b..47fa095 100644 --- a/r/tests/testthat/test-llm-verify.R +++ b/r/tests/testthat/test-llm-verify.R @@ -1,6 +1,12 @@ test_that("Verify works", { test_text <- "this is a test" - llm_use("simulate_llm", "echo", .silent = TRUE, .force = TRUE, .cache = .mall_test$cache) + llm_use( + "simulate_llm", + "echo", + .silent = TRUE, + .force = TRUE, + .cache = .mall_test$cache + ) expect_equal( llm_vec_verify(test_text, "test", yes_no = test_text), test_text diff --git a/r/tests/testthat/test-m-backend-prompt.R b/r/tests/testthat/test-m-backend-prompt.R index cbe7538..87da2f2 100644 --- a/r/tests/testthat/test-m-backend-prompt.R +++ b/r/tests/testthat/test-m-backend-prompt.R @@ -38,4 +38,5 @@ test_that("Ellmer method works", { class(ellmer_session) <- c("mall_ellmer", "mall_session") ellmer_funcs <- 
m_backend_prompt(ellmer_session, "") expect_snapshot(ellmer_funcs$sentiment("positive")) + expect_snapshot(ellmer_funcs$custom("Translate to Spanish:")) }) diff --git a/reference/MallFrame.qmd b/reference/MallFrame.qmd index 358c76d..d01ae23 100644 --- a/reference/MallFrame.qmd +++ b/reference/MallFrame.qmd @@ -1,12 +1,15 @@ -# MallFrame {#mall.MallFrame} +# MallFrame { #mall.MallFrame } -``` python +```python MallFrame(df) ``` -Extension to Polars that add ability to use an LLM to run batch predictions over a data frame +Extension to Polars that add ability to use +an LLM to run batch predictions over a data frame -We will start by loading the needed libraries, and set up the data frame that will be used in the examples: +We will start by loading the needed libraries, and +set up the data frame that will be used in the +examples: ```{python} #| output: false @@ -17,13 +20,13 @@ pl.Config.set_tbl_hide_dataframe_shape(True) pl.Config.set_tbl_hide_column_data_types(True) data = mall.MallData reviews = data.reviews -reviews.llm.use("ollama", model = "llama3.2") +reviews.llm.use(options = dict(seed = 100)) ``` ## Methods | Name | Description | -|------------------------------------|------------------------------------| +| --- | --- | | [classify](#mall.MallFrame.classify) | Classify text into specific categories. | | [custom](#mall.MallFrame.custom) | Provide the full prompt that the LLM will process. | | [extract](#mall.MallFrame.extract) | Pull a specific label from the text. | @@ -33,9 +36,9 @@ reviews.llm.use("ollama", model = "llama3.2") | [use](#mall.MallFrame.use) | Define the model, backend, and other options to use to | | [verify](#mall.MallFrame.verify) | Check to see if something is true about the text. | -### classify {#mall.MallFrame.classify} +### classify { #mall.MallFrame.classify } -``` python +```python MallFrame.classify(col, labels='', additional='', pred_name='classify') ``` @@ -43,12 +46,12 @@ Classify text into specific categories. 
#### Parameters {.doc-section .doc-section-parameters} -| Name | Type | Description | Default | -|---------|---------|---------------------------------------------|---------| -| col | str | The name of the text field to process | *required* | -| labels | list | A list or a DICT object that defines the categories to classify the text as. It will return one of the provided labels. | `''` | -| pred_name | str | A character vector with the name of the new column where the prediction will be placed | `'classify'` | -| additional | str | Inserts this text into the prompt sent to the LLM | `''` | +| Name | Type | Description | Default | +|------------|--------|-------------------------------------------------------------------------------------------------------------------------|--------------| +| col | str | The name of the text field to process | _required_ | +| labels | list | A list or a DICT object that defines the categories to classify the text as. It will return one of the provided labels. | `''` | +| pred_name | str | A character vector with the name of the new column where the prediction will be placed | `'classify'` | +| additional | str | Inserts this text into the prompt sent to the LLM | `''` | #### Examples {.doc-section .doc-section-examples} @@ -66,9 +69,9 @@ reviews.llm.classify("review", ["appliance", "computer"], pred_name="prod_type") reviews.llm.classify("review", {"appliance" : "1", "computer" : "2"}) ``` -### custom {#mall.MallFrame.custom} +### custom { #mall.MallFrame.custom } -``` python +```python MallFrame.custom(col, prompt='', valid_resps='', pred_name='custom') ``` @@ -76,11 +79,11 @@ Provide the full prompt that the LLM will process. 
#### Parameters {.doc-section .doc-section-parameters} -| Name | Type | Description | Default | -|----------|----------|-------------------------------------------|----------| -| col | str | The name of the text field to process | *required* | -| prompt | str | The prompt to send to the LLM along with the `col` | `''` | -| pred_name | str | A character vector with the name of the new column where the prediction will be placed | `'custom'` | +| Name | Type | Description | Default | +|-----------|--------|----------------------------------------------------------------------------------------|------------| +| col | str | The name of the text field to process | _required_ | +| prompt | str | The prompt to send to the LLM along with the `col` | `''` | +| pred_name | str | A character vector with the name of the new column where the prediction will be placed | `'custom'` | #### Examples {.doc-section .doc-section-examples} @@ -95,9 +98,9 @@ my_prompt = ( reviews.llm.custom("review", prompt = my_prompt) ``` -### extract {#mall.MallFrame.extract} +### extract { #mall.MallFrame.extract } -``` python +```python MallFrame.extract( col, labels='', @@ -111,12 +114,12 @@ Pull a specific label from the text. 
#### Parameters {.doc-section .doc-section-parameters} -| Name | Type | Description | Default | -|----------|----------|------------------------------------------|----------| -| col | str | The name of the text field to process | *required* | -| labels | list | A list or a DICT object that defines tells the LLM what to look for and return | `''` | -| pred_name | str | A character vector with the name of the new column where the prediction will be placed | `'extract'` | -| additional | str | Inserts this text into the prompt sent to the LLM | `''` | +| Name | Type | Description | Default | +|------------|--------|----------------------------------------------------------------------------------------|-------------| +| col | str | The name of the text field to process | _required_ | +| labels | list | A list or a DICT object that defines tells the LLM what to look for and return | `''` | +| pred_name | str | A character vector with the name of the new column where the prediction will be placed | `'extract'` | +| additional | str | Inserts this text into the prompt sent to the LLM | `''` | #### Examples {.doc-section .doc-section-examples} @@ -155,9 +158,9 @@ reviews.llm.extract( ) ``` -### sentiment {#mall.MallFrame.sentiment} +### sentiment { #mall.MallFrame.sentiment } -``` python +```python MallFrame.sentiment( col, options=['positive', 'negative', 'neutral'], @@ -170,12 +173,12 @@ Use an LLM to run a sentiment analysis #### Parameters {.doc-section .doc-section-parameters} -| Name | Type | Description | Default | -|-------------|-------------|----------------------------------|-------------| -| col | str | The name of the text field to process | *required* | -| options | list or dict | A list of the sentiment options to use, or a named DICT object | `['positive', 'negative', 'neutral']` | -| pred_name | str | A character vector with the name of the new column where the prediction will be placed | `'sentiment'` | -| additional | str | Inserts this text into the 
prompt sent to the LLM | `''` | +| Name | Type | Description | Default | +|------------|--------------|----------------------------------------------------------------------------------------|---------------------------------------| +| col | str | The name of the text field to process | _required_ | +| options | list or dict | A list of the sentiment options to use, or a named DICT object | `['positive', 'negative', 'neutral']` | +| pred_name | str | A character vector with the name of the new column where the prediction will be placed | `'sentiment'` | +| additional | str | Inserts this text into the prompt sent to the LLM | `''` | #### Examples {.doc-section .doc-section-examples} @@ -198,9 +201,9 @@ reviews.llm.sentiment("review", ["positive", "negative"]) reviews.llm.sentiment("review", {"positive" : 1, "negative" : 0}) ``` -### summarize {#mall.MallFrame.summarize} +### summarize { #mall.MallFrame.summarize } -``` python +```python MallFrame.summarize(col, max_words=10, additional='', pred_name='summary') ``` @@ -208,12 +211,12 @@ Summarize the text down to a specific number of words. 
#### Parameters {.doc-section .doc-section-parameters} -| Name | Type | Description | Default | -|----------|----------|------------------------------------------|----------| -| col | str | The name of the text field to process | *required* | -| max_words | int | Maximum number of words to use for the summary | `10` | -| pred_name | str | A character vector with the name of the new column where the prediction will be placed | `'summary'` | -| additional | str | Inserts this text into the prompt sent to the LLM | `''` | +| Name | Type | Description | Default | +|------------|--------|----------------------------------------------------------------------------------------|-------------| +| col | str | The name of the text field to process | _required_ | +| max_words | int | Maximum number of words to use for the summary | `10` | +| pred_name | str | A character vector with the name of the new column where the prediction will be placed | `'summary'` | +| additional | str | Inserts this text into the prompt sent to the LLM | `''` | #### Examples {.doc-section .doc-section-examples} @@ -227,9 +230,9 @@ reviews.llm.summarize("review", max_words = 5) reviews.llm.summarize("review", 5, pred_name = "review_summary") ``` -### translate {#mall.MallFrame.translate} +### translate { #mall.MallFrame.translate } -``` python +```python MallFrame.translate(col, language='', additional='', pred_name='translation') ``` @@ -237,12 +240,12 @@ Translate text into another language. #### Parameters {.doc-section .doc-section-parameters} -| Name | Type | Description | Default | -|-----------|-----------|-----------------------------------------|-----------| -| col | str | The name of the text field to process | *required* | -| language | str | The target language to translate to. For example 'French'. 
| `''` | -| pred_name | str | A character vector with the name of the new column where the prediction will be placed | `'translation'` | -| additional | str | Inserts this text into the prompt sent to the LLM | `''` | +| Name | Type | Description | Default | +|------------|--------|----------------------------------------------------------------------------------------|-----------------| +| col | str | The name of the text field to process | _required_ | +| language | str | The target language to translate to. For example 'French'. | `''` | +| pred_name | str | A character vector with the name of the new column where the prediction will be placed | `'translation'` | +| additional | str | Inserts this text into the prompt sent to the LLM | `''` | #### Examples {.doc-section .doc-section-examples} @@ -254,34 +257,33 @@ reviews.llm.translate("review", "spanish") reviews.llm.translate("review", "french") ``` -### use {#mall.MallFrame.use} +### use { #mall.MallFrame.use } -``` python +```python MallFrame.use(backend='', model='', _cache='_mall_cache', **kwargs) ``` -Define the model, backend, and other options to use to interact with the LLM. +Define the model, backend, and other options to use to +interact with the LLM. #### Parameters {.doc-section .doc-section-parameters} -| Name | Type | Description | Default | -|---------|---------|---------------------------------------------|---------| -| backend | str \| Chat \| Client | The name of the backend to use, or an Ollama Client object, or a `chatlas` Chat object. At the beginning of the session it defaults to "ollama". If passing `""`, it will remain unchanged | `''` | -| model | str | The name of the model tha the backend should use. At the beginning of the session it defaults to "llama3.2". If passing `""`, it will remain unchanged | `''` | -| \_cache | str | The path of where to save the cached results. 
Passing `""` disables the cache | `'_mall_cache'` | -| \*\*kwargs | | Arguments to pass to the downstream Python call. In this case, the `chat` function in `ollama` | `{}` | +| Name | Type | Description | Default | +|----------|-----------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------| +| backend | str \| Chat \| Client | The name of the backend to use, or an Ollama Client object, or a `chatlas` Chat object. At the beginning of the session it defaults to "ollama". If passing `""`, it will remain unchanged | `''` | +| model | str | The name of the model tha the backend should use. At the beginning of the session it defaults to "llama3.2". If passing `""`, it will remain unchanged | `''` | +| _cache | str | The path of where to save the cached results. Passing `""` disables the cache | `'_mall_cache'` | +| **kwargs | | Arguments to pass to the downstream Python call. 
In this case, the `chat` function in `ollama` | `{}` | #### Examples {.doc-section .doc-section-examples} ```{python} -#| eval: false # Additional arguments will be passed 'as-is' to the # downstream R function in this example, to ollama::chat() reviews.llm.use("ollama", "llama3.2", options = dict(seed = 100, temperature = 0.1)) ``` ```{python} -#| eval: false # During the Python session, you can change any argument # individually and it will retain all of previous # arguments used @@ -289,28 +291,25 @@ reviews.llm.use(options = dict(temperature = 0.3)) ``` ```{python} -#| eval: false # Use _cache to modify the target folder for caching reviews.llm.use(_cache = "_my_cache") ``` ```{python} -#| eval: false # Leave _cache empty to turn off this functionality reviews.llm.use(_cache = "") ``` ```{python} -#| eval: false # Use a `chatlas` object from chatlas import ChatOpenAI chat = ChatOpenAI() reviews.llm.use(chat) ``` -### verify {#mall.MallFrame.verify} +### verify { #mall.MallFrame.verify } -``` python +```python MallFrame.verify(col, what='', yes_no=[1, 0], additional='', pred_name='verify') ``` @@ -318,13 +317,13 @@ Check to see if something is true about the text. #### Parameters {.doc-section .doc-section-parameters} -| Name | Type | Description | Default | -|--------|--------|------------------------------------------------|--------| -| col | str | The name of the text field to process | *required* | -| what | str | The statement or question that needs to be verified against the provided text | `''` | -| yes_no | list | A positional list of size 2, which contains the values to return if true and false. 
The first position will be used as the 'true' value, and the second as the 'false' value | `[1, 0]` | -| pred_name | str | A character vector with the name of the new column where the prediction will be placed | `'verify'` | -| additional | str | Inserts this text into the prompt sent to the LLM | `''` | +| Name | Type | Description | Default | +|------------|--------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------| +| col | str | The name of the text field to process | _required_ | +| what | str | The statement or question that needs to be verified against the provided text | `''` | +| yes_no | list | A positional list of size 2, which contains the values to return if true and false. The first position will be used as the 'true' value, and the second as the 'false' value | `[1, 0]` | +| pred_name | str | A character vector with the name of the new column where the prediction will be placed | `'verify'` | +| additional | str | Inserts this text into the prompt sent to the LLM | `''` | #### Examples {.doc-section .doc-section-examples} diff --git a/reference/llm_classify.qmd b/reference/llm_classify.qmd index 55aa36d..927f942 100644 --- a/reference/llm_classify.qmd +++ b/reference/llm_classify.qmd @@ -20,7 +20,6 @@ Use a Large Language Model (LLM) to classify the provided text as one of the opt ## Usage ```r - llm_classify( .data, col, diff --git a/reference/llm_custom.qmd b/reference/llm_custom.qmd index 976c9bc..a840b7f 100644 --- a/reference/llm_custom.qmd +++ b/reference/llm_custom.qmd @@ -20,10 +20,9 @@ Use a Large Language Model (LLM) to process the provided text using the instruct ## Usage ```r - llm_custom(.data, col, prompt = "", pred_name = ".pred", valid_resps = "") -llm_vec_custom(x, prompt = "", valid_resps = NULL) +llm_vec_custom(x, prompt = "", valid_resps = NULL, preview = FALSE) ``` ## Arguments @@ -35,6 +34,7 
@@ llm_vec_custom(x, prompt = "", valid_resps = NULL) | pred_name | A character vector with the name of the new column where the prediction will be placed | | valid_resps | If the response from the LLM is not open, but deterministic, provide the options in a vector. This function will set to `NA` any response not in the options | | x | A vector that contains the text to be analyzed | +| preview | It returns the R call that would have been used to run the prediction. It only returns the first record in `x`. Defaults to `FALSE` Applies to vector function only. | @@ -62,6 +62,12 @@ my_prompt <- paste( reviews |> llm_custom(review, my_prompt) +# For character vectors, instead of a data frame, use this function +llm_vec_custom(reviews$review, my_prompt) + +# To preview the first call that will be made to the downstream R function +llm_vec_custom(reviews$review, my_prompt, preview = TRUE) + ``` diff --git a/reference/llm_extract.qmd b/reference/llm_extract.qmd index a655aa2..0e8ea20 100644 --- a/reference/llm_extract.qmd +++ b/reference/llm_extract.qmd @@ -20,7 +20,6 @@ Use a Large Language Model (LLM) to extract specific entity, or entities, from t ## Usage ```r - llm_extract( .data, col, diff --git a/reference/llm_sentiment.qmd b/reference/llm_sentiment.qmd index 7e5fb6c..3faacb3 100644 --- a/reference/llm_sentiment.qmd +++ b/reference/llm_sentiment.qmd @@ -20,7 +20,6 @@ Use a Large Language Model (LLM) to perform sentiment analysis from the provided ## Usage ```r - llm_sentiment( .data, col, diff --git a/reference/llm_summarize.qmd b/reference/llm_summarize.qmd index 954cca6..f85bef4 100644 --- a/reference/llm_summarize.qmd +++ b/reference/llm_summarize.qmd @@ -20,7 +20,6 @@ Use a Large Language Model (LLM) to summarize text ## Usage ```r - llm_summarize( .data, col, diff --git a/reference/llm_translate.qmd b/reference/llm_translate.qmd index 50df4ba..eae7ba8 100644 --- a/reference/llm_translate.qmd +++ b/reference/llm_translate.qmd @@ -20,7 +20,6 @@ Use a Large 
Language Model (LLM) to translate a text to a specific language ## Usage ```r - llm_translate( .data, col, diff --git a/reference/llm_use.qmd b/reference/llm_use.qmd index 099dfc5..0c1f4c5 100644 --- a/reference/llm_use.qmd +++ b/reference/llm_use.qmd @@ -20,7 +20,6 @@ Allows us to specify the back-end provider, model to use during the current R se ## Usage ```r - llm_use( backend = NULL, model = NULL, diff --git a/reference/llm_verify.qmd b/reference/llm_verify.qmd index 8b5f570..6e104bb 100644 --- a/reference/llm_verify.qmd +++ b/reference/llm_verify.qmd @@ -20,7 +20,6 @@ Use a Large Language Model (LLM) to see if something is true or not based the pr ## Usage ```r - llm_verify( .data, col, diff --git a/reference/reviews.qmd b/reference/reviews.qmd index 6dbb28d..565f89c 100644 --- a/reference/reviews.qmd +++ b/reference/reviews.qmd @@ -22,7 +22,6 @@ A data frame that contains 3 records. The records are of fictitious product revi ## Usage ```r - reviews ``` diff --git a/site/README.md b/site/README.md index 5500dec..9b9a913 100644 --- a/site/README.md +++ b/site/README.md @@ -2,15 +2,17 @@ To re-create the reference files, and capture the possibly new output from the resulting Quarto files, use the following steps: ```bash -uv sync --project python/ -uv pip install python/ jupyter quartodoc --python python/.venv/bin/python3 +uv venv .venv-site +UV_PROJECT_ENVIRONMENT=$PWD/.venv-site uv sync --project python/ +uv pip install jupyter quartodoc "griffe<1.0" --python .venv-site/bin/python3 R CMD INSTALL R/ rm -rf _freeze/reference rm -rf _freeze/index R -e 'pkgsite::write_reference()' -python/.venv/bin/quartodoc build --verbose +.venv-site/bin/quartodoc build --verbose export OPENAI_API_KEY="na" -export QUARTO_PYTHON=python/.venv/bin/python3 +export QUARTO_PYTHON=.venv-site/bin/python3 quarto render +rm -rf .venv-site quarto preview ```
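
Reviewer note on the `m_backend_submit.mall_ollama` hunk in this patch: the reformatted block fills the prompt template with each row's text, counts the space-separated words, and records a warning entry when the count exceeds `m_ollama_tokens()`. Below is a minimal Python sketch of that logic, assuming the single-line template introduced in `m_backend_prompt.mall_ollama`. The names `TEMPLATE`, `fill_and_check`, and `max_words` are hypothetical and are not part of `mall`; the R code's handling of edge cases such as trailing spaces may differ slightly.

```python
# Hypothetical illustration only; not mall's API.
# Template mirrors the new single-line form from the patch
# ("...following text: {{x}}" instead of "...following text:\n{{x}}").
TEMPLATE = "The answer is based on the following text: {x}"


def fill_and_check(texts, template=TEMPLATE, max_words=100):
    """Fill the template for each text and flag prompts that look too long.

    The length check is a rough proxy: it counts space-separated words,
    as the R code does with strsplit(out, " "), not real model tokens.
    """
    prompts = []
    warnings = []
    for text in texts:
        prompt = template.replace("{x}", text)
        n_words = len(prompt.split(" "))
        if n_words > max_words:
            # Keep only the first 20 characters of the row, like substr(x, 1, 20)
            warnings.append({"row": text[:20], "len": n_words})
        prompts.append(prompt)
    return prompts, warnings


if __name__ == "__main__":
    prompts, warnings = fill_and_check(["this is a test"], max_words=5)
    print(prompts[0])
    print(warnings)
```

Counting words rather than tokens keeps the check free of any tokenizer dependency, at the cost of being only an approximation of the model's context limit.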