feat (dataapp-deployment): add knowledge about Storage access #71
feat (dataapp-deployment): add knowledge about Storage access #71MiroCillik wants to merge 2 commits into
Conversation
There was a problem hiding this comment.
Pull request overview
Extends the dataapp-deployment skill to include end-to-end guidance for reading/writing Keboola Storage from inside a Data App using the Query Service, including operator configuration, runtime env vars, client wrappers, validation guidance, and local-dev setup.
Changes:
- Adds a new “Accessing Keboola Storage (Query Service)” section covering enablement, env vars, Python/TS wrappers, SQL validation, and local development workflow.
- Updates Streamlit and Express patterns with storage-access usage notes and examples.
- Expands the deployment checklist and frontmatter description to include Storage Access/Query Service guidance.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| Wrap it in a single module so route handlers never touch the raw env vars or Client. The same four env vars apply. | ||
|
|
||
| ```typescript | ||
| // storage.ts (or storage.js — strip the type annotations) | ||
| import { readFileSync } from 'node:fs'; | ||
| import { Client } from '@keboola/query-service'; | ||
|
|
||
| // Optional dev-only: load .env when present (skip in production / container). | ||
| try { (await import('dotenv')).default.config(); } catch { /* ignored */ } |
There was a problem hiding this comment.
The storage.ts snippet uses top-level await import('dotenv'), which will fail in CommonJS and in many TS builds that compile to module: commonjs (top-level await not allowed). To keep this copy/paste-ready, load dotenv in the app entrypoint before importing storage (or switch to a non-top-level-await pattern) so the module can be imported in both ESM and CJS setups.
| Wrap it in a single module so route handlers never touch the raw env vars or Client. The same four env vars apply. | |
| ```typescript | |
| // storage.ts (or storage.js — strip the type annotations) | |
| import { readFileSync } from 'node:fs'; | |
| import { Client } from '@keboola/query-service'; | |
| // Optional dev-only: load .env when present (skip in production / container). | |
| try { (await import('dotenv')).default.config(); } catch { /* ignored */ } | |
| Wrap it in a single module so route handlers never touch the raw env vars or Client. The same four env vars apply. For local development, load `.env` in your app entrypoint before importing this module. | |
| ```typescript | |
| // storage.ts (or storage.js — strip the type annotations) | |
| import { readFileSync } from 'node:fs'; | |
| import { Client } from '@keboola/query-service'; | |
| // For local development, call dotenv.config() in your app entrypoint | |
| // before importing this module. Skip that in production / container. |
| } | ||
| ``` | ||
|
|
||
| For CommonJS apps, replace the dynamic `dotenv` import with a regular `require('dotenv').config()`, and use `import { Client } = require('@keboola/query-service')` syntax. |
There was a problem hiding this comment.
The CommonJS guidance uses import { Client } = require('@keboola/query-service'), which is invalid syntax in CommonJS. Use a destructuring require (or const Client = require(...).Client) so readers don't copy a snippet that throws on startup.
| For CommonJS apps, replace the dynamic `dotenv` import with a regular `require('dotenv').config()`, and use `import { Client } = require('@keboola/query-service')` syntax. | |
| For CommonJS apps, replace the dynamic `dotenv` import with a regular `require('dotenv').config()`, and use `const { Client } = require('@keboola/query-service')` syntax. |
|
|
||
| ```python | ||
| # validation.py | ||
| from datetime import date, time |
There was a problem hiding this comment.
In the validation.py example, date and time are imported but never used. Since the PR text explicitly aims for copy/paste-ready snippets, it would be better to remove unused imports to avoid immediate linting failures/confusion.
| from datetime import date, time |
|
|
||
| export function parseInt32(v: unknown, field: string): number { | ||
| const n = Number(v); | ||
| if (!Number.isInteger(n)) throw new ValidationError(`${field} must be an integer`); |
There was a problem hiding this comment.
parseInt32 doesn’t actually validate a 32-bit integer range; it only checks Number.isInteger. Either enforce the 32-bit bounds or rename the helper to avoid misleading callers about what’s being validated.
| export function parseInt32(v: unknown, field: string): number { | |
| const n = Number(v); | |
| if (!Number.isInteger(n)) throw new ValidationError(`${field} must be an integer`); | |
| const INT32_MIN = -2147483648; | |
| const INT32_MAX = 2147483647; | |
| export function parseInt32(v: unknown, field: string): number { | |
| const n = Number(v); | |
| if (!Number.isInteger(n) || n < INT32_MIN || n > INT32_MAX) { | |
| throw new ValidationError(`${field} must be a 32-bit integer`); | |
| } |
| export function parseBorough(v: unknown): string { | ||
| const upper = String(v ?? '').trim().toUpperCase(); | ||
| if (!BOROUGHS.has(upper as typeof BOROUGHS extends Set<infer T> ? T : never)) { | ||
| throw new ValidationError(`BOROUGH must be one of ${[...BOROUGHS].join(', ')}`); |
There was a problem hiding this comment.
The parseBorough example’s type expression upper as typeof BOROUGHS extends Set<infer T> ? T : never is effectively a no-op here (the Set still ends up as Set<string>), and it makes the snippet harder to understand. Consider explicitly typing the set (e.g. a type Borough = ... union) and using a straightforward .has(upper as Borough) check.
- Move dotenv loading out of storage.ts wrapper into the app entrypoint, so
the wrapper is portable across ESM and CJS (top-level await would break CJS).
- Replace invalid CJS syntax `import { Client } = require(...)` with proper
destructuring `const { Client } = require(...)`.
- Drop unused `from datetime import date, time` imports from the Python
validation.py snippet.
- Make `parseInt32` actually validate the 32-bit range it advertises (was
only checking `Number.isInteger`).
- Replace the over-clever conditional-type expression in `parseBorough` with
a plain `type Borough` union and `ReadonlySet<Borough>` for clarity.
|
@MiroCillik - Maybe we can sit down together for 30 minutes and get the skill completely updated. I don't think the current structure is good at all. I've been meaning to change it but I haven't gotten around to it. Imho, we need to unify everything under "dataapp-developer" and get rid of the data app deployment skill all together. |
|
@jordanrburger yes Jordan, I completely agree, would be best to unify those together. |
@MiroCillik FYI https://linear.app/keboola/issue/AI-3147/extend-data-app-development-skill-to-cover-full-lifecycle-storage-git |
|
I'm closing this, David Esner is preparing a new consolidated skill for Data Apps including Direct Storage access |
Why
Developers building Keboola Data Apps that need live Storage I/O (reads and writes through the Query Service) had no skill-level guidance. Without it, an AI-assisted developer was on their own to discover the
direct-grantoutput mapping, the four platform-injected env vars, the exact table-identifier shape Snowflake expects, and — most dangerously — the fact that the Query Service has no parameterized queries, which makes every interpolated value a potential injection point.The result: the skill produced container-correct apps that didn't actually know how to talk to Storage, so the developer had to leave the skill to figure out the runtime layer.
Context
The existing
dataapp-deploymentskill stops at the container boundary — it covers Nginx, Supervisord, secrets, and the POST-/-on-startup quirk, but says nothing about how an app reads or writes a Storage table. Meanwhile, interactive read/write data apps (dashboards with input forms, lightweight admin tools, internal one-pagers) are increasingly the primary use case, and Keboola already ships a Python and a JS Query Service SDK that make this practical. The gap was knowledge, not capability.Approach
Document the full runtime layer as a single section that takes a developer from operator-side configuration through to "first INSERT works locally." The narrative is:
unload_strategy: "direct-grant"actually grants, and that bucket stage (in.vsout.) does not constrain writes (a misconception that bit us during testing).keboola-query-service) and Node.js / TypeScript (@keboola/query-service), and the canonical table-identifier syntax..envworkflow so the same code runs identically locally and in the container.Python and Node.js / TypeScript are covered side-by-side throughout because the skill targets both runtimes — the container image ships both — and an AI assistant choosing between them shouldn't have to discover the JS SDK separately. The Streamlit and Express subsections each get a small "storage access in this framework" pointer (Streamlit:
@st.cache_resourcefor the singleton; Express: import-time env-var requirement) so the per-framework patterns close the loop with the new section.Impact
Testing
Validated by building two greenfield apps directly from the new skill content before opening this PR:
out.c-data-app.mvc-crashes(Chart.js + Leaflet visualization, add-crash form).in.c-dbt-beers.beers(table browser with filter, add-beer form).Both apps ran end-to-end against project 40 on
eu-central-1on first try — KPIs populated, the map and table rendered, and INSERTs round-tripped. Thein.c-write target on the second app is what surfaced the bucket-stage clarification now in the doc.Notes
plugins/dataapp-developer/skills/dataapp-deployment/SKILL.md. Anyone using a personal copy at~/.claude/skills/keboola-data-app/SKILL.mdwill need to resync after merge.data-app-direct-access-skill,data-app-direct-access-js) are not part of this repo and don't need to land here, but they're available on request as worked examples for future updates.🤖 Generated with Claude Code