Skip to content

feat (dataapp-deployment): add knowledge about Storage access #71

Open
MiroCillik wants to merge 2 commits into
mainfrom
miro-AJDA-2519
Open

feat (dataapp-deployment): add knowledge about Storage access #71
MiroCillik wants to merge 2 commits into
mainfrom
miro-AJDA-2519

Conversation

@MiroCillik
Copy link
Copy Markdown
Member

@MiroCillik MiroCillik commented Apr 29, 2026

Why

Developers building Keboola Data Apps that need live Storage I/O (reads and writes through the Query Service) had no skill-level guidance. Without it, an AI-assisted developer was on their own to discover the direct-grant output mapping, the four platform-injected env vars, the exact table-identifier shape Snowflake expects, and — most dangerously — the fact that the Query Service has no parameterized queries, which makes every interpolated value a potential injection point.

The result: the skill produced container-correct apps that didn't actually know how to talk to Storage, so the developer had to leave the skill to figure out the runtime layer.

Context

The existing dataapp-deployment skill stops at the container boundary — it covers Nginx, Supervisord, secrets, and the POST-/-on-startup quirk, but says nothing about how an app reads or writes a Storage table. Meanwhile, interactive read/write data apps (dashboards with input forms, lightweight admin tools, internal one-pagers) are increasingly the primary use case, and Keboola already ships a Python and a JS Query Service SDK that make this practical. The gap was knowledge, not capability.

Approach

Document the full runtime layer as a single section that takes a developer from operator-side configuration through to "first INSERT works locally." The narrative is:

  1. Operator side — what the Storage Access toggle does, what unload_strategy: "direct-grant" actually grants, and that bucket stage (in. vs out.) does not constrain writes (a misconception that bit us during testing).
  2. Runtime side — the four env vars the platform injects, copy-paste wrappers for both Python (keboola-query-service) and Node.js / TypeScript (@keboola/query-service), and the canonical table-identifier syntax.
  3. Security — every SQL value must be validated and escaped because the Query Service does not support parameterized queries. Both languages get a concrete validation module example.
  4. Local development — the workspace + token + .env workflow so the same code runs identically locally and in the container.
  5. Errors — the four failure modes a developer is actually going to hit, with the fix for each.

Python and Node.js / TypeScript are covered side-by-side throughout because the skill targets both runtimes — the container image ships both — and an AI assistant choosing between them shouldn't have to discover the JS SDK separately. The Streamlit and Express subsections each get a small "storage access in this framework" pointer (Streamlit: @st.cache_resource for the singleton; Express: import-time env-var requirement) so the per-framework patterns close the loop with the new section.

Impact

  • An AI-assisted developer can scaffold a working Data App with live Storage I/O on first try, in either Python or JavaScript, without leaving the skill.
  • The validation guidance closes a real injection-shaped trap that the previous skill silently left open.
  • The frontmatter description now triggers the skill for Streamlit / FastAPI / Flask and Node.js / TypeScript prompts, not just generic "deploy a data app" prompts.
  • No breaking changes; this is purely additive.

Testing

Validated by building two greenfield apps directly from the new skill content before opening this PR:

  • Python + FastAPI app reading and writing out.c-data-app.mvc-crashes (Chart.js + Leaflet visualization, add-crash form).
  • Node.js + Express app reading and writing in.c-dbt-beers.beers (table browser with filter, add-beer form).

Both apps ran end-to-end against project 40 on eu-central-1 on first try — KPIs populated, the map and table rendered, and INSERTs round-tripped. The in.c- write target on the second app is what surfaced the bucket-stage clarification now in the doc.

Notes

  • Source-of-truth file is plugins/dataapp-developer/skills/dataapp-deployment/SKILL.md. Anyone using a personal copy at ~/.claude/skills/keboola-data-app/SKILL.md will need to resync after merge.
  • The two reference apps (data-app-direct-access-skill, data-app-direct-access-js) are not part of this repo and don't need to land here, but they're available on request as worked examples for future updates.

🤖 Generated with Claude Code

@linear
Copy link
Copy Markdown

linear Bot commented Apr 29, 2026

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Extends the dataapp-deployment skill to include end-to-end guidance for reading/writing Keboola Storage from inside a Data App using the Query Service, including operator configuration, runtime env vars, client wrappers, validation guidance, and local-dev setup.

Changes:

  • Adds a new “Accessing Keboola Storage (Query Service)” section covering enablement, env vars, Python/TS wrappers, SQL validation, and local development workflow.
  • Updates Streamlit and Express patterns with storage-access usage notes and examples.
  • Expands the deployment checklist and frontmatter description to include Storage Access/Query Service guidance.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +413 to +421
Wrap it in a single module so route handlers never touch the raw env vars or Client. The same four env vars apply.

```typescript
// storage.ts (or storage.js — strip the type annotations)
import { readFileSync } from 'node:fs';
import { Client } from '@keboola/query-service';

// Optional dev-only: load .env when present (skip in production / container).
try { (await import('dotenv')).default.config(); } catch { /* ignored */ }
Copy link

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The storage.ts snippet uses top-level await import('dotenv'), which will fail in CommonJS and in many TS builds that compile to module: commonjs (top-level await not allowed). To keep this copy/paste-ready, load dotenv in the app entrypoint before importing storage (or switch to a non-top-level-await pattern) so the module can be imported in both ESM and CJS setups.

Suggested change
Wrap it in a single module so route handlers never touch the raw env vars or Client. The same four env vars apply.
```typescript
// storage.ts (or storage.js — strip the type annotations)
import { readFileSync } from 'node:fs';
import { Client } from '@keboola/query-service';
// Optional dev-only: load .env when present (skip in production / container).
try { (await import('dotenv')).default.config(); } catch { /* ignored */ }
Wrap it in a single module so route handlers never touch the raw env vars or Client. The same four env vars apply. For local development, load `.env` in your app entrypoint before importing this module.
```typescript
// storage.ts (or storage.js — strip the type annotations)
import { readFileSync } from 'node:fs';
import { Client } from '@keboola/query-service';
// For local development, call dotenv.config() in your app entrypoint
// before importing this module. Skip that in production / container.

Copilot uses AI. Check for mistakes.
}
```

For CommonJS apps, replace the dynamic `dotenv` import with a regular `require('dotenv').config()`, and use `import { Client } = require('@keboola/query-service')` syntax.
Copy link

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CommonJS guidance uses import { Client } = require('@keboola/query-service'), which is invalid syntax in CommonJS. Use a destructuring require (or const Client = require(...).Client) so readers don't copy a snippet that throws on startup.

Suggested change
For CommonJS apps, replace the dynamic `dotenv` import with a regular `require('dotenv').config()`, and use `import { Client } = require('@keboola/query-service')` syntax.
For CommonJS apps, replace the dynamic `dotenv` import with a regular `require('dotenv').config()`, and use `const { Client } = require('@keboola/query-service')` syntax.

Copilot uses AI. Check for mistakes.

```python
# validation.py
from datetime import date, time
Copy link

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the validation.py example, date and time are imported but never used. Since the PR text explicitly aims for copy/paste-ready snippets, it would be better to remove unused imports to avoid immediate linting failures/confusion.

Suggested change
from datetime import date, time

Copilot uses AI. Check for mistakes.
Comment on lines +553 to +556

export function parseInt32(v: unknown, field: string): number {
const n = Number(v);
if (!Number.isInteger(n)) throw new ValidationError(`${field} must be an integer`);
Copy link

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parseInt32 doesn’t actually validate a 32-bit integer range; it only checks Number.isInteger. Either enforce the 32-bit bounds or rename the helper to avoid misleading callers about what’s being validated.

Suggested change
export function parseInt32(v: unknown, field: string): number {
const n = Number(v);
if (!Number.isInteger(n)) throw new ValidationError(`${field} must be an integer`);
const INT32_MIN = -2147483648;
const INT32_MAX = 2147483647;
export function parseInt32(v: unknown, field: string): number {
const n = Number(v);
if (!Number.isInteger(n) || n < INT32_MIN || n > INT32_MAX) {
throw new ValidationError(`${field} must be a 32-bit integer`);
}

Copilot uses AI. Check for mistakes.
Comment on lines +560 to +563
export function parseBorough(v: unknown): string {
const upper = String(v ?? '').trim().toUpperCase();
if (!BOROUGHS.has(upper as typeof BOROUGHS extends Set<infer T> ? T : never)) {
throw new ValidationError(`BOROUGH must be one of ${[...BOROUGHS].join(', ')}`);
Copy link

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parseBorough example’s type expression upper as typeof BOROUGHS extends Set<infer T> ? T : never is effectively a no-op here (the Set still ends up as Set<string>), and it makes the snippet harder to understand. Consider explicitly typing the set (e.g. a type Borough = ... union) and using a straightforward .has(upper as Borough) check.

Copilot uses AI. Check for mistakes.
@MiroCillik MiroCillik requested review from odinuv and sykora-ji April 29, 2026 18:03
- Move dotenv loading out of storage.ts wrapper into the app entrypoint, so
  the wrapper is portable across ESM and CJS (top-level await would break CJS).
- Replace invalid CJS syntax `import { Client } = require(...)` with proper
  destructuring `const { Client } = require(...)`.
- Drop unused `from datetime import date, time` imports from the Python
  validation.py snippet.
- Make `parseInt32` actually validate the 32-bit range it advertises (was
  only checking `Number.isInteger`).
- Replace the over-clever conditional-type expression in `parseBorough` with
  a plain `type Borough` union and `ReadonlySet<Borough>` for clarity.
@jordanrburger
Copy link
Copy Markdown
Collaborator

@MiroCillik - Maybe we can sit down together for 30 minutes and get the skill completely updated. I don't think the current structure is good at all. I've been meaning to change it but I haven't gotten around to it. Imho, we need to unify everything under "dataapp-developer" and get rid of the data app deployment skill all together.

@jordanrburger jordanrburger self-requested a review May 4, 2026 11:37
@MiroCillik
Copy link
Copy Markdown
Member Author

@jordanrburger yes Jordan, I completely agree, would be best to unify those together.

@davidesner
Copy link
Copy Markdown
Contributor

@MiroCillik - Maybe we can sit down together for 30 minutes and get the skill completely updated. I don't think the current structure is good at all. I've been meaning to change it but I haven't gotten around to it. Imho, we need to unify everything under "dataapp-developer" and get rid of the data app deployment skill all together.

@MiroCillik FYI https://linear.app/keboola/issue/AI-3147/extend-data-app-development-skill-to-cover-full-lifecycle-storage-git

@MiroCillik
Copy link
Copy Markdown
Member Author

I'm closing this, David Esner is preparing a new consolidated skill for Data Apps including Direct Storage access

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants