From 6c4b8ab68c5be89960506297de448c24e658df7e Mon Sep 17 00:00:00 2001 From: BohuTANG Date: Sun, 26 Oct 2025 20:27:41 +0800 Subject: [PATCH] Polish zh guides translations --- docs/cn/guides/00-products/index.md | 48 +-- .../cn/guides/51-ai-functions/_category_.json | 4 +- docs/cn/guides/54-query/00-sql-analytics.md | 323 +++++++----------- docs/cn/guides/54-query/01-json-search.md | 155 +++------ docs/cn/guides/54-query/02-vector-db.md | 98 +++--- docs/cn/guides/54-query/03-geo-analytics.md | 189 +++++----- docs/cn/guides/54-query/04-lakehouse-etl.md | 194 ++++++----- docs/cn/guides/54-query/_category_.json | 4 +- docs/cn/guides/54-query/index.md | 18 +- .../10-search-functions/index.md | 4 +- .../10-search-functions/query.md | 4 +- i18n/zh/code.json | 92 ++--- 12 files changed, 527 insertions(+), 606 deletions(-) diff --git a/docs/cn/guides/00-products/index.md b/docs/cn/guides/00-products/index.md index 139596a9cd..9875efa44a 100644 --- a/docs/cn/guides/00-products/index.md +++ b/docs/cn/guides/00-products/index.md @@ -12,11 +12,11 @@ import LanguageDocs from '@site/src/components/LanguageDocs'; cn= ' -**Databend** —— 一个数据库,搞定所有数据。 +**Databend** —— 一套引擎撑起所有数据与场景。 -Databend 是开源的云原生数仓,把数据存储、向量搜索、SQL 分析、全文检索、地理计算都整合到一起,用 SQL 就能操作。兼容 Snowflake 语法,数据存在对象存储里,随写随查,不用来回倒腾。 +Databend 是开源的云原生数仓,把存储、向量搜索、SQL 分析、全文检索与地理计算统一到一套与 Snowflake 兼容的 SQL 接口上。所有数据都放在对象存储里,写入、分析、搜索一次到位,无需折腾多套系统。 -想用就用:云端开个 Databend Cloud,本地跑个 Docker,或者直接 `pip install databend`,都是一套代码,直接读写你的对象存储。 +想上手随时可行:可以直接开通 Databend Cloud,也能本地用 Docker 自建,甚至 `pip install databend` 嵌到现有工程——不论入口如何,运行的都是同一份内核。 ' en= @@ -32,33 +32,33 @@ Explore the engine on [**GitHub**](https://github.com/databendlabs/databend). La -**以下是一些您可能感兴趣的入门主题** +**推荐先从这些主题开始了解** **快速上手** -- **[快速开始](/guides/deploy/quickstart)**: 使用 Docker 快速启动 Databend 并加载示例数据。 -- **[Databend Cloud](/guides/cloud)**: 启动无服务器仓库并管理您的组织。 -- **[连接到 Databend](/guides/sql-clients)**: 使用各种 SQL 客户端和编程语言进行连接。 -- **[SQL 参考](/sql)**: 浏览 Databend SQL 命令、函数和语法。 +- **[快速开始](/guides/deploy/quickstart)**: 用 Docker 几分钟内启动 Databend,并加载示例数据。 +- **[Databend Cloud](/guides/cloud)**: 创建无服务器仓库,集中管理组织与资源。 +- **[连接到 Databend](/guides/sql-clients)**: 通过常见 SQL 客户端或编程语言接入 Databend。 +- **[SQL 参考](/sql)**: 查询 Databend 支持的 SQL 命令、函数与语法。 **数据处理** -- **[数据加载](/guides/load-data)**: 将各种来源的数据导入 Databend。 -- **[数据卸载](/guides/unload-data)**: 将 Databend 中的数据导出为不同格式。 -- **[半结构化数据](/sql/sql-functions/semi-structured-functions)**: 使用 VARIANT 类型处理 JSON、数组和嵌套数据。 +- **[数据加载](/guides/load-data)**: 把不同来源的数据导入 Databend。 +- **[数据卸载](/guides/unload-data)**: 将 Databend 数据导出为所需格式。 +- **[半结构化数据](/sql/sql-functions/semi-structured-functions)**: 借助 VARIANT 处理 JSON、数组与嵌套结构。 -**统一工作负载** -- **[SQL 分析指南](/guides/query/sql-analytics)**: 用于分析、搜索、向量和地理工作负载的共享会话表。 -- **[JSON 与搜索指南](/guides/query/json-search)**: 使用倒排索引和 Lucene 风格的 `QUERY` 查询 VARIANT 数据。 -- **[向量数据库指南](/guides/query/vector-db)**: 在 Databend 中存储嵌入向量并运行语义相似度搜索。 -- **[地理分析指南](/guides/query/geo-analytics)**: 使用地理空间 SQL 绘制事件地图以获得实时洞察。 -- **[湖仓 ETL 指南](/guides/query/lakehouse-etl)**: 将对象存储文件流式传输到托管表中,无需数据孤岛。 +**统一引擎场景** +- **[SQL 分析指南](/guides/query/sql-analytics)**: 用同一套引擎支撑分析、搜索、向量与地理任务。 +- **[JSON 与搜索指南](/guides/query/json-search)**: 依托倒排索引和 Elasticsearch 风格 `QUERY` 检索 VARIANT 载荷。 +- **[向量数据库指南](/guides/query/vector-db)**: 在 Databend 内存储嵌入并完成语义相似检索。 +- **[地理分析指南](/guides/query/geo-analytics)**: 借助地理空间 SQL 绘制事件地图,实时定位热点。 +- **[湖仓 ETL 指南](/guides/query/lakehouse-etl)**: 将对象存储文件流式写入托管表,杜绝数据孤岛。 **性能与扩展** -- **[性能优化](/guides/performance)**: 通过各种策略提升查询性能。 -- **[基准测试](/guides/benchmark)**: 将 Databend 的性能与其他数据仓库进行比较。 -- **[数据湖仓](/sql/sql-reference/table-engines)**: 与 Hive、Iceberg 和 Delta Lake 无缝集成。 +- **[性能优化](/guides/performance)**: 结合多种策略加速查询与计算。 +- **[基准测试](/guides/benchmark)**: 了解 Databend 与其他数据仓库的性能对比。 +- **[数据湖仓](/sql/sql-reference/table-engines)**: 与 Hive、Iceberg、Delta Lake 无缝协作。 **社区与支持** -- **[加入 Slack](https://link.databend.com/join-slack)**: 与 Databend 社区和核心工程师交流。 -- **[文档问题](https://github.com/databendlabs/databend-docs/issues)**: 报告问题或请求新内容。 -- **[路线图](https://github.com/databendlabs/databend/issues/14167)**: 跟踪即将推出的功能并分享反馈。 -- **[邮件联系](mailto:hi@databend.com)**: 需要帮助时直接联系团队。 +- **[加入 Slack](https://link.databend.com/join-slack)**: 与社区成员及核心工程师直接交流。 +- **[文档问题](https://github.com/databendlabs/databend-docs/issues)**: 反馈文档缺失或提交改进建议。 +- **[路线图](https://github.com/databendlabs/databend/issues/14167)**: 跟踪即将发布的功能并留下意见。 +- **[邮件联系](mailto:hi@databend.com)**: 需要即时协助时写信给我们。 diff --git a/docs/cn/guides/51-ai-functions/_category_.json b/docs/cn/guides/51-ai-functions/_category_.json index 6293e87178..11f9346364 100644 --- a/docs/cn/guides/51-ai-functions/_category_.json +++ b/docs/cn/guides/51-ai-functions/_category_.json @@ -1,3 +1,3 @@ { - "label": "Databend 人工智能(AI)与机器学习(ML)" -} \ No newline at end of file + "label": "Databend AI" +} diff --git a/docs/cn/guides/54-query/00-sql-analytics.md b/docs/cn/guides/54-query/00-sql-analytics.md index 64afb2a6fa..8aa1eb616b 100644 --- a/docs/cn/guides/54-query/00-sql-analytics.md +++ b/docs/cn/guides/54-query/00-sql-analytics.md @@ -1,277 +1,202 @@ --- -title: SQL 分析(SQL Analytics) +title: SQL 分析 --- -> **场景(Scenario):** EverDrive Smart Vision 的分析师整理了一组共享的驾驶会话(drive sessions)和关键帧(key frames),使每个下游工作负载都能查询相同的 ID,而无需在系统之间复制数据。 +> **场景:** CityDrive 会把所有行车视频写入共享的关系表,分析师因此可以在同一批 `video_id` / `frame_id` 上做过滤、连接与聚合,供后续的 JSON、向量、地理和 ETL 负载共用。 -本教程将构建一个微型的 **EverDrive Smart Vision** 数据集,并展示 Databend 的单一查询优化器(Query Optimizer)如何在其余指南中发挥作用。您在此处创建的每个 ID(`SES-20240801-SEA01`、`FRAME-0001` …)都会重新出现在 JSON、向量、地理和 ETL 演练中,形成一致的自动驾驶故事。 +本演练建模了 CityDrive 编目中的关系层,并串起常见的 SQL 积木。这里出现的示例 ID 会在其余指南中再次用到。 -## 1. 创建示例表 -两张表分别记录测试会话和从行车记录仪视频中提取的重要帧。 +## 1. 创建基础表 +`citydrive_videos` 保存视频级元数据,而 `frame_events` 记录每段视频里抽出的关键帧。 ```sql -CREATE OR REPLACE TABLE drive_sessions ( - session_id VARCHAR, - vehicle_id VARCHAR, - route_name VARCHAR, - start_time TIMESTAMP, - end_time TIMESTAMP, - weather VARCHAR, - camera_setup VARCHAR +CREATE OR REPLACE TABLE citydrive_videos ( + video_id STRING, + vehicle_id STRING, + capture_date DATE, + route_name STRING, + weather STRING, + camera_source STRING, + duration_sec INT ); CREATE OR REPLACE TABLE frame_events ( - frame_id VARCHAR, - session_id VARCHAR, - frame_index INT, - captured_at TIMESTAMP, - event_type VARCHAR, - risk_score DOUBLE + frame_id STRING, + video_id STRING, + frame_index INT, + collected_at TIMESTAMP, + event_tag STRING, + risk_score DOUBLE, + speed_kmh DOUBLE ); -INSERT INTO drive_sessions VALUES - ('SES-20240801-SEA01', 'VEH-01', 'Seattle → Bellevue → Seattle', '2024-08-01 09:00', '2024-08-01 10:10', 'Sunny', 'Dual 1080p'), - ('SES-20240802-SEA02', 'VEH-02', 'Downtown Night Loop', '2024-08-02 20:15', '2024-08-02 21:05', 'Light Rain','Night Vision'), - ('SES-20240803-SEA03', 'VEH-03', 'Harbor Industrial Route', '2024-08-03 14:05', '2024-08-03 15:30', 'Overcast', 'Thermal + RGB'); +INSERT INTO citydrive_videos VALUES + ('VID-20250101-001', 'VEH-21', '2025-01-01', 'Downtown Loop', 'Rain', 'roof_cam', 3580), + ('VID-20250101-002', 'VEH-05', '2025-01-01', 'Port Perimeter', 'Overcast', 'front_cam',4020), + ('VID-20250102-001', 'VEH-21', '2025-01-02', 'Airport Connector', 'Clear', 'front_cam',3655), + ('VID-20250103-001', 'VEH-11', '2025-01-03', 'CBD Night Sweep', 'LightFog', 'rear_cam', 3310); INSERT INTO frame_events VALUES - ('FRAME-0001', 'SES-20240801-SEA01', 120, '2024-08-01 09:32:15', 'SuddenBrake', 0.82), - ('FRAME-0002', 'SES-20240801-SEA01', 342, '2024-08-01 09:48:03', 'CrosswalkPedestrian', 0.67), - ('FRAME-0003', 'SES-20240802-SEA02', 88, '2024-08-02 20:29:41', 'NightLowVisibility', 0.59), - ('FRAME-0004', 'SES-20240802-SEA02', 214, '2024-08-02 20:48:12', 'EmergencyVehicle', 0.73), - ('FRAME-0005', 'SES-20240803-SEA03', 305, '2024-08-03 15:02:44', 'CyclistOvertake', 0.64); + ('FRAME-0101', 'VID-20250101-001', 125, '2025-01-01 08:15:21', 'hard_brake', 0.81, 32.4), + ('FRAME-0102', 'VID-20250101-001', 416, '2025-01-01 08:33:54', 'pedestrian', 0.67, 24.8), + ('FRAME-0201', 'VID-20250101-002', 298, '2025-01-01 11:12:02', 'lane_merge', 0.74, 48.1), + ('FRAME-0301', 'VID-20250102-001', 188, '2025-01-02 09:44:18', 'hard_brake', 0.59, 52.6), + ('FRAME-0401', 'VID-20250103-001', 522, '2025-01-03 21:18:07', 'night_lowlight', 0.63, 38.9); ``` -> 需要回顾表 DDL?请参阅 [CREATE TABLE](/sql/sql-commands/ddl/table/ddl-create-table)。 +文档:[CREATE TABLE](/sql/sql-commands/ddl/table/ddl-create-table)、[INSERT](/sql/sql-commands/dml/dml-insert)。 --- -## 2. 过滤最近会话 -让分析聚焦在最新的驾驶记录上。 +## 2. 只看最新车次 +把调查范围控制在最近 3 天的导航路线。 ```sql -WITH recent_sessions AS ( - SELECT * - FROM drive_sessions - WHERE start_time >= DATEADD('day', -7, CURRENT_TIMESTAMP) +WITH recent_videos AS ( + SELECT * + FROM citydrive_videos + WHERE capture_date >= DATEADD('day', -3, TODAY()) ) -SELECT * -FROM recent_sessions -ORDER BY start_time DESC; +SELECT v.video_id, + v.route_name, + v.weather, + COUNT(f.frame_id) AS flagged_frames +FROM recent_videos v +LEFT JOIN frame_events f USING (video_id) +GROUP BY v.video_id, v.route_name, v.weather +ORDER BY flagged_frames DESC; ``` -尽早过滤可加快后续连接(JOIN)与聚合(GROUP BY)。文档:[WHERE & CASE](/sql/sql-commands/query-syntax/query-select#where-clause)。 +文档:[DATEADD](/sql/sql-functions/datetime-functions/date-add)、[GROUP BY](/sql/sql-commands/query-syntax/query-select#group-by-clause)。 --- -## 3. 连接(JOIN) -### INNER JOIN ... USING -合并会话元数据与帧级事件。 - +## 3. 常见 JOIN 模式 +### INNER JOIN:取帧上下文 ```sql -WITH recent_events AS ( - SELECT * - FROM frame_events - WHERE captured_at >= DATEADD('day', -7, CURRENT_TIMESTAMP) -) -SELECT e.frame_id, - e.captured_at, - e.event_type, - e.risk_score, - s.vehicle_id, - s.route_name, - s.weather -FROM recent_events e -JOIN drive_sessions s USING (session_id) -ORDER BY e.captured_at; +SELECT f.frame_id, + f.event_tag, + f.risk_score, + v.route_name, + v.camera_source +FROM frame_events AS f +JOIN citydrive_videos AS v USING (video_id) +ORDER BY f.collected_at; ``` -### NOT EXISTS(反连接/Anti Join) -查找缺少会话元数据的事件。 - +### NOT EXISTS:做 QA ```sql SELECT frame_id -FROM frame_events e +FROM frame_events f WHERE NOT EXISTS ( - SELECT 1 - FROM drive_sessions s - WHERE s.session_id = e.session_id + SELECT 1 + FROM citydrive_videos v + WHERE v.video_id = f.video_id ); ``` -### LATERAL FLATTEN(JSON 展开/Unnest) -将事件与 JSON 载荷中的检测对象合并。 - +### LATERAL FLATTEN:展开 JSON 检测 ```sql -SELECT e.frame_id, - obj.value['type']::STRING AS object_type -FROM frame_events e -JOIN frame_payloads p USING (frame_id), - LATERAL FLATTEN(p.payload['objects']) AS obj; +SELECT f.frame_id, + obj.value['type']::STRING AS detected_type, + obj.value['confidence']::DOUBLE AS confidence +FROM frame_events AS f +JOIN frame_payloads AS p ON f.frame_id = p.frame_id, + LATERAL FLATTEN(input => p.payload['objects']) AS obj +WHERE f.event_tag = 'pedestrian' +ORDER BY confidence DESC; ``` -更多模式:[JOIN 参考](/sql/sql-commands/query-syntax/query-join)。 +文档:[JOIN](/sql/sql-commands/query-syntax/query-join)、[FLATTEN](/sql/sql-functions/table-functions/flatten)。 --- -## 4. 分组(GROUP BY) -### GROUP BY route_name, event_type -标准 `GROUP BY` 比较路线与事件类型。 - +## 4. 车队 KPI 聚合 +### 分路线的行为统计 ```sql -WITH recent_events AS ( - SELECT * - FROM frame_events - WHERE captured_at >= DATEADD('week', -4, CURRENT_TIMESTAMP) -) -SELECT route_name, - event_type, - COUNT(*) AS event_count, - AVG(risk_score) AS avg_risk -FROM recent_events -JOIN drive_sessions USING (session_id) -GROUP BY route_name, event_type -ORDER BY avg_risk DESC, event_count DESC; +SELECT v.route_name, + f.event_tag, + COUNT(*) AS occurrences, + AVG(f.risk_score) AS avg_risk +FROM frame_events f +JOIN citydrive_videos v USING (video_id) +GROUP BY v.route_name, f.event_tag +ORDER BY avg_risk DESC, occurrences DESC; ``` -### GROUP BY ROLLUP -增加路线小计及总计。 - +### ROLLUP 总计 ```sql -SELECT route_name, - event_type, - COUNT(*) AS event_count, - AVG(risk_score) AS avg_risk -FROM frame_events -JOIN drive_sessions USING (session_id) -GROUP BY ROLLUP(route_name, event_type) -ORDER BY route_name NULLS LAST, event_type; +SELECT v.route_name, + f.event_tag, + COUNT(*) AS occurrences +FROM frame_events f +JOIN citydrive_videos v USING (video_id) +GROUP BY ROLLUP(v.route_name, f.event_tag) +ORDER BY v.route_name NULLS LAST, f.event_tag; ``` -### GROUP BY CUBE -生成路线与事件类型的所有组合。 - +### CUBE:路线 × 天气 覆盖 ```sql -SELECT route_name, - event_type, - COUNT(*) AS event_count, - AVG(risk_score) AS avg_risk -FROM frame_events -JOIN drive_sessions USING (session_id) -GROUP BY CUBE(route_name, event_type) -ORDER BY route_name NULLS LAST, event_type; +SELECT v.route_name, + v.weather, + COUNT(DISTINCT v.video_id) AS videos +FROM citydrive_videos v +GROUP BY CUBE(v.route_name, v.weather) +ORDER BY v.route_name NULLS LAST, v.weather NULLS LAST; ``` --- -## 5. 窗口函数(WINDOW FUNCTION) -### SUM(...) OVER(运行总计/running total) -用运行 `SUM` 跟踪每次驾驶的累积风险。 - +## 5. 窗口函数 +### 单次视频的风险累计 ```sql -WITH session_event_scores AS ( - SELECT session_id, - captured_at, - risk_score - FROM frame_events +WITH ordered_events AS ( + SELECT video_id, collected_at, risk_score + FROM frame_events ) -SELECT session_id, - captured_at, +SELECT video_id, + collected_at, risk_score, SUM(risk_score) OVER ( - PARTITION BY session_id - ORDER BY captured_at + PARTITION BY video_id + ORDER BY collected_at ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS cumulative_risk -FROM session_event_scores -ORDER BY session_id, captured_at; +FROM ordered_events +ORDER BY video_id, collected_at; ``` -### AVG(...) OVER(移动平均/moving average) -显示最近三个事件的风险移动平均: - +### 帧级滑动平均 ```sql -WITH session_event_scores AS ( - SELECT session_id, - captured_at, - risk_score - FROM frame_events -) -SELECT session_id, - captured_at, +SELECT video_id, + frame_id, + frame_index, risk_score, AVG(risk_score) OVER ( - PARTITION BY session_id - ORDER BY captured_at + PARTITION BY video_id + ORDER BY frame_index ROWS BETWEEN 3 PRECEDING AND CURRENT ROW - ) AS moving_avg_risk -FROM session_event_scores -ORDER BY session_id, captured_at; + ) AS rolling_avg_risk +FROM frame_events +ORDER BY video_id, frame_index; ``` -窗口函数(Window Functions)让你以内联方式表达滚动总计或平均值。完整列表:[窗口函数(Window Functions)](/sql/sql-functions/window-functions)。 +窗口函数可以在 SQL 中直接表达滚动求和或滑动平均。完整列表见:[窗口函数](/sql/sql-functions/window-functions)。 --- -## 6. 聚合索引加速(Aggregating Index Acceleration) -用[聚合索引(Aggregating Index)](/guides/performance/aggregating-index)缓存繁重汇总,让仪表盘保持秒级响应。 +## 6. 聚合索引提速 +使用 [Aggregating Index](/guides/performance/aggregating-index) 缓存高频汇总,让仪表盘查询避开全表扫描。 ```sql -CREATE OR REPLACE AGGREGATING INDEX idx_route_event_summary ON frame_events +CREATE OR REPLACE AGGREGATING INDEX idx_video_event_summary AS -SELECT session_id, - event_type, +SELECT video_id, + event_tag, COUNT(*) AS event_count, AVG(risk_score) AS avg_risk FROM frame_events -GROUP BY session_id, event_type; +GROUP BY video_id, event_tag; ``` -再次运行相同的汇总查询——优化器将自动命中索引: - -```sql -SELECT s.route_name, - e.event_type, - COUNT(*) AS event_count, - AVG(e.risk_score) AS avg_risk -FROM frame_events e -JOIN drive_sessions s USING (session_id) -WHERE s.start_time >= DATEADD('week', -8, CURRENT_TIMESTAMP) -GROUP BY s.route_name, e.event_type -ORDER BY avg_risk DESC; -``` - -`EXPLAIN` 该语句可看到 `AggregatingIndex` 节点而非全表扫描。Databend 在新帧到达时自动刷新索引,无需额外 ETL 即可实现亚秒级仪表盘体验。 - ---- - -## 7. 存储过程自动化(Stored Procedure Automation) -将报告逻辑封装到存储过程(Stored Procedure)中,确保在定时任务中按预期执行。 - -```sql -CREATE OR REPLACE PROCEDURE generate_weekly_route_report(days_back INT) -RETURNS TABLE(route_name VARCHAR, event_count BIGINT, avg_risk DOUBLE) -LANGUAGE SQL -AS -$$ -BEGIN - RETURN TABLE ( - SELECT s.route_name, - COUNT(*) AS event_count, - AVG(e.risk_score) AS avg_risk - FROM frame_events e - JOIN drive_sessions s USING (session_id) - WHERE e.captured_at >= DATEADD('day', -days_back, CURRENT_TIMESTAMP) - GROUP BY s.route_name - ); -END; -$$; - -CALL PROCEDURE generate_weekly_route_report(28); -``` - -返回的结果集可直接用于笔记本、ETL 任务或自动告警。了解更多:[存储过程脚本(Stored Procedure Scripting)](/sql/stored-procedure-scripting)。 - ---- - -至此,您已拥有完整闭环:摄取会话数据、过滤、连接、聚合、加速重查询、趋势分析并发布。只需替换过滤条件或连接方式,即可将同一套方案应用于驾驶员评分、传感器退化或算法对比等其他智能驾驶 KPI。 \ No newline at end of file +当你再次运行相同的汇总(如路线事件分布)时,`EXPLAIN` 会显示 `AggregatingIndex` 节点,说明查询已经命中上面的摘要副本。索引会在新的帧写入后自动刷新,无须额外 ETL 即可保持秒级体验。 diff --git a/docs/cn/guides/54-query/01-json-search.md b/docs/cn/guides/54-query/01-json-search.md index 11d1202079..c33105526e 100644 --- a/docs/cn/guides/54-query/01-json-search.md +++ b/docs/cn/guides/54-query/01-json-search.md @@ -1,140 +1,77 @@ --- -title: JSON 与搜索(Search) +title: JSON 与搜索 --- -> **场景(Scenario):** EverDrive Smart Vision 的感知服务会为每个观察到的帧发出 JSON 有效载荷(payloads),安全分析师需要在不将数据移出 Databend 的情况下搜索检测结果。 +> **场景:** CityDrive 会为每个抽取出来的帧附带一份 JSON 元数据,并希望直接在 Databend 内用 Elasticsearch 风格的过滤语法完成检索,而不用把数据复制到别的系统。 -EverDrive 的感知 Pipeline(流水线)会发出 JSON 有效载荷,我们可以使用 Elasticsearch 风格的语法进行查询。通过将有效载荷存储为 VARIANT 类型并在创建表时声明倒排索引(inverted index),Databend 允许您直接在数据上运行 Lucene 的 `QUERY` 过滤器。 +Databend 可以在同一仓库里托管多模态信号:VARIANT 列支持倒排索引,位图表刻画标签覆盖率,向量索引用于相似度查询,原生 GEOMETRY 列提供空间过滤。 -## 1. 创建示例表 -每个帧都携带着来自感知模型(边界框、速度、分类)的结构化元数据。 +## 1. 创建元数据表 +每个帧保存一份 JSON,有了共同的结构,任意查询都可以复用。 ```sql -CREATE OR REPLACE TABLE frame_payloads ( - frame_id VARCHAR, - run_stage VARCHAR, - payload VARIANT, - logged_at TIMESTAMP, - INVERTED INDEX idx_frame_payloads(payload) -- 声明倒排索引(inverted index) -); - -INSERT INTO frame_payloads VALUES - ('FRAME-0001', 'detection', PARSE_JSON('{ - "objects": [ - {"type":"vehicle","bbox":[545,220,630,380],"confidence":0.94}, - {"type":"pedestrian","bbox":[710,200,765,350],"confidence":0.88} - ], - "ego": {"speed_kmh": 32.5, "accel": -2.1} - }'), '2024-08-01 09:32:16'), - ('FRAME-0002', 'detection', PARSE_JSON('{ - "objects": [ - {"type":"pedestrian","bbox":[620,210,670,360],"confidence":0.91} - ], - "scene": {"lighting":"daytime","weather":"sunny"} - }'), '2024-08-01 09:48:04'), - ('FRAME-0003', 'tracking', PARSE_JSON('{ - "objects": [ - {"type":"vehicle","speed_kmh": 18.0,"distance_m": 6.2}, - {"type":"emergency_vehicle","sirens":true} - ], - "scene": {"lighting":"night","visibility":"low"} - }'), '2024-08-02 20:29:42'); -``` - -## 2. 提取 JSON 路径 -查看有效载荷以确认结构。 - -```sql -SELECT frame_id, - payload['objects'][0]['type']::STRING AS first_object, - payload['ego']['speed_kmh']::DOUBLE AS ego_speed, - payload['scene']['lighting']::STRING AS lighting -FROM frame_payloads -ORDER BY logged_at; +CREATE DATABASE IF NOT EXISTS video_unified_demo; +USE video_unified_demo; + +CREATE OR REPLACE TABLE frame_metadata_catalog ( + doc_id STRING, + meta_json VARIANT, + captured_at TIMESTAMP, + INVERTED INDEX idx_meta_json (meta_json) +) CLUSTER BY (captured_at); ``` -使用 `::STRING` / `::DOUBLE` 进行类型转换(Casting)可以将 JSON 值暴露给常规的 SQL 过滤器。Databend 还通过 `QUERY` 函数支持在此数据之上进行 Elasticsearch 风格的搜索——通过在变体字段前加上列名(例如 `payload.objects.type`)来引用它们。更多提示:[加载半结构化数据](/guides/load-data/load-semistructured/load-ndjson)。 - ---- - -## 3. Elasticsearch 风格的搜索(Search) -`QUERY` 使用 Elasticsearch/Lucene 语法,因此您可以组合布尔逻辑、范围、权重(boosts)和列表。以下是 EverDrive 有效载荷上的几种模式: - -### 数组匹配(Array Match) -查找检测到行人的帧: +> 需要同时管理多模态数据(向量嵌入、GPS 轨迹、标签位图)?可以直接复用 [向量](./02-vector-db.md) 与 [地理](./03-geo-analytics.md) 指南里的建表语句,再同 JSON 结果拼接。 +## 2. 使用 `QUERY()` 的检索模式 +### 数组匹配 ```sql -SELECT frame_id -FROM frame_payloads -WHERE QUERY('payload.objects.type:pedestrian') -ORDER BY logged_at DESC -LIMIT 10; +SELECT doc_id, + captured_at, + meta_json['detections'] AS detections +FROM frame_metadata_catalog +WHERE QUERY('meta_json.detections.objects.type:pedestrian') +ORDER BY captured_at DESC +LIMIT 5; ``` ### 布尔 AND -车辆行驶速度大于 30 km/h **且** 检测到行人: - ```sql -SELECT frame_id, - payload['ego']['speed_kmh']::DOUBLE AS ego_speed -FROM frame_payloads -WHERE QUERY('payload.objects.type:pedestrian AND payload.ego.speed_kmh:[30 TO *]') -ORDER BY ego_speed DESC; +SELECT doc_id, captured_at +FROM frame_metadata_catalog +WHERE QUERY('meta_json.scene.weather_code:rain + AND meta_json.camera.sensor_view:roof') +ORDER BY captured_at; ``` ### 布尔 OR / 列表 -夜间驾驶遇到紧急车辆或骑自行车的人: - ```sql -SELECT frame_id -FROM frame_payloads -WHERE QUERY('payload.scene.lighting:night AND payload.objects.type:(emergency_vehicle OR cyclist)'); +SELECT doc_id, + meta_json['media_meta']['tagging']['labels'] AS labels +FROM frame_metadata_catalog +WHERE QUERY('meta_json.media_meta.tagging.labels:(hard_brake OR swerve OR lane_merge)') +ORDER BY captured_at DESC +LIMIT 10; ``` ### 数值范围 -速度在 10–25 km/h 之间(包含)或严格在 25–40 km/h 之间: - ```sql -SELECT frame_id, - payload['ego']['speed_kmh'] AS speed -FROM frame_payloads -WHERE QUERY('payload.ego.speed_kmh:[10 TO 25] OR payload.ego.speed_kmh:{25 TO 40}') -ORDER BY speed; +SELECT doc_id, + meta_json['vehicle']['speed_kmh']::DOUBLE AS speed +FROM frame_metadata_catalog +WHERE QUERY('meta_json.vehicle.speed_kmh:{30 TO 80}') +ORDER BY speed DESC +LIMIT 10; ``` ### 权重(Boosting) -优先考虑同时出现行人和车辆的帧,但强调行人项: - ```sql -SELECT frame_id, +SELECT doc_id, SCORE() AS relevance -FROM frame_payloads -WHERE QUERY('payload.objects.type:pedestrian^2 AND payload.objects.type:vehicle') +FROM frame_metadata_catalog +WHERE QUERY('meta_json.scene.weather_code:rain AND (meta_json.media_meta.tagging.labels:hard_brake^2 OR meta_json.media_meta.tagging.labels:swerve)') ORDER BY relevance DESC -LIMIT 10; -``` - -请参阅 [搜索函数](/sql/sql-functions/search-functions) 以了解 `QUERY`、`SCORE()` 和相关辅助函数支持的完整 Elasticsearch 语法。 - ---- - -## 4. 交叉引用帧事件 -将查询结果连接回在分析指南中创建的帧级风险评分。 - -```sql -WITH risky_frames AS ( - SELECT frame_id, - payload['ego']['speed_kmh']::DOUBLE AS ego_speed - FROM frame_payloads - WHERE QUERY('payload.objects.type:pedestrian AND payload.ego.speed_kmh:[30 TO *]') -) -SELECT r.frame_id, - e.event_type, - e.risk_score, - r.ego_speed -FROM risky_frames r -JOIN frame_events e USING (frame_id) -ORDER BY e.risk_score DESC; +LIMIT 8; ``` -由于 `frame_id` 在表之间共享,您可以立即从原始有效载荷跳转到精选分析结果。 \ No newline at end of file +`QUERY()` 遵循 Elasticsearch 的语义(布尔逻辑、范围、权重、列表等),`SCORE()` 则暴露检索相关性,方便在 SQL 里直接排序。完整算子列表见:[搜索函数](/sql/sql-functions/search-functions)。 diff --git a/docs/cn/guides/54-query/02-vector-db.md b/docs/cn/guides/54-query/02-vector-db.md index 34e38f4395..5f3e691fe9 100644 --- a/docs/cn/guides/54-query/02-vector-db.md +++ b/docs/cn/guides/54-query/02-vector-db.md @@ -1,95 +1,99 @@ --- -title: 向量搜索(Vector Search) +title: 向量搜索 --- -> **场景:** EverDrive Smart Vision 将紧凑的视觉嵌入(vision embeddings)附加到高风险帧,以便调查团队直接在 Databend 内检索相似场景。 +> **场景:** CityDrive 把每个帧的嵌入直接存放在 Databend,语义相似搜索(“找出和它看起来像的帧”)便可与传统 SQL 分析一同运行,无需再部署独立的向量服务。 -每帧都附带视觉嵌入,感知工程师可借此发现相似情况。本指南演示如何插入这些向量,并在同一 EverDrive ID 上执行语义搜索。 +`frame_embeddings` 表与 `frame_events`、`frame_payloads`、`frame_geo_points` 共用同一批 `frame_id`,让语义检索与常规 SQL 牢牢绑定在一起。 -## 1. 创建示例表 -为便于阅读,示例使用四维向量。生产环境中可保存 CLIP 或自监督模型输出的 512 维或 1536 维嵌入。 +## 1. 准备嵌入表 +生产模型通常输出 512–1536 维,本例使用 512 维方便直接复制到演示集群。 ```sql CREATE OR REPLACE TABLE frame_embeddings ( - frame_id VARCHAR, - session_id VARCHAR, - embedding VECTOR(4), - model_version VARCHAR, - created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, - VECTOR INDEX idx_frame_embeddings(embedding) distance='cosine' + frame_id STRING, + video_id STRING, + sensor_view STRING, + embedding VECTOR(512), + encoder_build STRING, + created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, + VECTOR INDEX idx_frame_embeddings(embedding) distance='cosine' ); INSERT INTO frame_embeddings VALUES - ('FRAME-0001', 'SES-20240801-SEA01', [0.18, 0.42, 0.07, 0.12]::VECTOR(4), 'clip-mini-v1', DEFAULT), - ('FRAME-0002', 'SES-20240801-SEA01', [0.20, 0.38, 0.12, 0.18]::VECTOR(4), 'clip-mini-v1', DEFAULT), - ('FRAME-0003', 'SES-20240802-SEA02', [0.62, 0.55, 0.58, 0.61]::VECTOR(4), 'night-fusion-v2', DEFAULT), - ('FRAME-0004', 'SES-20240802-SEA02', [0.57, 0.49, 0.52, 0.55]::VECTOR(4), 'night-fusion-v2', DEFAULT); + ('FRAME-0101', 'VID-20250101-001', 'roof_cam', RANDOM_VECTOR(512), 'clip-lite-v1', DEFAULT), + ('FRAME-0102', 'VID-20250101-001', 'roof_cam', RANDOM_VECTOR(512), 'clip-lite-v1', DEFAULT), + ('FRAME-0201', 'VID-20250101-002', 'front_cam',RANDOM_VECTOR(512), 'night-fusion-v2', DEFAULT), + ('FRAME-0401', 'VID-20250103-001', 'rear_cam', RANDOM_VECTOR(512), 'night-fusion-v2', DEFAULT); ``` -文档:[向量数据类型(Vector data type)](/sql/sql-reference/data-types/vector) 与 [向量索引(Vector index)](/sql/sql-reference/data-types/vector#vector-indexing)。 +文档:[向量类型](/sql/sql-reference/data-types/vector)、[向量索引](/sql/sql-reference/data-types/vector#vector-indexing)。 --- -## 2. COSINE_DISTANCE 搜索 -查找与 `FRAME-0001` 最相似的帧。 +## 2. 运行余弦搜索 +先取出某一帧的嵌入,再让 HNSW 索引返回最近邻。 ```sql WITH query_embedding AS ( - SELECT embedding - FROM frame_embeddings - WHERE frame_id = 'FRAME-0001' - LIMIT 1 + SELECT embedding + FROM frame_embeddings + WHERE frame_id = 'FRAME-0101' ) SELECT e.frame_id, - e.session_id, - cosine_distance(e.embedding, q.embedding) AS distance -FROM frame_embeddings e -CROSS JOIN query_embedding q + e.video_id, + COSINE_DISTANCE(e.embedding, q.embedding) AS distance +FROM frame_embeddings AS e +CROSS JOIN query_embedding AS q ORDER BY distance LIMIT 3; ``` -余弦距离计算将利用先前创建的 HNSW 索引,优先返回最近邻帧。 +距离越小越相似。即便有数百万帧,`VECTOR INDEX` 也能让响应保持毫秒级。 ---- - -## 3. WHERE 过滤 + 相似度 -结合相似度搜索与传统谓词,缩小结果范围。 +继续叠加传统谓词(如路线、视频、传感器视角),即可在向量比对前后收窄候选集。 ```sql WITH query_embedding AS ( - SELECT embedding - FROM frame_embeddings - WHERE frame_id = 'FRAME-0003' - LIMIT 1 + SELECT embedding + FROM frame_embeddings + WHERE frame_id = 'FRAME-0201' ) SELECT e.frame_id, - cosine_distance(e.embedding, q.embedding) AS distance -FROM frame_embeddings e -CROSS JOIN query_embedding q -WHERE e.session_id = 'SES-20240802-SEA02' -ORDER BY distance; + e.sensor_view, + COSINE_DISTANCE(e.embedding, q.embedding) AS distance +FROM frame_embeddings AS e +CROSS JOIN query_embedding AS q +WHERE e.sensor_view = 'rear_cam' +ORDER BY distance +LIMIT 5; ``` +优化器会在满足 `sensor_view` 过滤的同时继续走向量索引。 + --- -## 4. JOIN 语义 + 风险元数据 -将语义结果与风险评分或检测载荷关联,丰富调查维度。 +## 3. 丰富相似帧 +把 Top-N 相似帧物化,再与 `frame_events` 连接,方便下游分析。 ```sql WITH query_embedding AS ( - SELECT embedding FROM frame_embeddings WHERE frame_id = 'FRAME-0001' LIMIT 1 + SELECT embedding + FROM frame_embeddings + WHERE frame_id = 'FRAME-0102' ), similar_frames AS ( - SELECT frame_id, - cosine_distance(e.embedding, q.embedding) AS distance + SELECT frame_id, + video_id, + COSINE_DISTANCE(e.embedding, q.embedding) AS distance FROM frame_embeddings e CROSS JOIN query_embedding q ORDER BY distance LIMIT 5 ) SELECT sf.frame_id, - fe.event_type, + sf.video_id, + fe.event_tag, fe.risk_score, sf.distance FROM similar_frames sf @@ -97,4 +101,4 @@ LEFT JOIN frame_events fe USING (frame_id) ORDER BY sf.distance; ``` -该混合视图呈现“外观类似 FRAME-0001 且触发高风险事件的帧”。 \ No newline at end of file +嵌入与关系表同库共存,调查人员可以立即从“视觉相似”跳转到“同时伴随 `hard_brake` 标签、特定天气或 JSON 检测”的线索,无需导出数据。 diff --git a/docs/cn/guides/54-query/03-geo-analytics.md b/docs/cn/guides/54-query/03-geo-analytics.md index 239caded11..0ab247ff1d 100644 --- a/docs/cn/guides/54-query/03-geo-analytics.md +++ b/docs/cn/guides/54-query/03-geo-analytics.md @@ -1,93 +1,98 @@ --- -title: 地理空间分析(Geo Analytics) +title: 地理分析 --- -> **场景(Scenario):** EverDrive Smart Vision 会记录每个关键帧的 GPS 坐标,以便运营团队在城市中绘制危险驾驶热点图。 +> **场景:** CityDrive 会为每个被标记的帧记录精准的 GPS 定位以及与信号灯的距离,运营人员可以纯 SQL 回答“事故发生在什么位置?”之类的问题。 -每帧都带有 GPS 坐标,因此我们可以把危险情况映射到整个城市。本指南新增一张地理空间表,并使用相同的 EverDrive 会话 ID 演示空间过滤、多边形和 H3 分桶。 +`frame_geo_points` 与 `signal_contact_points` 同样复用本指南里的 `video_id` / `frame_id`,因此可以在不复制数据的情况下把 SQL 指标延伸到地图视图。 -## 1. 创建示例表 -每条记录表示捕获关键帧时自车(ego vehicle)的位置。将坐标存储为 `GEOMETRY` 类型,即可复用本工作负载中的 `ST_X`、`ST_Y` 和 `HAVERSINE` 等函数。 +## 1. 创建位置表 +如果你已完成 JSON 指南,这些表应该已经存在。下方片段包含表结构以及几条深圳示例数据。 ```sql -CREATE OR REPLACE TABLE drive_geo ( - frame_id VARCHAR, - session_id VARCHAR, - location GEOMETRY, - speed_kmh DOUBLE, - heading_deg DOUBLE +CREATE OR REPLACE TABLE frame_geo_points ( + video_id STRING, + frame_id STRING, + position_wgs84 GEOMETRY, + solution_grade INT, + source_system STRING, + created_at TIMESTAMP ); -INSERT INTO drive_geo VALUES - ('FRAME-0001', 'SES-20240801-SEA01', TO_GEOMETRY('SRID=4326;POINT(-122.3321 47.6062)'), 28.0, 90), - ('FRAME-0002', 'SES-20240801-SEA01', TO_GEOMETRY('SRID=4326;POINT(-122.3131 47.6105)'), 35.4, 120), - ('FRAME-0003', 'SES-20240802-SEA02', TO_GEOMETRY('SRID=4326;POINT(-122.3419 47.6205)'), 18.5, 45), - ('FRAME-0004', 'SES-20240802-SEA02', TO_GEOMETRY('SRID=4326;POINT(-122.3490 47.6138)'), 22.3, 60), - ('FRAME-0005', 'SES-20240803-SEA03', TO_GEOMETRY('SRID=4326;POINT(-122.3610 47.6010)'), 30.1, 210); +INSERT INTO frame_geo_points VALUES + ('VID-20250101-001','FRAME-0101',TO_GEOMETRY('SRID=4326;POINT(114.0579 22.5431)'),104,'fusion_gnss','2025-01-01 08:15:21'), + ('VID-20250101-001','FRAME-0102',TO_GEOMETRY('SRID=4326;POINT(114.0610 22.5460)'),104,'fusion_gnss','2025-01-01 08:33:54'), + ('VID-20250101-002','FRAME-0201',TO_GEOMETRY('SRID=4326;POINT(114.1040 22.5594)'),104,'fusion_gnss','2025-01-01 11:12:02'), + ('VID-20250102-001','FRAME-0301',TO_GEOMETRY('SRID=4326;POINT(114.0822 22.5368)'),104,'fusion_gnss','2025-01-02 09:44:18'), + ('VID-20250103-001','FRAME-0401',TO_GEOMETRY('SRID=4326;POINT(114.1195 22.5443)'),104,'fusion_gnss','2025-01-03 21:18:07'); + +CREATE OR REPLACE TABLE signal_contact_points ( + node_id STRING, + signal_position GEOMETRY, + video_id STRING, + frame_id STRING, + frame_position GEOMETRY, + distance_m DOUBLE, + created_at TIMESTAMP +); ``` 文档:[地理空间数据类型](/sql/sql-reference/data-types/geospatial)。 --- -## 2. ST_DISTANCE 半径过滤 -`ST_DISTANCE` 函数用于测量几何体之间的距离。将帧位置和热点均转换到 Web Mercator(SRID 3857),结果以米为单位,再过滤 500 米以内。 +## 2. 空间过滤 +可计算帧与市中心坐标的距离,或检查它是否落在多边形内部。需要以米为单位时,把坐标投影到 SRID 3857。 ```sql -SELECT g.frame_id, - g.session_id, - e.event_type, - e.risk_score, +SELECT l.frame_id, + l.video_id, + f.event_tag, ST_DISTANCE( - ST_TRANSFORM(g.location, 3857), - ST_TRANSFORM(TO_GEOMETRY('SRID=4326;POINT(-122.3350 47.6080)'), 3857) - ) AS meters_from_hotspot -FROM drive_geo g -JOIN frame_events e USING (frame_id) + ST_TRANSFORM(l.position_wgs84, 3857), + ST_TRANSFORM(TO_GEOMETRY('SRID=4326;POINT(114.0600 22.5450)'), 3857) + ) AS meters_from_hq +FROM frame_geo_points AS l +JOIN frame_events AS f USING (frame_id) WHERE ST_DISTANCE( - ST_TRANSFORM(g.location, 3857), - ST_TRANSFORM(TO_GEOMETRY('SRID=4326;POINT(-122.3350 47.6080)'), 3857) - ) <= 500 -ORDER BY meters_from_hotspot; + ST_TRANSFORM(l.position_wgs84, 3857), + ST_TRANSFORM(TO_GEOMETRY('SRID=4326;POINT(114.0600 22.5450)'), 3857) + ) <= 400 +ORDER BY meters_from_hq; ``` -需要原始几何调试?在投影中加入 `ST_ASTEXT(g.location)`。偏好直接的大圆计算?改用 `HAVERSINE` 函数,它直接操作 `ST_X`/`ST_Y` 坐标。 - ---- - -## 3. ST_CONTAINS 多边形过滤 -检查事件是否发生在划定安全区内(如学校区域)。 +调试时可以输出 `ST_ASTEXT(l.position_wgs84)`,若偏好直接使用球面距离,可改用 [`HAVERSINE`](/sql/sql-functions/geospatial-functions#trigonometric-distance-functions)。 ```sql WITH school_zone AS ( - SELECT TO_GEOMETRY('SRID=4326;POLYGON(( - -122.3415 47.6150, - -122.3300 47.6150, - -122.3300 47.6070, - -122.3415 47.6070, - -122.3415 47.6150 - ))') AS poly + SELECT TO_GEOMETRY('SRID=4326;POLYGON(( + 114.0505 22.5500, + 114.0630 22.5500, + 114.0630 22.5420, + 114.0505 22.5420, + 114.0505 22.5500 + ))') AS poly ) -SELECT g.frame_id, - g.session_id, - e.event_type -FROM drive_geo g -JOIN frame_events e USING (frame_id) +SELECT l.frame_id, + l.video_id, + f.event_tag +FROM frame_geo_points AS l +JOIN frame_events AS f USING (frame_id) CROSS JOIN school_zone -WHERE ST_CONTAINS(poly, g.location); +WHERE ST_CONTAINS(poly, l.position_wgs84); ``` --- -## 4. GEO_TO_H3 热力图 -按六边形单元聚合事件,构建路线热力图。 +## 3. 六边形聚合 +把风险帧聚合进 H3 单元,用于仪表盘或热力图。 ```sql -SELECT GEO_TO_H3(ST_X(location), ST_Y(location), 8) AS h3_cell, +SELECT GEO_TO_H3(ST_X(position_wgs84), ST_Y(position_wgs84), 8) AS h3_cell, COUNT(*) AS frame_count, - AVG(e.risk_score) AS avg_risk -FROM drive_geo -JOIN frame_events e USING (frame_id) + AVG(f.risk_score) AS avg_risk +FROM frame_geo_points AS l +JOIN frame_events AS f USING (frame_id) GROUP BY h3_cell ORDER BY avg_risk DESC; ``` @@ -96,44 +101,56 @@ ORDER BY avg_risk DESC; --- -## 5. ST_DISTANCE + JSON 查询 -将空间距离检查与丰富的检测元数据(来自 JSON 指南)结合,生成精准告警。 +## 4. 交通信号上下文 +连接 `signal_contact_points` 与 `frame_geo_points`,即可验证存量指标或把空间条件与 JSON 搜索联动。 ```sql -WITH near_intersection AS ( - SELECT frame_id - FROM drive_geo - WHERE ST_DISTANCE( - ST_TRANSFORM(location, 3857), - ST_TRANSFORM(TO_GEOMETRY('SRID=4326;POINT(-122.3410 47.6130)'), 3857) - ) <= 200 +SELECT t.node_id, + t.video_id, + t.frame_id, + ST_DISTANCE(t.signal_position, t.frame_position) AS recomputed_distance, + t.distance_m AS stored_distance, + l.source_system +FROM signal_contact_points AS t +JOIN frame_geo_points AS l USING (frame_id) +WHERE t.distance_m < 0.03 -- 不同投影下约等于 30 米 +ORDER BY t.distance_m; +``` + +```sql +WITH near_junction AS ( + SELECT frame_id + FROM frame_geo_points + WHERE ST_DISTANCE( + ST_TRANSFORM(position_wgs84, 3857), + ST_TRANSFORM(TO_GEOMETRY('SRID=4326;POINT(114.0700 22.5400)'), 3857) + ) <= 150 ) -SELECT n.frame_id, - p.payload['objects'][0]['type']::STRING AS first_object, - e.event_type, - e.risk_score -FROM near_intersection n -JOIN frame_payloads p USING (frame_id) -JOIN frame_events e USING (frame_id) -WHERE QUERY('payload.objects.type:pedestrian'); +SELECT f.frame_id, + f.event_tag, + meta.meta_json['media_meta']['tagging']['labels'] AS labels +FROM near_junction nj +JOIN frame_events AS f USING (frame_id) +JOIN frame_metadata_catalog AS meta + ON meta.doc_id = nj.frame_id +WHERE QUERY('meta_json.media_meta.tagging.labels:hard_brake'); ``` -空间过滤器、JSON 运算符与经典 SQL 均可在一句话内完成。 +这类模式可以先按地理范围筛选,再对剩余帧执行 JSON 搜索。 --- -## 6. 创建视图热力图 -将六边形级摘要导出到可视化工具或地图图层。 +## 5. 发布热力视图 +把空间摘要封装成视图,供 BI 或 GIS 工具直接查询。 ```sql -CREATE OR REPLACE VIEW v_route_heatmap AS ( - SELECT GEO_TO_H3(ST_X(location), ST_Y(location), 7) AS h3_cell, - COUNT(*) AS frames, - AVG(e.risk_score) AS avg_risk - FROM drive_geo - JOIN frame_events e USING (frame_id) - GROUP BY h3_cell -); +CREATE OR REPLACE VIEW v_citydrive_geo_heatmap AS +SELECT GEO_TO_H3(ST_X(position_wgs84), ST_Y(position_wgs84), 7) AS h3_cell, + COUNT(*) AS frames, + AVG(f.risk_score) AS avg_risk +FROM frame_geo_points AS l +JOIN frame_events AS f USING (frame_id) +GROUP BY h3_cell; ``` -下游系统可直接查询 `v_route_heatmap`,在地图上渲染风险热点,无需重新处理原始遥测数据。 \ No newline at end of file +同一批 `video_id` 现在既能支撑向量、文本,也能支撑空间查询,调查团队不再需要维护额外的管道。 diff --git a/docs/cn/guides/54-query/04-lakehouse-etl.md b/docs/cn/guides/54-query/04-lakehouse-etl.md index 3012fe00bd..6de3620406 100644 --- a/docs/cn/guides/54-query/04-lakehouse-etl.md +++ b/docs/cn/guides/54-query/04-lakehouse-etl.md @@ -1,186 +1,224 @@ --- -title: 湖仓一体 ETL(Lakehouse ETL) +title: 湖仓 ETL --- -> **场景(Scenario):** EverDrive Smart Vision 的数据工程团队将每次路测批次导出为 Parquet 文件,以便统一工作负载在 Databend 内加载、查询并丰富同一份遥测数据。 +> **场景:** CityDrive 的数据工程团队会把每一批行车录像导出成 Parquet(视频、帧事件、JSON 元数据、嵌入、GPS 轨迹、信号灯距离),希望用一套 COPY 流程将共享表刷新到 Databend。 -EverDrive 的摄取循环非常简单: +加载闭环非常直接: ``` -对象存储导出(例如 Parquet)→ Stage → COPY INTO →(可选)Stream & Task +对象存储 → STAGE → COPY INTO 表 → (可选)STREAMS / TASKS ``` -调整桶路径/凭据(如格式不同,把 Parquet 换成实际格式),然后粘贴下方命令。所有语法均与官方[加载数据指南](/guides/load-data/)一致。 +根据自己的桶路径或格式进行调整,然后直接执行下面的 SQL。语法与[加载数据指南](/guides/load-data/)一致。 --- -## 1. Stage -EverDrive 的数据工程团队每批次导出四个文件——sessions、frame events、detection payloads(含嵌套 JSON 字段)和 frame embeddings——到 S3 桶。本指南以 Parquet 为例,只需修改 `FILE_FORMAT` 即可接入 CSV、JSON 或其他支持的格式。一次性创建命名连接,后续所有 Stage 复用。 +## 1. 创建 Stage +为 CityDrive 导出的桶创建可复用的 Stage。示例使用 Parquet,你可以改成任意受支持的格式。 ```sql -CREATE OR REPLACE CONNECTION everdrive_s3 +CREATE OR REPLACE CONNECTION citydrive_s3 STORAGE_TYPE = 's3' ACCESS_KEY_ID = '' SECRET_ACCESS_KEY = ''; -CREATE OR REPLACE STAGE drive_stage - URL = 's3://everdrive-lakehouse/raw/' - CONNECTION = (CONNECTION_NAME = 'everdrive_s3') +CREATE OR REPLACE STAGE citydrive_stage + URL = 's3://citydrive-lakehouse/raw/' + CONNECTION = (CONNECTION_NAME = 'citydrive_s3') FILE_FORMAT = (TYPE = 'PARQUET'); ``` -更多选项见[创建 Stage](/sql/sql-commands/ddl/stage/ddl-create-stage)。 +> [!IMPORTANT] +> 请把示例中的 AWS 密钥与桶地址替换成真实值,否则 `LIST`、`SELECT ... FROM @citydrive_stage`、`COPY INTO` 都会因为 403/`InvalidAccessKeyId` 失败。 -列出导出文件夹(本示例为 Parquet)确认可见: +快速检查: ```sql -LIST @drive_stage/sessions/; -LIST @drive_stage/frame-events/; -LIST @drive_stage/payloads/; -LIST @drive_stage/embeddings/; +LIST @citydrive_stage/videos/; +LIST @citydrive_stage/frame-events/; +LIST @citydrive_stage/manifests/; +LIST @citydrive_stage/frame-embeddings/; +LIST @citydrive_stage/frame-locations/; +LIST @citydrive_stage/traffic-lights/; ``` --- -## 2. Preview -加载前先查看 Parquet 文件,验证 schema 并抽样。 +## 2. 预览文件 +在装载前对 Stage 做一次 `SELECT`,确认 schema 与样例行。 ```sql SELECT * -FROM @drive_stage/sessions/session_2024_08_16.parquet +FROM @citydrive_stage/videos/capture_date=2025-01-01/videos.parquet LIMIT 5; SELECT * -FROM @drive_stage/frame-events/frame_events_2024_08_16.parquet +FROM @citydrive_stage/frame-events/batch_2025_01_01.parquet LIMIT 5; ``` -按需对 payloads 与 embeddings 重复预览。Databend 会自动使用 Stage 上指定的文件格式。 +Databend 会沿用 Stage 定义的文件格式,因此无需额外参数。 --- -## 3. COPY INTO -将各文件加载到指南用到的表中。通过内联类型转换把输入列映射到表列;下方投影以 Parquet 为例,其他格式同理。 +## 3. COPY INTO 统一表 +每份导出都对应指南里的一张共享表。内联的 `::TYPE` 转换可以保证上下游 schema 一致。 -### Sessions +### `citydrive_videos` ```sql -COPY INTO drive_sessions (session_id, vehicle_id, route_name, start_time, end_time, weather, camera_setup) +COPY INTO citydrive_videos (video_id, vehicle_id, capture_date, route_name, weather, camera_source, duration_sec) FROM ( - SELECT session_id::STRING, + SELECT video_id::STRING, vehicle_id::STRING, + capture_date::DATE, route_name::STRING, - start_time::TIMESTAMP, - end_time::TIMESTAMP, weather::STRING, - camera_setup::STRING - FROM @drive_stage/sessions/ + camera_source::STRING, + duration_sec::INT + FROM @citydrive_stage/videos/ ) FILE_FORMAT = (TYPE = 'PARQUET'); ``` -### Frame Events +### `frame_events` ```sql -COPY INTO frame_events (frame_id, session_id, frame_index, captured_at, event_type, risk_score) +COPY INTO frame_events (frame_id, video_id, frame_index, collected_at, event_tag, risk_score, speed_kmh) FROM ( SELECT frame_id::STRING, - session_id::STRING, + video_id::STRING, frame_index::INT, - captured_at::TIMESTAMP, - event_type::STRING, - risk_score::DOUBLE - FROM @drive_stage/frame-events/ + collected_at::TIMESTAMP, + event_tag::STRING, + risk_score::DOUBLE, + speed_kmh::DOUBLE + FROM @citydrive_stage/frame-events/ ) FILE_FORMAT = (TYPE = 'PARQUET'); ``` -### Detection Payloads -payload 文件含嵌套列(`payload` 列为 JSON 对象)。用相同投影复制到 `frame_payloads` 表。 +### `frame_metadata_catalog` +```sql +COPY INTO frame_metadata_catalog (doc_id, meta_json, captured_at) +FROM ( + SELECT doc_id::STRING, + meta_json::VARIANT, + captured_at::TIMESTAMP + FROM @citydrive_stage/manifests/ +) +FILE_FORMAT = (TYPE = 'PARQUET'); +``` +### `frame_embeddings` ```sql -COPY INTO frame_payloads (frame_id, run_stage, payload, logged_at) +COPY INTO frame_embeddings (frame_id, video_id, sensor_view, embedding, encoder_build, created_at) FROM ( SELECT frame_id::STRING, - run_stage::STRING, - payload, - logged_at::TIMESTAMP - FROM @drive_stage/payloads/ + video_id::STRING, + sensor_view::STRING, + embedding::VECTOR(768), -- 根据实际维度调整 + encoder_build::STRING, + created_at::TIMESTAMP + FROM @citydrive_stage/frame-embeddings/ ) FILE_FORMAT = (TYPE = 'PARQUET'); ``` -### Frame Embeddings +### `frame_geo_points` ```sql -COPY INTO frame_embeddings (frame_id, session_id, embedding, model_version, created_at) +COPY INTO frame_geo_points (video_id, frame_id, position_wgs84, solution_grade, source_system, created_at) FROM ( - SELECT frame_id::STRING, - session_id::STRING, - embedding::VECTOR(4), -- 将 4 替换为实际嵌入维度 - model_version::STRING, + SELECT video_id::STRING, + frame_id::STRING, + position_wgs84::GEOMETRY, + solution_grade::INT, + source_system::STRING, + created_at::TIMESTAMP + FROM @citydrive_stage/frame-locations/ +) +FILE_FORMAT = (TYPE = 'PARQUET'); +``` + +### `signal_contact_points` +```sql +COPY INTO signal_contact_points (node_id, signal_position, video_id, frame_id, frame_position, distance_m, created_at) +FROM ( + SELECT node_id::STRING, + signal_position::GEOMETRY, + video_id::STRING, + frame_id::STRING, + frame_position::GEOMETRY, + distance_m::DOUBLE, created_at::TIMESTAMP - FROM @drive_stage/embeddings/ + FROM @citydrive_stage/traffic-lights/ ) FILE_FORMAT = (TYPE = 'PARQUET'); ``` -下游所有指南(分析/搜索/向量/地理)均可看到本批次数据。 +完成后,SQL 分析、`QUERY()` 搜索、向量相似、地理过滤等所有负载都会读取完全相同的数据。 --- -## 4. Stream(可选) -若希望下游作业在每次 `COPY INTO` 后感知新行,可在关键表(如 `frame_events`)上创建 Stream。用法参考[持续 Pipeline → Stream](/guides/load-data/continuous-data-pipelines/stream)。 +## 4. Streams(可选) +想让下游作业只消费最近一次批量新增的数据?给目标表创建 Stream。 ```sql CREATE OR REPLACE STREAM frame_events_stream ON TABLE frame_events; -SELECT * FROM frame_events_stream; -- 显示上次消费后的新行 +SELECT * FROM frame_events_stream; -- 查看刚 COPY 的新行 +-- …处理… +SELECT * FROM frame_events_stream WITH CONSUME; -- 推进游标 ``` -处理完毕后执行 `CONSUME STREAM frame_events_stream;`(或将行插入另一表)以推进偏移。 +`WITH CONSUME` 会在你处理完行后向前推进 offset。参考:[Streams](/guides/load-data/continuous-data-pipelines/stream)。 --- -## 5. Task(可选) -Task 按调度执行**一条 SQL 语句**。可为每张表创建小 Task(或调用存储过程作为统一入口)。 +## 5. Tasks(可选) +Task 会按计划运行**单条 SQL**。你可以为每张表建一个轻量 Task,或把逻辑写成存储过程后在 Task 中调用。 ```sql -CREATE OR REPLACE TASK task_load_sessions +CREATE OR REPLACE TASK task_load_citydrive_videos WAREHOUSE = 'default' - SCHEDULE = 5 MINUTE + SCHEDULE = 10 MINUTE AS - COPY INTO drive_sessions (session_id, vehicle_id, route_name, start_time, end_time, weather, camera_setup) + COPY INTO citydrive_videos (video_id, vehicle_id, capture_date, route_name, weather, camera_source, duration_sec) FROM ( - SELECT session_id::STRING, + SELECT video_id::STRING, vehicle_id::STRING, + capture_date::DATE, route_name::STRING, - start_time::TIMESTAMP, - end_time::TIMESTAMP, weather::STRING, - camera_setup::STRING - FROM @drive_stage/sessions/ + camera_source::STRING, + duration_sec::INT + FROM @citydrive_stage/videos/ ) FILE_FORMAT = (TYPE = 'PARQUET'); -ALTER TASK task_load_sessions RESUME; +ALTER TASK task_load_citydrive_videos RESUME; CREATE OR REPLACE TASK task_load_frame_events WAREHOUSE = 'default' - SCHEDULE = 5 MINUTE + SCHEDULE = 10 MINUTE AS - COPY INTO frame_events (frame_id, session_id, frame_index, captured_at, event_type, risk_score) + COPY INTO frame_events (frame_id, video_id, frame_index, collected_at, event_tag, risk_score, speed_kmh) FROM ( SELECT frame_id::STRING, - session_id::STRING, + video_id::STRING, frame_index::INT, - captured_at::TIMESTAMP, - event_type::STRING, - risk_score::DOUBLE - FROM @drive_stage/frame-events/ + collected_at::TIMESTAMP, + event_tag::STRING, + risk_score::DOUBLE, + speed_kmh::DOUBLE + FROM @citydrive_stage/frame-events/ ) FILE_FORMAT = (TYPE = 'PARQUET'); ALTER TASK task_load_frame_events RESUME; - --- 对 frame_payloads 与 frame_embeddings 重复即可 ``` -cron 语法、依赖设置与错误处理见[持续 Pipeline → Task](/guides/load-data/continuous-data-pipelines/task)。 \ No newline at end of file +其余表可以按同样模式新增 Task。更多调度/依赖选项见:[Tasks](/guides/load-data/continuous-data-pipelines/task)。 + +--- + +当这些作业运行后,“统一工作负载”系列里的每个指南都读取相同的 CityDrive 表——无需额外 ETL,也不需要重复存储。 diff --git a/docs/cn/guides/54-query/_category_.json b/docs/cn/guides/54-query/_category_.json index eceb721ed2..40762446c3 100644 --- a/docs/cn/guides/54-query/_category_.json +++ b/docs/cn/guides/54-query/_category_.json @@ -1,3 +1,3 @@ { - "label": "统一工作负载(Unified Workloads)" -} \ No newline at end of file + "label": "统一引擎场景" +} diff --git a/docs/cn/guides/54-query/index.md b/docs/cn/guides/54-query/index.md index ab8959fdc9..035b9c9200 100644 --- a/docs/cn/guides/54-query/index.md +++ b/docs/cn/guides/54-query/index.md @@ -1,15 +1,15 @@ --- -title: 统一工作负载 +title: 统一引擎场景 --- -Databend 现已作为统一引擎,支持 SQL 分析、多模态搜索、向量相似度、地理空间分析及持续 ETL。本迷你系列以 **EverDrive 智能视觉** 场景为例(会话 ID 如 `SES-20240801-SEA01`,帧 ID 如 `FRAME-0001`),演示同一数据集如何在不跨系统复制的情况下流经所有工作负载。 +CityDrive Intelligence 会保存每一次行车记录:把整段视频拆成帧,并为每个 `video_id` 写入结构化元数据、JSON 清单、行为标签、向量特征以及 GPS 轨迹。下面这一组指南展示 Databend 如何把这些需求都跑在同一个数仓里,既不需要复制数据,也不用额外搭建搜索或向量集群。 -| 指南 | 涵盖内容 | +| 指南 | 内容摘要 | |-------|----------------| -| [SQL 分析](./00-sql-analytics.md) | 构建共享表、切分会话、添加窗口/聚合加速 | -| [JSON 与搜索](./01-json-search.md) | 存储检测负载并 `QUERY` 风险场景 | -| [向量搜索](./02-vector-db.md) | 保留帧嵌入并查找语义邻居 | -| [地理分析](./03-geo-analytics.md) | 使用 `HAVERSINE`、多边形、H3 映射事件 | -| [湖仓 ETL](./04-lakehouse-etl.md) | 暂存文件、`COPY INTO` 表、可选流/任务 | +| [SQL 分析](./00-sql-analytics.md) | 构建基础表,示范过滤、连接、窗口与聚合索引 | +| [JSON 与搜索](./01-json-search.md) | 加载 `frame_metadata_catalog`,运行 Elasticsearch `QUERY()`,关联位图标签 | +| [向量搜索](./02-vector-db.md) | 保留向量特征,用余弦距离做语义相似度检索,并联动风险指标 | +| [地理分析](./03-geo-analytics.md) | 运用 `GEOMETRY`、距离/多边形过滤以及信号灯关联 | +| [湖仓 ETL](./04-lakehouse-etl.md) | 一次暂存,`COPY INTO` 共享表,并可选配 Streams/Tasks | -按顺序完成即可看到 Databend 的单个查询优化器(Query Optimizer)如何为同一车队数据上的分析、搜索、向量、地理及加载流水线提供支持。 \ No newline at end of file +按顺序体验,即可看到同一批 CityDrive 标识符如何贯穿经典 SQL、全文检索、向量、地理和 ETL,全程由 Databend 的单一执行引擎托管。 diff --git a/docs/cn/sql-reference/20-sql-functions/10-search-functions/index.md b/docs/cn/sql-reference/20-sql-functions/10-search-functions/index.md index ab14db315c..7b884a5769 100644 --- a/docs/cn/sql-reference/20-sql-functions/10-search-functions/index.md +++ b/docs/cn/sql-reference/20-sql-functions/10-search-functions/index.md @@ -23,7 +23,7 @@ CREATE OR REPLACE TABLE frames ( | 函数 | 描述 | 示例 | |----------|-------------|---------| | [MATCH](match) | 对指定列执行相关性排序搜索。 | `MATCH('summary, tags', 'traffic light red')` | -| [QUERY](query) | 解析 Lucene 风格查询表达式,支持嵌套 `VARIANT` 字段。 | `QUERY('meta.signals.traffic_light:red')` | +| [QUERY](query) | 解析 Elasticsearch 风格查询表达式,支持嵌套 `VARIANT` 字段。 | `QUERY('meta.signals.traffic_light:red')` | | [SCORE](score) | 与 `MATCH` 或 `QUERY` 配合使用时,返回当前行的相关性得分。 | `SELECT summary, SCORE() FROM frame_notes WHERE MATCH('summary, tags', 'traffic light red')` | ## 查询语法示例 @@ -89,4 +89,4 @@ SELECT id, meta['frame']['timestamp'] AS ts, SCORE() FROM frames WHERE QUERY('meta.signals.traffic_light:red^1.0 AND meta.tags:urban^2.0') LIMIT 100; -``` \ No newline at end of file +``` diff --git a/docs/cn/sql-reference/20-sql-functions/10-search-functions/query.md b/docs/cn/sql-reference/20-sql-functions/10-search-functions/query.md index 76d86c9663..4cafd84057 100644 --- a/docs/cn/sql-reference/20-sql-functions/10-search-functions/query.md +++ b/docs/cn/sql-reference/20-sql-functions/10-search-functions/query.md @@ -5,7 +5,7 @@ import FunctionDescription from '@site/src/components/FunctionDescription'; -`QUERY` 通过 Lucene 风格查询表达式与具备倒排索引(Inverted Index)的列进行匹配,从而过滤行。使用点记法可导航 `VARIANT` 列中的嵌套字段。该函数仅在 `WHERE` 子句中生效。 +`QUERY` 通过 Elasticsearch 风格查询表达式与具备倒排索引(Inverted Index)的列进行匹配,从而过滤行。使用点记法可导航 `VARIANT` 列中的嵌套字段。该函数仅在 `WHERE` 子句中生效。 :::info Databend 的 QUERY 函数灵感源自 Elasticsearch 的 [QUERY](https://www.elastic.co/guide/en/elasticsearch/reference/current/sql-functions-search.html#sql-functions-search-query)。 @@ -179,4 +179,4 @@ SELECT id, meta['frame']['timestamp'] AS ts FROM frames WHERE QUERY('meta.detections.text:SCHOOL AND meta.scene.time_of_day:day'); -- 返回 id 3 -``` \ No newline at end of file +``` diff --git a/i18n/zh/code.json b/i18n/zh/code.json index 9658929e8a..868e7d774c 100644 --- a/i18n/zh/code.json +++ b/i18n/zh/code.json @@ -27,7 +27,7 @@ "description": "The first paragraph of the 404 page" }, "theme.NotFound.p2": { - "message": "请联系原始链接来源网站的所有者,并告知他们链接已损坏。", + "message": "请联系原始链接来源网站的所有者,并告知他们链接已损坏。", "description": "The 2nd paragraph of the 404 page" }, "theme.admonition.note": { @@ -138,7 +138,7 @@ "description": "The label used to tell the user that he's browsing an unreleased doc version" }, "theme.docs.versions.unmaintainedVersionLabel": { - "message": "此为 {siteTitle} {versionLabel} 版的文档,现已不再积极维护。", + "message": "此为 {siteTitle} {versionLabel} 版的文档,现已不再积极维护。", "description": "The label used to tell the user that he's browsing an unmaintained doc version" }, "theme.docs.versions.latestVersionSuggestionLabel": { @@ -414,7 +414,7 @@ "description": "Thanks for voting!" }, "Did this page help you?": { - "message": "指出文档中的错误或问题,我们将会赠予您专属纪念 T 恤一件!", + "message": "指出文档中的错误或问题,我们将会赠予您专属纪念 T 恤一件!", "description": "Did this page help you?" }, "Explore Databend Cloud for FREE": { @@ -470,7 +470,7 @@ "description": "Cloud Data Analytics" }, "Databend - Your best alternative to Snowflake. Cost-effective and simple for massive-scale analytics.": { - "message": "Databend - 替代 Snowflake 的最佳方案。高性价比且简单易用,适用于大规模数据分析。", + "message": "Databend - 替代 Snowflake 的最佳方案。高性价比且简单易用,适用于大规模数据分析。", "description": "Databend - Your best alternative to Snowflake. Cost-effective and simple for massive-scale analytics." }, "PAGE NOT FOUND": { @@ -478,7 +478,7 @@ "description": "PAGE NOT FOUND" }, "Please check your link or head Home to regroup.": { - "message": "页面地址可能有所变更或者不存在,请检查您的链接或返回到操作指南。", + "message": "页面地址可能有所变更或者不存在,请检查您的链接或返回到操作指南。", "description": "Either you're out of bounds or that page doesn't exist. Please check your link or head Home to regroup." }, "BACK TO HOME": { @@ -522,7 +522,7 @@ "description": "Databend Cloud 部分的描述" }, "Connect to Databend": { - "message": "连接到 Databend", + "message": "连接 Databend", "description": "连接到 Databend 部分的标题" }, "Developer Resources": { @@ -530,11 +530,11 @@ "description": "开发者资源的链接文字" }, "Connect your application to Databend in just a few minutes.": { - "message": "几分钟内即可让您的应用连上 Databend。", + "message": "几分钟内就能让应用接入 Databend。", "description": "连接到 Databend 部分的描述" }, "Load Data into Databend": { - "message": "加载数据到 Databend", + "message": "向 Databend 加载数据", "description": "加载数据到 Databend 部分的标题" }, "Know More": { @@ -542,27 +542,27 @@ "description": "了解更多的链接文字" }, "Bulk import data into Databend(Cloud) in multiple formats.": { - "message": "支持多种格式批量导入数据至 Databend(Cloud)。", + "message": "以多种格式批量把数据导入 Databend(含 Cloud)。", "description": "加载数据到 Databend 部分的描述" }, "AI & BI & Visualization & Notebooks": { - "message": "AI & BI & 可视化 & 笔记本", + "message": "AI · BI · 可视化 · Notebook", "description": "AI & BI & 可视化 & 笔记本 部分的标题" }, "All Tools": { - "message": "所有工具", + "message": "全部工具", "description": "所有工具的链接文字" }, "Databend offers connectors and plugins for integrating with major data import tools, ensuring efficient data synchronization.": { - "message": "Databend 提供丰富的连接器与插件,可与主流数据导入工具无缝集成,保障数据高效同步。", + "message": "Databend 提供主流导入工具的连接器与插件,保障高效同步。", "description": "AI & BI & 可视化 & 笔记本 部分的描述" }, "Continuous Data Pipelines": { - "message": "连续数据管道", + "message": "持续数据管道", "description": "连续数据管道部分的标题" }, "Data pipelines automate the process of moving and changing data from different sources into Databend.": { - "message": "数据管道可自动完成多源数据的迁移、转换与加载到 Databend。", + "message": "数据管道自动完成多源采集、转换并写入 Databend。", "description": "连续数据管道部分的描述" }, "Real-Time CDC Ingestion": { @@ -574,11 +574,11 @@ "description": "自动化数据管道的文本" }, "Additional Informations": { - "message": "更多信息", + "message": "更多资料", "description": "额外信息部分的标题" }, "AI Capabilities": { - "message": "AI 功能", + "message": "AI 能力", "description": "AI 功能的文本" }, "Databend Products": { @@ -590,7 +590,7 @@ "description": "安全的文本" }, "Contact Support": { - "message": "联系客服", + "message": "联系支持团队", "description": "联系支持的文本" }, "Pricing": { @@ -598,15 +598,15 @@ "description": "价格的文本" }, "Use Cases": { - "message": "用户案例", + "message": "典型场景", "description": "Use Cases" }, "Introduction to Databend Products": { - "message": "Databend 产品介绍", + "message": "Databend 产品导览", "description": "Databend 产品介绍部分的标题" }, "Choose the deployment option that best fits your needs and scale.": { - "message": "选择最契合业务需求的部署方式,随需扩展。", + "message": "按业务规模选择最合适的部署方式。", "description": "Databend 产品介绍部分的描述" }, "Databend Cloud": { @@ -614,7 +614,7 @@ "description": "Databend Cloud 产品的标题" }, "Fully-managed cloud service. No setup required.": { - "message": "全托管云服务,开箱即用。", + "message": "全托管云服务,开箱即可使用。", "description": "Databend Cloud 产品的描述" }, "Databend Enterprise": { @@ -622,7 +622,7 @@ "description": "Databend Enterprise 产品的标题" }, "Self-hosted with enterprise features and support.": { - "message": "自主部署,拥有企业级功能与专业支持。", + "message": "自主部署,配备企业级功能与支持。", "description": "Databend Enterprise 产品的描述" }, "Databend Community": { @@ -630,15 +630,15 @@ "description": "Databend 社区版 产品的标题" }, "Open-source and free for all use cases.": { - "message": "开源免费,适用于任何场景。", + "message": "开源且永久免费。", "description": "Databend 社区版 产品的描述" }, "Getting Started": { - "message": "入门指南", + "message": "快速入门", "description": "入门指南部分的标题" }, "Create a Databend Cloud account or deploy your own Databend instance.": { - "message": "注册 Databend Cloud 账户或自主部署 Databend 实例。", + "message": "注册 Databend Cloud 或自行部署实例。", "description": "入门指南部分的描述" }, "Activate Databend Cloud": { @@ -686,7 +686,7 @@ "description": "升级 Databend 的链接文字" }, "Changelog": { - "message": "发布记录", + "message": "更新日志", "description": "Changelog" }, "FAQ": { @@ -694,7 +694,7 @@ "description": "FAQ" }, "Product Features": { - "message": "产品特点", + "message": "产品特性", "description": "Product Features" }, "Unified Engine": { @@ -718,23 +718,23 @@ "description": "Stores all data in object storage." }, "Analytics, vector, search, and geo share one optimizer and runtime.": { - "message": "分析、向量、搜索、地理信息共享统一的查询优化器与执行引擎。", + "message": "分析、向量、搜索与地理能力共用一套优化器和执行引擎。", "description": "Description for unified engine feature" }, "Unified Data": { - "message": "统一数据", + "message": "统一数据层", "description": "Headline for unified data feature" }, "Structured, semi-structured, unstructured, and vector data share object storage.": { - "message": "结构化、半结构化、非结构化及向量数据统一存储于对象存储中。", + "message": "结构化、半结构化、非结构化与向量数据共享同一对象存储。", "description": "Description for unified data feature" }, "Analytics Native": { - "message": "原生分析能力", + "message": "原生分析引擎", "description": "Headline for analytics native feature" }, "ANSI SQL, windowing, incremental aggregates, and streaming power BI.": { - "message": "标准 SQL、窗口函数、增量聚合与流式计算为 BI 分析提供强力支撑。", + "message": "ANSI SQL、窗口函数、增量聚合与流式处理为 BI 持续供能。", "description": "Description for analytics native feature" }, "Vector Native": { @@ -742,7 +742,7 @@ "description": "Headline for vector native feature" }, "Embeddings, vector indexes, and semantic retrieval all run in SQL.": { - "message": "向量嵌入、向量索引与语义检索均可通过 SQL 直接完成。", + "message": "向量嵌入、索引与语义检索全部在 SQL 中完成。", "description": "Description for vector native feature" }, "Search Native": { @@ -750,27 +750,27 @@ "description": "Headline for search native feature" }, "JSON inverted indexes, geo functions, and ranking fuel hybrid maps.": { - "message": "JSON 全文索引、地理函数与排序算法共同驱动混合检索。", + "message": "JSON 倒排索引、地理函数与排序能力共同驱动混合检索。", "description": "Description for search native feature" }, "Unified Deployment": { - "message": "统一部署方式", + "message": "统一部署选择", "description": "Headline for unified deployment feature" }, "Databend runs the same in Cloud, Docker, or `pip install`.": { - "message": "无论云端、Docker 还是 `pip install`,都是同一个 Databend 内核。", + "message": "无论 Cloud、Docker 还是 `pip install`,体验的都是同一个 Databend 引擎。", "description": "Description for unified deployment feature" }, "Start with Databend Cloud": { - "message": "注册 Databend Cloud", + "message": "从 Databend Cloud 起步", "description": "Start with Databend Cloud" }, "Get started in minutes with our fully-managed cloud service. No setup required.": { - "message": "几分钟即可上手我们的全托管云服务,无需任何配置。", + "message": "几分钟即可启用全托管云服务,无需任何额外配置。", "description": "Get started in minutes with our fully-managed cloud service. No setup required." }, "What you need to know:": { - "message": "您需要了解的内容:", + "message": "重点信息:", "description": "What you need to know:" }, "Choose Your Edition": { @@ -778,7 +778,7 @@ "description": "Choose Your Edition" }, "Pricing & Plans": { - "message": "定价与计划", + "message": "价格与套餐", "description": "Pricing & Plans" }, "Using Databend Cloud": { @@ -786,23 +786,23 @@ "description": "Using Databend Cloud" }, "Deploy Your Own Instance": { - "message": "部署您自己的实例", + "message": "自主部署实例", "description": "Deploy Your Own Instance" }, "Install Databend on your infrastructure for complete control and customization.": { - "message": "部署在您自己的基础设施上,实现完全自主可控与深度定制。", + "message": "在自有基础设施上安装 Databend,配置完全可控。", "description": "Install Databend on your infrastructure for complete control and customization." }, "5-Minute Quick Start": { - "message": "5 分钟快速开始", + "message": "5 分钟快速上手", "description": "5-Minute Quick Start" }, "Download & Install": { - "message": "下载与安装", + "message": "下载并安装", "description": "Download & Install" }, "Enterprise Features & Licensing": { - "message": "企业功能与许可", + "message": "企业特性与许可", "description": "Enterprise Features & Licensing" }, "Copy Page": { @@ -810,7 +810,7 @@ "description": "Copy Page" }, "Copy page as Markdown for LLMs": { - "message": "复制为 Markdown 格式,供大语言模型使用", + "message": "复制为 Markdown 格式,供大语言模型使用", "description": "Copy page as Markdown for LLMs" }, "View as Markdown": {