@@ -264,7 +264,67 @@ python run_benchmark.py \
264264 --output-dir ./results \
265265 --output-prefix h2o
266266```
267+ ---
268+ ## Elasticsearch End-to-End Example using H2O Dataset
269+
270+ ### Steps 1-5
271+ Follow Steps 1-5 of the H2O GroupBy example above unchanged.
267272
273+ ### Step 6 — Launch Arroyo sketch pipeline
274+
275+ ``` bash
276+ python export_to_arroyo.py \
277+ --streaming-config ./configs/h2o_streaming.yaml \
278+ --source-type file \
279+ --input-file ./data/h2o_arroyo.json \
280+ --file-format json \
281+ --ts-format unix_millis \
282+ --pipeline-name h2o_pipeline \
283+ --arroyosketch-dir ~/ASAPQuery/asap-summary-ingest \
284+ --output-dir ./arroyo_outputs
285+ ```
286+
287+ ### Step 7 — Start QueryEngineRust
288+
289+ ``` bash
290+ cd ~/ASAPQuery/asap-query-engine
291+
292+ ./target/release/query_engine_rust \
293+ --kafka-topic sketch_topic \
294+ --input-format json \
295+ --config ~/ASAPQuery/asap-tools/execution-utilities/benchmark/configs/h2o_inference.yaml \
296+ --streaming-config ~/ASAPQuery/asap-tools/execution-utilities/benchmark/configs/h2o_streaming.yaml \
297+ --http-port 8088 --delete-existing-db --log-level DEBUG \
298+ --output-dir ./output --streaming-engine arroyo \
299+ --query-language SQL --lock-strategy per-key \
300+ --prometheus-scrape-interval 1 > /tmp/query_engine.log 2>&1 &
301+ ```
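Because the command above runs in the background with its output redirected to `/tmp/query_engine.log`, a failed start is easy to miss. The following is a minimal sketch of a post-launch check, not part of the ASAPQuery tooling. `is_running` and `log_has_errors` are hypothetical helper names, and the assumption that the engine writes the words `error` or `panic` on failure is a guess about its log format.

```shell
# Hypothetical helpers for checking a backgrounded process; not shipped with ASAPQuery.

# Succeeds if a process with the given PID is still alive.
# `kill -0` sends no signal; it only tests whether the PID can be signalled.
is_running() {
  kill -0 "$1" 2>/dev/null
}

# Succeeds if the log file contains a line mentioning "error" or "panic".
# Assumption: the engine uses one of these words when it fails.
log_has_errors() {
  grep -qiE 'error|panic' "$1"
}

# Possible usage, immediately after the launch command:
#   ENGINE_PID=$!        # PID of the most recent background job
#   sleep 2
#   is_running "$ENGINE_PID" || echo "query engine exited early"
#   log_has_errors /tmp/query_engine.log && tail -n 20 /tmp/query_engine.log
```

Capturing `$!` right after the `&` matters, since it is overwritten by the next background job started in the same shell.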
302+
303+ ### Step 8 — Load data into Elasticsearch (baseline)
304+
305+ ``` bash
306+ python export_to_database.py \
307+ --dataset h2o \
308+ --file-path ./data/G1_1e7_1e2_0_0.csv \
309+ --es-host localhost \
310+ --es-port 9200 \
311+ --es-index h2o_groupby \
312+ --es-api-key your-api-key \
313+ --es-bulk-size 5000
314+ ```
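Before benchmarking, it can help to confirm that the bulk load finished by comparing the index's document count with the number of rows in the CSV. The sketch below uses Elasticsearch's `_count` endpoint with API-key authentication. The helper names (`es_count_url`, `parse_es_count`, `es_doc_count`) are hypothetical, and the default `http` scheme is an assumption; a cluster with TLS enabled would need `https` passed as the last argument.

```shell
# Hypothetical helpers; not part of export_to_database.py.

# Build the _count URL: es_count_url HOST PORT INDEX [SCHEME]
# Assumption: plain http unless a scheme is given as the fourth argument.
es_count_url() {
  printf '%s://%s:%s/%s/_count\n' "${4:-http}" "$1" "$2" "$3"
}

# Extract the integer "count" field from a _count JSON response on stdin.
parse_es_count() {
  sed -n 's/.*"count"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Query the cluster: es_doc_count HOST PORT INDEX API_KEY [SCHEME]
es_doc_count() {
  curl -sS -H "Authorization: ApiKey $4" \
    "$(es_count_url "$1" "$2" "$3" "${5:-http}")" | parse_es_count
}

# Possible usage, with the same values as the load command above:
#   es_doc_count localhost 9200 h2o_groupby your-api-key
```

If the printed number is lower than the CSV's data-row count, the load may still be in progress or some bulk requests may have failed.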
315+
316+ ### Step 9 — Run benchmark
317+
318+ ``` bash
319+ python run_benchmark.py \
320+ --mode asap \
321+ --asap-sql-file ./queries/h2o_asap.sql \
322+ --baseline-sql-file ./queries/h2o_elasticsearch.sql \
323+ --elastic-host localhost \
324+ --elastic-port 9200 \
325+ --elastic-api-key your-api-key \
326+ --output-dir ./results --output-prefix h2o
327+ ```
268328---
269329
270330## Custom Dataset