From 1bd3feee2ba265a7ddfb18e0f44eb22ff3940a78 Mon Sep 17 00:00:00 2001 From: avaamsel Date: Wed, 24 Jun 2026 14:04:59 -0700 Subject: [PATCH 1/6] docs: updated isolation.level description --- docs/ingestion/kafka-ingestion.md | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/docs/ingestion/kafka-ingestion.md b/docs/ingestion/kafka-ingestion.md index 5854d5820764..928d8aa3ff88 100644 --- a/docs/ingestion/kafka-ingestion.md +++ b/docs/ingestion/kafka-ingestion.md @@ -156,9 +156,7 @@ Consumer properties control how a supervisor reads and processes event messages You must include `bootstrap.servers` in consumer properties with a list of Kafka brokers in the format `:,:,...`. In some cases, you may need to retrieve consumer properties at runtime. For example, when `bootstrap.servers` is unknown or not static. -The `isolation.level` property in `consumerProperties` determines how Druid reads messages written transactionally. -With `read_committed`, which is the default in Druid, only committed transactions are read. -If you use older versions of Kafka without transaction support, or you want to read even aborted transactions, set `isolation.level` to `read_uncommitted`. +The `isolation.level` property determines how Druid handles transactional Kafka messages. Although standard Kafka consumers default to `read_uncommitted`, Druid's ingestion engine defaults `read_committed`. This ensures that only finalized data is indexed and aborted transactions are ignored. If you need to use legacy Kafka brokers or don’t want Druid to consume only committed transactions, explicitly set `isolation.level` to `read_uncommitted`. Note that using `read_uncommitted` removes Druid's offset gap check, which requires the message source to ensure the message offsets are continuous. If your Kafka cluster enables consumer group ACLs, you can set `group.id` in `consumerProperties` to override the default auto generated group ID. 
From ad41d9532b8c6c155eb341137d1d4ead51077b46 Mon Sep 17 00:00:00 2001 From: avaamsel Date: Wed, 24 Jun 2026 14:10:20 -0700 Subject: [PATCH 2/6] added yarn serve to scripts --- website/package.json | 1 + 1 file changed, 1 insertion(+) diff --git a/website/package.json b/website/package.json index 5ff9dfdf86ba..0733ccbb9b6a 100644 --- a/website/package.json +++ b/website/package.json @@ -3,6 +3,7 @@ "scripts": { "start": "docusaurus start", "build": "docusaurus build", + "serve": "docusaurus serve", "compile-scss": "sass scss/custom.scss > static/css/custom.css", "link-lint": "node script/link-lint.js", "spellcheck": "mdspell --en-us --ignore-numbers --report '../docs/**/*.md' || (./script/notify-spellcheck-issues && false)", From b84173360e65db12d092c3622224415337bc2261 Mon Sep 17 00:00:00 2001 From: ava nunes Date: Fri, 26 Jun 2026 10:56:17 -0700 Subject: [PATCH 3/6] Update docs/ingestion/kafka-ingestion.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> --- docs/ingestion/kafka-ingestion.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/ingestion/kafka-ingestion.md b/docs/ingestion/kafka-ingestion.md index 928d8aa3ff88..1d8127a767ce 100644 --- a/docs/ingestion/kafka-ingestion.md +++ b/docs/ingestion/kafka-ingestion.md @@ -156,7 +156,7 @@ Consumer properties control how a supervisor reads and processes event messages You must include `bootstrap.servers` in consumer properties with a list of Kafka brokers in the format `:,:,...`. In some cases, you may need to retrieve consumer properties at runtime. For example, when `bootstrap.servers` is unknown or not static. -The `isolation.level` property determines how Druid handles transactional Kafka messages. Although standard Kafka consumers default to `read_uncommitted`, Druid's ingestion engine defaults `read_committed`. This ensures that only finalized data is indexed and aborted transactions are ignored. 
If you need to use legacy Kafka brokers or don’t want Druid to consume only committed transactions, explicitly set `isolation.level` to `read_uncommitted`. Note that using `read_uncommitted` removes Druid's offset gap check, which requires the message source to ensure the message offsets are continuous. +The `isolation.level` property determines how Druid handles transactional Kafka messages. Although standard Kafka consumers default to `read_uncommitted`, Druid's ingestion engine defaults to `read_committed`. This ensures that only finalized data is indexed and aborted transactions are ignored. If you need to use legacy Kafka brokers or don’t want Druid to consume only committed transactions, explicitly set `isolation.level` to `read_uncommitted`. Note that using `read_uncommitted` removes Druid's offset gap check, which requires the message source to ensure the message offsets are continuous. If your Kafka cluster enables consumer group ACLs, you can set `group.id` in `consumerProperties` to override the default auto generated group ID. 
From ce83ca40c7594d1d199d42b89ae98caff9d75062 Mon Sep 17 00:00:00 2001 From: avaamsel Date: Mon, 29 Jun 2026 15:16:44 -0700 Subject: [PATCH 4/6] docs: updated all commands referencing bin/post-index-task --- docs/ingestion/native-batch.md | 2 +- docs/tutorials/tutorial-batch.md | 8 ++++---- docs/tutorials/tutorial-compaction.md | 14 ++++++++++---- docs/tutorials/tutorial-delete-data.md | 6 ++++-- docs/tutorials/tutorial-ingestion-spec.md | 4 +++- docs/tutorials/tutorial-retention.md | 6 ++++-- 6 files changed, 26 insertions(+), 14 deletions(-) diff --git a/docs/ingestion/native-batch.md b/docs/ingestion/native-batch.md index 986d7e977975..96ab037b2126 100644 --- a/docs/ingestion/native-batch.md +++ b/docs/ingestion/native-batch.md @@ -46,7 +46,7 @@ For related information on batch indexing, see: To run either kind of JSON-based batch indexing task, you can: - Use the **Load Data** UI in the web console to define and submit an ingestion spec. -- Define an ingestion spec in JSON based upon the [examples](#parallel-indexing-example) and reference topics for batch indexing. Then POST the ingestion spec to the [Tasks API endpoint](../api-reference/tasks-api.md), `/druid/indexer/v1/task`, the Overlord service. Alternatively, you can use the indexing script included with Druid at `bin/post-index-task`. +- Define an ingestion spec in JSON based upon the [examples](#parallel-indexing-example) and reference topics for batch indexing. Then POST the ingestion spec to the [Tasks API endpoint](../api-reference/tasks-api.md), `/druid/indexer/v1/task`, the Overlord service. 
## Parallel task indexing diff --git a/docs/tutorials/tutorial-batch.md b/docs/tutorials/tutorial-batch.md index e8b729b04a46..765f91d104d3 100644 --- a/docs/tutorials/tutorial-batch.md +++ b/docs/tutorials/tutorial-batch.md @@ -118,14 +118,14 @@ Once the spec is submitted, wait a few moments for the data to load, after which ## Loading data with a spec (via command line) -For convenience, the Druid package includes a batch ingestion helper script at `bin/post-index-task`. - -This script will POST an ingestion task to the Druid Overlord and poll Druid until the data is available for querying. +To load data with a spec, you need to POST an ingestion task to the Druid Overlord and poll Druid until the data is available for querying. Run the following command from Druid package root: ```bash -bin/post-index-task --file quickstart/tutorial/wikipedia-index.json --url http://localhost:8081 +curl -X POST http://localhost:8081/druid/indexer/v1/task \ + -H "Content-Type: application/json" \ + -d @quickstart/tutorial/wikipedia-index.json ``` You should see output like the following: diff --git a/docs/tutorials/tutorial-compaction.md b/docs/tutorials/tutorial-compaction.md index c4a918897ab0..7f8c4cc9bb0e 100644 --- a/docs/tutorials/tutorial-compaction.md +++ b/docs/tutorials/tutorial-compaction.md @@ -45,10 +45,12 @@ This tutorial uses the Wikipedia edits sample data included with the Druid distr To load the initial data, you use an ingestion spec that loads batch data with segment granularity of `HOUR` and creates between one and three segments per hour. You can review the ingestion spec at `quickstart/tutorial/compaction-init-index.json`. 
-Submit the spec as follows to create a datasource called `compaction-tutorial`: +Submit the spec as follows to the Druid Overlord API to create a datasource called `compaction-tutorial`: ```bash -bin/post-index-task --file quickstart/tutorial/compaction-init-index.json --url http://localhost:8081 +curl -X POST http://localhost:8081/druid/indexer/v1/task \ + -H "Content-Type: application/json" \ + -d @quickstart/tutorial/compaction-init-index.json ``` :::info @@ -106,7 +108,9 @@ This datasource only has 39,244 rows. 39,244 is below the default limit of 5,000 Submit the compaction task now: ```bash -bin/post-index-task --file quickstart/tutorial/compaction-keep-granularity.json --url http://localhost:8081 +curl -X POST http://localhost:8081/druid/indexer/v1/task \ + -H "Content-Type: application/json" \ + -d @quickstart/tutorial/compaction-keep-granularity.json ``` After the task finishes, refresh the [segments view](http://localhost:8888/unified-console.html#segments). @@ -172,7 +176,9 @@ Note that `segmentGranularity` is set to `DAY` in this compaction task spec. Submit this task now: ```bash -bin/post-index-task --file quickstart/tutorial/compaction-day-granularity.json --url http://localhost:8081 +curl -X POST http://localhost:8081/druid/indexer/v1/task \ + -H "Content-Type: application/json" \ + -d @quickstart/tutorial/compaction-day-granularity.json ``` It takes some time before the Coordinator marks the old input segments as unused, so you may see an intermediate state with 25 total segments. Eventually, only one DAY granularity segment remains: diff --git a/docs/tutorials/tutorial-delete-data.md b/docs/tutorials/tutorial-delete-data.md index 93173470c4ca..9c176c9cb9a4 100644 --- a/docs/tutorials/tutorial-delete-data.md +++ b/docs/tutorials/tutorial-delete-data.md @@ -34,10 +34,12 @@ This tutorial requires the following: In this tutorial, we will use the Wikipedia edits data, with an indexing spec that creates hourly segments. 
This spec is located at `quickstart/tutorial/deletion-index.json`, and it creates a datasource called `deletion-tutorial`. -Let's load this initial data: +Let's load this initial data by calling the Druid Overlord API: ```bash -bin/post-index-task --file quickstart/tutorial/deletion-index.json --url http://localhost:8081 +curl -X POST http://localhost:8081/druid/indexer/v1/task \ + -H "Content-Type: application/json" \ + -d @quickstart/tutorial/deletion-index.json ``` When the load finishes, open [http://localhost:8888/unified-console.md#datasources](http://localhost:8888/unified-console.html#datasources) in a browser. diff --git a/docs/tutorials/tutorial-ingestion-spec.md b/docs/tutorials/tutorial-ingestion-spec.md index 67324cb19ebf..d26fcc73bc2b 100644 --- a/docs/tutorials/tutorial-ingestion-spec.md +++ b/docs/tutorials/tutorial-ingestion-spec.md @@ -583,7 +583,9 @@ We've finished defining the ingestion spec, it should now look like the followin From the `apache-druid-{{DRUIDVERSION}}` package root, run the following command: ```bash -bin/post-index-task --file quickstart/ingestion-tutorial-index.json --url http://localhost:8081 +curl -X POST http://localhost:8081/druid/indexer/v1/task \ + -H "Content-Type: application/json" \ + -d @quickstart/tutorial/ingestion-tutorial-index.json ``` After the script completes, we will query the data. diff --git a/docs/tutorials/tutorial-retention.md b/docs/tutorials/tutorial-retention.md index 6beca6255b66..2e6381bfe7d0 100644 --- a/docs/tutorials/tutorial-retention.md +++ b/docs/tutorials/tutorial-retention.md @@ -35,10 +35,12 @@ It will also be helpful to have finished [Load a file](../tutorials/tutorial-bat For this tutorial, we'll be using the Wikipedia edits sample data, with an ingestion task spec that will create a separate segment for each hour in the input data. -The ingestion spec can be found at `quickstart/tutorial/retention-index.json`. 
Let's submit that spec, which will create a datasource called `retention-tutorial`: +The ingestion spec can be found at `quickstart/tutorial/retention-index.json`. Let's submit that spec by calling Druid Overlord, which will create a datasource called `retention-tutorial`: ```bash -bin/post-index-task --file quickstart/tutorial/retention-index.json --url http://localhost:8081 +curl -X POST http://localhost:8081/druid/indexer/v1/task \ + -H "Content-Type: application/json" \ + -d @quickstart/tutorial/retention-index.json ``` After the ingestion completes, go to [http://localhost:8888/unified-console.html#datasources](http://localhost:8888/unified-console.html#datasources) in a browser to access the web console's datasource view. From c5ed917d22c059b5c0e7e685fef140c0ba9f6564 Mon Sep 17 00:00:00 2001 From: avaamsel Date: Mon, 29 Jun 2026 15:21:26 -0700 Subject: [PATCH 5/6] cleaned up some instructions around updated commands --- docs/tutorials/tutorial-compaction.md | 2 +- docs/tutorials/tutorial-delete-data.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/tutorials/tutorial-compaction.md b/docs/tutorials/tutorial-compaction.md index 7f8c4cc9bb0e..f25b5f866b3b 100644 --- a/docs/tutorials/tutorial-compaction.md +++ b/docs/tutorials/tutorial-compaction.md @@ -173,7 +173,7 @@ The Druid distribution includes a compaction task spec to create `DAY` granulari Note that `segmentGranularity` is set to `DAY` in this compaction task spec. -Submit this task now: +Now, submit this task: ```bash curl -X POST http://localhost:8081/druid/indexer/v1/task \ diff --git a/docs/tutorials/tutorial-delete-data.md b/docs/tutorials/tutorial-delete-data.md index 9c176c9cb9a4..bb68f1f62960 100644 --- a/docs/tutorials/tutorial-delete-data.md +++ b/docs/tutorials/tutorial-delete-data.md @@ -34,7 +34,7 @@ This tutorial requires the following: In this tutorial, we will use the Wikipedia edits data, with an indexing spec that creates hourly segments. 
This spec is located at `quickstart/tutorial/deletion-index.json`, and it creates a datasource called `deletion-tutorial`. -Let's load this initial data by calling the Druid Overlord API: +Let's load our initial data by calling Druid Overlord: ```bash curl -X POST http://localhost:8081/druid/indexer/v1/task \ From 970498a729c305f894f77952fcfb5dff717d29c8 Mon Sep 17 00:00:00 2001 From: ava nunes Date: Wed, 1 Jul 2026 12:37:11 -0700 Subject: [PATCH 6/6] removed mention of script --- docs/tutorials/tutorial-ingestion-spec.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/tutorials/tutorial-ingestion-spec.md b/docs/tutorials/tutorial-ingestion-spec.md index d26fcc73bc2b..be951e035c1e 100644 --- a/docs/tutorials/tutorial-ingestion-spec.md +++ b/docs/tutorials/tutorial-ingestion-spec.md @@ -580,7 +580,7 @@ We've finished defining the ingestion spec, it should now look like the followin ## Submit the task and query the data -From the `apache-druid-{{DRUIDVERSION}}` package root, run the following command: +From the `apache-druid-{{DRUIDVERSION}}` package root, run the following command to create a datasource called `ingestion-tutorial`: ```bash curl -X POST http://localhost:8081/druid/indexer/v1/task \ @@ -588,7 +588,7 @@ curl -X POST http://localhost:8081/druid/indexer/v1/task \ -d @quickstart/tutorial/ingestion-tutorial-index.json ``` -After the script completes, we will query the data. +After the ingestion completes, we will query the data. In the web console, open a new tab in the **Query** view. 
Run the following query to view the ingested data: @@ -604,4 +604,4 @@ Returns the following: | `2018-01-01T01:02:00.000Z` | `9000` | `18.1` | `2` | `2.2.2.2` | `7000` | `90` | `6` | `1.1.1.1` | `5000` | | `2018-01-01T01:03:00.000Z` | `6000` | `4.3` | `1` | `2.2.2.2` | `7000` | `60` | `6` | `1.1.1.1` | `5000` | | `2018-01-01T02:33:00.000Z` | `30000` | `56.9` | `2` | `8.8.8.8` | `5000` | `300` | `17` | `7.7.7.7` | `4000` | -| `2018-01-01T02:35:00.000Z` | `30000` | `46.3` | `1` | `8.8.8.8` | `5000` | `300` | `17` | `7.7.7.7` | `4000` | \ No newline at end of file +| `2018-01-01T02:35:00.000Z` | `30000` | `46.3` | `1` | `8.8.8.8` | `5000` | `300` | `17` | `7.7.7.7` | `4000` |
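
Reviewer note: the removed `bin/post-index-task` script both submitted the task and waited for it, whereas the `curl` commands in this series only submit. The following is a minimal sketch of how a reader might wait for the task to finish. It assumes the Overlord answers the POST with a body shaped like `{"task":"<id>"}` and exposes task state at `/druid/indexer/v1/task/<id>/status`. The `json_field`, `submit_task`, and `wait_for_task` helpers and the `OVERLORD_URL` variable are hypothetical names introduced here, not part of Druid or of this patch. Note that it waits for task completion only, not for segments to become queryable.

```shell
#!/usr/bin/env bash
# Sketch only: submit an ingestion spec, then poll the task status.
# Assumptions (not from the patch): response shapes described in the note above.

OVERLORD_URL="${OVERLORD_URL:-http://localhost:8081}"

# Extract the last string-valued occurrence of a field from a small JSON text.
# Hypothetical helper; a real JSON parser such as jq would be more robust.
json_field() {
  # $1 = field name, $2 = JSON text
  printf '%s' "$2" | sed -n "s/.*\"$1\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p"
}

# POST an ingestion spec file to the Overlord and print the raw response.
submit_task() {
  # $1 = path to the ingestion spec
  curl -sS -X POST "$OVERLORD_URL/druid/indexer/v1/task" \
    -H "Content-Type: application/json" \
    -d @"$1"
}

# Poll the task status every 5 seconds until it reports SUCCESS or FAILED.
wait_for_task() {
  # $1 = task ID
  local state
  while true; do
    state="$(json_field status "$(curl -sS "$OVERLORD_URL/druid/indexer/v1/task/$1/status")")"
    case "$state" in
      SUCCESS) echo "Task $1 succeeded"; return 0 ;;
      FAILED)  echo "Task $1 failed" >&2; return 1 ;;
      *)       sleep 5 ;;
    esac
  done
}
```

With these functions sourced, a run from the Druid package root might look like `task_id="$(json_field task "$(submit_task quickstart/tutorial/wikipedia-index.json)")" && wait_for_task "$task_id"`.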