
Druid 37.0.0: EXTERN parquet from S3 fails with java.io.IOException: No such file or directory #19619

Description

Reported by @sadyuri

Minimal reproduction

SELECT *
FROM TABLE(
  EXTERN(
    '{"objectGlob" : "**.parquet", "type":"s3","uris":["s3://bucket/path/file.parquet"]}',
    '{"type":"parquet","binaryAsString":false}',
    '[{"name":"column_a","type":"string"},{"name":"column_b","type":"string"}]'
  )
)
LIMIT 1

Affected Version

37.0.0

Observed

java.io.IOException: No such file or directory

at java.io.File.createTempFile
at org.apache.druid.data.input.InputEntity.fetch
at org.apache.druid.data.input.parquet.ParquetReader

Regression

Works on 36.0.0
Fails on 37.0.0

Full stack trace:

org.apache.druid.error.DruidException: java.io.IOException: No such file or directory
	at org.apache.druid.java.util.common.Either.valueOrThrow(Either.java:106)
	at org.apache.druid.frame.processor.FrameProcessorExecutor$1ExecutorRunnable.runProcessorNow(FrameProcessorExecutor.java:275)
	at org.apache.druid.frame.processor.FrameProcessorExecutor$1ExecutorRunnable.run(FrameProcessorExecutor.java:141)
	at org.apache.druid.msq.exec.WorkerImpl$2$2.run(WorkerImpl.java:929)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at org.apache.druid.query.PrioritizedListenableFutureTask.run(PrioritizedExecutorService.java:259)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:840)

Caused by: java.lang.RuntimeException: java.io.IOException: No such file or directory
	at org.apache.druid.data.input.parquet.ParquetReader.intermediateRowIterator(ParquetReader.java:98)
	at org.apache.druid.data.input.IntermediateRowParsingReader.intermediateRowIteratorWithMetadata(IntermediateRowParsingReader.java:235)
	at org.apache.druid.data.input.IntermediateRowParsingReader.read(IntermediateRowParsingReader.java:49)
	at org.apache.druid.data.input.impl.InputEntityIteratingReader.lambda$read$0(InputEntityIteratingReader.java:76)
	at org.apache.druid.java.util.common.parsers.CloseableIterator$2.findNextIteratorIfNecessary(CloseableIterator.java:82)
	at org.apache.druid.java.util.common.parsers.CloseableIterator$2.hasNext(CloseableIterator.java:93)
	at org.apache.druid.java.util.common.parsers.CloseableIterator$1.hasNext(CloseableIterator.java:42)
	at org.apache.druid.msq.input.external.ExternalSegment$1$1.hasNext(ExternalSegment.java:89)
	at org.apache.druid.java.util.common.guava.BaseSequence.toYielder(BaseSequence.java:70)
	at org.apache.druid.java.util.common.guava.Yielders.each(Yielders.java:32)
	at org.apache.druid.segment.RowWalker.<init>(RowWalker.java:53)
	at org.apache.druid.segment.RowBasedCursorFactory$1.asCursor(RowBasedCursorFactory.java:75)
	at org.apache.druid.msq.querykit.scan.ScanQueryFrameProcessor.runWithSegment(ScanQueryFrameProcessor.java:324)
	at org.apache.druid.msq.querykit.BaseLeafFrameProcessor.runIncrementally(BaseLeafFrameProcessor.java:95)
	at org.apache.druid.msq.querykit.scan.ScanQueryFrameProcessor.runIncrementally(ScanQueryFrameProcessor.java:189)
	at org.apache.druid.msq.counters.CpuTimeAccumulatingFrameProcessor.runIncrementally(CpuTimeAccumulatingFrameProcessor.java:66)
	at org.apache.druid.frame.processor.FrameProcessors$1FrameProcessorWithBaggage.runIncrementally(FrameProcessors.java:72)
	at org.apache.druid.frame.processor.FrameProcessorExecutor$1ExecutorRunnable.runProcessorNow(FrameProcessorExecutor.java:243)
	... 8 more

Caused by: java.io.IOException: No such file or directory
	at java.base/java.io.UnixFileSystem.createFileExclusively(Native Method)
	at java.base/java.io.File.createTempFile(File.java:2170)
	at org.apache.druid.data.input.InputEntity.fetch(InputEntity.java:89)
	at org.apache.druid.data.input.BytesCountingInputEntity.fetch(BytesCountingInputEntity.java:68)
	at org.apache.druid.data.input.parquet.ParquetReader.intermediateRowIterator(ParquetReader.java:86)
	... 25 more
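The innermost cause is thrown by `File.createTempFile`, called from `InputEntity.fetch`, which appears to copy the S3 object to local disk before `ParquetReader` opens it. On Linux, `createFileExclusively` reports `No such file or directory` when the target directory does not exist. One hypothesis, not confirmed in this report, is that the temporary directory handed to `fetch` on the 37.0.0 worker is missing. The sketch below checks whether a temp file can be created in a given directory. The class `TempDirCheck`, the probe prefix, and the choice of directory to test are hypothetical and for illustration only. Which directory Druid actually passes to `fetch` is not identified here.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical diagnostic, not part of Druid: checks whether
// File.createTempFile succeeds in a given directory. That is the call
// failing at the bottom of the reported stack trace.
public class TempDirCheck {

    // Returns true if a temp file can be created (and then deleted) in dir.
    static boolean canCreateTempFile(File dir) {
        try {
            File probe = File.createTempFile("fetch-check-", null, dir);
            // Remove the probe file right away.
            probe.delete();
            return true;
        } catch (IOException e) {
            // On Linux a missing directory typically surfaces as
            // "No such file or directory", matching the reported error.
            System.err.println(dir + ": " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to the JVM temp dir. Pass another path as the first argument
        // (assumption: whichever tmp/task directory the worker is configured with).
        String path = args.length > 0 ? args[0] : System.getProperty("java.io.tmpdir");
        File dir = new File(path);
        System.out.println("isDirectory=" + dir.isDirectory()
                + " createTempFileOk=" + canCreateTempFile(dir));
    }
}
```

Running this on the worker host with the same JVM options as the Druid process can show whether the directory exists there. A result of `isDirectory=false` on 37.0.0 but not on 36.0.0 would support the hypothesis above.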
