Bump Spark 4.0.2 -> 4.0.3 (fix dlcdn 404) #696

Open
yans3meta wants to merge 1 commit into facebookresearch:v2-beta from yans3meta:export-D109883537-to-v2-beta

@yans3meta

Summary:
install_spark_standalone.sh downloads Spark from dlcdn.apache.org, which only
mirrors the *current* Apache release. Spark 4.0.2 has been superseded and
removed from the mirror, so the install (and the public DCPerf "Spark CI"
GitHub job that runs on every diff) fails:

  + wget https://dlcdn.apache.org/spark/spark-4.0.2/spark-4.0.2-bin-hadoop3.tgz
  ERROR 404: Not Found.
  Exception: Failed to run '.../packages/spark_standalone/install_spark_standalone.sh'
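
The failing URL above follows a fixed pattern, so the candidates for any version can be built mechanically. The sketch below is illustrative only and is not taken from install_spark_standalone.sh: the helper names are hypothetical, and the second URL assumes a fallback to archive.apache.org (which retains superseded releases), something this PR does not add.

```python
# Sketch only: candidate download URLs for a given Spark version.
# DLCDN_BASE matches the mirror named in the summary; ARCHIVE_BASE is an
# assumed fallback that the actual install script does not use.
DLCDN_BASE = "https://dlcdn.apache.org/spark"
ARCHIVE_BASE = "https://archive.apache.org/dist/spark"


def spark_tarball_name(version: str, hadoop_profile: str = "hadoop3") -> str:
    """Return the tarball file name, e.g. spark-4.0.3-bin-hadoop3.tgz."""
    return f"spark-{version}-bin-{hadoop_profile}.tgz"


def candidate_urls(version: str) -> list[str]:
    """Return URLs to try in order: current-release mirror first, then the archive."""
    name = spark_tarball_name(version)
    return [
        f"{DLCDN_BASE}/spark-{version}/{name}",
        f"{ARCHIVE_BASE}/spark-{version}/{name}",
    ]
```

A caller could try each URL in order and stop at the first successful download, so a superseded version would not break the install outright.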

Bump to the current release, 4.0.3, and update every reference to the extracted
directory name (spark-4.0.2-bin-hadoop3 -> spark-4.0.3-bin-hadoop3) so the
download path and all downstream paths stay in sync:

- packages/spark_standalone/install_spark_standalone.sh: download URL, cached
  tarball check, and tar extract.
- benchpress/config/benchmarks.yml: spark-sql binary path.
- packages/spark_standalone/templates/proj_root/scripts/utils.py and
  templates/runner.py: SPARK_DIR / metastore_db runtime paths.
- packages/spark_standalone/README.md: documented metastore_db paths.
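
One way to keep these references in sync in future bumps is to derive them all from a single version constant. The following is a minimal sketch under stated assumptions, not the layout this PR uses: the constant names, the `spark_paths` helper, the `install_root` argument, and the placement of `metastore_db` under the Spark directory are all hypothetical.

```python
from pathlib import PurePosixPath

# Single source of truth (hypothetical constants, not from the repository).
SPARK_VERSION = "4.0.3"
HADOOP_PROFILE = "hadoop3"

# Everything below is derived, so a version bump touches one line.
SPARK_DIRNAME = f"spark-{SPARK_VERSION}-bin-{HADOOP_PROFILE}"
SPARK_TARBALL = f"{SPARK_DIRNAME}.tgz"
SPARK_URL = f"https://dlcdn.apache.org/spark/spark-{SPARK_VERSION}/{SPARK_TARBALL}"


def spark_paths(install_root: str) -> dict[str, str]:
    """Build the downstream paths from the extracted directory name."""
    spark_dir = PurePosixPath(install_root) / SPARK_DIRNAME
    return {
        "spark_dir": str(spark_dir),
        "spark_sql": str(spark_dir / "bin" / "spark-sql"),
        # Assumed location; the real metastore_db path may differ.
        "metastore_db": str(spark_dir / "metastore_db"),
    }
```

For example, `spark_paths("/opt/bench")["spark_sql"]` would evaluate to `/opt/bench/spark-4.0.3-bin-hadoop3/bin/spark-sql`, where `/opt/bench` is a placeholder root.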

Differential Revision: D109883537

meta-cla Bot added the CLA Signed label Jun 26, 2026

meta-codesync Bot commented Jun 26, 2026

@yans3meta has exported this pull request. If you are a Meta employee, you can view the originating Diff in D109883537.

meta-codesync Bot pushed a commit that referenced this pull request Jun 26, 2026
Summary:
Pull Request resolved: #696

Reviewed By: excelle08

Differential Revision: D109883537

fbshipit-source-id: d357d6cb680b078b4e4c415b1c8cd19ce404ce05

Labels

CLA Signed, meta-exported