Add liblinear benchmark with synthetic + real datasets #682

Open
yans3meta wants to merge 2 commits into facebookresearch:v2-beta from yans3meta:export-D108165488-to-v2-beta

Conversation

@yans3meta

Summary:
Integrate liblinear (NTU's large-scale linear classification library,
https://github.com/cjlin1/liblinear) as a new DCPerf/benchpress benchmark
in the default suite. It builds liblinear's train/predict CLIs from a pinned
upstream release, trains an L2-regularized L2-loss SVC on either a synthetic
arbitrary-size dataset or a real LIBSVM dataset, and reports training time,
derived throughput (instances/sec), and prediction accuracy.
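
As a rough illustration of the timing and derived-throughput step, here is a minimal Python sketch. It is not the PR's `run_liblinear.sh`; the helper names and paths are hypothetical. The `-s`, `-c`, `-B`, and `-q` flags are liblinear's documented `train` options, and `-s 1` (the dual L2-regularized L2-loss SVC solver) is an assumed default, since the PR text does not say which solver number it uses.

```python
import subprocess
import time


def build_train_cmd(train_bin, data_path, model_path, solver=1, cost=1.0, bias=-1.0):
    """Build a liblinear `train` command line (defaults here are assumptions)."""
    return [
        train_bin,
        "-s", str(solver),  # solver type; 1 = L2-regularized L2-loss SVC (dual)
        "-c", str(cost),    # cost parameter C
        "-B", str(bias),    # bias term; a negative value disables it
        "-q",               # quiet mode
        data_path,
        model_path,
    ]


def time_command(cmd):
    """Run a command and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start


def throughput(n_instances, seconds):
    """Derived throughput in instances per second."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return n_instances / seconds
```

With these helpers, throughput would be `throughput(n_train, time_command(build_train_cmd("bin/train", "train.txt", "model")))`, where `n_train` is the number of training instances.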

Changes:

  • New package packages/liblinear/:
    • install_liblinear.sh: installs build deps for CentOS (dnf) and Ubuntu
      (apt) via packages/common/os-distro.sh, clones+builds liblinear at a
      pinned tag into bin/, and downloads the default rcv1.binary dataset.
    • gen_dataset.py: stdlib-only synthetic LIBSVM generator. A random
      ground-truth hyperplane labels samples by sign of the linear score,
      producing linearly-separable +1/-1 data up to configurable label noise.
      Size/shape configurable via --n-samples/--n-features/--density/--noise.
    • run_liblinear.sh: orchestrator (the benchmark path). Generates
      synthetic data or resolves/downloads a real set (rcv1/a9a/ijcnn1/w8a),
      times training, runs predict, and emits machine-parseable liblinear_*
      metric lines. Solver/cost/bias are configurable (-s/-c/-B).
    • cleanup_liblinear.sh, README.md.
  • New parser benchpress/plugins/parsers/liblinear.py + registration in
    parsers/__init__.py. Extracts train_time_sec,
    throughput_instances_per_sec, accuracy_pct, train_instances, with an
    Accuracy = X% fallback.
  • benchmarks.yml: add the liblinear benchmark with install_markers for
    bin/train + bin/predict so install is validated by binary existence.
  • jobs.yml: add liblinear_synthetic (overridable size/solver/cost/bias)
    and liblinear_rcv1 jobs.
  • BUCK: add the parser to :parsers and the test to parsers_tests.
  • tests/test_liblinear_parser.py: parser unit test (full metric block +
    accuracy fallback).
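
The synthetic generator described for `gen_dataset.py` can be sketched roughly as follows. This is a stdlib-only illustration of the stated idea (a random ground-truth hyperplane, labels from the sign of the linear score, optional label flips), not the PR's actual code. The function name, the Gaussian weights, the uniform feature values, and the fixed non-zero count per sample are all assumptions.

```python
import random


def generate_libsvm(n_samples, n_features, density=0.1, noise=0.0, seed=0):
    """Yield LIBSVM-format lines labelled by a random ground-truth hyperplane."""
    rng = random.Random(seed)
    # Ground-truth hyperplane: one Gaussian weight per feature (assumed distribution).
    w = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
    # Number of non-zero features per sample, at least one.
    nnz = max(1, int(round(density * n_features)))
    for _ in range(n_samples):
        # LIBSVM feature indices are 1-based and listed in ascending order.
        idx = sorted(rng.sample(range(1, n_features + 1), nnz))
        # Round values up front so the label matches the values actually written.
        vals = [round(rng.uniform(-1.0, 1.0), 6) for _ in idx]
        score = sum(w[i - 1] * v for i, v in zip(idx, vals))
        label = 1 if score >= 0 else -1
        # Flip the label with probability `noise`.
        if rng.random() < noise:
            label = -label
        feats = " ".join(f"{i}:{v:.6f}" for i, v in zip(idx, vals))
        yield f"{label:+d} {feats}"
```

Writing the output of `generate_libsvm(...)` to a file, one line per sample, gives input in the sparse `label index:value ...` format that liblinear's `train` reads. With `noise=0.0` the data is linearly separable by construction.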

Differential Revision: D108165488
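
For the parser, the change list names the extracted metrics and the `Accuracy = X%` fallback but not the exact metric-line format. The sketch below assumes lines of the form `liblinear_<name>: <value>`; that format, and the function name `parse_metrics`, are guesses for illustration rather than the contents of `benchpress/plugins/parsers/liblinear.py`. The fallback pattern matches the `Accuracy = 86.5% (173/200)` style line that liblinear's `predict` prints.

```python
import re

# Assumed metric-line format: "liblinear_<name>: <number>" (or "=" as separator).
METRIC_RE = re.compile(
    r"^liblinear_(\w+)\s*[:=]\s*([-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)$"
)
# liblinear's predict prints e.g. "Accuracy = 86.5% (173/200)".
ACCURACY_RE = re.compile(r"Accuracy\s*=\s*([0-9.]+)%")


def parse_metrics(lines):
    """Collect liblinear_* metrics, falling back to predict's Accuracy line."""
    metrics = {}
    fallback_accuracy = None
    for line in lines:
        metric = METRIC_RE.match(line.strip())
        if metric:
            metrics[metric.group(1)] = float(metric.group(2))
            continue
        accuracy = ACCURACY_RE.search(line)
        if accuracy:
            fallback_accuracy = float(accuracy.group(1))
    # Use the raw predict output only when no explicit accuracy metric was seen.
    if "accuracy_pct" not in metrics and fallback_accuracy is not None:
        metrics["accuracy_pct"] = fallback_accuracy
    return metrics
```

Keeping the fallback separate from the main pattern means an explicit `liblinear_accuracy_pct` line always wins when both are present.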

Summary:
The `graph500_omp_csr` benchpress job hardcoded SCALE (log2 of graph
vertices) as the literal string `"20"`. Switch to benchpress's standard
`vars`/`args` mechanism so SCALE can be overridden per-run from the CLI
without editing config files:

- `benchpress/config/jobs.yml` and `jobs_internal.yml`:
  args -> `["{scale}"]`, vars -> `["scale=20"]`. Default behavior
  (SCALE=20) is preserved.
- `packages/graph500/README.md` (new): package-level docs covering
  install, default run, custom SCALE via `-i '{"scale":"<N>"}'`, a
  SCALE-to-RAM sizing table, and MPI tuning env vars.
- `README.md`: add a Graph500 row to the "Internal benchmarks" table.
- `tests/test_graph500_vars.py` + `BUCK` target `graph500_vars_tests`:
  unit tests for the var-substitution and dry_run paths, modeled after
  the existing `silo_vars_tests`. ParserFactory.create is patched
  per-test via setUp/tearDown to avoid leaking into other tests.

No changes to `run.sh` were needed, since it already accepts SCALE as a
positional arg.

Usage:
```bash
./benchpress_cli.py run graph500_omp_csr                          # SCALE=20
./benchpress_cli.py run graph500_omp_csr -i '{"scale":"25"}'      # SCALE=25
```
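
To illustrate how a `vars` default and a `-i` JSON override could combine into the final argument list, here is a small Python sketch. It is not benchpress's implementation; `resolve_args` is a hypothetical helper that only mirrors the behavior described above (`args` of `["{scale}"]`, `vars` of `["scale=20"]`).

```python
import json


def resolve_args(args, var_defaults, overrides_json=None):
    """Substitute {name} placeholders in args from defaults plus JSON overrides."""
    # Defaults are "name=value" strings, e.g. "scale=20".
    values = dict(item.split("=", 1) for item in var_defaults)
    if overrides_json:
        # Overrides arrive as a JSON object, e.g. '{"scale":"25"}'.
        values.update(json.loads(overrides_json))
    return [arg.format(**values) for arg in args]


default_args = resolve_args(["{scale}"], ["scale=20"])
custom_args = resolve_args(["{scale}"], ["scale=20"], '{"scale":"25"}')
```

Here `default_args` corresponds to the first command above (SCALE=20) and `custom_args` to the second (SCALE=25).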

Differential Revision: D107278406
meta-cla Bot added the CLA Signed label on Jun 10, 2026
meta-codesync Bot commented Jun 10, 2026

@yans3meta has exported this pull request. If you are a Meta employee, you can view the originating Diff in D108165488.

meta-codesync Bot pushed a commit that referenced this pull request Jun 12, 2026
Summary:
Pull Request resolved: #682

Reviewed By: hasan3050

Differential Revision: D108165488

fbshipit-source-id: d827ac32b0a82c62677c786a7650f2b59215ac49
Labels

CLA Signed, meta-exported