Add liblinear benchmark with synthetic + real datasets by yans3meta · Pull Request #682 · facebookresearch/DCPerf

yans3meta · 2026-06-10T17:30:44Z

Summary:
Integrate liblinear (NTU's large-scale linear classification library,
https://github.com/cjlin1/liblinear) as a new DCPerf/benchpress benchmark
in the default suite. It builds liblinear's train/predict CLIs from a pinned
upstream release, trains an L2-regularized L2-loss SVC on either a synthetic
arbitrary-size dataset or a real LIBSVM dataset, and reports training time,
derived throughput (instances/sec), and prediction accuracy.
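A hyperplane-labelled LIBSVM dataset of the kind `gen_dataset.py` produces can be sketched in a few lines of stdlib Python. This is a minimal illustration, not the script itself. The function name `make_dataset`, the Gaussian hyperplane weights, the uniform feature values in [-1, 1], and the fixed number of nonzeros per row (`density * n_features`) are all assumptions.

```python
import random


def make_dataset(n_samples, n_features, density, noise, seed=0):
    """Hypothetical sketch: return (w, lines), a hidden hyperplane and LIBSVM rows."""
    rng = random.Random(seed)
    # Ground-truth hyperplane; clean labels are the sign of w . x.
    w = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
    # Nonzero features per sample, clamped to [1, n_features].
    nnz = min(n_features, max(1, round(density * n_features)))
    lines = []
    for _ in range(n_samples):
        idxs = sorted(rng.sample(range(n_features), nnz))
        vals = [rng.uniform(-1.0, 1.0) for _ in idxs]
        score = sum(w[j] * v for j, v in zip(idxs, vals))
        label = 1 if score >= 0 else -1
        # Always draw, so the random stream does not depend on `noise`.
        if rng.random() < noise:
            label = -label  # label noise: flip with probability `noise`
        # LIBSVM format: "<label> <index>:<value> ...", 1-based ascending indices.
        feats = " ".join(f"{j + 1}:{v:.6g}" for j, v in zip(idxs, vals))
        lines.append(f"{label:+d} {feats}")
    return w, lines


if __name__ == "__main__":
    _, rows = make_dataset(n_samples=3, n_features=10, density=0.3, noise=0.05)
    print("\n".join(rows))
```

Because the noise draw happens for every sample, two runs with the same seed differ only in which labels are flipped, so feature data stays reproducible across noise settings.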

Changes:

- New package `packages/liblinear/`:
  - `install_liblinear.sh`: installs build deps for CentOS (dnf) and Ubuntu
    (apt) via `packages/common/os-distro.sh`, clones and builds liblinear at a
    pinned tag into `bin/`, and downloads the default rcv1.binary dataset.
  - `gen_dataset.py`: stdlib-only synthetic LIBSVM generator. A random
    ground-truth hyperplane labels samples by the sign of the linear score,
    producing linearly separable +1/-1 data up to configurable label noise.
    Size and shape are configurable via `--n-samples`/`--n-features`/`--density`/`--noise`.
  - `run_liblinear.sh`: orchestrator (the benchmark `path`). Generates
    synthetic data or resolves/downloads a real set (rcv1/a9a/ijcnn1/w8a),
    times training, runs predict, and emits machine-parseable `liblinear_*`
    metric lines. Solver, cost, and bias are configurable (`-s`/`-c`/`-B`).
  - `cleanup_liblinear.sh`, `README.md`.
- New parser `benchpress/plugins/parsers/liblinear.py`, registered in
  `parsers/__init__.py`. Extracts `train_time_sec`,
  `throughput_instances_per_sec`, `accuracy_pct`, and `train_instances`, with
  an `Accuracy = X%` fallback.
- `benchmarks.yml`: add the `liblinear` benchmark with `install_markers` for
  `bin/train` and `bin/predict`, so installation is validated by the existence
  of both binaries.
- `jobs.yml`: add the `liblinear_synthetic` (overridable size/solver/cost/bias)
  and `liblinear_rcv1` jobs.
- `BUCK`: add the parser to `:parsers` and the test to `parsers_tests`.
- `tests/test_liblinear_parser.py`: parser unit test (full metric block +
  accuracy fallback).
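The parser's extraction logic can be sketched with two regular expressions. The `liblinear_<name>: <value>` line shape (accepting `:` or `=`) is an assumption, since the exact format emitted by `run_liblinear.sh` is not shown here. The `Accuracy = X% (correct/total)` line is the format liblinear's own `predict` prints. `parse_liblinear` is a hypothetical name, not the plugin's actual API.

```python
import re

# Assumed metric line shape, e.g. "liblinear_train_time_sec: 12.5".
NUMBER = r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?"
METRIC_RE = re.compile(r"^liblinear_(\w+)\s*[:=]\s*(" + NUMBER + r")\s*$")
# liblinear's predict prints e.g. "Accuracy = 96.5% (1930/2000)".
ACCURACY_RE = re.compile(r"Accuracy\s*=\s*([0-9.]+)%")

WANTED = {
    "train_time_sec",
    "throughput_instances_per_sec",
    "accuracy_pct",
    "train_instances",
}


def parse_liblinear(stdout: str) -> dict:
    """Hypothetical sketch: collect liblinear_* metrics, then fall back for accuracy."""
    metrics = {}
    for line in stdout.splitlines():
        m = METRIC_RE.match(line.strip())
        if m and m.group(1) in WANTED:
            metrics[m.group(1)] = float(m.group(2))
    # Fallback: use predict's own summary line if no explicit metric was emitted.
    if "accuracy_pct" not in metrics:
        m = ACCURACY_RE.search(stdout)
        if m:
            metrics["accuracy_pct"] = float(m.group(1))
    return metrics
```

Here throughput is simply read from the output. If it were derived instead, it would be `train_instances / train_time_sec`, e.g. 20000 instances / 2.5 s = 8000 instances/sec.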

Differential Revision: D108165488

Summary: The `graph500_omp_csr` benchpress job hardcoded SCALE (log2 of the number of graph vertices) as the literal string `"20"`. Switch to benchpress's standard `vars`/`args` mechanism so SCALE can be overridden per run from the CLI without editing config files:

- `benchpress/config/jobs.yml` and `jobs_internal.yml`: args -> `["{scale}"]`, vars -> `["scale=20"]`. Default behavior (SCALE=20) is preserved.
- `packages/graph500/README.md` (new): package-level docs covering install, the default run, custom SCALE via `-i '{"scale":"<N>"}'`, a SCALE-to-RAM sizing table, and MPI tuning env vars.
- `README.md`: add a Graph500 row to the "Internal benchmarks" table.
- `tests/test_graph500_vars.py` + `BUCK` target `graph500_vars_tests`: unit tests for the var-substitution and dry_run paths, modeled after the existing `silo_vars_tests`. `ParserFactory.create` is patched per test via setUp/tearDown to avoid leaking into other tests.

No changes to `run.sh` were needed; it already accepts SCALE as a positional arg.

Usage:

```bash
./benchpress_cli.py run graph500_omp_csr                       # SCALE=20
./benchpress_cli.py run graph500_omp_csr -i '{"scale":"25"}'   # SCALE=25
```

Differential Revision: D107278406
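The `vars`/`args` mechanism used for SCALE amounts to template substitution: `{scale}` in the job's args is filled from a `scale=20` default unless the `-i` JSON supplies a value. A minimal sketch, assuming `str.format`-style placeholders and `key=value` defaults; `resolve_args` is hypothetical, and benchpress's own implementation may differ in validation and error handling.

```python
import json


def resolve_args(args, default_vars, overrides_json=None):
    """Hypothetical sketch: fill '{name}' placeholders from defaults, then CLI overrides."""
    # Defaults come as "key=value" strings, like the job's `vars` list.
    values = dict(v.split("=", 1) for v in default_vars)
    # Values from `-i '{"scale":"25"}'` take precedence over defaults.
    if overrides_json:
        values.update(json.loads(overrides_json))
    return [a.format(**values) for a in args]


print(resolve_args(["{scale}"], ["scale=20"]))                    # ['20']
print(resolve_args(["{scale}"], ["scale=20"], '{"scale":"25"}'))  # ['25']
```

With `str.format`, a placeholder that has neither a default nor an override raises `KeyError`, while an override key that no arg references is silently ignored.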


meta-codesync · 2026-06-10T17:31:06Z

@yans3meta has exported this pull request. If you are a Meta employee, you can view the originating Diff in D108165488.

Pull Request resolved: #682
Reviewed By: hasan3050
Differential Revision: D108165488
fbshipit-source-id: d827ac32b0a82c62677c786a7650f2b59215ac49

yans3meta added 2 commits June 10, 2026 10:30

meta-cla Bot added the CLA Signed label Jun 10, 2026

meta-codesync Bot added the meta-exported label Jun 10, 2026


yans3meta wants to merge 2 commits into facebookresearch:v2-beta from yans3meta:export-D108165488-to-v2-beta

yans3meta commented Jun 10, 2026


meta-codesync Bot commented Jun 10, 2026
