Add liblinear benchmark with synthetic + real datasets #682
Open
yans3meta wants to merge 2 commits into
Summary:
The `graph500_omp_csr` benchpress job hardcoded SCALE (log2 of graph
vertices) as the literal string `"20"`. Switch to benchpress's standard
`vars`/`args` mechanism so SCALE can be overridden per-run from the CLI
without editing config files:
- `benchpress/config/jobs.yml` and `jobs_internal.yml`:
args -> `["{scale}"]`, vars -> `["scale=20"]`. Default behavior
(SCALE=20) is preserved.
- `packages/graph500/README.md` (new): package-level docs covering
install, default run, custom SCALE via `-i '{"scale":"<N>"}'`, a
SCALE-to-RAM sizing table, and MPI tuning env vars.
- `README.md`: add a Graph500 row to the "Internal benchmarks" table.
- `tests/test_graph500_vars.py` + `BUCK` target `graph500_vars_tests`:
unit tests for the var-substitution and dry_run paths, modeled after
the existing `silo_vars_tests`. ParserFactory.create is patched
per-test via setUp/tearDown to avoid leaking into other tests.
No changes to `run.sh` were needed — it already accepts SCALE as a
positional arg.
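The per-test patching described for `tests/test_graph500_vars.py` might look roughly like the sketch below. The `ParserFactory` class here is a stand-in defined inline so the example runs on its own; the real benchpress class and its interface are not shown in this summary, so treat the shape as an assumption.

```python
import unittest
from unittest import mock


class ParserFactory:
    """Stand-in for benchpress's ParserFactory (hypothetical minimal shape)."""

    @staticmethod
    def create(name):
        raise RuntimeError(f"no real parser available for {name!r}")


class Graph500VarsTest(unittest.TestCase):
    def setUp(self):
        # Start the patch per test so it never outlives the test case.
        self._patcher = mock.patch.object(
            ParserFactory, "create", return_value=mock.Mock()
        )
        self.mock_create = self._patcher.start()

    def tearDown(self):
        # Restore the original so other tests see the unpatched factory.
        self._patcher.stop()

    def test_create_is_patched(self):
        parser = ParserFactory.create("graph500")
        self.mock_create.assert_called_once_with("graph500")
        self.assertIsNotNone(parser)
```

Run it with `python -m unittest`. Starting and stopping the patch in `setUp`/`tearDown` keeps the replacement scoped to a single test, which is the leak-avoidance property the summary calls out.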
Usage:
```bash
./benchpress_cli.py run graph500_omp_csr # SCALE=20
./benchpress_cli.py run graph500_omp_csr -i '{"scale":"25"}' # SCALE=25
```
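To illustrate how the two invocations above resolve to different SCALE values, here is a minimal sketch of placeholder substitution with defaults and a JSON override. It is not benchpress's actual implementation; `resolve_args` is a hypothetical helper that only mirrors the `args`/`vars` shapes quoted in the summary.

```python
import json


def resolve_args(args, default_vars, overrides_json=None):
    """Fill `{name}` placeholders in args from defaults plus optional overrides."""
    # Defaults arrive as "key=value" strings, as in vars: ["scale=20"].
    values = dict(item.split("=", 1) for item in default_vars)
    if overrides_json:
        # Overrides arrive as a JSON object, as in -i '{"scale":"25"}'.
        values.update(json.loads(overrides_json))
    return [arg.format(**values) for arg in args]


print(resolve_args(["{scale}"], ["scale=20"]))                    # ['20']
print(resolve_args(["{scale}"], ["scale=20"], '{"scale":"25"}'))  # ['25']
```

Because the default lives in `vars`, a run without `-i` still passes `20` to `run.sh`, which matches the preserved default behavior described in the summary.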
Differential Revision: D107278406
Summary:
Integrate liblinear (NTU's large-scale linear classification library,
https://github.com/cjlin1/liblinear) as a new DCPerf/benchpress benchmark
in the default suite. It builds liblinear's train/predict CLIs from a pinned
upstream release, trains an L2-regularized L2-loss SVC on either a synthetic
arbitrary-size dataset or a real LIBSVM dataset, and reports training time,
derived throughput (instances/sec), and prediction accuracy.
Changes:
- New package `packages/liblinear/`:
  - `install_liblinear.sh`: installs build deps for CentOS (dnf) and Ubuntu
    (apt) via packages/common/os-distro.sh, clones+builds liblinear at a
    pinned tag into bin/, and downloads the default rcv1.binary dataset.
  - `gen_dataset.py`: stdlib-only synthetic LIBSVM generator. A random
    ground-truth hyperplane labels samples by sign of the linear score,
    producing linearly-separable +1/-1 data up to configurable label noise.
    Size/shape configurable via --n-samples/--n-features/--density/--noise.
  - `run_liblinear.sh`: orchestrator (the benchmark `path`). Generates
    synthetic data or resolves/downloads a real set (rcv1/a9a/ijcnn1/w8a),
    times training, runs predict, and emits machine-parseable
    `liblinear_*` metric lines. Solver/cost/bias are configurable (-s/-c/-B).
  - `cleanup_liblinear.sh`, `README.md`.
- New parser `benchpress/plugins/parsers/liblinear.py` + registration in
  `parsers/__init__.py`. Extracts train_time_sec,
  throughput_instances_per_sec, accuracy_pct, train_instances, with an
  `Accuracy = X%` fallback.
- `benchmarks.yml`: add the `liblinear` benchmark with install_markers for
  bin/train + bin/predict so install is validated by binary existence.
- `jobs.yml`: add `liblinear_synthetic` (overridable size/solver/cost/bias)
  and `liblinear_rcv1` jobs.
- `BUCK`: add the parser to `:parsers` and the test to `parsers_tests`.
- `tests/test_liblinear_parser.py`: parser unit test (full metric block +
  accuracy fallback).
Differential Revision: D108165488
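As a rough illustration of the generator idea in the summary (a random ground-truth hyperplane labels sparse samples, with optional label flips), the sketch below emits LIBSVM-format lines using only the standard library. It is not the contents of `gen_dataset.py`; `make_rows`, its parameters, and the Gaussian feature values are assumptions chosen to mirror the `--n-samples/--n-features/--density/--noise` options.

```python
import random


def make_rows(n_samples, n_features, density, noise, seed=0):
    """Yield LIBSVM-format lines labelled by a random ground-truth hyperplane."""
    rng = random.Random(seed)
    weights = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
    for _ in range(n_samples):
        # Keep each feature with probability `density` to get a sparse row.
        feats = [
            (j, round(rng.gauss(0.0, 1.0), 6))
            for j in range(n_features)
            if rng.random() < density
        ]
        # Label is the sign of the linear score against the hidden weights.
        score = sum(weights[j] * v for j, v in feats)
        label = 1 if score >= 0 else -1
        if rng.random() < noise:
            label = -label  # flip a fraction of labels to add noise
        # LIBSVM feature indices are 1-based and listed in ascending order.
        cols = " ".join(f"{j + 1}:{v:.6f}" for j, v in feats)
        yield f"{label:+d} {cols}".rstrip()


for line in make_rows(n_samples=3, n_features=5, density=0.5, noise=0.1):
    print(line)
```

With `noise=0` the data is separable by construction, so a linear SVC should reach near-perfect training accuracy; raising `noise` caps the achievable accuracy, which makes the setting a useful sanity check for the reported metric.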
@yans3meta has exported this pull request. If you are a Meta employee, you can view the originating Diff in D108165488.
meta-codesync Bot pushed a commit that referenced this pull request on Jun 12, 2026
Summary: Pull Request resolved: #682
Reviewed By: hasan3050
Differential Revision: D108165488
fbshipit-source-id: d827ac32b0a82c62677c786a7650f2b59215ac49
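The parser behavior described in the PR summary (named `liblinear_*` metric lines, with an `Accuracy = X%` fallback) could be sketched as follows. The exact line format emitted by `run_liblinear.sh` is not shown in this PR text, so the `liblinear_<name>: <value>` shape and the `parse_metrics` function are assumptions for illustration; the `Accuracy = ...%` pattern follows the line that liblinear's `predict` tool prints.

```python
import re

# Assumed metric line shape: "liblinear_<name>: <number>" (hypothetical).
METRIC_RE = re.compile(
    r"^liblinear_(\w+)\s*[:=]\s*([-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)\s*$"
)
# Accuracy line in the style printed by liblinear's predict tool.
ACCURACY_RE = re.compile(r"Accuracy\s*=\s*([0-9.]+)%")


def parse_metrics(lines):
    """Collect liblinear_* metrics, falling back to the raw Accuracy line."""
    metrics = {}
    fallback_accuracy = None
    for line in lines:
        m = METRIC_RE.match(line.strip())
        if m:
            metrics[m.group(1)] = float(m.group(2))
            continue
        m = ACCURACY_RE.search(line)
        if m:
            fallback_accuracy = float(m.group(1))
    # Use the raw accuracy only when no explicit metric line supplied it.
    if "accuracy_pct" not in metrics and fallback_accuracy is not None:
        metrics["accuracy_pct"] = fallback_accuracy
    return metrics


sample = [
    "liblinear_train_time_sec: 2.5",
    "liblinear_train_instances: 1000",
    "Accuracy = 97.5% (1950/2000)",
]
print(parse_metrics(sample))
```

Preferring the explicit metric line and using the raw `Accuracy` line only as a fallback means the parser still reports accuracy if the orchestrator's own metric block is missing or truncated.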