Machine Learning Enhanced Multi-Factor Quantitative Trading — A Cross-Sectional Portfolio Optimization Approach with Bias Correction
arXiv:2507.07107 | Yimin Du, 2025
A clean, fork-friendly, end-to-end A-share quantitative trading system:
| Module | What it does |
|---|---|
| features.tensor_factors | GPU-vectorised masked primitives (rank, corr, ewma, ts_*) |
| features.legacy_factors | 204 hand-crafted alpha factors (handbook) |
| features.alpha101 | Alpha101-style formulaic factors |
| features.neutralize | Cross-sectional & industry neutralisation |
| features.bias | Limit-up / limit-down / halt bias correction |
| training.augment | GBM data augmentation |
| models.nets | MLP / Transformer |
| models.losses | AdjMSE, IC, RankIC losses |
| portfolio.markowitz | Cross-sectional Markowitz (shrunk cov, no-short) |
| backtest.engine | Vectorised backtest → Sharpe / IC / IR / DD |
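
The portfolio.markowitz row mentions a shrunk covariance and a no-short constraint. The sketch below illustrates that idea in NumPy, assuming fixed-intensity shrinkage toward the diagonal and a projected-gradient solver. The names `shrink_cov`, `project_simplex`, and `long_only_markowitz`, and the parameters `delta` and `risk_aversion`, are hypothetical and not taken from the repository, which may use a different estimator or a dedicated QP solver.

```python
import numpy as np

def shrink_cov(returns, delta=0.2):
    """Shrink the sample covariance toward its own diagonal (assumed scheme)."""
    sample = np.cov(returns, rowvar=False)
    target = np.diag(np.diag(sample))
    return (1.0 - delta) * sample + delta * target

def project_simplex(v):
    """Euclidean projection onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)

def long_only_markowitz(mu, cov, risk_aversion=5.0, steps=500):
    """Minimise 0.5 * risk_aversion * w'Σw - mu'w subject to w >= 0, sum(w) = 1."""
    n = len(mu)
    w = np.full(n, 1.0 / n)
    # Step size 1/L, where L is the Lipschitz constant of the gradient.
    lr = 1.0 / (risk_aversion * np.linalg.eigvalsh(cov)[-1] + 1e-12)
    for _ in range(steps):
        grad = risk_aversion * cov @ w - mu
        w = project_simplex(w - lr * grad)
    return w

# Usage on synthetic returns (250 days x 5 stocks).
rng = np.random.default_rng(0)
rets = rng.normal(0.0005, 0.01, size=(250, 5))
weights = long_only_markowitz(rets.mean(axis=0), shrink_cov(rets))
```

Projecting onto the simplex after each gradient step enforces both the full-investment and the no-short constraints without an external solver, which keeps the sketch self-contained.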
git clone https://github.com/initial-d/ml-quant-trading.git
cd ml-quant-trading
pip install -e ".[dev]"  # add ,gpu for CUDA; add ,mosek for MOSEK solver
# 30-second smoke test (synthetic 200 stocks × 500 days)
make paper CONFIG=configs/small.yaml

An end-to-end demo of this project can also be run in Google Colab without installing anything locally.

The full feature set comprises 9 curated Alpha101 formulas (features.alpha101) plus 204 hand-crafted legacy factors (features.legacy_factors) for a total of 213 dimensions. All factors are mask-aware PyTorch tensors with signature Panel → (values[T,N], mask[T,N]).
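
To illustrate the `(values[T,N], mask[T,N])` convention, here is a minimal sketch of a mask-aware cross-sectional rank. It uses NumPy rather than PyTorch to stay dependency-light, and `cs_rank` is a hypothetical name, not the repository's primitive.

```python
import numpy as np

def cs_rank(values, mask):
    """Per-day percentile rank across stocks, ignoring masked-out cells.

    values, mask: arrays of shape [T, N]; mask is True where a cell is valid.
    Returns (ranks[T, N], mask[T, N]); ranks lie in (0, 1], NaN where invalid.
    Ties are broken by column order (no averaging).
    """
    # Send invalid cells to +inf so they sort last within each row.
    filled = np.where(mask, values, np.inf)
    order = np.argsort(filled, axis=1, kind="stable")
    ranks = np.empty_like(order)
    rows = np.arange(values.shape[0])[:, None]
    ranks[rows, order] = np.arange(1, values.shape[1] + 1)[None, :]
    n_valid = mask.sum(axis=1, keepdims=True)
    pct = ranks / np.maximum(n_valid, 1)
    return np.where(mask, pct, np.nan), mask

# Usage: 2 days x 4 stocks, one halted stock on day 0.
values = np.array([[3.0, 1.0, np.nan, 2.0],
                   [5.0, 5.0, 1.0, 0.0]])
mask = np.array([[True, True, False, True],
                 [True, True, True, True]])
ranks, out_mask = cs_rank(values, mask)
```

Returning the mask alongside the values lets downstream primitives keep excluding invalid cells (for example halted stocks) without relying on NaN propagation alone.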
📖 Complete Factor Handbook: a one-paragraph explanation of the idea, motivation, and rationale behind each factor, so you can pick the ones you need.
| Family | Count | Description |
|---|---|---|
| better_001 – better_028 | 28 | VWAP deviation + volume-weighted momentum |
| best_001 – best_021 | 21 | Close-location momentum variants |
| old_027 – old_076 | 50 | Classic alpha signals (corr/rank composites) |
| stock_001 – stock_022 | 22 | Per-stock derived series (volume, range, price) |
| extra_001 – extra_014 | 14 | Turnover + amount features |
| add_001 – add_030 | 30 | Additional composite factors |
| change_001 – change_005 | 5 | Short-window change-of-velocity |
| original_001 – original_028 | 28 | Close/volume direct statistics |
| cs_rank_* | 6 | Market breadth (cross-sectional rank signals) |
Full factor list:
add_001 add_002 add_003 add_004 add_005 add_006
add_007 add_008 add_009 add_010 add_011 add_012
add_013 add_014 add_015 add_016 add_017 add_018
add_019 add_020 add_021 add_022 add_023 add_024
add_025 add_026 add_027 add_028 add_029 add_030
best_001 best_002 best_003 best_004 best_005 best_006
best_007 best_008 best_009 best_010 best_011 best_012
best_013 best_014 best_015 best_016 best_017 best_018
best_019 best_020 best_021
change_001 change_002 change_003 change_004 change_005
extra_001 extra_002 extra_003 extra_004 extra_005 extra_006
extra_007 extra_008 extra_009 extra_010 extra_011 extra_012
extra_013 extra_014
old_027 old_028 old_029 old_030 old_031 old_032
old_033 old_034 old_035 old_036 old_037 old_038
old_039 old_040 old_041 old_042 old_043 old_044
old_045 old_046 old_047 old_048 old_049 old_050
old_051 old_052 old_053 old_054 old_055 old_056
old_057 old_058 old_059 old_060 old_061 old_062
old_063 old_064 old_065 old_066 old_067 old_068
old_069 old_070 old_071 old_072 old_073 old_074
old_075 old_076
original_001 original_002 original_003 original_004 original_005
original_006 original_007 original_008 original_009 original_010
original_011 original_012 original_013 original_014 original_015
original_016 original_017 original_018 original_019 original_020
original_021 original_022 original_023 original_024 original_025
original_026 original_027 original_028
stock_001 stock_002 stock_003 stock_004 stock_005 stock_006
stock_007 stock_008 stock_009 stock_010 stock_011 stock_012
stock_013 stock_014 stock_015 stock_016 stock_017 stock_018
stock_019 stock_020 stock_021 stock_022
You can directly fetch stock data from Yahoo Finance or Baostock (for A-shares).
yfinance:
from mlquant.data import make_panel
panel = make_panel(
source="yfinance",
tickers=["000001.SZ", "600000.SS"],
start="2020-01-01",
end="2023-12-31"
)

baostock:
from mlquant.data import make_panel
panel = make_panel(
source="baostock",
tickers=["sh.600000", "sz.000001"],
start="2020-01-01",
end="2023-12-31"
)

from mlquant.features import compute_legacy_set, LEGACY_REGISTRY
# Compute all 204 factors
factors, mask, names = compute_legacy_set(panel) # → [T, N, 204]
# Or a subset
factors, mask, names = compute_legacy_set(panel, names=("best_001", "add_015", "old_042"))

raw OCHLV → data.loaders / data.synthetic / data.yfinance_loader / data.baostock_loader (Panel with mask)
→ features.tensor_factors (GPU masked primitives)
→ features.legacy_factors (204 alphas)
→ training.augment + models.nets + models.losses
→ portfolio.markowitz (efficient frontier)
→ backtest.engine → Sharpe / IC / IR / DD / turnover
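
The pipeline ends in summary statistics such as Sharpe, IC, and drawdown. As a rough illustration of what those metrics compute, the sketch below implements them in NumPy under common textbook definitions. The function names and the 252-day annualisation factor are assumptions, and the repository's backtest.engine may define them differently.

```python
import numpy as np

def sharpe_ratio(daily_returns, periods_per_year=252):
    """Annualised mean/std of daily returns (risk-free rate assumed zero)."""
    r = np.asarray(daily_returns, dtype=float)
    sd = r.std(ddof=1)
    if sd == 0:
        return float("nan")
    return float(np.sqrt(periods_per_year) * r.mean() / sd)

def max_drawdown(daily_returns):
    """Largest peak-to-trough loss of the compounded equity curve."""
    r = np.asarray(daily_returns, dtype=float)
    # Start the curve at 1.0 so a loss on the first day counts as a drawdown.
    equity = np.concatenate([[1.0], np.cumprod(1.0 + r)])
    peak = np.maximum.accumulate(equity)
    return float(np.max(1.0 - equity / peak))

def daily_ic(pred, fwd_ret, mask):
    """Pearson IC per day over valid cells; NaN if fewer than 2 valid stocks."""
    ic = np.full(pred.shape[0], np.nan)
    for t in range(pred.shape[0]):
        m = mask[t]
        if m.sum() >= 2:
            ic[t] = np.corrcoef(pred[t, m], fwd_ret[t, m])[0, 1]
    return ic
```

Under these definitions, an information ratio can be obtained as the mean of the daily IC series divided by its standard deviation.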
ml-quant-trading/
├── src/mlquant/
│ ├── data/ # Panel dataclass, loaders, synthetic generator
│ ├── features/ # Factor engine + 204 legacy + Alpha101
│ ├── training/ # Dataset, augmentation, trainer
│ ├── models/ # MLP, Transformer, losses
│ ├── portfolio/ # Markowitz, frontier sweep
│ ├── backtest/ # Engine, metrics
│ └── cli/ # Command-line interface
├── configs/ # small.yaml (smoke) / paper.yaml (full)
├── tests/ # pytest suite
├── scripts/ # IC eval, frontier plot
├── legacy/ # Original research scripts (archival, unsupported)
└── docs/ # Architecture, factor docs, paper reproduction
See docs/reproducing_paper.md for table-by-table mapping.
| Paper section | Code module | Tests |
|---|---|---|
| §3.1 Tensor factor engine | features.tensor_factors | test_tensor_factors |
| §3.2 Alpha + microstructure factors | features.alpha101, features.legacy_factors | test_alpha101 |
| §3.3 Neutralisation | features.neutralize | (none) |
| §3.4 Bias correction | features.bias | test_bias |
| §4.1 GBM augmentation | training.augment | test_augment |
| §4.2 ML models | models.nets, models.losses | test_losses |
| §5 Portfolio optimisation | portfolio.markowitz | test_markowitz |
| §6 Backtest | backtest.engine, backtest.metrics | test_metrics |
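
For the §4.1 row, and assuming GBM stands for geometric Brownian motion, the sketch below shows how a synthetic price path can be simulated with the standard log-normal update. The function `simulate_gbm_path` and its parameters are hypothetical; how training.augment fits and applies such paths is not described here.

```python
import math
import random

def simulate_gbm_path(s0, mu, sigma, n_steps, dt=1 / 252, seed=None):
    """Simulate one GBM path: S_{t+dt} = S_t * exp((mu - sigma^2/2) dt + sigma sqrt(dt) Z)."""
    rng = random.Random(seed)
    drift = (mu - 0.5 * sigma ** 2) * dt
    vol = sigma * math.sqrt(dt)
    path = [s0]
    for _ in range(n_steps):
        shock = rng.gauss(0.0, 1.0)  # standard normal innovation
        path.append(path[-1] * math.exp(drift + vol * shock))
    return path

# Usage: one year of synthetic daily prices with 8% drift and 25% volatility.
synthetic = simulate_gbm_path(s0=10.0, mu=0.08, sigma=0.25, n_steps=252, seed=42)
```

Because the update is multiplicative, simulated prices stay strictly positive, which is one reason GBM is a common choice for generating synthetic price series.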
@article{du2025mlquant,
title = {Machine Learning Enhanced Multi-Factor Quantitative Trading:
A Cross-Sectional Portfolio Optimization Approach with Bias Correction},
author = {Du, Yimin},
journal= {arXiv preprint arXiv:2507.07107},
year = {2025},
url = {https://arxiv.org/abs/2507.07107}
}

MIT. See LICENSE.