Towards a Large Language Model Training Framework for Time Series Forecasting
News | Overview | Architecture | Setup | Quick Start | Examples | Acknowledgements
- 2026.05.27: 🚀 Added Cast-R1-style example integration and support for agentic time-series RLVR training!
- 2026.05.15: ✨ Added Time-R1 example code, RLVR training support, and the core CPT and SFT training implementations.
- 2026.04.29: 🎉 Released CastFactory — Towards a Large Language Model Training Framework for Time Series Forecasting!
CastFactory is a recipe-centric research framework for LLM-driven time-series forecasting. It organizes data processing, time-series representation, model training, verifiable rewards, evaluation, and trace artifacts around explicit YAML recipes, so CPT, SFT, and RLVR experiments can be reproduced and extended with a consistent workflow.
The framework connects two lines of work: Time-R1-style staged training for forecasting LLMs and Cast-R1-style tool-augmented sequential decision policies. For RLVR, CastFactory prepares verl-compatible rollout datasets, reward entrypoints, config files, agent-loop configs, and launch commands while keeping the external verl runtime outside this repository.
| Area | Current support |
|---|---|
| Experiment control | YAML recipes, CLI runner, dotlist overrides, stage metadata |
| Data pipeline | CSV reader, timestamp/ratio split, rolling windows, leakage checks, train-only normalization utilities |
| Representations | Statistics, textual summaries, context prompts, Markdown tables, discrete tokens, numerical patches, hybrid prompts |
| Training stages | CPT, SFT, RLVR through Experiment.fit() |
| Local training backend | HuggingFace Transformers backends for CPT and SFT |
| RLVR export | verl 0.7.1-style GRPO/vLLM config, rollout JSONL, reward entrypoint, launch command |
| Agentic RLVR | Native verl AgentLoop config, Cast-R1-style tools, timestamp/value ground truth, univariate workflow |
| Evaluation | Standard, rolling, and zero-shot evaluators with point metrics |
| Traces | Automatic recipe/metrics/predictions/report/stage metadata; RunStore helpers for prompts, responses, parsed rows, and errors |
CastFactory is organized around one explicit contract: a recipe describes the full experiment, and
Experiment wires the selected components together.
YAML recipe
-> RecipeConfig validation and defaults
-> CSV reader + timestamp/ratio split + rolling windows
-> Representation adapter
-> CPTDataset / SFTDataset / RLVRDataset
-> Trainer + backend
-> Evaluation, reward, and trace artifacts
For RLVR, the path branches by rollout workflow:
single_turn
-> text prompt rows
-> rollout_dataset.jsonl
-> verl_config.yaml
-> castfactory.training.verl_reward_adapter:compute_score
time_series_agent
-> raw chat rows
-> agent_name: time_series_forecast_agent
-> Cast-R1-style timestamp/value ground truth
-> agent_loop_config.yaml
-> native verl AgentLoop with time-series tools
- Recipe-first experiments: data, representation, model, training stage, reward, evaluation, rollout, and trace behavior are described in YAML recipes.
- Three-stage training flow:
cpt,sft, andrlvrare first-class experiment stages inExperiment.fit(). - Leakage-aware data layer: CSV reading, timestamp or ratio split, rolling windows, leakage
checks, train-only normalization utilities, and visibility-separated
ForecastSampleobjects. - Time-series representations: textual statistics, context summaries, discrete tokens, numerical patches, hybrid prompts, and Markdown-table prompts.
- Forecast parsers: JSON, array, timestamp/value, and
<think>/<answer>parsers with fallback behavior. - Verifiable rewards: format, accuracy, calibration, reasoning, MSE, and agentic rewards for length, normalized MSE, change points, and season/trend structure.
- verl integration: RLVR recipes can export rollout datasets, verl config files, reward entrypoints, agent-loop config files, and launch commands for GRPO/vLLM training.
- Trace artifacts:
Experiment.evaluate()writes recipe snapshots, metrics, predictions, leaderboards, and reports;Experiment.fit()writes stage metadata;RunStorealso exposes helpers for prompts, responses, parsed rows, and errors.
castfactory/
cli/ # command-line recipe runner
core/ # RecipeConfig, Experiment, registries
data/ # records, readers, splits, windows, leakage checks
evaluation/ # point metrics and evaluation protocols
models/ # backbone, bridge, head, and adapter skeletons
parsers/ # forecast output parsers
representation/ # time-series to LLM input representations
rewards/ # verifiable reward functions
trace/ # run artifact storage
training/ # CPT/SFT/RLVR datasets, trainers, backends, agentic tools
examples/
cpt/ # ETTh1 CPT recipes
sft/ # ETTh1 SFT recipes
rlvr/ # ETTh1 GRPO/RLVR recipes, Cast-R1-style agentic recipe, prompt template
scripts/ # server and 4-GPU launch helpers
tests/ # unit tests by module area
Install CastFactory in editable mode:
python -m pip install -e .Optional HuggingFace model loading:
python -m pip install -e ".[hf]"Optional local CPT/SFT training stack:
python -m pip install -e ".[train]"Optional time-series agentic RL helpers:
python -m pip install -e ".[agentic]"Development tools:
python -m pip install -e ".[dev]"RLVR recipes generate artifacts for an external verl runtime. Before running exported GRPO/vLLM launch commands, prepare a separate training environment with verl 0.7.1 and its matching vLLM, Ray, CUDA, and PyTorch stack.
CastFactory does not vendor verl. It writes the dataset, config, reward entrypoint, and launch command that should be executed in that preconfigured verl 0.7.1 environment.
python -c "import verl; print(getattr(verl, '__version__', 'unknown'))"Expected version:
0.7.1
Load a recipe:
python -m castfactory.cli.run examples/sft/etth1_qwen_sft.yamlRun CPT or SFT locally with the configured Transformers backend:
python -m castfactory.cli.run examples/cpt/etth1_qwen_cpt.yaml --mode fit
python -m castfactory.cli.run examples/sft/etth1_qwen_sft.yaml --mode fitPrepare RLVR artifacts for verl 0.7.1:
python -m castfactory.cli.run examples/rlvr/etth1_qwen_grpo.yaml --mode fitCommand-line dotlist overrides are supported:
python -m castfactory.cli.run examples/sft/etth1_qwen_sft.yaml --mode fit \
model.backbone.model_name=Qwen/Qwen2.5-1.5B \
training.args.num_train_epochs=1 \
training.args.learning_rate=2.0e-4| Stage | Recipe | Purpose | Output |
|---|---|---|---|
| CPT | examples/cpt/etth1_qwen_cpt.yaml |
Convert ETTh1 windows into causal LM text streams | ./checkpoints/etth1_ot_qwen_cpt |
| SFT | examples/sft/etth1_qwen_sft.yaml |
Build single-turn forecasting instruction data from CPT checkpoint | ./checkpoints/etth1_ot_qwen_sft |
| RLVR | examples/rlvr/etth1_qwen_grpo.yaml |
Prepare single-turn GRPO artifacts for verl 0.7.1 | ./checkpoints/etth1_ot_qwen_grpo/rlvr/ |
| Agentic RLVR | examples/rlvr/etth1_qwen3_1_7b_agentic_grpo_4gpu.yaml |
Prepare Cast-R1-style tool-augmented AgentLoop artifacts | ./checkpoints/etth1_qwen3_1_7b_4gpu/etth1_ot_qwen_agentic_grpo/rlvr/ |
The basic ETTh1 pipeline is checkpoint-chained:
examples/cpt/etth1_qwen_cpt.yaml
-> ./checkpoints/etth1_ot_qwen_cpt
examples/sft/etth1_qwen_sft.yaml
-> init_checkpoint: ./checkpoints/etth1_ot_qwen_cpt
-> ./checkpoints/etth1_ot_qwen_sft
examples/rlvr/etth1_qwen_grpo.yaml
-> init_checkpoint: ./checkpoints/etth1_ot_qwen_sft
-> ./checkpoints/etth1_ot_qwen_grpo
For larger Qwen3-1.7B 4-GPU examples, use:
examples/cpt/etth1_qwen3_1_7b_cpt_4gpu.yaml
examples/sft/etth1_qwen3_1_7b_sft_4gpu.yaml
examples/rlvr/etth1_qwen3_1_7b_grpo_4gpu.yaml
examples/rlvr/etth1_qwen3_1_7b_agentic_grpo_4gpu.yaml
The helper scripts mirror these recipes:
bash scripts/cpt.sh
bash scripts/sft.sh
bash scripts/rlvr.sh
bash scripts/agentic_rlvr.shCPT converts forecasting windows into causal language modeling text streams. The built-in
CPTDataset serializes observed time-series values with channel, domain, and unit metadata when
available. TransformersCPTBackend then tokenizes the text and trains a causal LM objective where
input tokens are also labels.
Minimal recipe shape:
experiment:
name: etth1_ot_qwen_cpt
stage: cpt
data:
reader:
name: csv
path: ./castfactory/dataset/ETTh1/ETTh1.csv
timestamp_col: date
target_channels: [OT]
split:
type: ratio
ratios: [0.7, 0.1, 0.2]
window:
context_length: 96
prediction_length: 96
stride: 96
model:
backbone:
name: hf_causal_lm
model_name: Qwen/Qwen2.5-1.5B
training:
checkpoint_dir: ./checkpoints/etth1_ot_qwen_cpt
backend:
name: transformersSFT builds single-turn forecasting instruction examples. The dataset formats each row as an
instruction plus visible time-series context, and the output is a JSON forecast target. The
Transformers backend masks prompt tokens with -100, so loss is computed on response tokens.
Supported prompt inputs include representation text, future-known covariates, cutoff time, prediction length, dataset metadata, target channel names, and template-defined placeholders.
The SFT backend exposes separate prompt and response token budgets:
training:
init_checkpoint: ./checkpoints/etth1_ot_qwen_cpt
checkpoint_dir: ./checkpoints/etth1_ot_qwen_sft
backend:
name: transformers
max_prompt_length: 2048
max_response_length: 1024RLVR supports both the default single-turn rollout workflow and a time_series_agent workflow that
uses verl native AgentLoop for multi-turn tool use. CastFactory builds rollout rows from forecasting
samples, exports them for verl 0.7.1, and connects model outputs back to task-specific reward
computation.
The current ETTh1 GRPO example uses:
MarkdownTableRepresentationfor historical context;- an external instruction template file;
<think>...</think>and<answer>...</answer>output structure;- format checking before accuracy-style reward computation;
- MSE-based bounded reward;
- verl GRPO config generation with vLLM rollout settings.
Running an RLVR recipe prepares artifacts under the configured checkpoint directory:
checkpoints/.../rlvr/
rollout_dataset.jsonl
verl_config.yaml
agent_loop_config.yaml # only for rollout.workflow.name: time_series_agent
launch_command.txt
The generated reward entrypoint is:
castfactory.training.verl_reward_adapter:compute_score
For Cast-R1-style agentic training, add a workflow section to an RLVR recipe:
rollout:
workflow:
name: time_series_agent
max_steps: 3
max_parallel_calls: 5
tool_parser_format: hermes
model_service_url: http://localhost:8994
prediction_models: [chronos2, arima, patchtst, itransformer]
local_fallback: arima_then_last_valueThis path is currently univariate. It writes raw chat prompts, agent_name: time_series_forecast_agent,
Cast-R1-style timestamp/value ground truth, and a native verl
agent_loop_config.yaml. The predict_time_series tool tries the configured HTTP model service
first and falls back locally to ARIMA, then last-value forecasting.
The RLVR path strengthens the connection between model output format and verifiable reward calculation.
Prompt construction is template-driven. A recipe can point to an external .txt instruction
template through training.instruction_template_file, keeping prompt design separate from Python
code. Templates can reference forecasting context variables such as:
prediction_lengthcutoff_timedataset_nameattr_meaningtarget_channelscovariate_channelsdata_lookback
Structured parsing is handled by ThinkAnswerForecastParser. It expects the model to separate
reasoning and final prediction:
<think>
...
</think>
<answer>
```
...
```
</answer>
FormatReward can validate both parse success and required structural blocks. If the response does
not satisfy the expected reasoning/answer format, the reward adapter can short-circuit the sample
and penalize invalid outputs before computing numerical rewards.
For prediction correctness, CastFactory currently includes accuracy-oriented rewards such as
AccuracyReward, CalibrationReward, and MSEReward. The Cast-R1-style agentic path also includes
length, normalized MSE, change-point, and season/trend reward components.
Use the standard-library test runner:
python -m unittest discover -s tests -vAfter installing development dependencies:
python -m pytest tests -q
python -m ruff check castfactory testsExperimentcurrently supports CSV reader recipes and timestamp/ratio split recipes.- The
time_series_agentRLVR workflow currently supports univariate forecasting samples. - RLVR
fitprepares verl artifacts by default; execute them in a configured verl 0.7.1 runtime. - The included ETTh1 examples are small, reproducible templates rather than benchmark claims.
CastFactory is developed with thanks to the following open-source projects and research lines:

