MSc student at Politecnico di Torino, conducting thesis research at Eindhoven University of Technology. Applied ML and reinforcement learning researcher with five years of industrial analytics experience in the automotive supply chain. Based in Turin, Italy.
- Deep RL: sim-to-real transfer, robust policy evaluation, entropy-based exploration
- RL for decision-making under uncertainty in supply chain and sustainable operations
- ML systems: training pipelines, REST inference, containerized deployment
sim2real — Sim-to-real locomotion for MuJoCo Hopper using curriculum randomization, domain randomization, and entropy scheduling; companion paper under review at IEEE Robotics and Automation Letters.
FBWM-FTOPSIS-PPO — PPO agent benchmarked against LP/MIP and fuzzy MCDM baselines for sustainable supplier order allocation under stochastic demand.
math-llm-poc — Decoder-only Transformer trained from scratch in PyTorch on synthetic arithmetic data, served via FastAPI and Docker; 98.44% exact-match on 5,006 held-out test equations.
kpi-lens — Supply chain intelligence platform with statistical and ML anomaly detection, an LLM analyst layer, and automated Excel/PPT reporting via FastAPI and MCP.
weld-anomaly-classifier — Multimodal deep learning pipeline for weld defect classification from sensor, audio, and video streams; F1 0.957, ranked first at I3P AI Hackathon, Politecnico di Torino.
My research sits at the intersection of deep reinforcement learning and operations research: learning robust policies for locomotion transfer and for supply chain decision-making under uncertainty.
Current first-author work: sim-to-real locomotion (IEEE RA-L, under review) and a hybrid RL/MCDM framework for dynamic supplier order allocation (targeting the International Journal of Production Economics). Earlier work on blockchain-integrated sustainable supplier selection was published in the Journal of Cleaner Production (52 citations).
Reinforcement learning: PPO, SAC, DQN, Gymnasium, MuJoCo
ML and engineering: PyTorch, FastAPI, Docker, Python, SQL
Optimization: Gurobi, MILP, fuzzy MCDM
GitHub Pages · LinkedIn · Google Scholar · ali.vaezi@studenti.polito.it





