Hamidreza hamidmatiny

Mohammadreza Matiny

AI Software & MLOps Engineer · Production Deep Learning · Low-Latency Async APIs

About Me

Hybrid engineer bridging deep learning mathematics and production infrastructure.

I build, optimize, and deploy production-grade deep learning systems and low-latency distributed data pipelines, drawing on 6+ years across core software engineering, computer vision, data science, and ML infrastructure. I currently apply this experience at Torc Robotics as a Quality Assurance and Annotation Specialist, verifying high-fidelity data streams for autonomous vehicle environments. I specialize in distributed compute scheduling, strict data contract enforcement, and hardware-specific performance optimization, turning complex multi-modal data streams into stable, scalable production systems.

Core Expertise

MLOps & Deployment

Deep Learning & Core ML

Data & Distributed Systems

Optimization & Acceleration

Featured Architectural Builds

Sentinel-Ray

A high-performance distributed data gatekeeper engineered for streaming machine learning and data pipelines. It intercepts production data feeds to catch statistical anomalies, structural degradation, and data drift before they reach downstream consumers. The core engine parallelizes data processing workloads via asynchronously scaled Ray Tasks and stateful Actors, verifying every batch against programmatic constraints built on Pandera schema contracts.

Ray Core · Pandera · Distributed Validation · Data Quality Gates · Docker
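
A minimal sketch of the gatekeeping idea, not the Sentinel-Ray implementation. To stay dependency-free it swaps Ray Tasks for the standard library's `ThreadPoolExecutor` and Pandera schemas for a small hand-rolled contract. The `Column`, `CONTRACT`, `validate_batch`, and `gate` names, along with the column rules, are hypothetical and exist only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Column:
    """One column rule in a hypothetical schema contract."""
    name: str
    dtype: type
    check: Callable[[Any], bool] = lambda value: True


# Hypothetical contract for illustration; not taken from Sentinel-Ray.
CONTRACT = [
    Column("sensor_id", str, lambda v: len(v) > 0),
    Column("reading", float, lambda v: 0.0 <= v <= 100.0),
]


def validate_batch(batch: list[dict]) -> list[str]:
    """Return readable violations for one batch (an empty list means pass)."""
    errors = []
    for i, row in enumerate(batch):
        for col in CONTRACT:
            if col.name not in row:
                errors.append(f"row {i}: missing column {col.name!r}")
            elif not isinstance(row[col.name], col.dtype):
                errors.append(f"row {i}: {col.name!r} is not {col.dtype.__name__}")
            elif not col.check(row[col.name]):
                errors.append(f"row {i}: {col.name!r} failed its value check")
    return errors


def gate(batches: list[list[dict]]):
    """Validate batches concurrently and split them into passed and rejected."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(validate_batch, batches))
    passed = [b for b, errs in zip(batches, results) if not errs]
    rejected = [(b, errs) for b, errs in zip(batches, results) if errs]
    return passed, rejected
```

Only batches with no violations move downstream; rejected batches keep their error lists so they can be logged or quarantined. In a Ray-based version, `validate_batch` would presumably become a remote task, but that wiring is left out here.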

Hydra Data Factory

Enterprise-grade data infrastructure automation focused on schema safety and decoupled deployments. Behavioral regression contracts are enforced through automated Pytest suites, and the cloud infrastructure is provisioned on AWS with Terraform (Infrastructure as Code).

Terraform · AWS Infrastructure · Pytest · Schema Contracts · Continuous Delivery
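
As a rough illustration of what a behavioral regression contract can look like, the sketch below pins both the output schema and the exact output of a transform using Pytest-style test functions. The `normalize_record` transform and `EXPECTED_SCHEMA` are hypothetical stand-ins, not code from Hydra Data Factory.

```python
def normalize_record(raw: dict) -> dict:
    """Hypothetical transform under test: coerce the id, tidy the name."""
    return {"id": int(raw["id"]), "name": raw["name"].strip().lower()}


# The contract: output keys and their types must not change silently.
EXPECTED_SCHEMA = {"id": int, "name": str}


def test_output_schema_is_stable():
    out = normalize_record({"id": "7", "name": "  Ada "})
    assert set(out) == set(EXPECTED_SCHEMA)
    for key, expected_type in EXPECTED_SCHEMA.items():
        assert isinstance(out[key], expected_type)


def test_behavior_is_pinned():
    # A known input must keep producing the same output across releases.
    assert normalize_record({"id": "7", "name": "  Ada "}) == {"id": 7, "name": "ada"}
```

Pytest collects functions whose names start with `test_`, so running `pytest` against this file in CI would fail the build whenever the schema or the pinned behavior drifts.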

Itinera

AI-driven asynchronous travel engine built on a search-then-synthesize architecture. Integrates xAI's Grok API with async Python to manage concurrent data streams, parallel I/O-bound LLM calls, and low-latency task scheduling, delivering hyper-personalized itineraries without blocking the request path.

FastAPI · AsyncIO · xAI Grok · Streamlit · Search-then-Synthesize
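
The search-then-synthesize flow can be sketched with plain `asyncio`. This is an assumption-laden outline rather than Itinera's code: `search`, `synthesize`, and `plan_trip` are hypothetical, the source names are made up, and the network and LLM calls are replaced by short sleeps so that no real API is contacted.

```python
import asyncio


async def search(source: str, query: str) -> str:
    """Stand-in for one I/O-bound lookup (hypothetical; no real API is called)."""
    await asyncio.sleep(0.01)  # simulates network latency
    return f"{source}: results for {query!r}"


async def synthesize(query: str, findings: list[str]) -> str:
    """Stand-in for the LLM call that merges search results into an itinerary."""
    await asyncio.sleep(0.01)  # simulates model latency
    return f"Itinerary for {query!r} built from {len(findings)} sources"


async def plan_trip(query: str) -> str:
    sources = ["flights", "hotels", "events"]  # hypothetical source names
    # Search phase: all lookups run concurrently on the event loop.
    findings = await asyncio.gather(*(search(s, query) for s in sources))
    # Synthesize phase: runs once every search result is available.
    return await synthesize(query, list(findings))
```

Because the lookups are awaited together with `asyncio.gather`, the search phase takes roughly as long as the slowest lookup instead of the sum of all of them, and the event loop stays free to serve other requests in the meantime.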

Scalable Vision Transformer Deployment

Asynchronous ML serving infrastructure on GCP using FastAPI and containerized Docker environments. Evolved from a custom ResNet-18 to Hugging Face ViT transfer learning with production-oriented inference paths, achieving a 40% reduction in production latency through async request handling and optimized model serving.

Vision Transformer · FastAPI · Docker · GCP · Hugging Face

High-Performance Inference & CUDA Optimization

Localized hardware optimization routines leveraging mixed-precision (AMP), torch.compile, and advanced DataLoader tuning across CUDA and Apple MPS backends. Systematic profiling and kernel-level tuning deliver 2–4× inference speedups with ~60% memory reduction on constrained hardware.

PyTorch AMP · torch.compile · CUDA · MPS · DataLoader Tuning

Engineering Philosophy

Production-quality code is resource-efficient by design, not optimized as an afterthought. I treat messy multi-modal sensor data and massive data-streaming pipelines as first-class systems problems: ingest, distribute, validate, and serve with the same rigor applied to model architecture. Success is measured by deployment metrics such as p99 latency, compute node efficiency, structural schema integrity, and overall pipeline resiliency, not by theoretical benchmarks that never survive contact with production data streams.
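
For readers less familiar with tail-latency metrics, here is a small sketch of how a p99 figure can be computed from recorded request latencies. It uses the nearest-rank definition, which is one of several common percentile conventions; the `percentile` helper is hypothetical and not tied to any monitoring stack mentioned above.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of
    all samples at or below it."""
    if not samples:
        raise ValueError("samples must be non-empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(samples)
    # Multiply before dividing so whole-number inputs stay exact.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]
```

The point of tracking p99 rather than the mean is that a handful of slow requests can barely move an average while still dominating the experience of the unluckiest users.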
