I build things that have to be fast, correct, and real at the same time: FPGA accelerators, RISC-V cores, compilers, and low-latency trading systems. My best work lives in hardware that beats software baselines by anywhere from 18x to 8,715x.
MEng Electrical & Electronic Engineering @ Imperial College London (First Class, Dean's List) · Incoming SWE Intern @ Squarepoint Capital · CV Engineer @ SOTA AI
Website · LinkedIn · yk1122@ic.ac.uk
I take problems that "should" run on a CPU and make them run on a pipeline instead. Across team and organization repos I've owned the deep end of the stack: fixed-point datapaths, hazard logic, AXI-DMA transfer paths, compiler backends, and live market-data feature engineering. The numbers below are real and the commit trails are public.
8,715x speedup from moving a chaotic-pendulum solver off a CPU and onto 12 parallel FPGA cores
68x peak speedup of an FPGA HFT engine over a NumPy baseline on live order books
98.2% DSP utilization at an overclocked 100.3 MHz
200+ functional tests passed by a C90 to RISC-V compiler I helped build
I keep the contribution trail explicit. Every project below links to the organization repo and to my own commit history, so what I built is verifiable rather than just claimed.
| Project | What I owned | Stack | Proof |
|---|---|---|---|
| Quick-Mafs / Magnetic-Pendulum-Accelerator | Core fixed-point compute pipeline and binary TCP streaming | SystemVerilog, AXI-DMA, PYNQ, Python | commits |
| Information-Processing / trading_indicators | FPGA HFT engine, on-chip learning, feature pipeline | FPGA, Python, PYNQ, NumPy | commits |
| Compilers-Labs / langproc-cw | Parser, AST, scope analysis, RISC-V codegen | C++, Lex/Yacc, RISC-V | commits |
| IAC-Group-2 / Team-2-RISCV-CPU | Pipeline, hazard unit, cache, branch predictor | SystemVerilog, RTL | commits |
| Information-Processing / Team-9 | Real-time data pipeline and feature engineering | Python, Jupyter | commits |
| IAC-Group-2 / Lab4-Reduced-RISC-V | Datapath, decode, ALU, control logic | SystemVerilog, RISC-V | repo |
Took a simulation that ran for 17 hours on a CPU and made it finish in 7 seconds.
- 12 parallel Q4.14 cores, 21-stage pipelines, 98.2% DSP utilization, overclocked to 100.3 MHz.
- 8,715x runtime speedup (61,878 s to 7.10 s) from the parallel pipelines, plus roughly 24x lower frame latency from replacing a Flask transfer path with AXI-DMA and a custom binary TCP protocol.
- Hand-tuned fixed-point arithmetic in SystemVerilog with full DSP packing, plus streaming visualization for live trajectory playback.
- Repo: Quick-Mafs/Magnetic-Pendulum-Accelerator
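
For readers unfamiliar with the Q4.14 format mentioned above, here is a minimal Python model of the arithmetic. It is a sketch, not the SystemVerilog datapath from the repo: the 18-bit signed width (4 integer bits including sign, 14 fractional bits), the saturating overflow behaviour, and every function name are assumptions made for illustration.

```python
# Behavioural model of Q4.14 fixed-point arithmetic.
# ASSUMPTION: 18-bit signed words (4 integer bits incl. sign + 14 fractional bits)
# with saturation on overflow. The real RTL may use a different width or wrap.

FRAC_BITS = 14
TOTAL_BITS = 18
SCALE = 1 << FRAC_BITS                    # 2**14 raw units per 1.0
Q_MIN = -(1 << (TOTAL_BITS - 1))          # most negative raw value
Q_MAX = (1 << (TOTAL_BITS - 1)) - 1       # most positive raw value


def saturate(raw: int) -> int:
    """Clamp a raw integer into the representable 18-bit signed range."""
    return max(Q_MIN, min(Q_MAX, raw))


def to_q(x: float) -> int:
    """Convert a float to its raw Q4.14 integer representation."""
    return saturate(round(x * SCALE))


def to_float(raw: int) -> float:
    """Convert a raw Q4.14 integer back to a float."""
    return raw / SCALE


def q_add(a: int, b: int) -> int:
    """Add two Q4.14 values; the binary point does not move."""
    return saturate(a + b)


def q_mul(a: int, b: int) -> int:
    """Multiply two Q4.14 values.

    The full product carries 28 fractional bits, so it is shifted right
    by 14 to return to Q4.14 (truncating toward negative infinity).
    """
    return saturate((a * b) >> FRAC_BITS)
```

In this model `q_mul(to_q(1.5), to_q(2.0))` converts back to `3.0`, while `q_mul(to_q(4.0), to_q(4.0))` saturates at `Q_MAX` because 16.0 does not fit in four integer bits. The trade-off is the usual fixed-point one: narrow words map well onto DSP multipliers, at the cost of a limited range that the solver has to stay inside.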
A trading system that trains and infers on the FPGA itself, while the market moves.
- Live Binance order-book ingestion feeding a microstructure feature pipeline, producing 2-second price forecasts at 57.6% live directional accuracy.
- ~18x average and 68x peak speedup vs NumPy, with online learning via incremental outer-product weight updates on-chip, so there is no retrain-and-redeploy loop.
- Owned the full path from raw market data to hardware acceleration to live decisioning on PYNQ-Z1.
- Repo: Information-Processing/trading_indicators
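
The "incremental outer-product weight update" above can be illustrated with a small software model. This is a hedged sketch of one common rule of that shape (the LMS / delta rule for a linear model), written in plain Python; the source does not state which update rule, learning rate, or feature layout the on-chip engine uses, so `predict`, `online_update`, and `lr` are hypothetical names chosen for the example.

```python
# Online learning with an outer-product weight update (delta / LMS rule).
# ASSUMPTION: a linear model y = W x; the actual on-chip model is not
# described in this README and may differ.

from typing import List

Matrix = List[List[float]]
Vector = List[float]


def predict(W: Matrix, x: Vector) -> Vector:
    """Compute y = W x for a weight matrix stored as a list of rows."""
    return [sum(w * x_j for w, x_j in zip(row, x)) for row in W]


def online_update(W: Matrix, x: Vector, target: Vector, lr: float = 0.1) -> Vector:
    """Apply one incremental update W += lr * outer(error, x) in place.

    Returns the prediction error measured before the update, so a caller
    can track accuracy while the model keeps learning.
    """
    y = predict(W, x)
    error = [t - y_i for t, y_i in zip(target, y)]
    for i, e in enumerate(error):
        for j, x_j in enumerate(x):
            W[i][j] += lr * e * x_j   # element (i, j) of the outer product
    return error
```

Each new sample touches every weight exactly once, with one multiply-accumulate per weight. That regular structure is why updates of this form suit a hardware pipeline, and why the model can adapt sample by sample without a separate retraining step.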
A compiler that turns real C into real assembly and passes 200+ functional tests.
- Lex/Yacc front end, full AST, scope and semantic analysis, and RISC-V code generation.
- Validated end-to-end through a Dockerized testbench over a large program suite.
- Repo: Compilers-Labs/langproc-cw
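
To show what the AST-to-RISC-V step involves, here is a toy code generator for integer expressions. It is an illustrative Python sketch, not the project's C++ implementation: the node classes, the stack-based evaluation scheme, and the choice of `t0`/`t1` as scratch registers are all assumptions, and it ignores the ABI's 16-byte stack alignment because it never makes a call.

```python
# Toy AST -> RISC-V (RV32IM) code generation for integer expressions.
# ASSUMPTION: hypothetical node types and a naive stack-machine strategy;
# the real compiler's AST and register usage are not shown in this README.

from dataclasses import dataclass
from typing import List, Union


@dataclass
class Num:
    value: int


@dataclass
class BinOp:
    op: str            # one of "+", "-", "*"
    left: "Expr"
    right: "Expr"


Expr = Union[Num, BinOp]

OPCODES = {"+": "add", "-": "sub", "*": "mul"}   # "mul" needs the M extension


def gen(node: Expr, out: List[str]) -> None:
    """Emit instructions that leave the value of `node` in register t0."""
    if isinstance(node, Num):
        out.append(f"li t0, {node.value}")
        return
    gen(node.left, out)                  # left operand -> t0
    out.append("addi sp, sp, -4")        # spill it to the stack
    out.append("sw t0, 0(sp)")
    gen(node.right, out)                 # right operand -> t0
    out.append("lw t1, 0(sp)")           # reload left operand into t1
    out.append("addi sp, sp, 4")
    out.append(f"{OPCODES[node.op]} t0, t1, t0")


def compile_return(expr: Expr) -> List[str]:
    """Emit a function body that returns the expression's value in a0."""
    out: List[str] = []
    gen(expr, out)
    out.append("mv a0, t0")
    out.append("ret")
    return out
```

Calling `compile_return(BinOp("+", Num(1), BinOp("*", Num(2), Num(3))))` yields a list of assembly lines that evaluate `1 + 2 * 3`. Spilling every intermediate to the stack is slow but always correct, which makes it a convenient baseline before adding real register allocation.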
A 5-stage RISC-V core with every hazard handled in hardware.
- Full data and control hazard resolution, a 2-way set-associative write-back cache, and dynamic branch prediction.
- RTL design in SystemVerilog, verified against a RISC-V instruction test suite.
- Repo: IAC-Group-2/Team-2-RISCV-CPU
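
Data hazard resolution in a 5-stage pipeline usually comes down to two decisions: where an operand should be forwarded from, and when a load forces a stall. The Python model below sketches the textbook version of that logic. It is not taken from the repo's RTL; the signal names and the string-valued mux selects are hypothetical.

```python
# Behavioural model of textbook forwarding and load-use stall logic
# for a classic 5-stage pipeline (IF, ID, EX, MEM, WB).
# ASSUMPTION: signal names are illustrative; the project's hazard unit
# is written in SystemVerilog and may be structured differently.


def forward_select(rs: int,
                   ex_mem_rd: int, ex_mem_regwrite: bool,
                   mem_wb_rd: int, mem_wb_regwrite: bool) -> str:
    """Choose the source for an EX-stage operand that reads register `rs`.

    The newest in-flight result wins, so EX/MEM is checked before MEM/WB.
    Register x0 is hard-wired to zero and is never forwarded.
    """
    if ex_mem_regwrite and ex_mem_rd != 0 and ex_mem_rd == rs:
        return "EX_MEM"
    if mem_wb_regwrite and mem_wb_rd != 0 and mem_wb_rd == rs:
        return "MEM_WB"
    return "REGFILE"


def load_use_stall(id_ex_memread: bool, id_ex_rd: int,
                   if_id_rs1: int, if_id_rs2: int) -> bool:
    """Return True when the instruction in ID must wait one cycle.

    A load's data is not available until after MEM, so an instruction
    that immediately consumes it cannot be satisfied by forwarding alone.
    """
    return bool(id_ex_memread
                and id_ex_rd != 0
                and id_ex_rd in (if_id_rs1, if_id_rs2))
```

For example, if the instruction in MEM writes `x5` and the instruction in EX reads `x5`, `forward_select` picks the EX/MEM path. If the producer is a load still in EX, `load_use_stall` reports that a bubble is needed first.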
- Squarepoint Capital: incoming Software Developer Intern (quant tech), summer 2026.
- SOTA AI: Computer Vision Engineer. Rebuilt the core image-generation pipeline for 90% higher first-pass success and 88% faster turnaround, and unlocked text rendering on curved surfaces and rotated fonts for a 30% larger addressable market.
- Imperial College London: UTA for Digital Electronics & Computer Architecture, supporting 100+ students on memory systems, low-level architecture, and pipelined CPU design.
- Ideatec Software: led a 7-person team building backend APIs and an AI orchestration model that cut manual work 40% across 30+ workflows, shipped a RAG retrieval system with embeddings, and presented at the APEC SME & Venture Expo.
- ROK Army (Satellite Ops): root-caused an 18-month satellite comms blackout to a power-starved encryption module and a mis-tuned broadcast channel.
- Black Medal, top 8 of 165, South East Asia Mathematics Competition
- Top 70 globally, IMC Prosperity Algorithmic Trading Competition
- Top-3 team globally, World Mathematics Competition (problem-solving and cipher-breaking)
- Winner, Macquarie Hackathon 2025
- Selected for YC Startup School 2026, San Francisco (from 30,000+ applicants)
- Languages: Python · C++ · C · SystemVerilog · RISC-V Assembly
- Hardware / FPGA: Vivado · Vitis · AXI / AXI-DMA · PYNQ · fixed-point arithmetic · pipelining · RTL design
- ML / Data: NumPy · RAG · embeddings · pHash
- Tools: Linux · Git · AWS · FastAPI · LangChain · Jira
Currently most interested in hardware/software co-design for real-time systems, low-latency trading infrastructure, and compiler and CPU architecture.

