I build governed data and AI platforms: lakehouse architecture, RAG and LLM systems, and responsible-AI workflows. 15+ years in higher education, including ~10 years architecting the data infrastructure behind teaching, advising, and research at UC Berkeley.
Open to senior data & AI architecture, engineering, data science, and leadership roles.
I build end-to-end reference implementations — each taking one architecture or technique from data through to a working, governed system.
Flagship builds
- Campus RAG Assistant — production-style enterprise RAG platform with a pluggable, multi-cloud provider registry for LLMs, retrieval, and vector stores that supports mixed deployments. Provider options include AWS Bedrock + OpenSearch, Azure OpenAI + Azure AI Search, and a mock mode. Also features LangGraph orchestration, RAGAS evaluation, LangSmith tracing, and full CI/CD with GitHub Actions. Built with Next.js and FastAPI. Documentation →
- Scribe IQ — governed clinical-documentation AI: note-grounded RAG with enforced citations, structured note generation, and a first-class audit trail. Built entirely on synthetic data (Synthea + public clinical-note datasets). Documentation →
- Fabric HLS Readmission Lakehouse — Microsoft Fabric-native medallion lakehouse and ML scoring for synthetic hospital-readmission analytics, with an explicit Databricks-to-Fabric pattern mapping. (Repository publishing soon.)
Data science & ML samples (in progress)
Focused builds isolating a single technique end-to-end:
UC Berkeley Educational Technology Services (@ets-berkeley-edu)
Contributor to the campus data and learning-platform ecosystem:
- boac — Berkeley Online Advising (BOA), the award-winning academic advising platform
- nessie — data pipeline and analytics engine
- chabot - RTL GenAI chatbot platform for support use cases (limited pilot)
- data-loch — AWS data lake infrastructure for learning data
- cloud-lrs — cloud-based Learning Record Store
- cloudlrs-ingest-microservice - learning-event processing microservice for the LRS
- bcourses-chatbot-poc — GenAI support chatbot proof of concept for internal training and tutorials
Apereo Learning Analytics Initiative
Analytics Liaison and Community Coordinator for the Apereo Foundation Learning Analytics Initiative. Contributor to its open-source learning-analytics projects:
- LearningAnalyticsProcessor - open-source, Java-based analytics workflow manager for running Pentaho-based data-integration and ML pipelines; the first automation of the Open Academic Analytics Initiative (OAAI) research.
Lakehouse & Data Architecture · RAG / LLM Systems · MLOps & Observability ·
Responsible AI & Governance · AWS · Azure · Python · Spark
- 15+ years of experience in higher education.
- ~10 years at UC Berkeley building governed, cloud-native data and AI/ML platforms, including the foundational RTL Data Lake and the data systems behind the award-winning Berkeley Online Advising (BOA) platform.
- Lead architect on enterprise data lakes and lakehouses; designed data mesh architectures that support domain ownership and seamless connectivity across campus.
- Developed a high-throughput, multi-tenant streaming platform processing an average of ~5M events/day.
- Data science and ML/NLP work supporting research enablement, including an MLOps pipeline for reproducible research.
- Now working on the campus's governed GenAI efforts: building production-style knowledge assistants grounded in institutional data, with responsible-AI audit and provenance trails.
- Earlier, at Marist University, I led the Gates Foundation-funded Open Academic Analytics Initiative (OAAI) research, working with principal investigators to build open-source academic early-alert risk models.
- Scaled the research into production deployments across multiple institutions.
- Ten peer-reviewed publications in learning analytics and educational data mining.


