Organizations

@Apereo-Learning-Analytics-Initiative

sandeep-jay/README.md

Sandeep Jayaprakash

Data & AI Platform Architect & Engineering Leader

I build governed data and AI platforms: lakehouse architecture, RAG and LLM systems, and responsible-AI workflows. 15+ years in higher education, including ~10 years architecting the data infrastructure behind teaching, advising, and research at UC Berkeley.

Open to senior data & AI architecture, engineering, data science, and leadership roles.


Projects & reference implementations

I build end-to-end reference implementations — each taking one architecture or technique from data through to a working, governed system.

Flagship builds

  • Campus RAG Assistant: production-style enterprise RAG platform with a pluggable multi-cloud provider registry (LLM, RAG, and vector store) that supports mixed deployments. Providers include AWS Bedrock + OpenSearch, Azure OpenAI + AI Search, and a mock mode. Also includes LangGraph orchestration, RAGAS evaluation, LangSmith tracing, and full CI/CD using GitHub Actions. Implemented with Next.js + FastAPI. Documentation →
  • Scribe IQ — governed clinical-documentation AI: note-grounded RAG with enforced citations, structured note generation, and a first-class audit trail. Built entirely on synthetic data (Synthea + public clinical-note datasets). Documentation →
  • Fabric HLS Readmission Lakehouse — Microsoft Fabric-native medallion lakehouse and ML scoring for synthetic hospital-readmission analytics, with an explicit Databricks-to-Fabric pattern mapping. (Repository publishing soon.)

Data science & ML samples (in progress)

Focused builds isolating a single technique end-to-end:


Open source & community

UC Berkeley Educational Technology Services (@ets-berkeley-edu)

Contributor to the campus data and learning-platform ecosystem:

  • boac — Berkeley Online Advising (BOA), the award-winning academic advising platform
  • nessie — data pipeline and analytics engine
  • chabot - RTL GenAI chatbot platform for support use cases (limited pilot)
  • data-loch — AWS data lake infrastructure for learning data
  • cloud-lrs — cloud-based Learning Record Store
  • cloudlrs-ingest-microservice - learning-events processing microservice for the LRS
  • bcourses-chatbot-poc — GenAI support-chatbot for internal training/tutorial purposes

Apereo Learning Analytics Initiative

Analytics Liaison and Community Coordinator for the Apereo Foundation Learning Analytics Initiative. Contributor to the campus data and learning-platform ecosystem:

  • LearningAnalyticsProcessor - An open-source, Java-based analytics workflow manager that runs Pentaho-based data integration and ML pipelines. First automation of the OAAI research.

Focus areas

Lakehouse & Data Architecture · RAG / LLM Systems · MLOps & Observability · Responsible AI & Governance · AWS · Azure · Python · Spark

Background

  • 15+ years of experience in higher education

  • ~10 years at UC Berkeley building governed, cloud-native data and AI/ML platforms, including the foundational RTL Data Lake and the data systems behind the award-winning Berkeley Online Advising.

  • Lead architect on enterprise data lake and lakehouse platforms; built data mesh architectures to support domain ownership and seamless connectivity across campus.

  • Developed a high-throughput multi-tenant streaming platform processing an average of ~5M events/day.

  • Data science and ML/NLP work in support of research enablement; built an MLOps pipeline supporting reproducible research.

  • Now working on the campus's governed GenAI efforts, building production-style knowledge assistants grounded in institutional data, with responsible-AI audit and provenance trails.

  • Earlier, at Marist University, led the Gates Foundation-funded Open Academic Analytics Initiative (OAAI) research, working with principal investigators to build open-source academic early-alert risk models.

  • Scaled research to multi-institution production deployments.

  • Ten peer-reviewed publications in Learning Analytics and Educational Data Mining.

Connect

LinkedIn · Google Scholar

Pinned

  1. campus-rag-assistant Public

    Production-minded multicloud AI platform for governed campus knowledge: RAG, LangGraph orchestration, AWS/Azure providers, cited answers, evals, and cloud-ready ops.

    Python

  2. scribe-iq Public

    Grounded clinical documentation AI prototype built on a synthetic Synthea patient spine, public clinical note corpora, RAG, pgvector, FastAPI, Next.js, AWS/Azure LLM providers, and governed LLM aud…

    Python

  3. nessie Public

    Forked from ets-berkeley-edu/nessie

    Networked engines supply statistics in education.

    Python

  4. boac Public

    Forked from ets-berkeley-edu/boac

    Berkeley Online Advising (BOA) ✈️

    Python

  5. ets-berkeley-edu/data-loch Public

    AWS Data Lake infrastructure to store and process Learning Data

    JavaScript 5 6