Skip to content

Latest commit

 

History

History
1375 lines (1064 loc) · 63.5 KB

File metadata and controls

1375 lines (1064 loc) · 63.5 KB

🔥 FireScope SF — Hackathon Master Context Document

Databricks Fire Risk Prevention & Response Optimization Engine

Problem Statement 3 | Complete Strategic & Technical Reference

Version: 1.0 | Last Updated: April 11, 2026 Status: Active hackathon build — use this as the single source of truth for all decisions


TABLE OF CONTENTS

  1. Hackathon Problem Statement — Decoded
  2. Strategic Analysis — Winner's Framework
  3. What Most Teams Will Build vs What We Build
  4. Real-World Precedents — What Already Exists
  5. Firebird Deep Dive — Our Primary Inspiration
  6. Updated Architecture — Dual-Granularity Risk Engine
  7. Databricks Platform — Low-Level Technical Plan
  8. Gold Tables — Full Schema
  9. ML Strategy — Formula + Challenger Model
  10. Geospatial Command Center — UI Design
  11. Demo Strategy — Three Clicks to Win
  12. Judge Psychology & Objections
  13. 24-Hour Execution Plan
  14. 10 Differentiators Over Prior Systems
  15. Winning Narrative & Pitch

1. HACKATHON PROBLEM STATEMENT — DECODED

Plain-English Summary

You are given 5 years of real San Francisco Fire Department operational data across five open datasets. In 14 days (hackathon time), build a production-ready Databricks lakehouse that helps the fire department:

  1. Predict where fires and losses are most likely to happen (risk scoring)
  2. Deploy units and inspection resources smarter (response optimization)
  3. Close compliance gaps faster (violation tracking and inspection prioritization)

Why This Problem Is Unsolved Today

SFFD currently schedules inspections using a combination of complaint-driven dispatch, fixed annual cycles by property type, and inspector discretion. There is no cross-dataset risk scoring system — violations, incidents, call history, and permit status are tracked in four separate tools with no unified property identity. The result: a battalion chief cannot answer "which 20 buildings in my district need inspection this week" using data. This project builds that answer for the first time.

The Five Datasets (SF Open Data / Socrata)

Dataset Records Socrata ID Primary Use
Fire Incidents 250K+ wr8u-xric Historical fire events, loss, type
Calls for Service 1M+ nuek-vuh3 Dispatch, response times, unit deployment
Fire Violations 150K+ fey4-yub6 Open/closed violations per property
Fire Inspections 80K+ wb4c-6hwj Inspection history, pass/fail, recency
Fire Permits 50K+ 893e-7rap Permit type (occupancy risk proxy)

Required Deliverables (Explicit)

  • 5+ Gold Delta tables with a clean medallion architecture
  • Risk prediction layer (per property or area)
  • Response optimization layer (unit/station coverage)
  • Compliance tracking system
  • Operational dashboard dataset
  • Full documentation and notebooks

Explicit Goal

Ship an integrated Databricks-native platform with medallion architecture, ML risk scoring, response optimization, compliance analytics, and an operational dashboard that a fire chief or prevention lead could actually use.

Hidden Goal

Prove you can think like a real enterprise data platform team:

  • Strong data governance (Unity Catalog, lineage)
  • Reproducible ML (MLflow, model registry)
  • Orchestrated pipelines (Workflows/DLT)
  • Production-grade Gold tables, not just notebook outputs
  • A solution pattern that generalizes to other cities and risk domains

Likely Judging Intent

  1. Platform depth: Did you use Databricks the way a real customer would — medallion, UC, MLflow, Model Serving, Workflows?
  2. Operational value: Are your insights actionable to an overworked fire chief, not just impressive to a data scientist?
  3. Generalizability: Does this look like a reusable pattern for other cities and risk domains, not a one-off hack?
  4. Demo magic: Can you create a "wow" moment in under 90 seconds?

Ambiguities Converted to Strategic Opportunities

Ambiguity Strategic Opportunity
"Risk" is undefined Define a composite "Fire Harm Index" — you own the narrative
"Response optimization" is vague Use NFPA 1710 4-minute travel standard as a concrete, authoritative benchmark
No explicit fairness requirement Add fairness/coverage analysis proactively — massive governance cred
No LLM requirement (stretch) Add a narrow, governed explanation assistant — a RAG system with zero free-form generation: it only composes sentences using structured values already in gold.property_risk_twin. No external knowledge, no hallucination surface. Input: property_id. Output: a templated 3-sentence plain-English risk summary grounded entirely in Gold table columns. Evaluated using a custom mlflow.evaluate() metric checking for field-grounding.
No multi-city requirement Show the pattern generalizes — same medallion, different city, same Gold schema

2. STRATEGIC ANALYSIS — WINNER'S FRAMEWORK

Business Value

The painful, expensive, frequent, urgent problem:

  • US structure fires: 494,000/year causing 2,800 deaths and $9.8B in property damage (2014 NFPA data)
  • Municipal fire departments operate with flat budgets and staffing while risk grows with climate, urban densification, and aging infrastructure
  • SFFD conducts thousands of commercial inspections/year but has no data-driven prioritization — gut feel and tradition dominate
  • Inspection resources are finite; high-risk properties fall through the cracks
  • San Francisco has ~7,500 commercial properties requiring annual fire inspections; SFFD has historically completed ~4,000–5,000/year — a structural gap of 30–40%
  • The 2016 Ghost Ship warehouse fire (Oakland, 36 deaths) — 30 miles from SF — was a property with unresolved violations and zero inspections in years: exactly the "dark property" pattern our system targets
  • SF's dense SoMa/Mission/Tenderloin neighborhoods combine aging building stock, high occupancy, and historically understaffed inspection coverage — the highest-risk confluence in the dataset

The ROI argument:

  • Preventing one major structural fire (multi-million loss + injuries) pays for the entire Databricks platform several times over
  • Dark properties (buildings never inspected despite active permits) represent the greatest unaddressed risk

User Pull — Who Cares Enough to Adopt

User Pain What They Get
Fire Prevention Officer "Which buildings do I inspect this week?" Ranked inspection priority queue with formula-based explanations
Battalion Chief "Where are my coverage gaps?" NFPA compliance map by station area
Compliance Lead "Which violations never get closed?" Compliance funnel: violations → inspections → closures
City Executive "Can I defend our resource allocation?" Fairness/coverage chart by district
Analytics Lead "Is our model still accurate?" Drift monitoring dashboard

Databricks Fit — Why This Is Better on Databricks

  1. Scale: 1M+ calls-for-service records require distributed Spark processing, not serial pandas
  2. Geospatial: H3 indexing in Databricks (via Mosaic or h3-py) enables property-to-hex aggregation at city scale
  3. Governance: Unity Catalog provides lineage from raw Socrata to Gold risk table — essential for a public agency with audit requirements
  4. ML lifecycle: MLflow experiment tracking, model registry, and champion/challenger comparisons are native
  5. Orchestration: Databricks Workflows connects Bronze → Silver → Gold → scoring → dashboard refresh as a governed pipeline
  6. Reusability: Medallion pattern + UC schema can be lifted to any city with open fire data

Technical Depth Opportunities

  • Dual-granularity H3 + property entity architecture
  • Nearest-neighbor distance features (not just raw counts)
  • Spatial cross-validation (not random k-fold — statistically wrong for geographic data)
  • Weighted formula primary + MLflow-tracked ML challenger
  • PSI-style feature drift monitoring over time windows
  • Fairness/coverage analysis by district

3. WHAT MOST TEAMS WILL BUILD VS WHAT WE BUILD

What Most Teams Will Do (losing approach)

  • One or two dashboards: incident heatmaps, average response time by district, top violator properties
  • A basic XGBoost risk score with random k-fold CV
  • No entity resolution across datasets (each dataset analyzed in isolation)
  • No Unity Catalog governance, minimal MLflow tracking
  • No drift monitoring, no fairness analysis
  • No deployment or model registry
  • A "pretty map" with random colored dots and no decision layer
  • A generic chatbot on top of the data with no evaluation

What the Top 1% Team Does (our approach)

  • Treats this as a reusable "Public Safety Risk Lakehouse" pattern that generalizes to dropout, credit risk, and other cities
  • Builds a property-centric "Fire Risk Twin" with per-property risk scores, explanations, and proactive recommendations
  • Implements dual-granularity architecture: H3 cells for the map surface, property entities for the inspection queue
  • Discovers "dark properties" — buildings never inspected despite active permits or open violations
  • Uses NFPA 1710 4-minute travel standard as a concrete, legally-grounded benchmark for response optimization
  • Adds spatial cross-validation, nearest-neighbor distance features, and weighted explainable formula
  • Deploys a geospatial command center with 5 layered overlays, time slider, and drill-down panel
  • Governs everything in Unity Catalog with automatic lineage, access policies, and drift/fairness metrics -- Surfaces permit expiry risk — properties with expired fire safety permits that have not renewed are a prospective leading indicator not used by any cited prior system; derivable entirely from bronze.sf_permits with zero external data

4. REAL-WORLD PRECEDENTS — WHAT ALREADY EXISTS

System 1: FDNY FireCast (New York City) — Production, Most Advanced

Status: Active, in production since 2010 (currently v3.0) Scale: 330,000 buildings, 17 city agencies, 7,500 risk factors, daily score updates

Architecture:

  • Layer 1: Data warehouse pulling nightly ETL from 17 agencies (Dept of Buildings, Finance, Health, DEP, 311, etc.)
  • Layer 2: FireCast risk scoring engine — 7,500 variables, proprietary ML, daily updates
  • Layer 3: RBIS operational interface — ranked inspection lists, building profiles, 26 inspection types, closed-loop feedback

Key challenges FDNY faced (lessons for us):

Challenge Takeaway
Data silos across 17 agencies, no shared ID Our Silver entity resolution layer solves this
"Crying wolf" — early models over-predicted risk, inspectors lost trust Calibrate scores: only ~5-10% of properties should be CRITICAL
Cultural resistance from veteran inspectors Frame as "decision support" not "decision replacement" — show the formula, not just the score
Class imbalance (~99% of buildings never had a fire) Use weighted scoring formula + class_weight='balanced' in ML challenger
Model bias on demographic features Use building-level features only, add fairness/coverage audit

Key stats for our presentation:

  • 330,000 buildings analyzed
  • Increased serious violation detection by ~25% over manual scheduling
  • 32,000 building inspections/year prioritized by FireCast

System 2: Firebird (Atlanta) — Open Source, KDD 2016

Status: Published 2016, open-source MIT, peer-reviewed at KDD 2016 Team: Georgia Tech Data Science for Social Good (DSSG) + Atlanta Fire Rescue Dept NFPA recognition: Highlighted as national best practice for data-driven fire inspections

Key result: 71.36% TPR at 20% FPR, discovered 6,096 previously unknown commercial properties

Three repos:

  1. DSSG-Firebird/property-joins — Python, dual-key address join (proximity + fuzzywuzzy)
  2. DSSG-Firebird/risk-model — R, SVM/RF with time-based validation
  3. DSSG-Firebird/interactive-map — JS, Mapbox + D3.js

The dual-key join pattern:


Step 1: Geocode all addresses → lat/lon (Google Geocoding API)
Step 2: Spatial filter — only consider records within ~50 meter tolerance
Step 3: Fuzzy name match — fuzzywuzzy string similarity on business names
Step 4: Take best match above threshold → assign canonical PropertyID
Step 5: Flag source presence (in_incidents, in_violations, in_inspections, in_permits)
Step 6: Dark property discovery — permit=True AND inspections=False → never inspected

58 features used: Building age, size, construction materials, building use type (SIC code), historical fire count, inspection results, violation count, business license type, parcel condition, floors, station proximity, socioeconomic indicators, and 46 more engineered features.

Validation: Time-based — train on first 4 years, test on final year, 10 bootstrapped samples

Key lessons:

  • Address matching is hard — ~15% missed even with their approach. Accept imperfection, use district-level fallback.

  • Our fallback strategy: if Levenshtein join confidence < 0.7 threshold, fall back to H3 cell-level district aggregation. District-level scores (33 districts in SF) still power all four deliverables and lose only the per-property ranking precision. Log match quality as a Silver metadata column: match_confidence (0.0–1.0) so judges can see the join quality directly.

  • SVM = accurate but black box. Inspectors wanted to know WHY. Weighted formula is better for adoption.

  • Missing data: convert NAs to explicit "NA" category for categorical, impute median for continuous — never drop.

System 3: Pittsburgh Fire Risk Analysis — Government-Maintained Open Source

Status: Active, maintained by City of Pittsburgh, CMU Metro21 partnership GitHub: CityofPittsburgh/fire_risk_analysis

Full pipeline in one shell script:


getdata.py → riskmodel.py → merger.py → R Shiny Dashboard

What Pittsburgh maps to our project:

  • getdata.py → Our Bronze ingest notebooks
  • riskmodel.py → Our MLflow experiment + Gold risk scoring
  • merger.py → Our Gold table joins
  • R Shiny Dashboard → Our geospatial command center

System 4: PredictingFireRisk (San Francisco) — USES EXACT SAME DATA

Status: Published 2020, educational/academic workshop by Ken Steif (UPenn) GitHub: urbanSpatial/PredictingFireRisk Critical: Uses San Francisco Fire Incidents (wr8u-xric) — the EXACT same dataset as ours

Key innovations we steal:

1. Fishnet / Grid-cell spatial unit of analysis

Instead of only property-level modeling, the SF system creates 500ft × 500ft grid cells (we use H3 hexagons instead). This gives a continuous risk surface — even areas with no known properties still have a risk level based on surrounding evidence.

2. Nearest-neighbor distance features (NOT raw counts)


vacants.nn   → Average distance to 3 nearest vacant parcels
graf.nn      → Average distance to 5 nearest graffiti reports

Why better: A property with zero violations next to a cluster of highly-violated buildings is still at risk. Raw count within buffer misses this. Distance to nearest hazard captures area-level exposure.

In our implementation:


nearest_violation_h3_dist  → Distance to nearest H3 cell with open violations
nearest_incident_h3_dist   → Distance to nearest H3 cell with fire incidents

3. "Latent risk" framing

"Fire risk is greater than actual fires observed."

  • Properties never inspected have lower apparent incident rates even if their structural risk is high
  • Standard supervised learning systematically underrates dark properties
  • Our weighted formula + dark-property penalty corrects for this observation bias
  • This framing immediately impresses ML-savvy judges

4. Spatial cross-validation

  • Standard k-fold CV is statistically wrong for spatial data (spatially autocorrelated neighbors leak information)
  • Use neighborhood-based folds — train on some neighborhoods, validate on others
  • Even mentioning this scores major points with ML-engineer judges

5. Urban blight signals fire risk

SF system used 311 graffiti reports as blight proxies. We don't need external data — our five datasets contain better, fire-specific blight signals:

  • Open violation density by area
  • Low permit activity vs high incident density (ratio)
  • High call-to-incident ratio (nuisance/deterioration signal)
  • Inspection recency gaps

System 5: SFFD Response Time Predictor — Uses Same SF Calls Dataset

GitHub: areevesman/SFFD-response-times Language: Python + Apache Spark Uses: SFFD Calls for Service (nuek-vuh3) — SAME dataset as ours

Key finding: Response time ≠ just distance. Traffic, time of day, and simultaneous call volume matter MORE than distance.

What we borrow:

  • Timestamp parsing logic for SFFD call data
  • Response time calculation: total_response_min = (on_scene_ts - received_ts)
  • Travel time specifically: travel_time_min = (on_scene_ts - dispatch_ts) — compare to NFPA 4-minute standard
  • Finding: time-of-day and concurrent call volume are the biggest compliance risk factors

NFPA 1710 Standard — Our Benchmark Authority

Component Benchmark Notes
Travel Time (1st Engine) 240 seconds (4 minutes) Compare on_scene_ts - dispatch_ts
Full Alarm Assignment 480 seconds (8 minutes) Full response complement
Compliance measurement 90th percentile Not average — 90th percentile compliance

Formula:

travel_time_min = (on_scene_dttm - dispatch_dttm).total_seconds() / 60.0
nfpa_pass = (travel_time_min <= 4.0)
nfpa_compliance_rate = nfpa_pass.mean()  # % of calls meeting 4-min standard

This transforms "response optimization" from a vague concept into a concrete, legally-grounded, auditable metric.


5. FIREBIRD DEEP DIVE — OUR PRIMARY INSPIRATION

Direct Translation: Firebird (Atlanta 2016) → Our Databricks Build (SF 2026)

Firebird Component Our Databricks Equivalent Why Better
address_to_xy_geocoding.py (serial Python) Silver transform: PySpark address normalization at scale Distributed, 100x faster
fuzzywuzzy dual-key join PySpark levenshtein() join with H3 spatial bucketing Distributed, governed in UC
Flat CSV property list silver.property_entity_master Delta table Versioned, lineaged, queryable
Property_short_list_generator.py gold.property_risk_twin SHAP factors added, UC governed
SVM.R with time-based validation MLflow experiment: XGBoost + spatial CV Tracked, reproducible, champion/challenger
Manual 1-10 score mapping by visual inspection PR-curve threshold optimization Data-driven, not eyeballed
Risk score CSV output Gold table + optional Model Serving endpoint SQL-queryable, REST-servable
Static Leaflet/GIS map H3-backed geospatial command center Live, layered, temporal playback
No fairness analysis Explicit fairness Gold table: risk vs coverage by district Production-grade trust layer

What Firebird Did NOT Have (Our Competitive Differentiators)

  1. No medallion architecture — flat CSVs and R scripts. We have Bronze → Silver → Gold with Delta, DLT, Workflows.
  2. No model governance — no registry, no versioning, no champion/challenger. We have MLflow + UC Model Registry.
  3. No drift detection — static model built once. We add PSI-style drift metrics tracked over time.
  4. No fairness analysis — paper acknowledged bias concerns but didn't instrument them. We have a Fairness Gold table.
  5. No lineage or audit trail — no governance at all. We get UC automatic lineage.
  6. No explanations — output was a score, no per-property reasons. We add formula decomposition + SHAP factors.
  7. No H3 indexing — used lat/lon points only. We build H3 hex cells for scalable, layered rendering.
  8. No dual granularity — only property-level. We have H3 macro surface + property micro layer simultaneously.
  9. No NFPA benchmark — no response time analysis at all. We compute NFPA 1710 compliance per station area.
  10. No cross-dataset dark property discovery — their join found new properties. We do the same for SF but flag dark properties by type.

The Dark Property Discovery Pattern (Peer-Validated)

Atlanta: 6,096 properties discovered that AFRD didn't know about. Pittsburgh: Similar gaps found. San Francisco: We expect hundreds to thousands of "dark properties."

Four types of dark properties we will discover:

# Type 1: Has active permit, never inspected
dark_type_1 = properties where in_permits=True AND total_inspections=0

# Type 2: Has open violations, no follow-up inspection
dark_type_2 = properties where open_violation_count > 0 AND inspections_after_violation=0

# Type 3: Has incident history, no permit on file
dark_type_3 = properties where incident_count > 0 AND in_permits=False

# Type 4: Appears in 3+ datasets but has never been flagged for inspection
dark_type_4 = properties where source_count >= 3 AND inspection_priority_rank IS NULL

The demo moment: "These 600+ buildings have active business licenses and zero inspections in five years. The department doesn't even know they exist in their inspection queue."


6. UPDATED ARCHITECTURE — DUAL-GRANULARITY RISK ENGINE

Core Innovation

The single biggest architectural decision that separates us from all prior systems:

We run two parallel granularities simultaneously:

Layer Granularity Unit Primary Purpose Databricks Table
Macro H3 Resolution 9 (~175m hex) H3 cell Map surface, hotspot detection, NFPA coverage analysis gold.h3_risk_surface
Micro Property entity Canonical address/building Inspection priority queue, action card, dark property flag gold.property_risk_twin

Why this wins:

  • Property level alone: can't render a smooth risk surface on a map (too many points, irregular shapes)
  • H3 level alone: can't give an inspector a specific building address with specific reasoning
  • Both together: the map is the navigation, the property card is the action. This is exactly what operational systems like FDNY RBIS do.

Full Medallion Architecture

SF Open Data (Socrata)
  5 datasets × ~500K-1M total records
         │
         ▼
┌─────────────────────────────────────────────┐
│              BRONZE LAYER                    │
│   sf_fire.bronze.*                           │
│                                              │
│   bronze.sf_incidents      (250K records)   │
│   bronze.sf_calls          (1M+ records)    │
│   bronze.sf_violations     (150K records)   │
│   bronze.sf_inspections    (80K records)    │
│   bronze.sf_permits        (50K records)    │
│                                              │
│   All: Delta managed tables, append-only    │
│   Partitioned by date field                 │
│   Unity Catalog governed                    │
│   Auto Loader or batch Spark ingest         │
└──────────────────┬──────────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────────┐
│              SILVER LAYER                    │
│   sf_fire.silver.*                           │
│                                              │
│   silver.address_entity_master              │
│     → Canonical PropertyID via dual-key     │
│       join (H3 spatial + Levenshtein name)  │
│     → Source presence flags for 5 datasets  │
│     → lat/lon, h3_index (resolution 9)      │
│     → dark_property_type (1-4)              │
│                                              │
│   silver.property_features                  │
│     → Per-property: incident_count_3yr,    │
│       call_frequency_12mo, violation_count, │
│       inspection_failure_rate,              │
│       days_since_last_inspection,           │
│       permit_type_risk_tier,                │
│       source_count (1-5 confidence)         │
│                                              │
│   silver.h3_features                        │
│     → Per H3 cell: fire_count_3yr,          │
│       call_density_12mo, open_violations,   │
│       nearest_violation_h3_dist,  ← NEW     │
│       nearest_incident_h3_dist,   ← NEW     │
│       local_moran_i (spatial autocorr)      │
│       mean_days_since_inspection            │
│                                              │
│   silver.response_performance               │
│     → Per call: travel_time_min,            │
│       nfpa_pass (≤4min), time_of_day,       │
│       concurrent_calls, station_id          │
│     → Per station: compliance_rate,         │
│       p90_travel_time, avg_travel_time      │
└──────────────────┬──────────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────────┐
│               GOLD LAYER                     │
│   sf_fire.gold.*                             │
│                                              │
│   gold.property_risk_twin          ← CORE   │
│   gold.h3_risk_surface             ← NEW    │
│   gold.dark_property_discovery     ← NEW    │
│   gold.nfpa_response_compliance    ← NEW    │
│   gold.inspection_priority_queue            │
│   gold.compliance_funnel                    │
│   gold.fairness_coverage                    │
│   gold.model_health_drift                   │
└──────────────────┬──────────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────────┐
│         ML + SERVING LAYER                   │
│                                              │
│   MLflow Experiments:                        │
│     → Formula baseline (interpretable)      │
│     → RF/XGBoost challenger model           │
│     → Spatial CV validation metrics         │
│     → SHAP feature importance               │
│                                              │
│   UC Model Registry:                         │
│     → Champion: weighted formula            │
│     → Challenger: XGBoost                   │
│                                              │
│   Model Serving (optional):                  │
│     → score_property(address_id) endpoint   │
│     → explain_property(address_id) endpoint │
└──────────────────┬──────────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────────┐
│      ORCHESTRATION + GOVERNANCE              │
│                                              │
│   Databricks Workflows:                      │
│     → Weekly pipeline: Bronze→Silver→Gold   │
│     → Drift monitoring refresh              │
│     → Dashboard dataset refresh             │
│                                              │
│   Unity Catalog:                             │
│     → All tables governed under             │
│       sf_fire_prod.{bronze,silver,gold}      │
│     → Automatic lineage from Socrata        │
│       to Gold risk tables                   │
│     → Row/column-level access control       │
│                                              │
│   Monitoring:                                │
│     → gold.model_health_drift               │
│     → Alerts on feature drift / data lag    │
└──────────────────┬──────────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────────┐
│       GEOSPATIAL COMMAND CENTER UI           │
│                                              │
│   Full-screen SF map (Mapbox / deck.gl)     │
│   Backed by Gold tables                     │
│   5 layered overlays                        │
│   Property drill-down panel                 │
│   NFPA compliance overlay                   │
│   Dark property discovery layer             │
│   Time slider + drift view                  │
│   Trust/health strip                        │
└─────────────────────────────────────────────┘

7. DATABRICKS PLATFORM — LOW-LEVEL TECHNICAL PLAN

Unity Catalog Setup

# Catalog + schema structure
spark.sql("CREATE CATALOG IF NOT EXISTS sf_fire_prod")
spark.sql("CREATE SCHEMA IF NOT EXISTS sf_fire_prod.bronze")
spark.sql("CREATE SCHEMA IF NOT EXISTS sf_fire_prod.silver")
spark.sql("CREATE SCHEMA IF NOT EXISTS sf_fire_prod.gold")
spark.sql("CREATE SCHEMA IF NOT EXISTS sf_fire_prod.ml")

All tables created as sf_fire_prod.{layer}.{table_name} — UC-managed Delta tables with automatic lineage.

Bronze Ingest Pattern

# Auto Loader for incremental ingest (or batch for hackathon)
from pyspark.sql.functions import current_timestamp

df = (spark.read
    .format("csv")
    .option("header", "true")
    .option("inferSchema", "true")
    .load("/Volumes/sf_fire_prod/landing/sf_incidents/")
    .withColumn("_ingest_timestamp", current_timestamp()))

df.write.format("delta") \
    .mode("append") \
    .partitionBy("incident_date") \
    .saveAsTable("sf_fire_prod.bronze.sf_incidents")

Silver: Address Entity Resolution (Firebird Pattern, Distributed)

from pyspark.sql.functions import levenshtein, greatest, length, col, row_number
from pyspark.sql.window import Window
import h3

# Step 1: Add H3 index to all datasets (resolution 9 = ~175m cells)
h3_udf = udf(lambda lat, lon: h3.geo_to_h3(lat, lon, 9), StringType())

incidents_h3 = incidents_silver.withColumn("h3_index", h3_udf(col("latitude"), col("longitude")))
violations_h3 = violations_silver.withColumn("h3_index", h3_udf(col("address_lat"), col("address_lon")))
# ... same for all 5 datasets

# Step 2: Spatial bucket join — only consider rows in same H3 cell
cross = incidents_h3.join(permits_h3, on="h3_index")

# Step 3: Fuzzy name similarity
cross = cross.withColumn("name_sim",
    1 - levenshtein(col("business_name_a"), col("business_name_b")) /
    greatest(length(col("business_name_a")), length(col("business_name_b"))))

# Step 4: Best match per entity
w = Window.partitionBy("incident_entity_id").orderBy(col("name_sim").desc())
best = cross.withColumn("rank", row_number().over(w)).filter(col("rank") == 1)

Silver: H3 Nearest-Neighbor Distance Features (PredictingFireRisk Pattern)

# For each H3 cell, find distance to 3 nearest cells with open violations
# This is the key innovation from PredictingFireRisk

from pyspark.sql.functions import pandas_udf
import h3

@pandas_udf("double")
def nearest_violation_h3_dist(h3_indices: pd.Series, violation_h3_set: pd.Series) -> pd.Series:
    """Compute average distance to k nearest H3 cells with violations."""
    violation_cells = set(violation_h3_set)
    k = 3
    distances = []
    for h3_idx in h3_indices:
        rings = []
        for ring in range(1, 15):  # expand rings outward
            neighbors = h3.k_ring(h3_idx, ring) - h3.k_ring(h3_idx, ring-1)
            found = neighbors & violation_cells
            if found:
                rings.extend([ring] * len(found))
                if len(rings) >= k:
                    break
        distances.append(sum(rings[:k]) / k if rings else 15.0)
    return pd.Series(distances)

Silver: NFPA Response Performance

from pyspark.sql.functions import col, unix_timestamp, when

calls = spark.table("sf_fire_prod.silver.sf_calls")

response_df = calls.withColumn(
    "travel_time_min",
    (unix_timestamp("on_scene_dttm") - unix_timestamp("dispatch_dttm")) / 60.0
).withColumn(
    "total_response_min",
    (unix_timestamp("on_scene_dttm") - unix_timestamp("received_dttm")) / 60.0
).withColumn(
    "nfpa_pass",
    when(col("travel_time_min") <= 4.0, 1).otherwise(0)
).withColumn(
    "dispatch_delay_min",
    (unix_timestamp("dispatch_dttm") - unix_timestamp("received_dttm")) / 60.0
)

# Per-station NFPA compliance (90th percentile — NFPA measures at P90)
station_compliance = response_df.groupBy("station_area").agg(
    F.avg("travel_time_min").alias("avg_travel_time_min"),
    F.expr("percentile(travel_time_min, 0.90)").alias("p90_travel_time_min"),
    F.avg("nfpa_pass").alias("nfpa_compliance_rate"),
    F.count("*").alias("total_calls")
)

MLflow Experiment Pattern

import mlflow
import mlflow.sklearn
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

mlflow.set_experiment("/sf-fire/risk-model-challenger")

with mlflow.start_run(run_name="rf_spatial_cv_v1"):
    mlflow.log_param("model_type", "RandomForestClassifier")
    mlflow.log_param("n_estimators", 200)
    mlflow.log_param("validation", "spatial_neighborhood_cv")
    mlflow.log_param("temporal_split", "train:2016-2022 test:2023")
    
    model = RandomForestClassifier(n_estimators=200, class_weight='balanced')
    
    # CRITICAL: Use spatial CV, not random k-fold
    # Split by neighborhood to prevent spatial data leakage
    cv_scores = spatial_cross_val_score(model, X_train, y_train, neighborhoods)
    
    mlflow.log_metric("cv_auc_mean", cv_scores.mean())
    mlflow.log_metric("cv_auc_std", cv_scores.std())
    mlflow.log_metric("temporal_test_auc", test_auc)
    mlflow.log_metric("tpr_at_20pct_fpr", tpr)
    
    # Log SHAP values
    import shap
    explainer = shap.TreeExplainer(model)
    shap_values = explainer.shap_values(X_test)
    mlflow.log_artifact("shap_summary.png")
    
    # Register model in UC Model Registry
    mlflow.sklearn.log_model(
        model,
        "fire_risk_challenger",
        registered_model_name="sf_fire_prod.ml.fire_risk_model"
    )

Databricks Workflows Orchestration

# Workflow: sf_fire_weekly_refresh
tasks:
  - task_key: bronze_ingest
    notebook_path: /notebooks/01_bronze_ingest
    
  - task_key: silver_entity_resolution
    depends_on: [bronze_ingest]
    notebook_path: /notebooks/02_silver_entity_resolution
    
  - task_key: silver_h3_features
    depends_on: [silver_entity_resolution]
    notebook_path: /notebooks/03_silver_h3_features
    
  - task_key: silver_response_performance
    depends_on: [bronze_ingest]
    notebook_path: /notebooks/04_silver_response_performance
    
  - task_key: gold_risk_twin
    depends_on: [silver_entity_resolution, silver_h3_features]
    notebook_path: /notebooks/05_gold_risk_twin
    
  - task_key: gold_h3_surface
    depends_on: [silver_h3_features]
    notebook_path: /notebooks/06_gold_h3_surface
    
  - task_key: gold_nfpa_compliance
    depends_on: [silver_response_performance]
    notebook_path: /notebooks/07_gold_nfpa_compliance
    
  - task_key: gold_dark_properties
    depends_on: [gold_risk_twin]
    notebook_path: /notebooks/08_gold_dark_properties
    
  - task_key: gold_model_health
    depends_on: [gold_risk_twin, gold_h3_surface]
    notebook_path: /notebooks/09_gold_model_health_drift

8. GOLD TABLES — FULL SCHEMA

gold.property_risk_twin (Primary Output)

CREATE TABLE sf_fire_prod.gold.property_risk_twin (
  property_id           STRING,      -- Canonical entity ID from Silver join
  canonical_address     STRING,      -- Normalized address
  latitude              DOUBLE,
  longitude             DOUBLE,
  h3_index              STRING,      -- H3 resolution 9 cell membership
  district              STRING,
  neighborhood          STRING,
  
  -- Primary Score (Weighted Formula)
  weighted_risk_score   DOUBLE,      -- 0-100 composite score
  risk_tier             STRING,      -- LOW / MEDIUM / HIGH / CRITICAL
  
  -- Formula Decomposition (Explainability)
  violation_score       DOUBLE,      -- Contribution from violations (0-30)
  incident_score        DOUBLE,      -- Contribution from incidents (0-25)
  inspection_score      DOUBLE,      -- Contribution from inspection gaps (0-25)
  call_score            DOUBLE,      -- Contribution from call frequency (0-10)
  permit_type_score     DOUBLE,      -- Contribution from permit type (0-10)
  
  -- Secondary Score (MLflow Challenger)
  ml_risk_probability   DOUBLE,      -- XGBoost probability 0-1
  ml_risk_tier          STRING,      -- Challenger tier
  
  -- SHAP Explanations
  top_factor_1          STRING,      -- "2 structure fires in 3 years"
  top_factor_2          STRING,      -- "3 open violations unresolved"
  top_factor_3          STRING,      -- "847 days since last inspection"
  
  -- Operational Fields
  open_violation_count  INT,
  severe_violation_count INT,
  days_since_inspection INT,
  total_inspections     INT,
  inspection_pass_rate  DOUBLE,
  incident_count_3yr    INT,
  permit_type           STRING,
  permit_type_risk_tier STRING,
  
  -- Dark Property Flags
  is_dark_property      BOOLEAN,     -- Never been inspected
  dark_property_type    INT,         -- 1-4 (see discovery types above)
  
  -- Action Layer
  recommended_action    STRING,      -- "Priority Inspection" / "Escalate Violation" / "Verify Permit"
  inspection_priority_rank INT,      -- Rank within district
  
  -- Source Confidence
  source_count          INT,         -- How many of 5 datasets contain this entity (1-5)
  
  -- Metadata
  score_computed_at     TIMESTAMP,
  model_version         STRING
)
USING DELTA
PARTITIONED BY (district)

gold.h3_risk_surface (Geospatial Map Layer)

CREATE TABLE sf_fire_prod.gold.h3_risk_surface (
  h3_index                  STRING,   -- H3 resolution 9
  h3_center_lat             DOUBLE,
  h3_center_lon             DOUBLE,
  district                  STRING,
  neighborhood              STRING,
  
  -- Aggregated Risk Metrics
  weighted_risk_score_avg   DOUBLE,   -- Avg property risk in cell
  max_property_risk         DOUBLE,   -- Max risk in cell (worst case)
  critical_property_count   INT,      -- Properties with risk_tier=CRITICAL
  high_risk_property_count  INT,
  
  -- Incident Signal
  fire_count_3yr            INT,
  fire_count_1yr            INT,
  call_density_12mo         DOUBLE,
  
  -- Compliance Signal
  open_violation_count      INT,
  severe_violation_count    INT,
  mean_days_since_inspection DOUBLE,
  
  -- Nearest Neighbor Distance Features (PredictingFireRisk pattern)
  nearest_violation_h3_dist INT,     -- Rings to nearest violation cell
  nearest_incident_h3_dist  INT,     -- Rings to nearest incident cell
  
  -- Spatial Autocorrelation
  local_moran_i             DOUBLE,  -- Hotspot indicator
  
  -- NFPA Coverage
  nfpa_compliant_pct        DOUBLE,  -- % of calls in cell meeting 4-min standard
  station_coverage_gap      BOOLEAN, -- TRUE if beyond 4-min from nearest station
  nearest_station_id        STRING,
  
  -- Combined Crisis Score (HIGH RISK + NFPA FAIL = crisis zone)
  is_crisis_zone            BOOLEAN,
  
  -- Temporal Snapshots (for time slider)
  snapshot_month            DATE,
  
  -- Metadata
  computed_at               TIMESTAMP
)
USING DELTA
PARTITIONED BY (snapshot_month)

gold.dark_property_discovery (Anchor Demo Moment)

CREATE TABLE sf_fire_prod.gold.dark_property_discovery (
  property_id           STRING,
  canonical_address     STRING,
  latitude              DOUBLE,
  longitude             DOUBLE,
  h3_index              STRING,
  district              STRING,
  
  dark_property_type    INT,         -- 1: permit+no inspection, 2: violations+no followup
                                     -- 3: incidents+no permit, 4: 3+ sources+never queued
  dark_reason           STRING,      -- Human-readable explanation
  
  weighted_risk_score   DOUBLE,      -- Despite being dark, their risk score
  risk_tier             STRING,
  years_without_inspection INT,
  open_violation_count  INT,
  incident_count_3yr    INT,
  source_count          INT,
  
  permit_type           STRING,
  business_type         STRING,
  
  recommended_action    STRING,
  urgency_rank          INT
)
USING DELTA

gold.nfpa_response_compliance (NFPA 1710 Standard)

CREATE TABLE sf_fire_prod.gold.nfpa_response_compliance (
  station_id            STRING,
  station_area          STRING,
  district              STRING,
  station_lat           DOUBLE,
  station_lon           DOUBLE,
  
  -- NFPA 1710 Compliance (90th percentile standard)
  total_calls           INT,
  nfpa_compliance_rate  DOUBLE,      -- % calls with travel_time ≤ 4.0 min
  p90_travel_time_min   DOUBLE,      -- 90th percentile travel time
  avg_travel_time_min   DOUBLE,
  avg_dispatch_delay_min DOUBLE,
  
  -- Time-of-Day Breakdown
  peak_hour_compliance  DOUBLE,      -- 7am-7pm compliance rate
  offpeak_compliance    DOUBLE,
  
  -- Call Volume Effect
  low_volume_compliance DOUBLE,      -- Compliance when few concurrent calls
  high_volume_compliance DOUBLE,     -- Compliance during multi-call periods
  
  -- Risk Exposure in Coverage Area
  high_risk_properties_count INT,
  critical_properties_count  INT,
  dark_properties_count      INT,
  h3_cells_outside_nfpa      INT,    -- H3 cells with high risk beyond 4-min
  
  -- Snapshot
  analysis_period_start DATE,
  analysis_period_end   DATE,
  computed_at           TIMESTAMP
)
USING DELTA

gold.compliance_funnel (Violations → Closures)

CREATE TABLE sf_fire_prod.gold.compliance_funnel (
  district              STRING,
  neighborhood          STRING,
  risk_tier             STRING,
  
  total_violations      INT,
  open_violations       INT,
  closed_violations     INT,
  
  violations_inspected  INT,         -- Violations that triggered an inspection
  violations_uninspected INT,        -- Violations with no follow-up inspection
  
  avg_days_to_closure   DOUBLE,
  median_days_to_closure DOUBLE,
  closure_rate          DOUBLE,
  
  severe_open_violations INT,
  repeat_offender_properties INT,    -- Properties with 3+ violations
  
  snapshot_month        DATE,
  computed_at           TIMESTAMP
)
USING DELTA
PARTITIONED BY (district)

gold.fairness_coverage (Equity/Governance Layer)

CREATE TABLE sf_fire_prod.gold.fairness_coverage (
  district              STRING,
  neighborhood          STRING,
  
  -- Risk-side
  avg_risk_score        DOUBLE,
  high_risk_property_count INT,
  critical_property_count  INT,
  expected_inspections_by_risk INT,  -- Proportional to risk share
  
  -- Inspection-side
  actual_inspections    INT,
  inspection_coverage_rate DOUBLE,
  
  -- Coverage gap
  coverage_gap          DOUBLE,      -- actual - expected (negative = under-inspected)
  coverage_gap_pct      DOUBLE,      -- % under/over relative to risk
  under_inspected       BOOLEAN,
  over_inspected        BOOLEAN,
  
  -- Trust flags
  fairness_flag         STRING,      -- "UNDER_SERVED" / "OVER_SERVED" / "BALANCED"
  
  computed_at           TIMESTAMP
)
USING DELTA

gold.model_health_drift (Monitoring Layer)

CREATE TABLE sf_fire_prod.gold.model_health_drift (
  feature_name          STRING,
  snapshot_month        DATE,
  
  -- Distribution stats
  mean_value            DOUBLE,
  p25_value             DOUBLE,
  p50_value             DOUBLE,
  p75_value             DOUBLE,
  p95_value             DOUBLE,
  
  -- PSI-style drift metric (vs baseline period)
  psi_score             DOUBLE,      -- Population Stability Index
  drift_status          STRING,      -- "STABLE" (PSI<0.1) / "MONITOR" / "DRIFT"
  
  -- Model performance (where labels available)
  auc_monthly           DOUBLE,
  calibration_error     DOUBLE,
  
  -- Data freshness
  data_lag_days         INT,         -- How stale is our Bronze data?
  
  model_version         STRING,
  computed_at           TIMESTAMP
)
USING DELTA
PARTITIONED BY (feature_name)

9. ML STRATEGY — FORMULA + CHALLENGER MODEL

Primary: Weighted Fire Risk Score (Explainable, Adoption-Safe)

Why formula-first, not ML-first:

  • FDNY's biggest challenge: "When everything is 'high risk,' inspectors get alarm fatigue and stop trusting the system"
  • Cultural resistance: "I know my district better than any algorithm"
  • Solution adopted by all successful real-world systems: interpretable primary output + ML as validation

The formula:

def compute_fire_risk_score(row):
    # Component 1: Violation Score (0-30 points)
    violation_score = min(
        row['severe_violation_count'] * 10 +
        row['open_violation_count'] * 4 +
        (10 if row['violations_uninspected'] > 0 else 0),
        30
    )
    
    # Component 2: Incident Score (0-25 points)
    incident_score = min(
        row['structure_fire_count_3yr'] * 10 +
        row['other_incident_count_3yr'] * 2,
        25
    )
    
    # Component 3: Inspection Score (0-25 points)
    inspection_recency = min(row['days_since_last_inspection'] / 365 * 8, 15)
    inspection_failure = row['inspection_failure_count'] * 5
    inspection_score = min(inspection_recency + inspection_failure, 25)
    
    # Component 4: Call Frequency Score (0-10 points)
    call_score = min(row['call_frequency_12mo'] / 5 * 10, 10)
    
    # Component 5: Permit Type Risk (0-10 points)
    permit_type_lookup = {
        'Assembly/Restaurant': 10, 'Industrial/Warehouse': 9,
        'Mixed Use': 7, 'Retail': 5, 'Office': 3, 'Residential': 2
    }
    permit_score = permit_type_lookup.get(row['permit_type'], 5)
    
    # Component 6: Permit Expiry Risk (0-8 points) — NEW SIGNAL
    permit_expired = (row['permit_status'] == 'Expired') and (row['days_since_permit_expiry'] > 0)
    permit_expiry_score = min(row['days_since_permit_expiry'] / 180 * 8, 8) if permit_expired else 0
    # Dark property penalty — addresses latent risk undercount
    dark_penalty = 8 if row['is_dark_property'] else 0
    
    total = violation_score + incident_score + inspection_score + call_score + permit_score + dark_penalty
    return min(total, 100)

# Tier mapping
def risk_tier(score):
    if score >= 81: return "CRITICAL"
    elif score >= 61: return "HIGH"
    elif score >= 31: return "MEDIUM"
    else: return "LOW"

Temporal Risk Decay — Time-Aware Scoring

A static score ignores that risk grows with time. We apply an exponential decay to the inspection recency component: a building last inspected 1 year ago scores moderately; one last inspected 3 years ago scores near-maximum on that component. This is not modeled by any cited prior system.

import math

def inspection_recency_decay(days_since_inspection: int, max_score: int = 15) -> float:
    """
    Exponential decay: score grows toward max_score as days increase.
    Half-max reached at ~365 days, near-max at ~730 days.
    """
    lambda_decay = 0.0019  # tunable: controls how fast score rises
    return max_score * (1 - math.exp(-lambda_decay * days_since_inspection))

This replaces the linear days_since_last_inspection / 365 * 8 term in Component 3 with a more realistic decay curve. Log lambda_decay as an MLflow parameter so it's tunable without code changes.

Formula output in the demo:

"Property at 123 Mission St scored 82/100 — CRITICAL because:

  • 2 structure fires in 3 years (+20 pts)
  • 3 open severe violations unresolved (+25 pts)
  • 847 days since last inspection (+14 pts)
  • Restaurant permit type (+10 pts)
  • Dark property — never in inspection queue (+8 pts)
  • High call frequency in area (+5 pts)"

Secondary: MLflow XGBoost Challenger

# Temporal split — train on earlier years, test on most recent
# CRITICAL: Spatial cross-validation, NOT random k-fold
# Spatial CV prevents leakage from spatially autocorrelated neighbors

from sklearn.model_selection import GroupKFold

# Groups = neighborhoods — ensure no neighborhood appears in both train and test
gkf = GroupKFold(n_splits=5)
neighborhood_groups = X_train['neighborhood'].values

cv_scores = cross_val_score(
    xgb_model, X_train, y_train,
    groups=neighborhood_groups,
    cv=gkf,
    scoring='roc_auc'
)

# SHAP for feature importance → feeds back into Gold table
explainer = shap.TreeExplainer(xgb_model)
shap_values = explainer.shap_values(X_test)
# Top 3 features per property stored in gold.property_risk_twin.top_factor_*

Champion vs. Challenger in MLflow (Judge Candy)

MLflow Experiment: sf_fire_prod/risk-model-comparison

Run 1: "formula_baseline"
  → AUC: computed by ranking formula scores and evaluating against labels
  → TPR at 20% FPR: target > 60%
  → Explainability: FULL
  → Latency: ~0ms (pure formula)

Run 2: "xgboost_spatial_cv"
  → AUC: expected 0.72-0.78
  → TPR at 20% FPR: target > 65% (Firebird achieved 71%)
  → Explainability: SHAP-based, post-hoc
  → Spatial CV AUC: ~0.68-0.74 (lower than random CV = honest)

Registered in Model Registry:
  Champion: formula_baseline (production)
  Challenger: xgboost_spatial_cv (staging)

10. GEOSPATIAL COMMAND CENTER — UI DESIGN

Core Design Decision

Not a "world map." Not a generic dashboard with a map widget. A city-scale operational cockpit where the map is the primary canvas and every layer maps to one decision.

Dark theme, full-screen, command center aesthetic. This is the "world monitor style" but scoped to San Francisco.

Screen Layout

┌────────────────────────────────────────────────────────────────────┐
│  🔥 FireScope SF        [LAYERS]  [TIME: ▶ 2024-01] [⚠ DRIFT OK]  │  ← Top bar
│─────────────────────────────────────────────────────────────────── │
│                                        │                           │
│   LEFT SIDEBAR                         │  MAIN MAP CANVAS          │
│   ──────────────                       │  (Full height SF map)     │
│   LAYER TOGGLES:                       │                           │
│   ☑ Risk Surface (H3 hex)             │  H3 hex heatmap            │
│   ☑ Dark Properties                   │  (red = CRITICAL)         │
│   ☑ NFPA Coverage Gaps                │                           │
│   ☐ Incident Points                   │  Station coverage halos   │
│   ☐ Violation Severity                │                           │
│                                        │  Dark property markers    │
│   FILTERS:                             │  (pulsing red dots)       │
│   District: [All ▼]                   │                           │
│   Risk Tier: [All ▼]                  │  NFPA gap zones           │
│   Property Type: [All ▼]              │  (hatched overlay)        │
│                                        │                           │
│   SCENARIO:                            │                           │
│   ☐ What-if: Relocate Station         │                           │
│                                        │                           │
│────────────────────────────────────────│───────────────────────────│
│  RIGHT PANEL (on click)                                            │
│  ─────────────────────                                             │
│  [Property/District card appears here]                             │
│  Risk score decomposition, open violations,                        │
│  days since last inspection, NFPA status,                          │
│  recommended action button                                         │
│────────────────────────────────────────────────────────────────────│
│  BOTTOM STRIP                                                      │
│  Timeline: ◄ 2022 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2024 ►         │
│  KPIs: Critical Properties: 847 | NFPA Gaps: 23 cells | Dark: 634 │
└────────────────────────────────────────────────────────────────────┘

Five Layers — What Each Means

Layer Data Source Decision It Enables
Risk Surface (H3 hex heatmap) gold.h3_risk_surface Where to focus prevention resources
Dark Properties (pulsing red markers) gold.dark_property_discovery Which buildings to add to inspection queue immediately
NFPA Coverage Gaps (hatched zones) gold.nfpa_response_compliance Where to consider station/unit redeployment
Incident Points (historical dots) bronze.sf_incidents Where fires actually happened — model validation
Violation Severity (colored rings) gold.property_risk_twin Where compliance enforcement is most urgent

Rule: Five layers maximum. More layers = cognitive overload = weak team signal.

The Right-Side Property Action Card

┌─────────────────────────────────────┐
│  123 Mission St, SoMa District      │
│  ████████████████████░░  82/100     │
│  🔴 CRITICAL                         │
│─────────────────────────────────────│
│  Why CRITICAL:                       │
│  -  2 structure fires (2021, 2023)   │
│  -  3 open severe violations         │
│  -  847 days since last inspection   │
│  -  Restaurant permit (high risk)    │
│  -  ⚠ DARK PROPERTY                  │
│─────────────────────────────────────│
│  NFPA Status: ⚠ 4.2 min (FAIL)     │
│  Nearest Station: Station 7         │
│─────────────────────────────────────│
│  [ 🔍 Priority Inspect ]            │
│  [ 📋 View Full History ]           │
└─────────────────────────────────────┘

Technology Stack

Backend: Databricks Gold tables (all data) Map rendering: Mapbox GL JS or deck.gl with H3 layer Property cards: JavaScript/React or simple HTML/CSS served from static build Data access: Either JDBC to Databricks SQL or pre-exported GeoJSON from Gold tables for hackathon speed H3 rendering: deck.gl H3HexagonLayer for the risk surface Kepler.gl alternative: Kepler (open-source, Uber) can load H3 data directly — fastest path for hackathon


11. DEMO STRATEGY — THREE CLICKS TO WIN

Opening Hook (1 sentence)

"Right now, San Francisco has hundreds of high-risk commercial buildings that no inspector has ever visited — and this command center shows exactly which ones they are, why they're dangerous, and where to send someone first."

Three-Click Demo Script

Click 1: The Macro View — "Where is the risk building?"

  • Open H3 risk surface layer
  • Show city-wide heatmap: red clusters in SoMa, Tenderloin, Mission
  • Overlay NFPA coverage gaps: "These hatched zones are high-risk AND beyond 4-minute response coverage — the crisis zones"
  • Animate time slider: "Risk has been building in this cluster over the past 18 months"

What this shows judges: H3 geospatial architecture, temporal analysis, NFPA benchmark integration

Click 2: The Dark Property Reveal — "What the department doesn't know"

  • Toggle on Dark Properties layer
  • 600+ pulsing red markers appear in unexpected locations
  • "These buildings have active business licenses or open violations — and zero inspections in three years. The department doesn't have them in any inspection queue."
  • Show the count: "Our entity resolution across all five datasets surfaced 634 previously uncrossed properties"

What this shows judges: Cross-dataset entity resolution, property discovery (Firebird-validated methodology), operational impact

Click 3: The Property Action — "Here's what to do about it"

  • Click one dark, high-risk property near a NFPA gap zone
  • Action card appears: 82/100 CRITICAL, formula breakdown visible
  • "This building is #3 on our priority list for next week — Restaurant, 847 days without inspection, 3 open violations, 4.2-minute response time from Station 7 (FAILS NFPA standard)"
  • Hit "Priority Inspect" button → shows what the updated inspection queue looks like
  • Brief flip to drift/fairness panel: "We're not targeting any district unfairly — inspection allocation aligns with risk share"

What this shows judges: Explainability, actionability, fairness governance, end-to-end platform

Demo Closing Statement

"If SFFD had used this engine last year, they would have had 634 additional high-risk properties in their inspection queue — properties the current system had no knowledge of. Any city with open fire data can stand this up on Databricks in days, not years. That's the platform argument."

Fallback Plan

If live demo fails at any point:

  • Pre-recorded 60-second screen capture of all three clicks
  • Static screenshots of each panel embedded in the deck
  • Databricks SQL dashboard as backup for Gold table metrics
  • All Gold tables are pre-computed — worst case, open a notebook and run display(spark.table("gold.property_risk_twin").orderBy("weighted_risk_score", ascending=False).limit(20))

12. JUDGE PSYCHOLOGY & OBJECTIONS

By Judge Type

Judge Type What They Care About What to Emphasize What to Avoid
ML Engineer Correct modeling choices, validation rigor Spatial CV (not random k-fold), latent risk framing, SHAP, formula vs. ML tradeoff Overclaiming accuracy, random splits for spatial data
Data Engineer Platform depth, architecture quality Medallion layers, UC lineage, Workflows orchestration, Auto Loader Flat files, no governance, no orchestration
Enterprise/Platform Judge Production readiness, scalability, generalizability UC governance, model registry, drift monitoring, multi-city pattern Prototype-only framing, missing governance
Product/Business Judge User value, ROI, clarity of impact Dark property discovery demo, NFPA compliance gaps, formula breakdown, $ROI estimate Technical architecture without business story
Domain Expert Fire safety accuracy, NFPA compliance, real-world feasibility NFPA 1710 standard, Firebird/FDNY citations, latent risk concept Fake precision, overclaiming prediction accuracy

10 Brutal Objections + Rebuttals

# Objection Rebuttal Fatal?
1 "This is just a fancy dashboard" Show MLflow registry, UC lineage, spatial CV validation, formula decomposition — the dashboard is 5% of the system Fatal if not addressed — must show platform depth
2 "Why Databricks? Any database could do this" H3 distributed spatial indexing, UC lineage, MLflow challenger framework, Workflows orchestration — all native Fatal if not addressed
3 "Your risk labels are weak — you can't predict fires reliably" We rank, not predict. Formula output is interpretable inspection priority, not fire probability. Firebird achieved 71% TPR at KDD 2016 with same approach. Manageable — acknowledge and contextualize
4 "What about response optimization? You mostly did prevention" NFPA compliance panel + crisis zone (high risk + NFPA fail) IS the response optimization layer — it's specific, benchmarked, and actionable Manageable with strong NFPA story
5 "This could be biased against certain neighborhoods" Show fairness/coverage Gold table: inspection allocation aligns with risk share by district, no structural over/under Manageable with one fairness chart
6 "Your model will drift over time" Show drift metrics Gold table, PSI scores, model freshness timestamp in health strip, champion/challenger framework ready for retraining Manageable — show it's designed for this
7 "Address matching is too messy — your joins are unreliable" Acknowledge it proactively. Show match quality metrics. Use conservative thresholds. Show source_count as confidence signal. Manageable — being honest scores more than pretending it's clean
8 "How does this generalize to other cities?" Same medallion pattern, same Gold schema, swap Bronze sources. Show Pittsburgh and Atlanta as peer-validated implementations. Manageable with one "Future Work" slide
9 "You're overcounting dark properties" Show the discovery logic explicitly — only flagging clear cases (permit + zero inspections, violations + zero follow-up). Conservative criteria, peer-validated by Firebird methodology. Manageable
10 "This isn't production-ready" Show Workflows, UC governance, model registry, drift monitoring. Acknowledge what's missing (CI/CD, real-time ingest). Frame as "production-oriented design, not production deployment" Manageable

13. 24-HOUR EXECUTION PLAN

Team Allocation (Assuming 3 Engineers)

Role Person Primary Responsibility
Data/Pipeline Lead Engineer A Bronze + Silver layers, entity resolution, H3 features, NFPA timestamps
ML/Gold Lead Engineer B Weighted formula, MLflow experiment, SHAP, Gold table scoring, drift/fairness
UI/Demo Lead Engineer C Geospatial command center, demo script, fallback screenshots

MVP Complexity Guard — H3 Nearest-Neighbor

⚠️ The full ring-expansion nearest-neighbor UDF (Section 7) is a Target/Stretch feature only.

In MVP (Hours 0–16), replace it with a simpler H3 k-ring buffer count:

# MVP version — same signal, 10x simpler, no loop
from pyspark.sql.functions import udf
from pyspark.sql.types import IntegerType
import h3

@udf(IntegerType())
def violation_count_in_kring(h3_idx, k=2):
    """Count violation H3 cells within k rings. No ring-expansion loop needed."""
    neighbors = h3.k_ring(h3_idx, k)
    return len(neighbors & violation_h3_set_broadcast.value)

Swap to the full ring-expansion UDF only after MVP is stable. Both produce the same directional insight; the full version just gives an exact distance number instead of a bucket count.