Shortly provides comprehensive monitoring capabilities through health check and metrics endpoints.
The health check endpoint monitors service availability and database connectivity.
URL: /api/health
Method: GET
Authentication: Not required (no credentials needed; external access is blocked by the nginx sidecar, as described later in this section)
Status Code: 200 OK
{
  "status": "healthy",
  "database": "ok"
}

Status Code: 500 Internal Server Error
{
  "status": "unhealthy",
  "database": "error"
}

The endpoint performs the following checks:
- Database Connectivity: Executes `SELECT COUNT(*) FROM _migrations` to verify:
  - SQLite database file is accessible
  - Database connection pool is working
  - Migration system is properly initialized
If any check fails, the endpoint returns a 500 error with details logged to the application logs.
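The check described above can be sketched in a few lines. This is an illustration only, not Shortly's actual implementation: it uses Python's standard `sqlite3` module, and the function name `check_health` and the in-memory databases are assumptions made for the example. The query and the two response bodies are the ones documented above.

```python
import sqlite3


def check_health(conn: sqlite3.Connection) -> tuple[int, dict[str, str]]:
    """Return (HTTP status, JSON body) for a health request."""
    try:
        # The probe query described in the documentation.
        conn.execute("SELECT COUNT(*) FROM _migrations").fetchone()
    except sqlite3.Error:
        # Unreachable file, broken connection, or missing migrations table.
        return 500, {"status": "unhealthy", "database": "error"}
    return 200, {"status": "healthy", "database": "ok"}


# Usage: in-memory databases stand in for the real SQLite file.
healthy_db = sqlite3.connect(":memory:")
healthy_db.execute("CREATE TABLE _migrations (version INTEGER PRIMARY KEY)")
empty_db = sqlite3.connect(":memory:")  # no _migrations table

print(check_health(healthy_db))
print(check_health(empty_db))
```

A single failing query is enough to flip the whole response to 500, which is what lets one endpoint cover file access, the connection pool, and migration state at once.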
# Check health status
curl http://localhost:8080/api/health
# Check with status code
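curl -w "\nHTTP Status: %{http_code}\n" http://localhost:8080/api/health
# Using wget
wget -q -O- http://localhost:8080/api/health
# Using HTTPie
http http://localhost:8080/api/health

The health endpoint is designed for Kubernetes liveness and readiness probes. The probe examples below use port 8080; where the nginx sidecar is enabled, the traffic flow described later in this section has probes reaching the application on port 8081, so set the probe port to match your deployment.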
The liveness probe detects whether the application is in a broken state and needs to be restarted:
livenessProbe:
  httpGet:
    path: /api/health
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 30
  timeoutSeconds: 5
  failureThreshold: 3

The readiness probe determines whether the application is ready to receive traffic:
readinessProbe:
  httpGet:
    path: /api/health
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3

The startup probe gives the application time to initialize before the liveness and readiness probes start:
startupProbe:
  httpGet:
    path: /api/health
    port: 8080
  initialDelaySeconds: 0
  periodSeconds: 2
  timeoutSeconds: 5
  failureThreshold: 15 # 30 seconds total

See Helm Chart Configuration for complete probe settings.
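The "30 seconds total" comment on the startup probe comes from multiplying the period by the failure threshold. The sketch below applies the same arithmetic to all three probes. The `Probe` class is a hypothetical helper written for this example, and the calculation is an approximation that ignores `timeoutSeconds` and the initial delay.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    initial_delay_seconds: int
    period_seconds: int
    failure_threshold: int

    def failure_window_seconds(self) -> int:
        """Approximate span of consecutive failures before Kubernetes acts."""
        return self.period_seconds * self.failure_threshold

    def checks_per_minute(self) -> float:
        """How often this probe calls the health endpoint in steady state."""
        return 60 / self.period_seconds


# Values copied from the probe definitions above.
liveness = Probe(initial_delay_seconds=10, period_seconds=30, failure_threshold=3)
readiness = Probe(initial_delay_seconds=5, period_seconds=10, failure_threshold=3)
startup = Probe(initial_delay_seconds=0, period_seconds=2, failure_threshold=15)

print(startup.failure_window_seconds())    # 30
print(liveness.failure_window_seconds())   # 90
print(readiness.failure_window_seconds())  # 30
print(liveness.checks_per_minute() + readiness.checks_per_minute())  # 8.0
```

With these settings a broken pod is restarted after roughly 90 seconds of failed liveness checks, while readiness removes it from service after roughly 30 seconds.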
The /api/health and /api/metrics endpoints are blocked from external access for security:
- External Access (via Ingress): Blocked by nginx sidecar (returns 403 Forbidden)
- Internal Access: Available via `shortly-metrics` service on port 8081
- Health Probes: Work normally (connect directly to pod IP, bypassing nginx)
When `config.metrics.enabled: true` is set in the Helm chart, a separate metrics service is created:
Service Details:
- Name: `{release-name}-shortly-metrics` (e.g., `shortly-metrics`)
- Port: 8081
- Endpoint: `/api/metrics` (`/api/health` is also available)
- Access: Internal cluster only (not exposed via ingress)
Traffic Flow:
External → Ingress → Service (80) → Nginx (8080) → BLOCKED (403)
Prometheus → Internal Service (8081) → App (8081) → ALLOWED (200)
K8s Probes → Pod IP (8081) → App → ALLOWED (200)
scrape_configs:
  - job_name: 'shortly'
    kubernetes_sd_configs:
      - role: service
        namespaces:
          names:
            - shortly
    relabel_configs:
      - source_labels: [__meta_kubernetes_service_name]
        regex: .*-metrics # matches the {release-name}-shortly-metrics service
        action: keep
      - source_labels: [__meta_kubernetes_service_name]
        target_label: service
    metrics_path: /api/metrics
    scrape_interval: 30s
    scrape_timeout: 10s

Enable the ServiceMonitor in the Helm chart values.yaml:
config:
  metrics:
    enabled: true
monitoring:
  serviceMonitor:
    enabled: true
    labels:
      prometheus: kube-prometheus # Match your Prometheus selector
    interval: "30s"
    scrapeTimeout: "10s"

The ServiceMonitor automatically targets the internal service and scrapes /api/metrics.
# From within the cluster
kubectl run -it --rm debug --image=curlimages/curl --restart=Never -- \
curl http://shortly-metrics.shortly.svc.cluster.local:8081/api/metrics
# Expected: 200 OK with Prometheus metrics
# Test external access (should be blocked)
curl https://shortly.example.com/api/metrics
# Expected: 403 Forbidden

You can also access metrics directly via pod IP (bypasses both service and nginx):
# Get pod IP
kubectl get pods -n shortly -o wide
# Access directly
curl http://<POD_IP>:8081/api/metrics

The blocking strategy relies on nginx sidecar configuration:
- Nginx sidecar enabled (default in production): External requests flow through Ingress → Service (80) → Nginx (8080) → App (8081). Nginx blocks `/api/health` and `/api/metrics` with `deny all`.
- Internal access: Requests via `shortly-metrics` service go directly to App (8081), bypassing nginx sidecar.
- Health probes: Kubernetes probes connect directly to pod IP:8081, bypassing both service and nginx.
Important: If config.nginx.enabled: false, the blocking will not be active. Ensure nginx sidecar is enabled in production for security.
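The access rules above reduce to one question: does the request pass through the nginx sidecar? The following sketch models that decision for the two monitoring paths. It is a toy model of the documented behaviour, not the nginx configuration itself, and the source labels `external`, `prometheus`, and `probe` are names invented for the example.

```python
MONITORING_PATHS = ("/api/health", "/api/metrics")


def passes_nginx(source: str, nginx_enabled: bool = True) -> bool:
    """Only external traffic arriving through the Ingress crosses the sidecar."""
    return source == "external" and nginx_enabled


def expected_status(source: str, path: str, nginx_enabled: bool = True) -> int:
    """Expected HTTP status for a monitoring path, per the traffic flow above."""
    if path not in MONITORING_PATHS:
        raise ValueError(f"not a monitoring path: {path}")
    return 403 if passes_nginx(source, nginx_enabled) else 200


for source in ("external", "prometheus", "probe"):
    print(source, expected_status(source, "/api/metrics"))

# With the sidecar disabled, external requests are no longer blocked.
print("external (nginx disabled)",
      expected_status("external", "/api/metrics", nginx_enabled=False))
```

The last call illustrates the warning above: without the sidecar nothing stands between the Ingress and the monitoring endpoints.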
Monitor the /api/health endpoint response time to detect performance degradation:
- Normal: < 50ms
- Warning: 50-200ms
- Critical: > 200ms
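As a rough illustration, the thresholds above can be applied to a measured response time as follows. The function names are hypothetical, and treating exactly 50 ms and 200 ms as "warning" is one reading of the ranges listed above.

```python
import time


def classify_latency(ms: float) -> str:
    """Map a response time in milliseconds to the documented severity bands."""
    if ms < 50:
        return "normal"
    if ms <= 200:
        return "warning"
    return "critical"


def timed_ms(check) -> float:
    """Run a zero-argument callable and return its wall-clock duration in ms."""
    start = time.perf_counter()
    check()
    return (time.perf_counter() - start) * 1000.0


# Usage: replace the lambda with a real request to /api/health.
elapsed = timed_ms(lambda: None)
print(classify_latency(elapsed))
```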
Set up external monitoring to track endpoint availability:
- Uptime checks: Every 1-5 minutes
- Alert threshold: 2+ consecutive failures
- Timeout: 5 seconds
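The "2+ consecutive failures" rule keeps a single dropped request from paging anyone. A minimal sketch of that logic, with a hypothetical `UptimeTracker` class standing in for whatever external monitoring tool is used:

```python
class UptimeTracker:
    """Fire an alert only after N consecutive failed checks."""

    def __init__(self, alert_after: int = 2) -> None:
        self.alert_after = alert_after
        self.consecutive_failures = 0

    def record(self, ok: bool) -> bool:
        """Record one check result; return True when an alert should fire."""
        # Any success resets the streak; a failure extends it.
        self.consecutive_failures = 0 if ok else self.consecutive_failures + 1
        return self.consecutive_failures >= self.alert_after


tracker = UptimeTracker(alert_after=2)
results = [True, False, True, False, False]
alerts = [tracker.record(ok) for ok in results]
print(alerts)  # [False, False, False, False, True]
```

The isolated failure in the second check does not alert; only the two back-to-back failures at the end do.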
The health endpoint indirectly monitors:
- SQLite file system availability
- Database file corruption
- Migration table integrity
If the database becomes unavailable, the endpoint will return 500 errors, triggering Kubernetes pod restarts.
Health check failures are logged with ERROR level:
ERROR Health check failed - database error: SqlxError(...)
Successful health checks generate DEBUG level logs (not logged by default to reduce noise).
The health check is designed to be lightweight:
- Query: Simple COUNT query on migrations table (typically < 10 rows)
- Connection: Uses existing connection pool (no new connections)
- Overhead: < 1ms per check on typical hardware
- Frequency: With the probe settings shown earlier, steady-state probes run about 8 times per minute per pod (liveness every 30 s, readiness every 10 s)
With WAL mode enabled on SQLite, health checks don't block writes.
Shortly exposes application metrics in Prometheus format for monitoring and observability.
URL: /api/metrics
Method: GET
Authentication: Not required (no credentials needed; external access is blocked by the nginx sidecar, as described earlier in this section)
Format: Prometheus text format (version 0.0.4)
| Metric Name | Type | Description |
|---|---|---|
| `shortly_urls_total` | Gauge | Total number of active URLs in the system |
| `shortly_urls_last_created_timestamp` | Gauge | Unix timestamp of the most recently created URL |
| `shortly_urls_custom_named_total` | Gauge | Number of URLs with custom names |
| `shortly_urls_expired_total` | Gauge | Number of expired URLs (where TTL has passed) |
| `shortly_urls_last_accessed_timestamp` | Gauge | Unix timestamp of the most recent URL access (redirect event) |
| `shortly_urls_deleted_last_24h` | Gauge | URLs deleted in the last 24 hours |
| `shortly_urls_ttl_hours` | Histogram | Distribution of URL TTL values in hours |
TTL Histogram Buckets: 1h, 6h, 12h, 24h, 48h, 72h, 168h (1 week), 336h (2 weeks), 720h (1 month)
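Prometheus histogram buckets are cumulative: each `le` bucket counts every observation at or below its bound. The sketch below shows how a list of TTL values would map onto the bucket bounds listed above. It illustrates the bucket semantics only; how Shortly computes the histogram internally is not specified here, and the sample TTL values are made up.

```python
import math
from bisect import bisect_left

# Upper bounds in hours, as listed above.
BUCKETS_HOURS = [1, 6, 12, 24, 48, 72, 168, 336, 720]


def cumulative_buckets(ttls_hours):
    """Return {upper_bound: number of TTLs <= upper_bound}, including +Inf."""
    per_bucket = [0] * (len(BUCKETS_HOURS) + 1)
    for ttl in ttls_hours:
        # bisect_left finds the first bound >= ttl, i.e. the smallest
        # bucket that contains the value; index 9 is the +Inf bucket.
        per_bucket[bisect_left(BUCKETS_HOURS, ttl)] += 1
    result = {}
    running = 0
    for bound, count in zip(BUCKETS_HOURS + [math.inf], per_bucket):
        running += count
        result[bound] = running
    return result


counts = cumulative_buckets([0.5, 1, 5, 24, 1000])
print(counts[1], counts[6], counts[24], counts[720], counts[math.inf])  # 2 3 4 4 5
```

The `+Inf` bucket always equals the total number of observations, which is why it matches `shortly_urls_ttl_hours_count` in the exposition format.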
| Metric Name | Type | Description |
|---|---|---|
| `shortly_users_total` | Gauge | Total number of registered users |
| `shortly_users_active_sessions` | Gauge | Number of active user sessions |
| `shortly_users_last_login_timestamp` | Gauge | Unix timestamp of the last user login |
Note: User metrics are only available when authentication is enabled in the configuration.
| Metric Name | Type | Labels | Description |
|---|---|---|---|
| `shortly_audit_events_total` | Counter | `event_type` | Total count of audit events by type |
| `shortly_audit_last_event_timestamp` | Gauge | `event_type` | Unix timestamp of last event for each type |
Event Types:
- `CreateUrl` - URL creation events
- `DeleteUrl` - URL deletion events
- `UserLogin` - User login events
- `UserLogout` - User logout events
- `UserQuotaUpdate` - User quota modification events
| Metric Name | Type | Description |
|---|---|---|
| `shortly_database_connection_pool_size` | Gauge | Total database connections in the pool |
| `shortly_database_connection_pool_idle` | Gauge | Idle database connections |
| Metric Name | Type | Labels | Description |
|---|---|---|---|
| `shortly_uptime_seconds` | Gauge | - | Application uptime in seconds since startup |
| `shortly_version_info` | Gauge | `version` | Application version (value is always 1) |
# HELP shortly_urls_total Total number of active URLs
# TYPE shortly_urls_total gauge
shortly_urls_total 1523
# HELP shortly_urls_last_created_timestamp Unix timestamp of last created URL
# TYPE shortly_urls_last_created_timestamp gauge
shortly_urls_last_created_timestamp 1735470123
# HELP shortly_urls_ttl_hours Distribution of URL TTL in hours
# TYPE shortly_urls_ttl_hours histogram
shortly_urls_ttl_hours_bucket{le="1"} 45
shortly_urls_ttl_hours_bucket{le="6"} 123
shortly_urls_ttl_hours_bucket{le="168"} 1211
shortly_urls_ttl_hours_bucket{le="+Inf"} 1523
shortly_urls_ttl_hours_sum 253467.5
shortly_urls_ttl_hours_count 1523
# HELP shortly_audit_events_total Total audit events by type
# TYPE shortly_audit_events_total counter
shortly_audit_events_total{event_type="CreateUrl"} 1523
shortly_audit_events_total{event_type="DeleteUrl"} 312
# HELP shortly_version_info Application version information
# TYPE shortly_version_info gauge
shortly_version_info{version="1.3.0"} 1
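For quick scripting against output like the sample above, the text format is simple enough to parse by hand. The following is a minimal sketch, not a complete parser: it skips comment lines, ignores an optional trailing timestamp, and does not unescape label values. A maintained Prometheus client library is a better choice for production use.

```python
import re

# name, optional {labels}, value, optional integer timestamp
SAMPLE_RE = re.compile(
    r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+-?\d+)?$'
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (metric_name, labels_dict, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line, or a HELP/TYPE comment
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # not a sample line this sketch understands
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


sample = """\
# HELP shortly_urls_total Total number of active URLs
# TYPE shortly_urls_total gauge
shortly_urls_total 1523
shortly_urls_ttl_hours_bucket{le="+Inf"} 1523
shortly_audit_events_total{event_type="DeleteUrl"} 312
shortly_version_info{version="1.3.0"} 1
"""

for name, labels, value in parse_metrics(sample):
    print(name, labels, value)
```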
# Fetch all metrics
curl http://localhost:8080/api/metrics
# Count total metrics
curl -s http://localhost:8080/api/metrics | grep "^shortly_" | wc -l
# Filter specific metric
curl -s http://localhost:8080/api/metrics | grep "shortly_urls_total"
# Using wget
wget -q -O- http://localhost:8080/api/metrics

Add the following to your Prometheus prometheus.yml:
scrape_configs:
  - job_name: 'shortly'
    scrape_interval: 30s
    scrape_timeout: 10s
    metrics_path: /api/metrics
    static_configs:
      - targets: ['localhost:8080']
        labels:
          environment: 'production'
          app: 'shortly'

For Prometheus Operator in Kubernetes:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: shortly
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: shortly
  endpoints:
    - port: http
      path: /api/metrics
      interval: 30s
      scrapeTimeout: 10s

Example PromQL queries for Grafana dashboards:
# Total URLs
shortly_urls_total
# URL creation rate (per hour)
rate(shortly_audit_events_total{event_type="CreateUrl"}[1h]) * 3600
# Registered users (total accounts, not only active ones)
shortly_users_total
# Database connections in use (pool size minus idle)
shortly_database_connection_pool_size - shortly_database_connection_pool_idle
# Uptime in days
shortly_uptime_seconds / 86400
# Last URL access time (Unix timestamp)
shortly_urls_last_accessed_timestamp
# 95th percentile TTL
histogram_quantile(0.95, shortly_urls_ttl_hours_bucket)
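To make the last query less opaque, here is a sketch of the estimate `histogram_quantile` produces from cumulative buckets: linear interpolation inside the bucket that contains the target rank, with the highest finite bound returned when the rank falls in the `+Inf` bucket. It is a simplified re-implementation for illustration, assumes `0 < q <= 1` and at least one finite bucket, and uses only the four buckets shown in the sample output earlier, so the numbers are illustrative rather than what a full nine-bucket histogram would give.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate a quantile from (upper_bound, cumulative_count) pairs.

    The pairs must be sorted by bound and end with the +Inf bucket.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0  # lowest bucket is assumed to start at 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Rank lies in the +Inf bucket: report the highest finite bound.
                return buckets[-2][0]
            if count == prev_count:
                return prev_bound  # empty bucket, nothing to interpolate
            # Linear interpolation inside the bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan


# Cumulative counts taken from the sample output earlier in this section.
ttl_buckets = [(1, 45), (6, 123), (168, 1211), (math.inf, 1523)]

print(histogram_quantile(0.95, ttl_buckets))            # 168
print(round(histogram_quantile(0.5, ttl_buckets), 2))   # 101.07
```

Because the 95th percentile lands in the `+Inf` bucket here, the result is capped at 168 hours. Quantile estimates are only as precise as the bucket layout.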
Example Prometheus alerting rules:
groups:
  - name: shortly_alerts
    interval: 30s
    rules:
      - alert: HighExpiredURLs
        expr: shortly_urls_expired_total{job="example-job-name"} > 1000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High number of expired URLs"
          description: "{{ $value }} URLs have expired"
      - alert: DatabasePoolExhausted
        expr: shortly_database_connection_pool_idle{job="example-job-name"} == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Database connection pool exhausted"
          description: "No idle database connections available"
      - alert: NoRecentURLAccess
        expr: time() - shortly_urls_last_accessed_timestamp{job="example-job-name"} > 86400
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "No URLs accessed in last 24 hours"
          description: "Service may not be receiving traffic"
      - alert: HighURLDeletionRate
        expr: shortly_urls_deleted_last_24h{job="example-job-name"} > 500
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High URL deletion rate"
          description: "{{ $value }} URLs deleted in last 24 hours"
      - alert: DatabasePoolUtilizationHigh
        expr: (shortly_database_connection_pool_size{job="example-job-name"} - shortly_database_connection_pool_idle{job="example-job-name"}) / shortly_database_connection_pool_size{job="example-job-name"} > 0.8
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Database connection pool utilization is high"
          description: "Pool utilization is {{ $value | humanizePercentage }}"
      - alert: ServiceDown
        expr: up{job="example-job-name"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Shortly service is down"
          description: "Shortly service has been down for more than 1 minute"

- Real-time: Metrics are collected from the database on each scrape request
- No caching: All values are freshly queried to ensure accuracy
- Performance: Optimized SQL queries use existing indexes
- Expected latency: < 50ms for typical datasets (< 10k URLs)
Metrics collection executes approximately 12 SQL queries per scrape when authentication is enabled:
- 6 queries for URL metrics
- 3 queries for user metrics (when auth enabled)
- 2 queries for audit metrics (grouped)
- 1 query for TTL histogram data
- 0 queries for database pool metrics (uses sqlx Pool API)
- 0 queries for system metrics (calculated from application state)
All queries leverage existing database indexes for optimal performance.
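The "grouped" audit queries mentioned above suggest one `GROUP BY` per metric rather than one query per event type. A sketch of that pattern with Python's `sqlite3` follows. The table name `audit_events` and its single column are assumptions made for this example, not Shortly's actual schema.

```python
import sqlite3


def audit_event_lines(conn: sqlite3.Connection) -> list[str]:
    """Render per-type audit counts as Prometheus sample lines."""
    # One grouped query covers every event type at once.
    rows = conn.execute(
        "SELECT event_type, COUNT(*) FROM audit_events "
        "GROUP BY event_type ORDER BY event_type"
    ).fetchall()
    return [
        f'shortly_audit_events_total{{event_type="{event_type}"}} {count}'
        for event_type, count in rows
    ]


# Usage with a hypothetical in-memory table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE audit_events (event_type TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO audit_events (event_type) VALUES (?)",
    [("CreateUrl",), ("CreateUrl",), ("DeleteUrl",)],
)
print("\n".join(audit_event_lines(conn)))
```

Grouping keeps the query count constant as new event types are added, which helps keep the per-scrape total low.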
Recommended: 15-60 seconds
- High frequency (15s): For production monitoring with quick alert detection
- Standard (30s): Balance between freshness and load
- Low frequency (60s): For development or low-traffic environments
- Metrics collection adds minimal overhead (< 10ms per scrape on typical hardware)
- SQLite WAL mode prevents metrics queries from blocking writes
- Connection pool is shared with application traffic
- No memory buildup (counters are recalculated from database each time)
If performance becomes a concern with large datasets:
- Caching: Implement 10-30 second cache for metrics
- Background collection: Use scheduler to pre-compute metrics
- Sampling: Sample histogram data instead of loading all TTL values
- Materialized views: Store pre-aggregated counts in dedicated tables
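Of the options above, short-lived caching is the simplest to picture. The sketch below wraps an arbitrary collection function in a time-based cache. The `CachedMetrics` class is hypothetical, it is not thread-safe as written, and the clock is injectable only so the behaviour can be demonstrated without waiting.

```python
import time


class CachedMetrics:
    """Serve a cached metrics payload until it is older than ttl_seconds."""

    def __init__(self, collect, ttl_seconds=15.0, clock=time.monotonic):
        self._collect = collect      # function that runs the SQL queries
        self._ttl = ttl_seconds
        self._clock = clock
        self._value = None
        self._expires_at = None

    def get(self):
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            # Cache is empty or stale: collect again and restart the timer.
            self._value = self._collect()
            self._expires_at = now + self._ttl
        return self._value


# Usage with a fake clock and a collector that counts its own calls.
calls = []
fake_now = [0.0]


def collect():
    calls.append(1)
    return f"shortly_urls_total {len(calls)}"


cache = CachedMetrics(collect, ttl_seconds=15.0, clock=lambda: fake_now[0])
first = cache.get()    # runs the collector
second = cache.get()   # served from cache
fake_now[0] = 20.0
third = cache.get()    # TTL expired, collector runs again
print(first, second, third, sep="\n")
```

The trade-off is freshness: with a cache in place, scraped values can lag the database by up to the TTL, so the "no caching" guarantee described earlier would no longer hold.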