Monitoring
genro-mail-proxy exposes Prometheus metrics and health endpoints for comprehensive monitoring and alerting.
Prometheus Metrics
The /metrics endpoint returns metrics in Prometheus text exposition format.
No authentication required for this endpoint—secure it via firewall rules
or reverse proxy.
curl http://localhost:8000/metrics
Available Metrics
All metrics use the gmp_ prefix (genro-mail-proxy):
Metric |
Type |
Description |
|---|---|---|
|
Counter |
Successfully sent emails (by |
|
Counter |
Permanently failed emails (by |
|
Counter |
Temporarily deferred emails (by |
|
Counter |
Rate limit enforcement events (by |
|
Gauge |
Current number of messages in queue |
Sample Output
# HELP gmp_sent_total Total sent emails
# TYPE gmp_sent_total counter
gmp_sent_total{account_id="primary-smtp"} 1523.0
gmp_sent_total{account_id="bulk-smtp"} 45678.0
# HELP gmp_errors_total Total send errors
# TYPE gmp_errors_total counter
gmp_errors_total{account_id="primary-smtp"} 12.0
gmp_errors_total{account_id="bulk-smtp"} 89.0
# HELP gmp_deferred_total Total deferred emails
# TYPE gmp_deferred_total counter
gmp_deferred_total{account_id="primary-smtp"} 5.0
gmp_deferred_total{account_id="bulk-smtp"} 234.0
# HELP gmp_rate_limited_total Total rate limited occurrences
# TYPE gmp_rate_limited_total counter
gmp_rate_limited_total{account_id="primary-smtp"} 0.0
gmp_rate_limited_total{account_id="bulk-smtp"} 156.0
# HELP gmp_pending_messages Current pending messages
# TYPE gmp_pending_messages gauge
gmp_pending_messages 42.0
Prometheus Configuration
Add the mail-proxy to your Prometheus scrape configuration:
# prometheus.yml
scrape_configs:
- job_name: 'mail-proxy'
scrape_interval: 15s
static_configs:
- targets: ['mail-proxy:8000']
metrics_path: /metrics
For Kubernetes with service discovery:
scrape_configs:
- job_name: 'mail-proxy'
kubernetes_sd_configs:
- role: pod
relabel_configs:
- source_labels: [__meta_kubernetes_pod_label_app]
regex: mail-proxy
action: keep
Grafana Dashboard
Import or create a dashboard with these panels:
Send Rate:
rate(gmp_sent_total[5m])
Error Rate:
rate(gmp_errors_total[5m])
Success Rate (percentage):
rate(gmp_sent_total[5m]) /
(rate(gmp_sent_total[5m]) + rate(gmp_errors_total[5m])) * 100
Queue Depth:
gmp_pending_messages
Rate Limit Hits (per minute):
increase(gmp_rate_limited_total[1m])
Throughput by Account:
sum by(account_id) (rate(gmp_sent_total[5m]))
You can find an example dashboard inside examples/grafana_dashboard.
Alerting Rules
Example Prometheus alerting rules:
# alerts.yml
groups:
- name: mail-proxy
rules:
# High error rate
- alert: MailProxyHighErrorRate
expr: |
rate(gmp_errors_total[5m]) /
(rate(gmp_sent_total[5m]) + rate(gmp_errors_total[5m])) > 0.1
for: 5m
labels:
severity: warning
annotations:
summary: "Mail proxy error rate above 10%"
description: "{{ $labels.account_id }} has {{ $value | humanizePercentage }} error rate"
# Queue backing up
- alert: MailProxyQueueBacklog
expr: gmp_pending_messages > 1000
for: 10m
labels:
severity: warning
annotations:
summary: "Mail proxy queue backlog"
description: "{{ $value }} messages pending for over 10 minutes"
# Rate limiting active
- alert: MailProxyRateLimited
expr: increase(gmp_rate_limited_total[5m]) > 100
for: 5m
labels:
severity: info
annotations:
summary: "Mail proxy hitting rate limits"
description: "{{ $labels.account_id }} rate limited {{ $value }} times in 5m"
# Service down
- alert: MailProxyDown
expr: up{job="mail-proxy"} == 0
for: 1m
labels:
severity: critical
annotations:
summary: "Mail proxy is down"
Health Endpoints
Two health endpoints are available:
GET /health
Lightweight health check. Returns 200 OK if the service is running. No authentication required.
curl http://localhost:8000/health
Response:
{"ok": true}
Use this for load balancer health checks and Kubernetes liveness probes.
GET /status
Detailed status including scheduler state. Requires authentication.
curl http://localhost:8000/status -H "X-API-Token: $TOKEN"
Response:
{
"ok": true,
"scheduler_active": true,
"pending_messages": 42,
"uptime_seconds": 86400
}
Kubernetes Probes
Configure probes in your deployment:
apiVersion: apps/v1
kind: Deployment
spec:
template:
spec:
containers:
- name: mail-proxy
livenessProbe:
httpGet:
path: /health
port: 8000
initialDelaySeconds: 10
periodSeconds: 10
readinessProbe:
httpGet:
path: /health
port: 8000
initialDelaySeconds: 5
periodSeconds: 5
Docker Compose Health Check
services:
mail-proxy:
image: genro-mail-proxy
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
interval: 30s
timeout: 10s
retries: 3
start_period: 10s
Logging
The service logs to stdout with Python’s logging module. Key log events:
INFO: Service start/stop, message sent, delivery report posted
WARNING: Rate limit hit, temporary SMTP error, retry scheduled
ERROR: Permanent SMTP error, attachment fetch failure
Configure log level via environment:
export LOG_LEVEL=DEBUG
uvicorn core.mail_proxy.server:app --host 0.0.0.0
Or in Docker:
services:
mail-proxy:
environment:
- LOG_LEVEL=INFO
Log Format
Default format includes timestamp, level, and message:
2025-01-23 10:15:32,456 INFO Message msg-123 sent via primary-smtp
2025-01-23 10:15:33,789 WARNING Rate limit reached for bulk-smtp, deferring msg-456
2025-01-23 10:15:34,012 ERROR SMTP error for msg-789: 550 Mailbox not found
Structured Logging (JSON)
For log aggregation systems, use JSON output. Set environment variable:
export LOG_FORMAT=json
Output:
{"timestamp": "2025-01-23T10:15:32.456Z", "level": "INFO", "message": "Message msg-123 sent", "account_id": "primary-smtp"}
Best Practices
Scrape frequently: 15-second intervals catch transient issues.
Alert on trends, not spikes: Use
for: 5mto avoid false positives.Monitor queue depth: Rising
gmp_pending_messagesindicates throughput issues.Track per-account metrics: Different accounts may have different error patterns.
Correlate with SMTP provider: Match your metrics with provider’s dashboard for root cause analysis.
Secure /metrics: Even without auth, the endpoint reveals account IDs. Use firewall rules or reverse proxy.
See Also
Rate Limiting for understanding rate limit metrics
Usage for configuration options
Network Requirements for firewall and port configuration