Attachments

This guide covers all aspects of handling email attachments in genro-mail-proxy: fetching from various sources, caching for deduplication, and handling large files.

Attachment specification

Each attachment in a message requires at minimum a filename and storage_path:

{
  "attachments": [
    {
      "filename": "report.pdf",
      "storage_path": "doc_id=123"
    }
  ]
}

Full attachment fields:

Field	Required	Description
filename	Yes	Display name for the attachment (max 255 chars)
storage_path	Yes	Content location (format depends on fetch_mode)
fetch_mode	No	How to retrieve content (inferred if not provided)
mime_type	No	MIME type override (auto-detected from filename if not set)
content_md5	No	MD5 hash for cache lookup (32-char hex string)
auth	No	Authentication override for HTTP requests

Fetch modes

The fetch_mode field determines how the proxy retrieves attachment content. If not specified, it is inferred from the storage_path format:

fetch_mode	storage_path format	Description
endpoint	`doc_id=123` (default)	POST to tenant’s `client_attachment_path`
http_url	`https://...` or `http://...`	Direct GET from URL
base64	`base64:SGVsbG8=`, or raw base64 with an explicit fetch_mode	Inline encoded content
filesystem	`/var/attachments/file.pdf`	Read from local filesystem

Inference rules (when fetch_mode is omitted):

Starts with base64: → base64 (prefix is stripped)
Starts with http:// or https:// → http_url
Starts with / → filesystem
Otherwise → endpoint (default)
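
The inference rules above can be sketched as a small function. This is an illustration of the documented rules only, not the proxy's actual code; the function name infer_fetch_mode is hypothetical.

```python
def infer_fetch_mode(storage_path: str) -> tuple[str, str]:
    """Return (fetch_mode, storage_path), applying the rules in documented order."""
    prefix = "base64:"
    if storage_path.startswith(prefix):
        # The prefix is stripped; only the encoded payload remains.
        return "base64", storage_path[len(prefix):]
    if storage_path.startswith(("http://", "https://")):
        return "http_url", storage_path
    if storage_path.startswith("/"):
        return "filesystem", storage_path
    # Anything else is sent to the tenant's attachment endpoint.
    return "endpoint", storage_path


print(infer_fetch_mode("base64:SGVsbG8="))
print(infer_fetch_mode("doc_id=123"))
```

One consequence of these rules: unprefixed base64 content and relative filesystem paths match none of the first three rules, so they fall through to endpoint unless fetch_mode is set explicitly.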

endpoint mode

The proxy POSTs to the tenant’s attachment endpoint with the storage_path value:

{
  "filename": "invoice.pdf",
  "storage_path": "doc_id=456&version=2",
  "fetch_mode": "endpoint"
}

The proxy sends:

POST {client_base_url}{client_attachment_path}
Content-Type: application/json
Authorization: Bearer {client_auth.token}

{"storage_path": "doc_id=456&version=2"}

Your endpoint must return the raw file content with appropriate Content-Type.
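
On the tenant side, the handler logic might look like the sketch below. It is framework-neutral: the function takes the POSTed body and returns status, headers and content. The DOCUMENTS store, the function name and the choice to parse storage_path as a query string are all assumptions made for illustration; the proxy passes storage_path through as an opaque string, so its format is up to your application.

```python
import json
from urllib.parse import parse_qs

# Hypothetical in-memory store standing in for a real database or file service.
DOCUMENTS = {
    ("456", "2"): ("application/pdf", b"%PDF-1.4 example bytes"),
}


def handle_attachment_request(body: bytes) -> tuple[int, dict, bytes]:
    """Return (status, headers, content) for the JSON body the proxy POSTs."""
    payload = json.loads(body)
    # Assumption: this tenant encodes storage_path like a URL query string.
    params = parse_qs(payload["storage_path"])
    key = (params.get("doc_id", [""])[0], params.get("version", ["1"])[0])
    document = DOCUMENTS.get(key)
    if document is None:
        return 404, {"Content-Type": "text/plain"}, b"attachment not found"
    mime_type, content = document
    # The response body is the raw file bytes, not JSON and not base64.
    return 200, {"Content-Type": mime_type}, content
```

Checking the Authorization header against the tenant's token would happen before this function is called and is omitted here.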

http_url mode

Direct HTTP GET from the URL in storage_path:

{
  "filename": "logo.png",
  "storage_path": "https://cdn.example.com/images/logo.png",
  "fetch_mode": "http_url"
}

Since URLs start with http:// or https://, fetch_mode can be omitted:

{
  "filename": "logo.png",
  "storage_path": "https://cdn.example.com/images/logo.png"
}

Use the auth field to override authentication for this specific request:

{
  "filename": "private.pdf",
  "storage_path": "https://api.example.com/files/123",
  "auth": {
    "method": "bearer",
    "token": "file-specific-token"
  }
}
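
To make the effect of the auth override concrete, the sketch below builds the GET request such an attachment would plausibly produce. It only handles the bearer method shown above; the function name is hypothetical and the proxy's real HTTP client may differ.

```python
import urllib.request


def build_fetch_request(attachment: dict) -> urllib.request.Request:
    """Build (but do not send) the GET request for an http_url attachment."""
    request = urllib.request.Request(attachment["storage_path"], method="GET")
    auth = attachment.get("auth")
    # Only the "bearer" method from the example above is illustrated here.
    if auth and auth.get("method") == "bearer":
        request.add_header("Authorization", f"Bearer {auth['token']}")
    return request


request = build_fetch_request({
    "filename": "private.pdf",
    "storage_path": "https://api.example.com/files/123",
    "auth": {"method": "bearer", "token": "file-specific-token"},
})
print(request.get_method(), request.full_url)
print(request.get_header("Authorization"))
```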

base64 mode

Inline base64-encoded content directly in the message:

{
  "filename": "small.txt",
  "storage_path": "SGVsbG8gV29ybGQh",
  "fetch_mode": "base64"
}

Or with the base64: prefix (fetch_mode is inferred):

{
  "filename": "small.txt",
  "storage_path": "base64:SGVsbG8gV29ybGQh"
}

Warning

Base64 encoding increases payload size by ~33%. For files larger than a few KB, prefer endpoint or http_url modes to avoid bloating the message queue.
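
A client could build an inline attachment as in the sketch below, which also shows where the ~33% figure comes from: base64 emits 4 output characters for every 3 input bytes. The helper name inline_attachment is hypothetical.

```python
import base64


def inline_attachment(filename: str, content: bytes) -> dict:
    """Build an attachment entry whose content travels inside the message."""
    encoded = base64.b64encode(content).decode("ascii")
    # The "base64:" prefix lets the proxy infer fetch_mode.
    return {"filename": filename, "storage_path": f"base64:{encoded}"}


content = b"Hello World!"
attachment = inline_attachment("small.txt", content)
encoded_length = len(attachment["storage_path"]) - len("base64:")
# 12 input bytes become 16 encoded characters.
overhead = encoded_length / len(content)
print(attachment["storage_path"], overhead)
```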

filesystem mode

Read directly from the local filesystem:

{
  "filename": "local-report.pdf",
  "storage_path": "/var/attachments/reports/2024/report.pdf",
  "fetch_mode": "filesystem"
}

Since absolute paths start with /, fetch_mode can be omitted:

{
  "filename": "local-report.pdf",
  "storage_path": "/var/attachments/reports/2024/report.pdf"
}

The path can be:

Absolute: /var/attachments/file.pdf
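Relative: reports/file.pdf (resolved against configured base_dir). Because a relative path does not start with /, fetch_mode must be set to filesystem explicitly; otherwise the inference rules select endpoint.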
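
The resolution of the two path forms can be pictured as follows. This is a sketch under the assumption that base_dir is simply prepended to relative paths; the function name is hypothetical, and a real deployment may also want to reject paths that escape base_dir.

```python
from pathlib import PurePosixPath


def resolve_attachment_path(storage_path: str, base_dir: str) -> PurePosixPath:
    """Absolute paths are used as-is; relative ones are joined to base_dir."""
    path = PurePosixPath(storage_path)
    if path.is_absolute():
        return path
    return PurePosixPath(base_dir) / path


print(resolve_attachment_path("/var/attachments/file.pdf", "/var/attachments"))
print(resolve_attachment_path("reports/file.pdf", "/var/attachments"))
```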

Warning

Filesystem mode requires the proxy to have read access to the file path. For containerized deployments, ensure the volume is mounted correctly.

Caching and deduplication

The proxy supports MD5-based caching to avoid re-fetching identical content. This is useful when the same attachment appears in multiple messages.

MD5 marker in filename (legacy)

Embed the MD5 hash directly in the filename:

{
  "filename": "report_{MD5:d41d8cd98f00b204e9800998ecf8427e}.pdf",
  "storage_path": "doc_id=123"
}

The proxy:

Extracts the MD5 hash from the filename
Checks the cache for that hash
If found, uses cached content (skips fetch)
If not found, fetches and caches with computed MD5
Strips the marker from the final filename → report.pdf
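
The extraction and stripping steps can be sketched with a regular expression. The exact marker grammar is an assumption inferred from the example above (a 32-character hex hash inside {MD5:...}, with the preceding underscore removed along with the marker); the names MD5_MARKER and split_md5_marker are hypothetical.

```python
import re
from typing import Optional

# Assumption: marker is "{MD5:<32 hex chars>}", optionally preceded by "_".
MD5_MARKER = re.compile(r"_?\{MD5:([0-9a-fA-F]{32})\}")


def split_md5_marker(filename: str) -> tuple[str, Optional[str]]:
    """Return (clean_filename, md5), with md5 set to None when no marker is present."""
    match = MD5_MARKER.search(filename)
    if match is None:
        return filename, None
    clean_name = filename[:match.start()] + filename[match.end():]
    return clean_name, match.group(1).lower()


print(split_md5_marker("report_{MD5:d41d8cd98f00b204e9800998ecf8427e}.pdf"))
```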

content_md5 field (recommended)

Provide the MD5 hash as a separate field:

{
  "filename": "report.pdf",
  "storage_path": "doc_id=123",
  "content_md5": "d41d8cd98f00b204e9800998ecf8427e"
}

This is cleaner than embedding the hash in the filename and provides the same caching benefit.
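
When the client already holds the file bytes, the hash can be computed with Python's standard hashlib module, as in this sketch (the helper name is hypothetical). Note that the hash used in the examples above, d41d8cd98f00b204e9800998ecf8427e, is the MD5 of empty content and serves only as a placeholder.

```python
import hashlib


def attachment_with_md5(filename: str, storage_path: str, content: bytes) -> dict:
    """Build an attachment entry carrying the MD5 of the content it refers to."""
    return {
        "filename": filename,
        "storage_path": storage_path,
        # 32-character lowercase hex digest, as expected by content_md5.
        "content_md5": hashlib.md5(content).hexdigest(),
    }


entry = attachment_with_md5("report.pdf", "doc_id=123", b"")
print(entry["content_md5"])
```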

Note

If both content_md5 and filename marker are provided, content_md5 takes precedence.

Cache behavior

Cache hit: Content returned immediately, fetch skipped
Cache miss: Content fetched, then cached using its computed MD5
No MD5 provided: Content fetched and cached (for future lookups by MD5)

The cache uses a two-tier architecture:

Memory tier: Fast LRU cache (configurable size)
Disk tier: Persistent storage for larger files
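
The cache behavior and the two tiers can be illustrated with the simplified sketch below. It is not the proxy's implementation: the class and parameter names are hypothetical, and the policy that items above memory_item_limit bytes live on disk only is an assumption.

```python
import hashlib
from collections import OrderedDict
from pathlib import Path
from typing import Callable, Optional


class TwoTierCache:
    """MD5-keyed cache: small items in an in-memory LRU, everything on disk."""

    def __init__(self, disk_dir: str, max_memory_items: int = 128,
                 memory_item_limit: int = 64 * 1024):
        self._memory: "OrderedDict[str, bytes]" = OrderedDict()
        self._max_memory_items = max_memory_items
        self._memory_item_limit = memory_item_limit
        self._disk_dir = Path(disk_dir)
        self._disk_dir.mkdir(parents=True, exist_ok=True)

    def get(self, md5: str) -> Optional[bytes]:
        md5 = md5.lower()
        # Reject anything that is not a 32-char hex digest before touching disk.
        if len(md5) != 32 or any(c not in "0123456789abcdef" for c in md5):
            return None
        if md5 in self._memory:
            self._memory.move_to_end(md5)  # mark as most recently used
            return self._memory[md5]
        path = self._disk_dir / md5
        if path.is_file():
            content = path.read_bytes()
            self._remember(md5, content)
            return content
        return None

    def put(self, content: bytes) -> str:
        md5 = hashlib.md5(content).hexdigest()
        (self._disk_dir / md5).write_bytes(content)
        self._remember(md5, content)
        return md5

    def _remember(self, md5: str, content: bytes) -> None:
        if len(content) > self._memory_item_limit:
            return  # large items are served from the disk tier only
        self._memory[md5] = content
        self._memory.move_to_end(md5)
        while len(self._memory) > self._max_memory_items:
            self._memory.popitem(last=False)  # evict least recently used


def fetch_with_cache(cache: TwoTierCache, content_md5: Optional[str],
                     fetch: Callable[[], bytes]) -> bytes:
    """Cache hit skips the fetch; otherwise fetch and cache under the computed MD5."""
    if content_md5:
        cached = cache.get(content_md5)
        if cached is not None:
            return cached
    content = fetch()
    cache.put(content)
    return content
```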

Large file handling

Attachments exceeding a size threshold can be automatically uploaded to external storage and replaced with download links. This prevents:

Memory exhaustion during email building
Rejection by SMTP servers that enforce size limits (Gmail: 25 MB, Exchange: 10 MB by default)
Slow email delivery due to large payloads

Installation

# Install the appropriate storage backend:
pip install genro-mail-proxy[enterprise-s3]    # Amazon S3 / MinIO
pip install genro-mail-proxy[enterprise-gcs]   # Google Cloud Storage
pip install genro-mail-proxy[enterprise-azure] # Azure Blob Storage
pip install genro-mail-proxy[enterprise]       # All cloud backends

Configuration

Large file handling is configured per-tenant:

{
  "large_file_config": {
    "enabled": true,
    "max_size_mb": 10,
    "storage_url": "s3://my-bucket/mail-attachments",
    "file_ttl_days": 30,
    "action": "rewrite"
  }
}

Fields:

Field	Default	Description
enabled	false	Enable large file handling
max_size_mb	10.0	Size threshold in megabytes
storage_url	(required)	fsspec URL for storage backend
public_base_url	(optional)	Public base URL for download links; required when storage_url points to the local filesystem
file_ttl_days	30	Days before files expire
lifespan_after_download_days	(optional)	Days to keep after first download
action	warn	Behavior when threshold exceeded

Actions

warn: Log a warning but send the attachment normally
reject: Reject the message with an error
rewrite: Upload to storage and replace with download link
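
How the threshold and the three actions combine can be sketched as a small decision function. The function name and the "attach" return value are hypothetical, and two details are assumptions: that one megabyte means 1024 * 1024 bytes, and that an attachment exactly at the threshold is still sent normally.

```python
def large_file_decision(size_bytes: int, config: dict) -> str:
    """Return "attach" below the threshold, else the configured action."""
    if not config.get("enabled", False):
        return "attach"
    # Defaults mirror the fields table: max_size_mb 10.0, action "warn".
    threshold_bytes = config.get("max_size_mb", 10.0) * 1024 * 1024
    if size_bytes <= threshold_bytes:
        return "attach"
    return config.get("action", "warn")


config = {"enabled": True, "max_size_mb": 10, "action": "rewrite"}
print(large_file_decision(15 * 1024 * 1024, config))
print(large_file_decision(200 * 1024, config))
```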

Storage backends

The proxy uses fsspec for storage abstraction:

S3 / MinIO:

storage_url: "s3://bucket-name/path/prefix"

Requires AWS credentials via environment or IAM role.

Google Cloud Storage:

storage_url: "gs://bucket-name/path/prefix"

Requires GOOGLE_APPLICATION_CREDENTIALS environment variable.

Azure Blob Storage:

storage_url: "az://container-name/path/prefix"

Requires Azure credentials via environment.

Local filesystem:

storage_url: "file:///var/www/downloads/attachments"
public_base_url: "https://files.example.com/attachments"

Requires public_base_url for generating download URLs. Files must be served by a web server (nginx, Apache, etc.).

Download URLs

When action: rewrite is used:

Cloud storage: Presigned URLs are generated automatically (S3, GCS, Azure)
Local filesystem: Signed token URLs using public_base_url

The email body is modified to include download links:

<hr>
<p><strong>Attachments available for download:</strong></p>
<ul>
  <li><a href="https://...presigned-url...">large-report.pdf</a> (15.2 MB)</li>
</ul>
<p><em>Links expire in 30 days.</em></p>
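
A block like the one above could be produced by a function along these lines. This is a sketch, not the proxy's template: the function name is hypothetical, and the size formatting (one decimal, 1024-based megabytes) is an assumption. Filenames and URLs are HTML-escaped so they cannot break the markup.

```python
from html import escape


def render_download_links(files: list, ttl_days: int) -> str:
    """Render the download section; files holds (filename, url, size_bytes) tuples."""
    items = "\n".join(
        '  <li><a href="{url}">{name}</a> ({size:.1f} MB)</li>'.format(
            url=escape(url, quote=True),
            name=escape(name),
            size=size_bytes / (1024 * 1024),
        )
        for name, url, size_bytes in files
    )
    return (
        "<hr>\n"
        "<p><strong>Attachments available for download:</strong></p>\n"
        "<ul>\n"
        f"{items}\n"
        "</ul>\n"
        f"<p><em>Links expire in {ttl_days} days.</em></p>"
    )


html_block = render_download_links(
    [("large-report.pdf", "https://files.example.com/a?x=1&y=2", 15_938_355)],
    ttl_days=30,
)
print(html_block)
```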

Best practices

Use endpoint mode for dynamic content: When attachments are generated on-demand or require authentication, use endpoint mode with your own API.
Use http_url for CDN-hosted files: Static assets already on a CDN can be fetched directly without proxying through your application.
Avoid base64 for large files: Base64 encoding inflates the payload by about 33%, which bloats the message queue. Use endpoint or http_url instead.
Enable caching for repeated attachments: If the same file appears in multiple messages (e.g., company logo), provide content_md5 to avoid re-fetching.
Configure large_file_config for production: Prevent memory issues by enabling large file handling with appropriate thresholds for your SMTP provider.
Use presigned URLs for security: Cloud storage presigned URLs expire, preventing unauthorized long-term access to attachments.