Architecture - How Certstream Server Works

High-Level Architecture

End-to-end data flow from CT logs to clients

flowchart LR
    CT["60+ CT Logs"] --> W["Watchers"]
    W --> P["Parser"]
    P --> S["Pre-Serialize"]
    S --> B["Broadcast"]
    B --> WS["WebSocket"]
    B --> SSE["SSE"]
    B --> TCP["TCP"]
    style CT fill:#1a1a1a,stroke:#4ade80,color:#fff
    style W fill:#1a1a1a,stroke:#60a5fa,color:#fff
    style P fill:#1a1a1a,stroke:#60a5fa,color:#fff
    style S fill:#1a1a1a,stroke:#f97316,color:#fff
    style B fill:#1a1a1a,stroke:#f97316,color:#fff
    style WS fill:#1a1a1a,stroke:#4ade80,color:#fff
    style SSE fill:#1a1a1a,stroke:#4ade80,color:#fff
    style TCP fill:#1a1a1a,stroke:#4ade80,color:#fff

CT Log Polling

How certificates are fetched from Certificate Transparency logs

Fetch log list

On startup, the server fetches the official CT log list from Google's all_logs_list.json, filters it down to logs marked "usable", and merges the result with any custom logs defined in the config.

Spawn watchers

Each CT log gets its own async task (tokio::spawn), and the tasks run independently of one another. If a state file exists, each watcher resumes from its saved position.

Poll tree size

The watcher calls /ct/v1/get-sth to read the log's signed tree head, which carries the current tree size. It compares that size with its tracked position to determine how many new entries there are.

Fetch entries

The watcher fetches a batch via /ct/v1/get-entries?start=X&end=Y, where start and end are zero-based, inclusive entry indices. The batch size is configurable (default: 256).

Health tracking

Each watcher tracks consecutive failures and moves its log between three states: Healthy, Degraded, and Unhealthy. Polling of an Unhealthy log pauses with exponential backoff.

Save state

After each batch, the watcher saves its position to the state file so that it can resume from there after a restart.
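
The polling steps above can be sketched in a few lines of Rust. This is a minimal illustration, not the server's actual code: the names `next_batch` and `HealthTracker`, the failure thresholds, and the backoff constants are all assumptions made for the example. It shows how a watcher might turn the tree size into an inclusive get-entries range, and how consecutive failures might map to the three health states.

```rust
use std::time::Duration;

/// Inclusive (start, end) range for the next get-entries request,
/// or None when the watcher has caught up with the log.
/// `next_index` is the first entry not yet processed; `tree_size` comes from get-sth.
fn next_batch(next_index: u64, tree_size: u64, batch_size: u64) -> Option<(u64, u64)> {
    if batch_size == 0 || next_index >= tree_size {
        return None;
    }
    // get-entries uses zero-based, inclusive indices, so the last valid index is tree_size - 1.
    let end = next_index.saturating_add(batch_size - 1).min(tree_size - 1);
    Some((next_index, end))
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Health {
    Healthy,
    Degraded,
    Unhealthy,
}

// Hypothetical thresholds and timings, chosen only for illustration.
const DEGRADED_AFTER: u32 = 3;
const UNHEALTHY_AFTER: u32 = 10;
const BASE_BACKOFF: Duration = Duration::from_secs(5);
const MAX_BACKOFF: Duration = Duration::from_secs(300);

#[derive(Debug, Default)]
struct HealthTracker {
    consecutive_failures: u32,
}

impl HealthTracker {
    fn record_success(&mut self) {
        self.consecutive_failures = 0;
    }

    fn record_failure(&mut self) {
        self.consecutive_failures = self.consecutive_failures.saturating_add(1);
    }

    fn state(&self) -> Health {
        match self.consecutive_failures {
            n if n >= UNHEALTHY_AFTER => Health::Unhealthy,
            n if n >= DEGRADED_AFTER => Health::Degraded,
            _ => Health::Healthy,
        }
    }

    /// Pause before the next poll: zero unless Unhealthy, then doubling
    /// with each further failure up to MAX_BACKOFF.
    fn backoff(&self) -> Duration {
        if self.state() != Health::Unhealthy {
            return Duration::ZERO;
        }
        let exponent = (self.consecutive_failures - UNHEALTHY_AFTER).min(6);
        (BASE_BACKOFF * (1u32 << exponent)).min(MAX_BACKOFF)
    }
}
```

With the default batch size of 256, `next_batch(0, 1000, 256)` yields the range 0 to 255, and a watcher at index 900 is clamped to 900 to 999 rather than requesting entries past the tree head.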

Certificate Parsing

X.509 certificate decoding flow

flowchart LR
    A["CT Entry"] --> B["Base64"]
    B --> C["MerkleTreeLeaf"]
    C --> D["x509_parser"]
    D --> E["Extract"]
    style A fill:#1a1a1a,stroke:#60a5fa,color:#fff
    style B fill:#1a1a1a,stroke:#f97316,color:#fff
    style C fill:#1a1a1a,stroke:#f97316,color:#fff
    style D fill:#1a1a1a,stroke:#f97316,color:#fff
    style E fill:#1a1a1a,stroke:#4ade80,color:#fff

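MerkleTreeLeaf Structure (RFC 6962)

1 byte version | 1 byte leaf type | 8 bytes timestamp | 2 bytes entry type | variable entry (X509: 3 bytes length + certificate; precert: 32 bytes issuer key hash + 3 bytes length + TBSCertificate) | 2 bytes extensions length | variable extensions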

Precert extra_data (RFC 6962)

3 bytes pre-certificate length | variable pre-certificate (X509 with CT poison extension) | 3 bytes chain length | variable certificate chain
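
The layout above can be walked with a small length-prefixed reader. The sketch below is an illustration under stated assumptions, not the server's parser: the names `read_u24_prefixed`, `PrecertExtraData`, and `parse_precert_extra_data` are hypothetical. It relies on the RFC 6962 encoding, in which every certificate inside the chain carries its own 3-byte big-endian length prefix.

```rust
/// Reads a 3-byte big-endian length followed by that many bytes.
/// Returns (payload, remaining input), or None if the input is truncated.
fn read_u24_prefixed(input: &[u8]) -> Option<(&[u8], &[u8])> {
    if input.len() < 3 {
        return None;
    }
    let len = ((input[0] as usize) << 16) | ((input[1] as usize) << 8) | (input[2] as usize);
    let rest = &input[3..];
    if rest.len() < len {
        return None;
    }
    Some(rest.split_at(len))
}

/// Borrowed view of a precert entry's extra_data (hypothetical type name).
struct PrecertExtraData<'a> {
    /// DER bytes of the pre-certificate (X.509 with the CT poison extension).
    pre_certificate: &'a [u8],
    /// DER bytes of each certificate in the chain, in order.
    chain: Vec<&'a [u8]>,
}

fn parse_precert_extra_data(input: &[u8]) -> Option<PrecertExtraData<'_>> {
    // 3 bytes pre-certificate length | pre-certificate
    let (pre_certificate, rest) = read_u24_prefixed(input)?;
    // 3 bytes chain length | chain (a sequence of length-prefixed certificates)
    let (mut chain_bytes, _trailing) = read_u24_prefixed(rest)?;
    let mut chain = Vec::new();
    while !chain_bytes.is_empty() {
        let (cert, remaining) = read_u24_prefixed(chain_bytes)?;
        chain.push(cert);
        chain_bytes = remaining;
    }
    Some(PrecertExtraData { pre_certificate, chain })
}
```

The returned slices borrow from the input buffer, so no certificate bytes are copied before they are handed to the X.509 parser. Truncated input yields `None` instead of panicking.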

Extracted Fields

Subject/Issuer: CN, O, C, L, ST, OU, Email
Hashes: SHA1, SHA256, Fingerprint
Validity: not_before, not_after
Extensions: SubjectAltName, KeyUsage, BasicConstraints
Domains: Collected from CN + SAN DNS entries
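
As a rough illustration of the last item, the domain list could be assembled as shown below. The function name `collect_domains` is hypothetical, and the trimming, lowercasing, and de-duplication are assumptions for the example rather than documented behaviour of the server.

```rust
use std::collections::HashSet;

/// Merges the subject CN (if any) with the SAN DNS entries into one list,
/// keeping first-seen order and dropping duplicates and empty names.
fn collect_domains(common_name: Option<&str>, san_dns: &[&str]) -> Vec<String> {
    let mut seen = HashSet::new();
    let mut domains = Vec::new();
    for raw in common_name.into_iter().chain(san_dns.iter().copied()) {
        // Assumption: names are compared case-insensitively.
        let name = raw.trim().to_ascii_lowercase();
        if !name.is_empty() && seen.insert(name.clone()) {
            domains.push(name);
        }
    }
    domains
}
```

A certificate whose CN repeats one of its SAN entries then produces that name only once. The sketch does not check that the CN is actually a hostname.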

Pre-Serialization

Serialize once, broadcast to all clients

flowchart LR
    A["Message"] --> B["serialize()"]
    B --> C["Arc"]
    C --> D["broadcast"]
    D --> E["Clients"]
    style A fill:#1a1a1a,stroke:#4ade80,color:#fff
    style B fill:#1a1a1a,stroke:#f97316,color:#fff
    style C fill:#1a1a1a,stroke:#60a5fa,color:#fff
    style D fill:#1a1a1a,stroke:#60a5fa,color:#fff
    style E fill:#1a1a1a,stroke:#4ade80,color:#fff

Why Pre-Serialize?

Instead of serializing each message separately for every connected client, certstream-server-rust serializes it once and shares the resulting bytes via Arc. With 10,000 clients, each message costs 1 serialization instead of 10,000. The broadcast channel never blocks on a slow consumer: lagging clients skip messages rather than stalling everyone else.
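
The idea can be demonstrated with the standard library alone. The sketch below is not the server's implementation: it replaces the broadcast channel with one bounded `std::sync::mpsc` queue per client, and the `Broadcaster` type and its methods are hypothetical names. It shows the two properties described above: the payload is allocated once and shared by reference count, and a client whose queue is full misses the message instead of blocking the sender.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};
use std::sync::Arc;

/// Std-only stand-in for a broadcast channel: one bounded queue per client.
struct Broadcaster {
    clients: Vec<SyncSender<Arc<str>>>,
}

impl Broadcaster {
    fn new() -> Self {
        Self { clients: Vec::new() }
    }

    /// Registers a client with a queue of `capacity` pending messages (use at least 1).
    fn subscribe(&mut self, capacity: usize) -> Receiver<Arc<str>> {
        let (tx, rx) = sync_channel(capacity);
        self.clients.push(tx);
        rx
    }

    /// Shares one already-serialized payload with every client.
    /// Returns how many lagging clients skipped this message.
    fn broadcast(&mut self, serialized: String) -> usize {
        // One allocation, made once; each client only receives a cloned pointer.
        let shared: Arc<str> = Arc::from(serialized);
        let mut skipped = 0;
        self.clients.retain(|tx| match tx.try_send(Arc::clone(&shared)) {
            Ok(()) => true,
            // Queue full: this client skips the message, nobody else waits.
            Err(TrySendError::Full(_)) => {
                skipped += 1;
                true
            }
            // Receiver dropped: forget the client.
            Err(TrySendError::Disconnected(_)) => false,
        });
        skipped
    }
}
```

Two clients that receive the same message hold pointers to the same allocation, which `Arc::ptr_eq` confirms. A per-client queue is only one way to get skip-on-lag behaviour. A ring-buffer broadcast channel reaches the same result with a single shared buffer.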

Stream Formats

Three output formats for different use cases

full (/full-stream): Complete certificate data including chain and DER-encoded certificate. ~15-50 KB per message.
lite (/, default): Certificate metadata without chain or DER data. Best for most use cases. ~2-5 KB per message.
domains_only (/domains-only): Just the domain names array. Minimal bandwidth. ~100-500 bytes per message.
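
To make the mapping concrete, the three endpoints can be modelled as an enum. This is an illustrative sketch: the `StreamFormat` type and its methods are hypothetical names, and only the paths and format names are taken from the descriptions above.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum StreamFormat {
    Full,
    Lite,
    DomainsOnly,
}

impl StreamFormat {
    /// Maps a request path to the stream format it serves.
    fn from_path(path: &str) -> Option<Self> {
        match path {
            "/full-stream" => Some(Self::Full),
            "/" => Some(Self::Lite),
            "/domains-only" => Some(Self::DomainsOnly),
            _ => None,
        }
    }

    /// Only the full format carries the chain and the DER-encoded certificate.
    fn includes_chain_and_der(self) -> bool {
        matches!(self, Self::Full)
    }
}
```

Because each format is a fixed view of the same certificate, a server can pre-serialize one payload per format and pick the right one by path when a client connects.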

State Persistence

Resume from last position after restart

flowchart LR
    A["StateManager"] --> B["DashMap"]
    B --> C["state.json"]
    C --> D["Resume"]
    style A fill:#1a1a1a,stroke:#4ade80,color:#fff
    style B fill:#1a1a1a,stroke:#60a5fa,color:#fff
    style C fill:#1a1a1a,stroke:#60a5fa,color:#fff
    style D fill:#1a1a1a,stroke:#4ade80,color:#fff

State Structure

Each CT log's state is tracked with three fields: current_index (the last processed entry), tree_size (the last known tree size), and last_success (a timestamp used for health tracking). The state is persisted periodically to a JSON file, so a restarted server resumes each log where it left off instead of skipping entries.
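
A minimal in-memory model of this state might look as follows. It is a sketch under several assumptions: a plain `HashMap` stands in for the concurrent DashMap, `last_success` is taken to be Unix seconds, JSON serialization is omitted, and the names `LogState`, `StateStore`, `record_batch`, and `resume_index` are hypothetical. It also assumes that, since current_index is the last processed entry, a watcher resumes at the following index.

```rust
use std::collections::HashMap;

/// Per-log state, using the field names described above.
#[derive(Debug, Clone, PartialEq, Eq)]
struct LogState {
    current_index: u64,
    tree_size: u64,
    /// Assumption: seconds since the Unix epoch.
    last_success: u64,
}

/// Keyed by log URL. A plain HashMap is used here instead of a concurrent map.
#[derive(Debug, Default)]
struct StateStore {
    logs: HashMap<String, LogState>,
}

impl StateStore {
    /// Records that entries up to and including `last_processed` were handled.
    fn record_batch(&mut self, log_url: &str, last_processed: u64, tree_size: u64, now: u64) {
        self.logs.insert(
            log_url.to_string(),
            LogState { current_index: last_processed, tree_size, last_success: now },
        );
    }

    /// First index to fetch after a restart, or None if the log has no saved state.
    fn resume_index(&self, log_url: &str) -> Option<u64> {
        self.logs
            .get(log_url)
            .map(|state| state.current_index.saturating_add(1))
    }
}
```

Returning `None` for an unknown log leaves the starting point of a fresh log, whether index 0 or the current tree head, to the caller.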
