How Certstream Works
Technical architecture and data flow
High-Level Architecture
End-to-end data flow from CT logs to clients
v1.3.0: Zero-Copy & Lock-Free Pipeline
Every certificate is serialized exactly once using simd-json (SIMD-accelerated, enabled by default) into three pre-built Bytes payloads
(full, lite, domains_only), then wrapped in an Arc<PreSerializedMessage>.
Each subscriber receives only a pointer clone — no re-serialization, no extra heap allocation per client.
Shared state uses lock-free DashMap throughout; there are no global read/write mutexes in the hot path.
The result is a flat ~150 MB RSS footprint under sustained load, stable over time.
CT Log Polling
How certificates are fetched from Certificate Transparency logs
- Fetch log list
  On startup, fetches the Chrome-trusted CT log list from Google's log_list.json. Filters out rejected and retired logs. Merges with custom logs from config.
- Spawn watchers
  Each CT log gets its own async task (tokio::spawn). Tasks run independently. If a state file exists, each watcher resumes from its saved position.
- Poll tree size
  The watcher calls /ct/v1/get-sth to get the current tree size and compares it with the tracked position to determine new entries.
- Fetch entries
  Fetches a batch via /ct/v1/get-entries?start=X&end=Y. Batch size is configurable (default: 256).
- Health tracking
  Tracks consecutive failures. States: Healthy, Degraded, Unhealthy. Unhealthy logs pause with exponential backoff.
- Save state
  After each batch, saves the position to the state file, which enables resume after restart.
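The polling arithmetic and the failure handling above can be sketched as follows. This is a minimal illustration, not the project's code: the function names, the failure thresholds in `classify`, and the backoff base and cap are all hypothetical. The only facts taken from the text are the inclusive `start`/`end` parameters of get-entries, the default batch size of 256, the three health states, and the exponential backoff.

```rust
use std::time::Duration;

/// Inclusive (start, end) pairs for get-entries, covering every entry from
/// `next_index` up to `tree_size - 1` in chunks of at most `batch_size`.
fn batch_ranges(next_index: u64, tree_size: u64, batch_size: u64) -> Vec<(u64, u64)> {
    let mut ranges = Vec::new();
    if batch_size == 0 {
        return ranges;
    }
    let mut start = next_index;
    while start < tree_size {
        // `end` is inclusive, so the last valid index is tree_size - 1.
        let end = (start + batch_size - 1).min(tree_size - 1);
        ranges.push((start, end));
        start = end + 1;
    }
    ranges
}

#[derive(Debug, PartialEq, Clone, Copy)]
enum Health {
    Healthy,
    Degraded,
    Unhealthy,
}

/// Map a consecutive-failure count to a health state.
/// The thresholds (1 and 5) are assumptions made for this sketch.
fn classify(consecutive_failures: u32) -> Health {
    match consecutive_failures {
        0 => Health::Healthy,
        1..=4 => Health::Degraded,
        _ => Health::Unhealthy,
    }
}

/// Exponential backoff: base * 2^failures, capped at `max`.
fn backoff(consecutive_failures: u32, base: Duration, max: Duration) -> Duration {
    let shift = consecutive_failures.min(16); // keep the multiplier within u32
    base.saturating_mul(1u32 << shift).min(max)
}
```

With a tracked position of 0 and a tree size of 600, `batch_ranges(0, 600, 256)` yields three requests, the last one shorter than a full batch.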
Static CT Protocol
Checkpoint + tile-based fetching for next-generation CT logs (v1.3.0)
Why Static CT?
Let's Encrypt is shutting down RFC 6962 CT logs on February 28, 2026. New logs use a static, tile-based protocol
where the tree is served as immutable tiles instead of dynamic get-entries API calls.
Chrome has accepted static-ct-api logs since April 2025.
- Fetch checkpoint
  Polls /checkpoint to get the current tree size. Checkpoints are signed text files with origin, tree size, and root hash.
- Calculate tile range
  Each tile contains 256 entries. Calculates which tiles need fetching based on the current index vs. the tree size. Supports partial tiles for the last tile.
- Fetch tile data
  Downloads each tile from /tile/data/<path>. The path uses hierarchical encoding (e.g., x001/234 for tile 1234). Tiles may be gzip-compressed.
- Parse binary entries
  A binary parser extracts the timestamp, entry type (x509/precert), DER certificate, and chain fingerprints from each 256-entry tile.
- Fetch issuer certificates
  Chain certificates are referenced by SHA-256 fingerprint. They are fetched from /issuer/<hex> and cached in a DashMap-based issuer cache.
- Dedup and broadcast
  Certificates pass through the cross-log dedup filter before being serialized and broadcast to clients.
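The checkpoint and tile steps above can be illustrated with a short sketch. All names here (`parse_checkpoint`, `tile_path`, `tiles_to_fetch`, `data_tile_url_path`) are hypothetical and not taken from the project. The sketch reads only the first three checkpoint lines and does not verify signatures. The `.p/<width>` suffix for partial tiles follows my reading of the static-ct-api specification and should be checked against it.

```rust
const TILE_WIDTH: u64 = 256;

/// First three lines of a checkpoint body. Signature lines are ignored here.
#[derive(Debug, PartialEq)]
struct Checkpoint {
    origin: String,
    tree_size: u64,
    root_hash_b64: String, // left base64-encoded in this sketch
}

fn parse_checkpoint(text: &str) -> Option<Checkpoint> {
    let mut lines = text.lines();
    let origin = lines.next()?.to_string();
    let tree_size = lines.next()?.parse::<u64>().ok()?;
    let root_hash_b64 = lines.next()?.to_string();
    if origin.is_empty() || root_hash_b64.is_empty() {
        return None;
    }
    Some(Checkpoint { origin, tree_size, root_hash_b64 })
}

/// Encode a tile index as zero-padded 3-digit groups; every group except
/// the last is prefixed with 'x' (tile 1234 becomes "x001/234").
fn tile_path(index: u64) -> String {
    let mut groups = vec![format!("{:03}", index % 1000)];
    let mut rest = index / 1000;
    while rest > 0 {
        groups.push(format!("x{:03}", rest % 1000));
        rest /= 1000;
    }
    groups.reverse();
    groups.join("/")
}

/// (tile index, entries in that tile) for every data tile that holds
/// entries in [next_index, tree_size). Only the last tile can be partial.
fn tiles_to_fetch(next_index: u64, tree_size: u64) -> Vec<(u64, u64)> {
    let mut tiles = Vec::new();
    if next_index >= tree_size {
        return tiles;
    }
    let first = next_index / TILE_WIDTH;
    let last = (tree_size - 1) / TILE_WIDTH;
    for tile in first..=last {
        let width = (tree_size - tile * TILE_WIDTH).min(TILE_WIDTH);
        tiles.push((tile, width));
    }
    tiles
}

/// Relative URL path of a data tile; partial tiles carry a ".p/<width>" suffix.
fn data_tile_url_path(tile: u64, width: u64) -> String {
    if width == TILE_WIDTH {
        format!("tile/data/{}", tile_path(tile))
    } else {
        format!("tile/data/{}.p/{}", tile_path(tile), width)
    }
}
```

For a tracked index of 300 and a tree size of 600, the watcher would need the full tile 1 and an 88-entry partial tile 2.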
Cross-Log Deduplication
Filter duplicate certificates across multiple CT logs (v1.3.0)
How It Works
The same certificate often appears in multiple CT logs simultaneously. The dedup filter uses a DashMap<[u8; 32], Instant>
keyed by the raw 32-byte SHA-256 digest stored directly in LeafCert::sha256_raw — a fixed-size, stack-allocated key that
eliminates one heap allocation per lookup compared to a String key. First occurrence passes through; duplicates within the
5-minute TTL window are discarded. Capacity is capped at 500K entries. A background task cleans expired entries every 60 seconds.
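A minimal sketch of the filter's logic is shown below. To stay within the standard library it swaps the lock-free DashMap for a `Mutex<HashMap>`, so it illustrates the TTL and capacity rules rather than the concurrency design. The type and method names are hypothetical. The behaviour at capacity (forward the certificate without recording it) is an assumption, since the text only states that a cap exists.

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::time::{Duration, Instant};

struct DedupFilter {
    seen: Mutex<HashMap<[u8; 32], Instant>>, // stand-in for DashMap<[u8; 32], Instant>
    ttl: Duration,
    capacity: usize,
}

impl DedupFilter {
    fn new(ttl: Duration, capacity: usize) -> Self {
        Self { seen: Mutex::new(HashMap::new()), ttl, capacity }
    }

    /// Returns true when the certificate should be forwarded, i.e. its
    /// SHA-256 digest has not been seen within the TTL window.
    fn is_new(&self, sha256: [u8; 32], now: Instant) -> bool {
        let mut seen = self.seen.lock().unwrap();
        let previous = seen.get(&sha256).copied();
        if let Some(first_seen) = previous {
            if now.duration_since(first_seen) < self.ttl {
                return false; // duplicate inside the window
            }
        }
        // Assumption: when the map is full, pass the certificate through
        // without tracking it rather than growing past the cap.
        if previous.is_none() && seen.len() >= self.capacity {
            return true;
        }
        seen.insert(sha256, now);
        true
    }

    /// Drop entries older than the TTL; meant to be called periodically.
    fn cleanup(&self, now: Instant) {
        let ttl = self.ttl;
        self.seen.lock().unwrap().retain(|_, t| now.duration_since(*t) < ttl);
    }

    fn len(&self) -> usize {
        self.seen.lock().unwrap().len()
    }
}
```

Passing `now` explicitly keeps the sketch testable. A real background task would call `cleanup(Instant::now())` on its 60-second tick.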
Certificate Parsing
X.509 certificate decoding flow
MerkleTreeLeaf Structure (RFC 6962)
Offset       | Field
Byte 0       | Version
Byte 1       | LeafType
Bytes 2-9    | Timestamp
Bytes 10-11  | EntryType (0=X509, 1=Precert)
Bytes 12-14  | Cert length
Byte 15+     | DER certificate
Precert extra_data (RFC 6962)
Size      | Field
3 bytes   | pre-certificate length
variable  | pre-certificate (X.509 with CT poison extension)
3 bytes   | chain length
variable  | certificate chain
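The two layouts above translate into a few bounds-checked slice reads. The sketch below uses hypothetical names and covers only what the layouts describe: the fixed 12-byte header, the length-prefixed DER certificate of an X509 entry, and the two length-prefixed fields of a precert's extra_data. All multi-byte integers are big-endian. For precert entries the leaf itself carries an issuer key hash and a TBSCertificate rather than a full certificate (per RFC 6962), so the sketch returns no DER slice for them.

```rust
#[derive(Debug, PartialEq)]
enum EntryType {
    X509,
    Precert,
}

#[derive(Debug, PartialEq)]
struct ParsedLeaf<'a> {
    version: u8,
    leaf_type: u8,
    timestamp_ms: u64,
    entry_type: EntryType,
    /// DER bytes for X509 entries; None for precert entries.
    x509_der: Option<&'a [u8]>,
}

/// Interpret up to 8 bytes as a big-endian unsigned integer.
fn be_uint(bytes: &[u8]) -> u64 {
    bytes.iter().fold(0u64, |acc, &b| (acc << 8) | (b as u64))
}

/// Split off one field with a 3-byte length prefix: (field, remainder).
fn read_u24_prefixed<'a>(input: &'a [u8]) -> Option<(&'a [u8], &'a [u8])> {
    let len = be_uint(input.get(0..3)?) as usize;
    let body = input.get(3..3 + len)?;
    Some((body, &input[3 + len..]))
}

fn parse_leaf<'a>(input: &'a [u8]) -> Option<ParsedLeaf<'a>> {
    let version = *input.get(0)?;
    let leaf_type = *input.get(1)?;
    let timestamp_ms = be_uint(input.get(2..10)?);
    let entry_type = match be_uint(input.get(10..12)?) {
        0 => EntryType::X509,
        1 => EntryType::Precert,
        _ => return None,
    };
    let x509_der = if entry_type == EntryType::X509 {
        // Bytes 12-14 hold the length, the certificate starts at byte 15.
        Some(read_u24_prefixed(input.get(12..)?)?.0)
    } else {
        None
    };
    Some(ParsedLeaf { version, leaf_type, timestamp_ms, entry_type, x509_der })
}

/// Precert extra_data: (pre-certificate DER, raw certificate chain bytes).
fn parse_precert_extra_data<'a>(input: &'a [u8]) -> Option<(&'a [u8], &'a [u8])> {
    let (precert, rest) = read_u24_prefixed(input)?;
    let (chain, _) = read_u24_prefixed(rest)?;
    Some((precert, chain))
}
```

Every read goes through `get`, so truncated input returns `None` instead of panicking.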
Extracted Fields
Subject/Issuer: CN, O, C, L, ST, OU, Email
Hashes: SHA1, SHA256, Fingerprint
Validity: not_before, not_after
Extensions: SubjectAltName, KeyUsage, BasicConstraints
Domains: Collected from CN + SAN DNS entries
Pre-Serialization
Serialize once, broadcast to all clients
Why Pre-Serialize? (Serialize-Once, Broadcast-Many)
Instead of serializing each message for every connected client, certstream-server-rust serializes once per certificate into three pre-built
byte payloads (full, lite, domains_only), wraps them in an Arc<PreSerializedMessage>,
and clones only the Arc pointer to every subscriber — zero re-serialization, zero extra heap allocation per client.
JSON serialization uses simd-json (SIMD-accelerated) when the simd feature is enabled (the default).
With 10,000 clients, this means 1 serialization instead of 10,000. The broadcast channel uses backpressure — lagging clients
skip messages rather than blocking others.
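The sharing pattern can be sketched with the standard library alone. In this illustration `Vec<u8>` stands in for the `Bytes` payloads, and the payloads are assumed to have been built already, so no JSON library appears. `StreamFormat`, `payload`, and `fan_out` are hypothetical names. The point shown is that each subscriber holds a reference-counted pointer to the same three buffers.

```rust
use std::sync::Arc;

#[derive(Clone, Copy, Debug, PartialEq)]
enum StreamFormat {
    Full,
    Lite,
    DomainsOnly,
}

/// Three payloads built once per certificate (Vec<u8> stands in for Bytes).
struct PreSerializedMessage {
    full: Vec<u8>,
    lite: Vec<u8>,
    domains_only: Vec<u8>,
}

impl PreSerializedMessage {
    /// Borrow the payload matching a client's chosen format; nothing is copied.
    fn payload(&self, format: StreamFormat) -> &[u8] {
        match format {
            StreamFormat::Full => &self.full,
            StreamFormat::Lite => &self.lite,
            StreamFormat::DomainsOnly => &self.domains_only,
        }
    }
}

/// Give each subscriber its own handle. Arc::clone only bumps a reference
/// count, so the serialized bytes are never duplicated.
fn fan_out(msg: &Arc<PreSerializedMessage>, subscribers: usize) -> Vec<Arc<PreSerializedMessage>> {
    (0..subscribers).map(|_| Arc::clone(msg)).collect()
}
```

After `fan_out(&msg, 3)`, all three handles point at the same allocation as `msg`, which can be checked with `Arc::ptr_eq`.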
Stream Formats
Three output formats for different use cases
full
Complete certificate data including chain and DER-encoded certificate.
lite
Certificate metadata without chain or DER data. Best for most use cases.
domains_only
Just the domain names array. Minimal bandwidth.
State Persistence
Resume from last position after restart
State Structure
Each CT log's state is tracked with: current_index (last processed entry),
tree_size (known tree size), and last_success (timestamp for health tracking).
State is persisted to a JSON file periodically (every 30s) and on shutdown, enabling zero-loss restarts.
Both RFC 6962 and static CT log positions are tracked.
Atomic Dirty Flag (v1.3.0)
The dirty flag uses AtomicBool instead of a lock, ensuring state updates are never silently dropped.
State is flushed on graceful shutdown (SIGINT/SIGTERM) and when the periodic save task is cancelled.
Enabled by default with state_file: "certstream_state.json".
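One way an atomic dirty flag can avoid losing updates is to read and clear it in a single `swap`, so an update that arrives during a save sets the flag again and is picked up on the next tick. The sketch below is an assumption about how such a flag might be used, not the project's implementation. `StateTracker`, `mark_dirty`, `take_dirty`, and `save_if_dirty` are hypothetical names, and the actual file write is passed in as a closure.

```rust
use std::io;
use std::sync::atomic::{AtomicBool, Ordering};

struct StateTracker {
    dirty: AtomicBool,
}

impl StateTracker {
    fn new() -> Self {
        Self { dirty: AtomicBool::new(false) }
    }

    /// Called by watchers after they advance a log position.
    fn mark_dirty(&self) {
        self.dirty.store(true, Ordering::Release);
    }

    /// Atomically read and clear the flag. Returns true if a save is due.
    fn take_dirty(&self) -> bool {
        self.dirty.swap(false, Ordering::AcqRel)
    }

    /// Run `write` only when something changed. If the write fails, the
    /// flag is set again so the next tick retries instead of dropping it.
    fn save_if_dirty<F>(&self, write: F) -> io::Result<bool>
    where
        F: FnOnce() -> io::Result<()>,
    {
        if !self.take_dirty() {
            return Ok(false);
        }
        if let Err(e) = write() {
            self.mark_dirty();
            return Err(e);
        }
        Ok(true)
    }
}
```

A periodic task would call `save_if_dirty` on every 30-second tick and once more on shutdown, with a closure that serializes the per-log positions to the state file.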