How Certstream Works
Technical architecture and data flow
High-Level Architecture
End-to-end data flow from CT logs to clients
CT Log Polling
How certificates are fetched from Certificate Transparency logs
- Fetch log list
  On startup, fetches the official CT log list from Google's all_logs_list.json. Filters to only "usable" logs. Merges with custom logs from config.
- Spawn watchers
  Each CT log gets its own async task (tokio spawn). Tasks run independently. If state file exists, resumes from saved position.
- Poll tree size
  Watcher calls /ct/v1/get-sth to get current tree size. Compares with tracked position to determine new entries.
- Fetch entries
  Fetches batch via /ct/v1/get-entries?start=X&end=Y. Batch size configurable (default: 256).
- Health tracking
  Tracks consecutive failures. States: Healthy, Degraded, Unhealthy. Unhealthy logs pause with exponential backoff.
- Save state
  After each batch, saves position to state file. Enables resume after restart.
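The polling and health-tracking steps above can be sketched with the standard library alone. This is an illustrative sketch, not the server's actual code: the names next_batch and HealthTracker, the failure thresholds, and the backoff timings are all assumptions. It relies on the fact that the end parameter of get-entries is inclusive in RFC 6962.

```rust
use std::time::Duration;

// All thresholds and timings below are illustrative assumptions.
const DEGRADED_AFTER: u32 = 2;
const UNHEALTHY_AFTER: u32 = 5;
const BASE_BACKOFF_SECS: u64 = 1;
const MAX_BACKOFF_SECS: u64 = 300;

/// Returns the inclusive (start, end) range for the next get-entries call,
/// or None when the watcher has caught up with the tree size from get-sth.
/// `next_index` is the index of the first entry not yet fetched.
pub fn next_batch(next_index: u64, tree_size: u64, batch_size: u64) -> Option<(u64, u64)> {
    if batch_size == 0 || next_index >= tree_size {
        return None;
    }
    let end = next_index.saturating_add(batch_size - 1).min(tree_size - 1);
    Some((next_index, end))
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Health {
    Healthy,
    Degraded,
    Unhealthy,
}

/// Derives a health state from the number of consecutive failures.
pub struct HealthTracker {
    consecutive_failures: u32,
}

impl HealthTracker {
    pub fn new() -> Self {
        HealthTracker { consecutive_failures: 0 }
    }

    /// Any success resets the failure streak.
    pub fn record_success(&mut self) {
        self.consecutive_failures = 0;
    }

    pub fn record_failure(&mut self) {
        self.consecutive_failures = self.consecutive_failures.saturating_add(1);
    }

    pub fn state(&self) -> Health {
        if self.consecutive_failures >= UNHEALTHY_AFTER {
            Health::Unhealthy
        } else if self.consecutive_failures >= DEGRADED_AFTER {
            Health::Degraded
        } else {
            Health::Healthy
        }
    }

    /// Pause to apply before the next poll. Only unhealthy logs back off;
    /// the delay doubles with each further failure up to a fixed cap.
    pub fn backoff(&self) -> Option<Duration> {
        if self.state() != Health::Unhealthy {
            return None;
        }
        let exp = (self.consecutive_failures - UNHEALTHY_AFTER).min(16);
        let secs = BASE_BACKOFF_SECS
            .saturating_mul(1u64 << exp)
            .min(MAX_BACKOFF_SECS);
        Some(Duration::from_secs(secs))
    }
}
```

With a tree size of 1000 and the default batch size of 256, a watcher starting at index 0 would request entries 0 through 255, and the final batch would be clamped to end at 999.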
Certificate Parsing
X.509 certificate decoding flow
MerkleTreeLeaf Structure (RFC 6962)
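Bytes    Field
0        Version
1        LeafType
2-9      Timestamp
10-11    EntryType (0=X509, 1=Precert)
12-14    Cert length
15+      DER certificate
This layout applies to X509 entries. For Precert entries, RFC 6962 places a 32-byte issuer key hash at bytes 12-43, followed by a 3-byte length and the TBSCertificate.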
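A minimal sketch of decoding this layout for an X509 entry, using only the standard library. The type and function names (X509Leaf, LeafError, parse_x509_leaf) are hypothetical and chosen for illustration; the byte offsets follow the layout above, with all multi-byte integers in big-endian order as RFC 6962 specifies.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum LeafError {
    /// The buffer is shorter than the header or the declared cert length.
    Truncated,
    /// The entry type is not 0 (X509); precert leaves use a different layout.
    UnsupportedEntryType(u16),
}

#[derive(Debug, PartialEq, Eq)]
pub struct X509Leaf<'a> {
    pub version: u8,
    pub leaf_type: u8,
    /// Milliseconds since the Unix epoch.
    pub timestamp_ms: u64,
    /// DER-encoded certificate, borrowed from the input buffer.
    pub der: &'a [u8],
}

/// Reads a 3-byte big-endian length.
fn read_u24(b: &[u8]) -> usize {
    ((b[0] as usize) << 16) | ((b[1] as usize) << 8) | (b[2] as usize)
}

pub fn parse_x509_leaf(leaf: &[u8]) -> Result<X509Leaf<'_>, LeafError> {
    // Fixed header: 1 + 1 + 8 + 2 + 3 = 15 bytes.
    if leaf.len() < 15 {
        return Err(LeafError::Truncated);
    }
    let mut ts = [0u8; 8];
    ts.copy_from_slice(&leaf[2..10]);
    let entry_type = u16::from_be_bytes([leaf[10], leaf[11]]);
    if entry_type != 0 {
        return Err(LeafError::UnsupportedEntryType(entry_type));
    }
    let len = read_u24(&leaf[12..15]);
    let der = leaf.get(15..15 + len).ok_or(LeafError::Truncated)?;
    Ok(X509Leaf {
        version: leaf[0],
        leaf_type: leaf[1],
        timestamp_ms: u64::from_be_bytes(ts),
        der,
    })
}
```

The returned der slice is what an X.509 decoder would consume next; the sketch stops before that step.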
Precert extra_data (RFC 6962)
Size       Field
3 bytes    pre-certificate length
variable   pre-certificate (X509 with CT poison extension)
3 bytes    chain length
variable   certificate chain
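The extra_data layout can be walked with a small length-prefix helper. This is a sketch under the assumption that each certificate inside the chain carries its own 3-byte length prefix, as RFC 6962 defines for ASN.1Cert vectors; the names split_u24_prefixed, PrecertExtraData, and parse_precert_extra_data are hypothetical.

```rust
/// Splits `buf` into (payload, rest), where the payload length is given by a
/// 3-byte big-endian prefix. Returns None if the buffer is too short.
fn split_u24_prefixed(buf: &[u8]) -> Option<(&[u8], &[u8])> {
    if buf.len() < 3 {
        return None;
    }
    let len = ((buf[0] as usize) << 16) | ((buf[1] as usize) << 8) | (buf[2] as usize);
    let rest = &buf[3..];
    if rest.len() < len {
        return None;
    }
    Some((&rest[..len], &rest[len..]))
}

pub struct PrecertExtraData<'a> {
    /// DER bytes of the pre-certificate.
    pub pre_certificate: &'a [u8],
    /// DER bytes of each chain certificate, in the order they appear.
    pub chain: Vec<&'a [u8]>,
}

pub fn parse_precert_extra_data(extra: &[u8]) -> Option<PrecertExtraData<'_>> {
    let (pre_certificate, rest) = split_u24_prefixed(extra)?;
    // The chain is one length-prefixed blob; bytes after it are ignored here.
    let (mut chain_bytes, _) = split_u24_prefixed(rest)?;
    let mut chain = Vec::new();
    while !chain_bytes.is_empty() {
        let (cert, tail) = split_u24_prefixed(chain_bytes)?;
        chain.push(cert);
        chain_bytes = tail;
    }
    Some(PrecertExtraData { pre_certificate, chain })
}
```

Returning borrowed slices avoids copying certificate bytes until a caller actually needs owned data.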
Extracted Fields
Subject/Issuer: CN, O, C, L, ST, OU, Email
Hashes: SHA1, SHA256, Fingerprint
Validity: not_before, not_after
Extensions: SubjectAltName, KeyUsage, BasicConstraints
Domains: Collected from CN + SAN DNS entries
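Collecting domains from the CN and the SAN DNS entries might look like the sketch below. The function name collect_domains is hypothetical, and the lowercasing and de-duplication are assumptions about sensible behavior, not something the description above confirms.

```rust
use std::collections::HashSet;

/// Merges the subject CN (if any) with the SAN DNS names, keeping the first
/// occurrence of each name and preserving order. Names are trimmed and
/// lowercased before comparison (an assumption of this sketch).
pub fn collect_domains(common_name: Option<&str>, san_dns: &[&str]) -> Vec<String> {
    let mut seen = HashSet::new();
    let mut out = Vec::new();
    for name in common_name.into_iter().chain(san_dns.iter().copied()) {
        let normalized = name.trim().to_ascii_lowercase();
        if !normalized.is_empty() && seen.insert(normalized.clone()) {
            out.push(normalized);
        }
    }
    out
}
```

Because the CN is usually repeated in the SAN list, de-duplicating keeps the domains array from carrying the same name twice.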
Pre-Serialization
Serialize once, broadcast to all clients
Why Pre-Serialize?
Instead of serializing each message separately for every connected client, certstream-server-rust serializes it once and shares the result with all clients via Arc.
With 10,000 clients, this means 1 serialization per message instead of 10,000. The broadcast channel does not let slow consumers hold up the rest: lagging clients skip messages rather than blocking others.
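The idea can be illustrated with a std-only stand-in for the broadcast channel. This is a simplified sketch, not the server's implementation: the Broadcaster type, its per-client bounded queues, and the hand-rolled JSON (which does no string escaping) are assumptions made to keep the example self-contained.

```rust
use std::collections::VecDeque;
use std::sync::Arc;

struct Client {
    queue: VecDeque<Arc<str>>,
    skipped: u64,
}

pub struct Broadcaster {
    clients: Vec<Client>,
    capacity: usize,
    /// Number of times a message was serialized.
    pub serializations: u64,
}

impl Broadcaster {
    pub fn new(client_count: usize, capacity: usize) -> Self {
        let clients = (0..client_count)
            .map(|_| Client { queue: VecDeque::new(), skipped: 0 })
            .collect();
        // A capacity of at least 1 keeps every queue bounded.
        Broadcaster { clients, capacity: capacity.max(1), serializations: 0 }
    }

    pub fn publish(&mut self, domains: &[&str]) {
        // Serialize exactly once, regardless of how many clients exist.
        let quoted: Vec<String> = domains.iter().map(|d| format!("\"{}\"", d)).collect();
        let payload: Arc<str> = Arc::from(format!("{{\"domains\":[{}]}}", quoted.join(",")));
        self.serializations += 1;
        for client in &mut self.clients {
            // A client whose queue is full loses its oldest message instead
            // of making the publisher wait.
            if client.queue.len() == self.capacity {
                client.queue.pop_front();
                client.skipped += 1;
            }
            // Cloning the Arc copies a pointer, not the serialized bytes.
            client.queue.push_back(Arc::clone(&payload));
        }
    }

    pub fn recv(&mut self, client: usize) -> Option<Arc<str>> {
        self.clients.get_mut(client)?.queue.pop_front()
    }

    pub fn skipped(&self, client: usize) -> u64 {
        self.clients[client].skipped
    }
}
```

In this sketch a client that keeps reading never loses messages, while a client that stops reading only affects its own queue.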
Stream Formats
Three output formats for different use cases
full
Complete certificate data including chain and DER-encoded certificate.
lite
Certificate metadata without chain or DER data. Best for most use cases.
domains_only
Just the domain names array. Minimal bandwidth.
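The relationship between the three formats can be sketched as projections of one certificate record. The Cert struct, its field names, the JSON key names, and the hex encoding of DER bytes are all hypothetical and only serve to show which data each format carries.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StreamFormat {
    Full,
    Lite,
    DomainsOnly,
}

/// Hypothetical, heavily reduced certificate record.
pub struct Cert {
    pub domains: Vec<String>,
    pub issuer_cn: String,
    pub der: Vec<u8>,
    pub chain: Vec<Vec<u8>>,
}

fn hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

/// Renders the record in the requested format. No string escaping is done,
/// so this is only suitable for illustration.
pub fn render(cert: &Cert, format: StreamFormat) -> String {
    let domains = cert
        .domains
        .iter()
        .map(|d| format!("\"{}\"", d))
        .collect::<Vec<_>>()
        .join(",");
    match format {
        // Just the domain names array.
        StreamFormat::DomainsOnly => format!("[{}]", domains),
        // Metadata without chain or DER data.
        StreamFormat::Lite => format!(
            "{{\"domains\":[{}],\"issuer_cn\":\"{}\"}}",
            domains, cert.issuer_cn
        ),
        // Everything, including the DER certificate and the chain.
        StreamFormat::Full => {
            let chain = cert
                .chain
                .iter()
                .map(|c| format!("\"{}\"", hex(c)))
                .collect::<Vec<_>>()
                .join(",");
            format!(
                "{{\"domains\":[{}],\"issuer_cn\":\"{}\",\"der\":\"{}\",\"chain\":[{}]}}",
                domains,
                cert.issuer_cn,
                hex(&cert.der),
                chain
            )
        }
    }
}
```

Since each format is a fixed projection, the pre-serialization described earlier would need at most one serialized payload per format for each certificate.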
State Persistence
Resume from last position after restart
State Structure
Each CT log's state is tracked with three fields: current_index (last processed entry), tree_size (known tree size), and last_success (timestamp for health tracking).
State is persisted periodically to a JSON file, enabling zero-loss restarts.
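A minimal sketch of the per-log state and a crash-safe way to write it. The three field names come from the description above; everything else is an assumption: the LogState and save_state names, the Unix-seconds timestamp, the exact JSON shape, and the write-then-rename strategy. A real implementation would more likely use a JSON library than hand-formatted strings.

```rust
use std::fs;
use std::io;
use std::path::Path;

pub struct LogState {
    /// Last processed entry.
    pub current_index: u64,
    /// Known tree size.
    pub tree_size: u64,
    /// Time of the last successful poll (assumed here to be Unix seconds).
    pub last_success: u64,
}

impl LogState {
    /// Hand-formatted JSON; safe here because every field is a number.
    pub fn to_json(&self) -> String {
        format!(
            "{{\"current_index\":{},\"tree_size\":{},\"last_success\":{}}}",
            self.current_index, self.tree_size, self.last_success
        )
    }
}

/// Writes the state to a temporary file and renames it over the target, so
/// a crash mid-write is less likely to leave a truncated state file behind.
pub fn save_state(path: &Path, state: &LogState) -> io::Result<()> {
    let tmp = path.with_extension("tmp");
    fs::write(&tmp, state.to_json())?;
    fs::rename(&tmp, path)
}
```

On restart, a watcher would read this file back and continue from the saved position instead of starting from the beginning of the log.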