
High-Level Architecture

End-to-end data flow from CT logs to clients

60+ CT Logs → Watchers → Parser → Pre-Serialize → Broadcast → WebSocket / SSE / TCP

CT Log Polling

How certificates are fetched from Certificate Transparency logs

  1. Fetch log list

    On startup, fetches the official CT log list from Google's all_logs_list.json. Filters to only "usable" logs. Merges with custom logs from config.

  2. Spawn watchers

    Each CT log gets its own async task (tokio::spawn), and tasks run independently. If a state file exists, each watcher resumes from its saved position.

  3. Poll tree size

    Watcher calls /ct/v1/get-sth to get current tree size. Compares with tracked position to determine new entries.

  4. Fetch entries

    Fetches a batch via /ct/v1/get-entries?start=X&end=Y, where end is inclusive. Batch size is configurable (default: 256). Per RFC 6962, a log may return fewer entries than requested.

  5. Health tracking

    Tracks consecutive failures. States: Healthy, Degraded, Unhealthy. Unhealthy logs pause with exponential backoff.

  6. Save state

    After each batch, saves position to state file. Enables resume after restart.
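The batching and health-tracking rules in steps 3 to 5 can be sketched in plain Rust (std only, no tokio or HTTP client). This is an illustration, not the project's code: `next_batch`, `health_for`, `pause_for`, the thresholds (3 consecutive failures for Degraded, 10 for Unhealthy), and the backoff parameters (2 s, doubling, capped at 300 s) are all hypothetical. The one detail taken from RFC 6962 is that `end` in get-entries is inclusive.

```rust
use std::time::Duration;

// Hypothetical thresholds and backoff parameters, chosen for illustration only.
const DEGRADED_AT: u32 = 3;
const UNHEALTHY_AT: u32 = 10;
const BASE_SECS: u64 = 2;
const CAP_SECS: u64 = 300;

/// The three health states named in the docs.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Health {
    Healthy,
    Degraded,
    Unhealthy,
}

/// Map a count of consecutive failures to a health state.
fn health_for(consecutive_failures: u32) -> Health {
    if consecutive_failures >= UNHEALTHY_AT {
        Health::Unhealthy
    } else if consecutive_failures >= DEGRADED_AT {
        Health::Degraded
    } else {
        Health::Healthy
    }
}

/// Pause before the next poll: None unless Unhealthy, then BASE_SECS doubled
/// for each failure beyond the Unhealthy threshold, capped at CAP_SECS.
fn pause_for(consecutive_failures: u32) -> Option<Duration> {
    if health_for(consecutive_failures) != Health::Unhealthy {
        return None;
    }
    let extra = consecutive_failures - UNHEALTHY_AT;
    // checked_shl returns None for shifts >= 64, so huge counts saturate instead of panicking.
    let factor = 1u64.checked_shl(extra).unwrap_or(u64::MAX);
    Some(Duration::from_secs(BASE_SECS.saturating_mul(factor).min(CAP_SECS)))
}

/// Next inclusive (start, end) range for /ct/v1/get-entries, or None when caught up.
/// `next_index` is the first entry not yet fetched; `tree_size` comes from get-sth.
fn next_batch(next_index: u64, tree_size: u64, batch_size: u64) -> Option<(u64, u64)> {
    if batch_size == 0 || next_index >= tree_size {
        return None;
    }
    let end = next_index.saturating_add(batch_size - 1).min(tree_size - 1);
    Some((next_index, end))
}

fn main() {
    // A watcher at entry 1000 of a 1600-entry tree, using the default batch size.
    let (mut next, tree_size) = (1000u64, 1600u64);
    while let Some((start, end)) = next_batch(next, tree_size, 256) {
        println!("GET /ct/v1/get-entries?start={}&end={}", start, end);
        // Pretend the log returned the full batch; a real client advances by
        // the number of entries actually received.
        next = end + 1;
    }
    for failures in [0, 3, 10, 11, 20] {
        println!("{} failures: {:?}, pause {:?}", failures, health_for(failures), pause_for(failures));
    }
}
```

Advancing by the number of entries actually returned, rather than by the requested batch size, keeps the position correct when a log caps its response size.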

Certificate Parsing

X.509 certificate decoding flow

CT Entry → Base64 → MerkleTreeLeaf → x509_parser → Extract

MerkleTreeLeaf Structure (RFC 6962)

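Byte 0: Version
Byte 1: LeafType
Bytes 2-9: Timestamp (milliseconds since the Unix epoch)
Bytes 10-11: EntryType (0 = X509, 1 = Precert)
Bytes 12-14: Cert length (24-bit)
Bytes 15+: DER certificate

All integers are big-endian. Bytes 12+ follow this layout only for X509 entries; for Precert entries the leaf holds a 32-byte issuer key hash followed by a length-prefixed TBSCertificate, which is why the pre-certificate itself is read from extra_data.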
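A minimal std-only sketch of reading this layout from already base64-decoded `leaf_input` bytes. It handles only X509 entries and leaves out the base64 step and x509_parser entirely; `X509Leaf` and `parse_x509_leaf` are hypothetical names, not the project's API.

```rust
/// Fields of an RFC 6962 MerkleTreeLeaf whose entry type is x509_entry (0).
#[derive(Debug)]
struct X509Leaf<'a> {
    version: u8,
    leaf_type: u8,
    timestamp_ms: u64,
    der: &'a [u8],
}

/// Parse base64-decoded leaf_input bytes. Returns None for precert entries
/// (entry type 1, which use a different layout) or for truncated input.
fn parse_x509_leaf(leaf: &[u8]) -> Option<X509Leaf<'_>> {
    if leaf.len() < 15 {
        return None;
    }
    // Bytes 2-9: big-endian u64 timestamp in milliseconds.
    let timestamp_ms = u64::from_be_bytes(leaf[2..10].try_into().ok()?);
    // Bytes 10-11: big-endian u16 entry type.
    let entry_type = u16::from_be_bytes([leaf[10], leaf[11]]);
    if entry_type != 0 {
        return None;
    }
    // Bytes 12-14: 24-bit big-endian certificate length.
    let len = ((leaf[12] as usize) << 16) | ((leaf[13] as usize) << 8) | (leaf[14] as usize);
    // The DER certificate; a 2-byte CtExtensions length follows it and is ignored here.
    let der = leaf.get(15..15 + len)?;
    Some(X509Leaf { version: leaf[0], leaf_type: leaf[1], timestamp_ms, der })
}

fn main() {
    // Synthetic leaf: version 0, leaf type 0, timestamp, entry type 0,
    // a 3-byte stand-in for a certificate, then an empty CtExtensions field.
    let mut leaf = vec![0u8, 0];
    leaf.extend_from_slice(&1_700_000_000_000u64.to_be_bytes());
    leaf.extend_from_slice(&0u16.to_be_bytes());
    leaf.extend_from_slice(&[0, 0, 3, 0x30, 0x01, 0x00, 0, 0]);
    println!("{:?}", parse_x509_leaf(&leaf));
}
```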

Precert extra_data (RFC 6962)

3 bytes: pre-certificate length
Variable: pre-certificate (X.509 with the CT poison extension)
3 bytes: chain length
Variable: certificate chain (each certificate prefixed with its own 3-byte length)
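The same 24-bit length prefixes can be walked to split precert extra_data. RFC 6962 encodes the chain as a list of `ASN.1Cert` values, so each certificate inside it carries its own 3-byte length. This is a std-only sketch; `take_u24_prefixed` and `parse_precert_extra_data` are hypothetical names.

```rust
/// Read a 24-bit big-endian length prefix and the bytes it covers.
/// Returns (body, rest), or None if the input is truncated.
fn take_u24_prefixed(b: &[u8]) -> Option<(&[u8], &[u8])> {
    let hdr = b.get(..3)?;
    let len = ((hdr[0] as usize) << 16) | ((hdr[1] as usize) << 8) | (hdr[2] as usize);
    let body = b.get(3..3 + len)?;
    Some((body, &b[3 + len..]))
}

/// Split precert extra_data into the pre-certificate and its chain certificates.
fn parse_precert_extra_data(extra: &[u8]) -> Option<(&[u8], Vec<&[u8]>)> {
    let (precert, rest) = take_u24_prefixed(extra)?;
    let (mut chain_bytes, _trailing) = take_u24_prefixed(rest)?;
    let mut chain = Vec::new();
    // Each chain entry is itself length-prefixed.
    while !chain_bytes.is_empty() {
        let (cert, next) = take_u24_prefixed(chain_bytes)?;
        chain.push(cert);
        chain_bytes = next;
    }
    Some((precert, chain))
}

fn main() {
    // A 2-byte pre-cert, then a 9-byte chain holding certs of 1 and 2 bytes.
    let extra: [u8; 17] = [0, 0, 2, 0xAA, 0xBB, 0, 0, 9, 0, 0, 1, 0x01, 0, 0, 2, 0x02, 0x03];
    if let Some((precert, chain)) = parse_precert_extra_data(&extra) {
        println!("precert: {:?}, chain: {:?}", precert, chain);
    }
}
```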

Extracted Fields

Subject/Issuer: CN, O, C, L, ST, OU, Email
Hashes: SHA1, SHA256, Fingerprint
Validity: not_before, not_after
Extensions: SubjectAltName, KeyUsage, BasicConstraints
Domains: Collected from CN + SAN DNS entries
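Domain collection could look like the sketch below. Only "CN + SAN DNS entries" comes from the list above; de-duplication, first-seen ordering (CN first), trimming, and ASCII lowercasing are assumptions, and `collect_domains` is a hypothetical name. Extracting the CN and SAN from the certificate (x509_parser's job) is out of scope, so the inputs are plain strings.

```rust
use std::collections::HashSet;

/// Combine the subject CN and SAN dNSName entries into one list.
/// Assumed behavior: trim, ASCII-lowercase, drop empties and duplicates,
/// and keep first-seen order with the CN first.
fn collect_domains(cn: Option<&str>, san_dns: &[&str]) -> Vec<String> {
    let mut seen = HashSet::new();
    let mut out = Vec::new();
    for name in cn.into_iter().chain(san_dns.iter().copied()) {
        let name = name.trim().to_ascii_lowercase();
        // insert() returns false if the name was already collected
        if !name.is_empty() && seen.insert(name.clone()) {
            out.push(name);
        }
    }
    out
}

fn main() {
    let domains = collect_domains(Some("Example.com"), &["example.com", "www.example.com", ""]);
    println!("{:?}", domains);
}
```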

Pre-Serialization

Serialize once, broadcast to all clients

Message → serialize() → Arc → broadcast → Clients

Why Pre-Serialize?

Instead of serializing each message separately for every connected client, certstream-server-rust serializes it once and shares the result via Arc. With 10,000 clients, that is 1 serialization instead of 10,000. The broadcast channel is bounded and does not apply backpressure to the producer: lagging clients skip messages rather than blocking others.
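A single-threaded, std-only sketch of both ideas. It is not tokio's broadcast channel or the project's code; `Broadcast`, `Recv`, and `serialize_once` are hypothetical names. Each message is serialized once into an `Arc<str>`, every receiver gets a clone of the same pointer, and a bounded ring buffer lets a slow receiver fall behind and then skip ahead (reporting how many messages it missed) while the sender never waits. This is similar in spirit to tokio's `broadcast::Receiver` returning `RecvError::Lagged(n)`.

```rust
use std::collections::VecDeque;
use std::sync::Arc;

/// Outcome of one receive attempt.
#[derive(Debug, PartialEq)]
enum Recv {
    Msg(Arc<str>),
    Lagged(u64), // how many messages this receiver missed
    Empty,
}

/// Single-threaded sketch of a bounded broadcast ring buffer.
struct Broadcast {
    capacity: usize,
    next_seq: u64,                  // sequence number for the next send
    buf: VecDeque<(u64, Arc<str>)>, // retained messages, oldest first
}

impl Broadcast {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0);
        Broadcast { capacity, next_seq: 0, buf: VecDeque::with_capacity(capacity) }
    }

    /// Never waits on receivers: when full, the oldest message is dropped.
    fn send(&mut self, payload: Arc<str>) {
        if self.buf.len() == self.capacity {
            self.buf.pop_front();
        }
        self.buf.push_back((self.next_seq, payload));
        self.next_seq += 1;
    }

    /// `cursor` is the receiver's next expected sequence number.
    fn recv(&self, cursor: &mut u64) -> Recv {
        let oldest = match self.buf.front() {
            Some((seq, _)) => *seq,
            None => return Recv::Empty,
        };
        if *cursor < oldest {
            let missed = oldest - *cursor;
            *cursor = oldest; // skip ahead to the oldest retained message
            return Recv::Lagged(missed);
        }
        match self.buf.get((*cursor - oldest) as usize) {
            Some((_, msg)) => {
                *cursor += 1;
                Recv::Msg(Arc::clone(msg)) // pointer copy, no re-serialization
            }
            None => Recv::Empty,
        }
    }
}

/// Serialize once per message. No JSON escaping: assumes plain ASCII names.
fn serialize_once(domains: &[&str]) -> Arc<str> {
    let items: Vec<String> = domains.iter().map(|d| format!("\"{}\"", d)).collect();
    Arc::from(format!("{{\"data\":[{}]}}", items.join(",")))
}

fn main() {
    let mut chan = Broadcast::new(2);
    let (mut fast, mut slow) = (0u64, 0u64);
    for d in ["a.example", "b.example", "c.example"] {
        chan.send(serialize_once(&[d]));
        // The fast client reads each message as soon as it is sent.
        if let Recv::Msg(m) = chan.recv(&mut fast) {
            println!("fast: {}", m);
        }
    }
    // The slow client read nothing, so the oldest message is gone for it.
    println!("slow: {:?}", chan.recv(&mut slow));
    println!("slow: {:?}", chan.recv(&mut slow));
}
```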

Stream Formats

Three output formats for different use cases

full (/full-stream): Complete certificate data, including the chain and the DER-encoded certificate. ~15-50 KB per message.

lite (/, default): Certificate metadata without the chain or DER data. Best for most use cases. ~2-5 KB per message.

domains_only (/domains-only): Just the array of domain names, for minimal bandwidth. ~100-500 bytes per message.
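Selecting a format by path, and rendering the same certificate in each format, might look like the sketch below. The paths match the ones listed above; `StreamFormat`, `Cert`, `render`, and the JSON field names (`cn`, `all_domains`, `as_der`, `chain`) are hypothetical, and the output skips JSON escaping.

```rust
/// The three stream formats.
#[derive(Debug, Clone, Copy, PartialEq)]
enum StreamFormat {
    Full,
    Lite,
    DomainsOnly,
}

impl StreamFormat {
    /// Map a request path to its stream format.
    fn from_path(path: &str) -> Option<StreamFormat> {
        match path {
            "/full-stream" => Some(StreamFormat::Full),
            "/" => Some(StreamFormat::Lite),
            "/domains-only" => Some(StreamFormat::DomainsOnly),
            _ => None,
        }
    }
}

/// Hypothetical parsed certificate; field names are illustrative only.
struct Cert<'a> {
    subject_cn: &'a str,
    domains: Vec<&'a str>,
    der_base64: &'a str,
    chain_base64: Vec<&'a str>,
}

/// Quote and comma-join strings (no escaping; plain ASCII assumed).
fn quoted(items: &[&str]) -> String {
    items.iter().map(|s| format!("\"{}\"", s)).collect::<Vec<_>>().join(",")
}

/// Render one message in the given format; fuller formats carry more fields.
fn render(fmt: StreamFormat, c: &Cert) -> String {
    let domains = quoted(&c.domains);
    match fmt {
        StreamFormat::DomainsOnly => format!("{{\"data\":[{}]}}", domains),
        StreamFormat::Lite => format!("{{\"cn\":\"{}\",\"all_domains\":[{}]}}", c.subject_cn, domains),
        StreamFormat::Full => format!(
            "{{\"cn\":\"{}\",\"all_domains\":[{}],\"as_der\":\"{}\",\"chain\":[{}]}}",
            c.subject_cn, domains, c.der_base64, quoted(&c.chain_base64)
        ),
    }
}

fn main() {
    // "MIIB" and "MIIC" are placeholder base64 strings, not real certificates.
    let cert = Cert {
        subject_cn: "a.test",
        domains: vec!["a.test", "b.test"],
        der_base64: "MIIB",
        chain_base64: vec!["MIIC"],
    };
    for path in ["/full-stream", "/", "/domains-only"] {
        if let Some(fmt) = StreamFormat::from_path(path) {
            println!("{} {:?}: {}", path, fmt, render(fmt, &cert));
        }
    }
}
```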

State Persistence

Resume from last position after restart

StateManager → DashMap → state.json → Resume

State Structure

Each CT log's state is tracked with three fields: current_index (last processed entry), tree_size (last known tree size), and last_success (timestamp used for health tracking). State is persisted periodically to a JSON file, enabling zero-loss restarts.
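A std-only sketch of the state record and a crash-safe save. The three fields come from the text above; everything else is an assumption: a plain `HashMap` stands in for `DashMap`, the JSON is written by hand instead of through a serialization library, `last_success` is taken to be Unix seconds, writing to a temporary file and renaming it is one common way to avoid a half-written state.json, and `resume_from` assumes `current_index` is the last processed entry, as described. `LogState`, `to_json`, `save_atomic`, and the log URL are hypothetical.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::Path;

/// Per-log state with the three fields named in the docs.
struct LogState {
    current_index: u64, // last processed entry
    tree_size: u64,     // last known tree size
    last_success: u64,  // Unix seconds (unit is an assumption)
}

impl LogState {
    /// First index to fetch after a restart: the entry after the last processed one.
    fn resume_from(&self) -> u64 {
        self.current_index + 1
    }
}

/// Hand-written JSON with sorted keys so output is stable.
/// Log URLs are assumed to need no JSON escaping.
fn to_json(states: &HashMap<String, LogState>) -> String {
    let mut keys: Vec<&String> = states.keys().collect();
    keys.sort();
    let entries: Vec<String> = keys
        .iter()
        .map(|k| {
            let s = &states[*k];
            format!(
                "\"{}\":{{\"current_index\":{},\"tree_size\":{},\"last_success\":{}}}",
                k, s.current_index, s.tree_size, s.last_success
            )
        })
        .collect();
    format!("{{{}}}", entries.join(","))
}

/// Write to a temporary file, then rename it over the target, so a crash
/// mid-write leaves the previous state file intact.
fn save_atomic(path: &Path, json: &str) -> io::Result<()> {
    let tmp = path.with_extension("json.tmp");
    fs::write(&tmp, json)?;
    fs::rename(&tmp, path)
}

fn main() {
    let mut states = HashMap::new();
    states.insert(
        "https://ct.example/log/".to_string(), // hypothetical log URL
        LogState { current_index: 41, tree_size: 100, last_success: 1_700_000_000 },
    );
    let json = to_json(&states);
    let path = std::env::temp_dir().join("certstream_state_sketch.json");
    save_atomic(&path, &json).expect("state save failed");
    println!("{}", json);
    println!("resume at entry {}", states["https://ct.example/log/"].resume_from());
}
```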