
High-Level Architecture

End-to-end data flow from CT logs to clients

```mermaid
flowchart LR
    CT["49+ RFC 6962\nCT Logs"] --> W["RFC 6962\nWatchers"]
    SCT["Static CT Logs\n(Willow, Sycamore)"] --> SW["Static CT\nWatchers"]
    W --> P["Parser"]
    SW --> P
    P --> D["Dedup Filter"]
    D --> S["Pre-Serialize\n(simd-json)"]
    S --> B["Arc<PreSerializedMessage>\nBroadcast"]
    B --> WS["WebSocket"]
    B --> SSE["SSE"]
    style CT fill:#1a1a1a,stroke:#4ade80,color:#fff
    style SCT fill:#1a1a1a,stroke:#a78bfa,color:#fff
    style W fill:#1a1a1a,stroke:#60a5fa,color:#fff
    style SW fill:#1a1a1a,stroke:#a78bfa,color:#fff
    style P fill:#1a1a1a,stroke:#60a5fa,color:#fff
    style D fill:#1a1a1a,stroke:#f59e0b,color:#fff
    style S fill:#1a1a1a,stroke:#f97316,color:#fff
    style B fill:#1a1a1a,stroke:#f97316,color:#fff
    style WS fill:#1a1a1a,stroke:#4ade80,color:#fff
    style SSE fill:#1a1a1a,stroke:#4ade80,color:#fff
```

v1.3.0: Zero-Copy & Lock-Free Pipeline

Every certificate is serialized exactly once using simd-json (SIMD-accelerated, enabled by default) into three pre-built Bytes payloads (full, lite, domains_only), then wrapped in an Arc<PreSerializedMessage>. Each subscriber receives only a pointer clone — no re-serialization, no extra heap allocation per client. Shared state uses lock-free DashMap throughout; there are no global read/write mutexes in the hot path. The result is a flat ~150 MB RSS footprint under sustained load, stable over time.

CT Log Polling

How certificates are fetched from Certificate Transparency logs

  1. Fetch log list

    On startup, fetches the Chrome-trusted CT log list from Google's log_list.json. Filters out rejected and retired logs. Merges with custom logs from config.

  2. Spawn watchers

    Each CT log gets its own async task (tokio spawn). Tasks run independently. If state file exists, resumes from saved position.

  3. Poll tree size

    Watcher calls /ct/v1/get-sth to get current tree size. Compares with tracked position to determine new entries.

  4. Fetch entries

    Fetches batch via /ct/v1/get-entries?start=X&end=Y. Batch size configurable (default: 256).

  5. Health tracking

    Tracks consecutive failures. States: Healthy, Degraded, Unhealthy. Unhealthy logs pause with exponential backoff.

  6. Save state

    After each batch, saves position to state file. Enables resume after restart.
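The batch arithmetic behind steps 3 and 4 can be sketched as follows. This is an illustration, not the server's actual API: the function and variable names are mine, and only the range planning is shown (the real watcher also handles HTTP errors and backoff).

```rust
// Given the tracked position and the tree size reported by get-sth,
// compute the inclusive start/end ranges for /ct/v1/get-entries calls.
fn plan_batches(current_index: u64, tree_size: u64, batch_size: u64) -> Vec<(u64, u64)> {
    let mut batches = Vec::new();
    let mut start = current_index;
    while start < tree_size {
        // get-entries takes inclusive start and end indices
        let end = (start + batch_size).min(tree_size) - 1;
        batches.push((start, end));
        start = end + 1;
    }
    batches
}

fn main() {
    // 600 new entries starting at index 1000, default batch size 256
    let batches = plan_batches(1000, 1600, 256);
    assert_eq!(batches, vec![(1000, 1255), (1256, 1511), (1512, 1599)]);
    println!("{:?}", batches);
}
```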

Static CT Protocol

Checkpoint + tile-based fetching for next-generation CT logs (v1.3.0)

Why Static CT?

Let's Encrypt is shutting down its RFC 6962 CT logs on February 28, 2026. New logs use a static, tile-based protocol where the tree is served as immutable tiles instead of through dynamic get-entries API calls. Chrome has accepted static-ct-api logs since April 2025.

  1. Fetch checkpoint

    Polls /checkpoint to get current tree size. Checkpoints are signed text files with origin, tree size, and root hash.

  2. Calculate tile range

    Each tile contains 256 entries. Calculates which tiles need fetching based on current index vs tree size. Supports partial tiles for the last tile.

  3. Fetch tile data

    Downloads tile from /tile/data/<path>. Path uses hierarchical encoding (e.g., x001/234 for tile 1234). Tiles may be gzip-compressed.

  4. Parse binary entries

    Binary parser extracts timestamp, entry type (x509/precert), DER certificate, and chain fingerprints from each 256-entry tile.

  5. Fetch issuer certificates

    Chain certificates are referenced by SHA-256 fingerprint. Fetched from /issuer/<hex> and cached in a DashMap-based issuer cache.

  6. Dedup and broadcast

    Certificates pass through cross-log dedup filter before being serialized and broadcast to clients.
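The hierarchical path encoding from step 3 can be sketched as below. The grouping rule (three decimal digits per path segment, every segment except the last prefixed with "x") matches the example given above; the function name is mine.

```rust
// Encode a tile index as a hierarchical path, e.g. 1234 -> "x001/234".
fn tile_index_path(index: u64) -> String {
    // Split the index into groups of three decimal digits, least
    // significant group first.
    let mut groups: Vec<String> = Vec::new();
    let mut n = index;
    loop {
        groups.push(format!("{:03}", n % 1000));
        n /= 1000;
        if n == 0 {
            break;
        }
    }
    groups.reverse();
    // All groups except the last are prefixed with "x".
    let last = groups.pop().unwrap();
    let mut parts: Vec<String> = groups.into_iter().map(|g| format!("x{}", g)).collect();
    parts.push(last);
    parts.join("/")
}

fn main() {
    assert_eq!(tile_index_path(1234), "x001/234");
    assert_eq!(tile_index_path(5), "005");
    assert_eq!(tile_index_path(1234067), "x001/x234/067");
}
```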

Cross-Log Deduplication

Filter duplicate certificates across multiple CT logs (v1.3.0)

```mermaid
flowchart LR
    A["Certificate"] --> B["SHA-256 Hash"]
    B --> C{"DedupFilter"}
    C -->|New| D["Broadcast"]
    C -->|Duplicate| E["Discard"]
    style A fill:#1a1a1a,stroke:#4ade80,color:#fff
    style B fill:#1a1a1a,stroke:#60a5fa,color:#fff
    style C fill:#1a1a1a,stroke:#f59e0b,color:#fff
    style D fill:#1a1a1a,stroke:#4ade80,color:#fff
    style E fill:#1a1a1a,stroke:#ef4444,color:#fff
```

How It Works

The same certificate often appears in multiple CT logs simultaneously. The dedup filter uses a DashMap<[u8; 32], Instant> keyed by the raw 32-byte SHA-256 digest stored directly in LeafCert::sha256_raw — a fixed-size, stack-allocated key that eliminates one heap allocation per lookup compared to a String key. First occurrence passes through; duplicates within the 5-minute TTL window are discarded. Capacity is capped at 500K entries. A background task cleans expired entries every 60 seconds.
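A minimal single-threaded sketch of the TTL logic, using a plain HashMap where the real server uses a lock-free DashMap so watchers can insert concurrently (type and method names here are illustrative):

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Dedup filter keyed by the raw 32-byte SHA-256 digest.
struct DedupFilter {
    seen: HashMap<[u8; 32], Instant>,
    ttl: Duration,
}

impl DedupFilter {
    fn new(ttl: Duration) -> Self {
        Self { seen: HashMap::new(), ttl }
    }

    /// Returns true if the certificate is new and should be broadcast.
    fn check_and_insert(&mut self, sha256: [u8; 32]) -> bool {
        let now = Instant::now();
        match self.seen.get(&sha256).copied() {
            // Seen recently: discard as a duplicate.
            Some(t) if now.duration_since(t) < self.ttl => false,
            // New, or the previous sighting has expired: record and pass.
            _ => {
                self.seen.insert(sha256, now);
                true
            }
        }
    }
}

fn main() {
    let mut filter = DedupFilter::new(Duration::from_secs(300)); // 5-minute TTL
    let digest = [0u8; 32];
    assert!(filter.check_and_insert(digest));  // first occurrence passes
    assert!(!filter.check_and_insert(digest)); // duplicate within TTL discarded
}
```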

Certificate Parsing

X.509 certificate decoding flow

```mermaid
flowchart LR
    A["CT Entry"] --> B["Base64"]
    B --> C["MerkleTreeLeaf"]
    C --> D["x509_parser"]
    D --> E["Extract"]
    style A fill:#1a1a1a,stroke:#60a5fa,color:#fff
    style B fill:#1a1a1a,stroke:#f97316,color:#fff
    style C fill:#1a1a1a,stroke:#f97316,color:#fff
    style D fill:#1a1a1a,stroke:#f97316,color:#fff
    style E fill:#1a1a1a,stroke:#4ade80,color:#fff
```

MerkleTreeLeaf Structure (RFC 6962)

Byte 0: Version
Byte 1: LeafType
Bytes 2-9: Timestamp
Bytes 10-11: EntryType (0 = X509, 1 = Precert)
Bytes 12-14: Cert length (24-bit big-endian)
Bytes 15+: DER certificate
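A sketch parser for this layout (X509 entries only; precert leaves carry additional fields before the certificate data, and the function name and return shape here are mine, not the server's):

```rust
/// Parse the fixed-layout header of an RFC 6962 MerkleTreeLeaf for an
/// X509 entry, returning (timestamp, entry_type, DER bytes).
fn parse_leaf(leaf: &[u8]) -> Option<(u64, u16, &[u8])> {
    if leaf.len() < 15 {
        return None;
    }
    let timestamp = u64::from_be_bytes(leaf[2..10].try_into().ok()?);
    let entry_type = u16::from_be_bytes(leaf[10..12].try_into().ok()?);
    // 24-bit big-endian certificate length
    let len = ((leaf[12] as usize) << 16) | ((leaf[13] as usize) << 8) | leaf[14] as usize;
    let der = leaf.get(15..15 + len)?;
    Some((timestamp, entry_type, der))
}

fn main() {
    // Build a tiny synthetic leaf: version 0, leaf type 0,
    // timestamp 1700000000000, entry type 0 (x509), 3-byte "cert".
    let mut leaf = vec![0u8, 0u8];
    leaf.extend_from_slice(&1_700_000_000_000u64.to_be_bytes());
    leaf.extend_from_slice(&0u16.to_be_bytes());
    leaf.extend_from_slice(&[0, 0, 3]);
    leaf.extend_from_slice(&[0x30, 0x03, 0x02]);

    let (ts, et, der) = parse_leaf(&leaf).unwrap();
    assert_eq!(ts, 1_700_000_000_000);
    assert_eq!(et, 0);
    assert_eq!(der, &[0x30, 0x03, 0x02]);
}
```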

Precert extra_data (RFC 6962)

3 bytes: pre-certificate length
variable: pre-certificate (X.509 with CT poison extension)
3 bytes: chain length
variable: certificate chain

Extracted Fields

Subject/Issuer: CN, O, C, L, ST, OU, Email
Hashes: SHA1, SHA256, Fingerprint
Validity: not_before, not_after
Extensions: SubjectAltName, KeyUsage, BasicConstraints
Domains: Collected from CN + SAN DNS entries

Pre-Serialization

Serialize once, broadcast to all clients

```mermaid
flowchart LR
    A["Message"] --> B["serialize()"]
    B --> C["Arc"]
    C --> D["broadcast"]
    D --> E["Clients"]
    style A fill:#1a1a1a,stroke:#4ade80,color:#fff
    style B fill:#1a1a1a,stroke:#f97316,color:#fff
    style C fill:#1a1a1a,stroke:#60a5fa,color:#fff
    style D fill:#1a1a1a,stroke:#60a5fa,color:#fff
    style E fill:#1a1a1a,stroke:#4ade80,color:#fff
```

Why Pre-Serialize? (Serialize-Once, Broadcast-Many)

Instead of serializing each message for every connected client, certstream-server-rust serializes once per certificate into three pre-built byte payloads (full, lite, domains_only), wraps them in an Arc<PreSerializedMessage>, and clones only the Arc pointer to every subscriber — zero re-serialization, zero extra heap allocation per client. JSON serialization uses simd-json (SIMD-accelerated) when the simd feature is enabled (the default). With 10,000 clients, this means 1 serialization instead of 10,000. The broadcast channel uses backpressure — lagging clients skip messages rather than blocking others.
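The serialize-once, clone-the-pointer pattern reduces to the sketch below. The struct shape and field types are assumptions for illustration; the real server holds `bytes::Bytes` payloads and fans out over a tokio broadcast channel.

```rust
use std::sync::Arc;

// Each format is serialized exactly once into its own byte buffer.
struct PreSerializedMessage {
    full: Vec<u8>,
    lite: Vec<u8>,
    domains_only: Vec<u8>,
}

fn main() {
    let msg = Arc::new(PreSerializedMessage {
        full: br#"{"cert":"..."}"#.to_vec(),
        lite: br#"{"cn":"example.com"}"#.to_vec(),
        domains_only: br#"["example.com"]"#.to_vec(),
    });

    // Each subscriber receives only a pointer clone: the payload bytes
    // are never copied or re-serialized per client.
    let subscribers: Vec<Arc<PreSerializedMessage>> =
        (0..3).map(|_| Arc::clone(&msg)).collect();

    assert_eq!(Arc::strong_count(&msg), 4); // original + 3 subscriber clones
    for sub in &subscribers {
        assert_eq!(sub.lite, msg.lite); // same underlying bytes
    }
}
```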

Stream Formats

Three output formats for different use cases

full

/full-stream

Complete certificate data including chain and DER-encoded certificate.

~15-50 KB per message

lite

/ (default)

Certificate metadata without chain or DER data. Best for most use cases.

~2-5 KB per message

domains_only

/domains-only

Just the domain names array. Minimal bandwidth.

~100-500 bytes per message

State Persistence

Resume from last position after restart

```mermaid
flowchart LR
    A["StateManager"] --> B["DashMap"]
    B --> C["state.json"]
    C --> D["Resume"]
    style A fill:#1a1a1a,stroke:#4ade80,color:#fff
    style B fill:#1a1a1a,stroke:#60a5fa,color:#fff
    style C fill:#1a1a1a,stroke:#60a5fa,color:#fff
    style D fill:#1a1a1a,stroke:#4ade80,color:#fff
```

State Structure

Each CT log's state is tracked with: current_index (last processed entry), tree_size (known tree size), and last_success (timestamp for health tracking). State is persisted periodically (every 30s) and on shutdown to a JSON file, enabling zero-loss restarts. Both RFC 6962 and static CT log positions are tracked.
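A state file built from those fields might look like the fragment below. The field names follow the description above, but the exact layout (keying by log URL, epoch-millisecond timestamps) is an assumption, not the server's verbatim schema.

```json
{
  "https://ct.example.com/log/": {
    "current_index": 123456,
    "tree_size": 125000,
    "last_success": 1717243200000
  }
}
```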

Atomic Dirty Flag (v1.3.0)

The dirty flag uses AtomicBool instead of a lock, ensuring state updates are never silently dropped. State is flushed on graceful shutdown (SIGINT/SIGTERM) and when the periodic save task is cancelled. Enabled by default with state_file: "certstream_state.json".
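The flag pattern can be sketched as below; the function names are illustrative, and the real server pairs this with its periodic save task and shutdown hooks.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Lock-free dirty flag: writers mark state dirty after each update; the
// save task atomically takes the flag and persists only when it was set.
static DIRTY: AtomicBool = AtomicBool::new(false);

fn mark_dirty() {
    DIRTY.store(true, Ordering::Release);
}

/// Returns true if a save should happen, clearing the flag in the same
/// atomic step so a concurrent mark_dirty() is never silently dropped.
fn take_dirty() -> bool {
    DIRTY.swap(false, Ordering::AcqRel)
}

fn main() {
    assert!(!take_dirty()); // nothing to save yet
    mark_dirty();
    assert!(take_dirty());  // save once...
    assert!(!take_dirty()); // ...and the flag is cleared
}
```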