Overview

End-to-end data flow

From CT logs to WebSocket and SSE clients.

flowchart LR
    CT["RFC 6962\nCT Logs"] --> W["RFC 6962\nWatchers"]
    SCT["Static CT Logs\n(Willow, Sycamore)"] --> SW["Static CT\nWatchers"]
    W --> P["Parser"]
    SW --> P
    P --> D["Dedup Filter"]
    D --> S["Pre-Serialize\n(simd-json)"]
    S --> B["Arc<PreSerializedMessage>\nBroadcast"]
    B --> WS["WebSocket"]
    B --> SSE["SSE"]

Zero-copy, lock-free pipeline

Every certificate is serialized exactly once using simd-json (SIMD-accelerated, enabled by default) into three pre-built Bytes payloads (full, lite, domains_only), which are then wrapped in an Arc<PreSerializedMessage>. Each subscriber receives a clone of the Arc pointer and a zero-copy Utf8Bytes text frame: no re-serialization, no per-client JSON re-encoding. When no clients are connected, the serialize step is skipped entirely via a receiver_count() == 0 guard. Shared state lives in DashMap's sharded concurrent maps throughout, so there are no global read/write mutexes in the hot path. The result is a stable RSS of roughly 118 MiB under load (100 WebSocket clients, 10-minute plateau, ±5 MiB swing).
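The serialize-once pattern and the receiver_count() guard can be sketched with the standard library alone. This is a hypothetical model, not certstream's code: `Hub`, `publish`, and the `Vec<u8>` payloads are illustrative stand-ins, and std `mpsc` senders replace the real broadcast channel. The point it shows is that the serializer closure runs at most once and every subscriber receives a pointer to the same allocation.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::Arc;

/// Hypothetical stand-in for PreSerializedMessage: the three payloads are
/// built once per certificate (plain Vec<u8> here instead of Bytes).
#[derive(Debug)]
pub struct PreSerializedMessage {
    pub full: Vec<u8>,
    pub lite: Vec<u8>,
    pub domains_only: Vec<u8>,
}

/// Toy fan-out hub: one std mpsc sender per subscriber, standing in for
/// the broadcast channel the server actually uses.
pub struct Hub {
    subscribers: Vec<Sender<Arc<PreSerializedMessage>>>,
}

impl Hub {
    pub fn new() -> Self {
        Hub { subscribers: Vec::new() }
    }

    pub fn subscribe(&mut self) -> Receiver<Arc<PreSerializedMessage>> {
        let (tx, rx) = channel();
        self.subscribers.push(tx);
        rx
    }

    pub fn receiver_count(&self) -> usize {
        self.subscribers.len()
    }

    /// Runs `serialize` at most once and hands every subscriber a clone of the
    /// same Arc. Returns how many times serialization ran (0 or 1).
    pub fn publish<F>(&mut self, serialize: F) -> usize
    where
        F: FnOnce() -> PreSerializedMessage,
    {
        // The receiver_count() == 0 guard: nobody is listening, skip the work.
        if self.receiver_count() == 0 {
            return 0;
        }
        let msg = Arc::new(serialize());
        // Each send clones only the pointer; senders whose receiver was
        // dropped are removed from the list.
        self.subscribers.retain(|tx| tx.send(Arc::clone(&msg)).is_ok());
        1
    }
}
```

With two subscribers, both receivers end up holding the same `Arc`, so `Arc::ptr_eq` on the two received values is true and the closure has run exactly once.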

Ingest · RFC 6962

CT log polling

How certificates are fetched from classic Certificate Transparency logs.

  1. Fetch the log lists

    On startup the server fetches the Chrome-trusted log list from Google and the Apple log list in parallel, filters out rejected and retired logs, dedupes by log ID, and merges in any custom logs from config.

  2. Spawn watchers

    Each CT log gets its own async task (tokio::spawn), and the tasks run independently. If a state file exists, each watcher resumes from its saved position.

  3. Poll tree size

    The watcher calls /ct/v1/get-sth to read the current tree size and compares it with the tracked position to find new entries.

  4. Fetch entries

    Entries are fetched in batches via /ct/v1/get-entries?start=X&end=Y, where both bounds are inclusive. Batch size is configurable (default: 256). Logs may return fewer entries than requested, so the index advances only by the number of entries actually returned.

  5. Track health

    Consecutive failures move a log through Healthy → Degraded → Unhealthy. Unhealthy logs pause behind a circuit breaker with exponential backoff.

  6. Save state

    After each batch the position is recorded so the server can resume cleanly after a restart.
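Steps 3 to 5 can be sketched as a small state machine. This is a hypothetical illustration: `Watcher`, `Health`, the failure thresholds (3 and 10), and the backoff bounds (1 s doubling up to 300 s) are assumptions chosen for the example, since the document does not state the real values. It shows the inclusive get-entries range, advancing by the number of entries actually returned, and the Healthy → Degraded → Unhealthy progression with exponential backoff.

```rust
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Health {
    Healthy,
    Degraded,
    Unhealthy,
}

// Hypothetical thresholds and backoff bounds, chosen for illustration only.
const DEGRADED_AFTER: u32 = 3;
const UNHEALTHY_AFTER: u32 = 10;
const BASE_BACKOFF: Duration = Duration::from_secs(1);
const MAX_BACKOFF: Duration = Duration::from_secs(300);

/// Hypothetical per-log watcher state.
pub struct Watcher {
    pub current_index: u64, // next entry to request
    pub consecutive_failures: u32,
}

impl Watcher {
    /// Inclusive (start, end) pair for the next get-entries call, or None once
    /// the watcher has caught up with the tree size from get-sth.
    pub fn next_batch(&self, tree_size: u64, batch_size: u64) -> Option<(u64, u64)> {
        if batch_size == 0 || self.current_index >= tree_size {
            return None;
        }
        let end = (self.current_index + batch_size - 1).min(tree_size - 1);
        Some((self.current_index, end))
    }

    /// Advance only by what the log actually returned; logs may truncate batches.
    pub fn on_success(&mut self, entries_returned: u64) {
        self.current_index += entries_returned;
        self.consecutive_failures = 0;
    }

    pub fn on_failure(&mut self) {
        self.consecutive_failures = self.consecutive_failures.saturating_add(1);
    }

    pub fn health(&self) -> Health {
        if self.consecutive_failures >= UNHEALTHY_AFTER {
            Health::Unhealthy
        } else if self.consecutive_failures >= DEGRADED_AFTER {
            Health::Degraded
        } else {
            Health::Healthy
        }
    }

    /// Circuit-breaker pause: zero unless Unhealthy, then doubling with each
    /// further failure, capped at MAX_BACKOFF.
    pub fn backoff(&self) -> Duration {
        if self.health() != Health::Unhealthy {
            return Duration::ZERO;
        }
        let doublings = (self.consecutive_failures - UNHEALTHY_AFTER).min(16);
        BASE_BACKOFF.saturating_mul(1u32 << doublings).min(MAX_BACKOFF)
    }
}
```

A single success resets the failure counter, so one good poll moves a log straight back to Healthy in this sketch. The real server may use a different recovery rule.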

Ingest · static-CT-API

Static CT protocol

Checkpoint + tile-based fetching for next-generation CT logs.

Why static CT?

Let's Encrypt has retired its RFC 6962 logs in favor of static, tile-based logs, where the tree is served as immutable tiles instead of dynamic get-entries calls. Chrome has accepted static-ct-api logs since April 2025, and the format is now where most new logs are headed.

  1. Fetch checkpoint

    Polls /checkpoint for the current tree size. Checkpoints are signed text files carrying origin, tree size, and root hash.

  2. Calculate tile range

    Each tile holds 256 entries. The watcher computes which tiles to fetch from its current index and the tree size; the last tile may be partial, and its width is validated.

  3. Fetch tile data

    Tiles are downloaded from /tile/data/<path> using hierarchical path encoding (e.g. x001/234 for tile 1234). Tiles may be gzip-compressed, with a hard cap on decompressed size.

  4. Parse binary entries

    A binary parser extracts the timestamp, entry type (x509 / precert), DER certificate, and chain fingerprints from each entry in the tile.

  5. Fetch issuer certificates

    Chain certificates are referenced by SHA-256 fingerprint, fetched from /issuer/<hex> and stored in a single shared, DashMap-based issuer cache.

  6. Dedup and broadcast

    Certificates pass through the cross-log dedup filter before being serialized once and broadcast to all clients.
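Steps 1 to 3 reduce to a few pure functions. The sketch below uses hypothetical names (`parse_checkpoint`, `tile_index_path`, `data_tile_path`, `tiles_to_fetch`). It follows the hierarchical encoding described in step 3 and the `.p/<width>` partial-tile suffix from the C2SP tlog-tiles layout. Checkpoint signature verification and base64 decoding of the root hash are deliberately left out.

```rust
const TILE_WIDTH: u64 = 256;

/// Body of a checkpoint: origin line, decimal tree size, base64 root hash.
#[derive(Debug)]
pub struct Checkpoint {
    pub origin: String,
    pub tree_size: u64,
    pub root_hash_b64: String,
}

/// Reads the first three lines only; the signature lines are not verified here.
pub fn parse_checkpoint(text: &str) -> Option<Checkpoint> {
    let mut lines = text.lines();
    let origin = lines.next()?.to_string();
    let tree_size = lines.next()?.parse::<u64>().ok()?;
    let root_hash_b64 = lines.next()?.to_string();
    if origin.is_empty() || root_hash_b64.is_empty() {
        return None;
    }
    Some(Checkpoint { origin, tree_size, root_hash_b64 })
}

/// Hierarchical tile index: 3-digit groups, all but the last prefixed
/// with "x" (1234 -> "x001/234").
pub fn tile_index_path(mut n: u64) -> String {
    let mut parts = vec![format!("{:03}", n % 1000)];
    n /= 1000;
    while n > 0 {
        parts.push(format!("x{:03}", n % 1000));
        n /= 1000;
    }
    parts.reverse();
    parts.join("/")
}

/// Data tile path; a partial (last) tile gets a ".p/<width>" suffix.
pub fn data_tile_path(n: u64, width: u64) -> String {
    let base = format!("tile/data/{}", tile_index_path(n));
    if width == TILE_WIDTH {
        base
    } else {
        format!("{}.p/{}", base, width)
    }
}

/// (tile index, width) for every tile covering entries [current_index, tree_size).
/// The first tile may start before current_index, so the caller skips
/// current_index % 256 entries in it and should check that each parsed
/// tile really holds `width` entries.
pub fn tiles_to_fetch(current_index: u64, tree_size: u64) -> Vec<(u64, u64)> {
    if current_index >= tree_size {
        return Vec::new();
    }
    let first = current_index / TILE_WIDTH;
    let last = (tree_size - 1) / TILE_WIDTH;
    (first..=last)
        .map(|t| {
            let width = if t == last { tree_size - t * TILE_WIDTH } else { TILE_WIDTH };
            (t, width)
        })
        .collect()
}
```

For example, with the watcher at entry 300 and a checkpoint tree size of 1000, this yields tiles 1 and 2 in full and tile 3 as a 232-entry partial tile.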

Pipeline

Cross-log deduplication

One certificate, many logs — collapsed to a single broadcast.

flowchart LR
    A["Certificate"] --> B["SHA-256"]
    B --> C{"Dedup Filter"}
    C -->|New| D["Broadcast"]
    C -->|Duplicate| E["Discard"]

How it works

The same certificate often appears in multiple CT logs at once. The dedup filter uses a DashMap<[u8; 32], Instant> keyed by the raw 32-byte SHA-256 digest stored directly in LeafCert::sha256_raw — a fixed-size, stack-allocated key that avoids one heap allocation per lookup compared with a String key. The first occurrence passes through; duplicates within the TTL window are discarded. The default window is 900 seconds (15 minutes) with a default capacity of 200K entries, both tunable. A background task prunes expired entries every 60 seconds rather than wiping the cache on overflow.
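The TTL logic can be illustrated with a single-threaded sketch. Here std `HashMap` stands in for DashMap. `DedupFilter`, `check_and_insert`, and `prune` are hypothetical names, and the 200K capacity limit is not modelled. The sketch assumes a duplicate does not refresh the stored first-seen time, which the document does not specify. Taking `now` as a parameter keeps the window logic testable.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Single-threaded sketch of the TTL dedup filter; std HashMap stands in
/// for DashMap and the capacity limit is omitted.
pub struct DedupFilter {
    seen: HashMap<[u8; 32], Instant>, // raw SHA-256 digest -> first-seen time
    ttl: Duration,
}

impl DedupFilter {
    pub fn new(ttl: Duration) -> Self {
        DedupFilter { seen: HashMap::new(), ttl }
    }

    /// true = first sighting within the window (broadcast it);
    /// false = duplicate inside the window (discard it).
    pub fn check_and_insert(&mut self, sha256: [u8; 32], now: Instant) -> bool {
        if let Some(first_seen) = self.seen.get(&sha256) {
            if now.saturating_duration_since(*first_seen) < self.ttl {
                return false;
            }
        }
        // New or expired: (re)start the window from this sighting.
        self.seen.insert(sha256, now);
        true
    }

    /// What the periodic background task (every 60 s) would call:
    /// drop only expired entries instead of clearing the whole map.
    pub fn prune(&mut self, now: Instant) {
        let ttl = self.ttl;
        self.seen.retain(|_, first_seen| now.saturating_duration_since(*first_seen) < ttl);
    }

    pub fn len(&self) -> usize {
        self.seen.len()
    }
}
```

A fixed-size `[u8; 32]` key is hashed and compared in place, which is the allocation saving the paragraph above describes relative to a hex `String` key.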

Pipeline

Certificate parsing

X.509 decoding, from CT entry to extracted fields.

flowchart LR
    A["CT Entry"] --> B["Base64"]
    B --> C["MerkleTreeLeaf"]
    C --> D["x509_parser"]
    D --> E["Extract"]

MerkleTreeLeaf structure (RFC 6962)

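Byte 0: Version
Byte 1: LeafType
Bytes 2–9: Timestamp (milliseconds since the Unix epoch)
Bytes 10–11: EntryType (0 = X509, 1 = Precert)
Bytes 12–14: Certificate length (24-bit)
Bytes 15+: DER certificate (X509 entries; Precert entries instead carry a 32-byte issuer key hash followed by a length-prefixed TBSCertificate)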
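The leaf layout maps directly onto a bounds-checked byte parser. This is a sketch with hypothetical names (`LeafEntry`, `parse_leaf`), not the project's parser. The X509 branch follows the offsets listed above, and the Precert branch follows RFC 6962's PreCert structure (32-byte issuer key hash, then a 24-bit-length TBSCertificate). Trailing CT extensions are ignored.

```rust
/// One decoded MerkleTreeLeaf; slices borrow from the input buffer.
#[derive(Debug, PartialEq)]
pub enum LeafEntry<'a> {
    X509 { timestamp_ms: u64, der: &'a [u8] },
    Precert { timestamp_ms: u64, issuer_key_hash: &'a [u8], tbs: &'a [u8] },
}

/// Big-endian 24-bit length at `at`, or None if the buffer is too short.
fn read_u24(b: &[u8], at: usize) -> Option<usize> {
    let s = b.get(at..at + 3)?;
    Some(((s[0] as usize) << 16) | ((s[1] as usize) << 8) | (s[2] as usize))
}

pub fn parse_leaf<'a>(b: &'a [u8]) -> Option<LeafEntry<'a>> {
    // Byte 0 = version (v1 = 0), byte 1 = leaf type (timestamped_entry = 0).
    if b.len() < 12 || b[0] != 0 || b[1] != 0 {
        return None;
    }
    let mut ts = [0u8; 8];
    ts.copy_from_slice(&b[2..10]);
    let timestamp_ms = u64::from_be_bytes(ts);
    match u16::from_be_bytes([b[10], b[11]]) {
        0 => {
            // x509_entry: 3-byte length at bytes 12-14, DER from byte 15.
            let len = read_u24(b, 12)?;
            let der = b.get(15..15 + len)?;
            Some(LeafEntry::X509 { timestamp_ms, der })
        }
        1 => {
            // precert_entry: 32-byte issuer key hash, 3-byte length, TBSCertificate.
            let issuer_key_hash = b.get(12..44)?;
            let len = read_u24(b, 44)?;
            let tbs = b.get(47..47 + len)?;
            Some(LeafEntry::Precert { timestamp_ms, issuer_key_hash, tbs })
        }
        _ => None,
    }
}
```

Every read goes through `get`, so a truncated leaf yields `None` instead of panicking.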

Precert extra_data (RFC 6962)

3 bytes: pre-certificate length
Variable: pre-certificate (X509 with the CT poison extension)
3 bytes: chain length
Variable: certificate chain (each certificate carries its own 3-byte length prefix)
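The extra_data layout nests 24-bit length prefixes: one for the pre-certificate, one for the whole chain, and one for each certificate inside the chain. The sketch below uses hypothetical names (`PrecertExtraData`, `parse_precert_extra_data`). It walks that structure with a cursor and returns borrowed slices.

```rust
/// Borrowed view of a precert entry's extra_data.
#[derive(Debug)]
pub struct PrecertExtraData<'a> {
    pub pre_certificate: &'a [u8],
    pub chain: Vec<&'a [u8]>,
}

/// Big-endian 24-bit length at `at`, or None if the buffer is too short.
fn read_u24(b: &[u8], at: usize) -> Option<usize> {
    let s = b.get(at..at + 3)?;
    Some(((s[0] as usize) << 16) | ((s[1] as usize) << 8) | (s[2] as usize))
}

/// Reads one 24-bit-length-prefixed field at *pos and advances the cursor.
fn read_opaque24<'a>(b: &'a [u8], pos: &mut usize) -> Option<&'a [u8]> {
    let len = read_u24(b, *pos)?;
    let start = *pos + 3;
    let data = b.get(start..start + len)?;
    *pos = start + len;
    Some(data)
}

pub fn parse_precert_extra_data<'a>(b: &'a [u8]) -> Option<PrecertExtraData<'a>> {
    let mut pos = 0;
    let pre_certificate = read_opaque24(b, &mut pos)?;
    // The chain is itself a length-prefixed list of length-prefixed certs.
    let chain_bytes = read_opaque24(b, &mut pos)?;
    let mut chain = Vec::new();
    let mut cpos = 0;
    while cpos < chain_bytes.len() {
        chain.push(read_opaque24(chain_bytes, &mut cpos)?);
    }
    Some(PrecertExtraData { pre_certificate, chain })
}
```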

Extracted fields

Subject / Issuer: CN, O, C, L, ST, OU, Email
Hashes: SHA1, SHA256, fingerprint
Validity: not_before, not_after
Extensions: SubjectAltName, KeyUsage, BasicConstraints
Domains: collected from CN + SAN DNS entries
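Collecting the domain list from the CN and SAN DNS entries amounts to merging two sources without repeats. This is a hypothetical sketch (`collect_domains` is illustrative). It assumes the CN comes first and exact duplicates are dropped. The document does not say whether names are lowercased or otherwise normalized, so this sketch does neither.

```rust
/// Hypothetical: CN first, then SAN dNSNames, skipping empty and repeated names.
pub fn collect_domains(cn: Option<&str>, san_dns: &[&str]) -> Vec<String> {
    let mut out: Vec<String> = Vec::new();
    for name in cn.into_iter().chain(san_dns.iter().copied()) {
        if !name.is_empty() && !out.iter().any(|d| d == name) {
            out.push(name.to_string());
        }
    }
    out
}
```

The linear `any` check is fine for typical certificates with a handful of names. A certificate with thousands of SANs would call for a `HashSet`.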

Pipeline

Pre-serialization

Serialize once, broadcast to everyone.

flowchart LR
    A["Message"] --> B["serialize()"]
    B --> C["Arc"]
    C --> D["broadcast"]
    D --> E["Clients"]

Serialize-once, broadcast-many

Instead of serializing each message once per connected client, certstream serializes once per certificate into three pre-built byte payloads (full, lite, domains_only), wraps them in an Arc<PreSerializedMessage>, and hands every subscriber a clone of the Arc pointer: zero re-serialization and zero extra heap allocation per client. With 10,000 clients, that is one serialization instead of 10,000. The broadcast channel is bounded, so slow consumers never block the producer or other clients: a lagging client skips the messages it missed, and a client that falls too far behind is disconnected.
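How lagging clients skip messages can be modelled with a bounded ring buffer and per-receiver cursors, which is how a channel such as tokio::sync::broadcast reports lag (its receivers get a `Lagged(n)` error). This single-threaded sketch is illustrative only: `Ring`, `Recv`, and the cursor API are hypothetical, and the disconnect-after-falling-too-far policy is not modelled. It shows that a slow receiver loses only the messages that were overwritten and never stalls the sender.

```rust
use std::collections::VecDeque;
use std::sync::Arc;

#[derive(Debug, PartialEq)]
pub enum Recv<T> {
    Msg(T),
    Lagged(u64), // this many messages were overwritten before being read
    Empty,
}

/// Bounded buffer of Arc'd messages; the sender never waits for receivers.
pub struct Ring<T> {
    buf: VecDeque<Arc<T>>,
    capacity: usize,
    next_seq: u64, // sequence number the next sent message will get
}

impl<T> Ring<T> {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0);
        Ring { buf: VecDeque::with_capacity(capacity), capacity, next_seq: 0 }
    }

    pub fn send(&mut self, msg: Arc<T>) {
        if self.buf.len() == self.capacity {
            self.buf.pop_front(); // overwrite the oldest instead of blocking
        }
        self.buf.push_back(msg);
        self.next_seq += 1;
    }

    fn oldest_seq(&self) -> u64 {
        self.next_seq - self.buf.len() as u64
    }

    /// `cursor` is the receiver's next expected sequence number.
    pub fn recv(&self, cursor: &mut u64) -> Recv<Arc<T>> {
        let oldest = self.oldest_seq();
        if *cursor < oldest {
            // Too slow: report the gap and jump to the oldest retained message.
            let missed = oldest - *cursor;
            *cursor = oldest;
            return Recv::Lagged(missed);
        }
        if *cursor >= self.next_seq {
            return Recv::Empty;
        }
        let msg = Arc::clone(&self.buf[(*cursor - oldest) as usize]);
        *cursor += 1;
        Recv::Msg(msg)
    }
}
```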

Output

Stream formats

Three payload shapes for three use cases.

full

/full-stream

Complete certificate data including chain and DER-encoded certificate.

~15–50 KB / message

lite

/ (default)

Certificate metadata without chain or DER data. Best for most use cases.

~2–5 KB / message

domains_only

/domains-only

Just the domain names array. Minimal bandwidth.

~100–500 B / message

Durability

State persistence

Resume from the last position after a restart.

flowchart LR
    A["StateManager"] --> B["DashMap"]
    B --> C["state.json"]
    C --> D["Resume"]

State structure

Each CT log's state tracks current_index (last processed entry), tree_size (known tree size), and last_success (timestamp used for health tracking). State is written to a JSON file periodically (every 30 s) and on shutdown, so a restart resumes from the saved position without losing entries. Both RFC 6962 and static CT positions are tracked.
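To make the shape concrete, the sketch below renders per-log state as JSON using only the standard library. The field names come from the paragraph above. The outer map keyed by log URL and the `last_success` unit (Unix seconds) are assumptions, and `LogState` and `render_state` are hypothetical names. A real implementation would more likely use serde_json; the hand-written renderer just keeps the example self-contained.

```rust
use std::collections::BTreeMap;

/// Per-log position; field names follow the prose, types are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct LogState {
    pub current_index: u64,
    pub tree_size: u64,
    pub last_success: u64, // hypothetical: Unix seconds
}

/// Minimal JSON string escaping for keys such as log URLs.
fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Renders {"<log url>": {...}, ...}; BTreeMap keeps the output order stable.
pub fn render_state(logs: &BTreeMap<String, LogState>) -> String {
    let entries: Vec<String> = logs
        .iter()
        .map(|(url, s)| {
            format!(
                "\"{}\":{{\"current_index\":{},\"tree_size\":{},\"last_success\":{}}}",
                escape_json(url),
                s.current_index,
                s.tree_size,
                s.last_success
            )
        })
        .collect();
    format!("{{{}}}", entries.join(","))
}
```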

Atomic dirty flag

The dirty flag uses an AtomicBool instead of a lock, so state updates are never silently dropped. State is flushed on graceful shutdown (SIGINT / SIGTERM) and when the periodic save task is cancelled. Persistence is on by default with state_file: "certstream_state.json".
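The dirty-flag pattern can be sketched as follows. `StateManager`'s fields and methods here are hypothetical, and a `Mutex<HashMap>` stands in for DashMap. The flush returns a snapshot instead of writing state.json. The key detail is ordering: an update writes its position first and then sets the flag, and the flush clears the flag with a single atomic `swap` before taking its snapshot. An update that races with a flush therefore either lands in that snapshot or re-sets the flag for the next save; it is never lost.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Mutex;

pub struct StateManager {
    positions: Mutex<HashMap<String, u64>>, // std stand-in for DashMap
    dirty: AtomicBool,
}

impl StateManager {
    pub fn new() -> Self {
        StateManager { positions: Mutex::new(HashMap::new()), dirty: AtomicBool::new(false) }
    }

    /// Record a new position, then mark the state dirty (write first, flag second).
    pub fn update(&self, log: &str, index: u64) {
        self.positions.lock().unwrap().insert(log.to_string(), index);
        self.dirty.store(true, Ordering::Release);
    }

    /// Called by the periodic save task and on shutdown. swap(false) reads and
    /// clears the flag in one atomic step; None means nothing changed.
    pub fn flush_if_dirty(&self) -> Option<HashMap<String, u64>> {
        if !self.dirty.swap(false, Ordering::AcqRel) {
            return None;
        }
        Some(self.positions.lock().unwrap().clone())
    }
}
```

On shutdown, the same `flush_if_dirty` call would run once more after the watchers stop, so the last batch's position reaches disk.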